raw

Published 2021-01-06

Description:

Ten Chinese word segmentation datasets for training Chinese word segmentation models.

File list (path, size, date):

raw, 0 , 2019-02-10
raw\other, 0 , 2019-02-10
raw\other\zx, 0 , 2019-02-10
raw\other\zx\test.zhuxian.wordpos, 280885 , 2019-02-10
raw\other\zx\train.zhuxian.wordpos, 559793 , 2019-02-10
raw\other\zx\dev.zhuxian.wordpos, 166113 , 2019-02-10
raw\other\cnc, 0 , 2019-02-10
raw\other\cnc\dev.txt, 5581923 , 2019-02-10
raw\other\cnc\train.txt, 44824963 , 2019-02-10
raw\other\cnc\test.txt, 5571735 , 2019-02-10
raw\other\udc, 0 , 2019-02-10
raw\other\udc\dev.conll, 422116 , 2019-02-10
raw\other\udc\test.conll, 400684 , 2019-02-10
raw\other\udc\train.conll, 3282103 , 2019-02-10
raw\other\wtb, 0 , 2019-02-10
raw\other\wtb\dev.conll, 49336 , 2019-02-10
raw\other\wtb\test.conll, 49702 , 2019-02-10
raw\other\wtb\train.conll, 393054 , 2019-02-10
raw\other\sxu, 0 , 2019-02-10
raw\other\sxu\train.txt, 3600697 , 2019-02-10
raw\other\sxu\test.txt, 776035 , 2019-02-10
raw\other\ctb, 0 , 2019-02-10
raw\other\ctb\ctb6.dev.seg, 300375 , 2019-02-10
raw\other\ctb\ctb6.train.seg, 4030528 , 2019-02-10
raw\other\ctb\ctb6.test.seg, 312025 , 2019-02-10
raw\sighan2005, 0 , 2019-02-10
raw\sighan2005\cityu_test_gold.utf8, 239427 , 2019-02-10
raw\sighan2005\msr_training.utf8, 16804586 , 2019-02-10
raw\sighan2005\cityu_training.utf8, 8499903 , 2019-02-10
raw\sighan2005\as_test_gold.utf8, 711891 , 2019-02-10
raw\sighan2005\pku_test_gold.utf8, 716386 , 2019-02-10
raw\sighan2005\as_training.utf8, 30558193 , 2019-02-10
raw\sighan2005\msr_test_gold.utf8, 762801 , 2019-02-10
raw\sighan2005\pku_training.utf8, 7709182 , 2019-02-10
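
As a rough illustration of how files like these might feed a segmentation model, the sketch below converts one segmented sentence into character-level BMES tags (Begin, Middle, End, Single). It assumes the .utf8 files under raw\sighan2005 hold one sentence per line with words separated by whitespace, as in the public SIGHAN 2005 bakeoff release; the .conll and .wordpos files use other layouts and would need their own parsers. The names bmes_tags and read_segmented are hypothetical helpers, not part of the archive.

```python
def bmes_tags(segmented_line):
    """Turn a whitespace-segmented sentence into (character, tag) pairs.

    Assumption: words are separated by whitespace. str.split() with no
    arguments also splits on the ideographic space U+3000.
    """
    pairs = []
    for word in segmented_line.split():
        if len(word) == 1:
            pairs.append((word, "S"))          # single-character word
        else:
            pairs.append((word[0], "B"))       # first character of the word
            pairs.extend((ch, "M") for ch in word[1:-1])  # interior characters
            pairs.append((word[-1], "E"))      # last character of the word
    return pairs


def read_segmented(path):
    """Yield tagged sentences from a file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield bmes_tags(line)
```

For example, bmes_tags("中文 分词") yields the pairs (中, B), (文, E), (分, B), (词, E); a tagger trained on such pairs predicts one tag per character, and word boundaries are recovered after each E or S tag.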

