
raw

Published 2021-01-06. File size: 48345 KB

Description:

  Ten Chinese word segmentation datasets for training Chinese word segmentation models.

File list (path, size, date):

raw, 0 , 2019-02-10
raw\other, 0 , 2019-02-10
raw\other\zx, 0 , 2019-02-10
raw\other\zx\test.zhuxian.wordpos, 280885 , 2019-02-10
raw\other\zx\train.zhuxian.wordpos, 559793 , 2019-02-10
raw\other\zx\dev.zhuxian.wordpos, 166113 , 2019-02-10
raw\other\cnc, 0 , 2019-02-10
raw\other\cnc\dev.txt, 5581923 , 2019-02-10
raw\other\cnc\train.txt, 44824963 , 2019-02-10
raw\other\cnc\test.txt, 5571735 , 2019-02-10
raw\other\udc, 0 , 2019-02-10
raw\other\udc\dev.conll, 422116 , 2019-02-10
raw\other\udc\test.conll, 400684 , 2019-02-10
raw\other\udc\train.conll, 3282103 , 2019-02-10
raw\other\wtb, 0 , 2019-02-10
raw\other\wtb\dev.conll, 49336 , 2019-02-10
raw\other\wtb\test.conll, 49702 , 2019-02-10
raw\other\wtb\train.conll, 393054 , 2019-02-10
raw\other\sxu, 0 , 2019-02-10
raw\other\sxu\train.txt, 3600697 , 2019-02-10
raw\other\sxu\test.txt, 776035 , 2019-02-10
raw\other\ctb, 0 , 2019-02-10
raw\other\ctb\ctb6.dev.seg, 300375 , 2019-02-10
raw\other\ctb\ctb6.train.seg, 4030528 , 2019-02-10
raw\other\ctb\ctb6.test.seg, 312025 , 2019-02-10
raw\sighan2005, 0 , 2019-02-10
raw\sighan2005\cityu_test_gold.utf8, 239427 , 2019-02-10
raw\sighan2005\msr_training.utf8, 16804586 , 2019-02-10
raw\sighan2005\cityu_training.utf8, 8499903 , 2019-02-10
raw\sighan2005\as_test_gold.utf8, 711891 , 2019-02-10
raw\sighan2005\pku_test_gold.utf8, 716386 , 2019-02-10
raw\sighan2005\as_training.utf8, 30558193 , 2019-02-10
raw\sighan2005\msr_test_gold.utf8, 762801 , 2019-02-10
raw\sighan2005\pku_training.utf8, 7709182 , 2019-02-10
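
Since the archive is meant for training segmentation models, a common first step is to turn segmented text into per-character labels. The following is a minimal sketch, not part of the archive: it assumes the `raw\sighan2005\*_training.utf8` files hold one sentence per line with words separated by whitespace (the usual SIGHAN 2005 bakeoff layout), and it uses the widely used B/M/E/S tagging scheme. The function names `read_segmented` and `words_to_bmes` are hypothetical, and the other directories (`.conll`, `.wordpos`, `.seg`) may need different parsing.

```python
from typing import Iterator, List, Tuple


def words_to_bmes(words: List[str]) -> List[Tuple[str, str]]:
    """Convert a list of words into (character, tag) pairs.

    Tags: B = begin, M = middle, E = end of a multi-character word,
    S = single-character word.
    """
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            for ch in word[1:-1]:
                pairs.append((ch, "M"))
            pairs.append((word[-1], "E"))
    return pairs


def read_segmented(path: str, encoding: str = "utf-8") -> Iterator[List[str]]:
    """Yield one list of words per non-empty line.

    Assumption: words are separated by whitespace, one sentence per line.
    """
    with open(path, encoding=encoding) as f:
        for line in f:
            words = line.split()
            if words:
                yield words


if __name__ == "__main__":
    # Inline sample so the sketch runs without the archive present.
    sample = "中文 分词 是 基础任务"
    for ch, tag in words_to_bmes(sample.split()):
        print(ch, tag)
```

To apply this to a real file, pass a path such as `raw/sighan2005/pku_training.utf8` to `read_segmented` and feed each yielded word list to `words_to_bmes`.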

