登录
首页 » Others » raw

raw

于 2021-01-06 发布 文件大小:48345KB
0 382
下载积分: 1 下载次数: 5

代码说明:

  10个中文分词数据集,用于训练中文分词模型(Ten Chinese Word Segmentation Datasets for Training Chinese Word Segmentation Model)

文件列表:

raw, 0 , 2019-02-10
raw\other, 0 , 2019-02-10
raw\other\zx, 0 , 2019-02-10
raw\other\zx\test.zhuxian.wordpos, 280885 , 2019-02-10
raw\other\zx\train.zhuxian.wordpos, 559793 , 2019-02-10
raw\other\zx\dev.zhuxian.wordpos, 166113 , 2019-02-10
raw\other\cnc, 0 , 2019-02-10
raw\other\cnc\dev.txt, 5581923 , 2019-02-10
raw\other\cnc\train.txt, 44824963 , 2019-02-10
raw\other\cnc\test.txt, 5571735 , 2019-02-10
raw\other\udc, 0 , 2019-02-10
raw\other\udc\dev.conll, 422116 , 2019-02-10
raw\other\udc\test.conll, 400684 , 2019-02-10
raw\other\udc\train.conll, 3282103 , 2019-02-10
raw\other\wtb, 0 , 2019-02-10
raw\other\wtb\dev.conll, 49336 , 2019-02-10
raw\other\wtb\test.conll, 49702 , 2019-02-10
raw\other\wtb\train.conll, 393054 , 2019-02-10
raw\other\sxu, 0 , 2019-02-10
raw\other\sxu\train.txt, 3600697 , 2019-02-10
raw\other\sxu\test.txt, 776035 , 2019-02-10
raw\other\ctb, 0 , 2019-02-10
raw\other\ctb\ctb6.dev.seg, 300375 , 2019-02-10
raw\other\ctb\ctb6.train.seg, 4030528 , 2019-02-10
raw\other\ctb\ctb6.test.seg, 312025 , 2019-02-10
raw\sighan2005, 0 , 2019-02-10
raw\sighan2005\cityu_test_gold.utf8, 239427 , 2019-02-10
raw\sighan2005\msr_training.utf8, 16804586 , 2019-02-10
raw\sighan2005\cityu_training.utf8, 8499903 , 2019-02-10
raw\sighan2005\as_test_gold.utf8, 711891 , 2019-02-10
raw\sighan2005\pku_test_gold.utf8, 716386 , 2019-02-10
raw\sighan2005\as_training.utf8, 30558193 , 2019-02-10
raw\sighan2005\msr_test_gold.utf8, 762801 , 2019-02-10
raw\sighan2005\pku_training.utf8, 7709182 , 2019-02-10

下载说明:请别用迅雷下载,失败请重下,重下不扣分!

发表评论

0 个回复

  • hanziinput
    实现按照拼音输入汉字; 功能详尽,有使用例程; (Realized in accordance with the Pinyin input Chinese characters Features detailed, there is the use of routine )
    2014-09-15 16:04:59下载
    积分:1
  • ACWPS
    词是最小的能够独立活动的有意义的语言成分。 但汉语是以字为基本的书写单位,词语之间没有明显的区分标记,因此,中文词语分析是中文信息处理的基础与关键。(The word is the smallest independent activities meaningful language component. But Chinese is the word as the basic unit of writing, there is no obvious mark of distinction between the words, so Chinese word analysis is the foundation of Chinese information processing and critical.)
    2013-04-03 10:22:22下载
    积分:1
  • HanLP-master
    NamedEntityRecognition github
    2018-01-31 01:47:04下载
    积分:1
  • tranditionized
    中文简繁转换 GreenBrowser/TheWorld2.0插件(Tranditional Chinese Script Conversion GreenBrowser/TheWorld2.0 Plug-in)
    2010-02-24 19:20:05下载
    积分:1
  • 4305685
    应用中文分词源码程序,结合易语言模块彗星HTTP应用模块.ec,实现中文分词的效果。(Application of Chinese Word source program, combined with easy language module Comet HTTP application modules .ec, realize the effect of the Chinese word .)
    2017-01-11 23:13:31下载
    积分:1
  • 12
    说明:  全新图片防盗链全能后台版 for PW5.X 正式版(GBK、BIG5、UTF8一起发) 说明: 1、所有参数均可后台设置,没有任何功能限制。 2、支持完全防盗链和当天有效两种模式,禁止盗链时显示设定的图片。 3、允许自定义允许链接的域名,自定义防盗链图片地址。(The new version of the background image anti-hotlinking Almighty for PW5.X official version (GBK, BIG5, UTF8 hair together): 1, all parameters can be set back, without any functional limitations. 2, supports full security chain and effective the same day in two modes, the display setting of the pictures is prohibited hotlinking. 3, allows custom links allows domain name, custom anti-hotlinking image address. )
    2016-06-29 21:59:33下载
    积分:1
  • zhijiehanhua
    Directly tool which sinicizes the software
    2010-07-10 20:00:59下载
    积分:1
  • ViewPage
    联系人拖动后动态显示滑动到的拼音的首字母(Dynamic display after the first letter of the sliding contact to drag Pinyin)
    2014-01-11 18:14:24下载
    积分:1
  • ViewPage
    联系人拖动后动态显示滑动到的拼音的首字母(Dynamic display after the first letter of the sliding contact to drag Pinyin)
    2014-01-11 18:14:24下载
    积分:1
  • ChineseSegment
    根据输入的中文词来进行检索~检索出用户想要的内容(ChineseSegment)
    2009-09-11 21:39:14下载
    积分:1
  • 696516资源总数
  • 106648会员总数
  • 8今日下载