raw

Published 2021-01-06. File size: 48345 KB

Description:

  Ten Chinese word segmentation datasets for training Chinese word segmentation models.
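
Segmentation corpora are commonly turned into per-character labels before training. The sketch below shows one such conversion, the B/M/E/S tagging scheme (begin, middle, end, single). It assumes the input is one sentence per line with words separated by whitespace. This page does not document the file formats, so that layout is an assumption, and the function names `words_to_bmes` and `line_to_example` are hypothetical.

```python
def words_to_bmes(words):
    """Map a list of words to per-character B/M/E/S tags."""
    tags = []
    for word in words:
        if len(word) == 1:
            # A one-character word is tagged S (single).
            tags.append("S")
        else:
            # Longer words: B for the first character, E for the last,
            # M for every character in between.
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def line_to_example(line):
    """Split one whitespace-segmented sentence into (characters, tags).

    Assumption: words are separated by whitespace, one sentence per line.
    """
    words = line.split()
    chars = [ch for word in words for ch in word]
    return chars, words_to_bmes(words)


chars, tags = line_to_example("中文 分词 是 基础")
for ch, tag in zip(chars, tags):
    print(ch, tag)
```

A sequence labeling model can then be trained to predict one tag per character, and word boundaries are recovered by cutting after every E or S tag.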

File list:

raw, 0 , 2019-02-10
raw\other, 0 , 2019-02-10
raw\other\zx, 0 , 2019-02-10
raw\other\zx\test.zhuxian.wordpos, 280885 , 2019-02-10
raw\other\zx\train.zhuxian.wordpos, 559793 , 2019-02-10
raw\other\zx\dev.zhuxian.wordpos, 166113 , 2019-02-10
raw\other\cnc, 0 , 2019-02-10
raw\other\cnc\dev.txt, 5581923 , 2019-02-10
raw\other\cnc\train.txt, 44824963 , 2019-02-10
raw\other\cnc\test.txt, 5571735 , 2019-02-10
raw\other\udc, 0 , 2019-02-10
raw\other\udc\dev.conll, 422116 , 2019-02-10
raw\other\udc\test.conll, 400684 , 2019-02-10
raw\other\udc\train.conll, 3282103 , 2019-02-10
raw\other\wtb, 0 , 2019-02-10
raw\other\wtb\dev.conll, 49336 , 2019-02-10
raw\other\wtb\test.conll, 49702 , 2019-02-10
raw\other\wtb\train.conll, 393054 , 2019-02-10
raw\other\sxu, 0 , 2019-02-10
raw\other\sxu\train.txt, 3600697 , 2019-02-10
raw\other\sxu\test.txt, 776035 , 2019-02-10
raw\other\ctb, 0 , 2019-02-10
raw\other\ctb\ctb6.dev.seg, 300375 , 2019-02-10
raw\other\ctb\ctb6.train.seg, 4030528 , 2019-02-10
raw\other\ctb\ctb6.test.seg, 312025 , 2019-02-10
raw\sighan2005, 0 , 2019-02-10
raw\sighan2005\cityu_test_gold.utf8, 239427 , 2019-02-10
raw\sighan2005\msr_training.utf8, 16804586 , 2019-02-10
raw\sighan2005\cityu_training.utf8, 8499903 , 2019-02-10
raw\sighan2005\as_test_gold.utf8, 711891 , 2019-02-10
raw\sighan2005\pku_test_gold.utf8, 716386 , 2019-02-10
raw\sighan2005\as_training.utf8, 30558193 , 2019-02-10
raw\sighan2005\msr_test_gold.utf8, 762801 , 2019-02-10
raw\sighan2005\pku_training.utf8, 7709182 , 2019-02-10
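
The listing can be grouped by corpus to see how the archive is organized. The sketch below parses entries in the `path, size , date` layout used above and sums sizes per corpus. The grouping rule is an assumption inferred from the paths: the directory name for files under `raw\other`, and the file-name prefix before the first underscore for files under `raw\sighan2005`. `parse_entry` and `corpus_name` are hypothetical helper names, and only a few entries are copied in as sample input.

```python
from collections import defaultdict

# A few entries copied from the file list above (sample input only).
LISTING = r"""
raw\other\zx, 0 , 2019-02-10
raw\other\zx\test.zhuxian.wordpos, 280885 , 2019-02-10
raw\other\zx\train.zhuxian.wordpos, 559793 , 2019-02-10
raw\sighan2005\pku_test_gold.utf8, 716386 , 2019-02-10
raw\sighan2005\pku_training.utf8, 7709182 , 2019-02-10
"""


def parse_entry(line):
    """Split one 'path, size , date' entry into typed fields."""
    path, size, date = [field.strip() for field in line.split(",")]
    return path, int(size), date


def corpus_name(path):
    """Return the corpus a file belongs to, or None for directory entries.

    Assumption: the grouping rule is inferred from the paths in the listing.
    """
    parts = path.split("\\")
    if len(parts) == 4 and parts[1] == "other":
        return parts[2]                 # e.g. raw\other\zx\... -> "zx"
    if len(parts) == 3 and parts[1] == "sighan2005":
        return parts[2].split("_")[0]   # e.g. pku_training.utf8 -> "pku"
    return None


totals = defaultdict(int)
for line in LISTING.strip().splitlines():
    path, size, _date = parse_entry(line)
    name = corpus_name(path)
    if name is not None:
        totals[name] += size

for name, total in sorted(totals.items()):
    print(name, total)
```

Applied to the full listing, this rule yields ten groups (zx, cnc, udc, wtb, sxu, ctb, as, cityu, msr, pku), which matches the ten datasets named in the description.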
