Splitting raw data into train.dat and test.dat
Code description:
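Application background
This started as work on the MovieLens data. The script splits the raw data at random into train.dat and test.dat in whatever proportion you choose, mainly so that you can run validation experiments. It is short and simple, suited to beginners; if it is not to your taste, please go easy on it.

Key techniques
    # sklearn.cross_validation was removed in scikit-learn 0.20;
    # train_test_split now lives in sklearn.model_selection.
    from sklearn.model_selection import train_test_split

    filename = r"Raw.data"  # raw data, one comma-separated record per line

    # Read every record as a list of fields.
    with open(filename) as f:
        c = [line.strip().split(",") for line in f]

    # test_size is the fraction of records held out for testing.
    c_train, c_test = train_test_split(c, test_size=0.1)

    # Write each subset back out, one record per line.
    with open(r"train.txt", "w") as out_train:  # training set
        for i in c_train:
            out_train.write(",".join(i) + "\n")
    with open(r"test.txt", "w") as out_test:  # test set
        for i in c_test:
            out_test.write(",".join(i) + "\n")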
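For readers who do not have scikit-learn installed, the same random split can be sketched with the standard library alone. This is only an illustration, not the author's method: the helper names split_records, write_records and split_file are hypothetical, the seed parameter is an added assumption for reproducibility, and the test-set size is rounded to the nearest integer, which may differ by one record from what scikit-learn produces.

```python
import random


def split_records(records, test_size=0.1, seed=None):
    """Shuffle a copy of records and return (train, test) lists.

    test_size is the fraction of records held out for testing.
    seed is an assumption added here so that a split can be repeated.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # private RNG, no global state
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def write_records(path, records):
    """Write each record as one comma-separated line."""
    with open(path, "w") as out:
        for fields in records:
            out.write(",".join(fields) + "\n")


def split_file(raw_path, train_path, test_path, test_size=0.1, seed=None):
    """Read a comma-separated file and write the train and test subsets."""
    with open(raw_path) as f:
        records = [line.strip().split(",") for line in f if line.strip()]
    train, test = split_records(records, test_size=test_size, seed=seed)
    write_records(train_path, train)
    write_records(test_path, test)
```

Calling split_file("Raw.data", "train.txt", "test.txt", test_size=0.1) would mirror the script above; passing a fixed seed gives the same split on every run, which helps when comparing validation experiments.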


