▍1. department
Implements dynamic login and a departmental document library with link navigation; search is not yet implemented.
Sina Weibo crawler program, "Little Bee".
Source code for a simple crawler system covering Sina Blog, CSDN Blog, and Tencent Qzone; Java version.
Searches an existing index by keyword; two keywords can be combined in a joint search. The search results are printed.
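The joint two-keyword search described above could look roughly like the following. This is a minimal sketch, not the code in the listed package: it assumes a simple in-memory inverted index, and the names `build_index` and `search_both` are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search_both(index, first, second):
    """Return the sorted ids of documents that contain both keywords."""
    hits_first = index.get(first.lower(), set())
    hits_second = index.get(second.lower(), set())
    return sorted(hits_first & hits_second)


if __name__ == "__main__":
    # Hypothetical sample documents, used only to show the call pattern.
    docs = {
        1: "web crawler for blogs",
        2: "search engine index",
        3: "crawler and search engine",
    }
    index = build_index(docs)
    for doc_id in search_both(index, "crawler", "search"):
        print(doc_id, docs[doc_id])
```

Intersecting the two posting sets gives AND semantics; a union (`|`) would give OR instead.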
Searches the hard disk for all specified files, with optional cleanup: it finds every file with the "._" prefix and deletes the ones you select.
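A search-then-clean tool of this kind might be sketched as below. This is an illustration under assumptions, not the listed program: the function names are hypothetical, and the `confirm` callback stands in for whatever selection interface the real tool offers.

```python
import os


def find_dot_underscore_files(root):
    """Yield the path of every file under root whose name starts with '._'."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("._"):
                yield os.path.join(dirpath, name)


def clean(paths, confirm):
    """Delete each path for which confirm(path) is true; return the deleted paths."""
    removed = []
    for path in paths:
        if confirm(path):
            os.remove(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # List the matches first, then ask before deleting each one.
    found = list(find_dot_underscore_files("."))
    clean(found, lambda p: input(f"Delete {p}? [y/N] ").strip().lower() == "y")
```

Collecting the matches into a list before deleting keeps the search and the cleanup as two separate steps, so the user can review the results first.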
This is an installation package for a search engine. It is very simple; learn it on your own.
An address book with ID, name, phone number, address, and company fields. Entries can be inserted, displayed, searched, and deleted.
A web crawler that fetches the links on a page and recursively continues collecting links from the pages those links reach.
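The recursive link-following idea can be sketched with the Python standard library alone. This is a minimal sketch under assumptions, not the listed program: `LinkParser`, `fetch`, and `crawl` are hypothetical names, and the `max_depth` limit and `seen` set are added here so the recursion terminates.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(url, fetch_page=fetch, max_depth=2, seen=None):
    """Visit url, then recurse into each link found on it; return visited URLs."""
    seen = set() if seen is None else seen
    if max_depth < 0 or url in seen:
        return seen
    seen.add(url)
    try:
        html = fetch_page(url)
    except (OSError, ValueError):
        return seen  # unreachable page or unsupported URL: skip it
    parser = LinkParser()
    parser.feed(html)
    for href in parser.links:
        # Resolve relative links against the current page before recursing.
        crawl(urljoin(url, href), fetch_page, max_depth - 1, seen)
    return seen
```

Passing `fetch_page` as a parameter lets the traversal be exercised against canned pages without network access. A real crawler would also need politeness controls such as robots.txt handling and rate limiting, which are omitted here.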
A great file for searching information.
A crawler written with simple statements: fast, supports single-threaded operation, easy to modify, and commented.
Uses web crawler technology to automatically find the news links on a specified web page.
A simple Baidu-based web crawler: a small program that retrieves the search results for a query from Baidu.
Lucene search engine: C# development example code, including the engine development interface and a small page application.
Main application areas:
• Vertical search: also called specialized search. High-speed, high-volume, precise crawling is the strength of DataScraper, a focused web crawler. It runs periodic batch collection unattended and self-scheduled, 24 hours a day, 7 days a week; resumable transfers and a software watchdog (Watch Dog) keep it running so you can rest easy.
• Mobile Internet: mobile search, mobile mashups, mobile social networks, and mobile e-commerce all depend on structured data content. DataScraper collects content efficiently in real time and outputs result files in XML format, rich in semantic metadata. This enables automated data integration and processing and overcomes the obstacles of small-screen display and high-precision information retrieval. The mobile Internet is not a subset of the Web but the whole of it, with MetaSeeker building the bridge.
• Enterprise competitive intelligence collection / data mining: commonly known as Business Intelligence. Noise filtering and structured conversion ensure the accuracy and timeliness of the data. A unique wide-area distributed architecture gives DataScraper unmatched intelligence-gathering reach: AJAX/JavaScript dynamic pages, server-generated dynamic pages, static pages, and all kinds of authentication mechanisms are handled alike. It is far ahead of other products in microblog data collection and public opinion monitoring.
Description: A simple video tutorial on developing your own search engine with Nutch, including environment setup.
Description: A classic book on search engines and Web intelligence; many advisors in this field list it as required reading for information retrieval.