Search Engine

Published 2022-05-07. File size: 2.47 MB

 0  176

Download points: 2. Download count: 1

Code description:

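The code consists of a crawler module, a preprocessing module, and a search module. The three parts of the search engine are independent of one another and run separately; they are linked mainly in that the output of each part is the raw input of the next.

A user submits a query word or phrase P through the browser, and the search engine returns a list L of matching page information. This raises two questions: how is the user's query matched, and where does the page list come from and how is it ordered?

The query P is split by a tokenizer into short terms, and stop words (such as 的, 了, 啊) are removed. An inverted index maintained by the system tells which pages each term pi appears in, and the set of pages that contain all of the terms serves as the initial result. The initial page set is then scored by its relevance to the query terms, which gives the page ranking (called Page Rank here). Ordering the pages by that ranking gives the final page list.

Assuming the tokenizer and the ranking formula are already given, where do the inverted index and the original page set come from? As described in the earlier introduction of the data flow, the original pages are fetched by the crawler (spider) and stored locally. The inverted index, a mapping from terms to pages, is built on top of the forward index. The forward index is a mapping from pages to terms, obtained by analyzing each page's content and tokenizing it. Inverting the forward index yields the inverted index.

What does page analysis actually do? The raw pages collected by the crawler contain a lot of extra information, such as HTML forms and junk such as advertisements. Page analysis strips this out and extracts the body text as the base data for the later stages.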
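The index-and-query flow described above can be sketched in Python. This is a minimal illustration and not the code in the download: the function names, the whitespace tokenizer, the three-word stop list, and the term-count score are all assumptions made for the example. A real system would use a proper Chinese word segmenter and the project's own ranking formula.

```python
from collections import defaultdict

# Hypothetical stop-word list; the description only names these three as examples.
STOP_WORDS = {"的", "了", "啊"}


def tokenize(text):
    """Stand-in tokenizer (assumption): split on whitespace, drop stop words."""
    return [word for word in text.split() if word not in STOP_WORDS]


def build_forward_index(pages):
    """Forward index: page ID -> list of terms found in the page's body text."""
    return {page_id: tokenize(body) for page_id, body in pages.items()}


def invert(forward_index):
    """Inverted index: term -> set of page IDs that contain the term."""
    inverted = defaultdict(set)
    for page_id, terms in forward_index.items():
        for term in terms:
            inverted[term].add(page_id)
    return dict(inverted)


def search(query, forward_index, inverted_index):
    """Return the IDs of pages containing every query term, best match first."""
    terms = tokenize(query)
    if not terms:
        return []
    # Initial result set: intersect the posting sets of all query terms.
    candidates = set.intersection(
        *(inverted_index.get(term, set()) for term in terms)
    )

    # Placeholder relevance score (assumption): how often the query terms
    # occur in the page. The real ranking formula is not given in the text.
    def score(page_id):
        page_terms = forward_index[page_id]
        return sum(page_terms.count(term) for term in terms)

    # Higher score first; ties broken by page ID for a stable order.
    return sorted(candidates, key=lambda page_id: (-score(page_id), page_id))


if __name__ == "__main__":
    pages = {
        "p1": "search engine crawler",
        "p2": "search engine search ranking",
        "p3": "crawler module",
    }
    forward = build_forward_index(pages)
    inverted = invert(forward)
    print(search("search engine", forward, inverted))
```

Keeping the forward index around after inversion is a convenience in this sketch, since the placeholder score reads term counts from it; a production index would normally store term frequencies in the postings instead.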

How selection sort works: take each element of the array in turn, record its position i, and assume it is the smallest of the elements from that position onward (min = i). Scan from the next element to the last one, recording the position min of the smallest element found. After the scan, if min is not equal to i, the assumption was wrong, so swap the elements at positions min and i.

2022-08-07 23:09:54 Download

Points: 1
Parses a string: each field of the string protocol is parsed out and stored in a map.

2022-02-04 07:37:51 Download

Points: 1
A game engine, NumenGameEngine. It can be used to develop simple 3D games. Very well made.

2022-03-06 12:02:52 Download

Points: 1
Explains the technique of finding permutations and provides source code for the recursive implementation.

2022-01-25 23:12:27 Download

Points: 1
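School management information system
2008 R2: a school management system developed in Visual Studio, with Visual C# as the programming language and Microsoft Access as the database.

2022-06-01 22:13:10 Download

Points: 1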
The program supports alternating placement of black and white pieces, determines the winner, and can restart the game.

2022-09-04 05:20:03 Download

Points: 1
Automatic software upgrade
An automatic software upgrade program.

2022-02-16 08:58:50 Download

Points: 1
File directory listing source code (zip)
File directory listing, TURBOC2.0 language source code (.zip).

2022-03-23 15:29:28 Download

Points: 1
C++ Builder V5.0 AutoCAD VCL source code

2023-03-18 12:30:03 Download

Points: 1
DSP library (LIB)
This is the official DSP library for stm32f1xx; it is ready to use once installed. This user manual describes the STM32F10x DSP (digital signal processing) library, which is a suite of common digital signal processing functions: PID controller, fast Fourier transform, and FIR and IIR filters. The library contains C and assembly functions. The assembly code is ported on ARM®, GCC and IAR Systems™ assemblers.

2022-08-18 11:37:11 Download

Points: 1
