ABSTRACT
As World Wide Web (WWW) based Internet services become more popular, information overload also becomes a pressing research problem. Difficulties with searching on the Internet get worse as the amount of information that is available increases.
This report proposes a new approach to building an intelligent personal spider (agent) based on automatic textual analysis of Internet documents. Starting from the homepages a user selects, these personal spiders dynamically analyze page contents and search for the most relevant homepages using links and indexing.
It is straightforward to define an evaluation function that is a mathematical formulation of the user request, and to define a steady-state algorithm. Individuals are created by querying a standard search engine.
1. Introduction:
Engineering a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, because of rapid advances in technology and the proliferation of the web, creating a web search engine today is very different from creating one three years ago.
1.1 What is a search engine?
A Web search engine is a tool designed to search for information on the World Wide Web. The search results are usually presented in a list and are commonly called hits. Internet search engines are special sites on the Web that are designed to help people find information stored on other sites. There are differences in the ways various search engines work, but they all perform three basic tasks:
• They search the Internet -- or select pieces of the Internet -- based on important words.
• They keep an index of the words they find, and where they find them.
• They allow users to look for words or combinations of words found in that index.
A top search engine will index hundreds of millions of pages, and respond to tens of millions of queries per day.
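The second and third tasks above (keeping an index of words and letting users look up combinations of words) can be illustrated with a small inverted index. This is only a minimal sketch, not how any particular engine is implemented: the function names `tokenize`, `build_index`, and `search`, and the example pages, are assumptions made for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages standing in for documents a spider has fetched.
pages = {
    "http://example.com/a": "Web search engines index pages",
    "http://example.com/b": "Spiders crawl the web",
}
index = build_index(pages)
print(sorted(search(index, "web")))
print(sorted(search(index, "web spiders")))
```

A query with several words is answered by intersecting the page sets of each word, which is why the lookup stays cheap once the index has been built.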
1.2 How does a search engine work?
To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites. When a spider is building its lists, the process is called Web crawling. In order to build and maintain a useful list of words, a search engine's spiders have to look at a lot of pages.
How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
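The traversal described above can be sketched as a breadth-first walk that starts from a list of seed pages and follows each link once. The sketch below is an illustration under stated assumptions: `crawl`, `get_links`, `max_pages`, and the `LINKS` table are hypothetical names, and the in-memory link table stands in for real page fetches.

```python
from collections import deque

def crawl(seeds, get_links, max_pages=100):
    """Visit pages breadth-first from the seed URLs, following each link once."""
    visited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical link graph standing in for real HTTP fetches.
LINKS = {
    "popular.example": ["a.example", "b.example"],
    "a.example": ["b.example", "c.example"],
    "b.example": ["popular.example"],
    "c.example": [],
}

order = crawl(["popular.example"], lambda url: LINKS.get(url, []))
print(order)
```

The `seen` set keeps the spider from revisiting pages that link back to each other, and `max_pages` bounds the crawl, since a real spider can never exhaust the Web.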
Words occurring in the title, subtitles, meta tags, and other positions of relative importance are noted for special consideration during a subsequent user search.
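One way to picture this special consideration is to give each word a score that depends on where it appears in the page. The sketch below uses Python's standard `html.parser` module; the weight values, the `WeightedWordParser` class, and the `score_words` helper are assumptions chosen for illustration, since the text does not say how positions are actually weighted.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Illustrative weights only; the text does not specify actual values.
WEIGHTS = {"title": 5, "heading": 3, "meta": 2, "body": 1}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class WeightedWordParser(HTMLParser):
    """Accumulate a per-word score based on where each word occurs."""

    def __init__(self):
        super().__init__()
        self.scores = Counter()
        self.stack = []  # currently open title/heading tags

    def add_words(self, text, field):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.scores[word] += WEIGHTS[field]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("keywords", "description"):
            self.add_words(attrs.get("content") or "", "meta")
        elif tag == "title" or tag in HEADINGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if not self.stack:
            field = "body"
        elif self.stack[-1] == "title":
            field = "title"
        else:
            field = "heading"
        self.add_words(data, field)

def score_words(html):
    """Parse an HTML string and return the weighted word scores."""
    parser = WeightedWordParser()
    parser.feed(html)
    parser.close()
    return parser.scores

html = (
    '<html><head><title>Web Spiders</title>'
    '<meta name="keywords" content="crawler, spiders"></head>'
    '<body><h1>Crawling</h1><p>Spiders follow links.</p></body></html>'
)
scores = score_words(html)
print(scores.most_common(3))
```

In this example "spiders" appears in the title, the meta keywords, and the body, so it collects the highest score and would rank first for a query on that word.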