Information Mining and Retrieval
Project Domain / Category
A search engine is an information retrieval system that helps users find information stored on one or more computer systems. The search results, commonly known as "hits", are presented to the user as a list. Current search engines such as Google, Yahoo, and MSN return millions of records for a single query, and among these millions of records it is difficult and time-consuming for users to find the relevant information. These engines search for information based on the keywords mentioned in the query. A date-sensitive search engine additionally has the capability to give priority to the dates mentioned in the query. It considers only dates that appear inside the text (the page contents), not the date on which a page was updated, created, or published.
Students should be very careful while crawling the web to create indexes and while extracting dates from the page contents, because dates may appear in many different formats. There is a need to handle all forms of dates and date references and convert them to the ISO 8601 standard, i.e. YYYY-MM-DD. Students also need to maintain a list or local database for storing the dates and the offsets at which they occur in each document.
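As a rough illustration of the date-handling requirement above, the sketch below scans page text for a few common date formats, normalizes each match to ISO 8601 (YYYY-MM-DD), and records the character offset at which it occurs. The pattern list, format strings, and the day-first reading of slashed dates are assumptions for this sketch; a real implementation would need to cover many more formats.

```python
import re
from datetime import datetime

# Each entry pairs a regex for one date style with the strptime format
# used to parse it. The day-first reading of "07/04/2021" is an assumption.
DATE_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%d/%m/%Y"),            # 07/04/2021
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),                # already ISO 8601
    (re.compile(r"\b\d{1,2}\s+[A-Z][a-z]+\s+\d{4}\b"), "%d %B %Y"),    # 5 March 2021
    (re.compile(r"\b[A-Z][a-z]+\s+\d{1,2},\s+\d{4}\b"), "%B %d, %Y"),  # March 5, 2021
]

def extract_dates(text):
    """Return (offset, iso_date) pairs for every date found in the text."""
    found = []
    for pattern, fmt in DATE_PATTERNS:
        for match in pattern.finditer(text):
            try:
                parsed = datetime.strptime(match.group(0), fmt)
            except ValueError:
                continue  # skip strings the format cannot actually parse
            found.append((match.start(), parsed.date().isoformat()))
    return sorted(found)  # in document order, ready to store with offsets
```

For example, `extract_dates("Published 5 March 2021; updated 07/04/2021.")` yields both dates in ISO form together with their offsets, which is exactly the (date, offset) information the local database is meant to hold.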
Students are required to select/specify a particular dataset to test and evaluate their project.
Main modules and their functions: This project has the following basic modules:
- Web Crawler:
Web search engines work by storing information about many web pages, which they retrieve from the pages' HTML. These pages are fetched by a Web crawler, an automated browser that follows every link it encounters on a site. The contents of each fetched page are then analyzed to determine how the page should be indexed.
- Front end for query processing and result display:
The front end presents a search bar to the user; the query processor parses each request and executes the search, and the results are then displayed by the front end.
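A minimal sketch of the query-processing step, under assumed index shapes (keyword -> set of page ids, page id -> set of ISO dates): split the query into keywords, pull out any ISO-style date, and rank matching pages so that those whose contents mention the queried date come first, which is the date-sensitive behaviour described above.

```python
import re

DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # ISO 8601 dates in the query

def process_query(query, keyword_index, doc_dates):
    """Return page ids matching the query, date-mentioning pages first."""
    m = DATE_RE.search(query)
    query_date = m.group(0) if m else None
    # Remaining words (with the date removed) are the search keywords.
    keywords = [w.lower() for w in DATE_RE.sub(" ", query).split()]
    hits = set()
    for word in keywords:
        hits |= keyword_index.get(word, set())
    # Sort key: False (date present in page) orders before True, then by id.
    return sorted(hits, key=lambda d: (query_date not in doc_dates.get(d, set()), d))
```

For example, with pages 1 and 3 mentioning the queried date and page 2 not, a query like "floods report 2010-08-01" would list pages 1 and 3 ahead of page 2.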
- Database:
- Maintaining a list or database for storing date-specific information (the dates and their offsets).
- Data about the crawled web pages is stored in an index database for use in later queries. The purpose of the index is to allow information to be found as quickly as possible.
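One possible shape for this index database is sketched below using SQLite: one table maps keywords to the pages they occur on, and another stores each normalized date together with the character offset at which it appears in the page. The table and column names are illustrative only; the project itself may use SQL Server or MySQL with a different schema.

```python
import sqlite3

def create_index_db(path=":memory:"):
    """Create the (illustrative) index schema and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE pages (
            id INTEGER PRIMARY KEY,
            url TEXT UNIQUE
        );
        CREATE TABLE keywords (
            word TEXT,
            page_id INTEGER REFERENCES pages(id)
        );
        CREATE TABLE page_dates (
            page_id INTEGER REFERENCES pages(id),
            iso_date TEXT,     -- ISO 8601, e.g. '2021-03-05'
            offset INTEGER     -- character offset within the page text
        );
        CREATE INDEX idx_keywords ON keywords(word);
        CREATE INDEX idx_dates ON page_dates(iso_date);
    """)
    return conn
```

The indexes on `word` and `iso_date` are what make keyword lookups and date-priority queries fast, which is the stated purpose of the index database.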
The following tools can be used for developing the above project:
1. Microsoft .NET with SQL Server
2. Java with SQL Server/MySQL