Computational Web Intelligence: Intelligent Technology for Web Applications

Chapter 20: Content and Link Structure Analysis for Searching the Web

Kemal Efe, Vijay Raghavan, and Arun Lakhotia
Center for Advanced Computer Studies
University of Louisiana, Lafayette LA 70504
E-mail: {kefe@cacs.louisiana.edu, raghavan@cacs.louisiana.edu, alakhotia@cacs.louisiana.edu }

Overview

Finding relevant pages in response to a user query is a challenging task. Automated search engines that rely on keyword matching usually return too many low-quality matches. Link analysis methods can substantially improve search quality when they are combined with content analysis. This chapter surveys the mainstream work in this area.

20.1 Introduction

Automated search engines continuously discover, index, and store information about web pages. When a user issues a query, this repository is searched to find a result set of the most relevant pages. An ideal search scheme must satisfy two basic requirements: high recall and high precision. Recall measures the ability of an algorithm to find as many relevant pages as possible. Precision measures the ability of an algorithm to reject as many nonrelevant pages as possible. An ideal search algorithm should find all of the relevant pages, rank them by relevance to the user query, and present a rank-ordered result to the user.
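
The two measures can be made concrete with a small sketch. It assumes the standard set-based definitions from information retrieval, which the text above states only informally: precision is the fraction of returned pages that are relevant, and recall is the fraction of relevant pages that are returned. The function name and the page identifiers are hypothetical and serve only as illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: page ids returned by the search engine (hypothetical data)
    relevant:  page ids judged relevant to the query (hypothetical data)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Pages that are both returned and relevant.
    hits = retrieved_set & relevant_set
    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Four pages are returned; three pages are actually relevant; two overlap.
p, r = precision_recall(["p1", "p2", "p3", "p4"], ["p1", "p3", "p5"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

In this example half of the returned pages are nonrelevant, which lowers precision, and one relevant page is missed, which lowers recall. Returning every page in the repository would raise recall to 1 while driving precision toward 0, which is why both requirements have to be satisfied together.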

The earlier generations of search engines relied solely on keyword matching to perform the search. Unfortunately, this approach didn't work very well. Too many nonrelevant pages were returned along with relevant ones, and their rankings rarely agreed with users' interests. Since user queries are short, usually consisting of 2-3 words [Jansen et al. (1998)], the problems associated with synonymy and polysemy make it particularly difficult to evaluate...
