Google Hacking for Penetration Testers

In a relatively short time, Google has become one of the largest collections of information in the world, and certainly one of the largest freely available on the Internet. Setting aside the corporate anomaly and considering its founders and go-to-market strategy, it is nothing short of amazing that this Internet search powerhouse has become the de facto standard for finding information online. That said, the information Google has collected is now more sought after than its proprietary Web-crawling algorithms, massive storage techniques, or the information retrieval system that seems to return search results in a fraction of a second.
Like nearly all other high-technology industries, the niche information security industry continues to adopt advanced algorithms that yield more accurate information more quickly. Expert systems, artificial intelligence, dynamic database-driven applications, and profiling are four of the overarching initiatives currently driving security applications to the next level of automated computation.
Numerous mechanisms exist for collecting information from Google's online index of Web sites. Throughout this chapter, we discuss multiple methods for retrieving information from Google's database, including an overview of Google's API and manual Web page scraping. Manual Web page scraping is the technique of extracting the desired information from the Web page that is returned after a query is sent. These page-scraping techniques are quickly gaining popularity and are currently used in a number of security, information-gathering, and other gimmick search engines. Although the underlying algorithm is nearly identical, the particular implementations of the search algorithm are quite...
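To make the idea of page scraping concrete, the following is a minimal sketch in Python, not the chapter's own tooling. It assumes the results page has already been retrieved and simply pulls the link targets and link text out of the returned HTML. The sample markup, the LinkExtractor class, and the scrape_links function are hypothetical names chosen for illustration; they do not reflect Google's actual result format, which differs and changes over time.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from the <a> tags of an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # Only anchors that carry an href are treated as results.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Accumulate text only while inside a tracked anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links(html):
    """Return every (href, text) pair found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical returned page, hard-coded so the sketch runs offline.
SAMPLE_PAGE = """
<html><body>
  <div class="result"><a href="http://example.com/admin/">Admin <b>login</b></a></div>
  <div class="result"><a href="http://example.org/backup.sql">Database backup</a></div>
  <a name="top">no href here</a>
</body></html>
"""

if __name__ == "__main__":
    for href, text in scrape_links(SAMPLE_PAGE):
        print(href, "-", text)
```

The sketch works on a fixed string so that the extraction step can be studied separately from the retrieval step. A real scraper would also have to send the query, cope with markup that changes without notice, and respect the search provider's terms of use.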