Google Hacking for Penetration Testers

This book has already shown in detail what can be found in content alone. It is just as important, however, to understand the limits of search engine hacking. A search engine will certainly turn up targets of opportunity, but when you are running a concerted test against a specific target system, anything you find through a search engine is just the tip of the iceberg. To put this in graphical terms, Figure B.1 displays the subset of vulnerabilities that are exposed to Google.
First, Google does not crawl every site. That may be hard to believe, but remember that for every public Web application a sizable company runs (and has submitted to Google for crawling), many others are either not on the Web at all or are not public Web sites. These include strictly internal Web applications and extranets that are externally facing but meant for an extremely limited audience.
Even on the sites Google does crawl, not every page is crawled. Google can only follow linked pages; it does not guess at filenames or follow clues to other files. Not even all linked files are followed: those linked with HTML links certainly are, but JavaScript links might not be, and pages that can only be reached through a form submission won't be...
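To make the distinction between link types concrete, here is a minimal Python sketch of how a simple link-following crawler might see a page. It is an illustration under stated assumptions, not a description of how Google's crawler is actually implemented: the class name LinkOnlyCrawlerView, the function crawler_view, the sample page, and the www.example.com URLs are all hypothetical. The sketch queues plain HTML links and records JavaScript links and form targets as pages it would never request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkOnlyCrawlerView(HTMLParser):
    """Hypothetical model of a crawler that follows only plain HTML links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.followed = []  # URLs reachable through ordinary href links
        self.missed = []    # JavaScript links and form targets it never requests

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            href = attrs.get("href") or ""
            if href.lower().startswith("javascript:"):
                # The destination is buried in script, not in a URL.
                self.missed.append(("javascript", href))
            elif not href and "onclick" in attrs:
                # Navigation happens only when the script runs.
                self.missed.append(("javascript", attrs["onclick"]))
            elif href:
                self.followed.append(urljoin(self.base_url, href))
        elif tag == "form":
            # Pages behind a form submission are never reached by following links.
            action = attrs.get("action") or ""
            self.missed.append(("form", urljoin(self.base_url, action)))


def crawler_view(html, base_url):
    """Return (followed, missed) for one page of HTML."""
    parser = LinkOnlyCrawlerView(base_url)
    parser.feed(html)
    parser.close()
    return parser.followed, parser.missed


# Hypothetical page mixing the three kinds of navigation discussed above.
SAMPLE_PAGE = """
<html><body>
<a href="/products.html">Products</a>
<a href="javascript:openPage('admin/reports.html')">Reports</a>
<a onclick="location.href='/hidden.html'">Hidden</a>
<form action="/search" method="post"><input name="q"></form>
</body></html>
"""

followed, missed = crawler_view(SAMPLE_PAGE, "http://www.example.com/")
print("followed:", followed)
print("missed:", missed)
```

In this sample, only the products page ends up in the followed list. The reports page, the hidden page, and whatever sits behind the search form are all present on the site, but a crawler modeled this way never requests them, so nothing about them would show up in a search.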