AUTOMATION: Advanced Google Searches (a.k.a. Google Dorking) for OSINT
by darkhorse10123 - Friday January 24, 2025 at 05:54 AM
Google is a top-tier resource for open-source intelligence (OSINT). By leveraging “advanced operators” (commonly called “Google Dorking”), investigators can pinpoint specific file types, hidden directories, vulnerable pages, or other unique data. This thread will cover:
  1. Advanced Google Operators
  2. Google Dorking Use Cases
  3. A Complete Python Script that utilizes the Google Custom Search JSON API
Disclaimer: This information and script are intended for ethical, legal OSINT research. Be aware that directly scraping or automating queries on Google might violate certain Terms of Service if done improperly or excessively. Use the official JSON API to stay on the safe side of policy and law.

1. Advanced Google Operators
Google supports numerous advanced operators that refine searches:
  1. site:
    • Restrict results to a specific domain.
    • e.g.
      site:example.com login
  2. inurl:
    • Search for URLs containing a specific keyword.
    • e.g.
      inurl:admin
  3. intitle:
    • Search for web pages with a specific keyword in the title.
    • e.g.
      intitle:"index of" confidential
  4. filetype:
    • Search for specific file extensions.
    • e.g.
      filetype:pdf "internal use only"
  5. - (minus sign)
    • Exclude certain terms or domains from results.
    • e.g.
      camera filetype:pdf -site:gov
  6. OR
    • Combine multiple queries, retrieving results that match either.
    • e.g.
      (intitle:"index of" OR inurl:"/backup/")
These operators, combined, let you craft extremely focused searches.
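As a quick sketch, the operators above can also be composed programmatically. The build_dork helper below is hypothetical (it is not part of the script later in this thread); it simply assembles operator values into one query string:

```python
# Sketch: composing advanced operators into a single dork query.
# The helper name and parameters are illustrative assumptions.

def build_dork(site=None, inurl=None, intitle=None, filetype=None,
               exclude_sites=(), extra_terms=()):
    """Assemble a Google dork string from optional operator values."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    if filetype:
        parts.append(f"filetype:{filetype}")
    parts.extend(f"-site:{s}" for s in exclude_sites)  # negative operators
    parts.extend(extra_terms)                          # free-form keywords/phrases
    return " ".join(parts)

print(build_dork(site="example.com", filetype="pdf",
                 extra_terms=['"internal use only"']))
# site:example.com filetype:pdf "internal use only"
```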

2. Google Dorking Use Cases
  1. Finding Exposed Admin Panels
    • e.g.
      inurl:adminlogin.asp OR inurl:admin_login.php site:mytarget.com
  2. Locating Sensitive Documents
    • e.g.
      site:mytarget.com filetype:xls "budget"
  3. Searching for Publicly Accessible Cameras
    • e.g.
      intitle:"WebcamXP" "Select a camera" -site:youtube.com
  4. Uncovering Backup Directories
    • e.g.
      intitle:"index of" (backup OR admin_backup)
Note: Only run these queries against targets you have permission to assess, and respect the target’s Terms of Service. Unauthorized access can be illegal.
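The use cases above can be kept as reusable templates so you don’t retype them per target. The template names and the {target} placeholder below are illustrative assumptions:

```python
# Sketch: the four use cases above expressed as reusable query templates.

DORK_TEMPLATES = {
    "admin_panels": "inurl:adminlogin.asp OR inurl:admin_login.php site:{target}",
    "sensitive_docs": 'site:{target} filetype:xls "budget"',
    "open_cameras": 'intitle:"WebcamXP" "Select a camera" -site:youtube.com',
    "backup_dirs": 'intitle:"index of" (backup OR admin_backup)',
}

def render_dorks(target):
    """Fill in the target domain where a template expects one."""
    return {name: tpl.format(target=target) for name, tpl in DORK_TEMPLATES.items()}

queries = render_dorks("mytarget.com")
print(queries["sensitive_docs"])
# site:mytarget.com filetype:xls "budget"
```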

3. Complete Python Script: Google Custom Search JSON API
Because automated scraping of Google results is disallowed by Google’s ToS, we’ll use the official Custom Search JSON API. Follow these steps:
  1. Create a Custom Search Engine (CSE)
    • Go to the Custom Search Engine page.
    • Click “Add” to create a new search engine.
    • In the “Sites to search” field, enter any domain you want to limit results to, or allow searching the entire web with a placeholder such as *.com.
    • You’ll get a search engine ID (also called cx).
  2. Enable the API & Get an API Key
    • Go to the Google Cloud Console.
    • Create or select an existing project.
    • Enable the Custom Search API.
    • Generate an API key.
  3. Install Python Dependencies
    • We’ll use the requests library (the de facto standard for Python HTTP calls).

Below is the complete Python script which queries the Custom Search Engine with advanced operators:


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
AUTOMATION: Advanced Google Searches (a.k.a. Google Dorking) for OSINT
Author: YourName
Date: 2025-01-24

Description:
    This script demonstrates how to use the Google Custom Search JSON API
    to automate advanced Google queries (a.k.a. Google Dorking). It
    retrieves and displays summarized search results for further OSINT
    investigations.

Disclaimer:
    This script is for EDUCATIONAL and LEGAL OSINT usage ONLY. Use responsibly.
"""

import requests
import sys
import time

# ------------------------------------
# CONFIGURATION
# ------------------------------------
API_KEY = "YOUR_GOOGLE_API_KEY"    # Replace with your actual Google API key
SEARCH_ENGINE_ID = "YOUR_CSE_ID"    # Replace with your Custom Search Engine ID
RESULTS_PER_PAGE = 5                # API maximum is 10 per request
MAX_PAGES = 2                      # Increase carefully, respecting rate-limits
SLEEP_BETWEEN_QUERIES = 2          # Delay between queries (in seconds) to avoid overloading or hitting rate-limits

def build_query_string(advanced_operators):
    """
    Creates a single string from the list of advanced operators or keywords.
    Example advanced_operators list: ["site:example.com", 'filetype:pdf', '"internal use only"']
    """
    return " ".join(advanced_operators)

def google_search(query, start_index=1):
    """
    Performs a single Google Custom Search API call with the provided query string
    and start index. Returns JSON if successful, None otherwise.
    """
    url = "https://www.googleapis.com/customsearch/v1"
    params = {
        "key": API_KEY,
        "cx": SEARCH_ENGINE_ID,
        "q": query,
        "start": start_index,
        "num": RESULTS_PER_PAGE
    }
    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as http_err:
        print(f"[!] HTTP error: {http_err}")
    except Exception as e:
        print(f"[!] Error: {e}")
    return None

def parse_results(data):
    """
    Extracts and returns search results from the JSON data.
    Each result includes title, link, and a snippet if available.
    """
    if "items" not in data:
        return []
    items = data["items"]
    results_list = []
    for item in items:
        title = item.get("title", "No Title")
        link = item.get("link", "No Link")
        snippet = item.get("snippet", "No Snippet")
        results_list.append({
            "title": title,
            "link": link,
            "snippet": snippet
        })
    return results_list

def main():
    """
    Main function. Demonstrates how to gather advanced operators as input,
    build queries, and page through results.
    """
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <advanced operator 1> [<advanced operator 2> ...]")
        print(f"Example: {sys.argv[0]} site:example.com filetype:pdf \"internal use only\"")
        sys.exit(1)

    # Collect advanced operators from command-line arguments
    advanced_ops = sys.argv[1:]
    query = build_query_string(advanced_ops)
    print(f"\n[*] Using query: {query}")
    if API_KEY == "YOUR_GOOGLE_API_KEY" or SEARCH_ENGINE_ID == "YOUR_CSE_ID":
        print("[!] Please configure your API_KEY and SEARCH_ENGINE_ID in the script.")
        sys.exit(1)
    all_results = []
    # Google result indexing starts at 1 and increments by 'num' for pagination
    current_start = 1
    for page in range(1, MAX_PAGES + 1):
        print(f"\n[+] Fetching Page {page} ...")
        data = google_search(query, start_index=current_start)
        if not data:
            print("[!] No data returned or error encountered.")
            break
        results_on_page = parse_results(data)
        if not results_on_page:
            print("[!] No more items found.")
            break
        all_results.extend(results_on_page)
        current_start += RESULTS_PER_PAGE
        # If we have fewer results than we requested, we likely reached the end
        if len(results_on_page) < RESULTS_PER_PAGE:
            break
        time.sleep(SLEEP_BETWEEN_QUERIES)
    # Display results
    print("\n--- Search Results ---\n")
    if not all_results:
        print("[!] No results found.")
    else:
        for idx, item in enumerate(all_results, start=1):
            print(f"{idx}. Title : {item['title']}")
            print(f"  Link  : {item['link']}")
            print(f"  Snippet: {item['snippet']}\n")

if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print("\n[!] Interrupted by user.")
        sys.exit(1)
    except Exception as e:
        print(f"[!] An unexpected error occurred: {e}")
        sys.exit(1)




Script Highlights
  1. API Key & Search Engine ID
    • You must set API_KEY and SEARCH_ENGINE_ID to your valid credentials from Google’s Custom Search Engine.
  2. Advanced Operators
    • The script simply combines command-line arguments into a single query string. So if you run:
      python google_dorking.py site:example.com filetype:pdf "internal use only"
... it constructs the advanced query:
site:example.com filetype:pdf "internal use only"
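One subtlety worth noting: the shell consumes one layer of quotes before the script ever sees its arguments, which changes the query build_query_string (reproduced below from the script above) produces:

```python
# Sketch: how shell quoting affects the query the script builds.
# build_query_string is copied verbatim from the script above.

def build_query_string(advanced_operators):
    return " ".join(advanced_operators)

# The shell strips the double quotes, so "internal use only" arrives
# as a single unquoted argument and the phrase quotes are lost:
print(build_query_string(["site:example.com", "filetype:pdf", "internal use only"]))
# site:example.com filetype:pdf internal use only

# To keep the phrase quotes in the query itself, quote twice in the
# shell, e.g. '"internal use only"':
print(build_query_string(["site:example.com", "filetype:pdf", '"internal use only"']))
# site:example.com filetype:pdf "internal use only"
```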
  3. Pagination
    • MAX_PAGES sets how many pages you’ll fetch. Each page returns up to RESULTS_PER_PAGE items.
    • Keep in mind free-tier limits and the daily quota for Google API usage.
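To make the paging concrete, here is a small sketch of the start indexes the script requests, using the RESULTS_PER_PAGE and MAX_PAGES values from the configuration above. (The Custom Search JSON API also serves at most the first 100 results for any query, so very deep paging is not possible.)

```python
# Sketch: the 'start' parameter sent for each page of results.
# Values mirror the script's configuration above.

RESULTS_PER_PAGE = 5
MAX_PAGES = 2

starts = [1 + page * RESULTS_PER_PAGE for page in range(MAX_PAGES)]
print(starts)
# [1, 6]
```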
  4. Result Handling
    • Each result is displayed with Title, Link, and Snippet.
    • Customize the output or store results in a database if you need further analysis.
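If you do want persistence, a minimal sketch using SQLite is shown below. The input dicts match what parse_results() in the script produces; the table name and schema are illustrative assumptions:

```python
# Sketch: persisting parsed results to SQLite for later analysis.
# The dicts match the output of parse_results() in the script above.

import sqlite3

def save_results(results, db_path="osint_results.db"):
    """Insert result dicts into a 'results' table; returns rows inserted."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (title TEXT, link TEXT, snippet TEXT)"
    )
    cur = conn.executemany(
        "INSERT INTO results (title, link, snippet) VALUES (?, ?, ?)",
        [(r["title"], r["link"], r["snippet"]) for r in results],
    )
    conn.commit()
    inserted = cur.rowcount
    conn.close()
    return inserted

demo = [{"title": "Example", "link": "https://example.com", "snippet": "..."}]
print(save_results(demo, db_path=":memory:"))
# 1
```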
  5. Legal & Ethical Notes
    • While the script uses the official API, be mindful of your search queries and usage volume.
    • Some advanced queries can surface private or sensitive information. Only proceed if you have proper authorization or if the data is genuinely public.

Usage Tips & Best Practices
  1. Modularize Queries
    • Rather than a single query, you can loop over multiple domain names or multiple sets of advanced operators for a thorough search.
  2. Avoid Unnecessary Data
    • Use negative operators (e.g., -site:gov or -filetype:pdf) to exclude irrelevant or known noisy results.
  3. Obey Rate Limits & Quota
    • Google enforces daily quotas on API usage. If you run large-scale queries, you may need a paid plan.
  4. Respect Privacy & Ownership
    • Discovered files