01-24-2025, 05:54 AM
Google is a top-tier resource for open-source intelligence (OSINT). By leveraging “advanced operators” (commonly called “Google Dorking”), investigators can pinpoint specific file types, hidden directories, vulnerable pages, or other unique data. This thread will cover:
- Advanced Google Operators
- Google Dorking Use Cases
- A Complete Python Script that utilizes the Google Custom Search JSON API
1. Advanced Google Operators
Google supports numerous advanced operators that refine searches:
- site:
  - Restrict results to a specific domain.
  - e.g. site:example.com login
- inurl:
  - Search for URLs containing a specific keyword.
  - e.g. inurl:admin
- intitle:
  - Search for web pages with a specific keyword in the title.
  - e.g. intitle:"index of" confidential
- filetype:
  - Search for specific file extensions.
  - e.g. filetype:pdf "internal use only"
- - (minus sign)
  - Exclude certain terms or domains from results.
  - e.g. camera filetype:pdf -site:gov
- OR
  - Combine multiple queries, retrieving results that match either.
  - e.g. (intitle:"index of" OR inurl:"/backup/")
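To make the operator syntax concrete, here is a minimal sketch of how these operators could be composed programmatically. The helper names `build_dork` and `any_of` are hypothetical (they are not part of the script in this thread), and the quoting rule (wrap multi-word operator values in double quotes) is a simplifying assumption.

```python
def _quote(value):
    """Wrap a multi-word value in double quotes so it is treated as one phrase."""
    if " " in value and not value.startswith('"'):
        return f'"{value}"'
    return value


def build_dork(*terms, site=None, inurl=None, intitle=None, filetype=None, exclude=()):
    """Compose a single query string from operator values (hypothetical helper).

    Free-form terms are appended verbatim, so quote phrases yourself,
    e.g. '"internal use only"'.
    """
    parts = []
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{_quote(inurl)}")
    if intitle:
        parts.append(f"intitle:{_quote(intitle)}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    parts.extend(terms)
    # Each excluded item gets a leading minus sign, e.g. "-site:gov".
    parts.extend(f"-{item}" for item in exclude)
    return " ".join(parts)


def any_of(*alternatives):
    """Join alternatives with OR inside parentheses."""
    return "(" + " OR ".join(alternatives) + ")"


if __name__ == "__main__":
    print(build_dork('"internal use only"', site="example.com", filetype="pdf"))
    print(build_dork("confidential", intitle="index of"))
    print(build_dork("camera", filetype="pdf", exclude=["site:gov"]))
    print(any_of('intitle:"index of"', 'inurl:"/backup/"'))
```

Keeping the free-form terms verbatim means an OR group produced by `any_of` can be passed straight into `build_dork` without being re-quoted.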
2. Google Dorking Use Cases
- Finding Exposed Admin Panels
  - e.g. inurl:adminlogin.asp OR inurl:admin_login.php site:mytarget.com
- Locating Sensitive Documents
  - e.g. site:mytarget.com filetype:xls "budget"
- Searching for Publicly Accessible Cameras
  - e.g. intitle:"WebcamXP" "Select a camera" -site:youtube.com
- Uncovering Backup Directories
  - e.g. intitle:"index of" (backup OR admin_backup)
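The use cases above can be turned into reusable templates and expanded over several target domains. This is only a sketch: the `USE_CASE_TEMPLATES` dictionary and `expand_queries` function are hypothetical names, and scoping the backup-directory query to a domain with `site:` is an adaptation of the example above, not something the original query does. Only run such queries against domains you are authorized to investigate.

```python
# Query templates keyed by use case; "{domain}" is filled in per target.
USE_CASE_TEMPLATES = {
    "admin_panels": "inurl:adminlogin.asp OR inurl:admin_login.php site:{domain}",
    "sensitive_docs": 'site:{domain} filetype:xls "budget"',
    # Assumption: the original backup query is unscoped; site: is added here.
    "backup_dirs": 'site:{domain} intitle:"index of" (backup OR admin_backup)',
}


def expand_queries(domains, templates=USE_CASE_TEMPLATES):
    """Return (domain, use_case, query) tuples for every domain/template pair."""
    expanded = []
    for domain in domains:
        for use_case, template in templates.items():
            expanded.append((domain, use_case, template.format(domain=domain)))
    return expanded


if __name__ == "__main__":
    for domain, use_case, query in expand_queries(["example.com", "example.org"]):
        print(f"{domain:12} {use_case:15} {query}")
```

Each generated query string can then be passed to the search function one at a time, with a delay between calls.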
3. Complete Python Script: Google Custom Search JSON API
Because automated scraping of Google results is disallowed by Google’s ToS, we’ll use the official Custom Search JSON API. Follow these steps:
- Create a Custom Search Engine (CSE)
  - Go to the Custom Search Engine page.
  - Click “Add” to create a new search engine.
  - In the “Sites to search” field, you can enter any domain if you want to limit your results. Alternatively, you can allow searching the entire web by using *.com or other placeholders.
  - You’ll get a search engine ID (also called cx).
- Enable the API & Get an API Key
  - Go to the Google Cloud Console.
  - Create or select an existing project.
  - Enable the Custom Search API.
  - Generate an API key.
- Install Python Dependencies
  - We’ll use the requests library, a widely used third-party package for Python HTTP calls. It is not part of the standard library, so install it with: pip install requests
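Once you have both credentials, one option is to keep them out of the source file and read them from environment variables instead. The sketch below assumes two variable names, `GOOGLE_API_KEY` and `GOOGLE_CSE_ID`, which are chosen here for illustration and are not defined by Google or by the script in this thread.

```python
import os


def load_credentials(env=None):
    """Return (api_key, search_engine_id) read from environment variables.

    The variable names GOOGLE_API_KEY and GOOGLE_CSE_ID are assumptions
    made for this sketch. Pass a dict as `env` to test without touching
    the real environment.
    """
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    search_engine_id = env.get("GOOGLE_CSE_ID", "").strip()

    # Report every missing variable at once instead of failing on the first.
    missing = [
        name
        for name, value in (
            ("GOOGLE_API_KEY", api_key),
            ("GOOGLE_CSE_ID", search_engine_id),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return api_key, search_engine_id


if __name__ == "__main__":
    try:
        key, cx = load_credentials()
        print(f"Loaded API key ({len(key)} chars) and search engine ID {cx}")
    except RuntimeError as err:
        print(f"[!] {err}")
```

This avoids accidentally publishing a key when the script is shared on a forum or pushed to a repository.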
==================================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
AUTOMATION: Advanced Google Searches (a.k.a. Google Dorking) for OSINT
Author: YourName
Date: 2025-01-24
Description:
    This script demonstrates how to use the Google Custom Search JSON API
    to automate advanced Google queries (a.k.a. Google Dorking). It
    retrieves and displays summarized search results for further OSINT
    investigations.
Disclaimer:
    This script is for EDUCATIONAL and LEGAL OSINT usage ONLY. Use responsibly.
"""
import sys
import time

import requests

# ------------------------------------
# CONFIGURATION
# ------------------------------------
API_KEY = "YOUR_GOOGLE_API_KEY"    # Replace with your actual Google API key
SEARCH_ENGINE_ID = "YOUR_CSE_ID"   # Replace with your Custom Search Engine ID
RESULTS_PER_PAGE = 5               # The API accepts at most 10 results per request
MAX_PAGES = 2                      # Increase carefully, respecting rate limits
SLEEP_BETWEEN_QUERIES = 2          # Delay between queries (in seconds) to avoid hitting rate limits


def build_query_string(advanced_operators):
    """
    Creates a single string from the list of advanced operators or keywords.
    Example advanced_operators list: ["site:example.com", 'filetype:pdf', '"internal use only"']
    """
    return " ".join(advanced_operators)


def google_search(query, start_index=1):
    """
    Performs a single Google Custom Search API call with the provided query string
    and start index. Returns JSON if successful, None otherwise.
    """
    url = "https://www.googleapis.com/customsearch/v1"
    params = {
        "key": API_KEY,
        "cx": SEARCH_ENGINE_ID,
        "q": query,
        "start": start_index,
        "num": RESULTS_PER_PAGE,
    }
    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as http_err:
        print(f"[!] HTTP error: {http_err}")
    except Exception as e:
        print(f"[!] Error: {e}")
    return None


def parse_results(data):
    """
    Extracts and returns search results from the JSON data.
    Each result includes title, link, and a snippet if available.
    """
    if "items" not in data:
        return []
    results_list = []
    for item in data["items"]:
        results_list.append({
            "title": item.get("title", "No Title"),
            "link": item.get("link", "No Link"),
            "snippet": item.get("snippet", "No Snippet"),
        })
    return results_list


def main():
    """
    Main function. Demonstrates how to gather advanced operators as input,
    build queries, and page through results.
    """
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <advanced operator 1> [<advanced operator 2> ...]")
        print(f"Example: {sys.argv[0]} site:example.com filetype:pdf \"internal use only\"")
        sys.exit(1)

    # Collect advanced operators from command-line arguments
    advanced_ops = sys.argv[1:]
    query = build_query_string(advanced_ops)
    print(f"\n[*] Using query: {query}")

    if API_KEY == "YOUR_GOOGLE_API_KEY" or SEARCH_ENGINE_ID == "YOUR_CSE_ID":
        print("[!] Please configure your API_KEY and SEARCH_ENGINE_ID in the script.")
        sys.exit(1)

    all_results = []
    # Result indexes start at 1 and advance by 'num' for each page
    current_start = 1
    for page in range(1, MAX_PAGES + 1):
        print(f"\n[+] Fetching Page {page} ...")
        data = google_search(query, start_index=current_start)
        if not data:
            print("[!] No data returned or error encountered.")
            break
        results_on_page = parse_results(data)
        if not results_on_page:
            print("[!] No more items found.")
            break
        all_results.extend(results_on_page)
        current_start += RESULTS_PER_PAGE
        # If we have fewer results than we requested, we likely reached the end
        if len(results_on_page) < RESULTS_PER_PAGE:
            break
        time.sleep(SLEEP_BETWEEN_QUERIES)

    # Display results
    print("\n--- Search Results ---\n")
    if not all_results:
        print("[!] No results found.")
    else:
        for idx, item in enumerate(all_results, start=1):
            print(f"{idx}. Title  : {item['title']}")
            print(f"   Link   : {item['link']}")
            print(f"   Snippet: {item['snippet']}\n")


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print("\n[!] Interrupted by user.")
        sys.exit(1)
    except Exception as e:
        print(f"[!] An unexpected error occurred: {e}")
        sys.exit(1)
==================================================================
Script Highlights
- API-Key & Search Engine ID
  - You must set API_KEY and SEARCH_ENGINE_ID to your valid credentials from Google’s Custom Search Engine.
- Advanced Operators
  - The script simply combines command-line arguments into a single query string. So if you run:
    python google_dorking.py site:example.com filetype:pdf "internal use only"
    ... it constructs the advanced query:
    site:example.com filetype:pdf "internal use only"
- Pagination
  - MAX_PAGES sets how many pages you’ll fetch. Each page returns up to RESULTS_PER_PAGE items.
  - Keep in mind free-tier limits and the daily quota for Google API usage.
- Each result is displayed with Title, Link, and Snippet. Customize the output or store the results in a database if you need further analysis.
- While the script uses the official API, be mindful of your search queries and usage volume. Some advanced queries can surface private or sensitive information. Only proceed if you have proper authorization or if it’s open-source (public) data.
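To see how the pagination settings translate into API requests (and therefore quota usage), the following sketch computes the `start` values a run would use. The function name `plan_start_indexes` is hypothetical, and the two caps are assumptions based on my reading of the API documentation: at most 10 results per request and at most the first 100 results per query. Check the current documentation before relying on them.

```python
MAX_NUM_PER_REQUEST = 10   # assumption: the API caps `num` at 10
MAX_TOTAL_RESULTS = 100    # assumption: only the first 100 results are reachable


def plan_start_indexes(results_per_page, max_pages):
    """Return the list of `start` values for a paginated run.

    Each entry corresponds to one API request, so len(result) is the
    number of requests charged against the daily quota.
    """
    if not 1 <= results_per_page <= MAX_NUM_PER_REQUEST:
        raise ValueError(
            f"results_per_page must be between 1 and {MAX_NUM_PER_REQUEST}"
        )
    starts = []
    start = 1  # result indexes are 1-based
    for _ in range(max_pages):
        # Stop once the page would reach past the last reachable result.
        if start + results_per_page - 1 > MAX_TOTAL_RESULTS:
            break
        starts.append(start)
        start += results_per_page
    return starts


if __name__ == "__main__":
    plan = plan_start_indexes(results_per_page=5, max_pages=2)
    print(f"start values: {plan} -> {len(plan)} API requests")
```

With the script's defaults (5 results per page, 2 pages) this yields two requests; multiplying that by the number of queries you plan to run gives a rough estimate of quota consumption.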
Usage Tips & Best Practices
- Modularize Queries
  - Rather than a single query, you can loop over multiple domain names or multiple sets of advanced operators for a thorough search.
- Avoid Unnecessary Data
  - Use negative operators (e.g., -site:gov or -filetype:pdf) to exclude irrelevant or known noisy results.
- Obey Rate Limits & Quota
  - Google enforces daily quotas on API usage. If you run large-scale queries, you may need a paid plan.
- Respect Privacy & Ownership
- Discovered files