Using Web Archive for Hacking and Bug Hunting
by Hasagasuo - Monday February 17, 2025 at 08:36 PM
This guide will explain how to use the Wayback Machine (web.archive.org) to find sensitive files, old endpoints, and other potentially valuable information for security testing.

1. Extracting Archived URLs with Web Archive
The Wayback Machine stores historical snapshots of websites, including pages and files that might no longer be publicly accessible. By querying its index, you can retrieve archived URLs for a specific domain.
Querying Web Archive via API
Use the following cURL command to extract archived URLs for a target domain:
curl -G "https://web.archive.org/cdx/search/cdx" \
--data-urlencode "url=*.example.com/*" \
--data-urlencode "collapse=urlkey" \
--data-urlencode "output=text" \
--data-urlencode "fl=original" > out.txt
  • url=*.example.com/* fetches all subdomains and paths of the target domain.
  • collapse=urlkey ensures only unique URLs are retrieved.
  • output=text outputs raw URLs.
  • fl=original extracts only the original URLs.
  • The result is saved into out.txt.
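
The same query can also be narrowed before downloading. The sketch below is a hypothetical helper (cdx_url is my own name, not part of any tool) that builds the query URL and adds two CDX parameters that are not in the command above: filter=statuscode:200, which keeps only captures that returned HTTP 200, and an optional from year. Treat it as a starting point under those assumptions, not as the one correct way to query the index.

```shell
# Hypothetical helper: print a CDX query URL for a domain.
#   $1 = target domain
#   $2 = optional start year or timestamp (CDX "from" parameter)
cdx_url() {
  domain=$1
  from=${2:-}
  url="https://web.archive.org/cdx/search/cdx?url=*.${domain}/*"
  url="${url}&collapse=urlkey&output=text&fl=original"
  # Assumption: only captures that returned HTTP 200 are of interest.
  url="${url}&filter=statuscode:200"
  if [ -n "$from" ]; then
    url="${url}&from=${from}"
  fi
  printf '%s\n' "$url"
}

# Usage (performs a network request):
#   curl -s "$(cdx_url example.com 2020)" > out.txt
```

Dropping the status filter brings back redirects and error pages, which can still be worth a look because they show that a path existed at some point.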

Example:
example$ curl -G "https://web.archive.org/cdx/search/cdx" \
--data-urlencode "url=*.deepseek.com/*" \
--data-urlencode "collapse=urlkey" \
--data-urlencode "output=text" \
--data-urlencode "fl=original" > out.txt
  % Total    % Received % Xferd  Average Speed  Time    Time    Time  Current
                                Dload  Upload  Total  Spent    Left  Speed
100  99k    0  99k    0    0  37056      0 --:--:--  0:00:02 --:--:-- 37049

example$ cat out.txt
http://www.deepseek.com:80/
https://www.deepseek.com/%0A
https://www.deepseek.com/%0A%0A
https://www.deepseek.com/&quot
https://www.deepseek.com/?gad_source=1
https://www.deepseek.com/?ref=ducttapemarketing
https://www.deepseek.com/?ref=localhost
https://www.deepseek.com/?ref=prompt.cn
https://www.deepseek.com/?ref=testingcatalog.com
https://www.deepseek.com/?utm_source=www.theautomated.co&utm_medium=referral&utm_campaign=china-s-new-ai-challenger-deepseek-r1
https://www.deepseek.com/?utm_source=www.aidrop.news&utm_medium=referral&utm_campaign=figure-bmw-a-frota-de-robos-autonomos
https://www.deepseek.com/?utm_medium=referral&utm_campaign=topic-10-inside-deepseek-models&utm_source=www.turingpost.com
https://www.deepseek.com/?utm_source=tap4-ai&utm_medium=referral
https://www.deepseek.com/?utm_source=chatgpt.com
https://www.deepseek.com/?utm_source=www.turingpost.com
https://www.deepseek.com/%5Cn
https://www.deepseek.com/_next/image?url=https%3A%2F%2Fcdn.deepseek.com%2Flogo.png&w=828&q=75%201x,%20/_next/image?url=https%3A%2F%2Fcdn.deepseek.com%2Flogo.png&
https://www.deepseek.com/_next/image?url=https%3A%2F%2Fcdn.deepseek.com%2Flogo.png&w=828&q=75%201x,%20/_next/image?url=https%3A%2F%2Fcdn.deepseek.com%2Flogo.png&w=1920&q=75%202x
https://www.deepseek.com/_next/image?url=https%3A%2F%2Fcdn.deepseek.com%2Flogo.png&w=1920&q=75
https://www.deepseek.com/_next/image?url=https%3A%2F%2Fcdn.deepseek.com%2Flogo.png&w=828&q=75
https://www.deepseek.com/_next/image?url=https%3A%2F%2Fcdn.deepseek.com%2Flogo_2.png%3Fv%3D1&w=1080&q=75
https://www.deepseek.com/_next/image?url=https%3A%2F%2Fcdn.deepseek.com%2Flogo_2.png%3Fv%3D1&w=640&q=75
https://www.deepseek.com/_next/image%3Furl%3Dhttps://cdn.deepseek.com/logo.png%26w%3D1920%26q%3D75
<SNIP>

Once the URLs are retrieved, you can process them further to find potentially interesting files.
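
As the sample output shows, many entries are the same page with different tracking parameters (utm_source, ref, and so on). A minimal sketch for collapsing those, assuming you only care about distinct paths and not about the query strings themselves (strip_queries is a hypothetical name):

```shell
# Hypothetical helper: read URLs on stdin, drop everything from the first
# "?" or "#" onward, and print each remaining URL once.
strip_queries() {
  sed -E 's/[?#].*$//' | LC_ALL=C sort -u
}

# Usage:
#   strip_queries < out.txt > paths.txt
```

Keep the original out.txt as well, since parameter names are sometimes interesting in their own right.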



2. Searching for Sensitive Files
After extracting the URLs, use the following command to keep only those that point to file types likely to contain sensitive information:
grep -Ei '\.(xls|xml|xlsx|json|pdf|sql|doc|docx|pptx|txt|zip|tar\.gz|tgz|bak|7z|rar|log|cache|secret|db|backup|yml|yaml|md|md5|exe|dll|bin|ini|bat|sh|tar|deb|rpm|iso|img|apk|msi|dmg|tmp|crt|pem|key|pub|asc)([?#]|$)' out.txt
The escaped dot and the trailing ([?#]|$) make an extension match only at the end of the URL or right before a query string, so a path such as /shop does not match sh. The list covers common sensitive file types such as databases, configuration files, scripts, certificates, documents, backups, and logs.
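
Before reviewing matches by hand, it can help to see which file types actually occur in the list. The following is a rough sketch (count_extensions is a hypothetical name) that strips the host and query string from each URL, takes the extension at the end of the path, and prints a count per extension, highest first:

```shell
# Hypothetical helper: read URLs on stdin, print "<count> <.ext>" lines.
count_extensions() {
  # 1) remove scheme and host, 2) remove query string / fragment
  sed -E 's#^[A-Za-z]+://[^/]*##; s/[?#].*$//' |
    # keep only a trailing ".ext" (URLs without one are skipped)
    grep -Eo '\.[A-Za-z0-9]+$' |
    tr '[:upper:]' '[:lower:]' |
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn |
    awk '{ print $1, $2 }'
}

# Usage:
#   count_extensions < out.txt | head -n 20
```

Only the last extension is counted, so a .tar.gz archive shows up under .gz.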

3. Gathering Additional Intelligence
To enhance reconnaissance, you can use additional sources such as VirusTotal and AlienVault OTX to gather information about a domain.
VirusTotal API Lookup
VirusTotal's domain report lists URLs seen on a domain, known subdomains, DNS resolutions, and related malware samples:
curl -s "https://www.virustotal.com/vtapi/v2/domain/report?apikey=YOUR_API_KEY&domain=example.com"
Replace example.com with your target domain and YOUR_API_KEY with an API key from VirusTotal (free tier available).
AlienVault OTX Intelligence
AlienVault OTX provides intelligence on domains, including linked URLs, indicators of compromise (IoCs), and threat reports:
curl -s "https://otx.alienvault.com/api/v1/indicators/hostname/example.com/url_list?limit=500&page=1"
Replace example.com with your target domain. This retrieves up to 500 URLs per page associated with the hostname; increase page to fetch further results.
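
The OTX response is JSON, so the URLs have to be pulled out before they can be merged with out.txt. A minimal sketch, assuming jq is installed and that the response carries the URLs in a url_list array of objects with a url field (check a real response first, as the format may change; otx_urls is a hypothetical name):

```shell
# Hypothetical helper: read an OTX url_list JSON response on stdin and
# print one URL per line. Assumes the shape {"url_list":[{"url":"..."}]}.
otx_urls() {
  jq -r '.url_list[]?.url // empty'
}

# Usage (performs a network request):
#   curl -s "https://otx.alienvault.com/api/v1/indicators/hostname/example.com/url_list?limit=500&page=1" \
#     | otx_urls >> out.txt
#   sort -u out.txt -o out.txt
```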

4. Applying These Techniques
Use Cases in Bug Hunting and Security Research
  • Finding forgotten endpoints that companies have removed from their sites but that still leave traces in the Web Archive.
  • Extracting old exposed files such as backups, config files, logs, and documentation.
  • Checking for sensitive information leaks, including publicly visible API keys, credentials, or database connection strings in old site versions.
  • Enumerating subdomains and endpoints to expand the attack surface.