Google Hacking Database (GHDB): What It Is and How to Use It Defensively
Table of contents
- What is the Google Hacking Database?
- Who is Johnny Long and how did the GHDB start?
- Where is the GHDB hosted today?
- The 14 GHDB categories explained
- How attackers use the GHDB
- How defenders should use the GHDB
- Building a defensive GHDB workflow
- GHDB and ethical considerations
- Limitations of the GHDB in 2026
- FAQs
- Wrapping up
If you’ve ever stumbled across a Google dork in a write-up and wondered where the author found it, the answer is almost always the same: the Google Hacking Database.
The GHDB is the canonical, public library of Google search queries that surface exposed information online. It’s been running since 2004, currently hosts thousands of catalogued dorks across 14 categories, and underpins more reconnaissance work, by attackers and defenders alike, than any other single resource on the internet.
This article is for the question that comes after that: what is the GHDB, how is it structured, and how should your security team actually use it?
What is the Google Hacking Database?
The Google Hacking Database (GHDB) is a public, searchable repository of Google dorks: advanced search queries that use Google’s operators to find sensitive information indexed by the search engine.
Each entry in the database includes the search string itself, the category of information it surfaces (think login portals, exposed databases, sensitive files), a short description, and the date it was submitted. Together, the entries form a working library of the documented ways a Google search has been used to find something it probably shouldn’t.
The GHDB isn’t a hacking tool in the strict sense. It runs no code. It exploits nothing. It’s a catalogue, but it’s a catalogue of human error at internet scale, which is why it remains one of the most powerful resources in the OSINT toolkit.
Who is Johnny Long and how did the GHDB start?
The GHDB was created in 2004 by security researcher Johnny Long, an early authority on what he termed “Google hacking”: using search engines as reconnaissance tools rather than as portals to web content.
Long popularized the concept in his book Google Hacking for Penetration Testers, which became the foundational text on the topic. Alongside the book, he built and maintained a public list of submitted dorks, classifying them by what they exposed and how they could be used. The community contributed entries, Long curated them, and the GHDB grew from a personal project into a recognised reference.
In 2010, Long handed over maintenance of the database to Offensive Security, the same team behind Kali Linux and the OSCP certification. The GHDB has been hosted on their Exploit Database (Exploit-DB) ever since, where it continues to accept community submissions and curate them under the original category structure.
The handover is part of what gives the GHDB its longevity. It’s been continuously maintained, professionally moderated, and integrated into mainstream pentesting tooling for more than two decades.
Where is the GHDB hosted today?
The current home of the GHDB is exploit-db.com/google-hacking-database, operated by Offensive Security. The interface is straightforward: a filterable list of entries, searchable by keyword, category, author, and date.
Most professional reconnaissance tools also integrate the GHDB directly. The Metasploit Framework includes a search module that pulls queries from the database. Recon-ng can fetch GHDB entries through its module system. OSINT-focused distributions like Kali Linux and BlackArch ship with GHDB-aware tooling preinstalled.
You can also browse the GHDB without an Exploit-DB account, which is part of why it’s such a low-friction resource for both attackers and defenders. Anyone with a browser can pull up the full catalogue in seconds.
The 14 GHDB categories explained
The GHDB’s structural backbone is its category system. Every submitted dork is filed under one of 14 categories, each one corresponding to a class of exposure. Understanding the categories is the fastest way to use the database productively: instead of scrolling through thousands of entries, you go straight to the type of risk that applies to your environment.
| GHDB Category | What it surfaces | Defensive implication |
|---|---|---|
| Footholds | Initial entry points — exposed shells, command injection points, accessible admin scripts. | Audit web-facing applications for forgotten administrative endpoints. |
| Files Containing Usernames | Log files, dumps, and configuration files that include valid usernames. | Treat usernames as semi-sensitive — they fuel password spraying. |
| Sensitive Directories | Open directory listings (Index of /) revealing internal file structures. | Disable directory listing at the web server level. Don’t rely on obscurity. |
| Web Server Detection | Default pages and banners that reveal server software and version. | Strip server banners. Replace default landing pages. |
| Vulnerable Files | Files associated with known CVEs — outdated CMS plugins, exploitable scripts. | Patch hygiene and version management at the application layer. |
| Vulnerable Servers | Server-level vulnerabilities discoverable through search. | Asset inventory plus continuous vulnerability scanning. |
| Error Messages | Verbose application errors that leak stack traces, paths, and database structure. | Production should never display debug errors. Catch and log silently. |
| Files Containing Juicy Info | Anything sensitive that doesn’t fit elsewhere: internal documents, draft policies, HR data. | Review what’s in your public web root. Apply noindex headers where appropriate. |
| Files Containing Passwords | Files literally containing plaintext or hashed passwords. | Secret management, period. No credentials in source. |
| Sensitive Online Shopping Info | Customer data, order records, payment information from misconfigured e-commerce systems. | PCI scope review. Storage segmentation. |
| Network or Vulnerability Data | Firewall logs, IDS output, network configuration files. | Internal monitoring data should never reach a public-facing directory. |
| Pages Containing Login Portals | Admin panels, internal dashboards, VPN portals exposed to the open web. | MFA on every login surface. Restrict admin paths by IP. |
| Various Online Devices | IoT devices, cameras, printers, ICS interfaces exposed to the internet. | Network segmentation. Devices should not be publicly reachable by default. |
| Advisories and Vulnerabilities | Dorks tied to specific published advisories — quick way to find unpatched instances. | Treat new advisories as immediate hunting opportunities, not paperwork. |
The category labels are decades old, but the exposure types they describe are evergreen. A web server still leaks its banner the same way it did in 2008. An Index of / still reveals the same kinds of files. The technologies change; the configuration mistakes don’t.
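Several of these categories correspond to signals that can be checked directly in an HTTP response you already hold. The sketch below is illustrative only and is not part of the GHDB: the function name `exposure_signals`, the signal labels, and the regular expressions are assumptions chosen for the example, and real checks would need tuning for your own stack.

```python
import re

def exposure_signals(headers: dict, body: str) -> list:
    """Return heuristic exposure labels for one HTTP response.

    The labels and patterns are hypothetical examples, loosely mapped to
    three GHDB categories from the table above.
    """
    signals = []
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Sensitive Directories: auto-generated listings are usually titled "Index of /...".
    if re.search(r"<title>\s*Index of /", body, re.IGNORECASE):
        signals.append("directory-listing")

    # Web Server Detection: a Server header that includes a version number.
    if re.search(r"/\d+(\.\d+)+", lowered.get("server", "")):
        signals.append("server-version-banner")

    # Error Messages: a few common stack-trace markers left visible in production.
    if re.search(r"Traceback \(most recent call last\)|Fatal error:|Stack trace:", body):
        signals.append("verbose-error")

    return signals
```

Run against responses from your own assets, a non-empty result is a prompt to look closer, not proof of an exposure.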
How attackers use the GHDB
For attackers, the GHDB is a starting point, not a finished playbook. Threat actors typically use it in one of three modes.
Targeted reconnaissance. Given a specific target organization, an attacker will pull dorks from relevant categories (login portals, exposed devices, vulnerable files) and append site:targetdomain.com to each one. A single afternoon of this work can yield a detailed picture of an organization’s misconfigurations, forgotten subdomains, and exposed administrative interfaces.
Opportunistic sweeps. Instead of targeting a specific organization, attackers run broad GHDB queries across the open web. A query such as inurl:phpmyadmin/index.php surfaces exposed phpMyAdmin instances that Google has indexed, regardless of who owns them. The output becomes a target list, sorted by which instances look easiest to exploit.
Advisory chasing. When a new CVE is published, attackers immediately look for a GHDB dork that surfaces vulnerable instances of the affected software. The race is between defenders patching and attackers finding unpatched instances through search. Search usually wins, because it scales faster than human patching schedules.
Our breakdown of Google dorking in real attacks walks through six documented use cases that follow exactly these patterns — including a ransomware incident that began with a single GHDB-style search.
How defenders should use the GHDB
The GHDB is just as useful, arguably even more useful, on the defender’s side. The asymmetry that makes it dangerous in attacker hands becomes a strength when you turn it inward.
Defensive use of the GHDB falls into three buckets:
Continuous self-audit. Take the dorks in the categories most relevant to your environment (Pages Containing Login Portals, Vulnerable Files, Files Containing Passwords) and append site:yourdomain.com to each one. Run them weekly. Anything Google has indexed about you, an attacker can find — so finding it first is the only winning move.
Advisory response. When a CVE drops for software your organization uses, check whether a corresponding GHDB entry exists or has been recently submitted. If one does, assume opportunistic scanning has already started. Patch on attacker time, not vendor time.
Red team support. Internal red teams and authorized pentesters can use the GHDB to scope realistic recon exercises. The goal isn’t to find something exotic; it’s to verify that nothing in the well-known catalogue applies to you.
For organizations without dedicated red teams, services like CybelAngel’s Attack Surface Management automate this kind of continuous self-audit, running structured queries across the open web and surfacing exposures tied to your domain before attackers spot them.
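The self-audit step of pinning each dork to your own domain can be sketched in a few lines. This is a minimal example under stated assumptions: `pin_dork`, `build_audit_queries`, and the `SHORTLIST` entries are hypothetical names and placeholder queries, not GHDB entries, and the code only builds query strings without sending anything to a search engine.

```python
def pin_dork(dork: str, domain: str) -> str:
    """Scope a dork to one domain by appending a site: operator."""
    dork = dork.strip()
    # Leave the query alone if it already carries a site: restriction.
    if any(token.lower().startswith("site:") for token in dork.split()):
        return dork
    return f"{dork} site:{domain}"

# Placeholder queries for illustration; a real shortlist would be drawn
# from the GHDB categories that matter to your environment.
SHORTLIST = {
    "Pages Containing Login Portals": ['inurl:admin intitle:"login"'],
    "Sensitive Directories": ['intitle:"index of" "backup"'],
    "Files Containing Passwords": ['filetype:env "DB_PASSWORD"'],
}

def build_audit_queries(domain: str) -> list:
    """Return (category, domain-pinned query) pairs for a scheduled audit."""
    return [
        (category, pin_dork(dork, domain))
        for category, dorks in SHORTLIST.items()
        for dork in dorks
    ]
```

Keeping the category next to each query means a later finding can be routed to the team responsible for that class of exposure.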
Building a defensive GHDB workflow
A one-off GHDB sweep finds yesterday’s exposures. The leaks that hurt are the ones that get indexed after your last audit. To turn the GHDB into an ongoing control:
- Build your category shortlist. Of the 14 categories, three to five will dominate your risk profile. For most organizations, that’s Pages Containing Login Portals, Files Containing Passwords, Sensitive Directories, Web Server Detection, and Various Online Devices. Map your own list once, then stop questioning it.
- Maintain a domain-pinned dorks library. Take each relevant GHDB query, hard-code your domain into it (site:yourcompany.com), and store the result as a runbook. The library lives in version control, gets reviewed quarterly, and runs on a schedule.
- Automate the execution. Tools like Recon-ng, theHarvester, and the Metasploit search modules can run GHDB queries programmatically. Output flows into your ticketing or SIEM system, not a researcher’s notebook.
- Trigger on advisories. When a new CVE drops for software you run, your workflow should check the GHDB for a corresponding entry within hours, not weeks.
- Track time-to-remediation. Every finding gets a clock. The lag between detection and rotation/patching is the single best indicator of how well the workflow is actually working.
- Loop in development teams. Most GHDB findings point to a process gap: a noindex header that wasn’t set, a .bak file that shouldn’t be in the web root, a default admin path that was never renamed. Fix the process, not just the artefact.
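The "every finding gets a clock" step in the list above could be tracked with a small record type. The following is a sketch, not a prescribed schema: the `Finding` fields and the helpers `median_ttr_hours` and `overdue` are hypothetical names, and in practice the timestamps would come from your ticketing or SIEM system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

@dataclass
class Finding:
    query: str                              # the domain-pinned dork that matched
    url: str                                # the exposed URL it surfaced
    detected_at: datetime
    remediated_at: Optional[datetime] = None  # None while the finding is open

def median_ttr_hours(findings: list) -> Optional[float]:
    """Median hours from detection to remediation, over closed findings only."""
    hours = [
        (f.remediated_at - f.detected_at).total_seconds() / 3600
        for f in findings
        if f.remediated_at is not None
    ]
    return median(hours) if hours else None

def overdue(findings: list, now: datetime, max_age: timedelta) -> list:
    """Open findings that have been waiting longer than max_age."""
    return [
        f for f in findings
        if f.remediated_at is None and now - f.detected_at > max_age
    ]
```

The median is used rather than the mean so that one long-running finding does not hide an otherwise healthy remediation rate; the overdue list surfaces those outliers separately.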
GHDB and ethical considerations
Browsing the GHDB is legal. Submitting entries to it is legal. Running queries from it against your own domain is legal.
Running them against a domain you don’t own, and then acting on the results, is where things change. Under the Computer Fraud and Abuse Act in the US, the Computer Misuse Act in the UK, and equivalent legislation in most jurisdictions, accessing systems or data you don’t have authorization to touch is a criminal offense, regardless of how easy it was to find them through Google.
Ethical use of the GHDB requires:
- Authorization. Only run queries against your own assets, or assets explicitly scoped in an engagement contract or bug bounty program.
- Restraint. Reading a publicly indexed file is generally legal. Logging in with credentials you found in it is not.
- Responsible disclosure. If you find someone else’s exposure incidentally, report it through their security disclosure channel. Don’t exploit it. Don’t share it publicly until they’ve had time to remediate.
The GHDB itself is a defensive resource that happens to be useful offensively. How you use it determines which side you’re on.
Limitations of the GHDB in 2026
The GHDB is a venerable resource, but it’s not a complete one. Several limitations are worth knowing:
Submission lag. The database depends on community submissions. Novel exposure patterns may not appear for weeks or months after they start being exploited in the wild.
Google’s changes. Google has deprecated several operators over the years: cache: was retired in 2024, link: has been unreliable since 2017, and + no longer forces exact matches. Older GHDB entries that rely on deprecated operators no longer work as written. Always test a dork before relying on it.
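One way to catch stale entries before they reach a runbook is to screen each dork for the deprecated operators named above. This is a rough heuristic sketch: the function name `deprecated_operators` and the patterns are assumptions, the list covers only the three operators mentioned here, and a match inside a quoted phrase would be flagged as well.

```python
import re

# Patterns for the three deprecated operators discussed in the text.
# (?<!\w) stops "link:" from matching inside a longer word.
DEPRECATED = {
    "cache:": re.compile(r"(?<!\w)cache:", re.IGNORECASE),
    "link:": re.compile(r"(?<!\w)link:", re.IGNORECASE),
    "+term": re.compile(r"(?:^|\s)\+\w"),
}

def deprecated_operators(dork: str) -> list:
    """Return the names of deprecated operators that appear in a dork."""
    return [name for name, pattern in DEPRECATED.items() if pattern.search(dork)]
```

A flagged dork is a candidate for manual review or rewriting, since the check says nothing about whether the rest of the query still returns useful results.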
Coverage gaps. The GHDB focuses on Google. It does not cover Bing, DuckDuckGo, Yandex, or specialist engines like Shodan and Censys — all of which surface different exposure classes. A defender relying on the GHDB alone has visibility into one search index.
Signal-to-noise ratio. Some categories contain thousands of entries. Without filtering by relevance to your environment, you can spend more time triaging the database than triaging your actual exposures.
This is why most mature security programs treat the GHDB as one input to a broader exposure-monitoring strategy — not as the strategy itself.
FAQs
Wrapping up
The Google Hacking Database isn’t glamorous, but it’s remained relevant for over two decades for a reason. The exposure patterns it catalogues (misconfigured directories, leaked credentials, forgotten admin portals) are exactly the kind of low-effort findings that fuel most opportunistic attacks.
For attackers, it’s a permanent starting line. For defenders, it’s a permanent self-audit. The asymmetry between those two uses is one of the more interesting features of public security tooling: the same resource that helps the wrong people find your exposures can help you find them first.
Manage your external-facing assets before they end up in someone else’s reconnaissance workflow. CybelAngel’s Attack Surface Management and Data Breach Prevention services continuously monitor the open, deep, and dark web for exposures tied to your organization, turning structured queries like those in the GHDB into actionable intelligence delivered to your security team.
