Google Hacking Database (GHDB): What It Is and How to Use It Defensively
Table of contents
- What is the Google Hacking Database?
- Who is Johnny Long and how did the GHDB start?
- Where is the GHDB hosted today?
- The 14 GHDB categories explained
- How attackers use the GHDB
- How defenders should use the GHDB
- Building a defensive GHDB workflow
- GHDB and ethical considerations
- Limitations of the GHDB in 2026
- FAQs
- Wrapping up
If you've ever stumbled across a Google dork in a write-up and wondered where the author found it, the answer is almost always the same: the Google Hacking Database.
The GHDB is the canonical, public library of Google search queries that surface exposed information online. It's been running since 2004, currently hosts thousands of catalogued dorks across 14 categories, and underpins more reconnaissance work, by attackers and defenders alike, than any other single resource on the internet.
This article is for the question that comes after that: what is the GHDB, how is it structured, and how should your security team actually use it?
What is the Google Hacking Database?
The Google Hacking Database (GHDB) is a public, searchable repository of Google dorks: advanced search queries that use Google's operators to find sensitive information indexed by the search engine.
Each entry in the database includes the search string itself, the category of information it surfaces (think login portals, exposed databases, sensitive files), a short description, and the date it was submitted. Together, the entries form a working library of documented ways a Google search has been used to find something it probably shouldn't.
The GHDB isn't a hacking tool in the strict sense. It runs no code. It exploits nothing. It's a catalogue, but it's a catalogue of human error at internet scale, which is why it remains one of the most powerful resources in the OSINT toolkit.
Who is Johnny Long and how did the GHDB start?
The GHDB was created in 2004 by security researcher Johnny Long, an early authority on what he termed “Google hacking,” using search engines as reconnaissance tools rather than as portals to web content.
Long popularized the concept in his book Google Hacking for Penetration Testers, which became the foundational text on the topic. Alongside the book, he built and maintained a public list of submitted dorks, classifying them by what they exposed and how they could be used. The community contributed entries, Long curated them, and the GHDB grew from a personal project into a recognised reference.
In 2010, Long handed over maintenance of the database to Offensive Security, the same team behind Kali Linux and the OSCP certification. The GHDB has been hosted on their Exploit Database (Exploit-DB) ever since, where it continues to accept community submissions and curate them under the original category structure.
The handover is part of what gives the GHDB its longevity. It's been continuously maintained, professionally moderated, and integrated into mainstream pentesting tooling for more than two decades.
Where is the GHDB hosted today?
The current home of the GHDB is exploit-db.com/google-hacking-database, operated by Offensive Security. The interface is straightforward: a filterable list of entries, searchable by keyword, category, author, and date.
Most professional reconnaissance tools also integrate the GHDB directly. The Metasploit Framework includes a search module that pulls queries from the database. Recon-ng can fetch GHDB entries through its module system. OSINT-focused distributions like Kali Linux and BlackArch ship with GHDB-aware tooling preinstalled.
You can also browse the GHDB without an Exploit-DB account, which is part of why it's such a low-friction resource for both attackers and defenders. Anyone with a browser can pull up the full catalogue in seconds.
The 14 GHDB categories explained
The GHDB's structural backbone is its category system. Every submitted dork is filed under one of 14 categories, each corresponding to a class of exposure. Understanding the categories is the fastest way to use the database productively: instead of scrolling through thousands of entries, you go straight to the type of risk that applies to your environment.
| GHDB Category | What it surfaces | Defensive implication |
|---|---|---|
| Footholds | Initial entry points — exposed shells, command injection points, accessible admin scripts. | Audit web-facing applications for forgotten administrative endpoints. |
| Files Containing Usernames | Log files, dumps, and configuration files that include valid usernames. | Treat usernames as semi-sensitive — they fuel password spraying. |
| Sensitive Directories | Open directory listings (Index of /) revealing internal file structures. | Disable directory listing at the web server level. Don't rely on obscurity. |
| Web Server Detection | Default pages and banners that reveal server software and version. | Strip server banners. Replace default landing pages. |
| Vulnerable Files | Files associated with known CVEs — outdated CMS plugins, exploitable scripts. | Patch hygiene and version management at the application layer. |
| Vulnerable Servers | Server-level vulnerabilities discoverable through search. | Asset inventory plus continuous vulnerability scanning. |
| Error Messages | Verbose application errors that leak stack traces, paths, and database structure. | Production should never display debug errors. Catch and log silently. |
| Files Containing Juicy Info | Anything sensitive that doesn't fit elsewhere: internal documents, draft policies, HR data. | Review what's in your public web root. Apply noindex headers where appropriate. |
| Files Containing Passwords | Files literally containing plaintext or hashed passwords. | Secret management, period. No credentials in source. |
| Sensitive Online Shopping Info | Customer data, order records, payment information from misconfigured e-commerce systems. | PCI scope review. Storage segmentation. |
| Network or Vulnerability Data | Firewall logs, IDS output, network configuration files. | Internal monitoring data should never reach a public-facing directory. |
| Pages Containing Login Portals | Admin panels, internal dashboards, VPN portals exposed to the open web. | MFA on every login surface. Restrict admin paths by IP. |
| Various Online Devices | IoT devices, cameras, printers, ICS interfaces exposed to the internet. | Network segmentation. Devices should not be publicly reachable by default. |
| Advisories and Vulnerabilities | Dorks tied to specific published advisories — quick way to find unpatched instances. | Treat new advisories as immediate hunting opportunities, not paperwork. |
The category labels are about two decades old, but the exposure types they describe are evergreen. A web server still leaks its banner the same way it did in 2008. An Index of / listing still reveals the same kinds of files. The technologies change; the configuration mistakes don't.
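Because every entry carries exactly one category label, filtering by category is the quickest way to shrink the catalogue to what matters for you. The following is a minimal Python sketch. It assumes you have already copied entries into simple local records. The `Dork` type, its fields, the `filter_by_category` helper and the sample queries are hypothetical illustrations, not an Exploit-DB export format. Only the 14 category names come from the table above.

```python
from dataclasses import dataclass

# The 14 GHDB categories, as listed in the table above.
GHDB_CATEGORIES = (
    "Footholds",
    "Files Containing Usernames",
    "Sensitive Directories",
    "Web Server Detection",
    "Vulnerable Files",
    "Vulnerable Servers",
    "Error Messages",
    "Files Containing Juicy Info",
    "Files Containing Passwords",
    "Sensitive Online Shopping Info",
    "Network or Vulnerability Data",
    "Pages Containing Login Portals",
    "Various Online Devices",
    "Advisories and Vulnerabilities",
)


@dataclass(frozen=True)
class Dork:
    """Hypothetical local record for one GHDB entry."""
    query: str
    category: str


def filter_by_category(entries, shortlist):
    """Keep only entries whose category is on the shortlist."""
    unknown = set(shortlist) - set(GHDB_CATEGORIES)
    if unknown:
        # Fail loudly on a typo instead of silently matching nothing.
        raise ValueError(f"Unknown GHDB categories: {sorted(unknown)}")
    wanted = set(shortlist)
    return [entry for entry in entries if entry.category in wanted]


# Illustrative sample entries, not copied from the database.
entries = [
    Dork('intitle:"index of" backup', "Sensitive Directories"),
    Dork("inurl:phpmyadmin/index.php", "Pages Containing Login Portals"),
    Dork('"PHP Parse error" "on line"', "Error Messages"),
]
shortlist = ["Pages Containing Login Portals", "Sensitive Directories"]

for dork in filter_by_category(entries, shortlist):
    print(f"{dork.category}: {dork.query}")
```

Validating the shortlist against the fixed category list is a deliberate choice. A misspelled category name would otherwise return an empty result that looks like a clean audit.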
How attackers use the GHDB
For attackers, the GHDB is a starting point, not a finished playbook. Threat actors typically use it in one of three modes.
Targeted reconnaissance. Given a specific target organization, an attacker will pull dorks from relevant categories (login portals, exposed devices, vulnerable files) and append site:targetdomain.com to each one. A single afternoon of this work can map much of an organization's misconfigurations, forgotten subdomains, and exposed administrative interfaces.
Opportunistic sweeps. Instead of targeting a specific organization, attackers run broad GHDB queries across the open web. For example, inurl:phpmyadmin/index.php returns whatever phpMyAdmin pages Google has indexed, regardless of who owns them. The output becomes a target list, sorted by which hosts look easiest to exploit.
Advisory chasing. When a new CVE is published, attackers immediately look for a GHDB dork that surfaces vulnerable instances of the affected software. The race is between defenders patching and attackers finding unpatched instances through search. Search usually wins, because it scales faster than human patching schedules.
Our breakdown of Google dorking in real attacks walks through six documented use cases that follow exactly these patterns — including a ransomware incident that began with a single GHDB-style search.
How defenders should use the GHDB
The GHDB is just as useful, arguably more so, for defenders. The asymmetry that makes it dangerous in attacker hands becomes a strength when you turn it inward.
Defensive use of the GHDB falls into three buckets:
Continuous self-audit. Take the dorks in the categories most relevant to your environment (Pages Containing Login Portals, Vulnerable Files, Files Containing Passwords) and append site:yourdomain.com to each one. Run them weekly. Anything Google has indexed about you, an attacker can find — so finding it first is the only winning move.
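Pinning a dork to your own domain is a simple string transformation. Here is a sketch in Python. The helper name `pin_to_domain` is hypothetical. The helper prefixes Google's site: operator. It also strips any positive site: term the original dork carried, so a borrowed query cannot drift outside the domain you are auditing.

```python
import re


def pin_to_domain(dork: str, domain: str) -> str:
    """Scope a dork to one domain by prefixing Google's site: operator.

    Any standalone site: term already in the dork is removed first, so the
    query cannot point at a domain you are not auditing.
    """
    domain = domain.strip().lower()
    if not re.fullmatch(r"[a-z0-9.-]+", domain):
        raise ValueError(f"Unexpected domain format: {domain!r}")
    # Remove standalone site:... terms (excluded -site: terms are left alone).
    stripped = re.sub(r"(?<!\S)site:\S+", "", dork)
    # Re-join the remaining terms, collapsing leftover whitespace.
    return " ".join([f"site:{domain}", *stripped.split()])


dorks = [
    "inurl:phpmyadmin/index.php",
    'intitle:"index of" site:example.org',
]
for dork in dorks:
    print(pin_to_domain(dork, "yourdomain.com"))
```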
Advisory response. When a CVE drops for software your organization uses, check whether a corresponding GHDB entry exists or has been recently submitted. If one does, assume opportunistic scanning has already started. Patch on attacker time, not vendor time.
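Part of this check can be automated once you keep a local copy of entries. The sketch below makes several assumptions:

- It uses hypothetical records, each holding a query, a description and a publication date.
- The `entries_for_advisory` helper, its 30-day default window and the sample data are illustrative.
- It matches the affected product's name against those records.
- A recent match is treated as a signal that scanning may already be under way, not as proof.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Dork:
    """Hypothetical local record for one GHDB entry."""
    query: str
    description: str
    published: date


def entries_for_advisory(entries, product_terms, today, window_days=30):
    """Return (matches, recent) for an advisory's affected product.

    matches: entries whose query or description mentions any product term.
    recent:  the subset published within the last window_days days.
    """
    terms = [term.lower() for term in product_terms]
    matches = [
        entry for entry in entries
        if any(term in entry.query.lower() or term in entry.description.lower()
               for term in terms)
    ]
    cutoff = today - timedelta(days=window_days)
    recent = [entry for entry in matches if entry.published >= cutoff]
    return matches, recent


# Illustrative entries and dates, not real GHDB records.
entries = [
    Dork("inurl:phpmyadmin/index.php", "phpMyAdmin login pages", date(2026, 1, 10)),
    Dork('intitle:"index of" backup', "Open backup directories", date(2025, 6, 1)),
]
matches, recent = entries_for_advisory(entries, ["phpMyAdmin"], today=date(2026, 1, 20))
if recent:
    print("Recent GHDB coverage; assume scanning has started:", [e.query for e in recent])
```

Substring matching on product names is crude. It will miss entries that describe the product differently, so treat an empty result as inconclusive.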
Red team support. Internal red teams and authorized pentesters can use the GHDB to scope realistic recon exercises. The goal isn't to find something exotic; it's to verify that nothing in the well-known catalogue applies to you.
For organizations without dedicated red teams, services like CybelAngel's Attack Surface Management automate this kind of continuous self-audit, running structured queries across the open web and surfacing exposures tied to your domain before attackers spot them.
Building a defensive GHDB workflow
A one-off GHDB sweep finds yesterday's exposures. The leaks that hurt are the ones that get indexed after your last audit. To turn the GHDB into an ongoing control:
- Build your category shortlist. Of the 14 categories, three to five will dominate your risk profile. For most organizations, that‘s Pages Containing Login Portals, Files Containing Passwords, Sensitive Directories, Web Server Detection, and Various Online Devices. Map your own list once, then stop questioning it.
- Maintain a domain-pinned dorks library. Take each relevant GHDB query, hard-code your domain into it (site:yourcompany.com), and store the result as a runbook. The library lives in version control, gets reviewed quarterly, and runs on a schedule.
- Automate the execution. Tools like Recon-ng, theHarvester, and the Metasploit search modules can run GHDB queries programmatically. Output flows into your ticketing or SIEM system, not a researcher's notebook.
- Trigger on advisories. When a new CVE drops for software you run, your workflow should check the GHDB for a corresponding entry within hours, not weeks.
- Track time-to-remediation. Every finding gets a clock. The lag between detection and rotation/patching is the single best indicator of how well the workflow is actually working.
- Loop in development teams. Most GHDB findings point to a process gap: a noindex header that wasn't set, a .bak file that shouldn't be in the web root, a default admin path that was never renamed. Fix the process, not just the artefact.
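The time-to-remediation metric from the list above needs only two timestamps per finding. Below is a minimal sketch. It assumes each scheduled run records findings as hypothetical `Finding` objects. The field names, helper functions and sample URLs are illustrative, not part of any tool named above. It reports the median hours to fix closed findings and the age of the oldest open one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Finding:
    """Hypothetical record for one exposure surfaced by a scheduled dork run."""
    dork: str
    url: str
    detected_at: datetime
    remediated_at: Optional[datetime] = None

    def time_to_remediation(self) -> Optional[timedelta]:
        # None means the finding is still open and its clock is running.
        if self.remediated_at is None:
            return None
        return self.remediated_at - self.detected_at


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def remediation_report(findings, now):
    """Summarize closed findings by median time to fix, open ones by age."""
    closed = [hours(f.time_to_remediation()) for f in findings if f.remediated_at is not None]
    open_ages = [hours(now - f.detected_at) for f in findings if f.remediated_at is None]
    return {
        "closed": len(closed),
        "open": len(open_ages),
        "median_hours_to_fix": median(closed) if closed else None,
        "oldest_open_hours": max(open_ages) if open_ages else None,
    }


# Illustrative findings on the placeholder domain used in this article.
findings = [
    Finding("inurl:phpmyadmin/index.php", "https://yourdomain.com/phpmyadmin/",
            datetime(2026, 1, 5, 9), datetime(2026, 1, 5, 15)),
    Finding('intitle:"index of" backup', "https://yourdomain.com/backup/",
            datetime(2026, 1, 6, 9), datetime(2026, 1, 8, 9)),
    Finding('"PHP Parse error"', "https://yourdomain.com/app/",
            datetime(2026, 1, 9, 12)),
]
print(remediation_report(findings, now=datetime(2026, 1, 10, 12)))
```

Reporting open findings by age, alongside closed ones, keeps a long-ignored exposure from disappearing out of the median.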
GHDB and ethical considerations
Browsing the GHDB is legal. Submitting entries to it is legal. Running queries from it against your own domain is legal.
Running them against a domain you don't own, and then acting on the results, is where things change. Under the Computer Fraud and Abuse Act in the US, the Computer Misuse Act in the UK, and equivalent legislation in most jurisdictions, accessing systems or data you don't have authorization to touch is a criminal offense, regardless of how easy it was to find them through Google.
Ethical use of the GHDB requires:
- Authorization. Only run queries against your own assets, or assets explicitly scoped in an engagement contract or bug bounty program.
- Restraint. Reading a publicly indexed file is generally legal. Logging in with credentials you found in it is not.
- Responsible disclosure. If you find someone else's exposure incidentally, report it through their security disclosure channel. Don't exploit it. Don't share it publicly until they've had time to remediate.
The GHDB itself is a defensive resource that happens to be useful offensively. How you use it determines which side you're on.
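Authorization is easiest to enforce in code before anyone opens a result. The following sketch assumes a hypothetical allowlist of domains you own or that an engagement explicitly scopes. `AUTHORIZED_DOMAINS`, `in_scope` and `split_by_scope` are illustrative names. The code splits search-result URLs into in-scope and out-of-scope lists. It matches subdomains on a dot boundary, so a lookalike such as evilyourdomain.com does not pass as yourdomain.com.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: domains you own or that an engagement explicitly scopes.
AUTHORIZED_DOMAINS = frozenset({"yourdomain.com", "yourcompany.com"})


def in_scope(host: str, authorized=AUTHORIZED_DOMAINS) -> bool:
    """True if host is an authorized domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    # Compare on a dot boundary so evilyourdomain.com does not match yourdomain.com.
    return any(host == d or host.endswith("." + d) for d in authorized)


def split_by_scope(urls, authorized=AUTHORIZED_DOMAINS):
    """Split result URLs into (in_scope, out_of_scope) lists.

    Out-of-scope hits are not to be accessed; route them to responsible
    disclosure instead.
    """
    allowed, disallowed = [], []
    for url in urls:
        host = urlparse(url).hostname or ""
        (allowed if in_scope(host, authorized) else disallowed).append(url)
    return allowed, disallowed


results = [
    "https://vpn.yourdomain.com/login",
    "https://evilyourdomain.com/admin",
    "https://thirdparty.example/backup/",
]
allowed, disallowed = split_by_scope(results)
print("In scope:", allowed)
print("Report, do not touch:", disallowed)
```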
Limitations of the GHDB in 2026
The GHDB is a venerable resource, but it's not a complete one. Several limitations are worth knowing:
Submission lag. The database depends on community submissions. Novel exposure patterns may not appear for weeks or months after they start being exploited in the wild.
Google‘s changes. Google has deprecated several operators over the years — cache: was retired in 2024, link: has been unreliable since 2017, and + no longer forces exact matches. Older GHDB entries that rely on deprecated operators no longer work as written. Always test before relying.
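Testing before relying on a dork can start with a cheap lint pass over your runbook. The sketch below is a heuristic. It only looks for the three operators named in the paragraph above, and it is not an official Google list. The `lint_dork` helper and the sample runbook are hypothetical. A clean result does not prove a dork still works.

```python
import re

# Heuristic patterns for the operators named above; not an official Google list.
DEPRECATED_OPERATORS = {
    "cache:": re.compile(r"(?<!\w)cache:", re.IGNORECASE),
    "link:": re.compile(r"(?<!\w)link:", re.IGNORECASE),
    # A "+" that starts a term (+word or +"phrase"), not one inside a token like c++.
    "+ (exact match)": re.compile(r'(?<![\w+])\+["\w]'),
}


def lint_dork(dork: str) -> list[str]:
    """Return the deprecated operators a dork appears to rely on."""
    return [name for name, pattern in DEPRECATED_OPERATORS.items()
            if pattern.search(dork)]


runbook = [
    'cache:yourdomain.com "admin"',
    "+password filetype:log",
    "inurl:phpmyadmin/index.php",
]
for dork in runbook:
    problems = lint_dork(dork)
    if problems:
        print(f"Rewrite before relying on it: {dork!r} uses {problems}")
```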
Coverage gaps. The GHDB focuses on Google. It does not cover Bing, DuckDuckGo, Yandex, or specialist engines like Shodan and Censys — all of which surface different exposure classes. A defender relying on the GHDB alone has visibility into one search index.
Volume signal-to-noise. Some categories contain thousands of entries. Without filtering by relevance to your environment, you can spend more time triaging the database than triaging your actual exposures.
This is why most mature security programs treat the GHDB as one input to a broader exposure-monitoring strategy — not as the strategy itself.
FAQs
Is it legal to use the Google Hacking Database?
Yes. The GHDB is a public, freely accessible database of search queries. Browsing it, downloading entries, and running queries against assets you own or are authorized to test is fully legal. Using results to access systems you don't own is not.
Who maintains the GHDB today?
Offensive Security has maintained the GHDB since 2010, hosting it on Exploit-DB. They accept community submissions and curate entries before publication. Johnny Long originally created and maintained it from 2004 to 2010.
How many dorks are in the GHDB?
The database has grown to thousands of catalogued dorks across the 14 categories. New entries are submitted by the community on an ongoing basis, and Exploit-DB curates them before publication. The exact count fluctuates as old entries are retired and new ones added.
Is the GHDB the same thing as Google dorks?
Not exactly. Google dorks are the search queries themselves. The GHDB is a specific, curated collection of those queries, hosted by Exploit-DB. Most Google dorks you'll encounter in security writing come from or pass through the GHDB, but the two terms aren't interchangeable.
Can I submit my own dork to the GHDB?
Yes. Exploit-DB accepts community submissions through their site. Submitted dorks go through a curation process before being published, and accepted entries are credited to the submitter.
What is the difference between the GHDB and Shodan?
The GHDB catalogues queries for Google, which indexes public web content. Shodan is a separate search engine that indexes internet-connected devices and services by their banners. Both are reconnaissance resources, but they cover different parts of the internet. Most security professionals use both.
Are old GHDB dorks still effective?
Many are. The configuration mistakes that GHDB dorks expose (open directory listings, default pages, hardcoded credentials in public files) haven't changed structurally since 2004. Older entries that rely on deprecated Google operators (cache:, +) need updating, but the underlying exposure patterns remain valid.
Do attackers really use the GHDB?
Yes. Public threat reports and post-incident analyses consistently identify GHDB-style queries as the starting point for reconnaissance in opportunistic attacks, particularly against small and mid-sized organizations that lack continuous exposure monitoring.
Wrapping up
The Google Hacking Database isn't glamorous, but it has remained relevant for over two decades for a reason. The exposure patterns it catalogues (misconfigured directories, leaked credentials, forgotten admin portals) are exactly the kind of low-effort findings that fuel most opportunistic attacks.
For attackers, it's a permanent starting line. For defenders, it's a permanent self-audit. The asymmetry between those two uses is one of the more interesting features of public security tooling: the same resource that helps the wrong people find your exposures can help you find them first.
Manage your external-facing assets before they end up in someone else's reconnaissance workflow. CybelAngel's Attack Surface Management and Data Breach Prevention services continuously monitor the open, deep, and dark web for exposures tied to your organization, turning structured queries like those in the GHDB into actionable intelligence delivered to your security team.
