LLM hallucinations are a known issue. But as frustrating as they may be, hallucinated AI citations are just a symptom.
Every time an AI assistant invents a URL to your site—confidently, wrongly—it’s doing so because a real person asked a real question. The model expects your site to have an answer, and when it can’t find one, it fills in the blank.
That’s the gap worth paying attention to.
Buried in your 404 logs is a record of those moments: a map of topics your audience is actively researching, questions they’re asking AI tools, and content your site hasn’t created yet.
Each hallucinated URL is a signal. Zoom out, and you start to see patterns across dozens of those URLs that reveal where your next content opportunities lie.
After seeing a post from Tim Soulo about using Ahrefs to uncover these cases, we decided to dig into the data ourselves.
What your 404 pages actually tell you
404 errors usually signal technical issues: broken links, redirect gaps, or an old campaign URL that’s slipped through the cracks.
But AI assistants are introducing a new kind of 404, one that may be more valuable than most marketers realize.
LLMs like ChatGPT, Claude, and Gemini now generate plausible-looking URLs to sites they’ve encountered as part of their responses to research queries. And even though the URL doesn’t exist, the intent behind it does.
After Tim Soulo surfaced this behavior using Ahrefs, we ran the same analysis on CXL’s own site. The numbers were modest but instructive:
- 84 total 404 pages detected
- 257 unique referring domains pointing to those broken URLs
That second number is the one that matters.
These aren’t random crawl errors. Multiple independent sources, many of them AI-generated content or AI-assisted research outputs, are all pointing to the same hallucinated pages.
The URLs themselves are worth reading carefully. Pull them into a spreadsheet, and you’ll notice patterns: recurring topic clusters, specific framing choices, and implied expertise levels.
This means AI tools aren’t hallucinating randomly. They’re generating URLs that match what should exist on a site like yours, given its existing content profile.
The insight: LLMs have formed a model of what your site covers. The hallucinated URLs represent the next logical layer of that model: topics adjacent to what you’ve published or questions you’ve gestured at but never fully answered.
How to find them in Ahrefs
Finding hallucinated AI citations isn’t complicated. Here’s the workflow:

- Open Site Explorer → Best by Links. This shows pages ranked by inbound link volume.
- Filter for 404 status. This surfaces broken URLs that still attract links, including AI-generated citations.
- Export the list. You want the full URL, referring domain count, and anchor text where available.
The anchor text is particularly useful.
If multiple AI-generated sources are pointing to /blog/b2b-content-attribution-framework with anchor text like “content attribution methodology,” that’s a clear signal about the specific framing your audience expects.
One important caveat: not every 404 is an LLM hallucination. Genuine broken links, old campaign URLs, and redirected content all show up in this view.
You’re looking for patterns: URLs with plausible-but-nonexistent slugs, often under /blog/, often covering topics adjacent to your existing content.
The referring domain context helps filter: a 404 with 12 referring domains from AI-generated content sites is a different signal than one link from a four-year-old forum post.
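To make that filtering step concrete, here is a minimal Python sketch of a first pass over the export. The column names (`url`, `status`, `referring_domains`, `anchor`), the example.com rows, and the thresholds are all assumptions for illustration; a real Ahrefs export will likely use different headers, so rename them to match your file.

```python
import csv
import io
from urllib.parse import urlparse


def likely_hallucinated(rows, prefix="/blog/", min_domains=2):
    """Keep rows that are 404s under `prefix` with at least `min_domains`
    referring domains, sorted by referring domain count (highest first)."""
    keep = []
    for row in rows:
        path = urlparse(row["url"]).path
        domains = int(row["referring_domains"])
        if row["status"] == "404" and path.startswith(prefix) and domains >= min_domains:
            keep.append({**row, "path": path, "referring_domains": domains})
    return sorted(keep, key=lambda r: r["referring_domains"], reverse=True)


# Made-up sample standing in for the exported spreadsheet (hypothetical data).
SAMPLE = """url,status,referring_domains,anchor
https://example.com/blog/b2b-content-attribution-framework,404,12,content attribution methodology
https://example.com/blog/old-post,200,40,old post
https://example.com/summer-campaign-2019,404,1,summer sale
https://example.com/blog/content-attribution-for-lean-teams,404,3,attribution for small teams
"""

candidates = likely_hallucinated(csv.DictReader(io.StringIO(SAMPLE)))
for row in candidates:
    print(row["referring_domains"], row["path"], "|", row["anchor"])
```

This only narrows the list. It cannot tell an AI-generated citation from an ordinary broken link, so the surviving rows still need a manual read of the slugs and referring domains.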
Turning hallucinated URLs into content ideas
Once you have the spreadsheet, the analysis is straightforward. Use a prompt like this:
| Analyze this spreadsheet containing hallucinated URLs that lead to 404 pages. Identify patterns in the URLs and extract the likely topics or search queries behind them. Group them into content themes or pillars. Suggest 10 content ideas that would address these topics and match the likely intent of the original queries. Return the results organized by topic cluster, with suggested article titles for each. |
What you’re doing here is reverse-engineering the research queries that prompted the hallucinations. LLMs tend to generate URLs that match expected patterns—so /blog/content-attribution-for-lean-teams signals something very specific about who was asking and what they needed.
The output is a prioritized content brief list, organized by topic cluster, derived from actual audience behavior rather than keyword volume estimates. That’s a meaningfully different input into your editorial calendar.
A few things worth validating before you start writing:
- Check whether the topic is already covered elsewhere on your site, just under a different URL structure. Hallucinations sometimes point to content that exists but is poorly indexed or linked.
- Assess intent depth. Some hallucinated URLs suggest introductory overviews; others suggest very specific tactical content. Match your response to the implied depth.
- Look for clusters, not one-offs. A single hallucinated URL doesn’t mean much. But when five of them keep pointing at the same topic, you’re looking at a content gap.
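If you want a quick sanity check on clusters before (or alongside) the LLM prompt, a rough keyword count over the slugs can help. The sketch below is one simple way to do it, not the method we used: it groups paths by shared slug terms and drops any term that appears in only one URL. The stopword list, the `min_urls` threshold, and the last two sample paths are assumptions for illustration.

```python
from collections import defaultdict

# Small, hypothetical stopword list; extend it for your own slugs.
STOPWORDS = {"a", "an", "and", "for", "how", "in", "of", "the", "to", "with", "blog"}


def slug_terms(path):
    """Split the last path segment on hyphens and drop stopwords and numbers."""
    slug = path.strip("/").split("/")[-1]
    return {t for t in slug.split("-") if t and t not in STOPWORDS and not t.isdigit()}


def term_clusters(paths, min_urls=2):
    """Map each slug term to the paths containing it, keeping only terms
    shared by at least `min_urls` paths. Largest clusters come first."""
    by_term = defaultdict(set)
    for path in paths:
        for term in slug_terms(path):
            by_term[term].add(path)
    clusters = {t: sorted(p) for t, p in by_term.items() if len(p) >= min_urls}
    return dict(sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0])))


paths = [
    "/blog/b2b-content-attribution-framework",
    "/blog/content-attribution-for-lean-teams",
    "/blog/attribution-models-explained",  # hypothetical example
    "/blog/pricing-page-teardown",         # hypothetical one-off
]
clusters = term_clusters(paths)
for term, urls in clusters.items():
    print(term, len(urls), urls)
```

Single terms are a blunt instrument, since generic words like "content" will show up everywhere. Treat the output as a shortlist to review, and let the one-offs drop out of view until they recur.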
Personalizing the 404 experience
The second thing worth doing with this data is to fix what visitors experience when they land on these broken pages.
A standard 404 is a dead end. A visitor who arrives from an AI-generated citation, following up on research they were already doing, gets nothing. They bounce, and you lose a warm, intent-driven visit.
The intervention is simple: a personalized 404 page that acknowledges the likely source of the error and redirects visitors toward real content. Something that says, in effect, we know why you’re here, and here’s what actually exists.
Because our LLM content gap analysis showed that hallucinated URLs were almost entirely concentrated under /blog/, we implemented a conditional rule: /blog/ 404s get the personalized experience, everything else gets the standard page. That keeps the personalization targeted and avoids overcomplicating non-blog error pages.
You can see the difference live:
- Personalized 404: cxl.com/blog/thisdoesnotexist/
- Standard 404: cxl.com/thisdoesnotexist/
We’re tracking whether this reduces bounce rate on 404 sessions, increases navigation to related content, and improves session continuation for AI-referred visitors.
It’s early, but the logic holds: a visitor who arrived because an AI cited your site has more intent than a random bounce. They deserve a better landing.
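The conditional rule itself is small. As an illustration only (this is not our production setup, and the template names are hypothetical), the logic amounts to a prefix check on the requested path:

```python
def pick_404_template(path, personalized_prefix="/blog/"):
    """Return the name of the 404 template to render for a missing path.

    Paths under `personalized_prefix` get the personalized page;
    everything else gets the standard one. Template names are placeholders.
    """
    if path.startswith(personalized_prefix):
        return "404_personalized"
    return "404_standard"


print(pick_404_template("/blog/thisdoesnotexist/"))
print(pick_404_template("/thisdoesnotexist/"))
```

Matching on `/blog/` with the trailing slash keeps unrelated paths such as `/blogroll/` on the standard page. How you wire this in depends on your CMS or server; many platforms expose an equivalent URL-based condition without any custom code.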
What to do next
- Run the Ahrefs audit now. Site Explorer → Best by Links → filter for 404. Export everything. This takes 10 minutes and gives you a dataset you didn’t know existed.
- Build the topic cluster map. Feed the spreadsheet into Claude or ChatGPT with the prompt above. Review the output critically—not all clusters will be content gaps. Tag the ones worth pursuing.
- Prioritize by referring domain count. Topics with multiple AI-generated sources pointing to the same hallucinated URL are higher-priority signals. One hallucination is an anomaly. A cluster is a pattern.
- Implement the conditional 404 rule. If your CMS supports URL-based logic, set it up now. This is low-effort and captures value from traffic you’re already getting. The personalized page doesn’t need to be sophisticated; it just needs to acknowledge the likely context and offer three or four relevant links.
- Revisit the list quarterly. LLM behavior changes as models update. New hallucination patterns will emerge as AI assistants expand their topical coverage. This isn’t a one-time audit; it’s an ongoing signal worth monitoring.
Your 404 pages are smarter than your keyword research
Keyword tools tell you what people type into search boxes. Hallucinated URL patterns tell you what people expect sophisticated AI research tools to retrieve on their behalf.
That’s a different, and arguably richer, signal. It reflects higher-intent research behavior, more specific topical needs, and the implicit expectations that AI users have developed about what authoritative sites in your space should cover.
The irony is that AI hallucinations, widely treated as a credibility problem, are doing something useful: surfacing the gap between what your site is and what your audience expects it to be. That gap is your content roadmap.
You already have the data. It’s sitting in your 404 logs, waiting for an Ahrefs filter.
Want to go deeper on AI-driven content strategy? CXL’s AI for B2B Marketing course shows how to build content systems designed for the way AI tools discover, synthesize, and cite information.
You’ll learn how to structure pages for LLM visibility, automate competitive and topic research, and turn emerging search behaviors into actionable content strategy.
→ We’re also hosting several live AI-focused sessions, including:
- Measuring the modern AI-powered funnel
- Optimizing B2B content funnels with AI
- Increase your visibility and revenue from AI-based discovery engines
- Content strategy for LLM visibility and changing search habits
- Optimize pages for AI search with GEO/AEO
→ To learn how to automate repetitive marketing workflows, join our 5-day n8n webinar series.
And if you’re leading AI adoption inside your company, don’t miss our free webinar: AI Adoption for Leaders with WeSimplify co-founders Ramir Arya and Ilinca Munteanu.