
An audit crawls your site and scores each crawled page for SEO, AEO and LLMO readiness.

Audits

An audit is CiteFlow's structured assessment of how well your site is set up for traditional search, answer engines, and large language models. It crawls a representative slice of your site, evaluates it against a rubric of 35 checks across three pillars, and produces per-page and site-wide findings, each with prioritised remediation guidance.

This page is the full reference for the audit module. For how re-audits, deltas, and improvement tracking work over time, see re-audits and improvement tracking.

What an audit measures

Every audit runs the same rubric of 35 finding keys:

  • 24 page-scope checks, evaluated against every crawled page and rolled up into per-page scores.
  • 11 site-scope checks, evaluated once at the site level (sitemap presence, robots configuration, llms.txt, internal link graph, etc.).

Each finding is tagged with a pillar (SEO, AEO, LLMO) and a severity (critical, important, minor).

Pillar breakdown

SEO: traditional search foundations.

  • Page scope: title, meta_description, canonical, h1, heading_hierarchy, og_tags, robots, alt_text.
  • Site scope: sitemap, plus eight site-wide hygiene checks (robots.txt presence and validity, duplicate titles, duplicate descriptions, orphan pages, broken internal links, mixed content, HTTPS consistency, hreflang where applicable).
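
As an illustration of how page-scope checks work, the sketch below uses Python's standard html.parser to test three of the SEO keys on a single page. The pass criteria (a non-empty <title>, a non-empty meta description, exactly one <h1>) and the function names are assumptions for the example; CiteFlow's actual rules for these checks aren't specified here.

```python
from html.parser import HTMLParser


class HeadParser(HTMLParser):
    """Collects the <title> text, meta description and <h1> count of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.h1_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""
        elif tag == "h1":
            self.h1_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def seo_page_findings(html: str) -> list[str]:
    """Return the finding keys this page fails, under the assumed pass criteria."""
    parser = HeadParser()
    parser.feed(html)
    failed = []
    if not parser.title.strip():
        failed.append("title")
    if not (parser.meta_description or "").strip():
        failed.append("meta_description")
    if parser.h1_count != 1:
        failed.append("h1")
    return failed
```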

AEO: Answer Engine Optimisation.

  • Page scope: article_schema, direct_answers, faqpage_schema, howto_schema, internal_links, question_headings, snippet_structure.

These checks reward content that's structured to be quoted: clear question-style headings, concise answer paragraphs near the top of the page, and FAQ markup that lets search engines lift answers directly.
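
FAQ markup of this kind is usually schema.org FAQPage JSON-LD. A minimal sketch that builds such a block from question/answer pairs using only the standard library; the function name is hypothetical, and how (or whether) a given search engine displays the answers is up to that engine.

```python
import json


def faqpage_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a schema.org FAQPage JSON-LD <script> block from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(faqpage_jsonld([
    ("How long does an audit take?",
     "From a few minutes to several hours, depending on site size."),
]))
```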

LLMO: Large Language Model Optimisation.

  • Page scope: author_schema, breadcrumbs, citation_structure, content_depth, external_links, org_schema.
  • Site scope: ai_crawlers (whether you allow GPTBot, ClaudeBot, PerplexityBot and friends in robots.txt) and llms_txt (a /llms.txt file describing your site to LLMs).
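
The ai_crawlers check boils down to asking robots.txt whether each AI user-agent may fetch your pages. A minimal sketch using Python's standard urllib.robotparser; the function name is hypothetical and it only covers the three crawlers named above, whereas a real check would cover more user-agents.

```python
from urllib.robotparser import RobotFileParser

# The three AI crawlers named above; a fuller check would include more user-agents.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_ai_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawlers that this robots.txt forbids from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, path)]


robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_ai_crawlers(robots_txt))  # ['GPTBot']
```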

What do the severity levels mean?

Severity  | Meaning
--------- | -------
critical  | Blocks a major channel. Examples: missing <title>, robots.txt disallowing AI crawlers when you want citations, no sitemap.
important | Materially reduces ranking, citation likelihood, or extraction quality. Examples: weak meta descriptions, missing FAQ schema on FAQ-style pages, no breadcrumbs.
minor     | Polish. Worth fixing once criticals and importants are done.

The action plan always works top-down: clear the criticals first, then importants, then minors.

How is the score calculated?

Each finished audit produces four numbers, all on a 0–100 scale:

  • Overall score: a weighted average across the three pillars.
  • SEO score, AEO score, LLMO score: the per-pillar rollups.

Scores are derived from the proportion of checks that pass, weighted by severity: a failed critical hurts the score far more than a failed minor. Treat them as directional rather than absolute. Chasing 100 isn't the goal; moving scores up over time is, and the performance dashboard is built around that.
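
The docs describe the score as a severity-weighted pass rate without publishing the weights. A minimal sketch of that idea for one pillar, assuming hypothetical weights of 5, 3 and 1 for critical, important and minor; the function name and the empty-input behaviour are also assumptions.

```python
# Hypothetical weights: the docs only say a failed critical costs far more than a failed minor.
SEVERITY_WEIGHT = {"critical": 5.0, "important": 3.0, "minor": 1.0}


def weighted_score(results: list[tuple[str, bool]]) -> int:
    """Score one pillar from (severity, passed) pairs on a 0-100 scale."""
    total = sum(SEVERITY_WEIGHT[severity] for severity, _ in results)
    if total == 0:
        return 100  # assumption: a pillar with no applicable checks scores full marks
    earned = sum(SEVERITY_WEIGHT[severity] for severity, passed in results if passed)
    return round(100 * earned / total)


# One failed critical outweighs two failed minors.
print(weighted_score([("critical", False), ("minor", True), ("minor", True)]))
print(weighted_score([("critical", True), ("minor", False), ("minor", False)]))
```

With these weights the first pillar scores 29 and the second 71, which matches the qualitative behaviour described above; CiteFlow's real weights may produce different numbers.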

GEO score (synthesised)

Underneath the four primary scores, every audit also reports a GEO (Generative Engine Optimisation) score, a synthesised view of how ready your site is to be cited by generative AI engines (ChatGPT, Claude, Perplexity, Gemini, Google AI Overviews). It is an equal 50/50 weighting of the AEO and LLMO pillar scores:

GEO = round((AEO * 0.5) + (LLMO * 0.5))

GEO is a presentation-layer summary, not a separate pillar: there are no GEO-specific finding keys. Anything that improves your AEO or LLMO score improves your GEO score. For the full primer, see What is GEO?
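
The formula translates directly into code. This sketch assumes both pillar scores are integers on the 0–100 scale; the function name is hypothetical.

```python
def geo_score(aeo: int, llmo: int) -> int:
    """GEO = round((AEO * 0.5) + (LLMO * 0.5)), both inputs on the 0-100 scale."""
    return round(aeo * 0.5 + llmo * 0.5)


# Worked example: AEO 62 and LLMO 48 average to exactly 55.
print(geo_score(62, 48))  # 55
```

One caveat: Python's built-in round() sends exact halves to the nearest even integer, so geo_score(72, 81) gives 76 rather than 77. The docs don't say which tie-breaking rule CiteFlow applies, so scores on a .5 boundary may differ by one point from this sketch.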

How does the crawler behave?

CiteFlow's crawler is designed to be a polite, well-behaved visitor to your site:

  • Respects robots.txt: paths disallowed for our user-agent are skipped. If you block our crawler entirely, the audit will be empty.
  • Rate limited: defaults to 2 requests per second, configurable per site if you've negotiated a higher limit.
  • Declares itself: sends User-Agent: CiteFlowBot with a link to our /bot disclosure page, so your ops team can identify the traffic.
  • Discovers via sitemap and on-page links: starts from your declared sitemap (if any), then expands by following internal links found during the crawl.
  • Sub-domain handling: by default the crawl stays on the exact hostname you verified. On Marketplace tier you can add an allowlist of sub-domains (e.g. blog, shop) to be treated as same-origin, up to a hard cap of 10 entries.
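
The robots.txt and sub-domain rules above can be sketched with Python's standard urllib.robotparser. This illustrates the policy rather than CiteFlow's crawler: the function names are hypothetical, rate limiting is omitted, and robots_txt is assumed to be the file already fetched from the URL's own host.

```python
from collections.abc import Sequence
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "CiteFlowBot"
MAX_SUBDOMAINS = 10  # the documented hard cap on the allowlist


def in_scope(url: str, verified_host: str, subdomains: Sequence[str] = ()) -> bool:
    """True if url is on the verified hostname or an allowlisted sub-domain of it."""
    if len(subdomains) > MAX_SUBDOMAINS:
        raise ValueError(f"at most {MAX_SUBDOMAINS} sub-domains can be allowlisted")
    base = verified_host.lower()
    allowed = {base} | {f"{sub.lower()}.{base}" for sub in subdomains}
    return (urlparse(url).hostname or "") in allowed  # .hostname is already lowercased


def robots_allows(robots_txt: str, url: str) -> bool:
    """Apply the robots.txt rules for our user-agent to url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


def should_crawl(url: str, verified_host: str, robots_txt: str,
                 subdomains: Sequence[str] = ()) -> bool:
    """Crawl only in-scope URLs that robots.txt permits."""
    return in_scope(url, verified_host, subdomains) and robots_allows(robots_txt, url)
```

Note that urllib.robotparser applies the first matching rule in a group rather than the longest match described in RFC 9309, so a production crawler may want a stricter parser.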

How are page types categorised?

Every crawled page is classified into a page_type so findings can be grouped and pattern-aware:

  • homepage, content, listing, product, category, blog_post, author, utility, or unknown.

This matters most for marketplace-style sites where the same template powers thousands of pages. Page-type categorisation feeds the section-pattern intelligence available on Marketplace tier (see Marketplace).

How do you read findings?

Findings are presented in two views.

Site-wide findings

Issues that apply to the whole site (missing sitemap, no llms.txt, inconsistent HTTPS) are shown once with a single fix. Don't expect per-page noise for these.

Page-level findings

Issues found on individual pages are grouped by finding key, with the affected page count visible up front. So instead of 800 separate "missing meta description" rows, you see one row saying "Missing meta description, 800 pages affected", which you can expand to see the URLs.

Within each group, findings are sorted by severity (critical first).
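
The grouping described here can be sketched as a group-by on finding key. This is a minimal illustration with hypothetical names; it assumes each finding key carries a single severity, so it orders the grouped rows critical-first and, as a further assumption, breaks ties by the number of pages affected.

```python
from collections import defaultdict

SEVERITY_ORDER = {"critical": 0, "important": 1, "minor": 2}


def group_page_findings(findings):
    """Collapse (finding_key, severity, url) tuples into one row per finding key.

    Rows come back as (key, severity, sorted_urls), critical first, then by
    the number of pages affected (most first).
    """
    urls_by_key = defaultdict(set)
    severity_by_key = {}
    for key, severity, url in findings:
        urls_by_key[key].add(url)
        severity_by_key[key] = severity  # assumes one severity per key
    rows = [(key, severity_by_key[key], sorted(urls)) for key, urls in urls_by_key.items()]
    rows.sort(key=lambda row: (SEVERITY_ORDER[row[1]], -len(row[2])))
    return rows


for key, severity, urls in group_page_findings([
    ("meta_description", "important", "https://example.com/a"),
    ("meta_description", "important", "https://example.com/b"),
    ("title", "critical", "https://example.com/b"),
]):
    print(f"{key} ({severity}): {len(urls)} pages affected")
```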

Enhanced finding cards

Each finding opens an enriched card with:

  • Why it matters: a plain-language impact statement.
  • Implementation steps: numbered steps you (or your developer) can follow.
  • Code snippet: a copy-pastable example where relevant (e.g. JSON-LD blocks, meta tag examples).
  • Testing steps: how to verify the fix took effect.
  • Impact points: a 1–10 score reflecting the expected lift from fixing this finding, used to rank the action plan.
  • CMS-specific guidance: if we've detected your CMS (WordPress, Webflow, Shopify, Ghost), we surface tailored instructions instead of the generic version.

What is the Action Plan?

The Action Plan view filters down to the top 10 findings by impact points. It answers the question "I only have an hour this week; what should I fix?" Each entry links straight through to the enriched finding card.
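
Selecting the Action Plan amounts to a top-10 query over impact points. A minimal sketch, assuming findings are dicts with hypothetical impact_points and severity fields; breaking ties critical-first is also an assumption, chosen to match the top-down ordering described under severity levels.

```python
import heapq

# Assumed tie-breaker: among equal impact points, more severe findings come first.
SEVERITY_RANK = {"critical": 2, "important": 1, "minor": 0}


def action_plan(findings: list[dict], limit: int = 10) -> list[dict]:
    """Return up to `limit` findings, highest impact points first."""
    return heapq.nlargest(
        limit,
        findings,
        key=lambda f: (f["impact_points"], SEVERITY_RANK[f["severity"]]),
    )
```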

How does the by-template view work in Marketplace?

On Marketplace tier, findings can additionally be grouped by page type. For a directory site with 50,000 listing pages built from one template, this view shows whether the issue lives in the template (fix once, fix everywhere) or only on a subset.
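
One way to separate template issues from subset issues is to compare, for each page type, how many crawled pages carry the finding against how many pages of that type were crawled. A minimal sketch with hypothetical names; treating "every page of the type is affected" as the template threshold is an assumption, and a real implementation might tolerate a few exceptions.

```python
from collections import Counter


def template_vs_subset(pages: dict[str, str], affected: set[str]) -> dict[str, str]:
    """Label each affected page type as a template-wide or subset issue.

    pages maps URL -> page_type for every crawled page; affected holds the URLs
    that have one particular finding. Types with no affected pages are omitted.
    """
    totals = Counter(pages.values())
    hits = Counter(pages[url] for url in affected if url in pages)
    return {
        page_type: "template" if hits[page_type] == totals[page_type] else "subset"
        for page_type in hits
    }
```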

What does the pages-crawled table show?

Every audit ships with a table of every page the crawler touched:

  • URL, HTTP status code, fetched timestamp.
  • Page type classification.
  • Per-page overall and pillar scores.
  • Number of findings on the page.

Pages that returned 4xx/5xx, were excluded by robots.txt, or timed out are flagged with status failed or skipped so you can see what wasn't audited.

How long an audit takes

Duration scales with site size and rate limit. Rough ranges at the default 2 req/sec:

Pages         | Typical duration
------------- | ----------------
Up to 100     | 5–15 minutes
100–1,000     | 15–60 minutes
1,000–10,000  | 1–4 hours
10,000+       | Several hours, sometimes overnight
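
The rate limit alone sets a floor on duration: at 2 requests per second, N pages need at least N / 2 seconds of fetching. The sketch below computes that floor. It ignores everything else an audit does (sitemap and robots.txt fetches, retries, slow responses, analysis), which is presumably why the typical durations in the table run longer; the function name is hypothetical.

```python
def min_fetch_minutes(pages: int, requests_per_second: float = 2.0) -> float:
    """Floor on crawl time: one request per page, issued at the rate limit."""
    return pages / requests_per_second / 60


for pages in (100, 1_000, 10_000):
    print(f"{pages:>6} pages: at least {min_fetch_minutes(pages):.1f} minutes of fetching")
```

For 10,000 pages the floor alone is about 83 minutes, so sites of that size land in the hours range before any other overhead is counted.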

You don't need to keep the dashboard open. Audits run server-side and the dashboard polls for status.

How does ongoing tracking work?

An audit on its own is a snapshot. To see whether your fixes are landing, you want re-audits and deltas:

  • Re-audit cadence by tier.
  • The four delta types (resolved, new, persisted, regressed).
  • Page-type categorisation across audits.
  • Manual re-audit triggers.

See re-audits and improvement tracking for the full lifecycle.

Related

  • Re-audits and deltas: scheduled re-audits, the four delta change types, page-type categorisation, manual re-audits.
  • Knowledge base: auto-synthesised from your audit. Review, edit and approve it before content generation reads it.