A technical SEO audit systematically checks whether search engines can crawl, render, index, and understand your site. Run through these 40 checkpoints, covering crawlability, Core Web Vitals, indexation, structured data, and AI-search signals, and you have a complete, prioritized fix list ready for your sprint board.
By Guru Editorial, June 9, 2026
Technical audits are not glamorous work. But they are the foundation everything else depends on. You can have the best content on the internet and still rank nowhere if Googlebot cannot crawl your pages, your canonical signals conflict, or your server response times drag past two seconds.
The 2025 Web Almanac found that more than half of all mobile pages fail at least one Core Web Vital, and that figure includes sites with dedicated engineering and SEO teams. The issues are rarely exotic. Broken redirect chains, duplicate content from trailing slashes, render-blocking JavaScript, and absent structured data account for the large majority of problems that show up in any real-world audit.
This checklist is organized into seven audit domains. Work through them in order, crawlability first, AI signals last, because the earlier checks gate everything downstream.
What a Technical SEO Audit Actually Covers
A technical audit is a structured inspection of the infrastructure layer of your website: everything that affects how search engines discover, fetch, render, index, and interpret your pages. It is distinct from an on-page content audit or a backlink profile review, though those disciplines overlap at the edges.
The goal is a prioritized fix list, not a 200-page PDF that lives in a Google Drive folder. Every finding should map to a specific owner, a severity level, and a ticket in your sprint board.
For teams running ongoing audits through a connected platform, SEOguru's technical module tracks crawl health, indexation status, and Core Web Vitals continuously so issues surface before they compound into traffic drops.
The 40-Point Technical SEO Audit Checklist
1. Crawlability and Robots Control
Before anything else, confirm Googlebot can physically reach your pages.
Checkpoints 1-7:
- Robots.txt is valid and intentional. Fetch `yoursite.com/robots.txt` and confirm no production directories or pages are blocked unintentionally. Google caches robots.txt for up to 24 hours, so changes propagate slowly. Validate every edit before deploying, and use Search Console's robots.txt report (which replaced the legacy robots.txt tester) to confirm which version Google last fetched.
- No critical pages return noindex. Use a crawler (Screaming Frog, Sitebulb, or your platform's native crawler) to flag any page with `<meta name="robots" content="noindex">` or an `X-Robots-Tag: noindex` header. Every noindex on a page you want to rank is a silent kill switch.
- XML sitemap is present, clean, and submitted. Sitemaps should list only canonical, indexable URLs. Exclude redirects, 404s, and noindex pages; submitting those wastes crawl budget.
- Sitemap is submitted in Search Console. Navigate to Indexing > Sitemaps in GSC and confirm the status shows "Success." Connecting GSC properly is a prerequisite for every data point in the rest of this audit.
- Crawl budget is not being wasted on low-value URLs. Faceted navigation, session IDs, internal search result pages, and print versions of pages are the most common crawl-budget killers on large sites.
- Server response codes are clean. A 5xx response on any page blocks crawling. A 4xx that has persisted for 90+ days should be addressed with a 410 if the content is gone permanently, or a 301 if it moved.
- Googlebot is not rate-limited or blocked at the CDN layer. Some WAF configurations inadvertently block crawl agents. Verify in your server logs that Googlebot is receiving 200s at a consistent rate.
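The robots.txt checkpoint can be scripted as a regression test. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the robots.txt content, the URL list, and the `blocked_urls` helper are all hypothetical, and in a real audit you would load the live file instead of an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace with the file fetched from your site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

# Hypothetical list of URLs that must stay crawlable.
MUST_BE_CRAWLABLE = [
    "https://yoursite.com/",
    "https://yoursite.com/products/blue-widget",
    "https://yoursite.com/search?q=widgets",
]


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


print(blocked_urls(ROBOTS_TXT, MUST_BE_CRAWLABLE))
```

Here the internal search URL is reported as blocked, which is intentional for that path. Note that the standard-library parser does not replicate every detail of Google's matching rules (path wildcards, for example), so treat the result as a first pass rather than a final verdict.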
2. Indexation Quality
Getting crawled is step one. Getting *indexed* for the right pages is step two.
Checkpoints 8-13:
- Run a "site:" query for diagnostic signal. `site:yourdomain.com` gives a rough indexed count. A large gap between your sitemap URL count and the site: count warrants deeper investigation in Search Console's Page Indexing report.
- No unintended duplicate URLs are indexed. Check that `http://`, `https://`, `www.`, and non-www versions all 301 to a single canonical version. Trailing slash inconsistencies (`/page` vs `/page/`) create silent duplicate-content problems.
- Canonical tags are correctly self-referencing. Every indexable page should have a `<link rel="canonical">` pointing to its own URL unless it is a deliberate alternate. Conflicting canonicals, where the tag points to a different page than the 301 redirect target, confuse Google's canonicalization logic and split link equity.
- Hreflang tags are bidirectionally correct. If you run a multilingual or multi-regional site, every page must reference every alternate including itself. When a return tag is missing, Google may ignore the annotations for that pair of pages, and the gaps tend to spread across the cluster as new locales are added.
- Paginated series use the correct treatment. Google dropped `rel="prev/next"` support years ago, but many sites still use it without the correct canonical strategy. Each paginated page should either have a self-referencing canonical or be consolidated intentionally.
- No soft 404s are slipping through. A page that returns a 200 status but shows "no results" or an empty template is a soft 404. Search Console flags these in the Page Indexing report under "Soft 404"; thin pages of this kind can also end up under "Crawled - currently not indexed."
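The hreflang reciprocity check lends itself to a small script. This is a sketch under stated assumptions: the `HREFLANG` mapping (page URL to its declared alternates) is hypothetical data you would extract from a crawl, and `hreflang_problems` is an illustrative helper, not part of any tool named above.

```python
# Hypothetical crawl output: each page URL maps to the hreflang alternates it declares.
HREFLANG = {
    "https://example.com/en/": {
        "en": "https://example.com/en/",
        "de": "https://example.com/de/",
    },
    # The German page lists itself but omits the return tag to the English page.
    "https://example.com/de/": {
        "de": "https://example.com/de/",
    },
}


def hreflang_problems(hreflang_map):
    """Return (page, alternate, reason) tuples for missing self or return tags."""
    problems = []
    for page, alternates in hreflang_map.items():
        if page not in alternates.values():
            problems.append((page, page, "missing self-reference"))
        for alternate in alternates.values():
            if alternate == page:
                continue
            # The alternate must list this page among its own alternates.
            declared_back = hreflang_map.get(alternate, {}).values()
            if page not in declared_back:
                problems.append((page, alternate, "no return tag"))
    return problems


for problem in hreflang_problems(HREFLANG):
    print(problem)
```

The sample data yields one finding: the English page points to the German page, but the German page does not point back.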
3. Site Architecture and Internal Linking
Architecture determines how PageRank flows through your site and how deep Googlebot has to crawl to reach important content.
Checkpoints 14-18:
- No important page is more than three clicks from the homepage. Pages four or more clicks from the root receive proportionally less crawl attention and link equity. Use a crawl visualization to identify orphan pages and deep content silos.
- Internal links use descriptive anchor text. Generic anchors like "click here" or "read more" pass no topical signal. Every internal link is an opportunity to reinforce the target page's relevance for a keyword cluster.
- No orphan pages exist for content you want to rank. An orphan page has zero internal links pointing to it. It may appear in your sitemap but will receive almost no crawl attention or equity. For teams managing large content libraries, SEOguru's internal linking recommendations surface these gaps automatically.
- Redirect chains are three hops or fewer. Each extra hop adds latency and, by common SEO estimates, dilutes a small share of link equity; a chain of five or more redirects means measurable lost authority. Treat three hops as the ceiling and flatten chains to a single 301 wherever you can.
- No redirect loops exist. A→B→A loops block crawling entirely and cause 5xx-equivalent behavior in some crawlers.
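Click depth and orphan detection are both graph problems, so a breadth-first search over your internal-link graph covers them. The sketch below assumes a hypothetical `LINKS` adjacency map and `SITEMAP` list exported from a crawler; it treats any sitemap URL that cannot be reached from the homepage as an orphan, which is slightly broader than "zero internal links."

```python
from collections import deque

# Hypothetical internal-link graph: page -> pages it links to.
LINKS = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-1/appendix"],
    "/blog/post-1/appendix": ["/blog/post-1/appendix/data"],
}

# Hypothetical sitemap URLs.
SITEMAP = [
    "/",
    "/blog",
    "/products",
    "/blog/post-1",
    "/blog/post-1/appendix/data",
    "/old-landing-page",
]


def click_depths(links, home):
    """Breadth-first search: minimum number of clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


depths = click_depths(LINKS, "/")
too_deep = sorted(url for url, depth in depths.items() if depth > 3)
orphans = sorted(url for url in SITEMAP if url not in depths)

print("Deeper than three clicks:", too_deep)
print("Unreachable from homepage:", orphans)
```

Breadth-first search is used because it finds the shortest click path to each page, which is the depth that matters for this checkpoint.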
4. Core Web Vitals and Page Experience
Core Web Vitals are a confirmed ranking signal, and as of March 2025, pages with severe Core Web Vitals failures appear less frequently in Google AI Overviews. According to the 2025 Web Almanac, only 48% of mobile pages pass all three Core Web Vitals, which means passing all three is itself a competitive differentiator.
Figure: Mobile Core Web Vitals pass rates, 2025 Web Almanac (HTTP Archive). LCP remains the hardest metric to pass; CLS is the easiest.
Checkpoints 19-24:
- LCP is under 2.5 seconds on real user data. Largest Contentful Paint is the hardest Core Web Vital to pass: only 62% of mobile pages achieve it. The most common causes are unoptimized hero images, render-blocking CSS, and slow server response times (TTFB). Use PageSpeed Insights on your top 20 pages by traffic, not just the homepage.
- INP is under 200ms. Interaction to Next Paint replaced FID in March 2024. Heavy JavaScript, long tasks on the main thread, and third-party scripts are the primary culprits.
- CLS is under 0.1. Cumulative Layout Shift is usually caused by images without explicit dimensions, web fonts that swap after load, or dynamically injected content above the fold.
- TTFB is under 800ms. Time to First Byte is not a Core Web Vital but it is the upstream constraint for LCP. A slow server or CDN makes LCP improvement nearly impossible regardless of other optimizations.
- No render-blocking resources delay the critical rendering path. Audit with Chrome DevTools' Coverage report. CSS and JavaScript that block rendering but are not needed for above-the-fold content should be deferred or inlined selectively.
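- Mobile usability has no errors. Google indexes the mobile version of your site first. Clickable elements too close together, text too small to read, and content wider than the viewport are the most common mobile usability failures. Search Console retired its dedicated Mobile Usability report in December 2023, so verify these with Chrome DevTools device emulation and real-device testing.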
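The thresholds in checkpoints 19-22 can be encoded once and applied to every page's field data. This is a minimal sketch: the metric key names and the sample values are hypothetical, the input is assumed to be 75th-percentile field data (the percentile Google uses for Core Web Vitals assessment), and a value exactly at the threshold is treated as passing.

```python
# Thresholds from the checklist above; key names are an assumption for this sketch.
THRESHOLDS = {
    "lcp_ms": 2500,   # Largest Contentful Paint
    "inp_ms": 200,    # Interaction to Next Paint
    "cls": 0.1,       # Cumulative Layout Shift (unitless)
    "ttfb_ms": 800,   # Time to First Byte (not a Core Web Vital)
}


def failing_metrics(p75):
    """Return the metrics whose 75th-percentile value exceeds its threshold."""
    return {
        name: value
        for name, value in p75.items()
        if name in THRESHOLDS and value > THRESHOLDS[name]
    }


# Hypothetical field data for one page.
sample_page = {"lcp_ms": 3100, "inp_ms": 180, "cls": 0.1, "ttfb_ms": 950}
print(failing_metrics(sample_page))
```

In this sample, LCP and TTFB fail together, which matches the point above that a slow server response is often the upstream cause of a failing LCP.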
5. HTTPS, Security, and Technical Hygiene
Checkpoints 25-29:
- All pages are served over HTTPS with a valid certificate. An expired or misconfigured SSL certificate will show a browser security warning that kills organic traffic before Googlebot is even part of the equation.
- Mixed content warnings are absent. A page served over HTTPS that loads HTTP resources (images, scripts, iframes) generates mixed content warnings. These degrade user trust and can trigger browser blocking.
- No sensitive pages are accidentally accessible to crawlers. Admin panels, staging environments, and internal tools should be protected at the server level, not just behind a login form, because some crawlers follow links regardless of authentication.
- Your site is not on any spam or malware blocklists. Check Google's Transparency Report and Sucuri SiteCheck. A flagged domain loses rankings abruptly and triggers "This site may be hacked" warnings in SERPs.
- HTTP headers are correctly configured. At minimum: `Strict-Transport-Security`, `X-Content-Type-Options`, and a functional `Content-Security-Policy` header. These also reduce vulnerability surface area.
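A header check is easy to automate once you have the response headers for a page. The sketch below assumes a hypothetical `sample_headers` dictionary captured from your own crawler or HTTP client; it only confirms the headers are present, not that their values are sensible, so a "functional" `Content-Security-Policy` still needs a manual review.

```python
# The three headers named in the checkpoint above.
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "Content-Security-Policy",
)


def missing_security_headers(response_headers):
    """Return required headers absent from the response (names are case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


# Hypothetical response headers for one URL.
sample_headers = {
    "content-type": "text/html; charset=utf-8",
    "strict-transport-security": "max-age=31536000; includeSubDomains",
    "x-content-type-options": "nosniff",
}

print(missing_security_headers(sample_headers))
```

Header names are compared in lowercase because HTTP header names are case-insensitive and servers differ in how they capitalize them.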
6. Structured Data and Schema Markup
Structured data is no longer just a rich-result play. In 2026, it is the primary machine-readable signal that AI search engines use to understand, cite, and surface your content. Pages with clean schema implementation consistently appear more frequently in AI-generated answers because the markup gives LLMs a structured, unambiguous reading of your entities, claims, and content type.
Checkpoints 30-35:
- Article or BlogPosting schema is implemented on all editorial content. Include `author`, `datePublished`, `dateModified`, `headline`, and `publisher` at minimum. JSON-LD is the recommended format; every major AI engine prefers it because it is cleanly separated from HTML markup.
- FAQPage schema is implemented on pages with Q&A content. FAQPage schema no longer produces Google SERP rich results (removed May 7, 2026), but it remains one of the highest-ROI structured data investments for AI citation probability because AI engines extract Q&A pairs directly from the markup. Each `Question`/`Answer` pair should be 40-60 words and self-contained.
- Understand which schema types still earn Google rich results. HowTo rich results were removed by Google in 2023; FAQPage rich results were fully removed on May 7, 2026. Neither type causes ranking drops, and both remain valid schema.org types that Google and AI engines still parse for content understanding, so keep FAQPage and Article/BlogPosting markup even though the visual SERP result is gone. Audit all templates with Google's Rich Results Test to confirm there are no validation errors on types that still produce rich results (e.g., Product, Recipe, Event, Review).
- Breadcrumb schema matches the actual site navigation. Breadcrumb schema helps AI engines understand site hierarchy and improves sitelinks appearance in branded searches.
- Organization or WebSite schema is present on the homepage. Google retired the sitelinks search box in November 2024, so this markup no longer earns that feature, but it still helps establish entity-level trust signals that flow into AI knowledge graphs.
- Structured data has no validation errors. Run every template through Google's Rich Results Test and Schema.org Validator. Errors in schema are silent: they do not cause visible page errors, so they persist indefinitely without active auditing.
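For the Article checkpoint, it helps to generate JSON-LD from a template and flag missing properties before the page ships. The following is a sketch only: the helper names and all field values are hypothetical, and the required-field list simply mirrors the minimum set named in checkpoint 30 rather than any official validator.

```python
import json

# Minimum properties listed in the Article/BlogPosting checkpoint above.
REQUIRED_FIELDS = ("author", "datePublished", "dateModified", "headline", "publisher")


def build_article_jsonld(fields):
    """Wrap the supplied properties in a schema.org Article object."""
    return {"@context": "https://schema.org", "@type": "Article", **fields}


def missing_required(markup):
    """Return required properties that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not markup.get(name)]


# Hypothetical values for illustration only; dateModified is omitted on purpose.
article = build_article_jsonld({
    "headline": "Example headline",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "publisher": {"@type": "Organization", "name": "Example Co"},
    "datePublished": "2026-01-15",
})

print("Missing:", missing_required(article))
print('<script type="application/ld+json">')
print(json.dumps(article, indent=2))
print("</script>")
```

A check like this catches omissions at build time, but it does not replace the Rich Results Test or Schema.org Validator, which validate types and nesting as well as presence.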
For GEO-optimized content that needs to perform in AI answer engines, SEOguru's GEO scoring module evaluates your structured data implementation alongside other AI-search signals.
7. AI Search Visibility Signals
The 2026 audit checklist must include a category that did not exist three years ago: the signals that determine whether AI engines cite your content in generated answers.
Checkpoints 36-40:
- Entity clarity: your brand, authors, and key topics map to known entities. AI engines build knowledge graphs. Pages that reference named entities (people, organizations, products, places) with consistent, unambiguous language are easier to cite accurately.
- Content is structured for direct-answer extraction. Headers that match common question formats, definition blocks, and concise summary paragraphs at the top of sections are the patterns AI engines extract most reliably. The inverted-pyramid structure, conclusion first, is the single most impactful structural change you can make for AI citation probability.
- Your site's E-E-A-T signals are machine-readable. Author bylines with linked profiles, `author` schema pointing to a known Person entity, and an About page with organizational credentials all feed into AI trust assessment.
- AI Overviews and featured snippets are monitored for your target queries. Use SEOguru's GSC integration to track which queries are generating zero-click AI responses for your brand. These queries need structured, direct-answer content to recapture citation share.
- GEO page scoring is tracked for your highest-priority pages. GEO (Generative Engine Optimization) scoring evaluates how well a page is structured for AI engine consumption: clarity of claims, schema completeness, entity disambiguation, and citation-friendliness. See the full GEO optimization guide for implementation detail.
Audit Domain Summary: Priority and Tooling
| Domain | Audit Points | Priority | Primary Tools |
|---|---|---|---|
| Crawlability & Robots | 1-7 | Critical | GSC, Screaming Frog, server logs |
| Indexation Quality | 8-13 | Critical | GSC Page Indexing report, crawler |
| Architecture & Linking | 14-18 | High | Sitebulb, SEOguru, Ahrefs |
| Core Web Vitals | 19-24 | High | PageSpeed Insights, CrUX, GSC |
| HTTPS & Hygiene | 25-29 | High | SSL Labs, Sucuri, browser DevTools |
| Structured Data | 30-35 | Medium-High | Rich Results Test, Schema Validator |
| AI Search Signals | 36-40 | Medium (rising) | SEOguru GEO module, GSC, Perplexity |
How to Turn Audit Findings Into a Fix Sprint
An audit that produces a spreadsheet nobody acts on is worse than no audit because it generates false confidence.
The most effective workflow maps every finding to a ticket with a severity, an owner, and an acceptance criterion before the audit document is closed. Severity levels should be consistent: Critical (blocks indexing or causes active ranking loss), High (measurable impact within 30 days if fixed), Medium (incremental gains), and Low (hygiene items).
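One way to keep that mapping consistent is to give every finding the same shape before it reaches the board. This is an illustrative sketch, not a prescribed schema: the `Finding` fields mirror the severity, owner, and acceptance criterion described above, and the sample findings and team names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Lower value means higher priority, so findings sort naturally."""
    CRITICAL = 0  # blocks indexing or causes active ranking loss
    HIGH = 1      # measurable impact within 30 days if fixed
    MEDIUM = 2    # incremental gains
    LOW = 3       # hygiene items


@dataclass
class Finding:
    checkpoint: int            # checklist item number (1-40)
    summary: str
    severity: Severity
    owner: str
    acceptance_criterion: str


def sprint_order(findings):
    """Sort by severity first, then by checkpoint number."""
    return sorted(findings, key=lambda f: (f.severity, f.checkpoint))


# Hypothetical findings from one audit.
findings = [
    Finding(21, "CLS above 0.1 on article template", Severity.HIGH,
            "frontend", "p75 CLS at or below 0.1 in field data"),
    Finding(2, "noindex on category pages", Severity.CRITICAL,
            "platform", "No category URL returns a noindex directive"),
    Finding(34, "Homepage lacks Organization schema", Severity.MEDIUM,
            "seo", "Markup passes the Schema.org Validator"),
]

for finding in sprint_order(findings):
    print(finding.severity.name, finding.checkpoint, finding.summary)
```

Requiring an acceptance criterion on every record makes it harder to close a ticket without a verifiable definition of done.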
Changes that touch on-page signals (meta tags, canonical values, structured data, internal links) should route through a formal approval record before they publish. An unchecked canonical tag change can accidentally deindex an entire content category. SEOguru's approval workflow enforces this gate on every recommended change, with a full audit trail of who approved what and when.
Figure: The five-stage technical SEO fix workflow. The Approve stage is where most teams lose control: changes go live without a record of who decided what.
How Often Should You Run a Technical Audit?
The answer depends on your site's velocity of change. A site publishing 20+ pages per month, running A/B tests, or migrating CMS platforms needs continuous monitoring, not a quarterly point-in-time crawl.
For most teams the practical answer is a tiered cadence:
- Weekly: GSC Page Indexing errors, Core Web Vitals field data, crawl anomalies
- Monthly: Full crawl of the entire site, structured data validation, redirect chain audit
- Quarterly: Deep architecture review, log file analysis, competitor gap analysis
- After every major change: URL migrations, CMS upgrades, template redesigns, server changes
Frequently Asked Questions
How long does a technical SEO audit take?
A thorough technical audit of a site under 10,000 pages takes 8-16 hours for an experienced practitioner using proper tooling. Enterprise sites with 100,000+ pages can take several days. The crawl itself is fast; the analysis, prioritization, and documentation take most of the time. Using a platform that runs continuous checks cuts this time significantly on recurring audits.
What is the most common technical SEO issue found in audits?
Canonical tag misconfiguration is the single most common finding across both small and enterprise sites. Research consistently shows that a large share of international and enterprise sites carry conflicting canonical signals or malformed hreflang, errors that accumulate quietly because no one is checking systematically. Redirect chains and missing structured data are close behind.
Do I need a paid tool to run a technical SEO audit?
No, but the free tools have real limits. Google Search Console is free and provides the Page Indexing report, Core Web Vitals field data, and crawl stats. Screaming Frog's free version crawls up to 500 URLs. PageSpeed Insights is free for Core Web Vitals diagnostics. For sites above 500 pages, or for ongoing monitoring rather than point-in-time audits, a paid platform saves significantly more time than it costs.
Does fixing technical SEO issues guarantee a ranking improvement?
No guarantee exists in SEO. Fixing critical issues (blocked crawl paths, duplicate canonicals, severe Core Web Vitals failures) removes constraints that are actively suppressing rankings. Removing a constraint does not automatically produce a lift; it restores the page's ability to compete on content quality and authority. The improvements are real but not always immediate, and Google's re-crawl and re-index cycle adds latency of days to weeks.
How does a technical audit relate to GEO (Generative Engine Optimization)?
Technical audit and GEO increasingly overlap. Structured data, entity clarity, crawlability, and page speed all affect AI-search visibility, not just traditional rankings. Pages with severe technical issues appear less frequently in AI Overviews and AI-generated answers. A clean technical foundation is a prerequisite for effective GEO, not a separate concern.
What is crawl budget and does it matter for my site?
Crawl budget is the number of URLs Googlebot is willing and able to crawl on your site within a given timeframe (defined by Google in their official crawl budget documentation). For sites under 1,000 pages with fast load times and no serious structural issues, crawl budget is rarely a limiting factor. For large e-commerce or content sites with millions of URLs, especially those with faceted navigation, infinite scroll, or large numbers of near-duplicate pages, crawl budget management is critical to ensuring priority content gets indexed.
Start Your Audit
The 40 checkpoints above cover every technical domain that affects rankings and AI-search visibility in 2026. Work through them once to establish a baseline, then build monitoring cadences so you catch issues before they compound into traffic events. For a deeper diagnosis of the specific issues most likely to cap your growth silently, see 15 technical SEO issues quietly capping your organic growth.
For teams that want continuous tracking rather than manual spot-checks, SEOguru's technical audit module runs all of this against your live site and routes findings to a sprint board: no PDF, no spreadsheet, no one-and-done checklist. See how it works.
Sources
- HTTP Archive, 2025 Web Almanac: Performance & Core Web Vitals
- Google Search Central, Crawl Budget Management
- Google Search Central, FAQPage Structured Data
- Search Engine Land, How Schema Markup Fits Into AI Search (2026)
- Semrush, Technical SEO Checklist
- Backlinko, Canonical URL and Duplicate Content Guide