Structural Integrity

How LLMs read your page structure

When a large language model processes a webpage, it doesn't read it the way a human does. It extracts signals, and the first layer of signals it reaches for is structural. Before it processes your paragraphs, your arguments, or your expertise, it looks at the skeleton of your page: What's the title? What's the language? Is there a canonical signal? Does the heading hierarchy make sense?

Structural Integrity measures exactly these foundational signals. A page with strong structural integrity tells an LLM unambiguously what it is, what it's about, and how authoritative its metadata is. A page with weak structural integrity forces the LLM to guess, and when LLMs guess, they often skip the page or misattribute its content.

Structural signals are processed before content. A broken heading hierarchy or a missing canonical tag can reduce your citation likelihood even if your content is excellent.

This pillar accounts for 30% of your total score, making it the second most influential factor after AI Extractability. It covers seven distinct check categories, each mapped to a specific LLM extraction behavior.

What gets measured and why

Title Tag

Up to +15 pts

The <title> element is the single most important structural signal on a page. LLMs use it as the primary identifier when referencing or citing content; it's often what appears in AI-generated summaries, citations, and answers.

The analyzer checks not just whether a title exists, but whether it's meaningful. Titles under 10 characters or over 80 characters are penalized, as are generic titles like "Home" or "Untitled". A well-crafted title in the 30–60 character range earns full points.
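The rules above can be sketched in a few lines of Python. The 15-point maximum and the 10/30/60/80-character thresholds come from this section; the set of generic titles, the half credit for titles that fall between the full-points and penalty ranges, and the name `score_title` are illustrative assumptions, not the analyzer's actual weights.

```python
from typing import Optional

# Hypothetical list of generic titles; the analyzer's real list is not published here.
GENERIC_TITLES = {"home", "untitled", "index", "welcome", "new page"}


def score_title(title: Optional[str], max_points: int = 15) -> int:
    """Score a <title> string using the length and genericness rules described above."""
    if title is None:
        return 0
    text = " ".join(title.split())  # collapse runs of whitespace and trim the ends
    if not text or text.lower() in GENERIC_TITLES:
        return 0
    length = len(text)
    if length < 10 or length > 80:
        return 0  # penalized range (assumption: scores nothing)
    if 30 <= length <= 60:
        return max_points  # full points
    return max_points // 2  # assumption: half credit for 10-29 or 61-80 characters


if __name__ == "__main__":
    for candidate in [
        "How to Optimize Content for LLM Visibility | hey-eye",
        "Home",
        "Welcome to our website we offer great services for all your needs "
        "in a wide range of categories",
    ]:
        print(score_title(candidate), candidate)
```

Under these assumptions the three example titles in this section score 15, 0, and 0: the first is 52 characters, the second is generic, and the third is 95 characters.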

✓ How to Optimize Content for LLM Visibility | hey-eye

✗ Home

✗ Welcome to our website we offer great services for all your needs in a wide range of categories

Meta Description

Up to +8 pts

While meta descriptions don't influence traditional search ranking directly, they provide LLMs with a concise, author-intended summary of the page's purpose. This is valuable for extraction accuracy: it tells the model what the page claims to be about.

The analyzer rewards descriptions between 50 and 160 characters that are specific and informative. Missing or extremely short descriptions lose significant points, as do descriptions that are clearly keyword-stuffed or generic.
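A minimal sketch of the presence and length check, using only Python's standard `html.parser`. The 50–160 character range comes from this section; the class and function names are hypothetical, and keyword-stuffing or genericness detection is left out because the article does not say how the analyzer measures it.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaDescriptionParser(HTMLParser):
    """Records the content of the first <meta name="description"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        # HTMLParser lowercases tag and attribute names; attribute values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "description":
            self.description = " ".join(attributes.get("content", "").split())


def check_meta_description(html: str, min_len: int = 50, max_len: int = 160) -> dict:
    """Report whether a description exists and whether its length is in range."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    text = parser.description or ""
    return {
        "present": bool(text),
        "length": len(text),
        "in_range": min_len <= len(text) <= max_len,
    }
```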

Heading Hierarchy (H1–H3)

Up to +20 pts

Heading tags are the outline of your content. LLMs use them to chunk long-form content into discrete sections. Each heading signals the start of a new topic or subtopic, and the model uses this hierarchy to navigate and attribute information correctly.

The checks here are detailed. A single H1 is required (multiple H1s or zero H1s both lose points). H2s are expected for any page with substantial content. Proper nesting matters: jumping from H1 to H3 without an H2 in between signals poor structure. H3s used for section subdivision earn bonus points.

✓ H1 → H2 → H3 → H2 → H3

✗ H1 → H1 → H3 (skipped H2, duplicate H1)

This is the highest-scoring individual check in the Structural Integrity pillar. A page with no H1 can lose up to 20 points in this pillar alone.
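The heading rules above (exactly one H1, at least one H2, no skipped levels when going deeper) can be sketched as follows. Class and function names are hypothetical, and the sketch only reports issues; it does not assign the analyzer's point values.

```python
from html.parser import HTMLParser
from typing import List


class HeadingCollector(HTMLParser):
    """Collects heading levels (1-6) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: List[int] = []

    def handle_starttag(self, tag, attrs):
        # Matches h1..h6 but not tags such as <hr> or <header>.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_levels(html: str) -> List[int]:
    collector = HeadingCollector()
    collector.feed(html)
    return collector.levels


def audit_headings(levels: List[int]) -> List[str]:
    """Return human-readable issues for the H1, H2, and nesting rules."""
    issues: List[str] = []
    h1_count = levels.count(1)
    if h1_count == 0:
        issues.append("missing H1")
    elif h1_count > 1:
        issues.append(f"{h1_count} H1 tags (expected exactly one)")
    if 2 not in levels:
        issues.append("no H2")
    previous = 0
    for level in levels:
        # Going deeper by more than one level (H1 -> H3) is a skip;
        # moving back up (H3 -> H2) is normal.
        if previous and level > previous + 1:
            issues.append(f"skipped level: H{previous} -> H{level}")
        previous = level
    return issues
```

Run against the two example sequences in this section, the good outline (H1, H2, H3, H2, H3) returns no issues, while H1, H1, H3 reports the duplicate H1, the missing H2, and the H1 to H3 skip.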

Canonical URL

Up to +10 pts

The rel="canonical" tag tells both search engines and AI crawlers which version of a page is the "official" one. For LLMs, this is important for deduplication: if your content exists at multiple URLs (www vs non-www, HTTP vs HTTPS, paginated versions), the canonical tag ensures attribution is consistent.

The analyzer checks for the presence of a canonical tag and whether it points to a valid, absolute URL. A self-referencing canonical (pointing to the current page's own URL) is the recommended pattern and earns full points.
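A sketch of the canonical check in Python: find the first `<link rel="canonical">`, confirm its href is an absolute http(s) URL, and compare it with the page's own URL. Treating scheme and host case and a trailing slash as insignificant is an assumption made here for illustration; some sites serve different content at `/page` and `/page/`. All names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple
from urllib.parse import urlsplit


class CanonicalParser(HTMLParser):
    """Finds the href of the first <link rel="canonical">."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        # rel can hold several space-separated values.
        if "canonical" in attributes.get("rel", "").lower().split():
            self.href = attributes.get("href", "").strip()


def _comparable(url: str) -> Tuple[str, str, str, str]:
    # Assumption: scheme/host case and a trailing slash are not meaningful.
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return (parts.scheme.lower(), parts.netloc.lower(), path, parts.query)


def check_canonical(html: str, page_url: str) -> dict:
    parser = CanonicalParser()
    parser.feed(html)
    href = parser.href or ""
    parts = urlsplit(href)
    absolute = parts.scheme in ("http", "https") and bool(parts.netloc)
    return {
        "present": bool(href),
        "absolute": absolute,
        "self_referencing": absolute and _comparable(href) == _comparable(page_url),
    }
```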

Open Graph Tags

Up to +12 pts

Open Graph metadata (og:title, og:description, og:image, og:url) was originally designed for social sharing previews, but it has become a secondary structured signal that LLMs increasingly rely on.

When a page has complete OG tags, it provides redundant, machine-readable confirmation of the page's identity: the same information as the title and meta description, but in a format designed specifically for automated processing. This redundancy improves extraction confidence.

The analyzer checks for all four core OG properties. Partial implementation (e.g., only og:title without og:description) earns partial points.
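A sketch of how partial credit could work for the four core properties. The section says only that partial implementations earn partial points; the equal split used here (3 of the 12 points per property) is an assumption, and the names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

CORE_OG = ("og:title", "og:description", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Collects non-empty og:* values from <meta property="..." content="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        prop = attributes.get("property", "").strip().lower()
        content = attributes.get("content", "").strip()
        if prop.startswith("og:") and content:
            self.properties.setdefault(prop, content)  # keep the first value seen


def score_open_graph(html: str, max_points: float = 12.0) -> Tuple[float, List[str]]:
    """Return (points, missing core properties)."""
    parser = OpenGraphParser()
    parser.feed(html)
    missing = [p for p in CORE_OG if p not in parser.properties]
    found = len(CORE_OG) - len(missing)
    # Assumption: each core property is worth an equal share of the maximum.
    return max_points * found / len(CORE_OG), missing
```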

Semantic HTML Tags

Up to +15 pts

Semantic HTML elements (<article>, <section>, <main>, <nav>, <header>, <footer>, <aside>) are the single most underused LLM optimization lever in web development.

These tags tell LLMs exactly what role each block of content plays. A <main> element says: "this is the primary content, ignore the navigation and footer." An <article> says: "this is a self-contained piece of content worth extracting." Without these signals, LLMs must infer structure from class names and layout patterns, which is far less reliable.

✓ <main><article><section>...</section></article></main>

✗ <div class="main"><div class="article">...</div></div>

Using only <div> and <span> throughout your HTML is one of the most common structural issues detected by the analyzer and one of the easiest to fix.
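A sketch of a semantic-markup check that counts which of the six elements named in the scoring table appear, and what share of all tags are generic `<div>`/`<span>` containers. Weighting each element equally (2.5 of 15 points) is an assumption, as are the names; `<aside>` is omitted only because the scoring table omits it.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import List, Tuple

# The six elements listed in this article's scoring table.
SEMANTIC_TAGS = ("main", "article", "section", "nav", "header", "footer")


class TagCounter(HTMLParser):
    """Counts every start tag by name."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def score_semantic_html(html: str, max_points: float = 15.0) -> Tuple[float, List[str], float]:
    """Return (points, semantic tags found, share of tags that are div/span)."""
    counter = TagCounter()
    counter.feed(html)
    found = [t for t in SEMANTIC_TAGS if counter.counts[t] > 0]
    # Assumption: each semantic element contributes an equal share.
    score = max_points * len(found) / len(SEMANTIC_TAGS)
    total = sum(counter.counts.values()) or 1  # avoid division by zero on empty input
    div_span_share = (counter.counts["div"] + counter.counts["span"]) / total
    return score, found, div_span_share
```

For the two examples in this section, the `<div class="main">` version scores 0.0 with every tag generic, while `<main><article><section>` scores 7.5 under this equal weighting.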

Language & Hreflang

Up to +10 pts

The lang attribute on the <html> element is a direct signal to LLMs about the language of the content. This matters for two reasons: it helps the model interpret the text in the correct language, and it prevents misclassification of content that happens to contain foreign words or phrases.

For multilingual sites, hreflang tags additionally signal which language or region variant is intended for each audience. LLM-powered search features (like Perplexity or Bing's AI) can use these signals to serve the appropriate language version.

The analyzer checks for a valid lang attribute (e.g., lang="en" or lang="el") and, for sites with multiple language versions, checks for consistent hreflang implementation.
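A sketch of the language checks: read `lang` from the `<html>` element, collect `hreflang` values from `<link rel="alternate">` tags, and flag malformed or duplicated codes. The regex is a deliberately simplified shape check for BCP 47 tags (such as "en", "el", "en-US"), not a full validator, and treating "consistent hreflang implementation" as "valid and non-duplicated codes" is an assumption. Names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional

# Simplified BCP 47 shape: a 2-3 letter primary subtag plus optional subtags.
LANG_RE = re.compile(r"^[A-Za-z]{2,3}(?:-[A-Za-z0-9]{2,8})*$")


class LanguageParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.html_lang: Optional[str] = None
        self.hreflangs: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "html" and self.html_lang is None:
            self.html_lang = attributes.get("lang", "").strip() or None
        elif tag == "link" and "hreflang" in attributes:
            if "alternate" in attributes.get("rel", "").lower().split():
                self.hreflangs.append(attributes["hreflang"].strip().lower())


def check_language(html: str) -> dict:
    parser = LanguageParser()
    parser.feed(html)
    lang, codes = parser.html_lang, parser.hreflangs
    # "x-default" is a special hreflang value, not a language tag.
    invalid = [c for c in codes if c != "x-default" and not LANG_RE.match(c)]
    duplicates = sorted({c for c in codes if codes.count(c) > 1})
    return {
        "lang": lang,
        "lang_valid": bool(lang and LANG_RE.match(lang)),
        "hreflang_codes": codes,
        "invalid_hreflang": invalid,
        "duplicate_hreflang": duplicates,
    }
```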

How the score is calculated

The Structural Integrity pillar has a maximum raw score that is then normalized to a 0–100 scale. Each check contributes a specific number of points, and penalties can be applied for actively wrong implementations (e.g., multiple H1s, extremely long titles, missing canonical).

Check              | Max Points                   | Key Conditions
-------------------|------------------------------|--------------------------------------------
Title Tag          | +15                          | Present, 30–60 chars, not generic
Meta Description   | +8                           | Present, 50–160 chars
H1 Tag             | +15                          | Exactly one H1, meaningful content
H2 Tags            | +5 (shared with H3)          | At least one H2 present
H3 Tags            | (shared with H2)             | Used for section subdivision
Canonical URL      | +10                          | Present, absolute URL
Open Graph         | +12                          | All 4 core properties present
Semantic HTML      | +15                          | article, main, section, nav, header, footer
Lang Attribute     | +10 (shared with Hreflang)   | Valid lang on <html> element
Hreflang           | (shared with Lang Attribute) | Present for multilingual sites
Total (normalized) | 100                          | Penalties apply for wrong implementations
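The normalization step can be sketched as follows. Summing the per-check maxima stated in the sections above (15 + 8 + 20 + 10 + 12 + 15 + 10) gives 90 raw points; this sketch assumes that is the raw maximum, that penalties are subtracted from the raw total before scaling, and that the result cannot drop below zero. None of these details is confirmed by the article, and the names are hypothetical.

```python
from typing import Dict

# Per-check maxima as stated in this article. H2/H3 are folded into
# "headings" (+20 total) and lang/hreflang into "language" (+10 total),
# because only those combined values are given.
MAX_POINTS: Dict[str, float] = {
    "title": 15,
    "meta_description": 8,
    "headings": 20,
    "canonical": 10,
    "open_graph": 12,
    "semantic_html": 15,
    "language": 10,
}
RAW_MAX = sum(MAX_POINTS.values())  # 90 under these assumptions


def normalize_score(earned: Dict[str, float], penalties: float = 0.0) -> float:
    """Map raw points, minus penalties, onto a 0-100 scale (one decimal)."""
    raw = 0.0
    for check, points in earned.items():
        if check not in MAX_POINTS:
            raise KeyError(f"unknown check: {check}")
        raw += min(max(points, 0.0), MAX_POINTS[check])  # clamp each check to its range
    raw = max(raw - penalties, 0.0)  # assumption: the score never goes negative
    return round(100 * raw / RAW_MAX, 1)
```

For example, a page earning only the title and heading points (35 raw) would normalize to 38.9 under these assumptions.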

What the analyzer finds most often

Missing or generic title tag

Pages with titles like "Home", "Index", or no title at all are among the most common findings. LLMs cannot reliably identify or cite content without a meaningful title.

Multiple H1 tags

CMS templates and page builders often inject an H1 in the header or hero section in addition to the article H1, creating ambiguity about the primary topic of the page.

No semantic HTML elements

Sites built entirely with <div> containers (common in older WordPress themes and custom-built sites) score zero on semantic HTML checks. This is often the largest single point loss in this pillar.

Incomplete Open Graph implementation

Many sites implement og:title but skip og:description, og:image, or og:url. Partial OG implementation earns partial credit, but complete implementation is straightforward and takes only minutes.

Missing canonical tag

E-commerce and CMS sites frequently generate multiple URLs for the same content (filtered views, pagination, tracking parameters). Without a canonical tag, LLMs may fragment attribution across URL variants.

Missing lang attribute

A surprisingly common omission, especially on older sites. Without a lang attribute, LLMs may misidentify the language of the content, which is particularly problematic for sites in languages other than English.

Quick wins for Structural Integrity

Unlike content quality improvements, most structural fixes are implementation changes, not creative work. They can typically be done in an afternoon and have an immediate, measurable impact on your score.

Add a descriptive title tag to every page

Format: Primary keyword - Secondary context | Brand name. Aim for 50–60 characters. Every page should have a unique title that describes its specific content.

Low effort

Audit your H1 usage across all pages

Every page should have exactly one H1 that matches or closely reflects the title tag. Use H2s for major sections and H3s for subsections. Never skip a heading level.

Low effort

Replace structural <div>s with semantic elements

Wrap your main content in <main>, individual articles or posts in <article>, and content sections in <section>. Your navigation should be in <nav>, your page header in <header>, and your footer in <footer>.

Medium effort

Implement complete Open Graph tags

Add og:title, og:description, og:image, and og:url to every page. Most CMS platforms have plugins that do this automatically.

Low effort

Add a self-referencing canonical tag

Every page should include <link rel="canonical" href="[full URL of this page]"> in the <head>. This is the single easiest check to pass and one of the most commonly missed.

Low effort

Set the lang attribute on your <html> element

Add lang="en" (or your page's language code) to the opening <html> tag. For Greek content: lang="el". This takes 30 seconds and should be done immediately if missing.

Low effort
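As a recap, here is a sketch of a helper that emits a page skeleton applying the quick wins above: a lang attribute, a descriptive title, a meta description, a self-referencing canonical, the four core OG properties, a single H1, and semantic containers. The function name, its parameters, and the example.com URLs are hypothetical; in practice your CMS or one of its plugins will usually generate these tags for you.

```python
from html import escape


def build_page_skeleton(lang: str, title: str, description: str,
                        url: str, image_url: str, h1: str) -> str:
    """Return a minimal HTML page that applies the structural quick wins."""
    # escape() with its default quote=True makes values safe inside attributes.
    t, d, u, i = (escape(v) for v in (title, description, url, image_url))
    return "\n".join([
        "<!DOCTYPE html>",
        f'<html lang="{escape(lang)}">',
        "<head>",
        f"  <title>{t}</title>",
        f'  <meta name="description" content="{d}">',
        f'  <link rel="canonical" href="{u}">',  # self-referencing canonical
        f'  <meta property="og:title" content="{t}">',
        f'  <meta property="og:description" content="{d}">',
        f'  <meta property="og:image" content="{i}">',
        f'  <meta property="og:url" content="{u}">',
        "</head>",
        "<body>",
        "  <header><nav>Site navigation</nav></header>",
        "  <main>",
        "    <article>",
        f"      <h1>{escape(h1)}</h1>",  # exactly one H1
        "      <section>",
        "        <h2>Section heading</h2>",
        "        <h3>Subsection heading</h3>",
        "      </section>",
        "    </article>",
        "  </main>",
        "  <footer>Site footer</footer>",
        "</body>",
        "</html>",
    ])
```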
