The highest-weighted pillar. Measures how well a large language model can extract, chunk, and cite your content, from schema markup to paragraph structure.
Structural integrity tells an LLM what your page is. AI Extractability determines whether it can actually use your content. These are two different problems, and this pillar, at 35% of the total score, is the more consequential one.
When an LLM processes a page for potential citation or summarization, it doesn't read it linearly. It chunks content into discrete units, identifies patterns that signal factual or instructional value, and evaluates whether the information is structured for reliable extraction. A page full of long, unbroken paragraphs with no schema markup and no date signals is technically readable, but it's not extractable in any meaningful sense.
AI Extractability is the single strongest predictor of whether your content gets cited in an AI-generated answer. A page can have perfect structural integrity and still score poorly here.
This pillar covers eight distinct checks, each targeting a specific behavior in how LLMs parse and prioritize web content. JSON-LD schema markup alone can account for up to 26 points, making it the most impactful individual optimization available.
JSON-LD structured data is the single most impactful optimization in this entire scoring system. It provides LLMs with explicit, machine-readable descriptions of your content's type, structure, and relationships without requiring the model to infer any of this from prose.
The analyzer rewards specific schema types based on their extractability value. FAQPage earns the most points (+10) because it explicitly structures question-answer pairs, exactly the format LLMs use when generating responses. HowTo earns +8 for similar reasons. Organization and Person schemas earn +5 as trust and authorship signals. Having more than one schema type on a page earns an additional +3 for breadth.
FAQPage + Organization (+18 pts combined)
HowTo + BreadcrumbList (+11 pts combined)
No JSON-LD schema found (0 pts)
If you implement only one change from this entire pillar, make it JSON-LD schema. A single FAQPage schema on a relevant page can add 10 points to this pillar alone.
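Using the point values described above, the schema check can be sketched in a few lines. This is an illustrative reconstruction, not the analyzer's actual code; the function name and the JSON parsing shortcuts are assumptions.

```python
import json

# Point values taken from the rules above; the multi-type breadth bonus is +3.
SCHEMA_POINTS = {"FAQPage": 10, "HowTo": 8, "Organization": 5, "Person": 5}
MULTI_TYPE_BONUS = 3

def score_schema(jsonld_blocks):
    """Score a page's JSON-LD blocks by @type (hypothetical sketch)."""
    types = set()
    for block in jsonld_blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        t = data.get("@type")
        if isinstance(t, str):
            types.add(t)
        elif isinstance(t, list):  # @type may also be an array
            types.update(t)
    score = sum(SCHEMA_POINTS.get(t, 0) for t in types)
    if len(types) > 1:
        score += MULTI_TYPE_BONUS  # breadth bonus for multiple schema types
    return score
```

With this sketch, FAQPage (+10) plus Organization (+5) plus the breadth bonus (+3) reproduces the +18 combined example above, and HowTo plus BreadcrumbList (+8, +0, +3) reproduces the +11.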
LLMs extract content in chunks, and paragraphs are the natural chunking unit in prose-based content. Paragraphs that are too short provide insufficient context for accurate extraction. Paragraphs that are too long force the model to split them in unpredictable ways, often losing nuance or misattributing sentences.
The analyzer measures the average word count across all paragraphs with more than 20 characters. The ideal range is 40–120 words per paragraph. Paragraphs averaging over 200 words receive a −5 penalty. Paragraphs averaging under 40 words receive 0 points: not a penalty, but a missed opportunity.
Avg paragraph: 72 words (ideal 40–120) → +8 pts
Avg paragraph: 240 words (too long) → −5 pts
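The thresholds above translate into a simple scoring function. This is a sketch under stated assumptions: the function name is hypothetical, the +8 for the ideal range comes from the example above, and the text does not specify how averages between 120 and 200 words are scored, so this sketch treats them as neutral.

```python
def score_paragraph_length(paragraphs):
    """Average word count over paragraphs longer than 20 characters."""
    eligible = [p for p in paragraphs if len(p) > 20]
    if not eligible:
        return 0
    avg = sum(len(p.split()) for p in eligible) / len(eligible)
    if 40 <= avg <= 120:
        return 8    # ideal range, per the +8 example above
    if avg > 200:
        return -5   # too long: chunked unpredictably by the model
    return 0        # too short (or 120-200, unspecified): assumed neutral
```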
Lists are among the most LLM-friendly content formats in existence. They present information in discrete, labeled units that are trivially easy to extract and reformat. When an LLM generates a response that includes "here are five reasons why…", it is almost always drawing from list-formatted source content.
The analyzer checks for the presence of <ul> and <ol> elements. Two or more lists earn +5 points. Zero lists earns a −5 penalty, because a page with no list content at all is significantly harder for an LLM to extract from than one that structures at least some information in list form.
Converting even one section of your prose into a bulleted list, especially for steps, features, or comparisons, can meaningfully improve your extractability score.
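The list check above can be approximated with a regex over the raw HTML. A production analyzer would use a real HTML parser; this minimal sketch, with an assumed function name, only illustrates the scoring logic. The text does not say how exactly one list scores, so that case is assumed neutral here.

```python
import re

def score_lists(html):
    """Count <ul>/<ol> elements and apply the bonus/penalty rules above."""
    n = len(re.findall(r"<[uo]l\b", html, re.IGNORECASE))
    if n >= 2:
        return 5    # two or more lists: full bonus
    if n == 0:
        return -5   # no list content at all: penalty
    return 0        # exactly one list: unspecified above; assumed neutral
```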
LLMs are disproportionately likely to extract and cite content that answers direct questions or defines concepts. This is because AI assistants are primarily used in a question-answering context, and they preferentially source from content that is already structured as an answer.
The analyzer detects six specific patterns in your page's text that signal definitional or instructional intent: "What is", "How to", and the Greek patterns "Τι είναι" ("What is"), "Πώς να" ("How to"), "Οδηγός" ("Guide"), and "Βήματα" ("Steps"). Each detected pattern earns +3 points, up to a maximum of +9.
"What is LLM visibility?" + "How to improve your score" → +6 pts
No definitional patterns detected → 0 pts
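The pattern check reduces to a capped substring count. This sketch assumes exact, case-sensitive matching, which the text does not confirm; the function name is also an assumption.

```python
# The six patterns listed above, both English and Greek.
PATTERNS = ["What is", "How to", "Τι είναι", "Πώς να", "Οδηγός", "Βήματα"]

def score_question_patterns(text):
    """+3 per detected pattern, capped at +9, per the rules above."""
    hits = sum(1 for p in PATTERNS if p in text)
    return min(hits * 3, 9)
```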
HTML tables are highly structured content that LLMs can parse with near-perfect accuracy. Comparative data, specifications, pricing tables, and feature matrices are all significantly more extractable in table format than in prose, and LLMs will preferentially use table data when generating structured comparisons.
The analyzer checks for the presence of at least one <table> element. This is a modest bonus (+4 pts) because tables are relevant to a subset of page types; not every page benefits from tabular data. But for product, comparison, or reference pages, a well-structured table is a strong extractability signal.
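Since the table check is a simple presence test, it can be sketched in one line. As with the list check, this is an illustrative regex approximation rather than the analyzer's actual implementation.

```python
import re

def score_tables(html):
    """+4 if at least one <table> element is present (sketch)."""
    return 4 if re.search(r"<table\b", html, re.IGNORECASE) else 0
```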
Breadcrumbs serve a dual purpose for LLM extractability: they provide navigational context (where this page sits within the site hierarchy) and, when implemented with BreadcrumbList JSON-LD schema, they provide machine-readable path information that helps LLMs understand the topical context of the content.
The analyzer distinguishes between two levels of implementation. A BreadcrumbList JSON-LD schema earns the full +5 points; it's the most reliable signal. A breadcrumb navigation detected via CSS class or ID patterns (without schema) earns +3 points. No breadcrumb at all earns 0.
BreadcrumbList JSON-LD schema → +5 pts
HTML breadcrumb nav (no schema) → +3 pts
No breadcrumb detected → 0 pts
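The two-tier breadcrumb check can be sketched as a schema lookup with a heuristic fallback. The exact class/ID patterns the analyzer matches are not given, so the regex below is an assumption, as is the function name.

```python
import json
import re

def score_breadcrumbs(html, jsonld_blocks):
    """BreadcrumbList schema outranks class/ID heuristics (sketch)."""
    for block in jsonld_blocks:
        try:
            if json.loads(block).get("@type") == "BreadcrumbList":
                return 5  # machine-readable path: full points
        except json.JSONDecodeError:
            continue
    # Heuristic fallback; the real pattern list is unknown.
    if re.search(r'(class|id)="[^"]*breadcrumb', html, re.IGNORECASE):
        return 3
    return 0
```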
Publication and modification dates are freshness signals that LLMs use to assess content recency. For factual or rapidly changing topics, an LLM may prefer more recently dated content over older content, even if the older content is structurally superior. Explicitly marking your content with dates removes ambiguity about when it was written.
The analyzer detects date signals across multiple implementation methods: itemprop="datePublished", property="article:published_time", <time datetime="...">, and JSON-LD "datePublished". Having both datePublished and dateModified earns the full +5 points. Having only one earns +3.
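The date check can be approximated with substring detection across the methods listed above. Real attribute parsing would be more precise; this is a simplified sketch with an assumed function name and an assumed treatment of the one-signal case as +3 regardless of which signal is present.

```python
def score_dates(html, jsonld_blocks):
    """Detect publish/modify date signals (simplified substring sketch)."""
    blob = html + " ".join(jsonld_blocks)
    published = ("datePublished" in blob             # itemprop or JSON-LD
                 or "article:published_time" in blob  # Open Graph meta
                 or "<time" in blob)                  # <time datetime="...">
    modified = "dateModified" in blob
    if published and modified:
        return 5   # both signals: full points
    if published or modified:
        return 3   # only one signal
    return 0
```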
Internal links signal to LLMs that a page is part of a larger content ecosystem rather than an isolated document. They also help AI crawlers navigate and index your site more effectively. Pages with strong internal link structures tend to receive more consistent attribution because the model can contextualize them within a broader topical authority.
The analyzer counts links with href values starting with / or # (relative internal links). Five or more internal links earns the full +5 points. Two to four earns +2. Fewer than two earns 0, and for content pages this is almost always fixable by adding contextual links to related content.
8 internal links found → +5 pts
3 internal links found → +2 pts
1 internal link found → 0 pts
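The link-counting rule above reduces to a filter and two thresholds. As elsewhere, the regex-based href extraction is a sketch, not the analyzer's real parsing.

```python
import re

def score_internal_links(html):
    """Count href values starting with / or # (relative internal links)."""
    hrefs = re.findall(r'href="([^"]*)"', html)
    internal = [h for h in hrefs if h.startswith(("/", "#"))]
    if len(internal) >= 5:
        return 5   # strong internal link structure
    if len(internal) >= 2:
        return 2   # partial credit
    return 0
```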
The AI Extractability pillar has a raw maximum of 54 points, normalized to a 0–100 scale. Penalties apply for actively poor implementations (long paragraphs, no lists). The schema checks alone can contribute up to 26 points, nearly half the raw maximum.
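One plausible normalization, given only the raw maximum of 54 stated above, is to clamp the penalized raw score to zero and scale linearly. The clamping and rounding behavior here are assumptions; the text specifies only the raw maximum and the 0–100 scale.

```python
def normalize_pillar(raw_score, raw_max=54):
    """Clamp to [0, raw_max], then scale to 0-100 (assumed formula)."""
    clamped = max(0, min(raw_score, raw_max))
    return round(clamped / raw_max * 100)
```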
Unlike some pillars where improvements require content rewrites, most AI Extractability fixes are additive: you add schema, you add structure, you add links. The underlying content doesn't have to change.
Run a free analysis and get a detailed breakdown of every check with specific recommendations for your page.