How clear and readable your content is for both humans and AI models: sentence length, heading density, readability scores, and the ratio of content to surrounding code.
Content Clarity is the smallest of the four pillars at 15%, but it addresses something the others don't: the quality of the content itself, independent of how it's structured or marked up. A page can have perfect schema markup and a clean heading hierarchy, but if the prose is dense, convoluted, and filled with jargon, it's harder for an LLM to extract reliably and harder for a human to trust.
LLMs are sensitive to readability for a specific reason: they are trained on and optimized for human-readable text. Content that reads more naturally produces more reliable extraction outputs. Dense academic prose or keyword-stuffed marketing copy both introduce noise that degrades extraction accuracy.
Content Clarity improvements are the only changes in this scoring system that simultaneously improve your LLM visibility score and your human conversion rate. Clearer writing serves both audiences.
This pillar covers six checks, ranging from computational readability metrics to the ratio of visible content to surrounding code. The Flesch Reading Ease score, the most well-known readability metric, is calculated here, with a special provision for Greek-language content, where the formula does not apply.
Sentence length is one of the most reliable proxies for writing complexity. Long sentences typically contain multiple clauses, qualifications, and dependent phrases, all of which increase the cognitive load required to parse them. For LLMs, long sentences introduce ambiguity about what the sentence is actually asserting, which can cause extraction errors or hedging in AI-generated summaries.
The analyzer calculates average word count per sentence across all text content. Sentences averaging under 20 words earn the full +5 points. Sentences averaging 30 or more words receive a −5 penalty. Averages of 20–29 words earn 0 points: not penalized, but not rewarded either.
Avg sentence: 14 words → +5 pts
Avg sentence: 34 words → −5 pts
The most effective way to shorten average sentence length is to split compound sentences at connectives such as "and", "but", and "which", and to move subordinate clauses into separate sentences.
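The sentence-length rule above can be sketched as a small function. The function name and the naive sentence-splitting heuristic are illustrative assumptions, not the analyzer's actual implementation:

```python
import re

def sentence_length_points(text):
    """Sketch of the sentence-length check: +5 for an average
    under 20 words, -5 at 30 or more, 0 in between.
    (Hypothetical; the real analyzer's splitting may differ.)"""
    # Naive split on sentence-ending punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0
    avg_words = sum(len(s.split()) for s in sentences) / len(sentences)
    if avg_words < 20:
        return 5
    if avg_words >= 30:
        return -5
    return 0
```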
Heading density measures the ratio of heading elements (H1–H6) to paragraph elements on a page. This ratio captures a specific quality: whether the content is structured into labeled, navigable sections, or whether it's written as a continuous block of undivided prose.
The ideal ratio is between 0.15 and 0.40, meaning roughly one heading for every 2–7 paragraphs. A ratio below 0.15 indicates too few headings for the amount of content (wall-of-text risk). A ratio above 0.40 indicates too many headings relative to content (fragmented, thin-content risk). Both extremes earn 0 points.
5 headings / 20 paragraphs → ratio 0.25 (ideal) → +5 pts
2 headings / 30 paragraphs → ratio 0.07 (too few) → 0 pts
10 headings / 5 paragraphs → ratio 2.0 (too many) → 0 pts
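In code, the band check reduces to a single ratio comparison. A sketch, with the function name assumed; counting the heading and paragraph elements is left to the caller:

```python
def heading_density_points(heading_count, paragraph_count):
    """Sketch of the heading-density check: +5 when the
    heading:paragraph ratio falls in the ideal 0.15-0.40
    band, 0 otherwise (too few or too many headings)."""
    if paragraph_count == 0:
        return 0
    ratio = heading_count / paragraph_count
    return 5 if 0.15 <= ratio <= 0.40 else 0
```

Running it against the three examples above reproduces the listed outcomes.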
The Flesch Reading Ease formula is the most widely used computational readability metric. It calculates a score from 0 to 100 based on average sentence length and average number of syllables per word. Higher scores indicate simpler, more accessible text. The formula is: 206.835 − (1.015 × avg sentence length) − (84.6 × avg syllables per word).
Scores of 60 or above are considered "standard" readability and earn +5 points; this corresponds roughly to the reading level of a typical news article. Scores below 40 indicate very difficult text (academic or highly technical) and receive a −5 penalty. Scores of 40–59 earn 0 points.
Flesch score: 68 (easy to read) → +5 pts
Flesch score: 28 (very difficult) → −5 pts
Greek-language content is automatically detected and skipped for this check. The Flesch formula was developed for English and produces unreliable results for Greek text. Pages identified as Greek-language receive a neutral score (0 pts, no penalty) for this check.
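The formula and the point mapping above translate directly into code. The function names and the Greek-skip flag are illustrative assumptions:

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease, exactly as in the formula above:
    206.835 - (1.015 * avg sentence length)
            - (84.6 * avg syllables per word)."""
    return (206.835
            - 1.015 * (words / sentences)
            - 84.6 * (syllables / words))

def flesch_points(score, is_greek=False):
    """Sketch of the point mapping: +5 at 60 or above,
    -5 below 40, 0 otherwise; Greek-language pages are
    skipped with a neutral 0."""
    if is_greek:
        return 0
    if score >= 60:
        return 5
    if score < 40:
        return -5
    return 0
```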
The content-to-code ratio compares the length of the page's visible text content to the total length of its raw HTML. Pages with a high ratio are content-dense: most of the file is readable text. Pages with a low ratio are code-heavy: most of the file is HTML markup, inline CSS, JavaScript, tracking scripts, and other non-content boilerplate.
LLMs processing raw HTML (as this analyzer does) must filter out code to reach the actual content. A very low content-to-code ratio means the model has to wade through a large amount of noise to find extractable text, which can reduce extraction accuracy and increase the chance of misattribution.
A ratio of 25% or above earns +5 points. A ratio of 15–24% is considered acceptable but earns 0 points. Below 15% earns 0 and is flagged as "too much boilerplate".
Content-to-code ratio: 38% → +5 pts
Content-to-code ratio: 9% (too much boilerplate) → 0 pts
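The ratio check itself is a length comparison. A sketch, assuming the visible text has already been extracted (the real extraction step is more involved than this):

```python
def content_to_code_points(visible_text, raw_html):
    """Sketch of the content-to-code check: +5 when visible
    text is at least 25% of the raw HTML length, 0 otherwise
    (ratios below 15% are additionally flagged as
    "too much boilerplate")."""
    if not raw_html:
        return 0
    ratio = len(visible_text) / len(raw_html)
    return 5 if ratio >= 0.25 else 0
```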
Linking to authoritative external sources is a signal of content credibility. When a page references and links to well-established, high-trust sources, it contextualizes its own claims within a broader knowledge ecosystem, which LLMs interpret as a positive signal when deciding whether to cite or summarize that content.
The analyzer checks for outbound links to a specific set of recognized authority domains, including wikipedia.org, government domains (.gov, gov.gr), academic institutions (.edu), and major international news organizations (bbc.com, reuters.com, apnews.com, who.int, europa.eu). Two or more such links earn +5 points. One earns +2.
This is the only check in the Content Clarity pillar that requires you to add new content rather than restructure existing content. Even one citation to a relevant authoritative source can earn +2 points.
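The domain matching can be sketched as follows. The list below is only the subset of domains named above, and the suffix matching is deliberately loose; the analyzer's actual list and matching rules may be stricter:

```python
from urllib.parse import urlparse

# Assumed subset of the analyzer's authority list, taken
# from the domains named in the text above.
AUTHORITY_DOMAINS = (
    "wikipedia.org", ".gov", "gov.gr", ".edu",
    "bbc.com", "reuters.com", "apnews.com",
    "who.int", "europa.eu",
)

def authority_link_points(hrefs):
    """Sketch of the authority-link check: +5 for two or
    more outbound links to recognized domains, +2 for
    exactly one, 0 for none."""
    count = 0
    for href in hrefs:
        host = urlparse(href).netloc.lower()
        if any(host == d or host.endswith(d) for d in AUTHORITY_DOMAINS):
            count += 1
    if count >= 2:
        return 5
    return 2 if count == 1 else 0
```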
Video content is invisible to LLMs unless it has been transcribed or captioned in an extractable format. A page with a video as its primary content source but no accompanying text, captions, or transcript is essentially blank from an LLM extraction perspective.
The analyzer checks for two conditions. A <video> element with a <track kind="captions"> child earns +5 points; the captions are directly extractable from the HTML. A YouTube embed (youtube.com/embed) is detected and flagged as neutral: YouTube auto-captions exist but are not directly accessible from the page's HTML source.
For most pages this check earns 0 and has no impact. It becomes relevant primarily for video-first content pages, tutorial sites, and media publications.
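A regex sketch of the two conditions; the real analyzer parses the DOM rather than matching patterns, so treat this only as an illustration of the rule:

```python
import re

def video_caption_points(html):
    """Sketch of the video check: +5 when a <video> element
    contains a <track kind="captions"> child; YouTube embeds
    and pages with no video at all are neutral (0)."""
    video = re.search(r"<video\b.*?</video>", html, re.I | re.S)
    if video and re.search(
            r'<track\b[^>]*kind=["\']captions["\']',
            video.group(0), re.I):
        return 5
    # A youtube.com/embed URL would be detected and flagged
    # neutral here: captions exist but are not in the HTML.
    return 0
```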
The Content Clarity pillar has a raw maximum of 30 points for English content (25 for Greek, since Flesch is skipped), normalized to 0–100. Penalties apply for overly long sentences and very difficult Flesch scores. Unlike other pillars, most checks here reward absence of problems as much as presence of positives.
Content Clarity improvements are unique in that many of them involve editing existing content rather than adding new elements. The good news: shorter sentences and cleaner paragraphs are almost always improvements regardless of the scoring impact.
Run a free analysis and get a detailed breakdown of every check with specific recommendations for your page.