Information gain: the thing AI engines are measuring you on

Q: Can combining facts in a fresh way work, or do I need my own data?

**Short answer:** both work. You do not always need original data. A fresh combination of existing facts, or a well-evidenced case against the consensus, gives an engine something it cannot get elsewhere. As of 2026, no clean study has settled whether curation fully replaces original research. But Perplexity still rewards a non-redundant take, as long as the evidence is real.

TL;DR

Information gain is the new information a page adds beyond what competitors already say, the signal AI engines use to pick which page to cite.
Writing quality is the baseline, not the differentiator: two equally clear pages are not equal if one carries a fact the other lacks.
Original data, first-hand testing, a named expert quotation, and an evidenced counter-narrative are the four reliable sources of information gain.
A 2024 Princeton study found that adding specific statistics to a page lifted AI visibility by 41 percent.
Small publishers need no research budget: a 50-person survey with a clear method, or a documented test, counts as information gain.

Every guide to AI search optimization focuses on structured data, answer blocks, and clear H2s. All three are useful — but they explain how engines read your page, not why an engine picks your page over a competitor’s.

Perplexity, ChatGPT, and Google’s AI Overview each compare your page against every page they have read on the same topic. A 2024 Princeton study found that adding specific, verifiable numbers to a page lifted AI visibility by 41 percent. Not because numbers look more authoritative — because numbers are information the engine cannot reconstruct from pages that lack them. That gap between what your page says and what every other page already says is called information gain.

What does information gain actually measure on your page?

Information gain is not a quality score. It is not about being better-written than the competition.

It is about being different in a specific way. Your page must contain information the other pages on that topic do not have.

Why would an engine cite a page that repeats what ten others already say? It would not. The engine has no reason to. It already has ten pages covering the same ground.

A new number. A tested result. A named expert’s position. Any of those gives the engine a reason to include your page.

So does a perspective that contradicts the consensus — as long as it comes with cited evidence.

What does your page say that the others cannot? That is the question that determines whether you get cited or filtered out.

Most content advice frames the competition as a quality race. The actual competition is an information race. Two pages of equal quality are not equal if one says something the other cannot.
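One way to make the idea concrete is a toy overlap check. The sketch below is not how any engine scores pages. It is a rough proxy, assuming you treat each run of three consecutive words as one unit of content and ask what share of your units appear on no competing page. The function names, the three-word window, and the sample sentences (including the survey figures in them) are all made up for illustration.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams found in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def novelty_ratio(page: str, competitors: list[str], n: int = 3) -> float:
    """Share of the page's n-grams that appear in no competitor page."""
    own = shingles(page, n)
    if not own:
        return 0.0
    seen: set[tuple[str, ...]] = set()
    for other in competitors:
        seen |= shingles(other, n)
    return len(own - seen) / len(own)


# Hypothetical competitor pool and two hypothetical candidate pages.
competitors = [
    "Structured data helps engines read your page.",
    "Clear headings help engines read your page.",
]
repeat = "Structured data helps engines read your page."
fresh = "Our survey of 50 readers found 31 skip pages without numbers."

print(novelty_ratio(repeat, competitors))  # 0.0: every phrase already exists
print(novelty_ratio(fresh, competitors))   # 1.0: no phrase exists elsewhere
```

A score like this only tells you whether the wording is new, not whether the underlying fact is. A paraphrase of a competitor's point would score high and still carry no information gain, so treat it as a prompt for a manual read rather than a verdict.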

What counts as new information for a small publisher?

The obvious source is original research. But original does not have to mean a funded study or a large sample.

A 50-person survey with a disclosed methodology is a citable primary source. A before-and-after measurement from a real project you ran is a citable case study. A test you conducted — even an informal one — with stated conditions and a specific outcome is something the engine cannot find elsewhere.
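Part of disclosing a methodology is saying how precise a small sample can be. The sketch below uses the standard normal-approximation formula for the margin of error of a proportion. It assumes a simple random sample, a 95 percent confidence level (z = 1.96), and the worst-case split of 50/50. A self-selected audience survey will not meet those assumptions exactly, so read the result as a rough floor on the uncertainty, not a guarantee.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion from a simple random sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


moe = margin_of_error(50)
print(f"n=50: about ±{moe * 100:.1f} percentage points")  # about ±13.9
```

Stating that range next to each survey figure is what lets a reader, or an engine, treat a 50-person result as a checkable claim instead of an anecdote.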

Here are the four types that consistently produce information gain:

Original data: a survey, a benchmark, an analytics export, or a before-and-after measurement from a real implementation.
First-hand testing: a specific experiment with stated conditions and a measured result — not a description of what an approach does in theory.
A named expert quotation: a claim attributed to a verifiable, named person, not paraphrased into a generic “experts say.”
A counter-narrative with evidence: an argument against a widely accepted view, backed by cited sources rather than strong opinion alone.

What do counter-narratives give the engine that consensus pages do not? A non-redundant perspective. An AI engine synthesizing a broad answer wants to represent a range of views, not just the majority position. A page arguing against the consensus — with cited evidence — is the kind of source the engine includes to avoid looking one-sided.

The requirement is evidence. An unsupported contrarian claim is not a citation signal. It is a credibility risk.

A 2024 paper from Princeton, Georgia Tech, and the Allen Institute tested nine optimization methods across 10,000 queries on BingChat and Perplexity. The paper, known as the GEO (Generative Engine Optimization) study, found that adding statistics produced the single largest visibility lift of any method tested: 41 percent. No other technique came close.
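It helps to be clear about what a figure like 41 percent means. Read as a relative lift, it compares visibility after a change to visibility before it. It is not a gain of 41 percentage points. The sketch below shows the arithmetic. The baseline and treated scores are hypothetical numbers chosen to land on 41 percent. They are not taken from the study.

```python
def relative_lift(baseline: float, treated: float) -> float:
    """Relative change from baseline to treated, as a fraction of baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (treated - baseline) / baseline


# Hypothetical visibility scores, for illustration only.
lift = relative_lift(baseline=20.0, treated=28.2)
print(f"{lift:.0%}")  # 41%
```

The same formula works for your own before-and-after measurements, as long as you report the baseline alongside the lift so the reader can check it.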

Why does a well-written page still earn no citations?

How much of your last piece could a competitor write from the same three sources you used?

If the honest answer is “most of it,” the engine has no reason to prefer your version over theirs.

A page can be clear, structured, and genuinely useful — and still earn zero citations if it says only what the other pages already say. The engine is not grading your writing. It is measuring what you say against the pool of pages it has already read. If your page says what the others say — even more elegantly — there is no gain to extract.

Writing quality is the baseline. The differentiator is the information you carry that no one else has.

Where you place that information inside the page also changes what the engine extracts. I wrote about where AI engines look first on a page — the placement side of the same problem. Placement matters, but only after you have something worth placing.

If AI engines are citing your page but getting the facts wrong, that is a different problem. Correcting what AI engines say about your content starts from a different place than building information gain from scratch.

Why the engine may be weighing your effort, not just your words

A May 2024 leak of Google’s internal Content Warehouse API exposed an attribute called contentEffort. Google’s own description called it “LLM-based effort estimation for article pages.”

Barry Adams, an editorial SEO practitioner, named this the LLM-Effort Hypothesis. His argument: search and AI citation systems may use a language model to estimate the intellectual work behind a page. A page built on original research and first-hand testing would score differently than a page assembled from summarized existing content.

Google has not confirmed whether this signal affects AI citations specifically. It is a hypothesis, not a verified ranking factor.

But the direction it points matches the information gain principle. Content that required original work is harder to replicate. A competing page cannot reproduce a number only you measured. Whatever the mechanism — effort scoring, information gain measurement, or both — content that cannot be assembled from existing sources holds the strongest structural advantage.

Understanding how engines discover pages before they measure them is a separate question. How AI engines find and evaluate content covers the retrieval side of the chain.

What does your next piece of content contain that no competitor has?

Ask that question before you open a document.

If the honest answer is “nothing I cannot find on three other pages,” you have a choice. Find an approach that produces a real answer. Or wait until you have one.

The answer does not have to be a research study. It can be a 50-person survey with a clear methodology. It can be a documented failure: something you tried, measured, and found did not work.

A fair comparison that admits genuine trade-offs counts as information gain. So does a conversation with a named expert whose position differs from the consensus. None of those require a budget — only the discipline to ask the question before writing starts, not after.

A page that says something others cannot say gets cited. A page that says what everyone else already says becomes part of the noise the engine is filtering out.

If you are unsure whether your recent content passes that test, I am glad to take a look. Contact me and tell me what the page covers and who it was written for.

I will read it and tell you where the information gap is. No pitch and no upsell.
