Make your content interpretable, not just crawlable

Chris Green

AI agents are starting to visit our sites with a view to making purchase decisions or recommendations, and other AI engines are regurgitating our content, so we have to ensure that they understand the essence of what we provide.

@chrisgreenseo    

Chris says: “Technical success for SEO in 2026 requires us to understand the interpretability of content rather than just the crawlability.”

Why is that?

“We're getting more and more concerned with how AI, LLMs, and chatbots understand and regurgitate our content. We always should have been, but as we see more players besides Google come into the space, we need to be really sure that how these machines interpret our content is how we want to be interpreted.

Brands now have less and less control over how their content is consumed, reproduced, and summarised elsewhere. Is it even being interpreted correctly from the word go? What content of yours have they seen? If you were to query a model based on its knowledge of your content, do you get back what you want to get back?

You need to be sure of the meaning, how that meaning is established, and what is being taken from that. Does the form and structure of the content back up what you’re trying to say, or is the nuance getting lost because it’s poorly organised and expressed?

How do you know how these machines are interpreting your content?

“Part of it is analysing how they are displaying your content, and part of it is visibility tools and tracking the most likely prompts for our key users, and understanding what content is returned – not just what the references and cited sources are, which is where we get a bit hung up.

What's the nature of the content that comes back? What are the key points that are highlighted? To understand the interpretability beyond that, you can pass all of your content into a model and query it from a number of different angles. Ask for different details in different ways, and see what is returned.

You can do that via direct queries to the API itself. Basically, you send it the HTML of the page, ask some questions about that page, and then validate what it returns to make sure it’s what you're expecting. One of my favourites is e-commerce pages. If a product has a stock status within the code, does that model see that stock status? Is that correct? Is it not? What's the availability? What are the different variants? Can it understand all of those key details from what’s in the code?

Spend time parsing the content and sending lots of different queries to a model, just to see what’s coming out. Where are ambiguities or incorrect details being brought in?”
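
The API-based check described above can be sketched in a few lines. Everything here is illustrative: `build_validation_prompt`, the question list, and the sample page are invented, and the commented-out completion call stands in for whichever chat API you use.

```python
QUESTIONS = [
    "What is the stock status of this product?",
    "What variants are available?",
    "What is the price?",
]

def build_validation_prompt(html, questions):
    """Bundle a page's raw HTML with the questions you want answered from it."""
    joined = "\n".join(f"- {q}" for q in questions)
    return (
        "Answer the following questions using ONLY the HTML below.\n"
        f"{joined}\n\n--- HTML ---\n{html}"
    )

page_html = "<p>Acme X100 - In stock - £199</p>"  # stand-in for a fetched page
prompt = build_validation_prompt(page_html, QUESTIONS)

# Send `prompt` to your model of choice, then compare its answers against the
# ground truth in your product catalogue, e.g. (OpenAI-style, indicative only):
# client.chat.completions.create(model="gpt-4o",
#                                messages=[{"role": "user", "content": prompt}])
```

Looping this over your key templates, with different question sets per page type, gives you the repeated "query it from a number of different angles" check Chris describes.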

What is RAG in SEO, and why is it important?

“RAG (Retrieval-Augmented Generation) is the process by which data is retrieved and used to augment the model’s answer with other information.

From our point of view, the most useful way of thinking of that is when the answer to a search query needs validation, or backing up with other data and sources. It's an acknowledgement that the trained models don't contain all of the information. They're not super fresh, the links that they have might be wrong, the reviews/ratings might not be up to date, etc.

It's basically saying, we'll ask the model the question, and then we'll overlay that/augment that with other answers. A key one is search results. OpenAI has its own sources that it validates with. Sometimes it's Google, sometimes it's Bing, sometimes it's others, and they're using that data to validate what is part of that.

That often includes the live fetching of URLs. If there's data that the model decides it needs right now, it will go and get that. It will go and crawl that webpage, and run that analysis there and then, rather than pull it from the original training data.

There are subtle differences in those pipelines, but we need to ensure that our content works in both contexts. You need to be available for the live fetch, and what you provided in that training data needs to be good and reflect what you need it to be, however long ago the cutoff was.”
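
The two pipelines – parametric answers from training data and live fetching – can be illustrated with a toy loop. `rag_answer` and the lambda stand-ins below are hypothetical; real systems decide when to search with far more nuance.

```python
def rag_answer(query, model_answer, needs_fresh_data, live_fetch):
    """Toy retrieval-augmented loop: answer from training data first,
    then re-ask grounded in freshly fetched evidence if needed."""
    draft = model_answer(query)           # parametric (training-data) answer
    if needs_fresh_data(query, draft):    # e.g. stock, prices, recent news
        evidence = live_fetch(query)      # search results / a live page crawl
        draft = model_answer(f"{query}\nGround your answer in: {evidence}")
    return draft

# Toy stand-ins so the sketch runs end to end:
result = rag_answer(
    "Is the X100 in stock?",
    model_answer=lambda q: "in stock" if "Ground" in q else "unknown",
    needs_fresh_data=lambda q, d: d == "unknown",
    live_fetch=lambda q: "product page says: in stock",
)
print(result)  # -> in stock
```

The point of the sketch is the branch: your content has to perform well on both paths, the stale parametric one and the live fetch.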

You say that confusing semantic HTML may get crawled and be interpreted incorrectly. What's an example of that, and how do you correct it?

“One example would be what I mentioned before about stock statuses for a live fetch. Is this product in stock right now? You want that data to be on the page and easily displayed.

I have seen e-commerce sites where, within the code, you have multiple stock statuses. There may only be one stock status (i.e. it's either in stock or it's out of stock), but if it has different types within the code, the model might just see the words ‘out of stock’ somewhere and take that as the current stock. That's a live search that could detrimentally affect your visibility because, if the user is told that the page is out of stock, or an agent operating on your behalf thinks it's out of stock, they may look elsewhere.
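
That failure mode is easy to screen for. A minimal sketch, assuming the conflicting statuses appear as plain text in the markup (the sample page and regex are illustrative):

```python
import re

# Sample markup with two stock badges in the code, only one of them visible:
PAGE = """
<span class="badge hidden">Out of stock</span>
<span class="badge">In stock</span>
"""

def stock_signals(html):
    """Every stock-status phrase present in the raw markup, visible or not."""
    return re.findall(r"(?:In|Out of) stock", html, flags=re.IGNORECASE)

signals = stock_signals(PAGE)
if len({s.lower() for s in signals}) > 1:
    print("Conflicting stock statuses in markup:", signals)
```

A crawler-level audit like this across your product pages flags every URL where a model could latch onto the wrong status.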

Another less obvious example could be using divs to create what look like tables comparing different product data, but the way that you've built those divs isn't easy to interpret. An HTML table is really straightforward. It's super basic and so easy to interpret. Putting your data in a table rather than a div, or using an ordered or unordered list instead of bullet points styled with CSS, can really cement that interpretation.

It might not be flat wrong, but if you're trying to show a comparison and it's not in a table, it might not be obvious that it's a comparison, so that data might be interpreted inaccurately. There may be some guessing, particularly in the order, where the divs are displayed, and how it's rendered.

You’re removing that guesswork by marking it up as clearly as possible.”
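
The difference shows up as soon as you try to extract the data programmatically. In this sketch (markup and product names invented), the semantic table yields rows and cells mechanically, while the div version parses but carries no clue about which divs are rows and which are cells.

```python
import xml.etree.ElementTree as ET

TABLE_VERSION = """<table>
  <tr><th>Model</th><th>Battery</th></tr>
  <tr><td>X100</td><td>12h</td></tr>
</table>"""

DIV_VERSION = """<div>
  <div><div>Model</div><div>Battery</div></div>
  <div><div>X100</div><div>12h</div></div>
</div>"""

def table_rows(markup):
    """<tr>/<th>/<td> carry the meaning, so extraction is mechanical."""
    root = ET.fromstring(markup)
    return [[cell.text for cell in row] for row in root.findall("tr")]

print(table_rows(TABLE_VERSION))  # [['Model', 'Battery'], ['X100', '12h']]
# The div version parses too, but nothing in the tags says "row" or "cell",
# so a machine (or an LLM) has to guess the structure from layout alone.
```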

How do you prioritise what to fix first?

“There are two directions. Firstly, there's the straightforward revenue-based direction. If you know which pages have driven the most revenue to date, you could reasonably assume that they should continue to do so.

However, LLMs/chatbots are now tailoring experiences much more to specific people. ChatGPT has a memory, and it will deliver results based on preferences – and we have every reason to believe Google AI Mode will go in that same direction: getting more to grips with your key market, your key personas, and considering what they are searching for and how they are searching.

Therefore, you could consider what your perfect customer looks like, and what they're likely to query based on what you know. Then, you can input those queries and see what is returned.

There are two different ways to do that. You can do it via the API; you can build something very lightweight to loop through a lot of the pages on your website and send those off. If you're just doing it via an API, most of the time you'll get what's in the training data. That will give you a good view of what was present when that model was trained.

If you want to see what's coming out of the interface today, from live validation, you need to do that in the UI. Initially, there will be a lot of pasting prompts in, seeing what's returned, recording the output, and running that check – and I would prioritise it.

For most businesses, you want an indicative view. You're not going to get a perfect view because a lot of that's still being defined, frankly. Be confident that, for your key customers searching for your key products, the responses from these prompts are what you're hoping for. Use a Red, Amber, Green status. Red means it's not returning what you need at all, Amber means that it's okay, but it may be missing key details or it's not pushing the right elements forward, and Green means that it's providing exactly what you expect.
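
A Red/Amber/Green log like that can live in a spreadsheet, but even a tiny script keeps it queryable. The prompts and judgements below are invented placeholders:

```python
from collections import Counter

# prompt -> judgement after manually reviewing what the model returned
audit = {
    "best waterproof hiking boots": "Green",
    "acme x100 battery life": "Amber",
    "is the acme x100 in stock": "Red",
}

def summarise(audit):
    """Overall health of the prompt set, plus the Red prompts to fix first."""
    return Counter(audit.values()), [p for p, s in audit.items() if s == "Red"]

counts, fix_first = summarise(audit)
print(counts)     # overall health of the core prompt set
print(fix_first)  # -> ['is the acme x100 in stock']
```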

The key is that you need to be doing this relatively frequently to start with, for your core terms, because the answers that come out do change due to the nature of LLMs and how they generate responses. Fresh content is often preferred, especially when you're fetching live results or trying to ground it in search data. It's not a one-and-done thing.

Look at what's driving revenue and look at what's key from a customer point of view. I usually start with ChatGPT because it’s driving the most traffic right now. I would then see what's coming from AI overviews, and increasingly AI Mode. If you want to branch out to other LLMs, you can, but I'm not seeing the revenue coming from those directions just yet.”

AI search is still quite a small percentage of overall traffic, so how do you justify the work that you spend on that compared with Google?

“Firstly, we're very likely at the thin end of this wedge, in the sense that this traffic's going to build. The challenge is that we don’t know by exactly how much.

I've seen a lot of people compare ChatGPT traffic revenue to something like Bing and say, ‘Well, you're not paying that same attention to Bing. Are you just being a magpie, chasing the shiny things?’

Consider how a model is trained, how long it takes to collect that training data, when that cut-off window is, and how frequently it decides to do a live search.

There's so much today where, if you query ChatGPT, it will return you data that is 12 months old or more. We're now seeing from log data that these AI providers are crawling and building their next training set. They're always iterating on the next corpus that they're going to put out, and this cycle may well get quicker.

A lot of people were surprised by how early the cutoff for GPT-5 was, relative to when it was launched. The data that you may be being judged on in 6, 8, or 12 months’ time was collected three months ago, or is being collected right now. That puts more emphasis on the need to check. You want to at least make sure that the content you’re providing is the right content and gives people the best example.

If it's not, and you find out when that next iteration is released or when that model data is starting to be utilised, you could find yourself at a disadvantage. In my tests, 15-20% of queries that you put into GPT-5 are invoking search functions. For the rest, where is that information coming from?

Then, from a revenue point of view, the attributability of this is a massive problem. Your brand, your products, and your services are likely being seen. However, if people are searching on ChatGPT and then completing a purchase journey via Google, they're not clicking on a link directly. They're going straight to a website. All of this information we're providing is probably driving more clicks and revenue than we're getting attribution for, and that will almost certainly only get worse.

AIOs don't get clicked on as frequently, and we just don't have great data on AI Mode right now. We need to get less comfortable with being able to determine our value through the direct attributable revenue.

That’s probably a separate topic, but all of those reasons are why I'm saying that we can’t afford not to at least do the work that will benefit SEO, too. The fixes you can do for crawlability and interpretability will benefit all of the traditional information retrieval methods anyway, as long as you don't go nuts.

There is a hazard or health warning here: I wouldn't go back through thousands of pages and try to add some new chunking methodology or rewrite absolutely everything into triplets. That's a little extreme for now.

At least just audit it. Be confident that, if we switched to an AI search-first world tomorrow, you would be doing okay.”

You say that some content formats will be easier to work with, so how do you define the ideal content formats for you?

“Plain text, or semantic HTML that doesn't rely on JavaScript or other things, is always best. It's the easiest to crawl and consume. Consider the examples I gave earlier: if you're building a table, make it a table, using table tags. If you're building a list, use ordered lists or unordered lists.

Also, basics like semantic headings (H1s, H2s, etc.). Ensure that you have clearly marked up those sections of your page, and you’ve got a clear understanding within your writing as well. If you’re raising a question in an H2, does the first sentence of the following section answer that? If not, that format may be less effective in an AI search context. One way to think about it is: one question, one answer. Keep your concepts really clear.

Don’t chunk them into such small points that it becomes really challenging to read, but if you've got a section that's asking and answering a question, ask it simply, and answer it simply. Then, if you want to provide extra detail afterwards, do so. If your questions and answers are closely related, super clear, and have a definitive answer, that will make it far clearer. It will be less likely that your content will get confused or you'll get passed over.
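
One way to audit the "one question, one answer" pattern is to pull each question heading and the first sentence that follows it, then eyeball whether the question is answered immediately. The markup and regex here are a deliberately simplified sketch:

```python
import re

HTML = """
<h2>Is the X100 waterproof?</h2>
<p>Yes, the X100 is rated IP68. It also survives drops of up to two metres.</p>
"""

def question_answer_pairs(html):
    """Each <h2> question with the first sentence of the paragraph after it."""
    pairs = re.findall(r"<h2>(.*?)</h2>\s*<p>(.*?)</p>", html, flags=re.DOTALL)
    return [(q, body.split(". ")[0].rstrip(".") + ".") for q, body in pairs]

for question, answer in question_answer_pairs(HTML):
    print(f"Q: {question}\nA: {answer}")
```

If the extracted first sentence doesn't stand alone as an answer, that section is a candidate for rewriting.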

Probably the best things you can do are to look for divs used as tables or lists and mark them up properly. Then, other HTML5 markup, in terms of just sections of the page, can help as well.

Changing page templates at that fundamental level could be expensive. If your site's not built super well, and you start making small recommendations on page structure from HTML, it will be challenging to show the benefit, but the principle is that the more clearly marked up the page is, the easier it is for Google, LLMs, and assistive tools to understand your page.

The first things I would do are tables, lists, FAQ modules, and things like that. Mark them up really clearly so everybody understands what is in them.”
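
For FAQ modules specifically, schema.org's FAQPage structured data makes the question/answer pairing explicit. A sketch that builds the JSON-LD in Python so the structure stays valid (the question and answer are placeholders):

```python
import json

faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Is the X100 waterproof?",
            "acceptedAnswer": {"@type": "Answer", "text": "Yes, it is rated IP68."},
        }
    ],
}

# Embed the output on the page in a <script type="application/ld+json"> block.
print(json.dumps(faq, indent=2))
```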

Chris, what's the key takeaway from the tip you shared today?

“Interpreting content isn't a given. Just because what you've put on that page can be crawled doesn't mean that the interpretation of that won't be confused by other data on the page or the markup itself.

Spend some time testing it, copying and pasting the code of that page into different models, and posing it some questions. What are you getting out of it? Is that the desired response? If not, why not?”

Chris Green is Technical Director at Torque Partnership. Find out more over at TorquePartnership.com.
