Make your content interpretable, not just crawlable
Chris says: “Technical success for SEO in 2026 requires us to understand the interpretability of content rather than just the crawlability.”
Why is that?
“We're getting more and more concerned with how AI, LLMs, and chatbots understand and regurgitate our content. We always should have been, but as we see more players besides Google come into the space, we need to be really sure that how these machines interpret our content is how we want to be interpreted.
Brands now have less and less control over how their content is consumed, reproduced, and summarised elsewhere. Is it even being interpreted correctly from the word go? What content of yours have they seen? If you were to query a model based on its knowledge of your content, do you get back what you want to get back?
You need to be sure of the meaning, how that meaning is established, and what is being taken from that. Does the form and structure of the content back up what you’re trying to say, or is the nuance getting lost because it’s poorly organised and expressed?
How do you know how these machines are interpreting your content?
“Part of it is analysing how they are displaying your content, and part of it is visibility tools and tracking the most likely prompts for our key users, and understanding what content is returned – not just what the references and cited sources are, which is where we get a bit hung up.
What's the nature of the content that comes back? What are the key points that are highlighted? To understand the interpretability beyond that, you can pass all of your content into a model and query it from a number of different angles. Ask for different details in different ways, and see what is returned.
You can do that via direct queries to the API itself. Basically, you send it the HTML of the page, ask some questions about that page, and then validate what it returns to make sure it’s what you're expecting. One of my favourites is e-commerce pages. If a product has a stock status within the code, does that model see that stock status? Is that correct? Is it not? What's the availability? What are the different variants? Can it understand all of those key details from what’s in the code?
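The check Chris describes can be sketched in a few lines. This is a minimal, hypothetical illustration: `build_messages` packages a page's HTML and your questions into the chat-style payload most model APIs accept, and `validate` compares the model's answers to the facts you know are in the code. The `model_answers` dict here is a stand-in for a parsed API response, not a real call.

```python
# Minimal interpretability check: send a page's HTML to a model with
# targeted questions, then validate the answers against known facts.
# All markup and answers below are hypothetical stand-ins.

def build_messages(page_html: str, questions: list) -> list:
    """Package the raw HTML and questions into a chat-style payload."""
    question_block = "\n".join(f"- {q}" for q in questions)
    return [
        {"role": "system",
         "content": "Answer strictly from the HTML provided. Say 'unknown' if absent."},
        {"role": "user",
         "content": f"HTML:\n{page_html}\n\nQuestions:\n{question_block}"},
    ]

def validate(answers: dict, expected: dict) -> dict:
    """Flag each answer as matching or not matching the expected value."""
    return {k: answers.get(k, "").strip().lower() == v.lower()
            for k, v in expected.items()}

page = '<p class="stock">In stock</p><span class="sku">TSHIRT-RED-M</span>'
messages = build_messages(page, ["What is the stock status?", "What is the SKU?"])

# Stand-in for the parsed model response; in practice, call your API here.
model_answers = {"stock status": "In stock", "sku": "TSHIRT-RED-M"}
report = validate(model_answers, {"stock status": "In stock", "sku": "TSHIRT-RED-M"})
print(report)
```

Looping this over your key templates, with the questions varied per page type (stock, price, variants), gives a repeatable version of the spot checks described above.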
Spend time parsing the content and sending lots of different queries to a model, just to see what’s coming out. Where is the ambiguity, or where are incorrect details being brought in?”
What is RAG in SEO, and why is it important?
“RAG (Retrieval-Augmented Generation) is the process by which data is retrieved and used to augment the model’s answer with other information.
From our point of view, the most useful way of thinking of that is when the answer to a search query needs validation, or backing up with other data and sources. It's an acknowledgement that the trained models don't contain all of the information. They're not super fresh, the links that they have might be wrong, the reviews/ratings might not be up to date, etc.
It's basically saying, we'll ask the model the question, and then we'll overlay or augment that answer with other sources. A key one is search results. OpenAI has its own sources that it validates with. Sometimes it's Google, sometimes it's Bing, sometimes it's others, and they're using that data to validate and supplement the response.
That often includes the live fetching of URLs. If there's data that the model decides it needs right now, it will go and get that. It will go and crawl that webpage, and run that analysis there and then, rather than pull it from the original training data.
There are subtle differences in those pipelines, but we need to ensure that our content works in both contexts. You need to be available for the live fetch, and what you provided in that training data needs to be good and reflect what you need it to be, however long ago the cutoff was.”
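The two pipelines described above can be reduced to a toy sketch: one path answers from a stale training snapshot, the other live-fetches the page. All of the URLs, prices, and stock values below are hypothetical, and the two dicts stand in for the training corpus and the live web respectively.

```python
# Toy illustration of the two RAG paths: answer from the model's (stale)
# training snapshot, or live-fetch and ground the answer in fresh content.
# All data here is hypothetical.

TRAINING_SNAPSHOT = {  # what the model "remembers" from its training cutoff
    "example.com/widget": "Widget X - £49 - out of stock",
}
LIVE_WEB = {           # what a live fetch would return today
    "example.com/widget": "Widget X - £44 - in stock",
}

def answer(url: str, needs_fresh_data: bool) -> str:
    """Retrieve from the live web when freshness matters; otherwise fall
    back to the training snapshot - the model's baked-in view of the page."""
    source = LIVE_WEB if needs_fresh_data else TRAINING_SNAPSHOT
    return source.get(url, "unknown")

stale = answer("example.com/widget", needs_fresh_data=False)
fresh = answer("example.com/widget", needs_fresh_data=True)
print(stale)  # the price and stock status from the cutoff
print(fresh)  # the price and stock status right now
```

The practical point is the last line of the quote: your page has to serve both branches, because you don't control which one a given query takes.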
You say that confusing semantic HTML may get crawled and be interpreted incorrectly. What's an example of that, and how do you correct it?
“One example would be what I mentioned before about stock statuses for a live fetch. Is this product in stock right now? You want that data to be on the page and easily displayed.
I have seen e-commerce sites where, within the code, you have multiple stock statuses. There may only be one real stock status (i.e. it's either in stock or it's out of stock), but if different variants appear within the code, the model might just see the words ‘out of stock’ somewhere and take that as the current stock. That's a live search that could detrimentally affect your visibility because, if the user is told that the product is out of stock, or an agent operating on your behalf thinks it's out of stock, they may look elsewhere.
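The duplicate-status problem is easy to demonstrate. In this hypothetical markup, the visible status is “In stock”, but a hidden template block also contains “Out of stock” — and a naive scan of the raw HTML surfaces both:

```python
# Why duplicate stock statuses in the code are risky: a naive reader
# (model or scraper) scanning the raw HTML sees every status, not just
# the one the user sees. Hypothetical markup below.

import re

page_html = """
<p class="stock-status">In stock</p>
<template class="oos-banner"><p>Out of stock</p></template>
"""

# A naive scan returns every status it can find, in document order.
statuses = re.findall(r"(In stock|Out of stock)", page_html)
print(statuses)  # two contradictory statuses - ambiguous to a machine
```

A model parsing this page has to guess which status is current, which is exactly the ambiguity the audit should catch.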
Another less obvious example could be using divs to create what look like tables comparing different product data, but the way that you've built those divs isn't easy to interpret. An HTML table is really straightforward. It's super basic and so easy to interpret. Putting your data in a table rather than a div, or using an ordered list instead of bullet points styled with CSS, can really cement that interpretation.
It might not be flat-out wrong, but if you're trying to show a comparison and it's not in a table, it might not be obvious that it's a comparison, so that data might be interpreted inaccurately. There may be some guessing, particularly around the order in which the divs are displayed and how they're rendered.
You’re removing that guesswork by marking it up as clearly as possible.”
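The table point is worth seeing in code. A real `<table>` carries its structure in the markup, so a parser can recover rows and columns mechanically, with no guesswork; the same data in styled divs cannot be recovered this reliably. The comparison data below is hypothetical:

```python
# A <table> declares its own structure: <tr>/<td> tags say exactly where
# each row and cell begins, so even a tiny stdlib parser recovers the
# comparison losslessly. Hypothetical product data below.

from html.parser import HTMLParser

table_html = """
<table>
  <tr><th>Product</th><th>Battery life</th></tr>
  <tr><td>Model A</td><td>10h</td></tr>
  <tr><td>Model B</td><td>14h</td></tr>
</table>
"""

class RowCollector(HTMLParser):
    """Collect cell text row by row."""
    def __init__(self):
        super().__init__()
        self.rows, self.current, self.in_cell = [], [], False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.current = []
        elif tag in ("td", "th"):
            self.in_cell = True
    def handle_endtag(self, tag):
        if tag == "tr" and self.current:
            self.rows.append(self.current)
        elif tag in ("td", "th"):
            self.in_cell = False
    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.current.append(data.strip())

parser = RowCollector()
parser.feed(table_html)
print(parser.rows)
```

Run the same data through a pile of classed divs and the parser has nothing to anchor on — which rows belong together becomes a rendering-dependent guess.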
How do you prioritise what to fix first?
“There are two directions. Firstly, there's the straightforward revenue-based direction. If you know which pages have driven the most revenue to date, you could reasonably assume that they should continue to do so.
However, LLMs/chatbots are now tailoring experiences much more to specific people. ChatGPT has a memory, and it will deliver results based on preferences – and we have every reason to believe Google AI Mode will go in that same direction: getting more to grips with your key market, your key personas, and considering what they are searching for and how they are searching.
Therefore, you could consider what your perfect customer looks like, and what they're likely to query based on what you know. Then, you can input those queries and see what is returned.
There are two different ways to do that. You can do it via the API; you can build something very lightweight to loop through a lot of the pages on your website and send those off. If you're just doing it via an API, most of the time you'll get what's in the training data. That will give you a good view of what was present when that model was trained.
If you want to see what's coming out of the interface today, from live validation, you need to do that in the UI. Initially, there will be a lot of pasting prompts in, seeing what's returned, recording the output, and running that check – and I would prioritise it.
For most businesses, you want an indicative view. You're not going to get a perfect view because a lot of that's still being defined, frankly. Be confident that, for your key customers searching for your key products, the responses from these prompts are what you're hoping for. Use a Red, Amber, Green status. Red means it's not returning what you need at all, Amber means that it's okay, but it may be missing key details or it's not pushing the right elements forward, and Green means that it's providing exactly what you expect.
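The Red/Amber/Green process above lends itself to a very small amount of tooling. This sketch assumes you record each key prompt and a manually assigned status after eyeballing the response; the prompts and statuses are hypothetical:

```python
# Minimal sketch of the Red/Amber/Green tracking described above: record
# each key prompt with a manually assigned status, then summarise which
# prompts need attention. All entries are hypothetical examples.

from collections import Counter

checks = [
    {"prompt": "best waterproof hiking boots", "status": "Green"},
    {"prompt": "are Brand X boots in stock",    "status": "Red"},
    {"prompt": "Brand X returns policy",        "status": "Amber"},
]

summary = Counter(c["status"] for c in checks)
needs_attention = [c["prompt"] for c in checks if c["status"] != "Green"]

print(dict(summary))
print(needs_attention)
```

Because, as noted below, the answers drift over time, re-running the same prompt list periodically and diffing the statuses is where this spreadsheet-grade approach earns its keep.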
The key is that you need to be doing this relatively frequently to start with, for your core terms, because the answers that come out do change due to the nature of LLMs and how they generate responses. Fresh content is often preferred, especially when you're fetching live results or trying to ground it in search data. It's not a one-and-done thing.
Look at what's driving revenue and look at what's key from a customer point of view. I usually start with ChatGPT because it’s driving the most traffic right now. I would then see what's coming from AI overviews, and increasingly AI Mode. If you want to branch out to other LLMs, you can, but I'm not seeing the revenue coming from those directions just yet.”
AI search is still quite a small percentage of overall traffic, so how do you justify the work that you spend on that compared with Google?
“Firstly, we're very likely at the thin end of this wedge, in the sense that this traffic's going to build. The challenge is that we don’t know by exactly how much.
I've seen a lot of people compare ChatGPT traffic and revenue to something like Bing and say, ‘Well, you're not paying that same attention to Bing. Are you just being a magpie, chasing the shiny things?’
Consider how a model is trained, how long it takes to collect that training data, when that cut-off window is, and how frequently it decides to do a live search.
There's so much today where, if you query ChatGPT, it will return you data that is 12 months old or more. We're now seeing from log data that these AI providers are crawling and building their next training set. They're always iterating on the next corpus that they're going to put out, and this cycle may well get quicker.
A lot of people were surprised by how early the cutoff for GPT-5 was, relative to when it was launched. The data that you may be judged on in six, eight, or 12 months’ time was collected three months ago, or is being collected right now. That puts more emphasis on the need to check. You want to at least make sure that the content you’re providing is the right content and gives people the best example.
If it's not, and you find out when that next iteration is released or when that model data starts to be utilised, you could find yourself at a disadvantage. In my tests, 15-20% of queries that you put into GPT-5 invoke search functions. For the rest, where is that information coming from?
Then, from a revenue point of view, the attributability of this is a massive problem. Your brand, your products, and your services are likely being seen. However, if people are searching on ChatGPT and then completing a purchase journey via Google, they're not clicking on a link directly. They're going straight to a website. All of this information we're providing is probably driving more clicks and revenue than we're getting attribution for, and that will almost certainly only get worse.
AIOs don't get clicked on as frequently, and we just don't have great data on AI Mode right now. We need to get comfortable with being less able to determine our value through direct, attributable revenue.
That’s probably a separate topic, but all of those reasons are why I'm saying that we can’t afford not to at least do the work that will benefit SEO, too. The fixes you can do for crawlability and interpretability will benefit all of the traditional information retrieval methods anyway, as long as you don't go nuts.
There is a hazard or health warning here: I wouldn't go back through thousands of pages and try to add some new chunking methodology or rewrite absolutely everything into triplets. That's a little extreme for now.
At least just audit it. Be confident that, if we switched to an AI search-first world tomorrow, you would be doing okay.”
You say that some content formats will be easier to work with, so how do you define the ideal content formats for you?
“Plain text, or HTML that's built semantically and doesn't rely on JavaScript or other things, is always best. It's the easiest to crawl and consume. Consider the examples I gave earlier: if you're building a table, make it a table, using table tags. If you're building a list, use ordered or unordered lists.
Also, basics like semantic headings (H1s, H2s, etc.). Ensure that you have clearly marked up those sections of your page, and that the structure is just as clear within your writing. If you’re raising a question in an H2, does the first sentence of the following section answer that? If not, that format may be less effective in an AI search context. One way to think about it is: one question, one answer. Keep your concepts really clear.
Don’t chunk them into such small points that it becomes really challenging to read, but if you've got a section that's asking and answering a question, ask it simply, and answer it simply. Then, if you want to provide extra detail afterwards, do so. If your questions and answers are closely related, super clear, and have a definitive answer, that will make it far clearer. It will be less likely that your content will get confused or you'll get passed over.
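One way to sanity-check the “one question, one answer” pattern at scale is to verify that every question-style H2 is followed immediately by an answer paragraph. This is a rough heuristic, not a standard tool, and the page fragment below is hypothetical:

```python
# Rough audit for the "one question, one answer" pattern: for each H2
# that poses a question, check whether the very next element is a <p>
# (i.e. the answer sits directly under the heading). Hypothetical page.

from html.parser import HTMLParser

page = """
<h2>How long does delivery take?</h2>
<p>Standard delivery takes 3-5 working days.</p>
<h2>Can I return an item?</h2>
<div class="widget">...</div>
"""

class QAChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.results = []    # (question, answered_immediately)
        self.pending = None  # question H2 awaiting its answer
        self.in_h2 = False
    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif self.pending is not None:
            # First element after the question: is it an answer paragraph?
            self.results.append((self.pending, tag == "p"))
            self.pending = None
    def handle_data(self, data):
        if self.in_h2 and data.strip().endswith("?"):
            self.pending = data.strip()
    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

checker = QAChecker()
checker.feed(page)
print(checker.results)
```

Here the first question passes and the second fails, because a widget, not an answer, follows it — exactly the kind of drift between heading and copy that the interview warns about.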
Probably the best things you can do are to look for divs acting as tables or lists and mark them up properly. Then, other HTML5 markup, in terms of just sections of the page, can help as well.
Changing page templates at that fundamental level could be expensive. If your site's not built super well, and you start making small recommendations on HTML page structure, it will be challenging to show the benefit, but the principle is that the more clearly marked up the page is, the easier it is for Google, LLMs, and assistive tools to understand your page.
The first things I would do are tables, lists, FAQ modules, and things like that. Mark them up really clearly so everybody understands what is in them.”
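Finding those div-built tables and lists can start with a crude heuristic: repeated sibling divs sharing a class name are often a list or table in disguise. The markup and the threshold of three repeats below are illustrative assumptions, not a rule:

```python
# Quick audit heuristic: divs whose class name repeats several times are
# candidates for proper <table>/<ul> markup. Markup and the threshold of
# three repeats are hypothetical choices for illustration.

import re
from collections import Counter

html = """
<div class="spec-row"><div>Weight</div><div>1.2kg</div></div>
<div class="spec-row"><div>Colour</div><div>Blue</div></div>
<div class="spec-row"><div>Material</div><div>Steel</div></div>
<div class="hero-banner">Buy now</div>
"""

classes = re.findall(r'<div class="([\w-]+)"', html)
repeated = {cls: n for cls, n in Counter(classes).items() if n >= 3}
print(repeated)  # classes repeated 3+ times warrant a manual look
```

A real audit would use a proper parser and tune the threshold per site, but even this level of triage surfaces the templates worth remarking up first.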
Chris, what's the key takeaway from the tip you shared today?
“Interpreting content isn't a given. Just because what you've put on that page can be crawled doesn't mean that the interpretation of that won't be confused by other data on the page or the markup itself.
Spend some time testing it, copying and pasting the code of that page into different models, and posing it some questions. What are you getting out of it? Is that the desired response? If not, why not?”
Chris Green is Technical Director at Torque Partnership. Find out more over at TorquePartnership.com.