Produce content in the mediums that surface in AI-driven results
Katherine says: “Multimodal content isn't new, but AI is going to make it non-negotiable.”
How would you describe multimodal content?
“Basically, it is content that exists across several mediums. We have audio, video, text, and images.
For as long as I can remember, SEOs and marketers have always prioritised repurposing content across several mediums: you create a blog post, and then you transform that into video, and into infographics as well. That is multimodal content.”
Where are the best places to publish content in different forms nowadays?
“Each individual business will have a number of queries they prioritise, and it will depend on the particular features that these queries are triggering.
For instance, with most of our clients, the pages we had already optimised for previous SERP features (like featured snippets and People Also Ask) are now also triggering AI overviews.
If you are in a field where your competitors are getting citations in AI overviews – and when you check your queries, one might trigger a video search and another might trigger an image search – it would make sense to diversify your content into images, videos, or whichever format is most relevant.
You can start by publishing this content on your own brand website. For videos, you can also host them on popular platforms like YouTube.”
Why do you believe that AI Search will make this non-negotiable?
“This is because of the way that AI systems handle multimodality.
These systems are able to process, interpret, and synthesise content across different mediums. Let's say you're looking for a ceramic plant pot. Before, you could probably search for that using text on Google search, or you could upload an image to Google Lens.
Now, if you're using something like AI Mode, you can upload an image of the specific item you're looking for and add voice instructions specifying the colour you need. You don't even need to know what it’s called; you can just upload an image and add text saying, ‘What is this called?’ You will not only get specific, relevant results, but you will also get additional information in different formats that the AI system deems to be more valuable.
We’ve seen instances where you put in a query and see websites being referenced by the AI, but on top of that, you also see an image carousel and YouTube thumbnails coming out on these searches – and the images being referenced in AI Mode are completely different from the list of websites it had already referenced.
Therefore, if you don't have content in this specific format/medium that it has deemed to be relevant, you’re reducing your chances of being included or being cited in the first place.”
What are the different ways to optimize for the different platforms on which you might publish multimodal content?
“I’m going to focus on image and text, because those are the areas where we have the biggest opportunities. Many businesses are really reluctant to invest lots of money into video and audio, unless it is their primary medium. So, you can start with image optimization first and then use those results as leverage to get approval for the heavier investments.
For images, there are obviously the basic practices of optimizing for speed, format, etc. However, you can take that a step further by optimizing for relevance and context. You can look at the specific elements within the image, look at the entities that are associated with it, and check how the content is classified – including explicit-content (SafeSearch) classifications.
For instance, if you have an infographic or an image for a fashion item and you upload it to Google Cloud Vision, you will get an idea of how this content is classified. If you get a classification of ‘medical content’, that means there is a disconnect between the message you're trying to convey and what it has been classified as.
For the object side of things, LLMs don’t just interpret a specific item but also the relationships between the objects in the image. If you run it through tools like Google Cloud Vision or other LLMs, you will get a list of the specific objects within an image along with labels – and these labels are the entities it can draw from the image.
If you upload an image of swimming trunks, you might get entities like ‘swim shorts’, ‘board shorts’, ‘swimwear’, ‘fashion’, etc. This helps you validate that the entities associated with these images are the ones you want your image or your business to be associated with.
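The entity-validation step Katherine describes can be sketched as a simple check of extracted labels against the entities you want the image associated with. This is an illustrative sketch, not real Cloud Vision output: the labels below are hard-coded assumptions standing in for what an API call such as Cloud Vision's label detection would return, and the confidence threshold is arbitrary.

```python
def validate_image_entities(labels, expected_entities, threshold=0.7):
    """Compare labels a vision tool extracted from an image against
    the entities you want the image to be associated with.

    labels: list of (description, confidence) tuples.
    expected_entities: entities you want the image associated with.
    Returns which expected entities matched, which are missing, and
    which unexpected entities were detected.
    """
    detected = {desc.lower() for desc, score in labels if score >= threshold}
    expected = {e.lower() for e in expected_entities}
    return {
        "matched": sorted(expected & detected),
        "missing": sorted(expected - detected),
        "extra": sorted(detected - expected),
    }

# Hypothetical labels for a swimming-trunks product photo.
labels = [
    ("Swimwear", 0.95),
    ("Board shorts", 0.91),
    ("Fashion", 0.78),
    ("Suit", 0.40),  # low-confidence stray label, filtered out by the threshold
]
report = validate_image_entities(labels, ["swimwear", "board shorts", "swim shorts"])
print(report)
```

A ‘missing’ entity (here, ‘swim shorts’) or an unexpected ‘extra’ one is the kind of disconnect that would prompt a second look at the photograph itself.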
Also, if you deal with content that is meant to convey certain emotions, then you have to make sure that these emotions are being extracted as they should be, so the model recognises the vibe of your brand. In the same tool, you can upload that image and get an idea of the sentiment and the emotions that it is displaying.
If you're gunning for an image that elicits satisfaction and the LLM tool is interpreting it as unhappy or angry, then that does not fit in with the image you're trying to present for your brand.
Another thing you can do is diversify your photo content. It’s not enough to produce high-quality images. You can actually add more information that not only helps the LLMs but could also help your users. Let’s say you sell stainless steel ball valves. Alongside an image of that product, you could also add other technical images. That could be a second image that highlights the technical dimensions of the product and a third image that shows a cross-sectional diagram of the interior of that specific valve.
This improves your chances that your image will show up when a user searches for things that are related to these queries.
Also, if you have step-by-step content, it is not enough to just create an image for each step. I've seen a very peculiar case where an infographic was being cited by AI Mode, and I went to the website to check what made it so special, because it was the only infographic that was being cited for that specific query. It turns out that the website had different images for each step, but they then combined all these images into an infographic that was detailed enough for it to be cited.
That makes a really big difference.”
What do you do if you find that your product is associated with the wrong entity?
“In that case, you would take another look at your photography strategy.
If, for some reason, your swimming trunks are being interpreted as ‘men's suits’, then chances are the product is not shown clearly enough to be interpreted correctly, or the focus of the image is on the wrong item.
It boils down to the quality of the photograph in the first place. The resolution might have been off, the focus might have been on the wrong object, etc. Then, you will probably have to retake or change the image.
Schema can help, but why would you want to be in a situation where the wrong context can be extracted from an image, even though your schema tells another story? You want to give these systems the clearest information about your image possible.
If it's going to see that image as a suit, and you add in schema describing it as swimming trunks, I doubt that will make much difference in the long run.”
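As a hedged illustration of the point about schema supporting (rather than substituting for) a clear image: standard schema.org Product markup lets you attach multiple images to a product, which pairs naturally with the diversified photography discussed earlier. The product name, description, and URLs below are hypothetical examples.

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Men's Swimming Trunks",
  "description": "Quick-dry men's swimming trunks with an elasticated waist.",
  "image": [
    "https://www.example.com/images/swim-trunks-front.jpg",
    "https://www.example.com/images/swim-trunks-size-chart.jpg"
  ]
}
```

The markup reinforces what a clear photograph already conveys; as Katherine notes, it is unlikely to override what the systems see in the image itself.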
If people are consuming content directly on AI search engines, how do you measure the impact of that?
“For images in particular, there is no way to measure the specific volume of this. However, if you have reporting set up for AI citations, or you've already added an AI channel to your GA4 and can see events and the performance of a page, then you can optimize that page.
Let's say you optimize the page for image search, or you optimize it for video. Over time, you will start seeing really cool traffic coming from AI search or events where people are coming from these platforms. If you really have a great annotation system in place, you obviously know that these optimizations have caused those changes.
You can also use tools like Semrush to see whether that content initially triggered AI overviews or image pack search results. As I mentioned, there is currently an overlap between those features, and if a query triggers an image pack and video results, it will often also trigger an AI overview.
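The ‘AI channel’ idea mentioned above amounts to bucketing referral traffic from AI platforms separately. In GA4 this would be done with a custom channel group rather than code, but the classification logic can be sketched as a referrer match. The hostname list is an assumption; maintain your own list of the AI platforms relevant to your reporting.

```python
import re

# Assumed list of AI-platform referrer hostnames (illustrative, not exhaustive).
AI_REFERRER_PATTERN = re.compile(
    r"(chatgpt\.com|chat\.openai\.com|perplexity\.ai|"
    r"gemini\.google\.com|copilot\.microsoft\.com)",
    re.IGNORECASE,
)

def classify_channel(referrer: str) -> str:
    """Bucket a session's referrer into an 'AI' channel or 'Other'."""
    return "AI" if AI_REFERRER_PATTERN.search(referrer) else "Other"

print(classify_channel("https://chatgpt.com/"))     # AI
print(classify_channel("https://www.google.com/"))  # Other
```

Combined with annotations marking when an optimization shipped, this kind of segmentation is what lets you attribute a traffic change to the multimodal work rather than to noise.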
Therefore, if you're seeing these features being triggered after the optimization, the chances are your optimization has brought in those results.”
How do you identify opportunities to publish more content on AI search engines and in multimodal formats?
“Personally, I focus on content for queries that trigger the specific features that I have in mind.
First of all, during your competitor analysis, you can take a look at your competitors' content. What are the specific queries that are bringing in AI overviews or an image pack for them? If you have content related to these, you can prioritise it based on what is highest-converting and highest-value, and work your way from there.
It could be that your competitor is featuring in video results, image search, or infographics for a particular query, and you have content for this that is not optimized for the format that the AI systems really prioritise. That is your opportunity.”
What would you say to stakeholders who don't want to focus on AI search at the moment, because the majority of their traffic is coming from Google?
“I’m talking about AI Mode and AI overviews as well, not just ChatGPT and the rest of them.
It's not just for the other LLMs that are outside of the regular Google search; these are the AI features within Google, which your audience is definitely tapping into. If you keep ignoring those features, that's fine, but you are likely to get left behind.
If we need to demonstrate it, the first thing we do is leverage our results. That is the easiest way to do this, and we already have some AI overviews and charts that have been referenced. It’s always easier to get approval and buy-in if you already show that something is working. We can show that this content is doing great, unlike other content, where we are losing out.
You can show that their competitors are not being mentioned here, but you are. Therefore, if you take the opportunity to tap into this, you can increase your chances of getting ahead of your competitors. Then, when that happens, you take a screenshot, share it with them, and they are happy.
Often, you won’t need to go to them for the next stage. They will come to you.”
What are typically the biggest wins to begin with?
“For e-commerce, it is definitely identifying your highest-converting pages. If you have landing pages that have product photography on them, and you're not really sure how this content is being picked up, then you can optimize for those images as well.
If you deal with packaged products and goods, you can obviously optimize for the images of your packaging on those pages, so that these LLMs can easily extract information from those images when people search for similar products.”
Katherine, what's the key takeaway from the tip you shared today?
“In AI-driven multimodal search, having content across relevant mediums is really important.
Your image has to be machine-readable. It has to contain contextual elements that these systems can interpret and draw information and entities from, and you need to make sure that these match the image you're trying to present.
You can convert text content into images and other formats, but you can also structure that in better ways. You can structure content about sizes into tables, show hierarchical relationships through tree diagrams, and use bullets or infographics to show step-by-step information.
You don't need to do it across the board. Focus your multimodal efforts towards high-converting, high-value content for queries that already trigger AI overviews and SERP features. That way, you have a better shot at delivering value while improving your chances of being included in AI citations.”
Katherine Nwanorue is an SEO Specialist at Fusion Inbound. Find out more over at FusionInbound.com.