Google’s SAGE Agentic AI Research: What It Means For SEO

Some interesting info that confirms some things about SEO.


Google published a research paper about creating a challenging dataset for training AI agents for deep research. The paper offers insights into how agentic AI deep research works, which in turn suggests how to optimize content for it.

The acronym SAGE stands for Steerable Agentic Data Generation for Deep Search with Execution Feedback.

Synthetic Question And Answer Pairs
The researchers noted that the previous state-of-the-art AI training datasets (like MuSiQue and HotpotQA) required no more than four reasoning steps to answer their questions. In terms of searches needed per question, MuSiQue averaged 2.7 and HotpotQA averaged 2.1, while another commonly used dataset, Natural Questions (NQ), required an average of only 1.3 searches per question.

These datasets created a training gap for deep search tasks that require more reasoning steps and a greater number of searches. How can you train an AI agent for complex, real-world deep search tasks if it has never been trained to tackle genuinely difficult questions?
The researchers created a system called SAGE that automatically generates high-quality, complex question-answer pairs for training AI search agents. SAGE is a “dual-agent” system where one AI writes a question and a second “search agent” AI tries to solve it, providing feedback on the complexity of the question.

The goal of the first AI is to write a question that is challenging to answer, requiring many reasoning steps and multiple searches to solve.
The goal of the second AI is to try to answer it, measuring whether the question is answerable and how difficult it is (the minimum number of search steps required).
The key to SAGE is that if the second AI solves the question too easily or gets it wrong, the specific steps and documents it found (the execution trace) are fed back to the first AI. This feedback enables the first AI to identify which of four shortcuts let the second AI solve the question in fewer steps, and to rewrite the question to close that shortcut.
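To make the loop concrete, here is a minimal Python sketch of how such a dual-agent generate-and-verify cycle could work. This is not the paper’s code; write_question, solve, and the Trace structure are hypothetical stand-ins for the question-writer AI and the search-agent AI.

```python
# Conceptual sketch of SAGE's dual-agent loop; NOT the paper's code.
# write_question() stands in for the question-writer LLM and solve()
# for the search-agent LLM; both are toy stubs here.

from dataclasses import dataclass

@dataclass
class Trace:
    answer: str           # what the search agent concluded
    num_searches: int     # how many search steps it actually used
    shortcut: str | None  # e.g. "info co-location", "multi-query collapse"

def write_question(avoid: str | None = None) -> tuple[str, str]:
    # Toy stub: a real writer would rewrite the question to close `avoid`
    return ("Which firm designed the hall that opened the year X won prize Y?",
            "Acme")

def solve(question: str) -> Trace:
    # Toy stub: a real agent would run live searches and log a trace
    return Trace(answer="Acme", num_searches=2, shortcut="info co-location")

def sage_generate(min_steps: int = 5, max_rounds: int = 10):
    avoid = None
    for _ in range(max_rounds):
        question, gold = write_question(avoid=avoid)
        trace = solve(question)
        # Keep the pair only if it is correct AND genuinely multi-hop
        if trace.answer == gold and trace.num_searches >= min_steps:
            return question, gold
        avoid = trace.shortcut  # execution trace fed back to the writer
    return None                 # never got hard enough: discard

print(sage_generate())  # -> None with these stubs (question stays too easy)
```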
SEO Takeaways
It’s possible to gain some insight into what kinds of content satisfy deep research agents. While these aren’t necessarily tactics for ranking better in agentic AI deep search, they do show what kinds of scenarios let the AI agents find all or most of the answers on a single web page.
“Information Co-location” Could Be An SEO Win
The researchers found that when multiple pieces of information required to answer a question occur in the same document, it reduces the number of search steps needed. For a publisher, this means consolidating “scattered” facts into one page prevents an AI agent from having to “hop” to a competitor’s site to find the rest of the answer.
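As a toy illustration (made-up facts, hypothetical page names, not from the paper), here is how co-location changes the number of pages an agent has to visit to assemble an answer:

```python
# Toy example (invented facts): counting the "hops" an agent needs when
# facts are scattered across pages vs. co-located on one page.
facts = ["founded 1998", "founder jane doe", "hq austin"]

scattered = {"page_a": ["founded 1998"],
             "page_b": ["founder jane doe"],
             "page_c": ["hq austin"]}
colocated = {"page_a": facts}

def hops(pages: dict[str, list[str]]) -> int:
    found, visits = set(), 0
    for url, page_facts in pages.items():
        if found >= set(facts):   # already have everything, stop hopping
            break
        visits += 1
        found |= set(page_facts)
    return visits

print(hops(scattered))  # 3 hops, two of them possibly to competitors
print(hops(colocated))  # 1 hop: the whole answer lives on one page
```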

Triggering “Multi-query Collapse”
The authors identified a phenomenon where information from different documents can be retrieved using a single query. By structuring content to answer several sub-questions at once, you enable the agent to find the full solution on your page faster, effectively “short-circuiting” the long reasoning chain the agent was prepared to undertake.
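Here is the same idea sketched from the query side (again with invented sub-questions): if one page’s text covers every sub-answer, the agent’s planned chain of queries collapses to a single retrieval.

```python
# Toy illustration of multi-query collapse (invented facts). If one
# page covers every sub-answer, three planned queries collapse to one.
sub_answers = {
    "who founded the company": "jane doe",
    "what year was it founded": "1998",
    "where is it headquartered": "austin",
}

page_text = ("Acme was founded in 1998 by Jane Doe and is "
             "headquartered in Austin, Texas.").lower()

covered = [q for q, a in sub_answers.items() if a in page_text]
queries_needed = 1 if len(covered) == len(sub_answers) else len(sub_answers)
print(queries_needed)  # 1 -> the whole reasoning chain resolved on one page
```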

Eliminating “Shortcuts” (The Reasoning Gap)
The research paper notes that the data generator fails when it accidentally creates a “shortcut” to the answer. As an SEO, your goal is to be that shortcut—providing the specific data points like calculations, dates, or names that allow the agent to reach the final answer without further exploration.
The Goal Is Still To Rank In Classic Search
For an SEO and a publisher, these shortcuts underline the value of creating a comprehensive document, because it removes the trigger for an AI agent to hop somewhere else. That doesn’t mean it’s helpful to cram all the information into one page. If it makes sense for the user, it may be better to link out from one page to another for related information.

I say that because the AI agent is conducting classic search to find answers, so the goal remains to optimize a web page for classic search. Furthermore, in this research the AI agent pulls from the top three ranked web pages for each query it executes. I don’t know whether this is how agentic AI search works in a live environment, but it’s something to consider.

In fact, one of the researchers’ tests used the Serper API to extract search results from Google.
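If you want to see how little of the SERP an agent restricted to the top three results would actually work with, here is a minimal Python sketch against the Serper API (the service the researchers used). It assumes a SERPER_API_KEY environment variable; the endpoint and the organic/title/link/snippet fields match Serper’s documented JSON response.

```python
# Minimal sketch: fetch the top three organic Google results via Serper.
# Assumes SERPER_API_KEY is set in the environment.

import os
import requests

def top_three(query: str) -> list[dict]:
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": os.environ["SERPER_API_KEY"]},
        json={"q": query},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("organic", [])[:3]
    # Only the fields an agent reads before deciding to fetch a page
    return [{"title": r.get("title"), "link": r.get("link"),
             "snippet": r.get("snippet")} for r in results]

for r in top_three("SAGE steerable agentic data generation"):
    print(r["link"])
```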

So when it comes to ranking in agentic AI search, consider these takeaways:

It may be useful to prioritize ranking in the top three.
Do optimize web pages for classic search.
Do not optimize web pages for AI search.
If it’s possible to be comprehensive, remain on-topic, and rank in the top three, then do that.
Interlink to relevant pages to help them rank in classic search, preferably in the top three (to be safe).
It could be that agentic AI search will pull from more than the top three in classic search. But it may be helpful to set the goal of ranking in the top three in classic search and to focus on ranking the other pages that may be part of the multi-hop deep research.

The research paper was published by Google on January 26, 2026. It’s available in PDF form: SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback.
 