(Search) Agents


Lecture

What are LLM agents?

Search agents ▷ Example query: ask the model to recommend a stock to buy based on today's global economy.

Why do we need agents? ▷ Limitation of pretraining: consider completing "The stock today is going xxx". The token xxx (up/down) is optimized over historical data: if 80% of human text says stocks go up, then xxx will likely be "up", regardless of today's market. Thus, we need web search. ▷ Limitation of SFT: the model still cannot interact with real web search, and the real world is continuously changing. ▷ Thus we need RL and agents! cf. Running SFT before running RL is usually considered helpful.

An empirical study on RL for reasoning-search interleaved LLM agents ▷ How do we do RL? Three questions: model: how do model scale and type (general vs. reasoning-specialized) affect training? tool: how do different search engines affect training the model to use tools? reward: how should we set the reward? ▷ Model types: plots of training reward, number of search calls, and test accuracy show that general models train more stably and start calling search earlier, while reasoning-specialized models struggle with format adherence; scaling improves performance, but the marginal improvement shrinks as the model size grows.
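To make the interleaved setup concrete, here is a minimal rollout sketch assuming Search-R1-style `<search>` / `<information>` / `<answer>` tags; `llm_generate` and `search_engine` are hypothetical stubs for illustration, not functions from the lecture or the paper.

```python
import re

# Hypothetical stubs: swap in a real policy-LLM call and a real retrieval backend.
def llm_generate(prompt: str, stop: list[str]) -> str:
    """Placeholder for a call to the policy LLM (e.g. an inference server)."""
    raise NotImplementedError

def search_engine(query: str, k: int = 3) -> list[str]:
    """Placeholder for a retrieval backend (web search, a Wikipedia index, ...)."""
    raise NotImplementedError

def rollout(question: str, max_turns: int = 4) -> str:
    """One reasoning-search interleaved trajectory: the model thinks, optionally
    emits <search>query</search>, gets <information>...</information> appended,
    and eventually closes with <answer>...</answer>."""
    trajectory = f"Question: {question}\n"
    for _ in range(max_turns):
        # Generation stops right after a search request or a final answer.
        chunk = llm_generate(trajectory, stop=["</search>", "</answer>"])
        trajectory += chunk
        if "<answer>" in chunk:          # final answer reached
            break
        m = re.search(r"<search>(.*)", chunk, re.DOTALL)
        if m:                            # execute the search and feed results back
            docs = search_engine(m.group(1).strip())
            trajectory += "</search>\n<information>" + "\n".join(docs) + "</information>\n"
    return trajectory
```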

▷ Reward setting: use a 'format reward' that considers both the correctness of the answer and adherence to the desired format. This speeds up convergence and improves accuracy (especially for non-instruction-tuned LLMs), but it can cause overfitting. ▷ Another reward setting: an intermediate retrieval reward; this seems not very effective.
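A minimal sketch of what such a format reward could look like; the tag names, exact-match check, and partial-credit weights here are illustrative assumptions, not the paper's exact definition.

```python
import re

def format_reward(response: str, gold_answer: str) -> float:
    """Outcome reward that also checks structural format (a common recipe for
    search-agent RL). The weights below are illustrative, not from the paper."""
    # Format check: the trajectory must contain well-formed think/answer tags.
    well_formed = (
        re.search(r"<think>.*?</think>", response, re.DOTALL) is not None
        and re.search(r"<answer>.*?</answer>", response, re.DOTALL) is not None
    )
    # Correctness check: exact match of the extracted answer against the gold label.
    m = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    correct = m is not None and m.group(1).strip().lower() == gold_answer.strip().lower()
    if correct and well_formed:
        return 1.0
    if well_formed:        # right format, wrong answer: small partial credit
        return 0.1
    return 0.0             # malformed output gets nothing
```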

ZeroSearch ▷ Another problem with search: real search APIs are expensive, so using search during training also costs a lot. ▷ TL;DR for ZeroSearch: teach LLMs to search without actually searching, by introducing a simulation LLM that mimics a search engine. ▷ Recipe: replace the search engine with an LLM, and increase the simulation LLM's noise level over training; train with standard PPO, GRPO, or REINFORCE. ▷ The paper claims ZeroSearch performs about as well as using a real search engine.
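A rough sketch of the simulated-search idea with a noise curriculum; the linear schedule, the endpoint probabilities, and the `sim_llm` callable are assumptions for illustration, not the paper's exact recipe.

```python
import random

def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.5) -> float:
    """Curriculum: the fraction of deliberately noisy documents grows over training.
    Linear schedule and endpoints are illustrative assumptions."""
    frac = min(step / max(total_steps, 1), 1.0)
    return p_start + frac * (p_end - p_start)

def simulated_search(query: str, step: int, total_steps: int, sim_llm, k: int = 3) -> list[str]:
    """Replace the real search engine with a simulation LLM: prompt it to produce
    either useful or noisy documents for the query. `sim_llm` is a stand-in for
    any text-generation callable."""
    p_noise = noise_probability(step, total_steps)
    docs = []
    for _ in range(k):
        if random.random() < p_noise:
            prompt = f"Write a plausible but unhelpful, noisy search result for: {query}"
        else:
            prompt = f"Write a relevant, informative search result for: {query}"
        docs.append(sim_llm(prompt))
    return docs
```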