Similar Questions in AI System Design
Medium: Explain the pattern of searching for small, granular chunks but feeding a larger "parent" context to the LLM. Why is this more accurate?
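The "small-to-big" pattern the question describes can be sketched as follows. This is a minimal illustration, not any particular library's API: small child chunks are indexed for precise matching, but retrieval returns the larger parent they came from. Naive token overlap stands in for real embedding similarity, and all names (`build_index`, `retrieve_parent`) are hypothetical.

```python
# Sketch of parent-document retrieval: match small chunks, return big context.
def chunk(text, size):
    """Split text into fixed-size word chunks (stand-in for a real splitter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(parents, child_size=8):
    """Index (child_chunk, parent_id) pairs so each small chunk maps back to its parent."""
    index = []
    for pid, parent in enumerate(parents):
        for child in chunk(parent, child_size):
            index.append((child, pid))
    return index

def score(query, text):
    """Naive token-overlap score; a real system would use embedding similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t)

def retrieve_parent(query, parents, index):
    """Match against small chunks (precise), but hand the LLM the full parent (context-rich)."""
    best_child, pid = max(index, key=lambda item: score(query, item[0]))
    return parents[pid]
```

Small chunks keep the similarity match sharp (less unrelated text diluting the score), while the parent gives the LLM enough surrounding context to actually answer.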
Hard: Agent systems can be expensive due to multiple model calls. How would you optimize for cost and latency without sacrificing quality? When would you introduce caching, batching, or model downgrades?
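Two of the levers this question mentions, caching and model downgrades, can be sketched in a few lines. This is a hedged illustration, not a real provider SDK: `call_model` is a placeholder for an actual LLM API call, and the length-based routing heuristic is an assumption chosen for simplicity.

```python
import hashlib

def call_model(model, prompt):
    """Placeholder for a real LLM API call."""
    return f"{model} answer to: {prompt}"

class CachedLLM:
    """Memoize identical (model, prompt) calls so repeated agent steps cost nothing."""
    def __init__(self, call_fn):
        self.call_fn = call_fn
        self.cache = {}
        self.hits = 0

    def __call__(self, model, prompt):
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        result = self.call_fn(model, prompt)
        self.cache[key] = result
        return result

def route(prompt, cheap_model="small-model", strong_model="large-model", threshold=200):
    """Hypothetical downgrade heuristic: send short, simple prompts to the cheaper model."""
    return cheap_model if len(prompt) < threshold else strong_model
```

Caching pays off when agents retry or revisit identical sub-tasks; routing pays off when many calls are simple enough that a smaller model's answer is indistinguishable from the large model's.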
Hard: If a tool returns a massive 50MB JSON file, you can't feed that to the LLM. How do you "summarize" or "filter" tool observations for the agent?
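One common shape of answer to this question can be sketched as follows: instead of forwarding the raw payload, reduce it to a schema sketch plus a small sample, with a hard character cap as a last resort. The function name and the 2,000-character budget are assumptions for illustration.

```python
import json

MAX_CHARS = 2000  # assumed budget for what the agent is allowed to see

def summarize_observation(raw_json, max_items=3):
    """Reduce a large JSON tool result to a schema sketch plus a small sample."""
    data = json.loads(raw_json)
    if isinstance(data, list):
        # Report the shape and a few example items rather than the whole list.
        summary = {"type": "list", "length": len(data), "sample": data[:max_items]}
    elif isinstance(data, dict):
        # Keys alone often tell the agent which follow-up query to make.
        summary = {"type": "object", "keys": sorted(data.keys())}
    else:
        summary = {"type": type(data).__name__, "value": data}
    return json.dumps(summary)[:MAX_CHARS]  # hard truncation as a safety net
```

The agent can then issue a narrower follow-up call (e.g. request one record by id) instead of ever seeing the full 50MB.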