Similar Questions in Context and Retrieval (RAG)
Medium
Why might you use a "Cross-Encoder" re-ranker after your initial vector retrieval? What is the trade-off in terms of latency?
Easy
Why is "fixed-length" chunking often insufficient? How would you handle a document where a single sentence contains a critical fact but spans a chunk boundary?
Medium
Retrieval adds a "hop" before generation. How would you minimize the time-to-first-token (TTFT) for a user?