This is a great example of RAG done right: feeding domain-specific data to an LLM instead of relying on its generic training. The signal-to-noise ratio in CI logs is brutal, though. How did you handle deduplication and filtering before embedding? In my experience, that preprocessing step makes or breaks retrieval quality.
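
To make the question concrete, here is a rough sketch of the kind of pre-embedding pass I have in mind. It is not a claim about how your pipeline works. The regex patterns, the `<ts>`/`<id>`/`<n>` placeholders, and the `normalize` and `preprocess` names are all assumptions made up for illustration, and real CI logs would need their own rules.

```python
import hashlib
import re

# Hypothetical patterns for illustration; tune them to the actual log format.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?")
HEX_ID = re.compile(r"\b[0-9a-f]{7,40}\b")   # commit hashes, build ids
NUMBER = re.compile(r"\d+")                  # counters, durations, ports
NOISE = re.compile(r"^\s*$|\b(DEBUG|TRACE)\b|^Downloading ")


def normalize(line: str) -> str:
    """Replace volatile tokens so near-identical lines compare equal."""
    line = TIMESTAMP.sub("<ts>", line)
    line = HEX_ID.sub("<id>", line)
    line = NUMBER.sub("<n>", line)
    return line.strip()


def preprocess(lines):
    """Drop noisy lines, then keep the first occurrence of each normalized line."""
    seen = set()
    kept = []
    for line in lines:
        if NOISE.search(line):
            continue  # filtered: blank, low-severity, or download chatter
        key = hashlib.sha1(normalize(line).encode("utf-8")).hexdigest()
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        kept.append(line.strip())
    return kept


if __name__ == "__main__":
    sample = [
        "2024-01-01T10:00:00Z ERROR test_foo failed after 12 retries",
        "2024-01-01T10:00:05Z ERROR test_foo failed after 13 retries",
        "2024-01-01T10:00:06Z DEBUG cache hit",
    ]
    for line in preprocess(sample):
        print(line)
```

This only catches exact duplicates after normalization. Fuzzier near-duplicate detection (MinHash, or similarity on the embeddings themselves) would be a separate step.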