This is a great example of RAG done right: feeding domain-specific data to an LLM instead of relying on its generic training data.
The signal-to-noise ratio in CI logs is brutal, though.
Curious how you handled deduplication and filtering before embedding?
In my experience that preprocessing step makes or breaks the quality of retrieval.
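
For context, here's roughly the kind of preprocessing I mean. It's only a minimal sketch, not a guess at your pipeline: the regexes, the noise rules, and the `normalize` / `preprocess` names are all made up for illustration, and real CI logs would need their own patterns. The idea is to mask the volatile bits (timestamps, hashes, counters) so near-identical lines collapse to one, and to drop obvious noise before anything gets embedded.

```python
import hashlib
import re

# Hypothetical patterns for illustration; tune these to your own log format.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?")
HEX_ID = re.compile(r"\b[0-9a-f]{7,40}\b")   # commit hashes, build ids
NUMBER = re.compile(r"\d+")                  # retry counts, durations, ports
NOISE = re.compile(r"^(?:\s*$|Downloading |Progress:|\[DEBUG\])")


def normalize(line: str) -> str:
    """Mask volatile tokens so repeated messages produce the same string."""
    line = TIMESTAMP.sub("", line)
    line = HEX_ID.sub("<id>", line)
    line = NUMBER.sub("<n>", line)
    return line.strip()


def preprocess(lines):
    """Drop noisy lines and keep the first occurrence of each normalized line."""
    seen = set()
    kept = []
    for raw in lines:
        norm = normalize(raw)
        if NOISE.match(norm):
            continue  # blank, debug, or progress-style line
        digest = hashlib.sha1(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same message already kept, modulo masked tokens
        seen.add(digest)
        kept.append(raw.strip())
    return kept
```

Exact-match dedup on normalized text is the cheap version; it won't catch lines that differ in wording, so fuzzier approaches may be worth it depending on how repetitive your logs are.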