This guide outlines an approach to implementing recency-aware Retrieval-Augmented Generation (RAG) on Databricks, focusing on integrating recency scoring into a Vector Search-based retrieval system. The key points and steps are summarized below:
Key Components
- Data Ingestion:
  - Use `binaryFile` to read new PDFs from Unity Catalog (UC) volumes.
  - Capture actual modification timestamps (`source_modified_at`) for each document.
- Document Processing:
  - Extract text, tables, and figures using `ai_parse_document()`.
  - Chunk the extracted content into smaller segments suitable for vector search indexing.
- Change Data Feed (CDF):
  - Use CDF to propagate changes in the dataset to the vector index.
  - Soft-delete outdated documents by setting `is_active` to false instead of hard-deleting them.
- Vector Search Indexing:
  - Create and manage a vector search index that filters out inactive (`is_active = false`) chunks during retrieval.
- Recency Scoring:
  - Implement exponential decay scoring based on the actual modification timestamp.
  - Adjust weights to balance similarity
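
The recency-scoring step can be illustrated with a minimal sketch in plain Python. It is not the article's implementation: the half-life parameterization, the linear blend of similarity and recency, the default values, and the names `recency_score`, `blended_score`, `rerank`, `half_life_days`, and `similarity_weight` are all assumptions made for illustration. Only the `source_modified_at` and `is_active` field names come from the summary above, and the retrieved chunks are assumed to arrive as dictionaries that already carry a similarity score.

```python
import math
from datetime import datetime, timezone


def recency_score(source_modified_at, now, half_life_days=30.0):
    """Exponential decay: 1.0 for a document modified just now, 0.5 after one half-life.

    half_life_days is a hypothetical tuning parameter, not a value from the article.
    """
    age_days = max((now - source_modified_at).total_seconds() / 86400.0, 0.0)
    return math.exp(-math.log(2) * age_days / half_life_days)


def blended_score(similarity, recency, similarity_weight=0.7):
    """Weighted sum of similarity and recency (assumed blend; weights sum to 1)."""
    return similarity_weight * similarity + (1.0 - similarity_weight) * recency


def rerank(chunks, now, half_life_days=30.0, similarity_weight=0.7):
    """Drop soft-deleted chunks, then sort the rest by blended score, best first."""
    active = [c for c in chunks if c["is_active"]]
    return sorted(
        active,
        key=lambda c: blended_score(
            c["similarity"],
            recency_score(c["source_modified_at"], now, half_life_days),
            similarity_weight,
        ),
        reverse=True,
    )


# Illustrative data only; the ids, scores, and dates are made up.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
chunks = [
    {"id": "old", "similarity": 0.80, "is_active": True,
     "source_modified_at": datetime(2023, 1, 31, tzinfo=timezone.utc)},
    {"id": "new", "similarity": 0.75, "is_active": True,
     "source_modified_at": datetime(2024, 1, 30, tzinfo=timezone.utc)},
    {"id": "retired", "similarity": 0.99, "is_active": False,
     "source_modified_at": datetime(2024, 1, 30, tzinfo=timezone.utc)},
]
ranked_ids = [c["id"] for c in rerank(chunks, now)]
```

With these sample values the one-day-old chunk outranks the year-old chunk despite its slightly lower similarity, and the soft-deleted chunk never appears. Raising `similarity_weight` toward 1.0 shifts the ranking back toward pure similarity.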
Read the full article at Towards AI - Medium



