Update a Document Without Duplicates

RAGWire deduplicates by SHA-256 file hash, so re-ingesting an unchanged file is a no-op. To update a document, its content must change; a difference of even a single byte produces a new hash and triggers re-ingestion.

from ragwire import RAGWire

rag = RAGWire("config.yaml")

# First ingest
stats = rag.ingest_documents(["reports/Q1_2025.pdf"])
print(stats["processed"])   # → 1

# Re-run with same file — skipped automatically
stats = rag.ingest_documents(["reports/Q1_2025.pdf"])
print(stats["skipped"])     # → 1

# Update the file, re-run — new version ingested
# (old chunks remain; to remove them, set force_recreate: true and re-ingest all)
stats = rag.ingest_documents(["reports/Q1_2025.pdf"])
print(stats["processed"])   # → 1
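
To illustrate why a one-byte change is enough, here is a minimal sketch of hash-based deduplication using only the standard library. It is not RAGWire's internal code: `file_sha256`, `needs_ingest`, and the in-memory `seen_hashes` set are hypothetical names chosen for this example, and how RAGWire actually stores its hashes is not shown here.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks so large files are never loaded whole.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_ingest(path, seen_hashes):
    """Hypothetical helper: True if this file content has not been seen yet.

    `seen_hashes` is a plain set standing in for whatever store a real
    pipeline would use (an assumption made for illustration).
    """
    file_hash = file_sha256(path)
    if file_hash in seen_hashes:
        return False  # identical content, skip
    seen_hashes.add(file_hash)
    return True  # new or modified content, ingest
```

Because the hash covers the file's bytes rather than its name or timestamp, renaming or touching a file would not trigger re-ingestion under this scheme, while any edit to the content would.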

Full replacement

RAGWire does not delete old chunks when a file is updated, so chunks from the previous version remain alongside the new ones. For a full replacement, set force_recreate: true in the config and re-ingest all documents.

Scheduled Re-ingestion

For a folder that receives new or updated files regularly:

import time

import schedule
from ragwire import RAGWire

rag = RAGWire("config.yaml")

def sync():
    stats = rag.ingest_directory("data/")
    print(f"Processed: {stats['processed']}, Skipped: {stats['skipped']}, Failed: {stats['failed']}")

schedule.every(1).hours.do(sync)

while True:
    schedule.run_pending()
    time.sleep(60)

Only new or changed files are ingested on each run; unchanged files are skipped automatically.
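
One thing to watch in a long-running loop like this: as far as I know, the `schedule` library does not catch exceptions raised inside a job, so an unhandled error in `sync()` would stop the whole loop. Below is a minimal sketch of a wrapper that logs the failure and lets the next run proceed. The `keep_running` decorator and the `"sync"` logger name are hypothetical, introduced only for this example.

```python
import functools
import logging

logger = logging.getLogger("sync")  # hypothetical logger name


def keep_running(job):
    """Wrap a job so exceptions are logged instead of propagating."""

    @functools.wraps(job)
    def wrapper(*args, **kwargs):
        try:
            return job(*args, **kwargs)
        except Exception:
            # Log the full traceback; the scheduler loop keeps going.
            logger.exception("Job %s failed; it will run again on schedule", job.__name__)
            return None

    return wrapper
```

With this in place, the job could be registered as `schedule.every(1).hours.do(keep_running(sync))`. Whether swallowing errors is appropriate depends on the deployment; a failing run may be better surfaced through alerting than retried silently.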