Skip to content

RAGWire with FastEmbed

FastEmbed is a lightweight, fast embedding library by Qdrant. It runs locally with no API key and is optimized for CPU inference.

FastEmbed is also used internally for sparse (keyword) vectors when hybrid search is enabled — that part is automatic and requires no extra config.

Prerequisites

  • RAGWire installed: pip install ragwire
  • FastEmbed installed: pip install fastembed
  • Qdrant running: docker run -d -p 6333:6333 qdrant/qdrant

1. Install Dependencies

# FastEmbed for embeddings + Ollama for LLM (fully local, no cost)
pip install fastembed "ragwire[ollama]"

# Or with OpenAI for LLM
pip install fastembed "ragwire[openai]"

2. Configuration

FastEmbed Embeddings + Ollama LLM (fully local)

embeddings:
  provider: "fastembed"
  model_name: "BAAI/bge-small-en-v1.5"    # 384-dim, fast and lightweight
  # model_name: "BAAI/bge-base-en-v1.5"   # 768-dim, better quality

llm:
  provider: "ollama"
  model: "qwen3.5:9b"
  base_url: "http://localhost:11434"
  num_ctx: 16384

vectorstore:
  url: "http://localhost:6333"
  collection_name: "my_docs"
  use_sparse: true
  force_recreate: false

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: false   # set true to enable LLM-based filter extraction from every query

FastEmbed Embeddings + OpenAI LLM

embeddings:
  provider: "fastembed"
  model_name: "BAAI/bge-small-en-v1.5"

llm:
  provider: "ollama"
  model: "qwen3.5:9b"
  base_url: "http://localhost:11434"

vectorstore:
  url: "http://localhost:6333"
  collection_name: "my_docs"
  use_sparse: true
  force_recreate: false

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: false   # set true to enable LLM-based filter extraction from every query

3. Python Usage

from ragwire import RAGWire

rag = RAGWire("config.yaml")

# Ingest
stats = rag.ingest_documents(["data/Apple_10k_2025.pdf"])
print(f"Chunks created: {stats['chunks_created']}")

# Retrieve
results = rag.retrieve("What is Apple's total revenue?", top_k=5)
for doc in results:
    print(doc.metadata.get("company_name"), doc.page_content[:200])

4. Run the Example

python examples/basic_usage.py
Model Dimensions Notes
BAAI/bge-small-en-v1.5 384 Default, very fast
BAAI/bge-base-en-v1.5 768 Better quality
BAAI/bge-large-en-v1.5 1024 Best quality
sentence-transformers/all-MiniLM-L6-v2 384 Popular alternative

Full list: qdrant.github.io/fastembed/examples/Supported_Models

Notes

  • Models are downloaded and cached locally on first use.
  • FastEmbed uses ONNX Runtime — fast on CPU without requiring PyTorch or CUDA.
  • If you change the model after ingestion, set force_recreate: true once to rebuild the collection.