# RAGWire with FastEmbed
FastEmbed is a lightweight, fast embedding library by Qdrant. It runs locally with no API key and is optimized for CPU inference.
FastEmbed is also used internally for sparse (keyword) vectors when hybrid search is enabled — that part is automatic and requires no extra config.
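When sparse vectors are enabled, dense and sparse retrieval each produce their own ranking, and the two rankings must be fused into one result list. As an illustration only (this is not RAGWire's internal code), a common fusion method for hybrid search is reciprocal rank fusion (RRF):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked lists of doc IDs.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly in multiple lists rise to the top."""
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]   # ranking from dense (semantic) vectors
sparse = ["b", "d", "a"]  # ranking from sparse (keyword) vectors
print(rrf_fuse([dense, sparse]))  # → ['b', 'a', 'd', 'c']
```

"b" wins because it appears near the top of both lists, which is exactly the behavior hybrid search is after: agreement between semantic and keyword signals.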
## Prerequisites

- RAGWire installed: `pip install ragwire`
- FastEmbed installed: `pip install fastembed`
- Qdrant running: `docker run -d -p 6333:6333 qdrant/qdrant`
## 1. Install Dependencies

```bash
# FastEmbed for embeddings + Ollama for LLM (fully local, no cost)
pip install fastembed "ragwire[ollama]"

# Or with OpenAI for the LLM
pip install fastembed "ragwire[openai]"
```
## 2. Configuration

### FastEmbed Embeddings + Ollama LLM (fully local)

```yaml
embeddings:
  provider: "fastembed"
  model_name: "BAAI/bge-small-en-v1.5"   # 384-dim, fast and lightweight
  # model_name: "BAAI/bge-base-en-v1.5"  # 768-dim, better quality

llm:
  provider: "ollama"
  model: "qwen3.5:9b"
  base_url: "http://localhost:11434"
  num_ctx: 16384

vectorstore:
  url: "http://localhost:6333"
  collection_name: "my_docs"
  use_sparse: true
  force_recreate: false

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: false  # set true to enable LLM-based filter extraction from every query
```
### FastEmbed Embeddings + OpenAI LLM

```yaml
embeddings:
  provider: "fastembed"
  model_name: "BAAI/bge-small-en-v1.5"

llm:
  provider: "openai"
  model: "gpt-4o-mini"  # example OpenAI model

vectorstore:
  url: "http://localhost:6333"
  collection_name: "my_docs"
  use_sparse: true
  force_recreate: false

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: false  # set true to enable LLM-based filter extraction from every query
```
## 3. Python Usage

```python
from ragwire import RAGWire

rag = RAGWire("config.yaml")

# Ingest
stats = rag.ingest_documents(["data/Apple_10k_2025.pdf"])
print(f"Chunks created: {stats['chunks_created']}")

# Retrieve
results = rag.retrieve("What is Apple's total revenue?", top_k=5)
for doc in results:
    print(doc.metadata.get("company_name"), doc.page_content[:200])
```
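The loop above suggests that `retrieve()` returns document objects exposing `page_content` and `metadata`. A hypothetical post-processing step, sketched here with a stand-in `Doc` class, could keep only the top-ranked chunk per source file:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:  # stand-in for the document objects retrieve() returns
    page_content: str
    metadata: dict = field(default_factory=dict)

def top_per_source(docs: list[Doc]) -> list[Doc]:
    """Keep only the highest-ranked chunk from each source file,
    preserving the original (relevance) order."""
    seen, out = set(), []
    for doc in docs:
        src = doc.metadata.get("source")
        if src not in seen:
            seen.add(src)
            out.append(doc)
    return out

docs = [
    Doc("chunk 1", {"source": "Apple_10k_2025.pdf"}),
    Doc("chunk 2", {"source": "Apple_10k_2025.pdf"}),
    Doc("chunk 3", {"source": "other.pdf"}),
]
print([d.page_content for d in top_per_source(docs)])  # → ['chunk 1', 'chunk 3']
```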
## 4. Run the Example
## Recommended Models

| Model | Dimensions | Notes |
|---|---|---|
| `BAAI/bge-small-en-v1.5` | 384 | Default, very fast |
| `BAAI/bge-base-en-v1.5` | 768 | Better quality |
| `BAAI/bge-large-en-v1.5` | 1024 | Best quality |
| `sentence-transformers/all-MiniLM-L6-v2` | 384 | Popular alternative |
Full list: qdrant.github.io/fastembed/examples/Supported_Models
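Dimension count affects not just quality but index size: Qdrant stores dense vectors as float32, i.e. 4 bytes per dimension, so storage grows linearly with the embedding dimension. A rough back-of-the-envelope estimate:

```python
def dense_vector_bytes(n_chunks: int, dim: int) -> int:
    """Approximate raw dense-vector storage: float32 = 4 bytes per dimension.
    Ignores payload, sparse vectors, and index overhead."""
    return n_chunks * dim * 4

for name, dim in [("bge-small", 384), ("bge-base", 768), ("bge-large", 1024)]:
    mb = dense_vector_bytes(100_000, dim) / 1e6
    print(f"{name}: {mb:.0f} MB for 100k chunks")
# bge-small: 154 MB, bge-base: 307 MB, bge-large: 410 MB
```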
## Notes

- Models are downloaded and cached locally on first use.
- FastEmbed uses ONNX Runtime — fast on CPU without requiring PyTorch or CUDA.
- If you change the embedding model after ingestion, set `force_recreate: true` once to rebuild the collection.
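The last note exists because a collection is created with a fixed vector size: switching from `bge-small` (384-dim) to `bge-base` (768-dim) produces vectors the old collection cannot accept. A trivial illustration of the mismatch (not RAGWire code):

```python
def check_dims(collection_dim: int, vector: list[float]) -> bool:
    """A collection created for one vector size rejects any other size."""
    return len(vector) == collection_dim

assert check_dims(384, [0.0] * 384)       # bge-small vectors fit
assert not check_dims(384, [0.0] * 768)   # bge-base vectors would be rejected
```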