Component Map¶
How all modules in the RAGWire package relate to each other — who owns what, who calls whom, and which external libraries each component depends on.
Module Dependency Graph¶
graph TD
INIT["ragwire/__init__.py\nPublic API — exports all symbols"]
INIT --> PIPE
PIPE["core/pipeline.py\nRAGWire — main orchestrator"]
PIPE --> CFG["core/config.py\nConfig"]
PIPE --> LOAD["loaders/markitdown_loader.py\nMarkItDownLoader"]
PIPE --> SPLIT["processing/splitter.py\nText Splitters"]
PIPE --> HASH["processing/hashing.py\nSHA256 Hashing"]
PIPE --> EXT["metadata/extractor.py\nMetadataExtractor"]
PIPE --> SCH["metadata/schema.py\nDocumentMetadata"]
PIPE --> EMB["embeddings/factory.py\nget_embedding"]
PIPE --> QS["vectorstores/qdrant_store.py\nQdrantStore"]
PIPE --> HYB["retriever/hybrid.py\nget_retriever / hybrid_search"]
PIPE --> LOG["utils/logging.py\nsetup_logging"]
External Library Mapping¶
| RAGWire Module | Third-Party Libraries | Notes |
|---|---|---|
markitdown_loader.py |
markitdown |
Document → Markdown conversion |
splitter.py |
langchain-text-splitters |
Markdown + recursive splitting |
extractor.py |
langchain-core (ChatPromptTemplate) |
Prompt building + LLM chain |
schema.py |
pydantic |
Metadata schema validation |
factory.py (embeddings) |
langchain-openai · langchain-ollama · langchain-huggingface · langchain-google-genai |
Lazy import — only the configured provider is loaded |
qdrant_store.py |
qdrant-client · langchain-qdrant · fastembed |
fastembed only needed for hybrid search |
hybrid.py |
langchain-qdrant (QdrantVectorStore) |
Similarity / MMR / hybrid retrieval |
config.py |
pyyaml · python-dotenv |
YAML loading + env var resolution |
pipeline.py (LLM) |
langchain-openai · langchain-ollama · langchain-google-genai · langchain-groq · langchain-anthropic |
Lazy import — only the configured provider is loaded |
RAGWire Class — Internal State¶
classDiagram
class RAGWire {
+config: dict
+loader: MarkItDownLoader
+splitter: TextSplitter
+embedding: EmbeddingModel
+metadata_extractor: MetadataExtractor
+vectorstore_wrapper: QdrantStore
+vectorstore: QdrantVectorStore
+retriever: Retriever
-_filter_fields: List[str]
-_stored_values_cache: dict or None
+ingest_documents(file_paths) dict
+ingest_directory(directory) dict
+retrieve(query, top_k, filters) List[Document]
+hybrid_search(query, k, filters) List[Document]
+extract_metadata(text) dict
+get_field_values(fields, limit) dict
+filter_fields List[str]
+discover_metadata_fields() List[str]
+get_stats() dict
-_process_document(text, file_path, ...) List[Document]
-_extract_filters_from_query(query) dict
-_build_qdrant_filter(filters) Filter
-_stored_values: dict [property]
-_initialize_logging()
-_initialize_loader()
-_initialize_splitter()
-_initialize_embeddings()
-_initialize_llm()
-_initialize_vectorstore()
-_initialize_retriever()
}
class MetadataExtractor {
+llm: ChatModel
+schema_model: BaseModel
+prompt: ChatPromptTemplate
+fields: List[str] or None
+extract(text, stored_values) dict
+extract_batch(texts, stored_values) List[dict]
+build_prompt_from_fields(fields)$ str
+from_yaml(llm, yaml_path)$ MetadataExtractor
-_parse_json_response(text) dict
}
class QdrantStore {
+client: QdrantClient
+embedding: EmbeddingModel
+collection_name: str
+config: dict
+set_collection(name)
+get_store(use_sparse) QdrantVectorStore
+create_collection(use_sparse)
+delete_collection()
+collection_exists() bool
+file_hash_exists(file_hash) bool
+get_metadata_keys() List[str]
+get_field_values(fields, limit) dict
+create_payload_indexes(fields)
+get_collection_info() CollectionInfo
}
RAGWire --> MetadataExtractor
RAGWire --> QdrantStore
Data Types Flowing Through the Pipeline¶
flowchart LR
F["str\nfile path"] -->|"MarkItDownLoader"| MD["str\nmarkdown text"]
MD -->|"TextSplitter"| CL["List[str]\nchunk texts"]
CL -->|"MetadataExtractor + metadata dict"| DL["List[Document]\npage_content + metadata"]
DL -->|"EmbeddingModel + QdrantStore"| VEC["Qdrant points\nvector + payload"]
Q["str\nquery"] -->|"EmbeddingModel"| QV["List[float]\nquery vector"]
QV -->|"Retriever"| RES["List[Document]\nranked results"]