AI Hub
Unified AI backend integrating local LLMs, speech processing, image generation, RAG pipelines, and model training.
LLM Integration
Local LLM inference via llama.cpp with an OpenAI-compatible HTTP API. Supports model loading, GPU offloading, context management, and multiple concurrent sessions. The Copilot backend provides code-aware assistance with project context.
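Because the API is OpenAI-compatible, any standard client library works against it. A minimal standard-library sketch follows; the port, path, and model name are placeholders and should match your server configuration:

```python
import json
import urllib.request

def build_chat_request(prompt, model="local-model"):
    """Build an OpenAI-style chat completion payload (model name is a placeholder)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base_url="http://localhost:8080/v1"):
    """POST to the hub's OpenAI-compatible endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Pointing an existing OpenAI SDK at `base_url` works the same way, since the request and response shapes match the Chat Completions format.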
Speech & Audio
Speech-to-Text
Whisper integration for real-time transcription. Supports multiple languages and model sizes from tiny to large-v3.
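With the openai-whisper Python package, transcription is only a few lines. A sketch, where the audio path and language are placeholders and the size list mirrors the tiny-to-large-v3 range above:

```python
# Supported model sizes, smallest/fastest to largest/most accurate.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large-v3")

def transcribe(audio_path, model_size="base", language=None):
    """Transcribe an audio file with openai-whisper. The import is lazy so the
    dependency is only required when transcription actually runs."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {model_size}")
    import whisper  # pip install openai-whisper
    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path, language=language)
    return result["text"]
```

Passing `language=None` lets Whisper auto-detect the language; fixing it explicitly skips the detection pass.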
Text-to-Speech
Neural TTS with configurable voices, speed, and output format. Streaming audio generation for real-time playback.
RAG Pipeline
Retrieval-Augmented Generation with DuckDB vss (vector similarity search). Documents are chunked, embedded, and stored in a vector database. Queries retrieve relevant context before LLM generation.
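The storage side can be sketched in plain SQL with the vss extension. The table name and the 384-wide embedding (a common sentence-embedding width) are assumptions:

```sql
INSTALL vss;
LOAD vss;

-- chunk text alongside a fixed-width embedding vector
CREATE TABLE chunks (id INTEGER, text VARCHAR, embedding FLOAT[384]);

-- HNSW index for approximate nearest-neighbour search
CREATE INDEX chunks_hnsw ON chunks USING HNSW (embedding);

-- top-5 chunks nearest to a query embedding (bind $query from the app)
SELECT text
FROM chunks
ORDER BY array_distance(embedding, $query::FLOAT[384])
LIMIT 5;
```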
VectorStore
DuckDB-backed vector database with HNSW indexing. The store is automatically promoted to an HNSW index once its chunk count exceeds a configured threshold.
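The promotion logic amounts to a counter and a one-way flag. A sketch of that behavior; the threshold value here is illustrative, not the store's actual default:

```python
class VectorStore:
    """Sketch of automatic HNSW promotion: small stores are scanned directly,
    and an index is built once the chunk count crosses a threshold."""

    PROMOTE_AT = 10_000  # illustrative threshold, not the real default

    def __init__(self):
        self.count = 0
        self.indexed = False

    def add(self, n=1):
        """Record n newly stored chunks and promote if the threshold is crossed."""
        self.count += n
        if not self.indexed and self.count >= self.PROMOTE_AT:
            self._build_hnsw()

    def _build_hnsw(self):
        # the real store would issue CREATE INDEX ... USING HNSW here
        self.indexed = True
```

A one-way promotion keeps behavior predictable: once the index exists, deletions do not demote the store back to brute-force scans.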
CodeChunker
Language-aware code chunking that respects function/class boundaries for accurate code search and retrieval.
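The core idea is to split at declaration boundaries rather than at fixed byte offsets, so a retrieved chunk is a whole function instead of half of one. A deliberately simplified Python-only sketch (the real chunker is language-aware across many languages):

```python
import re

def chunk_code(source):
    """Split Python source at top-level def/class boundaries, so each chunk
    holds a complete declaration rather than an arbitrary slice."""
    lines = source.splitlines()
    boundaries = [0]
    for i, line in enumerate(lines):
        # top-level declarations only; indented methods stay with their class
        if re.match(r"^(def |class )", line) and i != 0:
            boundaries.append(i)
    boundaries.append(len(lines))
    chunks = []
    for start, end in zip(boundaries, boundaries[1:]):
        if start < end:
            chunks.append("\n".join(lines[start:end]))
    return chunks
```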
RAGPipeline
End-to-end pipeline: ingest sources, chunk, embed, store, query, and augment LLM prompts with relevant context.
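Those stages can be sketched end to end in a few dozen lines. The hashed bag-of-words embedding below is a toy stand-in for a real embedding model, and the prompt template is illustrative:

```python
import math

def embed(text, dim=512):
    """Toy embedding: hashed bag-of-words. A real pipeline would call an
    embedding model here."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class RAGPipeline:
    """Ingest -> chunk -> embed -> store -> query -> augment, in miniature."""

    def __init__(self):
        self.store = []  # list of (chunk_text, embedding)

    def ingest(self, text, chunk_size=200):
        for i in range(0, len(text), chunk_size):
            chunk = text[i:i + chunk_size]
            self.store.append((chunk, embed(chunk)))

    def query(self, question, k=2):
        q = embed(question)
        ranked = sorted(self.store, key=lambda c: cosine(q, c[1]), reverse=True)
        context = "\n".join(chunk for chunk, _ in ranked[:k])
        return f"Context:\n{context}\n\nQuestion: {question}"
```

The returned string is what gets sent to the LLM: retrieved context first, user question last.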
Training & GPU Management
LoRA and QLoRA fine-tuning, orchestrated through Python subprocesses. GPU memory management uses LRU eviction over loaded models' VRAM, plus an exclusive mode that reserves the GPU for training workloads.
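The interplay of LRU eviction and exclusive mode can be sketched with an ordered map keyed by model name. Sizes and the budget below are illustrative numbers, not real defaults:

```python
from collections import OrderedDict

class VramManager:
    """Sketch: evict least-recently-used models to fit new loads; a training
    job can take the GPU exclusively, clearing all inference models first."""

    def __init__(self, budget_mib):
        self.budget = budget_mib
        self.loaded = OrderedDict()  # model name -> size in MiB, LRU-first
        self.exclusive = False

    def load(self, name, size_mib):
        if self.exclusive:
            raise RuntimeError("GPU held exclusively by a training job")
        # evict from the LRU end until the new model fits
        while self.loaded and sum(self.loaded.values()) + size_mib > self.budget:
            self.loaded.popitem(last=False)
        self.loaded[name] = size_mib

    def touch(self, name):
        """Mark a model as most recently used."""
        self.loaded.move_to_end(name)

    def acquire_exclusive(self):
        """Training path: evict everything, then block further loads."""
        self.loaded.clear()
        self.exclusive = True

    def release_exclusive(self):
        self.exclusive = False
```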
