Building production-ready AI systems and tools
agkavin/TitanCompute
Distributed LLM inference engine with a zero-proxy streaming architecture and multi-criteria decision analysis (MCDA) scheduling across heterogeneous consumer devices
agkavin/RAG-Pipelines
Modular agentic RAG framework with autonomous tool-based reasoning, web-search retrieval, and multi-source document querying capabilities
agkavin/FloatChat-Backend
Production-grade NL-to-SQL analytics platform with hybrid RAG architecture, automated anomaly detection, and scientific PDF report generation
Real-time conversational avatar system optimized by preloading models into RAM, applying latent space optimizations in the VAE, and caching intermediate results to minimize recomputation
agkavin/Railway-Complaint-Management
Multimodal complaint classification system using Vision Transformers and OCR for automated ticket metadata extraction and severity-based routing
agkavin/llm-finetuning
Fine-tuning of Phi-3-mini-4k-instruct on an arXiv math dataset via QLoRA, optimized for domain-specific mathematical queries with local inference support
agkavin/Tokenizer-From-Scratch
Toolkit for building custom tokenizers from scratch, covering fundamental text tokenization techniques without relying on pre-built libraries
agkavin/Transformer-From-Scratch
From-scratch PyTorch implementation of the Transformer model, demonstrating the core architecture from 'Attention Is All You Need' with multi-head self-attention and positional encoding
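
For readers unfamiliar with the two building blocks named in the Transformer entry, here is a minimal sketch of sinusoidal positional encoding and single-head scaled dot-product attention. It is not taken from the repository: it uses only the Python standard library rather than PyTorch, and the function names (`positional_encoding`, `scaled_dot_product_attention`) are illustrative assumptions.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # i is the even dimension index, so the exponent is i / d_model.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V on nested lists."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
            for k_row in k
        ]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out
```

Multi-head attention, as described in the paper, runs several such attention operations in parallel on learned linear projections of the inputs and concatenates the results. The sketch omits those projections to keep the core computation visible.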