llm-mux – Framework for Cost-Efficient LLM Applications
A lightweight FastAPI framework for building LLM applications quickly, with tiered model routing, cost optimization, and observability out of the box.
Software engineer focused on distributed systems, databases, and applied ML for retrieval. I build fault-tolerant, data-driven systems that stay reliable at scale.
Built a fault-tolerant distributed lock service in Go, implementing the Raft consensus algorithm from scratch to ensure high availability and data consistency.
Successfully migrated 5 backend services to CloudAuth, executing safe, staged rollouts and ensuring business continuity by validating rollback plans.
Built an end-to-end pipeline for AI-powered health insights, serving 6M+ users. Led a cross-team API redesign that improved maintainability and scalability.
Built an observability system for DynamoDB (billions of req/s) that cut false alerts by 99%. Improved the new Rust client, reducing p99 latency by 28%.
I am deepening my specialization in applied ML and information retrieval. I am certified in Deep Learning (2022), Generative AI with LLMs (2023), and Retrieval-Augmented Generation (RAG) (2025).
I design, build, and maintain systems that are not only scalable but also resilient and performant. My goal is to create software that endures.