Hello, I'm
Rohan Patil
Building AI Systems
That Scale
AI/ML Engineer with experience at Perplexity and Amazon, building production-grade LLM pipelines, RAG systems, and distributed ML infrastructure for real-world high-scale environments.
5+ yrs
Experience
25%
Latency Improvement
1M+
Requests handled
Selected Projects
Adaptive RAG Chatbot
Adaptive query routing, hybrid retrieval (vector + context-aware search), and LLM orchestration to minimize hallucinations and optimize response accuracy and latency.
LENS - AI Image Intelligence
Multi-mode AI vision system that analyzes images and generates contextual outputs including storytelling, humor, and semantic interpretation using multimodal LLMs.
Second Brain - Knowledge Graph
Paste any notes, ideas, or research and watch your thoughts come alive as an interactive force-directed knowledge graph powered by GPT-4o.
LLM Evaluation Dashboard
Tracks model performance, latency, and hallucination metrics.
Improved evaluation accuracy
Vector Search Engine
Hybrid FAISS + Redis retrieval system for semantic search.
Recall ↑ 20%
Inference Optimization System
Optimized GPU inference using batching and Triton.
Throughput ↑ 25%
Experience

AI/ML Engineer - Perplexity
June 2024 – Present · San Francisco, CA
- Architected RAG pipelines integrating vector search + web indexing.
- Built FAISS + Redis hybrid retrieval improving recall/precision tradeoff.
- Optimized Triton GPU inference → +25% throughput.
- Designed LLM routing (on-device + cloud) for sub-second latency.
- Improved factual consistency via ranking + citation pipelines.
- Built evaluation systems tracking latency, accuracy, UX metrics.
- Led 0→1 agentic AI features → +18% engagement.

AI/ML Engineer - Amazon
Oct 2019 – June 2023 · India
- Built batch + streaming pipelines using AWS, Spark, Kafka.
- Designed feature systems → +30% faster data access.
- Prevented training-serving skew in real-time ML systems.
- Built Kafka + Spark streaming pipelines for low-latency updates.
- Orchestrated ML workflows with Airflow + SageMaker.
- Built drift detection + monitoring datasets.
- Reduced infra cost by ~15% via optimization.