Engineering Agents for Production

Architecting Deterministic Control Loops around Non-Deterministic LLMs

I build scalable agentic workflows, RAG systems, and distributed training pipelines, turning research concepts into reliable infrastructure.

AI Agent Engineer specializing in autonomous workflows and infrastructure

I bridge the gap between AI research and production-grade infrastructure. With a background in distributed GPU training and genomics sequence classification, I specialize in building the orchestration layers that make LLMs reliable at scale.

Whether it’s architecting agentic workflows to automate complex SQL generation or optimizing inference performance for multi-billion parameter models, my goal is to turn non-deterministic research concepts into high-availability systems that save real-world time and cost.

~20 hrs/wk of manual SQL generation automated
70% training cost reduction via distributed tuning
Krishna Vamsi Dhulipalla

Technical Approach

Building resilient systems that handle unpredictability. I prioritize observability, fault tolerance, and type-safety to ensure reliable behavior in production environments.

Reliability First

Implementing circuit breakers, backoff strategies, and dead-letter queues.
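As an illustration of how these three pieces can fit together, here is a minimal sketch in plain Python. It is not taken from any of the systems described on this page: the names `CircuitBreaker`, `backoff_delays`, `process`, and all thresholds are hypothetical, and the dead-letter queue is just an in-memory `deque` standing in for a real queue.

```python
import random
import time
from collections import deque


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows a trial call after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result


def backoff_delays(retries, base=0.5, cap=8.0):
    """Exponential backoff schedule in seconds, capped at `cap` (jitter is added by the caller)."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def process(task, handler, breaker, dead_letters, retries=3, sleep=time.sleep):
    """Run `handler(task)` with backoff; park the task in `dead_letters` if it never succeeds."""
    for delay in backoff_delays(retries):
        try:
            return breaker.call(handler, task)
        except CircuitOpenError:
            break  # stop hammering a dependency that is known to be down
        except Exception:
            sleep(delay + random.uniform(0, delay / 10))  # small random jitter
    dead_letters.append(task)
    return None
```

Passing `sleep` and `clock` as parameters is a design choice for testability: a test can replace them with no-op or fake versions instead of waiting on real time.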

Deterministic AI

Constraining LLM outputs via Pydantic schemas and strict validation layers.
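The idea of a strict validation layer can be sketched as follows. To keep the example dependency-free, it uses the standard library's `json` and `dataclasses` in place of Pydantic; a Pydantic model would express the same constraints declaratively. The `AgentStep` schema, its fields, and the `ALLOWED_ACTIONS` set are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass

ALLOWED_ACTIONS = {"run_query", "ask_user", "finish"}  # hypothetical action set


@dataclass(frozen=True)
class AgentStep:
    action: str
    argument: str
    confidence: float


def parse_step(raw: str) -> AgentStep:
    """Parse raw model text into an AgentStep, or raise ValueError on any deviation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"action", "argument", "confidence"}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    action, argument, conf = data["action"], data["argument"], data["confidence"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not isinstance(argument, str):
        raise ValueError("argument must be a string")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be a number in [0, 1]")
    return AgentStep(action, argument, float(conf))
```

A caller would typically treat `ValueError` as a signal to re-prompt the model or fall back to a safe default, so that nothing unvalidated reaches downstream code.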

Observability Driven

Structured logging and metric tracing to ensure system transparency.

agent-controller - zsh
vamsi@dev-box:~$ kubectl logs pod/vamsi-brain-node-1 -n production --follow
[INFO] Initializing context... Coffee_Level: 98% (OPTIMAL)
[INFO] Loading model weights... Quantizing to 4-bit because VRAM is expensive.
[WARN] Detected anomaly: "It works on my machine".
[INFO] > Containerizing environment to fix anomaly... DONE.
[ERROR] DNS_PROBE_FINISHED_NXDOMAIN. It's always DNS.
[INFO] Resolving... Bypassing DNS with hardcoded IP (don't tell security).
[INFO] Deploying 'Agent-Sandbox' to production on Friday...
[CRIT] Guardrails triggered: "Senior_Dev_Protocol_Violation". Aborting.
[SUCCESS] Rescheduling deployment to Monday. System Stable.
❯

Projects

Healthcare AI
Generative AI

Clinical Pre-Rounding Assistant

Built a FHIR-based RAG system (AWS Bedrock, LangChain) to summarize 24-48h patient changes. Implemented deterministic fact extraction, PHI redaction, and LLM-as-judge evaluation for 100% groundedness.

FHIR, Bedrock, LangChain
Infrastructure

Agent K8s Sandbox

Production-grade sandbox environment for untrusted AI-generated code. Features namespace isolation, CPU/Memory throttling, and a sidecar proxy that intercepts implementation details for observability.

Kubernetes, Go, Agents
Agentic Workflow

Autonomous UI Agent

Engineered a deterministic control loop for non-deterministic web agents. Achieved 94% task success rate by implementing a self-correcting vision pipeline that validates UI state changes before proceeding.

LangGraph, Playwright, Vision
Enterprise RAG

Internal Data Agents

Replaced 65% of manual data analyst tickets with autonomous SQL agents. Built a schema-aware RAG system that sanitizes query generation and adheres to enterprise RBAC policies.

LangChain, SQL/Pandas, K8s
LLM Efficiency

Proxy TuNER (LLaMA 2)

Slashed LLM training costs by 70% by steering frozen LLaMA 2 models with lightweight domain experts. Eliminated the need for full-parameter fine-tuning while retaining general reasoning capabilities.

PyTorch, LLaMA 2, Eff. Tuning
Real-Time System

IntelliMeet

Scaled real-time video infrastructure to support 500+ concurrent peers. Optimized WebRTC signaling and implemented client-side network recovery, reducing stream dropouts by 25%.

WebRTC, FastAPI, Computer Vision
BioComputing

DNA Sequence Classifier

Processed 1M+ genomic sequences by orchestrating distributed inference on HPC clusters. Optimized DNABERT attention heads to ignore non-coding regions, improving throughput by 30%.

Airflow, HPC/Batch, DNABERT
Agentic + Streaming

PulseMap

Built a disaster response platform capable of ingesting 10k+ events/sec. Agents triage social signals and sensor data to generate geospatial risk maps in real-time.

Kafka, Agents, Geospatial

Experience

AI Software Engineer

Tabner Inc • Jul 2025 - Present

Architected agentic workflows to automate 65% of internal data requests. Rebuilt ETL pipelines, reducing runtime by 25%, and deployed containerized services on K8s while maintaining 99.9% uptime.

Machine Learning Engineer

Virginia Tech • Aug 2024 - Jul 2025

Increased genomics throughput by 32% via LoRA and soft prompting. Orchestrated 100+ distributed GPU experiments and containerized fine-tuning workflows to slash setup time.

Graduate Research Assistant

Virginia Tech • Jun 2023 - May 2024

Improved NER F1-score by 8% using proxy-tuned LLaMA 2 models. Optimized inference performance by 30% and reduced training costs by 70%.

Software Engineer

UJR Technologies • Jul 2021 - Dec 2022

Standardized REST APIs, reducing integration defects by 40%. Automated CI/CD pipelines via GitHub Actions, cutting release failures by 20%.

Education

Virginia Tech

Master of Science, Computer Science

GPA 3.9/4.0

Vel Tech University

Bachelor of Technology, Computer Science and Engineering

GPA 8.24/10

📜 Selected Research

Predicting Circadian Transcription in mRNAs and lncRNAs

IEEE BIBM 2024

DNA Foundation Models for Cross-Species TF Binding Prediction

NeurIPS ML in CompBio 2025

Multi-omics atlas of the plant nuclear envelope

Under Review

Science Advances • University of California, Berkeley

Core Capabilities

Specialized expertise in building compliant, high-scale Healthcare AI systems and autonomous infrastructure.

Healthcare & AI

  • FHIR/HL7 Integration, Clinical Workflows
  • RAG for Healthcare, Hybrid Retrieval
  • LangChain, LangGraph, Multi-Agent Systems
  • AWS Bedrock (Claude), OpenAI API
  • Eval Frameworks (LangSmith, Guardrails)
  • Vector DBs (Pinecone, Milvus) & Chunking

Data & Compliance

  • PHI/PII Handling, HIPAA-aware Arch
  • Healthcare Data Stds, Audit Logging
  • Deterministic Fact Extraction
  • Citation Grounding & Evidence
  • Postgres, Data Governance

Programming

  • Python (Deep Learning, API), R
  • Go (High Perf), TypeScript (Frontend)
  • FastAPI, Node.js, REST & OpenAPI
  • SQL, Agile Methodologies

ML & Infra

  • PyTorch, Embeddings, MLflow
  • Docker, Kubernetes (K8s), Linux
  • CI/CD (GitHub Actions), Redis, Git
  • System Design for AI

Cloud & Tools

  • AWS (Bedrock, SageMaker, S3, Lambda)
  • AWS Step Functions, CloudWatch
  • Grafana, Prometheus, Tableau
  • GCP, Terraform (IaC)

Outside Work

👨‍🍳
Cooking
Experimenting with Indian dishes
🏏
Cricket
Weekend matches with friends
🚀
Space
Deep space exploration enthusiast
🥾
Hiking
Exploring trails & nature

Let's talk

Open to roles and collaborations. The fastest way to reach me is email or LinkedIn.