Applied AI Systems Engineer

Akhilesh Veerapareddy

I build and own production AI systems end-to-end.

Work spans agent orchestration, inference paths, data and ML platforms, and real-time decision systems, designed through clear interfaces and pipelines, with controlled releases and observable behavior in production.

In production, the failure is rarely the model—it is the implicit assumption about ordering, timing, or partial failure that never made it into the contract.

Ten years across backend, distributed, full-stack, and cloud systems. Agentic orchestration and ML platforms are where the work is.


Background

Operating experience

Work has moved from backend and distributed platform foundations into applied AI and real-time systems. What stayed constant is the integration problem: data, APIs, and orchestration meeting in places where production behavior is actually decided.

Reliability, latency, correctness under bad inputs, and enough signal to debug when behavior diverges under load: those are the constraints that keep showing up.

Roles

  • 2021 - present

    Senior Software Engineer, Mouritech LLC

    I build and own production systems end-to-end: designing and operating distributed systems across service layers, data pipelines, and orchestration boundaries.

    Ownership extends beyond delivery into how systems behave in production: degradation modes, rollback paths, incident response, and closing the gap between intended design and observed behavior.

    A growing share of that work is applied AI: integrating agentic workflows, orchestration layers, and retrieval-backed pipelines. Routing, evaluation, and reliability are treated as system concerns, not afterthoughts.

    The focus is on execution and integration around inference and retrieval, not training models from scratch. Most of the engineering judgment shows up when something upstream is slow, wrong, or missing: whether behavior stays bounded and explainable matters more than the nominal path.

    These systems span real-time and event-driven paths alongside batch and long-running workloads. Under load, the same constraints repeat: late or inconsistent inputs, schema drift, latency versus correctness tradeoffs, and observability that can reconstruct system behavior during incidents.

  • 2015 - 2021

    Associate Manager, Engineering, Mouritech Pvt Ltd

    Led architecture review and delivery across distributed systems, translating constraints into interfaces, boundaries, and failure modes that remain legible under incident conditions. The bar was operability: teams could reason about system behavior in production, not only on a diagram.

Focus

How I think about systems

Production is where models, data, and orchestration meet—and where timeouts, drift, and partial failure become the real spec. Under load, the seams you assumed were synchronized are usually where behavior first surprises you.

Failure behavior is part of the design.

When inputs skew or fail, outcomes stay explainable: retries, degradation, and escalation are explicit. The happy path is not the contract.

Constraints before interfaces.

Promise what you can defend on latency, consistency, and policy—then draw boundaries. Reverse that order and you tune the wrong layer.

What ran beats what was intended.

Incidents do not read the design doc—they read spans. That is the behavior you actually operated.

Own the tradeoff.

You cannot max every axis. I name what we optimize and revisit when traffic or risk shifts.

Models sit inside a system.

Routing, retrieval, evaluation, data quality—the model checkpoint sits in the middle; reliability is everything around it.

Where this shows up

  • Agentic systems & orchestration

    Multi-step workflows and tool routing; RAG and NL-to-SQL behind APIs. Grounding and guardrails as operational concerns, not demo polish.

  • Inference & GPU serving

    Request paths, batching, queue depth, autoscaling signals, tail latency and cost. Tie slow or expensive calls to decisions in the serving stack—scheduling and capacity, not kernels.

  • ML platforms & control planes

    Lineage, promotion, serving telemetry—a hard wall between experiment and production so changes stay reviewable and rollback stays plausible.

  • Streaming & online decisions

    High-volume events and continuous signals; replay and reprocessing when online behavior has to be explained or corrected after the fact.

  • Distributed data processing

    Services and batch/stream pipelines at scale: correctness under schema drift, incremental vs batch tradeoffs, observability across hops—not bigger batches for their own sake.

  • Autonomy-related software

    Planning, replay, simulation, staged rollout—timing, logging, and evaluation loops, not only a model score.

Stack

Representative of layers I've built and operated—not a feature checklist.

Systems & application layer

Python · TypeScript · FastAPI · gRPC

Data & streaming

PostgreSQL · Redis · Kafka · Airflow

AI & inference

PyTorch · MLflow · serving paths, evaluation hooks, workflow integration around inference

Platform & operations

Docker · Kubernetes · AWS

Control surfaces

React · Next.js — operator and internal UIs when the system ships with a surface

Work

Systems built and owned

Constraints, interfaces, and behavior under load—not feature lists.

Summaries are drawn from shipping and operating real systems; names and commercial specifics are omitted. Links only where code is public.

Agentic AI Platform

A multi-agent execution platform over models, tools, and enterprise data, with explicit workflows, bounded actions, and traceable outcomes for every run.

Context

Teams increasingly embed LLMs into workflows that query systems, join data, and take actions. These flows break under real conditions: timeouts, partial data, tool failures, and policy constraints. The gap is rarely model capability alone. It is execution semantics: how work is decomposed, how tools are invoked safely, and how a run can be reconstructed when results are questioned.

System design

The platform treats reasoning as part of a structured execution system rather than a single model call. Workflows are defined as execution graphs where planners, specialist agents, and validators operate within clear boundaries. Retrieval, NL-to-SQL, observability queries, and external tools are invoked through typed contracts rather than embedded prompt logic.

A central orchestrator owns execution state, step scheduling, retries, and failure handling. Runs are durable and replayable. They can resume after interruption, return partial results when dependencies fail, and preserve traceability across every step.

Validation and policy enforcement are part of the execution path. Outputs are grounded in evidence, and any action is evaluated against access control and policy constraints before execution. Approval gates are enforced for higher-risk operations. Each step records inputs, outputs, latency, tool usage, and outcomes. This makes it possible to inspect where a run stalled, failed, or produced low-confidence results.

The system prioritizes explicit contracts and observability over implicit behavior. A post-execution analysis layer, the Mukti Agent, processes traces to identify recurring failure patterns, planning inefficiencies, and validation gaps. Improvements are introduced through controlled updates rather than uncontrolled online learning.
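The orchestrator pattern above can be sketched minimally. This is an illustrative simplification, not the platform's actual code: tool names, the `ToolCall`/`StepRecord` types, and the retry policy are hypothetical, and a real runtime would add durability, scheduling, and policy gates around this core.

```python
from dataclasses import dataclass
from typing import Any, Callable
import time

@dataclass
class ToolCall:
    tool: str
    args: dict

@dataclass
class StepRecord:
    tool: str
    args: dict
    output: Any
    latency_s: float
    ok: bool

class Orchestrator:
    """Owns execution state: invokes tools only through registered
    contracts, retries on failure, and records a trace for every step
    so a run can be reconstructed after the fact."""

    def __init__(self, tools: dict[str, Callable[..., Any]], max_retries: int = 2):
        self.tools = tools
        self.max_retries = max_retries
        self.trace: list[StepRecord] = []

    def run_step(self, call: ToolCall) -> Any:
        if call.tool not in self.tools:
            raise KeyError(f"no contract registered for tool {call.tool!r}")
        last_err: Exception | None = None
        for _attempt in range(self.max_retries + 1):
            start = time.monotonic()
            try:
                out = self.tools[call.tool](**call.args)
                # Record the successful step, including latency.
                self.trace.append(StepRecord(call.tool, call.args, out,
                                             time.monotonic() - start, True))
                return out
            except Exception as e:
                # Failures are recorded too; the trace shows every attempt.
                last_err = e
                self.trace.append(StepRecord(call.tool, call.args, repr(e),
                                             time.monotonic() - start, False))
        raise RuntimeError(f"step failed after retries: {last_err}")
```

The point of the sketch is the shape, not the details: tools are invoked through a registry rather than inline logic, and every attempt, failed or not, lands in a trace that survives the run.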

Constraints & tradeoffs

Explicit orchestration, validation, and policy enforcement add latency compared with single-call systems. Retrieval and guardrails introduce extra round-trips, but improve correctness, safety, and auditability. Bounded agents and typed tool contracts reduce flexibility, but make behavior more predictable and debuggable. Idempotent tool design and durable execution add upfront complexity, but prevent cascading failures when upstream systems return partial or inconsistent responses. The platform favors inspectable, controlled execution over open-ended autonomy, especially for workflows that touch external systems or policy-sensitive actions.

Ownership

End-to-end design of the execution runtime, including orchestration, agent boundaries, tool integration, validation, and policy enforcement. Defined execution semantics for retries, partial results, and failure handling. Built traceability and replay as core primitives, and designed the post-execution improvement loop for continuous system refinement.

GPU Inference Platform

Inference serving where admission, batch formation, and queue depth set tail latency and cost—not dashboard throughput.

Context

Scarce GPU capacity was shared across workloads and revision cadences. Headline QPS hid p99 pain and cost per good response; a bad revision or noisy neighbor could starve others without isolation and rollback.

System design

Admission, batching, and execution are separated; latency and utilization map to revision, queue depth, and pool. Batch scheduling reacts to queue age and pressure, not a fixed size. Autoscaling follows queue depth and sustained load—CPU is rarely the bottleneck. Isolation and revision rollback cap blast radius when behavior regresses or one tenant dominates a pool.

Constraints & tradeoffs

Larger batches raise throughput and hurt small-request latency; dynamic batching trades simplicity for tunability. Reserved versus opportunistic capacity trades cost against eviction risk. Per-request attribution adds overhead; it shortens incidents when the pool is shared.

Ownership

Serving topology, capacity signals, and behavior under shared GPU pools and rolling revisions.

Enterprise RAG Platform

Governed retrieval and generation: corpus access and policy run before a response leaves the service.

Context

Answers had to honor document scope and role policy at interactive latency. Unstructured retrieval added noise; unconstrained generation was not shippable.

System design

Ingestion, indexing, and query paths expose which corpus a principal may read. Retrieved context and policy gates precede return. Retrieval and generation emit enough signal to debug bad answers without replaying live traffic. Grounding checks ship with the path, not as a one-off benchmark.
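Scoping retrieval to a principal's readable corpora before ranking can be sketched as follows. The roles, corpora, and toy term-match scoring are all hypothetical stand-ins: a real path resolves policy from an access-control service and ranks against a vector index.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    corpus: str
    text: str

def allowed_corpora(role: str) -> set:
    # Illustrative policy table; real systems resolve this from an
    # access-control service, not an in-process dict.
    policy = {"analyst": {"public", "finance"}, "support": {"public"}}
    return policy.get(role, set())

def retrieve(query: str, docs: list, role: str, k: int = 3) -> list:
    """Filter to corpora the principal may read *before* ranking, so
    unauthorized context never reaches generation."""
    readable = allowed_corpora(role)
    candidates = [d for d in docs if d.corpus in readable]
    # Toy relevance: count of query terms present in the document.
    terms = query.lower().split()
    scored = sorted(candidates,
                    key=lambda d: -sum(t in d.text.lower() for t in terms))
    return scored[:k]
```

Filtering before ranking, rather than after, is the design choice that matters: a post-hoc filter can still leak scoped content into logs, caches, or the prompt itself.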

Constraints & tradeoffs

Freshness competes with ingestion cost and latency. Tighter grounding adds per-request work. Wider retrieval raises recall until precision and latency break; that trade is owned jointly with policy owners.

Ownership

Ingestion, retrieval services, and inference integration; evaluation bars and enforcement aligned with security and product.

Streaming Decision System

Stream-backed decisions with durable offsets: rules re-run against stored inputs when accountability or correction outlives the live path.

Context

Decisions ran on high-volume streams with reordering and late data. Operators needed to answer what was decided and why long after the fact.

System design

Event-time processing with durable offsets and replay hooks; decision logic stays off the wire so the same inputs reproduce decisions when determinism matters. Downstream effects are idempotent where the broker may redeliver.
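The combination of durable inputs, deterministic rules, and idempotent effects can be sketched in a few lines. The event shape and the threshold rule are hypothetical; the point is that duplicate deliveries are safe and the same stored inputs reproduce the same decisions on replay.

```python
class DecisionConsumer:
    """Decision processing over a broker that may redeliver: an
    idempotency key makes duplicate deliveries safe, and storing the
    inputs lets every decision be reproduced on replay."""

    def __init__(self):
        self.committed_offset = -1
        self.applied: set = set()   # idempotency keys of applied effects
        self.log: list = []         # durable record of inputs + decisions

    def decide(self, event: dict) -> str:
        # Deterministic rule over stored inputs, so replay reproduces it.
        return "flag" if event["amount"] > 100 else "pass"

    def process(self, offset: int, event: dict):
        key = event["event_id"]
        decision = self.decide(event)
        if key not in self.applied:   # skip duplicate deliveries
            self.applied.add(key)
            self.log.append({"offset": offset, "event": event,
                             "decision": decision})
        self.committed_offset = max(self.committed_offset, offset)

    def replay(self) -> list:
        # Re-run the rule against stored inputs for audit or correction.
        return [self.decide(rec["event"]) for rec in self.log]
```

This is why replay plus idempotency can substitute for exactly-once delivery: the broker is allowed to be messy as long as effects are deduplicated and inputs are durable.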

Constraints & tradeoffs

Strict ordering and exactly-once are costly; replay plus idempotency beat assuming a lossless stream. Freshness targets trade against reproducible re-runs when semantics require them.

Ownership

Event plumbing, decision modules, and replay and audit tooling.

MLOps Platform

Promotion and lineage: experiment namespaces never resolve in production; only promoted artifacts bind downstreams.

Context

Training and notebooks had to reach serving without “ran once” counting as production-ready for consumers of the artifact.

System design

Gates and lineage isolate sandboxes from what downstreams load. Production dependency resolution never points at experiment namespaces—promotion is the only bridge. Serving surfaces revision and telemetry together so behavior shifts are visible before traffic.
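The promotion-as-only-bridge rule can be sketched as a registry that simply refuses to resolve experiment namespaces. Names and error types here are illustrative, not the platform's API; a real control plane would also carry lineage and gate checks on `promote`.

```python
class ArtifactRegistry:
    """Promotion is the only bridge: production resolution refuses
    experiment namespaces outright, so "ran once" can never bind a
    downstream consumer."""

    def __init__(self):
        self.store: dict = {}  # (namespace, name) -> revision

    def register(self, namespace: str, name: str, revision: str):
        self.store[(namespace, name)] = revision

    def promote(self, name: str, revision: str):
        # The explicit gate: only promoted revisions appear in prod.
        self.store[("prod", name)] = revision

    def resolve(self, name: str, namespace: str = "prod") -> str:
        if namespace != "prod":
            raise PermissionError(
                f"production resolution never points at {namespace!r}; "
                "promote first")
        key = ("prod", name)
        if key not in self.store:
            raise LookupError(f"{name!r} has no promoted revision")
        return self.store[key]
```

Making the refusal structural, rather than a review convention, is what keeps rollback plausible: everything a downstream loaded went through the same gate.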

Constraints & tradeoffs

Speed of change trades against gate strictness; the bias is explicit promotion over unchecked deploys. Shared rules trade some local autonomy for one rollback story.

Ownership

Control plane, integration with serving and delivery, and rollback when artifacts regress.

AutonomyOS

Autonomy stacks where logs and replay make episodes reconstructable before new logic rides live hardware.

Context

New behavior could not be proven only on live hardware; operations needed repeatable episodes short of full deployment.

System design

Planning and perception interfaces record state for reconstruction; rollout stages fidelity before production. Clocks and ordering in logs align replay with what the platform actually saw. Evaluation ties logged outcomes to decisions, not only offline scores.
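Aligning per-sensor logs into one time-ordered episode is the core of this kind of replay. A minimal sketch, assuming per-stream logs of `(timestamp, message)` pairs already on a common clock; real stacks must first reconcile clock skew and drops before this merge is trustworthy.

```python
def align_for_replay(streams: dict) -> list:
    """Merge per-sensor logs into one time-ordered episode so replay
    presents messages in the order the platform actually saw them.

    streams maps a stream name to a list of (timestamp, message) pairs,
    each list already ordered by its own recorded timestamps.
    """
    merged = [(t, name, msg)
              for name, log in streams.items()
              for t, msg in log]
    # Stable sort on recorded timestamps preserves within-stream order
    # for ties, which matters when two messages share a timestamp.
    merged.sort(key=lambda rec: rec[0])
    return merged
```

The evaluation loop then runs decisions against this merged episode, so a logged outcome can be traced to exactly the partial view the planner had at that instant.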

Constraints & tradeoffs

Logging depth trades against bandwidth and storage; simulation fidelity trades against iteration speed. The problem is timing, drops, and partial views—not only accuracy on a fixed set.

Ownership

Logging, replay, and rollout discipline with teams owning models and hardware interfaces.

See GitHub for additional public repositories.

Reference

Manager signal (fleet & analytics)

Validates delivery on operational, data-heavy products: real-time insight, cross-functional work, and sound technical execution, adjacent to the kinds of systems AI platforms often integrate with.

I highly recommend Akhilesh as an exceptionally productive software engineer. He led operational dashboards for fleet vehicles with business and data teams (React, GraphQL, Cube, Snowflake) and delivered real-time insight for operational decisions. He translates complex requirements into clear technical work and maintains strong cross-functional relationships.

Anto Thomas, Head of Data Management Office, Canoo · Managed directly · July 2025

Contact

Hiring and collaboration

I build production AI systems across orchestration, inference, and ML platforms—especially where the goal is to ship systems others can operate and extend, not just prototype models.

The best way to reach me is email.