Talari Pradeep

AI Infrastructure Engineer

I architect scalable AI infrastructure, MLOps/LLMOps platform pipelines, and multi-agent orchestration frameworks, from model design to production rollout. I care about self-healing clusters, speed, and high reliability.




About Me

The Engineer Behind the Stack

Bangalore, India

Open to Work

I'm an AI Infrastructure & Platform Engineer with a passion for building robust, scalable platforms that power intelligent applications.

With 6+ years of IT operations experience and expertise across Azure, AWS, Kubernetes, and Terraform, I specialize in designing LLMOps pipelines, multi-agent orchestration frameworks (LangGraph), and semantic document search engines (RAG) that run at scale.

Working as an Infrastructure & AI Engineer at Accenture in Bangalore, I focus deeply on platform reliability, self-healing architectures, and automated MLOps pipelines — enabling teams to transition models to production seamlessly.

Multi-Agent Orchestration & LangGraph OS

Cloud-Native MLOps/LLMOps on AKS & EKS

Enterprise RAG Platforms & Vector DBs

Full-Stack Observability & OpenTelemetry


Playground

AI Infrastructure & Platform Sandbox

Experiment with live AI infrastructure parameters, trace network flows, scale model worker nodes, and audit platform resources.

Target SLA: 99.0%

Selectable targets: 99.0%, 99.9%, 99.99%, 99.999%

Weekly Budget 1.68 hrs

Monthly Budget 7.31 hrs

Yearly Budget 3.65 days

System: OPERATIONAL

Error Budget Remaining: 100%
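The budget figures follow from a simple rule: allowed downtime equals (1 - availability target) multiplied by the length of the period. The sketch below is one way to reproduce them; the period lengths (a 365.25-day year and a month of one twelfth of that) and the function names are assumptions, not taken from the sandbox itself. Under those assumptions a 99.0% target yields about 1.68 hours per week, 7.31 hours per month, and 3.65 days per year.

```python
# Assumed period lengths; the sandbox's own constants are not shown on the page.
HOURS_PER_WEEK = 7 * 24
HOURS_PER_YEAR = 365.25 * 24
HOURS_PER_MONTH = HOURS_PER_YEAR / 12


def error_budget_hours(sla_percent: float, period_hours: float) -> float:
    """Return the allowed downtime in hours for one period at the given target."""
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in the range (0, 100]")
    return (100 - sla_percent) / 100 * period_hours


def budgets(sla_percent: float) -> dict:
    """Weekly and monthly budgets in hours, yearly budget in days."""
    return {
        "weekly_hours": error_budget_hours(sla_percent, HOURS_PER_WEEK),
        "monthly_hours": error_budget_hours(sla_percent, HOURS_PER_MONTH),
        "yearly_days": error_budget_hours(sla_percent, HOURS_PER_YEAR) / 24,
    }


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        b = budgets(target)
        print(
            f"{target}%: {b['weekly_hours']:.3f} h/week, "
            f"{b['monthly_hours']:.3f} h/month, {b['yearly_days']:.4f} days/year"
        )
```

Each extra nine cuts the budget by a factor of ten, which is why the tighter targets leave only minutes per month.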

Git Repo (v1.2.0) ➜ ArgoCD (Synced)

Ingress Route: us-east-1 (Primary)

us-east-1 (Primary): Online (pod-0, pod-1, pod-2, pod-3)

eu-central-1 (DR): Standby (pod-dr-0, pod-dr-1, pod-dr-2, pod-dr-3)

CPU Usage: 28%

RAM Usage: 45%

API Latency: 120 ms

Error Rate: 0.0%

SRE Automation Log (Live Stream)

[HEALER] Health check active - all probes OK.

Select Runbook Scenario:

Select a runbook scenario to start the SRE checklist.

User Space: HTTP Web Server (LISTEN)

Kernel Space (eBPF Probes): XDP Probe

Actions: Block Port 22 (SSH), Drop TCP Handshake

[SYSTEM] eBPF probes attached. Waiting for frames...

eBPF Syscall Tracer: Stream System Calls

main.tf (Editable)

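# Security group that allows inbound HTTP traffic on port 80.
resource "aws_security_group" "web" {
  name = "web-sec-group"
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # assumption: any IPv4 source; the original rule named no source
  }
}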

Live Cloud Topology: VPC, SG (Port 80)

$ terraform plan

No changes. Infrastructure matches configuration.

Alerts: CPU Critical, Disk Exhaustion ➜ Rules Engine (Silences: None) ➜ Slack (0)

Actions: Silence CPU Warnings, Route Disk to PagerDuty

Checkout ➜ Lint & Test ➜ Security ➜ Build ➜ Deploy

Failure injections: Fail Test Suite (Vitest), Inject CVE (Trivy Scan)

[SYSTEM] Pipeline idle. Ready for execution...

Pods Target: 2

Nodes Count: 1

Hourly Cost: $0.24

Prioritize Spot Instances (Cost Saving)

Ingress ➜ Splitter (90/10)

v1.0 (Stable): 100% OK

v2.0 (Canary): 100% OK

Canary Split: 10%

Inject Canary Error Spike (Trigger Auto-Rollback)
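A weighted canary split with automatic rollback can be sketched in a few lines. This is an illustration only, not the sandbox's implementation: the `CanaryRouter` class, the 5% error-rate threshold, and the 20-request minimum sample are all hypothetical. Only the 10% canary weight and the v1.0/v2.0 labels come from the page.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class CanaryRouter:
    canary_weight: float = 0.10   # share of traffic sent to the canary (10% split)
    max_error_rate: float = 0.05  # hypothetical rollback threshold
    min_requests: int = 20        # hypothetical minimum sample before judging
    canary_requests: int = 0
    canary_errors: int = 0
    rolled_back: bool = False

    def route(self, rng: Callable[[], float] = random.random) -> str:
        """Pick a version for one request; rng returns a float in [0, 1)."""
        if self.rolled_back:
            return "v1.0"
        return "v2.0" if rng() < self.canary_weight else "v1.0"

    def record_canary_result(self, ok: bool) -> None:
        """Track canary outcomes and roll back once the error rate is too high."""
        self.canary_requests += 1
        if not ok:
            self.canary_errors += 1
        enough_data = self.canary_requests >= self.min_requests
        error_rate = self.canary_errors / self.canary_requests
        if enough_data and error_rate > self.max_error_rate:
            self.canary_weight = 0.0
            self.rolled_back = True


if __name__ == "__main__":
    router = CanaryRouter()
    for _ in range(20):  # simulate an error spike on the canary
        router.record_canary_result(ok=False)
    print("rolled back:", router.rolled_back, "next route:", router.route())
```

Waiting for a minimum sample before judging the canary keeps a single early failure from triggering a rollback on a tiny traffic share.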

us-east-1a: App Node, DB Replica

us-east-1b: App Node, DB Replica

us-east-1c: App Node, DB Replica

Uptime Zones: 3 / 3 Active

💥 Blast Radius Heatmap: System Stable (100% Availability)

VPC Gateways
DB Clustering
App Routing

base.yaml (Base manifest)

patch.yaml (Overlay patch)

resolved.yaml (Compiled output)
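The base, patch, and resolved files suggest an overlay merge. One simple way to model it is JSON Merge Patch semantics (RFC 7386): nested mappings are merged recursively, a null value deletes a key, and everything else, lists included, is replaced wholesale. The sketch below uses plain dictionaries in place of parsed YAML. The manifest contents are hypothetical, and the sandbox may well use a different strategy, such as a Kustomize-style strategic merge.

```python
from copy import deepcopy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7386) and return a new resolved document."""
    if not isinstance(patch, dict):
        return deepcopy(patch)  # scalars and lists replace the target outright
    result = deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null in the patch deletes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


if __name__ == "__main__":
    # Hypothetical stand-ins for base.yaml and patch.yaml.
    base = {
        "kind": "Deployment",
        "metadata": {"name": "web", "labels": {"app": "web", "tier": "dev"}},
        "spec": {"replicas": 2},
    }
    patch = {
        "metadata": {"labels": {"env": "prod", "tier": None}},
        "spec": {"replicas": 4},
    }
    print(merge_patch(base, patch))
```

The function never mutates its inputs, so the base manifest can be reused with several overlays.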

Contact

Let's Connect

Open to full-time roles, freelance contracts, and interesting collaborations. Drop me a message!

Talari Pradeep

The Engineer Behind the Stack

AI Infrastructure & Platform Sandbox

SLA & Error Budget

ArgoCD GitOps Rollout

us-east-1 (Primary) Online

eu-central-1 (DR) Standby

Chaos & Auto-Healing

SRE Runbook Simulator

eBPF Kernel Sniffer

Terraform Drift Engine

Alertmanager Routing

CI/CD Pipeline Runner

Karpenter Autoscaler

Canary Traffic Splitter

Chaos Monkey Multi-AZ

Manifest Overlay Compiler

📈 Prometheus Live Timeline

Let's Connect
