⚓

Enterprise AWS EKS Deployment

Designed and deployed a highly available Kubernetes cluster on AWS EKS spanning public and private subnets. Integrated the AWS Load Balancer Controller, the Horizontal Pod Autoscaler (HPA), and ingress controllers.

EKS, AWS VPC, Kubernetes, Helm
🚀

Jenkins Pipeline Shared Library

Created a reusable Jenkins Shared Library in Groovy to standardize CI/CD pipelines across 10+ microservices. Built automated stages for build, containerization, and EKS rollouts.

Jenkins, Groovy, Docker, GitHub
🔭

Prometheus & Grafana Monitoring

Configured full observability for AWS EKS workloads. Set up node-exporter, kube-state-metrics, Prometheus alerts, and custom Grafana dashboards.

Prometheus, Grafana, CloudWatch, Slack alerts
🏗️

Automated Terraform IaC Modules

Developed reusable Terraform modules for standard AWS resource provisioning (VPC, EC2, S3, IAM, ASG). Implemented secure remote state locking via DynamoDB.

Terraform, AWS, IaC, HCL
💰

AWS Cost Optimizer Bot

Wrote Python Lambda functions, triggered on an EventBridge schedule, that clean up unattached EBS volumes and shut down non-prod resources outside working hours.

Python, AWS Lambda, EventBridge, Cost Explorer
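
A minimal sketch of how the EBS cleanup could work, not the project's actual code. It assumes volumes arrive in the shape boto3's `describe_volumes` returns; the 7-day age threshold, the `dry_run` event key, and the function names are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta, timezone


def find_unattached_volumes(volumes, min_age_days=7, now=None):
    """Return IDs of volumes that are unattached and older than min_age_days.

    The age threshold is an assumed safety margin, not taken from the project.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available"   # "available" means not attached to an instance
        and not v.get("Attachments")
        and v["CreateTime"] <= cutoff
    ]


def lambda_handler(event, context):
    # Imported lazily so the filter above can be exercised without AWS access.
    import boto3

    ec2 = boto3.client("ec2")
    paginator = ec2.get_paginator("describe_volumes")
    volumes = [v for page in paginator.paginate() for v in page["Volumes"]]
    stale = find_unattached_volumes(volumes)

    # Hypothetical event flag: default to reporting only, never deleting.
    dry_run = event.get("dry_run", True)
    if not dry_run:
        for volume_id in stale:
            ec2.delete_volume(VolumeId=volume_id)
    return {"candidates": stale, "deleted": not dry_run}
```

Keeping the selection logic in a pure function makes it easy to check against sample data before the Lambda is allowed to delete anything.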
🧠

RAG Knowledge Chatbot

An offline enterprise RAG chatbot that crawls local runbooks and manuals, computes embeddings with nomic-embed-text, stores them in ChromaDB/pgvector, and queries local LLMs to answer DevOps questions grounded in those documents.

ChromaDB, LangChain, FastAPI, Streamlit
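
The retrieval step of such a pipeline can be sketched in plain Python. This is an illustration only: the tiny hand-written vectors stand in for real nomic-embed-text embeddings, the in-memory list stands in for ChromaDB/pgvector, and the function names and prompt wording are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts most similar to the query.

    indexed_chunks is a list of (text, embedding) pairs.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble a grounded prompt to send to the local LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In a real deployment the vector store would handle the similarity search; the sketch only shows what that search computes.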
🤖

DevOps Copilot (LLM Integration)

A code/config explanation assistant integrated across all 15 studios. Automatically parses Helm charts, Terraform configurations, Ansible playbooks, and Dockerfiles to explain SRE implications.

Ollama API, Qwen 3 Coding, JavaScript, HTML5
🪵

AI-Powered Log Analyzer

Real-time log collector and parser for application log streams. It flags critical errors and stack traces (such as Maven build failures or Nginx proxy errors) and uses AI to summarize likely root causes.

ELK Stack, Grafana Loki, FastAPI, RegEx Parser
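
A rough sketch of the regex flagging stage, assuming typical Maven, JVM, and Nginx error-log line shapes. The pattern set and labels are illustrative assumptions rather than the project's real rules, and the AI summarization step is left out.

```python
import re

# Hypothetical pattern set; a real deployment would tune these per service.
PATTERNS = {
    "maven_build_failure": re.compile(r"BUILD FAILURE|\[ERROR\] Failed to execute goal"),
    "java_exception": re.compile(r"^(Exception in thread|Caused by:)|^\s+at [\w.$]+\("),
    "nginx_upstream_error": re.compile(r"\[error\].*upstream"),
}


def classify_line(line):
    """Return the label of the first matching pattern, or None."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return None


def flag_critical(lines):
    """Return (line_number, label) pairs for every line that matches a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        label = classify_line(line)
        if label:
            hits.append((number, label))
    return hits
```

The flagged lines, plus a few lines of surrounding context, would then be what gets passed to the model for summarization.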
🛡️

SRE GenAI Copilot

An automated risk auditor integrated into pipelines. It checks Terraform plan files and Kubernetes values files for under-provisioned scaling settings and configuration vulnerabilities before rollout.

Prometheus, Trivy Scan, Python SDK, Risk Analysis
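
One way a plan check could look, as a sketch under stated assumptions: the plan has been exported with `terraform show -json`, whose output lists each resource under `resource_changes` with a `change.actions` array. The rule shown (flag anything that is destroyed or replaced) and the function names are hypothetical; the card does not say which rules the auditor applies.

```python
import json


def load_plan(path):
    """Load a plan previously exported with `terraform show -json`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def risky_changes(plan):
    """List resources the plan would destroy or replace.

    A replace appears as both "delete" and "create" in the actions array.
    """
    findings = []
    for resource in plan.get("resource_changes", []):
        actions = resource.get("change", {}).get("actions", [])
        if "delete" in actions:
            kind = "replace" if "create" in actions else "destroy"
            findings.append(f"{kind}: {resource['address']}")
    return findings
```

A pipeline stage could fail the build, or require manual approval, whenever `risky_changes` returns a non-empty list.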
⚑

Self-Healing Infrastructure

An alert-to-remediation system that receives Prometheus Alertmanager webhooks. When an alert fires (e.g. disk usage > 95%), it runs the matching self-healing playbook, for example to clean up host caches.

Alertmanager, Ansible Runner, Shell scripts, Cron
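
The alert-to-playbook mapping might be sketched as follows. It assumes the standard Alertmanager webhook payload (a top-level `alerts` list whose entries carry `status` and `labels`); the alert name `DiskUsageHigh`, the playbook path, and the use of the `instance` label as the target host are hypothetical. Actually running the playbook is left out.

```python
# Hypothetical mapping from alert name to remediation playbook.
PLAYBOOKS = {"DiskUsageHigh": "playbooks/clean_host_caches.yml"}


def plan_remediations(payload, playbooks=PLAYBOOKS):
    """Turn an Alertmanager webhook payload into a list of playbook runs.

    Resolved alerts and alerts with no mapped playbook are ignored.
    """
    actions = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        playbook = playbooks.get(labels.get("alertname"))
        host = labels.get("instance")
        if playbook and host:
            actions.append({"playbook": playbook, "host": host})
    return actions
```

Separating the decision from the execution keeps the mapping testable and makes it straightforward to add a dry-run or approval step before a playbook touches a host.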