Prometheus & Grafana Observability


Full-stack SRE observability infrastructure for monitoring containerized workloads running in Kubernetes pods. It provides customizable Prometheus scrape policies, PromQL alerting rules, Alertmanager routing to Slack and email, and Grafana dashboards for visualization.

💡 What We Will Learn in This Repo

1 Prometheus Scrape Configurations

Define scrape intervals, service-discovery (SD) relabeling rules, and metrics paths in YAML to track application metrics endpoints.

2 Alertmanager Notification Routing

Build routing trees that send grouped alerts to specific Slack channels and SMTP email recipients.

3 Custom PromQL Alert Rules

Write threshold-based alerting queries that detect down targets, high memory usage, and volumes running out of capacity.

4 Exporter Agent Configurations

Configure Prometheus Node Exporter for host-level metrics and kube-state-metrics for Kubernetes object-state metrics.
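The scrape and exporter topics above could look like the following `config/prometheus.yml` sketch. It is an illustration, not the repository's actual file: the job names, the `prometheus.io/scrape`, `prometheus.io/path`, and `prometheus.io/port` pod annotations (a widespread convention rather than a Prometheus built-in), the `node-exporter` Service name, and the `*.monitoring.svc` addresses are all assumptions. Kubernetes service discovery also requires Prometheus to run under a ServiceAccount allowed to list and watch pods, services, and endpoints.

```yaml
# config/prometheus.yml (sketch; job names and Service addresses are hypothetical)
global:
  scrape_interval: 15s      # default interval for every job
  evaluation_interval: 15s  # how often alerting rules are evaluated

rule_files:
  - rules/*.yml             # hypothetical location of alerting rule files

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager.monitoring.svc:9093"]  # hypothetical Service address

scrape_configs:
  # Prometheus scraping its own /metrics endpoint.
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  # Pods discovered through the Kubernetes API; only pods annotated
  # prometheus.io/scrape: "true" are kept (assumed annotation convention).
  - job_name: kubernetes-pods
    scrape_interval: 30s    # per-job override of the global interval
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # An optional prometheus.io/path annotation replaces the default /metrics path.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      # An optional prometheus.io/port annotation replaces the scrape port.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Copy namespace and pod name into ordinary, queryable labels.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod

  # Node Exporter (host-level metrics, default port 9100), discovered via the
  # endpoints of a Service assumed to be named "node-exporter".
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [monitoring]
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: node-exporter
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node

  # kube-state-metrics (Kubernetes object state; metrics served on port 8080).
  - job_name: kube-state-metrics
    static_configs:
      - targets: ["kube-state-metrics.monitoring.svc:8080"]  # hypothetical Service address
```

With this layout, an application opts in by adding `prometheus.io/scrape: "true"` to its pod annotations, and the relabeling rules turn discovered metadata into labels that PromQL queries can filter on.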

📖 Step-by-Step Installation Guide

1 Clone the Repository

Fetch the repository to your local workspace:

```bash
git clone https://github.com/Pradeeptalari14/sre-monitoring-system.git
cd sre-monitoring-system
```

2 Validate Config Syntax

Verify that your Prometheus and Alertmanager configurations are free of syntax and structural errors:

```bash
promtool check config config/prometheus.yml
amtool check-config config/alertmanager.yml
```
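Alerting rules live in separate files that `prometheus.yml` lists under `rule_files`; `promtool check config` also validates those referenced files, and `promtool check rules <file>` checks a single rule file on its own. Below is a minimal sketch of such a file covering the three cases from the learning list (targets down, memory pressure, low disk space), assuming Node Exporter metrics are being scraped. The file name, group name, alert names, and thresholds are hypothetical.

```yaml
# rules/alerts.yml (hypothetical file; must be listed under rule_files in prometheus.yml)
groups:
  - name: sre-baseline
    rules:
      # Any scrape target that has been unreachable for 5 minutes.
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} (job {{ $labels.job }}) is down"

      # Host memory usage above 90% for 10 minutes (Node Exporter metrics).
      - alert: HostHighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage on {{ $labels.instance }} is above 90%"

      # Filesystem with less than 10% free space, ignoring tmpfs and overlay mounts.
      - alert: HostDiskSpaceLow
        expr: |
          node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% space left on {{ $labels.instance }} {{ $labels.mountpoint }}"
```

The `for` clause keeps an alert pending until its condition has held for the given duration, which filters out brief spikes; the `severity` label is what Alertmanager routes on.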

3 Deploy Exporters and Monitoring Stack

Create the `monitoring` namespace and apply the YAML manifests to it:

```bash
kubectl create namespace monitoring
kubectl apply -f manifests/ -n monitoring
```
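The contents of `manifests/` are not shown here. As one hedged sketch, Prometheus on Kubernetes commonly reads `prometheus.yml` from a ConfigMap mounted into its pod; the ConfigMap name and manifest file name below are assumptions. An equivalent object can be generated from the repository file with `kubectl create configmap prometheus-config --from-file=config/prometheus.yml -n monitoring`.

```yaml
# manifests/prometheus-configmap.yml (hypothetical file name)
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config   # hypothetical; must match the volume in the Prometheus Deployment
  namespace: monitoring
data:
  # The key becomes the file name inside the mounted volume.
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ["localhost:9090"]
```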

4 Connect Grafana Dashboards

Port-forward the Grafana web port to your machine, add Prometheus as a data source, and import the dashboard templates:

```bash
kubectl port-forward svc/grafana 3000:80 -n monitoring
# Open http://localhost:3000 (admin/admin), then import dashboard IDs 1860 and 8685
```
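Instead of adding the data source by hand in the UI, Grafana can provision it from a file it reads at startup. A sketch, assuming the official Grafana image's default provisioning directory and a hypothetical in-cluster Prometheus Service address:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                                # Grafana's backend sends the queries
    url: http://prometheus.monitoring.svc:9090   # hypothetical Service address
    isDefault: true                              # preselected when importing dashboards
```

Imported community dashboards ask for a Prometheus data source during import, so provisioning one as the default saves a manual step.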

🔄 Things You Need to Replace (Customization Checklist)

Replace these placeholders and credentials in the Alertmanager and Prometheus config files before starting the stack:

| Item | File | Placeholder to replace |
|---|---|---|
| Slack webhook URL | `config/alertmanager.yml` | `https://hooks.example.com/services/...` (replace with your webhook URL) |
| SMTP login password | `config/alertmanager.yml` | `SMTP_PASSWORD_PLACEHOLDER` (supply your SMTP password) |
| Alert recipient list | `config/alertmanager.yml` | `talaripradeep45@gmail.com` (replace with your SRE team's addresses) |
| Metric scrape targets | `config/prometheus.yml` | `targets: ['localhost:9090']` (update to your actual service endpoints) |
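A sketch of how those placeholders might sit in `config/alertmanager.yml`, with a routing tree that sends critical alerts to Slack and email and warnings to a separate Slack channel. Receiver names, channel names, the SMTP host, and the `example.com` addresses are hypothetical; the `matchers` syntax assumes Alertmanager 0.22 or later.

```yaml
# config/alertmanager.yml (sketch; receivers, channels, and hosts are hypothetical)
global:
  resolve_timeout: 5m
  smtp_smarthost: "smtp.example.com:587"           # replace with your SMTP relay
  smtp_from: "alertmanager@example.com"
  smtp_auth_username: "alertmanager@example.com"
  smtp_auth_password: "SMTP_PASSWORD_PLACEHOLDER"  # supply your SMTP password
  slack_api_url: "https://hooks.example.com/services/..."  # replace with your hook URL

route:
  receiver: sre-email               # fallback when no child route matches
  group_by: ["alertname", "namespace"]
  group_wait: 30s                   # wait before sending the first notification for a group
  group_interval: 5m                # wait before notifying about new alerts in a group
  repeat_interval: 4h               # resend interval for still-firing alerts
  routes:
    # Critical alerts go to Slack; continue lets the next route also match.
    - matchers: ['severity="critical"']
      receiver: slack-critical
      continue: true
    - matchers: ['severity="critical"']
      receiver: sre-email
    - matchers: ['severity="warning"']
      receiver: slack-warnings

receivers:
  - name: sre-email
    email_configs:
      - to: "sre-team@example.com"  # replace with your SRE distribution list
        send_resolved: true
  - name: slack-critical
    slack_configs:
      - channel: "#alerts-critical"
        send_resolved: true
  - name: slack-warnings
    slack_configs:
      - channel: "#alerts-warning"
        send_resolved: true
```

Child routes are evaluated in order and the first match wins unless it sets `continue: true`; that is why the first critical route continues, so the second one can also deliver the alert by email.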

📊 Architectural Workflow

```mermaid
graph TD
    Pods[App Pods / Nodes] -->|Expose Metrics| Exp[Node Exporter / kube-state-metrics]
    Prom[Prometheus Server] -->|Scrapes / Pulls| Exp

    subgraph Observability Engine
        Prom -->|Stores| TSDB[(Time Series DB)]
        Prom -->|Evaluates| Rules[PromQL Rules Evaluator]
    end

    Rules -->|Alert Trigger| AM[Alertmanager Dispatcher]
    Grafana[Grafana Dashboard] -->|Queries| Prom

    AM -->|Route Slack Payloads| Slack[Slack Channel API]
    AM -->|Route Email Alerts| SMTP[SMTP Relay Server]
```

🛠️ Useful Commands (Project Reference)

Common CLI tasks for validating, reloading, and inspecting the monitoring stack:

```bash
# Validate Prometheus configuration syntax:
promtool check config config/prometheus.yml

# Validate Alertmanager routing and receiver configuration:
amtool check-config config/alertmanager.yml

# Hot-reload the Prometheus configuration (requires the --web.enable-lifecycle flag):
curl -X POST http://localhost:9090/-/reload

# List active alerts via amtool:
amtool alert --alertmanager.url=http://localhost:9093
```
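The `/-/reload` endpoint is disabled unless Prometheus starts with `--web.enable-lifecycle`. A hedged sketch of the relevant part of a Prometheus Deployment's pod spec that enables it and mounts the configuration from a ConfigMap; the container, volume, and ConfigMap names are assumptions.

```yaml
# Fragment of a Prometheus Deployment pod spec (hypothetical names and paths)
containers:
  - name: prometheus
    image: prom/prometheus        # pin a specific version tag in practice
    args:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.enable-lifecycle    # enables POST /-/reload
    ports:
      - containerPort: 9090
    volumeMounts:
      - name: config
        mountPath: /etc/prometheus
volumes:
  - name: config
    configMap:
      name: prometheus-config     # hypothetical ConfigMap holding prometheus.yml
```

To reach `localhost:9090` and `localhost:9093` from a workstation, port-forward the Prometheus and Alertmanager Services first, as with Grafana in step 4.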
