Prometheus & Grafana Observability
An SRE observability stack for monitoring containerized workloads running as Kubernetes pods. It provides configurable Prometheus scrape settings, alerting rules, Alertmanager notification routing, and Grafana dashboards.
What We Will Learn in This Repo
- Prometheus Scrape Configurations: Define scrape intervals, service-discovery (SD) labels, and metrics paths in YAML to track application metrics endpoints.
- Alertmanager Notification Routing: Build routes and receivers that send grouped alerts to specific Slack channels and SMTP email recipients.
- Custom PromQL Alert Rules: Write threshold-based alerting queries that detect down target instances, rising memory usage, and volumes nearing capacity.
- Exporter Agent Configurations: Configure node-exporter and kube-state-metrics so that Prometheus can scrape host-level and Kubernetes object-state metrics.
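To make the scrape and exporter topics above concrete, here is a minimal sketch of a `prometheus.yml`. It is not the file shipped in this repository: the job names, host names, service names, and intervals are all assumptions chosen for illustration, and the pod-discovery job only works if Prometheus runs with RBAC permissions to list pods.

```yaml
# Illustrative prometheus.yml sketch; every name and target below is a placeholder.
global:
  scrape_interval: 30s      # default scrape interval for all jobs
  evaluation_interval: 30s  # how often alerting rules are evaluated

scrape_configs:
  # Prometheus scraping its own metrics endpoint.
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # node-exporter: host-level CPU, memory, disk, and network metrics.
  - job_name: node-exporter
    scrape_interval: 15s    # per-job override of the global interval
    static_configs:
      - targets:
          - 'node-1.example.internal:9100'  # hypothetical host
          - 'node-2.example.internal:9100'  # hypothetical host

  # kube-state-metrics: state of Kubernetes objects (pods, deployments, ...).
  - job_name: kube-state-metrics
    static_configs:
      - targets: ['kube-state-metrics.monitoring.svc:8080']  # assumed service name

  # Application pods found through Kubernetes service discovery.
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the prometheus.io/path annotation as the metrics path when it is set.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Copy the namespace and pod name into queryable labels.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

The `prometheus.io/*` annotations are a widely used convention rather than something Prometheus enforces; the relabel rules are what give them meaning.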
Step-by-Step Installation Guide
Fetch the repository to your local workspace:
```bash
git clone https://github.com/Pradeeptalari14/sre-monitoring-system.git
cd sre-monitoring-system
```
Verify that your Prometheus and Alertmanager configurations are free of syntax errors:
```bash
promtool check config config/prometheus.yml
amtool check-config config/alertmanager.yml
```
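`promtool check config` also validates any rule files that `prometheus.yml` lists under `rule_files`, and a single rule file can be checked on its own with `promtool check rules <file>`. As a rough sketch of what such a file might contain for the three conditions this README names (targets down, memory usage, volume capacity), consider the following. The group name, alert names, thresholds, and `for` durations are assumptions, and the memory and filesystem expressions assume node-exporter metrics are being scraped.

```yaml
# Illustrative alert rule file; names and thresholds are placeholders to tune.
groups:
  - name: example-sre-alerts
    rules:
      # Fires when a scrape target has been unreachable for 5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'Target {{ $labels.instance }} (job {{ $labels.job }}) is down.'

      # Fires when used memory stays above 90% of total for 10 minutes.
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: 'Memory usage is {{ printf "%.1f" $value }}% on {{ $labels.instance }}.'

      # Fires when a filesystem has had less than 10% free space for 15 minutes.
      - alert: VolumeAlmostFull
        expr: |
          (
            node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}
          ) * 100 < 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: 'Only {{ printf "%.1f" $value }}% space left on {{ $labels.mountpoint }} ({{ $labels.instance }}).'
```

The `for` clause keeps a brief spike or a single failed scrape from paging anyone; the alert stays pending until the condition has held for the whole window.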
Deploy the YAML manifests inside your Kubernetes monitoring namespace:
```bash
kubectl create namespace monitoring
kubectl apply -f manifests/ -n monitoring
```
Forward the Grafana service port to your machine, add Prometheus as a data source, and import the dashboards:
```bash
kubectl port-forward svc/grafana 3000:80 -n monitoring
# Open http://localhost:3000 (default login admin/admin) and import dashboard IDs 1860 and 8685
```
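As an alternative to adding the data source by hand in the Grafana UI, Grafana can load it from a provisioning file at startup. The sketch below assumes Prometheus is reachable inside the cluster as a service named `prometheus` in the `monitoring` namespace on port 9090. That service name is an assumption, so check it against your manifests.

```yaml
# Illustrative Grafana data source provisioning file, typically mounted under
# /etc/grafana/provisioning/datasources/. The URL is an assumed in-cluster address.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy            # Grafana's backend issues the queries, not the browser
    url: http://prometheus.monitoring.svc:9090
    isDefault: true
    editable: false          # prevent changes to this data source from the UI
```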
Things You Need to Replace (Customization Checklist)
Replace the following placeholders in the configuration files before you start monitoring:
| Target Element | File Location | Placeholder / Target Variable |
|---|---|---|
| Slack Webhook URLs | config/alertmanager.yml | https://hooks.example.com/services/... (replace with your hook URL) |
| SMTP Email Login | config/alertmanager.yml | SMTP_PASSWORD_PLACEHOLDER (supply your SMTP password) |
| Custom Recipient List | config/alertmanager.yml | talaripradeep45@gmail.com (replace with target SRE list) |
| Metric Scraping Targets | config/prometheus.yml | targets: ['localhost:9090'] (update to actual service endpoints) |
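To show where the Alertmanager placeholders from the table typically sit, here is a minimal sketch of an `alertmanager.yml`. It is not the file shipped in this repository: the SMTP host, sender address, recipient address, Slack channel, receiver names, and timing values are all assumptions. Only the webhook URL placeholder and `SMTP_PASSWORD_PLACEHOLDER` are taken from the table.

```yaml
# Illustrative alertmanager.yml sketch; hosts, addresses, and names are placeholders.
global:
  smtp_smarthost: 'smtp.example.com:587'          # assumed SMTP relay
  smtp_from: 'alertmanager@example.com'           # assumed sender address
  smtp_auth_username: 'alertmanager@example.com'  # assumed login
  smtp_auth_password: 'SMTP_PASSWORD_PLACEHOLDER' # replace with your SMTP password

route:
  receiver: sre-email              # default receiver for alerts no child route matches
  group_by: ['alertname', 'job']   # alerts sharing these labels are sent together
  group_wait: 30s                  # wait before sending the first notification of a group
  group_interval: 5m               # wait before notifying about new alerts in a group
  repeat_interval: 4h              # re-send interval while an alert keeps firing
  routes:
    # Critical alerts go to Slack instead of the default email receiver.
    - matchers:
        - severity="critical"
      receiver: sre-slack

receivers:
  - name: sre-email
    email_configs:
      - to: 'sre-team@example.com'  # hypothetical recipient list
        send_resolved: true
  - name: sre-slack
    slack_configs:
      - api_url: 'https://hooks.example.com/services/...'  # replace with your hook URL
        channel: '#sre-alerts'      # hypothetical channel
        send_resolved: true
```

Routing on a `severity` label only works if the alert rules set that label, so the two files need to agree on label names and values.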
Architectural Workflow
Useful Commands (Project Reference)
Common CLI tasks for validating, reloading, and inspecting the monitoring stack:
```bash
# Validate the Prometheus configuration:
promtool check config config/prometheus.yml
# Validate the Alertmanager configuration:
amtool check-config config/alertmanager.yml
# Reload the Prometheus configuration (requires the --web.enable-lifecycle flag):
curl -X POST http://localhost:9090/-/reload
# List currently firing alerts via amtool:
amtool alert --alertmanager.url=http://localhost:9093
```