
Prometheus & Grafana Observability

An SRE observability stack for monitoring containerized workloads running in Kubernetes pods. It includes configurable Prometheus scrape settings, PromQL alert rules, Alertmanager routing to Slack and email, and Grafana dashboards.

💡 What We Will Learn in This Repo

  1. Prometheus Scrape Configurations

     Define scrape intervals, service discovery (SD) labels, and metrics paths in YAML to track application metrics endpoints.

  2. Alertmanager Notification Routing

     Build receivers and routes that send grouped alerts to specific Slack channels and SMTP email recipients.

  3. Custom PromQL Alert Rules

     Write threshold-based alerting queries that detect down target instances, rising memory usage, and volumes nearing capacity.

  4. Exporter Agent Configurations

     Configure node-exporter and kube-state-metrics so that Prometheus can scrape host-level and cluster-state metrics.
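
To make the scrape and exporter items more concrete, here is a minimal sketch of a scrape configuration, written to a scratch file from the shell. The job names, the targets `node-exporter:9100` and `kube-state-metrics:8080`, and the path `/tmp/prometheus-sketch.yml` are assumptions for illustration; they are not taken from this repo's config/prometheus.yml.

```bash
#!/usr/bin/env bash
# Sketch only: job names, targets, and the output path are assumptions.
SKETCH_FILE="${SKETCH_FILE:-/tmp/prometheus-sketch.yml}"

cat > "$SKETCH_FILE" <<'EOF'
global:
  scrape_interval: 15s        # default interval between scrapes
  evaluation_interval: 15s    # how often alert rules are evaluated

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: node-exporter           # host-level metrics (hypothetical target)
    metrics_path: /metrics            # the default path, shown for clarity
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: kube-state-metrics      # cluster object state (hypothetical target)
    scrape_interval: 30s              # per-job override of the global interval
    static_configs:
      - targets: ['kube-state-metrics:8080']
EOF

# Validate the sketch only if promtool is installed.
if command -v promtool >/dev/null 2>&1; then
  promtool check config "$SKETCH_FILE"
fi
```

Static targets keep the sketch short; in a real cluster, Kubernetes service discovery would usually replace them.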

📖 Step-by-Step Installation Guide

1 Clone the Repository

Fetch the repository to your local workspace:

bash
git clone https://github.com/Pradeeptalari14/sre-monitoring-system.git
cd sre-monitoring-system

2 Validate Config Syntax

Verify that your Prometheus and Alertmanager configurations are free of syntax errors:

bash
promtool check config config/prometheus.yml
amtool check-config config/alertmanager.yml
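
Alert rule files can be checked in a similar way with `promtool check rules`. The sketch below writes a hypothetical rules file and validates it when promtool is available. The file path, group name, alert names, and thresholds are assumptions, and the memory and filesystem expressions assume node-exporter metrics are being scraped.

```bash
#!/usr/bin/env bash
# Sketch only: the path, group name, alert names, and thresholds are assumptions.
RULES_FILE="${RULES_FILE:-/tmp/alert-rules-sketch.yml}"

# The quoted 'EOF' stops the shell from expanding {{ $labels... }} templates.
cat > "$RULES_FILE" <<'EOF'
groups:
  - name: sketch-rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} has been down for 5 minutes"

      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 90% on {{ $labels.instance }}"

      - alert: VolumeAlmostFull
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% space left on {{ $labels.mountpoint }}"
EOF

if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$RULES_FILE"
else
  echo "promtool not found; skipping rule validation" >&2
fi
```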

3 Deploy Exporters and Monitoring Stack

Create the monitoring namespace and apply the YAML manifests to it:

bash
kubectl create namespace monitoring
kubectl apply -f manifests/ -n monitoring
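
Note that `kubectl create namespace` returns an error if the namespace already exists. The sketch below wraps the step in a rerunnable form and adds a readiness check. The function names and the 120-second timeout are assumptions, not part of the repo.

```bash
#!/usr/bin/env bash
# Sketch only: function names and the timeout are assumptions.
NAMESPACE="${NAMESPACE:-monitoring}"

deploy_monitoring() {
  # Rendering the namespace and piping it to "apply" succeeds
  # even when the namespace already exists.
  kubectl create namespace "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -
  kubectl apply -f manifests/ -n "$NAMESPACE"
}

verify_monitoring() {
  # Block until every pod in the namespace reports Ready, or time out.
  kubectl wait --for=condition=Ready pods --all -n "$NAMESPACE" --timeout=120s
  # List pods and services so the Grafana service name and port can be confirmed.
  kubectl get pods,svc -n "$NAMESPACE"
}
```

Run `deploy_monitoring && verify_monitoring` from the repository root.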

4 Bind Grafana Dashboards

Port-forward the Grafana service to your machine, add Prometheus as a data source, and import the dashboard templates:

bash
kubectl port-forward svc/grafana 3000:80 -n monitoring
# Open http://localhost:3000 (default login admin/admin), then import dashboard IDs 1860 and 8685
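
The data source can also be added without the UI, through Grafana's HTTP API. The sketch below assumes the port-forward is still running, the default admin/admin login has not been changed, and Prometheus is reachable inside the cluster at `http://prometheus:9090`. That service name is hypothetical, so substitute your own. The function names are also illustrative.

```bash
#!/usr/bin/env bash
# Sketch only: PROMETHEUS_URL and the function names are assumptions.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
PROMETHEUS_URL="${PROMETHEUS_URL:-http://prometheus:9090}"

grafana_health() {
  # Returns a small JSON document when Grafana is up.
  curl -fsS "$GRAFANA_URL/api/health"
}

add_prometheus_datasource() {
  # Registers Prometheus as the default data source using basic auth.
  curl -fsS -X POST "$GRAFANA_URL/api/datasources" \
    -u admin:admin \
    -H 'Content-Type: application/json' \
    -d "{\"name\":\"Prometheus\",\"type\":\"prometheus\",\"url\":\"$PROMETHEUS_URL\",\"access\":\"proxy\",\"isDefault\":true}"
}
```

Call `grafana_health` first to confirm the port-forward works, then `add_prometheus_datasource`.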

🔄 Things You Need to Replace (Customization Checklist)

Replace the following placeholders in the config files before deploying the stack:

| Target Element | File Location | Placeholder / Target Variable |
| --- | --- | --- |
| Slack webhook URLs | config/alertmanager.yml | https://hooks.example.com/services/... (replace with your hook URL) |
| SMTP email login | config/alertmanager.yml | SMTP_PASSWORD_PLACEHOLDER (supply your SMTP password) |
| Custom recipient list | config/alertmanager.yml | talaripradeep45@gmail.com (replace with target SRE list) |
| Metric scraping targets | config/prometheus.yml | targets: ['localhost:9090'] (update to actual service endpoints) |
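
A quick guard against deploying with placeholders still in place can be sketched with `grep`. The function name `check_placeholders` is hypothetical, and the patterns cover only the Slack and SMTP placeholders listed in the checklist.

```bash
#!/usr/bin/env bash
# Sketch only: check_placeholders is a hypothetical helper, not part of the repo.
check_placeholders() {
  local file status=0
  for file in "$@"; do
    if [ ! -r "$file" ]; then
      echo "Cannot read $file" >&2
      status=1
      continue
    fi
    # Print matching lines with line numbers so they are easy to locate.
    if grep -nE 'SMTP_PASSWORD_PLACEHOLDER|hooks\.example\.com' "$file"; then
      echo "Placeholder still present in $file" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_placeholders config/alertmanager.yml && echo "ready to deploy"` prints the message only when no placeholder remains.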

📊 Architectural Workflow

graph TD
    Pods[App Pods / Nodes] -->|Expose Metrics| Exp[Node Exporter / kube-state]
    Prom[Prometheus Server] -->|Scrapes / Pulls| Exp
    
    subgraph Observability Engine
        Prom -->|Stores| TSDB[(Time Series DB)]
        Prom -->|Evaluates| Rules[PromQL Rules Evaluator]
    end
    
    Rules -->|Alert Trigger| AM[Alertmanager Dispatcher]
    Grafana[Grafana Dashboard] -->|Queries| Prom
    
    AM -->|Route Slack Payloads| Slack[Slack Channel API]
    AM -->|Route Email Alerts| SMTP[SMTP Relay Server]
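
The Alertmanager branch of the diagram (routing to Slack and to an SMTP relay) corresponds to a `route` tree plus a list of `receivers`. The sketch below writes one possible layout to a scratch file. The receiver names, the `#sre-alerts` channel, the SMTP host, the example.com addresses, and the timing values are all assumptions. The webhook URL and password are the same placeholders used in the checklist.

```bash
#!/usr/bin/env bash
# Sketch only: receiver names, channel, SMTP host, and addresses are assumptions.
AM_SKETCH="${AM_SKETCH:-/tmp/alertmanager-sketch.yml}"

cat > "$AM_SKETCH" <<'EOF'
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'SMTP_PASSWORD_PLACEHOLDER'

route:
  receiver: sre-email                 # fallback receiver for unmatched alerts
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="critical"
      receiver: sre-slack             # critical alerts go to Slack

receivers:
  - name: sre-slack
    slack_configs:
      - api_url: 'https://hooks.example.com/services/...'
        channel: '#sre-alerts'
        send_resolved: true
  - name: sre-email
    email_configs:
      - to: 'sre-team@example.com'
EOF

# Validate the sketch only if amtool is installed.
if command -v amtool >/dev/null 2>&1; then
  amtool check-config "$AM_SKETCH"
fi
```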
            

🛠️ Useful Commands (Project Reference)

Common CLI tasks for validating, reloading, and inspecting the monitoring stack:

# Check Prometheus config syntax:
promtool check config config/prometheus.yml
# Validate the Alertmanager configuration:
amtool check-config config/alertmanager.yml
# Force a Prometheus config reload (requires Prometheus to run with --web.enable-lifecycle):
curl -X POST http://localhost:9090/-/reload
# Query alert status via amtool:
amtool alert --alertmanager.url=http://localhost:9093
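
Beyond the CLI tools, the Prometheus HTTP API can be queried directly to confirm that targets are being scraped. This is a small sketch, assuming Prometheus is reachable at http://localhost:9090 (for example through a port-forward). `prom_query` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: prom_query is a hypothetical helper; PROM_URL is an assumption.
PROM_URL="${PROM_URL:-http://localhost:9090}"

prom_query() {
  # Sends an instant query; --data-urlencode handles spaces and operators.
  curl -fsS --get "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

For example, `prom_query 'up == 0'` returns a JSON result listing the targets that are currently down.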