📅 2026-05-21 · 7 min read · ✍️ juanwangdev

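Log Aggregation and Monitoring Stack

Container monitoring needs both log aggregation and metrics monitoring. The sections below walk through deploying a complete monitoring stack.

Prometheus Deployment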

docker-compose.yml

YAML
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'

volumes:
  prometheus-data:

prometheus.yml

YAML
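global:
  scrape_interval: 15s   # default scrape interval for every job

scrape_configs:
  # Docker daemon metrics. The daemon only serves these when "metrics-addr"
  # is set in /etc/docker/daemon.json, on an address the container can reach.
  - job_name: 'docker'
    static_configs:
      - targets: ['host.docker.internal:9323']

  # Per-container metrics from cAdvisor, addressed by its Compose service name.
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']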
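
On Linux hosts, the name host.docker.internal used by the docker job is not defined inside containers by default (Docker Desktop provides it automatically). A minimal sketch of one way to make it resolve, assuming Docker Engine 20.10 or later, is to map it to the host gateway in the Prometheus service definition:

```yaml
# docker-compose.yml (fragment): merge into the existing prometheus service
services:
  prometheus:
    extra_hosts:
      # "host-gateway" is a special value that Docker resolves to the host's gateway IP
      - "host.docker.internal:host-gateway"
```

The daemon's metrics endpoint must also listen on an address that is reachable from the container network. An endpoint bound only to 127.0.0.1 would typically not be reachable this way.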

cAdvisor

YAML
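# Define in the same Compose file as prometheus so both share a network
# and the name "cadvisor" resolves.
services:
  cadvisor:
    # Consider pinning a specific release tag for reproducible deployments.
    image: gcr.io/cadvisor/cadvisor:latest
    ports:
      - "8080:8080"
    volumes:
      # Read-only host mounts that cAdvisor inspects for container statistics.
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
    command:
      # Report only Docker containers (plus root stats), not every cgroup.
      - '-docker_only'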

Grafana Deployment

YAML
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      # Initial admin password; change it before exposing Grafana beyond local testing.
      - GF_SECURITY_ADMIN_PASSWORD=admin

volumes:
  grafana-data:

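Data Source Configuration

Bash
# Add Prometheus as a data source
# Grafana UI: Configuration → Data Sources → Prometheus
# (newer Grafana versions list this under Connections → Data sources)
# URL: http://prometheus:9090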
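
The same data source can also be declared in a file instead of through the UI, using Grafana's provisioning mechanism. A minimal sketch, assuming the file is kept in a hypothetical local directory ./grafana/provisioning/datasources that is mounted into the container at /etc/grafana/provisioning/datasources:

```yaml
# ./grafana/provisioning/datasources/prometheus.yml (hypothetical host path)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy              # Grafana's backend proxies queries to Prometheus
    url: http://prometheus:9090
    isDefault: true
```

Grafana reads provisioning files at startup, so the data source exists as soon as the container is up and does not depend on manual steps surviving a volume reset.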

Alerting Rules

prometheus-rules.yml

YAML
groups:
  - name: container-alerts
    rules:
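      - alert: HighCPUUsage
        # rate() of a CPU-seconds counter is measured in cores, so 0.8 means
        # 0.8 of one core averaged over 5 minutes, not 80% of a CPU limit.
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} high CPU usage"

      - alert: HighMemoryUsage
        # The "> 0" filter skips containers without a memory limit, whose limit
        # is reported as 0 and would otherwise make the ratio +Inf.
        expr: container_memory_usage_bytes{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0) > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} high memory usage"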
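
Prometheus only evaluates rules from files listed in its main configuration, and it only forwards firing alerts to an Alertmanager it has been told about. A minimal sketch of the additions to prometheus.yml, assuming the rules file is mounted into the container with a volume entry such as ./prometheus-rules.yml:/etc/prometheus/prometheus-rules.yml and that Alertmanager runs as a Compose service named alertmanager (both are assumptions, not part of the configuration shown earlier):

```yaml
# prometheus.yml (fragment): add alongside the existing global and scrape_configs sections
rule_files:
  - /etc/prometheus/prometheus-rules.yml   # assumed mount path inside the container

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # assumed service name; 9093 is Alertmanager's default port
```

Running promtool check rules prometheus-rules.yml before restarting Prometheus catches syntax errors in the rules file early.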

Notification Channels

YAML
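# alertmanager.yml
route:
  receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      # Placeholder: replace with your own Slack incoming-webhook URL.
      # Alertmanager rejects a Slack receiver that has no api_url
      # (or a global slack_api_url).
      - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/YOURS'
        channel: '#alerts'
        send_resolved: true
        text: "{{ .CommonAnnotations.summary }}"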
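
The configuration above still needs an Alertmanager instance to load it. A minimal sketch of a Compose service, assuming alertmanager.yml sits next to docker-compose.yml and that the service name alertmanager is free to use (the name is an assumption; anything Prometheus is pointed at works):

```yaml
# docker-compose.yml (fragment): add under the same services: key as prometheus
services:
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"            # Alertmanager web UI and API
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
```

Keeping Alertmanager in the same Compose file as Prometheus lets Prometheus reach it by service name on the shared default network.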

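Common Metrics

| Item | Metric | Alert threshold |
|------|--------|-----------------|
| CPU usage | container_cpu_usage_seconds_total | > 80% |
| Memory usage | container_memory_usage_bytes | > 90% |
| Disk I/O | container_fs_reads_bytes_total | Abnormal growth |
| Network I/O | container_network_receive_bytes_total | Abnormal growth |
| Container restarts | container_restart_count | > 3 per hour |

Note: cAdvisor does not export a container_restart_count metric out of the box, so the last row assumes another exporter or a derived expression.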
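
With cAdvisor alone, the restart threshold in the table can be approximated from container_start_time_seconds, which changes each time a container is started again. A minimal sketch of such a rule, assuming the container keeps its name across restarts (as with a Docker restart policy); the group and alert names are hypothetical:

```yaml
# prometheus-rules.yml (fragment)
groups:
  - name: container-restart-alerts   # hypothetical group name
    rules:
      - alert: ContainerRestarting   # hypothetical alert name
        # changes() counts how often the start timestamp changed within the last hour
        expr: changes(container_start_time_seconds{name!=""}[1h]) > 3
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} restarted more than 3 times in the last hour"
```

This is an approximation: a container that is removed and recreated under a new ID shows up as a new series and is not counted as a restart.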

Dashboards

Bash
# Import prebuilt dashboards
# Grafana → Import → Dashboard ID: 193 (Docker)
# Grafana → Import → Dashboard ID: 893 (cAdvisor)

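Key Takeaways

  • Prometheus scrapes and stores container metrics; cAdvisor exposes the Docker container metrics
  • Grafana handles visualization and alerting and supports multiple data sources
  • Alerting rules define the thresholds (CPU, memory, restart count)
  • Alertmanager sends notifications to Slack, email, and other channels
  • Production environments should run the complete monitoring stack so problems are caught early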
