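Performance Tuning and Monitoring

Tuning NGINX works best as a closed loop: adjust the configuration, tune the operating system, then measure the effect through monitoring before making the next change.

Performance Tuning Checklist

Baseline Configuration Check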

nginx

# Worker processes
worker_processes auto;
worker_cpu_affinity auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 10240;
    multi_accept on;
}

# Network optimization
http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    # Keepalive
    keepalive_timeout 65;
    keepalive_requests 1000;  # default since 1.19.10; earlier versions defaulted to 100

    # Buffers
    client_body_buffer_size 16k;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 8k;

    # Compression
    gzip on;
    gzip_comp_level 5;
    gzip_types text/plain text/css application/json application/javascript;
}
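
The connection limits in this configuration multiply: worker_connections applies per worker and counts upstream connections as well as client connections. The sketch below works through that arithmetic. The worker count of 4 is a hypothetical value (worker_processes auto resolves to the number of CPU cores), and the function names are illustrative only.

```python
def max_clients(worker_processes: int, worker_connections: int, proxied: bool = False) -> int:
    """Rough upper bound on simultaneous clients for the given worker settings."""
    total = worker_processes * worker_connections
    # When proxying, each client typically also holds one upstream connection,
    # and both count against worker_connections.
    return total // 2 if proxied else total


def fd_headroom(worker_connections: int, worker_rlimit_nofile: int) -> float:
    """File descriptors available per allowed connection, for one worker."""
    return worker_rlimit_nofile / worker_connections


# Hypothetical 4-core host with the settings from the checklist.
print(max_clients(4, 10240))                # 40960
print(max_clients(4, 10240, proxied=True))  # 20480
print(round(fd_headroom(10240, 65535), 1))  # 6.4
```

A headroom below about 2 would be a reason to raise worker_rlimit_nofile, since a connection can need a second descriptor for an upstream socket or an open file. That cutoff is a rule of thumb, not an NGINX requirement.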

Monitoring Metrics

The stub_status Module

nginx

server {
    location /nginx_status {
        stub_status;
        allow 10.0.0.0/8;
        deny all;
    }
}

Output:

text

Active connections: 291 
server accepts handled requests
 16630948 16630948 31070465 
Reading: 6 Writing: 179 Waiting: 106

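Active connections: current number of active client connections, including those counted under Waiting
accepts: total connections accepted since startup
handled: total connections handled since startup
requests: total client requests processed since startup
Reading: connections whose request header is currently being read
Writing: connections to which a response is currently being written
Waiting: idle keepalive connections waiting for the next request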
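
To use these fields in a script, the plain-text output has to be parsed first. The following is a minimal sketch that parses the sample output shown above; the function name and dictionary keys are illustrative choices, not part of NGINX.

```python
import re
from typing import Dict

SAMPLE = """Active connections: 291 
server accepts handled requests
 16630948 16630948 31070465 
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> Dict[str, int]:
    """Parse stub_status plain-text output into a dict of integers."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The three cumulative counters sit alone on one line.
    counters = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if not (active and counters and states):
        raise ValueError("unrecognized stub_status output")
    return {
        "active": int(active.group(1)),
        "accepts": int(counters.group(1)),
        "handled": int(counters.group(2)),
        "requests": int(counters.group(3)),
        "reading": int(states.group(1)),
        "writing": int(states.group(2)),
        "waiting": int(states.group(3)),
    }


status = parse_stub_status(SAMPLE)
print(status["active"], status["requests"])  # 291 31070465
```

In the sample, Reading + Writing + Waiting (6 + 179 + 106) adds up to the 291 active connections, which is a quick sanity check on the parse.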

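Interpreting the Key Metrics

Metric	Healthy range	Alert threshold
Active connections	< 80% of worker_processes × worker_connections	> 90% of that limit
Waiting	High is good (keepalive connections are being reused)	Low may mean keepalive is not taking effect
accepts vs. handled	Equal	Any gap means connections were dropped, usually because a resource limit was reached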
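
The checks in this table reduce to a few ratios. The sketch below applies them to the counters from the sample output. The worker count of 4 and the intermediate "warn" band between 80% and 90% are assumptions made for illustration.

```python
def connection_utilization(active: int, worker_processes: int, worker_connections: int) -> float:
    """Active connections as a fraction of the configured connection limit."""
    return active / (worker_processes * worker_connections)


def utilization_level(utilization: float) -> str:
    """Map utilization onto the thresholds from the table."""
    if utilization > 0.9:
        return "alert"
    if utilization >= 0.8:
        return "warn"  # assumed band; the table only names 80% and 90%
    return "ok"


def dropped_connections(accepts: int, handled: int) -> int:
    """Connections accepted but never handled; should stay at 0."""
    return accepts - handled


def requests_per_connection(requests: int, handled: int) -> float:
    """Average requests served per connection; above 1 means keepalive reuse."""
    return requests / handled


util = connection_utilization(291, 4, 10240)
print(utilization_level(util))                    # ok
print(dropped_connections(16630948, 16630948))    # 0
print(round(requests_per_connection(31070465, 16630948), 2))
```

The requests-per-connection ratio complements the Waiting gauge: it is derived from cumulative counters, so it reflects keepalive reuse over the whole uptime rather than at a single instant.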

Log Monitoring

Slow Request Detection

nginx

log_format timed '$remote_addr - $request - $status - $request_time s';
access_log /var/log/nginx/slow.log timed if=$slow_request;

map $request_time $slow_request {
    default 0;
    ~^([3-9]|[1-9][0-9]+)\. 1;  # 3 seconds or longer, including two-digit values such as 10.250
}
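
Once slow.log is being written, it can be summarized offline. The sketch below parses lines in the timed format defined above and compares the duration numerically, so the result does not depend on how the map regex is written. The sample lines and the function name are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Matches: $remote_addr - $request - $status - $request_time s
TIMED_LINE = re.compile(
    r"^(?P<addr>\S+) - (?P<request>.*) - (?P<status>\d{3}) - (?P<seconds>\d+\.\d+) s$"
)


def slow_requests(lines: Iterable[str], threshold: float = 3.0) -> List[Tuple[float, str]]:
    """Return (seconds, request) pairs at or above threshold, slowest first."""
    hits = []
    for line in lines:
        match = TIMED_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the timed format
        seconds = float(match.group("seconds"))
        if seconds >= threshold:
            hits.append((seconds, match.group("request")))
    return sorted(hits, reverse=True)


sample = [
    "10.0.0.5 - GET /index.html HTTP/1.1 - 200 - 0.120 s",
    "10.0.0.6 - GET /report HTTP/1.1 - 200 - 3.412 s",
    "10.0.0.7 - POST /export HTTP/1.1 - 504 - 12.004 s",
]
for seconds, request in slow_requests(sample):
    print(f"{seconds:8.3f}  {request}")
```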

Error Rate Monitoring

Bash

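# Running 5xx error rate, printed every 100 requests
# ($9 is the status field in the default combined log format)
tail -f /var/log/nginx/access.log | \
  awk '{total++} $9 ~ /^5/ {err++} total % 100 == 0 {printf "5xx: %d/%d (%.2f%%)\n", err, total, 100 * err / total}'

# 5xx errors per minute, worst minutes first
awk '$9 ~ /^5/ {print substr($4,2,17)}' /var/log/nginx/access.log | \
  sort | uniq -c | sort -nr | head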
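
A count of errors per minute is not yet a rate; that needs the total request count for the same minute. The sketch below computes the 5xx share per minute, assuming the default combined log format, where field 4 holds the timestamp and field 9 the status. The sample lines are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable

SAMPLE = [
    '10.0.0.5 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"',
    '10.0.0.6 - - [10/Oct/2024:13:55:40 +0000] "GET /api HTTP/1.1" 502 157 "-" "curl/8.0"',
    '10.0.0.7 - - [10/Oct/2024:13:55:48 +0000] "GET /a HTTP/1.1" 200 10 "-" "curl/8.0"',
    '10.0.0.8 - - [10/Oct/2024:13:55:59 +0000] "GET /b HTTP/1.1" 304 0 "-" "curl/8.0"',
    '10.0.0.5 - - [10/Oct/2024:13:56:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"',
]


def error_rate_by_minute(lines: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of 5xx responses for each minute seen in the log."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # not a combined-format line
        # "[10/Oct/2024:13:55:36" -> "10/Oct/2024:13:55"
        minute = fields[3][1:18]
        totals[minute] += 1
        if fields[8].startswith("5"):
            errors[minute] += 1
    return {minute: errors[minute] / totals[minute] for minute in totals}


for minute, rate in error_rate_by_minute(SAMPLE).items():
    print(f"{minute}  {rate:.2%}")
```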

Prometheus Integration

nginx-exporter

Bash

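# Run the exporter. Inside a container, localhost is the container itself,
# so host networking is used here to reach NGINX on the same Linux host.
# The stub_status location must also allow the exporter's source address.
docker run -d --network host nginx/nginx-prometheus-exporter \
  --nginx.scrape-uri=http://localhost/nginx_status

yaml

# Prometheus configuration (prometheus.yml)
scrape_configs:
  - job_name: nginx
    static_configs:
      - targets: ['localhost:9113']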

nginx-exporter scrapes the stub_status endpoint and re-exposes the values as Prometheus metrics.

Core Metrics

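nginx_connections_active: active connections
nginx_connections_waiting: idle connections
nginx_connections_accepted, nginx_connections_handled: accepted and handled connection counters
nginx_http_requests_total: total requests
Request latency is not part of stub_status, so the exporter has no latency histogram in this mode; latency has to come from the access log ($request_time) or another source.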
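
Counters such as nginx_http_requests_total only become useful once they are turned into a rate between two scrapes. The sketch below parses a Prometheus text-format payload and derives requests per second. The sample values and the 15-second interval are made up, and the parser assumes unlabeled samples without timestamps, which is a simplification of the full exposition format.

```python
from typing import Dict

SAMPLE = """# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 291
# TYPE nginx_connections_waiting gauge
nginx_connections_waiting 106
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 31070465
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse 'name value' sample lines, skipping comments and blank lines."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


def per_second_rate(previous: float, current: float, interval_seconds: float) -> float:
    """Average per-second increase of a counter between two scrapes."""
    if current < previous:
        # The counter went backwards, so NGINX was restarted; count from zero.
        return current / interval_seconds
    return (current - previous) / interval_seconds


first = parse_metrics(SAMPLE)["nginx_http_requests_total"]
second = first + 1200  # hypothetical value from a scrape 15 seconds later
print(per_second_rate(first, second, 15))  # 80.0
```

Inside Prometheus itself the same calculation is normally left to the rate() function; the code only spells out what that amounts to.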

Load Testing Tools

Benchmarking with ab / wrk

Bash

# ab: 10,000 requests, 100 concurrent
ab -n 10000 -c 100 http://localhost/

# wrk (recommended): 12 threads, 400 connections, 30 seconds
wrk -t12 -c400 -d30s http://localhost/

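Turn off access_log before a benchmark so that log I/O does not skew the result, then compare RPS (requests per second) and latency before and after each tuning change.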
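
The before/after comparison can be scripted. The sketch below pulls the requests-per-second figure out of saved ab or wrk output and reports the relative change. The two output snippets are shortened, invented examples rather than real measurements.

```python
import re

_RPS_PATTERNS = (
    re.compile(r"^Requests/sec:\s+([\d.]+)", re.MULTILINE),         # wrk summary line
    re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE),  # ab summary line
)


def extract_rps(output: str) -> float:
    """Find the requests-per-second figure in ab or wrk output."""
    for pattern in _RPS_PATTERNS:
        match = pattern.search(output)
        if match:
            return float(match.group(1))
    raise ValueError("no requests-per-second line found")


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


before_run = "Requests per second:    12000.00 [#/sec] (mean)\n"
after_run = "Requests/sec:  15000.00\nTransfer/sec:     12.00MB\n"
change = percent_change(extract_rps(before_run), extract_rps(after_run))
print(f"RPS change: {change:+.1f}%")  # RPS change: +25.0%
```

Mixing tools as in this example only demonstrates the parser; a real comparison should use the same tool and parameters for both runs.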

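Locating Bottlenecks

Common Bottlenecks

CPU saturated: too few worker_processes, or a gzip level that is too high
File descriptors exhausted: worker_rlimit_nofile set too low
Memory pressure: oversized buffers or an oversized proxy_cache_path keys_zone (max_size caps the on-disk cache, not RAM)
Connections queuing: somaxconn and tcp_max_syn_backlog set too small
Disk I/O: log buffering not enabled, or a very large number of static files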
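
For the connection-queuing case, the kernel limits can be read directly from /proc on Linux. The sketch below reports both values along with the accept queue a listener would actually get, which is the smaller of the listen backlog and somaxconn. The default backlog of 511 is what NGINX uses on Linux when listen has no backlog parameter; the function names are illustrative.

```python
from pathlib import Path
from typing import Dict


def read_sysctl(name: str, root: str = "/proc/sys") -> int:
    """Read an integer sysctl such as net.core.somaxconn from procfs."""
    path = Path(root) / name.replace(".", "/")
    return int(path.read_text().split()[0])


def backlog_report(listen_backlog: int = 511, root: str = "/proc/sys") -> Dict[str, int]:
    """Collect the kernel queue limits relevant to connection queuing."""
    somaxconn = read_sysctl("net.core.somaxconn", root)
    syn_backlog = read_sysctl("net.ipv4.tcp_max_syn_backlog", root)
    return {
        "somaxconn": somaxconn,
        "tcp_max_syn_backlog": syn_backlog,
        # The kernel silently caps the listen() backlog at somaxconn.
        "effective_accept_queue": min(listen_backlog, somaxconn),
    }
```

Calling backlog_report() on the NGINX host returns the three values; if effective_accept_queue comes out lower than the configured backlog, somaxconn is the limit to raise.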

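Key Takeaways

stub_status provides real-time connection counts and request statistics
accepts == handled means no connections were dropped
A high Waiting value shows that keepalive is working
Log slow requests separately using map plus conditional logging
Prometheus with nginx-exporter covers metric collection and alerting
For load testing, wrk generally gives more representative results than ab (multithreaded, more realistic load)
Tuning is an ongoing process and should be driven by real traffic data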
