HTTP log statistics and analysis

2024-04-17

First, confirm the format of the log file.

Apache httpd log settings:

/etc/httpd/conf/httpd.conf

The default is
LogFormat "%h %l %u %t \"%r\" %>s %b" common
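Change it to the following (%D is the time taken to serve the request in microseconds; the trailing "us" is a literal suffix)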
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{X-Forwarded-For}i\" %Dus" common

nginx

log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '$upstream_cache_status "$http_user_agent" "$http_x_forwarded_for"';

The log file contents will look something like this:

192.168.0.202 - - [12/May/2024:16:19:23 +0900] "GET /robots.txt HTTP/1.0" 200 22 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "66.249.66.204,172.71.223.9" 7836us
192.168.0.202 - - [12/May/2024:16:34:23 +0900] "GET /blog/100086 HTTP/1.0" 200 16425 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.6367.155 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "66.249.66.203, 172.71.223.69" 39395us
192.168.0.202 - - [12/May/2024:16:49:23 +0900] "GET /?page=2 HTTP/1.0" 200 8775 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.6367.155 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "66.249.66.205, 172.71.222.204" 14903us
192.168.0.202 - - [12/May/2024:17:04:23 +0900] "GET /blog/100062 HTTP/1.0" 302 - "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.6367.155 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "66.249.66.204, 172.70.43.199" 12256us
192.168.0.202 - - [12/May/2024:17:04:24 +0900] "GET /blog/aws_waf HTTP/1.0" 200 20520 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.6367.155 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "66.249.66.204, 172.70.42.242" 9967us
192.168.0.202 - - [12/May/2024:17:34:23 +0900] "GET /robots.txt HTTP/1.0" 200 22 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "66.249.66.204,172.70.175.183" 10242us
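
Before running any statistics, it can help to check that a line really splits the way the format suggests. The following is only a sketch, assuming the extended Apache format shown above; the function name show_fields is made up for illustration.

```shell
# show_fields: print the main fields of each log line read from stdin.
# Assumes the extended Apache format above. Splitting on '"' gives:
#   $1 = host, ident, user, time   $2 = request line   $3 = status and bytes
#   $4 = referer   $6 = user agent   $8 = X-Forwarded-For   $NF = " <time>us"
show_fields() {
  awk -F'"' '{
    split($1, head, " ")      # head[1] = remote host
    split($2, req, " ")       # req[1] = method, req[2] = URL
    split($3, res, " ")       # res[1] = status, res[2] = bytes
    t = $NF
    gsub(/[^0-9]/, "", t)     # keep only the digits of the duration
    printf "host=%s method=%s url=%s status=%s bytes=%s xff=%s time_us=%s\n",
           head[1], req[1], req[2], res[1], res[2], $8, t
  }'
}
# Usage:
#   head -n 3 /var/log/httpd/access_log | show_fields
```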

How to compute traffic statistics

Total site traffic (GB), summing the response size in field 10:

$ cat /var/log/httpd/access_log |awk '{sum+=$10} END {print sum/1024/1024/1024}'
12.00783026
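
The same sum can be wrapped in a small function that reads the file directly and skips lines whose size field is "-" (for example the 302 response in the sample). This is a sketch under the assumption that the size is still field 10; the name total_gb is hypothetical.

```shell
# total_gb: sum the response-size field ($10 in the format above) and print GB.
# Lines where the size is not a plain number (e.g. "-") are skipped explicitly.
total_gb() {
  awk '$10 ~ /^[0-9]+$/ { sum += $10 }
       END { printf "%.3f\n", sum / 1024 / 1024 / 1024 }' "$@"
}
# Usage (path taken from the command above):
#   total_gb /var/log/httpd/access_log
```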



List the top 10 IPs by number of requests:

$ cat /var/log/httpd/access_log |awk '{print $1}'|sort|uniq -c|sort -nr|head -10
   3272 192.168.0.202
    213 192.168.0.152
    113 192.168.0.160
     11 192.168.0.150
         ::
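
In the sample log, field 1 is almost always the internal proxy address (192.168.0.202), while the original client appears as the first entry of X-Forwarded-For. A possible variant counts that entry instead. It is a sketch that assumes the extended Apache format above, where X-Forwarded-For is the 8th quote-delimited field; top_clients is a made-up name. The header is supplied by the client side, so the counts are indicative only.

```shell
# top_clients: top 10 client IPs taken from the first X-Forwarded-For entry.
# Assumes X-Forwarded-For is the 8th field when splitting on '"'.
top_clients() {
  awk -F'"' '{
    split($8, hops, ",")          # client first, then each proxy
    ip = hops[1]
    gsub(/ /, "", ip)             # drop stray spaces around the address
    if (ip != "" && ip != "-") print ip
  }' "$@" | sort | uniq -c | sort -nr | head -n 10
}
# Usage:
#   top_clients /var/log/httpd/access_log
```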



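List how many times jpg/png/gif/pdf files with a response larger than 100 KB were fetched, together with the file size (the dots in the pattern are escaped so that they match a literal "."):

$ cat /var/log/httpd/access_log |awk '($10 > 100000 && $7~/\.(jpg|png|pdf|gif)/){print $10 " " $7}'|sort -n|uniq -c|sort -nr|head -50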
      7 177755 /upload/2024/04/p129165650c_263451.b.jpg
      1 302675 /upload/2024/03/pbf1d1b9a9_64bbaf.b.jpg
      1 280064 /upload/2024/03/pbf1a5aa04_04ea33.b.jpg
      1 241164 /upload/2024/03/pbf1d1b9a9_64bbaf.jpg
      1 229435 /upload/2024/03/pbf1a5aa04_04ea33.jpg
      1 220945 /upload/2024/03/pbf1bdfd87_7428c0.b.jpg
      1 208685 /upload/2024/03/pdc09b771e_9857c5.b.jpg
      ::
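
To see which of these files cost the most bandwidth overall, the hits and bytes can be added up per URL in a single awk pass. This is only a sketch using the same field positions and threshold as the command above; the function name big_files is hypothetical.

```shell
# big_files: per URL, print total bytes, hit count and URL for image/pdf
# responses larger than 100000 bytes, largest total first.
big_files() {
  awk '$10 + 0 > 100000 && $7 ~ /\.(jpg|png|pdf|gif)/ {
         hits[$7]++               # number of large responses for this URL
         bytes[$7] += $10         # bytes sent for this URL
       }
       END {
         for (u in hits) printf "%.0f %d %s\n", bytes[u], hits[u], u
       }' "$@" | sort -nr | head -n 50
}
# Usage:
#   big_files /var/log/httpd/access_log
```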



List the slowest pages (over 400000 us / 0.4 s), version 1 (time and URL only):
$ cat access_log | awk -F\" '{print $NF,$2}' | awk '{print $1,$3}' | sed 's/us=//' | sed 's/us//' | awk '{if ($1 > 400000) {print $1, $2}}' |sort -nr
1708200 /
693955 /blog/100086
617641 /blog/100086
558629 /blog/etf00939_00940
539513 /blog/cat/55
531085 /dv/stat/blog?act=referer
486492 /

List the slowest pages (over 400000 us / 0.4 s), version 2 (adds the remote IP and the request time):
$ cat access_log | awk -F\" '{print $NF,$2,$1}' | awk '{print $1,$3,$5,$8}' | sed 's/us=//' | sed 's/us//' | awk '{if ($1 > 400000) {print $1, $2,$3,$4}}' |sort -nr
1708200 / 192.168.0.202 [14/May/2024:11:03:15
693955 /blog/100086 192.168.0.202 [14/May/2024:11:03:50
617641 /blog/100086 192.168.0.202 [14/May/2024:11:03:24
558629 /blog/etf00939_00940 192.168.0.202 [14/May/2024:01:04:36
539513 /blog/cat/55 192.168.0.202 [14/May/2024:11:03:40
531085 /dv/stat/blog?act=referer 192.168.0.202 [14/May/2024:16:18:06
486492 / 192.168.0.202 [14/May/2024:10:55:08
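
The awk and sed chain above can also be folded into one awk pass. The following is a sketch rather than a drop-in replacement: it assumes the same extended Apache format, takes the threshold as an optional argument, and prints the timestamp without the leading "[". The name slow_pages is made up.

```shell
# slow_pages: read log lines on stdin and list requests slower than the
# threshold in microseconds (default 400000), slowest first.
# Output columns: time_us URL remote_host timestamp
slow_pages() {
  awk -F'"' -v min_us="${1:-400000}" '{
    t = $NF
    gsub(/[^0-9]/, "", t)              # " 39395us" -> "39395"
    if (t + 0 > min_us + 0) {
      split($1, head, " ")             # head[1] = host, head[4] = "[date:time"
      split($2, req, " ")              # req[2] = URL
      print t, req[2], head[1], substr(head[4], 2)
    }
  }' | sort -nr
}
# Usage:
#   slow_pages 400000 < access_log
```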

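**Note:
awk -F\" splits each access_log line on the double-quote character (") instead of the default whitespace.
When the log records the user agent, which itself contains a varying number of spaces, whitespace splitting puts the later fields at inconsistent positions.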
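
The difference is easy to see by counting fields under both separators. In this sketch (the function names nf_space and nf_quote are hypothetical), whitespace splitting typically reports several different field counts for a log in the format above, while quote splitting should report a single value for well-formed lines.

```shell
# nf_space: distinct field counts when splitting on whitespace (awk default).
nf_space() { awk '{ print NF }' "$@" | sort -un; }

# nf_quote: distinct field counts when splitting on the double quote.
nf_quote() { awk -F'"' '{ print NF }' "$@" | sort -un; }

# Usage:
#   nf_space access_log    # many values: the user agent length varies
#   nf_quote access_log    # one value if every line has the same layout
```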

