
Parsing CSV with Filebeat, Logstash, and Elasticsearch

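Comma-separated values (CSV, sometimes called character-separated values because the delimiter does not have to be a comma) is a file format that stores tabular data (numbers and text) as plain text. Plain text means the file is a sequence of characters and contains no data that has to be interpreted as binary numbers. A CSV file consists of any number of records separated by line breaks; each record consists of fields separated by some other character or string, most commonly a comma or a tab. Usually every record has exactly the same sequence of fields.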


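I. Preparation

  1. Download and install the same version of Elasticsearch, Logstash, and Filebeat.
  2. Prepare the test data file movies.csv, delete most of its rows, and check the result with cat movies.csv
    • Final location: /Users/your_path/elk/ml-25m/movies.csv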
movieId,title,genres
38,It Takes Two (1995),Children|Comedy
39,Clueless (1995),Comedy|Romance
40,"Cry, the Beloved Country (1995)",Drama
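The last sample row shows why a plain split on commas is not enough: the title contains a comma and is therefore quoted. As a minimal sketch (the names `SAMPLE` and `read_movies` are made up for illustration), Python's standard csv module parses the same rows and keeps the quoted title in one field:

```python
import csv
import io

# The sample rows from movies.csv shown above.
SAMPLE = '''movieId,title,genres
38,It Takes Two (1995),Children|Comedy
39,Clueless (1995),Comedy|Romance
40,"Cry, the Beloved Country (1995)",Drama
'''

def read_movies(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_movies(SAMPLE)
for row in rows:
    # The quoted title keeps its embedded comma as part of a single field.
    print(row["movieId"], "|", row["title"], "|", row["genres"])
```

The csv filter in Logstash has to handle the same quoting rule, which is why the pipeline below uses a CSV parser rather than a simple split.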

II. Walkthrough

  1. Start an Elasticsearch instance with ./bin/elasticsearch. config/elasticsearch.yml keeps its defaults, with two CORS settings enabled at the end:
#cluster.name: my-application
#node.name: node-1
#node.attr.rack: r1
#path.data: /path/to/data
#path.logs: /path/to/logs
#bootstrap.memory_lock: true
#network.host: 192.168.0.1
#http.port: 9200
#discovery.seed_hosts: ["host1", "host2"]
#action.destructive_requires_name: true
http.cors.enabled: true
http.cors.allow-origin: "*"
  2. Start a Logstash instance with ./bin/logstash -f config/logstash.conf. The content of logstash.conf is as follows:
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.

input {
  beats {
    port => 9011
  }
  # Alternative: read the file directly instead of receiving it from Filebeat.
  #file {
  #  path => ["/Users/your_path/elk/ml-25m/movies.csv"]
  #  start_position => "beginning"
  #  sincedb_path => "/dev/null"
  #}
}

filter {
  csv {
    separator => ","
    # The third column is named "genre" here, although the CSV header calls it "genres".
    columns => ["movieId","title","genre"]
  }
  mutate {
    # Turn "Children|Comedy" into ["Children", "Comedy"].
    split => { "genre" => "|" }
    # remove_field => ["path", "host","@timestamp","message"]
  }
  mutate {
    convert => {
      # No "year" field is produced above, so this conversion has no effect as written.
      "year" => "integer"
    }
    strip => ["title"]
    #remove_field => ["path", "host","@timestamp","message","content"]
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "movies"
  }
  stdout {}
}
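To make the filter section easier to reason about, here is a rough Python emulation of what it does to a single line. This is only a sketch under assumptions: `apply_filters` and `is_header` are hypothetical helpers, not part of Logstash, and the real plugins handle many more edge cases. It also shows that the header row is parsed like any other line, so it ends up indexed as a document unless it is dropped (in Logstash this could be done, for example, with a conditional around a drop filter).

```python
import csv

# Same column names as the csv filter's `columns` setting.
COLUMNS = ["movieId", "title", "genre"]

def apply_filters(line):
    """Mimic csv -> mutate(split) -> mutate(strip) for one input line."""
    values = next(csv.reader([line]))
    event = dict(zip(COLUMNS, values))
    if "genre" in event:
        event["genre"] = event["genre"].split("|")
    if "title" in event:
        event["title"] = event["title"].strip()
    return event

def is_header(event):
    """The header row parses like any other line, so detect it explicitly."""
    return event.get("movieId") == "movieId"

for line in ['movieId,title,genres', '40,"Cry, the Beloved Country (1995)",Drama']:
    event = apply_filters(line)
    print("skip header" if is_header(event) else event)
```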
  3. Start a Filebeat instance with ./filebeat -e -c movie.yml. The configuration is as follows:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /Users/your_path/elk/ml-25m/movies.csv

setup.template.settings:
  index.number_of_shards: 1

output.logstash:
  hosts: ["localhost:9011"]

#output.elasticsearch:
#  hosts: ["localhost:9200"]
#  protocol: "http"
#  index: "filebeat-movies-%{+yyyy.MM.dd}"

setup.ilm.enabled: false
setup.template.name: "filebeat-movies"
setup.template.pattern: "filebeat-movies-*"
  4. Query

    • Search for titles containing Beloved. A match query does not interpret wildcards: the trailing * is dropped when the query text is analyzed, so this is a search for the term beloved.
    • curl -XGET "localhost:9200/movies/_search?pretty" -H "content-type:application/json" -d '{"_source":["movieId","title"],"query":{"match":{"title":"Beloved*"}}}'
    {
    "took" : 107,
    "timed_out" : false,
    "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
    },
    "hits" : {
    "total" : {
    "value" : 1,
    "relation" : "eq"
    },
    "max_score" : 0.94597876,
    "hits" : [
    {
    "_index" : "movies",
    "_type" : "_doc",
    "_id" : "On05CXkBxPLXGnNxw4p0",
    "_score" : 0.94597876,
    "_source" : {
    "movieId" : "40",
    "title" : "Cry, the Beloved Country (1995)"
    }
    }
    ]
    }
    }
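The same search can be issued from a script. The following is a minimal sketch using only the Python standard library; `build_search_request` is a hypothetical helper, the host is assumed to be the local node used above, and POST is used because the _search endpoint accepts a request body with either GET or POST. The request is only constructed here, not sent.

```python
import json
import urllib.request

def build_search_request(index, field, text, source_fields,
                         host="http://localhost:9200"):
    """Build (but do not send) a match query against one index."""
    body = {"_source": source_fields, "query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{host}/{index}/_search?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("movies", "title", "Beloved", ["movieId", "title"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# With Elasticsearch running, the request could be sent with:
#   print(urllib.request.urlopen(req).read().decode("utf-8"))
```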
  5. Append data

    • echo 111,liusir,haha >> movies.csv
    • echo 222,cry,hehe >> movies.csv
    • echo 222,take,hehe >> movies.csv
    • echo 222,takes off,heihei >> movies.csv
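The echo commands work because none of the appended values contains a comma. As a hedged sketch of a safer way to append rows, the snippet below uses csv.writer, which quotes fields when needed. `append_rows` is a hypothetical helper, the second row is invented test data, and an in-memory buffer stands in for movies.csv.

```python
import csv
import io

def append_rows(handle, rows):
    """Write rows in CSV form, quoting any field that contains the delimiter."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerows(rows)

# In-memory stand-in for the file; the second row is made-up sample data.
buf = io.StringIO()
append_rows(buf, [[111, "liusir", "haha"],
                  [223, "Hello, World (2021)", "Drama|Comedy"]])
print(buf.getvalue(), end="")
```

Against the real file, the same helper could be called inside `with open("movies.csv", "a", newline="") as f:`.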
  6. Query documents whose title contains takes

    • curl -XGET "localhost:9200/_search?pretty" -H "content-type:application/json" -d '{"_source":["movieId","title"],"query":{"match":{"title":"takes*"}}}'
    {
    "took" : 489,
    "timed_out" : false,
    "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
    },
    "hits" : {
    "total" : {
    "value" : 2,
    "relation" : "eq"
    },
    "max_score" : 1.3125186,
    "hits" : [
    {
    "_index" : "movies",
    "_type" : "_doc",
    "_id" : "QH3nC3kBxPLXGnNxIYo-",
    "_score" : 1.3125186,
    "_source" : {
    "movieId" : "222",
    "title" : "takes off"
    }
    },
    {
    "_index" : "movies",
    "_type" : "_doc",
    "_id" : "OX05CXkBxPLXGnNxw4pz",
    "_score" : 0.9411969,
    "_source" : {
    "movieId" : "38",
    "title" : "It Takes Two (1995)"
    }
    }
    ]
    }
    }
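The result contains "takes off" and "It Takes Two (1995)" but not "take", which follows from how a match query compares analyzed terms. The sketch below is a deliberately rough approximation, assuming the default analyzer on a dynamically mapped text field behaves like "lowercase and split on non-word characters" for this data; `analyze` and `match` are illustrative helpers, not Elasticsearch APIs.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer: lowercase, split on non-word chars."""
    return re.findall(r"\w+", text.lower())

def match(query, title):
    """A match query hits when at least one analyzed query term occurs in the title."""
    return bool(set(analyze(query)) & set(analyze(title)))

titles = ["liusir", "cry", "take", "takes off", "It Takes Two (1995)"]
hits = [t for t in titles if match("takes*", t)]
print(analyze("takes*"))
print(hits)
```

The `*` disappears during analysis, and "take" does not match because no stemming is applied, so only exact term matches on "takes" are returned.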


Parsing nginx Logs with Filebeat, Logstash, and Elasticsearch

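I. Concepts

  1. access_log, usually used together with log_format and open_log_file_cache.

    • Syntax: access_log path [format [buffer=size] [gzip[=level]] [flush=time] [if=condition]];
      • path: where the log is written.
      • format: the name of the log format to use.
      • gzip: compresses the buffered data before it is written to the file. It is off unless specified; the level ranges from 1 to 9 (default 1), and a higher level compresses better but more slowly.
      • buffer: the size of the write buffer (used together with flush).
      • flush: the maximum time data may stay in the buffer; once it is older than this, it is written out to the file (used together with buffer).
      • if: conditional logging; if the condition evaluates to "0" or an empty string, the request is not logged.
    • Context: http, server, location, if in location, limit_except.
    • log_format
      • Syntax: log_format name [escape=default|json|none] string ...;
        • name: the format name, referenced as format in the access_log directive.
        • escape: how characters in variable values are escaped: default, json, or none (no escaping). The default is default.
        • string: the format string, built from nginx variables such as $remote_addr and $request.

          Special case: access_log off; disables all access_log directives at the current level.

  2. error_log

    • Syntax: error_log file [level];
      • level: debug, info, notice, warn, error, crit, alert, emerg, in order of increasing severity. Messages at the configured level and above are logged.
    • Context: main, http, mail, stream, server, location.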
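Two of the rules above are easy to get backwards: the if= condition of access_log and the severity threshold of error_log. The following sketch restates them as two small Python functions. The helper names are made up for illustration, and nginx's own evaluation of variables is not modeled here.

```python
# error_log levels in order of increasing severity.
LEVELS = ["debug", "info", "notice", "warn", "error", "crit", "alert", "emerg"]

def access_logged(condition_value):
    """access_log ... if=condition: skip logging when the value is "0" or empty."""
    return condition_value not in ("", "0")

def error_logged(message_level, configured_level="error"):
    """error_log file level: log messages at the configured level or more severe."""
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)

print(access_logged("1"), access_logged("0"), access_logged(""))
print(error_logged("crit", "warn"), error_logged("info", "warn"))
```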

II. Usage

  1. nginx configuration

    • nginx.conf
    #user  nobody;
    worker_processes 1;
    #pid logs/nginx.pid;
    events {
    worker_connections 1024;
    }
    http {
    include mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
    '$status $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for"';
    #access_log logs/access.log main;
    log_format json escape=json '{ "@timestamp": "$time_iso8601", '
    '"time": "$time_iso8601", '
    '"remote_addr": "$remote_addr", '
    '"remote_user": "$remote_user", '
    '"body_bytes_sent": "$body_bytes_sent", '
    '"request_time": "$request_time", '
    '"status": "$status", '
    '"host": "$host", '
    '"request": "$request", '
    '"request_method": "$request_method", '
    '"uri": "$uri", '
    '"http_referrer": "$http_referer", '
    '"body_bytes_sent":"$body_bytes_sent", '
    '"http_x_forwarded_for": "$http_x_forwarded_for", '
    '"http_user_agent": "$http_user_agent" '
    '}';
    sendfile on;
    #tcp_nopush on;
    #keepalive_timeout 0;
    keepalive_timeout 65;
    #gzip on;
    include conf.d/*.conf;
    }
    • conf.d/test1.conf
    server {
    listen 80;
    server_name localhost;
    root your_path;

    access_log /usr/local/nginx1.17.9/logs/backend_access.log main;
    location / {
    index index.php index.html index.htm;
    }
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
    root html;
    }

    location ~ \.php$ {
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
    }
    }
    • cat backend_access.log
    127.0.0.1 - - [07/May/2021:11:09:31 +0800] "GET /index.php HTTP/1.1" 200 2298784 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36" "-"
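A line in the main format is positional, so consuming it programmatically means matching it against the same layout as the log_format directive. Below is a minimal sketch of such a parser. The regular expression is an assumption written to mirror the main format shown in nginx.conf above, and `parse_line` is a hypothetical helper.

```python
import re

# One named group per variable of the `main` log_format, in the same order.
MAIN_FORMAT = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)

LINE = ('127.0.0.1 - - [07/May/2021:11:09:31 +0800] "GET /index.php HTTP/1.1" '
        '200 2298784 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36" "-"')

def parse_line(line):
    """Return the fields of one access-log line, or None if it does not match."""
    m = MAIN_FORMAT.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields

entry = parse_line(LINE)
print(entry["remote_addr"], entry["request"], entry["status"])
```

This kind of hand-written pattern is what the JSON log format and the Filebeat nginx module, both covered below, make unnecessary.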
    • conf.d/test2.conf
    server {
    listen 80;
    server_name phpweb.com;
    root your_path2;

    access_log /usr/local/nginx1.17.9/logs/phpweb_access.log json;

    location / {
    index index.php index.html index.htm;
    }
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
    root html;
    }

    location ~ \.php$ {
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
    }
    }
    • cat phpweb_access.log (one entry, pretty-printed here for readability; nginx writes each entry on a single line)
    {
    "@timestamp":"2021-05-07T11:09:54+08:00",
    "time":"2021-05-07T11:09:54+08:00",
    "remote_addr":"127.0.0.1",
    "remote_user":"-",
    "body_bytes_sent":"62436",
    "request_time":"0.003",
    "status":"200",
    "host":"phpweb.com",
    "request":"GET /inc.php HTTP/1.1",
    "request_method":"GET",
    "uri":"/inc.php",
    "http_referrer":"-",
    "http_x_forwarded_for":"-",
    "http_user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
    }
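With the JSON format, each log line can be decoded directly instead of being matched against a pattern. Note that every value is logged as a string, because the format string wraps each variable in quotes. The sketch below (with `parse_json_line` as an illustrative helper and a shortened sample line) decodes one entry and converts the numeric fields:

```python
import json

# A shortened single-line entry in the JSON log format defined above.
LINE = ('{ "@timestamp": "2021-05-07T11:09:54+08:00", "remote_addr": "127.0.0.1", '
        '"body_bytes_sent": "62436", "request_time": "0.003", "status": "200", '
        '"request_method": "GET", "uri": "/inc.php" }')

def parse_json_line(line):
    """Decode one JSON access-log line and convert the numeric fields."""
    event = json.loads(line)
    event["status"] = int(event["status"])
    event["body_bytes_sent"] = int(event["body_bytes_sent"])
    event["request_time"] = float(event["request_time"])
    return event

event = parse_json_line(LINE)
print(event["uri"], event["status"], event["request_time"])
```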
  2. Collect Nginx logs in real time with Filebeat

    • Start the Elasticsearch service: cd your_elasticsearch_path && ./bin/elasticsearch
    • Switch to the Filebeat installation directory and create nginx.yml as a copy of the default configuration, with the following content:
    filebeat.inputs:
    - type: log
      enabled: true
      paths:
        - /usr/local/nginx1.17.9/logs/phpweb_access.log

    setup.template.settings:
      index.number_of_shards: 1

    output.elasticsearch:
      hosts: ["localhost:9200"]
      protocol: "http"
      index: "filebeat-nginx-%{+yyyy.MM.dd}"

    setup.ilm.enabled: false
    setup.template.name: "filebeat-nginx"
    setup.template.pattern: "filebeat-nginx-*"
    health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open filebeat-nginx-2021.05.07 JTzFfsmiTuKQF8uudgET1A 1 1 1 0 38.4kb 38.4kb
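The index name in this listing comes from the pattern filebeat-nginx-%{+yyyy.MM.dd} in the output section. As a small sketch, and assuming the date part is derived from the event's @timestamp expressed in UTC, the name for a given event can be reproduced as follows (`daily_index` is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

def daily_index(prefix, when):
    """Build a daily index name from an event time, using the UTC calendar date."""
    utc = when.astimezone(timezone.utc)
    return prefix + "-" + utc.strftime("%Y.%m.%d")

# Event time taken from the sample document: 2021-05-07T06:24:26Z.
name = daily_index("filebeat-nginx", datetime(2021, 5, 7, 6, 24, 26, tzinfo=timezone.utc))
print(name)

# Under the UTC assumption, an early-morning event in UTC+8 lands in the previous day's index.
local = datetime(2021, 5, 8, 7, 30, tzinfo=timezone(timedelta(hours=8)))
print(daily_index("filebeat-nginx", local))
```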
    • View the document details with curl: curl -XGET "localhost:9200/filebeat-nginx-2021.05.07/_search?pretty"
    {
    "took" : 1,
    "timed_out" : false,
    "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
    },
    "hits" : {
    "total" : {
    "value" : 1,
    "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
    {
    "_index" : "filebeat-nginx-2021.05.07",
    "_type" : "_doc",
    "_id" : "i6p_RXkBlQNTrkx8sucY",
    "_score" : 1.0,
    "_source" : {
    "@timestamp" : "2021-05-07T06:24:26.152Z",
    "input" : {
    "type" : "log"
    },
    "ecs" : {
    "version" : "1.8.0"
    },
    "host" : {
    "name" : "xxxdeMacBook-Air-5.local"
    },
    "agent" : {
    "name" : "xxxdeMacBook-Air-5.local",
    "type" : "filebeat",
    "version" : "7.12.0",
    "hostname" : "xxxdeMacBook-Air-5.local",
    "ephemeral_id" : "72d51d2b-dd60-4a05-b014-90c3185e2c58",
    "id" : "fb31744d-e497-45c7-a173-bd8c15b4652e"
    },
    "message" : "{ \"@timestamp\": \"2021-05-07T14:24:25+08:00\", \"time\": \"2021-05-07T14:24:25+08:00\", \"remote_addr\": \"127.0.0.1\", \"remote_user\": \"-\", \"body_bytes_sent\": \"21\", \"request_time\": \"0.005\", \"status\": \"200\", \"host\": \"phpweb.com\", \"request\": \"GET /main.php HTTP/1.1\", \"request_method\": \"GET\", \"uri\": \"/main.php\", \"http_referrer\": \"-\", \"body_bytes_sent\":\"21\", \"http_x_forwarded_for\": \"-\", \"http_user_agent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36\" }",
    "log" : {
    "offset" : 2088,
    "file" : {
    "path" : "/usr/local/nginx1.17.9/logs/phpweb_access.log"
    }
    }
    }
    }
    ]
    }
    }
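    • Filebeat can do simple processing of JSON data. Modify nginx.yml as follows:
      • json.keys_under_root: false by default, in which case the decoded JSON is placed under a "json" key in the output document; when enabled, the decoded keys are copied to the top level of the document.
      • json.overwrite_keys: when keys_under_root is enabled, values from the decoded JSON object overwrite the fields Filebeat normally adds if the keys conflict.
      • json.add_error_key: when enabled, Filebeat adds the keys error.message and error.type: json if JSON decoding fails.
      • json.message_key: optional; names the JSON key to which line filtering and multiline settings are applied.
        • The key must be at the top level of the JSON object and its value must be a string; otherwise no filtering or multiline aggregation takes place.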
    filebeat.inputs:
    - type: log
      enabled: true
      paths:
        - /usr/local/nginx1.17.9/logs/phpweb_access.log
      tags: ["phpweb-accesslog"]
      json.keys_under_root: true
      json.overwrite_keys: true

    setup.template.settings:
      index.number_of_shards: 1

    output.elasticsearch:
      hosts: ["localhost:9200"]
      protocol: "http"
      index: "filebeat-nginx-%{+yyyy.MM.dd}"

    setup.ilm.enabled: false
    setup.template.name: "filebeat-nginx"
    setup.template.pattern: "filebeat-nginx-*"
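The effect of json.keys_under_root and json.overwrite_keys can be pictured as a merge of two dictionaries. The following is a simplified model, not Filebeat's implementation: `decode_event` is a hypothetical function, the sample timestamps are illustrative, and details such as timestamp normalization or fields added by later processors are ignored.

```python
import json

def decode_event(base, keys_under_root=False, overwrite_keys=False):
    """Simplified model of Filebeat's JSON decoding options for one event."""
    decoded = json.loads(base["message"])
    # The raw line is replaced by its decoded form.
    event = {k: v for k, v in base.items() if k != "message"}
    if not keys_under_root:
        event["json"] = decoded          # default: nest under a "json" key
        return event
    for key, value in decoded.items():
        if key in event and not overwrite_keys:
            continue                     # keep Filebeat's own field on conflict
        event[key] = value
    return event

base = {
    "@timestamp": "2021-05-07T06:40:24.152Z",   # read time set by Filebeat (illustrative)
    "message": '{"@timestamp": "2021-05-07T14:40:23+08:00", "uri": "/main.php"}',
}
print(decode_event(base))
print(decode_event(base, keys_under_root=True))
print(decode_event(base, keys_under_root=True, overwrite_keys=True))
```

In this model, the third call corresponds to the configuration above: the decoded keys sit at the top level and the @timestamp from the log line replaces the read time.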
    • Restart Filebeat: sudo ./filebeat -e -c nginx.yml
    • View the document details with curl: curl -XGET "localhost:9200/filebeat-nginx-2021.05.07/_search?pretty"
    {
    "took" : 2,
    "timed_out" : false,
    "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
    },
    "hits" : {
    "total" : {
    "value" : 6,
    "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
    {
    "_index" : "filebeat-nginx-2021.05.07",
    "_type" : "_doc",
    "_id" : "jKqORXkBlQNTrkx8g-db",
    "_score" : 1.0,
    "_source" : {
    "@timestamp" : "2021-05-07T06:40:23.000Z",
    "body_bytes_sent" : "21",
    "request_time" : "0.018",
    "request" : "GET /main.php HTTP/1.1",
    "ecs" : {
    "version" : "1.8.0"
    },
    "log" : {
    "offset" : 2608,
    "file" : {
    "path" : "/usr/local/nginx1.17.9/logs/phpweb_access.log"
    }
    },
    "http_x_forwarded_for" : "-",
    "time" : "2021-05-07T14:40:23+08:00",
    "remote_addr" : "127.0.0.1",
    "input" : {
    "type" : "log"
    },
    "agent" : {
    "version" : "7.12.0",
    "hostname" : "xxxdeMacBook-Air-5.local",
    "ephemeral_id" : "99f15770-df47-4235-a766-7dfbd6e6718c",
    "id" : "fb31744d-e497-45c7-a173-bd8c15b4652e",
    "name" : "xxxdeMacBook-Air-5.local",
    "type" : "filebeat"
    },
    "host" : {
    "name" : "xxxdeMacBook-Air-5.local"
    },
    "http_user_agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "http_referrer" : "-",
    "status" : "200",
    "uri" : "/main.php",
    "request_method" : "GET",
    "remote_user" : "-"
    }
    }
    ]
    }
    }
  3. Analyze Nginx logs with the Filebeat nginx module

    • Start the Elasticsearch and Kibana services
    • Only two settings in the default filebeat.yml need to change; the input settings can be left alone
    setup.kibana:
      host: "localhost:5601"

    output.elasticsearch:
      hosts: ["localhost:9200"]
    • Enable the nginx module: ./filebeat modules enable nginx
    • List the modules and their status: ./filebeat modules list
    • Edit modules.d/nginx.yml
    # Module: nginx
    # Docs: https://www.elastic.co/guide/en/beats/filebeat/7.x/filebeat-module-nginx.html
    - module: nginx
      # Access logs
      access:
        enabled: true

        # Set custom paths for the log files. If left empty,
        # Filebeat will choose the paths depending on your OS.
        var.paths: ["/usr/local/nginx1.17.9/logs/backend_access.log"]

      # Error logs
      error:
        enabled: true

        # Set custom paths for the log files. If left empty,
        # Filebeat will choose the paths depending on your OS.
        var.paths: ["/usr/local/nginx1.17.9/logs/error.log"]

      # Ingress-nginx controller logs. This is disabled by default. It could be used in Kubernetes environments to parse ingress-nginx logs
      ingress_controller:
        enabled: false

        # Set custom paths for the log files. If left empty,
        # Filebeat will choose the paths depending on your OS.
        #var.paths:
    • Run ./filebeat setup to load the index setup, dashboards, and ingest pipelines, so that Kibana can display the nginx module data correctly
    Overwriting ILM policy is disabled. Set `setup.ilm.overwrite: true` for enabling.

    Index setup finished.
    Loading dashboards (Kibana must be running and reachable)
    Loaded dashboards
    Setting up ML using setup --machine-learning is going to be removed in 8.0.0. Please use the ML app instead.
    See more: https://www.elastic.co/guide/en/machine-learning/current/index.html
    Loaded machine learning job configurations
    Loaded Ingest pipelines
    • Start the service: ./filebeat -e
    • Visit http://localhost to generate access log entries
    • Visit http://localhost:9200/_cat/indices?v to list the existing indices
    health status index                            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open filebeat-7.12.0-2021.05.11-000001 C8AcE2pTTkOV9isV3X0BDw 1 1 22 0 71.5kb 71.5kb
    ...
    ...
    • Open http://localhost:5601 to enter Kibana
    • Open Dev Tools and run the following request
    GET filebeat-7.12.0-2021.05.11-000001/_search
    {
    "query": {
    "match_all": {}
    }
    }
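  4. View and analyze logs in Kibana (manual file import)

    • Open Kibana at http://localhost:5601
    • Click Upload a file
    • Select or drag and drop the log file
    • Click Import
    • Set the index name
    • Click Import to finish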
