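Comma-separated values (CSV, sometimes called character-separated values, because the delimiter does not have to be a comma) is a file format that stores tabular data (numbers and text) as plain text. Plain text means the file is a sequence of characters and contains no data that has to be interpreted as binary numbers. A CSV file consists of any number of records separated by line breaks of some kind; each record consists of fields separated by some other character or string, most commonly a comma or a tab. Usually, all records have exactly the same sequence of fields.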
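The record and field structure described above can be sketched with Python's standard csv module. This is a minimal illustration, not part of the pipeline: the two titles come from the query results later in this post, while the genre values and the quoting of the second title are assumptions made for the example.

```python
import csv
import io

# Hypothetical sample: a header row plus two records. The second title
# contains a comma, so it must be quoted to remain a single field.
SAMPLE = (
    "movieId,title,genres\n"
    "38,It Takes Two (1995),Children|Comedy\n"
    '40,"Cry, the Beloved Country (1995)",Drama\n'
)


def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


records = read_records(SAMPLE)
for record in records:
    print(record["movieId"], "|", record["title"], "|", record["genres"])
```

Splitting each line on commas would break the second record into four fields; a CSV-aware parser keeps the quoted title intact, which is also why the pipeline below needs a real CSV parsing step rather than a plain string split.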
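Parsing CSV with Filebeat, Logstash, and Elasticsearch
1. Preparation
- Download and install the same version of Elasticsearch, Logstash, and Filebeat.
- Prepare the test data: download movies.csv, delete part of its rows to keep it small, and check the result with cat movies.csv.
- Final location of the file:
/Users/your_path/elk/ml-25m/movies.csv
- The file starts with the header row:
movieId,title,genres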
2. Walkthrough
- Start the Elasticsearch instance, using the default configuration in config/elasticsearch.yml:
./bin/elasticsearch
The configuration file starts with:
#cluster.name: my-application
- Start the Logstash instance:
./bin/logstash -f config/logstash.conf
logstash.conf starts with:
# Sample Logstash configuration for creating a simple
- Start the Filebeat instance:
./filebeat -e -c movie.yml
movie.yml starts with:
filebeat.inputs:
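Query
Search the movies index for titles matching Beloved. A match query analyzes its text and does not treat * as a wildcard, so with the default standard analyzer the trailing * is dropped and the query simply matches the term beloved:
curl -XGET "localhost:9200/movies/_search?pretty" -H "content-type:application/json" -d '{"_source":["movieId","title"],"query":{"match":{"title":"Beloved*"}}}'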
{
  "took" : 107,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.94597876,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "On05CXkBxPLXGnNxw4p0",
        "_score" : 0.94597876,
        "_source" : {
          "movieId" : "40",
          "title" : "Cry, the Beloved Country (1995)"
        }
      }
    ]
  }
}
Append data
echo 111,liusir,haha >> movies.csv
echo 222,cry,hehe >> movies.csv
echo 222,take,hehe >> movies.csv
echo 222,takes off,heihei >> movies.csv
Query for documents whose title contains takes:
curl -XGET "localhost:9200/_search?pretty" -H "content-type:application/json" -d '{"_source":["movieId","title"],"query":{"match":{"title":"takes*"}}}'
{
  "took" : 489,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.3125186,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "QH3nC3kBxPLXGnNxIYo-",
        "_score" : 1.3125186,
        "_source" : {
          "movieId" : "222",
          "title" : "takes off"
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "OX05CXkBxPLXGnNxw4pz",
        "_score" : 0.9411969,
        "_source" : {
          "movieId" : "38",
          "title" : "It Takes Two (1995)"
        }
      }
    ]
  }
}
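The curl queries above can also be issued from a script. The following is a minimal sketch using only the Python standard library; the function name build_search_request and its parameters are hypothetical, and it sends the body with POST, which the _search endpoint accepts as an alternative to GET. The request is only constructed here, not sent, so no running cluster is needed.

```python
import json
import urllib.request


def build_search_request(index, field, text, source_fields,
                         host="http://localhost:9200"):
    """Build (but do not send) a match query against one index."""
    body = {"_source": source_fields, "query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{host}/{index}/_search?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("movies", "title", "takes", ["movieId", "title"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To run the query against a live cluster (not executed here):
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```

Building the body from a dict and serializing it with json.dumps avoids the shell-quoting problems that hand-written JSON inside a curl command tends to cause.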
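Parsing Nginx logs with Filebeat, Logstash, and Elasticsearch
1. Concepts
- access_log, usually used together with log_format and open_log_file_cache.
  - Syntax:
    access_log path [format [buffer=size] [gzip[=level]] [flush=time] [if=condition]];
    - path: where the log is written.
    - format: the log format to use.
    - gzip: compress the buffered data before it is written to the file. Compression is off unless this parameter is given; the level ranges from 1 to 9 (default 1), and higher levels compress better but more slowly.
    - buffer: size of the write buffer (used together with flush).
    - flush: the longest time data may stay in the buffer; data older than this is written out to the file (used together with buffer).
    - if: conditional logging; a request is not logged when the condition evaluates to 0 or an empty string.
  - Context: http, server, location, if in location, limit_except.
  - Special case: access_log off; disables all access logging at the current level.
- log_format
  - Syntax:
    log_format name [escape=default|json|none] string ...;
    - name: the format name, referenced as format in the access_log directive.
    - escape: how characters in variable values are escaped, json or default (none disables escaping); the default is default.
    - string: the format string; see the nginx documentation for the available variables.
- error_log
  - Syntax:
    error_log file [level];
    - level: debug, info, notice, warn, error, crit, alert, emerg.
  - Context: main, http, mail, stream, server, location.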
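Why the escape parameter of log_format matters for JSON logs can be shown with a small sketch. Here Python's json.dumps stands in for JSON escaping; it is an analogy only and does not reproduce nginx's exact escaping rules. The User-Agent value is a hypothetical example.

```python
import json

# Hypothetical User-Agent header containing a double quote and a backslash.
user_agent = 'Mozilla/5.0 "custom" build\\7'

# Without escaping, the value is pasted into the template verbatim.
unescaped_line = '{ "http_user_agent": "' + user_agent + '" }'

# With JSON escaping, quotes and backslashes are escaped first.
# json.dumps also adds the surrounding quotes itself.
escaped_line = '{ "http_user_agent": ' + json.dumps(user_agent) + ' }'


def is_valid_json(line):
    """Return True if the line parses as JSON, False otherwise."""
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


print("unescaped parses:", is_valid_json(unescaped_line))
print("escaped parses:", is_valid_json(escaped_line))
```

A single request with an unexpected character in a header is enough to produce a log line that a JSON decoder rejects, which is why a JSON-shaped format string is normally paired with JSON escaping.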
2. Usage
Nginx configuration
- nginx.conf
#user nobody;
worker_processes 1;
#pid logs/nginx.pid;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    #access_log logs/access.log main;
    # The format needs a name; conf.d/test2.conf refers to it as "json".
    log_format json escape=json '{ "@timestamp": "$time_iso8601", '
                    '"time": "$time_iso8601", '
                    '"remote_addr": "$remote_addr", '
                    '"remote_user": "$remote_user", '
                    '"body_bytes_sent": "$body_bytes_sent", '
                    '"request_time": "$request_time", '
                    '"status": "$status", '
                    '"host": "$host", '
                    '"request": "$request", '
                    '"request_method": "$request_method", '
                    '"uri": "$uri", '
                    '"http_referrer": "$http_referer", '
                    '"body_bytes_sent":"$body_bytes_sent", '
                    '"http_x_forwarded_for": "$http_x_forwarded_for", '
                    '"http_user_agent": "$http_user_agent" '
                    '}';
    sendfile on;
    #tcp_nopush on;
    #keepalive_timeout 0;
    keepalive_timeout 65;
    #gzip on;
    include conf.d/*.conf;
}
- conf.d/test1.conf
server {
    listen 80;
    server_name localhost;
    root your_path;
    access_log /usr/local/nginx1.17.9/logs/backend_access.log main;
    location / {
        index index.php index.html index.htm;
    }
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root html;
    }
    location ~ \.php$ {
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }
}
- cat backend_access.log
127.0.0.1 - - [07/May/2021:11:09:31 +0800] "GET /index.php HTTP/1.1" 200 2298784 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36" "-"
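A line in the main format has to be split by pattern before it can be analyzed field by field. The following is a minimal sketch of such a parser; the regular expression and the function name parse_main_line are written for this example and assume the log_format main definition shown in nginx.conf, so they would need adjusting for any other format.

```python
import re
from datetime import datetime

# One named group per variable in the "main" log_format.
MAIN_FORMAT = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)


def parse_main_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = MAIN_FORMAT.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    # $time_local looks like 07/May/2021:11:09:31 +0800
    fields["time_local"] = datetime.strptime(
        fields["time_local"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return fields


LINE = (
    '127.0.0.1 - - [07/May/2021:11:09:31 +0800] "GET /index.php HTTP/1.1" '
    '200 2298784 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36" "-"'
)

parsed = parse_main_line(LINE)
print(parsed["status"], parsed["request"], parsed["time_local"].isoformat())
```

This pattern matching is the work a JSON log format avoids: each JSON line already carries its own field names, so a consumer only needs a JSON decoder.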
- conf.d/test2.conf
server {
    listen 80;
    server_name phpweb.com;
    root your_path2;
    access_log /usr/local/nginx1.17.9/logs/phpweb_access.log json;
    location / {
        index index.php index.html index.htm;
    }
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root html;
    }
    location ~ \.php$ {
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }
}
- cat phpweb_access.log
{
  "@timestamp":"2021-05-07T11:09:54+08:00",
  "time":"2021-05-07T11:09:54+08:00",
  "remote_addr":"127.0.0.1",
  "remote_user":"-",
  "body_bytes_sent":"62436",
  "request_time":"0.003",
  "status":"200",
  "host":"phpweb.com",
  "request":"GET /inc.php HTTP/1.1",
  "request_method":"GET",
  "uri":"/inc.php",
  "http_referrer":"-",
  "http_x_forwarded_for":"-",
  "http_user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
}
Collecting Nginx logs in real time with Filebeat
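- Start the Elasticsearch service:
cd your_elasticsearch_path && ./bin/elasticsearch
- Switch to the Filebeat installation directory and create a copy of the configuration named nginx.yml with the following content: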
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /usr/local/nginx1.17.9/logs/phpweb_access.log
setup.template.settings:
  index.number_of_shards: 1
output.elasticsearch:
  hosts: ["localhost:9200"]
  protocol: "http"
  index: "filebeat-nginx-%{+yyyy.MM.dd}"
setup.ilm.enabled: false
setup.template.name: "filebeat-nginx"
setup.template.pattern: "filebeat-nginx-*"
- Start the Filebeat service:
sudo ./filebeat -e -c nginx.yml
- List the Elasticsearch indices at http://localhost:9200/_cat/indices?v
health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   filebeat-nginx-2021.05.07 JTzFfsmiTuKQF8uudgET1A   1   1          1            0     38.4kb         38.4kb
- View the document details with curl:
curl -XGET "localhost:9200/filebeat-nginx-2021.05.07/_search?pretty"
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "filebeat-nginx-2021.05.07",
        "_type" : "_doc",
        "_id" : "i6p_RXkBlQNTrkx8sucY",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2021-05-07T06:24:26.152Z",
          "input" : {
            "type" : "log"
          },
          "ecs" : {
            "version" : "1.8.0"
          },
          "host" : {
            "name" : "xxxdeMacBook-Air-5.local"
          },
          "agent" : {
            "name" : "xxxdeMacBook-Air-5.local",
            "type" : "filebeat",
            "version" : "7.12.0",
            "hostname" : "xxxdeMacBook-Air-5.local",
            "ephemeral_id" : "72d51d2b-dd60-4a05-b014-90c3185e2c58",
            "id" : "fb31744d-e497-45c7-a173-bd8c15b4652e"
          },
          "message" : "{ \"@timestamp\": \"2021-05-07T14:24:25+08:00\", \"time\": \"2021-05-07T14:24:25+08:00\", \"remote_addr\": \"127.0.0.1\", \"remote_user\": \"-\", \"body_bytes_sent\": \"21\", \"request_time\": \"0.005\", \"status\": \"200\", \"host\": \"phpweb.com\", \"request\": \"GET /main.php HTTP/1.1\", \"request_method\": \"GET\", \"uri\": \"/main.php\", \"http_referrer\": \"-\", \"body_bytes_sent\":\"21\", \"http_x_forwarded_for\": \"-\", \"http_user_agent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36\" }",
          "log" : {
            "offset" : 2088,
            "file" : {
              "path" : "/usr/local/nginx1.17.9/logs/phpweb_access.log"
            }
          }
        }
      }
    ]
  }
}
- The whole JSON log line ends up as a single string in the message field. Filebeat can do simple processing of JSON data; the relevant options are listed here, followed by the modified nginx.yml:
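  - json.keys_under_root: false by default, in which case the decoded JSON is placed under a "json" key in the output document; when enabled, the decoded keys are copied to the top level of the output document instead.
  - json.overwrite_keys: if keys_under_root is also enabled, values from the decoded JSON object overwrite the fields Filebeat normally adds when keys conflict.
  - json.add_error_key: when enabled, Filebeat adds the keys error.message and error.type: json if JSON decoding fails.
  - json.message_key: optional; names the JSON key to which line filtering and multiline settings are applied. The key must be at the top level of the JSON object and its value must be a string, otherwise no filtering or multiline aggregation takes place.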
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /usr/local/nginx1.17.9/logs/phpweb_access.log
  tags: ["phpweb-accesslog"]
  json.keys_under_root: true
  json.overwrite_keys: true
setup.template.settings:
  index.number_of_shards: 1
output.elasticsearch:
  hosts: ["localhost:9200"]
  protocol: "http"
  index: "filebeat-nginx-%{+yyyy.MM.dd}"
setup.ilm.enabled: false
setup.template.name: "filebeat-nginx"
setup.template.pattern: "filebeat-nginx-*"
- Restart the Filebeat service:
sudo ./filebeat -e -c nginx.yml
- View the document details with curl:
curl -XGET "localhost:9200/filebeat-nginx-2021.05.07/_search?pretty"
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "filebeat-nginx-2021.05.07",
        "_type" : "_doc",
        "_id" : "jKqORXkBlQNTrkx8g-db",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2021-05-07T06:40:23.000Z",
          "body_bytes_sent" : "21",
          "request_time" : "0.018",
          "request" : "GET /main.php HTTP/1.1",
          "ecs" : {
            "version" : "1.8.0"
          },
          "log" : {
            "offset" : 2608,
            "file" : {
              "path" : "/usr/local/nginx1.17.9/logs/phpweb_access.log"
            }
          },
          "http_x_forwarded_for" : "-",
          "time" : "2021-05-07T14:40:23+08:00",
          "remote_addr" : "127.0.0.1",
          "input" : {
            "type" : "log"
          },
          "agent" : {
            "version" : "7.12.0",
            "hostname" : "xxxdeMacBook-Air-5.local",
            "ephemeral_id" : "99f15770-df47-4235-a766-7dfbd6e6718c",
            "id" : "fb31744d-e497-45c7-a173-bd8c15b4652e",
            "name" : "xxxdeMacBook-Air-5.local",
            "type" : "filebeat"
          },
          "host" : {
            "name" : "xxxdeMacBook-Air-5.local"
          },
          "http_user_agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
          "http_referrer" : "-",
          "status" : "200",
          "uri" : "/main.php",
          "request_method" : "GET",
          "remote_user" : "-"
        }
      }
    ]
  }
}
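The difference between the two responses in this section comes from json.keys_under_root and json.overwrite_keys. The sketch below simulates the documented behaviour of those two options in plain Python. It is a simplified illustration, not Filebeat's implementation: the function merge_decoded and the sample base fields are hypothetical, and real Filebeat adds further fields and processing around this step.

```python
import json


def merge_decoded(base_fields, line, keys_under_root=False, overwrite_keys=False):
    """Combine fields added by the shipper with the decoded JSON log line."""
    decoded = json.loads(line)
    event = dict(base_fields)
    if not keys_under_root:
        # Default: decoded keys are nested under a single "json" key.
        event["json"] = decoded
        return event
    for key, value in decoded.items():
        # On a conflict, the existing field wins unless overwrite_keys is set.
        if key in event and not overwrite_keys:
            continue
        event[key] = value
    return event


# Hypothetical fields the shipper would add on its own.
BASE_FIELDS = {"@timestamp": "2021-05-07T06:40:24.000Z", "input": {"type": "log"}}
LINE = '{"@timestamp": "2021-05-07T14:40:23+08:00", "status": "200", "uri": "/main.php"}'

for options in (
    {},
    {"keys_under_root": True},
    {"keys_under_root": True, "overwrite_keys": True},
):
    print(options, "->", json.dumps(merge_decoded(BASE_FIELDS, LINE, **options)))
```

With both options enabled, the timestamp from the log line replaces the one taken at read time, which matches the second response above, where @timestamp corresponds to the time field of the request rather than to the moment Filebeat read the line.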
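Analyzing Nginx logs with the Filebeat nginx module
- Start the Elasticsearch and Kibana services.
- Edit the default filebeat.yml. Only two settings need to change; the input settings can stay as they are: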
setup.kibana:
  host: "localhost:5601"
output.elasticsearch:
  hosts: ["localhost:9200"]
- Enable the nginx module:
./filebeat modules enable nginx
- List the enabled and disabled modules:
./filebeat modules list
- Edit modules.d/nginx.yml:
# Module: nginx
# Docs: https://www.elastic.co/guide/en/beats/filebeat/7.x/filebeat-module-nginx.html

- module: nginx
  # Access logs
  access:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths: ["/usr/local/nginx1.17.9/logs/backend_access.log"]

  # Error logs
  error:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths: ["/usr/local/nginx1.17.9/logs/error.log"]

  # Ingress-nginx controller logs. This is disabled by default. It could be used in Kubernetes environments to parse ingress-nginx logs
  ingress_controller:
    enabled: false

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    #var.paths:
- Run setup so that the nginx module's data displays correctly in Kibana:
./filebeat setup
Overwriting ILM policy is disabled. Set `setup.ilm.overwrite: true` for enabling.
Index setup finished.
Loading dashboards (Kibana must be running and reachable)
Loaded dashboards
Setting up ML using setup --machine-learning is going to be removed in 8.0.0. Please use the ML app instead.
See more: https://www.elastic.co/guide/en/machine-learning/current/index.html
Loaded machine learning job configurations
Loaded Ingest pipelines
- Start the service:
./filebeat -e
- Visit http://localhost to generate access log entries.
- Visit http://localhost:9200/_cat/indices?v to list the existing indices:
health status index                             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   filebeat-7.12.0-2021.05.11-000001 C8AcE2pTTkOV9isV3X0BDw   1   1         22            0     71.5kb         71.5kb
...
...
- Visit http://localhost:5601 to open Kibana.
- Open Dev Tools and run the following request:
GET filebeat-7.12.0-2021.05.11-000001/_search
{
  "query": {
    "match_all": {}
  }
}
Viewing and analyzing logs in Kibana (manual file import)
- Open Kibana at http://localhost:5601
- Click Upload a file.
- Select the log file, or drag and drop it.
- Click Import.
- Set the index name.
- Click Import again to finish.