
Common ELK Tools

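Comma-separated values (CSV; sometimes called character-separated values, because the delimiter need not be a comma) is a file format that stores tabular data (numbers and text) as plain text. Plain text means the file is a sequence of characters and contains no data that has to be interpreted as binary numbers. A CSV file consists of any number of records separated by line breaks; each record consists of fields separated by some other character or string, most commonly a comma or a tab. Usually, all records have exactly the same sequence of fields.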
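As a small illustration of the record and field structure described above, the following Python sketch parses the same kind of table once with a comma delimiter and once with a tab delimiter. The sample data and the `parse` helper are made up for this example.

```python
import csv
import io

# Hypothetical sample data: a header record plus one data record.
comma_text = 'id,title,note\n1,"Heat, The",ok\n'
tab_text = "id\ttitle\tnote\n1\tHeat\tok\n"


def parse(text, delimiter):
    """Return the records of a delimited text as lists of string fields."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = parse(comma_text, ",")
tab_rows = parse(tab_text, "\t")

print(comma_rows)
print(tab_rows)
```

Note that the quoted field `"Heat, The"` keeps its embedded comma, which is why a CSV parser is usually preferable to splitting each line on the delimiter by hand.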

cerebro

I. Basics

Elasticsearch can be operated through its RESTful API with client tools such as curl, but some clients offer friendlier, visual operation of Elasticsearch, for example cerebro.


mutate

I. Basics

The mutate plugin modifies the data in an event. It supports rename, update, replace, convert, split, gsub, uppercase, lowercase, strip, remove_field, join, merge, and other operations.

  1. rename: renames an existing field.
      filter {
        mutate {
          rename => { "old_name" => "new_name" }
        }
      }
  2. update: updates the value of an existing field; if the field does not exist, no new field is created.
      filter {
        mutate {
          update => { "old_data" => "new_data" }
        }
      }
  3. replace: same as update, except that the field is created if it does not exist.
      filter {
        mutate {
          replace => { "message" => "%{source_host}: new_host" }
        }
      }
  4. convert: converts a field's value to a different data type.
      filter {
        mutate {
          convert => { "request_time" => "float" }
        }
      }
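  5. gsub: replaces text that matches a regular expression.
      filter {
        mutate {
          gsub => [
            # replace every "/" in fieldname with "_"
            "fieldname", "/", "_",
            # replace every "\", "?", "#" or "-" in fieldname2 with "."
            "fieldname2", "[\\?#-]", "."
          ]
        }
      }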
  6. uppercase/lowercase: converts a string field to upper or lower case.
      filter {
        mutate {
          uppercase => [ "fieldname" ]
        }
      }
  7. split: splits a field into an array on the given separator.
      filter {
        mutate {
          split => { "message" => "|" }
        }
      }
  8. strip: removes leading and trailing whitespace.
      filter {
        mutate {
          strip => ["field1", "field2"]
        }
      }
  9. remove_field: removes fields.
      filter {
        mutate {
          remove_field => [ "foo_%{somefield}" ]
        }
      }
  10. join: joins the elements of an array field into a single string, using the given separator.
      filter {
        mutate {
          split => { "message" => "|" }
        }
        mutate {
          join => { "message" => "," }
        }
      }
  11. merge: merges one field into another (here added_field into dest_field).
      filter {
        mutate {
          merge => { "dest_field" => "added_field" }
        }
      }
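
For readers who find it easier to reason about these options in ordinary code, the sketch below mimics a few of them (rename, update, replace, split, join, strip) on a plain Python dict standing in for an event. It is an illustration only, not how Logstash implements them: the helper functions and the sample event are made up, and details such as `%{...}` field references and the order in which mutate applies its options are ignored.

```python
def rename(event, old, new):
    """Rename a field if it exists."""
    if old in event:
        event[new] = event.pop(old)


def update(event, field, value):
    """Overwrite a field only if it already exists."""
    if field in event:
        event[field] = value


def replace(event, field, value):
    """Overwrite a field, creating it if necessary."""
    event[field] = value


def split(event, field, sep):
    """Turn a string field into a list of parts."""
    if isinstance(event.get(field), str):
        event[field] = event[field].split(sep)


def join(event, field, sep):
    """Turn a list field back into a single string."""
    if isinstance(event.get(field), list):
        event[field] = sep.join(event[field])


def strip(event, *fields):
    """Remove leading and trailing whitespace from string fields."""
    for name in fields:
        if isinstance(event.get(name), str):
            event[name] = event[name].strip()


# Hypothetical event used only for this walkthrough.
event = {"old_name": "x", "message": "a|b|c", "title": "  Heat  "}

rename(event, "old_name", "new_name")
update(event, "missing", "ignored")   # no effect: the field does not exist
replace(event, "status", "new")       # creates the field
split(event, "message", "|")
parts = list(event["message"])        # snapshot of the intermediate array
join(event, "message", ",")
strip(event, "title")

print(parts)
print(event)
```

The contrast between `update` and `replace` is the part most easily confused: only `replace` creates a field that was not there before.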

II. Usage

  1. Download the test data
  2. Extract it so that movies.csv is located at /Users/your_name/elk/ml-25m/movies.csv
  3. Start an Elasticsearch instance
  4. Edit the Logstash configuration file logstash.conf
      input {
        #beats {
        #  port => 9011
        #}
        file {
          path => ["/Users/your_name/elk/ml-25m/movies.csv"]
          start_position => "beginning"
          # Do not persist the read position, so the file is re-read on every run
          sincedb_path => "/dev/null"
        }
      }

      filter {
        csv {
          separator => ","
          columns => ["movieId","title","genre"]
        }
        mutate {
          # Turn "A|B|C" into ["A", "B", "C"]
          split => { "genre" => "|" }
          # remove_field => ["path", "host","@timestamp","message"]
        }
        mutate {
          # No "year" column is defined above, so this only takes effect if a year field exists
          convert => {
            "year" => "integer"
          }
          strip => ["title"]
          #remove_field => ["path", "host","@timestamp","message","content"]
        }
      }

      output {
        elasticsearch {
          hosts => ["http://localhost:9200"]
          index => "movies"
        }
        stdout {}
      }
  5. Start a Logstash instance

  6. Query

    • curl -XGET "localhost:9200/movies/_search?pretty" -H "content-type:application/json" -d '{"_source":["movieId","title"],"query":{"match":{"title":"liu*"}}}'

    • curl -XGET "localhost:9200/_search?pretty" -H "content-type:application/json" -d '{"_source":["movieId","title"],"query":{"match":{"title":"liu*"}}}'
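
To make the filter stage of logstash.conf more concrete, here is a rough Python sketch of what the csv filter plus the split and strip options do to one line of movies.csv. The sample line is an invented row in the movieId, title, genre layout, and `to_event` is a hypothetical helper; real Logstash events also carry fields such as @timestamp and message, which are left out here.

```python
import csv

# Column names taken from the csv filter in logstash.conf.
COLUMNS = ["movieId", "title", "genre"]


def to_event(line):
    """Parse one CSV line into a dict and post-process it like the mutate blocks."""
    fields = next(csv.reader([line]))           # csv { separator => "," }
    event = dict(zip(COLUMNS, fields))          # csv { columns => [...] }
    event["genre"] = event["genre"].split("|")  # mutate { split => ... }
    event["title"] = event["title"].strip()     # mutate { strip => ... }
    return event


# Invented sample row; the title is quoted because it contains a comma.
sample = '42,"Example Movie, The (2001)",Action|Drama'
event = to_event(sample)
print(event)
```

After this step the genre field is an array, so each genre can be matched or aggregated individually once the document is indexed.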


Grok

I. Concepts

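Grok is a tool in the ELK Stack for quickly parsing logs; used well, it greatly reduces the effort of log parsing. It is a way of turning unstructured log data into structured, queryable data. It uses regular expressions to extract data from log records, with a syntax similar to the regular expressions of Perl and Ruby. The syntax is %{SYNTAX:SEMANTIC}: SYNTAX is the name of the pattern to match, which is either a built-in pattern or a custom pattern, and SEMANTIC is the alias (field name) given to the matched text.

By default the text captured under SEMANTIC is stored as a string. The extended form %{SYNTAX:SEMANTIC:type} specifies the data type of the matched text; currently only int and float are supported.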

  1. Common built-in patterns

      Pattern        Meaning                      Regular expression
      INT            integer                      (?:[+-]?(?:[0-9]+))
      NUMBER         number                       (?:%{BASE10NUM})
      DATA           arbitrary data (string)      .*?
      GREEDYDATA     arbitrary data, greedy       .*
      WORD           word                         \b\w+\b
      IP             IP address, v4 or v6         (?:%{IPV6}|%{IPV4})
      DATE           date                         %{DATE_US}|%{DATE_EU}
      TIME           time                         (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
      DATESTAMP      date and time                %{DATE}[- ]%{TIME}
      PATH           file system path             (?:%{UNIXPATH}|%{WINPATH})
      HOSTNAME       host name                    \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
      MAC            MAC address                  (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
      UUID           UUID                         [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
      EMAILADDRESS   email address                %{EMAILLOCALPART}@%{HOSTNAME}

  2. Custom patterns (rarely needed; the built-in patterns usually suffice)
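
One way to picture %{SYNTAX:SEMANTIC:type} is as shorthand for a named capture group followed by an optional type conversion. The Python sketch below makes that reading explicit with a tiny, hand-picked pattern dictionary. It is a simplified illustration and not the grok implementation: `compile_grok` and `grok` are made-up helper names, only four patterns are defined, nested patterns are not expanded, and the NUMBER expression is a simplified stand-in for BASE10NUM.

```python
import re

# A tiny subset of patterns. INT, WORD and GREEDYDATA follow the table above;
# NUMBER is a simplified assumption, not the real BASE10NUM definition.
PATTERNS = {
    "INT": r"(?:[+-]?(?:[0-9]+))",
    "WORD": r"\b\w+\b",
    "NUMBER": r"(?:[+-]?(?:[0-9]+(?:\.[0-9]+)?))",
    "GREEDYDATA": r".*",
}

# Matches %{SYNTAX}, %{SYNTAX:SEMANTIC} and %{SYNTAX:SEMANTIC:type}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?(?::(int|float))?\}")


def compile_grok(expr):
    """Translate a grok-style expression into a regex and a field-type map."""
    types = {}

    def expand(match):
        syntax, semantic, type_name = match.groups()
        body = PATTERNS[syntax]
        if semantic is None:
            return body  # unnamed pattern: match but do not capture
        if type_name:
            types[semantic] = int if type_name == "int" else float
        return f"(?P<{semantic}>{body})"

    return re.compile(TOKEN.sub(expand, expr)), types


def grok(expr, text):
    """Return the captured fields as a dict, or None if the text does not match."""
    regex, types = compile_grok(expr)
    match = regex.search(text)
    if match is None:
        return None
    # Fields without an explicit type stay strings, as described above.
    return {name: types.get(name, str)(value)
            for name, value in match.groupdict().items()}


result = grok("%{INT:id:int} %{WORD:level} %{GREEDYDATA:msg}",
              "42 ERROR disk is full")
print(result)
```

In this sketch the `id` field comes back as an integer because of the `:int` suffix, while `level` and `msg` remain strings.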

II. Usage

  1. Use a built-in pattern
      filter {
        grok {
          match => { "message" => "%{NUMBER:id} %{TIME:created}" }
        }
      }
  2. Test whether the grok pattern works

    • nginx log format
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    • Test file access.log
    127.0.0.1 - - [26/Apr/2017:16:29:31 +0800] "GET /demo/Demo/jquery.dump.js HTTP/1.1" 200 4482 "http://localhost/index.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
    127.0.0.1 - - [26/Apr/2017:16:29:31 +0800] "GET /demo/Demo/main.js HTTP/1.1" 200 3018 "http://localhost/index.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
    • grok pattern and the fields it extracts from the first log line
     %{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{HTTPDATE:time_local}\] \"%{DATA:request}\" %{INT:status} %{NUMBER:bytes_sent} \"%{DATA:refer}\" \"%{DATA:http_user_agent}\"

    {
      "remote_addr": [["127.0.0.1"]],
      "HOSTNAME": [["127.0.0.1"]],
      "IP": [[null]],
      "IPV6": [[null]],
      "IPV4": [[null]],
      "remote_user": [["-"]],
      "time_local": [["26/Apr/2017:16:29:31 +0800"]],
      "MONTHDAY": [["26"]],
      "MONTH": [["Apr"]],
      "YEAR": [["2017"]],
      "TIME": [["16:29:31"]],
      "HOUR": [["16"]],
      "MINUTE": [["29"]],
      "SECOND": [["31"]],
      "INT": [["+0800"]],
      "request": [["GET /demo/Demo/jquery.dump.js HTTP/1.1"]],
      "status": [["200"]],
      "bytes_sent": [["4482"]],
      "BASE10NUM": [["4482"]],
      "refer": [["http://localhost/index.php"]],
      "http_user_agent": [["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"]]
    }
    • grok pattern: %{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}
    {
      "COMBINEDAPACHELOG": [["127.0.0.1 - - [26/Apr/2017:16:29:31 +0800] \"GET /demo/Demo/jquery.dump.js HTTP/1.1\" 200 4482 \"http://localhost/index.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36\""]],
      "COMMONAPACHELOG": [["127.0.0.1 - - [26/Apr/2017:16:29:31 +0800] \"GET /demo/Demo/jquery.dump.js HTTP/1.1\" 200 4482"]],
      "clientip": [["127.0.0.1"]],
      "HOSTNAME": [["127.0.0.1"]],
      "IP": [[null]],
      "IPV6": [[null]],
      "IPV4": [[null]],
      "ident": [["-"]],
      "USERNAME": [["-", "-"]],
      "auth": [["-"]],
      "timestamp": [["26/Apr/2017:16:29:31 +0800"]],
      "MONTHDAY": [["26"]],
      "MONTH": [["Apr"]],
      "YEAR": [["2017"]],
      "TIME": [["16:29:31"]],
      "HOUR": [["16"]],
      "MINUTE": [["29"]],
      "SECOND": [["31"]],
      "INT": [["+0800"]],
      "verb": [["GET"]],
      "request": [["/demo/Demo/jquery.dump.js"]],
      "httpversion": [["1.1"]],
      "BASE10NUM": [["1.1", "200", "4482"]],
      "rawrequest": [[null]],
      "response": [["200"]],
      "bytes": [["4482"]],
      "referrer": [["\"http://localhost/index.php\""]],
      "QUOTEDSTRING": [["\"http://localhost/index.php\"", "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36\""]],
      "agent": [["\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36\""]],
      "extra_fields": [[""]]
    }
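
For comparison, the first grok pattern above corresponds roughly to a single regular expression with named groups. The Python sketch below hand-writes such an expression for the second line of access.log. The sub-expressions are simplified stand-ins chosen for this example (for instance `\S+` in place of IPORHOST and USERNAME), not the actual grok definitions, so they accept a looser set of inputs.

```python
import re

# Named groups mirror the field names used in the first grok pattern.
LOG_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>.*?)" '
    r'(?P<status>[+-]?\d+) (?P<bytes_sent>\d+) '
    r'"(?P<refer>.*?)" "(?P<http_user_agent>.*?)"'
)

# Second sample line from access.log, split across literals for readability.
line = ('127.0.0.1 - - [26/Apr/2017:16:29:31 +0800] '
        '"GET /demo/Demo/main.js HTTP/1.1" 200 3018 '
        '"http://localhost/index.php" '
        '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"')

match = LOG_RE.search(line)
fields = match.groupdict() if match else {}

for name, value in fields.items():
    print(f"{name}: {value}")
```

As with grok's default behaviour, every captured value here is a string; converting status or bytes_sent to numbers would be a separate step.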
