Logstash Filters in Detail

Common ELK Tools

Mutate

I. Concepts

      The mutate plugin modifies the data in an event. Its operations include rename, update, replace, convert, split, gsub, uppercase, lowercase, strip, remove_field, join, and merge.

  1. rename: rename a field that already exists.
     filter {
       mutate {
         # old field name => new field name
         rename => { "old_name" => "new_name" }
       }
     }
  2. update: update the content of a field (if the field does not exist, it is not created).
     filter {
       mutate {
         # field name => new value
         update => { "old_data" => "new_data" }
       }
     }
  3. replace: same as update, except that the field is created if it does not exist.
     filter {
       mutate {
         # %{source_host} is filled in from the event's source_host field
         replace => { "message" => "%{source_host}: new_host" }
       }
     }
  4. convert: convert a field's data type.
     filter {
       mutate {
         # field name => target type
         convert => { "request_time" => "float" }
       }
     }
  5. gsub: replace text using regular expressions.
     filter {
       mutate {
         # Each entry is a triple: field name, regex, replacement
         gsub => [
           "fieldname", "/", "_",
           "fieldname2", "[\\?#-]", "."
         ]
       }
     }
  6. uppercase/lowercase: convert a field to upper or lower case.
     filter {
       mutate {
         # lowercase takes the same kind of field list
         uppercase => [ "fieldname" ]
       }
     }
  7. split: split an extracted field into an array on a given separator.
     filter {
       mutate {
         # field name => separator
         split => { "message" => "|" }
       }
     }
  8. strip: remove leading and trailing whitespace.
     filter {
       mutate {
         strip => ["field1", "field2"]
       }
     }
  9. remove_field: delete fields.
     filter {
       mutate {
         # The field name may itself reference another field's value
         remove_field => [ "foo_%{somefield}" ]
       }
     }
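  10. join: join the elements of an array field into a single string, using the given separator.
     filter {
       # First split the string into an array ...
       mutate {
         split => { "message" => "|" }
       }
       # ... then join the array back together with a comma
       mutate {
         join => { "message" => "," }
       }
     }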
  11. merge: merge fields.
     filter {
       mutate {
         # Merge the content of added_field into dest_field
         merge => { "dest_field" => "added_field" }
       }
     }
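
The differences between some of these operations are easier to see on a concrete event. The following is a minimal Python sketch that imitates update, replace, split, join, and gsub on a plain dictionary. It is not how Logstash implements them; the function names and the sample event are made up for illustration, and features such as %{field} interpolation are left out.

```python
import re

def update(event, field, value):
    """Imitates mutate update: overwrite only if the field already exists."""
    if field in event:
        event[field] = value
    return event

def replace(event, field, value):
    """Imitates mutate replace: overwrite, creating the field if missing."""
    event[field] = value
    return event

def split(event, field, sep):
    """Imitates mutate split: turn a string field into a list."""
    if isinstance(event.get(field), str):
        event[field] = event[field].split(sep)
    return event

def join(event, field, sep):
    """Imitates mutate join: turn a list field back into one string."""
    if isinstance(event.get(field), list):
        event[field] = sep.join(str(v) for v in event[field])
    return event

def gsub(event, field, pattern, replacement):
    """Imitates mutate gsub: regex replacement on a string field."""
    if isinstance(event.get(field), str):
        event[field] = re.sub(pattern, replacement, event[field])
    return event

# Hypothetical sample event
event = {"message": "a|b|c", "path": "/var/log/app"}
update(event, "missing", "x")      # no effect: "missing" does not exist
replace(event, "created", "now")   # creates the "created" field
split(event, "message", "|")       # "a|b|c" -> ["a", "b", "c"]
join(event, "message", ",")        # ["a", "b", "c"] -> "a,b,c"
gsub(event, "path", "/", "_")      # "/var/log/app" -> "_var_log_app"
print(event)
```

The key point is the contrast between update and replace: only replace adds a field that was not there before.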

II. Usage

  1. Download the test data
  2. Unzip it so that the file is at /Users/your_name/elk/ml-25m/movies.csv
  3. Start an Elasticsearch instance
  4. Edit the Logstash configuration file logstash.conf
     input {
       # beats {
       #   port => 9011
       # }
       file {
         path => ["/Users/your_name/elk/ml-25m/movies.csv"]
         start_position => "beginning"
         # Do not persist the read position, so the file is read from the start on every run
         sincedb_path => "/dev/null"
       }
     }

     filter {
       # Parse each line into the named columns
       csv {
         separator => ","
         columns => ["movieId","title","genre"]
       }
       mutate {
         # Turn "A|B|C" into an array of genres
         split => { "genre" => "|" }
         # remove_field => ["path", "host","@timestamp","message"]
       }
       mutate {
         # "year" is not one of the csv columns above, so this has no effect unless another filter adds that field
         convert => {
           "year" => "integer"
         }
         strip => ["title"]
         # remove_field => ["path", "host","@timestamp","message","content"]
       }
     }

     output {
       elasticsearch {
         hosts => ["http://localhost:9200"]
         index => "movies"
       }
       stdout {}
     }
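
To make the effect of the filter section concrete, here is a rough Python sketch of what the csv, split, and strip steps do to a single line. It only imitates the behaviour and is not how Logstash works internally; the function name process_line and the sample line are made up for illustration.

```python
import csv
import io

# Same column names as in the csv filter of the configuration
COLUMNS = ["movieId", "title", "genre"]

def process_line(line):
    """Imitate csv -> mutate split -> mutate strip for one input line."""
    row = next(csv.reader(io.StringIO(line)))        # csv { separator => "," }
    event = dict(zip(COLUMNS, row))                  # columns => [...]
    event["genre"] = event["genre"].split("|")       # split => { "genre" => "|" }
    event["title"] = event["title"].strip()          # strip => ["title"]
    return event

# Hypothetical line in the same shape as movies.csv
sample = '1,"Example Movie, The (1995) ",Comedy|Drama'
print(process_line(sample))
```

Note that movieId stays a string here, just as it would in the pipeline, because nothing converts it.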
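  5. Start a Logstash instance

  6. Query. The first command searches only the movies index; the second searches all indices. Note that match analyzes the query text, so the * here is not treated as a wildcard.

    • curl -XGET "localhost:9200/movies/_search?pretty" -H "content-type:application/json" -d '{"_source":["movieId","title"],"query":{"match":{"title":"liu*"}}}'

    • curl -XGET "localhost:9200/_search?pretty" -H "content-type:application/json" -d '{"_source":["movieId","title"],"query":{"match":{"title":"liu*"}}}'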

Grok

I. Concepts

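      Grok is a tool in the ELK Stack for quickly parsing logs. Used well, it greatly reduces the effort of log parsing by turning unstructured log data into structured, queryable data. It extracts data from log records with regular expressions, and its regex syntax is similar to that of Perl and Ruby. The syntax is %{SYNTAX:SEMANTIC}: SYNTAX is the name of the pattern to match, which is either a built-in pattern or a custom one, and SEMANTIC is the alias (field name) given to the matched text.

By default, the text captured under SEMANTIC is stored as a string. The extended form %{SYNTAX:SEMANTIC:type} sets the data type of the matched text; currently only int and float are supported.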
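
One way to picture the %{SYNTAX:SEMANTIC:type} syntax is as a shorthand that expands into an ordinary regular expression with named groups. The Python sketch below shows that idea with a tiny, simplified pattern table. It is an illustration only, not the real grok implementation: the names PATTERNS, compile_grok, and grok_match are hypothetical, the regexes are simplified, and nested pattern references are not handled.

```python
import re

# A tiny, simplified subset of pattern definitions (for illustration only)
PATTERNS = {
    "INT": r"[+-]?[0-9]+",
    "NUMBER": r"[+-]?[0-9]+(?:\.[0-9]+)?",
    "WORD": r"\b\w+\b",
    "DATA": r".*?",
    "GREEDYDATA": r".*",
}
CASTS = {"int": int, "float": float}

# Matches %{SYNTAX}, %{SYNTAX:SEMANTIC} and %{SYNTAX:SEMANTIC:type}
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")

def compile_grok(expression):
    """Expand a grok-style expression into a compiled regex plus type casts."""
    casts = {}

    def to_regex(match):
        syntax, semantic, type_ = match.groups()
        body = PATTERNS[syntax]
        if semantic is None:
            return f"(?:{body})"              # matched but not captured
        if type_:
            casts[semantic] = CASTS[type_]    # remember the requested type
        return f"(?P<{semantic}>{body})"      # captured under the alias

    return re.compile(TOKEN.sub(to_regex, expression)), casts

def grok_match(expression, text):
    """Return the captured fields as a dict, or None if nothing matches."""
    regex, casts = compile_grok(expression)
    found = regex.search(text)
    if found is None:
        return None
    return {name: casts.get(name, str)(value)
            for name, value in found.groupdict().items()}

print(grok_match("%{INT:id:int} %{WORD:name} took %{NUMBER:secs:float}s",
                 "42 alice took 1.5s"))
```

Fields without a type suffix stay strings, which mirrors the default behaviour described above.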

  1. Common built-in patterns
     Pattern        Meaning                                Regex
     INT            integer                                (?:[+-]?(?:[0-9]+))
     NUMBER         number                                 (?:%{BASE10NUM})
     DATA           data, e.g. a string (non-greedy)       .*?
     GREEDYDATA     data, e.g. a string (greedy)           .*
     WORD           word                                   \b\w+\b
     IP             IP address, v4 or v6                   (?:%{IPV6}|%{IPV4})
     DATE           date                                   %{DATE_US}|%{DATE_EU}
     TIME           time                                   (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
     DATESTAMP      date and time                          %{DATE}[- ]%{TIME}
     PATH           file system path                       (?:%{UNIXPATH}|%{WINPATH})
     HOSTNAME       host name                              \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
     MAC            MAC address                            (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
     UUID           UUID                                   [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
     EMAILADDRESS   email address                          %{EMAILLOCALPART}@%{HOSTNAME}
  2. Custom patterns (rarely needed; the built-in patterns are usually enough)

II. Usage

  1. Using built-in patterns
     filter {
       grok {
         # Capture a number as "id" and a time as "created" from the message field
         match => { "message" => "%{NUMBER:id} %{TIME:created}" }
       }
     }
  2. Test whether the grok pattern works (link)

    • nginx log format
        log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" "$http_x_forwarded_for"';
    • Test file access.log
        127.0.0.1 - - [26/Apr/2017:16:29:31 +0800] "GET /demo/Demo/jquery.dump.js HTTP/1.1" 200 4482 "http://localhost/index.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
        127.0.0.1 - - [26/Apr/2017:16:29:31 +0800] "GET /demo/Demo/main.js HTTP/1.1" 200 3018 "http://localhost/index.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
    • grok pattern
        %{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{HTTPDATE:time_local}\] \"%{DATA:request}\" %{INT:status} %{NUMBER:bytes_sent} \"%{DATA:refer}\" \"%{DATA:http_user_agent}\"

        {
          "remote_addr": [["127.0.0.1"]],
          "HOSTNAME": [["127.0.0.1"]],
          "IP": [[null]],
          "IPV6": [[null]],
          "IPV4": [[null]],
          "remote_user": [["-"]],
          "time_local": [["26/Apr/2017:16:29:31 +0800"]],
          "MONTHDAY": [["26"]],
          "MONTH": [["Apr"]],
          "YEAR": [["2017"]],
          "TIME": [["16:29:31"]],
          "HOUR": [["16"]],
          "MINUTE": [["29"]],
          "SECOND": [["31"]],
          "INT": [["+0800"]],
          "request": [["GET /demo/Demo/jquery.dump.js HTTP/1.1"]],
          "status": [["200"]],
          "bytes_sent": [["4482"]],
          "BASE10NUM": [["4482"]],
          "refer": [["http://localhost/index.php"]],
          "http_user_agent": [["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"]]
        }
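
For readers more familiar with plain regular expressions, the same extraction can be approximated in Python with named groups. This is a hand-written sketch rather than the regex that grok actually generates: the sub-patterns are simplified, and the names LINE_RE and parse_access_line are made up for illustration.

```python
import re

# Simplified regex with the same field names as the grok pattern
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>.*?)" (?P<status>\d{3}) (?P<bytes_sent>\d+) '
    r'"(?P<refer>.*?)" "(?P<http_user_agent>.*?)"'
)

def parse_access_line(line):
    """Return the fields of one access-log line, or None if it does not match."""
    found = LINE_RE.match(line)
    if found is None:
        return None
    fields = found.groupdict()
    # grok would keep these as strings unless :int is requested; convert here
    fields["status"] = int(fields["status"])
    fields["bytes_sent"] = int(fields["bytes_sent"])
    return fields

# Second line of the access.log test file
sample = (
    '127.0.0.1 - - [26/Apr/2017:16:29:31 +0800] '
    '"GET /demo/Demo/main.js HTTP/1.1" 200 3018 '
    '"http://localhost/index.php" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"'
)

for name, value in parse_access_line(sample).items():
    print(f"{name}: {value}")
```

Unlike the grok result, this sketch returns only the named fields and none of the intermediate sub-pattern matches such as MONTHDAY or HOUR.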
    • grok pattern: %{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}
        {
          "COMBINEDAPACHELOG": [["127.0.0.1 - - [26/Apr/2017:16:29:31 +0800] \"GET /demo/Demo/jquery.dump.js HTTP/1.1\" 200 4482 \"http://localhost/index.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36\""]],
          "COMMONAPACHELOG": [["127.0.0.1 - - [26/Apr/2017:16:29:31 +0800] \"GET /demo/Demo/jquery.dump.js HTTP/1.1\" 200 4482"]],
          "clientip": [["127.0.0.1"]],
          "HOSTNAME": [["127.0.0.1"]],
          "IP": [[null]],
          "IPV6": [[null]],
          "IPV4": [[null]],
          "ident": [["-"]],
          "USERNAME": [["-", "-"]],
          "auth": [["-"]],
          "timestamp": [["26/Apr/2017:16:29:31 +0800"]],
          "MONTHDAY": [["26"]],
          "MONTH": [["Apr"]],
          "YEAR": [["2017"]],
          "TIME": [["16:29:31"]],
          "HOUR": [["16"]],
          "MINUTE": [["29"]],
          "SECOND": [["31"]],
          "INT": [["+0800"]],
          "verb": [["GET"]],
          "request": [["/demo/Demo/jquery.dump.js"]],
          "httpversion": [["1.1"]],
          "BASE10NUM": [["1.1", "200", "4482"]],
          "rawrequest": [[null]],
          "response": [["200"]],
          "bytes": [["4482"]],
          "referrer": [["\"http://localhost/index.php\""]],
          "QUOTEDSTRING": [["\"http://localhost/index.php\"", "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36\""]],
          "agent": [["\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36\""]],
          "extra_fields": [[""]]
        }
