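This article introduces the core concepts and basic usage of Filebeat. The environment and software versions used are: CentOS 7.9, Filebeat 8.2.2, Logstash 8.2.2, and Elasticsearch 8.2.2.
1. Introduction to Filebeat
1.1 Filebeat overview
Filebeat is a lightweight shipper for forwarding and centralizing log data. Installed as an agent on your servers, Filebeat monitors the log files or locations that you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing.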
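Filebeat has the following features:
- Correct handling of log rotation: when a new log file is produced at regular intervals, Filebeat detects the newly created file and starts reading it.
- Backpressure sensitivity: if logs are generated so quickly that Filebeat produces events faster than Elasticsearch can process them, Filebeat automatically slows down to a rate that Elasticsearch can handle.
- At-least-once delivery: every event generated from a log is processed at least once.
- Structured logs: Filebeat can process structured log data.
- Multiline events: a log entry that spans several lines, as error messages often do, is handled correctly.
- Conditional filtering: events can be filtered based on conditions.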
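As a rough illustration of the last two features, the fragment below sketches how multiline handling and conditional filtering might be expressed in filebeat.yml. The input id, the log path, the `^\[` pattern, and the `DBG:` prefix are all assumptions made for the example and do not come from the setup described in this article.

```yaml
filebeat.inputs:
  - type: filestream
    id: app-log              # hypothetical input id
    paths:
      - /var/log/app/*.log   # hypothetical path
    parsers:
      # Lines that do not start with "[" are appended to the previous line,
      # so a stack trace stays attached to the entry that produced it.
      - multiline:
          type: pattern
          pattern: '^\['
          negate: true
          match: after

processors:
  # Conditional filtering: drop any event whose message starts with "DBG:".
  - drop_event:
      when:
        regexp:
          message: '^DBG:'
```

The multiline pattern has to match how your own application starts a new log entry, so treat the one above only as a placeholder.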
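Filebeat works as follows: when you start Filebeat, it starts one or more inputs that look in the locations you have specified for log data. For each log that Filebeat locates, it starts a harvester. Each harvester reads a single log for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output that you have configured for Filebeat.
The spooler holds a buffer of events. This buffer allows events to be resent, which underpins the at-least-once guarantee, and it also supports backpressure handling: once Filebeat produces events faster than Elasticsearch can process them, the buffer stores the pending events.
Each Filebeat instance can be configured with multiple inputs, and each input can collect files from one or more paths. Filebeat supports many input types, and it supports the following outputs: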
- Elasticsearch
- Logstash
- Kafka
- Redis
- File
- Console
- Cloud
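1.2 Filebeat modules
Filebeat modules simplify the collection, parsing, and visualization of common log formats. A module is typically made up of one or more filesets, and each fileset contains the following: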
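- Filebeat input configurations, which contain the default paths where log files are looked for. These default paths depend on the operating system. The Filebeat configuration is also responsible for stitching multiline events together when needed.
- An Elasticsearch ingest pipeline definition, which is used to parse the log lines.
- Field definitions, which configure each field with the correct Elasticsearch type and also contain a short description of each field.
- Sample Kibana dashboards, where available, which can be used to visualize the log files.
Filebeat automatically adjusts these configurations based on your environment and loads them into the corresponding Elastic Stack components.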
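A typical module (for Nginx logs, say) consists of one or more filesets (for Nginx, the access and error logs). For example, the Nginx module parses the access and error logs created by the NGINX HTTP server. Behind the scenes it performs the following tasks:
- Sets the default paths to the log files (which you can change)
- Makes sure that each multiline log event is sent as a single event
- Uses an ingest node to parse and process the log lines
- Shapes the data into a structure suitable for visualization in Kibana
- Deploys dashboards for visualizing the log data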
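Modules for the other Beats work in essentially the same way as Filebeat modules. A large number of modules are currently available.
For details about Filebeat modules, see the official documentation: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-modules.html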
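2. Installing Filebeat
Download the package that matches your environment from https://www.elastic.co/cn/downloads/beats/filebeat; this article uses the Linux x86_64 build. Once the download completes, simply extract the archive on the server.
3. Using Filebeat
This section uses Filebeat to collect Nginx access logs and send them to Logstash and Elasticsearch.
3.1 Collecting Nginx access logs as plain log files
3.1.1 Configuration
Edit the configuration file filebeat.yml to set up the input and the output. At first, set the output to console; once debugging shows that everything works, switch the output to Logstash and Elasticsearch in turn.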
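The original configuration is not reproduced here, so the following is only a minimal sketch of what such a filebeat.yml could look like. The `filestream` input type and the `nginx-access` id are assumptions; Filebeat 8.2.2 also still accepts the older `log` input type. The path is the access log used throughout this article.

```yaml
filebeat.inputs:
  - type: filestream
    id: nginx-access         # assumed id; any unique string should do
    enabled: true
    paths:
      - /home/hadoop/app/nginx-1.8.0/logs/access.log

# Print events to stdout while debugging.
output.console:
  pretty: true
```

Filebeat allows only one output to be enabled at a time, so the console output has to be removed or commented out before another output is configured.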
3.1.2 Startup
3.1.3 Verification
Access Nginx; a new entry is written to the log file /home/hadoop/app/nginx-1.8.0/logs/access.log.
The Filebeat console then prints the collected log events.
3.1.4 Processors
The default configuration enables a set of processors.
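For reference, the stock filebeat.yml shipped with Filebeat 8.x typically contains a processors section along these lines; check your own copy, since the defaults may differ between versions.

```yaml
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
```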
If the output is too verbose, you can remove these processors and add a processor that drops unwanted fields.
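One possible way to do this is the `drop_fields` processor. The list of fields below is an assumption chosen for illustration; adjust it to whatever you consider noise in your own events.

```yaml
processors:
  - drop_fields:
      # Assumed selection of metadata fields to remove from every event.
      fields: ["agent", "ecs", "input", "log"]
      # Do not raise an error if one of the fields is absent.
      ignore_missing: true
```

Note that `drop_fields` cannot remove the `@timestamp` and `type` fields, even if they are listed.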
3.1.5 Configuring output to Logstash
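A minimal sketch of the Logstash output follows. The host and port are assumptions (5044 is the port conventionally used by the Logstash beats input); replace them with the address your Logstash instance actually listens on.

```yaml
# Comment out or remove output.console first; only one output may be enabled.
output.logstash:
  hosts: ["localhost:5044"]  # assumed Logstash address
```

On the Logstash side, a pipeline with a beats input listening on the same port is needed to receive the events.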
Logstash then receives the log events.
3.1.6 Configuring output to Elasticsearch
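The Elasticsearch output could be sketched as below. Every value is a placeholder: the host, the credentials, and the CA certificate path are assumptions, and the security settings are only needed if your Elasticsearch 8.2.2 cluster has security enabled, which is the default for a fresh installation.

```yaml
# Comment out or remove any other output first; only one output may be enabled.
output.elasticsearch:
  hosts: ["https://localhost:9200"]      # assumed Elasticsearch address
  username: "elastic"                    # placeholder user
  password: "<your-password>"            # placeholder, not a real credential
  ssl.certificate_authorities:
    - /path/to/http_ca.crt               # hypothetical path to the cluster CA
```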
Elasticsearch automatically creates a data stream named filebeat-8.2.2. A log document stored in the data stream looks like this:
{ "_index": ".ds-filebeat-8.2.2-2022.09.16-000001", "_id": "8qxKRYMBPnCOyxVi1GuP", "_version": 1, "_score": 1, "_source": { "agent": { "name": "pxc2", "id": "197bfd49-e03a-416e-b53f-4ac143b94fa5", "ephemeral_id": "a6b1d9a5-8485-4391-92be-6a6ae530a5cd", "type": "filebeat", "version": "8.2.2" }, "nginx": { "access": { "remote_ip_list": [ "10.49.196.1" ] } }, "log": { "file": { "path": "/home/hadoop/app/nginx-1.8.0/logs/access.log" }, "offset": 17052 }, "source": { "address": "10.49.196.1", "ip": "10.49.196.1" }, "fileset": { "name": "access" }, "url": { "path": "/", "original": "/" }, "input": { "type": "log" }, "@timestamp": "2022-09-16T07:54:24.000Z", "ecs": { "version": "1.12.0" }, "_tmp": {}, "related": { "ip": [ "10.49.196.1" ] }, "service": { "type": "nginx" }, "host": { "name": "pxc2" }, "http": { "request": { "method": "GET" }, "response": { "status_code": 304, "body": { "bytes": 0 } }, "version": "1.1" }, "event": { "ingested": "2022-09-16T07:54:34.439241175Z", "original": "10.49.196.1 - - [16/Sep/2022:15:54:24 +0800] \"GET / HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36\"", "timezone": "+08:00", "created": "2022-09-16T07:54:33.213Z", "kind": "event", "module": "nginx", "category": [ "web" ], "type": [ "access" ], "dataset": "nginx.access", "outcome": "success" }, "user_agent": { "original": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36", "os": { "name": "Windows", "version": "10", "full": "Windows 10" }, "name": "Chrome", "device": { "name": "Other" }, "version": "105.0.0.0" } }, "fields": { "event.category": [ "web" ], "user_agent.os.full": [ "Windows 10" ], "user_agent.original.text": [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" ], "url.original.text": [ "/" ], "source.address": [ "10.49.196.1" ], "user_agent.os.name.text": [ 
"Windows" ], "user_agent.os.version": [ "10" ], "user_agent.os.name": [ "Windows" ], "traefik.access.user_agent.name": [ "Chrome" ], "service.type": [ "nginx" ], "agent.type": [ "filebeat" ], "event.module": [ "nginx" ], "http.request.method": [ "GET" ], "related.ip": [ "10.49.196.1" ], "traefik.access.user_agent.original": [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" ], "source.ip": [ "10.49.196.1" ], "agent.name": [ "pxc2" ], "host.name": [ "pxc2" ], "user_agent.version": [ "105.0.0.0" ], "http.response.status_code": [ 304 ], "http.version": [ "1.1" ], "event.kind": [ "event" ], "event.timezone": [ "+08:00" ], "event.outcome": [ "success" ], "user_agent.original": [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" ], "event.original": [ "10.49.196.1 - - [16/Sep/2022:15:54:24 +0800] \"GET / HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36\"" ], "fileset.name": [ "access" ], "nginx.access.remote_ip_list": [ "10.49.196.1" ], "input.type": [ "log" ], "log.offset": [ 17052 ], "user_agent.name": [ "Chrome" ], "agent.hostname": [ "pxc2" ], "http.response.body.bytes": [ 0 ], "traefik.access.user_agent.os_name": [ "Windows" ], "user_agent.os.full.text": [ "Windows 10" ], "event.ingested": [ "2022-09-16T07:54:34.439Z" ], "url.original": [ "/" ], "@timestamp": [ "2022-09-16T07:54:24.000Z" ], "url.path": [ "/" ], "agent.id": [ "197bfd49-e03a-416e-b53f-4ac143b94fa5" ], "ecs.version": [ "1.12.0" ], "event.type": [ "access" ], "log.file.path": [ "/home/hadoop/app/nginx-1.8.0/logs/access.log" ], "event.created": [ "2022-09-16T07:54:33.213Z" ], "agent.ephemeral_id": [ "a6b1d9a5-8485-4391-92be-6a6ae530a5cd" ], "agent.version": [ "8.2.2" ], "user_agent.device.name": [ "Other" ], "event.dataset": [ "nginx.access" ] } }
3.2 Collecting Nginx access logs with the Nginx module
The Nginx module in Filebeat parses Nginx logs, which simplifies log processing.
3.2.1 Configuration
A. filebeat.yml configuration
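When modules are used, filebeat.yml mainly needs to load the module configuration files and define the output. The sketch below assumes an Elasticsearch instance at a placeholder address; add credentials and TLS settings if your cluster requires them.

```yaml
# Load module configurations from the modules.d directory.
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

output.elasticsearch:
  hosts: ["localhost:9200"]  # assumed Elasticsearch address
```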
B. modules.d/nginx.yml configuration
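A minimal sketch of modules.d/nginx.yml that collects only the access log might look as follows. The path is the access log used earlier in this article; leaving the error fileset disabled is an assumption based on this section covering access logs only.

```yaml
- module: nginx
  # Access log fileset: must be enabled explicitly in Filebeat 8.x.
  access:
    enabled: true
    var.paths: ["/home/hadoop/app/nginx-1.8.0/logs/access.log"]
  # Error log fileset: left disabled in this sketch.
  error:
    enabled: false
```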
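3.2.2 Enabling the Nginx module and starting Filebeat
3.2.3 Verification
After you access Nginx, Elasticsearch automatically creates a data stream named filebeat-8.2.2. A log document stored in the data stream looks like this: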
{ "_index": ".ds-filebeat-8.2.2-2022.09.16-000001", "_id": "-qxgRYMBPnCOyxVikGvO", "_version": 1, "_score": 1, "_source": { "agent": { "name": "pxc2", "id": "197bfd49-e03a-416e-b53f-4ac143b94fa5", "type": "filebeat", "ephemeral_id": "c7505853-b6de-46b0-abb0-7727dfb37d4b", "version": "8.2.2" }, "nginx": { "access": { "remote_ip_list": [ "10.49.196.1" ] } }, "log": { "file": { "path": "/home/hadoop/app/nginx-1.8.0/logs/access.log" }, "offset": 17424 }, "source": { "address": "10.49.196.1", "ip": "10.49.196.1" }, "fileset": { "name": "access" }, "url": { "path": "/", "original": "/" }, "input": { "type": "log" }, "@timestamp": "2022-09-16T08:18:09.000Z", "ecs": { "version": "1.12.0" }, "_tmp": {}, "related": { "ip": [ "10.49.196.1" ] }, "service": { "type": "nginx" }, "host": { "name": "pxc2" }, "http": { "request": { "method": "GET" }, "response": { "status_code": 304, "body": { "bytes": 0 } }, "version": "1.1" }, "event": { "ingested": "2022-09-16T08:18:19.574911507Z", "original": "10.49.196.1 - - [16/Sep/2022:16:18:09 +0800] \"GET / HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36\"", "timezone": "+08:00", "created": "2022-09-16T08:18:18.539Z", "kind": "event", "module": "nginx", "category": [ "web" ], "type": [ "access" ], "dataset": "nginx.access", "outcome": "success" }, "user_agent": { "original": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36", "os": { "name": "Windows", "version": "10", "full": "Windows 10" }, "name": "Chrome", "device": { "name": "Other" }, "version": "105.0.0.0" } }, "fields": { "event.category": [ "web" ], "user_agent.os.full": [ "Windows 10" ], "user_agent.original.text": [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" ], "url.original.text": [ "/" ], "source.address": [ "10.49.196.1" ], "user_agent.os.name.text": [ 
"Windows" ], "user_agent.os.version": [ "10" ], "user_agent.os.name": [ "Windows" ], "traefik.access.user_agent.name": [ "Chrome" ], "service.type": [ "nginx" ], "agent.type": [ "filebeat" ], "event.module": [ "nginx" ], "http.request.method": [ "GET" ], "related.ip": [ "10.49.196.1" ], "traefik.access.user_agent.original": [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" ], "source.ip": [ "10.49.196.1" ], "agent.name": [ "pxc2" ], "host.name": [ "pxc2" ], "user_agent.version": [ "105.0.0.0" ], "http.response.status_code": [ 304 ], "http.version": [ "1.1" ], "event.kind": [ "event" ], "event.timezone": [ "+08:00" ], "event.outcome": [ "success" ], "user_agent.original": [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" ], "event.original": [ "10.49.196.1 - - [16/Sep/2022:16:18:09 +0800] \"GET / HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36\"" ], "fileset.name": [ "access" ], "nginx.access.remote_ip_list": [ "10.49.196.1" ], "input.type": [ "log" ], "log.offset": [ 17424 ], "user_agent.name": [ "Chrome" ], "agent.hostname": [ "pxc2" ], "http.response.body.bytes": [ 0 ], "traefik.access.user_agent.os_name": [ "Windows" ], "user_agent.os.full.text": [ "Windows 10" ], "event.ingested": [ "2022-09-16T08:18:19.574Z" ], "url.original": [ "/" ], "@timestamp": [ "2022-09-16T08:18:09.000Z" ], "url.path": [ "/" ], "agent.id": [ "197bfd49-e03a-416e-b53f-4ac143b94fa5" ], "ecs.version": [ "1.12.0" ], "event.type": [ "access" ], "log.file.path": [ "/home/hadoop/app/nginx-1.8.0/logs/access.log" ], "event.created": [ "2022-09-16T08:18:18.539Z" ], "agent.ephemeral_id": [ "c7505853-b6de-46b0-abb0-7727dfb37d4b" ], "agent.version": [ "8.2.2" ], "user_agent.device.name": [ "Other" ], "event.dataset": [ "nginx.access" ] } }
As you can see, the log line has been parsed into separate, useful fields such as source.ip, url.path, and user_agent.name.