6.5 How to configure log collection from Plain Text files
SyskeyOT Agent Plain Text Log Collection
SyskeyOT Agent supports collection of logs from plain text log files created by various applications like SCADA systems.
To collect logs from a text file, the file path, the log format, and the date/time format must be configured. To handle different log formats, the SyskeyOT Agent supports Grok patterns.
What are Grok Patterns?
A Grok pattern is a technique to parse and structure unstructured log data. Grok uses regular expressions combined with pre-defined patterns to extract specific data fields from log messages. It simplifies log processing by matching patterns to identify components like timestamps, IP addresses, log levels, etc.
For example, a pattern like %{IP:client_ip} %{WORD:method} %{URIPATHPARAM:request} extracts the client IP, HTTP method, and request URL from a log line.
Grok patterns help convert raw log entries into structured data.
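The expansion of a Grok pattern into a regular expression can be illustrated with a minimal Python sketch. It is not the SyskeyOT Agent's implementation: the GROK_PATTERNS table and the grok_to_regex helper are hypothetical, and the three sub-patterns are heavily simplified compared to the definitions shipped with real Grok libraries.

```python
import re

# Hypothetical, simplified sub-patterns; real Grok libraries ship far more
# thorough definitions under the same names.
GROK_PATTERNS = {
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
    "WORD": r"\b\w+\b",
    "URIPATHPARAM": r"/\S*",
}

def grok_to_regex(pattern: str) -> re.Pattern:
    """Expand %{NAME:label} and %{NAME} tokens into a compiled regex."""
    def expand(match: re.Match) -> str:
        name, label = match.group(1), match.group(2)
        body = GROK_PATTERNS[name]
        # A labelled token becomes a named group; an unlabelled one only matches.
        return f"(?P<{label}>{body})" if label else f"(?:{body})"
    return re.compile(re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, pattern))

regex = grok_to_regex("%{IP:client_ip} %{WORD:method} %{URIPATHPARAM:request}")
fields = regex.match("192.168.1.10 GET /index.html?lang=en").groupdict()
print(fields)
```

Each labelled token ends up as a key in the resulting dictionary, which is the structured form of the raw log line.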
Procedure
- In the top navigation pane, click Plain Text Log.
- Click the "Add New" button to open the configuration window.
- Provide the following details:
Configuration Parameters
Source Name – A name to uniquely identify the source of the log
Source
- File – Collect logs from a single file. The Agent opens the file, watches it for changes, and collects new entries continuously. Preferred when the file name does not change.
- Folder – Collect logs from multiple files in a folder. Preferred when an application generates a new file per day or per hour (e.g., the app creates file1.log, file2.log, and so on).
File / Folder Path – The path of the file or folder from which logs are collected. The system automatically handles file rollover (i.e., the source application deletes and recreates the file).
Files have duplicate Logs – Applicable for folder-based collection only. When new files created in the folder repeat entries from older files, enabling this option skips the repeated entries.
File Name Filters – Applicable for folder-based collection only. A wildcard filter that restricts log collection to the files whose names match it.
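As an illustration of wildcard filtering, the sketch below uses Python's fnmatch module. It assumes shell-style wildcards (* and ?); the exact wildcard syntax accepted by the Agent is not specified here, and the file names are made up for the example.

```python
from fnmatch import fnmatch

def select_files(names, wildcard):
    """Return the file names that match a shell-style wildcard filter."""
    return [n for n in names if fnmatch(n, wildcard)]

# Hypothetical folder listing.
candidates = ["scada_2024-08-09.log", "scada_2024-08-10.log", "scada.bak", "notes.txt"]
selected = select_files(candidates, "scada_*.log")
print(selected)
```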
Read only new content from end – Indicates whether the system collects only the entries written from now on or everything already in the file. The default is disabled (i.e., by default the system reads all entries from the start).
Save the last read location – Indicates whether to bookmark the last read position. This setting is important so that the system resumes from where it left off after a service or OS restart. If it is disabled, logs generated while the Agent Service was offline may be omitted.
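The idea behind bookmarking can be sketched in Python as follows. This is an assumption-laden illustration, not the Agent's mechanism: the read_new_lines helper, the JSON bookmark file, and the "file shrank, so it was recreated" rollover check are all hypothetical.

```python
import json
import os
import tempfile

def read_new_lines(log_path, bookmark_path):
    """Return lines appended since the last call and persist the new offset."""
    offset = 0
    if os.path.exists(bookmark_path):
        with open(bookmark_path) as fh:
            offset = json.load(fh)["offset"]
    # A file smaller than the bookmark was recreated (rollover): start over.
    if os.path.getsize(log_path) < offset:
        offset = 0
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Only consume complete lines, so a half-written entry is read next time.
    end = data.rfind(b"\n") + 1
    with open(bookmark_path, "w") as fh:
        json.dump({"offset": offset + end}, fh)
    return data[:end].decode("utf-8").splitlines()

# Demo: two reads separated by an append, as if the service had restarted.
workdir = tempfile.mkdtemp()
log_file = os.path.join(workdir, "app.log")
bookmark_file = os.path.join(workdir, "app.bookmark.json")
with open(log_file, "w") as fh:
    fh.write("first entry\nsecond entry\n")
first_read = read_new_lines(log_file, bookmark_file)
with open(log_file, "a") as fh:
    fh.write("third entry\n")
second_read = read_new_lines(log_file, bookmark_file)
print(first_read, second_read)
```

Because the offset is stored outside the process, the second call picks up only the entry written after the first call, which is the behaviour the bookmark setting is meant to provide across restarts.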
Match Pattern – Grok pattern to match the log entry in the file. Refer to Appendix 1 for the list of supported grok patterns. It can also be a valid regex with named capture groups. The captures must include the following names:
- "DATE", or "DAY", "MONTH" and "YEAR" together
- "MESSAGE"
- "TIME" (optional)
The system extracts the date and time from these capture groups, applies the time zone rules, and constructs the event timestamp.
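The capture-name requirement above can be checked before a pattern is saved. The Python sketch below is only an illustration: check_required_captures is a hypothetical helper, and it uses Python's (?P<NAME>...) named-group syntax, which may differ from the regex dialect the Agent accepts.

```python
import re

def check_required_captures(pattern: str) -> list:
    """Return a list of problems; an empty list means the pattern qualifies."""
    names = set(re.compile(pattern).groupindex)
    problems = []
    if "MESSAGE" not in names:
        problems.append("missing MESSAGE")
    # Either a single DATE capture or all three of DAY, MONTH and YEAR.
    if "DATE" not in names and not {"DAY", "MONTH", "YEAR"} <= names:
        problems.append("missing DATE (or DAY, MONTH and YEAR)")
    return problems

ok = check_required_captures(r"(?P<DATE>\d{4}-\d{2}-\d{2}) (?P<TIME>\S+) (?P<MESSAGE>.*)")
bad = check_required_captures(r"(?P<DAY>\d{2})/(?P<MONTH>\d{2}) (?P<MESSAGE>.*)")
print(ok, bad)
```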
Date Format – The format of the date and/or time in the log message, so that the dates will be parsed correctly. Refer to Appendix 2 for more details.
Time Format – Optional. Required only when the date and time appear in different places in the log message and the grok expression therefore captures them under separate DATE and TIME keys. If the DATE capture already contains the time, no time format is needed.
Time Zone Handling
- Assume Local → The log data does not have time zone information. The time in the log file is assumed to be in the local time zone.
- Assume UTC → The log data does not have time zone information. The time in the log file is assumed to be in UTC (Coordinated Universal Time, offset +00:00).
- Assume Custom Offset → The log data does not have time zone information. Assume the time is in a custom offset. Provide the offset value (e.g., "-04:00", "-05:30").
- Assume Custom IANA TimeZone → The log data does not have time zone information. Assume the time is in a specific IANA timezone ID (e.g., "Asia/Dubai", "America/New_York").
- Data Has Offset → The log data has the time zone offset. Make sure to provide the timezone parsing string in the time/datetime pattern. Refer to the appendix for more details.
- Data Has IANA Time zone → The log data has the time zone ID. Make sure to provide the timezone parsing string in the time/datetime pattern. Refer to the appendix for more details.
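The four "Assume" options can be pictured as different ways of attaching a zone to a timestamp that carries none. The Python sketch below is a rough illustration under stated assumptions: the mode strings, parse_offset, and apply_zone are hypothetical names, and the two "Data Has" options are not covered.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def parse_offset(text: str) -> timezone:
    """Turn a string such as "-04:00" or "+05:30" into a fixed-offset zone."""
    sign = -1 if text.startswith("-") else 1
    hours, minutes = text.lstrip("+-").split(":")
    return timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))

def apply_zone(naive: datetime, mode: str, value: str = "") -> datetime:
    """Attach a zone to a timestamp that carries no zone information."""
    if mode == "assume_local":
        return naive.astimezone()  # interprets the naive value as host local time
    if mode == "assume_utc":
        return naive.replace(tzinfo=timezone.utc)
    if mode == "assume_custom_offset":
        return naive.replace(tzinfo=parse_offset(value))
    if mode == "assume_custom_iana":
        return naive.replace(tzinfo=ZoneInfo(value))  # needs the tz database
    raise ValueError(f"unknown mode: {mode}")

stamp = datetime(2023, 11, 24, 7, 28, 41)
as_utc = apply_zone(stamp, "assume_utc").isoformat()
as_offset = apply_zone(stamp, "assume_custom_offset", "-04:00").isoformat()
print(as_utc, as_offset)
```

A fixed offset stays the same all year, whereas an IANA ID such as "America/New_York" also accounts for daylight saving changes, which is why the two options exist side by side.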
Severity / Log Level Mapping Field – If the log message has a level/severity field which has to be mapped to syslog severity, provide the field name and define the mapping by clicking the "Map Severity" button. For example, refer to the parsing template "Qognify VMS → Client Log".
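A severity mapping of this kind can be thought of as a lookup from the application's level names to syslog severity codes. In the Python sketch below, the numeric codes follow RFC 5424, while LEVEL_MAP, the "LEVEL" field name, and the fallback to "informational" are assumptions made for illustration; the mapping actually used is whatever is defined through "Map Severity".

```python
# Syslog severity codes as defined in RFC 5424.
SYSLOG_SEVERITY = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "informational": 6, "debug": 7,
}

# Hypothetical mapping from an application's own level names.
LEVEL_MAP = {"FATAL": "critical", "ERROR": "error", "WARN": "warning",
             "INFO": "informational", "DEBUG": "debug", "TRACE": "debug"}

def to_syslog_severity(fields: dict, level_field: str, default: str = "informational") -> int:
    """Look up the parsed level field and translate it to a syslog code."""
    level = str(fields.get(level_field, "")).upper()
    return SYSLOG_SEVERITY[LEVEL_MAP.get(level, default)]

severity = to_syslog_severity({"LEVEL": "warn", "MESSAGE": "disk almost full"}, "LEVEL")
unmapped = to_syslog_severity({"MESSAGE": "no level here"}, "LEVEL")
print(severity, unmapped)
```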
Pick from Template
SyskeyOT Agent already provides a list of grok patterns, date formats, and time formats required to process different types of log files, which can be picked from the template instead of writing a custom one.
Example 1: WIN CC Project Manager Log File
Log Message:
2024-08-09 08:36:00.667 PM( 5148-12204) CSensSink::ProcessSessionChangeMessage WTS_SESSION_LOGON Called
Configuration:
- Grok Pattern: %{TIMESTAMP_ISO8601:DATE}\s*%{WORD:TYPE}\(\s*%{WORD:PID}-%{WORD:TID}\)\s*%{GREEDYDATA:MESSAGE}
- Date Format: yyyy-MM-dd HH:mm:ss.fff
- Time Format: Not Required
- Time Zone Handling: Assume Local
- Time Zone Offset: Not Required
Parsed Output:
{
"EventTime": "2024-08-09T08:36:00.667+05:30",
"Host": "SYSKEYOT-10",
"Source": "WinCC_Project",
"Message": "CSensSink::ProcessSessionChangeMessage WTS_SESSION_LOGON Called",
"TYPE": "PM",
"PID": "5148",
"TID": "12204"
}
Note: The Grok pattern should contain the labels "DATE", "MESSAGE", and "TIME" (optional) for optimal processing. If the date is split across multiple locations, the system looks for "YEAR", "MONTH", and "DAY" in the pattern to identify it. All other keys are automatically transformed into JSON keys.
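The parsing in Example 1 can be approximated in Python to see how the captures turn into output keys. This is a sketch only: the regex is a hand-written stand-in for the Grok pattern, the +05:30 local offset is assumed from the sample output, and Host and Source are left out because they come from the Agent configuration rather than from the log line.

```python
import re
from datetime import datetime, timedelta, timezone

LINE = ("2024-08-09 08:36:00.667 PM( 5148-12204) "
        "CSensSink::ProcessSessionChangeMessage WTS_SESSION_LOGON Called")

# Hand-written regex standing in for the Grok pattern of Example 1.
PATTERN = re.compile(
    r"(?P<DATE>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*"
    r"(?P<TYPE>\w+)\(\s*(?P<PID>\w+)-(?P<TID>\w+)\)\s*"
    r"(?P<MESSAGE>.*)"
)

fields = PATTERN.match(LINE).groupdict()
# %f accepts the three millisecond digits matched by "fff" in the Date Format.
naive = datetime.strptime(fields.pop("DATE"), "%Y-%m-%d %H:%M:%S.%f")
# Assumption: the host's local zone is +05:30, as in the sample output.
local = timezone(timedelta(hours=5, minutes=30))
event = {
    "EventTime": naive.replace(tzinfo=local).isoformat(timespec="milliseconds"),
    "Message": fields.pop("MESSAGE"),
    **fields,  # remaining captures (TYPE, PID, TID) pass through as keys
}
print(event)
```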
Example 2: Qognify Server Logs
Log Message:
VMS Core Server STARTED at Fri Nov 24 07:28:41 EST 2023
Configuration:
- Grok Pattern: %{GREEDYDATA:MESSAGE} at %{WORD} %{WORD:MONTH} %{INT:DAY} %{TIME:TIME} %{WORD} %{INT:YEAR}
- Date Format: yyyyMMMdd
- Time Format: HH:mm:ss
- Time Zone Handling: Assume Custom IANA TimeZone
- Time Zone Id: America/New_York
Parsed Output:
{
"EventTime": "2023-11-24T07:28:41-05:00",
"Host": "SEY-02",
"Source": "Qognify_VMS_-_VMS_Core_Server_Status_Log",
"Message": "VMS Core Server STARTED"
}
Important Points to Consider
- Since the date is split across different places in the log data, the DATE capture group is not used. Instead, the DAY, MONTH, and YEAR captures are used, which the system combines as YEARMONTHDAY. Hence the date pattern is yyyyMMMdd.
- Since the message contains the non-standard timezone abbreviation EST/EDT, IANA timezone information is required to process the log. It is set through Time Zone Handling and Time Zone Id.
- Patterns without a capture name, like %{WORD}, are ignored in the final result.
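The way the split date of Example 2 is recombined can be sketched in Python. The regex is a hand-written stand-in for the Grok pattern, "%Y%b%d" is used as the Python counterpart of yyyyMMMdd (month abbreviations are locale-dependent in Python, so an English or C locale is assumed), and attaching the America/New_York zone is left out to keep the sketch short.

```python
import re
from datetime import datetime

LINE = "VMS Core Server STARTED at Fri Nov 24 07:28:41 EST 2023"

# Hand-written regex standing in for the Grok pattern of Example 2; the
# weekday and zone abbreviation are matched without a name, so they are dropped.
PATTERN = re.compile(
    r"(?P<MESSAGE>.*) at \w+ (?P<MONTH>\w+) (?P<DAY>\d+) "
    r"(?P<TIME>\d{2}:\d{2}:\d{2}) \w+ (?P<YEAR>\d+)"
)

fields = PATTERN.match(LINE).groupdict()
# Concatenate as YEAR + MONTH + DAY, then parse with the equivalent of yyyyMMMdd.
combined = fields["YEAR"] + fields["MONTH"] + fields["DAY"]
date_part = datetime.strptime(combined, "%Y%b%d").date()
time_part = datetime.strptime(fields["TIME"], "%H:%M:%S").time()
naive = datetime.combine(date_part, time_part)
print(combined, naive.isoformat(), fields["MESSAGE"])
```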