Managed to find a workaround for this.
We recently changed our syslog server to Elastic Stack. The stack includes Logstash, which ingests the syslog data and has powerful grok/regex filters. Here's the filter I came up with to work around this issue:
filter {
  if [type] == "syslog" {
    grok {
      # The optional (?: %{YEAR:...})? group swallows the stray year Aruba adds after the timestamp
      match => { "message" => "(?:%{SYSLOGTIMESTAMP:syslog_timestamp}|%{TIMESTAMP_ISO8601:syslog_timestamp})(?: %{YEAR:aruba_erroneous_year})? %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\/%{DATA:container_name}\/%{DATA:container_id})?(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      remove_field => ["message", "aruba_erroneous_year"]
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      # Two spaces in the first pattern: syslog pads single-digit days with a space
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss", "ISO8601" ]
    }
    dns {
      reverse => [ "host", "syslog_hostname" ]
      action => "replace"
    }
  }
}
If it finds an erroneous year after the timestamp, it puts it in a field called aruba_erroneous_year and then promptly removes the field. Prior to this, the year was ending up in the syslog_hostname field.
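For anyone not running Logstash, the same idea can be sketched in plain Python. This is only a rough illustration: the regex below is a simplified stand-in I wrote for the grok pattern (it is not the real expansion of SYSLOGTIMESTAMP, SYSLOGHOST or DATA, and it skips the ISO8601 and container parts), and the sample log lines and the parse() helper are made up for the example.

```python
import re

# Simplified approximation of the grok pattern above (an assumption, not the
# exact expansion Logstash uses).
SYSLOG_RE = re.compile(
    r"(?P<syslog_timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2})"
    r"(?: (?P<aruba_erroneous_year>\d{4}))?"  # optional stray year plus its leading space
    r" (?P<syslog_hostname>[\w.-]+)"
    r" (?P<syslog_program>[^\[\]:\s]+)"
    r"(?:\[(?P<syslog_pid>\d+)\])?"
    r": (?P<syslog_message>.*)"
)


def parse(line):
    """Return the parsed fields, or None if the line does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Same effect as remove_field: capture the stray year, then discard it.
    fields.pop("aruba_erroneous_year", None)
    return fields


# Hypothetical sample lines, with and without the extra year.
with_year = "Oct 11 22:14:15 2016 aruba-ctrl authmgr[1234]: User authenticated"
without_year = "Oct 11 22:14:15 aruba-ctrl authmgr[1234]: User authenticated"

print(parse(with_year)["syslog_hostname"])
print(parse(without_year)["syslog_hostname"])
```

Because the space and the year sit inside the same optional group, both lines end up with the same fields, and the year no longer lands in syslog_hostname.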
Seems crazy, some may say arrogant, that Aruba chooses not to comply with RFC 3164 or its replacement, RFC 5424.
EDIT: Updated the filter to make one of the spaces surrounding the erroneous year optional too; otherwise it wouldn't match correctly formatted syslog messages that don't include the year :$
Also added support for ISO8601 date formats.
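As a rough picture of what the list of formats in the date filter does, here is a small Python sketch that tries the classic syslog format first and falls back to ISO8601. It is only an approximation of the behaviour: parse_syslog_timestamp and its default_year argument are names I made up, and Logstash works out the missing year itself rather than being handed one.

```python
from datetime import datetime


def parse_syslog_timestamp(raw, default_year):
    """Try the classic syslog format first, then fall back to ISO 8601."""
    try:
        # Classic syslog timestamps carry no year, so the caller supplies one.
        # Whitespace in the format matches one or more spaces, which covers
        # both "Oct 11" and the space-padded "Oct  5".
        return datetime.strptime(f"{default_year} {raw}", "%Y %b %d %H:%M:%S")
    except ValueError:
        pass
    # Raises ValueError if the string is not ISO 8601 either.
    return datetime.fromisoformat(raw)


print(parse_syslog_timestamp("Oct 11 22:14:15", 2016))
print(parse_syslog_timestamp("Oct  5 08:00:00", 2016))
print(parse_syslog_timestamp("2016-10-11T22:14:15", 2016))
```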