Why auditing Linux for Security is like finding a needle in a haystack
Introduction
When engaging in threat hunting, system administrators and security teams often invest significant time in collecting and analyzing data. A prerequisite for successfully undertaking such activities is a high degree of visibility into the cloud environment and introspection into what observed activities mean for the organization’s security posture. Only with this type of coverage can you get a clear view of the tangible risk-reduction outcomes your cloud investments will yield. Ultimately, though, the security team must provide a more actionable outcome, which can only be achieved through automated, continuous detection and analysis of cloud events.
An often overlooked but highly critical element of this is the Linux Audit Subsystem. It is an important tool that professionals use to identify attempted and realized violations of a system’s security policy. Its data sources are embedded at the lowest level of the kernel and are supplemented by trusted originators in user space. This level of granularity raises the expectation that the solution can be wrapped with user-defined rules: sifting through millions of event records to find a single event, or, slightly harder, to distinguish a benign event from a malicious one, which is the holy grail of security.
Background
The Linux Audit Subsystem is developed and maintained by Red Hat. It fulfills the need for access monitoring and accounting on a running Linux system. Although it doesn’t offer additional security on its own, it is used to raise the bar on security in a Linux system. The event details provided by the Linux Audit Subsystem can be used to identify security violations and implement targeted security procedures. From the onset, the designers of the Linux Audit Subsystem set forth the goal of a performant and reliable solution for host auditing. This led to a design that modularized event triggers; the result was an event generator and a decoupled event processor.
There are many ways to control the operation of the audit subsystem. Controls are available at compilation time, at boot time, at daemon startup time, and at runtime while the daemon is running.
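As a concrete sketch of these control points, the boot-time and daemon-level knobs might look like the following (the values are illustrative, not prescriptive):

```
# Boot time (kernel command line): enable auditing before auditd starts
audit=1 audit_backlog_limit=8192

# Daemon startup time (/etc/audit/auditd.conf): logging behavior
log_file = /var/log/audit/audit.log
flush = incremental_async

# Daemon runtime (root shell): adjust the live kernel settings
auditctl -e 1    # enable auditing
auditctl -s      # report current audit status
```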
At every level the overhead is close to zero unless rules are engaged that alter behavior in the kernel. When a rule triggers, execution is diverted onto an alternate code path that is not required to finish quickly, also referred to as the slow path of execution.
Audit Context
The Linux Audit Subsystem operates at the task level, and it rightfully co-locates the audit context in the kernel’s task structure.
Upon task creation, the audit context is built and attached to the task. During a system call, audit context data is recorded at syscall entry and/or exit. File system auditing is implemented using inotify, the kernel’s file modification notification system, which places watched-file status in the audit context. Control rules allow configuring and fine-tuning the audit subsystem itself. The audit context can also carry auxiliary audit information needed for specific audit events. As a result, the key-value event data can be multi-part, engaging different subsystems of the Linux kernel to provide supporting data and answering the sensitive questions using the principle of the Five Ws: who, what, when, where, why (and how).
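For illustration, the rule flavors described above can be expressed in auditd’s rule syntax (typically placed under /etc/audit/rules.d/); the key names here are arbitrary labels used for later searching:

```
# Control rule: size the kernel's audit backlog buffer
-b 8192

# File watch: record writes and attribute changes to /etc/shadow
-w /etc/shadow -p wa -k shadow_changes

# Syscall rule: record every 64-bit connect() made by ordinary users
-a always,exit -F arch=b64 -S connect -F auid>=1000 -k outbound_connect
```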
Audit Scenarios
Let’s do a deep dive, dissecting a few use cases to develop a deeper understanding of audit events logged by the Linux Audit Subsystem.
Connect System Call
Who: UID-1000
What: Connect system call (42) x86_64 architecture
When: 1481952319.591 [Saturday December 17, 2016 05:25:19 UTC]
Where: PID-110291 with Parent PID-39016 on the machine which logged the audit event
Why: Unknown
How: Interactive on a terminal (pts21) running curl application (/usr/bin/curl)
We are off to a solid start and can answer the questions with ease. curl is a well-known application used to perform HTTP GET or POST requests. It ran on this machine, internally issued the connect system call, and was successful (ongoing in this case) in contacting the specified domain (more on this later). The security outcome still has to get better, but so far we’re making progress.
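As an aside, the When field is a raw audit timestamp (seconds.milliseconds since the Unix epoch), and converting it is mechanical; a sketch using GNU date (ausearch -i performs the same interpretation automatically):

```shell
# Convert the audit timestamp 1481952319.591 (seconds.milliseconds since
# the Unix epoch) into a human-readable UTC date with GNU date.
LC_ALL=C date -u -d @1481952319 +'%A %B %d, %Y %H:%M:%S UTC'
# Saturday December 17, 2016 05:25:19 UTC
```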
Multipart Audit
This example is a little more involved and shines a light on the Linux Audit Subsystem engaging different subsystems of the kernel.
Who: UID-1000
What: Open system call (2) x86_64 architecture
When: 1531158661.105 [Monday July 09, 2018 17:51:01 UTC]
Where: PID-22109 with Parent PID-1998 on the machine which logged the audit event
Why: Unknown
How: Interactive on a terminal (pts8) running cat application (/bin/cat)
This event indicates that user (UID-1000) used cat (a cross-platform application used to read/concatenate files and send them to standard output). The command was executed interactively on a terminal (pts8). The interesting attribute to notice is items = 1, which indicates a multipart message. Such events are linked by a shared sequence number, guiding the processor to order them with respect to time and inspect the event bundle in its totality.
Looking further in the event logs we find the CWD (current working directory) audit event with the same sequence number 187534 as before.
This event indicates the process PID-22109 was started while the current working directory (cwd) was /tmp/dragon. Nothing unusual or suspicious here, but recall from earlier that the original system call event is multipart, and there is an additional piece to the puzzle.
This is where it gets interesting. From the original event we see that UID-1000 read the private key (/home/snowy/.ssh/id_rsa) of user snowy (UID-1007) using cat, from the directory /tmp/dragon. This should raise some eyebrows (this is where we ask, “Why?”). This is like finding a needle in the haystack, and we have been successful in concluding the investigation. The corrective action is to revoke the SSH keys of user snowy. Hopefully that action, taken in time, has thwarted a potential breach.
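A watch rule on the user’s key material would have flagged this access directly instead of leaving it to be reconstructed after the fact; a sketch in auditd rule syntax (the path and key label mirror this scenario):

```
# Record read, write, and attribute access to snowy's SSH key material;
# matching events can later be retrieved with: ausearch -k ssh_key_access
-w /home/snowy/.ssh/ -p rwa -k ssh_key_access
```

All records sharing one sequence number can also be pulled together with ausearch -a followed by that sequence number.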
Often such techniques of spatial locality are rendered ineffective by the sheer volume of events (as high as hundreds of thousands of log lines per day across a large, modern fleet of servers), and sometimes by their absence. Circumstances can arise where audit events are generated faster than they can be processed, leading to a missing signal, which is catastrophic from both a monitoring and an investigation perspective.
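The kernel side offers a few knobs to manage this backpressure; an illustrative fragment for an audit rules file (the values are examples, not recommendations):

```
# Enlarge the kernel backlog so bursts are queued rather than lost
-b 65536

# Failure mode when the backlog overflows: 0=silent, 1=printk, 2=panic
-f 1

# Limit audit messages per second (excess triggers the failure action)
-r 1000
```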
Missing Application
Let’s revisit the system call connect with a slightly different scenario.
Who: UID-1000
What: Connect system call (42) x86_64 architecture
When: 1532217779.976 [Sunday July 22, 2018 00:02:59 UTC]
Where: PID-23291 with Parent PID-5817 on the machine which logged the audit event
Why: Unknown
How: Interactive on a terminal (pts12) running java application (/usr/lib/jvm/java-8-oracle/jre/bin/java)
This audit event has the usual signature of an application, java, invoking the connect system call, which is, almost by definition, unexpected. We know very well that java is the JVM (Java Virtual Machine) runtime engine, which launches a Java-based application. But the audit event context is missing the application detail.
Looking further in the log, we find an event of type PROCTITLE which has the same sequence number (913318) as the original connect system call.
Decoding the proctitle, which is a hexadecimal-encoded string, reveals the application name: JavaGetUrl, run by the JVM. But the location of the class data file in the filesystem is unknown. Being effective here requires application knowledge and the support of additional audit rules. In this case, the security outcome is unclear.
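The hex decoding itself is mechanical. A sketch with a hypothetical proctitle value (a real value comes from the PROCTITLE record; ausearch -i will decode it for you):

```shell
# A PROCTITLE value is the process command line, hex-encoded, with NUL
# bytes separating the arguments. This hypothetical value decodes to
# "java" NUL "JavaGetUrl".
proctitle='6a617661004a61766147657455726c'
echo "$proctitle" | xxd -r -p | tr '\0' ' '
# java JavaGetUrl
```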
Root Trawler
Who: UID-0
What: Connect system call (42) x86_64 architecture
When: 1532236769.221 [Sunday July 22, 2018 05:19:29 UTC]
Where: PID-25217 with Parent PID-25060 on the machine which logged the audit event
Why: Unknown
How: Interactive on a terminal (pts24) running java application (/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java)
This audit event has an identical signature to the previous one. The Java application runs as UID-0 (root) interactively from terminal pts24. This is unusual; I did not log in as root, nor am I on the sudoers list allowed to run the Java application as root.
Sifting through events with the same sequence number (914435) as the original connect system call, we find the event type PROCTITLE.
Decoding the proctitle, which is a hexadecimal-encoded string, reveals the application name JavaGetUrl, run by the JVM. We know this event does not provide the class data file path. A little more alarming is the fact that the audit context reports the process UID as root. A reasonable question to ask: does the needle even exist in the haystack?
Well, computers don’t do magic; humans do. The mystery of the UID here is that the same Java application (JavaGetUrl) was run inside a container environment when it performed the connect system call. The run environment dodged the security team, leaving it with little data upon which to operate.
Container technology is disruptive and transformative. With the increasing use of software containers, running an application and its dependencies in resource-isolated processes (namespaces, cgroups, etc.) leaves the audit context clueless within the containerization primitives. Bringing containers into the equation also creates a new attack vector: the container escape. The scale and speed demands of container workloads add further pressure, dampening the enthusiasm of application and platform developers. The open source community has acknowledged this problem and is making efforts to bring container awareness to the audit context, enhancing visibility and increasing security awareness. Until then, make a note that “containers” don’t really exist at some level; they’re merely configured Linux processes.
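That closing point is easy to verify from the host: a “containerized” process is simply a process whose namespace links under /proc differ from the host’s. A minimal sketch:

```shell
# Every process exposes its namespace memberships under /proc/<pid>/ns.
# Two processes in the same PID namespace report the same inode here;
# a process inside a container would report a different one.
readlink /proc/self/ns/pid
```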
Conclusion
Security is much more than technology and rules. A secure organization tends to pursue security outcomes that guide users toward successful end results. Striking a balance between performance and the efficacy of rules that generate actionable insights is straining the resources of security professionals. A malformed rule or practice can hamper an organization’s workflow, and implementing rules without considering networks, users, applications, containers, and machines weakens the overall security posture.
Despite the limited visibility, Linux Audit Subsystem adopters are still responsible for the security of their data, applications, and services running anywhere. The lack of detail in the audit context can mean that important clues pointing to underlying issues are missed. In a world dominated by speed and scale, this makes the pursuit of identifying attempted and realized violations of system and application security policy cumbersome. Good security practice recommends that an organization regularly test the hypotheses behind its security tools. This process can help identify gaps, ensuring that attackers are kept at bay and attack surfaces are minimized.
###