docs/chapters/chapter1.tex

\chapter{Introduction}
\section{Motivation} \label{section:motivation}
%M-> SA bit long, but it summarizes and presents the ideas and background needed to understand the topic in order:
% Main idea: Malware keeps evolving -> 
% -> Relevance of innovating and researching on the new techniques ->
% -> Relevance of stealth software in targeted attacks-> 
% -> Introduce eBPF as the logical step of innovation in the field ->
% -> There is a need to research on this topic now.

As the efforts of the computer security community grow to protect
increasingly critical devices and networks from malware infections, the
techniques used by malicious actors become more sophisticated.  Following
the incorporation of ever more capable firewalls, Endpoint Detection and
Response (EDR), and Intrusion Detection Systems (IDS), cybercriminals have
in turn sought novel attack vectors and exploits in common software, taking
advantage of an inevitably larger attack surface that keeps growing due to
the continued incorporation of new programs and functionalities into modern
computer systems.

In contrast with ransomware incidents, which remained the most significant
and common cyber threat faced by organizations in 2021
\cite{ransomware_pwc}, a powerful class of malware called rootkits is found
considerably more infrequently, yet it is usually associated to
high-profile targeted attacks that lead to greatly impactful consequences.

A rootkit is a piece of computer software characterized for its advanced
stealth capabilities. Once it is installed on a system it remains invisible
to the host, usually hiding its related processes and files from the user,
while at the same time performing the malicious operations for which it was
designed. Common operations include storing keystrokes, sniffing network
traffic, exfiltrating sensitive information from the user or the system, or
actively modifying critical data at the infected device. The other
characteristic functionality is that rootkits seek to achieve persistence
on the infected hosts, meaning that they keep running on the system even
after a system reboot, without further user interaction or the need of a
new compromise. The techniques used for achieving both of these
capabilities depend on the type of rootkit developed. One of the most
commmon classifications is based on the level of privileges on which the
rootkit operates in the system:
\begin{itemize}
\item \textbf{User-mode} rootkits run at the same level of privilege as
common user applications. They usually work by hijacking legitimate
processes on which they may inject code by preloading shared libraries,
thus modifying the calls issued to user APIs, on which malicious code is
placed by the rootkit. Although easier to build, these rootkits are exposed
to detection by common anti-malware programs and other simple system
inspection techniques.
%I am mentioning the kernel panic part because that could be considered an advantage for eBPF, there is less worry about crashing the system
\item \textbf{Kernel-mode} rootkits run at the same level of privilege as
the operating system, thus enjoying unrestricted access to all system
resources. These rootkits usually come as kernel modules or device drivers
and once loaded, they reside in the kernel. This implies that special
attention must be taken to avoid programming errors since they could
potentially corrupt user or kernel memory, resulting in a fatal kernel
panic and a subsequent system reboot, which goes against the original
purpose of maintaining stealth.

Common techniques used for the development of their malicious activities
include hooking system calls made to the kernel by user applications (on
which malicious code is then injected) or modifying data structures in the
kernel to change the data of user programs at runtime. Therefore, trusted
programs on an infected machine can no longer be trusted to operate securely.

Kernel-mode rootkits are usually the most attractive (and difficult to
build) option for a malicious actor, but their installation requires a
complete previous compromise of the system, meaning that administrator or
root privileges must have been already achieved by the attacker, commonly
by the execution of an exploit or a local installation of a privileged user.
\end{itemize}

Historically, kernel-mode rootkits have been tightly associated with
espionage activities on governments, research centers, or key industry
actors by Advanced Persistent Threat (APT) groups
\cite{rootkit_ptsecurity}---state-sponsored or criminal organizations
specialized on long-term operations to gather intelligence and gain
unauthorized persistent access to computer systems. Although rootkits'
functionality is tailored for each specific attack, a common set of
techniques and procedures can be identified being used by these
organizations.

%Yes, I am not mentioning that eBPF comes from "Extended Berkeley Packet %Filters here since apparently it is no longer considered an acronym, we'll %tackle that on the history section
During the last years, a new technology called eBPF has been found to be at
the heart of the latest innovation on the development of rootkits.  eBPF is
a technology incorporated in the 3.18 version of the Linux kernel
\cite{ebpf_linux318} that allows running code in the kernel without the
need of loading a kernel module. Programs are created in a restrictive
version of the C language and compiled into eBPF bytecode, which is loaded
into the kernel via a new bpf() system call. After a mandatory step of
verification by the kernel in which the code is checked to be safe to run,
the bytecode is compiled into native machine instructions. These programs
can then get access to kernel-exclusive functionalities including network
traffic filtering, system calls hooking or tracing.

Although eBPF has built an outstanding environment for the creation of
networking and tracing tools, its ability to run kernel programs without
the need to load a kernel module has attracted the attention of multiple
APT groups. On February 2022, the Chinese security team Pangu Lab reported
about a NSA backdoor that remained unnoticed since 2013. This implant used
eBPF for its networking functionality and infected military and
telecommunications systems worldwide \cite{bvp47_report}. Also on 2022, PwC
reports about a China-based threat actor that has targeted
telecommunications systems with a eBPF-based backdoor \cite{bpfdoor_pwc}.

Current official efforts are focused on porting the eBPF technology to
Windows \cite{ebpf_windows} and Android systems \cite{ebpf_android}, which
could spread the mentioned risks to new platforms.  Therefore, we can
confidently claim that there is a growing interest in researching the
capabilities of eBPF in the context of offensive security, in particular
given its potential to become a common component for modern rootkits and
other offensive tools. This knowledge would be valuable to the computer
security community, both in the context of pen-testing and for analysts
which need to know about the latest trends in malware to prepare their
defenses.


\section{Project objectives} \label{section:project_objectives}
The main objective of this project is to investigate and demonstrate the capabilities of
the eBPF technology that could be weaponized by a malicious actor. In
particular, we will focus on functionalities present in the Linux platform,
given the maturity of eBPF on these environments and which therefore offers
a wider range of possibilities. We will be approaching this study from the
perspective of a threat actor, meaning that we will develop an eBPF-based
rootkit which shows these capabilities live in a current Linux system,
including proof of concepts (PoC) showing specific features, and also by
building a realistic rootkit system which leverages these PoCs and
integrates them into a fully operational implant.

%According to the library guide, previous research should be around here. %Is it the best place tho?
Before narrowing down our objectives and selecting a specific list of
rootkit capabilities to provide using eBPF, we analyze previous research in
this area. The work by Jeff Dileo from NCC Group at DEFCON 27
\cite{evil_ebpf} is particularly relevant, as it discusses for the first
time the ability of eBPF to overwrite userland data, highlighting the
possibility of overwriting the memory of a running process and executing
arbitrary code on it.
%
Subsequent talks on 2021 by Pat Hogan at DEFCON 29 \cite{bad_ebpf}, and by
Guillaume Fournier and Sylvain Afchainthe from Datadog at DEFCON 29
\cite{ebpf_friends}, research deeper on eBPF's ability to support rootkit
capabilities. In particular, Hogan shows how eBPF can be used to hide the
rootkit's presence from the user and to modify data at system calls, while
Fournier and Afchainthe built the first instance of an eBPF-based backdoor
with command-and-control(C2) capabilities, enabling to communicate with
the malicious eBPF program by sending network packets to the compromised
machine.

Taking these previous research works into account, and considering the
common functionality usually to be incorporated into a rootkit, the
objectives of our research on eBPF are summarized in the following points:
\begin{itemize}
\item Analyze eBPF's possibilities to hook system calls and kernel
functions.
\item Explore eBPF's potential to read/write arbitrary memory.
\item Research networking capabilities with eBPF packet filters.
\end{itemize}

The knowledge gathered by the previous three pillars will be then used as a
basis for building our rootkit. We will present attack vectors and
techniques different than the ones presented in previous research, although
inevitably we will also tackle common points, which will be clearly
indicated and on which we will try to perform further research. In essence,
our eBPF-based rootkit aims at:
\begin{itemize}
\item Hijacking the execution of user programs while they are running, injecting libraries and executing malicious code, without impacting their normal execution.
\item Featuring a command-and-control module powered by a network backdoor, which can be operated from a remote client. This backdoor should be controlled with stealth in mind, featuring similar mechanisms to those present in rootkits found in the wild.
\item Tampering with user data at system calls, resulting in running malware-like programs and for other malicious purposes.
\item Achieving stealth, hiding rootkit-related files from the user.
\item Achieving rootkit persistence, the rootkit should run after a complete system reboot.
\
\end{itemize}

The rootkit will work in a fresh-install of a Linux system with the following characteristics:
\begin{itemize}
%Maybe a table for this?
\item Distribution: Ubuntu 21.04.
\item Kernel version: 5.11.0-49.
\end{itemize} 


\subsection{Social and economic environment}\label{sec:social_econ_env}
%M-> Mentioned talking about community outreach and its role under pentesting
%TODO Talk about the difference between having always on BPF and always on kernel modules, BPF is consider "safe" in production while it's almost as dangerous (I think this might fit here)

Our world has a growing dependency on digital systems. From the use of
increasingly complex computer systems and networks in business environments
to the thriving industry of consumer devices, the use these digital systems
has shaped today's society and will likely continue to do so in the future. 

As discussed in our project motivation, this has also implied an increasing
relevance of the cybersecurity industry, particularly as a consequence of a
growing number of cyber incidents. The use of malware and, in particular,
ransomware attacks currently stands as one of the major trends among threat
actors, which has impacted both the private and public sector with infamous
attacks. Moreover, during the last decade there has been a steady influx of
targeted high-impact attacks featuring increasingly complex techniques and
attack vectors, which raises the need to stay up to date with the latest
discovered vulnerabilities.

As a response for this growing concern, the computer security community has
proposed multiple procedures and frameworks with the aim of minimizing
these cyber incidents, setting a series of fundamental pillars on which
cyber protection activities on organizations shall be based. As a summary,
these pillars are often defined to revolve around the following actions
\cite{nist_cyber}:
\begin{itemize}
\item Identifying security risks.
\item Protecting computer systems from the identified security risks.
\item Detecting attacks and malicious activity.
\item Responding and taking action when a cyber incident is detected.
\item Recovering after the cyber incident, reducing the impact of the attack.
\end{itemize} 

Focusing our view on the identification of security risks, we can find the
use of adversary simulation exercises, whose purpose is to test the
security of a system or network by emulating the role of a threat actor,
thus trying to find vulnerabilities and exploit them in this controlled
environment so that these security flaws can be patched. There exist two
main types of assessments \cite{pentest_redteam}:
\begin{itemize}
\item Penetration testing (pentesting) exercises, whose aim is mainly to discover which known unpatched vulnerabilities are present in the computer system, attempting to exploit them. These exercises are focused on uncovering as many vulnerabilities as possible and, therefore, in many ocassions the stealth which a real attacker would need while performing such process is disregarded.
\item Red teaming exercises, whose aim is to uncover vulnerabilities as in pentesting, but this process is done quietly (with stealth in mind) and using any resource available, often crafting targeted attacks emulating the procedures which a real threat actor such as an APT would follow. Therefore, the goal is not to find as many vulnerabilities as possible, but rather these exercises take place in already security-audited environments to be further protected from targeted cyber attacks.
\end{itemize}

Our efforts to better understand the offensive capabilities offered by eBPF
are relevant to both pentesters and red teamers. For the security
professionals performing these exercises, it is essential not only to know
about the latest security trends being used by threat actors, but also to
have expertise on the techniques and attack vectors employed in these cyber
attacks. Therefore, a research on last-generation rootkits using eBPF is
useful and relevant for the security community, which will benefit
positively from having an open-source rootkit showcasing the offensive
capabilities of the eBPF technology.

Consequently, given the growing importance of eBPF for offensive security,
it also undertakes a positive impact in the area of defensive security. In
particular, it presents a clear example on how eBPF may be weaponized for
malicious purposes, thus inspiring system administrators and other
professionals to consider eBPF programs as a possible attack vector. As we
will show during this research work, our rootkit can achieve similar
capabilities compared to classic rootkits based on kernel modules. However,
while kernel modules are usually considered a risk and might be restricted
(or its activity, particularly loading a new one, easy to flag), in many
environments eBPF remains as a technology often available by default and
not considered in the security framework of most organizations. Therefore
our project aims to raise awareness on this regard.


\section{Regulatory framework}
As discussed in Section \ref{sec:social_econ_env}, this project is tightly
related to both cybersecurity in general and to offensive tools in
particular. We will now analyze the most relevant frameworks that regulate
or are related to both activities with the purpose of studying how they can
be applied to the development of our rootkit.

\subsection{NIST Cybersecurity Framework}
In the case of activities related to cybersecurity, multiple standards and
frameworks regulate the best practices and guidelines to follow for
managing cyber risks. One of the most relevant is the Framework for
Improving Critical Infrastructure Cybersecurity by the National Institute
of Standards and Technology (NIST) \cite{nist_cyber}. This is the framework
that regulates the 5 pillars of cyber risk mamagement which we have
discussed in Section \ref{sec:social_econ_env}, describing the needs
originated by each pillar (in the framework named as 'Categories') and the
actions needed for meeting the requirements of each of these needs
('Subcategories'). In particular, we can identify the following procedures
on each of these pillars relevant in our context:
\begin{itemize}
\item With respect to the 'Identify' pillar, the framework highlights the need for Asset Management and Risk Assessment between others:
	\begin{itemize}
	\item Asset Management refers to the identification and management of data, devices and systems, so as to consider their relative importance in the organization objectives and cyber risk strategy. This involves inventorizing all software platforms and applications used in the organization. In our case, maintaining strict control over the software present on each system reduces the risk of an infection.
	\item Risk Assessment refers to the identification of the vulnerabilities of each of the organization assets, receiving intelligence about cyber threat from external forums and sources, and identifying the risks, likelihook and impact of an specific risk. In the case of eBPF, it relates to the identification of devices and systems supporting this technology and assessing the risk of malicious eBPF programs using, for instance, this research work as one of the external sources described in the framework.
	\end{itemize}
\item With respect to the 'Protect' pillar, it describes the need for Identify Management, Authentication and Access Control, together with the use of Protective Technologies, between others:
	\begin{itemize}
	\item With respect to Identify Management, Authentication and Access Control, the framework describes the need to use the principle of least privilege and management of access permissions, that is, assigning the least permissions possible to each system component. In the case of our rootkit, this is particularly relevant given that it needs to be executed as root or by an user with specific capabilities, as we will explain in section \ref{section:ebpf_security}.
	\item Protective Techniques are solutions with the aim of managing the security of systems and organization assets. In this category we can find the storage of log records about activity on the system, and the protection of communication in the organization network. In the case of our rootkit, maintaining logs and non-plaintext connection means the rootkit shall increase its complexity and invest some resources into stealth functionalities.
	\end{itemize}
\item With respect to the 'Detection' pillar, the framework describes the need for an Anomalies and Events policy and Security Continuous Monitoring, between others.
	\begin{itemize}
	\item An Anomalies and Events policy relates to detecting and analysing the risk of suspicious events in the network and at systems. This includes gathering information about each of the events in the system using multiple sensors, analysing the data and the origin of each, and analysing the impact of them. In the context of our rootkit, a proper management of system events could disclose the rootkit activities (e.g.: when it is loaded or when it executes user process) although this can be mitigated by the use of stealth techniques.
	\item Security Continuous Monitoring relates to the monitoring of information systems and organization assets with the purpose of identifying cybersecurity-related events. Some actions described in this regard by the framework include monitoring the network for events, scanning programs for malicious code, and implementing monitoring policies for detecting unauthorized software and network connections. In our case, these all belong to recommended steps an organization shall take to prevent and early detect an infection by a rootkit (and therefore the rootkit will attempt to circumvent these actions by means of stealth techniques).
	\end{itemize}
\item With respect to the 'Respond' pillar, the framework describes the need for Analysis, between others:
	\begin{itemize}
	\item Analysis refers to conducting response processes after the detection of a cyber attack, analysing the causes to support recovery activities. This includes analysing the events gathered in log traces and other sensors, performing a forensic investigation on the cyber attack. In our case, an organization infected by an eBPF rootktit needs to analyse the source of the compromise and analyse its functioning so as to know the extent of the infection.
	\end{itemize}
\item Finally, with respect to the 'Recover' pillar the NIST framework shows the need for Recovery Planning and Improvements policies between others:
	\begin{itemize}
	\item Recovery Planning relates to the process of restoring the original state of systems and assets impacted by a cyber incident. In the case of our rootkit, previous conduced analysis should unveil the rootkit persistence capabilities, so that in this step these are nullified and the eBPF programs belonging to the rootkit are unloaded.
	\item Improvements relates to the need to incorporate the new knowledge and leasons learned after the cyber incident into existing organization policies. In the case of an organization infected by an eBPF rootkit, it would proceed to adopt protective measures for mitigating a similar attack, such as blocking its use. 
	\end{itemize}
\end{itemize}


\subsection{MITRE ATT\&CK}
MITRE Adversarial Tactics, Techniques, and Common Knowledge (MITRE ATT\&CK) is a framework collecting knowledge about adversarial techniques, that is, techniques, methodologies and offensive actions followed by threat actors that can be used against computer systems. This is an useful framework for red teaming or pentesting activities performing adversary emulation exercises, since it details adversary behaviours and the techniques being used, which can help to build multiple attack scenarios. Moreover, it is also relevant for professionals in charge of defending a system, since they can prepare and design mitigations for the techniques described in the framework \cite{mitre_blog}\cite{mitre_blog_2}.

A relevant aspect of the MITRE ATT\&CK framework is the MITRE ATT\&CK Matrix, which contains all the adversarial techniques organized as 'tactics'. These tactics are the objective of the adversary, which it aims to achieve by using each corresponding technique. Therefore, the MITRE ATT\&CK Matrix shows a list of columns, where each column is one tactic (one adversary objective), and each row on that column shows the techniques associated to that tactic, explaining how that objective can be achieved. Additionally, different matrices exist depending on the platform. In this project, we will consider the Linux Matrix \cite{mitre_matrix_linux}.

Using the Linux MITRE ATT\&CK matrix, red teamers and pentesters can evaluate the techniques incorporated in our rootkit according to the tactics described in the framework. These tactics range from an initial access step (which usually preceeds the adversary attack) to the description of the impact that the attack has on the normal functioning of the system. In summary, these tactics are the following:
\begin{itemize}
\item \textbf{Initial access}, comprising techniques to gain a foothold in the system or network, such as spear-phising attacks, with which the adversary may obtain credentials that can be used to achieve access to a machine.
\item \textbf{Execution}, comprising techniques used to execute code in the target system. This includes exploiting vulnerabilities that lead to Remote Code Execution (RCE).
\item \textbf{Persistence}, comprising techniques that enable the adversary to maintain access at the system after the initial foothold, independently on the actions performed by the target machine (which may reboot or change passwords). One of these techniques is using scheduled jobs (such as Cron, which will be used in our rootkit).
\item \textbf{Privelege escalation}, consisting on techniques used to achieve privileged access in a machine from an original unprivileged access position. This includes techniques that abuse the elevation control mechanisms of a system, such as sudo, which will be used in our rootkit.
\item \textbf{Defense evasion}, comprising techniques to avoid detection after a machine infection. This includes hiding processes, files and directories, or network connections related to the adversary activities.
\item \textbf{Credential access}, comprising techniques to steal passwords and other credentials from the system. An example of such a technique is sniffing the network for credentials transmitted in plaintext.
\item \textbf{Discovery}, comprising techniques used by the adversary to gather knowledge about the target system and the available actions it may engage with (once it has access to the system, e.g. execution of commands). This includes techniques such as listing running processes or scanning the network.
\item \textbf{Lateral movement}, comprising techniques allowing for pivoting through systems from the internal network after having compromised the original target machine. An example of a technique acomplishing this is the exploitation of vulnerabilities that can only be exploited from the local network.
\item \textbf{Collection}, comprising techniques to gather critical information in the compromised system, with the purpose of, often, leaking them. In contrast to the discovery tactic, collection techniques do not search for possible targets in the compromised system, but rather use this knowledge to locate key data and exfiltrate it.
\item \textbf{Command and control}, comprising techniques that enable an attacker to communicate with the compromised machine, usually issuing commands and actions to be executed by it. Since network traffic belonging to the adversary activities should remain hidden, techniques belonging to this category include encoding or obfuscating data so that they can be transmitted secretly.
\item \textbf{Exfiltration}, containing the techniques used for exfiltrating the data collected during the Collection step, transmitting this data outside of the compromised network. The use of C2 encrypted channels is a recurrent technique. Our rootkit will use this and other communication means for sending data from the infected to the attacker machine.
\item \textbf{Impact}, comprising techniques used by the adversary to manipulate or destroy data, and to disrupt the normal services at the compromised machine. A common technique belonging to this tactic is the modification of system files, which we will use to implement multiple of the rootkit functionalities.
\end{itemize}

\subsection{Software licenses}
Finally, it must be noted that this project uses the libbpf library
\cite{libbpf_github}, as described in Section \ref{subsection:libbpf}, for
the development of our eBPF rootkit. This library is licensed under dual
BSD 2-clause license and GNU LGPL v2.1 license. 
%Should I say something else? I usually license my own projects under GPLv3 because I don't like corporations taking the code, but I guess I am restricted to use the Creative Commons license.


\section{Structure of the document}
This section details the structure of this document and the contents of each chapter with the aim of offering a summarized view and improving its readibility.

\textbf{Chapter 1: Introduction} describes the motivation behind the project and the purposes it aims to achieve, presenting the functionalities expected to be implemented in our rootkit. It also discusses the regulatory frameworks and the environmental issues related to the development of the research work.

\textbf{Chapter 2: Background} presents all the concepts needed for the later discussion of offensive capabilities. It includes an in-depth description of the eBPF system, a brief discussion of its security features and multiple alternatives for developing eBPF programs. It also discusses networking concepts and an offers an overview on the memory architecture at Linux systems, showing basic attacks and techniques that are the basis of those later incorporated to the rootkit.

\textbf{Chapter 3: Analysis of offensive capabilities of eBPF} discusses the possible capabilities of a malicious eBPF program, describing which features of the eBPF system could be weaponized and used for offensive purposes.

\textbf{Chapter 4: Design of a malicious eBPF rootkit} describes the architecture of the rootkit we have developed, offering a comprehensive view of the different techniques and attacks designed and implemented on each of the rootkit modules and components.

\textbf{Chapter 5: Evaluation} analyses whether the rootkit developed meets the expected functionality proposed in the project objectives by testing the rootkit capabilities in a simulated testing environment. We will prepare a virtualized network consisting of two connected machines, where one is the infected host and the other belongs to the attacker, proceeding to test every rootkit functionality.

\textbf{Chapter 6: Related work} includes a comprehensive review of previous work on UNIX/Linux rootkits, their main types and most relevant examples. We also offer a comparison in terms of techniques and functionality with previous families. In particular, we highlight the differences of our eBPF rootkit with respect to others that rely on traditional methods, and also to those already built using eBPF.

\textbf{Chapter 7: Budget} describes the costs associated to the development of this project, including personnel, hardware and software related costs.

\textbf{Chapter 8: Conclusions and future work} revisits the project objectives, discusses the work presented in this document, and describes possible future research lines.

\section{Code availability}
%Is it ok to reference the repo as a cite? Maybe it's better writing the link directly?
All the source code belonging to the rootkit development can be visited publicly at the GitHub repository \cite{triplecross_github}. The most important folders and files of this repository are described in Table \ref{table:triplecross_dirs}.

%I can go with more detail if needed. Is it needed?
\begin{table}[htbp]
\begin{tabular}{|>{\centering\arraybackslash}p{4cm}|>{\centering\arraybackslash}p{10cm}|}
\hline
\textbf{DIRECTORY} & \textbf{DESCRIPTION}\\
\hline
\hline
src/client & Source code of rootkit client.\\
\hline
src/client/lib & RawTCP\_Lib shared library.\\
\hline
src/common & Constants and configuration for the rootkit. It also includes the implementation of elements common to the eBPF and user space side of the rootkit, such as the ring buffer.\\
\hline
src/ebpf & Source code of the eBPF programs used by the rootkit.\\
\hline
src/helpers & Includes programs for testing rootkit modules functionality, and the malicious program and library used at the execution hijacking and library injection modules respectively.\\
\hline
src/libbpf & Contains the libbpf library, integrated with the rootkit.\\
\hline
src/user & Source code of the user land programs used by the rootkits.\\
\hline
src/vmlinux & Headers containing the definition of kernel data structures (this is the recommended method when using libbpf).\\
\hline
\end{tabular}
\caption{Relevant directories at TripleCross repository.}
\label{table:triplecross_dirs}
\end{table} 

Additionally, the source code of the RawTCP\_Lib library can be visited publicly at its own GitHub directory \cite{rawtcp_lib}.