kernelCTF: add CVE-2024-26923_lts_cos #308

Open
lambdasprocket wants to merge 1 commit into google:master from lambdasprocket:CVE-2024-26923

@lambdasprocket
Contributor

No description provided.

@koczkatamas added the "kCTF: vuln OK" label (The submission exploits the claimed vulnerability; passed manual verification) on Jan 19, 2026

We have to use the other CPU available to perform 2 operations during this window:
1. Send the victim socket through this connecting socket.
2. Close the victim socket
Collaborator

Suggested change
2. Close the victim socket
2. Close the victim socket, so that its standard file reference count drops to zero, leaving only the garbage collector's internal references.

We have to use the other CPU available to perform 2 operations during this window:
1. Send the victim socket through this connecting socket.
2. Close the victim socket
3. Trigger garbage collection and run unix_gc() until the start of window 2.
Collaborator

@artmetla, Feb 25, 2026

Suggested change
3. Trigger garbage collection and run unix_gc() until the start of window 2.
3. Trigger garbage collection and run `unix_gc()` until the start of window 2, by closing an unrelated socket, which forces `unix_gc()` to wake up and scan the inflight list.

This function is triggered by executing connect on CPU 0. This CPU will do nothing else until the race conditions part of the exploit is over.

We have to use the other CPU available to perform 2 operations during this window:
1. Send the victim socket through this connecting socket.
Collaborator

@artmetla, Feb 25, 2026

Suggested change
1. Send the victim socket through this connecting socket.
1. Send the victim socket through this connecting socket, using SCM_RIGHTS to make it an 'inflight' socket, which forces the garbage collector to track it.
> Note: SCM_RIGHTS is a special message type that allows Unix sockets to send open file descriptors to each other. When a socket is sent this way but hasn't been read out of the queue yet, it is considered "inflight." The garbage collector specifically tracks inflight sockets to prevent cyclic memory leaks.
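
For readers unfamiliar with SCM_RIGHTS, a minimal sketch of passing a descriptor over a connected AF_UNIX socket may help here. The helper names `send_fd` and `recv_fd` are hypothetical and are not taken from the exploit; the sketch only shows the general mechanism that makes a socket "inflight" between `sendmsg()` and `recvmsg()`.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send `fd` over the connected AF_UNIX socket `via`
 * as SCM_RIGHTS ancillary data. Returns 0 on success, -1 on error. */
int send_fd(int via, int fd)
{
    char dummy = 'x'; /* one byte of ordinary payload to carry the control message */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } ctrl;
    struct msghdr msg = {0};

    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(via, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent with send_fd().
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int via)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg = {0};
    int fd = -1;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(via, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Until the receiver calls `recvmsg()`, the passed descriptor sits in the peer's receive queue, which is the state the note above calls "inflight".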

...
```

This function is triggered by executing connect on CPU 0. This CPU will do nothing else until the race conditions part of the exploit is over.
Collaborator

@artmetla, Feb 25, 2026

Suggested change
This function is triggered by executing connect on CPU 0. This CPU will do nothing else until the race conditions part of the exploit is over.
This function is triggered by executing connect on CPU 0. To win the race, the exploit intentionally stalls this thread right inside Window 1. This CPU will do nothing else until the race conditions part of the exploit is over, leaving the newly created "embryo" socket (newsk) allocated but not yet linked to the receive queue.

Collaborator

Please check all of my assumptions, which I have written in the form of "suggestions". If you have an idea of how to explain something better, please do.


This function is triggered by executing connect on CPU 0. This CPU will do nothing else until the race conditions part of the exploit is over.

We have to use the other CPU available to perform 2 operations during this window:
Collaborator

Suggested change
We have to use the other CPU available to perform 2 operations during this window:
We have to use the other CPU available to perform 3 operations during this window:

1. The first scan_children() can not see the embryo in the receive queue of the server socket
2. The second scan_children() has to see the embryo.

This causes a decrement/increment mismatch and the resulting use-after-free.
Collaborator

@artmetla, Feb 25, 2026

Suggested change
This causes a decrement/increment mismatch and the resulting use-after-free.
This causes a decrement/increment mismatch and the resulting use-after-free. Because the garbage collector does not expect an embryo to be enqueued mid-scan, it misses the embryo during the first pass (failing to decrement the victim's `u->inflight` counter). When the stalled thread unfreezes, the embryo is enqueued. The GC sees it during the second pass and increments the victim's count. The victim's `unix_sock` reference count is now artificially inflated, leaving a dangling pointer in the gc_inflight_list when the socket is closed.

@@ -0,0 +1,219 @@
## Triggering the race condition
Collaborator

Could you elaborate on the setup needed for the triggering race?

From what I see, the exploit sets up a listening server socket, a client socket to connect to it, and a separate "victim" socket that will eventually be corrupted. During the client's connect() call, the kernel dynamically allocates a new socket (the "embryo") to represent the server's side.

To have a chance of aligning the two threads correctly we have to extend both race windows as much as possible.
To do that we use a well-known timerfd technique invented by Jann Horn.
The basic idea is to set hrtimer based timerfd to trigger a timer interrupt during our race window and attach a lot (as much as RLIMIT_NOFILE allows)
of epoll watches to this timerfd to make the time needed to handle the interrupt longer.
Collaborator

Suggested change
of epoll watches to this timerfd to make the time needed to handle the interrupt longer.
of epoll watches to this timerfd. When the timer fires, the kernel is forced to slowly iterate over hundreds of these watchers inside the interrupt handler, artificially stretching the race window from nanoseconds to milliseconds.


## Exploiting the use-after-free

At this point our victim socket is inflight, linked in the gc_inflight_list and has a inflight reference value of 2.
Collaborator

Suggested change
At this point our victim socket is inflight, linked in the gc_inflight_list and has a inflight reference value of 2.
At this point our victim socket is inflight, linked in the gc_inflight_list and has an inflight reference value of 2 (stored inside the struct unix_sock).

## Exploiting the use-after-free

At this point our victim socket is inflight, linked in the gc_inflight_list and has a inflight reference value of 2.
Next step is to receive this socket and close it. This will cause its struct sock object to be freed, but it will stay referenced in the gc_inflight_list.
Collaborator

Suggested change
Next step is to receive this socket and close it. This will cause its struct sock object to be freed, but it will stay referenced in the gc_inflight_list.
Next step is to receive this socket and close it. Receiving it drops the inflight count from 2 down to 1, and closing it drops its standard file descriptor reference count to 0. This will cause its struct sock object to be freed, but it will stay referenced in the gc_inflight_list.

To be able to exploit the use-after-free we have to cause the slab containing our victim objects to be discarded and returned to the page allocator.
This is done using standard cross-cache techniques:
1. Free all objects of the given slab
2. Create a lot of partial slabs to unfreeze the empty slab and get it discarded
Collaborator

Suggested change
2. Create a lot of partial slabs to unfreeze the empty slab and get it discarded
2. Create a lot of partial slabs to unfreeze the empty slab and force the kernel to flush it out and get it discarded.


However, in this case we need maximum reliability - winning the race is such a rare event that we can't afford to make mistakes in the later stages of the exploit.

Because of this we used the /proc/zoneinfo parsing technique to establish a known UNIX cache state before starting the exploit attempt.
Collaborator

Suggested change
Because of this we used the /proc/zoneinfo parsing technique to establish a known UNIX cache state before starting the exploit attempt.
Because of this we used the /proc/zoneinfo parsing technique to establish a known UNIX cache state before starting the exploit attempt. By monitoring the raw page allocation counters in /proc/zoneinfo, we can observe exactly when the cache grabs a fresh page from the system.
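
To make the idea concrete, here is a minimal sketch of reading one counter from /proc/zoneinfo. Which counter `get_fresh_unix()` actually reads is not stated in this thread, so the field name `nr_free_pages` used in the usage note is an assumption, and the function names `sum_zoneinfo_field` and `read_zoneinfo_field` are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sum the numeric value of every line in
 * zoneinfo-formatted `text` whose first token is exactly `field`. */
long sum_zoneinfo_field(const char *text, const char *field)
{
    long total = 0;
    size_t flen = strlen(field);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        const char *p = line;
        while (*p == ' ' || *p == '\t')
            p++; /* skip indentation */

        /* require whitespace after the name so prefixes do not match */
        if (strncmp(p, field, flen) == 0 &&
            (p[flen] == ' ' || p[flen] == '\t'))
            total += strtol(p + flen, NULL, 10);

        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return total;
}

/* Hypothetical helper: read /proc/zoneinfo and return the summed value of
 * `field` across all zones, or -1 if the file cannot be opened. Output
 * longer than the buffer is truncated. */
long read_zoneinfo_field(const char *field)
{
    static char buf[1 << 18];
    FILE *f = fopen("/proc/zoneinfo", "r");
    if (f == NULL)
        return -1;

    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return sum_zoneinfo_field(buf, field);
}
```

A caller could sample `read_zoneinfo_field("nr_free_pages")` before and after creating a socket and compare the two values; a drop of about four pages would be consistent with one new 0x4000-byte slab, assuming 4 KiB pages and no unrelated allocations in between.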


Because of this we used the /proc/zoneinfo parsing technique to establish a known UNIX cache state before starting the exploit attempt.
This is done in the get_fresh_unix() function.
One problem that we have to solve is that when a unix socket an allocation is also made from sock_inode_cache, which uses slabs of the same size (0x4000) as the UNIX cache, causing issues with detecting a new UNIX slab.
Collaborator

Suggested change
One problem that we have to solve is that when a unix socket an allocation is also made from sock_inode_cache, which uses slabs of the same size (0x4000) as the UNIX cache, causing issues with detecting a new UNIX slab.
One problem that we have to solve is that when a unix socket is allocated, an allocation is also made from sock_inode_cache, which uses slabs of the same size (0x4000) as the UNIX cache, causing issues with detecting a new UNIX slab.

At this point we have a struct sock object linked in the gc_inflight_list that we can fill with arbitrary data.
This list is used by unix_gc() and if we are able to craft a fake sock object convincing enough that unix_gc() will be able to traverse the gc_inflight_list and move sk_buff objects from our sock object to the 'hitlist' that will be passed to the skb_queue_purge().

unix_gc() uses list handling functions to move the victim object between lists multiple times and CONFIG_DEBUG_LIST is on, so our object has to have valid prev/next list pointers.
Collaborator

@artmetla, Feb 25, 2026

Suggested change
unix_gc() uses list handling functions to move the victim object between lists multiple times and CONFIG_DEBUG_LIST is on, so our object has to have valid prev/next list pointers.
unix_gc() uses list handling functions to move the victim object between lists multiple times and CONFIG_DEBUG_LIST is on, so our object has to have valid prev/next list pointers. Because the kernel strictly verifies list integrity (e.g., node->next->prev == node), these pointers cannot be dummy values; they must be precisely calculated to interlock with the kernel's actual gc_inflight_list to avoid a panic.
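
The constraint can be illustrated with a small userspace model. This is not kernel code: `struct list_node` and both function names are hypothetical, and the model covers only the neighbour check mentioned above, not the poison-value checks the kernel also performs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's doubly linked list node. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Model of the neighbour check done before unlinking an entry:
 * both neighbours must point back at `entry`. */
bool list_del_entry_valid_model(const struct list_node *entry)
{
    if (entry->next == NULL || entry->prev == NULL)
        return false;
    return entry->prev->next == entry && entry->next->prev == entry;
}

/* Model of what the sprayed data has to achieve: the neighbours still
 * point at the freed object's address, so only the fake object's own
 * links are written, and they must name those real neighbours. */
void fill_fake_links(struct list_node *fake,
                     struct list_node *known_prev,
                     struct list_node *known_next)
{
    fake->prev = known_prev;
    fake->next = known_next;
}
```

In this model, a reclaimed object whose links point anywhere other than its real neighbours fails the check, while one filled with the neighbours' addresses passes it.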

Collaborator

@artmetla left a comment

Please review the comments, answer them, and, if needed, apply fixes to the exploit.md description.

Collaborator

@artmetla left a comment

@lambdasprocket Please have a look at the comments and make the necessary fixes or reply to them. Looking forward to hearing from you.


for (int i = 0; i < 15; i++)
{
prepare_sock(g_mmapped_buf + 1088*i);
Collaborator

Magic number used for offset multiplier. Use a #define for the size or add a comment explaining the value.

See the 'Name and/or comment numeric constants' section of the style guide.
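
One possible shape for the fix is sketched below. The macro names and the helper `fake_sock_offset` are hypothetical, and the reading of 1088 as the size of one object slot in a 0x4000-byte slab is an assumption: it is consistent with the loop bound of 15 (15 × 1088 = 16320 ≤ 0x4000), but the author should confirm what the value actually represents.

```c
#include <stddef.h>

/* Slab size quoted in the write-up for the UNIX cache. */
#define UNIX_SLAB_SIZE      0x4000
/* Assumed meaning of 1088: bytes occupied by one object slot. */
#define FAKE_SOCK_STRIDE    1088
/* Slots that fit in one slab under the assumption above. */
#define FAKE_SOCKS_PER_SLAB (UNIX_SLAB_SIZE / FAKE_SOCK_STRIDE)

/* Byte offset of the fake object with the given index inside the
 * sprayed buffer. */
size_t fake_sock_offset(int index)
{
    return (size_t)index * FAKE_SOCK_STRIDE;
}
```

With constants like these, the loop could iterate up to `FAKE_SOCKS_PER_SLAB` and pass `g_mmapped_buf + fake_sock_offset(i)` to `prepare_sock()`.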

static uint64_t g_kernel_text;
char *g_stack1;
#define STACK_SIZE (1024 * 1024) /* Stack size for cloned child */
uint64_t leak_kernel_text();
Collaborator

Unused function declaration. Remove the unused function declaration.

Check the 'Unused code' section of the style guide.


static int g_pwned;
static char *g_rop2;
static size_t g_rop2_len;
Collaborator

Unused global variable. Remove the unused variable.

See the 'Unused code' section of the style guide.

static char *g_rop2;
static size_t g_rop2_len;

#define ROP2_CONST_AREA 0x10
Collaborator

Unused macro definition. Remove the unused macro.

Check the 'Unused code' section of the style guide.

static size_t g_rop2_len;

#define ROP2_CONST_AREA 0x10
#define ROP2_CONST_OFFSET 0x200
Collaborator

Unused macro definition. Remove the unused macro.

Check the 'Unused code' section of the style guide.

ret = recvmsg(sock, &msg, MSG_DONTWAIT);

if (ret < 0) {
// perror("recvmsg unix");
Collaborator

Unused commented out code is left in the source file. Remove the commented out code.

Check the 'Commented out code' section of the style guide.

Comment on lines +438 to +439
// if (ret < 0)
// perror("sendmsg fd\n");
Collaborator

Unused commented out code is left in the source file. Remove the commented out code.

See the 'Commented out code' section of the style guide.


static char *g_sh_argv[] = {"sh", NULL};

static int g_status;
Collaborator

A global variable is used where a local variable would suffice. Move the variable inside the after_pwn function.

See the 'Usage of global variables instead of local ones' section of the style guide.

uint64_t leak_direct_mapping();


static int g_event1;
Collaborator

A global variable is used where a local variable would suffice. Move the variable inside main and pass it via the carg structure.

Check the 'Usage of global variables instead of local ones' section of the style guide.

timerfd_settime(carg->tfd, 0, &its, NULL);
close(carg->gc);

volatile uint64_t v;
Collaborator

A one-character variable name is used for a non-iterator variable. Use a more descriptive name for the variable.

Check the 'Naming conventions' section of the style guide.
