Skip to content

Conversation

@lyp256
Copy link
Contributor

@lyp256 lyp256 commented Jan 15, 2026

fix unshare.MaybeReexecUsingUserNamespace() infinite loop without CGO_ENABLED.

fixes: #160

@github-actions github-actions bot added the storage Related to "storage" package label Jan 15, 2026
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 15, 2026
@podmanbot
Copy link

✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6637

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 15, 2026
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 15, 2026
Copy link
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I can’t see how this can possibly work. AFAICT the code with _Containers-pid-pipe and _Containers-continue-pipe exists in C became we need it to run (and give all the process environment changes in unshare.Cmd.Start an opportunity to happen) before the Go runtime starts creating any OS-level threads.

We can mimic the child process’ behavior in Go but I don’t see how we can reliably achieve the same outcome.

Cc: @giuseppe

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this supposed to work?

Is this AI-generated?

Comment on lines 29 to 45
attr.AmbientCaps = []uintptr{
unix.CAP_CHOWN,
unix.CAP_DAC_OVERRIDE,
unix.CAP_DAC_READ_SEARCH,
unix.CAP_FOWNER,
unix.CAP_FSETID,
unix.CAP_SYS_ADMIN,
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not tested this PR but this won't work, we need all caps because we launch containers from this environment (and they don't need to be in the ambient set)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've compared processes with C code and those without, and you're right—the C process does include all capabilities. However, I'm not entirely sure if adding all privileges is the correct decision. The reason I added these capabilities in my code was that when using storage, it directly attempted to delete a folder with permissions r-xr-xr-x, which resulted in a Permission deniederror. Adding the capabilities was my workaround to resolve this permission issue. Based on your suggestion, I will proceed with adding all the capabilities.

Copy link
Contributor Author

@lyp256 lyp256 Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found any alternative methods to add capabilities other than using the ambient set. Do you have any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you create a user namespace you automatically get them. Is Go dropping them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my testing, I found that child processes will not have capabilities unless the ambient method is used. I collected relevant child process information from the /proc/<pid>/status file under three scenarios: using CGO, not using CGO without setting ambient capabilities, and not using CGO with ambient capabilities set.

Using CGO:

CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

Not using CGO and not setting ambient capabilities:

CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

Not using CGO but setting ambient capabilities:

CapInh: 000001ffffffffff
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 000001ffffffffff

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This affects also processes executed by Podman, could you investigate why this is happening? An alternative would be to drop the ambient caps once we are in the namespace

@lyp256
Copy link
Contributor Author

lyp256 commented Jan 16, 2026

Thanks!

I can’t see how this can possibly work. AFAICT the code with _Containers-pid-pipe and _Containers-continue-pipe exists in C became we need it to run (and give all the process environment changes in unshare.Cmd.Start an opportunity to happen) before the Go runtime starts creating any OS-level threads.

We can mimic the child process’ behavior in Go but I don’t see how we can reliably achieve the same outcome.

Cc: @giuseppe

how is this supposed to work?

Is this AI-generated?

I needed to use statically compiled Skopeo and Buildah for container image migration tasks in an offline environment. I compiled the necessary executables using commands similar to:

CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -tags exclude_graphdriver_btrfs,containers_image_openpgp skopeo/cmd/skopeo.

However, I encountered an issue where the unsharepackage did not function properly when CGO_ENABLED=0. While Skopeo worked correctly in most scenarios, operations involving containers-storage(which call unshare.MaybeReexecUsingUserNamespace, e.g. skopeo copy docker://docker.io/library/alpine:3 containers-storage:alpine:3) resulted in an infinite loop of child process creation.

To resolve this, I modified the code to directly pass syscall.CLONE_NEWUSER via exec.Cmd.SysProcAttr.Cloneflags. In the initfunction, I implemented logic to mimic the operations related to _Containers-pid-pipe and _Containers-continue-pipe as done in the C code. After recompiling both Buildah and Skopeo with these modifications, they now function as expected.

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 16, 2026
@mtrmac
Copy link
Contributor

mtrmac commented Jan 16, 2026

I implemented logic to mimic the operations related to _Containers-pid-pipe and _Containers-continue-pipe as done in the C code. After recompiling both Buildah and Skopeo with these modifications, they now function as expected.

None of this addresses the concern that doing any such changes after the Go runtime starts is too late. What am I missing? (Giuseppe?)

The conversation in #160 was AFAICT saying that to resolve this

MaybeReexecUsingUserNamespace can fail with an useful error message

I realize that does not help your use case but that doesn’t mean that we can ignore the constraints (if there indeed are any).

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 19, 2026
@lyp256
Copy link
Contributor Author

lyp256 commented Jan 19, 2026

Under Giuseppe guidance, I re-investigated the implementation of unshare. After reading the source code related to golang's exec.Cmd, I found that exec.Cmd.Start() handles uid/gid mapping by writing directly to the u(g)id_map files. Due to Linux kernel restrictions, ordinary users cannot write additional ID mappings to the uid_mapand gid_mapfiles directly; this requires using the newuidmap and newgidmap commands in conjunction with the /etc/subuid and /etc/subgid files.

The golang exec.Cmd.Start() method does not use the newu(g)idmap commands. Writing directly to the u(g)id_map files results in permission issues. If the child process does not have the uid/gid mapping set, the system will not grant capabilities to the child process.

The current approach uses the EXECVE system call to handle the issue of writing ID mappings too late (a step I had previously missed, which necessitated the Ambient set). The complete process is roughly as follows:

The main process uses clone() to create a child process -> The child process uses execve() to start the child program -> The parent and child processes interactively complete the ID mapping -> The child process uses execve() again to re-execute the child program.

Please review @mtrmac @giuseppe

Signed-off-by: lyp256 <lyp256@qq.com>
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 19, 2026
@lyp256 lyp256 requested a review from giuseppe January 20, 2026 01:27
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested locally.

LGTM

@lyp256 lyp256 requested a review from mtrmac January 22, 2026 01:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

storage Related to "storage" package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reexec causes infinite loop, if built without CGO_ENABLED

4 participants