21 changes: 8 additions & 13 deletions AGENTS.md
@@ -238,7 +238,7 @@ The following files are on the **prohibited modifications list**. Agents MUST NO
| `kernel/src/interrupts/timer.rs` | Timer fires every 1ms - <1000 cycles budget |
| `kernel/src/interrupts/timer_entry.asm` | Assembly timer entry - must be minimal |

### Tier 2: High Scrutiny (explain why GDB is insufficient)
### Tier 2: High Scrutiny (explain why change is required)
| File | Reason |
|------|--------|
| `kernel/src/interrupts/context_switch.rs` | Context switch path - timing sensitive |
@@ -250,11 +250,11 @@ The following files are on the **prohibited modifications list**. Agents MUST NO

If you believe you must modify a prohibited file:

1. **Explain why GDB debugging is insufficient** for this specific problem
1. **Explain why the change is required** and why nonintrusive debugging isn't enough
2. **Get explicit user approval** before making any changes
3. **Never add logging** - use GDB breakpoints instead
3. **Never add logging** - use nonintrusive debugging if needed
4. **Remove any temporary debug code** before committing
5. **Test via GDB** to verify the fix works
5. **Verify via boot stages or targeted tests** (GDB optional)

### Detecting Violations

@@ -312,18 +312,13 @@ Before approving changes to interrupt/syscall code:
- [ ] No heap allocations
- [ ] Timing-critical paths marked with comments

## GDB-Only Kernel Debugging - MANDATORY
## GDB Debugging - Recommended (Not Required)

**ALL kernel execution and debugging MUST be done through GDB.** This is non-negotiable.
GDB is the preferred tool for root-cause debugging of timing-sensitive or low-level issues. Boot stages and end-to-end boot task tests are the default for verification and CI.

Running the kernel directly (`cargo run`, `cargo test`, `cargo run -p xtask -- boot-stages`) without GDB:
- Provides only serial output, which is insufficient for timing-sensitive bugs
- Cannot inspect register state, memory, or call stacks
- Cannot set breakpoints to catch issues before they cascade
- Cannot intercept panics to examine state
- Burns context analyzing log output instead of actual debugging
Running without GDB provides only serial output; that's often sufficient for boot-stage verification, but it won't help when you need register state, memory inspection, or breakpoints.

### Interactive GDB Session (PRIMARY WORKFLOW)
### Interactive GDB Session (Optional Workflow)

Use `gdb_session.sh` for persistent, interactive debugging sessions:

42 changes: 42 additions & 0 deletions docs/planning/AMD64_SIGALTSTACK_FAILURE.md
@@ -0,0 +1,42 @@
# AMD64 sigaltstack() Failure Analysis (Boot Stage 72/258)

## Context
- Boot stages report a failure at stage 72/258, the stage named "sigaltstack() syscall verified".
- Meaning: sigaltstack() or SA_ONSTACK signal delivery is failing on AMD64.

## What I Could Not Retrieve
- The referenced GitHub Actions job log requires authentication, so I could not pull the exact failure output from that URL in this environment.

## Likely Root Cause (Code Analysis)
The signal delivery paths currently save the *handler* stack pointer as the return stack pointer when SA_ONSTACK is used. That means sigreturn restores to the alternate stack instead of the original main stack.

This violates POSIX semantics and can cause:
- The process to continue executing on the alternate stack after the handler returns.
- Subsequent sigaltstack() calls to behave unexpectedly.
- Failures in the sigaltstack_test that expects normal execution to continue on the main stack.

### Evidence in Code
- `kernel/src/signal/delivery.rs` (x86_64 and aarch64):
  - `SignalFrame.saved_rsp` / `saved_sp` is set to `user_rsp` / `user_sp`.
  - When SA_ONSTACK is used, `user_rsp` / `user_sp` is the *alternate* stack top.
  - This is the value restored by sigreturn.
- `kernel/src/syscall/handler.rs` (syscall-return delivery path):
  - `SignalFrame.saved_rsp` is set to `user_rsp` in `deliver_to_user_handler_syscall()`.
  - For SIGUSR1 delivered on syscall return (the sigaltstack test path), this is the hot path.

## Fix Required
- Save the *original* user stack pointer into `SignalFrame.saved_rsp` / `saved_sp`.
- Continue to use the alternate stack for the handler frame placement only.
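
The two bullets above can be sketched as follows. This is a minimal illustration, not the kernel's code: `SignalFrame` here is a simplified stand-in with a single field, and `AltStack`, `Delivery`, and `plan_delivery` are hypothetical names introduced only for this example. It also omits the check for a handler that is already running on the alternate stack.

```rust
/// Simplified, hypothetical stand-in for the kernel's signal frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SignalFrame {
    /// Stack pointer that sigreturn restores.
    pub saved_rsp: u64,
}

/// Hypothetical description of a registered alternate stack.
#[derive(Debug, Clone, Copy)]
pub struct AltStack {
    pub base: u64,
    pub size: u64,
}

/// Where the handler runs and what gets saved for sigreturn.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Delivery {
    pub handler_sp: u64,
    pub frame: SignalFrame,
}

/// Plan a signal delivery for a thread interrupted at `interrupted_sp`.
pub fn plan_delivery(interrupted_sp: u64, alt: Option<AltStack>, sa_onstack: bool) -> Delivery {
    // The alternate stack only decides where the handler frame is placed.
    let handler_top = match (sa_onstack, alt) {
        (true, Some(a)) => a.base + a.size, // stacks grow down: start at the top
        _ => interrupted_sp,
    };

    // Reserve room for the frame and keep the handler stack 16-byte aligned.
    let frame_size = core::mem::size_of::<SignalFrame>() as u64;
    let handler_sp = (handler_top - frame_size) & !0xF;

    Delivery {
        handler_sp,
        // Always save the *original* stack pointer, never the alternate one.
        frame: SignalFrame { saved_rsp: interrupted_sp },
    }
}
```

With this shape, the value restored by sigreturn is the interrupted stack pointer whether or not SA_ONSTACK is set, so execution resumes on the main stack.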

### Status
- Updated `kernel/src/signal/delivery.rs` to save the original stack pointer for both x86_64 and ARM64 signal delivery.
- The syscall-return path fix **still needs to be applied** in `kernel/src/syscall/handler.rs` (Tier-1 prohibited file; requires explicit approval).

## Next Steps
1. Apply the same saved_rsp fix in `kernel/src/syscall/handler.rs` (needs approval).
2. Run boot-stages or targeted signal tests to confirm `SIGALTSTACK_TEST_PASSED`.
3. If failure persists, inspect for:
   - SA_ONSTACK flag propagation in sigaction
   - alt stack address validation vs user space bounds
   - any failure to clear `alt_stack.on_stack` in sigreturn paths

217 changes: 206 additions & 11 deletions docs/planning/ARM64_FEATURE_PARITY_PLAN.md
@@ -21,25 +21,61 @@ This plan is deliberately frank about gaps found in the current ARM64 code path.
- Kernel-mode graphics terminal + kernel shell loop.
- Minimal syscall entry/exit path for EL0.

### Recent Progress (parity wiring)
- ARM64 execv supports argv + ext2/test-disk fallback.
- wait4/waitpid implemented for ARM64.
- Core FS/IO/pipe/poll/select/ioctl/session/pty syscalls wired on ARM64.
- TCP enabled on ARM64 socket syscalls.
- ARM64 boot now attempts `/bin/init_shell` from ext2 before test-disk fallback.
- devptsfs is initialized on ARM64 at boot.
- TTY subsystem is initialized on ARM64 at boot.
- Ext2 disk builder supports ARM64 (`scripts/create_ext2_disk.sh --arch aarch64`); ARM64 QEMU script prefers ext2 image.
- ARM64 userspace build script now includes coreutils/telnetd (best-effort) for ext2 population.

### What Is Missing or Stubbed
- Userspace syscalls for FS/TTY/PTY/session/pipe/select/poll on ARM64.
- Userspace shell (init_shell) running from disk.
- File-based exec path for ARM64 (uses test disk loader only).
- PTY/TTY validation under ARM64 userspace load (devptsfs now initialized).
- Userspace shell (init_shell) running from ext2 disk image (ARM64 binaries).
- ARM64 ext2 image still lacks full coreutils coverage (depends on ARM64 userspace builds).
- Userspace test harness / boot stage parity for ARM64.
- Proper kernel heap allocator (ARM64 uses a bump allocator).
- User pointer validation uses x86_64 canonical split (unsafe on ARM64 identity map).
- User pointer validation needs full audit against ARM64 VMA/layout.
- Full scheduler/quantum reset and signal delivery on ARM64 return paths.
- TCP sockets on ARM64 are explicitly blocked.
- TCP sockets enabled but not validated under ARM64 userspace load.

## High-Risk Gaps (Blockers)
1. **User pointer validation is unsafe on ARM64**
- `kernel/src/syscall/userptr.rs` uses x86_64 canonical split; kernel memory can be treated as user.
2. **ARM64 syscall coverage is incomplete**
- Many syscalls return ENOSYS in `kernel/src/arch_impl/aarch64/syscall_entry.rs`.
1. **User pointer validation needs full ARM64 audit**
- Range checks exist, but must align with actual ARM64 VMA/layout and page fault behavior.
2. **PTY/TTY path unverified on ARM64**
- devptsfs is initialized, but PTY/TTY behavior under userspace is unproven.
3. **Kernel-mode shell is not parity**
- Userspace init_shell depends on TTY/PTY/syscalls; current ARM64 uses `kernel/src/shell/mod.rs`.
4. **Memory subsystem parity not reached**
- ARM64 boot uses hard-coded ranges and a bump allocator in `kernel/src/main_aarch64.rs`.

## AMD64 vs ARM64 Parity Matrix (Frank Status)

This section is deliberately blunt about what is missing on ARM64 compared to AMD64.

| Subsystem | AMD64 status (baseline) | ARM64 current state | Gap / risk | Required work |
| --- | --- | --- | --- | --- |
| Boot + MMU | High-half kernel + HHDM stable; CR3 behavior mature | High-half transition in progress; TTBR split booting but still evolving | Wrong mappings or identity-map assumptions break drivers | Finish high-half + HHDM mapping; remove identity-map assumptions |
| Memory map / discovery | Uses platform-provided memory map | ARM64 uses fixed ranges; no DTB memory map integration | Wrong RAM sizing, allocator bugs | Parse DTB memory map and feed allocator |
| Kernel heap | Tiered allocator / real heap | ARM64 uses bump allocator | Fragmentation, OOM under load | Enable full allocator on ARM64 |
| User pointers | Validated for x86_64 layout | ARM64 userptr was unsafe; now partially aligned with high-half | Security risk + EFAULT mismatch | Complete ARM64 userptr validation for new VA layout |
| Scheduler + preemption | Preemptive scheduling stable | ARM64 preemption not fully validated | Timing bugs, missed signals | Ensure timer IRQ drives scheduler; verify preemption on ARM64 |
| Signal delivery | AMD64 SA_ONSTACK + sigreturn working | ARM64 delivery path exists but not parity-verified | SA_ONSTACK, sigreturn, and mask restore unverified on ARM64 | Validate signal delivery on ARM64 and fix path divergences |
| Syscall coverage | Broad syscall set for tests/shell | Core syscalls wired; ARM64 coverage largely matches AMD64 | Unverified correctness on ARM64 | Validate syscall tests under ARM64 and fix ABI/edge cases |
| Exec / ELF | Exec from ext2 works; argv supported | ARM64 execv supports argv + ext2/test-disk fallback | Userspace shell not yet proven | Validate exec + argv under ARM64 userspace |
| VFS/ext2 | VFS + ext2 stable | ARM64 syscalls wired; ext2 mounted at boot | Unverified under userspace load | Validate ext2 + VFS on ARM64 |
| devfs / devpts | Working on AMD64 | devfs + devptsfs initialized on ARM64 | Unverified under userspace | Validate devptsfs with PTY allocation on ARM64 |
| TTY + PTY | Full interactive shell + job control | PTY syscalls wired; devptsfs mounted | No interactive userspace yet | Validate TTY line discipline + job control on ARM64 |
| VirtIO block | AMD64 stable (PCI) | ARM64 MMIO driver in progress | Storage I/O unreliable | Confirm MMIO queues + IRQs + HHDM DMA |
| VirtIO net | AMD64 stable | ARM64 MMIO wired; TCP enabled | Unverified under ARM64 userspace | Validate RX/TX + TCP tests on ARM64 |
| VirtIO GPU/input | AMD64 stable | ARM64 MMIO in progress | No interactive UI | Confirm MMIO registers + input routing |
| IPC (pipes, sockets) | Pipes, UNIX sockets, UDP/TCP | ARM64 IPC syscalls wired | Unverified under ARM64 userspace | Validate IPC/poll/select tests on ARM64 |
| Userland shell | init_shell + coreutils on ext2 | Kernel shell only | Not parity | Build/install ARM64 userland and boot into init_shell |
| CI / tests | Boot stages + userspace tests | ARM64 manual workflow only | No parity signal in CI | Add ARM64 parity subsets once core syscalls work |

## Parity Scope (Definition of Done)
- Boot into EL0 init_shell from ext2 filesystem image.
- TTY input + canonical/raw modes + job control, signals, Ctrl-C.
@@ -70,6 +106,9 @@ Deliverables:
- Kernel heap allocator enabled on ARM64.
- Userspace pointer validation blocks kernel addresses.

Execution note (in progress):
- High-half kernel + TTBR0/TTBR1 split is now being implemented in `boot.S` + `linker.ld`.

Primary files:
- `kernel/src/main_aarch64.rs`
- `kernel/src/arch_impl/aarch64/mmu.rs`
@@ -92,9 +131,9 @@ Primary files:
- `kernel/src/arch_impl/aarch64/context_switch.rs`

## Phase 3 - Syscall Parity (Core)
- Remove ARM64 ENOSYS stubs for FS/TTY/PTY/session/pipe/select/poll.
- Wire shared syscall modules for ARM64 by loosening `cfg(target_arch)` gates.
- ✅ Core syscall wiring done (FS/TTY/PTY/session/pipe/select/poll/ioctl/exec/wait4).
- Validate ARM64 ABI struct layouts for stat/dirent/time/sigset.
- Run syscall-heavy userspace tests on ARM64 and fix edge cases.

Deliverables:
- ARM64 passes syscall tests that currently pass on AMD64.
@@ -199,3 +238,159 @@ Primary files:
1. Fix ARM64 user pointer validation and memory map plumbing.
2. Wire syscall modules and remove ARM64 ENOSYS stubs for FS/TTY/PTY.
3. Boot into userspace init_shell from ext2 disk image.

---

# Parity Checklist (Living Document)
This checklist captures **what must match AMD64**. ARM64 status is intentionally blunt; any "unknown" item requires a concrete audit pass.

Legend: `[x]` parity verified, `[~]` partial/in-progress, `[ ]` missing/unknown

## Boot & Initialization
- [ ] UEFI/DTB memory map consumed and trusted (no static ranges)
- [ ] Per-CPU structures allocated and initialized
- [ ] SMP bring-up parity (APs start, enter scheduler)
- [ ] Userspace init process launched from filesystem image

## Memory & MMU
- [ ] VMA + COW flows usable by ARM64 page tables
- [ ] User/kernel address split enforced by userptr checks
- [ ] Kernel heap allocator active (no bump allocator)
- [ ] Fault handling parity (page faults, permissions, user faults)

## Scheduling, Signals, and Timers
- [ ] Preemptive scheduling with timer-based quantum reset
- [ ] Signal delivery path (incl. alt stack) matches AMD64
- [ ] sigreturn restores correct context on ARM64
- [ ] Timer IRQ handling is minimal and timing-safe

## Syscall Surface Parity
- [ ] FS syscalls (open/read/write/getdents/fstat/close/etc)
- [ ] TTY/PTY/session/setsid/ioctl
- [ ] pipe/dup/poll/select
- [ ] process (fork/exec/wait/exit/getpid)
- [ ] time (clock_gettime, nanosleep, etc)
- [ ] socket (UDP/TCP), bind/connect/accept/listen

## Filesystem & Storage
- [ ] ext2 read/write parity
- [ ] VFS + devfs + devpts mount parity
- [ ] VirtIO block MMIO: IRQ + queue features stable

## TTY/PTY & Shell
- [ ] VirtIO input routed to TTY line discipline
- [ ] /dev/pts functional (PTY pairs)
- [ ] Userspace init_shell runs with job control + signals

## Networking
- [ ] VirtIO net MMIO RX/TX stable
- [ ] UDP userspace tests pass
- [ ] TCP userspace tests pass (no ARM64 block)
- [ ] DNS/HTTP userspace tests pass

## Drivers & Graphics
- [ ] VirtIO GPU usable by userspace terminal
- [ ] VirtIO input/keyboard parity
- [ ] Any ARM64-specific device quirks documented

## CI/Test Parity
- [ ] ARM64 build is warning-free
- [ ] ARM64 test subset defined and tracked
- [ ] Boot stages (or equivalent) executed for ARM64

---

# Analysis Workstreams (Deep Diff Required)
This is the concrete work needed to **prove** AMD64 ↔ ARM64 parity and identify every gap.

## Workstream A - Syscall Matrix Diff
Goal: build an explicit list of syscalls that are implemented on AMD64 but ENOSYS or stubbed on ARM64.

Tasks:
- Inventory AMD64 syscall table and mapping (source of truth).
- Inventory ARM64 syscall entry mapping and `cfg(target_arch)` gates.
- Produce a per-syscall matrix with status: OK / stubbed / missing / ABI mismatch.
- Highlight syscalls required for init_shell + tests.

Deliverable:
- A table appended here or in a sibling doc: `ARM64_SYSCALL_MATRIX.md`.
- Current artifact: `docs/planning/ARM64_SYSCALL_MATRIX.md`.
- Porting checklist: `docs/planning/ARM64_SYSCALL_PORTING_CHECKLIST.md`.
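
As a rough sketch of how the per-syscall matrix could be diffed, the snippet below compares two status tables and lists every syscall that is OK on the baseline but not on the target. The `Status` enum mirrors the four statuses named in the tasks above; the map-based representation and the `parity_gaps` function are assumptions made for illustration and are not taken from the repository.

```rust
use std::collections::BTreeMap;

/// Per-syscall status, mirroring the matrix legend (OK / stubbed / missing / ABI mismatch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Stubbed,
    Missing,
    AbiMismatch,
}

/// Return every syscall that is `Ok` on the baseline (AMD64) but not `Ok` on
/// the target (ARM64). A syscall absent from the target table counts as `Missing`.
pub fn parity_gaps<'a>(
    amd64: &BTreeMap<&'a str, Status>,
    arm64: &BTreeMap<&'a str, Status>,
) -> Vec<(&'a str, Status)> {
    amd64
        .iter()
        .filter(|(_, status)| **status == Status::Ok)
        .filter_map(|(name, _)| {
            let target = arm64.get(name).copied().unwrap_or(Status::Missing);
            (target != Status::Ok).then_some((*name, target))
        })
        .collect()
}
```

Because `BTreeMap` iterates in key order, the resulting gap list is sorted by syscall name, which keeps successive snapshots of the matrix easy to compare.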

## Workstream B - User/Kernel Memory Safety Audit
Goal: ensure ARM64 user memory validation and page table policy match AMD64 behavior.

Tasks:
- Audit `kernel/src/syscall/userptr.rs` and architecture-specific splits.
- Verify page fault handler parity (error codes, user vs kernel faults).
- Validate `ProcessPageTable` integration for ARM64 mappings.

Deliverable:
- Summary of differences and exact code locations; explicit fixes.
- Current artifact: `docs/planning/ARM64_USERPTR_AUDIT.md`.
- Memory layout diff: `docs/planning/ARM64_MEMORY_LAYOUT_DIFF.md`.
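
For orientation, the kind of range check this audit targets might look like the sketch below. It is not the contents of `kernel/src/syscall/userptr.rs`: the `USER_VA_END` value assumes a 48-bit lower-half (TTBR0) user range, and the `UserPtrError` enum and `validate_user_range` function are hypothetical. The real limit has to come from the actual ARM64 VA layout.

```rust
/// Assumed exclusive upper bound of user virtual addresses (2^48).
/// Hypothetical value; the real bound depends on the ARM64 memory layout.
pub const USER_VA_END: u64 = 0x0001_0000_0000_0000;

/// Reasons a user-supplied range is rejected (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
pub enum UserPtrError {
    Null,
    Overflow,
    OutOfRange,
}

/// Check that `[addr, addr + len)` lies entirely inside the user half.
pub fn validate_user_range(addr: u64, len: u64) -> Result<(), UserPtrError> {
    if addr == 0 {
        return Err(UserPtrError::Null);
    }
    // Reject ranges that wrap around the top of the address space.
    let end = addr.checked_add(len).ok_or(UserPtrError::Overflow)?;
    // The end is exclusive, so a range ending exactly at the limit is allowed.
    if end > USER_VA_END {
        return Err(UserPtrError::OutOfRange);
    }
    Ok(())
}
```

A range check like this only rules out kernel addresses. Whether the pages are actually mapped and accessible is still decided by the page tables and the fault handler, which is why the tasks above include page fault parity.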

## Workstream C - Exec/ELF/Process Parity
Goal: ensure ARM64 exec path is real filesystem-backed, not test-only loader.

Tasks:
- Audit ARM64 ELF loader for correct auxv, stack layout, and permissions.
- Confirm execve path is shared and not gated for AMD64 only.
- Confirm fork/exec/wait semantics in scheduler and process manager.

Deliverable:
- A minimal boot-to-shell scenario documented with steps.

## Workstream D - Device & IRQ Path Parity
Goal: ensure VirtIO MMIO and IRQ routing is complete for block/net/input/gpu.

Tasks:
- Compare VirtIO MMIO feature negotiation and IRQ ack/EOI paths.
- Validate timer IRQ performance and preemption behavior.
- Confirm device drivers do not assume x86-specific features.

Deliverable:
- Driver parity checklist with explicit IRQ and feature gaps.

## Workstream E - Filesystem & TTY/PTY Parity
Goal: ensure init_shell has full TTY and filesystem semantics.

Tasks:
- Confirm devfs/devpts mount parity at boot.
- Validate PTY allocation and session leadership syscalls on ARM64.
- Ensure TTY line discipline receives VirtIO input.

Deliverable:
- A matrix of required shell syscalls and their ARM64 status.

---

# Milestones and Exit Criteria

## Milestone 1 - "Boot to Userspace"
Exit criteria:
- ARM64 boots to EL0 init_shell from ext2 image.
- Basic TTY input works (echo, backspace, newline).

## Milestone 2 - "Core Shell Workflow"
Exit criteria:
- `/bin/ls`, `/bin/cat` run from disk.
- Job control and Ctrl-C work.
- No kernel shell fallback in normal path.

## Milestone 3 - "Networking Online"
Exit criteria:
- UDP/TCP tests pass on ARM64.
- DNS/HTTP userspace tests pass.

## Milestone 4 - "Parity Lock"
Exit criteria:
- ARM64 passes the same userspace test suite as AMD64 (or documented, justified exceptions).
- No ARM64-only hacks in hot paths.

---

# Verification Strategy
- Use AMD64 tests as the gold standard; define the ARM64 subset explicitly and expand it to parity.
- Require warning-free ARM64 builds.
- Validate each subsystem with a minimal userspace test (filesystem, TTY, signals, networking).
54 changes: 54 additions & 0 deletions docs/planning/ARM64_HANDOFF_2026-01-27.md
@@ -0,0 +1,54 @@
# ARM64 Parity Handoff (2026-01-27)

This is a short, concrete handoff of the current ARM64 parity effort and where it stands.

## Current Branch / Status
- Branch: `feature/arm64-parity`
- Work is being executed step-by-step and reflected in the parity docs after each major change.
- Most recent commits:
  - `fc6dea8` arm64: expand userspace build list for ext2
  - `f5f19fe` arm64: add ext2 disk support for userspace
  - `a3b7eae` arm64: init tty at boot and update plan
  - `13486d9` arm64: enable devptsfs and refresh parity docs
  - `bea6750` arm64: execv/wait4, tcp sockets, ext2 init_shell

## Planning Docs (authoritative)
- `docs/planning/ARM64_FEATURE_PARITY_PLAN.md` (main plan + phased checklist)
- `docs/planning/ARM64_SYSCALL_MATRIX.md` (current syscall parity snapshot)
- `docs/planning/ARM64_SYSCALL_PORTING_CHECKLIST.md` (porting checklist)
- `docs/planning/ARM64_USERPTR_AUDIT.md` (user pointer validation audit)
- `docs/planning/ARM64_MEMORY_LAYOUT_DIFF.md` (ARM64 VA layout notes)

These are kept up to date as parity work progresses.

## What Was Just Completed
- ARM64 execv now supports argv and ext2-backed exec, with test-disk fallback.
- wait4/waitpid wired for ARM64.
- TCP sockets enabled on ARM64 (the EAFNOSUPPORT gate has been removed).
- devptsfs enabled on ARM64 at boot; TTY subsystem initialized at boot.
- ext2 disk builder now supports ARM64 (`scripts/create_ext2_disk.sh --arch aarch64`).
- ARM64 QEMU script prefers ext2 image if present.
- ARM64 userspace build list expanded to include coreutils + telnetd (best-effort).

## Current Gaps (Still Blocking Full Parity)
- ARM64 userspace binaries are not yet fully installed on the ext2 image (coreutils coverage still TBD).
- PTY/TTY behavior unverified under ARM64 userspace load.
- Scheduler/preemption validation under userspace load.
- Memory map + allocator parity (ARM64 still uses bump allocator).
- ARM64 test harness / boot stage parity subset not established.

## How To Continue (Next Concrete Steps)
1) Build ARM64 userspace binaries (best-effort list):
   - `cd userspace/tests && ./build-aarch64.sh`
2) Create ARM64 ext2 image with those binaries:
   - `./scripts/create_ext2_disk.sh --arch aarch64`
3) Boot ARM64 with ext2 image preferred:
   - `./scripts/run-arm64-graphics.sh release`

(No testing was run as part of this handoff; the commands above are provided only as next steps.)

## Notes / Constraints
- Do not modify Tier 1/Tier 2 prohibited files without explicit approval.
- Avoid adding logging to interrupt/syscall hot paths.
- Keep parity docs updated after each milestone (plan + syscall matrix).
