Skip to content

Conversation

@ryanbreen
Copy link
Owner

Summary

This PR adds a comprehensive parallel boot test framework for ARM64 and brings the architecture to functional parity with x86_64 for core subsystems. 61 tests now pass across 12 subsystems.

What's Included

Test Framework Infrastructure

  • kernel/src/test_framework/ - Complete test infrastructure
  • Parallel execution via kthreads (one per subsystem)
  • Graphical progress bars rendered to framebuffer
  • Serial output markers for CI/automation

Test Coverage (61 tests)

Subsystem Tests Status
memory 23 ✅ PASS
scheduler 3 ✅ PASS
interrupts 7 ✅ PASS
filesystem 4 ✅ PASS
network 4 ✅ PASS
ipc 6 ✅ PASS
process 4 ✅ PASS
syscall 4 ✅ PASS
timer 4 ✅ PASS
logging 3 ✅ PASS
system 3 ✅ PASS

Key Technical Fixes

  • CpuContext: Added x1-x18 fields for complete ARM64 register save/restore during context switch
  • test_stack_layout: Added #[inline(never)] to prevent compiler optimization defeating the test
  • test_timer_interrupt_running: Use hardware timers (CNTVCT_EL0) instead of unreliable spin loops
  • reset_quantum: Fixed infinite recursion bug in socket.rs

Known Gaps (Future Work)

These items are documented and need follow-up PRs:

IPC Thread-Wake Semantics

The wake_read_waiters() and wake_write_waiters() functions in pipe.rs are stubbed on ARM64:

#[cfg(target_arch = "aarch64")]
let _ = waiters; // On ARM64 we don't have a scheduler yet

Impact: IPC tests pass (data structures work) but blocking pipe operations won't wake waiting threads.

Features Tested on x86_64 but Not ARM64

  • Userspace execution (syscall entry/exit)
  • Signal delivery
  • Full process lifecycle (fork/exec/wait)
  • Blocking I/O with proper scheduler integration
  • CoW (Copy-on-Write) memory

Test Plan

# Build ARM64 kernel
cargo build --release --target aarch64-breenix.json \
  -Zbuild-std=core,alloc -Zbuild-std-features=compiler-builtins-mem \
  -p kernel --bin kernel-aarch64

# Run with graphics to see progress bars
BREENIX_GRAPHICS=1 ./scripts/run-arm64-qemu.sh

# Expected output: All 61 tests pass

Screenshots

The parallel test runner displays real-time progress bars showing test status across all subsystems.


🤖 Generated with Claude Code

ryanbreen and others added 14 commits January 27, 2026 10:21
Implement ARM64 preemptive scheduling by connecting the scheduler to
exception return paths:

IRQ return path (boot.S):
- After handle_irq, check SPSR_EL1.M bits to determine if returning to EL0
- Call check_need_resched_and_switch_arm64 before restoring registers
- Only reschedule when returning to userspace to avoid lock deadlocks

Syscall return path (syscall_entry.rs):
- Wire check_need_resched_and_switch_aarch64 to call the real switcher
- Scheduler quantum reset after context switch

Fixes:
- Kernel code is not preemptible - prevents deadlock when IRQs fire
  during operations that hold the scheduler lock
- Remove redundant check code, use early return for kernel mode

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The signal handler setup warnings ("Could not set up SIGINT handler") were
caused by sys_sigaction failing because current_thread_id() returned the
idle thread (0) instead of the init_shell thread.

Root cause: The old code used spawn() which:
1. Added thread to ready_queue
2. Set need_resched = true
3. Then we removed thread from ready_queue
4. Then we set current_thread

But when the timer fired before the shell finished starting, schedule()
was called, and it would switch from thread 2 (init_shell) to idle (0)
because need_resched was still true and the ready queue was empty.

Fix: Add spawn_as_current() which:
- Adds thread to scheduler's thread list (for lookups by syscalls)
- Sets it as current_thread immediately
- Does NOT add to ready_queue
- Does NOT set need_resched

This ensures current_thread_id() returns the correct thread ID when
sys_sigaction is called during shell initialization.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove #[cfg(target_arch = "x86_64")] guards from PTY syscalls
  (posix_openpt, grantpt, unlockpt, ptsname) to enable them on ARM64
- Add PtyMaster/PtySlave handlers to sys_read and sys_write
- Add /dev/pts/* path handling to ARM64 sys_open stub
- Fix PTY echo pollution: master_write no longer echoes to slave_to_master
  buffer since terminal emulator handles its own display
- Prevent ARM64 scheduler from switching userspace threads to idle,
  as idle runs in EL1 and won't be preempted until next IRQ
- Update scripts for ARM64 ext2 disk with pty_test support

All PTY tests pass under ARM64 userspace:
- posix_openpt, grantpt, unlockpt, ptsname syscalls
- Open slave device via /dev/pts/N
- Master -> slave data flow
- Slave -> master data flow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add net::init() call to ARM64 boot sequence
- Add VirtIO network device to QEMU ARM64 script
- Enable STATUS feature for link awareness
- Add interrupt acknowledgment and DSB barrier in RX path
- Add udp_socket_test to ARM64 build and ext2 disk

Network driver initializes correctly and TX works (ARP requests sent).
However, RX packets are not being received - the used_idx in the RX
queue never changes even though interrupts fire. This appears to be
a deeper issue with VirtIO MMIO network on ARM64 QEMU that requires
further investigation.

Working:
- MAC address detection
- Link status (reports link up)
- Packet transmission (ARP requests complete)
- Queue setup and buffer posting

Not working:
- Packet reception (used_idx stays 0)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The VirtIO network header was 12 bytes (including num_buffers field) but for
legacy VirtIO (v1) without the MRG_RXBUF feature negotiated, it should only
be 10 bytes. The extra 2 bytes were prepended to every TX packet, corrupting
the Ethernet frame and causing QEMU's SLIRP to see malformed packets.

Root cause analysis:
- Captured TX packet showed: 00 00 ff ff ff ff ff ff 52 54 ...
- The "00 00" prefix shifted the entire Ethernet header by 2 bytes
- This caused ARP responses to never arrive (SLIRP dropped malformed frames)

Fix:
- Remove num_buffers field from VirtioNetHdr struct
- Header is now correctly 10 bytes for legacy mode
- ARP resolution and ICMP ping now work correctly

Also adds optional debug flags to run-arm64-qemu.sh:
- BREENIX_NET_DEBUG=1 enables packet capture to /tmp/breenix-packets.pcap
- BREENIX_VIRTIO_TRACE=1 enables QEMU VirtIO tracing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Updates the handoff document to reflect:
- All 8 core ARM64 parity tasks completed
- VirtIO network header fix details
- Build status (warning-free)
- Remaining work (CI/test parity)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Graphics parity:
- Add BREENIX_GRAPHICS=1 mode to run-arm64-qemu.sh for headed display
- Use -device virtio-gpu-device and -display cocoa on macOS
- Fix run-arm64-graphics.sh build path to aarch64-breenix
- Add idempotency guard to gpu_mmio::init()

Serial keyboard input:
- Add serial-only mode detection (when no VirtIO GPU)
- Poll PL011 UART for input in main loop via get_received_byte()
- Implement serial shell with echo, command processing, output
- Add ShellState helper methods for buffer management

Usage:
- Serial only: ./scripts/run-arm64-qemu.sh
- With graphics: BREENIX_GRAPHICS=1 ./scripts/run-arm64-qemu.sh

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1. TTY graphical output (Task #1):
   - Add ARM64 support to tty/driver.rs write_bytes()
   - Route output to terminal_manager::write_bytes_to_shell()
   - init_shell output now appears in graphical terminal

2. Serial/keyboard input (Task #2):
   - UART interrupt handler now pushes to stdin buffer
   - Implement wake_blocked_readers for ARM64 scheduler
   - Userspace read() syscall can now receive keyboard input

3. CPU spinning (Task #3):
   - Replace spin_loop() with WFI in main loop and idle thread
   - CPU now halts until interrupt instead of busy-spinning

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The stdin read syscall was spin-waiting with WFI instead of properly
blocking the thread. This caused:
1. CPU spinning (WFI wakes on any interrupt including timer)
2. Thread never actually blocked (stayed as current thread)

Fixed to match x86_64 pattern:
- Call sched.block_current() to remove thread from ready queue
- Set blocked_in_syscall = true
- Check thread.state == Blocked to know when woken
- Clear blocked_in_syscall after resuming

The wake_blocked_readers_try() already calls sched.unblock() which
sets thread state to Ready, allowing the blocking loop to exit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Three parallel fixes from Codex agents:

1. Blocking I/O (syscall/io.rs):
   - Add preempt_enable() before WFI loop to allow timer interrupts
   - Add preempt_disable() after to balance syscall entry
   - Check blocking condition before WFI, not after yield_current()

2. Keyboard input debugging (exception.rs, timer_interrupt.rs, stdin.rs):
   - Add debug markers to trace input path
   - Identified race: main loop and interrupt handler compete for UART FIFO
   - 'U'=UART interrupt, 'R'=byte received, 'P'=pushed to stdin, 'V'=VirtIO key

3. CPU idle (main_aarch64.rs):
   - Add debug markers to polling loop
   - WFI now properly halts CPU between timer ticks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Removed the serial polling loop that was racing with the UART interrupt
handler for the same FIFO data. This caused a race condition where bytes
could be missed or duplicated.

Changes:
- Remove while loop polling get_received_byte() from kernel_main()
- Remove dead code: process_serial_shell_char(), execute_serial_command()
- Keep VirtIO keyboard polling (uses virtqueues, no interrupt support)
- Main loop now just WFI + VirtIO polling

Serial input now works correctly:
1. UART byte arrives → IRQ 33 fires
2. handle_uart_interrupt() reads from FIFO
3. push_byte_from_irq() adds to stdin buffer, wakes blocked readers
4. Userspace read() returns with data

-171 lines of problematic polling code removed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… fixes)

Add comprehensive ARM64 boot test infrastructure mirroring x86_64 patterns:

Kernel fixes:
- GIC: Set interrupt group to Group 1 for IRQ delivery
- GIC: Add AckCtl bit (GICC_CTLR=0x7) for timer interrupt acknowledgment
- GIC: Filter spurious interrupts (IDs 1020-1022)
- exception.rs: Simplified UART interrupt handler
- main_aarch64.rs: Removed polling loop, kernel shell reads stdin buffer

Test infrastructure:
- scripts/run-arm64-boot-test.sh: 7 test modes (full, timer, interrupt,
  syscall, schedule, signal, network)
- tests/arm64_boot_post_test.rs: Rust test equivalents
- tests/shared_qemu_aarch64.rs: Shared QEMU infrastructure for ARM64

Test results (57/57 passing):
- POST: 22/22
- Interrupt/Timer: 10/10
- Syscall/EL0: 6/6
- Scheduling: 6/6
- Signals: 4/4
- Network: 9/9

KNOWN ISSUES (flagged by validation):
- Tests use string-presence checks, not functional verification
- "not found (may be expected)" pattern hides failures
- Single-character debug markers are too weak
- Tests would pass even with broken functionality

This commit captures working state before addressing validation feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Addresses technical accuracy and intellectual honesty issues identified
in boot test validation audit:

1. Replace single-char debug markers with unique bracketed strings:
   - r → [STDIN_READ], b → [STDIN_BLOCK], w → [STDIN_WAKE]
   - P → [STDIN_PUSH], U → [UART_IRQ], R → [NEED_RESCHED]
   - V → [VIRTIO_KEY]
   - Add raw_serial_str() for lock-free multi-char output

2. Remove "not found (may be expected)" silent failure patterns:
   - 11 instances audited and fixed
   - All were optional features, now properly marked OPTIONAL

3. Add EL0_CONFIRMED hard requirement gating:
   - syscall, schedule, signal test modes now fail immediately
     if userspace never executed
   - Prevents meaningless tests from passing

4. Fix timer frequency verification:
   - Timer IS working at 200 Hz - original measurement was flawed
   - Changed print interval from 1 to 200 (once per second)
   - Added dynamic ticks calculation based on CNTFRQ_EL0

5. Add network connectivity verification:
   - Track ARP request/reply, ICMP request/reply
   - Fail if request sent but no reply received
   - Critical failure if no network device found

6. Document functional verification gaps:
   - Audit completed identifying string-presence tests
     that need functional verification

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive test framework for ARM64 parity validation:

Test Framework Infrastructure:
- kernel/src/test_framework/ with registry, executor, progress, display
- Parallel test execution via kthreads (one per subsystem)
- Graphical progress bars rendered to framebuffer
- Serial output markers for automated validation

61 Tests Across 12 Subsystems:
- memory (23 tests) - heap, stack, page allocation
- scheduler (3 tests) - init, spawn, context switch
- interrupts (7 tests) - controller, timer, exceptions
- filesystem (4 tests) - VFS, devfs, file ops
- network (4 tests) - stack init, virtio, sockets
- ipc (6 tests) - pipes, file descriptors
- process (4 tests) - creation, fork semantics
- syscall (4 tests) - entry, handling, return
- timer (4 tests) - tick counter, delays
- logging (3 tests) - serial output
- system (3 tests) - boot markers

Key Fixes:
- CpuContext: Add x1-x18 fields for full register save/restore
- test_stack_layout: Add #[inline(never)] to prevent optimization
- test_timer_interrupt_running: Use hardware timers for reliable delays
- reset_quantum: Fix infinite recursion bug in socket.rs
- Remove unused PageTableFlags import

Known Gaps (documented for future work):
- IPC thread-wake semantics stubbed on ARM64 (scheduler integration needed)
- Some subsystem tests verify data structures but not full behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ryanbreen ryanbreen merged commit 2682a2a into main Jan 28, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants