[WIP] Enable QA Testing for macOS #2487

Draft
tallpsmith wants to merge 34 commits into performancecopilot:main from tallpsmith:macos-qa-uplift

@tallpsmith
Contributor

Summary

This PR enables the PCP QA test suite to run on macOS. While PCP builds and runs successfully on macOS, the QA infrastructure has several blockers that prevent the test suite from executing.

Key Fixes

  • Service management: Updated QA infrastructure to work with modern macOS launchd (replacing ancient /etc/hostconfig checks)
  • Library loading: Fixed DYLD_LIBRARY_PATH handling and test binary rpath embedding for macOS
  • PMNS rebuild: Ensured the PMNS root file is built in CI environments
  • Build system: Added missing include paths for internal PCP headers in dynamic PMDA

Status

Currently debugging the QA test suite in CI. See commits and implementation plan for full details.

CI runs: https://github.com/tallpsmith/pcp/actions/workflows/qa-macos.yml


Note: Draft PR for tracking progress and getting early feedback. Not ready for merge.

macOS deprecated /etc/hostconfig years ago, breaking QA service management.
Replaces brew diy (deprecated) and hostconfig with launchctl bootstrap/bootout
for service control. Adds Cirrus CI QA task and GitHub workflow for validation.
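
For readers unfamiliar with the launchd workflow this commit moves to, the following is a rough sketch of bootstrap/bootout-based service control. It is an illustration only, not the code in this PR: the helper names, the `LAUNCHCTL` override variable, the `io.pcp.pmcd` label, and the plist path in the usage comment are all assumptions.

```sh
#!/bin/sh
# Sketch only: helper names and the LAUNCHCTL override are hypothetical,
# not taken from the PCP tree.
LAUNCHCTL=${LAUNCHCTL:-launchctl}   # set LAUNCHCTL=echo for a dry run

# start_service LABEL PLIST
# Load the daemon into the system domain, then force a (re)start.
# Note: bootstrap is expected to fail if the service is already loaded.
start_service() {
    $LAUNCHCTL bootstrap system "$2" || return 1
    $LAUNCHCTL kickstart -k "system/$1"
}

# stop_service LABEL
# Unload the daemon from the system domain.
stop_service() {
    $LAUNCHCTL bootout "system/$1"
}

# Example (assumed label and plist location, needs root):
#   start_service io.pcp.pmcd /Library/LaunchDaemons/io.pcp.pmcd.plist
#   stop_service io.pcp.pmcd
```

Routing every call through one variable keeps the sketch easy to dry-run on a machine without launchd.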

Enables automatic workflow execution on push to test before merging

Avoid uploading 25,809 files with colon-in-filename issues

The two-job split was fighting us: DMG excludes QA files, yet qa job
needs git checkout + compiler for qa/src helpers. Just do everything
in one job.

Drop myconfigure entirely - qa/GNUmakefile.install detects installed
PCP via $(PCP_INC_DIR)/builddefs, doesn't need source configure.

Switch to -g sanity -x not_in_ci for targeted validation instead of
running all tests with -l.

QA framework creates localconfig, check.log, and test output files.
Some tests also invoke sudo internally without tty.

QA framework expects TOPDIR=../.. to point at configured source tree.
Makepkgs builds in a separate pcp-$VERSION/ tarball, leaving the checkout
unconfigured, which breaks the builddefs include and the libpcp.h symlink.

Switch to configure/make/install in original checkout like Linux does.
Add install_pcp target to install launchd plists during make install.
Fix plist names from com.github.performancecopilot.* to io.pcp.*

Install was failing because make install tries to chown directories to
pcp:pcp, but that user and group don't exist on macOS runners. Consolidated
all user creation into one step before install.

When pmcd fails to start, dump pmcd.log and launchctl status to help
debug. Also upload all /var/log/pcp/ logs in artifacts for deeper
investigation.

Instead of waiting just 5 seconds, retry for up to 180 seconds checking
if pmcd is responding. Shows detailed diagnostics on timeout including
process status, logs, and network connections.
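
The retry-until-responding idea above can be sketched as a small polling helper. This is a minimal sketch, not the workflow step from this PR: `wait_until` is a hypothetical name, and the probe command shown in the usage comment is an assumption (any command that succeeds once pmcd answers would do).

```sh
#!/bin/sh
# Sketch only: wait_until is a hypothetical helper, not part of PCP.

# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Run COMMAND every INTERVAL seconds until it succeeds or roughly
# TIMEOUT seconds of sleeping have elapsed.
# Returns 0 on success, 1 on timeout.
wait_until() {
    _wu_timeout=$1
    _wu_interval=$2
    shift 2
    _wu_elapsed=0
    while :; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$_wu_elapsed" -ge "$_wu_timeout" ]; then
            return 1
        fi
        sleep "$_wu_interval"
        _wu_elapsed=$((_wu_elapsed + _wu_interval))
    done
}

# Example (assumed probe): wait up to 180s, polling every 2s, then
# fall back to dumping diagnostics.
#   wait_until 180 2 pminfo -f pmcd.version || {
#       echo "pmcd did not respond in time" >&2
#       # dump process status, logs, network connections here
#   }
```

The elapsed time counts only the sleeps, so a slow probe can stretch the real wait beyond the nominal timeout.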

The PMNS root file must be built at runtime by merging individual
PMDA namespace files (root_darwin, root_pmcd, etc). While the rc
script should handle this automatically via _reboot_setup(), it's
not triggering reliably in CI. Added explicit Rebuild step as
workaround, with investigation notes in implementation plan.

The install creates Rebuild at $PCP_PMNSADM_DIR/Rebuild and symlinks it to
$PCP_VAR_DIR/pmns/Rebuild. Sudo can't execute the symlink (command not found),
so we run the real script while cd'd to the PMNS directory.

Added debug check to verify the script exists before attempting to run it.

PCP_PMNSADM_DIR isn't exported to runtime environment - it's build-time only.
Instead, use readlink to follow the symlink and construct the full path.

Added debug output to show available PCP_* environment variables.
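
Following a symlink with readlink and rebuilding the full path, as described above, might look roughly like this. It is a sketch under stated assumptions: `resolve_link` is a hypothetical helper, it follows only one level of symlink, and the usage comment assumes `$PCP_VAR_DIR` is set to an absolute path.

```sh
#!/bin/sh
# Sketch only: resolve_link is a hypothetical helper, not part of PCP.

# resolve_link PATH
# If PATH is a symlink, print the path it points at (one level only),
# turning a relative target into a path relative to PATH's directory.
# Otherwise print PATH unchanged.
resolve_link() {
    _rl_path=$1
    if [ -L "$_rl_path" ]; then
        _rl_target=$(readlink "$_rl_path")
        case "$_rl_target" in
            /*) _rl_path=$_rl_target ;;                          # absolute target
            *)  _rl_path=$(dirname "$_rl_path")/$_rl_target ;;   # relative target
        esac
    fi
    printf '%s\n' "$_rl_path"
}

# Example (assumes PCP_VAR_DIR is set and absolute):
#   script=$(resolve_link "$PCP_VAR_DIR/pmns/Rebuild")
#   [ -f "$script" ] || { echo "Rebuild not found: $script" >&2; exit 1; }
#   (cd "$PCP_VAR_DIR/pmns" && sudo "$script")
```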

Check if files actually exist at target locations before trying to run them.

Use realpath to properly resolve relative symlinks to absolute paths.
macOS uses symlinks: /etc -> /private/etc, /var -> /private/var
Without realpath, pcp.conf gets installed to the wrong location and
can't be sourced, leaving all PCP_* variables unset.

This matches the approach used in Makepkgs for darwin builds.
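
To illustrate the /etc -> /private/etc issue, here is a minimal sketch of resolving a directory to its physical path. Note that it swaps in the POSIX `cd -P` / `pwd -P` pair rather than the `realpath` call the commit uses, and `resolve_dir` is a hypothetical name.

```sh
#!/bin/sh
# Sketch only: resolve_dir is a hypothetical helper; the commit itself
# uses realpath, this shows a POSIX-only alternative.

# resolve_dir DIR
# Print the physical (symlink-free) absolute path of DIR.
# Returns non-zero if DIR does not exist or is not a directory.
resolve_dir() {
    ( cd -P -- "$1" 2>/dev/null && pwd -P )
}

# Example: on macOS this would be expected to print /private/etc
#   resolve_dir /etc
```

The subshell keeps the caller's working directory unchanged.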
…nfig_on()

Addresses three interconnected issues causing QA test failures on macOS:

1. pmlogger/pmie services never started - plists existed but were never
   bootstrapped. Added StartInterval for periodic health checks (like systemd
   timers on Linux) and bootstrap/kickstart in postinstall and CI workflow.

2. is_chkconfig_on() tried to source /etc/hostconfig which hasn't existed
   since macOS 10.6 (~2009). Replaced with modern launchctl print-disabled check.

3. Added localhost DNS verification step in CI to diagnose/fix potential
   mDNSResponder slowness in VM environments.
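
The print-disabled check in item 2 could be sketched as a filter over `launchctl print-disabled system` output. This is not the is_chkconfig_on() implementation from this PR: `service_enabled_in` is a hypothetical name, and the line format it parses (`"label" => true|false` or `"label" => disabled|enabled`) is an assumption about launchctl's output that may differ between macOS releases.

```sh
#!/bin/sh
# Sketch only: service_enabled_in is a hypothetical helper, and the
# parsed output format is an assumption, not a documented contract.

# service_enabled_in LABEL
# Read `launchctl print-disabled <domain>` text on stdin.
# Returns 1 if LABEL is listed as disabled, 0 otherwise
# (a label that is absent from the list is treated as not disabled).
service_enabled_in() {
    awk -v label="$1" '
        index($0, "\"" label "\"") {
            if ($0 ~ /=>[ \t]*(true|disabled)/)
                is_disabled = 1
        }
        END { exit (is_disabled ? 1 : 0) }
    '
}

# Example (assumed label):
#   if launchctl print-disabled system | service_enabled_in io.pcp.pmlogger
#   then echo "pmlogger is not disabled"
#   fi
```

Reading from stdin keeps the parsing separate from the launchctl call, so the logic can be exercised with captured output on any platform.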
…locker

Documents completion of service management fixes (pmlogger/pmie now start) and
discovery of critical DYLD_LIBRARY_PATH issue blocking all QA test execution.

Key updates:
- Added Status section showing completed work (commit 49ee361)
- Added Phase 0 (HIGH PRIORITY) for DYLD_LIBRARY_PATH fix
- Moved service management issues to "RESOLVED" in Known Issues
- Updated Implementation Order to prioritize DYLD fix
- Added verification steps and technical details for DYLD issue

CI run 21793886742 shows 44/70 tests failing due to dyld library loading errors.

macOS requires DYLD_LIBRARY_PATH for runtime library lookup since qa/src
test binaries don't have rpath embedded during build phase. Set it in
common.rc and provide cross-platform _add_lib_path() helper for mock
library tests. Add CI verification step to validate the fix.
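
A cross-platform library-path helper of the kind described above might look like the sketch below. It is not the `_add_lib_path()` from common.rc, whose body is not shown here: the names `lib_path_var` and `prepend_lib_path` are hypothetical, and the optional OS argument exists only so the logic can be exercised off macOS.

```sh
#!/bin/sh
# Sketch only: these helpers are hypothetical, not the PR's _add_lib_path().

# lib_path_var [OS]
# Print the runtime library search variable for OS (default: uname -s).
lib_path_var() {
    case "${1:-$(uname -s)}" in
        Darwin) echo DYLD_LIBRARY_PATH ;;
        *)      echo LD_LIBRARY_PATH ;;
    esac
}

# prepend_lib_path DIR [OS]
# Put DIR at the front of the platform's library search variable
# and export it, preserving any existing value.
prepend_lib_path() {
    _plp_var=$(lib_path_var "${2:-}")
    eval "_plp_old=\${$_plp_var:-}"
    if [ -n "$_plp_old" ]; then
        eval "$_plp_var=\$1:\$_plp_old"
    else
        eval "$_plp_var=\$1"
    fi
    export "$_plp_var"
}

# Example (hypothetical mock library directory):
#   prepend_lib_path "$tmp/mocklib"
```

One caveat worth keeping in mind: System Integrity Protection removes DYLD_* variables when protected system binaries are launched, so an environment-variable approach may not reach every subprocess.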

Mark Phase 0 as completed and add future task to audit all QA tests
for hardcoded LD_LIBRARY_PATH usage that needs _add_lib_path() conversion.

Test binaries built during 'make' don't have rpath, but GNUmakefile.install
adds -Wl,-rpath on Darwin. Rebuild binaries in /var/lib/pcp/testsuite/src
after install to embed rpath, eliminating remaining dyld errors from
subprocess spawning.

Create composite action to centralize macOS build dependencies across
workflows, eliminating duplication. Adds Perl module installation
(cpanm + required modules) which was missing from both workflows,
causing PCP Perl components to be skipped during build.

Changes:
- New composite action: .github/actions/install-macos-deps
- Installs Homebrew packages (autoconf, unixodbc, valkey, libuv, etc.)
- Installs Perl modules via cpanm (JSON, Date::Parse, XML::TokeParser, etc.)
- Installs Python packages (lxml, openpyxl, psycopg2-binary, etc.)
- Both macOS.yml and qa-macos.yml now use shared action

Document all Homebrew, Perl, and Python dependencies required for
building PCP on macOS. Adds cross-reference to the composite action
to ensure documentation stays in sync with CI workflows.

This fixes the outdated single-line brew install that was missing
Perl modules, Python packages, and several Homebrew dependencies.

Set ownership of /var/lib/pcp/testsuite to pcpqa before attempting
rebuild, preventing "Permission denied" errors when make tries to
write localconfig files.

The dynamic PMDA uses internal PCP headers (libpcp.h) that require
-I$(PCP_INC_DIR) to be found. The GNUmakefile.install was stripping
all -I flags from builddefs and only adding back -I$(PCP_INC_DIR)/..
which works for public headers like <pcp/pmapi.h> but not for internal
headers like "libpcp.h".

This was causing compilation failures when building the dynamic PMDA
from the installed testsuite with: fatal error: 'libpcp.h' file not found

The previous fix only modified GNUmakefile.install, but the main
GNUmakefile also has a code path for building from an installed
testsuite (when TOPDIR/src/include/builddefs doesn't exist). This code
path also needs the -I$(PCP_INC_DIR) flag to find internal headers
like libpcp.h.

Check what's in the installed GNUmakefile and where libpcp.h actually is

libpcp.h is a private/internal header (marked NOSHIP) that is not
installed. The dynamic PMDA was using it for pmDebugOptions, but this
symbol is already available in the public pmapi.h header which is
already included.

This fixes the compilation error:
  dynamic.c:11:10: fatal error: 'libpcp.h' file not found

The dynamic PMDA test code legitimately needs internal PCP headers for
symbols like PDU_FLAG_AUTH. The GNUmakefile explicitly declares a
dependency on libpcp.h and includes comments about supporting internal
headers. This reverts the misguided "cleanup" from f7c6b9c.

Fixes compilation error:
  dynamic.c:355:23: error: use of undeclared identifier 'PDU_FLAG_AUTH'

Disable parallel make and add detailed state inspection before QA
tests run to understand why dynamic PMDA build fails on macOS.

The GNUmakefile.install assumed non-/usr/include/pcp paths meant
internal headers would be in the installed location, but libpcp.h is
marked NOSHIP and only exists in testsuite/src/ post-install.

Add Darwin-specific case to explicitly use ../../src where libpcp.h
actually lives, while preserving all existing Linux/BSD behavior.

Match qa/src pattern of using "libpcp.h" instead of <pcp/libpcp.h>.
The -I../../src flag in GNUmakefile.install now finds the file at
../../src/libpcp.h without needing a pcp/ subdirectory structure.
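
To make the include-path reasoning above concrete, the sketch below mimics how a compiler walks its -I directories for a given include name. It is a simplified model for illustration, not how the compiler or the GNUmakefile is implemented; `find_header` and the directory layout in the usage comment are hypothetical.

```sh
#!/bin/sh
# Sketch only: find_header is a hypothetical helper that models -I
# search order; it is not part of PCP or any compiler.

# find_header NAME DIR...
# Print the first DIR/NAME that exists, searching DIRs in order.
# Returns 1 if NAME is not found in any DIR.
find_header() {
    _fh_name=$1
    shift
    for _fh_dir in "$@"; do
        if [ -f "$_fh_dir/$_fh_name" ]; then
            printf '%s\n' "$_fh_dir/$_fh_name"
            return 0
        fi
    done
    return 1
}

# Example (hypothetical layout: inc/pcp/pmapi.h and src/libpcp.h):
#   find_header pcp/pmapi.h inc        # found via the public include dir
#   find_header libpcp.h inc           # not found: NOSHIP, never installed
#   find_header libpcp.h inc src       # found once src is on the path
#   find_header pcp/libpcp.h inc src   # not found: no pcp/ subdirectory in src
```

The last two calls show why the bare "libpcp.h" form matters: the name after the include directive is appended to each search directory as written.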

Document critical learnings about internal headers in test code,
especially the libpcp.h handling differences between build tree
and installed testsuite, and macOS vs Linux path differences.