Description
What problem does your feature solve?
The network settings config upgrade logic in the start script is brittle. The `upgrade_soroban_config` function uses `stellar-core get-settings-upgrade-txs` to generate transactions, submits them via curl to core's HTTP endpoint, and confirms they were applied by polling the global `ledger.transaction.count` metric.
For example:
Lines 672 to 718 in 6357b28
```bash
upgrade_output="$(echo $NETWORK_ROOT_SECRET_KEY \
  | stellar-core get-settings-upgrade-txs \
  "$NETWORK_ROOT_ACCOUNT_ID" \
  "$seq_num" \
  "$NETWORK_PASSPHRASE" \
  --xdr `stellar-xdr encode --type ConfigUpgradeSet < "$config_file_path"` \
  --signtxs)"
let line_count=$(echo "$upgrade_output" | wc -l)
echo "$upgrade_output" | { \
  TX_COUNT="`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`"
  TX_COUNT=$((TX_COUNT+1))
  # If the line count is 9 instead of 7, a version of core is being used where the restore op is being returned
  if [ $line_count = 9 ] ; then
    read tx;
    read txid;
    echo "upgrades: soroban config: restore contract: $txid .. $(curl -sG 'http://localhost:11626/tx' --data-urlencode "blob=$tx" | jq -r '.status')";
    while [ "`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`" != "$TX_COUNT" ]; do sleep 1; done
    TX_COUNT=$((TX_COUNT+1))
  fi
  read tx; \
  read txid; \
  echo "upgrades: soroban config: install contract: $txid .. $(curl -sG 'http://localhost:11626/tx' --data-urlencode "blob=$tx" | jq -r '.status')"; \
  while [ "`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`" != "$TX_COUNT" ]; do sleep 1; done
  TX_COUNT=$((TX_COUNT+1)); \
  read tx; \
  read txid; \
  echo "upgrades: soroban config: deploy contract: $txid .. $(curl -sG 'http://localhost:11626/tx' --data-urlencode "blob=$tx" | jq -r '.status')"; \
  while [ "`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`" != "$TX_COUNT" ]; do sleep 1; done
  TX_COUNT=$((TX_COUNT+1)); \
  read tx; \
  read txid; \
  echo "upgrades: soroban config: upload config: $txid .. $(curl -sG 'http://localhost:11626/tx' --data-urlencode "blob=$tx" | jq -r '.status')"; \
  while [ "`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`" != "$TX_COUNT" ]; do sleep 1; done
  TX_COUNT=$((TX_COUNT+1)); \
  read key; \
  echo "upgrades: soroban config: set config with key: $key";
  OUTPUT="$(curl -sG 'http://localhost:11626/upgrades?mode=set&upgradetime=1970-01-01T00:00:00Z' --data-urlencode "configupgradesetkey=$key")"
  echo "$OUTPUT"; \
  if [ "$OUTPUT" == "Error setting configUpgradeSet" ]; then
    echo "!!!!! Unable to upgrade Soroban Config Settings. Stopping all services. !!!!!"
    kill_supervisor
  fi
}
echo "upgrades: soroban config done"
```
The script reads transactions and transaction IDs from stdout line-by-line, submits each via curl, then waits for the global transaction count metric to increment:
```bash
while [ "`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`" != "$TX_COUNT" ]; do sleep 1; done
```

This is brittle in several ways:
- Transaction confirmation by global counter: It does not verify that the specific transaction succeeded, only that the total transaction count increased. If any other transaction occurs, or if a transaction fails but is still counted, the logic breaks.
- Output format coupling: The script checks `if [ $line_count = 9 ]` vs 7 lines to detect whether a restore operation is included in the output, coupling it tightly to the exact output format of `stellar-core get-settings-upgrade-txs`, which can change between versions.
- Pipe-based parsing of stdout: The entire flow reads tx blobs and tx IDs via `read` from a piped subshell, which is fragile and hard to debug when something goes wrong.
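To illustrate the output format coupling point, here is a minimal sketch of parsing the output without hard-coding 7 or 9 lines. It assumes only what the current script already relies on: the output is a sequence of (tx, txid) line pairs followed by a single key line. The function name `for_each_upgrade_tx`, the handler argument, and the `UPGRADE_KEY` variable are hypothetical names chosen for illustration, not part of the existing start script.

```shell
# for_each_upgrade_tx HANDLER OUTPUT
# Calls HANDLER with (tx, txid) for every pair of lines in OUTPUT and stores
# the trailing line in UPGRADE_KEY. Assumed layout: N pairs plus one key line.
# Returns non-zero if the layout does not match or HANDLER fails.
for_each_upgrade_tx() {
  _handler=$1
  _output=$2

  # Split OUTPUT on newlines only, with globbing disabled, into "$@".
  # Empty lines are dropped because newline is IFS whitespace.
  _old_ifs=$IFS
  IFS='
'
  set -f
  set -- $_output
  set +f
  IFS=$_old_ifs

  # Expect at least one pair plus the key, so an odd count of 3 or more.
  if [ "$#" -lt 3 ] || [ $(( $# % 2 )) -ne 1 ]; then
    echo "unexpected upgrade output: $# lines" >&2
    return 1
  fi

  while [ "$#" -gt 1 ]; do
    "$_handler" "$1" "$2" || return 1
    shift 2
  done
  UPGRADE_KEY=$1
}
```

With this shape, a restore transaction that appears in some core versions is just one more pair, and anything that is not "pairs plus one key" is reported as an error instead of silently misaligning the `read` calls. The handler would be whatever submits and confirms a single transaction.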
@sisuresh and I have noticed some recent flaky build failures that may be related to this brittleness:
What would you like to see?
Replace the brittle shell-based transaction submission and confirmation logic with something more robust. This could be part of a small Rust CLI tool (#906) that handles transaction submission and confirmation directly, or another approach that avoids relying on polling global metrics and parsing stdout line counts.
What alternatives are there?
- Improve the shell script: Add retries, check transaction results directly via the `/tx` endpoint response, and make the output parsing more resilient. This improves reliability but still leaves the fundamental brittleness of doing this in bash.
- Use stellar-cli: Ship `stellar-cli` with quickstart and use it for transaction submission. Downside is that `stellar-cli` is further downstream and harder to keep in sync with unreleased stellar-core changes.
- Build into a small Rust CLI: As proposed in "Add a small Rust CLI tool to the quickstart image for non-trivial startup logic" (#906), a minimal Rust tool could handle this logic more robustly with proper error handling and transaction result checking.
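As a rough sketch of the first alternative, the `status` field that the script currently only echoes could drive a bounded retry. The status values handled here (`PENDING`, `DUPLICATE`, `TRY_AGAIN_LATER`, with everything else treated as failure) are my understanding of what core's `/tx` endpoint returns and should be checked against the stellar-core version in use. `tx_status_action`, `submit_with_retry`, and `submit_tx` are hypothetical helper names; `submit_tx` wraps the same curl and jq call the script already makes.

```shell
# Map a /tx status string to an action. Status names are an assumption.
tx_status_action() {
  case "$1" in
    PENDING|DUPLICATE) echo accept ;;
    TRY_AGAIN_LATER)   echo retry ;;
    *)                 echo fail ;;
  esac
}

# submit_with_retry SUBMIT_CMD TX [MAX_ATTEMPTS] [DELAY_SECONDS]
# SUBMIT_CMD must submit TX and print the status field on stdout.
submit_with_retry() {
  _submit=$1
  _tx=$2
  _max=${3:-5}
  _delay=${4:-1}
  _attempt=1
  while [ "$_attempt" -le "$_max" ]; do
    _status=$("$_submit" "$_tx") || _status=""
    case "$(tx_status_action "$_status")" in
      accept) return 0 ;;
      fail)
        echo "tx rejected with status: ${_status:-none}" >&2
        return 1
        ;;
    esac
    _attempt=$((_attempt + 1))
    sleep "$_delay"
  done
  echo "tx not accepted after $_max attempts" >&2
  return 1
}

# Example submitter using the same request as the current script.
submit_tx() {
  curl -sG 'http://localhost:11626/tx' --data-urlencode "blob=$1" | jq -r '.status'
}
```

A call such as `submit_with_retry submit_tx "$tx" 5 1` would then fail fast on a rejected transaction instead of waiting forever on the counter. Note that an accepted status only means core queued the transaction, so this does not replace confirming that the specific transaction was applied, which is the part that remains awkward in bash.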