Network settings config upgrade in start script is brittle #907

### What problem does your feature solve?

The network settings config upgrade logic in the `start` script is brittle. The [`upgrade_soroban_config` function](https://github.com/stellar/quickstart/blob/6357b286e43c856c68b2b6b7690f8a520d1b7f5d/start#L393-L437) uses `stellar-core get-settings-upgrade-txs` to generate transactions, submits them via `curl` to core's HTTP endpoint, and confirms they were applied by polling the global `ledger.transaction.count` metric.

For example:

https://github.com/stellar/quickstart/blob/6357b286e43c856c68b2b6b7690f8a520d1b7f5d/start#L672-L718

The script reads transactions and transaction IDs from stdout line-by-line, submits each via `curl`, then waits for the global transaction count metric to increment:

```sh
while [ "`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`" != "$TX_COUNT" ]; do sleep 1; done
```

This is brittle in several ways:

- **Transaction confirmation by global counter:** The loop only checks that the total transaction count reached an expected value; it never verifies that the specific transaction succeeded. The `.status` returned by `/tx` is echoed but not checked. If a failed transaction is still counted, the script proceeds as if it succeeded, and because the loop compares with `!=`, any other transaction that pushes the count past the expected value leaves the loop waiting forever.
- **Output format coupling:** The script checks `if [ $line_count = 9 ]` (as opposed to 7 lines) to detect whether a restore operation is included in the output, which couples it tightly to the exact output format of `stellar-core get-settings-upgrade-txs`, and that format can change between versions.
- **Pipe-based parsing of stdout:** The entire flow reads tx blobs and tx IDs via `read` from a piped subshell, which is fragile and hard to debug when something goes wrong.
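
To illustrate the output-format point, here is a rough sketch of parsing the output by structure instead of by exact line count. It assumes what the current script already assumes: alternating transaction blob and transaction ID lines, followed by one final line holding the upgrade set key. `parse_upgrade_output` is a hypothetical helper name, and this has not been run against real `stellar-core` output.

```sh
# Hypothetical helper: turn get-settings-upgrade-txs output into tagged records.
# Assumed input: N pairs of lines (tx blob, then tx id) followed by one key line.
# Output: one "tx<TAB>txid<TAB>blob" record per pair, then "key<TAB>value".
parse_upgrade_output() {
  awk '
    { lines[NR] = $0 }
    END {
      # Any odd count of at least 3 lines is accepted, so 7 and 9 both work.
      if (NR < 3 || NR % 2 == 0) {
        print "unexpected line count: " NR > "/dev/stderr"
        exit 1
      }
      for (i = 1; i < NR; i += 2) {
        printf "tx\t%s\t%s\n", lines[i + 1], lines[i]
      }
      printf "key\t%s\n", lines[NR]
    }
  '
}
```

With this shape, the number of transactions no longer matters to the caller, and an unexpected format fails with an error instead of silently misaligning the `read` calls.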

@sisuresh and I have noticed some recent flaky build failures that may be related to this brittleness:
- https://github.com/stellar/quickstart/actions/runs/21943547186/job/63375620508#step:15:147

Related: #906, #555

### What would you like to see?

Replace the brittle shell-based transaction submission and confirmation logic with something more robust. This could be part of a small Rust CLI tool (#906) that handles transaction submission and confirmation directly, or another approach that avoids relying on polling global metrics and parsing stdout line counts.
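
Whichever tool ends up owning this, one property the replacement should probably have is a bounded wait, so a stalled upgrade fails loudly instead of hanging startup. As a rough sketch in shell, where `wait_until` is a hypothetical helper and not something in the `start` script today:

```sh
# Hypothetical helper: poll a command until it succeeds or a deadline passes.
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
  timeout="$1"
  shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

The elapsed time is approximate because it counts sleeps, not wall-clock time, which seems acceptable for a startup check.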

### What alternatives are there?

- **Improve the shell script:** Add retries, check transaction results directly via the `/tx` endpoint response, and make the output parsing more resilient. This improves reliability but still leaves the fundamental brittleness of doing this in bash.
- **Use stellar-cli:** Ship `stellar-cli` with quickstart and use it for transaction submission. The downside is that `stellar-cli` is further downstream and harder to keep in sync with unreleased stellar-core changes.
- **Build into a small Rust CLI:** As proposed in #906, a minimal Rust tool could handle this logic more robustly with proper error handling and transaction result checking.
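
For the first alternative, a minimal sketch of checking the `/tx` response instead of only echoing it. The status values handled here (`PENDING`, `DUPLICATE`, `TRY_AGAIN_LATER`, anything else treated as failure) reflect my understanding of core's `/tx` response and should be verified against the stellar-core version in use. `tx_status_action` and `submit_tx` are hypothetical names, and the retry limit of 10 is arbitrary.

```sh
# Hypothetical helper: map the .status field from core's /tx response to an action.
# Assumption: PENDING and DUPLICATE both mean the tx is in core's queue.
tx_status_action() {
  case "$1" in
    PENDING|DUPLICATE) echo accept ;;
    TRY_AGAIN_LATER)   echo retry ;;
    *)                 echo fail ;;
  esac
}

# Hypothetical helper: submit one tx blob, retrying a bounded number of times.
# Uses the same endpoint and jq filter as the existing script.
submit_tx() {
  blob="$1"
  attempt=0
  while [ "$attempt" -lt 10 ]; do
    status="$(curl -sG 'http://localhost:11626/tx' --data-urlencode "blob=$blob" | jq -r '.status')"
    case "$(tx_status_action "$status")" in
      accept) return 0 ;;
      retry)  attempt=$((attempt + 1)); sleep 1 ;;
      *)      echo "tx rejected by core: status=$status" >&2; return 1 ;;
    esac
  done
  echo "tx not accepted after $attempt attempts" >&2
  return 1
}
```

Acceptance into the queue is not the same as the transaction being applied. Confirming the result of the specific transaction would still need a separate check, which this sketch does not attempt.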

The snippet referenced above, from the `start` script:

```sh
upgrade_output="$(echo $NETWORK_ROOT_SECRET_KEY \
  | stellar-core get-settings-upgrade-txs \
    "$NETWORK_ROOT_ACCOUNT_ID" \
    "$seq_num" \
    "$NETWORK_PASSPHRASE" \
    --xdr `stellar-xdr encode --type ConfigUpgradeSet < "$config_file_path"` \
    --signtxs)"

let line_count=$(echo "$upgrade_output" | wc -l)

echo "$upgrade_output" | { \
  TX_COUNT="`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`"
  TX_COUNT=$((TX_COUNT+1))
  # If the line count is 9 instead of 7, a version of core is being used where the restore op is being returned
  if [ $line_count = 9 ] ; then
    read tx;
    read txid;
    echo "upgrades: soroban config: restore contract: $txid .. $(curl -sG 'http://localhost:11626/tx' --data-urlencode "blob=$tx" | jq -r '.status')";
    while [ "`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`" != "$TX_COUNT" ]; do sleep 1; done
    TX_COUNT=$((TX_COUNT+1))
  fi
  read tx; \
  read txid; \
  echo "upgrades: soroban config: install contract: $txid .. $(curl -sG 'http://localhost:11626/tx' --data-urlencode "blob=$tx" | jq -r '.status')"; \
  while [ "`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`" != "$TX_COUNT" ]; do sleep 1; done
  TX_COUNT=$((TX_COUNT+1)); \
  read tx; \
  read txid; \
  echo "upgrades: soroban config: deploy contract: $txid .. $(curl -sG 'http://localhost:11626/tx' --data-urlencode "blob=$tx" | jq -r '.status')"; \
  while [ "`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`" != "$TX_COUNT" ]; do sleep 1; done
  TX_COUNT=$((TX_COUNT+1)); \
  read tx; \
  read txid; \
  echo "upgrades: soroban config: upload config: $txid .. $(curl -sG 'http://localhost:11626/tx' --data-urlencode "blob=$tx" | jq -r '.status')"; \
  while [ "`curl -s http://localhost:11626/metrics | jq -r '.metrics."ledger.transaction.count".count'`" != "$TX_COUNT" ]; do sleep 1; done
  TX_COUNT=$((TX_COUNT+1)); \
  read key; \
  echo "upgrades: soroban config: set config with key: $key";
  OUTPUT="$(curl -sG 'http://localhost:11626/upgrades?mode=set&upgradetime=1970-01-01T00:00:00Z' --data-urlencode "configupgradesetkey=$key")"
  echo "$OUTPUT"; \

  if [ "$OUTPUT" == "Error setting configUpgradeSet" ]; then
    echo "!!!!! Unable to upgrade Soroban Config Settings. Stopping all services. !!!!!"
    kill_supervisor
  fi
}
echo "upgrades: soroban config done"
```
