docs: document betweenness sampling config#380
Conversation
Co-Authored-By: Oz <oz-agent@warp.dev>
📝 WalkthroughWalkthroughAdds two new sampling configuration options— Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
.wordlist.txt (1)
541-542: Reverse the order of these two entries.The entries are not in alphabetical order.
samplingSeedshould come beforesamplingSize(alphabetically 'e' comes before 'i').📝 Proposed fix
-samplingSize samplingSeed +samplingSize🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.wordlist.txt around lines 541 - 542, The two wordlist entries are out of alphabetical order: swap the positions of the tokens "samplingSize" and "samplingSeed" so that "samplingSeed" appears before "samplingSize" in the .wordlist.txt file; locate the two entries (the strings samplingSize and samplingSeed) and reverse their order.algorithms/betweenness-centrality.md (1)
100-102: LGTM! Clear and helpful usage notes.The usage notes effectively communicate:
- Performance considerations
- The accuracy/performance trade-off
- Reproducibility requirements
Consider adding a second example demonstrating the sampling parameters:
// Example: Using sampling for better performance on large graphs CALL algo.betweenness({ 'nodeLabels': ['Person'], 'relationshipTypes': ['FRIEND'], 'samplingSize': 16, 'samplingSeed': 42 }) YIELD node, score RETURN node.name AS person, score ORDER BY score DESCThis would help users understand how to apply the sampling configuration in practice.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@algorithms/betweenness-centrality.md` around lines 100 - 102, Add a second usage example showing how to enable sampling for performance: include a short cypher example that calls algo.betweenness with samplingSize and samplingSeed alongside nodeLabels and relationshipTypes so readers can see practical usage of the sampling parameters (use the same parameter names samplingSize and samplingSeed and demonstrate reproducibility by setting a fixed seed).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@algorithms/betweenness-centrality.md`:
- Around line 39-44: Update the parameter table entry for samplingSeed to show
the actual default value instead of the word "time-based": change the Default
cell to `0` (or `null` if code uses null) and add a parenthetical note such as
"`0` triggers time-based seeding" so readers know the concrete default; ensure
the description for samplingSeed remains consistent with this change (it already
mentions "If omitted (or set to `0`)") and reference the samplingSeed parameter
name so the table and description align.
---
Nitpick comments:
In @.wordlist.txt:
- Around line 541-542: The two wordlist entries are out of alphabetical order:
swap the positions of the tokens "samplingSize" and "samplingSeed" so that
"samplingSeed" appears before "samplingSize" in the .wordlist.txt file; locate
the two entries (the strings samplingSize and samplingSeed) and reverse their
order.
In `@algorithms/betweenness-centrality.md`:
- Around line 100-102: Add a second usage example showing how to enable sampling
for performance: include a short cypher example that calls algo.betweenness with
samplingSize and samplingSeed alongside nodeLabels and relationshipTypes so
readers can see practical usage of the sampling parameters (use the same
parameter names samplingSize and samplingSeed and demonstrate reproducibility by
setting a fixed seed).
algorithms/betweenness-centrality.md
Outdated
| |---------------------|---------|-------------|---------| | ||
| | `nodeLabels` | Array | *(Optional)* List of node labels to include in the computation. | `[]` (all labels) | | ||
| | `relationshipTypes` | Array | *(Optional)* List of relationship types to traverse. | `[]` (all relationship types) | | ||
| | `samplingSize` | Integer | *(Optional)* Number of randomly sampled **source nodes** used to approximate betweenness centrality. Larger values improve accuracy but increase runtime. The value is clamped to the number of reachable nodes in the graph. | `32` | |
There was a problem hiding this comment.
Please rephrase: "The value is clamped to the number of reachable nodes in the graph."
It's a bit unclear, does that mean the max value equals to the number of nodes in the graph ?
Co-Authored-By: Oz <oz-agent@warp.dev>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
algorithms/betweenness-centrality.md (1)
44-44:⚠️ Potential issue | 🟡 MinorUse a concrete default value for
samplingSeedin the table.
time-baseddescribes behavior, not the actual default input value. Since the description already states “omitted (or set to0)”, set the Default cell to0(ornull, if that is the real default) and keep the time-based behavior in the description.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@algorithms/betweenness-centrality.md` at line 44, Update the table row for the parameter samplingSeed so the Default column contains a concrete value (`0`) instead of the phrase "time-based"; keep the explanatory text in the Description (which already notes "omitted (or set to `0`), a time-based seed is used") so behavior is preserved while the Default cell shows the actual input value consumers should use.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@algorithms/betweenness-centrality.md`:
- Line 44: Update the table row for the parameter samplingSeed so the Default
column contains a concrete value (`0`) instead of the phrase "time-based"; keep
the explanatory text in the Description (which already notes "omitted (or set to
`0`), a time-based seed is used") so behavior is preserved while the Default
cell shows the actual input value consumers should use.
algorithms/betweenness-centrality.md
Outdated
| | `nodeLabels` | Array | *(Optional)* List of node labels to include in the computation. | `[]` (all labels) | | ||
| | `relationshipTypes` | Array | *(Optional)* List of relationship types to traverse. | `[]` (all relationship types) | | ||
| | `samplingSize` | Integer | *(Optional)* Number of randomly sampled **source nodes** used to approximate betweenness centrality. Larger values improve accuracy but increase runtime. The value is clamped to the number of reachable nodes in the graph. | `32` | | ||
| | `samplingSize` | Integer | *(Optional)* Number of randomly sampled **source nodes** used to approximate betweenness centrality. Larger values improve accuracy but increase runtime. For similar accuracy across different graph sizes, keep `samplingSize` proportional to the number of input nodes (i.e., sample a similar fraction of nodes). If `samplingSize` is larger than the number of eligible nodes, all eligible nodes are used. | `32` | |
There was a problem hiding this comment.
this is too long, no need to go into so much details, keep this simple.
Co-Authored-By: Oz <oz-agent@warp.dev>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@algorithms/betweenness-centrality.md`:
- Around line 100-102: Clarify that when samplingSize is used the algorithm is
approximate: update the three bullets to state that with the default
samplingSize=32 the betweenness score is an approximation (not exact
all-shortest-paths), and add a note that to obtain exact betweenness you must
disable sampling (i.e., set samplingSize to disable sampling / full mode) so the
implementation computes all shortest paths; also keep the existing guidance
about samplingSeed for reproducibility. Reference samplingSize and samplingSeed
and the “all shortest paths” claim so the wording reconciles both behaviors.
| - Betweenness Centrality can be computationally expensive on large, dense graphs. | ||
| - Use `samplingSize` to trade accuracy for performance (larger samples are slower but usually more accurate). | ||
| - Set `samplingSeed` to a fixed value to make runs reproducible; if you omit it, results may vary between runs due to random sampling. |
There was a problem hiding this comment.
Clarify that sampling produces an approximation by default.
With default samplingSize=32, results are typically approximate, so the usage notes should explicitly qualify this to avoid conflicting with the earlier “all shortest paths” statement (Line 98).
✏️ Suggested wording
-- Betweenness Centrality can be computationally expensive on large, dense graphs.
-- Use `samplingSize` to trade accuracy for performance (larger samples are slower but usually more accurate).
-- Set `samplingSeed` to a fixed value to make runs reproducible; if you omit it, results may vary between runs due to random sampling.
+- Betweenness Centrality can be computationally expensive on large, dense graphs.
+- By default, scores are estimated using sampled source nodes (`samplingSize=32`); increase `samplingSize` for higher fidelity at higher cost.
+- Set `samplingSeed` to a fixed value to make sampled runs reproducible; if omitted (or `0`), a time-based seed is used.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@algorithms/betweenness-centrality.md` around lines 100 - 102, Clarify that
when samplingSize is used the algorithm is approximate: update the three bullets
to state that with the default samplingSize=32 the betweenness score is an
approximation (not exact all-shortest-paths), and add a note that to obtain
exact betweenness you must disable sampling (i.e., set samplingSize to disable
sampling / full mode) so the implementation computes all shortest paths; also
keep the existing guidance about samplingSeed for reproducibility. Reference
samplingSize and samplingSeed and the “all shortest paths” claim so the wording
reconciles both behaviors.
Documents Betweenness Centrality config options:
samplingSizeandsamplingSeedto the parameters section (including defaults and behavior).Co-Authored-By: Oz oz-agent@warp.dev
Summary by CodeRabbit
New Features
Documentation
Chores