This repository was archived by the owner on Oct 30, 2023. It is now read-only.
Closed
heslami
approved these changes
Sep 19, 2017
@@ -792,7 +792,6 @@ private static boolean calculateRegression(double[] coefficient,
LOG.warn("There are " + coefficient.length +
We should make this a LOG.info. We could also remove this if block entirely, since it doesn't add much information to the log. If we keep the if, we should also remove the "but" from the log line :-)
majakabiljo
approved these changes
Sep 21, 2017
majakabiljo
left a comment
+1
What does it mean that columns are invalid?
Contributor
Author
These columns correspond to the different variables in the linear regression model, such as the number of edges read so far, the number of vertices computed, etc. A column is invalid if, for example, all samples have a value of zero for it (e.g. no vertices have been computed yet). Another case is a linear dependency between two columns, so you can't run the regression.
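To make the two cases concrete, here is a minimal sketch of how such a check could look. It is not the code in MemoryEstimatorOracle: the class `ColumnValidity`, its method names, and the tolerance `EPS` are all hypothetical, and the dependency test only covers the pairwise case where one column is a scalar multiple of another.

```java
final class ColumnValidity {
  /** Relative tolerance for the dependency check (an assumed value). */
  private static final double EPS = 1e-9;

  /** Extracts column j from a row-major sample matrix. */
  static double[] column(double[][] samples, int j) {
    double[] col = new double[samples.length];
    for (int i = 0; i < samples.length; i++) {
      col[i] = samples[i][j];
    }
    return col;
  }

  /** Dot product of two equally sized vectors. */
  static double dot(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** True if the two non-zero columns are scalar multiples of each other. */
  static boolean dependent(double[] a, double[] b) {
    double aa = dot(a, a);
    double bb = dot(b, b);
    double ab = dot(a, b);
    // Cauchy-Schwarz holds with equality exactly when a and b are parallel.
    return Math.abs(ab * ab - aa * bb) <= EPS * aa * bb;
  }

  /**
   * Marks a column invalid if it is all zeros, or if it is a scalar
   * multiple of an earlier column that is still valid.
   */
  static boolean[] validColumns(double[][] samples, int numColumns) {
    boolean[] valid = new boolean[numColumns];
    for (int j = 0; j < numColumns; j++) {
      double[] col = column(samples, j);
      valid[j] = dot(col, col) > 0; // an all-zero column carries no information
      for (int k = 0; k < j && valid[j]; k++) {
        if (valid[k] && dependent(column(samples, k), col)) {
          valid[j] = false; // keep the earlier column, drop this one
        }
      }
    }
    return valid;
  }
}
```

With samples `{1,0,2,5}`, `{2,0,4,1}`, `{3,0,6,7}`, the second column is all zeros and the third is twice the first, so only the first and fourth columns would be kept under these assumptions.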
MemoryEstimatorOracle.calculateRegression() exits if the number of valid columns to use for the regression is not the same as the total number of columns. This is wrong: the regression can still run on only the valid columns. As a result, memory estimation is never used in practice, and OOC starts spilling only when memory usage gets very high.
This is fixed in #34 too, but I want to make these changes one-by-one so that we can test in isolation.
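For illustration, the intended behavior can be sketched as below. This is not the patched Giraph code: the class `ValidColumnRegression`, the method `fitOnValidColumns`, its parameters, and the singularity threshold are hypothetical. It also assumes that coefficients of invalid columns should simply be left at zero, and it solves the normal equations directly instead of using a regression library.

```java
final class ValidColumnRegression {
  /**
   * Fits y ~ x using only the columns flagged as valid. Coefficients of
   * invalid columns are set to zero (an assumption of this sketch).
   * Returns false only if no fit is possible on the valid columns.
   */
  static boolean fitOnValidColumns(double[] coefficient, boolean[] valid,
      double[][] x, double[] y) {
    int n = 0;
    for (boolean v : valid) {
      if (v) {
        n++;
      }
    }
    // index[k] is the original position of the k-th valid column.
    int[] index = new int[n];
    for (int j = 0, k = 0; j < valid.length; j++) {
      if (valid[j]) {
        index[k++] = j;
      }
    }
    java.util.Arrays.fill(coefficient, 0);
    // Do not bail out when n < valid.length; only when there is nothing to fit.
    if (n == 0 || x.length < n) {
      return false;
    }
    // Augmented normal equations: (X^T X | X^T y) over valid columns only.
    double[][] a = new double[n][n + 1];
    for (int r = 0; r < n; r++) {
      for (int i = 0; i < x.length; i++) {
        for (int c = 0; c < n; c++) {
          a[r][c] += x[i][index[r]] * x[i][index[c]];
        }
        a[r][n] += x[i][index[r]] * y[i];
      }
    }
    // Gaussian elimination with partial pivoting.
    for (int p = 0; p < n; p++) {
      int best = p;
      for (int r = p + 1; r < n; r++) {
        if (Math.abs(a[r][p]) > Math.abs(a[best][p])) {
          best = r;
        }
      }
      double[] tmp = a[p];
      a[p] = a[best];
      a[best] = tmp;
      if (Math.abs(a[p][p]) < 1e-12) {
        return false; // remaining columns are still linearly dependent
      }
      for (int r = p + 1; r < n; r++) {
        double f = a[r][p] / a[p][p];
        for (int c = p; c <= n; c++) {
          a[r][c] -= f * a[p][c];
        }
      }
    }
    // Back substitution, writing results to the original column positions.
    for (int r = n - 1; r >= 0; r--) {
      double s = a[r][n];
      for (int c = r + 1; c < n; c++) {
        s -= a[r][c] * coefficient[index[c]];
      }
      coefficient[index[r]] = s / a[r][r];
    }
    return true;
  }
}
```

The point of the sketch is the early-exit condition: having fewer valid columns than total columns is not treated as a failure, so an estimate is still produced from whatever columns are usable.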
Tests:
JIRA: https://issues.apache.org/jira/browse/GIRAPH-1160