the most common case when parsing elements from the streaming API is a tweet, so it should come first in the if/else chain rather than last; that way fewer checks are performed in most scenarios. this is a small difference, but it is highly visible when streaming at high throughput.
add support for parsing `delimited: length` from Twitter, and perform fewer redundant comparisons in all cases.
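As a rough illustration of the branch reordering described in the first commit message, here is a minimal sketch that checks for the tweet case before any rarer message type. The function name `classifyMessage` is hypothetical and is not taken from this pull request, and the field names (`text`, `id_str`, `delete`, `limit`, `scrub_geo`) are my assumption about what streaming messages look like.

```javascript
'use strict';

// Hypothetical helper, not the code from this pull request.
// Assumption: tweets carry `text` and `id_str`, while notices are objects
// with a single top-level key such as `delete`, `limit` or `scrub_geo`.
function classifyMessage(msg) {
  // Most common case first, so the typical message costs one check.
  if (msg.text !== undefined && msg.id_str !== undefined) {
    return 'tweet';
  }
  // Rarer cases follow.
  if (msg.delete !== undefined) {
    return 'delete';
  }
  if (msg.limit !== undefined) {
    return 'limit';
  }
  if (msg.scrub_geo !== undefined) {
    return 'scrub_geo';
  }
  return 'unknown';
}
```

With the tweet check last, every tweet would pay for all of the preceding comparisons, which adds up at around 1000 messages per second.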
By the way, I suspect there is still a lot of room to improve streaming performance further. For example, using an actual buffer parser such as node-strtok could make a massive difference over my hacky method of using a local string variable that has to be constantly reassigned. In addition, some profiling and a search for unnecessary operations in hot code paths could go a long way. I'd love to encourage others who know JavaScript better than I do to look into this!
I made a few changes to the way streaming parsing is done to improve performance and throughput.
Here are some ad-hoc benchmarks from my workstation, consuming the streaming API with a query that generates ~1000 tweets per second for 35 seconds.
Before:
After:
In addition, the new streaming parser adds support for sending the `delimited: length` parameter to the Twitter API streaming endpoint, which causes it to prefix each tweet object with its length in bytes. This enables more efficient parsing since you don't have to scan ahead for an EOL.

Using `{delimited: length}`:

I rarely use JavaScript or Node.js, so I would strongly suggest a code review on this before merging.
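
To make the length-prefixed idea concrete, here is a minimal sketch of a parser for such a stream. It is not the code in this pull request. The name `createLengthDelimitedParser` is hypothetical, and the wire format is an assumption on my part: a decimal byte count, then `\r\n`, then exactly that many bytes of JSON, with blank keep-alive lines possibly appearing between messages.

```javascript
'use strict';

// Hypothetical helper, not the parser from this pull request.
// Assumed framing: "<byte length>\r\n<payload of that many bytes>",
// optionally separated by blank "\r\n" keep-alive lines.
function createLengthDelimitedParser(onMessage) {
  let pending = Buffer.alloc(0); // bytes received but not yet consumed

  // `chunk` must be a Buffer (do not call setEncoding on the stream),
  // because the prefix counts bytes, not characters.
  return function feed(chunk) {
    pending = pending.length === 0 ? chunk : Buffer.concat([pending, chunk]);
    let offset = 0;

    while (offset < pending.length) {
      const eol = pending.indexOf('\r\n', offset);
      if (eol === -1) break; // length line is still incomplete

      const header = pending.toString('ascii', offset, eol).trim();
      if (header === '') {
        offset = eol + 2; // blank keep-alive line, skip it
        continue;
      }

      const length = parseInt(header, 10);
      if (Number.isNaN(length)) {
        throw new Error('Expected a length prefix, got: ' + header);
      }

      const start = eol + 2;
      const end = start + length;
      if (end > pending.length) break; // payload not fully received yet

      // The payload boundaries are known, so no scan for an EOL is needed.
      onMessage(JSON.parse(pending.toString('utf8', start, end)));
      offset = end;
    }

    // Keep only the unconsumed tail for the next chunk.
    pending = pending.subarray(offset);
  };
}
```

A caller would pass each `data` chunk from the HTTP response to the returned `feed` function. Working on a `Buffer` rather than a string also avoids splitting multi-byte characters at chunk boundaries.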