Description
The current processing implementation is too slow, taking ~30 mins per keyword, with the embedding step being the bottleneck. Processing time has increased because we are now collecting more news articles, ~8,000 per day (still from newsapi.org, not from Aylien).
To make this problem tractable on my machine, I have limited the number of articles for each keyword to 500. These 500 articles produce a ~15,000x15,000 word matrix.
To try to manage this, I have tweaked the Fame SVD to calculate only the first d eigenvectors. Another option is to process articles as bipartite networks with words<->sentences. This slightly changes the eigenvectors we get back, but because the number of sentences is about a third of the number of words, the processing time is roughly halved. A sketch of both options is below.
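
A minimal sketch of the two options, assuming the word matrix and a words-by-sentences incidence matrix are available as SciPy sparse matrices; the function names, `d` default, and scaling of the embedding are illustrative assumptions, not the repo's actual code:

```python
# Hypothetical sketch, not the project's implementation.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def top_d_embedding(word_matrix: sp.csr_matrix, d: int = 100) -> np.ndarray:
    """Compute only the first d singular vectors instead of the full SVD."""
    u, s, _ = svds(word_matrix, k=d)  # d largest singular triplets
    return u * s  # d-dimensional embedding per word

def bipartite_embedding(word_sentence: sp.csr_matrix, d: int = 100) -> np.ndarray:
    """Same idea on the words x sentences incidence matrix.

    With roughly 3x fewer sentences than words, the matrix is much
    smaller, so the truncated SVD runs in about half the time, at the
    cost of slightly different eigenvectors.
    """
    u, s, _ = svds(word_sentence, k=d)
    return u * s
```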