Speeding up embedding #4

@StirlingSmith

Description

The current processing implementation is too slow, taking ~30 minutes per keyword, with the embedding step as the bottleneck. Processing time has increased because we now collect more news articles, ~8,000 per day (still from newsapi.org, not from aylien).

To make this problem tractable on my machine, I have limited the number of articles per keyword to 500. Those 500 articles produce a ~15,000 x 15,000 word matrix.

To manage this, I have tweaked the Fame SVD to compute only the first d eigenvectors. Another option is to process articles as bipartite word <-> sentence networks. This slightly changes the eigenvectors we get back, but because the number of sentences is roughly a third of the number of words, processing time is roughly halved.
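
For reference, here is a minimal sketch of the truncated-decomposition idea using SciPy's sparse `svds`; this is not the project's actual code, and the matrix sizes, density, `d` value, and variable names are illustrative stand-ins:

```python
# Sketch: compute only the first d singular vectors of a large sparse matrix
# instead of a full decomposition. All names and sizes below are illustrative.
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

d = 50  # number of leading components to keep (assumed embedding dimension)

# Stand-in for the ~15,000 x 15,000 word matrix built from 500 articles.
word_matrix = sparse_random(15_000, 15_000, density=1e-5, format="csr")

# Truncated SVD: cost scales with d rather than the full matrix dimension.
u, s, vt = svds(word_matrix, k=d)
word_embeddings = u * s  # one d-dimensional vector per word

# Bipartite alternative: a word x sentence incidence matrix has roughly a
# third as many columns (~5,000 here), so the decomposition is cheaper,
# at the cost of slightly different eigenvectors.
word_sentence_matrix = sparse_random(15_000, 5_000, density=1e-5, format="csr")
u_b, s_b, vt_b = svds(word_sentence_matrix, k=d)
bipartite_word_embeddings = u_b * s_b
```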
