Speeding up embedding #4

@StirlingSmith

Description

The current processing implementation is too slow, taking ~30 minutes per keyword, with the embedding step as the bottleneck. Processing time has increased because we now collect more news articles, ~8,000 per day (still from newsapi.org, not from aylien).

To make this problem tractable on my machine, I have limited the number of articles per keyword to 500. Those 500 articles produce a ~15,000 x 15,000 word matrix.

To manage this, I have tweaked the Fame SVD to compute only the first d eigenvectors. Another option is to process articles as bipartite word <-> sentence networks. This slightly changes the eigenvectors we get back, but because the number of sentences is roughly a third of the number of words, processing time is roughly halved.
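
For reference, here is a minimal sketch of the truncated-decomposition idea using SciPy's sparse `svds`; this is not the project's actual code, and the matrix sizes, density, `d` value, and variable names are illustrative stand-ins:

```python
# Sketch: compute only the first d singular vectors of a large sparse matrix
# instead of a full decomposition. All names and sizes below are illustrative.
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

d = 50  # number of leading components to keep (assumed embedding dimension)

# Stand-in for the ~15,000 x 15,000 word matrix built from 500 articles.
word_matrix = sparse_random(15_000, 15_000, density=1e-5, format="csr")

# Truncated SVD: cost scales with d rather than the full matrix dimension.
u, s, vt = svds(word_matrix, k=d)
word_embeddings = u * s  # one d-dimensional vector per word

# Bipartite alternative: a word x sentence incidence matrix has roughly a
# third as many columns (~5,000 here), so the decomposition is cheaper,
# at the cost of slightly different eigenvectors.
word_sentence_matrix = sparse_random(15_000, 5_000, density=1e-5, format="csr")
u_b, s_b, vt_b = svds(word_sentence_matrix, k=d)
bipartite_word_embeddings = u_b * s_b
```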
