# @param [RSolr::Client] connection
# @param [ResourceFactory] resource_factory
def initialize(connection:, resource_factory:, start:, batch_size:, except_models:)
  Valkyrie.logger.warn("You are trying to query from Solr in batches larger than 1_000, this may cause issues for large Solr documents") if batch_size > 1_000
Question: Is this problem because of Solr or because of Valkyrie? If it's because of Solr, I wonder if we can separate the paging from the batch size somehow? Might not be too important, I'm not sure when I'd actually use this query for Solr...
Ah, yeah, good point. Let me think on how to do that.
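One way the paging could be separated from the batch size, as a rough sketch only: fetch fixed-size pages from Solr and re-slice them into whatever batch size the caller asked for. The names `batches_from_pages`, `fetch_page`, and `page_size` are hypothetical and not part of this PR or of Valkyrie; `fetch_page` stands in for whatever actually performs the Solr request.

```ruby
# Hypothetical helper, not part of Valkyrie: yields arrays of `batch_size`
# documents while never requesting more than `page_size` rows per fetch.
# `fetch_page` is any callable taking (page_number, per_page) and returning
# an array of documents for that page.
def batches_from_pages(fetch_page:, batch_size:, page_size: 1_000)
  unless block_given?
    return enum_for(:batches_from_pages, fetch_page: fetch_page,
                                         batch_size: batch_size,
                                         page_size: page_size)
  end

  buffer = []
  page = 1
  loop do
    docs = fetch_page.call(page, page_size)
    buffer.concat(docs)
    # Emit as many full batches as the buffer currently holds.
    yield buffer.shift(batch_size) while buffer.size >= batch_size
    # A short (or empty) page means the result set is exhausted.
    break if docs.size < page_size
    page += 1
  end
  # Emit whatever is left as a final, smaller batch.
  yield buffer unless buffer.empty?
end
```

With something like this, the warning about batches larger than 1_000 would apply only to `page_size`, and callers could pass any `batch_size` they like.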
def run
  docs = Paginator.new(start: start, batch_size: batch_size)
  while docs.has_next?
    docs = connection.paginate(docs.next_page, docs.per_page, "select", params: { q: query })["response"]["docs"]
If there's no sort parameter I think this might return inconsistently, especially between replicas. In Postgres it works because I'm pretty sure AR's handling those internals.
I think we could either get all the IDs at once and resolve them to full documents, or add a sort param. Or maybe this works, I'm really not sure, I might just be thinking about SolrCloud edge cases.
I think you're right about inconsistent performance - I was just worried about the performance implications of sorting, but maybe I should do some benchmarking to find out how big of a hit it makes.
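For reference, a minimal sketch of the "add a sort param" option. It assumes every document has a unique `id` field to sort on and that `connection` responds to `paginate(page, per_page, path, opts)` the way `RSolr::Client` does. The helper name `each_solr_batch` and its keyword arguments are hypothetical, not what this PR implements.

```ruby
# Hypothetical helper: pages through Solr results with an explicit sort so
# that page boundaries are the same on every request.
# Assumes `id` is unique per document; sorting on a non-unique field would
# still leave ties in an undefined order.
def each_solr_batch(connection:, query:, batch_size: 100, sort: "id asc")
  unless block_given?
    return enum_for(:each_solr_batch, connection: connection, query: query,
                                      batch_size: batch_size, sort: sort)
  end

  page = 1
  loop do
    response = connection.paginate(page, batch_size, "select",
                                   params: { q: query, sort: sort })
    docs = response["response"]["docs"]
    break if docs.empty?
    yield docs
    # A short page means there is nothing left to fetch.
    break if docs.size < batch_size
    page += 1
  end
end
```

Usage would look like `each_solr_batch(connection: connection, query: "*:*") { |docs| ... }`. Whether the sort is cheap enough on a large index is exactly what the benchmarking mentioned above would need to show.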
I haven't followed through the use of |
Oldest version of rdf that makes reliance on BigDecimal explicit.
Will not pass tests because other query_services don't have method yet Connected to #985
Will not pass tests because other query_services don't have method yet How can we figure out whether this will kill a "normal" machine's memory for a "normal" solr corpus? Connected to #985
Will not pass tests because other query_services don't have method yet Connected to #985
Connected to #985
…er to get deterministic results
Adds the `#find_in_batches` method to each query service. This allows for more batch processing, especially with Postgres and Solr query services.

Connected to #985