Surprisingly difficult Infinite scroll problem #1738
enoc2222 asked this question in Forums - Q&A (unanswered, 0 replies)
Hi All,
I recently found crawl4ai while trying to do some simple scraping on a classifieds site (https://classifieds.ksl.com).
I am new to scraping and figured this would be a good site to learn and practice the basics on.
After trying Selenium and a Requests/BeautifulSoup approach with no luck (apparently they've implemented some decent scraping/bot blockers), I saw a suggestion on Reddit to try crawl4ai.
I was able to get a basic script up and kind of running pretty quickly and got some initial data, but I am hitting a wall trying to get the complete data set from the search.
The classifieds search results page loads and functions like a classic infinite scroll, adding more listings as you scroll down.
With my script, I can usually get the first 11-20 items, sometimes up to 34 (once I got 80), but I can never get the full search results (which, as of writing, is 254 for the URL in the code below).
I've tried virtual scroll but could only ever get the first 11 items using the selector ".grid". I've also tried scroll_delay values ranging from 0.5 to 10 seconds with no noticeable difference.
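To make the stopping condition concrete, here is a minimal, library-agnostic sketch of a "scroll until the item count stops growing" loop. It is not crawl4ai's API: `scroll_until_stable`, `count_items`, `scroll_once`, and `FakeFeed` are hypothetical names for illustration only, and the fake feed's numbers (11 items up front, 20 more on every second scroll, 254 total) are assumptions chosen to mimic the behavior described above. In a real crawler, the two callables would wrap whatever the browser layer offers for counting listing elements and scrolling.

```python
from typing import Callable


def scroll_until_stable(
    count_items: Callable[[], int],
    scroll_once: Callable[[], None],
    patience: int = 3,
    max_rounds: int = 200,
) -> int:
    """Scroll until the item count fails to grow `patience` times in a row."""
    best = count_items()
    stalled = 0
    for _ in range(max_rounds):
        if stalled >= patience:
            break  # nothing new for `patience` consecutive scrolls
        scroll_once()
        current = count_items()
        if current > best:
            best = current
            stalled = 0  # progress made, so reset the stall counter
        else:
            stalled += 1
    return best


class FakeFeed:
    """Hypothetical stand-in for a page: new listings appear only on every second scroll."""

    def __init__(self, total: int = 254, first_page: int = 11, batch: int = 20) -> None:
        self.total = total
        self.visible = first_page
        self.batch = batch
        self.scrolls = 0

    def count(self) -> int:
        return self.visible

    def scroll(self) -> None:
        self.scrolls += 1
        # Simulate slow loading: only even-numbered scrolls reveal a new batch.
        if self.scrolls % 2 == 0:
            self.visible = min(self.total, self.visible + self.batch)


if __name__ == "__main__":
    impatient = FakeFeed()
    print(scroll_until_stable(impatient.count, impatient.scroll, patience=1))

    patient = FakeFeed()
    print(scroll_until_stable(patient.count, patient.scroll, patience=3))
```

In this toy setup, giving up after a single scroll with no new items returns only the initial batch, while tolerating a few empty scrolls reaches the full total. Whether the real site behaves this way is an assumption; bot blocking or a virtualized list that removes earlier items from the DOM would need a different approach.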
I've been banging my head on this till the wee hours and figured wiser minds may have better insight into whether this is a simple fix or needs a deeper dive.
Here's my current best working code:
Let me know if you have any thoughts or suggestions.
Thanks!