Add webscraping tutorial (OR: arbitrary number of returns from nursery) #1483
Comments
Hi! Glad you're enjoying Trio. Can you try https://github.com/python-trio/trimeter and tell us if that helps? Also, the recommended HTTP client right now is https://www.python-httpx.org/
Give "Synchronizing and communicating between tasks" a read; I'd use a channel to send back the results from the other tasks.
Thanks for the quick responses!
I don't have time to install it from source right now, but this seems like an excellent solution in principle (although the last commit was in Feb '19?)
Thanks for the tip! So
I'm sure the task can be implemented. But that seems (at first glance) like an unreasonably high amount of complexity/effort just to process some requests.
… except when you want to use a websocket …
I managed to solve it by writing to a sort of global dict (which is not a pattern I like), but at least it works:

    import asks
    import trio

    result = {}

    async def get_shiny_thing(key, url, session):
        # abort if we've done the lookup already
        if key in result:
            return
        r = await session.get(url)
        # whatever
        result[key] = some_shiny_thing

    async def get_treasure(urls, max_concurrent=10):
        session = asks.Session(connections=max_concurrent)
        async with trio.open_nursery() as nursery:
            for key, url in enumerate(urls):
                nursery.start_soon(get_shiny_thing, key, url, session)

    trio.run(get_treasure, list_of_urls)
    # ... continue processing `result`
You don't need a global dict. Just create it in
I think the remaining action item here is a duplicate of #421.
It does need some love (and packaging!), but it's quite small and I think it still works with latest Trio.
@h-vetinari It's not! We sometimes have periods without activity, and sometimes I work on urllib3 before merging the work in hip. We still believe the idea of hip is sound.
I'm new to trio, but it seems to me to be the cleanest approach to async programming in python. :)
So when I had a little task of grabbing a bunch of things from the web I automatically thought I'd try it, but ran into problems straight away. Even if the solution to my problem ends up being trivial, I'm maybe a good example of someone looking at the tutorial and trying to build their first toy example (the issue title can be adapted accordingly).
Let's say I have my function:
All I really want to do is (knowing that the order is indeterminate):
This fails with
RuntimeError: use 'async with open_nursery(...)', not 'with open_nursery(...)'
Next step: a wrapper function:
But then - gasp! - `my_treasure` is empty:
I tested that `get_shiny_thing` actually does what it should. Next, I found this SO answer from 2018 by @njsmith, explaining that what I want to do is not really possible (yet?). But the workaround of creating separate functions that each update their own url (in a dict?) and then get passed to `start_soon` seems cumbersome, even if I built a "function factory".
In short: one of the most generic & popular async examples (a little web scraping) should IMO be one of the things in a tutorial. The tutorial even notes this absence:
If hip isn't ready yet, then it's maybe worth considering just doing that example with `asks`.