3 min read

Python: Asyncio and Aiohttp

Python: Asyncio and Aiohttp

Suppose your program is trying to execute a series of tasks (1 - 6). If each task takes different time to complete, then your program will need to wait for each task to be completed sequentially before it can proceed.

Asyncio will be useful in such scenarios because it enables the program to continue running other tasks while waiting for the specific task to be completed. In order to use Asyncio, you will need to use compatible libraries. For example, instead of using requests (a popular HTTP library in Python), you will use aiohttp.

Note: In order to install aiohttp library in Windows system, you will need to download and install Microsoft C++ Build Tools. https://visualstudio.microsoft.com/visual-cpp-build-tools/

When to use Asyncio?

  • You want to speed up a specific part of your program where you are running a list of tasks sequentially for large-N items.
  • Suppose you are making API calls based on a list of different values for a parameter, you can use asyncio and aiohttp to make the API requests.
  • You do not need to change your entire program to use async/await syntax. Try to observe which part of the program is a bottleneck and explore how asyncio can improve performance on this particular flow.

Example: Crawling Wikipedia for info on Football (Soccer) Clubs

In this demo, we are going to perform the list of tasks below:

  1. Read the list of football clubs from a csv file.
  2. Get the Wikipedia URL of each Football club.
  3. Get the Wikipedia HTML page of each Football club.
  4. Write the HTML page into a HTML file for each Football Club.
  5. From the HTML page, we need to parse for information (Full Name, Ground, Founded Date etc.) using BeautifulSoup library.
  6. For the information of each club, we want to append the information into a Dataframe.
  7. Finally, print out the Dataframe to see if the information is correct.

See the Synchronous example and Asynchronous example from my Github repo. If we execute both scripts, we can an estimated difference here where Asyncio complete the execution faster by about 20-30%.

Execution time for Asyncio : 17.885913610458374
Execution time for Synchronous: 23.075875997543335

Refactor Tips

  • As a practice, a co-routine main is often defined and used in an event loop (e.g. asyncio.run(main()). Then in the co-routine main function, all the other co-routines are await.
  • If the request has a consistent response time, then you should stick to the synchronous approach. For example, if you are using Pandas, then you should use apply() on a function. For parts of the program which are bottleneck, you should try with asyncio to see if the speed performance is improved.

Key Terms

Event Loop. You must use an event loop to run the co-routine.

# Running event loop for Python 3.7+
asyncio.run(main())

# Older syntax before Python 3.7+
loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()

Co-routine

async / await. This is the syntax for defining co-routines in python. You can declare a co-routine by using async def in front of a function. await is used inside a co-routine and tells the program to come back to foo() when do_something() is ready. Make sure that do_something() is also a co-routine.

async def foo():
    x = await do_something()
    return x