asyncio and asyncio.Semaphore in Python 3.8+

  • Author: dog2

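A program I wrote earlier failed when I ran it under Python 3.8. The cause turned out to be that the coroutine library asyncio had changed yet again with the upgrade to Python 3.8. The way asyncio.Semaphore is used is different in the new version, so this post briefly records the new way of writing it.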

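About the code: it crawls a little data with httpx, an HTTP library with async support, and uses asyncio.Semaphore to cap the number of concurrent requests. Under Python 3.8 the semaphore is created inside the running coroutine and handed to the tasks through contextvars.ContextVar, which is a context variable rather than a context manager.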
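The concurrency cap described above can be checked without any network access. The following is a minimal sketch, not part of the original program: the name `measure_peak` and the sleeping worker are hypothetical stand-ins for the real HTTP requests, and the semaphore is deliberately created inside the running coroutine.

```python
import asyncio


async def measure_peak(limit, jobs):
    """Run `jobs` fake workers under a semaphore and return the peak concurrency seen."""
    sem = asyncio.Semaphore(limit)  # created while the event loop is running
    running = 0
    peak = 0

    async def worker():
        nonlocal running, peak
        async with sem:  # at most `limit` workers are inside this block at once
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for an HTTP request
            running -= 1

    await asyncio.gather(*(worker() for _ in range(jobs)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(measure_peak(limit=3, jobs=10)))
```

With `limit=3` the observed peak never exceeds three, however many jobs are queued.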

import httpx
import asyncio

from contextvars import ContextVar


def crawl(concurrency=3):
    context = ContextVar("concurrent")  # context variable that will hold the semaphore
    URL_BASE = 'https://github.com/topics?page='

    async def crawl_one(i):
        sem = context.get()  # read the semaphore from the context variable
        async with sem:
            async with httpx.AsyncClient() as client:
                r = await client.get(f"{URL_BASE}{i}")
                return len(r.text)

    async def crawl_all():
        # Store the semaphore in the context variable; concurrency caps the number of concurrent requests
        context.set(asyncio.Semaphore(concurrency))
        tasks = [asyncio.create_task(crawl_one(i)) for i in range(1, 30)]  # wrap the coroutines as tasks (pages 1 to 29)
        done, pending = await asyncio.wait(tasks)  # wait until every task has finished
        return done  # a set, so the results come back in no particular order

    tasks_done = asyncio.run(crawl_all())
    return [t.result() for t in tasks_done]


if __name__ == '__main__':
    for r in crawl(concurrency=1):
        print(r)
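The program above depends on every task seeing the value that crawl_all stored in the context variable. This works because asyncio.create_task runs each task in a copy of the context taken at the moment the task is created. The sketch below illustrates that behaviour in isolation; the names `limit_var`, `read_value` and `main` are hypothetical and do not appear in the original code.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical context variable with a default, so get() never raises LookupError.
limit_var = ContextVar("limit_var", default="unset")


async def read_value():
    # Returns whatever value is visible in this task's copy of the context.
    return limit_var.get()


async def main():
    early = asyncio.create_task(read_value())  # context copied before set()
    limit_var.set("set-in-main")
    late = asyncio.create_task(read_value())   # context copied after set()
    return await early, await late


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The task created before `set()` still sees the default, while the one created afterwards sees the new value, which is why the crawler sets the semaphore before it creates any tasks.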

Reference:

  • Python 协程模块 asyncio 使用指南 (a usage guide to Python's asyncio coroutine module)