asyncio and asyncio.Semaphore in Python 3.8+

Author: dog2

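A program I wrote a while ago failed when I ran it under Python 3.8. It turned out that the coroutine library asyncio had changed yet again with the 3.8 upgrade. The way asyncio.Semaphore has to be used is different in the new version, so this post briefly records the new way of writing it.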

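About the code: it uses httpx, an HTTP library with async support, to scrape a bit of data, and asyncio.Semaphore to limit the number of concurrent requests. In Python 3.8 the semaphore is used here together with a context variable, contextvars.ContextVar (a context variable, not a context manager): the semaphore is created inside the coroutine started by asyncio.run, so it belongs to the running event loop, and the ContextVar hands it to every task.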

import httpx
import asyncio

from contextvars import ContextVar

def crawl(concurrency=3):
    context = ContextVar("concurrent")  # context variable that will hold the semaphore
    URL_BASE = 'https://github.com/topics?page='

    async def crawl_one(i):
        sem = context.get()  # fetch the semaphore stored in the context variable
        async with sem:
            async with httpx.AsyncClient() as client:
                r = await client.get(f"{URL_BASE}{i}")
                return len(r.text)

    async def crawl_all():
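        context.set(asyncio.Semaphore(concurrency))  # create the semaphore inside the running loop; concurrency caps the parallel requests
        tasks = [asyncio.create_task(crawl_one(i)) for i in range(1, 30)]  # wrap the coroutines in tasks: 29 in total, pages 1 to 29
        done, pending = await asyncio.wait(tasks)  # run all tasks; `done` is an unordered set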
        return done

    tasks_done = asyncio.run(crawl_all())
    return [t.result() for t in tasks_done]

if __name__ == '__main__':
    for r in crawl(concurrency=1):
        print(r)
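
To check that the semaphore really caps concurrency without touching the network, here is a minimal sketch that swaps the HTTP request for asyncio.sleep and records the peak number of coroutines holding the semaphore at once. The names sem_var, worker, run_limited, demo and the stats dict are hypothetical and not part of the crawler above; the sketch also assumes asyncio.gather instead of asyncio.wait so that results come back in submission order.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical name; plays the role of `context` in the crawler above.
sem_var = ContextVar("sem")


async def worker(i, stats):
    sem = sem_var.get()  # each task runs in a copy of its creator's context
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for the HTTP request
        stats["active"] -= 1
        return i * i


async def run_limited(n, concurrency):
    # The semaphore is created inside the running event loop.
    sem_var.set(asyncio.Semaphore(concurrency))
    stats = {"active": 0, "peak": 0}
    # gather wraps each coroutine in a task and keeps the results in order.
    results = await asyncio.gather(*(worker(i, stats) for i in range(n)))
    return results, stats["peak"]


def demo(n=10, concurrency=3):
    return asyncio.run(run_limited(n, concurrency))
```

Calling demo(10, 3) returns the ten results together with the observed peak, which should never exceed the concurrency argument. Because asyncio.gather preserves order, the results line up with the inputs, which the set returned by asyncio.wait does not guarantee.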

Reference: Python 协程模块 asyncio 使用指南 (a usage guide to Python's coroutine module asyncio)