GPU Scheduler in Linux Kernel
Abbreviations

| Abbreviation | Meaning |
|---|---|
| rq | run queue |
| SPSC | Single Producer Single Consumer |
The main data structures involved:

- `drm_gpu_scheduler`
- `drm_sched_backend_ops`
- `drm_sched_rq`
- `drm_sched_entity`
- `drm_sched_job`
- `drm_sched_fence`
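A rough picture of how these objects reference each other is sketched below (field names follow older kernels, most members are omitted, and newer kernels have reshuffled some of this):

```c
/* Simplified sketch -- most members omitted, names as in older kernels. */
struct drm_gpu_scheduler {
	const struct drm_sched_backend_ops *ops;   /* driver-provided callbacks */
	struct drm_sched_rq sched_rq[DRM_SCHED_PRIORITY_COUNT]; /* one run queue per priority */
	wait_queue_head_t wake_up_worker;          /* drm_sched_main() sleeps here */
	struct task_struct *thread;                /* the scheduler kthread */
	/* ... */
};

struct drm_sched_entity {
	struct drm_sched_rq *rq;                   /* run queue this entity is queued on */
	struct spsc_queue job_queue;               /* SPSC queue of pending drm_sched_job */
	/* ... */
};

struct drm_sched_job {
	struct drm_gpu_scheduler *sched;           /* scheduler the job was submitted to */
	struct drm_sched_fence *s_fence;           /* scheduled + finished fences */
	struct drm_sched_entity *entity;           /* entity that owns the job */
	/* ... */
};
```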
The GPU scheduler in the Linux kernel is responsible for submitting jobs to the GPU hardware; it is built as the kernel module `gpu-sched`. It is itself a kernel thread (kthread) whose entry function is `drm_sched_main`, and the thread is created in `drm_sched_init`.
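The exact signature of `drm_sched_init` has changed several times across kernel versions, and kernels from 6.7 onward replaced the dedicated kthread with work items, but on older kernels the function looks roughly like this (heavily simplified sketch):

```c
/* Heavily simplified sketch of drm_sched_init() on older kernels; the
 * real function takes more parameters and does more setup. */
int drm_sched_init(struct drm_gpu_scheduler *sched,
		   const struct drm_sched_backend_ops *ops,
		   unsigned int hw_submission, long timeout, const char *name)
{
	sched->ops = ops;
	sched->hw_submission_limit = hw_submission;
	sched->timeout = timeout;
	sched->name = name;
	init_waitqueue_head(&sched->wake_up_worker);
	init_waitqueue_head(&sched->job_scheduled);

	/* One kernel thread per scheduler instance; its entry point is
	 * drm_sched_main(). */
	sched->thread = kthread_run(drm_sched_main, sched, sched->name);
	if (IS_ERR(sched->thread)) {
		int ret = PTR_ERR(sched->thread);

		sched->thread = NULL;
		return ret;
	}

	sched->ready = true;
	return 0;
}
```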
`drm_sched_main` repeatedly sleeps and wakes depending on the state (signaled or unsignaled) of `dma_fence` objects, so that the four callbacks below are executed on the CPU in an orderly fashion. `drm_sched_init` is passed a `drm_sched_backend_ops` that the driver must implement.
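A sketch of such an ops table follows; the `xxx_*` implementations are hypothetical placeholders, and the `.dependency` callback was renamed `.prepare_job` in newer kernels:

```c
static const struct drm_sched_backend_ops xxx_sched_ops = {
	/* Return a dma_fence the job still has to wait on, or NULL once
	 * the job is ready to run (.prepare_job on newer kernels). */
	.dependency	= xxx_sched_dependency,
	/* Push the job to the hardware and return the HW fence. */
	.run_job	= xxx_sched_run_job,
	/* Called when a job exceeds its timeout; trigger GPU recovery. */
	.timedout_job	= xxx_sched_timedout_job,
	/* Called once the job's finished fence has signaled; free it. */
	.free_job	= xxx_sched_free_job,
};
```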
A `dma_fence` acts like a token passed between these four functions. The GPU scheduler does not use just a single `dma_fence`; it uses a pair of them (scheduled and finished), and this pair is wrapped in `drm_sched_fence`.
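A trimmed-down view of `drm_sched_fence` (kernel-doc removed and fields paraphrased; see include/drm/gpu_scheduler.h for the authoritative definition):

```c
struct drm_sched_fence {
	/* Signaled by the scheduler when ->run_job() has handed the job
	 * to the hardware. */
	struct dma_fence scheduled;

	/* Signaled when the job has completed on the hardware; this is
	 * the fence the rest of the kernel waits on. */
	struct dma_fence finished;

	/* The driver/HW fence returned by ->run_job(); once it signals,
	 * the scheduler signals @finished. */
	struct dma_fence *parent;

	/* Scheduler instance this fence belongs to. */
	struct drm_gpu_scheduler *sched;

	/* Lock protecting the scheduled and finished fences. */
	spinlock_t lock;

	/* Job owner, for debugging. */
	void *owner;
};
```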
Some problems with drm_scheduler
If the `run_job()` path needs to allocate memory, it can lead to a kernel deadlock. Let's try to understand the problem through a discussion on the dri-devel IRC channel.
16:51 The dma_fence design itself is fine. It’s designed that way for very good reasons. There are problems we need to solve but they’re more around how things have become tangled up inside drm than they are about dma_fence.
16:54 DemiMarie: if you have a way to swapout some memory accessed by in-flight jobs, you might be able to unblock the situation, but I’m sure this ‘no allocation in the scheduler path’ rule is here to address the problem where a job takes too long to finish and the shrinker decides to reclaim memory anyway.
16:56 I think the problem is that drm_sched exposes a fence to the outside world, and it needs a guarantee that this fence will be signaled, otherwise other parties (the shrinker) might wait for an event that’s never going to happen
16:56 Yup
16:57 that comes from the fact it’s not the driver fence that’s exposed to the outside world, but an intermediate object, which is indirectly signaled by the driver fence, that’s created later on when the scheduler calls ->run_job()
16:58 Once a fence has been exposed, even internally within the kernel, it MUST signal in finite time.
16:59 If you allocate memory, that could kick off reclaim which can then have to wait on the GPU and you’re stuck.
17:00 so the issue most drivers have, is that they allocate this driver fence in the ->run_job() path with GFP_KERNEL (waitable allocation), which might kick the GPU driver shrinker, which in turn will wait on the fence exposed by the drm_sched, which will never be signaled because the driver is waiting for memory to allocate its driver fence :-)
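To make that last point concrete, here is a hypothetical driver `->run_job()` showing the anti-pattern being described (`struct xxx_job`, `struct xxx_fence`, `to_xxx_job()` and the other `xxx_*` helpers are made-up names):

```c
static struct dma_fence *xxx_sched_run_job(struct drm_sched_job *sched_job)
{
	struct xxx_job *job = to_xxx_job(sched_job);
	struct xxx_fence *hw_fence;

	/*
	 * BAD: a GFP_KERNEL allocation may enter direct reclaim.  Reclaim
	 * can invoke the driver's shrinker, and the shrinker may end up
	 * waiting on the drm_sched "finished" fence of this very job --
	 * a fence that can only signal after this run_job() has returned
	 * the HW fence.  Deadlock.
	 */
	hw_fence = kzalloc(sizeof(*hw_fence), GFP_KERNEL);
	if (!hw_fence)
		return ERR_PTR(-ENOMEM);

	xxx_fence_init(hw_fence, job);
	xxx_ring_submit(job, hw_fence);

	return &hw_fence->base;
}
```

The usual way out is to allocate everything the submission needs (including the hardware fence) at job-creation time, before the scheduler fences are exposed to the rest of the kernel, so that the `->run_job()` path itself never allocates memory.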