A GPU page fault commonly occurs under one of these conditions:
An application mistakenly executes work on the GPU that references a deleted object. This is one of the top reasons for an unexpected device removal.
An application mistakenly executes work on the GPU that accesses an evicted resource or a non-resident tile.
A shader references an uninitialized or stale descriptor.
A shader indexes beyond the end of a root binding.
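To make these conditions concrete, here is a toy Python model of a GPU virtual address space. It is a hypothetical sketch, not how any driver or graphics API actually manages page tables. The names `ToyGpuVaSpace`, `Fault`, and `PAGE_SIZE` are invented for illustration, and real hardware uses multi-level page tables and tile mappings. The model only shows why each condition ends in a fault. A deleted object or a stale descriptor leaves an address with no mapping behind it. An evicted resource keeps its mapping, but its pages are no longer resident. An unchecked index past the end of a binding reaches a page that nothing backs.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, Optional, Tuple

PAGE_SIZE = 4096  # hypothetical page size for this toy model


class Fault(Enum):
    UNMAPPED = auto()      # no live allocation backs the page (e.g. deleted object)
    NON_RESIDENT = auto()  # allocation exists, but its page is evicted / not resident


@dataclass
class ToyGpuVaSpace:
    """Hypothetical GPU virtual address space: page number -> is the page resident?"""

    pages: Dict[int, bool] = field(default_factory=dict)
    next_va: int = 0x1000_0000  # arbitrary base address for new allocations

    def allocate(self, size: int) -> Tuple[int, int]:
        """Map enough resident pages for `size` bytes; return (base VA, page count)."""
        va = self.next_va
        npages = (size + PAGE_SIZE - 1) // PAGE_SIZE
        for i in range(npages):
            self.pages[va // PAGE_SIZE + i] = True
        self.next_va += npages * PAGE_SIZE
        return va, npages

    def evict(self, va: int, npages: int) -> None:
        """Keep the mapping, but mark its pages non-resident."""
        for i in range(npages):
            self.pages[va // PAGE_SIZE + i] = False

    def free(self, va: int, npages: int) -> None:
        """Delete the allocation: its pages are no longer mapped at all."""
        for i in range(npages):
            del self.pages[va // PAGE_SIZE + i]

    def access(self, va: int) -> Optional[Fault]:
        """Return None if the access succeeds, else the kind of page fault it raises."""
        resident = self.pages.get(va // PAGE_SIZE)
        if resident is None:
            return Fault.UNMAPPED
        if not resident:
            return Fault.NON_RESIDENT
        return None


if __name__ == "__main__":
    vas = ToyGpuVaSpace()
    buf, n = vas.allocate(2 * PAGE_SIZE)
    descriptor_va = buf  # in this model a "descriptor" is just a saved VA

    # Reading the first byte past the end lands on a page with no mapping.
    print("out of bounds:", vas.access(buf + n * PAGE_SIZE))
    vas.evict(buf, n)
    print("evicted:", vas.access(buf))
    vas.free(buf, n)
    # The saved VA now points at a deleted object, i.e. a stale descriptor.
    print("stale descriptor:", vas.access(descriptor_va))
```

In this model an out-of-bounds access faults only because nothing is allocated directly after the buffer; if another live allocation sat there, the read would silently hit that allocation instead of faulting.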