Commit 028a12f5aa829b4ba6ac011530b815eda4960e89
1 parent
434fdb5151
drm/nouveau/gr/gp107,gp108: implement workaround for HW hanging during init
Certain boards with GP107/GP108 chipsets hang (often, but randomly) for
unknown reasons during GR initialisation.

The first tell-tale symptom of this issue is:

nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]

appearing in dmesg, likely followed by many other failures being logged.

Karol found this WAR for the issue a while back, but efforts to isolate
the root cause and proper fix have not yielded success so far.

I've modified the original patch to include a few more details, limit it
to GP107/GP108 by default, and add a config option to override this
choice.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Showing 1 changed file with 26 additions and 0 deletions
drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c
@@ -1981,7 +1981,33 @@
 {
 	struct gf100_gr *gr = gf100_gr(base);
 	struct nvkm_subdev *subdev = &base->engine.subdev;
+	struct nvkm_device *device = subdev->device;
+	bool reset = device->chipset == 0x137 || device->chipset == 0x138;
 	u32 ret;
+
+	/* On certain GP107/GP108 boards, we trigger a weird issue where
+	 * GR will stop responding to PRI accesses after we've asked the
+	 * SEC2 RTOS to boot the GR falcons.  This happens with far more
+	 * frequency when cold-booting a board (ie. returning from D3).
+	 *
+	 * The root cause for this is not known and has proven difficult
+	 * to isolate, with many avenues being dead-ends.
+	 *
+	 * A workaround was discovered by Karol, whereby putting GR into
+	 * reset for an extended period right before initialisation
+	 * prevents the problem from occurring.
+	 *
+	 * XXX: As RM does not require any such workaround, this is more
+	 *      of a hack than a true fix.
+	 */
+	reset = nvkm_boolopt(device->cfgopt, "NvGrResetWar", reset);
+	if (reset) {
+		nvkm_mask(device, 0x000200, 0x00001000, 0x00000000);
+		nvkm_rd32(device, 0x000200);
+		msleep(50);
+		nvkm_mask(device, 0x000200, 0x00001000, 0x00001000);
+		nvkm_rd32(device, 0x000200);
+	}
 
 	nvkm_pmu_pgob(gr->base.engine.subdev.device->pmu, false);
 
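
The control flow of the hunk above can be exercised outside the kernel with a small userspace model. The following is only a sketch under stated assumptions: `struct fake_device`, the `fake_*` helpers, and `enum opt_override` are hypothetical stand-ins invented for illustration and are not nouveau APIs. `fake_mask` assumes read-modify-write semantics that return the previous register value, and the read-backs are kept only because the patch performs them (presumably to make sure the write has reached the hardware before sleeping).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset and bit taken from the patch; the macro names are made up here. */
#define WAR_REG 0x000200u
#define WAR_BIT 0x00001000u

/* Hypothetical tri-state standing in for a parsed "NvGrResetWar" option. */
enum opt_override { OPT_UNSET, OPT_OFF, OPT_ON };

/* Hypothetical device model: one simulated 32-bit register plus bookkeeping. */
struct fake_device {
	uint32_t chipset;
	uint32_t reg;          /* simulated contents of WAR_REG */
	unsigned int slept_ms; /* total simulated sleep time */
	bool held_in_reset;    /* was WAR_BIT clear while sleeping? */
};

static uint32_t fake_rd32(struct fake_device *d, uint32_t addr)
{
	assert(addr == WAR_REG); /* the model only knows one register */
	return d->reg;
}

static void fake_wr32(struct fake_device *d, uint32_t addr, uint32_t data)
{
	assert(addr == WAR_REG);
	d->reg = data;
}

/* Read-modify-write: clear the bits in mask, OR in data, return old value. */
static uint32_t fake_mask(struct fake_device *d, uint32_t addr,
			  uint32_t mask, uint32_t data)
{
	uint32_t old = fake_rd32(d, addr);

	fake_wr32(d, addr, (old & ~mask) | data);
	return old;
}

/* Simulated sleep: record the duration and whether the bit was low. */
static void fake_msleep(struct fake_device *d, unsigned int ms)
{
	d->slept_ms += ms;
	d->held_in_reset = !(d->reg & WAR_BIT);
}

/* Default is chipset-based; an explicit option wins over the default. */
static bool want_reset_war(const struct fake_device *d, enum opt_override o)
{
	bool def = d->chipset == 0x137 || d->chipset == 0x138;

	return o == OPT_UNSET ? def : o == OPT_ON;
}

/* Mirrors the patch: clear bit, read back, wait 50 ms, set bit, read back. */
bool gr_reset_war(struct fake_device *d, enum opt_override o)
{
	if (!want_reset_war(d, o))
		return false;

	fake_mask(d, WAR_REG, WAR_BIT, 0x00000000);
	(void)fake_rd32(d, WAR_REG);
	fake_msleep(d, 50);
	fake_mask(d, WAR_REG, WAR_BIT, WAR_BIT);
	(void)fake_rd32(d, WAR_REG);
	return true;
}
```

In this model, calling `gr_reset_war()` on a device whose `chipset` is 0x137 or 0x138 with `OPT_UNSET` runs the sequence, while any other chipset skips it unless `OPT_ON` is passed, which matches the "limit it to GP107/GP108 by default" behaviour described in the commit message.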