Commit 028a12f5aa829b4ba6ac011530b815eda4960e89

Authored by Ben Skeggs
1 parent 434fdb5151

drm/nouveau/gr/gp107,gp108: implement workaround for HW hanging during init

Certain boards with GP107/GP108 chipsets hang (often, but randomly) for
unknown reasons during GR initialisation.

The first tell-tale symptom of this issue is:

nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]

appearing in dmesg, likely followed by many other failures being logged.

Karol found this WAR for the issue a while back, but efforts to isolate
the root cause and find a proper fix have not succeeded so far.  I've
modified the original patch to include a few more details, limit it to
GP107/GP108 by default, and add a config option to override this choice.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

Showing 1 changed file with 26 additions and 0 deletions

drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c
@@ -1981,7 +1981,33 @@
 {
 	struct gf100_gr *gr = gf100_gr(base);
 	struct nvkm_subdev *subdev = &base->engine.subdev;
+	struct nvkm_device *device = subdev->device;
+	bool reset = device->chipset == 0x137 || device->chipset == 0x138;
 	u32 ret;
+
+	/* On certain GP107/GP108 boards, we trigger a weird issue where
+	 * GR will stop responding to PRI accesses after we've asked the
+	 * SEC2 RTOS to boot the GR falcons. This happens far more
+	 * frequently when cold-booting a board (ie. returning from D3).
+	 *
+	 * The root cause for this is not known and has proven difficult
+	 * to isolate, with many avenues being dead-ends.
+	 *
+	 * A workaround was discovered by Karol, whereby putting GR into
+	 * reset for an extended period right before initialisation
+	 * prevents the problem from occurring.
+	 *
+	 * XXX: As RM does not require any such workaround, this is more
+	 * of a hack than a true fix.
+	 */
+	reset = nvkm_boolopt(device->cfgopt, "NvGrResetWar", reset);
+	if (reset) {
+		nvkm_mask(device, 0x000200, 0x00001000, 0x00000000);
+		nvkm_rd32(device, 0x000200);
+		msleep(50);
+		nvkm_mask(device, 0x000200, 0x00001000, 0x00001000);
+		nvkm_rd32(device, 0x000200);
+	}
 
 	nvkm_pmu_pgob(gr->base.engine.subdev.device->pmu, false);
 