15 Sep, 2016

8 commits

  • Helper routine to read out the maximum supported pixel rate
    for a DisplayPort legacy VGA converter, or the TMDS clock rate
    for other digital legacy converters. The helper returns the
    clock rate in kHz.

    v2: Return early if detailed port cap info is not available.
    Replace if-else ladder with switch-case (Ville)

    Reviewed-by: Jim Bride
    Signed-off-by: Mika Kahola
    Acked-by: Dave Airlie
    Signed-off-by: Jani Nikula
    Link: http://patchwork.freedesktop.org/patch/msgid/1473419458-17080-4-git-send-email-mika.kahola@intel.com

    Mika Kahola
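
    A minimal sketch of what the helper described above can look like (the
    unit factors follow the DP 1.2 DPCD detailed downstream-port fields;
    this is an illustration, not the merged code):

      int drm_dp_downstream_max_clock(const u8 dpcd[DP_RECEIVER_CAP_SIZE],
                                      const u8 port_cap[4])
      {
          int type = port_cap[0] & DP_DS_PORT_TYPE_MASK;
          bool detailed_cap_info = dpcd[DP_DOWNSTREAMPORT_PRESENT] &
                                   DP_DETAILED_CAP_INFO_AVAILABLE;

          /* Return early if detailed port cap info is not available. */
          if (!detailed_cap_info)
              return 0;

          switch (type) {
          case DP_DS_PORT_TYPE_VGA:
              /* Maximum pixel rate, reported in 8 MP/s units -> kHz */
              return port_cap[1] * 8 * 1000;
          case DP_DS_PORT_TYPE_DVI:
          case DP_DS_PORT_TYPE_HDMI:
          case DP_DS_PORT_TYPE_DP_DUALMODE:
              /* Maximum TMDS clock, reported in 2.5 MHz units -> kHz */
              return port_cap[1] * 2500;
          default:
              return 0;
          }
      }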
     
  • Drop "VGA" from bits per component definitions as these
    are also used by other standards such as DVI, HDMI,
    DP++.

    Reviewed-by: Jim Bride
    Signed-off-by: Mika Kahola
    Acked-by: Dave Airlie
    Signed-off-by: Jani Nikula
    Link: http://patchwork.freedesktop.org/patch/msgid/1473419458-17080-3-git-send-email-mika.kahola@intel.com

    Mika Kahola
     
  • Add missing DisplayPort downstream port types. The newly introduced
    port types are DP++ and Wireless.

    Reviewed-by: Jim Bride
    Signed-off-by: Mika Kahola
    Acked-by: Dave Airlie
    Signed-off-by: Jani Nikula
    Link: http://patchwork.freedesktop.org/patch/msgid/1473419458-17080-2-git-send-email-mika.kahola@intel.com

    Mika Kahola
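
    For reference, the DPCD downstream port type field then covers the
    following values (the macro names are assumptions here; the values come
    from the DP 1.2 spec):

      #define DP_DS_PORT_TYPE_DP            0
      #define DP_DS_PORT_TYPE_VGA           1
      #define DP_DS_PORT_TYPE_DVI           2
      #define DP_DS_PORT_TYPE_HDMI          3
      #define DP_DS_PORT_TYPE_NON_EDID      4
      #define DP_DS_PORT_TYPE_DP_DUALMODE   5   /* newly added: DP++ */
      #define DP_DS_PORT_TYPE_WIRELESS      6   /* newly added: Wireless */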
     
  • Adding the ddb size into the device info will avoid
    platform checks while computing watermarks.

    v2: Added comment and WARN_ON if ddb size is zero. (Jani)
    v3: Added WARN_ON at the right place. (Jani)

    Suggested-by: Ander Conselvan de Oliveira
    Signed-off-by: Deepak M
    Signed-off-by: Jani Nikula
    Link: http://patchwork.freedesktop.org/patch/msgid/1473931870-7724-1-git-send-email-m.deepak@intel.com

    Deepak M
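
    Roughly the shape of the change (field and macro names are illustrative,
    not necessarily the exact ones in the patch):

      /* the per-platform device info gains a field, e.g. */
      struct intel_device_info {
          /* ... */
          u16 ddb_size;   /* DDB size in blocks, e.g. 896 on SKL, 512 on BXT */
      };

      /* the watermark code then reads it instead of checking the platform */
      u16 ddb_size = INTEL_INFO(dev_priv)->ddb_size;

      WARN_ON(ddb_size == 0);   /* catch platforms that forgot to set it */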
     
  • Renaming to a more consistent scheme, and updating comments, mostly
    about i915_guc_wq_reserve(), aka i915_guc_wq_check_space().

    Signed-off-by: Dave Gordon
    Link: http://patchwork.freedesktop.org/patch/msgid/1473711577-11454-4-git-send-email-david.s.gordon@intel.com
    Reviewed-by: Tvrtko Ursulin
    Signed-off-by: Chris Wilson

    Dave Gordon
     
  • Renaming to a more consistent scheme and deleting unused definitions.

    Signed-off-by: Dave Gordon
    Link: http://patchwork.freedesktop.org/patch/msgid/1473711577-11454-3-git-send-email-david.s.gordon@intel.com
    Reviewed-by: Tvrtko Ursulin
    Signed-off-by: Chris Wilson

    Dave Gordon
     
  • No functional changes; just renaming a bit, tweaking a datatype,
    prettifying layout, and adding comments, in particular in the
    GuC setup code that touches this data.

    Signed-off-by: Dave Gordon
    Link: http://patchwork.freedesktop.org/patch/msgid/1473711577-11454-2-git-send-email-david.s.gordon@intel.com
    Reviewed-by: Tvrtko Ursulin
    Signed-off-by: Chris Wilson

    Dave Gordon
     
  • Commentary from Chris Wilson's original version:

    > I was looking at some wait_for() timeouts on a slow system, with lots of
    > debug enabled (KASAN, lockdep, mmio_debug). Thinking that we were
    > mishandling the timeout, I tried to ensure that we loop at least once
    > after first testing COND. However, the double test of COND either side
    > of the timeout check makes that unlikely. But we can do an equivalent
    > loop, that keeps the COND check after testing for timeout (required so
    > that we are not preempted between testing COND and then testing for a
    > timeout) without expanding COND twice.
    >
    > The advantage of only expanding COND once is a dramatic reduction in
    > code size:
    >
    > text data bss dec hex
    > 1308733 5184 1152 1315069 1410fd before
    > 1305341 5184 1152 1311677 1403bd after

    but it turned out that due to a missing initialiser, gcc had "gone
    wild trimming undefined code" :( This version achieves a rather more
    modest (but still worthwhile) gain of ~550 bytes.

    Signed-off-by: Dave Gordon
    Original-idea-by: Chris Wilson
    Cc: Chris Wilson
    Cc: Zanoni, Paulo R
    Link: http://patchwork.freedesktop.org/patch/msgid/1473855033-26980-1-git-send-email-david.s.gordon@intel.com
    Reviewed-by: Paulo Zanoni
    Signed-off-by: Chris Wilson

    Dave Gordon
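
    The resulting loop looks roughly like this (a sketch of the
    single-expansion wait_for() helper; the real macro in intel_drv.h has a
    few more details):

      #define _wait_for(COND, US, W) ({ \
          unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1; \
          int ret__; \
          for (;;) { \
              /* sample timeout before COND: no false -ETIMEDOUT on preemption */ \
              bool expired__ = time_after(jiffies, timeout__); \
              if (COND) { \
                  ret__ = 0; \
                  break; \
              } \
              if (expired__) { \
                  ret__ = -ETIMEDOUT; \
                  break; \
              } \
              usleep_range((W), (W) * 2); \
          } \
          ret__; \
      })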
     

14 Sep, 2016

1 commit

  • Turns out
    commit a05628195a0d ("drm/i915: Get panel_type from OpRegion panel
    details") has regressed quite a few machines. So it looks like we
    can't use the panel type from OpRegion on all systems, and yet we
    absolutely must use it on some specific systems.

    Despite trying, I was unable to find any automagic way to determine
    if the OpRegion panel type is respectable or not. The only glimmer
    of hope I had was bit 8 in the SCIC response, but that turned out to
    not work either (it was always 0 on both types of systems).

    So, to fix the regressions without breaking the machine we know to need
    the OpRegion panel type, let's just add a quirk for this. Only specific
    machines known to require the OpRegion panel type will therefore use
    it. Everyone else will fall back to the VBT panel type.

    The only known machine so far is a "Conrac GmbH IX45GM2". The PCI
    subsystem ID on this machine is just a generic 8086:2a42, so of no use.
    Instead we'll go with a DMI match.

    I suspect we can now also revert
    commit aeddda06c1a7 ("drm/i915: Ignore panel type from OpRegion on SKL")
    but let's leave that to a separate patch.

    v2: Do the DMI match in the opregion code directly, as dev_priv->quirks
    gets populated too late

    Cc: Rob Kramer
    Cc: Martin van Es
    Cc: Andrea Arcangeli
    Cc: Dave Airlie
    Cc: Marco Krüger
    Cc: Sean Greenslade
    Cc: Trudy Tective
    Cc: Robin Müller
    Cc: Alexander Kobel
    Cc: Alexey Shumitsky
    Cc: Emil Andersen Lauridsen
    Cc: oceans112@gmail.com
    Cc: James Hogan
    Cc: James Bottomley
    Cc: stable@vger.kernel.org
    References: https://lists.freedesktop.org/archives/intel-gfx/2016-August/105545.html
    References: https://lists.freedesktop.org/archives/dri-devel/2016-August/116888.html
    References: https://lists.freedesktop.org/archives/intel-gfx/2016-June/098826.html
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94825
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97060
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97443
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97363
    Fixes: a05628195a0d ("drm/i915: Get panel_type from OpRegion panel details")
    Tested-by: Marco Krüger
    Tested-by: Alexey Shumitsky
    Tested-by: Sean Greenslade
    Tested-by: Emil Andersen Lauridsen
    Tested-by: Robin Müller
    Tested-by: oceans112@gmail.com
    Tested-by: Rob Kramer
    Signed-off-by: Ville Syrjälä
    Link: http://patchwork.freedesktop.org/patch/msgid/1473758539-21565-1-git-send-email-ville.syrjala@linux.intel.com
    References: http://patchwork.freedesktop.org/patch/msgid/1473602239-15855-1-git-send-email-adrienverge@gmail.com
    Acked-by: Jani Nikula

    Ville Syrjälä
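
    A sketch of the DMI-match approach (the match strings come from the
    commit; callback and table names are illustrative):

      static int intel_use_opregion_panel_type_callback(const struct dmi_system_id *id)
      {
          DRM_INFO("Using OpRegion panel type on %s\n", id->ident);
          return 1;
      }

      static const struct dmi_system_id intel_use_opregion_panel_type[] = {
          {
              .callback = intel_use_opregion_panel_type_callback,
              .ident = "Conrac GmbH IX45GM2",
              .matches = {
                  DMI_MATCH(DMI_SYS_VENDOR, "Conrac GmbH"),
                  DMI_MATCH(DMI_PRODUCT_NAME, "IX45GM2"),
              },
          },
          { }
      };

      /* in the OpRegion panel type lookup */
      if (!dmi_check_system(intel_use_opregion_panel_type))
          return -ENODEV;   /* fall back to the VBT panel type */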
     

13 Sep, 2016

2 commits

  • This reverts

    commit 1c80c25fb622973dd135878e98d172be20859049
    Author: Daniel Vetter
    Date: Wed May 18 18:47:12 2016 +0200

    drm/i915/psr: Make idle_frames sensible again

    There are panels that need 4 idle frames before entering PSR,
    but whose VBT is improperly set.

    It was also recently identified that the idle frame count calculated in
    HW can be off by 1, which makes the effective minimum 2, at least.

    Without the current vbt+1 we run the risk of the HW calculating
    0 idle frames and entering PSR when it shouldn't, regardless of the lack
    of link training.

    [Jani: there is some disagreement on the explanation, but the commit
    regresses so revert it is.]

    References: http://marc.info/?i=20160904191153.GA2328@light.dominikbrodowski.net
    Cc: Dominik Brodowski
    Cc: Jani Nikula
    Cc: Daniel Vetter
    Signed-off-by: Rodrigo Vivi
    Fixes: 1c80c25fb622 ("drm/i915/psr: Make idle_frames sensible again")
    Cc: drm-intel-fixes@lists.freedesktop.org # v4.8-rc1+
    Signed-off-by: Jani Nikula
    Link: http://patchwork.freedesktop.org/patch/msgid/1473295351-8766-1-git-send-email-rodrigo.vivi@intel.com

    Rodrigo Vivi
     
  • This adds support for KBL in the new function, added in an earlier
    commit, that returns a shared PLL in the case of DDI platforms.

    Signed-off-by: Manasi Navare
    Reviewed-by: Rodrigo Vivi
    Signed-off-by: Rodrigo Vivi
    Link: http://patchwork.freedesktop.org/patch/msgid/1473728663-14355-1-git-send-email-manasi.d.navare@intel.com

    Navare, Manasi D
     

12 Sep, 2016

1 commit

  • drm already provides fallback versions of readq and writeq.

    Signed-off-by: Matthew Auld
    Reviewed-by: Joonas Lahtinen
    Signed-off-by: Joonas Lahtinen
    Link: http://patchwork.freedesktop.org/patch/msgid/1473451373-9852-1-git-send-email-matthew.auld@intel.com

    Matthew Auld
     

10 Sep, 2016

6 commits

  • Fix the number of tries in the channel equalization link training
    sequence according to the DP 1.2 spec. The function returns a boolean
    indicating whether channel equalization passed or failed.

    Signed-off-by: Dhinakaran Pandiyan
    Signed-off-by: Manasi Navare
    Reviewed-by: Mika Kahola
    Signed-off-by: Rodrigo Vivi

    Navare, Manasi D
     
  • This function cleans up the clock recovery loop in link training to
    comply with the DP 1.2 spec. It tries clock recovery 5 times for the
    same voltage or until the max voltage swing is reached, and removes the
    additional non-compliant retries. The function now returns a boolean
    based on whether clock recovery passed or failed.

    v3:
    * Better Debug prints in case of failures (Mika Kahola)
    v2:
    * Rebased on top of new revision of vswing patch (Manasi Navare)

    Signed-off-by: Dhinakaran Pandiyan
    Signed-off-by: Manasi Navare
    Reviewed-by: Mika Kahola
    Signed-off-by: Rodrigo Vivi

    Dhinakaran Pandiyan
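
    The resulting loop, roughly (a sketch only; the helper names are taken
    from the surrounding link training code and the next commit, and are
    assumptions here):

      static bool
      intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
      {
          int voltage_tries = 1;
          u8 voltage;

          /* ... transmit TPS1 and set the initial vswing/pre-emphasis ... */

          for (;;) {
              u8 link_status[DP_LINK_STATUS_SIZE];

              drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
              if (!intel_dp_get_link_status(intel_dp, link_status))
                  return false;                    /* DPCD read failed */

              if (drm_dp_clock_recovery_ok(link_status, intel_dp->lane_count))
                  return true;                     /* clock recovery done */

              if (voltage_tries == 5) {
                  DRM_DEBUG_KMS("Same voltage tried 5 times\n");
                  return false;
              }

              if (intel_dp_link_max_vswing_reached(intel_dp)) {
                  DRM_DEBUG_KMS("Max voltage swing reached\n");
                  return false;
              }

              voltage = intel_dp->train_set[0] & DP_TRAIN_VOLTAGE_SWING_MASK;

              /* adjust drive settings as requested by the sink, then retry */
              intel_get_adjust_train(intel_dp, link_status);
              if (!intel_dp_update_link_train(intel_dp))
                  return false;

              if ((intel_dp->train_set[0] & DP_TRAIN_VOLTAGE_SWING_MASK) ==
                  voltage)
                  ++voltage_tries;
              else
                  voltage_tries = 1;
          }
      }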
     
  • Wrap the max. vswing check in a separate function.
    This makes the clock recovery phase of DP link training cleaner.

    v3:
    Fixed the parenthesis warning (Mika Kahola)
    v2:
    Fixed the compiler warning (Mika Kahola)

    Signed-off-by: Dhinakaran Pandiyan
    Signed-off-by: Manasi Navare
    Reviewed-by: Mika Kahola
    Signed-off-by: Rodrigo Vivi

    Dhinakaran Pandiyan
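
    A sketch of such a wrapper (names follow the surrounding code but are
    assumptions here):

      /*
       * Check whether every active lane already reports that the maximum
       * voltage swing has been reached.
       */
      static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
      {
          int lane;

          for (lane = 0; lane < intel_dp->lane_count; lane++)
              if ((intel_dp->train_set[lane] &
                   DP_TRAIN_MAX_SWING_REACHED) == 0)
                  return false;

          return true;
      }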
     
  • Add the PLL selection code for HSW/BDW/BXT/SKL into a stand-alone function
    in order to allow for the implementation of a platform neutral upfront
    link training function.

    v4:
    * Removed dereferencing NULL pointer in case of failure (Dhinakaran Pandiyan)
    v3:
    * Add Hooks for all DDI platforms into this standalone function

    v2:
    * Change the macro to use dev_priv instead of dev (David Weinehall)

    Reviewed-by: Durgadoss R
    Signed-off-by: Manasi Navare
    Signed-off-by: Jim Bride
    Signed-off-by: Rodrigo Vivi

    Jim Bride
     
  • Recently I have been applying an optimisation to avoid stalling and
    clflushing GGTT objects based on their current binding. That is we only
    set-to-gtt-domain upon first bind. However, on hibernation the objects
    remain bound, but they are in the CPU domain. Currently (since commit
    975f7ff42edf ("drm/i915: Lazily migrate the objects after hibernation"))
    we only flush scanout objects as all other objects are expected to be
    flushed prior to use. That breaks down in the face of the runtime
    optimisation above - and we need to flush all GGTT pinned objects
    (essentially ringbuffers).

    To reduce the burden of extra clflushes, we only flush those objects we
    cannot discard from the GGTT. Everything pinned to the scanout, or
    current contexts or ringbuffers will be flushed and rebound. Other
    objects, such as inactive contexts, will be left unbound and in the CPU
    domain until first use after resuming.

    Fixes: 7abc98fadfdd ("drm/i915: Only change the context object's domain...")
    Fixes: 57e885318119 ("drm/i915: Use VMA for ringbuffer tracking")
    References: https://bugs.freedesktop.org/show_bug.cgi?id=94722
    Signed-off-by: Chris Wilson
    Cc: Joonas Lahtinen
    Cc: Mika Kuoppala
    Cc: David Weinehall
    Reviewed-by: Matthew Auld
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909201957.2499-1-chris@chris-wilson.co.uk

    Chris Wilson
     
  • In an attempt to keep the hibernation image as small as possible, let's
    try and discard any unwanted pages and our own page arrays.

    Signed-off-by: Chris Wilson
    Reviewed-by: Joonas Lahtinen
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909190218.16831-1-chris@chris-wilson.co.uk

    Chris Wilson
     

09 Sep, 2016

21 commits

  • Now that we can wait upon fences before emitting the request, it becomes
    trivial to wait upon any implicit fence provided by the dma-buf
    reservation object.

    To protect against failure, we force any asynchronous waits on a foreign
    fence to time out after 10s, so that a stall in another driver does not
    permanently cripple us. Still unpleasant, though!

    Testcase: igt/prime_vgem/fence-wait
    Signed-off-by: Chris Wilson
    Reviewed-by: John Harrison
    Reviewed-by: Joonas Lahtinen
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-21-chris@chris-wilson.co.uk

    Chris Wilson
     
  • Now that we have fences in place to drive request submission, we can
    employ those to queue requests after their dependencies as opposed to
    stalling in the middle of an execbuf ioctl. (However, we still choose to
    spin before enabling the IRQ as that is faster - though contentious.)

    v2: Do the fence ordering first, where we can still fail.

    Signed-off-by: Chris Wilson
    Reviewed-by: Joonas Lahtinen
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-20-chris@chris-wilson.co.uk

    Chris Wilson
     
  • If we are waiting upon an external fence, from the pov of hangcheck the
    engine is stuck on the last submitted seqno. Currently we give a small
    increment to the hangcheck score in order to catch a stuck waiter /
    driver. Now that we both have an independent wait hangcheck and may be
    stuck waiting on an external fence, resetting the GPU has little effect
    on that external fence. As we cannot advance by resetting, skip
    incrementing the hangcheck score.

    Signed-off-by: Chris Wilson
    Cc: Mika Kuoppala
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-19-chris@chris-wilson.co.uk

    Chris Wilson
     
  • If we find a ring waiting on a semaphore for another assigned but not yet
    emitted request, treat it as valid and waiting.

    Signed-off-by: Chris Wilson
    Reviewed-by: Joonas Lahtinen
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-18-chris@chris-wilson.co.uk

    Chris Wilson
     
  • Currently the presumption is that the request construction and its
    submission to the GuC are all under the same holding of struct_mutex. We
    wish to relax this to separate the request construction and the later
    submission to the GuC. This requires us to reserve some space in the
    GuC command queue for the future submission. For flexibility to handle
    out-of-order request submission we do not preallocate the next slot in
    the GuC command queue during request construction, just ensuring that
    there is enough space later.

    Signed-off-by: Chris Wilson
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-17-chris@chris-wilson.co.uk

    Chris Wilson
     
  • We are about to specialize object synchronisation to enable nonblocking
    execbuf submission. First we make a copy of the current object
    synchronisation for execbuffer. The general i915_gem_object_sync() will
    be removed following the removal of CS flips in the near future.

    Signed-off-by: Chris Wilson
    Reviewed-by: John Harrison
    Reviewed-by: Joonas Lahtinen
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk

    Chris Wilson
     
  • Let's avoid mixing sealing the hardware commands for the request and
    adding the request to the software tracking.

    Signed-off-by: Chris Wilson
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-15-chris@chris-wilson.co.uk

    Chris Wilson
     
  • Drive final request submission from a callback from the fence. This way
    the request is queued until all dependencies are resolved, at which
    point it is handed to the backend for queueing to hardware. At this
    point, no dependencies are set on the request, so the callback is
    immediate.

    A side-effect of imposing a heavier-irqsafe spinlock for execlist
    submission is that we lose the softirq enabling after scheduling the
    execlists tasklet. To compensate, we manually kickstart the softirq by
    disabling and enabling the bh around the fence signaling.

    Signed-off-by: Chris Wilson
    Reviewed-by: Joonas Lahtinen
    Reviewed-by: John Harrison
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-14-chris@chris-wilson.co.uk

    Chris Wilson
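
    The softirq kick mentioned above amounts to bracketing the final fence
    commit with a bottom-half disable/enable pair (the commit helper name is
    an assumption):

      /*
       * Committing the submit fence may schedule the execlists tasklet
       * from a context where softirqs will not run on their own; the
       * local_bh_enable() below executes any tasklet raised in between.
       */
      local_bh_disable();
      i915_sw_fence_commit(&request->submit);
      local_bh_enable();   /* kick the pending tasklet, if any */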
     
  • Update reset path in preparation for engine reset which requires
    identification of incomplete requests and associated context and fixing
    their state so that engine can resume correctly after reset.

    The request that caused the hang will be skipped and head is reset to the
    start of breadcrumb. This allows us to resume from where we left-off.
    Since this request didn't complete normally we also need to clean up the
    elsp queue manually. This is vital if we employ nonblocking request
    submission where we may have a web of dependencies upon the hung request
    and so advancing the seqno manually is no longer trivial.

    ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS

    We change the way we count pending batches. Only the active context
    involved in the reset is marked as either innocent or guilty, rather
    than marking the entire world as pending. By inspection this only
    affects igt/gem_reset_stats (which assumes implementation details) and
    not piglit.

    ARB_robustness gives this guide on how we expect the user of this
    interface to behave:

    * Provide a mechanism for an OpenGL application to learn about
    graphics resets that affect the context. When a graphics reset
    occurs, the OpenGL context becomes unusable and the application
    must create a new context to continue operation. Detecting a
    graphics reset happens through an inexpensive query.

    And with regards to the actual meaning of the reset values:

    Certain events can result in a reset of the GL context. Such a reset
    causes all context state to be lost. Recovery from such events
    requires recreation of all objects in the affected context. The
    current status of the graphics reset state is returned by

    enum GetGraphicsResetStatusARB();

    The symbolic constant returned indicates if the GL context has been
    in a reset state at any point since the last call to
    GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context
    has not been in a reset state since the last call.
    GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected
    that is attributable to the current GL context.
    INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that
    is not attributable to the current GL context.
    UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose
    cause is unknown.

    The language here is explicit in that we must mark up the guilty batch,
    but is loose enough for us to relax the innocent (i.e. pending)
    accounting as only the active batches are involved with the reset.

    In the future, we are looking towards single engine resetting (with
    minimal locking), where it seems inappropriate to mark the entire world
    as innocent since the reset occurred on a different engine. Reducing the
    information available means we only have to encounter the pain once, and
    also reduces the information leaking from one context to another.

    v2: Legacy ringbuffer submission required a reset following hibernation,
    or else we restored stale values to the RING_HEAD and walked over
    stolen garbage.

    v3: GuC requires replaying the requests after a reset.

    v4: Restore engine IRQ after reset (so waiters will be woken!)
    Rearm hangcheck if resetting with a waiter.

    Cc: Tvrtko Ursulin
    Cc: Mika Kuoppala
    Cc: Arun Siluvery
    Signed-off-by: Chris Wilson
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk

    Chris Wilson
     
  • Since we have a cooperative mode now with a direct reset, we can avoid
    the contention on struct_mutex and instead try then sleep on the
    I915_RESET_IN_PROGRESS bit. If the mutex is held and that bit is
    cleared, all is fine. Otherwise, we sleep for a bit and try again. In
    the worst case we sleep for an extra second waiting for the mutex to be
    released (no one touching the GPU is allowed the struct_mutex whilst the
    I915_RESET_IN_PROGRESS bit is set). But when we have a direct reset,
    this allows us to clean up the reset worker faster.

    v2: Remember to call wake_up_bit() after changing (for the faster wakeup
    as promised)

    Signed-off-by: Chris Wilson
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-12-chris@chris-wilson.co.uk

    Chris Wilson
     
  • If a waiter is holding the struct_mutex, then the reset worker cannot
    reset the GPU until the waiter returns. We do not want to return -EAGAIN
    from i915_wait_request as that breaks delicate operations like
    i915_vma_unbind() which often cannot be restarted easily, and returning
    -EIO is just as useless (and has in the past proven dangerous). The
    remaining WARN_ON(i915_wait_request) serves as a valuable reminder that
    handling errors from an indefinite wait is tricky.

    We can keep the current semantic (that once a reset is complete, so is
    the request) by performing the reset ourselves if we hold the
    mutex.

    uevent emission is still handled by the reset worker, so it may appear
    slightly out of order with respect to the actual reset (and concurrent
    use of the device).

    Signed-off-by: Chris Wilson
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-11-chris@chris-wilson.co.uk

    Chris Wilson
     
  • In the next patch we want to handle reset directly by a locked waiter in
    order to avoid issues with returning before the reset is handled. To
    handle the reset, we must first know whether we hold the struct_mutex.
    If we do not hold the struct_mutex we cannot perform the reset, but we do
    not block the reset worker either (and so we can just continue to wait for
    request completion); otherwise we must relinquish the mutex.

    Signed-off-by: Chris Wilson
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk

    Chris Wilson
     
  • We need finer control over wakeup behaviour during i915_wait_request(),
    so expand the current bool interruptible to a bitmask.

    Signed-off-by: Chris Wilson
    Reviewed-by: Joonas Lahtinen
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk

    Chris Wilson
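
    The flag-based interface then looks roughly like this (flag names and
    the trailing arguments are assumptions):

      #define I915_WAIT_INTERRUPTIBLE BIT(0)  /* wait may be interrupted by a signal */
      #define I915_WAIT_LOCKED        BIT(1)  /* caller holds struct_mutex */

      /* callers pass a mask instead of a bool, e.g. */
      ret = i915_wait_request(req,
                              I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
                              NULL, NULL);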
     
  • Access to intel_init_emon() is strictly ordered by gt_powersave, using
    struct_mutex around it is overkill (and will conflict with the caller
    holding struct_mutex themselves).

    Signed-off-by: Chris Wilson
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-8-chris@chris-wilson.co.uk

    Chris Wilson
     
  • In preparation for introducing a per-engine reset, we can first separate
    the mixing of the reset state from the global reset counter.

    The loss of atomicity in updating the reset state poses a small problem
    for handling the waiters. For requests, this is solved by advancing the
    seqno so that a waiter waking up after the reset knows the request is
    complete. For pending flips, we still rely on the increment of the
    global reset epoch (as well as the reset-in-progress flag) to signify
    when the hardware was reset.

    The advantage, now that we do not inspect the reset state during reset
    itself i.e. we no longer emit requests during reset, is that we can use
    the atomic updates of the state flags to ensure that only one reset
    worker is active.

    v2: Mika spotted that I transformed the i915_gem_wait_for_error() wakeup
    into a waiter wakeup.

    Signed-off-by: Chris Wilson
    Cc: Arun Siluvery
    Cc: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-6-git-send-email-arun.siluvery@linux.intel.com
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-7-chris@chris-wilson.co.uk

    Chris Wilson
     
  • Emulate HW to track and manage the ELSP queue. A set of SW ports are defined
    and requests are assigned to these ports before submitting them to HW. This
    makes cleaning up incomplete requests during reset recovery easier,
    especially after an engine reset, by decoupling ELSP queue management. This
    will become clearer in the next patch.

    In the engine reset case we want to resume where we left-off after skipping
    the incomplete batch which requires checking the elsp queue, removing
    element and fixing elsp_submitted counts in some cases. Instead of directly
    manipulating the elsp queue from reset path we can examine these ports, fix
    up ringbuffer pointers using the incomplete request and restart submissions
    again after reset.

    Cc: Tvrtko Ursulin
    Cc: Mika Kuoppala
    Cc: Arun Siluvery
    Signed-off-by: Chris Wilson
    Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-3-git-send-email-arun.siluvery@linux.intel.com
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-6-chris@chris-wilson.co.uk

    Chris Wilson
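
    The software ports are a small mirror of the two-entry HW ELSP; roughly
    (field names as assumed here):

      /* in struct intel_engine_cs */
      struct execlist_port {
          struct drm_i915_gem_request *request;  /* owner of this port */
          unsigned int count;                    /* submissions, for lite-restore */
      } execlist_port[2];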
     
  • Just rearrange the code to reduce churn in the next patch.

    Signed-off-by: Chris Wilson
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-5-chris@chris-wilson.co.uk

    Chris Wilson
     
  • Similar to the issue with reading from the context status buffer,
    see commit 26720ab97fea ("drm/i915: Move CSB MMIO reads out of the
    execlists lock"), we frequently write to the ELSP register (4 writes per
    interrupt) and know we hold the required spinlock and forcewake throughout.
    We can further reduce the cost of writing these registers beyond the
    I915_WRITE_FW() by precomputing the address of the ELSP register. We also
    note that the subsequent read serves no purpose here, and are happy to
    see it go.

    v2: Address I915_WRITE mistakes in changelog

    text data bss dec hex filename
    1259784 4581 576 1264941 134d2d drivers/gpu/drm/i915/i915.ko
    1259720 4581 576 1264877 134ced drivers/gpu/drm/i915/i915.ko

    Saves 64 bytes of address recomputation.

    Signed-off-by: Chris Wilson
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-4-chris@chris-wilson.co.uk

    Chris Wilson
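
    The gist of the change (a sketch; desc[] stands for the two context
    descriptors being loaded):

      u32 __iomem *elsp =
          dev_priv->regs + i915_mmio_reg_offset(RING_ELSP(engine));

      /*
       * ELSP takes four dword writes, second element first. The final
       * write (low dword of element 0) triggers the context switch, and
       * the old posting read after it is dropped.
       */
      writel(upper_32_bits(desc[1]), elsp);
      writel(lower_32_bits(desc[1]), elsp);
      writel(upper_32_bits(desc[0]), elsp);
      writel(lower_32_bits(desc[0]), elsp);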
     
  • Rather than blindly assuming we need to advance the tail for
    resubmitting the request via the ELSP, record the position.

    Signed-off-by: Chris Wilson
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-3-chris@chris-wilson.co.uk

    Chris Wilson
     
  • Leave the more complicated request dequeueing to the tasklet and instead
    just kick start the tasklet if we detect we are adding the first
    request.

    v2: Play around with list operators until we agree upon something

    Signed-off-by: Chris Wilson
    Cc: Mika Kuoppala
    Reviewed-by: Mika Kuoppala
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-2-chris@chris-wilson.co.uk

    Chris Wilson
     
  • This is really a core kernel struct in disguise until we can finally
    place it in kernel/. There is an immediate need for a fence collection
    mechanism that is more flexible than fence-array, in particular being
    able to easily drive request submission via events (and not just
    interrupt driven). The same mechanism would be useful for handling
    nonblocking and asynchronous atomic modesets, parallel execution and
    more, but for the time being just create a local sw fence for execbuf.

    Signed-off-by: Chris Wilson
    Reviewed-by: Joonas Lahtinen
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-1-chris@chris-wilson.co.uk

    Chris Wilson
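
    Usage then looks roughly like this (a sketch of the intended API; the
    exact signatures of the await helpers are assumptions and omitted):

      struct i915_sw_fence fence;

      i915_sw_fence_init(&fence, notify_fn);  /* notify_fn runs on completion */

      /* accumulate dependencies on other fences here, e.g. via the
       * i915_sw_fence_await_sw_fence()/i915_sw_fence_await_dma_fence()
       * helpers, before committing */

      i915_sw_fence_commit(&fence);  /* drop the initial reference; once all
                                      * awaits have signaled, notify_fn fires
                                      * and the dependent work (e.g. request
                                      * submission) can run */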
     

08 Sep, 2016

1 commit