17 Oct, 2011

1 commit

  • This adds a mechanism to resume selected IRQs during syscore_resume
    instead of dpm_resume_noirq.

    Under Xen we need to resume IRQs associated with IPIs early enough
    that the resched IPI is unmasked and we can therefore schedule
    ourselves out of the stop_machine where the suspend/resume takes
    place.

    This issue was introduced by 676dc3cf5bc3 "xen: Use IRQF_FORCE_RESUME".

    Signed-off-by: Ian Campbell
    Cc: Rafael J. Wysocki
    Cc: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Konrad Rzeszutek Wilk
    Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk
    Cc: stable@kernel.org (at least to 2.6.32.y)
    Signed-off-by: Thomas Gleixner

    Ian Campbell
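The two-phase ordering can be pictured with a small userspace model. It is a sketch under stated assumptions, not kernel code: the flag name MODEL_IRQF_EARLY_RESUME, its value, and the model_* functions are all hypothetical. Interrupts carrying the flag are resumed by a syscore-stage pass, and every other interrupt waits for the later dpm_resume_noirq-stage pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a "resume me early" flag; value is illustrative. */
#define MODEL_IRQF_EARLY_RESUME 0x1u

struct model_irq {
    unsigned int flags;
    int resumed;      /* set once the IRQ has been resumed */
    int resume_order; /* 1-based position in the overall resume sequence */
};

static int resume_seq;

/* Resume every not-yet-resumed IRQ whose early flag matches 'early'. */
static void resume_pass(struct model_irq *irqs, size_t n, int early)
{
    for (size_t i = 0; i < n; i++) {
        int is_early = (irqs[i].flags & MODEL_IRQF_EARLY_RESUME) != 0;
        if (is_early == early && !irqs[i].resumed) {
            irqs[i].resumed = 1;
            irqs[i].resume_order = ++resume_seq;
        }
    }
}

/* Phase 1 (modeled): the syscore resume stage, still inside stop_machine. */
static void model_syscore_resume(struct model_irq *irqs, size_t n)
{
    resume_pass(irqs, n, 1);
}

/* Phase 2 (modeled): the later dpm_resume_noirq() stage. */
static void model_dpm_resume_noirq(struct model_irq *irqs, size_t n)
{
    resume_pass(irqs, n, 0);
}
```

In this model, an IPI such as the resched IPI would carry the flag, so it is live before the stop_machine context is left.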
     

03 Oct, 2011

2 commits

  • As request_percpu_irq() doesn't allow for a percpu interrupt to have
    its type configured (it is generally impossible to configure it on all
    CPUs at once), add a 'type' argument to enable_percpu_irq().

    This allows some low-level, board-specific init code to be switched to
    a generic API.

    [ tglx: Added WARN_ON argument ]

    Signed-off-by: Marc Zyngier
    Cc: Abhijeet Dharmapurikar
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
     
  • The ARM GIC interrupt controller offers per-CPU interrupts (PPIs),
    which are usually used to connect local timers to each core. Each CPU
    has its own private interface to the GIC, and only sees the PPIs that
    are directly connected to it.

    While these timers are separate devices and have a separate interrupt
    line to a core, they all use the same IRQ number.

    For these devices, request_irq() is not the right API: it assumes
    that an IRQ number is visible to a number of CPUs (through the
    affinity setting), which makes it very awkward to express that an IRQ
    number can be handled by all CPUs and yet be a different interrupt
    line on each CPU, requiring a different dev_id cookie to be passed
    back to the handler.

    The *_percpu_irq() functions are designed to overcome these
    limitations by providing a per-CPU dev_id vector:

    int request_percpu_irq(unsigned int irq, irq_handler_t handler,
                           const char *devname, void __percpu *percpu_dev_id);
    void free_percpu_irq(unsigned int, void __percpu *);
    int setup_percpu_irq(unsigned int irq, struct irqaction *new);
    void remove_percpu_irq(unsigned int irq, struct irqaction *act);
    void enable_percpu_irq(unsigned int irq);
    void disable_percpu_irq(unsigned int irq);

    The API has a number of limitations:
    - no interrupt sharing
    - no threading
    - common handler across all the CPUs

    Once the interrupt is requested using setup_percpu_irq() or
    request_percpu_irq(), it must be enabled by each core that wishes its
    local interrupt to be delivered.

    Based on an initial patch by Thomas Gleixner.

    Signed-off-by: Marc Zyngier
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1316793788-14500-2-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
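Here is a minimal userspace model of the per-CPU request/enable split described in the two commits above, assuming a four-CPU system. One IRQ number and one handler are shared, each CPU has its own dev_id cookie, and nothing is delivered on a CPU until that CPU enables the line with its own trigger type. The model_* names, MODEL_NR_CPUS, and the plain array standing in for a __percpu pointer are hypothetical; this does not use the kernel API.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* Shaped like an (irq, dev_id) handler; returns an int result for the model. */
typedef int (*model_handler_t)(unsigned int irq, void *dev_id);

struct model_percpu_irq {
    unsigned int irq;                 /* same IRQ number on every CPU */
    model_handler_t handler;          /* one handler shared by all CPUs */
    void *dev_id[MODEL_NR_CPUS];      /* per-CPU cookie (stands in for __percpu) */
    int enabled[MODEL_NR_CPUS];       /* each CPU enables its own line */
    unsigned int type[MODEL_NR_CPUS]; /* trigger type set per CPU at enable time */
};

/* Request: record the shared handler and per-CPU cookies; nothing is enabled. */
static void model_request_percpu_irq(struct model_percpu_irq *p, unsigned int irq,
                                     model_handler_t handler, void **cookies)
{
    p->irq = irq;
    p->handler = handler;
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        p->dev_id[cpu] = cookies[cpu];
        p->enabled[cpu] = 0;
        p->type[cpu] = 0;
    }
}

/* Called on 'cpu' itself: enable the local line and configure its type. */
static void model_enable_percpu_irq(struct model_percpu_irq *p, int cpu, unsigned int type)
{
    p->enabled[cpu] = 1;
    p->type[cpu] = type;
}

/* Deliver on 'cpu'; returns -1 if that CPU has not enabled the interrupt. */
static int model_deliver(const struct model_percpu_irq *p, int cpu)
{
    if (!p->enabled[cpu])
        return -1;
    return p->handler(p->irq, p->dev_id[cpu]);
}

/* Example handler: the cookie points at this CPU's local timer id. */
static int model_local_timer_handler(unsigned int irq, void *dev_id)
{
    (void)irq;
    return *(int *)dev_id;
}
```

Keeping the cookie per CPU is what lets one shared handler tell the local timers apart while they all use the same IRQ number.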
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

15 Jun, 2011

1 commit

  • Commit a26ac2455ffcf3 ("rcu: move TREE_RCU from softirq to kthread")
    introduced a performance regression. In an AIM7 test, this commit
    degraded performance by about 40%.

    The commit runs RCU callbacks in a kthread instead of softirq. We
    observed a high context-switch rate caused by this. Our test system has
    64 CPUs and HZ is 1000, so we saw more than 64k context switches per
    second caused by RCU's per-CPU kthreads. A trace showed that most of
    the time the RCU per-CPU kthread doesn't actually handle any callbacks,
    but instead just does a very small amount of work handling grace periods.
    This means that RCU's per-CPU kthreads are making the scheduler do quite
    a bit of work in order to allow a very small amount of RCU-related
    processing to be done.

    Alex Shi's analysis determined that this slowdown is due to lock
    contention within the scheduler. Unfortunately, as Peter Zijlstra points
    out, the scheduler's real-time semantics require global action, which
    means that this contention is inherent in real-time scheduling. (Yes,
    perhaps someone will come up with a workaround -- otherwise, -rt is not
    going to do well on large SMP systems -- but this patch will work around
    this issue in the meantime. And "the meantime" might well be forever.)

    This patch therefore re-introduces softirq processing to RCU, but only
    for core RCU work. RCU callbacks are still executed in kthread context,
    so that only a small amount of RCU work runs in softirq context in the
    common case. This should minimize ksoftirqd execution, allowing us to
    skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.

    Signed-off-by: Shaohua Li
    Tested-by: "Alex,Shi"
    Signed-off-by: Paul E. McKenney

    Shaohua Li
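The context-switch arithmetic is easy to check: if every tick on every CPU wakes the per-CPU kthread, 64 CPUs at HZ=1000 give 64 × 1000 = 64,000 wakeups per second, in line with the rate reported above. The toy model below contrasts that with doing the grace-period bookkeeping in softirq and waking the kthread only when callbacks are ready. It assumes RCU work is needed on every tick and ignores everything else RCU does; the model_* names are hypothetical.

```c
#include <assert.h>

/* Toy per-CPU state for one CPU. */
struct model_rcu_cpu {
    int ready_callbacks;           /* callbacks whose grace period has ended */
    unsigned long kthread_wakeups; /* each wakeup costs context switches */
    unsigned long softirq_runs;
};

/* Old scheme (modeled): each tick wakes the kthread, which does the
 * grace-period bookkeeping and invokes any ready callbacks. */
static void model_tick_kthread_only(struct model_rcu_cpu *c)
{
    c->kthread_wakeups++;
    c->ready_callbacks = 0;
}

/* New scheme (modeled): bookkeeping runs in softirq; the kthread is woken
 * only when there are callbacks ready to invoke. */
static void model_tick_softirq_core(struct model_rcu_cpu *c)
{
    c->softirq_runs++;
    if (c->ready_callbacks > 0) {
        c->kthread_wakeups++;
        c->ready_callbacks = 0;
    }
}
```

Over 1000 ticks with callbacks ready on only 10 of them, the old scheme wakes the kthread 1000 times and the new one 10 times.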
     

06 May, 2011

1 commit

  • If RCU priority boosting is to be meaningful, callback invocation must
    be boosted in addition to preempted RCU readers. Otherwise, in the
    presence of CPU-bound real-time threads, the grace period ends, but the
    callbacks don't get invoked. If the callbacks don't get invoked, the
    associated memory doesn't get freed, so the system is still subject to OOM.

    But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
    moves the callback invocations to a kthread, which can be boosted easily.

    Also add comments and properly synchronize all accesses to
    rcu_cpu_kthread_task, as suggested by Lai Jiangshan.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     

31 Mar, 2011

1 commit


30 Mar, 2011

1 commit


16 Mar, 2011

2 commits

  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits)
    x86: Enable forced interrupt threading support
    x86: Mark low level interrupts IRQF_NO_THREAD
    x86: Use generic show_interrupts
    x86: ioapic: Avoid redundant lookup of irq_cfg
    x86: ioapic: Use new move_irq functions
    x86: Use the proper accessors in fixup_irqs()
    x86: ioapic: Use irq_data->state
    x86: ioapic: Simplify irq chip and handler setup
    x86: Cleanup the genirq name space
    genirq: Add chip flag to force mask on suspend
    genirq: Add desc->irq_data accessor
    genirq: Add comments to Kconfig switches
    genirq: Fixup fasteoi handler for oneshot mode
    genirq: Provide forced interrupt threading
    sched: Switch wait_task_inactive to schedule_hrtimeout()
    genirq: Add IRQF_NO_THREAD
    genirq: Allow shared oneshot interrupts
    genirq: Prepare the handling of shared oneshot interrupts
    genirq: Make warning in handle_percpu_event useful
    x86: ioapic: Move trigger defines to io_apic.h
    ...

    Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name
    space changes clashing with the Xen cleanups. The set_irq_msi() had
    moved to xen_bind_pirq_msi_to_irq().

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
    sched: Resched proper CPU on yield_to()
    sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
    sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks
    sched: Clean up the IRQ_TIME_ACCOUNTING code
    sched: Add #ifdef around irq time accounting functions
    sched, autogroup: Stop claiming ownership of the root task group
    sched, autogroup: Stop going ahead if autogroup is disabled
    sched, autogroup, sysctl: Use proc_dointvec_minmax() instead
    sched: Fix the group_imb logic
    sched: Clean up some f_b_g() comments
    sched: Clean up remnants of sd_idle
    sched: Wholesale removal of sd_idle logic
    sched: Add yield_to(task, preempt) functionality
    sched: Use a buddy to implement yield_task_fair()
    sched: Limit the scope of clear_buddies
    sched: Check the right ->nr_running in yield_task_fair()
    sched: Avoid expensive initial update_cfs_load(), on UP too
    sched: Fix switch_from_fair()
    sched: Simplify the idle scheduling class
    softirqs: Account ksoftirqd time as cpustat softirq
    ...

    Linus Torvalds
     

26 Feb, 2011

3 commits

  • Add a commandline parameter "threadirqs" which forces all interrupts except
    those marked IRQF_NO_THREAD to run threaded. That's mostly a debug option to
    allow retrieving better debug data from crashing interrupt handlers. If
    "threadirqs" is not enabled on the kernel command line, then there is no
    impact in the interrupt hotpath.

    Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after
    marking the interrupts which can't be threaded IRQF_NO_THREAD. All
    interrupts which have IRQF_TIMER set are implicitly marked
    IRQF_NO_THREAD. Also, all PER_CPU interrupts are excluded.

    Force-threading hard interrupts also forces all soft interrupt
    handling into thread context.

    When enabled, it might slow things down a bit, but for debugging
    problems in interrupt code it's a reasonable penalty, as it does not
    immediately crash and burn the machine when an interrupt handler is buggy.

    Some test results on a Core2Duo machine:

    Cache cold run of:
    # time git grep irq_desc

                  non-threaded    threaded
    real          1m18.741s       1m19.061s
    user          0m1.874s        0m1.757s
    sys           0m5.843s        0m5.427s

    # iperf -c server
    non-threaded
    [ 3] 0.0-10.0 sec 1.09 GBytes 933 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 933 Mbits/sec
    threaded
    [ 3] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec
    [ 3] 0.0-10.0 sec 1.09 GBytes 937 Mbits/sec

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    LKML-Reference:

    Thomas Gleixner
     
  • Some low-level interrupts cannot be threaded even when we force-thread
    all interrupt handlers. Add a flag to annotate such interrupts. Add
    all timer interrupts to this category by default.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    LKML-Reference:

    Thomas Gleixner
     
  • For level-type interrupts we need to track how many threads are in
    flight to avoid useless interrupt storms when not all thread handlers
    have finished yet. Keep track of the woken threads and only unmask
    when there are no more threads in flight.

    Yes, I'm lazy and using a bitfield. But that's not only because I'm
    lazy; the main reason is that it's way simpler than using a refcount.
    A refcount-based solution would need to keep track of various things
    like crashing the irq thread, spurious interrupts coming in,
    disables/enables, free_irq() and some more. The bitfield keeps the
    tracking simple and makes things just work. It's also nicely confined
    to the thread code paths and does not require additional checks all
    over the place.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    LKML-Reference:

    Thomas Gleixner
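Here is a compact userspace model of the three changes above. It is a sketch with hypothetical names and illustrative flag values (not the kernel's). It covers which interrupts "threadirqs" would force-thread, and how a bitfield of woken threads decides when a shared oneshot line may be unmasked. For simplicity the timer flag here only implies NO_THREAD.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits for this model only; the values are not the kernel's. */
#define M__IRQF_TIMER     0x1u
#define M_IRQF_PERCPU     0x2u
#define M_IRQF_NO_THREAD  0x4u
/* Timer interrupts are implicitly NO_THREAD. */
#define M_IRQF_TIMER      (M__IRQF_TIMER | M_IRQF_NO_THREAD)

/* Would "threadirqs" force this handler into a thread? Never without the
 * option; with it, everything except NO_THREAD and per-CPU interrupts. */
static bool model_force_threaded(bool threadirqs, unsigned int flags)
{
    if (!threadirqs)
        return false;
    return (flags & (M_IRQF_NO_THREAD | M_IRQF_PERCPU)) == 0;
}

/* Oneshot tracking for a shared level-type line: bit i set = thread i in flight. */
struct model_oneshot_line {
    unsigned long threads_in_flight;
    bool masked;
};

/* Hard interrupt: keep the line masked and record which threads were woken. */
static void model_hardirq(struct model_oneshot_line *l, unsigned long woken)
{
    l->masked = true;
    l->threads_in_flight |= woken;
}

/* A thread handler finished: clear its bit; unmask only when none remain. */
static void model_thread_done(struct model_oneshot_line *l, unsigned int thread)
{
    l->threads_in_flight &= ~(1UL << thread);
    if (l->threads_in_flight == 0)
        l->masked = false;
}
```

As the commit argues, the bitfield needs no extra bookkeeping for the unmask decision: the line stays masked exactly while any bit is set.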
     

19 Feb, 2011

3 commits


08 Feb, 2011

2 commits


26 Jan, 2011

1 commit


23 Jan, 2011

1 commit

  • When initiating I/O on a multiqueue and multi-IRQ device, we may want
    to select a queue for which the response will be handled on the same
    or a nearby CPU. This requires a reverse-map of IRQ affinity. Add a
    notification mechanism to support this.

    This is based closely on work by Thomas Gleixner.

    Signed-off-by: Ben Hutchings
    Cc: linux-net-drivers@solarflare.com
    Cc: Tom Herbert
    Cc: David Miller
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Ben Hutchings
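To illustrate the reverse map, here is a userspace sketch assuming four CPUs and two queues, where each queue's completions arrive on its own IRQ. A notifier callback, invoked whenever that IRQ's affinity changes, rewrites the CPU-to-queue entries, and queue selection reads them. The notifier signature and the model_* names are hypothetical; the kernel's own registration interface is not modeled.

```c
#include <assert.h>

#define M_NR_CPUS 4

/* Hypothetical notifier shape: called with a queue's new IRQ affinity mask. */
typedef void (*model_affinity_notify_t)(void *ctx, unsigned int queue,
                                        unsigned long new_mask);

struct model_dev {
    int cpu_to_queue[M_NR_CPUS]; /* reverse map: CPU -> queue whose IRQ it handles */
};

/* Notifier callback: rebuild the reverse-map entries for one queue's IRQ. */
static void model_on_affinity_change(void *ctx, unsigned int queue,
                                     unsigned long new_mask)
{
    struct model_dev *dev = ctx;
    for (int cpu = 0; cpu < M_NR_CPUS; cpu++) {
        if (dev->cpu_to_queue[cpu] == (int)queue)
            dev->cpu_to_queue[cpu] = -1;         /* forget stale entries */
        if (new_mask & (1UL << cpu))
            dev->cpu_to_queue[cpu] = (int)queue; /* completions land here now */
    }
}

/* Pick a queue for I/O issued on 'cpu'; fall back to queue 0 if none maps. */
static unsigned int model_select_queue(const struct model_dev *dev, int cpu)
{
    int q = dev->cpu_to_queue[cpu];
    return q >= 0 ? (unsigned int)q : 0;
}

/* Simulate the IRQ core changing an affinity and calling the notifier. */
static void model_set_affinity(model_affinity_notify_t notify, void *ctx,
                               unsigned int queue, unsigned long mask)
{
    notify(ctx, queue, mask);
}
```

Because the map is updated from the notification rather than polled, queue selection stays a cheap array lookup on the I/O path.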
     

10 Nov, 2010

1 commit

  • We currently use the kmalloc-96 slab for struct irqaction allocations
    on 64-bit arches.

    This is unfortunate because of possible false sharing and two
    cache-line accesses.

    Move the 'name' and 'dir' fields to the end of the structure, and force
    a suitable alignment.

    Hot-path fields now use one cache line on x86_64.

    Signed-off-by: Eric Dumazet
    Reviewed-by: Andi Kleen
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Eric Dumazet
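The layout change can be checked with offsetof(). The struct below is a userspace model of the reordered irqaction, with kernel-only types replaced by void * and the cache-line annotation approximated by alignas(64). Treat the exact field list as an assumption rather than a copy of the kernel header. On an LP64 target such as x86_64, everything up to thread_flags fits in the first 64 bytes, and name starts the second cache line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

typedef int (*model_irq_handler_t)(int irq, void *dev_id);

/* Hot fields first, cold ones (name, dir) last; alignas(64) on the first
 * member gives the whole struct 64-byte (cache-line) alignment. */
struct model_irqaction {
    alignas(64) model_irq_handler_t handler; /* hot: called on every interrupt */
    unsigned long flags;
    void *dev_id;
    struct model_irqaction *next;            /* hot: walked for shared lines */
    int irq;
    model_irq_handler_t thread_fn;
    void *thread;                            /* stands in for struct task_struct * */
    unsigned long thread_flags;
    const char *name;                        /* cold: /proc and messages only */
    void *dir;                               /* stands in for struct proc_dir_entry * */
};
```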
     

28 Oct, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    perf python scripting: Add futex-contention script
    perf python scripting: Fixup cut'n'paste error in sctop script
    perf scripting: Shut up 'perf record' final status
    perf record: Remove newline character from perror() argument
    perf python scripting: Support fedora 11 (audit 1.7.17)
    perf python scripting: Improve the syscalls-by-pid script
    perf python scripting: print the syscall name on sctop
    perf python scripting: Improve the syscalls-counts script
    perf python scripting: Improve the failed-syscalls-by-pid script
    kprobes: Remove redundant text_mutex lock in optimize
    x86/oprofile: Fix uninitialized variable use in debug printk
    tracing: Fix 'faild' -> 'failed' typo
    perf probe: Fix format specified for Dwarf_Off parameter
    perf trace: Fix detection of script extension
    perf trace: Use $PERF_EXEC_PATH in canned report scripts
    perf tools: Document event modifiers
    perf tools: Remove direct slang.h include
    perf_events: Fix for transaction recovery in group_sched_in()
    perf_events: Revert: Fix transaction recovery in group_sched_in()
    perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations
    ...

    Linus Torvalds
     

24 Oct, 2010

1 commit


23 Oct, 2010

1 commit

  • … 'x86-quirks-for-linus', 'x86-setup-for-linus', 'x86-uv-for-linus' and 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'softirq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    softirqs: Make wakeup_softirqd static

    * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, asm: Restore parentheses around one pushl_cfi argument
    x86, asm: Fix ancient-GAS workaround
    x86, asm: Fix CFI macro invocations to deal with shortcomings in gas

    * 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA

    * 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: HPET force enable for CX700 / VIA Epia LT

    * 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, setup: Use string copy operation to optimze copy in kernel compression

    * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, UV: Use allocated buffer in tlb_uv.c:tunables_read()

    * 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.

    Linus Torvalds
     

22 Oct, 2010

1 commit

  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (96 commits)
    apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets
    apic, x86: Check if EILVT APIC registers are available (AMD only)
    x86: ioapic: Call free_irte only if interrupt remapping enabled
    arm: Use ARCH_IRQ_INIT_FLAGS
    genirq, ARM: Fix boot on ARM platforms
    genirq: Fix CONFIG_GENIRQ_NO_DEPRECATED=y build
    x86: Switch sparse_irq allocations to GFP_KERNEL
    genirq: Switch sparse_irq allocator to GFP_KERNEL
    genirq: Make sparse_lock a mutex
    x86: lguest: Use new irq allocator
    genirq: Remove the now unused sparse irq leftovers
    genirq: Sanitize dynamic irq handling
    genirq: Remove arch_init_chip_data()
    x86: xen: Sanitise sparse_irq handling
    x86: Use sane enumeration
    x86: uv: Clean up the direct access to irq_desc
    x86: Make io_apic.c local functions static
    genirq: Remove irq_2_iommu
    x86: Speed up the irq_remapped check in hot pathes
    intr_remap: Simplify the code further
    ...

    Fix up trivial conflicts in arch/x86/Kconfig

    Linus Torvalds
     

21 Oct, 2010

1 commit

  • With the addition of trace_softirq_raise() the softirq tracepoint got
    even more convoluted. Why the tracepoints take two pointers to assign
    an integer is beyond my comprehension.

    But adding an extra case which treats the first pointer as an unsigned
    long when the second pointer is NULL, including the back-and-forth
    type casting, is just horrible.

    Convert the softirq tracepoints to take a single unsigned int argument
    for the softirq vector number and fix the call sites.

    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Acked-by: Peter Zijlstra
    Acked-by: mathieu.desnoyers@efficios.com
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt

    Thomas Gleixner
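With a single unsigned int argument, the event only has to record the vector number and look up a name for printing. The sketch below models that in userspace. The vector-name table assumes the softirq order of kernels from this period (HI first, RCU last). The "vec=%u [action=%s]" output format is modeled on the tracepoint rather than copied from it, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed vector order: HI = 0 ... RCU = 9. */
static const char *const model_softirq_names[] = {
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
    "TASKLET", "SCHED", "HRTIMER", "RCU",
};
#define MODEL_NR_SOFTIRQS (sizeof(model_softirq_names) / sizeof(model_softirq_names[0]))

/* The event takes just the vector number; no pointer pair to decode. */
static int model_trace_softirq_entry(char *buf, size_t len, unsigned int vec_nr)
{
    const char *name = vec_nr < MODEL_NR_SOFTIRQS ? model_softirq_names[vec_nr]
                                                  : "UNKNOWN";
    return snprintf(buf, len, "vec=%u [action=%s]", vec_nr, name);
}
```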
     

12 Oct, 2010

1 commit


22 Sep, 2010

1 commit


07 Sep, 2010

1 commit

  • Add a tracepoint for tracing when softirq action is raised.

    Together with the existing tracepoints, this completes the set of
    softirq tracepoints: softirq_raise, softirq_entry and softirq_exit.

    When this tracepoint is used in combination with the softirq_entry
    tracepoint, we can determine the softirq raise latency.

    Signed-off-by: Lai Jiangshan
    Acked-by: Mathieu Desnoyers
    Acked-by: Neil Horman
    Cc: David Miller
    Cc: Kaneshige Kenji
    Cc: Izumo Taku
    Cc: Kosaki Motohiro
    Cc: Lai Jiangshan
    Cc: Scott Mcmillan
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    LKML-Reference:
    [ factorize softirq events with DECLARE_EVENT_CLASS ]
    Signed-off-by: Koki Sanagi
    Signed-off-by: Frederic Weisbecker

    Lai Jiangshan
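The raise-latency calculation amounts to pairing each softirq_raise with the next softirq_entry for the same vector. Here is a minimal userspace sketch. It assumes nanosecond timestamps greater than zero (zero marks "nothing pending") and that the first raise of a still-pending vector is the one that counts. The model_* names are hypothetical.

```c
#include <assert.h>

#define M_NR_SOFTIRQS 10

/* Timestamp (ns) of the pending raise per vector; 0 = nothing pending. */
static unsigned long long model_raise_ts[M_NR_SOFTIRQS];

/* softirq_raise: remember when this vector was raised (first raise wins). */
static void model_on_softirq_raise(unsigned int vec, unsigned long long now_ns)
{
    if (model_raise_ts[vec] == 0)
        model_raise_ts[vec] = now_ns;
}

/* softirq_entry: latency = entry time - raise time, then clear the stamp.
 * Returns 0 if no raise was recorded for this vector. */
static unsigned long long model_on_softirq_entry(unsigned int vec,
                                                 unsigned long long now_ns)
{
    unsigned long long raised = model_raise_ts[vec];
    model_raise_ts[vec] = 0;
    return raised ? now_ns - raised : 0;
}
```

Keeping the first raise rather than the last measures how long the oldest pending request waited, which is usually the figure of interest.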
     

29 Jul, 2010

1 commit

  • A small number of users of IRQF_TIMER are using it for the implied
    no-suspend behaviour on interrupts which are not timer interrupts.

    Therefore add a new IRQF_NO_SUSPEND flag, rename IRQF_TIMER to
    __IRQF_TIMER and redefine IRQF_TIMER in terms of these new flags.

    Signed-off-by: Ian Campbell
    Cc: Jeremy Fitzhardinge
    Cc: Dmitry Torokhov
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Grant Likely
    Cc: xen-devel@lists.xensource.com
    Cc: linux-input@vger.kernel.org
    Cc: linuxppc-dev@ozlabs.org
    Cc: devicetree-discuss@lists.ozlabs.org
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Ian Campbell
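The flag split can be expressed in a few lines. In this sketch the bit values are illustrative, and the M_ prefix marks names that are not the kernel's. The point is only that IRQF_TIMER now implies IRQF_NO_SUSPEND, so a non-timer interrupt can ask for the no-suspend behaviour directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values for this sketch; not copied from the kernel header. */
#define M__IRQF_TIMER      0x1u
#define M_IRQF_NO_SUSPEND  0x2u
/* IRQF_TIMER redefined in terms of the new flags. */
#define M_IRQF_TIMER       (M__IRQF_TIMER | M_IRQF_NO_SUSPEND)

/* Suspend-time check: should this interrupt be disabled while suspending? */
static bool model_disable_on_suspend(unsigned int flags)
{
    return (flags & M_IRQF_NO_SUSPEND) == 0;
}
```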
     

22 May, 2010

1 commit

  • Fix a kernel-doc fatal error caused by /** beginning a non-kernel-doc
    comment block. (That alone does not kill kernel-doc, but the 'enum' was
    totally confusing to it.)

    Error(/lnx/src/TMP/linux-2.6.34-git6//include/linux/interrupt.h:88): cannot understand prototype: 'enum '
    make[2]: *** [Documentation/DocBook/genericirq.xml] Error 1

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

03 May, 2010

1 commit

  • This patch adds a cpumask affinity hint to the irq_desc structure,
    along with a registration function and a read-only proc entry for each
    interrupt.

    This affinity_hint handle for each interrupt can be used by underlying
    drivers that need a better mechanism to control interrupt affinity.
    The underlying driver can register a cpumask for the interrupt, which
    will allow the driver to provide the CPU mask for the interrupt to
    anything that requests it. The intent is to extend the userspace
    daemon, irqbalance, so that it can use this hint as the preferred CPU
    mask to balance the interrupt into.

    [ tglx: Fixed compile warnings, added WARN_ON, made SMP only ]

    Signed-off-by: Peter P Waskiewicz Jr
    Cc: davem@davemloft.net
    Cc: arjan@linux.jf.intel.com
    Cc: bhutchings@solarflare.com
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Peter P Waskiewicz Jr
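Here is a userspace sketch of the register/read split. It assumes at most 32 CPUs so the mask fits in an unsigned int, and a plain 8-digit hex line as the read-only output (the real proc formatting may differ). The model_* names are hypothetical. In the kernel the driver passes a struct cpumask rather than an integer, and here a zero mask simply means "no hint".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define M_NR_IRQS 16

/* Per-IRQ hint storage; bit n = CPU n. Zero means no hint registered. */
static unsigned int model_hint_mask[M_NR_IRQS];

/* Driver side: register (or clear, with 0) a preferred CPU mask. */
static int model_set_affinity_hint(unsigned int irq, unsigned int mask)
{
    if (irq >= M_NR_IRQS)
        return -1;
    model_hint_mask[irq] = mask;
    return 0;
}

/* Read side, as a daemon such as irqbalance would consume it: hex mask text. */
static int model_read_affinity_hint(unsigned int irq, char *buf, size_t len)
{
    if (irq >= M_NR_IRQS)
        return -1;
    snprintf(buf, len, "%08x\n", model_hint_mask[irq]);
    return 0;
}
```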
     

13 Apr, 2010

2 commits

  • Remove all code which is related to IRQF_DISABLED from the core kernel
    code. IRQF_DISABLED still exists as a flag, but becomes a NOOP and
    will be removed after a grace period. That way we can easily revert to
    the previous behaviour by just restoring the core code.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Alan Cox
    Cc: Andi Kleen
    Cc: David Miller
    Cc: Greg Kroah-Hartman
    Cc: Arnaldo Carvalho de Melo
    Cc: Linus Torvalds
    LKML-Reference:

    Thomas Gleixner
     
  • Now that we enjoy threaded interrupts, we're starting to see irq_chip
    implementations (wm831x, pca953x) that make use of threaded interrupts
    for the controller, and nested interrupts for the client interrupt. It
    all works very well, with one drawback:

    Drivers requesting an IRQ must now know whether the handler will
    run in a thread context or not, and call request_threaded_irq() or
    request_irq() accordingly.

    The problem is that the requesting driver sometimes doesn't know
    about the nature of the interrupt, especially when the interrupt
    controller is a discrete chip (typically a GPIO expander connected
    over I2C) that can be connected to a wide variety of otherwise perfectly
    supported hardware.

    This patch introduces the request_any_context_irq() function that mostly
    mimics the usual request_irq(), except that it checks whether the irq
    level is configured as nested or not, and calls the right backend.
    On success, it also returns either IRQC_IS_HARDIRQ or IRQC_IS_NESTED.

    [ tglx: Made return value an enum, simplified code and made the export
    of request_any_context_irq GPL ]

    Signed-off-by: Marc Zyngier
    Cc:
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    Marc Zyngier
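The dispatch logic can be sketched in userspace as follows. The m_* stubs stand in for request_irq() and request_threaded_irq() and only record which slot the handler went into. The enum values assigned to the IRQC_IS_* stand-ins are assumptions. A nested line (for example one provided by an I2C GPIO expander) gets the handler as a thread function, and anything else gets it as a hard-IRQ handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the return codes the commit describes; values assumed. */
enum { M_IRQC_IS_HARDIRQ = 0, M_IRQC_IS_NESTED = 1 };

typedef int (*m_handler_t)(unsigned int irq, void *dev_id);

struct m_irq_line {
    bool nested;         /* set by the parent irq_chip driver */
    m_handler_t primary; /* would run in hard-IRQ context */
    m_handler_t thread;  /* would run in the parent's thread context */
};

/* Stubs for request_irq() and request_threaded_irq(irq, NULL, fn, ...). */
static int m_request_irq(struct m_irq_line *d, m_handler_t fn) { d->primary = fn; return 0; }
static int m_request_threaded_irq(struct m_irq_line *d, m_handler_t fn) { d->thread = fn; return 0; }

/* Pick the backend from the line's configuration; report which one was used. */
static int m_request_any_context_irq(struct m_irq_line *d, m_handler_t fn)
{
    int ret;

    if (d->nested) {
        ret = m_request_threaded_irq(d, fn);
        return ret < 0 ? ret : M_IRQC_IS_NESTED;
    }
    ret = m_request_irq(d, fn);
    return ret < 0 ? ret : M_IRQC_IS_HARDIRQ;
}

/* Example client handler. */
static int m_button_handler(unsigned int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return 1;
}
```

Returning the chosen context rather than a plain 0 lets the caller adapt, for example by avoiding sleeping calls when it ended up in hard-IRQ context.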
     

04 Nov, 2009

1 commit


12 Oct, 2009

1 commit


24 Sep, 2009

2 commits


15 Sep, 2009

1 commit

  • * 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
    block: use blkdev_issue_discard in blk_ioctl_discard
    Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
    block: don't assume device has a request list backing in nr_requests store
    block: Optimal I/O limit wrapper
    cfq: choose a new next_req when a request is dispatched
    Seperate read and write statistics of in_flight requests
    aoe: end barrier bios with EOPNOTSUPP
    block: trace bio queueing trial only when it occurs
    block: enable rq CPU completion affinity by default
    cfq: fix the log message after dispatched a request
    block: use printk_once
    cciss: memory leak in cciss_init_one()
    splice: update mtime and atime on files
    block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
    cfq-iosched: get rid of must_alloc flag
    block: use interrupts disabled version of raise_softirq_irqoff()
    block: fix comment in blk-iopoll.c
    block: adjust default budget for blk-iopoll
    block: fix long lines in block/blk-iopoll.c
    block: add blk-iopoll, a NAPI like approach for block devices
    ...

    Linus Torvalds
     

11 Sep, 2009

1 commit

  • This borrows some code from NAPI and implements a polled completion
    mode for block devices. The idea is the same as NAPI - instead of
    doing the command completion when the irq occurs, schedule a dedicated
    softirq in the hopes that we will complete more IO when the iopoll
    handler is invoked. Devices have a budget of commands assigned, and will
    stay in polled mode as long as they continue to consume their budget
    from the iopoll softirq handler. If they do not, the device is set back
    to interrupt completion mode.

    This patch holds the core bits for blk-iopoll, device driver support
    sold separately.

    Signed-off-by: Jens Axboe

    Jens Axboe
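The budget rule can be written directly as code. Here is a userspace sketch with hypothetical m_* names and a weight of 32 picked arbitrarily for the example. An interrupt switches the device into polled mode, and each poll pass reaps up to the budget. The device stays polled only while it keeps consuming the whole budget.

```c
#include <assert.h>
#include <stdbool.h>

struct m_iopoll_dev {
    int pending; /* completed commands waiting to be reaped */
    int weight;  /* budget per poll invocation */
    bool polled; /* true = polled mode, false = interrupt mode */
};

/* Interrupt arrives: instead of completing here, switch to polled mode
 * (this is where the dedicated softirq would be scheduled). */
static void m_iopoll_interrupt(struct m_iopoll_dev *d)
{
    d->polled = true;
}

/* One poll pass: reap up to 'weight' completions. If the whole budget was
 * consumed, stay polled; otherwise go back to interrupt mode. */
static int m_iopoll(struct m_iopoll_dev *d)
{
    int done = d->pending < d->weight ? d->pending : d->weight;

    d->pending -= done;
    if (done < d->weight)
        d->polled = false;
    return done;
}
```

With 70 completions pending and a budget of 32, the passes reap 32, 32, then 6, and the short final pass returns the device to interrupt mode.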