25 Sep, 2014

4 commits

  • * Some comments became stale. Updated.
    * percpu_ref_tryget() unnecessarily initializes @ret. Removed.
    * A blank line removed from percpu_ref_kill_rcu().
    * Explicit function name in a WARN format string replaced with __func__.
    * WARN_ON() in percpu_ref_reinit() converted to WARN_ON_ONCE().

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet

    Tejun Heo
     
  • percpu_ref is going to go through restructuring. Move
    percpu_ref_reinit() after percpu_ref_kill_and_confirm(). This will
    make later changes easier to follow and result in cleaner
    organization.

    Signed-off-by: Tejun Heo
    Reviewed-by: Kent Overstreet

    Tejun Heo
     
  • This reverts commit 0a30288da1aec914e158c2d7a3482a85f632750f, which
    was a temporary fix for SCSI blk-mq stall issue. The following
    patches will fix the issue properly by introducing atomic mode to
    percpu_ref.

    Signed-off-by: Tejun Heo
    Cc: Kent Overstreet
    Cc: Jens Axboe
    Cc: Christoph Hellwig

    Tejun Heo
     
  • …linux-block into for-3.18

    This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a
    kludge for SCSI blk-mq stall during probe") which implements
    __percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The
    commit will be reverted and patches implementing a proper fix will be added.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Kent Overstreet <kmo@daterainc.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Christoph Hellwig <hch@lst.de>

    Tejun Heo
     

24 Sep, 2014

11 commits

  • blk-mq uses percpu_ref for its usage counter, which tracks the number
    of in-flight commands and is used to synchronously drain the queue on
    freeze. percpu_ref shutdown takes measurable wallclock time as it
    involves a sched RCU grace period. This means that draining a blk-mq
    takes measurable wallclock time. One would think that this shouldn't
    matter as queue shutdown should be a rare event which takes place
    asynchronously w.r.t. userland.

    Unfortunately, SCSI probing involves synchronously setting up and then
    tearing down a lot of request_queues back-to-back for non-existent
    LUNs. This means that SCSI probing may take more than ten seconds
    when scsi-mq is used.

    This will be properly fixed by implementing a mechanism to keep
    q->mq_usage_counter in atomic mode till genhd registration; however,
    that involves rather big updates to percpu_ref which is difficult to
    apply late in the devel cycle (v3.17-rc6 at the moment). As a
    stop-gap measure till the proper fix can be implemented in the next
    cycle, this patch introduces __percpu_ref_kill_expedited() and makes
    blk_mq_freeze_queue() use it. This is heavy-handed but should work
    for testing the experimental SCSI blk-mq implementation.

    Signed-off-by: Tejun Heo
    Reported-by: Christoph Hellwig
    Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
    Fixes: add703fda981 ("blk-mq: use percpu_ref for mq usage count")
    Cc: Kent Overstreet
    Cc: Jens Axboe
    Tested-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Pull infiniband/rdma fixes from Roland Dreier:
    "Last late set of InfiniBand/RDMA fixes for 3.17:

    - fixes for the new memory region re-registration support
    - iSER initiator error path fixes
    - grab bag of small fixes for the qib and ocrdma hardware drivers
    - larger set of fixes for mlx4, especially in RoCE mode"

    * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
    IB/mlx4: Fix VF mac handling in RoCE
    IB/mlx4: Do not allow APM under RoCE
    IB/mlx4: Don't update QP1 in native mode
    IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header
    mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses
    IB/core: When marshaling uverbs path, clear unused fields
    IB/mlx4: Avoid executing gid task when device is being removed
    IB/mlx4: Fix lockdep splat for the iboe lock
    IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up
    IB/mlx4: Reorder steps in RoCE GID table initialization
    IB/mlx4: Don't duplicate the default RoCE GID
    IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs()
    IB/iser: Bump version to 1.4.1
    IB/iser: Allow bind only when connection state is UP
    IB/iser: Fix RX/TX CQ resource leak on error flow
    RDMA/ocrdma: Use right macro in query AH
    RDMA/ocrdma: Resolve L2 address when creating user AH
    mlx4: Correct error flows in rereg_mr
    IB/qib: Correct reference counting in debugfs qp_stats
    IPoIB: Remove unnecessary port query
    ...

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "One fix is about a buggy computation in PCM API function Clemens
    spotted out, but the impact must be really small as no one really uses
    it in user-space side.

    The rest are a trivial fix for a HD-audio model and a USB-audio
    device-specific regression fix, so all look fairly safe to apply"

    * tag 'sound-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: snd-usb-caiaq: Fix LED commands for Kore controller
    ALSA: pcm: fix fifo_size frame calculation
    ALSA: hda - Add fixup model name lookup for Lemote A1205

    Linus Torvalds
     
  • Pull final block fixes from Jens Axboe:
    "This week and last we've been fixing some corner cases related to
    blk-mq, mostly. I ended up pulling most of that out of for-linus
    yesterday, which is why the branch looks fresh. The rest were
    postponed for 3.18.

    This pull request contains:

    - Fix from Christoph, avoiding a stack overflow when FUA insertion
    would recurse infinitely.

    - Fix from David Hildenbrand on races between the timeout handler and
    uninitialized requests. Fixes a real issue that virtio_blk has run
    into.

    - A few fixes from me:

    - Ensure that request deadline/timeout is ordered before the
    request is marked as started.

    - A potential oops on out-of-memory, when we scale the queue
    depth of the device and retry.

    - A hang fix on requeue from SCSI, where the hardware queue
    would be stopped when we attempt to re-run it (and hence
    nothing would happen, stalling progress).

    - A fix for commit 2da78092, where the cleanup path was moved
    to RCU, but a debug might_sleep() was inadvertently left in
    the code. This causes warnings for people"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    genhd: fix leftover might_sleep() in blk_free_devt()
    blk-mq: use blk_mq_start_hw_queues() when running requeue work
    blk-mq: fix potential oops on out-of-memory in __blk_mq_alloc_rq_maps()
    blk-mq: avoid infinite recursion with the FUA flag
    blk-mq: Avoid race condition with uninitialized requests
    blk-mq: request deadline must be visible before marking rq as started

    Linus Torvalds
     
  • Pull parisc fixes from Helge Deller:
    "We avoid using -mfast-indirect-calls for 64bit kernel builds to
    prevent building an unbootable kernel due to latest gcc changes.

    In the pdc_stable/firmware-access driver we fix a few possible stack
    overflows and we now call secure_computing_strict() instead of
    secure_computing() which fixes upcoming SECCOMP patches in the
    for-next trees"

    * 'parisc-3.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds
    parisc: pdc_stable.c: Avoid potential stack overflows
    parisc: pdc_stable.c: Cleaning up unnecessary use of memset in conjunction with strncpy
    parisc: ptrace: use secure_computing_strict()

    Linus Torvalds
     
  • In spite of what the GCC manual says, the -mfast-indirect-calls option has
    never been supported in the 64-bit parisc compiler. Indirect calls have
    always been done using function descriptors irrespective of the
    -mfast-indirect-calls option.

    Recently, it was noticed that a function descriptor was always requested
    when the -mfast-indirect-calls option was specified. This caused
    problems when the option was used in application code and doesn't make
    any sense because the whole point of the option is to avoid using a
    function descriptor for indirect calls.

    Fixing this broke 64-bit kernel builds.

    I will fix GCC but for now we need the attached change. This results in
    the same kernel code as before.

    Signed-off-by: John David Anglin
    Cc: stable@vger.kernel.org # v3.0+
    Signed-off-by: Helge Deller

    John David Anglin
     
  • Pull ia64 defconfig update from Tony Luck:
    "Need to rebuild defconfig files to cope with removal of "select NET"
    in drivers/scsi/Kconfig"

    * tag 'please-pull-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
    [IA64] refresh arch/ia64/configs/* using "make savedefconfig"

    Linus Torvalds
     
  • Pull hwmon fixes from Guenter Roeck:
    - Fix a resource leak in tmp103 driver
    - Add support for two more processors to fam15h_power driver
    - Also fix a bug in the same driver to only report the power level on
    chips which actually support reporting it

    * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
    hwmon: (tmp103) Fix resource leak bug in tmp103 temperature sensor driver
    hwmon: (fam15h_power) Add support for two more processors
    hwmon: (fam15h_power) Make actual power reporting conditional

    Linus Torvalds
     
  • Prompted by a change to drivers/scsi/Kconfig which used to do a
    "select NET" but now does a "depends on NET". This meant that some
    configurations ended up without CONFIG_NET=y

    Signed-off-by: Tony Luck

    Tony Luck
     
  • Pull another kvm fix from Paolo Bonzini:
    "Another fix for 3.17 arrived at just the wrong time, after I had sent
    yesterday's pull request. Normally I would have waited for some other
    patches to pile up, but since 3.17 might be short here it is"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    arm/arm64: KVM: Fix unaligned access bug on gicv2 access

    Linus Torvalds
     
  • Pull cgroup fix from Tejun Heo:
    "One late fix for cgroup.

    I was waiting for another set of fixes for a long-standing obscure
    cpuset bug but am not sure whether they'll be ready before v3.17
    release. This one is a simple fix for a mutex unlock balance bug in
    an allocation failure path in pidlist_array_load().

    The bug was introduced in v3.14 and the fix is tagged for -stable"

    * 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: fix unbalanced locking

    Linus Torvalds
     

23 Sep, 2014

25 commits

  • …/kernel/git/kvmarm/kvmarm into kvm-master

    Fixes unaligned access to the gicv2 virtual cpu status.

    Paolo Bonzini
     
  • This reverts commit 9cb0e394234d244fe5a97e743ec9dd7ddff7e64b.

    It causes my Sony Vaio Pro 11 to immediately reboot at startup.

    Acked-by: Ingo Molnar
    Cc: Peter Anvin
    Cc: Maarten Lankhorst
    Cc: Ard Biesheuvel
    Cc: Matt Fleming
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) If the user gives us a msg_namelen of 0, don't try to interpret
    anything pointed to by msg_name. From Ani Sinha.

    2) Fix some bnx2i/bnx2fc randconfig compilation errors.

    The gist of the issue is that we firstly have drivers that span both
    SCSI and networking. And at the top of that chain of dependencies
    we have things like SCSI_FC_ATTRS and SCSI_NETLINK which are
    selected.

    But since select is a sledgehammer and ignores dependencies,
    everything that selects SCSI_FC_ATTRS and/or SCSI_NETLINK has to also
    explicitly select their dependencies and so on and so forth.

    Generally speaking 'select' is supposed to only be used for child
    nodes, those which have no dependencies of their own. And this
    whole chain of dependencies in the scsi layer violates that rather
    strongly.

    So just make SCSI_NETLINK depend upon its dependencies, and so on
    and so forth for the things selecting it (either directly or
    indirectly).

    From Anish Bhatt and Randy Dunlap.

    3) Fix generation of blackhole routes in IPSEC, from Steffen Klassert.

    4) Actually notice netdev feature changes in rtl_open() code, from
    Hayes Wang.

    5) Fix divide by zero in bond enslaving, from Nikolay Aleksandrov.

    6) Missing memory barrier in sunvnet driver, from David Stevens.

    7) Don't leave anycast addresses around when ipv6 interface is
    destroyed, from Sabrina Dubroca.

    8) Don't call efx_{arch}_filter_sync_rx_mode before addr_list_lock is
    initialized in SFC driver, from Edward Cree.

    9) Fix missing DMA error checking in 3c59x, from Neal Horman.

    10) Openvswitch accidentally stopped emitting OVS_FLOW_CMD_NEW
    notifications, fix from Samuel Gauthier.

    11) pch_gbe needs to select NET_PTP_CLASSIFY otherwise we can get a
    build error.

    12) Fix macvlan regression wherein we stopped emitting
    broadcast/multicast frames over software devices. From Nicolas
    Dichtel.

    13) Fix infiniband bug due to unintended overflow of skb->cb[], from
    Eric Dumazet. And add an assertion so this doesn't happen again.

    14) dm9000_parse_dt() should return error pointers, not NULL. From
    Tobias Klauser.

    15) IP tunneling code uses this_cpu_ptr() in preemptible contexts, fix
    from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
    net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma
    net: bcmgenet: fix TX reclaim accounting for fragments
    ipv4: do not use this_cpu_ptr() in preemptible context
    dm9000: Return an ERR_PTR() in all error conditions of dm9000_parse_dt()
    r8169: fix an if condition
    r8152: disable ALDPS
    ipoib: validate struct ipoib_cb size
    net: sched: shrink struct qdisc_skb_cb to 28 bytes
    tg3: Work around HW/FW limitations with vlan encapsulated frames
    macvlan: allow to enqueue broadcast pkt on virtual device
    pch_gbe: 'select' NET_PTP_CLASSIFY.
    scsi: Use 'depends' with LIBFC instead of 'select'.
    openvswitch: restore OVS_FLOW_CMD_NEW notifications
    genetlink: add function genl_has_listeners()
    lib: rhashtable: remove second linux/log2.h inclusion
    net: allow macvlans to move to net namespace
    3c59x: Fix bad offset spec in skb_frag_dma_map
    3c59x: Add dma error checking and recovery
    sparc: bpf_jit: fix support for ldx/stx mem and SKF_AD_VLAN_TAG
    can: at91_can: add missing prepare and unprepare of the clock
    ...

    Linus Torvalds
     
  • Pull clock layer fixes from Mike Turquette:
    "The fixes for the clock tree are mostly run-time bugs in clock
    drivers.

    The fixes for TI DRA7 remove divide-by-zero errors. The recently
    merged AT91 clock driver fixes some bad error checking and the QCOM
    driver fix restores audio for that platform, a clear regression. A
    list iteration bug in the framework core was hit recently and is fixed
    up here. Finally a compilation warning is fixed for efm32gg, which is
    also a regression fix"

    * tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux:
    clk/efm32gg: fix dt init prototype
    clk: prevent erronous parsing of children during rate change
    clk: rockchip: Fix the clocks for i2c1 and i2c2
    clk: qcom: Fix sdc 144kHz frequency entry
    clk: at91: fix num_parents test in at91sam9260 slow clk implementation
    clk: ti: dra7-atl: Provide error check for incoming parameters in set_rate
    clk: ti: divider: Provide error check for incoming parameters in set_rate

    Linus Torvalds
     
  • …git/dhowells/linux-fs

    Pull fs-cache fixes from David Howells:

    - Put a timeout in releasepage() to deal with a recursive hang between
    the memory allocator, writeback, ext4 and fscache under memory
    pressure.

    - Fix a pair of refcount bugs in the fscache error handling.

    - Remove a couple of unused pagevecs.

    - The cachefiles requirement that the base directory support rename
    should permit rename2 as an alternative - otherwise certain
    filesystems cannot now be used as backing stores (such as ext4).

    * tag 'fscache-fixes-20140917' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
    CacheFiles: Handle rename2
    cachefiles: remove two unused pagevecs.
    FS-Cache: refcount becomes corrupt under vma pressure.
    FS-Cache: Reduce cookie ref count if submit fails.
    FS-Cache: Timeout for releasepage()

    Linus Torvalds
     
  • Florian Fainelli says:

    ====================
    net: bcmgenet: TX reclaim and DMA fixes

    This patch set contains one fix for an accounting problem while reclaiming
    transmitted buffers having fragments, and the second fix is to make sure
    that the DMA shutdown is properly controlled.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We should not be manipulating the DMA_CTRL registers directly by writing
    0 to them to disable DMA. This is an operation that needs to be timed to
    make sure the DMA engines have been properly stopped since their state
    machine stops on a packet boundary, not immediately.

    Make sure that bcmgenet_fini_dma() calls bcmgenet_dma_teardown() to
    ensure a proper DMA engine state. As a result, we need to reorder the
    function bodies to resolve the use dependency.

    Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • The GENET driver supports SKB fragments, and succeeds in transmitting
    them properly, but when reclaiming these transmitted fragments, we will
    only update the count of free buffer descriptors by 1, even for SKBs
    with fragments. This leads to the networking stack thinking it has more
    room than the hardware has when pushing new SKBs, and backing off
    consequently because we return NETDEV_TX_BUSY.

    Fix this by accounting for the SKB nr_frags plus one (itself) and update
    ring->free_bds accordingly with that value for each iteration loop in
    __bcmgenet_tx_reclaim().

    Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
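
    The accounting rule described in the commit above can be sketched in
    plain C. This is a minimal illustration, not the driver's code: the
    toy_skb and toy_ring types and the toy_xmit()/toy_reclaim() helpers
    are hypothetical stand-ins. The only point shown is that reclaim must
    return nr_frags + 1 descriptors, the same number that transmit consumed.

```c
/* Hypothetical, simplified stand-ins for the driver's structures. */
struct toy_skb {
    unsigned int nr_frags;   /* page fragments in addition to the linear part */
};

struct toy_ring {
    unsigned int size;       /* total buffer descriptors in the ring */
    unsigned int free_bds;   /* descriptors currently available */
};

/* Transmit: an skb consumes one descriptor for itself plus one per fragment. */
static int toy_xmit(struct toy_ring *ring, const struct toy_skb *skb)
{
    unsigned int needed = skb->nr_frags + 1;

    if (ring->free_bds < needed)
        return -1;           /* the real driver would report a busy ring here */
    ring->free_bds -= needed;
    return 0;
}

/* Reclaim: give back exactly what transmit took, i.e. nr_frags + 1. */
static void toy_reclaim(struct toy_ring *ring, const struct toy_skb *skb)
{
    ring->free_bds += skb->nr_frags + 1;
}
```

    With a ring of 8 descriptors and an skb carrying 3 fragments,
    toy_xmit() leaves 4 free descriptors and toy_reclaim() restores all 8.
    Crediting only 1 on reclaim would leave the counter out of step with
    the hardware.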
     
  • this_cpu_ptr() in preemptible context is generally bad

    Sep 22 05:05:55 br kernel: [ 94.608310] BUG: using smp_processor_id() in preemptible [00000000] code: ip/2261
    Sep 22 05:05:55 br kernel: [ 94.608316] caller is tunnel_dst_set.isra.28+0x20/0x60 [ip_tunnel]
    Sep 22 05:05:55 br kernel: [ 94.608319] CPU: 3 PID: 2261 Comm: ip Not tainted 3.17.0-rc5 #82

    We can simply use raw_cpu_ptr(), as preemption is safe in these
    contexts.

    Should fix https://bugzilla.kernel.org/show_bug.cgi?id=84991

    Signed-off-by: Eric Dumazet
    Reported-by: Joe
    Fixes: 9a4aa9af447f ("ipv4: Use percpu Cache route in IP tunnels")
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We were using an atomic bitop on the vgic_v2.vgic_elrsr field which was
    not aligned to the natural size on 64-bit platforms. This bug showed up
    after QEMU correctly identifies the pl011 line as being level-triggered,
    and not edge-triggered.

    These data structures are protected by a spinlock so simply use a
    non-atomic version of the accessor instead.

    Tested-by: Joel Schopp
    Reported-by: Riku Voipio
    Signed-off-by: Christoffer Dall

    Christoffer Dall
     
  • Commit 2da78092 changed the locking from a mutex to a spinlock,
    so we no longer sleep in this context. But there was a leftover
    might_sleep() in there, which now triggers since we do the final
    free from an RCU callback. Get rid of it.

    Reported-by: Pontus Fuchs
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2014-09-22

    We generate a blackhole or queueing route if a packet
    matches an IPsec policy but a state can't be resolved.
    Here we assume that dst_output() is called to kill
    these packets. Unfortunately this assumption is not
    true in all cases, so it is possible that these packets
    leave the system without the necessary transformations.

    This pull request contains two patches to fix this issue:

    1) Fix for blackhole routed packets.

    2) Fix for queue routed packets.

    Both patches are serious stable candidates.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • In one error condition dm9000_parse_dt() returns NULL, however the
    return value is checked using IS_ERR() in dm9000_probe(), leading to the
    error not being properly propagated if CONFIG_OF is not enabled or the
    device tree data is not available. Fix this by also returning an
    ERR_PTR() in this case.

    Fixes: 0b8bf1baabe5 (net: dm9000: Allow instantiation using device tree)
    Signed-off-by: Tobias Klauser
    Signed-off-by: David S. Miller

    Tobias Klauser
     
  • There is an extra semi-colon so __rtl8169_set_features() is called every
    time.

    Fixes: 929a031dfd62 ('r8169: adjust __rtl8169_set_features')
    Signed-off-by: Dan Carpenter
    Acked-by: Hayes Wang
    Signed-off-by: David S. Miller

    Dan Carpenter
     
  • Pull KVM fixes from Paolo Bonzini:
    "Two very simple bugfixes, affecting all supported architectures"

    [ Two? There's three commits in here. Oh well, I guess Paolo didn't
    count the preparatory symbol export ]

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: correct null pid check in kvm_vcpu_yield_to()
    KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()
    mm: export symbol dependencies of is_zero_pfn()

    Linus Torvalds
     
  • If the hw is in ALDPS mode, it may not respond to accesses to most
    registers. Therefore, ALDPS should be disabled before accessing the
    hw in rtl_ops.init(), rtl_ops.disable(), rtl_ops.up(), and
    rtl_ops.down(). rtl_ops.enable() needs no such handling, because the
    hw does not enter ALDPS mode while the link is up. The hw enters
    ALDPS mode several seconds after a link down occurs when ALDPS is
    enabled.

    Signed-off-by: Hayes Wang
    Signed-off-by: David S. Miller

    hayeswang
     
  • To catch future errors sooner.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
    or increasing skb->cb[] size.

    Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
    skb_flow_dissect()") broke IPoIB.

    The only current offender is sch_choke, and it does not need an
    absolutely precise flow key.

    If we store 17 bytes of flow key, it's more than enough. (It's the actual
    size of flow_keys if it was a packed structure, but we might add new
    fields at the end of it later)

    Signed-off-by: Eric Dumazet
    Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
    Signed-off-by: David S. Miller

    Eric Dumazet
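
    The size constraint discussed above, and the idea of catching a
    violation at build time, can be sketched with a compile-time
    assertion. The struct layouts below are hypothetical approximations
    for illustration. The 48-byte scratch area and the 20-byte hardware
    address are assumptions made here, not values taken from the commit
    text; only the 28-byte figure comes from the commit title.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SKB_CB_SIZE 48   /* assumption: size of the per-packet scratch area */
#define TOY_HWADDR_LEN  20   /* assumption: link-layer address length */

/* Hypothetical qdisc control block, kept at 28 bytes. */
struct toy_qdisc_cb {
    uint32_t pkt_len;
    uint16_t slave_dev_queue_mapping;
    uint16_t pad;
    unsigned char data[20];
};

/* Hypothetical layered control block that embeds the qdisc one. */
struct toy_ipoib_cb {
    struct toy_qdisc_cb qdisc_cb;
    unsigned char hwaddr[TOY_HWADDR_LEN];
};

/* Growing either struct past its budget now fails the build, not the runtime. */
_Static_assert(sizeof(struct toy_qdisc_cb) == 28,
               "toy_qdisc_cb must stay at 28 bytes");
_Static_assert(sizeof(struct toy_ipoib_cb) <= TOY_SKB_CB_SIZE,
               "toy_ipoib_cb must fit in the scratch area");

/* Bytes left in the scratch area after the layered control block. */
static size_t toy_cb_spare_bytes(void)
{
    return TOY_SKB_CB_SIZE - sizeof(struct toy_ipoib_cb);
}
```

    Under these assumed sizes there is no spare room at all, which is why
    adding even one field to the inner struct overflows the area.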
     
  • TG3 appears to have an issue performing TSO and checksum offloading
    correctly when the frame has been vlan encapsulated (non-accelerated).
    In these cases, the tcp checksum is not correctly updated.

    This patch attempts to work around this issue. After the patch,
    802.1ad vlans start working correctly over tg3 devices.

    CC: Prashant Sreedharan
    CC: Michael Chan
    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • tmp103 temperature sensor driver registers with the hwmon framework by calling
    hwmon_device_register_with_groups but does not have a .remove method to call
    hwmon_device_unregister to unregister from the framework when the device is no
    longer needed. Fix this by calling devm_hwmon_device_register_with_groups.

    Signed-off-by: Sundar J Dev
    Signed-off-by: Guenter Roeck

    sundarjdev
     
  • Since commit 412ca1550cbe ("macvlan: Move broadcasts into a work queue"), the
    driver uses tx_queue_len of the master device as the limit on the number of
    enqueued packets. The problem is that virtual drivers have this value set to
    0, thus all broadcast packets were rejected.
    Because tx_queue_len was arbitrarily chosen, I replace it with a static limit
    of 1000 (also arbitrarily chosen).

    CC: Herbert Xu
    Reported-by: Thibaut Collet
    Suggested-by: Thibaut Collet
    Tested-by: Thibaut Collet
    Signed-off-by: Nicolas Dichtel
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Nicolas Dichtel
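
    The effect of a zero tx_queue_len on the enqueue check can be sketched
    as follows. This is a minimal illustration with hypothetical names
    (toy_queue, toy_enqueue_old(), toy_enqueue_new()); only the static
    limit of 1000 comes from the commit text.

```c
#define TOY_BC_QUEUE_LEN 1000   /* static limit named in the commit text */

struct toy_queue {
    unsigned int len;           /* packets currently queued */
};

/* Old check: the limit comes from the lower device's tx_queue_len. */
static int toy_enqueue_old(struct toy_queue *q, unsigned int tx_queue_len)
{
    if (q->len >= tx_queue_len)
        return -1;              /* with tx_queue_len == 0 this always triggers */
    q->len++;
    return 0;
}

/* New check: a fixed limit that does not depend on the lower device. */
static int toy_enqueue_new(struct toy_queue *q)
{
    if (q->len >= TOY_BC_QUEUE_LEN)
        return -1;
    q->len++;
    return 0;
}
```

    With an empty queue and a tx_queue_len of 0, toy_enqueue_old() rejects
    the very first packet, while toy_enqueue_new() accepts packets until
    1000 are queued.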
     
  • When requests are retried due to hw or sw resource shortages,
    we often stop the associated hardware queue. So ensure that we
    restart the queues when running the requeue work, otherwise the
    queue run will be a no-op.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • __blk_mq_alloc_rq_maps() can be invoked multiple times when we scale
    back the queue depth because we are low on memory. So don't clear
    set->tags when we fail; this is handled directly in
    the parent function, blk_mq_alloc_tag_set().

    Reported-by: Robert Elliott
    Signed-off-by: Jens Axboe

    Jens Axboe
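
    The ownership rule described above can be sketched as a retry loop in
    userspace C. Everything here is a hypothetical toy: toy_set and the
    toy_alloc_*() helpers are not the block layer's code, and the
    max_ok_depth parameter only simulates memory pressure. The sketch
    assumes the inner attempt frees what it allocated itself but leaves
    the tags array alone, so a retry at a smaller depth can reuse it, and
    that only the outermost function releases the array.

```c
#include <stdlib.h>

struct toy_set {
    unsigned int nr_queues;
    unsigned int depth;
    void **tags;                 /* array owned by toy_alloc_tag_set() */
};

/* One attempt at the current depth. Allocation "fails" above max_ok_depth. */
static int toy_alloc_maps_once(struct toy_set *set, unsigned int max_ok_depth)
{
    unsigned int i;

    for (i = 0; i < set->nr_queues; i++) {
        set->tags[i] = set->depth <= max_ok_depth ? malloc(set->depth) : NULL;
        if (!set->tags[i])
            goto fail;
    }
    return 0;

fail:
    /* Undo this attempt only; set->tags itself must stay valid for a retry. */
    while (i--) {
        free(set->tags[i]);
        set->tags[i] = NULL;
    }
    return -1;
}

/* Retry with half the depth until an attempt succeeds or depth reaches zero. */
static int toy_alloc_maps(struct toy_set *set, unsigned int max_ok_depth)
{
    while (set->depth) {
        if (toy_alloc_maps_once(set, max_ok_depth) == 0)
            return 0;
        set->depth >>= 1;
    }
    return -1;
}

/* Parent: allocates the tags array and is the only place that frees it. */
static int toy_alloc_tag_set(struct toy_set *set, unsigned int nr_queues,
                             unsigned int depth, unsigned int max_ok_depth)
{
    set->nr_queues = nr_queues;
    set->depth = depth;
    set->tags = calloc(nr_queues, sizeof(*set->tags));
    if (!set->tags)
        return -1;
    if (toy_alloc_maps(set, max_ok_depth)) {
        free(set->tags);
        set->tags = NULL;
        return -1;
    }
    return 0;
}
```

    If the inner attempt cleared or freed set->tags on failure, the next
    iteration of the retry loop would write through a dangling or NULL
    pointer, which is the kind of fault the commit guards against.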
     
  • We should not insert requests into the flush state machine from
    blk_mq_insert_request. All incoming flush requests come through
    blk_{m,s}q_make_request and are handled there, while blk_execute_rq_nowait
    should only be called for BLOCK_PC requests. All other callers
    deal with requests that already went through the flush state machine
    and shouldn't be reinserted into it.

    Reported-by: Robert Elliott
    Debugged-by: Ming Lei
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This patch should fix the bug reported in
    https://lkml.org/lkml/2014/9/11/249.

    We have to initialize at least the atomic_flags and the cmd_flags when
    allocating storage for the requests.

    Otherwise blk_mq_timeout_check() might dereference uninitialized
    pointers when racing with the creation of a request.

    Also move the reset of cmd_flags for the initializing code to the point
    where a request is freed. So we will never end up with pending flush
    request indicators that might trigger dereferences of invalid pointers
    in blk_mq_timeout_check().

    Cc: stable@vger.kernel.org
    Signed-off-by: David Hildenbrand
    Reported-by: Paulo De Rezende Pinatti
    Tested-by: Paulo De Rezende Pinatti
    Acked-by: Christian Borntraeger
    Signed-off-by: Jens Axboe

    David Hildenbrand