27 Oct, 2015

40 commits

  • commit ee296d7c5709440f8abd36b5b65c6b3e388538d9 upstream.

    They just call file_inode and then the corresponding *_inode_file_wait
    function. Just make them static inlines instead.

    Signed-off-by: Jeff Layton
    Cc: William Dauchy
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
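
A rough userspace sketch of the wrapper pattern this entry describes. The struct layouts, the SKETCH_* lock types, and the body of flock_lock_inode_wait() are stand-ins invented for illustration; only the shape of the one-line static inline wrapper is taken from the commit text.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types: the real kernel structures carry far more state. */
struct inode { int nr_flocks; };
struct file { struct inode *f_inode; };
struct file_lock { int fl_type; };

/* Hypothetical lock types, for illustration only. */
enum { SKETCH_LOCK = 1, SKETCH_UNLOCK = 2 };

static inline struct inode *file_inode(const struct file *f)
{
	return f->f_inode;
}

/* Inode-based helper: it never touches a struct file, so a caller that
 * only has the inode can still use it. The body is a placeholder. */
static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
{
	assert(inode != NULL && fl != NULL);
	if (fl->fl_type == SKETCH_UNLOCK) {
		if (inode->nr_flocks == 0)
			return -1;
		inode->nr_flocks--;
	} else {
		inode->nr_flocks++;
	}
	return 0;
}

/* The file-based entry point reduces to a one-line static inline. */
static inline int flock_lock_file_wait(struct file *filp, struct file_lock *fl)
{
	return flock_lock_inode_wait(file_inode(filp), fl);
}
```

Because the wrapper only resolves the inode and forwards the call, keeping it as a static inline in a header avoids an extra out-of-line function with no behavior of its own.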
     
  • commit 29d01b22eaa18d8b46091d3c98c6001c49f78e4a upstream.

    Allow callers to pass in an inode instead of a filp.

    Signed-off-by: Jeff Layton
    Reviewed-by: "J. Bruce Fields"
    Tested-by: "J. Bruce Fields"
    Cc: William Dauchy
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit bcd7f78d078ff6197715c1ed070c92aca57ec12c upstream.

    ...and rename it to better describe how it works.

    In order to fix a use-after-free in NFS, we need to be able to remove
    locks from an inode after the filp associated with them may have already
    been freed. flock_lock_file already only dereferences the filp to get to
    the inode, so just change it so the callers do that.

    All of the callers already pass in a lock request that has the fl_file
    set properly, so we don't need to pass it in individually. With that
    change it now only dereferences the filp to get to the inode, so just
    push that out to the callers.

    Signed-off-by: Jeff Layton
    Reviewed-by: "J. Bruce Fields"
    Tested-by: "J. Bruce Fields"
    Cc: William Dauchy
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit c91aed9896946721bb30705ea2904edb3725dd61 upstream.

    The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
    were not taking into account the initial page_offset when determining
    the rdma read length. This resulted in a read whose starting address
    and length exceeded the base/bounds of the frmr.

    The server gets an async error from the rdma device and kills the
    connection, and the client then reconnects and resends. This repeats
    indefinitely, and the application hangs.

    Most workloads apparently don't trigger this bug, but one test hit it
    every time: building the linux kernel on a 16 core node with 'make -j
    16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

    This bug seems to only be tripped with devices having small fastreg page
    list depths. I didn't see it with mlx4, for instance.

    Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
    Signed-off-by: Steve Wise
    Tested-by: Chuck Lever
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Steve Wise
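
The length calculation at issue can be modelled in userspace as follows. This is a sketch under assumptions, not the svcrdma code: the function names, the 4 KiB page size, and the max_pages parameter (standing in for the device's page list depth) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12                       /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Number of pages touched by `length` bytes that start `page_offset`
 * bytes into the first page. */
static size_t pages_spanned(size_t page_offset, size_t length)
{
	return (page_offset + length + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Largest read that fits in at most `max_pages` pages when the data
 * starts at `page_offset` within the first page. */
static size_t bounded_read_len(size_t page_offset, size_t remaining,
			       size_t max_pages)
{
	size_t pages, capacity;

	assert(page_offset < SKETCH_PAGE_SIZE);
	assert(max_pages > 0);

	pages = pages_spanned(page_offset, remaining);
	if (pages > max_pages)
		pages = max_pages;

	/* The bytes before page_offset in the first page are not usable. */
	capacity = (pages << SKETCH_PAGE_SHIFT) - page_offset;
	return remaining < capacity ? remaining : capacity;
}
```

With a two-page budget, an 8192-byte request starting at offset 0 fits whole, while the same request starting at offset 100 has to be shortened by 100 bytes. Ignoring the offset would let the read run past the end of the mapped region, which is the failure described above.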
     
  • commit 1a541b4e3cd6f5795022514114854b3e1345f24e upstream.

    6910fa1 ("arm64: enable PTE type bit in the mask for pte_modify") fixes
    a problem whereby a large block of PROT_NONE mapped memory is
    incorrectly mapped as block descriptors when mprotect is called.

    Unfortunately, a subtle bug was introduced by this fix to the THP logic.

    If one mmaps a large block of memory, then faults it such that it is
    collapsed into THPs; resulting calls to mprotect on this area of memory
    will lead to incorrect table descriptors being written instead of block
    descriptors. This is because pmd_modify calls pte_modify which is now
    allowed to modify the type of the page table entry.

    This patch reverts commit 6910fa16dbe142f6a0fd0fd7c249f9883ff7fc8a, and
    fixes the problem it was trying to address by adjusting PAGE_NONE to
    represent a table entry. Thus no change in pte type is required when
    moving from PROT_NONE to a different protection.

    Fixes: 6910fa16dbe1 ("arm64: enable PTE type bit in the mask for pte_modify")
    Cc: # 4.0+
    Cc: Feng Kan
    Reported-by: Ganapatrao Kulkarni
    Tested-by: Ganapatrao Kulkarni
    Reviewed-by: Catalin Marinas
    [SteveC: backported 1a541b4e3cd6f5795022514114854b3e1345f24e to 4.1 and
    4.2 stable. Just one minor fix to second part to allow patch to apply
    cleanly, no logic changed.]
    Signed-off-by: Steve Capper
    Signed-off-by: Catalin Marinas
    Signed-off-by: Greg Kroah-Hartman

    Steve Capper
     
  • commit 9911a2d5e9d14e39692b751929a92cb5a1d9d0e0 upstream.

    The code in pinctrl-imx.c only works correctly if in the
    imx_pinctrl_soc_info passed to imx_pinctrl_probe we have:

    info->pins[i].number = i
    conf_reg(info->pins[i]) = 4 * i

    (with conf_reg(pin) being the offset of the pin's configuration
    register).

    When the imx25 specific part was introduced in b4a87c9b966f ("pinctrl:
    pinctrl-imx: add imx25 pinctrl driver") we had:

    info->pins[i].number = i + 1
    conf_reg(info->pins[i]) = 4 * i

    Commit 34027ca2bbc6 ("pinctrl: imx25: fix numbering for pins") tried
    to fix that but made the situation:

    info->pins[i-1].number = i
    conf_reg(info->pins[i-1]) = 4 * i

    which is hardly better but fixed the error seen back then.

    So insert another reserved entry in the array to finally yield:

    info->pins[i].number = i
    conf_reg(info->pins[i]) = 4 * i

    Fixes: 34027ca2bbc6 ("pinctrl: imx25: fix numbering for pins")
    Signed-off-by: Uwe Kleine-König
    Signed-off-by: Linus Walleij
    Signed-off-by: Greg Kroah-Hartman

    Uwe Kleine-König
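
The invariant stated in this entry can be checked mechanically. The sketch below is illustrative only: struct sketch_pin and pins_consistent() are hypothetical names, and conf_reg is stored as a plain field rather than derived from the real driver tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified pin descriptor. */
struct sketch_pin {
	unsigned int number;    /* pin number as registered */
	unsigned int conf_reg;  /* offset of the configuration register */
};

/* Returns 1 when pins[i].number == i and conf_reg == 4 * i for every
 * entry, 0 otherwise. */
static int pins_consistent(const struct sketch_pin *pins, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (pins[i].number != i || pins[i].conf_reg != 4 * i)
			return 0;
	}
	return 1;
}
```

A table that starts with a reserved entry at index 0 satisfies the check; a table whose first entry is already pin 1 does not, which mirrors the intermediate state described above.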
     
  • commit fe32d3cd5e8eb0f82e459763374aa80797023403 upstream.

    These functions check should_resched() before unlocking spinlock/bh-enable:
    preempt_count always non-zero => should_resched() always returns false.
    cond_resched_lock() worked iff spin_needbreak is set.

    This patch adds argument "preempt_offset" to should_resched().

    preempt_count offset constants for that:

    PREEMPT_DISABLE_OFFSET - offset after preempt_disable()
    PREEMPT_LOCK_OFFSET - offset after spin_lock()
    SOFTIRQ_DISABLE_OFFSET - offset after local_bh_disable()
    SOFTIRQ_LOCK_OFFSET - offset after spin_lock_bh()

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Graf
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: bdb438065890 ("sched: Extract the basic add/sub preempt_count modifiers")
    Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzz
    Signed-off-by: Ingo Molnar
    Signed-off-by: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
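
A toy model of why the extra argument matters. The globals, the sketch_ prefix, and the numeric offset values are assumptions made for illustration; the kernel derives the real constants from its configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only. */
#define PREEMPT_LOCK_OFFSET     1
#define SOFTIRQ_DISABLE_OFFSET  0x200
#define SOFTIRQ_LOCK_OFFSET     (SOFTIRQ_DISABLE_OFFSET + PREEMPT_LOCK_OFFSET)

/* Stand-ins for the per-task preempt count and the need-resched flag. */
static int sketch_preempt_count;
static bool sketch_need_resched;

/* Old shape: true only when the count is exactly zero, which can never
 * hold while the caller still holds a spinlock. */
static bool should_resched_old(void)
{
	return sketch_preempt_count == 0 && sketch_need_resched;
}

/* New shape: the caller says which offset it is itself responsible for. */
static bool should_resched(int preempt_offset)
{
	return sketch_preempt_count == preempt_offset && sketch_need_resched;
}
```

A caller that holds one spinlock passes PREEMPT_LOCK_OFFSET, so the check succeeds exactly when that lock is the only thing keeping the count non-zero.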
     
  • commit 90b62b5129d5cb50f62f40e684de7a1961e57197 upstream.

    "CHECK" suggests it's only used as a comparison mask. But now it's used
    further as a config-conditional preempt disabler offset. Let's
    disambiguate this name.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1431441711-29753-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Frederic Weisbecker
     
  • commit 3ebe138ac642a195c7f2efdb918f464734421fd6 upstream.

    If rbd_dev_image_probe() in rbd_dev_probe_parent() fails, header_name
    is freed twice: once in rbd_dev_probe_parent() and then in its caller
    rbd_dev_image_probe() (rbd_dev_image_probe() is called recursively to
    handle parent images).

    rbd_dev_probe_parent() is responsible for probing the parent, so it
    shouldn't muck with clone's fields.

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder
    Signed-off-by: Greg Kroah-Hartman

    Ilya Dryomov
     
  • commit ba30670f4d5292c4e7f7980bbd5071f7c4794cdd upstream.

    Fixes: ac8c3f3df ("dm thin: generate event when metadata threshold passed")
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Mike Snitzer
     
  • commit 51a4726b04e880fdd9b4e0e58b13f70b0a68a7f5 upstream.

    They were added relatively early in the driver init process
    which meant that in some cases the driver was not finished
    initializing before external tools tried to use them which
    could result in a crash depending on the timing.

    Signed-off-by: Alex Deucher
    Signed-off-by: Greg Kroah-Hartman

    Alex Deucher
     
  • commit bc8c131ccdd62d4ed4f33c6b50f92907e7c32dee upstream.

    This allows tiled monitors to work with radeon once mst is enabled.

    Signed-off-by: Dave Airlie
    Signed-off-by: Greg Kroah-Hartman

    Dave Airlie
     
  • commit ae491542cbbbcca0ec8938c37d4079a985e58440 upstream.

    This zeroes the msg so no random stack data ends up getting
    sent; it also limits the function to accepting at most 4
    i2c msgs.

    Reviewed-by: Daniel Vetter
    Signed-off-by: Dave Airlie
    Signed-off-by: Greg Kroah-Hartman

    Dave Airlie
     
  • commit f231976c2e8964ceaa9250e57d27c35ff03825c2 upstream.

    We need to do this in order to prevent accesses to the device while it's
    powered down. Userspace may have an mmap of the fb, and there's no good
    way (that I know of) to prevent it from touching the device otherwise.

    This fixes some nasty races between runpm and plymouth on some systems,
    which result in the GPU getting very upset and hanging the boot.

    Signed-off-by: Ben Skeggs
    Signed-off-by: Greg Kroah-Hartman

    Ben Skeggs
     
  • commit 874bbfe600a660cba9c776b3957b1ce393151b76 upstream.

    My system keeps crashing with the message below. vmstat_update() schedules a
    delayed work item on the current CPU and expects it to run on that CPU, and
    schedule_delayed_work() is expected to make delayed work run on the local CPU.
    The problem is that the timer can be migrated with NO_HZ. __queue_work() queues
    the work from the timer handler, which may run on a different CPU from the one
    where the delayed work was scheduled, so the delayed work ends up running on a
    different CPU. This patch makes __queue_delayed_work() record the local CPU
    earlier, so where the timer runs no longer changes where the work runs.

    [ 28.010131] ------------[ cut here ]------------
    [ 28.010609] kernel BUG at ../mm/vmstat.c:1392!
    [ 28.011099] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
    [ 28.011860] Modules linked in:
    [ 28.012245] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G W4.3.0-rc3+ #634
    [ 28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
    [ 28.014160] Workqueue: events vmstat_update
    [ 28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000
    [ 28.015445] RIP: 0010:[] []vmstat_update+0x31/0x80
    [ 28.016282] RSP: 0018:ffff8800ba42fd80 EFLAGS: 00010297
    [ 28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX:0000000000000000
    [ 28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI:ffffffff81f4df8d
    [ 28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09:0000000000000000
    [ 28.019169] R10: 0000000000000000 R11: 0000000000000121 R12:ffff8800baa9f640
    [ 28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15:0000000000000000
    [ 28.020071] FS: 0000000000000000(0000) GS:ffff88011a800000(0000)knlGS:0000000000000000
    [ 28.020071] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4:00000000000006f0
    [ 28.020071] Stack:
    [ 28.020071] ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00ffffffff8106bd88
    [ 28.020071] ffffffff8106bd0b 0000000000000096 0000000000000000ffffffff82f9b1e8
    [ 28.020071] ffffffff829f0b10 0000000000000000 ffffffff81f18460ffff88011a81e340
    [ 28.020071] Call Trace:
    [ 28.020071] [] process_one_work+0x1c8/0x540
    [ 28.020071] [] ? process_one_work+0x14b/0x540
    [ 28.020071] [] worker_thread+0x114/0x460
    [ 28.020071] [] ? process_one_work+0x540/0x540
    [ 28.020071] [] kthread+0xf8/0x110
    [ 28.020071] [] ?kthread_create_on_node+0x200/0x200
    [ 28.020071] [] ret_from_fork+0x3f/0x70
    [ 28.020071] [] ?kthread_create_on_node+0x200/0x200

    Signed-off-by: Shaohua Li
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Shaohua Li
     
  • commit 36d48fb5766aee9717e429f772046696b215282d upstream.

    The core may register clients attached to this master which may use
    functionality from the master. So, RuntimePM must be enabled before, otherwise
    this will fail.

    Signed-off-by: Wolfram Sang
    Signed-off-by: Wolfram Sang
    Acked-by: Mika Westerberg
    Signed-off-by: Greg Kroah-Hartman

    Wolfram Sang
     
  • commit 56d4b8a24cef5d66f0d10ac778a520d3c2c68a48 upstream.

    ACPI SSCN/FMCN methods were originally added because then the platform can
    provide the most accurate HCNT/LCNT values to the driver. However, this
    seems not to be true for Dell Inspiron 7348 where using these causes the
    touchpad to fail in boot:

    i2c_hid i2c-DLL0675:00: failed to retrieve report from device.
    i2c_designware INT3433:00: i2c_dw_handle_tx_abort: lost arbitration
    i2c_hid i2c-DLL0675:00: failed to retrieve report from device.
    i2c_designware INT3433:00: controller timed out

    The values received from ACPI are (in fast mode):

    HCNT: 72
    LCNT: 160

    this translates to following timings (input clock is 100MHz on Broadwell):

    tHIGH: 720 ns (spec min 600 ns)
    tLOW: 1600 ns (spec min 1300 ns)
    Bus period: 2920 ns (assuming 300 ns tf and tr)
    Bus speed: 342.5 kHz

    Both tHIGH and tLOW are within the I2C specification.

    The calculated values when ACPI parameters are not used are (in fast mode):

    HCNT: 87
    LCNT: 159

    which translates to:

    tHIGH: 870 ns (spec min 600 ns)
    tLOW: 1590 ns (spec min 1300 ns)
    Bus period 3060 ns (assuming 300 ns tf and tr)
    Bus speed 326.8 kHz

    These values are also within the I2C specification.

    Since both ACPI and calculated values meet the I2C specification timing
    requirements it is hard to say why the touchpad does not function properly
    with the ACPI values except that the bus speed is higher in this case (but
    still well below the max 400kHz).

    Solve this by adding DMI quirk to the driver that disables using ACPI
    parameters on this particular machine.

    Reported-by: Pavel Roskin
    Signed-off-by: Mika Westerberg
    Tested-by: Pavel Roskin
    Signed-off-by: Wolfram Sang
    Signed-off-by: Greg Kroah-Hartman

    Mika Westerberg
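
The timing figures quoted in this entry follow from simple arithmetic, reproduced here as a sketch. It uses the same simplified model as the text (period = tHIGH + tLOW + tf + tr); the function names are hypothetical and nothing here is taken from the driver.

```c
#include <assert.h>

/* Duration in ns of `count` cycles of an input clock given in MHz. */
static unsigned long counts_to_ns(unsigned long count, unsigned long clk_mhz)
{
	assert(clk_mhz > 0);
	return count * 1000UL / clk_mhz;
}

/* Bus period in ns: high time + low time + fall time + rise time. */
static unsigned long bus_period_ns(unsigned long hcnt, unsigned long lcnt,
				   unsigned long clk_mhz,
				   unsigned long tf_ns, unsigned long tr_ns)
{
	return counts_to_ns(hcnt, clk_mhz) + counts_to_ns(lcnt, clk_mhz) +
	       tf_ns + tr_ns;
}

/* Bus speed in Hz, truncated towards zero. */
static unsigned long bus_speed_hz(unsigned long period_ns)
{
	assert(period_ns > 0);
	return 1000000000UL / period_ns;
}
```

With a 100 MHz input clock and 300 ns for both tf and tr, HCNT 72 / LCNT 160 gives a 2920 ns period (about 342.5 kHz), and HCNT 87 / LCNT 159 gives 3060 ns (about 326.8 kHz), matching the numbers above.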
     
  • commit eadd709f5d2e8aebb1b7bf49460e97a68d81a9b0 upstream.

    The core may register clients attached to this master which may use
    functionality from the master. So, RuntimePM must be enabled before, otherwise
    this will fail. While here, move drvdata, too.

    Signed-off-by: Wolfram Sang
    Tested-by: Krzysztof Kozlowski
    Acked-by: Kukjin Kim
    Signed-off-by: Wolfram Sang
    Signed-off-by: Greg Kroah-Hartman

    Wolfram Sang
     
  • commit 4f7effddf4549d57114289f273710f077c4c330a upstream.

    The core may register clients attached to this master which may use
    functionality from the master. So, RuntimePM must be enabled before, otherwise
    this will fail. While here, move drvdata, too.

    Reported-by: Geert Uytterhoeven
    Signed-off-by: Wolfram Sang
    Signed-off-by: Wolfram Sang
    Signed-off-by: Greg Kroah-Hartman

    Wolfram Sang
     
  • commit 1b52e50f2a402a266f1ba2281f0a57e87637a047 upstream.

    If i2c_new_dummy() fails in max77843_chg_init(), PTR_ERR(NULL) is
    returned, which is 0. So the function was wrongly returning a success
    value instead of an error code.

    Fixes: c7f585fe46d8 ("mfd: max77843: Add max77843 MFD driver core driver")
    Signed-off-by: Javier Martinez Canillas
    Reviewed-by: Krzysztof Kozlowski
    Signed-off-by: Lee Jones
    Signed-off-by: Greg Kroah-Hartman

    Javier Martinez Canillas
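
A small userspace sketch of the pitfall. sketch_ptr_err() is a simplified stand-in for the kernel's PTR_ERR(), sketch_new_dummy() is a hypothetical helper that signals failure by returning NULL, and the choice of -ENODEV in the fixed variant is an assumption rather than something stated in the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for PTR_ERR(): the pointer value as a long. */
static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical allocator that reports failure with NULL instead of an
 * encoded error pointer. */
static void *sketch_new_dummy(int fail)
{
	static int device;

	return fail ? NULL : &device;
}

/* Buggy shape: converting NULL yields 0, which callers read as success. */
static int chg_init_buggy(int fail)
{
	void *client = sketch_new_dummy(fail);

	if (!client)
		return (int)sketch_ptr_err(client);
	return 0;
}

/* Fixed shape: return an explicit negative error code (value assumed). */
static int chg_init_fixed(int fail)
{
	void *client = sketch_new_dummy(fail);

	if (!client)
		return -ENODEV;
	return 0;
}
```

The general rule this illustrates: an error-pointer conversion is only meaningful for functions that return encoded error pointers, not for functions that return NULL on failure.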
     
  • commit 8c3ad9cb7343dc5f61b8cf3cdbe1016c5e7c2c8b upstream.

    Recent Linux clients have started to send GETLAYOUT requests with
    minlength less than blocksize.

    Servers aren't really allowed to impose this kind of restriction on
    layouts; see RFC 5661 section 18.43.3 for details.

    This has been observed to cause indefinite hangs on fsx runs on some
    clients.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Greg Kroah-Hartman

    Christoph Hellwig
     
  • commit b6dd8e0719c0d2d01429639a11b7bc2677de240c upstream.

    Commit df057cc7b4fa ("arm64: errata: add module build workaround for
    erratum #843419") sets CFLAGS_MODULE to ensure that the large memory
    model is used by the compiler when building kernel modules.

    However, CFLAGS_MODULE is an environment variable and intended to be
    overridden on the command line, which appears to be the case with the
    Ubuntu kernel packaging system, so use KBUILD_CFLAGS_MODULE instead.

    Cc: Ard Biesheuvel
    Fixes: df057cc7b4fa ("arm64: errata: add module build workaround for erratum #843419")
    Reported-by: Dann Frazier
    Tested-by: Dann Frazier
    Signed-off-by: Will Deacon
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     
  • commit dc6c5fb3b514221f2e9d21ee626a9d95d3418dff upstream.

    The code for btrfs inode-resolve has never worked properly for
    files with enough hard links to trigger extrefs. It was trying to
    get the leaf out of a path after freeing the path:

    btrfs_release_path(path);
    leaf = path->nodes[0];
    item_size = btrfs_item_size_nr(leaf, slot);

    The fix here is to use the extent buffer we cloned just a little higher
    up to avoid deadlocks caused by using the leaf in the path.

    Signed-off-by: Chris Mason
    cc: Mark Fasheh
    Reviewed-by: Filipe Manana
    Reviewed-by: Mark Fasheh
    Signed-off-by: Greg Kroah-Hartman

    Chris Mason
     
  • commit 8eb934591f8bf584969454a658f629cd06e59f3a upstream.

    We don't verify that all the balance filter arguments supplemented by
    the flags are actually known to the kernel. Thus we let it silently pass
    and do nothing.

    At the moment this means only the 'limit' filter, but we're going to add
    a few more soon so it's better to have that fixed. Also in older stable
    kernels so that it works with newer userspace tools.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    David Sterba
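
The kind of check this entry asks for is usually a single mask test. The sketch below uses hypothetical flag names and bit positions, and returning -EINVAL is an assumption; it is not the btrfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical filter bits; the real balance flags differ. */
#define SKETCH_FILTER_USAGE  (UINT64_C(1) << 0)
#define SKETCH_FILTER_LIMIT  (UINT64_C(1) << 1)
#define SKETCH_FILTER_KNOWN  (SKETCH_FILTER_USAGE | SKETCH_FILTER_LIMIT)

/* Reject any request that sets a bit this build does not understand,
 * instead of silently ignoring it. */
static int check_filter_flags(uint64_t flags)
{
	if (flags & ~SKETCH_FILTER_KNOWN)
		return -EINVAL;
	return 0;
}
```

Failing loudly lets newer userspace tools detect that an older kernel does not support a filter, rather than believing the filter was applied.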
     
  • commit 424cdc14138088ada1b0e407a2195b2783c6e5ef upstream.

    page_counter_memparse() returns pages for the threshold, while
    mem_cgroup_usage() returns bytes for memory usage. Convert the
    threshold to bytes.

    Fixes: 3e32cb2e0a12b6915 ("memcg: rename cgroup_event to mem_cgroup_event").
    Signed-off-by: Shaohua Li
    Cc: Johannes Weiner
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Shaohua Li
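
The unit mismatch can be illustrated with a few lines of arithmetic. The names and the 4 KiB page size are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumes 4 KiB pages */

/* Convert a threshold expressed in pages into bytes. */
static uint64_t threshold_pages_to_bytes(uint64_t pages)
{
	return pages << SKETCH_PAGE_SHIFT;
}

/* Compare like with like: usage in bytes against a threshold in bytes. */
static bool threshold_crossed(uint64_t usage_bytes, uint64_t threshold_pages)
{
	return usage_bytes >= threshold_pages_to_bytes(threshold_pages);
}
```

For a 1 MiB threshold (256 pages), 4096 bytes of usage must not trigger the event. Comparing the raw values, 4096 against 256, would wrongly report the threshold as crossed.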
     
  • commit 8996eafdcbad149ac0f772fb1649fbb75c482a6a upstream.

    Unlike shash algorithms, ahash drivers must implement export
    and import as their descriptors may contain hardware state and
    cannot be exported as is. Unfortunately some ahash drivers did
    not provide them and end up causing crashes with algif_hash.

    This patch adds a check to prevent these drivers from registering
    ahash algorithms until they are fixed.

    Signed-off-by: Russell King
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Russell King
     
  • commit a66d7f724a96d6fd279bfbd2ee488def6b081bea upstream.

    Some of the crypto algorithms write to the initialization vector,
    but no space has been allocated for it. This clobbers adjacent memory.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Dave Kleikamp
     
  • commit 621bd0f6982badd6483acb191eb7b6226a578328 upstream.

    With atomic drivers we need to make sure that (at least in general)
    property reads hold the right locks. But the legacy dpms property is
    special and can be read locklessly. Since userspace loves to just
    randomly look at that all the time (like with "status") do that.

    To make it clear that we play tricks use the READ_ONCE compiler
    barrier (and also for paranoia).

    Note that there's not really anything bad going on since even with the
    new atomic paths we eventually end up not chasing any pointers (and
    hence possibly freed memory and other fun stuff). The locking WARNING
    has been added in

    commit 88a48e297b3a3bac6022c03babfb038f1a886cea
    Author: Rob Clark
    Date: Thu Dec 18 16:01:50 2014 -0500

    drm: add atomic properties

    but since drivers are converting not everyone will have seen this from
    the start.

    Jens reported this and submitted a patch to just grab the
    mode_config.connection_mutex, but we can do a bit better.

    v2: Remove unused variables I failed to git add for real.

    Reference: http://mid.gmane.org/20150928194822.GA3930@kernel.dk
    Reported-by: Jens Axboe
    Tested-by: Jens Axboe
    Cc: Rob Clark
    Signed-off-by: Daniel Vetter
    Signed-off-by: Dave Airlie
    Signed-off-by: Greg Kroah-Hartman

    Daniel Vetter
     
  • [ Upstream commit e9193d60d363e4dff75ff6d43a48f22be26d59c7 ]

    Now recv with MSG_PEEK can return data from multiple SKBs.

    Unfortunately we take into account the peek offset for each skb,
    that is wrong. We need to apply the peek offset only once.

    In addition, the peek offset should be used only if MSG_PEEK is set.

    Cc: "David S. Miller" (maintainer:NETWORKING
    Cc: Eric Dumazet (commit_signer:1/14=7%)
    Cc: Aaron Conole
    Fixes: 9f389e35674f ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag")
    Signed-off-by: Andrey Vagin
    Tested-by: Aaron Conole
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Andrey Vagin
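
The "apply the peek offset only once" rule can be sketched over a plain array of buffers. struct sketch_chunk and peek_copy() are hypothetical; the real code walks a queue of skbs and handles locking and flags that are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one queued buffer. */
struct sketch_chunk {
	const char *data;
	size_t len;
};

/* Copy up to out_len bytes from the queue without consuming it, starting
 * `skip` bytes into the stream. The offset is consumed once, across the
 * whole queue, not re-applied to every chunk. */
static size_t peek_copy(const struct sketch_chunk *q, size_t n, size_t skip,
			char *out, size_t out_len)
{
	size_t copied = 0;

	for (size_t i = 0; i < n && copied < out_len; i++) {
		size_t avail, take;

		if (skip >= q[i].len) {
			/* Whole chunk lies before the peek offset. */
			skip -= q[i].len;
			continue;
		}
		avail = q[i].len - skip;
		take = avail < out_len - copied ? avail : out_len - copied;
		memcpy(out + copied, q[i].data + skip, take);
		copied += take;
		skip = 0;   /* later chunks are read from their start */
	}
	return copied;
}
```

With the chunks "abcd" and "efgh" and an offset of 2, the result is "cdefgh". Re-applying the offset to the second chunk would instead drop "ef" from the output.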
     
  • [ Upstream commit 9f389e35674f5b086edd70ed524ca0f287259725 ]

    AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag
    is set.

    This is referenced in kernel bugzilla #12323 @
    https://bugzilla.kernel.org/show_bug.cgi?id=12323

    As described both in the BZ and lkml thread @
    http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an
    AF_UNIX socket only reads a single skb, where the desired effect is
    to return as much skb data as has been queued, until hitting the recv
    buffer size (whichever comes first).

    The modified MSG_PEEK path will now move to the next skb in the tree
    and jump to the again: label, rather than following the natural loop
    structure. This requires duplicating some of the loop head actions.

    This was tested using the Python socketpair code attached to
    the bugzilla issue.

    Signed-off-by: Aaron Conole
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Aaron Conole
     
  • [ Upstream commit 4613012db1d911f80897f9446a49de817b2c4c47 ]

    As suggested by Eric Dumazet this change replaces the
    #define with a static inline function to enjoy
    complaints by the compiler when misusing the API.

    Signed-off-by: Aaron Conole
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Aaron Conole
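
The difference between a cast macro and a static inline accessor can be sketched as follows. All type and function names here are hypothetical; the entry does not name the macro that was converted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base and derived socket types; the base must come first
 * so that a pointer to it is also a pointer to the containing struct. */
struct sketch_sock { int state; };
struct sketch_unix_sock { struct sketch_sock sk; int peer_id; };

/* Macro form: casts whatever it is given, so passing the wrong pointer
 * type (or even an integer) compiles without a diagnostic. */
#define to_unix_macro(ptr) ((struct sketch_unix_sock *)(ptr))

/* Inline form: the parameter type makes the compiler complain when the
 * argument is not a struct sketch_sock pointer. */
static inline struct sketch_unix_sock *to_unix(const struct sketch_sock *sk)
{
	return (struct sketch_unix_sock *)sk;
}
```

Both forms produce the same pointer for a valid argument; the gain is purely in compile-time checking of misuse.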
     
  • [ Upstream commit db65a3aaf29ecce2e34271d52e8d2336b97bd9fe ]

    netlink_dump() allocates skb based on the calculated min_dump_alloc or
    a per socket max_recvmsg_len.
    min_alloc_size is maximum space required for any single netdev
    attributes as calculated by rtnl_calcit().
    max_recvmsg_len tracks the user provided buffer to netlink_recvmsg.
    It is capped at 16KiB.
    The intention is to avoid small allocations and to minimize the number
    of calls required to obtain dump information for all net devices.

    netlink_dump packs as many small messages as could fit within an skb
    that was sized for the largest single netdev information. The actual
    space available within an skb is larger than what is requested. It could
    be much larger and up to near 2x with align to next power of 2 approach.

    Allowing netlink_dump to use all the space available within the
    allocated skb increases the buffer size a user has to provide to avoid
    truncation (i.e. MSG_TRUNC flag set).

    It was observed that with many VLANs configured on at least one netdev,
    a larger buffer of near 64KiB was necessary to avoid "Message truncated"
    error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when
    min_alloc_size was only little over 32KiB.

    This patch trims skb to allocated size in order to allow the user to
    avoid truncation with more reasonable buffer size.

    Signed-off-by: Ronen Arad
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Arad, Ronen
     
  • [ Upstream commit dde4b5ae65de659b9ec64bafdde0430459fcb495 ]

    In commit e3eea1eb47a ("tipc: clean up handling of message priorities")
    we introduced a field in the packet header for keeping track of the
    priority of fragments, since this value is not present in the specified
    protocol header. Since the value so far only is used at the transmitting
    end of the link, we have not yet officially defined it as part of the
    protocol.

    Unfortunately, the field we use for keeping this value, bits 13-15 in
    word 5, has turned out to be a poor choice; it is already used by the
    broadcast protocol for carrying the 'network id' field of the sending
    node. Since packet fragments also need to be transported across the
    broadcast protocol, the risk of conflict is obvious, and we see this
    happen when we use network identities larger than 2^13-1. This has
    escaped our testing because we have so far only been using small network
    id values.

    We now move this field to bits 0-2 in word 9, a field that is guaranteed
    to be unused by all involved protocols.

    Fixes: e3eea1eb47a ("tipc: clean up handling of message priorities")
    Signed-off-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jon Paul Maloy
     
  • [ Upstream commit 077cb37fcf6f00a45f375161200b5ee0cd4e937b ]

    It seems that kernel memory can leak into userspace by a
    kmalloc, ethtool_get_strings, then copy_to_user sequence.

    Avoid this by using kcalloc to zero fill the copied buffer.

    Signed-off-by: Joe Perches
    Acked-by: Ben Hutchings
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Joe Perches
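
A userspace analogue of the leak and its fix, using calloc() where the kernel code uses kcalloc(). The callback, the slot size, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_GSTRING_LEN 32   /* hypothetical fixed slot size */

/* Hypothetical driver callback: writes a short name into each slot and
 * leaves the remaining bytes of the slot untouched. */
static void sketch_get_strings(char *buf, size_t count)
{
	for (size_t i = 0; i < count; i++)
		strcpy(buf + i * SKETCH_GSTRING_LEN, "rx");
}

/* Allocate zero-filled so that bytes the callback never writes cannot
 * carry stale heap contents to whoever receives the whole buffer. */
static char *alloc_strings(size_t count)
{
	char *buf = calloc(count, SKETCH_GSTRING_LEN);

	if (buf)
		sketch_get_strings(buf, count);
	return buf;
}
```

With a plain malloc(), the 29 bytes after each "rx" and its terminator would hold whatever was previously in that memory, and copying the full buffer out would expose it.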
     
  • [ Upstream commit d40496a56430eac0d330378816954619899fe303 ]

    Similar to commit c29390c6dfee ("xps: must clear sender_cpu before forwarding")
    the skb->sender_cpu needs to be cleared when moving from Rx to
    Tx, otherwise kernel could crash.

    Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices")
    Cc: Eric Dumazet
    Cc: Jamal Hadi Salim
    Signed-off-by: Cong Wang
    Signed-off-by: Cong Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    WANG Cong
     
  • [ Upstream commit 598c12d0ba6de9060f04999746eb1e015774044b ]

    When openvswitch tries to allocate memory from offline numa node 0:
    stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
    It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
    [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
    This patch disables numa affinity in this case.

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
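
The fallback described here amounts to checking whether the preferred node is online before asking for memory on it. This is a sketch with stand-in names: sketch_online[] replaces the kernel's node state, and SKETCH_NO_NODE plays the role of "no NUMA preference".

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NO_NODE   (-1)  /* stand-in for "no NUMA preference" */
#define SKETCH_MAX_NODES 4

/* Hypothetical node state table. */
static bool sketch_online[SKETCH_MAX_NODES];

static bool sketch_node_online(int nid)
{
	return nid >= 0 && nid < SKETCH_MAX_NODES && sketch_online[nid];
}

/* Use the preferred node only when it is online; otherwise let the
 * allocator choose. */
static int pick_alloc_node(int preferred)
{
	return sketch_node_online(preferred) ? preferred : SKETCH_NO_NODE;
}
```

The result would then be passed as the node argument of the allocation call, so an offline node 0 no longer trips the allocator's sanity check.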
     
  • [ Upstream commit 93d08b6966cf730ea669d4d98f43627597077153 ]

    When sockets have a native eBPF program attached through
    setsockopt(sk, SOL_SOCKET, SO_ATTACH_BPF, ...), and then try to
    dump these over getsockopt(sk, SOL_SOCKET, SO_GET_FILTER, ...),
    the following panic appears:

    [49904.178642] BUG: unable to handle kernel NULL pointer dereference at (null)
    [49904.178762] IP: [] sk_get_filter+0x39/0x90
    [49904.182000] PGD 86fc9067 PUD 531a1067 PMD 0
    [49904.185196] Oops: 0000 [#1] SMP
    [...]
    [49904.224677] Call Trace:
    [49904.226090] [] sock_getsockopt+0x319/0x740
    [49904.227535] [] ? sock_has_perm+0x63/0x70
    [49904.228953] [] ? release_sock+0x108/0x150
    [49904.230380] [] ? selinux_socket_getsockopt+0x23/0x30
    [49904.231788] [] SyS_getsockopt+0xa6/0xc0
    [49904.233267] [] entry_SYSCALL_64_fastpath+0x12/0x71

    The underlying issue is the very same as in commit b382c0865600
    ("sock, diag: fix panic in sock_diag_put_filterinfo"), that is,
    native eBPF programs don't store an original program since this
    is only needed in cBPF ones.

    However, sk_get_filter() wasn't updated to test for this at the
    time when eBPF could be attached. Just throw an error to the user
    to indicate that eBPF cannot be dumped over this interface.
    That way, it can also be known that a program _is_ attached (as
    opposed to just return 0), and a different (future) method needs
    to be consulted for a dump.

    Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     
  • [ Upstream commit 2306c704ce280c97a60d1f45333b822b40281dea ]

    reqsk_timer_handler() tests if icsk_accept_queue.listen_opt
    is NULL at its beginning.

    By the time it calls inet_csk_reqsk_queue_drop() and
    reqsk_queue_unlink(), listener might have been closed and
    inet_csk_listen_stop() had called reqsk_queue_yank_acceptq()
    which sets icsk_accept_queue.listen_opt to NULL

    We therefore need to correctly check listen_opt being NULL
    after holding syn_wait_lock for proper synchronization.

    Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
    Fixes: b357a364c57c ("inet: fix possible panic in reqsk_queue_unlink()")
    Signed-off-by: Eric Dumazet
    Cc: Yuchung Cheng
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit e6740165b8f7f06d8caee0fceab3fb9d790a6fed ]

    Since commit 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"),
    pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
    PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
    PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
    following oops:

    [ 570.140800] BUG: unable to handle kernel NULL pointer dereference at 00000000000004e0
    [ 570.142931] IP: [] pppoe_release+0x50/0x101 [pppoe]
    [ 570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0
    [ 570.144601] Oops: 0000 [#1] SMP
    [ 570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio
    [ 570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1
    [ 570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
    [ 570.144601] task: ffff88003d30d600 ti: ffff880036b60000 task.ti: ffff880036b60000
    [ 570.144601] RIP: 0010:[] [] pppoe_release+0x50/0x101 [pppoe]
    [ 570.144601] RSP: 0018:ffff880036b63e08 EFLAGS: 00010202
    [ 570.144601] RAX: 0000000000000000 RBX: ffff880034340000 RCX: 0000000000000206
    [ 570.144601] RDX: 0000000000000006 RSI: ffff88003d30dd20 RDI: ffff88003d30dd20
    [ 570.144601] RBP: ffff880036b63e28 R08: 0000000000000001 R09: 0000000000000000
    [ 570.144601] R10: 00007ffee9b50420 R11: ffff880034340078 R12: ffff8800387ec780
    [ 570.144601] R13: ffff8800387ec7b0 R14: ffff88003e222aa0 R15: ffff8800387ec7b0
    [ 570.144601] FS: 00007f5672f48700(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
    [ 570.144601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 570.144601] CR2: 00000000000004e0 CR3: 0000000037f7e000 CR4: 00000000000406a0
    [ 570.144601] Stack:
    [ 570.144601] ffffffffa018f240 ffff8800387ec780 ffffffffa018f240 ffff8800387ec7b0
    [ 570.144601] ffff880036b63e48 ffffffff812caabe ffff880039e4e000 0000000000000008
    [ 570.144601] ffff880036b63e58 ffffffff812cabad ffff880036b63ea8 ffffffff811347f5
    [ 570.144601] Call Trace:
    [ 570.144601] [] sock_release+0x1a/0x75
    [ 570.144601] [] sock_close+0xd/0x11
    [ 570.144601] [] __fput+0xff/0x1a5
    [ 570.144601] [] ____fput+0x9/0xb
    [ 570.144601] [] task_work_run+0x66/0x90
    [ 570.144601] [] prepare_exit_to_usermode+0x8c/0xa7
    [ 570.144601] [] syscall_return_slowpath+0x16d/0x19b
    [ 570.144601] [] int_ret_from_sys_call+0x25/0x9f
    [ 570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b 27 14 e1 b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a8 04 00 00 8b 80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00 00 00
    [ 570.144601] RIP [] pppoe_release+0x50/0x101 [pppoe]
    [ 570.144601] RSP
    [ 570.144601] CR2: 00000000000004e0
    [ 570.200518] ---[ end trace 46956baf17349563 ]---

    pppoe_flush_dev() has no reason to override sk->sk_state with
    PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to
    PPPOX_DEAD, which is the correct state given that sk is unbound and
    po->pppoe_dev is NULL.

    Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
    Tested-by: Oleksii Berezhniak
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Guillaume Nault
     
  • [ Upstream commit c7c49b8fde26b74277188bdc6c9dca38db6fa35b ]

    Greg reported crashes hitting the following check in __sk_backlog_rcv()

    BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));

    The pfmemalloc bit is currently checked in sk_filter().

    This works correctly for TCP, because sk_filter() is run in
    tcp_v[46]_rcv() before hitting the prequeue or backlog checks.

    For UDP or other protocols, this does not work, because the sk_filter()
    is run from sock_queue_rcv_skb(), which might be called _after_ backlog
    queuing if socket is owned by user by the time packet is processed by
    softirq handler.

    Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
    Signed-off-by: Eric Dumazet
    Reported-by: Greg Thelen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet