10 Apr, 2014

4 commits


09 Apr, 2014

36 commits

  • Pull i2c updates from Wolfram Sang:
    "Here is the pull request from the i2c subsystem. It got a little
    delayed because I needed to wait for a dependency to be included
    (commit b424080a9e08: "reset: Add optional resets and stubs"). Plus,
    I had some email problems. All done now, the highlights are:

    - drivers can now deprecate their use of i2c classes. That shouldn't
    be used on embedded platforms anyhow and was often blindly
    copy&pasted. This mechanism gives users time to switch away and
    ultimately boot faster once the use of classes for those drivers is
    gone for good.

    - new drivers for QUP, Cadence, efm32

    - tracepoint support for I2C and SMBus

    - bigger cleanups for the mv64xxx, nomadik, and designware drivers

    And the usual bugfixes, cleanups, feature additions. Most stuff has
    been in linux-next for a while. Just some hot fixes and new drivers
    were added a bit more recently."

    * 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (63 commits)
    i2c: cadence: fix Kconfig dependency
    i2c: Add driver for Cadence I2C controller
    i2c: cadence: Document device tree bindings
    Documentation: i2c: improve section about flags mangling the protocol
    i2c: qup: use proper type fro clk_freq
    i2c: qup: off by ones in qup_i2c_probe()
    i2c: efm32: fix binding doc
    MAINTAINERS: update I2C web resources
    i2c: qup: New bus driver for the Qualcomm QUP I2C controller
    i2c: qup: Add device tree bindings information
    i2c: i2c-xiic: deprecate class based instantiation
    i2c: i2c-sirf: deprecate class based instantiation
    i2c: i2c-mv64xxx: deprecate class based instantiation
    i2c: i2c-designware-platdrv: deprecate class based instantiation
    i2c: i2c-davinci: deprecate class based instantiation
    i2c: i2c-bcm2835: deprecate class based instantiation
    i2c: mv64xxx: Fix reset controller handling
    i2c: omap: fix usage of IS_ERR_VALUE with pm_runtime_get_sync
    i2c: efm32: new bus driver
    i2c: exynos5: remove unnecessary cast of void pointer
    ...

    Linus Torvalds
     
  • Pull MMC updates from Chris Ball:
    "MMC highlights for 3.15:

    Core:
    - CONFIG_MMC_UNSAFE_RESUME=y is now default behavior
    - DT bindings for SDHCI UHS, eMMC HS200, high-speed DDR, at 1.8/1.2V
    - Add GPIO descriptor based slot-gpio card detect API

    Drivers:
    - dw_mmc: Refactor SOCFPGA support as a variant inside dw_mmc-pltfm.c
    - mmci: Support HW busy detection on ux500
    - omap: Support MMC_ERASE
    - omap_hsmmc: Support MMC_PM_KEEP_POWER, MMC_PM_WAKE_SDIO_IRQ, (a)cmd23
    - rtsx: Support pre-req/post-req async
    - sdhci: Add support for Realtek RTS5250 controllers
    - sdhci-acpi: Add support for 80860F16, fix 80860F14/SDIO card detect
    - sdhci-msm: Add new driver for Qualcomm SDHCI chipset support
    - sdhci-pxav3: Add support for Marvell Armada 380 and 385 SoCs"

    * tag 'mmc-updates-for-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (102 commits)
    mmc: sdhci-acpi: Intel SDIO has broken card detect
    mmc: sdhci-pxav3: add support for the Armada 38x SDHCI controller
    mmc: sdhci-msm: Add platform_execute_tuning implementation
    mmc: sdhci-msm: Initial support for Qualcomm chipsets
    mmc: sdhci-msm: Qualcomm SDHCI binding documentation
    sdhci: only reprogram retuning timer when flag is set
    mmc: rename ARCH_BCM to ARCH_BCM_MOBILE
    mmc: sdhci: Allow for irq being shared
    mmc: sdhci-acpi: Add device id 80860F16
    mmc: sdhci-acpi: Fix broken card detect for ACPI HID 80860F14
    mmc: slot-gpio: Add GPIO descriptor based CD GPIO API
    mmc: slot-gpio: Split out CD IRQ request into a separate function
    mmc: slot-gpio: Record GPIO descriptors instead of GPIO numbers
    Revert "dts: socfpga: Add support for SD/MMC on the SOCFPGA platform"
    mmc: sdhci-spear: use generic card detection gpio support
    mmc: sdhci-spear: remove support for power gpio
    mmc: sdhci-spear: simplify resource handling
    mmc: sdhci-spear: fix platform_data usage
    mmc: sdhci-spear: fix error handling paths for DT
    mmc: sdhci-bcm-kona: fix build errors when built-in
    ...

    Linus Torvalds
     
  • Pull more powerpc updates from Ben Herrenschmidt:
    "Here are a few more powerpc things for you.

    So you'll find here the conversion of the two new firmware sysfs
    interfaces to the new API for self-removing files that Greg and Tejun
    introduced, so they can finally remove the old one.

    I'm also reverting the hwmon driver for powernv. I shouldn't have
    merged it, I got a bit carried away here. I hadn't realized it was
    never CCed to the relevant maintainer(s) and list(s), and happens to
    have some issues so I'm taking it out and it will come back via the
    proper channels.

    The rest is a bunch of LE fixes (argh, some of the new stuff was
    broken on LE, I really need to start testing LE myself !) and various
    random fixes here and there.

    Finally one bit that's not strictly a fix, which is the HVC OPAL
    change to "kick" the HVC thread when the firmware tells us there is
    new incoming data. I don't feel like waiting for this one, it's
    simple enough, and it makes a big difference in console responsiveness
    which is good for my nerves"

    * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (26 commits)
    powerpc/powernv Adapt opal-elog and opal-dump to new sysfs_remove_file_self
    Revert "powerpc/powernv: hwmon driver for power values, fan rpm and temperature"
    power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
    powerpc/le: Avoid creatng R_PPC64_TOCSAVE relocations for modules.
    arch/powerpc: Use RCU_INIT_POINTER(x, NULL) in platforms/cell/spu_syscalls.c
    powerpc/opal: Add missing include
    powerpc: Convert last uses of __FUNCTION__ to __func__
    powerpc: Add lq/stq emulation
    powerpc/powernv: Add invalid OPAL call
    powerpc/powernv: Add OPAL message log interface
    powerpc/book3s: Fix mc_recoverable_range buffer overrun issue.
    powerpc: Remove dead code in sycall entry
    powerpc: Use of_node_init() for the fakenode in msi_bitmap.c
    powerpc/mm: NUMA pte should be handled via slow path in get_user_pages_fast()
    powerpc/powernv: Fix endian issues with sensor code
    powerpc/powernv: Fix endian issues with OPAL async code
    tty/hvc_opal: Kick the HVC thread on OPAL console events
    powerpc/powernv: Add opal_notifier_unregister() and export to modules
    powerpc/ppc64: Do not turn AIL (reloc-on interrupts) too early
    powerpc/ppc64: Gracefully handle early interrupts
    ...

    Linus Torvalds
     
  • Jan Stancek reported:
    "pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
    occasionally fails, because some threads fail to wake up.

    Testcase creates 5 threads, which are all waiting on same condition.
    Main thread then calls pthread_cond_broadcast() without holding mutex,
    which calls:

    futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)

    This immediately wakes up single thread A, which unlocks mutex and
    tries to wake up another thread:

    futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)

    If thread A manages to call futex_wake() before any waiters are
    requeued for uaddr2, no other thread is woken up"

    The ordering constraints for the hash bucket waiter counting are that
    the waiter counts have to be incremented _before_ getting the spinlock
    (because the spinlock acts as part of the memory barrier), but the
    "requeue" operation didn't honor those rules, and nobody had even
    thought about that case.

    This fairly simple patch just increments the waiter count for the target
    hash bucket (hb2) when requeueing a futex, before taking the locks. It
    then decrements it again after releasing the locks; the code that
    actually moves the futex(es) between hash buckets will do the additional
    required waiter count housekeeping.
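    The ordering can be modelled in user space. The sketch below is a
    simplified, hypothetical model and not the kernel's futex code. A C11
    spinlock stands in for the hash-bucket lock, and hb_has_waiters()
    stands in for the lockless waiter check in futex_wake(). It only shows
    where hb2's count is raised and dropped around the locked requeue.

```c
#include <assert.h>
#include <stdatomic.h>

/* Simplified, hypothetical model of a futex hash bucket: a spinlock plus a
 * waiter count that wakers read without the lock to skip empty buckets. */
struct hash_bucket {
    atomic_int waiters;
    atomic_flag lock;
};

static void hb_lock(struct hash_bucket *hb)
{
    while (atomic_flag_test_and_set_explicit(&hb->lock, memory_order_acquire))
        ;  /* spin */
}

static void hb_unlock(struct hash_bucket *hb)
{
    atomic_flag_clear_explicit(&hb->lock, memory_order_release);
}

/* Waker fast path (cf. futex_wake()): a bucket with no visible waiters is
 * skipped without taking its lock. */
static int hb_has_waiters(struct hash_bucket *hb)
{
    return atomic_load(&hb->waiters) > 0;
}

/* Requeue nr waiters from hb1 to hb2. hb2's count is raised before the
 * locks are taken, so a concurrent waker on hb2 cannot observe zero while
 * the requeue is in flight; the extra count is dropped after unlocking. */
static void requeue(struct hash_bucket *hb1, struct hash_bucket *hb2, int nr)
{
    assert(nr >= 0 && nr <= atomic_load(&hb1->waiters));

    atomic_fetch_add(&hb2->waiters, 1);          /* before the locks */

    /* The kernel takes both locks in a fixed order to avoid deadlock; this
     * sketch is only exercised single-threaded, so it locks hb1 then hb2. */
    hb_lock(hb1);
    if (hb2 != hb1)
        hb_lock(hb2);

    /* Per-waiter housekeeping done by the code that moves the futexes. */
    atomic_fetch_sub(&hb1->waiters, nr);
    atomic_fetch_add(&hb2->waiters, nr);

    if (hb2 != hb1)
        hb_unlock(hb2);
    hb_unlock(hb1);

    atomic_fetch_sub(&hb2->waiters, 1);          /* after the locks */
}
```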

    Reported-and-tested-by: Jan Stancek
    Acked-by: Davidlohr Bueso
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: stable@vger.kernel.org # 3.14
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • We are currently using sysfs_schedule_callback() which is deprecated
    and about to be removed. Switch to the new interface instead.

    Signed-off-by: Stewart Smith
    Signed-off-by: Benjamin Herrenschmidt

    Stewart Smith
     
  • This reverts commit 0de7f8a917b5202014430e0055c0e1db0348bd62.

    This driver wasn't merged via the proper maintainers (my fault ... ooops !)
    and has serious issues, so let's take it out for now and have a new, better
    one merged the right way.

    Signed-off-by: Benjamin Herrenschmidt
    ---

    Benjamin Herrenschmidt
     
  • Since v1:
    Edited the comment according to Srivatsa's suggestion.

    During testing, we encountered the WARN below, followed by an Oops:

    WARNING: at kernel/sched/core.c:6218
    ...
    NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200
    LR [c000000000101358] .build_sched_domains+0xec8/0x1200
    PACATMSCRATCH [800000000000f032]
    Call Trace:
    [c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200
    [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
    [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
    [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
    ...
    Oops: Kernel access of bad area, sig: 11 [#1]
    ...
    NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0
    LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200
    PACATMSCRATCH [8000000000029032]
    Call Trace:
    [c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0
    [c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200
    [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
    [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
    [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
    ...

    This was caused by 'sd->groups' being NULL after building the groups,
    which in turn was caused by an empty 'sd->span'.

    The cpu's domain contained nothing because the cpu was assigned to a wrong
    node, due to the following unfortunate sequence of events:

    1. The hypervisor sent a topology update to the guest OS, to notify changes
    to the cpu-node mapping. However, the update was actually redundant - i.e.,
    the "new" mapping was exactly the same as the old one.

    2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting
    the 'for-loop' in arch_update_cpu_topology().

    3. So we ended up calling stop-machine() with an empty cpumask list, which made
    stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as
    the cpu to run the payload (the update_cpu_topology() function).

    4. This causes update_cpu_topology() to be run by CPU0. And since 'updates'
    is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology()
    finds update->cpu as well as update->new_nid to be 0. In other words, we
    end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly.

    This incorrect update, together with the ones that follow, breaks the
    sched-domain rebuild code and crashes the system.

    Fix this by skipping the topology update in cases where we find that
    the topology has not actually changed (i.e., spurious updates).
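    The fix amounts to counting the real changes before doing any work. A
    minimal user-space sketch follows. It assumes a flat cpu-to-node array
    and a hypothetical struct topo_update record, which are not the
    kernel's actual types.

```c
#include <assert.h>
#include <stddef.h>

/* One pending cpu->node change (hypothetical stand-in for the kernel's
 * per-cpu update record). */
struct topo_update {
    int cpu;
    int new_nid;
};

/* Queue only the cpus whose node really changed; returns how many. */
static size_t collect_updates(const int *cur_nid, const int *reported_nid,
                              size_t ncpus, struct topo_update *out)
{
    size_t n = 0;

    assert(cur_nid && reported_nid && out);
    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        if (cur_nid[cpu] == reported_nid[cpu])
            continue;                       /* redundant entry: skip it */
        out[n].cpu = (int)cpu;
        out[n].new_nid = reported_nid[cpu];
        n++;
    }
    return n;
}

/* Apply the update only if something changed. Returning early when n == 0
 * is the point of the fix: the apply step (stop_machine() in the kernel)
 * never runs with an empty set and a zero-filled record for CPU0. */
static int update_topology(int *cur_nid, const int *reported_nid,
                           size_t ncpus, struct topo_update *scratch)
{
    size_t n = collect_updates(cur_nid, reported_nid, ncpus, scratch);

    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        cur_nid[scratch[i].cpu] = scratch[i].new_nid;
    return 1;
}
```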

    CC: Benjamin Herrenschmidt
    CC: Paul Mackerras
    CC: Nathan Fontenot
    CC: Stephen Rothwell
    CC: Andrew Morton
    CC: Robert Jennings
    CC: Jesse Larrew
    CC: "Srivatsa S. Bhat"
    CC: Alistair Popple
    Suggested-by: "Srivatsa S. Bhat"
    Signed-off-by: Michael Wang
    Reviewed-by: Srivatsa S. Bhat
    Signed-off-by: Benjamin Herrenschmidt

    Michael Wang
     
    When building modules with a native LE toolchain, the linker will
    generate R_PPC64_TOCSAVE relocations when it's safe to omit saving r2 on
    a PLT call. This isn't helpful in the context of a kernel module, and the
    kernel will fail to load those modules with an error like:
    nf_conntrack: Unknown ADD relocation: 109

    This patch tells the linker to avoid creating R_PPC64_TOCSAVE
    relocations, allowing modules to load.

    Signed-off-by: Tony Breeds
    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Tony Breeds
     
    rcu_assign_pointer() ensures that the initialization of a structure
    is carried out before a pointer to that structure is stored. A NULL
    pointer refers to no structure, so there is nothing to order, and
    rcu_assign_pointer(p, NULL) can always safely be converted to
    RCU_INIT_POINTER(p, NULL).
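    The same distinction can be sketched with C11 atomics. This is an
    analogy under stated assumptions and not the kernel's RCU
    implementation. publish_cfg() plays the role of rcu_assign_pointer()
    and clear_cfg() plays the role of RCU_INIT_POINTER(p, NULL). All names
    here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config {
    int value;
};

/* Pointer shared with lockless readers (hypothetical example). */
static _Atomic(struct config *) shared_cfg;

/* Like rcu_assign_pointer(): a release store, so the stores that
 * initialised *p are visible to any reader that sees the new pointer. */
static void publish_cfg(struct config *p)
{
    assert(p != NULL);          /* NULL goes through clear_cfg() instead */
    atomic_store_explicit(&shared_cfg, p, memory_order_release);
}

/* Like RCU_INIT_POINTER(p, NULL): NULL points to no structure, so there
 * is no initialisation to order and a relaxed store is enough. */
static void clear_cfg(void)
{
    atomic_store_explicit(&shared_cfg, NULL, memory_order_relaxed);
}

/* Reader side; acquire is used here as a portable, stronger stand-in for
 * the dependency ordering that rcu_dereference() relies on. */
static struct config *read_cfg(void)
{
    return atomic_load_explicit(&shared_cfg, memory_order_acquire);
}
```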

    Signed-off-by: Monam Agarwal
    Signed-off-by: Benjamin Herrenschmidt

    Monam Agarwal
     
  • next-20140324 currently fails compiling celleb_defconfig with:

    arch/powerpc/include/asm/opal.h:894:42: error: 'struct notifier_block' declared inside parameter list [-Werror]
    arch/powerpc/include/asm/opal.h:894:42: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
    arch/powerpc/include/asm/opal.h:896:14: error: 'struct notifier_block' declared inside parameter list [-Werror]

    This is due to a missing include which is added here.

    Signed-off-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt

    Michael Neuling
     
  • Just about all of these have been converted to __func__,
    so convert the last uses.

    Signed-off-by: Joe Perches
    Signed-off-by: Benjamin Herrenschmidt

    Joe Perches
     
  • Recent CPUs support quad word load and store instructions. Add
    support to the alignment handler for them.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
    This call will not be understood by OPAL, and will cause it to add an
    error to its log. Among other things, this is useful for testing the
    behaviour of the log as it fills up.

    Signed-off-by: Joel Stanley
    Signed-off-by: Benjamin Herrenschmidt

    Joel Stanley
     
  • OPAL provides an in-memory circular buffer containing a message log
    populated with various runtime messages produced by the firmware.

    Provide a sysfs interface /sys/firmware/opal/msglog for userspace to
    view the messages.
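    Once the log has wrapped, reading such a circular buffer in order means
    starting at the oldest byte. The sketch below assumes a hypothetical
    descriptor with a monotonically increasing write position. The real
    OPAL buffer layout and the sysfs read handler are not shown here and
    may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical circular-log descriptor; the real OPAL layout may differ. */
struct circ_log {
    const char *buf;     /* backing storage */
    size_t size;         /* capacity in bytes */
    size_t out_pos;      /* total bytes ever written (never wraps) */
};

/* Copy up to dst_len bytes into dst, oldest first; returns bytes copied. */
static size_t circ_log_read(const struct circ_log *lg, char *dst, size_t dst_len)
{
    size_t avail, start, first;

    assert(lg != NULL && lg->size > 0);
    if (lg->out_pos <= lg->size) {          /* buffer has not wrapped yet */
        avail = lg->out_pos;
        start = 0;
    } else {                                /* oldest byte follows the newest */
        avail = lg->size;
        start = lg->out_pos % lg->size;
    }
    if (avail > dst_len)
        avail = dst_len;                    /* keep the oldest part if dst is short */

    first = lg->size - start;               /* bytes up to the end of buf */
    if (first > avail)
        first = avail;
    memcpy(dst, lg->buf + start, first);
    memcpy(dst + first, lg->buf, avail - first);
    return avail;
}
```

    For example, writing "abcdefghij" into an 8-byte buffer leaves
    "ijcdefgh" in storage, and the read returns "cdefghij".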

    Signed-off-by: Joel Stanley
    Signed-off-by: Benjamin Herrenschmidt

    Joel Stanley
     
    Currently we wrongly allocate the mc_recoverable_range buffer (to hold
    recoverable ranges) based on the size of the property "mcheck-recoverable-ranges".
    This results in allocating less memory than is needed to hold the recoverable
    range entries available from /proc/device-tree/ibm,opal/mcheck-recoverable-ranges.

    This patch fixes this issue by allocating the mc_recoverable_range buffer based
    on the number of recoverable range entries instead of the device property size.
    Without this change we allocate too little memory and run into memory
    corruption.
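    A small sketch shows the sizing difference. For illustration only, it
    assumes that each property entry occupies five 32-bit cells (20 bytes)
    while the in-memory struct holds three 64-bit fields (24 bytes). The
    struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* In-memory form of one recoverable range (hypothetical field names). */
struct recoverable_range {
    uint64_t start_addr;
    uint64_t end_addr;
    uint64_t recover_addr;
};

/* Assumed encoding: one property entry = CELLS_PER_ENTRY 32-bit cells. */
#define CELLS_PER_ENTRY 5

/* Buggy sizing: uses the raw property length as the buffer size. */
static size_t buggy_alloc_size(size_t prop_len)
{
    return prop_len;
}

/* Fixed sizing: derive the entry count first, then size by the struct. */
static size_t range_count(size_t prop_len)
{
    return prop_len / (CELLS_PER_ENTRY * sizeof(uint32_t));
}

static struct recoverable_range *alloc_ranges(size_t prop_len, size_t *count)
{
    assert(count != NULL);
    *count = range_count(prop_len);
    return calloc(*count, sizeof(struct recoverable_range));
}
```

    With three entries, the property is 60 bytes but the array needs
    72 bytes, so sizing by the property length under-allocates.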

    Signed-off-by: Mahesh Salgaonkar
    Signed-off-by: Benjamin Herrenschmidt

    Mahesh Salgaonkar
     
  • In:
    commit 742415d6b66bf09e3e73280178ef7ec85c90b7ee
    Author: Michael Neuling
    powerpc: Turn syscall handler into macros

    We converted the syscall entry code into macros, but in doing this we
    introduced some cruft that's never run and should never have been added.

    This removes that code.

    Signed-off-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt

    Michael Neuling
     
    This patch uses of_node_init() to initialize the kobject in the fake
    node used in test_of_node(), to avoid the following kobject warning:

    [ 0.897654] kobject: '(null)' (c0000007ca183a08): is not initialized, yet kobject_put() is being called.
    [ 0.897682] ------------[ cut here ]------------
    [ 0.897688] WARNING: at lib/kobject.c:670
    [ 0.897692] Modules linked in:
    [ 0.897701] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 3.14.0+ #1
    [ 0.897708] task: c0000007ca100000 ti: c0000007ca180000 task.ti: c0000007ca180000
    [ 0.897715] NIP: c00000000046a1f0 LR: c00000000046a1ec CTR: 0000000001704660
    [ 0.897721] REGS: c0000007ca1835c0 TRAP: 0700 Not tainted (3.14.0+)
    [ 0.897727] MSR: 8000000000029032 CR: 28000024 XER: 0000000d
    [ 0.897749] CFAR: c0000000008ef4ec SOFTE: 1
    GPR00: c00000000046a1ec c0000007ca183840 c0000000014c59b8 000000000000005c
    GPR04: 0000000000000001 c000000000129770 0000000000000000 0000000000000001
    GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000003fef
    GPR12: 0000000000000000 c00000000f221200 c00000000000c350 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR24: 0000000000000000 c00000000144e808 c000000000c56f20 00000000000000d8
    GPR28: c000000000cd5058 0000000000000000 c000000001454ca8 c0000007ca183a08
    [ 0.897856] NIP [c00000000046a1f0] .kobject_put+0xa0/0xb0
    [ 0.897863] LR [c00000000046a1ec] .kobject_put+0x9c/0xb0
    [ 0.897868] Call Trace:
    [ 0.897874] [c0000007ca183840] [c00000000046a1ec] .kobject_put+0x9c/0xb0 (unreliable)
    [ 0.897885] [c0000007ca1838c0] [c000000000743f9c] .of_node_put+0x2c/0x50
    [ 0.897894] [c0000007ca183940] [c000000000c83954] .test_of_node+0x1dc/0x208
    [ 0.897902] [c0000007ca183b80] [c000000000c839a4] .msi_bitmap_selftest+0x24/0x38
    [ 0.897913] [c0000007ca183bf0] [c00000000000bb34] .do_one_initcall+0x144/0x200
    [ 0.897922] [c0000007ca183ce0] [c000000000c748e4] .kernel_init_freeable+0x2b4/0x394
    [ 0.897931] [c0000007ca183db0] [c00000000000c374] .kernel_init+0x24/0x130
    [ 0.897940] [c0000007ca183e30] [c00000000000a2f4] .ret_from_kernel_thread+0x5c/0x68
    [ 0.897947] Instruction dump:
    [ 0.897952] 7fe3fb78 38210080 e8010010 ebe1fff8 7c0803a6 4800014c e89f0000 3c62ff6e
    [ 0.897971] 7fe5fb78 3863a950 48485279 60000000 39000000 393f0038 4bffff80
    [ 0.897992] ---[ end trace 1eeffdb9f825a556 ]---

    Signed-off-by: Li Zhong
    Signed-off-by: Benjamin Herrenschmidt

    Li Zhong
     
    NUMA PTEs need to be handled via the slow path in get_user_pages_fast().

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • One OPAL call and one device tree property needed byte swapping.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Pull nfsd updates from Bruce Fields:
    "Highlights:
    - server-side nfs/rdma fixes from Jeff Layton and Tom Tucker
    - xdr fixes (a larger xdr rewrite has been posted but I decided it
    would be better to queue it up for 3.16).
    - miscellaneous fixes and cleanup from all over (thanks especially to
    Kinglong Mee)"

    * 'for-3.15' of git://linux-nfs.org/~bfields/linux: (36 commits)
    nfsd4: don't create unnecessary mask acl
    nfsd: revert v2 half of "nfsd: don't return high mode bits"
    nfsd4: fix memory leak in nfsd4_encode_fattr()
    nfsd: check passed socket's net matches NFSd superblock's one
    SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed
    NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp
    SUNRPC: New helper for creating client with rpc_xprt
    NFSD: Free backchannel xprt in bc_destroy
    NFSD: Clear wcc data between compound ops
    nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+
    nfsd4: fix nfs4err_resource in 4.1 case
    nfsd4: fix setclientid encode size
    nfsd4: remove redundant check from nfsd4_check_resp_size
    nfsd4: use more generous NFS4_ACL_MAX
    nfsd4: minor nfsd4_replay_cache_entry cleanup
    nfsd4: nfsd4_replay_cache_entry should be static
    nfsd4: update comments with obsolete function name
    rpc: Allow xdr_buf_subsegment to operate in-place
    NFSD: Using free_conn free connection
    SUNRPC: fix memory leak of peer addresses in XPRT
    ...

    Linus Torvalds
     
  • Merge a few more patches from Andrew Morton:
    "A few leftovers"

    * emailed patches from Andrew Morton :
    fs/ncpfs/dir.c: fix indenting in ncp_lookup()
    ncpfs/inode.c: fix mismatch printk formats and arguments
    ncpfs: remove now unused PRINTK macro
    ncpfs: convert PPRINTK to ncp_vdbg
    ncpfs: convert DPRINTK/DDPRINTK to ncp_dbg
    ncpfs: Add pr_fmt and convert printks to pr_
    arch/x86/mm/kmemcheck/kmemcheck.c: use kstrtoint() instead of sscanf()
    lib/percpu_counter.c: fix bad percpu counter state during suspend
    autofs4: check dev ioctl size before allocating
    mm: vmscan: do not swap anon pages just because free+file is low

    Linus Torvalds
     
  • My static checker suggests adding curly braces here. Probably that was
    the intent, but actually the code works the same either way. I've just
    changed the indenting and left the code as-is.

    Signed-off-by: Dan Carpenter
    Cc: Petr Vandrovec
    Acked-by: Dave Chiluk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • Conversions to ncp_dbg showed some format/argument mismatches so fix
    them.

    Signed-off-by: Joe Perches
    Cc: Petr Vandrovec
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Uses are gone, remove the macro.

    Signed-off-by: Joe Perches
    Cc: Petr Vandrovec
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Use a more current logging style.

    Convert the paranoia debug statement to vdbg.
    Remove the embedded function names as dynamic_debug can do that.

    Signed-off-by: Joe Perches
    Cc: Petr Vandrovec
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Use a more current logging style and enable use of dynamic debugging.

    Remove embedded function names, dynamic debug can add this instead.

    Signed-off-by: Joe Perches
    Cc: Petr Vandrovec
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Convert to a more current logging style.

    Add pr_fmt to prefix with "ncpfs: ".
    Remove the embedded function names and use "%s: ", __func__

    Some previously unprefixed messages now have "ncpfs: "

    Signed-off-by: Joe Perches
    Cc: Petr Vandrovec
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Kmemcheck should use the preferred interface for parsing command line
    arguments, kstrto*(), rather than sscanf() itself. Use it
    appropriately.
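    kstrtoint() is a kernel-only helper, so the sketch below is a
    user-space analogue built on strtol(). It assumes kstrtoint()'s main
    rules: no leading whitespace, at most one trailing newline, no trailing
    garbage, and -ERANGE on overflow. parse_int_strict() is a hypothetical
    name.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Strict integer parse in the spirit of kstrtoint(s, base, res).
 * Returns 0 on success or a negative errno value. */
static int parse_int_strict(const char *s, int base, int *res)
{
    char *end;
    long v;

    assert(s != NULL && res != NULL);
    if (*s == '\0' || isspace((unsigned char)*s))
        return -EINVAL;               /* empty or leading whitespace */
    errno = 0;
    v = strtol(s, &end, base);
    if (end == s)
        return -EINVAL;               /* no digits at all */
    if (*end == '\n')
        end++;                        /* one trailing newline is tolerated */
    if (*end != '\0')
        return -EINVAL;               /* trailing junk sscanf("%d") would ignore */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -ERANGE;
    *res = (int)v;
    return 0;
}
```

    Unlike sscanf(s, "%d", &v), which happily accepts "42abc" as 42, the
    strict parser rejects it, and it reports overflow instead of silently
    truncating.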

    Signed-off-by: David Rientjes
    Cc: Vegard Nossum
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • I got a bug report yesterday from Laszlo Ersek in which he states that
    his kvm instance fails to suspend. Laszlo bisected it down to this
    commit 1cf7e9c68fe8 ("virtio_blk: blk-mq support") where virtio-blk is
    converted to use the blk-mq infrastructure.

    After digging a bit, it became clear that the issue was with the queue
    drain. blk-mq tracks queue usage in a percpu counter, which is
    incremented on request alloc and decremented when the request is freed.
    The initial hunt was for an inconsistency in blk-mq, but everything
    seemed fine. In fact, the counter only returned crazy values when
    suspend was in progress.

    When a CPU is unplugged, the percpu counter code merges that CPU's state into
    the general state. blk-mq takes care to register a hotcpu notifier with
    the appropriate priority, so we know it runs after the percpu counter
    notifier. However, the percpu counter notifier only merges the state
    when the CPU is fully gone. This leaves a state transition where the
    CPU going away is no longer in the online mask, yet it still holds
    private values. This means that in this state, percpu_counter_sum()
    returns invalid results, and the suspend then hangs waiting for
    abs(dead-cpu-value) requests to complete, which of course will never
    happen.

    Fix this by clearing the state earlier, so we never have a case where
    the CPU isn't in the online mask but still holds private state. This bug
    has been there forever; I guess we don't have many users for whom
    percpu counters need to be reliable during the suspend cycle.
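    A toy model reproduces the window. The sketch below is a simplified,
    hypothetical model (fixed NR_CPUS, a plain online[] array, no locking)
    and not lib/percpu_counter.c. Like percpu_counter_sum(), it sums the
    per-cpu deltas only for online cpus, and it compares the old and fixed
    orderings.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Global count plus private per-cpu deltas (simplified model). */
struct pcp_counter {
    long count;
    long delta[NR_CPUS];
};

static bool online[NR_CPUS] = { true, true, true, true };

/* Sum over online cpus only, as percpu_counter_sum() does. */
static long counter_sum(const struct pcp_counter *c)
{
    long sum = c->count;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (online[cpu])
            sum += c->delta[cpu];
    return sum;
}

/* Fold a cpu's private delta into the global count. */
static void fold_cpu(struct pcp_counter *c, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    c->count += c->delta[cpu];
    c->delta[cpu] = 0;
}

/* Old order: the cpu leaves the online mask first and is folded only once
 * fully gone; returns what a reader sees inside that window. */
static long unplug_old_window(struct pcp_counter *c, int cpu)
{
    online[cpu] = false;
    long seen = counter_sum(c);
    fold_cpu(c, cpu);
    return seen;
}

/* Fixed order: fold the private state before the cpu leaves the mask. */
static long unplug_fixed(struct pcp_counter *c, int cpu)
{
    fold_cpu(c, cpu);
    online[cpu] = false;
    return counter_sum(c);
}
```

    With deltas {1, 2, 0, -3} (a true total of 0), the old ordering briefly
    reports 3 when cpu 3 leaves the mask. The fixed ordering reports 0
    throughout.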

    Signed-off-by: Jens Axboe
    Reported-by: Laszlo Ersek
    Tested-by: Laszlo Ersek
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jens Axboe
     
  • There wasn't any check of the size passed from userspace before trying
    to allocate the memory required.

    This meant that userspace might request more space than allowed,
    triggering an OOM.
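    The general pattern is to validate the user-supplied size before
    allocating. The sketch below is hypothetical. The header layout and the
    MAX_PAYLOAD bound are assumptions for illustration, not the autofs4
    ioctl definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size header copied in first; `size` is the total
 * request size claimed by userspace. */
struct ioctl_hdr {
    unsigned int size;
    unsigned int cmd;
};

/* Assumed upper bound on the variable part, for illustration only. */
#define MAX_PAYLOAD 4096u

/* Check the claimed size *before* allocating, so a bogus value fails with
 * an error instead of driving an arbitrarily large allocation. */
static void *alloc_request(const struct ioctl_hdr *hdr, int *err)
{
    void *p;

    assert(hdr != NULL && err != NULL);
    if (hdr->size < sizeof(*hdr) || hdr->size > sizeof(*hdr) + MAX_PAYLOAD) {
        *err = -EINVAL;
        return NULL;
    }
    p = malloc(hdr->size);
    if (p == NULL) {
        *err = -ENOMEM;
        return NULL;
    }
    memcpy(p, hdr, sizeof(*hdr));   /* the payload would be copied after it */
    *err = 0;
    return p;
}
```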

    Signed-off-by: Sasha Levin
    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • Page reclaim force-scans / swaps anonymous pages when file cache drops
    below the high watermark of a zone in order to prevent what little cache
    remains from thrashing.

    However, on bigger machines the high watermark value can be quite large
    and when the workload is dominated by a static anonymous/shmem set, the
    file set might just be a small window of used-once cache. In such
    situations, the VM starts swapping heavily when instead it should be
    recycling the no longer used cache.

    This is a longer-standing problem, but it's more likely to trigger after
    commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy")
    because file pages can no longer accumulate in a single zone and are
    dispersed into smaller fractions among the available zones.

    To resolve this, do not force scan anon when file pages are low but
    instead rely on the scan/rotation ratios to make the right prediction.

    Signed-off-by: Johannes Weiner
    Acked-by: Rafael Aquini
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Suleiman Souhlal
    Cc: [3.12+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Pull more networking updates from David Miller:

    1) If a VXLAN interface is created with no groups, we can crash on
    reception of packets. Fix from Mike Rapoport.

    2) Missing includes in CPTS driver, from Alexei Starovoitov.

    3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki
    and Dan Carpenter.

    4) Missing irq.h include in bnx2x, enic, and qlcnic drivers. From
    Josh Boyer.

    5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel
    Borkmann.

    6) Byte-Queue-Limit enabled drivers aren't handled properly in
    AF_PACKET transmit path, also from Daniel Borkmann.

    Same problem exists in pktgen, and Daniel fixed it there too.

    7) Fix resource leaks in driver probe error paths of new sxgbe driver,
    from Francois Romieu.

    8) Truesize of SKBs can gradually get more and more corrupted in NAPI
    packet recycling path, fix from Eric Dumazet.

    9) Fix uniprocessor netfilter build, from Florian Westphal. In the
    longer term we should perhaps try to find a way for ARRAY_SIZE() to
    work even with zero sized array elements.

    10) Fix crash in netfilter conntrack extensions due to mis-estimation of
    required extension space. From Andrey Vagin.

    11) Since we commit table rule updates before trying to copy the
    counters back to userspace (it's the last action we perform), we
    really can't signal the user copy with an error as we are beyond the
    point from which we can unwind everything. This causes all kinds of
    use after free crashes and other mysterious behavior.

    From Thomas Graf.

    12) Restore previous behavior of div/mod by zero in BPF filter
    processing. From Daniel Borkmann.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
    net: sctp: wake up all assocs if sndbuf policy is per socket
    isdnloop: several buffer overflows
    netdev: remove potentially harmful checks
    pktgen: fix xmit test for BQL enabled devices
    net/at91_ether: avoid NULL pointer dereference
    tipc: Let tipc_release() return 0
    at86rf230: fix MAX_CSMA_RETRIES parameter
    mac802154: fix duplicate #include headers
    sxgbe: fix duplicate #include headers
    net: filter: be more defensive on div/mod by X==0
    netfilter: Can't fail and free after table replacement
    xen-netback: Trivial format string fix
    net: bcmgenet: Remove unnecessary version.h inclusion
    net: smc911x: Remove unused local variable
    bonding: Inactive slaves should keep inactive flag's value
    netfilter: nf_tables: fix wrong format in request_module()
    netfilter: nf_tables: set names cannot be larger than 15 bytes
    netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
    netfilter: Add {ipt,ip6t}_osf aliases for xt_osf
    netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks
    ...

    Linus Torvalds
     
  • Pull more staging patches from Greg KH:
    "Here are some more staging patches for 3.15-rc1.

    They include a late-submission of a wireless driver that a bunch of
    people seem to have the hardware for now. As it's stand-alone, it
    should be fine (now passes the 0-day random build bot tests).

    There are also some fixes for the unisys drivers, as they were causing
    havoc on a number of different machines. To resolve all of those
    issues, we just mark the driver as BROKEN now, and we can fix it up
    "properly" over time"

    * tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    staging: rtl8723au: The 8723 only has two paths
    Staging: unisys: mark drivers as BROKEN
    Staging: unisys: verify that a control channel exists
    staging: unisys: Add missing close parentheses in filexfer.c
    staging: r8723au: Fix build problem when RFKILL is not selected
    staging: r8723au: Fix randconfig build errors
    staging: r8723au: Turn on build of new driver
    staging: r8723au: Additional source patches
    staging: r8723au: Add source files for new driver - part 4
    staging: r8723au: Add source files for new driver - part 3
    staging: r8723au: Add source files for new driver - part 2
    staging: r8723au: Add source files for new driver - part 1

    Linus Torvalds
     
  • Pull second set of arm64 updates from Catalin Marinas:
    "A second pull request for this merging window, mainly with fixes and
    docs clarification:

    - Documentation clarification on CPU topology and booting
    requirements
    - Additional cache flushing during boot (needed in the presence of
    external caches or under virtualisation)
    - DMA range invalidation fix for non cache line aligned buffers
    - Build failure fix with !COMPAT
    - Kconfig update for STRICT_DEVMEM"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: Fix DMA range invalidation for cache line unaligned buffers
    arm64: Add missing Kconfig for CONFIG_STRICT_DEVMEM
    arm64: fix !CONFIG_COMPAT build failures
    Revert "arm64: virt: ensure visibility of __boot_cpu_mode"
    arm64: Relax the kernel cache requirements for boot
    arm64: Update the TCR_EL1 translation granule definitions for 16K pages
    ARM: topology: Make it clear that all CPUs need to be described

    Linus Torvalds
     
  • Pull second set of s390 patches from Martin Schwidefsky:
    "The second part of Heiko's uaccess rework, the page table walker for
    uaccess is now a thing of the past (yay!)

    The code change to fix the theoretical TLB flush problem allows us to
    add a TLB flush optimization for zEC12: this machine has new
    instructions that allow CPU-local TLB flushes for single pages
    and for all pages of a specific address space.

    Plus the usual bug fixing and some more cleanup"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/uaccess: rework uaccess code - fix locking issues
    s390/mm,tlb: optimize TLB flushing for zEC12
    s390/mm,tlb: safeguard against speculative TLB creation
    s390/irq: Use defines for external interruption codes
    s390/irq: Add defines for external interruption codes
    s390/sclp: add timeout for queued requests
    kvm/s390: also set guest pages back to stable on kexec/kdump
    lcs: Add missing destroy_timer_on_stack()
    s390/tape: Add missing destroy_timer_on_stack()
    s390/tape: Use del_timer_sync()
    s390/3270: fix crash with multiple reset device requests
    s390/bitops,atomic: add missing memory barriers
    s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6

    Linus Torvalds
     
  • SCTP charges chunks for wmem accounting via skb->truesize in
    sctp_set_owner_w(), and sctp_wfree() respectively as the
    reverse operation. If a sender runs out of wmem, it needs to
    wait via sctp_wait_for_sndbuf(), and gets woken up by a call
    to __sctp_write_space() mostly via sctp_wfree().

    __sctp_write_space() is being called per association. Although
    we assign sk->sk_write_space() to sctp_write_space(), which
    is then being done per socket, it is only used if send space
    is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
    is set and therefore not invoked in sock_wfree().

    Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf
    again when transmitting packet") fixed an issue where in case
    sctp_packet_transmit() manages to queue up more than sndbuf
    bytes, sctp_wait_for_sndbuf() will never be woken up again
    unless it is interrupted by a signal. However, a still
    remaining issue is that if net.sctp.sndbuf_policy=0, that is
    accounting per socket, and one-to-many sockets are in use,
    the reclaimed write space from sctp_wfree() is 'unfairly'
    handed back on the server to the association that is the lucky
    one to be woken up again via __sctp_write_space(), while
    the remaining associations are never woken up again
    (unless by a signal).

    The effect disappears with net.sctp.sndbuf_policy=1, that
    is wmem accounting per association, as it guarantees a fair
    share of wmem among associations.

    Therefore, if we have reclaimed memory in case of per socket
    accounting, wake all related associations to a socket in a
    fair manner, that is, traverse the socket association list
    starting from the current neighbour of the association and
    issue a __sctp_write_space() to everyone until we end up
    waking ourselves. This guarantees that no association is
    preferred over another and even if more associations are
    taken into the one-to-many session, all receivers will get
    messages from the server and are not stalled forever on
    high load. This setting still leaves the advantage of per
    socket accounting intact, as an association can still use
    up global limits if unused by others.
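    The fair wake-up order can be sketched on a plain circular list. The
    code below is a hypothetical model and not net/sctp code. struct assoc
    stands in for an association, and the wake callback stands in for
    __sctp_write_space(). The walk starts at the neighbour of the
    association that freed the space and ends with that association itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical association node on a circular list (last->next == first). */
struct assoc {
    int id;
    struct assoc *next;
};

/* Wake every association exactly once: begin with the neighbour of
 * `self` and finish with `self`, so no association is preferred. */
static void wake_all_fair(struct assoc *self,
                          void (*wake)(struct assoc *, void *), void *ctx)
{
    struct assoc *a;

    assert(self != NULL && wake != NULL);
    for (a = self->next;; a = a->next) {
        wake(a, ctx);
        if (a == self)
            break;
    }
}

/* Usage helper: record the order in which associations were woken. */
struct wake_log {
    int ids[16];
    int n;
};

static void record_wake(struct assoc *a, void *ctx)
{
    struct wake_log *wl = ctx;

    if (wl->n < 16)
        wl->ids[wl->n++] = a->id;
}
```

    Take a ring 0 -> 1 -> 2 -> 0 in which association 1 frees space. The
    wake-up order is 2, 0, 1.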

    Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.")
    Signed-off-by: Daniel Borkmann
    Cc: Thomas Graf
    Cc: Neil Horman
    Cc: Vlad Yasevich
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Daniel Borkmann