10 Oct, 2015

1 commit


06 Oct, 2015

1 commit

  • Pull xen bug fixes from David Vrabel:

    - Fix VM save performance regression with x86 PV guests

    - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI)

    - Other minor fixes.

    * tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    x86/xen/p2m: hint at the last populated P2M entry
    x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map
    x86/xen: Support kexec/kdump in HVM guests by doing a soft reset
    xen/x86: Don't try to write syscall-related MSRs for PV guests
    xen: use correct type for HYPERVISOR_memory_op()

    Linus Torvalds
     

04 Oct, 2015

1 commit

  • Pull strscpy string copy function implementation from Chris Metcalf.

    Chris sent this during the merge window, but I waffled back and forth on
    the pull request, which is why it's going in only now.

    The new "strscpy()" function is definitely easier to use and more secure
    than either strncpy() or strlcpy(), both of which are horrible nasty
    interfaces that have serious and irredeemable problems.

    strncpy() has a useless return value, and doesn't NUL-terminate an
    overlong result. To make matters worse, it pads a short result with
    zeroes, which is a performance disaster if you have big buffers.

    strlcpy(), by contrast, is a mis-designed "fix" for strncpy(), lacking
    the insane NUL padding, but having a differently broken return value
    which returns the original length of the source string. Which means
    that it will read characters past the count from the source buffer, and
    you have to trust the source to be properly terminated. It also makes
    error handling fragile, since the test for overflow is unnecessarily
    subtle.

    strscpy() avoids both these problems, guaranteeing the NUL termination
    (but not excessive padding) if the destination size wasn't zero, and
    making the overflow condition very obvious by returning -E2BIG. It also
    doesn't read past the size of the source, and can thus be used for
    untrusted source data too.
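
    As a rough illustration of the semantics described above, here is a
    minimal userspace sketch. sketch_strscpy() is a hypothetical stand-in
    written for this note, not the kernel implementation (which copies a
    word at a time); only the contract is modeled: bounded reads of the
    source, guaranteed NUL termination, and -E2BIG on truncation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/*
 * Hypothetical userspace model of the strscpy() contract.
 * Reads at most count bytes from src, always NUL-terminates dest when
 * count > 0, and returns the number of bytes copied (excluding the NUL)
 * or -E2BIG if the source did not fit.
 */
static ssize_t sketch_strscpy(char *dest, const char *src, size_t count)
{
    size_t i;

    assert(dest != NULL && src != NULL);
    if (count == 0)
        return -E2BIG;

    for (i = 0; i < count; i++) {
        dest[i] = src[i];
        if (src[i] == '\0')
            return (ssize_t)i;   /* whole string fit */
    }

    dest[count - 1] = '\0';      /* truncated: terminate and say so */
    return -E2BIG;
}
```

    A caller only needs to check for a negative return value to detect
    truncation, rather than comparing a returned length against the buffer
    size as strlcpy() requires.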

    So why did I waffle about this for so long?

    Every time we introduce a new-and-improved interface, people start doing
    these interminable series of trivial conversion patches.

    And every time that happens, somebody does some silly mistake, and the
    conversion patch to the improved interface actually makes things worse.
    Because the patch is mindnumbing and trivial, nobody has the attention
    span to look at it carefully, and it's usually done over large swatches
    of source code which means that not every conversion gets tested.

    So I'm pulling the strscpy() support because it *is* a better interface.
    But I will refuse to pull mindless conversion patches. Use this in
    places where it makes sense, but don't do trivial patches to fix things
    that aren't actually known to be broken.

    * 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    tile: use global strscpy() rather than private copy
    string: provide strscpy()
    Make asm/word-at-a-time.h available on all architectures

    Linus Torvalds
     

03 Oct, 2015

2 commits

  • Pull drm fixes from Dave Airlie:
    "Bunch of fixes all over the place, all pretty small: amdgpu, i915,
    exynos, one qxl and one vmwgfx.

    There is also a bunch of mst fixes, I left some cleanups in the series
    as I didn't think it was worth splitting up the tested series"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits)
    drm/dp/mst: add some defines for logical/physical ports
    drm/dp/mst: drop cancel work sync in the mstb destroy path (v2)
    drm/dp/mst: split connector registration into two parts (v2)
    drm/dp/mst: update the link_address_sent before sending the link address (v3)
    drm/dp/mst: fixup handling hotplug on port removal.
    drm/dp/mst: don't pass port into the path builder function
    drm/radeon: drop radeon_fb_helper_set_par
    drm: handle cursor_set2 in restore_fbdev_mode
    drm/exynos: Staticize local function in exynos_drm_gem.c
    drm/exynos: fimd: actually disable dp clock
    drm/exynos: dp: remove suspend/resume functions
    drm/qxl: recreate the primary surface when the bo is not primary
    drm/amdgpu: only print meaningful VM faults
    drm/amdgpu/cgs: remove import_gpu_mem
    drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2
    drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2
    drm/vmwgfx: Fix a command submission hang regression
    drm/exynos: remove unused mode_fixup() code
    drm/exynos: remove decon_mode_fixup()
    drm/exynos: remove fimd_mode_fixup()
    ...

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:
    "Another week, another round of fixes.

    These have been brewing for a bit and in various iterations, but I
    feel pretty comfortable about the quality of them. They fix real
    issues. The pull request is mostly blk-mq related, and the only one
    not fixing a real bug, is the tag iterator abstraction from Christoph.
    But it's pretty trivial, and we'll need it for another fix soon.

    Apart from the blk-mq fixes, there's an NVMe affinity fix from Keith,
    and a single fix for xen-blkback from Roger fixing failure to free
    requests on disconnect"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    blk-mq: factor out a helper to iterate all tags for a request_queue
    blk-mq: fix racy updates of rq->errors
    blk-mq: fix deadlock when reading cpu_list
    blk-mq: avoid inserting requests before establishing new mapping
    blk-mq: fix q->mq_usage_counter access race
    blk-mq: Fix use after of free q->mq_map
    blk-mq: fix sysfs registration/unregistration race
    blk-mq: avoid setting hctx->tags->cpumask before allocation
    NVMe: Set affinity after allocating request queues
    xen/blkback: free requests on disconnection

    Linus Torvalds
     

02 Oct, 2015

11 commits

  • Pull IOVA fixes from David Woodhouse:
    "The main fix here is the first one, fixing the over-allocation of
    size-aligned requests. The other patches simply make the existing
    IOVA code available to users other than the Intel VT-d driver, with no
    functional change.

    I concede the latter really *should* have been submitted during the
    merge window, but since it's basically risk-free and people are
    waiting to build on top of it and it's my fault I didn't get it in, I
    (and they) would be grateful if you'd take it"

    * git://git.infradead.org/intel-iommu:
    iommu: Make the iova library a module
    iommu: iova: Export symbols
    iommu: iova: Move iova cache management to the iova library
    iommu/iova: Avoid over-allocating when size-aligned

    Linus Torvalds
     
  • This just removes the magic number.

    Acked-by: Daniel Vetter
    Signed-off-by: Dave Airlie

    Dave Airlie
     
  • In order to cache the EDID properly for tiled displays, we
    need to retrieve it before we register the connector with
    userspace, otherwise userspace can call get resources
    and try and get the edid before we've even cached it.

    This fixes some problems when hotplugging mst monitors
    with X/mutter running, as mutter seems to get 0 modes
    for one of the monitors in the tile.

    v2: fix warning in radeon
    handle tile setting in cached path rather than
    get edid path.

    Reviewed-by: Daniel Vetter
    Cc: stable@vger.kernel.org
    Signed-off-by: Dave Airlie

    Dave Airlie
     
  • Merge misc fixes from Andrew Morton:
    "12 fixes"

    * emailed patches from Andrew Morton:
    dmapool: fix overflow condition in pool_find_page()
    thermal: avoid division by zero in power allocator
    memcg: remove pcp_counter_lock
    kprobes: use _do_fork() in samples to make them work again
    drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE
    memcg: make mem_cgroup_read_stat() unsigned
    memcg: fix dirty page migration
    dax: fix NULL pointer in __dax_pmd_fault()
    mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
    mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1)
    userfaultfd: remove kernel header include from uapi header
    arch/x86/include/asm/efi.h: fix build failure

    Linus Torvalds
     
  • Pull power management and ACPI fixes from Rafael Wysocki:
    "These are fixes mostly, for a few changes made in this cycle (the
    intel_idle driver, the OPP library, the ACPI EC driver, turbostat) and
    for some issues that have just been discovered (ACPI PCI IRQ
    management, PCI power management documentation, turbostat), with a
    couple of cleanups on top of them.

    Specifics:

    - intel_idle driver fixup for the recently added Skylake chips
    support (Len Brown).

    - Operating Performance Points (OPP) library fix related to the
    recently added support for new DT bindings and a fix for a typo in
    a comment (Viresh Kumar, Stephen Boyd).

    - ACPI EC driver fix for a recently introduced memory leak in an
    error code path (Lv Zheng).

    - ACPI PCI IRQ management fix for the issue where an ISA IRQ is
    shared with a PCI device which requires it to be configured in a
    different way and may cause an interrupt storm to happen as a
    result with an extra ACPI SCI IRQ handling simplification on top of
    it (Jiang Liu).

    - Update of the PCI power management documentation that became
    outdated and started to actively confuse the readers to make it
    actually reflect the code (Rafael J Wysocki).

    - turbostat fixes including an IVB Xeon regression fix (related to
    the --debug command line option), Skylake adjustment for the TSC
    running at a frequency that doesn't match the base one exactly, and
    a Knights Landing quirk to account for the fact that it only
    updates APERF and MPERF every 1024 clock cycles plus bumping up the
    turbostat version number (Len Brown, Hubert Chrzaniuk)"

    * tag 'pm+acpi-4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    tools/power turbosat: update version number
    tools/power turbostat: SKL: Adjust for TSC difference from base frequency
    tools/power turbostat: KNL workaround for %Busy and Avg_MHz
    tools/power turbostat: IVB Xeon: fix --debug regression
    ACPI / PCI: Remove duplicated penalty on SCI IRQ
    ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ
    ACPI / EC: Fix a memory leak issue in acpi_ec_query()
    PM / OPP: Fix typo modifcation -> modification
    PCI / PM: Update runtime PM documentation for PCI devices
    PM / OPP: of_property_count_u32_elems() can return errors
    intel_idle: Skylake Client Support - updated

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Fix regression in SKB partial checksum handling, from Pravin B
    Shelar.

    2) Fix VLAN inside of VXLAN handling in i40e driver, from Jesse
    Brandeburg.

    3) Cure softlockups during accept() in SCTP, from Karl Heiss.

    4) MSG_PEEK should return multiple SKBs worth of data in AF_UNIX, from
    Aaron Conole.

    5) IPV6 erroneously ignores output interface specifier in lookup key for
    route lookups, fix from David Ahern.

    6) In Marvell DSA driver, forward unknown frames to CPU port, from
    Andrew Lunn.

    7) Missing flow flag initializations in some code paths, from David
    Ahern.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: Initialize flow flags in input path
    net: dsa: fix preparation of a port STP update
    testptp: Silence compiler warnings on ppc64
    net/mlx4: Handle return codes in mlx4_qp_attach_common
    dsa: mv88e6xxx: Enable forwarding for unknown to the CPU port
    skbuff: Fix skb checksum partial check.
    net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set
    net sysfs: Print link speed as signed integer
    bna: fix error handling
    af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag
    af_unix: Convert the unix_sk macro to an inline function for type safety
    net: sctp: Don't use 64 kilobyte lookup table for four elements
    l2tp: protect tunnel->del_work by ref_count
    net/ibm/emac: bump version numbers for correct work with ethtool
    sctp: Prevent soft lockup when sctp_accept() is called during a timeout event
    sctp: Whitespace fix
    i40e/i40evf: check for stopped admin queue
    i40e: fix VLAN inside VXLAN
    r8169: fix handling rtl_readphy result
    net: hisilicon: fix handling platform_get_irq result

    Linus Torvalds
     
  • Commit 733a572e66d2 ("memcg: make mem_cgroup_read_{stat|event}() iterate
    possible cpus instead of online") removed the last use of the per memcg
    pcp_counter_lock but forgot to remove the variable.

    Kill the vestigial variable.

    Signed-off-by: Greg Thelen
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     
  • The problem starts with a file backed dirty page which is charged to a
    memcg. Then page migration is used to move oldpage to newpage.

    Migration:
    - copies the oldpage's data to newpage
    - clears oldpage.PG_dirty
    - sets newpage.PG_dirty
    - uncharges oldpage from memcg
    - charges newpage to memcg

    Clearing oldpage.PG_dirty decrements the charged memcg's dirty page
    count.

    However, because newpage is not yet charged, setting newpage.PG_dirty
    does not increment the memcg's dirty page count. After migration
    completes newpage.PG_dirty is eventually cleared, often in
    account_page_cleaned(). At this time newpage is charged to a memcg so
    the memcg's dirty page count is decremented which causes underflow
    because the count was not previously incremented by migration. This
    underflow causes balance_dirty_pages() to see a very large unsigned
    number of dirty memcg pages which leads to aggressive throttling of
    buffered writes by processes in non root memcg.
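
    The accounting sequence above can be modeled in a few lines. This is a
    toy model only: the model_* structs and functions are hypothetical names
    made up for this note and do not correspond to the kernel's struct page
    or memcg code. It shows how clearing a dirty bit that was never counted
    wraps an unsigned counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature model; not the kernel data structures. */
struct model_memcg { unsigned long nr_dirty; };
struct model_page  { struct model_memcg *memcg; bool dirty; };

static void model_set_dirty(struct model_page *p)
{
    p->dirty = true;
    if (p->memcg)                 /* uncharged pages are not accounted */
        p->memcg->nr_dirty++;
}

static void model_clear_dirty(struct model_page *p)
{
    p->dirty = false;
    if (p->memcg)
        p->memcg->nr_dirty--;     /* wraps if it was never incremented */
}

/* The problematic order: the dirty bit moves before the charge does. */
static void model_migrate_buggy(struct model_page *oldp,
                                struct model_page *newp)
{
    struct model_memcg *cg;

    assert(oldp != NULL && newp != NULL);
    model_clear_dirty(oldp);      /* memcg count is decremented */
    model_set_dirty(newp);        /* newp uncharged: no increment */
    cg = oldp->memcg;
    oldp->memcg = NULL;           /* uncharge oldpage */
    newp->memcg = cg;             /* charge newpage */
}
```

    After model_migrate_buggy(), a later model_clear_dirty() on the new
    page decrements a count of zero, which is the underflow that
    balance_dirty_pages() then sees as a huge number of dirty pages.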

    This issue:
    - can harm performance of non root memcg buffered writes.
    - can report too small (even negative) values in
    memory.stat[(total_)dirty] counters of all memcg, including the root.

    To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
    page_memcg() and set_page_memcg() helpers.

    Test:
    0) setup and enter limited memcg
    mkdir /sys/fs/cgroup/test
    echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/test/cgroup.procs

    1) buffered writes baseline
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    2) buffered writes with compaction antagonist to induce migration
    yes 1 > /proc/sys/vm/compact_memory &
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    kill %
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    3) buffered writes without antagonist, should match baseline
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    (speed, dirty residue)
                 unpatched                        patched
    1) 841 MB/s          0 dirty pages    886 MB/s  0 dirty pages
    2) 611 MB/s  -33427456 dirty pages    793 MB/s  0 dirty pages
    3) 114 MB/s  -33427456 dirty pages    891 MB/s  0 dirty pages

    Notice that unpatched baseline performance (1) fell after
    migration (3): 841 -> 114 MB/s. In the patched kernel, post
    migration performance matches baseline.

    Fixes: c4843a7593a9 ("memcg: add per cgroup dirty page accounting")
    Signed-off-by: Greg Thelen
    Reported-by: Dave Hansen
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: [4.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     
  • As include/uapi/linux/userfaultfd.h is a user visible header file, it
    should not include kernel-exclusive header files.

    So trying to build the userfaultfd test program from the selftests
    directory fails, since it contains a reference to linux/compiler.h. As
    it turns out, that header is not really needed there, so we can simply
    remove it to fix that issue.

    Signed-off-by: Andre Przywara
    Cc: Andrea Arcangeli
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andre Przywara
     
  • Pull rdma fixes from Doug Ledford:
    - Fixes for mlx5 related issues
    - Fixes for ipoib multicast handling

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
    IB/ipoib: increase the max mcast backlog queue
    IB/ipoib: Make sendonly multicast joins create the mcast group
    IB/ipoib: Expire sendonly multicast joins
    IB/mlx5: Remove pa_lkey usages
    IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY
    IB/iser: Add module parameter for always register memory
    xprtrdma: Replace global lkey with lkey local to PD

    Linus Torvalds
     
  • * pm-pci:
    PCI / PM: Update runtime PM documentation for PCI devices

    * acpi-pci:
    ACPI / PCI: Remove duplicated penalty on SCI IRQ
    ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ

    Rafael J. Wysocki
     

01 Oct, 2015

2 commits

    And replace the blk_mq_tag_busy_iter with it - the driver use has been
    replaced with a new helper a while ago, and internal to the block layer
    we only need the new version.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • blk_mq_complete_request may be a no-op if the request has already
    been completed by other means (e.g. a timeout or cancellation), but
    currently drivers have to set rq->errors before calling
    blk_mq_complete_request, which might leave us with the wrong error value.

    Add an error parameter to blk_mq_complete_request so that we can
    defer setting rq->errors until we know we won the race to complete the
    request.
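
    The idea of writing the error only after winning the completion race can
    be sketched with C11 atomics. The struct and function below are
    hypothetical, simplified stand-ins for illustration; the real blk-mq code
    uses its own request flags and completion paths.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request; not struct request from the kernel. */
struct sketch_request {
    atomic_flag completed;   /* set by whichever path completes first */
    int errors;
};

/*
 * Returns true if this caller won the race and recorded its error.
 * A loser returns false and leaves rq->errors untouched.
 */
static bool sketch_complete_request(struct sketch_request *rq, int error)
{
    assert(rq != NULL);
    if (atomic_flag_test_and_set(&rq->completed))
        return false;        /* already completed, e.g. by a timeout */
    rq->errors = error;      /* only the winner writes the error */
    return true;
}
```

    With the error passed as a parameter, a driver that loses the race to a
    timeout handler can no longer overwrite the error value the winner
    recorded.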

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

30 Sep, 2015

4 commits

  • drm_kms_helper_poll_enable() was converted to lock the mode_config
    mutex in commit 8c4ccc4ab6f64e859d4ff8d7c02c2ed2e956e07f
    ("drm/probe-helper: Grab mode_config.mutex in poll_init/enable").

    This disregarded the cases where this function is called from a context
    where this mutex is already locked.

    Add a non-locking version as well.

    Changes since v1:
    - use function name suffix '_locked' for the function that
    is to be called from a locked context.
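
    The locked/unlocked split follows a common pattern, sketched here with a
    toy lock. Everything named toy_* is hypothetical and exists only for this
    illustration; the toy lock asserts on re-acquisition where a real mutex
    would deadlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: taking it twice is treated as a bug. */
struct toy_mutex { bool held; };

static void toy_lock(struct toy_mutex *m)   { assert(!m->held); m->held = true; }
static void toy_unlock(struct toy_mutex *m) { assert(m->held);  m->held = false; }

struct toy_device {
    struct toy_mutex mode_mutex;
    bool poll_enabled;
};

/* For callers that already hold dev->mode_mutex. */
static void toy_poll_enable_locked(struct toy_device *dev)
{
    assert(dev->mode_mutex.held);
    dev->poll_enabled = true;
}

/* For callers that do not hold the lock; takes it around the work. */
static void toy_poll_enable(struct toy_device *dev)
{
    toy_lock(&dev->mode_mutex);
    toy_poll_enable_locked(dev);
    toy_unlock(&dev->mode_mutex);
}
```

    Calling toy_poll_enable() from a context that already holds the lock
    would trip the assertion in toy_lock(), which is the situation the
    '_locked' variant exists to avoid.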

    Signed-off-by: Egbert Eich
    Reviewed-by: Daniel Vetter
    Signed-off-by: Jani Nikula

    Egbert Eich
     
    Earlier patch 6ae459bda tried to detect void checksum partial
    skb by comparing pull length to checksum offset. But it does
    not work for all cases since checksum-offset depends on
    updates to skb->data.

    Following patch fixes it by validating checksum start offset
    after skb-data pointer is updated. Negative value of checksum
    offset start means there is no need to checksum.

    Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull")
    Reported-by: Andrew Vagin
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • As suggested by Eric Dumazet this change replaces the
    #define with a static inline function to enjoy
    complaints by the compiler when misusing the API.
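
    The difference between the two forms can be seen in a small standalone
    sketch. The structs here are cut-down stand-ins and the names
    UNIX_SK_MACRO and unix_sk_inline are made up for this note; the real
    definition lives in the af_unix headers and may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures. */
struct sock      { int family; };
struct unix_sock { struct sock sk; int peer_id; };   /* sk must stay first */

/* Macro form: the cast accepts any pointer, so misuse compiles silently. */
#define UNIX_SK_MACRO(ptr) ((struct unix_sock *)(ptr))

/*
 * Inline form: the parameter type is checked, so passing anything other
 * than a struct sock pointer draws a compiler diagnostic.
 */
static inline struct unix_sock *unix_sk_inline(struct sock *sk)
{
    assert(sk != NULL);
    return (struct unix_sock *)sk;
}
```

    Both forms produce the same pointer for a valid argument; only the
    inline function lets the compiler reject, for example, an int pointer
    passed by mistake.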

    Signed-off-by: Aaron Conole
    Signed-off-by: David S. Miller

    Aaron Conole
     
  • There is a race between cpu hotplug handling and adding/deleting
    gendisk for blk-mq, where both are trying to register and unregister
    the same sysfs entries.

    null_add_dev
    --> blk_mq_init_queue
    --> blk_mq_init_allocated_queue
    --> add to 'all_q_list' (*)
    --> add_disk
    --> blk_register_queue
    --> blk_mq_register_disk (++)

    null_del_dev
    --> del_gendisk
    --> blk_unregister_queue
    --> blk_mq_unregister_disk (--)
    --> blk_cleanup_queue
    --> blk_mq_free_queue
    --> del from 'all_q_list' (*)

    blk_mq_queue_reinit
    --> blk_mq_sysfs_unregister (-)
    --> blk_mq_sysfs_register (+)

    While the request queue is on 'all_q_list' (*),
    blk_mq_queue_reinit() can be called for the queue anytime by the CPU
    hotplug callback. But blk_mq_sysfs_unregister (-) and
    blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not be called
    before blk_mq_register_disk (++) or after blk_mq_unregister_disk (--)
    has finished, because '/sys/block/*/mq/' does not exist then.

    There is already a BLK_MQ_F_SYSFS_UP flag in hctx->flags which can
    be used to track this sysfs state, but it only fixes this issue
    partially.

    In order to fix it completely, we just need a per-queue flag instead of
    a per-hctx flag, with appropriate locking. So this introduces
    q->mq_sysfs_init_done which is properly protected with all_q_mutex.

    Also, we need to ensure that blk_mq_map_swqueue() is called with
    all_q_mutex held. Since hctx->nr_ctx is reset temporarily and
    updated in blk_mq_map_swqueue(), we should avoid
    blk_mq_register_hctx() seeing the temporary hctx->nr_ctx value
    in CPU hotplug handling or adding/deleting gendisk.
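
    A toy model of the per-queue flag is sketched below. All toy_* names are
    hypothetical and the locking is only described in a comment; this is an
    illustration of the gating logic, not the blk-mq code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model. Every function is assumed to run with one global
 * queue-list lock held, which is what makes the flag check safe.
 */
struct toy_queue {
    bool mq_sysfs_init_done;   /* true between register and unregister */
    int  sysfs_entries;        /* stands in for the mq sysfs directory */
};

static void toy_register_disk(struct toy_queue *q)       /* (++) */
{
    q->sysfs_entries = 1;
    q->mq_sysfs_init_done = true;
}

static void toy_unregister_disk(struct toy_queue *q)     /* (--) */
{
    q->mq_sysfs_init_done = false;
    q->sysfs_entries = 0;
}

/* CPU hotplug path: touch sysfs only while the directory exists. */
static bool toy_queue_reinit(struct toy_queue *q)
{
    assert(q != NULL);
    if (!q->mq_sysfs_init_done)
        return false;          /* nothing registered: skip (-) and (+) */
    q->sysfs_entries = 0;      /* unregister (-) */
    q->sysfs_entries = 1;      /* register (+) */
    return true;
}
```

    Because the flag belongs to the queue rather than to each hardware
    context, a single check under the lock covers both the window before
    registration and the window after unregistration.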

    Signed-off-by: Akinobu Mita
    Reviewed-by: Ming Lei
    Cc: Ming Lei
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Akinobu Mita
     

28 Sep, 2015

2 commits

  • Currently there are a number of issues preventing PVHVM Xen guests from
    doing successful kexec/kdump:

    - Bound event channels.
    - Registered vcpu_info.
    - PIRQ/emuirq mappings.
    - shared_info frame after XENMAPSPACE_shared_info operation.
    - Active grant mappings.

    Basically, newly booted kernel stumbles upon already set up Xen
    interfaces and there is no way to reestablish them. In Xen-4.7 a new
    feature called 'soft reset' is coming. A guest performing kexec/kdump
    operation is supposed to call SCHEDOP_shutdown hypercall with
    SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor
    (with some help from toolstack) will do full domain cleanup (but
    keeping its memory and vCPU contexts intact) returning the guest to
    the state it had when it was first booted and thus allowing it to
    start over.

    Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is
    probably OK as by default all unknown shutdown reasons cause domain
    destroy with a message in toolstack log: 'Unknown shutdown reason code
    5. Destroying domain.' which gives a clue to what the problem is and
    eliminates false expectations.

    Signed-off-by: Vitaly Kuznetsov
    Cc:
    Signed-off-by: David Vrabel

    Vitaly Kuznetsov
     
  • …k/linux-rcu into core/urgent

    Pull RCU fixes from Paul E. McKenney, for two regressions
    introduced in this merge window:

    - Fix bug with recent GCCs.
    - Fix false positive lockdep splat.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

27 Sep, 2015

1 commit

  • Pull SCSI target fixes from Nicholas Bellinger:
    "This includes an iser-target series from Jenny + Sagi @ Mellanox that
    addresses the few remaining active I/O shutdown bugs, along with a
    patch to support zero-copy for immediate data payloads that gives a
    nice performance improvement for small block WRITEs.

    Also included are some recent >= v4.2 regression bug-fixes. The most
    notable is a RCU conversion regression for SPC-3 PR registrations, and
    recent removal of obsolete RFC-3720 markers that introduced a login
    regression bug with MSFT iSCSI initiators.

    Thanks to everyone who has been testing + reporting bugs for v4.x"

    * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
    iscsi-target: Avoid OFMarker + IFMarker negotiation
    target: Make TCM_WRITE_PROTECT failure honor D_SENSE bit
    target: Fix target_sense_desc_format NULL pointer dereference
    target: Propigate backend read-only to core_tpg_add_lun
    target: Fix PR registration + APTPL RCU conversion regression
    iser-target: Skip data copy if all the command data comes as immediate
    iser-target: Change the recv buffers posting logic
    iser-target: Fix pending connections handling in target stack shutdown sequnce
    iser-target: Remove np_ prefix from isert_np members
    iser-target: Remove unused variables
    iser-target: Put the reference on commands waiting for unsol data
    iser-target: remove command with state ISTATE_REMOVE

    Linus Torvalds
     

26 Sep, 2015

4 commits

  • Pull networking fixes from David Miller:

    1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
    instead of cloning them. From Daniel Borkmann.

    2) When converting classical BPF into eBPF, fix the setting of the
    source reg to BPF_REG_X. From Tycho Andersen.

    3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
    Linus Lussing.

    4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.

    5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
    Roopa Prabhu.

    6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.

    7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
    Florian Westphal.

    8) Check DMA mapping errors in bna driver, from Ivan Vecera.

    9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
    misclearing of interrupt status in TX timeout handler, etc.) from
    David Woodhouse.

    10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
    Hugne.

    11) Fix autobind races et al. in netlink code, from Herbert Xu with
    help from Tejun Heo and others.

    12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.

    13) Fix various races in timewait timer and reqsk_queue_hash_req, from
    Eric Dumazet.

    14) Fix array overruns in mac80211, from Johannes Berg and Dan
    Carpenter.

    15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.

    16) Fix race between poll_one_napi and napi_disable, from Neil Horman.

    17) Fix byte order in geneve tunnel port config, from John W Linville.

    18) Fix handling of ARP replies over lightweight tunnels, from Jiri
    Benc.

    19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
    Kok and Roopa Prabhu.

    20) Several reference count handling bug fixes in the PHY/MDIO layer
    from Russell King.

    21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.

    22) Fix crash in icmp_route_lookup(), from David Ahern.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
    net: Fix panic in icmp_route_lookup
    net: update docbook comment for __mdiobus_register()
    ppp: fix lockdep splat in ppp_dev_uninit()
    net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
    phy: marvell: add link partner advertised modes
    net: fix net_device refcounting
    phy: add phy_device_remove()
    phy: fixed-phy: properly validate phy in fixed_phy_update_state()
    net: fix phy refcounting in a bunch of drivers
    of_mdio: fix MDIO phy device refcounting
    phy: add proper phy struct device refcounting
    phy: fix mdiobus module safety
    net: dsa: fix of_mdio_find_bus() device refcount leak
    phy: fix of_mdio_find_bus() device refcount leak
    ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
    ip6_gre: Reduce log level in ip6gre_err() to debug
    fib_rules: fix fib rule dumps across multiple skbs
    bnx2x: byte swap rss_key to comply to Toeplitz specs
    net: revert "net_sched: move tp->root allocation into fw_init()"
    lwtunnel: remove source and destination UDP port config option
    ...

    Linus Torvalds
     
  • Avoid IRQs occupied by ISA IRQs when allocating IRQs for PCI link devices,
    otherwise it may cause an interrupt storm due to incompatible pin attributes.

    This issue was triggered on a KVM virtual machine, which
    1) uses IRQ9 for SCI in high level mode.
    2) defines a PCI interrupt link device (LNKS) with IRQ9 as the only
    possible irq.
    3) has a PCI device referring to link device LNKS.
    So it causes an interrupt storm when enabling the PCI device because PCI IRQ
    works in low level mode.

    Signed-off-by: Jiang Liu
    Acked-by: Bjorn Helgaas
    Signed-off-by: Rafael J. Wysocki

    Jiang Liu
     
  • Pull another cgroup fix from Tejun Heo:
    "The cgroup writeback support got inadvertently enabled for traditional
    hierarchies revealing two regressions which are currently being worked
    on. It shouldn't have been enabled on traditional hierarchies, so
    disable it on them. This is enough to make the regressions go away
    for people who aren't experimenting with cgroup"

    * 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup, writeback: don't enable cgroup writeback on traditional hierarchies

    Linus Torvalds
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Stable patches:
    - fix v4.2 SEEK on files over 2 gigs
    - Fix a layout segment reference leak when pNFS I/O falls back to inband I/O.
    - Fix recovery of recalled read delegations

    Bugfixes:
    - Fix a case where NFSv4 fails to send CLOSE after a server reboot
    - Fix sunrpc to wait for connections to complete before retrying
    - Fix sunrpc races between transport connect/disconnect and shutdown
    - Fix an infinite loop when layoutget fail with BAD_STATEID
    - nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
    - Fix a bogus WARN_ON_ONCE() in O_DIRECT when layout commit_through_mds is set
    - Fix layoutreturn/close ordering issues"

    * tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFS41: make close wait for layoutreturn
    NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set
    NFSv4.x/pnfs: Don't try to recover stateids twice in layoutget
    NFSv4: Recovery of recalled read delegations is broken
    NFS: Fix an infinite loop when layoutget fail with BAD_STATEID
    NFS: Do cleanup before resetting pageio read/write to mds
    SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose
    SUNRPC: Lock the transport layer on shutdown
    nfs/filelayout: Fix NULL reference caused by double freeing of fh_array
    SUNRPC: Ensure that we wait for connections to complete before retrying
    SUNRPC: drop null test before destroy functions
    nfs: fix v4.2 SEEK on files over 2 gigs
    SUNRPC: Fix races between socket connection and destroy code
    nfs: fix pg_test page count calculation
    Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

    Linus Torvalds
     

25 Sep, 2015

11 commits

  • Commit 96249d70dd70 ("IB/core: Guarantee that a local_dma_lkey
    is available") allows ULPs that make use of the local dma key to keep
    working as before by allocating a DMA MR with local permissions and
    converted these consumers to use the MR associated with the PD
    rather than device->local_dma_lkey.

    ConnectIB has some known issues with memory registration
    using the local_dma_lkey (SEND, RDMA, RECV seems to work ok).

    Thus don't expose support for it (remove device->local_dma_lkey
    setting), and take advantage of the above commit such that no regression
    is introduced to working systems.

    The local_dma_lkey support will be restored in CX4 depending on FW
    capability query.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Doug Ledford

    Sagi Grimberg
     
  • This patch adds a DF_READ_ONLY flag that is used by IBLOCK to
    signal when a backend has been set to read-only mode, in order
    to propagate read-only status up to core_tpg_add_lun() for all
    future LUN fabric exports.

    With this in place, existing emulation for reporting read-only
    in spc_emulate_modesense() and normal transport_lookup_cmd_lun()
    TCM_WRITE_PROTECTED status checking just works as expected.

    Reported-by: Joeue Deng
    Reported-by: Andy Grover
    Signed-off-by: Nicholas Bellinger

    Nicholas Bellinger
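
The read-only propagation described in the DF_READ_ONLY commit above can be sketched as a small user-space model. This is not the target core code: the `toy_` types, the flag value, and the error constant are all assumptions made for illustration. It shows only the flow the message describes, where the backend sets a device flag, the flag is copied to the LUN when it is exported, and later writes are rejected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; the real DF_READ_ONLY bit is not given in the log. */
#define TOY_DF_READ_ONLY     0x1u
/* Hypothetical status standing in for TCM_WRITE_PROTECTED. */
#define TOY_WRITE_PROTECTED  (-1)

struct toy_se_device {
    uint32_t dev_flags;
};

struct toy_lun {
    bool write_protected;
};

/* Backend configuration: record whether the backing store is read-only. */
static void toy_backend_configure(struct toy_se_device *dev, bool backend_read_only)
{
    if (backend_read_only)
        dev->dev_flags |= TOY_DF_READ_ONLY;
    else
        dev->dev_flags &= ~TOY_DF_READ_ONLY;
}

/* LUN export: the device flag is sampled once, when the LUN is added. */
static void toy_add_lun(struct toy_lun *lun, const struct toy_se_device *dev)
{
    lun->write_protected = (dev->dev_flags & TOY_DF_READ_ONLY) != 0;
}

/* Command lookup: writes to a protected LUN fail, everything else passes. */
static int toy_check_write(const struct toy_lun *lun)
{
    return lun->write_protected ? TOY_WRITE_PROTECTED : 0;
}
```

Because the flag is sampled at export time in this model, only LUNs added after the backend is marked read-only pick it up, which matches the "all future LUN fabric exports" wording.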
     
  • Add a phy_device_remove() function to complement phy_device_register(),
    which undoes the effects of phy_device_register() by removing the phy
    device from visibility, but not freeing it.

    This allows these details to be moved out of the mdio bus code into
    the phy code where this action belongs.

    Signed-off-by: Russell King
    Signed-off-by: David S. Miller

    Russell King
     
  • Re-implement the mdiobus module refcounting so that it actually
    ensures that the mdiobus module code does not go away while we might
    call into it.

    The old scheme using bus->dev.driver was buggy, because bus->dev is a
    class device which never has a struct device_driver associated with it,
    and hence the associated code trying to obtain a refcount did nothing
    useful.

    Instead, take the approach that other subsystems do: pass the module
    when calling mdiobus_register(), and record that in the mii_bus struct.
    When we need to increment the module use count in the phy code, use
    this stored pointer. When the phy is detached, drop the module
    refcount, remembering that the phy device might go away at that point.

    This doesn't stop the mii_bus going away while there are in-use phys -
    it merely stops the underlying code vanishing.

    Signed-off-by: Russell King
    Signed-off-by: David S. Miller

    Russell King
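
The refcounting scheme described in the mdiobus commit above can be modelled in user space as follows. This is a sketch under assumptions, not the kernel implementation: every `toy_` name is hypothetical, and the use count is a plain integer rather than the kernel's module reference machinery. It illustrates the three steps from the message: record the owner at registration, pin it on attach, and drop it on detach.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel module: a use count and a liveness flag. */
struct toy_module {
    int refcnt;
    bool live;              /* false once unloading has started */
};

/* Stand-in for the bus structure; only the recorded owner matters here. */
struct toy_mii_bus {
    struct toy_module *owner;
};

struct toy_phy {
    struct toy_mii_bus *bus;
    bool attached;
};

/* Take a reference unless the module is already going away. */
static bool toy_module_get(struct toy_module *m)
{
    if (m == NULL)
        return true;        /* nothing to pin for built-in code */
    if (!m->live)
        return false;
    m->refcnt++;
    return true;
}

static void toy_module_put(struct toy_module *m)
{
    if (m != NULL)
        m->refcnt--;
}

/* Registration records the owning module in the bus structure. */
static void toy_mdiobus_register(struct toy_mii_bus *bus, struct toy_module *owner)
{
    bus->owner = owner;
}

/* Attaching a phy pins the bus driver's code via the stored pointer. */
static int toy_phy_attach(struct toy_phy *phy, struct toy_mii_bus *bus)
{
    if (!toy_module_get(bus->owner))
        return -1;
    phy->bus = bus;
    phy->attached = true;
    return 0;
}

/* Detaching drops the reference; the owner is read before the phy is cleared. */
static void toy_phy_detach(struct toy_phy *phy)
{
    struct toy_module *owner = phy->bus->owner;

    phy->attached = false;
    phy->bus = NULL;
    toy_module_put(owner);
}

/* In this model, unloading is allowed only when no phy holds a reference. */
static bool toy_module_can_unload(const struct toy_module *m)
{
    return m->refcnt == 0;
}
```

As in the commit message, the model pins only the module code. Nothing here prevents the bus structure itself from being torn down while phys are still in use.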
     
  • Pull thermal management fixes from Zhang Rui:

    - Power allocator governor changes to allow binding on thermal zones
    with missing power estimates information. From Javi Merino.

    - Add compile test flags on thermal drivers that allow it without
    producing compilation errors. From Eduardo Valentin.

    - Fixes around memory allocation on cpu_cooling. From Javi Merino.

    - Fix on db8500 cpufreq code to allow autoload. From Luis de
    Bethencourt.

    - Maintainer entries for cpu cooling device

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
    thermal: power_allocator: exit early if there are no cooling devices
    thermal: power_allocator: don't require tzp to be present for the thermal zone
    thermal: power_allocator: relax the requirement of two passive trip points
    thermal: power_allocator: relax the requirement of a sustainable_power in tzp
    thermal: Add a function to get the minimum power
    thermal: cpu_cooling: free power table on error or when unregistering
    thermal: cpu_cooling: don't call kcalloc() under rcu_read_lock
    thermal: db8500_cpufreq_cooling: Fix module autoload for OF platform driver
    thermal: cpu_cooling: Add MAINTAINERS entry
    thermal: ti-soc: Kconfig fix to avoid menu showing wrongly
    thermal: ti-soc: allow compile test
    thermal: qcom_spmi: allow compile test
    thermal: exynos: allow compile test
    thermal: armada: allow compile test
    thermal: dove: allow compile test
    thermal: kirkwood: allow compile test
    thermal: rockchip: allow compile test
    thermal: spear: allow compile test
    thermal: hisi: allow compile test
    thermal: Fix thermal_zone_of_sensor_register to match documentation

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton:
    "15 fixes"

    * emailed patches from Andrew Morton:
    ocfs2/dlm: fix deadlock when dispatch assert master
    membarrier: clean up selftest
    vmscan: fix sane_reclaim helper for legacy memcg
    lib/iommu-common.c: do not try to deref a null iommu->lazy_flush() pointer when n < pool->hint
    x86, efi, kasan: #undef memset/memcpy/memmove per arch
    mm: migrate: hugetlb: putback destination hugepage to active list
    mm, dax: VMA with vm_ops->pfn_mkwrite wants to be write-notified
    userfaultfd: register uapi generic syscall (aarch64)
    userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical
    userfaultfd: selftest: return an error if BOUNCE_VERIFY fails
    userfaultfd: selftest: avoid my_bcmp false positives with powerpc
    userfaultfd: selftest: only warn if __NR_userfaultfd is undefined
    userfaultfd: selftest: headers fixup
    userfaultfd: selftests: vm: pick up sanitized kernel headers
    userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key"

    Linus Torvalds
     
  • The UDP tunnel config is asymmetric wrt. the ports used. The source and
    destination ports from one direction of the tunnel are not related to the
    ports of the other direction. We need to be able to respond to ARP requests
    using the correct ports without involving routing.

    As a consequence, UDP ports need to be a fixed property of the tunnel
    interface and cannot be set per route. Remove the ability to set ports per
    route. This is still okay to do, as no kernel has been released with these
    attributes yet.

    Note that the ability to specify source and destination ports is preserved
    for other users of the lwtunnel API which don't use routes for tunnel key
    specification (like openvswitch).

    If in the future we rework ARP handling to allow port specification, the
    attributes can be added back.

    Signed-off-by: Jiri Benc
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • When using ip lwtunnels, the additional data for xmit (basically, the actual
    tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
    metadata dst. When replying to ARP requests, we need to send the reply to
    the same tunnel the request came from. This means we need to construct
    proper metadata dst for ARP replies.

    We could perform another route lookup to get a dst entry with the correct
    lwtstate. However, this won't always ensure that the outgoing tunnel is the
    same as the incoming one, and it won't work anyway for IPv4 duplicate
    address detection.

    The only thing to do is to "reverse" the ip_tunnel_info.

    Signed-off-by: Jiri Benc
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jiri Benc
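
The idea of "reversing" the tunnel info for an ARP reply, as described in the lwtunnel commit above, might look like the following user-space sketch. The struct layout and all `toy_` names are assumptions and do not mirror the real ip_tunnel_info. The model swaps the outer addresses, keeps the tunnel id, and marks the result for transmit. It carries no ports, on the assumption that they are a fixed property of the tunnel interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified tunnel metadata; not the kernel's ip_tunnel_info. */
struct toy_tun_info {
    uint32_t src;       /* outer source address */
    uint32_t dst;       /* outer destination address */
    uint64_t tun_id;    /* tunnel key, e.g. a VNI */
    bool tx;            /* true when the metadata describes transmit */
};

/*
 * Build transmit metadata from receive metadata so that a reply goes back
 * through the same tunnel the request arrived on.
 */
static struct toy_tun_info toy_tun_info_reverse(const struct toy_tun_info *rx)
{
    struct toy_tun_info reply = *rx;

    reply.src = rx->dst;    /* we answer from the address that was targeted */
    reply.dst = rx->src;    /* and send to the original sender */
    reply.tx = true;
    return reply;
}
```

No route lookup takes place in this model, which is why the same approach can also serve cases where a lookup would not return the incoming tunnel.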
     
  • A VXLAN device can receive an skb with checksum partial, but the checksum
    offset could be in the outer header, which is pulled on receive. This results
    in a negative checksum offset for the skb. Such an skb can cause the assert
    failure in skb_checksum_help(). This patch fixes the bug by setting
    checksum-none while pulling the outer header.

    The following is the kernel panic message from an old kernel hitting the bug.

    ------------[ cut here ]------------
    kernel BUG at net/core/dev.c:1906!
    RIP: 0010:[] skb_checksum_help+0x144/0x150
    Call Trace:

    [] queue_userspace_packet+0x408/0x470 [openvswitch]
    [] ovs_dp_upcall+0x5d/0x60 [openvswitch]
    [] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
    [] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
    [] ovs_vport_receive+0x2a/0x30 [openvswitch]
    [] vxlan_rcv+0x53/0x60 [openvswitch]
    [] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
    [] udp_queue_rcv_skb+0x2dc/0x3b0
    [] __udp4_lib_rcv+0x1cf/0x6c0
    [] udp_rcv+0x1a/0x20
    [] ip_local_deliver_finish+0xdd/0x280
    [] ip_local_deliver+0x88/0x90
    [] ip_rcv_finish+0x10d/0x370
    [] ip_rcv+0x235/0x300
    [] __netif_receive_skb+0x55d/0x620
    [] netif_receive_skb+0x80/0x90
    [] virtnet_poll+0x555/0x6f0
    [] net_rx_action+0x134/0x290
    [] __do_softirq+0xa8/0x210
    [] call_softirq+0x1c/0x30
    [] do_softirq+0x65/0xa0
    [] irq_exit+0x8e/0xb0
    [] do_IRQ+0x63/0xe0
    [] common_interrupt+0x6e/0x6e

    Reported-by: Anupam Chanda
    Signed-off-by: Pravin B Shelar
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Pravin B Shelar
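
The negative checksum offset described in the VXLAN commit above can be reproduced in a small user-space model. This is only an illustration: `toy_skb` and its fields are assumptions, not the real sk_buff layout. Both offsets are measured from the start of the buffer, so pulling the outer header past the checksum start makes the data-relative offset negative. The model then resets the checksum state, as the message describes.

```c
#include <stddef.h>

enum toy_csum {
    TOY_CSUM_NONE,
    TOY_CSUM_PARTIAL
};

/* Hypothetical, simplified packet descriptor. */
struct toy_skb {
    size_t data_off;        /* current data start, from the buffer head */
    size_t csum_start;      /* where checksumming starts, from the buffer head */
    enum toy_csum ip_summed;
};

/* Checksum start relative to the current data pointer; may be negative. */
static long toy_csum_start_offset(const struct toy_skb *skb)
{
    return (long)skb->csum_start - (long)skb->data_off;
}

/*
 * Pull the outer header. If the checksum start is now behind the data
 * pointer, the partial-checksum state is meaningless, so reset it.
 */
static void toy_pull_outer(struct toy_skb *skb, size_t hdr_len)
{
    skb->data_off += hdr_len;
    if (skb->ip_summed == TOY_CSUM_PARTIAL && toy_csum_start_offset(skb) < 0)
        skb->ip_summed = TOY_CSUM_NONE;
}
```

A packet whose checksum start lies in the inner headers keeps its partial state after the pull, so only the problematic case is affected.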
     
  • inode_cgwb_enabled() gates cgroup writeback support. If it returns
    true, each inode is attached to the corresponding memory domain which
    gets mapped to io domain. It currently only tests whether the
    filesystem and bdi support cgroup writeback; however, cgroup writeback
    support doesn't work on traditional hierarchies and thus it should
    also test whether memcg and iocg are on the default hierarchy.

    This caused traditional hierarchy setups to hit the cgroup writeback
    path inadvertently, creating separate writeback domains for each
    memcg and mapping them all to the root iocg, which uncovered a
    couple of issues in the cgroup writeback path.

    cgroup writeback was never meant to be enabled on traditional
    hierarchies. Make inode_cgwb_enabled() test whether both memcg and
    iocg are on the default hierarchy.

    Signed-off-by: Tejun Heo
    Reported-by: Artem Bityutskiy
    Reported-by: Dexuan Cui
    Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
    Link: http://lkml.kernel.org/g/f30d4a6aa8a546ff88f73021d026a453@SIXPR30MB031.064d.mgd.msft.net

    Tejun Heo
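
The change to the gating test described in the cgroup writeback commit above can be sketched as a pair of predicates. The struct and the `toy_` function names are assumptions for illustration; the real inode_cgwb_enabled() inspects kernel state rather than a struct of booleans. The sketch contrasts the old test (filesystem and bdi support only) with the new one, which also requires both controllers to be on the default hierarchy.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the conditions the gate depends on. */
struct toy_cgwb_state {
    bool fs_supports_cgwb;
    bool bdi_supports_cgwb;
    bool memcg_on_default_hierarchy;
    bool iocg_on_default_hierarchy;
};

/* Old behaviour: only filesystem and bdi support were checked. */
static bool toy_cgwb_enabled_old(const struct toy_cgwb_state *s)
{
    return s->fs_supports_cgwb && s->bdi_supports_cgwb;
}

/* New behaviour: both controllers must also be on the default hierarchy. */
static bool toy_cgwb_enabled_new(const struct toy_cgwb_state *s)
{
    return s->memcg_on_default_hierarchy &&
           s->iocg_on_default_hierarchy &&
           toy_cgwb_enabled_old(s);
}
```

On a traditional hierarchy setup with a capable filesystem and bdi, the old predicate returns true and the new one returns false, which is the case the commit is meant to close.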
     
  • Pull spi fixes from Mark Brown:
    "A disappointingly large collection of fixes for SPI issues, though
    almost all in drivers (and there mainly the newly added Mediatek
    driver) and the core fixes are documentation and error handling.

    The driver fixes are all of the usual 'important if you see them'
    variety"

    * tag 'spi-fix-v4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
    spi: xtensa-xtfpga: fix register endianness
    spi: meson: Fix module autoload for OF platform driver
    spi: mediatek: fix wrong error return value on probe
    spi: fix kernel-doc warnings in spi.h
    spi: spidev: fix possible NULL dereference
    spi: atmel: remove warning when !CONFIG_PM_SLEEP
    spi: bcm2835: BUG: fix wrong use of PAGE_MASK
    spi: mediatek: fix spi cs polarity error
    spi: Fix documentation of spi_alloc_master()
    spi: spi-pxa2xx: Check status register to determine if SSSR_TINT is disabled
    spi: Mediatek: Document devicetree bindings update for spi bus
    spi: mediatek: fix spi clock usage error
    spi: mediatek: remove clk_disable_unprepare()

    Linus Torvalds