04 Nov, 2017

29 commits

  • Files removed in 'net-next' had their license header updated
    in 'net'. We take the remove from 'net-next'.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Return success if the same dispatch function is being registered for
    a given opcode and subcode, thereby allowing multiple switchdev enables
    and disables.

    Signed-off-by: Vijaya Mohan Guvva
    Signed-off-by: Satanand Burla
    Signed-off-by: Felix Manlunas
    Signed-off-by: David S. Miller

    Vijaya Mohan Guvva
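
    The idempotent registration described in this commit can be modelled in
    a few lines. This is a minimal Python sketch, not the driver's code:
    `DispatchTable`, `register()` and the `-EBUSY` result for a conflicting
    handler are all assumptions made for illustration.

```python
import errno


class DispatchTable:
    """Toy (opcode, subcode) -> handler table; every name here is hypothetical."""

    def __init__(self):
        self._handlers = {}

    def register(self, opcode, subcode, fn):
        """Return 0 on success, a negative errno-style value on conflict."""
        key = (opcode, subcode)
        existing = self._handlers.get(key)
        if existing is None:
            self._handlers[key] = fn
            return 0
        if existing is fn:
            # Same function registered again: report success so that
            # repeated enable/disable cycles do not fail.
            return 0
        # A different handler already owns this slot (assumed behaviour).
        return -errno.EBUSY
```

    Registering the same handler twice for one (opcode, subcode) pair returns
    0 both times; only a different handler for an occupied slot is refused.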
     
  • Jiri Pirko says:

    ====================
    mlxsw: Handle changes in GRE configuration

    Petr says:

    Until now, when an IP tunnel was offloaded by the mlxsw driver, the
    offload was pretty much static, and changes in Linux configuration were
    not reflected in the hardware. That led to discrepancies between traffic
    flows in slow path and fast path. The work-around used to be to remove
    all routes that forward to the netdevice and re-add them. This is
    clearly suboptimal, but actually, as of the decap-only patchset, it's
    not even enough anymore, and one needs to go all the way and simply drop
    the tunnel and recreate it correctly.

    With this patchset, the NETDEV_CHANGE events that are generated for
    changes of up'd tunnel netdevices are captured and interpreted to
    correctly reconfigure the HW in accordance with changes requested at the
    software layer. In addition, NETDEV_CHANGEUPPER, NETDEV_UP and
    NETDEV_DOWN are now handled not only for tunnel devices themselves, but
    also for their bound devices. Each change is then translated to one or
    more of the following updates to the HW configuration:

    - refresh of offload of local route that corresponds to tunnel's local
    address
    - refresh of the loopback RIF
    - refresh of offloads of routes that forward to the changed tunnel
    - removal of tunnel offloads

    These tools are used to implement the following configuration changes:

    - addition of a new offloadable tunnel with local address that conflicts
    with that of an already-offloaded tunnel (the existing tunnel is
    onloaded, the new one isn't offloaded)
    - changes to TTL, TOS that make tunnel unsuitable for offloading
    - changes to ikey, okey, remote
    - changes to local, which when they cause conflict with another
    tunnel, lead to onloading of both newly-conflicting tunnels
    - migration of a bound device of an offloaded tunnel device to a
    different VRF
    - changes to what device is bound to a tunnel device (i.e. like what
    "ip tunnel change name g dev another" does)
    - changes to up / down state of a bound device. A down bound device
    doesn't forward encapsulated traffic anymore, but decap still works.

    This patchset starts with a suite of patches that adapt the existing
    code base step by step to facilitate introduction of the offloading
    code. The five substantial patches at the end then implement the changes
    mentioned above.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When the bound device of a tunnel device is down, encapsulated packets
    are not egressed anymore, but tunnel decap still works. Extend
    mlxsw_sp_nexthop_rif_update() to take IFF_UP into consideration when
    deciding whether a given next hop should be offloaded.

    Because the new logic was added to mlxsw_sp_nexthop_rif_update(), this
    fixes the case where a newly-added tunnel has a down bound device, which
    would previously be fully offloaded. Now the down state of the bound
    device is noted and next hops forwarding to such tunnel are not
    offloaded.

    In addition to that, notice NETDEV_UP and NETDEV_DOWN of a bound device
    to force refresh of tunnel encap route offloads.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
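
    The offload decision described in this commit can be sketched as a small
    predicate. This is a Python model under stated assumptions, not the
    body of mlxsw_sp_nexthop_rif_update(): the `Netdev` class, the function
    name and the treatment of a tunnel without a bound device are
    hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

IFF_UP = 0x1  # value of the Linux IFF_UP interface flag


@dataclass
class Netdev:
    name: str
    flags: int = 0


def encap_nexthop_should_offload(tunnel_offloaded: bool,
                                 bound_dev: Optional[Netdev]) -> bool:
    """Decide whether a next hop that encapsulates into a tunnel is offloaded."""
    if not tunnel_offloaded:
        return False
    if bound_dev is None:
        # Assumption: a tunnel with no bound device is not restricted.
        return True
    # A down bound device no longer egresses encapsulated packets,
    # so next hops forwarding to the tunnel stay in the slow path.
    return bool(bound_dev.flags & IFF_UP)
```

    Decap is deliberately not part of this predicate: per the commit text, it
    keeps working while the bound device is down.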
     
  • When a bound device of an IP-in-IP tunnel changes, such as through
    'ip tunnel change name $name dev $dev', the loopback backing the tunnel
    needs to be recreated.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Changes to L3 tunnel netdevices (through `ip tunnel change' as well as
    `ip link set') lead to NETDEV_CHANGE being generated on the tunnel
    device. Because what is relevant for the tunnel in question depends on
    the tunnel type, handling of the event is dispatched to the IPIP module
    through a newly-added interface mlxsw_sp_ipip_ops.ol_netdev_change().

    IPIP tunnels now remember the last set of tunnel parameters in struct
    mlxsw_sp_ipip_entry.parms, and use it to figure out what exactly has
    changed.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
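
    Remembering the last parameters and diffing them against the new ones
    might look roughly like this. The sketch is Python, not driver code;
    `TunnelParms`, `IpipEntry` and the field list are illustrative
    assumptions and do not mirror struct ip_tunnel_parm exactly.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TunnelParms:
    saddr: str
    daddr: str
    i_key: int
    o_key: int
    ttl: int
    tos: int
    link: int  # ifindex of the bound device


class IpipEntry:
    def __init__(self, parms: TunnelParms):
        self.parms = parms  # last known set of tunnel parameters

    def on_netdev_change(self, new_parms: TunnelParms) -> set:
        """Return the names of the fields that changed, then remember new_parms."""
        changed = {
            f.name
            for f in fields(TunnelParms)
            if getattr(self.parms, f.name) != getattr(new_parms, f.name)
        }
        self.parms = new_parms
        return changed
```

    A caller could then map the returned set to the required update, for
    example refreshing the loopback only when `link` is in the set.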
     
  • When a bound device of a tunnel netdevice changes VRF, the loopback RIF
    that backs the tunnel needs to be updated and existing encapsulating
    routes need to be refreshed.

    Note that several tunnels can share the same bound device, in which case
    all the impacted tunnels need to be updated.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • The approach for offloading IP tunnels implemented currently by mlxsw
    doesn't allow two tunnels that have the same local IP address in the
    same (underlay) VRF. Previously, offloads were introduced on demand as
    encap routes were formed. When such a route was created that would cause
    offload of a conflicting tunnel, mlxsw_sp_ipip_entry_create() would
    detect it and return -EEXIST, which would propagate up and cause FIB
    abort.

    Now however IPIP entries are created as soon as an offloadable netdevice
    is created, and the failure prevents creation of such device.
    Furthermore, if the driver is installed at the point where such
    conflicting tunnels exist, the failure actually prevents successful
    modprobe.

    Furthermore, follow-up patches implement handling of NETDEV_CHANGE due
    to the local address change. However, NETDEV_CHANGE can't be vetoed. The
    failure merely means that the offloads weren't updated, but the change
    in Linux configuration is not rolled back. It is thus desirable to have
    a robust way of handling these conflicts, which can later be reused for
    handling NETDEV_CHANGE as well.

    To fix this, when a conflicting tunnel is created, instead of failing,
    simply pull the old tunnel to slow path and reject offloading the
    new one.

    Introduce two functions: mlxsw_sp_ipip_entry_demote_tunnel() and
    mlxsw_sp_ipip_demote_tunnel_by_saddr() to handle this. Make them both
    public, because they will be useful later on in this patchset.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
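
    The conflict policy from this commit (demote the old tunnel, do not
    offload the new one) can be modelled as follows. This is a Python
    sketch; `TunnelOffloads` and its methods are hypothetical names, not
    the mlxsw_sp_ipip_* functions themselves.

```python
class TunnelOffloads:
    """Tracks offloaded tunnels, keyed by (underlay VRF, local address)."""

    def __init__(self):
        self._offloaded = {}

    def is_offloaded(self, name):
        return name in self._offloaded.values()

    def demote_by_saddr(self, vrf, saddr):
        """Pull the tunnel using (vrf, saddr) back to the slow path.

        Returns the demoted tunnel's name, or None if nothing conflicted.
        """
        return self._offloaded.pop((vrf, saddr), None)

    def add_tunnel(self, name, vrf, saddr):
        """Return True if the new tunnel ended up offloaded."""
        if self.demote_by_saddr(vrf, saddr) is not None:
            # Conflict: the old tunnel was demoted and the new one
            # is rejected for offloading. Creation itself still succeeds.
            return False
        self._offloaded[(vrf, saddr)] = name
        return True
```

    The point of the design is that `add_tunnel()` never fails outright, so
    netdevice creation and modprobe are not blocked by a conflict.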
     
  • When trying to determine whether there are other offloaded tunnels with
    the same local address, mlxsw_sp_ipip_entry_create() should look for a
    tunnel with matching UL protocol, matching saddr, in the same VRF.
    However instead of taking into account the UL protocol of the tunnel
    netdevice (which mlxsw_sp_ipip_entry_saddr_matches() then compares to
    the UL protocol of inspected IPIP entry), it deduces the UL protocol
    from the inspected IPIP entry (and that's compared to itself).

    This is currently immaterial, because only one tunnel type is offloaded,
    and therefore the UL protocol always matches, but introducing support
    for a tunnel with IPv6 underlay would uncover this error.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
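
    The self-comparison described in this commit is easy to reproduce in a
    toy model. The Python below is a sketch only: entries are plain
    dictionaries, addresses are opaque tokens, and the function names are
    hypothetical.

```python
def saddr_matches(entry, ul_proto, saddr, vrf):
    return (entry["ul_proto"] == ul_proto
            and entry["saddr"] == saddr
            and entry["vrf"] == vrf)


def find_conflict_buggy(entries, new):
    for entry in entries:
        # Bug: the protocol is taken from the inspected entry itself,
        # so the protocol comparison is always true.
        if saddr_matches(entry, entry["ul_proto"], new["saddr"], new["vrf"]):
            return entry
    return None


def find_conflict_fixed(entries, new):
    for entry in entries:
        # Fix: use the underlay protocol of the tunnel being created.
        if saddr_matches(entry, new["ul_proto"], new["saddr"], new["vrf"]):
            return entry
    return None
```

    With a single underlay protocol both variants agree, which is why the
    error stays hidden until a second protocol is introduced.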
     
  • The work that needs to be done to update HW configuration in response to
    changes is similar to what __mlxsw_sp_ipip_entry_update_tunnel() already
    does, but with a number of twists: each change requires a different
    subset of things to happen. Extend the function to support all these
    uses, and allow finely-grained configuration of what should happen at
    each call through a suite of function arguments.

    Publish the updated function to allow use from the spectrum_ipip module.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • The work that's done by mlxsw_sp_netdevice_ipip_ol_vrf_event() is a good
    basis for a more versatile function that would take care of all sorts of
    tunnel update requests: __mlxsw_sp_ipip_entry_update_tunnel(). Extract
    that function. Factor out a helper mlxsw_sp_ipip_entry_ol_lb_update() as
    well.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • The function mlxsw_sp_rif_create() takes an extack parameter. So far,
    for creation of loopback interfaces, NULL was passed. For some events
    however the extack can be extracted and passed along. So do that for
    NETDEV_CHANGEUPPER handler.

    Use the opportunity to update the type of info argument that
    mlxsw_sp_netdevice_ipip_ol_event() takes. Follow-up patches will
    introduce handling of more changes, and some of them carry an extack as
    well, but in an info structure of a different type. Though not strictly
    erroneous (the pointer could be cast whichever way), it makes no sense
    to pretend the value is always of a certain type, when in fact it isn't.
    So change the prototype of the above-mentioned function as well.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • The piece of logic to promote decap route, if any, is useful for generic
    tunnel updates, not just for handling of NETDEV_UP events on tunnel
    interfaces. Extract it to a separate function.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • This function only ever returns 0, so don't pretend it returns anything
    useful and just make it void.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • To implement NETDEV_CHANGE notifications on IP-in-IP tunnels, the
    handler needs to figure out what actually changed, to understand how
    exactly to update the offloads. It will do so by storing struct
    ip_tunnel_parm with previous configuration, and comparing that to the
    new version.

    To facilitate these comparisons, extract the code that operates on
    struct ip_tunnel_parm from the existing accessor functions, and make
    those a thin wrapper that extracts tunnel parameters and dispatches.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • These functions ideologically belong to the IPIP module, and some
    follow-up work will benefit from their presence there.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Some of the code down the road needs this logic as well.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • To distinguish between events related to tunnel device itself and its
    bound device, rename a number of functions related to handling tunneling
    netdevice events to include _ol_ (for "overlay") in the name. That
    leaves room in the namespace for underlay-related functions, which would
    have _ul_ in the name.

    Signed-off-by: Petr Machata
    Reviewed-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Petr Machata
     
  • Pull clk fix from Stephen Boyd:
    "One fix for USB clks on Uniphier PXs3 SoCs"

    * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
    clk: uniphier: fix clock data for PXs3

    Linus Torvalds
     
  • Pull arch/tile fixes from Chris Metcalf:
    "Two one-line bug fixes"

    * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    arch/tile: Implement ->set_state_oneshot_stopped()
    tile: pass machine size to sparse

    Linus Torvalds
     
  • Pull SCSI fix from James Bottomley:
    "One minor fix in the error leg of the qla2xxx driver (it oopses the
    system if we get an error trying to start the internal kernel thread).

    The fix is minor because the problem isn't often encountered in the
    field (although it can be induced by inserting the module in a low
    memory environment)"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: qla2xxx: Fix oops in qla2x00_probe_one error path

    Linus Torvalds
     
  • set_state_oneshot_stopped() is called by the clkevt core, when the
    next event is required at an expiry time of 'KTIME_MAX'. This normally
    happens with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.

    This patch makes the clockevent device stop on such an event, to
    avoid spurious interrupts, as explained by: commit 8fff52fd5093
    ("clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state").

    Signed-off-by: Chris Metcalf

    Chris Metcalf
     
  • Pull powerpc fixes from Michael Ellerman:
    "Some more powerpc fixes for 4.14.

    This is bigger than I like to send at rc7, but that's at least partly
    because I didn't send any fixes last week. If it wasn't for the IMC
    driver, which is new and getting heavy testing, the diffstat would
    look a bit better. I've also added ftrace on big endian to my test
    suite, so we shouldn't break that again in future.

    - A fix to the handling of misaligned paste instructions (P9 only),
    where a change to a #define has caused the check for the
    instruction to always fail.

    - The preempt handling was unbalanced in the radix THP flush (P9
    only). Though we don't generally use preempt we want to keep it
    working as much as possible.

    - Two fixes for IMC (P9 only), one when booting with restricted
    number of CPUs and one in the error handling when initialisation
    fails due to firmware etc.

    - A revert to fix function_graph on big endian machines, and then a
    rework of the reverted patch to fix kprobes blacklist handling on
    big endian machines.

    Thanks to: Anju T Sudhakar, Guilherme G. Piccoli, Madhavan Srinivasan,
    Naveen N. Rao, Nicholas Piggin, Paul Mackerras"

    * tag 'powerpc-4.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/perf: Fix core-imc hotplug callback failure during imc initialization
    powerpc/kprobes: Dereference function pointers only if the address does not belong to kernel text
    Revert "powerpc64/elfv1: Only dereference function descriptor for non-text symbols"
    powerpc/64s/radix: Fix preempt imbalance in TLB flush
    powerpc: Fix check for copy/paste instructions in alignment handler
    powerpc/perf: Fix IMC allocation routine

    Linus Torvalds
     
  • Pull MMC fixes from Ulf Hansson:
    "Fix dw_mmc request timeout issues"

    * tag 'mmc-v4.14-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
    mmc: dw_mmc: Fix the DTO timeout calculation
    mmc: dw_mmc: Add locking to the CTO timer
    mmc: dw_mmc: Fix the CTO timeout calculation
    mmc: dw_mmc: cancel the CTO timer after a voltage switch

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:

    - one nouveau regression fix

    - some amdgpu fixes for stable to fix hangs on some harvested Polaris
    GPUs

    - a set of KASAN and regression fixes for i915, their CI system seems
    to be working pretty well now.

    * tag 'drm-fixes-for-v4.14-rc8' of git://people.freedesktop.org/~airlied/linux:
    drm/amdgpu: allow harvesting check for Polaris VCE
    drm/amdgpu: return -ENOENT from uvd 6.0 early init for harvesting
    drm/i915: Check incoming alignment for unfenced buffers (on i915gm)
    drm/nouveau/kms/nv50: use the correct state for base channel notifier setup
    drm/i915: Hold rcu_read_lock when iterating over the radixtree (vma idr)
    drm/i915: Hold rcu_read_lock when iterating over the radixtree (objects)
    drm/i915/edp: read edp display control registers unconditionally
    drm/i915: Do not rely on wm preservation for ILK watermarks
    drm/i915: Cancel the modeset retry work during modeset cleanup

    Linus Torvalds
     
  • Pull networking fixes from David Miller:
    "Hopefully this is the last batch of networking fixes for 4.14

    Fingers crossed...

    1) Fix stmmac to use the proper sized OF property read, from Bhadram
    Varka.

    2) Fix use after free in net scheduler tc action code, from Cong
    Wang.

    3) Fix SKB control block mangling in tcp_make_synack().

    4) Use proper locking in fib_dump_info(), from Florian Westphal.

    5) Fix IPG encodings in systemport driver, from Florian Fainelli.

    6) Fix division by zero in NV TCP congestion control module, from
    Konstantin Khlebnikov.

    7) Fix use after free in nf_reject_ipv4, from Tejaswi Tanikella"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net: systemport: Correct IPG length settings
    tcp: do not mangle skb->cb[] in tcp_make_synack()
    fib: fib_dump_info can no longer use __in_dev_get_rtnl
    stmmac: use of_property_read_u32 instead of read_u8
    net_sched: hold netns refcnt for each action
    net_sched: acquire RTNL in tc_action_net_exit()
    net: vrf: correct FRA_L3MDEV encode type
    tcp_nv: fix division by zero in tcpnv_acked()
    netfilter: nf_reject_ipv4: Fix use-after-free in send_reset
    netfilter: nft_set_hash: disable fast_ops for 2-len keys

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton:
    "7 fixes"

    * emailed patches from Andrew Morton:
    mm, swap: fix race between swap count continuation operations
    mm/huge_memory.c: deposit page table when copying a PMD migration entry
    initramfs: fix initramfs rebuilds w/ compression after disabling
    fs/hugetlbfs/inode.c: fix hwpoison reserve accounting
    ocfs2: fstrim: Fix start offset of first cluster group during fstrim
    mm, /proc/pid/pagemap: fix soft dirty marking for PMD migration entry
    userfaultfd: hugetlbfs: prevent UFFDIO_COPY to fill beyond the end of i_size

    Linus Torvalds
     
  • MIPS will soon not be a part of Imagination Technologies, and as such
    many @imgtec.com email addresses will no longer be valid. This patch
    updates the addresses for those who:

    - Have 10 or more patches in mainline authored using an @imgtec.com
    email address, or any patches dated within the past year.

    - Are still with Imagination but leaving as part of the MIPS business
    unit, as determined from an internal email address list.

    - Haven't already updated their email address (ie. JamesH) or expressed
    a desire to be excluded (ie. Maciej).

    - Acked v2 or earlier of this patch, which leaves Deng-Cheng, Matt &
    myself.

    New addresses are of the form firstname.lastname@mips.com, and all
    verified against an internal email address list. An entry is added to
    .mailmap for each person such that get_maintainer.pl will report the new
    addresses rather than @imgtec.com addresses which will soon be dead.

    Instances of the affected addresses throughout the tree are then
    mechanically replaced with the new @mips.com address.

    Signed-off-by: Paul Burton
    Cc: Deng-Cheng Zhu
    Acked-by: Dengcheng Zhu
    Cc: Matt Redfearn
    Acked-by: Matt Redfearn
    Cc: Andrew Morton
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: trivial@kernel.org
    Signed-off-by: Linus Torvalds

    Paul Burton
     

03 Nov, 2017

11 commits

  • Commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for
    /proc/cpuinfo "cpu MHz"") is not sufficient to restore the previous
    behavior of "cpu MHz" in /proc/cpuinfo on x86 due to some changes
    made after the commit it has reverted.

    To address this, make the code in question use arch_freq_get_on_cpu()
    which also is used by cpufreq for reporting the current frequency of
    CPUs and since that function doesn't really depend on cpufreq in any
    way, drop the CONFIG_CPU_FREQ dependency for the object file
    containing it.

    Also refactor arch_freq_get_on_cpu() somewhat to avoid IPIs and
    return cached values right away if it is called very often over a
    short time (to prevent user space from triggering IPI storms through
    it).

    Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
    Cc: stable@kernel.org # 4.13 - together with 890da9cf0983
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
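
    The "return cached values if called very often" idea can be sketched
    generically. This Python model is not arch_freq_get_on_cpu(): the
    class name, the `measure` callback and the default interval are
    assumptions chosen for illustration.

```python
import time


class CachedFreq:
    """Rate-limits an expensive measurement by reusing a recent result."""

    def __init__(self, measure, min_interval=0.01, clock=time.monotonic):
        self._measure = measure            # expensive call (stands in for an IPI)
        self._min_interval = min_interval  # seconds; arbitrary default
        self._clock = clock
        self._value = None
        self._stamp = None

    def get(self):
        now = self._clock()
        if self._stamp is not None and now - self._stamp < self._min_interval:
            # Called again too soon: hand back the cached sample.
            return self._value
        self._value = self._measure()
        self._stamp = now
        return self._value
```

    However often user space polls, the expensive path runs at most once per
    `min_interval`, which is what bounds the cost of frequent reads.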
     
  • One page may store a set of entries of the sis->swap_map
    (swap_info_struct->swap_map) in multiple swap clusters.

    If any of the entries has sis->swap_map[offset] > SWAP_MAP_MAX,
    multiple pages are used to store the set of entries of the
    sis->swap_map, and the pages are linked with page->lru. This is called
    swap count continuation. Previously, sis->lock was used to serialize
    access to the pages that store the set of entries of the sis->swap_map.
    But to improve the scalability of __swap_duplicate(), the swap
    cluster lock may now be used in swap_count_continued(). This may race
    with add_swap_count_continuation() which operates on a nearby swap
    cluster, in which the sis->swap_map entries are stored in the same page.

    The race can cause a wrong swap count in practice, and thus
    unfreeable swap entries, software lockups, etc.

    To fix the race, a new spin lock called cont_lock is added to struct
    swap_info_struct to protect the swap count continuation page list. This
    is a lock at the swap device level, so it does not scale very well.
    But it is still much better than the original sis->lock, because it is
    only acquired/released when swap count continuation is used, which is
    considered rare in practice. If it turns out that the scalability
    becomes an issue for some workloads, we can split the lock into some
    more fine grained locks.

    Link: http://lkml.kernel.org/r/20171017081320.28133-1-ying.huang@intel.com
    Fixes: 235b62176712 ("mm/swap: add cluster lock")
    Signed-off-by: "Huang, Ying"
    Cc: Johannes Weiner
    Cc: Shaohua Li
    Cc: Tim Chen
    Cc: Michal Hocko
    Cc: Aaron Lu
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Minchan Kim
    Cc: Hugh Dickins
    Cc: [4.11+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • We need to deposit pre-allocated PTE page table when a PMD migration
    entry is copied in copy_huge_pmd(). Otherwise, we will leak the
    pre-allocated page and cause a NULL pointer dereference later in
    zap_huge_pmd().

    The missing counters during PMD migration entry copy process are added
    as well.

    The bug report is here: https://lkml.org/lkml/2017/10/29/214

    Link: http://lkml.kernel.org/r/20171030144636.4836-1-zi.yan@sent.com
    Fixes: 84c3fc4e9c563 ("mm: thp: check pmd migration entry in common path")
    Signed-off-by: Zi Yan
    Reported-by: Fengguang Wu
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zi Yan
     
  • This is a follow-up to commit 57ddfdaa9a72 ("initramfs: fix disabling of
    initramfs (and its compression)"). This particular commit fixed the use
    case where we build the kernel with an initramfs with no compression,
    and then we build the kernel with no initramfs.

    Now this still left us with the same case as described here:

    http://lkml.kernel.org/r/20170521033337.6197-1-f.fainelli@gmail.com

    not working with initramfs compression. This can be seen by the
    following steps/timestamps:

    https://www.spinics.net/lists/kernel/msg2598153.html

    .initramfs_data.cpio.gz.cmd is correct:

    cmd_usr/initramfs_data.cpio.gz := /bin/bash
    ./scripts/gen_initramfs_list.sh -o usr/initramfs_data.cpio.gz -u 1000 -g 1000 /home/fainelli/work/uclinux-rootfs/romfs /home/fainelli/work/uclinux-rootfs/misc/initramfs.dev

    and was generated the first time we generated the gzip initramfs.
    Neither the command nor its arguments have changed, so we just don't
    call it, and no initramfs cpio is re-generated as a consequence.

    The fix for this problem is just to properly keep track of the
    .initramfs_cpio_data.d file by suffixing it with the compression
    extension. This takes care of properly tracking dependencies such that
    the initramfs gets (re)generated any time files are added/deleted etc.

    Link: http://lkml.kernel.org/r/20170930033936.6722-1-f.fainelli@gmail.com
    Fixes: db2aa7fd15e8 ("initramfs: allow again choice of the embedded initramfs compression algorithm")
    Fixes: 9e3596b0c653 ("kbuild: initramfs cleanup, set target from Kconfig")
    Signed-off-by: Florian Fainelli
    Cc: "Francisco Blas Izquierdo Riera (klondike)"
    Cc: Nicholas Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Florian Fainelli
     
  • Calling madvise(MADV_HWPOISON) on a hugetlbfs page will result in bad
    (negative) reserved huge page counts. This may not happen immediately,
    but may happen later when the underlying file is removed or filesystem
    unmounted. For example:

    AnonHugePages: 0 kB
    ShmemHugePages: 0 kB
    HugePages_Total: 1
    HugePages_Free: 0
    HugePages_Rsvd: 18446744073709551615
    HugePages_Surp: 0
    Hugepagesize: 2048 kB

    In routine hugetlbfs_error_remove_page(), hugetlb_fix_reserve_counts is
    called after remove_huge_page. hugetlb_fix_reserve_counts is designed
    to be called only if a failure is returned from
    hugetlb_unreserve_pages. Therefore, call hugetlb_unreserve_pages as
    required and only call hugetlb_fix_reserve_counts in the unlikely event
    that hugetlb_unreserve_pages returns an error.

    Link: http://lkml.kernel.org/r/20171019230007.17043-2-mike.kravetz@oracle.com
    Fixes: 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
    Signed-off-by: Mike Kravetz
    Acked-by: Naoya Horiguchi
    Cc: Michal Hocko
    Cc: Aneesh Kumar
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
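
    The HugePages_Rsvd value shown in this commit is exactly 2^64 - 1, which
    is what a count of -1 looks like when read as an unsigned 64-bit
    number. A short Python check, assuming a 64-bit unsigned counter (the
    helper name is hypothetical):

```python
U64_MASK = (1 << 64) - 1


def as_unsigned_64(value):
    """Interpret an integer as a 64-bit unsigned counter (wraps modulo 2**64)."""
    return value & U64_MASK


reserved = 0
reserved -= 1  # one decrement too many
print(as_unsigned_64(reserved))  # 18446744073709551615
```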
     
  • The first cluster group descriptor is not stored at the start of the
    group but at an offset from the start. We need to take this into
    account while doing fstrim on the first cluster group. Otherwise we
    will wrongly start fstrim a few blocks after the desired start block and
    the range can cross over into the next cluster group and zero out the
    group descriptor there. This can cause filesystem corruption that cannot
    be fixed by fsck.

    Link: http://lkml.kernel.org/r/1507835579-7308-1-git-send-email-ashish.samant@oracle.com
    Signed-off-by: Ashish Samant
    Reviewed-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ashish Samant
     
  • When the pagetable is walked in the implementation of /proc/<pid>/pagemap,
    pmd_soft_dirty() is used for both the PMD huge page map and the PMD
    migration entries. That is wrong: pmd_swp_soft_dirty() should be used
    for the PMD migration entries instead, because a different page table
    entry flag is used.

    As a result, /proc/pid/pagemap may report incorrect soft dirty information
    for PMD migration entries.

    Link: http://lkml.kernel.org/r/20171017081818.31795-1-ying.huang@intel.com
    Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
    Signed-off-by: "Huang, Ying"
    Acked-by: Kirill A. Shutemov
    Acked-by: Naoya Horiguchi
    Cc: Michal Hocko
    Cc: David Rientjes
    Cc: Arnd Bergmann
    Cc: Hugh Dickins
    Cc: "Jérôme Glisse"
    Cc: Daniel Colascione
    Cc: Zi Yan
    Cc: Anshuman Khandual
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • This oops:

    kernel BUG at fs/hugetlbfs/inode.c:484!
    RIP: remove_inode_hugepages+0x3d0/0x410
    Call Trace:
    hugetlbfs_setattr+0xd9/0x130
    notify_change+0x292/0x410
    do_truncate+0x65/0xa0
    do_sys_ftruncate.constprop.3+0x11a/0x180
    SyS_ftruncate+0xe/0x10
    tracesys+0xd9/0xde

    was caused by the lack of i_size check in hugetlb_mcopy_atomic_pte.

    mmap() can still succeed beyond the end of the i_size after vmtruncate
    zapped vmas in those ranges, but the faults must not succeed, and that
    includes UFFDIO_COPY.

    We could differentiate the retval to userland to represent a SIGBUS like
    a page fault would do (vs SIGSEGV), but it doesn't seem very useful and
    we'd need to pick a random retval as there's no meaningful syscall
    retval that would differentiate from SIGSEGV and SIGBUS, there's just
    -EFAULT.

    Link: http://lkml.kernel.org/r/20171016223914.2421-2-aarcange@redhat.com
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Mike Kravetz
    Cc: Mike Rapoport
    Cc: "Dr. David Alan Gilbert"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
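
    The missing bounds check can be illustrated with a small model. This
    Python sketch is not hugetlb_mcopy_atomic_pte(): the function name, the
    default 2 MiB huge page shift and the exact comparison are assumptions;
    only the -EFAULT result for a copy beyond i_size comes from the
    commit text.

```python
import errno


def mcopy_atomic_check(dst_offset, i_size, huge_page_shift=21):
    """Return 0 if a copy at dst_offset is within i_size, else -EFAULT."""
    page_index = dst_offset >> huge_page_shift
    size_in_pages = i_size >> huge_page_shift
    if page_index >= size_in_pages:
        # Beyond the end of the file: the copy must fail like a fault would.
        return -errno.EFAULT
    return 0
```

    With a file of two huge pages, offsets in the first two pages pass and
    anything from the third page onwards is rejected.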
     
  • Jiri Pirko says:

    ====================
    net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath

    This patchset's main patch is patch number 2. It carries the
    description. Patch 1 is just a dependency.

    ---
    v3->v4:
    - rebased to be applicable on top of the current net-next
    v2->v3:
    - Using head change callback to replace miniq pointer every time tp head
    changes. This eliminates one rcu dereference and makes the claim "without
    added overhead" valid.
    v1->v2:
    - Use dev instead of skb->dev in sch_handle_egress as pointed out by Daniel
    - Fixed synchronize_rcu_bh() in mini_qdisc_disable and commented
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • In sch_handle_egress and sch_handle_ingress tp->q is used only in order
    to update stats. So stats and filter list are the only things that are
    needed in clsact qdisc fastpath processing. Introduce new mini_Qdisc
    struct to hold those items. Also, introduce a helper to swap the
    mini_Qdisc structures in case filter list head changes.

    This removes need for tp->q usage without added overhead.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
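
    One way to picture the swap helper is a pair of small structures of
    which only one is visible to the fast path at a time. The Python below
    is a rough model under stated assumptions: `MiniQdisc`,
    `MiniQdiscPair` and `swap()` are illustrative names, and the sketch
    omits the RCU synchronisation that kernel code would need.

```python
class MiniQdisc:
    """Holds only what the fast path needs: the filter list and the stats."""

    def __init__(self, stats):
        self.filter_list = None
        self.stats = stats


class MiniQdiscPair:
    def __init__(self):
        self.stats = {"packets": 0, "bytes": 0}
        self._minis = (MiniQdisc(self.stats), MiniQdisc(self.stats))
        self.active = None  # what the fast path would dereference

    def swap(self, new_head):
        """Publish a mini structure that points at the new filter list head."""
        if new_head is None:
            self.active = None
            return
        # Fill in the structure that is not currently published, then switch,
        # so a reader never sees a half-updated structure.
        spare = self._minis[1] if self.active is self._minis[0] else self._minis[0]
        spare.filter_list = new_head
        self.active = spare
```

    Both structures share one stats object, so counters stay continuous
    across swaps while the filter list pointer changes.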
     
  • Add a callback that is to be called whenever head of the chain changes.
    Also provide a callback for the default case when the caller gets a
    block using non-extended getter.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko