23 Sep, 2016

1 commit


21 Sep, 2016

1 commit

  • This commit introduces a generic library to estimate either the min or
    max value of a time-varying variable over a recent time window. This
    is code originally from Kathleen Nichols. The current form of the code
    is from Van Jacobson.

    A single struct minmax_sample will track the estimated windowed-max
    value of the series if you call minmax_running_max() or the estimated
    windowed-min value of the series if you call minmax_running_min().

    Nearly equivalent code is already in place for minimum RTT estimation
    in the TCP stack. This commit extracts that code and generalizes it to
    handle both min and max. Moving the code here reduces the footprint
    and complexity of the TCP code base and makes the filter generally
    available for other parts of the codebase, including an upcoming TCP
    congestion control module.

    This library works well for time series where the measurements are
    smoothly increasing or decreasing.
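
    As a rough illustration of the idea, here is a user-space sketch of a
    windowed-max filter that keeps three candidate samples. The names
    (struct winmax, winmax_reset(), winmax_update()) are hypothetical and
    are not the library's API; the sketch only assumes unsigned 32-bit
    timestamps and values.

```c
#include <assert.h>
#include <stdint.h>

/* One (time, value) measurement; names here are illustrative only. */
struct sample { uint32_t t; uint32_t v; };

/* Best, second-best and third-best candidates for the windowed max. */
struct winmax { struct sample s[3]; };

/* Forget all history and restart the filter from a single measurement. */
static uint32_t winmax_reset(struct winmax *m, uint32_t t, uint32_t v)
{
	struct sample val = { t, v };

	assert(m);
	m->s[0] = m->s[1] = m->s[2] = val;
	return m->s[0].v;
}

/* Feed one measurement taken at time t; returns the current estimate. */
static uint32_t winmax_update(struct winmax *m, uint32_t win,
			      uint32_t t, uint32_t v)
{
	struct sample val = { t, v };
	uint32_t dt;

	assert(m);
	/* A new overall max, or a completely stale window: start over. */
	if (v >= m->s[0].v || t - m->s[2].t > win)
		return winmax_reset(m, t, v);

	/* Otherwise the sample may still replace a lower-ranked candidate. */
	if (v >= m->s[1].v)
		m->s[2] = m->s[1] = val;
	else if (v >= m->s[2].v)
		m->s[2] = val;

	dt = t - m->s[0].t;
	if (dt > win) {
		/* Best candidate expired: promote the others. */
		m->s[0] = m->s[1];
		m->s[1] = m->s[2];
		m->s[2] = val;
		if (t - m->s[0].t > win) {
			m->s[0] = m->s[1];
			m->s[1] = m->s[2];
			m->s[2] = val;
		}
	} else if (m->s[1].t == m->s[0].t && dt > win / 4) {
		/* A quarter of the window passed without a second choice. */
		m->s[2] = m->s[1] = val;
	} else if (m->s[2].t == m->s[1].t && dt > win / 2) {
		/* Half the window passed without a third choice. */
		m->s[2] = val;
	}
	return m->s[0].v;
}
```

    A windowed-min variant would presumably mirror this with the
    comparisons reversed. Keeping only three candidates means the result
    is an estimate rather than an exact windowed maximum.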

    Signed-off-by: Van Jacobson
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: Eric Dumazet
    Signed-off-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Neal Cardwell
     

20 Sep, 2016

1 commit

  • The insecure_elasticity setting is an ugly wart brought out by
    users who need to insert duplicate objects (that is, distinct
    objects with identical keys) into the same table.

    In fact, those users have a much bigger problem. Once those
    duplicate objects are inserted, they don't have an interface to
    find them (unless you count the walker interface which walks
    over the entire table).

    Some users have resorted to doing a manual walk over the hash
    table which is of course broken because they don't handle the
    potential existence of multiple hash tables. The result is that
    they will break sporadically when they encounter a hash table
    resize/rehash.

    This patch provides a way out for those users, at the expense
    of an extra pointer per object. Essentially each object is now
    a list of objects carrying the same key. The hash table will
    only see the lists so nothing changes as far as rhashtable is
    concerned.

    To use this new interface, you need to insert a struct rhlist_head
    into your objects instead of struct rhash_head. While the hash
    table is unchanged, for type-safety you'll need to use struct
    rhltable instead of struct rhashtable. All the existing interfaces
    have been duplicated for rhlist, including the hash table walker.

    One missing feature is nulls marking because AFAIK the only potential
    user of it does not need duplicate objects. Should anyone need
    this it shouldn't be too hard to add.
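
    The "each object is a list of objects carrying the same key" idea can
    be sketched in user space as follows. This is not the rhltable API:
    struct dup_node, dup_insert(), dup_lookup() and dup_count() are made-up
    names, and the table is a fixed-size toy that never resizes.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8	/* fixed size; this sketch never resizes */

/* Embedded in each object. */
struct dup_node {
	struct dup_node *next_dup;	/* next object with the same key */
	struct dup_node *next_chain;	/* next key group in the bucket (heads only) */
	unsigned int key;
};

struct dup_table { struct dup_node *bucket[NBUCKETS]; };

/* Return the head of the list of objects carrying key, or NULL. */
static struct dup_node *dup_lookup(struct dup_table *t, unsigned int key)
{
	struct dup_node *head;

	assert(t);
	for (head = t->bucket[key % NBUCKETS]; head; head = head->next_chain)
		if (head->key == key)
			return head;
	return NULL;
}

/* Insert obj; a duplicate is linked behind the existing head. */
static void dup_insert(struct dup_table *t, struct dup_node *obj,
		       unsigned int key)
{
	struct dup_node *head = dup_lookup(t, key);

	obj->key = key;
	obj->next_chain = NULL;
	if (head) {
		/* The bucket chain is untouched: only the per-key list grows. */
		obj->next_dup = head->next_dup;
		head->next_dup = obj;
		return;
	}
	obj->next_dup = NULL;
	obj->next_chain = t->bucket[key % NBUCKETS];
	t->bucket[key % NBUCKETS] = obj;
}

/* Count the objects in one per-key list. */
static unsigned int dup_count(const struct dup_node *head)
{
	unsigned int n = 0;

	for (; head; head = head->next_dup)
		n++;
	return n;
}
```

    A lookup returns the head of the per-key list, so a caller can visit
    every duplicate without walking the whole table.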

    Signed-off-by: Herbert Xu
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Herbert Xu
     

18 Sep, 2016

1 commit


16 Sep, 2016

1 commit

  • Commit d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan
    info from skb->vlan_tci") made flow dissector look at vlan_proto
    when vlan is present. Since test_bpf sets skb->vlan_tci to ~0
    (including VLAN_TAG_PRESENT) we have to populate skb->vlan_proto.

    Fixes false negative on test #24:
    test_bpf: #24 LD_PAYLOAD_OFF jited:0 175 ret 0 != 42 FAIL (1 times)

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dinan Gunawardena
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

13 Sep, 2016

1 commit


07 Sep, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for your net-next
    tree. Most relevant updates are the removal of per-conntrack timers to
    use a workqueue/garbage collection approach instead from Florian
    Westphal, the hash and numgen expression for nf_tables from Laura
    Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
    removal of ip_conntrack sysctl and many other incremental updates on our
    Netfilter codebase.

    More specifically, they are:

    1) Retrieve only 4 bytes to fetch ports in case of non-linear skb
    transport area in dccp, sctp, tcp, udp and udplite protocol
    conntrackers, from Gao Feng.

    2) Missing whitespace on error message in physdev match, from Hangbin Liu.

    3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.

    4) Add nf_ct_expires() helper function and use it, from Florian Westphal.

    5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also
    from Florian.

    6) Rename nf_tables set implementation to nft_set_{name}.c

    7) Introduce the hash expression to allow arbitrary hashing of selector
    concatenations, from Laura Garcia Liebana.

    8) Remove ip_conntrack sysctl backward compatibility code. This code has
    been around for a long time already, and we have two interfaces to do
    this already: nf_conntrack sysctl and ctnetlink.

    9) Use nf_conntrack_get_ht() helper function whenever possible, instead
    of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.

    10) Add quota expression for nf_tables.

    11) Add number generator expression for nf_tables. This supports
    incremental and random generators that can be combined with maps,
    which is very useful for load balancing, again from Laura Garcia Liebana.

    12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.

    13) Introduce a nft_chain_parse_hook() helper function to parse chain hook
    configuration, this is used by a follow up patch to perform better chain
    update validation.

    14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the
    nft_set_hash implementation to honor the NLM_F_EXCL flag.

    15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(),
    patch from Florian Westphal.

    16) Don't use the DYING bit to know if the conntrack event has already
    been delivered; instead use a state variable to track event re-delivery
    states, also from Florian.

    17) Remove the per-conntrack timer, use the workqueue approach that was
    discussed during the NFWS, from Florian Westphal.

    18) Use the netlink conntrack table dump path to kill stale entries,
    again from Florian.

    19) Add a garbage collector to get rid of stale conntracks, from
    Florian.

    20) Reschedule garbage collector if eviction rate is high.

    21) Get rid of the __nf_ct_kill_acct() helper.

    22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.

    23) Make nf_log_set() interface assertive on unsupported families.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

02 Sep, 2016

2 commits

  • Some versions of gcc don't like tests for the value of an undefined
    preprocessor symbol, even in the #else branch of an #ifndef:

    lib/test_hash.c:224:7: warning: "HAVE_ARCH__HASH_32" is not defined [-Wundef]
    #elif HAVE_ARCH__HASH_32 != 1
    ^
    lib/test_hash.c:229:7: warning: "HAVE_ARCH_HASH_32" is not defined [-Wundef]
    #elif HAVE_ARCH_HASH_32 != 1
    ^
    lib/test_hash.c:234:7: warning: "HAVE_ARCH_HASH_64" is not defined [-Wundef]
    #elif HAVE_ARCH_HASH_64 != 1
    ^

    Seen with gcc 4.9, not seen with 4.1.2.

    Change the logic to only check the value inside an #ifdef to fix this.
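
    A minimal sketch of that pattern, using a made-up symbol
    DEMO_HAVE_ARCH_HASH and a hypothetical demo_arch_hash() override
    (neither is from lib/test_hash.c). The value test sits inside the
    #ifdef, so the preprocessor never evaluates an undefined symbol.

```c
#include <assert.h>
#include <stdint.h>

/*
 * DEMO_HAVE_ARCH_HASH is deliberately left undefined here, as it would be
 * on an architecture that provides no override.
 */
static uint32_t demo_hash(uint32_t v, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
#ifdef DEMO_HAVE_ARCH_HASH
# if DEMO_HAVE_ARCH_HASH != 1
#  error "DEMO_HAVE_ARCH_HASH must be 1 when it is defined"
# endif
	/* Hypothetical arch override; only compiled when the symbol is set. */
	return demo_arch_hash(v) >> (32 - bits);
#else
	/* Generic multiplicative hash used as the fallback in this sketch. */
	return (uint32_t)(v * 0x61C88647u) >> (32 - bits);
#endif
}
```

    Building this with -Wundef should produce no warning, because the
    "!= 1" comparison is only reached when the symbol is defined.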

    Fixes: 468a9428521e7d00 ("<linux/hash.h>: Add support for architecture-specific functions")
    Link: http://lkml.kernel.org/r/20160829214952.1334674-4-arnd@arndb.de
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Arnd Bergmann
    Acked-by: George Spelvin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • lib/test_hash.c: In function 'test_hash_init':
    lib/test_hash.c:146:2: warning: missing braces around initializer [-Wmissing-braces]

    Fixes: 468a9428521e7d00 ("<linux/hash.h>: Add support for architecture-specific functions")
    Link: http://lkml.kernel.org/r/20160829214952.1334674-3-arnd@arndb.de
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Arnd Bergmann
    Acked-by: George Spelvin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

31 Aug, 2016

1 commit

  • There are three usercopy warnings which are currently being silenced for
    gcc 4.6 and newer:

    1) "copy_from_user() buffer size is too small" compile warning/error

    This is a static warning which happens when object size and copy size
    are both const, and copy size > object size. I didn't see any false
    positives for this one. So the function warning attribute seems to
    be working fine here.

    Note this scenario is always a bug and so I think it should be
    changed to *always* be an error, regardless of
    CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.

    2) "copy_from_user() buffer size is not provably correct" compile warning

    This is another static warning which happens when I enable
    __compiletime_object_size() for new compilers (and
    CONFIG_DEBUG_STRICT_USER_COPY_CHECKS). It happens when object size
    is const, but copy size is *not*. In this case there's no way to
    compare the two at build time, so it gives the warning. (Note the
    warning is a byproduct of the fact that gcc has no way of knowing
    whether the overflow function will be called, so the call isn't dead
    code and the warning attribute is activated.)

    So this warning seems to only indicate "this is an unusual pattern,
    maybe you should check it out" rather than "this is a bug".

    I get 102(!) of these warnings with allyesconfig and the
    __compiletime_object_size() gcc check removed. I don't know if there
    are any real bugs hiding in there, but from looking at a small
    sample, I didn't see any. According to Kees, it does sometimes find
    real bugs. But the false positive rate seems high.

    3) "Buffer overflow detected" runtime warning

    This is a runtime warning where object size is const, and copy size >
    object size.

    All three warnings (both static and runtime) were completely disabled
    for gcc 4.6 with the following commit:

    2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")

    That commit mistakenly assumed that the false positives were caused by a
    gcc bug in __compiletime_object_size(). But in fact,
    __compiletime_object_size() seems to be working fine. The false
    positives were instead triggered by #2 above. (Though I don't have an
    explanation for why the warnings supposedly only started showing up in
    gcc 4.6.)

    So remove warning #2 to get rid of all the false positives, and re-enable
    warnings #1 and #3 by reverting the above commit.

    Furthermore, since #1 is a real bug which is detected at compile time,
    upgrade it to always be an error.

    Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
    needed.

    Signed-off-by: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Cc: Andy Lutomirski
    Cc: Steven Rostedt
    Cc: Brian Gerst
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Byungchul Park
    Cc: Nilay Vaish
    Signed-off-by: Linus Torvalds

    Josh Poimboeuf
     

30 Aug, 2016

1 commit


27 Aug, 2016

1 commit


26 Aug, 2016

1 commit

  • This patch modifies __rhashtable_insert_fast() so it returns the
    existing object that clashes with the one that you want to insert.
    In case the object is successfully inserted, NULL is returned.
    Otherwise, you get an error via ERR_PTR().

    This patch adapts the existing callers of __rhashtable_insert_fast()
    so they handle this new logic, and it adds a new
    rhashtable_lookup_get_insert_key() interface to fetch this existing
    object.

    nf_tables needs this change to improve handling of EEXIST cases via
    honoring the NLM_F_EXCL flag and by checking if the data part of the
    mapping matches what we have.
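
    The return convention described above can be sketched with a toy
    table. Everything here (struct demo_obj, demo_lookup_get_insert()) is
    hypothetical and lives in user space; the ERR_PTR() error path is left
    out because this sketch cannot fail.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOTS 8	/* tiny fixed table; no resizing in this sketch */

struct demo_obj {
	unsigned int key;
	int data;
	struct demo_obj *next;
};

struct demo_table { struct demo_obj *slot[DEMO_SLOTS]; };

/*
 * Insert obj unless an object with the same key is already present.
 * Returns NULL if obj was inserted, or the clashing object otherwise.
 */
static struct demo_obj *demo_lookup_get_insert(struct demo_table *t,
						struct demo_obj *obj)
{
	struct demo_obj **head, *cur;

	assert(t && obj);
	head = &t->slot[obj->key % DEMO_SLOTS];
	for (cur = *head; cur; cur = cur->next)
		if (cur->key == obj->key)
			return cur;	/* caller can compare cur->data */
	obj->next = *head;
	*head = obj;
	return NULL;
}
```

    Getting the clashing object back lets a caller decide whether the
    clash is a real conflict or an identical mapping, which is the kind of
    check the nf_tables use case needs.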

    Cc: Herbert Xu
    Cc: Thomas Graf
    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Herbert Xu

    Pablo Neira Ayuso
     

20 Aug, 2016

1 commit

  • The commit 8f6fd83c6c5ec66a4a70c728535ddcdfef4f3697 ("rhashtable:
    accept GFP flags in rhashtable_walk_init") added a GFP flag argument
    to rhashtable_walk_init because some users wish to use the walker
    in an unsleepable context.

    In fact we don't need to allocate memory in rhashtable_walk_init
    at all. The walker is always paired with an iterator so we could
    just stash ourselves there.

    This patch does that by introducing a new enter function to replace
    the existing init function. This way we don't have to churn all
    the existing users again.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

18 Aug, 2016

1 commit

  • Pull networking fixes from David Miller:

    1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
    Fietkau.

    2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.

    3) Fix some tg3 ethtool logic bugs, and one that would cause no
    interrupts to be generated when rx-coalescing is set to 0. From
    Satish Baddipadige and Siva Reddy Kallam.

    4) QLCNIC mailbox corruption and napi budget handling fix from Manish
    Chopra.

    5) Fix fib_trie logic when walking the trie during /proc/net/route
    output that can access a stale node pointer. From David Forster.

    6) Several sctp_diag fixes from Phil Sutter.

    7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.

    8) Checksum fixup fixes in bpf from Daniel Borkmann.

    9) Memory leaks in nfnetlink, from Liping Zhang.

    10) Use after free in rxrpc, from David Howells.

    11) Use after free in new skb_array code of macvtap driver, from Jason
    Wang.

    12) Calipso resource leak, from Colin Ian King.

    13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.

    14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.

    15) Fix lockdep splats in macsec, from Sabrina Dubroca.

    16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
    handling.

    17) Various tc-action bug fixes, from CONG Wang.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
    net_sched: allow flushing tc police actions
    net_sched: unify the init logic for act_police
    net_sched: convert tcf_exts from list to pointer array
    net_sched: move tc offload macros to pkt_cls.h
    net_sched: fix a typo in tc_for_each_action()
    net_sched: remove an unnecessary list_del()
    net_sched: remove the leftover cleanup_a()
    mlxsw: spectrum: Allow packets to be trapped from any PG
    mlxsw: spectrum: Unmap 802.1Q FID before destroying it
    mlxsw: spectrum: Add missing rollbacks in error path
    mlxsw: reg: Fix missing op field fill-up
    mlxsw: spectrum: Trap loop-backed packets
    mlxsw: spectrum: Add missing packet traps
    mlxsw: spectrum: Mark port as active before registering it
    mlxsw: spectrum: Create PVID vPort before registering netdevice
    mlxsw: spectrum: Remove redundant errors from the code
    mlxsw: spectrum: Don't return upon error in removal path
    i40e: check for and deal with non-contiguous TCs
    ixgbe: Re-enable ability to toggle VLAN filtering
    ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
    ...

    Linus Torvalds
     

16 Aug, 2016

1 commit

  • I got this:

    ================================================================================
    UBSAN: Undefined behaviour in ./include/linux/log2.h:63:13
    shift exponent 64 is too large for 64-bit type 'long unsigned int'
    CPU: 1 PID: 721 Comm: kworker/1:1 Not tainted 4.8.0-rc1+ #87
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    Workqueue: events rht_deferred_worker
    0000000000000000 ffff88011661f8d8 ffffffff82344f50 0000000041b58ab3
    ffffffff84f98000 ffffffff82344ea4 ffff88011661f900 ffff88011661f8b0
    0000000000000001 ffff88011661f6b8 dffffc0000000000 ffffffff867f7640
    Call Trace:
    [] dump_stack+0xac/0xfc
    [] ? _atomic_dec_and_lock+0xc4/0xc4
    [] ubsan_epilogue+0xd/0x8a
    [] __ubsan_handle_shift_out_of_bounds+0x255/0x29a
    [] ? __ubsan_handle_out_of_bounds+0x180/0x180
    [] ? nl80211_req_set_reg+0x256/0x2f0
    [] ? print_context_stack+0x8a/0x160
    [] ? amd_pmu_reset+0x341/0x380
    [] rht_deferred_worker+0x1618/0x1790
    [] ? rht_deferred_worker+0x1618/0x1790
    [] ? rhashtable_jhash2+0x370/0x370
    [] ? process_one_work+0x6fd/0x1970
    [] process_one_work+0x79f/0x1970
    [] ? process_one_work+0x6fd/0x1970
    [] ? try_to_grab_pending+0x4c0/0x4c0
    [] ? worker_thread+0x1c4/0x1340
    [] worker_thread+0x55f/0x1340
    [] ? __schedule+0x4df/0x1d40
    [] ? process_one_work+0x1970/0x1970
    [] ? process_one_work+0x1970/0x1970
    [] kthread+0x237/0x390
    [] ? __kthread_parkme+0x280/0x280
    [] ? _raw_spin_unlock_irq+0x33/0x50
    [] ret_from_fork+0x1f/0x40
    [] ? __kthread_parkme+0x280/0x280
    ================================================================================

    roundup_pow_of_two() is undefined when called with an argument of 0, so
    let's avoid the call and just fall back to ht->p.min_size (which should
    never be smaller than HASH_MIN_SIZE).
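
    A sketch of the guarded computation, under stated assumptions: the
    helper names are hypothetical, demo_roundup_pow2() is a simple loop
    rather than the kernel macro, and the 3/2 growth factor is an
    illustrative choice rather than something stated in this message.

```c
#include <assert.h>

/* Smallest power of two >= n; only defined here for 1 <= n <= 2^31. */
static unsigned int demo_roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1 && n <= 0x80000000u);
	while (p < n)
		p <<= 1;
	return p;
}

/* Target table size when shrinking: never round up a zero count. */
static unsigned int demo_shrink_size(unsigned int nelems,
				     unsigned int min_size)
{
	unsigned int size = 0;

	if (nelems)
		size = demo_roundup_pow2(nelems * 3 / 2);
	if (size < min_size)
		size = min_size;	/* fall back to the configured minimum */
	return size;
}
```

    With an empty table the rounding helper is never called, so the
    result is simply min_size.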

    Cc: Herbert Xu
    Signed-off-by: Vegard Nossum
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Vegard Nossum
     

15 Aug, 2016

1 commit

  • Sander reports the following splat after the netfilter nat bysrc table got
    converted to rhashtable:

    swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..]
    [] warn_alloc_failed+0xdd/0x140
    [] __alloc_pages_nodemask+0x3e1/0xcf0
    [] alloc_pages_current+0x8d/0x110
    [] kmalloc_order+0x1f/0x70
    [] __kmalloc+0x129/0x140
    [] bucket_table_alloc+0xc1/0x1d0
    [] rhashtable_insert_rehash+0x5d/0xe0
    [] nf_nat_setup_info+0x2ef/0x400

    The failure happens when allocating the spinlock array.
    Even with GFP_KERNEL it's unlikely for such a large allocation
    to succeed.

    Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition
    to adding NOWARN for atomic allocations this also makes the bucket-array
    sizing more conservative.

    In commit 095dc8e0c3686 ("tcp: fix/cleanup inet_ehash_locks_alloc()"),
    Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'".
    IOW, consider size needed by a single spinlock when determining
    number of locks per cpu. So with 64 byte per cacheline and 4 byte per
    spinlock this gives 32 locks per cpu.

    Resulting size of the lock-array (sizeof(spinlock) == 4):

    cpus:    1     2     4     8    16    32    64
    old:    1k    1k    4k    8k   16k   16k   16k
    new:   128   256   512    1k    2k    4k    8k

    8k allocation should have decent chance of success even
    with GFP_ATOMIC, and should not fail with GFP_KERNEL.

    With 72-byte spinlock (LOCKDEP):

    cpus:    1     2
    old:    9k   18k
    new:   ~2k   ~4k
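
    The arithmetic behind the "new" rows can be checked with a short
    sketch. It assumes a fixed 32 locks per CPU regardless of lock size,
    which is what reproduces both tables; the function names are
    hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHELINES_PER_CPU	2u
#define DEMO_LOCKS_PER_CPU	32u

/* Locks that fit in the per-CPU cache line budget. */
static size_t demo_budget_locks(size_t cacheline_bytes, size_t lock_bytes)
{
	assert(lock_bytes > 0);
	return DEMO_CACHELINES_PER_CPU * cacheline_bytes / lock_bytes;
}

/* Size in bytes of the lock array for a given CPU count. */
static size_t demo_lock_array_bytes(size_t cpus, size_t lock_bytes)
{
	assert(cpus > 0 && lock_bytes > 0);
	return cpus * DEMO_LOCKS_PER_CPU * lock_bytes;
}
```

    For example, 64 CPUs with 4-byte spinlocks give 64 * 32 * 4 = 8192
    bytes, and one CPU with 72-byte spinlocks gives 2304 bytes, matching
    the 8k and ~2k entries.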

    Reported-by: Sander Eikelenboom
    Suggested-by: Thomas Graf
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

09 Aug, 2016

2 commits

  • When I initially added the unsafe_[get|put]_user() helpers in commit
    5b24a7a2aa20 ("Add 'unsafe' user access functions for batched
    accesses"), I made the mistake of modeling the interface on our
    traditional __[get|put]_user() functions, which return zero on success,
    or -EFAULT on failure.

    That interface is fairly easy to use, but it's actually fairly nasty for
    good code generation, since it essentially forces the caller to check
    the error value for each access.

    In particular, since the error handling is already internally
    implemented with an exception handler, and we already use "asm goto" for
    various other things, we could fairly easily make the error cases just
    jump directly to an error label instead, and avoid the need for explicit
    checking after each operation.

    So switch the interface to pass in an error label, rather than checking
    the error value in the caller. Best do it now before we start growing
    more users (the signal handling code in particular would be a good place
    to use the new interface).

    So rather than

        if (unsafe_get_user(x, ptr))
                ... handle error ..

    the interface is now

        unsafe_get_user(x, ptr, label);

    where an error during the user mode fetch will now just cause a jump to
    'label' in the caller.

    Right now the actual _implementation_ of this all still ends up being a
    "if (err) goto label", and does not take advantage of any exception
    label tricks, but for "unsafe_put_user()" in particular it should be
    fairly straightforward to convert to using the exception table model.
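
    The calling convention can be imitated in user space. In this sketch
    demo_get() is a stand-in macro, not the kernel helper: a NULL pointer
    plays the role of a faulting user address, and the body is the plain
    "if (err) goto label" form mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Fetch *ptr into x, or jump to label if the access would "fault". */
#define demo_get(x, ptr, label)			\
	do {					\
		if ((ptr) == NULL)		\
			goto label;		\
		(x) = *(ptr);			\
	} while (0)

/* Sum two values; returns 0 on success, -1 if either read faults. */
static int demo_sum(const int *a, const int *b, int *out)
{
	int x, y;

	assert(out);
	/* No per-access error check: failures share one exit path. */
	demo_get(x, a, fault);
	demo_get(y, b, fault);
	*out = x + y;
	return 0;

fault:
	return -1;
}
```

    Several accesses in a row share a single error path, which is what
    makes the label form attractive for batched accesses.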

    Note that "unsafe_get_user()" is much harder to convert to a clever
    exception table model, because current versions of gcc do not allow the
    use of "asm goto" (for the exception) with output values (for the actual
    value to be fetched). But that is hopefully not a limitation in the
    long term.

    [ Also note that it might be a good idea to switch unsafe_get_user() to
    actually _return_ the value it fetches from user space, but this
    commit only changes the error handling semantics ]

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Looks like a simple copy'n'paste error.

    Fixes: 1aa661f5c3df1 ("rhashtable-test: Measure time to insert, remove & traverse entries")
    Signed-off-by: Phil Sutter
    Signed-off-by: David S. Miller

    Phil Sutter
     

04 Aug, 2016

2 commits

  • Although dynamic debug is often only used for debug builds, sometimes
    it's enabled for production builds as well. Minimize its impact by using
    jump labels. This reduces the text section by 7000+ bytes in the kernel
    image below. It does increase data, but this should only be referenced
    when changing the direction of the branches, and hence usually not in
    cache.

       text    data     bss      dec     hex filename
    8194852 4879776  925696 14000324  d5a0c4 vmlinux.pre
    8187337 4960224  925696 14073257  d6bda9 vmlinux.post

    Link: http://lkml.kernel.org/r/d165b465e8c89bc582d973758d40be44c33f018b.1467837322.git.jbaron@akamai.com
    Signed-off-by: Jason Baron
    Cc: "David S. Miller"
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Chris Metcalf
    Cc: Heiko Carstens
    Cc: Joe Perches
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
     
  • The dma-mapping core and the implementations do not change the DMA
    attributes passed by pointer. Thus the pointer can point to const data.
    However the attributes do not have to be a bitfield. Instead unsigned
    long will do fine:

    1. This is just simpler. Both in terms of reading the code and setting
    attributes. Instead of initializing local attributes on the stack
    and passing pointer to it to dma_set_attr(), just set the bits.

    2. It brings safety and const-correctness checking because the
    attributes are passed by value.
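
    To illustrate both points, here is a small sketch of attributes as
    plain bits in an unsigned long. The DEMO_ATTR_* names and values are
    hypothetical and do not correspond to the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical attribute bits for this sketch only. */
#define DEMO_ATTR_WRITE_COMBINE	(1UL << 0)
#define DEMO_ATTR_NO_KERNEL_MAP	(1UL << 1)
#define DEMO_ATTR_SKIP_CPU_SYNC	(1UL << 2)
#define DEMO_ATTR_MASK		((1UL << 3) - 1)

/*
 * Attributes arrive by value, so the callee cannot modify the caller's
 * copy and "no attributes" is simply 0 rather than a NULL pointer.
 */
static int demo_needs_cpu_sync(unsigned long attrs)
{
	assert((attrs & ~DEMO_ATTR_MASK) == 0);	/* reject unknown bits */
	return !(attrs & DEMO_ATTR_SKIP_CPU_SYNC);
}
```

    A caller just ORs bits together, for example
    demo_needs_cpu_sync(DEMO_ATTR_WRITE_COMBINE | DEMO_ATTR_SKIP_CPU_SYNC),
    with no on-stack attribute object to initialize first.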

    Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
    )

    and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
    )

    Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
    Signed-off-by: Krzysztof Kozlowski
    Acked-by: Vineet Gupta
    Acked-by: Robin Murphy
    Acked-by: Hans-Christian Noren Egtvedt
    Acked-by: Mark Salter [c6x]
    Acked-by: Jesper Nilsson [cris]
    Acked-by: Daniel Vetter [drm]
    Reviewed-by: Bart Van Assche
    Acked-by: Joerg Roedel [iommu]
    Acked-by: Fabien Dessenne [bdisp]
    Reviewed-by: Marek Szyprowski [vb2-core]
    Acked-by: David Vrabel [xen]
    Acked-by: Konrad Rzeszutek Wilk [xen swiotlb]
    Acked-by: Joerg Roedel [iommu]
    Acked-by: Richard Kuo [hexagon]
    Acked-by: Geert Uytterhoeven [m68k]
    Acked-by: Gerald Schaefer [s390]
    Acked-by: Bjorn Andersson
    Acked-by: Hans-Christian Noren Egtvedt [avr32]
    Acked-by: Vineet Gupta [arc]
    Acked-by: Robin Murphy [arm64 and dma-iommu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Kozlowski
     

03 Aug, 2016

8 commits

  • Merge yet more updates from Andrew Morton:

    - the rest of ocfs2

    - various hotfixes, mainly MM

    - quite a bit of misc stuff - drivers, fork, exec, signals, etc.

    - printk updates

    - firmware

    - checkpatch

    - nilfs2

    - more kexec stuff than usual

    - rapidio updates

    - w1 things

    * emailed patches from Andrew Morton: (111 commits)
    ipc: delete "nr_ipc_ns"
    kcov: allow more fine-grained coverage instrumentation
    init/Kconfig: add clarification for out-of-tree modules
    config: add android config fragments
    init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
    relay: add global mode support for buffer-only channels
    init: allow blacklisting of module_init functions
    w1:omap_hdq: fix regression
    w1: add helper macro module_w1_family
    w1: remove need for ida and use PLATFORM_DEVID_AUTO
    rapidio/switches: add driver for IDT gen3 switches
    powerpc/fsl_rio: apply changes for RIO spec rev 3
    rapidio: modify for rev.3 specification changes
    rapidio: change inbound window size type to u64
    rapidio/idt_gen2: fix locking warning
    rapidio: fix error handling in mbox request/release functions
    rapidio/tsi721_dma: advance queue processing from transfer submit call
    rapidio/tsi721: add messaging mbox selector parameter
    rapidio/tsi721: add PCIe MRRS override parameter
    rapidio/tsi721_dma: add channel mask and queue size parameters
    ...

    Linus Torvalds
     
  • For more targeted fuzzing, it's better to disable kernel-wide
    instrumentation and instead enable it on a per-subsystem basis. This
    follows the pattern of UBSAN and allows you to compile in the kcov
    driver without instrumenting the whole kernel.

    To instrument a part of the kernel, you can use either

    # for a single file in the current directory
    KCOV_INSTRUMENT_filename.o := y

    or

    # for all the files in the current directory (excluding subdirectories)
    KCOV_INSTRUMENT := y

    or

    # (same as above)
    ccflags-y += $(CFLAGS_KCOV)

    or

    # for all the files in the current directory (including subdirectories)
    subdir-ccflags-y += $(CFLAGS_KCOV)

    Link: http://lkml.kernel.org/r/1464008380-11405-1-git-send-email-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum
    Cc: Dmitry Vyukov
    Cc: Quentin Casasnovas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
     
  • The crc32 test function measures the elapsed time in nanoseconds, but
    uses 'struct timespec' for that. We want to remove timespec from the
    kernel for y2038 compatibility, and ktime_get_ns() also helps make the
    code simpler here.

    It is also slightly better to use monotonic time, as we are only
    interested in the time difference.
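
    A user-space analogue of measuring an interval with a single
    monotonic nanosecond counter might look like the sketch below.
    ktime_get_ns() is not available outside the kernel, so the
    hypothetical demo_mono_ns() wraps POSIX clock_gettime() instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading as one 64-bit nanosecond count. */
static uint64_t demo_mono_ns(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Nanoseconds elapsed since an earlier demo_mono_ns() reading. */
static uint64_t demo_elapsed_ns(uint64_t start)
{
	return demo_mono_ns() - start;
}
```

    Because the clock is monotonic, the difference is not disturbed by
    wall-clock adjustments made while the measurement is running.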

    Link: http://lkml.kernel.org/r/20160617143932.3289626-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Cc: "David S . Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • When a large enough area in the iommu bitmap is found but would span a
    boundary we continue the search starting from the next bit position.
    For large allocations this can lead to several useless invocations of
    bitmap_find_next_zero_area() and iommu_is_span_boundary().

    Continue the search from the start of the next segment (which is the
    next bit position such that we'll not cross the same segment boundary
    again).
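
    The index arithmetic can be sketched as follows, assuming a
    power-of-two boundary size, a request no larger than one segment, and
    no base offset (the real helpers also take a shift, which is ignored
    here). The function names are hypothetical.

```c
#include <assert.h>

/* Does [index, index + nr) cross a multiple of boundary? */
static int demo_spans_boundary(unsigned long index, unsigned long nr,
			       unsigned long boundary)
{
	assert(boundary && (boundary & (boundary - 1)) == 0);
	return (index & (boundary - 1)) + nr > boundary;
}

/* Where to resume the search after the candidate at index was rejected. */
static unsigned long demo_next_start(unsigned long index,
				     unsigned long boundary)
{
	assert(boundary && (boundary & (boundary - 1)) == 0);
	/* Round up to the first bit of the next segment. */
	return (index + boundary - 1) & ~(boundary - 1);
}
```

    With a boundary of 8, a 4-bit request found at index 6 spans the
    boundary. Retrying from index 7 would fail the same way, whereas
    jumping straight to index 8 does not.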

    Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1606081910070.3211@schleppi
    Signed-off-by: Sebastian Ott
    Reviewed-by: Gerald Schaefer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Ott
     
  • Extend the ratelimiting facility to print the amount of suppressed lines
    when it is being released.

    This use case is aimed at short-termed, burst-like users for which we
    want to output the suppressed lines stats only once, after it has been
    disposed of. For an example, see /dev/kmsg usage in a follow-on patch.

    Also, change the printk() line we issue on release to not use
    "callbacks" as it is misleading: we're not suppressing callbacks but
    printk() calls.

    This has been separated from a previous patch by Linus.

    Link: http://lkml.kernel.org/r/20160716061745.15795-2-bp@alien8.de
    Signed-off-by: Borislav Petkov
    Cc: Dave Young
    Cc: Franck Bui
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Uwe Kleine-König
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • handle_object_size_mismatch() used %pk to format a kernel pointer with
    pr_err(). This seemed to be a misspelling of %pK, but using this to
    format a kernel pointer does not make much sense here.

    Therefore use %p instead, like in handle_missaligned_access().

    Link: http://lkml.kernel.org/r/20160730083010.11569-1-nicolas.iooss_linux@m4x.org
    Signed-off-by: Nicolas Iooss
    Acked-by: Andrey Ryabinin
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Iooss
     
  • Radix trees may be used not only for storing page cache pages, so
    unconditionally accounting radix tree nodes to the current memory cgroup
    is bad: if a radix tree node is used for storing data shared among
    different cgroups we risk pinning dead memory cgroups forever.

    So let's only account radix tree nodes if it was explicitly requested by
    passing __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to
    account page cache entries, so mark mapping->page_tree so.

    Fixes: 58e698af4c63 ("radix-tree: account radix_tree_node to memory cgroup")
    Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.com
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: [4.6+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Pull kbuild updates from Michal Marek:

    - GCC plugin support by Emese Revfy from grsecurity, with a fixup from
    Kees Cook. The plugins are meant to be used for static analysis of
    the kernel code. Two plugins are provided already.

    - reduction of the gcc commandline by Arnd Bergmann.

    - IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada

    - bin2c fix by Michael Tautschnig

    - setlocalversion fix by Wolfram Sang

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    gcc-plugins: disable under COMPILE_TEST
    kbuild: Abort build on bad stack protector flag
    scripts: Fix size mismatch of kexec_purgatory_size
    kbuild: make samples depend on headers_install
    Kbuild: don't add obj tree in additional includes
    Kbuild: arch: look for generated headers in obtree
    Kbuild: always prefix objtree in LINUXINCLUDE
    Kbuild: avoid duplicate include path
    Kbuild: don't add ../../ to include path
    vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
    kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
    kconfig.h: use already defined macros for IS_REACHABLE() define
    export.h: use __is_defined() to check if __KSYM_* is defined
    kconfig.h: use __is_defined() to check if MODULE is defined
    kbuild: setlocalversion: print error to STDERR
    Add sancov plugin
    Add Cyclomatic complexity GCC plugin
    GCC plugin infrastructure
    Shared library support

    Linus Torvalds
     

02 Aug, 2016

1 commit

  • Pull crypto fixes from Herbert Xu:
    "This fixes a number of regressions in the marvell cesa driver caused
    by the chaining work, and a regression in lib/mpi that leads to a
    GFP_KERNEL allocation with preemption disabled"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: marvell - Don't copy IV vectors from the _process op for ciphers
    lib/mpi: Fix SG miter leak
    crypto: marvell - Update cache with input sg only when it is unmapped
    crypto: marvell - Don't chain at DMA level when backlog is disabled
    crypto: marvell - Fix memory leaks in TDMA chain for cipher requests

    Linus Torvalds
     

31 Jul, 2016

1 commit

  • Pull x86 microcode updates from Thomas Gleixner:

    - more work to make the microcode loader robust

    - a fix for the microcode load precedence

    - fixes for initrd loading with randomized memory

    - less printk noise on SMP machines

    * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/asm, x86/microcode: Add __PAGE_OFFSET_BASE define on 32-bit
    x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
    x86/microcode: Remove unused symbol exports
    x86/microcode/intel: Do not issue microcode updates messages on each CPU
    Documentation/microcode: Document some aspects for more clarity
    x86/microcode/AMD: Make amd_ucode_patch[] static
    x86/microcode/intel: Unexport save_mc_for_early()
    x86/microcode/intel: Rename load_microcode_early() to find_microcode_patch()
    x86/microcode: Propagate save_microcode_in_initrd() retval
    x86/microcode: Get rid of find_cpio_data()'s dummy offset arg
    lib/cpio: Make find_cpio_data()'s offset arg optional
    x86/microcode: Fix suspend to RAM with builtin microcode
    x86/microcode: Fix loading precedence

    Linus Torvalds
     

29 Jul, 2016

6 commits

  • In mpi_read_raw_from_sgl we may leak the SG miter resources after
    reading the leading zeroes. This patch fixes this by stopping the
    iteration once the leading zeroes have been read.
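
    A rough userspace model of the pattern, assuming a chunked buffer and a
    made-up iterator (sk_iter_start, sk_iter_next and sk_iter_stop are
    hypothetical stand-ins, not the kernel's sg_miter API): whichever way the
    zero-skipping loop exits, the iterator is stopped so the current chunk is
    released.

```c
#include <stdbool.h>
#include <stddef.h>

/* One contiguous piece of a scattered buffer. */
struct sk_chunk {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical iterator: "mapped" models a resource held for the current chunk. */
struct sk_iter {
	const struct sk_chunk *chunks;
	size_t nchunks;
	size_t idx;
	bool mapped;
};

void sk_iter_start(struct sk_iter *it, const struct sk_chunk *chunks, size_t n)
{
	it->chunks = chunks;
	it->nchunks = n;
	it->idx = 0;
	it->mapped = false;
}

/* Release the previous chunk, if any, then map the next one. */
bool sk_iter_next(struct sk_iter *it)
{
	if (it->mapped) {
		it->mapped = false;
		it->idx++;
	}
	if (it->idx >= it->nchunks)
		return false;
	it->mapped = true;
	return true;
}

/* Release whatever is still mapped; safe to call when nothing is. */
void sk_iter_stop(struct sk_iter *it)
{
	it->mapped = false;
}

/* Count leading zero bytes, stopping the iteration on every exit path. */
size_t sk_count_leading_zeros(struct sk_iter *it)
{
	size_t lzeros = 0;

	while (sk_iter_next(it)) {
		const struct sk_chunk *c = &it->chunks[it->idx];
		size_t i = 0;

		while (i < c->len && c->data[i] == 0)
			i++;
		lzeros += i;
		if (i < c->len)
			break;	/* first non-zero byte found */
	}
	sk_iter_stop(it);	/* without this, an early break leaves a chunk mapped */
	return lzeros;
}
```

    The early break is the case of interest here: it leaves the loop with a
    chunk still mapped, so the explicit stop call is what keeps map and
    release balanced.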

    Fixes: 127827b9c295 ("lib/mpi: Do not do sg_virt")
    Reported-by: Nicolai Stange
    Tested-by: Nicolai Stange
    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • Merge more updates from Andrew Morton:
    "The rest of MM"

    * emailed patches from Andrew Morton: (101 commits)
    mm, compaction: simplify contended compaction handling
    mm, compaction: introduce direct compaction priority
    mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
    mm, page_alloc: make THP-specific decisions more generic
    mm, page_alloc: restructure direct compaction handling in slowpath
    mm, page_alloc: don't retry initial attempt in slowpath
    mm, page_alloc: set alloc_flags only once in slowpath
    lib/stackdepot.c: use __GFP_NOWARN for stack allocations
    mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
    mm, kasan: account for object redzone in SLUB's nearest_obj()
    mm: fix use-after-free if memory allocation failed in vma_adjust()
    zsmalloc: Delete an unnecessary check before the function call "iput"
    mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
    mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
    mm: optimize copy_page_to/from_iter_iovec
    mm: add cond_resched() to generic_swapfile_activate()
    Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
    mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
    mm: hwpoison: remove incorrect comments
    make __section_nr() more efficient
    ...

    Linus Torvalds
     
  • This (large, atomic) allocation attempt can fail. We expect and handle
    that, so avoid the scary warning.

    Link: http://lkml.kernel.org/r/20160720151905.GB19146@node.shutemov.name
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: David Rientjes
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • For KASAN builds:
    - switch SLUB allocator to using stackdepot instead of storing the
    allocation/deallocation stacks in the objects;
    - change the freelist hook so that parts of the freelist can be put
    into the quarantine.

    [aryabinin@virtuozzo.com: fixes]
    Link: http://lkml.kernel.org/r/1468601423-28676-1-git-send-email-aryabinin@virtuozzo.com
    Link: http://lkml.kernel.org/r/1468347165-41906-3-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Christoph Lameter
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt (Red Hat)
    Cc: Joonsoo Kim
    Cc: Kostya Serebryany
    Cc: Andrey Ryabinin
    Cc: Kuthonuzo Luruo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • copy_page_to_iter_iovec() and copy_page_from_iter_iovec() copy some data
    to userspace or from userspace. These functions have a fast path where
    they map a page using kmap_atomic and a slow path where they use kmap.

    kmap is slower than kmap_atomic, so the fast path is preferred.

    However, on kernels without highmem support, kmap just calls
    page_address, so there is no need to avoid kmap. On kernels without
    highmem support, the fast path just increases code size (and cache
    footprint) and it doesn't improve copy performance in any way.

    This patch enables the fast path only if CONFIG_HIGHMEM is defined.
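
    The IS_ENABLED() style of check mentioned in the notes below can be
    imitated outside the kernel. The sketch that follows is modeled on the
    placeholder trick used in the kernel's kconfig.h, but the SK_ macros and
    the CONFIG_DEMO_ options are made up for illustration. It evaluates to 1
    when an option is defined as 1 and to 0 when it is not defined, so the
    test can sit in an ordinary if statement and the compiler can discard the
    dead branch.

```c
#include <assert.h>

/* Hypothetical options: one set to 1, the other deliberately left undefined. */
#define CONFIG_DEMO_HIGHMEM 1

/*
 * If the option expands to 1, SK_PLACEHOLDER_1 becomes "0," and shifts the
 * literal 1 into the second argument slot. Otherwise the pasted token is junk
 * that stays in the first slot, and the second slot holds 0.
 */
#define SK_PLACEHOLDER_1 0,
#define SK_SECOND(ignored, val, ...) val
#define SK_IS_SET(x) SK_IS_SET_1(x)
#define SK_IS_SET_1(val) SK_IS_SET_2(SK_PLACEHOLDER_##val)
#define SK_IS_SET_2(arg_or_junk) SK_SECOND(arg_or_junk 1, 0, 0)

/* Choose the copy strategy with a normal C condition instead of #ifdef. */
const char *sk_copy_strategy(void)
{
	if (SK_IS_SET(CONFIG_DEMO_HIGHMEM))
		return "atomic-fast-path";
	return "plain-kmap";
}

/* Sanity check of both cases. */
void sk_check_macros(void)
{
	assert(SK_IS_SET(CONFIG_DEMO_HIGHMEM) == 1);
	assert(SK_IS_SET(CONFIG_DEMO_OTHER) == 0);
}
```

    Unlike an #ifdef block, both branches are still parsed and type-checked,
    which is one reason this form is often preferred.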

    Code size reduced by this patch:
    x86 (without highmem)   928
    x86-64                  960
    sparc64                 848
    alpha                  1136
    pa-risc                1200

    [akpm@linux-foundation.org: use IS_ENABLED(), per Andi]
    Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221711410.4818@file01.intranet.prod.int.rdu2.redhat.com
    Signed-off-by: Mikulas Patocka
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Cc: Alexander Viro
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     
  • This changes the vfs dentry hashing to mix in the parent pointer at the
    _beginning_ of the hash, rather than at the end.

    That actually improves both the hash and the code generation, because we
    can move more of the computation to the "static" part of the dcache
    setup, and do less at lookup runtime.

    It turns out that a lot of other hash users also really wanted to mix in
    a base pointer as a 'salt' for the hash, and so the slightly extended
    interface ends up working well for other cases too.

    Users that want a string hash that is purely about the string pass in a
    'salt' pointer of NULL.
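
    As a sketch of what mixing the salt in at the beginning looks like, the
    function below folds a salt pointer into the initial state of a 32-bit
    FNV-1a hash. FNV-1a is swapped in purely for illustration and is not the
    hash the kernel uses; sk_salted_hash and its signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * FNV-1a over the name, with the salt folded into the starting state.
 * A NULL salt leaves the standard FNV offset basis untouched, so the
 * result then depends on the string alone.
 */
uint32_t sk_salted_hash(const void *salt, const char *name, size_t len)
{
	uint32_t h = 2166136261u ^ (uint32_t)(uintptr_t)salt;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= 16777619u;
	}
	return h;
}
```

    Because the salt only affects the starting state, that state can be
    prepared once per parent and the per-lookup work is the loop over the
    name, which is the kind of saving the description above refers to.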

    * merge branch 'salted-string-hash':
    fs/dcache.c: Save one 32-bit multiply in dcache lookup
    vfs: make the string hashes salt the hash

    Linus Torvalds
     

28 Jul, 2016

2 commits

  • Pull random driver updates from Ted Ts'o:
    "A number of improvements for the /dev/random driver; the most
    important is the use of a ChaCha20-based CRNG for /dev/urandom, which
    is faster, more efficient, and easier to make scalable for
    silly/abusive userspace programs that want to read from /dev/urandom
    in a tight loop on NUMA systems.

    This set of patches also improves entropy gathering on VM's running on
    Microsoft Azure, and will take advantage of a hw random number
    generator (if present) to initialize the /dev/urandom pool"

    (It turns out that the random tree hadn't been in linux-next this time
    around, because it had been dropped earlier as being too quiet. Oh
    well).

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
    random: strengthen input validation for RNDADDTOENTCNT
    random: add backtracking protection to the CRNG
    random: make /dev/urandom scalable for silly userspace programs
    random: replace non-blocking pool with a Chacha20-based CRNG
    random: properly align get_random_int_hash
    random: add interrupt callback to VMBus IRQ handler
    random: print a warning for the first ten uninitialized random users
    random: initialize the non-blocking pool via add_hwgenerator_randomness()

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

    2) Make DSA binding more sane, from Andrew Lunn.

    3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

    4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

    5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface. From Brenden Blanco and
    others.

    6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

    7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

    8) Simplify netlink conntrack entry layout, from Florian Westphal.

    9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

    10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

    11) Support qdisc packet injection in pktgen, from John Fastabend.

    12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

    13) Add NV congestion control support to TCP, from Lawrence Brakmo.

    14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

    15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

    16) Support MPLS over IPV4, from Simon Horman.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
    xgene: Fix build warning with ACPI disabled.
    be2net: perform temperature query in adapter regardless of its interface state
    l2tp: Correctly return -EBADF from pppol2tp_getname.
    net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
    net: ipmr/ip6mr: update lastuse on entry change
    macsec: ensure rx_sa is set when validation is disabled
    tipc: dump monitor attributes
    tipc: add a function to get the bearer name
    tipc: get monitor threshold for the cluster
    tipc: make cluster size threshold for monitoring configurable
    tipc: introduce constants for tipc address validation
    net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
    MAINTAINERS: xgene: Add driver and documentation path
    Documentation: dtb: xgene: Add MDIO node
    dtb: xgene: Add MDIO node
    drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
    drivers: net: xgene: Use exported functions
    drivers: net: xgene: Enable MDIO driver
    drivers: net: xgene: Add backward compatibility
    drivers: net: phy: xgene: Add MDIO driver
    ...

    Linus Torvalds
     

27 Jul, 2016

1 commit