29 Jul, 2016

5 commits

  • Merge more updates from Andrew Morton:
    "The rest of MM"

    * emailed patches from Andrew Morton : (101 commits)
    mm, compaction: simplify contended compaction handling
    mm, compaction: introduce direct compaction priority
    mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
    mm, page_alloc: make THP-specific decisions more generic
    mm, page_alloc: restructure direct compaction handling in slowpath
    mm, page_alloc: don't retry initial attempt in slowpath
    mm, page_alloc: set alloc_flags only once in slowpath
    lib/stackdepot.c: use __GFP_NOWARN for stack allocations
    mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
    mm, kasan: account for object redzone in SLUB's nearest_obj()
    mm: fix use-after-free if memory allocation failed in vma_adjust()
    zsmalloc: Delete an unnecessary check before the function call "iput"
    mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
    mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
    mm: optimize copy_page_to/from_iter_iovec
    mm: add cond_resched() to generic_swapfile_activate()
    Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
    mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
    mm: hwpoison: remove incorrect comments
    make __section_nr() more efficient
    ...

    Linus Torvalds
     
  • This (large, atomic) allocation attempt can fail. We expect and handle
    that, so avoid the scary warning.

    Link: http://lkml.kernel.org/r/20160720151905.GB19146@node.shutemov.name
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: David Rientjes
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • For KASAN builds:
    - switch SLUB allocator to using stackdepot instead of storing the
    allocation/deallocation stacks in the objects;
    - change the freelist hook so that parts of the freelist can be put
    into the quarantine.

    [aryabinin@virtuozzo.com: fixes]
    Link: http://lkml.kernel.org/r/1468601423-28676-1-git-send-email-aryabinin@virtuozzo.com
    Link: http://lkml.kernel.org/r/1468347165-41906-3-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Christoph Lameter
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt (Red Hat)
    Cc: Joonsoo Kim
    Cc: Kostya Serebryany
    Cc: Andrey Ryabinin
    Cc: Kuthonuzo Luruo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • copy_page_to_iter_iovec() and copy_page_from_iter_iovec() copy some data
    to userspace or from userspace. These functions have a fast path where
    they map a page using kmap_atomic and a slow path where they use kmap.

    kmap is slower than kmap_atomic, so the fast path is preferred.

    However, on kernels without highmem support, kmap just calls
    page_address, so there is no need to avoid kmap. On kernels without
    highmem support, the fast path just increases code size (and cache
    footprint) and it doesn't improve copy performance in any way.

    This patch enables the fast path only if CONFIG_HIGHMEM is defined.

    Code size reduced by this patch:
    x86 (without highmem) 928
    x86-64 960
    sparc64 848
    alpha 1136
    pa-risc 1200

    [akpm@linux-foundation.org: use IS_ENABLED(), per Andi]
    Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221711410.4818@file01.intranet.prod.int.rdu2.redhat.com
    Signed-off-by: Mikulas Patocka
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Cc: Alexander Viro
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     
  • This changes the vfs dentry hashing to mix in the parent pointer at the
    _beginning_ of the hash, rather than at the end.

    That actually improves both the hash and the code generation, because we
    can move more of the computation to the "static" part of the dcache
    setup, and do less at lookup runtime.

    It turns out that a lot of other hash users also really wanted to mix in
    a base pointer as a 'salt' for the hash, and so the slightly extended
    interface ends up working well for other cases too.

    Users that want a string hash that is purely about the string pass in a
    'salt' pointer of NULL.

    * merge branch 'salted-string-hash':
    fs/dcache.c: Save one 32-bit multiply in dcache lookup
    vfs: make the string hashes salt the hash

    Linus Torvalds
     

28 Jul, 2016

2 commits

  • Pull random driver updates from Ted Ts'o:
    "A number of improvements for the /dev/random driver; the most
    important is the use of a ChaCha20-based CRNG for /dev/urandom, which
    is faster, more efficient, and easier to make scalable for
    silly/abusive userspace programs that want to read from /dev/urandom
    in a tight loop on NUMA systems.

    This set of patches also improves entropy gathering on VM's running on
    Microsoft Azure, and will take advantage of a hw random number
    generator (if present) to initialize the /dev/urandom pool"

    (It turns out that the random tree hadn't been in linux-next this time
    around, because it had been dropped earlier as being too quiet. Oh
    well).

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
    random: strengthen input validation for RNDADDTOENTCNT
    random: add backtracking protection to the CRNG
    random: make /dev/urandom scalable for silly userspace programs
    random: replace non-blocking pool with a Chacha20-based CRNG
    random: properly align get_random_int_hash
    random: add interrupt callback to VMBus IRQ handler
    random: print a warning for the first ten uninitialized random users
    random: initialize the non-blocking pool via add_hwgenerator_randomness()

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

    2) Make DSA binding more sane, from Andrew Lunn.

    3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

    4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

    5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface. From Brenden Blanco and
    others.

    6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

    7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

    8) Simplify netlink conntrack entry layout, from Florian Westphal.

    9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

    10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

    11) Support qdisc packet injection in pktgen, from John Fastabend.

    12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

    13) Add NV congestion control support to TCP, from Lawrence Brakmo.

    14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

    15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

    16) Support MPLS over IPV4, from Simon Horman.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
    xgene: Fix build warning with ACPI disabled.
    be2net: perform temperature query in adapter regardless of its interface state
    l2tp: Correctly return -EBADF from pppol2tp_getname.
    net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
    net: ipmr/ip6mr: update lastuse on entry change
    macsec: ensure rx_sa is set when validation is disabled
    tipc: dump monitor attributes
    tipc: add a function to get the bearer name
    tipc: get monitor threshold for the cluster
    tipc: make cluster size threshold for monitoring configurable
    tipc: introduce constants for tipc address validation
    net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
    MAINTAINERS: xgene: Add driver and documentation path
    Documentation: dtb: xgene: Add MDIO node
    dtb: xgene: Add MDIO node
    drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
    drivers: net: xgene: Use exported functions
    drivers: net: xgene: Enable MDIO driver
    drivers: net: xgene: Add backward compatibility
    drivers: net: phy: xgene: Add MDIO driver
    ...

    Linus Torvalds
     

27 Jul, 2016

6 commits

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2

    - most(?) of MM

    * emailed patches from Andrew Morton : (125 commits)
    thp: fix comments of __pmd_trans_huge_lock()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root
    mm: memcontrol: fix documentation for compound parameter
    mm: memcontrol: remove BUG_ON in uncharge_list
    mm: fix build warnings in
    mm, thp: convert from optimistic swapin collapsing to conservative
    mm, thp: fix comment inconsistency for swapin readahead functions
    thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
    shmem: split huge pages beyond i_size under memory pressure
    thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
    khugepaged: add support of collapse for tmpfs/shmem pages
    shmem: make shmem_inode_info::lock irq-safe
    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
    thp: extract khugepaged from mm/huge_memory.c
    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
    shmem: add huge pages support
    shmem: get_unmapped_area align huge page
    shmem: prepare huge= mount option and sysfs knob
    mm, rmap: account shmem thp pages
    ...

    Linus Torvalds
     
  • The new helper is similar to radix_tree_maybe_preload(), but tries to
    preload number of nodes required to insert (1 << order) continuous
    naturally-aligned elements.

    This is required to push huge pages into pagecache.

    Link: http://lkml.kernel.org/r/1466021202-61880-24-git-send-email-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Currently, we store each page's allocation stacktrace on corresponding
    page_ext structure and it requires a lot of memory. This causes the
    problem that memory tight system doesn't work well if page_owner is
    enabled. Moreover, even with this large memory consumption, we cannot
    get full stacktrace because we allocate memory at boot time and just
    maintain 8 stacktrace slots to balance memory consumption. We could
    increase it to more but it would make system unusable or change system
    behaviour.

    To solve the problem, this patch uses stackdepot to store stacktrace.
    It obviously provides memory saving but there is a drawback that
    stackdepot could fail.

    stackdepot allocates memory at runtime so it could fail if system has
    not enough memory. But, most of allocation stack are generated at very
    early time and there are much memory at this time. So, failure would
    not happen easily. And, one failure means that we miss just one page's
    allocation stacktrace so it would not be a big problem. In this patch,
    when memory allocation failure happens, we store special stracktrace
    handle to the page that is failed to save stacktrace. With it, user can
    guess memory usage properly even if failure happens.

    Memory saving looks as following. (4GB memory system with page_owner)
    (before the patch -> after the patch)

    static allocation:
    92274688 bytes -> 25165824 bytes

    dynamic allocation after boot + kernel build:
    0 bytes -> 327680 bytes

    total:
    92274688 bytes -> 25493504 bytes

    72% reduction in total.

    Note that implementation looks complex than someone would imagine
    because there is recursion issue. stackdepot uses page allocator and
    page_owner is called at page allocation. Using stackdepot in page_owner
    could re-call page allcator and then page_owner. That is a recursion.
    To detect and avoid it, whenever we obtain stacktrace, recursion is
    checked and page_owner is set to dummy information if found. Dummy
    information means that this page is allocated for page_owner feature
    itself (such as stackdepot) and it's understandable behavior for user.

    [iamjoonsoo.kim@lge.com: mm-page_owner-use-stackdepot-to-store-stacktrace-v3]
    Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com
    Link: http://lkml.kernel.org/r/1466150259-27727-7-git-send-email-iamjoonsoo.kim@lge.com
    Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Alexander Potapenko
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • get_hash_bucket() and put_hash_bucket() acquire and release the same
    spinlock, but this confuses static checkers such as sparse

    lib/dma-debug.c:254:27: warning: context imbalance in 'get_hash_bucket' - wrong count at exit
    lib/dma-debug.c:268:13: warning: context imbalance in 'put_hash_bucket' - unexpected unlock

    Add the appropriate acquire and release statements so that checkers can
    properly track the lock state.

    Link: http://lkml.kernel.org/r/20160701191552.24295-1-sboyd@codeaurora.org
    Signed-off-by: Stephen Boyd
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     
  • Pull core block updates from Jens Axboe:

    - the big change is the cleanup from Mike Christie, cleaning up our
    uses of command types and modified flags. This is what will throw
    some merge conflicts

    - regression fix for the above for btrfs, from Vincent

    - following up to the above, better packing of struct request from
    Christoph

    - a 2038 fix for blktrace from Arnd

    - a few trivial/spelling fixes from Bart Van Assche

    - a front merge check fix from Damien, which could cause issues on
    SMR drives

    - Atari partition fix from Gabriel

    - convert cfq to highres timers, since jiffies isn't granular enough
    for some devices these days. From Jan and Jeff

    - CFQ priority boost fix idle classes, from me

    - cleanup series from Ming, improving our bio/bvec iteration

    - a direct issue fix for blk-mq from Omar

    - fix for plug merging not involving the IO scheduler, like we do for
    other types of merges. From Tahsin

    - expose DAX type internally and through sysfs. From Toshi and Yigal

    * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
    block: Fix front merge check
    block: do not merge requests without consulting with io scheduler
    block: Fix spelling in a source code comment
    block: expose QUEUE_FLAG_DAX in sysfs
    block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
    Btrfs: fix comparison in __btrfs_map_block()
    block: atari: Return early for unsupported sector size
    Doc: block: Fix a typo in queue-sysfs.txt
    cfq-iosched: Charge at least 1 jiffie instead of 1 ns
    cfq-iosched: Fix regression in bonnie++ rewrite performance
    cfq-iosched: Convert slice_resid from u64 to s64
    block: Convert fifo_time from ulong to u64
    blktrace: avoid using timespec
    block/blk-cgroup.c: Declare local symbols static
    block/bio-integrity.c: Add #include "blk.h"
    block/partition-generic.c: Remove a set-but-not-used variable
    block: bio: kill BIO_MAX_SIZE
    cfq-iosched: temporarily boost queue priority for idle classes
    block: drbd: avoid to use BIO_MAX_SIZE
    block: bio: remove BIO_MAX_SECTORS
    ...

    Linus Torvalds
     
  • Pull crypto updates from Herbert Xu:
    "Here is the crypto update for 4.8:

    API:
    - first part of skcipher low-level conversions
    - add KPP (Key-agreement Protocol Primitives) interface.

    Algorithms:
    - fix IPsec/cryptd reordering issues that affects aesni
    - RSA no longer does explicit leading zero removal
    - add SHA3
    - add DH
    - add ECDH
    - improve DRBG performance by not doing CTR by hand

    Drivers:
    - add x86 AVX2 multibuffer SHA256/512
    - add POWER8 optimised crc32c
    - add xts support to vmx
    - add DH support to qat
    - add RSA support to caam
    - add Layerscape support to caam
    - add SEC1 AEAD support to talitos
    - improve performance by chaining requests in marvell/cesa
    - add support for Araneus Alea I USB RNG
    - add support for Broadcom BCM5301 RNG
    - add support for Amlogic Meson RNG
    - add support Broadcom NSP SoC RNG"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (180 commits)
    crypto: vmx - Fix aes_p8_xts_decrypt build failure
    crypto: vmx - Ignore generated files
    crypto: vmx - Adding support for XTS
    crypto: vmx - Adding asm subroutines for XTS
    crypto: skcipher - add comment for skcipher_alg->base
    crypto: testmgr - Print akcipher algorithm name
    crypto: marvell - Fix wrong flag used for GFP in mv_cesa_dma_add_iv_op
    crypto: nx - off by one bug in nx_of_update_msc()
    crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
    crypto: scatterwalk - Inline start/map/done
    crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start
    crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone
    crypto: scatterwalk - Fix test in scatterwalk_done
    crypto: api - Optimise away crypto_yield when hard preemption is on
    crypto: scatterwalk - add no-copy support to copychunks
    crypto: scatterwalk - Remove scatterwalk_bytes_sglen
    crypto: omap - Stop using crypto scatterwalk_bytes_sglen
    crypto: skcipher - Remove top-level givcipher interface
    crypto: user - Remove crypto_lookup_skcipher call
    crypto: cts - Convert to skcipher
    ...

    Linus Torvalds
     

26 Jul, 2016

3 commits

  • Pull timer updates from Thomas Gleixner:
    "This update provides the following changes:

    - The rework of the timer wheel which addresses the shortcomings of
    the current wheel (cascading, slow search for next expiring timer,
    etc). That's the first major change of the wheel in almost 20
    years since Finn implemted it.

    - A large overhaul of the clocksource drivers init functions to
    consolidate the Device Tree initialization

    - Some more Y2038 updates

    - A capability fix for timerfd

    - Yet another clock chip driver

    - The usual pile of updates, comment improvements all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
    tick/nohz: Optimize nohz idle enter
    clockevents: Make clockevents_subsys static
    clocksource/drivers/time-armada-370-xp: Fix return value check
    timers: Implement optimization for same expiry time in mod_timer()
    timers: Split out index calculation
    timers: Only wake softirq if necessary
    timers: Forward the wheel clock whenever possible
    timers/nohz: Remove pointless tick_nohz_kick_tick() function
    timers: Optimize collect_expired_timers() for NOHZ
    timers: Move __run_timers() function
    timers: Remove set_timer_slack() leftovers
    timers: Switch to a non-cascading wheel
    timers: Reduce the CPU index space to 256k
    timers: Give a few structs and members proper names
    hlist: Add hlist_is_singular_node() helper
    signals: Use hrtimer for sigtimedwait()
    timers: Remove the deprecated mod_timer_pinned() API
    timers, net/ipv4/inet: Initialize connection request timers as pinned
    timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
    timers, drivers/tty/metag_da: Initialize the poll timer as pinned
    ...

    Linus Torvalds
     
  • Pull x86 mm updates from Ingo Molnar:
    "Various x86 low level modifications:

    - preparatory work to support virtually mapped kernel stacks (Andy
    Lutomirski)

    - support for 64-bit __get_user() on 32-bit kernels (Benjamin
    LaHaise)

    - (involved) workaround for Knights Landing CPU erratum (Dave Hansen)

    - MPX enhancements (Dave Hansen)

    - mremap() extension to allow remapping of the special VDSO vma, for
    purposes of user level context save/restore (Dmitry Safonov)

    - hweight and entry code cleanups (Borislav Petkov)

    - bitops code generation optimizations and cleanups with modern GCC
    (H. Peter Anvin)

    - syscall entry code optimizations (Paolo Bonzini)"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
    x86/mm/cpa: Add missing comment in populate_pdg()
    x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
    x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
    x86/smp: Remove unnecessary initialization of thread_info::cpu
    x86/smp: Remove stack_smp_processor_id()
    x86/uaccess: Move thread_info::addr_limit to thread_struct
    x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
    x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
    x86/dumpstack: When OOPSing, rewind the stack before do_exit()
    x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
    x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
    x86/dumpstack: Try harder to get a call trace on stack overflow
    x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
    x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
    x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
    x86/mm: Use pte_none() to test for empty PTE
    x86/mm: Disallow running with 32-bit PTEs to work around erratum
    x86/mm: Ignore A/D bits in pte/pmd/pud_none()
    x86/mm: Move swap offset/type up in PTE to work around erratum
    x86/entry: Inline enter_from_user_mode()
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "The locking tree was busier in this cycle than the usual pattern - a
    couple of major projects happened to coincide.

    The main changes are:

    - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
    across all SMP architectures (Peter Zijlstra)

    - add atomic_fetch_{inc/dec}() as well, using the generic primitives
    (Davidlohr Bueso)

    - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
    Waiman Long)

    - optimize smp_cond_load_acquire() on arm64 and implement LSE based
    atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    on arm64 (Will Deacon)

    - introduce smp_acquire__after_ctrl_dep() and fix various barrier
    mis-uses and bugs (Peter Zijlstra)

    - after discovering ancient spin_unlock_wait() barrier bugs in its
    implementation and usage, strengthen its semantics and update/fix
    usage sites (Peter Zijlstra)

    - optimize mutex_trylock() fastpath (Peter Zijlstra)

    - ... misc fixes and cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
    locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
    locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
    locking/static_keys: Fix non static symbol Sparse warning
    locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
    locking/atomic, arch/tile: Fix tilepro build
    locking/atomic, arch/m68k: Remove comment
    locking/atomic, arch/arc: Fix build
    locking/Documentation: Clarify limited control-dependency scope
    locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
    locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
    locking/atomic, arch/mips: Convert to _relaxed atomics
    locking/atomic, arch/alpha: Convert to _relaxed atomics
    locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
    locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
    locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
    locking/atomic: Fix atomic64_relaxed() bits
    locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
    ...

    Linus Torvalds
     

15 Jul, 2016

1 commit

  • struct thread_info is a legacy mess. To prepare for its partial removal,
    move thread_info::addr_limit out.

    As an added benefit, this way is simpler.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

07 Jul, 2016

1 commit

  • We now have implicit batching in the timer wheel. The slack API is no longer
    used, so remove it.

    Signed-off-by: Thomas Gleixner
    Cc: Alan Stern
    Cc: Andrew F. Davis
    Cc: Arjan van de Ven
    Cc: Chris Mason
    Cc: David S. Miller
    Cc: David Woodhouse
    Cc: Dmitry Eremin-Solenikov
    Cc: Eric Dumazet
    Cc: Frederic Weisbecker
    Cc: George Spelvin
    Cc: Greg Kroah-Hartman
    Cc: Jaehoon Chung
    Cc: Jens Axboe
    Cc: John Stultz
    Cc: Josh Triplett
    Cc: Len Brown
    Cc: Linus Torvalds
    Cc: Mathias Nyman
    Cc: Pali Rohár
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Sebastian Reichel
    Cc: Ulf Hansson
    Cc: linux-block@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mmc@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: linux-usb@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

06 Jul, 2016

1 commit


03 Jul, 2016

1 commit


01 Jul, 2016

2 commits

  • Currently the mpi SG helpers use sg_virt which is completely
    broken. It happens to work with normal kernel memory but will
    fail with anything that is not linearly mapped.

    This patch fixes this by using the SG iterator helpers.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • Every implementation of RSA that we have naturally generates
    output with leading zeroes. The one and only user of RSA,
    pkcs1pad wants to have those leading zeroes in place, in fact
    because they are currently absent it has to write those zeroes
    itself.

    So we shouldn't be stripping leading zeroes in the first place.
    In fact this patch makes rsa-generic produce output with fixed
    length so that pkcs1pad does not need to do any extra work.

    This patch also changes DH to use the new interface.

    Signed-off-by: Herbert Xu

    Herbert Xu
     

16 Jun, 2016

1 commit

  • …relaxed,_acquire,_release}()

    Now that all the architectures have implemented support for these new
    atomic primitives add on the generic infrastructure to expose and use
    it.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Peter Zijlstra
     

15 Jun, 2016

2 commits


11 Jun, 2016

1 commit

  • We always mixed in the parent pointer into the dentry name hash, but we
    did it late at lookup time. It turns out that we can simplify that
    lookup-time action by salting the hash with the parent pointer early
    instead of late.

    A few other users of our string hashes also wanted to mix in their own
    pointers into the hash, and those are updated to use the same mechanism.

    Hash users that don't have any particular initial salt can just use the
    NULL pointer as a no-salt.

    Cc: Vegard Nossum
    Cc: George Spelvin
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Jun, 2016

1 commit

  • bvec has one native/mature iterator for long time, so not
    necessary to use the reinvented wheel for iterating bvecs
    in lib/iov_iter.c.

    Two ITER_BVEC test cases are run:
    - xfstest(-g auto) on loop dio/aio, no regression found
    - swap file works well under extreme stress(stress-ng --all 64 -t
    800 -v), and lots of OOMs are triggerd, and the whole
    system still survives

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Tested-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Ming Lei
     

08 Jun, 2016

1 commit

  • People complained about ARCH_HWEIGHT_CFLAGS and how it throws a wrench
    into kcov, lto, etc, experimentations.

    Add asm versions for __sw_hweight{32,64}() and do explicit saving and
    restoring of clobbered registers. This gets rid of the special calling
    convention. We get to call those functions on !X86_FEATURE_POPCNT CPUs.

    We still need to hardcode POPCNT and register operands as some old gas
    versions which we support, do not know about POPCNT.

    Btw, remove redundant REX prefix from 32-bit POPCNT because alternatives
    can do padding now.

    Suggested-by: H. Peter Anvin
    Signed-off-by: Borislav Petkov
    Acked-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1464605787-20603-1-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

31 May, 2016

10 commits

  • mpi_read_from_buffer() and mpi_read_raw_data() do basically the same thing
    except that the former extracts the number of payload bits from the first
    two bytes of the input buffer.

    Besides that, the data copying logic is exactly the same.

    Replace the open coded buffer to MPI instance conversion by a call to
    mpi_read_raw_data().

    Signed-off-by: Nicolai Stange
    Signed-off-by: Herbert Xu

    Nicolai Stange
     
  • The first two bytes of the input buffer encode its expected length and
    mpi_read_from_buffer() prints a console message if the given buffer is too
    short.

    However, there are some oddities with how this message is printed:
    - It is printed at the default loglevel. This is different from the
    one used in the case that the first two bytes' value is unsupportedly
    large, i.e. KERN_INFO.
    - The format specifier '%d' is used for unsigned ints.
    - It prints the values of nread and *ret_nread. This is redundant since
    the former is always the latter + 1.

    Clean this up as follows:
    - Use pr_info() rather than printk() with no loglevel.
    - Use the format specifiers '%u' in place if '%d'.
    - Do not print the redundant 'nread' but the more helpful 'nbytes' value.

    Signed-off-by: Nicolai Stange
    Signed-off-by: Herbert Xu

    Nicolai Stange
     
  • Currently, if the input buffer is shorter than the expected length as
    indicated by its first two bytes, an MPI instance of this expected length
    will be allocated and filled with as much data as is available. The rest
    will remain uninitialized.

    Instead of leaving this condition undetected, an error code should be
    reported to the caller.

    Since this situation indicates that the input buffer's first two bytes,
    encoding the number of expected bits, are garbled, -EINVAL is appropriate
    here.

    If the input buffer is shorter than indicated by its first two bytes,
    make mpi_read_from_buffer() return -EINVAL.
    Get rid of the 'nread' variable: with the new semantics, the total number
    of bytes read from the input buffer is known in advance.

    Signed-off-by: Nicolai Stange
    Signed-off-by: Herbert Xu

    Nicolai Stange
     
  • Currently, if digsig_verify_rsa() detects that the modulo's length is zero,
    i.e. mlen == 0, it returns -ENOMEM which doesn't really fit here.

    Make digsig_verify_rsa() return -EINVAL upon mlen == 0.

    Signed-off-by: Nicolai Stange
    Signed-off-by: Herbert Xu

    Nicolai Stange
     
  • mpi_read_from_buffer() reads a MPI from a buffer into a newly allocated
    MPI instance. It expects the buffer's leading two bytes to contain the
    number of bits, followed by the actual payload.

    On failure, it returns NULL and updates the in/out argument ret_nread
    somewhat inconsistently:
    - If the given buffer is too short to contain the leading two bytes
    encoding the number of bits or their value is unsupported, then
    ret_nread will be cleared.
    - If the allocation of the resulting MPI instance fails, ret_nread is left
    as is.

    The only user of mpi_read_from_buffer(), digsig_verify_rsa(), simply checks
    for a return value of NULL and returns -ENOMEM if that happens.

    While this is all of cosmetic nature only, there is another error condition
    which currently isn't detectable by the caller of mpi_read_from_buffer():
    if the given buffer is too small to hold the number of bits as encoded in
    its first two bytes, the return value will be non-NULL and *ret_nread > 0.

    In preparation of communicating this condition to the caller, let
    mpi_read_from_buffer() return error values by means of the ERR_PTR()
    mechanism.

    Make the sole caller of mpi_read_from_buffer(), digsig_verify_rsa(),
    check the return value for IS_ERR() rather than == NULL. If IS_ERR() is
    true, return the associated error value rather than the fixed -ENOMEM.

    Signed-off-by: Nicolai Stange
    Signed-off-by: Herbert Xu

    Nicolai Stange
     
  • The number of bits, nbits, is calculated in mpi_read_raw_data() as follows:

    nbits = nbytes * 8;

    Afterwards, the number of leading zero bits of the first byte get
    subtracted:

    nbits -= count_leading_zeros(buffer[0]);

    However, count_leading_zeros() takes an unsigned long and thus,
    the u8 gets promoted to an unsigned long.

    Thus, the above doesn't subtract the number of leading zeros in the most
    significant nonzero input byte from nbits, but the number of leading
    zeros of the most significant nonzero input byte promoted to unsigned long,
    i.e. BITS_PER_LONG - 8 too many.

    Fix this by subtracting

    count_leading_zeros(...) - (BITS_PER_LONG - 8)

    from nbits only.

    Fixes: e1045992949 ("MPILIB: Provide a function to read raw data into an
    MPI")
    Signed-off-by: Nicolai Stange
    Signed-off-by: Herbert Xu

    Nicolai Stange
     
  • In mpi_read_raw_data(), unsigned nbits is calculated as follows:

    nbits = nbytes * 8;

    and redundantly cleared later on if nbytes == 0:

    if (nbytes > 0)
    ...
    else
    nbits = 0;

    Purge this redundant clearing for the sake of clarity.

    Signed-off-by: Nicolai Stange
    Signed-off-by: Herbert Xu

    Nicolai Stange
     
  • mpi_set_buffer() has no in-tree users and similar functionality is provided
    by mpi_read_raw_data().

    Remove mpi_set_buffer().

    Signed-off-by: Nicolai Stange
    Signed-off-by: Herbert Xu

    Nicolai Stange
     
  • Use '+ 0' and '+ 1' as offsets, like they were intended, instead of
    adding to the result.

    Fixes: 2b1b0d66704a ("lib/uuid.c: introduce a few more generic helpers")
    Signed-off-by: Bjørn Mork
    Signed-off-by: Andy Shevchenko
    Signed-off-by: Linus Torvalds

    Bjørn Mork
     
  • It appears that somehow I missed a test of the latest UUID rework which
    landed in the kernel. Present a small test module to avoid such cases
    in the future.

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

29 May, 2016

2 commits

  • Pull string hash improvements from George Spelvin:
    "This series does several related things:

    - Makes the dcache hash (fs/namei.c) useful for general kernel use.

    (Thanks to Bruce for noticing the zero-length corner case)

    - Converts the string hashes in to use the
    above.

    - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
    32-bit multiplies will do well enough.

    - Rids the world of the bad hash multipliers in hash_32.

    This finishes the job started in commit 689de1d6ca95 ("Minimal
    fix-up of bad hashing behavior of hash_64()")

    The vast majority of Linux architectures have hardware support for
    32x32-bit multiply and so derive no benefit from "simplified"
    multipliers.

    The few processors that do not (68000, h8/300 and some models of
    Microblaze) have arch-specific implementations added. Those
    patches are last in the series.

    - Overhauls the dcache hash mixing.

    The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
    CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
    Replaced with a much more careful design that's simultaneously
    faster and better. (My own invention, as there was noting suitable
    in the literature I could find. Comments welcome!)

    - Modify the hash_name() loop to skip the initial HASH_MIX(). This
    would let us salt the hash if we ever wanted to.

    - Sort out partial_name_hash().

    The hash function is declared as using a long state, even though
    it's truncated to 32 bits at the end and the extra internal state
    contributes nothing to the result. And some callers do odd things:

    - fs/hfs/string.c only allocates 32 bits of state
    - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

    - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
    rather than 0..sizeof(long)-1. This would simplify users other
    than full_name_hash"

    Special thanks to Bruce Fields for testing and finding bugs in v1. (I
    learned some humbling lessons about "obviously correct" code.)

    On the arch-specific front, the m68k assembly has been tested in a
    standalone test harness, I've been in contact with the Microblaze
    maintainers who mostly don't care, as the hardware multiplier is never
    omitted in real-world applications, and I haven't heard anything from
    the H8/300 world"

    * 'hash' of git://ftp.sciencehorizons.net/linux:
    h8300: Add
    microblaze: Add
    m68k: Add
    : Add support for architecture-specific functions
    fs/namei.c: Improve dcache hash function
    Eliminate bad hash multipliers from hash_32() and hash_64()
    Change hash_64() return value to 32 bits
    : Define hash_str() in terms of hashlen_string()
    fs/namei.c: Add hashlen_string() function
    Pull out string hash to

    Linus Torvalds
     
  • This is just the infrastructure; there are no users yet.

    This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
    the existence of .

    That file may define its own versions of various functions, and define
    HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

    Included is a self-test (in lib/test_hash.c) that verifies the basics.
    It is NOT in general required that the arch-specific functions compute
    the same thing as the generic, but if a HAVE_* symbol is defined with
    the value 1, then equality is tested.

    Signed-off-by: George Spelvin
    Cc: Geert Uytterhoeven
    Cc: Greg Ungerer
    Cc: Andreas Schwab
    Cc: Philippe De Muyter
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: Alistair Francis
    Cc: Michal Simek
    Cc: Yoshinori Sato
    Cc: uclinux-h8-devel@lists.sourceforge.jp

    George Spelvin