Eric Lee / smarc-fsl-linux-kernel

29 Jul, 2016

5 commits

1c88e19b0 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge more updates from Andrew Morton:
"The rest of MM"

* emailed patches from Andrew Morton : (101 commits)
mm, compaction: simplify contended compaction handling
mm, compaction: introduce direct compaction priority
mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
mm, page_alloc: make THP-specific decisions more generic
mm, page_alloc: restructure direct compaction handling in slowpath
mm, page_alloc: don't retry initial attempt in slowpath
mm, page_alloc: set alloc_flags only once in slowpath
lib/stackdepot.c: use __GFP_NOWARN for stack allocations
mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
mm, kasan: account for object redzone in SLUB's nearest_obj()
mm: fix use-after-free if memory allocation failed in vma_adjust()
zsmalloc: Delete an unnecessary check before the function call "iput"
mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
mm: optimize copy_page_to/from_iter_iovec
mm: add cond_resched() to generic_swapfile_activate()
Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
mm: hwpoison: remove incorrect comments
make __section_nr() more efficient
...

Linus Torvalds
2016-07-29 07:36:48 +0800
87cc271d5 lib/stackdepot.c: use __GFP_NOWARN for stack allocations ... Browse Code »

This (large, atomic) allocation attempt can fail. We expect and handle
that, so avoid the scary warning.

Link: http://lkml.kernel.org/r/20160720151905.GB19146@node.shutemov.name
Cc: Andrey Ryabinin
Cc: Alexander Potapenko
Cc: Michal Hocko
Cc: Rik van Riel
Cc: David Rientjes
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-07-29 07:07:41 +0800
80a9201a5 mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB ... Browse Code »

For KASAN builds:
- switch SLUB allocator to using stackdepot instead of storing the
allocation/deallocation stacks in the objects;
- change the freelist hook so that parts of the freelist can be put
into the quarantine.

[aryabinin@virtuozzo.com: fixes]
Link: http://lkml.kernel.org/r/1468601423-28676-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1468347165-41906-3-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko
Cc: Andrey Konovalov
Cc: Christoph Lameter
Cc: Dmitry Vyukov
Cc: Steven Rostedt (Red Hat)
Cc: Joonsoo Kim
Cc: Kostya Serebryany
Cc: Andrey Ryabinin
Cc: Kuthonuzo Luruo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Potapenko
2016-07-29 07:07:41 +0800
3fa6c5073 mm: optimize copy_page_to/from_iter_iovec ... Browse Code »

copy_page_to_iter_iovec() and copy_page_from_iter_iovec() copy some data
to userspace or from userspace. These functions have a fast path where
they map a page using kmap_atomic and a slow path where they use kmap.

kmap is slower than kmap_atomic, so the fast path is preferred.

However, on kernels without highmem support, kmap just calls
page_address, so there is no need to avoid kmap. On kernels without
highmem support, the fast path just increases code size (and cache
footprint) and it doesn't improve copy performance in any way.

This patch enables the fast path only if CONFIG_HIGHMEM is defined.

Code size reduced by this patch:
x86 (without highmem) 928
x86-64 960
sparc64 848
alpha 1136
pa-risc 1200

[akpm@linux-foundation.org: use IS_ENABLED(), per Andi]
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221711410.4818@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Mikulas Patocka
Cc: Hugh Dickins
Cc: Michal Hocko
Cc: Alexander Viro
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mikulas Patocka
2016-07-29 07:07:41 +0800
554828ee0 Merge branch 'salted-string-hash' ... Browse Code »

This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.

That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.

It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.

Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.

* merge branch 'salted-string-hash':
fs/dcache.c: Save one 32-bit multiply in dcache lookup
vfs: make the string hashes salt the hash

Linus Torvalds
2016-07-29 03:26:31 +0800

28 Jul, 2016

2 commits

818e607b5 Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random ... Browse Code »

Pull random driver updates from Ted Ts'o:
"A number of improvements for the /dev/random driver; the most
important is the use of a ChaCha20-based CRNG for /dev/urandom, which
is faster, more efficient, and easier to make scalable for
silly/abusive userspace programs that want to read from /dev/urandom
in a tight loop on NUMA systems.

This set of patches also improves entropy gathering on VM's running on
Microsoft Azure, and will take advantage of a hw random number
generator (if present) to initialize the /dev/urandom pool"

(It turns out that the random tree hadn't been in linux-next this time
around, because it had been dropped earlier as being too quiet. Oh
well).

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: strengthen input validation for RNDADDTOENTCNT
random: add backtracking protection to the CRNG
random: make /dev/urandom scalable for silly userspace programs
random: replace non-blocking pool with a Chacha20-based CRNG
random: properly align get_random_int_hash
random: add interrupt callback to VMBus IRQ handler
random: print a warning for the first ten uninitialized random users
random: initialize the non-blocking pool via add_hwgenerator_randomness()

Linus Torvalds
2016-07-28 06:11:55 +0800
468fc7ed5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:

1) Unified UDP encapsulation offload methods for drivers, from
Alexander Duyck.

2) Make DSA binding more sane, from Andrew Lunn.

3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
packets as soon as the device sees them, with the option to mirror
the packet on TX via the same interface. From Brenden Blanco and
others.

6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

8) Simplify netlink conntrack entry layout, from Florian Westphal.

9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
Schimmel, Yotam Gigi, and Jiri Pirko.

10) Add SKB array infrastructure and convert tun and macvtap over to it.
From Michael S Tsirkin and Jason Wang.

11) Support qdisc packet injection in pktgen, from John Fastabend.

12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

13) Add NV congestion control support to TCP, from Lawrence Brakmo.

14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

16) Support MPLS over IPV4, from Simon Horman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
xgene: Fix build warning with ACPI disabled.
be2net: perform temperature query in adapter regardless of its interface state
l2tp: Correctly return -EBADF from pppol2tp_getname.
net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
net: ipmr/ip6mr: update lastuse on entry change
macsec: ensure rx_sa is set when validation is disabled
tipc: dump monitor attributes
tipc: add a function to get the bearer name
tipc: get monitor threshold for the cluster
tipc: make cluster size threshold for monitoring configurable
tipc: introduce constants for tipc address validation
net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
MAINTAINERS: xgene: Add driver and documentation path
Documentation: dtb: xgene: Add MDIO node
dtb: xgene: Add MDIO node
drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
drivers: net: xgene: Use exported functions
drivers: net: xgene: Enable MDIO driver
drivers: net: xgene: Add backward compatibility
drivers: net: phy: xgene: Add MDIO driver
...

Linus Torvalds
2016-07-28 03:03:20 +0800

27 Jul, 2016

6 commits

0e06f5c0d Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge updates from Andrew Morton:

- a few misc bits

- ocfs2

- most(?) of MM

* emailed patches from Andrew Morton : (125 commits)
thp: fix comments of __pmd_trans_huge_lock()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root
mm: memcontrol: fix documentation for compound parameter
mm: memcontrol: remove BUG_ON in uncharge_list
mm: fix build warnings in
mm, thp: convert from optimistic swapin collapsing to conservative
mm, thp: fix comment inconsistency for swapin readahead functions
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
shmem: split huge pages beyond i_size under memory pressure
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
khugepaged: add support of collapse for tmpfs/shmem pages
shmem: make shmem_inode_info::lock irq-safe
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
thp: extract khugepaged from mm/huge_memory.c
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
shmem: add huge pages support
shmem: get_unmapped_area align huge page
shmem: prepare huge= mount option and sysfs knob
mm, rmap: account shmem thp pages
...

Linus Torvalds
2016-07-27 10:55:54 +0800
c78c66d1d radix-tree: implement radix_tree_maybe_preload_order() ... Browse Code »

The new helper is similar to radix_tree_maybe_preload(), but tries to
preload number of nodes required to insert (1 << order) continuous
naturally-aligned elements.

This is required to push huge pages into pagecache.

Link: http://lkml.kernel.org/r/1466021202-61880-24-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-07-27 07:19:19 +0800
f2ca0b557 mm/page_owner: use stackdepot to store stacktrace ... Browse Code »

Currently, we store each page's allocation stacktrace on corresponding
page_ext structure and it requires a lot of memory. This causes the
problem that memory tight system doesn't work well if page_owner is
enabled. Moreover, even with this large memory consumption, we cannot
get full stacktrace because we allocate memory at boot time and just
maintain 8 stacktrace slots to balance memory consumption. We could
increase it to more but it would make system unusable or change system
behaviour.

To solve the problem, this patch uses stackdepot to store stacktrace.
It obviously provides memory saving but there is a drawback that
stackdepot could fail.

stackdepot allocates memory at runtime so it could fail if system has
not enough memory. But, most of allocation stack are generated at very
early time and there are much memory at this time. So, failure would
not happen easily. And, one failure means that we miss just one page's
allocation stacktrace so it would not be a big problem. In this patch,
when memory allocation failure happens, we store special stracktrace
handle to the page that is failed to save stacktrace. With it, user can
guess memory usage properly even if failure happens.

Memory saving looks as following. (4GB memory system with page_owner)
(before the patch -> after the patch)

static allocation:
92274688 bytes -> 25165824 bytes

dynamic allocation after boot + kernel build:
0 bytes -> 327680 bytes

total:
92274688 bytes -> 25493504 bytes

72% reduction in total.

Note that implementation looks complex than someone would imagine
because there is recursion issue. stackdepot uses page allocator and
page_owner is called at page allocation. Using stackdepot in page_owner
could re-call page allcator and then page_owner. That is a recursion.
To detect and avoid it, whenever we obtain stacktrace, recursion is
checked and page_owner is set to dummy information if found. Dummy
information means that this page is allocated for page_owner feature
itself (such as stackdepot) and it's understandable behavior for user.

[iamjoonsoo.kim@lge.com: mm-page_owner-use-stackdepot-to-store-stacktrace-v3]
Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com
Link: http://lkml.kernel.org/r/1466150259-27727-7-git-send-email-iamjoonsoo.kim@lge.com
Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Cc: Mel Gorman
Cc: Minchan Kim
Cc: Alexander Potapenko
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joonsoo Kim
2016-07-27 07:19:19 +0800
d5dfc80f8 dma-debug: track bucket lock state for static checkers ... Browse Code »

get_hash_bucket() and put_hash_bucket() acquire and release the same
spinlock, but this confuses static checkers such as sparse

lib/dma-debug.c:254:27: warning: context imbalance in 'get_hash_bucket' - wrong count at exit
lib/dma-debug.c:268:13: warning: context imbalance in 'put_hash_bucket' - unexpected unlock

Add the appropriate acquire and release statements so that checkers can
properly track the lock state.

Link: http://lkml.kernel.org/r/20160701191552.24295-1-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Boyd
2016-07-27 07:19:19 +0800
d05d7f407 Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull core block updates from Jens Axboe:

- the big change is the cleanup from Mike Christie, cleaning up our
uses of command types and modified flags. This is what will throw
some merge conflicts

- regression fix for the above for btrfs, from Vincent

- following up to the above, better packing of struct request from
Christoph

- a 2038 fix for blktrace from Arnd

- a few trivial/spelling fixes from Bart Van Assche

- a front merge check fix from Damien, which could cause issues on
SMR drives

- Atari partition fix from Gabriel

- convert cfq to highres timers, since jiffies isn't granular enough
for some devices these days. From Jan and Jeff

- CFQ priority boost fix idle classes, from me

- cleanup series from Ming, improving our bio/bvec iteration

- a direct issue fix for blk-mq from Omar

- fix for plug merging not involving the IO scheduler, like we do for
other types of merges. From Tahsin

- expose DAX type internally and through sysfs. From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
block: Fix front merge check
block: do not merge requests without consulting with io scheduler
block: Fix spelling in a source code comment
block: expose QUEUE_FLAG_DAX in sysfs
block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
Btrfs: fix comparison in __btrfs_map_block()
block: atari: Return early for unsupported sector size
Doc: block: Fix a typo in queue-sysfs.txt
cfq-iosched: Charge at least 1 jiffie instead of 1 ns
cfq-iosched: Fix regression in bonnie++ rewrite performance
cfq-iosched: Convert slice_resid from u64 to s64
block: Convert fifo_time from ulong to u64
blktrace: avoid using timespec
block/blk-cgroup.c: Declare local symbols static
block/bio-integrity.c: Add #include "blk.h"
block/partition-generic.c: Remove a set-but-not-used variable
block: bio: kill BIO_MAX_SIZE
cfq-iosched: temporarily boost queue priority for idle classes
block: drbd: avoid to use BIO_MAX_SIZE
block: bio: remove BIO_MAX_SECTORS
...

Linus Torvalds
2016-07-27 06:03:07 +0800
bbce2ad2d Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 ... Browse Code »

Pull crypto updates from Herbert Xu:
"Here is the crypto update for 4.8:

API:
- first part of skcipher low-level conversions
- add KPP (Key-agreement Protocol Primitives) interface.

Algorithms:
- fix IPsec/cryptd reordering issues that affects aesni
- RSA no longer does explicit leading zero removal
- add SHA3
- add DH
- add ECDH
- improve DRBG performance by not doing CTR by hand

Drivers:
- add x86 AVX2 multibuffer SHA256/512
- add POWER8 optimised crc32c
- add xts support to vmx
- add DH support to qat
- add RSA support to caam
- add Layerscape support to caam
- add SEC1 AEAD support to talitos
- improve performance by chaining requests in marvell/cesa
- add support for Araneus Alea I USB RNG
- add support for Broadcom BCM5301 RNG
- add support for Amlogic Meson RNG
- add support Broadcom NSP SoC RNG"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (180 commits)
crypto: vmx - Fix aes_p8_xts_decrypt build failure
crypto: vmx - Ignore generated files
crypto: vmx - Adding support for XTS
crypto: vmx - Adding asm subroutines for XTS
crypto: skcipher - add comment for skcipher_alg->base
crypto: testmgr - Print akcipher algorithm name
crypto: marvell - Fix wrong flag used for GFP in mv_cesa_dma_add_iv_op
crypto: nx - off by one bug in nx_of_update_msc()
crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
crypto: scatterwalk - Inline start/map/done
crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start
crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone
crypto: scatterwalk - Fix test in scatterwalk_done
crypto: api - Optimise away crypto_yield when hard preemption is on
crypto: scatterwalk - add no-copy support to copychunks
crypto: scatterwalk - Remove scatterwalk_bytes_sglen
crypto: omap - Stop using crypto scatterwalk_bytes_sglen
crypto: skcipher - Remove top-level givcipher interface
crypto: user - Remove crypto_lookup_skcipher call
crypto: cts - Convert to skcipher
...

Linus Torvalds
2016-07-27 04:40:17 +0800

26 Jul, 2016

3 commits

55392c4c0 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer updates from Thomas Gleixner:
"This update provides the following changes:

- The rework of the timer wheel which addresses the shortcomings of
the current wheel (cascading, slow search for next expiring timer,
etc). That's the first major change of the wheel in almost 20
years since Finn implemted it.

- A large overhaul of the clocksource drivers init functions to
consolidate the Device Tree initialization

- Some more Y2038 updates

- A capability fix for timerfd

- Yet another clock chip driver

- The usual pile of updates, comment improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
tick/nohz: Optimize nohz idle enter
clockevents: Make clockevents_subsys static
clocksource/drivers/time-armada-370-xp: Fix return value check
timers: Implement optimization for same expiry time in mod_timer()
timers: Split out index calculation
timers: Only wake softirq if necessary
timers: Forward the wheel clock whenever possible
timers/nohz: Remove pointless tick_nohz_kick_tick() function
timers: Optimize collect_expired_timers() for NOHZ
timers: Move __run_timers() function
timers: Remove set_timer_slack() leftovers
timers: Switch to a non-cascading wheel
timers: Reduce the CPU index space to 256k
timers: Give a few structs and members proper names
hlist: Add hlist_is_singular_node() helper
signals: Use hrtimer for sigtimedwait()
timers: Remove the deprecated mod_timer_pinned() API
timers, net/ipv4/inet: Initialize connection request timers as pinned
timers, drivers/tty/mips_ejtag: Initialize the poll timer as pinned
timers, drivers/tty/metag_da: Initialize the poll timer as pinned
...

Linus Torvalds
2016-07-26 11:43:12 +0800
0f657262d Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 mm updates from Ingo Molnar:
"Various x86 low level modifications:

- preparatory work to support virtually mapped kernel stacks (Andy
Lutomirski)

- support for 64-bit __get_user() on 32-bit kernels (Benjamin
LaHaise)

- (involved) workaround for Knights Landing CPU erratum (Dave Hansen)

- MPX enhancements (Dave Hansen)

- mremap() extension to allow remapping of the special VDSO vma, for
purposes of user level context save/restore (Dmitry Safonov)

- hweight and entry code cleanups (Borislav Petkov)

- bitops code generation optimizations and cleanups with modern GCC
(H. Peter Anvin)

- syscall entry code optimizations (Paolo Bonzini)"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
x86/mm/cpa: Add missing comment in populate_pdg()
x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs
x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2
x86/smp: Remove unnecessary initialization of thread_info::cpu
x86/smp: Remove stack_smp_processor_id()
x86/uaccess: Move thread_info::addr_limit to thread_struct
x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err
x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct
x86/dumpstack: When OOPSing, rewind the stack before do_exit()
x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm
x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS
x86/dumpstack: Try harder to get a call trace on stack overflow
x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()
x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated
x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()
x86/mm: Use pte_none() to test for empty PTE
x86/mm: Disallow running with 32-bit PTEs to work around erratum
x86/mm: Ignore A/D bits in pte/pmd/pud_none()
x86/mm: Move swap offset/type up in PTE to work around erratum
x86/entry: Inline enter_from_user_mode()
...

Linus Torvalds
2016-07-26 06:34:18 +0800
c86ad14d3 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking updates from Ingo Molnar:
"The locking tree was busier in this cycle than the usual pattern - a
couple of major projects happened to coincide.

The main changes are:

- implement the atomic_fetch_{add,sub,and,or,xor}() API natively
across all SMP architectures (Peter Zijlstra)

- add atomic_fetch_{inc/dec}() as well, using the generic primitives
(Davidlohr Bueso)

- optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
Waiman Long)

- optimize smp_cond_load_acquire() on arm64 and implement LSE based
atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
on arm64 (Will Deacon)

- introduce smp_acquire__after_ctrl_dep() and fix various barrier
mis-uses and bugs (Peter Zijlstra)

- after discovering ancient spin_unlock_wait() barrier bugs in its
implementation and usage, strengthen its semantics and update/fix
usage sites (Peter Zijlstra)

- optimize mutex_trylock() fastpath (Peter Zijlstra)

- ... misc fixes and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
locking/static_keys: Fix non static symbol Sparse warning
locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
locking/atomic, arch/tile: Fix tilepro build
locking/atomic, arch/m68k: Remove comment
locking/atomic, arch/arc: Fix build
locking/Documentation: Clarify limited control-dependency scope
locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
locking/atomic, arch/mips: Convert to _relaxed atomics
locking/atomic, arch/alpha: Convert to _relaxed atomics
locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
locking/atomic: Fix atomic64_relaxed() bits
locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
...

Linus Torvalds
2016-07-26 03:41:29 +0800

15 Jul, 2016

1 commit

13d4ea097 x86/uaccess: Move thread_info::addr_limit to thread_struct ... Browse Code »

struct thread_info is a legacy mess. To prepare for its partial removal,
move thread_info::addr_limit out.

As an added benefit, this way is simpler.

Signed-off-by: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar

Andy Lutomirski
2016-07-15 16:26:30 +0800

07 Jul, 2016

1 commit

53bf837b7 timers: Remove set_timer_slack() leftovers ... Browse Code »

We now have implicit batching in the timer wheel. The slack API is no longer
used, so remove it.

Signed-off-by: Thomas Gleixner
Cc: Alan Stern
Cc: Andrew F. Davis
Cc: Arjan van de Ven
Cc: Chris Mason
Cc: David S. Miller
Cc: David Woodhouse
Cc: Dmitry Eremin-Solenikov
Cc: Eric Dumazet
Cc: Frederic Weisbecker
Cc: George Spelvin
Cc: Greg Kroah-Hartman
Cc: Jaehoon Chung
Cc: Jens Axboe
Cc: John Stultz
Cc: Josh Triplett
Cc: Len Brown
Cc: Linus Torvalds
Cc: Mathias Nyman
Cc: Pali Rohár
Cc: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Sebastian Reichel
Cc: Ulf Hansson
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.de
Signed-off-by: Ingo Molnar

Thomas Gleixner
2016-07-07 16:35:09 +0800

06 Jul, 2016

1 commit

c1adf2005 Introduce rb_replace_node_rcu() ... Browse Code »

Implement an RCU-safe variant of rb_replace_node() and rearrange
rb_replace_node() to do things in the same order.

Signed-off-by: David Howells
Acked-by: Peter Zijlstra (Intel)

David Howells
2016-07-06 17:51:14 +0800

03 Jul, 2016

1 commit

e192be9d9 random: replace non-blocking pool with a Chacha20-based CRNG ... Browse Code »

The CRNG is faster, and we don't pretend to track entropy usage in the
CRNG any more.

Signed-off-by: Theodore Ts'o

Theodore Ts'o
2016-07-03 12:57:23 +0800

01 Jul, 2016

2 commits

127827b9c lib/mpi: Do not do sg_virt ... Browse Code »

Currently the mpi SG helpers use sg_virt which is completely
broken. It happens to work with normal kernel memory but will
fail with anything that is not linearly mapped.

This patch fixes this by using the SG iterator helpers.

Signed-off-by: Herbert Xu

Herbert Xu
2016-07-01 23:45:18 +0800
9b45b7bba crypto: rsa - Generate fixed-length output ... Browse Code »

Every implementation of RSA that we have naturally generates
output with leading zeroes. The one and only user of RSA,
pkcs1pad wants to have those leading zeroes in place, in fact
because they are currently absent it has to write those zeroes
itself.

So we shouldn't be stripping leading zeroes in the first place.
In fact this patch makes rsa-generic produce output with fixed
length so that pkcs1pad does not need to do any extra work.

This patch also changes DH to use the new interface.

Signed-off-by: Herbert Xu

Herbert Xu
2016-07-01 23:45:18 +0800

16 Jun, 2016

1 commit

28aa2bda2 locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_… ... Browse Code »

…relaxed,_acquire,_release}()

Now that all the architectures have implemented support for these new
atomic primitives add on the generic infrastructure to expose and use
it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Peter Zijlstra
2016-06-16 16:48:32 +0800

15 Jun, 2016

2 commits

4e9a073f6 torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code ... Browse Code »

This commit removes CONFIG_RCU_TORTURE_TEST_RUNNABLE in favor of the
already-existing rcutorture.torture_runnable kernel boot parameter.
It also converts an #ifdef into IS_ENABLED(), saving a few lines of code.

Signed-off-by: Paul E. McKenney

Paul E. McKenney
2016-06-15 07:02:15 +0800
f8cbdee99 torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLE ... Browse Code »

This commit applies the infamous IS_ENABLED() macro to eliminate a #ifdef.
It also eliminates the RCU_PERF_TEST_RUNNABLE Kconfig option in favor
of the already-existing rcuperf.perf_runnable kernel boot parameter.

Signed-off-by: Paul E. McKenney

Paul E. McKenney
2016-06-15 07:02:15 +0800

11 Jun, 2016

1 commit

8387ff257 vfs: make the string hashes salt the hash ... Browse Code »

We always mixed in the parent pointer into the dentry name hash, but we
did it late at lookup time. It turns out that we can simplify that
lookup-time action by salting the hash with the parent pointer early
instead of late.

A few other users of our string hashes also wanted to mix in their own
pointers into the hash, and those are updated to use the same mechanism.

Hash users that don't have any particular initial salt can just use the
NULL pointer as a no-salt.

Cc: Vegard Nossum
Cc: George Spelvin
Cc: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-06-11 11:21:46 +0800

10 Jun, 2016

1 commit

1bdc76aea iov_iter: use bvec iterator to implement iterate_bvec() ... Browse Code »

bvec has one native/mature iterator for long time, so not
necessary to use the reinvented wheel for iterating bvecs
in lib/iov_iter.c.

Two ITER_BVEC test cases are run:
- xfstest(-g auto) on loop dio/aio, no regression found
- swap file works well under extreme stress(stress-ng --all 64 -t
800 -v), and lots of OOMs are triggerd, and the whole
system still survives

Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Tested-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Ming Lei
2016-06-10 00:02:47 +0800

08 Jun, 2016

1 commit

f5967101e x86/hweight: Get rid of the special calling convention ... Browse Code »

People complained about ARCH_HWEIGHT_CFLAGS and how it throws a wrench
into kcov, lto, etc, experimentations.

Add asm versions for __sw_hweight{32,64}() and do explicit saving and
restoring of clobbered registers. This gets rid of the special calling
convention. We get to call those functions on !X86_FEATURE_POPCNT CPUs.

We still need to hardcode POPCNT and register operands as some old gas
versions which we support, do not know about POPCNT.

Btw, remove redundant REX prefix from 32-bit POPCNT because alternatives
can do padding now.

Suggested-by: H. Peter Anvin
Signed-off-by: Borislav Petkov
Acked-by: Peter Zijlstra (Intel)
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1464605787-20603-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar

Borislav Petkov
2016-06-08 21:01:02 +0800

31 May, 2016

10 commits

20b5b7f3c lib/mpi: refactor mpi_read_from_buffer() in terms of mpi_read_raw_data() ... Browse Code »

mpi_read_from_buffer() and mpi_read_raw_data() do basically the same thing
except that the former extracts the number of payload bits from the first
two bytes of the input buffer.

Besides that, the data copying logic is exactly the same.

Replace the open coded buffer to MPI instance conversion by a call to
mpi_read_raw_data().

Signed-off-by: Nicolai Stange
Signed-off-by: Herbert Xu

Nicolai Stange
2016-05-31 16:42:01 +0800
cdf24b42c lib/mpi: mpi_read_from_buffer(): sanitize short buffer printk ... Browse Code »

The first two bytes of the input buffer encode its expected length and
mpi_read_from_buffer() prints a console message if the given buffer is too
short.

However, there are some oddities with how this message is printed:
- It is printed at the default loglevel. This is different from the
one used in the case that the first two bytes' value is unsupportedly
large, i.e. KERN_INFO.
- The format specifier '%d' is used for unsigned ints.
- It prints the values of nread and *ret_nread. This is redundant since
the former is always the latter + 1.

Clean this up as follows:
- Use pr_info() rather than printk() with no loglevel.
- Use the format specifiers '%u' in place if '%d'.
- Do not print the redundant 'nread' but the more helpful 'nbytes' value.

Signed-off-by: Nicolai Stange
Signed-off-by: Herbert Xu

Nicolai Stange
2016-05-31 16:42:00 +0800
7af791e0f lib/mpi: mpi_read_from_buffer(): return -EINVAL upon too short buffer ... Browse Code »

Currently, if the input buffer is shorter than the expected length as
indicated by its first two bytes, an MPI instance of this expected length
will be allocated and filled with as much data as is available. The rest
will remain uninitialized.

Instead of leaving this condition undetected, an error code should be
reported to the caller.

Since this situation indicates that the input buffer's first two bytes,
encoding the number of expected bits, are garbled, -EINVAL is appropriate
here.

If the input buffer is shorter than indicated by its first two bytes,
make mpi_read_from_buffer() return -EINVAL.
Get rid of the 'nread' variable: with the new semantics, the total number
of bytes read from the input buffer is known in advance.

Signed-off-by: Nicolai Stange
Signed-off-by: Herbert Xu

Nicolai Stange
2016-05-31 16:42:00 +0800
c5ce7c697 lib/digsig: digsig_verify_rsa(): return -EINVAL if modulo length is zero ... Browse Code »

Currently, if digsig_verify_rsa() detects that the modulo's length is zero,
i.e. mlen == 0, it returns -ENOMEM which doesn't really fit here.

Make digsig_verify_rsa() return -EINVAL upon mlen == 0.

Signed-off-by: Nicolai Stange
Signed-off-by: Herbert Xu

Nicolai Stange
2016-05-31 16:42:00 +0800
03cdfaad4 lib/mpi: mpi_read_from_buffer(): return error code ... Browse Code »

mpi_read_from_buffer() reads a MPI from a buffer into a newly allocated
MPI instance. It expects the buffer's leading two bytes to contain the
number of bits, followed by the actual payload.

On failure, it returns NULL and updates the in/out argument ret_nread
somewhat inconsistently:
- If the given buffer is too short to contain the leading two bytes
encoding the number of bits or their value is unsupported, then
ret_nread will be cleared.
- If the allocation of the resulting MPI instance fails, ret_nread is left
as is.

The only user of mpi_read_from_buffer(), digsig_verify_rsa(), simply checks
for a return value of NULL and returns -ENOMEM if that happens.

While this is all of cosmetic nature only, there is another error condition
which currently isn't detectable by the caller of mpi_read_from_buffer():
if the given buffer is too small to hold the number of bits as encoded in
its first two bytes, the return value will be non-NULL and *ret_nread > 0.

In preparation of communicating this condition to the caller, let
mpi_read_from_buffer() return error values by means of the ERR_PTR()
mechanism.

Make the sole caller of mpi_read_from_buffer(), digsig_verify_rsa(),
check the return value for IS_ERR() rather than == NULL. If IS_ERR() is
true, return the associated error value rather than the fixed -ENOMEM.

Signed-off-by: Nicolai Stange
Signed-off-by: Herbert Xu

Nicolai Stange
2016-05-31 16:41:59 +0800
eef0df6a5 lib/mpi: mpi_read_raw_data(): fix nbits calculation ... Browse Code »

The number of bits, nbits, is calculated in mpi_read_raw_data() as follows:

nbits = nbytes * 8;

Afterwards, the number of leading zero bits of the first byte get
subtracted:

nbits -= count_leading_zeros(buffer[0]);

However, count_leading_zeros() takes an unsigned long and thus,
the u8 gets promoted to an unsigned long.

Thus, the above doesn't subtract the number of leading zeros in the most
significant nonzero input byte from nbits, but the number of leading
zeros of the most significant nonzero input byte promoted to unsigned long,
i.e. BITS_PER_LONG - 8 too many.

Fix this by subtracting

count_leading_zeros(...) - (BITS_PER_LONG - 8)

from nbits only.

Fixes: e1045992949 ("MPILIB: Provide a function to read raw data into an
MPI")
Signed-off-by: Nicolai Stange
Signed-off-by: Herbert Xu

Nicolai Stange
2016-05-31 16:41:59 +0800
dfd905106 lib/mpi: mpi_read_raw_data(): purge redundant clearing of nbits ... Browse Code »

In mpi_read_raw_data(), unsigned nbits is calculated as follows:

nbits = nbytes * 8;

and redundantly cleared later on if nbytes == 0:

if (nbytes > 0)
...
else
nbits = 0;

Purge this redundant clearing for the sake of clarity.

Signed-off-by: Nicolai Stange
Signed-off-by: Herbert Xu

Nicolai Stange
2016-05-31 16:41:58 +0800
4bdf1cfca lib/mpi: purge mpi_set_buffer() ... Browse Code »

mpi_set_buffer() has no in-tree users and similar functionality is provided
by mpi_read_raw_data().

Remove mpi_set_buffer().

Signed-off-by: Nicolai Stange
Signed-off-by: Herbert Xu

Nicolai Stange
2016-05-31 16:41:58 +0800
bc9dc9d5e lib/uuid.c: use correct offset in uuid parser ... Browse Code »

Use '+ 0' and '+ 1' as offsets, like they were intended, instead of
adding to the result.

Fixes: 2b1b0d66704a ("lib/uuid.c: introduce a few more generic helpers")
Signed-off-by: Bjørn Mork
Signed-off-by: Andy Shevchenko
Signed-off-by: Linus Torvalds

Bjørn Mork
2016-05-31 06:26:57 +0800
cfaff0e51 lib/uuid: add a test module ... Browse Code »

It appears that somehow I missed a test of the latest UUID rework which
landed in the kernel. Present a small test module to avoid such cases
in the future.

Signed-off-by: Andy Shevchenko
Signed-off-by: Linus Torvalds

Andy Shevchenko
2016-05-31 06:26:57 +0800

29 May, 2016

2 commits

7e0fb73c5 Merge branch 'hash' of git://ftp.sciencehorizons.net/linux ... Browse Code »

Pull string hash improvements from George Spelvin:
"This series does several related things:

- Makes the dcache hash (fs/namei.c) useful for general kernel use.

(Thanks to Bruce for noticing the zero-length corner case)

- Converts the string hashes in to use the
above.

- Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
32-bit multiplies will do well enough.

- Rids the world of the bad hash multipliers in hash_32.

This finishes the job started in commit 689de1d6ca95 ("Minimal
fix-up of bad hashing behavior of hash_64()")

The vast majority of Linux architectures have hardware support for
32x32-bit multiply and so derive no benefit from "simplified"
multipliers.

The few processors that do not (68000, h8/300 and some models of
Microblaze) have arch-specific implementations added. Those
patches are last in the series.

- Overhauls the dcache hash mixing.

The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
Replaced with a much more careful design that's simultaneously
faster and better. (My own invention, as there was noting suitable
in the literature I could find. Comments welcome!)

- Modify the hash_name() loop to skip the initial HASH_MIX(). This
would let us salt the hash if we ever wanted to.

- Sort out partial_name_hash().

The hash function is declared as using a long state, even though
it's truncated to 32 bits at the end and the extra internal state
contributes nothing to the result. And some callers do odd things:

- fs/hfs/string.c only allocates 32 bits of state
- fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

- Modify bytemask_from_count to handle inputs of 1..sizeof(long)
rather than 0..sizeof(long)-1. This would simplify users other
than full_name_hash"

Special thanks to Bruce Fields for testing and finding bugs in v1. (I
learned some humbling lessons about "obviously correct" code.)

On the arch-specific front, the m68k assembly has been tested in a
standalone test harness, I've been in contact with the Microblaze
maintainers who mostly don't care, as the hardware multiplier is never
omitted in real-world applications, and I haven't heard anything from
the H8/300 world"

* 'hash' of git://ftp.sciencehorizons.net/linux:
h8300: Add
microblaze: Add
m68k: Add
: Add support for architecture-specific functions
fs/namei.c: Improve dcache hash function
Eliminate bad hash multipliers from hash_32() and hash_64()
Change hash_64() return value to 32 bits
: Define hash_str() in terms of hashlen_string()
fs/namei.c: Add hashlen_string() function
Pull out string hash to

Linus Torvalds
2016-05-29 07:15:25 +0800
468a94285 <linux/hash.h>: Add support for architecture-specific functions ... Browse Code »

This is just the infrastructure; there are no users yet.

This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of .

That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.

Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.

Signed-off-by: George Spelvin
Cc: Geert Uytterhoeven
Cc: Greg Ungerer
Cc: Andreas Schwab
Cc: Philippe De Muyter
Cc: linux-m68k@lists.linux-m68k.org
Cc: Alistair Francis
Cc: Michal Simek
Cc: Yoshinori Sato
Cc: uclinux-h8-devel@lists.sourceforge.jp

George Spelvin
2016-05-29 03:48:31 +0800