Eric Lee / smarc-fsl-linux-kernel

20 May, 2016

11 commits

52b6f46bc mm: /proc/sys/vm/stat_refresh to force vmstat update ... Browse Code »

Provide /proc/sys/vm/stat_refresh to force an immediate update of
per-cpu into global vmstats: useful to avoid a sleep(2) or whatever
before checking counts when testing. Originally added to work around a
bug which left counts stranded indefinitely on a cpu going idle (an
inaccuracy magnified when small below-batch numbers represent "huge"
amounts of memory), but I believe that bug is now fixed: nonetheless,
this is still a useful knob.

Its schedule_on_each_cpu() is probably too expensive just to fold into
reading /proc/meminfo itself: give this mode 0600 to prevent abuse.
Allow a write or a read to do the same: nothing to read, but "grep -h
Shmem /proc/sys/vm/stat_refresh /proc/meminfo" is convenient. Oh, and
since global_page_state() itself is careful to disguise any underflow as
0, hack in an "Invalid argument" and pr_warn() if a counter is negative
after the refresh - this helped to fix a misaccounting of
NR_ISOLATED_FILE in my migration code.

But on recent kernels, I find that NR_ALLOC_BATCH and NR_PAGES_SCANNED
often go negative some of the time. I have not yet worked out why, but
have no evidence that it's actually harmful. Punt for the moment by
just ignoring the anomaly on those.

Signed-off-by: Hugh Dickins
Cc: "Kirill A. Shutemov"
Cc: Andrea Arcangeli
Cc: Andres Lagar-Cavilla
Cc: Yang Shi
Cc: Ning Qu
Cc: Mel Gorman
Cc: Andres Lagar-Cavilla
Cc: Konstantin Khlebnikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2016-05-20 10:12:14 +0800
0edaf86cf include/linux/nodemask.h: create next_node_in() helper ... Browse Code »

Lots of code does

node = next_node(node, XXX);
if (node == MAX_NUMNODES)
node = first_node(XXX);

so create next_node_in() to do this and use it in various places.

[mhocko@suse.com: use next_node_in() helper]
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Signed-off-by: Michal Hocko
Cc: Xishi Qiu
Cc: Joonsoo Kim
Cc: David Rientjes
Cc: Naoya Horiguchi
Cc: Laura Abbott
Cc: Hui Zhu
Cc: Wang Xiaoqiang
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2016-05-20 10:12:14 +0800
0139aa7b7 mm: rename _count, field of the struct page, to _refcount ... Browse Code »

Many developers already know that field for reference count of the
struct page is _count and atomic type. They would try to handle it
directly and this could break the purpose of page reference count
tracepoint. To prevent direct _count modification, this patch rename it
to _refcount and add warning message on the code. After that, developer
who need to handle reference count will find that field should not be
accessed directly.

[akpm@linux-foundation.org: fix comments, per Vlastimil]
[akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
[sfr@canb.auug.org.au: sync ethernet driver changes]
Signed-off-by: Joonsoo Kim
Signed-off-by: Stephen Rothwell
Cc: Vlastimil Babka
Cc: Hugh Dickins
Cc: Johannes Berg
Cc: "David S. Miller"
Cc: Sunil Goutham
Cc: Chris Metcalf
Cc: Manish Chopra
Cc: Yuval Mintz
Cc: Tariq Toukan
Cc: Saeed Mahameed
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joonsoo Kim
2016-05-20 10:12:14 +0800
19d795b67 kernel/padata.c: hide unused functions ... Browse Code »

A recent cleanup removed some exported functions that were not used
anywhere, which in turn exposed the fact that some other functions in
the same file are only used in some configurations.

We now get a warning about them when CONFIG_HOTPLUG_CPU is disabled:

kernel/padata.c:670:12: error: '__padata_remove_cpu' defined but not used [-Werror=unused-function]
static int __padata_remove_cpu(struct padata_instance *pinst, int cpu)
^~~~~~~~~~~~~~~~~~~
kernel/padata.c:650:12: error: '__padata_add_cpu' defined but not used [-Werror=unused-function]
static int __padata_add_cpu(struct padata_instance *pinst, int cpu)

This rearranges the code so the __padata_remove_cpu/__padata_add_cpu
functions are within the #ifdef that protects the code that calls them.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 4ba6d78c671e ("kernel/padata.c: removed unused code")
Signed-off-by: Arnd Bergmann
Cc: Richard Cochran
Cc: Steffen Klassert
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arnd Bergmann
2016-05-20 10:12:14 +0800
815613da6 kernel/padata.c: removed unused code ... Browse Code »

By accident I stumbled across code that has never been used. This
driver has EXPORT_SYMBOL functions, and the only user of the code is
pcrypt.c, but this only uses a subset of the exported symbols.

According to 'git log -G', the functions, padata_set_cpumasks,
padata_add_cpu, and padata_remove_cpu have never been used since they
were first introduced. This patch removes the unused code.

On one 64 bit build, with CRYPTO_PCRYPT built in, the text is more than
4k smaller.

kbuild_hp> size $KBUILD_OUTPUT/vmlinux
text data bss dec hex filename
10566658 4678360 1122304 16367322 f9beda vmlinux
10561984 4678360 1122304 16362648 f9ac98 vmlinux

On another config, 32 bit, the saving is about 0.5k bytes.

kbuild_hp-x86> size $KBUILD_OUTPUT/vmlinux
6012005 2409513 2785280 11206798 ab008e vmlinux
6011491 2409513 2785280 11206284 aafe8c vmlinux

Signed-off-by: Richard Cochran
Cc: Steffen Klassert
Cc: Herbert Xu
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Richard Cochran
2016-05-20 10:12:14 +0800
b9fdac7f6 debugobjects: insulate non-fixup logic related to static obj from fixup callbacks ... Browse Code »

When activating a static object we need make sure that the object is
tracked in the object tracker. If it is a non-static object then the
activation is illegal.

In previous implementation, each subsystem need take care of this in
their fixup callbacks. Actually we can put it into debugobjects core.
Thus we can save duplicated code, and have *pure* fixup callbacks.

To achieve this, a new callback "is_static_object" is introduced to let
the type specific code decide whether a object is static or not. If
yes, we take it into object tracker, otherwise give warning and invoke
fixup callback.

This change has paassed debugobjects selftest, and I also do some test
with all debugobjects supports enabled.

At last, I have a concern about the fixups that can it change the object
which is in incorrect state on fixup? Because the 'addr' may not point
to any valid object if a non-static object is not tracked. Then Change
such object can overwrite someone's memory and cause unexpected
behaviour. For example, the timer_fixup_activate bind timer to function
stub_timer.

Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
[changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
Signed-off-by: Du, Changbin
Cc: Jonathan Corbet
Cc: Josh Triplett
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Tejun Heo
Cc: Christian Borntraeger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Du, Changbin
2016-05-20 10:12:14 +0800
3263d28eb rcu: update debugobjects fixup callbacks return type ... Browse Code »

Update the return type to use bool instead of int, corresponding to
cheange (debugobjects: make fixup functions return bool instead of int).

Signed-off-by: Du, Changbin
Cc: Jonathan Corbet
Cc: Josh Triplett
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Tejun Heo
Cc: Christian Borntraeger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Du, Changbin
2016-05-20 10:12:14 +0800
e3252464d timer: update debugobjects fixup callbacks return type ... Browse Code »

Update the return type to use bool instead of int, corresponding to
cheange (debugobjects: make fixup functions return bool instead of int).

Signed-off-by: Du, Changbin
Cc: Jonathan Corbet
Cc: Josh Triplett
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Tejun Heo
Cc: Christian Borntraeger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Du, Changbin
2016-05-20 10:12:14 +0800
02a982a6e workqueue: update debugobjects fixup callbacks return type ... Browse Code »

Update the return type to use bool instead of int, corresponding to
change (debugobjects: make fixup functions return bool instead of int)

Signed-off-by: Du, Changbin
Cc: Jonathan Corbet
Cc: Josh Triplett
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Tejun Heo
Cc: Christian Borntraeger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Du, Changbin
2016-05-20 10:12:14 +0800
8e4f70e21 time: remove timespec_add_safe() ... Browse Code »

All references to timespec_add_safe() now use timespec64_add_safe().

The plan is to replace struct timespec references with struct timespec64
throughout the kernel as timespec is not y2038 safe.

Drop timespec_add_safe() and use timespec64_add_safe() for all
architectures.

Link: http://lkml.kernel.org/r/1461947989-21926-4-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani
Acked-by: John Stultz
Cc: Thomas Gleixner
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Deepa Dinamani
2016-05-20 10:12:14 +0800
bc2c53e5f time: add missing implementation for timespec64_add_safe() ... Browse Code »

timespec64_add_safe() has been defined in time64.h for 64 bit systems.
But, 32 bit systems only have an extern function prototype defined.
Provide a definition for the above function.

The function will be necessary as part of y2038 changes. struct
timespec is not y2038 safe. All references to timespec will be replaced
by struct timespec64. The function is meant to be a replacement for
timespec_add_safe().

The implementation is similar to timespec_add_safe().

Link: http://lkml.kernel.org/r/1461947989-21926-2-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani
Acked-by: John Stultz
Cc: Thomas Gleixner
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Deepa Dinamani
2016-05-20 10:12:14 +0800

19 May, 2016

3 commits

2600a46ee Merge tag 'trace-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing updates from Steven Rostedt:
"This includes two new updates for the ftrace infrastructure.

- With the changing of the code for filtering events by pid, from a
list of pids to a bitmask, we can now easily implement following
forks. With a new tracing option "event-fork" which, when set,
will have tasks with pids in set_event_pid, when they fork, to have
their child pids added to set_event_pid and the child will be
traced as well.

Note, if "event-fork" is set and a task with its pid in
set_event_pid exits, its pid will be removed from set_event_pid

- The addition of Tom Zanussi's hist triggers. This includes a very
thorough documentatino on how to use the hist triggers with events.
This introduces a quick and easy way to get histogram data from
events and their fields.

Some other cleanups and updates were added as well. Like Masami
Hiramatsu added test cases for the event trigger and hist triggers.
Also I added a speed up of filtering by using a temp buffer when
filters are set"

* tag 'trace-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
tracing: Use temp buffer when filtering events
tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic
tracing: Remove unused function trace_current_buffer_lock_reserve()
tracing: Remove one use of trace_current_buffer_lock_reserve()
tracing: Have trace_buffer_unlock_commit() call the _regs version with NULL
tracing: Remove unused function trace_current_buffer_discard_commit()
tracing: Move trace_buffer_unlock_commit{_regs}() to local header
tracing: Fold filter_check_discard() into its only user
tracing: Make filter_check_discard() local
tracing: Move event_trigger_unlock_commit{_regs}() to local header
tracing: Don't use the address of the buffer array name in copy_from_user
tracing: Handle tracing_map_alloc_elts() error path correctly
tracing: Add check for NULL event field when creating hist field
tracing: checking for NULL instead of IS_ERR()
tracing: Do not inherit event-fork option for instances
tracing: Fix unsigned comparison to zero in hist trigger code
kselftests/ftrace: Add a test for log2 modifier of hist trigger
tracing: Add hist trigger 'log2' modifier
kselftests/ftrace: Add hist trigger testcases
kselftests/ftrace : Add event trigger testcases
...

Linus Torvalds
2016-05-19 09:55:19 +0800
03e1aa1cb Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit ... Browse Code »

Pull audit updates from Paul Moore:
"Four small audit patches for 4.7.

Two are simple cleanups around the audit thread management code, one
adds a tty field to AUDIT_LOGIN events, and the final patch makes
tty_name() usable regardless of CONFIG_TTY.

Nothing controversial, and it all passes our regression test"

* 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit:
tty: provide tty_name() even without CONFIG_TTY
audit: add tty field to LOGIN event
audit: we don't need to __set_current_state(TASK_RUNNING)
audit: cleanup prune_tree_thread

Linus Torvalds
2016-05-19 09:46:55 +0800
9e17632c0 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc vfs cleanups from Al Viro:
"Assorted cleanups and fixes all over the place"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coredump: only charge written data against RLIMIT_CORE
coredump: get rid of coredump_params->written
ecryptfs_lookup(): try either only encrypted or plaintext name
ecryptfs: avoid multiple aliases for directories
bpf: reject invalid names right in ->lookup()
__d_alloc(): treat NULL name as QSTR("/", 1)
mtd: switch ubi_open_volume_path() to vfs_stat()
mtd: switch open_mtd_by_chdev() to use of vfs_stat()

Linus Torvalds
2016-05-19 02:51:59 +0800

18 May, 2016

8 commits

1eccc6e15 Merge tag 'gpio-v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio ... Browse Code »

Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for kernel cycle v4.7:

Core infrastructural changes:

- Support for natively single-ended GPIO driver stages.

This means that if the hardware has registers to configure open
drain or open source configuration, we use that rather than (as we
did before) try to emulate it by switching the line to an input to
get high impedance.

This is also documented throughly in Documentation/gpio/driver.txt
for those of you who did not understand one word of what I just
wrote.

- Start to do away with the unnecessarily complex and unitelligible
ARCH_REQUIRE_GPIOLIB and ARCH_WANT_OPTIONAL_GPIOLIB, another
evolutional artifact from the time when the GPIO subsystem was
unmaintained.

Archs can now just select GPIOLIB and be done with it, cleanups to
arches will trickle in for the next kernel. Some minor archs ACKed
the changes immediately so these are included in this pull request.

- Advancing the use of the data pointer inside the GPIO device for
storing driver data by switching the PowerPC, Super-H Unicore and
a few other subarches or subsystem drivers in ALSA SoC, Input,
serial, SSB, staging etc to use it.

- The initialization now reads the input/output state of the GPIO
lines, so that each GPIO descriptor knows - if this callback is
implemented - whether the line is input or output. This also
reflects nicely in userspace "lsgpio".

- It is now possible to name GPIO producer names, line names, from
the device tree. (Platform data has been supported for a while).
I bet we will get a similar mechanism for ACPI one of those days.
This makes is possible to get sensible producer names for e.g.
GPIO rails in "lsgpio" in userspace.

New drivers:

- New driver for the Loongson1.

- The XLP driver now supports Broadcom Vulcan ARM64.

- The IT87 driver now supports IT8620 and IT8628.

- The PCA953X driver now supports Galileo Gen2.

Driver improvements:

- MCP23S08 was switched to use the gpiolib irqchip helpers and now
also suppors level-triggered interrupts.

- 74x164 and RCAR now supports the .set_multiple() callback

- AMDPT was converted to use generic GPIO.

- TC3589x, TPS65218, SX150X, F7188X, MENZ127, VX855, WM831X, WM8994
support the new single ended callback for open drain and in some
cases open source.

- Implement the .get_direction() callback for a few more drivers like
PL061, Xgene.

Cleanups:

- Paul Gortmaker combed through the drivers and de-modularized those
who are not really modules.

- Move the GPIO poweroff DT bindings to the power subdir where they
belong.

- Rename gpio-generic.c to gpio-mmio.c, which is much more to the
point. That's what it is handling, nothing more, nothing less"

* tag 'gpio-v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (126 commits)
MIPS: do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB
gpio: zevio: make it explicitly non-modular
gpio: timberdale: make it explicitly non-modular
gpio: stmpe: make it explicitly non-modular
gpio: sodaville: make it explicitly non-modular
pinctrl: sh-pfc: Let gpio_chip.to_irq() return zero on error
gpio: dwapb: Add ACPI device ID for DWAPB GPIO controller on X-Gene platforms
gpio: dt-bindings: add wd,mbl-gpio bindings
gpio: of: make it possible to name GPIO lines
gpio: make gpiod_to_irq() return negative for NO_IRQ
gpio: xgene: implement .get_direction()
gpio: xgene: Enable ACPI support for X-Gene GFC GPIO driver
gpio: tegra: Implement gpio_get_direction callback
gpio: set up initial state from .get_direction()
gpio: rename gpio-generic.c into gpio-mmio.c
gpio: generic: fix GPIO_GENERIC_PLATFORM is set to module case
gpio: dwapb: add gpio-signaled acpi event support
gpio: dwapb: convert device node to fwnode
gpio: dwapb: remove name from dwapb_port_property
gpio/qoriq: select IRQ_DOMAIN
...

Linus Torvalds
2016-05-18 08:39:42 +0800
0b86c75db Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching ... Browse Code »

Pull livepatching updates from Jiri Kosina:

- remove of our own implementation of architecture-specific relocation
code and leveraging existing code in the module loader to perform
arch-dependent work, from Jessica Yu.

The relevant patches have been acked by Rusty (for module.c) and
Heiko (for s390).

- live patching support for ppc64le, which is a joint work of Michael
Ellerman and Torsten Duwe. This is coming from topic branch that is
share between livepatching.git and ppc tree.

- addition of livepatching documentation from Petr Mladek

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: make object/func-walking helpers more robust
livepatch: Add some basic livepatch documentation
powerpc/livepatch: Add live patching support on ppc64le
powerpc/livepatch: Add livepatch stack to struct thread_info
powerpc/livepatch: Add livepatch header
livepatch: Allow architectures to specify an alternate ftrace location
ftrace: Make ftrace_location_range() global
livepatch: robustify klp_register_patch() API error checking
Documentation: livepatch: outline Elf format and requirements for patch modules
livepatch: reuse module loader code to write relocations
module: s390: keep mod_arch_specific for livepatch modules
module: preserve Elf information for livepatch modules
Elf: add livepatch-specific Elf constants

Linus Torvalds
2016-05-18 08:11:27 +0800
a7fd20d1c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:
"Highlights:

1) Support SPI based w5100 devices, from Akinobu Mita.

2) Partial Segmentation Offload, from Alexander Duyck.

3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.

4) Allow cls_flower stats offload, from Amir Vadai.

5) Implement bpf blinding, from Daniel Borkmann.

6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
actually using FASYNC these atomics are superfluous. From Eric
Dumazet.

7) Run TCP more preemptibly, also from Eric Dumazet.

8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
driver, from Gal Pressman.

9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.

10) Improve BPF usage documentation, from Jesper Dangaard Brouer.

11) Support tunneling offloads in qed, from Manish Chopra.

12) aRFS offloading in mlx5e, from Maor Gottlieb.

13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
Leitner.

14) Add MSG_EOR support to TCP, this allows controlling packet
coalescing on application record boundaries for more accurate
socket timestamp sampling. From Martin KaFai Lau.

15) Fix alignment of 64-bit netlink attributes across the board, from
Nicolas Dichtel.

16) Per-vlan stats in bridging, from Nikolay Aleksandrov.

17) Several conversions of drivers to ethtool ksettings, from Philippe
Reynes.

18) Checksum neutral ILA in ipv6, from Tom Herbert.

19) Factorize all of the various marvell dsa drivers into one, from
Vivien Didelot

20) Add VF support to qed driver, from Yuval Mintz"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
Revert "phy dp83867: Make rgmii parameters optional"
r8169: default to 64-bit DMA on recent PCIe chips
phy dp83867: Make rgmii parameters optional
phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
bpf: arm64: remove callee-save registers use for tmp registers
asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
switchdev: pass pointer to fib_info instead of copy
net_sched: close another race condition in tcf_mirred_release()
tipc: fix nametable publication field in nl compat
drivers: net: Don't print unpopulated net_device name
qed: add support for dcbx.
ravb: Add missing free_irq() calls to ravb_close()
qed: Remove a stray tab
net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fec-mpc52xx: use phydev from struct net_device
bpf, doc: fix typo on bpf_asm descriptions
stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fs-enet: use phydev from struct net_device
...

Linus Torvalds
2016-05-18 07:26:30 +0800
a4d1dbed0 Merge branch 'for-4.7/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull core block layer updates from Jens Axboe:
"This is the core block IO changes for this merge window. Nothing
earth shattering in here, it's mostly just fixes. In detail:

- Fix for a long standing issue where wrong ordering in blk-mq caused
order_to_size() to spew a warning. From Bart.

- Async discard support from Christoph. Basically just splitting our
sync interface into a submit + wait part.

- Add a cleaner interface for flagging whether a device has a write
back cache or not. We've previously overloaded blk_queue_flush()
with this, but let's make it more explicit. Drivers cleaned up and
updated in the drivers pull request. From me.

- Fix for a double check for whether IO accounting is enabled or not.
From Michael Callahan.

- Fix for the async discard from Mike Snitzer, reinstating the early
EOPNOTSUPP return if the device doesn't support discards.

- Also from Mike, export bio_inc_remaining() so dm can drop it's
private copy of it.

- From Ming Lin, add support for passing in an offset for request
payloads.

- Tag function export from Sagi, which will be used in NVMe in the
drivers pull.

- Two blktrace related fixes from Shaohua.

- Propagate NOMERGE flag when making a request from a bio, also from
Shaohua.

- An optimization to not parse cgroup paths in blk-throttle, if we
don't need to. From Shaohua"

* 'for-4.7/core' of git://git.kernel.dk/linux-block:
blk-mq: fix undefined behaviour in order_to_size()
blk-throttle: don't parse cgroup path if trace isn't enabled
blktrace: add missed mask name
blktrace: delete garbage for message trace
block: make bio_inc_remaining() interface accessible again
block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard
block: Minor blk_account_io_start usage cleanup
block: add __blkdev_issue_discard
block: remove struct bio_batch
block: copy NOMERGE flag from bio to request
block: add ability to flag write back caching on a device
blk-mq: Export tagset iter function
block: add offset in blk_add_request_payload()
writeback: Fix performance regression in wb_over_bg_thresh()

Linus Torvalds
2016-05-18 06:29:49 +0800
7f427d3a6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull parallel filesystem directory handling update from Al Viro.

This is the main parallel directory work by Al that makes the vfs layer
able to do lookup and readdir in parallel within a single directory.
That's a big change, since this used to be all protected by the
directory inode mutex.

The inode mutex is replaced by an rwsem, and serialization of lookups of
a single name is done by a "in-progress" dentry marker.

The series begins with xattr cleanups, and then ends with switching
filesystems over to actually doing the readdir in parallel (switching to
the "iterate_shared()" that only takes the read lock).

A more detailed explanation of the process from Al Viro:
"The xattr work starts with some acl fixes, then switches ->getxattr to
passing inode and dentry separately. This is the point where the
things start to get tricky - that got merged into the very beginning
of the -rc3-based #work.lookups, to allow untangling the
security_d_instantiate() mess. The xattr work itself proceeds to
switch a lot of filesystems to generic_...xattr(); no complications
there.

After that initial xattr work, the series then does the following:

- untangle security_d_instantiate()

- convert a bunch of open-coded lookup_one_len_unlocked() to calls of
that thing; one such place (in overlayfs) actually yields a trivial
conflict with overlayfs fixes later in the cycle - overlayfs ended
up switching to a variant of lookup_one_len_unlocked() sans the
permission checks. I would've dropped that commit (it gets
overridden on merge from #ovl-fixes in #for-next; proper resolution
is to use the variant in mainline fs/overlayfs/super.c), but I
didn't want to rebase the damn thing - it was fairly late in the
cycle...

- some filesystems had managed to depend on lookup/lookup exclusion
for *fs-internal* data structures in a way that would break if we
relaxed the VFS exclusion. Fixing hadn't been hard, fortunately.

- core of that series - parallel lookup machinery, replacing
->i_mutex with rwsem, making lookup_slow() take it only shared. At
that point lookups happen in parallel; lookups on the same name
wait for the in-progress one to be done with that dentry.

Surprisingly little code, at that - almost all of it is in
fs/dcache.c, with fs/namei.c changes limited to lookup_slow() -
making it use the new primitive and actually switching to locking
shared.

- parallel readdir stuff - first of all, we provide the exclusion on
per-struct file basis, same as we do for read() vs lseek() for
regular files. That takes care of most of the needed exclusion in
readdir/readdir; however, these guys are trickier than lookups, so
I went for switching them one-by-one. To do that, a new method
'->iterate_shared()' is added and filesystems are switched to it
as they are either confirmed to be OK with shared lock on directory
or fixed to be OK with that. I hope to kill the original method
come next cycle (almost all in-tree filesystems are switched
already), but it's still not quite finished.

- several filesystems get switched to parallel readdir. The
interesting part here is dealing with dcache preseeding by readdir;
that needs minor adjustment to be safe with directory locked only
shared.

Most of the filesystems doing that got switched to in those
commits. Important exception: NFS. Turns out that NFS folks, with
their, er, insistence on VFS getting the fuck out of the way of the
Smart Filesystem Code That Knows How And What To Lock(tm) have
grown the locking of their own. They had their own homegrown
rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink
is the reader there). Of course, with VFS getting the fuck out of
the way, as requested, the actual smarts of the smart filesystem
code etc. had become exposed...

- do_last/lookup_open/atomic_open cleanups. As the result, open()
without O_CREAT locks the directory only shared. Including the
->atomic_open() case. Backmerge from #for-linus in the middle of
that - atomic_open() fix got brought in.

- then comes NFS switch to saner (VFS-based ;-) locking, killing the
homegrown "lookup and readdir are writers" kinda-sorta rwsem. All
exclusion for sillyunlink/lookup is done by the parallel lookups
mechanism. Exclusion between sillyunlink and rmdir is a real rwsem
now - rmdir being the writer.

Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel
now.

- the rest of the series consists of switching a lot of filesystems
to parallel readdir; in a lot of cases ->llseek() gets simplified
as well. One backmerge in there (again, #for-linus - rockridge
fix)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits)
ext4: switch to ->iterate_shared()
hfs: switch to ->iterate_shared()
hfsplus: switch to ->iterate_shared()
hostfs: switch to ->iterate_shared()
hpfs: switch to ->iterate_shared()
hpfs: handle allocation failures in hpfs_add_pos()
gfs2: switch to ->iterate_shared()
f2fs: switch to ->iterate_shared()
afs: switch to ->iterate_shared()
befs: switch to ->iterate_shared()
befs: constify stuff a bit
isofs: switch to ->iterate_shared()
get_acorn_filename(): deobfuscate a bit
btrfs: switch to ->iterate_shared()
logfs: no need to lock directory in lseek
switch ecryptfs to ->iterate_shared
9p: switch to ->iterate_shared()
fat: switch to ->iterate_shared()
romfs, squashfs: switch to ->iterate_shared()
more trivial ->iterate_shared conversions
...

Linus Torvalds
2016-05-18 02:01:31 +0800
ede40902c Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull irq updates from Thomas Gleixner:
"This update delivers:

- Yet another interrupt chip diver (LPC32xx)
- Core functions to handle partitioned per-cpu interrupts
- Enhancements to the IPI core
- Proper handling of irq type configuration
- A large set of ARM GIC enhancements
- The usual pile of small fixes, cleanups and enhancements"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
irqchip/bcm2836: Use a more generic memory barrier call
irqchip/bcm2836: Fix compiler warning on 64-bit build
irqchip/bcm2836: Drop smp_set_ops on arm64 builds
irqchip/gic: Add helper functions for GIC setup and teardown
irqchip/gic: Store GIC configuration parameters
irqchip/gic: Pass GIC pointer to save/restore functions
irqchip/gic: Return an error if GIC initialisation fails
irqchip/gic: Remove static irq_chip definition for eoimode1
irqchip/gic: Don't initialise chip if mapping IO space fails
irqchip/gic: WARN if setting the interrupt type for a PPI fails
irqchip/gic: Don't unnecessarily write the IRQ configuration
irqchip: Mask the non-type/sense bits when translating an IRQ
genirq: Ensure IRQ descriptor is valid when setting-up the IRQ
irqchip/gic-v3: Configure all interrupts as non-secure Group-1
irqchip/gic-v2m: Add workaround for Broadcom NS2 GICv2m erratum
irqchip/irq-alpine-msi: Don't use
irqchip/mbigen: Checking for IS_ERR() instead of NULL
irqchip/gic-v3: Remove inexistant register definition
irqchip/gicv3-its: Don't allow devices whose ID is outside range
irqchip: Add LPC32xx interrupt controller driver
...

Linus Torvalds
2016-05-18 01:27:29 +0800
91e8d0cbc Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer updates from Thomas Gleixner:
"A rather small set of patches from the timer departement:

- Some more y2038 work
- Yet another new clocksource driver
- The usual set of small fixes, cleanups and enhancements"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/tegra: Remove unused suspend/resume code
clockevents/driversi/mps2: add MPS2 Timer driver
dt-bindings: document the MPS2 timer bindings
clocksource/drivers/mtk_timer: Add __init attribute
clockevents/drivers/dw_apb_timer: Implement ->set_state_oneshot_stopped()
time: Introduce do_sys_settimeofday64()
security: Introduce security_settime64()
clocksource: Add missing include of of.h.

Linus Torvalds
2016-05-18 00:49:28 +0800
2fe2edf85 Merge tag 'trace-fixes-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/gi… ... Browse Code »

…t/rostedt/linux-trace

Pull tracing ring-buffer fixes from Steven Rostedt:
"Hao Qin reported an integer overflow possibility with signed and
unsigned numbers in the ring-buffer code.

https://bugzilla.kernel.org/show_bug.cgi?id=118001

At first I did not think this was too much of an issue, because the
overflow would be caught later when either too much data was allocated
or it would trigger RB_WARN_ON() which shuts down the ring buffer.

But looking closer into it, I found that the right settings could
bypass the checks and crash the kernel. Luckily, this is only
accessible by root.

The first fix is to convert all the variables into long, such that we
don't get into issues between 32 bit variables being assigned 64 bit
ones. This fixes the RB_WARN_ON() triggering.

The next fix is to get rid of a duplicate DIV_ROUND_UP() that when
called twice with the right value, can cause a kernel crash.

The first DIV_ROUND_UP() is to normalize the input and it is checked
against the minimum allowable value. But then DIV_ROUND_UP() is
called again, which can overflow due to the (a + b - 1)/b, logic. The
first called upped the value, the second can overflow (with the +b
part).

The second call to DIV_ROUND_UP() came in via a second change a while
ago and the code is cleaned up to remove it"

* tag 'trace-fixes-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Prevent overflow of size in ring_buffer_resize()
ring-buffer: Use long for nr_pages to avoid overflow failures

Linus Torvalds
2016-05-18 00:42:58 +0800

17 May, 2016

14 commits

be69f70e6 Merge branches 'for-4.7/core', 'for-4.7/livepatching-doc' and 'for-4.7/livepatch… ... Browse Code »

…ing-ppc64' into for-linus

Jiri Kosina
2016-05-17 18:06:35 +0800
0e0162bb8 Merge branch 'ovl-fixes' into for-linus ... Browse Code »

Backmerge to resolve a conflict in ovl_lookup_real();
"ovl_lookup_real(): use lookup_one_len_unlocked()" instead,
but it was too late in the cycle to rebase.

Al Viro
2016-05-17 14:17:59 +0800
d57d39431 Merge tag 'pm-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management updates from Rafael Wysocki:
"The majority of changes go into the cpufreq subsystem this time.

To me, quite obviously, the biggest ticket item is the new "schedutil"
governor. Interestingly enough, it's the first new cpufreq governor
since the beginning of the git era (except for some out-of-the-tree
ones).

There are two main differences between it and the existing governors.
First, it uses the information provided by the scheduler directly for
making its decisions, so it doesn't have to track anything by itself.
Second, it can invoke drivers (supporting that feature) to adjust CPU
performance right away without having to spawn work items to be
executed in process context or similar. Currently, the acpi-cpufreq
driver is the only one supporting that mode of operation, but then it
is used on a large number of systems.

The "schedutil" governor as included here is very simple and mostly
regarded as a foundation for future work on the integration of the
scheduler with CPU power management (in fact, there is work in
progress on top of it already). Nevertheless it works and the
preliminary results obtained with it are encouraging.

There also is some consolidation of CPU frequency management for ARM
platforms that can add their machine IDs the the new stub dt-platdev
driver now and that will take care of creating the requisite platform
device for cpufreq-dt, so it is not necessary to do that in platform
code any more. Several ARM platforms are switched over to using this
generic mechanism.

In addition to that, the intel_pstate driver is now going to respect
CPU frequency limits set by the platform firmware (or a BMC) and
provided via the ACPI _PPC object.

The devfreq subsystem is getting a new "passive" governor for SoCs
subsystems that will depend on somebody else to manage their voltage
rails and its support for Samsung Exynos SoCs is consolidated.

The rest is support for new hardware (Intel Broxton support in
intel_idle for one example), bug fixes, optimizations and cleanups in
a number of places.

Specifics:

- New cpufreq "schedutil" governor (making decisions based on CPU
utilization information provided by the scheduler and capable of
switching CPU frequencies right away if the underlying driver
supports that) and support for fast frequency switching in the
acpi-cpufreq driver (Rafael Wysocki)

- Consolidation of CPU frequency management on ARM platforms allowing
them to get rid of some platform-specific boilerplate code if they
are going to use the cpufreq-dt driver (Viresh Kumar, Finley Xiao,
Marc Gonzalez)

- Support for ACPI _PPC and CPU frequency limits in the intel_pstate
driver (Srinivas Pandruvada)

- Fixes and cleanups in the cpufreq core and generic governor code
(Rafael Wysocki, Sai Gurrappadi)

- intel_pstate driver optimizations and cleanups (Rafael Wysocki,
Philippe Longepe, Chen Yu, Joe Perches)

- cpufreq powernv driver fixes and cleanups (Akshay Adiga, Shilpasri
Bhat)

- cpufreq qoriq driver fixes and cleanups (Jia Hongtao)

- ACPI cpufreq driver cleanups (Viresh Kumar)

- Assorted cpufreq driver updates (Ashwin Chaugule, Geliang Tang,
Javier Martinez Canillas, Paul Gortmaker, Sudeep Holla)

- Assorted cpufreq fixes and cleanups (Joe Perches, Arnd Bergmann)

- Fixes and cleanups in the OPP (Operating Performance Points)
framework, mostly related to OPP sharing, and reorganization of
OF-dependent code in it (Viresh Kumar, Arnd Bergmann, Sudeep Holla)

- New "passive" governor for devfreq (for SoC subsystems that will
rely on someone else for the management of their power resources)
and consolidation of devfreq support for Exynos platforms, coding
style and typo fixes for devfreq (Chanwoo Choi, MyungJoo Ham)

- PM core fixes and cleanups, mostly to make it work better with the
generic power domains (genpd) framework, and updates for that
framework (Ulf Hansson, Thierry Reding, Colin Ian King)

- Intel Broxton support for the intel_idle driver (Len Brown)

- cpuidle core optimization and fix (Daniel Lezcano, Dave Gerlach)

- ARM cpuidle cleanups (Jisheng Zhang)

- Intel Kabylake support for the RAPL power capping driver (Jacob
Pan)

- AVS (Adaptive Voltage Switching) rockchip-io driver update (Heiko
Stuebner)

- Updates for the cpupower tool (Arjun Sreedharan, Colin Ian King,
Mattia Dongili, Thomas Renninger)"

* tag 'pm-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (112 commits)
intel_pstate: Clean up get_target_pstate_use_performance()
intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
intel_pstate: Clarify average performance computation
intel_pstate: Avoid unnecessary synchronize_sched() during initialization
cpufreq: schedutil: Make default depend on CONFIG_SMP
cpufreq: powernv: del_timer_sync when global and local pstate are equal
cpufreq: powernv: Move smp_call_function_any() out of irq safe block
intel_pstate: Clean up intel_pstate_get()
cpufreq: schedutil: Make it depend on CONFIG_SMP
cpufreq: governor: Fix handling of special cases in dbs_update()
PM / OPP: Move CONFIG_OF dependent code in a separate file
cpufreq: intel_pstate: Ignore _PPC processing under HWP
cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table
PM / OPP: add non-OF versions of dev_pm_opp_{cpumask_, }remove_table
cpufreq: tango: Use generic platdev driver
PM / OPP: pass cpumask by reference
cpufreq: Fix GOV_LIMITS handling for the userspace governor
cpupower: fix potential memory leak
PM / devfreq: style/typo fixes
PM / devfreq: exynos: Add the detailed correlation for Exynos5422 bus
..

Linus Torvalds
2016-05-17 10:17:22 +0800
be092017b Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux ... Browse Code »

Pull arm64 updates from Will Deacon:

- virt_to_page/page_address optimisations

- support for NUMA systems described using device-tree

- support for hibernate/suspend-to-disk

- proper support for maxcpus= command line parameter

- detection and graceful handling of AArch64-only CPUs

- miscellaneous cleanups and non-critical fixes

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
arm64: do not enforce strict 16 byte alignment to stack pointer
arm64: kernel: Fix incorrect brk randomization
arm64: cpuinfo: Missing NULL terminator in compat_hwcap_str
arm64: secondary_start_kernel: Remove unnecessary barrier
arm64: Ensure pmd_present() returns false after pmd_mknotpresent()
arm64: Replace hard-coded values in the pmd/pud_bad() macros
arm64: Implement pmdp_set_access_flags() for hardware AF/DBM
arm64: Fix typo in the pmdp_huge_get_and_clear() definition
arm64: mm: remove unnecessary EXPORT_SYMBOL_GPL
arm64: always use STRICT_MM_TYPECHECKS
arm64: kvm: Fix kvm teardown for systems using the extended idmap
arm64: kaslr: increase randomization granularity
arm64: kconfig: drop CONFIG_RTC_LIB dependency
arm64: make ARCH_SUPPORTS_DEBUG_PAGEALLOC depend on !HIBERNATION
arm64: hibernate: Refuse to hibernate if the boot cpu is offline
arm64: kernel: Add support for hibernate/suspend-to-disk
PM / Hibernate: Call flush_icache_range() on pages restored in-place
arm64: Add new asm macro copy_page
arm64: Promote KERNEL_START/KERNEL_END definitions to a header file
arm64: kernel: Include _AC definition in page.h
...

Linus Torvalds
2016-05-17 08:17:24 +0800
825a3b260 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler updates from Ingo Molnar:

- massive CPU hotplug rework (Thomas Gleixner)

- improve migration fairness (Peter Zijlstra)

- CPU load calculation updates/cleanups (Yuyang Du)

- cpufreq updates (Steve Muckle)

- nohz optimizations (Frederic Weisbecker)

- switch_mm() micro-optimization on x86 (Andy Lutomirski)

- ... lots of other enhancements, fixes and cleanups.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits)
ARM: Hide finish_arch_post_lock_switch() from modules
sched/core: Provide a tsk_nr_cpus_allowed() helper
sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed
sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
sched/fair: Correct unit of load_above_capacity
sched/fair: Clean up scale confusion
sched/nohz: Fix affine unpinned timers mess
sched/fair: Fix fairness issue on migration
sched/core: Kill sched_class::task_waking to clean up the migration logic
sched/fair: Prepare to fix fairness problems on migration
sched/fair: Move record_wakee()
sched/core: Fix comment typo in wake_q_add()
sched/core: Remove unused variable
sched: Make hrtick_notifier an explicit call
sched/fair: Make ilb_notifier an explicit call
sched/hotplug: Make activate() the last hotplug step
sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()
sched/migration: Move CPU_ONLINE into scheduler state
sched/migration: Move calc_load_migrate() into CPU_DYING
sched/migration: Move prepare transition to SCHED_STARTING state
...

Linus Torvalds
2016-05-17 05:47:16 +0800
36db171cc Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf updates from Ingo Molnar:
"Bigger kernel side changes:

- Add backwards writing capability to the perf ring-buffer code,
which is preparation for future advanced features like robust
'overwrite support' and snapshot mode. (Wang Nan)

- Add pause and resume ioctls for the perf ringbuffer (Wang Nan)

- x86 Intel cstate code cleanups and reorgnization (Thomas Gleixner)

- x86 Intel uncore and CPU PMU driver updates (Kan Liang, Peter
Zijlstra)

- x86 AUX (Intel PT) related enhancements and updates (Alexander
Shishkin)

- x86 MSR PMU driver enhancements and updates (Huang Rui)

- ... and lots of other changes spread out over 40+ commits.

Biggest tooling side changes:

- 'perf trace' features and enhancements. (Arnaldo Carvalho de Melo)

- BPF tooling updates (Wang Nan)

- 'perf sched' updates (Jiri Olsa)

- 'perf probe' updates (Masami Hiramatsu)

- ... plus 200+ other enhancements, fixes and cleanups to tools/

The merge commits, the shortlog and the changelogs contain a lot more
details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (249 commits)
perf/core: Disable the event on a truncated AUX record
perf/x86/intel/pt: Generate PMI in the STOP region as well
perf buildid-cache: Use lsdir() for looking up buildid caches
perf symbols: Use lsdir() for the search in kcore cache directory
perf tools: Use SBUILD_ID_SIZE where applicable
perf tools: Fix lsdir to set errno correctly
perf trace: Move seccomp args beautifiers to tools/perf/trace/beauty/
perf trace: Move flock op beautifier to tools/perf/trace/beauty/
perf build: Add build-test for debug-frame on arm/arm64
perf build: Add build-test for libunwind cross-platforms support
perf script: Fix export of callchains with recursion in db-export
perf script: Fix callchain addresses in db-export
perf script: Fix symbol insertion behavior in db-export
perf symbols: Add dso__insert_symbol function
perf scripting python: Use Py_FatalError instead of die()
perf tools: Remove xrealloc and ALLOC_GROW
perf help: Do not use ALLOC_GROW in add_cmd_list
perf pmu: Make pmu_formats_string to check return value of strbuf
perf header: Make topology checkers to check return value of strbuf
perf tools: Make alias handler to check return value of strbuf
...

Linus Torvalds
2016-05-17 05:08:43 +0800
3469d261e Merge branch 'locking-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull support for killable rwsems from Ingo Molnar:
"This, by Michal Hocko, implements down_write_killable().

The main usecase will be to update mm_sem usage sites to use this new
API, to allow the mm-reaper introduced in commit aac453635549 ("mm,
oom: introduce oom reaper") to tear down oom victim address spaces
asynchronously with minimum latencies and without deadlock worries"

[ The vfs will want it too as the inode lock is changed from a mutex to
a rwsem due to the parallel lookup and readdir updates ]

* 'locking-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Fix comment on register clobbering
locking/rwsem: Fix down_write_killable()
locking/rwsem, x86: Add frame annotation for call_rwsem_down_write_failed_killable()
locking/rwsem: Provide down_write_killable()
locking/rwsem, x86: Provide __down_write_killable()
locking/rwsem, s390: Provide __down_write_killable()
locking/rwsem, ia64: Provide __down_write_killable()
locking/rwsem, alpha: Provide __down_write_killable()
locking/rwsem: Introduce basis for down_write_killable()
locking/rwsem, sparc: Drop superfluous arch specific implementation
locking/rwsem, sh: Drop superfluous arch specific implementation
locking/rwsem, xtensa: Drop superfluous arch specific implementation
locking/rwsem: Drop explicit memory barriers
locking/rwsem: Get rid of __down_write_nested()

Linus Torvalds
2016-05-17 04:41:02 +0800
1c19b68a2 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking changes from Ingo Molnar:
"The main changes in this cycle were:

- pvqspinlock statistics fixes (Davidlohr Bueso)

- flip atomic_fetch_or() arguments (Peter Zijlstra)

- locktorture simplification (Paul E. McKenney)

- documentation updates (SeongJae Park, David Howells, Davidlohr
Bueso, Paul E McKenney, Peter Zijlstra, Will Deacon)

- various fixes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/atomics: Flip atomic_fetch_or() arguments
locking/pvqspinlock: Robustify init_qspinlock_stat()
locking/pvqspinlock: Avoid double resetting of stats
lcoking/locktorture: Simplify the torture_runnable computation
locking/Documentation: Clarify that ACQUIRE applies to loads, RELEASE applies to stores
locking/Documentation: State purpose of memory-barriers.txt
locking/Documentation: Add disclaimer
locking/Documentation/lockdep: Fix spelling mistakes
locking/lockdep: Deinline register_lock_class(), save 2328 bytes
locking/locktorture: Fix NULL pointer dereference for cleanup paths
locking/locktorture: Fix deboosting NULL pointer dereference
locking/Documentation: Mention smp_cond_acquire()
locking/Documentation: Insert white spaces consistently
locking/Documentation: Fix formatting inconsistencies
locking/Documentation: Add missed subsection in TOC
locking/Documentation: Fix missed s/lock/acquire renames
locking/Documentation: Clarify relationship of barrier() to control dependencies

Linus Torvalds
2016-05-17 04:17:56 +0800
230e51f21 Merge branch 'core-signals-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull core signal updates from Ingo Molnar:
"These updates from Stas Sergeev and Andy Lutomirski, improve the
sigaltstack interface by extending its ABI with the SS_AUTODISARM
feature, which makes it possible to use swapcontext() in a sighandler
that works on sigaltstack. Without this flag, the subsequent signal
will corrupt the state of the switched-away sighandler.

The inspiration is more robust dosemu signal handling"

* 'core-signals-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
signals/sigaltstack: Change SS_AUTODISARM to (1U << 31)
signals/sigaltstack: Report current flag bits in sigaltstack()
selftests/sigaltstack: Fix the sigaltstack test on old kernels
signals/sigaltstack: If SS_AUTODISARM, bypass on_sig_stack()
selftests/sigaltstack: Add new testcase for sigaltstack(SS_ONSTACK|SS_AUTODISARM)
signals/sigaltstack: Implement SS_AUTODISARM flag
signals/sigaltstack: Prepare to add new SS_xxx flags
signals/sigaltstack, x86/signals: Unify the x86 sigaltstack check with other architectures

Linus Torvalds
2016-05-17 03:25:25 +0800
a3871bd43 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull RCU updates from Ingo Molnar:
"The main changes are:

- Documentation updates, including fixes to the design-level
requirements documentation and a fixed version of the design-level
data-structure documentation. These fixes include removing
cartoons and getting rid of the html/htmlx duplication.

- Further improvements to the new-age expedited grace periods.

- Miscellaneous fixes.

- Torture-test changes, including a new rcuperf module for measuring
RCU grace-period performance and scalability, which is useful for
the expedited-grace-period changes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
rcutorture: Add boot-time adjustment of leaf fanout
rcutorture: Add irqs-disabled test for call_rcu()
rcutorture: Dump trace buffer upon shutdown
rcutorture: Don't rebuild identical kernel
rcutorture: Add OS-jitter capability
documentation: Add documentation for RCU's major data structures
rcutorture: Convert test duration to seconds early
torture: Kill qemu, not parent process
torture: Clarify refusal to run more than one torture test
rcutorture: Consider FROZEN hotplug notifier transitions
rcutorture: Remove redundant initialization to zero
rcuperf: Do not wake up shutdown wait queue if "shutdown" is false.
rcutorture: Add largish-system rcuperf scenario
rcutorture: Avoid RCU CPU stall warning and RT throttling
rcutorture: Add rcuperf holdoff boot parameter to reduce interference
rcutorture: Make scripts analyze rcuperf trace data, if present
rcutorture: Make rcuperf collect expedited event-trace data
rcutorture: Print measure of batching efficiency
rcutorture: Set rcuperf writer kthreads to real-time priority
rcutorture: Bind rcuperf reader/writer kthreads to CPUs
...

Linus Torvalds
2016-05-17 03:02:08 +0800
4f3446bb8 bpf: add generic constant blinding for use in jits ... Browse Code »

This work adds a generic facility for use from eBPF JIT compilers
that allows for further hardening of JIT generated images through
blinding constants. In response to the original work on BPF JIT
spraying published by Keegan McAllister [1], most BPF JITs were
changed to make images read-only and start at a randomized offset
in the page, where the rest was filled with trap instructions. We
have this nowadays in x86, arm, arm64 and s390 JIT compilers.
Additionally, later work also made eBPF interpreter images read
only for kernels supporting DEBUG_SET_MODULE_RONX, that is, x86,
arm, arm64 and s390 archs as well currently. This is done by
default for mentioned JITs when JITing is enabled. Furthermore,
we had a generic and configurable constant blinding facility on our
todo for quite some time now to further make spraying harder, and
first implementation since around netconf 2016.

We found that for systems where untrusted users can load cBPF/eBPF
code where JIT is enabled, start offset randomization helps a bit
to make jumps into crafted payload harder, but in case where larger
programs that cross page boundary are injected, we again have some
part of the program opcodes at a page start offset. With improved
guessing and more reliable payload injection, chances can increase
to jump into such payload. Elena Reshetova recently wrote a test
case for it [2, 3]. Moreover, eBPF comes with 64 bit constants, which
can leave some more room for payloads. Note that for all this,
additional bugs in the kernel are still required to make the jump
(and of course to guess right, to not jump into a trap) and naturally
the JIT must be enabled, which is disabled by default.

For helping mitigation, the general idea is to provide an option
bpf_jit_harden that admins can tweak along with bpf_jit_enable, so
that for cases where JIT should be enabled for performance reasons,
the generated image can be further hardened with blinding constants
for unpriviledged users (bpf_jit_harden == 1), with trading off
performance for these, but not for privileged ones. We also added
the option of blinding for all users (bpf_jit_harden == 2), which
is quite helpful for testing f.e. with test_bpf.ko. There are no
further e.g. hardening levels of bpf_jit_harden switch intended,
rationale is to have it dead simple to use as on/off. Since this
functionality would need to be duplicated over and over for JIT
compilers to use, which are already complex enough, we provide a
generic eBPF byte-code level based blinding implementation, which is
then just transparently JITed. JIT compilers need to make only a few
changes to integrate this facility and can be migrated one by one.

This option is for eBPF JITs and will be used in x86, arm64, s390
without too much effort, and soon ppc64 JITs, thus that native eBPF
can be blinded as well as cBPF to eBPF migrations, so that both can
be covered with a single implementation. The rule for JITs is that
bpf_jit_blind_constants() must be called from bpf_int_jit_compile(),
and in case blinding is disabled, we follow normally with JITing the
passed program. In case blinding is enabled and we fail during the
process of blinding itself, we must return with the interpreter.
Similarly, in case the JITing process after the blinding failed, we
return normally to the interpreter with the non-blinded code. Meaning,
interpreter doesn't change in any way and operates on eBPF code as
usual. For doing this pre-JIT blinding step, we need to make use of
a helper/auxiliary register, here BPF_REG_AX. This is strictly internal
to the JIT and not in any way part of the eBPF architecture. Just like
in the same way as JITs internally make use of some helper registers
when emitting code, only that here the helper register is one
abstraction level higher in eBPF bytecode, but nevertheless in JIT
phase. That helper register is needed since f.e. manually written
program can issue loads to all registers of eBPF architecture.

The core concept with the additional register is: blind out all 32
and 64 bit constants by converting BPF_K based instructions into a
small sequence from K_VAL into ((RND ^ K_VAL) ^ RND). Therefore, this
is transformed into: BPF_REG_AX := (RND ^ K_VAL), BPF_REG_AX ^= RND,
and REG BPF_REG_AX, so actual operation on the target register
is translated from BPF_K into BPF_X one that is operating on
BPF_REG_AX's content. During rewriting phase when blinding, RND is
newly generated via prandom_u32() for each processed instruction.
64 bit loads are split into two 32 bit loads to make translation and
patching not too complex. Only basic thing required by JITs is to
call the helper bpf_jit_blind_constants()/bpf_jit_prog_release_other()
pair, and to map BPF_REG_AX into an unused register.

Small bpf_jit_disasm extract from [2] when applied to x86 JIT:

echo 0 > /proc/sys/net/core/bpf_jit_harden

ffffffffa034f5e9 + :
[...]
39: mov $0xa8909090,%eax
3e: mov $0xa8909090,%eax
43: mov $0xa8ff3148,%eax
48: mov $0xa89081b4,%eax
4d: mov $0xa8900bb0,%eax
52: mov $0xa810e0c1,%eax
57: mov $0xa8908eb4,%eax
5c: mov $0xa89020b0,%eax
[...]

echo 1 > /proc/sys/net/core/bpf_jit_harden

ffffffffa034f1e5 + :
[...]
39: mov $0xe1192563,%r10d
3f: xor $0x4989b5f3,%r10d
46: mov %r10d,%eax
49: mov $0xb8296d93,%r10d
4f: xor $0x10b9fd03,%r10d
56: mov %r10d,%eax
59: mov $0x8c381146,%r10d
5f: xor $0x24c7200e,%r10d
66: mov %r10d,%eax
69: mov $0xeb2a830e,%r10d
6f: xor $0x43ba02ba,%r10d
76: mov %r10d,%eax
79: mov $0xd9730af,%r10d
7f: xor $0xa5073b1f,%r10d
86: mov %r10d,%eax
89: mov $0x9a45662b,%r10d
8f: xor $0x325586ea,%r10d
96: mov %r10d,%eax
[...]

As can be seen, original constants that carry payload are hidden
when enabled, actual operations are transformed from constant-based
to register-based ones, making jumps into constants ineffective.
Above extract/example uses single BPF load instruction over and
over, but of course all instructions with constants are blinded.

Performance wise, JIT with blinding performs a bit slower than just
JIT and faster than interpreter case. This is expected, since we
still get all the performance benefits from JITing and in normal
use-cases not every single instruction needs to be blinded. Summing
up all 296 test cases averaged over multiple runs from test_bpf.ko
suite, interpreter was 55% slower than JIT only and JIT with blinding
was 8% slower than JIT only. Since there are also some extremes in
the test suite, I expect for ordinary workloads that the performance
for the JIT with blinding case is even closer to JIT only case,
f.e. nmap test case from suite has averaged timings in ns 29 (JIT),
35 (+ blinding), and 151 (interpreter).

BPF test suite, seccomp test suite, eBPF sample code and various
bigger networking eBPF programs have been tested with this and were
running fine. For testing purposes, I also adapted interpreter and
redirected blinded eBPF image to interpreter and also here all tests
pass.

[1] http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
[2] https://github.com/01org/jit-spray-poc-for-ksp/
[3] http://www.openwall.com/lists/kernel-hardening/2016/05/03/5

Signed-off-by: Daniel Borkmann
Reviewed-by: Elena Reshetova
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-05-17 01:49:32 +0800
d1c55ab5e bpf: prepare bpf_int_jit_compile/bpf_prog_select_runtime apis ... Browse Code »

Since the blinding is strictly only called from inside eBPF JITs,
we need to change signatures for bpf_int_jit_compile() and
bpf_prog_select_runtime() first in order to prepare that the
eBPF program we're dealing with can change underneath. Hence,
for call sites, we need to return the latest prog. No functional
change in this patch.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-05-17 01:49:32 +0800
c237ee5eb bpf: add bpf_patch_insn_single helper ... Browse Code »

Move the functionality to patch instructions out of the verifier
code and into the core as the new bpf_patch_insn_single() helper
will be needed later on for blinding as well. No changes in
functionality.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-05-17 01:49:32 +0800
4936e3528 bpf: minor cleanups in ebpf code ... Browse Code »

Besides others, remove redundant comments where the code is self
documenting enough, and properly indent various bpf_verifier_ops
and bpf_prog_type_list declarations. Moreover, remove two exports
that actually have no module user.

Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2016-05-17 01:49:31 +0800

16 May, 2016

3 commits

c47b3bd0d Merge branch 'pm-cpufreq' ... Browse Code »

* pm-cpufreq: (63 commits)
intel_pstate: Clean up get_target_pstate_use_performance()
intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
intel_pstate: Clarify average performance computation
intel_pstate: Avoid unnecessary synchronize_sched() during initialization
cpufreq: schedutil: Make default depend on CONFIG_SMP
cpufreq: powernv: del_timer_sync when global and local pstate are equal
cpufreq: powernv: Move smp_call_function_any() out of irq safe block
intel_pstate: Clean up intel_pstate_get()
cpufreq: schedutil: Make it depend on CONFIG_SMP
cpufreq: governor: Fix handling of special cases in dbs_update()
cpufreq: intel_pstate: Ignore _PPC processing under HWP
cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table
cpufreq: tango: Use generic platdev driver
cpufreq: Fix GOV_LIMITS handling for the userspace governor
cpufreq: mvebu: Move cpufreq code into drivers/cpufreq/
cpufreq: dt: Kill platform-data
mvebu: Use dev_pm_opp_set_sharing_cpus() to mark OPP tables as shared
cpufreq: dt: Identify cpu-sharing for platforms without operating-points-v2
cpufreq: governor: Change confusing struct field and variable names
cpufreq: intel_pstate: Enable PPC enforcement for servers
...

Rafael J. Wysocki
2016-05-16 20:30:43 +0800
04cafed7f locking/rwsem: Fix down_write_killable() ... Browse Code »

The new signal_pending exit path in __rwsem_down_write_failed_common()
was fingered as breaking his kernel by Tetsuo Handa.

Upon inspection it was found that there are two things wrong with it;

- it forgets to remove WAITING_BIAS if it leaves the list empty, or
- it forgets to wake further waiters that were blocked on the now
removed waiter.

Especially the first issue causes new lock attempts to block and stall
indefinitely, as the code assumes that pending waiters mean there is
an owner that will wake when it releases the lock.

Reported-by: Tetsuo Handa
Tested-by: Tetsuo Handa
Tested-by: Michal Hocko
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Andrew Morton
Cc: Arnaldo Carvalho de Melo
Cc: Chris Zankel
Cc: David S. Miller
Cc: Davidlohr Bueso
Cc: H. Peter Anvin
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Max Filippov
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Vince Weaver
Cc: Waiman Long
Link: http://lkml.kernel.org/r/20160512115745.GP3192@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar

Peter Zijlstra
2016-05-16 04:55:00 +0800
909b27f70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

The nf_conntrack_core.c fix in 'net' is not relevant in 'net-next'
because we no longer have a per-netns conntrack hash.

The ip_gre.c conflict as well as the iwlwifi ones were cases of
overlapping changes.

Conflicts:
drivers/net/wireless/intel/iwlwifi/mvm/tx.c
net/ipv4/ip_gre.c
net/netfilter/nf_conntrack_core.c

Signed-off-by: David S. Miller

David S. Miller
2016-05-16 01:32:48 +0800

14 May, 2016

1 commit

1410b74e4 Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup fixes from Tejun Heo:
"During v4.6-rc1 cgroup namespace support was merged. There is an
issue where it's impossible to tell whether a given cgroup mount point
is bind mounted or namespaced. Serge has been working on the issue
but it took longer than expected to resolve, so the late pull request.

Given that it's a completely new feature and the patches don't touch
anything else, the risk seems acceptable. However, if this is too
late, an alternative is plugging new cgroup ns creation for v4.6 and
retrying for v4.7"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix compile warning
kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call
cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces
kernfs_path_from_node_locked: don't overwrite nlen

Linus Torvalds
2016-05-14 07:26:46 +0800