Eric Lee / smarc-fsl-linux-kernel

13 Jul, 2015

2 commits

7b732169e Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"This update from the timer department contains:

- A series of patches which address a shortcoming in the tick
broadcast code.

If the broadcast device is not available or an hrtimer emulated
broadcast device, some of the original assumptions lead to boot
failures. I rather plugged all of the corner cases instead of only
addressing the issue reported, so the change got a little larger.

Has been extensively tested on x86 and arm.

- Get rid of the last holdouts using do_posix_clock_monotonic_gettime()

- A regression fix for the imx clocksource driver

- An update to the new state callbacks mechanism for clockevents.
This is required to simplify the conversion, which will take place
in 4.3"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/broadcast: Prevent NULL pointer dereference
time: Get rid of do_posix_clock_monotonic_gettime
cris: Replace do_posix_clock_monotonic_gettime()
tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
tick/broadcast: Handle spurious interrupts gracefully
tick/broadcast: Check for hrtimer broadcast active early
tick/broadcast: Return busy when IPI is pending
tick/broadcast: Return busy if periodic mode and hrtimer broadcast
tick/broadcast: Move the check for periodic mode inside state handling
tick/broadcast: Prevent deep idle if no broadcast device available
tick/broadcast: Make idle check independent from mode and config
tick/broadcast: Sanity check the shutdown of the local clock_event
tick/broadcast: Prevent hrtimer recursion
clockevents: Allow set-state callbacks to be optional
clocksource/imx: Define clocksource for mx27

Linus Torvalds
2015-07-13 00:36:59 +0800
c4bc680cf Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
"A single fix for a cpu hotplug race vs. interrupt descriptors:

Prevent irq setup/teardown across the cpu starting/dying parts of cpu
hotplug so that the starting/dying cpu has a stable view of the
descriptor space. This has been an issue for all architectures in the
cpu dying phase, where interrupts are migrated away from the dying
cpu. In the starting phase it's mostly an x86 issue vs. the vector space
update"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hotplug: Prevent alloc/free of irq descriptors during cpu up/down

Linus Torvalds
2015-07-13 00:15:02 +0800

11 Jul, 2015

1 commit

c4d029f2d tick/broadcast: Prevent NULL pointer dereference

Dan reported that the recent changes to the broadcast code introduced
a potential NULL dereference.

Add the proper check.

Fixes: e0454311903d "tick/broadcast: Sanity check the shutdown of the local clock_event"
Reported-by: Dan Carpenter
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2015-07-11 20:26:34 +0800

09 Jul, 2015

2 commits

758556bdc module: Fix load_module() error path

The load_module() error path frees a module but forgot to take it out
of the mod_tree, leaving a dangling entry in the tree, causing havoc.

Cc: Mathieu Desnoyers
Reported-by: Arthur Marsh
Tested-by: Arthur Marsh
Fixes: 93c2e105f6bc ("module: Optimize __module_address() using a latched RB-tree")
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Rusty Russell

Peter Zijlstra
2015-07-09 05:27:12 +0800
45820c294 Fix broken audit tests for exec arg len

The "fix" in commit 0b08c5e5944 ("audit: Fix check of return value of
strnlen_user()") didn't fix anything, it broke things. As reported by
Steven Rostedt:

"Yes, strnlen_user() returns 0 on fault, but if you look at what len is
set to, than you would notice that on fault len would be -1"

because we just subtracted one from the return value. So testing
against 0 doesn't test for a fault condition, it tests against a
perfectly valid empty string.
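The off-by-one can be made concrete with a small stand-alone sketch. Everything below is a user-space model, not the audit code: fake_strnlen_user() is a hypothetical stand-in that mimics the convention described above (length including the NUL on success, 0 on fault, with a NULL pointer standing in for a faulting address), and the two predicates only contrast the "== 0" test with a "< 0" test.

```c
#include <string.h>

/* Hypothetical stand-in for strnlen_user(): length including the
 * terminating NUL on success, 0 on fault. A NULL pointer models a
 * faulting user address. */
long fake_strnlen_user(const char *s)
{
    return s ? (long)strlen(s) + 1 : 0;
}

/* Models the broken test: after subtracting one, a fault is -1,
 * so comparing against 0 flags the empty string instead. */
int broken_is_fault(const char *s)
{
    long len = fake_strnlen_user(s) - 1;
    return len == 0;
}

/* Models a test that matches the convention: only a negative
 * length indicates a fault. */
int fixed_is_fault(const char *s)
{
    long len = fake_strnlen_user(s) - 1;
    return len < 0;
}
```

With this model, broken_is_fault("") reports a fault for a valid empty argument while broken_is_fault(NULL) misses the real one; fixed_is_fault() gets both right.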

Also fix up the usual braindamage wrt using WARN_ON() inside a
conditional - make it part of the conditional and remove the explicit
unlikely() (which is already part of the WARN_ON*() logic, exactly so
that you don't have to write unreadable code).

Reported-and-tested-by: Steven Rostedt
Cc: Jan Kara
Cc: Paul Moore
Signed-off-by: Linus Torvalds

Linus Torvalds
2015-07-09 00:33:38 +0800

08 Jul, 2015

10 commits

a89941816 hotplug: Prevent alloc/free of irq descriptors during cpu up/down

When a cpu goes up some architectures (e.g. x86) have to walk the irq
space to set up the vector space for the cpu. While this needs extra
protection at the architecture level we can avoid a few race
conditions by preventing the concurrent allocation/free of irq
descriptors and the associated data.

When a cpu goes down it moves the interrupts which are targeted to
this cpu away by reassigning the affinities. While this happens
interrupts can be allocated and freed, which opens a can of race
conditions in the code which reassigns the affinities because
interrupt descriptors might be freed underneath.

Example:

        CPU1                            CPU2
        cpu_up/down
         irq_desc = irq_to_desc(irq);
                                        remove_from_radix_tree(desc);
         raw_spin_lock(&desc->lock);
                                        free(desc);

We could protect the irq descriptors with RCU, but that would require
a full tree change of all accesses to interrupt descriptors. But
fortunately these kinds of race conditions are rather limited to a few
things like cpu hotplug. The normal setup/teardown is very well
serialized. So the simpler and obvious solution is:

Prevent allocation and freeing of interrupt descriptors across cpu
hotplug.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra
Cc: xiao jin
Cc: Joerg Roedel
Cc: Borislav Petkov
Cc: Yanmin Zhang
Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de

Thomas Gleixner
2015-07-08 17:32:25 +0800
c42883348 tick/broadcast: Handle spurious interrupts gracefully

Andriy reported that on a virtual machine the warning about negative
expiry time in the clock events programming code triggered:

hpet: hpet0 irq 40 for MSI
hpet: hpet1 irq 41 for MSI
Switching to clocksource hpet
WARNING: at kernel/time/clockevents.c:239

[] clockevents_program_event+0xdb/0xf0
[] tick_handle_periodic_broadcast+0x41/0x50
[] timer_interrupt+0x15/0x20

When the second hpet is installed as a per cpu timer the broadcast
event is no longer required and stopped, which sets the next_evt of
the broadcast device to KTIME_MAX.

If after that a spurious interrupt happens on the broadcast device,
then the current code blindly handles it and tries to reprogram the
broadcast device afterwards, which adds the period to
next_evt. KTIME_MAX + period results in a negative expiry value
causing the WARN_ON in the clockevents code to trigger.
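The arithmetic behind the negative expiry can be sketched as follows, assuming the expiry is held as a signed 64-bit nanosecond count whose maximum plays the role of KTIME_MAX. KTIME_MAX_NS and expiry_wraps_negative() are hypothetical names for this illustration; the addition is done on unsigned values because signed overflow is undefined in standard C.

```c
#include <stdint.h>

/* Assumption: models KTIME_MAX as the largest signed 64-bit value. */
#define KTIME_MAX_NS INT64_MAX

/* Returns 1 if next_evt + period, computed with wrapping 64-bit
 * arithmetic, ends up with the sign bit set, i.e. would read as a
 * negative expiry when interpreted as a signed 64-bit value. */
int expiry_wraps_negative(int64_t next_evt, int64_t period)
{
    uint64_t sum = (uint64_t)next_evt + (uint64_t)period;
    return (sum >> 63) != 0;
}
```

For example, expiry_wraps_negative(KTIME_MAX_NS, 4000000) is 1 for a 4 ms period, whereas any ordinary pending expiry plus the same period stays non-negative.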

Add a proper check for the state of the broadcast device into the
interrupt handler and return if the interrupt is spurious.

[ Folded in pointer fix from Sudeep ]

Reported-by: Andriy Gapon
Signed-off-by: Thomas Gleixner
Cc: Sudeep Holla
Cc: Peter Zijlstra
Cc: Preeti U Murthy
Link: http://lkml.kernel.org/r/20150705205221.802094647@linutronix.de

Thomas Gleixner
2015-07-08 00:46:48 +0800
d5113e13a tick/broadcast: Check for hrtimer broadcast active early

If the current cpu is the one which has the hrtimer based broadcast
queued then we better return busy immediately instead of going through
loops and hoops to figure that out.

[ Split out from a larger combo patch ]

Tested-by: Sudeep Holla
Signed-off-by: Thomas Gleixner
Cc: Suzuki Poulose
Cc: Lorenzo Pieralisi
Cc: Catalin Marinas
Cc: Rafael J. Wysocki
Cc: Peter Zijlstra
Cc: Preeti U Murthy
Cc: Ingo Molnar
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

Thomas Gleixner
2015-07-08 00:46:48 +0800
0cc5281aa tick/broadcast: Return busy when IPI is pending

Tell the idle code not to go deep if the broadcast IPI is about to
arrive.

[ Split out from a larger combo patch ]

Tested-by: Sudeep Holla
Signed-off-by: Thomas Gleixner
Cc: Suzuki Poulose
Cc: Lorenzo Pieralisi
Cc: Catalin Marinas
Cc: Rafael J. Wysocki
Cc: Peter Zijlstra
Cc: Preeti U Murthy
Cc: Ingo Molnar
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

Thomas Gleixner
2015-07-08 00:46:48 +0800
d33257264 tick/broadcast: Return busy if periodic mode and hrtimer broadcast

If the system is in periodic mode and the broadcast device is hrtimer
based, return busy as we have no proper handling for this.

[ Split out from a larger combo patch ]

Tested-by: Sudeep Holla
Signed-off-by: Thomas Gleixner
Cc: Suzuki Poulose
Cc: Lorenzo Pieralisi
Cc: Catalin Marinas
Cc: Rafael J. Wysocki
Cc: Peter Zijlstra
Cc: Preeti U Murthy
Cc: Ingo Molnar
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

Thomas Gleixner
2015-07-08 00:46:48 +0800
e3ac79e08 tick/broadcast: Move the check for periodic mode inside state handling

We need to check more than the periodic mode for proper operation in
all runtime combinations. To avoid code duplication move the check
into the enter state handling.

No functional change.

[ Split out from a larger combo patch ]

Reported-and-tested-by: Sudeep Holla
Signed-off-by: Thomas Gleixner
Cc: Suzuki Poulose
Cc: Lorenzo Pieralisi
Cc: Catalin Marinas
Cc: Rafael J. Wysocki
Cc: Peter Zijlstra
Cc: Preeti U Murthy
Cc: Ingo Molnar
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

Thomas Gleixner
2015-07-08 00:46:47 +0800
b78f3f3c8 tick/broadcast: Prevent deep idle if no broadcast device available

Add a check for an installed broadcast device to the oneshot control
function and return busy if not.

[ Split out from a larger combo patch ]

Reported-and-tested-by: Sudeep Holla
Signed-off-by: Thomas Gleixner
Cc: Suzuki Poulose
Cc: Lorenzo Pieralisi
Cc: Catalin Marinas
Cc: Rafael J. Wysocki
Cc: Peter Zijlstra
Cc: Preeti U Murthy
Cc: Ingo Molnar
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

Thomas Gleixner
2015-07-08 00:46:47 +0800
f32dd1170 tick/broadcast: Make idle check independent from mode and config

Currently the broadcast busy check, which prevents the idle code from
going into deep idle, works only in one shot mode.

If NOHZ and HIGHRES are off (config or command line) there is no
sanity check at all, so under certain conditions cpus are allowed to
go into deep idle, where the local timer stops, and are not woken up
again because there is no broadcast timer installed or a hrtimer based
broadcast device is not evaluated.

Move tick_broadcast_oneshot_control() into the common code and provide
proper subfunctions for the various config combinations.

The common check in tick_broadcast_oneshot_control() is for the C3STOP
misfeature flag of the local clock event device. If it's not set, idle
can proceed. If set, further checks are necessary.

Provide checks for the trivial cases:

- If broadcast is disabled in the config, then return busy

- If oneshot mode (NOHZ/HIGHRES) is disabled in the config, return
busy if the broadcast device is hrtimer based.

- If oneshot mode is enabled in the config call the original
tick_broadcast_oneshot_control() function. That function needs
extra checks which will be implemented in separate patches.
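The decision order described above can be modeled roughly like this. It is only a sketch: in the kernel the broadcast and oneshot cases are selected at build time by config options, while here they are runtime flags in a hypothetical struct bc_cfg, and the verdict enum is likewise invented for the illustration.

```c
#include <stdbool.h>

/* Hypothetical runtime model of what are config options and device
 * properties in the kernel. */
struct bc_cfg {
    bool local_has_c3stop;   /* local clock event stops in deep idle */
    bool broadcast_enabled;  /* broadcast support built in */
    bool oneshot_enabled;    /* NOHZ/HIGHRES built in */
    bool bc_is_hrtimer;      /* broadcast device is hrtimer based */
};

enum idle_verdict { IDLE_PROCEED, IDLE_BUSY, IDLE_DEFER_TO_ONESHOT };

enum idle_verdict broadcast_idle_check(const struct bc_cfg *c)
{
    /* Common check: without C3STOP the local timer keeps running. */
    if (!c->local_has_c3stop)
        return IDLE_PROCEED;
    /* No broadcast support at all: nothing could wake the cpu up. */
    if (!c->broadcast_enabled)
        return IDLE_BUSY;
    /* Periodic-only build: an hrtimer based device cannot be used. */
    if (!c->oneshot_enabled)
        return c->bc_is_hrtimer ? IDLE_BUSY : IDLE_PROCEED;
    /* Oneshot build: hand over to the oneshot control function. */
    return IDLE_DEFER_TO_ONESHOT;
}
```

The last verdict stands for the call into the original oneshot control function, whose additional checks are outside the scope of this sketch.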

[ Split out from a larger combo patch ]

Reported-and-tested-by: Sudeep Holla
Signed-off-by: Thomas Gleixner
Cc: Suzuki Poulose
Cc: Lorenzo Pieralisi
Cc: Catalin Marinas
Cc: Rafael J. Wysocki
Cc: Peter Zijlstra
Cc: Preeti U Murthy
Cc: Ingo Molnar
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

Thomas Gleixner
2015-07-08 00:46:47 +0800
e04543119 tick/broadcast: Sanity check the shutdown of the local clock_event

The broadcast code shuts down the local clock event unconditionally
even if no broadcast device is installed or if the broadcast device is
hrtimer based.

Add proper sanity checks.

[ Split out from a larger combo patch ]

Reported-and-tested-by: Sudeep Holla
Signed-off-by: Thomas Gleixner
Cc: Suzuki Poulose
Cc: Lorenzo Pieralisi
Cc: Catalin Marinas
Cc: Rafael J. Wysocki
Cc: Peter Zijlstra
Cc: Preeti U Murthy
Cc: Ingo Molnar
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

Thomas Gleixner
2015-07-08 00:46:47 +0800
8eb231261 tick/broadcast: Prevent hrtimer recursion

The hrtimer based broadcast vehicle can cause a hrtimer recursion
which went unnoticed until we changed the hrtimer expiry code to keep
track of the currently running timer.

local_timer_interrupt()
  local_handler()
    hrtimer_interrupt()
      expire_hrtimers()
        broadcast_hrtimer()
          send_ipis()
            local_handler()
              hrtimer_interrupt()
                ....

Solution is simple: Prevent the local handler call from the broadcast
code when the broadcast 'device' is hrtimer based.
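A toy model of the recursion and of the guard may help. None of the names below are kernel functions: struct bc_sim, broadcast_expire() and local_handler() are hypothetical, the call chain is collapsed to two functions, and a depth limit exists only so that the unguarded case terminates.

```c
#include <stdbool.h>

/* Hypothetical simulation state. */
struct bc_sim {
    bool bc_is_hrtimer;  /* broadcast 'device' is hrtimer based */
    bool guard;          /* apply the fix sketched here */
    int depth;           /* current nesting of local_handler() */
    int max_depth;       /* deepest nesting observed */
    int limit;           /* cap so the unguarded model terminates */
};

void local_handler(struct bc_sim *s);

/* Models the broadcast code running from hrtimer expiry. */
void broadcast_expire(struct bc_sim *s)
{
    /* The guard: skip the local handler call when the broadcast
     * 'device' is hrtimer based. */
    if (s->guard && s->bc_is_hrtimer)
        return;
    local_handler(s);
}

/* Models the local timer handler expiring hrtimers, one of which
 * is the broadcast hrtimer. */
void local_handler(struct bc_sim *s)
{
    if (s->depth >= s->limit)
        return;
    s->depth++;
    if (s->depth > s->max_depth)
        s->max_depth = s->depth;
    if (s->bc_is_hrtimer)
        broadcast_expire(s);
    s->depth--;
}
```

Without the guard the nesting runs up to the artificial limit; with it, local_handler() is entered exactly once.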

[ Split out from a larger combo patch ]

Tested-by: Sudeep Holla
Signed-off-by: Thomas Gleixner
Cc: Suzuki Poulose
Cc: Lorenzo Pieralisi
Cc: Catalin Marinas
Cc: Rafael J. Wysocki
Cc: Peter Zijlstra
Cc: Preeti U Murthy
Cc: Ingo Molnar
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

Thomas Gleixner
2015-07-08 00:46:47 +0800

07 Jul, 2015

1 commit

7c4a976cd clockevents: Allow set-state callbacks to be optional

It's mandatory for the drivers to provide set_state_{oneshot|periodic}()
(only if related modes are supported) and set_state_shutdown() callbacks
today, if they are implementing the new set-state interface.

But this leads to unnecessary noop callbacks for drivers which don't
want to implement them. On top of that, it will lead to a full function call
for nothing really useful.

Let's make all set-state callbacks optional.
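The idea of an optional callback can be sketched in a few lines. struct clock_dev is a hypothetical, trimmed-down stand-in for a clock event device, and treating a missing callback as success is an assumption of this sketch rather than a statement about the exact core code.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down device with one optional callback. */
struct clock_dev {
    int (*set_state_shutdown)(struct clock_dev *dev);
    int shutdown_calls;  /* bookkeeping for the example only */
};

/* Example driver callback that just counts invocations. */
int count_shutdown(struct clock_dev *dev)
{
    dev->shutdown_calls++;
    return 0;
}

/* Core-side helper: call the callback only if the driver set one,
 * so drivers need no noop function and no call is made for nothing. */
int dev_shutdown(struct clock_dev *dev)
{
    if (dev->set_state_shutdown)
        return dev->set_state_shutdown(dev);
    return 0;
}
```

A driver that leaves set_state_shutdown as NULL gets a successful no-op; one that sets it is called as before.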

Suggested-by: Daniel Lezcano
Signed-off-by: Viresh Kumar
Signed-off-by: Daniel Lezcano
Link: http://lkml.kernel.org/r/1436256875-15562-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Thomas Gleixner

Viresh Kumar
2015-07-07 16:44:45 +0800

06 Jul, 2015

1 commit

57ffc5ca6 perf: Fix AUX buffer refcounting

It's currently possible to drop the last refcount to the aux buffer
from NMI context, which results in the expected fireworks.

The refcounting needs a bigger overhaul, but to cure the immediate
problem, delay the freeing by using an irq_work.

Reviewed-and-tested-by: Alexander Shishkin
Reported-by: Vince Weaver
Signed-off-by: Peter Zijlstra (Intel)
Cc: Arnaldo Carvalho de Melo
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20150618103249.GK19282@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar

Peter Zijlstra
2015-07-06 20:08:30 +0800

05 Jul, 2015

2 commits

1dc51b828 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...

Linus Torvalds
2015-07-05 10:36:06 +0800
1b3618b60 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Except for the preempt notifiers fix, these are all small bugfixes
that could have waited for -rc2. Sending them now since I was
taking care of Peter's patch anyway"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: add hyper-v crash msrs values
KVM: x86: remove data variable from kvm_get_msr_common
KVM: s390: virtio-ccw: don't overwrite config space values
KVM: x86: keep track of LVT0 changes under APICv
KVM: x86: properly restore LVT0
KVM: x86: make vapics_in_nmi_mode atomic
sched, preempt_notifier: separate notifier registration from static_key inc/dec

Linus Torvalds
2015-07-05 02:29:59 +0800

04 Jul, 2015

7 commits

22a093b2f Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Debug info and other statistics fixes and related enhancements"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/numa: Fix numa balancing stats in /proc/pid/sched
sched/numa: Show numa_group ID in /proc/sched_debug task listings
sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h
sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=y
sched/stat: Simplify the sched_info accounting dependency

Linus Torvalds
2015-07-04 23:56:53 +0800
397f2378f sched/numa: Fix numa balancing stats in /proc/pid/sched

Commit 44dba3d5d6a1 ("sched: Refactor task_struct to use
numa_faults instead of numa_* pointers") modified the way
tsk->numa_faults stats are accounted.

However that commit never touched show_numa_stats() that is displayed
in /proc/pid/sched and thus the numbers displayed in /proc/pid/sched
don't match the actual numbers.

Fix it by making sure that /proc/pid/sched reflects the task
fault numbers. Also add group fault stats.

A couple more modifications are also added here:

1. Format changes:

- Previously we would list two entries per node, one for private
and one for shared. Also the home node info was listed in each entry.

- Now preferred node, total_faults and current node are
displayed separately.

- Now there is one entry per node, that lists private,shared task and
group faults.

2. Unit changes:

- p->numa_pages_migrated was getting reset after every read of
/proc/pid/sched. It's more useful to have absolute numbers since
differential migrations between two accesses can be more easily
calculated.

Signed-off-by: Srikar Dronamraju
Acked-by: Rik van Riel
Cc: Iulia Manda
Cc: Linus Torvalds
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1435252903-1081-4-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar

Srikar Dronamraju
2015-07-04 16:04:33 +0800
e3d24d0a6 sched/numa: Show numa_group ID in /proc/sched_debug task listings

Having the numa group ID in /proc/sched_debug helps to see how
the numa groups have spread across the system.

Signed-off-by: Srikar Dronamraju
Acked-by: Rik van Riel
Cc: Iulia Manda
Cc: Linus Torvalds
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1435252903-1081-3-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar

Srikar Dronamraju
2015-07-04 16:04:32 +0800
6b55c9654 sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h

Currently print_cfs_rq() is declared in include/linux/sched.h.
However it's not used outside kernel/sched. Hence move the
declaration to kernel/sched/sched.h

Also some functions are only available for CONFIG_SCHED_DEBUG=y.
Hence move the declarations to within the #ifdef.

Signed-off-by: Srikar Dronamraju
Acked-by: Rik van Riel
Cc: Iulia Manda
Cc: Linus Torvalds
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1435252903-1081-2-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar

Srikar Dronamraju
2015-07-04 16:04:31 +0800
f6db83479 sched/stat: Simplify the sched_info accounting dependency

Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task
sched_info, which results in ugly #if clauses.

Simplify the code by introducing a synthetic CONFIG_SCHED_INFO
switch, selected by both.

Signed-off-by: Naveen N. Rao
Cc: Balbir Singh
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Srikar Dronamraju
Cc: Thomas Gleixner
Cc: a.p.zijlstra@chello.nl
Cc: ricklind@us.ibm.com
Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar

Naveen N. Rao
2015-07-04 16:04:30 +0800
0cbee9926 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull user namespace updates from Eric Biederman:
"Long ago and far away when user namespaces where young it was realized
that allowing fresh mounts of proc and sysfs with only user namespace
permissions could violate the basic rule that only root gets to decide
if proc or sysfs should be mounted at all.

Some hacks were put in place to reduce the worst of the damage that could
be done, and the common sense rule was adopted that fresh mounts of
proc and sysfs should allow no more than bind mounts of proc and
sysfs. Unfortunately that rule has not been fully enforced.

There are two kinds of gaps in that enforcement. Only filesystems
mounted on empty directories of proc and sysfs should be ignored but
the test for empty directories was insufficient. So in my tree
directories on proc, sysctl and sysfs that will always be empty are
created specially. Every other technique is imperfect as an ordinary
directory can have entries added even after a readdir returns and
shows that the directory is empty. Special creation of directories
for mount points makes the code in the kernel a smidge clearer about
its purpose. I asked container developers from the various container
projects to help test this and no holes were found in the set of mount
points on proc and sysfs that are created specially.

This set of changes also starts enforcing the mount flags of fresh
mounts of proc and sysfs are consistent with the existing mount of
proc and sysfs. I expected this to be the boring part of the work but
unfortunately unprivileged userspace winds up mounting fresh copies of
proc and sysfs with noexec and nosuid clear when root set those flags
on the previous mount of proc and sysfs. So for now only the atime,
read-only and nodev attributes which userspace happens to keep
consistent are enforced. Dealing with the noexec and nosuid
attributes remains for another time.

This set of changes also addresses an issue with how open file
descriptors from /proc/<pid>/ns/* are displayed. Recently readlink of
/proc/<pid>/fd has been triggering a WARN_ON that has not been
meaningful since it was added (as all of the code in the kernel was
converted) and is now actively wrong.

There is also a short list of issues that have not been fixed yet that
I will mention briefly.

It is possible to rename a directory from below to above a bind mount.
At which point any directory pointers below the renamed directory can
be walked up to the root directory of the filesystem. With user
namespaces enabled a bind mount of the bind mount can be created
allowing the user to pick a directory whose children they can rename
to outside of the bind mount. This is challenging to fix and doubly
so because all obvious solutions must touch code that is in the
performance part of pathname resolution.

As mentioned above there is also a question of how to ensure that
developers by accident or with purpose do not introduce executable
files on sysfs and proc and in doing so introduce security regressions
in the current userspace that will not be immediately obvious and as
such are likely to require breaking userspace in painful ways once
they are recognized"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
vfs: Remove incorrect debugging WARN in prepend_path
mnt: Update fs_fully_visible to test for permanently empty directories
sysfs: Create mountpoints with sysfs_create_mount_point
sysfs: Add support for permanently empty directories to serve as mount points.
kernfs: Add support for always empty directories.
proc: Allow creating permanently empty directories that serve as mount points
sysctl: Allow creating permanently empty directories that serve as mountpoints.
fs: Add helper functions for permanently empty directories.
vfs: Ignore unlocked mounts in fs_fully_visible
mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
mnt: Refactor the logic for mounting sysfs and proc in a user namespace

Linus Torvalds
2015-07-04 06:20:57 +0800
2ecd9d29a sched, preempt_notifier: separate notifier registration from static_key inc/dec

Commit 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")
had two problems. First, the preempt-notifier API needs to sleep with the
addition of the static_key; we do however need to hold off preemption
while modifying the preempt notifier list, otherwise a preemption could
observe an inconsistent list state. KVM correctly registers and
unregisters preempt notifiers with preemption disabled, so the sleep
caused dmesg splats.

Second, KVM registers and unregisters preemption notifiers very often
(in vcpu_load/vcpu_put). With a single uniprocessor guest the static key
would move between 0 and 1 continuously, hitting the slow path on every
userspace exit.

To fix this, wrap the static_key inc/dec in a new API, and call it from
KVM.
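The cost difference between the coupled and the separated scheme can be illustrated with a counter model. This is a sketch under stated assumptions: struct notifier_sim and all functions below are hypothetical, the "slow path" is modeled simply as every transition of the key count between 0 and 1, and locking and preemption are not modeled at all.

```c
/* Hypothetical model: a static key as a counter, plus bookkeeping. */
struct notifier_sim {
    int key_count;       /* models the static key's enable count */
    int slow_path_hits;  /* times the key flipped between 0 and 1 */
    int registered;      /* notifiers currently registered */
};

void key_inc(struct notifier_sim *s)
{
    if (s->key_count++ == 0)
        s->slow_path_hits++;   /* 0 -> 1 transition is expensive */
}

void key_dec(struct notifier_sim *s)
{
    if (--s->key_count == 0)
        s->slow_path_hits++;   /* 1 -> 0 transition is expensive */
}

/* Coupled scheme: every register/unregister also touches the key. */
void run_coupled(struct notifier_sim *s, int cycles)
{
    for (int i = 0; i < cycles; i++) {
        key_inc(s);
        s->registered++;
        s->registered--;
        key_dec(s);
    }
}

/* Separated scheme: the key is raised once by the user (e.g. for the
 * lifetime of a VM) and registration itself stays cheap. */
void run_separated(struct notifier_sim *s, int cycles)
{
    key_inc(s);
    for (int i = 0; i < cycles; i++) {
        s->registered++;
        s->registered--;
    }
    key_dec(s);
}
```

In this model 100 register/unregister cycles cost 200 slow-path transitions when coupled and 2 when separated, which is the effect the second problem above describes for a single uniprocessor guest.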

Fixes: 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")
Reported-by: Pontus Fuchs
Reported-by: Takashi Iwai
Tested-by: Takashi Iwai
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Paolo Bonzini

Peter Zijlstra
2015-07-04 00:55:00 +0800

03 Jul, 2015

1 commit

7df9ab845 make certificate list change message more useful

It's a bug in our Makefile rules, make it show what the changing
certificate list was, and make it a warning so that people actually see
it.

Signed-off-by: Linus Torvalds

Linus Torvalds
2015-07-03 07:42:13 +0800

02 Jul, 2015

5 commits

2d01eedf1 Merge branch 'akpm' (patches from Andrew)

Merge third patchbomb from Andrew Morton:

- the rest of MM

- scripts/gdb updates

- ipc/ updates

- lib/ updates

- MAINTAINERS updates

- various other misc things

* emailed patches from Andrew Morton : (67 commits)
genalloc: rename of_get_named_gen_pool() to of_gen_pool_get()
genalloc: rename dev_get_gen_pool() to gen_pool_get()
x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit
MAINTAINERS: add zpool
MAINTAINERS: BCACHE: Kent Overstreet has changed email address
MAINTAINERS: move Jens Osterkamp to CREDITS
MAINTAINERS: remove unused nbd.h pattern
MAINTAINERS: update brcm gpio filename pattern
MAINTAINERS: update brcm dts pattern
MAINTAINERS: update sound soc intel patterns
MAINTAINERS: remove website for paride
MAINTAINERS: update Emulex ocrdma email addresses
bcache: use kvfree() in various places
libcxgbi: use kvfree() in cxgbi_free_big_mem()
target: use kvfree() in session alloc and free
IB/ehca: use kvfree() in ipz_queue_{cd}tor()
drm/nouveau/gem: use kvfree() in u_free()
drm: use kvfree() in drm_free_large()
cxgb4: use kvfree() in t4_free_mem()
cxgb3: use kvfree() in cxgb_free_mem()
...

Linus Torvalds
2015-07-02 08:47:51 +0800
6ac15baac Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"This contains:

- a build regression fix introduced by the timeconst move

- a hotplug regression fix introduced by the timer wheel diet

- a cpu hotplug bug fix for the exynos clocksource driver"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Remove development rules from Kbuild/Makefile
timer: Fix hotplug regression
clocksource: exynos_mct: Avoid blocking calls in the cpu hotplug notifier

Linus Torvalds
2015-07-02 06:44:18 +0800
5c3950970 Merge tag 'pm+acpi-4.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes that didn't make it to the previous PM+ACPI pull
request or are fixing issues introduced by it.

Specifics:

- Fix a recently added memory leak in an error path in the ACPI
resources management code (Dan Carpenter)

- Fix a build warning triggered by an ACPI video header function that
should be static inline (Borislav Petkov)

- Change names of helper function converting struct fwnode_handle
pointers to either struct device_node or struct acpi_device
pointers so they don't conflict with local variable names
(Alexander Sverdlin)

- Make the hibernate core re-enable nonboot CPUs on failures to
disable them as expected (Vitaly Kuznetsov)

- Increase the default timeout of the device suspend watchdog to
prevent it from triggering too early on some systems (Takashi Iwai)

- Prevent the cpuidle powernv driver from registering idle states
with CPUIDLE_FLAG_TIMER_STOP set if CONFIG_TICK_ONESHOT is unset
which leads to boot hangs (Preeti U Murthy)"

* tag 'pm+acpi-4.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
tick/idle/powerpc: Do not register idle states with CPUIDLE_FLAG_TIMER_STOP set in periodic mode
PM / sleep: Increase default DPM watchdog timeout to 60
PM / hibernate: re-enable nonboot cpus on disable_nonboot_cpus() failure
ACPI / OF: Rename of_node() and acpi_node() to to_of_node() and to_acpi_node()
ACPI / video: Inline acpi_video_set_dmi_backlight_type
ACPI / resources: free memory on error in add_region_before()

Linus Torvalds
2015-07-02 05:17:44 +0800
7adf12b87 Merge tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:
"Xen features and cleanups for 4.2-rc0:

- add "make xenconfig" to assist in generating configs for Xen guests

- preparatory cleanups necessary for supporting 64 KiB pages in ARM
guests

- automatically use hvc0 as the default console in ARM guests"

* tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
block/xen-blkback: s/nr_pages/nr_segs/
block/xen-blkfront: Remove invalid comment
block/xen-blkfront: Remove unused macro MAXIMUM_OUTSTANDING_BLOCK_REQS
arm/xen: Drop duplicate define mfn_to_virt
xen/grant-table: Remove unused macro SPP
xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
xen: Include xen/page.h rather than asm/xen/page.h
kconfig: add xenconfig defconfig helper
kconfig: clarify kvmconfig is for kvm
xen/pcifront: Remove usage of struct timeval
xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
hvc_xen: avoid uninitialized variable warning
xenbus: avoid uninitialized variable warning
xen/arm: allow console=hvc0 to be omitted for guests
arm,arm64/xen: move Xen initialization earlier
arm/xen: Correctly check if the event channel interrupt is present

Linus Torvalds
2015-07-02 02:53:46 +0800
02201e3f1 Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
"Main excitement here is Peter Zijlstra's lockless rbtree optimization
to speed module address lookup. He found some abusers of the module
lock doing that too.

A little bit of parameter work here too, including Dan Streetman's
breaking up the big param mutex so writing a parameter can load
another module (yeah, really). Unfortunately that broke the usual
suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
appended too"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
modules: only use mod->param_lock if CONFIG_MODULES
param: fix module param locks when !CONFIG_SYSFS.
rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
module: add per-module param_lock
module: make perm const
params: suppress unused variable error, warn once just in case code changes.
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
kernel/module.c: avoid ifdefs for sig_enforce declaration
kernel/workqueue.c: remove ifdefs over wq_power_efficient
kernel/params.c: export param_ops_bool_enable_only
kernel/params.c: generalize bool_enable_only
kernel/module.c: use generic module param operaters for sig_enforce
kernel/params: constify struct kernel_param_ops uses
sysfs: tightened sysfs permission checks
module: Rework module_addr_{min,max}
module: Use __module_address() for module_address_lookup()
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
module: Optimize __module_address() using a latched RB-tree
rbtree: Implement generic latch_tree
seqlock: Introduce raw_read_seqcount_latch()
...

Linus Torvalds
2015-07-02 01:49:25 +0800

01 Jul, 2015

8 commits

f9bb48825 sysfs: Create mountpoints with sysfs_create_mount_point

This allows for better documentation in the code and
it allows for a simpler and fully correct version of
fs_fully_visible to be written.

The mount points converted and their filesystems are:
/sys/hypervisor/s390/ s390_hypfs
/sys/kernel/config/ configfs
/sys/kernel/debug/ debugfs
/sys/firmware/efi/efivars/ efivarfs
/sys/fs/fuse/connections/ fusectl
/sys/fs/pstore/ pstore
/sys/kernel/tracing/ tracefs
/sys/fs/cgroup/ cgroup
/sys/kernel/security/ securityfs
/sys/fs/selinux/ selinuxfs
/sys/fs/smackfs/ smackfs

Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2015-07-01 23:36:47 +0800
f9bd6733d sysctl: Allow creating permanently empty directories that serve as mountpoints.

Add a magic sysctl table sysctl_mount_point that when used to
create a directory forces that directory to be permanently empty.

Update the code to use make_empty_dir_inode when accessing permanently
empty directories.

Update the code to not allow adding to permanently empty directories.

Update /proc/sys/fs/binfmt_misc to be a permanently empty directory.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2015-07-01 23:36:39 +0800
65f26062c time: Remove development rules from Kbuild/Makefile

time.o gets rebuilt unconditionally due to a leftover Makefile rule
which was placed there for development purposes.

Remove it along with the commented out always rule in the toplevel
Kbuild file.

Fixes: 0a227985d4a9 'time: Move timeconst.h into include/generated'
Reported-by: Stephen Boyd
Signed-off-by: Thomas Gleixner
Cc: Nicholas Mc Guire

Thomas Gleixner
2015-07-01 15:57:35 +0800
200f1ce36 kernel/relay.c: use kvfree() in relay_free_page_array()

Use kvfree() instead of open-coding it.

Signed-off-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pekka Enberg
2015-07-01 10:44:59 +0800
b389645f0 printk: improve the description of /dev/kmsg line format

The comment about /dev/kmsg does not mention the additional values which
may actually be exported; fix that.

Also move up the part of the comment instructing users to ignore these
additional values, so that the comment reads more fluently and is
logically compact.

Signed-off-by: Antonio Ospite
Cc: Joe Perches
Cc: Jonathan Corbet
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Antonio Ospite
2015-07-01 10:44:59 +0800
3e44c471a gcov: add support for GCC 5.1

Fix kernel gcov support for GCC 5.1. Similar to commit a992bf836f9
("gcov: add support for GCC 4.9"), this patch takes into account the
existence of a new gcov counter (see gcc's gcc/gcov-counter.def.)

Firstly, it increments GCOV_COUNTERS (to 10), which makes the data
structure struct gcov_info compatible with GCC 5.1.

Secondly, a corresponding counter function __gcov_merge_icall_topn (Top N
value tracking for indirect calls) is included in base.c with the other
gcov counters unused for kernel profiling.
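
The two changes described above can be modeled in plain userspace C. This is only a sketch, not the kernel code: the names model_gcov_counters, model_gcov_type and model_gcov_merge_icall_topn are hypothetical, and only the value 10 for GCC 5.1 comes from the commit message. The values 9 (GCC 4.9) and 8 (older compilers) are assumptions inferred from the "increments" wording.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's gcov counter type. */
typedef long long model_gcov_type;

/*
 * Hypothetical helper: number of gcov counters expected by a given
 * GCC version. 10 for GCC >= 5.1 is stated in the commit message;
 * 9 for GCC 4.9 and 8 otherwise are assumptions.
 */
unsigned int model_gcov_counters(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    if (major > 5 || (major == 5 && minor >= 1))
        return 10;
    if (major == 4 && minor >= 9)
        return 9;
    return 8;
}

/*
 * Model of a merge callback that exists only so compiler-generated
 * references resolve. It deliberately does nothing, because the
 * counter is unused for kernel profiling.
 */
void model_gcov_merge_icall_topn(model_gcov_type *counters,
                                 unsigned int n_counters)
{
    (void)counters;
    (void)n_counters;
}
```

The counter count matters because it sizes an array inside struct gcov_info, so a mismatch with the compiler's layout would make the structure incompatible.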

Signed-off-by: Lorenzo Stoakes
Cc: Andrey Ryabinin
Cc: Yuan Pengfei
Tested-by: Peter Oberparleiter
Reviewed-by: Peter Oberparleiter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lorenzo Stoakes
2015-07-01 10:44:57 +0800
5375b708f kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path

Commit f06e5153f4ae2e ("kernel/panic.c: add "crash_kexec_post_notifiers"
option for kdump after panic_notifers") introduced the
"crash_kexec_post_notifiers" kernel boot option, which toggles whether
panic() calls crash_kexec() before or after running the panic_notifiers
and dumping kmsg.

The problem is that the commit overlooked the panic_on_oops kernel boot
option. If it is enabled, crash_kexec() is called directly in the oops
path without going through panic().

To fix this issue, this patch adds a check of "crash_kexec_post_notifiers"
to the condition in kexec_should_crash().

Also, put a comment in kexec_should_crash() to explain the non-obvious
parts of this patch.
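
The decision described above can be sketched as a small userspace model. This is not the kernel function: struct model_state and model_kexec_should_crash are hypothetical names, and the conditions other than panic_on_oops and crash_kexec_post_notifiers (interrupt context, idle task, init task) are assumptions about what the oops path checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state consulted in the oops path. */
struct model_state {
    bool crash_kexec_post_notifiers; /* boot option from the commit */
    bool panic_on_oops;              /* boot option from the commit */
    bool in_interrupt;               /* assumption: oops in interrupt context */
    int  pid;                        /* assumption: 0 models the idle task */
    bool is_global_init;             /* assumption: task is init */
};

/* Returns true if the oops path should call crash_kexec() directly. */
bool model_kexec_should_crash(const struct model_state *s)
{
    assert(s != NULL);

    /*
     * With the option set, crash_kexec() must not run here: it has to
     * wait until panic() has run the notifiers.
     */
    if (s->crash_kexec_post_notifiers)
        return false;

    return s->in_interrupt || s->pid == 0 ||
           s->is_global_init || s->panic_on_oops;
}
```

Checking the option first keeps the rule in one place: every direct crash_kexec() trigger in the oops path is suppressed when the user asked for notifiers to run first.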

Signed-off-by: HATAYAMA Daisuke
Acked-by: Baoquan He
Tested-by: Hidehiro Kawai
Reviewed-by: Masami Hiramatsu
Cc: Vivek Goyal
Cc: Ingo Molnar
Cc: Hidehiro Kawai
Cc: Baoquan He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

HATAYAMA Daisuke
2015-07-01 10:44:57 +0800
f45d85ff1 kernel/panic: call the 2nd crash_kexec() only if crash_kexec_post_notifiers is enabled

For compatibility with the behaviour before commit f06e5153f4ae2e
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after
panic_notifers"), the 2nd crash_kexec() should be called only if
crash_kexec_post_notifiers is enabled.

Note that crash_kexec() returns immediately if no kdump crash kernel is
loaded, so in this case this patch makes no functional change. The point
is to make it explicit, from the caller panic() side, that the 2nd
crash_kexec() does nothing.
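
A minimal userspace sketch of the resulting call order, assuming no crash kernel is loaded so that every crash_kexec() call returns. The names model_panic, struct model_trace and the single-letter step codes are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Records the order of steps: 'K' = crash_kexec() call site reached,
 * 'N' = panic notifiers run, 'D' = kmsg dumped. */
struct model_trace {
    char   steps[8];
    size_t len;
};

static void model_record(struct model_trace *t, char step)
{
    assert(t->len + 1 < sizeof t->steps);
    t->steps[t->len++] = step;
    t->steps[t->len] = '\0';
}

/* Models the ordering inside panic() for either setting of the option. */
void model_panic(bool crash_kexec_post_notifiers, struct model_trace *t)
{
    t->len = 0;
    t->steps[0] = '\0';

    if (!crash_kexec_post_notifiers)
        model_record(t, 'K');   /* 1st call site: before notifiers */

    model_record(t, 'N');
    model_record(t, 'D');

    if (crash_kexec_post_notifiers)
        model_record(t, 'K');   /* 2nd call site: only if requested */
}
```

In this model exactly one call site is reached per panic: "KND" with the option off and "NDK" with it on, which is the explicitness the commit message asks for.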

Signed-off-by: HATAYAMA Daisuke
Suggested-by: Ingo Molnar
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Masami Hiramatsu
Cc: Hidehiro Kawai
Cc: Baoquan He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

HATAYAMA Daisuke
2015-07-01 10:44:57 +0800