Eric Lee / smarc-fsl-linux-kernel

19 Sep, 2013

2 commits

9d2cd7048 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
"An NTP related lockup fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix HRTICK related deadlock from ntp lock changes

Linus Torvalds
2013-09-19 00:24:49 +0800
7e28b2712 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Misc fixes"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix comment for sched_info_depart
sched/Documentation: Update sched-design-CFS.txt documentation
sched/debug: Take PID namespace into account
sched/fair: Fix small race where child->se.parent,cfs_rq might point to invalid ones

Linus Torvalds
2013-09-19 00:23:32 +0800

16 Sep, 2013

1 commit

13b62e46d sched: Fix comment for sched_info_depart

sched_info_depart seems to be only called from
sched_info_switch(), so only on involuntary task switch.

Fix the comment to match.

Signed-off-by: Michael S. Tsirkin
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: KOSAKI Motohiro
Link: http://lkml.kernel.org/r/20130916083036.GA1113@redhat.com
Signed-off-by: Ingo Molnar

Michael S. Tsirkin
2013-09-16 17:18:34 +0800

14 Sep, 2013

1 commit

9bf12df31 Merge git://git.kvack.org/~bcrl/aio-next

Pull aio changes from Ben LaHaise:
"First off, sorry for this pull request being late in the merge window.
Al had raised a couple of concerns about 2 items in the series below.
I addressed the first issue (the race introduced by Gu's use of
mm_populate()), but he has not provided any further details on how he
wants to rework the anon_inode.c changes (which were sent out months
ago but have yet to be commented on).

The bulk of the changes have been sitting in the -next tree for a few
months, with all the issues raised being addressed"

* git://git.kvack.org/~bcrl/aio-next: (22 commits)
aio: rcu_read_lock protection for new rcu_dereference calls
aio: fix race in ring buffer page lookup introduced by page migration support
aio: fix rcu sparse warnings introduced by ioctx table lookup patch
aio: remove unnecessary debugging from aio_free_ring()
aio: table lookup: verify ctx pointer
staging/lustre: kiocb->ki_left is removed
aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3"
aio: be defensive to ensure request batching is non-zero instead of BUG_ON()
aio: convert the ioctx list to table lookup v3
aio: double aio_max_nr in calculations
aio: Kill ki_dtor
aio: Kill ki_users
aio: Kill unneeded kiocb members
aio: Kill aio_rw_vect_retry()
aio: Don't use ctx->tail unnecessarily
aio: io_cancel() no longer returns the io_event
aio: percpu ioctx refcount
aio: percpu reqs_available
aio: reqs_active -> reqs_available
aio: fix build when migration is disabled
...

Linus Torvalds
2013-09-14 01:55:58 +0800

13 Sep, 2013

12 commits

0244ad004 Remove GENERIC_HARDIRQ config option

After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.

Signed-off-by: Martin Schwidefsky

Martin Schwidefsky
2013-09-13 21:09:52 +0800
ac4de9543 Merge branch 'akpm' (patches from Andrew Morton)

Merge more patches from Andrew Morton:
"The rest of MM. Plus one misc cleanup"

* emailed patches from Andrew Morton: (35 commits)
mm/Kconfig: add MMU dependency for MIGRATION.
kernel: replace strict_strto*() with kstrto*()
mm, thp: count thp_fault_fallback anytime thp fault fails
thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()
thp: do_huge_pmd_anonymous_page() cleanup
thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
mm: cleanup add_to_page_cache_locked()
thp: account anon transparent huge pages into NR_ANON_PAGES
truncate: drop 'oldsize' truncate_pagecache() parameter
mm: make lru_add_drain_all() selective
memcg: document cgroup dirty/writeback memory statistics
memcg: add per cgroup writeback pages accounting
memcg: check for proper lock held in mem_cgroup_update_page_stat
memcg: remove MEMCG_NR_FILE_MAPPED
memcg: reduce function dereference
memcg: avoid overflow caused by PAGE_ALIGN
memcg: rename RESOURCE_MAX to RES_COUNTER_MAX
memcg: correct RESOURCE_MAX to ULLONG_MAX
mm: memcg: do not trap chargers with full callstack on OOM
mm: memcg: rework and document OOM waiting and wakeup
...

Linus Torvalds
2013-09-13 06:44:27 +0800
6072ddc85 kernel: replace strict_strto*() with kstrto*()

The usage of strict_strto*() is not preferred, because strict_strto*() is
obsolete. Thus, kstrto*() should be used.

Signed-off-by: Jingoo Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jingoo Han
2013-09-13 06:38:03 +0800
1a36e59d4 memcg: reduce function dereference

This function dereferences res far too often, so optimize it.

Signed-off-by: Sha Zhengju
Signed-off-by: Qiang Huang
Acked-by: Michal Hocko
Cc: Daisuke Nishimura
Cc: Jeff Liu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sha Zhengju
2013-09-13 06:38:02 +0800
3af335167 memcg: avoid overflow caused by PAGE_ALIGN

Since PAGE_ALIGN aligns up (to the next page boundary), the value may
overflow after PAGE_ALIGN, for example when writing the MAX value to
*.limit_in_bytes.

$ cat /cgroup/memory/memory.limit_in_bytes
18446744073709551615

# echo 18446744073709551615 > /cgroup/memory/memory.limit_in_bytes
bash: echo: write error: Invalid argument

Some user programs might depend on such behaviour (like libcg, where we
read the value in a snapshot, then use the value to reset the cgroup
later), and that will cause confusion. So we need to fix it.
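
The wrap-around described in this message can be reproduced in userspace. The following is a minimal sketch, assuming a 4096-byte page; sketch_page_align() and sketch_page_align_clamped() are hypothetical names, and the clamping guard is one possible way to handle the overflow, not necessarily what the patch itself does.

```c
#include <limits.h>

/* Illustrative page size; the real PAGE_SIZE is architecture-specific. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Round up to the next page boundary, in the style of PAGE_ALIGN. */
static unsigned long long sketch_page_align(unsigned long long v)
{
	return (v + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Hypothetical guard: if the aligned value is smaller than the input,
 * the addition wrapped around, so clamp to the maximum instead.
 */
static unsigned long long sketch_page_align_clamped(unsigned long long v)
{
	unsigned long long aligned = sketch_page_align(v);

	return aligned < v ? ULLONG_MAX : aligned;
}
```

With these helpers, aligning ULLONG_MAX (18446744073709551615) without the guard wraps to 0, while the clamped variant keeps the maximum, so a value read from the file could be written back unchanged.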

Signed-off-by: Sha Zhengju
Signed-off-by: Qiang Huang
Acked-by: Michal Hocko
Cc: Daisuke Nishimura
Cc: Jeff Liu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sha Zhengju
2013-09-13 06:38:02 +0800
6de5a8bfc memcg: rename RESOURCE_MAX to RES_COUNTER_MAX

RESOURCE_MAX is far too general a name; change it to RES_COUNTER_MAX.

Signed-off-by: Sha Zhengju
Signed-off-by: Qiang Huang
Acked-by: Michal Hocko
Cc: Daisuke Nishimura
Cc: Jeff Liu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sha Zhengju
2013-09-13 06:38:02 +0800
26935fb06 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs pile 4 from Al Viro:
"list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
super: fix for destroy lrus
list_lru: dynamically adjust node arrays
shrinker: Kill old ->shrink API.
shrinker: convert remaining shrinkers to count/scan API
staging/lustre/libcfs: cleanup linux-mem.h
staging/lustre/ptlrpc: convert to new shrinker API
staging/lustre/obdclass: convert lu_object shrinker to count/scan API
staging/lustre/ldlm: convert to shrinkers to count/scan API
hugepage: convert huge zero page shrinker to new shrinker API
i915: bail out earlier when shrinker cannot acquire mutex
drivers: convert shrinkers to new count/scan API
fs: convert fs shrinkers to new scan/count API
xfs: fix dquot isolation hang
xfs-convert-dquot-cache-lru-to-list_lru-fix
xfs: convert dquot cache lru to list_lru
xfs: rework buffer dispose list tracking
xfs-convert-buftarg-lru-to-generic-code-fix
xfs: convert buftarg LRU to generic code
fs: convert inode and dentry shrinking to be node aware
vmscan: per-node deferred work
...

Linus Torvalds
2013-09-13 06:01:38 +0800
02b9735c1 Merge tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:
"All of these commits are fixes that have emerged recently and some of
them fix bugs introduced during this merge window.

Specifics:

1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events

After the recent ACPIPHP changes we've seen some interesting
breakage on a system that triggers device check notifications
during boot for non-existing devices. Although those
notifications are really spurious, we should be able to deal with
them nevertheless and that shouldn't introduce too much overhead.
Four commits to make that work properly.

2) Memory hotplug and hibernation mutual exclusion rework

This was meant to be a cleanup, but it happens to fix a classical
ABBA deadlock between system suspend/hibernation and ACPI memory
hotplug which is possible if they are started roughly at the same
time. Three commits rework memory hotplug so that it doesn't
acquire pm_mutex and make hibernation use device_hotplug_lock
which prevents it from racing with memory hotplug.

3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix

The ACPI LPSS driver crashes during boot on Apple Macbook Air with
Haswell that has slightly unusual BIOS configuration in which one
of the LPSS device's _CRS method doesn't return all of the
information expected by the driver. Fix from Mika Westerberg, for
stable.

4) ACPICA fix related to Store->ArgX operation

AML interpreter fix for obscure breakage that causes AML to be
executed incorrectly on some machines (observed in practice).
From Bob Moore.

5) ACPI core fix for PCI ACPI device objects lookup

There still are cases in which there is more than one ACPI device
object matching a given PCI device and we don't choose the one
that the BIOS expects us to choose, so this makes the lookup take
more criteria into account in those cases.

6) Fix to prevent cpuidle from crashing in some rare cases

If the result of cpuidle_get_driver() is NULL, which can happen on
some systems, cpuidle_driver_ref() will crash trying to use that
pointer and Daniel Fu's fix prevents that from happening.

7) cpufreq fixes related to CPU hotplug

Stephen Boyd reported a number of concurrency problems with
cpufreq related to CPU hotplug which are addressed by a series of
fixes from Srivatsa S Bhat and Viresh Kumar.

8) cpufreq fix for time conversion in time_in_state attribute

Time conversion carried out by cpufreq when user space attempts to
read /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state
won't work correctly if cputime_t doesn't map directly to jiffies.
Fix from Andreas Schwab.

9) Revert of a troublesome cpufreq commit

Commit 7c30ed5 (cpufreq: make sure frequency transitions are
serialized) was intended to address some known concurrency
problems in cpufreq related to the ordering of transitions, but
unfortunately it introduced several problems of its own, so I
decided to revert it now and address the original problems later
in a more robust way.

10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle.

11) cpufreq fixes related to system suspend/resume

The recent cpufreq changes that made it preserve CPU sysfs
attributes over suspend/resume cycles introduced a possible NULL
pointer dereference that caused it to crash during the second
attempt to suspend. Three commits from Srivatsa S Bhat fix that
problem and a couple of related issues.

12) cpufreq locking fix

cpufreq_policy_restore() should acquire the lock for reading, but
it acquires it for writing. Fix from Lan Tianyu"

* tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
cpufreq: Acquire the lock in cpufreq_policy_restore() for reading
cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu
cpufreq: Restructure if/else block to avoid unintended behavior
cpufreq: Fix crash in cpufreq-stats during suspend/resume
intel_pstate: Add Haswell CPU models
Revert "cpufreq: make sure frequency transitions are serialized"
cpufreq: Use signed type for 'ret' variable, to store negative error values
cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
cpufreq: Split __cpufreq_remove_dev() into two parts
cpufreq: Fix wrong time unit conversion
cpufreq: serialize calls to __cpufreq_governor()
cpufreq: don't allow governor limits to be changed when it is disabled
ACPI / bind: Prefer device objects with _STA to those without it
ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks
ACPI / hotplug / PCI: Use _OST to notify firmware about notify status
ACPI / hotplug / PCI: Avoid doing too much for spurious notifies
ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field.
ACPI / hotplug / PCI: Don't trim devices before scanning the namespace
...

Linus Torvalds
2013-09-13 02:22:45 +0800
75acebf24 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Various fixes.

The -g perf report lockup you reported is only partially addressed,
patches that fix the excessive runtime are still being worked on"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix uncore PCI fixed counter handling
uprobes: Fix utask->depth accounting in handle_trampoline()
perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING
perf: Fix up MMAP2 buffer space reservation
perf tools: Add attr->mmap2 support
perf kvm: Fix sample_type manipulation
perf evlist: Fix id pos in perf_evlist__open()
perf trace: Handle perf.data files with no tracepoints
perf session: Separate progress bar update when processing events
perf trace: Check if MAP_32BIT is defined
perf hists: Fix formatting of long symbol names
perf evlist: Fix parsing with no sample_id_all bit set
perf tools: Add test for parsing with no sample_id_all bit
perf trace: Check control+C more often

Linus Torvalds
2013-09-13 01:44:54 +0800
b55ee2816 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
"Performance regression fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix load balancing performance regression in should_we_balance()

Linus Torvalds
2013-09-13 01:44:13 +0800
fc840914e sched/debug: Take PID namespace into account

Emmanuel reported that /proc/sched_debug didn't report the right PIDs
when using namespaces, cure this.

Reported-by: Emmanuel Deloget
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20130909110141.GM31370@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar

Peter Zijlstra
2013-09-13 01:14:16 +0800
6c9a27f5d sched/fair: Fix small race where child->se.parent,cfs_rq might point to invalid ones

There is a small race between copy_process() and cgroup_attach_task()
where child->se.parent,cfs_rq points to invalid (old) ones.

  parent doing fork()          | someone moving the parent to another cgroup
-------------------------------+---------------------------------------------
  copy_process()
    + dup_task_struct()
      -> parent->se is copied to child->se.
         se.parent,cfs_rq of them point to old ones.

                                 cgroup_attach_task()
                                   + cgroup_task_migrate()
                                     -> parent->cgroup is updated.
                                   + cpu_cgroup_attach()
                                     + sched_move_task()
                                       + task_move_group_fair()
                                         +- set_task_rq()
                                            -> se.parent,cfs_rq of parent
                                               are updated.

    + cgroup_fork()
      -> parent->cgroup is copied to child->cgroup. (*1)
    + sched_fork()
      + task_fork_fair()
        -> se.parent,cfs_rq of child are accessed
           while they point to old ones. (*2)

In the worst case, this bug can lead to "use-after-free" and cause a panic,
because it's new cgroup's refcount that is incremented at (*1),
so the old cgroup(and related data) can be freed before (*2).

In fact, a panic caused by this bug was originally caught in RHEL6.4.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] sched_slice+0x6e/0xa0
[...]
Call Trace:
[] place_entity+0x75/0xa0
[] task_fork_fair+0xaa/0x160
[] sched_fork+0x6b/0x140
[] copy_process+0x5b2/0x1450
[] ? wake_up_new_task+0xd9/0x130
[] do_fork+0x94/0x460
[] ? sys_wait4+0xae/0x100
[] sys_clone+0x28/0x30
[] stub_clone+0x13/0x20
[] ? system_call_fastpath+0x16/0x1b

Signed-off-by: Daisuke Nishimura
Signed-off-by: Peter Zijlstra
Cc:
Link: http://lkml.kernel.org/r/039601ceae06$733d3130$59b79390$@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar

Daisuke Nishimura
2013-09-13 01:14:14 +0800

12 Sep, 2013

24 commits

878b5a6ef uprobes: Fix utask->depth accounting in handle_trampoline()

Currently utask->depth is simply the number of allocated/pending
return_instance's in uprobe_task->return_instances list.

handle_trampoline() should decrement this counter every time we
handle/free an instance, but due to typo it does this only if
->chained == T. This means that in the likely case this counter
is never decremented and the probed task can't report more than
MAX_URETPROBE_DEPTH events.
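
The accounting problem can be illustrated with a reduced model of the loop. This is only a sketch: struct sketch_ri and both functions are hypothetical stand-ins, and the loop bodies omit everything the real handle_trampoline() does apart from walking the list and adjusting the counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending return instance. */
struct sketch_ri {
	bool chained;
	struct sketch_ri *next;
};

/* Buggy shape: depth only drops when the handled instance was chained. */
static struct sketch_ri *sketch_handle_buggy(struct sketch_ri *ri, int *depth)
{
	while (ri) {
		bool chained = ri->chained;

		ri = ri->next;
		if (!chained)
			break;
		(*depth)--;
	}
	return ri;
}

/* Fixed shape: depth drops once for every handled instance. */
static struct sketch_ri *sketch_handle_fixed(struct sketch_ri *ri, int *depth)
{
	while (ri) {
		bool chained = ri->chained;

		ri = ri->next;
		(*depth)--;
		if (!chained)
			break;
	}
	return ri;
}
```

For a single non-chained instance with a depth of 1, the buggy shape leaves the depth at 1 while the fixed shape brings it back to 0, which is why the counter only ever grew in the common case.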

Reported-by: Mikhail Kulemin
Reported-by: Hemant Kumar Shaw
Signed-off-by: Oleg Nesterov
Acked-by: Anton Arapov
Cc: masami.hiramatsu.pt@hitachi.com
Cc: srikar@linux.vnet.ibm.com
Cc: systemtap@sourceware.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20130911154726.GA8093@redhat.com
Signed-off-by: Ingo Molnar

Oleg Nesterov
2013-09-12 14:00:55 +0800
7bd360144 timekeeping: Fix HRTICK related deadlock from ntp lock changes

Gerlando Falauto reported that when HRTICK is enabled, it is
possible to trigger system deadlocks. These were hard to
reproduce, as HRTICK has been broken in the past, but seemed
to be connected to the timekeeping_seq lock.

Since seqlock/seqcount's aren't supported w/ lockdep, I added
some extra spinlock based locking and triggered the following
lockdep output:

[ 15.849182] ntpd/4062 is trying to acquire lock:
[ 15.849765] (&(&pool->lock)->rlock){..-...}, at: [] __queue_work+0x145/0x480
[ 15.850051]
[ 15.850051] but task is already holding lock:
[ 15.850051] (timekeeper_lock){-.-.-.}, at: [] do_adjtimex+0x7f/0x100

[ 15.850051] Chain exists of: &(&pool->lock)->rlock --> &p->pi_lock --> timekeeper_lock
[ 15.850051] Possible unsafe locking scenario:
[ 15.850051]
[ 15.850051]        CPU0                    CPU1
[ 15.850051]        ----                    ----
[ 15.850051]   lock(timekeeper_lock);
[ 15.850051]                                lock(&p->pi_lock);
[ 15.850051]                                lock(timekeeper_lock);
[ 15.850051]   lock(&(&pool->lock)->rlock);
[ 15.850051]
[ 15.850051] *** DEADLOCK ***

The deadlock was introduced by 06c017fdd4dc48451a ("timekeeping:
Hold timekeepering locks in do_adjtimex and hardpps") in 3.10

This patch avoids this deadlock, by moving the call to
schedule_delayed_work() outside of the timekeeper lock
critical section.
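
The shape of the fix (decide under the lock, act after dropping it) can be sketched as follows. This is a single-threaded model under loud assumptions: the lock is just a flag, and sketch_queue_work() and sketch_adjust_time() are hypothetical stand-ins for schedule_delayed_work() and its caller.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of a lock: just a "held" flag. */
static bool sketch_lock_held;
/* Records whether the deferred work was ever queued under the lock. */
static bool sketch_queued_under_lock;

/* Stand-in for queueing work, which may take other locks internally. */
static void sketch_queue_work(void)
{
	if (sketch_lock_held)
		sketch_queued_under_lock = true;
}

/* Decide under the lock, act only after dropping it. */
static void sketch_adjust_time(bool needs_notify)
{
	bool notify;

	sketch_lock_held = true;
	/* ... update the protected state here ... */
	notify = needs_notify;
	sketch_lock_held = false;

	if (notify)
		sketch_queue_work();
}
```

Because the work is queued after the critical section ends, the lock taken while queueing never nests inside the modelled lock, which removes that edge from the dependency chain shown in the lockdep report.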

Reported-by: Gerlando Falauto
Tested-by: Lin Ming
Signed-off-by: John Stultz
Cc: Mathieu Desnoyers
Cc: stable #3.11, 3.10
Link: http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar

John Stultz
2013-09-12 13:49:51 +0800
6723734cd panic: call panic handlers before kmsg_dump

Since the panic handlers may produce additional information (via printk)
for the kernel log, it should be reported as part of the panic output
saved by kmsg_dump(). Without this re-ordering, nothing that adds
information to a panic will show up in pstore's view when kmsg_dump runs,
and is therefore not visible to crash reporting tools that examine pstore
output.

Signed-off-by: Kees Cook
Cc: Anton Vorontsov
Cc: Colin Cross
Acked-by: Tony Luck
Cc: Stephen Boyd
Cc: Vikram Mulukutla
Cc: Peter Zijlstra
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2013-09-12 06:59:30 +0800
80c74f6a4 kexec: remove unnecessary return

Execution can never reach this point, so remove the unnecessary return.

Signed-off-by: Xishi Qiu
Suggested-by: Zhang Yanfei
Reviewed-by: Simon Horman
Reviewed-by: Zhang Yanfei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xishi Qiu
2013-09-12 06:59:10 +0800
73af963f9 __ptrace_may_access() should not deny sub-threads

__ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task !=
current; this can lead to surprising results.

For example, a sub-thread can't readlink("/proc/self/exe") if the
executable is not readable. setup_new_exec()->would_dump() notices that
inode_permission(MAY_READ) fails and then it does
set_dumpable(suid_dumpable). After that get_dumpable() fails.

(It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we
could add PTRACE_MODE_NODUMPABLE)

Change __ptrace_may_access() to use same_thread_group() instead of "task
== current". Any security check is pointless when the tasks share the
same ->mm.

Signed-off-by: Mark Grondona
Signed-off-by: Ben Woodard
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mark Grondona
2013-09-12 06:59:01 +0800
af96397de kprobes: allow to specify custom allocator for insn caches

The current two insn slot caches both use module_alloc/module_free to
allocate and free insn slot cache pages.

For s390 this is not sufficient since there is the need to allocate insn
slots that are either within the vmalloc module area or within dma memory.

Therefore add a mechanism which allows a custom allocator to be specified
for an insn slot cache.

Signed-off-by: Heiko Carstens
Acked-by: Masami Hiramatsu
Cc: Ananth N Mavinakayanahalli
Cc: Ingo Molnar
Cc: Martin Schwidefsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2013-09-12 06:58:52 +0800
c802d64a3 kprobes: unify insn caches

The current kprobes insn caches allocate memory areas for insn slots
with module_alloc(). The assumption is that the kernel image and module
area are both within the same +/- 2GB memory area.

This however is not true for s390 where the kernel image resides within
the first 2GB (DMA memory area), but the module area is far away in the
vmalloc area, usually somewhere close below the 4TB area.

For new pc relative instructions s390 needs insn slots that are within
+/- 2GB of each area. That way we can patch displacements of
pc-relative instructions within the insn slots just like x86 and
powerpc.

The module area works already with the normal insn slot allocator,
however there is currently no way to get insn slots that are within the
first 2GB on s390 (aka DMA area).

Therefore this patch set modifies the kprobes insn slot cache code in
order to allow a custom allocator to be specified for the insn slot cache
pages. In addition, architectures can now have private insn slot caches
without the need to modify common code.

Patch 1 unifies and simplifies the current insn and optinsn caches
implementation. This is a preparation which allows to add more
insn caches in a simple way.

Patch 2 adds the possibility to specify a custom allocator.

Patch 3 makes s390 use the new insn slot mechanisms and adds support for
pc-relative instructions with long displacements.

This patch (of 3):

The two insn caches (insn, and optinsn) each have an own mutex and
alloc/free functions (get_[opt]insn_slot() / free_[opt]insn_slot()).

Since there is the need for yet another insn cache which satisfies dma
allocations on s390, unify and simplify the current implementation:

- Move the per insn cache mutex into struct kprobe_insn_cache.
- Move the alloc/free functions to kprobe.h so they are simply
wrappers for the generic __get_insn_slot/__free_insn_slot functions.
The implementation is done with a DEFINE_INSN_CACHE_OPS() macro
which provides the alloc/free functions for each cache if needed.
- move the struct kprobe_insn_cache to kprobe.h which allows to generate
architecture specific insn slot caches outside of the core kprobes
code.
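
The wrapper-generating idea in the list above can be sketched in plain C. Everything here is a hypothetical, heavily reduced model: struct sketch_insn_cache, the generic helpers and SKETCH_DEFINE_CACHE_OPS() only mimic the structure the message describes and are not the kernel's definitions.

```c
/* Hypothetical, much reduced stand-in for struct kprobe_insn_cache. */
struct sketch_insn_cache {
	int locked;	/* models the per-cache mutex */
	int nr_used;	/* slots currently handed out */
};

/* Generic helpers shared by every cache. */
static int sketch_get_slot(struct sketch_insn_cache *c)
{
	int slot;

	c->locked = 1;
	slot = c->nr_used++;
	c->locked = 0;
	return slot;
}

static void sketch_free_slot(struct sketch_insn_cache *c)
{
	c->locked = 1;
	if (c->nr_used > 0)
		c->nr_used--;
	c->locked = 0;
}

/* Generate a cache instance plus thin get/free wrappers for it. */
#define SKETCH_DEFINE_CACHE_OPS(name)					\
static struct sketch_insn_cache sketch_##name##_cache;			\
static int sketch_get_##name##_slot(void)				\
{									\
	return sketch_get_slot(&sketch_##name##_cache);			\
}									\
static void sketch_free_##name##_slot(void)				\
{									\
	sketch_free_slot(&sketch_##name##_cache);			\
}

SKETCH_DEFINE_CACHE_OPS(insn)
SKETCH_DEFINE_CACHE_OPS(optinsn)
```

Adding another cache in this model is a one-line macro invocation, which is the property the message is after when it mentions architecture specific caches outside of the core code.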

Signed-off-by: Heiko Carstens
Cc: Masami Hiramatsu
Cc: Ananth N Mavinakayanahalli
Cc: Ingo Molnar
Cc: Martin Schwidefsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2013-09-12 06:58:52 +0800
892f6668f task_work: documentation

No functional changes, just comments.

Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:58:27 +0800
205e550a0 task_work: minor cleanups

Trivial. Remove the unnecessary "work = NULL" initialization and turn
read_barrier_depends() into smp_read_barrier_depends() in
task_work_cancel().

Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:58:26 +0800
202da4005 kernel/smp.c: quit unconditionally enabling irqs in on_each_cpu_mask().

As in commit f21afc25f9ed ("smp.h: Use local_irq_{save,restore}() in
!SMP version of on_each_cpu()"), we don't want to enable irqs if they
are not already enabled.

I don't know of any bugs currently caused by this unconditional
local_irq_enable(), but I want to use this function in MIPS/OCTEON early
boot (when we have early_boot_irqs_disabled). This also makes this
function have similar semantics to on_each_cpu() which is good in
itself.
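
The difference between the two patterns can be modelled in userspace. This is a sketch under stated assumptions: a plain boolean stands in for the CPU interrupt-enable state, and all sketch_* names are hypothetical, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical model of the CPU interrupt-enable flag. */
static bool sketch_irqs_enabled;

/* Pattern being removed: disable, run, then enable unconditionally. */
static void sketch_call_unconditional(void (*fn)(void))
{
	sketch_irqs_enabled = false;
	fn();
	sketch_irqs_enabled = true;	/* wrong if the caller had irqs off */
}

/* Save/restore pattern: put the flag back the way the caller had it. */
static void sketch_call_save_restore(void (*fn)(void))
{
	bool saved = sketch_irqs_enabled;

	sketch_irqs_enabled = false;
	fn();
	sketch_irqs_enabled = saved;
}

/* Trivial callback used to exercise the two helpers. */
static void sketch_noop(void)
{
}
```

If the caller enters with the flag cleared, as in early boot, the unconditional variant returns with it set, while the save/restore variant returns with it still cleared.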

Signed-off-by: David Daney
Cc: Gilad Ben-Yossef
Cc: Christoph Lameter
Cc: Chris Metcalf
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Daney
2013-09-12 06:58:25 +0800
e656a6341 extable: skip sorting if the table is empty

At least on ARM no-MMU the extable is empty and so there is nothing to
sort. So add a check for the table to be empty which effectively only
changes that the misleading pr_notice is suppressed.
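
The added check amounts to an early return before sorting. A minimal userspace sketch, assuming a reduced entry type and using qsort() in place of the kernel's sort routine (struct sketch_entry and sketch_sort_table() are hypothetical names):

```c
#include <stdlib.h>

/* Hypothetical, reduced exception-table entry. */
struct sketch_entry {
	unsigned long insn;
};

/* Order entries by ascending instruction address. */
static int sketch_cmp(const void *a, const void *b)
{
	const struct sketch_entry *x = a, *y = b;

	if (x->insn < y->insn)
		return -1;
	return x->insn > y->insn;
}

/* Returns 1 if the table was sorted, 0 if it was empty and skipped. */
static int sketch_sort_table(struct sketch_entry *start,
			     struct sketch_entry *finish)
{
	if (start >= finish)
		return 0;	/* nothing to sort, nothing to announce */

	qsort(start, (size_t)(finish - start), sizeof(*start), sketch_cmp);
	return 1;
}
```

A caller would print its notice only when the function reports that sorting actually happened.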

Signed-off-by: Uwe Kleine-König
Cc: Ingo Molnar
Cc: David Daney
Cc: "H. Peter Anvin"
Cc: Borislav Petkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Uwe Kleine-König
2013-09-12 06:58:25 +0800
bff2dc42b smp.h: move !SMP version of on_each_cpu() out-of-line

All of the other non-trivial !SMP versions of functions in smp.h are
out-of-line in up.c. Move on_each_cpu() there as well.

This allows us to get rid of the #include . The
drawback is that this makes both the x86_64 and i386 defconfig !SMP
kernels about 200 bytes larger each.

Signed-off-by: David Daney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Daney
2013-09-12 06:58:25 +0800
081192b25 up.c: use local_irq_{save,restore}() in smp_call_function_single.

The SMP version of this function doesn't unconditionally enable irqs, so
neither should this !SMP version. There are no known problems caused by
this, but we make the change for consistency's sake.

Signed-off-by: David Daney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Daney
2013-09-12 06:58:25 +0800
fa688207c smp: quit unconditionally enabling irq in on_each_cpu_mask and on_each_cpu_cond

As in commit f21afc25f9ed ("smp.h: Use local_irq_{save,restore}() in
!SMP version of on_each_cpu()"), we don't want to enable irqs if they
are not already enabled. There are currently no known problematical
callers of these functions, but since it is a known failure pattern, we
preemptively fix them.

Since they are not trivial functions, make them non-inline by moving
them to up.c. This also makes it so we don't have to fix #include
dependencies for preempt_{disable,enable}.

Signed-off-by: David Daney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Daney
2013-09-12 06:58:23 +0800
c14c338cb kernel/spinlock.c: add default arch_*_relax definitions for GENERIC_LOCKBREAK

When running with GENERIC_LOCKBREAK=y, the locking implementations emit
calls to arch_{read,write,spin}_relax when spinning on a contended lock
in order to allow architectures to favour the CPU owning the lock if
possible.

In reality, everybody apart from PowerPC and S390 just does cpu_relax()
here, so make that the default behaviour and allow it to be overridden
if required.
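
The "default unless overridden" arrangement is the usual #ifndef pattern. A minimal sketch with hypothetical sketch_* names standing in for cpu_relax() and arch_spin_relax(); the call counter exists only so the behaviour can be observed:

```c
/* Counts how often the generic fallback ran (for illustration only). */
static int sketch_cpu_relax_calls;

static void sketch_cpu_relax(void)
{
	sketch_cpu_relax_calls++;
}

/*
 * An architecture that wants special behaviour would define
 * sketch_arch_spin_relax before this point; everyone else gets
 * the generic default.
 */
#ifndef sketch_arch_spin_relax
# define sketch_arch_spin_relax(lock)	sketch_cpu_relax()
#endif

/* Spin until *lock reads 0 or the budget runs out; returns 1 on success. */
static int sketch_spin_wait(volatile int *lock, int budget)
{
	while (*lock && budget-- > 0)
		sketch_arch_spin_relax(lock);
	return *lock == 0;
}
```

Architectures with nothing special to do no longer need to provide their own definition, and those that do simply define the macro first.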

Signed-off-by: Will Deacon
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Will Deacon
2013-09-12 06:58:21 +0800
60c323699 kernel/smp.c: free related resources when failure occurs in hotplug_cfd()

When a failure occurs in hotplug_cfd(), the related resources need to be
released, otherwise memory is leaked.
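
The general pattern is to undo earlier allocations when a later one fails. A userspace sketch with malloc()/free(); struct sketch_cfd, its fields, and the fail_second test hook are hypothetical and only mirror the idea, not the actual hotplug_cfd() code:

```c
#include <stdlib.h>

/* Hypothetical pair of resources that are set up together. */
struct sketch_cfd {
	void *cpumask;
	void *csd;
};

/*
 * Allocate both resources; on failure release what was already
 * allocated. fail_second is a test hook that forces the second
 * step to fail.
 */
static int sketch_prepare(struct sketch_cfd *cfd, int fail_second)
{
	cfd->cpumask = malloc(64);
	if (!cfd->cpumask)
		return -1;

	cfd->csd = fail_second ? NULL : malloc(64);
	if (!cfd->csd) {
		free(cfd->cpumask);	/* the step that must not be forgotten */
		cfd->cpumask = NULL;
		return -1;
	}
	return 0;
}

/* Release both resources after a successful sketch_prepare(). */
static void sketch_release(struct sketch_cfd *cfd)
{
	free(cfd->csd);
	free(cfd->cpumask);
	cfd->csd = cfd->cpumask = NULL;
}
```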

Signed-off-by: Chen Gang
Acked-by: Wang YanQing
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chen Gang
2013-09-12 06:58:21 +0800
54a33b1b1 kernel/modsign_pubkey.c: fix init const for module signing code

const has to use __initconst, not __initdata

Signed-off-by: Andi Kleen
Acked-by: David Howells
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2013-09-12 06:58:21 +0800
3ddc5b46a kernel-wide: fix missing validations on __get/__put/__copy_to/__copy_from_user()

I found the following patterns that lead to interesting findings:

grep -r "ret.*|=.*__put_user" *
grep -r "ret.*|=.*__get_user" *
grep -r "ret.*|=.*__copy" *

The __put_user() calls are in compat_ioctl.c, ptrace compat and signal
compat. Since those appear in compat code, we could probably expect the
kernel addresses not to be reachable in the lower 32-bit range, so I think
they might not be exploitable.

For the "__get_user" cases, I don't think those are exploitable: the worst
that can happen is that the kernel will copy kernel memory into in-kernel
buffers, and will fail immediately afterward.

The alpha csum_partial_copy_from_user() seems to be missing the
access_ok() check entirely. The fix is inspired from x86. This could
lead to an information leak on alpha. I also noticed that many
architectures map csum_partial_copy_from_user() to
csum_partial_copy_generic(), but I wonder if the latter is performing the
access checks on every architecture.

Signed-off-by: Mathieu Desnoyers
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Jens Axboe
Cc: Oleg Nesterov
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2013-09-12 06:58:18 +0800
86cdb465c mm: prepare to remove /proc/sys/vm/hugepages_treat_as_movable

Hugepage migration is now enabled, although restricted to pmd-based
hugepages for now (due to lack of testing), so we should allocate
migratable hugepages from ZONE_MOVABLE if possible.

This patch makes GFP flags in hugepage allocation dependent on migration
support, not only the value of hugepages_treat_as_movable. It does not
change the behavior for architectures which do not support hugepage
migration.

Signed-off-by: Naoya Horiguchi
Acked-by: Andi Kleen
Reviewed-by: Wanpeng Li
Cc: Hillf Danton
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: KOSAKI Motohiro
Cc: Michal Hocko
Cc: Rik van Riel
Cc: "Aneesh Kumar K.V"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2013-09-12 06:57:49 +0800
c33bc315f mm: use zone_end_pfn() instead of zone_start_pfn+spanned_pages

Use "zone_end_pfn()" instead of "zone->zone_start_pfn + zone->spanned_pages".
Simplify the code, no functional change.
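
The helper simply names the sum that callers used to open-code. A sketch with a hypothetical two-field struct sketch_zone (the real struct zone has many more members), plus an example caller:

```c
/* Hypothetical, reduced struct zone with only the two fields involved. */
struct sketch_zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* First page frame number past the end of the zone. */
static unsigned long sketch_zone_end_pfn(const struct sketch_zone *zone)
{
	return zone->zone_start_pfn + zone->spanned_pages;
}

/* Example caller: does pfn fall inside [start, end)? */
static int sketch_zone_spans_pfn(const struct sketch_zone *zone,
				 unsigned long pfn)
{
	return pfn >= zone->zone_start_pfn && pfn < sketch_zone_end_pfn(zone);
}
```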

[akpm@linux-foundation.org: fix build]
Signed-off-by: Xishi Qiu
Cc: Cody P Schafer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xishi Qiu
2013-09-12 06:57:36 +0800
ef0855d33 mm: mempolicy: turn vma_set_policy() into vma_dup_policy()

Simple cleanup. Every user of vma_set_policy() does the same work, which
looks a bit annoying imho. Add a new trivial helper which does
mpol_dup() + vma_set_policy() to simplify the callers.

Signed-off-by: Oleg Nesterov
Cc: KOSAKI Motohiro
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Andi Kleen
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:57:00 +0800
40a0d32d1 fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks

do_fork() denies CLONE_THREAD | CLONE_PARENT if NEWUSER | NEWPID.

Then later copy_process() denies CLONE_SIGHAND if the new process will
be in a different pid namespace (task_active_pid_ns() doesn't match
current->nsproxy->pid_ns).

This looks confusing and inconsistent. CLONE_NEWPID is very similar to
the case when ->pid_ns was already unshared, we want the same
restrictions so copy_process() should also nack CLONE_PARENT.

And it would be better to deny CLONE_NEWUSER && CLONE_SIGHAND as well
just for consistency.

Kill the "CLONE_NEWUSER | CLONE_NEWPID" check in do_fork() and change
copy_process() to do the same check along with ->pid_ns check we already
have.
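
The unified check described above can be sketched as a pure function over the flag word. This is an illustration under assumptions, not the kernel code: the SKETCH_CLONE_* constants mirror the Linux CLONE_* flag values under different names, and the pid_ns_unshared parameter is a hypothetical stand-in for the ->pid_ns comparison.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values mirroring the Linux CLONE_* flags, renamed for the sketch. */
#define SKETCH_CLONE_SIGHAND	0x00000800UL
#define SKETCH_CLONE_PARENT	0x00008000UL
#define SKETCH_CLONE_NEWUSER	0x10000000UL
#define SKETCH_CLONE_NEWPID	0x20000000UL

/*
 * pid_ns_unshared is a hypothetical stand-in for
 * "task_active_pid_ns() doesn't match current->nsproxy->pid_ns".
 * Returns 0 if the flag combination is allowed, -EINVAL otherwise.
 */
static int sketch_check_clone_flags(unsigned long flags, bool pid_ns_unshared)
{
	if (flags & (SKETCH_CLONE_SIGHAND | SKETCH_CLONE_PARENT)) {
		if ((flags & (SKETCH_CLONE_NEWUSER | SKETCH_CLONE_NEWPID)) ||
		    pid_ns_unshared)
			return -EINVAL;
	}
	return 0;
}
```

In this model CLONE_PARENT and CLONE_SIGHAND are rejected both when a new user or pid namespace is requested and when the pid namespace was already unshared, so the two situations get the same restrictions.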

Signed-off-by: Oleg Nesterov
Acked-by: Andy Lutomirski
Cc: "Eric W. Biederman"
Cc: Colin Walters
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:56:20 +0800
5167246a8 pidns: kill the unnecessary CLONE_NEWPID in copy_process()

Commit 8382fcac1b81 ("pidns: Outlaw thread creation after
unshare(CLONE_NEWPID)") nacks CLONE_NEWPID if the forking process
unshared pid_ns. This is correct but unnecessary, copy_pid_ns() does
the same check.

Remove the CLONE_NEWPID check to cleanup the code and prepare for the
next change.

Test-case:

#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>

static int child(void *arg)
{
	return 0;
}

static char stack[16 * 1024];

int main(void)
{
	pid_t pid;

	assert(unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0);

	pid = clone(child, stack + sizeof(stack) / 2,
		    CLONE_NEWPID | SIGCHLD, NULL);
	assert(pid < 0 && errno == EINVAL);

	return 0;
}

clone(CLONE_NEWPID) correctly fails with or without this change.

Signed-off-by: Oleg Nesterov
Acked-by: Andy Lutomirski
Cc: "Eric W. Biederman"
Cc: Colin Walters
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:56:19 +0800
e79f525e9 pidns: fix vfork() after unshare(CLONE_NEWPID)

Commit 8382fcac1b81 ("pidns: Outlaw thread creation after
unshare(CLONE_NEWPID)") nacks CLONE_VM if the forking process unshared
pid_ns, this obviously breaks vfork:

#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <unistd.h>

int main(void)
{
	assert(unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0);
	assert(vfork() >= 0);
	_exit(0);
	return 0;
}

fails without this patch.

Change this check to use CLONE_SIGHAND instead. This also forbids
CLONE_THREAD automatically, and this is what the comment implies.

We could probably even drop CLONE_SIGHAND and use CLONE_THREAD, but it
would be safer to not do this. The current check denies CLONE_SIGHAND
implicitly and there is no reason to change this.

Eric said "CLONE_SIGHAND is fine. CLONE_THREAD would be even better.
Having shared signal handling between two different pid namespaces is
the case that we are fundamentally guarding against."
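
The effect of switching the check from CLONE_VM to CLONE_SIGHAND can be seen by comparing the two predicates on typical flag sets. This is a sketch only: the SKETCH_CLONE_* constants mirror the Linux CLONE_* flag values under different names, and both functions and the pid_ns_unshared parameter are hypothetical.

```c
#include <stdbool.h>

/* Flag values mirroring the Linux CLONE_* flags, renamed for the sketch. */
#define SKETCH_CLONE_VM		0x00000100UL
#define SKETCH_CLONE_SIGHAND	0x00000800UL
#define SKETCH_CLONE_VFORK	0x00004000UL
#define SKETCH_CLONE_THREAD	0x00010000UL

/* Old check: any CLONE_VM user is denied once pid_ns was unshared. */
static bool sketch_denied_old(unsigned long flags, bool pid_ns_unshared)
{
	return pid_ns_unshared && (flags & SKETCH_CLONE_VM);
}

/* New check: only shared signal handling is denied. */
static bool sketch_denied_new(unsigned long flags, bool pid_ns_unshared)
{
	return pid_ns_unshared && (flags & SKETCH_CLONE_SIGHAND);
}
```

vfork() corresponds to SKETCH_CLONE_VFORK | SKETCH_CLONE_VM here, so it is denied by the old predicate and allowed by the new one, while a thread-style flag set that includes SKETCH_CLONE_SIGHAND is denied by both.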

Signed-off-by: Oleg Nesterov
Reported-by: Colin Walters
Acked-by: Andy Lutomirski
Reviewed-by: "Eric W. Biederman"
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-09-12 06:56:19 +0800