22 Aug, 2012

1 commit

  • It seems commit 4a9d4b024a31 ("switch fput to task_work_add") reintroduced
    the problem addressed in 944be0b22472 ("close_files(): add scheduling
    point").

    If a server process with a lot of files (say 2 million TCP sockets) is
    killed, we can spend a lot of time in task_work_run() and trigger a soft
    lockup; a sketch of the kind of scheduling point that avoids this follows
    this entry.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Linus Torvalds

    Eric Dumazet
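
    The fix follows the same idea as the earlier close_files() change: drop a
    scheduling point into a potentially very long loop. Below is a minimal
    sketch, not the actual kernel code (work_item and run_pending_work() are
    illustrative names), of how such a loop avoids tripping the soft-lockup
    watchdog:

    #include <linux/sched.h>        /* cond_resched() */

    struct work_item {
            struct work_item *next;
            void (*func)(struct work_item *);
    };

    static void run_pending_work(struct work_item *head)
    {
            while (head) {
                    struct work_item *next = head->next;

                    head->func(head);       /* e.g. the final fput() of a file */
                    head = next;
                    cond_resched();         /* the added scheduling point */
            }
    }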
     

21 Aug, 2012

1 commit


19 Aug, 2012

2 commits

  • Merge alpha architecture update from Michael Cree:
    "The Alpha Maintainer, Matt Turner, is currently unavailable, so I have
    collected up patches that have been posted to the linux-alpha mailing
    list over the last couple of months, and am forwarding them to you in
    the hope that you are prepared to accept them via me.

    I have been running the patches by Al Viro and myself against kernels
    for two months now, so they have had quite a bit of testing. All except
    one patch were intended for the 3.5 kernel but because of Matt's
    unavailability never got forwarded to you."

    * emailed patches from Michael Cree: (9 commits)
    alpha: Fix fall-out from disintegrating asm/system.h
    Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts
    alpha: fix fpu.h usage in userspace
    alpha/mm/fault.c: Port OOM changes to do_page_fault
    alpha: take kernel_execve() out of entry.S
    alpha: take a bunch of syscalls into osf_sys.c
    alpha: Use new generic strncpy_from_user() and strnlen_user()
    alpha: Wire up cross memory attach syscalls
    alpha: Don't export SOCK_NONBLOCK to user space.

    Linus Torvalds
     
  • New helper: current_thread_info(). Allows us to do a bunch of odd syscalls
    in C. While we are at it, there has never been a reason to do
    osf_getpriority() in assembler. We also get "namespace"-aware (read:
    consistent with getuid(2), etc.) behaviour from the getx?id() syscalls now.

    Signed-off-by: Al Viro
    Signed-off-by: Michael Cree
    Acked-by: Matt Turner
    Signed-off-by: Linus Torvalds

    Al Viro
     

14 Aug, 2012

5 commits

  • Make the stop scheduler class do the same accounting as the other classes.

    Migration threads can be caught in the act while doing exec balancing,
    leading to the below due to use of unmaintained ->se.exec_start. The
    load that triggered this particular instance was an apparently out of
    control heavily threaded application that does system monitoring in
    what equated to an exec bomb, with one of the VERY frequently migrated
    tasks being ps.

    %CPU PID USER CMD
    99.3 45 root [migration/10]
    97.7 53 root [migration/12]
    97.0 57 root [migration/13]
    90.1 49 root [migration/11]
    89.6 65 root [migration/15]
    88.7 17 root [migration/3]
    80.4 37 root [migration/8]
    78.1 41 root [migration/9]
    44.2 13 root [migration/2]

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1344051854.6739.19.camel@marge.simpson.net
    Signed-off-by: Thomas Gleixner

    Mike Galbraith
     
  • Root task group bandwidth replenishment must service all CPUs, regardless of
    where the timer was last started, and regardless of the isolation mechanism,
    lest 'Quoth the Raven, "Nevermore"' become rt scheduling policy.

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1344326558.6968.25.camel@marge.simpson.net
    Signed-off-by: Thomas Gleixner

    Mike Galbraith
     
  • With multiple instances of task_groups, for_each_rt_rq() is a noop,
    no task groups having been added to the rt.c list instance. This
    renders __enable/disable_runtime() and print_rt_stats() noop, the
    user (non) visible effect being that rt task groups are missing in
    /proc/sched_debug.

    Signed-off-by: Mike Galbraith
    Cc: stable@kernel.org # v3.3+
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1344308413.6846.7.camel@marge.simpson.net
    Signed-off-by: Thomas Gleixner

    Mike Galbraith
     
  • On architectures where cputime_t is a 64 bit type, it is possible to
    trigger a divide by zero on the do_div(temp, (__force u32) total) line if
    total is a non-zero number whose lower 32 bits are all zero. Removing the
    cast is not a good solution, since some do_div() implementations cast to
    u32 internally.

    This problem can be triggered in practice on very long-lived processes (a
    short userspace illustration of the truncation follows this entry):

    PID: 2331 TASK: ffff880472814b00 CPU: 2 COMMAND: "oraagent.bin"
    #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
    #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
    #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
    #3 [ffff880472a51cd0] die at ffffffff8100f26b
    #4 [ffff880472a51d00] do_trap at ffffffff814f03f4
    #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
    #6 [ffff880472a51e00] divide_error at ffffffff8100be7b
    [exception RIP: thread_group_times+0x56]
    RIP: ffffffff81056a16 RSP: ffff880472a51eb8 RFLAGS: 00010046
    RAX: bc3572c9fe12d194 RBX: ffff880874150800 RCX: 0000000110266fad
    RDX: 0000000000000000 RSI: ffff880472a51eb8 RDI: 001038ae7d9633dc
    RBP: ffff880472a51ef8 R8: 00000000b10a3a64 R9: ffff880874150800
    R10: 00007fcba27ab680 R11: 0000000000000202 R12: ffff880472a51f08
    R13: ffff880472a51f10 R14: 0000000000000000 R15: 0000000000000007
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
    #8 [ffff880472a51f40] sys_times at ffffffff81088524
    #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
    RIP: 0000003808caac3a RSP: 00007fcba27ab6d8 RFLAGS: 00000202
    RAX: 0000000000000064 RBX: ffffffff8100b0f2 RCX: 0000000000000000
    RDX: 00007fcba27ab6e0 RSI: 000000000076d58e RDI: 00007fcba27ab6e0
    RBP: 00007fcba27ab700 R8: 0000000000000020 R9: 000000000000091b
    R10: 00007fcba27ab680 R11: 0000000000000202 R12: 00007fff9ca41940
    R13: 0000000000000000 R14: 00007fcba27ac9c0 R15: 00007fff9ca41940
    ORIG_RAX: 0000000000000064 CS: 0033 SS: 002b

    Cc: stable@vger.kernel.org
    Signed-off-by: Stanislaw Gruszka
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.com
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
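
    A short sketch in plain userspace C (not kernel code) of the truncation
    described above: casting a 64-bit total to u32 discards the high bits, so
    a huge but non-zero total whose low 32 bits are zero becomes a zero
    divisor, exactly what do_div(temp, (__force u32) total) hit here.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint64_t total = 0x100000000ULL;    /* non-zero, low 32 bits all zero */
            uint64_t temp = 123456789ULL;
            uint32_t divisor = (uint32_t)total; /* what the (__force u32) cast does */

            printf("total=%llu divisor=%u\n",
                   (unsigned long long)total, divisor);

            if (divisor == 0) {
                    printf("dividing temp by divisor would trap, as in the crash above\n");
                    return 1;
            }

            printf("temp/divisor = %llu\n", (unsigned long long)(temp / divisor));
            return 0;
    }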
     
  • Peter Portante reported that for large cgroup hierarchies (and/or on
    large CPU counts) we get immense lock contention on rq->lock and stuff
    stops working properly.

    His workload was a ton of processes, each in their own cgroup,
    everybody idling except for a sporadic wakeup once every so often.

    It was found that:

    schedule()
      idle_balance()
        load_balance()
          local_irq_save()
          double_rq_lock()
          update_h_load()
            walk_tg_tree(tg_load_down)
              tg_load_down()

    Results in an entire cgroup hierarchy walk under rq->lock for every
    new-idle balance, and since new-idle balance isn't throttled, this
    results in a lot of work while holding rq->lock.

    This patch does two things: it removes the work from under rq->lock,
    based on the good principle of "race and pray" which is widely employed
    in the load-balancer as a whole, and it throttles the update_h_load()
    calculation to at most once per jiffy (a sketch of this throttling
    follows this entry).

    I considered excluding update_h_load() for new-idle balance
    altogether, but purely relying on regular balance passes to update
    this data might not work out under some rare circumstances where the
    new-idle busiest isn't the regular busiest for a while (unlikely, but
    a nightmare to debug if someone hits it and suffers).

    Cc: pjt@google.com
    Cc: Larry Woodman
    Cc: Mike Galbraith
    Reported-by: Peter Portante
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-aaarrzfpnaam7pqrekofu8a6@git.kernel.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
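
    A minimal sketch of the once-per-jiffy throttle described above. The
    names are illustrative, not the actual scheduler code; the real walk
    happens via walk_tg_tree(tg_load_down, ...):

    #include <linux/jiffies.h>

    static unsigned long h_load_throttle;   /* jiffy of the last refresh */

    static void walk_tg_tree_and_update(void)
    {
            /* stand-in for the expensive cgroup-hierarchy walk */
    }

    static void update_h_load_throttled(void)
    {
            unsigned long now = jiffies;

            if (h_load_throttle == now)     /* already refreshed this jiffy */
                    return;
            h_load_throttle = now;

            walk_tg_tree_and_update();
    }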
     

13 Aug, 2012

2 commits

  • Pull power management fixes from Rafael J. Wysocki:

    - Fix for two recent regressions in the generic PM domains framework.

    - Revert of a commit that introduced a resume regression and is
    conceptually incorrect in my opinion.

    - Fix for a return value in pcc-cpufreq.c from Julia Lawall.

    - RTC wakeup signaling fix from Neil Brown.

    - Suppression of compiler warnings for CONFIG_PM_SLEEP unset in ACPI,
    platform/x86 and TPM drivers.

    * tag 'pm-for-3.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    tpm_tis / PM: Fix unused function warning for CONFIG_PM_SLEEP
    platform / x86 / PM: Fix unused function warnings for CONFIG_PM_SLEEP
    ACPI / PM: Fix unused function warnings for CONFIG_PM_SLEEP
    Revert "NMI watchdog: fix for lockup detector breakage on resume"
    PM: Make dev_pm_get_subsys_data() always return 0 on success
    drivers/cpufreq/pcc-cpufreq.c: fix error return code
    RTC: Avoid races between RTC alarm wakeup and suspend.

    Linus Torvalds
     
  • While tracking down a weird buffer overflow issue in a program that
    looked to be sane, I started double checking the length returned by
    syslog(SYSLOG_ACTION_READ_ALL, ...) to make sure it wasn't overflowing
    the buffer.

    Sure enough, it was. I saw this in strace:

    11339 syslog(SYSLOG_ACTION_READ_ALL, "[244017.708129] REISERFS (dev"..., 8192) = 8279

    It turns out that the loops that calculate how much space the entries
    will take when they're copied don't include the newlines and prefixes
    that will be included in the final output, since the prev flags are
    passed as zero.

    This patch properly accounts for them and fixes the overflow (a
    simplified illustration of the mismatch follows this entry).

    CC: stable@kernel.org
    Signed-off-by: Jeff Mahoney
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
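
    A simplified sketch of the accounting mismatch (the structures here are
    illustrative, not the actual printk record code): if the sizing pass
    formats records without the prefix and trailing newline that the copy
    pass emits, the reported length can exceed the caller's buffer.

    #include <stddef.h>

    struct record {
            size_t text_len;        /* message body only */
            size_t prefix_len;      /* "[244017.708129] " style prefix */
    };

    /* length of one record as it appears in the output buffer */
    static size_t record_len(const struct record *r, int with_prefix)
    {
            return r->text_len + (with_prefix ? r->prefix_len : 0) + 1; /* '\n' */
    }

    /* The buggy sizing pass used with_prefix = 0; the fix is to size records
     * with the same settings the copy loop uses, so both passes agree. */
    static size_t total_len(const struct record *recs, size_t n, int with_prefix)
    {
            size_t len = 0;
            size_t i;

            for (i = 0; i < n; i++)
                    len += record_len(&recs[i], with_prefix);
            return len;
    }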
     

09 Aug, 2012

1 commit

  • Revert commit 45226e9 (NMI watchdog: fix for lockup detector breakage
    on resume) which breaks resume from system suspend on my SH7372
    Mackerel board (by causing a NULL pointer dereference to happen) and
    is generally wrong, because it abuses the CPU hotplug functionality
    in a shamelessly blatant way.

    The original issue should be addressed through an appropriate syscore
    resume callback instead.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

05 Aug, 2012

1 commit

  • Tetsuo Handa reported that sporadically the system clock starts
    counting up too quickly, which is enough to confuse the hangcheck
    timer into printing a bogus stall warning.

    Commit 2a8c0883 "time: Move xtime_nsec adjustment underflow handling
    timekeeping_adjust" overlooked this exit path:

    } else
            return;

    which should really be a proper exit sequence, fixing the bug as a
    side effect.

    Also make the flow more readable by properly balancing curly
    braces.

    Reported-by: Tetsuo Handa
    Tested-by: Tetsuo Handa
    Signed-off-by: Ingo Molnar
    Cc: john.stultz@linaro.org
    Cc: a.p.zijlstra@chello.nl
    Cc: richardcochran@gmail.com
    Cc: prarit@redhat.com
    Link: http://lkml.kernel.org/r/20120804192114.GA28347@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

04 Aug, 2012

6 commits


02 Aug, 2012

1 commit

  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     

01 Aug, 2012

9 commits

  • Pull irqdomain changes from Grant Likely:
    "Round of refactoring and enhancements to irq_domain infrastructure.
    This series starts the process of simplifying irqdomain. The ultimate
    goal is to merge LEGACY, LINEAR and TREE mappings into a single
    system, but had to back off from that after some last minute bugs.
    Instead it mainly reorganizes the code and ensures that the reverse
    map gets populated when the irq is mapped instead of the first time it
    is looked up.

    Merging of the irq_domain types is deferred to v3.7

    In other news, this series adds helpers for creating static mappings
    on a linear or tree mapping."

    * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
    irqdomain: Improve diagnostics when a domain mapping fails
    irqdomain: eliminate slow-path revmap lookups
    irqdomain: Fix irq_create_direct_mapping() to test irq_domain type.
    irqdomain: Eliminate dedicated radix lookup functions
    irqdomain: Support for static IRQ mapping and association.
    irqdomain: Always update revmap when setting up a virq
    irqdomain: Split disassociating code into separate function
    irq_domain: correct a minor wrong comment for linear revmap
    irq_domain: Standardise legacy/linear domain selection
    irqdomain: Make ops->map hook optional
    irqdomain: Remove unnecessary test for IRQ_DOMAIN_MAP_LEGACY
    irqdomain: Simple NUMA awareness.
    devicetree: add helper inline for retrieving a node's full name

    Linus Torvalds
     
  • Merge Andrew's second set of patches:
    - MM
    - a few random fixes
    - a couple of RTC leftovers

    * emailed patches from Andrew Morton : (120 commits)
    rtc/rtc-88pm80x: remove unneed devm_kfree
    rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
    mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
    tmpfs: distribute interleave better across nodes
    mm: remove redundant initialization
    mm: warn if pg_data_t isn't initialized with zero
    mips: zero out pg_data_t when it's allocated
    memcg: fix memory accounting scalability in shrink_page_list
    mm/sparse: remove index_init_lock
    mm/sparse: more checks on mem_section number
    mm/sparse: optimize sparse_index_alloc
    memcg: add mem_cgroup_from_css() helper
    memcg: further prevent OOM with too many dirty pages
    memcg: prevent OOM with too many dirty pages
    mm: mmu_notifier: fix freed page still mapped in secondary MMU
    mm: memcg: only check anon swapin page charges for swap cache
    mm: memcg: only check swap cache pages for repeated charging
    mm: memcg: split swapin charge function into private and public part
    mm: memcg: remove needless !mm fixup to init_mm when charging
    mm: memcg: remove unneeded shmem charge type
    ...

    Linus Torvalds
     
  • Pull random subsystem patches from Ted Ts'o:
    "This patch series contains a major revamp of how we collect entropy
    from interrupts for /dev/random and /dev/urandom.

    The goal is to address weaknesses discussed in the paper "Mining
    your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
    by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman,
    which will be published in the Proceedings of the 21st Usenix Security
    Symposium, August 2012. (See https://factorable.net for more
    information and an extended version of the paper.)"

    Fix up trivial conflicts due to nearby changes in
    drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
    random: mix in architectural randomness in extract_buf()
    dmi: Feed DMI table to /dev/random driver
    random: Add comment to random_initialize()
    random: final removal of IRQF_SAMPLE_RANDOM
    um: remove IRQF_SAMPLE_RANDOM which is now a no-op
    sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    [ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
    board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
    isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
    pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
    uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
    drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
    xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
    n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
    pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
    i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
    input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
    mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
    ...

    Linus Torvalds
     
  • This is needed to allow network softirq packet processing to make use of
    PF_MEMALLOC.

    Currently softirq context cannot use PF_MEMALLOC because it is not
    associated with a task, and therefore has no task flags to fiddle with;
    thus the gfp-to-alloc-flags mapping ignores the task flags when in
    interrupt (hard or soft) context.

    Allowing softirqs to make use of PF_MEMALLOC therefore requires some
    trickery. This patch borrows the task flags from whatever process happens
    to be preempted by the softirq. It then modifies the gfp-to-alloc-flags
    mapping to not exclude task flags in softirq context, and modifies the
    softirq code to save, clear and restore the PF_MEMALLOC flag (a sketch of
    this save/clear/restore follows this entry).

    The save and clear ensure the preempted task's PF_MEMALLOC flag doesn't
    leak into the softirq. The restore ensures a softirq's PF_MEMALLOC flag
    cannot leak back into the preempted process. This should be safe for the
    following reasons:

    Softirqs can run on multiple CPUs, sure, but the same task should not be
    executing the same softirq code. Neither should the softirq
    handler be preempted by any other softirq handler, so the flags
    should not leak to an unrelated softirq.

    Softirqs re-enable hardware interrupts in __do_softirq(), so they can be
    preempted by hardware interrupts, and PF_MEMALLOC is then inherited
    by the hard IRQ. However, this is similar to a process in
    reclaim being preempted by a hardirq. While PF_MEMALLOC is
    set, gfp_to_alloc_flags() distinguishes between hard and
    soft irqs and avoids giving a hardirq the ALLOC_NO_WATERMARKS
    flag.

    If the softirq is deferred to ksoftirq then its flags may be used
    instead of a normal task's, but as the softirq cannot be preempted,
    the PF_MEMALLOC flag does not leak to other code by accident.

    [davem@davemloft.net: Document why PF_MEMALLOC is safe]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Cc: David Miller
    Cc: Neil Brown
    Cc: Mike Christie
    Cc: Eric B Munson
    Cc: Eric Dumazet
    Cc: Sebastian Andrzej Siewior
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
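
    A sketch of the save/clear/restore described above (illustrative only;
    run_softirq_handlers() is a made-up stand-in for the softirq handler
    loop):

    #include <linux/sched.h>

    static void run_softirq_handlers(void);     /* stand-in for the handler loop */

    static void softirq_memalloc_sketch(void)
    {
            /* save: remember whether the preempted task had PF_MEMALLOC set */
            unsigned int pflags = current->flags & PF_MEMALLOC;

            /* clear: the softirq must not inherit the task's PF_MEMALLOC */
            current->flags &= ~PF_MEMALLOC;

            run_softirq_handlers();             /* may set PF_MEMALLOC for itself */

            /* restore: put back exactly what the preempted task had, so a
             * softirq-set PF_MEMALLOC cannot leak into the task either */
            current->flags = (current->flags & ~PF_MEMALLOC) | pflags;
    }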
     
  • When hotadd_new_pgdat() is called to create a new pgdat for a new node, a
    fallback zonelist should be created for the new node. There's code that
    tries to achieve that in hotadd_new_pgdat() as below:

    /*
     * The node we allocated has no zone fallback lists. For avoiding
     * to access not-initialized zonelist, build here.
     */
    mutex_lock(&zonelists_mutex);
    build_all_zonelists(pgdat, NULL);
    mutex_unlock(&zonelists_mutex);

    But it doesn't work as expected. When hotadd_new_pgdat() is called, the
    new node is still in offline state because node_set_online(nid) hasn't
    been called yet. And build_all_zonelists() only builds zonelists for
    online nodes as:

    for_each_online_node(nid) {
            pg_data_t *pgdat = NODE_DATA(nid);

            build_zonelists(pgdat);
            build_zonelist_cache(pgdat);
    }

    Though we hope to create the zonelists for the new pgdat, it doesn't
    happen. So add a new "pgdat" parameter to build_all_zonelists() so that
    zonelists are also built for the new pgdat (a sketch of the adjusted
    logic follows this entry).

    Signed-off-by: Jiang Liu
    Signed-off-by: Xishi Qiu
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rusty Russell
    Cc: Yinghai Lu
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Cc: Keping Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
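
    A sketch of the adjusted logic (function names follow the commit text,
    but this is illustrative rather than the actual mm/page_alloc.c code):
    the new pgdat is handled explicitly, since the online-node loop would
    skip it.

    #include <linux/mmzone.h>
    #include <linux/nodemask.h>

    static void build_all_zonelists_sketch(pg_data_t *new_pgdat)
    {
            int nid;

            if (new_pgdat)                  /* hot-added node, not online yet */
                    build_zonelists(new_pgdat);

            for_each_online_node(nid) {
                    pg_data_t *pgdat = NODE_DATA(nid);

                    build_zonelists(pgdat);
                    build_zonelist_cache(pgdat);
            }
    }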
     
  • Sanity:

    CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
    CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

    [mhocko@suse.cz: fix missed bits]
    Cc: Glauber Costa
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Tejun Heo
    Cc: Aneesh Kumar K.V
    Cc: David Rientjes
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Since per-BDI flusher threads were introduced in 2.6, the pdflush
    mechanism is not used any more. But the old interface exported through
    /proc/sys/vm/nr_pdflush_threads still exists and is obviously useless.

    For backward compatibility, print a warning and return 2 to notify users
    that the interface has been removed.

    Signed-off-by: Wanpeng Li
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     
  • vm_stat_account() accounts the shared_vm, stack_vm and reserved_vm now.
    But we can also account for total_vm in vm_stat_account(), which makes
    the code tidier.

    Even for mprotect_fixup(), we can get the right result in the end.

    Signed-off-by: Huang Shijie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     
  • Pull perf updates from Ingo Molnar:
    "The biggest changes are Intel Nehalem-EX PMU uncore support, uprobes
    updates/cleanups/fixes from Oleg and diverse tooling updates (mostly
    fixes) now that Arnaldo is back from vacation."

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    uprobes: __replace_page() needs munlock_vma_page()
    uprobes: Rename vma_address() and make it return "unsigned long"
    uprobes: Fix register_for_each_vma()->vma_address() check
    uprobes: Introduce vaddr_to_offset(vma, vaddr)
    uprobes: Teach build_probe_list() to consider the range
    uprobes: Remove insert_vm_struct()->uprobe_mmap()
    uprobes: Remove copy_vma()->uprobe_mmap()
    uprobes: Fix overflow in vma_address()/find_active_uprobe()
    uprobes: Suppress uprobe_munmap() from mmput()
    uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe()
    uprobes: Clean up and document write_opcode()->lock_page(old_page)
    uprobes: Kill write_opcode()->lock_page(new_page)
    uprobes: __replace_page() should not use page_address_in_vma()
    uprobes: Don't recheck vma/f_mapping in write_opcode()
    perf/x86: Fix missing struct before structure name
    perf/x86: Fix format definition of SNB-EP uncore QPI box
    perf/x86: Make bitfield unsigned
    perf/x86: Fix LLC-* and node-* events on Intel SandyBridge
    perf/x86: Add Intel Nehalem-EX uncore support
    perf/x86: Fix typo in format definition of uncore PCU filter
    ...

    Linus Torvalds
     

31 Jul, 2012

11 commits

  • Ingo noted that the numerous timekeeper.value references made
    the timekeeping code ugly and caused many long lines that
    had to be broken up. He recommended replacing timekeeper.value
    references with tk->value.

    This patch provides a local tk pointer in all top level time
    functions and sets it to &timekeeper. All timekeeper access is
    then done via the tk pointer (a minimal before/after example
    follows this entry).

    Signed-off-by: John Stultz
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1343414893-45779-6-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
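
    A minimal before/after example of the cleanup (struct and field names
    are illustrative, not the real timekeeper layout):

    struct timekeeper_sketch {
            unsigned long long xtime_nsec;
            unsigned int shift;
    };

    static struct timekeeper_sketch timekeeper;

    /* before: every access spells out the global, producing long lines */
    static void accumulate_before(unsigned long long nsec)
    {
            timekeeper.xtime_nsec += nsec << timekeeper.shift;
    }

    /* after: a local tk pointer keeps the lines short */
    static void accumulate_after(unsigned long long nsec)
    {
            struct timekeeper_sketch *tk = &timekeeper;

            tk->xtime_nsec += nsec << tk->shift;
    }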
     
  • For performance reasons, we maintain ktime_t based duplicates of
    wall_to_monotonic (offs_real) and total_sleep_time (offs_boot).

    Since large problems could occur (such as the resume regression
    on 3.5-rc7, or the leapsecond hrtimer issue) if these value
    pairs were to be inconsistently updated, this patch cleans
    up how we modify these value pairs to ensure we are always
    consistent.

    As a side-effect this is also more efficient, as we only
    calculate the duplicate values when they are changed,
    rather than on every update_wall_time call.

    This also provides WARN_ONs to detect if future changes break
    the invariants.

    Signed-off-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1343414893-45779-5-git-send-email-john.stultz@linaro.org
    [ Cleaned up minor style issues. ]
    Signed-off-by: Ingo Molnar

    John Stultz
     
  • Ingo noted inconsistent newline usage between functions.
    This patch cleans those up.

    Signed-off-by: John Stultz
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1343414893-45779-4-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
     
  • Ingo noted that ACTHZ is a confusing name, and requested it
    be renamed, so this patch renames ACTHZ to SHIFTED_HZ to
    better describe it.

    Signed-off-by: John Stultz
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1343414893-45779-3-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
     
  • Merge in Linus's branch which already has timers/core merged.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • A few events are interesting not only for the current task.
    For example, sched_stat_* events are interesting for a task
    which wakes up. For this reason, it would be good if such
    events were delivered to the target task too.

    Now a target task can be set by using __perf_task().

    The original idea and a draft patch belongs to Peter Zijlstra.

    I need these events for profiling sleep times. sched_switch is used for
    getting callchains and sched_stat_* is used for getting time periods.
    These events are combined in user space, then it can be analyzed by
    perf tools.

    Inspired-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Arun Sharma
    Signed-off-by: Andrew Vagin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org
    Signed-off-by: Ingo Molnar

    Andrew Vagin
     
  • With this patch struct lb_env has a pointer to the load balancing
    cpumask, and we don't need to pass a cpumask around anymore.

    Signed-off-by: Michael Wang
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4FFE8665.3080705@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Michael Wang
     
  • Currently the kernel never sets KGDB_REASON_NMI. We do now, when we enter
    KGDB/KDB from an NMI.

    This is not to be confused with kgdb_nmicallback(): the NMI callback is
    the entry for the slave CPUs during CPU roundup, while REASON_NMI is the
    entry for the master CPU.

    Signed-off-by: Anton Vorontsov
    Signed-off-by: Jason Wessel

    Anton Vorontsov
     
  • Having the CPU in the more prompt is completely redundant vs the
    standard kdb prompt, and it also wastes 32 bytes on the stack.

    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • This code cleanup was missed in the original kdb merge, and this code
    is simply not used at all. The code that was previously used to set
    the KDB_FLAG_ONLY_DO_DUMP was removed prior to the initial kdb merge.

    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • When the requested range is outside of the root range, the logic in
    __reserve_region_with_split will cause an infinite recursion which will
    overflow the stack, as seen in the warning below.

    This particular stack overflow was caused by requesting the
    (100000000-107ffffff) range while the root range was (0-ffffffff). In
    this case __request_resource would return the whole root range as the
    conflict range (i.e. 0-ffffffff). Then, the logic in
    __reserve_region_with_split would continue the recursion requesting the
    new range as (conflict->end+1, end), which incidentally in this case
    equals the originally requested range.

    This patch aborts looking for a usable range when the request does not
    intersect with the root range. When the request partially overlaps with
    the root range, it adjusts the request to fall within the root range and
    then continues with the new request (a sketch of this abort/clamp logic
    follows this entry).

    When the request is modified or aborted, an error and a stack trace are
    logged to allow catching the errors in the upper layers.

    [ 5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89()
    [ 5.975150] Modules linked in:
    [ 5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46
    [ 5.985324] Call Trace:
    [ 5.987759] [] ? console_unlock+0x17b/0x18d
    [ 5.992891] [] warn_slowpath_common+0x48/0x5d
    [ 5.998194] [] ? sub_preempt_count+0x63/0x89
    [ 6.003412] [] warn_slowpath_null+0xf/0x13
    [ 6.008453] [] sub_preempt_count+0x63/0x89
    [ 6.013499] [] _raw_spin_unlock+0x27/0x3f
    [ 6.018453] [] add_partial+0x36/0x3b
    [ 6.022973] [] deactivate_slab+0x96/0xb4
    [ 6.027842] [] __slab_alloc.isra.54.constprop.63+0x204/0x241
    [ 6.034456] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.039842] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.045232] [] kmem_cache_alloc_trace+0x51/0xb0
    [ 6.050710] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.056100] [] kzalloc.constprop.5+0x29/0x38
    [ 6.061320] [] __reserve_region_with_split+0x1c/0xd1
    [ 6.067230] [] __reserve_region_with_split+0xc6/0xd1
    ...
    [ 7.179057] [] __reserve_region_with_split+0xc6/0xd1
    [ 7.184970] [] reserve_region_with_split+0x30/0x42
    [ 7.190709] [] e820_reserve_resources_late+0xd1/0xe9
    [ 7.196623] [] pcibios_resource_survey+0x23/0x2a
    [ 7.202184] [] pcibios_init+0x23/0x35
    [ 7.206789] [] pci_subsys_init+0x3f/0x44
    [ 7.211659] [] do_one_initcall+0x72/0x122
    [ 7.216615] [] ? pci_legacy_init+0x3d/0x3d
    [ 7.221659] [] kernel_init+0xa6/0x118
    [ 7.226265] [] ? start_kernel+0x334/0x334
    [ 7.231223] [] kernel_thread_helper+0x6/0x10

    Signed-off-by: Octavian Purdila
    Signed-off-by: Ram Pai
    Cc: Jesse Barnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Octavian Purdila
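
    A sketch of the abort/clamp logic described above (an illustrative
    helper, not the actual kernel/resource.c change):

    #include <linux/ioport.h>
    #include <linux/kernel.h>

    static int clamp_request_to_root(struct resource *root,
                                     resource_size_t *start, resource_size_t *end)
    {
            if (*end < root->start || *start > root->end) {
                    /* entirely outside the root range: abort and leave a trace */
                    WARN(1, "request [0x%llx-0x%llx] outside root [0x%llx-0x%llx]\n",
                         (unsigned long long)*start, (unsigned long long)*end,
                         (unsigned long long)root->start,
                         (unsigned long long)root->end);
                    return -EINVAL;
            }

            /* partial overlap: adjust the request to fall inside the root */
            if (*start < root->start)
                    *start = root->start;
            if (*end > root->end)
                    *end = root->end;

            return 0;       /* caller continues with the clamped request */
    }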