Eric Lee / smarc-fsl-linux-kernel

18 Nov, 2011

1 commit

d5e553d6e trace: Include <asm/asm-offsets.h> in trace_syscalls.c ... Browse Code »

Include into trace_syscalls.c, to allow for
NR_syscalls to be automatically generated.

Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Signed-off-by: H. Peter Anvin

H. Peter Anvin
2011-11-18 05:35:36 +0800

08 Nov, 2011

1 commit

a6f05b97d PM / QoS: Set cpu_dma_pm_qos->name ... Browse Code »

Since commit 4a31a334, the name of this misc device is not initialized,
which leads to a funny device named /dev/(null) being created and
/proc/misc containing an entry with just a number but no name. The latter
leads to complaints by cryptsetup, which caused me to investigate this
matter.

Signed-off-by: Dominik Brodowski
Signed-off-by: Rafael J. Wysocki

Dominik Brodowski
2011-11-08 06:02:24 +0800

07 Nov, 2011

7 commits

b32fc0a06 Merge branch 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux… ... Browse Code »

…/kernel/git/jeremy/xen

* 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
jump-label: initialize jump-label subsystem much earlier
x86/jump_label: add arch_jump_label_transform_static()
s390/jump-label: add arch_jump_label_transform_static()
jump_label: add arch_jump_label_transform_static() to optimise non-live code updates
sparc/jump_label: drop arch_jump_label_text_poke_early()
x86/jump_label: drop arch_jump_label_text_poke_early()
jump_label: if a key has already been initialized, don't nop it out
stop_machine: make stop_machine safe and efficient to call early
jump_label: use proper atomic_t initializer

Conflicts:
- arch/x86/kernel/jump_label.c
Added __init_or_module to arch_jump_label_text_poke_early vs
removal of that function entirely
- kernel/stop_machine.c
same patch ("stop_machine: make stop_machine safe and efficient
to call early") merged twice, with whitespace fix in one version

Linus Torvalds
2011-11-07 12:20:46 +0800
32aaeffbd Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux ... Browse Code »

* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include
net: sch_generic remove redundant use of
net: inet_timewait_sock doesnt need
...

Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h

Linus Torvalds
2011-11-07 11:44:47 +0800
208bca086 Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux ... Browse Code »

* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Add a 'reason' to wb_writeback_work
writeback: send work item to queue_io, move_expired_inodes
writeback: trace event balance_dirty_pages
writeback: trace event bdi_dirty_ratelimit
writeback: fix ppc compile warnings on do_div(long long, unsigned long)
writeback: per-bdi background threshold
writeback: dirty position control - bdi reserve area
writeback: control dirty pause time
writeback: limit max dirty pause time
writeback: IO-less balance_dirty_pages()
writeback: per task dirty rate limit
writeback: stabilize bdi->dirty_ratelimit
writeback: dirty rate control
writeback: add bg_threshold parameter to __bdi_update_bandwidth()
writeback: dirty position control
writeback: account per-bdi accumulated dirtied pages

Linus Torvalds
2011-11-07 11:02:23 +0800
d4a2e61f0 Merge git://github.com/rustyrussell/linux ... Browse Code »

* git://github.com/rustyrussell/linux:
module,bug: Add TAINT_OOT_MODULE flag for modules not built in-tree
module: Enable dynamic debugging regardless of taint

Linus Torvalds
2011-11-07 09:28:32 +0800
1197ab294 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc ... Browse Code »

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits)
powerpc/p3060qds: Add support for P3060QDS board
powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX
powerpc/85xx: Make kexec to interate over online cpus
powerpc/fsl_booke: Fix comment in head_fsl_booke.S
powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices
powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver
powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver
powerpc/86xx: Correct Gianfar support for GE boards
powerpc/cpm: Clear muram before it is in use.
drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager
powerpc/fsl_msi: add support for "msi-address-64" property
powerpc/85xx: Setup secondary cores PIR with hard SMP id
powerpc/fsl-booke: Fix settlbcam for 64-bit
powerpc/85xx: Adding DCSR node to dtsi device trees
powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards
powerpc/85xx: fix PHYS_64BIT selection for P1022DS
powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map
powerpc: respect mem= setting for early memory limit setup
powerpc: Update corenet64_smp_defconfig
powerpc: Update mpc85xx/corenet 32-bit defconfigs
...

Fix up trivial conflicts in:
- arch/powerpc/configs/40x/hcu4_defconfig
removed stale file, edited elsewhere
- arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c:
added opal and gelic drivers vs added ePAPR driver
- drivers/tty/serial/8250.c
moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support

Linus Torvalds
2011-11-07 09:12:03 +0800
2449b8ba0 module,bug: Add TAINT_OOT_MODULE flag for modules not built in-tree ... Browse Code »

Use of the GPL or a compatible licence doesn't necessarily make the code
any good. We already consider staging modules to be suspect, and this
should also be true for out-of-tree modules which may receive very
little review.

Signed-off-by: Ben Hutchings
Reviewed-by: Dave Jones
Acked-by: Greg Kroah-Hartman
Signed-off-by: Rusty Russell (patched oops-tracing.txt)

Ben Hutchings
2011-11-07 05:24:42 +0800
1cd0d6c30 module: Enable dynamic debugging regardless of taint ... Browse Code »

Dynamic debugging is currently disabled for tainted modules, except
for TAINT_CRAP. This prevents use of dynamic debugging for
out-of-tree modules once the next patch is applied.

This condition was apparently intended to avoid a crash if a force-
loaded module has an incompatible definition of dynamic debug
structures. However, a administrator that forces us to load a module
is claiming that it *is* compatible even though it fails our version
checks. If they are mistaken, there are any number of ways the module
could crash the system.

As a side-effect, proprietary and other tainted modules can now use
dynamic_debug.

Signed-off-by: Ben Hutchings
Acked-by: Mathieu Desnoyers
Signed-off-by: Rusty Russell

Ben Hutchings
2011-11-07 05:24:40 +0800

05 Nov, 2011

3 commits

d6cc76856 PM / Freezer: Revert 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake T… ... Browse Code »

…ASK_KILLABLE tasks too"

Commit 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake
TASK_KILLABLE tasks too" updated fake_signal_wake_up() used by freezer
to wake up KILLABLE tasks. Sending unsolicited wakeups to tasks in
killable sleep is dangerous as there are code paths which depend on
tasks not waking up spuriously from KILLABLE sleep.

For example. sys_read() or page can sleep in TASK_KILLABLE assuming
that wait/down/whatever _killable can only fail if we can not return
to the usermode. TASK_TRACED is another obvious example.

The previous patch updated wait_event_freezekillable() such that it
doesn't depend on the spurious wakeup. This patch reverts the
offending commit.

Note that the spurious KILLABLE wakeup had other implicit effects in
KILLABLE sleeps in nfs and cifs and those will need further updates to
regain freezekillable behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Tejun Heo
2011-11-05 05:28:15 +0800
6513fd697 PM / QoS: Remove redundant check ... Browse Code »

Remove an "if" check, that repeats an equivalent one 6 lines above.

Signed-off-by: Guennadi Liakhovetski
Signed-off-by: Rafael J. Wysocki

Guennadi Liakhovetski
2011-11-05 05:28:14 +0800
79cfbdfa8 PM / Sleep: Fix race between CPU hotplug and freezer ... Browse Code »
1

The CPU hotplug notifications sent out by the _cpu_up() and _cpu_down()
functions depend on the value of the 'tasks_frozen' argument passed to them
(which indicates whether tasks have been frozen or not).
(Examples for such CPU hotplug notifications: CPU_ONLINE, CPU_ONLINE_FROZEN,
CPU_DEAD, CPU_DEAD_FROZEN).

Thus, it is essential that while the callbacks for those notifications are
running, the state of the system with respect to the tasks being frozen or
not remains unchanged, *throughout that duration*. Hence there is a need for
synchronizing the CPU hotplug code with the freezer subsystem.

Since the freezer is involved only in the Suspend/Hibernate call paths, this
patch hooks the CPU hotplug code to the suspend/hibernate notifiers
PM_[SUSPEND|HIBERNATE]_PREPARE and PM_POST_[SUSPEND|HIBERNATE] to prevent
the race between CPU hotplug and freezer, thus ensuring that CPU hotplug
notifications will always be run with the state of the system really being
what the notifications indicate, _throughout_ their execution time.

Signed-off-by: Srivatsa S. Bhat
Signed-off-by: Rafael J. Wysocki

Srivatsa S. Bhat
2011-11-05 05:28:09 +0800

03 Nov, 2011

6 commits

4536e4d1d Revert "perf: Add PM notifiers to fix CPU hotplug races" ... Browse Code »

This reverts commit 144060fee07e9c22e179d00819c83c86fbcbf82c.

It causes a resume regression for Andi on his Acer Aspire 1830T post
3.1. The screen just stays black after wakeup.

Also, it really looks like the wrong way to suspend and resume perf
events: I think they should be done as part of the CPU suspend and
resume, rather than as a notifier that does smp_call_function().

Reported-by: Andi Kleen
Acked-by: Ingo Molnar
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-11-03 22:44:04 +0800
c1e2ee2dc memcg: replace ss->id_lock with a rwlock ... Browse Code »
43

While back-porting Johannes Weiner's patch "mm: memcg-aware global
reclaim" for an internal effort, we noticed a significant performance
regression during page-reclaim heavy workloads due to high contention of
the ss->id_lock. This lock protects idr map, and serializes calls to
idr_get_next() in css_get_next() (which is used during the memcg hierarchy
walk).

Since idr_get_next() is just doing a look up, we need only serialize it
with respect to idr_remove()/idr_get_new(). By making the ss->id_lock a
rwlock, contention is greatly reduced and performance improves.

Tested: cat a 256m file from a ramdisk in a 128m container 50 times on
each core (one file + container per core) in parallel on a NUMA machine.
Result is the time for the test to complete in 1 of the containers.
Both kernels included Johannes' memcg-aware global reclaim patches.

Before rwlock patch: 1710.778s
After rwlock patch: 152.227s

Signed-off-by: Andrew Bresticker
Cc: Paul Menage
Cc: Li Zefan
Acked-by: KAMEZAWA Hiroyuki
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Bresticker
2011-11-03 07:07:03 +0800
f1ecf0685 sysctl: add support for poll() ... Browse Code »

Adding support for poll() in sysctl fs allows userspace to receive
notifications of changes in sysctl entries. This adds a infrastructure to
allow files in sysctl fs to be pollable and implements it for hostname and
domainname.

[akpm@linux-foundation.org: s/declare/define/ for definitions]
Signed-off-by: Lucas De Marchi
Cc: Greg KH
Cc: Kay Sievers
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lucas De Marchi
2011-11-03 07:07:02 +0800
89e8a244b cpusets: avoid looping when storing to mems_allowed if one node remains set ... Browse Code »

{get,put}_mems_allowed() exist so that general kernel code may locklessly
access a task's set of allowable nodes without having the chance that a
concurrent write will cause the nodemask to be empty on configurations
where MAX_NUMNODES > BITS_PER_LONG.

This could incur a significant delay, however, especially in low memory
conditions because the page allocator is blocking and reclaim requires
get_mems_allowed() itself. It is not atypical to see writes to
cpuset.mems take over 2 seconds to complete, for example. In low memory
conditions, this is problematic because it's one of the most imporant
times to change cpuset.mems in the first place!

The only way a task's set of allowable nodes may change is through cpusets
by writing to cpuset.mems and when attaching a task to a generic code is
not reading the nodemask with get_mems_allowed() at the same time, and
then clearing all the old nodes. This prevents the possibility that a
reader will see an empty nodemask at the same time the writer is storing a
new nodemask.

If at least one node remains unchanged, though, it's possible to simply
set all new nodes and then clear all the old nodes. Changing a task's
nodemask is protected by cgroup_mutex so it's guaranteed that two threads
are not changing the same task's nodemask at the same time, so the
nodemask is guaranteed to be stored before another thread changes it and
determines whether a node remains set or not.

Signed-off-by: David Rientjes
Cc: Miao Xie
Cc: KOSAKI Motohiro
Cc: Nick Piggin
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2011-11-03 07:07:00 +0800
77ceab8ea cgroups: don't attach task to subsystem if migration failed ... Browse Code »

If a task has exited to the point it has called cgroup_exit() already,
then we can't migrate it to another cgroup anymore.

This can happen when we are attaching a task to a new cgroup between the
call to ->can_attach_task() on subsystems and the migration that is
eventually tried in cgroup_task_migrate().

In this case cgroup_task_migrate() returns -ESRCH and we don't want to
attach the task to the subsystems because the attachment to the new cgroup
itself failed.

Fix this by only calling ->attach_task() on the subsystems if the cgroup
migration succeeded.

Reported-by: Oleg Nesterov
Signed-off-by: Ben Blum
Acked-by: Paul Menage
Cc: Li Zefan
Cc: Tejun Heo
Signed-off-by: Frederic Weisbecker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2011-11-03 07:06:59 +0800
33ef6b698 cgroups: more safe tasklist locking in cgroup_attach_proc ... Browse Code »

Fix unstable tasklist locking in cgroup_attach_proc.

According to this thread - https://lkml.org/lkml/2011/7/27/243 - RCU is
not sufficient to guarantee the tasklist is stable w.r.t. de_thread and
exit. Taking tasklist_lock for reading, instead of rcu_read_lock, ensures
proper exclusion.

Signed-off-by: Ben Blum
Acked-by: Paul Menage
Cc: Oleg Nesterov
Cc: Frederic Weisbecker
Cc: "Paul E. McKenney"
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Blum
2011-11-03 07:06:59 +0800

02 Nov, 2011

1 commit

367069f16 Merge branch 'next/dt' of git://git.linaro.org/people/arnd/arm-soc ... Browse Code »

* 'next/dt' of git://git.linaro.org/people/arnd/arm-soc:
ARM: gic: use module.h instead of export.h
ARM: gic: fix irq_alloc_descs handling for sparse irq
ARM: gic: add OF based initialization
ARM: gic: add irq_domain support
irq: support domains with non-zero hwirq base
of/irq: introduce of_irq_init
ARM: at91: add at91sam9g20 and Calao USB A9G20 DT support
ARM: at91: dt: at91sam9g45 family and board device tree files
arm/mx5: add device tree support for imx51 babbage
arm/mx5: add device tree support for imx53 boards
ARM: msm: Add devicetree support for msm8660-surf
msm_serial: Add devicetree support
msm_serial: Use relative resources for iomem

Fix up conflicts in arch/arm/mach-at91/{at91sam9260.c,at91sam9g45.c}

Linus Torvalds
2011-11-02 12:02:35 +0800

01 Nov, 2011

14 commits

094803e0a Merge branch 'akpm' (Andrew's incoming) ... Browse Code »

Quoth Andrew:

- Most of MM. Still waiting for the poweroc guys to get off their
butts and review some threaded hugepages patches.

- alpha

- vfs bits

- drivers/misc

- a few core kerenl tweaks

- printk() features

- MAINTAINERS updates

- backlight merge

- leds merge

- various lib/ updates

- checkpatch updates

* akpm: (127 commits)
epoll: fix spurious lockdep warnings
checkpatch: add a --strict check for utf-8 in commit logs
kernel.h/checkpatch: mark strict_strto and simple_strto as obsolete
llist-return-whether-list-is-empty-before-adding-in-llist_add-fix
wireless: at76c50x: follow rename pack_hex_byte to hex_byte_pack
fat: follow rename pack_hex_byte() to hex_byte_pack()
security: follow rename pack_hex_byte() to hex_byte_pack()
kgdb: follow rename pack_hex_byte() to hex_byte_pack()
lib: rename pack_hex_byte() to hex_byte_pack()
lib/string.c: fix strim() semantics for strings that have only blanks
lib/idr.c: fix comment for ida_get_new_above()
lib/percpu_counter.c: enclose hotplug only variables in hotplug ifdef
lib/bitmap.c: quiet sparse noise about address space
lib/spinlock_debug.c: print owner on spinlock lockup
lib/kstrtox: common code between kstrto*() and simple_strto*() functions
drivers/leds/leds-lp5521.c: check if reset is successful
leds: turn the blink_timer off before starting to blink
leds: save the delay values after a successful call to blink_set()
drivers/leds/leds-gpio.c: use gpio_get_value_cansleep() when initializing
drivers/leds/leds-lm3530.c: add __devexit_p where needed
...

Linus Torvalds
2011-11-01 08:46:07 +0800
50e1499f4 kgdb: follow rename pack_hex_byte() to hex_byte_pack() ... Browse Code »

There is no functional change.

Signed-off-by: Andy Shevchenko
Acked-by: Jesper Nilsson
Cc: David Howells
Cc: Koichi Yasutake
Cc: Jason Wessel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Shevchenko
2011-11-01 08:30:56 +0800
ae29bc92d printk: remove bounds checking for log_prefix ... Browse Code »

Currently log_prefix is testing that the first character of the log level
and facility is less than '0' and greater than '9' (which is always
false).

Since the code being updated works because strtoul bombs out (endp isn't
updated) and 0 is returned anyway just remove the check and don't change
the behavior of the function.

Signed-off-by: William Douglas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

William Douglas
2011-11-01 08:30:53 +0800
48e41899e printk: fix bounds checking for log_prefix ... Browse Code »

Currently log_prefix is testing that the first character of the log level
and facility is less than '0' and greater than '9' (which is always
false). It should be testing to see if the character less than '0' or
greater than '9' instead. This patch makes that change.

The code being changed worked because strtoul bombs out (endp isn't
updated) and 0 is returned anyway.

Signed-off-by: William Douglas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

William Douglas
2011-11-01 08:30:53 +0800
134620f7a printk: add console_suspend module parameter ... Browse Code »

We are enabling some power features on medfield. To test suspend-2-RAM
conveniently, we need turn on/off console_suspend_enabled frequently.

Add a module parameter, so users could change it by:
/sys/module/printk/parameters/console_suspend

Signed-off-by: Yanmin Zhang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yanmin Zhang
2011-11-01 08:30:53 +0800
0eca6b7c7 printk: add module parameter ignore_loglevel to control ignore_loglevel ... Browse Code »

We are enabling some power features on medfield. To test suspend-2-RAM
conveniently, we need turn on/off ignore_loglevel frequently without
rebooting.

Add a module parameter, so users can change it by:
/sys/module/printk/parameters/ignore_loglevel

Signed-off-by: Yanmin Zhang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yanmin Zhang
2011-11-01 08:30:53 +0800
73efc0394 kernel/sysctl.c: add cap_last_cap to /proc/sys/kernel ... Browse Code »

Userspace needs to know the highest valid capability of the running
kernel, which right now cannot reliably be retrieved from the header files
only. The fact that this value cannot be determined properly right now
creates various problems for libraries compiled on newer header files
which are run on older kernels. They assume capabilities are available
which actually aren't. libcap-ng is one example. And we ran into the
same problem with systemd too.

Now the capability is exported in /proc/sys/kernel/cap_last_cap.

[akpm@linux-foundation.org: make cap_last_cap const, per Ulrich]
Signed-off-by: Dan Ballard
Cc: Randy Dunlap
Cc: Ingo Molnar
Cc: Lennart Poettering
Cc: Kay Sievers
Cc: Ulrich Drepper
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Ballard
2011-11-01 08:30:53 +0800
4ff819515 watchdog: move watchdog_*_all_cpus under CONFIG_SYSCTL ... Browse Code »

Fix compilation warnings for CONFIG_SYSCTL=n:

fixed compilation warnings in case of disabled CONFIG_SYSCTL
kernel/watchdog.c:483:13: warning: `watchdog_enable_all_cpus' defined but not used
kernel/watchdog.c:500:13: warning: `watchdog_disable_all_cpus' defined but not used

these functions are static and are used only in sysctl handler, so move
them inside #ifdef CONFIG_SYSCTL too

Signed-off-by: Vasily Averin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vasily Averin
2011-11-01 08:30:53 +0800
f445027e4 stop_machine: make stop_machine safe and efficient to call early ... Browse Code »

Make stop_machine() safe to call early in boot, before SMP has been set
up, by simply calling the callback function directly if there's only one
CPU online.

[ Fixes from AKPM:
- add comment
- local_irq_flags, not save_flags
- also call hard_irq_disable() for systems which need it

Tejun suggested using an explicit flag rather than just looking at
the online cpu count. ]

Cc: Tejun Heo
Acked-by: Rusty Russell
Cc: Peter Zijlstra
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Steven Rostedt
Acked-by: Tejun Heo
Cc: Konrad Rzeszutek Wilk
Signed-off-by: Jeremy Fitzhardinge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeremy Fitzhardinge
2011-11-01 08:30:53 +0800
bc3e53f68 mm: distinguish between mlocked and pinned pages ... Browse Code »
43

Some kernel components pin user space memory (infiniband and perf) (by
increasing the page count) and account that memory as "mlocked".

The difference between mlocking and pinning is:

A. mlocked pages are marked with PG_mlocked and are exempt from
swapping. Page migration may move them around though.
They are kept on a special LRU list.

B. Pinned pages cannot be moved because something needs to
directly access physical memory. They may not be on any
LRU list.

I recently saw an mlockalled process where mm->locked_vm became
bigger than the virtual size of the process (!) because some
memory was accounted for twice:

Once when the page was mlocked and once when the Infiniband
layer increased the refcount because it needt to pin the RDMA
memory.

This patch introduces a separate counter for pinned pages and
accounts them seperately.

Signed-off-by: Christoph Lameter
Cc: Mike Marciniszyn
Cc: Roland Dreier
Cc: Sean Hefty
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2011-11-01 08:30:46 +0800
c9f01245b oom: remove oom_disable_count ... Browse Code »

This removes mm->oom_disable_count entirely since it's unnecessary and
currently buggy. The counter was intended to be per-process but it's
currently decremented in the exit path for each thread that exits, causing
it to underflow.

The count was originally intended to prevent oom killing threads that
share memory with threads that cannot be killed since it doesn't lead to
future memory freeing. The counter could be fixed to represent all
threads sharing the same mm, but it's better to remove the count since:

- it is possible that the OOM_DISABLE thread sharing memory with the
victim is waiting on that thread to exit and will actually cause
future memory freeing, and

- there is no guarantee that a thread is disabled from oom killing just
because another thread sharing its mm is oom disabled.

Signed-off-by: David Rientjes
Reported-by: Oleg Nesterov
Reviewed-by: Oleg Nesterov
Cc: Ying Han
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2011-11-01 08:30:45 +0800
fcf634098 Cross Memory Attach ... Browse Code »

The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call. There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
using it:
- Does not allow for specifying iovecs for both src and dest, assuming
preadv or pwritev was implemented either the area read from or
written to would need to be contiguous.
- Currently mem_read allows only processes who are currently
ptrace'ing the target and are still able to ptrace the target to read
from the target. This check could possibly be moved to the open call,
but its not clear exactly what race this restriction is stopping
(reason appears to have been lost)
- Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
domain socket is a bit ugly from a userspace point of view,
especially when you may have hundreds if not (eventually) thousands
of processes that all need to do this with each other
- Doesn't allow for some future use of the interface we would like to
consider adding in the future (see below)
- Interestingly reading from /proc/pid/mem currently actually
involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems. Since you need the reader and writer working co-operatively if
the pipe is not drained then you block. Which requires some wrapping to
do non blocking on the send side or polling on the receive. In all to all
communication it requires ordering otherwise you can deadlock. And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could. For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy. We don't need to keep a copy of the data
from the source. I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI). This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication. And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Chris Yeoh
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Arnd Bergmann
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: David Howells
Cc: James Morris
Cc:
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christopher Yeoh
2011-11-01 08:30:44 +0800
ec53cf23c irq: don't put module.h into irq.h for tracking irqgen modules. ... Browse Code »

Recent commit "irq: Track the owner of irq descriptor" in
commit ID b6873807a7143b7 placed module.h into linux/irq.h
but we are trying to limit module.h inclusion to just C files
that really need it, due to its size and number of children
includes. This targets just reversing that include.

Add in the basic "struct module" since that is all we really need
to ensure things compile. In theory, b687380 should have added the
module.h include to the irqdesc.h header as well, but the implicit
module.h everywhere presence masked this from showing up. So give
it the "struct module" as well.

As for the C files, irqdesc.c is only using THIS_MODULE, so it
does not need module.h - give it export.h instead. The C file
irq/manage.c is now (as of b687380) using try_module_get and
module_put and so it needs module.h (which it already has).

Also convert the irq_alloc_descs variants to macros, since all
they really do is is call the __irq_alloc_descs primitive.
This avoids including export.h and no debug info is lost.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-11-01 07:32:35 +0800
6e5fdeedc kernel: Fix files explicitly needing EXPORT_SYMBOL infrastructure ... Browse Code »

These files were getting via an implicit non-obvious
path, but we want to crush those out of existence since they cost
time during compiles of processing thousands of lines of headers
for no reason. Give them the lightweight header that just contains
the EXPORT_SYMBOL infrastructure.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-11-01 07:30:05 +0800

31 Oct, 2011

7 commits

bdfa97bf7 kernel: fix up module header handling in rcutiny files ... Browse Code »

The file rcutiny.c does not need moduleparam.h header, as
there are no modparams in this file.

However rcutiny_plugin.h does define a module_init() and
a module_exit() and it uses the various MODULE_ macros, so
it really does need module.h included.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:13 +0800
72a59aaad kernel: params.c needs module.h not moduleparam.h ... Browse Code »

Through various other implicit include paths, some files were
getting the full module.h file, and hence living the illusion
that they really only needed moduleparam.h -- but the reality
is that once you remove the module.h presence, these show up:

kernel/params.c:583: warning: ‘struct module_kobject’ declared inside parameter list

Such files really require module.h so simply make it so. As the
file module.h grabs moduleparam.h on the fly, all will be well.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:13 +0800
1596425fd kernel: ksysfs.c is implicitly using stat.h ... Browse Code »

With the module.h usage cleanup, we'll get this:

kernel/ksysfs.c:161: error: ‘S_IRUGO’ undeclared here (not in a function)
make[2]: *** [kernel/ksysfs.o] Error 1

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:13 +0800
967d1f906 kernel: fix two implicit header assumptions in irq_work.c ... Browse Code »

Up until now, this file was getting percpu.h because nearly every
file was implicitly getting module.h (and all its sub-includes).
But we want to clean that up, so call out percpu.h explicitly.
Otherwise we'll get things like this on an ARM build:

kernel/irq_work.c:48: error: expected declaration specifiers or '...' before 'irq_work_list'
kernel/irq_work.c:48: warning: type defaults to 'int' in declaration of 'DEFINE_PER_CPU'

The same thing was happening for builds on ARM for asm/processor.h

kernel/irq_work.c: In function 'irq_work_sync':
kernel/irq_work.c:166: error: implicit declaration of function 'cpu_relax'

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800
74da1ff71 kernel: fix several implicit usasges of kmod.h ... Browse Code »

These files were implicitly relying on coming in via
module.h, as without it we get things like:

kernel/power/suspend.c:100: error: implicit declaration of function ‘usermodehelper_disable’
kernel/power/suspend.c:109: error: implicit declaration of function ‘usermodehelper_enable’
kernel/power/user.c:254: error: implicit declaration of function ‘usermodehelper_disable’
kernel/power/user.c:261: error: implicit declaration of function ‘usermodehelper_enable’

kernel/sys.c:317: error: implicit declaration of function ‘usermodehelper_disable’
kernel/sys.c:1816: error: implicit declaration of function ‘call_usermodehelper_setup’
kernel/sys.c:1822: error: implicit declaration of function ‘call_usermodehelper_setfns’
kernel/sys.c:1824: error: implicit declaration of function ‘call_usermodehelper_exec’

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800
56d82e000 kernel: Add <linux/module.h> to files using it implicitly ... Browse Code »

These files are doing things like module_put and try_module_get
so they need to call out the module.h for explicit inclusion,
rather than getting it via which we ideally want
to remove the module.h inclusion from.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800
9984de1a5 kernel: Map most files to use export.h instead of module.h ... Browse Code »

The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else. Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

-#include
+#include

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800