23 Mar, 2011

40 commits

  • Because all kthreads are created from a single helper task, they all use
    memory from a single node for their kernel stack and task struct.

    This patch suite creates kthread_create_on_node(), adding a 'node' parameter
    to the parameters already used by kthread_create().

    This parameter is used to allocate memory for the new kthread on its
    memory node if possible. (A usage sketch follows this entry.)

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Reviewed-by: Andi Kleen
    Acked-by: Rusty Russell
    Cc: Tejun Heo
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: David Howells
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
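
    A minimal usage sketch of the new interface, assuming a hypothetical
    per-CPU worker (my_thread_fn and setup_my_worker are illustrative names,
    not part of the patch):

        #include <linux/kthread.h>
        #include <linux/topology.h>
        #include <linux/err.h>

        static int my_thread_fn(void *data)
        {
                /* per-CPU worker body (illustrative) */
                return 0;
        }

        static struct task_struct *setup_my_worker(int cpu)
        {
                struct task_struct *t;

                /* stack and task_struct are allocated on the CPU's node */
                t = kthread_create_on_node(my_thread_fn, NULL,
                                           cpu_to_node(cpu), "my_worker/%d", cpu);
                if (!IS_ERR(t))
                        kthread_bind(t, cpu);
                return t;
        }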
     
  • Add a node parameter to alloc_thread_info(), and change its name to
    alloc_thread_info_node().

    This change is needed to allow a NUMA-aware kthread_create_on_cpu().
    (A sketch of the resulting allocator shape follows this entry.)

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Reviewed-by: Andi Kleen
    Acked-by: Rusty Russell
    Cc: Tejun Heo
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: David Howells
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
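
    A hedged sketch of what such a node-aware allocator can look like
    (simplified; the order value is illustrative and the real
    per-architecture implementations differ):

        #include <linux/sched.h>
        #include <linux/gfp.h>
        #include <linux/mm.h>

        #define MY_THREAD_ORDER 1   /* e.g. an 8KB stack on a 4KB-page arch */

        static struct thread_info *alloc_thread_info_node(struct task_struct *tsk,
                                                          int node)
        {
                /* allocate the stack pages on the requested memory node */
                struct page *page = alloc_pages_node(node, GFP_KERNEL,
                                                     MY_THREAD_ORDER);

                return page ? page_address(page) : NULL;
        }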
     
  • Because all kthreads are created from a single helper task, they all use
    memory from a single node for their kernel stack and task struct.

    This patch suite creates kthread_create_on_cpu(), adding a 'cpu' parameter
    to the parameters already used by kthread_create().

    This parameter is used to allocate memory for the new kthread on its
    memory node if available.

    Users of this new function are ksoftirqd, kworker, migration, pktgend...

    This patch:

    Add a node parameter to alloc_task_struct(), and change its name to
    alloc_task_struct_node().

    This change is needed to allow a NUMA-aware kthread_create_on_cpu().
    (A sketch of the node-aware slab allocation follows this entry.)

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Reviewed-by: Andi Kleen
    Acked-by: Rusty Russell
    Cc: Tejun Heo
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: David Howells
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
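
    A hedged sketch of the node-aware task_struct allocation (simplified;
    the cache variable name is illustrative):

        #include <linux/slab.h>
        #include <linux/sched.h>

        static struct kmem_cache *task_struct_cachep;

        static struct task_struct *alloc_task_struct_node(int node)
        {
                /* allocate the task_struct from the slab cache on 'node' */
                return kmem_cache_alloc_node(task_struct_cachep, GFP_KERNEL, node);
        }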
     
  • Many of migrate_pages()'s callers check the return value instead of
    list_empty() since cf608ac19c ("mm: compaction: fix COMPACTPAGEFAILED
    counting"). This patch makes compaction's use of migrate_pages()
    consistent with the others. It should not change the old behaviour.

    Signed-off-by: Minchan Kim
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch reverts 5a03b051 ("thp: use compaction in kswapd for GFP_ATOMIC
    order > 0") due to reports stating that kswapd CPU usage was higher and
    IRQs were being disabled more frequently. This was reported at
    http://www.spinics.net/linux/fedora/alsa-user/msg09885.html.

    Without this patch applied, CPU usage by kswapd hovers around the 20% mark
    according to the tester (Arthur Marsh:
    http://www.spinics.net/linux/fedora/alsa-user/msg09899.html). With this
    patch applied, it's around 2%.

    The problem is not related to THP, which specifies __GFP_NO_KSWAPD, but is
    triggered by high-order allocations hitting the low watermark for their
    order and waking kswapd on kernels with CONFIG_COMPACTION set. The most
    common trigger for this is network cards configured for jumbo frames, but
    it's also possible it'll be triggered by fork-heavy workloads (order-1)
    and some wireless cards which depend on order-1 allocations.

    The symptoms for the user will be high CPU usage by kswapd in low-memory
    situations which could be confused with another writeback problem. While
    a patch like 5a03b051 may be reintroduced in the future, this patch plays
    it safe for now and reverts it.

    [mel@csn.ul.ie: Beefed up the changelog]
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Mel Gorman
    Reported-by: Arthur Marsh
    Tested-by: Arthur Marsh
    Cc: [2.6.38.1]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Provide a free area cache for the vmalloc virtual address allocator, based
    on the algorithm used by the user virtual memory allocator.

    This reduces the number of rbtree operations and linear traversals over
    the vmap extents in order to find a free area, by starting off at the last
    point that a free area was found.

    The free area cache is reset if areas are freed behind it, or if we are
    searching for a smaller area or alignment than last time. So allocation
    patterns are not changed (verified by corner-case and random test cases in
    userspace testing).

    This solves a regression caused by the lazy vunmap TLB purging introduced
    in db64fe02 (mm: rewrite vmap layer). That patch leaves extents in the
    vmap allocator after they are vunmapped, until a significant number
    accumulate that can be flushed in a single batch. So in a workload that
    vmallocs/vfrees frequently, a chain of extents builds up from the
    VMALLOC_START address and has to be iterated over each time (giving O(n)
    type behaviour).

    After this patch, the search starts from where it left off, giving closer
    to amortized O(1) behaviour. (A simplified sketch of the caching idea
    follows this entry.)

    This is verified to solve regressions reported by Steven in GFS2 and by
    Avi in KVM.

    Hugh's update:

    : I tried out the recent mmotm, and on one machine was fortunate to hit
    : the BUG_ON(first->va_start < addr) which seems to have been stalling
    : your vmap area cache patch ever since May.

    : I can get you addresses etc, I did dump a few out; but once I stared
    : at them, it was easier just to look at the code: and I cannot see how
    : you would be so sure that first->va_start < addr, once you've done
    : that addr = ALIGN(max(...), align) above, if align is over 0x1000
    : (align was 0x8000 or 0x4000 in the cases I hit: ioremaps like Steve).

    : I originally got around it by just changing the
    : if (first->va_start < addr) {
    : to
    : while (first->va_start < addr) {
    : without thinking about it any further; but that seemed unsatisfactory,
    : why would we want to loop here when we've got another very similar
    : loop just below it?

    : I am never going to admit how long I've spent trying to grasp your
    : "while (n)" rbtree loop just above this, the one with the peculiar
    : if (!first && tmp->va_start < addr + size)
    : in. That's unfamiliar to me, I'm guessing it's designed to save a
    : subsequent rb_next() in a few circumstances (at risk of then setting
    : a wrong cached_hole_size?); but they did appear few to me, and I didn't
    : feel I could sign off something with that in when I don't grasp it,
    : and it seems responsible for extra code and mistaken BUG_ON below it.

    : I've reverted to the familiar rbtree loop that find_vma() does (but
    : with va_end >= addr as you had, to respect the additional guard page):
    : and then (given that cached_hole_size starts out 0) I don't see the
    : need for any complications below it. If you do want to keep that loop
    : as you had it, please add a comment to explain what it's trying to do,
    : and where addr is relative to first when you emerge from it.

    : Aren't your tests "size first->va_start" forgetting the guard page we want
    : before the next area? I've changed those.

    : I have not changed your many "addr + size - 1 < addr" overflow tests,
    : but have since come to wonder, shouldn't they be "addr + size < addr"
    : tests - won't the vend checks go wrong if addr + size is 0?

    : I have added a few comments - Wolfgang Wander's 2.6.13 description of
    : 1363c3cd8603a913a27e2995dccbd70d5312d8e6 Avoiding mmap fragmentation
    : helped me a lot, perhaps a pointer to that would be good too. And I found
    : it easier to understand when I renamed cached_start slightly and moved the
    : overflow label down.

    : This patch would go after your mm-vmap-area-cache.patch in mmotm.
    : Trivially, nobody is going to get that BUG_ON with this patch, and it
    : appears to work fine on my machines; but I have not given it anything like
    : the testing you did on your original, and may have broken all the
    : performance you were aiming for. Please take a look, test it out and
    : integrate with yours if you're satisfied - thanks.

    [akpm@linux-foundation.org: add locking comment]
    Signed-off-by: Nick Piggin
    Signed-off-by: Hugh Dickins
    Reviewed-by: Minchan Kim
    Reported-and-tested-by: Steven Whitehouse
    Reported-and-tested-by: Avi Kivity
    Tested-by: "Barry J. Marson"
    Cc: Prarit Bhargava
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
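
    A hedged, self-contained sketch of the free-area-cache idea (illustration
    only, not the mainline __alloc_vmap_area() code; VSTART stands in for
    VMALLOC_START):

        #define VSTART          0x1000UL
        #define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((a) - 1))

        static unsigned long free_area_cache = VSTART; /* where the last search ended */
        static unsigned long cached_hole_size;         /* largest hole skipped below it */
        static unsigned long cached_align = 1;

        /* Pick the address at which to start searching for a hole of 'size'
         * bytes with 'align'ment.  A request smaller than the largest hole
         * already skipped, or with a smaller alignment than before,
         * invalidates the cache (as does freeing an area behind it, not
         * shown); otherwise we resume where the previous search finished
         * instead of walking from VSTART every time. */
        static unsigned long vmap_search_start(unsigned long size, unsigned long align)
        {
                if (size < cached_hole_size || align < cached_align) {
                        cached_hole_size = 0;
                        free_area_cache = VSTART;
                }
                cached_align = align;
                return ALIGN_UP(free_area_cache, align);
        }

        /* After a successful allocation ending at 'end', remember it so the
         * next search can start there. */
        static void vmap_note_allocation(unsigned long end)
        {
                free_area_cache = end;
        }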
     
  • In systems with multiple framebuffer devices, one of the devices might be
    blanked while another is unblanked. In order for the backlight blanking
    logic to know whether to turn off the backlight for a particular
    framebuffer's blanking notification, it needs to be able to check if a
    given framebuffer device corresponds to the backlight.

    This plumbs the check_fb hook from the core backlight code through the
    pwm_backlight helper so that platform code can plug in its own check.
    (A hedged example follows this entry.)

    Signed-off-by: Robert Morell
    Cc: Richard Purdie
    Cc: Arun Murthy
    Cc: Linus Walleij
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert Morell
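
    A hedged sketch of a platform hook using this plumbing (the framebuffer
    id string and the brightness/period values are illustrative, not from
    the patch):

        #include <linux/fb.h>
        #include <linux/pwm_backlight.h>
        #include <linux/string.h>

        /* Only react to blank/unblank events from our own framebuffer. */
        static int my_check_fb(struct device *dev, struct fb_info *info)
        {
                return strcmp(info->fix.id, "my-fbdev") == 0;
        }

        static struct platform_pwm_backlight_data my_backlight_data = {
                .max_brightness = 255,
                .dft_brightness = 200,
                .pwm_period_ns  = 78770,
                .check_fb       = my_check_fb,
        };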
     
  • The following symbols are needlessly defined global: jornada_bl_init,
    jornada_bl_exit, jornada_lcd_init, jornada_lcd_exit.

    Make them static.

    Signed-off-by: Axel Lin
    Acked-by: Kristoffer Ericson
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Axel Lin
     
  • apple_bl uses ACPI interfaces (data & code), so it should depend on ACPI.

    drivers/video/backlight/apple_bl.c:142: warning: 'struct acpi_device' declared inside parameter list
    drivers/video/backlight/apple_bl.c:142: warning: its scope is only this definition or declaration, which is probably not what you want
    drivers/video/backlight/apple_bl.c:201: warning: 'struct acpi_device' declared inside parameter list
    drivers/video/backlight/apple_bl.c:215: error: variable 'apple_bl_driver' has initializer but incomplete type
    drivers/video/backlight/apple_bl.c:216: error: unknown field 'name' specified in initializer
    ...

    Signed-off-by: Randy Dunlap
    Acked-by: Matthew Garrett
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • It works on hardware other than Macbook Pros, and it works on GPUs other
    than Nvidia. It should even work on iMacs, so change the name to match
    reality more precisely and include an alias so existing users don't get
    confused.

    Signed-off-by: Matthew Garrett
    Acked-by: Richard Purdie
    Cc: Mourad De Clerck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Garrett
     
  • The SMI-based backlight control functionality may fail to work if the
    system is running under EFI rather than BIOS. Check that the hardware
    responds as expected, and exit if it doesn't.

    Signed-off-by: Matthew Garrett
    Acked-by: Richard Purdie
    Cc: Mourad De Clerck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Garrett
     
  • This driver only has to deal with two different classes of hardware, but
    right now it needs new DMI entries for every new machine. It turns out
    that there's an ACPI device that uniquely identifies Apples with backlights,
    so this patch reworks the driver into an ACPI one, identifies the hardware
    by checking the PCI vendor of the root bridge and strips out all the DMI
    code. It also changes the config text to clarify that it works on devices
    other than Macbook Pros and GPUs other than nvidia.

    Signed-off-by: Matthew Garrett
    Acked-by: Richard Purdie
    Cc: Mourad De Clerck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Garrett
     
  • Dual-GPU machines may provide more than one ACPI backlight interface. Tie
    the backlight device to the GPU in order to allow userspace to identify
    the correct interface.

    Signed-off-by: Matthew Garrett
    Cc: Richard Purdie
    Cc: Chris Wilson
    Cc: David Airlie
    Cc: Alex Deucher
    Cc: Ben Skeggs
    Cc: Zhang Rui
    Cc: Len Brown
    Cc: Jesse Barnes
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Garrett
     
  • We may eventually end up with per-connector backlights, especially with
    ddcci devices. Make sure that the parent node for the backlight device is
    the connector rather than the PCI device.

    Signed-off-by: Matthew Garrett
    Cc: Richard Purdie
    Cc: Chris Wilson
    Cc: David Airlie
    Cc: Alex Deucher
    Acked-by: Ben Skeggs
    Cc: Zhang Rui
    Cc: Len Brown
    Cc: Jesse Barnes
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Garrett
     
  • Allows e.g. power management daemons to control the backlight level. Inspired
    by the corresponding code in radeonfb.

    [mjg@redhat.com: updated to add backlight type and make the connector the parent device]
    Signed-off-by: Michel Dänzer
    Signed-off-by: Matthew Garrett
    Cc: Richard Purdie
    Cc: Chris Wilson
    Cc: David Airlie
    Acked-by: Alex Deucher
    Cc: Ben Skeggs
    Cc: Zhang Rui
    Cc: Len Brown
    Cc: Jesse Barnes
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Dänzer
     
  • There may be multiple ways of controlling the backlight on a given
    machine. Allow drivers to expose the type of interface they are
    providing, making it possible for userspace to make appropriate policy
    decisions. (A registration sketch follows this entry.)

    Signed-off-by: Matthew Garrett
    Cc: Richard Purdie
    Cc: Chris Wilson
    Cc: David Airlie
    Cc: Alex Deucher
    Cc: Ben Skeggs
    Cc: Zhang Rui
    Cc: Len Brown
    Cc: Jesse Barnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Garrett
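
    A hedged sketch of how a driver might declare its interface type when
    registering (the device name, ops and brightness value are illustrative):

        #include <linux/backlight.h>
        #include <linux/string.h>

        static struct backlight_device *
        register_example_backlight(struct device *dev, void *priv,
                                   const struct backlight_ops *ops)
        {
                struct backlight_properties props;

                memset(&props, 0, sizeof(props));
                /* direct hardware register control; BACKLIGHT_PLATFORM and
                 * BACKLIGHT_FIRMWARE are the other types from this series */
                props.type = BACKLIGHT_RAW;
                props.max_brightness = 255;

                return backlight_device_register("example_bl", dev, priv,
                                                 ops, &props);
        }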
     
  • Don't allow everybody to change LED settings.

    Signed-off-by: Vasiliy Kulikov
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     
  • Don't allow everybody to change LED settings.

    Signed-off-by: Vasiliy Kulikov
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     
  • Add an ld9040 AMOLED panel driver.

    Signed-off-by: Donghwa Lee
    Signed-off-by: Kyungmin Park
    Signed-off-by: Inki Dae
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Donghwa Lee
     
  • And fix a typo.

    Signed-off-by: Uwe Kleine-König
    Cc: Lars-Peter Clausen
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • Simple backlight driver for the National Semiconductor LM3530. Presently
    only manual mode is supported; PWM and ALS support are to be added.

    Signed-off-by: Shreshtha Kumar Sahu
    Cc: Linus Walleij
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shreshtha Kumar Sahu
     
  • There is a move to deprecate bus-specific PM operations in favour of
    dev_pm_ops, in order to reduce the amount of boilerplate code in buses
    and facilitate updates to the PM core. Do this move for the bd2802
    driver. (A hedged sketch of the dev_pm_ops pattern follows this entry.)

    [akpm@linux-foundation.org: fix warnings]
    Signed-off-by: Mark Brown
    Cc: Kim Kyuwon
    Cc: Kim Kyuwon
    Cc: Richard Purdie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Brown
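
    A hedged, generic sketch of the dev_pm_ops pattern being moved to (all
    names are illustrative, not the bd2802 code):

        #include <linux/pm.h>
        #include <linux/i2c.h>

        static int example_suspend(struct device *dev)
        {
                /* save state and power the device down */
                return 0;
        }

        static int example_resume(struct device *dev)
        {
                /* power the device up and restore state */
                return 0;
        }

        static const struct dev_pm_ops example_pm_ops = {
                .suspend = example_suspend,
                .resume  = example_resume,
        };

        static struct i2c_driver example_driver = {
                .driver = {
                        .name = "example",
                        /* instead of .suspend/.resume in the i2c_driver */
                        .pm   = &example_pm_ops,
                },
        };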
     
  • list_del() leaves poison in the prev and next pointers. The next
    list_empty() will compare those poisons, and say the list isn't empty.
    Any list operations that assume the node is on a list because of such a
    check will be fooled into dereferencing poison. One needs to INIT the
    node after the del, and fortunately there's already a wrapper for that -
    list_del_init().

    Some of the dels are followed by deallocations, so they can be ignored,
    and one can be merged with an add to make a move. Apart from that, I
    erred on the side of caution in making nodes list_empty()-queriable.
    (A small illustration of the pitfall follows this entry.)

    Signed-off-by: Phil Carmody
    Reviewed-by: Paul Menage
    Cc: Li Zefan
    Acked-by: Kirill A. Shutemov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phil Carmody
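
    A small, hedged illustration of the pitfall (struct item and the helper
    names are hypothetical):

        #include <linux/list.h>
        #include <linux/bug.h>

        struct item {
                struct list_head entry;
        };

        static void unlink_wrong(struct item *it)
        {
                list_del(&it->entry);
                /* prev/next now hold LIST_POISON values, so a later
                 * list_empty() claims the node is still on a list. */
                WARN_ON(!list_empty(&it->entry));       /* fires */
        }

        static void unlink_right(struct item *it)
        {
                list_del_init(&it->entry);
                /* the node points back at itself again */
                WARN_ON(!list_empty(&it->entry));       /* never fires */
        }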
     
  • The oom killer naturally defers killing anything if it finds an eligible
    task that is already exiting and has yet to detach its ->mm. This avoids
    unnecessarily killing tasks when one is already in the exit path and may
    free enough memory that the oom killer is no longer needed. This is
    detected by PF_EXITING since threads that have already detached their ->mm
    are no longer considered at all.

    The problem with always deferring when a thread is PF_EXITING, however, is
    that it may never actually exit when being traced, specifically if another
    task is tracing it with PTRACE_O_TRACEEXIT. The oom killer does not want
    to defer in this case since there is no guarantee that thread will ever
    exit without intervention.

    This patch will now only defer the oom killer when a thread is PF_EXITING
    and no ptracer has stopped its progress in the exit path. It also ensures
    that a child is sacrificed for the chosen parent only if it has a
    different ->mm, as the comment implies: this ensures that the thread group
    leader is always targeted appropriately. (A conceptual sketch of the
    deferral test follows this entry.)

    Signed-off-by: David Rientjes
    Reported-by: Oleg Nesterov
    Cc: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Andrey Vagin
    Cc: [2.6.38.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
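
    A hedged, conceptual sketch of the deferral rule using the ptrace
    helpers of that era (not the mainline oom_kill.c code; the function
    name is hypothetical):

        #include <linux/sched.h>
        #include <linux/ptrace.h>

        static bool should_defer_oom(struct task_struct *p)
        {
                if (!(p->flags & PF_EXITING))
                        return false;           /* not in the exit path */
                if (task_ptrace(p) & PT_TRACE_EXIT)
                        return false;           /* a tracer may never let it exit */
                return true;                    /* give it time to free its memory */
        }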
     
  • We shouldn't defer oom killing if a thread has already detached its ->mm
    and still has TIF_MEMDIE set. Memory needs to be freed, so kill other
    threads that pin the same ->mm or find another task to kill.

    Signed-off-by: Andrey Vagin
    Signed-off-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: [2.6.38.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     
  • This patch prevents unnecessary oom kills or kernel panics by reverting
    two commits:

    495789a5 (oom: make oom_score to per-process value)
    cef1d352 (oom: multi threaded process coredump don't make deadlock)

    First, 495789a5 (oom: make oom_score to per-process value) ignores the
    fact that all threads in a thread group do not necessarily exit at the
    same time.

    It is imperative that select_bad_process() detect threads that are in the
    exit path, specifically those with PF_EXITING set, to prevent needlessly
    killing additional tasks. If a process is oom killed and the thread group
    leader exits, select_bad_process() cannot detect the other threads that
    are PF_EXITING by iterating over only processes. Thus, it currently
    chooses another task unnecessarily for oom kill or panics the machine when
    nothing else is eligible.

    By iterating over threads instead, it is possible to detect threads that
    are exiting and nominate them for oom kill so they get access to memory
    reserves.

    Second, cef1d352 (oom: multi threaded process coredump don't make
    deadlock) erroneously avoids making the oom killer a no-op when an
    eligible thread other than current is found to be exiting. We want to
    detect this situation so that we may allow that exiting thread time to
    exit and free its memory; if it is able to exit on its own, that should
    free memory so current is no longer oom. If it is not able to exit on
    its own, the oom killer will nominate it for oom kill which, in this
    case, only means it will get access to memory reserves.

    Without this change, it is easy for the oom killer to unnecessarily target
    tasks when all threads of a victim don't exit before the thread group
    leader or, in the worst case, panic the machine.

    Signed-off-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Cc: Andrey Vagin
    Cc: [2.6.38.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • If an administrator tries to swapon a file backed by NFS, the inode mutex
    is taken (as it is for any swapfile), but the file is later identified as
    a bad swapfile due to the lack of bmap and the code tries to clean up.
    During cleanup, an attempt is made to close the file, but with
    inode->i_mutex still held. Closing an NFS file syncs it, which tries to
    acquire the inode mutex, leading to deadlock. If lockdep is enabled, the
    following appears on the console:

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.38-rc8-autobuild #1
    ---------------------------------------------
    swapon/2192 is trying to acquire lock:
    (&sb->s_type->i_mutex_key#13){+.+.+.}, at: vfs_fsync_range+0x47/0x7c

    but task is already holding lock:
    (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7

    other info that might help us debug this:
    1 lock held by swapon/2192:
    #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7

    stack backtrace:
    Pid: 2192, comm: swapon Not tainted 2.6.38-rc8-autobuild #1
    Call Trace:
    __lock_acquire+0x2eb/0x1623
    find_get_pages_tag+0x14a/0x174
    pagevec_lookup_tag+0x25/0x2e
    vfs_fsync_range+0x47/0x7c
    lock_acquire+0xd3/0x100
    vfs_fsync_range+0x47/0x7c
    nfs_flush_one+0x0/0xdf [nfs]
    mutex_lock_nested+0x40/0x2b1
    vfs_fsync_range+0x47/0x7c
    vfs_fsync_range+0x47/0x7c
    vfs_fsync+0x1c/0x1e
    nfs_file_flush+0x64/0x69 [nfs]
    filp_close+0x43/0x72
    sys_swapon+0xa39/0xae7
    sysret_check+0x2e/0x69
    system_call_fastpath+0x16/0x1b

    This patch releases the mutex if it is held before calling filp_close()
    so swapon fails as expected without deadlock when the swapfile is backed
    by NFS. If accepted for 2.6.39, it should also be considered a -stable
    candidate for 2.6.38 and 2.6.37. (A hedged sketch of the fix follows
    this entry.)

    Signed-off-by: Mel Gorman
    Acked-by: Hugh Dickins
    Cc: [2.6.37+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
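
    A hedged sketch of the shape of the fix, assuming sys_swapon()'s local
    variables inode and swap_file (not the exact mainline error path):

        /* On the error path, drop the inode mutex before closing the
         * rejected swapfile, because the NFS flush done by filp_close()
         * needs to take i_mutex itself. */
        if (inode && S_ISREG(inode->i_mode)) {
                mutex_unlock(&inode->i_mutex);
                inode = NULL;                   /* record that it is no longer held */
        }
        filp_close(swap_file, NULL);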
     
  • syncfs() is duplicating name_to_handle_at() due to a merging mistake.

    Cc: Sage Weil
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    slub: Add statistics for this_cmpxchg_double failures
    slub: Add missing irq restore for the OOM path

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    [net/9p]: Introduce basic flow-control for VirtIO transport.
    9p: use the updated offset given by generic_write_checks
    [net/9p] Don't re-pin pages on retrying virtqueue_add_buf().
    [net/9p] Set the condition just before waking up.
    [net/9p] unconditional wake_up to proc waiting for space on VirtIO ring
    fs/9p: Add v9fs_dentry2v9ses
    fs/9p: Attach writeback_fid on first open with WR flag
    fs/9p: Open writeback fid in O_SYNC mode
    fs/9p: Use truncate_setsize instead of vmtruncate
    net/9p: Fix compile warning
    net/9p: Convert the in the 9p rpc call path to GFP_NOFS
    fs/9p: Fix race in initializing writeback fid

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: use watch/notify for changes in rbd header
    libceph: add lingering request and watch/notify event framework
    rbd: update email address in Documentation
    ceph: rename dentry_release -> d_release, fix comment
    ceph: add request to the tail of unsafe write list
    ceph: remove request from unsafe list if it is canceled/timed out
    ceph: move readahead default to fs/ceph from libceph
    ceph: add ino32 mount option
    ceph: update common header files
    ceph: remove debugfs debug cruft
    libceph: fix osd request queuing on osdmap updates
    ceph: preserve I_COMPLETE across rename
    libceph: Fix base64-decoding when input ends in newline.

    Linus Torvalds
     
  • Using delayed-work for tty flip buffers ends up causing us to wait for
    the next tick to complete some actions. That's usually not all that
    noticeable, but for certain latency-critical workloads it ends up being
    totally unacceptable.

    As an extreme case of this, passing a token back-and-forth over a pty
    will take two ticks per iteration, so even just a thousand iterations
    will take 8 seconds assuming a common 250Hz configuration.

    Avoiding the whole delayed work issue brings that ping-pong test-case
    down to 0.009s on my machine.

    In more practical terms, this latency has been a performance problem for
    things like dive computer simulators (simulating the serial interface
    using the ptys) and for other environments (Alan mentions a CP/M emulator).

    Reported-by: Jef Driesen
    Acked-by: Greg KH
    Acked-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Recent zerocopy work in the 9P VirtIO transport maps and pins user
    buffers into kernel memory for the server to work on them. Since the
    user process can initiate this kind of pinning with a simple read/write
    call, thousands of IO threads initiated by the user process can hog the
    system resources and could result in a denial of service.

    This patch introduces flow control to avoid that extreme scenario.

    The ceiling limit to avoid denial-of-service attacks is set relatively
    high (nr_free_pagecache_pages()/4) so that it won't interfere with
    regular usage, but it can step in in extreme cases to prevent a total
    system hang. Since we don't have a global structure to accommodate this
    variable, I chose virtio_chan as its home. (A hedged sketch of the
    throttling follows this entry.)

    Signed-off-by: Venkateswararao Jujjuri
    Reviewed-by: Badari Pulavarty
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
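
    A hedged, generic sketch of the throttling idea (the names are
    illustrative, not those used in net/9p/trans_virtio.c):

        #include <linux/wait.h>
        #include <linux/atomic.h>
        #include <linux/swap.h>         /* nr_free_pagecache_pages() */

        static atomic_t pinned_pages = ATOMIC_INIT(0);
        static unsigned long max_pinned_pages;
        static DECLARE_WAIT_QUEUE_HEAD(pin_wq);

        static void pin_limit_init(void)
        {
                /* high ceiling: a quarter of the reclaimable page cache */
                max_pinned_pages = nr_free_pagecache_pages() / 4;
        }

        /* Block (interruptibly) until pinning nr_pages more would stay
         * under the ceiling, then account for them. */
        static int pin_throttle(int nr_pages)
        {
                int err = wait_event_interruptible(pin_wq,
                                atomic_read(&pinned_pages) + nr_pages <=
                                max_pinned_pages);
                if (err)
                        return err;
                atomic_add(nr_pages, &pinned_pages);
                return 0;
        }

        /* Release the accounting when the pages are unpinned. */
        static void pin_release(int nr_pages)
        {
                atomic_sub(nr_pages, &pinned_pages);
                wake_up(&pin_wq);
        }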
     
  • Without this fix, even if a file is opened in O_APPEND mode, data will be
    written at the current file position instead of at the end of the file.
    (A hedged sketch follows this entry.)
    Signed-off-by: M. Mohan Kumar
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    M. Mohan Kumar
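
    A hedged sketch of the pattern, using the four-argument form of
    generic_write_checks() from that era (the function and its arguments
    are illustrative; the real change is in the 9p write path):

        #include <linux/fs.h>

        /* generic_write_checks() updates *pos for O_APPEND files, moving it
         * to i_size; the write must then use the updated position rather
         * than the offset the caller passed in. */
        static ssize_t example_write(struct file *filp, const char __user *buf,
                                     size_t count, loff_t *ppos)
        {
                loff_t pos = *ppos;
                int err = generic_write_checks(filp, &pos, &count, 0);

                if (err)
                        return err;
                /* ... perform the write at 'pos', not at the original *ppos ... */
                *ppos = pos + count;
                return count;
        }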
     
  • Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
     
  • Given that spurious wake-ups are common, we need to move the condition
    setting right next to the wake_up(). If the condition req->status =
    REQ_STATUS_RCVD is set too early, a spuriously woken waiter may put the
    virtqueue buffer back on the free list for someone else to use. This may
    result in a kernel panic while releasing the pinned pages in
    p9_release_req_pages().

    Also rearranged the while loop in req_done() for better readability.

    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
     
  • Process may wait to get space on VirtIO ring to send a transaction to
    VirtFS server. Current code just does a conditional wake_up() which
    means only one process will be woken up even if multiple processes
    are waiting.

    This fix makes the wake_up unconditional, so no process is left waiting
    forever.

    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Venkateswararao Jujjuri (JV)
     
  • Add the new static inline helper and use it. (A hedged sketch of such a
    helper follows this entry.)

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
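
    A hedged sketch of the kind of helper being added (the exact body lives
    in the fs/9p headers; struct v9fs_session_info comes from fs/9p/v9fs.h):

        #include <linux/fs.h>

        static inline struct v9fs_session_info *v9fs_dentry2v9ses(struct dentry *dentry)
        {
                /* the 9p session is stashed in the superblock's private data */
                return dentry->d_sb->s_fs_info;
        }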
     
  • We don't need a writeback fid if we are only doing an O_RDONLY open.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Older versions of the protocol don't support the tsyncfs operation, so
    for them force an O_SYNC flag on the server.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V