Doug / smarc-fsl-linux-kernel | Embedian Git Server

11 Sep, 2010

6 commits

10d90f280 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: Kill all BKL usage.

Linus Torvalds
2010-09-11 23:01:09 +0800
aad1830e6 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, tsc: Fix a preemption leak in restore_sched_clock_state()
sched: Move sched_avg_update() to update_cpu_load()

Linus Torvalds
2010-09-11 22:59:49 +0800
55496c896 x86, tsc: Fix a preemption leak in restore_sched_clock_state()

Doh, a real life genuine preemption leak..

This caused a suspend failure.

Reported-bisected-and-tested-by-the-invaluable: Jeff Chua
Acked-by: Suresh Siddha
Signed-off-by: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Nico Schottelius
Cc: Jesse Barnes
Cc: Linus Torvalds
Cc: Florian Pritz
Cc: Suresh Siddha
Cc: Len Brown
Cc: # Greg, please apply after: cd7240c ("x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states")
LKML-Reference:
Signed-off-by: Ingo Molnar

Peter Zijlstra
2010-09-11 15:47:07 +0800
3e6dce76d Merge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel

* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel:
drm/i915: don't enable self-refresh on Ironlake
drm/i915: Double check that the wait_request is not pending before warning
Revert "drm/i915: Warn if we run out of FIFO space for a mode"
Revert "drm/i915: Allow LVDS on pipe A on gen4+"
Revert "drm/i915: Enable RC6 on Ironlake."

Linus Torvalds
2010-09-11 09:19:43 +0800
fbc148701 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: log IO completion workqueue is a high priority queue
xfs: prevent reading uninitialized stack memory

Linus Torvalds
2010-09-11 09:19:26 +0800
5ee5e97ee x86, tsc: Fix a preemption leak in restore_sched_clock_state()

A real life genuine preemption leak..

Reported-and-tested-by: Jeff Chua
Signed-off-by: Peter Zijlstra
Acked-by: Suresh Siddha
Signed-off-by: Linus Torvalds

Peter Zijlstra
2010-09-11 09:17:45 +0800

10 Sep, 2010

34 commits

51749e47e xfs: log IO completion workqueue is a high priority queue

The workqueue implementation in 2.6.36-rcX has changed, resulting
in the workqueues no longer having dedicated threads for work
processing. This has caused severe livelocks under heavy parallel
create workloads because the log IO completions have been getting
held up behind metadata IO completions. Hence log commits would
stall, memory allocation would stall because pages could not be
cleaned, and lock contention on the AIL during inode IO completion
processing was being seen to slow everything down even further.

By making the log IO completion workqueue a high priority workqueue,
log IO completions are queued ahead of all data/metadata IO completions and
processed before the data/metadata completions. Hence the log never
gets stalled, and operations needed to clean memory can continue as
quickly as possible. This avoids the livelock conditions and allows
the system to keep running under heavy load as per normal.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2010-09-10 23:16:54 +0800
9aea5a65a execve: make responsive to SIGKILL with large arguments

An execve with a very large total of argument/environment strings
can take a really long time in the execve system call. It runs
uninterruptibly to count and copy all the strings. This change
makes it abort the exec quickly if sent a SIGKILL.

Note that this is the conservative change, to interrupt only for
SIGKILL, by using fatal_signal_pending(). It would be perfectly
correct semantics to let any signal interrupt the string-copying in
execve, i.e. use signal_pending() instead of fatal_signal_pending().
We'll save that change for later, since it could have user-visible
consequences, such as having a timer set too quickly make it so that
an execve can never complete, though it always happened to work before.
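The idea of the conservative change can be sketched in userspace C. This is only an illustration, not the kernel code: the `kill_requested` flag is a hypothetical stand-in for fatal_signal_pending(), and the function name, buffer interface, and the -EINTR return value are assumptions made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fatal_signal_pending(): set when a
 * SIGKILL-like request has arrived. */
volatile sig_atomic_t kill_requested;

/*
 * Copy n NUL-terminated strings back to back into dst (capacity cap).
 * The fatal-signal flag is checked before each string, so a long copy
 * can be abandoned quickly. Returns the number of bytes copied,
 * -EINTR if interrupted, or -E2BIG if dst is too small.
 */
long copy_strings_sketch(char *dst, size_t cap,
                         const char *const *strs, size_t n)
{
    size_t used = 0;

    assert(dst != NULL && strs != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t len;

        if (kill_requested)
            return -EINTR;          /* abort the copy early */
        len = strlen(strs[i]) + 1;  /* include the terminating NUL */
        if (len > cap - used)
            return -E2BIG;
        memcpy(dst + used, strs[i], len);
        used += len;
    }
    return (long)used;
}
```

Checking only a fatal condition, rather than any pending signal, mirrors the trade-off described above: ordinary signals cannot cause the copy to be abandoned.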

Signed-off-by: Roland McGrath
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Linus Torvalds

Roland McGrath
2010-09-10 23:10:26 +0800
7993bc1f4 execve: improve interactivity with large arguments

This adds a preemption point during the copying of the argument and
environment strings for execve, in copy_strings(). There is already
a preemption point in the count() loop, so this doesn't add any new
points in the abstract sense.

When the total argument+environment strings are very large, the time
spent copying them can be much more than a normal user time slice.
So this change improves the interactivity of the rest of the system
when one process is doing an execve with very large arguments.

Signed-off-by: Roland McGrath
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Linus Torvalds

Roland McGrath
2010-09-10 23:10:26 +0800
1b528181b setup_arg_pages: diagnose excessive argument size

The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not
check the size of the argument/environment area on the stack.
When it is unworkably large, shift_arg_pages() hits its BUG_ON.
This is exploitable with a very large RLIMIT_STACK limit, to
create a crash pretty easily.

Check that the initial stack is not too large to make it possible
to map in any executable. We're not checking that the actual
executable (or interpreter, for binfmt_elf) will fit. So those
mappings might clobber part of the initial stack mapping. But
that is just userland lossage that userland made happen, not a
kernel problem.

Signed-off-by: Roland McGrath
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Linus Torvalds

Roland McGrath
2010-09-10 23:10:26 +0800
be6200aac Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Perform hardware_enable in CPU_STARTING callback
KVM: i8259: fix migration
KVM: fix i8259 oops when no vcpus are online
KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts

Linus Torvalds
2010-09-10 23:02:45 +0800
f2955b490 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread
perf symbols: Fix multiple initialization of symbol system
perf: Fix CPU hotplug
perf, trace: Fix module leak
tracing/kprobe: Fix handling of C-unlike argument names
tracing/kprobes: Fix handling of argument names
perf probe: Fix handling of arguments names
perf probe: Fix return probe support
tracing/kprobe: Fix a memory leak in error case
tracing: Do not allow llseek to set_ftrace_filter

Linus Torvalds
2010-09-10 22:31:24 +0800
3d96406c7 KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyring

Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
of the parent process's session keyring whether or not the parent has a session
keyring [CVE-2010-2960].

This results in the following oops:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [] keyctl_session_to_parent+0x251/0x443
...
Call Trace:
[] ? keyctl_session_to_parent+0x67/0x443
[] ? __do_fault+0x24b/0x3d0
[] sys_keyctl+0xb4/0xb8
[] system_call_fastpath+0x16/0x1b

if the parent process has no session keyring.

If the system is using pam_keyinit then it is mostly protected against this as all
processes derived from a login will have inherited the session keyring created
by pam_keyinit during the login procedure.

To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.

Reported-by: Tavis Ormandy
Signed-off-by: David Howells
Acked-by: Tavis Ormandy
Signed-off-by: Linus Torvalds

David Howells
2010-09-10 22:30:00 +0800
9d1ac65a9 KEYS: Fix RCU no-lock warning in keyctl_session_to_parent()

There's an unprotected access to the parent process's credentials in the middle
of keyctl_session_to_parent(). This results in the following RCU warning:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by keyctl-session-/2137:
#0: (tasklist_lock){.+.+..}, at: [] keyctl_session_to_parent+0x60/0x236

stack backtrace:
Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
Call Trace:
[] lockdep_rcu_dereference+0xaa/0xb3
[] keyctl_session_to_parent+0xed/0x236
[] sys_keyctl+0xb4/0xb6
[] system_call_fastpath+0x16/0x1b

The code should take the RCU read lock to make sure the parent's credentials
don't go away, even though it's holding a spinlock and has IRQs disabled.

Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2010-09-10 22:30:00 +0800
ff3cb3fec Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: Range check cpu in blk_cpu_to_group
scatterlist: prevent invalid free when alloc fails
writeback: Fix lost wake-up shutting down writeback thread
writeback: do not lose wakeup events when forking bdi threads
cciss: fix reporting of max queue depth since init
block: switch s390 tape_block and mg_disk to elevator_change()
block: add function call to switch the IO scheduler from a driver
fs/bio-integrity.c: return -ENOMEM on kmalloc failure
bio-integrity.c: remove dependency on __GFP_NOFAIL
BLOCK: fix bio.bi_rw handling
block: put dev->kobj in blk_register_queue fail path
cciss: handle allocation failure
cfq-iosched: Documentation help for new tunables
cfq-iosched: blktrace print per slice sector stats
cfq-iosched: Implement tunable group_idle
cfq-iosched: Do group share accounting in IOPS when slice_idle=0
cfq-iosched: Do not idle if slice_idle=0
cciss: disable doorbell reset on reset_devices
blkio: Fix return code for mkdir calls

Linus Torvalds
2010-09-10 22:26:27 +0800
6ccaa3172 Merge branch 'at91-fixes-for-linus' of git://github.com/at91linux/linux-2.6-at91

* 'at91-fixes-for-linus' of git://github.com/at91linux/linux-2.6-at91:
AT91: at91sam9261ek: remove C99 comments but keep information
AT91: at91sam9261ek board: remove warnings related to use of SPI or SD/MMC
AT91: dm9000 initialization update
AT91: SAM9G45 - add a separate clock entry for every single TC block
AT91: clock: peripheral clocks can have other parent than mck
AT91: change dma resource index

Linus Torvalds
2010-09-10 22:24:51 +0800
3657423c0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: rawmidi: fix the get next midi device ioctl
ALSA: hda - Fix wrong HP pin detection in snd_hda_parse_pin_def_config()
ALSA: seq/oss - Fix double-free at error path of snd_seq_oss_open()
ALSA: msnd-classic: Fix invalid cfg parameter
ALSA: hda - Enable PC-beep for EeePC with ALC269 codec
ALSA: hda - Add errata initverb sequence for CS42xx codecs
ALSA: usb - Release capture substream URBs properly
ALSA: virtuoso: fix setting of Xonar DS line-in/mic-in controls
ALSA: virtuoso: work around missing reset in the Xonar DS Windows driver
ALSA: hda - Add quirk for Lenovo T400s
ALSA: usb-audio: fix detection of vendor-specific device protocol settings
ALSA: usb-audio: Assume first control interface is for audio
ALSA: hda - Add a new hp-laptop model for Conexant 5066, tested on HP G60

Linus Torvalds
2010-09-10 22:23:45 +0800
dd8849c8f drm/i915: don't enable self-refresh on Ironlake

We don't know how to enable it safely, especially as outputs turn on and
off. When disabling LP1 we also need to make sure LP2 and 3 are already
disabled.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29173
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29082
Reported-by: Chris Lord
Signed-off-by: Jesse Barnes
Tested-by: Daniel Vetter
Cc: stable@kernel.org
Signed-off-by: Chris Wilson

Jesse Barnes
2010-09-10 22:11:43 +0800
a122eb2fd xfs: prevent reading uninitialized stack memory

The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12
bytes of uninitialized stack memory, because the fsxattr struct
declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero)
the 12-byte fsx_pad member before copying it back to the user. This
patch takes care of it.
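The general pattern behind this kind of fix can be sketched as follows. The struct layout and function name are hypothetical (only the 12-byte fsx_pad member comes from the description above), and zeroing the whole struct with memset() is one common way to do it, not necessarily what the patch itself does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reply layout with a 12-byte pad, loosely modeled on the
 * struct described above. Field names other than fsx_pad are assumptions. */
struct reply_sketch {
    uint32_t flags;
    uint32_t extsize;
    unsigned char fsx_pad[12];
};

/* Fill a reply that will later be copied out to an untrusted reader. */
void fill_reply(struct reply_sketch *out, uint32_t flags, uint32_t extsize)
{
    assert(out != NULL);
    /* Zero everything first so the pad bytes never carry stale stack data. */
    memset(out, 0, sizeof(*out));
    out->flags = flags;
    out->extsize = extsize;
}
```

Clearing the whole object before filling it also covers any implicit padding the compiler might add between members.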

Signed-off-by: Dan Rosenberg
Reviewed-by: Eric Sandeen
Signed-off-by: Alex Elder

Dan Rosenberg
2010-09-10 20:39:28 +0800
4deb22a60 AT91: at91sam9261ek: remove C99 comments but keep information

Signed-off-by: Nicolas Ferre

Nicolas Ferre
2010-09-10 20:36:06 +0800
64d72bbee AT91: at91sam9261ek board: remove warnings related to use of SPI or SD/MMC

The SD/MMC data structure is not used if SPI is selected. The configuration
of PIO on the board prevents using both interfaces at the same time
(board dependent).
Remove the compile-time warnings by adding a preprocessor condition.

Signed-off-by: Nicolas Ferre

Nicolas Ferre
2010-09-10 18:00:56 +0800
1879c45cc AT91: dm9000 initialization update

Add information in dm9000 mac/phy chip initialization:
- irq resource details
- platform data details

Signed-off-by: Nicolas Ferre

Nicolas Ferre
2010-09-10 17:39:23 +0800
be14eb619 block: Range check cpu in blk_cpu_to_group

While testing CPU DLPAR, the following problem was discovered.
We were DLPAR removing the first CPU, which in this case was
logical CPUs 0-3. CPUs 0-2 were already marked offline and
we were in the process of offlining CPU 3. After marking
the CPU inactive and offline in cpu_disable, but before the
cpu was completely idle (cpu_die), we ended up in __make_request
on CPU 3. There we looked at the topology map to see which CPU
to complete the I/O on and found no CPUs in the cpu_sibling_map.
This resulted in the block layer setting the completion cpu
to be NR_CPUS, which then caused an oops when we tried to
complete the I/O.

Fix this by sanity checking the value we return from blk_cpu_to_group
to be a valid cpu value.
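A minimal userspace sketch of such a range check is shown below. The bitmask representation of the sibling map, the NR_CPUS_SKETCH limit, and both function names are assumptions for illustration; falling back to the caller's own CPU is one plausible choice of valid value.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 8 /* hypothetical CPU limit for this sketch */

/* Index of the lowest set bit in mask, or NR_CPUS_SKETCH if the mask is
 * empty (the "found no CPUs" case described above). */
static int first_cpu_sketch(unsigned int mask)
{
    for (int i = 0; i < NR_CPUS_SKETCH; i++)
        if (mask & (1u << i))
            return i;
    return NR_CPUS_SKETCH;
}

/* Pick the completion CPU for 'cpu' from its sibling mask, falling back
 * to 'cpu' itself when the mask yields an out-of-range value. */
int cpu_to_group_sketch(int cpu, unsigned int sibling_mask)
{
    int group;

    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    group = first_cpu_sketch(sibling_mask);
    if (group < NR_CPUS_SKETCH)
        return group;
    return cpu; /* sanity fallback: never hand back NR_CPUS_SKETCH */
}
```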

Signed-off-by: Brian King
Signed-off-by: Jens Axboe

Brian King
2010-09-10 15:03:21 +0800
5431427b1 Merge branch 'fix/hda' into for-linus

Takashi Iwai
2010-09-10 14:27:00 +0800
9efdda310 Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent

Ingo Molnar
2010-09-10 14:05:34 +0800
e1d4e08d1 Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/urgent

Ingo Molnar
2010-09-10 13:34:14 +0800
df423dc7f Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata-sff: Reenable Port Multiplier after libata-sff remodeling.
libata: skip EH autopsy and recovery during suspend
ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDs
ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDs
libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load()
ahci: fix hang on failed softreset
pata_artop: Fix device ID parity check

Linus Torvalds
2010-09-10 11:28:19 +0800
df0916255 tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread

Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without
having properly started the iterator to iterate the hash. This case is
degenerate and, as discovered by Robert Swiecki, can cause t_hash_show()
to misuse a pointer. This causes a NULL ptr deref with possible security
implications. Tracked as CVE-2010-3079.

Cc: Robert Swiecki
Cc: Eugene Teo
Cc:
Signed-off-by: Chris Wright
Signed-off-by: Steven Rostedt

Chris Wright
2010-09-10 10:43:49 +0800
ea3c64506 libata-sff: Reenable Port Multiplier after libata-sff remodeling.

Keep track of the link on which the current request is in progress.
This allows support of links behind a port multiplier.

Not all of libata-sff is PMP compliant. Code for the native BMDMA controller
does not take PMP into account.

Tested on Marvell 7042 and Sil7526.

Signed-off-by: Gwendal Grignou
Signed-off-by: Jeff Garzik

Gwendal Grignou
2010-09-10 10:31:55 +0800
e2f3d75fc libata: skip EH autopsy and recovery during suspend

For some mysterious reason, certain hardware reacts badly to usual EH
actions while the system is going for suspend. As the devices won't
be needed until the system is resumed, ask EH to skip usual autopsy
and recovery and proceed directly to suspend.

Signed-off-by: Tejun Heo
Tested-by: Stephan Diestelhorst
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik

Tejun Heo
2010-09-10 10:27:59 +0800
992b3fb9b ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDs

This patch adds the Intel Patsburg (PCH) SATA AHCI and RAID Controller
DeviceIDs.

Signed-off-by: Seth Heasley
Signed-off-by: Jeff Garzik

Seth Heasley
2010-09-10 10:27:55 +0800
238e149c7 ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDs

This patch adds the Intel Patsburg (PCH) IDE mode SATA Controller DeviceIDs.

Signed-off-by: Seth Heasley
Signed-off-by: Jeff Garzik

Seth Heasley
2010-09-10 10:27:48 +0800
40c602303 libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load()

Commit 978c0666 (libata: Remove excess delay in the tf_load path)
removed ata_wait_idle() from ata_sff_tf_load() and via_tf_load().
This caused obscure detection problems in sata_sil.

https://bugzilla.kernel.org/show_bug.cgi?id=16606

The commit was pure performance optimization. Revert it for now.

Reported-by: Dieter Plaetinck
Reported-by: Jan Beulich
Bisected-by: gianluca
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik

Tejun Heo
2010-09-10 10:27:44 +0800
eee743fd7 minix: fix regression in minix_mkdir()

Commit 9eed1fb721c ("minix: replace inode uid,gid,mode init with helper")
broke directory creation on minix filesystems.

Fix it by passing the needed mode flag to inode init helper.

Signed-off-by: Jorge Boncompte [DTI2]
Cc: Dmitry Monakhov
Cc: Al Viro
Cc: [2.6.35.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jorge Boncompte [DTI2]
2010-09-10 09:57:25 +0800
9ee493ce0 mm: page allocator: drain per-cpu lists after direct reclaim allocation fails

When under significant memory pressure, a process enters direct reclaim
and immediately afterwards tries to allocate a page. If it fails and no
further progress is made, it's possible the system will go OOM. However,
on systems with large amounts of memory, it's possible that a significant
number of pages are on per-cpu lists and inaccessible to the calling
process. This leads to a process entering direct reclaim more often than
it should, increasing the pressure on the system and compounding the
problem.

This patch notes that if direct reclaim is making progress but allocations
are still failing that the system is already under heavy pressure. In
this case, it drains the per-cpu lists and tries the allocation a second
time before continuing.
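The retry logic described above can be sketched with a toy model. Everything here is hypothetical: the pool structure, the CPU count, and the function names are invented for illustration and do not correspond to the kernel's page allocator interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU_SKETCH 4 /* hypothetical CPU count */

/* Toy pool: pages on the shared free list plus pages parked on
 * per-CPU lists, which the shared allocation path cannot see. */
struct pool_sketch {
    int free_list;
    int pcp[NCPU_SKETCH];
};

/* Take one page from the shared free list; returns 1 on success. */
static int try_alloc(struct pool_sketch *p)
{
    if (p->free_list > 0) {
        p->free_list--;
        return 1;
    }
    return 0;
}

/* Move every per-CPU page back onto the shared free list. */
static void drain_all(struct pool_sketch *p)
{
    for (int i = 0; i < NCPU_SKETCH; i++) {
        p->free_list += p->pcp[i];
        p->pcp[i] = 0;
    }
}

/* Allocation attempt made right after direct reclaim. If reclaim made
 * progress but the allocation still fails, drain the per-CPU lists and
 * try exactly once more. */
int alloc_after_reclaim(struct pool_sketch *p, int reclaim_progress)
{
    assert(p != NULL);
    if (try_alloc(p))
        return 1;
    if (reclaim_progress) {
        drain_all(p);
        return try_alloc(p);
    }
    return 0;
}
```

The drain only happens on the failure path, so the common case pays nothing extra.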

Signed-off-by: Mel Gorman
Reviewed-by: Minchan Kim
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: KOSAKI Motohiro
Reviewed-by: Christoph Lameter
Cc: Dave Chinner
Cc: Wu Fengguang
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2010-09-10 09:57:25 +0800
aa4548403 mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake

Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
cheaper than scanning a number of lists. To avoid synchronization
overhead, counter deltas are maintained on a per-cpu basis and drained
both periodically and when the delta is above a threshold. On large CPU
systems, the difference between the estimated and real value of
NR_FREE_PAGES can be very high. If NR_FREE_PAGES is much higher than the
number of real free pages in the buddy allocator, the VM can allocate pages
below the min watermark, at worst reducing the real number of pages to zero. Even if
the OOM killer kills some victim for freeing memory, it may not free
memory if the exit path requires a new page resulting in livelock.

This patch introduces a zone_page_state_snapshot() function (courtesy of
Christoph) that takes a slightly more accurate view of an arbitrary vmstat
counter. It is used to read NR_FREE_PAGES while kswapd is awake to avoid
the watermark being accidentally broken. The estimate is not perfect and
may result in cache line bounces but is expected to be lighter than the
IPI calls necessary to continually drain the per-cpu counters while kswapd
is awake.
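The difference between the cheap read and the snapshot read can be sketched as below. The structure, field names, and CPU count are assumptions for illustration; the sketch only shows the idea of folding the outstanding per-CPU deltas into the global value at read time.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU_SKETCH 4 /* hypothetical CPU count */

/* Toy counter: a global value plus per-CPU deltas that have not yet
 * been folded into it. */
struct zone_sketch {
    long global_free;
    signed char cpu_delta[NCPU_SKETCH];
};

/* Cheap read: global value only, possibly far from the truth. */
long free_pages_fast(const struct zone_sketch *z)
{
    assert(z != NULL);
    return z->global_free;
}

/* Snapshot read: add the outstanding per-CPU deltas. Costlier, since it
 * touches every CPU's delta, but closer to the real value. */
long free_pages_snapshot(const struct zone_sketch *z)
{
    long x;

    assert(z != NULL);
    x = z->global_free;
    for (int i = 0; i < NCPU_SKETCH; i++)
        x += z->cpu_delta[i];
    return x < 0 ? 0 : x; /* a page count cannot be negative */
}
```

In this model a caller would use the fast read normally and switch to the snapshot read only while under pressure, which matches the "while kswapd is awake" condition described above.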

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Christoph Lameter
2010-09-10 09:57:25 +0800
72853e299 mm: page allocator: update free page counters after pages are placed on the free list

When allocating a page, the system uses NR_FREE_PAGES counters to
determine if watermarks would remain intact after the allocation was made.
This check is made without interrupts disabled or the zone lock held and
so is race-prone by nature. Unfortunately, when pages are being freed in
batch, the counters are updated before the pages are added on the list.
During this window, the counters are misleading as the pages do not exist
yet. When under significant pressure on systems with large numbers of
CPUs, it's possible for processes to make progress even though they should
have been stalled. This is particularly problematic if a number of the
processes are using GFP_ATOMIC as the min watermark can be accidentally
breached and in extreme cases, the system can livelock.

This patch updates the counters after the pages have been added to the
list. This makes the allocator more cautious with respect to preserving
the watermarks and mitigates livelock possibilities.

[akpm@linux-foundation.org: avoid modifying incoming args]
Signed-off-by: Mel Gorman
Reviewed-by: Rik van Riel
Reviewed-by: Minchan Kim
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Christoph Lameter
Reviewed-by: KOSAKI Motohiro
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2010-09-10 09:57:25 +0800
5ee28a447 vmstat: update zone stat threshold when onlining a cpu

refresh_zone_stat_thresholds() calculates its threshold based on the number of
online cpus. It's called at cpu offlining but needs to be called at
onlining, too.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Christoph Lameter
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-09-10 09:57:25 +0800
3ab04d5cf vfs: take O_NONBLOCK out of the O_* uniqueness test

O_NONBLOCK on parisc has a dual value:

#define O_NONBLOCK 000200004 /* HPUX has separate NDELAY & NONBLOCK */

It is caught by the O_* bits uniqueness check and leads to a parisc
compile error. The fix would be to take O_NONBLOCK out.
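Why a two-bit flag trips a bit-counting uniqueness test can be shown with a small sketch. The function names and the flag lists are hypothetical, and the kernel performs its check at build time rather than at run time; only the counting idea is illustrated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in v. */
static unsigned int popcount32(uint32_t v)
{
    unsigned int n = 0;

    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Returns 1 when the OR of all n flags has exactly n bits set, which is
 * what a counting test expects if every flag is its own single bit. */
int flags_pass_bit_count(const uint32_t *flags, size_t n)
{
    uint32_t all = 0;

    assert(flags != NULL);
    for (size_t i = 0; i < n; i++)
        all |= flags[i];
    return popcount32(all) == n;
}
```

With the parisc value 000200004 (octal), one flag contributes two bits, so three flags produce four set bits and the count no longer matches.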

Signed-off-by: Wu Fengguang
Signed-off-by: James Bottomley
Cc: Jamie Lokier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

James Bottomley
2010-09-10 09:57:25 +0800
339944663 swap: discard while swapping only if SWAP_FLAG_DISCARD

Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs
show a shift since I last tested in December: in part because of firmware
updates, in part because of the necessary move from barriers to awaiting
completion at the block layer. While discard at swapon still shows as
slightly beneficial on both, discarding a 1MB swap cluster when allocating
is now disadvantageous: adds 25% overhead on Intel, adds 230% on OCZ (YMMV).

Surrender: discard as presently implemented is more hindrance than help
for swap; but might prove useful on other devices, or with improvements.
So continue to do the discard at swapon, but make discard while swapping
conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using
only the lower 16 bits of int flags).
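How a new flag can sit above the lower 16 bits of the flags word can be sketched as follows. The SKETCH_* constants and their values are assumptions chosen for illustration (a 15-bit priority, a "priority given" bit, and a discard bit just above bit 15); they are not taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag layout for this sketch. */
#define SKETCH_PRIO_MASK 0x7fff  /* priority value in the low 15 bits */
#define SKETCH_PREFER    0x8000  /* a priority was supplied */
#define SKETCH_DISCARD   0x10000 /* first bit above the lower 16 */

struct swapon_opts {
    int has_prio;
    int prio;
    int discard_while_swapping;
};

/* Decode a swapon-style flags word into separate options. */
void decode_swap_flags(int flags, struct swapon_opts *out)
{
    assert(out != NULL);
    out->has_prio = (flags & SKETCH_PREFER) != 0;
    out->prio = out->has_prio ? (flags & SKETCH_PRIO_MASK) : 0;
    /* Discard while swapping is opt-in; callers that never set the new
     * bit keep their old behaviour. */
    out->discard_while_swapping = (flags & SKETCH_DISCARD) != 0;
}
```

Because older callers only ever used the lower 16 bits, a bit above them can be introduced without changing what existing flag values mean.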

We can add a --discard or -d to swapon(8), and a "discard" to swap in
/etc/fstab: matching the mount option for btrfs, ext4, fat, gfs2, nilfs2.

Signed-off-by: Hugh Dickins
Cc: Christoph Hellwig
Cc: Nigel Cunningham
Cc: Tejun Heo
Cc: Jens Axboe
Cc: James Bottomley
Cc: "Martin K. Petersen"
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2010-09-10 09:57:25 +0800