Eric Lee / smarc-fsl-linux-kernel

22 Mar, 2011

40 commits

ad9c990a2 Linux 2.6.33.8

Greg Kroah-Hartman
2011-03-22 03:49:43 +0800
99b0ee6c6 isdn: avoid calling tty_ldisc_flush() in atomic context

commit bc10f96757bd6ab3721510df8defa8f21c32f974 upstream.

Remove the call to tty_ldisc_flush() from the RESULT_NO_CARRIER
branch of isdn_tty_modem_result(), as already proposed in commit
00409bb045887ec5e7b9e351bc080c38ab6bfd33.
This avoids a "sleeping function called from invalid context" BUG
when the hardware driver calls the statcallb() callback with
command==ISDN_STAT_DHUP in atomic context, which in turn calls
isdn_tty_modem_result(RESULT_NO_CARRIER, ~), and from there,
tty_ldisc_flush() which may sleep.

Signed-off-by: Tilman Schmidt
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Tilman Schmidt
2011-03-22 03:45:53 +0800
b8f5defba x86: Flush TLB if PGD entry is changed in i386 PAE mode

commit 4981d01eada5354d81c8929d5b2836829ba3df7b upstream.

According to the Intel CPU manual, every time a PGD entry is changed in i386 PAE
mode, we need to do a full TLB flush. Current code follows this and there is a
comment for this too in the code.

But current code misses the multi-threaded case. A changed page table
might be used by several CPUs, every such CPU should flush TLB. Usually
this isn't a problem, because we prepopulate all PGD entries at process
fork. But when the process does an munmap followed by a new mmap, this issue
will be triggered.

When it happens, some CPUs keep doing page faults:

http://marc.info/?l=linux-kernel&m=129915020508238&w=2

Reported-by: Yasunori Goto
Tested-by: Yasunori Goto
Reviewed-by: Rik van Riel
Signed-off-by: Shaohua Li
Cc: Mallick Asit K
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: linux-mm
LKML-Reference:
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman

Shaohua Li
2011-03-22 03:45:53 +0800
dae148953 call_function_many: add missing ordering

commit 45a5791920ae643eafc02e2eedef1a58e341b736 upstream.

Paul McKenney's review pointed out two problems with the barriers in the
2.6.38 update to the smp call function many code.

First, a barrier that would force the func and info members of data to
be visible before their consumption in the interrupt handler was
missing. This can be solved by adding a smp_wmb between setting the
func and info members and setting the cpumask; this will pair
with the existing and required smp_rmb ordering the cpumask read before
the read of refs. This placement avoids the need for a second smp_rmb in
the interrupt handler which would be executed on each of the N cpus
executing the call request. (I was thinking this barrier was present
but it was not).

Second, the previous write to refs (establishing the zero that the
interrupt handler was testing from all cpus) was performed by a third
party cpu. This would invoke transitivity which, as a recent or
concurrent addition to memory-barriers.txt now explicitly states, would
require a full smp_mb().

However, we know the cpumask will only be set by one cpu (the data
owner) and any previous iteration of the mask would have been cleared by the
reading cpu. By redundantly writing refs to 0 on the owning cpu before
the smp_wmb, the write to refs will follow the same path as the writes
that set the cpumask, which in turn allows us to keep the barrier in the
interrupt handler a smp_rmb instead of promoting it to a smp_mb (which
will be executed by N cpus for each of the possible M elements on the
list).

I moved and expanded the comment about our (ab)use of the rcu list
primitives for the concurrent walk earlier into this function. I
considered moving the first two paragraphs to the queue list head and
lock, but felt it would have been too disconnected from the code.

Cc: Paul McKenney
Signed-off-by: Milton Miller
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Milton Miller
2011-03-22 03:45:53 +0800
9b1bd836a call_function_many: fix list delete vs add race

commit e6cd1e07a185d5f9b0aa75e020df02d3c1c44940 upstream.

Peter pointed out there was nothing preventing the list_del_rcu in
smp_call_function_interrupt from running before the list_add_rcu in
smp_call_function_many.

Fix this by not setting refs until we have gotten the lock for the list.
Take advantage of the wmb in list_add_rcu to save an explicit additional
one.

I tried to force this race with a udelay before the lock & list_add and
by mixing all 64 online cpus with just 3 random cpus in the mask, but
was unsuccessful. Still, inspection shows a valid race, and the fix is
an extension of the existing protection window in the current code.

Reported-by: Peter Zijlstra
Signed-off-by: Milton Miller
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Milton Miller
2011-03-22 03:45:52 +0800
9cedf7840 ext3: Always set dx_node's fake_dirent explicitly.

commit d7433142b63d727b5a217c37b1a1468b116a9771 upstream.

(crossport of 1f7bebb9e911d870fa8f997ddff838e82b5715ea
by Andreas Schlick)

When ext3_dx_add_entry() has to split an index node, it has to ensure that
name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck
won't recognise it as an intermediate htree node and consider the htree to
be corrupted.

Signed-off-by: Eric Sandeen
Signed-off-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman

Eric Sandeen
2011-03-22 03:45:52 +0800
9008aa5b6 perf, powerpc: Handle events that raise an exception without overflowing

commit 0837e3242c73566fc1c0196b4ec61779c25ffc93 upstream.

Events on POWER7 can roll back if a speculative event doesn't
eventually complete. Unfortunately in some rare cases they will
raise a performance monitor exception. We need to catch this to
ensure we reset the PMC. In all cases the PMC will be 256 or less
cycles from overflow.

Signed-off-by: Anton Blanchard
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman

Anton Blanchard
2011-03-22 03:45:51 +0800
585f09f8b SUNRPC: Ensure we always run the tk_callback before tk_action

commit e020c6800c9621a77223bf2c1ff68180e41e8ebf upstream.

This fixes a race in which the task->tk_callback() puts the rpc_task
to sleep, setting a new callback. Under certain circumstances, the current
code may end up executing the task->tk_action before it gets round to the
callback.

Signed-off-by: Trond Myklebust
Signed-off-by: Greg Kroah-Hartman

Trond Myklebust
2011-03-22 03:45:51 +0800
314877d90 scsi_dh_alua: fix deadlock in stpg_endio

commit ed0f36bc5719b25659b637f80ceea85494b84502 upstream.

The use of blk_execute_rq_nowait() implies __blk_put_request() is needed
in stpg_endio() rather than blk_put_request() -- blk_finish_request() is
called with queue lock already held.

Signed-off-by: Joseph Gruher
Signed-off-by: Ilgu Hong
Signed-off-by: Mike Snitzer
Signed-off-by: James Bottomley
Signed-off-by: Greg Kroah-Hartman

Joseph Gruher
2011-03-22 03:45:51 +0800
0fa74a5c1 ALSA: ctxfi - Clear input settings before initialization

commit efed5f26664f93991c929d5bb343e65f900d72bc upstream.

Clear input settings before initialization.

Signed-off-by: Przemyslaw Bruski
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman

Przemyslaw Bruski
2011-03-22 03:45:50 +0800
38243420f ALSA: ctxfi - Fix SPDIF status retrieval

commit f164753a263bfd2daaf3e0273b179de7e099c57d upstream.

SPDIF status retrieval always returned the default settings instead of
the actual ones.

Signed-off-by: Przemyslaw Bruski
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman

Przemyslaw Bruski
2011-03-22 03:45:50 +0800
89b0dcd2a ALSA: ctxfi - Fix incorrect SPDIF status bit mask

commit 4c1847e884efddcc3ede371f7839e5e65b25c34d upstream.

SPDIF status mask creation was incorrect.

Signed-off-by: Przemyslaw Bruski
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman

Przemyslaw Bruski
2011-03-22 03:45:49 +0800
db79cbf24 PCI: sysfs: Fix failure path for addition of "vpd" attribute

commit 0f12a4e29368a9476076515881d9ef4e5876c6e2 upstream.

Commit 280c73d ("PCI: centralize the capabilities code in
pci-sysfs.c") changed the initialisation of the "rom" and "vpd"
attributes, and made the failure path for the "vpd" attribute
incorrect. We must free the new attribute structure (attr), but
instead we currently free dev->vpd->attr. That will normally be NULL,
resulting in a memory leak, but it might be a stale pointer, resulting
in a double-free.

Found by inspection; compile-tested only.

Signed-off-by: Ben Hutchings
Signed-off-by: Jesse Barnes
Signed-off-by: Greg Kroah-Hartman

Ben Hutchings
2011-03-22 03:45:49 +0800
1078c7bf0 PCI: do not create quirk I/O regions below PCIBIOS_MIN_IO for ICH

commit 87e3dc3855430bd254370afc79f2ed92250f5b7c upstream.

Some broken BIOSes on ICH4 chipset report an ACPI region which is in
conflict with legacy IDE ports when ACPI is disabled. Even though the
regions overlap, IDE ports are working correctly (we cannot find out
the decoding rules on chipsets).

So the only problem is the reported region itself; if we don't reserve
the region in the quirk, everything works as expected.

This patch avoids reserving any quirk regions below PCIBIOS_MIN_IO,
which is 0x1000. Some regions might be (and, according to a quick Google
search, are) below this border, but the only difference is that they won't
be reserved anymore. They should still work the same as before.
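
The rule described here can be sketched as a plain range check. This is a minimal illustration under stated assumptions, not the quirk code itself: `DEMO_PCIBIOS_MIN_IO` mirrors the 0x1000 value quoted in the message, and `demo_should_reserve()` is a hypothetical helper name.

```c
/* Assumed value, taken from the commit message ("which is 0x1000"). */
#define DEMO_PCIBIOS_MIN_IO 0x1000u

/*
 * Hypothetical helper: return 1 if a quirk I/O region starting at
 * region_start should be reserved, 0 if it lies below the minimum
 * I/O address and should be left alone.
 */
int demo_should_reserve(unsigned int region_start)
{
    return region_start >= DEMO_PCIBIOS_MIN_IO;
}
```

With this check, the 0x0100 ACPI region from the collision message would be skipped, while a region at or above 0x1000 would still be reserved.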

The conflicts look like (1f.0 is bridge, 1f.1 is IDE ctrl):
pci 0000:00:1f.1: address space collision: [io 0x0170-0x0177] conflicts with 0000:00:1f.0 [io 0x0100-0x017f]

At 0x0100 a 128 bytes long ACPI region is reported in the quirk for
ICH4. ata_piix then fails to find disks because the IDE legacy ports
are zeroed:
ata_piix 0000:00:1f.1: device not available (can't reserve [io 0x0000-0x0007])

References: https://bugzilla.novell.com/show_bug.cgi?id=558740
Signed-off-by: Jiri Slaby
Cc: Bjorn Helgaas
Cc: "David S. Miller"
Cc: Thomas Renninger
Signed-off-by: Jesse Barnes
Signed-off-by: Greg Kroah-Hartman

Jiri Slaby
2011-03-22 03:45:49 +0800
1fed17a3f PCI: add more checking to ICH region quirks

commit cdb9755849fbaf2bb9c0a009ba5baa817a0f152d upstream.

Per ICH4 and ICH6 specs, ACPI and GPIO regions are valid iff ACPI_EN
and GPIO_EN bits are set to 1. Add checks for these bits into the
quirks prior to the region creation.

While at it, replace the magic constants with named macros.

Signed-off-by: Jiri Slaby
Cc: Bjorn Helgaas
Cc: "David S. Miller"
Cc: Thomas Renninger
Signed-off-by: Jesse Barnes
Signed-off-by: Greg Kroah-Hartman

Jiri Slaby
2011-03-22 03:45:49 +0800
ea5b3ecf8 PCI: remove quirk for pre-production systems

commit b99af4b002e4908d1a5cdaf424529bdf1dc69768 upstream.

Revert commit 7eb93b175d4de9438a4b0af3a94a112cb5266944
Author: Yu Zhao
Date: Fri Apr 3 15:18:11 2009 +0800

PCI: SR-IOV quirk for Intel 82576 NIC

If BIOS doesn't allocate resources for the SR-IOV BARs, zero the Flash
BAR and program the SR-IOV BARs to use the old Flash Memory Space.

Please refer to Intel 82576 Gigabit Ethernet Controller Datasheet
section 7.9.2.14.2 for details.
http://download.intel.com/design/network/datashts/82576_Datasheet.pdf

Signed-off-by: Yu Zhao
Signed-off-by: Jesse Barnes

This quirk was added before SR-IOV was in production and now all machines that
originally had this issue already have BIOS updates to correct the issue. The
quirk itself is no longer needed and in fact causes bugs if run. Remove it.

Signed-off-by: Jesse Brandeburg
CC: Yu Zhao
CC: Jesse Barnes
Signed-off-by: Jesse Barnes
Signed-off-by: Greg Kroah-Hartman

Brandeburg, Jesse
2011-03-22 03:45:48 +0800
9f13867ca ALSA: hda - fix digital mic selection in mixer on 92HD8X codecs

commit 094a42452abd5564429045e210281c6d22e67fca upstream.

When the mux for digital mic is different from the mux for other mics,
the current auto-parser doesn't handle them correctly and provides
only one mic. This patch fixes the issue.

Signed-off-by: Vitaliy Kulikov
Signed-off-by: Takashi Iwai
Signed-off-by: Greg Kroah-Hartman

Vitaliy Kulikov
2011-03-22 03:45:48 +0800
23dd4f104 xfs: prevent reading uninitialized stack memory

commit a122eb2fdfd78b58c6dd992d6f4b1aaef667eef9 upstream.

The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12
bytes of uninitialized stack memory, because the fsxattr struct
declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero)
the 12-byte fsx_pad member before copying it back to the user. This
patch takes care of it.
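
The usual remedy for this class of leak is to zero the whole on-stack struct before filling in the real fields. The sketch below illustrates that idea only; `struct demo_fsxattr` is a hypothetical stand-in with a 12-byte pad and is not the real `struct fsxattr` layout, and both helper names are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: two real fields followed by a 12-byte pad. */
struct demo_fsxattr {
    unsigned int  xflags;
    unsigned int  extsize;
    unsigned char pad[12];
};

/*
 * Build the reply safely: clear every byte first, so the pad never
 * carries stale stack contents, then set the fields that matter.
 */
void demo_fill_reply(struct demo_fsxattr *fa,
                     unsigned int xflags, unsigned int extsize)
{
    memset(fa, 0, sizeof(*fa));
    fa->xflags = xflags;
    fa->extsize = extsize;
}

/* Return 1 if every pad byte is zero, 0 otherwise. */
int demo_pad_is_clean(const struct demo_fsxattr *fa)
{
    size_t i;

    for (i = 0; i < sizeof(fa->pad); i++)
        if (fa->pad[i] != 0)
            return 0;
    return 1;
}
```

Zeroing the whole struct rather than only the pad member also covers any compiler-inserted padding between fields.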

Signed-off-by: Dan Rosenberg
Reviewed-by: Eric Sandeen
Signed-off-by: Alex Elder
Cc: dann frazier
Signed-off-by: Greg Kroah-Hartman

Dan Rosenberg
2011-03-22 03:45:47 +0800
24ae805d4 USB: serial: handle Data Carrier Detect changes

commit d14fc1a74e846d7851f24fc9519fe87dc12a1231 upstream.

Alan's commit 335f8514f200e63d689113d29cb7253a5c282967 introduced
.carrier_raised function in several drivers. That also means
tty_port_block_til_ready can now suspend the process trying to open the serial
port when Carrier Detect is low and put it into tty_port.open_wait queue. We
need to wake up the process when Carrier Detect goes high and trigger TTY
hangup when CD goes low.

Some of the devices do not report modem status line changes, or at least we
don't understand the status message, so for those we remove .carrier_raised
again.

Signed-off-by: Libor Pechacek
Signed-off-by: Greg Kroah-Hartman

Libor Pechacek
2011-03-22 03:45:47 +0800
659786b73 USB: CP210x Removed incorrect device ID

commit 9926c0df7b31b2128eebe92e0e2b052f380ea464 upstream.

Removed device ID 0x10C4/0x8149 for West Mountain Radio Computerized
Battery Analyzer. This device is actually based on a SiLabs C8051Fxxx,
see http://www.etheus.net/SiUSBXp_Linux_Driver for further info.

Signed-off-by: Craig Shelley
Signed-off-by: Greg Kroah-Hartman

Craig Shelley
2011-03-22 03:45:47 +0800
31447d059 USB: CP210x Add two device IDs

commit faea63f7ccfddfb8fc19798799fcd38c58415172 upstream.

Device IDs added for IRZ Automation Teleport SG-10 GSM/GPRS Modem and
DekTec DTA Plus VHF/UHF Booster/Attenuator.

Signed-off-by: Craig Shelley
Signed-off-by: Greg Kroah-Hartman

Craig Shelley
2011-03-22 03:45:46 +0800
b3133dba8 staging: usbip: remove double giveback of URB

commit 7571f089d7522a95c103558faf313c7af8856ceb upstream.

In the vhci_urb_dequeue() function the TCP connection is checked twice.
Each time when the TCP connection is closed the URB is unlinked and given
back. Remove the second attempt of unlinking and giving back of the URB completely.

This patch fixes the bug described at https://bugzilla.kernel.org/show_bug.cgi?id=24872 .

Signed-off-by: Márton Németh
Signed-off-by: Greg Kroah-Hartman

Márton Németh
2011-03-22 03:45:46 +0800
272d7ea16 sctp: Do not reset the packet during sctp_packet_config().

commit 4bdab43323b459900578b200a4b8cf9713ac8fab upstream.

sctp_packet_config() is called when getting the packet ready
for appending of chunks. The function should not touch the
current state, since it's possible to ping-pong between two
transports when sending, and that can result in packet corruption
followed by an skb overflow crash.

Reported-by: Thomas Dreibholz
Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Vlad Yasevich
2011-03-22 03:45:46 +0800
ebe7aad41 SCSI: mptsas: fix hangs caused by ATA pass-through

commit 2a1b7e575b80ceb19ea50bfa86ce0053ea57181d upstream.

I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
pass-through commands, in particular by smartctl.

First, my version of the symptoms. On an LSI SAS1068E B3 HBA running
01.29.00.00 firmware, with SATA disks, and with smartd running, I'm seeing
occasional task, bus, and host resets, some of which lead to hard faults of
the HBA requiring a reboot. Abusively looping the smartctl command,

# while true; do smartctl -a /dev/sdb > /dev/null; done

dramatically increases the frequency of these failures to nearly one per
minute. A high IO load through the HBA while looping smartctl seems to
improve the chance of a full scsi host reset or a non-recoverable hang.

I reduced what smartctl was doing down to a simple test case which
causes the hang with a single IO when pointed at the sd interface. See
the code at the bottom of this e-mail. It uses an SG_IO ioctl to issue
a single pass-through ATA identify device command. If the buffer
userspace gives for the read data has certain alignments, the task is
issued to the HBA but the HBA fails to respond. If run against the sg
interface, neither the test code nor smartctl causes a hang.

sd and sg handle the SG_IO ioctl slightly differently. Unless you
specifically set a flag to do direct IO, sg passes a buffer of its own,
which is page-aligned, to the block layer and later copies the result
into the userspace buffer regardless of its alignment. sd, on the other
hand, always does direct IO unless the userspace buffer fails an
alignment test at block/blk-map.c line 57, in which case a page-aligned
buffer is created and used for the transfer.

The alignment test currently checks for word-alignment, the default
setup by scsi_lib.c; therefore, userspace buffers of almost any
alignment are given directly to the HBA as DMA targets. The LSI 1068
hardware doesn't seem to like at least a couple of the alignments which
cross a page boundary (see the test code below). Curiously, many
page-boundary-crossing alignments do work just fine.

So, either the hardware has a bug handling certain alignments or the
hardware has a stricter alignment requirement than the driver is
advertising. If stricter alignment is required, then in no case should
misaligned buffers from userspace be allowed through without being
bounced or at least causing an error to be returned.

It seems the mptsas driver could use blk_queue_dma_alignment() to advertise
a stricter alignment requirement. If it does, sd does the right thing and
bounces misaligned buffers (see block/blk-map.c line 57). The following
patch to 2.6.34-rc5 makes my symptoms go away. I'm sure this is the wrong
place for this code, but it gets my idea across.
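
The alignment test discussed above boils down to masking the buffer address and length. The following is a rough sketch of that decision, not the block layer code: `demo_needs_bounce()` is a hypothetical name, and the mask values in the usage note (0x3 for word alignment, 0x1ff for a stricter 512-byte alignment) are illustrative assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return 1 if a userspace buffer must be bounced
 * through an aligned kernel buffer, 0 if it may be used for direct IO.
 * align_mask is (alignment - 1), e.g. 0x3 for 4-byte alignment.
 */
int demo_needs_bounce(uintptr_t addr, unsigned long len,
                      unsigned long align_mask)
{
    /* Any low bit set in either the address or the length fails the test. */
    return ((addr | len) & align_mask) != 0;
}
```

Under a word-alignment mask of 0x3, a buffer at 0x1004 passes and goes straight to the hardware; under a stricter mask such as 0x1ff the same buffer fails the test and is bounced instead.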

Acked-by: Kashyap Desai
Signed-off-by: James Bottomley
Signed-off-by: Greg Kroah-Hartman

Ryan Kuester
2011-03-22 03:45:45 +0800
3ed36704f sched: Fix user time incorrectly accounted as system time on 32-bit

commit e75e863dd5c7d96b91ebbd241da5328fc38a78cc upstream.

A 32-bit variable can overflow during the multiplication in the
task_times() and thread_group_times() functions. When the
overflow happens, the scaled utime value becomes erroneously
small and the scaled stime becomes erroneously big.
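
A small worked example makes the effect concrete. The sketch below is an illustration only, with hypothetical function names and fixed-width types standing in for the kernel's time types: one variant multiplies in 32 bits and widens afterwards, the other widens before multiplying.

```c
#include <stdint.h>

/* Overflow-prone: the product is truncated to 32 bits before widening. */
uint64_t demo_scale_utime_32(uint32_t rtime, uint32_t utime, uint32_t total)
{
    uint64_t temp = (uint64_t)(uint32_t)(rtime * utime);

    return temp / total;
}

/* Safe: widen one operand first so the multiply happens in 64 bits. */
uint64_t demo_scale_utime_64(uint32_t rtime, uint32_t utime, uint32_t total)
{
    uint64_t temp = rtime;

    temp *= utime;
    return temp / total;
}
```

With rtime = 100000, utime = 90000 and total = 100000, the true product is 9,000,000,000, which wraps to 410,065,408 in 32 bits. The truncating variant yields a scaled utime of 4100 instead of 90000, so the remainder attributed to system time grows from 10000 to 95900.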

Reported here:

https://bugzilla.redhat.com/show_bug.cgi?id=633037
https://bugzilla.kernel.org/show_bug.cgi?id=16559

Reported-by: Michael Chapman
Reported-by: Ciriaco Garcia de Celis
Signed-off-by: Stanislaw Gruszka
Signed-off-by: Peter Zijlstra
Cc: Hidetoshi Seto
LKML-Reference:
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman

Stanislaw Gruszka
2011-03-22 03:45:45 +0800
f45c71c17 rt2x00: add device id for windy31 usb device

commit 9c4cf6d94fb362c27a24df5223ed6e327eb7279a upstream.

This patch adds the device id for the windy31 USB device to the rt73usb
driver.

Thanks to Ralf Flaxa for reporting this and providing testing and a
sample device.

Reported-by: Ralf Flaxa
Tested-by: Ralf Flaxa
Signed-off-by: Greg Kroah-Hartman
Acked-by: Ivo van Doorn
Signed-off-by: John W. Linville

Greg Kroah-Hartman
2011-03-22 03:45:45 +0800
527a95060 pid: make setpgid() system call use RCU read-side critical section

commit 950eaaca681c44aab87a46225c9e44f902c080aa upstream.

[ 23.584719]
[ 23.584720] ===================================================
[ 23.585059] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 23.585176] ---------------------------------------------------
[ 23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
[ 23.585176]
[ 23.585176] other info that might help us debug this:
[ 23.585176]
[ 23.585176]
[ 23.585176] rcu_scheduler_active = 1, debug_locks = 1
[ 23.585176] 1 lock held by rc.sysinit/728:
[ 23.585176] #0: (tasklist_lock){.+.+..}, at: [] sys_setpgid+0x5f/0x193
[ 23.585176]
[ 23.585176] stack backtrace:
[ 23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2
[ 23.585176] Call Trace:
[ 23.585176] [] lockdep_rcu_dereference+0x99/0xa2
[ 23.585176] [] find_task_by_pid_ns+0x50/0x6a
[ 23.585176] [] find_task_by_vpid+0x1d/0x1f
[ 23.585176] [] sys_setpgid+0x67/0x193
[ 23.585176] [] system_call_fastpath+0x16/0x1b
[ 24.959669] type=1400 audit(1282938522.956:4): avc: denied { module_request } for pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas

It turns out that the setpgid() system call fails to enter an RCU
read-side critical section before doing a PID-to-task_struct translation.
This commit therefore does rcu_read_lock() before the translation, and
also does rcu_read_unlock() after the last use of the returned pointer.

Reported-by: Andrew Morton
Signed-off-by: Paul E. McKenney
Acked-by: David Howells
Cc: Jiri Slaby
Cc: Oleg Nesterov
Signed-off-by: Greg Kroah-Hartman

Paul E. McKenney
2011-03-22 03:45:45 +0800
d941cd42c percpu: fix pcpu_last_unit_cpu

commit 46b30ea9bc3698bc1d1e6fd726c9601d46fa0a91 upstream.

pcpu_first/last_unit_cpu are used to track which cpu has the first and
last units assigned. This in turn is used to determine the span of a
chunk for map/unmap cache flushes and whether an address belongs to
the first chunk or not in per_cpu_ptr_to_phys().

When the number of possible CPUs isn't a power of two, a chunk may
contain unassigned units towards the end of a chunk. The logic to
determine pcpu_last_unit_cpu was incorrect when there was an unused
unit at the end of a chunk. It failed to ignore the unused unit and
assigned the unused marker NR_CPUS to pcpu_last_unit_cpu.
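
The intended behaviour can be sketched as a scan that skips unused slots. This is a simplified model under stated assumptions, not the percpu allocator code: `DEMO_NR_CPUS` plays the role of the unused marker, and the flat `unit_to_cpu` array and the function name are hypothetical.

```c
/* Assumed marker value for "no CPU assigned to this unit". */
#define DEMO_NR_CPUS 8u

/*
 * Return the CPU assigned to the last used unit, ignoring unused
 * units. Returns DEMO_NR_CPUS only if no unit is assigned at all.
 */
unsigned int demo_last_unit_cpu(const unsigned int *unit_to_cpu, int nr_units)
{
    unsigned int last = DEMO_NR_CPUS;
    int unit;

    for (unit = 0; unit < nr_units; unit++) {
        if (unit_to_cpu[unit] == DEMO_NR_CPUS)
            continue;               /* unused unit: do not record it */
        last = unit_to_cpu[unit];
    }
    return last;
}
```

For a chunk laid out as { 0, 1, 2, unused }, the scan reports CPU 2 rather than the unused marker.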

This was discovered through kdump failure which was caused by
malfunctioning per_cpu_ptr_to_phys() on a kvm setup with 50 possible
CPUs by CAI Qian.

Signed-off-by: Tejun Heo
Reported-by: CAI Qian
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2011-03-22 03:45:44 +0800
1ecd68e25 mm: page allocator: update free page counters after pages are placed on the free list

commit 72853e2991a2702ae93aaf889ac7db743a415dd3 upstream.

When allocating a page, the system uses NR_FREE_PAGES counters to
determine if watermarks would remain intact after the allocation was made.
This check is made without interrupts disabled or the zone lock held and
so is race-prone by nature. Unfortunately, when pages are being freed in
batch, the counters are updated before the pages are added on the list.
During this window, the counters are misleading as the pages do not exist
yet. When under significant pressure on systems with large numbers of
CPUs, it's possible for processes to make progress even though they should
have been stalled. This is particularly problematic if a number of the
processes are using GFP_ATOMIC as the min watermark can be accidentally
breached and in extreme cases, the system can livelock.

This patch updates the counters after the pages have been added to the
list. This makes the allocator more cautious with respect to preserving
the watermarks and mitigates livelock possibilities.

[akpm@linux-foundation.org: avoid modifying incoming args]
Signed-off-by: Mel Gorman
Reviewed-by: Rik van Riel
Reviewed-by: Minchan Kim
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Christoph Lameter
Reviewed-by: KOSAKI Motohiro
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Mel Gorman
2011-03-22 03:45:44 +0800
baa466d39 mm: page allocator: drain per-cpu lists after direct reclaim allocation fails

commit 9ee493ce0a60bf42c0f8fd0b0fe91df5704a1cbf upstream.

When under significant memory pressure, a process enters direct reclaim
and immediately afterwards tries to allocate a page. If it fails and no
further progress is made, it's possible the system will go OOM. However,
on systems with large amounts of memory, it's possible that a significant
number of pages are on per-cpu lists and inaccessible to the calling
process. This leads to a process entering direct reclaim more often than
it should, increasing the pressure on the system and compounding the
problem.

This patch notes that if direct reclaim is making progress but allocations
are still failing, the system is already under heavy pressure. In
this case, it drains the per-cpu lists and tries the allocation a second
time before continuing.

Signed-off-by: Mel Gorman
Reviewed-by: Minchan Kim
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: KOSAKI Motohiro
Reviewed-by: Christoph Lameter
Cc: Dave Chinner
Cc: Wu Fengguang
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Mel Gorman
2011-03-22 03:45:44 +0800
35c27b582 mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake

commit aa45484031ddee09b06350ab8528bfe5b2c76d1c upstream.

Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
cheaper than scanning a number of lists. To avoid synchronization
overhead, counter deltas are maintained on a per-cpu basis and drained
both periodically and when the delta is above a threshold. On large CPU
systems, the difference between the estimated and real value of
NR_FREE_PAGES can be very high. If NR_FREE_PAGES is much higher than
the number of real free pages in buddy, the VM can allocate pages below min
watermark, at worst reducing the real number of pages to zero. Even if
the OOM killer kills some victim for freeing memory, it may not free
memory if the exit path requires a new page resulting in livelock.

This patch introduces a zone_page_state_snapshot() function (courtesy of
Christoph) that takes a slightly more accurate view of an arbitrary vmstat
counter. It is used to read NR_FREE_PAGES while kswapd is awake to avoid
the watermark being accidentally broken. The estimate is not perfect and
may result in cache line bounces but is expected to be lighter than the
IPI calls necessary to continually drain the per-cpu counters while kswapd
is awake.
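
The difference between the cheap read and the snapshot can be sketched as follows. This is a simplified single-threaded model, not the vmstat implementation: the struct, the fixed CPU count and both function names are hypothetical, and the per-cpu deltas are plain array entries rather than real per-cpu data.

```c
#define DEMO_NR_CPUS 4

/* Hypothetical counter: one global value plus undrained per-cpu deltas. */
struct demo_zone_counter {
    long        global;
    signed char cpu_delta[DEMO_NR_CPUS];
};

/* Cheap read: only the global value, ignoring pending per-cpu deltas. */
unsigned long demo_counter_cheap(const struct demo_zone_counter *c)
{
    return c->global < 0 ? 0ul : (unsigned long)c->global;
}

/* Snapshot: fold in every per-cpu delta, clamping at zero. */
unsigned long demo_counter_snapshot(const struct demo_zone_counter *c)
{
    long x = c->global;
    int cpu;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        x += c->cpu_delta[cpu];
    if (x < 0)
        x = 0;
    return (unsigned long)x;
}
```

With a global value of 100 and pending deltas of -20, -30, -10 and -25, the cheap read reports 100 free pages while the snapshot reports 15, which is the kind of gap that lets a watermark check pass when it should not.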

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Christoph Lameter
2011-03-22 03:45:44 +0800
22b19ee03 KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyring

commit 3d96406c7da1ed5811ea52a3b0905f4f0e295376 upstream.

Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
of the parent process's session keyring whether or not the parent has a session
keyring [CVE-2010-2960].

This results in the following oops:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [] keyctl_session_to_parent+0x251/0x443
...
Call Trace:
[] ? keyctl_session_to_parent+0x67/0x443
[] ? __do_fault+0x24b/0x3d0
[] sys_keyctl+0xb4/0xb8
[] system_call_fastpath+0x16/0x1b

if the parent process has no session keyring.

If the system is using pam_keyinit then it is mostly protected against this as all
processes derived from a login will have inherited the session keyring created
by pam_keyinit during the log in procedure.

To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.

Reported-by: Tavis Ormandy
Signed-off-by: David Howells
Acked-by: Tavis Ormandy
Cc: dann frazier
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

David Howells
2011-03-22 03:45:43 +0800
ef29fb3e1 KEYS: Fix RCU no-lock warning in keyctl_session_to_parent()

commit 9d1ac65a9698513d00e5608d93fca0c53f536c14 upstream.

There's an unprotected access to the parent process's credentials in the middle
of keyctl_session_to_parent(). This results in the following RCU warning:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by keyctl-session-/2137:
#0: (tasklist_lock){.+.+..}, at: [] keyctl_session_to_parent+0x60/0x236

stack backtrace:
Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
Call Trace:
[] lockdep_rcu_dereference+0xaa/0xb3
[] keyctl_session_to_parent+0xed/0x236
[] sys_keyctl+0xb4/0xb6
[] system_call_fastpath+0x16/0x1b

The code should take the RCU read lock to make sure the parent's credentials
don't go away, even though it's holding a spinlock and has IRQs disabled.

Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
Cc: dann frazier
Signed-off-by: Greg Kroah-Hartman

David Howells
2011-03-22 03:45:43 +0800
b62b1d7df inotify: send IN_UNMOUNT events

commit 611da04f7a31b2208e838be55a42c7a1310ae321 upstream.

Since the .31 or so notify rewrite, inotify has not sent events about
inodes which are unmounted. This patch restores those events.

Signed-off-by: Eric Paris
Cc: Ben Hutchings
Signed-off-by: Greg Kroah-Hartman

Eric Paris
2011-03-22 03:45:43 +0800
3e32b3234 IA64: Optimize ticket spinlocks in fsys_rt_sigprocmask

commit 2d2b6901649a62977452be85df53eda2412def24 upstream.

Tony's fix (f574c843191728d9407b766a027f779dcd27b272) has a small bug,
it incorrectly uses "r3" as a scratch register in the first of the two
unlock paths ... it is also inefficient. Optimize the fast path again.

Signed-off-by: Petr Tesarik
Signed-off-by: Tony Luck
Signed-off-by: Greg Kroah-Hartman

Petr Tesarik
2011-03-22 03:45:42 +0800
f0d44f184 IA64: fix siglock

commit f574c843191728d9407b766a027f779dcd27b272 upstream.

When ia64 converted to using ticket locks, an inline implementation
of trylock/unlock in fsys.S was missed. This was not noticed because
in most circumstances it simply resulted in using the slow path because
the siglock was apparently not available (under old spinlock rules).

Problems occur when the ticket spinlock has value 0x0 (when first
initialised, or when it wraps around). At this point the fsys.S
code acquires the lock (changing the 0x0 to 0x1). If another process
attempts to get the lock at this point, it will change the value from
0x1 to 0x2 (using new ticket lock rules). Then the fsys.S code will
free the lock using old spinlock rules by writing 0x0 to it. From
here a variety of bad things can happen.
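
The sequence of values described above can be replayed with a toy model. This is a deliberately simplified sketch: the lock is treated as a single integer, every step runs sequentially, and the helper names are hypothetical. The real ia64 lock word layout and the fsys.S assembly are more involved.

```c
/* Old spinlock rule: take the lock only if the word is 0 (0 -> 1). */
int demo_old_trylock(unsigned int *word)
{
    if (*word != 0)
        return 0;
    *word = 1;
    return 1;
}

/* Ticket rule: each arrival bumps the word and keeps the old value. */
unsigned int demo_ticket_take(unsigned int *word)
{
    return (*word)++;
}

/* Old spinlock rule: release by writing 0, whatever the word holds. */
void demo_old_unlock(unsigned int *word)
{
    *word = 0;
}
```

Starting from 0, the old-style trylock sets the word to 1, a ticket taken by another process moves it to 2, and the old-style unlock then writes 0, discarding the second process's increment.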

Signed-off-by: Tony Luck
Signed-off-by: Greg Kroah-Hartman

Tony Luck
2011-03-22 03:45:42 +0800
bc498e99c hwmon: (via686a) Initialize fan_div values

commit f790674d3f87df6390828ac21a7d1530f71b59c8 upstream.

Functions set_fan_min() and set_fan_div() assume that the fan_div
values have already been read from the register. The driver currently
doesn't initialize them at load time, they are only set when function
via686a_update_device() is called. This means that set_fan_min() and
set_fan_div() misbehave if, for example, "sensors -s" is called
before any monitoring application (e.g. "sensors") has been run.

Fix the problem by always initializing the fan_div values at device
bind time.

Signed-off-by: Jean Delvare
Acked-by: Guenter Roeck
Signed-off-by: Greg Kroah-Hartman

Jean Delvare
2011-03-22 03:45:42 +0800
6192bed17 hw breakpoints: Fix pid namespace bug

commit 068e35eee9ef98eb4cab55181977e24995d273be upstream.

Hardware breakpoints can't be registered within pid namespaces
because tsk->pid is passed rather than the pid in the current
namespace.

(See https://bugzilla.kernel.org/show_bug.cgi?id=17281 )

This is a quick fix demonstrating the problem but is not the
best method of solving the problem since passing pids internally
is not the best way to avoid pid namespace bugs. Subsequent patches
will show a better solution.

Much thanks to Frederic Weisbecker for doing
the bulk of the work finding this bug.

Reported-by: Robin Green
Signed-off-by: Matt Helsley
Signed-off-by: Peter Zijlstra
Cc: Prasad
Cc: Arnaldo Carvalho de Melo
Cc: Steven Rostedt
Cc: Will Deacon
Cc: Mahesh Salgaonkar
LKML-Reference:
Signed-off-by: Ingo Molnar
Signed-off-by: Frederic Weisbecker
Signed-off-by: Greg Kroah-Hartman

Matt Helsley
2011-03-22 03:45:42 +0800
2d41b2aad Fix unprotected access to task credentials in waitid()

commit f362b73244fb16ea4ae127ced1467dd8adaa7733 upstream.

Using a program like the following:

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
	id_t id;
	siginfo_t infop;
	pid_t res;

	id = fork();
	if (id == 0) { sleep(1); exit(0); }
	kill(id, SIGSTOP);
	alarm(1);
	waitid(P_PID, id, &infop, WCONTINUED);
	return 0;
}

to call waitid() on a stopped process results in access to the child task's
credentials without the RCU read lock being held - which may be replaced in the
meantime - eliciting the following warning:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1460 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
2 locks held by waitid02/22252:
#0: (tasklist_lock){.?.?..}, at: [] do_wait+0xc5/0x310
#1: (&(&sighand->siglock)->rlock){-.-...}, at: []
wait_consider_task+0x19a/0xbe0

stack backtrace:
Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
Call Trace:
[] lockdep_rcu_dereference+0xa4/0xc0
[] wait_consider_task+0xaf1/0xbe0
[] do_wait+0xf5/0x310
[] sys_waitid+0x86/0x1f0
[] ? child_wait_callback+0x0/0x70
[] system_call_fastpath+0x16/0x1b

This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.

Furthermore, protect wait_task_stopped() in the same way.

We don't need to keep holding the RCU read lock once we've read the UID from
the credentials as holding the RCU read lock doesn't stop the target task from
changing its creds under us - so the credentials may be outdated immediately
after we've read the pointer, lock or no lock.

Signed-off-by: Daniel J Blueman
Signed-off-by: David Howells
Acked-by: Paul E. McKenney
Acked-by: Oleg Nesterov
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Daniel J Blueman
2011-03-22 03:45:41 +0800
fba2e5ee1 drivers/video/via/ioctl.c: prevent reading uninitialized stack memory

commit b4aaa78f4c2f9cde2f335b14f4ca30b01f9651ca upstream.

The VIAFB_GET_INFO device ioctl allows unprivileged users to read 246
bytes of uninitialized stack memory, because the "reserved" member of
the viafb_ioctl_info struct declared on the stack is not altered or
zeroed before being copied back to the user. This patch takes care of
it.

Signed-off-by: Dan Rosenberg
Signed-off-by: Florian Tobias Schandinat
Signed-off-by: Greg Kroah-Hartman

Dan Rosenberg
2011-03-22 03:45:41 +0800