Doug / smarc-fsl-linux-kernel | Embedian Git Server

13 May, 2010

8 commits

46a47b1ed KVM: convert ioapic lock to spinlock

kvm_set_irq is used from non-sleepable contexts, so convert ioapic from
mutex to spinlock.

KVM-Stable-Tag.
Tested-by: Ralf Bonenkamp
Signed-off-by: Marcelo Tosatti

Marcelo Tosatti
2010-05-13 12:23:55 +0800
be835674b Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/perf_event: Fix oops due to perf_event_do_pending call
powerpc/swiotlb: Fix off by one in determining boundary of which ops to use

Linus Torvalds
2010-05-13 09:48:26 +0800
5ec390e04 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] correct address of _stext with CONFIG_SHARED_KERNEL=y
[S390] ptrace: fix return value of do_syscall_trace_enter()
[S390] dasd: fix race between tasklet and dasd_sleep_on

Linus Torvalds
2010-05-13 09:47:55 +0800
cdf5f61ed Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: preserve seq # on requeued messages after transient transport errors
ceph: fix cap removal races
ceph: zero unused message header, footer fields
ceph: fix locking for waking session requests after reconnect
ceph: resubmit requests on pg mapping change (not just primary change)
ceph: fix open file counting on snapped inodes when mds returns no caps
ceph: unregister osd request on failure
ceph: don't use writeback_control in writepages completion
ceph: unregister bdi before kill_anon_super releases device name

Linus Torvalds
2010-05-13 09:47:29 +0800
131c6c9ed Merge commit 'kumar/merge' into merge

Benjamin Herrenschmidt
2010-05-13 09:42:40 +0800
769d9968e Revert "PCI: update bridge resources to get more big ranges in PCI assign unssigned"

This reverts commit 977d17bb1749517b353874ccdc9b85abc7a58c2a, because it
can cause problems with some devices not getting any resources at all
when the resource tree is re-allocated.

For an example of this, see

https://bugzilla.kernel.org/show_bug.cgi?id=15960
(originally https://bugtrack.alsa-project.org/alsa-bug/view.php?id=4982)
(lkml thread: http://lkml.org/lkml/2010/4/19/20)

where Peter Henriksson reported his Xonar DX sound card gone, because
the IO port region was no longer allocated.

Reported-bisected-and-tested-by: Peter Henriksson
Requested-by: Andrew Morton
Requested-by: Clemens Ladisch
Acked-by: Jesse Barnes
Cc: Yinghai Lu
Signed-off-by: Linus Torvalds

Linus Torvalds
2010-05-13 09:39:45 +0800
7ac512aa8 CacheFiles: Fix error handling in cachefiles_determine_cache_security()

cachefiles_determine_cache_security() is expected to return with a
security override in place. However, if set_create_files_as() fails, we
fail to do this. In this case, we should just reinstate the security
override that was set by the caller.

Furthermore, if set_create_files_as() fails, we should dispose of the
new credentials we were in the process of creating.

Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2010-05-13 09:23:58 +0800
91af70814 rwsem: Test for no active locks in __rwsem_do_wake undo code

If there are no active threads using a semaphore, it is always correct
to unqueue blocked threads. This seems to be what was intended in the
undo code.

What was done instead was to look for a sem count of zero - this is an
impossible situation, given that at least one thread is known to be
queued on the semaphore. The code might be correct as written, but it's
hard to reason about and it's not what was intended (otherwise the goto
out would have been unconditional).

Go for checking the active count - the alternative is not worth the
headache.
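
A minimal sketch of the distinction, not taken from the patch: it assumes the 32-bit rwsem count layout (active count in the low 16 bits, a negative waiting bias in the upper bits), and the helper names are hypothetical.

```python
# Illustrative model only. The constants assume the 32-bit rwsem layout:
# the low 16 bits count active lockers, and queued waiters contribute a
# negative bias in the upper bits.
RWSEM_ACTIVE_BIAS = 1
RWSEM_ACTIVE_MASK = 0x0000FFFF
RWSEM_WAITING_BIAS = -0x00010000


def active_count(count: int) -> int:
    """Return the number of active lockers encoded in the low 16 bits."""
    return count & RWSEM_ACTIVE_MASK


def may_wake_waiters(count: int) -> bool:
    """Waiters may be unqueued whenever nobody actively holds the lock."""
    return active_count(count) == 0


# One waiter is queued and nobody holds the semaphore: the full count is
# non-zero, yet the active part is zero.
count_with_waiter = RWSEM_WAITING_BIAS
```

Under this model a test for `count == 0` can never succeed while a waiter is queued, whereas the active-count test succeeds exactly when waking is safe.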

Signed-off-by: Michel Lespinasse
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

Michel Lespinasse
2010-05-13 09:23:34 +0800

12 May, 2010

32 commits

57d84906f [S390] correct address of _stext with CONFIG_SHARED_KERNEL=y

As of git commit 1844c9bc0b2fed3023551c1affe033ab38e90b9a head64.S/head31.S
are not included in head.S anymore but are built as an extra object. This breaks
shared kernel support because the .org statement in head64.S/head31.S for
CONFIG_SHARED_KERNEL=y will have a different effect. The end address of the
head.text section in head.o will be added to the .org value. To compensate
for this, subtract 0x11000 to get the required value of 0x100000 again.
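
The arithmetic can be checked with a short sketch. It reads the message as implying that head.text ends at 0x11000 in head.o; the function name is hypothetical.

```python
# Values taken from the commit message; the interpretation that head.text
# ends at 0x11000 is an assumption drawn from the wording.
HEAD_TEXT_END = 0x11000
REQUIRED_ADDRESS = 0x100000


def compensated_org(required: int, section_end: int) -> int:
    """Return the .org value that lands on `required` once the end of the
    preceding section is added to it."""
    return required - section_end


org = compensated_org(REQUIRED_ADDRESS, HEAD_TEXT_END)
print(hex(org))
```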

Signed-off-by: Martin Schwidefsky

Martin Schwidefsky
2010-05-12 15:32:26 +0800
545c174d1 [S390] ptrace: fix return value of do_syscall_trace_enter()

strace may change the system call number, so regs->gprs[2] must not
be read before tracehook_report_syscall_entry(). This fixes a bug
where "strace -f" will hang after a vfork().

Signed-off-by: Gerald Schaefer
Signed-off-by: Martin Schwidefsky

Gerald Schaefer
2010-05-12 15:32:26 +0800
1c1e093cb [S390] dasd: fix race between tasklet and dasd_sleep_on

The various dasd_sleep_on functions use a global wait queue when
waiting for a cqr. The wait condition checks the status and devlist
fields of the cqr to determine if it is safe to continue. This
evaluation may return true, although the tasklet has not finished
processing of the cqr and the callback function has not been called
yet. When the callback is finally called, the data in the cqr may
already be invalid. The sleep_on wait condition needs a safe way to
determine if the tasklet has finished processing. Use the
callback_data field of the cqr to store a token, which is set by
the callback function itself.
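
A toy model of the two wait conditions, as a sketch only: the class, token values and function names are hypothetical and do not mirror the driver's structures.

```python
from dataclasses import dataclass

# Hypothetical token values stored in callback_data.
START_TAG = "sleepon-start"
END_TAG = "sleepon-end"


@dataclass
class Request:
    status: str = "queued"
    on_devlist: bool = True
    callback_data: str = START_TAG


def old_wait_done(cqr: Request) -> bool:
    """Old condition: inspect status and devlist membership only."""
    return cqr.status == "done" and not cqr.on_devlist


def new_wait_done(cqr: Request) -> bool:
    """New condition: wait for the token set by the callback itself."""
    return cqr.callback_data == END_TAG


def tasklet_update_state(cqr: Request) -> None:
    """First half of tasklet processing: update status and devlist."""
    cqr.status = "done"
    cqr.on_devlist = False


def tasklet_run_callback(cqr: Request) -> None:
    """Second half: the callback runs and marks the request finished."""
    cqr.callback_data = END_TAG
```

Between the two tasklet steps the old condition already reports completion, which is the window the message describes; the token-based condition stays false until the callback has run.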

Signed-off-by: Stefan Weinhuber
Signed-off-by: Heiko Carstens
Signed-off-by: Martin Schwidefsky

Stefan Weinhuber
2010-05-12 15:32:26 +0800
0fe1ac48b powerpc/perf_event: Fix oops due to perf_event_do_pending call

Anton Blanchard found that large POWER systems would occasionally
crash in the exception exit path when profiling with perf_events.
The symptom was that an interrupt would occur late in the exit path
when the MSR[RI] (recoverable interrupt) bit was clear. Interrupts
should be hard-disabled at this point but they were enabled. Because
the interrupt was not recoverable the system panicked.

The reason is that the exception exit path was calling
perf_event_do_pending after hard-disabling interrupts, and
perf_event_do_pending will re-enable interrupts.

The simplest and cleanest fix for this is to use the same mechanism
that 32-bit powerpc does, namely to cause a self-IPI by setting the
decrementer to 1. This means we can remove the tests in the exception
exit path and raw_local_irq_restore.

This also makes sure that the call to perf_event_do_pending from
timer_interrupt() happens within irq_enter/irq_exit. (Note that
calling perf_event_do_pending from timer_interrupt does not mean that
there is a possible 1/HZ latency; setting the decrementer to 1 ensures
that the timer interrupt will happen immediately, i.e. within one
timebase tick, which is a few nanoseconds or 10s of nanoseconds.)

Signed-off-by: Paul Mackerras
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt

Paul Mackerras
2010-05-12 12:34:00 +0800
e84346b72 ceph: preserve seq # on requeued messages after transient transport errors

If the tcp connection drops and we reconnect to reestablish a stateful
session (with the mds), we need to resend previously sent (and possibly
received) messages with the _same_ seq # so that they can be dropped on
the other end if needed. Only assign a new seq once after the message is
queued.
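
The idea can be sketched with a toy sender and receiver. This is a simplified model under assumed semantics (the receiver drops any message whose seq it has already seen); all class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    payload: str
    seq: Optional[int] = None  # assigned once, on first send


class Sender:
    def __init__(self) -> None:
        self.next_seq = 1
        self.sent: List[Message] = []

    def send(self, msg: Message) -> Message:
        # Assign a sequence number only if the message never had one.
        if msg.seq is None:
            msg.seq = self.next_seq
            self.next_seq += 1
        self.sent.append(msg)
        return msg

    def requeue_after_reconnect(self) -> List[Message]:
        # Resend everything previously sent, keeping the original seq.
        pending, self.sent = self.sent, []
        return [self.send(m) for m in pending]


class Receiver:
    def __init__(self) -> None:
        self.last_seq = 0
        self.delivered: List[str] = []

    def receive(self, msg: Message) -> bool:
        # Drop duplicates: anything at or below the last seen seq.
        if msg.seq <= self.last_seq:
            return False
        self.last_seq = msg.seq
        self.delivered.append(msg.payload)
        return True
```

If the resent messages received fresh sequence numbers instead, the receiver in this model could not tell a duplicate from a new message.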

Signed-off-by: Sage Weil

Sage Weil
2010-05-12 12:20:38 +0800
f818a7367 ceph: fix cap removal races

The iterate_session_caps helper traverses the session caps list and tries
to grab an inode reference. However, the __ceph_remove_cap was clearing
the inode backpointer _before_ removing itself from the session list,
causing a null pointer dereference.

Clear cap->ci under protection of s_cap_lock to avoid the race, and to
tightly couple the list and backpointer state. Use a local flag to
indicate whether we are releasing the cap, as cap->session may be modified
by a racing thread in iterate_session_caps.

Signed-off-by: Sage Weil

Sage Weil
2010-05-12 11:56:31 +0800
cea0d767c Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (applesmc) Correct sysfs fan error handling
hwmon: (asc7621) Bug fixes

Linus Torvalds
2010-05-12 08:38:04 +0800
b2464ab20 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kprobes/x86: Fix removed int3 checking order
perf: Fix static strings treated like dynamic ones

Linus Torvalds
2010-05-12 08:37:24 +0800
788885ae7 drivers/gpu/drm/i915/i915_irq.c:i915_error_object_create(): use correct kmap-atomic slot

i915_error_object_create() is called from the timer interrupt and hence
can corrupt the KM_USER0 slot. Use KM_IRQ0 instead.

Reported-by: Jaswinder Singh Rajput
Tested-by: Jaswinder Singh Rajput
Acked-by: Chris Wilson
Cc: Dave Airlie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2010-05-12 08:33:42 +0800
06efbeb4a hp_accel: fix race in device removal

The work queue has to be flushed after the device has been made
inaccessible. The patch closes a window during which a work queue might
remain active after the device is removed and would then lead to ACPI
calls with undefined behavior.

Signed-off-by: Oliver Neukum
Acked-by: Eric Piel
Acked-by: Pavel Machek
Cc: Pavel Herrmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oliver Neukum
2010-05-12 08:33:42 +0800
a3ed2a157 mqueue: fix kernel BUG caused by double free() on mq_open()

In case of aborting because we reach the maximum amount of memory which
can be allocated to message queues per user (RLIMIT_MSGQUEUE), we would
try to free the message area twice when bailing out: first by the error
handling code itself, and then later when cleaning up the inode through
delete_inode().

Signed-off-by: André Goddard Rosa
Cc: Alexey Dobriyan
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

André Goddard Rosa
2010-05-12 08:33:42 +0800
de145b44b fbdev: bfin-t350mcqb-fb: fix fbmem allocation with blanking lines

The current allocation does not include the memory required for blanking
lines. So avoid memory corruption when multiple devices are using the DMA
memory near each other.
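
A rough sketch of the sizing issue, with hypothetical panel numbers (320x240 at 24 bpp with 9 blanking lines are illustrative values, not read from the driver):

```python
def fb_bytes(xres: int, yres: int, bits_per_pixel: int,
             blanking_lines: int = 0) -> int:
    """Bytes of DMA memory needed for the visible lines plus any
    blanking lines the controller also fetches."""
    return xres * (yres + blanking_lines) * bits_per_pixel // 8


# Hypothetical geometry for illustration only.
visible_only = fb_bytes(320, 240, 24)
with_blanking = fb_bytes(320, 240, 24, blanking_lines=9)
shortfall = with_blanking - visible_only
```

In this model an allocation sized for the visible area alone is `shortfall` bytes too small, so DMA would run past the end of the buffer into neighbouring memory.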

Signed-off-by: Michael Hennerich
Signed-off-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Hennerich
2010-05-12 08:33:42 +0800
747388d78 memcg: fix css_is_ancestor() RCU locking

Some callers (in memcontrol.c) call css_is_ancestor() without
rcu_read_lock(). Because css_is_ancestor() has to access RCU-protected
data, it should be under rcu_read_lock().

This makes css_is_ancestor() itself access the RCU-protected area
safely. (At least, "root" can have refcnt==0 if it's not an ancestor of
"child". So, we need rcu_read_lock().)

Signed-off-by: KAMEZAWA Hiroyuki
Cc: "Paul E. McKenney"
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-05-12 08:33:42 +0800
7f0f15464 memcg: fix css_id() RCU locking for real

Commit ad4ba375373937817404fd92239ef4cadbded23b ("memcg: css_id() must be
called under rcu_read_lock()") modified memcontrol.c to fix an RCU check
message. But Andrew Morton pointed out that the fix doesn't seem sane
and that it only hides lockdep messages.

This patch does the proper thing. Checking again, every place fixed by
that commit accessed css_id() without rcu_read_lock() intentionally:
all callers of css_id() hold a reference count on the css. So, it's not
necessary to be under rcu_read_lock().

Considering again, we can use rcu_dereference_check() for css_id(). We know
css->id is valid if css->refcnt > 0. (css->id never changes and is freed
only after css->refcnt drops to 0.)

This patch makes use of rcu_dereference_check() in css_id/depth and removes
the unnecessary rcu_read_lock() added by that commit.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: "Paul E. McKenney"
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-05-12 08:33:42 +0800
11cad320a bsdacct: use del_timer_sync() in acct_exit_ns()

acct_exit_ns --> acct_file_reopen deletes the timer without checking whether
it is executing on another CPU. So acct_timeout() can write to unmapped memory.

Signed-off-by: Vitaliy Gusev
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vitaliy Gusev
2010-05-12 08:33:42 +0800
ab941e0ff rmap: remove anon_vma check in page_address_in_vma()

Currently page_address_in_vma() compares vma->anon_vma and
page_anon_vma(page) as a parameter check, but in 2.6.34 a vma can have
multiple anon_vmas with anon_vma_chain, so the current check does not work.
(For an anonymous page shared by multiple processes, some verified (page,vma)
pairs wrongly return -EFAULT.)

We could check all anon_vmas in the "same_vma" chain, but that would need
to meet the locking requirements. Instead, we can safely remove the anon_vma
check because page_address_in_vma() assumes that page and vma have already
been checked to belong to the same process.

Signed-off-by: Naoya Horiguchi
Reviewed-by: Rik van Riel
Cc: Andi Kleen
Cc: Andrea Arcangeli
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2010-05-12 08:33:42 +0800
4a6018f7f hugetlbfs: kill applications that use MAP_NORESERVE with SIGBUS instead of OOM-killer

Ordinarily, applications using hugetlbfs will create mappings with
reserves. For shared mappings, these pages are reserved before mmap()
returns success and for private mappings, the caller process is guaranteed
the pages, and a child process that cannot get the pages gets killed with SIGBUS.

An application that uses MAP_NORESERVE gets no reservations and mmap()
will always succeed at the risk the page will not be available at fault
time. This might be used for example on very large sparse mappings where
the developer is confident the necessary huge pages exist to satisfy all
faults even though the whole mapping cannot be backed by huge pages.
Unfortunately, if an allocation does fail, VM_FAULT_OOM is returned to the
fault handler which proceeds to trigger the OOM-killer. This is
unhelpful.

Even without hugetlbfs mounted, a user using mmap() can trivially trigger
the OOM-killer because VM_FAULT_OOM is returned (will provide example
program if desired - it's a whopping 24 lines long). It could be
considered a DOS available to an unprivileged user.

This patch alters hugetlbfs to kill a process that uses MAP_NORESERVE
where huge pages were not available with SIGBUS instead of triggering the
OOM killer.

This change affects hugetlb_cow() as well. I feel there is a failure case
in there, but I didn't create one. It would need a fairly specific target
in terms of the faulting application and the hugepage pool size. The
hugetlb_no_page() path is much easier to hit but both might as well be
closed.

Signed-off-by: Mel Gorman
Cc: Lee Schermerhorn
Cc: David Rientjes
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2010-05-12 08:33:42 +0800
475f9aa6a kexec: fix OOPS in crash_kernel_shrink

Running "echo 0 > /sys/kernel/kexec_crash_size" twice OOPSes the kernel. Also,
the content of this file is invalid after the first shrink to zero: it shows 1
instead of 0.

This scenario is unlikely to happen often (root privs, valid crashkernel=
in cmdline, dump-capture kernel not loaded), I hit it only by chance.

This patch fixes it.

Signed-off-by: Vitaly Mayatskikh
Cc: Cong Wang
Cc: Neil Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vitaly Mayatskikh
2010-05-12 08:33:42 +0800
d586ebbb8 mmc: atmel-mci: fix in debugfs: response value printing

In debugfs, printing of command response reports resp[2] twice: fix it to
resp[3].

Signed-off-by: Nicolas Ferre
Haavard Skinnemoen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nicolas Ferre
2010-05-12 08:33:41 +0800
abc2c9fdf mmc: atmel-mci: remove data error interrupt after xfer

Disable data error interrupts while we are actually recording that there
are no such errors. This prevents, in some cases, the warning message
printed when a new request is queued (in atmci_start_request()).

Signed-off-by: Nicolas Ferre
Cc: Haavard Skinnemoen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nicolas Ferre
2010-05-12 08:33:41 +0800
009a891b2 mmc: atmel-mci: prevent kernel oops while removing card

Removing an SD card in certain circumstances can lead to a kernel
oops if we do not make sure that the "data" field of the host structure is
valid. This patch adds a test in atmci_dma_cleanup() function and also
calls atmci_stop_dma() before throwing away the reference to data.

Signed-off-by: Nicolas Ferre
Cc: Haavard Skinnemoen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nicolas Ferre
2010-05-12 08:33:41 +0800
ebb1fea9b mmc: atmel-mci: fix two parameters swapped

Two parameters were swapped in the calls to atmci_init_slot().

Signed-off-by: Nicolas Ferre
Reported-by: Anders Grahn
Cc: Haavard Skinnemoen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nicolas Ferre
2010-05-12 08:33:41 +0800
34441427a revert "procfs: provide stack information for threads" and its fixup commits

Originally, commit d899bf7b ("procfs: provide stack information for
threads") attempted to introduce a new feature for showing where the
threadstack was located and how many pages are being utilized by the
stack.

Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
applied to fix the NO_MMU case.

Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
64-bit") was applied to fix a bug in ia32 executables being loaded.

Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel
threads") was applied to fix a bug which had kernel threads printing a
userland stack address.

Commit 1306d603f ('proc: partially revert "procfs: provide stack
information for threads"') was then applied to revert the stack pages
being used to solve a significant performance regression.

This patch nearly undoes the effect of all these patches.

The reason for reverting these is that they provide an unusable value in
field 28. For x86_64, a fork will result in the task->stack_start
value being updated to the current user top of stack and not the stack
start address. This unpredictability of the stack_start value makes
it worthless. That includes the intended use of showing how much stack
space a thread has.

Other architectures will get different values. As an example, ia64
gets 0. The do_fork() and copy_process() functions appear to treat the
stack_start and stack_size parameters as architecture specific.

I only partially reverted c44972f1 ("procfs: disable per-task stack usage
on NOMMU"). If I had completely reverted it, I would have had to change
mm/Makefile to only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
configured. Since I could not test the builds without significant effort,
I decided to not change mm/Makefile.

I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
information for threads on 64-bit"). I left the KSTK_ESP() change in
place as that seemed worthwhile.

Signed-off-by: Robin Holt
Cc: Stefani Seibold
Cc: KOSAKI Motohiro
Cc: Michal Simek
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robin Holt
2010-05-12 08:33:41 +0800
3c904afd7 it8761e_gpio: fix bug in gpio numbering

The SIO chip contains 16 possible gpio lines, not 14. The schematic was
not read carefully.

Signed-off-by: Denis Turischev
Cc: David Brownell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denis Turischev
2010-05-12 08:33:41 +0800
f33d7e2d2 dma-mapping: fix dma_sync_single_range_*

dma_sync_single_range_for_cpu() and dma_sync_single_range_for_device() use
a wrong address with a partial synchronization.
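
A small model of a partial sync, assuming the wrong address was the unadjusted base of the mapping. The function names are hypothetical and each returns the (start, length) region that would be synchronized.

```python
from typing import Tuple


def sync_range_buggy(addr: int, offset: int, size: int) -> Tuple[int, int]:
    """Assumed old behaviour: start at the mapping base and cover
    offset + size bytes."""
    return (addr, offset + size)


def sync_range_fixed(addr: int, offset: int, size: int) -> Tuple[int, int]:
    """Intended behaviour: start at base + offset and cover exactly
    size bytes."""
    return (addr + offset, size)
```

For a mapping at 0x1000 with offset 0x200 and size 0x100, the fixed variant covers 0x1200..0x12ff, while the assumed buggy variant starts at 0x1000.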

Signed-off-by: FUJITA Tomonori
Reviewed-by: Konrad Rzeszutek Wilk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

FUJITA Tomonori
2010-05-12 08:33:41 +0800
45c6ceb54 ceph: zero unused message header, footer fields

We shouldn't leak any prior memory contents to other parties. And random
data, particularly in the 'version' field, can cause problems down the
line.

Signed-off-by: Sage Weil

Sage Weil
2010-05-12 06:17:40 +0800
fc2a093e7 Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon: Fix 3 regressions - since buffer rework

Linus Torvalds
2010-05-12 01:12:18 +0800
9fc282baa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
net: Fix FDDI and TR config checks in ipv4 arp and LLC.
IPv4: unresolved multicast route cleanup
mac80211: remove association work when processing deauth request
ar9170: wait for asynchronous firmware loading
ipv4: udp: fix short packet and bad checksum logging
phy: Fix initialization in micrel driver.
sctp: Fix a race between ICMP protocol unreachable and connect()
veth: Dont kfree_skb() after dev_forward_skb()
IPv6: fix IPV6_RECVERR handling of locally-generated errors
net/gianfar: drop recycled skbs on MTU change
iwlwifi: work around passive scan issue

Linus Torvalds
2010-05-12 01:11:40 +0800
c61ea31da CacheFiles: Fix occasional EIO on call to vfs_unlink()

Fix an occasional EIO returned by a call to vfs_unlink():

[ 4868.465413] CacheFiles: I/O Error: Unlink failed
[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
[ 4947.320011] CacheFiles: File cache on md3 unregistering
[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
[ 5127.348716] CacheFiles: File cache on md3 registered
[ 7076.871081] CacheFiles: I/O Error: Unlink failed
[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
[ 7116.780891] CacheFiles: File cache on md3 unregistering
[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
[ 7296.813432] CacheFiles: File cache on md3 registered

What happens is this:

(1) A cached NFS file is seen to have become out of date, so NFS retires the
object and immediately acquires a new object with the same key.

(2) Retirement of the old object is done asynchronously - so the lookup/create
to generate the new object may be done first.

This can be a problem as the old object and the new object must exist at
the same point in the backing filesystem (i.e. they must have the same
pathname).

(3) The lookup for the new object sees that a backing file already exists,
checks to see whether it is valid and sees that it isn't. It then deletes
that file and creates a new one on disk.

(4) The retirement phase for the old file is then performed. It tries to
delete the dentry it has, but ext4_unlink() returns -EIO because the inode
attached to that dentry no longer matches the inode number associated with
the filename in the parent directory.

The trace below shows this quite well.

[md5sum] ==> __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1)
[md5sum] ==> __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100)

NFS has retired the old cookie and asked for a new one.

[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24})
[kslowd] OBJECT_DYING]
[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_INIT,0})
[kslowd] OBJECT_LOOKING_UP]
[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_DYING,24})
[kslowd] OBJECT_RECYCLING]

The old object (OBJ52) is going through the terminal states to get rid of it,
whilst the new object - (OBJ53) - is coming into being.

[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0})
[kslowd] ==> cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,)
[kslowd] lookup '@68'
[kslowd] next -> ffff88002ce41bd0 positive
[kslowd] advance
[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
[kslowd] next -> ffff8800369faac8 positive

The new object has looked up the subdir in which the file would be (getting
dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry
ffff8800369faac8).

[kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
[kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
[kslowd] unlink stale object
[kslowd] inode does not match i_ino.

[kslowd] OBJECT_DEAD]
[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0})
[kslowd] OBJECT_ACTIVE]

(Note that the above trace includes extra information beyond that produced by
the upstream code).

The fix is to note when an object that is being retired has had its object
deleted preemptively by a replacement object that is being created, and to
skip the second removal attempt in such a case.
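
The sequence in steps (1) to (4) and the fix can be modelled as follows. This is a simplified sketch: the directory model, the `buried` flag and all names are hypothetical stand-ins, not the CacheFiles data structures.

```python
import errno


class BackingDir:
    """Toy directory: maps a filename to the inode number behind it."""

    def __init__(self) -> None:
        self.entries = {}
        self.next_ino = 1

    def create(self, name: str) -> int:
        ino = self.next_ino
        self.next_ino += 1
        self.entries[name] = ino
        return ino

    def unlink(self, name: str, ino: int) -> None:
        # Mimic the failure in step (4): the caller's inode no longer
        # matches what the directory entry points at.
        if self.entries.get(name) != ino:
            raise OSError(errno.EIO, "inode does not match directory entry")
        del self.entries[name]


class CacheObject:
    def __init__(self, name: str, ino: int) -> None:
        self.name = name
        self.ino = ino
        self.buried = False  # set when a replacement deleted our file


def replace_stale(directory: BackingDir, old: CacheObject) -> CacheObject:
    """Step (3): the new object deletes the stale file and creates its own."""
    directory.unlink(old.name, old.ino)
    old.buried = True  # note the preemptive deletion
    return CacheObject(old.name, directory.create(old.name))


def retire(directory: BackingDir, obj: CacheObject) -> bool:
    """Step (4) with the fix: skip the unlink if already buried."""
    if obj.buried:
        return False
    directory.unlink(obj.name, obj.ino)
    return True
```

Without the flag, `retire()` on the old object would attempt a second unlink against the replacement's inode and fail with EIO in this model.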

Reported-by: Greg M
Reported-by: Mark Moseley
Reported-by: Romain DEGEZ
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2010-05-12 01:07:53 +0800
7d6fb7bd1 ACPI: sleep: eliminate duplicate entries in acpisleep_dmi_table[]

Duplicate entries ended up in acpisleep_dmi_table[] by accident.
They don't hurt functionality, but they are ugly, so let's get
rid of them.

Cc: stable@kernel.org
Signed-off-by: Alex Chiang
Signed-off-by: Linus Torvalds

Alex Chiang
2010-05-12 01:07:53 +0800
9abf82b8b ceph: fix locking for waking session requests after reconnect

The session->s_waiting list is protected by mdsc->mutex, not s_mutex. This
was causing (rare) s_waiting list corruption.

Fix error paths too, while we're here. A more thorough cleanup of this
function is coming soon.

Signed-off-by: Sage Weil

Sage Weil
2010-05-12 00:53:57 +0800
d85b70566 ceph: resubmit requests on pg mapping change (not just primary change)

OSD requests need to be resubmitted on any pg mapping change, not just when
the pg primary changes. Resending only when the primary changes results in
occasional 'hung' requests during osd cluster recovery or rebalancing.
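
A minimal sketch of the two resend conditions, assuming a pg mapping can be represented as an ordered list of OSD ids with the primary first. The representation and function names are hypothetical.

```python
from typing import List


def needs_resend_primary_only(old: List[int], new: List[int]) -> bool:
    """Old rule: resend only when the primary (first entry) changes."""
    return old[:1] != new[:1]


def needs_resend(old: List[int], new: List[int]) -> bool:
    """New rule: resend on any change to the mapping."""
    return old != new


# A replica changes while the primary (OSD 3) stays the same.
before = [3, 5, 7]
after = [3, 5, 9]
```

Here the primary-only rule reports no resend for `before` to `after`, which corresponds to the 'hung' request case described above; the full-mapping rule catches it.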

Signed-off-by: Sage Weil

Sage Weil
2010-05-12 00:53:56 +0800