05 May, 2010
15 commits
-
call_sbin_request_key() creates a keyring and then attempts to insert a link to
the authorisation key into that keyring, but does so without holding a write
lock on the keyring semaphore.It will normally get away with this because it hasn't told anyone that the
keyring exists yet. The new keyring, however, has had its serial number
published, which means it can be accessed directly by that handle.This was found by a previous patch that adds RCU lockdep checks to the code
that reads the keyring payload pointer, which includes a check that the keyring
semaphore is actually locked.Without this patch, the following command:
keyctl request2 user b a @s
will provoke the following lockdep warning is displayed in dmesg:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/keyring.c:727 invoked rcu_dereference_check() without protection!other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by keyctl/2076:
#0: (key_types_sem){.+.+.+}, at: [] key_type_lookup+0x1c/0x71
#1: (keyring_serialise_link_sem){+.+.+.}, at: [] __key_link+0x4d/0x3c5stack backtrace:
Pid: 2076, comm: keyctl Not tainted 2.6.34-rc6-cachefs #54
Call Trace:
[] lockdep_rcu_dereference+0xaa/0xb2
[] ? __key_link+0x4d/0x3c5
[] __key_link+0x19e/0x3c5
[] ? __key_instantiate_and_link+0xb1/0xdc
[] ? key_instantiate_and_link+0x42/0x5f
[] call_sbin_request_key+0xe7/0x33b
[] ? mutex_unlock+0x9/0xb
[] ? __key_instantiate_and_link+0xb1/0xdc
[] ? key_instantiate_and_link+0x42/0x5f
[] ? request_key_auth_new+0x1c2/0x23c
[] ? cache_alloc_debugcheck_after+0x108/0x173
[] ? request_key_and_link+0x146/0x300
[] ? kmem_cache_alloc+0xe1/0x118
[] request_key_and_link+0x28b/0x300
[] sys_request_key+0xf7/0x14a
[] ? trace_hardirqs_on_caller+0x10c/0x130
[] ? trace_hardirqs_on_thunk+0x3a/0x3f
[] system_call_fastpath+0x16/0x1bSigned-off-by: David Howells
Signed-off-by: James Morris -
The keyring key type code should use RCU dereference wrappers, even when it
holds the keyring's key semaphore.Reported-by: Vegard Nossum
Signed-off-by: David Howells
Acked-by: Serge Hallyn
Signed-off-by: James Morris -
find_keyring_by_name() can gain access to a keyring that has had its reference
count reduced to zero, and is thus ready to be freed. This then allows the
dead keyring to be brought back into use whilst it is being destroyed.The following timeline illustrates the process:
|(cleaner) (user)
|
| free_user(user) sys_keyctl()
| | |
| key_put(user->session_keyring) keyctl_get_keyring_ID()
| || //=> keyring->usage = 0 |
| |schedule_work(&key_cleanup_task) lookup_user_key()
| || |
| kmem_cache_free(,user) |
| . |[KEY_SPEC_USER_KEYRING]
| . install_user_keyrings()
| . ||
| key_cleanup() [serial_node,) } ||
| | ||
| [spin_unlock(&key_serial_lock)] |find_keyring_by_name()
| | |||
| keyring_destroy(keyring) ||[read_lock(&keyring_name_lock)]
| || |||
| |[write_lock(&keyring_name_lock)] ||atomic_inc(&keyring->usage)
| |. ||| *** GET freeing keyring ***
| |. ||[read_unlock(&keyring_name_lock)]
| || ||
| |list_del() |[mutex_unlock(&key_user_k..mutex)]
| || |
| |[write_unlock(&keyring_name_lock)] ** INVALID keyring is returned **
| | .
| kmem_cache_free(,keyring) .
| .
| atomic_dec(&keyring->usage)
v *** DESTROYED ***
TIMEIf CONFIG_SLUB_DEBUG=y then we may see the following message generated:
=============================================================================
BUG key_jar: Poison overwritten
-----------------------------------------------------------------------------INFO: 0xffff880197a7e200-0xffff880197a7e200. First byte 0x6a instead of 0x6b
INFO: Allocated in key_alloc+0x10b/0x35f age=25 cpu=1 pid=5086
INFO: Freed in key_cleanup+0xd0/0xd5 age=12 cpu=1 pid=10
INFO: Slab 0xffffea000592cb90 objects=16 used=2 fp=0xffff880197a7e200 flags=0x200000000000c3
INFO: Object 0xffff880197a7e200 @offset=512 fp=0xffff880197a7e300Bytes b4 0xffff880197a7e1f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xffff880197a7e200: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b jkkkkkkkkkkkkkkkAlternatively, we may see a system panic happen, such as:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
IP: [] kmem_cache_alloc+0x5b/0xe9
PGD 6b2b4067 PUD 6a80d067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/kernel/kexec_crash_loaded
CPU 1
...
Pid: 31245, comm: su Not tainted 2.6.34-rc5-nofixed-nodebug #2 D2089/PRIMERGY
RIP: 0010:[] [] kmem_cache_alloc+0x5b/0xe9
RSP: 0018:ffff88006af3bd98 EFLAGS: 00010002
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88007d19900b
RDX: 0000000100000000 RSI: 00000000000080d0 RDI: ffffffff81828430
RBP: ffffffff81828430 R08: ffff88000a293750 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000100000 R12: 00000000000080d0
R13: 00000000000080d0 R14: 0000000000000296 R15: ffffffff810f20ce
FS: 00007f97116bc700(0000) GS:ffff88000a280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000001 CR3: 000000006a91c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process su (pid: 31245, threadinfo ffff88006af3a000, task ffff8800374414c0)
Stack:
0000000512e0958e 0000000000008000 ffff880037f8d180 0000000000000001
0000000000000000 0000000000008001 ffff88007d199000 ffffffff810f20ce
0000000000008000 ffff88006af3be48 0000000000000024 ffffffff810face3
Call Trace:
[] ? get_empty_filp+0x70/0x12f
[] ? do_filp_open+0x145/0x590
[] ? tlb_finish_mmu+0x2a/0x33
[] ? unmap_region+0xd3/0xe2
[] ? virt_to_head_page+0x9/0x2d
[] ? alloc_fd+0x69/0x10e
[] ? do_sys_open+0x56/0xfc
[] ? system_call_fastpath+0x16/0x1b
Code: 0f 1f 44 00 00 49 89 c6 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 60 e8 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 8b 04 03 49 89 00 eb 14 4c 89 f9 83 ca ff 44 89 e6 48 89 ef
RIP [] kmem_cache_alloc+0x5b/0xe9This problem is that find_keyring_by_name does not confirm that the keyring is
valid before accepting it.Skipping keyrings that have been reduced to a zero count seems the way to go.
To this end, use atomic_inc_not_zero() to increment the usage count and skip
the candidate keyring if that returns false.The following script _may_ cause the bug to happen, but there's no guarantee
as the window of opportunity is small:#!/bin/sh
LOOP=100000
USER=dummy_user
/bin/su -c "exit;" $USER || { /usr/sbin/adduser -m $USER; add=1; }
for ((i=0; i&/dev/nullas that uses a keyring named "foo" rather than relying on the user and
user-session named keyrings.Reported-by: Toshiyuki Okajima
Signed-off-by: David Howells
Tested-by: Toshiyuki Okajima
Acked-by: Serge Hallyn
Signed-off-by: James Morris -
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms/legacy: only enable load detection property on DVI-I
drm/radeon/kms: fix panel scaling adjusted mode setup
drivers/gpu/drm/drm_sysfs.c: sysfs files error handling
drivers/gpu/drm/radeon/radeon_atombios.c: range check issues
gpu: vga_switcheroo, fix lock imbalance
drivers/gpu/drm/drm_memory.c: fix check for end of loop
drivers/gpu/drm/via/via_video.c: fix off by one issue
drm/radeon/kms/agp The wrong AGP chipset can cause a NULL pointer dereference
drm/radeon/kms: r300 fix CS checker to allow zbuffer-only fastfill -
…git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
powernow-k8: Fix frequency reporting
x86: Fix parse_reservetop() build failure on certain configs
x86: Fix NULL pointer access in irq_force_complete_move() for Xen guests
x86: Fix 'reservetop=' functionality -
…s/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
KEYS: Fix RCU handling in key_gc_keyring()
KEYS: Fix an RCU warning in the reading of user keys -
key_gc_keyring() needs to either hold the RCU read lock or hold the keyring
semaphore if it's going to scan the keyring's list. Given that it only needs
to read the key list, and it's doing so under a spinlock, the RCU read lock is
the thing to use.Furthermore, the RCU check added in e7b0a61b7929632d36cf052d9e2820ef0a9c1bfe is
incorrect as holding the spinlock on key_serial_lock is not grounds for
assuming a keyring's pointer list can be read safely. Instead, a simple
rcu_dereference() inside of the previously mentioned RCU read lock is what we
want.Reported-by: Serge E. Hallyn
Signed-off-by: David Howells
Acked-by: Serge Hallyn
Acked-by: "Paul E. McKenney"
Signed-off-by: James Morris -
Fix an RCU warning in the reading of user keys:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/user_defined.c:202 invoked rcu_dereference_check() without protection!other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by keyctl/3637:
#0: (&key->sem){+++++.}, at: [] keyctl_read_key+0x9c/0xcfstack backtrace:
Pid: 3637, comm: keyctl Not tainted 2.6.34-rc5-cachefs #18
Call Trace:
[] lockdep_rcu_dereference+0xaa/0xb2
[] user_read+0x47/0x91
[] keyctl_read_key+0xac/0xcf
[] sys_keyctl+0x75/0xb7
[] system_call_fastpath+0x16/0x1bSigned-off-by: David Howells
Acked-by: Serge Hallyn
Signed-off-by: James Morris -
DVI-D doesn't have analog. This matches the avivo behavior.
Signed-off-by: Alex Deucher
Signed-off-by: Dave Airlie -
This should duplicate exactly what the ddx does for both
legacy and avivo.Signed-off-by: Alex Deucher
Signed-off-by: Dave Airlie -
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Avoid a gcc warning in ocfs2_wipe_inode().
ocfs2: Avoid direct write if we fall back to buffered I/O
ocfs2_dlmfs: Fix math error when reading LVB.
ocfs2: Update VFS inode's id info after reflink.
ocfs2: potential ERR_PTR dereference on error paths
ocfs2: Add directory entry later in ocfs2_symlink() and ocfs2_mknod()
ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod error path
ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_symlink error path
ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe code
ocfs2: Reset status if we want to restart file extension.
ocfs2: Compute metaecc for superblocks during online resize.
ocfs2: Check the owner of a lockres inside the spinlock
ocfs2: one more warning fix in ocfs2_file_aio_write(), v2
ocfs2_dlmfs: User DLM_* when decoding file open flags. -
The x86_64 call_rwsem_wait() treats the active state counter part of the
R/W semaphore state as being 16-bit when it's actually 32-bit (it's half
of the 64-bit state). It should do "decl %edx" not "decw %dx".Signed-off-by: David Howells
Signed-off-by: Linus Torvalds -
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c-core: Use per-adapter userspace device lists
i2c: Fix probing of FSC hardware monitoring chips
i2c-core: Erase pointer to clientdata on removal -
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Fix resource leak in failure path of perf_event_open() -
…/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Fix RCU lockdep splat on freezer_fork path
rcu: Fix RCU lockdep splat in set_task_cpu on fork path
mutex: Don't spin when the owner CPU is offline or other weird cases
04 May, 2010
16 commits
-
Using a single list for all userspace devices leads to a dead lock
on multiplexed buses in some circumstances (mux chip instantiated
from userspace). This is solved by using a separate list for each
bus segment.Signed-off-by: Jean Delvare
Acked-by: Michael Lawnick -
Some FSC hardware monitoring chips (Syleus at least) doesn't like
quick writes we typically use to probe for I2C chips. Use a regular
byte read instead for the address they live at (0x73). These are the
only known chips living at this address on PC systems.For clarity, this fix should not be needed for kernels 2.6.30 and
later, as we started instantiating the hwmon devices explicitly based
on DMI data. Still, this fix is valuable in the following two cases:
* Support for recent FSC chips on older kernels. The DMI-based device
instantiation is more difficult to backport than the device support
itself.
* Case where the DMI-based device instantiation fails, whatever the
reason. We fall back to probing in that case, so it should work.This fixes kernel bug #15634:
https://bugzilla.kernel.org/show_bug.cgi?id=15634Signed-off-by: Jean Delvare
Acked-by: Hans de Goede
Cc: stable@kernel.org -
After discovering that a lot of i2c-drivers leave the pointer to their
clientdata dangling, it was decided to let the core handle this issue.
It is assumed that the core may access the private data after remove()
as there are no guarantees for the lifetime of such pointers anyhow (see
thread starting at http://lkml.org/lkml/2010/3/21/68)Signed-off-by: Wolfram Sang
Signed-off-by: Jean Delvare -
gcc warns that a variable is uninitialized. It's actually handled, but
an early return fools gcc. Let's just initialize the variable to a
garbage value that will crash if the usage is ever broken.Signed-off-by: Joel Becker
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: remove bad auth_x kmem_cache
ceph: fix lockless caps check
ceph: clear dir complete, invalidate dentry on replayed rename
ceph: fix direct io truncate offset
ceph: discard incoming messages with bad seq #
ceph: fix seq counting for skipped messages
ceph: add missing #includes
ceph: fix leaked spinlock during mds reconnect
ceph: print more useful version info on module load
ceph: fix snap realm splits
ceph: clear dir complete on d_move -
It's useless, since our allocations are already a power of 2. And it was
allocated per-instance (not globally), which caused a name collision when
we tried to mount a second file system with auth_x enabled.Signed-off-by: Sage Weil
-
The __ variant requires caller to hold i_lock.
Signed-off-by: Sage Weil
-
If a rename operation is resent to the MDS following an MDS restart, the
client does not get a full reply (containing the resulting metadata) back.
In that case, a ceph_rename() needs to compensate by doing anything useful
that fill_inode() would have, like d_move().It also needs to invalidate the dentry (to workaround the vfs_rename_dir()
bug) and clear the dir complete flag, just like fill_trace().Signed-off-by: Sage Weil
-
truncate_inode_pages_range wants the end offset to align with the last byte
in a page.Signed-off-by: Sage Weil
-
We can get old message seq #'s after a tcp reconnect for stateful sessions
(i.e., the MDS). If we get a higher seq #, that is an error, and we
shouldn't see any bad seq #'s for stateless (mon, osd) connections.Signed-off-by: Sage Weil
-
Increment in_seq even when the message is skipped for some reason.
Signed-off-by: Sage Weil
-
Signed-off-by: Sage Weil
-
Signed-off-by: Sage Weil
-
Decouple the client version from the server side. Print relevant protocol
and map version info instead.Signed-off-by: Sage Weil
-
The snap realm split was checking i_snap_realm, not the list_head, to
determine if an inode belonged in the new realm. The check always failed,
which meant we always moved the inode, corrupting the old realm's list and
causing various crashes.Also wait to release old realm reference to avoid possibility of use after
free.Signed-off-by: Sage Weil
-
d_move() reorders the d_subdirs list, breaking the readdir result caching.
Unless/until d_move preserves that ordering, clear CEPH_I_COMPLETE on
rename.Signed-off-by: Sage Weil
03 May, 2010
5 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
watchdog: ep93xx_wdt.c fix default timout value in MODULE_PARM_DESC string. -
As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set.
And nilfs does not set s_bdi anywhere. I noticed this problem by the
warning introduced by the recent commit 5129a469 ("Catch filesystem
lacking s_bdi").WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e()
Hardware name: PowerEdge 2850
Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas
Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38
Call Trace:
[] warn_slowpath_common+0x60/0x90
[] warn_slowpath_null+0xd/0x10
[] vfs_kern_mount+0xc5/0x14e
[] do_kern_mount+0x32/0xbd
[] do_mount+0x671/0x6d0
[] ? __get_free_pages+0x1f/0x21
[] ? copy_mount_options+0x2b/0xe2
[] ? strndup_user+0x48/0x67
[] sys_mount+0x61/0x8f
[] sysenter_do_call+0x12/0x32This ensures to set s_bdi for nilfs and fixes the sync silent failure.
Signed-off-by: Ryusuke Konishi
Acked-by: Jens Axboe
Signed-off-by: Linus Torvalds -
With F10, model 10, all valid frequencies are in the ACPI _PST table.
Cc: # 33.x 32.x
Signed-off-by: Mark Langsdorf
LKML-Reference:
Signed-off-by: Borislav Petkov
Reviewed-by: Thomas Renninger
Signed-off-by: H. Peter Anvin
Signed-off-by: Ingo Molnar -
The WATCHDOG_TIMEOUT macro does not exist. The default timeout value is WDT_TIMEOUT.
Fix the MODULE_PARM_DESC so that the code can compile again.reported-by: Randy Dunlap
Signed-off-by: Wim Van Sebroeck -
Commit e67a807 ("x86: Fix 'reservetop=' functionality") added a
fixup_early_ioremap() call to parse_reservetop() and declared it
in io.h.But asm/io.h was only included indirectly - and on some configs
not at all, causing a build failure on those configs.Cc: Liang Li
Cc: Konrad Rzeszutek Wilk
Cc: Yinghai Lu
Cc: Jeremy Fitzhardinge
Cc: Wang Chen
Cc: "H. Peter Anvin"
Cc: Andrew Morton
LKML-Reference:
Signed-off-by: Ingo Molnar
01 May, 2010
4 commits
-
perf_event_open() kfrees event after init failure which doesn't
release all resources allocated by perf_event_alloc(). Use
free_event() instead.Signed-off-by: Tejun Heo
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc:
LKML-Reference:
Signed-off-by: Ingo Molnar -
Upstream PV guests fail to boot because of a NULL pointer in
irq_force_complete_move(). It is possible that xen guests have
irq_desc->chip_data = NULL.Test for NULL chip_data pointer before attempting to complete an irq move.
Signed-off-by: Prarit Bhargava
LKML-Reference:
Acked-by: Suresh Siddha
Signed-off-by: H. Peter Anvin
Cc: [2.6.33] -
when we fall back to buffered write from direct write, we call
__generic_file_aio_write() but that will end up doing direct write
even we are only prepared to do buffered write because the file
has the O_DIRECT flag set. This is a fix for
https://bugzilla.novell.com/show_bug.cgi?id=591039
revised with Joel's comments.Signed-off-by: Li Dongyang
Acked-by: Mark Fasheh
Signed-off-by: Joel Becker -
…t/mfasheh/ocfs2-mark into ocfs2-fixes