05 May, 2010

15 commits

  • call_sbin_request_key() creates a keyring and then attempts to insert a link to
    the authorisation key into that keyring, but does so without holding a write
    lock on the keyring semaphore.

    It will normally get away with this because it hasn't told anyone that the
    keyring exists yet. The new keyring, however, has had its serial number
    published, which means it can be accessed directly by that handle.

    This was found by a previous patch that adds RCU lockdep checks to the code
    that reads the keyring payload pointer, which includes a check that the keyring
    semaphore is actually locked.

    Without this patch, the following command:

    keyctl request2 user b a @s

    will cause the following lockdep warning to be displayed in dmesg:

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    security/keys/keyring.c:727 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by keyctl/2076:
    #0: (key_types_sem){.+.+.+}, at: [] key_type_lookup+0x1c/0x71
    #1: (keyring_serialise_link_sem){+.+.+.}, at: [] __key_link+0x4d/0x3c5

    stack backtrace:
    Pid: 2076, comm: keyctl Not tainted 2.6.34-rc6-cachefs #54
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb2
    [] ? __key_link+0x4d/0x3c5
    [] __key_link+0x19e/0x3c5
    [] ? __key_instantiate_and_link+0xb1/0xdc
    [] ? key_instantiate_and_link+0x42/0x5f
    [] call_sbin_request_key+0xe7/0x33b
    [] ? mutex_unlock+0x9/0xb
    [] ? __key_instantiate_and_link+0xb1/0xdc
    [] ? key_instantiate_and_link+0x42/0x5f
    [] ? request_key_auth_new+0x1c2/0x23c
    [] ? cache_alloc_debugcheck_after+0x108/0x173
    [] ? request_key_and_link+0x146/0x300
    [] ? kmem_cache_alloc+0xe1/0x118
    [] request_key_and_link+0x28b/0x300
    [] sys_request_key+0xf7/0x14a
    [] ? trace_hardirqs_on_caller+0x10c/0x130
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • The keyring key type code should use RCU dereference wrappers, even when it
    holds the keyring's key semaphore.

    Reported-by: Vegard Nossum
    Signed-off-by: David Howells
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • find_keyring_by_name() can gain access to a keyring that has had its reference
    count reduced to zero, and is thus ready to be freed. This then allows the
    dead keyring to be brought back into use whilst it is being destroyed.

    The following timeline illustrates the process:

    |(cleaner) (user)
    |
    | free_user(user) sys_keyctl()
    | | |
    | key_put(user->session_keyring) keyctl_get_keyring_ID()
    | || //=> keyring->usage = 0 |
    | |schedule_work(&key_cleanup_task) lookup_user_key()
    | || |
    | kmem_cache_free(,user) |
    | . |[KEY_SPEC_USER_KEYRING]
    | . install_user_keyrings()
    | . ||
    | key_cleanup() [serial_node,) } ||
    | | ||
    | [spin_unlock(&key_serial_lock)] |find_keyring_by_name()
    | | |||
    | keyring_destroy(keyring) ||[read_lock(&keyring_name_lock)]
    | || |||
    | |[write_lock(&keyring_name_lock)] ||atomic_inc(&keyring->usage)
    | |. ||| *** GET freeing keyring ***
    | |. ||[read_unlock(&keyring_name_lock)]
    | || ||
    | |list_del() |[mutex_unlock(&key_user_k..mutex)]
    | || |
    | |[write_unlock(&keyring_name_lock)] ** INVALID keyring is returned **
    | | .
    | kmem_cache_free(,keyring) .
    | .
    | atomic_dec(&keyring->usage)
    v *** DESTROYED ***
    TIME

    If CONFIG_SLUB_DEBUG=y, then we may see the following message:

    =============================================================================
    BUG key_jar: Poison overwritten
    -----------------------------------------------------------------------------

    INFO: 0xffff880197a7e200-0xffff880197a7e200. First byte 0x6a instead of 0x6b
    INFO: Allocated in key_alloc+0x10b/0x35f age=25 cpu=1 pid=5086
    INFO: Freed in key_cleanup+0xd0/0xd5 age=12 cpu=1 pid=10
    INFO: Slab 0xffffea000592cb90 objects=16 used=2 fp=0xffff880197a7e200 flags=0x200000000000c3
    INFO: Object 0xffff880197a7e200 @offset=512 fp=0xffff880197a7e300

    Bytes b4 0xffff880197a7e1f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
    Object 0xffff880197a7e200: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b jkkkkkkkkkkkkkkk

    Alternatively, we may see a system panic happen, such as:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
    IP: [] kmem_cache_alloc+0x5b/0xe9
    PGD 6b2b4067 PUD 6a80d067 PMD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/kernel/kexec_crash_loaded
    CPU 1
    ...
    Pid: 31245, comm: su Not tainted 2.6.34-rc5-nofixed-nodebug #2 D2089/PRIMERGY
    RIP: 0010:[] [] kmem_cache_alloc+0x5b/0xe9
    RSP: 0018:ffff88006af3bd98 EFLAGS: 00010002
    RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88007d19900b
    RDX: 0000000100000000 RSI: 00000000000080d0 RDI: ffffffff81828430
    RBP: ffffffff81828430 R08: ffff88000a293750 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000100000 R12: 00000000000080d0
    R13: 00000000000080d0 R14: 0000000000000296 R15: ffffffff810f20ce
    FS: 00007f97116bc700(0000) GS:ffff88000a280000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000001 CR3: 000000006a91c000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process su (pid: 31245, threadinfo ffff88006af3a000, task ffff8800374414c0)
    Stack:
    0000000512e0958e 0000000000008000 ffff880037f8d180 0000000000000001
    0000000000000000 0000000000008001 ffff88007d199000 ffffffff810f20ce
    0000000000008000 ffff88006af3be48 0000000000000024 ffffffff810face3
    Call Trace:
    [] ? get_empty_filp+0x70/0x12f
    [] ? do_filp_open+0x145/0x590
    [] ? tlb_finish_mmu+0x2a/0x33
    [] ? unmap_region+0xd3/0xe2
    [] ? virt_to_head_page+0x9/0x2d
    [] ? alloc_fd+0x69/0x10e
    [] ? do_sys_open+0x56/0xfc
    [] ? system_call_fastpath+0x16/0x1b
    Code: 0f 1f 44 00 00 49 89 c6 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 60 e8 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 8b 04 03 49 89 00 eb 14 4c 89 f9 83 ca ff 44 89 e6 48 89 ef
    RIP [] kmem_cache_alloc+0x5b/0xe9

    The problem is that find_keyring_by_name() does not confirm that the keyring is
    valid before accepting it.

    The fix is to skip keyrings whose usage count has already been reduced to zero.
    To this end, use atomic_inc_not_zero() to increment the usage count and skip
    the candidate keyring if that returns false.
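
    Below is a userspace sketch of the atomic_inc_not_zero() approach, using C11 atomics. inc_not_zero() mirrors the semantics of the kernel helper (increment only if the count is non-zero), but it is not the kernel's implementation. struct named_keyring and find_by_name() are hypothetical stand-ins for the keyring name list and find_keyring_by_name(). The keyring_name_lock that the real lookup holds is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace analogue of atomic_inc_not_zero(): increment *v unless it is 0.
 * Returns true if the increment happened. */
bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Hypothetical stand-in for a named keyring on the lookup list. */
struct named_keyring {
	const char *name;
	atomic_int usage;
};

/* Return the first keyring called 'name' on which a reference could be taken,
 * skipping any whose usage count has already dropped to zero (i.e. ones that
 * are being destroyed). Returns NULL if there is none. */
struct named_keyring *find_by_name(struct named_keyring *rings, size_t n,
				   const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(rings[i].name, name) != 0)
			continue;
		if (!inc_not_zero(&rings[i].usage))
			continue;	/* dying keyring: do not resurrect it */
		return &rings[i];
	}
	return NULL;
}
```

    With two keyrings named "foo", one at usage 0 and one at usage 1, the lookup skips the dying one, returns the live one and raises its usage to 2.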

    The following script _may_ cause the bug to happen, but there's no guarantee
    as the window of opportunity is small:

    #!/bin/sh
    LOOP=100000
    USER=dummy_user
    /bin/su -c "exit;" $USER || { /usr/sbin/adduser -m $USER; add=1; }
    for ((i=0; i&/dev/null

    as that uses a keyring named "foo" rather than relying on the user and
    user-session named keyrings.

    Reported-by: Toshiyuki Okajima
    Signed-off-by: David Howells
    Tested-by: Toshiyuki Okajima
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    Toshiyuki Okajima
     
  • * 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
    drm/radeon/kms/legacy: only enable load detection property on DVI-I
    drm/radeon/kms: fix panel scaling adjusted mode setup
    drivers/gpu/drm/drm_sysfs.c: sysfs files error handling
    drivers/gpu/drm/radeon/radeon_atombios.c: range check issues
    gpu: vga_switcheroo, fix lock imbalance
    drivers/gpu/drm/drm_memory.c: fix check for end of loop
    drivers/gpu/drm/via/via_video.c: fix off by one issue
    drm/radeon/kms/agp The wrong AGP chipset can cause a NULL pointer dereference
    drm/radeon/kms: r300 fix CS checker to allow zbuffer-only fastfill

    Linus Torvalds
     
  • …git/x86/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
    powernow-k8: Fix frequency reporting
    x86: Fix parse_reservetop() build failure on certain configs
    x86: Fix NULL pointer access in irq_force_complete_move() for Xen guests
    x86: Fix 'reservetop=' functionality

    Linus Torvalds
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    KEYS: Fix RCU handling in key_gc_keyring()
    KEYS: Fix an RCU warning in the reading of user keys

    Linus Torvalds
     
  • key_gc_keyring() needs to either hold the RCU read lock or hold the keyring
    semaphore if it's going to scan the keyring's list. Given that it only needs
    to read the key list, and it's doing so under a spinlock, the RCU read lock is
    the thing to use.

    Furthermore, the RCU check added in e7b0a61b7929632d36cf052d9e2820ef0a9c1bfe is
    incorrect as holding the spinlock on key_serial_lock is not grounds for
    assuming a keyring's pointer list can be read safely. Instead, a simple
    rcu_dereference() inside of the previously mentioned RCU read lock is what we
    want.

    Reported-by: Serge E. Hallyn
    Signed-off-by: David Howells
    Acked-by: Serge Hallyn
    Acked-by: "Paul E. McKenney"
    Signed-off-by: James Morris

    David Howells
     
  • Fix an RCU warning in the reading of user keys:

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    security/keys/user_defined.c:202 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    1 lock held by keyctl/3637:
    #0: (&key->sem){+++++.}, at: [] keyctl_read_key+0x9c/0xcf

    stack backtrace:
    Pid: 3637, comm: keyctl Not tainted 2.6.34-rc5-cachefs #18
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb2
    [] user_read+0x47/0x91
    [] keyctl_read_key+0xac/0xcf
    [] sys_keyctl+0x75/0xb7
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: David Howells
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • DVI-D doesn't have analog. This matches the avivo behavior.

    Signed-off-by: Alex Deucher
    Signed-off-by: Dave Airlie

    Alex Deucher
     
  • This should duplicate exactly what the ddx does for both
    legacy and avivo.

    Signed-off-by: Alex Deucher
    Signed-off-by: Dave Airlie

    Alex Deucher
     
  • * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
    ocfs2: Avoid a gcc warning in ocfs2_wipe_inode().
    ocfs2: Avoid direct write if we fall back to buffered I/O
    ocfs2_dlmfs: Fix math error when reading LVB.
    ocfs2: Update VFS inode's id info after reflink.
    ocfs2: potential ERR_PTR dereference on error paths
    ocfs2: Add directory entry later in ocfs2_symlink() and ocfs2_mknod()
    ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod error path
    ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_symlink error path
    ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe code
    ocfs2: Reset status if we want to restart file extension.
    ocfs2: Compute metaecc for superblocks during online resize.
    ocfs2: Check the owner of a lockres inside the spinlock
    ocfs2: one more warning fix in ocfs2_file_aio_write(), v2
    ocfs2_dlmfs: User DLM_* when decoding file open flags.

    Linus Torvalds
     
  • The x86_64 call_rwsem_wait() treats the active state counter part of the
    R/W semaphore state as being 16-bit when it's actually 32-bit (it's half
    of the 64-bit state). It should do "decl %edx", not "decw %dx".
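
    The difference between the two instructions can be modelled in C. This sketch assumes only what is stated above: the active count occupies a full 32-bit half of the state. dec32() and dec16() are illustrative helpers, not kernel code. "decw" never borrows into bit 16, and its zero flag reflects only the low 16 bits. Any zero test after it can therefore misjudge a count that has crossed a 16-bit boundary.

```c
#include <stdint.h>

/* Models "decl %edx": decrement all 32 bits. */
uint32_t dec32(uint32_t edx)
{
	return edx - 1;
}

/* Models "decw %dx": only bits 0-15 are decremented; a borrow out of bit 15
 * is lost and bits 16-31 are left untouched. */
uint32_t dec16(uint32_t edx)
{
	uint16_t dx = (uint16_t)edx;

	dx = (uint16_t)(dx - 1);
	return (edx & 0xFFFF0000u) | dx;
}
```

    For 0x00010000, dec32() gives 0x0000FFFF but dec16() gives 0x0001FFFF. For 0x00010001, both return 0x00010000, but the low 16 bits (what decw's zero flag reflects) are zero while the 32-bit count is not.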

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • * 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
    i2c-core: Use per-adapter userspace device lists
    i2c: Fix probing of FSC hardware monitoring chips
    i2c-core: Erase pointer to clientdata on removal

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Fix resource leak in failure path of perf_event_open()

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: Fix RCU lockdep splat on freezer_fork path
    rcu: Fix RCU lockdep splat in set_task_cpu on fork path
    mutex: Don't spin when the owner CPU is offline or other weird cases

    Linus Torvalds
     

04 May, 2010

16 commits

    Using a single list for all userspace devices leads to a deadlock
    on multiplexed buses in some circumstances (mux chip instantiated
    from userspace). This is solved by using a separate list for each
    bus segment.

    Signed-off-by: Jean Delvare
    Acked-by: Michael Lawnick

    Jean Delvare
     
    Some FSC hardware monitoring chips (Syleus at least) don't like the
    quick writes we typically use to probe for I2C chips. Use a regular
    byte read instead for the address they live at (0x73). These are the
    only known chips living at this address on PC systems.

    For clarity, this fix should not be needed for kernels 2.6.30 and
    later, as we started instantiating the hwmon devices explicitly based
    on DMI data. Still, this fix is valuable in the following two cases:
    * Support for recent FSC chips on older kernels. The DMI-based device
    instantiation is more difficult to backport than the device support
    itself.
    * Case where the DMI-based device instantiation fails, whatever the
    reason. We fall back to probing in that case, so it should work.

    This fixes kernel bug #15634:
    https://bugzilla.kernel.org/show_bug.cgi?id=15634

    Signed-off-by: Jean Delvare
    Acked-by: Hans de Goede
    Cc: stable@kernel.org

    Jean Delvare
     
  • After discovering that a lot of i2c-drivers leave the pointer to their
    clientdata dangling, it was decided to let the core handle this issue.
    It is assumed that the core may access the private data after remove()
    as there are no guarantees for the lifetime of such pointers anyhow (see
    the thread starting at http://lkml.org/lkml/2010/3/21/68).
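
    Here is a minimal sketch of the idea. It uses hypothetical types rather than the real struct i2c_client and i2c_driver. The core calls the driver's remove() callback and then clears the client-data pointer itself, so a driver that forgets to do so cannot leave it dangling. Clearing the pointer only when remove() succeeds is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a client device and its driver callback. */
struct fake_client {
	void *clientdata;
	int (*remove)(struct fake_client *client);
};

/* A driver remove() that forgets to clear its client data. */
int leaky_remove(struct fake_client *client)
{
	(void)client;
	return 0;
}

/* Core-side removal: call the driver, then clear the pointer on the
 * driver's behalf so it cannot be left dangling. */
int core_remove(struct fake_client *client)
{
	int status = 0;

	if (client->remove)
		status = client->remove(client);
	if (status == 0)
		client->clientdata = NULL;
	return status;
}
```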

    Signed-off-by: Wolfram Sang
    Signed-off-by: Jean Delvare

    Wolfram Sang
     
  • gcc warns that a variable is uninitialized. It's actually handled, but
    an early return fools gcc. Let's just initialize the variable to a
    garbage value that will crash if the usage is ever broken.

    Signed-off-by: Joel Becker

    Joel Becker
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: remove bad auth_x kmem_cache
    ceph: fix lockless caps check
    ceph: clear dir complete, invalidate dentry on replayed rename
    ceph: fix direct io truncate offset
    ceph: discard incoming messages with bad seq #
    ceph: fix seq counting for skipped messages
    ceph: add missing #includes
    ceph: fix leaked spinlock during mds reconnect
    ceph: print more useful version info on module load
    ceph: fix snap realm splits
    ceph: clear dir complete on d_move

    Linus Torvalds
     
  • It's useless, since our allocations are already a power of 2. And it was
    allocated per-instance (not globally), which caused a name collision when
    we tried to mount a second file system with auth_x enabled.

    Signed-off-by: Sage Weil

    Sage Weil
     
    The __ variant requires the caller to hold i_lock.

    Signed-off-by: Sage Weil

    Sage Weil
     
    If a rename operation is resent to the MDS following an MDS restart, the
    client does not get a full reply (containing the resulting metadata) back.
    In that case, ceph_rename() needs to compensate by doing anything useful
    that fill_inode() would have done, such as d_move().

    It also needs to invalidate the dentry (to work around the vfs_rename_dir()
    bug) and clear the dir complete flag, just like fill_trace().

    Signed-off-by: Sage Weil

    Sage Weil
     
    truncate_inode_pages_range() expects its (inclusive) end offset to be the
    last byte of a page.
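
    In page-cache terms, the last byte of the page holding a given offset is that offset with all of its in-page bits set. The helpers below are a sketch under that assumption. page_last_byte() and truncate_end() are hypothetical names, and the page size is passed in rather than taken from PAGE_CACHE_SIZE. This is not necessarily the exact expression the ceph patch uses.

```c
#include <assert.h>
#include <stdint.h>

/* Last byte of the page that contains byte offset 'off'.
 * 'page_size' must be a power of two. */
uint64_t page_last_byte(uint64_t off, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return off | (page_size - 1);
}

/* Inclusive end offset for dropping cached pages that overlap the byte
 * range [pos, pos + len). 'len' must be non-zero. */
uint64_t truncate_end(uint64_t pos, uint64_t len, uint64_t page_size)
{
	assert(len != 0);
	return page_last_byte(pos + len - 1, page_size);
}
```

    Take a 200-byte direct write at offset 4000 with 4 KiB pages. truncate_end() yields 8191, which covers both pages the write touches.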

    Signed-off-by: Sage Weil

    Sage Weil
     
  • We can get old message seq #'s after a tcp reconnect for stateful sessions
    (i.e., the MDS). If we get a higher seq #, that is an error, and we
    shouldn't see any bad seq #'s for stateless (mon, osd) connections.
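
    One way to express those rules is the hypothetical sketch below. classify_seq() and enum seq_verdict are illustrative names, not the messenger's code. in_seq is the last sequence number accepted. A stale number is tolerated (discarded) only on a stateful session, and anything ahead of in_seq + 1 is treated as an error. Treating a stale number on a stateless connection as an error is this sketch's reading of "we shouldn't see any bad seq #'s".

```c
#include <stdint.h>

/* Hypothetical verdicts for an incoming message sequence number. */
enum seq_verdict { SEQ_ACCEPT, SEQ_DISCARD_OLD, SEQ_ERROR };

/* Classify 'seq' against 'in_seq', the last sequence number accepted.
 * 'stateful' is non-zero for sessions such as the MDS. */
enum seq_verdict classify_seq(uint64_t in_seq, uint64_t seq, int stateful)
{
	if (seq == in_seq + 1)
		return SEQ_ACCEPT;
	if (seq <= in_seq)
		/* Resent old messages are expected only after a reconnect
		 * on a stateful session. */
		return stateful ? SEQ_DISCARD_OLD : SEQ_ERROR;
	return SEQ_ERROR;	/* higher than expected: a gap */
}
```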

    Signed-off-by: Sage Weil

    Sage Weil
     
  • Increment in_seq even when the message is skipped for some reason.
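
    The effect can be shown with a hypothetical receive path. struct conn and receive() are illustrative, not the ceph messenger's API. Because in_seq advances before the skip decision, the message after a skipped one is still recognised as in order. If the increment happened only on delivery, the third message below would look like a gap and be rejected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection state. */
struct conn {
	uint64_t in_seq;	/* last sequence number received */
	size_t delivered;	/* messages handed to the upper layer */
};

/* Receive message 'seq'; 'skip' marks a message that is dropped for some
 * reason after being read. Out-of-order messages are ignored. */
void receive(struct conn *c, uint64_t seq, bool skip)
{
	if (seq != c->in_seq + 1)
		return;
	c->in_seq = seq;	/* advance even if the message is skipped */
	if (skip)
		return;
	c->delivered++;
}
```

    For example, receiving sequence numbers 1, 2 (skipped) and 3 leaves in_seq at 3 with two messages delivered.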

    Signed-off-by: Sage Weil

    Sage Weil
     
  • Signed-off-by: Sage Weil

    Sage Weil
     
  • Signed-off-by: Sage Weil

    Sage Weil
     
  • Decouple the client version from the server side. Print relevant protocol
    and map version info instead.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • The snap realm split was checking i_snap_realm, not the list_head, to
    determine if an inode belonged in the new realm. The check always failed,
    which meant we always moved the inode, corrupting the old realm's list and
    causing various crashes.

    Also, wait to release the old realm reference to avoid a possible
    use-after-free.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • d_move() reorders the d_subdirs list, breaking the readdir result caching.
    Unless/until d_move preserves that ordering, clear CEPH_I_COMPLETE on
    rename.

    Signed-off-by: Sage Weil

    Sage Weil
     

03 May, 2010

5 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
    watchdog: ep93xx_wdt.c fix default timout value in MODULE_PARM_DESC string.

    Linus Torvalds
     
  • As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set.
    And nilfs does not set s_bdi anywhere. I noticed this problem by the
    warning introduced by the recent commit 5129a469 ("Catch filesystem
    lacking s_bdi").

    WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e()
    Hardware name: PowerEdge 2850
    Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas
    Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38
    Call Trace:
    [] warn_slowpath_common+0x60/0x90
    [] warn_slowpath_null+0xd/0x10
    [] vfs_kern_mount+0xc5/0x14e
    [] do_kern_mount+0x32/0xbd
    [] do_mount+0x671/0x6d0
    [] ? __get_free_pages+0x1f/0x21
    [] ? copy_mount_options+0x2b/0xe2
    [] ? strndup_user+0x48/0x67
    [] sys_mount+0x61/0x8f
    [] sysenter_do_call+0x12/0x32

    This patch makes nilfs set s_bdi, which fixes the silent sync failure.

    Signed-off-by: Ryusuke Konishi
    Acked-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • With F10, model 10, all valid frequencies are in the ACPI _PST table.

    Cc: # 33.x 32.x
    Signed-off-by: Mark Langsdorf
    LKML-Reference:
    Signed-off-by: Borislav Petkov
    Reviewed-by: Thomas Renninger
    Signed-off-by: H. Peter Anvin
    Signed-off-by: Ingo Molnar

    Mark Langsdorf
     
  • The WATCHDOG_TIMEOUT macro does not exist. The default timeout value is WDT_TIMEOUT.
    Fix the MODULE_PARM_DESC so that the code can compile again.

    Reported-by: Randy Dunlap
    Signed-off-by: Wim Van Sebroeck

    Wim Van Sebroeck
     
  • Commit e67a807 ("x86: Fix 'reservetop=' functionality") added a
    fixup_early_ioremap() call to parse_reservetop() and declared it
    in io.h.

    But asm/io.h was only included indirectly - and on some configs
    not at all, causing a build failure on those configs.

    Cc: Liang Li
    Cc: Konrad Rzeszutek Wilk
    Cc: Yinghai Lu
    Cc: Jeremy Fitzhardinge
    Cc: Wang Chen
    Cc: "H. Peter Anvin"
    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

01 May, 2010

4 commits