15 Feb, 2009

2 commits

  • Here is another version, with the incremental patch rolled up, and
    added reclaim context annotation to kswapd, and allocation tracing
    to slab allocators (which may only ever reach the page allocator
    in rare cases, so it is good to put annotations here too).

    Haven't tested this version as such, but it should be getting closer
    to merge worthy ;)

    --
    After noticing some code in mm/filemap.c accidentally performing a __GFP_FS
    allocation when it should not have, I thought it might be a good idea to
    try to catch this kind of thing with lockdep.

    I coded up a little idea that seems to work. Unfortunately the system has to
    actually be in __GFP_FS page reclaim, then take the lock, before it will mark
    it. But at least that might still be some orders of magnitude more common
    (and more debuggable) than an actual deadlock condition, so we have some
    improvement I hope (the concept is no less complete than discovery of a lock's
    interrupt contexts).

    I guess we could even do the same thing with __GFP_IO (normal reclaim), and
    even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
    code paths, so let's start there and see how it goes.

    It *seems* to work. I did a quick test.

    =================================
    [ INFO: inconsistent lock state ]
    2.6.28-rc6-00007-ged31348-dirty #26
    ---------------------------------
    inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
    modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
    (testlock){--..}, at: [] brd_init+0x55/0x216 [brd]
    {in-reclaim-W} state was registered at:
    [] __lock_acquire+0x75b/0x1a60
    [] lock_acquire+0x91/0xc0
    [] mutex_lock_nested+0xb1/0x310
    [] brd_init+0x2b/0x216 [brd]
    [] _stext+0x3b/0x170
    [] sys_init_module+0xaf/0x1e0
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff
    irq event stamp: 3929
    hardirqs last enabled at (3929): [] mutex_lock_nested+0x285/0x310
    hardirqs last disabled at (3928): [] mutex_lock_nested+0x59/0x310
    softirqs last enabled at (3732): [] sk_filter+0x83/0xe0
    softirqs last disabled at (3730): [] sk_filter+0x16/0xe0

    other info that might help us debug this:
    1 lock held by modprobe/8526:
    #0: (testlock){--..}, at: [] brd_init+0x55/0x216 [brd]

    stack backtrace:
    Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged31348-dirty #26
    Call Trace:
    [] print_usage_bug+0x193/0x1d0
    [] mark_lock+0xaf0/0xca0
    [] mark_held_locks+0x55/0xc0
    [] ? brd_init+0x0/0x216 [brd]
    [] trace_reclaim_fs+0x2a/0x60
    [] __alloc_pages_internal+0x475/0x580
    [] ? mutex_lock_nested+0x26e/0x310
    [] ? brd_init+0x0/0x216 [brd]
    [] brd_init+0x6a/0x216 [brd]
    [] ? brd_init+0x0/0x216 [brd]
    [] _stext+0x3b/0x170
    [] ? mutex_unlock+0x9/0x10
    [] ? __mutex_unlock_slowpath+0x10d/0x180
    [] ? trace_hardirqs_on_caller+0x12c/0x190
    [] sys_init_module+0xaf/0x1e0
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Nick Piggin
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Nick Piggin
     
  • This modifies the timer code in a way to allow lockdep to detect
    deadlocks resulting from a lock being taken in the timer function
    as well as around the del_timer_sync() call.

    Signed-off-by: Johannes Berg

    Johannes Berg
     

08 Feb, 2009

1 commit


07 Feb, 2009

13 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (37 commits)
    Btrfs: Make sure dir is non-null before doing S_ISGID checks
    Btrfs: Fix memory leak in cache_drop_leaf_ref
    Btrfs: don't return congestion in write_cache_pages as often
    Btrfs: Only prep for btree deletion balances when nodes are mostly empty
    Btrfs: fix btrfs_unlock_up_safe to walk the entire path
    Btrfs: change btrfs_del_leaf to drop locks earlier
    Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode
    Btrfs: Don't try to compress pages past i_size
    Btrfs: join the transaction in __btrfs_setxattr
    Btrfs: Handle SGID bit when creating inodes
    Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks
    Btrfs: Change btree locking to use explicit blocking points
    Btrfs: hash_lock is no longer needed
    Btrfs: disable leak debugging checks in extent_io.c
    Btrfs: sort references by byte number during btrfs_inc_ref
    Btrfs: async threads should try harder to find work
    Btrfs: selinux support
    Btrfs: make btrfs acls selectable
    Btrfs: Catch missed bios in the async bio submission thread
    Btrfs: fix readdir on 32 bit machines
    ...

    Linus Torvalds
     
  • The addition of filename encryption caused a regression in unencrypted
    filename symlink support. ecryptfs_copy_filename() is used when dealing
    with unencrypted filenames and it reported that the new, copied filename
    was a character longer than it should have been.

    This caused the return value of readlink() to count the NUL byte of the
    symlink target. Most applications don't care about the extra NUL byte,
    but a version control system (bzr) helped in discovering the bug.

    Signed-off-by: Tyler Hicks
    Signed-off-by: Linus Torvalds

    Tyler Hicks
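
A minimal userspace sketch of the off-by-one described above; it is not the eCryptfs code. `copy_name()` is a hypothetical helper: it allocates one extra byte for the terminator, but the length it reports excludes that byte, which is what a readlink() caller expects.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel's ecryptfs_copy_filename()):
 * duplicate `len` bytes of `src` into a fresh NUL-terminated buffer.
 * The buffer holds len + 1 bytes, but *copied_len excludes the NUL.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int copy_name(char **copied, size_t *copied_len,
                     const char *src, size_t len)
{
    char *buf = malloc(len + 1);   /* +1 is for storage only */

    if (!buf)
        return -1;
    memcpy(buf, src, len);
    buf[len] = '\0';
    *copied = buf;
    *copied_len = len;             /* reporting len + 1 here is the off-by-one */
    return 0;
}
```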
     
  • * 'x86/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
    x86-64: fix int $0x80 -ENOSYS return

    Linus Torvalds
     
  • One of my past fixes to this code introduced a different new bug.
    When using 32-bit "int $0x80" entry for a bogus syscall number,
    the return value is not correctly set to -ENOSYS. This only happens
    when neither syscall-audit nor syscall tracing is enabled (i.e., never
    seen if auditd ever started). Test program:

    /* gcc -o int80-badsys -m32 -g int80-badsys.c
       Run on x86-64 kernel.
       Note to reproduce the bug you need auditd never to have started. */

    #include <errno.h>
    #include <stdio.h>

    int
    main (void)
    {
      long res;
      asm ("int $0x80" : "=a" (res) : "0" (99999));
      printf ("bad syscall returns %ld\n", res);
      return res != -ENOSYS;
    }

    The fix makes the int $0x80 path match the sysenter and syscall paths.

    Reported-by: Dmitry V. Levin
    Signed-off-by: Roland McGrath

    Roland McGrath
     
  • * 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
    elf core dump: fix get_user use

    Linus Torvalds
     
  • The elf_core_dump() code does its work with set_fs(KERNEL_DS) in force,
    so vma_dump_size() needs to switch back with set_fs(USER_DS) to safely
    use get_user() for a normal user-space address.

    Checking for VM_READ optimizes out the case where get_user() would fail
    anyway. The vm_file check here was already superfluous given the control
    flow earlier in the function, so that is a cleanup/optimization unrelated
    to other changes but an obvious and trivial one.

    Reported-by: Gerald Schaefer
    Signed-off-by: Roland McGrath

    Roland McGrath
     
  • This is a modification of a patch by Bill Pemberton

    nobh_write_end() could call attach_nobh_buffers() with head == NULL.
    This would result in a trap when attach_nobh_buffers() attempted to
    access bh->b_this_page.

    This can be illustrated by running the writev01 testcase from LTP on jfs.

    This error was introduced by commit 5b41e74a "vfs: fix data leak in
    nobh_write_end()". That patch did not take into account that if
    PageMappedToDisk() is true upon entry to nobh_write_begin(), then no
    buffers will be allocated for the page. In that case, we won't have to
    worry about a failed write leaving uninitialized data in the page.

    Of course, head != NULL implies !page_has_buffers(page), so no need to
    test both.

    Signed-off-by: Dave Kleikamp
    Cc: Bill Pemberton
    Cc: Dmitri Monakhov
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ALSA: hda - Add missing COEF initialization for ALC887
    ALSA: hda - Add missing initialization for ALC272
    sound: usb-audio: handle wMaxPacketSize for FIXED_ENDPOINT devices
    ALSA: hda - Fix misc workqueue issues
    ALSA: hda - Add quirk for FSC Amilo Xi2550

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
    ieee1394: dv1394: move deprecation message from module init to file open
    firewire: core: Remove card from list of cards when enable fails

    Linus Torvalds
     
  • This fixes the shortlog attribution e.g. for 106757b38fff

    Signed-off-by: Uwe Kleine-König
    Acked-by: Sascha Hauer
    Acked-by: Wolfram Sang
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • I created commit 7971db5a4b4176ad5df590fce07a962c643a2740 on a machine
    where I forgot to set user.name and user.email before. The default
    values were not optimal.

    Signed-off-by: Uwe Kleine-König
    Acked-by: Wolfram Sang
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
    I happened to fork lots of processes, and hit a NULL pointer dereference.
    This is because copy_process() returns 0 instead of -EAGAIN after the
    max_threads check fails.

    The bug is introduced by "CRED: Detach the credentials from task_struct"
    (commit f1752eec6145c97163dbce62d17cf5d928e28a27).

    Signed-off-by: Li Zefan
    Signed-off-by: David Howells
    Acked-by: James Morris
    Signed-off-by: Linus Torvalds

    Li Zefan
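
One way such a bug can arise, sketched in userspace under the assumption that an earlier successful step overwrote the pending error code. `make_task()`, `setup_step()` and `MAX_THREADS` are hypothetical names and do not come from the kernel source.

```c
#include <errno.h>

#define MAX_THREADS 4   /* hypothetical limit, for the sketch only */

/* Hypothetical earlier setup step; returns 0 on success. */
static int setup_step(void)
{
    return 0;
}

/* Returns 0 on success or a negative errno value on failure. */
static int make_task(int nr_threads)
{
    int retval;

    retval = setup_step();      /* on success this leaves retval == 0 */
    if (retval < 0)
        goto out;

    retval = -EAGAIN;           /* re-arm the error code before the limit check */
    if (nr_threads >= MAX_THREADS)
        goto out;               /* without the line above, 0 would be returned */

    retval = 0;
out:
    return retval;
}
```

If the caller turns the return value into an error pointer, a stray 0 on the failure path looks like a NULL pointer rather than an error, which is consistent with the dereference reported above.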
     
  • The S_ISGID check in btrfs_new_inode caused an oops during subvol creation
    because sometimes the dir is null.

    Signed-off-by: Chris Mason

    Chris Mason
     

06 Feb, 2009

24 commits

  • * 'for-linus' of git://neil.brown.name/md:
    md: Ensure an md array never has too many devices.
    md: Fix a bug in linear.c causing which_dev() to return the wrong device.
    md: Allow read error in a single drive raid1 to be passed up.

    Linus Torvalds
     
  • On many Linux installations, the dv1394 driver will be auto-loaded
    whenever an AV/C device (e.g. camcorder or audio device) is plugged in.
    An irritating message would then appear in the kernel log.

    Defer this message until a dv1394 character device file is actually
    used by a program. Also include the program name in the message and
    update the message slightly.

    Signed-off-by: Stefan Richter

    Stefan Richter
     
  • Takashi Iwai
     
  • Takashi Iwai
     
  • Signed-off-by: Takashi Iwai

    Takashi Iwai
     
  • ALC272 needs EAPD for speaker outputs as well as other similar ALC
    codecs.

    Signed-off-by: Takashi Iwai

    Takashi Iwai
     
  • For audio devices that do not have proper audio descriptors (e.g.,
    Edirol UA-20), we use hardcoded parameters from our quirks list.
    However, we must still read the maximum packet size from the standard
    endpoint descriptor; otherwise, we might use packets that are too big
    and therefore rejected by the USB core.

    Signed-off-by: Clemens Ladisch
    Cc:
    Signed-off-by: Takashi Iwai

    Clemens Ladisch
     
  • Each different metadata format supported by md supports a
    different maximum number of devices.
    We really should be enforcing this maximum in the kernel, but
    we aren't quite doing that properly.

    We currently only enforce it at the 'hot_add' point, which is an
    older interface which is not used by current userspace.

    We need to also enforce it at 'add_new_disk' time for active arrays
    and at 'do_md_run' time when starting a new array.

    So move the test from 'hot_add' into 'bind_rdev_to_array' which is
    called from both 'hot_add' and 'add_new_disk', and add a new
    test in 'analyse_sbs' which is called from 'do_md_run'.

    This bug (or missing feature) has been around "forever" and so
    the patch is suitable for any -stable that is currently maintained.

    Cc: stable@kernel.org

    Signed-off-by: NeilBrown

    NeilBrown
     
  • ab5bd5cbc8d4b868378d062eed3d4240930fbb86 introduced the following
    bug in linear software raid for large arrays on 32 bit machines:

    which_dev() computes the device holding a given sector by shifting
    down the sector number to a 32 bit range, dividing by the array
    spacing and looking up the resulting index in the hash table of
    the array.

    Because the computed index might be slightly too small, a loop at
    the end of which_dev() increases the index until the given sector
    actually falls into the range of the device associated with that index.

    The changes of the above mentioned commit caused this loop to check
    whether the _index_ rather than the sector number is small enough,
    effectively bypassing the loop and thus possibly returning the wrong
    device.

    As reported by Simon Kirby, this leads to errors such as

    linear_make_request: Sector 2340486136 out of bounds on dev sdi: 156301312 sectors, offset 2109870464

    Fix this bug by introducing a local variable for the index so that
    the variable containing the passed sector is left unchanged.

    Cc: stable@kernel.org
    Signed-off-by: Andre Noll
    Signed-off-by: NeilBrown

    Andre Noll
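
The lookup described above can be sketched in userspace as follows. This is a simplified illustration, not the md code: the initial shift is omitted, `struct dev_info` and the layout of `hash[]` are assumptions, and the sector is assumed to lie inside the array.

```c
#include <stddef.h>
#include <stdint.h>

struct dev_info {
    uint64_t start;   /* first array sector held by this device */
    uint64_t size;    /* number of sectors on this device */
};

/*
 * devs[] is sorted by start. hash[i] names a device at or before the one
 * holding sector i * spacing, so the result may need to be advanced.
 */
static size_t which_dev(const struct dev_info *devs, const size_t *hash,
                        uint64_t spacing, uint64_t sector)
{
    uint64_t idx = sector / spacing;   /* local index: `sector` stays intact */
    size_t d = hash[idx];

    /* Compare the sector, not the index, against the end of the device. */
    while (sector >= devs[d].start + devs[d].size)
        d++;
    return d;
}
```

Keeping the index in its own variable is the point of the fix: if the division result were stored back into `sector`, the loop condition would compare a small index against sector ranges and would never advance.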
     
  • If a raid1 only has a single working device and gets a read error,
    we choose to simply return that error up to the filesystem (or whatever)
    rather than failing the whole array.

    However, the code doesn't quite do that. We attempt a read_balance
    which allocates the same drive, so we retry the read - indefinitely.

    Instead: If read_balance in the error case chooses the same drive that just
    failed, treat it as a failure and don't retry.

    Signed-off-by: NeilBrown

    NeilBrown
     
    Prevent kprobes from catching spurious faults, which would cause infinitely
    recursive page faults and memory corruption through stack overflow.

    Signed-off-by: Masami Hiramatsu
    Cc: [2.6.28.x]
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     
  • ... and yes, gcc is insane enough to eat that without complaint.
    We probably want sparse to scream on those...

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
    Revert "configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()"

    Linus Torvalds
     
  • * 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: Fix up T-bit error handling in SH-4A mutex fastpath.
    sh: Fix up spurious syscall restarting.
    sh: fcnvds fix with denormalized numbers on SH-4 FPU.
    sh: Only reserve memory under CONFIG_ZERO_PAGE_OFFSET when it != 0.
    sh: Handle calling csum_partial with misaligned data
    sh: ap325rxa: Enable ov772x in defconfig.
    sh: ap325rxa: Add ov772x support.
    sh: ap325rxa: control camera power toggling.
    sh: mach-migor: Enable ov772x and tw9910 in defconfig.

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    Revert "tcp: Always set urgent pointer if it's beyond snd_nxt"
    ipv6: Copy cork options in ip6_append_data
    udp: Fix UDP short packet false positive
    gianfar: Fix potential soft reset race
    gianfar: Fix BD_LENGTH_MASK definition
    cxgb3: Fix lro switch
    iwlwifi: save PCI state before suspend, restore after resume
    iwlwifi: clean key table in iwl_clear_stations_table

    Linus Torvalds
     
  • This reverts commit 64ff3b938ec6782e6585a83d5459b98b0c3f6eb8.

    Jeff Chua reports that it breaks rlogin for him.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • As the options passed to ip6_append_data may be ephemeral, we need
    to duplicate it for corking. This patch applies the simplest fix
    which is to memdup all the relevant bits.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • David S. Miller
     
    The UDP header pointer assignment must happen after calling
    pskb_may_pull(), as pskb_may_pull() can potentially alter the SKB
    buffer.

    This was exposed by running multicast traffic through the NIU driver,
    as it won't prepull the protocol headers into the linear area on
    receive.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
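
The ordering rule above is an instance of a general one: a pointer into a buffer must be taken after any call that may move the buffer. A userspace analogy, in which `struct buf`, `ensure_len()` and `get_header()` are hypothetical stand-ins and not the sk_buff API:

```c
#include <stdlib.h>
#include <string.h>

struct buf {
    unsigned char *data;
    size_t len;
};

/*
 * Hypothetical stand-in for a call that may relocate the buffer.
 * Returns 1 on success, 0 on allocation failure.
 */
static int ensure_len(struct buf *b, size_t need)
{
    unsigned char *p;

    if (b->len >= need)
        return 1;
    p = calloc(1, need);         /* new block: older pointers go stale */
    if (!p)
        return 0;
    memcpy(p, b->data, b->len);
    free(b->data);
    b->data = p;
    b->len = need;
    return 1;
}

/* Take the header pointer only after the buffer has settled. */
static const unsigned char *get_header(struct buf *b, size_t hdr_len)
{
    if (!ensure_len(b, hdr_len))
        return NULL;
    return b->data;              /* read after the call, never before */
}
```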
     
  • lseek() further than length of the file will leave stale ->index
    (second-to-last during iteration). Next seq_read() will not notice
    that ->f_pos is big enough to return 0, but will print last item
    as if ->f_pos is pointing to it.

    Introduced in commit cb510b8172602a66467f3551b4be1911f5a7c8c2
    aka "seq_file: more atomicity in traverse()".

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
    In 2.6.25 some /proc files were converted to use the seq_file
    infrastructure. But seq_files do not correctly support pread(), which
    broke some userspace applications.

    To handle pread() correctly we can't assume that f_pos is where we left it
    in seq_read. So move traverse() so that we can eventually use it in
    seq_read and thus some day support pread().

    Signed-off-by: Eric Biederman
    Cc: Paul Turner
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biederman
     
  • A missing type cast results in writing way beyond the end of a kzalloc()'d
    memory segment resulting in slab corruption. But it seems like the better
    solution is to define ->recv_msg_slots as a 'void *' rather than a
    'struct xpc_notify_mq_msg_uv *' and add the type cast.

    Signed-off-by: Dean Nelson
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
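
Why a missing cast can overrun an allocation, sketched under the assumption that slots are addressed by byte offset. `struct msg`, `slot_at()` and `typed_offset()` are hypothetical and do not come from the driver.

```c
#include <stddef.h>

struct msg {                 /* hypothetical message header */
    unsigned long hdr[2];
};

/* Correct form: the base is treated as bytes, so the offset is in bytes. */
static struct msg *slot_at(void *base, size_t slot, size_t entry_size)
{
    return (struct msg *)((char *)base + slot * entry_size);
}

/*
 * For comparison: the byte offset produced when the same expression is
 * applied to a `struct msg *` base. Pointer arithmetic scales by the
 * element size, so the offset grows by a factor of sizeof(struct msg).
 * Nothing is dereferenced here.
 */
static size_t typed_offset(size_t slot, size_t entry_size)
{
    return slot * entry_size * sizeof(struct msg);
}
```

Declaring the base as an untyped pointer, as the message suggests, forces the cast to be written at each use.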
     
  • Do usual do {} while (0) dance, otherwise

    fs/gfs2/util.c:99: error: expected expression before 'else'
    drivers/scsi/lpfc/lpfc_sli.c:363: error: expected expression before 'else'

    Signed-off-by: Alexey Dobriyan
    Acked-by: Ivan Kokshaysky
    Cc: Richard Henderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
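
A short illustration of the idiom. A brace-only macro is one common way to hit this class of error; which macro the commit actually changed is not shown here, and `BUMP()` and `classify()` are hypothetical.

```c
/*
 * Unsafe form, left commented out: it expands to a block, and the ';'
 * the caller writes after it becomes an empty statement that ends the
 * 'if', so a following 'else' no longer compiles.
 *
 * #define BUMP(x) { (x)++; }
 */

/* Safe form: expands to exactly one statement that needs the ';'. */
#define BUMP(x) do { (x)++; } while (0)

static int classify(int flag)
{
    int hits = 0, misses = 0;

    if (flag)
        BUMP(hits);
    else
        BUMP(misses);
    return hits - misses;
}
```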
     
  • If we return directly with -EPERM then lock_kernel() is still held.

    This was found with a code checker (http://repo.or.cz/w/smatch.git/).

    [akpm@linux-foundation.org: fix another such path - missed func_exit()]
    Signed-off-by: Dan Carpenter
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
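
The usual shape of such a fix is a single exit path that always drops the lock. A userspace sketch, with a counter standing in for the lock; `take_lock()`, `release_lock()` and `do_ioctl()` are hypothetical names:

```c
#include <errno.h>

static int lock_depth;   /* hypothetical stand-in for a held lock */

static void take_lock(void)    { lock_depth++; }
static void release_lock(void) { lock_depth--; }

/* Returns 0 on success or -EPERM; the lock is released on every path. */
static int do_ioctl(int permitted)
{
    int ret = 0;

    take_lock();
    if (!permitted) {
        ret = -EPERM;
        goto out;        /* a bare 'return -EPERM' would leak the lock */
    }
    /* ... the real work would go here ... */
out:
    release_lock();
    return ret;
}
```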