14 May, 2011

1 commit

  • wait_consider_task() tested task_stopped_code() without acquiring
    siglock and, if stop condition existed, called wait_task_stopped() and
    directly returned the result. This patch moves the initial
    task_stopped_code() testing into wait_task_stopped() and makes
    wait_consider_task() fall through to wait_task_continued() on 0 return.

    This is for the following two reasons.

    * Because the initial task_stopped_code() test is done without
    acquiring siglock, it may race against SIGCONT generation. The
    stopped condition might have been replaced by continued state by the
    time wait_task_stopped() acquired siglock. This may lead to
    unexpected failure of WNOHANG waits.

    This reorganization addresses this single race case but there are
    other cases - TASK_RUNNING -> TASK_STOPPED transition and EXIT_*
    transitions.

    * Scheduled ptrace updates require changes to the initial test which
    would fit better inside wait_task_stopped().

    Signed-off-by: Tejun Heo
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Oleg Nesterov

    Tejun Heo
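
The fall-through on a 0 return can be sketched in user space as follows. This is only a toy model: the toy_* names, the state enum and the pthread mutex standing in for siglock are assumptions made for illustration, and the sketch collapses the stop test into a single check under the lock, which the real wait_task_stopped() may not do.

```c
#include <pthread.h>

/* Toy child states; these are assumptions, not kernel task states. */
enum toy_state { TOY_RUNNING, TOY_STOPPED, TOY_CONTINUED };

struct toy_child {
	pthread_mutex_t siglock;	/* stands in for the kernel's siglock */
	enum toy_state state;
};

/* Returns 1 if a stop was reported, 0 if there was nothing to report. */
static int toy_wait_stopped(struct toy_child *c)
{
	int ret = 0;

	pthread_mutex_lock(&c->siglock);
	if (c->state == TOY_STOPPED)	/* tested with the lock held */
		ret = 1;
	pthread_mutex_unlock(&c->siglock);
	return ret;
}

/* Returns 2 if a continue was reported, 0 otherwise. */
static int toy_wait_continued(struct toy_child *c)
{
	int ret = 0;

	pthread_mutex_lock(&c->siglock);
	if (c->state == TOY_CONTINUED)
		ret = 2;
	pthread_mutex_unlock(&c->siglock);
	return ret;
}

/* On a 0 return from the stop check, fall through to the continued check. */
static int toy_wait_consider(struct toy_child *c)
{
	int ret = toy_wait_stopped(c);

	if (ret)
		return ret;
	return toy_wait_continued(c);
}
```

With this shape, a child whose stop was replaced by a continue before the lock was taken is still reported by the second check instead of being missed.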
     

09 May, 2011

2 commits

  • GROUP_STOP_TRAPPING waiting mechanism piggybacks on
    signal->wait_chldexit which is primarily used to implement waiting for
    wait(2) and friends. When do_wait() waits on signal->wait_chldexit,
    it uses a custom wake up callback, child_wait_callback(), which
    expects the child task which is waking up the parent to be passed in
    as @key to filter out spurious wakeups.

    task_clear_group_stop_trapping() used __wake_up_sync() which uses NULL
    @key causing the following oops if the parent was doing do_wait().

    BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
    IP: [] child_wait_callback+0x29/0x80
    PGD 1d899067 PUD 1e418067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/local_cpus
    CPU 2
    Modules linked in:

    Pid: 4498, comm: test-continued Not tainted 2.6.39-rc6-work+ #32 Bochs Bochs
    RIP: 0010:[] [] child_wait_callback+0x29/0x80
    RSP: 0000:ffff88001b889bf8 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: ffff88001fab3af8 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88001d91df20
    RBP: ffff88001b889c08 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
    R13: ffff88001fb70550 R14: 0000000000000000 R15: 0000000000000001
    FS: 00007f26ccae4700(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00000000000002d8 CR3: 000000001b8ac000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process test-continued (pid: 4498, threadinfo ffff88001b888000, task ffff88001fb88000)
    Stack:
    ffff88001b889c18 ffff88001fb70538 ffff88001b889c58 ffffffff810312f9
    0000000000000001 0000000200000001 ffff88001b889c58 ffff88001fb70518
    0000000000000002 0000000000000082 0000000000000001 0000000000000000
    Call Trace:
    [] __wake_up_common+0x59/0x90
    [] __wake_up_sync_key+0x53/0x80
    [] __wake_up_sync+0x10/0x20
    [] task_clear_jobctl_trapping+0x44/0x50
    [] ptrace_stop+0x7c/0x290
    [] do_signal_stop+0x28a/0x2d0
    [] get_signal_to_deliver+0x14f/0x5a0
    [] do_signal+0x75/0x7b0
    [] do_notify_resume+0x5d/0x70
    [] retint_signal+0x46/0x8c
    Code: 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 8b 47 d8 83 f8 03 74 3a 85 c0 49 89 c8 75 23 89 c0 48 8b 5f e0 4c 8d 0c 40 31 c0 39 9c c8 d8 02 00 00 74 1d 48 83 c4 08 5b c9 c3 66 0f 1f 44

    Fix it by using __wake_up_sync_key() and passing in the child as @key.

    I still think it's a mistake to piggyback on wait_chldexit for this.
    Given the relatively low frequency of ptrace use, we would be much
    better off leaving the already complex wait_chldexit alone and using
    a bit waitqueue.

    Signed-off-by: Tejun Heo
    Reviewed-by: Oleg Nesterov

    Tejun Heo
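
A simplified user-space model of a keyed wake-up callback may help show why the key matters. All wq_* names and structures below are hypothetical and far simpler than the kernel's wait queue code; the explicit NULL check exists only in this sketch to mark the spot where an unconditional dereference of the key would crash.

```c
#include <stddef.h>

struct wq_task {
	int pid;
	struct wq_task *parent;
};

/* A waiter records which parent is waiting; the callback filters wakeups. */
struct wq_waiter {
	struct wq_task *waiting_parent;
	int woken;
};

/*
 * Wake callback: `key` is expected to be the child doing the wakeup.
 * A callback that reads child->parent without a valid key would
 * dereference NULL here.
 */
static int wq_child_wait_callback(struct wq_waiter *w, void *key)
{
	struct wq_task *child = key;

	if (child == NULL)
		return 0;	/* no key: cannot tell whose wakeup this is */
	if (child->parent != w->waiting_parent)
		return 0;	/* spurious wakeup from an unrelated task */
	w->woken = 1;
	return 1;
}

/* Keyed wakeup: the caller always identifies itself to the callback. */
static int wq_wake_up_key(struct wq_waiter *w, struct wq_task *child)
{
	return wq_child_wait_callback(w, child);
}
```

Passing the child as the key lets the callback both avoid the NULL dereference and reject wakeups that come from tasks the waiter is not interested in.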
     
  • sys_sigprocmask() changes current->blocked by hand. Convert this code
    to use set_current_blocked().

    Signed-off-by: Oleg Nesterov

    Oleg Nesterov
     

28 Apr, 2011

13 commits

  • Cleanup. Remove the unneeded goto's, we can simply read blocked.sig[0]
    unconditionally and then copy-to-user it if oset != NULL.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • As Tejun and Linus pointed out, "nand" is the wrong name for "x & ~y",
    it should be "andn". Rename signandsets() to sigandnsets() as suggested.

    Suggested-by: Tejun Heo
    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
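
To make the naming point concrete, here is a minimal sketch that uses a plain 64-bit word in place of sigset_t. The helper names mask_andn() and mask_nand() are hypothetical and are not kernel APIs.

```c
#include <stdint.h>

/* "andn": keep the bits of x that are not set in y, i.e. x & ~y. */
static inline uint64_t mask_andn(uint64_t x, uint64_t y)
{
	return x & ~y;
}

/* "nand": the complement of (x & y), which is a different operation. */
static inline uint64_t mask_nand(uint64_t x, uint64_t y)
{
	return ~(x & y);
}
```

For x = 0xF and y = 0x5, mask_andn() yields 0xA, while mask_nand() yields every bit except those in 0x5, so the two names are not interchangeable.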
     
  • do_sigtimedwait() changes current->blocked and thus it needs
    set_current_blocked()->retarget_shared_pending().

    We could use set_current_blocked() directly. It is fine to change
    ->real_blocked from all-zeroes to ->blocked and vice versa lockless,
    but this is not immediately clear, looks racy, and needs a huge
    comment to explain why this is correct.

    To keep things simple, this patch adds a new static helper,
    __set_task_blocked(), which should be called with ->siglock held. This
    way we can change both ->real_blocked and ->blocked atomically under
    ->siglock as the current code does. This is more understandable.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • Factor out the common code in sys_rt_sigtimedwait/compat_sys_rt_sigtimedwait
    to the new helper, do_sigtimedwait().

    Add the comment to document the extra tick we add to timespec_to_jiffies(ts),
    thanks to Linus who explained this to me.

    Perhaps it would be better to move compat_sys_rt_sigtimedwait() into
    signal.c under CONFIG_COMPAT, then we can make do_sigtimedwait() static.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
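
The extra tick can be illustrated with a small user-space sketch. The tick rate TOY_HZ and the toy_* helpers are assumptions for illustration and do not reproduce timespec_to_jiffies(); the stated rationale (the current tick may be nearly over, so a wait of N ticks could otherwise end early) is an interpretation of the comment this commit mentions.

```c
#include <stdint.h>

#define TOY_HZ 100			/* assumption: 100 ticks per second */
#define TOY_NSEC_PER_SEC 1000000000L
#define TOY_NSEC_PER_TICK (TOY_NSEC_PER_SEC / TOY_HZ)

/* Round a (sec, nsec) interval up to whole ticks. */
static uint64_t toy_to_ticks(int64_t sec, long nsec)
{
	return (uint64_t)sec * TOY_HZ +
	       (uint64_t)((nsec + TOY_NSEC_PER_TICK - 1) / TOY_NSEC_PER_TICK);
}

/*
 * The current tick may be almost over, so sleeping for N ticks can end
 * up to one tick early. Add one tick to any non-zero request so the
 * wait is never shorter than what was asked for.
 */
static uint64_t toy_timeout_ticks(int64_t sec, long nsec)
{
	uint64_t ticks = toy_to_ticks(sec, nsec);

	if (sec || nsec)
		ticks++;
	return ticks;
}
```

A zero timeout stays zero, so a pure poll does not turn into a one-tick sleep.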
     
  • No functional changes, cleanup compat_sys_rt_sigtimedwait() and
    sys_rt_sigtimedwait().

    Calculate the timeout before we take ->siglock; this simplifies and
    shortens the code. Use timespec_valid() to check the timespec.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo
    Reviewed-by: Matt Fleming

    Oleg Nesterov
     
  • sys_rt_sigprocmask() looks unnecessarily complicated, simplify it.
    We can just read current->blocked lockless unconditionally before
    anything else and then copy-to-user it if needed. At worst we
    copy 4 words on mips.

    We could copy-to-user the old mask first and simplify the code even
    more, but the patch tries to keep the current behaviour: we change
    current->blocked even if copy_to_user(oset) fails.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • Normally sys_rt_sigreturn() restores the old current->blocked which was
    changed by handle_signal(), and unblocking is always fine.

    But the debugger or application itself can change frame->uc_sigmask and
    thus we need set_current_blocked()->retarget_shared_pending().

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • This is ugly, but if sigprocmask() needs retarget_shared_pending() then
    handle_signal() should follow this logic. In theory it is never correct to
    add the new signals to current->blocked, the signal handler can sleep/etc
    so we should notify other threads in case we block the pending signal and
    nobody else has TIF_SIGPENDING.

    Of course, this change doesn't make signals faster :/

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • In short, almost every change to current->blocked is wrong, or at least
    can lead to unexpected results.

    For example. Two threads T1 and T2, T1 sleeps in sigtimedwait/pause/etc.
    kill(tgid, SIG) can pick T2 for TIF_SIGPENDING. If T2 calls sigprocmask()
    and blocks SIG before it notices the pending signal, nobody else can handle
    this pending shared signal.

    I am not sure this is a bug, but at least this looks strange imho. T1 should
    not sleep forever, there is a signal which should wake it up.

    This patch moves the code which actually changes ->blocked into the new
    helper, set_current_blocked() and changes this code to call
    retarget_shared_pending() as exit_signals() does. We should only care about
    the signals we just blocked, we use "newset & ~current->blocked" as a mask.

    We do not check !sigisemptyset(newblocked), retarget_shared_pending() is
    cheap unless mask & shared_pending.

    Note: for this particular case we could simply change sigprocmask() to
    return -EINTR if signal_pending(), but then we should change other callers
    and, more importantly, if we need this fix then set_current_blocked() will
    have more callers and some of them can't restart. See the next patch as a
    random example.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
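
The "newset & ~current->blocked" mask can be modelled in user space as below. This is a toy sketch: struct sig_task, newly_blocked() and toy_set_blocked() are hypothetical names, one 64-bit word stands in for sigset_t, and the return value merely signals where the real helper would call retarget_shared_pending().

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model: one bit per signal; not the kernel's sigset_t. */
struct sig_task {
	uint64_t blocked;		/* signals this thread blocks */
	uint64_t shared_pending;	/* process-wide pending signals */
};

/* Signals blocked by newset that were not blocked before. */
static uint64_t newly_blocked(const struct sig_task *t, uint64_t newset)
{
	return newset & ~t->blocked;
}

/*
 * Install newset and report whether another thread would need to be
 * notified, i.e. whether a shared pending signal was just blocked here.
 */
static bool toy_set_blocked(struct sig_task *t, uint64_t newset)
{
	uint64_t mask = newly_blocked(t, newset);
	bool need_retarget = (mask & t->shared_pending) != 0;

	t->blocked = newset;
	return need_retarget;
}
```

Only the newly blocked bits are considered, so re-installing an unchanged mask never triggers a retarget in this model.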
     
  • No functional changes, preparation to simplify the review of the next change.

    1. We can read current->blocked lockless, nobody else can ever change this mask.

    2. Calculate the resulting sigset_t outside of ->siglock into the temporary
    variable, then take ->siglock and change ->blocked.

    Also, kill the stale comment about BKL.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • retarget_shared_pending() blindly does recalc_sigpending_and_wake() for
    every sub-thread, this is suboptimal. We can check t->blocked and stop
    looping once every bit in shared_pending has the new target.

    Note: we do not take task_is_stopped_or_traced(t) into account, we are
    not trying to speed up the signal delivery or to avoid the unnecessary
    (but harmless) signal_wake_up(0) in this unlikely case.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
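
The early exit from the loop can be sketched in user space as follows. The struct toy_thread array and toy_retarget() are hypothetical stand-ins for the thread list and the kernel helper, with one 64-bit word per signal mask; the sketch only models which threads get woken.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model of a sub-thread: only its blocked mask matters here. */
struct toy_thread {
	uint64_t blocked;
	int woken;	/* set to 1 when the thread is asked to recheck signals */
};

/*
 * Wake threads until every bit in `retarget` has a thread that does not
 * block it. Returns the bits that found no new target.
 */
static uint64_t toy_retarget(struct toy_thread *threads, size_t n,
			     uint64_t retarget)
{
	for (size_t i = 0; i < n && retarget != 0; i++) {
		uint64_t can_take = retarget & ~threads[i].blocked;

		if (can_take == 0)
			continue;	/* this thread blocks everything left */

		threads[i].woken = 1;
		retarget &= ~can_take;	/* these bits now have a target */
	}
	return retarget;
}
```

Threads that block every remaining signal are skipped without a wakeup, and the loop ends as soon as no bits remain.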
     
  • exit_signals() checks signal_pending() before retarget_shared_pending() but
    this is suboptimal. We can avoid the while_each_thread() loop when
    there are no shared signals visible to us.

    Add the "shared_pending.signal & ~blocked" check. We don't use tsk->blocked
    directly but pass ~blocked as an argument, this is needed for the next patch.

    Note: we can optimize this more. while_each_thread(t) can take t->blocked
    into account and stop after every pending signal has the new target, see the
    next patch.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • No functional changes. Move the notify-other-threads code from exit_signals()
    to the new helper, retarget_shared_pending().

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     

08 Apr, 2011

3 commits


07 Apr, 2011

1 commit

  • The normal mmap paths all avoid creating a mapping where the pgoff
    inside the mapping could wrap around due to overflow. However, an
    expanding mremap() can take such a non-wrapping mapping and make it
    bigger and cause a wrapping condition.

    Noticed by Robert Swiecki when running a system call fuzzer, where it
    caused a BUG_ON() due to terminally confusing the vma_prio_tree code. A
    vma dumping patch by Hugh then pinpointed the crazy wrapped case.

    Reported-and-tested-by: Robert Swiecki
    Acked-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
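
One way to detect such a wrap is an unsigned overflow test on the page offset, sketched below. The page size, the TOY_PAGE_SHIFT macro and the pgoff_would_wrap() helper are assumptions for illustration; this is not claimed to be the exact check the patch adds.

```c
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Return true if a mapping starting at page offset `pgoff` and spanning
 * `len` bytes would make the page offset wrap around.
 */
static bool pgoff_would_wrap(unsigned long pgoff, unsigned long len)
{
	unsigned long pages = len >> TOY_PAGE_SHIFT;

	/* Unsigned addition wraps modulo 2^N, so a smaller sum means overflow. */
	return pgoff + pages < pgoff;
}
```

A caller growing a mapping would run this test with the new length and reject the request when it returns true.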
     

06 Apr, 2011

20 commits

  • CH Pro Throttle needs NOGET, the same way other products from
    the same vendor do.

    Reported-by: Unavowed
    Signed-off-by: Jiri Kosina

    Jiri Kosina
     
  • The evdev buffer isn't big enough when you get many fingers on the
    device. Bump up the buffer to a reasonable size, matching what other
    multitouch devices use. Without this change, events may be discarded in
    the evdev buffer before they are read.

    Reported-by: Simon Budig
    Cc: Henrik Rydberg
    Cc: Jiri Kosina
    Cc: stable@kernel.org
    Signed-off-by: Chase Douglas
    Acked-by: Henrik Rydberg
    Signed-off-by: Jiri Kosina

    Chase Douglas
     
  • Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block:
    ide: always ensure that blk_delay_queue() is called if we have pending IO
    block: fix request sorting at unplug
    dm: improve block integrity support
    fs: export empty_aops
    ide: ide_requeue_and_plug() reinstate "always plug" behaviour
    blk-throttle: don't call xchg on bool
    ufs: remove unessecary blk_flush_plug
    block: make the flush insertion use the tail of the dispatch list
    block: get rid of elv_insert() interface
    block: dump request state on seeing a corrupted request completion

    Linus Torvalds
     
  • On an error path in inotify_init1 a normal user can trigger a double
    free of struct user. This is a regression introduced by a2ae4cc9a16e
    ("inotify: stop kernel memory leak on file creation failure").

    We fix this by making sure that if a group exists the user reference is
    dropped when the group is cleaned up. We should not explicitly drop the
    reference on error and also drop the reference when the group is cleaned
    up.

    The new lifetime rules are that an inotify group lives from
    inotify_new_group to the last fsnotify_put_group. Since the struct user
    and inotify_devs are directly tied to this lifetime they are only
    changed/updated in those two locations. We get rid of all special
    casing of struct user or user->inotify_devs.

    Signed-off-by: Eric Paris
    Cc: stable@kernel.org (2.6.37 and up)
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • Just because we are not requeuing a request does not mean that
    some aren't pending. So always issue a blk_delay_queue() if
    either we are requeuing OR there's pending IO.

    This fixes a boot problem for some IDE boxes.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The comparison function for list_sort() must be anticommutative,
    otherwise it does not sort in the ordinary sense.

    But fortunately list_sort() always check ((*cmp)(priv, a, b)
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     
  • The current block integrity (DIF/DIX) support in DM is verifying that
    all devices' integrity profiles match during DM device resume (which
    is past the point of no return). To some degree that is unavoidable
    (stacked DM devices force this late checking). But for most DM
    devices (which aren't stacking on other DM devices) the ideal time to
    verify all integrity profiles match is during table load.

    Introduce the notion of an "initialized" integrity profile: a profile
    that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
    template. Add blk_integrity_is_initialized() to allow checking if a
    profile was initialized.

    Update DM integrity support to:
    - check all devices with _initialized_ integrity profiles match
    during table load; uninitialized profiles (e.g. for underlying DM
    device(s) of a stacked DM device) are ignored.
    - disallow a table load that would result in an integrity profile that
    conflicts with a DM device's existing (in-use) integrity profile
    - avoid clearing an existing integrity profile
    - validate all integrity profiles match during resume; but if they
    don't all we can do is report the mismatch (during resume we're past
    the point of no return)

    Signed-off-by: Mike Snitzer
    Cc: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
     
  • With the ->sync_page() hook gone, we have a few users that
    add their own static address_space_operations without any
    functions defined.

    fs/inode.c already has an empty_aops that it uses for init
    purposes. Let's export that and use it in the places where
    an otherwise empty aops was defined.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We see stalls if we don't always ensure that the queue gets run
    again. Even if rq == NULL, we could have other pending requests
    in the queue.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • xchg does not work portably with smaller than 32bit types.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Jens Axboe

    Andreas Schwab
     
  • We already flush the per-process plugging list when context switching,
    so a blk_flush_plug call just before a yield() is not needed.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • It's not a preempt type request, in fact we have to insert it
    behind requests that do specify INSERT_FRONT.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Merge it with __elv_add_request(), it's pretty pointless to
    have a function with only two callers. The main interface
    is elv_add_request()/__elv_add_request().

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently we just dump a non-informative 'request botched' message.
    Let's actually try and print something sane to help debug issues
    around this.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • * 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6:
    drm/i915/lvds: Remove 0xa0 DDC probe for LVDS
    drm/i915/crt: Remove 0xa0 probe for VGA

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: rpckbd - fix a leak of the IRQ during init failure
    Input: wacom - add support for Lenovo tablet ID (0xE6)
    Input: i8042 - downgrade selftest error message to dbg()
    Input: synaptics - fix crash in synaptics_module_init()
    Input: spear-keyboard - fix inverted condition in interrupt handler
    Input: uinput - allow for 0/0 min/max on absolute axes.
    Input: sparse-keymap - report KEY_UNKNOWN for unknown scan codes
    Input: sparse-keymap - report scancodes with key events
    Input: h3600_ts_input - fix a spelling error
    Input: wacom - report resolution for pen devices
    Input: wacom - constify wacom_features for a new missed Bamboo models

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc/pseries: Fix build without CONFIG_HOTPLUG_CPU
    powerpc: Set nr_cpu_ids early and use it to free PACAs
    powerpc/pseries: Don't register global initcall
    powerpc/kexec: Fix mismatched ifdefs for PPC64/SMP.
    edac/mpc85xx: Limit setting/clearing of HID1[RFXE] to e500v1/v2 cores
    powerpc/85xx: Update dts for PCIe memory maps to match u-boot of Px020RDB

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: don't warn in btrfs_add_orphan
    Btrfs: fix free space cache when there are pinned extents and clusters V2
    Btrfs: Fix uninitialized root flags for subvolumes
    btrfs: clear __GFP_FS flag in the space cache inode
    Btrfs: fix memory leak in start_transaction()
    Btrfs: fix memory leak in btrfs_ioctl_start_sync()
    Btrfs: fix subvol_sem leak in btrfs_rename()
    Btrfs: Fix oops for defrag with compression turned on
    Btrfs: fix /proc/mounts info.
    Btrfs: fix compiler warning in file.c

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
    ipv6: Don't pass invalid dst_entry pointer to dst_release().
    mlx4: fix kfree on error path in new_steering_entry()
    tcp: len check is unnecessarily devastating, change to WARN_ON
    sctp: malloc enough room for asconf-ack chunk
    sctp: fix auth_hmacs field's length of struct sctp_cookie
    net: Fix dev dev_ethtool_get_rx_csum() for forced NETIF_F_RXCSUM
    usbnet: use eth%d name for known ethernet devices
    starfire: clean up dma_addr_t size test
    iwlegacy: fix bugs in change_interface
    carl9170: Fix tx aggregation problems with some clients
    iwl3945: disable hw scan by default
    wireless: rt2x00: rt2800usb.c add and identify ids
    iwl3945: do not deprecate software scan
    mac80211: fix aggregation frame release during timeout
    cfg80211: fix BSS double-unlinking (continued)
    cfg80211:: fix possible NULL pointer dereference
    mac80211: fix possible NULL pointer dereference
    mac80211: fix NULL pointer dereference in ieee80211_key_alloc()
    ath9k: fix a chip wakeup related crash in ath9k_start
    mac80211: fix a crash in minstrel_ht in HT mode with no supported MCS rates
    ...

    Linus Torvalds