13 Oct, 2009

1 commit

  • Meaning receive multiple messages, reducing the number of syscalls and
    net stack entry/exit operations.

    Next patches will introduce mechanisms where protocols that want to
    optimize this operation will provide an unlocked_recvmsg operation.

    This takes into account comments made by:

    . Paul Moore: sock_recvmsg is called only for the first datagram,
    sock_recvmsg_nosec is used for the rest.

    . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
    works in the same fashion as the ppoll one.

    If the underlying protocol returns a datagram with MSG_OOB set, this
    will make recvmmsg return right away with as many datagrams (+ the OOB
    one) it has received so far.

    . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
    datagrams and then recvmsg returns an error, recvmmsg will return
    the successfully received datagrams, store the error and return it
    in the next call.

    This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
    where we will be able to acquire the lock only at batch start and end, not at
    every underlying recvmsg call.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
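
From user space, the batching described in this commit might be used roughly as follows. This is a minimal sketch, not code from the patch: `struct batch`, `recv_batch()`, `BATCH` and `BUFSZ` are hypothetical names, and the choice of `MSG_DONTWAIT` with a NULL timeout is an assumption made to keep the example short.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define BATCH 8     /* hypothetical batch size */
#define BUFSZ 256   /* hypothetical per-datagram buffer size */

struct batch {
        struct mmsghdr msgs[BATCH];
        struct iovec   iov[BATCH];
        char           buf[BATCH][BUFSZ];
};

/*
 * Receive up to BATCH datagrams already queued on fd with a single
 * system call. Returns the number of datagrams received, or -1 with
 * errno set. On success b->msgs[i].msg_len is the length of datagram i.
 */
static int recv_batch(int fd, struct batch *b)
{
        int i;

        assert(b != NULL);
        memset(b->msgs, 0, sizeof(b->msgs));
        for (i = 0; i < BATCH; i++) {
                b->iov[i].iov_base = b->buf[i];
                b->iov[i].iov_len = BUFSZ;
                b->msgs[i].msg_hdr.msg_iov = &b->iov[i];
                b->msgs[i].msg_hdr.msg_iovlen = 1;
        }

        /* MSG_DONTWAIT: return what is queued instead of blocking. */
        return recvmmsg(fd, b->msgs, BATCH, MSG_DONTWAIT, NULL);
}
```

A caller would loop over the first `n` entries of `msgs`, where `n` is the return value, and treat a short count as "try again later" rather than as an error.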
     

05 Oct, 2009

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits)
    Revert "Seperate read and write statistics of in_flight requests"
    cfq-iosched: don't delay async queue if it hasn't dispatched at all
    block: Topology ioctls
    cfq-iosched: use assigned slice sync value, not default
    cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'
    cfq-iosched: implement slower async initiate and queue ramp up
    cfq-iosched: delay async IO dispatch, if sync IO was just done
    cfq-iosched: add a knob for desktop interactiveness
    Add a tracepoint for block request remapping
    block: allow large discard requests
    block: use normal I/O path for discard requests
    swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL
    fs/bio.c: move EXPORT* macros to line after function
    Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
    cciss: fix build when !PROC_FS
    block: Do not clamp max_hw_sectors for stacking devices
    block: Set max_sectors correctly for stacking devices
    cciss: cciss_host_attr_groups should be const
    cciss: Dynamically allocate the drive_info_struct for each logical drive.
    cciss: Add usage_count attribute to each logical drive in /sys
    ...

    Linus Torvalds
     

02 Oct, 2009

6 commits

  • This patch cleans up and fixes memcg's uncharge soft limit path.

    Problems:
    Now, res_counter_charge()/uncharge() handles softlimit information at
    charge/uncharge and softlimit-check is done when event counter per memcg
    goes over limit. Now, event counter per memcg is updated only when
    memory usage is over soft limit. Here, considering hierarchical memcg
    management, ancestors should be taken care of.

    Now, ancestors (hierarchy) are handled in charge() but not in uncharge().
    This is not good.

    Problems:
    1. memcg's event counter is incremented only when softlimit hits. That's bad.
    It makes the event counter hard to reuse for other purposes.

    2. At uncharge, only the lowest level res_counter is handled. This is a bug.
    Because ancestors' event counters are not incremented, children should
    take care of them.

    3. res_counter_uncharge()'s 3rd argument is NULL in most cases.
    ops under res_counter->lock should be small. No "if" statement is better.

    Fixes:
    * Removed soft_limit_xx pointer and checks in charge and uncharge.
    Do-check-only-when-necessary scheme works well enough without them.

    * make event-counter of memcg incremented at every charge/uncharge.
    (per-cpu area will be accessed soon anyway)

    * All ancestors are checked at soft-limit-check. This is necessary because
    an ancestor's event counter may never be modified. Then, they should be
    checked at the same time.

    Reviewed-by: Daisuke Nishimura
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
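
The scheme in the "Fixes" list can be modelled in a few lines of user-space C. This is only an illustrative sketch under stated assumptions: `struct group`, `charge()`, `uncharge()` and `soft_limit_check()` are hypothetical names, not the kernel's memcg or res_counter API, and locking and per-cpu counters are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup; not the kernel's struct. */
struct group {
        struct group *parent;      /* NULL for the root */
        unsigned long usage;       /* charged to this group and its children */
        unsigned long soft_limit;
        unsigned long events;      /* bumped on every charge and uncharge */
};

/* Charge n units to g; usage propagates up the hierarchy. */
static void charge(struct group *g, unsigned long n)
{
        struct group *p;

        g->events++;               /* unconditional: no soft-limit test here */
        for (p = g; p; p = p->parent)
                p->usage += n;
}

/* Uncharge n units from g and from every ancestor. */
static void uncharge(struct group *g, unsigned long n)
{
        struct group *p;

        g->events++;
        for (p = g; p; p = p->parent) {
                assert(p->usage >= n);
                p->usage -= n;
        }
}

/*
 * Soft-limit check, meant to run only occasionally (for example once
 * the event counter passes some threshold). It walks g and all of its
 * ancestors, because an ancestor's own event counter may never change.
 * Returns how many groups on the path are over their soft limit.
 */
static int soft_limit_check(const struct group *g)
{
        int over = 0;

        for (; g; g = g->parent)
                if (g->usage > g->soft_limit)
                        over++;
        return over;
}
```

The point of the design is that the hot paths stay free of soft-limit conditionals, and the hierarchy walk happens only in the infrequent check.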
     
  • __css_put() doesn't check for the bug of refcnt going negative.
    I think it should be caught. This patch adds a check for it.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
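
The general idea of catching a reference count that goes negative can be sketched with C11 atomics. The names `struct obj`, `obj_get()` and `obj_put()` are hypothetical, and recording the problem in a flag is an assumption made for testability; this is not the kernel's `__css_put()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not the kernel's structure. */
struct obj {
        atomic_int refcnt;
        bool underflow;   /* set when a put drives the count negative */
};

static void obj_get(struct obj *o)
{
        atomic_fetch_add(&o->refcnt, 1);
}

/* Drop one reference and return the new count, flagging an underflow. */
static int obj_put(struct obj *o)
{
        /* atomic_fetch_sub returns the old value, so subtract one more. */
        int val = atomic_fetch_sub(&o->refcnt, 1) - 1;

        if (val < 0)
                o->underflow = true;   /* a real implementation would warn or log */
        return val;
}
```

An unbalanced put is then visible at the point where it happens instead of surfacing later as a use-after-free.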
     
  • [akpm@linux-foundation.org: fix KVM]
    Signed-off-by: Alexey Dobriyan
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Starting from commit 4a4962263f07d14660849ec134ee42b63e95ea9a "reduce
    symbol table for loaded modules (v2)", the kernel/module.c build is broken
    with CONFIG_KALLSYMS disabled.

    CC kernel/module.o
    kernel/module.c:1995: warning: type defaults to 'int' in declaration of 'Elf_Hdr'
    kernel/module.c:1995: error: expected ';', ',' or ')' before '*' token
    kernel/module.c: In function 'load_module':
    kernel/module.c:2203: error: 'strmap' undeclared (first use in this function)
    kernel/module.c:2203: error: (Each undeclared identifier is reported only once
    kernel/module.c:2203: error: for each function it appears in.)
    kernel/module.c:2239: error: 'symoffs' undeclared (first use in this function)
    kernel/module.c:2239: error: implicit declaration of function 'layout_symtab'
    kernel/module.c:2240: error: 'stroffs' undeclared (first use in this function)
    make[1]: *** [kernel/module.o] Error 1
    make: *** [kernel/module.o] Error 2

    There are three different issues:

    - layout_symtab() takes a const Elf_Ehdr

    - layout_symtab() needs to return a value

    - symoffs/stroffs/strmap are referenced by the load_module() code
    despite being ifdefed out, which seems unnecessary given the noop
    behaviour of layout_symtab()/add_kallsyms() in the case of
    CONFIG_KALLSYMS=n.

    Signed-off-by: Paul Mundt
    Acked-by: Jan Beulich
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
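
The three issues listed above all come from a stub that does not mirror the real function. A common way to keep callers free of `#ifdef` is sketched below; `FEATURE_SYMTAB`, `layout_table()` and `load()` are hypothetical stand-ins (for `CONFIG_KALLSYMS`, `layout_symtab()` and `load_module()` respectively), and the sizes are made up.

```c
#include <assert.h>
#include <stddef.h>

/* FEATURE_SYMTAB is a hypothetical stand-in for CONFIG_KALLSYMS. */
#ifdef FEATURE_SYMTAB
/* Real version: pretend the table needs 16 bytes per symbol. */
static unsigned long layout_table(const void *hdr, size_t nsyms)
{
        (void)hdr;
        return 16UL * nsyms;
}
#else
/*
 * Stub version: same prototype (including the const), and it still
 * returns a value, so callers compile without any #ifdef of their own.
 */
static inline unsigned long layout_table(const void *hdr, size_t nsyms)
{
        (void)hdr;
        (void)nsyms;
        return 0;
}
#endif

/* The caller is identical in both configurations. */
static unsigned long load(const void *hdr, size_t nsyms)
{
        unsigned long extra = layout_table(hdr, nsyms);

        return 100 + extra;   /* 100: hypothetical base size */
}
```

With the feature macro undefined, `load()` simply sees an extra size of zero and needs no variables that exist only in the enabled configuration.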
     
  • Since 2.6.31 now has request-based device-mapper, it's useful to have
    a tracepoint for request-remapping as well as bio-remapping.
    This patch adds a tracepoint for request-remapping, trace_block_rq_remap().

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Cc: Alasdair G Kergon
    Cc: Li Zefan
    Signed-off-by: Jens Axboe

    Jun'ichi Nomura
     
  • Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
    introduced in commit 1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82.
    Release kobject also in case the request_fn is NULL.

    Problem was noticed via kmemleak backtrace when some sysfs entries were
    not properly destroyed during device removal:

    unreferenced object 0xffff88001aa76640 (size 80):
    comm "lvcreate", pid 2120, jiffies 4294885144
    hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff .........e......
    90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff .f........S.....
    backtrace:
    [] kmemleak_alloc+0x26/0x60
    [] kmem_cache_alloc+0x133/0x1c0
    [] sysfs_new_dirent+0x41/0x120
    [] sysfs_add_file_mode+0x3c/0xb0
    [] internal_create_group+0xc1/0x1a0
    [] sysfs_create_group+0x13/0x20
    [] blk_trace_init_sysfs+0x14/0x20
    [] blk_register_queue+0x3c/0xf0
    [] add_disk+0x94/0x160
    [] dm_create+0x598/0x6e0 [dm_mod]
    [] dev_create+0x51/0x350 [dm_mod]
    [] ctl_ioctl+0x1a3/0x240 [dm_mod]
    [] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
    [] compat_sys_ioctl+0xcd/0x4f0
    [] sysenter_dispatch+0x7/0x2c
    [] 0xffffffffffffffff

    Signed-off-by: Zdenek Kabelac
    Reviewed-by: Li Zefan
    Signed-off-by: Jens Axboe

    Zdenek Kabelac
     

01 Oct, 2009

1 commit

  • Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed
    cmpxchg() of 64bit values was available on X86_32.

    That is not so - and causes some subtle scheduler misbehavior due
    to incorrect timestamps, off by up to ~4 seconds.

    Two symptoms are known right now:

    - interactivity problems seen by Arjan: up to 600 msecs
    latencies instead of the expected 20-40 msecs. These
    latencies are very visible on the desktop.

    - incorrect CPU stats: occasionally too high percentages in 'top',
    and crazy CPU usage stats.

    Reported-by: Martin Schwidefsky
    Signed-off-by: Eric Dumazet
    Signed-off-by: Arjan van de Ven
    Acked-by: Linus Torvalds
    Cc: John Stultz
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
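
The requirement behind this fix is that the compare-and-exchange must cover the whole 64-bit timestamp. A user-space sketch of such an update loop, using C11 atomics rather than the kernel's primitives, could look like this; `shared_clock` and `clock_advance()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Shared 64-bit timestamp in nanoseconds (hypothetical name). */
static _Atomic uint64_t shared_clock;

/*
 * Move shared_clock forward to `now` unless another updater already
 * pushed it further. The compare-exchange operates on all 64 bits.
 * Returns the value of the clock after the call.
 */
static uint64_t clock_advance(uint64_t now)
{
        uint64_t old = atomic_load(&shared_clock);

        while (old < now) {
                /* On failure, old is reloaded with the current value. */
                if (atomic_compare_exchange_weak(&shared_clock, &old, now))
                        return now;
        }
        return old;
}
```

If only the low 32 bits were compared and swapped, a nanosecond counter could silently lose or gain its upper half, which fits the error of up to about 4 seconds (2^32 ns) mentioned above.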
     

28 Sep, 2009

2 commits


27 Sep, 2009

3 commits


25 Sep, 2009

5 commits

  • Conflicts:
    drivers/staging/Kconfig
    drivers/staging/Makefile
    drivers/staging/cpc-usb/TODO
    drivers/staging/cpc-usb/cpc-usb_drv.c
    drivers/staging/cpc-usb/cpc.h
    drivers/staging/cpc-usb/cpc_int.h
    drivers/staging/cpc-usb/cpcusb.h

    David S. Miller
     
  • git commit 75c5158f70c065b9 converted the clocksource spinlock to a
    mutex. This causes the following BUG:

    BUG: sleeping function called from invalid context at kernel/mutex.c:280
    in_atomic(): 0, irqs_disabled(): 1, pid: 2473, name: pm-suspend
    2 locks held by pm-suspend/2473:
    #0: (&buffer->mutex){......}, at: [] sysfs_write_file+0x3c/0x137
    #1: (pm_mutex){......}, at: [] enter_state+0x39/0x130
    Pid: 2473, comm: pm-suspend Not tainted 2.6.31 #1
    Call Trace:
    [] ? __debug_show_held_locks+0x22/0x24
    [] __might_sleep+0x107/0x10b
    [] mutex_lock_nested+0x25/0x43
    [] clocksource_resume+0x1c/0x60
    [] timekeeping_resume+0x1e/0x1c8
    [] __sysdev_resume+0x25/0xcf
    [] sysdev_resume+0x6d/0xae
    [] suspend_devices_and_enter+0x12b/0x1af
    [] enter_state+0xdf/0x130
    [] state_store+0xb6/0xd3
    [] kobj_attr_store+0x17/0x19
    [] sysfs_write_file+0xfb/0x137
    [] vfs_write+0xae/0x10b
    [] ? __up_read+0x1a/0x7f
    [] sys_write+0x4a/0x6e
    [] system_call_fastpath+0x16/0x1b

    clocksource_resume is called early in the resume process, there is
    only one cpu, no processes are running and the interrupts are
    disabled. It is therefore possible to resume the clocksources
    without taking the clocksource mutex.

    Reported-by: Xiaotian Feng
    Signed-off-by: Martin Schwidefsky
    Tested-by: Michal Schmidt
    Cc: Xiaotian Feng
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Martin Schwidefsky
     
  • The memory barrier semantics of futex_wait_queue_me() are
    non-obvious. Add some commentary to try and clarify it.

    Signed-off-by: Darren Hart
    Cc: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: Dinakar Guniguntala
    Cc: John Stultz
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Darren Hart
     
  • * 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: (24 commits)
    microblaze: Disable heartbeat/enable emaclite in defconfigs
    microblaze: Support simpleImage.dts make target
    microblaze: Fix _start symbol to physical address
    microblaze: Use LOAD_OFFSET macro to get correct LMA for all sections
    microblaze: Create the LOAD_OFFSET macro used to compute VMA vs LMA offsets
    microblaze: Copy ppc asm-compat.h for clean handling of constants in asm and C
    microblaze: Actually show KiB rather than pages in "Freeing initrd memory:"
    microblaze: Support ptrace syscall tracing.
    microblaze: Updated CPU version and FPGA family codes in PVR
    microblaze: Generate correct signal and siginfo for integer div-by-zero
    microblaze: Don't be noisy when userspace causes hardware exceptions
    microblaze: Remove ipc.h file which points to non-existing asm-generic file
    microblaze: Clear sticky FSR register after generating exception signals
    microblaze: Ensure CPU usermode is set on new userspace processes
    microblaze: Use correct kbuild variable KBUILD_CFLAGS
    microblaze: Save and restore msr in hw exception
    microblaze: Add architectural support for USB EHCI host controllers
    microblaze: Implement include/asm/syscall.h.
    microblaze: Improve checking mechanism for MSR instruction
    microblaze: Add checking mechanism for MSR instruction
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    module: don't call percpu_modfree on NULL pointer.
    module: fix memory leak when load fails after srcversion/version allocated
    module: preferred way to use MODULE_AUTHOR
    param: allow whitespace as kernel parameter separator
    module: reduce string table for loaded modules (v2)
    module: reduce symbol table for loaded modules (v2)

    Linus Torvalds
     

24 Sep, 2009

21 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
    lsm: Use a compressed IPv6 string format in audit events
    Audit: send signal info if selinux is disabled
    Audit: rearrange audit_context to save 16 bytes per struct
    Audit: reorganize struct audit_watch to save 8 bytes

    Linus Torvalds
     
  • The general one handles NULL, the static obsolescent
    (CONFIG_HAVE_LEGACY_PER_CPU_AREA) one in module.c doesn't; Eric's
    commit 720eba31 assumed it did, and various frobbings since then kept
    that assumption.

    All other callers in module.c protect it with an if; this effectively
    does the same, as free_init is only reached by goto if percpu_modalloc()
    fails.

    Reported-by: Kamalesh Babulal
    Signed-off-by: Rusty Russell
    Cc: Eric Dumazet
    Cc: Masami Hiramatsu
    Cc: Américo Wang
    Tested-by: Kamalesh Babulal

    Rusty Russell
     
  • Normally the twisty paths of sysfs will free the attributes, but not if
    we fail before we hook it into sysfs (which is the last thing we do in
    load_module).

    (This sysfs code is a turd, no doubt there are other issues lurking too).

    Reported-by: Tetsuo Handa
    Signed-off-by: Rusty Russell
    Cc: Catalin Marinas
    Tested-by: Tetsuo Handa

    Rusty Russell
     
  • Some boot mechanisms require that kernel parameters are stored in a
    separate file which is loaded to memory without further processing
    (e.g. the "Load from FTP" method on s390). When such a file contains
    newline characters, the kernel parameter preceding the newline might
    not be correctly parsed (due to the newline being stuck to the end of
    the actual parameter value) which can lead to boot failures.

    This patch improves kernel command line usability in such a situation
    by allowing generic whitespace characters as separators between kernel
    parameters.

    Signed-off-by: Peter Oberparleiter
    Signed-off-by: Rusty Russell

    Peter Oberparleiter
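
Treating any whitespace character as a separator can be illustrated with a small tokenizer. This is a simplified sketch, not the kernel's parser: `split_params()` is a hypothetical helper, and quoted values are deliberately not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Split `line` in place into parameters separated by any whitespace
 * (space, tab, newline, ...). Stores up to `max` pointers in argv and
 * returns how many were found. Quoting is deliberately not handled.
 */
static size_t split_params(char *line, char **argv, size_t max)
{
        size_t n = 0;
        char *p = line;

        while (n < max) {
                /* Skip any run of separators before the next parameter. */
                while (isspace((unsigned char)*p))
                        p++;
                if (*p == '\0')
                        break;
                argv[n++] = p;
                /* Advance to the end of this parameter and terminate it. */
                while (*p != '\0' && !isspace((unsigned char)*p))
                        p++;
                if (*p != '\0')
                        *p++ = '\0';
        }
        return n;
}
```

With this rule a trailing newline from a parameter file no longer sticks to the last value, which is the failure mode described in the commit message.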
     
  • Also remove all parts of the string table (referenced by the symbol
    table) that are not needed for kallsyms use (i.e. which were only
    referenced by symbols discarded by the previous patch, or not
    referenced at all for whatever reason).

    Signed-off-by: Jan Beulich
    Signed-off-by: Rusty Russell

    Jan Beulich
     
  • Discard all symbols not interesting for kallsyms use: absolute,
    section, and in the common case (!KALLSYMS_ALL) data ones.

    Signed-off-by: Jan Beulich
    Signed-off-by: Rusty Russell

    Jan Beulich
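
The filter described in this commit can be approximated with the definitions from `<elf.h>`. This is a simplified sketch under assumptions: `keep_symbol()` is a hypothetical helper, "data symbol" is modelled as "lives in a non-executable section", and `keep_data` stands in for the KALLSYMS_ALL case.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>

/*
 * Decide whether a symbol is worth keeping for symbol lookup.
 * sec_flags are the sh_flags of the section the symbol lives in;
 * keep_data models the KALLSYMS_ALL case. Simplified sketch.
 */
static bool keep_symbol(const Elf64_Sym *sym, Elf64_Xword sec_flags,
                        bool keep_data)
{
        if (sym->st_shndx == SHN_ABS)
                return false;                   /* absolute symbol */
        if (ELF64_ST_TYPE(sym->st_info) == STT_SECTION)
                return false;                   /* section symbol */
        if (!keep_data && !(sec_flags & SHF_EXECINSTR))
                return false;                   /* data symbol */
        return true;
}
```

A loader would run such a predicate over the module's symbol table and copy only the surviving entries into the table it keeps after loading.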
     
  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
    HWPOISON: Enable error_remove_page on btrfs
    HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
    HWPOISON: Add madvise() based injector for hardware poisoned pages v4
    HWPOISON: Enable error_remove_page for NFS
    HWPOISON: Enable .remove_error_page for migration aware file systems
    HWPOISON: The high level memory error handler in the VM v7
    HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
    HWPOISON: shmem: call set_page_dirty() with locked page
    HWPOISON: Define a new error_remove_page address space op for async truncation
    HWPOISON: Add invalidate_inode_page
    HWPOISON: Refactor truncate to allow direct truncating of page v2
    HWPOISON: check and isolate corrupted free pages v2
    HWPOISON: Handle hardware poisoned pages in try_to_unmap
    HWPOISON: Use bitmask/action code for try_to_unmap behaviour
    HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
    HWPOISON: Add poison check to page fault handling
    HWPOISON: Add basic support for poisoned pages in fault handler v3
    HWPOISON: Add new SIGBUS error codes for hardware poison signals
    HWPOISON: Add support for poison swap entries v2
    HWPOISON: Export some rmap vma locking to outside world
    ...

    Linus Torvalds
     
  • Because the binfmt is not different between threads in the same process,
    it can be moved from task_struct to mm_struct. And binfmt module is
    handled per mm_struct instead of task_struct.

    Signed-off-by: Hiroshi Shimamoto
    Acked-by: Oleg Nesterov
    Cc: Rusty Russell
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hiroshi Shimamoto
     
  • ->ioctx_lock and ->ioctx_list are used only under CONFIG_AIO.

    Signed-off-by: Alexey Dobriyan
    Cc: Zach Brown
    Cc: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • CLONE_PARENT was used to implement an older threading model. For
    consistency with the CLONE_THREAD check in copy_pid_ns(), disable
    CLONE_PARENT with CLONE_NEWPID, at least until the required semantics of
    pid namespaces are clear.

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Roland McGrath
    Acked-by: Serge Hallyn
    Cc: Oren Laadan
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
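
The restriction amounts to rejecting one combination of clone flags. A minimal sketch of such a check follows; `check_clone_flags()` is a hypothetical helper, and returning `-EINVAL` is an assumption about how the rejection is reported.

```c
#include <assert.h>
#include <errno.h>
#include <linux/sched.h>   /* CLONE_PARENT, CLONE_NEWPID */

/*
 * Return 0 if the flag combination is acceptable, or -EINVAL when
 * CLONE_PARENT is requested together with CLONE_NEWPID.
 */
static int check_clone_flags(unsigned long flags)
{
        if ((flags & CLONE_PARENT) && (flags & CLONE_NEWPID))
                return -EINVAL;
        return 0;
}
```

Either flag on its own passes the check; only the combination is refused.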
     
  • When global or container-init processes use CLONE_PARENT, they create a
    multi-rooted process tree. Besides, siblings of global init remain as
    zombies on exit since they are not reaped by their parent (swapper). So
    prevent global and container-inits from creating siblings.

    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Eric W. Biederman
    Acked-by: Roland McGrath
    Cc: Oren Laadan
    Cc: Oleg Nesterov
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places at arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • __fatal_signal_pending inlines to one instruction on x86, probably two
    instructions on other machines. It takes two longer x86 instructions just
    to call it and test its return value, not to mention the function itself.

    On my random x86_64 config, this saved 70 bytes of text (59 of those being
    __fatal_signal_pending itself).

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Introduce do_send_sig_info() and convert group_send_sig_info(),
    send_sig_info(), do_send_specific() to use this helper.

    Hopefully it will have more users soon, it allows to specify
    specific/group behaviour via "bool group" argument.

    Shaves 80 bytes from .text.

    Signed-off-by: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: stephane eranian
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Introduce core pipe limiting sysctl.

    Since we can dump cores to pipe, rather than directly to the filesystem,
    we create a condition in which a user can create a very high load on the
    system simply by running bad applications.

    If the pipe reader specified in core_pattern is poorly written, we can
    have lots of outstanding resources and processes in the system.

    This sysctl introduces an ability to limit that resource consumption.
    core_pipe_limit defines how many in-flight dumps may be run in parallel,
    dumps beyond this value are skipped and a note is made in the kernel log.
    A special value of 0 in core_pipe_limit denotes unlimited core dumps may
    be handled (this is the default value).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Neil Horman
    Reported-by: Earl Chew
    Cc: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
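
The limiting rule described above (count in-flight dumps, skip those beyond the limit, treat 0 as unlimited) can be modelled with an atomic counter. This is a user-space sketch, not the kernel code: `dumps_in_flight`, `pipe_limit`, `dump_begin()` and `dump_end()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint dumps_in_flight;   /* dumps currently being piped */
static unsigned int pipe_limit;       /* 0 means unlimited (the default) */

/*
 * Try to start a dump. Returns false when the dump should be skipped
 * because pipe_limit dumps are already in flight.
 */
static bool dump_begin(void)
{
        unsigned int count = atomic_fetch_add(&dumps_in_flight, 1) + 1;

        if (pipe_limit && count > pipe_limit) {
                atomic_fetch_sub(&dumps_in_flight, 1);   /* undo our slot */
                return false;   /* a real implementation would log this */
        }
        return true;
}

/* Mark a dump started with dump_begin() as finished. */
static void dump_end(void)
{
        atomic_fetch_sub(&dumps_in_flight, 1);
}
```

Incrementing first and backing out on failure keeps the check race-free without a lock, at the cost of the counter briefly exceeding the limit.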
     
  • This changes tracehook_notify_jctl() so it's called with the siglock held,
    and changes its argument and return value definition. These clean-ups
    make it a better fit for what new tracing hooks need to check.

    Tracing needs the siglock here, held from the time TASK_STOPPED was set,
    to avoid potential SIGCONT races if it wants to allow any blocking in its
    tracing hooks.

    This also folds the finish_stop() function into its caller
    do_signal_stop(). The function is short, called only once and only
    unconditionally. It aids readability to fold it in.

    [oleg@redhat.com: do not call tracehook_notify_jctl() in TASK_STOPPED state]
    [oleg@redhat.com: introduce tracehook_finish_jctl() helper]
    Signed-off-by: Roland McGrath
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • Current behaviour of sys_waitid() looks odd. If user passes infop ==
    NULL, sys_waitid() returns success. When user additionally specifies flag
    WNOWAIT, sys_waitid() returns -EFAULT on the same conditions. When user
    combines WNOWAIT with WCONTINUED, sys_waitid() again returns success.

    This patch adds check for ->wo_info in wait_noreap_copyout().

    User-visible change: starting from this commit, sys_waitid() always checks
    infop != NULL and does not fail if it is NULL.

    Signed-off-by: Vitaly Mayatskikh
    Reviewed-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Mayatskikh
     
  • do_wait() checks ->wo_info to figure out who is the caller. If it's not
    NULL the caller should be sys_waitid(), in that case do_wait() fixes up
    the retval or zeros ->wo_info, depending on retval from underlying
    function.

    This is a bug: user can pass ->wo_info == NULL and sys_waitid() will
    return an incorrect value.

    man 2 waitid says:

    waitid(): returns 0 on success

    Test-case:

    #include <assert.h>
    #include <stddef.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
            if (fork())
                    assert(waitid(P_ALL, 0, NULL, WEXITED) == 0);

            return 0;
    }

    Result:

    Assertion `waitid(P_ALL, 0, ((void *)0), 4) == 0' failed.

    Move that code to sys_waitid().

    User-visible change: sys_waitid() will return 0 on success, whether
    infop is set or not.

    Note, there's another bug in wait_noreap_copyout() which affects
    return value of sys_waitid(). It will be fixed in next patch.

    Signed-off-by: Vitaly Mayatskikh
    Reviewed-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Mayatskikh
     
  • Kill the unused "parent" argument in wait_consider_task(), it was never used.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Ingo Molnar
    Cc: Ratan Nalumasu
    Cc: Vitaly Mayatskikh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • task_pid_type() is only used by eligible_pid() which has to check wo_type
    != PIDTYPE_MAX anyway. Remove this check from task_pid_type() and factor
    out ->pids[type] access, this shrinks .text a bit and simplifies the code.

    This matches the behaviour of other similar helpers, say get_task_pid().
    The caller must ensure that pid_type is valid, not the callee.

    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • child_wait_callback()->eligible_child() is not right, we can miss the
    wakeup if the task was detached before __wake_up_parent() and the caller
    of do_wait() didn't use __WALL.

    Move ->wo_pid checks from eligible_child() to the new helper,
    eligible_pid(), and change child_wait_callback() to use it instead of
    eligible_child().

    Note: actually I think it would be better to fix the __WCLONE check in
    eligible_child(), it doesn't look exactly right. But it is not clear what
    is the supposed behaviour, and any change is user-visible.

    Reported-by: KAMEZAWA Hiroyuki
    Tested-by: KAMEZAWA Hiroyuki
    Signed-off-by: Oleg Nesterov
    Cc: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov