01 Jun, 2007

3 commits

  • Make timer-stats have almost zero overhead when enabled in the config but
    not used. (this way distros can enable it more easily)

    Also update the documentation about overhead of timer_stats - it was
    written for the first version which had a global lock and a linear list
    walk based lookup ;-)

    Signed-off-by: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Fix two races in the timer stats lookup code. One by ensuring that the
    initialization of a new entry is finished upon insertion of that entry.
    The other by cleaning up the hash table when the entries array is cleared,
    so that we don't have any "pre-inserted" entries.

    Thanks to Eric Dumazet for reminding me of the memory barriers.

    Signed-off-by: Bjorn Steinbrink
    Signed-off-by: Ian Kumlien
    Acked-by: Ingo Molnar
    Cc: Eric Dumazet
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Steinbrink
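
    A minimal userspace sketch of the two fixes described above, using C11
    atomics instead of the kernel's own barrier primitives. The entry layout,
    table size and function names are hypothetical, and writers are assumed
    to be serialised by a lock that is not shown here.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical entry type; not the kernel's timer-stats structure. */
struct entry {
    unsigned long key;
    unsigned long count;
    struct entry *next;
};

#define TABLE_SIZE 16

static _Atomic(struct entry *) table[TABLE_SIZE];

/*
 * Fix 1: fill in every field first, then publish the pointer with release
 * ordering, so a lock-free reader that sees the pointer also sees the
 * initialised fields.
 */
static void insert_entry(struct entry *e, unsigned long key)
{
    size_t slot = key % TABLE_SIZE;

    e->key = key;
    e->count = 1;
    e->next = atomic_load_explicit(&table[slot], memory_order_relaxed);
    atomic_store_explicit(&table[slot], e, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
static struct entry *lookup_entry(unsigned long key)
{
    struct entry *e = atomic_load_explicit(&table[key % TABLE_SIZE],
                                           memory_order_acquire);

    while (e && e->key != key)
        e = e->next;
    return e;
}

/* Fix 2: when the entries are discarded, empty the hash table as well. */
static void reset_table(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        atomic_store_explicit(&table[i], NULL, memory_order_release);
}
```

    Without the release/acquire pairing a reader could follow a freshly
    inserted pointer and still observe stale fields, and without
    reset_table() a lookup could return an entry whose storage has already
    been recycled.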
     
  • When the private futex support was added the compat code wasn't changed.
    The result is that code using the compat code can fail, e.g., because
    the timeout values are not correctly passed. The following patch should
    fix that.

    Signed-off-by: Ulrich Drepper
    Cc: Eric Dumazet
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     

31 May, 2007

1 commit


30 May, 2007

1 commit

  • get_next_timer_interrupt() returns a delta of (LONG_MAX >> 1) in case
    there is no timer pending. On 64 bit machines this results in a
    multiplication overflow in tick_nohz_stop_sched_tick().

    Reported by: Dave Miller

    Make the return value a constant and limit the return value to a 32 bit
    value.

    When the max timeout value is returned, we can safely stop the tick
    timer device. The max jiffies delta results in a 12 days timeout for
    HZ=1000.

    In the long term the get_next_timer_interrupt() code needs to be
    reworked to return ktime instead of jiffies, but we have to wait until
    the last users of the original NO_IDLE_HZ code are converted.

    Signed-off-by: Thomas Gleixner
    Acked-by: David S. Miller
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
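
    The arithmetic behind the message above can be checked with a short
    sketch. It assumes the cap is 2^30 - 1 jiffies (a value that fits in
    32 bits and matches the 12-day figure) and that the overflowing
    multiplication converts jiffies to nanoseconds; the names below are
    illustrative, not the kernel's.

```c
#include <stdint.h>

#define DEMO_HZ            1000ULL              /* tick rate from the message */
#define DEMO_MAX_DELTA     ((1ULL << 30) - 1)   /* assumed cap, fits in 32 bits */
#define DEMO_NSEC_PER_SEC  1000000000ULL
#define DEMO_NSEC_PER_JIFFY (DEMO_NSEC_PER_SEC / DEMO_HZ)

/* Convert a jiffies delta to nanoseconds (caller must rule out overflow). */
static uint64_t jiffies_to_ns(uint64_t jiffies)
{
    return jiffies * DEMO_NSEC_PER_JIFFY;
}

/* Whole days covered by a jiffies delta. */
static uint64_t jiffies_to_days(uint64_t jiffies)
{
    return jiffies / DEMO_HZ / 86400;
}

/* Returns 1 if the nanosecond value would not fit in a signed 64-bit type. */
static int ns_overflows_s64(uint64_t jiffies)
{
    return jiffies > (uint64_t)INT64_MAX / DEMO_NSEC_PER_JIFFY;
}
```

    With these assumptions a delta of LONG_MAX >> 1 on a 64 bit machine
    overflows the signed 64-bit nanosecond range, while the capped delta
    does not and corresponds to a little over 12 days.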
     

24 May, 2007

12 commits

  • With irqpoll enabled, trying to test the IRQF_IRQPOLL flag in the
    actions would cause a NULL pointer dereference if no action was
    installed (for example, the driver might have been unloaded with
    interrupts still pending).

    So be a bit more careful about testing the flag by making sure to test
    for that case.

    (The actual _change_ is trivial, the patch is more than a one-liner
    because I rewrote the testing to also be much more readable.)

    Original (discarded) bugfix by Bernhard Walle.

    Cc: Bernhard Walle
    Tested-by: Vivek Goyal
    Signed-off-by: Linus Torvalds

    Linus Torvalds
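
    A small sketch of the kind of check described above: look at the flag
    only after confirming that an action is installed. The structures and
    the flag value are stand-ins for illustration, not the kernel's
    definitions.

```c
#include <stddef.h>

#define DEMO_IRQF_IRQPOLL 0x1000u   /* hypothetical flag value */

/* Stand-in for an installed interrupt handler. */
struct demo_action {
    unsigned int flags;
};

/* Stand-in for an interrupt descriptor; action is NULL when nothing is installed. */
struct demo_desc {
    struct demo_action *action;
};

/* Returns 1 only if an action exists and has the flag set. */
static int action_has_flag(const struct demo_desc *desc, unsigned int flag)
{
    const struct demo_action *action = desc->action;

    if (!action)
        return 0;   /* e.g. driver unloaded with interrupts still pending */
    return (action->flags & flag) != 0;
}
```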
     
  • The NOHZ patch contains a check for softirqs pending when a CPU goes idle.
    The BUG is unrelated to NOHZ, it just was made visible by the NOHZ patch.
    The BUG showed up mainly on P4 / hyperthreading enabled machines which led
    the investigations into the wrong direction in the first place. The real
    cause is in cond_resched_softirq():

    cond_resched_softirq() is enabling softirqs without invoking the softirq
    daemon when softirqs are pending. This leads to the warning message in the
    NOHZ idle code:

    t1 runs softirq disabled code on CPU#0
    interrupt happens, softirq is raised, but deferred (softirqs disabled)
    t1 calls cond_resched_softirq()
    enables softirqs via _local_bh_enable()
    calls schedule()
    t2 runs
    t1 is migrated to CPU#1
    t2 is done and invokes idle()
    NOHZ detects the pending softirq

    Fix: change _local_bh_enable() to local_bh_enable() so the softirq
    daemon is invoked.

    Thanks to Anant Nitya for debugging this with great patience !

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Fix sizeof(PAGE_SIZE) typo. It should be just PAGE_SIZE for zeroing the
    swsusp_header.

    Signed-off-by: OGAWA Hirofumi
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
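
    Why the typo matters can be seen in a short userspace sketch:
    sizeof() applied to a size constant yields the size of the constant's
    type, not its value. The page size used here is an assumption (the real
    value is architecture dependent) and the helper name is made up.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096UL   /* assumed page size for the illustration */

/* Fill a page with 0xff, zero the first len bytes, and count the zeroed prefix. */
static size_t count_zeroed(size_t len)
{
    static unsigned char page[DEMO_PAGE_SIZE];
    size_t n = 0;

    memset(page, 0xff, sizeof(page));
    memset(page, 0, len);
    while (n < sizeof(page) && page[n] == 0)
        n++;
    return n;
}
```

    count_zeroed(sizeof(DEMO_PAGE_SIZE)) clears only sizeof(unsigned long)
    bytes, whereas count_zeroed(DEMO_PAGE_SIZE) clears the whole page.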
     
  • cleanup_workqueue_thread() and cwq_should_stop() are overcomplicated.

    Convert the code to use kthread_should_stop/kthread_stop as was
    suggested by Gautham and Srivatsa.

    In particular this patch removes the (unlikely) busy-wait loop from the
    exit path; it was a temporary and ugly kludge (if not a bug).

    Note: the current code was designed to solve another old problem:
    work->func can't share locks with hotplug callbacks. I think this could
    be done, see

    http://marc.info/?l=linux-kernel&m=116905366428633

    but this needs some more complications to preserve CPU affinity of
    cwq->thread during cpu_up(). A freezer-based hotplug looks more
    appealing.

    [akpm@linux-foundation.org: make it more tolerant of gcc borkenness]
    Signed-off-by: Oleg Nesterov
    Cc: Zilvinas Valinskas
    Cc: Gautham R Shenoy
    Cc: Srivatsa Vaddagiri
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Steve Hawkes discovered a problem where recalc_sigpending_tsk was called in
    do_sigaction but no signal_wake_up call was made, preventing later signals
    from waking up blocked threads with TIF_SIGPENDING already set.

    In fact, the few other calls to recalc_sigpending_tsk outside the signals
    code are also subject to this problem in other race conditions.

    This change makes recalc_sigpending_tsk private to the signals code. It
    changes the outside calls, as well as do_sigaction, to use the new
    recalc_sigpending_and_wake instead.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
     
  • The warning in the NOHZ code, which triggers when a CPU goes idle with
    softirqs pending, can fill up the logs quite quickly. Rate limit the output
    until we find the root cause of that problem.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Booting a SMP kernel with maxcpus=1 on a SMP system leads to a hard hang,
    because ACPI ignores the maxcpus setting and sends timer broadcast info for
    the offline CPUs. This results in a call to smp_call_function_single() on
    an offline CPU that is stuck forever.

    Ignore the bogus information and print a kernel error to remind ACPI
    folks to fix it.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Other than refrigerator, no one else calls frozen_process(). So move it from
    include/linux/freezer.h to kernel/power/process.c.

    Also, since a task can be marked as frozen by itself, we don't need to pass
    the (struct task_struct *p) parameter to frozen_process().

    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Rafael J. Wysocki
    Cc: Oleg Nesterov
    Cc: Pavel Machek
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gautham R Shenoy
     
  • kthread() sleeps in TASK_INTERRUPTIBLE state waiting for the first wakeup. In
    theory, this wakeup may come from freeze_process()->signal_wake_up(), so the
    task can disappear even before kthread_create() sets its ->comm.

    Change kthread() to use TASK_UNINTERRUPTIBLE.

    [akpm@linux-foundation.org: s/BUG_ON/WARN_ON+recover]
    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Kernel threads can become userland processes by calling kernel_execve().

    In particular, this may happen right after the try_to_freeze_tasks()
    called with FREEZER_USER_SPACE has returned, so try_to_freeze_tasks()
    needs to take userspace processes into consideration even if it is
    called with FREEZER_KERNEL_THREADS.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Currently try_to_freeze_tasks() has to wait until all of the vforked processes
    exit and for this reason every user can make it fail. To fix this problem we
    can introduce the additional process flag PF_FREEZER_SKIP to be used by tasks
    that do not want to be counted as freezable by the freezer and want to have
    TIF_FREEZE set nevertheless. Then, this flag can be set by tasks using
    sys_vfork() before they call wait_for_completion(&vfork) and cleared after
    they have woken up. After clearing it, the tasks should call try_to_freeze()
    as soon as possible.

    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Cc: Pavel Machek
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • If the freezing of tasks fails and a task is preempted in refrigerator()
    before calling frozen_process(), then thaw_tasks() may run before this task is
    frozen. In that case the task will freeze and no one will thaw it.

    To fix this race we can call freezing(current) in refrigerator() along with
    frozen_process(current) under the task_lock() which also should be taken in
    the error path of try_to_freeze_tasks() as well as in thaw_process().
    Moreover, if thaw_process() additionally clears TIF_FREEZE for tasks that are
    not frozen, we can be sure that all tasks are thawed and there are no pending
    "freeze" requests after thaw_tasks() has run.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Gautham R Shenoy
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

22 May, 2007

1 commit

  • The first thing mm.h does is include sched.h, solely for the can_do_mlock()
    inline function, which dereferences "current". By dealing with
    can_do_mlock(), mm.h can be detached from sched.h, which is good. See below
    for why.

    This patch
    a) removes unconditional inclusion of sched.h from mm.h
    b) makes can_do_mlock() a normal function in mm/mlock.c
    c) exports can_do_mlock() to not break compilation
    d) adds sched.h inclusions back to files that were getting it indirectly.
    e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
    getting them indirectly

    Net result is:
    a) mm.h users would get less code to open, read, preprocess, parse, ... if
    they don't need sched.h
    b) sched.h stops being a dependency for a significant number of files:
    on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
    after patch it's only 3744 (-8.3%).

    Cross-compile tested on

    all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
    alpha alpha-up
    arm
    i386 i386-up i386-defconfig i386-allnoconfig
    ia64 ia64-up
    m68k
    mips
    parisc parisc-up
    powerpc powerpc-up
    s390 s390-up
    sparc sparc-up
    sparc64 sparc64-up
    um-x86_64
    x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

    as well as my two usual configs.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

17 May, 2007

4 commits

  • The sysfs files /sys/power/disk and /sys/power/state do not work as
    documented, since they allow the user to write only a few initial
    characters of the input string to trigger the option (eg. 'echo pl >
    /sys/power/disk' activates the platform mode of hibernation). Fix it.

    Special thanks to Peter Moulder for
    pointing out the problem.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
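
    The prefix problem described above can be sketched in userspace as
    follows. This is an illustration under assumptions, not the kernel
    code: the mode table is only an example (just "platform" appears in the
    message) and both function names are invented.

```c
#include <stddef.h>
#include <string.h>

/* Example mode names; only "platform" is taken from the message above. */
static const char *const modes[] = { "platform", "shutdown", "reboot" };
#define NR_MODES (sizeof(modes) / sizeof(modes[0]))

/* Length of the input up to, but not including, a trailing newline. */
static size_t input_len(const char *buf, size_t n)
{
    const char *nl = memchr(buf, '\n', n);

    return nl ? (size_t)(nl - buf) : n;
}

/* Problematic variant: compares only as many characters as the user wrote. */
static int match_prefix(const char *buf, size_t n)
{
    size_t len = input_len(buf, n);

    for (size_t i = 0; i < NR_MODES; i++)
        if (!strncmp(buf, modes[i], len))
            return (int)i;
    return -1;
}

/* Stricter variant: the input must have the same length as the mode name. */
static int match_exact(const char *buf, size_t n)
{
    size_t len = input_len(buf, n);

    for (size_t i = 0; i < NR_MODES; i++)
        if (strlen(modes[i]) == len && !strncmp(buf, modes[i], len))
            return (int)i;
    return -1;
}
```

    With the first variant an input of "pl" selects "platform"; with the
    second only the full string does.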
     
  • Make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core
    filename size and change it to 128, so that extensive patterns such as
    '/local/cores/%e-%h-%s-%t-%p.core' won't result in truncated filename
    generation.

    Signed-off-by: Dan Aloni
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Aloni
     
  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • In commit e3c7db621bed4afb8e231cb005057f2feb5db557 we fixed the resume
    ordering, so that the ACPI low-level resume code was called before the
    actual driver resume was called. However, that broke the nesting logic
    of suspend and resume, and we continued to suspend the devices _after_
    the ACPI device suspend code was called.

    That resulted in us saving PCI state for devices that had already been
    changed by ACPI, and in some cases disabled entirely (causing the PCI
    save_state to be all-ones). Which in turn caused the wrong state to be
    written back on resume.

    This moves the ACPI device suspend to after the device model per-device
    suspend() calls. This fixes the bogus state save.

    Thanks to Lukáš Hejtmánek for testing.

    Acked-by: Lukas Hejtmanek
    Acked-by: Rafael J. Wysocki
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: Andrew Morton
    Cc: Greg KH
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

16 May, 2007

1 commit


15 May, 2007

2 commits

  • lockdep complains about the lock nesting of clocksource and watchdog lock
    in the resume path.

    Change the resume marker to a bit operation and remove the lock from this
    path.

    Signed-off-by: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • The time keeping code move to kernel/time/timekeeping.c broke the
    clocksource resume logic patch, which got applied to the old file by a
    fuzzy application. Fix it up and move the clocksource_resume() call to
    the appropriate place.

    Signed-off-by: Thomas Gleixner
    [ tssk, tssk, everybody should use --fuzz=0 ]
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

13 May, 2007

1 commit


12 May, 2007

3 commits

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] Quicklist support for IA64
    [IA64] fix Kprobes reentrancy
    [IA64] SN: validate smp_affinity mask on intr redirect
    [IA64] drivers/char/snsc_event.c:206: warning: unused variable `p'
    [IA64] mca.c:121: warning: 'cpe_poll_timer' defined but not used
    [IA64] Fix - Section mismatch: reference to .init.data:mvec_name
    [IA64] more warning cleanups
    [IA64] Wire up epoll_pwait and utimensat
    [IA64] Fix warnings resulting from type-checking in dev_dbg()
    [IA64] typo s/kenrel/kernel/

    Linus Torvalds
     
  • * 'audit.b38' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
    [PATCH] Abnormal End of Processes
    [PATCH] match audit name data
    [PATCH] complete message queue auditing
    [PATCH] audit inode for all xattr syscalls
    [PATCH] initialize name osid
    [PATCH] audit signal recipients
    [PATCH] add SIGNAL syscall class (v3)
    [PATCH] auditing ptrace

    Linus Torvalds
     
  • On SN, only allow one bit to be set in the smp_affinity mask when
    redirecting an interrupt. Currently setting multiple bits is allowed, but
    only the first bit is used in determining the CPU to redirect to. This has
    caused confusion among some customers.

    [akpm@linux-foundation.org: fixes]
    Signed-off-by: John Keller
    Signed-off-by: Andrew Morton
    Signed-off-by: Tony Luck

    John Keller
     

11 May, 2007

11 commits

  • This is a very simple and light file descriptor that can be used for event
    wait/dispatch by userspace (both wait and dispatch) and by the kernel
    (dispatch only). It can be used instead of pipe(2) in all cases where a pipe
    would simply be used to signal events. Its kernel overhead is much lower
    than that of a pipe, and it does not consume two fds. When used in the
    kernel, it can offer an fd-bridge to enable, for example, functionalities
    like KAIO or syslets/threadlets to signal to an fd the completion of certain
    operations. More generally, an eventfd can be used by the kernel to signal
    readiness, in a POSIX poll/select way, of interfaces that would otherwise be
    incompatible with it. The API is:

    int eventfd(unsigned int count);

    The eventfd API accepts an initial "count" parameter, and returns an eventfd
    fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).

    The POLLIN flag is raised when the internal counter is greater than zero.

    The POLLOUT flag is raised when at least a value of "1" can be written to the
    internal counter.

    The POLLERR flag is raised when an overflow in the counter value is detected.

    The write(2) operation can never overflow the counter, since it blocks (unless
    O_NONBLOCK is set, in which case -EAGAIN is returned).

    But the eventfd_signal() function can do it, since it's supposed to not sleep
    during its operation.

    The read(2) function reads the __u64 counter value, and resets the internal
    value to zero. If the value read is equal to (__u64) -1, an overflow happened
    on the internal counter (due to 2^64 eventfd_signal() posts that have never
    been retired; unlikely, but possible).

    The write(2) call writes an __u64 count value, and adds it to the current
    counter. The eventfd fd supports O_NONBLOCK also.

    On the kernel side, we have:

    struct file *eventfd_fget(int fd);
    int eventfd_signal(struct file *file, unsigned int n);

    The eventfd_fget() should be called to get a struct file* from an eventfd fd
    (this is an fget() + check of f_op being an eventfd fops pointer).

    The kernel can then call eventfd_signal() every time it wants to post an event
    to userspace. The eventfd_signal() function can be called from any context.
    An eventfd() simple test and bench is available here:

    http://www.xmailserver.org/eventfd-bench.c

    This is the eventfd-based version of pipetest-4 (pipe(2) based):

    http://www.xmailserver.org/pipetest-4.c

    Not that performance matters much in the eventfd case, but eventfd-bench
    shows almost double the performance of pipetest-4.

    [akpm@linux-foundation.org: fix i386 build]
    [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
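
    A minimal userspace sketch of the read/write counter semantics described
    above. Note the assumption: it uses the eventfd() wrapper from
    <sys/eventfd.h> in current glibc, which takes an additional flags
    argument compared with the one-argument form in this message. The
    helper name is made up.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an eventfd with an initial count, add to it with write(2), then
 * read(2) the accumulated value back. Returns the value read, or -1 on error.
 * The sum of initial and add should be non-zero, otherwise read(2) blocks.
 */
static int64_t eventfd_roundtrip(unsigned int initial, uint64_t add)
{
    uint64_t value = 0;
    int fd = eventfd(initial, 0);   /* second argument: flags (glibc wrapper) */

    if (fd < 0)
        return -1;

    /* write(2) adds the 8-byte value to the internal counter. */
    if (write(fd, &add, sizeof(add)) != (ssize_t)sizeof(add)) {
        close(fd);
        return -1;
    }

    /* read(2) returns the counter and resets it to zero. */
    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value)) {
        close(fd);
        return -1;
    }

    close(fd);
    return (int64_t)value;
}
```

    For example, eventfd_roundtrip(2, 5) should yield 7 on a kernel with
    eventfd support: the initial count plus the written value.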
     
  • This patch implements the necessary compat code for the timerfd system call.

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch introduces a new system call for timer events delivered through
    file descriptors. This allows timer events to be used with standard POSIX
    poll(2), select(2) and read(2). As a consequence of supporting the Linux
    f_op->poll subsystem, they can be used with epoll(2) too.

    The system call is defined as:

    int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

    The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
    w/out going through the close/open cycle (same as signalfd). If "ufd" is -1,
    a new file descriptor will be created, otherwise the existing "ufd" will be
    re-programmed.

    The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time
    specified in the "utmr->it_value" parameter is the expiry time for the timer.

    If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
    otherwise it's a relative time.

    If the time specified in "utmr->it_interval" is not zero (zero meaning
    tv_sec == 0 and tv_nsec == 0), this is the period at which the following
    ticks should be generated.

    The "utmr->it_interval" should be set to zero if only one tick is requested.
    Setting the "utmr->it_value" to zero will disable the timer, or will create a
    timerfd without the timer enabled.

    The function returns the new (or same, in case "ufd" is a valid timerfd
    descriptor) file descriptor, or -1 in case of error.

    As stated before, the timerfd file descriptor supports poll(2), select(2) and
    epoll(2). When a timer event happened on the timerfd, a POLLIN mask will be
    returned.

    The read(2) call can be used, and it will return a u32 variable holding the
    number of "ticks" that happened on the interface since the last call to
    read(2). The read(2) call supports the O_NONBLOCK flag too, and EAGAIN will
    be returned if no ticks happened.

    A quick test program, shows timerfd working correctly on my amd64 box:

    http://www.xmailserver.org/timerfd-test.c

    [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
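
    A minimal userspace sketch of arming a timer and reading the tick count.
    Note the assumptions: it uses timerfd_create() and timerfd_settime()
    from <sys/timerfd.h> in current glibc, not the single timerfd() call
    described in this message, and with that interface read(2) returns a
    64-bit count instead of a u32. The helper name is made up.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <sys/timerfd.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Arm a relative CLOCK_MONOTONIC timer and block in read(2) until it fires.
 * first_ms must be greater than zero (a zero it_value leaves the timer
 * disabled); period_ms of zero requests a single tick.
 * Returns the number of ticks reported, or -1 on error.
 */
static int64_t wait_for_ticks(long first_ms, long period_ms)
{
    struct itimerspec its = {
        .it_value    = { .tv_sec = first_ms / 1000,
                         .tv_nsec = (first_ms % 1000) * 1000000L },
        .it_interval = { .tv_sec = period_ms / 1000,
                         .tv_nsec = (period_ms % 1000) * 1000000L },
    };
    uint64_t ticks = 0;
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);

    if (fd < 0)
        return -1;

    if (timerfd_settime(fd, 0, &its, NULL) < 0 ||
        read(fd, &ticks, sizeof(ticks)) != (ssize_t)sizeof(ticks)) {
        close(fd);
        return -1;
    }

    close(fd);
    return (int64_t)ticks;
}
```

    Because the descriptor is an ordinary fd, the blocking read(2) here
    could be replaced by poll(2), select(2) or epoll(2) waiting for POLLIN.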
     
  • This patch series implements the new signalfd() system call.

    I took part of the original Linus code (and you know how badly it can be
    broken :), and I added even more breakage ;) Signals are fetched from the same
    signal queue used by the process, so signalfd will compete with standard
    kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
    the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
    seems to be working fine on my Dual Opteron machine. I made a quick test
    program for it:

    http://www.xmailserver.org/signafd-test.c

    The signalfd() system call implements signal delivery into a file descriptor
    receiver. The signalfd file descriptor is created with the following API:

    int signalfd(int ufd, const sigset_t *mask, size_t masksize);

    The "ufd" parameter allows changing an existing signalfd sigmask, w/out going
    through a close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand
    new signalfd file.

    The "mask" specifies the signal mask of signals that we are interested
    in. The "masksize" parameter is the size of "mask".

    The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
    will return POLLIN when signals are available to be dequeued. As a direct
    consequence of supporting the Linux poll subsystem, the signalfd fd can be
    used together with epoll(2) too.

    The read(2) system call will return a "struct signalfd_siginfo" structure in
    the userspace supplied buffer. The return value is the number of bytes copied
    in the supplied buffer, or -1 in case of error. The read(2) call can also
    return 0, in case the sighand structure to which the signalfd was attached,
    has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
    return -EAGAIN in case no signal is available.

    If the size of the buffer passed to read(2) is lower than sizeof(struct
    signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
    return -ERESTARTSYS in case a signal hits the process. The format of the
    struct signalfd_siginfo is shown below; which fields are valid depends on
    the (->code & __SI_MASK) value, in the same way as for a struct siginfo:

    struct signalfd_siginfo {
            __u32 signo;    /* si_signo */
            __s32 err;      /* si_errno */
            __s32 code;     /* si_code */
            __u32 pid;      /* si_pid */
            __u32 uid;      /* si_uid */
            __s32 fd;       /* si_fd */
            __u32 tid;      /* si_tid */
            __u32 band;     /* si_band */
            __u32 overrun;  /* si_overrun */
            __u32 trapno;   /* si_trapno */
            __s32 status;   /* si_status */
            __s32 svint;    /* si_int */
            __u64 svptr;    /* si_ptr */
            __u64 utime;    /* si_utime */
            __u64 stime;    /* si_stime */
            __u64 addr;     /* si_addr */
    };

    [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
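
    A minimal userspace sketch of the block-then-read pattern described
    above. Note the assumptions: it uses the signalfd() wrapper from
    <sys/signalfd.h> in current glibc, whose third argument is a flags
    value instead of a mask size, and whose structure fields carry an ssi_
    prefix (ssi_signo instead of signo). The helper name is made up.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Block signo, create a signalfd for it, send the signal to ourselves and
 * fetch it through read(2). Returns the signal number read, or -1 on error.
 * The signal stays blocked in the caller afterwards.
 */
static int read_one_signal(int signo)
{
    struct signalfd_siginfo info;
    sigset_t mask;
    int fd;

    sigemptyset(&mask);
    sigaddset(&mask, signo);

    /* Block normal delivery so the signal is only fetched via the fd. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;

    fd = signalfd(-1, &mask, 0);    /* -1: create a brand new signalfd */
    if (fd < 0)
        return -1;

    if (raise(signo) != 0 ||
        read(fd, &info, sizeof(info)) != (ssize_t)sizeof(info)) {
        close(fd);
        return -1;
    }

    close(fd);
    return (int)info.ssi_signo;
}
```

    If the signal were not blocked first, the default or installed handler
    could consume it before read(2) sees it, which is the competition with
    standard delivery that the message warns about.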
     
  • Use task_pgrp() and task_session() in copy_process(), and avoid find_pid()
    call when attaching the task to its process group and session.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Acked-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Modify copy_process() to take a struct pid * parameter instead of a pid_t.
    This simplifies the code a bit and also avoids having to call find_pid() to
    convert the pid_t to a struct pid.

    Changelog:
    - Fixed Badari Pulavarty's comments and passed in &init_struct_pid
    from fork_idle().
    - Fixed Eric Biederman's comments and simplified this patch and
    used a new patch to remove the likely(pid) check.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Eric Biederman
    Acked-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Statically initialize a struct pid for the swapper process (pid_t == 0) and
    attach it to init_task. This is needed so task_pid(), task_pgrp() and
    task_session() interfaces work on the swapper process also.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Cc: Eric Biederman
    Cc: Herbert Poetzl
    Acked-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • attach_pid() currently takes a pid_t and then uses find_pid() to find the
    corresponding struct pid. Sometimes we already have the struct pid. We can
    then skip find_pid() if attach_pid() were to take a struct pid parameter.

    Signed-off-by: Sukadev Bhattiprolu
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Switch to the defines for these two checks, instead of hard coding the
    values.

    [akpm@linux-foundation.org: add missing include]
    Signed-off-by: Daniel Walker
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Add a call to hard_irq_disable() to stop_machine so that we make sure IRQs are
    really disabled and not only lazy-disabled on archs like powerpc as some users
    of stop_machine() may rely on that.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Benjamin Herrenschmidt
    Acked-by: Rusty Russell
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • If CONFIG_TASK_IO_ACCOUNTING is defined, we update io accounting counters for
    each task.

    This patch permits reporting of values using the well known getrusage()
    syscall, filling ru_inblock and ru_oublock instead of null values.

    As TASK_IO_ACCOUNTING currently counts bytes, we approximate the block
    count as: nr_blocks = nr_bytes / 512

    Example of use :
    ----------------------
    After patch is applied, /usr/bin/time command can now give a good
    approximation of IO that the process had to do.

    $ /usr/bin/time grep tototo /usr/include/*
    Command exited with non-zero status 1
    0.00user 0.02system 0:02.11elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
    24288inputs+0outputs (0major+259minor)pagefaults 0swaps

    $ /usr/bin/time dd if=/dev/zero of=/tmp/testfile count=1000
    1000+0 records in
    1000+0 records out
    512000 bytes (512 kB) copied, 0.00326601 seconds, 157 MB/s
    0.00user 0.00system 0:00.00elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+3000outputs (0major+299minor)pagefaults 0swaps

    Signed-off-by: Eric Dumazet
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
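
    A minimal userspace sketch of reading the two counters with getrusage(2),
    together with the bytes-to-blocks approximation from the message above.
    The helper names are made up, and whether the counters are non-zero
    depends on the kernel having task I/O accounting enabled.

```c
#include <sys/resource.h>

/* Approximation from the message: nr_blocks = nr_bytes / 512. */
static unsigned long long bytes_to_blocks(unsigned long long nr_bytes)
{
    return nr_bytes / 512;
}

/*
 * Fetch the block input/output counts of the calling process.
 * Returns 0 on success, -1 on error.
 */
static int io_blocks(long *in_blocks, long *out_blocks)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    *in_blocks = ru.ru_inblock;     /* block input operations */
    *out_blocks = ru.ru_oublock;    /* block output operations */
    return 0;
}
```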