08 Dec, 2006

8 commits

  • Thomas Graf wrote:
    >
    > nla_nest_start() may return NULL, either rely on prepare_reply() to be
    > correct and BUG() on failure or do proper error handling for all
    > functions.

    nla_put() in taskstat.c can fail only if the 'size' argument of alloc_skb()
    was not right. This is a kernel bug, we should not hide it. So add 'BUG()'
    on error path and check for 'na == NULL'.

    > genlmsg_cancel() is only required in error paths for dumping
    > procedures.

    So we can remove 'genlmsg_cancel()' calls and 'void *reply' (saves 227 bytes).

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Graf
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Currently taskstats_user_cmd()/taskstats_exit() do:

    1) allocate stats
    2) fill stats
    3) make a temporary copy on stack (236 bytes)
    4) copy that copy to skb
    5) free stats

    With the help of nla_reserve() we can operate on skb->data directly,
    thus avoiding all these steps except 2).

    So, before this patch:

    // copy *stats to skb->data
    int mk_reply(skb, ..., struct taskstats *stats);

    fill_pid(stats);
    mk_reply(skb, ..., stats);

    After:
    // return a pointer to skb->data
    struct taskstats *mk_reply(skb, ...);

    stat = mk_reply(skb, ...);
    fill_pid(stats);

    Shrinks taskatsks.o by 162 bytes.

    A stupid benchmark (send one million TASKSTATS_CMD_ATTR_PID) shows the

    real user sys
    before:
    4.02 0.06 3.96
    4.02 0.04 3.98
    4.02 0.04 3.97
    after:
    3.86 0.08 3.78
    3.88 0.10 3.77
    3.89 0.09 3.80

    but this looks suspiciously good.

    Signed-off-by: Oleg Nesterov
    Acked-by: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Cc: Thomas Graf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Introduce mk_reply() helper which does all nla_put()s on reply.

    Saves 453 bytes and a preparation for the next patch.

    Signed-off-by: Oleg Nesterov
    Acked-by: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Cc: Thomas Graf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Allocate ->signal->stats on demand in taskstats_exit(), this allows us to
    remove taskstats_tgid_alloc() (the last non-trivial inline) from taskstat's
    public interface.

    Signed-off-by: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Shailabh Nagar
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • do_exit:
    taskstats_exit_alloc()
    ...
    taskstats_exit_send()
    taskstats_exit_free()

    I think this is not good, let it be a single function exported to the core
    kernel, taskstats_exit(), which does alloc + send + free itself.

    Signed-off-by: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Shailabh Nagar
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • If there are no listeners, every task does unneeded kmem_cache alloc/free on
    exit. We don't need listeners->sem for 'if (!list_empty())' check. Yes, we may
    have a false positive, but this doesn't differ from the case when the listener
    is unregistered after we drop the semaphore. So we don't need to do allocation
    beforehand.

    Signed-off-by: Oleg Nesterov
    Cc: Balbir Singh
    Acked-by: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
    quilt add $file
    sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
    mv /tmp/$$ $file
    quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

03 Dec, 2006

3 commits


01 Nov, 2006

1 commit

  • If there are no listeners, taskstats_exit_send() just returns because
    taskstats_exit_alloc() didn't allocate *tidstats. This is wrong, each
    sub-thread should do fill_tgid_exit() on exit, otherwise its ->delays is
    not recorded in ->signal->stats and lost.

    Q: We don't send TASKSTATS_TYPE_AGGR_TGID when single-threaded process
    exits. Is it good? How can the listener figure out that it was actually a
    process exit, not sub-thread?

    Signed-off-by: Oleg Nesterov
    Cc: Balbir Singh
    Acked-by: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

30 Oct, 2006

2 commits

  • prepare_reply() adds GENL_HDRLEN to the payload (genlmsg_total_size()),
    but then it does genlmsg_put()->nlmsg_put(). This means we forget to
    reserve a room for 'struct nlmsghdr'.

    Signed-off-by: Oleg Nesterov
    Cc: Thomas Graf
    Cc: Andrew Morton
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • 'return genlmsg_cancel()' in taskstats_user_cmd/taskstats_exit_send
    potentially leaks a skb. Unless we pass 'rep_skb' to the netlink layer
    we own sk_buff. This means we should always do kfree_skb() on failure.

    [ Thomas acked and pointed out missing return value in original version ]

    Signed-off-by: Oleg Nesterov
    Acked-by: Thomas Graf
    Cc: Andrew Morton
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

29 Oct, 2006

4 commits

  • fill_tgid() should skip not only an already exited group leader. If the
    task has ->exit_state != 0 it already did exit_notify(), so it also did
    fill_tgid_exit()->delayacct_add_tsk(->signal->stats) and we should skip it
    to avoid a double accounting.

    This patch doesn't close the race completely, but it cleanups the code.

    Signed-off-by: Oleg Nesterov
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Remove tasklist_lock from taskstats.c. find_task_by_pid() is rcu-safe.
    ->siglock allows us to traverse subthread without tasklist.

    Q: delay accounting looks wrong to me. If sub-thread has already called
    taskstats_exit_send() but didn't call release_task(self) yet it will be
    accounted twice. The window is big. No?

    Signed-off-by: Oleg Nesterov
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • signal_struct is (mostly) protected by ->sighand->siglock, I think we don't
    need ->taskstats_lock to protect ->stats. This also allows us to simplify the
    locking in fill_tgid().

    Signed-off-by: Oleg Nesterov
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • 1. fill_tgid() forgets to do put_task_struct(first).

    2. release_task(first) can happen after fill_tgid() drops tasklist_lock,
    it is unsafe to dereference first->signal.

    This is a temporary fix, imho the locking should be reworked.

    Signed-off-by: Oleg Nesterov
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jay Lan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

01 Oct, 2006

3 commits

  • Add extended system accounting handling over taskstats interface. A
    CONFIG_TASK_XACCT flag is created to enable the extended accounting code.

    Signed-off-by: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jes Sorensen
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jay Lan
     
  • Add some basic accounting fields to the taskstats struct, add a new
    kernel/tsacct.c to handle basic accounting data handling upon exit. A handle
    is added to taskstats.c to invoke the basic accounting data handling.

    Signed-off-by: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jes Sorensen
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: "Michal Piotrowski"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jay Lan
     
  • The addition of the CSA patch pushed the size of struct taskstats to 256
    bytes. This exposed a problem with prepare_reply(), we were not allocating
    space for the netlink and genetlink header. It worked earlier because
    alloc_skb() would align the skb to SMP_CACHE_BYTES, which added some additonal
    bytes.

    Signed-off-by: Balbir Singh
    Cc: Jamal Hadi
    Cc: Shailabh Nagar
    Cc: Thomas Graf
    Cc: "David S. Miller"
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     

23 Sep, 2006

1 commit

  • Adds:
    nlmsg_get_pos() return current position in message
    nlmsg_trim() trim part of message
    nla_reserve_nohdr(skb, len) reserve room for an attribute w/o hdr
    nla_put_nohdr(skb, len, data) add attribute w/o hdr
    nla_find_nested() find attribute in nested attributes

    Fixes nlmsg_new() to take allocation flags and consider size.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

01 Aug, 2006

2 commits


15 Jul, 2006

5 commits

  • In send_cpu_listeners(), which is called on the exit path, a down_write()
    was protecting operations like skb_clone() and genlmsg_unicast() that do
    GFP_KERNEL allocations. If the oom-killer decides to kill tasks to satisfy
    the allocations,the exit of those tasks could block on the same semphore.

    The down_write() was only needed to allow removal of invalid listeners from
    the listener list. The patch converts the down_write to a down_read and
    defers the removal to a separate critical region. This ensures that even
    if the oom-killer is called, no other task's exit is blocked as it can
    still acquire another down_read.

    Thanks to Andrew Morton & Herbert Xu for pointing out the oom related
    pitfalls, and to Chandra Seetharaman for suggesting this fix instead of
    using something more complex like RCU.

    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • On systems with a large number of cpus, with even a modest rate of tasks
    exiting per cpu, the volume of taskstats data sent on thread exit can
    overflow a userspace listener's buffers.

    One approach to avoiding overflow is to allow listeners to get data for a
    limited and specific set of cpus. By scaling the number of listeners
    and/or the cpus they monitor, userspace can handle the statistical data
    overload more gracefully.

    In this patch, each listener registers to listen to a specific set of cpus
    by specifying a cpumask. The interest is recorded per-cpu. When a task
    exits on a cpu, its taskstats data is unicast to each listener interested
    in that cpu.

    Thanks to Andrew Morton for pointing out the various scalability and
    general concerns of previous attempts and for suggesting this design.

    [akpm@osdl.org: build fix]
    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Send per-tgid data only once during exit of a thread group instead of once
    with each member thread exit.

    Currently, when a thread exits, besides its per-tid data, the per-tgid data
    of its thread group is also sent out, if its thread group is non-empty.
    The per-tgid data sent consists of the sum of per-tid stats for all
    *remaining* threads of the thread group.

    This patch modifies this sending in two ways:

    - the per-tgid data is sent only when the last thread of a thread group
    exits. This cuts down heavily on the overhead of sending/receiving
    per-tgid data, especially when other exploiters of the taskstats
    interface aren't interested in per-tgid stats

    - the semantics of the per-tgid data sent are changed. Instead of being
    the sum of per-tid data for remaining threads, the value now sent is the
    true total accumalated statistics for all threads that are/were part of
    the thread group.

    The patch also addresses a minor issue where failure of one accounting
    subsystem to fill in the taskstats structure was causing the send of
    taskstats to not be sent at all.

    The patch has been tested for stability and run cerberus for over 4 hours
    on an SMP.

    [akpm@osdl.org: bugfixes]
    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Usage of taskstats interface by delay accounting.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Create a "taskstats" interface based on generic netlink (NETLINK_GENERIC
    family), for getting statistics of tasks and thread groups during their
    lifetime and when they exit. The interface is intended for use by multiple
    accounting packages though it is being created in the context of delay
    accounting.

    This patch creates the interface without populating the fields of the data
    that is sent to the user in response to a command or upon the exit of a task.
    Each accounting package interested in using taskstats has to provide an
    additional patch to add its stats to the common structure.

    [akpm@osdl.org: cleanups, Kconfig fix]
    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar