30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

19 Sep, 2009

1 commit


26 Jul, 2008

1 commit

  • Sometimes, application responses become bad under heavy memory load.
    Applications take a bit time to reclaim memory. The statistics, how long
    memory reclaim takes, will be useful to measure memory usage.

    This patch adds accounting memory reclaim to per-task-delay-accounting for
    accounting the time of do_try_to_free_pages().

    - When System is under low memory load,
    memory reclaim may not occur.

    $ free
    total used free shared buffers cached
    Mem: 8197800 1577300 6620500 0 4808 1516724
    -/+ buffers/cache: 55768 8142032
    Swap: 16386292 0 16386292

    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 0 5069748 10612 3014060 0 0 0 0 3 26 0 0 100 0
    0 0 0 5069748 10612 3014060 0 0 0 0 4 22 0 0 100 0
    0 0 0 5069748 10612 3014060 0 0 0 0 3 18 0 0 100 0

    Measure the time of tar command.

    $ ls -s test.dat
    1501472 test.dat

    $ time tar cvf test.tar test.dat
    real 0m13.388s
    user 0m0.116s
    sys 0m5.304s

    $ ./delayget -d -p
    CPU count real total virtual total delay total
    428 5528345500 5477116080 62749891
    IO count delay total
    338 8078977189
    SWAP count delay total
    0 0
    RECLAIM count delay total
    0 0

    - When system is under heavy memory load
    memory reclaim may occur.

    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 7159032 49724 1812 3012 0 0 0 0 3 24 0 0 100 0
    0 0 7159032 49724 1812 3012 0 0 0 0 4 24 0 0 100 0
    0 0 7159032 49848 1812 3012 0 0 0 0 3 22 0 0 100 0

    In this case, one process uses more 8G memory
    by execution of malloc() and memset().

    $ time tar cvf test.tar test.dat
    real 1m38.563s
    CPU count real total virtual total delay total
    9021 7140446250 7315277975 923201824
    IO count delay total
    8965 90466349669
    SWAP count delay total
    3 21036367
    RECLAIM count delay total
    740 61011951153

    In the later case, the value of RECLAIM is increasing.
    So, taskstats can show how much memory reclaim influences TAT.

    Signed-off-by: Keika Kobayashi
    Acked-by: Balbir Singh
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
     

20 Oct, 2007

1 commit

  • This patch is inspired by the discussion at
    http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
    as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263. The
    patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
    ported)

    This patch implements per cgroup statistics infrastructure and re-uses
    code from the taskstats interface. A new set of cgroup operations are
    registered with commands and attributes. It should be very easy to
    *extend* per cgroup statistics, by adding members to the cgroupstats
    structure.

    The current model for cgroupstats is a pull, a push model (to post
    statistics on interesting events), should be very easy to add. Currently
    user space requests for statistics by passing the cgroup file
    descriptor. Statistics about the state of all the tasks in the cgroup
    is returned to user space.

    TODO's/NOTE:

    This patch provides an infrastructure for implementing cgroup statistics.
    Based on the needs of each controller, we can incrementally add more statistics,
    event based support for notification of statistics, accumulation of taskstats
    into cgroup statistics in the future.

    Sample output

    # ./cgroupstats -C /cgroup/a
    sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0

    # ./cgroupstats -C /cgroup/
    sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0

    If the approach looks good, I'll enhance and post the user space utility for
    the same

    Feedback, comments, test results are always welcome!

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     

08 Dec, 2006

1 commit

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
    quilt add $file
    sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
    mv /tmp/$$ $file
    quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

02 Sep, 2006

1 commit

  • Cleanup allocation and freeing of tsk->delays used by delay accounting.
    This solves two problems reported for delay accounting:

    1. oops in __delayacct_blkio_ticks
    http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1844.html

    Currently tsk->delays is getting freed too early in task exit which can
    cause a NULL tsk->delays to get accessed via reading of /proc//stats.
    The patch fixes this problem by freeing tsk->delays closer to when
    task_struct itself is freed up. As a result, it also eliminates the use of
    tsk->delays_lock which was only being used (inadequately) to safeguard
    access to tsk->delays while a task was exiting.

    2. Possible memory leak in kernel/delayacct.c
    http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1389.html

    The patch cleans up tsk->delays allocations after a bad fork which was
    missing earlier.

    The patch has been tested to fix the problems listed above and stress
    tested with rapid calls to delay accounting's taskstats command interface
    (which is the other path that can access the same data, besides the /proc
    interface causing the oops above).

    Signed-off-by: Shailabh Nagar
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     

01 Aug, 2006

2 commits

  • Enable delay accounting by default so that feature gets coverage testing
    without requiring special measures.

    Earlier, it was off by default and had to be enabled via a boot time param.
    This patch reverses the default behaviour to improve coverage testing. It
    can be removed late in the kernel development cycle if its believed users
    shouldn't have to incur any cost if they don't want delay accounting. Or
    it can be retained forever if the utility of the stats is deemed common
    enough to warrant keeping the feature on.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Complete the separation of delay accounting and taskstats by ignoring the
    return value of delay accounting functions that fill in parts of taskstats
    before it is sent out (either in response to a command or as part of a task
    exit).

    Also make delayacct_add_tsk return silently when delay accounting is turned
    off rather than treat it as an error.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     

15 Jul, 2006

4 commits

  • Export I/O delays seen by a task through /proc//stats for use in top
    etc.

    Note that delays for I/O done for swapping in pages (swapin I/O) is clubbed
    together with all other I/O here (this is not the case in the netlink
    interface where the swapin I/O is kept distinct)

    [akpm@osdl.org: printk warning fix]
    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Usage of taskstats interface by delay accounting.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Unlike earlier iterations of the delay accounting patches, now delays are only
    collected for the actual I/O waits rather than try and cover the delays seen
    in I/O submission paths.

    Account separately for block I/O delays incurred as a result of swapin page
    faults whose frequency can be affected by the task/process' rss limit. Hence
    swapin delays can act as feedback for rss limit changes independent of I/O
    priority changes.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Initialization code related to collection of per-task "delay" statistics which
    measure how long it had to wait for cpu, sync block io, swapping etc. The
    collection of statistics and the interface are in other patches. This patch
    sets up the data structures and allows the statistics collection to be
    disabled through a kernel boot parameter.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar