20 Apr, 2008

1 commit


14 Apr, 2008

1 commit

  • The per node counters are used mainly for showing data through the sysfs API.
    If that API is not compiled in then there is no point in keeping track of this
    data. Disable counters for the number of slabs and the number of total slabs
    if !SLUB_DEBUG. Incrementing the per node counters is also accessing a
    potentially contended cacheline so this could actually be a performance
    benefit to embedded systems.

    SLABINFO support is also affected. It must now depend on SLUB_DEBUG (which
    is on by default).

    Patch also avoids a check for a NULL kmem_cache_node pointer in new_slab()
    if the system is not compiled with NUMA support.

    [penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
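
The SLABINFO dependency described in this commit might be expressed roughly as follows in init/Kconfig. This is a sketch, not the committed text: the `depends on PROC_FS` line and the overall layout are assumptions.

```kconfig
# Sketch only: /proc/slabinfo is offered for SLAB, or for SLUB when
# SLUB_DEBUG keeps the per-node slab counters compiled in.
config SLABINFO
	bool
	depends on PROC_FS
	depends on SLAB || SLUB_DEBUG
	default y
```

With SLUB selected and SLUB_DEBUG disabled, a symbol written this way would simply be off, which matches the idea that there is no counter data left to report.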
     

11 Mar, 2008

1 commit

  • The original preemptible-RCU patch put the choice between classic and
    preemptible RCU into kernel/Kconfig.preempt, which resulted in build failures
    on machines not supporting CONFIG_PREEMPT. This choice was therefore moved to
    init/Kconfig, which worked, but placed the choice between classic and
    preemptible RCU at the top level, a very obtuse choice indeed.

    This patch changes from the Kconfig "choice" mechanism to a pair of booleans,
    only one of which (CONFIG_PREEMPT_RCU) is user-visible, and is located in
    kernel/Kconfig.preempt, where one would expect it to be. The other
    (CONFIG_CLASSIC_RCU) is in init/Kconfig so that it is available to all
    architectures, hopefully avoiding build breakage. Thanks to Roman Zippel for
    suggesting this approach.

    Signed-off-by: Paul E. McKenney
    Cc: Ingo Molnar
    Acked-by: Steven Rostedt
    Cc: Dipankar Sarma
    Cc: Josh Triplett
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Roman Zippel
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
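
The "pair of booleans" arrangement could look something like the sketch below. The prompt string, the `depends on PREEMPT` line, and the defaults are assumptions made for illustration; only the symbol names and file locations come from the commit message.

```kconfig
# kernel/Kconfig.preempt (sketch): the only user-visible symbol.
config PREEMPT_RCU
	bool "Preemptible RCU"
	depends on PREEMPT
	default n

# init/Kconfig (sketch): no prompt, derived from PREEMPT_RCU, so it is
# defined on every architecture even where Kconfig.preempt is not sourced.
config CLASSIC_RCU
	def_bool !PREEMPT_RCU
```

Because CLASSIC_RCU has no prompt, the user never sees two competing options; exactly one of the two symbols ends up enabled.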
     

05 Mar, 2008

4 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
    debugfs: fix sparse warnings
    Driver core: Fix cleanup when failing device_add().
    driver core: Remove dpm_sysfs_remove() from error path of device_add()
    PM: fix new mutex-locking bug in the PM core
    PM: Do not acquire device semaphores upfront during suspend
    kobject: properly initialize ksets
    sysfs: CONFIG_SYSFS_DEPRECATED fix
    driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED

    Linus Torvalds
     
  • Rename Memory Controller to Memory Resource Controller. Reflect the same
    changes in the CONFIG definition for the Memory Resource Controller. Group
    together the config options for Resource Counters and Memory Resource
    Controller.

    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • CONFIG_SYSFS_DEPRECATED=y changed its meaning recently and causes
    regressions in working setups that had SYSFS_DEPRECATED disabled.

    So rename it to SYSFS_DEPRECATED_V2 so that testers pick up the new
    default via 'make oldconfig', even if their old .config files disabled
    CONFIG_SYSFS_DEPRECATED ...

    Signed-off-by: Ingo Molnar
    Cc: Kay Sievers
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Ingo Molnar
     
  • As things get moved into this config option, the hard date of 2006 does
    not work anymore, so update the text to be more descriptive.

    Cc: Kay Sievers
    Cc: Jiri Slaby
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
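
The rename to SYSFS_DEPRECATED_V2 in the preceding commit relies on how 'make oldconfig' treats symbols it has not seen before. A minimal sketch of such an arrangement follows; the prompt text, the `depends on SYSFS` line, and the `select` are assumptions rather than the committed Kconfig text.

```kconfig
# Sketch: the old symbol loses its prompt and is only ever selected.
config SYSFS_DEPRECATED
	bool

# New user-visible symbol. An old .config has no value recorded for it,
# so 'make oldconfig' asks again and offers the new default.
config SYSFS_DEPRECATED_V2
	bool "Create deprecated sysfs files"
	depends on SYSFS
	default y
	select SYSFS_DEPRECATED
```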
     

24 Feb, 2008

1 commit

  • Document huge memory/cache overhead of memory controller in Kconfig

    I was a little surprised that 2.6.25-rc* increased struct page for the
    memory controller. On many x86-64 machines, at least, it no longer fits
    into a single cache line, and it also costs a considerable amount of RAM.
    In an earlier review I remember asking for an external data structure for
    this.

    It is also far from obvious that an innocent-looking Kconfig option with a
    single-line Kconfig description has such a negative effect.

    This patch attempts to at least document these disadvantages so that users
    configuring their kernel can make an informed decision.

    Signed-off-by: Andi Kleen
    Cc: Balbir Singh
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

13 Feb, 2008

1 commit


10 Feb, 2008

1 commit


09 Feb, 2008

5 commits

  • Just like with the user namespaces, move the namespace management code into
    a separate .c file and mark the (already existing) PID_NS option as "depend
    on NAMESPACES".

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Make the user_namespace.o compilation depend on this option and move
    init_user_ns into the user.c file so that the kernel compiles and works
    without namespaces support. This organizes the user namespace code
    similarly to that of the other namespaces.

    Also mark the USER_NS option as "depend on NAMESPACES".

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Currently the IPC namespace management code is spread over the ipc/*.c files.
    Move this code into the ipc/namespace.c file, which is compiled out when
    namespaces are not needed.

    The linux/ipc_namespace.h file holds the prototypes of the functions in
    namespace.c and the stubs for the NAMESPACES=n case. This is done because
    the stub for copy_ipc_namespace requires knowledge of the CLONE_NEWIPC
    flag, which is in sched.h. The linux/ipc.h file itself is included in a
    great many .c files via the sys.h->sem.h sequence, so including sched.h
    from it would make all of those .c files depend on sched.h, which is
    undesirable. The namespace-specific knowledge, on the other hand, is
    required in only 4 .c files.

    In addition, this patch compiles out some auxiliary functions from the
    ipc/sem.c, msg.c and shm.c files. Moving these functions into namespace.c
    turned out not to be easy, because they use many other calls and macros
    from their original files, and moving them would complicate this patch.
    On the other hand, all of these functions can be consolidated, so I will
    send a separate patch doing this a bit later.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Currently all the namespace management code is in the kernel/utsname.c file,
    so just compile it out and make stubs in the appropriate header.

    The init namespace itself is in init/version.c and is in the kernel all the
    time.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • The option is selectable only if EMBEDDED is chosen. When EMBEDDED is off,
    namespaces will be on.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
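
Taken together, the five commits above put every namespace option behind a single NAMESPACES switch. One way this could be written is sketched below; the prompt strings are assumptions, and any further dependencies the individual options carry are deliberately left out.

```kconfig
# Sketch: the prompt is shown only when EMBEDDED is set; otherwise the
# symbol silently takes its default, which is y when EMBEDDED is off.
config NAMESPACES
	bool "Namespaces support" if EMBEDDED
	default !EMBEDDED

# Each per-namespace option then depends on the umbrella symbol.
config UTS_NS
	bool "UTS namespace"
	depends on NAMESPACES

config IPC_NS
	bool "IPC namespace"
	depends on NAMESPACES

config USER_NS
	bool "User namespace"
	depends on NAMESPACES

config PID_NS
	bool "PID namespace"
	depends on NAMESPACES
```

The `bool "..." if EMBEDDED` form is what makes the option "selectable only if EMBEDDED is chosen" while still giving non-embedded builds namespaces by default.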
     

08 Feb, 2008

2 commits

  • Setup the memory cgroup and add basic hooks and controls to integrate
    and work with the cgroup.

    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • With fixes from David Rientjes

    Introduce generic structures and routines for resource accounting.

    Each resource accounting cgroup is expected to aggregate the resource
    counter, cgroup_subsystem_state and its resource-specific members.

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: David Rientjes
    Cc: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     

07 Feb, 2008

1 commit

  • based on similar patch from: Pavel Machek

    Introduce CONFIG_COMPAT_BRK. If disabled then the kernel is free
    (but not obliged) to randomize the brk area.

    Heap randomization breaks ancient binaries, so we keep COMPAT_BRK
    enabled by default.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
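
A compatibility switch of this kind that is on by default can be sketched in a few lines. The prompt wording here is an assumption; only the symbol name and the enabled-by-default behaviour come from the commit message.

```kconfig
# Sketch: leaving this enabled keeps the legacy, non-randomized brk
# placement so that ancient binaries continue to work.
config COMPAT_BRK
	bool "Disable heap randomization"
	default y
```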
     

06 Feb, 2008

3 commits


03 Feb, 2008

2 commits

  • Move the instrumentation Kconfig to

    arch/Kconfig for architecture dependent options
    - oprofile
    - kprobes

    and

    init/Kconfig for architecture independent options
    - profiling
    - markers

    Remove the "Instrumentation Support" menu. Everything moves to "General setup".
    Delete the kernel/Kconfig.instrumentation file.

    Signed-off-by: Mathieu Desnoyers
    Cc: Linus Torvalds
    Cc:
    Signed-off-by: Sam Ravnborg

    Mathieu Desnoyers
     
  • Puts the content of arch/Kconfig in the "General setup" menu.

    Linus:

    > Should it come with a re-duplication of its content into each
    > architecture, which was the case previously? The oprofile and kprobes
    > menu entries were literally cut and pasted from one architecture to
    > another. Should we put its content in init/Kconfig then?

    I don't think it's a good idea to go back to making it per-architecture,
    although that extensive "depends on <list of architectures>" might
    indicate that there certainly is room for cleanup there.

    And I don't think it's wrong keeping it in kernel/Kconfig.xyz per se, I
    just think it's wrong to (a) lump the code together when it really doesn't
    necessarily need to and (b) show it to users as some kind of choice that
    is tied together (whether it then has common code or not).

    On the per-architecture side, I do think it would be better to *not* have
    internal architecture knowledge in a generic file, and as such a line like

    depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32

    really shouldn't exist in a file like kernel/Kconfig.instrumentation.

    It would be much better to do

    depends on ARCH_SUPPORTS_KPROBES

    in that generic file, and then architectures that do support it would just
    have a

    bool ARCH_SUPPORTS_KPROBES
    default y

    in *their* architecture files. That would seem to be much more logical,
    and is readable both for arch maintainers *and* for people who have no
    clue - and don't care - about which architecture is supposed to support
    which interface...

    Sam Ravnborg:

    Stuff it into a new file: arch/Kconfig
    We can then extend this file to include all the 'trailing'
    Kconfig things that are anyway equal for all ARCHs.

    But it should be kept clean - so if we introduce such a file
    then we should use ARCH_HAS_whatever in the arch specific Kconfig
    files to enable stuff that is not shared.

    [...]

    The above suggestion is actually not exactly the best way to do it...
    First the naming..
    A quick grep shows following usage today (in Kconfig files)
    ARCH_HAS 51
    ARCH_SUPPORTS 4
    HAVE_ARCH 7

    ARCH_HAS is the clear winner.

    In the common Kconfig file do:

    config FOO
            depends on ARCH_HAS_FOO
            bool "bla bla"

    config ARCH_HAS_FOO
            def_bool n

    In the arch specific Kconfig file in a suitable place do:

    config SUITABLE_OPTION
            select ARCH_HAS_FOO

    The naming of ARCH_HAS_ is fixed and shall be:
    ARCH_HAS_

    Only a single line added per architecture.
    And we will end up with a (maybe even commented) list of trivial selects.

    - Yet another update :

    Moving to HAVE_* now.

    Signed-off-by: Mathieu Desnoyers
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Ananth N Mavinakayanahalli
    Signed-off-by: Sam Ravnborg

    Mathieu Desnoyers
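
Applied to kprobes, the HAVE_* pattern that the discussion settles on might look like the sketch below. The prompt string and the use of X86 as the selecting architecture symbol are illustrative assumptions, and other dependencies are omitted.

```kconfig
# arch/Kconfig (sketch): generic option, gated on an arch capability.
config KPROBES
	bool "Kprobes"
	depends on HAVE_KPROBES

# Capability symbol: no prompt, off unless an architecture selects it.
config HAVE_KPROBES
	def_bool n

# arch/<arch>/Kconfig (sketch): one select per supporting architecture.
config X86
	def_bool y
	select HAVE_KPROBES
```

This keeps the generic file free of architecture lists: adding support on a new architecture is a single `select` line in that architecture's own Kconfig.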
     

01 Feb, 2008

1 commit


29 Jan, 2008

1 commit


28 Jan, 2008

1 commit


26 Jan, 2008

1 commit

  • This patch implements a new version of RCU which allows its read-side
    critical sections to be preempted. It uses a set of counter pairs
    to keep track of the read-side critical sections and flips them
    when all tasks have exited their read-side critical sections. The details
    of this implementation can be found in this paper -

    http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf

    and the article-

    http://lwn.net/Articles/253651/

    This patch was developed as part of the -rt kernel development and is
    meant to provide better latencies when read-side critical sections of
    RCU don't disable preemption. As a consequence of keeping track of RCU
    readers, the readers have a slight overhead (optimizations in the paper).
    This implementation co-exists with the "classic" RCU implementations
    and can be selected at compile time.

    Also includes RCU tracing summarized in debugfs.

    [ akpm@linux-foundation.org: build fixes on non-preempt architectures ]

    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Dipankar Sarma
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

25 Jan, 2008

1 commit


03 Jan, 2008

1 commit

  • Both SLUB and SLAB really did almost exactly the same thing for
    /proc/slabinfo setup, using duplicate code and per-allocator #ifdef's.

    This just creates a common CONFIG_SLABINFO that is enabled by both SLUB
    and SLAB, and shares all the setup code. Maybe SLOB will want this some
    day too.

    Reviewed-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

03 Dec, 2007

1 commit

  • Commit cfb5285660aad4931b2ebbfa902ea48a37dfffa1 removed a feature that was
    useful to us: a cpu accounting resource controller. This feature is
    useful if someone wants to group tasks only for accounting purposes and
    doesn't really want to exercise any control over their cpu consumption.

    The patch below reintroduces the feature. It is based on Paul Menage's
    original patch (Commit 62d0df64065e7c135d0002f069444fbdfc64768f), with
    these differences:

    - Removed load average information. I felt it needs more thought (esp
      to deal with SMP and virtualized platforms) and can be added for
      2.6.25 after more discussions.
    - Convert group cpu usage to be nanosecond accurate (as rest of the cfs
      stats are) and invoke cpuacct_charge() from the respective scheduler
      classes
    - Make accounting scalable on SMP systems by splitting the usage
      counter to be per-cpu
    - Move the code from kernel/cpu_acct.c to kernel/sched.c (since the
      code is not big enough to warrant a new file and also this rightly
      needs to live inside the scheduler. Also things like accessing
      rq->lock while reading cpu usage becomes easier if the code lived in
      kernel/sched.c)

    The patch also modifies the cpu controller not to provide the same accounting
    information.

    Tested-by: Balbir Singh

    Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran
    some simple tests like cpuspin (spin on the cpu), ran several tasks in
    the same group and timed them. Compared their time stamps with
    cpuacct.usage.

    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Balbir Singh
    Signed-off-by: Ingo Molnar

    Srivatsa Vaddagiri
     

23 Nov, 2007

1 commit


15 Nov, 2007

2 commits

  • This is my trivial patch to swat innumerable little bugs with a single
    blow.

    After some intensive review (my apologies for not having gotten to this
    sooner) what we have looks like a good base to build on with the current
    pid namespace code but it is not complete, and it is still much too simple
    to find issues where the kernel does the wrong thing outside of the initial
    pid namespace.

    Until the dust settles and we are certain we have the ABI and the
    implementation is as correct as humanly possible let's keep process ID
    namespaces behind CONFIG_EXPERIMENTAL.

    This leaves us the option of fixing any ABI or other bugs we find, as long
    as they are minor.

    It also allows users of the kernel to avoid those bugs simply by ensuring
    their kernel does not have support for multiple pid namespaces.

    [akpm@linux-foundation.org: coding-style cleanups]
    Signed-off-by: Eric W. Biederman
    Cc: Cedric Le Goater
    Cc: Adrian Bunk
    Cc: Jeremy Fitzhardinge
    Cc: Kir Kolyshkin
    Cc: Kirill Korotaev
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Revert 62d0df64065e7c135d0002f069444fbdfc64768f.

    This was originally intended as a simple initial example of how to create a
    control groups subsystem; it wasn't intended for mainline, but I didn't make
    this clear enough to Andrew.

    The CFS cgroup subsystem now has better functionality for the per-cgroup usage
    accounting (based directly on CFS stats) than the "usage" status file in this
    patch, and the "load" status file is rather simplistic - although having a
    per-cgroup load average report would be a useful feature, I don't believe this
    patch actually provides it. If it gets into the final 2.6.24 we'd probably
    have to support this interface for ever.

    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

25 Oct, 2007

1 commit


21 Oct, 2007

1 commit

  • New kind of audit rule predicates: "object is visible in given subtree".
    The part that can be sanely implemented, that is. Limitations:
    * if you have a hardlink from outside of the tree, you'd better watch
      it too (or just watch the object itself, obviously)
    * if you mount something under a watched tree, tell audit
      that the new chunk should be added to the watched subtrees
    * if you umount something in a watched tree and it's still mounted
      elsewhere, you will get matches on events happening there. A new
      command tells audit to recalculate the trees, trimming such sources
      of false positives.

    Note that it's _not_ about path - if something is mounted in several
    places (multiple mounts, bindings, different namespaces, etc.), the match
    does _not_ depend on which one we are using for access.

    Signed-off-by: Al Viro

    Al Viro
     

20 Oct, 2007

5 commits

  • Enable "cgroup" (formerly containers) based fair group scheduling. This
    will let administrators create arbitrary groups of tasks (using the
    "cgroup" pseudo filesystem) and control their cpu bandwidth usage.

    [akpm@linux-foundation.org: fix cpp condition]
    Signed-off-by: Srivatsa Vaddagiri
    Signed-off-by: Dhaval Giani
    Cc: Randy Dunlap
    Cc: Balbir Singh
    Cc: Paul Menage
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa Vaddagiri
     
  • When a task enters a new namespace via a clone() or unshare(), a new cgroup
    is created and the task moves into it.

    This version names cgroups which are automatically created using
    cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or
    cloned process. (Thanks Pavel for the idea) This is safe because if the
    process unshares again, it will create

    /cgroups/(...)/node_<pid>/node_<pid>

    The only possibilities (AFAICT) for a -EEXIST on unshare are

    1. pid wraparound
    2. a process fails an unshare, then tries again.

    Case 1 is unlikely enough that I ignore it (at least for now). In case 2, the
    node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare()
    succeed.

    Changelog:
    Name cloned cgroups as "node_<pid>".

    [clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig]
    Signed-off-by: Serge E. Hallyn
    Cc: Paul Menage
    Signed-off-by: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • This example subsystem exports debugging information as an aid to diagnosing
    refcount leaks, etc, in the cgroup framework.

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • This example demonstrates how to use the generic cgroup subsystem for a
    simple resource tracker that counts, for the processes in a cgroup, the
    total CPU time used and the %CPU used in the last complete 10 second interval.

    Portions contributed by Balbir Singh

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Remove the filesystem support logic from the cpusets system and make cpusets
    a cgroup subsystem.

    The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get
    passed through to the cgroup filesystem with the appropriate options to
    emulate the old cpuset filesystem behaviour.

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage