18 Apr, 2008

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb:
    kgdb: always use icache flush for sw breakpoints
    kgdb: fix SMP NMI kgdb_handle_exception exit race
    kgdb: documentation fixes
    kgdb: allow static kgdbts boot configuration
    kgdb: add documentation
    kgdb: Kconfig fix
    kgdb: add kgdb internal test suite
    kgdb: fix several kgdb regressions
    kgdb: kgdboc pl011 I/O module
    kgdb: fix optional arch functions and probe_kernel_*
    kgdb: add x86 HW breakpoints
    kgdb: print breakpoint removed on exception
    kgdb: clocksource watchdog
    kgdb: fix NMI hangs
    kgdb: fix kgdboc dynamic module configuration
    kgdb: document parameters
    x86: kgdb support
    consoles: polling support, kgdboc
    kgdb: core
    uaccess: add probe_kernel_write()

    Linus Torvalds
     
  • kgdb core code. Handles the protocol and the arch details.

    [ mingo@elte.hu: heavily modified, simplified and cleaned up. ]
    [ xemul@openvz.org: use find_task_by_pid_ns ]

    Signed-off-by: Jason Wessel
    Signed-off-by: Ingo Molnar
    Signed-off-by: Jan Kiszka
    Reviewed-by: Thomas Gleixner

    Jason Wessel
     

17 Apr, 2008

1 commit

  • Semaphores are no longer performance-critical, so a generic C
    implementation is better for maintainability, debuggability and
    extensibility. Thanks to Peter Zijlstra for fixing the lockdep
    warning. Thanks to Harvey Harrison for pointing out that the
    unlikely() was unnecessary.

    Signed-off-by: Matthew Wilcox
    Acked-by: Ingo Molnar

    Matthew Wilcox
     

09 Feb, 2008

4 commits

  • When the conversion factor between jiffies and milli- or microseconds is
    not a single multiply or divide, as for the case of HZ == 300, we currently
    do a multiply followed by a divide. The intervening result, however, is
    subject to overflows, especially since the fraction is not simplified (for
    HZ == 300, we multiply by 300 and divide by 1000).

    This is exposed to the user when passing a large timeout to poll(), for
    example.

    This patch replaces the multiply-divide with a reciprocal multiplication on
    32-bit platforms. When the input is an unsigned long, there is no portable
    way to do this on 64-bit platforms there is no portable way to do this
    since it requires a 128-bit intermediate result (which gcc does support on
    64-bit platforms but may generate libgcc calls, e.g. on 64-bit s390), but
    since the output is a 32-bit integer in the cases affected, just simplify
    the multiply-divide (*3/10 instead of *300/1000).

    The reciprocal multiply used can have off-by-one errors in the upper half
    of the valid output range. This could be avoided at the expense of having
    to deal with a potential 65-bit intermediate result. Since the intent is
    to avoid overflow problems and most of the other time conversions are only
    semiexact, the off-by-one errors were considered an acceptable tradeoff.

    At Ralf Baechle's suggestion, this version uses a Perl script to compute
    the necessary constants. We already have dependencies on Perl for kernel
    compiles. This does, however, require the Perl module Math::BigInt, which
    is included in the standard Perl distribution starting with version 5.8.0.
    In order to support older versions of Perl, include a table of canned
    constants in the script itself, and structure the script so that
    Math::BigInt isn't required if pulling values from said table.

    Running the script requires that the HZ value is available from the
    Makefile. Thus, this patch also adds the Kconfig variable CONFIG_HZ to the
    architectures which didn't already have it (alpha, cris, frv, h8300, m32r,
    m68k, m68knommu, sparc, v850, and xtensa.) It does *not* touch the sh or
    sh64 architectures, since Paul Mundt has dealt with those separately in the
    sh tree.

    Signed-off-by: H. Peter Anvin
    Cc: Ralf Baechle ,
    Cc: Sam Ravnborg ,
    Cc: Paul Mundt ,
    Cc: Richard Henderson ,
    Cc: Michael Starvik ,
    Cc: David Howells ,
    Cc: Yoshinori Sato ,
    Cc: Hirokazu Takata ,
    Cc: Geert Uytterhoeven ,
    Cc: Roman Zippel ,
    Cc: William L. Irwin ,
    Cc: Chris Zankel ,
    Cc: H. Peter Anvin ,
    Cc: Jan Engelhardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H. Peter Anvin
     
  • Just like with the user namespaces, move the namespace management code into
    the separate .c file and mark the (already existing) PID_NS option as "depend
    on NAMESPACES"

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Make the user_namespace.o compilation depend on this option and move the
    init_user_ns into user.c file to make the kernel compile and work without the
    namespaces support. This make the user namespace code be organized similar to
    other namespaces'.

    Also mask the USER_NS option as "depend on NAMESPACES".

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Currently all the namespace management code is in the kernel/utsname.c file,
    so just compile it out and make stubs in the appropriate header.

    The init namespace itself is in init/version.c and is in the kernel all the
    time.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

08 Feb, 2008

1 commit

  • With fixes from David Rientjes

    Introduce generic structures and routines for resource accounting.

    Each resource accounting cgroup is supposed to aggregate it,
    cgroup_subsystem_state and its resource-specific members within.

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: David Rientjes
    Cc: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     

06 Feb, 2008

2 commits

  • Replace latency.c use with pm_qos_params use.

    Signed-off-by: mark gross
    Cc: "John W. Linville"
    Cc: Len Brown
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Gross
     
  • The following patch is a generalization of the latency.c implementation done
    by Arjan last year. It provides infrastructure for more than one parameter,
    and exposes a user mode interface for processes to register pm_qos
    expectations of processes.

    This interface provides a kernel and user mode interface for registering
    performance expectations by drivers, subsystems and user space applications on
    one of the parameters.

    Currently we have {cpu_dma_latency, network_latency, network_throughput} as
    the initial set of pm_qos parameters.

    The infrastructure exposes multiple misc device nodes one per implemented
    parameter. The set of parameters implement is defined by pm_qos_power_init()
    and pm_qos_params.h. This is done because having the available parameters
    being runtime configurable or changeable from a driver was seen as too easy to
    abuse.

    For each parameter a list of performance requirements is maintained along with
    an aggregated target value. The aggregated target value is updated with
    changes to the requirement list or elements of the list. Typically the
    aggregated target value is simply the max or min of the requirement values
    held in the parameter list elements.

    >From kernel mode the use of this interface is simple:

    pm_qos_add_requirement(param_id, name, target_value):

    Will insert a named element in the list for that identified PM_QOS
    parameter with the target value. Upon change to this list the new target is
    recomputed and any registered notifiers are called only if the target value
    is now different.

    pm_qos_update_requirement(param_id, name, new_target_value):

    Will search the list identified by the param_id for the named list element
    and then update its target value, calling the notification tree if the
    aggregated target is changed. with that name is already registered.

    pm_qos_remove_requirement(param_id, name):

    Will search the identified list for the named element and remove it, after
    removal it will update the aggregate target and call the notification tree
    if the target was changed as a result of removing the named requirement.

    >From user mode:

    Only processes can register a pm_qos requirement. To provide for
    automatic cleanup for process the interface requires the process to register
    its parameter requirements in the following way:

    To register the default pm_qos target for the specific parameter, the
    process must open one of /dev/[cpu_dma_latency, network_latency,
    network_throughput]

    As long as the device node is held open that process has a registered
    requirement on the parameter. The name of the requirement is
    "process_" derived from the current->pid from within the open system
    call.

    To change the requested target value the process needs to write a s32
    value to the open device node. This translates to a
    pm_qos_update_requirement call.

    To remove the user mode request for a target value simply close the device
    node.

    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: fix build again]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: mark gross
    Cc: "John W. Linville"
    Cc: Len Brown
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Cc: Arjan van de Ven
    Cc: Venki Pallipadi
    Cc: Adam Belay
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Gross
     

03 Feb, 2008

1 commit

  • kernel/ksysfs.c seems to be a random dumping group for misc globals
    that the rest of the tree depend on. This has caused problems with
    exports in the past when sysfs is disabled, which can already be
    observed in commit-id 51107301b629640f9ab76fe23bf385e187b9ac29.

    The latest one is the kernel_kobj usage, which presently results in:

    fs/built-in.o: In function `debugfs_init':
    inode.c:(.init.text+0xc34): undefined reference to `kernel_kobj'
    make: *** [.tmp_vmlinux1] Error 1

    kernel/ksysfs.c itself at this point only contains globals and some
    basic sysfs initialization, the sysfs initialization code is optimized
    out when we build with sysfs disabled. Given that, it's easier to just
    build in unconditionally, rather than trying to find some other random
    place to dump and initialize the globals.

    Additionally, the current trend seems to be decoupling of kobjects from
    sysfs, in which case it still makes sense to perform the kernel_kobj
    initialization that happens here even if sysfs is disabled, as
    lib/kobject.o is built-in unconditionally.

    Signed-off-by: Paul Mundt
    Signed-off-by: Greg Kroah-Hartman

    Paul Mundt
     

30 Jan, 2008

2 commits

  • During the work on the x86 32 and 64 bit backtrace code I found it useful
    to have a simple test module to test a process and irq context backtrace.
    Since the existing backtrace code was buggy, I figure it might be useful
    to have such a test module in the kernel so that maybe we can even
    detect such bugs earlier..

    [ mingo@elte.hu: build fix ]

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Arjan van de Ven
     
  • Here is a quick and naive smoke test for kprobes. This is intended to
    just verify if some unrelated change broke the *probes subsystem. It is
    self contained, architecture agnostic and isn't of any great use by itself.

    This needs to be built in the kernel and runs a basic set of tests to
    verify if kprobes, jprobes and kretprobes run fine on the kernel. In case
    of an error, it'll print out a message with a "BUG" prefix.

    This is a start; we intend to add more tests to this bucket over time.

    Thanks to Jim Keniston and Masami Hiramatsu for comments and suggestions.

    Tested on x86 (32/64) and powerpc.

    Signed-off-by: Ananth N Mavinakayanahalli
    Acked-by: Masami Hiramatsu
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ananth N Mavinakayanahalli
     

26 Jan, 2008

3 commits

  • LatencyTOP kernel infrastructure; it measures latencies in the
    scheduler and tracks it system wide and per process.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
  • This patch implements a new version of RCU which allows its read-side
    critical sections to be preempted. It uses a set of counter pairs
    to keep track of the read-side critical sections and flips them
    when all tasks exit read-side critical section. The details
    of this implementation can be found in this paper -

    http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf

    and the article-

    http://lwn.net/Articles/253651/

    This patch was developed as a part of the -rt kernel development and
    meant to provide better latencies when read-side critical sections of
    RCU don't disable preemption. As a consequence of keeping track of RCU
    readers, the readers have a slight overhead (optimizations in the paper).
    This implementation co-exists with the "classic" RCU implementations
    and can be switched to at compiler.

    Also includes RCU tracing summarized in debugfs.

    [ akpm@linux-foundation.org: build fixes on non-preempt architectures ]

    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Dipankar Sarma
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • This patch re-organizes the RCU code to enable multiple implementations
    of RCU. Users of RCU continues to include rcupdate.h and the
    RCU interfaces remain the same. This is in preparation for
    subsequently merging the preemptible RCU implementation.

    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Dipankar Sarma
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

15 Nov, 2007

1 commit

  • Revert 62d0df64065e7c135d0002f069444fbdfc64768f.

    This was originally intended as a simple initial example of how to create a
    control groups subsystem; it wasn't intended for mainline, but I didn't make
    this clear enough to Andrew.

    The CFS cgroup subsystem now has better functionality for the per-cgroup usage
    accounting (based directly on CFS stats) than the "usage" status file in this
    patch, and the "load" status file is rather simplistic - although having a
    per-cgroup load average report would be a useful feature, I don't believe this
    patch actually provides it. If it gets into the final 2.6.24 we'd probably
    have to support this interface for ever.

    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

21 Oct, 2007

1 commit

  • New kind of audit rule predicates: "object is visible in given subtree".
    The part that can be sanely implemented, that is. Limitations:
    * if you have hardlink from outside of tree, you'd better watch
    it too (or just watch the object itself, obviously)
    * if you mount something under a watched tree, tell audit
    that new chunk should be added to watched subtrees
    * if you umount something in a watched tree and it's still mounted
    elsewhere, you will get matches on events happening there. New command
    tells audit to recalculate the trees, trimming such sources of false
    positives.

    Note that it's _not_ about path - if something mounted in several places
    (multiple mount, bindings, different namespaces, etc.), the match does
    _not_ depend on which one we are using for access.

    Signed-off-by: Al Viro

    Al Viro
     

20 Oct, 2007

7 commits

  • Weird I thought I had written the makefile so this would be handled. Oh
    well this should fix it.

    Sorry about that.

    Signed-off-by: Eric W. Biederman
    Acked-and-tested-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • The marker activation functions sits in kernel/marker.c. A hash table is used
    to keep track of the registered probes and armed markers, so the markers
    within a newly loaded module that should be active can be activated at module
    load time.

    marker_query has been removed. marker_get_first, marker_get_next and
    marker_release should be used as iterators on the markers.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Mathieu Desnoyers
    Acked-by: "Frank Ch. Eigler"
    Cc: Christoph Hellwig
    Cc: Rusty Russell
    Cc: Mike Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • When a task enters a new namespace via a clone() or unshare(), a new cgroup
    is created and the task moves into it.

    This version names cgroups which are automatically created using
    cgroup_clone() as "node_" where pid is the pid of the unsharing or
    cloned process. (Thanks Pavel for the idea) This is safe because if the
    process unshares again, it will create

    /cgroups/(...)/node_/node_

    The only possibilities (AFAICT) for a -EEXIST on unshare are

    1. pid wraparound
    2. a process fails an unshare, then tries again.

    Case 1 is unlikely enough that I ignore it (at least for now). In case 2, the
    node_ will be empty and can be rmdir'ed to make the subsequent unshare()
    succeed.

    Changelog:
    Name cloned cgroups as "node_".

    [clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig]
    Signed-off-by: Serge E. Hallyn
    Cc: Paul Menage
    Signed-off-by: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • This example subsystem exports debugging information as an aid to diagnosing
    refcount leaks, etc, in the cgroup framework.

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • This example demonstrates how to use the generic cgroup subsystem for a
    simple resource tracker that counts, for the processes in a cgroup, the
    total CPU time used and the %CPU used in the last complete 10 second interval.

    Portions contributed by Balbir Singh

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Generic Process Control Groups
    --------------------------

    There have recently been various proposals floating around for
    resource management/accounting and other task grouping subsystems in
    the kernel, including ResGroups, User BeanCounters, NSProxy
    cgroups, and others. These all need the basic abstraction of being
    able to group together multiple processes in an aggregate, in order to
    track/limit the resources permitted to those processes, or control
    other behaviour of the processes, and all implement this grouping in
    different ways.

    This patchset provides a framework for tracking and grouping processes
    into arbitrary "cgroups" and assigning arbitrary state to those
    groupings, in order to control the behaviour of the cgroup as an
    aggregate.

    The intention is that the various resource management and
    virtualization/cgroup efforts can also become task cgroup
    clients, with the result that:

    - the userspace APIs are (somewhat) normalised

    - it's easier to test e.g. the ResGroups CPU controller in
    conjunction with the BeanCounters memory controller, or use either of
    them as the resource-control portion of a virtual server system.

    - the additional kernel footprint of any of the competing resource
    management systems is substantially reduced, since it doesn't need
    to provide process grouping/containment, hence improving their
    chances of getting into the kernel

    This patch:

    Add the main task cgroups framework - the cgroup filesystem, and the
    basic structures for tracking membership and associating subsystem state
    objects to tasks.

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • There is separate notifier header, but no separate notifier .c file.

    Extract notifier code out of kernel/sys.c which will remain for
    misc syscalls I hope. Merge kernel/die_notifier.c into kernel/notifier.c.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

19 Oct, 2007

1 commit

  • After going through the kernels sysctl tables several times it has become
    clear that code review and testing is just not effective in prevent
    problematic sysctl tables from being used in the stable kernel. I certainly
    can't seem to fix the problems as fast as they are introduced.

    Therefore this patch adds sysctl_check_table which is called when a sysctl
    table is registered and checks to see if we have a problematic sysctl table.

    The biggest part of the code is the table of valid binary sysctl entries, but
    since we have frozen our set of binary sysctls this table should not need to
    change, and it makes it much easier to detect when someone unintentionally
    adds a new binary sysctl value.

    As best as I can determine all of the several hundred errors spewed on boot up
    now are legitimate.

    [bunk@kernel.org: kernel/sysctl_check.c must #include ]
    Signed-off-by: Eric W. Biederman
    Cc: Alexey Dobriyan
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

17 Jul, 2007

2 commits

  • Basically, it will allow a process to unshare its user_struct table,
    resetting at the same time its own user_struct and all the associated
    accounting.

    A new root user (uid == 0) is added to the user namespace upon creation.
    Such root users have full privileges and it seems that theses privileges
    should be controlled through some means (process capabilities ?)

    The unshare is not included in this patch.

    Changes since [try #4]:
    - Updated get_user_ns and put_user_ns to accept NULL, and
    get_user_ns to return the namespace.

    Changes since [try #3]:
    - moved struct user_namespace to files user_namespace.{c,h}

    Changes since [try #2]:
    - removed struct user_namespace* argument from find_user()

    Changes since [try #1]:
    - removed struct user_namespace* argument from find_user()
    - added a root_user per user namespace

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Acked-by: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Andrew Morgan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
    deactivate the unshare of the uts and ipc namespaces and do not improve
    performance.

    Signed-off-by: Cedric Le Goater
    Acked-by: "Serge E. Hallyn"
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     

09 May, 2007

1 commit

  • This patch moves the die notifier handling to common code. Previous
    various architectures had exactly the same code for it. Note that the new
    code is compiled unconditionally, this should be understood as an appel to
    the other architecture maintainer to implement support for it aswell (aka
    sprinkling a notify_die or two in the proper place)

    arm had a notifiy_die that did something totally different, I renamed it to
    arm_notify_die as part of the patch and made it static to the file it's
    declared and used at. avr32 used to pass slightly less information through
    this interface and I brought it into line with the other architectures.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
    [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
    Signed-off-by: Christoph Hellwig
    Cc:
    Cc: Russell King
    Signed-off-by: Bryan Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

15 Feb, 2007

1 commit

  • This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
    with special cases, and by keeping all of the utsname logic to together it
    makes the code a little more readable.

    Signed-off-by: Eric W. Biederman
    Cc: Serge E. Hallyn
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

16 Dec, 2006

1 commit

  • It has caused more problems than it ever really solved, and is
    apparently not getting cleaned up and fixed. We can put it back when
    it's stable and isn't likely to make warning or bug events worse.

    In the meantime, enable frame pointers for more readable stack traces.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Oct, 2006

1 commit

  • Updated patch adding a variant of RCU that permits sleeping in read-side
    critical sections. SRCU is as follows:

    o Each use of SRCU creates its own srcu_struct, and each
    srcu_struct has its own set of grace periods. This is
    critical, as it prevents one subsystem with a blocking
    reader from holding up SRCU grace periods for other
    subsystems.

    o The SRCU primitives (srcu_read_lock(), srcu_read_unlock(),
    and synchronize_srcu()) all take a pointer to a srcu_struct.

    o The SRCU primitives must be called from process context.

    o srcu_read_lock() returns an int that must be passed to
    the matching srcu_read_unlock(). Realtime RCU avoids the
    need for this by storing the state in the task struct,
    but SRCU needs to allow a given code path to pass through
    multiple SRCU domains -- storing state in the task struct
    would therefore require either arbitrary space in the
    task struct or arbitrary limits on SRCU nesting. So I
    kicked the state-storage problem up to the caller.

    Of course, it is not permitted to call synchronize_srcu()
    while in an SRCU read-side critical section.

    o There is no call_srcu(). It would not be hard to implement
    one, but it seems like too easy a way to OOM the system.
    (Hey, we have enough trouble with call_rcu(), which does
    -not- permit readers to sleep!!!) So, if you want it,
    please tell me why...

    [josht@us.ibm.com: sparse notation]
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     

02 Oct, 2006

2 commits

  • This patch defines the uts namespace and some manipulators.
    Adds the uts namespace to task_struct, and initializes a
    system-wide init namespace.

    It leaves a #define for system_utsname so sysctl will compile.
    This define will be removed in a separate patch.

    [akpm@osdl.org: build fix, cleanup]
    Signed-off-by: Serge Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Andrey Savochkin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • This patch adds a nsproxy structure to the task struct. Later patches will
    move the fs namespace pointer into this structure, and introduce a new utsname
    namespace into the nsproxy.

    The vserver and openvz functionality, then, would be implemented in large part
    by virtualizing/isolating more and more resources into namespaces, each
    contained in the nsproxy.

    [akpm@osdl.org: build fix]
    Signed-off-by: Serge Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Andrey Savochkin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

01 Oct, 2006

2 commits

  • Add some basic accounting fields to the taskstats struct, add a new
    kernel/tsacct.c to handle basic accounting data handling upon exit. A handle
    is added to taskstats.c to invoke the basic accounting data handling.

    Signed-off-by: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jes Sorensen
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: "Michal Piotrowski"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jay Lan
     
  • Add infrastructure to track "maximum allowable latency" for power saving
    policies.

    The reason for adding this infrastructure is that power management in the
    idle loop needs to make a tradeoff between latency and power savings
    (deeper power save modes have a longer latency to running code again). The
    code that today makes this tradeoff just does a rather simple algorithm;
    however this is not good enough: There are devices and use cases where a
    lower latency is required than that the higher power saving states provide.
    An example would be audio playback, but another example is the ipw2100
    wireless driver that right now has a very direct and ugly acpi hook to
    disable some higher power states randomly when it gets certain types of
    error.

    The proposed solution is to have an interface where drivers can

    * announce the maximum latency (in microseconds) that they can deal with
    * modify this latency
    * give up their constraint

    and a function where the code that decides on power saving strategy can
    query the current global desired maximum.

    This patch has a user of each side: on the consumer side, ACPI is patched
    to use this, on the producer side the ipw2100 driver is patched.

    A generic maximum latency is also registered of 2 timer ticks (more and you
    lose accurate time tracking after all).

    While the existing users of the patch are x86 specific, the infrastructure
    is not. I'd like to ask the arch maintainers of other architectures if the
    infrastructure is generic enough for their use (assuming the architecture
    has such a tradeoff as concept at all), and the sound/multimedia driver
    owners to look at the driver facing API to see if this is something they
    can use.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Acked-by: Jesse Barnes
    Cc: "Brown, Len"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

15 Jul, 2006

2 commits

  • Create a "taskstats" interface based on generic netlink (NETLINK_GENERIC
    family), for getting statistics of tasks and thread groups during their
    lifetime and when they exit. The interface is intended for use by multiple
    accounting packages though it is being created in the context of delay
    accounting.

    This patch creates the interface without populating the fields of the data
    that is sent to the user in response to a command or upon the exit of a task.
    Each accounting package interested in using taskstats has to provide an
    additional patch to add its stats to the common structure.

    [akpm@osdl.org: cleanups, Kconfig fix]
    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Initialization code related to collection of per-task "delay" statistics which
    measure how long it had to wait for cpu, sync block io, swapping etc. The
    collection of statistics and the interface are in other patches. This patch
    sets up the data structures and allows the statistics collection to be
    disabled through a kernel boot parameter.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     

04 Jul, 2006

2 commits