01 Aug, 2007

2 commits


26 Jul, 2007

1 commit


18 Jul, 2007

1 commit


17 Jul, 2007

4 commits

  • Basically, it will allow a process to unshare its user_struct table,
    resetting at the same time its own user_struct and all the associated
    accounting.

    A new root user (uid == 0) is added to the user namespace upon creation.
    Such root users have full privileges and it seems that theses privileges
    should be controlled through some means (process capabilities ?)

    The unshare is not included in this patch.

    Changes since [try #4]:
    - Updated get_user_ns and put_user_ns to accept NULL, and
    get_user_ns to return the namespace.

    Changes since [try #3]:
    - moved struct user_namespace to files user_namespace.{c,h}

    Changes since [try #2]:
    - removed struct user_namespace* argument from find_user()

    Changes since [try #1]:
    - removed struct user_namespace* argument from find_user()
    - added a root_user per user namespace

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Acked-by: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Andrew Morgan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
    deactivate the unshare of the uts and ipc namespaces and do not improve
    performance.

    Signed-off-by: Cedric Le Goater
    Acked-by: "Serge E. Hallyn"
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Change menuconfig objects from "menu, config" into "menuconfig" so that the
    user can disable the whole feature without entering its menu first.

    Signed-off-by: Jan Engelhardt
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Engelhardt
     
  • Currently slob is disabled if we're using sparsemem, due to an earlier
    patch from Goto-san. Slob and static sparsemem work without any trouble as
    it is, and the only hiccup is a missing slab_is_available() in the case of
    sparsemem extreme. With this, we're rid of the last set of restrictions
    for slob usage.

    Signed-off-by: Paul Mundt
    Acked-by: Pekka Enberg
    Acked-by: Matt Mackall
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     

10 Jul, 2007

1 commit


17 May, 2007

2 commits

  • No arch sets ARCH_USES_SLAB_PAGE_STRUCT anymore.

    Remove the experimental dependency as well since we want to have it as
    a real alternative to SLAB.

    It all comes down to killing a single line from init/Kconfig.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The SLOB allocator should implement SLAB_DESTROY_BY_RCU correctly, because
    even on UP, RCU freeing semantics are not equivalent to simply freeing
    immediately. This also allows SLOB to be used on SMP.

    Signed-off-by: Nick Piggin
    Acked-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

11 May, 2007

5 commits

  • This is a very simple and light file descriptor, that can be used as event
    wait/dispatch by userspace (both wait and dispatch) and by the kernel
    (dispatch only). It can be used instead of pipe(2) in all cases where those
    would simply be used to signal events. Their kernel overhead is much lower
    than pipes, and they do not consume two fds. When used in the kernel, it can
    offer an fd-bridge to enable, for example, functionalities like KAIO or
    syslets/threadlets to signal to an fd the completion of certain operations.
    But more in general, an eventfd can be used by the kernel to signal readiness,
    in a POSIX poll/select way, of interfaces that would otherwise be incompatible
    with it. The API is:

    int eventfd(unsigned int count);

    The eventfd API accepts an initial "count" parameter, and returns an eventfd
    fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).

    The POLLIN flag is raised when the internal counter is greater than zero.

    The POLLOUT flag is raised when at least a value of "1" can be written to the
    internal counter.

    The POLLERR flag is raised when an overflow in the counter value is detected.

    The write(2) operation can never overflow the counter, since it blocks (unless
    O_NONBLOCK is set, in which case -EAGAIN is returned).

    But the eventfd_signal() function can do it, since it's supposed to not sleep
    during its operation.

    The read(2) function reads the __u64 counter value, and reset the internal
    value to zero. If the value read is equal to (__u64) -1, an overflow happened
    on the internal counter (due to 2^64 eventfd_signal() posts that has never
    been retired - unlickely, but possible).

    The write(2) call writes an __u64 count value, and adds it to the current
    counter. The eventfd fd supports O_NONBLOCK also.

    On the kernel side, we have:

    struct file *eventfd_fget(int fd);
    int eventfd_signal(struct file *file, unsigned int n);

    The eventfd_fget() should be called to get a struct file* from an eventfd fd
    (this is an fget() + check of f_op being an eventfd fops pointer).

    The kernel can then call eventfd_signal() every time it wants to post an event
    to userspace. The eventfd_signal() function can be called from any context.
    An eventfd() simple test and bench is available here:

    http://www.xmailserver.org/eventfd-bench.c

    This is the eventfd-based version of pipetest-4 (pipe(2) based):

    http://www.xmailserver.org/pipetest-4.c

    Not that performance matters much in the eventfd case, but eventfd-bench
    shows almost as double as performance than pipetest-4.

    [akpm@linux-foundation.org: fix i386 build]
    [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch introduces a new system call for timers events delivered though
    file descriptors. This allows timer event to be used with standard POSIX
    poll(2), select(2) and read(2). As a consequence of supporting the Linux
    f_op->poll subsystem, they can be used with epoll(2) too.

    The system call is defined as:

    int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

    The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
    w/out going through the close/open cycle (same as signalfd). If "ufd" is -1,
    s new file descriptor will be created, otherwise the existing "ufd" will be
    re-programmed.

    The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time
    specified in the "utmr->it_value" parameter is the expiry time for the timer.

    If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
    otherwise it's a relative time.

    If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0,
    tv_nsec == 0), this is the period at which the following ticks should be
    generated.

    The "utmr->it_interval" should be set to zero if only one tick is requested.
    Setting the "utmr->it_value" to zero will disable the timer, or will create a
    timerfd without the timer enabled.

    The function returns the new (or same, in case "ufd" is a valid timerfd
    descriptor) file, or -1 in case of error.

    As stated before, the timerfd file descriptor supports poll(2), select(2) and
    epoll(2). When a timer event happened on the timerfd, a POLLIN mask will be
    returned.

    The read(2) call can be used, and it will return a u32 variable holding the
    number of "ticks" that happened on the interface since the last call to
    read(2). The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will
    be returned if no ticks happened.

    A quick test program, shows timerfd working correctly on my amd64 box:

    http://www.xmailserver.org/timerfd-test.c

    [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch series implements the new signalfd() system call.

    I took part of the original Linus code (and you know how badly it can be
    broken :), and I added even more breakage ;) Signals are fetched from the same
    signal queue used by the process, so signalfd will compete with standard
    kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
    the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
    seems to be working fine on my Dual Opteron machine. I made a quick test
    program for it:

    http://www.xmailserver.org/signafd-test.c

    The signalfd() system call implements signal delivery into a file descriptor
    receiver. The signalfd file descriptor if created with the following API:

    int signalfd(int ufd, const sigset_t *mask, size_t masksize);

    The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
    to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
    signalfd file.

    The "mask" allows to specify the signal mask of signals that we are interested
    in. The "masksize" parameter is the size of "mask".

    The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
    will return POLLIN when signals are available to be dequeued. As a direct
    consequence of supporting the Linux poll subsystem, the signalfd fd can use
    used together with epoll(2) too.

    The read(2) system call will return a "struct signalfd_siginfo" structure in
    the userspace supplied buffer. The return value is the number of bytes copied
    in the supplied buffer, or -1 in case of error. The read(2) call can also
    return 0, in case the sighand structure to which the signalfd was attached,
    has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
    return -EAGAIN in case no signal is available.

    If the size of the buffer passed to read(2) is lower than sizeof(struct
    signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
    return -ERESTARTSYS in case a signal hits the process. The format of the
    struct signalfd_siginfo is, and the valid fields depends of the (->code &
    __SI_MASK) value, in the same way a struct siginfo would:

    struct signalfd_siginfo {
    __u32 signo; /* si_signo */
    __s32 err; /* si_errno */
    __s32 code; /* si_code */
    __u32 pid; /* si_pid */
    __u32 uid; /* si_uid */
    __s32 fd; /* si_fd */
    __u32 tid; /* si_fd */
    __u32 band; /* si_band */
    __u32 overrun; /* si_overrun */
    __u32 trapno; /* si_trapno */
    __s32 status; /* si_status */
    __s32 svint; /* si_int */
    __u64 svptr; /* si_ptr */
    __u64 utime; /* si_utime */
    __u64 stime; /* si_stime */
    __u64 addr; /* si_addr */
    };

    [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • This patch add an anonymous inode source, to be used for files that need
    and inode only in order to create a file*. We do not care of having an
    inode for each file, and we do not even care of having different names in
    the associated dentries (dentry names will be same for classes of file*).
    This allow code reuse, and will be used by epoll, signalfd and timerfd
    (and whatever else there'll be).

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     
  • Otherwise people get asked about SLUB_DEBUG even if they have another
    slab allocator enabled.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

10 May, 2007

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
    sound: convert "sound" subdirectory to UTF-8
    MAINTAINERS: Add cxacru website/mailing list
    include files: convert "include" subdirectory to UTF-8
    general: convert "kernel" subdirectory to UTF-8
    documentation: convert the Documentation directory to UTF-8
    Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
    remove broken URLs from net drivers' output
    Magic number prefix consistency change to Documentation/magic-number.txt
    trivial: s/i_sem /i_mutex/
    fix file specification in comments
    drivers/base/platform.c: fix small typo in doc
    misc doc and kconfig typos
    Remove obsolete fat_cvf help text
    Fix occurrences of "the the "
    Fix minor typoes in kernel/module.c
    Kconfig: Remove reference to external mqueue library
    Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
    Correct comments in genrtc.c to refer to correct /proc file.
    Fix more "deprecated" spellos.
    Fix "deprecated" typoes.
    ...

    Fix trivial comment conflict in kernel/relay.c.

    Linus Torvalds
     
  • Fix some of the spelling issues. Fix sentences. Discourage SLOB use
    since SLUB can pack objects denser.

    Signed-off-by: Christoph Lameter
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • CONFIG_SLUB_DEBUG can be used to switch off the debugging and sysfs components
    of SLUB. Thus SLUB will be able to replace SLOB. SLUB can arrange objects in
    a denser way than SLOB and the code size should be minimal without debugging
    and sysfs support.

    Note that CONFIG_SLUB_DEBUG is materially different from CONFIG_SLAB_DEBUG.
    CONFIG_SLAB_DEBUG is used to enable slab debugging in SLAB. SLUB enables
    debugging via a boot parameter. SLUB debug code should always be present.

    CONFIG_SLUB_DEBUG can be modified in the embedded config section.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 May, 2007

3 commits

  • Remove the reference to an external mqueue library since that was
    merged into glibc in 2004.

    Signed-off-by: Robert P. J. Day
    Signed-off-by: Adrian Bunk

    Robert P. J. Day
     
  • Fix several typos in help text in Kconfig* files.

    Signed-off-by: David Sterba
    Signed-off-by: Adrian Bunk

    David Sterba
     
  • Several people have observed that perhaps LOG_BUF_SHIFT should be in a more
    obvious place than under DEBUG_KERNEL. Under some circumstances (such as the
    PARISC architecture), DEBUG_KERNEL can increase kernel size, which is an
    undesirable trade off for something as trivial as increasing the kernel log
    buffer size.

    Instead, move LOG_BUF_SHIFT into "General Setup", so that people are more
    likely to be able to change it such a circumstance that the default buffer
    size is insufficient.

    Signed-off-by: Alistair John Strachan
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alistair John Strachan
     

08 May, 2007

2 commits

  • This adds support for the Analog Devices Blackfin processor architecture, and
    currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561
    (Dual Core) devices, with a variety of development platforms including those
    avaliable from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP,
    BF561-EZKIT), and Bluetechnix! Tinyboards.

    The Blackfin architecture was jointly developed by Intel and Analog Devices
    Inc. (ADI) as the Micro Signal Architecture (MSA) core and introduced it in
    December of 2000. Since then ADI has put this core into its Blackfin
    processor family of devices. The Blackfin core has the advantages of a clean,
    orthogonal,RISC-like microprocessor instruction set. It combines a dual-MAC
    (Multiply/Accumulate), state-of-the-art signal processing engine and
    single-instruction, multiple-data (SIMD) multimedia capabilities into a single
    instruction-set architecture.

    The Blackfin architecture, including the instruction set, is described by the
    ADSP-BF53x/BF56x Blackfin Processor Programming Reference
    http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf

    The Blackfin processor is already supported by major releases of gcc, and
    there are binary and source rpms/tarballs for many architectures at:
    http://blackfin.uclinux.org/gf/project/toolchain/frs There is complete
    documentation, including "getting started" guides available at:
    http://docs.blackfin.uclinux.org/ which provides links to the sources and
    patches you will need in order to set up a cross-compiling environment for
    bfin-linux-uclibc

    This patch, as well as the other patches (toolchain, distribution,
    uClibc) are actively supported by Analog Devices Inc, at:
    http://blackfin.uclinux.org/

    We have tested this on LTP, and our test plan (including pass/fails) can
    be found at:
    http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel

    [m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files]
    Signed-off-by: Bryan Wu
    Signed-off-by: Mariusz Kozlowski
    Signed-off-by: Aubrey Li
    Signed-off-by: Jie Zhang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bryan Wu
     
  • This is a new slab allocator which was motivated by the complexity of the
    existing code in mm/slab.c. It attempts to address a variety of concerns
    with the existing implementation.

    A. Management of object queues

    A particular concern was the complex management of the numerous object
    queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
    each allocating CPU and use objects from a slab directly instead of
    queueing them up.

    B. Storage overhead of object queues

    SLAB Object queues exist per node, per CPU. The alien cache queue even
    has a queue array that contain a queue for each processor on each
    node. For very large systems the number of queues and the number of
    objects that may be caught in those queues grows exponentially. On our
    systems with 1k nodes / processors we have several gigabytes just tied up
    for storing references to objects for those queues This does not include
    the objects that could be on those queues. One fears that the whole
    memory of the machine could one day be consumed by those queues.

    C. SLAB meta data overhead

    SLAB has overhead at the beginning of each slab. This means that data
    cannot be naturally aligned at the beginning of a slab block. SLUB keeps
    all meta data in the corresponding page_struct. Objects can be naturally
    aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
    boundaries and can fit tightly into a 4k page with no bytes left over.
    SLAB cannot do this.

    D. SLAB has a complex cache reaper

    SLUB does not need a cache reaper for UP systems. On SMP systems
    the per CPU slab may be pushed back into partial list but that
    operation is simple and does not require an iteration over a list
    of objects. SLAB expires per CPU, shared and alien object queues
    during cache reaping which may cause strange hold offs.

    E. SLAB has complex NUMA policy layer support

    SLUB pushes NUMA policy handling into the page allocator. This means that
    allocation is coarser (SLUB does interleave on a page level) but that
    situation was also present before 2.6.13. SLABs application of
    policies to individual slab objects allocated in SLAB is
    certainly a performance concern due to the frequent references to
    memory policies which may lead a sequence of objects to come from
    one node after another. SLUB will get a slab full of objects
    from one node and then will switch to the next.

    F. Reduction of the size of partial slab lists

    SLAB has per node partial lists. This means that over time a large
    number of partial slabs may accumulate on those lists. These can
    only be reused if allocator occur on specific nodes. SLUB has a global
    pool of partial slabs and will consume slabs from that pool to
    decrease fragmentation.

    G. Tunables

    SLAB has sophisticated tuning abilities for each slab cache. One can
    manipulate the queue sizes in detail. However, filling the queues still
    requires the uses of the spin lock to check out slabs. SLUB has a global
    parameter (min_slab_order) for tuning. Increasing the minimum slab
    order can decrease the locking overhead. The bigger the slab order the
    less motions of pages between per CPU and partial lists occur and the
    better SLUB will be scaling.

    G. Slab merging

    We often have slab caches with similar parameters. SLUB detects those
    on boot up and merges them into the corresponding general caches. This
    leads to more effective memory use. About 50% of all caches can
    be eliminated through slab merging. This will also decrease
    slab fragmentation because partial allocated slabs can be filled
    up again. Slab merging can be switched off by specifying
    slub_nomerge on boot up.

    Note that merging can expose heretofore unknown bugs in the kernel
    because corrupted objects may now be placed differently and corrupt
    differing neighboring objects. Enable sanity checks to find those.

    H. Diagnostics

    The current slab diagnostics are difficult to use and require a
    recompilation of the kernel. SLUB contains debugging code that
    is always available (but is kept out of the hot code paths).
    SLUB diagnostics can be enabled via the "slab_debug" option.
    Parameters can be specified to select a single or a group of
    slab caches for diagnostics. This means that the system is running
    with the usual performance and it is much more likely that
    race conditions can be reproduced.

    I. Resiliency

    If basic sanity checks are on then SLUB is capable of detecting
    common error conditions and recover as best as possible to allow the
    system to continue.

    J. Tracing

    Tracing can be enabled via the slab_debug=T, option
    during boot. SLUB will then protocol all actions on that slabcache
    and dump the object contents on free.

    K. On demand DMA cache creation.

    Generally DMA caches are not needed. If a kmalloc is used with
    __GFP_DMA then just create this single slabcache that is needed.
    For systems that have no ZONE_DMA requirement the support is
    completely eliminated.

    L. Performance increase

    Some benchmarks have shown speed improvements on kernbench in the
    range of 5-10%. The locking overhead of slub is based on the
    underlying base allocation size. If we can reliably allocate
    larger order pages then it is possible to increase slub
    performance much further. The anti-fragmentation patches may
    enable further performance increases.

    Tested on:
    i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

    SLUB Boot options

    slub_nomerge Disable merging of slabs
    slub_min_order=x Require a minimum order for slab caches. This
    increases the managed chunk size and therefore
    reduces meta data and locking overhead.
    slub_min_objects=x Mininum objects per slab. Default is 8.
    slub_max_order=x Avoid generating slabs larger than order specified.
    slub_debug Enable all diagnostics for all caches
    slub_debug= Enable selective options for all caches
    slub_debug=, Enable selective options for a certain set of
    caches

    Available Debug options
    F Double Free checking, sanity and resiliency
    R Red zoning
    P Object / padding poisoning
    U Track last free / alloc
    T Trace all allocs / frees (only use for individual slabs).

    To use SLUB: Apply this patch and then select SLUB as the default slab
    allocator.

    [hugh@veritas.com: fix an oops-causing locking error]
    [akpm@linux-foundation.org: various stupid cleanups and small fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

03 May, 2007

1 commit


07 Mar, 2007

1 commit


15 Feb, 2007

1 commit

  • This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
    with special cases, and by keeping all of the ipc logic to together it makes
    the code a little more readable.

    [gcoady.lk@gmail.com: build fix]
    Signed-off-by: Eric W. Biederman
    Cc: Serge E. Hallyn
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Signed-off-by: Grant Coady
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

12 Feb, 2007

2 commits

  • Since they depends on TASKSTATS, it would be nice to move them closer to
    another options depending on TASKSTATS.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The file init/initramfs.c is always compiled and linked in the kernel
    vmlinux even when BLK_DEV_RAM and BLK_DEV_INITRD are disabled and the
    system isn't using any form of an initramfs or initrd. In this situation
    the code is only used to unpack a (static) default initial rootfilesystem.
    The current init/initramfs.c code. usr/initramfs_data.o compiles to a size
    of ~15 kbytes. Disabling BLK_DEV_RAM and BLK_DEV_INTRD shrinks the kernel
    code size with ~60 Kbytes.

    This patch avoids compiling in the code and data for initramfs support if
    CONFIG_BLK_DEV_INITRD is not defined. Instead of the initramfs code and
    data it uses a small routine in init/noinitramfs.c to setup an initial
    static default environment for mounting a rootfilesystem later on in the
    kernel initialisation process. The new code is: 164 bytes of size.

    The patch is separated in two parts:
    1) doesn't compile initramfs code when CONFIG_BLK_DEV_INITRD is not set
    2) changing all plaforms vmlinux.lds.S files to not reserve an area of
    PAGE_SIZE when CONFIG_BLK_DEV_INITRD is not set.

    [deweerdt@free.fr: warning fix]
    Signed-off-by: Jean-Paul Saman
    Cc: Al Viro
    Cc:
    Signed-off-by: Frederik Deweerdt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean-Paul Saman
     

23 Dec, 2006

2 commits

  • This is to disallow to make SLOB with SMP or SPARSEMEM. This avoids latent
    troubles of SLOB with SLAB_DESTROY_BY_RCU. And fix compile error.

    Signed-off-by: Yasunori Goto
    Acked-by: Randy Dunlap
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • The VM event counters, enabled by CONFIG_VM_EVENT_COUNTERS, which provides
    VM event counters in /proc/vmstat, has become more essential to
    non-EMBEDDED kernel configurations than they were in the past. Comments in
    the code and the Kconfig configuration explanation were stale, downplaying
    their role excessively.

    Refresh those comments to correctly reflect the current role of VM event
    counters.

    Signed-off-by: Paul Jackson
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

13 Dec, 2006

1 commit


11 Dec, 2006

1 commit

  • The present per-task IO accounting isn't very useful. It simply counts the
    number of bytes passed into read() and write(). So if a process reads 1MB
    from an already-cached file, it is accused of having performed 1MB of I/O,
    which is wrong.

    (David Wright had some comments on the applicability of the present logical IO accounting:

    For billing purposes it is useless but for workload analysis it is very
    useful

    read_bytes/read_calls average read request size
    write_bytes/write_calls average write request size

    read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
    write_bytes/write_blocks ie logical/physical guess since pdflush writes can
    be missed

    I often look for logical larger than physical to see filesystem cache
    problems. And the bytes/cpusec can help find applications that are
    dominating the cache and causing slow interactive response from page cache
    contention.

    I want to find the IO intensive applications and make sure they are doing
    efficient IO. Thus the acctcms(sysV) or csacms command would give the high
    IO commands).

    This patchset adds new accounting which tries to be more accurate. We account
    for three things:

    reads:

    attempt to count the number of bytes which this process really did cause
    to be fetched from the storage layer. Done at the submit_bio() level, so it
    is accurate for block-backed filesystems. I also attempt to wire up NFS and
    CIFS.

    writes:

    attempt to count the number of bytes which this process caused to be sent
    to the storage layer. This is done at page-dirtying time.

    The big inaccuracy here is truncate. If a process writes 1MB to a file
    and then deletes the file, it will in fact perform no writeout. But it will
    have been accounted as having caused 1MB of write.

    So...

    cancelled_writes:

    account the number of bytes which this process caused to not happen, by
    truncating pagecache.

    We _could_ just subtract this from the process's `write' accounting. But
    that means that some processes would be reported to have done negative
    amounts of write IO, which is silly.

    So we just report the raw number and punt this decision up to userspace.

    Now, we _could_ account for writes at the physical I/O level. But

    - This would require that we track memory-dirtying tasks at the per-page
    level (would require a new pointer in struct page).

    - It would mean that IO statistics for a process are usually only available
    long after that process has exitted. Which means that we probably cannot
    communicate this info via taskstats.

    This patch:

    Wire up the kernel-private data structures and the accessor functions to
    manipulate them.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

02 Dec, 2006

1 commit

  • Provide a way to support older versions of udev that are shipped in
    older distros. If this option is disabled, it will also turn off the
    compatible symlinks in sysfs that older programs might rely on.

    When in doubt, or if running a distro older than 2006, say Yes here.

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

09 Nov, 2006

1 commit

  • The basic issue is that despite have been deprecated and warned about as a
    very bad thing in the man pages since its inception there are a few real
    users of sys_sysctl. It was my assumption that because sysctl had been
    deprecated for all of 2.6 there would be no user space users by this point,
    so I initially gave sys_sysctl a very short deprecation period.

    Now that I know there are a few real users the only sane way to proceed
    with deprecation is to push the time limit out to a year or two work and
    work with distributions that have big testing pools like fedora core to
    find these last remaining users.

    Which means that the sys_sysctl interface needs to be maintained in the
    meantime.

    Since I have provided a technical measure that allows us to add new sysctl
    entries without reserving more binary numbers I believe that is enough to
    fix the sys_sysctl binary interface maintenance problems, because there is
    no longer a need to change the binary interface at all.

    Since the sys_sysctl implementation needs to stay around for a while and
    the worst of the maintenance issues that caused us to occasionally break
    the ABI have been addressed I don't see any advantage in continuing with
    the removal of sys_sysctl.

    So instead of merely increasing the deprecation period this patch removes
    the deprecation of sys_sysctl and modifies the kernel to compile the code
    in by default.

    With committing to maintain sys_sysctl we get all of the advantages of a
    fast interface for anything that needs it. Currently sys_sysctl is about
    5x faster than /proc/sys, for the same string data.

    Signed-off-by: Eric W. Biederman
    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

21 Oct, 2006

1 commit

  • This should make sure that, for UML, host's configuration files are not
    considered, which avoids various pains to the user. Our dependency are such
    that the obtained Kconfig will be valid and will lead to successful
    compilation - however they cannot prevent an user from disabling any boot
    device, and if an option is not set in the read .config (say
    /boot/config-XXX), with make menuconfig ARCH=um, it is not set. This always
    disables UBD and all console I/O channels, which leads to non-working UML
    kernels, so this bothers users - especially now, since it will happen on
    almost every machine (/boot/config-`uname -r` exists almost on every machine).
    It can be workarounded with make defconfig ARCH=um, but it is non-obvious and
    can be avoided, so please _do_ merge this patch.

    Given the existence of options, it could be interesting to implement
    (additionally) "option required" - with it, Kconfig will refuse reading a
    .config file (from wherever it comes) if the given option is not set. With
    this, one could mark with it the option characteristic of the given
    architecture (it was an old proposal of Roman Zippel, when I pointed out our
    problem):

    config UML
    option required
    default y

    However this should be further discussed:
    *) for x86, it must support constructs like:

    ==arch/i386/Kconfig==
    config 64BIT
    option required
    default n
    where Kconfig must require that CONFIG_64BIT is disabled or not present in the
    read .config.

    *) do we want to do such checks only for the starting defconfig or also for
    .config? Which leads to:
    *) I may want to port a x86_64 .config to x86 and viceversa, or even among more
    different archs. Should that be allowed, and in which measure (the user may
    force skipping the check for a .config or it is only given a warning by
    default)?

    Cc: Roman Zippel
    Cc:
    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
     

02 Oct, 2006

2 commits

  • This patch set allows to unshare IPCs and have a private set of IPC objects
    (sem, shm, msg) inside namespace. Basically, it is another building block of
    containers functionality.

    This patch implements core IPC namespace changes:
    - ipc_namespace structure
    - new config option CONFIG_IPC_NS
    - adds CLONE_NEWIPC flag
    - unshare support

    [clg@fr.ibm.com: small fix for unshare of ipc namespace]
    [akpm@osdl.org: build fix]
    Signed-off-by: Pavel Emelianov
    Signed-off-by: Kirill Korotaev
    Signed-off-by: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     
  • This patch defines the uts namespace and some manipulators.
    Adds the uts namespace to task_struct, and initializes a
    system-wide init namespace.

    It leaves a #define for system_utsname so sysctl will compile.
    This define will be removed in a separate patch.

    [akpm@osdl.org: build fix, cleanup]
    Signed-off-by: Serge Hallyn
    Cc: Kirill Korotaev
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Andrey Savochkin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

01 Oct, 2006

2 commits

  • Add extended system accounting handling over taskstats interface. A
    CONFIG_TASK_XACCT flag is created to enable the extended accounting code.

    Signed-off-by: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jes Sorensen
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jay Lan
     
  • SYSCTL should still depend on EMBEDDED. This unbreaks the EMBEDDED menu
    (from the recent SYSCTL_SYCALL menu option patch).

    Fix typos in new SYSCTL_SYSCALL menu.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap