08 May, 2007

4 commits

  • Make handle_initrd() call try_to_freeze() in a suitable place instead of setting
    PF_NOFREEZE for the current task.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Nigel Cunningham
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • This adds support for the Analog Devices Blackfin processor architecture, and
    currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561
    (Dual Core) devices, with a variety of development platforms including those
    available from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP,
    BF561-EZKIT), and Bluetechnix! Tinyboards.

    The Blackfin architecture was jointly developed by Intel and Analog Devices
    Inc. (ADI) as the Micro Signal Architecture (MSA) core and introduced in
    December 2000. Since then ADI has put this core into its Blackfin
    processor family of devices. The Blackfin core has the advantages of a clean,
    orthogonal, RISC-like microprocessor instruction set. It combines a dual-MAC
    (Multiply/Accumulate), state-of-the-art signal processing engine and
    single-instruction, multiple-data (SIMD) multimedia capabilities into a single
    instruction-set architecture.

    The Blackfin architecture, including the instruction set, is described by the
    ADSP-BF53x/BF56x Blackfin Processor Programming Reference
    http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf

    The Blackfin processor is already supported by major releases of gcc, and
    there are binary and source rpms/tarballs for many architectures at:
    http://blackfin.uclinux.org/gf/project/toolchain/frs
    Complete documentation, including "getting started" guides, is available at:
    http://docs.blackfin.uclinux.org/
    It provides links to the sources and patches you will need to set up a
    cross-compiling environment for bfin-linux-uclibc.

    This patch, as well as the other patches (toolchain, distribution,
    uClibc) are actively supported by Analog Devices Inc, at:
    http://blackfin.uclinux.org/

    We have tested this on LTP, and our test plan (including pass/fails) can
    be found at:
    http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel

    [m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files]
    Signed-off-by: Bryan Wu
    Signed-off-by: Mariusz Kozlowski
    Signed-off-by: Aubrey Li
    Signed-off-by: Jie Zhang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bryan Wu
     
  • This is a new slab allocator which was motivated by the complexity of the
    existing code in mm/slab.c. It attempts to address a variety of concerns
    with the existing implementation.

    A. Management of object queues

    A particular concern was the complex management of the numerous object
    queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
    each allocating CPU and use objects from a slab directly instead of
    queueing them up.

    B. Storage overhead of object queues

    SLAB object queues exist per node and per CPU. The alien cache queue
    even has a queue array that contains a queue for each processor on each
    node. For very large systems, the number of queues and the number of
    objects that may be caught in those queues grows much faster than the
    number of processors. On our systems with 1k nodes / processors we have
    several gigabytes tied up just in storing references to objects for
    those queues. This does not include the objects that could be on those
    queues. One fears that the whole memory of the machine could one day be
    consumed by those queues.

    C. SLAB meta data overhead

    SLAB has overhead at the beginning of each slab. This means that data
    cannot be naturally aligned at the beginning of a slab block. SLUB keeps
    all meta data in the corresponding struct page. Objects can be naturally
    aligned in the slab. For example, a 128 byte object will be aligned at
    128 byte boundaries and can fit tightly into a 4k page with no bytes left
    over. SLAB cannot do this.

    D. SLAB has a complex cache reaper

    SLUB does not need a cache reaper for UP systems. On SMP systems
    the per CPU slab may be pushed back into partial list but that
    operation is simple and does not require an iteration over a list
    of objects. SLAB expires per CPU, shared and alien object queues
    during cache reaping, which may cause strange hold-offs.

    E. SLAB has complex NUMA policy layer support

    SLUB pushes NUMA policy handling into the page allocator. This means that
    allocation is coarser (SLUB does interleave on a page level), but that
    situation was also present before 2.6.13. SLAB's application of
    policies to individual slab objects is certainly a performance concern,
    due to the frequent references to memory policies, and it may cause a
    sequence of objects to come from one node after another. SLUB will get
    a slab full of objects from one node and then switch to the next.

    F. Reduction of the size of partial slab lists

    SLAB has per node partial lists. This means that over time a large
    number of partial slabs may accumulate on those lists. These can
    only be reused if allocations occur on those specific nodes. SLUB has a global
    pool of partial slabs and will consume slabs from that pool to
    decrease fragmentation.

    G. Tunables

    SLAB has sophisticated tuning abilities for each slab cache. One can
    manipulate the queue sizes in detail. However, filling the queues still
    requires the use of the spin lock to check out slabs. SLUB has a global
    parameter (slub_min_order) for tuning. Increasing the minimum slab
    order can decrease the locking overhead. The bigger the slab order, the
    fewer movements of pages between per CPU and partial lists occur, and
    the better SLUB scales.

    H. Slab merging

    We often have slab caches with similar parameters. SLUB detects these
    at boot and merges them into the corresponding general caches, which
    leads to more effective memory use. About 50% of all caches can be
    eliminated through slab merging. Merging also decreases slab
    fragmentation, because partially allocated slabs can be filled up
    again. Slab merging can be switched off by specifying slub_nomerge
    at boot.

    Note that merging can expose previously unknown bugs in the kernel,
    because corrupted objects may now be placed differently and corrupt
    different neighboring objects. Enable sanity checks to find those.

    I. Diagnostics

    The current slab diagnostics are difficult to use and require
    recompiling the kernel. SLUB contains debugging code that is always
    available (but is kept out of the hot code paths). SLUB diagnostics
    can be enabled via the "slub_debug" boot option. Parameters can be
    specified to select a single slab cache or a group of slab caches for
    diagnostics. The rest of the system therefore runs with its usual
    performance, which makes it much more likely that race conditions can
    be reproduced.

    J. Resiliency

    If basic sanity checks are on, SLUB can detect common error conditions
    and recover as best it can, allowing the system to continue.

    K. Tracing

    Tracing can be enabled at boot via the slub_debug=T,<slab name>
    option. SLUB will then log all actions on that slab cache and dump
    the object contents on free.

    L. On-demand DMA cache creation

    Generally DMA caches are not needed. SLUB creates a DMA slab cache only
    when kmalloc is actually called with __GFP_DMA, and then only the single
    cache that is needed. For systems that have no ZONE_DMA requirement,
    the support is completely eliminated.

    M. Performance increase

    Some benchmarks have shown speed improvements of 5-10% on kernbench.
    SLUB's locking overhead depends on the underlying base allocation size
    (the slab order). If we can reliably allocate higher-order pages, SLUB
    performance can be increased much further. The anti-fragmentation
    patches may enable such gains.

    Tested on:
    i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

    SLUB Boot options

    slub_nomerge            Disable merging of slabs
    slub_min_order=x        Require a minimum order for slab caches. This
                            increases the managed chunk size and therefore
                            reduces meta data and locking overhead.
    slub_min_objects=x      Minimum objects per slab. Default is 8.
    slub_max_order=x        Avoid generating slabs larger than order specified.
    slub_debug              Enable all diagnostics for all caches
    slub_debug=<options>    Enable selective options for all caches
    slub_debug=<options>,<slab name>
                            Enable selective options for a certain set of
                            caches

    Available Debug options
    F Double Free checking, sanity and resiliency
    R Red zoning
    P Object / padding poisoning
    U Track last free / alloc
    T Trace all allocs / frees (only use for individual slabs).

    To use SLUB: Apply this patch and then select SLUB as the default slab
    allocator.

    [hugh@veritas.com: fix an oops-causing locking error]
    [akpm@linux-foundation.org: various stupid cleanups and small fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
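
The layout and slab-order arguments in sections C and G of the SLUB description above can be checked with a little arithmetic. What follows is a minimal user-space sketch, not SLUB code: it assumes 4 KiB pages and power-of-two object sizes, objs_per_slab() and slabs_needed() are hypothetical helpers, and a nonzero header_size merely stands in for in-slab metadata of the kind SLAB uses (its real header size differs).

```c
#include <stddef.h>

#define ASSUMED_PAGE_SIZE 4096UL /* assumption: 4 KiB base pages */

/* Round x up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Objects of obj_size bytes, each aligned to obj_size, that fit in a slab of
 * 2^order pages whose first header_size bytes hold metadata. header_size = 0
 * models keeping all metadata outside the slab (in struct page, as SLUB does).
 * *wasted receives the bytes not used by objects.
 */
static size_t objs_per_slab(unsigned int order, size_t obj_size,
			    size_t header_size, size_t *wasted)
{
	size_t slab_size = ASSUMED_PAGE_SIZE << order;
	size_t first = align_up(header_size, obj_size); /* first aligned slot */
	size_t n = first < slab_size ? (slab_size - first) / obj_size : 0;

	*wasted = slab_size - n * obj_size;
	return n;
}

/*
 * Slabs consumed to satisfy n_allocs allocations. Each exhausted per CPU
 * slab has to be replaced from the partial list or the page allocator, an
 * operation that needs locking, so fewer slabs means fewer locked steps.
 */
static size_t slabs_needed(unsigned int order, size_t obj_size, size_t n_allocs)
{
	size_t wasted;
	size_t per = objs_per_slab(order, obj_size, 0, &wasted);

	return per ? (n_allocs + per - 1) / per : 0;
}
```

Under these assumptions, objs_per_slab(0, 128, 0, &w) yields 32 objects with no waste, matching the "fits tightly" claim, while a 32-byte in-page header pushes the first aligned object to offset 128 and leaves room for only 31. Raising the order from 0 to 3 cuts the slabs needed for 1024 allocations of 256 bytes from 64 to 8.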
     
  • The nr_cpu_ids value is currently only calculated in smp_init. However, it
    may be needed earlier (SLUB needs it in kmem_cache_init!) and other kernel
    components may also want to allocate dynamically sized per cpu arrays before
    smp_init. So move the determination of possible cpus into sched_init()
    where we already loop over all possible cpus early in boot.

    Also initialize both nr_node_ids and nr_cpu_ids with the highest value they
    could take. If we have accidental users before these values are determined
    then the current value of 0 may cause too small per cpu and per node arrays
    to be allocated. If it is set to the maximum possible then we only waste
    some memory for early boot users.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
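
The reason for starting nr_cpu_ids at its maximum can be illustrated in user space. MAX_CPUS, nr_ids, alloc_per_cpu_array() and set_nr_ids() are hypothetical stand-ins for NR_CPUS, nr_cpu_ids and the boot code; the sketch only shows that an early caller over-allocates rather than under-allocates.

```c
#include <stdlib.h>

#define MAX_CPUS 64 /* hypothetical stand-in for NR_CPUS */

/*
 * Start at the largest value it could take. Starting at 0 would make an
 * early calloc(0, ...) return an array that no CPU index can safely use.
 */
static unsigned int nr_ids = MAX_CPUS;

/* An early-boot user sizing a per CPU array before nr_ids is final. */
static long *alloc_per_cpu_array(void)
{
	return calloc(nr_ids, sizeof(long));
}

/* Called once the possible CPUs are known (sched_init() in the patch). */
static void set_nr_ids(unsigned int n)
{
	nr_ids = n;
}
```

The cost is the one the text accepts: early users may waste some memory, but they never index past the end of their arrays.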
     

07 May, 2007

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (38 commits)
    kconfig: fix mconf segmentation fault
    kbuild: enable use of code from a different dir
    kconfig: error out if recursive dependencies are found
    kbuild: scripts/basic/fixdep segfault on pathological string-o-death
    kconfig: correct minor typo in Kconfig warning message.
    kconfig: fix path to modules.txt in Kconfig help
    usr/Kconfig: fix typo
    kernel-doc: alphabetically-sorted entries in index.html of 'htmldocs'
    kbuild: be more explicit on missing .config file
    kbuild: clarify the creation of the LOCALVERSION_AUTO string.
    kbuild: propagate errors from find in scripts/gen_initramfs_list.sh
    kconfig: refer to qt3 if we cannot find qt libraries
    kbuild: handle compressed cpio initramfs-es
    kbuild: ignore section mismatch warning for references from .paravirtprobe to .init.text
    kbuild: remove stale comment in modpost.c
    kbuild/mkuboot.sh: allow spaces in CROSS_COMPILE
    kbuild: fix make mrproper for Documentation/DocBook/man
    kbuild: remove kconfig binaries during make mrproper
    kconfig/menuconfig: do not hardcode '.config'
    kbuild: override build timestamp & version
    ...

    Linus Torvalds
     

03 May, 2007

3 commits


07 Mar, 2007

1 commit


21 Feb, 2007

1 commit

  • We frequently need the maximum number of possible processors in order to
    allocate arrays for all processors. So far this was done using
    highest_possible_processor_id(). However, what we need is the number of
    processors, not the highest id. Moreover, the number was so far calculated
    dynamically on each invocation. The number of possible processors does not
    change while the system is running, so we can calculate that number once.

    Signed-off-by: Christoph Lameter
    Cc: Frederik Deweerdt
    Cc: Neil Brown
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
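
Because the set of possible processors is fixed after boot, the value can be computed once and cached instead of being recomputed on every call. A minimal sketch, assuming a hypothetical 64-bit possible_map in place of the kernel's possible-CPU map; the names are illustrative, not the patch's. Arrays indexed by CPU number need a count of slots (highest possible id + 1), which equals the number of possible processors only when the ids have no holes: for a map with only CPUs 0 and 2, compute_nr_ids() returns 3.

```c
/* Hypothetical possible-CPU map: bit i set means CPU i may ever exist. */
static unsigned long long possible_map;

static unsigned int nr_ids;         /* cached once at boot */
static unsigned int times_computed; /* shows the scan runs only once */

/* Slots needed for arrays indexed by CPU number: highest possible id + 1. */
static unsigned int compute_nr_ids(void)
{
	unsigned int n = 0;

	times_computed++;
	for (unsigned int cpu = 0; cpu < 64; cpu++)
		if (possible_map & (1ULL << cpu))
			n = cpu + 1;
	return n;
}

/* The possible map never changes while running, so compute it once. */
static void init_nr_ids(void)
{
	nr_ids = compute_nr_ids();
}

/* Callers read the cached value instead of rescanning the map. */
static unsigned int get_nr_ids(void)
{
	return nr_ids;
}
```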
     

20 Feb, 2007

1 commit

  • powerpc gets:

    init/main.c: In function `do_basic_setup':
    init/main.c:714: warning: implicit declaration of function `init_irq_proc'

    but we cannot include linux/irq.h in generic code.

    Fix it by moving the declaration into linux/interrupt.h instead.

    And make sure all code that defines init_irq_proc() includes
    linux/interrupt.h.

    And nuke an ifdef-in-C

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

17 Feb, 2007

1 commit

  • With Ingo Molnar

    The tick-management code is the first user of the clockevents layer. It takes
    clock event devices from the clock events core and uses them to provide the
    periodic tick.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: john stultz
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

15 Feb, 2007

5 commits

  • * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits)
    [PATCH] x86-64: Remove mk_pte_phys()
    [PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386
    [PATCH] i386: fix 32-bit ioctls on x64_32
    [PATCH] x86: Unify pcspeaker platform device code between i386/x86-64
    [PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
    [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
    [PATCH] i386: Move mce_disabled to asm/mce.h
    [PATCH] i386: paravirt unhandled fallthrough
    [PATCH] x86_64: Wire up compat epoll_pwait
    [PATCH] x86: Don't require the vDSO for handling a.out signals
    [PATCH] i386: Fix Cyrix MediaGX detection
    [PATCH] i386: Fix warning in cpu initialization
    [PATCH] i386: Fix warning in microcode.c
    [PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
    [PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
    [PATCH] i386: Remove fastcall in paravirt.[ch]
    [PATCH] x86-64: Fix wrong gcc check in bitops.h
    [PATCH] x86-64: survive having no irq mapping for a vector
    [PATCH] i386: geode configuration fixes
    [PATCH] i386: add option to show more code in oops reports
    ...

    Linus Torvalds
     
  • With this change the sysctl inodes can be cached and nothing needs to be done
    when removing a sysctl table.

    For a cost of 2K code we will save about 4K of static tables (when we remove
    de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
    about half that on a 32bit arch.

    The speed feels about the same, even though we can now cache the sysctl
    dentries :(

    We get the core advantage that we don't need a 1 to 1 mapping between
    ctl table entries and proc files. This makes it possible to have /proc/sys
    vary depending on the namespace you are in. The currently merged namespaces
    don't have an issue here, but the network namespace under /proc/sys/net
    needs to have different directories depending on which network adapters
    are visible. Because /proc/sys is now simply a cache, making different
    directories visible depending on who you are is trivial to implement.

    [akpm@osdl.org: fix uninitialised var]
    [akpm@osdl.org: fix ARM build]
    [bunk@stusta.de: make things static]
    Signed-off-by: Eric W. Biederman
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • This is just a simple cleanup to keep kernel/sysctl.c from getting too crowded
    with special cases; keeping all of the ipc logic together also makes the
    code a little more readable.

    [gcoady.lk@gmail.com: build fix]
    Signed-off-by: Eric W. Biederman
    Cc: Serge E. Hallyn
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Signed-off-by: Grant Coady
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

13 Feb, 2007

2 commits

  • o init() is a non-__init function in the .text section, but it calls many
    functions in the .init.text section. Hence MODPOST generates lots of
    cross-reference warnings on i386 when compiled with CONFIG_RELOCATABLE=y:

    WARNING: vmlinux - Section mismatch: reference to .init.text:smp_prepare_cpus from .text between 'init' (at offset 0xc0101049) and 'rest_init'
    WARNING: vmlinux - Section mismatch: reference to .init.text:migration_init from .text between 'init' (at offset 0xc010104e) and 'rest_init'
    WARNING: vmlinux - Section mismatch: reference to .init.text:spawn_ksoftirqd from .text between 'init' (at offset 0xc0101053) and 'rest_init'

    o This patch splits init() into two parts: one that can go in the
    .init.text section and be freed, and another, init_post(), that has to
    stay non-__init. Now init() calls init_post(), and init_post() does not
    call any functions in .init sections, which gets rid of the warnings.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Vivek Goyal
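
The shape of the init()/init_post() split can be sketched as follows. This is a user-space illustration, not the patch itself: __init is defined as an empty macro here (in the kernel it places code in .init.text), and prepare_cpus() is a hypothetical stand-in for calls such as smp_prepare_cpus(). The rule it shows is that calls may go from init code into resident code, but not the other way round.

```c
/*
 * Stand-in for the kernel's __init annotation. Here it expands to nothing;
 * in the kernel it puts the function in .init.text, freed after boot.
 */
#define __init

static int prepared;

/* Boot-only helper (hypothetical); fine to call from other __init code. */
static void __init prepare_cpus(void)
{
	prepared = 1;
}

/* Stays resident in .text, so it must not call anything marked __init. */
static int init_post(void)
{
	/* ...free init memory, then start the first user-space program... */
	return prepared ? 0 : -1;
}

/* Boot-only part: may call __init code, then hands off to init_post(). */
static int __init init(void)
{
	prepare_cpus();
	return init_post();
}
```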
     
  • The current implementation stores a static command-line buffer of
    COMMAND_LINE_SIZE bytes. Most architectures store two copies of this buffer,
    one for future reference and one for parameter parsing.

    The current kernel command-line size for most architectures is much too small
    for module parameters, video settings, initramfs parameters and much more. The
    problem is that setting COMMAND_LINE_SIZE to a greater value enlarges these
    static buffers.

    In order to allow a greater command-line size, these buffers should be
    dynamically allocated or marked as init disposable buffers, so unused memory
    can be released.

    This patch renames the static saved_command_line variable to
    boot_command_line and adds the __initdata attribute, so that it can be
    disposed of after initialization. The rename is required so that code
    that uses saved_command_line will not be affected by this change.

    It reintroduces saved_command_line as a dynamically allocated buffer that
    matches the data in boot_command_line.

    It also marks the secondary command-line buffer as __initdata and copies it
    to a dynamically allocated static_command_line buffer, since components may
    hold references to it after initialization.

    This patch is for linux-2.6.20-rc4-mm1 and is divided into one part per
    architecture. I could not check this on any architecture, so please forgive
    me if I got it wrong.

    The per-architecture modification is very simple, use boot_command_line in
    place of saved_command_line. The common code is the change into dynamic
    command-line.

    This patch:

    1. Rename saved_command_line to boot_command_line, mark as init
    disposable.

    2. Add dynamically allocated saved_command_line.

    3. Add dynamically allocated static_command_line.

    4. During startup, copy boot_command_line into saved_command_line, and
    the arch command_line into static_command_line.

    5. Parse static_command_line and not the arch command_line, so the arch
    command_line may be freed.

    Signed-off-by: Alon Bar-Lev
    Cc: Andi Kleen
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: Ian Molton
    Cc: Mikael Starvik
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: Ralf Baechle
    Cc: Kyle McMartin
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Hirokazu Takata
    Cc: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Richard Curnow
    Cc: William Lee Irwin III
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Cc: Miles Bader
    Cc: Chris Zankel
    Cc: "Luck, Tony"
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alon Bar-Lev
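
The copying in step 4 of the list above can be sketched in user space. malloc() stands in for the kernel's boot-time allocator, COMMAND_LINE_SIZE is given an arbitrary value, and dup_string() and setup_command_line() are illustrative names; only the buffer names come from the description.

```c
#include <stdlib.h>
#include <string.h>

#define COMMAND_LINE_SIZE 256 /* arbitrary value assumed for this sketch */

/* Boot-time buffers; in the kernel these become __initdata and are freed. */
static char boot_command_line[COMMAND_LINE_SIZE];
static char arch_command_line[COMMAND_LINE_SIZE];

/* Runtime copies, allocated to exactly the length needed. */
static char *saved_command_line;
static char *static_command_line;

static char *dup_string(const char *s)
{
	char *p = malloc(strlen(s) + 1);

	if (p)
		strcpy(p, s);
	return p;
}

/* Step 4: copy both boot-time lines into right-sized dynamic buffers. */
static int setup_command_line(void)
{
	saved_command_line = dup_string(boot_command_line);
	static_command_line = dup_string(arch_command_line);
	return saved_command_line && static_command_line ? 0 : -1;
}
```

Because the runtime copies are sized to their contents, enlarging COMMAND_LINE_SIZE would only grow buffers that are discarded after boot.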
     

12 Feb, 2007

4 commits

  • Since they depend on TASKSTATS, it would be nice to move them closer to
    the other options depending on TASKSTATS.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Remove the last (and commented out) invocation of the obsolete
    smp_commence() call.

    Signed-off-by: Robert P. J. Day
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • The file init/initramfs.c is always compiled and linked in the kernel
    vmlinux even when BLK_DEV_RAM and BLK_DEV_INITRD are disabled and the
    system isn't using any form of an initramfs or initrd. In this situation
    the code is only used to unpack a (static) default initial root filesystem.
    The current init/initramfs.c code and usr/initramfs_data.o compile to
    ~15 Kbytes. Disabling BLK_DEV_RAM and BLK_DEV_INITRD shrinks the kernel
    code size by ~60 Kbytes.

    This patch avoids compiling in the code and data for initramfs support if
    CONFIG_BLK_DEV_INITRD is not defined. Instead of the initramfs code and
    data, it uses a small routine in init/noinitramfs.c to set up an initial
    static default environment for mounting a root filesystem later on in the
    kernel initialisation process. The new code is 164 bytes in size.

    The patch is separated in two parts:
    1) doesn't compile initramfs code when CONFIG_BLK_DEV_INITRD is not set
    2) changing all platforms' vmlinux.lds.S files to not reserve an area of
    PAGE_SIZE when CONFIG_BLK_DEV_INITRD is not set.

    [deweerdt@free.fr: warning fix]
    Signed-off-by: Jean-Paul Saman
    Cc: Al Viro
    Cc:
    Signed-off-by: Frederik Deweerdt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean-Paul Saman
     
  • Add retain_initrd option to control freeing of initrd memory after
    extraction. By default, free memory as previously.

    The first boot will need to hold a copy of the in memory fs for the second
    boot. This image can be large (much larger than the kernel), hence we can
    save time when the memory loader is slow. Also, it reduces the memory
    footprint while extracting the first boot since you don't need another copy
    of the fs.

    Signed-off-by: Michael Neuling
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Neuling
     

12 Jan, 2007

1 commit

  • It might save a few bytes after bootup, but it causes the string to be
    linked in at the end of the final vmlinux image, which defeats the whole
    point of doing all this, namely allowing some broken user-space binaries
    to search for the kernel version string in the kernel binary.

    So just remove the __init specifier.

    Cc: Olaf Hering
    Cc: Jean Delvare
    Cc: Roman Zippel
    Cc: Andrey Borzenkov
    Cc: Andrew Morton
    Acked-by: Andy Whitcroft
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Jan, 2007

2 commits

  • o Some functions are called only once and should therefore have been in
    init sections; put them there. Otherwise MODPOST generates warnings, as
    these functions are placed in .text and end up accessing something in
    init sections.

    WARNING: vmlinux - Section mismatch: reference to .init.text:migration_init
    from .text between 'do_pre_smp_initcalls' (at offset 0xc01000d1) and
    'run_init_process'

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen

    Vivek Goyal
     
  • Revert previous attempts at messing with the linux banner string and
    simply use a separate format string for proc.

    Signed-off-by: Roman Zippel
    Acked-by: Olaf Hering
    Acked-by: Jean Delvare
    Cc: Andrey Borzenkov
    Cc: Andrew Morton
    Cc: Andy Whitcroft
    Cc: Herbert Poetzl
    Signed-off-by: Linus Torvalds

    Roman Zippel
     

06 Jan, 2007

1 commit

  • The calls made by parse_parms to other initialization code might enable
    interrupts again way too early.

    Having interrupts on this early can make systems PANIC when they initialize
    the IRQ controllers (which happens later in the code). This patch detects
    that IRQs have been enabled again, barfs about it and disables them again
    as a safety net.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Ard van Breemen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ard van Breemen
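
The safety net described above is a check-warn-restore step after parameter parsing. In this user-space sketch, irqs_on, buggy_param_handler() and parse_early_params() are hypothetical; irqs_disabled() and local_irq_disable() borrow the names of the kernel helpers but here only read and clear a flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simulated CPU interrupt flag and accessors (not the kernel's). */
static bool irqs_on;
static bool irqs_disabled(void) { return !irqs_on; }
static void local_irq_disable(void) { irqs_on = false; }

/* Hypothetical early parameter handler that wrongly enables interrupts. */
static void buggy_param_handler(void) { irqs_on = true; }

static void parse_early_params(void)
{
	buggy_param_handler();

	/* Safety net: complain loudly, then restore the expected state. */
	if (!irqs_disabled()) {
		fprintf(stderr, "bug: interrupts were enabled very early, fixing it\n");
		local_irq_disable();
	}
}
```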
     

23 Dec, 2006

3 commits

  • compile.h is created super-late in the build. But proc_misc.c wants to include
    it, and it's generally not sane to have a header file in include/linux be
    created at the end of the build: it's either not present or, worse, wrong for
    most of the build.

    So the patch arranges for compile.h to be built at the start of the build
    process. It also consolidates the compile.h rules with those for version.h
    and utsname.h, so they all get built together.

    I hope. My chances of having got this right are about 2%.

    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This disallows building SLOB with SMP or SPARSEMEM, which avoids latent
    trouble between SLOB and SLAB_DESTROY_BY_RCU. It also fixes a compile error.

    Signed-off-by: Yasunori Goto
    Acked-by: Randy Dunlap
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • The VM event counters, enabled by CONFIG_VM_EVENT_COUNTERS, which provide
    VM event counters in /proc/vmstat, have become more essential to
    non-EMBEDDED kernel configurations than they were in the past. Comments in
    the code and the Kconfig configuration explanation were stale, downplaying
    their role excessively.

    Refresh those comments to correctly reflect the current role of VM event
    counters.

    Signed-off-by: Paul Jackson
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

21 Dec, 2006

1 commit


13 Dec, 2006

1 commit


12 Dec, 2006

2 commits

  • We should not initialize rootfs before all the core initializers have
    run. So do it as a separate stage just before starting the regular
    driver initializers.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • As reported by Andy Whitcroft, at least the SLES9 initrd build process
    depends on getting the kernel version from the kernel binary. It does
    that by simply trawling the binary and looking for the signature of the
    "linux_banner" string (the string "Linux version " to be exact. Which
    is really broken in itself, but whatever..)

    That got broken when the string was changed to allow /proc/version to
    change the UTS release information dynamically, and "get_kernel_version"
    thus returned "%s" (see commit a2ee8649ba6d71416712e798276bf7c40b64e6e5:
    "[PATCH] Fix linux banner utsname information").

    This just restores "linux_banner" as a static string, which should fix
    the version finding. And /proc/version simply uses a different string.

    To avoid wasting even that minuscule amount of memory, the early boot
    string should really be marked __initdata, but that just causes the same
    bug in SLES9 to re-appear, since it will then find other occurrences of
    "Linux version " first.

    Cc: Andy Whitcroft
    Acked-by: Herbert Poetzl
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Steve Fox
    Acked-by: Olaf Hering
    Signed-off-by: Linus Torvalds

    Linus Torvalds
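
The arrangement described above, one literal string for binary scanners and a separate format string for /proc/version, can be sketched as follows. The banner text, the simplified format and both helper functions are hypothetical; the kernel assembles its real strings from build-time macros. The banner is kept as ordinary read-only data, since marking it __initdata would reintroduce the SLES9 problem.

```c
#include <stdio.h>
#include <string.h>

/* Plain string kept in the image; contents are a hypothetical example. */
static const char linux_banner[] =
	"Linux version 2.6.20 (builder@host) (gcc version 4.1.2) #1 SMP\n";

/* Separate, simplified format string used only for /proc/version. */
static const char linux_proc_banner[] = "%s version %s (builder@host) %s\n";

/* What a naive tool looks for when trawling the kernel binary. */
static int has_version_signature(const char *s)
{
	return strncmp(s, "Linux version ", 14) == 0;
}

/* Fill /proc/version from the (possibly per-namespace) uts values. */
static int format_proc_version(char *buf, size_t len, const char *sysname,
			       const char *release, const char *version)
{
	return snprintf(buf, len, linux_proc_banner, sysname, release, version);
}
```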
     

11 Dec, 2006

1 commit

  • The present per-task IO accounting isn't very useful. It simply counts the
    number of bytes passed into read() and write(). So if a process reads 1MB
    from an already-cached file, it is accused of having performed 1MB of I/O,
    which is wrong.

    (David Wright had some comments on the applicability of the present logical IO accounting:

    For billing purposes it is useless but for workload analysis it is very
    useful

    read_bytes/read_calls average read request size
    write_bytes/write_calls average write request size

    read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
    write_bytes/write_blocks ie logical/physical guess since pdflush writes can
    be missed

    I often look for logical larger than physical to see filesystem cache
    problems. And the bytes/cpusec can help find applications that are
    dominating the cache and causing slow interactive response from page cache
    contention.

    I want to find the IO intensive applications and make sure they are doing
    efficient IO. Thus the acctcms(sysV) or csacms command would give the high
    IO commands).

    This patchset adds new accounting which tries to be more accurate. We account
    for three things:

    reads:

    attempt to count the number of bytes which this process really did cause
    to be fetched from the storage layer. Done at the submit_bio() level, so it
    is accurate for block-backed filesystems. I also attempt to wire up NFS and
    CIFS.

    writes:

    attempt to count the number of bytes which this process caused to be sent
    to the storage layer. This is done at page-dirtying time.

    The big inaccuracy here is truncate. If a process writes 1MB to a file
    and then deletes the file, it will in fact perform no writeout. But it will
    have been accounted as having caused 1MB of write.

    So...

    cancelled_writes:

    account the number of bytes which this process caused to not happen, by
    truncating pagecache.

    We _could_ just subtract this from the process's `write' accounting. But
    that means that some processes would be reported to have done negative
    amounts of write IO, which is silly.

    So we just report the raw number and punt this decision up to userspace.

    Now, we _could_ account for writes at the physical I/O level. But

    - This would require that we track memory-dirtying tasks at the per-page
    level (would require a new pointer in struct page).

    - It would mean that IO statistics for a process are usually only available
    long after that process has exited. Which means that we probably cannot
    communicate this info via taskstats.

    This patch:

    Wire up the kernel-private data structures and the accessor functions to
    manipulate them.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
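
The split between raw kernel counters and userspace interpretation can be sketched as below. The struct and net_write_bytes() are illustrative assumptions, not the patch's definitions; clamping at zero is one policy a userspace tool might choose, which is exactly the decision the text leaves to userspace. A task can cancel more than it wrote, for example by truncating pages that another task dirtied.

```c
#include <stdint.h>

/* Per-task counters as described in the text (names are illustrative). */
struct task_io {
	uint64_t read_bytes;            /* fetched from the storage layer */
	uint64_t write_bytes;           /* counted at page-dirtying time */
	uint64_t cancelled_write_bytes; /* dirty pagecache this task truncated */
};

/*
 * Userspace policy, not kernel policy: estimate net writeout, clamping at
 * zero so a task is never reported as having written negative bytes.
 */
static uint64_t net_write_bytes(const struct task_io *io)
{
	if (io->cancelled_write_bytes >= io->write_bytes)
		return 0;
	return io->write_bytes - io->cancelled_write_bytes;
}
```

For the truncate example in the text (write 1MB, then delete the file), write_bytes and cancelled_write_bytes are both 1MB and the estimate is 0.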
     

09 Dec, 2006

2 commits

  • Add a per pid_namespace child-reaper. This is needed so processes are reaped
    within the same pid space and do not spill over to the parent pid space. It's
    also needed so that containers preserve the existing semantics that pid == 1
    reaps orphaned children.

    This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
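
A minimal sketch of the idea, with hypothetical struct layouts far simpler than the kernel's task and pid namespace structures: each namespace records its own reaper, and an orphan is reparented to the reaper of the namespace it lives in rather than to the global init.

```c
#include <stddef.h>

struct task;

/* Each pid namespace records which task plays the role of pid 1. */
struct pid_ns {
	struct pid_ns *parent;
	struct task *child_reaper;
};

struct task {
	int pid;                 /* pid as seen inside its own namespace */
	struct pid_ns *ns;
	struct task *real_parent;
};

/* On parent exit, hand the orphan to its own namespace's reaper. */
static void reparent_orphan(struct task *child)
{
	child->real_parent = child->ns->child_reaper;
}
```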
     
  • utsname information is shown in the linux banner, which also is used for
    /proc/version (which can have different utsname values inside a uts
    namespace). This patch turns the varying data into arguments and changes the
    string to a format string that uses those arguments.

    Signed-off-by: Herbert Poetzl
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Herbert Poetzl
     

08 Dec, 2006

3 commits

  • * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits)
    [PATCH] x86-64: Export smp_call_function_single
    [PATCH] i386: Clean up smp_tune_scheduling()
    [PATCH] unwinder: move .eh_frame to RODATA
    [PATCH] unwinder: fully support linker generated .eh_frame_hdr section
    [PATCH] x86-64: don't use set_irq_regs()
    [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
    [PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM
    [PATCH] i386: replace kmalloc+memset with kzalloc
    [PATCH] x86-64: remove remaining pc98 code
    [PATCH] x86-64: remove unused variable
    [PATCH] x86-64: Fix constraints in atomic_add_return()
    [PATCH] x86-64: fix asm constraints in i386 atomic_add_return
    [PATCH] x86-64: Correct documentation for bzImage protocol v2.05
    [PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code
    [PATCH] x86-64: Fix numaq build error
    [PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header
    [PATCH] unwinder: Add debugging output to the Dwarf2 unwinder
    [PATCH] x86-64: Clarify error message in GART code
    [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
    [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
    ...

    Fixed conflict in include/linux/uaccess.h manually

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Keith says

    Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux), wait_hpet_tick is
    optimized away to a never ending loop and the kernel hangs on boot in timer
    setup.

    0000001a <wait_hpet_tick>:
      1a:   55      push   %ebp
      1b:   89 e5   mov    %esp,%ebp
      1d:   eb fe   jmp    1d

    This is not a problem with gcc 3.3.5. Adding barrier() calls to
    wait_hpet_tick does not help, making the variables volatile does.

    And the consensus is that gcc-4.1.0 is busted. SUSE went and shipped
    gcc-4.1.0 so we cannot ban it. Add a warning.

    Cc: Keith Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
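
A build-time warning like the one this commit adds can be keyed off GCC's predefined version macros. The sketch below is illustrative rather than the exact hunk from the patch; BROKEN_GCC and built_with_broken_gcc() are hypothetical names.

```c
/* GCC's predefined version macros identify the exact compiler release. */
#if defined(__GNUC__) && __GNUC__ == 4 && __GNUC_MINOR__ == 1 && \
	__GNUC_PATCHLEVEL__ == 0
#define BROKEN_GCC 1
#warning gcc-4.1.0 is known to miscompile the kernel. A different compiler version is recommended.
#else
#define BROKEN_GCC 0
#endif

/* Runtime view of the same check, e.g. for a boot-time message. */
static int built_with_broken_gcc(void)
{
	return BROKEN_GCC;
}
```

A warning rather than a hard #error keeps kernels buildable with the compiler SUSE shipped, as the text requires.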
     
  • It turns out that the "-c" option of cpio is highly unportable even between
    distros let alone unix variants, and may actually make the wrong type of
    cpio archive. I just wasted quite some time on this, and the kernel can
    detect this and warn about it (it's __init memory, so it gets thrown away
    and there is no runtime overhead).

    Signed-off-by: Arjan van de Ven
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
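
The kind of check described above can be sketched by looking at the six-character ASCII magic at the start of a cpio header: "070701" marks the "newc" format the initramfs unpacker expects, while "070707" marks the older portable ASCII format that some cpio versions emit for -c. The enum and function names are hypothetical, and the sketch assumes at least six readable bytes at hdr.

```c
#include <string.h>

enum cpio_kind { CPIO_NEWC, CPIO_OLD_ASCII, CPIO_UNKNOWN };

/* Classify a cpio header by its 6-byte ASCII magic. */
static enum cpio_kind cpio_magic(const char *hdr)
{
	if (memcmp(hdr, "070701", 6) == 0)
		return CPIO_NEWC;
	if (memcmp(hdr, "070707", 6) == 0)
		return CPIO_OLD_ASCII; /* wrong method: rebuild with -H newc */
	return CPIO_UNKNOWN;
}
```

Building the archive with cpio -H newc avoids the problem regardless of what -c means on a given system.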