02 Nov, 2008

1 commit

  • Removed duplicated #include in init/do_mounts_md.c.

    The same compile error ("error: implicit declaration of function
    'msleep'") got fixed twice:

    - f8b77d39397e1510b1a3bcfd385ebd1a45aae77f ("init/do_mounts_md.c:
    msleep compile fix")

    - 73b4a24f5ff09389ba6277c53a266b142f655ed2 ("init/do_mounts_md.c must
    #include <linux/delay.h>")

    by people adding the include in two slightly different
    places. Andrew's quilt scripts happily ignore the fuzz, and will
    re-apply the patch even though they had conflicts.

    Signed-off-by: Huang Weiyi
    Signed-off-by: Linus Torvalds

    Huang Weiyi
     

31 Oct, 2008

2 commits


26 Oct, 2008

1 commit

  • This reverts commit a802dd0eb5fc97a50cf1abb1f788a8f6cc5db635 by moving
    the call to init_workqueues() back where it belongs - after SMP has been
    initialized.

    It also moves stop_machine_init() - which needs workqueues - to a later
    phase using a core_initcall() instead of early_initcall(). That should
    satisfy all ordering requirements, and was apparently the reason why
    init_workqueues() was moved to be too early.

    Cc: Heiko Carstens
    Cc: Rusty Russell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

24 Oct, 2008

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (46 commits)
    [PATCH] fs: add a sanity check in d_free
    [PATCH] i_version: remount support
    [patch] vfs: make security_inode_setattr() calling consistent
    [patch 1/3] FS_MBCACHE: don't needlessly make it built-in
    [PATCH] move executable checking into ->permission()
    [PATCH] fs/dcache.c: update comment of d_validate()
    [RFC PATCH] touch_mnt_namespace when the mount flags change
    [PATCH] reiserfs: add missing llseek method
    [PATCH] fix ->llseek for more directories
    [PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent
    [PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup
    [PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()
    [PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper
    [PATCH vfs-2.6 2/6] vfs: add d_ancestor()
    [PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()
    [PATCH] get rid of on-stack dentry in udf
    [PATCH 2/2] anondev: switch to IDA
    [PATCH 1/2] anondev: init IDR statically
    [JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup()
    [PATCH] Optimise NFS readdir hack slightly.
    ...

    Linus Torvalds
     
  • * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (32 commits)
    PCI hotplug: fix logic in Compaq hotplug controller bus speed setup
    PCI: don't export linux/io.h from pci.h
    PCI: PCI_QUIRKS depends on PCI
    PCI hotplug: pciehp: poll data link layer link active
    PCI hotplug: pciehp: fix possible memory leak in pcie_init
    PCI: Workaround invalid P2P bridge bus numbers
    PCI Hotplug: fakephp: add duplicate slot name debugging
    PCI: Hotplug core: remove 'name'
    PCI: shcphp: remove 'name' parameter
    PCI: SGI Hotplug: stop managing bss_hotplug_slot->name
    PCI: rpaphp: kmalloc/kfree slot->name directly
    PCI: pciehp: remove 'name' parameter
    PCI: ibmphp: stop managing hotplug_slot->name
    PCI: fakephp: remove 'name' parameter
    PCI, PCI Hotplug: introduce slot_name helpers
    PCI: cpqphp: stop managing hotplug_slot->name
    PCI: cpci_hotplug: stop managing hotplug_slot->name
    PCI: acpiphp: remove 'name' parameter
    PCI: prevent duplicate slot names
    PCI Hotplug: serialize pci_hp_register and pci_hp_deregister
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    stop_machine: fix error code handling on multiple cpus
    stop_machine: use workqueues instead of kernel threads
    workqueue: introduce create_rt_workqueue
    Call init_workqueues before pre smp initcalls.
    Make panic= and panic_on_oops into core_params
    Make initcall_debug a core_param
    core_param() for genuinely core kernel parameters
    param: Fix duplicate module prefixes
    module: check kernel param length at compile time, not runtime
    Remove stop_machine during module load v2
    module: simplify load_module.

    Linus Torvalds
     

23 Oct, 2008

3 commits

  • page_cgroup_init() is called from mem_cgroup_init(), but at that
    point we can no longer call alloc_bootmem(), and this caused a panic
    at boot.

    This patch moves the page_cgroup_init() call to init/main.c.

    The resulting order of events is:
    ==
    parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
    ....
    cgroup_init_early() # "early" init of cgroup.
    ....
    setup_arch() # memmap is allocated.
    ...
    page_cgroup_init();
    mem_init(); # we cannot call alloc_bootmem after this.
    ....
    cgroup_init() # mem_cgroup is initialized.
    ==

    mem_map must be initialized before page_cgroup_init() runs, so
    I added the page_cgroup_init() call to init/main.c directly.

    (*) This may not be very clean, but
    - cgroup_init_early() is too early, and
    - in cgroup_init() we would have to use vmalloc() instead of
    alloc_bootmem(). The vmalloc area is a scarce resource on x86-32, so
    very large vmalloc() calls should be avoided there. We therefore use
    alloc_bootmem() and call page_cgroup_init() directly from init/main.c.

    [akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
    [akpm@linux-foundation.org: fix build]
    Acked-by: Balbir Singh
    Tested-by: Balbir Singh
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Signed-off-by: Alexey Dobriyan

    Alexey Dobriyan
     
  • commit 3d137310245e4cdc3e8c8ba1bea2e145a87ae8e3 ("PCI: allow quirks to be
    compiled out") introduced CONFIG_PCI_QUIRKS, which now shows up in each
    and every .config. Fix this by making it depend on PCI.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Jesse Barnes

    Geert Uytterhoeven
     

22 Oct, 2008

2 commits


21 Oct, 2008

3 commits

  • * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (41 commits)
    PCI: fix pci_ioremap_bar() on s390
    PCI: fix AER capability check
    PCI: use pci_find_ext_capability everywhere
    PCI: remove #ifdef DEBUG around dev_dbg call
    PCI hotplug: fix get_##name return value problem
    PCI: document the pcie_aspm kernel parameter
    PCI: introduce an pci_ioremap(pdev, barnr) function
    powerpc/PCI: Add legacy PCI access via sysfs
    PCI: Add ability to mmap legacy_io on some platforms
    PCI: probing debug message uniformization
    PCI: support PCIe ARI capability
    PCI: centralize the capabilities code in probe.c
    PCI: centralize the capabilities code in pci-sysfs.c
    PCI: fix 64-vbit prefetchable memory resource BARs
    PCI: replace cfg space size (256/4096) by macros.
    PCI: use resource_size() everywhere.
    PCI: use same arg names in PCI_VDEVICE comment
    PCI hotplug: rpaphp: make debug var unique
    PCI: use %pF instead of print_fn_descriptor_symbol() in quirks.c
    PCI: fix hotplug get_##name return value problem
    ...

    Linus Torvalds
     
  • * 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (131 commits)
    tracing/fastboot: improve help text
    tracing/stacktrace: improve help text
    tracing/fastboot: fix initcalls disposition in bootgraph.pl
    tracing/fastboot: fix bootgraph.pl initcall name regexp
    tracing/fastboot: fix issues and improve output of bootgraph.pl
    tracepoints: synchronize unregister static inline
    tracepoints: tracepoint_synchronize_unregister()
    ftrace: make ftrace_test_p6nop disassembler-friendly
    markers: fix synchronize marker unregister static inline
    tracing/fastboot: add better resolution to initcall debug/tracing
    trace: add build-time check to avoid overrunning hex buffer
    ftrace: fix hex output mode of ftrace
    tracing/fastboot: fix initcalls disposition in bootgraph.pl
    tracing/fastboot: fix printk format typo in boot tracer
    ftrace: return an error when setting a nonexistent tracer
    ftrace: make some tracers reentrant
    ring-buffer: make reentrant
    ring-buffer: move page indexes into page headers
    tracing/fastboot: only trace non-module initcalls
    ftrace: move pc counter in irqtrace
    ...

    Manually fix conflicts:
    - init/main.c: initcall tracing
    - kernel/module.c: verbose level vs tracepoints
    - scripts/bootgraph.pl: fallout from cherry-picking commits.

    Linus Torvalds
     
  • This patch adds the CONFIG_PCI_QUIRKS option, which allows all the PCI
    quirks to be removed. They are not necessarily needed on embedded
    systems where PCI works properly. As this is a size-reduction option,
    it depends on CONFIG_EMBEDDED. It saves almost 12 kilobytes of kernel
    code:

       text    data     bss     dec    hex filename
    1287806  123596  212992 1624394 18c94a vmlinux.old
    1275854  123596  212992 1612442 189a9a vmlinux
     -11952       0       0  -11952  -2EB0 +/-
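
    The "+/-" row is the per-column difference between the two builds, and
    "dec" is the sum of text, data and bss. A minimal C sketch of that
    arithmetic follows. The struct and function names are illustrative
    only and are not part of the kernel or of the size tool.

```c
#include <stdio.h>

/* One row of size output; the field names are illustrative. */
struct size_row {
    long text;
    long data;
    long bss;
};

/* Total image size, as shown in the "dec" column. */
long size_total(struct size_row r)
{
    return r.text + r.data + r.bss;
}

/* Per-column difference (new - old), i.e. the "+/-" row. */
struct size_row size_delta(struct size_row old_row, struct size_row new_row)
{
    struct size_row d = {
        new_row.text - old_row.text,
        new_row.data - old_row.data,
        new_row.bss - old_row.bss,
    };
    return d;
}

/* Print the delta in the same column order as the table. */
void print_delta(struct size_row old_row, struct size_row new_row)
{
    struct size_row d = size_delta(old_row, new_row);

    printf("%ld %ld %ld %ld\n", d.text, d.data, d.bss, size_total(d));
}
```

    Feeding in the vmlinux.old and vmlinux rows reproduces the -11952
    byte difference in the text column, which is 0x2EB0 in hex.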

    This patch was originally written by Zwane Mwaikambo
    and is part of the Linux Tiny project.

    Signed-off-by: Thomas Petazzoni
    Signed-off-by: Jesse Barnes

    Thomas Petazzoni
     

20 Oct, 2008

2 commits

  • This patch implements a new freezer subsystem in the control groups
    framework. It provides a way to stop and resume execution of all tasks in
    a cgroup by writing in the cgroup filesystem.

    The freezer subsystem in the container filesystem defines a file named
    freezer.state. Writing "FROZEN" to the state file will freeze all tasks
    in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
    the cgroup. Reading will return the current state.

    * Examples of usage :

    # mkdir /containers/freezer
    # mount -t cgroup -ofreezer freezer /containers
    # mkdir /containers/0
    # echo $some_pid > /containers/0/tasks

    to get status of the freezer subsystem :

    # cat /containers/0/freezer.state
    RUNNING

    to freeze all tasks in the container :

    # echo FROZEN > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    FREEZING
    # cat /containers/0/freezer.state
    FROZEN

    to unfreeze all tasks in the container :

    # echo RUNNING > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    RUNNING

    This is the basic mechanism, which should do the right thing for user
    space tasks in a simple scenario.

    It's important to note that freezing can be incomplete. In that case we
    return EBUSY. This means that some tasks in the cgroup are busy doing
    something that prevents us from completely freezing the cgroup at this
    time. After EBUSY, the cgroup will remain partially frozen -- reflected
    by freezer.state reporting "FREEZING" when read. The state will remain
    "FREEZING" until one of these things happens:

    1) Userspace cancels the freezing operation by writing "RUNNING" to
    the freezer.state file
    2) Userspace retries the freezing operation by writing "FROZEN" to
    the freezer.state file (writing "FREEZING" is not legal
    and returns EIO)
    3) The tasks that blocked the cgroup from entering the "FROZEN"
    state disappear from the cgroup's set of tasks.
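
    The state transitions described above can be modelled in a few lines
    of userspace C. This is only a toy model, not the kernel code: the
    toy_* names and the busy_tasks counter are hypothetical. Two
    behaviours are assumptions rather than something the text states:
    the model returns -EIO for every unrecognized string (the text only
    confirms this for "FREEZING"), and it notices on read that the
    blocking tasks have gone away.

```c
#include <errno.h>
#include <string.h>

enum toy_freezer_state { TOY_RUNNING, TOY_FREEZING, TOY_FROZEN };

/* Toy cgroup: current state plus a count of tasks that cannot freeze yet. */
struct toy_cgroup {
    enum toy_freezer_state state;
    int busy_tasks;
};

/* Model of a write to freezer.state; returns 0 or a negative errno. */
int toy_freezer_write(struct toy_cgroup *cg, const char *buf)
{
    if (strcmp(buf, "RUNNING") == 0) {
        cg->state = TOY_RUNNING;      /* thaw, or cancel a partial freeze */
        return 0;
    }
    if (strcmp(buf, "FROZEN") == 0) {
        if (cg->busy_tasks > 0) {
            cg->state = TOY_FREEZING; /* freeze is incomplete */
            return -EBUSY;
        }
        cg->state = TOY_FROZEN;
        return 0;
    }
    return -EIO;                      /* includes writing "FREEZING" */
}

/* Model of a read from freezer.state. */
const char *toy_freezer_read(struct toy_cgroup *cg)
{
    /* Case 3: the blocking tasks left the cgroup, so the freeze completes. */
    if (cg->state == TOY_FREEZING && cg->busy_tasks == 0)
        cg->state = TOY_FROZEN;

    switch (cg->state) {
    case TOY_FREEZING:
        return "FREEZING";
    case TOY_FROZEN:
        return "FROZEN";
    default:
        return "RUNNING";
    }
}
```

    In this model a caller that gets -EBUSY can either write "RUNNING" to
    cancel, write "FROZEN" again to retry, or simply wait and read the
    state again.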

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: export thaw_process]
    Signed-off-by: Cedric Le Goater
    Signed-off-by: Matt Helsley
    Acked-by: Serge E. Hallyn
    Tested-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley
     
  • Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
    provide a fast, scalable percpu frontend for small vmaps (requires a
    slightly different API, though).

    The biggest problem with vmap is actually vunmap. Presently this requires
    a global kernel TLB flush, which on most architectures is a broadcast IPI
    to all CPUs to flush the cache. This is all done under a global lock. As
    the number of CPUs increases, so will the number of vunmaps a scaled
    workload will want to perform, and so will the cost of a global TLB flush.
    This gives terrible quadratic scalability characteristics.

    Another problem is that the entire vmap subsystem works under a single
    lock. It is a rwlock, but it is actually taken for write in all the fast
    paths, and the read locking would likely never be run concurrently anyway,
    so it's just pointless.

    This is a rewrite of vmap subsystem to solve those problems. The existing
    vmalloc API is implemented on top of the rewritten subsystem.

    The TLB flushing problem is solved by using lazy TLB unmapping. vmap
    addresses do not have to be flushed immediately when they are vunmapped,
    because the kernel will not reuse them again (would be a use-after-free)
    until they are reallocated. So the addresses aren't allocated again until
    a subsequent TLB flush. A single TLB flush then can flush multiple
    vunmaps from each CPU.
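
    The idea of deferring the flush can be illustrated with a toy
    userspace model. It is a sketch under stated assumptions, not the
    allocator from this patch: the slot array, the TOY_LAZY_MAX threshold
    and all toy_* names are invented for illustration, and a "flush" is
    just a counter increment.

```c
#include <stddef.h>

#define TOY_SLOTS    8  /* size of the toy address space (arbitrary) */
#define TOY_LAZY_MAX 4  /* flush once this many unmaps are pending (arbitrary) */

enum toy_slot_state { TOY_FREE, TOY_MAPPED, TOY_LAZY };

struct toy_vmap {
    enum toy_slot_state slot[TOY_SLOTS];
    int lazy_count;  /* unmapped but not yet flushed */
    int flushes;     /* number of simulated global TLB flushes */
};

/* One simulated global flush: every lazily unmapped slot becomes reusable. */
void toy_flush(struct toy_vmap *v)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (v->slot[i] == TOY_LAZY)
            v->slot[i] = TOY_FREE;
    v->lazy_count = 0;
    v->flushes++;
}

/* Map: take a free slot, flushing only if lazy slots are all that is left. */
int toy_map(struct toy_vmap *v)
{
    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < TOY_SLOTS; i++) {
            if (v->slot[i] == TOY_FREE) {
                v->slot[i] = TOY_MAPPED;
                return i;
            }
        }
        if (v->lazy_count == 0)
            break;       /* nothing to reclaim, address space is full */
        toy_flush(v);    /* reclaim lazy slots, then search once more */
    }
    return -1;
}

/* Unmap: defer the flush, and batch it once enough unmaps are pending. */
void toy_unmap(struct toy_vmap *v, int i)
{
    v->slot[i] = TOY_LAZY;
    if (++v->lazy_count >= TOY_LAZY_MAX)
        toy_flush(v);
}
```

    In this model four unmaps cost a single flush, and a slot that was
    unmapped is never handed out again before a flush has happened.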

    XEN and PAT and such do not like deferred TLB flushing because they can't
    always handle multiple aliasing virtual addresses to a physical address.
    They now call vm_unmap_aliases() in order to flush any deferred mappings.
    That call is very expensive (well, actually not a lot more expensive than
    a single vunmap under the old scheme), however it should be OK if not
    called too often.

    The virtual memory extent information is stored in an rbtree rather than a
    linked list to improve the algorithmic scalability.

    There is a per-CPU allocator for small vmaps, which amortizes or avoids
    global locking.

    To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
    must be used in place of vmap and vunmap. Vmalloc does not use these
    interfaces at the moment, so it will not be quite so scalable (although it
    will use lazy TLB flushing).

    As a quick test of performance, I ran a test that loops in the kernel,
    linearly mapping then touching then unmapping 4 pages. Different numbers
    of tests were run in parallel on a 4 core, 2 socket opteron. Results are
    in nanoseconds per map+touch+unmap.

    threads    vanilla    vmap rewrite
    1          14700      2900
    2          33600      3000
    4          49500      2800
    8          70631      2900

    So with 8 cores, the rewritten version is already 25x faster.

    In a slightly more realistic test (although with an older and less
    scalable version of the patch), I ripped the not-very-good vunmap batching
    code out of XFS, and implemented the large buffer mapping with vm_map_ram
    and vm_unmap_ram... along with a couple of other tricks, I was able to
    speed up a large directory workload by 20x on a 64 CPU system. I believe
    vmap/vunmap is actually sped up a lot more than 20x on such a system, but
    I'm running into other locks now. vmap is pretty well blown off the
    profiles.

    Before:
    1352059 total 0.1401
    798784 _write_lock 8320.6667
    Cc: Jeremy Fitzhardinge
    Cc: Krzysztof Helt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

17 Oct, 2008

4 commits

  • This patch fixes the following compile error caused by commit
    589f800bb12c5cd6c9167bbf9bf3cb70cd8e422c ("fastboot: make the raid
    autodetect code wait for all devices to init"):

    CC init/do_mounts_md.o
    init/do_mounts_md.c: In function 'autodetect_raid':
    init/do_mounts_md.c:285: error: implicit declaration of function 'msleep'
    make[2]: *** [init/do_mounts_md.o] Error 1

    Signed-off-by: Adrian Bunk
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch adds the CONFIG_AIO option, which allows support for
    asynchronous I/O operations to be removed. These are not necessarily
    used by applications, particularly on embedded devices. As this is a
    size-reduction option, it depends on CONFIG_EMBEDDED. It saves
    ~7 kilobytes of kernel code/data:

       text    data     bss     dec    hex filename
    1115067  119180  217088 1451335 162547 vmlinux
    1108025  119048  217088 1444161 160941 vmlinux.new
      -7042    -132       0   -7174  -1C06 +/-

    This patch was originally written by Matt Mackall, and is part of
    the Linux Tiny project.

    [randy.dunlap@oracle.com: build fix]
    Signed-off-by: Thomas Petazzoni
    Cc: Benjamin LaHaise
    Cc: Zach Brown
    Signed-off-by: Matt Mackall
    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Petazzoni
     
  • When unpacking the cpio into the initramfs, mtimes are not preserved by
    default. This patch adds an INITRAMFS_PRESERVE_MTIME option that allows
    mtimes stored in the cpio image to be used when constructing the
    initramfs.
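
    To show where such an mtime comes from, here is a small sketch that
    reads it out of a cpio header. It assumes the "newc" cpio layout: a
    6-byte "070701" magic followed by 13 fields of 8 hex digits, with
    mtime as the sixth field. The function name is hypothetical and this
    is not the unpacking code from the patch.

```c
#include <stddef.h>
#include <string.h>

#define NEWC_MAGIC     "070701"
#define NEWC_HDR_LEN   110          /* 6-byte magic + 13 fields of 8 hex digits */
#define NEWC_MTIME_OFF (6 + 5 * 8)  /* skip magic, ino, mode, uid, gid, nlink */

/* Store the header's mtime in *mtime; return 0 on success, -1 on bad input. */
int newc_mtime(const char *hdr, size_t len, unsigned long *mtime)
{
    unsigned long value = 0;

    if (len < NEWC_HDR_LEN || memcmp(hdr, NEWC_MAGIC, 6) != 0)
        return -1;

    /* The field is exactly 8 ASCII hex digits, with no terminator. */
    for (int i = 0; i < 8; i++) {
        char c = hdr[NEWC_MTIME_OFF + i];
        unsigned long digit;

        if (c >= '0' && c <= '9')
            digit = (unsigned long)(c - '0');
        else if (c >= 'a' && c <= 'f')
            digit = (unsigned long)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            digit = (unsigned long)(c - 'A' + 10);
        else
            return -1;
        value = value * 16 + digit;
    }
    *mtime = value;
    return 0;
}
```

    The value is a timestamp in seconds, which is what a later utime-style
    call would apply to the unpacked file.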

    For embedded applications that run exclusively out of the initramfs, this
    is invaluable:

    When building embedded application initramfs images, it's nice to know when
    the files were actually created during the build process - that makes it
    easier to see what files were modified when so we can compare the files
    that are being used on the image with the files used during the build
    process. This might help (for example) to determine if the target system
    has all the updated files you expect to see w/o having to check MD5s etc.

    In our environment, the whole system runs off the initramfs partition, and
    seeing the modified times of the shared libraries (for example) helps us
    find bugs that may have been introduced by the build system incorrectly
    propagating outdated shared libraries into the image.

    Similarly, many of the initialization/configuration files in /etc might be
    dynamically built by the build system, and knowing when they were modified
    helps us sanity check whether the target system has the "latest" files
    etc.

    Finally, we might use last modified times to determine whether a hot fix
    should be applied or not to the running ramfs.

    Signed-off-by: Nye Liu
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nye Liu
     
  • identify_ramdisk_image() returns 0 (not -1) if a gzipped ramdisk is found:

    if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) {
            printk(KERN_NOTICE
                   "RAMDISK: Compressed image found at block %d\n",
                   start_block);
            nblocks = 0;
            ^^^^^^^^^^^^
            goto done;
    }

    ...

    done:
            sys_lseek(fd, start_block * BLOCK_SIZE, 0);
            kfree(buf);
            return nblocks;
            ^^^^^^^^^^^^^^^

    Hence correct the typo in the comment, which has existed since the
    addition of compressed ramdisk support in 1.3.48.
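
    The magic-number test quoted above can be tried in userspace. The
    sketch below keeps the same octal constants (037 0213 is 0x1f 0x8b,
    and 037 0236 is 0x1f 0x9e) and models only the return convention
    discussed here. The function names are hypothetical, and returning -1
    for an unrecognized buffer is an assumption that goes beyond what
    this message states.

```c
#include <stddef.h>

/* Nonzero if buf starts with one of the two magic pairs from the snippet. */
int toy_looks_gzipped(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 037 &&
           (buf[1] == 0213 || buf[1] == 0236);
}

/* Return convention only: 0 for a compressed image, -1 if unrecognized. */
int toy_identify_image(const unsigned char *buf, size_t len)
{
    if (toy_looks_gzipped(buf, len))
        return 0;
    return -1;
}
```

    Note that buf must hold unsigned bytes: with a signed char, 0213
    would never compare equal.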

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

15 Oct, 2008

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot:
    raid, fastboot: hide RAID autodetect option if MD is compiled as a module
    raid: make RAID autodetect default a KConfig option
    warning: fix init do_mounts_md c
    fastboot: make the RAID autostart code print a message just before waiting
    fastboot: make the raid autodetect code wait for all devices to init
    fastboot: Fix bootgraph.pl initcall name regexp
    fastboot: fix issues and improve output of bootgraph.pl
    Add a script to visualize the kernel boot process / time

    Linus Torvalds
     

14 Oct, 2008

11 commits

  • Change the time resolution for initcall_debug to microseconds, from
    milliseconds. This is handy to determine which initcalls you want to work
    on for faster booting.

    On one of my test machines, over 90% of the initcalls are less than a
    millisecond and (without this patch) these are all reported as 0 msecs.
    Working on the 900 us ones is more important than the 4 us ones.
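
    The loss of information at millisecond resolution is plain integer
    truncation. A tiny sketch, assuming the duration is available in
    nanoseconds (the helper names are made up and are not the kernel's):

```c
#include <stdio.h>

/* Old resolution: whole milliseconds, so anything under 1 ms becomes 0. */
unsigned long long toy_ns_to_msecs(unsigned long long ns)
{
    return ns / 1000000ULL;
}

/* New resolution: whole microseconds. */
unsigned long long toy_ns_to_usecs(unsigned long long ns)
{
    return ns / 1000ULL;
}

/* Print one initcall duration at both resolutions for comparison. */
void toy_report(const char *name, unsigned long long ns)
{
    printf("%s: %llu msecs, %llu usecs\n",
           name, toy_ns_to_msecs(ns), toy_ns_to_usecs(ns));
}
```

    A 900 us initcall and a 4 us initcall both report 0 msecs, but 900
    and 4 usecs respectively.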

    With 'quiet' on the kernel command line, this adds no significant overhead
    to kernel boot time.

    Signed-off-by: Tim Bird
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Tim Bird
     
  • At this time, only built-in initcalls interest us.
    We can't really produce a relevant graph if we include
    the module initcalls too.

    I had good results after this patch (see svg in attachment).

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • After some initcall traces, some initcall names may be inconsistent.
    That's because these functions will disappear from the .init section
    and also their name from the symbols table.

    So we have to copy the name of the function in a buffer large enough
    during the trace appending. It is not costly for the ring_buffer because
    the number of initcall entries is commonly not really large.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Change the boot tracer printing to make it parsable for
    the scripts/bootgraph.pl script.

    We now have to output two lines for each initcall, matching the
    printk calls in do_one_initcall() in init/main.c, because we need
    both the call time and the return time.
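
    A consumer of these two lines could pair them up as sketched below.
    The line formats used here ("calling  <func> @ <pid>" and "initcall
    <func> returned <ret> after <n> usecs", each with a leading printk
    timestamp) are an assumption about the output, and the struct and
    function names are hypothetical.

```c
#include <stdio.h>

struct toy_initcall_event {
    double ts;        /* printk timestamp in seconds */
    char func[64];    /* function name as printed, e.g. "foo+0x0/0x10" */
    int is_return;    /* 0 for a "calling" line, 1 for a "returned" line */
    long long usecs;  /* duration, only meaningful on a "returned" line */
};

/* Parse one log line; return 0 if it is an initcall line, -1 otherwise. */
int toy_parse_initcall_line(const char *line, struct toy_initcall_event *ev)
{
    int pid, ret;

    if (sscanf(line, "[%lf] calling %63s @ %d",
               &ev->ts, ev->func, &pid) == 3) {
        ev->is_return = 0;
        ev->usecs = 0;
        return 0;
    }
    if (sscanf(line, "[%lf] initcall %63s returned %d after %lld usecs",
               &ev->ts, ev->func, &ret, &ev->usecs) == 4) {
        ev->is_return = 1;
        return 0;
    }
    return -1;
}
```

    The timestamp of the "calling" line gives the start of the bar in the
    graph, and the "returned" line gives its end.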

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Launch the boot tracing inside the initcall_debug area. The old printk
    calls have not been removed, to keep the old way of initcall tracing
    for backward compatibility.

    [ mingo@elte.hu: resolved conflicts ]
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frédéric Weisbecker
     
  • When optimizing the kernel boot time, it's very valuable to visualize
    what is going on at which time. In addition, with the fastboot asynchronous
    initcall level, it's very valuable to see which initcall gets run where
    and when.

    This patch adds a script to turn a dmesg into a SVG graph (that can be
    shown with tools such as InkScape, Gimp or Firefox) and a small change
    to the initcall code to print the PID of the thread calling the initcall
    (so that the script can work out the parallelism).

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     
  • This is the infrastructure for converting the mcount call sites
    recorded by the __mcount_loc section into nops on boot. It also allows
    for using these sites to enable tracing as normal. When the __mcount_loc
    section is used, the "ftraced" kernel thread is disabled.

    This uses the current infrastructure to record the mcount call sites
    as well as convert them to nops. The mcount function is kept as a stub
    on boot up and not converted to the ftrace_record_ip function. We use the
    ftrace_record_ip to only record from the table.

    This patch does not handle modules. That comes with a later patch.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • do not expose users to CONFIG_TRACEPOINTS - tracers can select it
    just fine.

    update ftrace to select CONFIG_TRACEPOINTS.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • While it's arguably low overhead, we don't enable new features by default.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Implementation of kernel tracepoints. Inspired from the Linux Kernel
    Markers. Allows complete typing verification by declaring both tracing
    statement inline functions and probe registration/unregistration static
    inline functions within the same macro "DEFINE_TRACE". No format string
    is required. See the tracepoint Documentation and Samples patches for
    usage examples.

    Taken from the documentation patch :

    "A tracepoint placed in code provides a hook to call a function (probe)
    that you can provide at runtime. A tracepoint can be "on" (a probe is
    connected to it) or "off" (no probe is attached). When a tracepoint is
    "off" it has no effect, except for adding a tiny time penalty (checking
    a condition for a branch) and space penalty (adding a few bytes for the
    function call at the end of the instrumented function and adds a data
    structure in a separate section). When a tracepoint is "on", the
    function you provide is called each time the tracepoint is executed, in
    the execution context of the caller. When the function provided ends its
    execution, it returns to the caller (continuing from the tracepoint
    site).

    You can put tracepoints at important locations in the code. They are
    lightweight hooks that can pass an arbitrary number of parameters, which
    prototypes are described in a tracepoint declaration placed in a header
    file."
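
    The on/off behaviour described in the quoted documentation can be
    sketched as a function-pointer hook in plain C. This is a
    single-threaded toy, not the DEFINE_TRACE implementation: it has no
    RCU, no type checking by macro, and every toy_* name is invented for
    illustration.

```c
#include <stddef.h>

#define TOY_MAX_PROBES 4

typedef void (*toy_probe_fn)(int arg);

struct toy_tracepoint {
    const char *name;
    int state;                           /* nonzero while a probe is attached */
    toy_probe_fn probes[TOY_MAX_PROBES];
};

/* Attach a probe; the tracepoint is "on" from now. Returns -1 if full. */
int toy_register_probe(struct toy_tracepoint *tp, toy_probe_fn fn)
{
    for (int i = 0; i < TOY_MAX_PROBES; i++) {
        if (tp->probes[i] == NULL) {
            tp->probes[i] = fn;
            tp->state = 1;
            return 0;
        }
    }
    return -1;
}

/* Detach a probe; the tracepoint turns "off" when none remain. */
int toy_unregister_probe(struct toy_tracepoint *tp, toy_probe_fn fn)
{
    int found = -1;
    int remaining = 0;

    for (int i = 0; i < TOY_MAX_PROBES; i++) {
        if (found < 0 && tp->probes[i] == fn) {
            tp->probes[i] = NULL;
            found = 0;
        } else if (tp->probes[i] != NULL) {
            remaining++;
        }
    }
    tp->state = remaining > 0;
    return found;
}

/* Instrumentation site: one cheap test when "off", probe calls when "on". */
void toy_trace(struct toy_tracepoint *tp, int arg)
{
    if (!tp->state)
        return;
    for (int i = 0; i < TOY_MAX_PROBES; i++)
        if (tp->probes[i] != NULL)
            tp->probes[i](arg);
}

/* Example probe: accumulates the values it is called with. */
int toy_seen_total;

void toy_count_probe(int arg)
{
    toy_seen_total += arg;
}
```

    The probe runs in the caller's context and control returns to the
    instrumented code afterwards, which is the property the quoted text
    describes.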

    Addition and removal of tracepoints is synchronized by RCU using the
    scheduler (and preempt_disable) as guarantees to find a quiescent state
    (this is really RCU "classic"). The update side uses rcu_barrier_sched()
    with call_rcu_sched() and the read/execute side uses
    "preempt_disable()/preempt_enable()".

    We make sure the previous array containing probes, which has been
    scheduled for deletion by the rcu callback, is indeed freed before we
    proceed to the next update. It therefore limits the rate of modification
    of a single tracepoint to one update per RCU period. The objective here
    is to permit fast batch add/removal of probes on _different_
    tracepoints.

    Changelog :
    - Use #name ":" #proto as string to identify the tracepoint in the
    tracepoint table. This makes sure no type mismatch happens due to
    connection of a probe with the wrong type to a tracepoint declared with
    the same name in a different header.
    - Add tracepoint_entry_free_old.
    - Change __TO_TRACE to get rid of the 'i' iterator.

    Masami Hiramatsu :
    Tested on x86-64.

    Performance impact of a tracepoint : same as markers, except that it
    adds about 70 bytes of instructions in an unlikely branch of each
    instrumented function (the for loop, the stack setup and the function
    call). It currently adds a memory read, a test and a conditional branch
    at the instrumentation site (in the hot path). Immediate values will
    eventually change this into a load immediate, test and branch, which
    removes the memory read which will make the i-cache impact smaller
    (changing the memory read for a load immediate removes 3-4 bytes per
    site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
    also saves the d-cache hit).

    About the performance impact of tracepoints (which is comparable to
    markers), even without immediate values optimizations, tests done by
    Hideo Aoki on ia64 show no regression. His test case used hackbench
    on a kernel where scheduler instrumentation (about 5 events in
    scheduler code) was added.

    Quoting Hideo Aoki about Markers :

    I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
    tree, which includes several markers for LTTng, using an ia64 server.

    While the immediate trace mark feature isn't implemented on ia64, there
    is no major performance regression. So, I think that we don't have any
    issues to propose merging marker point patches into Linus's tree from
    the viewpoint of performance impact.

    I prepared two kernels to evaluate. The first one was compiled without
    CONFIG_MARKERS. The second one had CONFIG_MARKERS enabled.

    I downloaded the original hackbench from the following URL:
    http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

    I ran hackbench 5 times in each condition and calculated the average and
    difference between the kernels.

    The parameter of hackbench: every 50 from 50 to 800
    The number of CPUs of the server: 2, 4, and 8

    Below are the results. As you can see, no major performance regression
    was found in any case. Even as the number of processes increases, the
    difference between the marker-enabled kernel and the marker-disabled
    kernel doesn't increase. Moreover, as the number of CPUs increases, the
    difference doesn't increase either.

    Curiously, the marker-enabled kernel is better than the marker-disabled
    kernel in more than half of the cases, although I guess it comes from
    the difference in memory access pattern.

    * 2 CPUs

    Number of | without      | with         | diff   | diff  |
    processes | Marker [Sec] | Marker [Sec] | [Sec]  | [%]   |
    ----------------------------------------------------------
           50 |        4.811 |        4.872 | +0.061 | +1.27 |
          100 |        9.854 |       10.309 | +0.454 | +4.61 |
          150 |       15.602 |       15.040 | -0.562 |  -3.6 |
          200 |       20.489 |       20.380 | -0.109 | -0.53 |
          250 |       25.798 |       25.652 | -0.146 | -0.56 |
          300 |       31.260 |       30.797 | -0.463 | -1.48 |
          350 |       36.121 |       35.770 | -0.351 | -0.97 |
          400 |       42.288 |       42.102 | -0.186 | -0.44 |
          450 |       47.778 |       47.253 | -0.526 |  -1.1 |
          500 |       51.953 |       52.278 | +0.325 | +0.63 |
          550 |       58.401 |       57.700 | -0.701 |  -1.2 |
          600 |       63.334 |       63.222 | -0.112 | -0.18 |
          650 |       68.816 |       68.511 | -0.306 | -0.44 |
          700 |       74.667 |       74.088 | -0.579 | -0.78 |
          750 |       78.612 |       79.582 | +0.970 | +1.23 |
          800 |       85.431 |       85.263 | -0.168 |  -0.2 |
    ----------------------------------------------------------

    * 4 CPUs

    Number of | without      | with         | diff   | diff  |
    processes | Marker [Sec] | Marker [Sec] | [Sec]  | [%]   |
    ----------------------------------------------------------
           50 |        2.586 |        2.584 | -0.003 |  -0.1 |
          100 |        5.254 |        5.283 | +0.030 | +0.56 |
          150 |        8.012 |        8.074 | +0.061 | +0.76 |
          200 |       11.172 |       11.000 | -0.172 | -1.54 |
          250 |       13.917 |       14.036 | +0.119 | +0.86 |
          300 |       16.905 |       16.543 | -0.362 | -2.14 |
          350 |       19.901 |       20.036 | +0.135 | +0.68 |
          400 |       22.908 |       23.094 | +0.186 | +0.81 |
          450 |       26.273 |       26.101 | -0.172 | -0.66 |
          500 |       29.554 |       29.092 | -0.461 | -1.56 |
          550 |       32.377 |       32.274 | -0.103 | -0.32 |
          600 |       35.855 |       35.322 | -0.533 | -1.49 |
          650 |       39.192 |       38.388 | -0.804 | -2.05 |
          700 |       41.744 |       41.719 | -0.025 | -0.06 |
          750 |       45.016 |       44.496 | -0.520 | -1.16 |
          800 |       48.212 |       47.603 | -0.609 | -1.26 |
    ----------------------------------------------------------

    * 8 CPUs

    Number of | without      | with         | diff   | diff  |
    processes | Marker [Sec] | Marker [Sec] | [Sec]  | [%]   |
    ----------------------------------------------------------
           50 |        2.094 |        2.072 | -0.022 | -1.07 |
          100 |        4.162 |        4.273 | +0.111 | +2.66 |
          150 |        6.485 |        6.540 | +0.055 | +0.84 |
          200 |        8.556 |        8.478 | -0.078 | -0.91 |
          250 |       10.458 |       10.258 | -0.200 | -1.91 |
          300 |       12.425 |       12.750 | +0.325 | +2.62 |
          350 |       14.807 |       14.839 | +0.032 | +0.22 |
          400 |       16.801 |       16.959 | +0.158 | +0.94 |
          450 |       19.478 |       19.009 | -0.470 | -2.41 |
          500 |       21.296 |       21.504 | +0.208 | +0.98 |
          550 |       23.842 |       23.979 | +0.137 | +0.57 |
          600 |       26.309 |       26.111 | -0.198 | -0.75 |
          650 |       28.705 |       28.446 | -0.259 |  -0.9 |
          700 |       31.233 |       31.394 | +0.161 | +0.52 |
          750 |       34.064 |       33.720 | -0.344 | -1.01 |
          800 |       36.320 |       36.114 | -0.206 | -0.57 |
    ----------------------------------------------------------

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: 'Peter Zijlstra'
    Signed-off-by: Ingo Molnar

    Mathieu Desnoyers
     
  • * 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
    proc: remove kernel.maps_protect
    proc: remove now unneeded ADDBUF macro
    [PATCH] proc: show personality via /proc/pid/personality
    [PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()
    proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig
    proc: make grab_header() static
    proc: remove unused get_dma_list()
    proc: remove dummy vmcore_open()
    proc: proc_sys_root tweak
    proc: fix return value of proc_reg_open() in "too late" case

    Fixed up trivial conflict in removed file arch/sparc/include/asm/dma_32.h

    Linus Torvalds
     

12 Oct, 2008

5 commits

  • RAID autodetect has the side effect of requiring synchronisation
    of all device drivers, which can make the boot several seconds longer
    (I've measured 7 on one of my laptops), even for systems that don't
    have RAID set up for the root filesystem (the only FS where this matters).

    This patch makes the default for autodetect a config option; either way
    the user can always override via the kernel command line.
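
    The "build-time default, command-line override" pattern can be
    sketched in userspace C as follows. The macro stands in for the
    config option, and the "raid=autodetect" / "raid=noautodetect"
    spellings are an assumption about the command-line syntax. The
    function is hypothetical and is not the code added by this patch.

```c
#include <string.h>

/* Stand-in for the build-time config choice (1 = autodetect by default). */
#ifndef TOY_MD_AUTODETECT_DEFAULT
#define TOY_MD_AUTODETECT_DEFAULT 1
#endif

/* Start from the build-time default and let "raid=" options override it. */
int toy_raid_autodetect(const char *cmdline)
{
    int autodetect = TOY_MD_AUTODETECT_DEFAULT;
    const char *p = cmdline;

    while ((p = strstr(p, "raid=")) != NULL) {
        /* Only accept "raid=" at the start of a space-separated token. */
        int at_token_start = (p == cmdline) || (p[-1] == ' ');

        p += 5;
        if (!at_token_start)
            continue;
        if (strncmp(p, "noautodetect", 12) == 0)
            autodetect = 0;
        else if (strncmp(p, "autodetect", 10) == 0)
            autodetect = 1;
    }
    return autodetect;
}
```

    When both spellings appear, the last one on the command line wins in
    this sketch.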

    Signed-off-by: Arjan van de Ven
    Acked-by: NeilBrown

    Arjan van de Ven
     
  • fix warning:

    init/do_mounts_md.c: In function ‘md_run_setup’:
    init/do_mounts_md.c:282: warning: ISO C90 forbids mixed declarations and code

    also, use the opportunity to put the RAID autodetection code
    into a separate function - this also solves a checkpatch style warning.

    No code changed:

    md5:
    aa36a35faef371b05f1974ad583bdbbd do_mounts_md.o.before.asm
    aa36a35faef371b05f1974ad583bdbbd do_mounts_md.o.after.asm

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • As requested/suggested by Neil Brown: make the raid code print that it's
    about to wait for probing to be done as well as give a suggestion on how
    to disable the probing if the user doesn't use raid.

    Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

    Arjan van de Ven
     
  • The raid autodetect code really needs to have all devices probed before
    it can detect raid arrays; not doing so would give rather messy situations
    where arrays would get detected as degraded while they shouldn't be etc.

    This is in preparation of removing the "wait for everything to init"
    code that makes everyone pay, not just raid users.

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     
  • When optimizing the kernel boot time, it's very valuable to visualize
    what is going on at which time. In addition, with some of the initializing
    going asynchronous soon, it's valuable to track/print which worker thread
    is executing the initialization.

    This patch adds a script to turn a dmesg into a SVG graph (that can be
    shown with tools such as InkScape, Gimp or Firefox) and a small change
    to the initcall code to print the PID of the thread calling the initcall
    (so that the script can work out the parallelism).

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     

10 Oct, 2008

1 commit


09 Oct, 2008

1 commit

  • DEBUG_BLOCK_EXT_DEVT shuffles SCSI and IDE device numbers, so a root
    device number set using rdev becomes meaningless. Root devices should
    be explicitly specified using textual names. Warn about this if the
    root device can't be found and DEBUG_BLOCK_EXT_DEVT is enabled. Also
    add a warning to the help text.

    Signed-off-by: Tejun Heo
    Cc: Bartlomiej Zolnierkiewicz
    Signed-off-by: Jens Axboe

    Tejun Heo