16 Jan, 2009

2 commits

  • Move Documentation/cpusets.txt and Documentation/controllers/* to
    Documentation/cgroups/

    Signed-off-by: Li Zefan
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • - move CONFIG_PROC_PID_CPUSET into cgroup menu
    - move MM_OWNER to the bottom for better menu indent
    - fix typos
    - use tabs not spaces

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

10 Jan, 2009

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
    MAINTAINERS: squashfs entry
    Squashfs: documentation
    Squashfs: initrd support
    Squashfs: Kconfig entry
    Squashfs: Makefiles
    Squashfs: header files
    Squashfs: block operations
    Squashfs: cache operations
    Squashfs: uid/gid lookup operations
    Squashfs: fragment block operations
    Squashfs: export operations
    Squashfs: super block operations
    Squashfs: symlink operations
    Squashfs: regular file operations
    Squashfs: directory readdir operations
    Squashfs: directory lookup operations
    Squashfs: inode operations

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu:
    NOMMU: Support XIP on initramfs
    NOMMU: Teach kobjsize() about VMA regions.
    FLAT: Don't attempt to expand the userspace stack to fill the space allocated
    FDPIC: Don't attempt to expand the userspace stack to fill the space allocated
    NOMMU: Improve procfs output using per-MM VMAs
    NOMMU: Make mmap allocation page trimming behaviour configurable.
    NOMMU: Make VMAs per MM as for MMU-mode linux
    NOMMU: Delete askedalloc and realalloc variables
    NOMMU: Rename ARM's struct vm_region
    NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area()

    Linus Torvalds
     

09 Jan, 2009

3 commits

  • Config and control variable for mem+swap controller.

    This patch adds CONFIG_CGROUP_MEM_RES_CTLR_SWAP
    (memory resource controller swap extension.)

    For accounting swap, it's obvious that we have to use additional memory to
    remember "who uses swap". This adds more overhead. So, it's better to
    offer "choice" to users. This patch adds 2 choices.

    This patch adds 2 parameters to enable swap extension or not.
    - CONFIG
    - boot option

    Reviewed-by: Daisuke Nishimura
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • s/contoller/controller/

    Signed-of-by: Li Zefan
    Cc: Paul Menage
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Making CGROUP related configs be a sub-menu.

    This patch make CGROUP related configs be a sub-menu and makes 1st level
    configs of "General Setup" shorter.

    including following additional changes
    - add help comment about CGROUPS and GROUP_SCHED.
    - moved MM_OWNER config to the bottom.
    (for good indent in menuconfig)

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

08 Jan, 2009

4 commits

  • Support XIP on files unpacked from the initramfs image on NOMMU systems. This
    simply requires the length of the file to be preset so that the ramfs fs can
    attempt to garner sufficient contiguous storage to store the file (NOMMU mmap
    can only map contiguous RAM).

    All the other bits to do XIP on initramfs files are present:

    (1) ramfs's truncate attempts to allocate a contiguous run of pages when a
    file is truncated upwards from nothing.

    (2) ramfs sets BDI on its files to indicate direct mapping is possible, and
    that its files can be mapped for read, write and exec.

    (3) NOMMU mmap() will use the above bits to determine that it can do XIP.
    Possibly this needs better controls, because it will _always_ try and do
    XIP.

    One disadvantage of this very simplistic approach is that sufficient space
    will be allocated to store the whole file, and not just the bit that would be
    XIP'd. To deal with this, though, the initramfs unpacker would have to be
    able to parse the file contents.

    Signed-off-by: David Howells
    Acked-by: Paul Mundt

    David Howells
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async:
    async: don't do the initcall stuff post boot
    bootchart: improve output based on Dave Jones' feedback
    async: make the final inode deletion an asynchronous event
    fastboot: Make libata initialization even more async
    fastboot: make the libata port scan asynchronous
    fastboot: make scsi probes asynchronous
    async: Asynchronous function calls to speed up kernel boot

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
    trivial: chack -> check typo fix in main Makefile
    trivial: Add a space (and a comma) to a printk in 8250 driver
    trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
    trivial: Fix misspelling of "firmware" in powerpc Makefile
    trivial: Fix misspelling of "firmware" in usb.c
    trivial: Fix misspelling of "firmware" in qla1280.c
    trivial: Fix misspelling of "firmware" in a100u2w.c
    trivial: Fix misspelling of "firmware" in megaraid.c
    trivial: Fix misspelling of "firmware" in ql4_mbx.c
    trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
    trivial: Fix misspelling of "firmware" in ipw2100.c
    trivial: Fix misspelling of "firmware" in atmel.c
    trivial: Fix misspelled firmware in Kconfig
    trivial: fix an -> a typos in documentation and comments
    trivial: fix then -> than typos in comments and documentation
    trivial: update Jesper Juhl CREDITS entry with new email
    trivial: fix singal -> signal typo
    trivial: Fix incorrect use of "loose" in event.c
    trivial: printk: fix indentation of new_text_line declaration
    trivial: rtc-stk17ta8: fix sparse warning
    ...

    Linus Torvalds
     
  • Right now, most of the kernel boot is strictly synchronous, such that
    various hardware delays are done sequentially.

    In order to make the kernel boot faster, this patch introduces
    infrastructure to allow doing some of the initialization steps
    asynchronously, which will hide significant portions of the hardware delays
    in practice.

    In order to not change device order and other similar observables, this
    patch does NOT do full parallel initialization.

    Rather, it operates more in the way an out of order CPU does; the work may
    be done out of order and asynchronous, but the observable effects
    (instruction retiring for the CPU) are still done in the original sequence.

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     

07 Jan, 2009

8 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (60 commits)
    uio: make uio_info's name and version const
    UIO: Documentation for UIO ioport info handling
    UIO: Pass information about ioports to userspace (V2)
    UIO: uio_pdrv_genirq: allow custom irq_flags
    UIO: use pci_ioremap_bar() in drivers/uio
    arm: struct device - replace bus_id with dev_name(), dev_set_name()
    libata: struct device - replace bus_id with dev_name(), dev_set_name()
    avr: struct device - replace bus_id with dev_name(), dev_set_name()
    block: struct device - replace bus_id with dev_name(), dev_set_name()
    chris: struct device - replace bus_id with dev_name(), dev_set_name()
    dmi: struct device - replace bus_id with dev_name(), dev_set_name()
    gadget: struct device - replace bus_id with dev_name(), dev_set_name()
    gpio: struct device - replace bus_id with dev_name(), dev_set_name()
    gpu: struct device - replace bus_id with dev_name(), dev_set_name()
    hwmon: struct device - replace bus_id with dev_name(), dev_set_name()
    i2o: struct device - replace bus_id with dev_name(), dev_set_name()
    IA64: struct device - replace bus_id with dev_name(), dev_set_name()
    i7300_idle: struct device - replace bus_id with dev_name(), dev_set_name()
    infiniband: struct device - replace bus_id with dev_name(), dev_set_name()
    ISDN: struct device - replace bus_id with dev_name(), dev_set_name()
    ...

    Linus Torvalds
     
  • Signed-off-by: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • The function autodetect_raid is only used by __init functions, and it refers
    to __initdata, so it needs __init markings. Fixes this error:
    The function autodetect_raid() references
    the variable __initdata raid_noautodetect.
    This is often because autodetect_raid lacks a __initdata
    annotation or the annotation of raid_noautodetect is wrong.

    Signed-off-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • Signed-off-by: Alexey Dobriyan
    Cc: Gabor Gombas
    Cc: Jan Beulich
    Cc: Andi Kleen
    Cc: Ingo Molnar ,
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • In the past, I used the root=... command line parameter to specify the
    root filesystem to the kernel. Now it seems that specifying it is not
    necessary. The kernel detects the root filesystem even if the kernel
    command line is empty. My root fs is on a raid1 device by the way, and I
    am not using initrd for the boot process.

    If the kernel detects the root filesystem somehow, I think it should print
    out the result of this detection, otherwise I will not know which device
    has the root filesystem. Or is there an easy way to get this information
    on a running system? I had a quick look at the /proc and /sys
    filesystems, but haven't found anything useful there.

    Signed-off-by: Marton Balint
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marton Balint
     
  • checkpatch warns about 'static void noinline'. It wants `static noinline
    void'.

    Both are permissible, but the kernel consistently uses `static inline' and
    `static noinline', and consistency is good. Hence let's keep the
    checkpatch warning and fix up this code site.

    [akpm@linux-foundation.org: rewrote changelog]
    Signed-off-by: Md.Rakib H. Mullick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rakib Mullick
     
  • tiny-shmem shares most of its 130 lines of code with shmem and tends to
    break when particular bits of shmem get modified. Unifying saves code and
    makes keeping these two in sync much easier.

    before:
    14367 392 24 14783 39bf mm/shmem.o
    396 72 8 476 1dc mm/tiny-shmem.o

    after:
    14367 392 24 14783 39bf mm/shmem.o
    412 72 8 492 1ec mm/shmem.o tiny

    Signed-off-by: Matt Mackall
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • This should make the help text of SYSFS_DEPRECATED more clear, that this
    is _not_ about (what some people think it is) suppressing a few symlinks
    and variables, but a different sysfs _layout_ with new features.

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

06 Jan, 2009

1 commit


05 Jan, 2009

2 commits


04 Jan, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
    x86: setup_per_cpu_areas() cleanup
    cpumask: fix compile error when CONFIG_NR_CPUS is not defined
    cpumask: use alloc_cpumask_var_node where appropriate
    cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
    x86: use cpumask_var_t in acpi/boot.c
    x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
    sched: put back some stack hog changes that were undone in kernel/sched.c
    x86: enable cpus display of kernel_max and offlined cpus
    ia64: cpumask fix for is_affinity_mask_valid()
    cpumask: convert RCU implementations, fix
    xtensa: define __fls
    mn10300: define __fls
    m32r: define __fls
    h8300: define __fls
    frv: define __fls
    cris: define __fls
    cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
    cpumask: zero extra bits in alloc_cpumask_var_node
    cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
    cpumask: convert mm/
    ...

    Linus Torvalds
     

03 Jan, 2009

2 commits

  • …/git/tip/linux-2.6-tip

    * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
    x86: export vector_used_by_percpu_irq
    x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
    sched: nominate preferred wakeup cpu, fix
    x86: fix lguest used_vectors breakage, -v2
    x86: fix warning in arch/x86/kernel/io_apic.c
    sched: fix warning in kernel/sched.c
    sched: move test_sd_parent() to an SMP section of sched.h
    sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
    sched: activate active load balancing in new idle cpus
    sched: bias task wakeups to preferred semi-idle packages
    sched: nominate preferred wakeup cpu
    sched: favour lower logical cpu number for sched_mc balance
    sched: framework for sched_mc/smt_power_savings=N
    sched: convert BALANCE_FOR_xx_POWER to inline functions
    x86: use possible_cpus=NUM to extend the possible cpus allowed
    x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
    x86: update io_apic.c to the new cpumask code
    x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
    x86: xen: use smp_call_function_many()
    x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
    ...

    Fixed up trivial conflict in kernel/time/tick-sched.c manually

    Linus Torvalds
     
  • Impact: cleanup

    We now have a cleaner check for gcc 4.1.0/4.1.1 trouble in
    include/linux/compiler-gcc4.h, so remove the 4.1.0 quirk from
    init/main.c.

    Reported-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Ingo Molnar
    Acked-by: Sam Ravnborg
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

01 Jan, 2009

3 commits

  • Impact: cleanup

    There's one obvious place to use it: to find the highest possible cpu.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Impact: use new API

    cpu_*_map are going away in favour of cpu_*_mask, but const pointers.
    So we have accessors where we really do want to frob them. Archs
    will also need the (trivial) conversion before we can finally remove
    cpu_*_map.

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Rusty Russell
     
  • …l/git/tip/linux-2.6-tip

    * 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sparseirq: move __weak symbols into separate compilation unit
    sparseirq: work around __weak alias bug
    sparseirq: fix hang with !SPARSE_IRQ
    sparseirq: set lock_class for legacy irq when sparse_irq is selected
    sparseirq: work around compiler optimizing away __weak functions
    sparseirq: fix desc->lock init
    sparseirq: do not printk when migrating IRQ descriptors
    sparseirq: remove duplicated arch_early_irq_init()
    irq: simplify for_each_irq_desc() usage
    proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c
    irq: for_each_irq_desc() move to irqnr.h
    hrtimer: remove #include <linux/irq.h>

    Linus Torvalds
     

31 Dec, 2008

3 commits

  • Conflicts:

    arch/x86/kernel/io_apic.c

    Rusty Russell
     
  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, sparseirq: clean up Kconfig entry
    x86: turn CONFIG_SPARSE_IRQ off by default
    sparseirq: fix numa_migrate_irq_desc dependency and comments
    sparseirq: add kernel-doc notation for new member in irq_desc, -v2
    locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEP
    sparseirq, xen: make sure irq_desc is allocated for interrupts
    sparseirq: fix !SMP building, #2
    x86, sparseirq: move irq_desc according to smp_affinity, v7
    proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ
    sparse irqs: add irqnr.h to the user headers list
    sparse irqs: handle !GENIRQ platforms
    sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build
    sparseirq: fix Alpha build failure
    sparseirq: fix typo in !CONFIG_IO_APIC case
    x86, MSI: pass irq_cfg and irq_desc
    x86: MSI start irq numbering from nr_irqs_gsi
    x86: use NR_IRQS_LEGACY
    sparse irq_desc[] array: core kernel and x86 changes
    genirq: record IRQ_LEVEL in irq_desc[]
    irq.h: remove padding from irq_desc on 64bits

    Linus Torvalds
     
  • * 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
    stacktrace: provide save_stack_trace_tsk() weak alias
    rcu: provide RCU options on non-preempt architectures too
    printk: fix discarding message when recursion_bug
    futex: clean up futex_(un)lock_pi fault handling
    "Tree RCU": scalable classic RCU implementation
    futex: rename field in futex_q to clarify single waiter semantics
    x86/swiotlb: add default swiotlb_arch_range_needs_mapping
    x86/swiotlb: add default physbus conversion
    x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
    x86: add swiotlb allocation functions
    swiotlb: consolidate swiotlb info message printing
    swiotlb: support bouncing of HighMem pages
    swiotlb: factor out copy to/from device
    swiotlb: add arch hook to force mapping
    swiotlb: allow architectures to override physbusphys conversions
    swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
    rcu: fix rcutorture behavior during reboot
    resources: skip sanity check of busy resources
    swiotlb: move some definitions to header
    swiotlb: allow architectures to override swiotlb pool allocation
    ...

    Fix up trivial conflicts in
    arch/x86/kernel/Makefile
    arch/x86/mm/init_32.c
    include/linux/hardirq.h
    as per Ingo's suggestions.

    Linus Torvalds
     

30 Dec, 2008

1 commit


29 Dec, 2008

3 commits

  • GCC has a bug with __weak alias functions: if the functions are in
    the same compilation unit as their call site, GCC can decide to
    inline them - and thus rob the linker of the opportunity to override
    the weak alias with the real thing.

    So move all the IRQ handling related __weak symbols to kernel/irq/chip.c.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (25 commits)
    allow stripping of generated symbols under CONFIG_KALLSYMS_ALL
    kbuild: strip generated symbols from *.ko
    kbuild: simplify use of genksyms
    kernel-doc: check for extra kernel-doc notations
    kbuild: add headerdep used to detect inclusion cycles in header files
    kbuild: fix string equality testing in tags.sh
    kbuild: fix make tags/cscope
    kbuild: fix make incompatibility
    kbuild: remove TAR_IGNORE
    setlocalversion: add git-svn support
    setlocalversion: print correct subversion revision
    scripts: improve the decodecode script
    scripts/package: allow custom options to rpm
    genksyms: allow to ignore symbol checksum changes
    genksyms: track symbol checksum changes
    tags and cscope support really belongs in a shell script
    kconfig: fix options to check-lxdialog.sh
    kbuild: gen_init_cpio expands shell variables in file names
    remove bashisms from scripts/extract-ikconfig
    kbuild: teach mkmakfile to be silent
    ...

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
    sched, trace: update trace_sched_wakeup()
    tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
    Revert "x86: disable X86_PTRACE_BTS"
    ring-buffer: prevent false positive warning
    ring-buffer: fix dangling commit race
    ftrace: enable format arguments checking
    x86, bts: memory accounting
    x86, bts: add fork and exit handling
    ftrace: introduce tracing_reset_online_cpus() helper
    tracing: fix warnings in kernel/trace/trace_sched_switch.c
    tracing: fix warning in kernel/trace/trace.c
    tracing/ring-buffer: remove unused ring_buffer size
    trace: fix task state printout
    ftrace: add not to regex on filtering functions
    trace: better use of stack_trace_enabled for boot up code
    trace: add a way to enable or disable the stack tracer
    x86: entry_64 - introduce FTRACE_ frame macro v2
    tracing/ftrace: add the printk-msg-only option
    tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
    x86, bts: correctly report invalid bts records
    ...

    Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
    being already partly merged by the SH merge.

    Linus Torvalds
     

27 Dec, 2008

1 commit


25 Dec, 2008

1 commit

  • Impact: build fix

    Some old architectures still do not use kernel/Kconfig.preempt, so the
    moving of the RCU options there broke their build:

    In file included from /home/mingo/tip/include/linux/sem.h:81,
    from /home/mingo/tip/include/linux/sched.h:69,
    from /home/mingo/tip/arch/alpha/kernel/asm-offsets.c:9:
    /home/mingo/tip/include/linux/rcupdate.h:62:2: error: #error "Unknown RCU implementation specified to kernel configuration"

    Move these options back to init/Kconfig, which every architecture
    includes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

20 Dec, 2008

1 commit

  • Building upon parts of the module stripping patch, this patch
    introduces similar stripping for vmlinux when CONFIG_KALLSYMS_ALL=y.
    Using CONFIG_KALLSYMS_STRIP_GENERATED reduces the overhead of
    CONFIG_KALLSYMS_ALL from 245k/310k to 65k/80k for the (i386/x86-64)
    kernels I tested with.

    The patch also does away with the need to special case the kallsyms-
    internal symbols by making them available even in the first linking
    stage.

    While it is a generated file, the patch includes the changes to
    scripts/genksyms/keywords.c_shipped, as I'm unsure what the procedure
    here is.

    Signed-off-by: Jan Beulich
    Signed-off-by: Sam Ravnborg

    Jan Beulich
     

19 Dec, 2008

1 commit

  • This patch fixes a long-standing performance bug in classic RCU that
    results in massive internal-to-RCU lock contention on systems with
    more than a few hundred CPUs. Although this patch creates a separate
    flavor of RCU for ease of review and patch maintenance, it is intended
    to replace classic RCU.

    This patch still handles stress better than does mainline, so I am still
    calling it ready for inclusion. This patch is against the -tip tree.
    Nevertheless, experience on an actual 1000+ CPU machine would still be
    most welcome.

    Most of the changes noted below were found while creating an rcutiny
    (which should permit ejecting the current rcuclassic) and while doing
    detailed line-by-line documentation.

    Updates from v9 (http://lkml.org/lkml/2008/12/2/334):

    o Fixes from remainder of line-by-line code walkthrough,
    including comment spelling, initialization, undesirable
    narrowing due to type conversion, removing redundant memory
    barriers, removing redundant local-variable initialization,
    and removing redundant local variables.

    I do not believe that any of these fixes address the CPU-hotplug
    issues that Andi Kleen was seeing, but please do give it a whirl
    in case the machine is smarter than I am.

    A writeup from the walkthrough may be found at the following
    URL, in case you are suffering from terminal insomnia or
    masochism:

    http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf

    o Made rcutree tracing use seq_file, as suggested some time
    ago by Lai Jiangshan.

    o Added a .csv variant of the rcudata debugfs trace file, to allow
    people having thousands of CPUs to drop the data into
    a spreadsheet. Tested with oocalc and gnumeric. Updated
    documentation to suit.

    Updates from v8 (http://lkml.org/lkml/2008/11/15/139):

    o Fix a theoretical race between grace-period initialization and
    force_quiescent_state() that could occur if more than three
    jiffies were required to carry out the grace-period
    initialization. Which it might, if you had enough CPUs.

    o Apply Ingo's printk-standardization patch.

    o Substitute local variables for repeated accesses to global
    variables.

    o Fix comment misspellings and redundant (but harmless) increments
    of ->n_rcu_pending (this latter after having explicitly added it).

    o Apply checkpatch fixes.

    Updates from v7 (http://lkml.org/lkml/2008/10/10/291):

    o Fixed a number of problems noted by Gautham Shenoy, including
    the cpu-stall-detection bug that he was having difficulty
    convincing me was real. ;-)

    o Changed cpu-stall detection to wait for ten seconds rather than
    three in order to reduce false positive, as suggested by Ingo
    Molnar.

    o Produced a design document (http://lwn.net/Articles/305782/).
    The act of writing this document uncovered a number of both
    theoretical and "here and now" bugs as noted below.

    o Fix dynticks_nesting accounting confusion, simplify WARN_ON()
    condition, fix kerneldoc comments, and add memory barriers
    in dynticks interface functions.

    o Add more data to tracing.

    o Remove unused "rcu_barrier" field from rcu_data structure.

    o Count calls to rcu_pending() from scheduling-clock interrupt
    to use as a surrogate timebase should jiffies stop counting.

    o Fix a theoretical race between force_quiescent_state() and
    grace-period initialization. Yes, initialization does have to
    go on for some jiffies for this race to occur, but given enough
    CPUs...

    Updates from v6 (http://lkml.org/lkml/2008/9/23/448):

    o Fix a number of checkpatch.pl complaints.

    o Apply review comments from Ingo Molnar and Lai Jiangshan
    on the stall-detection code.

    o Fix several bugs in !CONFIG_SMP builds.

    o Fix a misspelled config-parameter name so that RCU now announces
    at boot time if stall detection is configured.

    o Run tests on numerous combinations of configurations parameters,
    which after the fixes above, now build and run correctly.

    Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):

    o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
    changeset some time ago, and finally got around to retesting
    this option).

    o Fix some tracing bugs in rcupreempt that caused incorrect
    totals to be printed.

    o I now test with a more brutal random-selection online/offline
    script (attached). Probably more brutal than it needs to be
    on the people reading it as well, but so it goes.

    o A number of optimizations and usability improvements:

    o Make rcu_pending() ignore the grace-period timeout when
    there is no grace period in progress.

    o Make force_quiescent_state() avoid going for a global
    lock in the case where there is no grace period in
    progress.

    o Rearrange struct fields to improve struct layout.

    o Make call_rcu() initiate a grace period if RCU was
    idle, rather than waiting for the next scheduling
    clock interrupt.

    o Invoke rcu_irq_enter() and rcu_irq_exit() only when
    idle, as suggested by Andi Kleen. I still don't
    completely trust this change, and might back it out.

    o Make CONFIG_RCU_TRACE be the single config variable
    manipulated for all forms of RCU, instead of the prior
    confusion.

    o Document tracing files and formats for both rcupreempt
    and rcutree.

    Updates from v4 for those missing v5 given its bad subject line:

    o Separated dynticks interface so that NMIs and irqs call separate
    functions, greatly simplifying it. In particular, this code
    no longer requires a proof of correctness. ;-)

    o Separated dynticks state out into its own per-CPU structure,
    avoiding the duplicated accounting.

    o The case where a dynticks-idle CPU runs an irq handler that
    invokes call_rcu() is now correctly handled, forcing that CPU
    out of dynticks-idle mode.

    o Review comments have been applied (thank you all!!!).
    For but one example, fixed the dynticks-ordering issue that
    Manfred pointed out, saving me much debugging. ;-)

    o Adjusted rcuclassic and rcupreempt to handle dynticks changes.

    Attached is an updated patch to Classic RCU that applies a hierarchy,
    greatly reducing the contention on the top-level lock for large machines.
    This passes 10-hour concurrent rcutorture and online-offline testing on
    128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
    bugs in presence of dynticks (exciting working on a system where
    "sleep 1" hangs until interrupted...), which were fixed in the
    2.6.27 kernel. It is getting more reliable than mainline by some
    measures, so the next version will be against -tip for inclusion.
    See also Manfred Spraul's recent patches (or his earlier work from
    2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
    We will converge onto a common patch in the fullness of time, but are
    currently exploring different regions of the design space. That said,
    I have already gratefully stolen quite a few of Manfred's ideas.

    This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
    of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on
    64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
    there is no hierarchy. By default, the RCU initialization code will
    adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
    architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
    this balancing, allowing the hierarchy to be exactly aligned to the
    underlying hardware. Up to two levels of hierarchy are permitted
    (in addition to the root node), allowing up to 16,384 CPUs on 32-bit
    systems and up to 262,144 CPUs on 64-bit systems. I just know that I
    am going to regret saying this, but this seems more than sufficient
    for the foreseeable future. (Some architectures might wish to set
    CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
    If this becomes a real problem, additional levels can be added, but I
    doubt that it will make a significant difference on real hardware.)

    In the common case, a given CPU will manipulate its private rcu_data
    structure and the rcu_node structure that it shares with its immediate
    neighbors. This can reduce both lock and memory contention by multiple
    orders of magnitude, which should eliminate the need for the strange
    manipulations that are reported to be required when running Linux on
    very large systems.

    Some shortcomings:

    o More bugs will probably surface as a result of an ongoing
    line-by-line code inspection.

    Patches will be provided as required.

    o There are probably hangs, rcutorture failures, &c. Seems
    quite stable on a 128-CPU machine, but that is kind of small
    compared to 4096 CPUs. However, seems to do better than
    mainline.

    Patches will be provided as required.

    o The memory footprint of this version is several KB larger
    than rcuclassic.

    A separate UP-only rcutiny patch will be provided, which will
    reduce the memory footprint significantly, even compared
    to the old rcuclassic. One such patch passes light testing,
    and has a memory footprint smaller even than rcuclassic.
    Initial reaction from various embedded guys was "it is not
    worth it", so am putting it aside.

    Credits:

    o Manfred Spraul for ideas, review comments, and bugs spotted,
    as well as some good friendly competition. ;-)

    o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
    Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
    for reviews and comments.

    o Thomas Gleixner for much-needed help with some timer issues
    (see patches below).

    o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
    Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
    Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
    alive despite my heavy abuse^Wtesting.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

13 Dec, 2008

1 commit