09 Jan, 2006

40 commits

  • Remove documentation for the cpuset 'marker_pid' feature, that was in the
    patch "cpuset: change marker for relative numbering" That patch was previously
    pulled from *-mm at my (pj) request.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Document the additional cpuset features:
    notify_on_release
    marker_pid
    memory_pressure
    memory_pressure_enabled

    Rearrange and improve formatting of existing documentation for
    cpu_exclusive and mem_exclusive features.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Provide a simple per-cpuset metric of memory pressure, tracking the -rate-
    that the tasks in a cpuset call try_to_free_pages(), the synchronous
    (direct) memory reclaim code.

    This enables batch managers monitoring jobs running in dedicated cpusets to
    efficiently detect what level of memory pressure that job is causing.

    This is useful both on tightly managed systems running a wide mix of
    submitted jobs, which may choose to terminate or reprioritize jobs that are
    trying to use more memory than allowed on the nodes assigned them, and with
    tightly coupled, long running, massively parallel scientific computing jobs
    that will dramatically fail to meet required performance goals if they
    start to use more memory than allowed to them.

    This patch just provides a very economical way for the batch manager to
    monitor a cpuset for signs of memory pressure. It's up to the batch
    manager or other user code to decide what to do about it and take action.

    ==> Unless this feature is enabled by writing "1" to the special file
    /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
    code of __alloc_pages() for this metric reduces to simply noticing
    that the cpuset_memory_pressure_enabled flag is zero. So only
    systems that enable this feature will compute the metric.

    Why a per-cpuset, running average:

    Because this meter is per-cpuset, rather than per-task or mm, the
    system load imposed by a batch scheduler monitoring this metric is
    sharply reduced on large systems, because a scan of the tasklist can be
    avoided on each set of queries.

    Because this meter is a running average, instead of an accumulating
    counter, a batch scheduler can detect memory pressure with a single
    read, instead of having to read and accumulate results for a period of
    time.

    Because this meter is per-cpuset rather than per-task or mm, the
    batch scheduler can obtain the key information, memory pressure in a
    cpuset, with a single read, rather than having to query and accumulate
    results over all the (dynamically changing) set of tasks in the cpuset.

    A per-cpuset simple digital filter (requires a spinlock and 3 words of data
    per-cpuset) is kept, and updated by any task attached to that cpuset, if it
    enters the synchronous (direct) page reclaim code.

    A per-cpuset file provides an integer number representing the recent
    (half-life of 10 seconds) rate of direct page reclaims caused by the tasks
    in the cpuset, in units of reclaims attempted per second, times 1000.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Finish converting mm/mempolicy.c from bitmaps to nodemasks. The previous
    conversion had left one routine using bitmaps, since it involved a
    corresponding change to kernel/cpuset.c

    Fix that interface by replacing with a simple macro that calls nodes_subset(),
    or if !CONFIG_CPUSET, returns (1).

    Signed-off-by: Paul Jackson
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Fix the default behaviour for the remap operators in bitmap, cpumask and
    nodemask.

    As previously submitted, the pair of masks defined a map of the
    positions of the set bits in A to the corresponding bits in B. This is still
    true.

    The issue is how to map the other positions, corresponding to the unset (0)
    bits in A. As previously submitted, they were all mapped to the first set bit
    position in B, a constant map.

    When I tried to code per-vma mempolicy rebinding using these remap operators,
    I realized this was wrong.

    This patch changes the default to map all the unset bit positions in A to the
    same positions in B, the identity map.

    For example, if A has bits 4-7 set, and B has bits 9-12 set, then the map
    defined by the pair maps each bit position in the first 32 bits as
    follows:

    0 ==> 0
    ...
    3 ==> 3
    4 ==> 9
    ...
    7 ==> 12
    8 ==> 8
    9 ==> 9
    ...
    31 ==> 31

    This now corresponds to the typical behaviour desired when migrating pages and
    policies from one cpuset to another.

    The pages on nodes within the original cpuset, and the references in memory
    policies to nodes within the original cpuset, are migrated to the
    corresponding cpuset-relative nodes in the destination cpuset. Other pages
    and node references are left untouched.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • configurable replacement for slab allocator

    This adds a CONFIG_SLAB option under CONFIG_EMBEDDED. When CONFIG_SLAB is
    disabled, the kernel falls back to using the 'SLOB' allocator.

    SLOB is a traditional K&R/UNIX allocator with a SLAB emulation layer,
    similar to the original Linux kmalloc allocator that SLAB replaced. It's
    signicantly smaller code and is more memory efficient. But like all
    similar allocators, it scales poorly and suffers from fragmentation more
    than SLAB, so it's only appropriate for small systems.

    It's been tested extensively in the Linux-tiny tree. I've also
    stress-tested it with make -j 8 compiles on a 3G SMP+PREEMPT box (not
    recommended).

    Here's a comparison for otherwise identical builds, showing SLOB saving
    nearly half a megabyte of RAM:

    $ size vmlinux*
    text data bss dec hex filename
    3336372 529360 190812 4056544 3de5e0 vmlinux-slab
    3323208 527948 190684 4041840 3dac70 vmlinux-slob

    $ size mm/{slab,slob}.o
    text data bss dec hex filename
    13221 752 48 14021 36c5 mm/slab.o
    1896 52 8 1956 7a4 mm/slob.o

    /proc/meminfo:
    SLAB SLOB delta
    MemTotal: 27964 kB 27980 kB +16 kB
    MemFree: 24596 kB 25092 kB +496 kB
    Buffers: 36 kB 36 kB 0 kB
    Cached: 1188 kB 1188 kB 0 kB
    SwapCached: 0 kB 0 kB 0 kB
    Active: 608 kB 600 kB -8 kB
    Inactive: 808 kB 812 kB +4 kB
    HighTotal: 0 kB 0 kB 0 kB
    HighFree: 0 kB 0 kB 0 kB
    LowTotal: 27964 kB 27980 kB +16 kB
    LowFree: 24596 kB 25092 kB +496 kB
    SwapTotal: 0 kB 0 kB 0 kB
    SwapFree: 0 kB 0 kB 0 kB
    Dirty: 4 kB 12 kB +8 kB
    Writeback: 0 kB 0 kB 0 kB
    Mapped: 560 kB 556 kB -4 kB
    Slab: 1756 kB 0 kB -1756 kB
    CommitLimit: 13980 kB 13988 kB +8 kB
    Committed_AS: 4208 kB 4208 kB 0 kB
    PageTables: 28 kB 28 kB 0 kB
    VmallocTotal: 1007312 kB 1007312 kB 0 kB
    VmallocUsed: 48 kB 48 kB 0 kB
    VmallocChunk: 1007264 kB 1007264 kB 0 kB

    (this work has been sponsored in part by CELF)

    From: Ingo Molnar

    Fix 32-bitness bugs in mm/slob.c.

    Signed-off-by: Matt Mackall
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • Add mm/util.c for functions common between SLAB and SLOB.

    Signed-off-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • Make DEBUG_SLAB depend on SLAB.

    Signed-off-by: Ingo Molnar
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Shrink the height of a radix tree when it is partially truncated - we only do
    shrinkage of full truncation at present.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Correctly determine the tags to be cleared in radix_tree_delete() so we
    don't keep moving up the tree clearing tags that we don't need to. For
    example, if a tag is simply not set in the deleted item, nor anywhere up
    the tree, radix_tree_delete() would attempt to clear it up the entire
    height of the tree.

    Also, tag_set() was made conditional so as not to dirty too many cachelines
    high up in the radix tree. Instead, put this logic into
    radix_tree_tag_set().

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Introduce helper any_tag_set() rather than repeat the same code sequence 4
    times.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The latest set of signal-RCU patches does not use get_task_struct_rcu().
    Attached is a patch that removes it.

    Signed-off-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • Some simplification in checking signal delivery against concurrent exit.
    Instead of using get_task_struct_rcu(), which increments the task_struct
    reference count, check the reference count after acquiring sighand lock.

    Signed-off-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked
    instead of tasklist_lock read-locked. This is a scalability improvement on
    SMP and a preemption-latency improvement under PREEMPT_RCU.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar
    Acked-by: William Irwin
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Applying RCU to the task structure broke oprofile, because
    free_task_notify() can now be called from softirq. This means that the
    task_mortuary lock must be acquired with irq disabled in order to avoid
    intermittent self-deadlock. Since irq is now disabled, the critical
    section within process_task_mortuary() has been restructured to be O(1) in
    order to maximize scalability and minimize realtime latency degradation.

    Kudos to Wu Fengguang for finding this problem!

    CC: Wu Fengguang
    Cc: Philippe Elie
    Cc: John Levon
    Signed-off-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • If MODE_TT=n, MODE_SKAS must be y.

    Signed-off-by: Adrian Bunk
    Acked-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This fixes some mangled whitespace added by the earlier trap_user.c patch.

    Signed-off-by: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • Most of the architectures have the same asm/futex.h. This consolidates them
    into asm-generic, with the arches including it from their own asm/futex.h.

    In the case of UML, this reverts the old broken futex.h and goes back to using
    the same one as almost everyone else.

    Signed-off-by: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • The serial UML OS-abstraction layer patch (um/kernel dir).

    This joins trap_user.c and trap_kernel.c files.

    Signed-off-by: Gennady Sharapov
    Signed-off-by: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gennady Sharapov
     
  • The serial UML OS-abstraction layer patch (um/kernel dir).

    This moves all systemcalls from trap_user.c file under os-Linux dir

    Signed-off-by: Gennady Sharapov
    Signed-off-by: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gennady Sharapov
     
  • The serial UML OS-abstraction layer patch (um/kernel dir).

    This moves all systemcalls from signal_user.c file under os-Linux dir

    Signed-off-by: Gennady Sharapov
    Signed-off-by: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gennady Sharapov
     
  • ds1620 module is using gpio_read symbol, so works only if "built-in" symbol
    needs to be exported from the kernel image

    Signed-off-by: Woody Suwalski
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Woody Suwalski
     
  • Kill L1_CACHE_SHIFT from all arches. Since L1_CACHE_SHIFT_MAX is not used
    anymore with the introduction of INTERNODE_CACHE, kill L1_CACHE_SHIFT_MAX.

    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
  • ____cacheline_maxaligned_in_smp is currently used to align critical structures
    and avoid false sharing. It uses per-arch L1_CACHE_SHIFT_MAX and people find
    L1_CACHE_SHIFT_MAX useless.

    However, we have been using ____cacheline_maxaligned_in_smp to align
    structures on the internode cacheline size. As per Andi's suggestion,
    following patch kills ____cacheline_maxaligned_in_smp and introduces
    INTERNODE_CACHE_SHIFT, which defaults to L1_CACHE_SHIFT for all arches.
    Arches needing L3/Internode cacheline alignment can define
    INTERNODE_CACHE_SHIFT in the arch asm/cache.h. Patch replaces
    ____cacheline_maxaligned_in_smp with ____cacheline_internodealigned_in_smp

    With this patch, L1_CACHE_SHIFT_MAX can be killed

    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
  • Fix an uninitialised variable warning in the serverworks driver.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Fix an uninitialised variable warning in the atm nicstar driver.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Fix a number of miscellanous items:

    (1) Declare lock sections in the linker script.

    (2) Recurse in the correct manner in the arch makefile.

    (3) asm/bug.h requires asm/linkage.h to be included first. One C file puts
    asm/bug.h first.

    (4) Add an empty RTC header file to avoid missing header file errors.

    (5) sg_dma_address() should use the dma_address member of a scatter list.

    (6) Add trivial pci_unmap support.

    (7) Add pgprot_noncached()

    (8) Discard u_quad_t.

    (9) Use ~0UL rather than ULONG_MAX in unistd.h in case the latter isn't
    declared.

    (10) Add an empty VGA header file to avoid missing header file errors.

    (11) Add an XOR header file to use the generic XOR stuff.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Make the get_user macro cast the source pointer to an appropriate type for the
    specified size.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Force the 8230 serial driver to be built in if the on-CPU UARTs are to be
    used. It can't be used as a module because the arch setup needs to call into
    it.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Fix PCMCIA configuration for FRV by including the stock PCMCIA configuration
    description file.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Implement pci_iomap() for FRV.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Add stubs for FRV module support.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Supply various I/O access primitives that are missing for the FRV arch:

    (*) mmiowb()

    (*) read*_relaxed()

    (*) ioport_*map()

    (*) ioread*(), iowrite*(), ioread*_rep() and iowrite*_rep()

    (*) pci_io*map()

    (*) check_signature()

    The patch also makes __is_PCI_addr() more efficient.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Fix the exception table handling so that modules exceptions are dealt with.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Export a number of features required to build all the modules. It also
    implements the following simple features:

    (*) csum_partial_copy_from_user() for MMU as well as no-MMU.

    (*) __ucmpdi2().

    so that they can be exported too.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Drop support for debugging features that aren't supported on FRV:

    (*) EARLY_PRINTK

    The on-chip UARTs are set up early enough that this isn't required,
    and VGA support isn't available. There's also a gdbstub available.

    (*) DEBUG_PAGEALLOC

    This can't be easily be done since we use huge static mappings to
    cover the kernel, not pages.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Drop support for 8-bit and 16-bit xchg and cmpxchg emulation and implements
    32-bit xchg with the SWAP/SWAPI instruction.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Suppress configuration of certain features for the FRV arch as they can't be
    built for FRV at the moment:

    (*) RTC

    (*) HISAX_*

    (*) PARPORT_PC

    (*) VGA_CONSOLE

    (*) BINFMT_ELF

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • The Kconfig symbol for pnx0105 was recently renamed to ARCH_PNX010X.

    Signed-off-by: Lennert Buytenhek
    Cc: dmitry pervushin
    Cc:
    Cc: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lennert Buytenhek
     
  • PNX010X support for CS89x0 should be conditional on NET_PCI, as it is an 'on
    board controller' and NET_PCI includes that category of NICs. Since
    ARCH_PNX0105 was recently changed to ARCH_PNX010X, incorporate that change as
    well while we're at it.

    Signed-off-by: Lennert Buytenhek
    Cc: dmitry pervushin
    Cc:
    Cc: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lennert Buytenhek