20 Jan, 2006

5 commits


19 Jan, 2006

33 commits

  • This also includes by necessity _TIF_RESTORE_SIGMASK support,
    which actually resulted in a lot of cleanups.

    The sparc signal handling code is quite a mess and I should
    clean it up some day.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • This is a subset of the bluesmoke project core code, stripped of the NMI work
    which isn't ready to merge and some of the "interesting" proc functionality
    that needs reworking or just has no place in kernel. It requires no core
    kernel changes except the added scrub functions already posted.

    The goal is to merge further functionality only after the core code is
    accepted and proven in the base kernel, and only at the point the upstream
    extras are really ready to merge.

    From: doug thompson

    This converts EDAC to sysfs and is the final chunk necessary before EDAC
    has a stable user space API and can be considered for submission into the
    base kernel.

    Signed-off-by: Alan Cox
    Signed-off-by: Adrian Bunk
    Signed-off-by: Jesper Juhl
    Signed-off-by: doug thompson
    Signed-off-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • EDAC requires a way to scrub memory if an ECC error is found and the chipset
    does not do the work automatically. That means rewriting memory locations
    atomically with respect to all CPUs _and_ bus masters. That means we can't
    use atomic_add(foo, 0), as it gets optimised for non-SMP builds.

    This adds a function to include/asm-foo/atomic.h for the platforms currently
    supported which implements a scrub of a mapped block.

    It also adjusts the include order in a few other files where atomic.h is
    included before types.h, as this now causes an error because atomic_scrub
    uses u32.
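    The scrub can be sketched in user space as follows. `scrub_words` is a hypothetical name, not the kernel's atomic_scrub(). On x86 it uses an explicit `lock; addl $0` so the lock prefix cannot be dropped; elsewhere it assumes a GCC/Clang `__atomic_fetch_add` of zero is adequate, which is only an assumption, since a compiler may lower an idempotent atomic add to a fenced load.

```c
#include <stdint.h>

/*
 * Hypothetical sketch (not the kernel's atomic_scrub()): rewrite every
 * 32-bit word of a mapped block in place with a locked read-modify-write
 * that adds zero, so the data is unchanged but freshly written back.
 */
static void scrub_words(void *va, uint32_t size)
{
    volatile uint32_t *p = va;

    for (uint32_t i = 0; i < size / 4; i++, p++) {
#if defined(__i386__) || defined(__x86_64__)
        /* Explicit lock prefix: never dropped, even on a UP build. */
        __asm__ __volatile__("lock; addl $0, %0" : "+m"(*p) : : "memory");
#else
        /* Assumed fallback: an atomic add of zero via a GCC/Clang builtin. */
        (void)__atomic_fetch_add(p, 0, __ATOMIC_SEQ_CST);
#endif
    }
}
```

    The values in the block are unchanged afterwards; only the write-back matters.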

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Add the sys_pselect6() and sys_ppoll() calls to the i386 syscall table.

    Signed-off-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
     
  • The following implementation of ppoll() and pselect() system calls
    depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the
    thread_info.

    These system calls have to change the signal mask during their
    operation, and signal handlers must be invoked using the new, temporary
    signal mask. The old signal mask must be restored either upon successful
    exit from the system call, or upon returning from the invoked signal
    handler if the system call is interrupted. We can't simply restore the
    original signal mask and return to userspace, since the restored signal
    mask may actually block the signal which interrupted the system call.

    The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit
    path to trap into do_signal() just as TIF_SIGPENDING does, and by
    causing do_signal() to use the saved signal mask instead of the current
    signal mask when setting up the stack frame for the signal handler -- or
    by causing do_signal() to simply restore the saved signal mask in the
    case where there is no handler to be invoked.
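    The user-visible effect of this mechanism can be checked with pselect(). In the sketch below (the function name is hypothetical), SIGUSR1 is blocked and made pending; pselect() then runs with an empty temporary mask, so the handler fires and the call fails with EINTR, and on return the original mask, with SIGUSR1 still blocked, is back in force. It assumes a Linux/glibc system where pselect() is backed by the pselect6 system call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

/* Returns 1 if pselect() was interrupted by SIGUSR1 under the temporary
 * mask and the original mask (SIGUSR1 blocked) was back in force afterwards. */
int demo_pselect_restores_mask(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;

    sigset_t block_usr1, old_mask, temp_mask, after;
    sigemptyset(&block_usr1);
    sigaddset(&block_usr1, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block_usr1, &old_mask) != 0)
        return 0;

    raise(SIGUSR1);                 /* stays pending: SIGUSR1 is blocked */

    sigemptyset(&temp_mask);        /* nothing blocked while pselect() runs */
    struct timespec timeout = { 5, 0 };
    int rc = pselect(0, NULL, NULL, NULL, &timeout, &temp_mask);
    int err = errno;

    sigprocmask(SIG_SETMASK, NULL, &after);   /* query the current mask */
    int still_blocked = sigismember(&after, SIGUSR1);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return rc == -1 && err == EINTR && got_usr1 && still_blocked == 1;
}
```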

    The first patch implements the sys_pselect() and sys_ppoll() system
    calls, which are present only if TIF_RESTORE_SIGMASK is defined. That
    #ifdef should go away once all architectures have implemented
    it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC
    kernel (in the -mm tree), and the third patch then removes the
    arch-specific implementations of sys_rt_sigsuspend() and replaces them
    with generic versions using the same trick.

    The fourth and fifth patches, provided by David Howells, implement
    TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch
    adds the syscalls to the i386 syscall table.

    This patch:

    Add the pselect() and ppoll() system calls, providing core routines usable by
    the original select() and poll() system calls and also the new calls (with
    their semantics w.r.t. timeouts).
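    A minimal ppoll() usage sketch, assuming Linux and glibc (`demo_ppoll` is a hypothetical helper). The timeout is a struct timespec rather than poll()'s millisecond int, and the fourth argument is the temporary signal mask in force for the duration of the call.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/*
 * Poll the read end of a fresh pipe with a 10 ms timespec timeout and an
 * empty temporary signal mask. Returns ppoll()'s result: 1 if a byte was
 * queued beforehand, 0 on timeout, -1 on error.
 */
int demo_ppoll(int with_data)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int rc = -1;
    if (!with_data || write(fds[1], "x", 1) == 1) {
        struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
        struct timespec timeout = { 0, 10 * 1000 * 1000 };  /* 10 ms */
        sigset_t no_signals_blocked;
        sigemptyset(&no_signals_blocked);
        rc = ppoll(&pfd, 1, &timeout, &no_signals_blocked);
    }
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```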

    Signed-off-by: David Woodhouse
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
     
  • Use the generic sys_rt_sigsuspend.

    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • Add support for TIF_RESTORE_SIGMASK. I copy the i386 handling of the flag.
    sys_sigsuspend is also changed to follow i386.

    Also a bit of cleanup:
    - turn an if into a switch
    - get rid of a couple more emacs formatting comments

    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • Implement the TIF_RESTORE_SIGMASK flag in the new arch/powerpc kernel, for
    both 32-bit and 64-bit system call paths.

    Signed-off-by: David Woodhouse
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
     
  • Handle TIF_RESTORE_SIGMASK as added by David Woodhouse's patch entitled:

    [PATCH] 2/3 Add TIF_RESTORE_SIGMASK support for arch/powerpc
    [PATCH] 3/3 Generic sys_rt_sigsuspend

    It does the following:

    (1) Declares TIF_RESTORE_SIGMASK for i386.

    (2) Hands over to do_signal() when TIF_RESTORE_SIGMASK is set.

    (3) Makes do_signal() support TIF_RESTORE_SIGMASK, using the signal mask saved
    in current->saved_sigmask.

    (4) Discards sys_rt_sigsuspend() from the arch, using the generic one instead.

    (5) Makes sys_sigsuspend() save the signal mask and set TIF_RESTORE_SIGMASK
    rather than attempting to fudge the return registers.

    (6) Makes sys_sigsuspend() return -ERESTARTNOHAND rather than looping
    intrinsically.

    (7) Makes setup_frame(), setup_rt_frame() and handle_signal() return 0 or
    -EFAULT rather than true/false to be consistent with the rest of the
    kernel.

    Because do_signal() is then only called from one place:

    (8) Makes do_signal() no longer return a value, as it was just being
    ignored; force_sig() takes care of this.

    (9) Discards the old sigmask argument to do_signal() as it's no longer
    necessary.

    (10) Makes do_signal() static.

    (11) Marks the second argument to do_notify_resume() as unused. The unused
    argument should remain in the middle as the arguments are passed in as
    registers, and the ordering is fixed by entry.S.

    Since do_signal() is no longer called from sys_{,rt_}sigsuspend(), these
    functions no longer need access to the exception frame and can simply take
    their arguments normally.

    This patch depends on the sys_rt_sigsuspend patch.

    Signed-off-by: David Howells
    Signed-off-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Handle TIF_RESTORE_SIGMASK as added by David Woodhouse's patch entitled:

    [PATCH] 2/3 Add TIF_RESTORE_SIGMASK support for arch/powerpc
    [PATCH] 3/3 Generic sys_rt_sigsuspend

    It does the following:

    (1) Declares TIF_RESTORE_SIGMASK for FRV.

    (2) Hands over to do_signal() when TIF_RESTORE_SIGMASK is set.

    (3) Makes do_signal() support TIF_RESTORE_SIGMASK, using the signal mask saved
    in current->saved_sigmask.

    (4) Discards sys_rt_sigsuspend() from the arch, using the generic one instead.

    (5) Makes sys_sigsuspend() save the signal mask and set TIF_RESTORE_SIGMASK
    rather than attempting to fudge the return registers.

    (6) Makes sys_sigsuspend() return -ERESTARTNOHAND rather than looping
    intrinsically.

    (7) Makes setup_frame(), setup_rt_frame() and handle_signal() return 0 or
    -EFAULT rather than true/false to be consistent with the rest of the
    kernel.

    Because do_signal() is then only called from one place:

    (8) Makes do_signal() no longer return a value, as it was just being
    ignored; force_sig() takes care of this.

    (9) Discards the old sigmask argument to do_signal() as it's no longer
    necessary.

    This patch depends on the FRV signalling patches as well as the
    sys_rt_sigsuspend patch.

    Signed-off-by: David Howells
    Signed-off-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of
    sys_rt_sigsuspend() instead of duplicating it for each architecture. This
    provides such an implementation and makes arch/powerpc use it.

    It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK.

    Signed-off-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
     
  • Wire up the x86_64 syscalls.

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • Wire up the x86 syscalls.

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • Here is a series of patches which introduce a total of 13 new system calls
    which take a file descriptor/filename pair instead of a single file
    name. These functions, openat etc., have been discussed on numerous
    occasions. They are needed to implement race-free filesystem traversal,
    they are necessary to implement a virtual per-thread current working
    directory (think multi-threaded backup software), etc.

    We have in glibc today implementations of the interfaces which use the
    /proc/self/fd magic. But this code is rather expensive. Here are some
    results (similar to what Jim Meyering posted before).

    The test creates a deep directory hierarchy on a tmpfs filesystem. Then
    rm -fr is used to remove all directories. Without syscall support I get
    this:

    real 0m31.921s
    user 0m0.688s
    sys 0m31.234s

    With syscall support the results are much better:

    real 0m20.699s
    user 0m0.536s
    sys 0m20.149s

    The interfaces are for obvious reasons currently not much used. But they'll
    be used. coreutils (and Jeff's posixutils) are already using them.
    Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
    them. I expect a patch to follow soon. Every program which is walking
    the filesystem tree will benefit.
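    As an illustration of the kind of traversal these calls enable, here is a sketch of a recursive `rm -fr` built on openat(), fstatat(), fdopendir() and unlinkat(). The helper names are hypothetical. Every lookup is relative to an already-open directory descriptor, and O_NOFOLLOW/AT_SYMLINK_NOFOLLOW keep symlinks from redirecting the walk; error handling is deliberately simple.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Remove everything inside the directory open as fd; always closes fd. */
static int remove_contents_at(int fd)
{
    DIR *d = fdopendir(fd);          /* on success, d owns fd */
    if (d == NULL) {
        close(fd);
        return -1;
    }
    int rc = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        const char *name = e->d_name;
        if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
            continue;
        struct stat st;
        if (fstatat(fd, name, &st, AT_SYMLINK_NOFOLLOW) != 0) {
            rc = -1;
            continue;
        }
        if (S_ISDIR(st.st_mode)) {
            /* Descend relative to fd, never via a full path string. */
            int sub = openat(fd, name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
            if (sub < 0 || remove_contents_at(sub) != 0)
                rc = -1;
            if (unlinkat(fd, name, AT_REMOVEDIR) != 0)
                rc = -1;
        } else if (unlinkat(fd, name, 0) != 0) {
            rc = -1;
        }
    }
    closedir(d);                     /* also closes fd */
    return rc;
}

/* Remove the directory `name` (relative to parentfd) and all its contents. */
int remove_tree_at(int parentfd, const char *name)
{
    int fd = openat(parentfd, name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
    if (fd < 0)
        return -1;
    int rc = remove_contents_at(fd);
    if (unlinkat(parentfd, name, AT_REMOVEDIR) != 0)
        rc = -1;
    return rc;
}
```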

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Alexey Dobriyan
    Cc: Christoph Hellwig
    Cc: Al Viro
    Acked-by: Ingo Molnar
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • One of the things that's confusing about nfsd4_lock is that the lk_stateowner
    field could be set to either of two different lockowners: the open owner or
    the lock owner. Rename to lk_replay_owner and add a comment to make it clear
    that it's used for whichever stateowner has its sequence id bumped for replay
    detection.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • The server code currently keeps track of the destination address on every
    request so that it can reply using the same address. However we forget to do
    that in the case of a deferred request. Remedy this oversight. From folks
    at PolyServe.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • Change nfsd_sync_dir to return an error if ->sync fails, and pass that error
    up through the stack. This involves a number of rearrangements of error
    paths, and care to distinguish between Linux -errno numbers and NFSERR
    numbers.

    In the 'create' routines, we continue with the 'setattr' even if a previous
    sync_dir failed.

    This patch is quite different from Takashi's in a few ways, but there is still
    a strong lineage.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    YAMAMOTO Takashi
     
  • All standard system calls should be declared in include/linux/syscalls.h.

    Add some of the new additions that were previously missed.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • I added this line to share this file with UML, but it is no longer
    shared, so remove this useless leftover.

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Acked-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
     
  • Add implementations of the write* and __raw_write* functions. __raw_writel is
    needed by lib/iocopy.c, which shouldn't be used in UML, but which is
    unconditionally linked in anyway.
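    Assuming that on UML "I/O memory" is just ordinary process memory, one plausible shape for these accessors is a plain volatile store. The names below are hypothetical stand-ins, not the UML header's definitions, and no byte swapping or barriers are modelled.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for __raw_writel()/writel() on a platform whose
 * "I/O memory" is ordinary process memory: a single volatile store.
 */
static inline void sketch_raw_writel(uint32_t val, volatile void *addr)
{
    *(volatile uint32_t *)addr = val;
}

static inline void sketch_writel(uint32_t val, volatile void *addr)
{
    /* No byte swapping or memory barriers are modelled here. */
    sketch_raw_writel(val, addr);
}
```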

    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • This patch fixes a regression in 2.6.14 against 2.6.13 that causes an
    imbalance in memory allocation during bootup.

    The slab allocator in 2.6.13 is not NUMA-aware and simply calls
    alloc_pages(). This means that memory policies may control the behavior of
    alloc_pages(). During bootup the memory policy is set to MPOL_INTERLEAVE
    resulting in the spreading out of allocations during bootup over all
    available nodes. The slab allocator in 2.6.13 has only a single list of
    slab pages. As a result the per cpu slab cache and the spinlock controlled
    page lists may contain slab entries from off node memory. The slab
    allocator in 2.6.13 makes no effort to discern the locality of an entry on
    its lists.

    The NUMA aware slab allocator in 2.6.14 controls locality of the slab pages
    explicitly by calling alloc_pages_node(). The NUMA slab allocator manages
    slab entries by having lists of available slab pages for each node. The
    per cpu slab cache can only contain slab entries associated with the node
    local to the processor. This guarantees that the default allocation mode
    of the slab allocator always assigns local memory if available.

    Setting MPOL_INTERLEAVE as a default policy during bootup has no effect
    anymore. In 2.6.14 all node-unspecific slab allocations are performed on
    the boot processor. This means that most key data structures are
    allocated on one node. Most processors will have to refer to these
    structures making the boot node a potential bottleneck. This may reduce
    performance and cause unnecessary memory pressure on the boot node.

    This patch implements NUMA policies in the slab layer. The slab allocator
    must now apply NUMA memory policies explicitly itself, since the NUMA slab
    allocator no longer lets the page allocator control locality.

    The check for policies is made directly at the beginning of __cache_alloc
    using current->mempolicy. The memory policy is already frequently checked
    by the page allocator (alloc_page_vma() and alloc_page_current()). So it
    is highly likely that the cacheline is present. For MPOL_INTERLEAVE
    kmalloc() will spread out each request to one node after another so that an
    equal distribution of allocations can be obtained during bootup.
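    The MPOL_INTERLEAVE behaviour described above, handing each request to the next allowed node in turn, can be modelled as a round-robin over a node bitmask. This is a hypothetical user-space model, not the kernel's mempolicy code; it assumes at most 64 nodes and that `next` starts on an allowed node.

```c
#include <stdint.h>

#define MODEL_MAX_NODES 64

/* Hypothetical per-process interleave state. */
struct interleave_state {
    uint64_t allowed;  /* bit n set => node n may be used */
    int next;          /* node for the next request; must be allowed */
};

/* Next allowed node after `after`, wrapping around; -1 if mask is empty. */
static int next_allowed_node(uint64_t mask, int after)
{
    for (int n = after + 1; n < MODEL_MAX_NODES; n++)
        if (mask & (UINT64_C(1) << n))
            return n;
    for (int n = 0; n <= after && n < MODEL_MAX_NODES; n++)
        if (mask & (UINT64_C(1) << n))
            return n;
    return -1;
}

/* Pick the node for this allocation and advance the cursor. */
static int interleave_pick(struct interleave_state *s)
{
    int nid = s->next;
    s->next = next_allowed_node(s->allowed, nid);
    return nid;
}
```

    With nodes 0, 2 and 3 allowed (mask 0xD), successive picks cycle 0, 2, 3, 0, ...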

    It is not possible to push the policy check to lower layers of the NUMA
    slab allocator, since the per cpu caches now contain only slab entries
    from the current node. If the policy says that the local node is not to be
    preferred, or is forbidden, then there is no point in checking the slab
    cache or the local list of slab pages. The allocation had better be
    directed immediately to the lists containing slab entries for the allowed
    set of nodes.

    This way of applying policy also fixes another strange behavior in 2.6.13.
    alloc_pages() is controlled by the memory allocation policy of the current
    process. It could therefore be that one process is running with
    MPOL_INTERLEAVE and would, e.g., obtain a new page following that policy
    because no slab entries are left in the lists. A page can typically be
    used for multiple slab entries, but let's say that the current process is
    only using one. The other entries are then added to the slab lists. These
    are now non-local entries in the slab lists despite the possible
    availability of local pages that would provide faster access and increase
    the performance of the application.

    Another process without MPOL_INTERLEAVE may now run and expect a local slab
    entry from kmalloc(). However, there are still these free slab entries
    from the off node page obtained from the other process via MPOL_INTERLEAVE
    in the cache. The process will then get an off node slab entry although
    other slab entries may be available that are local to that process. This
    means that the policy of one process may contaminate the locality of the
    slab caches for other processes.

    This patch in effect ensures that a per-process policy is followed for the
    allocation of slab entries and that there cannot be a memory policy
    influence from one process to another. A process with default policy will
    always get a local slab entry if one is available, and a process using
    memory policies will get its memory arranged as requested. Off-node slab
    allocation requires the use of spinlocks and rules out the use of per
    cpu caches. A process using memory policies to redirect allocations
    off-node will have to cope with additional lock overhead in addition to
    the latency added by the need to access a remote slab entry.

    Changes V1->V2
    - Remove #ifdef CONFIG_NUMA by moving forward declaration into
    prior #ifdef CONFIG_NUMA section.

    - Give the function determining the node number to use a saner
    name.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • proc support for zone reclaim

    This patch creates a proc entry /proc/sys/vm/zone_reclaim_mode that may be
    used to override the automatic determination of the zone reclaim made on
    bootup.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Some bits for zone reclaim exist in 2.6.15 but they are not usable. This
    patch fixes them up, removes unused code and makes zone reclaim usable.

    Zone reclaim allows the reclaiming of pages from a zone if the number of
    free pages falls below the watermarks even if other zones still have enough
    pages available. Zone reclaim is of particular importance for NUMA
    machines. It can be more beneficial to reclaim a page than to take the
    performance penalties that come with allocating a page on a remote zone.

    Zone reclaim is enabled if the maximum distance to another node is higher
    than RECLAIM_DISTANCE, which may be defined by an arch. By default
    RECLAIM_DISTANCE is 20. 20 is the distance to another node in the same
    component (enclosure or motherboard) on IA64. The meaning of the NUMA
    distance information seems to vary by arch.

    If zone reclaim is not successful then no further reclaim attempts will
    occur for a certain time period (ZONE_RECLAIM_INTERVAL).
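    The two rules above (enable zone reclaim when some node is farther than RECLAIM_DISTANCE, and back off for ZONE_RECLAIM_INTERVAL after a failed attempt) can be sketched as follows. The names and the 30-tick interval are hypothetical; only the default distance of 20 comes from the text.

```c
#include <stdbool.h>

#define MODEL_RECLAIM_DISTANCE 20   /* default RECLAIM_DISTANCE from the text */
#define MODEL_RECLAIM_INTERVAL 30   /* hypothetical back-off, in ticks */

/* Enable zone reclaim if any node in this node's distance row is farther
 * than RECLAIM_DISTANCE. */
static bool zone_reclaim_wanted(const int *distance_row, int nr_nodes)
{
    for (int n = 0; n < nr_nodes; n++)
        if (distance_row[n] > MODEL_RECLAIM_DISTANCE)
            return true;
    return false;
}

struct zone_model {
    bool failed_before;
    unsigned long last_unsuccessful;  /* tick of the last failed attempt */
};

/* After a failure, skip further attempts until the interval has passed. */
static bool may_try_zone_reclaim(const struct zone_model *z, unsigned long now)
{
    return !z->failed_before ||
           now - z->last_unsuccessful >= MODEL_RECLAIM_INTERVAL;
}

static void record_zone_reclaim(struct zone_model *z, bool success,
                                unsigned long now)
{
    if (!success) {
        z->failed_before = true;
        z->last_unsuccessful = now;
    }
}
```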

    This patch was discussed before. See

    http://marc.theaimsgroup.com/?l=linux-kernel&m=113519961504207&w=2
    http://marc.theaimsgroup.com/?l=linux-kernel&m=113408418232531&w=2
    http://marc.theaimsgroup.com/?l=linux-kernel&m=113389027420032&w=2
    http://marc.theaimsgroup.com/?l=linux-kernel&m=113380938612205&w=2

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Migration code currently does not take a reference to the target page
    properly, so between unlocking the pte and trying to take a new
    reference to the page with isolate_lru_page, anything could happen to
    it.

    Fix this by holding the pte lock until we get a chance to elevate the
    refcount.
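    The fix follows a general pattern: look up the object and raise its reference count while still holding the lock that keeps the mapping stable. Below is a user-space analogue with a pthread mutex standing in for the pte lock; the types and functions are hypothetical and, unlike the kernel, the refcount here is only touched under that one lock.

```c
#include <pthread.h>
#include <stddef.h>

struct page_model {
    int refcount;
};

struct pte_model {
    pthread_mutex_t lock;      /* stands in for the pte lock */
    struct page_model *page;   /* may be torn down by another path */
};

/* Look up the page and take a reference before dropping the lock, so the
 * page cannot go away between the lookup and the get. */
static struct page_model *get_page_locked(struct pte_model *pte)
{
    pthread_mutex_lock(&pte->lock);
    struct page_model *page = pte->page;
    if (page != NULL)
        page->refcount++;
    pthread_mutex_unlock(&pte->lock);
    return page;
}

/* Another path tearing down the mapping: clear it and drop its reference. */
static void unmap_page(struct pte_model *pte)
{
    pthread_mutex_lock(&pte->lock);
    if (pte->page != NULL) {
        pte->page->refcount--;
        pte->page = NULL;
    }
    pthread_mutex_unlock(&pte->lock);
}
```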

    Other small cleanups while we're here.

    Signed-off-by: Nick Piggin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • On alpha:

    In file included from drivers/scsi/sym53c8xx_2/sym_glue.h:59,
    from drivers/scsi/sym53c8xx_2/sym_fw.c:40:
    include/scsi/scsi_transport_spi.h:57: error: field `dv_mutex' has incomplete type

    Cc: James Bottomley
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Linus Torvalds
     
  • Linus Torvalds
     
  • Linus Torvalds
     
  • Russell King
     
  • From: Eddie C. Dost

    I have the following patch for serial console over the RSC
    (remote system controller) on my E250 machine. It basically adds
    support for input-device=rsc and output-device=rsc from OBP, and
    allows 115200,8,n,1,- serial mode setting.
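    For reference, a hypothetical parser for an OBP mode string such as "115200,8,n,1,-", assuming the five fields are baud rate, data bits, parity, stop bits and handshake (with '-' taken to mean none). This is an illustration, not the driver's code.

```c
#include <stdio.h>

/* Hypothetical parsed form of an OBP serial mode string. */
struct serial_mode {
    unsigned baud;
    unsigned data_bits;
    char parity;        /* 'n', 'e' or 'o' */
    unsigned stop_bits;
    char handshake;     /* '-' assumed to mean no handshake */
};

/* Returns 0 on success, -1 if the string lacks a field, has trailing
 * characters, or uses an unknown parity letter. */
static int parse_serial_mode(const char *s, struct serial_mode *m)
{
    int consumed = 0;
    if (sscanf(s, "%u,%u,%c,%u,%c%n", &m->baud, &m->data_bits, &m->parity,
               &m->stop_bits, &m->handshake, &consumed) != 5)
        return -1;
    if (s[consumed] != '\0')
        return -1;
    if (m->parity != 'n' && m->parity != 'e' && m->parity != 'o')
        return -1;
    return 0;
}
```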

    Signed-off-by: David S. Miller

    Eddie C. Dost
     
  • Patch from David Vrabel

    The PXA27x SSP controller has a few different registers, including SCR (serial clock rate) in SSCR0.

    Signed-off-by: David Vrabel
    Signed-off-by: Russell King

    David Vrabel
     
  • David S. Miller
     

18 Jan, 2006

2 commits