09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc point of view we replaced unsigned int with
    typedef) and documents what's going on far better.
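
    A minimal sketch of the idea, assuming the usual arrangement where sparse
    defines __CHECKER__ and gcc sees an empty annotation. The DEMO_* flag
    values and demo_may_sleep() are hypothetical and only illustrate how a
    prototype reads once it takes gfp_t:

```c
#include <assert.h>

/* Under sparse (__CHECKER__) the annotation is active; gcc sees nothing. */
#ifdef __CHECKER__
# define __nocast __attribute__((nocast))
#else
# define __nocast
#endif

typedef unsigned int __nocast gfp_t;

/* Hypothetical flag values, for illustration only. */
#define DEMO_GFP_WAIT ((gfp_t)0x10u)
#define DEMO_GFP_IO   ((gfp_t)0x40u)

/*
 * Hypothetical helper: the prototype now says that 'flags' carries gfp
 * flags, not an arbitrary unsigned int.
 */
static inline int demo_may_sleep(gfp_t flags)
{
    return (flags & DEMO_GFP_WAIT) != 0;
}
```

    For gcc the typedef is still a plain unsigned int, so generated code is
    unchanged.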

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

29 Sep, 2005

1 commit


28 Sep, 2005

1 commit

  • My previous patch fixing invalidation of huge PTEs wasn't good enough: we
    still had an issue if a PTE invalidation batch contained both small and
    large pages. This patch fixes this by making sure the batch is flushed if
    the page size fed to it changes.
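
    The rule can be sketched as follows. This is not the ppc64 code: struct
    demo_batch, demo_queue() and demo_flush() are hypothetical names, and the
    flush only counts invocations instead of invalidating anything.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BATCH_MAX 8

/* Hypothetical invalidation batch; all queued entries share one page size. */
struct demo_batch {
    size_t count;                       /* entries currently queued */
    int large;                          /* page size of the queued entries */
    unsigned long addr[DEMO_BATCH_MAX];
    unsigned int flushes;               /* number of flushes, for the demo */
};

static inline void demo_flush(struct demo_batch *b)
{
    if (b->count == 0)
        return;
    /* A real implementation would invalidate the queued entries here. */
    b->flushes++;
    b->count = 0;
}

static inline void demo_queue(struct demo_batch *b, unsigned long addr, int large)
{
    /* Flush first if the page size differs from what is already queued. */
    if (b->count != 0 && b->large != large)
        demo_flush(b);
    /* Also flush when the batch is full. */
    if (b->count == DEMO_BATCH_MAX)
        demo_flush(b);
    b->large = large;
    b->addr[b->count++] = addr;
}
```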

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     

23 Sep, 2005

1 commit

  • The SMU is the "system controller" chip used by Apple's recent G5 machines,
    including the iMac G5. It drives things like fans, i2c busses, the real
    time clock, etc.

    The current kernel contains a very crude driver that doesn't do much more
    than reading the real time clock synchronously. This is a completely
    rewritten driver that provides interrupt based command queuing, a userland
    interface, and an i2c/smbus driver for accessing the devices hanging off
    the SMU i2c busses like temperature sensors. This driver is a basic block
    for upcoming work on thermal control for those machines, among others.

    Signed-off-by: Benjamin Herrenschmidt
    Cc: Jean Delvare
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     

12 Sep, 2005

6 commits

  • ppc64_attention_msg and ppc64_dump_msg are not used so remove them.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Paul Mackerras

    Anton Blanchard
     
  • Add hardware data breakpoint support.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Paul Mackerras

    Anton Blanchard
     
  • - Add PTRACE_GET_DEBUGREG/PTRACE_SET_DEBUGREG. The definition is
    as follows:

    /*
     * Get or set a debug register. The first 16 are DABR registers and the
     * second 16 are IABR registers.
     */
    #define PTRACE_GET_DEBUGREG 25
    #define PTRACE_SET_DEBUGREG 26

    DABR == data breakpoint and IABR == instruction breakpoint in IBM
    speak. We could split out the IABR into 2 more ptrace calls but I
    figured there was no need and 16 DABR registers should be more
    than enough (POWER4/POWER5 have one).

    - Add 2 new SIGTRAP si_codes: TRAP_HWBKPT and TRAP_BRANCH. I couldn't
    find any standards on either of these so I copied what ia64 is
    doing. Again this might be better placed in
    include/asm-generic/siginfo.h
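
    As an illustration of the register numbering described above (indices
    0-15 are DABR registers, 16-31 are IABR registers), here is a small
    sketch. The request numbers are the ones from the message;
    demo_classify_debugreg() and its enum are hypothetical helpers, not part
    of the patch.

```c
#include <assert.h>

#define PTRACE_GET_DEBUGREG 25
#define PTRACE_SET_DEBUGREG 26

/* Hypothetical classification of a debug register index. */
enum demo_debugreg_kind { DEMO_DABR, DEMO_IABR, DEMO_INVALID };

static inline enum demo_debugreg_kind demo_classify_debugreg(unsigned long index)
{
    if (index < 16)
        return DEMO_DABR;    /* data breakpoint registers */
    if (index < 32)
        return DEMO_IABR;    /* instruction breakpoint registers */
    return DEMO_INVALID;
}
```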

    Signed-off-by: Anton Blanchard
    Signed-off-by: Paul Mackerras

    Anton Blanchard
     
  • - Remove the PPC_REG* defines
    - Wrap some more stuff with ifdef __KERNEL__
    - Add missing PT_TRAP, PT_DAR, PT_DSISR defines
    - Add PTRACE_GETEVRREGS/PTRACE_SETEVRREGS; even though we don't use them
    on ppc64, we don't want their numbers allocated for something else.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Paul Mackerras

    Anton Blanchard
     
  • The ptrace get and set methods for VMX/Altivec registers present in the
    ppc tree were missing for ppc64. This patch adds the 32-bit and
    64-bit methods. Updated with the suggestions from Anton following the lines
    of his code snippet.

    Added:
    - flush_altivec_to_thread calls as suggested by Anton
    - piecewise copy of structure to preserve 32-bit vrsave data as per
    Anton

    (I consolidated the 32 and 64bit versions with 2 helper macros - Anton)

    Signed-off-by: Robert C Jennings
    Signed-off-by: Anton Blanchard
    Signed-off-by: Paul Mackerras

    Robert Jennings
     
  • This adds code which gives us the option on ppc64 of instantiating the
    PCI tree (the tree of pci_bus and pci_dev structs) from the Open
    Firmware device tree rather than by probing PCI configuration space.
    The OF device tree has a node for each PCI device and bridge in the
    system, with properties that tell us what addresses the firmware has
    configured for them and other details.

    There are a couple of reasons why this is needed. First, on systems
    with a hypervisor, there is a PCI-PCI bridge per slot under the PCI
    host bridges. These PCI-PCI bridges have special isolation features
    for virtualization. We can't write to their config space, and we are
    not supposed to be reading their config space either. The firmware
    tells us about the address ranges that they pass in the OF device
    tree.

    Secondly, on powermacs, the interrupt controller is in a PCI device
    that may be behind a PCI-PCI bridge. If we happened to take an
    interrupt just at the point when the device or a bridge on the path to
    it was disabled for probing, we would crash when we try to access the
    interrupt controller.

    I have implemented a platform-specific function which is called for
    each PCI bridge (host or PCI-PCI) to say whether the code should look
    in the device tree or use normal PCI probing for the devices under
    that bridge. On pSeries machines we use the device tree if we're
    running under a hypervisor, otherwise we use normal probing. On
    powermacs we use normal probing for the AGP bridge, since the device
    for the AGP bridge itself isn't shown in the device tree (at least on
    my G5), and the device tree for everything else.
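
    The per-bridge decision described above could look roughly like this.
    All names here (struct demo_bridge, demo_use_device_tree()) are
    hypothetical; the sketch only encodes the policy stated in the text, not
    the actual platform hooks.

```c
#include <assert.h>

enum demo_platform { DEMO_PSERIES, DEMO_POWERMAC };

/* Hypothetical description of a PCI bridge (host or PCI-PCI). */
struct demo_bridge {
    enum demo_platform platform;
    int under_hypervisor;   /* pSeries only: running in a partition */
    int is_agp;             /* powermac only: this is the AGP bridge */
};

/* Returns 1 to instantiate from the OF device tree, 0 for normal probing. */
static inline int demo_use_device_tree(const struct demo_bridge *b)
{
    switch (b->platform) {
    case DEMO_PSERIES:
        return b->under_hypervisor;
    case DEMO_POWERMAC:
        return !b->is_agp;
    }
    return 0;
}
```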

    This has been tested on a dual G5 powermac, a partition on a POWER5
    machine (running under the hypervisor), and a legacy iSeries
    partition.

    Signed-off-by: Paul Mackerras

    Paul Mackerras
     

11 Sep, 2005

1 commit

  • This patch (written by me and also containing many suggestions of Arjan van
    de Ven) does a major cleanup of the spinlock code. It does the following
    things:

    - consolidates and enhances the spinlock/rwlock debugging code

    - simplifies the asm/spinlock.h files

    - encapsulates the raw spinlock type and moves generic spinlock
    features (such as ->break_lock) into the generic code.

    - cleans up the spinlock code hierarchy to get rid of the spaghetti.

    Most notably there's now only a single variant of the debugging code,
    located in lib/spinlock_debug.c. (previously we had one SMP debugging
    variant per architecture, plus a separate generic one for UP builds)

    Also, I've enhanced the rwlock debugging facility: it will now track
    write-owners. There is new spinlock-owner/CPU-tracking on SMP builds too.
    All locks have lockup detection now, which will work for both soft and hard
    spin/rwlock lockups.

    The arch-level include files now only contain the minimally necessary
    subset of the spinlock code - all the rest that can be generalized now
    lives in the generic headers:

    include/asm-i386/spinlock_types.h | 16
    include/asm-x86_64/spinlock_types.h | 16

    I have also split up the various spinlock variants into separate files,
    making it easier to see which does what. The new layout is:

     SMP                         |  UP
     ----------------------------|-----------------------------------
     asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
     linux/spinlock_types.h      |  linux/spinlock_types.h
     asm/spinlock_smp.h          |  linux/spinlock_up.h
     linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
     linux/spinlock.h            |  linux/spinlock.h

    /*
     * here's the role of the various spinlock/rwlock related include files:
     *
     * on SMP builds:
     *
     *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
     *                        initializers
     *
     *  linux/spinlock_types.h:
     *                        defines the generic type and initializers
     *
     *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
     *                        implementations, mostly inline assembly code
     *
     *   (also included on UP-debug builds:)
     *
     *  linux/spinlock_api_smp.h:
     *                        contains the prototypes for the _spin_*() APIs.
     *
     *  linux/spinlock.h:     builds the final spin_*() APIs.
     *
     * on UP builds:
     *
     *  linux/spinlock_types_up.h:
     *                        contains the generic, simplified UP spinlock type.
     *                        (which is an empty structure on non-debug builds)
     *
     *  linux/spinlock_types.h:
     *                        defines the generic type and initializers
     *
     *  linux/spinlock_up.h:
     *                        contains the __raw_spin_*()/etc. version of UP
     *                        builds. (which are NOPs on non-debug, non-preempt
     *                        builds)
     *
     *   (included on UP-non-debug builds:)
     *
     *  linux/spinlock_api_up.h:
     *                        builds the _spin_*() APIs.
     *
     *  linux/spinlock.h:     builds the final spin_*() APIs.
     */

    All SMP and UP architectures are converted by this patch.

    arm, i386, ia64, ppc, ppc64, s390/s390x and x64 were build-tested via
    crosscompilers. m32r, mips, sh and sparc have not been tested yet, but
    should be mostly fine.

    From: Grant Grundler

    Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
    Builds 32-bit SMP kernel (not booted or tested). I did not try to build
    non-SMP kernels. That should be trivial to fix up later if necessary.

    I converted bit ops atomic_hash lock to raw_spinlock_t. Doing so avoids
    some ugly nesting of linux/*.h and asm/*.h files. Those particular locks
    are well tested and contained entirely inside arch specific code. I do NOT
    expect any new issues to arise with them.

    If someone does ever need to use debug/metrics with them, then they will
    need to unravel this hairball between spinlocks, atomic ops, and bit ops
    that exist only because parisc has exactly one atomic instruction: LDCW
    (load and clear word).

    From: "Luck, Tony"

    ia64 fix

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Grant Grundler
    Cc: Matthew Wilcox
    Signed-off-by: Hirokazu Takata
    Signed-off-by: Mikael Pettersson
    Signed-off-by: Benoit Boissinot
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

10 Sep, 2005

2 commits


09 Sep, 2005

4 commits

  • This patch pulls the PCI-related junk out of struct device_node and
    puts it in a separate structure, struct pci_dn. The device_node now
    just has a void * pointer in it, which points to a struct pci_dn for
    nodes that represent PCI devices. It could potentially be used in
    future for device-specific data for other sorts of devices, such as
    virtual I/O devices.
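
    A simplified sketch of the resulting layout. The struct names come from
    the message, but the field names (data, busno, devfn) and the
    demo_pci_dn() accessor are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* PCI-specific state, split out of the device node (fields illustrative). */
struct pci_dn {
    int busno;
    int devfn;
};

/* Generic device tree node: only an opaque pointer remains. */
struct device_node {
    const char *name;
    void *data;     /* points to a struct pci_dn for PCI nodes, else NULL */
};

/* Hypothetical accessor: valid only for nodes that represent PCI devices. */
static inline struct pci_dn *demo_pci_dn(const struct device_node *dn)
{
    return (struct pci_dn *)dn->data;
}
```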

    Signed-off-by: Paul Mackerras

    Paul Mackerras
     
  • Remove asm-ppc64/segment.h now that all users are gone.

    Signed-off-by: Kumar Gala
    Signed-off-by: Paul Mackerras

    Kumar Gala
     
  • Merge a few asm-ppc and asm-ppc64 header files.
    Note: the merge of setup.h intentionally does not carry
    forward the m68k cruft. That means this patch continues
    to break the already broken amiga on the ppc32.

    Signed-off-by: Jon Loeliger
    Signed-off-by: Kumar Gala
    Signed-off-by: Paul Mackerras

    jdl@freescale.com
     
  • There were three changes necessary in order to allow
    sparc64 to use setup-res.c:

    1) Sparc64 roots the PCI I/O and MEM address space using
    parent resources contained in the PCI controller structure.
    I'm actually surprised no other platforms do this, especially
    ones like Alpha and PPC{,64}. These resources get linked into the
    iomem/ioport tree when PCI controllers are probed.

    So the hierarchy looks like this:

    iomem --|
            PCI controller 1 MEM space --|
                                         device 1
                                         device 2
                                         etc.
            PCI controller 2 MEM space --|
                                         ...
    ioport --|
             PCI controller 1 IO space --|
                                         ...
             PCI controller 2 IO space --|
                                         ...

    You get the idea. The drivers/pci/setup-res.c code allocates
    using plain iomem_space and ioport_space as the root, so that
    wouldn't work with the above setup.

    So I added a pcibios_select_root() that is used to handle this.
    It uses the PCI controller struct's io_space and mem_space on
    sparc64, and io{port,mem}_resource on every other platform to
    keep current behavior.

    2) quirk_io_region() is buggy. It takes in raw BUS view addresses
    and tries to use them as a PCI resource.

    pci_claim_resource() expects the resource to be fully formed when
    it gets called. The sparc64 implementation would do the translation
    but that's absolutely wrong, because if the same resource gets
    released then re-claimed we'll adjust things twice.

    So I fixed up quirk_io_region() to do the proper pcibios_bus_to_resource()
    conversion before passing it on to pci_claim_resource().

    3) I was mistakenly __init'ing the function methods the PCI controller
    drivers provide on sparc64 to implement some parts of these
    routines. This was, of course, easy to fix.
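
    The root selection from item 1 can be sketched like this. Every demo_*
    name and DEMO_* flag is hypothetical, and the sketch folds both variants
    into one function (controller pointer or NULL), whereas the patch
    describes a per-platform pcibios_select_root().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_IORESOURCE_IO  0x1u
#define DEMO_IORESOURCE_MEM 0x2u

struct demo_resource {
    const char *name;
    unsigned int flags;
};

/* sparc64-style controller with its own root resources. */
struct demo_pci_controller {
    struct demo_resource io_space;
    struct demo_resource mem_space;
};

/* Global roots, standing in for ioport_resource and iomem_resource. */
static struct demo_resource demo_ioport_resource = { "ioport", DEMO_IORESOURCE_IO };
static struct demo_resource demo_iomem_resource  = { "iomem",  DEMO_IORESOURCE_MEM };

/*
 * Pick the root to allocate from: the controller's spaces when a
 * controller is given (sparc64 behaviour), the global roots otherwise.
 */
static inline struct demo_resource *
demo_select_root(struct demo_pci_controller *ctrl, const struct demo_resource *res)
{
    if (res->flags & DEMO_IORESOURCE_IO)
        return ctrl ? &ctrl->io_space : &demo_ioport_resource;
    if (res->flags & DEMO_IORESOURCE_MEM)
        return ctrl ? &ctrl->mem_space : &demo_iomem_resource;
    return NULL;
}
```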

    So we end up with the following, and that nasty SPARC64 makefile
    ifdef in drivers/pci/Makefile is finally zapped.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David S. Miller
     

08 Sep, 2005

10 commits

  • This patch fixes a race condition in which the system would hang, or
    sometimes crash, within minutes when kprobes were inserted on both an ISR
    routine and a task routine.

    The fix has been stress tested on i386, ia64, ppc64 and x86_64. To
    reproduce the problem, insert kprobes on the schedule() and do_IRQ()
    functions; you should see a hang or system crash.

    Signed-off-by: Anil S Keshavamurthy
    Signed-off-by: Ananth N Mavinakayanahalli
    Acked-by: Prasanna S Panchamukhi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keshavamurthy Anil S
     
  • This patch contains the ppc64 architecture specific changes to prevent the
    possible race conditions.

    Signed-off-by: Prasanna S Panchamukhi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prasanna S Panchamukhi
     
  • This makes sense now that we have asm-powerpc.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • These two files are basically identical, so make one just include the other
    (protecting the 32-bit-only parts with __powerpc64__). Also remove some
    completely unused defines.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • This set of patches creates asm-generic/fcntl.h and consolidates as much as
    possible from the asm-*/fcntl.h files into it.

    This patch just gathers all the identical bits of the asm-*/fcntl.h files into
    asm-generic/fcntl.h.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Yoichi Yuasa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • Remove the deprecated (and unused) verify_area() from various uaccess.h
    headers.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • IRQ_PER_CPU is not used by all architectures. This patch introduces the
    macros ARCH_HAS_IRQ_PER_CPU and CHECK_IRQ_PER_CPU() to avoid the generation
    of dead code in __do_IRQ().

    ARCH_HAS_IRQ_PER_CPU is defined by architectures using IRQ_PER_CPU in their
    include/asm-ARCH/irq.h file.

    Through grepping the tree I found the following architectures currently use
    IRQ_PER_CPU:

    cris, ia64, ppc, ppc64 and parisc.
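
    A sketch of how the two macros can work together. The macro names are
    from the message; the flag value and demo_takes_per_cpu_path() are
    assumptions for illustration. When ARCH_HAS_IRQ_PER_CPU is not defined,
    the check is the constant 0 and the compiler can drop the branch.

```c
#include <assert.h>

/* Defined by architectures that use per-CPU IRQs; remove to model the rest. */
#define ARCH_HAS_IRQ_PER_CPU 1

#ifdef ARCH_HAS_IRQ_PER_CPU
# define IRQ_PER_CPU 0x100u                 /* illustrative flag value */
# define CHECK_IRQ_PER_CPU(status) ((status) & IRQ_PER_CPU)
#else
# define CHECK_IRQ_PER_CPU(status) 0        /* branch becomes dead code */
#endif

/* Hypothetical stand-in for the test at the top of __do_IRQ(). */
static inline int demo_takes_per_cpu_path(unsigned int status)
{
    if (CHECK_IRQ_PER_CPU(status))
        return 1;   /* per-CPU IRQ: handled without the descriptor lock */
    return 0;       /* normal path */
}
```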

    Signed-off-by: Karsten Wiese
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Karsten Wiese
     
  • The size of auxiliary vector is fixed at 42 in linux/sched.h. But it isn't
    very obvious when looking at linux/elf.h. This patch adds AT_VECTOR_SIZE
    so that we can change it if necessary when a new vector is added.

    Because of include file ordering problems, doing this necessitated the
    extraction of the AT_* symbols into a standalone header file.
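
    In outline, the named constant replaces the literal array size. In this
    sketch struct demo_mm and demo_auxv_push() are hypothetical; only
    AT_VECTOR_SIZE and its value of 42 come from the message.

```c
#include <assert.h>
#include <stddef.h>

#define AT_VECTOR_SIZE 42   /* previously a bare 42 in the array declaration */

/* Hypothetical holder of a saved auxiliary vector. */
struct demo_mm {
    unsigned long saved_auxv[AT_VECTOR_SIZE];
    size_t used;            /* number of array slots filled so far */
};

/* Append one (type, value) pair; returns 0, or -1 if it would not fit. */
static inline int demo_auxv_push(struct demo_mm *mm, unsigned long type,
                                 unsigned long val)
{
    if (mm->used + 2 > AT_VECTOR_SIZE)
        return -1;
    mm->saved_auxv[mm->used++] = type;
    mm->saved_auxv[mm->used++] = val;
    return 0;
}
```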

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H. J. Lu
     
  • When I first wrote the compat layer patches, I was somewhat cavalier about
    the definition of compat_uid_t and compat_gid_t (or maybe I just
    misunderstood :-)). This patch makes the compat types much more consistent
    with the types we are being compatible with and hopefully will fix a few
    bugs along the way.

    compat type             type in compat arch
    __compat_[ug]id_t       __kernel_[ug]id_t
    __compat_[ug]id32_t     __kernel_[ug]id32_t
    compat_[ug]id_t         [ug]id_t

    The difference is that compat_uid_t is always 32 bits (for the archs we
    care about) but __compat_uid_t may be 16 bits on some.

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter
    (which at least on UP usually means an immediate context switch to one of
    the waiter threads). This waiter wakes up and after a few instructions it
    attempts to acquire the cv internal lock, but that lock is still held by
    the thread calling pthread_cond_signal. So it goes to sleep and eventually
    the signalling thread is scheduled in, unlocks the internal lock and wakes
    the waiter again.

    Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal
    to avoid this performance issue, but it was removed when locks were
    redesigned to the 3 state scheme (unlocked, locked uncontended, locked
    contended).

    The following scenario shows why simply using FUTEX_REQUEUE in
    pthread_cond_signal together with using lll_mutex_unlock_force in place of
    lll_mutex_unlock is not enough and probably why it has been disabled at
    that time:

    The number is value in cv->__data.__lock.
        thr1            thr2            thr3
    0   pthread_cond_wait
    1   lll_mutex_lock (cv->__data.__lock)
    0   lll_mutex_unlock (cv->__data.__lock)
    0   lll_futex_wait (&cv->__data.__futex, futexval)
    0                   pthread_cond_signal
    1                   lll_mutex_lock (cv->__data.__lock)
    1                                   pthread_cond_signal
    2                                   lll_mutex_lock (cv->__data.__lock)
    2                                     lll_futex_wait (&cv->__data.__lock, 2)
    2                   lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
                          # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
    2                   lll_mutex_unlock_force (cv->__data.__lock)
    0                     cv->__data.__lock = 0
    0                     lll_futex_wake (&cv->__data.__lock, 1)
    1   lll_mutex_lock (cv->__data.__lock)
    0   lll_mutex_unlock (cv->__data.__lock)
          # Here, lll_mutex_unlock doesn't know there are threads waiting
          # on the internal cv's lock

    Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal,
    but it will cost us not one, but 2 extra syscalls and, what's worse, one of
    these extra syscalls will be done for every single waiting loop in
    pthread_cond_*wait.

    We would need to use lll_mutex_unlock_force in pthread_cond_signal after
    requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait.

    Another alternative is to do the unlocking pthread_cond_signal needs to do
    (the lock can't be unlocked before lll_futex_wake, as that is racy) in the
    kernel.

    I have implemented both variants, futex-requeue-glibc.patch is the first
    one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel.
    The kernel interface allows userland to specify exactly what an unlocking
    operation should look like (some atomic arithmetic operation with an
    optional constant argument, and a comparison of the previous futex value
    with another constant).
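
    To make the "operation plus comparison" idea concrete, here is a
    userspace simulation. It is only a sketch: the DEMO_* names, the bit
    layout of the encoded word and demo_wake_op() are assumptions for
    illustration, the update is not atomic, and sign extension of the
    arguments is left out.

```c
#include <assert.h>
#include <stdint.h>

enum demo_op  { DEMO_OP_SET, DEMO_OP_ADD, DEMO_OP_OR, DEMO_OP_ANDN, DEMO_OP_XOR };
enum demo_cmp { DEMO_CMP_EQ, DEMO_CMP_NE, DEMO_CMP_LT,
                DEMO_CMP_LE, DEMO_CMP_GT, DEMO_CMP_GE };

/* Pack operation, its argument, comparison and its argument into 32 bits. */
#define DEMO_FUTEX_OP(op, oparg, cmp, cmparg)                                \
    ((((uint32_t)(op) & 0xfu) << 28) | (((uint32_t)(cmp) & 0xfu) << 24) |    \
     (((uint32_t)(oparg) & 0xfffu) << 12) | ((uint32_t)(cmparg) & 0xfffu))

/*
 * Apply the encoded operation to *val and return whether the comparison
 * holds for the OLD value (1 or 0), or -1 for an unknown op or cmp code.
 */
static inline int demo_wake_op(uint32_t encoded, int *val)
{
    int op     = (int)((encoded >> 28) & 0xfu);
    int cmp    = (int)((encoded >> 24) & 0xfu);
    int oparg  = (int)((encoded >> 12) & 0xfffu);
    int cmparg = (int)(encoded & 0xfffu);
    int old    = *val;

    if (op > DEMO_OP_XOR || cmp > DEMO_CMP_GE)
        return -1;

    switch (op) {
    case DEMO_OP_SET:  *val = oparg;        break;
    case DEMO_OP_ADD:  *val = old + oparg;  break;
    case DEMO_OP_OR:   *val = old | oparg;  break;
    case DEMO_OP_ANDN: *val = old & ~oparg; break;
    default:           *val = old ^ oparg;  break;   /* DEMO_OP_XOR */
    }

    switch (cmp) {
    case DEMO_CMP_EQ: return old == cmparg;
    case DEMO_CMP_NE: return old != cmparg;
    case DEMO_CMP_LT: return old <  cmparg;
    case DEMO_CMP_LE: return old <= cmparg;
    case DEMO_CMP_GT: return old >  cmparg;
    default:          return old >= cmparg;          /* DEMO_CMP_GE */
    }
}
```

    With DEMO_FUTEX_OP(DEMO_OP_SET, 0, DEMO_CMP_GT, 1) the lock word is
    cleared, and the comparison reports whether it was previously contended
    (old value greater than 1), which is the kind of unlock a signalling
    thread needs.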

    It has been implemented just for ppc*, x86_64 and i?86; for other
    architectures I'm including just a stub header which can be used as a
    starting point by maintainers to write support for their arches and ATM
    will just return -ENOSYS for FUTEX_WAKE_OP. The requeue patch has been
    (lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running
    32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL.

    With the following benchmark on UP x86-64 I get:

    for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
    for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done
    time elf/ld.so --library-path .:nptl-orig /tmp/bench
    real 0m0.655s user 0m0.253s sys 0m0.403s
    real 0m0.657s user 0m0.269s sys 0m0.388s
    time elf/ld.so --library-path .:nptl-requeue /tmp/bench
    real 0m0.496s user 0m0.225s sys 0m0.271s
    real 0m0.531s user 0m0.242s sys 0m0.288s
    time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
    real 0m0.380s user 0m0.176s sys 0m0.204s
    real 0m0.382s user 0m0.175s sys 0m0.207s

    The benchmark is at:
    http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt
    Older futex-requeue-glibc.patch version is at:
    http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt
    Older futex-wake_op-glibc.patch version is at:
    http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt
    Will post a new version (just x86-64 fixes so that the patch
    applies against pthread_cond_signal.S) to libc-hacker ml soon.

    Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
    testcase that will not test the atomicity of the operation, but at least
    check if the threads that should have been woken up are woken up and
    whether the arithmetic operation in the kernel gave the expected results.

    Acked-by: Ingo Molnar
    Cc: Ulrich Drepper
    Cc: Jamie Lokier
    Cc: Rusty Russell
    Signed-off-by: Yoichi Yuasa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jakub Jelinek
     

06 Sep, 2005

11 commits


05 Sep, 2005

2 commits

  • We need to indicate to the hypervisor that it needs to save our VMX
    registers when switching partitions on a shared-processor system, just as
    it needs to for FP and PMC registers.

    This could be made to be on-demand when VMX is used, but we don't do that
    for FP nor PMC right now either so let's not overcomplicate things.

    Signed-off-by: Olof Johansson
    Acked-by: Paul Mackerras
    Cc: Anton Blanchard
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olof Johansson
     
  • This is used only in slab.c and each architecture gets to define which
    underlying type is to be used.

    Seems a bit silly - move it to slab.c and use the same type for all
    architectures: unsigned int.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyle Moffett