11 Oct, 2008

1 commit


04 Oct, 2008

1 commit

  • This fixes a regression that came with 934b2857cc576ae53c92a66e63fce7ddcfa74691
    ("[S390] nohz/sclp: disable timer on synchronous waits.").
    If udelay() gets called from a disabled context it sets the clock comparator
    to a value where it expects the next interrupt. When the interrupt happens
    the clock comparator does not get reset and therefore the interrupt condition
    doesn't get cleared. The result is an endless timer interrupt loop.

    In addition this patch also fixes the following:

    rcutorture reveals that our __udelay implementation is still buggy,
    since it might schedule tasklets, but prevents their execution:

    NOHZ: local_softirq_pending 42
    NOHZ: local_softirq_pending 02
    NOHZ: local_softirq_pending 142
    NOHZ: local_softirq_pending 02

    To fix this we make sure that only the clock comparator interrupt
    is enabled when the enabled wait psw is loaded.
    Also no code gets called anymore which might schedule tasklets.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

01 Aug, 2008

1 commit

  • sclp_sync_wait waits synchronously for an sclp interrupt and disables
    timer interrupts. However, the irq enter paths contain an extra check
    that calls the timer callback if a timer interrupt would be due.
    This would schedule softirqs in the wrong context.
    So introduce local_tick_enable/disable, which prevent this.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

30 Apr, 2008

2 commits


17 Apr, 2008

2 commits

  • The current uaccess page table walk code assumes at a few places that
    any access is a user space access. This is not correct if somebody
    has issued a set_fs(KERNEL_DS) in advance.
    Add code which checks which address space we are in and with this make
    sure we access the correct address space. This way we also get rid of
    the dirty
        if (!current->mm)
                return -EFAULT;
    hack in futex_atomic_cmpxchg_pt.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • This way we get rid of s390's NO_IDLE_HZ and use the generic dynticks
    variant instead. In addition we get high resolution timers for free.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     

21 Mar, 2008

1 commit

  • a0c1e9073ef7428a14309cba010633a6cd6719ea "futex: runtime enable pi and
    robust functionality" introduces a test whether futex in atomic stuff
    works or not.
    It does that by writing to address 0 of the kernel address space. This
    will crash on older machines where addressing mode switching is enabled
    but where the mvcos instruction is not available. Page table walking is
    done by hand and therefore the code tries to access current->mm which
    is NULL.
    Therefore add an extra check, so we survive the early test.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

19 Feb, 2008

1 commit

  • Add missing exception table entry so that the kernel can also handle
    protection exceptions on the cs instruction. Currently only
    specification exceptions are handled correctly.
    The missing entry allows user space to crash the kernel.

    Cc: stable
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

26 Jan, 2008

2 commits

  • In s390's spin_lock_irqsave, interrupts remain disabled while
    spinning. In other architectures like x86 and powerpc, interrupts are
    re-enabled while spinning if IRQ is not masked before spin_lock_irqsave
    is called.

    The following patch re-enables interrupts through local_irq_restore
    while spinning for a lock acquisition.
    This can improve system response.

    [heiko.carstens@de.ibm.com: removed saving of pc]

    Signed-off-by: Hisashi Hifumi
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Hisashi Hifumi
     
  • Used to contain the address of the holder of the lock. But since the
    spinlock code is not inlined anymore all locks contain the same address
    anyway. And since in addition nobody complained about that for ages,
    it's obviously unused. So remove it.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

22 Oct, 2007

2 commits


20 Oct, 2007

1 commit

  • is_init() is an ambiguous name for the pid==1 check. Split it into
    is_global_init() and is_container_init().

    A cgroup init has its tsk->pid == 1.

    A global init also has its tsk->pid == 1 and its active pid namespace
    is the init_pid_ns. But rather than check the active pid namespace,
    compare the task structure with 'init_pid_ns.child_reaper', which is
    initialized during boot to the /sbin/init process and never changes.

    Changelog:

    2.6.22-rc4-mm2-pidns1:
    - Use 'init_pid_ns.child_reaper' to determine if a given task is the
    global init (/sbin/init) process. This would improve performance
    and remove dependence on the task_pid().

    2.6.21-mm2-pidns2:

    - [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
    ppc,avr32}/traps.c for the _exception() call to is_global_init().
    This way, we kill only the cgroup if the cgroup's init has a
    bug rather than force a kernel panic.

    [akpm@linux-foundation.org: fix comment]
    [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
    [bunk@stusta.de: kernel/pid.c: remove unused exports]
    [sukadev@us.ibm.com: Fix capability.c to work with threaded init]
    Signed-off-by: Serge E. Hallyn
    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Pavel Emelianov
    Cc: Eric W. Biederman
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Herbert Poetzel
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
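
The split described in the commit above can be sketched in plain C. This is a minimal user-space illustration, not the kernel code: the `sketch_` types and functions are hypothetical stand-ins, and the container check is simplified to the bare pid == 1 comparison that the message mentions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's task and pid namespace structures. */
struct sketch_task {
	int pid;
};

struct sketch_pid_namespace {
	struct sketch_task *child_reaper;	/* set once at boot to /sbin/init */
};

static struct sketch_pid_namespace sketch_init_pid_ns;

/* Global init: compare the task pointer itself, no pid lookup needed. */
static inline bool sketch_is_global_init(const struct sketch_task *tsk)
{
	return tsk == sketch_init_pid_ns.child_reaper;
}

/* Container init (simplified assumption): any task that sees itself as pid 1. */
static inline bool sketch_is_container_init(const struct sketch_task *tsk)
{
	return tsk->pid == 1;
}
```

Comparing pointers means two tasks that both report pid 1 can still be told apart, which is the ambiguity the old is_init() name hid.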
     

20 Jul, 2007

1 commit

  • This patch completes Linus's wish that the fault return codes be made into
    bit flags, which I agree makes everything nicer. This requires
    all handle_mm_fault callers to be modified (possibly the modifications
    should go further and do things like fault accounting in handle_mm_fault --
    however that would be for another patch).

    [akpm@linux-foundation.org: fix alpha build]
    [akpm@linux-foundation.org: fix s390 build]
    [akpm@linux-foundation.org: fix sparc build]
    [akpm@linux-foundation.org: fix sparc64 build]
    [akpm@linux-foundation.org: fix ia64 build]
    Signed-off-by: Nick Piggin
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: Ian Molton
    Cc: Bryan Wu
    Cc: Mikael Starvik
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: "Luck, Tony"
    Cc: Hirokazu Takata
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Greg Ungerer
    Cc: Matthew Wilcox
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Richard Curnow
    Cc: William Lee Irwin III
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Cc: Miles Bader
    Cc: Chris Zankel
    Acked-by: Kyle McMartin
    Acked-by: Haavard Skinnemoen
    Acked-by: Ralf Baechle
    Acked-by: Andi Kleen
    Signed-off-by: Andrew Morton
    [ Still apparently needs some ARM and PPC loving - Linus ]
    Signed-off-by: Linus Torvalds

    Nick Piggin
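
A caller of handle_mm_fault that tests bit flags instead of switching on distinct return values might look roughly like the sketch below. The `SKETCH_` flag names are modeled on the kernel's VM_FAULT_* names, but the values, the stats structure, and the helper are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Illustrative bit flags; the values are assumptions, not the kernel's. */
#define SKETCH_VM_FAULT_OOM	0x1u
#define SKETCH_VM_FAULT_SIGBUS	0x2u
#define SKETCH_VM_FAULT_MAJOR	0x4u
#define SKETCH_VM_FAULT_ERROR	(SKETCH_VM_FAULT_OOM | SKETCH_VM_FAULT_SIGBUS)

/* Hypothetical per-task fault counters. */
struct sketch_fault_stats {
	unsigned long maj_flt;
	unsigned long min_flt;
};

/*
 * Returns false if any error bit is set. Otherwise counts the fault as
 * major or minor and returns true.
 */
static inline bool sketch_account_fault(unsigned int fault,
					struct sketch_fault_stats *st)
{
	if (fault & SKETCH_VM_FAULT_ERROR)
		return false;
	if (fault & SKETCH_VM_FAULT_MAJOR)
		st->maj_flt++;
	else
		st->min_flt++;
	return true;
}
```

With flags, one mask test covers every error case, and new status bits can be added without touching callers that do not care about them.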
     

10 Jul, 2007

1 commit

  • The bogomips calculation triggered via reading from /proc/cpuinfo
    can return incorrect values if the qrnnd assembly is called with a
    pointer in %r2 with any of the upper 32 bits set.
    Fix this by using 64 bit division / remainder operation provided by
    gcc instead of calling the assembly.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

26 Apr, 2007

1 commit

  • Allow s390 to properly override the generic
    __div64_32() implementation by:

    1) Using obj-y for div64.o in s390's makefile instead
    of lib-y

    2) Adding the weak attribute to the generic implementation.

    Signed-off-by: David S. Miller

    David S. Miller
     

21 Feb, 2007

1 commit

  • The new delay implementation uses the clock comparator and an external
    interrupt even if it is called with interrupts disabled. To do this
    all external interrupt sources except the clock comparator are switched off
    before enabling external interrupts. The external interrupt at the
    end of the delay period may not execute softirqs or we can end up in a
    dead-lock.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

06 Feb, 2007

5 commits

  • Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Preset the bogomips number to the cpu capacity value reported by
    store system information in SYSIB 1.2.2. This value is constant
    for a particular machine model and can be used to determine
    relative performance differences between machines.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • This patch adds support for clock synchronization to an external time
    reference (ETR). The external time reference sends an oscillator
    signal and a synchronization signal every 2^20 microseconds to keep
    the TOD clocks of all connected servers in sync. For availability
    two ETR units can be connected to a machine. If the clock deviates
    for more than the sync-check tolerance all cpus get a machine check
    that indicates that the clock is out of sync. For the lovely details
    how to get the clock back in sync see the code below.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • This provides a noexec protection on s390 hardware. Our hardware does
    not have any bits left in the pte for a hw noexec bit, so this is a
    different approach using shadow page tables and a special addressing
    mode that allows separate address spaces for code and data.

    As a special feature of our "secondary-space" addressing mode, separate
    page tables can be specified for the translation of data addresses
    (storage operands) and instruction addresses. The shadow page table is
    used for the instruction addresses and the standard page table for the
    data addresses.
    The shadow page table is linked to the standard page table by a pointer
    in page->lru.next of the struct page corresponding to the page that
    contains the standard page table (since page->private is not really
    private with the pte_lock and the page table pages are not in the LRU
    list).
    Depending on the software bits of a pte, it is either inserted into
    both page tables or just into the standard (data) page table. Pages of
    a vma that does not have the VM_EXEC bit set get mapped only in the
    data address space. Any attempt to execute code on such a page will cause a
    page translation exception. The standard reaction to this is a SIGSEGV
    with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn)
    and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the
    kernel to the signal stack frame. Unfortunately, the signal return
    mechanism cannot be modified to use an SA_RESTORER because the
    exception unwinding code depends on the system call opcode stored
    behind the signal stack frame.

    This feature requires that user space is executed in secondary-space
    mode and the kernel in home-space mode, which means that the addressing
    modes need to be switched and that the noexec protection only works
    for user space.
    After switching the addressing modes, we cannot use the mvcp/mvcs
    instructions anymore to copy between kernel and user space. A new
    mvcos instruction has been added to the z9 EC/BC hardware which allows
    copying between arbitrary address spaces, but on older hardware the
    page tables need to be walked manually.

    Signed-off-by: Gerald Schaefer
    Signed-off-by: Martin Schwidefsky

    Gerald Schaefer
     
  • Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

09 Jan, 2007

1 commit


08 Dec, 2006

2 commits

  • Doesn't seem to be a good idea to duplicate code :)

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Introduce pagefault_{disable,enable}() and use these where previously we did
    manual preempt increments/decrements to make the pagefault handler do the
    atomic thing.

    Currently they still rely on the increased preempt count, but do not rely on
    the disabled preemption, this might go away in the future.

    (NOTE: the extra barrier() in pagefault_disable might fix some holes on
    machines which have too many registers for their own good)

    [heiko.carstens@de.ibm.com: s390 fix]
    Signed-off-by: Peter Zijlstra
    Acked-by: Nick Piggin
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

04 Dec, 2006

1 commit

  • Use a wrapper for copy_to/from_user to choose the best usercopy method.
    The mvcos instruction is better for sizes greater than 256 bytes; if
    mvcos is not available a page table walk is better for sizes greater
    than 1024 bytes. Also removed the redundant copy_to/from_user_std_small
    functions.

    Signed-off-by: Gerald Schaefer
    Signed-off-by: Martin Schwidefsky

    Gerald Schaefer
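
The size thresholds in the commit above amount to a small selection function. The sketch below only illustrates that decision: the enum, the function name, and the `have_mvcos` parameter are hypothetical, and the real wrapper dispatches to actual copy routines rather than returning a tag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three copy strategies named in the message. */
enum sketch_copy_method {
	SKETCH_COPY_STD,	/* standard copy */
	SKETCH_COPY_MVCOS,	/* mvcos instruction */
	SKETCH_COPY_PT_WALK,	/* manual page table walk */
};

/* Pick a strategy from the copy size and whether mvcos is available. */
static inline enum sketch_copy_method
sketch_pick_usercopy(size_t n, bool have_mvcos)
{
	if (have_mvcos)
		return n > 256 ? SKETCH_COPY_MVCOS : SKETCH_COPY_STD;
	return n > 1024 ? SKETCH_COPY_PT_WALK : SKETCH_COPY_STD;
}
```

For example, a 512 byte copy would use mvcos where it exists and the standard method otherwise.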
     

01 Oct, 2006

1 commit


28 Sep, 2006

3 commits

  • Major cleanup of all s390 inline assemblies. They now have a common
    coding style. Quite a few have been shortened, mainly by using register
    asm variables. Use of the EX_TABLE macro helps as well. The atomic ops,
    bit ops and locking inlines now use the Q-constraint if a newer gcc
    is used. That results in slightly better code.

    Thanks to Christian Borntraeger for proofreading the changes.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • A user space program can read uninitialised kernel memory
    by appending to a file from a bad address and then reading
    the result back. The cause is the copy_from_user function
    that does not clear the remaining bytes of the kernel
    buffer after it got a fault on the user space address.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The clocksource infrastructure introduced with commit
    ad596171ed635c51a9eef829187af100cbf8dcf7 broke 31 bit s390.
    The reason is that the do_div() primitive for 31 bit always
    had a restriction: it could only divide an unsigned 64 bit
    integer by an unsigned 31 bit integer. The clocksource code
    now uses do_div() with a base value that has the most
    significant bit set. The result is that clock->cycle_interval
    has a funny value which causes the linux time to jump around
    like mad.
    The solution is "obvious": implement a proper __div64_32
    function for 31 bit s390.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
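
A 64 by 32 bit division that has no restriction on the most significant bit of the divisor can be written portably as shift-and-subtract long division. The sketch below illustrates the contract described in the last commit above (quotient stored in place, remainder returned). It is not the s390 implementation, and the name `sketch_div64_32` is made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Divide *n by base in place and return the remainder.
 * Works for any non-zero 32-bit base, including values with bit 31 set.
 */
static inline uint32_t sketch_div64_32(uint64_t *n, uint32_t base)
{
	uint64_t rem = 0;	/* 64 bits wide so the shift below cannot overflow */
	uint64_t quot = 0;

	assert(base != 0);
	for (int i = 63; i >= 0; i--) {
		/* Bring down the next dividend bit. */
		rem = (rem << 1) | ((*n >> i) & 1);
		if (rem >= base) {
			rem -= base;
			quot |= (uint64_t)1 << i;
		}
	}
	*n = quot;
	return (uint32_t)rem;
}
```

The loop is slow compared with a hardware divide, but it makes the invariant explicit: the remainder stays below base after every step, so the result is correct for the full unsigned 32-bit range.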
     

20 Sep, 2006

2 commits


30 Aug, 2006

1 commit

  • The copy_in_user primitive does not work as advertised. If the source
    and target area are available copy_in_user copies one byte too much.
    If one of the memory areas is not available it does not copy as much
    data as it can, but up to 257 bytes less.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

12 Jul, 2006

1 commit


01 Jul, 2006

1 commit


10 Mar, 2006

1 commit

  • Currently the code tries up to spin_retry times to grab a lock using the cs
    instruction. The cs instruction has exclusive access to a memory region
    and therefore invalidates the appropriate cache line of all other cpus. If
    there is contention on a lock this leads to cache line thrashing. This can
    be avoided if we first check whether a cs instruction is likely to succeed
    before the instruction gets actually executed.

    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christian Ehrhardt
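
The idea of checking the lock word before issuing the compare-and-swap is commonly called test-and-test-and-set. Below is a minimal user-space sketch using C11 atomics in place of the cs instruction. The type, the function names, and the convention that 0 means "free" are assumptions for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock: owner == 0 means free, otherwise a non-zero cpu id. */
typedef struct {
	atomic_uint owner;
} sketch_lock_t;

/* Try up to spin_retry times to take the lock for the given non-zero cpu id. */
static inline bool sketch_trylock_retry(sketch_lock_t *lp, unsigned int cpu,
					int spin_retry)
{
	for (int i = 0; i < spin_retry; i++) {
		unsigned int expected = 0;

		/* Plain load first: does not need exclusive cache line access. */
		if (atomic_load_explicit(&lp->owner, memory_order_relaxed) != 0)
			continue;
		/* Lock looks free, so the compare-and-swap is likely to succeed. */
		if (atomic_compare_exchange_strong(&lp->owner, &expected, cpu))
			return true;
	}
	return false;
}

static inline void sketch_unlock(sketch_lock_t *lp)
{
	atomic_store(&lp->owner, 0);
}
```

Under contention the waiters mostly execute the read-only load, so the cache line is not invalidated on other cpus by every retry.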
     

09 Mar, 2006

1 commit

  • strnlen_user is supposed to return the length count + 1 if no terminating \0
    is found, and it should return 0 on exception. Found by David Howells.

    Signed-off-by: Gerald Schaefer
    Signed-off-by: Heiko Carstens
    Acked-By: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
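
The return value contract stated in the commit above can be illustrated with a small user-space function. This is only a sketch: a NULL pointer stands in for a faulting user address, the name `sketch_strnlen_user` is made up, and returning the size including the terminator when a \0 is found is an assumption based on the generic kernel convention for strnlen_user.

```c
#include <stddef.h>

/*
 * Returns 0 on "exception", count + 1 if no terminator is found within
 * count bytes, and otherwise the string size including the terminator.
 */
static inline size_t sketch_strnlen_user(const char *src, size_t count)
{
	if (src == NULL)	/* stand-in for a fault on the user address */
		return 0;
	for (size_t i = 0; i < count; i++) {
		if (src[i] == '\0')
			return i + 1;
	}
	return count + 1;
}
```

A caller can therefore treat 0 as a fault and any value above count as "string too long".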
     

15 Feb, 2006

1 commit


02 Feb, 2006

1 commit

  • - Remove all CVS generated information, e.g. revision IDs, from
    drivers/s390 and include/asm-s390 (none present in arch/s390).

    - Add newline at end of arch/s390/lib/Makefile to avoid diff message.

    Acked-by: Andreas Herrmann
    Acked-by: Frank Pavlic
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens