08 Jul, 2005

40 commits

  • We're dereferencing `flp' and then we're testing it for NULLness.

    Either the compiler accidentally saved us or the existing null-pointer check
    is redundant.

    This defect was found automatically by Coverity Prevent, a static analysis tool.

    Signed-off-by: Zaur Kambarov
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMBAROV, ZAUR
     
  • Correctly test for a null pointer before going and dereferencing it.

    This defect was found automatically by Coverity Prevent, a static analysis
    tool.

    Signed-off-by: Zaur Kambarov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMBAROV, ZAUR
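
The ordering fixed by the two Coverity-driven commits above can be sketched in a few lines. This is only an illustration: `struct lock_owner` and `owner_pid()` are hypothetical names, not the kernel code that was patched.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel object; not a real kernel struct. */
struct lock_owner {
    int pid;
};

/*
 * Return the owner's pid, or -1 when there is no owner.
 * The NULL test comes before the first dereference, so it can
 * neither be skipped nor optimised away as redundant.
 */
int owner_pid(const struct lock_owner *owner)
{
    if (owner == NULL)      /* test first ... */
        return -1;
    return owner->pid;      /* ... dereference only afterwards */
}
```

Testing the pointer after it has already been dereferenced tells the compiler that the pointer cannot be NULL at that point, which is why such a late check protects nothing.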
     
  • The BKS might be reacquired before we have dropped PREEMPT_ACTIVE, which
    could trigger a second cond_resched() call. Bug found by Hirofumi Ogawa.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • The attached patch makes the keyring functions calculate the new size of a
    keyring's payload based on the size of pointer to the key struct, not the size
    of the key struct itself.

    Signed-Off-By: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
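
A minimal sketch of the size calculation described above, assuming a payload that stores an array of pointers to keys. The struct layouts and the helper name `keyring_payload_size()` are hypothetical and much simpler than the real keyring code.

```c
#include <stddef.h>

/* Hypothetical, simplified key object. */
struct key {
    char description[64];
    unsigned long serial;
};

/* Hypothetical keyring payload: a header followed by key pointers. */
struct keyring_list {
    unsigned short maxkeys;
    unsigned short nkeys;
    struct key *keys[];     /* pointers to keys, not the keys themselves */
};

/* Bytes needed for a payload that can hold nkeys entries. */
size_t keyring_payload_size(size_t nkeys)
{
    /* Each slot holds a pointer, so scale by sizeof(struct key *). */
    return sizeof(struct keyring_list) + nkeys * sizeof(struct key *);
}
```

Scaling by `sizeof(struct key)` instead would overstate the payload size by the difference between the struct and a pointer for every slot.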
     
  • Fix debugging printk.

    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • GCC 4 complains because the function put_compat_shminfo() can't get to its
    return statement if there is no error... If the function does not return
    -EFAULT, it doesn't return anything at all. Looks like a typo.

    Signed-off-by: Jesse Millan
    Signed-off-by: Domen Puncer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesse Millan
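
The shape of the fix described above, as a user-space sketch. `struct shminfo_sketch`, `copy_field()` and `put_shminfo_sketch()` are hypothetical stand-ins, not the compat code itself; the point is only that the success path returns a value as well.

```c
#include <errno.h>

/* Hypothetical two-field structure standing in for the real shminfo. */
struct shminfo_sketch {
    int shmmax;
    int shmmin;
};

/* Hypothetical stand-in for a user-space copy helper. */
static int copy_field(int *dst, const int *src)
{
    if (!dst || !src)
        return -EFAULT;
    *dst = *src;
    return 0;
}

int put_shminfo_sketch(struct shminfo_sketch *dst,
                       const struct shminfo_sketch *src)
{
    int err = 0;

    if (!dst || !src)
        return -EFAULT;

    err |= copy_field(&dst->shmmax, &src->shmmax);
    err |= copy_field(&dst->shmmin, &src->shmmin);

    if (err)
        return -EFAULT;
    return 0;   /* every path out of a non-void function returns a value */
}
```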
     
  • We are not using the in-inode space for xattrs in reserved inodes because
    mkfs.ext3 doesn't initialize it properly. For those inodes, we set
    i_extra_isize to 0. Make sure that we also don't overwrite the
    i_extra_isize field when writing out the inode in that case. This is for
    cleanliness only, and doesn't fix an actual bug.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Gruenbacher
     
  • Add a new section called ".data.read_mostly" for data items that are read
    frequently and rarely written to like cpumaps etc.

    If these maps are placed in the .data section then these frequently read
    items may end up in cachelines with data that is frequently updated. In that
    case all processors in an SMP system must needlessly reload the cachelines
    again and again containing elements of those frequently used variables.

    The ability to share these cachelines will allow each cpu in an SMP system
    to keep local copies of those shared cachelines thereby optimizing
    performance.

    Signed-off-by: Alok N Kataria
    Signed-off-by: Shobhit Dayal
    Signed-off-by: Christoph Lameter
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
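
As an illustration of the idea, the sketch below uses the GCC `section` attribute to place a rarely written variable in a section named `.data.read_mostly`, away from a frequently updated counter. The macro and variable names are hypothetical, the attribute is only applied on ELF targets, and a user-space linker gives no layout guarantees; in the kernel the linker script has to collect the section.

```c
/* Only apply the section attribute where it is known to be accepted. */
#if defined(__GNUC__) && defined(__ELF__)
#define read_mostly_sketch __attribute__((__section__(".data.read_mostly")))
#else
#define read_mostly_sketch
#endif

/* Read on nearly every operation, written almost never. */
unsigned long cpu_online_map_sketch read_mostly_sketch = 0x3;

/* Written constantly; stays in the ordinary data/bss area. */
unsigned long hot_counter = 0;

/* Bump the hot counter only for CPUs marked online in the map. */
unsigned long count_if_online(unsigned int cpu)
{
    if (cpu_online_map_sketch & (1UL << cpu))
        hot_counter++;
    return hot_counter;
}
```

Grouping such variables keeps them out of cachelines that other CPUs keep invalidating by writing to neighbouring data.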
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Original patch from Matt Mackall

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Gruenbacher
     
  • Use a bit spin lock in the first buffer of the page to synchronise asynch
    IO buffer completions, instead of the global page_uptodate_lock, which is
    showing some scalability problems.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
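
A bit spin lock uses one bit of an existing word as the lock, so each object carries its own lock without extra storage. The following is a user-space sketch with C11 atomics, not the kernel's implementation; the function names are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to take the lock held in one bit of *word; true on success. */
bool bit_try_lock(atomic_ulong *word, unsigned int bit)
{
    unsigned long mask = 1UL << bit;

    /* fetch_or returns the old value: we own the lock if the bit was clear. */
    return (atomic_fetch_or_explicit(word, mask, memory_order_acquire)
            & mask) == 0;
}

/* Spin until the bit lock is acquired. */
void bit_lock(atomic_ulong *word, unsigned int bit)
{
    while (!bit_try_lock(word, bit))
        ;   /* busy-wait; a real implementation would relax the CPU here */
}

/* Release the lock by clearing the bit, leaving the other bits intact. */
void bit_unlock(atomic_ulong *word, unsigned int bit)
{
    atomic_fetch_and_explicit(word, ~(1UL << bit), memory_order_release);
}
```

Because the lock lives in per-object state, completions on different pages no longer contend on one global lock.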
     
  • The patch fixes a few corner cases around tty line editing with
    very long input lines:

    - n_tty_receive_char(): don't simply drop eol characters,
    otherwise canon_data isn't increased and the reader isn't woken
    up.

    - n_tty_receive_room(): If there is no newline pending and the
    edit buffer is full, allow only a single character to be written
    (until eol is found and the line is flushed), so characters from
    the next line aren't dropped.

    - write_chan(): if an incomplete line was written, continue
    writing until write() returns 0, otherwise it might not write
    the eol character to flush the line and the writer goes to sleep
    without ever being woken up.

    BTW the core problem is that part of this should be handled in the
    receive_buf path, but for this it has to return the number of
    written characters, as the amount of written characters may not be
    the same as the amount of characters going into the write buffer,
    so the receive_room() usage in pty_write() is not really reliable.

    Alan said:

    The problem looks valid. The behaviour of 'traditional unix' appears to
    be the following

    If you exceed the line limit then beep and drop the character
    Always allow EOL to complete a canonical line input
    Always do signal/control processing if enabled

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
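
The receive-room rule from the second item above can be sketched as follows. The struct, its fields and the buffer size are simplified assumptions for illustration and are not copied from the n_tty code.

```c
#define BUF_SIZE_SKETCH 4096    /* assumed edit buffer size */

/* Hypothetical, reduced line-discipline state. */
struct tty_sketch {
    int read_cnt;     /* characters currently buffered */
    int icanon;       /* non-zero in canonical (line-editing) mode */
    int canon_data;   /* number of complete lines pending */
};

/* How many characters the writer may push right now. */
int receive_room_sketch(const struct tty_sketch *tty)
{
    int left = BUF_SIZE_SKETCH - tty->read_cnt - 1;

    /*
     * Buffer full: in canonical mode with no newline pending, still
     * admit one character at a time so an eol can arrive and flush
     * the line. Otherwise report no room.
     */
    if (left <= 0)
        left = (tty->icanon && !tty->canon_data) ? 1 : 0;
    return left;
}
```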
     
  • xtensa is now in -rc1, with the obsolete syscalls still in there, so I
    guess this is about the last chance to correct the ABI. Applying the patch
    obviously breaks all sorts of user space binaries and probably also
    requires the appropriate changes to be made to libc.

    On the other hand, if a decision is made to keep the broken interface, it
    should at least be a conscious one instead of an oversight.

    Signed-off-by: Arnd Bergmann
    Cc: Chris Zankel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • Replace a semaphore (winch_handler_sem) used in atomic code with a
    spinlock, and reduce as needed the amount of protected code to the bare
    minimum (for instance no kmalloc calls are needed).

    This fixes the last problems with spinlocking (in UP mode with DEBUG
    options); the semaphore, taken inside spinlocks, caused a "spin_lock was
    already locked" warning, without this patch.

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
     
  • Some time ago a trivial patch broke HPPFS (one var became a pointer, not
    all uses were updated). It wasn't fixed at that time because HPPFS is not
    widely used; now a fix has been requested, so I've fixed it, and it has
    been tested positively (at least partially).

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
     
  • This patch implements the clone-stub mechanism, which allows skas0 to run
    with proc_mm==0, even if the clib in UML uses modify_ldt.

    Note: There is a bug in the skas3.v7 host patch that prevents UML-skas from
    running properly on an SMP box. In full skas3, I never really saw problems,
    but in skas0 they showed up.

    More commentary by jdike - What this patch does is make sure that the host
    parent of each new host process matches the UML parent of the corresponding
    UML process. This ensures that any changed LDTs are inherited. This is
    done by having clone actually called by the UML process from its stub,
    rather than by the kernel. We have special syscall stubs that are loaded
    onto the stub code page because that code must be completely
    self-contained. These stubs are given C interfaces, and used like normal C
    functions, but there are subtleties. Principally, we have to be careful
    about stack variables in stub_clone_handler after the clone. The code is
    written so that there aren't any - everything boils down to a fixed
    address. If there were any locals, references to them after the clone
    would be wrong because the stack just changed.

    Signed-off-by: Bodo Stroesser
    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bodo Stroesser
     
  • UML has had two modes of operation - an insecure, slow mode (tt mode) in
    which the kernel is mapped into every process address space which requires
    no host kernel modifications, and a secure, faster mode (skas mode) in
    which the UML kernel is in a separate host address space, which requires a
    patch to the host kernel.

    This patch implements something very close to skas mode for hosts which
    don't support skas - I'm calling this skas0. It provides the security of
    the skas host patch, and some of the performance gains.

    The two main things that are provided by the skas patch, /proc/mm and
    PTRACE_FAULTINFO, are implemented in a way that requires no host patch.

    For the remote address space changing stuff (mmap, munmap, and mprotect),
    we set aside two pages in the process above its stack, one of which
    contains a little bit of code which can call mmap et al.

    To update the address space, the system call information (system call
    number and arguments) are written to the stub page above the code. The
    %esp is set to the beginning of the data, the %eip is set to the start of
    the stub, and it repeatedly pops the information into its registers and
    makes the system call until it sees a system call number of zero. This is
    to amortize the cost of the context switch across multiple address space
    updates.

    When the updates are done, it SIGSTOPs itself, and the kernel process
    continues what it was doing.

    For a PTRACE_FAULTINFO replacement, we set up a SIGSEGV handler in the
    child, and let it handle segfaults rather than nullifying them. The
    handler is in the same page as the mmap stub. The second page is used as
    the stack. The handler reads cr2 and err from the sigcontext, sticks them
    at the base of the stack in a faultinfo struct, and SIGSTOPs itself. The
    kernel then reads the faultinfo and handles the fault.

    A complication on x86_64 is that this involves resetting the registers to
    the segfault values when the process is inside the kill system call. This
    breaks on x86_64 because %rcx will contain %rip because you tell SYSRET
    where to return to by putting the value in %rcx. So, this corrupts %rcx on
    return from the segfault. To work around this, I added an
    arch_finish_segv, which on x86 does nothing, but which on x86_64 ptraces
    the child back through the sigreturn. This causes %rcx to be restored by
    sigreturn and avoids the corruption. Ultimately, I think I will replace
    this with the trick of having it send itself a blocked signal which will be
    unblocked by the sigreturn. This will allow it to be stopped just after
    the sigreturn, and PTRACE_SYSCALLed without all the back-and-forth of
    PTRACE_SYSCALLing it through sigreturn.

    This runs on a stock host, so theoretically (and hopefully), tt mode isn't
    needed any more. We need to make sure that this is better in every way
    than tt mode, though. I'm concerned about the speed of address space
    updates and page fault handling, since they involve extra round-trips to
    the child. We can amortize the round-trip cost for large address space
    updates by writing all of the operations to the data page and having the
    child execute them all at the same time. This will help fork and exec, but
    not page faults, since they involve only one page.

    I can't think of any way to help page faults, except to add something like
    PTRACE_FAULTINFO to the host. There is PTRACE_SIGINFO, but UML doesn't use
    siginfo for SIGSEGV (or anything else) because there isn't enough
    information in the siginfo struct to handle page faults (the faulting
    operation type is missing). Adding that would make PTRACE_SIGINFO a usable
    equivalent to PTRACE_FAULTINFO.

    As for the code itself:

    - The system call stub is in arch/um/kernel/sys-$(SUBARCH)/stub.S. It is
    put in its own section of the binary along with stub_segv_handler in
    arch/um/kernel/skas/process.c. This is manipulated with run_syscall_stub
    in arch/um/kernel/skas/mem_user.c. syscall_stub will execute any system
    call at all, but it's only used for mmap, munmap, and mprotect.

    - The x86_64 stub calls sigreturn by hand rather than allowing the normal
    sigreturn to happen, because the normal sigreturn is a SA_RESTORER in
    UML's address space provided by libc. Needless to say, this is not
    available in the child's address space. Also, it does a couple of odd
    pops before that which restore the stack to the state it was in at the
    time the signal handler was called.

    - There is a new field in the arch mmu_context, which is now a union.
    This is the pid to be manipulated rather than the /proc/mm file
    descriptor. Code which deals with this now checks proc_mm to see whether
    it should use the usual skas code or the new code.

    - userspace_tramp is now used to create a new host process for every UML
    process, rather than one per UML processor. It checks proc_mm and
    ptrace_faultinfo to decide whether to map in the pages above its stack.

    - start_userspace now makes CLONE_VM conditional on proc_mm since we need
    separate address spaces now.

    - switch_mm_skas now just sets userspace_pid[0] to the new pid rather
    than PTRACE_SWITCH_MM. There is an addition to userspace which updates
    its idea of the pid being manipulated each time around the loop. This is
    important on exec, when the pid will change underneath userspace().

    - The stub page has a pte, but it can't be mapped in using tlb_flush
    because it is part of tlb_flush. This is why it's required for it to be
    mapped in by userspace_tramp.

    Other random things:

    - The stub section in uml.lds.S is page aligned. This page is written
    out to the backing vm file in setup_physmem because it is mapped from
    there into user processes.

    - There's some confusion with TASK_SIZE now that there are a couple of
    extra pages that the process can't use. TASK_SIZE is considered by the
    elf code to be the usable process memory, which is reasonable, so it is
    decreased by two pages. This confuses the definition of
    USER_PGDS_IN_LAST_PML4, making it too small because of the rounding down
    of the uneven division. So we round it to the nearest PGDIR_SIZE rather
    than the lower one.

    - I added a missing PT_SYSCALL_ARG6_OFFSET macro.

    - um_mmu.h was made into a userspace-usable file.

    - proc_mm and ptrace_faultinfo are globals which say whether the host
    supports these features.

    - There is a bad interaction between the mm.nr_ptes check at the end of
    exit_mmap, stack randomization, and skas0. exit_mmap will stop freeing
    pages at the PGDIR_SIZE boundary after the last vma. If the stack isn't
    on the last page table page, the last pte page won't be freed, as it
    should be since the stub ptes are there, and exit_mmap will BUG because
    there is an unfreed page. To get around this, TASK_SIZE is set to the
    next lowest PGDIR_SIZE boundary and mm->nr_ptes is decremented after the
    calls to init_stub_pte. This ensures that we know the process stack (and
    all other process mappings) will be below the top page table page, and
    thus we know that mm->nr_ptes will be one too many, and can be
    decremented.

    Things that need fixing:

    - We may need better assurances that the stub code is PIC.

    - The stub pte is set up in init_new_context_skas.

    - alloc_pgdir is probably the right place.

    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
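
The batching scheme in the skas0 description (queue several address space updates on the data page, then let the stub pop and execute them until it reads a system call number of zero) can be simulated in plain C. This is a sketch only: the real stub is assembly running in the child, and the record layout, `run_stub_sketch()` and `record_syscall()` are assumptions made for the illustration.

```c
#include <stddef.h>

#define STUB_NARGS 6    /* assumed: every record carries six arguments */

/* Callback standing in for "make the system call". */
typedef long (*syscall_fn)(long nr, const long args[STUB_NARGS]);

/*
 * Walk the data area, executing each queued record in turn, and stop
 * at a system call number of zero. Returns the number of calls made.
 */
size_t run_stub_sketch(const long *data, syscall_fn do_syscall)
{
    size_t calls = 0;

    while (*data != 0) {
        long nr = *data++;          /* pop the system call number */

        do_syscall(nr, data);       /* arguments follow the number */
        data += STUB_NARGS;         /* pop the arguments */
        calls++;
    }
    return calls;
}

/* Recording handler used in place of a real system call. */
long recorded_nr[8];
size_t recorded;

long record_syscall(long nr, const long args[STUB_NARGS])
{
    (void)args;
    if (recorded < 8)
        recorded_nr[recorded++] = nr;
    return 0;
}
```

One switch into the child then covers the whole queue, which is the amortisation the message describes for fork and exec; a single page fault still pays for a full round-trip.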
     
  • freezeable() already tests for TRACED/STOPPED processes, no need to do it
    twice.

    Signed-off-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • Fix error handling and whitespace in swsusp.c. swsusp_free() was called when
    there was nothing allocated, leading to an oops.

    Signed-off-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • Move device name resolution code around so that it is not called from
    resume-from-initrd. name_to_dev_t may be unavailable at that point.

    Signed-off-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • Fix u32 vs pm_message_t confusion in cpufreq.

    Signed-off-by: Bernard Blackham
    Signed-off-by: Pavel Machek
    Cc: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernard Blackham
     
  • Few more u32 vs. pm_message_t fixes.

    Signed-off-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • If CONFIG_NUMA isn't set, we use the define in for
    early_pfn_to_nid (which defines it to 0).

    Because of this, the prototype needs to move inside the CONFIG_NUMA too, or
    anal gcc's get really confused.

    Signed-off-by: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
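
A reduced sketch of the arrangement described above, with CONFIG_NUMA deliberately left undefined. The wrapper `node_of_pfn()` is hypothetical. If the prototype sat outside the #ifdef and after the #define, the preprocessor would rewrite it to `int (0);`, which does not compile.

```c
/* CONFIG_NUMA is intentionally not defined in this sketch. */

#ifdef CONFIG_NUMA
/* Real function, provided elsewhere on NUMA builds. */
int early_pfn_to_nid(unsigned long pfn);
#else
/* Non-NUMA builds: everything lives on node 0. */
#define early_pfn_to_nid(pfn) (0)
#endif

/* Hypothetical caller; compiles the same way under either configuration. */
int node_of_pfn(unsigned long pfn)
{
    (void)pfn;  /* unused when the macro form discards its argument */
    return early_pfn_to_nid(pfn);
}
```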
     
  • The recent cleanups to asm-i386/mmzone.h were suboptimal nesting an ifdef of
    the same symbol. This patch removes some of the ifdef'ery to make things more
    readable again.

    Signed-off-by: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • There has been some discussion about solving the SMP MTRR suspend/resume
    breakage, but I didn't find a patch for it. This is an attempt at it. The
    basic idea is moving mtrr initializing into cpu_identify for all APs (so it
    works for cpu hotplug). For BP, restore_processor_state is responsible for
    restoring MTRR.

    Signed-off-by: Shaohua Li
    Acked-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • This patch by Yoshihiro MATSUYAMA (already ACK'ed by David Howells) adds a
    defconfig for the frv arch.

    Signed-Off-By: Yoshihiro MATSUYAMA
    Signed-off-by: Adrian Bunk
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • We don't need to use the PERFMON exception on POWER5, in fact the firmware
    returns an error. Due to this just remove the warning.

    Also now that we have proper runlatch support we can remove the bootup
    hack.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • Not sure if we really need this, but it was handy to know which iSeries loop I
    was testing.

    Be consistent about printing which idle loop we're using, with this patch we
    cover all cases.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Ellerman
     
  • Fix a compile warning introduced by the previous patches.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • - remove some unnecessary includes
    - add runlatch support
    - no need to use raw_smp_processor_id any more, current preempt debug
    logic checks for processes that are bound to one cpu.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • - separate out sleep logic in dedicated_idle, it was so far indented
    that it got squashed against the right side of the screen.
    - add runlatch support, looping on runlatch disable.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • - remove min/max yield time, we don't use the values anywhere
    - separate shared and dedicated idle loops
    - check need_resched again with irqs off to avoid sleeping with pending work
    - continually set runlatch off in idle loop, this means we don't need to
    turn the runlatch off on exception exit and suffer that associated
    cost for all exceptions. (A future patch will turn the runlatch on at
    exception entry)

    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • Now that the idle loop is configured by each platform we don't need
    idle_setup() anymore.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Ellerman
     
  • This patch fixes up iSeries, pSeries, pmac and maple to set the correct idle
    function for each platform.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Ellerman
     
  • dedicated_idle() and shared_idle() are only used by pSeries, so move them into
    pSeries_setup.c

    Signed-off-by: Michael Ellerman
    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Ellerman
     
  • Move iSeries_idle() into iSeries_setup.c, no one else needs to know about it.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Ellerman
     
  • This patch adds an idle member to the ppc_md structure and calls it from
    cpu_idle(). If a platform leaves ppc_md.idle as null it will get the default
    idle loop default_idle().

    Signed-off-by: Michael Ellerman
    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Ellerman
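
The per-platform hook with a default fallback might look like the sketch below. All names carry a `_sketch` suffix because they are illustrative, and each idle function here runs a single pass and returns, whereas a real idle loop does not return.

```c
#include <stddef.h>

/* Hypothetical, reduced machine-description structure. */
struct machdep_sketch {
    void (*idle)(void);     /* platform idle loop, or NULL for the default */
};

struct machdep_sketch ppc_md_sketch;    /* zero-initialised: idle == NULL */

int default_idle_calls;
int platform_idle_calls;

void default_idle_sketch(void)  { default_idle_calls++; }
void platform_idle_sketch(void) { platform_idle_calls++; }

/* One pass of the idle entry point: use the platform hook if set. */
void cpu_idle_once(void)
{
    void (*idle)(void) = ppc_md_sketch.idle ? ppc_md_sketch.idle
                                            : default_idle_sketch;
    idle();
}
```

A platform opts in by assigning its own function to the hook during setup; leaving the member NULL selects the default.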
     
  • Now that hvc_get_chars doesn't strip NULs, hvsi doesn't have to duplicate it.

    Signed-off-by: Milton Miller
    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milton Miller
     
  • Separate the NUL character filtering from get_hvc_chars.

    Signed-off-by: Milton Miller
    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milton Miller
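
Keeping the filtering in its own step lets callers that want raw data skip it. A minimal sketch of such a filter, assuming it simply drops every NUL byte in place; the actual filtering rule in the driver may be narrower, and `strip_nuls_sketch()` is a hypothetical name.

```c
/*
 * Remove NUL bytes from buf[0..len) in place.
 * Returns the number of characters that remain.
 */
int strip_nuls_sketch(char *buf, int len)
{
    int in, out = 0;

    for (in = 0; in < len; in++) {
        if (buf[in] != '\0')
            buf[out++] = buf[in];   /* keep everything except NULs */
    }
    return out;
}
```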
     
  • When registering the hvc console port, register a list of ops (read and write)
    to go with it, instead of calling fixed function names.

    This allows different ports to encode the data differently.

    Signed-off-by: Milton Miller
    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milton Miller
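
Registering a table of operations per port could be sketched like this. The struct, the registration helpers and the loopback backend are all hypothetical; they only show how the console core can call through function pointers instead of fixed function names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-port operations table. */
struct hv_ops_sketch {
    int (*get_chars)(unsigned int vtermno, char *buf, int count);
    int (*put_chars)(unsigned int vtermno, const char *buf, int count);
};

#define MAX_PORTS_SKETCH 4
static const struct hv_ops_sketch *port_ops[MAX_PORTS_SKETCH];

/* Attach an ops table to a port index; -1 on bad arguments. */
int register_port_sketch(unsigned int index, const struct hv_ops_sketch *ops)
{
    if (index >= MAX_PORTS_SKETCH || ops == NULL)
        return -1;
    port_ops[index] = ops;
    return 0;
}

/* The core dispatches through whatever the port registered. */
int port_write_sketch(unsigned int index, const char *buf, int count)
{
    if (index >= MAX_PORTS_SKETCH || port_ops[index] == NULL)
        return -1;
    return port_ops[index]->put_chars(index, buf, count);
}

int port_read_sketch(unsigned int index, char *buf, int count)
{
    if (index >= MAX_PORTS_SKETCH || port_ops[index] == NULL)
        return -1;
    return port_ops[index]->get_chars(index, buf, count);
}

/* Example backend: a loopback that returns what was written. */
static char loop_buf[64];
static int loop_len;

int loop_put_chars(unsigned int vtermno, const char *buf, int count)
{
    int room = (int)sizeof(loop_buf) - loop_len;
    int n = count < room ? count : room;

    (void)vtermno;
    if (n <= 0)
        return 0;
    memcpy(loop_buf + loop_len, buf, (size_t)n);
    loop_len += n;
    return n;
}

int loop_get_chars(unsigned int vtermno, char *buf, int count)
{
    int n = count < loop_len ? count : loop_len;

    (void)vtermno;
    if (n <= 0)
        return 0;
    memcpy(buf, loop_buf, (size_t)n);
    memmove(loop_buf, loop_buf + n, (size_t)(loop_len - n));
    loop_len -= n;
    return n;
}

const struct hv_ops_sketch loop_ops_sketch = { loop_get_chars, loop_put_chars };
```

A second backend that encodes data differently would supply its own pair of functions and register them for its port, with no change to the dispatch code.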