23 Mar, 2006

40 commits

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Cc: Karsten Keil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Convert fs/9p/mux.c from semaphore to mutex.

    NOTE: fixed locking bugs in the process - the code was using semaphores
    the other way around.

    Signed-off-by: Ingo Molnar
    Cc: Eric Van Hensbergen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Convert kernel/rcupdate's rcu_barrier_sema to mutex.

    Signed-off-by: Ingo Molnar
    Acked-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Convert cpuset.c's callback_sem and manage_sem to mutexes.
    Build and boot tested by Ingo.
    Build, boot, unit and stress tested by pj.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • When quota is being turned off we assumed that all the references to dquots
    were already dropped. That need not be true, as inodes being deleted are
    not on the superblock's inode list and hence we may not reach them when
    removing quota references from inodes. So invalidate_dquots() has to wait
    for all the users of dquots (as quota is already marked as turned off, no
    new references can be acquired and so this is bound to happen rather
    early). When we do this, we can also remove the iprune_sem locking as it
    was protecting us against exactly the same problem when freeing inodes
    icache memory.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Seems like needless clutter having a bunch of #if defined(CONFIG_$ARCH) in
    include/linux/cache.h. Move the per architecture section definition to
    asm/cache.h, and keep the if-not-defined dummy case in linux/cache.h to
    catch architectures which don't implement the section.

    Verified that symbols still go in .data.read_mostly on parisc,
    and the compile doesn't break.

    Signed-off-by: Kyle McMartin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyle McMartin
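
    The split described in the cache.h entry above can be sketched in one
    file. This is a userspace illustration, not the kernel headers: the macro
    name READ_MOSTLY_SKETCH and the variable are hypothetical, and the
    __GNUC__/__ELF__ test stands in for "this architecture implements the
    section". Only the section name .data.read_mostly comes from the commit.

```c
/* Stand-in for the per-architecture header (asm/cache.h in the commit):
 * an architecture that implements the section defines the annotation. */
#if defined(__GNUC__) && defined(__ELF__)
#define READ_MOSTLY_SKETCH __attribute__((__section__(".data.read_mostly")))
#endif

/* Stand-in for the generic header (linux/cache.h in the commit): the
 * if-not-defined dummy case catches everything else. */
#ifndef READ_MOSTLY_SKETCH
#define READ_MOSTLY_SKETCH
#endif

/* Hypothetical variable that is written once and read often. */
static int tunable READ_MOSTLY_SKETCH = 42;

static inline int read_tunable(void)
{
    return tunable;
}
```

    With this layout the generic header needs no per-architecture #if chain;
    it only supplies the empty fallback.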
     
  • Since early 2.4.x all cdrom drivers implement the block_device methods
    themselves, so they can handle additional ioctls directly instead of going
    through the cdrom layer.

    Signed-off-by: Christoph Hellwig
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Add a small helper for each ioctl to cut down cdrom_ioctl to a readable
    size.

    Signed-off-by: Christoph Hellwig
    Acked-by: Jens Axboe
    Signed-off-by: Benoit Boissinot
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Avoid taking the global tasklist_lock when possible, if a process is single
    threaded during getrusage(). Any avoidance of tasklist_lock is good for
    NUMA boxes (and possibly for large SMPs). Thanks to Oleg Nesterov for
    review and suggestions.

    Signed-off-by: Nippun Goel
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
  • 1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32-bit
    platforms, lowering kmalloc() allocated space by 50%.

    2) Reduce the size of (files_struct), using a special 32-bit (or
    64-bit) embedded_fd_set, instead of a 1024-bit fd_set for the
    close_on_exec_init and open_fds_init fields. This saves some RAM (248
    bytes per task) as most tasks don't open more than 32 files. D-Cache
    footprint for such tasks is also reduced to the minimum.

    3) Reduce the size of the allocated fdset. Currently two full pages are
    allocated, that is 32768 bits on x86 for example, which is way too much.
    The minimum is now L1_CACHE_BYTES.

    UP and SMP should benefit from this patch, because most tasks will touch
    only one cache line when they open()/close() stdin/stdout/stderr (0/1/2),
    since next_fd, close_on_exec_init, open_fds_init and fd_array[0 .. 2] are
    in the same cache line.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
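
    A rough sketch of the embedded_fd_set idea from the entry above, written
    as plain userspace C. The commit names embedded_fd_set but does not show
    its layout, so the struct, the efs_* helpers and the SKETCH_* constants
    are assumptions made for illustration. The savings helper reproduces the
    248-byte figure under the assumption that two 1024-bit sets are each
    replaced by one 4-byte word.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical constants: bits in one machine word and in a classic fd_set. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_FD_SETSIZE 1024

/* One word of descriptor bits embedded directly in the owning structure. */
struct embedded_fd_set_sketch {
    unsigned long fds_bits[1];
};

static inline void efs_set(struct embedded_fd_set_sketch *s, unsigned int fd)
{
    assert(fd < SKETCH_BITS_PER_LONG);   /* only small descriptors fit */
    s->fds_bits[0] |= 1UL << fd;
}

static inline void efs_clear(struct embedded_fd_set_sketch *s, unsigned int fd)
{
    assert(fd < SKETCH_BITS_PER_LONG);
    s->fds_bits[0] &= ~(1UL << fd);
}

static inline int efs_isset(const struct embedded_fd_set_sketch *s,
                            unsigned int fd)
{
    assert(fd < SKETCH_BITS_PER_LONG);
    return (int)((s->fds_bits[0] >> fd) & 1UL);
}

/* Bytes saved per task when two full 1024-bit sets shrink to two words of
 * word_bytes bytes each. */
static inline size_t efs_bytes_saved(size_t word_bytes)
{
    return 2 * (SKETCH_FD_SETSIZE / CHAR_BIT - word_bytes);
}
```

    A task that opens more descriptors than fit in one word would need a
    larger, separately allocated set; that path is not modelled here.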
     
  • Linus points out that ext3_readdir's readahead only cuts in when
    ext3_readdir() is operating at the very start of the directory. So for large
    directories we end up performing no readahead at all and we suck.

    So take it all out and use the core VM's page_cache_readahead(). This means
    that ext3 directory reads will use all of readahead's dynamic sizing goop.

    Note that we're using the directory's filp->f_ra to hold the readahead state,
    but readahead is actually being performed against the underlying blockdev's
    address_space. Fortunately the readahead code is all set up to handle this.

    Tested with printk. It works. I was struggling to find a real workload which
    actually cared.

    (The patch also exports page_cache_readahead() to GPL modules)

    Cc: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Use the standard BCD macros instead of redefining them.

    Signed-off-by: Jean Delvare
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean Delvare
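
    For readers unfamiliar with BCD, the conversion that such macros perform
    can be sketched as two small functions. The commit does not name the
    macros it switches to, so bcd_to_bin() and bin_to_bcd() below are
    illustrative names, and the sketch only handles a single packed byte
    (values 0 to 99).

```c
/* Packed BCD keeps the tens digit in the high nibble and the ones digit in
 * the low nibble, so 0x59 encodes decimal 59. */
static inline unsigned int bcd_to_bin(unsigned char bcd)
{
    return (bcd & 0x0fu) + (bcd >> 4) * 10u;
}

/* Inverse conversion; the caller must pass a value in the range 0..99. */
static inline unsigned char bin_to_bcd(unsigned int bin)
{
    return (unsigned char)(((bin / 10u) << 4) + (bin % 10u));
}
```

    Real-time clock registers commonly store their fields in this format,
    which is why drivers tend to grow private copies of these helpers.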
     
  • Add the SNAPSHOT_S2RAM ioctl to the snapshot device.

    This ioctl allows a userland application to make the system (previously frozen
    with the SNAPSHOT_FREEZE ioctl) enter the S3 state without freezing processes
    and disabling nonboot CPUs for the second time.

    This will allow us to implement the suspend-to-disk-and-RAM (STDR)
    functionality in the userland suspend tools.

    Signed-off-by: Luca Tettamanti
    Signed-off-by: Rafael J. Wysocki
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luca Tettamanti
     
  • Remove the console-switching code from the suspend part of the swsusp userland
    interface and let the userland tools switch the console.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • It is unsafe to suspend devices if the hardware is controlled by X. Add an
    extra check to prevent this from happening.

    Signed-off-by: Rafael J. Wysocki
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Highmem could be in pcp list as well.

    Signed-off-by: Shaohua Li
    Acked-by: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • This patch from Pavel moves userland freeze signal handling into a more
    logical place. It now hits even with mysqld running.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • The combination of printk/pr_debug led to a stray log-level prefix in the
    middle of the line, and we printed way too many dots.

    Signed-off-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • Allow swsusp to freeze processes successfully under heavy load by freezing
    userspace processes before kernel threads.

    [Thanks to Nigel Cunningham for suggesting the way to go.]

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
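
    The ordering described in the entry above can be modelled as two passes
    over a task list. This is a toy userspace model, not the swsusp freezer:
    struct task_sketch, freeze_pass() and freeze_all() are hypothetical
    names, and "freezing" here just sets a flag and records the order.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a task for the purpose of this sketch. */
struct task_sketch {
    const char *name;
    int is_kernel_thread;   /* 1 for kernel threads, 0 for userspace */
    int frozen;
};

/* Freeze every not-yet-frozen task whose kind matches want_kernel, append
 * its index to order[], and return the updated number of frozen tasks. */
static size_t freeze_pass(struct task_sketch *tasks, size_t n,
                          int want_kernel, size_t *order, size_t count)
{
    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].frozen && tasks[i].is_kernel_thread == want_kernel) {
            tasks[i].frozen = 1;
            order[count++] = i;
        }
    }
    return count;
}

/* Userspace processes first, kernel threads second. */
static size_t freeze_all(struct task_sketch *tasks, size_t n, size_t *order)
{
    size_t count = freeze_pass(tasks, n, 0, order, 0);
    return freeze_pass(tasks, n, 1, order, count);
}
```

    The caller supplies an order[] array with room for n entries; after
    freeze_all() returns, all userspace indices precede all kernel-thread
    indices.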
     
  • This patch introduces a user space interface for swsusp.

    The interface is based on a special character device, called the snapshot
    device, that allows user space processes to perform suspend and resume-related
    operations with the help of some ioctls and the read()/write() functions.
    Additionally it allows these processes to allocate free swap pages from a
    selected swap partition, called the resume partition, so that they know which
    sectors of the resume partition are available to them.

    The interface uses the same low-level system memory snapshot-handling
    functions that are used by the built-in swap-writing/reading code of swsusp.

    The interface documentation is included in the patch.

    The patch assumes that the major and minor numbers of the snapshot device will
    be 10 (ie. misc device) and 231, the registration of which has already been
    requested.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Update the suspend-to-RAM documentation with new machines, and make the
    message printed when processes can't be stopped a little clearer. (In one
    case, waiting longer actually did help.)

    From: "Rafael J. Wysocki"

    Warn in the documentation that data may be lost if there are some
    filesystems mounted from USB devices before suspend.

    [Thanks to Alan Stern for providing the answer to the question in the
    Q:-A: part.]

    Signed-off-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • Move externs from C source files to header files.

    Signed-off-by: Randy Dunlap
    Cc: "Rafael J. Wysocki"
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Move the swap-writing/reading code of swsusp to a separate file.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Introduce the low level interface that can be used for handling the
    snapshot of the system memory by the in-kernel swap-writing/reading code of
    swsusp and the userland interface code (to be introduced shortly).

    Also change the way in which swsusp records the allocated swap pages and,
    consequently, simplify the in-kernel swap-writing/reading code (this is
    necessary for the userland interface too). To this end, introduce two
    helper functions in mm/swapfile.c, so that the swsusp code does not refer
    directly to the swap internals.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • This was a temporary thing for 2.6.16.

    Cc: "Rafael J. Wysocki"
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Don't create the "online" control file for the BSP (i386/x86_64) since it's
    not removable.

    We originally added this to support ppc64: if the kernel has support but
    the BIOS indicated no offline support, we just didn't create online files
    for those CPUs.

    We used the same method on ia64 as well: a CPU that takes platform
    interrupts cannot be removed if those interrupts cannot be re-targeted
    to another CPU.

    Signed-off-by: Ashok Raj
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ashok Raj
     
  • Gcc reserves %ebx when compiling position-independent code on i386. This
    means the _syscallX() macros in include/asm-i386/unistd.h will not
    compile. This patch changes the existing macros to take special care to
    preserve %ebx.

    The bug can be tracked at http://bugzilla.kernel.org/show_bug.cgi?id=6204

    Signed-off-by: Markus Gutschke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Gutschke
     
  • You must always ensure to fulfill the dependencies of what you are
    select'ing.

    Signed-off-by: Adrian Bunk
    Cc: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • _raw_spin_lock_flags() is entered with interrupts disabled. If it cannot
    obtain a spinlock, it checks the flags that were passed and re-enables
    interrupts before spinning if that's how the flags are set. When the
    spinlock might be available, it disables interrupts (even if they are
    already disabled) before trying to get the lock. Change that so interrupts
    are only disabled if they have been enabled. This costs nine bytes of
    duplicated spinloop code.

    Fastpath before patch:
    jle not-taken conditional jump
    cli disable interrupts
    jmp unconditional jump

    Fastpath after patch, if interrupts were not enabled:
    jg taken conditional branch

    Signed-off-by: Chuck Ebbert
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chuck Ebbert
     
  • Check the APIC version instead of the CPU family to detect an XAPIC. A
    family 6 CPU could have an XAPIC as well.

    Signed-off-by: Shaohua Li
    Cc: Dave Jones
    Cc: "Seth, Rohit"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
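
    A version-based check of the kind the entry above describes might look
    like the following sketch. The commit does not show the test itself, so
    two things here are assumptions: that the version number sits in the low
    8 bits of the local APIC version register, and that versions 0x14 and
    above indicate an XAPIC. The function names are illustrative.

```c
#include <stdint.h>

/* Assumption: the low byte of the version register holds the APIC version. */
static inline unsigned int apic_version(uint32_t version_reg)
{
    return version_reg & 0xffu;
}

/* Assumption: version 0x14 or later means an XAPIC, regardless of the CPU
 * family the part reports. */
static inline int apic_is_xapic(uint32_t version_reg)
{
    return apic_version(version_reg) >= 0x14u;
}
```

    Keying the decision on the APIC's own version avoids maintaining a list
    of CPU families that happen to ship with one.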
     
  • Detecting cache line using cpuid.4, cpuid level 4 is enough.

    Signed-off-by: Shaohua Li
    Cc: Dave Jones
    Cc: "Seth, Rohit"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • i386 has a small bug in the stack dump code where it prints an extra log
    level code. Remove that and fix the alignment of normal stack dump
    printout. Also remove some unnecessary printk() calls.

    Signed-off-by: Chuck Ebbert
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chuck Ebbert
     
  • With cpu_gdt_descr having been converted to per-CPU data, the old object
    (in head.S) no longer needs to reserve space for each CPU's instance. With
    cpu_gdt_table not being used for CPU 0 anymore, it doesn't seem to need
    page alignment (or if in fact there is a need for it to retain that
    alignment, the whole object should go into .data.page_align).

    Signed-off-by: Jan Beulich
    Acked-by: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • arch/i386/kernel/cpu/centaur.c: In function `centaur_mcr_insert':
    arch/i386/kernel/cpu/centaur.c:33: warning: implicit declaration of function `mtrr_centaur_report_mcr'

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • arch/i386/kernel/apic.c:840: warning: implicit declaration of function `GET_APIC_ID'

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Document a limitation of vsyscall-sysenter, since patches to fix it have
    been rejected.

    Signed-off-by: Chuck Ebbert
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chuck Ebbert
     
  • Using PTRACE_SINGLESTEP on a child that does an int80 syscall misses the
    SIGTRAP that should be delivered upon syscall exit. Fix that by setting
    TIF_SINGLESTEP when entering the kernel via int80 with TF set.

    /* Test whether singlestep through an int80 syscall works.
    */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>

    static int child, status;
    static struct user_regs_struct regs;

    static void do_child()
    {
    ptrace(PTRACE_TRACEME, 0, 0, 0);
    kill(getpid(), SIGUSR1);
    asm ("int $0x80" : : "a" (20)); /* getpid */
    }

    static void do_parent()
    {
    unsigned long eip, expected = 0;
    again:
    waitpid(child, &status, 0);
    if (WIFEXITED(status) || WIFSIGNALED(status))
    return;

    if (WIFSTOPPED(status)) {
    ptrace(PTRACE_GETREGS, child, 0, &regs);
    eip = regs.eip;
    if (expected)
    fprintf(stderr, "child stop @ %08x, expected %08x %s\n",
    eip, expected,
    eip == expected ? "" : "
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chuck Ebbert
     
  • >commit 76381fee7e8feb4c22be636aa5d4765dbe4fbf9e
    >Author: Vincent Hanquez
    >Date: Thu Jun 23 00:08:46 2005 -0700
    >
    > [PATCH] xen: x86_64: use more usermode macro
    >
    > Make use of the user_mode macro where it's possible. This is useful for Xen
    > because it will need only to redefine only the macro to a hypervisor call.

    I am of the opinion that the above changeset is incomplete, i.e. it missed
    converting some previous uses of user_mode to user_mode_vm. While most of
    them could be considered just cosmetic, at least the one in die_nmi
    doesn't appear to be.

    Signed-off-by: Jan Beulich
    Cc: Vincent Hanquez
    Cc: Zachary Amsden
    Cc: James Bottomley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • Registering a callback handler through register_die_notifier() is obviously
    primarily intended for use by modules. However, the way these currently
    get called it is basically impossible for them to actually be used by
    modules, as there is, on non-PAE configurations, a good chance (the larger
    the module, the better) for the system to crash as a result.

    This is because the callback gets invoked

    (a) in the page fault path before the top level page table propagation
    gets carried out (hence a fault to propagate the top level page table
    entry/entries mapping to module's code/data would nest infinitely) and

    (b) in the NMI path, where nested faults must absolutely not happen,
    since otherwise the IRET from the nested fault re-enables NMIs,
    potentially resulting in nested NMI occurrences.

    Besides the modular aspect, similar problems would even arise for
    in-kernel consumers of the API if they touched ioremap()ed or vmalloc()ed
    memory inside their handlers.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich