14 Jan, 2010

3 commits

  • This one is much faster than the spinlock-based fallback rwsem code,
    with certain artificial benchmarks having shown 300%+ improvement on
    threaded page faults etc.

    Again, note the 32767-thread limit here. So this really does need that
    whole "make rwsem_count_t be 64-bit and fix the BIAS values to match"
    extension on top of it, but that is conceptually a totally independent
    issue.

    NOT TESTED! The original patch that this all was based on was tested by
    KAMEZAWA Hiroyuki, but maybe I screwed up something when I created the
    cleaned-up series, so caveat emptor.

    Also note that it _may_ be a good idea to mark some more registers
    clobbered on x86-64 in the inline asms instead of saving/restoring them.
    They are inline functions, but they are only used in places where there
    are not a lot of live registers _anyway_, so doing for example the
    clobbers of %r8-%r11 in the asm wouldn't make the fast-path code any
    worse, and would make the slow-path code smaller.

    (Not that the slow-path really matters to that degree. Saving a few
    unnecessary registers is the _least_ of our problems when we hit the slow
    path. The instruction/cycle counting really only matters in the fast
    path).

    Signed-off-by: Linus Torvalds
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Linus Torvalds
     
  • The fast version of the rwsems (the code that uses xadd) has
    traditionally only worked on x86-32, and as a result it mixes different
    kinds of types wildly - they just all happen to be 32-bit. We have
    "long", we have "__s32", and we have "int".

    To make it work on x86-64, the types suddenly matter a lot more. It can
    be either a 32-bit or 64-bit signed type, and both work (with the caveat
    that a 32-bit counter will only have 15 bits of effective write
    counters, so it's limited to 32767 users). But whatever type you
    choose, it needs to be used consistently.

    This makes a new 'rwsem_count_t', which is a 32-bit signed type. For a
    64-bit type, you'd need to also update the BIAS values.

    Signed-off-by: Linus Torvalds
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Linus Torvalds
     
  • Using kernel_stack_pointer() allows 32-bit and 64-bit versions to
    be merged. This is more correct for 64-bit, since the old %rsp is
    always saved on the stack.

    Signed-off-by: Brian Gerst
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Brian Gerst
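The rwsem commits above rely on a single signed counter updated with xadd. As a rough userspace illustration (not the kernel's code), here is a C sketch of that counter layout, assuming the classic x86-32 bias values: the low 16 bits count active lockers and the upper half carries a negative waiting/writer bias. The names (sketch_rwsem, sketch_down_read_trylock and so on) are hypothetical, GCC/Clang __atomic builtins stand in for the inline asm, and a failed fast path simply backs out instead of entering a real slow path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a 32-bit count, so every BIAS value below is 32-bit too.
 * Moving to a 64-bit count would mean changing the typedef AND these. */
typedef int32_t rwsem_count_t;

#define RWSEM_UNLOCKED_VALUE    ((rwsem_count_t)0x00000000)
#define RWSEM_ACTIVE_BIAS       ((rwsem_count_t)0x00000001)
#define RWSEM_ACTIVE_MASK       ((rwsem_count_t)0x0000ffff)
#define RWSEM_WAITING_BIAS      (-(rwsem_count_t)0x00010000)
#define RWSEM_ACTIVE_READ_BIAS  RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

_Static_assert(RWSEM_ACTIVE_WRITE_BIAS == -0xffff,
               "writer bias = one waiting bias + one active bias");

struct sketch_rwsem {
        rwsem_count_t count;
};

/* Reader fast path: one atomic add (xadd on x86), then inspect the old
 * value. A non-negative old count means no writer holds or waits. */
static bool sketch_down_read_trylock(struct sketch_rwsem *sem)
{
        rwsem_count_t old = __atomic_fetch_add(&sem->count, RWSEM_ACTIVE_READ_BIAS,
                                               __ATOMIC_ACQUIRE);
        if (old >= 0)
                return true;
        /* The kernel would enter its slow path here; the sketch backs out. */
        __atomic_fetch_sub(&sem->count, RWSEM_ACTIVE_READ_BIAS, __ATOMIC_RELAXED);
        return false;
}

/* Writer fast path: succeed only if the semaphore was completely idle. */
static bool sketch_down_write_trylock(struct sketch_rwsem *sem)
{
        rwsem_count_t expected = RWSEM_UNLOCKED_VALUE;

        return __atomic_compare_exchange_n(&sem->count, &expected,
                                           RWSEM_ACTIVE_WRITE_BIAS, false,
                                           __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static void sketch_up_read(struct sketch_rwsem *sem)
{
        __atomic_fetch_sub(&sem->count, RWSEM_ACTIVE_READ_BIAS, __ATOMIC_RELEASE);
}

static void sketch_up_write(struct sketch_rwsem *sem)
{
        __atomic_fetch_sub(&sem->count, RWSEM_ACTIVE_WRITE_BIAS, __ATOMIC_RELEASE);
}

/* Number of active lockers in the low half (a writer counts as one). */
static rwsem_count_t sketch_active(struct sketch_rwsem *sem)
{
        return __atomic_load_n(&sem->count, __ATOMIC_RELAXED) & RWSEM_ACTIVE_MASK;
}
```

In this sketch an uncontended reader costs one locked add and a sign test, with no spinlock involved, which is the property the commits contrast with the spinlock-based fallback.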
     

13 Jan, 2010

2 commits

  • Use a macro to define the cache sizes when cachesize > 1 MB.

    This is less typing, and less prone to introducing bugs like we
    saw in e02e0e1a130b9ca37c5186d38ad4b3aaf58bb149, and means we
    don't have to do maths when adding new non-power-of-2 updates
    like those seen recently.

    Signed-off-by: Dave Jones
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Dave Jones
     
  • This makes gcc use the right register names and instruction operand sizes
    automatically for the rwsem inline asm statements.

    So instead of using "(%%eax)" to specify the memory address that is the
    semaphore, we use "(%1)" or similar. And instead of forcing the operation
    to always be 32-bit, we use "%z0", taking the size from the actual
    semaphore data structure itself.

    This doesn't actually matter on x86-32, but if we want to use the same
    inline asm for x86-64, we'll need to have the compiler generate the proper
    64-bit names for the registers (%rax instead of %eax), and if we want to
    use a 64-bit counter too (in order to avoid the 15-bit limit on the
    write counter that limits concurrent users to 32767 threads), we'll need
    to be able to generate instructions with "q" accesses rather than "l".

    Since this header currently isn't enabled on x86-64, none of that matters,
    but we do want to use the xadd version of the semaphores rather than have
    to take spinlocks to do a rwsem. The mm->mmap_sem can be heavily contended
    when you have lots of threads all taking page faults, and the fallback
    rwsem code that uses a spinlock performs abysmally badly in that case.

    [ hpa: modified the patch to skip size suffixes entirely when they are
    redundant due to register operands. ]

    Signed-off-by: Linus Torvalds
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Linus Torvalds
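Letting gcc pick register names and operand sizes can be seen in a small standalone sketch. It follows hpa's variant (no size suffix where a register operand already fixes the size), so the same asm template assembles as a 32-bit xadd for an int32_t and a 64-bit xadd for an int64_t. SKETCH_XADD is a hypothetical helper, not the kernel's macro; it relies on GCC statement expressions and __typeof__, and falls back to __atomic_fetch_add on targets other than x86-64.

```c
#include <stdint.h>

#if defined(__x86_64__)
/* One template for every width: %0 is a register of the operand's own
 * size (32-bit or 64-bit), so the assembler infers xaddl or xaddq and
 * no "l"/"q" suffix is hard-coded. After xadd, the register holds the
 * previous memory value, which the macro returns. */
#define SKETCH_XADD(ptr, inc) ({                                \
        __typeof__(*(ptr)) sx_val = (inc);                      \
        __asm__ __volatile__("lock; xadd %0, %1"                \
                             : "+r" (sx_val), "+m" (*(ptr))     \
                             : : "memory", "cc");               \
        sx_val;                                                 \
})
#else
/* Portable fallback with the same "return the old value" contract. */
#define SKETCH_XADD(ptr, inc) __atomic_fetch_add((ptr), (inc), __ATOMIC_SEQ_CST)
#endif
```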
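A table that stores cache sizes in KB can express the large entries through a macro instead of hand-multiplied constants. The sketch below is illustrative only: the MB() macro reflects the idea described in the Dave Jones commit, but the descriptor IDs, struct and lookup function are placeholders, not the real CPUID cache-descriptor table.

```c
#include <stddef.h>

/* Sizes are kept in KB, so MB(x) converts megabytes to that unit. */
#define MB(x) ((x) * 1024)

struct sketch_cache_desc {
        unsigned char id;          /* placeholder descriptor, not real CPUID data */
        unsigned int size_kb;
};

static const struct sketch_cache_desc sketch_cache_table[] = {
        { 0x01, 512 },             /* under 1 MB: plain KB value */
        { 0x02, MB(1) },
        { 0x03, MB(3) },           /* non-power-of-2 sizes need no hand maths */
        { 0x04, MB(6) },
        { 0x05, MB(12) },
        { 0x06, MB(18) },
};

static unsigned int sketch_cache_size_kb(unsigned char id)
{
        for (size_t i = 0; i < sizeof(sketch_cache_table) / sizeof(sketch_cache_table[0]); i++)
                if (sketch_cache_table[i].id == id)
                        return sketch_cache_table[i].size_kb;
        return 0;                  /* unknown descriptor */
}
```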
     

08 Jan, 2010

3 commits


30 Dec, 2009

3 commits

  • In order to avoid unnecessary chains of branches, rather than
    implementing memcpy()/memset()'s access to their alternative
    implementations via a jump, patch the (larger) original function
    directly.

    The memcpy() part of this is slightly subtle: while alternative
    instruction patching does itself use memcpy(), the replacement block
    is less than 64 bytes in size, so the main loop of the original
    function doesn't get used for copying memcpy_c() over memcpy(), and
    hence we can safely write over its beginning.

    Also note that the CFI annotations are fine for both variants of
    each of the functions.

    Signed-off-by: Jan Beulich
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • In order to avoid unnecessary chains of branches, rather than
    implementing copy_user_generic() as a function consisting of
    just a single (possibly patched) branch, instead properly deal
    with patching call instructions in the alternative instructions
    framework, and move the patching into the callers.

    As a follow-on, one could also introduce something like
    __EXPORT_SYMBOL_ALT() to avoid patching call sites in modules.

    Signed-off-by: Jan Beulich
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • The early ioremap fixmap entries cover half (or for 32-bit
    non-PAE, a quarter) of a page table, yet so far they were
    unconditionally aligned to a 256-entry boundary. This is
    not necessary if the range of page table entries falls within
    a single page table anyway.

    For (theoretically) 50% of all configurations (25% of all non-PAE
    ones), this buys back at least some of the lowmem necessarily lost
    with commit e621bd18958ef5dbace3129ebe17a0a475e127d9.

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
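Patching call instructions involves more than copying bytes: an x86 near call (opcode 0xe8) encodes its target as a signed 32-bit displacement relative to the end of the instruction, so a call copied from the replacement area to the original site needs its displacement adjusted by the distance between the two. A minimal sketch of that adjustment, assuming a little-endian host (as on x86) and using hypothetical helper names rather than the kernel's alternatives code:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_CALL_LEN 5  /* 0xe8 opcode + rel32 */

/* Target of a rel32 call located at address `at`: the displacement is
 * relative to the end of the instruction. */
static uintptr_t sketch_call_target(const uint8_t insn[SKETCH_CALL_LEN], uintptr_t at)
{
        int32_t disp;

        memcpy(&disp, insn + 1, sizeof(disp));  /* little-endian host assumed */
        return at + SKETCH_CALL_LEN + (uintptr_t)(intptr_t)disp;
}

/* Adjust a call copied from `replacement` to `instr` so it still reaches
 * the same target. Bytes that are not a rel32 call are left untouched. */
static void sketch_fixup_copied_call(uint8_t insn[SKETCH_CALL_LEN],
                                     uintptr_t instr, uintptr_t replacement)
{
        int32_t disp;

        if (insn[0] != 0xe8)
                return;
        memcpy(&disp, insn + 1, sizeof(disp));
        disp += (int32_t)(replacement - instr);
        memcpy(insn + 1, &disp, sizeof(disp));
}
```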
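The fixmap alignment decision can be expressed as a simple test: a run of n entries starting at index first stays inside one page table exactly when first and first + n - 1 agree in every bit at or above log2(PTRS_PER_PTE). The sketch below uses hypothetical names and assumes 512 entries per page table (PAE or 64-bit) and 256 early-ioremap slots; it realigns only when that test fails, instead of always aligning to a 256-entry boundary.

```c
#include <stdbool.h>

/* Hypothetical constants: 512 PTEs per page table (PAE / 64-bit); a
 * 32-bit non-PAE build would use 1024. The slot count must be a power
 * of two that divides the page-table size. */
#define SKETCH_PTRS_PER_PTE   512UL
#define SKETCH_BTMAP_SLOTS    256UL

/* True if entries [first, first + n) span two page tables: the first and
 * last index then differ in some bit at or above log2(PTRS_PER_PTE). */
static bool sketch_crosses_page_table(unsigned long first, unsigned long n)
{
        return ((first ^ (first + n - 1)) & ~(SKETCH_PTRS_PER_PTE - 1)) != 0;
}

/* Realign to the next SKETCH_BTMAP_SLOTS boundary only when needed. */
static unsigned long sketch_place_btmap(unsigned long first)
{
        if (!sketch_crosses_page_table(first, SKETCH_BTMAP_SLOTS))
                return first;
        return first + SKETCH_BTMAP_SLOTS - (first & (SKETCH_BTMAP_SLOTS - 1));
}
```

With these numbers a range starting at index 100 is left where it is, while one starting at 300 would straddle two page tables and is moved to 512.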
     

28 Dec, 2009

1 commit

  • Optimize hweight32 by using the same technique as in hweight64.

    The proof of this technique can be found in the commit log for
    f9b4192923fa6e38331e88214b1fe5fc21583fcc ("bitops: hweight()
    speedup").

    The userspace benchmark below showed a 20% speedup on x86_32 for
    bitmap_weight(), which uses hweight32 to count the bits in each
    unsigned long on 32-bit architectures.

    int main(void)
    {
    #define SZ (1024 * 1024 * 512)

            static DECLARE_BITMAP(bitmap, SZ) = {
                    [0 ... 100] = 1,
            };

            return bitmap_weight(bitmap, SZ);
    }

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Cc: Linus Torvalds
    LKML-Reference:
    [ only x86 sets ARCH_HAS_FAST_MULTIPLIER so we do this via the x86 tree]
    Signed-off-by: Ingo Molnar

    Akinobu Mita
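For reference, a sketch of a multiply-based hweight32 along the lines the commit describes: the usual SWAR steps build per-byte bit counts, and a single multiplication by 0x01010101 then sums the four bytes into the top byte. The function names are hypothetical, and sketch_bitmap_weight() only approximates bitmap_weight() over 32-bit words, so that a bitmap like the benchmark's (words 0 to 100 set to 1) can be checked without kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* SWAR population count with a final multiply (worthwhile only where
 * multiplication is fast, which is what ARCH_HAS_FAST_MULTIPLIER signals). */
static unsigned int sketch_hweight32(uint32_t w)
{
        uint32_t res = w - ((w >> 1) & 0x55555555u);             /* 2-bit sums */
        res = (res & 0x33333333u) + ((res >> 2) & 0x33333333u); /* 4-bit sums */
        res = (res + (res >> 4)) & 0x0f0f0f0fu;                  /* 8-bit sums */
        /* Multiplying by 0x01010101 adds all four byte counts into the top byte. */
        return (res * 0x01010101u) >> 24;
}

/* Hypothetical stand-in for bitmap_weight() on 32-bit words. */
static unsigned long sketch_bitmap_weight(const uint32_t *words, size_t nwords)
{
        unsigned long total = 0;

        for (size_t i = 0; i < nwords; i++)
                total += sketch_hweight32(words[i]);
        return total;
}
```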
     

25 Dec, 2009

10 commits

  • Linus Torvalds
     
  • * 'sysctl' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6:
    SYSCTL: Add a mutex to the page_alloc zone order sysctl
    SYSCTL: Print binary sysctl warnings (nearly) only once

    Linus Torvalds
     
  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
    HWPOISON: Add PROC_FS dependency to hwpoison injector v2

    Linus Torvalds
     
  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (34 commits)
    classmate-laptop: add support for Classmate PC ACPI devices
    hp-wmi: Fix two memleaks
    acer-wmi, msi-wmi: Remove needless DMI MODULE_ALIAS
    dell-wmi: do not keep driver loaded on unsupported boxes
    wmi: Free the allocated acpi objects through wmi_get_event_data
    drivers/platform/x86/acerhdf.c: check BIOS information whether it begins with string of table
    acerhdf: add new BIOS versions
    acerhdf: limit modalias matching to supported
    toshiba_acpi: convert to seq_file
    asus_acpi: convert to seq_file
    ACPI: do not select ACPI_DOCK from ATA_ACPI
    sony-laptop: enumerate rfkill devices using SN06
    sony-laptop: rfkill support for newer models
    ACPI: fix OSC regression that caused aer and pciehp not to load
    MAINTAINERS: add maintainer for msi-wmi driver
    fujitu-laptop: fix tests of acpi_evaluate_integer() return value
    arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c: avoid cross-CPU interrupts by using smp_call_function_any()
    ACPI: processor: remove _PDC object list from struct acpi_processor
    ACPI: processor: change acpi_processor_set_pdc() interface
    ACPI: processor: open code acpi_processor_cleanup_pdc
    ...

    Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
    ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c
    ocfs2/trivial: Use proper mask for 2 places in hearbeat.c
    Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink.
    Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode?
    ocfs2: Use FIEMAP_EXTENT_SHARED
    fiemap: Add new extent flag FIEMAP_EXTENT_SHARED
    ocfs2: replace u8 by __u8 in ocfs2_fs.h
    ocfs2: explicit declare uninitialized var in user_cluster_connect()
    ocfs2-devel: remove redundant OCFS2_MOUNT_POSIX_ACL check in ocfs2_get_acl_nolock()
    ocfs2: return -EAGAIN instead of EAGAIN in dlm
    ocfs2/cluster: Make fence method configurable - v2
    ocfs2: Set MS_POSIXACL on remount
    ocfs2: Make acl use the default
    ocfs2: Always include ACL support

    Linus Torvalds
     
  • * 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm:
    VIDEO: cyberpro: pci_request_regions needs a persistent name
    ARM: dma-isa: request cascade channel after registering it
    ARM: footbridge: trim down old ISA rtc setup
    ARM: fix PAGE_KERNEL
    ARM: Fix wrong shared bit for CPU write buffer bug test
    ARM: 5857/1: ARM: dmabounce: fix build
    ARM: 5856/1: Fix bug of uart0 platfrom data for nuc900
    ARM: 5855/1: putc support for nuc900
    ARM: 5854/1: fix compiling error for NUC900
    ARM: 5849/1: ARMv7: fix Oprofile events count
    ARM: add missing include to nwflash.c
    ARM: Kill CONFIG_CPU_32
    ARM: Convert VFP/Crunch/XscaleCP thread_release() to exit_thread()
    ARM: 5853/1: ARM: Fix build break on ARM v6 and v7

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
    edac, pci: remove pesky debug printk
    amd64_edac: restrict PCI config space access
    amd64_edac: fix forcing module load/unload
    amd64_edac: make driver loading more robust
    amd64_edac: fix driver instance freeing
    amd64_edac: fix K8 chip select reporting

    Linus Torvalds
     
  • * 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: Ensure all PG_dcache_dirty pages are written back.
    sh: mach-ecovec24: setup.c detailed correction
    serial: sh-sci: Convert tremaining ctrl_xxx I/O routines to __raw_xxx.
    serial: sh-sci: earlyprintk zero uartclk fix
    sh: Only use bl bit toggling for sleeping idle.
    sh: Restore bl bit toggling in idle loop.
    sh: Fix up MAX_DMA_CHANNELS definition when DMA is disabled.
    sh: dmaengine support for SH7785
    sh: dmaengine support for sh7724.

    Linus Torvalds
     
  • Don't pass a name pointer from the kernel stack: it will not survive
    and will result in corrupted /proc/iomem output.

    Signed-off-by: Russell King

    Russell King
     
  • We can't request the cascade channel before it's been registered, so
    move it afterwards.

    Signed-off-by: Russell King

    Russell King
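The /proc/iomem fix above follows a general rule: an API that stores the name pointer it is given, rather than copying the string, needs storage that outlives the call. A hypothetical userspace sketch of the safe pattern, keeping the name in a per-device structure; sketch_request_region() and the other names are illustrative stand-ins, not the kernel's request_region() or pci_request_regions().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical registry: much like the kernel's resource tree, it keeps
 * the caller's pointer instead of copying the name. */
struct sketch_region {
        unsigned long start, end;
        const char *name;            /* borrowed: must outlive the entry */
};

static struct sketch_region sketch_regions[8];
static size_t sketch_nregions;

static int sketch_request_region(unsigned long start, unsigned long end,
                                 const char *name)
{
        if (sketch_nregions == sizeof(sketch_regions) / sizeof(sketch_regions[0]))
                return -1;
        sketch_regions[sketch_nregions++] =
                (struct sketch_region){ start, end, name };
        return 0;
}

static const struct sketch_region *sketch_find_region(const char *name)
{
        for (size_t i = 0; i < sketch_nregions; i++)
                if (strcmp(sketch_regions[i].name, name) == 0)
                        return &sketch_regions[i];
        return NULL;
}

struct sketch_dev {
        char name[16];               /* lives as long as the device */
};

static int sketch_probe(struct sketch_dev *dev, int index)
{
        /* The buggy pattern formats the name into a local char array: the
         * stored pointer then dangles once this function returns. */
        snprintf(dev->name, sizeof(dev->name), "sketchfb%d", index);
        return sketch_request_region(0x1000, 0x1fff, dev->name);
}
```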
     

24 Dec, 2009

18 commits