01 Oct, 2009

2 commits

  • Conditionally compile cmpxchg8b_emu.o and EXPORT_SYMBOL(cmpxchg8b_emu).

    This reduces the kernel size a bit.

    Signed-off-by: Eric Dumazet
    Cc: Arjan van de Ven
    Cc: Martin Schwidefsky
    Cc: John Stultz
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • cmpxchg64() today generates, to quote Linus, "barf bag" code.

    cmpxchg64() is about to get used in the scheduler to fix a bug there,
    but it's a prerequisite that cmpxchg64() first be made non-sucking.

    This patch turns cmpxchg64() into an efficient implementation that
    uses the alternative() mechanism to just use the raw instruction on
    all modern systems.

    Note: the fallback is NOT SMP-safe, just like the current fallback
    is not SMP-safe. (Interested parties with i486-based SMP systems
    are welcome to submit fix patches for that.)
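
    A minimal sketch of the mechanism (close in spirit to the patch, but
    treat the exact constraints as illustrative): alternative_io() patches
    in the raw locked instruction on CPUs that have the CX8 feature, and
    calls the emulation routine on i486:

    #define cmpxchg64(ptr, o, n)                                    \
    ({                                                              \
            __typeof__(*(ptr)) __ret;                               \
            alternative_io("call cmpxchg8b_emu",                    \
                           "lock; cmpxchg8b (%%esi)",               \
                           X86_FEATURE_CX8,                         \
                           "=A" (__ret),                            \
                           "S" ((ptr)), "0" ((u64)(o)),             \
                           "b" ((unsigned int)(n)),                 \
                           "c" ((unsigned int)((u64)(n) >> 32)));   \
            __ret;                                                  \
    })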

    Signed-off-by: Arjan van de Ven
    Acked-by: Linus Torvalds
    [ fixed asm constraint bug ]
    Fixed-by: Eric Dumazet
    Cc: Martin Schwidefsky
    Cc: John Stultz
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     

05 Sep, 2009

1 commit

  • Change msr-reg.o to obj-y (it will be included in virtually every
    kernel since it is used by the initialization code for AMD processors)
    and add a separate C file to export its symbols to modules, so that
    msr.ko can use them; on uniprocessors we bypass the helper functions
    in msr.o and use the accessor functions directly via inlines.
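
    A sketch of the export shim, a tiny C file whose only job is to make
    the msr-reg.S helpers visible to msr.ko (file name and exact symbol
    list here are assumptions for illustration):

    /* arch/x86/lib/msr-reg-export.c (name assumed) */
    #include <linux/module.h>
    #include <asm/msr.h>

    EXPORT_SYMBOL(native_rdmsr_safe_regs);
    EXPORT_SYMBOL(native_wrmsr_safe_regs);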

    Signed-off-by: H. Peter Anvin
    LKML-Reference:
    Cc: Borislav Petkov

    H. Peter Anvin
     

04 Sep, 2009

1 commit

  • The macro was defined in the 32-bit path as well - breaking the
    build on 32-bit platforms:

    arch/x86/lib/msr-reg.S: Assembler messages:
    arch/x86/lib/msr-reg.S:53: Error: Bad macro parameter list
    arch/x86/lib/msr-reg.S:100: Error: invalid character '_' in mnemonic
    arch/x86/lib/msr-reg.S:101: Error: invalid character '_' in mnemonic

    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

11 Jul, 2009

3 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cfq-iosched: reset oom_cfqq in cfq_set_request()
    block: fix sg SG_DXFER_TO_FROM_DEV regression
    block: call blk_scsi_ioctl_init()
    Fix congestion_wait() sync/async vs read/write confusion

    Linus Torvalds
     
  • Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    perf report: Add "Fractal" mode output - support callchains with relative overhead rate
    perf_counter tools: callchains: Manage the cumul hits on the fly
    perf report: Change default callchain parameters
    perf report: Use a modifiable string for default callchain options
    perf report: Warn on callchain output request from non-callchain file
    x86: atomic64: Inline atomic64_read() again
    x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
    x86: atomic64: Improve atomic64_xchg()
    x86: atomic64: Export APIs to modules
    x86: atomic64: Improve atomic64_read()
    x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
    x86: atomic64: Fix unclean type use in atomic64_xchg()
    x86: atomic64: Make atomic_read() type-safe
    x86: atomic64: Reduce size of functions
    x86: atomic64: Improve atomic64_add_return()
    x86: atomic64: Improve cmpxchg8b()
    x86: atomic64: Improve atomic64_read()
    x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
    x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
    perf report: Annotate variable initialization
    ...

    Linus Torvalds
     
  • Commit 1faa16d22877f4839bd433547d770c676d1d964c accidentally broke
    the bdi congestion wait queue logic, causing us to wait on congestion
    for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.
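
    The shape of the fix, sketched (the timeout value is illustrative):

    /* before: 1 (== WRITE) selected the sync wait queue by accident */
    congestion_wait(WRITE, HZ / 50);

    /* after: ask for the async queue explicitly */
    congestion_wait(BLK_RW_ASYNC, HZ / 50);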

    Signed-off-by: Jens Axboe

    Jens Axboe
     

04 Jul, 2009

4 commits

  • Now that atomic64_read() is lightweight (no register pressure and
    a small icache footprint), we can inline it again.

    Also use the "=&A" constraint instead of "+A" to avoid a warning
    about the uninitialized 'res' variable. (With "+A", gcc had to
    force 0 into eax/edx.)

    $ size vmlinux.prev vmlinux.after
    text data bss dec hex filename
    4908667 451676 1684868 7045211 6b805b vmlinux.prev
    4908651 451676 1684868 7045195 6b804b vmlinux.after
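
    A sketch of the re-inlined read; "=&A" marks edx:eax as an
    early-clobber output, so gcc no longer warns about 'res' being
    used uninitialized:

    static inline u64 atomic64_read(atomic64_t *ptr)
    {
            u64 res;

            /* The mov pair makes the compare value equal the store
             * value: a "successful" cmpxchg8b rewrites the same value,
             * a failed one loads the current value into edx:eax. */
            asm volatile("mov %%ebx, %%eax\n\t"
                         "mov %%ecx, %%edx\n\t"
                         LOCK_PREFIX "cmpxchg8b %1"
                         : "=&A" (res)
                         : "m" (*ptr));

            return res;
    }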

    Signed-off-by: Eric Dumazet
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    [ Also fix typo in atomic64_set() export ]
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Linus noticed that the variable name 'old_val' is
    confusingly named in these functions - the correct
    naming is 'new_val'.

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Remove the read-first logic from atomic64_xchg() and simplify
    the loop.

    This function was the last user of __atomic64_read() - remove it.

    Also, change the 'real_val' assumption from the somewhat quirky
    1ULL << 32 value to the (just as arbitrary, but simpler) value
    of 0.
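
    The simplified loop, sketched: start from an arbitrary guess (0) and
    let cmpxchg hand back the real value whenever the guess is wrong:

    u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)
    {
            u64 old_val, real_val = 0;

            do {
                    old_val = real_val;
                    real_val = atomic64_cmpxchg(ptr, old_val, new_val);
            } while (real_val != old_val);

            return old_val;
    }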

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • atomic64_t primitives are used by a handful of drivers,
    so export the APIs consistently. These were inlined
    before.

    Also mark atomic64_32.o a core object, so that the symbols
    are available even if not linked to core kernel pieces.

    Cc: Eric Dumazet
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

03 Jul, 2009

8 commits

  • Optimize atomic64_read() as a special open-coded
    cmpxchg8b variant. This generates nicer code:

    arch/x86/lib/atomic64_32.o:

    text data bss dec hex filename
    435 0 0 435 1b3 atomic64_32.o.before
    431 0 0 431 1af atomic64_32.o.after

    md5:
    bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.before.asm
    2bdfd4bd1f6b7b61b7fc127aef90ce3b atomic64_32.o.after.asm

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • While examining symbol generation in perf_counter tools, I
    noticed that copy_to_user() had no size in vmlinux's symtab.

    Signed-off-by: Mike Galbraith
    Acked-by: Alexander van Heukelum
    Acked-by: Cyrill Gorcunov
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Linus noticed that atomic64_xchg() uses atomic_read(), which
    happens to work because atomic_read() is a macro so the
    .counter value gets u64-read on 32-bit too - but this is really
    bogus and serious bugs are waiting to happen.

    Fix atomic64_xchg() to use __atomic64_read() instead.

    No code changed:

    arch/x86/lib/atomic64_32.o:

    text data bss dec hex filename
    435 0 0 435 1b3 atomic64_32.o.before
    435 0 0 435 1b3 atomic64_32.o.after

    md5:
    bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.before.asm
    bd8ab95e69c93518578bfaf0ea3be4d9 atomic64_32.o.after.asm

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • cmpxchg8b is a huge instruction in terms of register footprint;
    we almost never want to inline it, not even within the same
    code module.

    GCC 4.3 still messes up for two functions, underestimating the
    true cost of this instruction - so annotate two key functions
    to reduce the bloat:

    arch/x86/lib/atomic64_32.o:

    text data bss dec hex filename
    1763 0 0 1763 6e3 atomic64_32.o.before
    435 0 0 435 1b3 atomic64_32.o.after
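
    The annotation itself is a one-word attribute; a sketch (the log does
    not name the two functions, so the target here is illustrative):

    /* keep the register-hungry helper out of line regardless of
     * gcc's inlining heuristics */
    static noinline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new);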

    Cc: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Linus noted (based on Eric Dumazet's numbers) that we would
    probably be better off not trying an atomic_read() in
    atomic64_add_return(), but instead intentionally letting the first
    cmpxchg8b fail - to get a cache-friendly 'give me ownership
    of this cacheline' transaction. That can then be followed
    by the real cmpxchg8b, which sets the value local to the CPU.
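
    Sketched per that description: real_val = 0 makes the first cmpxchg
    fail almost always, but the failed attempt already acquires the
    cacheline for ownership:

    u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
    {
            u64 old_val, new_val, real_val = 0;

            do {
                    old_val = real_val;
                    new_val = old_val + delta;
                    real_val = atomic64_cmpxchg(ptr, old_val, new_val);
            } while (real_val != old_val);

            return new_val;
    }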

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Rewrite cmpxchg8b() to not use %edi register but a generic "+m"
    constraint, to increase compiler freedom in code generation and
    possibly better code.
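
    The rewritten helper, sketched: "+A" keeps the old value in edx:eax,
    and "+m" lets gcc pick any addressing mode for the target instead of
    pinning the pointer into %edi:

    static u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
    {
            asm volatile(LOCK_PREFIX "cmpxchg8b %1"
                         : "+A" (old), "+m" (*ptr)
                         : "b" ((u32)new), "c" ((u32)(new >> 32)));
            return old;
    }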

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Linus noticed that the 32-bit version of atomic64_read() was
    overly complex, re-reading the value and retrying in a loop.

    Instead we can just rely on cmpxchg8b returning either the new
    value or returning the current value.

    We can use any 'old' value, which will be faster as it can be
    loaded via immediates. Picking a value that is unlikely to equal
    the real value in memory makes the instruction faster.

    This also has the advantage that the CPU could avoid dirtying
    the cacheline.
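
    A sketch of the resulting read, using a cmpxchg8b() helper like the
    one in the previous entry; when the compare fails, the instruction
    loads the current value into edx:eax without storing anything:

    u64 atomic64_read(atomic64_t *ptr)
    {
            u64 old = 1LL << 32;    /* cheap immediate, unlikely to match */

            return cmpxchg8b(ptr, old, old);
    }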

    Reported-by: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Linus noted that the atomic64_t primitives are currently all
    inlines, which is crazy because these functions have a large
    register footprint anyway.

    Move them to a separate file: arch/x86/lib/atomic64_32.c

    Also, while at it, rename all uses of 'unsigned long long' to
    the much shorter u64.

    This makes the appearance of the prototypes a lot nicer - and
    it also uncovered a few bugs where (yet unused) API variants
    had 'long' as their return type instead of u64.

    [ More intrusive changes are not yet done in this patch. ]

    Reported-by: Linus Torvalds
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: David Howells
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

26 Jun, 2009

1 commit

  • delay_tsc() needs rdtsc_barrier() to provide a proper delay.

    Output from a test driver that uses the HPET to cross-check the
    delay provided by udelay():

    Before:
    [ 86.794363] Expected delay 5us actual 4679ns
    [ 87.154362] Expected delay 5us actual 698ns
    [ 87.514162] Expected delay 5us actual 4539ns
    [ 88.653716] Expected delay 5us actual 4539ns
    [ 94.664106] Expected delay 10us actual 9638ns
    [ 95.049351] Expected delay 10us actual 10126ns
    [ 95.416110] Expected delay 10us actual 9568ns
    [ 95.799216] Expected delay 10us actual 9638ns
    [ 103.624104] Expected delay 10us actual 9707ns
    [ 104.020619] Expected delay 10us actual 768ns
    [ 104.419951] Expected delay 10us actual 9707ns

    After:
    [ 50.983320] Expected delay 5us actual 5587ns
    [ 51.261807] Expected delay 5us actual 5587ns
    [ 51.565715] Expected delay 5us actual 5657ns
    [ 51.861171] Expected delay 5us actual 5587ns
    [ 52.164704] Expected delay 5us actual 5726ns
    [ 52.487457] Expected delay 5us actual 5657ns
    [ 52.789338] Expected delay 5us actual 5726ns
    [ 57.119680] Expected delay 10us actual 10755ns
    [ 57.893997] Expected delay 10us actual 10615ns
    [ 58.261287] Expected delay 10us actual 10755ns
    [ 58.620505] Expected delay 10us actual 10825ns
    [ 58.941035] Expected delay 10us actual 10755ns
    [ 59.320903] Expected delay 10us actual 10615ns
    [ 61.306311] Expected delay 10us actual 10755ns
    [ 61.520542] Expected delay 10us actual 10615ns
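
    A sketch of the barrier placement (loop structure simplified):
    without rdtsc_barrier(), the CPU may execute rdtsc speculatively and
    too early, which shows up above as e.g. 698ns instead of ~5us:

    static void delay_tsc(unsigned long loops)
    {
            unsigned long long bclock, now;

            rdtsc_barrier();
            rdtscll(bclock);
            do {
                    rep_nop();              /* pause: be nice to siblings */
                    rdtsc_barrier();
                    rdtscll(now);
            } while ((now - bclock) < loops);
    }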

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: H. Peter Anvin

    Pallipadi, Venkatesh
     

21 Jun, 2009

1 commit

  • The discussion about using "access_ok()" in get_user_pages_fast() (see
    commit 7f8189068726492950bf1a2dcfd9b51314560abf: "x86: don't use
    'access_ok()' as a range check in get_user_pages_fast()" for details and
    end result), made us notice that x86-64 was really being very sloppy
    about virtual address checking.

    So be way more careful and straightforward about masking x86-64 virtual
    addresses:

    - All the VIRTUAL_MASK* variants now cover half of the address
    space; it's not like we can use the full mask on a signed
    integer, and the larger mask just invites mistakes when
    applying it to either half of the 48-bit address space.

    - /proc/kcore's kc_offset_to_vaddr() becomes a lot more
    obvious when it transforms a file offset into a
    (kernel-half) virtual address.

    - Unify/simplify the 32-bit and 64-bit USER_DS definition to
    be based on TASK_SIZE_MAX.

    This cleanup and more careful/obvious user virtual address checking also
    uncovered a buglet in the x86-64 implementation of strnlen_user(): it
    would do an "access_ok()" check on the whole potential area, even if the
    string itself was much shorter, and thus return an error even for valid
    strings. Our sloppy checking had hidden this.

    So this fixes 'strnlen_user()' to do this properly, the same way we
    already handled user strings in 'strncpy_from_user()'. Namely by just
    checking the first byte, and then relying on fault handling for the
    rest. That always works, since we impose a guard page that cannot be
    mapped at the end of the user space address space (and even if we
    didn't, we'd have the address space hole).
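
    A sketch of that approach (not the verbatim patch): validate only
    the first byte and let the guard page terminate the walk:

    long strnlen_user(const char __user *str, long n)
    {
            /* if even the first byte is out of range, fail early */
            if (!access_ok(VERIFY_READ, str, 1))
                    return 0;
            /* the rest relies on fault handling / exception fixups */
            return __strnlen_user(str, n);
    }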

    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Alan Cox
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Jun, 2009

2 commits

  • Provide for concurrent MSR writes on all the CPUs in the cpumask. Also,
    add a temporary workaround for smp_call_function_many(), which skips the
    CPU we're executing on.

    Bart: zero out rv struct which is allocated on stack.
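
    A sketch of the workaround (helper name illustrative):
    smp_call_function_many() does not run the function on the calling
    CPU, so that CPU is handled by hand when it is in the mask:

    static void __rwmsr_on_cpus(const struct cpumask *mask,
                                void (*msr_func)(void *info), void *info)
    {
            int this_cpu = get_cpu();       /* disable preemption */

            if (cpumask_test_cpu(this_cpu, mask))
                    msr_func(info);         /* run it locally ourselves */
            smp_call_function_many(mask, msr_func, info, 1);
            put_cpu();
    }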

    CC: H. Peter Anvin
    Signed-off-by: Borislav Petkov
    Signed-off-by: Bartlomiej Zolnierkiewicz

    Borislav Petkov
     
  • Add a struct representing a 64-bit MSR pair consisting of low and high
    register halves, and convert msr_info to use it. Also, rename msr-on-cpu.c
    to msr.c.

    Side note: Put the cpumask.h include in __KERNEL__ space, thus fixing an
    allmodconfig build failure in the headers_check target.
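
    A sketch of such a pair (field names are assumptions for
    illustration):

    struct msr {
            union {
                    struct {
                            u32 l;  /* low register half */
                            u32 h;  /* high register half */
                    };
                    u64 q;          /* the full 64-bit value */
            };
    };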

    CC: H. Peter Anvin
    Signed-off-by: Borislav Petkov

    Borislav Petkov
     

21 Jan, 2009

1 commit

  • Impact: fix rare (but currently harmless) miscompile with certain configs and gcc versions

    Hugh Dickins noticed that strncpy_from_user() was miscompiled
    in some circumstances with gcc 4.3.

    Thanks to Hugh's excellent analysis it was easy to track down.

    Hugh writes:

    > Try building an x86_64 defconfig 2.6.29-rc1 kernel tree,
    > except not quite defconfig, switch CONFIG_PREEMPT_NONE=y
    > and CONFIG_PREEMPT_VOLUNTARY off (because it expands a
    > might_fault() there, which hides the issue): using a
    > gcc 4.3.2 (I've checked both openSUSE 11.1 and Fedora 10).
    >
    > It generates the following:
    >
    > 0000000000000000 <strncpy_from_user>:
    > 0: 48 89 d1 mov %rdx,%rcx
    > 3: 48 85 c9 test %rcx,%rcx
    > 6: 74 0e je 16
    > 8: ac lods %ds:(%rsi),%al
    > 9: aa stos %al,%es:(%rdi)
    > a: 84 c0 test %al,%al
    > c: 74 05 je 13
    > e: 48 ff c9 dec %rcx
    > 11: 75 f5 jne 8
    > 13: 48 29 c9 sub %rcx,%rcx
    > 16: 48 89 c8 mov %rcx,%rax
    > 19: c3 retq
    >
    > Observe that "sub %rcx,%rcx; mov %rcx,%rax", whereas gcc 4.2.1
    > (and many other configs) say "sub %rcx,%rdx; mov %rdx,%rax".
    > Isn't it returning 0 when it ought to be returning strlen?

    The asm constraints for the strncpy_from_user() result were missing an
    early clobber, which tells gcc that the last output arguments
    are written before all input arguments are read.

    Also add more early clobbers in the rest of the file and fix 32-bit
    usercopy.c in the same way.
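
    A minimal illustration of why the '&' matters (demo code, not the
    kernel's): this asm writes %0 before reading %2, so without the
    early clobber gcc could allocate %0 and %2 to the same register and
    feed the asm a clobbered input:

    static inline long add_demo(long a, long b)
    {
            long out;

            asm("mov %1, %0\n\t"    /* writes out early... */
                "add %2, %0"        /* ...then still reads b */
                : "=&r" (out)       /* '&' = early clobber */
                : "r" (a), "r" (b));
            return out;
    }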

    Signed-off-by: Andi Kleen
    Signed-off-by: H. Peter Anvin
    [ since this API is rarely used and no in-kernel user relies on a 'len'
    return value (they only rely on negative return values), this miscompile
    was never noticed in the field. But it's worth fixing nevertheless. ]
    Signed-off-by: Ingo Molnar

    Andi Kleen
     

10 Sep, 2008

1 commit

  • copy_to/from_user and all its variants (except the atomic ones) can take a
    page fault and perform non-trivial work like taking mmap_sem and entering
    the filesystem/pagecache.

    Unfortunately, this often escapes lockdep because a common pattern is to
    use it to read in some arguments just set up from userspace, or to write
    data back to a hot buffer. In those cases, page reclaim is unlikely to
    get a window in which to make copy_*_user fault.

    With the new might_lock primitives, add some annotations to x86. I don't
    know if I caught all possible faulting points (it's a bit of a maze, and I
    didn't really look at 32-bit). But this is a starting point.

    Boots and runs OK so far.
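
    The shape of such an annotation, sketched (helper name illustrative):

    static inline unsigned long
    copy_from_user(void *to, const void __user *from, unsigned long n)
    {
            might_fault();  /* lockdep: may fault, take mmap_sem, sleep */
            return __arch_copy_from_user(to, from, n);
    }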

    Signed-off-by: Nick Piggin
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Nick Piggin
     

04 Sep, 2008

1 commit

  • Impact: performance optimization

    I did some re-benchmarking with modern compilers, and dropping
    -funroll-loops makes the function consistently run a few percent
    faster. So drop that flag.

    Thanks to Richard Guenther for a hint.

    Signed-off-by: Andi Kleen
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     
