21 May, 2011

1 commit

  • Commit e66eed651fd1 ("list: remove prefetching from regular list
    iterators") removed the include of prefetch.h from list.h, which
    uncovered several cases that had apparently relied on that rather
    obscure header file dependency.

    So this fixes things up a bit, using

    grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
    grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

    to guide us in finding files that either need
    inclusion, or have it despite not needing it.

    There are more of them around (mostly network drivers), but this gets
    many core ones.

    Reported-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Mar, 2010

1 commit

  • * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Mark atomic irq ops raw for 32bit legacy
    x86: Merge show_regs()
    x86: Macroise x86 cache descriptors
    x86-32: clean up rwsem inline asm statements
    x86: Merge asm/atomic_{32,64}.h
    x86: Sync asm/atomic_32.h and asm/atomic_64.h
    x86: Split atomic64_t functions into separate headers
    x86-64: Modify memcpy()/memset() alternatives mechanism
    x86-64: Modify copy_user_generic() alternatives mechanism
    x86: Lift restriction on the location of FIX_BTMAP_*
    x86, core: Optimize hweight32()

    Linus Torvalds
     

06 Jan, 2010

1 commit

  • Callers of copy_from_user() expect it to return the number of bytes
    it could not copy. In no case is it supposed to return -EFAULT.

    In case of a detected buffer overflow, just return the requested
    length. In addition, one could imagine a memset() that would clear
    the target object.

    [ hpa: code is not in .32 so not needed for -stable ]

    Signed-off-by: Heiko Carstens
    Acked-by: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Heiko Carstens
     

30 Dec, 2009

1 commit

  • In order to avoid unnecessary chains of branches, rather than
    implementing copy_user_generic() as a function consisting of
    just a single (possibly patched) branch, instead properly deal
    with patching call instructions in the alternative instructions
    framework, and move the patching into the callers.

    As a follow-on, one could also introduce something like
    __EXPORT_SYMBOL_ALT() to avoid patching call sites in modules.

    Signed-off-by: Jan Beulich
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

16 Nov, 2009

1 commit

  • On x86-64, copy_[to|from]_user() rely on assembly routines that
    never call might_fault(), so various lockdep checks are missed.

    This doesn't apply to __copy_{from,to}_user(), which handle these
    calls explicitly; nor is it a problem on x86-32, where
    copy_{to,from}_user() rely on the "__"-prefixed versions that
    also call might_fault().

    Signed-off-by: Frederic Weisbecker
    Cc: Arjan van de Ven
    Cc: Linus Torvalds
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    LKML-Reference:
    [ v2: fix module export ]
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

15 Nov, 2009

1 commit

  • This v2.6.26 commit:

    ad2fc2c: x86: fix copy_user on x86

    rendered __copy_from_user_inatomic() identical to
    copy_user_generic(), yet didn't make the former just call the
    latter from an inline function.

    Furthermore, this v2.6.19 commit:

    b885808: [PATCH] Add proper sparse __user casts to __copy_to_user_inatomic

    converted the return type of __copy_to_user_inatomic() from
    unsigned long to int, but didn't do the same to
    __copy_from_user_inatomic().

    Signed-off-by: Jan Beulich
    Cc: Linus Torvalds
    Cc: Alexander Viro
    Cc: Arjan van de Ven
    Cc: Andi Kleen
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

26 Sep, 2009

1 commit

  • gcc (4.x) supports the __builtin_object_size() builtin, which
    reports the size of the object a pointer points to, when that size
    is known at compile time. If the buffer size is not known at
    compile time, a constant -1 is returned.

    This patch uses this feature to add a sanity check to
    copy_from_user(); if the target buffer is known to be smaller than
    the copy size, the copy is aborted and a WARNing is emitted in
    memory debug mode.

    These extra checks compile away when the object size is not known,
    or if both the buffer size and the copy length are constants.

    Signed-off-by: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     

21 Jul, 2009

1 commit

  • arch/x86/include/asm/uaccess_64.h uses the wrong asm operand
    constraint ("ir") for the movq insn. Since movq sign-extends its
    immediate operand, the "er" constraint should be used instead.

    The attached patch changes all uses of __put_user_asm in
    uaccess_64.h to use "er" when the "q" insn suffix is involved.

    Patch was compile tested on x86_64 with defconfig.

    Signed-off-by: Uros Bizjak
    Signed-off-by: H. Peter Anvin
    Cc: stable@kernel.org

    Uros Bizjak
     

02 Mar, 2009

1 commit

  • Impact: standardize IO on cached ops

    On modern CPUs it is almost always a bad idea to use non-temporal
    stores, as the regression addressed in this commit has shown:

    30d697f: x86: fix performance regression in write() syscall

    The kernel simply has no good information about whether using non-temporal
    stores is a good idea or not - and trying to add heuristics only increases
    complexity and inserts fragility.

    The regression on cached write()s took very long to find - over two
    years. So don't take any chances, and let the hardware decide how
    it makes use of its caches.

    The only exception is drivers/gpu/drm/i915/i915_gem.c: there we are
    absolutely sure that another entity (the GPU) will pick up the
    dirty data immediately, and that the CPU will not touch that data
    before the GPU does.

    Also, keep the _nocache() primitives to make it easier for people to
    experiment with these details. There may be more clear-cut cases where
    non-cached copies can be used, outside of filemap.c.

    Cc: Salman Qazi
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

25 Feb, 2009

3 commits

  • Impact: make more types of copies non-temporal

    This change makes the following simple fix:

    30d697f: x86: fix performance regression in write() syscall

    a bit more sophisticated: we check the 'total' number of bytes
    written to decide whether to copy in a cached or a non-temporal
    way.

    This will for example cause the tail (modulo 4096 bytes) chunk
    of a large write() to be non-temporal too - not just the page-sized
    chunks.

    Cc: Salman Qazi
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Impact: cleanup, enable future change

    Add a 'total bytes copied' parameter to __copy_from_user_*nocache(),
    and update all the callsites.

    The parameter is not used yet - architecture code can use it to
    more intelligently decide whether the copy should be cached or
    non-temporal.

    Cc: Salman Qazi
    Cc: Nick Piggin
    Cc: Linus Torvalds
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • While the introduction of __copy_from_user_nocache() (see commit
    0812a579c92fefa57506821fa08e90f47cb6dbdd) may have been an
    improvement for sufficiently large writes, there is evidence to
    show that it is detrimental for small writes. Unixbench's fstime
    test gives the following results for 256-byte writes with
    MAX_BLOCK of 2000:

    2.6.29-rc6 ( 5 samples, each in KB/sec ):
    283750, 295200, 294500, 293000, 293300

    2.6.29-rc6 + this patch (5 samples, each in KB/sec):
    313050, 3106750, 293350, 306300, 307900

    2.6.18
    395700, 342000, 399100, 366050, 359850

    See w_test() in src/fstime.c in unixbench version 4.1.0. Basically,
    the above test consists of counting how much we can write in this
    manner:

    alarm(10);
    while (!sigalarm) {
            for (f_blocks = 0; f_blocks < 2000; ++f_blocks) {
                    write(f, buf, 256);
            }
            lseek(f, 0L, 0);
    }

    Note, there are other components to the write syscall regression
    that are not addressed here.

    Signed-off-by: Salman Qazi
    Cc: Linus Torvalds
    Signed-off-by: Ingo Molnar

    Salman Qazi
     

25 Nov, 2008

1 commit


19 Nov, 2008

1 commit


28 Oct, 2008

1 commit


23 Oct, 2008

2 commits