26 Jun, 2006

8 commits

  • I ran into a bug where the kernel died in the idr code:

    cpu 0x1d: Vector: 300 (Data Access) at [c000000b7096f710]
    pc: c0000000001f8984: .idr_get_new_above_int+0x140/0x330
    lr: c0000000001f89b4: .idr_get_new_above_int+0x170/0x330
    sp: c000000b7096f990
    msr: 800000000000b032
    dar: 0
    dsisr: 40010000
    current = 0xc000000b70d43830
    paca = 0xc000000000556900
    pid = 2022, comm = hwup
    1d:mon> t
    [c000000b7096f990] c0000000000d2ad8 .expand_files+0x2e8/0x364 (unreliable)
    [c000000b7096faa0] c0000000001f8bf8 .idr_get_new_above+0x18/0x68
    [c000000b7096fb20] c00000000002a054 .init_new_context+0x5c/0xf0
    [c000000b7096fbc0] c000000000049dc8 .copy_process+0x91c/0x1404
    [c000000b7096fcd0] c00000000004a988 .do_fork+0xd8/0x224
    [c000000b7096fdc0] c00000000000ebdc .sys_clone+0x5c/0x74
    [c000000b7096fe30] c000000000008950 .ppc_clone+0x8/0xc

    Sonny Rao
     
  • Implement kasprintf, a kernel version of asprintf. This allocates the
    memory required for the formatted string, including the trailing '\0'.
    Returns NULL on allocation failure.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
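
The measure-then-allocate idea behind an asprintf-style helper can be sketched in userspace C. This is only an illustration, not the kernel's kasprintf: the name my_asprintf is hypothetical, and malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of kasprintf(); malloc() replaces kmalloc(). */
char *my_asprintf(const char *fmt, ...)
{
	va_list ap, aq;
	int len;
	char *p;

	assert(fmt != NULL);

	va_start(ap, fmt);
	va_copy(aq, ap);

	/* First pass: measure only, nothing is written. */
	len = vsnprintf(NULL, 0, fmt, aq);
	va_end(aq);
	if (len < 0) {
		va_end(ap);
		return NULL;
	}

	/* Room for the formatted text plus the trailing '\0'. */
	p = malloc((size_t)len + 1);
	if (p != NULL)
		vsnprintf(p, (size_t)len + 1, fmt, ap);
	va_end(ap);

	return p;	/* NULL on allocation failure */
}
```

The caller owns the returned string and releases it with free().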
     
  • This change allows callers to use a 0-byte buffer and a NULL buffer pointer
    with vsnprintf, so it can be used to determine how large the resulting
    formatted string will be.

    Previously the code effectively treated a size of 0 as a size of 4G (on
    32-bit systems), with other checks preventing it from actually trying to
    emit the string - but the terminal \0 would still be written, which would
    crash if the buffer is NULL.

    This change changes the boundary check so that 'end' points to the putative
    location of the terminal '\0', which is only written if size > 0.

    vsnprintf still allows the buffer size to be set very large, to allow
    unbounded buffer sizes (to implement sprintf, etc).

    [akpm@osdl.org: fix long-vs-longlong confusion]
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
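
The boundary rule described above (count every character, store only what fits, write the terminal '\0' only if size > 0) can be illustrated with a much smaller routine. This is a sketch under assumptions, not the vsnprintf code itself; bounded_copy is a hypothetical helper that copies a plain string instead of formatting one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the kernel's vsnprintf(): copies src into buf
 * and returns the length the full output would have had.
 */
size_t bounded_copy(char *buf, size_t size, const char *src)
{
	size_t n = 0;

	assert(src != NULL);

	for (; src[n] != '\0'; n++) {
		/* Store only while a slot remains for the terminator. */
		if (size > 0 && n < size - 1)
			buf[n] = src[n];
	}

	/* The terminal '\0' is written only if size > 0. */
	if (size > 0)
		buf[n < size - 1 ? n : size - 1] = '\0';

	return n;
}
```

With size == 0 the buffer is never dereferenced, so a NULL pointer is acceptable and the return value tells the caller how much space to allocate.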
     
  • Fix kernel-doc formatting in Reed-Solomon code.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Make kernel-doc corrections & additions to lib/crc*.c. Add crc functions to
    kernel-api.tmpl in DocBook.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Make corrections/fixes to kernel-doc in lib/bitmap.c and include it in DocBook
    template.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • In radix_tree_tag_get(), return normalized value of 0/1, as indicated
    by its comment.

    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • constify a medium-large CRC code table.

    Signed-off-by: Andreas Mohr
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Mohr
     

23 Jun, 2006

8 commits

  • Add a new strstrip() function to lib/string.c for removing leading and
    trailing whitespace from a string.

    Cc: Michael Holzheu
    Acked-by: Ingo Oeser
    Acked-by: Joern Engel
    Cc: Corey Minyard
    Signed-off-by: Pekka Enberg
    Acked-by: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
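
For readers unfamiliar with the idiom, an in-place whitespace strip might look roughly like the following userspace sketch. The name my_strstrip is hypothetical and the body is not taken from lib/string.c.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical userspace sketch of an in-place strip. */
char *my_strstrip(char *s)
{
	char *end;

	assert(s != NULL);

	/* Skip leading whitespace by advancing the start pointer. */
	while (isspace((unsigned char)*s))
		s++;
	if (*s == '\0')
		return s;

	/* Cut trailing whitespace by moving the terminator left. */
	end = s + strlen(s) - 1;
	while (end > s && isspace((unsigned char)*end))
		end--;
	end[1] = '\0';

	return s;
}
```

The returned pointer may differ from the one passed in, so a caller that allocated the buffer has to keep the original pointer for freeing.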
     
  • The percpu counter data type is changed in this set of patches to support
    more users like ext3 who need more than 32 bits to store the free blocks
    total in the filesystem.

    - Generic percpu counter data type changes. The sizes of the global counter
    and local counter are explicitly specified using s64 and s32. The global
    counter is changed from long to s64, while the local counter is changed from
    long to s32, so we can avoid doing 64-bit updates in most cases.

    - Users of the percpu counters are updated to make use of the new
    percpu_counter_init() routine now taking an additional parameter to allow
    users to pass the initial value of the global counter.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Move percpu_counter routines from mm/swap.c to lib/percpu_counter.c

    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
  • The comment states: 'Setting a tag on a not-present item is a BUG.' Hence
    if 'index' is larger than the maxindex, the item _cannot_ be present; it
    should also be a BUG.

    Also, this allows the following statement (assume a fresh tree):

    radix_tree_tag_set(root, 16, 1);

    to fail silently, but when preceded by:

    radix_tree_insert(root, 32, item);

    it would BUG, because the height has been extended by the insert.

    In neither case was 16 present.

    Signed-off-by: Peter Zijlstra
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Reduce radix tree node memory usage by about a factor of 4 for small files
    (< 64K). There are pointer traversal and memory usage costs for large
    files with dense pagecache.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The ability to have height 0 radix trees (a direct pointer to the data item
    rather than going through a full node->slot) quietly disappeared with
    old-2.6-bkcvs commit ffee171812d51652f9ba284302d9e5c5cc14bdfd. On 64-bit
    machines this causes nearly 600 bytes to be used for every gfp_mask bits.

    Simplify radix_tree_delete's complex tag clearing arrangement (which would
    become even more complex) by just falling back to tag clearing functions
    (the pagecache radix-tree never uses this path anyway, so the icache
    savings will mean it's actually a speedup).

    On my 4GB G5, this saves 8MB RAM per kernel source+object tree in
    pagecache.

    Pagecache lookup, insertion, and removal speed for small files will also be
    improved.

    This makes RCU radix tree harder, but it's worth it.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Modify the gen_pool allocator (lib/genalloc.c) to utilize a bitmap scheme
    instead of the buddy scheme. The purpose of this change is to eliminate
    the touching of the actual memory being allocated.

    Since the change modifies the interface, a change to the uncached allocator
    (arch/ia64/kernel/uncached.c) is also required.

    Both Andrey Volkov and Jes Sorensen have expressed a desire that the
    gen_pool allocator not write to the memory being managed. See the
    following:

    http://marc.theaimsgroup.com/?l=linux-kernel&m=113518602713125&w=2
    http://marc.theaimsgroup.com/?l=linux-kernel&m=113533568827916&w=2

    Signed-off-by: Dean Nelson
    Cc: Andrey Volkov
    Acked-by: Jes Sorensen
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
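
To show why a bitmap scheme never has to touch the managed memory, here is a deliberately tiny first-fit sketch in which all bookkeeping lives in a separate word. It is not the lib/genalloc.c interface: pool_sketch, pool_alloc and pool_free are hypothetical names, the pool is limited to 64 units, and a return value of 0 is assumed to mean failure (so the base address must be nonzero).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BITS 64	/* hypothetical: one 64-bit word of bookkeeping */

struct pool_sketch {
	uintptr_t base;	/* start address of the managed region */
	size_t    unit;	/* bytes represented by each bit */
	uint64_t  used;	/* bit i set => unit i is allocated */
};

/* First-fit search over the bitmap; returns 0 on failure. */
uintptr_t pool_alloc(struct pool_sketch *p, size_t size)
{
	size_t n, i, j, run = 0;

	assert(p != NULL && p->unit > 0 && size > 0);
	n = (size + p->unit - 1) / p->unit;	/* units needed, rounded up */
	if (n > POOL_BITS)
		return 0;

	for (i = 0; i < POOL_BITS; i++) {
		run = ((p->used >> i) & 1) ? 0 : run + 1;
		if (run == n) {
			size_t start = i + 1 - n;

			for (j = start; j <= i; j++)
				p->used |= (uint64_t)1 << j;
			/* Only an address is computed; nothing is stored there. */
			return p->base + start * p->unit;
		}
	}
	return 0;
}

/* The caller passes the size back, since the bitmap does not record it. */
void pool_free(struct pool_sketch *p, uintptr_t addr, size_t size)
{
	size_t n, start, j;

	assert(p != NULL && addr >= p->base);
	n = (size + p->unit - 1) / p->unit;
	start = (addr - p->base) / p->unit;
	for (j = start; j < start + n && j < POOL_BITS; j++)
		p->used &= ~((uint64_t)1 << j);
}
```

Because the region is only ever described by addresses, the same code works for memory that must not be written by the allocator, such as uncached or device memory.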
     
  • Upgrade the zlib_inflate implementation in the kernel from a patched
    version 1.1.3/4 to a patched 1.2.3.

    The code in the kernel is about seven years old and I noticed that the
    external zlib library's inflate performance was significantly faster (~50%)
    than the code in the kernel on ARM (and faster again on x86_32).

    For comparison the newer deflate code is 20% slower on ARM and 50% slower
    on x86_32 but gives an approx 1% compression ratio improvement. I don't
    consider this to be an improvement for kernel use so have no plans to
    change the zlib_deflate code.

    Various changes have been made to the zlib code in the kernel, the most
    significant being the extra functions/flush option used by ppp_deflate.
    This update reimplements the features PPP needs to ensure it continues to
    work.

    This code has been tested on ARM under both JFFS2 (with zlib compression
    enabled) and ppp_deflate and on x86_32. JFFS2 sees an approx. 10% real
    world file read speed improvement.

    This patch also removes ZLIB_VERSION as it no longer has a correct value.
    We don't need version checks anyway as the kernel's module handling will
    take care of that for us. This removal is also more in keeping with the
    zlib author's wishes (http://www.zlib.net/zlib_faq.html#faq24) and I've
    added something to the zlib.h header to note it's a modified version.

    Signed-off-by: Richard Purdie
    Acked-by: Joern Engel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Purdie
     

22 Jun, 2006

1 commit


21 Jun, 2006

2 commits

  • Introduce __iowrite64_copy. It will be used by the Myri-10G Ethernet
    driver to post requests to the NIC. This driver will be submitted soon.

    __iowrite64_copy copies to I/O memory in units of 64 bits when possible (on
    64 bit architectures). It reverts to __iowrite32_copy on 32 bit
    architectures.

    Signed-off-by: Brice Goglin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brice Goglin
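
The shape of such a copy loop can be sketched in userspace, with ordinary memory standing in for I/O memory and a volatile store standing in for the kernel's I/O accessors. copy64_sketch is a hypothetical name; note that count is in units of 64 bits, not bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: plain memory plays the role of I/O memory. */
void copy64_sketch(volatile uint64_t *to, const uint64_t *from, size_t count)
{
	const uint64_t *end;

	assert(to != NULL && from != NULL);
	end = from + count;
	while (from < end)
		*to++ = *from++;	/* one 64-bit store per element */
}
```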
     
  • * git://git.infradead.org/~dwmw2/rbtree-2.6:
    [RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency
    Update UML kernel/physmem.c to use rb_parent() accessor macro
    [RBTREE] Update hrtimers to use rb_parent() accessor macro.
    [RBTREE] Add explicit alignment to sizeof(long) for struct rb_node.
    [RBTREE] Merge colour and parent fields of struct rb_node.
    [RBTREE] Remove dead code in rb_erase()
    [RBTREE] Update JFFS2 to use rb_parent() accessor macro.
    [RBTREE] Update eventpoll.c to use rb_parent() accessor macro.
    [RBTREE] Update key.c to use rb_parent() accessor macro.
    [RBTREE] Update ext3 to use rb_parent() accessor macro.
    [RBTREE] Change rbtree off-tree marking in I/O schedulers.
    [RBTREE] Add accessor macros for colour and parent fields of rb_node

    Linus Torvalds
     

06 Jun, 2006

1 commit


22 May, 2006

1 commit

  • People don't like released kernels yelling at them, no matter how real the
    error might be. So only report it if CONFIG_KOBJECT_DEBUG is enabled.

    Sent on request of Andrew Morton.

    (akpm: should bring this back post-2.6.17)

    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Kroah-Hartman
     

13 May, 2006

1 commit


28 Apr, 2006

2 commits

  • This patch contains the following possible cleanups:
    - #if 0 the following unused global function:
      - subsys_remove_file()
    - remove the following unused EXPORT_SYMBOL's:
      - kset_find_obj
      - subsystem_init
    - remove the following unused EXPORT_SYMBOL_GPL:
      - kobject_add_dir

    Signed-off-by: Adrian Bunk
    Signed-off-by: Greg Kroah-Hartman

    Adrian Bunk
     
  • This fixes a build error for various odd combinations of CONFIG_HOTPLUG
    and CONFIG_NET.

    Signed-off-by: Kay Sievers
    Cc: Nigel Cunningham
    Cc: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

21 Apr, 2006

2 commits

  • We only used a single bit for colour information, so having a whole
    machine word of space allocated for it was a bit wasteful. Instead,
    store it in the lowest bit of the 'parent' pointer, since that was
    always going to be aligned anyway.

    Signed-off-by: David Woodhouse

    David Woodhouse
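
The pointer-tagging trick can be sketched like this. All names here (node_sketch, sk_parent and so on) are hypothetical and chosen for illustration; the only assumption carried over from the text is that node addresses are aligned, so bit 0 of the parent pointer is always free.

```c
#include <assert.h>
#include <stdint.h>

#define SK_RED   0
#define SK_BLACK 1

struct node_sketch {
	uintptr_t parent_colour;	/* parent pointer, colour in bit 0 */
	struct node_sketch *left, *right;
};

/* Mask off the colour bit to recover the parent pointer. */
struct node_sketch *sk_parent(const struct node_sketch *n)
{
	return (struct node_sketch *)(n->parent_colour & ~(uintptr_t)1);
}

int sk_colour(const struct node_sketch *n)
{
	return (int)(n->parent_colour & 1);
}

/* Replace the pointer bits, keep the colour bit. */
void sk_set_parent(struct node_sketch *n, struct node_sketch *p)
{
	assert(((uintptr_t)p & 1) == 0);	/* relies on alignment */
	n->parent_colour = (n->parent_colour & 1) | (uintptr_t)p;
}

/* Replace the colour bit, keep the pointer bits. */
void sk_set_colour(struct node_sketch *n, int colour)
{
	n->parent_colour = (n->parent_colour & ~(uintptr_t)1) |
			   (uintptr_t)(colour & 1);
}
```

Going through accessors like these is what lets callers stay unaware of how the two fields are packed.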
     
  • Observe rb_erase(), when the victim node 'old' has two children so
    neither of the simple cases at the beginning are taken.

    Observe that it effectively does an 'rb_next()' operation to find the
    next (by value) node in the tree. That is; we go to the victim's
    right-hand child and then follow left-hand pointers all the way
    down the tree as far as we can until we find the next node 'node'. We
    end up with 'node' being either the same immediate right-hand child of
    'old', or one of its descendants on the far left-hand side.

    For a start, we _know_ that 'node' has a parent. We can drop that check.

    We also know that if 'node's parent is 'old', then 'node' is the
    right-hand child of its parent. And that if 'node's parent is _not_
    'old', then 'node' is the left-hand child of its parent.

    So instead of checking for 'node->rb_parent == old' in one place and
    also checking 'node's heritage separately when we're trying to change
    its link from its parent, we can shuffle things around a bit and do
    it like this...

    Signed-off-by: David Woodhouse

    David Woodhouse
     

20 Apr, 2006

1 commit

  • The DEBUG_MUTEX flag is on by default in the current kernel configuration.

    During performance testing, we saw that mutex debug functions like
    mutex_debug_check_no_locks_freed (called by kfree()) are expensive, as they
    go through a global list of memory areas with mutex lock and do the
    checking. For benchmarks such as Volanomark and Hackbench, we have seen
    more than a 40% drop in performance on some platforms. We suggest setting
    DEBUG_MUTEX off by default, or at least doing that later when we feel that
    the mutex changes in the current code have stabilized.

    Signed-off-by: Tim Chen
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Chen
     

15 Apr, 2006

1 commit

  • It works like this:
      Open the file
      Read all the contents.
      Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
      When poll returns,
         close the file and go to top of loop.
       or lseek to start of file and go back to the 'read'.

    Events are signaled by an object manager calling
    sysfs_notify(kobj, dir, attr);

    If the dir is non-NULL, it is used to find a subdirectory which
    contains the attribute (presumably created by sysfs_create_group).

    This has a cost of one int per attribute, one wait_queuehead per kobject,
    one int per open file.

    The name "sysfs_notify" may be confused with the inotify
    functionality. Maybe it would be nice to support inotify for sysfs
    attributes as well?

    This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
    to be pollable

    Signed-off-by: Neil Brown
    Signed-off-by: Greg Kroah-Hartman

    NeilBrown
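
From userspace, the open/read/poll/lseek sequence above might be written roughly as follows. This is a sketch under assumptions: wait_and_reread is a hypothetical helper, a single read() is assumed to be enough for a short attribute, and the lseek variant of the loop is the one shown.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Reads the attribute at path, waits for a notification, then re-reads it.
 * Returns the byte count of the second read, or -1 on error or timeout.
 */
ssize_t wait_and_reread(const char *path, char *buf, size_t len,
			int timeout_ms)
{
	struct pollfd pfd;
	ssize_t n = -1;

	assert(path != NULL && buf != NULL && len > 0);

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;
	pfd.events = POLLERR | POLLPRI;
	pfd.revents = 0;

	/* Read the current contents, block in poll, then rewind and re-read. */
	if (read(pfd.fd, buf, len) >= 0 &&
	    poll(&pfd, 1, timeout_ms) > 0 &&
	    lseek(pfd.fd, 0, SEEK_SET) == 0)
		n = read(pfd.fd, buf, len);

	close(pfd.fd);
	return n;
}
```

A caller could point this at /sys/block/md*/md/sync_action, the attribute the patch makes pollable, with a negative timeout to wait indefinitely.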
     

11 Apr, 2006

3 commits

  • lib/string.c: In function 'memcpy':
    lib/string.c:470: warning: initialization discards qualifiers from pointer
    target type

    Signed-off-by: Jan-Benedict Glaw
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan-Benedict Glaw
     
  • Some string functions were safely overrideable in lib/string.c, but their
    corresponding declarations in linux/string.h were not. Correct this, and
    make strcspn overrideable.

    Odds of someone wanting to do optimized assembly of these are small, but
    for the sake of cleanliness, might as well bring them into line with the
    rest of the file.

    Signed-off-by: Kyle McMartin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyle McMartin
     
  • While cleaning up parisc_ksyms.c earlier, I noticed that strpbrk wasn't
    being exported from lib/string.c. Investigating further, I noticed a
    changeset that removed its export and added it to _ksyms.c on a few more
    architectures. The justification was that "other arches do it."

    I think this is wrong, since no architecture currently defines
    __HAVE_ARCH_STRPBRK, there's no reason for any of them to be exporting it
    themselves. Therefore, consolidate the export to lib/string.c.

    Signed-off-by: Kyle McMartin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyle McMartin
     

31 Mar, 2006

1 commit


27 Mar, 2006

6 commits

  • We use it generally now, at least blktrace isn't a specific debug
    kernel feature.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • wrote:

    This is an extremely well-known technique. You can see a similar version that
    uses a multiply for the last few steps at
    http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel which
    refers to "Software Optimization Guide for AMD Athlon 64 and Opteron
    Processors"
    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF

    It's section 8.6, "Efficient Implementation of Population-Count Function in
    32-bit Mode", pages 179-180.

    It uses the name that I am more familiar with, "popcount" (population count),
    although "Hamming weight" also makes sense.

    Anyway, the proof of correctness proceeds as follows:

    b = a - ((a >> 1) & 0x55555555);
    c = (b & 0x33333333) + ((b >> 2) & 0x33333333);
    d = (c + (c >> 4)) & 0x0f0f0f0f;
    #if SLOW_MULTIPLY
    e = d + (d >> 8);
    f = e + (e >> 16);
    return f & 63;
    #else
    /* Useful if multiply takes at most 4 cycles */
    return (d * 0x01010101) >> 24;
    #endif

    The input value a can be thought of as 32 1-bit fields each holding their own
    hamming weight. Now look at it as 16 2-bit fields. Each 2-bit field a1..a0
    has the value 2*a1 + a0. This can be converted into the hamming weight of the
    2-bit field a1+a0 by subtracting a1.

    That's what the (a >> 1) & mask subtraction does. Since there can be no
    borrows, you can just do it all at once.

    Enumerating the 4 possible cases:

    0b00 = 0 -> 0 - 0 = 0
    0b01 = 1 -> 1 - 0 = 1
    0b10 = 2 -> 2 - 1 = 1
    0b11 = 3 -> 3 - 1 = 2

    The next step consists of breaking up b (made of 16 2-bit fields) into
    even and odd halves and adding them into 4-bit fields. Since the largest
    possible sum is 2+2 = 4, which will not fit into a 2-bit field, the 2-bit
    fields have to be masked before they are added.

    After this point, the masking can be delayed. Each 4-bit field holds a
    population count from 0..4, taking at most 3 bits. These numbers can be added
    without overflowing a 4-bit field, so we can compute c + (c >> 4), and only
    then mask off the unwanted bits.

    This produces d, a number of 4 8-bit fields, each in the range 0..8. From
    this point, we can shift and add d multiple times without overflowing an 8-bit
    field, and only do a final mask at the end.

    The number to mask with has to be at least 63 (so that 32 won't be truncated),
    but can also be 127 or 255. The x86 has a special encoding for signed
    immediate byte values -128..127, so the value of 255 is slower. On other
    processors, a special "sign extend byte" instruction might be faster.

    On a processor with fast integer multiplies (Athlon but not P4), you can
    reduce the final few serially dependent instructions to a single integer
    multiply. Consider d to be 4 8-bit values d3, d2, d1 and d0, each in the
    range 0..8. The multiply forms the partial products:

                 d3 d2 d1 d0
              d3 d2 d1 d0
           d3 d2 d1 d0
    +   d3 d2 d1 d0
    -------------------------
                 e3 e2 e1 e0

    Where e3 = d3 + d2 + d1 + d0. e2, e1 and e0 obviously cannot generate
    any carries.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
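
The two variants discussed above can be written out as ordinary C functions and compared against a bit-at-a-time count. The function names are hypothetical; the arithmetic follows the steps b, c, d, e, f from the message, with uint32_t used so the multiply wraps at 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add variant (the SLOW_MULTIPLY branch). */
uint32_t popcount_shift(uint32_t a)
{
	uint32_t b = a - ((a >> 1) & 0x55555555u);
	uint32_t c = (b & 0x33333333u) + ((b >> 2) & 0x33333333u);
	uint32_t d = (c + (c >> 4)) & 0x0f0f0f0fu;
	uint32_t e = d + (d >> 8);
	uint32_t f = e + (e >> 16);

	return f & 63;
}

/* Multiply variant: the top byte of the product is d3 + d2 + d1 + d0. */
uint32_t popcount_mul(uint32_t a)
{
	uint32_t b = a - ((a >> 1) & 0x55555555u);
	uint32_t c = (b & 0x33333333u) + ((b >> 2) & 0x33333333u);
	uint32_t d = (c + (c >> 4)) & 0x0f0f0f0fu;

	return (d * 0x01010101u) >> 24;
}

/* Reference implementation: one bit at a time. */
uint32_t popcount_naive(uint32_t a)
{
	uint32_t n = 0;

	for (; a != 0; a >>= 1)
		n += a & 1;
	return n;
}

/* Boundary case from the text: all 32 bits set must yield 32. */
void popcount_selfcheck(void)
{
	assert(popcount_shift(0xffffffffu) == 32);
	assert(popcount_mul(0xffffffffu) == 32);
}
```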
     
  • By defining generic hweight*() routines

    - hweight64() will be defined on all architectures
    - hweight_long() will use architecture optimized hweight32() or hweight64()

    For these reasons, I found two possible cleanups.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch introduces the C-language equivalents of the functions below:

    int ext2_set_bit(int nr, volatile unsigned long *addr);
    int ext2_clear_bit(int nr, volatile unsigned long *addr);
    int ext2_test_bit(int nr, const volatile unsigned long *addr);
    unsigned long ext2_find_first_zero_bit(const unsigned long *addr,
    unsigned long size);
    unsigned long ext2_find_next_zero_bit(const unsigned long *addr,
    unsigned long size);

    In include/asm-generic/bitops/ext2-non-atomic.h

    This code largely copied from:

    include/asm-powerpc/bitops.h
    include/asm-parisc/bitops.h

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch introduces the C-language equivalents of the functions below:

    unsigned int hweight32(unsigned int w);
    unsigned int hweight16(unsigned int w);
    unsigned int hweight8(unsigned int w);
    unsigned long hweight64(__u64 w);

    In include/asm-generic/bitops/hweight.h

    This code largely copied from: include/linux/bitops.h

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch introduces the C-language equivalents of the functions below:

    unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
    unsigned long offset);
    unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
    unsigned long offset);
    unsigned long find_first_zero_bit(const unsigned long *addr,
    unsigned long size);
    unsigned long find_first_bit(const unsigned long *addr, unsigned long size);

    In include/asm-generic/bitops/find.h

    This code largely copied from: arch/powerpc/lib/bitops.c

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

26 Mar, 2006

2 commits

  • DEBUG_KERNEL is often enabled just for sysrq, but this doesn't
    mean the user wants more heavyweight debugging information.

    Cc: jbeulich@novell.com

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits)
    BUG_ON() Conversion in drivers/video/
    BUG_ON() Conversion in drivers/parisc/
    BUG_ON() Conversion in drivers/block/
    BUG_ON() Conversion in sound/sparc/cs4231.c
    BUG_ON() Conversion in drivers/s390/block/dasd.c
    BUG_ON() Conversion in lib/swiotlb.c
    BUG_ON() Conversion in kernel/cpu.c
    BUG_ON() Conversion in ipc/msg.c
    BUG_ON() Conversion in block/elevator.c
    BUG_ON() Conversion in fs/coda/
    BUG_ON() Conversion in fs/binfmt_elf_fdpic.c
    BUG_ON() Conversion in input/serio/hil_mlc.c
    BUG_ON() Conversion in md/dm-hw-handler.c
    BUG_ON() Conversion in md/bitmap.c
    The comment describing how MS_ASYNC works in msync.c is confusing
    rcu: undeclared variable used in documentation
    fix typos "wich" -> "which"
    typo patch for fs/ufs/super.c
    Fix simple typos
    tabify drivers/char/Makefile
    ...

    Linus Torvalds