26 Oct, 2012

1 commit

  • The genalloc code uses the bitmap API from include/linux/bitmap.h and
    lib/bitmap.c, which is based on long values. Both bitmap_set from
    lib/bitmap.c and bitmap_set_ll, which is the lockless version from
    genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in
    the bitmap.

    That macro expands to (1 << bits) - 1, i.e. 0b111 if you are setting the
    first three bits. This means that the API counts from the least
    significant bit (LSB from now on) to the MSB, so the LSB of the first
    long is bit 0. The same applies to the lookup functions.
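
    For illustration, the masking convention boils down to this (a minimal
    standalone sketch, not kernel code):

    #include <stdio.h>

    int main(void)
    {
            int bits = 3;
            unsigned long word = 0;

            /* Mask covering bits 0..2, i.e. 0b111: counting starts at the LSB. */
            unsigned long mask = (1UL << bits) - 1;

            word |= mask;
            printf("0x%lx\n", word);        /* prints 0x7 */
            return 0;
    }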

    The genalloc code uses longs for the bitmap, as it should. In
    include/linux/genalloc.h, struct gen_pool_chunk has unsigned long
    bits[0] as its last member. When allocating the struct, genalloc should
    reserve enough space for the bitmap, i.e. a whole number of longs large
    enough to hold all the bits in the bitmap.

    However, genalloc allocates an integer number of bytes that fit the
    number of bits, which may not be an integer number of longs: 9 bytes,
    for example, could be allocated for 70 bits.
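
    A quick sketch of that arithmetic (assuming 64-bit longs; BITS_PER_LONG
    and BITS_TO_LONGS are redefined here only to keep the example
    standalone):

    #include <stdio.h>

    #define BITS_PER_LONG (8 * sizeof(unsigned long))
    #define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

    int main(void)
    {
            unsigned long nbits = 70;

            /* What genalloc allocated: whole bytes covering the bits. */
            unsigned long bytes = (nbits + 7) / 8;

            /* What the long-based bitmap API may touch: whole longs. */
            unsigned long long_bytes = BITS_TO_LONGS(nbits) * sizeof(unsigned long);

            printf("%lu bytes allocated vs %lu bytes touched\n", bytes, long_bytes);
            return 0;
    }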

    This is a problem in itself if the least significant bit of a long is
    in the byte with the largest address, which is the case on big-endian
    machines: genalloc then fails to allocate the very byte in which it
    will try to set or check a bit.

    This may result in memory corruption, with genalloc setting bits it has
    not allocated. In fact, genalloc may not even set those bits, because it
    can find them already set - they were never zeroed, since they were
    never allocated. That is what triggers the BUG when gen_pool_destroy is
    called and checks for any bits still set.

    What really happens is that genalloc uses kmalloc_node with __GFP_ZERO
    on gen_pool_add_virt. With SLAB and SLUB, this means the whole slab
    will be cleared, not only the requested bytes. Since struct
    gen_pool_chunk has a size that is a multiple of 8, and slab sizes are
    multiples of 8, we get lucky and allocate and clear the right amount of
    bytes.

    However, this is not the case with SLOB, or with older code that did a
    memset after allocating instead of using __GFP_ZERO.

    So, a simple module like this (running on 3.6.0) will cause a crash
    when rmmod'ed.

    [root@phantom-lp2 foo]# cat foo.c
    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/init.h>
    #include <linux/genalloc.h>

    MODULE_LICENSE("GPL");
    MODULE_VERSION("0.1");

    static struct gen_pool *foo_pool;

    static __init int foo_init(void)
    {
            int ret;

            foo_pool = gen_pool_create(10, -1);
            if (!foo_pool)
                    return -ENOMEM;
            ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1);
            if (ret) {
                    gen_pool_destroy(foo_pool);
                    return ret;
            }
            return 0;
    }

    static __exit void foo_exit(void)
    {
            gen_pool_destroy(foo_pool);
    }

    module_init(foo_init);
    module_exit(foo_exit);
    [root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB
    CONFIG_SLOB=y
    [root@phantom-lp2 foo]# insmod ./foo.ko
    [root@phantom-lp2 foo]# rmmod foo
    ------------[ cut here ]------------
    kernel BUG at lib/genalloc.c:243!
    cpu 0x4: Vector: 700 (Program Check) at [c0000000bb0e7960]
    pc: c0000000003cb50c: .gen_pool_destroy+0xac/0x110
    lr: c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110
    sp: c0000000bb0e7be0
    msr: 8000000000029032
    current = 0xc0000000bb0e0000
    paca = 0xc000000006d30e00 softe: 0 irq_happened: 0x01
    pid = 13044, comm = rmmod
    kernel BUG at lib/genalloc.c:243!
    [c0000000bb0e7ca0] d000000004b00020 .foo_exit+0x20/0x38 [foo]
    [c0000000bb0e7d20] c0000000000dff98 .SyS_delete_module+0x1a8/0x290
    [c0000000bb0e7e30] c0000000000097d4 syscall_exit+0x0/0x94
    --- Exception: c00 (System Call) at 000000800753d1a0
    SP (fffd0b0e640) is in userspace

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Cc: Paul Gortmaker
    Cc: Benjamin Gaignard
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thadeu Lima de Souza Cascardo
     

20 Oct, 2012

1 commit

  • If there is only one match, the unique matched entry should be returned.

    Without the fix, the upcoming dma debug interfaces ("dma-debug: new
    interfaces to debug dma mapping errors") can't work reliably because
    only device and dma_addr are passed to dma_mapping_error().

    Signed-off-by: Ming Lei
    Reported-by: Wu Fengguang
    Cc: Joerg Roedel
    Tested-by: Shuah Khan
    Cc: Paul Gortmaker
    Cc: Jakub Kicinski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ming Lei
     

15 Oct, 2012

1 commit

  • Pull module signing support from Rusty Russell:
    "module signing is the highlight, but it's an all-over David Howells frenzy..."

    Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

    * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
    X.509: Fix indefinite length element skip error handling
    X.509: Convert some printk calls to pr_devel
    asymmetric keys: fix printk format warning
    MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
    MODSIGN: Make mrproper should remove generated files.
    MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
    MODSIGN: Use the same digest for the autogen key sig as for the module sig
    MODSIGN: Sign modules during the build process
    MODSIGN: Provide a script for generating a key ID from an X.509 cert
    MODSIGN: Implement module signature checking
    MODSIGN: Provide module signing public keys to the kernel
    MODSIGN: Automatically generate module signing keys if missing
    MODSIGN: Provide Kconfig options
    MODSIGN: Provide gitignore and make clean rules for extra files
    MODSIGN: Add FIPS policy
    module: signature checking hook
    X.509: Add a crypto key parser for binary (DER) X.509 certificates
    MPILIB: Provide a function to read raw data into an MPI
    X.509: Add an ASN.1 decoder
    X.509: Add simple ASN.1 grammar compiler
    ...

    Linus Torvalds
     

11 Oct, 2012

3 commits

  • Merge misc fixes from Andrew Morton:
    "Followups, fixes and some random stuff I found on the internet."

    * emailed patches from Andrew Morton : (11 patches)
    perf: fix duplicate header inclusion
    memcg, kmem: fix build error when CONFIG_INET is disabled
    rtc: kconfig: fix RTC_INTF defaults connected to RTC_CLASS
    rapidio: fix comment
    lib/kasprintf.c: use kmalloc_track_caller() to get accurate traces for kvasprintf
    rapidio: update for destination ID allocation
    rapidio: update asynchronous discovery initialization
    rapidio: use msleep in discovery wait
    mm: compaction: fix bit ranges in {get,clear,set}_pageblock_skip()
    arch/powerpc/platforms/pseries/hotplug-memory.c: section removal cleanups
    arch/powerpc/platforms/pseries/hotplug-memory.c: fix section handling code

    Linus Torvalds
     
  • Pull block IO update from Jens Axboe:
    "Core block IO bits for 3.7. Not a huge round this time, it contains:

    - First series from Kent cleaning up and generalizing bio allocation
    and freeing.

    - WRITE_SAME support from Martin.

    - Mikulas patches to prevent O_DIRECT crashes when someone changes
    the block size of a device.

    - Make bio_split() work on data-less bio's (like trim/discards).

    - A few other minor fixups."

    Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
    Morton. It is due to the VM no longer using a prio-tree (see commit
    6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree").

    So make set_blocksize() use mapping_mapped() instead of open-coding the
    internal VM knowledge that has changed.

    * 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
    block: makes bio_split support bio without data
    scatterlist: refactor the sg_nents
    scatterlist: add sg_nents
    fs: fix include/percpu-rwsem.h export error
    percpu-rw-semaphore: fix documentation typos
    fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
    blockdev: turn a rw semaphore into a percpu rw semaphore
    Fix a crash when block device is read and block size is changed at the same time
    block: fix request_queue->flags initialization
    block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
    block: ioctl to zero block ranges
    block: Make blkdev_issue_zeroout use WRITE SAME
    block: Implement support for WRITE SAME
    block: Consolidate command flag and queue limit checks for merges
    block: Clean up special command handling logic
    block/blk-tag.c: Remove useless kfree
    block: remove the duplicated setting for congestion_threshold
    block: reject invalid queue attribute values
    block: Add bio_clone_bioset(), bio_clone_kmalloc()
    block: Consolidate bio_alloc_bioset(), bio_kmalloc()
    ...

    Linus Torvalds
     
  • Previously kvasprintf() allocation was being done through kmalloc(),
    thus producing an inaccurate trace report.

    This is a common problem: to get accurate callsite tracing, a lib/utils
    function shouldn't allocate with kmalloc() directly but should instead
    use kmalloc_track_caller().
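
    Roughly, the fix amounts to swapping the allocation call inside
    kvasprintf(); a sketch of the idea, not the verbatim patch:

    #include <stdarg.h>
    #include <linux/kernel.h>
    #include <linux/slab.h>

    char *kvasprintf(gfp_t gfp, const char *fmt, va_list ap)
    {
            unsigned int len;
            char *p;
            va_list aq;

            /* First pass: measure the formatted length. */
            va_copy(aq, ap);
            len = vsnprintf(NULL, 0, fmt, aq);
            va_end(aq);

            /* was: p = kmalloc(len + 1, gfp); */
            p = kmalloc_track_caller(len + 1, gfp);
            if (!p)
                    return NULL;

            vsnprintf(p, len + 1, fmt, ap);
            return p;
    }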

    Signed-off-by: Ezequiel Garcia
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ezequiel Garcia
     

10 Oct, 2012

1 commit

    asn1_find_indefinite_length() returns an error indicator of -1, which the
    caller asn1_ber_decoder() places in a size_t (which is unsigned) and
    then checks to see whether it is less than 0 (which it can't be). This
    can lead to the following warning:

    lib/asn1_decoder.c:320 asn1_ber_decoder()
    warn: unsigned 'len' is never less than zero.

    Instead, have asn1_find_indefinite_length() update the caller's idea of
    the data cursor and length separately from returning the error code.
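
    The pitfall in miniature (a hypothetical standalone sketch; find_length()
    merely stands in for asn1_find_indefinite_length()):

    #include <stdio.h>
    #include <stddef.h>

    static int find_length(void)
    {
            return -1;                      /* error indicator */
    }

    int main(void)
    {
            size_t len = find_length();     /* -1 wraps to SIZE_MAX */

            if (len < 0)                    /* always false: size_t is unsigned */
                    printf("error\n");
            else
                    printf("len = %zu\n", len);
            return 0;
    }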

    Reported-by: Dan Carpenter
    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     

09 Oct, 2012

28 commits

  • Add a CONFIG_DEBUG_VM_RB build option for the previously existing
    DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using
    recursive algorithms, we can expose it a bit more.

    Also extend this code to validate_mm() after stack expansion, and to check
    that the vma's start and last pgoffs have not changed since the nodes were
    inserted on the anon vma interval tree (as it is important that the nodes
    be reindexed after each such update).

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Update the generic interval tree code that was introduced in "mm: replace
    vma prio_tree with an interval tree".

    Changes:

    - fixed 'endpoing' typo noticed by Andrew Morton

    - replaced include/linux/interval_tree_tmpl.h, which was used as a
    template (including it automatically defined the interval tree
    functions), with include/linux/interval_tree_generic.h, which only
    defines a preprocessor macro INTERVAL_TREE_DEFINE() that in turn defines
    the interval tree functions when invoked (see the sketch after this
    list). That is a very long macro, which is unfortunate, but it does make
    the usage sites (lib/interval_tree.c and mm/interval_tree.c) a bit nicer
    than previously.

    - make use of RB_DECLARE_CALLBACKS() in the INTERVAL_TREE_DEFINE() macro,
    instead of duplicating that code in the interval tree template.

    - replaced vma_interval_tree_add(), which was actually handling the
    nonlinear and interval tree cases, with vma_interval_tree_insert_after()
    which handles only the interval tree case and has an API that is more
    consistent with the other interval tree handling functions.
    The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap().
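
    For reference, a typical invocation of such a template might look like
    the sketch below, assuming INTERVAL_TREE_DEFINE() takes the node type,
    its rb_node member, the key type, the cached subtree-last member,
    start/last accessors, a storage-class qualifier and a name prefix;
    struct foo_node and the foo_* names are purely illustrative:

    #include <linux/rbtree.h>
    #include <linux/interval_tree_generic.h>

    /* Hypothetical interval node, used only for illustration. */
    struct foo_node {
            unsigned long start;
            unsigned long last;
            unsigned long __subtree_last;
            struct rb_node rb;
    };

    static inline unsigned long foo_start(struct foo_node *node)
    {
            return node->start;
    }

    static inline unsigned long foo_last(struct foo_node *node)
    {
            return node->last;
    }

    /* Expands to foo_it_insert(), foo_it_remove(), foo_it_iter_first(), ... */
    INTERVAL_TREE_DEFINE(struct foo_node, rb, unsigned long, __subtree_last,
                         foo_start, foo_last, static, foo_it)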

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Provide rb_insert_augmented() and rb_erase_augmented() through a new
    rbtree_augmented.h include file. rb_erase_augmented() is defined there as
    an __always_inline function, in order to allow inlining of augmented
    rbtree callbacks into it. Since this generates a relatively large
    function, each augmented rbtree user should make sure to have a single
    call site.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • After both prio_tree users have been converted to use red-black trees,
    there is no need to keep around the prio tree library anymore.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Implement an interval tree as a replacement for the VMA prio_tree. The
    algorithms are similar to lib/interval_tree.c; however that code can't be
    directly reused as the interval endpoints are not explicitly stored in the
    VMA. So instead, the common algorithm is moved into a template and the
    details (node type, how to get interval endpoints from the node, etc) are
    filled in using the C preprocessor.

    Once the interval tree functions are available, using them as a
    replacement to the VMA prio tree is a relatively simple, mechanical job.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Patch 1 implements support for interval trees, on top of the augmented
    rbtree API. It also adds synthetic tests to compare the performance of
    interval trees vs prio trees. The short answer is that interval trees
    are slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x)
    on search. It is debatable how realistic the synthetic test is, and I have
    not made such measurements yet, but my impression is that interval trees
    would still come out faster.

    Patch 2 uses a preprocessor template to make the interval tree generic,
    and uses it as a replacement for the vma prio_tree.

    Patch 3 takes the other prio_tree user, kmemleak, and converts it to use
    a basic rbtree. We don't actually need the augmented rbtree support here
    because the intervals are always non-overlapping.

    Patch 4 removes the now-unused prio tree library.

    Patch 5 proposes an additional optimization to rb_erase_augmented, now
    providing it as an inline function so that the augmented callbacks can be
    inlined into it. This provides an additional 5-10% performance improvement
    for the interval tree insert/erase benchmark. There is a maintenance cost,
    as it exposes augmented rbtree users to some of the rbtree library
    internals; however, I think this cost shouldn't be too high, as I expect
    the augmented rbtree will always have far fewer users than the base rbtree.

    I should probably add a quick summary of why I think it makes sense to
    replace prio trees with augmented rbtree based interval trees now. One of
    the drivers is that we need augmented rbtrees for Rik's vma gap finding
    code, and once you have them, it just makes sense to use them for interval
    trees as well, as this is the simpler and more well known algorithm. prio
    trees, in comparison, seem *too* clever: they impose an additional 'heap'
    constraint on the tree, which they use to guarantee a faster worst-case
    complexity of O(k+log N) for stabbing queries in a well-balanced prio
    tree, vs O(k*log N) for interval trees (where k=number of matches,
    N=number of intervals). Now this sounds great, but in practice prio trees
    don't realize this theoretical benefit. First, the additional constraint
    makes them harder to update, so that the kernel implementation has to
    simplify things by balancing them like a radix tree, which is not always
    ideal. Second, the fact that there are both index and heap properties
    makes both tree manipulation and search more complex, which results in a
    higher multiplicative time constant. As it turns out, the simple interval
    tree algorithm ends up running faster than the more clever prio tree.

    This patch:

    Add two test modules:

    - prio_tree_test measures the performance of lib/prio_tree.c, both for
    insertion/removal and for stabbing searches

    - interval_tree_test measures the performance of a library of equivalent
    functionality, built using the augmented rbtree support.

    In order to support the second test module, lib/interval_tree.c is
    introduced. It is kept separate from the interval_tree_test main file
    for two reasons: first we don't want to provide an unfair advantage
    over prio_tree_test by having everything in a single compilation unit,
    and second there is the possibility that the interval tree functionality
    could get some non-test users in kernel over time.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • As proposed by Peter Zijlstra, this makes it easier to define the augmented
    rbtree callbacks.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • convert arch/x86/mm/pat_rbtree.c to the proposed augmented rbtree api
    and remove the old augmented rbtree implementation.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Introduce new augmented rbtree APIs that allow minimal recalculation of
    augmented node information.

    A new callback is added to the rbtree insertion and erase rebalancing
    functions, to be called on each tree rotation. Such rotations preserve
    the subtree root's augmented value, but require recalculating the
    augmented value of the child that was previously located at the subtree
    root.

    In the insertion case, the handcoded search phase must be updated to
    maintain the augmented information on insertion, and then the rbtree
    coloring/rebalancing algorithms keep it up to date.

    In the erase case, things are more complicated since it is library
    code that manipulates the rbtree in order to remove internal nodes.
    This requires a couple additional callbacks to copy a subtree's
    augmented value when a new root is stitched in, and to recompute
    augmented values down the ancestry path when a node is removed from
    the tree.

    In order to preserve maximum speed for the non-augmented case,
    we provide two versions of each tree manipulation function.
    rb_insert_augmented() is the augmented equivalent of rb_insert_color(),
    and rb_erase_augmented() is the augmented equivalent of rb_erase().
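
    As a rough sketch (not taken from the patch itself), an augmented user
    tracking the maximum of a per-node value across each subtree might wire
    up the callbacks roughly as follows; struct mynode and the my_* names
    are hypothetical:

    #include <linux/rbtree_augmented.h>

    struct mynode {
            struct rb_node rb;
            unsigned long value;
            unsigned long subtree_max;      /* max of 'value' in this subtree */
    };

    static unsigned long compute_subtree_max(struct mynode *node)
    {
            unsigned long max = node->value, tmp;

            if (node->rb.rb_left) {
                    tmp = rb_entry(node->rb.rb_left, struct mynode, rb)->subtree_max;
                    if (tmp > max)
                            max = tmp;
            }
            if (node->rb.rb_right) {
                    tmp = rb_entry(node->rb.rb_right, struct mynode, rb)->subtree_max;
                    if (tmp > max)
                            max = tmp;
            }
            return max;
    }

    /* Recompute augmented values up the ancestry path after a removal. */
    static void my_propagate(struct rb_node *rb, struct rb_node *stop)
    {
            while (rb != stop) {
                    struct mynode *node = rb_entry(rb, struct mynode, rb);
                    unsigned long max = compute_subtree_max(node);

                    if (node->subtree_max == max)
                            break;
                    node->subtree_max = max;
                    rb = rb_parent(&node->rb);
            }
    }

    /* Copy the subtree's augmented value when a new root is stitched in. */
    static void my_copy(struct rb_node *rb_old, struct rb_node *rb_new)
    {
            rb_entry(rb_new, struct mynode, rb)->subtree_max =
                    rb_entry(rb_old, struct mynode, rb)->subtree_max;
    }

    /* On rotation the new subtree root inherits the old value; the demoted
     * node recomputes its own. */
    static void my_rotate(struct rb_node *rb_old, struct rb_node *rb_new)
    {
            my_copy(rb_old, rb_new);
            rb_entry(rb_old, struct mynode, rb)->subtree_max =
                    compute_subtree_max(rb_entry(rb_old, struct mynode, rb));
    }

    static const struct rb_augment_callbacks my_callbacks = {
            .propagate = my_propagate,
            .copy = my_copy,
            .rotate = my_rotate,
    };

    /* Removal then looks like:
     *      rb_erase_augmented(&node->rb, &root, &my_callbacks);
     * while insertion keeps subtree_max up to date during the handcoded
     * search and then calls rb_insert_augmented(&node->rb, &root,
     * &my_callbacks).
     */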

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Small test to measure the performance of augmented rbtrees.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Various minor optimizations in rb_erase():
    - Avoid multiple loading of node->__rb_parent_color when computing parent
    and color information (possibly not in close sequence, as there might
    be further branches in the algorithm)
    - In the 1-child subcase of case 1, copy the __rb_parent_color field from
    the erased node to the child instead of recomputing it from the desired
    parent and color
    - When searching for the erased node's successor, differentiate between
    cases 2 and 3 based on whether any left links were followed. This avoids
    a condition later down.
    - In case 3, keep a pointer to the erased node's right child so we don't
    have to refetch it later to adjust its parent.
    - In the no-child subcase of cases 2 and 3, place the rebalance assignment
    last so that the compiler can remove the subsequent if (rebalance) test.

    Also, added some comments to illustrate cases 2 and 3.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
    An interesting observation for rb_erase() is that when a node has
    exactly one child, the node must be black and the child must be red
    (any other combination would violate either the no-consecutive-reds rule
    or the requirement that the child's subtree and the empty subtree on the
    other side have equal black heights). An interesting consequence is that
    removing such a node can be done by simply replacing it with its child
    and making the child black, which we can do efficiently in rb_erase().
    __rb_erase_color() then only needs to handle the no-child case and can
    be modified accordingly.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In rb_erase, move the easy case (node to erase has no more than
    1 child) first. I feel the code reads easier that way.

    Signed-off-by: Michel Lespinasse
    Reviewed-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Add __rb_change_child() as an inline helper function to replace code that
    would otherwise be duplicated 4 times in the source.

    No changes to binary size or speed.
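
    The helper boils down to something like this sketch (reconstructed from
    the description, not the literal patch):

    #include <linux/rbtree.h>

    /* Make 'parent' (or the root, if there is no parent) point to 'new'
     * instead of 'old'. */
    static inline void
    __rb_change_child(struct rb_node *old, struct rb_node *new,
                      struct rb_node *parent, struct rb_root *root)
    {
            if (parent) {
                    if (parent->rb_left == old)
                            parent->rb_left = new;
                    else
                            parent->rb_right = new;
            } else {
                    root->rb_node = new;
            }
    }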

    Signed-off-by: Michel Lespinasse
    Reviewed-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Just a small fix to make sparse happy.

    Signed-off-by: Michel Lespinasse
    Reported-by: Fengguang Wu
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • When looking to fetch a node's sibling, we went through a sequence of:
    - check if node is the parent's left child
    - if it is, then fetch the parent's right child

    This can be replaced with:
    - fetch the parent's right child as an assumed sibling
    - check that node is NOT the fetched child

    This avoids fetching the parent's left child when node is actually
    that child. Saves a bit on code size, though it doesn't seem to make
    a large difference in speed.
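
    In other words, the lookup pattern changes roughly as sketched below
    (illustrative helpers, not the literal diff):

    #include <linux/rbtree.h>

    /* Before: test which side 'node' is on, then fetch the other child. */
    static struct rb_node *sibling_old(struct rb_node *node, struct rb_node *parent)
    {
            if (node == parent->rb_left)
                    return parent->rb_right;
            return parent->rb_left;
    }

    /* After: fetch an assumed sibling, correct it only if it was 'node'. */
    static struct rb_node *sibling_new(struct rb_node *node, struct rb_node *parent)
    {
            struct rb_node *sibling = parent->rb_right;

            if (node == sibling)
                    sibling = parent->rb_left;
            return sibling;
    }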

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
    Set comment and indentation style to be consistent with the Linux coding
    style and the rest of the file, as suggested by Peter Zijlstra.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In __rb_erase_color(), we often already have pointers to the nodes being
    rotated and/or know what their colors must be, so we can generate more
    efficient code than the generic __rb_rotate_left() and __rb_rotate_right()
    functions.

    Also when the current node is red or when flipping the sibling's color,
    the parent is already known so we can use the more efficient
    rb_set_parent_color() function to set the desired color.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In __rb_erase_color(), we have to select one of 3 cases depending on the
    color on the 'other' node children. If both children are black, we flip a
    few node colors and iterate. Otherwise, we do either one or two tree
    rotations, depending on the color of the 'other' child opposite to 'node',
    and then we are done.

    The corresponding logic had duplicate checks for the color of the 'other'
    child opposite to 'node'. It was checking it first to determine if both
    children are black, and then to determine how many tree rotations are
    required. Rearrange the logic to avoid that extra check.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In __rb_erase_color(), we were always setting a node to black after
    exiting the main loop. And in one case, after fixing up the tree to
    satisfy all rbtree invariants, we were setting the current node to root
    just to guarantee a loop exit, at which point the root would be set to
    black. However this is not necessary, as the root of an rbtree is already
    known to be black. The only case where the color flip is required is when
    we exit the loop due to the current node being red, and it's easiest to
    just do the flip at that point instead of doing it after the loop.

    [adrian.hunter@intel.com: perf tools: fix build for another rbtree.c change]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Adrian Hunter
    Cc: Alexander Shishkin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • - Use the newly introduced rb_set_parent_color() function to flip the color
    of nodes whose parent is already known.
    - Optimize rb_parent() when the node is known to be red - there is no need
    to mask out the color in that case.
    - Flipping gparent's color to red requires us to fetch its rb_parent_color
    field, so we can reuse it as the parent value for the next loop iteration.
    - Do not use __rb_rotate_left() and __rb_rotate_right() to handle tree
    rotations: we already have pointers to all relevant nodes, and know their
    colors (either because we want to adjust it, or because we've tested it,
    or we can deduce it as black due to the node proximity to a known red node).
    So we can generate more efficient code by making use of the node pointers
    we already have, and setting both the parent and color attributes for
    nodes all at once. Also in Case 2, some node attributes don't have to
    be set because we know another tree rotation (Case 3) will always follow
    and override them.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • The root node of an rbtree must always be black. However,
    rb_insert_color() only needs to maintain this invariant when it has been
    broken - that is, when it exits the loop due to the current (red) node
    being the root. In all other cases (exiting after tree rotations, or
    exiting due to an existing black parent) the invariant is already
    satisfied, so there is no need to adjust the root node color.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • It is a well known property of rbtrees that insertion never requires more
    than two tree rotations. In our implementation, after one loop iteration
    identified one or two necessary tree rotations, we would iterate and look
    for more. However at that point the node's parent would always be black,
    which would cause us to exit the loop.

    We can make the code flow more obvious by just adding a break statement
    after the tree rotations, where we know we are done. Additionally, in the
    cases where two tree rotations are necessary, we don't have to update the
    'node' pointer as it wouldn't be used until the next loop iteration, which
    we now avoid due to this break statement.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This small module helps measure the performance of rbtree insert and
    erase.

    Additionally, we run a few correctness tests to check that the rbtrees
    have all desired properties:

    - contains the right number of nodes in the order desired,
    - never two consecutive red nodes on any path,
    - all paths to leaf nodes have the same number of black nodes,
    - root node is black

    [akpm@linux-foundation.org: fix printk warning: sparc64 cycles_t is unsigned long]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • rbtree users must use the documented APIs to manipulate the tree
    structure. Low-level helpers to manipulate node colors and parenthood are
    not part of that API, so move them to lib/rbtree.c

    [dwmw2@infradead.org: fix jffs2 build issue due to renamed __rb_parent_color field]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Empty nodes have no color. We can make use of this property to simplify
    the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros. Also,
    we can get rid of the rb_init_node function which had been introduced by
    commit 88d19cf37952 ("timers: Add rb_init_node() to allow for stack
    allocated rb nodes") to avoid some issue with the empty node's color not
    being initialized.
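
    A sketch of what such macros can look like, exploiting the fact that no
    real parent pointer can equal the node itself (reconstructed from the
    description; see include/linux/rbtree.h for the actual definitions):

    /* An "empty" node points its __rb_parent_color at itself; since no real
     * parent can be the node itself, no separate color handling is needed. */
    #define RB_EMPTY_NODE(node) \
            ((node)->__rb_parent_color == (unsigned long)(node))
    #define RB_CLEAR_NODE(node) \
            ((node)->__rb_parent_color = (unsigned long)(node))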

    I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
    doing there, though. axboe introduced them in commit 10fd48f2376d
    ("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev"). The way I
    see it, the 'empty node' abstraction is only used by rbtree users to
    flag nodes that they haven't inserted in any rbtree, so asking the
    predecessor or successor of such nodes doesn't make any sense.

    One final rb_init_node() caller was recently added in sysctl code to
    implement faster sysctl name lookups. This code doesn't make use of
    RB_EMPTY_NODE at all, and from what I could see it only called
    rb_init_node() under the mistaken assumption that such initialization was
    required before node insertion.

    [sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Cc: John Stultz
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Introduce HAVE_DEBUG_BUGVERBOSE config option and select it in
    corresponding architecture Kconfig files. Architectures that already
    select GENERIC_BUG don't need to select HAVE_DEBUG_BUGVERBOSE.

    Signed-off-by: Catalin Marinas
    Acked-by: Geert Uytterhoeven
    Cc: David Howells
    Cc: Hirokazu Takata
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     
  • Introduce HAVE_DEBUG_KMEMLEAK config option and select it in corresponding
    architecture Kconfig files. DEBUG_KMEMLEAK now only depends on
    HAVE_DEBUG_KMEMLEAK.

    Signed-off-by: Catalin Marinas
    Cc: Russell King
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

08 Oct, 2012

5 commits

  • Provide a function to read raw data of a predetermined size into an MPI rather
    than expecting the size to be encoded within the data. The data is assumed to
    represent an unsigned integer, and the resulting MPI will be positive.

    The function looks like this:

    MPI mpi_read_raw_data(const void *, size_t);

    This is useful for reading ASN.1 integer primitives where the length is encoded
    in the ASN.1 metadata.

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     
  • Add an ASN.1 BER/DER/CER decoder. This uses the bytecode from the ASN.1
    compiler in the previous patch to inform it as to what to expect to find in the
    encoded byte stream. The output from the compiler also tells it what functions
    to call on what tags, thus allowing the caller to retrieve information.

    The decoder is called as follows:

    int asn1_decoder(const struct asn1_decoder *decoder,
                     void *context,
                     const unsigned char *data,
                     size_t datalen);

    The decoder argument points to the bytecode from the ASN.1 compiler. context
    is the caller's context and is passed to the action functions. data and
    datalen define the byte stream to be decoded.

    Note that the decoder is currently limited to datalen being less than 64K.
    This reduces the amount of stack space used by the decoder because ASN.1 is a
    nested construct. Similarly, the decoder is limited to a maximum of 10 levels
    of constructed data outside of a leaf node also in an effort to keep stack
    usage down.

    These restrictions can be raised if necessary.

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     
  • Add a pair of utility functions to render OIDs as strings. The first takes an
    encoded OID and turns it into a "a.b.c.d" form string:

    int sprint_oid(const void *data, size_t datasize,
                   char *buffer, size_t bufsize);

    The second takes an OID enum index and calls the first on the data held
    therein:

    int sprint_OID(enum OID oid, char *buffer, size_t bufsize);

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     
  • Implement a simple static OID registry that allows the mapping of an encoded
    OID to an enum value for ease of use.

    The OID registry index enum appears in the:

    linux/oid_registry.h

    header file. A script generates the registry from lines in the header file
    that look like:

    OID_foo,        /* 1.2.3.4 */

    The actual OID is taken to be represented by the numbers with interpolated
    dots in the comment.

    All other lines in the header are ignored.

    The registry is queried by calling:

    OID look_up_oid(const void *data, size_t datasize);

    This returns a number from the registry enum representing the OID if found or
    OID__NR if not.
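
    Using the prototype quoted above, a caller might do something like the
    following (a hypothetical example: OID_foo is assumed to exist in the
    registry, and the lookup function name follows the text above):

    #include <linux/types.h>
    #include <linux/oid_registry.h>

    /* Returns true if the encoded OID blob matches the registered OID_foo. */
    static bool blob_is_foo(const void *data, size_t datasize)
    {
            return look_up_oid(data, datasize) == OID_foo;
    }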

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     
  • Reinstate and export mpi_cmp() and mpi_cmp_ui() from the MPI library for use by
    RSA signature verification as per RFC3447 section 5.2.2 step 1.

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells