26 Oct, 2012

1 commit

  • The genalloc code uses the bitmap API from include/linux/bitmap.h and
    lib/bitmap.c, which is based on long values. Both bitmap_set from
    lib/bitmap.c and bitmap_set_ll, which is the lockless version from
    genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in
    the bitmap.

    That macro expands to (1 << bits) - 1, i.e. 0b111 if you are setting the
    first three bits. This means that the API counts from the least
    significant bit (LSB from now on) to the MSB, so the LSB of the first
    long is bit 0. The same applies to the lookup functions.
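
    For illustration, the masking convention boils down to this (a minimal
    standalone sketch, not kernel code):

    #include <stdio.h>

    int main(void)
    {
            int bits = 3;
            unsigned long word = 0;

            /* Mask covering bits 0..2, i.e. 0b111: counting starts at the LSB. */
            unsigned long mask = (1UL << bits) - 1;

            word |= mask;
            printf("0x%lx\n", word);        /* prints 0x7 */
            return 0;
    }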

    The genalloc code uses longs for the bitmap, as it should. In
    include/linux/genalloc.h, struct gen_pool_chunk has unsigned long
    bits[0] as its last member. When allocating the struct, genalloc should
    reserve enough space for the bitmap, i.e. a whole number of longs large
    enough to hold all the bits in the bitmap.

    However, genalloc allocates an integer number of bytes that fit the
    number of bits, which may not be an integer number of longs: 9 bytes,
    for example, could be allocated for 70 bits.
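
    A quick sketch of that arithmetic (assuming 64-bit longs; BITS_PER_LONG
    and BITS_TO_LONGS are redefined here only to keep the example
    standalone):

    #include <stdio.h>

    #define BITS_PER_LONG (8 * sizeof(unsigned long))
    #define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

    int main(void)
    {
            unsigned long nbits = 70;

            /* What genalloc allocated: whole bytes covering the bits. */
            unsigned long bytes = (nbits + 7) / 8;

            /* What the long-based bitmap API may touch: whole longs. */
            unsigned long long_bytes = BITS_TO_LONGS(nbits) * sizeof(unsigned long);

            printf("%lu bytes allocated vs %lu bytes touched\n", bytes, long_bytes);
            return 0;
    }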

    This is a problem in itself if the least significant bit of a long is
    in the byte with the largest address, which is the case on big-endian
    machines: genalloc then fails to allocate the very byte in which it
    will try to set or check a bit.

    This may result in memory corruption, with genalloc setting bits it has
    not allocated. In fact, genalloc may not even set those bits, because it
    can find them already set - they were never zeroed, since they were
    never allocated. That is what triggers the BUG when gen_pool_destroy is
    called and checks for any bits still set.

    What really happens is that genalloc uses kmalloc_node with __GFP_ZERO
    on gen_pool_add_virt. With SLAB and SLUB, this means the whole slab
    will be cleared, not only the requested bytes. Since struct
    gen_pool_chunk has a size that is a multiple of 8, and slab sizes are
    multiples of 8, we get lucky and allocate and clear the right amount of
    bytes.

    However, this is not the case with SLOB, or with older code that did a
    memset after allocating instead of using __GFP_ZERO.

    So, a simple module like this (running on 3.6.0) will cause a crash
    when rmmod'ed.

    [root@phantom-lp2 foo]# cat foo.c
    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/init.h>
    #include <linux/genalloc.h>

    MODULE_LICENSE("GPL");
    MODULE_VERSION("0.1");

    static struct gen_pool *foo_pool;

    static __init int foo_init(void)
    {
            int ret;

            foo_pool = gen_pool_create(10, -1);
            if (!foo_pool)
                    return -ENOMEM;
            ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1);
            if (ret) {
                    gen_pool_destroy(foo_pool);
                    return ret;
            }
            return 0;
    }

    static __exit void foo_exit(void)
    {
            gen_pool_destroy(foo_pool);
    }

    module_init(foo_init);
    module_exit(foo_exit);
    [root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB
    CONFIG_SLOB=y
    [root@phantom-lp2 foo]# insmod ./foo.ko
    [root@phantom-lp2 foo]# rmmod foo
    ------------[ cut here ]------------
    kernel BUG at lib/genalloc.c:243!
    cpu 0x4: Vector: 700 (Program Check) at [c0000000bb0e7960]
    pc: c0000000003cb50c: .gen_pool_destroy+0xac/0x110
    lr: c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110
    sp: c0000000bb0e7be0
    msr: 8000000000029032
    current = 0xc0000000bb0e0000
    paca = 0xc000000006d30e00 softe: 0 irq_happened: 0x01
    pid = 13044, comm = rmmod
    kernel BUG at lib/genalloc.c:243!
    [c0000000bb0e7ca0] d000000004b00020 .foo_exit+0x20/0x38 [foo]
    [c0000000bb0e7d20] c0000000000dff98 .SyS_delete_module+0x1a8/0x290
    [c0000000bb0e7e30] c0000000000097d4 syscall_exit+0x0/0x94
    --- Exception: c00 (System Call) at 000000800753d1a0
    SP (fffd0b0e640) is in userspace

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Cc: Paul Gortmaker
    Cc: Benjamin Gaignard
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thadeu Lima de Souza Cascardo
     

20 Oct, 2012

1 commit

  • If there is only one match, the unique matched entry should be returned.

    Without the fix, the upcoming dma debug interfaces ("dma-debug: new
    interfaces to debug dma mapping errors") can't work reliably because
    only device and dma_addr are passed to dma_mapping_error().

    Signed-off-by: Ming Lei
    Reported-by: Wu Fengguang
    Cc: Joerg Roedel
    Tested-by: Shuah Khan
    Cc: Paul Gortmaker
    Cc: Jakub Kicinski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ming Lei
     

15 Oct, 2012

1 commit

  • Pull module signing support from Rusty Russell:
    "module signing is the highlight, but it's an all-over David Howells frenzy..."

    Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

    * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
    X.509: Fix indefinite length element skip error handling
    X.509: Convert some printk calls to pr_devel
    asymmetric keys: fix printk format warning
    MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
    MODSIGN: Make mrproper should remove generated files.
    MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
    MODSIGN: Use the same digest for the autogen key sig as for the module sig
    MODSIGN: Sign modules during the build process
    MODSIGN: Provide a script for generating a key ID from an X.509 cert
    MODSIGN: Implement module signature checking
    MODSIGN: Provide module signing public keys to the kernel
    MODSIGN: Automatically generate module signing keys if missing
    MODSIGN: Provide Kconfig options
    MODSIGN: Provide gitignore and make clean rules for extra files
    MODSIGN: Add FIPS policy
    module: signature checking hook
    X.509: Add a crypto key parser for binary (DER) X.509 certificates
    MPILIB: Provide a function to read raw data into an MPI
    X.509: Add an ASN.1 decoder
    X.509: Add simple ASN.1 grammar compiler
    ...

    Linus Torvalds
     

11 Oct, 2012

3 commits

  • Merge misc fixes from Andrew Morton:
    "Followups, fixes and some random stuff I found on the internet."

    * emailed patches from Andrew Morton : (11 patches)
    perf: fix duplicate header inclusion
    memcg, kmem: fix build error when CONFIG_INET is disabled
    rtc: kconfig: fix RTC_INTF defaults connected to RTC_CLASS
    rapidio: fix comment
    lib/kasprintf.c: use kmalloc_track_caller() to get accurate traces for kvasprintf
    rapidio: update for destination ID allocation
    rapidio: update asynchronous discovery initialization
    rapidio: use msleep in discovery wait
    mm: compaction: fix bit ranges in {get,clear,set}_pageblock_skip()
    arch/powerpc/platforms/pseries/hotplug-memory.c: section removal cleanups
    arch/powerpc/platforms/pseries/hotplug-memory.c: fix section handling code

    Linus Torvalds
     
  • Pull block IO update from Jens Axboe:
    "Core block IO bits for 3.7. Not a huge round this time, it contains:

    - First series from Kent cleaning up and generalizing bio allocation
    and freeing.

    - WRITE_SAME support from Martin.

    - Mikulas patches to prevent O_DIRECT crashes when someone changes
    the block size of a device.

    - Make bio_split() work on data-less bio's (like trim/discards).

    - A few other minor fixups."

    Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
    Morton. It is due to the VM no longer using a prio-tree (see commit
    6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree").

    So make set_blocksize() use mapping_mapped() instead of open-coding the
    internal VM knowledge that has changed.

    * 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
    block: makes bio_split support bio without data
    scatterlist: refactor the sg_nents
    scatterlist: add sg_nents
    fs: fix include/percpu-rwsem.h export error
    percpu-rw-semaphore: fix documentation typos
    fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
    blockdev: turn a rw semaphore into a percpu rw semaphore
    Fix a crash when block device is read and block size is changed at the same time
    block: fix request_queue->flags initialization
    block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
    block: ioctl to zero block ranges
    block: Make blkdev_issue_zeroout use WRITE SAME
    block: Implement support for WRITE SAME
    block: Consolidate command flag and queue limit checks for merges
    block: Clean up special command handling logic
    block/blk-tag.c: Remove useless kfree
    block: remove the duplicated setting for congestion_threshold
    block: reject invalid queue attribute values
    block: Add bio_clone_bioset(), bio_clone_kmalloc()
    block: Consolidate bio_alloc_bioset(), bio_kmalloc()
    ...

    Linus Torvalds
     
  • Previously kvasprintf() allocation was being done through kmalloc(),
    thus producing an inaccurate trace report.

    This is a common problem: to get accurate callsite tracing, a lib/utils
    function shouldn't allocate with kmalloc() directly but should instead
    use kmalloc_track_caller().
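
    Roughly, the fix amounts to swapping the allocation call inside
    kvasprintf(); a sketch of the idea, not the verbatim patch:

    #include <stdarg.h>
    #include <linux/kernel.h>
    #include <linux/slab.h>

    char *kvasprintf(gfp_t gfp, const char *fmt, va_list ap)
    {
            unsigned int len;
            char *p;
            va_list aq;

            /* First pass: measure the formatted length. */
            va_copy(aq, ap);
            len = vsnprintf(NULL, 0, fmt, aq);
            va_end(aq);

            /* was: p = kmalloc(len + 1, gfp); */
            p = kmalloc_track_caller(len + 1, gfp);
            if (!p)
                    return NULL;

            vsnprintf(p, len + 1, fmt, ap);
            return p;
    }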

    Signed-off-by: Ezequiel Garcia
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ezequiel Garcia
     

10 Oct, 2012

1 commit

    asn1_find_indefinite_length() returns an error indicator of -1, which the
    caller asn1_ber_decoder() places in a size_t (which is unsigned) and
    then checks to see whether it is less than 0 (which it can't be). This
    can lead to the following warning:

    lib/asn1_decoder.c:320 asn1_ber_decoder()
    warn: unsigned 'len' is never less than zero.

    Instead, have asn1_find_indefinite_length() update the caller's idea of
    the data cursor and length separately from returning the error code.
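
    The pitfall in miniature (a hypothetical standalone sketch; find_length()
    merely stands in for asn1_find_indefinite_length()):

    #include <stdio.h>
    #include <stddef.h>

    static int find_length(void)
    {
            return -1;                      /* error indicator */
    }

    int main(void)
    {
            size_t len = find_length();     /* -1 wraps to SIZE_MAX */

            if (len < 0)                    /* always false: size_t is unsigned */
                    printf("error\n");
            else
                    printf("len = %zu\n", len);
            return 0;
    }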

    Reported-by: Dan Carpenter
    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     

09 Oct, 2012

28 commits

  • Add a CONFIG_DEBUG_VM_RB build option for the previously existing
    DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using
    recursive algorithms, we can expose it a bit more.

    Also extend this code to validate_mm() after stack expansion, and to check
    that the vma's start and last pgoffs have not changed since the nodes were
    inserted on the anon vma interval tree (as it is important that the nodes
    be reindexed after each such update).

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Update the generic interval tree code that was introduced in "mm: replace
    vma prio_tree with an interval tree".

    Changes:

    - fixed 'endpoing' typo noticed by Andrew Morton

    - replaced include/linux/interval_tree_tmpl.h, which was used as a
    template (including it automatically defined the interval tree
    functions), with include/linux/interval_tree_generic.h, which only
    defines a preprocessor macro INTERVAL_TREE_DEFINE() that in turn defines
    the interval tree functions when invoked (see the sketch after this
    list). That is a very long macro, which is unfortunate, but it does make
    the usage sites (lib/interval_tree.c and mm/interval_tree.c) a bit nicer
    than previously.

    - make use of RB_DECLARE_CALLBACKS() in the INTERVAL_TREE_DEFINE() macro,
    instead of duplicating that code in the interval tree template.

    - replaced vma_interval_tree_add(), which was actually handling the
    nonlinear and interval tree cases, with vma_interval_tree_insert_after()
    which handles only the interval tree case and has an API that is more
    consistent with the other interval tree handling functions.
    The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap().
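
    For reference, a typical invocation of such a template might look like
    the sketch below, assuming INTERVAL_TREE_DEFINE() takes the node type,
    its rb_node member, the key type, the cached subtree-last member,
    start/last accessors, a storage-class qualifier and a name prefix;
    struct foo_node and the foo_* names are purely illustrative:

    #include <linux/rbtree.h>
    #include <linux/interval_tree_generic.h>

    /* Hypothetical interval node, used only for illustration. */
    struct foo_node {
            unsigned long start;
            unsigned long last;
            unsigned long __subtree_last;
            struct rb_node rb;
    };

    static inline unsigned long foo_start(struct foo_node *node)
    {
            return node->start;
    }

    static inline unsigned long foo_last(struct foo_node *node)
    {
            return node->last;
    }

    /* Expands to foo_it_insert(), foo_it_remove(), foo_it_iter_first(), ... */
    INTERVAL_TREE_DEFINE(struct foo_node, rb, unsigned long, __subtree_last,
                         foo_start, foo_last, static, foo_it)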

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Provide rb_insert_augmented() and rb_erase_augmented() through a new
    rbtree_augmented.h include file. rb_erase_augmented() is defined there as
    an __always_inline function, in order to allow inlining of augmented
    rbtree callbacks into it. Since this generates a relatively large
    function, each augmented rbtree user should make sure to have a single
    call site.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • After both prio_tree users have been converted to use red-black trees,
    there is no need to keep around the prio tree library anymore.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Implement an interval tree as a replacement for the VMA prio_tree. The
    algorithms are similar to lib/interval_tree.c; however that code can't be
    directly reused as the interval endpoints are not explicitly stored in the
    VMA. So instead, the common algorithm is moved into a template and the
    details (node type, how to get interval endpoints from the node, etc) are
    filled in using the C preprocessor.

    Once the interval tree functions are available, using them as a
    replacement to the VMA prio tree is a relatively simple, mechanical job.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Patch 1 implements support for interval trees, on top of the augmented
    rbtree API. It also adds synthetic tests to compare the performance of
    interval trees vs prio trees. The short answer is that interval trees
    are slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x)
    on search. It is debatable how realistic the synthetic test is, and I have
    not made such measurements yet, but my impression is that interval trees
    would still come out faster.

    Patch 2 uses a preprocessor template to make the interval tree generic,
    and uses it as a replacement for the vma prio_tree.

    Patch 3 takes the other prio_tree user, kmemleak, and converts it to use
    a basic rbtree. We don't actually need the augmented rbtree support here
    because the intervals are always non-overlapping.

    Patch 4 removes the now-unused prio tree library.

    Patch 5 proposes an additional optimization to rb_erase_augmented, now
    providing it as an inline function so that the augmented callbacks can be
    inlined into it. This provides an additional 5-10% performance improvement
    for the interval tree insert/erase benchmark. There is a maintenance cost,
    as it exposes augmented rbtree users to some of the rbtree library
    internals; however, I think this cost shouldn't be too high, as I expect
    the augmented rbtree will always have far fewer users than the base rbtree.

    I should probably add a quick summary of why I think it makes sense to
    replace prio trees with augmented rbtree based interval trees now. One of
    the drivers is that we need augmented rbtrees for Rik's vma gap finding
    code, and once you have them, it just makes sense to use them for interval
    trees as well, as this is the simpler and more well known algorithm. prio
    trees, in comparison, seem *too* clever: they impose an additional 'heap'
    constraint on the tree, which they use to guarantee a faster worst-case
    complexity of O(k+log N) for stabbing queries in a well-balanced prio
    tree, vs O(k*log N) for interval trees (where k=number of matches,
    N=number of intervals). Now this sounds great, but in practice prio trees
    don't realize this theoretical benefit. First, the additional constraint
    makes them harder to update, so that the kernel implementation has to
    simplify things by balancing them like a radix tree, which is not always
    ideal. Second, the fact that there are both index and heap properties
    makes both tree manipulation and search more complex, which results in a
    higher multiplicative time constant. As it turns out, the simple interval
    tree algorithm ends up running faster than the more clever prio tree.

    This patch:

    Add two test modules:

    - prio_tree_test measures the performance of lib/prio_tree.c, both for
    insertion/removal and for stabbing searches

    - interval_tree_test measures the performance of a library of equivalent
    functionality, built using the augmented rbtree support.

    In order to support the second test module, lib/interval_tree.c is
    introduced. It is kept separate from the interval_tree_test main file
    for two reasons: first we don't want to provide an unfair advantage
    over prio_tree_test by having everything in a single compilation unit,
    and second there is the possibility that the interval tree functionality
    could get some non-test users in kernel over time.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • As proposed by Peter Zijlstra, this makes it easier to define the augmented
    rbtree callbacks.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • convert arch/x86/mm/pat_rbtree.c to the proposed augmented rbtree api
    and remove the old augmented rbtree implementation.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Introduce new augmented rbtree APIs that allow minimal recalculation of
    augmented node information.

    A new callback is added to the rbtree insertion and erase rebalancing
    functions, to be called on each tree rotation. Such rotations preserve
    the subtree root's augmented value, but require recalculating the
    augmented value of the child that was previously located at the subtree
    root.

    In the insertion case, the handcoded search phase must be updated to
    maintain the augmented information on insertion, and then the rbtree
    coloring/rebalancing algorithms keep it up to date.

    In the erase case, things are more complicated since it is library
    code that manipulates the rbtree in order to remove internal nodes.
    This requires a couple additional callbacks to copy a subtree's
    augmented value when a new root is stitched in, and to recompute
    augmented values down the ancestry path when a node is removed from
    the tree.

    In order to preserve maximum speed for the non-augmented case,
    we provide two versions of each tree manipulation function.
    rb_insert_augmented() is the augmented equivalent of rb_insert_color(),
    and rb_erase_augmented() is the augmented equivalent of rb_erase().
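
    As a rough sketch (not taken from the patch itself), an augmented user
    tracking the maximum of a per-node value across each subtree might wire
    up the callbacks roughly as follows; struct mynode and the my_* names
    are hypothetical:

    #include <linux/rbtree_augmented.h>

    struct mynode {
            struct rb_node rb;
            unsigned long value;
            unsigned long subtree_max;      /* max of 'value' in this subtree */
    };

    static unsigned long compute_subtree_max(struct mynode *node)
    {
            unsigned long max = node->value, tmp;

            if (node->rb.rb_left) {
                    tmp = rb_entry(node->rb.rb_left, struct mynode, rb)->subtree_max;
                    if (tmp > max)
                            max = tmp;
            }
            if (node->rb.rb_right) {
                    tmp = rb_entry(node->rb.rb_right, struct mynode, rb)->subtree_max;
                    if (tmp > max)
                            max = tmp;
            }
            return max;
    }

    /* Recompute augmented values up the ancestry path after a removal. */
    static void my_propagate(struct rb_node *rb, struct rb_node *stop)
    {
            while (rb != stop) {
                    struct mynode *node = rb_entry(rb, struct mynode, rb);
                    unsigned long max = compute_subtree_max(node);

                    if (node->subtree_max == max)
                            break;
                    node->subtree_max = max;
                    rb = rb_parent(&node->rb);
            }
    }

    /* Copy the subtree's augmented value when a new root is stitched in. */
    static void my_copy(struct rb_node *rb_old, struct rb_node *rb_new)
    {
            rb_entry(rb_new, struct mynode, rb)->subtree_max =
                    rb_entry(rb_old, struct mynode, rb)->subtree_max;
    }

    /* On rotation the new subtree root inherits the old value; the demoted
     * node recomputes its own. */
    static void my_rotate(struct rb_node *rb_old, struct rb_node *rb_new)
    {
            my_copy(rb_old, rb_new);
            rb_entry(rb_old, struct mynode, rb)->subtree_max =
                    compute_subtree_max(rb_entry(rb_old, struct mynode, rb));
    }

    static const struct rb_augment_callbacks my_callbacks = {
            .propagate = my_propagate,
            .copy = my_copy,
            .rotate = my_rotate,
    };

    /* Removal then looks like:
     *      rb_erase_augmented(&node->rb, &root, &my_callbacks);
     * while insertion keeps subtree_max up to date during the handcoded
     * search and then calls rb_insert_augmented(&node->rb, &root,
     * &my_callbacks).
     */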

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Small test to measure the performance of augmented rbtrees.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Various minor optimizations in rb_erase():
    - Avoid multiple loading of node->__rb_parent_color when computing parent
    and color information (possibly not in close sequence, as there might
    be further branches in the algorithm)
    - In the 1-child subcase of case 1, copy the __rb_parent_color field from
    the erased node to the child instead of recomputing it from the desired
    parent and color
    - When searching for the erased node's successor, differentiate between
    cases 2 and 3 based on whether any left links were followed. This avoids
    a condition later down.
    - In case 3, keep a pointer to the erased node's right child so we don't
    have to refetch it later to adjust its parent.
    - In the no-child subcase of cases 2 and 3, place the rebalance assignment
    last so that the compiler can remove the subsequent if (rebalance) test.

    Also, added some comments to illustrate cases 2 and 3.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
    An interesting observation for rb_erase() is that when a node has
    exactly one child, the node must be black and the child must be red
    (any other combination would violate either the no-consecutive-reds rule
    or the requirement that the child's subtree and the empty subtree on the
    other side have equal black heights). An interesting consequence is that
    removing such a node can be done by simply replacing it with its child
    and making the child black, which we can do efficiently in rb_erase().
    __rb_erase_color() then only needs to handle the no-child case and can
    be modified accordingly.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In rb_erase, move the easy case (node to erase has no more than
    1 child) first. I feel the code reads easier that way.

    Signed-off-by: Michel Lespinasse
    Reviewed-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Add __rb_change_child() as an inline helper function to replace code that
    would otherwise be duplicated 4 times in the source.

    No changes to binary size or speed.
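
    The helper boils down to something like this sketch (reconstructed from
    the description, not the literal patch):

    #include <linux/rbtree.h>

    /* Make 'parent' (or the root, if there is no parent) point to 'new'
     * instead of 'old'. */
    static inline void
    __rb_change_child(struct rb_node *old, struct rb_node *new,
                      struct rb_node *parent, struct rb_root *root)
    {
            if (parent) {
                    if (parent->rb_left == old)
                            parent->rb_left = new;
                    else
                            parent->rb_right = new;
            } else {
                    root->rb_node = new;
            }
    }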

    Signed-off-by: Michel Lespinasse
    Reviewed-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Just a small fix to make sparse happy.

    Signed-off-by: Michel Lespinasse
    Reported-by: Fengguang Wu
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • When looking to fetch a node's sibling, we went through a sequence of:
    - check if node is the parent's left child
    - if it is, then fetch the parent's right child

    This can be replaced with:
    - fetch the parent's right child as an assumed sibling
    - check that node is NOT the fetched child

    This avoids fetching the parent's left child when node is actually
    that child. Saves a bit on code size, though it doesn't seem to make
    a large difference in speed.
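
    In other words, the lookup pattern changes roughly as sketched below
    (illustrative helpers, not the literal diff):

    #include <linux/rbtree.h>

    /* Before: test which side 'node' is on, then fetch the other child. */
    static struct rb_node *sibling_old(struct rb_node *node, struct rb_node *parent)
    {
            if (node == parent->rb_left)
                    return parent->rb_right;
            return parent->rb_left;
    }

    /* After: fetch an assumed sibling, correct it only if it was 'node'. */
    static struct rb_node *sibling_new(struct rb_node *node, struct rb_node *parent)
    {
            struct rb_node *sibling = parent->rb_right;

            if (node == sibling)
                    sibling = parent->rb_left;
            return sibling;
    }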

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
    Set comment and indentation style to be consistent with the Linux coding
    style and the rest of the file, as suggested by Peter Zijlstra.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In __rb_erase_color(), we often already have pointers to the nodes being
    rotated and/or know what their colors must be, so we can generate more
    efficient code than the generic __rb_rotate_left() and __rb_rotate_right()
    functions.

    Also when the current node is red or when flipping the sibling's color,
    the parent is already known so we can use the more efficient
    rb_set_parent_color() function to set the desired color.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In __rb_erase_color(), we have to select one of 3 cases depending on the
    color on the 'other' node children. If both children are black, we flip a
    few node colors and iterate. Otherwise, we do either one or two tree
    rotations, depending on the color of the 'other' child opposite to 'node',
    and then we are done.

    The corresponding logic had duplicate checks for the color of the 'other'
    child opposite to 'node'. It was checking it first to determine if both
    children are black, and then to determine how many tree rotations are
    required. Rearrange the logic to avoid that extra check.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In __rb_erase_color(), we were always setting a node to black after
    exiting the main loop. And in one case, after fixing up the tree to
    satisfy all rbtree invariants, we were setting the current node to root
    just to guarantee a loop exit, at which point the root would be set to
    black. However this is not necessary, as the root of an rbtree is already
    known to be black. The only case where the color flip is required is when
    we exit the loop due to the current node being red, and it's easiest to
    just do the flip at that point instead of doing it after the loop.

    [adrian.hunter@intel.com: perf tools: fix build for another rbtree.c change]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Adrian Hunter
    Cc: Alexander Shishkin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • - Use the newly introduced rb_set_parent_color() function to flip the color
    of nodes whose parent is already known.
    - Optimize rb_parent() when the node is known to be red - there is no need
    to mask out the color in that case.
    - Flipping gparent's color to red requires us to fetch its rb_parent_color
    field, so we can reuse it as the parent value for the next loop iteration.
    - Do not use __rb_rotate_left() and __rb_rotate_right() to handle tree
    rotations: we already have pointers to all relevant nodes, and know their
    colors (either because we want to adjust it, or because we've tested it,
    or we can deduce it as black due to the node proximity to a known red node).
    So we can generate more efficient code by making use of the node pointers
    we already have, and setting both the parent and color attributes for
    nodes all at once. Also in Case 2, some node attributes don't have to
    be set because we know another tree rotation (Case 3) will always follow
    and override them.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • The root node of an rbtree must always be black. However,
    rb_insert_color() only needs to maintain this invariant when it has been
    broken - that is, when it exits the loop due to the current (red) node
    being the root. In all other cases (exiting after tree rotations, or
    exiting due to an existing black parent) the invariant is already
    satisfied, so there is no need to adjust the root node color.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • It is a well known property of rbtrees that insertion never requires more
    than two tree rotations. In our implementation, after one loop iteration
    identified one or two necessary tree rotations, we would iterate and look
    for more. However at that point the node's parent would always be black,
    which would cause us to exit the loop.

    We can make the code flow more obvious by just adding a break statement
    after the tree rotations, where we know we are done. Additionally, in the
    cases where two tree rotations are necessary, we don't have to update the
    'node' pointer as it wouldn't be used until the next loop iteration, which
    we now avoid due to this break statement.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This small module helps measure the performance of rbtree insert and
    erase.

    Additionally, we run a few correctness tests to check that the rbtrees
    have all desired properties:

    - contains the right number of nodes in the order desired,
    - never two consecutive red nodes on any path,
    - all paths to leaf nodes have the same number of black nodes,
    - root node is black

    [akpm@linux-foundation.org: fix printk warning: sparc64 cycles_t is unsigned long]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • rbtree users must use the documented APIs to manipulate the tree
    structure. Low-level helpers to manipulate node colors and parenthood are
    not part of that API, so move them to lib/rbtree.c

    [dwmw2@infradead.org: fix jffs2 build issue due to renamed __rb_parent_color field]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Empty nodes have no color. We can make use of this property to simplify
    the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros. Also,
    we can get rid of the rb_init_node function which had been introduced by
    commit 88d19cf37952 ("timers: Add rb_init_node() to allow for stack
    allocated rb nodes") to avoid some issue with the empty node's color not
    being initialized.
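
    A sketch of what such macros can look like, exploiting the fact that no
    real parent pointer can equal the node itself (reconstructed from the
    description; see include/linux/rbtree.h for the actual definitions):

    /* An "empty" node points its __rb_parent_color at itself; since no real
     * parent can be the node itself, no separate color handling is needed. */
    #define RB_EMPTY_NODE(node) \
            ((node)->__rb_parent_color == (unsigned long)(node))
    #define RB_CLEAR_NODE(node) \
            ((node)->__rb_parent_color = (unsigned long)(node))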

    I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
    doing there, though. axboe introduced them in commit 10fd48f2376d
    ("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev"). The way I
    see it, the 'empty node' abstraction is only used by rbtree users to
    flag nodes that they haven't inserted in any rbtree, so asking the
    predecessor or successor of such nodes doesn't make any sense.

    One final rb_init_node() caller was recently added in sysctl code to
    implement faster sysctl name lookups. This code doesn't make use of
    RB_EMPTY_NODE at all, and from what I could see it only called
    rb_init_node() under the mistaken assumption that such initialization was
    required before node insertion.

    [sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Cc: John Stultz
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Introduce HAVE_DEBUG_BUGVERBOSE config option and select it in
    corresponding architecture Kconfig files. Architectures that already
    select GENERIC_BUG don't need to select HAVE_DEBUG_BUGVERBOSE.

    Signed-off-by: Catalin Marinas
    Acked-by: Geert Uytterhoeven
    Cc: David Howells
    Cc: Hirokazu Takata
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     
  • Introduce HAVE_DEBUG_KMEMLEAK config option and select it in corresponding
    architecture Kconfig files. DEBUG_KMEMLEAK now only depends on
    HAVE_DEBUG_KMEMLEAK.

    Signed-off-by: Catalin Marinas
    Cc: Russell King
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

08 Oct, 2012

5 commits

  • Provide a function to read raw data of a predetermined size into an MPI rather
    than expecting the size to be encoded within the data. The data is assumed to
    represent an unsigned integer, and the resulting MPI will be positive.

    The function looks like this:

    MPI mpi_read_raw_data(const void *, size_t);

    This is useful for reading ASN.1 integer primitives where the length is encoded
    in the ASN.1 metadata.

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     
  • Add an ASN.1 BER/DER/CER decoder. This uses the bytecode from the ASN.1
    compiler in the previous patch to inform it as to what to expect to find in the
    encoded byte stream. The output from the compiler also tells it what functions
    to call on what tags, thus allowing the caller to retrieve information.

    The decoder is called as follows:

    int asn1_decoder(const struct asn1_decoder *decoder,
                     void *context,
                     const unsigned char *data,
                     size_t datalen);

    The decoder argument points to the bytecode from the ASN.1 compiler. context
    is the caller's context and is passed to the action functions. data and
    datalen define the byte stream to be decoded.

    Note that the decoder is currently limited to datalen being less than 64K.
    This reduces the amount of stack space used by the decoder because ASN.1 is a
    nested construct. Similarly, the decoder is limited to a maximum of 10 levels
    of constructed data outside of a leaf node also in an effort to keep stack
    usage down.

    These restrictions can be raised if necessary.

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     
  • Add a pair of utility functions to render OIDs as strings. The first takes an
    encoded OID and turns it into a "a.b.c.d" form string:

    int sprint_oid(const void *data, size_t datasize,
                   char *buffer, size_t bufsize);

    The second takes an OID enum index and calls the first on the data held
    therein:

    int sprint_OID(enum OID oid, char *buffer, size_t bufsize);

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     
  • Implement a simple static OID registry that allows the mapping of an encoded
    OID to an enum value for ease of use.

    The OID registry index enum appears in the:

    linux/oid_registry.h

    header file. A script generates the registry from lines in the header file
    that look like:

    OID_foo,        /* 1.2.3.4 */

    The actual OID is taken to be represented by the numbers with interpolated
    dots in the comment.

    All other lines in the header are ignored.

    The registry is queried by calling:

    OID look_up_oid(const void *data, size_t datasize);

    This returns a number from the registry enum representing the OID if found or
    OID__NR if not.
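
    Using the prototype quoted above, a caller might do something like the
    following (a hypothetical example: OID_foo is assumed to exist in the
    registry, and the lookup function name follows the text above):

    #include <linux/types.h>
    #include <linux/oid_registry.h>

    /* Returns true if the encoded OID blob matches the registered OID_foo. */
    static bool blob_is_foo(const void *data, size_t datasize)
    {
            return look_up_oid(data, datasize) == OID_foo;
    }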

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells
     
  • Reinstate and export mpi_cmp() and mpi_cmp_ui() from the MPI library for use by
    RSA signature verification as per RFC3447 section 5.2.2 step 1.

    Signed-off-by: David Howells
    Signed-off-by: Rusty Russell

    David Howells