09 Oct, 2012

28 commits

  • Add a CONFIG_DEBUG_VM_RB build option for the previously existing
    DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using
    recursive algorithms, we can expose it a bit more.

    Also extend this code to validate_mm() after stack expansion, and to check
    that the vma's start and last pgoffs have not changed since the nodes were
    inserted on the anon vma interval tree (as it is important that the nodes
    be reindexed after each such update).

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Update the generic interval tree code that was introduced in "mm: replace
    vma prio_tree with an interval tree".

    Changes:

    - fixed 'endpoing' typo noticed by Andrew Morton

    - replaced include/linux/interval_tree_tmpl.h, which was used as a
    template (including it automatically defined the interval tree
    functions) with include/linux/interval_tree_generic.h, which only
    defines a preprocessor macro INTERVAL_TREE_DEFINE(), which itself
    defines the interval tree functions when invoked. That is a very long
    macro, which is unfortunate, but it does make the usage sites
    (lib/interval_tree.c and mm/interval_tree.c) a bit nicer than before.

    - make use of RB_DECLARE_CALLBACKS() in the INTERVAL_TREE_DEFINE() macro,
    instead of duplicating that code in the interval tree template.

    - replaced vma_interval_tree_add(), which was actually handling the
    nonlinear and interval tree cases, with vma_interval_tree_insert_after()
    which handles only the interval tree case and has an API that is more
    consistent with the other interval tree handling functions.
    The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap().
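    As a rough illustration of the new scheme, a client could instantiate an
    interval tree like this (a sketch of INTERVAL_TREE_DEFINE() usage; the
    my_* and NODE_* names are made up for the example):

        #include <linux/interval_tree_generic.h>

        struct my_node {
                struct rb_node rb;
                unsigned long start;            /* first index covered */
                unsigned long last;             /* last index, inclusive */
                unsigned long __subtree_last;   /* maintained by the macro */
        };

        #define NODE_START(n) ((n)->start)
        #define NODE_LAST(n)  ((n)->last)

        /* Emits my_tree_insert(), my_tree_remove(), my_tree_iter_first()
         * and my_tree_iter_next() as static functions. */
        INTERVAL_TREE_DEFINE(struct my_node, rb, unsigned long,
                             __subtree_last, NODE_START, NODE_LAST,
                             static, my_tree)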

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Provide rb_insert_augmented() and rb_erase_augmented() through a new
    rbtree_augmented.h include file. rb_erase_augmented() is defined there as
    an __always_inline function, in order to allow inlining of augmented
    rbtree callbacks into it. Since this generates a relatively large
    function, each augmented rbtree user should make sure to have a single
    call site.
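    A minimal sketch of what such a single call site might look like (the
    my_* names are illustrative, and the callbacks are assumed to be defined
    elsewhere, e.g. via RB_DECLARE_CALLBACKS()):

        #include <linux/rbtree_augmented.h>

        struct my_node {
                struct rb_node rb;
        };

        extern const struct rb_augment_callbacks my_cb;

        /* Wrap the __always_inline rb_erase_augmented() exactly once so
         * that the large inlined body appears only here. */
        static void my_tree_erase(struct my_node *node, struct rb_root *root)
        {
                rb_erase_augmented(&node->rb, root, &my_cb);
        }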

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • After both prio_tree users have been converted to use red-black trees,
    there is no need to keep around the prio tree library anymore.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Implement an interval tree as a replacement for the VMA prio_tree. The
    algorithms are similar to lib/interval_tree.c; however that code can't be
    directly reused as the interval endpoints are not explicitly stored in the
    VMA. So instead, the common algorithm is moved into a template and the
    details (node type, how to get interval endpoints from the node, etc) are
    filled in using the C preprocessor.

    Once the interval tree functions are available, using them as a
    replacement to the VMA prio tree is a relatively simple, mechanical job.
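    For example, a stabbing query over a file's i_mmap tree now reads as a
    plain iteration (a sketch in the style of the converted rmap code; the
    function name is illustrative):

        #include <linux/mm.h>

        static void visit_mapping_range(struct address_space *mapping,
                                        pgoff_t first, pgoff_t last)
        {
                struct vm_area_struct *vma;

                /* visits every vma overlapping [first, last] */
                vma_interval_tree_foreach(vma, &mapping->i_mmap, first, last) {
                        /* ... operate on vma ... */
                }
        }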

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Patch 1 implements support for interval trees, on top of the augmented
    rbtree API. It also adds synthetic tests to compare the performance of
    interval trees vs prio trees. The short answer is that interval trees
    are slightly faster (~25%) on insert/erase, and much faster (~2.4-3x)
    on search. It is debatable how realistic the synthetic test is, and I
    have not measured real workloads yet, but my impression is that interval
    trees
    would still come out faster.

    Patch 2 uses a preprocessor template to make the interval tree generic,
    and uses it as a replacement for the vma prio_tree.

    Patch 3 takes the other prio_tree user, kmemleak, and converts it to use
    a basic rbtree. We don't actually need the augmented rbtree support here
    because the intervals are always non-overlapping.

    Patch 4 removes the now-unused prio tree library.

    Patch 5 proposes an additional optimization to rb_erase_augmented, now
    providing it as an inline function so that the augmented callbacks can be
    inlined into it. This provides an additional 5-10% performance improvement
    for the interval tree insert/erase benchmark. There is a maintenance cost,
    as it exposes augmented rbtree users to some of the rbtree library
    internals; however I think this cost shouldn't be too high, as I expect
    the augmented rbtree will always have far fewer users than the base
    rbtree.

    I should probably add a quick summary of why I think it makes sense to
    replace prio trees with augmented rbtree based interval trees now. One of
    the drivers is that we need augmented rbtrees for Rik's vma gap finding
    code, and once you have them, it just makes sense to use them for interval
    trees as well, since this is the simpler and better-known algorithm. prio
    trees, in comparison, seem *too* clever: they impose an additional 'heap'
    constraint on the tree, which they use to guarantee a faster worst-case
    complexity of O(k+log N) for stabbing queries in a well-balanced prio
    tree, vs O(k*log N) for interval trees (where k=number of matches,
    N=number of intervals). This sounds great, but in practice prio trees
    don't realize this theoretical benefit. First, the additional constraint
    makes them harder to update, so that the kernel implementation has to
    simplify things by balancing them like a radix tree, which is not always
    ideal. Second, the fact that there are both index and heap properties
    makes both tree manipulation and search more complex, which results in a
    higher multiplicative time constant. As it turns out, the simple interval
    tree algorithm ends up running faster than the more clever prio tree.

    This patch:

    Add two test modules:

    - prio_tree_test measures the performance of lib/prio_tree.c, both for
    insertion/removal and for stabbing searches

    - interval_tree_test measures the performance of a library of equivalent
    functionality, built using the augmented rbtree support.

    In order to support the second test module, lib/interval_tree.c is
    introduced. It is kept separate from the interval_tree_test main file
    for two reasons: first we don't want to provide an unfair advantage
    over prio_tree_test by having everything in a single compilation unit,
    and second there is the possibility that the interval tree functionality
    could get some non-test users in the kernel over time.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • As proposed by Peter Zijlstra, this makes it easier to define the augmented
    rbtree callbacks.
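    A hedged sketch of an invocation, for a hypothetical tree that keeps the
    maximum of a per-node value over each subtree (all my_* names are made
    up):

        #include <linux/rbtree_augmented.h>

        struct my_node {
                struct rb_node rb;
                unsigned long val;
                unsigned long subtree_max;      /* max of val over subtree */
        };

        static unsigned long my_compute_max(struct my_node *node)
        {
                unsigned long max = node->val, tmp;

                if (node->rb.rb_left) {
                        tmp = rb_entry(node->rb.rb_left,
                                       struct my_node, rb)->subtree_max;
                        if (tmp > max)
                                max = tmp;
                }
                if (node->rb.rb_right) {
                        tmp = rb_entry(node->rb.rb_right,
                                       struct my_node, rb)->subtree_max;
                        if (tmp > max)
                                max = tmp;
                }
                return max;
        }

        /* Expands to my_cb_propagate/_copy/_rotate and the my_cb
         * rb_augment_callbacks structure. */
        RB_DECLARE_CALLBACKS(static, my_cb, struct my_node, rb,
                             unsigned long, subtree_max, my_compute_max)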

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • convert arch/x86/mm/pat_rbtree.c to the proposed augmented rbtree api
    and remove the old augmented rbtree implementation.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Introduce new augmented rbtree APIs that allow minimal recalculation of
    augmented node information.

    A new callback is added to the rbtree insertion and erase rebalancing
    functions, to be called on each tree rotation. Such rotations preserve
    the subtree's root augmented value, but require recalculating the
    augmented value of the one child that was previously located at the
    subtree root.

    In the insertion case, the handcoded search phase must be updated to
    maintain the augmented information on insertion, and then the rbtree
    coloring/rebalancing algorithms keep it up to date.

    In the erase case, things are more complicated since it is library
    code that manipulates the rbtree in order to remove internal nodes.
    This requires a couple of additional callbacks to copy a subtree's
    augmented value when a new root is stitched in, and to recompute
    augmented values down the ancestry path when a node is removed from
    the tree.

    In order to preserve maximum speed for the non-augmented case,
    we provide two versions of each tree manipulation function.
    rb_insert_augmented() is the augmented equivalent of rb_insert_color(),
    and rb_erase_augmented() is the augmented equivalent of rb_erase().
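    A sketch of what an augmented user's handcoded insertion might look like,
    reusing the hypothetical my_node/my_cb definitions sketched in the
    RB_DECLARE_CALLBACKS() entry above:

        static void my_insert(struct rb_root *root, struct my_node *new)
        {
                struct rb_node **link = &root->rb_node, *parent = NULL;
                unsigned long val = new->val;

                while (*link) {
                        struct my_node *p;

                        parent = *link;
                        p = rb_entry(parent, struct my_node, rb);
                        if (p->subtree_max < val)
                                p->subtree_max = val;   /* top-down update */
                        if (val < p->val)
                                link = &parent->rb_left;
                        else
                                link = &parent->rb_right;
                }

                new->subtree_max = val;
                rb_link_node(&new->rb, parent, link);
                /* rebalancing keeps the augmented data up to date */
                rb_insert_augmented(&new->rb, root, &my_cb);
        }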

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Small test to measure the performance of augmented rbtrees.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Various minor optimizations in rb_erase():
    - Avoid multiple loading of node->__rb_parent_color when computing parent
    and color information (possibly not in close sequence, as there might
    be further branches in the algorithm)
    - In the 1-child subcase of case 1, copy the __rb_parent_color field from
    the erased node to the child instead of recomputing it from the desired
    parent and color
    - When searching for the erased node's successor, differentiate between
    cases 2 and 3 based on whether any left links were followed. This avoids
    a condition later down.
    - In case 3, keep a pointer to the erased node's right child so we don't
    have to refetch it later to adjust its parent.
    - In the no-child subcase of cases 2 and 3, place the rebalance assignment
    last so that the compiler can remove the subsequent if (rebalance) test.

    Also, added some comments to illustrate cases 2 and 3.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • An interesting observation for rb_erase() is that when a node has
    exactly one child, the node must be black and the child must be red.
    An interesting consequence is that removing such a node can be done by
    simply replacing it with its child and making the child black,
    which we can do efficiently in rb_erase(). __rb_erase_color() then
    only needs to handle the no-child case and can be modified accordingly.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In rb_erase(), move the easy case (the node to erase has no more than
    one child) first. I feel the code reads more easily that way.

    Signed-off-by: Michel Lespinasse
    Reviewed-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Add __rb_change_child() as an inline helper function to replace code that
    would otherwise be duplicated 4 times in the source.

    No changes to binary size or speed.

    Signed-off-by: Michel Lespinasse
    Reviewed-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Just a small fix to make sparse happy.

    Signed-off-by: Michel Lespinasse
    Reported-by: Fengguang Wu
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • When looking to fetch a node's sibling, we went through a sequence of:
    - check if node is the parent's left child
    - if it is, then fetch the parent's right child

    This can be replaced with:
    - fetch the parent's right child as an assumed sibling
    - check that node is NOT the fetched child

    This avoids fetching the parent's left child when node is actually
    that child. Saves a bit on code size, though it doesn't seem to make
    a large difference in speed.
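    In code, the change amounts to something like this sketch (the helper
    name is made up; the real code does this inline):

        #include <linux/rbtree.h>

        static struct rb_node *fetch_sibling(struct rb_node *parent,
                                             struct rb_node *node)
        {
                struct rb_node *sibling = parent->rb_right; /* assumed */

                if (node == sibling)    /* node was the right child */
                        sibling = parent->rb_left;
                return sibling;
        }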

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Acked-by: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Set the comment and indentation style to be consistent with the Linux
    coding style and the rest of the file, as suggested by Peter Zijlstra.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In __rb_erase_color(), we often already have pointers to the nodes being
    rotated and/or know what their colors must be, so we can generate more
    efficient code than the generic __rb_rotate_left() and __rb_rotate_right()
    functions.

    Also when the current node is red or when flipping the sibling's color,
    the parent is already known so we can use the more efficient
    rb_set_parent_color() function to set the desired color.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In __rb_erase_color(), we have to select one of 3 cases depending on the
    color on the 'other' node children. If both children are black, we flip a
    few node colors and iterate. Otherwise, we do either one or two tree
    rotations, depending on the color of the 'other' child opposite to 'node',
    and then we are done.

    The corresponding logic had duplicate checks for the color of the 'other'
    child opposite to 'node'. It was checking it first to determine if both
    children are black, and then to determine how many tree rotations are
    required. Rearrange the logic to avoid that extra check.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • In __rb_erase_color(), we were always setting a node to black after
    exiting the main loop. And in one case, after fixing up the tree to
    satisfy all rbtree invariants, we were setting the current node to root
    just to guarantee a loop exit, at which point the root would be set to
    black. However this is not necessary, as the root of an rbtree is already
    known to be black. The only case where the color flip is required is when
    we exit the loop due to the current node being red, and it's easiest to
    just do the flip at that point instead of doing it after the loop.

    [adrian.hunter@intel.com: perf tools: fix build for another rbtree.c change]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Adrian Hunter
    Cc: Alexander Shishkin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • - Use the newly introduced rb_set_parent_color() function to flip the color
    of nodes whose parent is already known.
    - Optimize rb_parent() when the node is known to be red - there is no need
    to mask out the color in that case.
    - Flipping gparent's color to red requires us to fetch its rb_parent_color
    field, so we can reuse it as the parent value for the next loop iteration.
    - Do not use __rb_rotate_left() and __rb_rotate_right() to handle tree
    rotations: we already have pointers to all relevant nodes, and know their
    colors (either because we want to adjust it, or because we've tested it,
    or we can deduce it as black due to the node's proximity to a known red node).
    So we can generate more efficient code by making use of the node pointers
    we already have, and setting both the parent and color attributes for
    nodes all at once. Also in Case 2, some node attributes don't have to
    be set because we know another tree rotation (Case 3) will always follow
    and override them.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • The root node of an rbtree must always be black. However,
    rb_insert_color() only needs to maintain this invariant when it has been
    broken - that is, when it exits the loop due to the current (red) node
    being the root. In all other cases (exiting after tree rotations, or
    exiting due to an existing black parent) the invariant is already
    satisfied, so there is no need to adjust the root node color.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • It is a well known property of rbtrees that insertion never requires more
    than two tree rotations. In our implementation, after one loop iteration
    identified one or two necessary tree rotations, we would iterate and look
    for more. However at that point the node's parent would always be black,
    which would cause us to exit the loop.

    We can make the code flow more obvious by just adding a break statement
    after the tree rotations, where we know we are done. Additionally, in the
    cases where two tree rotations are necessary, we don't have to update the
    'node' pointer as it wouldn't be used until the next loop iteration, which
    we now avoid due to this break statement.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • This small module helps measure the performance of rbtree insert and
    erase.

    Additionally, we run a few correctness tests to check that the rbtrees
    have all the desired properties:

    - they contain the right number of nodes, in the desired order,
    - no path ever has two consecutive red nodes,
    - all paths to leaf nodes have the same number of black nodes,
    - the root node is black.
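    A sketch of how such checks can be written. This peeks at the internal
    __rb_parent_color encoding (low bit set means black), as the test module
    itself must; the helper names are illustrative:

        #include <linux/bug.h>
        #include <linux/rbtree.h>

        static int node_is_red(struct rb_node *rb)
        {
                return !(rb->__rb_parent_color & 1);
        }

        /* Returns the black height; WARN_ONs on invariant violations. */
        static int check_subtree(struct rb_node *rb)
        {
                int lh, rh;

                if (!rb)
                        return 1;       /* NULL leaves count as black */
                lh = check_subtree(rb->rb_left);
                rh = check_subtree(rb->rb_right);
                WARN_ON(lh != rh);      /* equal black counts on all paths */
                if (node_is_red(rb)) {  /* red nodes have black children */
                        WARN_ON(rb->rb_left && node_is_red(rb->rb_left));
                        WARN_ON(rb->rb_right && node_is_red(rb->rb_right));
                }
                return lh + !node_is_red(rb);
        }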

    [akpm@linux-foundation.org: fix printk warning: sparc64 cycles_t is unsigned long]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • rbtree users must use the documented APIs to manipulate the tree
    structure. Low-level helpers to manipulate node colors and parenthood are
    not part of that API, so move them to lib/rbtree.c

    [dwmw2@infradead.org: fix jffs2 build issue due to renamed __rb_parent_color field]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Signed-off-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Empty nodes have no color. We can make use of this property to simplify
    the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros. Also,
    we can get rid of the rb_init_node function which had been introduced by
    commit 88d19cf37952 ("timers: Add rb_init_node() to allow for stack
    allocated rb nodes") to avoid some issue with the empty node's color not
    being initialized.
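    The intended usage pattern is simple (a sketch; the struct and function
    names are made up):

        #include <linux/rbtree.h>

        struct item {
                struct rb_node node;
        };

        static void item_init(struct item *it)
        {
                /* marks the node as "not in any tree": the parent
                 * pointer is aimed at the node itself */
                RB_CLEAR_NODE(&it->node);
        }

        static void item_remove(struct item *it, struct rb_root *root)
        {
                if (!RB_EMPTY_NODE(&it->node)) {  /* only if inserted */
                        rb_erase(&it->node, root);
                        RB_CLEAR_NODE(&it->node);
                }
        }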

    I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
    doing there, though. axboe introduced them in commit 10fd48f2376d
    ("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev"). The way I
    see it, the 'empty node' abstraction is only used by rbtree users to
    flag nodes that they haven't inserted in any rbtree, so asking the
    predecessor or successor of such nodes doesn't make any sense.

    One final rb_init_node() caller was recently added in sysctl code to
    implement faster sysctl name lookups. This code doesn't make use of
    RB_EMPTY_NODE at all, and from what I could see it only called
    rb_init_node() under the mistaken assumption that such initialization was
    required before node insertion.

    [sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Acked-by: David Woodhouse
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Jens Axboe
    Cc: "Eric W. Biederman"
    Cc: John Stultz
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Introduce HAVE_DEBUG_BUGVERBOSE config option and select it in
    corresponding architecture Kconfig files. Architectures that already
    select GENERIC_BUG don't need to select HAVE_DEBUG_BUGVERBOSE.

    Signed-off-by: Catalin Marinas
    Acked-by: Geert Uytterhoeven
    Cc: David Howells
    Cc: Hirokazu Takata
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     
  • Introduce HAVE_DEBUG_KMEMLEAK config option and select it in corresponding
    architecture Kconfig files. DEBUG_KMEMLEAK now only depends on
    HAVE_DEBUG_KMEMLEAK.

    Signed-off-by: Catalin Marinas
    Cc: Russell King
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

08 Oct, 2012

1 commit

  • Pull IOMMU updates from Joerg Roedel:
    "This time the IOMMU updates contain a bunch of fixes and cleanups to
    various IOMMU drivers and the DMA debug code. New features are the
    code for IRQ remapping support with the AMD IOMMU (preparation for
    that was already merged in the last release) and a debugfs interface
    to export some statistics in the NVidia Tegra IOMMU driver."

    * tag 'iommu-updates-v3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (27 commits)
    iommu/amd: Remove obsolete comment line
    dma-debug: Remove local BUS_NOTIFY_UNBOUND_DRIVER define
    iommu/amd: Fix possible use after free in get_irq_table()
    iommu/amd: Report irq remapping through IOMMU-API
    iommu/amd: Print message to system log when irq remapping is enabled
    iommu/irq: Use amd_iommu_irq_ops if supported
    iommu/amd: Make sure irq remapping still works on dma init failure
    iommu/amd: Add initialization routines for AMD interrupt remapping
    iommu/amd: Add call-back routine for HPET MSI
    iommu/amd: Implement MSI routines for interrupt remapping
    iommu/amd: Add IOAPIC remapping routines
    iommu/amd: Add routines to manage irq remapping tables
    iommu/amd: Add IRTE invalidation routine
    iommu/amd: Make sure IOMMU is not considered to translate itself
    iommu/amd: Split device table initialization into irq and dma part
    iommu/amd: Check if IOAPIC information is correct
    iommu/amd: Allocate data structures to keep track of irq remapping tables
    iommu/amd: Add slab-cache for irq remapping tables
    iommu/amd: Keep track of HPET and IOAPIC device ids
    iommu/amd: Fix features reporting
    ...

    Linus Torvalds
     

06 Oct, 2012

11 commits

  • Fix the warning:

    WARNING: vmlinux.o(.text+0x14cfd8): Section mismatch in reference from the variable compressed_formats to the function .init.text:gunzip()
    The function compressed_formats() references
    the function __init gunzip().
    etc..

    Within decompress.c, compressed_formats[] needs an __initdata annotation,
    because some of its members refer to functions which will be unloaded
    after init.

    Consequently, its user decompress_method() will get the __init prefix.
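    The general pattern is this (a sketch with hypothetical names, standing
    in for compressed_formats[]/decompress_method()): a table that stores
    pointers into .init.text must itself live in .init.data, and may then
    only be consulted by __init code.

        #include <linux/init.h>

        static int __init gunzip_like(void)     /* lives in .init.text */
        {
                return 0;
        }

        struct handler {
                const char *name;
                int (*fn)(void);
        };

        /* __initdata, because the fn members point at __init functions */
        static struct handler handlers[] __initdata = {
                { "gzip", gunzip_like },
        };

        /* consequently the accessor must be __init as well */
        static int __init pick_handler(void)
        {
                return handlers[0].fn();
        }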

    Signed-off-by: Hein Tibosch
    Cc: Albin Tonnerre
    Cc: Phillip Lougher
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hein Tibosch
     
    The SG mapping iterator with SG_MITER_ATOMIC set required IRQs to be
    disabled because it originally used KM_BIO_SRC_IRQ to allow use from IRQ
    handlers. kmap_atomic() has long been updated to handle stacking of
    atomic mappings on a per-cpu basis, and only requires not sleeping while
    mapped.

    Update sg_mapping_iter such that atomic iterators only require disabling
    preemption instead of disabling IRQ.

    While at it, convert the weird @ARG@ notations to @ARG in the comment of
    sg_miter_start().
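    A typical atomic iteration, for reference (a sketch; the copy helper is
    made up): after this change the loop body must merely not sleep, rather
    than run with IRQs disabled.

        #include <linux/kernel.h>
        #include <linux/scatterlist.h>
        #include <linux/string.h>

        static size_t copy_from_sg(struct scatterlist *sgl,
                                   unsigned int nents, void *buf, size_t len)
        {
                struct sg_mapping_iter miter;
                size_t copied = 0;

                sg_miter_start(&miter, sgl, nents,
                               SG_MITER_ATOMIC | SG_MITER_FROM_SG);
                while (copied < len && sg_miter_next(&miter)) {
                        size_t n = min(miter.length, len - copied);

                        /* miter.addr is kmap_atomic'd: don't sleep here */
                        memcpy(buf + copied, miter.addr, n);
                        copied += n;
                }
                sg_miter_stop(&miter);
                return copied;
        }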

    Signed-off-by: Tejun Heo
    Cc: Maxim Levitsky
    Cc: Alex Dubov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • They show up in dmesg

    [ 4.041094] start plist test
    [ 4.045804] end plist test

    without much meaning, so hide them behind the debug loglevel.

    Signed-off-by: Borislav Petkov
    Cc: Lai Jiangshan
    Acked-by: Steven Rostedt
    Cc: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • Xen's pciback points out a couple of deficiencies with vsscanf()'s
    standard conformance:

    - Trailing character matching cannot be checked by the caller: With a
    format string of "(%x:%x.%x) %n" absence of the closing parenthesis
    cannot be checked, as input of "(00:00.0)" doesn't cause the %n to be
    evaluated (because of the code not skipping white space before the
    trailing %n).

    - The parameter corresponding to a trailing %n could get filled even if
    there was a matching error: With a format string of "(%x:%x.%x)%n",
    input of "(00:00.0]" would still fill the respective variable pointed to
    (and hence again make the mismatch non-detectable by the caller).

    This patch aims at fixing those, but leaves other non-conforming aspects
    of it untouched, among them these possibly relevant ones:

    - improper handling of the assignment suppression character '*' (blindly
    discarding all succeeding non-white space from the format and input
    strings),

    - not honoring conversion specifiers for %n,

    - not recognizing the C99 conversion specifier 't' (recognized by
    vsprintf()).
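    With the fix, a caller can use the pattern from the first example above
    to reject trailing garbage (a sketch; the function name is made up):

        #include <linux/errno.h>
        #include <linux/kernel.h>

        static int parse_bdf(const char *buf)
        {
                int domain, bus, slot, parsed = 0;

                /* %n is now only stored once everything before it,
                 * including the ')', actually matched */
                if (sscanf(buf, "(%x:%x.%x) %n",
                           &domain, &bus, &slot, &parsed) != 3 ||
                    parsed == 0 || buf[parsed] != '\0')
                        return -EINVAL; /* e.g. "(00:00.0]" is rejected */
                return 0;
        }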

    Signed-off-by: Jan Beulich
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • The logic in do_raw_spin_lock() attempts to acquire a spinlock by invoking
    arch_spin_trylock() in a loop with a delay between each attempt. Now
    consider the following situation in a 2 CPU system:

    1. CPU-0 continually acquires and releases a spinlock in a
    tight loop; it stays in this loop until some condition X
    is satisfied. X can only be satisfied by another CPU.

    2. CPU-1 tries to acquire the same spinlock, in an attempt
    to satisfy the aforementioned condition X. However, it
    never sees the unlocked value of the lock because the
    debug spinlock code uses trylock instead of just lock;
    it checks at all the wrong moments - whenever CPU-0 has
    locked the lock.

    Now in the absence of debug spinlocks, the architecture specific spinlock
    code can correctly allow CPU-1 to wait in a "queue" (e.g., ticket
    spinlocks), ensuring that it acquires the lock at some point. However,
    with the debug spinlock code, livelock can easily occur due to the use of
    try_lock, which obviously cannot put the CPU in that "queue". This
    queueing mechanism is implemented in both x86 and ARM spinlock code.

    Note that the situation mentioned above is not hypothetical. A real
    problem was encountered where CPU-0 was running hrtimer_cancel with
    interrupts disabled, and CPU-1 was attempting to run the hrtimer that
    CPU-0 was trying to cancel.

    Address this by actually attempting arch_spin_lock once it is suspected
    that there is a spinlock lockup. If we're in a situation that is
    described above, the arch_spin_lock should succeed; otherwise other
    timeout mechanisms (e.g., watchdog) should alert the system of a lockup.
    Therefore, if there is a genuine system problem and the spinlock can't be
    acquired, the end result (irrespective of this change being present) is
    the same. If there is a livelock caused by the debug code, this change
    will allow the lock to be acquired, depending on the implementation of the
    lower level arch specific spinlock code.
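    A sketch of the resulting flow in lib/spinlock_debug.c (spin_dump() is
    that file's existing lockup reporting helper):

        #include <linux/delay.h>
        #include <linux/spinlock.h>

        static void __spin_lock_debug(raw_spinlock_t *lock)
        {
                u64 i;
                u64 loops = loops_per_jiffy * HZ;

                for (i = 0; i < loops; i++) {
                        if (arch_spin_trylock(&lock->raw_lock))
                                return;
                        __delay(1);
                }
                /* lockup suspected: report it, then queue up for the lock
                 * via arch_spin_lock() instead of spinning on trylock */
                spin_dump(lock, "lockup suspected");
                arch_spin_lock(&lock->raw_lock);
        }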

    [akpm@linux-foundation.org: tweak comment]
    Signed-off-by: Vikram Mulukutla
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vikram Mulukutla
     
  • Permit use of another algorithm than the default first-fit one. For
    example, a custom algorithm could be used to manage alignment
    requirements.

    As I can't predict all the possible requirements/needs for all allocation
    use cases, I add a "free" field 'void *data' to pass any needed
    information to the allocation function. For example, 'data' could point
    to a structure where you store the alignment, the expected memory bank,
    the requesting device, or any information that could influence the
    allocation algorithm.
    A usage example may look like this:

        struct my_pool_constraints {
                int align;
                int bank;
                ...
        };

        unsigned long my_custom_algo(unsigned long *map, unsigned long size,
                                     unsigned long start, unsigned int nr,
                                     void *data)
        {
                struct my_pool_constraints *constraints = data;
                ...
                deal with allocation constraints
                ...
                return the index in the bitmap where to perform the allocation
        }

        void create_my_pool()
        {
                struct my_pool_constraints c;
                struct gen_pool *pool = gen_pool_create(...);

                gen_pool_add(pool, ...);
                gen_pool_set_algo(pool, my_custom_algo, &c);
        }

    Also add a best-fit algorithm function: most of the time best-fit is
    slower than first-fit, but memory fragmentation is lower. The random
    buffer allocation/free tests don't show any arithmetic relation between
    allocation time and fragmentation, but the best-fit algorithm is
    sometimes able to perform the allocation when first-fit can't.

    This new algorithm helps to remove static allocations on ESRAM, a small
    but fast on-chip RAM of a few KB, used for high-performance use cases
    like DMA linked lists, graphics accelerators, and encoders/decoders. On
    the Ux500 (in the ARM tree) we have defined 5 ESRAM banks of 128 KB each,
    and the use of static allocations has become unmaintainable:
    cd arch/arm/mach-ux500 && grep -r ESRAM .
    ./include/mach/db8500-regs.h:/* Base address and bank offsets for ESRAM */
    ./include/mach/db8500-regs.h:#define U8500_ESRAM_BASE 0x40000000
    ./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK_SIZE 0x00020000
    ./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK0 U8500_ESRAM_BASE
    ./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK1 (U8500_ESRAM_BASE + U8500_ESRAM_BANK_SIZE)
    ./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK2 (U8500_ESRAM_BANK1 + U8500_ESRAM_BANK_SIZE)
    ./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK3 (U8500_ESRAM_BANK2 + U8500_ESRAM_BANK_SIZE)
    ./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK4 (U8500_ESRAM_BANK3 + U8500_ESRAM_BANK_SIZE)
    ./include/mach/db8500-regs.h:#define U8500_ESRAM_DMA_LCPA_OFFSET 0x10000
    ./include/mach/db8500-regs.h:#define U8500_DMA_LCPA_BASE
    (U8500_ESRAM_BANK0 + U8500_ESRAM_DMA_LCPA_OFFSET)
    ./include/mach/db8500-regs.h:#define U8500_DMA_LCLA_BASE U8500_ESRAM_BANK4

    I want to use genalloc to do dynamic allocations, but I need to be able
    to fine-tune the allocation algorithm. In my case the best-fit algorithm
    gives better results than first-fit, but that will not be true for every
    use case.

    Signed-off-by: Benjamin Gaignard
    Cc: Huang Ying
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Gaignard
     
  • Account for all properties when a and/or b are 0:
    gcd(0, 0) = 0
    gcd(a, 0) = a
    gcd(0, b) = b

    Fixes no known problems in current kernels.
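    A sketch of a gcd() that honors all three identities, in the spirit of
    the fix: swap so that a >= b, then bail out before the Euclid loop when
    b is zero.

        #include <linux/kernel.h>

        unsigned long gcd(unsigned long a, unsigned long b)
        {
                unsigned long r;

                if (a < b)
                        swap(a, b);
                if (!b)         /* covers gcd(a, 0) = a and gcd(0, 0) = 0;
                                 * gcd(0, b) was swapped into gcd(b, 0) */
                        return a;
                while ((r = a % b) != 0) {
                        a = b;
                        b = r;
                }
                return b;
        }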

    Signed-off-by: Davidlohr Bueso
    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The main option should not appear in the resulting .config when the
    dependencies aren't met (i.e. use "depends on" rather than directly
    setting the default from the combined dependency values).

    The sub-options should depend on the main option rather than a more
    generic higher level one.

    Signed-off-by: Jan Beulich
    Acked-by: Don Zickus
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • The result of converting an integer value to another signed integer type
    that's unable to represent the original value is implementation defined.
    (See notes in section 6.3.1.3 of the C standard.)

    In match_number(), the result of simple_strtol() (which returns type long)
    is assigned to a variable of type int.

    Instead, handle the result of simple_strtol() in a well-defined way, and
    return -ERANGE if the result won't fit in the int variable used to hold
    the parsed result.

    No current callers pay attention to the particular error value returned,
    so this additional return code shouldn't do any harm.
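    The essence of the check is narrowing only after an explicit range test
    (a sketch with a made-up helper name, not the exact lib/parser.c code):

        #include <linux/errno.h>
        #include <linux/kernel.h>

        static int parse_int(const char *buf, int base, int *result)
        {
                char *endp;
                long val = simple_strtol(buf, &endp, base);

                if (endp == buf)
                        return -EINVAL; /* no digits were consumed */
                if (val < (long)INT_MIN || val > (long)INT_MAX)
                        return -ERANGE; /* won't fit in the int result */
                *result = (int)val;     /* now a well-defined conversion */
                return 0;
        }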

    [akpm@linux-foundation.org: coding-style tweaks]
    Signed-off-by: Alex Elder
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • To avoid name conflicts:

    drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined

    While at it, also make the other names more consistent and add
    parentheses.

    [akpm@linux-foundation.org: repair fallout]
    [sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change]
    Signed-off-by: Fengguang Wu
    Cc: Bernd Petrovitsch
    Cc: walter harms
    Cc: Glauber Costa
    Signed-off-by: Stephen Rothwell
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     
  • Acked-by: Andrei Emeltchenko
    Signed-off-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko