17 Apr, 2020

2 commits

  • commit 7e934cf5ace1dceeb804f7493fa28bb697ed3c52 upstream.

    xas_for_each_marked() uses entry == NULL as the termination condition
    of the iteration. When xas_for_each_marked() is protected only by
    RCU, however, this can race with xas_store(xas, NULL) in the
    following way:

    TASK1                                 TASK2
    page_cache_delete()                   find_get_pages_range_tag()
                                            xas_for_each_marked()
                                              xas_find_marked()
                                                off = xas_find_chunk()

      xas_store(&xas, NULL)
        xas_init_marks(&xas);
        ...
        rcu_assign_pointer(*slot, NULL);
                                                entry = xa_entry(off);

    And thus xas_for_each_marked() terminates prematurely possibly leading
    to missed entries in the iteration (translating to missing writeback of
    some pages or a similar problem).

    If we find a NULL entry that has been marked, skip it (unless we're trying
    to allocate an entry).
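
    Under RCU-only protection, the lookup side follows the usual pattern;
    a minimal sketch of such a caller (the array, mark and process() are
    illustrative, not the exact page-cache code):

    ```c
    /* Sketch: walking marked entries under RCU, as the page cache does.
     * With this fix, a slot that is still marked but already stores NULL
     * is skipped instead of terminating the iteration early. */
    XA_STATE(xas, &array, start);
    void *entry;

    rcu_read_lock();
    xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_0) {
            process(entry);     /* may race with xas_store(&xas, NULL) */
    }
    rcu_read_unlock();
    ```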

    Reported-by: Jan Kara
    CC: stable@vger.kernel.org
    Fixes: ef8e5717db01 ("page cache: Convert delete_batch to XArray")
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Greg Kroah-Hartman

  • commit c36d451ad386b34f452fc3c8621ff14b9eaa31a6 upstream.

    Inspired by the recent Coverity report, I looked for other places where
    the offset wasn't being converted to an unsigned long before being
    shifted, and I found one in xas_pause() when the entry being paused is
    of order >32.

    Fixes: b803b42823d0 ("xarray: Add XArray iterators")
    Signed-off-by: Matthew Wilcox (Oracle)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman


08 Apr, 2020

1 commit

  • [ Upstream commit bd40b17ca49d7d110adf456e647701ce74de2241 ]

    Coverity pointed out that xas_sibling() was shifting xa_offset without
    promoting it to an unsigned long first, so the shift could cause an
    overflow and we'd get the wrong answer. The fix is obvious, and the
    new test-case provokes UBSAN to report an error:
    runtime error: shift exponent 60 is too large for 32-bit type 'int'

    Fixes: 19c30f4dd092 ("XArray: Fix xa_find_after with multi-index entries")
    Reported-by: Bjorn Helgaas
    Reported-by: Kees Cook
    Signed-off-by: Matthew Wilcox (Oracle)
    Cc: stable@vger.kernel.org
    Signed-off-by: Sasha Levin


06 Feb, 2020

1 commit

  • [ Upstream commit 82a22311b7a68a78709699dc8c098953b70e4fd2 ]

    If we were unlucky enough to call xas_pause() when the index was at
    ULONG_MAX (or a multi-slot entry which ends at ULONG_MAX), we would
    wrap the index back around to 0 and restart the iteration from the
    beginning. Use the XAS_BOUNDS state to indicate that we should just
    stop the iteration.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Sasha Levin


29 Jan, 2020

3 commits

  • commit c44aa5e8ab58b5f4cf473970ec784c3333496a2e upstream.

    If you call xas_find() with an initial index greater than max, it
    should return NULL, but it was returning the entry at that index.

    Signed-off-by: Matthew Wilcox (Oracle)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

  • commit 19c30f4dd0923ef191f35c652ee4058e91e89056 upstream.

    If the entry is of an order which is a multiple of XA_CHUNK_SIZE,
    the current detection of sibling entries does not work. Factor out
    an xas_sibling() function to make xa_find_after() a little more
    understandable, and write a new implementation that doesn't suffer from
    the same bug.

    Signed-off-by: Matthew Wilcox (Oracle)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

  • commit 430f24f94c8a174d411a550d7b5529301922e67a upstream.

    If there is an entry at ULONG_MAX, xa_for_each() will overflow the
    'index + 1' in xa_find_after() and wrap around to 0. Catch this case
    and terminate the loop by returning NULL.
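
    The overflow is easiest to see with an entry stored at the last
    possible index; a hedged sketch ('array' and 'item' are illustrative):

    ```c
    /* Sketch: without the fix, xa_find_after() computes index + 1 == 0
     * for an entry at ULONG_MAX and restarts from the beginning, so
     * this loop never terminates. */
    unsigned long index = 0;
    void *entry;

    xa_store(&array, ULONG_MAX, item, GFP_KERNEL);
    for (entry = xa_find(&array, &index, ULONG_MAX, XA_PRESENT); entry;
         entry = xa_find_after(&array, &index, ULONG_MAX, XA_PRESENT)) {
            /* each present entry is visited exactly once */
    }
    ```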

    Signed-off-by: Matthew Wilcox (Oracle)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman


02 Jul, 2019

1 commit

  • If there is only a single entry at 0, the first time we call xas_next(),
    we return the entry. Unfortunately, all subsequent times we call
    xas_next(), we also return the entry at 0 instead of noticing that the
    xa_index is now greater than zero. This broke find_get_pages_contig().

    Fixes: 64d3e9a9e0cc ("xarray: Step through an XArray")
    Reported-by: Kent Overstreet
    Signed-off-by: Matthew Wilcox (Oracle)


01 Jun, 2019

1 commit

  • Since a28334862993 ("page cache: Finish XArray conversion"), on most
    major Linux distributions, the page cache doesn't correctly transition
    when the hot data set is changing, and leaves the new pages thrashing
    indefinitely instead of kicking out the cold ones.

    On a freshly booted, freshly ssh'd into virtual machine with 1G RAM
    running stock Arch Linux:

    [root@ham ~]# ./reclaimtest.sh
    + dd of=workingset-a bs=1M count=0 seek=600
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + ./mincore workingset-a
    153600/153600 workingset-a
    + dd of=workingset-b bs=1M count=0 seek=600
    + cat workingset-b
    + cat workingset-b
    + cat workingset-b
    + cat workingset-b
    + ./mincore workingset-a workingset-b
    104029/153600 workingset-a
    120086/153600 workingset-b
    + cat workingset-b
    + cat workingset-b
    + cat workingset-b
    + cat workingset-b
    + ./mincore workingset-a workingset-b
    104029/153600 workingset-a
    120268/153600 workingset-b

    workingset-b is a 600M file on a 1G host that is otherwise entirely
    idle. No matter how often it's being accessed, it won't get cached.

    While investigating, I noticed that the non-resident information gets
    aggressively reclaimed - /proc/vmstat::workingset_nodereclaim. This is
    a problem because a workingset transition like this relies on the
    non-resident information tracked in the page cache tree of evicted
    file ranges: when the cache faults are refaults of recently evicted
    cache, we challenge the existing active set, and that allows a new
    workingset to establish itself.

    Tracing the shrinker that maintains this memory revealed that all page
    cache tree nodes were allocated to the root cgroup. This is a problem,
    because 1) the shrinker sizes the amount of non-resident information
    it keeps to the size of the cgroup's other memory and 2) on most major
    Linux distributions, only kernel threads live in the root cgroup and
    everything else gets put into services or session groups:

    [root@ham ~]# cat /proc/self/cgroup
    0::/user.slice/user-0.slice/session-c1.scope

    As a result, we basically maintain no non-resident information for the
    workloads running on the system, thus breaking the caching algorithm.

    Looking through the code, I found the culprit in the above-mentioned
    patch: when switching from the radix tree to xarray, it dropped the
    __GFP_ACCOUNT flag from the tree node allocations - the flag that
    makes sure the allocated memory gets charged to and tracked by the
    cgroup of the calling process - in this case, the one doing the fault.

    To fix this, allow xarray users to specify a per-tree flag that makes
    the xarray allocate nodes using __GFP_ACCOUNT. Then restore the page
    cache tree annotation to request such cgroup tracking for the cache
    nodes.
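
    With the patch, a tree whose nodes should be charged to the
    allocating task's cgroup opts in at initialisation time; a sketch
    using the flag this patch introduces ('mapping_pages' is
    illustrative):

    ```c
    /* Sketch: XA_FLAGS_ACCOUNT makes the xarray allocate its internal
     * nodes with __GFP_ACCOUNT, charging them to the memcg of the task
     * performing the store. */
    struct xarray mapping_pages;

    xa_init_flags(&mapping_pages, XA_FLAGS_ACCOUNT);
    ```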

    With this patch applied, the page cache correctly converges on new
    workingsets again after just a few iterations:

    [root@ham ~]# ./reclaimtest.sh
    + dd of=workingset-a bs=1M count=0 seek=600
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + cat workingset-a
    + ./mincore workingset-a
    153600/153600 workingset-a
    + dd of=workingset-b bs=1M count=0 seek=600
    + cat workingset-b
    + ./mincore workingset-a workingset-b
    124607/153600 workingset-a
    87876/153600 workingset-b
    + cat workingset-b
    + ./mincore workingset-a workingset-b
    81313/153600 workingset-a
    133321/153600 workingset-b
    + cat workingset-b
    + ./mincore workingset-a workingset-b
    63036/153600 workingset-a
    153600/153600 workingset-b

    Cc: stable@vger.kernel.org # 4.20+
    Signed-off-by: Johannes Weiner
    Reviewed-by: Shakeel Butt
    Signed-off-by: Matthew Wilcox (Oracle)


22 Feb, 2019

2 commits


21 Feb, 2019

2 commits

  • Jason feels this is clearer, and it saves a function and an exported
    symbol.

    Suggested-by: Jason Gunthorpe
    Signed-off-by: Matthew Wilcox

  • xa_cmpxchg() was a little too magic in turning ZERO entries into NULL,
    and would leave the entry set to the ZERO entry instead of releasing
    it for future use. After careful review of existing users of
    xa_cmpxchg(), change the semantics so that it does not translate either
    incoming argument from NULL into ZERO entries.

    Add several tests to the test-suite to make sure this problem doesn't
    come back.

    Reported-by: Jason Gunthorpe
    Signed-off-by: Matthew Wilcox


07 Feb, 2019

4 commits

  • This differs slightly from the IDR equivalent in five ways.

    1. It can allocate up to UINT_MAX instead of being limited to INT_MAX,
       like xa_alloc(). Also like xa_alloc(), it will write to the 'id'
       pointer before placing the entry in the XArray.
    2. The 'next' cursor is allocated separately from the XArray instead
       of being part of the IDR. This saves memory for all the users which
       do not use the cyclic allocation API and suits some users better.
    3. It returns -EBUSY instead of -ENOSPC.
    4. It will attempt to wrap back to the minimum value on memory
       allocation failure as well as on an -EBUSY error, assuming that a
       user would rather allocate a small ID than suffer an ID allocation
       failure.
    5. It reports whether it has wrapped, which is important to some users.
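
    A hedged usage sketch of the points above ('sessions' and 'session'
    are illustrative):

    ```c
    /* Sketch: cyclic allocation.  The 'next' cursor lives with the
     * caller rather than inside the XArray, and a return value of 1
     * reports that the ID space wrapped around. */
    static DEFINE_XARRAY_ALLOC(sessions);
    static u32 next_id;

    int ret;
    u32 id;

    ret = xa_alloc_cyclic(&sessions, &id, session, xa_limit_32b,
                          &next_id, GFP_KERNEL);
    if (ret < 0)
            return ret;     /* -EBUSY when full, -ENOMEM on OOM */
    if (ret == 1)
            pr_debug("ID space wrapped\n");
    /* 'id' was written before 'session' became visible in the array */
    ```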

    Signed-off-by: Matthew Wilcox

  • It was too easy to forget to initialise the start index. Add an
    xa_limit data structure which can be used to pass min & max, and
    define a couple of special values for common cases. Also add some
    more tests cribbed from the IDR test suite. Change the return value
    from -ENOSPC to -EBUSY to match xa_insert().
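
    A brief sketch of the resulting interface ('array' and 'entry' are
    illustrative):

    ```c
    /* Sketch: xa_limit bundles the minimum and maximum allocatable ID. */
    u32 id;
    int ret;

    /* predefined limits cover the common cases */
    ret = xa_alloc(&array, &id, entry, xa_limit_32b, GFP_KERNEL);
    ret = xa_alloc(&array, &id, entry, xa_limit_31b, GFP_KERNEL);

    /* or pass an explicit range */
    ret = xa_alloc(&array, &id, entry, XA_LIMIT(1, 255), GFP_KERNEL);
    if (ret == -EBUSY)
            ;       /* no free IDs in the requested range */
    ```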

    Signed-off-by: Matthew Wilcox

  • A lot of places want to allocate IDs starting at 1 instead of 0.
    While the xa_alloc() API supports this, it's not very efficient if lots
    of IDs are allocated, due to having to walk down to the bottom of the
    tree to see if ID 1 is available, then all the way over to the next
    non-allocated ID. This method marks ID 0 as being occupied which wastes
    one slot in the XArray, but preserves xa_empty() as working.
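
    The behaviour is selected when the array is defined; a sketch
    ('handles' and 'entry' are illustrative):

    ```c
    /* Sketch: XA_FLAGS_ALLOC1 starts allocation at ID 1.  Slot 0 is
     * marked busy internally, so xa_empty() still reports the array
     * as empty until a real entry is stored. */
    static DEFINE_XARRAY_ALLOC1(handles);

    u32 id;
    int ret = xa_alloc(&handles, &id, entry, xa_limit_32b, GFP_KERNEL);
    /* on success, id >= 1 */
    ```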

    Signed-off-by: Matthew Wilcox

  • Userspace translates EEXIST to "File exists" which isn't a very good
    error message for the problem. "Device or resource busy" is a better
    indication of what went wrong.

    Signed-off-by: Matthew Wilcox


05 Feb, 2019

1 commit


07 Jan, 2019

3 commits

  • xa_insert() should treat reserved entries as occupied, not as available.
    Also, it should treat requests to insert a NULL pointer as a request
    to reserve the slot. Add xa_insert_bh() and xa_insert_irq() for
    completeness.
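
    A sketch of the adjusted semantics (error handling elided; the
    occupied-slot failure is -EBUSY in later kernels, where it replaced
    -EEXIST):

    ```c
    /* Sketch: xa_insert() refuses any occupied slot, including a
     * reservation, and inserting NULL now reserves the slot. */
    int ret;

    ret = xa_insert(&array, index, NULL, GFP_KERNEL);  /* reserves index */
    ret = xa_insert(&array, index, item, GFP_KERNEL);  /* fails: occupied */
    ```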

    Signed-off-by: Matthew Wilcox

  • On m68k, statically allocated pointers may only be two-byte aligned.
    This clashes with the XArray's method for tagging internal pointers.
    Permit storing these pointers in single slots (ie not in multislots).

    Signed-off-by: Matthew Wilcox

    A regular xa_init_flags() puts all dynamically-initialised XArrays
    into the same locking class. That leads to lockdep believing that taking
    one XArray lock while holding another is a deadlock. It's possible to
    work around some of these situations with separate locking classes for
    irq/bh/regular XArrays, and SINGLE_DEPTH_NESTING, but that's ugly, and
    it doesn't work for all situations (where we have completely unrelated
    XArrays).

    Signed-off-by: Matthew Wilcox


14 Dec, 2018

1 commit

  • Specifying a starting ID greater than the maximum ID isn't something
    attempted very often, but it should fail. It was succeeding due to
    xas_find_marked() returning the wrong error state, so add tests for
    both xa_alloc() and xas_find_marked().

    Fixes: b803b42823d0 ("xarray: Add XArray iterators")
    Signed-off-by: Matthew Wilcox


17 Nov, 2018

1 commit


06 Nov, 2018

8 commits


21 Oct, 2018

9 commits

  • This version of xa_store_range() really only supports load and store.
    Our only user only needs basic load and store functionality, so there's
    no need to do the extra work to support marking and overlapping stores
    correctly yet.

    Signed-off-by: Matthew Wilcox

  • Add the optional ability to track which entries in an XArray are free
    and provide xa_alloc() to replace most of the functionality of the IDR.

    Signed-off-by: Matthew Wilcox

  • This function reserves a slot in the XArray for users which need
    to acquire multiple locks before storing their entry in the tree and
    so cannot use a plain xa_store().
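
    The intended pattern, sketched (locking details are the caller's;
    'array' and 'item' are illustrative):

    ```c
    /* Sketch: reserve now, store later.  A reserved slot reads back as
     * NULL until the real entry is stored. */
    int ret = xa_reserve(&array, index, GFP_KERNEL);
    if (ret)
            return ret;
    /* ... take whatever other locks the final store requires ... */
    xa_store(&array, index, item, GFP_KERNEL);  /* fills the reservation */
    /* or back out with xa_release(&array, index); */
    ```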

    Signed-off-by: Matthew Wilcox

  • This hopefully temporary function is useful for users who have not yet
    been converted to multi-index entries.

    Signed-off-by: Matthew Wilcox

    This iterator iterates over each entry that is stored in the index or
    indices specified by the xa_state. This is intended for use in a
    conditional store of a multi-index entry, or to allow entries which are
    about to be removed from the xarray to be disposed of properly.

    Signed-off-by: Matthew Wilcox

  • The xas_next and xas_prev functions move the xas index by one position,
    and adjust the rest of the iterator state to match it. This is more
    efficient than calling xas_set() as it keeps the iterator at the leaves
    of the tree instead of walking the iterator from the root each time.
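
    A sketch of stepping through a densely-populated range (process() is
    illustrative):

    ```c
    /* Sketch: xas_next() advances the index by one and returns the
     * entry there (NULL at a hole), staying at the leaf node instead
     * of re-walking from the root. */
    XA_STATE(xas, &array, 0);
    void *entry;

    xas_lock(&xas);
    entry = xas_load(&xas);          /* position the state at index 0 */
    while (entry) {
            process(entry);
            entry = xas_next(&xas);  /* stops at the first empty index */
    }
    xas_unlock(&xas);
    ```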

    Signed-off-by: Matthew Wilcox

  • This function frees all the internal memory allocated to the xarray
    and reinitialises it to be empty.

    Signed-off-by: Matthew Wilcox

  • The xa_extract function combines the functionality of
    radix_tree_gang_lookup() and radix_tree_gang_lookup_tagged().
    It extracts entries matching the specified filter into a normal array.
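
    A minimal sketch ('array' is illustrative):

    ```c
    /* Sketch: copy up to 16 present entries from [0, ULONG_MAX] into
     * a plain array; the return value is how many were copied. */
    void *batch[16];
    unsigned int n;

    n = xa_extract(&array, batch, 0, ULONG_MAX, ARRAY_SIZE(batch),
                   XA_PRESENT);
    /* batch[0..n-1] now hold the extracted entries */
    ```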

    Signed-off-by: Matthew Wilcox

  • The xa_for_each iterator allows the user to efficiently walk a range
    of the array, executing the loop body once for each entry in that
    range that matches the filter. This commit also includes xa_find()
    and xa_find_after() which are helper functions for xa_for_each() but
    may also be useful in their own right.

    In the xas family of functions, we have xas_for_each(), xas_find(),
    xas_next_entry(), xas_for_each_tagged(), xas_find_tagged(),
    xas_next_tagged() and xas_pause().
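
    A sketch of the iterator as first merged (it took an explicit max
    and filter; the argument list was simplified in later kernels):

    ```c
    /* Sketch: visit every present entry in the array. */
    unsigned long index;
    void *entry;

    xa_for_each(&array, entry, index, ULONG_MAX, XA_PRESENT) {
            /* 'entry' is the value stored at 'index'; the array may
             * be modified during the walk */
    }
    ```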

    Signed-off-by: Matthew Wilcox
