21 Oct, 2020

1 commit

  • Pull XArray updates from Matthew Wilcox:

    - Fix the test suite after introduction of the local_lock

    - Fix a bug in the IDA spotted by Coverity

    - Change the API that allows the workingset code to delete a node

    - Fix xas_reload() when dealing with entries that occupy multiple
    indices

    - Add a few more tests to the test suite

    - Fix an unsigned int being shifted into an unsigned long

    * tag 'xarray-5.9' of git://git.infradead.org/users/willy/xarray:
    XArray: Fix xas_create_range for ranges above 4 billion
    radix-tree: fix the comment of radix_tree_next_slot()
    XArray: Fix xas_reload for multi-index entries
    XArray: Add private interface for workingset node deletion
    XArray: Fix xas_for_each_conflict documentation
    XArray: Test marked multiorder iterations
    XArray: Test two more things about xa_cmpxchg
    ida: Free allocated bitmap in error path
    radix tree test suite: Fix compilation

    Linus Torvalds
     

17 Oct, 2020

1 commit

  • The documentation for these functions indicates that callers don't need to
    hold a lock while calling them, but that documentation is only in one
    place under "IDA Usage". Let's state the same information on each IDA
    function so that it's clear what the calling context requires.
    Furthermore, let's document ida_simple_get() with the same information so
    that callers know how this API works.

    Signed-off-by: Stephen Boyd
    Signed-off-by: Andrew Morton
    Reviewed-by: Greg Kroah-Hartman
    Cc: Tri Vo
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Link: https://lkml.kernel.org/r/20200910055246.2297797-1-swboyd@chromium.org
    Signed-off-by: Linus Torvalds

    Stephen Boyd
     

07 Oct, 2020

1 commit

    If a bitmap needs to be allocated, and by the time the thread is
    scheduled to run again all the indices which would satisfy the
    allocation have themselves been allocated, then we leak the bitmap.
    Almost impossible to hit in practice, but a trivial fix. Found by
    Coverity.

    Fixes: f32f004cddf8 ("ida: Convert to XArray")
    Reported-by: coverity-bot
    Reviewed-by: Kees Cook
    Signed-off-by: Matthew Wilcox (Oracle)

    Matthew Wilcox (Oracle)
     

02 Nov, 2019

1 commit

  • Commit 5c089fd0c734 ("idr: Fix idr_get_next race with idr_remove")
    neglected to fix idr_get_next_ul(). As far as I can tell, nobody's
    actually using this interface under the RCU read lock, but fix it now
    before anybody decides to use it.

    Fixes: 5c089fd0c734 ("idr: Fix idr_get_next race with idr_remove")
    Signed-off-by: Matthew Wilcox (Oracle)

    Matthew Wilcox (Oracle)
     

03 Jun, 2019

1 commit

  • If the entry is deleted from the IDR between the call to
    radix_tree_iter_find() and rcu_dereference_raw(), idr_get_next()
    will return NULL, which will end the iteration prematurely. We should
    instead continue to the next entry in the IDR. This only happens if the
    iteration is protected by the RCU lock. Most IDR users use a spinlock
    or semaphore to exclude simultaneous modifications. It was noticed once
    the PID allocator was converted to use the IDR, as it uses the RCU lock,
    but there may be other users elsewhere in the kernel.

    We can't use the normal pattern of calling radix_tree_deref_retry()
    (which catches both a retry entry in a leaf node and a node entry in
    the root) as the IDR supports storing entries which are unaligned,
    which will trigger an infinite loop if they are encountered. Instead,
    we have to explicitly check whether the entry is a retry entry.
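
    Why the generic retry check cannot be used here can be sketched in a few
    lines of userspace C. The bit values are illustrative of the radix-tree
    encoding of that era, not taken from the source:

```c
#include <assert.h>

/* A generic "is this a retry/internal entry?" test keys off low tag
 * bits, so it misfires on the unaligned (odd-valued) pointers the IDR
 * is allowed to store, and the iteration would retry forever.
 * Comparing against the one designated retry entry avoids that.
 * Illustrative values only. */

#define RETRY_ENTRY 0x1ul	/* the single designated retry entry */

static int looks_like_retry(unsigned long entry)
{
	return entry & 1;		/* generic tag-bit check */
}

static int is_retry(unsigned long entry)
{
	return entry == RETRY_ENTRY;	/* explicit check */
}
```

    An unaligned pointer such as 0x1003 passes the generic check but fails
    the explicit one, which is the distinction the fix relies on.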

    Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
    Reported-by: Brendan Gregg
    Tested-by: Brendan Gregg
    Signed-off-by: Matthew Wilcox (Oracle)

    Matthew Wilcox (Oracle)
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 Oct, 2018

3 commits

  • The only user of this functionality was the workingset code, and it's
    now been converted to the XArray. Remove __radix_tree_delete_node()
    entirely as it was also only used by the workingset code.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • Use the XA_TRACK_FREE ability to track which entries have a free bit,
    similarly to how it uses the radix tree's IDR_FREE tag. This eliminates
    the per-cpu ida_bitmap preload, and fixes the memory consumption
    regression I introduced when making the IDR able to store any pointer.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • This is a direct replacement for struct radix_tree_root. Some of the
    struct members have changed name; convert those, and use a #define so
    that radix_tree users continue to work without change.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Josef Bacik

    Matthew Wilcox
     

30 Sep, 2018

2 commits

  • Introduce xarray value entries and tagged pointers to replace radix
    tree exceptional entries. This is a slight change in encoding to allow
    the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
    value entry). It is also a change in emphasis; exceptional entries are
    intimidating and different. As the comment explains, you can choose
    to store values or pointers in the xarray and they are both first-class
    citizens.
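
    The encoding described above can be sketched in userspace; this models
    the xa_mk_value()/xa_to_value() scheme (shift left by one, tag bit 0)
    and is illustrative rather than the kernel source:

```c
#include <assert.h>

/* Model of the value-entry encoding: a value is shifted left by one
 * and tagged with bit 0, so a value entry carries BITS_PER_LONG - 1
 * bits and can never be confused with a word-aligned pointer, whose
 * bit 0 is always clear. */

static unsigned long xa_mk_value(unsigned long v)
{
	return (v << 1) | 1;
}

static int xa_is_value(unsigned long entry)
{
	return entry & 1;
}

static unsigned long xa_to_value(unsigned long entry)
{
	return entry >> 1;
}
```

    A word-aligned pointer such as 0x1000 fails the xa_is_value() test,
    which is what lets values and pointers coexist as first-class entries.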

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Josef Bacik

    Matthew Wilcox
     
  • An upcoming change to the encoding of internal entries will set the bottom
    two bits to 0b10. Unfortunately, m68k only aligns some data structures
    to 2 bytes, so the IDR will interpret them as internal entries and things
    will go badly wrong.

    Change the radix tree so that it stops either when the node indicates
    that it's the bottom of the tree (shift == 0) or when the entry is not an
    internal entry. This means we cannot insert an arbitrary kernel pointer
    as a multiorder entry, but the IDR does not permit multiorder entries.

    Annoyingly, this means the IDR can no longer take advantage of the radix
    tree's ability to store a single entry at offset 0 without allocating
    memory. A pointer which is 2-byte aligned cannot be stored directly in
    the root as it would be indistinguishable from a node, so we must allocate
    a node in order to store a 2-byte pointer at index 0. The idr_replace()
    function does not take a GFP flags argument, so cannot allocate memory.
    If a user inserts a 4-byte aligned pointer at index 0 and then replaces
    it with a 2-byte aligned pointer, we must be able to store it.

    Arbitrary pointer values are still not permitted; pointers of the
    form 2 + (i * 4) for values of i between 0 and 1023 are reserved for
    the implementation. These are not valid kernel pointers as they would
    point into the zero page.
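
    The alignment conflict is visible in the internal-entry check itself
    (bottom two bits 0b10). The check below is modeled on that description;
    the addresses are made up:

```c
#include <assert.h>

/* An internal entry has its bottom two bits set to 0b10. A pointer
 * that is only 2-byte aligned can end in the same bit pattern, so on
 * m68k it would be misread as an internal entry; 4-byte-aligned
 * pointers are safe. Modeled check, illustrative addresses. */

static int xa_is_internal(unsigned long entry)
{
	return (entry & 3) == 2;
}
```

    A 2-byte-aligned address like 0x1002 is indistinguishable from an
    internal entry by this test, which is why such pointers cannot live in
    the root.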

    This change does cause a runtime memory consumption regression for
    the IDA. I will recover that later.

    Signed-off-by: Matthew Wilcox
    Tested-by: Guenter Roeck

    Matthew Wilcox
     

22 Aug, 2018

4 commits

  • This calling convention makes more sense for the implementation as well
    as the callers. It even shaves 32 bytes off the compiled code size.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • Delete ida_pre_get(), ida_get_new(), ida_get_new_above() and ida_remove()
    from the public API. Some of these functions still exist as internal
    helpers, but they should not be called by consumers.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • Add ida_alloc(), ida_alloc_min(), ida_alloc_max(), ida_alloc_range()
    and ida_free(). The ida_alloc_max() and ida_alloc_range() functions
    differ from ida_simple_get() in that they take an inclusive 'max'
    parameter instead of an exclusive 'end' parameter. Callers are about
    evenly split whether they'd like inclusive or exclusive parameters and
    'max' is easier to document than 'end'.

    Change the IDA allocation to first attempt to allocate a bit using
    existing memory, and only allocate memory afterwards. Also change the
    behaviour of 'min' > INT_MAX from being a BUG() to returning -ENOSPC.

    Leave compatibility wrappers in place for ida_simple_get() and
    ida_simple_remove() to avoid changing all callers.
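
    The parameter conversion the compatibility wrapper implies can be
    sketched as follows: the exclusive 'end' becomes an inclusive 'max' via
    end - 1, and with unsigned arithmetic the legacy "end == 0 means no
    limit" case wraps to UINT_MAX. Illustrative helper, not the kernel's
    definition:

```c
#include <assert.h>
#include <limits.h>

/* Convert an exclusive 'end' (ida_simple_get style) into an inclusive
 * 'max' (ida_alloc_range style). end == 0 historically meant "no
 * limit"; unsigned wraparound turns it into UINT_MAX, i.e. no
 * effective upper bound. Sketch only. */

static unsigned int exclusive_end_to_inclusive_max(unsigned int end)
{
	return end - 1;	/* 0 wraps to UINT_MAX: no limit */
}
```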

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • The user has no need to handle locking between ida_simple_get() and
    ida_simple_remove(). They shouldn't be forced to think about whether
    ida_destroy() might be called at the same time as any of their other
    IDA manipulation calls. Improve the documentation while I'm in here.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     

08 Jun, 2018

1 commit

  • Improve the scalability of the IDA by using the per-IDA xa_lock rather
    than the global simple_ida_lock. IDAs are not typically used in
    performance-sensitive locations, but since we have this lock anyway, we
    can use it. It is also a step towards converting the IDA from the radix
    tree to the XArray.

    [akpm@linux-foundation.org: idr.c needs xarray.h]
    Link: http://lkml.kernel.org/r/20180331125332.GF13332@bombadil.infradead.org
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Andrew Morton
    Cc: Rasmus Villemoes
    Cc: Daniel Vetter
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

27 Feb, 2018

1 commit

  • Khalid reported that the kernel selftests are currently failing:

    selftests: test_bpf.sh
    ========================================
    test_bpf: [FAIL]
    not ok 1..8 selftests: test_bpf.sh [FAIL]

    He bisected it to 6ce711f2750031d12cec91384ac5cfa0a485b60a ("idr: Make
    1-based IDRs more efficient").

    The root cause is doing a signed comparison in idr_alloc_u32() instead
    of an unsigned comparison. I went looking for any similar problems and
    found a couple (which would each result in the failure to warn in two
    situations that aren't supposed to happen).

    I knocked up a few test-cases to prove that I was right and added them
    to the test-suite.
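
    The root cause can be reproduced in a few lines. The helpers are
    hypothetical, but the comparison mismatch is the one described (behaviour
    shown for common two's-complement targets):

```c
#include <assert.h>
#include <stdint.h>

/* A signed comparison misorders IDs above INT_MAX: converted to
 * int32_t (on two's-complement machines), such an ID becomes negative
 * and compares as less than any non-negative maximum. The unsigned
 * comparison is correct. Hypothetical helpers, not the kernel code. */

static int in_range_signed(uint32_t id, uint32_t max)
{
	return (int32_t)id <= (int32_t)max;	/* buggy for id > INT_MAX */
}

static int in_range_unsigned(uint32_t id, uint32_t max)
{
	return id <= max;			/* correct */
}
```

    For id = 0x80000000 and max = 0x7fffffff the signed variant wrongly
    reports the ID as in range.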

    Reported-by: Khalid Aziz
    Tested-by: Khalid Aziz
    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     

22 Feb, 2018

1 commit

  • As far as I can tell, the only place the per-cpu ida_bitmap is populated
    is in ida_pre_get. The pre-allocated element is stolen in two places in
    ida_get_new_above, in both cases immediately followed by a memset(0).

    Since ida_get_new_above is called with locks held, do the zeroing in
    ida_pre_get, or rather let kmalloc() do it. Also, apparently gcc
    generates ~44 bytes of code to do a memset(, 0, 128):

    $ scripts/bloat-o-meter vmlinux.{0,1}
    add/remove: 0/0 grow/shrink: 2/1 up/down: 5/-88 (-83)
    Function            old     new   delta
    ida_pre_get         115     119      +4
    vermagic             27      28      +1
    ida_get_new_above   715     627     -88

    Link: http://lkml.kernel.org/r/20180108225634.15340-1-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes
    Acked-by: Matthew Wilcox
    Cc: Eric Biggers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     

07 Feb, 2018

6 commits

  • About 20% of the IDR users in the kernel want the allocated IDs to start
    at 1. The implementation currently searches all the way down the left
    hand side of the tree, finds no free ID other than ID 0, walks all the
    way back up, and then all the way down again. This patch 'rebases' the
    ID so we fill the entire radix tree, rather than leave a gap at 0.

    Chris Wilson says: "I did the quick hack of allocating index 0 of the
    idr and that eradicated idr_get_free() from being at the top of the
    profiles for the many-object stress tests. This improvement will be
    much appreciated."
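
    The 'rebase' amounts to subtracting the base before indexing the tree
    and adding it back on the way out, so allocation fills from slot 0
    instead of leaving it permanently empty. Illustrative arithmetic only:

```c
#include <assert.h>

/* For a 1-based IDR (base == 1), ID n is stored at tree index n - 1,
 * so the left-hand side of the tree is full rather than permanently
 * skipped during every search. */

static unsigned int id_to_index(unsigned int id, unsigned int base)
{
	return id - base;
}

static unsigned int index_to_id(unsigned int index, unsigned int base)
{
	return index + base;
}
```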

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • Now that the IDR can be used to store large IDs, it is possible somebody
    might only partially convert their old code and use the iterators which
    can only handle IDs up to INT_MAX. It's probably unwise to show them a
    truncated ID, so settle for spewing warnings to dmesg, and terminating
    the iteration.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • Most places in the kernel that we need to distinguish functions by the
    type of their arguments, we use '_ul' as a suffix for the unsigned long
    variant, not '_ext'. Also add kernel-doc.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • It has no more users, so remove it. Move idr_alloc() back into idr.c,
    move the guts of idr_alloc_cmn() into idr_alloc_u32(), remove the
    wrappers around idr_get_free_cmn() and rename it to idr_get_free().
    While there is now no interface to allocate IDs larger than a u32,
    the IDR internals remain ready to handle a larger ID should a need arise.

    These changes make it possible to provide the guarantee that, if the
    nextid pointer points into the object, the object's ID will be initialised
    before a concurrent lookup can find the object.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • All current users of idr_alloc_ext() actually want to allocate a u32
    and idr_alloc_u32() fits their needs better.

    Like idr_get_next(), it uses a 'nextid' argument which serves as both
    a pointer to the start ID and the assigned ID (instead of a separate
    minimum and pointer-to-assigned-ID argument). It uses a 'max' argument
    rather than 'end' because the semantics that idr_alloc has for 'end'
    don't work well for unsigned types.

    Since idr_alloc_u32() returns an errno instead of the allocated ID, mark
    it as __must_check to help callers use it correctly. Include copious
    kernel-doc. Chris Mi has promised to contribute
    test-cases for idr_alloc_u32.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • Changing idr_replace's 'id' argument to 'unsigned long' works for all
    callers. Callers which passed a negative ID now get -ENOENT instead of
    -EINVAL. No callers relied on this error value.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     

16 Nov, 2017

1 commit

  • During truncation, the mapping has already been checked for shmem and
    dax so it's known that workingset_update_node is required.

    This patch avoids the checks on mapping for each page being truncated.
    In all other cases, a lookup helper is used to determine if
    workingset_update_node() needs to be called. The one danger is that the
    API is slightly harder to use as calling workingset_update_node directly
    without checking for dax or shmem mappings could lead to surprises.
    However, the API rarely needs to be used and hopefully the comment is
    enough to give people the hint.

    sparsetruncate (tiny)
    4.14.0-rc4 4.14.0-rc4
    oneirq-v1r1 pickhelper-v1r1
    Min Time 141.00 ( 0.00%) 140.00 ( 0.71%)
    1st-qrtle Time 142.00 ( 0.00%) 141.00 ( 0.70%)
    2nd-qrtle Time 142.00 ( 0.00%) 142.00 ( 0.00%)
    3rd-qrtle Time 143.00 ( 0.00%) 143.00 ( 0.00%)
    Max-90% Time 144.00 ( 0.00%) 144.00 ( 0.00%)
    Max-95% Time 147.00 ( 0.00%) 145.00 ( 1.36%)
    Max-99% Time 195.00 ( 0.00%) 191.00 ( 2.05%)
    Max Time 230.00 ( 0.00%) 205.00 ( 10.87%)
    Amean Time 144.37 ( 0.00%) 143.82 ( 0.38%)
    Stddev Time 10.44 ( 0.00%) 9.00 ( 13.74%)
    Coeff Time 7.23 ( 0.00%) 6.26 ( 13.41%)
    Best99%Amean Time 143.72 ( 0.00%) 143.34 ( 0.26%)
    Best95%Amean Time 142.37 ( 0.00%) 142.00 ( 0.26%)
    Best90%Amean Time 142.19 ( 0.00%) 141.85 ( 0.24%)
    Best75%Amean Time 141.92 ( 0.00%) 141.58 ( 0.24%)
    Best50%Amean Time 141.69 ( 0.00%) 141.31 ( 0.27%)
    Best25%Amean Time 141.38 ( 0.00%) 140.97 ( 0.29%)

    As you'd expect, the gain is marginal but it can be detected. The
    differences in bonnie are all within the noise which is not surprising
    given the impact on the microbenchmark.

    radix_tree_update_node_t is a callback for some radix operations that
    optionally passes in a private field. The only user of the callback is
    workingset_update_node and as it no longer requires a mapping, the
    private field is removed.

    Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Johannes Weiner
    Reviewed-by: Jan Kara
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     


14 Sep, 2017

1 commit

  • IDR only supports non-negative IDs. There used to be a 'WARN_ON_ONCE(id <
    0)' in idr_replace(), but it was intentionally removed by commit
    2e1c9b286765 ("idr: remove WARN_ON_ONCE() on negative IDs").

    Then it was added back by commit 0a835c4f090a ("Reimplement IDR and IDA
    using the radix tree"). However it seems that adding it back was a
    mistake, given that some users such as drm_gem_handle_delete()
    (DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(),
    allowing the WARN_ON_ONCE to be triggered. drm_gem_handle_delete()
    actually just wants idr_replace() to return an error code if the ID is
    not allocated, including in the case where the ID is invalid (negative).

    So once again remove the bogus WARN_ON_ONCE().

    This bug was found by syzkaller, which encountered the following
    warning:

    WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
    fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
    do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
    do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
    do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
    do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
    invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930
    RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157
    RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297
    RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78
    RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000
    R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff
    R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8
    drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297
    drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671
    drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729
    drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825
    vfs_ioctl fs/ioctl.c:45 [inline]
    do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
    SYSC_ioctl fs/ioctl.c:700 [inline]
    SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
    entry_SYSCALL_64_fastpath+0x1f/0xbe

    Here is a C reproducer:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <drm/drm.h>

    int main(void)
    {
            int cardfd = open("/dev/dri/card0", O_RDONLY);

            ioctl(cardfd, DRM_IOCTL_GEM_CLOSE,
                  &(struct drm_gem_close) { .handle = -1 } );
    }

    Link: http://lkml.kernel.org/r/20170906235306.20534-1-ebiggers3@gmail.com
    Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
    Signed-off-by: Eric Biggers
    Acked-by: Tejun Heo
    Cc: Dmitry Vyukov
    Cc: Matthew Wilcox
    Cc: [v4.11+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     

31 Aug, 2017

1 commit

  • The following new APIs are added:

    int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
                      unsigned long start, unsigned long end, gfp_t gfp);
    void *idr_remove_ext(struct idr *idr, unsigned long id);
    void *idr_find_ext(const struct idr *idr, unsigned long id);
    void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
    void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);

    Signed-off-by: Chris Mi
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Chris Mi
     

14 Feb, 2017

4 commits

  • Where we use the radix tree iteration macros, we need to annotate 'slot'
    with __rcu. Make sure we don't forget any new places in the future with
    the same CFLAGS check used for radix-tree.c.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • We can use the root entry as a bitmap and save allocating a 128 byte
    bitmap for an IDA that contains only a few entries (30 on a 32-bit
    machine, 62 on a 64-bit machine). This costs about 300 bytes of kernel
    text on x86-64, so as long as 3 IDAs fall into this category, this
    is a net win for memory consumption.
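
    The quoted capacities follow from using the root word itself as the
    bitmap: two low bits tag the word as a non-pointer (exceptional) entry,
    leaving BITS_PER_LONG - 2 bits for IDs. A sketch of that arithmetic, on
    the assumption that exactly two tag bits are reserved:

```c
#include <assert.h>

/* Bits available when the root entry doubles as a bitmap: the word
 * size minus two low bits that mark it as an exceptional entry. */

static unsigned int root_bitmap_bits(unsigned int bits_per_long)
{
	return bits_per_long - 2;
}
```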

    Thanks to Rasmus Villemoes for his work documenting the problem and
    collecting statistics on IDAs.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • When we preload the IDA, we allocate an IDA bitmap. Instead of storing
    that preallocated bitmap in the IDA, we store it in a percpu variable.
    Generally there are more IDAs in the system than CPUs, so this cuts down
    on the number of preallocated bitmaps that are unused, and about half
    of the IDA users did not call ida_destroy() so they were leaking IDA
    bitmaps.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • The IDR is very similar to the radix tree. It has some functionality that
    the radix tree did not have (alloc next free, cyclic allocation, a
    callback-based for_each, destroy tree), which is readily implementable on
    top of the radix tree. A few small changes were needed in order to use a
    tag to represent nodes with free space below them. More extensive
    changes were needed to support storing NULL as a valid entry in an IDR.
    Plain radix trees still interpret NULL as a not-present entry.

    The IDA is reimplemented as a client of the newly enhanced radix tree. As
    in the current implementation, it uses a bitmap at the last level of the
    tree.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton

    Matthew Wilcox
     

13 Dec, 2016

1 commit

  • I wanted to wrap a bunch of ida_simple_get calls into their own locking,
    until I dug around and read the original commit message. Stuff like
    this should imo be added to the kernel doc, let's do that.

    Link: http://lkml.kernel.org/r/20161027072216.20411-1-daniel.vetter@ffwll.ch
    Signed-off-by: Daniel Vetter
    Acked-by: Tejun Heo
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Vetter
     

07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access one of two watermarks lower than "min" which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites:

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.
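
    A minimal userspace model of the flag split, with illustrative bit
    positions (not the kernel's actual values); only the relationships
    mirror the text: __GFP_WAIT becomes direct reclaim plus kswapd wakeup,
    and gfpflags_allow_blocking() tests only __GFP_DIRECT_RECLAIM:

```c
#include <assert.h>

/* Illustrative bit positions; the relationships are the point. */
typedef unsigned int gfp_t;

#define __GFP_HIGH           0x1u
#define __GFP_ATOMIC         0x2u
#define __GFP_DIRECT_RECLAIM 0x4u
#define __GFP_KSWAPD_RECLAIM 0x8u

/* __GFP_WAIT is redefined as "direct reclaim + wake kswapd". */
#define __GFP_WAIT (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

/* GFP_ATOMIC: high priority, atomic, still wakes kswapd. */
#define GFP_ATOMIC (__GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM)

/* The helper callers should use instead of testing __GFP_WAIT. */
static int gfpflags_allow_blocking(gfp_t flags)
{
	return !!(flags & __GFP_DIRECT_RECLAIM);
}
```

    A mempool-backed caller that clears __GFP_DIRECT_RECLAIM but keeps
    __GFP_KSWAPD_RECLAIM is then correctly reported as non-blocking.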

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

13 Feb, 2015

1 commit

  • idr.c doesn't seem to use anything from hardirq.h (or anything included
    from that). Removing it produces identical objdump -d output, and gives
    44 fewer lines in the .idr.o.cmd dependency file.

    Signed-off-by: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     

08 Oct, 2014

1 commit

  • Pull "trivial tree" updates from Jiri Kosina:
    "Usual pile from trivial tree everyone is so eagerly waiting for"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Remove MN10300_PROC_MN2WS0038
    mei: fix comments
    treewide: Fix typos in Kconfig
    kprobes: update jprobe_example.c for do_fork() change
    Documentation: change "&" to "and" in Documentation/applying-patches.txt
    Documentation: remove obsolete pcmcia-cs from Changes
    Documentation: update links in Changes
    Documentation: Docbook: Fix generated DocBook/kernel-api.xml
    score: Remove GENERIC_HAS_IOMAP
    gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
    tty: doc: Fix grammar in serial/tty
    dma-debug: modify check_for_stack output
    treewide: fix errors in printk
    genirq: fix reference in devm_request_threaded_irq comment
    treewide: fix synchronize_rcu() in comments
    checkstack.pl: port to AArch64
    doc: queue-sysfs: minor fixes
    init/do_mounts: better syntax description
    MIPS: fix comment spelling
    powerpc/simpleboot: fix comment
    ...

    Linus Torvalds
     


09 Aug, 2014

1 commit

  • I'm working on address sanitizer project for kernel. Recently we
    started experiments with stack instrumentation, to detect out-of-bounds
    read/write bugs on stack.

    Just after booting I've hit out-of-bounds read on stack in idr_for_each
    (and in __idr_remove_all as well):

    struct idr_layer **paa = &pa[0];

    while (id >= 0 && id < fls(id)) {
    n += IDR_BITS;
    p = *--paa;
    Reviewed-by: Lai Jiangshan
    Cc: Tejun Heo
    Cc: Alexey Preobrazhensky
    Cc: Dmitry Vyukov
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

07 Jun, 2014

2 commits