04 Oct, 2017

1 commit


14 Sep, 2017

1 commit

  • IDR only supports non-negative IDs. There used to be a 'WARN_ON_ONCE(id <
    0)' in idr_replace(), but it was intentionally removed by commit
    2e1c9b286765 ("idr: remove WARN_ON_ONCE() on negative IDs").

    Then it was added back by commit 0a835c4f090a ("Reimplement IDR and IDA
    using the radix tree"). However it seems that adding it back was a
    mistake, given that some users such as drm_gem_handle_delete()
    (DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(),
    allowing the WARN_ON_ONCE to be triggered. drm_gem_handle_delete()
    actually just wants idr_replace() to return an error code if the ID is
    not allocated, including in the case where the ID is invalid (negative).

    So once again remove the bogus WARN_ON_ONCE().

    This bug was found by syzkaller, which encountered the following
    warning:

    WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
    fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
    do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
    do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
    do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
    do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
    invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930
    RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157
    RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297
    RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78
    RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000
    R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff
    R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8
    drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297
    drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671
    drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729
    drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825
    vfs_ioctl fs/ioctl.c:45 [inline]
    do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
    SYSC_ioctl fs/ioctl.c:700 [inline]
    SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
    entry_SYSCALL_64_fastpath+0x1f/0xbe

    Here is a C reproducer:

    #include
    #include
    #include
    #include
    #include

    int main(void)
    {
    int cardfd = open("/dev/dri/card0", O_RDONLY);

    ioctl(cardfd, DRM_IOCTL_GEM_CLOSE,
    &(struct drm_gem_close) { .handle = -1 } );
    }

    Link: http://lkml.kernel.org/r/20170906235306.20534-1-ebiggers3@gmail.com
    Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
    Signed-off-by: Eric Biggers
    Acked-by: Tejun Heo
    Cc: Dmitry Vyukov
    Cc: Matthew Wilcox
    Cc: [v4.11+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Biggers
     

31 Aug, 2017

1 commit

  • The following new APIs are added:

    int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
    unsigned long start, unsigned long end, gfp_t gfp);
    void *idr_remove_ext(struct idr *idr, unsigned long id);
    void *idr_find_ext(const struct idr *idr, unsigned long id);
    void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
    void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);

    Signed-off-by: Chris Mi
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Chris Mi
     

14 Feb, 2017

4 commits

  • Where we use the radix tree iteration macros, we need to annotate 'slot'
    with __rcu. Make sure we don't forget any new places in the future with
    the same CFLAGS check used for radix-tree.c.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • We can use the root entry as a bitmap and save allocating a 128 byte
    bitmap for an IDA that contains only a few entries (30 on a 32-bit
    machine, 62 on a 64-bit machine). This costs about 300 bytes of kernel
    text on x86-64, so as long as 3 IDAs fall into this category, this
    is a net win for memory consumption.

    Thanks to Rasmus Villemoes for his work documenting the problem and
    collecting statistics on IDAs.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • When we preload the IDA, we allocate an IDA bitmap. Instead of storing
    that preallocated bitmap in the IDA, we store it in a percpu variable.
    Generally there are more IDAs in the system than CPUs, so this cuts down
    on the number of preallocated bitmaps that are unused, and about half
    of the IDA users did not call ida_destroy() so they were leaking IDA
    bitmaps.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • The IDR is very similar to the radix tree. It has some functionality that
    the radix tree did not have (alloc next free, cyclic allocation, a
    callback-based for_each, destroy tree), which is readily implementable on
    top of the radix tree. A few small changes were needed in order to use a
    tag to represent nodes with free space below them. More extensive
    changes were needed to support storing NULL as a valid entry in an IDR.
    Plain radix trees still interpret NULL as a not-present entry.

    The IDA is reimplemented as a client of the newly enhanced radix tree. As
    in the current implementation, it uses a bitmap at the last level of the
    tree.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton

    Matthew Wilcox
     

13 Dec, 2016

1 commit

  • I wanted to wrap a bunch of ida_simple_get calls into their own locking,
    until I dug around and read the original commit message. Stuff like
    this should imo be added to the kernel doc, let's do that.

    Link: http://lkml.kernel.org/r/20161027072216.20411-1-daniel.vetter@ffwll.ch
    Signed-off-by: Daniel Vetter
    Acked-by: Tejun Heo
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Vetter
     

07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access one of two watermarks lower than "min" which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimisitic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and was depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

13 Feb, 2015

1 commit

  • idr.c doesn't seem to use anything from hardirq.h (or anything included
    from that). Removing it produces identical objdump -d output, and gives
    44 fewer lines in the .idr.o.cmd dependency file.

    Signed-off-by: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     

08 Oct, 2014

1 commit

  • Pull "trivial tree" updates from Jiri Kosina:
    "Usual pile from trivial tree everyone is so eagerly waiting for"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Remove MN10300_PROC_MN2WS0038
    mei: fix comments
    treewide: Fix typos in Kconfig
    kprobes: update jprobe_example.c for do_fork() change
    Documentation: change "&" to "and" in Documentation/applying-patches.txt
    Documentation: remove obsolete pcmcia-cs from Changes
    Documentation: update links in Changes
    Documentation: Docbook: Fix generated DocBook/kernel-api.xml
    score: Remove GENERIC_HAS_IOMAP
    gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
    tty: doc: Fix grammar in serial/tty
    dma-debug: modify check_for_stack output
    treewide: fix errors in printk
    genirq: fix reference in devm_request_threaded_irq comment
    treewide: fix synchronize_rcu() in comments
    checkstack.pl: port to AArch64
    doc: queue-sysfs: minor fixes
    init/do_mounts: better syntax description
    MIPS: fix comment spelling
    powerpc/simpleboot: fix comment
    ...

    Linus Torvalds
     

09 Sep, 2014

1 commit


09 Aug, 2014

1 commit

  • I'm working on address sanitizer project for kernel. Recently we
    started experiments with stack instrumentation, to detect out-of-bounds
    read/write bugs on stack.

    Just after booting I've hit out-of-bounds read on stack in idr_for_each
    (and in __idr_remove_all as well):

    struct idr_layer **paa = &pa[0];

    while (id >= 0 && id < fls(id)) {
    n += IDR_BITS;
    p = *--paa;
    Reviewed-by: Lai Jiangshan
    Cc: Tejun Heo
    Cc: Alexey Preobrazhensky
    Cc: Dmitry Vyukov
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

07 Jun, 2014

6 commits

  • If "idr->hint == p" is true, it also implies "idr->hint" is true(not NULL).

    Signed-off-by: Lai Jiangshan
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • After idr subsystem is changed to RCU-awared, the free layer will not go
    to the free list. The free list will not be filled up when
    idr_remove(). So we don't need to shink it too.

    Signed-off-by: Lai Jiangshan
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • When the smaller id is not found, idr_replace() returns -ENOENT. But
    when the id is bigger enough, idr_replace() returns -EINVAL, actually
    there is no difference between these two kinds of ids.

    These are all unallocated id, the return values of the idr_replace() for
    these ids should be the same: -ENOENT.

    Signed-off-by: Lai Jiangshan
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • If the ida has at least one existing id, and when an unallocated ID
    which meets a certain condition is passed to the ida_remove(), the
    system will crash because it hits NULL pointer dereference.

    The condition is that the unallocated ID shares the same lowest idr
    layer with the existing ID, but the idr slot would be different if the
    unallocated ID were to be allocated.

    In this case the matching idr slot for the unallocated_id is NULL,
    causing @bitmap to be NULL which the function dereferences without
    checking crashing the kernel.

    See the test code:

    static void test3(void)
    {
    int id;
    DEFINE_IDA(test_ida);

    printk(KERN_INFO "Start test3\n");
    if (ida_pre_get(&test_ida, GFP_KERNEL) < 0) return;
    if (ida_get_new(&test_ida, &id) < 0) return;
    ida_remove(&test_ida, 4000); /* bug: null deference here */
    printk(KERN_INFO "End of test3\n");
    }

    It happens only when the caller tries to free an unallocated ID which is
    the caller's fault. It is not a bug. But it is better to add the
    proper check and complain rather than crashing the kernel.

    [tj@kernel.org: updated patch description]
    Signed-off-by: Lai Jiangshan
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • If unallocated_id = (ANY * idr_max(idp->layers) + existing_id) is passed
    to idr_remove(). The existing_id will be removed unexpectedly.

    The following test shows this unexpected id-removal:

    static void test4(void)
    {
    int id;
    DEFINE_IDR(test_idr);

    printk(KERN_INFO "Start test4\n");
    id = idr_alloc(&test_idr, (void *)1, 42, 43, GFP_KERNEL);
    BUG_ON(id != 42);
    idr_remove(&test_idr, 42 + IDR_SIZE);
    TEST_BUG_ON(idr_find(&test_idr, 42) != (void *)1);
    idr_destroy(&test_idr);
    printk(KERN_INFO "End of test4\n");
    }

    ida_remove() shares the similar problem.

    It happens only when the caller tries to free an unallocated ID which is
    the caller's fault. It is not a bug. But it is better to add the
    proper check and complain rather than removing an existing_id silently.

    Signed-off-by: Lai Jiangshan
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • idr_replace() open-codes the logic to calculate the maximum valid ID
    given the height of the idr tree; unfortunately, the open-coded logic
    doesn't account for the fact that the top layer may have unused slots
    and over-shifts the limit to zero when the tree is at its maximum
    height.

    The following test code shows it fails to replace the value for
    id=((1<<
    Acked-by: Tejun Heo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     

08 Apr, 2014

2 commits

  • Replace rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

    The rcu_assign_pointer() ensures that the initialization of a structure
    is carried out before storing a pointer to that structure. And in the
    case of the NULL pointer, there is no structure to initialize.

    So, rcu_assign_pointer(p, NULL) can be safely converted to
    RCU_INIT_POINTER(p, NULL)

    Signed-off-by: Monam Agarwal
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Monam Agarwal
     
  • Remove no longer used deprecated code, and make local functions
    static.

    Signed-off-by: Stephen Hemminger
    Acked-by: Jean Delvare
    Acked-by: Tejun Heo
    Cc: Jeff Layton
    Cc: Philipp Reisner
    Cc: Jens Axboe
    Cc: George Spelvin
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Hemminger
     

17 Feb, 2014

1 commit


04 Jul, 2013

1 commit

  • We print a dump stack after idr_remove warning. This is useful to find
    the faulty piece of code. Let's do the same for ida_remove, as it would
    be equally useful there.

    [akpm@linux-foundation.org: convert the open-coded printk+dump_stack into WARN()]
    Signed-off-by: Jean Delvare
    Cc: Tejun Heo
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jean Delvare
     

30 Apr, 2013

1 commit

  • As Tejun points out, there are several users of the IDR facility that
    attempt to use it in a cyclic fashion. These users are likely to see
    -ENOSPC errors after the counter wraps one or more times however.

    This patchset adds a new idr_alloc_cyclic routine and converts several
    of these users to it. Many of these users are in obscure parts of the
    kernel, and I don't have a good way to test some of them. The change is
    pretty straightforward though, so hopefully it won't be an issue.

    There is one other cyclic user of idr_alloc that I didn't touch in
    ipc/util.c. That one is doing some strange stuff that I didn't quite
    understand, but it looks like it should probably be converted later
    somehow.

    This patch:

    Thus spake Tejun Heo:

    Ooh, BTW, the cyclic allocation is broken. It's prone to -ENOSPC
    after the first wraparound. There are several cyclic users in the
    kernel and I think it probably would be best to implement cyclic
    support in idr.

    This patch does that by adding new idr_alloc_cyclic function that such
    users in the kernel can use. With this, there's no need for a caller to
    keep track of the last value used as that's now tracked internally. This
    should prevent the ENOSPC problems that can hit when the "last allocated"
    counter exceeds INT_MAX.

    Later patches will convert existing cyclic users to the new interface.

    Signed-off-by: Jeff Layton
    Reviewed-by: Tejun Heo
    Cc: "David S. Miller"
    Cc: "J. Bruce Fields"
    Cc: Eric Paris
    Cc: Jack Morgenstein
    Cc: John McCutchan
    Cc: Neil Horman
    Cc: Or Gerlitz
    Cc: Robert Love
    Cc: Roland Dreier
    Cc: Sridhar Samudrala
    Cc: Steve Wise
    Cc: Tom Tucker
    Cc: Vlad Yasevich

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     

14 Mar, 2013

2 commits

  • GFP_NOIO is often used for idr_alloc() inside preloaded section as the
    allocation mask doesn't really matter. If the idr tree needs to be
    expanded, idr_alloc() first tries to allocate using the specified
    allocation mask and if it fails falls back to the preloaded buffer. This
    order prevent non-preloading idr_alloc() users from taking advantage of
    preloading ones by using preload buffer without filling it shifting the
    burden of allocation to the preload users.

    Unfortunately, this allowed/expected-to-fail kmem_cache allocation ends up
    generating spurious slab lowmem warning before succeeding the request from
    the preload buffer.

    This patch makes idr_layer_alloc() add __GFP_NOWARN to the first
    kmem_cache attempt and try kmem_cache again w/o __GFP_NOWARN after
    allocation from preload_buffer fails so that lowmem warning is generated
    if not suppressed by the original @gfp_mask.

    Signed-off-by: Tejun Heo
    Reported-by: David Teigland
    Tested-by: David Teigland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Now that all in-kernel users are converted to ues the new alloc
    interface, mark the old interface deprecated. We should be able to
    remove these in a few releases.

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

13 Mar, 2013

1 commit

  • Fix new kernel-doc warnings in idr:

    Warning(include/linux/idr.h:113): No description found for parameter 'idr'
    Warning(include/linux/idr.h:113): Excess function parameter 'idp' description in 'idr_find'
    Warning(lib/idr.c:232): Excess function parameter 'id' description in 'sub_alloc'
    Warning(lib/idr.c:232): Excess function parameter 'id' description in 'sub_alloc'

    Signed-off-by: Randy Dunlap
    Acked-by: Tejun Heo
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

09 Mar, 2013

1 commit

  • idr_find(), idr_remove() and idr_replace() used to silently ignore the
    sign bit and perform lookup with the rest of the bits. The weird behavior
    has been changed such that negative IDs are treated as invalid. As the
    behavior change was subtle, WARN_ON_ONCE() was added in the hope of
    determining who's calling idr functions with negative IDs so that they can
    be examined for problems.

    Up until now, all two reported cases are ID number coming directly from
    userland and getting fed into idr_find() and the warnings seem to cause
    more problems than being helpful. Drop the WARN_ON_ONCE()s.

    Signed-off-by: Tejun Heo
    Reported-by:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

28 Feb, 2013

12 commits

  • Until recently, when an negative ID is specified, idr functions used to
    ignore the sign bit and proceeded with the operation with the rest of
    bits, which is bizarre and error-prone. The behavior recently got changed
    so that negative IDs are treated as invalid but we're triggering
    WARN_ON_ONCE() on negative IDs just in case somebody was depending on the
    sign bit being ignored, so that those can be detected and fixed easily.

    We only need this for a while. Explain why WARN_ON_ONCE()s are there and
    that they can be removed later.

    Signed-off-by: Tejun Heo
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • While idr lookup isn't a particularly heavy operation, it still is too
    substantial to use in hot paths without worrying about the performance
    implications. With recent changes, each idr_layer covers 256 slots
    which should be enough to cover most use cases with single idr_layer
    making lookup hint very attractive.

    This patch adds idr->hint which points to the idr_layer which
    allocated an ID most recently and the fast path lookup becomes

    if (look up target's prefix matches that of the hinted layer)
    return hint->ary[ID's offset in the leaf layer];

    which can be inlined.

    idr->hint is set to the leaf node on idr_fill_slot() and cleared from
    free_layer().

    [andriy.shevchenko@linux.intel.com: always do slow path when hint is uninitialized]
    Signed-off-by: Tejun Heo
    Cc: Kirill A. Shutemov
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Add a field which carries the prefix of ID the idr_layer covers. This
    will be used to implement lookup hint.

    This patch doesn't make use of the new field and doesn't introduce any
    behavior difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Currently, idr->bitmap is declared as an unsigned long which restricts
    the number of bits an idr_layer can contain. All bitops can handle
    arbitrary positive integer bit number and there's no reason for this
    restriction.

    Declare idr_layer->bitmap using DECLARE_BITMAP() instead of a single
    unsigned long.

    * idr_layer->bitmap is now an array. '&' dropped from params to
    bitops.

    * Replaced "== IDR_FULL" tests with bitmap_full() and removed
    IDR_FULL.

    * Replaced find_next_bit() on ~bitmap with find_next_zero_bit().

    * Replaced "bitmap = 0" with bitmap_clear().

    This patch doesn't (or at least shouldn't) introduce any behavior
    changes.

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • MAX_IDR_MASK is another weirdness in the idr interface. As idr covers
    whole positive integer range, it's defined as 0x7fffffff or INT_MAX.

    Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
    They basically mask off the sign bit and operate on the rest, so if
    the caller, by accident, passes in a negative number, the sign bit
    will be masked off and the remaining part will be used as if that was
    the input, which is worse than crashing.

    The constant is visible in idr.h and there are several users in the
    kernel.

    * drivers/i2c/i2c-core.c:i2c_add_numbered_adapter()

    Basically used to test if adap->nr is a negative number which isn't
    -1 and returns -EINVAL if so. idr_alloc() already has negative
    @start checking (w/ WARN_ON_ONCE), so this can go away.

    * drivers/infiniband/core/cm.c:cm_alloc_id()
    drivers/infiniband/hw/mlx4/cm.c:id_map_alloc()

    Used to wrap cyclic @start. Can be replaced with max(next, 0).
    Note that this type of cyclic allocation using idr is buggy. These
    are prone to spurious -ENOSPC failure after the first wraparound.

    * fs/super.c:get_anon_bdev()

    The ID allocated from ida is masked off before being tested whether
    it's inside valid range. ida allocated ID can never be a negative
    number and the masking is unnecessary.

    Update idr_*() functions to fail with -EINVAL when negative @id is
    specified and update other MAX_IDR_MASK users as described above.

    This leaves MAX_IDR_MASK without any user, remove it and relocate
    other MAX_IDR_* constants to lib/idr.c.

    Signed-off-by: Tejun Heo
    Cc: Jean Delvare
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Hal Rosenstock
    Cc: "Marciniszyn, Mike"
    Cc: Jack Morgenstein
    Cc: Or Gerlitz
    Cc: Al Viro
    Acked-by: Wolfram Sang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Most functions in idr fail to deal with the high bits when the idr
    tree grows to the maximum height.

    * idr_get_empty_slot() stops growing idr tree once the depth reaches
    MAX_IDR_LEVEL - 1, which is one depth shallower than necessary to
    cover the whole range. The function doesn't even notice that it
    didn't grow the tree enough and ends up allocating the wrong ID
    given sufficiently high @starting_id.

    For example, on 64 bit, if the starting id is 0x7fffff01,
    idr_get_empty_slot() will grow the tree 5 layer deep, which only
    covers the 30 bits and then proceed to allocate as if the bit 30
    wasn't specified. It ends up allocating 0x3fffff01 without the bit
    30 but still returns 0x7fffff01.

    * __idr_remove_all() will not remove anything if the tree is fully
    grown.

    * idr_find() can't find anything if the tree is fully grown.

    * idr_for_each() and idr_get_next() can't iterate anything if the tree
    is fully grown.

    Fix it by introducing idr_max() which returns the maximum possible ID
    given the depth of tree and replacing the id limit checks in all
    affected places.

    As the idr_layer pointer array pa[] needs to be 1 larger than the
    maximum depth, enlarge pa[] arrays by one.

    While this plugs the discovered issues, the whole code base is
    horrible and in desparate need of rewrite. It's fragile like hell,

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Cc:

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • The current idr interface is very cumbersome.

    * For all allocations, two function calls - idr_pre_get() and
    idr_get_new*() - should be made.

    * idr_pre_get() doesn't guarantee that the following idr_get_new*()
    will not fail from memory shortage. If idr_get_new*() returns
    -EAGAIN, the caller is expected to retry pre_get and allocation.

    * idr_get_new*() can't enforce upper limit. Upper limit can only be
    enforced by allocating and then freeing if above limit.

    * idr_layer buffer is unnecessarily per-idr. Each idr ends up keeping
    around MAX_IDR_FREE idr_layers. The memory consumed per idr is
    under two pages but it makes it difficult to make idr_layer larger.

    This patch implements the following new set of allocation functions.

    * idr_preload[_end]() - Similar to radix preload but doesn't fail.
    The first idr_alloc() inside preload section can be treated as if it
    were called with @gfp_mask used for idr_preload().

    * idr_alloc() - Allocate an ID w/ lower and upper limits. Takes
    @gfp_flags and can be used w/o preloading. When used inside
    preloaded section, the allocation mask of preloading can be assumed.

    If idr_alloc() can be called from a context which allows sufficiently
    relaxed @gfp_mask, it can be used by itself. If, for example,
    idr_alloc() is called inside spinlock protected region, preloading can
    be used like the following.

    idr_preload(GFP_KERNEL);
    spin_lock(lock);

    id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT);

    spin_unlock(lock);
    idr_preload_end();
    if (id < 0)
    error;

    which is much simpler and less error-prone than idr_pre_get and
    idr_get_new*() loop.

    The new interface uses per-pcu idr_layer buffer and thus the number of
    idr's in the system doesn't affect the amount of memory used for
    preloading.

    idr_layer_alloc() is introduced to handle idr_layer allocations for
    both old and new ID allocation paths. This is a bit hairy now but the
    new interface is expected to replace the old and the internal
    implementation eventually will become simpler.

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Move slot filling to idr_fill_slot() from idr_get_new_above_int() and
    make idr_get_new_above() directly call it. idr_get_new_above_int() is
    no longer needed and removed.

    This will be used to implement a new ID allocation interface.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr uses -1, IDR_NEED_TO_GROW and IDR_NOMORE_SPACE to communicate
    exception conditions internally. The return value is later translated
    to errno values using _idr_rc_to_errno().

    This is confusing. Drop the custom ones and consistently use -EAGAIN
    for "tree needs to grow", -ENOMEM for "need more memory" and -ENOSPC for
    "ran out of ID space".

    Due to the weird memory preloading mechanism, [ra]_get_new*() return
    -EAGAIN on memory shortage, so we need to substitute -ENOMEM w/
    -EAGAIN on those interface functions. They'll eventually be cleaned
    up and the translations will go away.

    This patch doesn't introduce any functional changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • * Move idr_for_each_entry() definition next to other idr related
    definitions.

    * Make id[r|a]_get_new() inline wrappers of id[r|a]_get_new_above().

    This changes the implementation of idr_get_new() but the new
    implementation is trivial. This patch doesn't introduce any
    functional change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • There was only one legitimate use of idr_remove_all() and a lot more of
    incorrect uses (or lack of it). Now that idr_destroy() implies
    idr_remove_all() and all the in-kernel users updated not to use it,
    there's no reason to keep it around. Mark it deprecated so that we can
    later unexport it.

    idr_remove_all() is made an inline function calling __idr_remove_all()
    to avoid triggering deprecated warning on EXPORT_SYMBOL().

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr is silly in quite a few ways, one of which is how it's supposed to
    be destroyed - idr_destroy() doesn't release IDs and doesn't even whine
    if the idr isn't empty. If the caller forgets idr_remove_all(), it
    simply leaks memory.

    Even ida gets this wrong and leaks memory on destruction. There is
    absoltely no reason not to call idr_remove_all() from idr_destroy().
    Nobody is abusing idr_destroy() for shrinking free layer buffer and
    continues to use idr after idr_destroy(), so it's safe to do remove_all
    from destroy.

    In the whole kernel, there is only one place where idr_remove_all() is
    legitimiately used without following idr_destroy() while there are quite
    a few places where the caller forgets either idr_remove_all() or
    idr_destroy() leaking memory.

    This patch makes idr_destroy() call idr_destroy_all() and updates the
    function description accordingly.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo