28 Feb, 2013

40 commits

  • Convert to the much saner new idr interface.

    Signed-off-by: Tejun Heo
    Cc: Paul Gortmaker
    Cc: Maciej Sosnowski
    Cc: Shannon Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Convert to the much saner new idr interface.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Convert to the much saner new idr interface. The existing code looks
    buggy to me - ID 0 is treated as no-ID, but the allocation specifies 0
    as the lower limit, and there's no error handling after partial
    success. This conversion keeps those bugs unchanged.

    Signed-off-by: Tejun Heo
    Acked-by: Chas Williams
    Reported-by: kbuild test robot
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Convert to the much saner new idr interface.

    Signed-off-by: Tejun Heo
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Convert to the much saner new idr interface. Both bsg and genhd
    protect the idr with a mutex, making preloading unnecessary.
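
    Under a mutex the conversion collapses to a single call; a minimal
    sketch of the pattern (identifiers assumed from the bsg driver, not an
    exact hunk):

        mutex_lock(&bsg_mutex);
        /* the mutex allows sleeping, so GFP_KERNEL needs no preload */
        ret = idr_alloc(&bsg_minor_idr, bcd, 0, BSG_MAX_DEVS, GFP_KERNEL);
        mutex_unlock(&bsg_mutex);
        if (ret < 0)
                return ret == -ENOSPC ? -EINVAL : ret;
        bcd->minor = ret;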

    Signed-off-by: Tejun Heo
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr allocation in blk_alloc_devt() wasn't synchronized against lookup
    and removal, and its limit check was off by one - 1 << MINORBITS is
    the number of minors allowed, not the maximum allowed minor.

    Add locking, rename MAX_EXT_DEVT to NR_EXT_DEVT, and fix the limit
    check.
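
    A sketch of the corrected allocation path (names assumed from
    block/genhd.c):

        mutex_lock(&ext_devt_mutex);
        /* NR_EXT_DEVT is a count, hence a valid exclusive upper bound */
        idx = idr_alloc(&ext_devt_idr, part, 0, NR_EXT_DEVT, GFP_KERNEL);
        mutex_unlock(&ext_devt_mutex);
        if (idx < 0)
                return idx == -ENOSPC ? -EBUSY : idx;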

    Signed-off-by: Tejun Heo
    Acked-by: Jens Axboe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • The current idr interface is very cumbersome.

    * For all allocations, two function calls - idr_pre_get() and
    idr_get_new*() - must be made.

    * idr_pre_get() doesn't guarantee that the following idr_get_new*()
    will not fail from memory shortage. If idr_get_new*() returns
    -EAGAIN, the caller is expected to retry the pre_get and allocation.

    * idr_get_new*() can't enforce an upper limit. An upper limit can only
    be enforced by allocating and then freeing if above the limit.

    * The idr_layer buffer is unnecessarily per-idr. Each idr ends up
    keeping around MAX_IDR_FREE idr_layers. The memory consumed per idr
    is under two pages, but it makes it difficult to make idr_layer
    larger.

    This patch implements the following new set of allocation functions.

    * idr_preload[_end]() - Similar to radix preload but doesn't fail.
    The first idr_alloc() inside preload section can be treated as if it
    were called with @gfp_mask used for idr_preload().

    * idr_alloc() - Allocate an ID w/ lower and upper limits. Takes
    @gfp_flags and can be used w/o preloading. When used inside
    preloaded section, the allocation mask of preloading can be assumed.

    If idr_alloc() can be called from a context which allows a sufficiently
    relaxed @gfp_mask, it can be used by itself. If, for example,
    idr_alloc() is called inside a spinlock-protected region, preloading
    can be used as follows.

        idr_preload(GFP_KERNEL);
        spin_lock(lock);

        id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT);

        spin_unlock(lock);
        idr_preload_end();
        if (id < 0)
                error;

    which is much simpler and less error-prone than the idr_pre_get() and
    idr_get_new*() loop.
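
    For comparison, the equivalent under the old interface needed a retry
    loop; roughly (a sketch, with detailed error handling elided):

        int ret, id;

        do {
                if (!idr_pre_get(idr, GFP_KERNEL))
                        return -ENOMEM;
                spin_lock(lock);
                ret = idr_get_new_above(idr, ptr, start, &id);
                spin_unlock(lock);
        } while (ret == -EAGAIN);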

    The new interface uses a per-cpu idr_layer buffer and thus the number
    of idrs in the system doesn't affect the amount of memory used for
    preloading.

    idr_layer_alloc() is introduced to handle idr_layer allocations for
    both old and new ID allocation paths. This is a bit hairy now but the
    new interface is expected to replace the old and the internal
    implementation eventually will become simpler.

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Move slot filling to idr_fill_slot() from idr_get_new_above_int() and
    make idr_get_new_above() directly call it. idr_get_new_above_int() is
    no longer needed and is removed.

    This will be used to implement a new ID allocation interface.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr uses -1, IDR_NEED_TO_GROW and IDR_NOMORE_SPACE to communicate
    exception conditions internally. The return value is later translated
    to errno values using _idr_rc_to_errno().

    This is confusing. Drop the custom ones and consistently use -EAGAIN
    for "tree needs to grow", -ENOMEM for "need more memory" and -ENOSPC for
    "ran out of ID space".

    Due to the weird memory preloading mechanism, id[ra]_get_new*() return
    -EAGAIN on memory shortage, so we need to substitute -ENOMEM w/
    -EAGAIN in those interface functions. They'll eventually be cleaned
    up and the translations will go away.
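
    Concretely, the substitution happens at the interface boundary; a
    sketch of the pattern described above:

        /* in id[ra]_get_new*(), after the internal allocation: */
        if (rv == -ENOMEM)
                rv = -EAGAIN;   /* old contract: retry after *_pre_get() */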

    This patch doesn't introduce any functional changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • * Move idr_for_each_entry() definition next to other idr related
    definitions.

    * Make id[r|a]_get_new() inline wrappers of id[r|a]_get_new_above().

    This changes the implementation of idr_get_new() but the new
    implementation is trivial. This patch doesn't introduce any
    functional change.
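
    The resulting wrapper is indeed trivial; the idr half looks roughly
    like this:

        static inline int idr_get_new(struct idr *idp, void *ptr, int *id)
        {
                return idr_get_new_above(idp, ptr, 0, id);
        }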

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • * Tab align fields like a normal person.

    * Drop the unnecessary 0 inits from IDR_INIT().

    This patch is purely cosmetic.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • There was only one legitimate use of idr_remove_all() and a lot more
    incorrect uses (or lack thereof). Now that idr_destroy() implies
    idr_remove_all() and all the in-kernel users have been updated not to
    use it, there's no reason to keep it around. Mark it deprecated so
    that we can later unexport it.

    idr_remove_all() is made an inline function calling __idr_remove_all()
    to avoid triggering deprecated warning on EXPORT_SYMBOL().
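
    A sketch of the arrangement this describes:

        void __idr_remove_all(struct idr *idp);        /* stays exported */

        static inline void __deprecated idr_remove_all(struct idr *idp)
        {
                __idr_remove_all(idp);
        }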

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.
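
    The conversion pattern, repeated in the commits below, is simply
    (my_idr is a stand-in name):

        /* before */
        idr_remove_all(&my_idr);
        idr_destroy(&my_idr);

        /* after: idr_destroy() now removes all IDs itself */
        idr_destroy(&my_idr);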

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.

    Signed-off-by: Tejun Heo
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop the reference to idr_remove_all(). Note that the
    code wasn't completely correct before, because idr_remove() on all
    entries doesn't necessarily release all idr_layers, which could lead
    to a memory leak.

    Signed-off-by: Tejun Heo
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated.

    The conversion isn't completely trivial for recover_idr_clear() as
    it's the only place in the kernel which makes legitimate use of
    idr_remove_all() w/o idr_destroy(). Replace it with an idr_remove()
    call inside an idr_for_each_entry() loop. It goes on top so that it
    matches the operation order in recover_idr_del().
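
    A sketch of the replacement loop (identifiers assumed from
    fs/dlm/recover.c):

        idr_for_each_entry(&ls->ls_recover_idr, r, id) {
                /* removal first, matching the order in recover_idr_del() */
                idr_remove(&ls->ls_recover_idr, id);
                r->res_id = 0;
        }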

    Signed-off-by: Tejun Heo
    Cc: Christine Caulfield
    Cc: David Teigland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Convert recover_idr_clear() to use idr_for_each_entry() instead of
    idr_for_each(). It's somewhat less efficient this way but it shouldn't
    matter in an error path. This is to help with deprecation of
    idr_remove_all().

    Signed-off-by: Tejun Heo
    Cc: Christine Caulfield
    Cc: David Teigland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.

    Signed-off-by: Tejun Heo
    Cc: Ohad Ben-Cohen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.

    Signed-off-by: Tejun Heo
    Cc: Ohad Ben-Cohen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.

    Signed-off-by: Tejun Heo
    Cc: Alasdair Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.

    * drm_ctxbitmap_cleanup() was calling idr_remove_all() but forgetting
    idr_destroy(), thus leaking all buffered free idr_layers. Replace it
    with idr_destroy().

    Signed-off-by: Tejun Heo
    Acked-by: David Airlie
    Cc: Inki Dae
    Cc: Joonyoung Shim
    Cc: Seung-Woo Kim
    Cc: Kyungmin Park
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.

    Signed-off-by: Tejun Heo
    Cc: Stefan Richter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.

    Signed-off-by: Tejun Heo
    Cc: Chas Williams
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr is silly in quite a few ways, one of which is how it's supposed to
    be destroyed - idr_destroy() doesn't release IDs and doesn't even whine
    if the idr isn't empty. If the caller forgets idr_remove_all(), it
    simply leaks memory.

    Even ida gets this wrong and leaks memory on destruction. There is
    absolutely no reason not to call idr_remove_all() from idr_destroy().
    Nobody abuses idr_destroy() to shrink the free layer buffer and then
    keeps using the idr afterwards, so it's safe to do the remove_all from
    destroy.

    In the whole kernel, there is only one place where idr_remove_all() is
    legitimately used without a following idr_destroy(), while there are
    quite a few places where the caller forgets either idr_remove_all() or
    idr_destroy() and leaks memory.

    This patch makes idr_destroy() call idr_remove_all() and updates the
    function description accordingly.
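
    After the change, destruction looks roughly like this (a sketch;
    free-list bookkeeping simplified):

        void idr_destroy(struct idr *idp)
        {
                __idr_remove_all(idp);   /* release all in-use idr_layers */

                while (idp->id_free_cnt) {
                        struct idr_layer *p = get_from_free_list(idp);
                        kmem_cache_free(idr_layer_cache, p);
                }
        }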

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • The iteration logic of idr_get_next() is borrowed mostly verbatim from
    idr_for_each(). It walks down the tree looking for the slot matching
    the current ID. If the matching slot is not found, the ID is
    incremented by the distance of single slot at the given level and
    repeats.

    The implementation assumes that during the whole iteration id is aligned
    to the layer boundaries of the level closest to the leaf, which is true
    for all iterations starting from zero or an existing element and thus is
    fine for idr_for_each().

    However, idr_get_next() may be given any point and if the starting id
    hits in the middle of a non-existent layer, increment to the next layer
    will end up skipping the same offset into it. For example, an IDR with
    IDs filled between [64, 127] would look like the following.

        [  0    64 ... ]
       /---/     |
       |         |
     NULL    [ 64 ... 127 ]

    If idr_get_next() is called with 63 as the starting point, it will try
    to follow down the pointer from 0. As it is NULL, it will then try to
    proceed to the next slot in the same level by adding the slot distance
    at that level which is 64 - making the next try 127. It goes around the
    loop and finds and returns 127 skipping [64, 126].

    Note that this bug also triggers in an idr_for_each_entry() loop which
    deletes during iteration, as deletions can make layers go away,
    leaving the iteration with an unaligned ID into missing layers.

    Fix it by ensuring that proceeding to the next slot doesn't carry over
    the unaligned offset - i.e. use round_up(id + 1, slot_distance)
    instead of id += slot_distance.
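
    The fix amounts to a one-liner in idr_get_next(), sketched here as a
    diff (n being the shift of the current level):

        -       id += 1 << n;
        +       id = round_up(id + 1, 1 << n);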

    Signed-off-by: Tejun Heo
    Reported-by: David Teigland
    Cc: KAMEZAWA Hiroyuki
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • While adding and removing a lot of disks and partitions this
    sometimes shows up:

    WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
    Hardware name:
    sysfs: cannot create duplicate filename '/dev/block/259:751'
    Modules linked in: raid1 autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 dm_mirror dm_region_hash dm_log power_meter microcode dcdbas serio_raw amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core k10temp bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mod [last unloaded: scsi_wait_scan]
    Pid: 44103, comm: async/16 Not tainted 2.6.32-195.el6.x86_64 #1
    Call Trace:
    warn_slowpath_common+0x87/0xc0
    warn_slowpath_fmt+0x46/0x50
    sysfs_add_one+0xc9/0x130
    sysfs_do_create_link+0x12b/0x170
    sysfs_create_link+0x13/0x20
    device_add+0x317/0x650
    idr_get_new+0x13/0x50
    add_partition+0x21c/0x390
    rescan_partitions+0x32b/0x470
    sd_open+0x81/0x1f0 [sd_mod]
    __blkdev_get+0x1b6/0x3c0
    blkdev_get+0x10/0x20
    register_disk+0x155/0x170
    add_disk+0xa6/0x160
    sd_probe_async+0x13b/0x210 [sd_mod]
    add_wait_queue+0x46/0x60
    async_thread+0x102/0x250
    default_wake_function+0x0/0x20
    async_thread+0x0/0x250
    kthread+0x96/0xa0
    child_rip+0xa/0x20
    kthread+0x0/0xa0
    child_rip+0x0/0x20

    This most likely happens because the dev_t is freed while the number
    is still in use and idr_get_new() is not protected on every use. The
    fix adds a mutex where there wasn't one before and moves the dev_t
    free so it is called after device_del().
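
    A sketch of the removal side with the added locking (names assumed
    from block/genhd.c):

        void blk_free_devt(dev_t devt)
        {
                if (MAJOR(devt) == BLOCK_EXT_MAJOR) {
                        mutex_lock(&ext_devt_mutex);
                        idr_remove(&ext_devt_idr,
                                   blk_mangle_minor(MINOR(devt)));
                        mutex_unlock(&ext_devt_mutex);
                }
        }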

    Signed-off-by: Tomas Henzl
    Cc: Jens Axboe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomas Henzl
     
  • Though there is no error in freeing a NULL pointer, we can avoid the
    pointless call. Changing the code a little in kimage_crash_alloc()
    avoids this kind of unnecessary free.

    Signed-off-by: Zhang Yanfei
    Cc: "Eric W. Biederman"
    Cc: Sasha Levin
    Reviewed-by: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     
  • If kimage_normal_alloc() fails to alloc pages for image->swap_page, it
    should call kimage_free_page_list() to free allocated pages in
    image->control_pages list before it frees image.
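
    A sketch of the corrected failure path (labels assumed, not the exact
    hunk):

        image->swap_page = kimage_alloc_control_pages(image, 0);
        if (!image->swap_page)
                goto out_free;  /* was: goto out, leaking control_pages */

        /* further setup ... */

    out_free:
        kimage_free_page_list(&image->control_pages);
    out:
        kfree(image);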

    Signed-off-by: Zhang Yanfei
    Cc: "Eric W. Biederman"
    Cc: Sasha Levin
    Reviewed-by: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     
  • If kimage_normal_alloc() fails to initialize an allocated kimage, it
    will free the image but still set 'rimage'; as a result, kexec_load
    will try to free it again.

    This would explode because part of the freeing process accesses
    internal members which point to uninitialized memory.
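
    A sketch of the fixed tail of kimage_normal_alloc() (assumed shape):

        if (result) {
                kfree(image);   /* free here; do not publish via *rimage */
                return result;
        }
        *rimage = image;        /* set only on full success */
        return 0;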

    Signed-off-by: Sasha Levin
    Cc: "Eric W. Biederman"
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • This patch exports the PG_hwpoison flag into vmcoreinfo when
    CONFIG_MEMORY_FAILURE is defined. "makedumpfile" needs to read
    information about memory, such as 'mem_section', 'zone', and
    'pageflags', from the vmcore.

    We introduce a function into "makedumpfile" to exclude hwpoisoned
    pages from the vmcore dump. In order to introduce this function, the
    PG_hwpoison flag has to be exported into vmcoreinfo.
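
    The export itself is a single entry in the vmcoreinfo setup, using
    the kernel's existing macro for flag values:

        #ifdef CONFIG_MEMORY_FAILURE
                VMCOREINFO_NUMBER(PG_hwpoison);
        #endif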

    Signed-off-by: Mitsuhiro Tanino
    Acked-by: "Eric W. Biederman"
    Cc: Mitsuhiro Tanino
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mitsuhiro Tanino
     
  • hole_end has been checked to make sure it is
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     
  • This adds the values related to the buddy system to the vmcoreinfo
    data so that makedumpfile (the dump filtering command) can filter out
    all free pages with the new logic.

    It's faster than the current logic because it can distinguish free page
    by analyzing page structure at the same time as filtering for other
    unnecessary pages (e.g. anonymous page).

    OTOH, the current logic has to trace free_list to distinguish free pages
    while analyzing page structure to filter out other unnecessary pages.

    The new logic uses the fact that a buddy page is marked by _mapcount
    == PAGE_BUDDY_MAPCOUNT_VALUE. But _mapcount shares its memory with
    other fields for SLAB/SLUB when PG_slab is set, so we need to check
    whether PG_slab is set before looking up the _mapcount value. And we
    can get the buddy order from the private field. To sum up, the values
    below are required for this logic.

    Required values:
    - OFFSET(page._mapcount)
    - OFFSET(page.private)
    - NUMBER(PG_slab)
    - NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)
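
    In vmcoreinfo terms, the required values above become entries like
    the following (a sketch using the kernel's VMCOREINFO_* macros):

        VMCOREINFO_OFFSET(page, _mapcount);
        VMCOREINFO_OFFSET(page, private);
        VMCOREINFO_NUMBER(PG_slab);
        VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);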

    Changelog from v1 to v2:
    1. remove SIZE(pageflags)
    The new logic was changed after I sent the v1 patch.
    Accordingly, SIZE(pageflags) has become unnecessary for makedumpfile.

    What's makedumpfile:
    makedumpfile creates a small dumpfile by excluding unnecessary pages
    for the analysis. To distinguish unnecessary pages, makedumpfile gets
    the vmcoreinfo data which has the minimum debugging information only
    for dump filtering.

    Signed-off-by: Atsushi Kumagai
    Cc: "Eric W. Biederman"
    Acked-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Atsushi Kumagai
     
  • If new_nsproxy is set we will always call switch_task_namespaces()
    and then set new_nsproxy back to NULL, so the reassignment and
    fall-through check are redundant.

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • [akpm@linux-foundation.org: checkpatch fixes]
    Cc: Cyrill Gorcunov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Signed-off-by: Cyrill Gorcunov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Prevents the hung_task detector from panicking the machine. This is
    also needed to prevent this wait from blocking suspend.

    (It doesn't currently block suspend, but it would once the next
    patch in this series is applied.)

    [yongjun_wei@trendmicro.com.cn: kernel/exit.c: remove duplicated include]
    Signed-off-by: Mandeep Singh Baines
    Cc: Ben Chan
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Rafael J. Wysocki
    Cc: Ingo Molnar
    Signed-off-by: Wei Yongjun
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     
  • We shouldn't try_to_freeze() if locks are held. Holding a lock can
    cause a deadlock if the lock is later acquired in the suspend or
    hibernate path (e.g. by dpm). Holding a lock can also cause a
    deadlock in the case of cgroup_freezer if a lock held inside a frozen
    cgroup is later acquired by a process outside that group.
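
    A sketch of where the check lands; the surrounding body follows the
    pre-existing try_to_freeze(), and the exact form of the
    debug_check_no_locks_held() call is assumed:

        static inline bool try_to_freeze(void)
        {
                if (!(current->flags & PF_NOFREEZE))
                        debug_check_no_locks_held(current);
                might_sleep();
                if (likely(!freezing(current)))
                        return false;
                return __refrigerator(false);
        }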

    [akpm@linux-foundation.org: export debug_check_no_locks_held]
    Signed-off-by: Mandeep Singh Baines
    Cc: Ben Chan
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Rafael J. Wysocki
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     
  • In read_vmcore() two `if' tests are duplicated. Changing their
    position reduces the duplication. This change does not affect the
    behaviour of the function.

    [akpm@linux-foundation.org: avoid `if (foo = bar)' thing, use min_t()]
    [akpm@linux-foundation.org: s/max_t/min_t/]
    Signed-off-by: Zhang Yanfei
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Yanfei
     
  • - use pr_foo() throughout

    - remove a couple of duplicated KERN_WARNINGs, via WARN(KERN_WARNING "...")

    - nuke a few warnings which I've never seen happen, ever.
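
    The pr_foo() conversion is mechanical; for instance (illustrative,
    not an exact hunk):

        -       printk(KERN_WARNING "subsys: something happened\n");
        +       pr_warn("something happened\n"); /* prefix via pr_fmt(), if defined */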

    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton