28 Feb, 2013

16 commits

  • Until recently, when a negative ID was specified, idr functions used to
    ignore the sign bit and proceed with the operation on the remaining
    bits, which is bizarre and error-prone. The behavior has since been
    changed so that negative IDs are treated as invalid, but we still trigger
    WARN_ON_ONCE() on negative IDs in case somebody was depending on the
    sign bit being ignored, so that those callers can be detected and fixed
    easily.

    We only need this for a while. Explain why WARN_ON_ONCE()s are there and
    that they can be removed later.
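
    A hypothetical sketch of the pattern described above (the real checks
    sit inside the idr entry points in lib/idr.c):

    #include <linux/idr.h>

    void *idr_lookup_sketch(struct idr *idp, int id)
    {
            /*
             * Negative IDs used to have their sign bit silently ignored.
             * They are invalid now; the WARN_ON_ONCE() is temporary, so
             * callers relying on the old behavior can be found and fixed.
             */
            if (WARN_ON_ONCE(id < 0))
                    return NULL;

            return idr_find(idp, id);
    }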

    Signed-off-by: Tejun Heo
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • While idr lookup isn't a particularly heavy operation, it is still too
    substantial to use in hot paths without worrying about the performance
    implications. With recent changes, each idr_layer covers 256 slots,
    which should be enough to cover most use cases with a single idr_layer,
    making a lookup hint very attractive.

    This patch adds idr->hint which points to the idr_layer which
    allocated an ID most recently and the fast path lookup becomes

    if (lookup target's prefix matches that of the hinted layer)
            return hint->ary[ID's offset in the leaf layer];

    which can be inlined.

    idr->hint is set to the leaf node on idr_fill_slot() and cleared from
    free_layer().
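
    In C, the inlined fast path is roughly the following sketch (assuming,
    as described above, that ->prefix holds the layer's ID prefix and
    IDR_MASK covers the offset within a leaf layer):

    #include <linux/idr.h>

    static inline void *idr_find_fast_sketch(struct idr *idr, int id)
    {
            struct idr_layer *hint = rcu_dereference_raw(idr->hint);

            /* fast path: the hinted leaf layer covers this ID */
            if (hint && (id & ~IDR_MASK) == hint->prefix)
                    return rcu_dereference_raw(hint->ary[id & IDR_MASK]);

            return idr_find_slowpath(idr, id);   /* fall back to the tree walk */
    }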

    [andriy.shevchenko@linux.intel.com: always do slow path when hint is uninitialized]
    Signed-off-by: Tejun Heo
    Cc: Kirill A. Shutemov
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Add a field which carries the prefix of the IDs the idr_layer covers.
    This will be used to implement the lookup hint.

    This patch doesn't make use of the new field and doesn't introduce any
    behavior difference.
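
    A sketch of the new field (abridged; the real struct idr_layer has
    several more members):

    struct idr_layer {
            int     prefix;         /* the ID prefix of this idr_layer */
            /* ... bitmap, ary[], count, layer, rcu_head, ... */
    };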

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Currently, idr_layer->bitmap is declared as a single unsigned long,
    which restricts the number of bits an idr_layer can contain. All bitops
    can handle an arbitrary number of bits and there's no reason for this
    restriction.

    Declare idr_layer->bitmap using DECLARE_BITMAP() instead of a single
    unsigned long.

    * idr_layer->bitmap is now an array. '&' dropped from params to
    bitops.

    * Replaced "== IDR_FULL" tests with bitmap_full() and removed
    IDR_FULL.

    * Replaced find_next_bit() on ~bitmap with find_next_zero_bit().

    * Replaced "bitmap = 0" with bitmap_clear().

    This patch doesn't (or at least shouldn't) introduce any behavior
    changes.
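
    A rough before/after sketch (illustrative names; IDR_SIZE_SKETCH stands
    in for the real per-layer slot count):

    #include <linux/bitmap.h>
    #include <linux/errno.h>

    #define IDR_SIZE_SKETCH 256

    struct idr_layer_sketch {
            DECLARE_BITMAP(bitmap, IDR_SIZE_SKETCH); /* was: unsigned long bitmap; */
    };

    static int find_free_slot_sketch(struct idr_layer_sketch *layer)
    {
            /* was: if (layer->bitmap == IDR_FULL) return -ENOSPC; */
            if (bitmap_full(layer->bitmap, IDR_SIZE_SKETCH))
                    return -ENOSPC;

            /* was: find_next_bit() on ~bitmap */
            return find_next_zero_bit(layer->bitmap, IDR_SIZE_SKETCH, 0);
    }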

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • MAX_IDR_MASK is another weirdness in the idr interface. As idr covers
    the whole positive integer range, it's defined as 0x7fffffff or INT_MAX.

    Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
    They basically mask off the sign bit and operate on the rest, so if
    the caller, by accident, passes in a negative number, the sign bit
    will be masked off and the remaining part will be used as if that was
    the input, which is worse than crashing.

    The constant is visible in idr.h and there are several users in the
    kernel.

    * drivers/i2c/i2c-core.c:i2c_add_numbered_adapter()

    Basically used to test if adap->nr is a negative number which isn't
    -1 and returns -EINVAL if so. idr_alloc() already has negative
    @start checking (w/ WARN_ON_ONCE), so this can go away.

    * drivers/infiniband/core/cm.c:cm_alloc_id()
    drivers/infiniband/hw/mlx4/cm.c:id_map_alloc()

    Used to wrap the cyclic @start; can be replaced with max(next, 0)
    (see the sketch below). Note that this type of cyclic allocation
    using idr is buggy; it is prone to spurious -ENOSPC failures after
    the first wraparound.

    * fs/super.c:get_anon_bdev()

    The ID allocated from ida is masked off before being tested for
    whether it's inside the valid range. An ida-allocated ID can never be
    negative, so the masking is unnecessary.

    Update idr_*() functions to fail with -EINVAL when negative @id is
    specified and update other MAX_IDR_MASK users as described above.

    This leaves MAX_IDR_MASK without any users; remove it and relocate the
    other MAX_IDR_* constants to lib/idr.c.
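
    A hedged sketch of the cyclic-allocation replacement mentioned for the
    IB CM code above (hypothetical helper and variable names; the real call
    sites differ in detail):

    #include <linux/gfp.h>
    #include <linux/idr.h>
    #include <linux/kernel.h>

    static int alloc_cyclic_id_sketch(struct idr *idr, void *ptr, int *next)
    {
            /* was: start = *next & MAX_IDR_MASK; */
            int id = idr_alloc(idr, ptr, max(*next, 0), 0, GFP_KERNEL);

            if (id >= 0)
                    *next = id + 1; /* may wrap negative; max() clamps it next time */
            return id;
    }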

    Signed-off-by: Tejun Heo
    Cc: Jean Delvare
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Hal Rosenstock
    Cc: "Marciniszyn, Mike"
    Cc: Jack Morgenstein
    Cc: Or Gerlitz
    Cc: Al Viro
    Acked-by: Wolfram Sang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Most functions in idr fail to deal with the high bits when the idr
    tree grows to the maximum height.

    * idr_get_empty_slot() stops growing the idr tree once the depth
    reaches MAX_IDR_LEVEL - 1, which is one level shallower than necessary
    to cover the whole range. The function doesn't even notice that it
    didn't grow the tree enough and ends up allocating the wrong ID
    given a sufficiently high @starting_id.

    For example, on 64 bit, if the starting id is 0x7fffff01,
    idr_get_empty_slot() will grow the tree 5 layers deep, which only
    covers 30 bits, and then proceeds to allocate as if bit 30 weren't
    specified. It ends up allocating 0x3fffff01, without bit 30, but
    still returns 0x7fffff01.

    * __idr_remove_all() will not remove anything if the tree is fully
    grown.

    * idr_find() can't find anything if the tree is fully grown.

    * idr_for_each() and idr_get_next() can't iterate anything if the tree
    is fully grown.

    Fix it by introducing idr_max(), which returns the maximum possible ID
    given the depth of the tree, and replacing the id limit checks in all
    affected places.

    As the idr_layer pointer array pa[] needs to be 1 larger than the
    maximum depth, enlarge pa[] arrays by one.
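
    The new idr_max() helper is essentially the following sketch (IDR_BITS
    is the number of ID bits covered per level and MAX_IDR_SHIFT clamps the
    result to the positive int range):

    #include <linux/kernel.h>

    static int idr_max(int layers)
    {
            int bits = min_t(int, layers * IDR_BITS, MAX_IDR_SHIFT);

            return (1 << bits) - 1;
    }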

    While this plugs the discovered issues, the whole code base is
    horrible and in desperate need of a rewrite. It's fragile like hell.

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Cc:

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • The current idr interface is very cumbersome.

    * For all allocations, two function calls - idr_pre_get() and
    idr_get_new*() - should be made.

    * idr_pre_get() doesn't guarantee that the following idr_get_new*()
    will not fail from memory shortage. If idr_get_new*() returns
    -EAGAIN, the caller is expected to retry pre_get and allocation.

    * idr_get_new*() can't enforce an upper limit. An upper limit can only
    be enforced by allocating and then freeing the ID if it is above the
    limit.

    * idr_layer buffer is unnecessarily per-idr. Each idr ends up keeping
    around MAX_IDR_FREE idr_layers. The memory consumed per idr is
    under two pages but it makes it difficult to make idr_layer larger.

    This patch implements the following new set of allocation functions.

    * idr_preload[_end]() - Similar to radix tree preloading but doesn't
    fail. The first idr_alloc() inside the preload section can be treated
    as if it were called with the @gfp_mask used for idr_preload().

    * idr_alloc() - Allocate an ID with lower and upper limits. Takes
    @gfp_flags and can be used without preloading. When used inside a
    preloaded section, the allocation mask of the preloading can be assumed.

    If idr_alloc() can be called from a context which allows a sufficiently
    relaxed @gfp_mask, it can be used by itself. If, for example,
    idr_alloc() is called inside a spinlock-protected region, preloading
    can be used as follows.

    idr_preload(GFP_KERNEL);
    spin_lock(lock);

    id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT);

    spin_unlock(lock);
    idr_preload_end();
    if (id < 0)
            error;

    which is much simpler and less error-prone than the idr_pre_get() and
    idr_get_new*() loop.

    The new interface uses a per-cpu idr_layer buffer and thus the number
    of idrs in the system doesn't affect the amount of memory used for
    preloading.

    idr_layer_alloc() is introduced to handle idr_layer allocations for
    both the old and new ID allocation paths. This is a bit hairy now but
    the new interface is expected to replace the old one, and the internal
    implementation will eventually become simpler.

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Move slot filling to idr_fill_slot() from idr_get_new_above_int() and
    make idr_get_new_above() directly call it. idr_get_new_above_int() is
    no longer needed and removed.

    This will be used to implement a new ID allocation interface.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr uses -1, IDR_NEED_TO_GROW and IDR_NOMORE_SPACE to communicate
    exception conditions internally. The return value is later translated
    to errno values using _idr_rc_to_errno().

    This is confusing. Drop the custom ones and consistently use -EAGAIN
    for "tree needs to grow", -ENOMEM for "need more memory" and -ENOSPC for
    "ran out of ID space".

    Due to the weird memory preloading mechanism, id[ra]_get_new*() return
    -EAGAIN on memory shortage, so we need to substitute -ENOMEM with
    -EAGAIN in those interface functions. They'll eventually be cleaned
    up and the translations will go away.

    This patch doesn't introduce any functional changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • * Move the idr_for_each_entry() definition next to the other
    idr-related definitions.

    * Make id[r|a]_get_new() inline wrappers of id[r|a]_get_new_above().

    This changes the implementation of idr_get_new() but the new
    implementation is trivial. This patch doesn't introduce any
    functional change.
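
    For reference, the wrappers become trivial inlines along these lines
    (sketch):

    static inline int idr_get_new(struct idr *idp, void *ptr, int *id)
    {
            return idr_get_new_above(idp, ptr, 0, id);
    }

    static inline int ida_get_new(struct ida *ida, int *p_id)
    {
            return ida_get_new_above(ida, 0, p_id);
    }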

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • There was only one legitimate use of idr_remove_all() and many more
    incorrect uses (or omissions of it). Now that idr_destroy() implies
    idr_remove_all() and all the in-kernel users have been updated not to
    use it, there's no reason to keep it around. Mark it deprecated so that
    we can later unexport it.

    idr_remove_all() is made an inline function calling __idr_remove_all()
    to avoid triggering a deprecation warning on EXPORT_SYMBOL().
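
    The header-side shape of the change is roughly (sketch):

    void __idr_remove_all(struct idr *idp);         /* exported from lib/idr.c */

    /* inline wrapper so EXPORT_SYMBOL(__idr_remove_all) itself doesn't
     * trip the deprecation warning */
    static inline void __deprecated idr_remove_all(struct idr *idp)
    {
            __idr_remove_all(idp);
    }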

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr is silly in quite a few ways, one of which is how it's supposed to
    be destroyed - idr_destroy() doesn't release IDs and doesn't even whine
    if the idr isn't empty. If the caller forgets idr_remove_all(), it
    simply leaks memory.

    Even ida gets this wrong and leaks memory on destruction. There is
    absolutely no reason not to call idr_remove_all() from idr_destroy().
    Nobody is abusing idr_destroy() to shrink the free layer buffer while
    continuing to use the idr afterwards, so it's safe to do remove_all
    from destroy.

    In the whole kernel, there is only one place where idr_remove_all() is
    legitimately used without a following idr_destroy(), while there are
    quite a few places where the caller forgets either idr_remove_all() or
    idr_destroy(), leaking memory.

    This patch makes idr_destroy() call idr_remove_all() and updates the
    function description accordingly.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • The iteration logic of idr_get_next() is borrowed mostly verbatim from
    idr_for_each(). It walks down the tree looking for the slot matching
    the current ID. If the matching slot is not found, the ID is
    incremented by the distance of a single slot at the given level and
    the walk repeats.

    The implementation assumes that during the whole iteration id is aligned
    to the layer boundaries of the level closest to the leaf, which is true
    for all iterations starting from zero or an existing element and thus is
    fine for idr_for_each().

    However, idr_get_next() may be given any point, and if the starting id
    hits in the middle of a non-existent layer, the increment to the next
    layer will end up skipping to the same offset into it. For example, an
    IDR with IDs filled in [64, 127] would look like the following.

        [  0   64  ...  ]
       /       |
      |        |
    NULL   [ 64 ... 127 ]

    If idr_get_next() is called with 63 as the starting point, it will try
    to follow down the pointer from 0. As it is NULL, it will then try to
    proceed to the next slot in the same level by adding the slot distance
    at that level, which is 64, making the next try 127. It goes around the
    loop, finds 127, and returns it, skipping [64, 126].

    Note that this bug also triggers in an idr_for_each_entry() loop which
    deletes during iteration, as deletions can make layers go away, leaving
    the iteration with an unaligned ID pointing into missing layers.

    Fix it by ensuring proceeding to the next slot doesn't carry over the
    unaligned offset - ie. use round_up(id + 1, slot_distance) instead of
    id += slot_distance.
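
    The stepping change itself is a one-liner; as a sketch (n is the bit
    shift of the current level, so 1 << n is the slot distance there):

    #include <linux/kernel.h>

    static int next_candidate_id_sketch(int id, int n)
    {
            /* was: return id + (1 << n);  -- carries the unaligned offset */
            return round_up(id + 1, 1 << n);
    }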

    Signed-off-by: Tejun Heo
    Reported-by: David Teigland
    Cc: KAMEZAWA Hiroyuki
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • For better code reuse, use the newly added page iterator to iterate
    through the pages. The offset and length within the page are still
    calculated by the mapping iterator, as is the actual mapping. Idea
    from Tejun Heo.

    Signed-off-by: Imre Deak
    Cc: Maxim Levitsky
    Cc: Tejun Heo
    Cc: Daniel Vetter
    Cc: James Hogan
    Cc: Stephen Warren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Imre Deak
     
  • Add an iterator to walk through a scatter list a page at a time,
    starting at a specific page offset. As opposed to the mapping iterator,
    this is meant to be small, performing well even in simple loops like
    collecting all pages of a scatterlist into an array or setting up an
    IOMMU table based on the pages' DMA addresses.
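
    A hedged usage sketch of the iterator (accessor names as in later
    kernels; the exact helpers in this initial version may differ slightly):

    #include <linux/scatterlist.h>

    /* gather every page of a scatterlist into @pages, starting at offset 0 */
    static int sg_to_page_array_sketch(struct scatterlist *sgl, int nents,
                                       struct page **pages)
    {
            struct sg_page_iter piter;
            int i = 0;

            for_each_sg_page(sgl, &piter, nents, 0)
                    pages[i++] = sg_page_iter_page(&piter);

            return i;
    }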

    Signed-off-by: Imre Deak
    Cc: Maxim Levitsky
    Cc: Tejun Heo
    Cc: Daniel Vetter
    Tested-by: Stephen Warren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Imre Deak
     
  • A misplaced #endif causes link errors related to pcim_*() functions.

    This is because the pcim_*() functions depend on the CONFIG_PCI option,
    not on the CONFIG_HAS_IOPORT option. Therefore, when CONFIG_PCI is
    enabled and CONFIG_HAS_IOPORT is not, the build fails with link errors
    related to the pcim_*() functions, as below:

    drivers/ata/libata-sff.c:3233: undefined reference to `pcim_iomap_regions'
    drivers/ata/libata-sff.c:3238: undefined reference to `pcim_iomap_table'
    drivers/built-in.o: In function `ata_pci_sff_init_host':
    drivers/ata/libata-sff.c:2318: undefined reference to `pcim_iomap_regions'
    drivers/ata/libata-sff.c:2329: undefined reference to `pcim_iomap_table'
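
    The intended layout is roughly the following (sketch, presumably of
    lib/devres.c, where these helpers live):

    #ifdef CONFIG_HAS_IOPORT
    /* devm_ioport_map() and friends ... */
    #endif /* CONFIG_HAS_IOPORT -- must close before the PCI helpers */

    #ifdef CONFIG_PCI
    /* pcim_iomap_table(), pcim_iomap_regions(), ... */
    #endif /* CONFIG_PCI */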

    Signed-off-by: Jingoo Han
    Cc: Greg KH
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jingoo Han
     

26 Feb, 2013

1 commit

  • Pull module update from Rusty Russell:
    "The sweeping change is to make add_taint() explicitly indicate whether
    to disable lockdep, but it's a mechanical change."

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    MODSIGN: Add option to not sign modules during modules_install
    MODSIGN: Add -s option to sign-file
    MODSIGN: Specify the hash algorithm on sign-file command line
    MODSIGN: Simplify Makefile with a Kconfig helper
    module: clean up load_module a little more.
    modpost: Ignore ARC specific non-alloc sections
    module: constify within_module_*
    taint: add explicit flag to show whether lock dep is still OK.
    module: printk message when module signature fail taints kernel.

    Linus Torvalds
     

23 Feb, 2013

1 commit

  • Pull core locking changes from Ingo Molnar:
    "The biggest change is the rwsem lock-steal improvements, both to the
    assembly optimized and the spinlock based variants.

    The other notable change is the clean up of the seqlock implementation
    to be based on the seqcount infrastructure.

    The rest is assorted smaller debuggability, cleanup and continued -rt
    locking changes."

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    rwsem-spinlock: Implement writer lock-stealing for better scalability
    futex: Revert "futex: Mark get_robust_list as deprecated"
    generic: Use raw local irq variant for generic cmpxchg
    lockdep: Selftest: convert spinlock to raw spinlock
    seqlock: Use seqcount infrastructure
    seqlock: Remove unused functions
    ntp: Make ntp_lock raw
    intel_idle: Convert i7300_idle_lock to raw_spinlock
    locking: Various static lock initializer fixes
    lockdep: Print more info when MAX_LOCK_DEPTH is exceeded
    rwsem: Implement writer lock-stealing for better scalability
    lockdep: Silence warning if CONFIG_LOCKDEP isn't set
    watchdog: Use local_clock for get_timestamp()
    lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug()
    locking/stat: Fix a typo

    Linus Torvalds
     

22 Feb, 2013

11 commits

  • Pull x86 mm changes from Peter Anvin:
    "This is a huge set of several partly interrelated (and concurrently
    developed) changes, which is why the branch history is messier than
    one would like.

    The *really* big items are two humongous patchsets mostly developed
    by Yinghai Lu at my request, which completely revamp the way we
    create initial page tables. In particular, rather than estimating how
    much memory we will need for page tables and then building them into
    that memory -- a calculation that has shown itself to be incredibly fragile -- we
    now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
    a #PF handler which creates temporary page tables on demand.

    This has several advantages:

    1. It makes it much easier to support things that need access to data
    very early (a followon patchset uses this to load microcode way
    early in the kernel startup).

    2. It allows the kernel and all the kernel data objects to be invoked
    from above the 4 GB limit. This allows kdump to work on very large
    systems.

    3. It greatly reduces the difference between Xen and native (Xen's
    equivalent of the #PF handler are the temporary page tables created
    by the domain builder), eliminating a bunch of fragile hooks.

    The patch series also gets us a bit closer to W^X.

    Additional work in this pull is the 64-bit get_user() work which you
    were also involved with, and a bunch of cleanups/speedups to
    __phys_addr()/__pa()."

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits)
    x86, mm: Move reserving low memory later in initialization
    x86, doc: Clarify the use of asm("%edx") in uaccess.h
    x86, mm: Redesign get_user with a __builtin_choose_expr hack
    x86: Be consistent with data size in getuser.S
    x86, mm: Use a bitfield to mask nuisance get_user() warnings
    x86/kvm: Fix compile warning in kvm_register_steal_time()
    x86-32: Add support for 64bit get_user()
    x86-32, mm: Remove reference to alloc_remap()
    x86-32, mm: Remove reference to resume_map_numa_kva()
    x86-32, mm: Rip out x86_32 NUMA remapping code
    x86/numa: Use __pa_nodebug() instead
    x86: Don't panic if can not alloc buffer for swiotlb
    mm: Add alloc_bootmem_low_pages_nopanic()
    x86, 64bit, mm: hibernate use generic mapping_init
    x86, 64bit, mm: Mark data/bss/brk to nx
    x86: Merge early kernel reserve for 32bit and 64bit
    x86: Add Crash kernel low reservation
    x86, kdump: Remove crashkernel range find limit for 64bit
    memblock: Add memblock_mem_size()
    x86, boot: Not need to check setup_header version for setup_data
    ...

    Linus Torvalds
     
  • Merge misc patches from Andrew Morton:

    - Florian has vanished so I appear to have become fbdev maintainer
    again :(

    - Joel and Mark are distracted, so welcome the new OCFS2 maintainer

    - The backlight queue

    - Small core kernel changes

    - lib/ updates

    - The rtc queue

    - Various random bits

    * akpm: (164 commits)
    rtc: rtc-davinci: use devm_*() functions
    rtc: rtc-max8997: use devm_request_threaded_irq()
    rtc: rtc-max8907: use devm_request_threaded_irq()
    rtc: rtc-da9052: use devm_request_threaded_irq()
    rtc: rtc-wm831x: use devm_request_threaded_irq()
    rtc: rtc-tps80031: use devm_request_threaded_irq()
    rtc: rtc-lp8788: use devm_request_threaded_irq()
    rtc: rtc-coh901331: use devm_clk_get()
    rtc: rtc-vt8500: use devm_*() functions
    rtc: rtc-tps6586x: use devm_request_threaded_irq()
    rtc: rtc-imxdi: use devm_clk_get()
    rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug()
    rtc: rtc-pcf8583: use dev_warn() instead of printk()
    rtc: rtc-sun4v: use pr_warn() instead of printk()
    rtc: rtc-vr41xx: use dev_info() instead of printk()
    rtc: rtc-rs5c313: use pr_err() instead of printk()
    rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug()
    rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug()
    rtc: rtc-ds2404: use dev_err() instead of printk()
    rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk()
    ...

    Linus Torvalds
     
  • Change the default of the XZ_DEC_* config symbols to match the
    configured architecture. It is perfectly legitimate to support multiple
    XZ BCJ filters for different architectures (e.g. to mount foreign
    squashfs/xz compressed filesystems); it is however more natural not to
    select them all by default, but only the one matching the configured
    architecture.

    Signed-off-by: Florian Fainelli
    Acked-by: Lasse Collin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Florian Fainelli
     
  • Remove the XZ_DEC_* dependency on CONFIG_EXPERT, as recommended by
    Lasse Collin.

    Signed-off-by: Florian Fainelli
    Acked-by: Lasse Collin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Florian Fainelli
     
  • Group all architecture-specific BCJ filter configuration symbols under an
    if XZ_BCJ / endif statement.

    Signed-off-by: Florian Fainelli
    Acked-by: Lasse Collin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Florian Fainelli
     
  • match_number() has return values of -ENOMEM, -EINVAL and -ERANGE, so
    the documented return values of the functions calling match_number()
    should include these. Fix up the comments to reflect the correct
    values.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Amit Sahrawat
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namjae Jeon
     
  • Add the %pa format specifier for printing a phys_addr_t type and its
    derivative types (such as resource_size_t), since the physical address
    size on some platforms can vary based on build options, regardless of
    the size of the native integer type.
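
    A minimal usage sketch; like other %p extensions, %pa expects a pointer
    to the value:

    #include <linux/printk.h>
    #include <linux/types.h>

    static void report_region_sketch(phys_addr_t start, resource_size_t size)
    {
            pr_info("region: start %pa, size %pa\n", &start, &size);
    }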

    Signed-off-by: Stepan Moskovchenko
    Cc: Rob Landley
    Cc: George Spelvin
    Cc: Andy Shevchenko
    Cc: Stephen Boyd
    Cc: Andrei Emeltchenko
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stepan Moskovchenko
     
  • The dependency on CONFIG_EXPERT doesn't really make sense, and hides
    the option unintentionally. Remove the superfluous "default n" pointed
    out by Ingo as well.

    Signed-off-by: Kyle McMartin
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kyle McMartin
     
  • Pull tty/serial patches from Greg Kroah-Hartman:
    "Here's the big tty/serial driver patches for 3.9-rc1.

    More tty port rework and fixes from Jiri here, as well as lots of
    individual serial driver updates and fixes.

    All of these have been in the linux-next tree for a while."

    * tag 'tty-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits)
    tty: mxser: improve error handling in mxser_probe() and mxser_module_init()
    serial: imx: fix uninitialized variable warning
    serial: tegra: assume CONFIG_OF
    TTY: do not update atime/mtime on read/write
    lguest: select CONFIG_TTY to build properly.
    ARM defconfigs: add missing inclusions of linux/platform_device.h
    fb/exynos: include platform_device.h
    ARM: sa1100/assabet: include platform_device.h directly
    serial: imx: Fix recursive locking bug
    pps: Fix build breakage from decoupling pps from tty
    tty: Remove ancient hardpps()
    pps: Additional cleanups in uart_handle_dcd_change
    pps: Move timestamp read into PPS code proper
    pps: Don't crash the machine when exiting will do
    pps: Fix a use-after free bug when unregistering a source.
    pps: Use pps_lookup_dev to reduce ldisc coupling
    pps: Add pps_lookup_dev() function
    tty: serial: uartlite: Support uartlite on big and little endian systems
    tty: serial: uartlite: Fix sparse and checkpatch warnings
    serial/arc-uart: Miscll DT related updates (Grant's review comments)
    ...

    Fix up trivial conflicts, mostly just due to the TTY config option
    clashing with the EXPERIMENTAL removal.

    Linus Torvalds
     
  • Pull driver core patches from Greg Kroah-Hartman:
    "Here is the big driver core merge for 3.9-rc1

    There are two major series here, both of which touch lots of drivers
    all over the kernel, and will cause you some merge conflicts:

    - add a new function called devm_ioremap_resource() to properly be
    able to check return values.

    - remove CONFIG_EXPERIMENTAL

    Other than those patches, there's not much here, some minor fixes and
    updates"

    Fix up trivial conflicts

    * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
    base: memory: fix soft/hard_offline_page permissions
    drivercore: Fix ordering between deferred_probe and exiting initcalls
    backlight: fix class_find_device() arguments
    TTY: mark tty_get_device call with the proper const values
    driver-core: constify data for class_find_device()
    firmware: Ignore abort check when no user-helper is used
    firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
    firmware: Make user-mode helper optional
    firmware: Refactoring for splitting user-mode helper code
    Driver core: treat unregistered bus_types as having no devices
    watchdog: Convert to devm_ioremap_resource()
    thermal: Convert to devm_ioremap_resource()
    spi: Convert to devm_ioremap_resource()
    power: Convert to devm_ioremap_resource()
    mtd: Convert to devm_ioremap_resource()
    mmc: Convert to devm_ioremap_resource()
    mfd: Convert to devm_ioremap_resource()
    media: Convert to devm_ioremap_resource()
    iommu: Convert to devm_ioremap_resource()
    drm: Convert to devm_ioremap_resource()
    ...

    Linus Torvalds
     
  • Pull security subsystem updates from James Morris:
    "This is basically a maintenance update for the TPM driver and EVM/IMA"

    Fix up conflicts in lib/digsig.c and security/integrity/ima/ima_main.c

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (45 commits)
    tpm/ibmvtpm: build only when IBM pseries is configured
    ima: digital signature verification using asymmetric keys
    ima: rename hash calculation functions
    ima: use new crypto_shash API instead of old crypto_hash
    ima: add policy support for file system uuid
    evm: add file system uuid to EVM hmac
    tpm_tis: check pnp_acpi_device return code
    char/tpm/tpm_i2c_stm_st33: drop temporary variable for return value
    char/tpm/tpm_i2c_stm_st33: remove dead assignment in tpm_st33_i2c_probe
    char/tpm/tpm_i2c_stm_st33: Remove __devexit attribute
    char/tpm/tpm_i2c_stm_st33: Don't use memcpy for one byte assignment
    tpm_i2c_stm_st33: removed unused variables/code
    TPM: Wait for TPM_ACCESS tpmRegValidSts to go high at startup
    tpm: Fix cancellation of TPM commands (interrupt mode)
    tpm: Fix cancellation of TPM commands (polling mode)
    tpm: Store TPM vendor ID
    TPM: Work around buggy TPMs that block during continue self test
    tpm_i2c_stm_st33: fix oops when i2c client is unavailable
    char/tpm: Use struct dev_pm_ops for power management
    TPM: STMicroelectronics ST33 I2C BUILD STUFF
    ...

    Linus Torvalds
     

20 Feb, 2013

1 commit

  • Pull RCU changes from Ingo Molnar:
    "SRCU changes:

    - These include debugging aids, updates that move towards the goal of
    permitting srcu_read_lock() and srcu_read_unlock() to be used from
    idle and offline CPUs, and a few small fixes.

    Changes to rcutorture and to RCU documentation:

    - Posted to LKML at https://lkml.org/lkml/2013/1/26/188

    Enhancements to uniprocessor handling in tiny RCU:

    - Posted to LKML at https://lkml.org/lkml/2013/1/27/2

    Tag RCU callbacks with grace-period number to simplify callback
    advancement:

    - Posted to LKML at https://lkml.org/lkml/2013/1/26/203

    Miscellaneous fixes:

    - Posted to LKML at https://lkml.org/lkml/2013/1/26/204"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
    srcu: use ACCESS_ONCE() to access sp->completed in srcu_read_lock()
    srcu: Update synchronize_srcu_expedited()'s comments
    srcu: Update synchronize_srcu()'s comments
    srcu: Remove checks preventing idle CPUs from calling srcu_read_lock()
    srcu: Remove checks preventing offline CPUs from calling srcu_read_lock()
    srcu: Simple cleanup for cleanup_srcu_struct()
    srcu: Add might_sleep() annotation to synchronize_srcu()
    srcu: Simplify __srcu_read_unlock() via this_cpu_dec()
    rcu: Allow rcutorture to be built at low optimization levels
    rcu: Make rcutorture's shuffler task shuffle recently added tasks
    rcu: Allow TREE_PREEMPT_RCU on UP systems
    rcu: Provide RCU CPU stall warnings for tiny RCU
    context_tracking: Add comments on interface and internals
    rcu: Remove obsolete Kconfig option from comment
    rcu: Remove unused code originally used for context tracking
    rcu: Consolidate debugging Kconfig options
    rcu: Correct 'optimized' to 'optimize' in header comment
    rcu: Trace callback acceleration
    rcu: Tag callback lists with corresponding grace-period number
    rcutorture: Don't compare ptr with 0
    ...

    Linus Torvalds
     

19 Feb, 2013

3 commits

  • We (Linux Kernel Performance project) found a regression
    introduced by commit:

    5a505085f043 mm/rmap: Convert the struct anon_vma::mutex to an rwsem

    which converted all anon_vma::mutex locks to rwsem write locks.

    The semantics are the same, but the behavioral difference is
    quite huge in some cases. After investigating it we found the
    root cause: mutexes support lock stealing while rwsems don't.

    Here is the link for the detailed regression report:

    https://lkml.org/lkml/2013/1/29/84

    Ingo suggested adding write lock stealing to rwsems:

    "I think we should allow lock-steal between rwsem writers - that
    will not hurt fairness as most rwsem fairness concerns relate to
    reader vs. writer fairness"

    And here is the rwsem-spinlock version.

    With this patch, we got a double performance increase in one test box
    with the following aim7 workfile:

    FILESIZE: 1M
    POOLSIZE: 10M
    10 fork_test

    /usr/bin/time output w/o patch             /usr/bin/time output with patch
    -----------------------------------        -----------------------------------
    Percent of CPU this job got: 369%          Percent of CPU this job got: 537%
    Voluntary context switches: 640595016      Voluntary context switches: 157915561

    We got a 45% increase in CPU usage and saved about 3/4 of the voluntary
    context switches.

    Reported-by: LKP project
    Suggested-by: Ingo Molnar
    Signed-off-by: Yuanhan Liu
    Cc: Alex Shi
    Cc: David Howells
    Cc: Michel Lespinasse
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Anton Blanchard
    Cc: Arjan van de Ven
    Cc: paul.gortmaker@windriver.com
    Link: http://lkml.kernel.org/r/1359716356-23865-1-git-send-email-yuanhan.liu@linux.intel.com
    Signed-off-by: Ingo Molnar

    Yuanhan Liu
     
  • To make the lockdep selftest work on RT we need to convert the
    spinlock tests to a raw spinlock; otherwise we cannot run the irq
    context checks. For mainline this is merely an annotation change, as
    spinlocks are mapped to raw_spinlocks anyway.

    Signed-off-by: Yong Zhang
    Link: http://lkml.kernel.org/r/1334559716-18447-2-git-send-email-yong.zhang0@gmail.com
    Signed-off-by: Thomas Gleixner

    Yong Zhang
     
  • Commit 5a505085f043 ("mm/rmap: Convert the struct anon_vma::mutex
    to an rwsem") changed struct anon_vma::mutex to an rwsem, which
    caused aim7 fork_test performance to drop by 50%.

    Yuanhan Liu did the following excellent analysis:

    https://lkml.org/lkml/2013/1/29/84

    and found that the regression is caused by strict, serialized,
    FIFO sequential write-ownership of rwsems. Ingo suggested
    implementing opportunistic lock-stealing for the front writer
    task in the waitqueue.

    Yuanhan Liu implemented lock-stealing for spinlock-rwsems,
    which indeed recovered much of the regression - confirming
    the analysis that the main factor in the regression was the
    FIFO writer-fairness of rwsems.

    In this patch we allow lock-stealing to happen when the first
    waiter is also a writer. With that change in place the
    aim7 fork_test performance is fully recovered on my
    Intel NHM EP, NHM EX, SNB EP 2S and 4S test machines.
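
    Purely as an illustration of the stealing condition (hypothetical names,
    not the kernel's rwsem code): a writer takes the lock only if the count
    shows it is currently free, instead of waiting for a strict FIFO
    hand-off to the writer at the head of the wait queue.

    #include <linux/atomic.h>

    static bool writer_steal_sketch(atomic_long_t *count_sketch)
    {
            long unlocked = 0;      /* 0 == unlocked in this sketch */

            /* succeed only if nobody holds the lock right now */
            return atomic_long_cmpxchg(count_sketch, unlocked, -1) == unlocked;
    }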

    Reported-by: lkp@linux.intel.com
    Reported-by: Yuanhan Liu
    Signed-off-by: Alex Shi
    Cc: David Howells
    Cc: Michel Lespinasse
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Anton Blanchard
    Cc: Arjan van de Ven
    Cc: paul.gortmaker@windriver.com
    Link: https://lkml.org/lkml/2013/1/29/84
    Link: http://lkml.kernel.org/r/1360069915-31619-1-git-send-email-alex.shi@intel.com
    [ Small stylistic fixes, updated changelog. ]
    Signed-off-by: Ingo Molnar

    Alex Shi
     

05 Feb, 2013

1 commit

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    1. Changes to rcutorture and to RCU documentation. Posted to LKML at
    https://lkml.org/lkml/2013/1/26/188.

    2. Enhancements to uniprocessor handling in tiny RCU. Posted to LKML
    at https://lkml.org/lkml/2013/1/27/2.

    3. Tag RCU callbacks with grace-period number to simplify callback
    advancement. Posted to LKML at https://lkml.org/lkml/2013/1/26/203.

    4. Miscellaneous fixes. Posted to LKML at https://lkml.org/lkml/2013/1/26/204.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

01 Feb, 2013

3 commits


30 Jan, 2013

1 commit

  • Normal boot path on a system with IOMMU support: the swiotlb buffer is
    allocated early at first, and then we try to initialize the IOMMU; if
    the Intel or AMD IOMMU can be set up properly, the swiotlb buffer is
    freed.

    The early allocation is done with bootmem and could panic when we try
    to use kdump with the buffer above 4G only, or with memmap used to
    limit memory to under 4G, for example memmap=4095M$1M to remove memory
    under 4G.

    According to Eric, add a _nopanic version and no_iotlb_memory to fail
    map_single later if swiotlb is still needed.

    -v2: don't pass nopanic, and use -ENOMEM return value according to Eric.
    panic early instead of using swiotlb_full to panic... according to Eric/Konrad.
    -v3: make swiotlb_init nopanic, but this will affect:
    arm64, ia64, powerpc, tile, unicore32, x86.
    -v4: clean up swiotlb_init by removing swiotlb_init_with_default_size.

    Suggested-by: Eric W. Biederman
    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/1359058816-7615-36-git-send-email-yinghai@kernel.org
    Reviewed-and-tested-by: Konrad Rzeszutek Wilk
    Cc: Joerg Roedel
    Cc: Ralf Baechle
    Cc: Jeremy Fitzhardinge
    Cc: Kyungmin Park
    Cc: Marek Szyprowski
    Cc: Arnd Bergmann
    Cc: Andrzej Pietrasiewicz
    Cc: linux-mips@linux-mips.org
    Cc: xen-devel@lists.xensource.com
    Cc: virtualization@lists.linux-foundation.org
    Cc: Shuah Khan
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

29 Jan, 2013

2 commits

  • …' and 'tiny.2013.01.29b' into HEAD

    doctorture.2013.01.11a: Changes to rcutorture and to RCU documentation.

    fixes.2013.01.26a: Miscellaneous fixes.

    tagcb.2013.01.24a: Tag RCU callbacks with grace-period number to
    simplify callback advancement.

    tiny.2013.01.29b: Enhancements to uniprocessor handling in tiny RCU.

    Paul E. McKenney
     
  • Tiny RCU has historically omitted RCU CPU stall warnings in order to
    reduce memory requirements; however, the lack of these warnings caused
    Thomas Gleixner some debugging pain recently. Therefore, this commit
    adds RCU CPU stall warnings to tiny RCU if RCU_TRACE=y. This keeps
    the memory footprint small, while still enabling CPU stall warnings
    in kernels built to enable them.

    Updated to include Josh Triplett's suggested use of RCU_STALL_COMMON
    config variable to simplify #if expressions.

    Reported-by: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney