11 Sep, 2013

6 commits

  • This series reworks our current object cache shrinking infrastructure in
    two main ways:

    * Noticing that a lot of users copy and paste their own version of LRU
    lists for objects, we put some effort into providing a generic version.
    It is modeled after the filesystem users: dentries, inodes, and xfs
    (for various tasks), but we expect that other users could benefit in
    the near future with little or no modification. Let us know if you
    have any issues.

    * The underlying list_lru being proposed automatically and
    transparently keeps the elements in per-node lists, and is able to
    manipulate the node lists individually. Given this infrastructure, we
    are able to modify the up-to-now hammer called shrink_slab to proceed
    with node-reclaim instead of always searching memory from all over like
    it has been doing.

    Per-node lru lists are also expected to lead to less contention in the lru
    locks on multi-node scans, since we are now no longer fighting for a
    global lock. The locks usually disappear from the profilers with this
    change.
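
    A minimal sketch of the per-node idea, assuming a fixed node count and
    hypothetical example_* names (this is not the list_lru API itself, and
    the per-node locks are only indicated in a comment):

```c
#include <stddef.h>

#define EXAMPLE_MAX_NODES 4            /* hypothetical number of NUMA nodes */

struct example_item {
    struct example_item *next;
    int nid;                           /* node the object's memory lives on */
};

/* One list per node; a real implementation would give each its own lock. */
struct example_node_list {
    struct example_item *head;         /* least recently added */
    struct example_item *tail;         /* most recently added */
    long nr_items;
};

struct example_lru {
    struct example_node_list node[EXAMPLE_MAX_NODES];
};

/* Append the item to the list of the node it belongs to. */
static void example_lru_add(struct example_lru *lru, struct example_item *item)
{
    struct example_node_list *nl = &lru->node[item->nid];

    item->next = NULL;
    if (nl->tail)
        nl->tail->next = item;
    else
        nl->head = item;
    nl->tail = item;
    nl->nr_items++;
}

static long example_lru_count_node(const struct example_lru *lru, int nid)
{
    return lru->node[nid].nr_items;
}

/* Drop up to nr_to_scan of the oldest items on one node only. */
static long example_lru_shrink_node(struct example_lru *lru, int nid,
                                    long nr_to_scan)
{
    struct example_node_list *nl = &lru->node[nid];
    long freed = 0;

    while (nl->head && freed < nr_to_scan) {
        struct example_item *victim = nl->head;

        nl->head = victim->next;
        if (nl->head == NULL)
            nl->tail = NULL;
        victim->next = NULL;
        nl->nr_items--;
        freed++;
    }
    return freed;
}
```

    Because a scan touches a single node's list, two scans on different
    nodes would not need to share a lock in this layout.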

    Although we have no official benchmarks for this version - be our guest to
    independently evaluate this - earlier versions of this series were
    performance tested (details at
    http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
    visible performance regressions while yielding a better qualitative
    behavior in NUMA machines.

    With this infrastructure in place, we can use the list_lru entry point to
    provide memcg isolation and per-memcg targeted reclaim. Historically,
    those two pieces of work have been posted together. This version presents
    only the infrastructure work, deferring the memcg work for a later time,
    so we can focus on getting this part tested. You can see more about the
    history of such work at http://lwn.net/Articles/552769/

    Dave Chinner (18):
    dcache: convert dentry_stat.nr_unused to per-cpu counters
    dentry: move to per-sb LRU locks
    dcache: remove dentries from LRU before putting on dispose list
    mm: new shrinker API
    shrinker: convert superblock shrinkers to new API
    list: add a new LRU list type
    inode: convert inode lru list to generic lru list code.
    dcache: convert to use new lru list infrastructure
    list_lru: per-node list infrastructure
    shrinker: add node awareness
    fs: convert inode and dentry shrinking to be node aware
    xfs: convert buftarg LRU to generic code
    xfs: rework buffer dispose list tracking
    xfs: convert dquot cache lru to list_lru
    fs: convert fs shrinkers to new scan/count API
    drivers: convert shrinkers to new count/scan API
    shrinker: convert remaining shrinkers to count/scan API
    shrinker: Kill old ->shrink API.

    Glauber Costa (7):
    fs: bump inode and dentry counters to long
    super: fix calculation of shrinkable objects for small numbers
    list_lru: per-node API
    vmscan: per-node deferred work
    i915: bail out earlier when shrinker cannot acquire mutex
    hugepage: convert huge zero page shrinker to new shrinker API
    list_lru: dynamically adjust node arrays

    This patch:

    There are situations in very large machines in which we can have a large
    quantity of dirty inodes, unused dentries, etc. This is particularly true
    when umounting a filesystem, where every live object will eventually be
    discarded.

    Dave Chinner reported a problem with this while experimenting with the
    shrinker revamp patchset, so we believe it is time for a change. This
    patch just changes the counters from int to long. Machines where it
    matters should have a big long anyway.
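
    As a rough illustration of why the wider type helps (the helper names
    below are hypothetical and not part of the patch), an object count just
    above INT_MAX no longer fits an int counter but does fit a 64-bit long:

```c
#include <limits.h>

/* Returns 1 if `count` objects can be tracked by an int counter. */
static int example_fits_in_int(long long count)
{
    return count >= 0 && count <= INT_MAX;
}

/* Returns 1 if `count` objects can be tracked by a long counter. */
static int example_fits_in_long(long long count)
{
    return count >= 0 && count <= LONG_MAX;
}
```

    On a machine where long is 32 bits wide, both helpers give the same
    answer, which matches the remark that the change only matters where
    long is big.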

    Signed-off-by: Glauber Costa
    Cc: Dave Chinner
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: Dave Chinner
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa
     
  • Signed-off-by: Dave Jones
    Signed-off-by: Al Viro

    Dave Jones
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • For a long time no filesystem has been using vfs_follow_link, and as seen
    by recent filesystem submissions any new use is accidental as well.

    Remove vfs_follow_link, document the replacement in
    Documentation/filesystems/porting and also rename __vfs_follow_link
    to match its only caller better.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Pull vfs pile 3 (of many) from Al Viro:
    "Waiman's conversion of d_path() and bits related to it,
    kern_path_mountpoint(), several cleanups and fixes (exportfs
    one is -stable fodder, IMO).

    There definitely will be more... ;-/"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    split read_seqretry_or_unlock(), convert d_walk() to resulting primitives
    dcache: Translating dentry into pathname without taking rename_lock
    autofs4 - fix device ioctl mount lookup
    introduce kern_path_mountpoint()
    rename user_path_umountat() to user_path_mountpoint_at()
    take unlazy_walk() into umount_lookup_last()
    Kill indirect include of file.h from eventfd.h, use fdget() in cgroup.c
    prune_super(): sb->s_op is never NULL
    exportfs: don't assume that ->iterate() won't feed us too long entries
    afs: get rid of redundant ->d_name.len checks

    Linus Torvalds
     
  • When I moved the RCU walk termination into unlazy_walk(), I didn't copy
    quite all of it: for the successful RCU termination we properly add the
    necessary reference counts to our temporary copy of the root path, but
    for the failure case we need to make sure that any temporary root path
    information is cleared out (since it does _not_ have the proper
    reference counts from the RCU lookup).

    We could clean up this mess by just always dropping the temporary root
    information, but Al points out that that would mean that a single lookup
    through symlinks could see multiple different root entries if it races
    with another thread doing chroot. Not that I think we should really
    care (we had that before too, back before we had a copy of the root path
    in the nameidata).

    Al says he has a cunning plan. In the meantime, this is the minimal fix
    for the problem, even if it's not all that pretty.

    Reported-by: Mace Moneta
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Sep, 2013

27 commits

  • Pull dmaengine update from Dan Williams:
    "Collection of random updates to the core and some end-driver fixups
    for ioatdma and mv_xor:
    - NUMA aware channel allocation
    - Cleanup dmatest debugfs interface
    - ioat: make raid-support Atom only
    - mv_xor: big endian

    Aside from the top three commits these have all had some soak time in
    -next. The top commit fixes a recent build breakage.

    It has been a long while since my last pull request, hopefully it does
    not show. Thanks to Vinod for keeping an eye on drivers/dma/ this
    past year"

    * tag 'dmaengine-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
    dmaengine: dma_sync_wait and dma_find_channel undefined
    MAINTAINERS: update email for Dan Williams
    dma: mv_xor: Fix incorrect error path
    ioatdma: silence GCC warnings
    dmaengine: make dma_channel_rebalance() NUMA aware
    dmaengine: make dma_submit_error() return an error code
    ioatdma: disable RAID on non-Atom platforms and reenable unaligned copies
    mv_xor: support big endian systems using descriptor swap feature
    mv_xor: use {readl, writel}_relaxed instead of __raw_{readl, writel}
    dmatest: print message on debug level in case of no error
    dmatest: remove IS_ERR_OR_NULL checks of debugfs calls
    dmatest: make module parameters writable

    Linus Torvalds
     
  • dma_sync_wait and dma_find_channel are declared regardless of whether
    CONFIG_DMA_ENGINE is enabled, but calling these functions without
    CONFIG_DMA_ENGINE enabled results in "undefined reference" errors.

    To get around this, declare dma_sync_wait and dma_find_channel as inline
    functions if CONFIG_DMA_ENGINE is undefined.
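
    The stub pattern can be sketched as follows. All example_* names, the
    EXAMPLE_CONFIG_DMA_ENGINE switch and the stub return values are
    assumptions made for illustration, not the actual dmaengine header:

```c
#include <stddef.h>

struct example_dma_chan;                 /* hypothetical opaque channel type */

enum example_dma_status { EXAMPLE_DMA_COMPLETE, EXAMPLE_DMA_ERROR };

#ifdef EXAMPLE_CONFIG_DMA_ENGINE
/* Feature enabled: the definitions come from another object file. */
enum example_dma_status example_dma_sync_wait(struct example_dma_chan *chan,
                                              int cookie);
struct example_dma_chan *example_dma_find_channel(int tx_type);
#else
/* Feature disabled: inline stubs let callers link without the engine. */
static inline enum example_dma_status
example_dma_sync_wait(struct example_dma_chan *chan, int cookie)
{
    (void)chan;
    (void)cookie;
    return EXAMPLE_DMA_COMPLETE;         /* nothing to wait for */
}

static inline struct example_dma_chan *example_dma_find_channel(int tx_type)
{
    (void)tx_type;
    return NULL;                         /* no channel is ever available */
}
#endif
```

    Callers then compile and link in both configurations, and see "no
    channel" when the engine is compiled out.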

    Signed-off-by: Jon Mason
    Signed-off-by: Dan Williams

    Jon Mason
     
  • Pull ARM SoC late changes from Kevin Hilman:
    "These are changes that arrived a little late before the merge window,
    or had dependencies on previous branches.

    Highlights:
    - ux500: misc. cleanup, fixup I2C devices
    - exynos: DT updates for RTC; PM updates
    - at91: DT updates for NAND; new platforms added to generic defconfig
    - sunxi: DT updates: cubieboard2, pinctrl driver, gated clocks
    - highbank: LPAE fixes, select necessary ARM errata
    - omap: PM fixes and improvements; OMAP5 mailbox support
    - omap: basic support for new DRA7xx SoCs"

    * tag 'late-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (60 commits)
    ARM: dts: vexpress: Add CCI node to TC2 device-tree
    ARM: EXYNOS: Skip C1 cpuidle state for exynos5440
    ARM: EXYNOS: always enable PM domains support for EXYNOS4X12
    ARM: highbank: clean-up some unused includes
    ARM: sun7i: Enable the A20 clocks in the DTSI
    ARM: sun6i: Enable clock support in the DTSI
    ARM: sun5i: dt: Use the A10s gates in the DTSI
    ARM: at91: at91_dt_defconfig: enable rm9200 support
    ARM: dts: add ADC device tree node for exynos5420/5250
    ARM: dts: Add RTC DT node to Exynos5420 SoC
    ARM: dts: Update the "status" property of RTC DT node for Exynos5250 SoC
    ARM: dts: Fix the RTC DT node name for Exynos5250
    irqchip: mmp: avoid to include irqs head file
    ARM: mmp: avoid to include head file in mach-mmp
    irqchip: mmp: support irqchip
    irqchip: move mmp irq driver
    ARM: OMAP: AM33xx: clock: Add RNG clock data
    ARM: OMAP: TI81XX: add always-on powerdomain for TI81XX
    ARM: OMAP4: clock: Lock PLLs in the right sequence
    ARM: OMAP: AM33XX: hwmod: Add hwmod data for debugSS
    ...

    Linus Torvalds
     
  • Pull ARM Renesas SoC cleanup, refactoring and more SMP support from Kevin Hilman:
    "Lots of cleanup and refactoring and some SMP additions for Renesas
    platforms. Due to some inter-dependencies with other arm-soc
    branches, this Renesas stuff was separated out for sending after the
    other branches were merged.

    Highlights:
    - remove unused board support and cleanup of unused headers
    - refactoring of init and device registration
    - simplify IRQ initialization"

    * tag 'renesas-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (68 commits)
    ARM: shmobile: Per-CPU SMP boot / sleep code for SCU SoCs
    ARM: shmobile: Introduce per-CPU SMP boot / sleep code
    ARM: shmobile: Use shared SCU CPU Hotplug code on r8a7779
    ARM: shmobile: Use shared SCU CPU Hotplug code on sh73a0
    ARM: shmobile: Add shared SCU CPU Hotplug code
    ARM: shmobile: Use shared SCU SMP boot code on emev2
    ARM: shmobile: Use shared SCU SMP boot code on r8a7779
    ARM: shmobile: Use shared SCU SMP boot code on sh73a0
    ARM: shmobile: Introduce shared SCU SMP boot code
    ARM: shmobile: sh73a0: Remove global GPIO_NR definition
    ARM: shmobile: kzm9d: remove nfsroot settings from bootargs
    ARM: shmobile: armadillo800eva: remove nfsroot settings from bootargs
    ARM: shmobile: r8a7779: move r8a7779_init_irq_xxx() to setup
    ARM: shmobile: r8a7740: move r8a7740_init_irq_of() to setup
    ARM: shmobile: bockw: add missing __initdata
    ARM: shmobile: r8a7790: add missing __initdata
    ARM: shmobile: r8a7779: add missing __initdata
    ARM: shmobile: Remove unused shmobile_init_time()
    ARM: shmobile: Use clocksource_of_init() on r8a7790
    ARM: shmobile: Use default ->init_time() on KZM9G DT ref
    ...

    Linus Torvalds
     
  • Pull ARM SoC driver update from Kevin Hilman:
    "This contains the ARM SoC related driver updates for v3.12. The only
    things this cycle are core PM updates and CPUidle support for ARM's TC2
    big.LITTLE development platform"

    * tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    cpuidle: big.LITTLE: vexpress-TC2 CPU idle driver
    ARM: vexpress: tc2: disable GIC CPU IF in tc2_pm_suspend
    drivers: irq-chip: irq-gic: introduce gic_cpu_if_down()

    Linus Torvalds
     
  • Pull clock framework changes from Michael Turquette:
    "The common clk framework changes for 3.12 are dominated by clock
    driver patches, both new drivers and fixes to existing. A high
    percentage of these are for Samsung platforms like Exynos. Core
    framework fixes and some new features like automagical clock
    re-parenting round out the patches"

    * tag 'clk-for-linus-3.12' of git://git.linaro.org/people/mturquette/linux: (102 commits)
    clk: only call get_parent if there is one
    clk: samsung: exynos5250: Simplify registration of PLL rate tables
    clk: samsung: exynos4: Register PLL rate tables for Exynos4x12
    clk: samsung: exynos4: Register PLL rate tables for Exynos4210
    clk: samsung: exynos4: Reorder registration of mout_vpllsrc
    clk: samsung: pll: Add support for rate configuration of PLL46xx
    clk: samsung: pll: Use new registration method for PLL46xx
    clk: samsung: pll: Add support for rate configuration of PLL45xx
    clk: samsung: pll: Use new registration method for PLL45xx
    clk: samsung: exynos4: Rename exynos4_plls to exynos4x12_plls
    clk: samsung: exynos4: Remove checks for DT node
    clk: samsung: exynos4: Remove unused static clkdev aliases
    clk: samsung: Modify _get_rate() helper to use __clk_lookup()
    clk: samsung: exynos4: Use separate aliases for cpufreq related clocks
    clocksource: samsung_pwm_timer: Get clock from device tree
    ARM: dts: exynos4: Specify PWM clocks in PWM node
    pwm: samsung: Update DT bindings documentation to cover clocks
    clk: Move symbol export to proper location
    clk: fix new_parent dereference before null check
    clk: wm831x: Initialise wm831x pointer on init
    ...

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "Not many changes for the 3.12 merge window. The major tracing changes
    are still in flux, and will have to wait for 3.13.

    The changes for 3.12 are mostly clean ups and minor fixes.

    H Peter Anvin added a check to x86_32 static function tracing that
    helps a small segment of the kernel community.

    Oleg Nesterov had a few changes from 3.11, but were mostly clean ups
    and not worth pushing in the -rc time frame.

    Li Zefan had small clean up with annotating a raw_init with __init.

    I fixed a slight race in updating function callbacks, but the race is
    so small and the bug that happens when it occurs is so minor it's not
    even worth pushing to stable.

    The only real enhancement is from Alexander Z Lam that made the
    tracing_cpumask work for trace buffer instances, instead of them all
    sharing a global cpumask"

    * tag 'trace-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ftrace/rcu: Do not trace debug_lockdep_rcu_enabled()
    x86-32, ftrace: Fix static ftrace when early microcode is enabled
    ftrace: Fix a slight race in modifying what function callback gets traced
    tracing: Make tracing_cpumask available for all instances
    tracing: Kill the !CONFIG_MODULES code in trace_events.c
    tracing: Don't pass file_operations array to event_create_dir()
    tracing: Kill trace_create_file_ops() and friends
    tracing/syscalls: Annotate raw_init function with __init

    Linus Torvalds
     
  • In __clk_init(), after a clock is mostly initialized, a scan is done
    of the orphan clocks to see if the clock being registered is the
    parent of any of them.

    This code assumes that any clock that provides a get_parent method
    actually has at least one parent, and that's not a valid assumption.

    As a result, an orphan clock with no parent can return *something*
    as the parent index, and that value is blindly used to dereference
    the orphan's parent_names[] array (which will be ZERO_SIZE_PTR or
    NULL).

    Fix this by ensuring get_parent is only called for orphans with at
    least one parent.
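
    A simplified sketch of the guard, using a hypothetical example_clk
    structure instead of the real struct clk (the extra bounds check on the
    returned index is an addition of this sketch, not something the commit
    message describes):

```c
#include <stddef.h>

struct example_clk {
    const char **parent_names;           /* may be NULL if num_parents == 0 */
    unsigned int num_parents;
    unsigned int (*get_parent)(const struct example_clk *clk); /* optional */
};

/* Test helper: a get_parent method that always reports index 1. */
static unsigned int example_always_one(const struct example_clk *clk)
{
    (void)clk;
    return 1;
}

/* Return the orphan's parent name, or NULL if it cannot have a parent. */
static const char *example_orphan_parent_name(const struct example_clk *orphan)
{
    unsigned int idx;

    /* Only call get_parent when there is at least one parent to index. */
    if (orphan->num_parents == 0 || orphan->get_parent == NULL)
        return NULL;

    idx = orphan->get_parent(orphan);
    if (idx >= orphan->num_parents)      /* defensive check, see lead-in */
        return NULL;

    return orphan->parent_names[idx];
}
```

    With num_parents equal to zero, parent_names is never dereferenced, no
    matter what index get_parent would have returned.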

    Signed-off-by: Alex Elder
    Signed-off-by: Mike Turquette

    Alex Elder
     
  • Separate "check if we need to retry" from "unlock if we are done and
    had seq_writelock"; that allows these primitives to be used in d_walk(),
    where we need to recheck every time we ascend back to parent, but do
    *not* want to unlock until the very end. Lift
    rcu_read_lock/rcu_read_unlock out into callers.

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull xfs updates from Ben Myers:
    "For 3.12-rc1 there are a number of bugfixes in addition to work to
    ease usage of shared code between libxfs and the kernel, the rest of
    the work to enable project and group quotas to be used simultaneously,
    performance optimisations in the log and the CIL, directory entry file
    type support, fixes for log space reservations, some spelling/grammar
    cleanups, and the addition of user namespace support.

    - introduce readahead to log recovery
    - add directory entry file type support
    - fix a number of spelling errors in comments
    - introduce new Q_XGETQSTATV quotactl for project quotas
    - add USER_NS support
    - log space reservation rework
    - CIL optimisations
    - kernel/userspace libxfs rework"

    * tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
    xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
    xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
    Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
    xfs: finish removing IOP_* macros.
    xfs: inode log reservations are too small
    xfs: check correct status variable for xfs_inobt_get_rec() call
    xfs: inode buffers may not be valid during recovery readahead
    xfs: check LSN ordering for v5 superblocks during recovery
    xfs: btree block LSN escaping to disk uninitialised
    XFS: Assertion failed: first < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
    xfs: fix bad dquot buffer size in log recovery readahead
    xfs: don't account buffer cancellation during log recovery readahead
    xfs: check for underflow in xfs_iformat_fork()
    xfs: xfs_dir3_sfe_put_ino can be static
    xfs: introduce object readahead to log recovery
    xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
    xfs: Register hotcpu notifier after initialization
    xfs: add xfs sb v4 support for dirent filetype field
    xfs: Add write support for dirent filetype field
    xfs: Add read-only support for dirent filetype field
    ...

    Linus Torvalds
     
  • Not using the return value can in the generic case be racy, so it's
    in general good practice to check the return value instead.
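
    The same idea in portable C11 atomics, as a sketch only: the shared
    pointer and function name are hypothetical, and the kernel code uses
    cmpxchg() rather than atomic_compare_exchange_strong():

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical lazily created shared object, e.g. a per-sb workqueue. */
static _Atomic(void *) example_shared_wq;

/* Install a new object unless one already exists; return the winner. */
static void *example_init_once(void)
{
    void *wq = malloc(16);
    void *expected = NULL;

    if (wq == NULL)
        return NULL;

    /*
     * Decide from the result of the exchange itself. Re-reading the shared
     * pointer afterwards could observe a value changed by someone else.
     */
    if (!atomic_compare_exchange_strong(&example_shared_wq, &expected, wq)) {
        free(wq);                        /* lost the race, drop our copy */
        return expected;                 /* the object already installed */
    }
    return wq;
}
```

    A second call finds the pointer already set, frees its own allocation
    and returns the object installed by the first call.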

    This also resolved the warning caused on ARM and other architectures:

    fs/direct-io.c: In function 'sb_init_dio_done_wq':
    fs/direct-io.c:557:2: warning: value computed is not used [-Wunused-value]

    Signed-off-by: Olof Johansson
    Reviewed-by: Jan Kara
    Cc: Geert Uytterhoeven
    Cc: Stephen Rothwell
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Russell King
    Cc: H Peter Anvin
    Signed-off-by: Linus Torvalds

    Olof Johansson
     
  • When running the AIM7's short workload, Linus' lockref patch eliminated
    most of the spinlock contention. However, there were still some left:

    8.46% reaim [kernel.kallsyms] [k] _raw_spin_lock
    |--42.21%-- d_path
    | proc_pid_readlink
    | SyS_readlinkat
    | SyS_readlink
    | system_call
    | __GI___readlink
    |
    |--40.97%-- sys_getcwd
    | system_call
    | __getcwd

    The big one here is the rename_lock (seqlock) contention in d_path()
    and the getcwd system call. This patch will eliminate the need to take
    the rename_lock while translating dentries into the full pathnames.

    The need to take the rename_lock is to make sure that no rename
    operation can be ongoing while the translation is in progress. However,
    only one thread can take the rename_lock thus blocking all the other
    threads that need it even though the translation process won't make
    any change to the dentries.

    This patch will replace the writer's write_seqlock/write_sequnlock
    sequence of the rename_lock of the callers of the prepend_path() and
    __dentry_path() functions with the reader's read_seqbegin/read_seqretry
    sequence within these 2 functions. As a result, the code will have to
    retry if one or more rename operations had been performed. In addition,
    RCU read lock will be taken during the translation process to make sure
    that no dentries will go away. To prevent live-lock from happening,
    the code will switch back to take the rename_lock if read_seqretry()
    fails three times.
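
    A user-space sketch of the retry-then-lock scheme described above. The
    example_seqlock type and helpers are hypothetical stand-ins for the
    kernel's seqlock primitives, and RCU is not modelled at all:

```c
#include <stdatomic.h>

/* Sequence is even when no writer is active, odd while one is running. */
struct example_seqlock {
    atomic_uint sequence;
    atomic_flag lock;                    /* taken by writers and by fallback */
};

#define EXAMPLE_MAX_LOCKLESS_TRIES 3     /* from the text: three failures */

static void example_seqlock_init(struct example_seqlock *sl)
{
    atomic_init(&sl->sequence, 0);
    atomic_flag_clear(&sl->lock);
}

static unsigned int example_read_begin(struct example_seqlock *sl)
{
    unsigned int seq;

    while ((seq = atomic_load(&sl->sequence)) & 1)
        ;                                /* writer active, wait for even */
    return seq;
}

static int example_read_retry(struct example_seqlock *sl, unsigned int seq)
{
    return atomic_load(&sl->sequence) != seq;
}

/* Test helpers: one walk sees no rename, the other always races with one. */
static void example_quiet_walk(void *arg)
{
    (void)arg;
}

static void example_racing_walk(void *arg)
{
    struct example_seqlock *sl = arg;

    atomic_fetch_add(&sl->sequence, 2);  /* simulates a completed rename */
}

/*
 * Run walk(arg) locklessly, retrying if a writer interfered. After three
 * failed attempts, take the lock so the walk cannot be disturbed again.
 * Returns the attempt that succeeded (4 means the locked fallback ran).
 */
static int example_read_path(struct example_seqlock *sl,
                             void (*walk)(void *), void *arg)
{
    int tries;

    for (tries = 1; tries <= EXAMPLE_MAX_LOCKLESS_TRIES; tries++) {
        unsigned int seq = example_read_begin(sl);

        walk(arg);
        if (!example_read_retry(sl, seq))
            return tries;
    }

    while (atomic_flag_test_and_set(&sl->lock))
        ;                                /* spin until writers are excluded */
    walk(arg);
    atomic_flag_clear(&sl->lock);
    return tries;
}
```

    The locked fallback bounds the number of retries, which is what
    prevents a reader from live-locking under a steady stream of renames.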

    To further reduce spinlock contention, this patch does not take the
    dentry's d_lock when copying the filename from the dentries. Instead,
    it treats the name pointer and length as unreliable and just copies
    the string byte-by-byte over until it hits a null byte or the end of
    string as specified by the length. This should avoid stepping into
    invalid memory address. The error cases are left to be handled by
    the sequence number check.
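
    The bounded byte-by-byte copy could look roughly like this. It is a
    sketch with a hypothetical name and a plain (name, len) pair instead of
    the dentry's name structure, and it returns -1 instead of a kernel
    error code:

```c
/*
 * Prepend "/" followed by up to `len` bytes of `name` in front of *buffer,
 * which points into a buffer that is filled from the end towards the start.
 * Copying stops early at a NUL byte, so a stale length cannot make the loop
 * read past the end of the string.
 */
static int example_prepend_name(char **buffer, int *buflen,
                                const char *name, int len)
{
    char *p;

    if (len < 0 || *buflen < len + 1)
        return -1;                       /* not enough room left */

    *buflen -= len + 1;
    *buffer -= len + 1;
    p = *buffer;

    *p++ = '/';
    while (len--) {
        char c = *name++;

        if (c == '\0')
            break;                       /* name shorter than claimed */
        *p++ = c;
    }
    return 0;
}
```

    If the length and the string disagree, the output may contain leftover
    bytes. As the text says, such cases are left to the sequence number
    check, which forces a retry.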

    The following code refactorings are also made:
    1. Move prepend('/') into prepend_name() to remove one conditional
    check.
    2. Move the global root check in prepend_path() back to the top of
    the while loop.

    With this patch, the _raw_spin_lock will now account for only 1.2%
    of the total CPU cycles for the short workload. This patch also has
    the effect of reducing the effect of running perf on its profile
    since the perf command itself can be a heavy user of the d_path()
    function depending on the complexity of the workload.

    When taking the perf profile of the high-systime workload, the amount
    of spinlock contention contributed by running perf without this patch
    was about 16%. With this patch, the spinlock contention caused by
    the running of perf will go away and we will have a more accurate
    perf profile.

    Signed-off-by: Waiman Long
    Signed-off-by: Al Viro

    Waiman Long
     
  • Pull mtd updates from David Woodhouse:
    - factor out common code from MTD tests
    - nand-gpio cleanup and portability to non-ARM
    - m25p80 support for 4-byte addressing chips, other new chips
    - pxa3xx cleanup and support for new platforms
    - remove obsolete alauda, octagon-5066 drivers
    - erase/write support for bcm47xxsflash
    - improve detection of ECC requirements for NAND, controller setup
    - NFC acceleration support for atmel-nand, read/write via SRAM
    - etc

    * tag 'for-linus-20130909' of git://git.infradead.org/linux-mtd: (184 commits)
    mtd: chips: Add support for PMC SPI Flash chips in m25p80.c
    mtd: ofpart: use for_each_child_of_node() macro
    mtd: mtdswap: replace strict_strtoul() with kstrtoul()
    mtd cs553x_nand: use kzalloc() instead of memset
    mtd: atmel_nand: fix error return code in atmel_nand_probe()
    mtd: bcm47xxsflash: writing support
    mtd: bcm47xxsflash: implement erasing support
    mtd: bcm47xxsflash: convert to module_platform_driver instead of init/exit
    mtd: bcm47xxsflash: convert kzalloc to avoid invalid access
    mtd: remove alauda driver
    mtd: nand: mxc_nand: mark 'const' properly
    mtd: maps: cfi_flagadm: add missing __iomem annotation
    mtd: spear_smi: add missing __iomem annotation
    mtd: r852: Staticize local symbols
    mtd: nandsim: Staticize local symbols
    mtd: impa7: add missing __iomem annotation
    mtd: sm_ftl: Staticize local symbols
    mtd: m25p80: add support for mr25h10
    mtd: m25p80: make CONFIG_M25PXX_USE_FAST_READ safe to enable
    mtd: m25p80: Pass flags through CAT25_INFO macro
    ...

    Linus Torvalds
     
  • Pull firewire updates from Stefan Richter:

    - Fix a regression since 3.2 inclusive: The subsystem workqueue
    deadlocked between transaction completion handling and bus reset
    handling if the worker pool could not be increased in time.

    - janitorial updates

    * tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
    firewire: ohci: Fix deadlock at bus reset
    firewire: ohci: Change module_pci_driver to module_init/module_exit
    firewire: ohci: beautify some macro definitions
    firewire: ohci: change confusing name of a struct member
    firewire: core: typecast from gfp_t to bool more safely
    firewire: WQ_NON_REENTRANT is meaningless and going away

    Linus Torvalds
     
  • Returned to intel.com

    Cc: Vinod Koul
    Cc: Linus Walleij
    Cc: Jon Mason
    Cc: Dave Jiang
    Cc: Neil Brown
    Cc: Shaohua Li
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Pull DMA mapping update from Marek Szyprowski:
    "This contains an addition of Device Tree support for reserved memory
    regions (Contiguous Memory Allocator is one of the drivers for it) and
    changes required by the KVM extensions for PowerPC architecture"

    * 'for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
    ARM: init: add support for reserved memory defined by device tree
    drivers: of: add initialization code for dma reserved memory
    drivers: of: add function to scan fdt nodes given by path
    drivers: dma-contiguous: clean source code and prepare for device tree

    Linus Torvalds
     
  • Return directly if memory allocation fails. There is no need
    of dma_free_coherent().
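
    The shape of the fix, sketched in user-space C with malloc()/free()
    standing in for the DMA allocation calls. The structure, function name
    and the simulated second step are hypothetical:

```c
#include <stddef.h>
#include <stdlib.h>

struct example_pool {
    void *virt;
    size_t size;
};

/* Returns 0 on success, -1 if allocation fails, -2 if later setup fails. */
static int example_pool_init(struct example_pool *pool, size_t size,
                             int simulate_setup_failure)
{
    pool->virt = malloc(size);
    if (pool->virt == NULL)
        return -1;                       /* nothing allocated, nothing to free */
    pool->size = size;

    if (simulate_setup_failure) {
        /* Only failures after a successful allocation need to free it. */
        free(pool->virt);
        pool->virt = NULL;
        return -2;
    }
    return 0;
}
```

    Jumping to a shared cleanup label from the first failure would try to
    free memory that was never allocated, which is why that path returns
    directly.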

    Signed-off-by: Sachin Kamat
    Cc: Saeed Bishara
    Signed-off-by: Dan Williams

    Sachin Kamat
     
  • Pull virtio update from Rusty Russell:
    "More console fixes; these are the theoretical ones which didn't get
    CC:stable. But for that reason, I did a merge with master partway
    through to avoid an unnecessary conflict.

    Also: a fun lguest bug turns out if you don't clear the TF flag when
    trapping Bad Things happen to the guest kernel as the stack
    overflows..."

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    virtio_pci: pm: Use CONFIG_PM_SLEEP instead of CONFIG_PM
    lguest: fix GPF in guest when using gdb.
    lguest: fix guest kernel stack overflow when TF bit set.
    lguest: fix BUG_ON() in invalid guest page table.
    virtio: console: prevent use-after-free of port name in port unplug
    virtio: console: cleanup an error message
    virtio: console: fix locking around send_sigio_to_port()
    virtio: console: add locking in port unplug path
    virtio: console: add locks around buffer removal in port unplug path
    tools/lguest: offer VIRTIO_F_ANY_LAYOUT for net device.
    virtio tools: add .gitignore
    lguest: Point to the right directory for the lguest launcher

    Linus Torvalds
     
  • Pull VFIO update from Alex Williamson:
    "VFIO updates include safer default file flags for VFIO device fds, an
    external user interface exported to allow other modules to hold
    references to VFIO groups, a fix to test for extended config space on
    PCIe and PCI-x, and new hot reset interfaces for PCI devices which
    allows the user to do PCI bus/slot resets when all of the devices
    affected by the reset are owned by the user.

    For this last feature, the PCI bus reset interface, I depend on
    changes already merged from Bjorn's PCI pull request. I therefore
    merged my tree up to commit cb3e433, which I think was the correct
    action, but as Stephen Rothwell noted, I failed to provide a commit
    message indicating why the merge was required. Sorry for that.
    Thanks, Alex"

    * tag 'vfio-v3.12-rc0' of git://github.com/awilliam/linux-vfio:
    vfio: fix documentation
    vfio-pci: PCI hot reset interface
    vfio-pci: Test for extended config space
    vfio-pci: Use fdget() rather than eventfd_fget()
    vfio: Add O_CLOEXEC flag to vfio device fd
    vfio: use get_unused_fd_flags(0) instead of get_unused_fd()
    vfio: add external user support

    Linus Torvalds
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    - Fix NFSv4 recovery so that it doesn't recover lost locks in cases
    such as lease loss due to a network partition, where doing so may
    result in data corruption. Add a kernel parameter to control
    choice of legacy behaviour or not.
    - Performance improvements when 2 processes are writing to the same
    file.
    - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
    - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
    NFS clients from being able to manipulate our lease and file
    locking state.
    - Allow sharing of RPCSEC_GSS caches between different rpc clients.
    - Fix the broken NFSv4 security auto-negotiation between client and
    server.
    - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
    - Add a tracepoint framework for debugging NFSv4 state recovery
    issues.
    - Add tracing to the generic NFS layer.
    - Add tracing for the SUNRPC socket connection state.
    - Clean up the rpc_pipefs mount/umount event management.
    - Merge more patches from Chuck in preparation for NFSv4 migration
    support"

    * tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
    NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
    NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
    NFSv4: Allow security autonegotiation for submounts
    NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
    NFSv4: Fix security auto-negotiation
    NFS: Clean up nfs_parse_security_flavors()
    NFS: Clean up the auth flavour array mess
    NFSv4.1 Use MDS auth flavor for data server connection
    NFS: Don't check lock owner compatability unless file is locked (part 2)
    NFS: Don't check lock owner compatibility in writes unless file is locked
    nfs4: Map NFS4ERR_WRONG_CRED to EPERM
    nfs4.1: Add SP4_MACH_CRED write and commit support
    nfs4.1: Add SP4_MACH_CRED stateid support
    nfs4.1: Add SP4_MACH_CRED secinfo support
    nfs4.1: Add SP4_MACH_CRED cleanup support
    nfs4.1: Add state protection handler
    nfs4.1: Minimal SP4_MACH_CRED implementation
    SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
    SUNRPC: Add an identifier for struct rpc_clnt
    SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
    ...

    Linus Torvalds
     
  • Pull fuse bugfixes from Miklos Szeredi:
    "Just a bunch of bugfixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: use list_for_each_entry() for list traversing
    fuse: readdir: check for slash in names
    fuse: hotfix truncate_pagecache() issue
    fuse: invalidate inode attributes on xattr modification
    fuse: postpone end_page_writeback() in fuse_writepage_locked()

    Linus Torvalds
     
  • Pull GFS2 updates from Steven Whitehouse:
    "This is possibly the smallest ever set of GFS2 patches for a merge
    window. Also, most of them are bug fixes this time.

    Two of my three patches (moving gfs2_sync_meta and merging the two
    writepage implementations) are clean ups with the third (taking the
    glock ref in examine_bucket) being a fix for a difficult to hit race
    condition.

    The removal of an unused memory barrier is a clean up from Bob
    Peterson, and the "spectator" relates to a rarely used mount option.
    Ben Marzinski's patch fixes a corner case where the incorrect inode
    flags were being set, resulting in incorrect behaviour on fsync"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
    GFS2: dirty inode correctly in gfs2_write_end
    GFS2: Don't flag consistency error if first mounter is a spectator
    GFS2: Remove unnecessary memory barrier
    GFS2: Merge ordered and writeback writepage
    GFS2: Take glock reference in examine_bucket()
    GFS2: Move gfs2_sync_meta to lops.c

    Linus Torvalds
     
  • Pull ceph updates from Sage Weil:
    "This includes both the first pile of Ceph patches (which I sent to
    torvalds@vger, sigh) and a few new patches that add support for
    fscache for Ceph. That includes a few fscache core fixes that David
    Howells asked go through the Ceph tree. (Thanks go to Milosz Tanski
    for putting this feature together)

    This first batch of patches (included here) had (has) several
    important RBD bug fixes, hole punch support, several different
    cleanups in the page cache interactions, improvements in the truncate
    code (new truncate mutex to avoid shenanigans with i_mutex), and a
    series of fixes in the synchronous striping read/write code.

    On top of that is a random collection of small fixes all across the
    tree (error code checks and error path cleanup, obsolete wq flags,
    etc)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits)
    ceph: use d_invalidate() to invalidate aliases
    ceph: remove ceph_lookup_inode()
    ceph: trivial buildbot warnings fix
    ceph: Do not do invalidate if the filesystem is mounted nofsc
    ceph: page still marked private_2
    ceph: ceph_readpage_to_fscache didn't check if marked
    ceph: clean PgPrivate2 on returning from readpages
    ceph: use fscache as a local presisent cache
    fscache: Netfs function for cleanup post readpages
    FS-Cache: Fix heading in documentation
    CacheFiles: Implement interface to check cache consistency
    FS-Cache: Add interface to check consistency of a cached object
    rbd: fix null dereference in dout
    rbd: fix buffer size for writes to images with snapshots
    libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc
    rbd: fix I/O error propagation for reads
    ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
    ceph: allow sync_read/write return partial successed size of read/write.
    ceph: fix bugs about handling short-read for sync read mode.
    ceph: remove useless variable revoked_rdcache
    ...

    Linus Torvalds
     
  • Pull metag architecture changes from James Hogan:
    - Device tree updates for TZ1090 GPIO drivers merged via GPIO tree.
    - Add driver for ImgTec PDC irqchip as found in TZ1090 SoC.
    - Add linux-metag mailing list to MAINTAINERS file.

    * tag 'metag-for-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
    irq-imgpdc: add ImgTec PDC irqchip driver
    MAINTAINERS: add linux-metag mailing list
    metag: tz1090: instantiate gpio-tz1090-pdc
    metag: tz1090: select and instantiate gpio-tz1090
    metag: tz1090: select and instantiate irq-imgpdc

    Linus Torvalds
     
  • Pull ARC changes from Vineet Gupta:

    - ARC MM changes:
    - preparation for MMUv4 (accommodate new PTE bits, new cmds)
    - Rework the ASID allocation algorithm to remove asid-mm reverse map
    - Boilerplate code consolidation in Exception Handlers
    - Disable FRAME_POINTER for ARC
    - Unaligned Access Emulation for Big-Endian from Noam
    - Bunch of fixes (udelay, missing accessors) from Mischa

    * tag 'arc-v3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    ARC: fix new Section mismatches in build (post __cpuinit cleanup)
    Kconfig.debug: Add FRAME_POINTER anti-dependency for ARC
    ARC: Fix __udelay calculation
    ARC: remove console_verbose() from setup_arch()
    ARC: Add read*_relaxed to asm/io.h
    ARC: Handle un-aligned user space access in BE.
    ARC: [ASID] Track ASID allocation cycles/generations
    ARC: [ASID] activate_mm() == switch_mm()
    ARC: [ASID] get_new_mmu_context() to conditionally allocate new ASID
    ARC: [ASID] Refactor the TLB paranoid debug code
    ARC: [ASID] Remove legacy/unused debug code
    ARC: No need to flush the TLB in early boot
    ARC: MMUv4 preps/3 - Abstract out TLB Insert/Delete
    ARC: MMUv4 preps/2 - Reshuffle PTE bits
    ARC: MMUv4 preps/1 - Fold PTE K/U access flags
    ARC: Code cosmetics (Nothing semantical)
    ARC: Entry Handler tweaks: Optimize away redundant IRQ_DISABLE_SAVE
    ARC: Exception Handlers Code consolidation
    ARC: Add some .gitignore entries

    Linus Torvalds
     
  • Pull m68knommu fixes from Greg Ungerer:
    "Just a small collection of cleanups and fixes this time, no big
    changes. The most interesting are to make the m68k and m68knommu
    consistently use CONFIG_IOMAP, clean out some unused board config
    options and flush the cache on signal stack creation"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
    m68k: remove 16 unused boards in Kconfig.machine
    m68k: define 'VM_DATA_DEFAULT_FLAGS' no matter whether has 'NOMMU' or not
    m68knommu: user generic iomap to support ioread*/iowrite*
    m68k/coldfire: flush cache when creating the signal stack frame
    m68knommu: Mark functions only called from setup_arch() __init

    Linus Torvalds
     
  • Pull UML updates from Richard Weinberger:
    "This pile contains mostly fixes and improvements for issues identified
    by Richard W M Jones while adding UML as backend to libguestfs"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: Add irq chip um/mask handlers
    um: prctl: Do not include linux/ptrace.h
    um: Run UML in it's own session.
    um: Cleanup SIGTERM handling
    um: ubd: Introduce submit_request()
    um: ubd: Add REQ_FLUSH suppport
    um: Implement probe_kernel_read()
    um: hostfs: Fix writeback

    Linus Torvalds
     

09 Sep, 2013

7 commits

  • When reconnecting to automounts at startup an autofs ioctl is used
    to find the device and inode of existing mounts so they can be used
    to open a file descriptor of possibly covered mounts.

    At this time the caller might not yet "own" the mount so it can
    trigger calling ->d_automount(). This causes automount to hang when
    trying to reconnect to direct or offset mount types.

    Consequently kern_path() can't be used but kern_path_mountpoint() can be.

    Signed-off-by: Ian Kent
    Cc: Jeff Layton
    Cc: Al Viro
    Signed-off-by: Al Viro

    Ian Kent
     
  • This is the fix that the last two commits indirectly led up to - making
    sure that we don't call dput() in a bad context on the dentries we've
    looked up in RCU mode after the sequence count validation fails.

    This basically expands d_rcu_to_refcount() into the callers, and then
    fixes the callers to delay the dput() in the failure case until _after_
    we've dropped all locks and are no longer in an RCU-locked region.

    The case of 'complete_walk()' was trivial, since its failure case did
    the unlock_rcu_walk() directly after the call to d_rcu_to_refcount(),
    and as such that is just a pure expansion of the function with a trivial
    movement of the resulting dput() to after 'unlock_rcu_walk()'.

    In contrast, the unlazy_walk() case was much more complicated, because
    not only does it convert two different dentries from RCU to be reference
    counted, but it used to not call unlock_rcu_walk() at all, and instead
    just returned an error and let the caller clean everything up in
    "terminate_walk()".

    Happily, one of the dentries in question (called "parent" inside
    unlazy_walk()) is the dentry of "nd->path", which terminate_walk() wants
    a refcount to anyway for the non-RCU case.

    So what the new and improved unlazy_walk() does is to first turn that
    dentry into a refcounted one, and once that is set up, the error cases
    can continue to use the terminate_walk() helper for cleanup, but for the
    non-RCU case. Which makes it possible to drop out of RCU mode if we
    actually hit the sequence number failure case.
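
    The ordering described above (take the reference, validate, leave the
    RCU region, and only then drop the reference on failure) can be
    illustrated with a small user-space model. This is a sketch, not the
    kernel code: every name below (toy_dentry, toy_put, toy_legitimize and
    the simulated lock) is hypothetical, and the assert() merely stands in
    for the rule that a put which may sleep must not run under RCU.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these names are kernel APIs. */
struct toy_dentry {
    int refcount;   /* reference count */
    unsigned seq;   /* sequence number, bumped on every change */
};

static int in_rcu_section;  /* 1 while the simulated RCU read lock is held */
static int puts_done;       /* number of references dropped so far */

void toy_rcu_lock(void)   { in_rcu_section = 1; }
void toy_rcu_unlock(void) { in_rcu_section = 0; }

/* Dropping a reference may sleep, so it must never run under the lock. */
void toy_put(struct toy_dentry *d)
{
    assert(!in_rcu_section);
    d->refcount--;
    puts_done++;
}

/*
 * Called with the simulated RCU lock held. Takes a reference, validates
 * the sequence number, and always leaves the RCU section before returning.
 * Returns 0 on success, -1 if the sequence check failed.
 */
int toy_legitimize(struct toy_dentry *d, unsigned expected_seq)
{
    struct toy_dentry *to_put = NULL;

    d->refcount++;
    if (d->seq != expected_seq)
        to_put = d;         /* remember it, but do not put yet */

    toy_rcu_unlock();       /* drop the lock first ... */

    if (to_put) {
        toy_put(to_put);    /* ... then release the reference */
        return -1;
    }
    return 0;
}
```

    Calling toy_put() directly at the point where the sequence check fails
    would trip the assert in this model, which is the kind of bad-context
    put the commit message is about.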

    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The virtio_pci_freeze/restore functions are defined under CONFIG_PM
    but are used by the SET_SYSTEM_SLEEP_PM_OPS macro, which is defined
    under CONFIG_PM_SLEEP. So if CONFIG_PM_SLEEP is not configured but
    CONFIG_PM_RUNTIME is, the following warning messages appear:

    drivers/virtio/virtio_pci.c:770:12: warning: ‘virtio_pci_freeze’ defined but not used [-Wunused-function]
    static int virtio_pci_freeze(struct device *dev)
    ^
    drivers/virtio/virtio_pci.c:790:12: warning: ‘virtio_pci_restore’ defined but not used [-Wunused-function]
    static int virtio_pci_restore(struct device *dev)
    ^
    Fix it by changing CONFIG_PM to CONFIG_PM_SLEEP.
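
    The mismatch can be reproduced in miniature outside the kernel. The
    sketch below is only a model: the TOY_* symbols, struct toy_pm_ops and
    TOY_SET_SLEEP_OPS are made-up stand-ins for the Kconfig options and the
    macro named above, and the macro is simplified to show just the part
    that matters, namely that it expands to nothing without the sleep
    option.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the Kconfig symbols. The sleep option is
 * deliberately left undefined to model the configuration in question. */
#define TOY_CONFIG_PM 1
/* #define TOY_CONFIG_PM_SLEEP 1 */

struct toy_pm_ops {
    int version;
    int (*freeze)(void);
    int (*restore)(void);
};

/* Simplified model: the callbacks are only emitted under the sleep option. */
#ifdef TOY_CONFIG_PM_SLEEP
#define TOY_SET_SLEEP_OPS(f, r) .freeze = f, .restore = r,
#else
#define TOY_SET_SLEEP_OPS(f, r)
#endif

/*
 * Guard the callbacks with the same symbol the macro tests. Guarding them
 * with TOY_CONFIG_PM instead would compile them in this configuration even
 * though nothing references them, which is what triggers
 * -Wunused-function for static functions.
 */
#ifdef TOY_CONFIG_PM_SLEEP
static int toy_freeze(void)  { return 0; }
static int toy_restore(void) { return 0; }
#endif

const struct toy_pm_ops toy_ops = {
    .version = 1,
    TOY_SET_SLEEP_OPS(toy_freeze, toy_restore)
};
```

    With the sleep option undefined, both the callbacks and their only use
    disappear together, so the compiler has nothing to warn about.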

    Signed-off-by: Aaron Lu
    Reviewed-by: Amit Shah
    Signed-off-by: Rusty Russell

    Aaron Lu
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • ... and move the extern from linux/namei.h to fs/internal.h,
    along with that of vfs_path_lookup().

    Signed-off-by: Al Viro

    Al Viro
     
  • ... and massage it a bit to reduce nesting

    Signed-off-by: Al Viro

    Al Viro
     
  • This simplifies the RCU to refcounting code in particular.

    I was originally intending to leave this for later, but walking through
    all the dput() logic (see previous commit), I realized that the dput()
    "might_sleep()" check was misleadingly weak. And I removed it as
    misleading, both for performance profiling and for debugging.

    However, the might_sleep() debugging case is actually true: the final
    dput() can indeed sleep, if the inode of the dentry that you are
    releasing ends up sleeping at iput time (see dentry_iput()). So the
    problem with the might_sleep() in dput() wasn't that it wasn't true, it
    was that it wasn't actually testing and triggering on the interesting
    case.

    In particular, just about *any* dput() can indeed sleep, if you happen
    to race with another thread deleting the file in question, and you then
    lose the race to be the last dput() for that file. But because it's
    a very rare race, the debugging code would never trigger it in practice.
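
    As a rough illustration of why a check tied to the rare case stays
    silent, here is a toy model. It assumes, purely for the sake of the
    example, a check that fires only when the caller happens to drop the
    last reference; all names are hypothetical and nothing here is taken
    from the kernel source.

```c
/* Toy model; not kernel code. */
int in_atomic_context;   /* 1 while the caller may not sleep */
int weak_hits;           /* violations seen by the narrow check */
int strong_hits;         /* violations seen by the unconditional check */

/* Check only when this put happens to be the last one. */
void toy_put_weak(int *refcount)
{
    if (*refcount == 1 && in_atomic_context)
        weak_hits++;
    (*refcount)--;
}

/* Check on every call, whether or not this put is the last one. */
void toy_put_strong(int *refcount)
{
    if (in_atomic_context)
        strong_hits++;
    (*refcount)--;
}
```

    A caller in atomic context that is usually not the last holder is never
    flagged by the narrow check, while the unconditional one reports it on
    the first call.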

    Why is this problematic? The new d_rcu_to_refcount() (see commit
    15570086b590: "vfs: reimplement d_rcu_to_refcount() using
    lockref_get_or_lock()") does a dput() for the failure case, and it does
    it under the RCU lock. So potentially sleeping really is a bug.

    But there's no way I'm going to fix this with the previous complicated
    "lockref_get_or_lock()" interface. And rather than revert to the old
    and crufty nested dentry locking code (which did get this right by
    delaying the reference count updates until they were verified to be
    safe), let's make forward progress.

    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds