09 Apr, 2014

1 commit

  • Pull drm updates from Dave Airlie:
    "Highlights:

    - drm:

    Generic display port aux features, primary plane support, drm
    master management fixes, logging cleanups, enforced locking checks
    (instead of docs), documentation improvements, minor number
    handling cleanup, pseudofs for shared inodes.

    - ttm:

    add ability to allocate from both ends

    - i915:

    broadwell features, power domain and runtime pm, per-process
    address space infrastructure (not enabled)

    - msm:

    power management, hdmi audio support

    - nouveau:

    ongoing GPU fault recovery, initial maxwell support, random fixes

    - exynos:

    refactored driver to clean up a lot of abstraction, DP support
    moved into drm, LVDS bridge support added, parallel panel support

    - gma500:

    SGX MMU support, SGX irq handling, asle irq work fixes

    - radeon:

    video engine bringup, ring handling fixes, use dp aux helpers

    - vmwgfx:

    add rendernode support"

    * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (849 commits)
    DRM: armada: fix corruption while loading cursors
    drm/dp_helper: don't return EPROTO for defers (v2)
    drm/bridge: export ptn3460_init function
    drm/exynos: remove MODULE_DEVICE_TABLE definitions
    ARM: dts: exynos4412-trats2: enable exynos/fimd node
    ARM: dts: exynos4210-trats: enable exynos/fimd node
    ARM: dts: exynos4412-trats2: add panel node
    ARM: dts: exynos4210-trats: add panel node
    ARM: dts: exynos4: add MIPI DSI Master node
    drm/panel: add S6E8AA0 driver
    ARM: dts: exynos4210-universal_c210: add proper panel node
    drm/panel: add ld9040 driver
    panel/ld9040: add DT bindings
    panel/s6e8aa0: add DT bindings
    drm/exynos: add DSIM driver
    exynos/dsim: add DT bindings
    drm/exynos: disallow fbdev initialization if no device is connected
    drm/mipi_dsi: create dsi devices only for nodes with reg property
    drm/mipi_dsi: add flags to DSI messages
    Skip intel_crt_init for Dell XPS 8700
    ...

    Linus Torvalds
     

01 Apr, 2014

1 commit

  • If flags contain RENAME_EXCHANGE then exchange source and destination files.
    There's no restriction on the type of the files; e.g. a directory can be
    exchanged with a symlink.

    Signed-off-by: Miklos Szeredi
    Reviewed-by: Jan Kara
    Reviewed-by: J. Bruce Fields

    Miklos Szeredi
     

31 Mar, 2014

1 commit

  • Linux 3.14

    The vt-d w/a merged late in 3.14-rc needs a bit of fine-tuning, hence
    backmerge.

    Conflicts:
    drivers/gpu/drm/i915/i915_gem_gtt.c
    drivers/gpu/drm/i915/intel_ddi.c
    drivers/gpu/drm/i915/intel_dp.c

    All trivial adjacent-lines-changed type conflicts, so trivial that git
    doesn't even show them in the merge commit.

    Signed-off-by: Daniel Vetter

    Daniel Vetter
     

23 Mar, 2014

1 commit

  • In all callchains leading to prepend_name(), the value left in *buflen
    is eventually discarded unused if prepend_name() has returned a negative.
    So we are free to do what prepend() does, and subtract from *buflen
    *before* checking for underflow (which turns into checking the sign
    of subtraction result, of course).
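
The reordering described above can be sketched in userspace C. This is a minimal illustration, not the kernel source: `prepend_sketch` and its signature are hypothetical, and the buffer is assumed to be filled backwards from its end, as the path helpers do.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Userspace sketch (hypothetical helper, not the kernel function):
 * copy `len` bytes of `name` immediately in front of *buffer, which
 * points at the current start of a path being built backwards.
 */
static int prepend_sketch(char **buffer, int *buflen, const char *name, int len)
{
    assert(len >= 0);
    *buflen -= len;              /* subtract first ... */
    if (*buflen < 0)             /* ... then test the sign of the result */
        return -ENAMETOOLONG;    /* *buflen is now negative; callers discard it */
    *buffer -= len;
    memcpy(*buffer, name, (size_t)len);
    return 0;
}
```

On failure the remaining-length counter is left negative, which is harmless only because, as the message says, every caller throws it away after an error.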

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     

16 Mar, 2014

1 commit

  • Our current DRM design uses a single address_space for all users of the
    same DRM device. However, there is no way to create an anonymous
    address_space without an underlying inode. Therefore, we wait for the
    first ->open() callback on a registered char-dev and take-over the inode
    of the char-dev. This worked well so far, but has several drawbacks:
    - We screw with FS internals and rely on some non-obvious invariants like
    inode->i_mapping being the same as inode->i_data for char-devs.
    - We don't have any address_space prior to the first ->open() from
    user-space. This leads to ugly fallback code and we cannot allocate
    global objects early.

    As pointed out by Al Viro, fs/anon_inode.c is *not* supposed to be used by
    drivers for anonymous inode-allocation. Therefore, this patch follows the
    proposed alternative solution and adds a pseudo filesystem mount-point to
    DRM. We can then allocate private inodes including a private address_space
    for each DRM device at initialization time.

    Note that we could use:
    sysfs_get_inode(sysfs_mnt->mnt_sb, drm_device->dev->kobj.sd);
    to get access to the underlying sysfs-inode of a "struct device" object.
    However, most of this information is currently hidden and it's not clear
    whether this address_space is suitable for driver access. Thus, unless
    Linux allows anonymous address_space objects or driver-core provides a
    public inode per device, we're left with our own private internal mount
    point.

    Cc: Al Viro
    Signed-off-by: David Herrmann

    David Herrmann
     

27 Jan, 2014

1 commit

  • * we need to save the starting point for restarts
    * reject pathologically short buffers outright

    Spotted-by: Denys Vlasenko
    Spotted-by: Oleg Nesterov
    Signed-off-by: Al Viro

    Al Viro
     

26 Jan, 2014

1 commit

  • In commit 232d2d60aa5469bb097f55728f65146bd49c1d25
    Author: Waiman Long
    Date: Mon Sep 9 12:18:13 2013 -0400

    dcache: Translating dentry into pathname without taking rename_lock

    The __dentry_path locking was changed and the variable error was
    intended to be moved outside of the loop. Unfortunately the inner
    declaration of error was not removed, resulting in a version of
    __dentry_path that will never return an error.

    Remove the problematic inner declaration of error and allow
    __dentry_path to return errors once again.
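
The shadowing problem can be reproduced in a few lines of plain C. The sketch below is illustrative only: `step_sketch`, `walk_buggy` and `walk_fixed` are hypothetical names, and the outer variable is assumed to start at 0.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a per-component step that can fail. */
static int step_sketch(int should_fail)
{
    return should_fail ? -ENAMETOOLONG : 0;
}

/* Buggy shape: the inner `error` shadows the outer one. */
static int walk_buggy(int should_fail)
{
    int error = 0;               /* assumption: outer copy starts at 0 */
    int i;

    for (i = 0; i < 3; i++) {
        int error;               /* shadows the outer variable */

        error = step_sketch(should_fail);
        if (error)
            break;               /* the failure is recorded only in the inner copy */
    }
    return error;                /* outer copy: still 0 */
}

/* Fixed shape: a single `error`, declared outside the loop. */
static int walk_fixed(int should_fail)
{
    int error = 0;
    int i;

    for (i = 0; i < 3; i++) {
        error = step_sketch(should_fail);
        if (error)
            break;
    }
    return error;
}
```

Compiling with `-Wshadow` flags the inner declaration in the buggy variant.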

    Cc: stable@vger.kernel.org
    Cc: Waiman Long
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Al Viro

    Eric W. Biederman
     

18 Jan, 2014

1 commit

  • Pull namespace fixes from Eric Biederman:
    "This is a set of 3 regression fixes.

    This fixes /proc/mounts when using "ip netns add " to display
    the actual mount point.

    This fixes a regression in clone that broke lxc-attach.

    This fixes a regression in the permission checks for mounting /proc
    that made proc unmountable if binfmt_misc was in use. Oops.

    My apologies for sending this pull request so late. Al Viro gave
    interesting review comments about the d_path fix that I wanted to
    address in detail before I sent this pull request. Unfortunately a
    bad round of colds kept me from addressing that in detail until today.
    The executive summary of the review was:

    Al: Is patching d_path really sufficient?
    The prepend_path, d_path, d_absolute_path, and __d_path family of
    functions is a real mess.

    Me: Yes, patching d_path is really sufficient. Yes, the code is a mess.
    No it is not appropriate to rewrite all of d_path for a regression
    that has existed for entirely too long already, when a two line
    change will do"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    vfs: Fix a regression in mounting proc
    fork: Allow CLONE_PARENT after setns(CLONE_NEWPID)
    vfs: In d_path don't call d_dname on a mount point

    Linus Torvalds
     

13 Dec, 2013

1 commit

  • When explicitly hashing the end of a string with the word-at-a-time
    interface, we have to be careful which end of the word we pick up.

    On big-endian CPUs, the upper-bits will contain the data we're after, so
    ensure we generate our masks accordingly (and avoid hashing whatever
    random junk may have been sitting after the string).

    This patch adds a new dcache helper, bytemask_from_count, which creates
    a mask appropriate for the CPU endianness.
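
A minimal sketch of such a mask helper for a 64-bit word follows. The two function names are hypothetical; both variants are spelled out as ordinary functions so they can be compared on any host, whereas the kernel would select one at build time according to CPU endianness.

```c
#include <assert.h>
#include <stdint.h>

/*
 * cnt is the number of valid string bytes left in the final word,
 * 0 <= cnt < 8.
 */

/* Little-endian: the first bytes in memory are the low-order bytes. */
static uint64_t bytemask_le_sketch(unsigned int cnt)
{
    assert(cnt < 8);
    return ~(~UINT64_C(0) << (cnt * 8));   /* keep the low cnt bytes */
}

/* Big-endian: the first bytes in memory are the high-order bytes. */
static uint64_t bytemask_be_sketch(unsigned int cnt)
{
    assert(cnt < 8);
    return ~(~UINT64_C(0) >> (cnt * 8));   /* keep the high cnt bytes */
}
```

ANDing the last word with the mask drops whatever bytes follow the string before they reach the hash.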

    Cc: Al Viro
    Signed-off-by: Will Deacon
    Signed-off-by: Linus Torvalds

    Will Deacon
     

27 Nov, 2013

1 commit

  • Aditya Kali (adityakali@google.com) wrote:
    > Commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
    > "proc: Fix the namespace inode permission checks." converted
    > the namespace files into symlinks. The same commit changed
    > the way namespace bind mounts appear in /proc/mounts:
    > $ mount --bind /proc/self/ns/ipc /mnt/ipc
    > Originally:
    > $ cat /proc/mounts | grep ipc
    > proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0
    >
    > After commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
    > $ cat /proc/mounts | grep ipc
    > proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0
    >
    > This breaks userspace which expects the 2nd field in
    > /proc/mounts to be a valid path.

    The symlinks /proc//ns/{ipc,mnt,net,pid,user,uts} point to
    dentries allocated with d_alloc_pseudo that we can mount, and
    that have interesting names printed out with d_dname.

    When these files are bind mounted /proc/mounts is not currently
    displaying the mount point correctly because d_dname is called instead
    of just displaying the path where the file is mounted.

    Solve this by adding an explicit check to distinguish mounted pseudo
    inodes and unmounted pseudo inodes. Unmounted pseudo inodes always
    use the mount of their filesystem as the mnt_root in their path, making
    these two cases easy to distinguish.
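
The distinction can be sketched with reduced stand-in types. Everything suffixed `_sketch` is hypothetical, and `has_d_dname` stands in for "the dentry has a d_dname operation"; this is an illustration of the check as described, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dentry_sketch {
    struct dentry_sketch *d_parent;   /* points to itself for a root dentry */
    bool has_d_dname;                 /* stand-in for a non-NULL d_dname op */
};

struct path_sketch {
    struct dentry_sketch *dentry;
    struct dentry_sketch *mnt_root;   /* root dentry of the mount in the path */
};

static bool is_root_sketch(const struct dentry_sketch *d)
{
    return d == d->d_parent;
}

/*
 * True when the synthetic d_dname() name (e.g. "ipc:[...]") should be
 * printed. A bind-mounted pseudo dentry is the root of its own mount,
 * so the ordinary mount-point path is printed for it instead.
 */
static bool use_d_dname_sketch(const struct path_sketch *p)
{
    return p->dentry->has_d_dname &&
           (!is_root_sketch(p->dentry) || p->dentry != p->mnt_root);
}
```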

    CC: stable@vger.kernel.org
    Acked-by: Serge Hallyn
    Reported-by: Aditya Kali
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

16 Nov, 2013

3 commits


14 Nov, 2013

1 commit

  • Pull core locking changes from Ingo Molnar:
    "The biggest changes:

    - add lockdep support for seqcount/seqlocks structures, this
    unearthed both bugs and required extra annotation.

    - move the various kernel locking primitives to the new
    kernel/locking/ directory"

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    block: Use u64_stats_init() to initialize seqcounts
    locking/lockdep: Mark __lockdep_count_forward_deps() as static
    lockdep/proc: Fix lock-time avg computation
    locking/doc: Update references to kernel/mutex.c
    ipv6: Fix possible ipv6 seqlock deadlock
    cpuset: Fix potential deadlock w/ set_mems_allowed
    seqcount: Add lockdep functionality to seqcount/seqlock structures
    net: Explicitly initialize u64_stats_sync structures for lockdep
    locking: Move the percpu-rwsem code to kernel/locking/
    locking: Move the lglocks code to kernel/locking/
    locking: Move the rwsem code to kernel/locking/
    locking: Move the rtmutex code to kernel/locking/
    locking: Move the semaphore core to kernel/locking/
    locking: Move the spinlock code to kernel/locking/
    locking: Move the lockdep code to kernel/locking/
    locking: Move the mutex code to kernel/locking/
    hung_task debugging: Add tracepoint to report the hang
    x86/locking/kconfig: Update paravirt spinlock Kconfig description
    lockstat: Report avg wait and hold times
    lockdep, x86/alternatives: Drop ancient lockdep fixup message
    ...

    Linus Torvalds
     

13 Nov, 2013

3 commits

  • ... and equivalent is needed in 3.12; it's broken there as well

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Li Zhong
    Signed-off-by: Al Viro

    Li Zhong
     
  • Pull vfs updates from Al Viro:
    "All kinds of stuff this time around; some more notable parts:

    - RCU'd vfsmounts handling
    - new primitives for coredump handling
    - files_lock is gone
    - Bruce's delegations handling series
    - exportfs fixes

    plus misc stuff all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
    ecryptfs: ->f_op is never NULL
    locks: break delegations on any attribute modification
    locks: break delegations on link
    locks: break delegations on rename
    locks: helper functions for delegation breaking
    locks: break delegations on unlink
    namei: minor vfs_unlink cleanup
    locks: implement delegations
    locks: introduce new FL_DELEG lock flag
    vfs: take i_mutex on renamed file
    vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
    vfs: don't use PARENT/CHILD lock classes for non-directories
    vfs: pull ext4's double-i_mutex-locking into common code
    exportfs: fix quadratic behavior in filehandle lookup
    exportfs: better variable name
    exportfs: move most of reconnect_path to helper function
    exportfs: eliminate unused "noprogress" counter
    exportfs: stop retrying once we race with rename/remove
    exportfs: clear DISCONNECTED on all parents sooner
    exportfs: more detailed comment for path_reconnect
    ...

    Linus Torvalds
     

09 Nov, 2013

7 commits

  • DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is
    connected all the way up to the root of the filesystem. It *shouldn't*
    be cleared as soon as the dentry is connected to a parent. That will
    cause bugs at least on exportable filesystems.

    Acked-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • I can't for the life of me see any reason why anyone should care whether
    a dentry that is never hooked into the dentry cache would need
    DCACHE_DISCONNECTED set.

    This originates from 4b936885ab04dc6e0bb0ef35e0e23c1a7364d9e5 "fs:
    improve scalability of pseudo filesystems", which probably just made the
    false assumption that DCACHE_DISCONNECTED was meant to be set on anything
    not connected to a parent somehow.

    So this is just confusing. Ideally the only uses of DCACHE_DISCONNECTED
    would be in the filehandle-lookup code, which needs it to ensure
    dentries are connected into the dentry tree before use.

    I left d_alloc_pseudo there even though it's now equivalent to
    __d_alloc(), just on the theory the name is better documentation of its
    intended use outside dcache.c.

    Cc: Nick Piggin
    Acked-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • Every hashed dentry is either hashed in the dentry_hashtable, or a
    superblock's s_anon list.

    __d_drop() assumes it can determine which is the case by checking
    DCACHE_DISCONNECTED; this is not true.

    It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
    only hashed on dentry_hashtable, but is fully connected to its parents
    back to the root.

    But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
    attempts to connect a directory (found by filehandle lookup) back to
    root by ascending to parents and performing lookups one at a time. It
    does not clear DCACHE_DISCONNECTED until it's done, and that is not at
    all an atomic process.

    In particular, it is possible for DCACHE_DISCONNECTED to be set on a
    dentry which is hashed on the dentry_hashtable.

    Instead, use IS_ROOT() to check which hash chain a dentry is on. This
    *does* work:

    Dentries are hashed only by:

    - d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.

    - __d_rehash, called by _d_rehash: hashes to the dentry's
    parent, and all callers of _d_rehash appear to have d_parent
    set to a "real" parent.
    - __d_rehash, called by __d_move: rehashes the moved dentry to
    hash chain determined by target, and assigns target's d_parent
    to its d_parent, before dropping the dentry's d_lock.

    Therefore I believe it's safe for a holder of a dentry's d_lock to
    assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
    true.
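
The rule argued for above can be modelled as follows. This is a sketch with hypothetical names and an illustrative flag value; it only shows that the choice of hash chain depends on the parent pointer and ignores the DISCONNECTED bit.

```c
#include <assert.h>
#include <stdbool.h>

#define DISCONNECTED_SKETCH 0x0020u   /* illustrative bit, not authoritative */

struct dent_sketch {
    struct dent_sketch *d_parent;     /* points to itself for a root dentry */
    unsigned int d_flags;
};

enum hash_list_sketch { ON_SB_ANON, ON_DENTRY_HASHTABLE };

static bool is_root_sketch(const struct dent_sketch *d)
{
    return d == d->d_parent;
}

/* Which chain a hashed dentry sits on, decided by IS_ROOT() alone. */
static enum hash_list_sketch which_list_sketch(const struct dent_sketch *d)
{
    return is_root_sketch(d) ? ON_SB_ANON : ON_DENTRY_HASHTABLE;
}
```

A dentry that already has a real parent but still carries the DISCONNECTED bit, as happens partway through reconnect_path(), is classified as being on the main hash table.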

    I believe the incorrect assumption about DCACHE_DISCONNECTED was
    originally introduced by ceb5bdc2d246 "fs: dcache per-bucket dcache hash
    locking".

    Also add a comment while we're here.

    Cc: Nick Piggin
    Acked-by: Christoph Hellwig
    Reviewed-by: NeilBrown
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • Put a type field into struct dentry::d_flags to indicate if the dentry is one
    of the following types that relate particularly to pathwalk:

    Miss (negative dentry)
    Directory
    "Automount" directory (defective - no i_op->lookup())
    Symlink
    Other (regular, socket, fifo, device)

    The type field is set to one of the first five types on a dentry by calls to
    __d_instantiate() and d_obtain_alias() from information in the inode (if one is
    given).

    The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
    dentry as a negative dentry.

    Accessors provided are:

    d_set_type(dentry, type)
    d_is_directory(dentry)
    d_is_autodir(dentry)
    d_is_symlink(dentry)
    d_is_file(dentry)
    d_is_negative(dentry)
    d_is_positive(dentry)

    A bunch of checks in pathname resolution have been switched to those.
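
A type field packed into a flags word, with accessors in the style listed above, might look like the sketch below. The bit positions and every `_sketch` name are assumptions made for illustration; the authoritative definitions are in include/linux/dcache.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative layout: three bits of the flags word hold the type. */
#define TYPE_MASK_SKETCH      0x07000000u
#define TYPE_MISS_SKETCH      0x00000000u   /* negative dentry */
#define TYPE_DIRECTORY_SKETCH 0x01000000u
#define TYPE_AUTODIR_SKETCH   0x02000000u
#define TYPE_SYMLINK_SKETCH   0x03000000u
#define TYPE_FILE_SKETCH      0x04000000u

struct dentry_sketch {
    unsigned int d_flags;
};

/* Replace the type bits, leaving all other flag bits untouched. */
static void d_set_type_sketch(struct dentry_sketch *d, unsigned int type)
{
    assert((type & ~TYPE_MASK_SKETCH) == 0);
    d->d_flags = (d->d_flags & ~TYPE_MASK_SKETCH) | type;
}

static unsigned int d_type_sketch(const struct dentry_sketch *d)
{
    return d->d_flags & TYPE_MASK_SKETCH;
}

static bool d_is_symlink_sketch(const struct dentry_sketch *d)
{
    return d_type_sketch(d) == TYPE_SYMLINK_SKETCH;
}

static bool d_is_negative_sketch(const struct dentry_sketch *d)
{
    return d_type_sketch(d) == TYPE_MISS_SKETCH;
}

static bool d_is_positive_sketch(const struct dentry_sketch *d)
{
    return !d_is_negative_sketch(d);
}
```

Because the type lives in d_flags, a pathwalk check can classify a dentry without dereferencing its inode.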

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • * RCU-delayed freeing of vfsmounts
    * vfsmount_lock replaced with a seqlock (mount_lock)
    * sequence number from mount_lock is stored in nameidata->m_seq and
    used when we exit RCU mode
    * new vfsmount flag - MNT_SYNC_UMOUNT. Set by umount_tree() when its
    caller knows that vfsmount will have no surviving references.
    * synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
    and doing pending mntput().
    * new helper: legitimize_mnt(mnt, seq). Checks the mount_lock sequence
    number against seq, then grabs reference to mnt. Then it rechecks mount_lock
    again to close the race and either returns success or drops the reference it
    has acquired. The subtle point is that in case of MNT_SYNC_UMOUNT we can
    simply decrement the refcount and sod off - aforementioned synchronize_rcu()
    makes sure that final mntput() won't come until we leave RCU mode. We need
    that, since we don't want to end up with some lazy pathwalk racing with
    umount() and stealing the final mntput() from it - caller of umount() may
    expect it to return only once the fs is shut down and we don't want to break
    that. In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
    full-blown mntput() in case of mount_lock sequence number mismatch happening
    just as we'd grabbed the reference, but in those cases we won't be stealing
    the final mntput() from anything that would care.
    * mntput_no_expire() doesn't lock anything on the fast path now. Incidentally,
    SMP and UP cases are handled the same way - no ifdefs there.
    * normal pathname resolution does *not* do any writes to mount_lock. It does,
    of course, bump the refcounts of vfsmount and dentry in the very end, but that's
    it.

    Signed-off-by: Al Viro

    Al Viro
     
  • we have too many iterators in fs/dcache.c...

    Signed-off-by: Al Viro

    Al Viro
     

06 Nov, 2013

1 commit

  • Currently seqlocks and seqcounts don't support lockdep.

    After running across a seqcount related deadlock in the timekeeping
    code, I used a less-refined and more focused variant of this patch
    to narrow down the cause of the issue.

    This is a first-pass attempt to properly enable lockdep functionality
    on seqlocks and seqcounts.

    Since seqcounts are used in the vdso gettimeofday code, I've provided
    non-lockdep accessors for those needs.

    I've also handled one case where there were nested seqlock writers
    and there may be more edge cases.

    Comments and feedback would be appreciated!

    Signed-off-by: John Stultz
    Signed-off-by: Peter Zijlstra
    Cc: Eric Dumazet
    Cc: Li Zefan
    Cc: Mathieu Desnoyers
    Cc: Steven Rostedt
    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/1381186321-4906-3-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
     

01 Nov, 2013

1 commit

  • We do not want to dirty the dentry->d_flags cacheline in dput() just to
    set the DCACHE_REFERENCED flag when it is already set in the common case
    anyway. This way the first cacheline of the dentry (which contains the
    RCU lookup information etc) can stay shared among multiple CPUs.
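
The test-before-set idea can be sketched as below. The flag value, the struct and the `writes` counter are hypothetical; the counter exists only so the sketch can show how often a store actually happens.

```c
#include <assert.h>

#define REFERENCED_SKETCH 0x0040u   /* illustrative bit, not authoritative */

struct dentry_sketch {
    unsigned int d_flags;
    unsigned int writes;    /* instrumentation for this sketch only */
};

/*
 * Only store to d_flags when the bit is not already set. In the common
 * case the line is merely read, so it can stay shared between CPUs.
 */
static void mark_referenced_sketch(struct dentry_sketch *d)
{
    if (!(d->d_flags & REFERENCED_SKETCH)) {
        d->d_flags |= REFERENCED_SKETCH;
        d->writes++;
    }
}
```

An unconditional `d_flags |= ...` would produce the same flag state but would store to the line on every call.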

    This finishes off some of the details of all the scalability patches
    merged during the merge window.

    Also don't mark dentry_kill() for inlining, since it's the uncommon path
    and inlining it just makes the common path slower due to extra function
    entry/exit overhead.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

25 Oct, 2013

2 commits


22 Oct, 2013

1 commit

  • Move kernel-doc notation to immediately before its function to eliminate
    kernel-doc warnings introduced by commit db14fc3abcd5 ("vfs: add
    d_walk()")

    Warning(fs/dcache.c:1343): No description found for parameter 'data'
    Warning(fs/dcache.c:1343): No description found for parameter 'dentry'
    Warning(fs/dcache.c:1343): Excess function parameter 'parent' description in 'check_mount'

    Signed-off-by: Randy Dunlap
    Cc: Miklos Szeredi
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

15 Sep, 2013

1 commit


14 Sep, 2013

1 commit

  • The LRU list changes interacted badly with our nr_dentry_unused
    accounting, and even worse with the new DCACHE_LRU_LIST bit logic.

    This introduces helper functions to make sure everything follows the
    proper dcache d_lru list rules: the dentry cache is complicated by the
    fact that some of the hotpaths don't even want to look at the LRU list
    at all, and the fact that we use the same list entry in the dentry for
    both the LRU list and for our temporary shrinking lists when removing
    things from the LRU.

    The helper functions temporarily have some extra sanity checking for the
    flag bits that have to match the current LRU state of the dentry. We'll
    remove that before the final 3.12 release, but considering how easy it
    is to get wrong, this first cleanup version has some very particular
    sanity checking.

    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

13 Sep, 2013

7 commits

  • Pull vfs pile 4 from Al Viro:
    "list_lru pile, mostly"

    This came out of Andrew's pile, Al ended up doing the merge work so that
    Andrew didn't have to.

    Additionally, a few fixes.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
    super: fix for destroy lrus
    list_lru: dynamically adjust node arrays
    shrinker: Kill old ->shrink API.
    shrinker: convert remaining shrinkers to count/scan API
    staging/lustre/libcfs: cleanup linux-mem.h
    staging/lustre/ptlrpc: convert to new shrinker API
    staging/lustre/obdclass: convert lu_object shrinker to count/scan API
    staging/lustre/ldlm: convert to shrinkers to count/scan API
    hugepage: convert huge zero page shrinker to new shrinker API
    i915: bail out earlier when shrinker cannot acquire mutex
    drivers: convert shrinkers to new count/scan API
    fs: convert fs shrinkers to new scan/count API
    xfs: fix dquot isolation hang
    xfs-convert-dquot-cache-lru-to-list_lru-fix
    xfs: convert dquot cache lru to list_lru
    xfs: rework buffer dispose list tracking
    xfs-convert-buftarg-lru-to-generic-code-fix
    xfs: convert buftarg LRU to generic code
    fs: convert inode and dentry shrinking to be node aware
    vmscan: per-node deferred work
    ...

    Linus Torvalds
     
  • This avoids the spinlocks and refcounts in the d_path() sequence too
    (used by /proc and various other entities). See commit 8b19e34188a3 for
    the equivalent getcwd() system call path.

    And unlike getcwd(), d_path() doesn't copy the result to user space, so
    I don't need to fear _that_ particular bug happening again.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • It's a pathname. It should use the pathname allocators and
    deallocators, and PATH_MAX instead of PAGE_SIZE. Never mind that the
    two are commonly the same.

    With this, the allocations scale up nicely too, and I can do getcwd()
    system calls at a rate of about 300M/s, with no lock contention
    anywhere.

    Of course, nobody sane does that, especially since getcwd() is
    traditionally a very slow operation in Unix. But this was also the
    simplest way to benchmark the prepend_path() improvements by Waiman, and
    once I saw the profiles I couldn't leave it well enough alone.

    But apart from being a performance improvement (from using per-cpu slab
    allocators instead of the raw page allocator), it's actually a valid and
    real cleanup.

    Signed-off-by: Linus "OCD" Torvalds

    Linus Torvalds
     
  • Oops. That wasn't very smart. We don't actually need the RCU lock any
    more by the time we copy the cwd string to user space, but I had
    stupidly surrounded the whole thing with it.

    Introduced by commit 8b19e34188a3 ("vfs: make getcwd() get the root and
    pwd path under rcu")

    Is-a-big-hairy-idiot: Linus Torvalds

    Linus Torvalds
     
  • This allows us to skip all the crazy spinlocks and reference count
    updates, and instead use the fs sequence read-lock to get an atomic
    snapshot of the root and cwd information.

    We might want to make the rule that "prepend_path()" is always called
    with the RCU lock held, but the RCU lock nests fine and this is the
    minimal fix.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Let's not pollute the include files with inline functions that are only
    used in a single place. Especially not if we decide we might want to
    change the semantics of said function to make it more efficient..

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This patch modifies read_seqbegin_or_lock() and need_seqretry() to use
    newly introduced read_seqlock_excl() and read_sequnlock_excl()
    primitives so that they won't change the sequence number even if they
    fall back to take the lock. This is OK as no change to the protected
    data structure is being made.

    It will prevent one fallback to lock taking from cascading into a series
    of lock takings that reduce performance because of the sequence number
    change. It will also allow other sequence readers to go forward while
    an exclusive reader lock is taken.
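
The effect on the sequence number can be shown with a single-threaded toy model. This is a sketch under heavy assumptions: there is no real locking or memory ordering here, and all `_sketch` names are hypothetical. It only illustrates that an exclusive reader leaves the counter alone while a writer bumps it.

```c
#include <assert.h>
#include <stdbool.h>

struct seqlock_sketch {
    unsigned int sequence;   /* odd while a writer is active */
    bool locked;             /* stand-in for the spinlock */
};

static void write_lock_sketch(struct seqlock_sketch *s)
{
    s->locked = true;
    s->sequence++;           /* readers that started earlier must retry */
}

static void write_unlock_sketch(struct seqlock_sketch *s)
{
    s->sequence++;
    s->locked = false;
}

/* Exclusive reader: takes the lock but leaves the sequence untouched. */
static void read_lock_excl_sketch(struct seqlock_sketch *s)
{
    s->locked = true;
}

static void read_unlock_excl_sketch(struct seqlock_sketch *s)
{
    s->locked = false;
}

static unsigned int read_begin_sketch(const struct seqlock_sketch *s)
{
    return s->sequence;
}

static bool read_retry_sketch(const struct seqlock_sketch *s, unsigned int start)
{
    return (start & 1u) || s->sequence != start;
}
```

In this model a lockless reader that overlaps an exclusive reader does not need to retry, whereas one that overlaps a writer does.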

    This patch also updates some of the inaccurate comments in the code.

    Signed-off-by: Waiman Long
    To: Alexander Viro
    Signed-off-by: Linus Torvalds

    Waiman Long
     

11 Sep, 2013

2 commits

  • Now that the shrinker is passing a node in the scan control structure, we
    can pass this to the generic LRU list code to isolate reclaim to the
    lists on matching nodes.

    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Chinner
     
  • The list_lru implementation has one function, list_lru_dispose_all, with
    only one user (the dentry code). At first, such a function appears to make
    sense because we are really not interested in the result of isolating each
    dentry separately - all of them are going away anyway. However, its
    implementation is buggy in the following way:

    When we call list_lru_dispose_all in fs/dcache.c, we scan all dentries
    marking them with DCACHE_SHRINK_LIST. However, this is done without the
    nlru->lock taken. The immediate result of that is that someone else may
    add or remove the dentry from the LRU at the same time. When list_lru_del
    happens in that scenario we will see an element that is not yet marked
    with DCACHE_SHRINK_LIST (even though it will be in the future) and
    obviously remove it from an lru where the element no longer is. Since
    list_lru_dispose_all will in effect count down nlru's nr_items and
    list_lru_del will do the same, this will lead to an imbalance.

    The solution for this would not be so simple: we can obviously just keep
    the lru_lock taken, but then we have no guarantees that we will be able to
    acquire the dentry lock (dentry->d_lock). To properly solve this, we need
    a communication mechanism between the lru and dentry code, so they can
    coordinate this with each other.

    Such a mechanism already exists in the form of the list_lru_walk_cb
    callback. So it is possible to construct a dcache-side prune function
    that does the right thing only by calling list_lru_walk in a loop until no
    more dentries are available.

    With only one user, plus the fact that a sane solution for the problem
    would involve bouncing between dcache and list_lru anyway, I see little
    justification to keep the special case list_lru_dispose_all in tree.

    Signed-off-by: Glauber Costa
    Cc: Michal Hocko
    Acked-by: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa