27 Jun, 2016

1 commit

  • Now that gfs2_lookup_by_inum only takes the inode glock for new inodes
    (and not for cached inodes anymore), there no longer is a need to
    optimize the cached-inode case in gfs2_get_dentry or delete_work_func,
    and gfs2_ilookup can be removed.

    In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in
    i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this
    inconsistency goes away as well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

15 Mar, 2016

1 commit

  • This patch basically reverts a very old patch from 2008,
    7a9f53b3c1875bef22ad4588e818bc046ef183da, with the title
    "Alternate gfs2_iget to avoid looking up inodes being freed".
    The original patch was designed to avoid a deadlock caused by lock
    ordering with try_rgrp_unlink. The patch forced the function to not
    find inodes that were being removed by VFS. The problem is, that
    made it impossible for nodes to delete their own unlinked dinodes
    after a certain point in time, because the inode needed was not found
    by this filtering process. There is no longer a need for the patch,
    since function try_rgrp_unlink no longer locks the inode: All it does
    is queue the glock onto the delete work_queue, so there should be no
    more deadlock.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

16 Apr, 2015

1 commit


01 Nov, 2014

1 commit


29 Jun, 2013

2 commits


26 Feb, 2013

1 commit


10 Oct, 2012

1 commit

  • Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
    u64 inum = fid->raw[2];
    which is unhelpfully reported as at the end of shmem_alloc_inode():

    BUG: unable to handle kernel paging request at ffff880061cd3000
    IP: [] shmem_alloc_inode+0x40/0x40
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Call Trace:
    [] ? exportfs_decode_fh+0x79/0x2d0
    [] do_handle_open+0x163/0x2c0
    [] sys_open_by_handle_at+0xc/0x10
    [] tracesys+0xe1/0xe6

    Right, tmpfs is being stupid to access fid->raw[2] before validating that
    fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
    fall at the end of a page, and the next page not be present.

    But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
    careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
    could oops in the same way: add the missing fh_len checks to those.

    Reported-by: Sasha Levin
    Signed-off-by: Hugh Dickins
    Cc: Al Viro
    Cc: Sage Weil
    Cc: Steven Whitehouse
    Cc: Christoph Hellwig
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Hugh Dickins
     

30 May, 2012

1 commit

  • pass inode + parent's inode or NULL instead of dentry + bool saying
    whether we want the parent or not.

    NOTE: that needs ceph fix folded in.

    Signed-off-by: Al Viro

    Al Viro
     

08 Nov, 2011

1 commit

  • This patch adds read-ahead capability to GFS2's
    directory hash table management. It greatly improves
    performance for some directory operations. For example:
    In one of my file systems that has 1000 directories, each
    of which has 1000 files, time to execute a recursive
    ls (time ls -fR /mnt/gfs2 > /dev/null) was reduced
    from 2m2.814s on a stock kernel to 0m45.938s.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

20 Apr, 2011

1 commit

  • This patch adds writeback_control to writing back the AIL
    list. This means that we can then take advantage of the
    information we get in ->write_inode() in order to set off
    some pre-emptive writeback.

    In addition, the AIL code is cleaned up a bit to make it
    a bit simpler to understand.

    There is still more which can usefully be done in this area,
    but this is a good start at least.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

14 Mar, 2011

1 commit

  • The exportfs encode handle function should return the minimum required
    handle size. This helps user to find out the handle size by passing 0
    handle size in the first step and then redoing to the call again with
    the returned handle size value.

    Acked-by: Serge Hallyn
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Aneesh Kumar K.V
     

13 Jan, 2011

1 commit


07 Jan, 2011

1 commit

  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
     

15 Nov, 2010

1 commit

  • This area of the code has always been a bit delicate due to the
    subtleties of lock ordering. The problem is that for "normal"
    alloc/dealloc, we always grab the inode locks first and the rgrp lock
    later.

    In order to ensure no races in looking up the unlinked, but still
    allocated inodes, we need to hold the rgrp lock when we do the lookup,
    which means that we can't take the inode glock.

    The solution is to borrow the technique already used by NFS to solve
    what is essentially the same problem (given an inode number, look up
    the inode carefully, checking that it really is in the expected
    state).

    We cannot do that directly from the allocation code (lock ordering
    again) so we give the job to the pre-existing delete workqueue and
    carry on with the allocation as normal.

    If we find there is no space, we do a journal flush (required anyway
    if space from a deallocation is to be released) which should block
    against the pending deallocations, so we should always get the space
    back.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Sep, 2010

1 commit


21 May, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
    GFS2: Fix typo
    GFS2: stuck in inode wait, no glocks stuck
    GFS2: Eliminate useless err variable
    GFS2: Fix writing to non-page aligned gfs2_quota structures
    GFS2: Add some useful messages
    GFS2: fix quota state reporting
    GFS2: Various gfs2_logd improvements
    GFS2: glock livelock
    GFS2: Clean up stuffed file copying
    GFS2: docs update
    GFS2: Remove space from slab cache name

    Linus Torvalds
     

14 Apr, 2010

1 commit

  • This patch fixes a couple gfs2 problems with the reclaiming of
    unlinked dinodes. First, there were a couple of livelocks where
    everything would come to a halt waiting for a glock that was
    seemingly held by a process that no longer existed. In fact, the
    process did exist, it just had the wrong pid number in the holder
    information. Second, there was a lock ordering problem between
    inode locking and glock locking. Third, glock/inode contention
    could sometimes cause inodes to be improperly marked invalid by
    iget_failed.

    Signed-off-by: Bob Peterson

    Bob Peterson
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

09 Sep, 2009

1 commit

  • There is a potential race in the inode deallocation code if two
    nodes try to deallocate the same inode at the same time. Most of
    the issue is solved by the iopen locking. There is still a small
    window which is not covered by the iopen lock. This patches fixes
    that and also makes the deallocation code more robust in the face of
    any errors in the rgrp bitmaps, or erroneous iopen callbacks from
    other nodes.

    This does introduce one extra disk read, but that is generally not
    an issue since its the same block that must be written to later
    in the deallocation process. The total disk accesses therefore stay
    the same,

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

22 May, 2009

1 commit

  • This patch renames the ops_*.c files which have no counterpart
    without the ops_ prefix in order to shorten the name and make
    it more readable. In addition, ops_address.h (which was very
    small) is moved into inode.h and inode.h is cleaned up by
    adding extern where required.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse