20 Jul, 2007

3 commits

  • Change ->fault prototype. We now return an int, which contains
    VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
    FAULT_RET_ code tells the VM whether a page was found, whether it has been
    locked, and potentially other things. This is not quite the way he wanted
    it yet, but that's changed in the next patch (which requires changes to
    arch code).

    This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
    that a page is locked which requires filemap_nopage to go away (because we
    can no longer remain backward compatible without that flag), but we were
    going to do that anyway.

    struct fault_data is renamed to struct vm_fault as Linus asked. address
    is now a void __user * that we should firmly encourage drivers not to use
    without really good reason.

    The page is now returned via a page pointer in the vm_fault struct.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
    the virtual address -> file offset differently from linear mappings.

    ->populate is a layering violation because the filesystem/pagecache code
    should need to know anything about the virtual memory mapping. The hitch here
    is that the ->nopage handler didn't pass down enough information (ie. pgoff).
    But it is more logical to pass pgoff rather than have the ->nopage function
    calculate it itself anyway (because that's a similar layering violation).

    Having the populate handler install the pte itself is likewise a nasty thing
    to be doing.

    This patch introduces a new fault handler that replaces ->nopage and
    ->populate and (later) ->nopfn. Most of the old mechanism is still in place
    so there is a lot of duplication and nice cleanups that can be removed if
    everyone switches over.

    The rationale for doing this in the first place is that nonlinear mappings are
    subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
    to duplicate the synchronisation logic rather than just consolidate the two.

    After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
    pagecache. Seems like a fringe functionality anyway.

    NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no
    users have hit mainline yet.

    [akpm@linux-foundation.org: cleanup]
    [randy.dunlap@oracle.com: doc. fixes for readahead]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Randy Dunlap
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Fix the race between invalidate_inode_pages and do_no_page.

    Andrea Arcangeli identified a subtle race between invalidation of pages from
    pagecache with userspace mappings, and do_no_page.

    The issue is that invalidation has to shoot down all mappings to the page,
    before it can be discarded from the pagecache. Between shooting down ptes to
    a particular page, and actually dropping the struct page from the pagecache,
    do_no_page from any process might fault on that page and establish a new
    mapping to the page just before it gets discarded from the pagecache.

    The most common case where such invalidation is used is in file truncation.
    This case was catered for by doing a sort of open-coded seqlock between the
    file's i_size, and its truncate_count.

    Truncation will decrease i_size, then increment truncate_count before
    unmapping userspace pages; do_no_page will read truncate_count, then find the
    page if it is within i_size, and then check truncate_count under the page
    table lock and back out and retry if it had subsequently been changed (ptl
    will serialise against unmapping, and ensure a potentially updated
    truncate_count is actually visible).

    Complexity and documentation issues aside, the locking protocol fails in the
    case where we would like to invalidate pagecache inside i_size. do_no_page
    can come in anytime and filemap_nopage is not aware of the invalidation in
    progress (as it is when it is outside i_size). The end result is that
    dangling (->mapping == NULL) pages that appear to be from a particular file
    may be mapped into userspace with nonsense data. Valid mappings to the same
    place will see a different page.

    Andrea implemented two working fixes, one using a real seqlock, another using
    a page->flags bit. He also proposed using the page lock in do_no_page, but
    that was initially considered too heavyweight. However, it is not a global or
    per-file lock, and the page cacheline is modified in do_no_page to increment
    _count and _mapcount anyway, so a further modification should not be a large
    performance hit. Scalability is not an issue.

    This patch implements this latter approach. ->nopage implementations return
    with the page locked if it is possible for their underlying file to be
    invalidated (in that case, they must set a special vm_flags bit to indicate
    so). do_no_page only unlocks the page after setting up the mapping
    completely. invalidation is excluded because it holds the page lock during
    invalidation of each page (and ensures that the page is not mapped while
    holding the lock).

    This also allows significant simplifications in do_no_page, because we have
    the page locked in the right place in the pagecache from the start.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

19 Jul, 2007

1 commit

  • Since gfs2 can't prevent conflicting opens or leases on other nodes, we
    probably shouldn't allow it to give out leases at all.

    Put the newly defined lease operation into use in gfs2 by turning off
    lease, unless we're using the "nolock' locking module (in which case all
    locking is local anyway).

    Signed-off-by: Marc Eshel
    Signed-off-by: J. Bruce Fields
    Cc: Steven Whitehouse

    Marc Eshel
     

11 Jul, 2007

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (57 commits)
    [GFS2] Accept old format NFS filehandles
    [GFS2] Small fixes to logging code
    [DLM] dump more lock values
    [GFS2] Remove i_mode passing from NFS File Handle
    [GFS2] Obtaining no_formal_ino from directory entry
    [GFS2] git-gfs2-nmw-build-fix
    [GFS2] System won't suspend with GFS2 file system mounted
    [GFS2] remounting w/o acl option leaves acls enabled
    [GFS2] inode size inconsistency
    [DLM] Telnet to port 21064 can stop all lockspaces
    [GFS2] Fix gfs2_block_truncate_page err return
    [GFS2] Addendum to the journaled file/unmount patch
    [GFS2] Simplify multiple glock aquisition
    [GFS2] assertion failure after writing to journaled file, umount
    [GFS2] Use zero_user_page() in stuffed_readpage()
    [GFS2] Remove bogus '\0' in rgrp.c
    [GFS2] Journaled file write/unstuff bug
    [DLM] don't require FS flag on all nodes
    [GFS2] Fix deallocation issues
    [GFS2] return conflicts for GETLK
    ...

    Linus Torvalds
     

10 Jul, 2007

1 commit


09 Jul, 2007

1 commit

  • This patch cleans up the inode number handling code. The main difference
    is that instead of looking up the inodes using a struct gfs2_inum_host
    we now use just the no_addr member of this structure. The tests relating
    to no_formal_ino can then be done by the calling code. This has
    advantages in that we want to do different things in different code
    paths if the no_formal_ino doesn't match. In the NFS patch we want to
    return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
    no_formal_ino doesn't match and thus we can withdraw in this case.

    In order to later fix bz #201012, we need to be able to look up an inode
    without knowing no_formal_ino, as the only information that is known to
    us is the on-disk location of the inode in question.

    This patch will also help us to fix bz #236099 at a later date by
    cleaning up a lot of the code in that area.

    There are no user visible changes as a result of this patch and there
    are no changes to the on-disk format either.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

09 May, 2007

1 commit


07 May, 2007

2 commits


15 Feb, 2007

1 commit

  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

06 Feb, 2007

1 commit

  • This removes the extra filldir callback which gfs2 was using to
    enclose an attempt at readahead for inodes during readdir. The
    code was too complicated and also hurts performance badly in the
    case that the getdents64/readdir call isn't being followed by
    stat() and it wasn't even getting it right all the time when it
    was.

    As a result, on my test box an "ls" of a directory containing 250000
    files fell from about 7mins (freshly mounted, so nothing cached) to
    between about 15 to 25 seconds. When the directory content was cached,
    the time taken fell from about 3mins to about 4 or 5 seconds.

    Interestingly in the cached case, running "ls -l" once reduced the time
    taken for subsequent runs of "ls" to about 6 secs even without this
    patch. Now it turns out that there was a special case of glocks being
    used for prefetching the metadata, but because of the timeouts for these
    locks (set to 10 secs) the metadata was being timed out before it was
    being used and this the prefetch code was constantly trying to prefetch
    the same data over and over.

    Calling "ls -l" meant that the inodes were brought into memory and once
    the inodes are cached, the glocks are not disposed of until the inodes
    are pushed out of the cache, thus extending the lifetime of the glocks,
    and thus bringing down the time for subsequent runs of "ls"
    considerably.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

09 Dec, 2006

1 commit


07 Dec, 2006

1 commit

  • This is a bit better than the previous version of gfs2_fsync()
    although it would be better still if we were able to call a
    function which only wrote the inode & metadata. Its no big deal
    though that this will potentially write the data as well since
    the VFS has already done that before calling gfs2_fsync(). I've
    also added a comment to explain whats going on here.

    Signed-off-by: Steven Whitehouse
    Cc: Andrew Morton

    Steven Whitehouse
     

30 Nov, 2006

7 commits


02 Oct, 2006

4 commits


25 Sep, 2006

1 commit


20 Sep, 2006

1 commit


19 Sep, 2006

1 commit

  • lm_interface.h has a few out of the tree clients such as GFS1
    and userland tools.

    Right now, these clients keeps a copy of the file in their build tree
    that can go out of sync.

    Move lm_interface.h to include/linux, export it to userland and
    clean up fs/gfs2 to use the new location.

    Signed-off-by: Fabio M. Di Nitto
    Signed-off-by: Steven Whitehouse

    Fabio Massimo Di Nitto
     

05 Sep, 2006

4 commits


01 Sep, 2006

1 commit

  • As per comments from Jan Engelhardt this
    updates the copyright message to say "version" in full rather than
    "v.2". Also incore.h has been updated to remove forward structure
    declarations which are not required.

    The gfs2_quota_lvb structure has now had endianess annotations added
    to it. Also quota.c has been updated so that we now store the
    lvb data locally in endian independant format to avoid needing
    a structure in host endianess too. As a result the endianess
    conversions are done as required at various points and thus the
    conversion routines in lvb.[ch] are no longer required. I've
    moved the one remaining constant in lvb.h thats used into lm.h
    and removed the unused lvb.[ch].

    I have not changed the HIF_ constants. That is left to a later patch
    which I hope will unify the gh_flags and gh_iflags fields of the
    struct gfs2_holder.

    Cc: Jan Engelhardt
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

01 Aug, 2006

1 commit


26 Jul, 2006

1 commit

  • As per comments received, alter the GFS2 direct I/O path so that
    it uses the standard read functions "out of the box". Needs a
    small change to one of the VFS functions. This reduces the size
    of the code quite a lot and also removes the need for one new export.

    Some more work remains to be done, but this is the bones of the
    thing.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

21 Jul, 2006

1 commit

  • traced the "umount hang due to spurious glock" issue that I was having
    with gfs2meta. It's in the do_gfs2_set_flags function, which does a
    gfs2_holder_init as well as a gfs2_glock_nq_init (increases ref count by
    2 instead of 1).

    Signed-off-by: Abhijith Das
    Signed-off-by: Steven Whitehouse

    Abhijith Das
     

11 Jul, 2006

1 commit

  • This adds a generation number for the eventual use of NFS to the
    ondisk inode. Its backward compatible with the current code since
    it doesn't really matter what the generation number is to start with,
    and indeed since its set to zero, due to it being taken from padding
    in both the inode and rgrp header, it should be fine.

    The eventual plan is to use this rather than no_formal_ino in the
    NFS filehandles. At that point no_formal_ino will be unused.

    At the same time we also add a releasepages call back to the
    "normal" address space for gfs2 inodes. Also I've removed a
    one-linrer function thats not required any more.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

04 Jul, 2006

1 commit

  • As per Arjan's patches:
    http://www.kernel.org/git/?p=linux/kernel/git/steve/gfs2-2.6.git;a=commitdiff;h=99ac48f54a91d02140c497edc31dc57d4bc5c85d
    and
    http://www.kernel.org/git/?p=linux/kernel/git/steve/gfs2-2.6.git;a=commitdiff;h=4b6f5d20b04dcbc3d888555522b90ba6d36c4106

    make the GFS2 file_operations structures const.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

24 Jun, 2006

1 commit


22 Jun, 2006

1 commit