28 Jun, 2019

3 commits

  • This patch replaces a few leftover printk errors with calls to
    fs_info and similar, so that the file system having the error is
    properly logged.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Before this patch, if a glock error was encountered, the glock with
    the problem was dumped. But sometimes you may have lots of file systems
    mounted, and that doesn't tell you which file system it was for.

    This patch adds a new boolean parameter fsid to the dump_glock family
    of functions. For non-error cases, such as dumping the glocks debugfs
    file, the fsid is not dumped in order to keep lock dumps and glocktop
    as clean as possible. For all error cases, such as GLOCK_BUG_ON, the
    file system id is now printed. This will make it easier to debug.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Before this patch, the superblock flag indicating when a file system
    is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to
    the more obvious SDF_WITHDRAWN.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this copyrighted material is made available to anyone wishing to use
    modify copy or redistribute it subject to the terms and conditions
    of the gnu general public license version 2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 44 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

08 May, 2019

1 commit

  • Use bios to read in the journal into the address space of the journal inode
    (jd_inode), sequentially and in large chunks. This is faster for locating the
    journal head that the previous binary search approach. When performing
    recovery, we keep the journal in the address space until recovery is done,
    which further speeds up things.

    Signed-off-by: Abhi Das
    Signed-off-by: Andreas Gruenbacher

    Abhi Das
     

15 Feb, 2019

1 commit


12 Dec, 2018

2 commits

  • This patch is based on an idea from Steve Whitehouse. The idea is
    to dump the number of pages for inodes in the glock dumps.
    The additional locking required me to drop const from quite a few
    places.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • Use bio(s) to read in the journal sequentially in large chunks and
    locate the head of the journal.

    This version addresses the issues Christoph pointed out w.r.t error handling
    and using deprecated API.

    Signed-off-by: Abhi Das
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson
    Cc: Christoph Hellwig

    Abhi Das
     

06 Jun, 2018

1 commit

  • struct timespec is not y2038 safe. Transition vfs to use
    y2038 safe struct timespec64 instead.

    The change was made with the help of the following cocinelle
    script. This catches about 80% of the changes.
    All the header file and logic changes are included in the
    first 5 rules. The rest are trivial substitutions.
    I avoid changing any of the function signatures or any other
    filesystem specific data structures to keep the patch simple
    for review.

    The script can be a little shorter by combining different cases.
    But, this version was sufficient for my usecase.

    virtual patch

    @ depends on patch @
    identifier now;
    @@
    - struct timespec
    + struct timespec64
    current_time ( ... )
    {
    - struct timespec now = current_kernel_time();
    + struct timespec64 now = current_kernel_time64();
    ...
    - return timespec_trunc(
    + return timespec64_trunc(
    ... );
    }

    @ depends on patch @
    identifier xtime;
    @@
    struct \( iattr \| inode \| kstat \) {
    ...
    - struct timespec xtime;
    + struct timespec64 xtime;
    ...
    }

    @ depends on patch @
    identifier t;
    @@
    struct inode_operations {
    ...
    int (*update_time) (...,
    - struct timespec t,
    + struct timespec64 t,
    ...);
    ...
    }

    @ depends on patch @
    identifier t;
    identifier fn_update_time =~ "update_time$";
    @@
    fn_update_time (...,
    - struct timespec *t,
    + struct timespec64 *t,
    ...) { ... }

    @ depends on patch @
    identifier t;
    @@
    lease_get_mtime( ... ,
    - struct timespec *t
    + struct timespec64 *t
    ) { ... }

    @te depends on patch forall@
    identifier ts;
    local idexpression struct inode *inode_node;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier fn_update_time =~ "update_time$";
    identifier fn;
    expression e, E3;
    local idexpression struct inode *node1;
    local idexpression struct inode *node2;
    local idexpression struct iattr *attr1;
    local idexpression struct iattr *attr2;
    local idexpression struct iattr attr;
    identifier i_xtime1 =~ "^i_[acm]time$";
    identifier i_xtime2 =~ "^i_[acm]time$";
    identifier ia_xtime1 =~ "^ia_[acm]time$";
    identifier ia_xtime2 =~ "^ia_[acm]time$";
    @@
    (
    (
    - struct timespec ts;
    + struct timespec64 ts;
    |
    - struct timespec ts = current_time(inode_node);
    + struct timespec64 ts = current_time(inode_node);
    )

    i_xtime, &ts)
    + timespec64_equal(&inode_node->i_xtime, &ts)
    |
    - timespec_equal(&ts, &inode_node->i_xtime)
    + timespec64_equal(&ts, &inode_node->i_xtime)
    |
    - timespec_compare(&inode_node->i_xtime, &ts)
    + timespec64_compare(&inode_node->i_xtime, &ts)
    |
    - timespec_compare(&ts, &inode_node->i_xtime)
    + timespec64_compare(&ts, &inode_node->i_xtime)
    |
    ts = current_time(e)
    |
    fn_update_time(..., &ts,...)
    |
    inode_node->i_xtime = ts
    |
    node1->i_xtime = ts
    |
    ts = inode_node->i_xtime
    |
    ia_xtime ...+> = ts
    |
    ts = attr1->ia_xtime
    |
    ts.tv_sec
    |
    ts.tv_nsec
    |
    btrfs_set_stack_timespec_sec(..., ts.tv_sec)
    |
    btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
    |
    - ts = timespec64_to_timespec(
    + ts =
    ...
    -)
    |
    - ts = ktime_to_timespec(
    + ts = ktime_to_timespec64(
    ...)
    |
    - ts = E3
    + ts = timespec_to_timespec64(E3)
    |
    - ktime_get_real_ts(&ts)
    + ktime_get_real_ts64(&ts)
    |
    fn(...,
    - ts
    + timespec64_to_timespec(ts)
    ,...)
    )
    ...+>
    (

    )
    |
    - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
    + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
    |
    - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
    + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
    |
    - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
    + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
    |
    node1->i_xtime1 =
    - timespec_trunc(attr1->ia_xtime1,
    + timespec64_trunc(attr1->ia_xtime1,
    ...)
    |
    - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
    + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2,
    ...)
    |
    - ktime_get_real_ts(&attr1->ia_xtime1)
    + ktime_get_real_ts64(&attr1->ia_xtime1)
    |
    - ktime_get_real_ts(&attr.ia_xtime1)
    + ktime_get_real_ts64(&attr.ia_xtime1)
    )

    @ depends on patch @
    struct inode *node;
    struct iattr *attr;
    identifier fn;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    expression e;
    @@
    (
    - fn(node->i_xtime);
    + fn(timespec64_to_timespec(node->i_xtime));
    |
    fn(...,
    - node->i_xtime);
    + timespec64_to_timespec(node->i_xtime));
    |
    - e = fn(attr->ia_xtime);
    + e = fn(timespec64_to_timespec(attr->ia_xtime));
    )

    @ depends on patch forall @
    struct inode *node;
    struct iattr *attr;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier fn;
    @@
    {
    + struct timespec ts;
    i_xtime);
    fn (...,
    - &node->i_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    fn (...,
    - &attr->ia_xtime,
    + &ts,
    ...);
    )
    ...+>
    }

    @ depends on patch forall @
    struct inode *node;
    struct iattr *attr;
    struct kstat *stat;
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier i_xtime =~ "^i_[acm]time$";
    identifier xtime =~ "^[acm]time$";
    identifier fn, ret;
    @@
    {
    + struct timespec ts;
    i_xtime);
    ret = fn (...,
    - &node->i_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(node->i_xtime);
    ret = fn (...,
    - &node->i_xtime);
    + &ts);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    ret = fn (...,
    - &attr->ia_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    ret = fn (...,
    - &attr->ia_xtime);
    + &ts);
    |
    + ts = timespec64_to_timespec(stat->xtime);
    ret = fn (...,
    - &stat->xtime);
    + &ts);
    )
    ...+>
    }

    @ depends on patch @
    struct inode *node;
    struct inode *node2;
    identifier i_xtime1 =~ "^i_[acm]time$";
    identifier i_xtime2 =~ "^i_[acm]time$";
    identifier i_xtime3 =~ "^i_[acm]time$";
    struct iattr *attrp;
    struct iattr *attrp2;
    struct iattr attr ;
    identifier ia_xtime1 =~ "^ia_[acm]time$";
    identifier ia_xtime2 =~ "^ia_[acm]time$";
    struct kstat *stat;
    struct kstat stat1;
    struct timespec64 ts;
    identifier xtime =~ "^[acmb]time$";
    expression e;
    @@
    (
    ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ;
    |
    node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
    |
    node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
    |
    node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
    |
    stat->xtime = node2->i_xtime1;
    |
    stat1.xtime = node2->i_xtime1;
    |
    ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ;
    |
    ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
    |
    - e = node->i_xtime1;
    + e = timespec64_to_timespec( node->i_xtime1 );
    |
    - e = attrp->ia_xtime1;
    + e = timespec64_to_timespec( attrp->ia_xtime1 );
    |
    node->i_xtime1 = current_time(...);
    |
    node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
    - e;
    + timespec_to_timespec64(e);
    |
    node->i_xtime1 = node->i_xtime3 =
    - e;
    + timespec_to_timespec64(e);
    |
    - node->i_xtime1 = e;
    + node->i_xtime1 = timespec_to_timespec64(e);
    )

    Signed-off-by: Deepa Dinamani
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:
    Cc:

    Deepa Dinamani
     

23 Jan, 2018

2 commits

  • This patch just adds the capability for GFS2 to track which function
    called gfs2_log_flush. This should make it easier to diagnose
    problems based on the sequence of events found in the journals.

    Signed-off-by: Bob Peterson
    Reviewed-by: Andreas Gruenbacher

    Bob Peterson
     
  • This patch adds a new structure called gfs2_log_header_v2 which is used
    to store expanded fields into previously unused areas of the log headers
    (i.e., this change is backwards compatible). Some of these are used for
    debug purposes so we can backtrack when problems occur. Others are
    reserved for future expansion.

    This patch is based on a prototype from Steve Whitehouse.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     

15 Sep, 2017

1 commit

  • Pull mount flag updates from Al Viro:
    "Another chunk of fmount preparations from dhowells; only trivial
    conflicts for that part. It separates MS_... bits (very grotty
    mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
    only a small subset of MS_... stuff).

    This does *not* convert the filesystems to new constants; only the
    infrastructure is done here. The next step in that series is where the
    conflicts would be; that's the conversion of filesystems. It's purely
    mechanical and it's better done after the merge, so if you could run
    something like

    list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

    sed -i -e 's/\/SB_RDONLY/g' \
    -e 's/\/SB_NOSUID/g' \
    -e 's/\/SB_NODEV/g' \
    -e 's/\/SB_NOEXEC/g' \
    -e 's/\/SB_SYNCHRONOUS/g' \
    -e 's/\/SB_MANDLOCK/g' \
    -e 's/\/SB_DIRSYNC/g' \
    -e 's/\/SB_NOATIME/g' \
    -e 's/\/SB_NODIRATIME/g' \
    -e 's/\/SB_SILENT/g' \
    -e 's/\/SB_POSIXACL/g' \
    -e 's/\/SB_KERNMOUNT/g' \
    -e 's/\/SB_I_VERSION/g' \
    -e 's/\/SB_LAZYTIME/g' \
    $list

    and commit it with something along the lines of 'convert filesystems
    away from use of MS_... constants' as commit message, it would save a
    quite a bit of headache next cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    VFS: Differentiate mount flags (MS_*) from internal superblock flags
    VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
    vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

    Linus Torvalds
     

10 Aug, 2017

1 commit

  • Remove gfs2_set_nlink which prevents the link count of an inode from
    becoming non-zero once it has reached zero. The next commit reduces the
    amount of waiting on glocks when an inode is evicted from memory. With
    that, an inode can become reallocated before all the remote-unlink
    callbacks from a previous delete are processed, which causes the link
    count to change from zero to non-zero.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

21 Jul, 2017

1 commit

  • In inode_go_lock() function, the parameter order of list_add() is error.
    According to the define of list_add(), the first parameter is new entry
    and the second is the list head, so ip->i_trunc_list should be the
    first parameter and the sdp->sd_trunc_list should be second.

    Signed-off-by: Wang Xibo
    Signed-off-by: Xiao Likun
    Signed-off-by: Bob Peterson

    Wang Xibo
     

17 Jul, 2017

1 commit

  • Firstly by applying the following with coccinelle's spatch:

    @@ expression SB; @@
    -SB->s_flags & MS_RDONLY
    +sb_rdonly(SB)

    to effect the conversion to sb_rdonly(sb), then by applying:

    @@ expression A, SB; @@
    (
    -(!sb_rdonly(SB)) && A
    +!sb_rdonly(SB) && A
    |
    -A != (sb_rdonly(SB))
    +A != sb_rdonly(SB)
    |
    -A == (sb_rdonly(SB))
    +A == sb_rdonly(SB)
    |
    -!(sb_rdonly(SB))
    +!sb_rdonly(SB)
    |
    -A && (sb_rdonly(SB))
    +A && sb_rdonly(SB)
    |
    -A || (sb_rdonly(SB))
    +A || sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) != A
    +sb_rdonly(SB) != A
    |
    -(sb_rdonly(SB)) == A
    +sb_rdonly(SB) == A
    |
    -(sb_rdonly(SB)) && A
    +sb_rdonly(SB) && A
    |
    -(sb_rdonly(SB)) || A
    +sb_rdonly(SB) || A
    )

    @@ expression A, B, SB; @@
    (
    -(sb_rdonly(SB)) ? 1 : 0
    +sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) ? A : B
    +sb_rdonly(SB) ? A : B
    )

    to remove left over excess bracketage and finally by applying:

    @@ expression A, SB; @@
    (
    -(A & MS_RDONLY) != sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) != sb_rdonly(SB)
    |
    -(A & MS_RDONLY) == sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) == sb_rdonly(SB)
    )

    to make comparisons against the result of sb_rdonly() (which is a bool)
    work correctly.

    Signed-off-by: David Howells

    David Howells
     

05 Jul, 2017

2 commits

  • Put all remaining accesses to gl->gl_object under the
    gl->gl_lockref.lock spinlock to prevent races.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     
  • So far, gfs2_evict_inode clears gl->gl_object and then flushes the glock
    work queue to make sure that inode glops which dereference gl->gl_object
    have finished running before the inode is destroyed. However, flushing
    the work queue may do more work than needed, and in particular, it may
    call into DLM, which we want to avoid here. Use a bit lock
    (GIF_GLOP_PENDING) to synchronize between the inode glops and
    gfs2_evict_inode instead to get rid of the flushing.

    In addition, flush the work queues of existing glocks before reusing
    them for new inodes to get those glocks into a known state: the glock
    state engine currently doesn't handle glock re-appropriation correctly.
    (We may be able to fix the glock state engine instead later.)

    Based on a patch by Steven Whitehouse .

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

05 Apr, 2016

1 commit

  • Function inode_go_demote_ok had some code that was only executed
    if gl_holders was not empty. However, if gl_holders was not empty,
    the only caller, demote_ok(), returns before inode_go_demote_ok
    would ever be called. Therefore, it's dead code, so I removed it.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     

25 Dec, 2015

1 commit

  • When gfs2 releases the glock of an inode, it must invalidate all
    information cached for that inode, including the page cache and acls.
    Use the new security_inode_invalidate_secctx hook to also invalidate
    security labels in that case. These items will be reread from disk
    when needed after reacquiring the glock.

    Signed-off-by: Andreas Gruenbacher
    Acked-by: Bob Peterson
    Acked-by: Steven Whitehouse
    Cc: cluster-devel@redhat.com
    [PM: fixed spelling errors and description line lengths]
    Signed-off-by: Paul Moore

    Andreas Gruenbacher
     

30 Oct, 2015

1 commit

  • Commit e66cf161 replaced the gl_spin spinlock in struct gfs2_glock with a
    gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in
    gl_lockref). Remove that define to make the references to gl_lockref.lock more
    obvious.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

04 Sep, 2015

1 commit

  • What uniquely identifies a glock in the glock hash table is not
    gl_name, but gl_name and its superblock pointer. This patch makes
    the gl_name field correspond to a unique glock identifier. That will
    allow us to simplify hashing with a future patch, since the hash
    algorithm can then take the gl_name and hash its components in one
    operation.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Steven Whitehouse

    Bob Peterson
     

19 Jun, 2015

2 commits

  • This patch allows the block allocation code to retain the buffers
    for the resource groups so they don't need to be re-read from buffer
    cache with every request. This is a performance improvement that's
    especially noticeable when resource groups are very large. For
    example, with 2GB resource groups and 4K blocks, there can be 33
    blocks for every resource group. This patch allows those 33 buffers
    to be kept around and not read in and thrown away with every
    operation. The buffers are released when the resource group is
    either synced or invalidated.

    Signed-off-by: Bob Peterson
    Reviewed-by: Steven Whitehouse
    Reviewed-by: Benjamin Marzinski

    Bob Peterson
     
  • The glocks used for resource groups often come and go hundreds of
    thousands of times per second. Adding them to the lru list just
    adds unnecessary contention for the lru_lock spin_lock, especially
    considering we're almost certainly going to re-use the glock and
    take it back off the lru microseconds later. We never want the
    glock shrinker to cull them anyway. This patch adds a new bit in
    the glops that determines which glock types get put onto the lru
    list and which ones don't.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     

17 Nov, 2014

1 commit

  • The current gfs2 freezing code is considerably more complicated than it
    should be because it doesn't use the vfs freezing code on any node except
    the one that begins the freeze. This is because it needs to acquire a
    cluster glock before calling the vfs code to prevent a deadlock, and
    without the new freeze_super and thaw_super hooks, that was impossible. To
    deal with the issue, gfs2 had to do some hacky locking tricks to make sure
    that a frozen node couldn't be holding on a lock it needed to do the
    unfreeze ioctl.

    This patch makes use of the new hooks to simply the gfs2 locking code. Now,
    all the nodes in the cluster freeze and thaw in exactly the same way. Every
    node in the cluster caches the freeze glock in the shared state. The new
    freeze_super hook allows the freezing node to grab this freeze glock in
    the exclusive state without first calling the vfs freeze_super function.
    All the nodes in the cluster see this lock change, and call the vfs
    freeze_super function. The vfs locking code guarantees that the nodes can't
    get stuck holding the glocks necessary to unfreeze the system. To
    unfreeze, the freezing node uses the new thaw_super hook to drop the freeze
    glock. Again, all the nodes notice this, reacquire the glock in shared mode
    and call the vfs thaw_super function.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

08 Oct, 2014

1 commit


18 Jul, 2014

1 commit


04 Jun, 2014

1 commit

  • …teve/gfs2-3.0-nmw into next

    Pull gfs2 updates from Steven Whitehouse:
    "This must be about the smallest merge window patch set ever for GFS2.
    It is probably also the first one without a single patch from me.
    That is down to a combination of factors, and I have some things in
    the works that are not quite ready yet, that I hope to put in next
    time around.

    Returning to what is here this time... we have 3 patches which fix
    various warnings. Two are bug fixes (for quotas and also a rare
    recovery race condition). The final patch, from Ben Marzinski, is an
    important change in the freeze code which has been in progress for
    some time. This removes the need to take and drop the transaction
    lock for every single transaction, when the only time it was used, was
    at file system freeze time. Ben's patch integrates the freeze
    operation into the journal flush code as an alternative with lower
    overheads and also lands up resolving some difficult to fix races at
    the same time"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
    GFS2: Prevent recovery before the local journal is set
    GFS2: fs/gfs2/file.c: kernel-doc warning fixes
    GFS2: fs/gfs2/bmap.c: kernel-doc warning fixes
    GFS2: remove transaction glock
    GFS2: lops.c: replace 0 by NULL for pointers
    GFS2: quotas not being refreshed in gfs2_adjust_quota

    Linus Torvalds
     

14 May, 2014

1 commit

  • GFS2 has a transaction glock, which must be grabbed for every
    transaction, whose purpose is to deal with freezing the filesystem.
    Aside from this involving a large amount of locking, it is very easy to
    make the current fsfreeze code hang on unfreezing.

    This patch rewrites how gfs2 handles freezing the filesystem. The
    transaction glock is removed. In it's place is a freeze glock, which is
    cached (but not held) in a shared state by every node in the cluster
    when the filesystem is mounted. This lock only needs to be grabbed on
    freezing, and actions which need to be safe from freezing, like
    recovery.

    When a node wants to freeze the filesystem, it grabs this glock
    exclusively. When the freeze glock state changes on the nodes (either
    from shared to unlocked, or shared to exclusive), the filesystem does a
    special log flush. gfs2_log_flush() does all the work for flushing out
    the and shutting down the incore log, and then it tries to grab the
    freeze glock in a shared state again. Since the filesystem is stuck in
    gfs2_log_flush, no new transaction can start, and nothing can be written
    to disk. Unfreezing the filesytem simply involes dropping the freeze
    glock, allowing gfs2_log_flush() to grab and then release the shared
    lock, so it is cached for next time.

    However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
    shared lock on the filesystem root directory inode to check permissions.
    If that glock has already been grabbed exclusively, fsfreeze will be
    unable to get the shared lock and unfreeze the filesystem.

    In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
    on the filesystem root directory during the freeze, and hold it until it
    unfreezes the filesystem. The functions which need to grab a shared
    lock in order to allow the unfreeze ioctl to be issued now use the lock
    grabbed by the freeze code instead.

    The freeze and unfreeze code take care to make sure that this shared
    lock will not be dropped while another process is using it.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 Feb, 2014

1 commit

  • Over time, we hope to be able to improve the concurrency available
    in the log code. This is one small step towards that, by moving
    the buffer lists from the super block, and into the transaction
    structure, so that each transaction builds its own buffer lists.

    At transaction commit time, the buffer lists are merged into
    the currently accumulating transaction. That transaction then
    is passed into the before and after commit functions at journal
    flush time. Thus there should be no change in overall behaviour
    yet.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

16 Jan, 2014

1 commit

  • Al Viro has tactfully pointed out that we are using the incorrect
    error code in some cases. This patch fixes that, and also removes
    the (unused) return value for glock dumping.

    > * gfs2_iget() - ENOBUFS instead of ENOMEM. ENOBUFS is
    > "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
    > we don't support STREAMS it's probably fair game, but... what the hell?

    Signed-off-by: Steven Whitehouse
    Cc: Al Viro

    Steven Whitehouse
     

03 Jan, 2014

2 commits

  • Prior to this patch, GFS2 had one address space for each rgrp,
    stored in the glock. This patch changes them to use a single
    address space in the super block. This therefore saves
    (sizeof(struct address_space) * nr_of_rgrps) bytes of memory
    and for large filesystems, that can be significant.

    It would be nice to be able to do something similar and merge
    the inode metadata address space into the same global
    address space. However, that is rather more complicated as the
    on-disk location doesn't have a 1:1 mapping with the inodes in
    general. So while it could be done, it will be a more complicated
    operation as it requires changing a lot more code paths.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Each rgrp header is represented as a single extent on disk, so we
    can calculate the position within the address space, since we are
    using address spaces mapped 1:1 to the disk. This means that it
    is possible to use the range based versions of filemap_fdatawrite/wait
    and for invalidating the page cache.

    Our eventual intent is to then be able to merge the address spaces
    used for rgrps into a single address space, rather than to have
    one for each glock, saving memory and reducing complexity.

    Since during umount, the rgrp structures are disposed of before
    the glocks, we need to store the extent information in the glock
    so that is is available for a final invalidation. This patch uses
    a field which is otherwise unused in rgrp glocks to do that, so
    that we do not have to expand the size of a glock.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

20 Dec, 2013

1 commit

  • We need to wait for any outstanding DIO to complete in a couple
    of situations. Firstly, in case we are changing out of deferred
    mode (in inode_go_sync) where GLF_DIRTY will not be set. That
    call could be prefixed with a test for gl_state == LM_ST_DEFERRED
    but it doesn't seem worth it bearing in mind that the test for
    outstanding DIO is very quick anyway, in the usual case that there
    is none.

    The second case is in inode_go_lock which will catch the cases
    where we have a cached EX lock, but where we grant deferred locks
    against it so that there is no glock state transistion. We only
    need to wait if the state is not deferred, since DIO is valid
    anyway in that state.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

15 Oct, 2013

1 commit

  • Currently glocks have an atomic reference count and also a spinlock
    which covers various internal fields, such as the state. This intent of
    this patch is to replace the spinlock and the atomic reference count
    with a lockref structure. This contains a spinlock which we can continue
    to use as before, and a reference counter which is used in conjuction
    with the spinlock to replace the previous atomic counter.

    As a result of this there are some new rules for reference counting on
    glocks. We need to distinguish between reference count changes under
    gl_spin (which are now just increment or decrement of the new counter,
    provided the count cannot hit zero) and those which are outside of
    gl_spin, but which now take gl_spin internally.

    The conversion is relatively straight forward. There is probably some
    further clean up which can be done, but the priority at this stage is to
    make the change in as simple a manner as possible.

    A consequence of this change is that the reference count is being
    decoupled from the lru list processing. This should allow future
    adoption of the lru_list code with glocks in due course.

    The reason for using the "dead" state and not just relying on 0 being
    the "invalid state" is so that in due course 0 ref counts can be
    allowable. The intent is to eventually be able to remove the ref count
    changes which are currently hidden away in state_change().

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

19 Aug, 2013

1 commit

  • When run during fsync, a gfs2_log_flush could happen between the
    time when gfs2_ail_flush checked the number of blocks to revoke,
    and when it actually started the transaction to do those revokes.
    This occassionally caused it to need more revokes than it reserved,
    causing gfs2 to crash.

    Instead of just reserving enough revokes to handle the blocks that
    currently need them, this patch makes gfs2_ail_flush reserve the
    maximum number of revokes it can, without increasing the total number
    of reserved log blocks. This patch also passes the number of reserved
    revokes to __gfs2_ail_flush() so that it doesn't go over its limit
    and cause a crash like we're seeing. Non-fsync calls to __gfs2_ail_flush
    will still cause a BUG() necessary revokes are skipped.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

19 Jun, 2013

1 commit

  • This patch looks at all the outstanding blocks in all the transactions
    on the log, and moves the completed ones to the ail2 list. Then it
    issues revokes for these blocks. This will hopefully speed things up
    in situations where there is a lot of contention for glocks, especially
    if they are acquired serially.

    revoke_lo_before_commit will issue at most one log block's full of these
    preemptive revokes. The amount of reserved log space that
    gfs2_log_reserve() ignores has been incremented to allow for this extra
    block.

    This patch also consolidates the common revoke instructions into one
    function, gfs2_add_revoke().

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

10 Apr, 2013

1 commit

  • This patch adds a bool indicating whether the demote
    request was originated locally or remotely. This is then
    used by the iopen ->go_callback() to make 100% sure that
    it will only respond to remote callbacks.

    Since ->evict_inode() uses GL_NOCACHE when it attempts to
    get an exclusive lock on the iopen lock, this may result
    in extra scheduling of the workqueue in case that the
    exclusive promotion request failed. This patch prevents
    that from happening.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

13 Feb, 2013

1 commit

  • When reading dinodes from the disk convert uids and gids
    into kuids and kgids to store in vfs data structures.

    When writing to dinodes to the disk convert kuids and kgids
    in the in memory structures into plain uids and gids.

    For now all on disk data structures are assumed to be
    stored in the initial user namespace.

    Cc: Steven Whitehouse
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

15 Nov, 2012

1 commit