21 Jun, 2018

1 commit

  • Before this patch, block reservations kept track of the inode
    number. At one point, that was a valid thing to do. However, since
    we made the reservation a part of the inode (rather than a pointer
    to a separate allocated object) the reservation can determine the
    inode number by using container_of. This saves us a little memory
    in our inode.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse
    Reviewed-by: Andreas Gruenbacher

    Bob Peterson
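    A minimal userspace sketch of the container_of() idea this commit relies on. When the reservation is embedded in the inode, its address is at a fixed offset inside the inode, so the inode (and its number) can be recovered without storing the number twice. The structure and field names (sketch_inode, sketch_blkreserv, rs_inum) are hypothetical simplifications, not GFS2's real definitions. The macro is a reduced container_of() without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced container_of(): step back from a member's address to the
 * structure that embeds it (no type checking, unlike the kernel macro). */
#define container_of(ptr, type, member) \
	((type *)((const char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified reservation: no inode-number field. */
struct sketch_blkreserv {
	uint64_t rs_start;   /* first reserved block */
	uint32_t rs_free;    /* reserved blocks not yet claimed */
};

/* Hypothetical inode with the reservation embedded, not pointed to. */
struct sketch_inode {
	uint64_t i_no_addr;               /* the inode number */
	struct sketch_blkreserv i_res;    /* embedded reservation */
};

/* Recover the inode number from nothing but the reservation pointer. */
static uint64_t rs_inum(const struct sketch_blkreserv *rs)
{
	const struct sketch_inode *ip =
		container_of(rs, struct sketch_inode, i_res);
	return ip->i_no_addr;
}
```

    The trade-off is that this only works while every reservation lives inside an inode; a separately allocated reservation would need the explicit field again.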
     

15 Mar, 2018

1 commit


23 Jan, 2018

1 commit


15 Nov, 2017

1 commit

  • Pull gfs2 updates from Bob Peterson:
    "We've got a total of 17 GFS2 patches for this merge window. The
    patches are basically in three categories: (1) patches related to
    broken xfstest cases, (2) patches related to improving iomap and start
    using it in GFS2, and (3) general typos and clarifications.

    Please note that one of the iomap patches extends beyond GFS2 and
    affects other file systems, but it was publicly reviewed by a
    variety of file system people in the community.

    From Andreas Gruenbacher:

    - rename variable 'bsize' to 'factor' to clarify the logic related to
    gfs2_block_map.

    - correctly set ctime in the setflags ioctl, which fixes broken
    xfstests test 277.

    - fix broken xfstest 258, due to an atime initialization problem.

    - fix broken xfstest 307, in which GFS2 was not setting ctime when
    setting acls.

    - switch general iomap code from blkno to disk offset for a variety
    of file systems.

    - add a new IOMAP_F_DATA_INLINE flag for iomap to indicate blocks
    that have data mixed with metadata.

    - implement SEEK_HOLE and SEEK_DATA via iomap in GFS2.

    - fix failing xfstest case 066, which was due to not properly syncing
    dirty inodes when changing extended attributes.

    - fix a minor typo in a comment.

    - partially fix xfstest 424, which involved GET_FLAGS and SET_FLAGS
    ioctl. This is also a cleanup and simplification of the translation
    of flags from fs flags to gfs2 flags.

    - add support for STATX_ATTR_ in statx, which fixed broken xfstest
    424.

    - fix failing xfstest 093, which was due to a recursive glock problem
    with gfs2_xattr_get and _set.

    From me:

    - make inode height info part of the 'metapath' data structure to
    facilitate using iomap in GFS2.

    - start using iomap inside GFS2 and switch GFS2's block_map functions
    to use iomap under the covers.

    - switch GFS2's fiemap implementation from using block_map to using
    iomap under the covers.

    - fix journaled data pages not being properly synced to media when
    writing inodes. This was caught with xfstests.

    - fix another failing xfstest case in which switching a file from
    ordered_write to journaled data via set_flags caused a deadlock"

    * tag 'gfs2-4.15.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    gfs2: Allow gfs2_xattr_set to be called with the glock held
    gfs2: Add support for statx inode flags
    gfs2: Fix and clean up {GET,SET}FLAGS ioctl
    gfs2: Fix a harmless typo
    gfs2: Fix xattr fsync
    GFS2: Take inode off order_write list when setting jdata flag
    GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALL
    gfs2: Implement SEEK_HOLE / SEEK_DATA via iomap
    GFS2: Switch fiemap implementation to use iomap
    GFS2: Implement iomap for block_map
    GFS2: Make height info part of metapath
    gfs2: Always update inode ctime in set_acl
    gfs2: Support negative atimes
    gfs2: Update ctime in setflags ioctl
    gfs2: Clarify gfs2_block_map

    Linus Torvalds
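    The SEEK_HOLE / SEEK_DATA support mentioned above is implemented in the kernel, but applications reach it through lseek(2). Below is a small sketch of how a userspace program might walk the data extents of a file. The helper names are hypothetical. On a file system that does not track holes, lseek reports the whole file as data with an implicit hole at end of file, so the loop still terminates.

```c
#define _GNU_SOURCE            /* exposes SEEK_DATA / SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Find the first data extent at or after 'from'.
 * Returns 1 and fills [*start, *end) on success, 0 when no data remains
 * (lseek fails with ENXIO), or -1 on any other error. */
static int next_data_extent(int fd, off_t from, off_t *start, off_t *end)
{
	off_t data = lseek(fd, from, SEEK_DATA);
	if (data < 0)
		return errno == ENXIO ? 0 : -1;
	off_t hole = lseek(fd, data, SEEK_HOLE);  /* end of file counts as a hole */
	if (hole < 0)
		return -1;
	*start = data;
	*end = hole;
	return 1;
}

/* Print every data extent of the file, e.g. "data [0, 10)". */
static void print_data_extents(int fd)
{
	off_t pos = 0, start, end;

	while (next_data_extent(fd, pos, &start, &end) == 1) {
		printf("data [%lld, %lld)\n", (long long)start, (long long)end);
		pos = end;   /* resume the search at the hole that ended this extent */
	}
}
```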
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to apply to a
    file was done in a spreadsheet of side-by-side results from the output
    of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source.
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
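    Because the SPDX identifier is a fixed, one-line tag, compliance tools can extract it mechanically instead of interpreting boilerplate prose. The sketch below shows that idea with a hypothetical helper, spdx_license_id(); it is not the ScanCode or Windriver logic mentioned above. It accepts both the line-comment form used in C source files and the block-comment form used in headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char spdx_tag[] = "SPDX-License-Identifier:";

/* Hypothetical helper: copies the license expression that follows the SPDX
 * tag on 'line' into 'out', e.g. "GPL-2.0" from the .c-file form
 * "// SPDX-License-Identifier: GPL-2.0". A closing block-comment marker
 * (header-file form) is stripped as well. Returns 0 on success, -1 if the
 * tag is missing, the expression is empty, or 'out' is too small. */
int spdx_license_id(const char *line, char *out, size_t outlen)
{
	const char *p = strstr(line, spdx_tag);
	const char *end;
	size_t len;

	if (!p || outlen == 0)
		return -1;
	p += sizeof(spdx_tag) - 1;            /* skip the tag itself */
	while (*p == ' ' || *p == '\t')
		p++;
	end = strstr(p, "*/");                /* header form: stop at comment close */
	if (!end)
		end = p + strlen(p);
	while (end > p && (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\n'))
		end--;                            /* trim trailing whitespace */
	len = (size_t)(end - p);
	if (len == 0 || len >= outlen)
		return -1;
	memcpy(out, p, len);
	out[len] = '\0';
	return 0;
}
```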
     

31 Oct, 2017

1 commit

  • This patch implements iomap for block mapping, and switches the
    block_map function to use it under the covers.

    The additional IOMAP_F_BOUNDARY iomap flag indicates when iomap has
    reached a "metadata boundary" and fetching the next mapping is likely to
    incur an additional I/O. This flag is used for setting the bh buffer
    boundary flag.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
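    A userspace sketch of how a block_map-style wrapper might turn one iomap-like extent into a buffer mapping and carry the boundary hint across. All names here (sketch_iomap, sketch_bh, SKETCH_F_BOUNDARY, sketch_map_bh) are hypothetical stand-ins loosely modelled on struct iomap, the buffer head, and IOMAP_F_BOUNDARY; they are not the kernel definitions, and the real code also handles unmapped extents and caller-supplied size limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_F_BOUNDARY 0x1u   /* hypothetical stand-in for IOMAP_F_BOUNDARY */

struct sketch_iomap {            /* hypothetical, loosely like struct iomap */
	uint64_t addr;               /* disk byte offset of the extent */
	uint64_t offset;             /* file byte offset of the extent */
	uint64_t length;             /* extent length in bytes */
	unsigned int flags;
};

struct sketch_bh {               /* hypothetical, loosely like a buffer head */
	uint64_t blocknr;            /* disk block number */
	uint64_t size;               /* bytes mapped from blocknr onwards */
	bool mapped;
	bool boundary;               /* next lookup likely needs extra I/O */
};

/* Map file block 'lblock' (block size 1 << blkbits) through extent 'm'. */
static void sketch_map_bh(const struct sketch_iomap *m, uint64_t lblock,
			  unsigned int blkbits, struct sketch_bh *bh)
{
	uint64_t pos = lblock << blkbits;

	assert(pos >= m->offset && pos < m->offset + m->length);
	bh->blocknr = (m->addr + (pos - m->offset)) >> blkbits;
	bh->size = m->length - (pos - m->offset);
	bh->mapped = true;
	/* Pass the metadata-boundary hint through to the buffer. */
	bh->boundary = (m->flags & SKETCH_F_BOUNDARY) != 0;
}
```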
     

04 Sep, 2015

2 commits

  • None of these statistics can meaningfully be negative, and the
    numerator for do_div() must have the type u64. The generic
    implementation of do_div() used on some 32-bit architectures asserts
    that, resulting in a compiler error in gfs2_rgrp_congested().

    Fixes: 0166b197c2ed ("GFS2: Average in only non-zero round-trip times ...")

    Signed-off-by: Ben Hutchings
    Signed-off-by: Bob Peterson
    Acked-by: Andreas Gruenbacher

    Ben Hutchings
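    The kernel's do_div(n, base) divides the 64-bit unsigned lvalue n in place by a 32-bit base and evaluates to the remainder. The sketch below is a hypothetical userspace stand-in (sketch_do_div) that mimics that contract and, like the generic implementation, refuses a numerator of the wrong type; here the check is a compile-time assertion. It relies on GNU C extensions (statement expressions, __typeof__, __builtin_types_compatible_p).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for do_div(): n /= base in place, value is n % base.
 * Any numerator that is not uint64_t fails to compile. */
#define sketch_do_div(n, base) ({                                        \
	_Static_assert(__builtin_types_compatible_p(__typeof__(n), uint64_t), \
		       "do_div() numerator must be uint64_t");              \
	uint32_t sdd_base_ = (base);                                      \
	uint32_t sdd_rem_ = (uint32_t)((n) % sdd_base_);                  \
	(n) /= sdd_base_;                                                 \
	sdd_rem_;                                                         \
})

/* Average of a summed statistic, e.g. total time divided by sample count. */
static uint64_t average_u64(uint64_t sum, uint32_t count)
{
	assert(count != 0);
	sketch_do_div(sum, count);   /* sum now holds the quotient */
	return sum;
}
```

    Declaring the statistic as a signed 64-bit type instead makes the static assertion fire, which is the same class of build failure the commit fixes by keeping these values unsigned.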
     
  • What uniquely identifies a glock in the glock hash table is not
    gl_name, but gl_name and its superblock pointer. This patch makes
    the gl_name field correspond to a unique glock identifier. That will
    allow us to simplify hashing with a future patch, since the hash
    algorithm can then take the gl_name and hash its components in one
    operation.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Steven Whitehouse

    Bob Peterson
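    A sketch of what it means for the lock name to carry the superblock pointer: equality and hashing both cover every component, so the same (number, type) pair on two mounts names two different glocks. The structures and functions are hypothetical, and FNV-1a is used purely for illustration; it is not necessarily the hash GFS2 uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_sbd { int id; };        /* hypothetical per-mount superblock */

/* Hypothetical lock name: the superblock is part of the identity. */
struct sketch_lockname {
	const struct sketch_sbd *ln_sbd;
	uint64_t ln_number;
	unsigned int ln_type;
};

static bool lockname_equal(const struct sketch_lockname *a,
			   const struct sketch_lockname *b)
{
	return a->ln_sbd == b->ln_sbd && a->ln_number == b->ln_number &&
	       a->ln_type == b->ln_type;
}

/* FNV-1a over 'len' bytes, continuing from hash value h. */
static uint32_t fnv1a(uint32_t h, const void *p, size_t len)
{
	const unsigned char *c = p;

	while (len--) {
		h ^= *c++;
		h *= 16777619u;
	}
	return h;
}

/* Hash all components of the name in one operation. Fields are fed one
 * by one so that struct padding never influences the result. */
static uint32_t lockname_hash(const struct sketch_lockname *n)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, &n->ln_sbd, sizeof(n->ln_sbd));
	h = fnv1a(h, &n->ln_number, sizeof(n->ln_number));
	h = fnv1a(h, &n->ln_type, sizeof(n->ln_type));
	return h;
}
```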
     

10 Apr, 2013

1 commit

  • This adds the origin indicator to the trace point for glock
    demotion, so that it is possible to see where demote requests
    have come from.

    Note that requests generated from the demote_rq sysfs interface
    will show as remote, since they are intended to replicate
    exactly the effect of a demote request from a remote node. It
    is still possible to tell these apart by looking at the process
    which initiated the demote request.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
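    A minimal sketch of an origin indicator carried in a demote-request record and rendered in a trace line. The record layout and output format are hypothetical, not the real gfs2 tracepoint format. As the commit notes, a request injected through the demote_rq sysfs file would be marked remote, so only the initiating process distinguishes it from a genuine remote request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical demote-request record with an origin flag. */
struct sketch_demote_rq {
	unsigned int type;          /* glock type */
	unsigned long long number;  /* glock number */
	bool remote;                /* another node (or the sysfs file) asked */
};

/* Format one trace line into buf; returns snprintf's result. */
static int format_demote_rq(char *buf, size_t len,
			    const struct sketch_demote_rq *rq)
{
	return snprintf(buf, len, "glock %u:%llu demote_rq %s", rq->type,
			rq->number, rq->remote ? "remote" : "local");
}
```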
     

16 Nov, 2012

1 commit


24 Sep, 2012

2 commits

  • This patch improves the tracing of block reservations by
    removing some corner cases and also providing more useful
    detail in the traces.

    A new field is added to the reservation structure to contain
    the inode number. This is used since in certain contexts it is
    not possible to access the inode itself to obtain this information.
    As a result we can then display the inode number for all tracepoints
    and also in case we dump the resource group.

    The "del" tracepoint operation has been removed. This could be called
    with the reservation rgrp set to NULL. That resulted in not printing
    the device number, and thus making the information largely useless
    anyway. Also, the conditional on the rgrp being NULL can then be
    removed from the tracepoint. After this change, all the block
    reservation tracepoint calls will be called with the rgrp information.

    The existing ins, clm and tdel calls to the block reservation tracepoint
    are sufficient to track the entire life of the block reservation.

    In gfs2_block_alloc() the error detection is updated to print out
    the inode number of the problematic inode. This can then be compared
    against the information in the glock dump, tracepoints, etc.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This patch introduces a new structure, gfs2_rbm, which is a
    tuple of a resource group, a bitmap within the resource group
    and an offset within that bitmap. This is designed to make
    manipulating these sets of variables easier. There is also a
    new helper function which converts this representation back
    to a disk block address.

    In addition, the rbtree nodes which are used for the reservations
    were not being correctly initialised, which is now fixed. Also,
    the tracing was not passing through the inode where it should
    have been. That is mostly fixed aside from one corner case. This
    needs to be revisited since there can also be a NULL rgrp in
    some cases which results in the device being incorrect in the
    trace.

    This is intended to be the first step towards cleaning up some
    of the allocation code, and some further bug fixes.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
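    A sketch of the (resource group, bitmap, offset) tuple and the helper that turns it back into a disk block address. It assumes, as in GFS2's on-disk format, two bitmap bits per block (so four blocks per bitmap byte), that each bitmap records its byte offset within the resource group's bitmap space, and that the resource group knows its first data block. All names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NBBY 4   /* assumption: 2 bits per block => 4 blocks per byte */

struct sketch_rgrp   { uint64_t rd_data0; };  /* first data block of the rgrp */
struct sketch_bitmap { uint32_t bi_start; };  /* byte offset of this bitmap */

/* The tuple: which rgrp, which bitmap in it, which block in that bitmap. */
struct sketch_rbm {
	const struct sketch_rgrp *rgd;
	const struct sketch_bitmap *bi;
	uint32_t offset;                          /* block offset in the bitmap */
};

/* Convert the tuple back to an absolute disk block address. */
static uint64_t rbm_to_block(const struct sketch_rbm *rbm)
{
	return rbm->rgd->rd_data0 +
	       (uint64_t)rbm->bi->bi_start * SKETCH_NBBY + rbm->offset;
}
```

    Bundling the three values in one structure means search and allocation helpers pass a single argument instead of keeping three separate variables consistent.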
     

19 Jul, 2012

1 commit

  • This patch reduces GFS2 file fragmentation by pre-reserving blocks. The
    resulting improved on-disk layout greatly speeds up operations in cases
    that previously would have resulted in interleaved allocation of blocks.
    A typical example of this is 10 parallel dd processes, each writing to a
    file in a common directory.

    The implementation uses an rbtree of reservations attached to each
    resource group (and each inode).

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
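    The commit keeps reservations in an rbtree per resource group. An rbtree is too long for a short example, so this sketch swaps in a sorted array with binary search, which gives the same ordered lookup: find the reservation covering a block, and skip past reservations to find the first block nobody has reserved. Names are hypothetical, and the sketch ignores the allocation bitmap itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One reservation: blocks [start, start + len) held for one inode. */
struct sketch_resv {
	uint64_t start;
	uint32_t len;
};

/* Binary search over reservations sorted by start and non-overlapping.
 * Returns the index of the reservation containing blk, or -1. */
static ptrdiff_t resv_find(const struct sketch_resv *r, size_t n, uint64_t blk)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (blk < r[mid].start)
			hi = mid;
		else if (blk >= r[mid].start + r[mid].len)
			lo = mid + 1;
		else
			return (ptrdiff_t)mid;
	}
	return -1;
}

/* First block at or after 'goal' not covered by any reservation. */
static uint64_t first_unreserved(const struct sketch_resv *r, size_t n,
				 uint64_t goal)
{
	ptrdiff_t i;

	while ((i = resv_find(r, n, goal)) >= 0)
		goal = r[i].start + r[i].len;   /* skip past that reservation */
	return goal;
}
```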
     

11 May, 2012

1 commit

  • This is a second attempt at a patch that adds rgrp information to the
    block allocation trace point for GFS2. As suggested, the patch was
    modified to list the rgrp information _after_ the fields that exist today.

    Again, the reason for this patch is to allow us to trace and debug
    problems with the block reservations patch, which is still in the works.
    We can debug problems with reservations if we can see what block allocations
    result from the block reservations. It may also be handy in figuring out
    if there are problems in rgrp free space accounting. In other words,
    we can use it to track the rgrp and its free space alongside the allocations
    that are taking place.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

29 Feb, 2012

1 commit

  • The stats are divided into two sets: those relating to the
    super block and those relating to an individual glock. The
    super block stats are done on a per cpu basis in order to
    try and reduce the overhead of gathering them. They are also
    further divided by glock type.

    In the case of both the super block and glock statistics,
    the same information is gathered in each case. The super
    block statistics are used to provide default values for
    most of the glock statistics, so that newly created glocks
    should have, as far as possible, a sensible starting point.

    The statistics are divided into three pairs of mean and
    variance, plus two counters. The mean/variance pairs are
    smoothed exponential estimates and the algorithm used is
    one which will be very familiar to those used to the calculation
    of round trip times in network code.

    The three pairs of mean/variance measure the following
    things:

    1. DLM lock time (non-blocking requests)
    2. DLM lock time (blocking requests)
    3. Inter-request time (again to the DLM)

    A non-blocking request is one which will complete right
    away, whatever the state of the DLM lock in question. That
    currently means any request where (a) the current state of
    the lock is exclusive, (b) the requested state is either null
    or unlocked, or (c) the "try lock" flag is set. A blocking
    request covers all the other lock requests.

    There are two counters. The first is there primarily to show
    how many lock requests have been made, and thus how much data
    has gone into the mean/variance calculations. The other counter
    is counting queueing of holders at the top layer of the glock
    code. Hopefully that number will be a lot larger than the number
    of dlm lock requests issued.

    So why gather these statistics? There are several reasons
    we'd like to get a better idea of these timings:

    1. To be able to better set the glock "min hold time"
    2. To spot performance issues more easily
    3. To improve the algorithm for selecting resource groups for
    allocation (to base it on lock wait time, rather than blindly
    using a "try lock")

    Due to the smoothing action of the updates, a step change in
    some input quantity being sampled will only fully be taken
    into account after 8 samples (or 4 for the variance) and this
    needs to be carefully considered when interpreting the
    results.

    Knowing both the time it takes a lock request to complete and
    the average time between lock requests for a glock means we
    can compute the total percentage of the time for which the
    node is able to use a glock vs. time that the rest of the
    cluster has its share. That will be very useful when setting
    the lock min hold time.

    The other point to remember is that all times are in
    nanoseconds. Great care has been taken to ensure that we
    measure exactly the quantities that we want, as accurately
    as possible. There are always inaccuracies in any
    measuring system, but I hope this is as accurate as we
    can reasonably make it.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
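    A sketch of one smoothed mean/variance pair in the style of the TCP round-trip-time estimator the commit alludes to: the mean moves 1/8 of the way toward each sample and the deviation 1/4 of the way, matching the "8 samples (or 4 for the variance)" settling behaviour described above. It also encodes the non-blocking rule as stated. The gains are inferred from the text; the types, the simplified lock-state enum and the function names are hypothetical. Signed division is used instead of shifts to avoid implementation-defined behaviour on negative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One smoothed mean/"variance" pair, in nanoseconds. As in RTT code, the
 * "variance" is really a smoothed mean absolute deviation. */
struct sketch_stat {
	int64_t mean;
	int64_t var;
};

static void stat_update(struct sketch_stat *s, int64_t sample)
{
	int64_t delta = sample - s->mean;
	int64_t adelta = delta < 0 ? -delta : delta;

	s->mean += delta / 8;              /* gain 1/8 for the mean */
	s->var += (adelta - s->var) / 4;   /* gain 1/4 for the deviation */
}

/* Hypothetical, simplified lock states. */
enum sketch_lock_state { SK_UNLOCKED, SK_NULL, SK_SHARED, SK_EXCLUSIVE };

/* Non-blocking if (a) currently exclusive, (b) requesting null or
 * unlocked, or (c) a "try lock" request; everything else may block. */
static bool request_is_nonblocking(enum sketch_lock_state cur,
				   enum sketch_lock_state req, bool try_lock)
{
	return cur == SK_EXCLUSIVE || req == SK_NULL || req == SK_UNLOCKED ||
	       try_lock;
}
```

    Keeping blocking and non-blocking requests in separate pairs stops fast, never-waiting requests from diluting the wait-time estimate that matters for tuning the min hold time.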
     

20 Apr, 2011

2 commits

  • Add a tracepoint for monitoring writeback of the AIL.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This adds support for two new flags. One keeps track of whether
    the glock is on the LRU list or not. The other isn't really a
    flag as such, but an indication of whether the glock has an
    attached object or not. This indication is reported without
    any locking, which is ok since we do not dereference the object
    pointer but merely report whether it is NULL or not.

    Also, this fixes one place where a tracepoint was missing, which
    was at the point we remove deallocated blocks from the journal.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
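    A sketch of building a glock flag string with the two new indications. The letters 'L' (on the LRU list) and 'o' (object attached) and the structure are hypothetical choices for illustration. The object pointer is only tested against NULL and never dereferenced, which is the property the commit relies on when reporting it without locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical snapshot of the fields the flag string reports. */
struct sketch_glock {
	bool on_lru;          /* glock is on the LRU list */
	const void *object;   /* attached object, may be NULL */
};

/* Write up to two flag letters plus a terminator into buf (>= 3 bytes).
 * gl->object is read without a lock but never dereferenced: only its
 * NULL-ness is reported. */
static void sketch_flags2str(const struct sketch_glock *gl, char *buf)
{
	char *p = buf;

	if (gl->on_lru)
		*p++ = 'L';
	if (gl->object != NULL)
		*p++ = 'o';
	*p = '\0';
}
```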
     

20 Sep, 2010

1 commit

  • Due to the design of the VFS, it is quite usual for operations on GFS2
    to consist of a lookup (requiring a shared lock) followed by an
    operation requiring an exclusive lock. If a remote node has cached an
    exclusive lock, then it will receive two demote events in rapid succession,
    first to a shared lock and then to unlocked. The existing min hold time
    code was triggering in this case, even if the node was otherwise idle
    since the state change time was being updated by the initial demote.

    This patch introduces logic to skip the min hold timer in the case that
    a "double demote" of this kind has occurred. The min hold timer will
    still be used in all other cases.

    A new glock flag is introduced which is used to keep track of whether
    there have been any newly queued holders since the last glock state
    change. The min hold time is only applied if the flag is set.

    Signed-off-by: Steven Whitehouse
    Tested-by: Abhijith Das

    Steven Whitehouse
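    A sketch of the decision described above: the min hold time delays a demote only if a holder has been queued since the last state change. The names (sketch_glock_timing, queued, demote_delay) are hypothetical, times are plain tick counts, and wraparound is ignored, whereas kernel code comparing jiffies would use time_before().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical glock timing state, in ticks. */
struct sketch_glock_timing {
	unsigned long tchange;    /* time of the last state change */
	bool queued;              /* holder queued since that change? */
};

/* A state change records the time and clears the flag... */
static void on_state_change(struct sketch_glock_timing *gl, unsigned long now)
{
	gl->tchange = now;
	gl->queued = false;
}

/* ...and queueing a local holder sets it. */
static void on_holder_queued(struct sketch_glock_timing *gl)
{
	gl->queued = true;
}

/* Delay before acting on a remote demote request. With nothing queued
 * since the last change (e.g. the second half of a shared-then-unlocked
 * "double demote" on an idle node), demote at once. */
static unsigned long demote_delay(const struct sketch_glock_timing *gl,
				  unsigned long now, unsigned long min_hold)
{
	unsigned long hold_until = gl->tchange + min_hold;

	if (!gl->queued || now >= hold_until)
		return 0;
	return hold_until - now;
}
```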
     

13 Jul, 2009

1 commit

    If TRACE_INCLUDE_FILE is defined, the header it names will be included
    and compiled; otherwise the header named by TRACE_SYSTEM will be.

    So TRACE_SYSTEM should be defined outside of the #if protection,
    just like TRACE_INCLUDE_FILE.

    Imagine this scenario:

    #include <.../foo.h>
    -> TRACE_SYSTEM == foo
    ...
    #include <.../bar.h>
    -> TRACE_SYSTEM == bar
    ...
    #define CREATE_TRACE_POINTS
    #include <.../foo.h>
    -> TRACE_SYSTEM == bar !!!

    and then bar.h will be included and compiled.

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
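    The layout this fix implies for a trace event header can be sketched as follows, for a hypothetical subsystem "foo". This is a kernel header fragment, not a standalone program: the point is which definitions sit outside the include guard so that every inclusion, including the CREATE_TRACE_POINTS one, resets them.

```c
/* Hypothetical trace header for subsystem "foo" (trace_foo.h). */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM foo        /* outside the guard: reset on every include */

#if !defined(_TRACE_FOO_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_FOO_H

#include <linux/tracepoint.h>

/* TRACE_EVENT(...) definitions go here */

#endif /* _TRACE_FOO_H */

/* Also outside the guard, so define_trace.h sees this header's values.
 * A header living outside include/trace/events/ would also set
 * TRACE_INCLUDE_PATH here. */
#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_FILE trace_foo
#include <trace/define_trace.h>
```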
     

12 Jun, 2009

1 commit

  • This patch adds the ability to trace various aspects of the GFS2
    filesystem. The trace points are divided into three groups,
    glocks, logging and bmap. These points have been chosen because
    they allow inspection of the major internal functions of GFS2
    and they are also generic enough that they are unlikely to need
    any major changes as the filesystem evolves.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
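    Once these trace points exist, they are switched on through tracefs. With TRACE_SYSTEM set to gfs2 they appear under an events/gfs2/ directory, whose enable file turns them all on at once; individual events have their own enable files one level further down. The sketch below does this from C. The helper names are hypothetical. The tracefs root is a parameter because it is mounted at /sys/kernel/tracing on newer systems and at /sys/kernel/debug/tracing on older ones, and writing the file normally requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<root>/events/<system>/enable". Returns 0, or -1 if truncated. */
static int trace_enable_path(char *buf, size_t len, const char *root,
			     const char *system)
{
	int n = snprintf(buf, len, "%s/events/%s/enable", root, system);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Enable (on != 0) or disable every event of one trace system. */
static int set_trace_system(const char *root, const char *system, int on)
{
	char path[256];
	FILE *f;

	if (trace_enable_path(path, sizeof(path), root, system) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(on ? "1" : "0", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

    For example, set_trace_system("/sys/kernel/tracing", "gfs2", 1) would enable the glock, logging and bmap trace points together.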