30 Oct, 2020

1 commit

  • Before this patch, function gfs2_meta_sync called filemap_fdatawrite to write
    the address space for the metadata being synced. That's great for inodes, but
    resource groups all point to the same superblock-address space, sdp->sd_aspace.
    Each rgrp has its own range of blocks on which it should operate. That meant
    every time an rgrp's metadata was synced, it would write all of them instead
    of just the range.

    This patch eliminates function gfs2_meta_sync and tailors specific metasync
    functions for inodes and rgrps.
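
    The difference between syncing a whole shared address space and syncing
    only one resource group's block range can be illustrated with a toy
    user-space model. This is a sketch, not kernel code: struct toy_aspace and
    both functions are hypothetical names invented for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 16

/* Toy stand-in for a shared address space: one dirty flag per block. */
struct toy_aspace {
    bool dirty[TOY_NBLOCKS];
};

/* Write back every dirty block in [first, last]; return how many were written. */
size_t toy_sync_range(struct toy_aspace *as, size_t first, size_t last)
{
    size_t written = 0;

    for (size_t i = first; i <= last && i < TOY_NBLOCKS; i++) {
        if (as->dirty[i]) {
            as->dirty[i] = false;   /* pretend the block went to disk */
            written++;
        }
    }
    return written;
}

/* Behaviour described as the old one: sync everything in the address space. */
size_t toy_sync_all(struct toy_aspace *as)
{
    return toy_sync_range(as, 0, TOY_NBLOCKS - 1);
}
```

    With blocks 2, 3 and 10 dirty, toy_sync_range(&as, 2, 5) writes two blocks
    and leaves block 10 alone, whereas toy_sync_all(&as) would write all three.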

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     

23 Oct, 2020

1 commit

  • Apply the outstanding statfs changes in the journal head to the
    master statfs file. Zero out the local statfs file for good measure.

    Previously, statfs updates would be read in from the local statfs inode and
    synced to the master statfs inode during recovery.

    We now use the statfs updates in the journal head to update the master statfs
    inode instead of reading in from the local statfs inode. To preserve backward
    compatibility with kernels that can't do this, we still need to keep the
    local statfs inode up to date by writing changes to it. At some point in the
    future, we can do away with the local statfs inodes altogether and keep the
    statfs changes solely in the journal.

    Signed-off-by: Abhi Das
    Signed-off-by: Andreas Gruenbacher

    Abhi Das
     

17 Jul, 2020

1 commit

  • Using uninitialized_var() is dangerous as it papers over real bugs[1]
    (or can in the future), and suppresses unrelated compiler warnings
    (e.g. "unused variable"). If the compiler thinks it is uninitialized,
    either simply initialize the variable or make compiler changes.
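
    As an illustration of "simply initialize the variable", the sketch below
    assumes the macro expanded to a self-assignment (x = x), as it did for GCC
    builds. The function toy_lookup is a hypothetical example, not code touched
    by this patch.

```c
#include <stddef.h>

/*
 * Return the index of `key` in `table`, or -1 if it is absent.
 * Had `idx` been declared as `int uninitialized_var(idx);`, the
 * "not found" path would return an indeterminate value and the
 * compiler would stay silent about it.
 */
int toy_lookup(const int *table, size_t n, int key)
{
    int idx = -1;   /* explicit initialization replaces uninitialized_var(idx) */

    for (size_t i = 0; i < n; i++) {
        if (table[i] == key) {
            idx = (int)i;
            break;
        }
    }
    return idx;
}
```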

    In preparation for removing[2] the[3] macro[4], remove all remaining
    needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
    xargs perl -pi -e \
    's/\buninitialized_var\(([^\)]+)\)/\1/g;
    s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

    drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
    pathological white-space.

    No outstanding warnings were found building allmodconfig with GCC 9.3.0
    for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
    alpha, and m68k.

    [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
    [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
    [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
    [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

    Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
    Acked-by: Jason Gunthorpe # IB
    Acked-by: Kalle Valo # wireless drivers
    Reviewed-by: Chao Yu # erofs
    Signed-off-by: Kees Cook

    Kees Cook
     

29 May, 2020

1 commit

  • Fix several issues in the previous gfs2_find_jhead fix:
    * When updating @blocks_submitted, @block refers to the first block not
    submitted yet, not the last block submitted, so fix an off-by-one error.
    * We want to ensure that @blocks_submitted is far enough ahead of @blocks_read
    to guarantee that there is in-flight I/O. Otherwise, we'll eventually end up
    waiting for pages that haven't been submitted yet.
    * It's much easier to compare the number of blocks added with the number of
    blocks submitted to limit the maximum bio size.
    * Even with bio chaining, we can keep adding blocks until we reach the maximum
    bio size, as long as we stop at a page boundary. This simplifies the logic.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Bob Peterson

    Andreas Gruenbacher
     

08 May, 2020

1 commit

  • It turns out that when extending an existing bio, gfs2_find_jhead fails to
    check if the block number is consecutive, which leads to incorrect reads for
    fragmented journals.

    In addition, limit the maximum bio size to an arbitrary value of 2 megabytes:
    since commit 07173c3ec276 ("block: enable multipage bvecs"), if we just keep
    adding pages until bio_add_page fails, bios will grow much larger than useful,
    which pins more memory than necessary with barely any additional performance
    gains.
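
    The two conditions described above (the next block must be consecutive,
    and the bio must stay under a size cap) can be sketched in user-space C.
    This is only a model of the decision, not the gfs2_find_jhead code; struct
    toy_bio and toy_bio_can_extend are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the I/O request being built. */
struct toy_bio {
    uint64_t start_block;  /* first disk block covered */
    uint32_t nr_blocks;    /* number of blocks added so far */
};

/*
 * Decide whether disk block `dblock` may be appended to `bio`.
 * An empty bio accepts any block. Otherwise the block must directly
 * follow the last block already in the bio, and the bio must stay
 * within `max_bytes` (2 megabytes in the commit message).
 */
bool toy_bio_can_extend(const struct toy_bio *bio, uint64_t dblock,
                        uint32_t block_size, uint32_t max_bytes)
{
    if (bio->nr_blocks == 0)
        return true;
    if (dblock != bio->start_block + bio->nr_blocks)
        return false;   /* fragmented journal: block is not consecutive */
    return ((uint64_t)bio->nr_blocks + 1) * block_size <= max_bytes;
}
```

    When the function returns false, a caller would submit the current bio and
    start a new one at dblock.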

    Fixes: f4686c26ecc3 ("gfs2: read journal in large chunks")
    Cc: stable@vger.kernel.org # v5.2+
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

28 Mar, 2020

1 commit


27 Feb, 2020

1 commit

  • Before this patch, function gfs2_end_log_write would detect any IO
    errors writing to the journal and put out an appropriate message,
    but it never set a withdrawing condition. Eventually, the log daemon
    would see the error and determine it was time to withdraw, but in
    the meantime, other processes could continue running as if nothing
    bad ever happened. The biggest consequence is that __gfs2_glock_put
    would BUG() when it saw that there were still unwritten items.

    This patch sets the WITHDRAWING status as soon as an IO error is
    detected, and that way, the BUG will be avoided so the file system
    can be properly withdrawn and unmounted.

    Signed-off-by: Bob Peterson
    Reviewed-by: Andreas Gruenbacher

    Bob Peterson
     

10 Feb, 2020

1 commit

  • Before this patch, all IO errors received by the quota daemon or the
    logd daemon would cause a complaint message to be issued, such as:

    gfs2: fsid=dm-13.0: Error 10 writing to journal, jid=0

    This patch changes it so that the error message is only issued the
    first time the error is encountered.

    Also, before this patch, function gfs2_end_log_write did not set the
    sd_log_error value, so log errors would not cause the file system to
    be withdrawn. This patch sets the error code so the file system is
    properly withdrawn if an IO error is encountered writing to the journal.
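
    Recording the first error and reporting it only once can be sketched with
    a compare-and-swap on the stored error code. This is a user-space C11
    model under the assumption that 0 means "no error yet"; struct toy_sb and
    toy_record_log_error are hypothetical names, not the kernel
    implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical superblock fragment: 0 means no log error recorded yet. */
struct toy_sb {
    atomic_int log_error;
};

/*
 * Record `err` (non-zero) if no error has been recorded before.
 * Returns true and prints a message only for the first error; later
 * errors leave the stored code untouched and stay silent.
 */
bool toy_record_log_error(struct toy_sb *sb, int err)
{
    int expected = 0;

    if (!atomic_compare_exchange_strong(&sb->log_error, &expected, err))
        return false;
    fprintf(stderr, "toy: Error %d writing to journal\n", err);
    return true;
}
```

    A log daemon in this model would later check log_error and withdraw the
    file system if it is non-zero.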

    WARNING: This change in behavior causes xfstests generic/441 to fail:
    IO errors writing to the log should cause a file system to be
    withdrawn, and no further operations are tolerated.

    Signed-off-by: Bob Peterson
    Reviewed-by: Andreas Gruenbacher

    Bob Peterson
     

07 Feb, 2020

1 commit

  • When the first log header in a journal happens to have a sequence
    number of 0, a bug in gfs2_find_jhead() causes it to prematurely exit,
    and return an uninitialized jhead with seq 0. This can cause failures
    in the caller. For instance, a mount fails in one test case.

    The correct behavior is for it to continue searching through the journal
    to find the correct journal head with the highest sequence number.
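
    The pitfall can be sketched as follows: if a sequence number of 0 doubles
    as the "nothing found yet" marker, a real header with seq 0 is
    indistinguishable from no header at all. The model below tracks "found"
    separately. It is a simplified illustration that scans every block, not
    the gfs2_find_jhead logic; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one journal block after parsing. */
struct toy_log_header {
    bool valid;     /* block contains a well-formed log header */
    uint64_t seq;   /* sequence number; 0 is a legitimate value */
};

/*
 * Return the index of the valid header with the highest sequence
 * number, or -1 if the journal contains no valid header.
 */
long toy_find_jhead(const struct toy_log_header *blocks, size_t n)
{
    bool found = false;
    uint64_t best_seq = 0;
    long best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!blocks[i].valid)
            continue;
        if (!found || blocks[i].seq > best_seq) {
            found = true;
            best_seq = blocks[i].seq;
            best = (long)i;
        }
    }
    return best;
}
```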

    Fixes: f4686c26ecc3 ("gfs2: read journal in large chunks")
    Cc: stable@vger.kernel.org # v5.2+
    Signed-off-by: Abhi Das
    Signed-off-by: Andreas Gruenbacher

    Abhi Das
     

08 Jan, 2020

1 commit

  • Every caller of function gfs2_struct2blk specified sizeof(u64).

    This patch eliminates the unnecessary parameter and replaces the
    size calculation with a new superblock variable that is computed
    to be the maximum number of block pointers we can fit inside a
    log descriptor, as is done for pointers per dinode and indirect
    block.
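
    A precomputed "pointers per block" value of this kind is typically the
    usable space in a block divided by the pointer size. The sketch below
    assumes that formula; the header size is passed in as a parameter because
    the real descriptor size is not stated here, and toy_ptrs_per_block is a
    hypothetical name.

```c
#include <stdint.h>

/*
 * Number of 64-bit block pointers that fit in one block after a
 * header of `header_size` bytes. Returns 0 if the header does not
 * leave any room.
 */
uint32_t toy_ptrs_per_block(uint32_t block_size, uint32_t header_size)
{
    if (header_size >= block_size)
        return 0;
    return (block_size - header_size) / (uint32_t)sizeof(uint64_t);
}
```

    Computing this once at mount time and storing it in the superblock avoids
    repeating the division, and the sizeof(u64) argument, at every call site.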

    Signed-off-by: Bob Peterson
    Reviewed-by: Andrew Price
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     

07 Jan, 2020

1 commit

  • On filesystems with a block size smaller than the page size,
    gfs2_find_jhead can split a page across two bios (for example, when
    blocks are not allocated consecutively). When that happens, the first
    bio that completes will unlock the page in its bi_end_io handler even
    though the page hasn't been read completely yet. Fix that by using a
    chained bio for the rest of the page.

    While at it, clean up the sector calculation logic in
    gfs2_log_alloc_bio. In gfs2_find_jhead, simplify the disk block and
    offset calculation logic and fix a variable name.

    Fixes: f4686c26ecc3 ("gfs2: read journal in large chunks")
    Cc: stable@vger.kernel.org # v5.2+
    Signed-off-by: Andreas Gruenbacher

    Andreas Gruenbacher
     

14 Nov, 2019

1 commit

  • Commit 9287c6452d2b fixed a situation in which gfs2 could use a glock
    after it had been freed. To do that, it temporarily added a new glock
    reference by calling gfs2_glock_hold in function gfs2_add_revoke.
    However, if the bd element was removed by gfs2_trans_remove_revoke, it
    failed to drop the additional reference.

    This patch adds logic to gfs2_trans_remove_revoke to properly drop the
    additional glock reference.

    Fixes: 9287c6452d2b ("gfs2: Fix occasional glock use-after-free")
    Cc: stable@vger.kernel.org # v5.2+
    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     

12 Nov, 2019

1 commit

  • Function gfs2_write_log_header can be used to write a log header into any of
    the journals of a filesystem. When used on the node's own journal,
    gfs2_write_log_header advances the current position in the log
    (sdp->sd_log_flush_head) as a side effect, through function gfs2_log_bmap.

    This is confusing, and it also means that we can't use gfs2_log_bmap for other
    journals even if they have an extent map. So clean this mess up by not
    advancing sdp->sd_log_flush_head in gfs2_write_log_header or gfs2_log_bmap
    anymore and making that a responsibility of the callers instead.

    This is related to commit 7c70b896951c ("gfs2: clean_journal improperly set
    sd_log_flush_head").

    Signed-off-by: Andreas Gruenbacher

    Andreas Gruenbacher
     

28 Jun, 2019

2 commits

  • Before this patch, if a glock error was encountered, the glock with
    the problem was dumped. But sometimes you may have lots of file systems
    mounted, and that doesn't tell you which file system it was for.

    This patch adds a new boolean parameter fsid to the dump_glock family
    of functions. For non-error cases, such as dumping the glocks debugfs
    file, the fsid is not dumped in order to keep lock dumps and glocktop
    as clean as possible. For all error cases, such as GLOCK_BUG_ON, the
    file system id is now printed. This will make it easier to debug.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • This patch adds some instrumentation in gfs2's journal replay that
    indicates when we're about to overwrite a rgrp for which we already
    have a valid buffer_head.

    When this problem occurs, it's a situation in which this node has
    been granted a rgrp glock and subsequently read in buffer_heads for
    it, and possibly even made changes to the rgrp bits and/or
    allocation values. But now another node has failed and forced us to
    replay its journal, but its journal contains a copy of the same
    rgrp, without a revoke, which means we're about to overwrite a
    rgrp that we now rightfully own, with an obsolete copy. That is
    always a problem. It means the other node (which failed and left
    its journal to be replayed) failed to flush out its rgrp buffers,
    write out the revoke, and invalidate its copy before it released
    the glock to our possession.

    No node should ever release a glock until its metadata has been
    written to the journal and revoked and invalidated.

    We also kludge around the problem and refuse to replace our good
    copy with the journal's bad copy by not marking the buffer dirty,
    but never do it silently. That's wallpapering over a larger problem
    that still exists. IOW, if this situation can happen to this node,
    it can also happen to a different node and we wouldn't even know it
    or be able to circumvent it: Suppose we have a 3-node cluster:
    Node 1 fails, leaving an obsolete rgrp block in its journal without
    a revoke. Node 2 grabs the rgrp as soon as the rgrp glock is
    released and starts making changes, allocating and freeing blocks
    from the rgrp, etc. Node 3 replays the journal from node 1,
    oblivious and unaware that it's about to overwrite node 2's changes.
    So we still need to be vocal and log the error to make it apparent
    that a corruption path still exists in gfs2.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     

09 Jun, 2019

1 commit

  • Pull yet more SPDX updates from Greg KH:
    "Another round of SPDX header file fixes for 5.2-rc4

    These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
    added, based on the text in the files. We are slowly chipping away at
    the 700+ different ways people tried to write the license text. All of
    these were reviewed on the spdx mailing list by a number of different
    people.

    We now have over 60% of the kernel files covered with SPDX tags:
    $ ./scripts/spdxcheck.py -v 2>&1 | grep Files
    Files checked: 64533
    Files with SPDX: 40392
    Files with errors: 0

    I think the majority of the "easy" fixups are now done, it's now the
    start of the longer-tail of crazy variants to wade through"

    * tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
    ...

    Linus Torvalds
     

06 Jun, 2019

1 commit


05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this copyrighted material is made available to anyone wishing to use
    modify copy or redistribute it subject to the terms and conditions
    of the gnu general public license version 2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 44 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

09 May, 2019

1 commit

  • Pull GFS2 updates from Andreas Gruenbacher:
    "We've got the following patches ready for this merge window:

    - "gfs2: Fix loop in gfs2_rbm_find (v2)"

    A rework of a fix we ended up reverting in 5.0 because of an
    iozone performance regression.

    - "gfs2: read journal in large chunks"
    "gfs2: fix race between gfs2_freeze_func and unmount"

    An improved version of a commit we also ended up reverting in 5.0
    because of a regression in xfstest generic/311. It turns out that
    the journal changes were mostly innocent and that unfreeze didn't
    wait for the freeze to complete, which caused the filesystem to be
    unmounted before it was actually idle.

    - "gfs2: Fix occasional glock use-after-free"
    "gfs2: Fix iomap write page reclaim deadlock"
    "gfs2: Fix lru_count going negative"

    Fixes for various problems reported and partially fixed by Citrix
    engineers. Thank you very much.

    - "gfs2: clean_journal improperly set sd_log_flush_head"

    Another fix from Bob.

    - .. and a few other minor cleanups"

    * tag 'gfs2-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    gfs2: read journal in large chunks
    gfs2: Fix iomap write page reclaim deadlock
    gfs2: fix race between gfs2_freeze_func and unmount
    gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}
    gfs2: Rename sd_log_le_{revoke,ordered}
    gfs2: Remove unnecessary extern declarations
    gfs2: Remove misleading comments in gfs2_evict_inode
    gfs2: Replace gl_revokes with a GLF flag
    gfs2: Fix occasional glock use-after-free
    gfs2: clean_journal improperly set sd_log_flush_head
    gfs2: Fix lru_count going negative
    gfs2: Fix loop in gfs2_rbm_find (v2)

    Linus Torvalds
     

08 May, 2019

6 commits

  • Use bios to read the journal into the address space of the journal inode
    (jd_inode), sequentially and in large chunks. This is faster for locating the
    journal head than the previous binary search approach. When performing
    recovery, we keep the journal in the address space until recovery is done,
    which further speeds things up.

    Signed-off-by: Abhi Das
    Signed-off-by: Andreas Gruenbacher

    Abhi Das
     
  • Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to
    sd_log_ordered: not sure what le stands for here, but it doesn't add
    clarity, and if it stands for list entry, it's actually confusing as
    those are both list heads but not list entries.

    Signed-off-by: Andreas Gruenbacher

    Andreas Gruenbacher
     
  • Make log operations static; they are only used locally.

    Signed-off-by: Andreas Gruenbacher

    Andreas Gruenbacher
     
  • The gl_revokes value determines how many outstanding revokes a glock has
    on the superblock revokes list; this is used to avoid unnecessary log
    flushes. However, gl_revokes is only ever tested for being zero, and it's
    only decremented in revoke_lo_after_commit, which removes all revokes
    from the list, so we know that the gl_revoke values of all the glocks on
    the list will reach zero. Therefore, we can replace gl_revokes with a
    bit flag. This saves an atomic counter in struct gfs2_glock.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     
  • This patch has to do with the life cycle of glocks and buffers. When
    gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
    object is assigned to track the buffer, and that is queued to various
    lists, including the glock's gl_ail_list to indicate it's on the active
    items list. Once the page associated with the buffer has been written,
    it is removed from the ail list, but its life isn't over until a revoke
    has been successfully written.

    So after the block is written, its bufdata object is moved from the
    glock's gl_ail_list to a file-system-wide list of pending revokes,
    sd_log_le_revoke. At that point the glock still needs to track how many
    revokes it contributed to that list (in gl_revokes) so that things like
    glock go_sync can ensure all the metadata has been not only written, but
    also revoked before the glock is granted to a different node. This is
    to guarantee journal replay doesn't replay the block once the glock has
    been granted to another node.

    Ross Lagerwall recently discovered a race in which an inode could be
    evicted, and its glock freed after its ail list had been synced, but
    while it still had unwritten revokes on the sd_log_le_revoke list. The
    evict decremented the glock reference count to zero, which allowed the
    glock to be freed. After the revoke was written, function
    revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
    and clear its GLF_LFLUSH flag, at which time it referenced the freed
    glock.

    This patch fixes the problem by incrementing the glock reference count
    in gfs2_add_revoke when the glock's first bufdata object is moved from
    the glock to the global revokes list. Later, when the glock's last such
    bufdata object is freed, the reference count is decremented. This
    guarantees that whichever process finishes last (the revoke writing or
    the evict) will properly free the glock, and neither will reference the
    glock after it has been freed.
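
    The "take a reference on the first revoke, drop it on the last" rule can
    be modelled in a few lines of user-space C. This is a single-threaded
    sketch with hypothetical names (struct toy_glock and its helpers); the
    kernel code needs locking and atomics that are omitted here.

```c
#include <stdbool.h>

/* Hypothetical, single-threaded stand-in for a glock. */
struct toy_glock {
    int refs;      /* reference count */
    int revokes;   /* bufdata objects on the global revokes list */
    bool freed;    /* set when the last reference is dropped */
};

/* Drop one reference; "free" the glock when the count hits zero. */
void toy_glock_put(struct toy_glock *gl)
{
    if (--gl->refs == 0)
        gl->freed = true;
}

/* A bufdata object moves to the revokes list: pin the glock on the first one. */
void toy_add_revoke(struct toy_glock *gl)
{
    if (gl->revokes++ == 0)
        gl->refs++;
}

/* A revoke has been written: unpin the glock after the last one. */
void toy_revoke_written(struct toy_glock *gl)
{
    if (--gl->revokes == 0)
        toy_glock_put(gl);
}
```

    In this model an evict that calls toy_glock_put while revokes are still
    pending cannot free the glock; the final toy_revoke_written does.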

    Reported-by: Ross Lagerwall
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     
  • This patch fixes regressions in 588bff95c94efc05f9e1a0b19015c9408ed7c0ef.
    Due to that patch, function clean_journal was setting the value of
    sd_log_flush_head, but that's only valid if it is replaying the node's
    own journal. If it's replaying another node's journal, that's completely
    wrong and will lead to multiple problems. This patch tries to clean up
    the mess by passing the value of the logical journal block number into
    gfs2_write_log_header so the function can treat non-owned journals
    generically. For the local journal, the journal extent map is used for
    best performance. For other nodes' journals, new function
    gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get.

    This patch also tries to establish more consistency when passing journal
    block parameters by changing several unsigned int types to a consistent
    u32.

    Fixes: 588bff95c94e ("GFS2: Reduce code redundancy writing log headers")
    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     

30 Apr, 2019

1 commit


09 Mar, 2019

1 commit

  • Pull block layer updates from Jens Axboe:
    "Not a huge amount of changes in this round, the biggest one is that we
    finally have Ming's multi-page bvec support merged. Apart from that,
    this pull request contains:

    - Small series that avoids quiescing the queue for sysfs changes that
    match what we currently have (Aleksei)

    - Series of bcache fixes (via Coly)

    - Series of lightnvm fixes (via Mathias)

    - NVMe pull request from Christoph. Nothing major, just SPDX/license
    cleanups, RR mp policy (Hannes), and little fixes (Bart,
    Chaitanya).

    - BFQ series (Paolo)

    - Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
    for the fast path (Jianchao)

    - fops->iopoll() added for async IO polling, this is a feature that
    the upcoming io_uring interface will use (Christoph, me)

    - Partition scan loop fixes (Dongli)

    - mtip32xx conversion from managed resource API (Christoph)

    - cdrom registration race fix (Guenter)

    - MD pull from Song, two minor fixes.

    - Various documentation fixes (Marcos)

    - Multi-page bvec feature. This brings a lot of nice improvements
    with it, like more efficient splitting, larger IOs can be supported
    without growing the bvec table size, and so on. (Ming)

    - Various little fixes to core and drivers"

    * tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
    block: fix updating bio's front segment size
    block: Replace function name in string with __func__
    nbd: propagate genlmsg_reply return code
    floppy: remove set but not used variable 'q'
    null_blk: fix checking for REQ_FUA
    block: fix NULL pointer dereference in register_disk
    fs: fix guard_bio_eod to check for real EOD errors
    blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
    block: optimize bvec iteration in bvec_iter_advance
    block: introduce mp_bvec_for_each_page() for iterating over page
    block: optimize blk_bio_segment_split for single-page bvec
    block: optimize __blk_segment_map_sg() for single-page bvec
    block: introduce bvec_nth_page()
    iomap: wire up the iopoll method
    block: add bio_set_polled() helper
    block: wire up block device iopoll method
    fs: add an iopoll method to struct file_operations
    loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
    loop: do not print warn message if partition scan is successful
    block: bounce: make sure that bvec table is updated
    ...

    Linus Torvalds
     

15 Feb, 2019

2 commits

  • This patch introduces one extra iterator variable to bio_for_each_segment_all(),
    so that bio_for_each_segment_all() can iterate over multi-page bvecs.

    Given that it is just one mechanical and simple change on all
    bio_for_each_segment_all() users, this patch does the tree-wide change in a
    single patch, so that we can avoid using a temporary helper for this conversion.

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241.

    This patch causes xfstests generic/311 to fail. Reverting this for
    now until we have a proper fix.

    Signed-off-by: Abhi Das
    Signed-off-by: Bob Peterson
    Signed-off-by: Linus Torvalds

    Bob Peterson
     

12 Dec, 2018

2 commits

  • Use bio(s) to read in the journal sequentially in large chunks and
    locate the head of the journal.

    This version addresses the issues Christoph pointed out w.r.t. error handling
    and the use of a deprecated API.

    Signed-off-by: Abhi Das
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson
    Cc: Christoph Hellwig

    Abhi Das
     
  • Change gfs2_log_XXX_bio family of functions so they can be used
    with different bios, not just sdp->sd_log_bio.

    This patch also contains some clean up suggested by Andreas.

    Signed-off-by: Abhi Das
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Abhi Das
     

12 Oct, 2018

1 commit

  • This field indicates the size of the bitmap in bytes, similar to how the
    bi_blocks field indicates the size of the bitmap in blocks.

    In count_unlinked, replace an instance of bi_bytes * GFS2_NBBY by
    bi_blocks.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson
    Reviewed-by: Steven Whitehouse

    Andreas Gruenbacher
     

21 Jun, 2018

1 commit

  • In two places, the gfs2_io_error_bh macro is called while holding the
    sd_ail_lock spin lock. This isn't allowed because gfs2_io_error_bh
    withdraws the filesystem, which can sleep because it issues a uevent.
    To fix that, add a gfs2_io_error_bh_wd macro that does withdraw the
    filesystem and change gfs2_io_error_bh to not withdraw the filesystem.
    In those places where the new gfs2_io_error_bh is used, withdraw the
    filesystem after releasing sd_ail_lock.
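
    The pattern described here, noting the error while the lock is held and
    performing the sleeping action only after the lock is released, can be
    sketched in user-space C. The lock is modelled as a plain flag and all
    names (struct toy_sb, toy_ail_scan, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical superblock fragment; `ail_locked` models a held spin lock. */
struct toy_sb {
    bool ail_locked;
    bool withdrawn;
    int io_errors_logged;
};

/* Safe under the lock: only records the error, never sleeps. */
void toy_io_error(struct toy_sb *sb)
{
    sb->io_errors_logged++;
}

/* May sleep in the real world, so it refuses to run with the lock held. */
bool toy_withdraw(struct toy_sb *sb)
{
    if (sb->ail_locked)
        return false;
    sb->withdrawn = true;
    return true;
}

/* Scan buffers under the lock, then withdraw after dropping it. */
void toy_ail_scan(struct toy_sb *sb, const bool *buffer_failed, size_t n)
{
    bool need_withdraw = false;

    sb->ail_locked = true;
    for (size_t i = 0; i < n; i++) {
        if (buffer_failed[i]) {
            toy_io_error(sb);
            need_withdraw = true;   /* remember it, act on it later */
        }
    }
    sb->ail_locked = false;

    if (need_withdraw)
        toy_withdraw(sb);
}
```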

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson
    Reviewed-by: Andrew Price

    Andreas Gruenbacher
     

26 Jan, 2018

1 commit


23 Jan, 2018

1 commit

  • This patch adds a new structure called gfs2_log_header_v2 which is used
    to store expanded fields into previously unused areas of the log headers
    (i.e., this change is backwards compatible). Some of these are used for
    debug purposes so we can backtrack when problems occur. Others are
    reserved for future expansion.

    This patch is based on a prototype from Steve Whitehouse.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher

    Bob Peterson
     

08 Sep, 2017

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the first pull request for 4.14, containing most of the code
    changes. It's a quiet series this round, which I think we needed after
    the churn of the last few series. This contains:

    - Fix for a registration race in loop, from Anton Volkov.

    - Overflow complaint fix from Arnd for DAC960.

    - Series of drbd changes from the usual suspects.

    - Conversion of the stec/skd driver to blk-mq. From Bart.

    - A few BFQ improvements/fixes from Paolo.

    - CFQ improvement from Ritesh, allowing idling for group idle.

    - A few fixes found by Dan's smatch, courtesy of Dan.

    - A warning fixup for a race between changing the IO scheduler and
    device removal. From David Jeffery.

    - A few nbd fixes from Josef.

    - Support for cgroup info in blktrace, from Shaohua.

    - Also from Shaohua, new features in the null_blk driver to allow it
    to actually hold data, among other things.

    - Various corner cases and error handling fixes from Weiping Zhang.

    - Improvements to the IO stats tracking for blk-mq from me. Can
    drastically improve performance for fast devices and/or big
    machines.

    - Series from Christoph removing bi_bdev as being needed for IO
    submission, in preparation for nvme multipathing code.

    - Series from Bart, including various cleanups and fixes for switch
    fall through case complaints"

    * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
    kernfs: checking for IS_ERR() instead of NULL
    drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
    drbd: Fix allyesconfig build, fix recent commit
    drbd: switch from kmalloc() to kmalloc_array()
    drbd: abort drbd_start_resync if there is no connection
    drbd: move global variables to drbd namespace and make some static
    drbd: rename "usermode_helper" to "drbd_usermode_helper"
    drbd: fix race between handshake and admin disconnect/down
    drbd: fix potential deadlock when trying to detach during handshake
    drbd: A single dot should be put into a sequence.
    drbd: fix rmmod cleanup, remove _all_ debugfs entries
    drbd: Use setup_timer() instead of init_timer() to simplify the code.
    drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
    drbd: new disk-option disable-write-same
    drbd: Fix resource role for newly created resources in events2
    drbd: mark symbols static where possible
    drbd: Send P_NEG_ACK upon write error in protocol != C
    drbd: add explicit plugging when submitting batches
    drbd: change list_for_each_safe to while(list_first_entry_or_null)
    drbd: introduce drbd_recv_header_maybe_unplug
    ...

    Linus Torvalds
     

25 Aug, 2017

1 commit

  • Before this patch, if GFS2 encountered IO errors while writing to
    the journal, it would not report the problem, so they would go
    unnoticed, sometimes for many hours. Sometimes this would only be
    noticed later, when recovery tried to do journal replay and failed
    due to invalid metadata at the blocks that resulted in IO errors.

    This patch makes GFS2's log daemon check for IO errors. If it
    encounters one, it withdraws from the file system and reports
    why in dmesg. A similar action is taken when IO errors occur when
    writing to the system statfs file.

    These errors are also reported back to any callers of fsync, since
    that requires the journal to be flushed. Therefore, any IO errors
    that would previously go unnoticed are now noticed and the file
    system is withdrawn as early as possible, thus preventing further
    file system damage.

    Also note that this reintroduces superblock variable sd_log_error,
    which Christoph removed with commit f729b66fca.

    Signed-off-by: Bob Peterson

    Bob Peterson
     

24 Aug, 2017

1 commit

  • This way we don't need a block_device structure to submit I/O. The
    block_device has different life time rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.
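
    As an illustration of the idea, here is a minimal user-space model of
    an I/O descriptor that carries a disk pointer plus a partition index,
    and of the remapping step that turns a partition-relative sector into
    a whole-disk sector. All types and names (model_bio, model_disk,
    model_partition_remap) are hypothetical simplifications, not the
    kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct model_partition {
    uint64_t start_sect;   /* first sector of the partition on the disk */
    uint64_t nr_sects;     /* length of the partition in sectors */
};

struct model_disk {
    const struct model_partition *parts;  /* index 0 is the whole disk */
    size_t nr_parts;
};

struct model_bio {
    const struct model_disk *disk;  /* target disk, exists once per device */
    uint8_t partno;                 /* partition index, 0 = whole disk */
    uint64_t sector;                /* start sector, relative to partno */
    uint64_t nr_sects;              /* length of the I/O in sectors */
};

/*
 * Remap a partition-relative request to a whole-disk request.
 * Returns 0 on success, -1 for a bad partition index or an I/O that
 * would run past the end of the partition.
 */
int model_partition_remap(struct model_bio *bio)
{
    const struct model_partition *p;

    if (bio->partno == 0)
        return 0;                       /* already whole-disk relative */
    if (bio->partno >= bio->disk->nr_parts)
        return -1;

    p = &bio->disk->parts[bio->partno];
    if (bio->sector + bio->nr_sects > p->nr_sects)
        return -1;

    bio->sector += p->start_sect;       /* shift into disk coordinates */
    bio->partno = 0;                    /* mark as remapped */
    return 0;
}
```

    Nothing in this model needs an open block device node: the disk and
    the partition index are enough to locate the target sectors.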

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

08 Jul, 2017

1 commit

  • Pull Writeback error handling updates from Jeff Layton:
    "This pile represents the bulk of the writeback error handling fixes
    that I have for this cycle. Some of the earlier patches in this pile
    may look trivial but they are prerequisites for later patches in the
    series.

    The aim of this set is to improve how we track and report writeback
    errors to userland. Most applications that care about data integrity
    will periodically call fsync/fdatasync/msync to ensure that their
    writes have made it to the backing store.

    For a very long time, we have tracked writeback errors using two flags
    in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
    writeback error occurs (via mapping_set_error) and are cleared as a
    side-effect of filemap_check_errors (as you noted yesterday). This
    model really sucks for userland.

    Only the first task to call fsync (or msync or fdatasync) will see the
    error. Any subsequent task calling fsync on a file will get back 0
    (unless another writeback error occurs in the interim). If I have
    several tasks writing to a file and calling fsync to ensure that their
    writes got stored, then I need to have them coordinate with one
    another. That's difficult enough, but in a world of containerized
    setups that coordination may not even be possible.

    But wait...it gets worse!

    The calls to filemap_check_errors can be buried pretty far down in the
    call stack, and there are internal callers of filemap_write_and_wait
    and the like that also end up clearing those errors. Many of those
    callers ignore the error return from that function or return it to
    userland at nonsensical times (e.g. truncate() or stat()). If I get
    back -EIO on a truncate, there is no reason to think that it was
    because some previous writeback failed, and a subsequent fsync() will
    (incorrectly) return 0.

    This pile aims to do three things:

    1) ensure that when a writeback error occurs, that error will be
    reported to userland on a subsequent fsync/fdatasync/msync call,
    regardless of what internal callers are doing

    2) report writeback errors on all file descriptions that were open at
    the time that the error occurred. This is a user-visible change,
    but I think most applications are written to assume this behavior
    anyway. Those that aren't are unlikely to be hurt by it.

    3) document what filesystems should do when there is a writeback
    error. Today, there is very little consistency between them, and a
    lot of cargo-cult copying. We need to make it very clear what
    filesystems should do in this situation.

    To achieve this, the set adds a new data type (errseq_t) and then
    builds new writeback error tracking infrastructure around that. Once
    all of that is in place, we change the filesystems to use the new
    infrastructure for reporting wb errors to userland.

    Note that this is just the initial foray into cleaning up this mess.
    There is a lot of work remaining here:

    1) convert the rest of the filesystems in a similar fashion. Once the
    initial set is in, then I think most other fs' will be fairly
    simple to convert. Hopefully most of those can go in via individual
    filesystem trees.

    2) convert internal waiters on writeback to use errseq_t for
    detecting errors instead of relying on the AS_* flags. I have some
    draft patches for this for ext4, but they are not quite ready for
    prime time yet.

    This was a discussion topic this year at LSF/MM too. If you're
    interested in the gory details, LWN has some good articles about this:

    https://lwn.net/Articles/718734/
    https://lwn.net/Articles/724307/"
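
    The difference between the two models in the pull message can be
    sketched in user space as follows. This is a deliberately simplified
    model, not the kernel's errseq_t: the real type packs its state into a
    single integer and has more subtle rules about when the counter
    advances. All names here (flag_mapping, seq_mapping, seq_file and the
    functions on them) are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Old model: one flag per mapping, cleared by whoever checks it first. */
struct flag_mapping {
    bool eio;
};

void flag_set_error(struct flag_mapping *m)
{
    m->eio = true;
}

int flag_check_errors(struct flag_mapping *m)
{
    if (m->eio) {
        m->eio = false;     /* test-and-clear: later callers see nothing */
        return -EIO;
    }
    return 0;
}

/* Sequence model: the mapping keeps a counter, each open file a cursor. */
struct seq_mapping {
    unsigned int seq;       /* bumped on every recorded error */
    int err;                /* most recent error as a negative errno */
};

struct seq_file {
    unsigned int since;     /* value of seq this file has already seen */
};

void seq_set_error(struct seq_mapping *m, int err)
{
    m->err = err;
    m->seq++;
}

/* Sample the counter at open time so older errors are not reported. */
void seq_open(const struct seq_mapping *m, struct seq_file *f)
{
    f->since = m->seq;
}

/* Report an error once per file if one was recorded since the cursor. */
int seq_check_and_advance(const struct seq_mapping *m, struct seq_file *f)
{
    if (f->since == m->seq)
        return 0;
    f->since = m->seq;
    return m->err;
}
```

    With the flag model, the second caller of flag_check_errors() gets 0
    even though it never saw the error. With the sequence model, every
    file that was open when the error was recorded reports it exactly
    once, and checking one file's cursor does not affect any other file.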

    * tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    btrfs: minimal conversion to errseq_t writeback error reporting on fsync
    xfs: minimal conversion to errseq_t writeback error reporting
    ext4: use errseq_t based error handling for reporting data writeback errors
    fs: convert __generic_file_fsync to use errseq_t based reporting
    block: convert to errseq_t based writeback error tracking
    dax: set errors in mapping when writeback fails
    Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
    mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
    fs: new infrastructure for writeback error handling and reporting
    lib: add errseq_t type and infrastructure for handling it
    mm: don't TestClearPageError in __filemap_fdatawait_range
    mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
    jbd2: don't clear and reset errors after waiting on writeback
    buffer: set errors in mapping at the time that the error occurs
    fs: check for writeback errors after syncing out buffers in generic_file_fsync
    buffer: use mapping_set_error instead of setting the flag
    mm: fix mapping_set_error call in me_pagecache_dirty

    Linus Torvalds
     

06 Jul, 2017

1 commit

  • I noticed on xfs that I could still sometimes get back an error on fsync
    on a fd that was opened after the error condition had been cleared.

    The problem is that the buffer code sets the write_io_error flag and
    then later checks that flag to set the error in the mapping. That flag
    persists for quite a while, however. If the file is later opened with
    O_TRUNC, the buffers will then be invalidated and the mapping's error
    set such that a subsequent fsync will return error. I think this is
    incorrect, as there was no writeback between the open and fsync.

    Add a new mark_buffer_write_io_error operation that sets the flag and
    the error in the mapping at the same time. Replace all calls to
    set_buffer_write_io_error with mark_buffer_write_io_error, and remove
    the places that check this flag in order to set the error in the
    mapping.

    This sets the error in the mapping earlier, at the time that it's first
    detected.
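
    The change can be pictured with a small user-space model. Only the
    general shape follows the description above; struct model_buffer,
    struct model_mapping and the model_* helpers are hypothetical, and the
    mapping error is reduced to a single int for the sake of the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's buffer_head or address_space. */
struct model_mapping {
    int wb_err;             /* last writeback error, 0 if none */
};

struct model_buffer {
    bool write_io_error;    /* per-buffer "a write failed" flag */
    struct model_mapping *mapping;
};

/* Old pattern, step 1: the write failure only sets the buffer flag. */
void model_set_flag_only(struct model_buffer *bh)
{
    bh->write_io_error = true;
}

/*
 * Old pattern, step 2: some later code path (for example when buffers
 * are invalidated) notices the flag and only then records the error in
 * the mapping, possibly long after the failed write.
 */
void model_late_check(struct model_buffer *bh)
{
    if (bh->write_io_error && bh->mapping)
        bh->mapping->wb_err = -EIO;
}

/* New pattern: set the flag and the mapping error at the same time. */
void model_mark_write_io_error(struct model_buffer *bh)
{
    bh->write_io_error = true;
    if (bh->mapping)
        bh->mapping->wb_err = -EIO;
}
```

    In the old pattern the mapping stays clean until model_late_check()
    happens to run, which is how an fsync on a freshly opened file could
    report an error with no writeback in between. With the combined
    helper there is no deferred step left to fire at the wrong time.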

    Signed-off-by: Jeff Layton
    Reviewed-by: Jan Kara
    Reviewed-by: Carlos Maiolino

    Jeff Layton