30 Apr, 2019

1 commit


15 Feb, 2019

1 commit

  • This patch introduces one extra iterator variable to bio_for_each_segment_all(),
    then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.

    Given it is just one mechanical & simple change on all bio_for_each_segment_all()
    users, this patch does the tree-wide change in one single patch, so that we can
    avoid using a temporary helper for this conversion.

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

21 Jun, 2018

1 commit

  • In two places, the gfs2_io_error_bh macro is called while holding the
    sd_ail_lock spin lock. This isn't allowed because gfs2_io_error_bh
    withdraws the filesystem, which can sleep because it issues a uevent.
    To fix that, add a gfs2_io_error_bh_wd macro that does withdraw the
    filesystem and change gfs2_io_error_bh to not withdraw the filesystem.
    In those places where the new gfs2_io_error_bh is used, withdraw the
    filesystem after releasing sd_ail_lock.
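
    The locking rule described above can be sketched in plain userspace C. This
    is only a model under stated assumptions: struct model_sbd, model_withdraw()
    and scan_ail() are hypothetical names, and a boolean stands in for
    sd_ail_lock so that a withdraw attempted under the lock can be detected.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the superblock; not the kernel structure. */
struct model_sbd {
	bool ail_locked;        /* plays the role of sd_ail_lock being held */
	int withdraw_count;     /* number of withdraws performed */
	int violations;         /* withdraws attempted while "locked" */
};

/* Stand-in for the withdraw path, which may sleep in the real code. */
static void model_withdraw(struct model_sbd *sdp)
{
	if (sdp->ail_locked)
		sdp->violations++;  /* would be sleeping under a spin lock */
	sdp->withdraw_count++;
}

/*
 * Walk the buffers under the lock and only record that an I/O error was
 * seen. The withdraw itself is deferred until the lock has been dropped.
 * Returns true if a withdraw was performed.
 */
static bool scan_ail(struct model_sbd *sdp, const bool *io_error, int n)
{
	bool withdraw = false;

	sdp->ail_locked = true;
	for (int i = 0; i < n; i++) {
		if (io_error[i])
			withdraw = true;    /* report only, no sleeping here */
	}
	sdp->ail_locked = false;

	if (withdraw)
		model_withdraw(sdp);        /* lock no longer held */
	return withdraw;
}
```

    In this model the violations counter stays at zero because the only call
    to model_withdraw() happens after the lock flag has been cleared.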

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson
    Reviewed-by: Andrew Price

    Andreas Gruenbacher
     

08 Sep, 2017

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the first pull request for 4.14, containing most of the code
    changes. It's a quiet series this round, which I think we needed after
    the churn of the last few series. This contains:

    - Fix for a registration race in loop, from Anton Volkov.

    - Overflow complaint fix from Arnd for DAC960.

    - Series of drbd changes from the usual suspects.

    - Conversion of the stec/skd driver to blk-mq. From Bart.

    - A few BFQ improvements/fixes from Paolo.

    - CFQ improvement from Ritesh, allowing idling for group idle.

    - A few fixes found by Dan's smatch, courtesy of Dan.

    - A warning fixup for a race between changing the IO scheduler and
    device removal. From David Jeffery.

    - A few nbd fixes from Josef.

    - Support for cgroup info in blktrace, from Shaohua.

    - Also from Shaohua, new features in the null_blk driver to allow it
    to actually hold data, among other things.

    - Various corner cases and error handling fixes from Weiping Zhang.

    - Improvements to the IO stats tracking for blk-mq from me. Can
    drastically improve performance for fast devices and/or big
    machines.

    - Series from Christoph removing bi_bdev as being needed for IO
    submission, in preparation for nvme multipathing code.

    - Series from Bart, including various cleanups and fixes for switch
    fall through case complaints"

    * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
    kernfs: checking for IS_ERR() instead of NULL
    drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
    drbd: Fix allyesconfig build, fix recent commit
    drbd: switch from kmalloc() to kmalloc_array()
    drbd: abort drbd_start_resync if there is no connection
    drbd: move global variables to drbd namespace and make some static
    drbd: rename "usermode_helper" to "drbd_usermode_helper"
    drbd: fix race between handshake and admin disconnect/down
    drbd: fix potential deadlock when trying to detach during handshake
    drbd: A single dot should be put into a sequence.
    drbd: fix rmmod cleanup, remove _all_ debugfs entries
    drbd: Use setup_timer() instead of init_timer() to simplify the code.
    drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
    drbd: new disk-option disable-write-same
    drbd: Fix resource role for newly created resources in events2
    drbd: mark symbols static where possible
    drbd: Send P_NEG_ACK upon write error in protocol != C
    drbd: add explicit plugging when submitting batches
    drbd: change list_for_each_safe to while(list_first_entry_or_null)
    drbd: introduce drbd_recv_header_maybe_unplug
    ...

    Linus Torvalds
     

24 Aug, 2017

1 commit

  • This way we don't need a block_device structure to submit I/O. The
    block_device has different life time rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.
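
    The partition remapping mentioned above can be modelled roughly as follows.
    This is a simplified userspace sketch, not the kernel code: the struct and
    function names (model_gendisk, model_bio, model_remap) and the fixed-size
    partition table are assumptions made for illustration.

```c
#include <stdint.h>

#define MODEL_MAX_PARTS 4   /* hypothetical table size */

struct model_partition {
	uint64_t start_sect;    /* first sector of the partition on the disk */
	uint64_t nr_sects;      /* partition length in sectors */
};

/* One object per block device; index 0 means the whole disk. */
struct model_gendisk {
	struct model_partition part[MODEL_MAX_PARTS];
};

/* The I/O descriptor carries a disk and a partition index, no block_device. */
struct model_bio {
	struct model_gendisk *disk;
	unsigned int partno;
	uint64_t sector;        /* relative to the partition until remapped */
};

/* Translate a partition-relative sector to a whole-disk sector.
 * Returns 0 on success, -1 if the index or sector is out of range. */
static int model_remap(struct model_bio *bio)
{
	const struct model_partition *p;

	if (bio->partno == 0)
		return 0;               /* already addresses the whole disk */
	if (bio->partno >= MODEL_MAX_PARTS)
		return -1;
	p = &bio->disk->part[bio->partno];
	if (bio->sector >= p->nr_sects)
		return -1;
	bio->sector += p->start_sect;
	bio->partno = 0;                /* remapped exactly once */
	return 0;
}
```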

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

21 Jul, 2017

1 commit

  • When gfs2 does metadata I/O, only REQ_META is used as a metadata hint of
    the bio. But flag REQ_META is just a hint for block trace, not for block
    layer code to handle a bio as metadata request.

    For some of the metadata I/Os of gfs2, a REQ_PRIO flag on the metadata bio
    would be very informative to block layer code. For example, if bcache is
    used as an I/O cache for gfs2, it will be possible for bcache code to get
    the hint and cache the pre-fetched metadata blocks on cache device. This
    behavior may be helpful to improve metadata I/O performance if the
    following requests hit the cache.

    Here are the locations in gfs2 code where a REQ_PRIO flag should be added,
    - All places where REQ_READAHEAD is used, gfs2 code uses this flag for
    metadata read ahead.
    - In gfs2_meta_rq() where the first metadata block is read in.
    - In gfs2_write_buf_to_page(), read in quota metadata blocks to have them
    up to date.
    These metadata blocks are likely to be accessed again in the future; adding
    a REQ_PRIO flag may let bcache keep such metadata on the fast cache
    device. For systems without a cache layer, REQ_PRIO can still provide a hint
    to the block layer to handle metadata requests more properly.

    Signed-off-by: Coly Li
    Signed-off-by: Bob Peterson

    Coly Li
     

17 Jul, 2017

1 commit

  • Before this patch, problems reading in indirect buffers would send
    an IO error back to the caller, and release the buffer_head with
    brelse() in function gfs2_meta_indirect_buffer, however, it would
    still return the address of the buffer_head it released. After the
    error was discovered, function gfs2_block_map would call function
    release_metapath to free all buffers. That checked:
    if (mp->mp_bh[i] == NULL) but since the value was set after the
    error, it was non-zero, so brelse was called a second time. This
    resulted in the following error:

    kernel: WARNING: at fs/buffer.c:1224 __brelse+0x3a/0x40() (Tainted: G W -- ------------ )
    kernel: Hardware name: RHEV Hypervisor
    kernel: VFS: brelse: Trying to free free buffer

    This patch changes gfs2_meta_indirect_buffer so it only sets
    the buffer_head pointer in cases where it isn't released.
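
    A minimal userspace sketch of the fixed behaviour, assuming a simple
    reference-counted buffer: model_bh, model_brelse(), read_indirect() and
    release_path() are hypothetical stand-ins for buffer_head, brelse(),
    gfs2_meta_indirect_buffer and release_metapath, not the kernel code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head with a reference count. */
struct model_bh {
	int refcount;
	int uptodate;   /* 0 simulates a failed read */
};

static void model_brelse(struct model_bh *bh)
{
	if (bh)
		bh->refcount--;
}

/* Read a buffer. On failure the reference is dropped and *bhp is left
 * untouched, so the caller never sees a pointer it does not own. */
static int read_indirect(struct model_bh *bh, struct model_bh **bhp)
{
	bh->refcount++;                 /* reference taken for the read */
	if (!bh->uptodate) {
		model_brelse(bh);
		return -EIO;            /* *bhp stays as the caller left it */
	}
	*bhp = bh;                      /* only set when not released */
	return 0;
}

/* Release every filled slot, stopping at the first NULL one. */
static void release_path(struct model_bh **slots, int n)
{
	for (int i = 0; i < n; i++) {
		if (slots[i] == NULL)
			break;
		model_brelse(slots[i]);
		slots[i] = NULL;
	}
}
```

    With the slot left NULL on error, the cleanup loop has nothing to release
    and the reference count is decremented exactly once.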

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     

09 Jun, 2017

1 commit

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep around at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

21 Feb, 2017

1 commit

  • Pull GFS2 updates from Robert Peterson:
    "We've got eight GFS2 patches for this merge window:

    - Andy Price submitted a patch to make gfs2_write_full_page a static
    function.

    - Dan Carpenter submitted a patch to fix an ERR_PTR thinko.

    Three patches fix bugs related to deleting very large files, which
    cause GFS2 to run out of journal space:

    - The first one prevents GFS2 delete operation from requesting too
    much journal space.

    - The second one fixes a problem whereby GFS2 can hang because it
    wasn't taking journal space demand into its calculations.

    - The third one wakes up IO waiters when a flush is done to restart
    processes stuck waiting for journal space to become available.

    The final three patches are a performance improvement related to
    spin_lock contention between multiple writers:

    - The "tr_touched" variable was switched to a flag to be more atomic
    and eliminate the possibility of some races.

    - Function meta_lo_add was moved inline with its only caller to make
    the code more readable and efficient.

    - Contention on the gfs2_log_lock spinlock was greatly reduced by
    avoiding the lock altogether in cases where we don't really need
    it: buffers that already appear in the appropriate metadata list
    for the journal. Many thanks to Steve Whitehouse for the ideas and
    principles behind these patches"

    * tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    gfs2: Make gfs2_write_full_page static
    GFS2: Reduce contention on gfs2_log_lock
    GFS2: Inline function meta_lo_add
    GFS2: Switch tr_touched to flag in transaction
    GFS2: Wake up io waiters whenever a flush is done
    GFS2: Made logd daemon take into account log demand
    GFS2: Limit number of transaction blocks requested for truncates
    GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next

    Linus Torvalds
     

27 Jan, 2017

1 commit


03 Nov, 2016

1 commit

  • Add wbc_to_write_flags(), which returns the write modifier flags to use,
    based on a struct writeback_control. No functional changes in this
    patch, but it prepares us for factoring other wbc fields for write type.
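
    As a rough illustration of such a helper, here is a userspace model. It
    assumes that the only input considered is the writeback sync mode and that
    data-integrity writeback maps to a "sync" write flag; the MODEL_ names and
    the bit value are placeholders, not the kernel definitions.

```c
/* Hypothetical stand-ins for the kernel's sync modes and request flag. */
enum model_sync_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

#define MODEL_REQ_SYNC (1u << 0)   /* placeholder bit value */

struct model_wbc {
	enum model_sync_mode sync_mode;
};

/* Derive the write modifier flags from the writeback control in one place,
 * so callers no longer open-code the sync_mode check. */
static unsigned int model_wbc_to_write_flags(const struct model_wbc *wbc)
{
	if (wbc->sync_mode == MODEL_WB_SYNC_ALL)
		return MODEL_REQ_SYNC;
	return 0;
}
```

    Centralising the mapping like this is what makes it possible to factor
    further wbc fields into the returned flags later without touching callers.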

    Signed-off-by: Jens Axboe
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     

01 Nov, 2016

1 commit


19 Aug, 2016

1 commit

  • Commit 39b0555f didn't check for a failing bio_add_page in
    gfs2_submit_bhs. This could cause I/O requests to get lost, and the
    affected buffer heads to stay locked forever. Fix that by submitting
    the current bio and allocating another one when bio_add_page fails. (It
    is guaranteed that we can at least add one page to a bio.)
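
    The submit-and-reallocate loop described above can be sketched as a small
    userspace model. The capacity limit, the batch_bio structure and the
    function names are assumptions for illustration; resetting the page count
    stands in for allocating a fresh bio.

```c
#define MODEL_BIO_MAX 2   /* hypothetical per-bio page capacity */

struct batch_bio {
	int nr_pages;
};

struct batch_stats {
	int bios_submitted;
	int pages_submitted;
};

/* Returns 1 if the page was added, 0 if the bio is full. */
static int batch_add_page(struct batch_bio *bio)
{
	if (bio->nr_pages >= MODEL_BIO_MAX)
		return 0;
	bio->nr_pages++;
	return 1;
}

/* Submit the bio and start over with an empty one. */
static void batch_submit(struct batch_bio *bio, struct batch_stats *st)
{
	st->bios_submitted++;
	st->pages_submitted += bio->nr_pages;
	bio->nr_pages = 0;
}

/* Queue num pages, never dropping one when the current bio fills up. */
static void submit_pages(int num, struct batch_stats *st)
{
	struct batch_bio bio = { 0 };

	for (int i = 0; i < num; i++) {
		if (!batch_add_page(&bio)) {
			batch_submit(&bio, st);
			batch_add_page(&bio);   /* an empty bio takes one page */
		}
	}
	if (bio.nr_pages)
		batch_submit(&bio, st);
}
```

    The important property is that every page ends up in some submitted bio,
    which is exactly what the unchecked bio_add_page call failed to guarantee.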

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

21 Jul, 2016

1 commit

  • These two are confusing leftovers of the old world order, combining
    values of the REQ_OP_ and REQ_ namespaces. For callers that don't
    special case we mostly just replace bi_rw with bio_data_dir or
    op_is_write, except for the few cases where a switch over the REQ_OP_
    values makes more sense. Any check for READA is replaced with an
    explicit check for REQ_RAHEAD. Also remove the READA alias for
    REQ_RAHEAD.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

08 Jun, 2016

4 commits


21 May, 2016

1 commit

  • Pull GFS2 updates from Bob Peterson:
    "We've got nine patches this time:

    - Abhi Das has two patches that fix a GFS2 splice issue (and an
    adjustment).

    - Ben Marzinski has a patch which allows the proper unmount of a GFS2
    file system after hitting a withdraw error.

    - I have a patch to fix a problem where GFS2 would dereference an
    error value, plus three cosmetic / refactoring patches.

    - Daniel DeFreez has a patch to fix two glock reference count
    problems, where GFS2 was not properly "uninitializing" its glock
    holder on error paths.

    - Denys Vlasenko has a patch to change a function to not be inlined,
    thus reducing the memory footprint of the GFS2 module"

    * tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    GFS2: Refactor gfs2_remove_from_journal
    GFS2: Remove allocation parms from gfs2_rbm_find
    gfs2: use inode_lock/unlock instead of accessing i_mutex directly
    GFS2: Add calls to gfs2_holder_uninit in two error handlers
    GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid
    GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes
    gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()
    GFS2: Get rid of dead code in inode_go_demote_ok
    GFS2: ignore unlock failures after withdraw

    Linus Torvalds
     

07 May, 2016

1 commit

  • This patch makes two simple changes to function gfs2_remove_from_journal.
    First, it removes the parameter that specifies the transaction.
    Since it's always passed in as current->journal_info, we might as well
    set that in the function rather than passing it in. Second, it changes
    the meta parameter to use an enum to make the code more clear.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

19 Nov, 2015

1 commit

  • Instead of submitting a READ_SYNC bio for the inode and a READA bio for
    the inode's extended attributes through submit_bh, submit a single READ_SYNC
    bio for both through submit_bio when possible. This can be more
    efficient on some kinds of block devices.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

17 Nov, 2015

1 commit

  • When gfs2 allocates an inode and its extended attribute block next to
    each other at inode create time, the inode's directory entry indicates
    that in de_rahead. In that case, we can readahead the extended
    attribute block when we read in the inode.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
     

04 Sep, 2015

1 commit

  • What uniquely identifies a glock in the glock hash table is not
    gl_name, but gl_name and its superblock pointer. This patch makes
    the gl_name field correspond to a unique glock identifier. That will
    allow us to simplify hashing with a future patch, since the hash
    algorithm can then take the gl_name and hash its components in one
    operation.
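
    To illustrate the idea, here is a userspace sketch of a lock name that
    carries its superblock pointer, so that one hash call and one comparison
    cover the full identity. The model_ names are hypothetical, and FNV-1a is
    used only as a simple stand-in hash; this is not a claim about the hash
    function GFS2 uses.

```c
#include <stddef.h>
#include <stdint.h>

struct model_sbd {
	int id;
};

/* The name alone now identifies a glock: superblock, number and type. */
struct model_lockname {
	struct model_sbd *ln_sbd;
	uint64_t ln_number;
	unsigned int ln_type;
};

/* FNV-1a over a byte range, continuing from hash value h. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash all components of the name in a single call. Fields are fed in
 * one by one so that struct padding never influences the result. */
static uint32_t lockname_hash(const struct model_lockname *n)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, &n->ln_sbd, sizeof(n->ln_sbd));
	h = fnv1a(h, &n->ln_number, sizeof(n->ln_number));
	h = fnv1a(h, &n->ln_type, sizeof(n->ln_type));
	return h;
}

static int lockname_equal(const struct model_lockname *a,
			  const struct model_lockname *b)
{
	return a->ln_sbd == b->ln_sbd &&
	       a->ln_number == b->ln_number &&
	       a->ln_type == b->ln_type;
}
```

    Two names with the same number and type but different superblocks compare
    as different, which is the case the old gl_name could not express by itself.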

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Steven Whitehouse

    Bob Peterson
     

05 Jun, 2014

1 commit

  • aops->write_begin may allocate a new page and make it visible only to have
    mark_page_accessed called almost immediately after. Once the page is
    visible the atomic operations are necessary, which is noticeable overhead
    when writing to an in-memory filesystem like tmpfs but should also be
    noticeable with fast storage. The objective of the patch is to initialise
    the accessed information with non-atomic operations before the page is
    visible.

    The bulk of filesystems directly or indirectly use
    grab_cache_page_write_begin or find_or_create_page for the initial
    allocation of a page cache page. This patch adds an init_page_accessed()
    helper which behaves like the first call to mark_page_accessed() but may
    be called before the page is visible and can be done non-atomically.

    The primary APIs of concern in this case are the following and are used
    by most filesystems.

    find_get_page
    find_lock_page
    find_or_create_page
    grab_cache_page_nowait
    grab_cache_page_write_begin

    All of them are very similar in detail, so the patch creates a core helper
    pagecache_get_page() which takes a flags parameter that affects its
    behavior such as whether the page should be marked accessed or not. The
    old API is preserved but is basically a thin wrapper around this core
    function.

    Each of the filesystems are then updated to avoid calling
    mark_page_accessed when it is known that the VM interfaces have already
    done the job. There is a slight snag in that the timing of the
    mark_page_accessed() has now changed so in rare cases it's possible a page
    gets to the end of the LRU as PageReferenced whereas previously it might
    have been repromoted. This is expected to be rare but it's worth the
    filesystem people thinking about it in case they see a problem with the
    timing change. It is also the case that some filesystems may be marking
    pages accessed that previously did not but it makes sense that filesystems
    have consistent behaviour in this regard.

    The test case used to evaluate this is a simple dd of a large file done
    multiple times with the file deleted on each iteration. The size of the
    file is 1/10th physical memory to avoid dirty page balancing. In the
    async case it will be possible that the workload completes without even
    hitting the disk and will have variable results but highlight the impact
    of mark_page_accessed for async IO. The sync results are expected to be
    more stable. The exception is tmpfs where the normal case is for the "IO"
    to not hit the disk.

    The test machine was single socket and UMA to avoid any scheduling or NUMA
    artifacts. Throughput and wall times are presented for sync IO, only wall
    times are shown for async as the granularity reported by dd and the
    variability is unsuitable for comparison. As async results were variable
    due to writeback timings, I'm only reporting the maximum figures. The sync
    results were stable enough to make the mean and stddev uninteresting.

    The performance results are reported based on a run with no profiling.
    Profile data is based on a separate run with oprofile running.

    async dd
                                  3.15.0-rc3          3.15.0-rc3
                                     vanilla         accessed-v2
    ext3    Max elapsed    13.9900 (  0.00%)   11.5900 ( 17.16%)
    tmpfs   Max elapsed     0.5100 (  0.00%)    0.4900 (  3.92%)
    btrfs   Max elapsed    12.8100 (  0.00%)   12.7800 (  0.23%)
    ext4    Max elapsed    18.6000 (  0.00%)   13.3400 ( 28.28%)
    xfs     Max elapsed    12.5600 (  0.00%)    2.0900 ( 83.36%)

    The XFS figure is a bit strange as it managed to avoid a worst case by
    sheer luck but the average figures looked reasonable.

           samples  percentage
    ext3     86107      0.9783  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    ext3     23833      0.2710  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    ext3      5036      0.0573  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    ext4     64566      0.8961  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    ext4      5322      0.0713  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    ext4      2869      0.0384  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    xfs      62126      1.7675  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    xfs       1904      0.0554  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    xfs        103      0.0030  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    btrfs    10655      0.1338  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    btrfs     2020      0.0273  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    btrfs      587      0.0079  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
    tmpfs    59562      3.2628  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
    tmpfs     1210      0.0696  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
    tmpfs       94      0.0054  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed

    [akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
    Signed-off-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Theodore Ts'o
    Cc: "Paul E. McKenney"
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Tested-by: Prabhakar Lad
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

01 Apr, 2014

1 commit

  • Now that rgrps use the address space which is part of the super
    block, we need to update gfs2_mapping2sbd() to take account of
    that. The only way to do that easily is to use a different set
    of address_space_operations for rgrps.

    Reported-by: Abhi Das
    Tested-by: Abhi Das
    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

25 Feb, 2014

1 commit

  • Now we have a master transaction into which other transactions
    are merged, the accounting can be done using this master
    transaction. We no longer require the superblock fields which
    were being used for this function.

    In addition, this allows for a clean up in calc_reserved()
    making it rather easier to understand. Also, by reducing the
    number of variables used to track the buffers being added
    and removed from the journal, a number of error checks are
    now no longer required.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

03 Jan, 2014

1 commit

  • Prior to this patch, GFS2 had one address space for each rgrp,
    stored in the glock. This patch changes them to use a single
    address space in the super block. This therefore saves
    (sizeof(struct address_space) * nr_of_rgrps) bytes of memory
    and for large filesystems, that can be significant.

    It would be nice to be able to do something similar and merge
    the inode metadata address space into the same global
    address space. However, that is rather more complicated as the
    on-disk location doesn't have a 1:1 mapping with the inodes in
    general. So while it could be done, it will be a more complicated
    operation as it requires changing a lot more code paths.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

14 Dec, 2013

1 commit

  • This patch fixes a slab memory leak that sometimes can occur
    for files with a very short lifespan. The problem occurs when
    a dinode is deleted before it has gotten to the journal properly.
    In the leak scenario, the bd object is pinned for journal
    commitment (queued to the metadata buffers queue: sd_log_le_buf)
    but is subsequently unpinned and dequeued before it finds its way
    to the ail or the revoke queue. In this rare circumstance, the bd
    object needs to be freed from slab memory, or it is forgotten.
    We have to be very careful how we do it, though, because
    multiple processes can call gfs2_remove_from_journal. In order to
    avoid double-frees, only the process that does the unpinning is
    allowed to free the bd.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

20 Aug, 2013

1 commit


19 Jun, 2013

1 commit

  • This patch looks at all the outstanding blocks in all the transactions
    on the log, and moves the completed ones to the ail2 list. Then it
    issues revokes for these blocks. This will hopefully speed things up
    in situations where there is a lot of contention for glocks, especially
    if they are acquired serially.

    revoke_lo_before_commit will issue at most one log block's worth of these
    preemptive revokes. The amount of reserved log space that
    gfs2_log_reserve() ignores has been incremented to allow for this extra
    block.

    This patch also consolidates the common revoke instructions into one
    function, gfs2_add_revoke().

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

08 Apr, 2013

1 commit

  • In order to allow transactions and log flushes to happen at the same
    time, gfs2 needs to move the transaction accounting and active items
    list code into the gfs2_trans structure. As a first step toward this,
    this patch removes the gfs2_ail structure, and handles the active items
    list in the gfs2_trans structure. This keeps gfs2 from allocating an ail
    structure on log flushes, and gives us a structure that can later be used
    to store the transaction accounting outside of the gfs2 superblock
    structure.

    With this patch, at the end of a transaction, gfs2 will add the
    gfs2_trans structure to the superblock if there is not one already.
    This structure now has the active items fields that were previously in
    gfs2_ail. This is not necessary in the case where the transaction was
    simply used to add revokes, since these are never written outside of the
    journal, and thus, don't need an active items list.

    Also, in order to make sure that the transaction structure is not
    removed while it's still in use by gfs2_trans_end, unlocking the
    sd_log_flush_lock has to happen slightly later in ending the
    transaction.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

29 Jan, 2013

1 commit

  • The locking in gfs2_attach_bufdata() was type specific (data/meta)
    which made the function rather confusing. This patch moves the core
    of gfs2_attach_bufdata() into trans.c, renaming it gfs2_alloc_bufdata()
    and moving the locking into gfs2_trans_add_data()/gfs2_trans_add_meta().

    As a result all of the locking related to adding data and metadata to
    the journal is now in these two functions. This should help to clarify
    what is going on, and give us some opportunities to simplify in
    some cases.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

04 Aug, 2012

1 commit


28 Jun, 2012

1 commit

  • This patch fixes a buffer_head double free in the following code path:

    gfs2_block_map
    => gfs2_meta_inode_buffer
    => gfs2_meta_indirect_buffer
    => gfs2_meta_read
    => release_metapath

    gfs2_block_map calls gfs2_meta_inode_buffer with &mp.mp_bh[0]
    as an argument. mp.mp_bh are filled with zero at the beginning
    of gfs2_block_map.

    If gfs2_meta_inode_buffer returns a non-zero value, gfs2_block_map
    calls release_metapath to free the buffers chained to mp.mp_bh.
    release_metapath checks each slot mp.mp_bh[i] and frees it (with
    brelse) unless the slot is NULL.

    The slot &mp.mp_bh[0] passed to gfs2_meta_inode_buffer is filled in by
    gfs2_meta_read. gfs2_meta_read stores a buffer allocated with
    gfs2_getbuf there even if EIO occurs. When EIO occurs, the allocated
    buffer is brelse'ed, but the now-stale pointer to it is still passed
    back to the caller via the bhp argument.

    gfs2_meta_indirect_buffer, the caller, also passes the stale pointer
    on to its caller along with EIO. Finally, gfs2_block_map gets both EIO
    and &mp.mp_bh[0] filled with the stale pointer, and release_metapath
    calls brelse again on that pointer.

    Signed-off-by: Masatake YAMATO
    Signed-off-by: Steven Whitehouse

    Masatake YAMATO
     

11 May, 2012

1 commit


02 May, 2012

1 commit

  • This patch eliminates the gfs2_log_element data structure and
    rolls its two components into the gfs2_bufdata. This makes the code
    easier to understand and makes it easier to migrate to an rbtree
    to keep the list sorted.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

24 Apr, 2012

1 commit

  • This is another clean up in the logging code. This per-transaction
    list was largely unused. Its main function was to ensure that the
    number of buffers in a transaction was correct, however that counter
    was only used to check the number of buffers in the bd_list_tr, plus
    an assert at the end of each transaction. With the assert now changed
    to use the calculated buffer counts, we can remove both bd_list_tr and
    its associated counter.

    This should make the code easier to understand as well as shrinking
    a couple of structures.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

08 Nov, 2011

1 commit

  • Christoph has split up REQ_PRIO from REQ_META. That means that
    we can drop REQ_PRIO from places where it is not needed. I'm
    not at all sure that the combination WRITE_FLUSH_FUA | REQ_PRIO
    makes any kind of sense, anyway.

    In addition, I've added REQ_META to one place in the code where
    it was missing. REQ_PRIO has been left for read/writes triggered
    by glock acquisition and writeback only. We can adjust it again
    if required, but these are the most important points from a
    performance perspective.

    Signed-off-by: Steven Whitehouse
    Cc: Christoph Hellwig

    Steven Whitehouse
     

23 Aug, 2011

1 commit

  • Add a new REQ_PRIO to let requests preempt others in the cfq I/O scheduler,
    and leave REQ_META purely for marking requests as metadata in blktrace.

    All existing callers of REQ_META except for XFS are updated to also
    set REQ_PRIO for now.
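
    The split into two independent flags can be pictured with a small
    userspace model. The MODEL_ bit values and the two helper functions are
    hypothetical; they only show that tracing and scheduling each look at
    their own bit.

```c
/* Placeholder bit values, not the kernel's definitions. */
#define MODEL_REQ_META (1u << 0)   /* "this is metadata", for tracing */
#define MODEL_REQ_PRIO (1u << 1)   /* "boost this request", for scheduling */

/* What a tracer would look at: only the metadata marker. */
static int traced_as_metadata(unsigned int flags)
{
	return (flags & MODEL_REQ_META) != 0;
}

/* What a scheduler would look at: only the priority marker. */
static int may_preempt(unsigned int flags)
{
	return (flags & MODEL_REQ_PRIO) != 0;
}

/* Callers that previously relied on one flag for both meanings set both. */
static unsigned int legacy_meta_flags(void)
{
	return MODEL_REQ_META | MODEL_REQ_PRIO;
}
```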

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Namhyung Kim
    Signed-off-by: Jens Axboe

    Christoph Hellwig