28 Oct, 2016

1 commit

  • Now that we don't need the common flags to overflow outside the range
    of a 32-bit type we can encode them the same way for both the bio and
    request fields. This in addition allows us to place the operation
    first (and make some room for more ops while we're at it) and to
    stop having to shift around the operation values.

    In addition this allows passing around only one value in the block layer
    instead of two (and eventuall also in the file systems, but we can do
    that later) and thus clean up a lot of code.

    Last but not least this allows decreasing the size of the cmd_flags
    field in struct request to 32-bits. Various functions passing this
    value could also be updated, but I'd like to avoid the churn for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

27 Jul, 2016

1 commit

  • Pull core block updates from Jens Axboe:

    - the big change is the cleanup from Mike Christie, cleaning up our
    uses of command types and modified flags. This is what will throw
    some merge conflicts

    - regression fix for the above for btrfs, from Vincent

    - following up to the above, better packing of struct request from
    Christoph

    - a 2038 fix for blktrace from Arnd

    - a few trivial/spelling fixes from Bart Van Assche

    - a front merge check fix from Damien, which could cause issues on
    SMR drives

    - Atari partition fix from Gabriel

    - convert cfq to highres timers, since jiffies isn't granular enough
    for some devices these days. From Jan and Jeff

    - CFQ priority boost fix idle classes, from me

    - cleanup series from Ming, improving our bio/bvec iteration

    - a direct issue fix for blk-mq from Omar

    - fix for plug merging not involving the IO scheduler, like we do for
    other types of merges. From Tahsin

    - expose DAX type internally and through sysfs. From Toshi and Yigal

    * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
    block: Fix front merge check
    block: do not merge requests without consulting with io scheduler
    block: Fix spelling in a source code comment
    block: expose QUEUE_FLAG_DAX in sysfs
    block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
    Btrfs: fix comparison in __btrfs_map_block()
    block: atari: Return early for unsupported sector size
    Doc: block: Fix a typo in queue-sysfs.txt
    cfq-iosched: Charge at least 1 jiffie instead of 1 ns
    cfq-iosched: Fix regression in bonnie++ rewrite performance
    cfq-iosched: Convert slice_resid from u64 to s64
    block: Convert fifo_time from ulong to u64
    blktrace: avoid using timespec
    block/blk-cgroup.c: Declare local symbols static
    block/bio-integrity.c: Add #include "blk.h"
    block/partition-generic.c: Remove a set-but-not-used variable
    block: bio: kill BIO_MAX_SIZE
    cfq-iosched: temporarily boost queue priority for idle classes
    block: drbd: avoid to use BIO_MAX_SIZE
    block: bio: remove BIO_MAX_SECTORS
    ...

    Linus Torvalds
     

22 Jul, 2016

1 commit

  • Before this patch, if you used gfs2_jadd to add new journals of a
    size smaller than the existing journals, replaying those new journals
    would withdraw. That's because function gfs2_replay_incr_blk was
    using the number of journal blocks (jd_block) from the superblock's
    journal pointer. In other words, "My journal's max size" rather than
    "the journal we're replaying's size." This patch changes the function
    to use the size of the pertinent journal rather than always using the
    journal we happen to be using.

    Signed-off-by: Bob Peterson

    Bob Peterson
     

08 Jun, 2016

2 commits


12 Sep, 2015

1 commit

  • Pull GFS2 updates from Bob Peterson:
    "Here is a list of patches we've accumulated for GFS2 for the current
    upstream merge window. This time we've only got six patches, many of
    which are very small:

    - three cleanups from Andreas Gruenbacher, including a nice cleanup
    of the sequence file code for the sbstats debugfs file.

    - a patch from Ben Hutchings that changes statistics variables from
    signed to unsigned.

    - two patches from me that increase GFS2's glock scalability by
    switching from a conventional hash table to rhashtable"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    gfs2: A minor "sbstats" cleanup
    gfs2: Fix a typo in a comment
    gfs2: Make statistics unsigned, suitable for use with do_div()
    GFS2: Use resizable hash table for glocks
    GFS2: Move glock superblock pointer to field gl_name
    gfs2: Simplify the seq file code for "sbstats"

    Linus Torvalds
     

04 Sep, 2015

1 commit

  • What uniquely identifies a glock in the glock hash table is not
    gl_name, but gl_name and its superblock pointer. This patch makes
    the gl_name field correspond to a unique glock identifier. That will
    allow us to simplify hashing with a future patch, since the hash
    algorithm can then take the gl_name and hash its components in one
    operation.

    Signed-off-by: Bob Peterson
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Steven Whitehouse

    Bob Peterson
     

14 Aug, 2015

1 commit

  • We can always fill up the bio now, no need to estimate the possible
    size based on queue parameters.

    Acked-by: Steven Whitehouse
    Signed-off-by: Kent Overstreet
    [hch: rebased and wrote a changelog]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not beeing persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

28 Apr, 2014

1 commit


07 Mar, 2014

1 commit

  • If multiple nodes fail and their recovery work runs simultaneously, they
    would use the same unprotected variables in the superblock. For example,
    they would stomp on each other's revoked blocks lists, which resulted
    in file system metadata corruption. This patch moves the necessary
    variables so that each journal has its own separate area for tracking
    its journal replay.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

03 Mar, 2014

1 commit

  • This patch fixes a long standing issue in mapping the journal
    extents. Most journals will consist of only a single extent,
    and although the cache took account of that by merging extents,
    it did not actually map large extents, but instead was doing a
    block by block mapping. Since the journal was only being mapped
    on mount, this was not normally noticeable.

    With the updated code, it is now possible to use the same extent
    mapping system during journal recovery (which will be added in a
    later patch). This will allow checking of the integrity of the
    journal before any reply of the journal content is attempted. For
    this reason the code is moving to bmap.c, since it will be used
    more widely in due course.

    An exercise left for the reader is to compare the new function
    gfs2_map_journal_extents() with gfs2_write_alloc_required()

    Additionally, should there be a failure, the error reporting is
    also updated to show more detail about what went wrong.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

25 Feb, 2014

2 commits

  • Now we have a master transaction into which other transactions
    are merged, the accounting can be done using this master
    transaction. We no longer require the superblock fields which
    were being used for this function.

    In addition, this allows for a clean up in calc_reserved()
    making it rather easier understand. Also, by reducing the
    number of variables used to track the buffers being added
    and removed from the journal, a number of error checks are
    now no longer required.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Over time, we hope to be able to improve the concurrency available
    in the log code. This is one small step towards that, by moving
    the buffer lists from the super block, and into the transaction
    structure, so that each transaction builds its own buffer lists.

    At transaction commit time, the buffer lists are merged into
    the currently accumulating transaction. That transaction then
    is passed into the before and after commit functions at journal
    flush time. Thus there should be no change in overall behaviour
    yet.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

31 Jan, 2014

1 commit

  • Pull core block IO changes from Jens Axboe:
    "The major piece in here is the immutable bio_ve series from Kent, the
    rest is fairly minor. It was supposed to go in last round, but
    various issues pushed it to this release instead. The pull request
    contains:

    - Various smaller blk-mq fixes from different folks. Nothing major
    here, just minor fixes and cleanups.

    - Fix for a memory leak in the error path in the block ioctl code
    from Christian Engelmayer.

    - Header export fix from CaiZhiyong.

    - Finally the immutable biovec changes from Kent Overstreet. This
    enables some nice future work on making arbitrarily sized bios
    possible, and splitting more efficient. Related fixes to immutable
    bio_vecs:

    - dm-cache immutable fixup from Mike Snitzer.
    - btrfs immutable fixup from Muthu Kumar.

    - bio-integrity fix from Nic Bellinger, which is also going to stable"

    * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
    xtensa: fixup simdisk driver to work with immutable bio_vecs
    block/blk-mq-cpu.c: use hotcpu_notifier()
    blk-mq: for_each_* macro correctness
    block: Fix memory leak in rw_copy_check_uvector() handling
    bio-integrity: Fix bio_integrity_verify segment start bug
    block: remove unrelated header files and export symbol
    blk-mq: uses page->list incorrectly
    blk-mq: use __smp_call_function_single directly
    btrfs: fix missing increment of bi_remaining
    Revert "block: Warn and free bio if bi_end_io is not set"
    block: Warn and free bio if bi_end_io is not set
    blk-mq: fix initializing request's start time
    block: blk-mq: don't export blk_mq_free_queue()
    block: blk-mq: make blk_sync_queue support mq
    block: blk-mq: support draining mq queue
    dm cache: increment bi_remaining when bi_end_io is restored
    block: fixup for generic bio chaining
    block: Really silence spurious compiler warnings
    block: Silence spurious compiler warnings
    block: Kill bio_pair_split()
    ...

    Linus Torvalds
     

03 Jan, 2014

2 commits

  • Prior to this patch, GFS2 had one address space for each rgrp,
    stored in the glock. This patch changes them to use a single
    address space in the super block. This therefore saves
    (sizeof(struct address_space) * nr_of_rgrps) bytes of memory
    and for large filesystems, that can be significant.

    It would be nice to be able to do something similar and merge
    the inode metadata address space into the same global
    address space. However, that is rather more complicated as the
    on-disk location doesn't have a 1:1 mapping with the inodes in
    general. So while it could be done, it will be a more complicated
    operation as it requires changing a lot more code paths.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • With the preceding patch, we started accepting block reservations
    smaller than the ideal size, which requires a lot more parsing of the
    bitmaps. To reduce the amount of bitmap searching, this patch
    implements a scheme whereby each rgrp keeps track of the point
    at this multi-block reservations will fail.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

24 Nov, 2013

1 commit

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman 6

    Kent Overstreet
     

20 Aug, 2013

1 commit


19 Jun, 2013

1 commit

  • This patch looks at all the outstanding blocks in all the transactions
    on the log, and moves the completed ones to the ail2 list. Then it
    issues revokes for these blocks. This will hopefully speed things up
    in situations where there is a lot of contention for glocks, especially
    if they are acquired serially.

    revoke_lo_before_commit will issue at most one log block's full of these
    preemptive revokes. The amount of reserved log space that
    gfs2_log_reserve() ignores has been incremented to allow for this extra
    block.

    This patch also consolidates the common revoke instructions into one
    function, gfs2_add_revoke().

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

05 Jun, 2013

2 commits

  • With recent changes to the transactions, it appears that we
    are no longer using the "log ops" for resource groups. Since the
    log commit code processes the array of log ops, eliminating this
    should be marginally better for performance. Therefore this patch
    eliminates it.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This patch simply sort the data and metadata buffer lists by their
    inplace block number. This makes gfs2_log_flush issue the inplace IO
    in sequential order, which will hopefully speed up writing the IO
    out to disk.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

03 Jun, 2013

1 commit

  • This patch sets the log descriptor type according to whether the
    journal commit is for (journaled) data or metadata. This was
    recently broken when the functions to process data and metadata
    log ops were combined.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

24 May, 2013

1 commit


09 May, 2013

1 commit

  • Pull block core updates from Jens Axboe:

    - Major bit is Kents prep work for immutable bio vecs.

    - Stable candidate fix for a scheduling-while-atomic in the queue
    bypass operation.

    - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
    discard bios.

    - Tejuns changes to convert the writeback thread pool to the generic
    workqueue mechanism.

    - Runtime PM framework, SCSI patches exists on top of these in James'
    tree.

    - A few random fixes.

    * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
    relay: move remove_buf_file inside relay_close_buf
    partitions/efi.c: replace useless kzalloc's by kmalloc's
    fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
    block: fix max discard sectors limit
    blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
    Documentation: cfq-iosched: update documentation help for cfq tunables
    writeback: expose the bdi_wq workqueue
    writeback: replace custom worker pool implementation with unbound workqueue
    writeback: remove unused bdi_pending_list
    aoe: Fix unitialized var usage
    bio-integrity: Add explicit field for owner of bip_buf
    block: Add an explicit bio flag for bios that own their bvec
    block: Add bio_alloc_pages()
    block: Convert some code to bio_for_each_segment_all()
    block: Add bio_for_each_segment_all()
    bounce: Refactor __blk_queue_bounce to not use bi_io_vec
    raid1: use bio_copy_data()
    pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
    pktcdvd: use bio_copy_data()
    block: Add bio_copy_data()
    ...

    Linus Torvalds
     

08 Apr, 2013

1 commit

  • In order to allow transactions and log flushes to happen at the same
    time, gfs2 needs to move the transaction accounting and active items
    list code into the gfs2_trans structure. As a first step toward this,
    this patch removes the gfs2_ail structure, and handles the active items
    list in the gfs_trans structure. This keeps gfs2 from allocating an ail
    structure on log flushes, and gives us a struture that can later be used
    to store the transaction accounting outside of the gfs2 superblock
    structure.

    With this patch, at the end of a transaction, gfs2 will add the
    gfs2_trans structure to the superblock if there is not one already.
    This structure now has the active items fields that were previously in
    gfs2_ail. This is not necessary in the case where the transaction was
    simply used to add revokes, since these are never written outside of the
    journal, and thus, don't need an active items list.

    Also, in order to make sure that the transaction structure is not
    removed while it's still in use by gfs2_trans_end, unlocking the
    sd_log_flush_lock has to happen slightly later in ending the
    transaction.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     

24 Mar, 2013

1 commit

  • Just a little convenience macro - main reason to add it now is preparing
    for immutable bio vecs, it'll reduce the size of the patch that puts
    bi_sector/bi_size/bi_idx into a struct bvec_iter.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: Lars Ellenberg
    CC: Jiri Kosina
    CC: Alasdair Kergon
    CC: dm-devel@redhat.com
    CC: Neil Brown
    CC: Martin Schwidefsky
    CC: Heiko Carstens
    CC: linux-s390@vger.kernel.org
    CC: Chris Mason
    CC: Steven Whitehouse
    Acked-by: Steven Whitehouse

    Kent Overstreet
     

29 Jan, 2013

2 commits


07 Nov, 2012

2 commits

  • In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the
    buffer without having the gfs2_log_lock held. It was then assuming it would
    stay attached for the rest of the function. However, without either the log
    lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any
    time. This patch moves the locking before the test. If there isn't a bd
    already attached, gfs2 can safely allocate one and attach it before locking.
    There is no way that the newly allocated bd could be on the ail list,
    and thus no way for __gfs2_ail_flush() to detach it.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • Cleans up two cases where variables were assigned values but then never
    used again.

    Signed-off-by: Andrew Price
    Signed-off-by: Steven Whitehouse

    Andrew Price
     

06 Jun, 2012

1 commit

  • When we read an invalid block from the journal, we should not call
    withdraw, but simply print a message and return an error. It is
    up to the caller to then handle that error. In the case of mount
    that means a failed mount, rather than a withdraw (requiring a
    reboot). In the case of recovering another nodes journal then
    we return an error via the uevent.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

02 May, 2012

1 commit

  • This patch eliminates the gfs2_log_element data structure and
    rolls its two components into the gfs2_bufdata. This makes the code
    easier to understand and makes it easier to migrate to a rbtree
    to keep the list sorted.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     

24 Apr, 2012

5 commits

  • This patch removes a log lock from around atomic operation where
    it is not needed, removes an unused variable, and also changes
    a void pointer used incorrectly to a struct page pointer.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is another clean up in the logging code. This per-transaction
    list was largely unused. Its main function was to ensure that the
    number of buffers in a transaction was correct, however that counter
    was only used to check the number of buffers in the bd_list_tr, plus
    an assert at the end of each transaction. With the assert now changed
    to use the calculated buffer counts, we can remove both bd_list_tr and
    its associated counter.

    This should make the code easier to understand as well as shrinking
    a couple of structures.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • The main part of this patch merges the two functions used to
    write metadata and data buffers to the log. Most of the code
    is common between the two functions, so this provides a nice
    clean up, and makes the code more readable.

    The gfs2_get_log_desc() function is also extended to take two more
    arguments, and thus avoid having to set the length and data1
    fields of this strucuture as a separate operation.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Prior to this patch, we have two ways of sending i/o to the log.
    One of those is used when we need to allocate both the data
    to be written itself and also a buffer head to submit it. This
    is done via sb_getblk and friends. This is used mostly for writing
    log headers.

    The other method is used when writing blocks which have some
    in-place counterpart. This is the case for all the metadata
    blocks which are journalled, and when journaled data is in use,
    for unescaped journalled data blocks.

    This patch replaces both of those two methods, and about half
    a dozen separate i/o submission points with a single i/o
    submission function. We also go direct to bio rather than
    using buffer heads, since this allows us to build i/o
    requests of the maximum size for the block device in
    question. It also reduces the memory required for flushing
    the log, which can be very useful in low memory situations.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • Since we always write the buffer directly after this function
    returns, we might as well merge it into here. This is a clean
    up in preparation for some further updates to the log code
    which are coming soon.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     

22 Mar, 2012

1 commit

  • Pull gfs2 changes from Steven Whitehouse.

    * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
    GFS2: Change truncate page allocation to be GFP_NOFS
    GFS2: call gfs2_write_alloc_required for each chunk
    GFS2: Clean up log flush header writing
    GFS2: Remove a __GFP_NOFAIL allocation
    GFS2: Flush pending glock work when evicting an inode
    GFS2: make sure rgrps are up to date in func gfs2_blk2rgrpd
    GFS2: Eliminate sd_rindex_mutex
    GFS2: Unlock rindex mutex on glock error
    GFS2: Make bd_cmp() static
    GFS2: Sort the ordered write list
    GFS2: FITRIM ioctl support
    GFS2: Move two functions from log.c to lops.c
    GFS2: glock statistics gathering

    Linus Torvalds
     

20 Mar, 2012

1 commit