25 Sep, 2020

2 commits

  • Drivers shouldn't really mess with the readahead size, as that is a VM
    concept. Instead, set it based on the optimal I/O size by lifting the
    algorithm from the md driver when registering the disk. Also set
    bdi->io_pages there by applying the same scheme based on max_sectors.
    To ensure the limits work well for stacking drivers, a new helper is
    added to update the readahead limits from the block limits; it is
    also called from disk_stack_limits.
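
    A minimal sketch of such a helper, assuming the names used here
    (function and field names are approximate, not necessarily the final
    API):

    void blk_queue_update_readahead(struct request_queue *q)
    {
            /* read ahead at least twice the optimal I/O size */
            q->backing_dev_info->ra_pages =
                    max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
            /* cap a single readahead I/O by the hardware limit */
            q->backing_dev_info->io_pages =
                    queue_max_sectors(q) >> (PAGE_SHIFT - 9);
    }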

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Jan Kara
    Reviewed-by: Mike Snitzer
    Reviewed-by: Martin K. Petersen
    Acked-by: Coly Li
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • aoe forces a larger readahead size, but the reasons for doing larger
    I/O are not limited to readahead. Also set the optimal I/O size, and
    remove the local constants in favor of just using SZ_2G.
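
    In code this amounts to roughly the following (a sketch; d->blkq is
    assumed to be the driver's request queue):

    blk_queue_io_opt(d->blkq, SZ_2G);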

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

02 Sep, 2020

1 commit

  • Two different callers use two different mutexes for updating the
    block device size, which obviously doesn't help to actually protect
    against concurrent updates from the different callers. In addition,
    one of the locks, bd_mutex, is rather prone to deadlocks with other
    parts of the block stack that use it for high-level synchronization.

    Switch to using a new spinlock protecting just the size updates, as
    that is all we need, and make sure everyone does the update through
    the proper helper.

    This fixes a bug reported with nvme revalidating disks during a
    hot-removal operation, which can currently deadlock on bd_mutex.
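
    A sketch of the helper-based scheme, assuming a bd_size_lock
    spinlock in struct block_device (names approximate):

    static void bd_set_nr_sectors(struct block_device *bdev, sector_t sectors)
    {
            spin_lock(&bdev->bd_size_lock);
            i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
            spin_unlock(&bdev->bd_size_lock);
    }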

    Reported-by: Xianting Tian
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants
    with the new pseudo-keyword macro fallthrough [1]. Also, remove
    fall-through markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
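
    For illustration, a typical conversion looks like this (the switch
    statement and its names are made up):

    switch (cmd) {
    case CMD_PREPARE:
            setup();
            fallthrough;    /* previously a "fall through" comment */
    case CMD_RUN:
            run();
            break;
    default:
            return -EINVAL;
    }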

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

03 Jan, 2020

1 commit

  • These drivers implement the HDIO_GET_IDENTITY and CDROMVOLREAD ioctl
    commands, which are compatible between 32-bit and 64-bit user space and
    traditionally handled by compat_blkdev_driver_ioctl().

    As a prerequisite to removing that function, make both drivers use
    blkdev_compat_ptr_ioctl() as their .compat_ioctl callback.
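
    The change boils down to wiring up the helper in the fops (a
    sketch; the aoe operation names are assumed):

    static const struct block_device_operations aoe_bdops = {
            .open           = aoeblk_open,
            .ioctl          = aoeblk_ioctl,
            .compat_ioctl   = blkdev_compat_ptr_ioctl,
            .owner          = THIS_MODULE,
    };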

    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

08 Aug, 2019

1 commit

  • Since commit 3582dd291788 ("aoe: convert aoeblk to blk-mq"),
    aoedev_downdev can sleep, which makes the following crash possible.

    BUG: scheduling while atomic: rmmod/2242/0x00000003
    Modules linked in: aoe
    Preemption disabled at:
    [] flush+0x95/0x4a0 [aoe]
    CPU: 7 PID: 2242 Comm: rmmod Tainted: G I 5.2.3 #1
    Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009
    Call Trace:
    dump_stack+0x4f/0x6a
    ? flush+0x95/0x4a0 [aoe]
    __schedule_bug.cold+0x44/0x54
    __schedule+0x44f/0x680
    schedule+0x44/0xd0
    blk_mq_freeze_queue_wait+0x46/0xb0
    ? wait_woken+0x80/0x80
    blk_mq_freeze_queue+0x1b/0x20
    aoedev_downdev+0x111/0x160 [aoe]
    flush+0xff/0x4a0 [aoe]
    aoedev_exit+0x23/0x30 [aoe]
    aoe_exit+0x35/0x948 [aoe]
    __se_sys_delete_module+0x183/0x210
    __x64_sys_delete_module+0x16/0x20
    do_syscall_64+0x4d/0x130
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f24e0043b07
    Code: 73 01 c3 48 8b 0d 89 73 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f
    1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 3d 01 f0 ff
    ff 73 01 c3 48 8b 0d 59 73 0b 00 f7 d8 64 89 01 48
    RSP: 002b:00007ffe18f7f1e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f24e0043b07
    RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555c3ecf87c8
    RBP: 00007ffe18f7f1f0 R08: 0000000000000000 R09: 0000000000000000
    R10: 00007f24e00b4ac0 R11: 0000000000000206 R12: 00007ffe18f7f238
    R13: 00007ffe18f7f410 R14: 00007ffe18f80e73 R15: 0000555c3ecf8760

    This patch, handling pass one the same way as pass two, unlocks the
    locks and restarts pass one after aoedev_downdev is done.
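
    Roughly, the fixed pass looks like this (a heavily simplified
    sketch; the skip condition is hypothetical):

    restart:
            spin_lock_irqsave(&devlist_lock, flags);
            for (d = devlist; d; d = d->next) {
                    if (!device_needs_flush(d))     /* hypothetical check */
                            continue;
                    spin_unlock_irqrestore(&devlist_lock, flags);
                    aoedev_downdev(d);              /* may sleep: locks dropped */
                    goto restart;                   /* redo pass one */
            }
            spin_unlock_irqrestore(&devlist_lock, flags);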

    Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
    Signed-off-by: He Zhe
    Signed-off-by: Jens Axboe

    He Zhe
     

15 Oct, 2018

1 commit

  • Straightforward conversion: instead of rewriting the internal buffer
    retrieval logic, just replace the previous elevator peeking with an
    internal list of requests.
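
    A sketch of the idea, assuming a driver-local rq_list protected by
    the device lock (names approximate):

    /* queue_rq: stash incoming requests on an internal list */
    spin_lock_irq(&d->lock);
    list_add_tail(&rq->queuelist, &d->rq_list);
    spin_unlock_irq(&d->lock);

    /* buffer retrieval: pop the head of that list instead of peeking
       the elevator */
    rq = list_first_entry_or_null(&d->rq_list, struct request, queuelist);
    if (rq)
            list_del_init(&rq->queuelist);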

    Reviewed-by: "Ed L. Cashin"
    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 May, 2018

1 commit

  • Convert the S_ symbolic permissions to their octal equivalents, as
    many consider octal permissions more readable than the symbolic
    ones.

    see: https://lkml.org/lkml/2016/8/2/1945

    Done with automated conversion via:
    $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace

    Miscellanea:

    o Wrapped modified multi-line calls to a single line where appropriate
    o Realign modified multi-line calls to open parenthesis
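
    A representative before/after (the module parameter is hypothetical):

    -module_param(aoe_deadsecs, int, S_IRUGO | S_IWUSR);
    +module_param(aoe_deadsecs, int, 0644);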

    Signed-off-by: Joe Perches
    Signed-off-by: Jens Axboe

    Joe Perches
     

17 Jan, 2018

1 commit

  • 'struct frame' uses two variables to store the sent timestamp -
    'struct timeval' and jiffies. jiffies is used to avoid discrepancies
    caused by updates to system time. 'struct timeval' is deprecated
    because it uses a 32-bit representation for seconds, which will
    overflow in the year 2038.

    This patch does the following:
    - Replace the use of 'struct timeval' and jiffies with ktime_t,
    which is the recommended type for timestamping.
    - ktime_t provides both long range (like jiffies) and high
    resolution (like timeval). Using ktime_get (monotonic time) instead
    of wall-clock time prevents any discrepancies caused by updates to
    system time.

    [updates by Arnd below]
    The original patch from Tina never went anywhere as we discussed how
    to keep the impact on performance minimal. I've started over now but
    arrived at basically the same patch that she had originally, except
    for a slightly improved tsince_hr() function. I'm making it more
    robust against overflows, and also optimizing explicitly for the
    common case in which a frame is less than 4.2 seconds old, using
    only a 32-bit division in that case.

    This should make the new version more efficient than the old code,
    since we replace the existing two 32-bit divisions in
    do_gettimeofday() plus one multiplication with a single 32-bit
    division in tsince_hr() and drop the double bookkeeping. It's also
    more efficient than the ktime_get_us() API we discussed before,
    since that would also rely on multiple divisions.
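
    A sketch of the resulting helper (details approximate):

    static int tsince_hr(struct frame *f)
    {
            u64 delta = ktime_to_ns(ktime_sub(ktime_get(), f->sent));

            /* common case: under ~4.2 s, a 32-bit division suffices */
            if (delta <= UINT_MAX)
                    return (u32)delta / NSEC_PER_USEC;

            /* avoid overflowing the int return value */
            if (delta > (u64)INT_MAX * NSEC_PER_USEC)
                    return INT_MAX;

            return div_u64(delta, NSEC_PER_USEC);
    }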

    Link: https://lists.linaro.org/pipermail/y2038/2015-May/000276.html
    Signed-off-by: Tina Ruchandani
    Cc: Ed Cashin
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Tina Ruchandani
     

22 Nov, 2017

1 commit

  • With all callbacks converted, and the timer callback prototype
    switched over, the TIMER_FUNC_TYPE cast is no longer needed,
    so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
    $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
    $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

    The now unused macros are also dropped from include/linux/timer.h.

    Signed-off-by: Kees Cook

    Kees Cook
     

15 Nov, 2017

1 commit

  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.
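
    For aoe this takes the usual form (a sketch; the callback name
    follows the driver's existing retransmit timer):

    static void rexmit_timer(struct timer_list *t)
    {
            struct aoedev *d = from_timer(d, t, timer);

            /* d is recovered from the timer itself, not from an
             * unsigned long data argument */
    }

    /* at setup time: */
    timer_setup(&d->timer, rexmit_timer, 0);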

    Cc: Jens Axboe
    Cc: "Ed L. Cashin"
    Cc: linux-block@vger.kernel.org
    Cc: Thomas Gleixner
    Signed-off-by: Kees Cook
    Signed-off-by: Jens Axboe

    Kees Cook
     

07 Nov, 2017

1 commit

  • In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.

    This refactors the discover_timer to remove the needless locking and
    state machine used for synchronizing timer death. Using del_timer_sync()
    will already do the right thing.

    Cc: Jens Axboe
    Cc: "Ed L. Cashin"
    Cc: Thomas Gleixner
    Signed-off-by: Kees Cook

    Kees Cook
     

09 Jun, 2017

2 commits

  • Replace bi_error with a new bi_status to allow for a clear
    conversion. Note that device mapper overloaded bi_error with a
    private value, which we'll have to keep around at least for now and
    thus propagate to a proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Currently we use normal Linux errno values in the block layer, and
    while we accept any error a few have overloaded magic meanings. This
    patch instead introduces a new blk_status_t value that holds block
    layer specific status codes and explicitly explains their meaning.
    Helpers to convert from and to the previous special meanings are
    provided for now, but I suspect we want to get rid of them in the
    long run - those drivers that have an errno input (e.g. networking)
    usually get errnos that don't know about the special block layer
    overloads, and similarly returning them to userspace will usually
    return something that strictly speaking isn't correct for file
    system operations, but that's left as an exercise for later.

    For now the set of errors is a very limited set that closely
    corresponds to the previous overloaded errno values, but there is
    some low-hanging fruit to improve it.

    blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
    typechecking, so that we can easily catch places passing the wrong values.
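
    The shape of the new type and its conversion helpers (the values
    shown here are illustrative):

    typedef u8 __bitwise blk_status_t;

    #define BLK_STS_OK      0
    #define BLK_STS_NOTSUPP ((__force blk_status_t)1)
    #define BLK_STS_IOERR   ((__force blk_status_t)10)

    blk_status_t errno_to_blk_status(int errno);
    int blk_status_to_errno(blk_status_t status);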

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

02 Feb, 2017

1 commit

  • We will want to have struct backing_dev_info allocated separately
    from struct request_queue. As the first step, add a pointer to
    backing_dev_info to request_queue and convert all users touching it.
    No functional changes in this patch.
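
    Call sites change mechanically, along these lines:

    -       q->backing_dev_info.ra_pages = ra_pages;
    +       q->backing_dev_info->ra_pages = ra_pages;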

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

12 Nov, 2016

1 commit

  • aoeblk contains some mysterious code that wants to elevate the bio
    vec page counts while it's under IO. That is not needed, it's
    fragile, and it's causing kernel oopses for some.

    Reported-by: Don Koch
    Tested-by: Don Koch
    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Jun, 2016

1 commit

  • This is the third version of the patchset previously sent [1]. I
    have basically only rebased it on top of the 4.7-rc1 tree and
    dropped "dm: get rid of superfluous gfp flags", which went through
    the dm tree. I am sending it now because it is tree-wide and the
    chances for conflicts are reduced considerably when we target rc2. I
    plan to send the next step, renaming the flag and moving to a better
    semantic, later during this release cycle, so we will hopefully have
    the new semantic ready for the 4.8 merge window.

    Motivation:

    While working on something unrelated I've checked the current usage
    of __GFP_REPEAT in the tree. It seems that a majority of the usage
    is and always has been bogus, because __GFP_REPEAT has always been
    about costly high order allocations while we are using it for
    order-0 or very small orders very often. It seems that a big pile of
    them is just copy&paste from when code has been adopted from one
    arch to another.

    I think it makes some sense to get rid of them because they are just
    making the semantic more unclear. Please note that __GFP_REPEAT is
    documented as

    * __GFP_REPEAT: Try hard to allocate the memory, but the allocation
    * attempt _might_ fail. This depends upon the particular VM
    * implementation.

    while !costly requests have basically nofail semantics. So one could
    reasonably expect that an order-0 request with __GFP_REPEAT will not
    loop forever. This is not implemented right now though.

    I would like to move on with __GFP_REPEAT and define a better semantic
    for it.

    $ git grep __GFP_REPEAT origin/master | wc -l
    111
    $ git grep __GFP_REPEAT | wc -l
    36

    So we are down to about a third after this patch series. The
    remaining places really seem to rely on __GFP_REPEAT due to large
    allocation requests. This still needs some double checking, which I
    will do later after all the simple ones are sorted out.

    I am touching a lot of arch-specific code here and I hope I got it
    right, but as a matter of fact I didn't even compile-test some
    archs, as I do not have cross compilers for them. The patches should
    be quite trivial to review for stupid compile mistakes, though. The
    tricky parts are usually hidden by macro definitions, and that's
    where I would appreciate help from the arch maintainers.

    [1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org

    This patch (of 19):

    __GFP_REPEAT has a rather weak semantic, but since it was introduced
    around 2.6.12 it has been ignored for low order allocations. Yet the
    full kernel tree is littered with its usage for apparently order-0
    allocations. This is really confusing, because __GFP_REPEAT is
    explicitly documented to allow allocation failures, which is a
    weaker semantic than the current order-0 has (basically nofail).

    Let's simply drop __GFP_REPEAT from those places. This will allow us
    to identify the places which really need the allocator to retry
    harder and to formulate a more specific semantic for what the flag
    is actually supposed to do.
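
    The conversions are all of this simple form (the call site is
    hypothetical):

    -       pgd = kzalloc(size, GFP_KERNEL | __GFP_REPEAT);
    +       pgd = kzalloc(size, GFP_KERNEL);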

    Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: "David S. Miller"
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Theodore Ts'o"
    Cc: Andy Lutomirski
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Chen Liqin
    Cc: Chris Metcalf [for tile]
    Cc: Guan Xuetao
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: John Crispin
    Cc: Lennox Wu
    Cc: Ley Foon Tan
    Cc: Martin Schwidefsky
    Cc: Matt Fleming
    Cc: Ralf Baechle
    Cc: Rich Felker
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

20 May, 2016

1 commit

  • Many developers already know that the reference count field of
    struct page is _count and of atomic type. They may try to handle it
    directly, and this could break the purpose of the page reference
    count tracepoints. To prevent direct _count modification, this patch
    renames it to _refcount and adds a warning message to the code.
    After that, developers who need to handle the reference count will
    find that the field should not be accessed directly.
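
    Conceptually, the change to struct page is just the rename
    (simplified; access now goes through the page_ref_*() helpers):

    -       atomic_t _count;
    +       atomic_t _refcount;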

    [akpm@linux-foundation.org: fix comments, per Vlastimil]
    [akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
    [sfr@canb.auug.org.au: sync ethernet driver changes]
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Stephen Rothwell
    Cc: Vlastimil Babka
    Cc: Hugh Dickins
    Cc: Johannes Berg
    Cc: "David S. Miller"
    Cc: Sunil Goutham
    Cc: Chris Metcalf
    Cc: Manish Chopra
    Cc: Yuval Mintz
    Cc: Tariq Toukan
    Cc: Saeed Mahameed
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

05 Apr, 2016

1 commit

  • The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a
    *long* time ago with the promise that one day it would be possible
    to implement the page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constants should be used in a particular
    case, especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
    much breakage to be doable.

    Let's stop pretending that pages in the page cache are special. They
    are not.

    The changes are pretty straightforward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle
    using the script below. For some reason, coccinelle doesn't patch
    header files. I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to
    the PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach.
    I'll fix them manually in a separate patch. Comments and
    documentation will also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

18 Mar, 2016

1 commit

  • The success of CMA allocation largely depends on the success of
    migration, and a key factor in that is the page reference count.
    Until now, the page reference count has been manipulated by directly
    calling atomic functions, so we cannot track who manipulates it and
    where. That makes it hard to find the actual reason for a CMA
    allocation failure. CMA allocation should be guaranteed to succeed,
    so finding the offending place is really important.

    In this patch, call sites where the page reference count is
    manipulated are converted to the newly introduced wrapper functions.
    This is a preparation step for adding a tracepoint to each page
    reference manipulation function. With this facility, we can easily
    find the reason for a CMA allocation failure. There is no functional
    change in this patch.

    In addition, this patch also converts reference read sites. It will
    help a second step that renames page._count to something else and
    prevents later attempts to access it directly (suggested by Andrew).
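
    The conversions look like this (illustrative call sites):

    -       atomic_inc(&page->_count);
    +       page_ref_inc(page);

    -       if (atomic_read(&page->_count) == 1)
    +       if (page_ref_count(page) == 1)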

    Signed-off-by: Joonsoo Kim
    Acked-by: Michal Nazarewicz
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: "Kirill A. Shutemov"
    Cc: Sergey Senozhatsky
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

23 Jan, 2016

1 commit

  • Add wrappers parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become an rwsem, with ->lookup() done with it held
    only shared.
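
    The wrappers are trivial, e.g.:

    static inline void inode_lock(struct inode *inode)
    {
            mutex_lock(&inode->i_mutex);
    }

    static inline void inode_unlock(struct inode *inode)
    {
            mutex_unlock(&inode->i_mutex);
    }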

    Signed-off-by: Al Viro

    Al Viro
     

19 Aug, 2015

1 commit

  • This reverts commit 34b48db66e08ca1c1bc07cf305d672ac940268dc.
    That commit caused performance regressions for streaming I/O
    workloads on a number of different storage devices, from
    SATA disks to external RAID arrays. It also managed to
    trip up some buggy firmware in at least one drive, causing
    data corruption.

    The next patch will bump the default max_sectors_kb value to
    1280, which will accommodate a 10-data-disk stripe write
    with chunk size 128k. In the testing I've done using iozone,
    fio, and aio-stress, a value of 1280 does not show a big
    performance difference from 512. This will hopefully still
    help the software RAID setup that Christoph saw the original
    performance gains with while still not regressing other
    storage configurations.

    Signed-off-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single
    possible error (-EIO), and the second one has the drawback of not
    being persistent when bios are queued up, and of not being passed
    along from child to parent bio in the ever more popular chaining
    scenario. Having both mechanisms available has the additional
    drawback of utterly confusing driver authors and introducing bugs
    where various I/O submitters only deal with one of them, and the
    others have to add boilerplate code to deal with both kinds of error
    returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.
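
    Completion paths then become, roughly:

    -       bio_endio(bio, -EIO);
    +       bio->bi_error = -EIO;
    +       bio_endio(bio);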

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

22 Oct, 2014

1 commit

  • Set max_sectors to the value the driver provides as the hardware
    limit by default. Linux has had proper I/O throttling for a long
    time and doesn't rely on an artificially small maximum I/O size
    anymore. By not limiting the I/O size by default we remove an
    annoying tuning step required for most Linux installations.

    Note that both the user, and if absolutely required the driver, can
    still impose a limit for FS requests below max_hw_sectors_kb.
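
    In blk_limits_max_hw_sectors() this is roughly:

    -       limits->max_sectors = min_t(unsigned int, max_hw_sectors,
    -                                   BLK_DEF_MAX_SECTORS);
    +       limits->max_sectors = max_hw_sectors;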

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

04 Mar, 2014

1 commit

  • Commit bf6bddf1924e ("mm: introduce compaction and migration for
    ballooned pages") introduces page_count(page) into memory compaction
    which dereferences page->first_page if PageTail(page).

    This results in a very rare NULL pointer dereference in the
    aforementioned page_count(page). Indeed, anything that does
    compound_head(), including page_count(), is susceptible to racing
    with prep_compound_page() and seeing a NULL or dangling
    page->first_page pointer.

    This patch uses Andrea's implementation of compound_trans_head() that
    deals with such a race and makes it the default compound_head()
    implementation. This includes a read memory barrier that ensures that
    if PageTail(head) is true that we return a head page that is neither
    NULL nor dangling. The patch then adds a store memory barrier to
    prep_compound_page() to ensure page->first_page is set.

    This is the safest way to ensure we see the head page that we are
    expecting; PageTail(page) is already in the unlikely() path and the
    memory barriers are unfortunately required.

    Hugetlbfs is the exception: we don't enforce a store memory barrier
    during init since no race is possible there.
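
    The resulting default compound_head() looks roughly like this:

    static inline struct page *compound_head(struct page *page)
    {
            if (unlikely(PageTail(page))) {
                    struct page *head = page->first_page;

                    /*
                     * head may be a dangling pointer: recheck PageTail
                     * after the read barrier before trusting it.
                     */
                    smp_rmb();
                    if (likely(PageTail(page)))
                            return head;
            }
            return page;
    }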

    Signed-off-by: David Rientjes
    Cc: Holger Kiehl
    Cc: Christoph Lameter
    Cc: Rafael Aquini
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: "Kirill A. Shutemov"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

24 Nov, 2013

4 commits

  • Now that we've got a mechanism for immutable biovecs -
    bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
    respect it instead of using the bvec array directly.

    The aoe code no longer has to manually iterate over partial bvecs, so
    some struct members go away - other struct members are effectively
    renamed:

    buf->resid -> buf->iter.bi_size
    buf->sector -> buf->iter.bi_sector

    f->bcnt -> f->iter.bi_size
    f->lba -> f->iter.bi_sector
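
    With the iterator in place, residual bookkeeping becomes iterator
    advancement, e.g. (a sketch):

    /* consume n bytes of the remaining I/O */
    bio_advance_iter(buf->bio, &buf->iter, n);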

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: "Ed L. Cashin"

    Kent Overstreet
     
  • More prep work for immutable biovecs - with immutable bvecs, drivers
    won't be able to use the biovec directly; they'll need to use
    helpers that take into account bio->bi_iter.bi_bvec_done.

    This updates callers for the new usage without changing the
    implementation yet.
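
    Typical usage of the iterator-aware helpers (process() stands in for
    driver-specific handling):

    struct bio_vec bv;
    struct bvec_iter iter;

    bio_for_each_segment(bv, bio, iter)
            /* bv is a struct bio_vec by value, correct even under
             * partial completion */
            process(page_address(bv.bv_page) + bv.bv_offset, bv.bv_len);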

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Paul Clements
    Cc: Jim Paris
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Nagalakshmi Nandigama
    Cc: Sreekanth Reddy
    Cc: support@lsi.com
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: Alexander Viro
    Cc: Steven Whitehouse
    Cc: Herton Ronaldo Krzesinski
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Matthew Wilcox
    Cc: Keith Busch
    Cc: Stephen Hemminger
    Cc: Quoc-Son Anh
    Cc: Sebastian Ott
    Cc: Nitin Gupta
    Cc: Minchan Kim
    Cc: Jerome Marchand
    Cc: Seth Jennings
    Cc: "Martin K. Petersen"
    Cc: Mike Snitzer
    Cc: Vivek Goyal
    Cc: "Darrick J. Wong"
    Cc: Chris Metcalf
    Cc: Jan Kara
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: drbd-user@lists.linbit.com
    Cc: nbd-general@lists.sourceforge.net
    Cc: cbe-oss-dev@lists.ozlabs.org
    Cc: xen-devel@lists.xensource.com
    Cc: virtualization@lists.linux-foundation.org
    Cc: linux-raid@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: DL-MPTFusionLinux@lsi.com
    Cc: linux-scsi@vger.kernel.org
    Cc: devel@driverdev.osuosl.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: cluster-devel@redhat.com
    Cc: linux-mm@kvack.org
    Acked-by: Geoff Levand

    Kent Overstreet
     
  • For immutable biovecs, we'll be introducing a new bio_iovec() that uses
    our new bvec iterator to construct a biovec, taking into account
    bvec_iter->bi_bvec_done - this patch updates existing users for the new
    usage.

    Some of the existing users really do need a pointer into the bvec
    array - those uses are all going to be removed, but we'll need the
    functionality from immutable biovecs to remove them - so for now
    rename the existing bio_iovec() to __bio_iovec(); it will be removed
    in a couple of patches.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: "Ed L. Cashin"
    Cc: Alasdair Kergon
    Cc: dm-devel@redhat.com
    Cc: "James E.J. Bottomley"

    Kent Overstreet
     
  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.
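
    After the rename, per-bio fields live under bio->bi_iter, e.g.:

    -       sector = bio->bi_sector;
    -       size = bio->bi_size;
    +       sector = bio->bi_iter.bi_sector;
    +       size = bio->bi_iter.bi_size;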

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet