02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

14 Oct, 2017

1 commit

  • When using FAT on a block device which supports rw_page, we can hit
    BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we
    call clean_buffers() after unlocking the page we've written. Introduce
    a new clean_page_buffers() which cleans all buffers associated with a
    page and call it from within bdev_write_page().

    [akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew]
    Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.org
    Signed-off-by: Matthew Wilcox
    Reported-by: Toshi Kani
    Reported-by: OGAWA Hirofumi
    Tested-by: Toshi Kani
    Acked-by: Johannes Thumshirn
    Cc: Ross Zwisler
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

24 Aug, 2017

1 commit

  • This way we don't need a block_device structure to submit I/O. The
    block_device has different life time rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

04 Jul, 2017

1 commit

  • Pull documentation updates from Jonathan Corbet:
    "There has been a fair amount of activity in the docs tree this time
    around. Highlights include:

    - Conversion of a bunch of security documentation into RST

    - The conversion of the remaining DocBook templates by The Amazing
    Mauro Machine. We can now drop the entire DocBook build chain.

    - The usual collection of fixes and minor updates"

    * tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
    scripts/kernel-doc: handle DECLARE_HASHTABLE
    Documentation: atomic_ops.txt is core-api/atomic_ops.rst
    Docs: clean up some DocBook loose ends
    Make the main documentation title less Geocities
    Docs: Use kernel-figure in vidioc-g-selection.rst
    Docs: fix table problems in ras.rst
    Docs: Fix breakage with Sphinx 1.5 and upper
    Docs: Include the Latex "ifthen" package
    doc/kokr/howto: Only send regression fixes after -rc1
    docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
    doc: Document suitability of IBM Verse for kernel development
    Doc: fix a markup error in coding-style.rst
    docs: driver-api: i2c: remove some outdated information
    Documentation: DMA API: fix a typo in a function name
    Docs: Insert missing space to separate link from text
    doc/ko_KR/memory-barriers: Update control-dependencies example
    Documentation, kbuild: fix typo "minimun" -> "minimum"
    docs: Fix some formatting issues in request-key.rst
    doc: ReSTify keys-trusted-encrypted.txt
    doc: ReSTify keys-request-key.txt
    ...

    Linus Torvalds
     

28 Jun, 2017

1 commit


09 Jun, 2017

1 commit

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep arround at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

16 May, 2017

1 commit

  • Sphinx gets confused when it finds identation without a
    good reason for it and without a preceding blank line:

    ./fs/mpage.c:347: ERROR: Unexpected indentation.
    ./fs/namei.c:4303: ERROR: Unexpected indentation.
    ./fs/fs-writeback.c:2060: ERROR: Unexpected indentation.

    No functional changes.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

28 Feb, 2017

1 commit

  • Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
    branch.

    This patch also fixes multiple checkpatch warnings: WARNING: Prefer
    'unsigned int' to bare use of 'unsigned'

    Thanks to Andrew Morton for suggesting more appropriate function instead
    of macro.

    [geliangtang@gmail.com: truncate: use i_blocksize()]
    Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
    Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Signed-off-by: Geliang Tang
    Cc: Alexander Viro
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     

05 Nov, 2016

1 commit

  • Add a helper function that clears buffer heads from a block device
    aliasing passed bh. Use this helper function from filesystems instead of
    the original unmap_underlying_metadata() to save some boiler plate code
    and also have a better name for the functionalily since it is not
    unmapping anything for a *long* time.

    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

03 Nov, 2016

1 commit

  • Add wbc_to_write_flags(), which returns the write modifier flags to use,
    based on a struct writeback_control. No functional changes in this
    patch, but it prepares us for factoring other wbc fields for write type.

    Signed-off-by: Jens Axboe
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig

    Jens Axboe
     

01 Nov, 2016

1 commit


08 Aug, 2016

1 commit

  • Commit abf545484d31 changed it from an 'rw' flags type to the
    newer ops based interface, but now we're effectively leaking
    some bdev internals to the rest of the kernel. Since we only
    care about whether it's a read or a write at that level, just
    pass in a bool 'is_write' parameter instead.

    Then we can also move op_is_write() and friends back under
    CONFIG_BLOCK protection.

    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Aug, 2016

1 commit

  • The rw_page users were not converted to use bio/req ops. As a result
    bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
    be sent down as reads.

    Signed-off-by: Mike Christie
    Fixes: 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code")

    Modified by me to:

    1) Drop op_flags passing into ->rw_page(), as we don't use it.
    2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK

    Signed-off-by: Jens Axboe

    Mike Christie
     

27 Jul, 2016

2 commits

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2

    - most(?) of MM

    * emailed patches from Andrew Morton : (125 commits)
    thp: fix comments of __pmd_trans_huge_lock()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root
    mm: memcontrol: fix documentation for compound parameter
    mm: memcontrol: remove BUG_ON in uncharge_list
    mm: fix build warnings in
    mm, thp: convert from optimistic swapin collapsing to conservative
    mm, thp: fix comment inconsistency for swapin readahead functions
    thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
    shmem: split huge pages beyond i_size under memory pressure
    thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
    khugepaged: add support of collapse for tmpfs/shmem pages
    shmem: make shmem_inode_info::lock irq-safe
    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
    thp: extract khugepaged from mm/huge_memory.c
    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
    shmem: add huge pages support
    shmem: get_unmapped_area align huge page
    shmem: prepare huge= mount option and sysfs knob
    mm, rmap: account shmem thp pages
    ...

    Linus Torvalds
     
  • Vladimir has noticed that we might declare memcg oom even during
    readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
    restriction) while __do_page_cache_readahead uses
    page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
    OOMs. This gfp mask discrepancy is really unfortunate and easily
    fixable. Drop page_cache_alloc_readahead() which only has one user and
    outsource the gfp_mask logic into readahead_gfp_mask and propagate this
    mask from __do_page_cache_readahead down to read_pages.

    This alone would have only very limited impact as most filesystems are
    implementing ->readpages and the common implementation mpage_readpages
    does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
    use readahead_gfp_mask instead as this function is called only during
    readahead as well. The same applies to read_cache_pages.

    ext4 has its own ext4_mpage_readpages but the path which has pages !=
    NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
    doing a very similar pattern to mpage_readpages so the same can be
    applied to them as well.

    [akpm@linux-foundation.org: coding-style fixes]
    [mhocko@suse.com: restrict gfp mask in mpage_alloc]
    Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Chris Mason
    Cc: Steve French
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Cc: Mike Marshall
    Cc: Jaegeuk Kim
    Cc: Changman Lee
    Cc: Chao Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

08 Jun, 2016

2 commits


05 Apr, 2016

2 commits

  • Mostly direct substitution with occasional adjustment or removing
    outdated comments.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

16 Mar, 2016

1 commit


07 Nov, 2015

1 commit

  • There are many places which use mapping_gfp_mask to restrict a more
    generic gfp mask which would be used for allocations which are not
    directly related to the page cache but they are performed in the same
    context.

    Let's introduce a helper function which makes the restriction explicit and
    easier to track. This patch doesn't introduce any functional changes.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Michal Hocko
    Suggested-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

05 Nov, 2015

1 commit

  • Pull core block updates from Jens Axboe:
    "This is the core block pull request for 4.4. I've got a few more
    topic branches this time around, some of them will layer on top of the
    core+drivers changes and will come in a separate round. So not a huge
    chunk of changes in this round.

    This pull request contains:

    - Enable blk-mq page allocation tracking with kmemleak, from Catalin.

    - Unused prototype removal in blk-mq from Christoph.

    - Cleanup of the q->blk_trace exchange, using cmpxchg instead of two
    xchg()'s, from Davidlohr.

    - A plug flush fix from Jeff.

    - Also from Jeff, a fix that means we don't have to update shared tag
    sets at init time unless we do a state change. This cuts down boot
    times on thousands of devices a lot with scsi/blk-mq.

    - blk-mq waitqueue barrier fix from Kosuke.

    - Various fixes from Ming:

    - Fixes for segment merging and splitting, and checks, for
    the old core and blk-mq.

    - Potential blk-mq speedup by marking ctx pending at the end
    of a plug insertion batch in blk-mq.

    - direct-io no page dirty on kernel direct reads.

    - A WRITE_SYNC fix for mpage from Roman"

    * 'for-4.4/core' of git://git.kernel.dk/linux-block:
    blk-mq: avoid excessive boot delays with large lun counts
    blktrace: re-write setting q->blk_trace
    blk-mq: mark ctx as pending at batch in flush plug path
    blk-mq: fix for trace_block_plug()
    block: check bio_mergeable() early before merging
    blk-mq: check bio_mergeable() early before merging
    block: avoid to merge splitted bio
    block: setup bi_phys_segments after splitting
    block: fix plug list flushing for nomerge queues
    blk-mq: remove unused blk_mq_clone_flush_request prototype
    blk-mq: fix waitqueue_active without memory barrier in block/blk-mq-tag.c
    fs: direct-io: don't dirtying pages for ITER_BVEC/ITER_KVEC direct read
    fs/mpage.c: forgotten WRITE_SYNC in case of data integrity write
    block: kmemleak: Track the page allocations for struct request

    Linus Torvalds
     

17 Oct, 2015

1 commit

  • Commit 6afdb859b710 ("mm: do not ignore mapping_gfp_mask in page cache
    allocation paths") has caught some users of hardcoded GFP_KERNEL used in
    the page cache allocation paths. This, however, wasn't complete and
    there were others which went unnoticed.

    Dave Chinner has reported the following deadlock for xfs on loop device:
    : With the recent merge of the loop device changes, I'm now seeing
    : XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
    :
    : The deadlocked is as follows:
    :
    : kloopd1: loop_queue_read_work
    : xfs_file_iter_read
    : lock XFS inode XFS_IOLOCK_SHARED (on image file)
    : page cache read (GFP_KERNEL)
    : radix tree alloc
    : memory reclaim
    : reclaim XFS inodes
    : log force to unpin inodes
    :
    :
    : xfs-cil/loop1:
    : xlog_cil_push
    : xlog_write
    :
    : xlog_state_get_iclog_space()
    :
    :
    :
    : kloopd1: loop_queue_write_work
    : xfs_file_write_iter
    : lock XFS inode XFS_IOLOCK_EXCL (on image file)
    :
    :
    : i.e. the kloopd, with it's split read and write work queues, has
    : introduced a dependency through memory reclaim. i.e. that writes
    : need to be able to progress for reads make progress.
    :
    : The problem, fundamentally, is that mpage_readpages() does a
    : GFP_KERNEL allocation, rather than paying attention to the inode's
    : mapping gfp mask, which is set to GFP_NOFS.
    :
    : The didn't used to happen, because the loop device used to issue
    : reads through the splice path and that does:
    :
    : error = add_to_page_cache_lru(page, mapping, index,
    : GFP_KERNEL & mapping_gfp_mask(mapping));

    This has changed by commit aa4d86163e4 ("block: loop: switch to VFS
    ITER_BVEC").

    This patch changes mpage_readpage{s} to follow gfp mask set for the
    mapping. There are, however, other places which are doing basically the
    same.

    lustre:ll_dir_filler is doing GFP_KERNEL from the function which
    apparently uses GFP_NOFS for other allocations so let's make this
    consistent.

    cifs:readpages_get_pages is called from cifs_readpages and
    __cifs_readpages_from_fscache called from the same path obeys mapping
    gfp.

    ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well
    regardless it uses mapping_gfp_mask for the page allocation.

    ext4_mpage_readpages is the called from the page cache allocation path
    same as read_pages and read_cache_pages

    As I've noticed in my previous post I cannot say I would be happy about
    sprinkling mapping_gfp_mask all over the place and it sounds like we
    should drop gfp_mask argument altogether and use it internally in
    __add_to_page_cache_locked that would require all the filesystems to use
    mapping gfp consistently which I am not sure is the case here. From a
    quick glance it seems that some file system use it all the time while
    others are selective.

    Signed-off-by: Michal Hocko
    Reported-by: Dave Chinner
    Cc: "Theodore Ts'o"
    Cc: Ming Lei
    Cc: Andreas Dilger
    Cc: Oleg Drokin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

24 Sep, 2015

1 commit

  • In case of wbc->sync_mode == WB_SYNC_ALL we need to do data integrity
    write, thus mark request as WRITE_SYNC.

    akpm: afaict this change will cause the data integrity write bios to be
    placed onto the second queue in cfq_io_cq.cfqq[], which presumably results
    in special treatment. The documentation for REQ_SYNC is horrid.

    Signed-off-by: Roman Pen
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Reviewed-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Roman Pen
     

14 Aug, 2015

1 commit

  • We can always fill up the bio now, no need to estimate the possible
    size based on queue parameters.

    Acked-by: Steven Whitehouse
    Signed-off-by: Kent Overstreet
    [hch: rebased and wrote a changelog]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not beeing persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

02 Jun, 2015

3 commits

  • As concurrent write sharing of an inode is expected to be very rare
    and memcg only tracks page ownership on first-use basis severely
    confining the usefulness of such sharing, cgroup writeback tracks
    ownership per-inode. While the support for concurrent write sharing
    of an inode is deemed unnecessary, an inode being written to by
    different cgroups at different points in time is a lot more common,
    and, more importantly, charging only by first-use can too readily lead
    to grossly incorrect behaviors (single foreign page can lead to
    gigabytes of writeback to be incorrectly attributed).

    To resolve this issue, cgroup writeback detects the majority dirtier
    of an inode and will transfer the ownership to it. To avoid
    unnnecessary oscillation, the detection mechanism keeps track of
    history and gives out the switch verdict only if the foreign usage
    pattern is stable over a certain amount of time and/or writeback
    attempts.

    The detection mechanism has fairly low space and computation overhead.
    It adds 8 bytes to struct inode (one int and two u16's) and minimal
    amount of calculation per IO. The detection mechanism converges to
    the correct answer usually in several seconds of IO time when there's
    a clear majority dirtier. Even when there isn't, it can reach an
    acceptable answer fairly quickly under most circumstances.

    Please see wb_detach_inode() for more details.

    This patch only implements detection. Following patches will
    implement actual switching.

    v2: wbc_account_io() now checks whether the wbc is associated with a
    wb before dereferencing it. This can happen when pageout() is
    writing pages directly without going through the usual writeback
    path. As pageout() path is single-threaded, we don't want it to
    be blocked behind a slow cgroup and ultimately want it to delegate
    actual writing to the usual writeback path.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Cc: Wu Fengguang
    Cc: Greg Thelen
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, for cgroup writeback, the IO submission paths directly
    associate the bio's with the blkcg from inode_to_wb_blkcg_css();
    however, it'd be necessary to keep more writeback context to implement
    foreign inode writeback detection. wbc (writeback_control) is the
    natural fit for the extra context - it persists throughout the
    writeback of each inode and is passed all the way down to IO
    submission paths.

    This patch adds wbc_attach_and_unlock_inode(), wbc_detach_inode(), and
    wbc_attach_fdatawrite_inode() which are used to associate wbc with the
    inode being written back. IO submission paths now use wbc_init_bio()
    instead of directly associating bio's with blkcg themselves. This
    leaves inode_to_wb_blkcg_css() w/o any user. The function is removed.

    wbc currently only tracks the associated wb (bdi_writeback). Future
    patches will add more for foreign inode detection. The association is
    established under i_lock which will be depended upon when migrating
    foreign inodes to other wb's.

    As currently, once established, inode to wb association never changes,
    going through wbc when initializing bio's doesn't cause any behavior
    changes.

    v2: submit_blk_blkcg() now checks whether the wbc is associated with a
    wb before dereferencing it. This can happen when pageout() is
    writing pages directly without going through the usual writeback
    path. As pageout() path is single-threaded, we don't want it to
    be blocked behind a slow cgroup and ultimately want it to delegate
    actual writing to the usual writeback path.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Cc: Wu Fengguang
    Cc: Greg Thelen
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • __mpage_writepage() is used to implement mpage_writepages() which in
    turn is used for ->writepages() of various filesystems. All writeback
    logic is now updated to handle cgroup writeback and the block cgroup
    to issue IOs for is encoded in writeback_control and can be retrieved
    from the inode; however, __mpage_writepage() currently ignores the
    blkcg indicated by the inode and issues all bio's without explicit
    blkcg association.

    This patch updates __mpage_writepage() so that the issued bio's are
    associated with inode_to_writeback_blkcg_css(inode).

    v2: Updated for per-inode wb association.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Cc: Andrew Morton
    Cc: Alexander Viro
    Signed-off-by: Jens Axboe

    Tejun Heo
     

10 Oct, 2014

1 commit

  • Add guard_bio_eod() check for mpage code in order to allow us to do IO
    even on the odd last sectors of a device, even if the block size is some
    multiple of the physical sector size.

    Using mpage_readpages() for block device requires this guard check.

    Signed-off-by: Akinobu Mita
    Cc: Jens Axboe
    Cc: Alexander Viro
    Cc: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

05 Jun, 2014

3 commits

  • A block device driver may choose to provide a rw_page operation. These
    will be called when the filesystem is attempting to do page sized I/O to
    page cache pages (ie not for direct I/O). This does preclude I/Os that
    are larger than page size, so this may only be a performance gain for
    some devices.

    Signed-off-by: Matthew Wilcox
    Tested-by: Dheeraj Reddy
    Cc: Dave Chinner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • page_endio() takes care of updating all the appropriate page flags once
    I/O has finished to a page. Switch to using mapping_set_error() instead
    of setting AS_EIO directly; this will handle thin-provisioned devices
    correctly.

    Signed-off-by: Matthew Wilcox
    Cc: Dave Chinner
    Cc: Dheeraj Reddy
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • __mpage_writepage() is over 200 lines long, has 20 local variables, four
    goto labels and could desperately use simplification. Splitting
    clean_buffers() into a helper function improves matters a little,
    removing 20+ lines from it.

    Signed-off-by: Matthew Wilcox
    Cc: Dave Chinner
    Cc: Dheeraj Reddy
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

24 Nov, 2013

2 commits

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman 6

    Kent Overstreet
     
  • With immutable biovecs we don't want code accessing bi_io_vec directly -
    the uses this patch changes weren't incorrect since they all own the
    bio, but it makes the code harder to audit for no good reason - also,
    this will help with multipage bvecs later.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: Jaegeuk Kim
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust

    Kent Overstreet
     

29 Feb, 2012

1 commit


12 Jan, 2012

1 commit


27 May, 2011

1 commit

  • This fourth patch of eight in this cleancache series provides the
    core hooks in VFS for: initializing cleancache per filesystem;
    capturing clean pages reclaimed by page cache; attempting to get
    pages from cleancache before filesystem read; and ensuring coherency
    between pagecache, disk, and cleancache. Note that the placement
    of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic
    change was required due to a patchset in 2.6.39.

    All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become
    a check of a boolean global if CONFIG_CLEANCACHE is set but no
    cleancache "backend" has claimed cleancache_ops.

    Details and a FAQ can be found in Documentation/vm/cleancache.txt

    [v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function]
    Signed-off-by: Chris Mason
    Signed-off-by: Dan Magenheimer
    Reviewed-by: Jeremy Fitzhardinge
    Reviewed-by: Konrad Rzeszutek Wilk
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Rik Van Riel
    Cc: Jan Beulich
    Cc: Andreas Dilger
    Cc: Ted Ts'o
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Nitin Gupta

    Dan Magenheimer
     

10 Mar, 2011

1 commit


14 Jan, 2011

1 commit

  • Merge mpage_end_io_read() and mpage_end_io_write() into mpage_end_io() to
    eliminate code duplication.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Hai Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hai Shan