09 Feb, 2017

1 commit

  • commit d1908f52557b3230fbd63c0429f3b4b748bf2b6d upstream.

    Tetsuo has noticed that an OOM stress test which performs large write
    requests can cause full depletion of the memory reserves. He has
    tracked this down to the following path:

    __alloc_pages_nodemask+0x436/0x4d0
    alloc_pages_current+0x97/0x1b0
    __page_cache_alloc+0x15d/0x1a0 mm/filemap.c:728
    pagecache_get_page+0x5a/0x2b0 mm/filemap.c:1331
    grab_cache_page_write_begin+0x23/0x40 mm/filemap.c:2773
    iomap_write_begin+0x50/0xd0 fs/iomap.c:118
    iomap_write_actor+0xb5/0x1a0 fs/iomap.c:190
    ? iomap_write_end+0x80/0x80 fs/iomap.c:150
    iomap_apply+0xb3/0x130 fs/iomap.c:79
    iomap_file_buffered_write+0x68/0xa0 fs/iomap.c:243
    ? iomap_write_end+0x80/0x80
    xfs_file_buffered_aio_write+0x132/0x390 [xfs]
    ? remove_wait_queue+0x59/0x60
    xfs_file_write_iter+0x90/0x130 [xfs]
    __vfs_write+0xe5/0x140
    vfs_write+0xc7/0x1f0
    ? syscall_trace_enter+0x1d0/0x380
    SyS_write+0x58/0xc0
    do_syscall_64+0x6c/0x200
    entry_SYSCALL64_slow_path+0x25/0x25

    The OOM victim has access to all memory reserves to make forward
    progress towards exiting easier. But iomap_file_buffered_write and
    other callers of iomap_apply loop to complete the full request, so we
    need to check for fatal signals and back off with a short write
    instead.

    As iomap_apply delegates all the work down to the actors, we have to
    hook into those. All callers that work with the page cache call
    iomap_write_begin, so we check for signals there (see the sketch
    below). dax_iomap_actor has to handle the situation explicitly because
    it copies data to userspace directly. Other callers, such as
    iomap_page_mkwrite, work on a single page, and iomap_fiemap_actor does
    not allocate memory based on the given len.

    Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path")
    Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reported-by: Tetsuo Handa
    Reviewed-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michal Hocko
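
    A minimal sketch of the shape of the fix, assuming the
    iomap_write_begin signature of that era (the real patch also teaches
    dax_iomap_actor to bail out in the same way):

    static int
    iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
                      unsigned flags, struct page **pagep, struct iomap *iomap)
    {
            /* An OOM victim looping in iomap_apply() passes through here
             * for every page of the request; give up with -EINTR so the
             * caller returns a short write instead of draining the memory
             * reserves further. */
            if (fatal_signal_pending(current))
                    return -EINTR;

            /* ... allocate the page cache page and set up the write as before ... */
            return 0;
    }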
     

24 Oct, 2016

1 commit

  • iomap_page_mkwrite_actor() calls __block_write_begin_int() with the
    position masked as pos & ~PAGE_MASK, which is equivalent to
    pos & (PAGE_SIZE - 1). It thus masks off the high bits of the file
    position. However, __block_write_begin_int() expects the full file
    position on input. This does not cause any visible issues because all
    __block_write_begin_int() really cares about are the low file position
    bits, but it is still a bug waiting to happen (see the before/after
    sketch below).

    Signed-off-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Jan Kara
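
    A before/after sketch of the offending call in
    iomap_page_mkwrite_actor() (variable names approximate):

    /* Before: ~PAGE_MASK == PAGE_SIZE - 1, so only the offset within the
     * page survives and the high bits of the file position are lost. */
    ret = __block_write_begin_int(page, pos & ~PAGE_MASK, length, NULL, iomap);

    /* After: pass the full file position; the helper masks out the
     * in-page offset itself where it needs it. */
    ret = __block_write_begin_int(page, pos, length, NULL, iomap);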
     

20 Oct, 2016

1 commit

  • This allows the file system to tell a FIEMAP request from a read
    operation, and thus avoids the need to report flags that aren't
    actually used in the read path (a sketch follows below).

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Christoph Hellwig
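
    A sketch of how a file system's ->iomap_begin might use the
    distinction; the flag name IOMAP_REPORT and the example_iomap_begin()
    helper are illustrative here, not quoted from the patch:

    static int
    example_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
                        unsigned flags, struct iomap *iomap)
    {
            if (flags & IOMAP_REPORT) {
                    /* FIEMAP: do the extra lookups needed to report
                     * accurate extent flags. */
            } else {
                    /* Plain read: skip work that only matters for
                     * reporting. */
            }
            /* ... fill in *iomap from the file system's extent map ... */
            return 0;
    }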
     

03 Oct, 2016

1 commit


19 Sep, 2016

3 commits


29 Aug, 2016

1 commit

  • Filesystems like XFS that use extents should not set the
    FIEMAP_EXTENT_MERGED flag in the fiemap extent structures. To allow
    for both behaviors for the upcoming gfs2 usage, split the iomap type
    field into type and flags, and only set FIEMAP_EXTENT_MERGED if the
    IOMAP_F_MERGED flag is set (see the sketch below). The flags field
    will also come in handy for future features such as shared extents on
    reflink-enabled file systems.

    Reported-by: Andreas Gruenbacher
    Signed-off-by: Christoph Hellwig
    Acked-by: Darrick J. Wong
    Signed-off-by: Dave Chinner

    Christoph Hellwig
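
    A sketch of the resulting translation when fiemap extents are filled
    in; the helper name is illustrative:

    /* Only block-based file systems (such as gfs2) set IOMAP_F_MERGED on
     * a mapping; extent-based XFS leaves it clear and therefore no longer
     * reports FIEMAP_EXTENT_MERGED. */
    static u32 example_iomap_to_fiemap_flags(const struct iomap *iomap)
    {
            u32 flags = 0;

            if (iomap->flags & IOMAP_F_MERGED)
                    flags |= FIEMAP_EXTENT_MERGED;
            return flags;
    }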
     

17 Aug, 2016

5 commits


21 Jun, 2016

3 commits

  • Add a simple fiemap implementation based on iomap_ops, partially based
    on a previous implementation from Bob Peterson (a wiring sketch
    follows below).

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
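
    A sketch of how a file system can wire up its ->fiemap method on top
    of the new helper; example_fiemap() and example_iomap_ops are
    placeholders:

    static int
    example_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
                   u64 start, u64 len)
    {
            /* iomap_fiemap() walks the range via the iomap_ops callbacks
             * and fills in one fiemap extent per returned mapping. */
            return iomap_fiemap(inode, fieinfo, start, len, &example_iomap_ops);
    }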
     
  • This avoids needing a separate, inefficient get_block based DAX
    zero_range implementation in file systems (see the sketch below).

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
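
    A sketch of the caller side, assuming the iomap_zero_range() interface
    from this series; example_zero_eof() and example_iomap_ops are
    placeholders:

    static int example_zero_eof(struct inode *inode, loff_t pos, loff_t count)
    {
            bool did_zero = false;

            /* Zeroes the byte range using the file system's iomap_ops;
             * the helper handles both page cache and DAX backed mappings,
             * so no get_block based DAX zero_range is needed. */
            return iomap_zero_range(inode, pos, count, &did_zero,
                                    &example_iomap_ops);
    }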
     
  • Add infrastructure for multipage buffered writes. This is implemented
    using a main iterator that applies an actor function to a range that
    can be written.

    This infrastructure is used to implement a buffered write helper, one
    to zero file ranges and one to implement the ->page_mkwrite VM
    operation. All of them borrow a fair amount of code from fs/buffer.c
    for now by using an internal version of __block_write_begin that
    gets passed an iomap and builds the corresponding buffer_head.

    The file system gets a set of paired ->iomap_begin and ->iomap_end
    calls which allow it to map/reserve a range and get a notification
    once the write code is finished with it (a sketch of the apply/actor
    pattern follows below).

    Based on earlier code from Dave Chinner.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bob Peterson
    Signed-off-by: Dave Chinner

    Christoph Hellwig
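
    A simplified sketch of the apply/actor pattern described above; the
    real iomap_apply() carries more argument checking and error handling
    than shown here:

    typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos,
                    loff_t len, void *data, struct iomap *iomap);

    static loff_t
    iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
                struct iomap_ops *ops, void *data, iomap_actor_t actor)
    {
            struct iomap iomap = { 0 };
            loff_t written, ret;

            /* Ask the file system to map/reserve the range. */
            ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
            if (ret)
                    return ret;

            /* The actor (buffered write, zeroing, page_mkwrite, ...) works
             * on as much of the mapped range as it can and reports how far
             * it got. */
            written = actor(inode, pos, length, data, &iomap);

            /* Give the file system a chance to finish up, e.g. trim an
             * unused delalloc reservation. */
            if (ops->iomap_end)
                    ops->iomap_end(inode, pos, length,
                                   written > 0 ? written : 0, flags, &iomap);
            return written;
    }

    Callers such as iomap_file_buffered_write() then loop over this until
    the whole request has been processed (or, per the fix at the top of
    this log, until a fatal signal cuts the loop short with a short write).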