07 Jan, 2020

1 commit

  • The check in block_page_mkwrite that is meant to determine whether an
    offset is within the inode size is off by one. This bug has been copied
    into iomap_page_mkwrite and several filesystems (ubifs, ext4, f2fs,
    ceph).

    Fix that by introducing a new page_mkwrite_check_truncate helper that
    checks for truncate and computes the bytes in the page up to EOF. Use
    the helper in iomap.

    NOTE from Darrick: The original patch fixed a number of filesystems, but
    then there were merge conflicts with the f2fs for-next tree; a
    subsequent re-submission of the patch had different btrfs changes with
    no explanation; and Christoph complained that each per-fs fix should be
    a separate patch. In my view that's too much risk to take on, so I
    decided to drop all the hunks except for iomap, since I've actually QA'd
    XFS.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Darrick J. Wong
    [darrick: drop everything but the iomap parts]
    Signed-off-by: Darrick J. Wong

    Andreas Gruenbacher
     

05 Dec, 2019

2 commits

  • This patch fixes the following KASAN report. The @ioend has been
    freed by dio_put(), but the iomap_finish_ioend() still trys to access
    its data.

    [20563.631624] BUG: KASAN: use-after-free in iomap_finish_ioend+0x58c/0x5c0
    [20563.638319] Read of size 8 at addr fffffc0c54a36928 by task kworker/123:2/22184

    [20563.647107] CPU: 123 PID: 22184 Comm: kworker/123:2 Not tainted 5.4.0+ #1
    [20563.653887] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019
    [20563.664499] Workqueue: xfs-conv/sda5 xfs_end_io [xfs]
    [20563.669547] Call trace:
    [20563.671993] dump_backtrace+0x0/0x370
    [20563.675648] show_stack+0x1c/0x28
    [20563.678958] dump_stack+0x138/0x1b0
    [20563.682455] print_address_description.isra.9+0x60/0x378
    [20563.687759] __kasan_report+0x1a4/0x2a8
    [20563.691587] kasan_report+0xc/0x18
    [20563.694985] __asan_report_load8_noabort+0x18/0x20
    [20563.699769] iomap_finish_ioend+0x58c/0x5c0
    [20563.703944] iomap_finish_ioends+0x110/0x270
    [20563.708396] xfs_end_ioend+0x168/0x598 [xfs]
    [20563.712823] xfs_end_io+0x1e0/0x2d0 [xfs]
    [20563.716834] process_one_work+0x7f0/0x1ac8
    [20563.720922] worker_thread+0x334/0xae0
    [20563.724664] kthread+0x2c4/0x348
    [20563.727889] ret_from_fork+0x10/0x18

    [20563.732941] Allocated by task 83403:
    [20563.736512] save_stack+0x24/0xb0
    [20563.739820] __kasan_kmalloc.isra.9+0xc4/0xe0
    [20563.744169] kasan_slab_alloc+0x14/0x20
    [20563.747998] slab_post_alloc_hook+0x50/0xa8
    [20563.752173] kmem_cache_alloc+0x154/0x330
    [20563.756185] mempool_alloc_slab+0x20/0x28
    [20563.760186] mempool_alloc+0xf4/0x2a8
    [20563.763845] bio_alloc_bioset+0x2d0/0x448
    [20563.767849] iomap_writepage_map+0x4b8/0x1740
    [20563.772198] iomap_do_writepage+0x200/0x8d0
    [20563.776380] write_cache_pages+0x8a4/0xed8
    [20563.780469] iomap_writepages+0x4c/0xb0
    [20563.784463] xfs_vm_writepages+0xf8/0x148 [xfs]
    [20563.788989] do_writepages+0xc8/0x218
    [20563.792658] __writeback_single_inode+0x168/0x18f8
    [20563.797441] writeback_sb_inodes+0x370/0xd30
    [20563.801703] wb_writeback+0x2d4/0x1270
    [20563.805446] wb_workfn+0x344/0x1178
    [20563.808928] process_one_work+0x7f0/0x1ac8
    [20563.813016] worker_thread+0x334/0xae0
    [20563.816757] kthread+0x2c4/0x348
    [20563.819979] ret_from_fork+0x10/0x18

    [20563.825028] Freed by task 22184:
    [20563.828251] save_stack+0x24/0xb0
    [20563.831559] __kasan_slab_free+0x10c/0x180
    [20563.835648] kasan_slab_free+0x10/0x18
    [20563.839389] slab_free_freelist_hook+0xb4/0x1c0
    [20563.843912] kmem_cache_free+0x8c/0x3e8
    [20563.847745] mempool_free_slab+0x20/0x28
    [20563.851660] mempool_free+0xd4/0x2f8
    [20563.855231] bio_free+0x33c/0x518
    [20563.858537] bio_put+0xb8/0x100
    [20563.861672] iomap_finish_ioend+0x168/0x5c0
    [20563.865847] iomap_finish_ioends+0x110/0x270
    [20563.870328] xfs_end_ioend+0x168/0x598 [xfs]
    [20563.874751] xfs_end_io+0x1e0/0x2d0 [xfs]
    [20563.878755] process_one_work+0x7f0/0x1ac8
    [20563.882844] worker_thread+0x334/0xae0
    [20563.886584] kthread+0x2c4/0x348
    [20563.889804] ret_from_fork+0x10/0x18

    [20563.894855] The buggy address belongs to the object at fffffc0c54a36900
    which belongs to the cache bio-1 of size 248
    [20563.906844] The buggy address is located 40 bytes inside of
    248-byte region [fffffc0c54a36900, fffffc0c54a369f8)
    [20563.918485] The buggy address belongs to the page:
    [20563.923269] page:ffffffff82f528c0 refcount:1 mapcount:0 mapping:fffffc8e4ba31900 index:0xfffffc0c54a33300
    [20563.932832] raw: 17ffff8000000200 ffffffffa3060100 0000000700000007 fffffc8e4ba31900
    [20563.940567] raw: fffffc0c54a33300 0000000080aa0042 00000001ffffffff 0000000000000000
    [20563.948300] page dumped because: kasan: bad access detected

    [20563.955345] Memory state around the buggy address:
    [20563.960129] fffffc0c54a36800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
    [20563.967342] fffffc0c54a36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [20563.974554] >fffffc0c54a36900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [20563.981766] ^
    [20563.986288] fffffc0c54a36980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
    [20563.993501] fffffc0c54a36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [20564.000713] ==================================================================

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205703
    Signed-off-by: Zorro Lang
    Fixes: 9cd0ed63ca514 ("iomap: enhance writeback error message")
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Zorro Lang
     
  • bio completions can race when a page spans more than one file system
    block. Add a spinlock to synchronize marking the page uptodate.

    Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
    Reported-by: Jan Stancek
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

27 Nov, 2019

2 commits

  • The 'start' variable indicates the start of a filemap and is set to the
    iocb's position, which we have already cached as 'pos', upon function
    entry.

    'pos' is used as a cursor indicating the current position and updated
    later in iomap_dio_rw(), but not before the last use of 'start'.

    Remove 'start' as it's synonym for 'pos' before we're entering the loop
    calling iomapp_apply().

    Signed-off-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Johannes Thumshirn
     
  • iomap_dio_bio_actor() copies iter to a local variable and then limits it
    to a file extent we have mapped. When IO is submitted,
    iomap_dio_bio_actor() advances the original iter while the copied iter
    is advanced inside bio_iov_iter_get_pages(). This logic is non-obvious
    especially because both iters still point to same shared structures
    (such as pipe info) so if iov_iter_advance() changes anything in the
    shared structure, this scheme breaks. Let's just truncate and reexpand
    the original iter as needed instead of playing games with copying iters
    and keeping them in sync.

    Signed-off-by: Jan Kara
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Jan Kara
     

23 Nov, 2019

2 commits

  • When splicing using iomap_dio_rw() to a pipe, we may leak pipe pages
    because bio_iov_iter_get_pages() records that the pipe will have full
    extent worth of data however if file size is not block size aligned
    iomap_dio_rw() returns less than what bio_iov_iter_get_pages() set up
    and splice code gets confused leaking a pipe page with the file tail.

    Handle the situation similarly to the old direct IO implementation and
    revert iter to actually returned read amount which makes iter consistent
    with value returned from iomap_dio_rw() and thus the splice code is
    happy.

    Fixes: ff6a9292e6f6 ("iomap: implement direct I/O")
    CC: stable@vger.kernel.org
    Reported-by: syzbot+991400e8eba7e00a26e1@syzkaller.appspotmail.com
    Signed-off-by: Jan Kara
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Jan Kara
     
  • Add some tracepoints so that we can more easily debug what the
    filesystem is returning from ->iomap_begin.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     

12 Nov, 2019

1 commit

  • Naresh reported LTP diotest4 failing for 32bit x86 and arm -next
    kernels on ext4. Same problem exists in 5.4-rc7 on xfs.

    The failure comes down to:
    openat(AT_FDCWD, "testdata-4.5918", O_RDWR|O_DIRECT) = 4
    mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f7b000
    read(4, 0xb7f7b000, 4096) = 0 // expects -EFAULT

    Problem is conversion at iomap_dio_bio_actor() return. Ternary
    operator has a return type and an attempt is made to convert each
    of operands to the type of the other. In this case "ret" (int)
    is converted to type of "copied" (unsigned long). Both have size
    of 4 bytes:
    size_t copied = 0;
    int ret = -14;
    long long actor_ret = copied ? copied : ret;

    On x86_64: actor_ret == -14;
    On x86 : actor_ret == 4294967282

    Replace ternary operator with 2 return statements to avoid this
    unwanted conversion.

    Fixes: 4721a6010990 ("iomap: dio data corruption and spurious errors when pipes fill")
    Reported-by: Naresh Kamboju
    Signed-off-by: Jan Stancek
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Jan Stancek
     

08 Nov, 2019

1 commit


07 Nov, 2019

1 commit

  • On architectures where loff_t is wider than pgoff_t, the expression
    ((page->index + 1) << PAGE_SHIFT) can overflow. Rewrite to use the page
    offset, which we already compute here anyway.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Andreas Gruenbacher
     

30 Oct, 2019

1 commit


21 Oct, 2019

14 commits


15 Oct, 2019

1 commit

  • Filesystems do not support doing IO as asynchronous in some cases. For
    example in case of unaligned writes or in case file size needs to be
    extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO
    in such cases, add argument to iomap_dio_rw() which makes the function
    wait for IO completion. This also results in executing
    iomap_dio_complete() inline in iomap_dio_rw() providing its return value
    to the caller as for ordinary sync IO.

    Signed-off-by: Jan Kara
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Jan Kara
     

20 Sep, 2019

2 commits

  • Add a new iomap_dio_ops structure that for now just contains the end_io
    handler. This avoid storing the function pointer in a mutable structure,
    which is a possible exploit vector for kernel code execution, and prepares
    for adding a submit_io handler that btrfs needs.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     
  • Modify the calling convention for the iomap_dio_rw ->end_io() callback.
    Rather than passing either dio->error or dio->size as the 'size' argument,
    instead pass both the dio->error and the dio->size value separately.

    In the instance that an error occurred during a write, we currently cannot
    determine whether any blocks have been allocated beyond the current EOF and
    data has subsequently been written to these blocks within the ->end_io()
    callback. As a result, we cannot judge whether we should take the truncate
    failed write path. Having both dio->error and dio->size will allow us to
    perform such checks within this callback.

    Signed-off-by: Matthew Bobrowski
    [hch: minor cleanups]
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Matthew Wilcox (Oracle)

    Matthew Bobrowski
     

25 Jul, 2019

1 commit

  • Detected by:

    $ ./scripts/spdxcheck.py
    fs/iomap/Makefile: 1:27 Invalid License ID: GPL-2.0-or-newer

    Fixes: 1c230208f53d ("iomap: start moving code to fs/iomap/")
    Signed-off-by: Masahiro Yamada
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Masahiro Yamada
     

20 Jul, 2019

1 commit

  • Pull iomap split/cleanup from Darrick Wong:
    "As promised, here's the second part of the iomap merge for 5.3, in
    which we break up iomap.c into smaller files grouped by functional
    area so that it'll be easier in the long run to maintain cohesiveness
    of code units and to review incoming patches. There are no functional
    changes and fs/iomap.c split cleanly.

    Summary:

    - Regroup the fs/iomap.c code by major functional area so that we can
    start development for 5.4 from a more stable base"

    * tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    iomap: move internal declarations into fs/iomap/
    iomap: move the main iteration code into a separate file
    iomap: move the buffered IO code into a separate file
    iomap: move the direct IO code into a separate file
    iomap: move the SEEK_HOLE code into a separate file
    iomap: move the file mapping reporting code into a separate file
    iomap: move the swapfile code into a separate file
    iomap: start moving code to fs/iomap/

    Linus Torvalds
     

17 Jul, 2019

7 commits


15 Jul, 2019

1 commit