25 May, 2018

1 commit

  • commit 5aa1437d2d9a068c0334bd7c9dafa8ec4f97f13b upstream.

    open file, unlink it, then use ioctl(2) to make it immutable or
    append only. Now close it and watch the blocks *not* freed...

    Immutable/append-only checks belong in ->setattr().
    Note: the bug is old and backport to anything prior to 737f2e93b972
    ("ext2: convert to use the new truncate convention") will need
    these checks lifted into ext2_setattr().

    Cc: stable@kernel.org
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
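
The split described in the commit above can be sketched in a small userspace model. This is only an illustration under assumptions: the `model_*` names, the flag bits and the 1 KiB block size are hypothetical and are not the kernel's definitions. The point is that the flag checks sit in the user-visible size-change path, while the final release of an unlinked inode frees blocks unconditionally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits and inode model; not the kernel's definitions. */
#define MODEL_IMMUTABLE 0x1u
#define MODEL_APPEND    0x2u

struct model_inode {
    unsigned int flags;   /* MODEL_IMMUTABLE and/or MODEL_APPEND */
    long long size;       /* file size in bytes */
    long long blocks;     /* allocated blocks, 1 KiB each in this model */
};

/* Low-level truncation: frees blocks, performs no permission checks. */
void model_truncate_blocks(struct model_inode *inode, long long new_size)
{
    assert(inode != NULL && new_size >= 0);
    inode->size = new_size;
    inode->blocks = (new_size + 1023) / 1024;
}

/* User-requested size change: the only place the flags are checked. */
int model_setattr_size(struct model_inode *inode, long long new_size)
{
    assert(inode != NULL);
    if (inode->flags & (MODEL_IMMUTABLE | MODEL_APPEND))
        return -EPERM;
    model_truncate_blocks(inode, new_size);
    return 0;
}

/* Final release of an unlinked inode: must free blocks whatever the flags. */
void model_evict(struct model_inode *inode)
{
    model_truncate_blocks(inode, 0);
}
```

In this model, `model_setattr_size()` on an immutable inode fails with `-EPERM`, yet `model_evict()` still releases every block, which is the behaviour the fix is after.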
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

01 Sep, 2017

1 commit

  • The ->iomap_begin() operation is a hot path, so cache the
    fs_dax_get_by_host() result at mount time to avoid incurring the
    hash lookup overhead on a per-i/o basis.

    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Reviewed-by: Jan Kara
    Reported-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
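
A minimal userspace sketch of the caching idea in the commit above: do the by-host lookup once at mount time and let the hot path read a stored pointer. All `model_*` names are hypothetical, and the linear search merely stands in for the hash lookup mentioned in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct model_dax_device { const char *host; };

/* Hypothetical registry standing in for the by-host hash table. */
static struct model_dax_device model_registry[] = { { "pmem0" }, { "pmem1" } };
static unsigned long model_lookup_count;   /* how often the slow path ran */

/* Slow path: search the registry by host name. */
struct model_dax_device *model_lookup_by_host(const char *host)
{
    size_t i;

    assert(host != NULL);
    model_lookup_count++;
    for (i = 0; i < sizeof(model_registry) / sizeof(model_registry[0]); i++)
        if (strcmp(model_registry[i].host, host) == 0)
            return &model_registry[i];
    return NULL;
}

/* Per-mount state holding the cached lookup result. */
struct model_sb_info { struct model_dax_device *dax_dev; };

/* Mount time: perform the lookup exactly once. */
void model_mount(struct model_sb_info *sbi, const char *host)
{
    assert(sbi != NULL);
    sbi->dax_dev = model_lookup_by_host(host);
}

/* Hot path: no lookup, just return the cached pointer. */
struct model_dax_device *model_iomap_begin(const struct model_sb_info *sbi)
{
    return sbi->dax_dev;
}
```

However many times `model_iomap_begin()` runs, `model_lookup_count` stays at one per mount.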
     

13 Jul, 2017

1 commit

  • Buffer heads referencing indirect blocks may not be released if the file
    is truncated at the right time. This happens because ext2_get_branch()
    returns NULL when it finds the whole chain of indirect blocks already
    set, and when truncate alters the chain this value of NULL is
    treated as the address of the last head to be released. Handle this in the
    same way as it's done after the got_it label.

    Signed-off-by: Ernesto A. Fernández
    Signed-off-by: Jan Kara

    Ernesto A. Fernández
     

14 May, 2017

1 commit

  • Tetsuo reports:

    fs/built-in.o: In function `xfs_file_iomap_end':
    xfs_iomap.c:(.text+0xe0ef9): undefined reference to `put_dax'
    fs/built-in.o: In function `xfs_file_iomap_begin':
    xfs_iomap.c:(.text+0xe1a7f): undefined reference to `dax_get_by_host'
    make: *** [vmlinux] Error 1
    $ grep DAX .config
    CONFIG_DAX=m
    # CONFIG_DEV_DAX is not set
    # CONFIG_FS_DAX is not set

    When FS_DAX=n we can/must throw away the dax code in filesystems.
    Implement 'fs_' versions of dax_get_by_host() and put_dax() that are
    nops in the FS_DAX=n case.

    Cc: Jan Kara
    Cc: "Theodore Ts'o"
    Cc: "Darrick J. Wong"
    Cc: Ross Zwisler
    Tested-by: Tony Luck
    Fixes: ef51042472f5 ("block, dax: move 'select DAX' from BLOCK to FS_DAX")
    Reported-by: Tetsuo Handa
    Signed-off-by: Dan Williams

    Dan Williams
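
The stub pattern described in the commit above can be sketched as follows. This is a hedged illustration: `MODEL_FS_DAX` and every `model_*` name are hypothetical stand-ins, not the kernel's symbols. When the option is off, the wrappers become no-ops, so filesystem code needs no reference to the real functions at link time.

```c
#include <assert.h>
#include <stddef.h>

struct model_dax_device { int refcount; };

#ifdef MODEL_FS_DAX
/* With the option on, the wrappers forward to the real implementations. */
struct model_dax_device *model_dax_get_by_host(const char *host);
void model_put_dax(struct model_dax_device *dax_dev);

static inline struct model_dax_device *model_fs_dax_get_by_host(const char *host)
{
    return model_dax_get_by_host(host);
}

static inline void model_fs_put_dax(struct model_dax_device *dax_dev)
{
    model_put_dax(dax_dev);
}
#else
/* With the option off, the wrappers are no-ops that compile away. */
static inline struct model_dax_device *model_fs_dax_get_by_host(const char *host)
{
    (void)host;
    return NULL;
}

static inline void model_fs_put_dax(struct model_dax_device *dax_dev)
{
    (void)dax_dev;
}
#endif

/* Filesystem code uses only the wrappers; returns 1 if a DAX device exists. */
int model_fs_mount(const char *host)
{
    struct model_dax_device *dax_dev;
    int has_dax;

    assert(host != NULL);
    dax_dev = model_fs_dax_get_by_host(host);
    has_dax = (dax_dev != NULL);
    model_fs_put_dax(dax_dev);
    return has_dax;
}
```

Built without `MODEL_FS_DAX`, `model_fs_mount()` links cleanly and reports that no DAX device is available.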
     

06 May, 2017

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "The bulk of this has been in multiple -next releases. There were a few
    late breaking fixes and small features that got added in the last
    couple days, but the whole set has received a build success
    notification from the kbuild robot.

    Change summary:

    - Region media error reporting: A libnvdimm region device is the
    parent to one or more namespaces. To date, media errors have been
    reported via the "badblocks" attribute attached to pmem block
    devices for namespaces in "raw" or "memory" mode. Given that
    namespaces can be in "device-dax" or "btt-sector" mode this new
    interface reports media errors generically, i.e. independent of
    namespace modes or state.

    This subsequently allows userspace tooling to craft "ACPI 6.1
    Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
    requests and submit them via the ioctl path for NVDIMM root bus
    devices.

    - Introduce 'struct dax_device' and 'struct dax_operations': Prompted
    by a request from Linus and feedback from Christoph this allows for
    dax capable drivers to publish their own custom dax operations.
    This fixes the broken assumption that all dax operations are
    related to a persistent memory device, and makes it easier for
    other architectures and platforms to add customized persistent
    memory support.

    - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
    available for storage appliance applications to manually trigger
    memory controllers to drain write-pending buffers that would
    otherwise be flushed automatically by the platform ADR
    (asynchronous-DRAM-refresh) mechanism at a power loss event.
    Support for "locked" DIMMs is included to prevent namespaces from
    surfacing when the namespace label data area is locked. Finally,
    fixes for various reported deadlocks and crashes, also tagged for
    -stable.

    - ACPI / nfit driver updates: General updates of the nfit driver to
    add DSM command overrides, ACPI 6.1 health state flags support, DSM
    payload debug available by default, and various fixes.

    Acknowledgements that came after the branch was pushed:

    - commit 565851c972b5 "device-dax: fix sysfs attribute deadlock":
    Tested-by: Yi Zhang

    - commit 23f498448362 "libnvdimm: rework region badblocks clearing"
    Tested-by: Toshi Kani "

    * tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
    libnvdimm, pfn: fix 'npfns' vs section alignment
    libnvdimm: handle locked label storage areas
    libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
    brd: fix uninitialized use of brd->dax_dev
    block, dax: use correct format string in bdev_dax_supported
    device-dax: fix sysfs attribute deadlock
    libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
    libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
    libnvdimm: rework region badblocks clearing
    acpi, nfit: kill ACPI_NFIT_DEBUG
    libnvdimm: fix clear length of nvdimm_forget_poison()
    libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
    libnvdimm, region: sysfs trigger for nvdimm_flush()
    libnvdimm: fix phys_addr for nvdimm_clear_poison
    x86, dax, pmem: remove indirection around memcpy_from_pmem()
    block: remove block_device_operations ->direct_access()
    block, dax: convert bdev_dax_supported() to dax_direct_access()
    filesystem-dax: convert to dax_direct_access()
    Revert "block: use DAX for partition table reads"
    ext2, ext4, xfs: retrieve dax_device for iomap operations
    ...

    Linus Torvalds
     

26 Apr, 2017

1 commit


19 Apr, 2017

1 commit


05 Apr, 2017

1 commit

  • ext2_sync_fs() could be called without s_umount semaphore held when
    called through ext2_write_super() from __ext2_write_inode(). This
    function then calls dquot_writeback_dquots() which relies on s_umount to
    be held for protection against other quota operations.

    In fact __ext2_write_inode() does not need all the functionality
    ext2_write_super() provides. It is enough to just write the superblock.
    So use ext2_sync_super() instead.

    Fixes: 9d1ccbe70e0b14545caad12dc73adb3605447df0
    Reported-by: Jan Beulich
    Signed-off-by: Jan Kara

    Jan Kara
     

31 Jan, 2017

1 commit


27 Dec, 2016

1 commit

  • So far we did not return BH_New buffers from ext2_get_blocks() when we
    allocated and zeroed-out a block for DAX inode to avoid racy zeroing in
    DAX code. This zeroing is gone these days so we can remove the
    workaround.

    Reviewed-by: Ross Zwisler
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     

20 Dec, 2016

1 commit

  • Pull quota, fsnotify and ext2 updates from Jan Kara:
    "Changes to locking of some quota operations from dedicated quota mutex
    to s_umount semaphore, a fsnotify fix and a simple ext2 fix"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    quota: Fix bogus warning in dquot_disable()
    fsnotify: Fix possible use-after-free in inode iteration on umount
    ext2: reject inodes with negative size
    quota: Remove dqonoff_mutex
    ocfs2: Use s_umount for quota recovery protection
    quota: Remove dqonoff_mutex from dquot_scan_active()
    ocfs2: Protect periodic quota syncing with s_umount semaphore
    quota: Use s_umount protection for quota operations
    quota: Hold s_umount in exclusive mode when enabling / disabling quotas
    fs: Provide function to get superblock with exclusive s_umount

    Linus Torvalds
     

15 Dec, 2016

1 commit

  • Pull fs meta data unmap optimization from Jens Axboe:
    "A series from Jan Kara, providing a more efficient way for unmapping
    meta data from the buffer cache than doing it block-by-block.

    Provide a general helper that existing callers can use"

    * 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-block:
    fs: Remove unmap_underlying_metadata
    fs: Add helper to clean bdev aliases under a bh and use it
    ext2: Use clean_bdev_aliases() instead of iteration
    ext4: Use clean_bdev_aliases() instead of iteration
    direct-io: Use clean_bdev_aliases() instead of handmade iteration
    fs: Provide function to unmap metadata for a range of blocks

    Linus Torvalds
     

07 Dec, 2016

1 commit


21 Nov, 2016

1 commit


05 Nov, 2016

1 commit


18 Oct, 2016

1 commit

  • On ARM, we get this false-positive warning since the rework of
    the ext2_get_blocks interface:

    fs/ext2/inode.c: In function 'ext2_get_block':
    include/linux/buffer_head.h:340:16: error: 'bno' may be used uninitialized in this function [-Werror=maybe-uninitialized]

    The calling conventions for this function are rather complex, and it's
    not surprising that the compiler gets this wrong; I spent a long time
    trying to understand how it all fits together myself.

    This change to avoid the warning makes sure the compiler sees that we
    always set 'bno' pointer whenever we have a positive return code.
    The transformation is correct because we always arrive at the 'got_it'
    label with a positive count that gets used as the return value, while
    any branch to the 'cleanup' label has a negative or zero 'err'.

    Fixes: 6750ad71986d ("ext2: stop passing buffer_head to ext2_get_blocks")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Christoph Hellwig
    Cc: Dave Chinner
    Signed-off-by: Jan Kara

    Arnd Bergmann
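
The invariant the commit above relies on can be shown in a simplified model: the output block number is written on every path that returns a positive count, and the caller reads it only after seeing a positive count. The `model_*` names and the fixed 8-block layout are assumptions made for illustration; this is not the ext2 code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_FILE_BLOCKS 8UL     /* hypothetical file length in blocks */
#define MODEL_FIRST_BLOCK 100UL   /* hypothetical first on-disk block */

/*
 * Returns the number of mapped blocks (> 0) and sets *bno,
 * or returns zero or a negative error and leaves *bno untouched.
 */
int model_get_blocks(unsigned long iblock, unsigned long maxblocks,
                     unsigned long *bno)
{
    int err = 0;
    unsigned long count;

    assert(bno != NULL);
    if (maxblocks == 0) {
        err = -EINVAL;
        goto cleanup;
    }
    if (iblock >= MODEL_FILE_BLOCKS)
        goto cleanup;             /* hole: err stays 0 */

    count = MODEL_FILE_BLOCKS - iblock;
    if (count > maxblocks)
        count = maxblocks;

    /* Success path: *bno is assigned right next to the positive return. */
    *bno = MODEL_FIRST_BLOCK + iblock;
    return (int)count;

cleanup:
    return err;                   /* always zero or negative here */
}

/* Caller: reads bno only after a positive return value. */
int model_get_block(unsigned long iblock, unsigned long *out)
{
    unsigned long bno;            /* intentionally left uninitialised */
    int ret = model_get_blocks(iblock, 1, &bno);

    if (ret <= 0)
        return ret;
    *out = bno;
    return 0;
}
```

Keeping the assignment adjacent to the positive return makes the relationship visible to the compiler's flow analysis as well as to the reader.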
     

11 Oct, 2016

2 commits

  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     

06 Oct, 2016

1 commit

  • Pull xfs and iomap updates from Dave Chinner:
    "The main things in this update are the iomap-based DAX infrastructure,
    an XFS delalloc rework, and a chunk of fixes to how log recovery
    schedules writeback to prevent spurious corruption detections when
    recovery of certain items was not required.

    The other main chunk of code is some preparation for the upcoming
    reflink functionality. Most of it is generic and cleanups that stand
    alone, but they were ready and reviewed so are in this pull request.

    Speaking of reflink, I'm currently planning to send you another pull
    request next week containing all the new reflink functionality. I'm
    working through a similar process to the last cycle, where I sent the
    reverse mapping code in a separate request because of how large it
    was. The reflink code merge is even bigger than reverse mapping, so
    I'll be doing the same thing again....

    Summary for this update:

    - change of XFS mailing list to linux-xfs@vger.kernel.org

    - iomap-based DAX infrastructure w/ XFS and ext2 support

    - small iomap fixes and additions

    - more efficient XFS delayed allocation infrastructure based on iomap

    - a rework of log recovery writeback scheduling to ensure we don't
    fail recovery when trying to replay items that are already on disk

    - some preparation patches for upcoming reflink support

    - configurable error handling fixes and documentation

    - aio access time update race fixes for XFS and
    generic_file_read_iter"

    * tag 'xfs-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (40 commits)
    fs: update atime before I/O in generic_file_read_iter
    xfs: update atime before I/O in xfs_file_dio_aio_read
    ext2: fix possible integer truncation in ext2_iomap_begin
    xfs: log recovery tracepoints to track current lsn and buffer submission
    xfs: update metadata LSN in buffers during log recovery
    xfs: don't warn on buffers not being recovered due to LSN
    xfs: pass current lsn to log recovery buffer validation
    xfs: rework log recovery to submit buffers on LSN boundaries
    xfs: quiesce the filesystem after recovery on readonly mount
    xfs: remote attribute blocks aren't really userdata
    ext2: use iomap to implement DAX
    ext2: stop passing buffer_head to ext2_get_blocks
    xfs: use iomap to implement DAX
    xfs: refactor xfs_setfilesize
    xfs: take the ilock shared if possible in xfs_file_iomap_begin
    xfs: fix locking for DAX writes
    dax: provide an iomap based fault handler
    dax: provide an iomap based dax read/write path
    dax: don't pass buffer_head to copy_user_dax
    dax: don't pass buffer_head to dax_insert_mapping
    ...

    Linus Torvalds
     

03 Oct, 2016

1 commit


28 Sep, 2016

2 commits

  • CURRENT_TIME_SEC is not y2038 safe. current_time() will
    be transitioned to use 64 bit time along with vfs in a
    separate patch.
    There is no plan to transition CURRENT_TIME_SEC to use
    y2038 safe time interfaces.

    current_time() will also be extended to use superblock
    range checking parameters when range checking is introduced.

    This works because alloc_super() fills in the s_time_gran
    in super block to NSEC_PER_SEC.

    Signed-off-by: Deepa Dinamani
    Acked-by: Jan Kara
    Signed-off-by: Al Viro

    Deepa Dinamani
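
Why a default granularity of NSEC_PER_SEC yields whole-second timestamps can be seen in a small model of granularity truncation. The `model_*` types and functions below are hypothetical userspace stand-ins, not the kernel's current_time() implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for a seconds + nanoseconds timestamp. */
struct model_timespec {
    long long tv_sec;
    long tv_nsec;
};

/* Hypothetical superblock carrying only the timestamp granularity (ns). */
struct model_super_block { long s_time_gran; };

/* Round the nanosecond part down to a multiple of gran. */
struct model_timespec model_timespec_trunc(struct model_timespec t, long gran)
{
    assert(gran >= 1 && gran <= MODEL_NSEC_PER_SEC);
    if (gran == MODEL_NSEC_PER_SEC)
        t.tv_nsec = 0;                    /* whole seconds only */
    else if (gran > 1)
        t.tv_nsec -= t.tv_nsec % gran;    /* coarser than 1 ns */
    return t;
}

/* Model of a per-superblock "current time": the raw time, truncated. */
struct model_timespec model_current_time(const struct model_super_block *sb,
                                         struct model_timespec now)
{
    assert(sb != NULL);
    return model_timespec_trunc(now, sb->s_time_gran);
}
```

With `s_time_gran` left at `MODEL_NSEC_PER_SEC`, the nanosecond field is always zero, which matches what a seconds-only timestamp macro would have produced.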
     
  • When zeroing blocks for DAX allocations, we also have to unmap aliases
    in the block device mappings. Otherwise writeback can overwrite zeros
    with stale data from block device page cache.

    Signed-off-by: Jan Kara

    Jan Kara
     

22 Sep, 2016

1 commit

  • inode_change_ok() will be responsible for clearing capabilities and IMA
    extended attributes and as such will need dentry. Give it as an argument
    to inode_change_ok() instead of an inode. Also rename inode_change_ok()
    to setattr_prepare() to better reflect that it also does some
    modifications in addition to checks.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

19 Sep, 2016

2 commits


06 Jul, 2016

1 commit

  • This bug can be reproduced with fsfuzzer. Although I couldn't reproduce it
    in 100% of my tries, it is quite easily reproducible.

    During the deletion of an inode, ext2_xattr_delete_inode() does not check whether the
    block pointed to by EXT2_I(inode)->i_file_acl is a valid data block. This can
    lead to a deadlock when i_file_acl == 1 and the filesystem block size is 1024.

    In that situation, ext2_xattr_delete_inode() will load the superblock's buffer
    head (instead of a valid i_file_acl block) and then lock that buffer head,
    which ext2_sync_super() will also try to lock, making the filesystem deadlock with
    the following stack trace:

    root 17180 0.0 0.0 113660 660 pts/0 D+ 07:08 0:00 rmdir
    /media/test/dir1

    [] __sync_dirty_buffer+0xaf/0x100
    [] sync_dirty_buffer+0x13/0x20
    [] ext2_sync_super+0xb7/0xc0 [ext2]
    [] ext2_error+0x119/0x130 [ext2]
    [] ext2_free_blocks+0x83/0x350 [ext2]
    [] ext2_xattr_delete_inode+0x173/0x190 [ext2]
    [] ext2_evict_inode+0xc9/0x130 [ext2]
    [] evict+0xb3/0x180
    [] iput+0x1b8/0x240
    [] d_delete+0x11c/0x150
    [] vfs_rmdir+0xfe/0x120
    [] do_rmdir+0x17e/0x1f0
    [] SyS_rmdir+0x16/0x20
    [] entry_SYSCALL_64_fastpath+0x1a/0xa4
    [] 0xffffffffffffffff

    Fix this by using the same approach ext4 uses to test data blocks validity,
    implementing ext2_data_block_valid.

    Another possibility, when the superblock is very corrupted, is that i_file_acl
    is 1, block_count is 1 and first_data_block is 0. In such situations, we might
    have i_file_acl pointing to a 'valid' block, but still step over the superblock.
    The approach I used was to also test that the superblock is not in the range
    described by the ext2_data_block_valid() arguments.

    Signed-off-by: Carlos Maiolino
    Signed-off-by: Theodore Ts'o

    Carlos Maiolino
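
A minimal model of the kind of range check the commit above describes, covering both scenarios from the message. The struct, its field names and `model_data_block_valid()` are hypothetical, and the exact comparisons are an assumption based on the description rather than a copy of the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of superblock fields used by the check. */
struct model_sb {
    uint32_t first_data_block;  /* first block usable for data */
    uint32_t blocks_count;      /* total number of blocks */
    uint32_t sb_block;          /* block that holds the superblock */
};

/* Returns 1 if [start, start + count) looks like valid data blocks, else 0. */
int model_data_block_valid(const struct model_sb *sb, uint32_t start,
                           uint32_t count)
{
    assert(sb != NULL);

    /* Reject metadata area, arithmetic wrap-around, and out-of-range starts. */
    if (start <= sb->first_data_block ||
        start + count < start ||
        start > sb->blocks_count)
        return 0;

    /* Reject any range that would step over the superblock. */
    if (start <= sb->sb_block && start + count >= sb->sb_block)
        return 0;

    return 1;
}
```

With a 1024-byte block size (first data block 1, superblock in block 1), a pointer value of 1 fails the first test; in the corrupted case (block count 1, first data block 0) it passes the first test but is caught by the superblock test.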
     

27 May, 2016

1 commit

  • Pull misc DAX updates from Vishal Verma:
    "DAX error handling for 4.7

    - Until now, dax has been disabled if media errors were found on any
    device. This enables the use of DAX in the presence of these
    errors by making all sector-aligned zeroing go through the driver.

    - The driver (already) has the ability to clear errors on writes that
    are sent through the block layer using 'DSMs' defined in ACPI 6.1.

    Other misc changes:

    - When mounting DAX filesystems, check to make sure the partition is
    page aligned. This is a requirement for DAX, and previously, we
    allowed such unaligned mounts to succeed, but subsequent
    reads/writes would fail.

    - Misc/cleanup fixes from Jan that remove unused code from DAX
    related to zeroing, writeback, and some size checks"
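
The page-alignment requirement mentioned in the pull message above amounts to a simple arithmetic test on the partition's start offset. The sketch below assumes 512-byte sectors and 4 KiB pages; the function name and constants are hypothetical, and a real mount-time check may test more than this.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512u    /* assumed sector size in bytes */
#define MODEL_PAGE_SIZE   4096u   /* assumed page size in bytes */

/* Returns 1 if a partition starting at start_sector begins on a page boundary. */
int model_dax_partition_aligned(uint64_t start_sector)
{
    /* Guard against overflow when converting sectors to bytes. */
    assert(start_sector <= UINT64_MAX / MODEL_SECTOR_SIZE);
    return (start_sector * MODEL_SECTOR_SIZE) % MODEL_PAGE_SIZE == 0;
}
```

A partition starting at sector 2048 (1 MiB) passes, while the legacy start at sector 63 does not; rejecting the latter at mount time avoids the later read/write failures described above.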

    * tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    dax: fix a comment in dax_zero_page_range and dax_truncate_page
    dax: for truncate/hole-punch, do zeroing through the driver if possible
    dax: export a low-level __dax_zero_page_range helper
    dax: use sb_issue_zerout instead of calling dax_clear_sectors
    dax: enable dax in the presence of known media errors (badblocks)
    dax: fallback from pmd to pte on error
    block: Update blkdev_dax_capable() for consistency
    xfs: Add alignment check for DAX mount
    ext2: Add alignment check for DAX mount
    ext4: Add alignment check for DAX mount
    block: Add bdev_dax_supported() for dax mount checks
    block: Add vfs_msg() interface
    dax: Remove redundant inode size checks
    dax: Remove pointless writeback from dax_do_io()
    dax: Remove zeroing from dax_io()
    dax: Remove dead zeroing code from fault handlers
    ext2: Avoid DAX zeroing to corrupt data
    ext2: Fix block zeroing in ext2_get_blocks() for DAX
    dax: Remove complete_unwritten argument
    DAX: move RADIX_DAX_ definitions to dax.c

    Linus Torvalds
     

19 May, 2016

1 commit

  • dax_clear_sectors() cannot handle poisoned blocks. These must be
    zeroed using the BIO interface instead. Convert ext2 and XFS to use
    only sb_issue_zeroout().

    Reviewed-by: Jeff Moyer
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Matthew Wilcox
    [vishal: Also remove the dax_clear_sectors function entirely]
    Signed-off-by: Vishal Verma

    Matthew Wilcox
     

17 May, 2016

2 commits

  • Currently ext2 zeroes any data blocks allocated for a DAX inode; however, it
    still returns them as BH_New. Thus DAX code zeroes them again in
    dax_insert_mapping() which can possibly overwrite the data that has been
    already stored to those blocks by a racing dax_io(). Avoid marking
    pre-zeroed buffers as new.

    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Vishal Verma

    Jan Kara
     
  • When zeroing allocated blocks for DAX, we accidentally zeroed only the
    first allocated block instead of all of them. So far this problem is
    hidden by the fact that page faults always need only a single block and
    DAX write code zeroes blocks again. But the zeroing in DAX code is racy
    and needs to be removed so fix the zeroing in ext2 to zero all allocated
    blocks.

    Reported-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Vishal Verma

    Jan Kara
     

02 May, 2016

1 commit


28 Feb, 2016

3 commits

  • Previously calls to dax_writeback_mapping_range() for all DAX filesystems
    (ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().

    dax_writeback_mapping_range() needs a struct block_device, and it used
    to get that from inode->i_sb->s_bdev. This is correct for normal inodes
    mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
    block devices and for XFS real-time files.

    Instead, call dax_writeback_mapping_range() directly from the filesystem
    ->writepages function so that it can supply us with a valid block
    device. This also fixes DAX code to properly flush caches in response
    to sync(2).

    Signed-off-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Cc: Al Viro
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • dax_clear_blocks() needs a valid struct block_device and previously it
    was using inode->i_sb->s_bdev in all cases. This is correct for normal
    inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
    DAX raw block devices and for XFS real-time devices.

    Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
    its arguments to take a bdev and a sector instead of an inode and a
    block. This better reflects what the function does, and it allows the
    filesystem and raw block device code to pass in an appropriate struct
    block_device.

    Signed-off-by: Ross Zwisler
    Suggested-by: Dan Williams
    Reviewed-by: Jan Kara
    Cc: Theodore Ts'o
    Cc: Al Viro
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
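
Moving from an (inode, block) interface to a (bdev, sector) interface means the caller converts filesystem blocks into 512-byte sectors and names the device explicitly. A hedged userspace sketch, with a memory buffer standing in for the device and hypothetical `model_*` names throughout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SECTOR_SHIFT 9   /* 512-byte sectors */

/* Hypothetical block device backed by plain memory. */
struct model_bdev {
    unsigned char *mem;
    uint64_t nr_sectors;
};

/* Convert a filesystem block number to a sector, given log2(block size). */
uint64_t model_block_to_sector(uint64_t block, unsigned int blkbits)
{
    assert(blkbits >= MODEL_SECTOR_SHIFT && blkbits < 64);
    return block << (blkbits - MODEL_SECTOR_SHIFT);
}

/* Zero `bytes` bytes starting at `sector` on the given device. */
int model_clear_sectors(struct model_bdev *bdev, uint64_t sector,
                        uint64_t bytes)
{
    uint64_t off, cap;

    assert(bdev != NULL && bdev->mem != NULL);
    off = sector << MODEL_SECTOR_SHIFT;
    cap = bdev->nr_sectors << MODEL_SECTOR_SHIFT;
    if (off > cap || bytes > cap - off)
        return -1;                 /* range falls outside the device */
    memset(bdev->mem + off, 0, (size_t)bytes);
    return 0;
}
```

Because the device is an explicit argument, a caller with more than one candidate device (as with the real-time device case mentioned above) can pass the correct one instead of relying on a default.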
     
  • When S_DAX is set on an inode we assume that if there are pages attached
    to the mapping (mapping->nrpages != 0), those pages are clean zero pages
    that were used to service reads from holes. Any dirty data associated
    with the inode should be in the form of DAX exceptional entries
    (mapping->nrexceptional) that is written back via
    dax_writeback_mapping_range().

    With the current code, though, this isn't always true. For example,
    ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
    data stored as dirty page cache entries. For these types of inodes,
    having S_DAX set doesn't really make sense since their I/O doesn't
    actually happen through the DAX code path.

    Instead, only allow S_DAX to be set for regular inodes for ext2 and
    ext4. This allows us to have strict DAX vs non-DAX paths in the
    writeback code.

    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Theodore Ts'o
    Cc: Al Viro
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
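
The rule introduced by the commit above, S_DAX only on regular files, can be modelled in a few lines. The mode constants mirror the conventional POSIX file-type bits, but the `MODEL_*` names, the inode struct and the function are hypothetical and only illustrate the decision.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional file-type bits, redefined here to keep the sketch standalone. */
#define MODEL_IFMT  0170000u
#define MODEL_IFREG 0100000u
#define MODEL_IFDIR 0040000u

#define MODEL_S_DAX 0x1u        /* hypothetical per-inode DAX flag */

struct model_inode {
    unsigned int mode;          /* file type and permission bits */
    unsigned int flags;         /* MODEL_S_DAX or 0 */
};

/* Set the DAX flag only when the mount uses DAX and the inode is a regular file. */
void model_set_inode_flags(struct model_inode *inode, int mount_has_dax)
{
    assert(inode != NULL);
    inode->flags &= ~MODEL_S_DAX;
    if (mount_has_dax && (inode->mode & MODEL_IFMT) == MODEL_IFREG)
        inode->flags |= MODEL_S_DAX;
}
```

Directories on a DAX mount therefore keep using the page cache path, so writeback code can treat a set flag as meaning that dirty data lives only in DAX entries.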
     

09 Dec, 2015

1 commit

  • kmap() in page_follow_link_light() needed to go - allowing an arbitrary
    number of kmaps to be held for a long time is a great way to deadlock
    the system.

    new helper (inode_nohighmem(inode)) needs to be used for pagecache
    symlink inodes; done for all in-tree cases. page_follow_link_light()
    instrumented to yell about anything missed.

    Signed-off-by: Al Viro

    Al Viro
     

19 Oct, 2015

1 commit

  • Add locking to ensure that DAX faults are isolated from ext2 operations
    that modify the data blocks allocation for an inode. This is intended to
    be analogous to the work being done in XFS by Dave Chinner:

    http://www.spinics.net/lists/linux-fsdevel/msg90260.html

    Compared with XFS the ext2 case is greatly simplified by the fact that ext2
    already allocates and zeros new blocks before they are returned as part of
    ext2_get_block(), so DAX doesn't need to worry about getting unmapped or
    unwritten buffer heads.

    This means that the only work we need to do in ext2 is to isolate the DAX
    faults from inode block allocation changes. I believe this just means that
    we need to isolate the DAX faults from truncate operations.

    The newly introduced dax_sem is intended to replicate the protection
    offered by i_mmaplock in XFS. In addition to truncate the i_mmaplock also
    protects XFS operations like hole punching, fallocate down, extent
    manipulation IOCTLS like xfs_ioc_space() and extent swapping. Truncate is
    the only one of these operations supported by ext2.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Jan Kara

    Ross Zwisler
     

09 Sep, 2015

1 commit

  • In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to
    return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is
    defined in linux/mm.h. Given that we don't want to include <linux/mm.h>
    in <linux/fs.h>, the easiest solution is to move the DAX-related
    functions to a new header, <linux/dax.h>. We could also have moved
    VM_FAULT_* definitions to a new header, or a different header that isn't
    quite such a boil-the-ocean header as <linux/mm.h>, but this felt like
    the best option.

    Signed-off-by: Matthew Wilcox
    Cc: Hillf Danton
    Cc: "Kirill A. Shutemov"
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

24 Jul, 2015

1 commit


11 May, 2015

1 commit