18 Aug, 2018

1 commit

  • This patch is reworked from an earlier patch that Dan has posted:
    https://patchwork.kernel.org/patch/10131727/

    VM_MIXEDMAP is used by dax to indicate to mm paths like vm_normal_page()
    that the memory page it is dealing with is not typical memory from the
    linear map. The get_user_pages_fast() path, since it does not resolve the
    vma, is already using {pte,pmd}_devmap() as a stand-in for VM_MIXEDMAP, so
    we use that as a VM_MIXEDMAP replacement in some locations. In the cases
    where there is no pte to consult we fall back to using vma_is_dax() to
    detect the VM_MIXEDMAP special case.

    Now that we have explicit driver pfn_t-flag opt-in/opt-out for
    get_user_pages() support for DAX we can stop setting VM_MIXEDMAP. This
    also means we no longer need to worry about safely manipulating vm_flags
    in a future where we support dynamically changing the dax mode of a
    file.

    DAX should also now be supported with madvise_behavior(), vma_merge(),
    and copy_page_range().

    This patch has been tested against ndctl unit test. It has also been
    tested against xfstests commit: 625515d using fake pmem created by
    memmap and no additional issues have been observed.

    Link: http://lkml.kernel.org/r/152847720311.55924.16999195879201817653.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Acked-by: Dan Williams
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     

08 Jan, 2018

1 commit

  • Ext4 needs to pass through error from its iomap handler to the page
    fault handler so that it can properly detect ENOSPC and force
    transaction commit and retry the fault (and block allocation). Add
    argument to dax_iomap_fault() for passing such error.

    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

18 Nov, 2017

1 commit

  • Pull libnvdimm and dax updates from Dan Williams:
    "Save for a few late fixes, all of these commits have shipped in -next
    releases since before the merge window opened, and 0day has given a
    build success notification.

    The ext4 touches came from Jan, and the xfs touches have Darrick's
    reviewed-by. An xfstest for the MAP_SYNC feature has been through
    a few rounds of reviews and is on track to be merged.

    - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable
    'userspace flush' of persistent memory updates via filesystem-dax
    mappings. It arranges for any filesystem metadata updates that may
    be required to satisfy a write fault to also be flushed ("on disk")
    before the kernel returns to userspace from the fault handler.
    Effectively every write-fault that dirties metadata completes an
    fsync() before returning from the fault handler. The new
    MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag
    is validated as supported by the filesystem's ->mmap() file
    operation.

    - Add support for the standard ACPI 6.2 label access methods that
    replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods.
    This enables interoperability with environments that only implement
    the standardized methods.

    - Add support for the ACPI 6.2 NVDIMM media error injection methods.

    - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for
    latch last shutdown status, firmware update, SMART error injection,
    and SMART alarm threshold control.

    - Cleanup physical address information disclosures to be root-only.

    - Fix revalidation of the DIMM "locked label area" status to support
    dynamic unlock of the label area.

    - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA
    (system-physical-address) command and error injection commands.

    Acknowledgements that came after the commits were pushed to -next:

    - 957ac8c421ad ("dax: fix PMD faults on zero-length files"):
    Reviewed-by: Ross Zwisler

    - a39e596baa07 ("xfs: support for synchronous DAX faults") and
    7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()")
    Reviewed-by: Darrick J. Wong "

    * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits)
    acpi, nfit: add 'Enable Latch System Shutdown Status' command support
    dax: fix general protection fault in dax_alloc_inode
    dax: fix PMD faults on zero-length files
    dax: stop requiring a live device for dax_flush()
    brd: remove dax support
    dax: quiet bdev_dax_supported()
    fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core
    tools/testing/nvdimm: unit test clear-error commands
    acpi, nfit: validate commands against the device type
    tools/testing/nvdimm: stricter bounds checking for error injection commands
    xfs: support for synchronous DAX faults
    xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()
    ext4: Support for synchronous DAX faults
    ext4: Simplify error handling in ext4_dax_huge_fault()
    dax: Implement dax_finish_sync_fault()
    dax, iomap: Add support for synchronous faults
    mm: Define MAP_SYNC and VM_SYNC flags
    dax: Allow tuning whether dax_insert_mapping_entry() dirties entry
    dax: Allow dax_iomap_fault() to return pfn
    dax: Fix comment describing dax_iomap_fault()
    ...

    Linus Torvalds
     

03 Nov, 2017

1 commit

  • For synchronous page fault dax_iomap_fault() will need to return PFN
    which will then need to be inserted into page tables after fsync()
    completes. Add necessary parameter to dax_iomap_fault().

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.
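    Concretely, the identifier added by this work is the first line of each
    touched file; for a C source file it is a single line comment (header and
    uapi files use the /* */ comment form instead):

```c
// SPDX-License-Identifier: GPL-2.0
```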

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

07 Sep, 2017

1 commit

  • When servicing mmap() reads from file holes the current DAX code
    allocates a page cache page of all zeroes and places the struct page
    pointer in the mapping->page_tree radix tree.

    This has three major drawbacks:

    1) It consumes memory unnecessarily. For every 4k page that is read via
    a DAX mmap() over a hole, we allocate a new page cache page. This
    means that if you read 1GiB worth of pages, you end up using 1GiB of
    zeroed memory. This is easily visible by looking at the overall
    memory consumption of the system or by looking at /proc/[pid]/smaps:

    7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12 /root/dax/data
    Size: 1048576 kB
    Rss: 1048576 kB
    Pss: 1048576 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 1048576 kB
    Private_Dirty: 0 kB
    Referenced: 1048576 kB
    Anonymous: 0 kB
    LazyFree: 0 kB
    AnonHugePages: 0 kB
    ShmemPmdMapped: 0 kB
    Shared_Hugetlb: 0 kB
    Private_Hugetlb: 0 kB
    Swap: 0 kB
    SwapPss: 0 kB
    KernelPageSize: 4 kB
    MMUPageSize: 4 kB
    Locked: 0 kB

    2) It is slower than using a common zero page because each page fault
    has more work to do. Instead of just inserting a common zero page we
    have to allocate a page cache page, zero it, and then insert it. Here
    are the average latencies of dax_load_hole() as measured by ftrace on
    a random test box:

    Old method, using zeroed page cache pages: 3.4 us
    New method, using the common 4k zero page: 0.8 us

    This was the average latency over 1 GiB of sequential reads done by
    this simple fio script:

    [global]
    size=1G
    filename=/root/dax/data
    fallocate=none
    [io]
    rw=read
    ioengine=mmap

    3) The fact that we had to check for both DAX exceptional entries and
    for page cache pages in the radix tree made the DAX code more
    complex.

    Solve these issues by following the lead of the DAX PMD code and using a
    common 4k zero page instead. As with the PMD code we will now insert a
    DAX exceptional entry into the radix tree instead of a struct page
    pointer which allows us to remove all the special casing in the DAX
    code.

    Note that we do still pretty aggressively check for regular pages in the
    DAX radix tree, especially where we take action based on the bits set in
    the page. If we ever find a regular page in our radix tree now that
    most likely means that someone besides DAX is inserting pages (which has
    happened lots of times in the past), and we want to find that out early
    and fail loudly.

    This solution also removes the extra memory consumption. Here is that
    same /proc/[pid]/smaps after 1GiB of reading from a hole with the new
    code:

    7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12 /root/dax/data
    Size: 1048576 kB
    Rss: 0 kB
    Pss: 0 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 0 kB
    Private_Dirty: 0 kB
    Referenced: 0 kB
    Anonymous: 0 kB
    LazyFree: 0 kB
    AnonHugePages: 0 kB
    ShmemPmdMapped: 0 kB
    Shared_Hugetlb: 0 kB
    Private_Hugetlb: 0 kB
    Swap: 0 kB
    SwapPss: 0 kB
    KernelPageSize: 4 kB
    MMUPageSize: 4 kB
    Locked: 0 kB

    Overall system memory consumption is similarly improved.

    Another major change is that we remove dax_pfn_mkwrite() from our fault
    flow, and instead rely on the page fault itself to make the PTE dirty
    and writeable. The following description from the patch adding the
    vm_insert_mixed_mkwrite() call explains this a little more:

    "To be able to use the common 4k zero page in DAX we need to have our
    PTE fault path look more like our PMD fault path where a PTE entry
    can be marked as dirty and writeable as it is first inserted rather
    than waiting for a follow-up dax_pfn_mkwrite() =>
    finish_mkwrite_fault() call.

    Right now we can rely on having a dax_pfn_mkwrite() call because we
    can distinguish between these two cases in do_wp_page():

    case 1: 4k zero page => writable DAX storage
    case 2: read-only DAX storage => writeable DAX storage

    This distinction is made via vm_normal_page(). vm_normal_page()
    returns false for the common 4k zero page, though, just as it does
    for DAX ptes. Instead of special casing the DAX + 4k zero page case
    we will simplify our DAX PTE page fault sequence so that it matches
    our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.
    We will instead use dax_iomap_fault() to handle write-protection
    faults.

    This means that insert_pfn() needs to follow the lead of
    insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If
    'mkwrite' is set insert_pfn() will do the work that was previously
    done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path"

    Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: "Darrick J. Wong"
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andreas Dilger
    Cc: Christoph Hellwig
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Ingo Molnar
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: Steven Rostedt
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     

06 Jul, 2017

1 commit

    ext2 currently does a test+clear of the AS_EIO flag, which is
    problematic for some coming changes.

    What we really need to do instead is call filemap_check_errors
    in __generic_file_fsync after syncing out the buffers. That
    will be sufficient for this case, and help other callers detect
    these errors properly as well.

    With that, we don't need to twiddle it in ext2.

    Suggested-by: Jan Kara
    Signed-off-by: Jeff Layton
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Matthew Wilcox

    Jeff Layton
     

25 Feb, 2017

3 commits

    Since the introduction of FAULT_FLAG_SIZE to the vm_fault flags, it has
    been somewhat painful getting the flags set and removed at the
    correct locations. More than one kernel oops was introduced due to the
    difficulty of getting the placement correct.

    Remove the flag values and introduce an input parameter to huge_fault
    that indicates the size of the page entry. This makes the code easier
    to trace and should avoid the issues we see with the fault flags where
    removal of the flag was necessary in the fallback paths.

    Link: http://lkml.kernel.org/r/148615748258.43180.1690152053774975329.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Tested-by: Dan Williams
    Reviewed-by: Jan Kara
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Nilesh Choudhury
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • Patch series "1G transparent hugepage support for device dax", v2.

    The following series implements support for 1G transparent hugepages on
    x86 for device dax. The bulk of the code was written by Matthew Wilcox a
    while back supporting transparent 1G hugepages for fs DAX. I have
    forward-ported the relevant bits to 4.10-rc. The current submission has
    only the necessary code to support device DAX.

    Comments from Dan Williams: So the motivation and intended user of this
    functionality mirrors the motivation and users of 1GB page support in
    hugetlbfs. Given expected capacities of persistent memory devices an
    in-memory database may want to reduce tlb pressure beyond what they can
    already achieve with 2MB mappings of a device-dax file. We have
    customer feedback to that effect as Willy mentioned in his previous
    version of these patches [1].

    [1]: https://lkml.org/lkml/2016/1/31/52

    Comments from Nilesh @ Oracle:

    There are applications which have a process model; and if you assume
    10,000 processes attempting to mmap all the 6TB memory available on a
    server; we are looking at the following:

    processes : 10,000
    memory : 6TB
    pte @ 4k page size: 6TB / 4k = 1.5G entries * 8 bytes * 10,000 processes = 120,000GB
    pmd @ 2M page size: 120,000GB / 512 = ~240GB
    pud @ 1G page size: 240GB / 512 = ~480MB

    As you can see with 2M pages, this system will use up an exorbitant
    amount of DRAM to hold the page tables; but the 1G pages finally brings
    it down to a reasonable level. Memory sizes will keep increasing; so
    this number will keep increasing.

    An argument can be made to convert the applications from process model
    to thread model, but in the real world that may not be always practical.
    Hopefully this helps explain the use case where this is valuable.

    This patch (of 3):

    In preparation for adding the ability to handle PUD pages, convert
    vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault. The
    vm_fault structure is extended to include a union of the different page
    table pointers that may be needed, and three flag bits are reserved to
    indicate which type of pointer is in the union.

    [ross.zwisler@linux.intel.com: remove unused function ext4_dax_huge_fault()]
    Link: http://lkml.kernel.org/r/1485813172-7284-1-git-send-email-ross.zwisler@linux.intel.com
    [dave.jiang@intel.com: clear PMD or PUD size flags when in fall through path]
    Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com
    Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Matthew Wilcox
    Signed-off-by: Dave Jiang
    Signed-off-by: Ross Zwisler
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Jan Kara
    Cc: Dan Williams
    Cc: Kirill A. Shutemov
    Cc: Nilesh Choudhury
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Dave Jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     
  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take a vma and vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     

08 Nov, 2016

2 commits

  • The recently added DAX functions that use the new struct iomap data
    structure were named iomap_dax_rw(), iomap_dax_fault() and
    iomap_dax_actor(). These are actually defined in fs/dax.c, though, so
    should be part of the "dax" namespace and not the "iomap" namespace.
    Rename them to dax_iomap_rw(), dax_iomap_fault() and dax_iomap_actor()
    respectively.

    Signed-off-by: Ross Zwisler
    Suggested-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Dave Chinner

    Ross Zwisler
     
  • DAX PMD support was added via the following commit:

    commit e7b1ea2ad658 ("ext2: huge page fault support")

    I believe this path to be untested as ext2 doesn't reliably provide block
    allocations that are aligned to 2MiB. In my testing I've been unable to
    get ext2 to actually fault in a PMD. It always fails with a "pfn
    unaligned" message because the sector returned by ext2_get_block() isn't
    aligned.

    I've tried various settings for the "stride" and "stripe_width" extended
    options to mkfs.ext2, without any luck.

    Since we can't reliably get PMDs, remove support so that we don't have an
    untested code path that we may someday traverse when we happen to get an
    aligned block allocation. This should also make 4k DAX faults in ext2 a
    bit faster since they will no longer have to call the PMD fault handler
    only to get a response of VM_FAULT_FALLBACK.

    Signed-off-by: Ross Zwisler
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Dave Chinner

    Ross Zwisler
     

11 Oct, 2016

1 commit

  • Pull vfs xattr updates from Al Viro:
    "xattr stuff from Andreas

    This completes the switch to xattr_handler ->get()/->set() from
    ->getxattr/->setxattr/->removexattr"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Remove {get,set,remove}xattr inode operations
    xattr: Stop calling {get,set,remove}xattr inode operations
    vfs: Check for the IOP_XATTR flag in listxattr
    xattr: Add __vfs_{get,set,remove}xattr helpers
    libfs: Use IOP_XATTR flag for empty directory handling
    vfs: Use IOP_XATTR flag for bad-inode handling
    vfs: Add IOP_XATTR inode operations flag
    vfs: Move xattr_resolve_name to the front of fs/xattr.c
    ecryptfs: Switch to generic xattr handlers
    sockfs: Get rid of getxattr iop
    sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
    kernfs: Switch to generic xattr handlers
    hfs: Switch to generic xattr handlers
    jffs2: Remove jffs2_{get,set,remove}xattr macros
    xattr: Remove unnecessary NULL attribute name check

    Linus Torvalds
     

08 Oct, 2016

2 commits

  • These inode operations are no longer used; remove them.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • To support DAX pmd mappings with unmodified applications, filesystems
    need to align an mmap address by the pmd size.

    Call thp_get_unmapped_area() from f_op->get_unmapped_area.

    Note, there is no change in behavior for a non-DAX file.

    Link: http://lkml.kernel.org/r/1472497881-9323-3-git-send-email-toshi.kani@hpe.com
    Signed-off-by: Toshi Kani
    Cc: Dan Williams
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Kirill A. Shutemov
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Theodore Ts'o
    Cc: Andreas Dilger
    Cc: Mike Kravetz
    Cc: "Kirill A. Shutemov"
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     

27 Jul, 2016

1 commit

  • Remove the unused wrappers dax_fault() and dax_pmd_fault(). After this
    removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and
    dax_pmd_fault() respectively, and update all callers.

    The dax_fault() and dax_pmd_fault() wrappers were initially intended to
    capture some filesystem independent functionality around page faults
    (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime
    and ctime).

    However, the following commits:

    5726b27b09cc ("ext2: Add locking for DAX faults")
    ea3d7209ca01 ("ext4: fix races between page faults and hole punching")

    added locking to the ext2 and ext4 filesystems after these common
    operations but before __dax_fault() and __dax_pmd_fault() were called.
    This means that these wrappers are no longer used, and are unlikely to
    be used in the future.

    XFS has had locking analogous to what was recently added to ext2 and
    ext4 since DAX support was initially introduced by:

    6b698edeeef0 ("xfs: add DAX file operations support")

    Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andreas Dilger
    Cc: Dan Williams
    Cc: Dave Chinner
    Reviewed-by: Jan Kara
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     

17 May, 2016

1 commit

  • Fault handlers currently take complete_unwritten argument to convert
    unwritten extents after PTEs are updated. However no filesystem uses
    this anymore as the code is racy. Remove the unused argument.

    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Vishal Verma

    Jan Kara
     

28 Feb, 2016

1 commit

  • As it is currently written ext4_dax_mkwrite() assumes that the call into
    __dax_mkwrite() will not have to do a block allocation so it doesn't create
    a journal entry. For a read that creates a zero page to cover a hole
    followed by a write that actually allocates storage this is incorrect. The
    ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
    get_blocks() to allocate storage.

    Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
    as this function already has all the logic needed to allocate a journal
    entry and call __dax_fault().

    Also update the ext2 fault handlers in this same way to remove duplicate
    code and keep the logic between ext2 and ext4 the same.

    Reviewed-by: Jan Kara
    Signed-off-by: Ross Zwisler
    Signed-off-by: Theodore Ts'o

    Ross Zwisler
     

23 Jan, 2016

1 commit

  • To properly support the new DAX fsync/msync infrastructure filesystems
    need to call dax_pfn_mkwrite() so that DAX can track when user pages are
    dirtied.

    Signed-off-by: Ross Zwisler
    Cc: "H. Peter Anvin"
    Cc: "J. Bruce Fields"
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andreas Dilger
    Cc: Dave Chinner
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jeff Layton
    Cc: Matthew Wilcox
    Cc: Thomas Gleixner
    Cc: Dan Williams
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     

19 Oct, 2015

1 commit

  • Add locking to ensure that DAX faults are isolated from ext2 operations
    that modify the data blocks allocation for an inode. This is intended to
    be analogous to the work being done in XFS by Dave Chinner:

    http://www.spinics.net/lists/linux-fsdevel/msg90260.html

    Compared with XFS the ext2 case is greatly simplified by the fact that ext2
    already allocates and zeros new blocks before they are returned as part of
    ext2_get_block(), so DAX doesn't need to worry about getting unmapped or
    unwritten buffer heads.

    This means that the only work we need to do in ext2 is to isolate the DAX
    faults from inode block allocation changes. I believe this just means that
    we need to isolate the DAX faults from truncate operations.

    The newly introduced dax_sem is intended to replicate the protection
    offered by i_mmaplock in XFS. In addition to truncate the i_mmaplock also
    protects XFS operations like hole punching, fallocate down, extent
    manipulation IOCTLS like xfs_ioc_space() and extent swapping. Truncate is
    the only one of these operations supported by ext2.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Jan Kara

    Ross Zwisler
     

09 Sep, 2015

2 commits

  • Use DAX to provide support for huge pages.

    Signed-off-by: Matthew Wilcox
    Cc: Hillf Danton
    Cc: "Kirill A. Shutemov"
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
    In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to
    return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is
    defined in linux/mm.h. Given that we don't want to include <linux/dax.h>
    in <linux/mm.h>, the easiest solution is to move the DAX-related
    functions to a new header, <linux/dax.h>. We could also have moved the
    VM_FAULT_* definitions to a new header, or a different header that isn't
    quite such a boil-the-ocean header as <linux/mm.h>, but this felt like
    the best option.

    Signed-off-by: Matthew Wilcox
    Cc: Hillf Danton
    Cc: "Kirill A. Shutemov"
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

04 Jun, 2015

1 commit

  • dax_fault() currently relies on the get_block callback to attach an
    io completion callback to the mapping buffer head so that it can
    run unwritten extent conversion after zeroing allocated blocks.

    Instead of this hack, pass the conversion callback directly into
    dax_fault() similar to the get_block callback. When the filesystem
    allocates unwritten extents, it will set the buffer_unwritten()
    flag, and hence the dax_fault code can call the completion function
    in the contexts where it is necessary without overloading the
    mapping buffer head.

    Note: The changes to ext4 to use this interface are suspect at best.
    In fact, the way ext4 did this end_io assignment in the first place
    looks suspect because it only set a completion callback when there
    wasn't already some other write() call taking place on the same
    inode. The ext4 end_io code looks rather intricate and fragile with
    all its reference counting and passing to different contexts for
    modification via inode private pointers that aren't protected by
    locks...

    Signed-off-by: Dave Chinner
    Acked-by: Jan Kara
    Signed-off-by: Dave Chinner

    Dave Chinner
     

16 Apr, 2015

3 commits

  • Merge second patchbomb from Andrew Morton:

    - the rest of MM

    - various misc bits

    - add ability to run /sbin/reboot at reboot time

    - printk/vsprintf changes

    - fiddle with seq_printf() return value

    * akpm: (114 commits)
    parisc: remove use of seq_printf return value
    lru_cache: remove use of seq_printf return value
    tracing: remove use of seq_printf return value
    cgroup: remove use of seq_printf return value
    proc: remove use of seq_printf return value
    s390: remove use of seq_printf return value
    cris fasttimer: remove use of seq_printf return value
    cris: remove use of seq_printf return value
    openrisc: remove use of seq_printf return value
    ARM: plat-pxa: remove use of seq_printf return value
    nios2: cpuinfo: remove use of seq_printf return value
    microblaze: mb: remove use of seq_printf return value
    ipc: remove use of seq_printf return value
    rtc: remove use of seq_printf return value
    power: wakeup: remove use of seq_printf return value
    x86: mtrr: if: remove use of seq_printf return value
    linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK
    MAINTAINERS: CREDITS: remove Stefano Brivio from B43
    .mailmap: add Ricardo Ribalda
    CREDITS: add Ricardo Ribalda Delgado
    ...

    Linus Torvalds
     
  • The original dax patchset split the ext2/4_file_operations because of the
    two NULL splice_read/splice_write in the dax case.

    In the vfs if splice_read/splice_write are NULL we then call
    default_splice_read/write.

    What we do here is make generic_file_splice_read aware of IS_DAX() so the
    original ext2/4_file_operations can be used as is.

    For write it appears that iter_file_splice_write is just fine. It uses
    the regular f_op->write(file,..) or new_sync_write(file, ...).

    Signed-off-by: Boaz Harrosh
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Matthew Wilcox
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Boaz Harrosh
     
  • From: Yigal Korman

    [v1]
    Without this patch, c/mtime is not updated correctly when mmap'ed page is
    first read from and then written to.

    A new xfstest is submitted for testing this (generic/080)

    [v2]
    Jan Kara has pointed out that if we add the
    sb_start/end_pagefault pair in the new pfn_mkwrite we
    are then also fixing another bug: a user could start
    writing to the page while the filesystem is frozen.

    Signed-off-by: Yigal Korman
    Signed-off-by: Boaz Harrosh
    Reviewed-by: Jan Kara
    Cc: Matthew Wilcox
    Cc: Dave Chinner
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Kirill A. Shutemov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Boaz Harrosh
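A pared-down sketch of that control flow, with mocked stand-ins for sb_start_pagefault()/file_update_time() (the real sb_start_pagefault blocks until the fs is thawed rather than failing):

```c
#include <assert.h>
#include <stdbool.h>

/* Mocked types; only the ordering of the calls matters here. */
struct super_block { bool frozen; int pagefaults_in_flight; };
struct file { struct super_block *sb; bool time_updated; };

static bool sb_start_pagefault(struct super_block *sb)
{
    if (sb->frozen)
        return false;            /* mock: the real call waits instead */
    sb->pagefaults_in_flight++;
    return true;
}

static void sb_end_pagefault(struct super_block *sb)
{
    sb->pagefaults_in_flight--;
}

/* First write fault on a page mapped read-only: bump c/mtime, and
 * never dirty anything while the filesystem is frozen. */
static int dax_pfn_mkwrite(struct file *file)
{
    if (!sb_start_pagefault(file->sb))
        return -1;
    file->time_updated = true;   /* file_update_time() in the kernel */
    sb_end_pagefault(file->sb);
    return 0;
}
```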
     

12 Apr, 2015

1 commit

  • All places outside of core VFS that checked ->read and ->write for being NULL or
    called the methods directly are gone now, so NULL {read,write} with non-NULL
    {read,write}_iter will do the right thing in all cases.

    Signed-off-by: Al Viro

    Al Viro
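The dispatch rule can be sketched with mocked signatures; the real logic lives in __vfs_read()/__vfs_write() in fs/read_write.c, and everything below is illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified file_operations: only the two read hooks, no struct file. */
struct file_operations {
    long (*read)(const char *buf, size_t len);
    long (*read_iter)(const char *buf, size_t len);
};

/* NULL ->read with non-NULL ->read_iter routes through the iter path
 * (new_sync_read() in the kernel); neither hook means -EINVAL. */
static long vfs_read_dispatch(const struct file_operations *fops,
                              const char *buf, size_t len)
{
    if (fops->read)
        return fops->read(buf, len);
    if (fops->read_iter)
        return fops->read_iter(buf, len);
    return -22;                  /* -EINVAL */
}

static long iter_only_read(const char *buf, size_t len)
{
    (void)buf;
    return (long)len;            /* toy: claim the whole read */
}
```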
     

17 Feb, 2015

4 commits

  • To help people transition, accept the 'xip' mount option (and report it in
    /proc/mounts), but print a message encouraging people to switch over to
    the 'dax' option.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Mathieu Desnoyers
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
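The transition logic might look like this sketch; the option names and message text are illustrative, not the exact ext2 parser:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OPT_DAX 0x1              /* illustrative flag, not the real one */

/* 'xip' is still accepted but simply maps onto the dax flag, with a
 * one-line nudge toward the new name. */
static unsigned parse_mount_opt(const char *opt, unsigned flags)
{
    if (strcmp(opt, "dax") == 0) {
        flags |= OPT_DAX;
    } else if (strcmp(opt, "xip") == 0) {
        fprintf(stderr, "xip option is deprecated, use dax instead\n");
        flags |= OPT_DAX;        /* behaves exactly like 'dax' */
    }
    return flags;
}
```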
     
  • The fewer Kconfig options we have the better. Use the generic
    CONFIG_FS_DAX to enable XIP support in ext2 as well as in the core.

    Signed-off-by: Matthew Wilcox
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
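A consolidated option could be expressed as a Kconfig fragment along these lines; the prompt text and dependencies are illustrative, not the exact fs/Kconfig entry:

```kconfig
# Hedged sketch of the consolidated option replacing per-filesystem
# XIP switches such as EXT2_FS_XIP.
config FS_DAX
	bool "Direct Access (DAX) support"
	depends on MMU
	help
	  Map file data directly into process address space, bypassing
	  the page cache. Used by filesystems on byte-addressable
	  storage such as persistent memory.
```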
     
  • Instead of calling aops->get_xip_mem from the fault handler, the
    filesystem passes a get_block_t that is used to find the appropriate
    blocks.

    This requires that all architectures implement copy_user_page(). At the
    time of writing, mips and arm do not. Patches exist and are in progress.

    [akpm@linux-foundation.org: remap_file_pages went away]
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Jan Kara
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Russell King
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
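The shape of the change can be sketched with pared-down types; the real get_block_t signature takes an inode and a create flag, and the real fault path converts the block to a pfn, so this mock keeps only the lookup step:

```c
#include <assert.h>

typedef unsigned long sector_t;
struct buffer_head { sector_t b_blocknr; int mapped; };
typedef int (*get_block_t)(sector_t iblock, struct buffer_head *bh);

/* The fault handler asks the filesystem for the block via get_block
 * instead of a special aops->get_xip_mem hook. */
static int dax_fault(sector_t iblock, get_block_t get_block,
                     sector_t *pfn_out)
{
    struct buffer_head bh = { 0, 0 };
    if (get_block(iblock, &bh) || !bh.mapped)
        return -1;               /* hole or error: fall back / SIGBUS */
    *pfn_out = bh.b_blocknr;     /* mock: real code maps block -> pfn */
    return 0;
}

/* Toy filesystem mapping: logical block i lives at physical i + 100. */
static int toy_get_block(sector_t iblock, struct buffer_head *bh)
{
    bh->b_blocknr = iblock + 100;
    bh->mapped = 1;
    return 0;
}
```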
     
  • Use the generic AIO infrastructure instead of custom read and write
    methods. In addition to giving us support for AIO, this adds the missing
    locking between read() and truncate().

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
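The switch-over amounts to pointing the table at generic iter-based entry points, which handle AIO and take the locking shared with truncate; a mocked sketch (names simplified, not the real ext2 table):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified file_operations: the custom ->read/->write go away. */
struct file_operations {
    long (*read_iter)(size_t len);
    long (*write_iter)(size_t len);
};

/* Mocked generic entry points; the real ones serialize against
 * truncate via inode locking inside the iter machinery. */
static long generic_file_read_iter(size_t len)  { return (long)len; }
static long generic_file_write_iter(size_t len) { return (long)len; }

static const struct file_operations dax_file_ops = {
    .read_iter  = generic_file_read_iter,
    .write_iter = generic_file_write_iter,
};
```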
     

12 Jun, 2014

1 commit

  • iter_file_splice_write() - a ->splice_write() instance that gathers the
    pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
    it to ->write_iter(). A bunch of simple cases converted to that...

    [AV: fixed the braino spotted by Cyrill]

    Signed-off-by: Al Viro

    Al Viro
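The gathering step can be sketched as follows; the pipe and bio_vec types are pared-down mocks, and the real code also handles buffer confirmation and partial writes:

```c
#include <assert.h>
#include <stddef.h>

struct pipe_buf { const char *data; size_t len; };
struct bio_vec  { const char *base; size_t len; };

/* Collect the pipe's buffers into one bio_vec-style array and hand
 * the whole batch to ->write_iter in a single call. */
static size_t iter_file_splice_write(const struct pipe_buf *bufs, int nbufs,
        size_t (*write_iter)(const struct bio_vec *, int))
{
    struct bio_vec vecs[16];     /* mock: real code sizes to the pipe */
    int i;

    for (i = 0; i < nbufs && i < 16; i++) {
        vecs[i].base = bufs[i].data;
        vecs[i].len  = bufs[i].len;
    }
    return write_iter(vecs, i);  /* one ->write_iter call per batch */
}

/* Toy ->write_iter: "writes" everything, reports the total length. */
static size_t sum_write_iter(const struct bio_vec *v, int n)
{
    size_t total = 0;
    for (int i = 0; i < n; i++)
        total += v[i].len;
    return total;
}
```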
     

07 May, 2014

2 commits


26 Jan, 2014

1 commit


26 Jul, 2011

1 commit

  • Replace the ->check_acl method with a ->get_acl method that simply reads an
    ACL from disk after having a cache miss. This means we can replace the ACL
    checking boilerplate code with a single implementation in namei.c.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
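The new contract can be mocked like this; ACL_NOT_CACHED and the cache-miss step mirror the kernel's generic code in spirit only, with all types simplified:

```c
#include <assert.h>
#include <stddef.h>

#define ACL_NOT_CACHED ((struct acl *)-1)

struct acl { int mode; };
struct inode {
    struct acl *cached_acl;                 /* ACL_NOT_CACHED initially */
    struct acl *(*get_acl)(struct inode *); /* fs reads ACL from disk */
};

/* Generic helper: serve from the cache, and only on a miss ask the
 * filesystem's ->get_acl to read from disk, caching the result. */
static struct acl *get_acl(struct inode *inode)
{
    if (inode->cached_acl != ACL_NOT_CACHED)
        return inode->cached_acl;           /* fast path: cache hit */
    inode->cached_acl = inode->get_acl(inode);
    return inode->cached_acl;
}

/* Toy "disk" ACL and a counter to show the disk is read only once. */
static struct acl disk_acl = { 0644 };
static int disk_reads;
static struct acl *toy_read_acl(struct inode *inode)
{
    (void)inode;
    disk_reads++;
    return &disk_acl;
}
```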
     

21 Jul, 2011

1 commit

  • Btrfs needs to be able to control how filemap_write_and_wait_range() is
    called in fsync to make it less of a painful operation, so push the taking
    of i_mutex and the call to filemap_write_and_wait() down into the ->fsync()
    handlers. Some file systems, such as ext3 and ocfs2, can seemingly drop
    taking i_mutex altogether. For correctness' sake I pushed everything down
    in all cases to keep the current behavior the same for everybody; each
    individual fs maintainer can then decide what to do from there.
    Thanks,

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
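After the push-down, each handler does roughly the following; the types are mocked, and a real handler would also commit its own metadata or journal between writeback and unlock:

```c
#include <assert.h>
#include <stdbool.h>

struct inode { bool mutex_held; bool pages_written; };

/* Mock: flush and wait on the inode's dirty pages. */
static void filemap_write_and_wait_range(struct inode *inode)
{
    inode->pages_written = true;
}

/* Sketch of a pushed-down ->fsync: the handler, not the generic VFS
 * code, takes i_mutex and triggers writeback itself. */
static int example_fsync(struct inode *inode)
{
    inode->mutex_held = true;            /* mutex_lock(&inode->i_mutex) */
    filemap_write_and_wait_range(inode); /* flush dirty pages first */
    /* ... filesystem-specific metadata commit would go here ... */
    inode->mutex_held = false;           /* mutex_unlock(&inode->i_mutex) */
    return 0;
}
```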
     

28 May, 2010

1 commit