11 Jun, 2018

1 commit

  • There's no need to retain the fs/autofs4 directory for backward
    compatibility.

    Adding an AUTOFS4_FS fragment to the autofs Kconfig and a module alias
    for autofs4 is sufficient for almost all cases. Not keeping fs/autofs4
    remnants will prevent "insmod /autofs4/autofs4.ko" from working,
    but automation scripts should be using modprobe(8) rather than insmod
    anyway.

    There were some comments about things to look out for with the module
    rename in the fs/autofs4/Kconfig that is removed by this patch; see the
    commit patch if you are interested.

    One potential problem with this change is that when the
    fs/autofs/Kconfig fragment for AUTOFS4_FS is removed, any AUTOFS4_FS
    entries will be removed from the kernel config, resulting in no autofs
    file system being built if there is no AUTOFS_FS entry also.

    This would have also happened if the fs/autofs4 remnants had remained
    and is most likely to be a problem with automated builds.

    Please check your build configurations before the removal which will
    occur after the next couple of kernel releases.
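
    A minimal sketch of the kind of compatibility fragment described above,
    assuming the old symbol does nothing except select the new one. The
    prompt and help wording are paraphrased, not copied from
    fs/autofs/Kconfig:

        # Old name kept only so that existing .config files that set
        # CONFIG_AUTOFS4_FS still end up building autofs.
        config AUTOFS4_FS
                tristate "Old Kconfig name for Kernel automounter support"
                select AUTOFS_FS
                help
                  Selects AUTOFS_FS. Switch your configuration to
                  AUTOFS_FS before this transitional entry is removed.

        config AUTOFS_FS
                tristate "Kernel automounter support"

    Because the old entry only selects the new one, a configuration that
    already sets CONFIG_AUTOFS_FS is unaffected when AUTOFS4_FS goes away;
    one that only sets CONFIG_AUTOFS4_FS is the case to check.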

    Acked-by: Ian Kent
    [ With edits and commit message from Ian Kent ]
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

09 Jun, 2018

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "This adds a user for the new 'bytes-remaining' updates to
    memcpy_mcsafe() that you already received through Ingo via the
    x86-dax-for-linus pull.

    Not included here, but still targeting this cycle, is support for
    handling memory media errors (poison) consumed via userspace dax
    mappings.

    Summary:

    - DAX broke a fundamental assumption made when truncating file-mapped
    pages. The truncate path assumed that it is safe to disconnect a
    pinned page from a file and let the filesystem reclaim the physical
    block. With DAX the page is equivalent to the filesystem block.
    Introduce dax_layout_busy_page() to enable filesystems to wait for
    pinned DAX pages to be released. Without this wait a filesystem
    could allocate blocks under active device-DMA to a new file.

    - DAX arranges for the block layer to be bypassed and uses
    dax_direct_access() + copy_to_iter() to satisfy read(2) calls.
    However, the memcpy_mcsafe() facility is only available through the
    pmem block driver, which DAX bypasses. In order to safely handle media
    errors via the DAX block-layer bypass, introduce copy_to_iter_mcsafe().

    - Fix cache management policy relative to the ACPI NFIT Platform
    Capabilities Structure to properly elide cache flushes when they
    are not necessary. The table indicates whether CPU caches are
    power-fail protected. Clarify that a deep flush is always performed
    on REQ_{FUA,PREFLUSH} requests."

    * tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
    dax: Use dax_write_cache* helpers
    libnvdimm, pmem: Do not flush power-fail protected CPU caches
    libnvdimm, pmem: Unconditionally deep flush on *sync
    libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH
    acpi, nfit: Remove ecc_unit_size
    dax: dax_insert_mapping_entry always succeeds
    libnvdimm, e820: Register all pmem resources
    libnvdimm: Debug probe times
    linvdimm, pmem: Preserve read-only setting for pmem devices
    x86, nfit_test: Add unit test for memcpy_mcsafe()
    pmem: Switch to copy_to_iter_mcsafe()
    dax: Report bytes remaining in dax_iomap_actor()
    dax: Introduce a ->copy_to_iter dax operation
    uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
    xfs, dax: introduce xfs_break_dax_layouts()
    xfs: prepare xfs_break_layouts() for another layout type
    xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
    mm, fs, dax: handle layout changes to pinned dax mappings
    mm: fix __gup_device_huge vs unmap
    mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
    ...

    Linus Torvalds
     

08 Jun, 2018

2 commits

  • Create Makefile and Kconfig for autofs module.

    [raven@themaw.net: make autofs4 Kconfig depend on AUTOFS_FS]
    Link: http://lkml.kernel.org/r/152687649097.8263.7046086367407522029.stgit@pluto.themaw.net
    Link: http://lkml.kernel.org/r/152626705591.28589.356365986974038383.stgit@pluto.themaw.net
    Signed-off-by: Ian Kent
    Tested-by: Randy Dunlap
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • With the addition of memfd hugetlbfs support, we now have the situation
    where memfd depends on TMPFS -or- HUGETLBFS. Previously, memfd was only
    supported on tmpfs, so it made sense that the code resided in shmem.c.
    In the current code, memfd is only functional if TMPFS is defined. If
    HUGETLBFS is defined and TMPFS is not defined, then memfd functionality
    will not be available for hugetlbfs. This does not cause BUGs, just a
    lack of potentially desired functionality.

    Code is restructured in the following way:
    - include/linux/memfd.h is a new file containing memfd specific
    definitions previously contained in shmem_fs.h.
    - mm/memfd.c is a new file containing memfd specific code previously
    contained in shmem.c.
    - memfd specific code is removed from shmem_fs.h and shmem.c.
    - A new config option MEMFD_CREATE is added that is defined if TMPFS
    or HUGETLBFS is defined.

    No functional changes are made to the code: restructuring only.
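
    The new option can be expressed as a derived boolean with no prompt of
    its own. A sketch, assuming a def_bool form; check fs/Kconfig for the
    exact definition:

        # Not user-visible: enabled automatically whenever a filesystem
        # that can back memfd_create(2) is built in.
        config MEMFD_CREATE
                def_bool TMPFS || HUGETLBFS

    With a symbol like this, the build can compile mm/memfd.c whenever
    CONFIG_MEMFD_CREATE is set instead of keying it on CONFIG_TMPFS, which
    is what lets a kernel with HUGETLBFS but without TMPFS keep memfd.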

    Link: http://lkml.kernel.org/r/20180415182119.4517-4-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reviewed-by: Khalid Aziz
    Cc: Andrea Arcangeli
    Cc: David Herrmann
    Cc: Hugh Dickins
    Cc: Marc-André Lureau
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

22 May, 2018

1 commit

  • In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
    be able to rely on the fact that they will get wakeups on dev_pagemap
    page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
    generic_dax_page_free() as common indicator / infrastructure for dax
    filesystems to require. With this change there are no users of the
    MEMORY_DEVICE_HOST designation, so remove it.

    The HMM sub-system extended dev_pagemap to arrange a callback when a
    dev_pagemap managed page is freed. Since a dev_pagemap page is free /
    idle when its reference count is 1, it requires an additional branch to
    check the page-type at put_page() time. Given that put_page() is a hot
    path, we do not want to incur that check if HMM is not in use, so a
    static branch is used to avoid that overhead when not necessary.

    Now, the FS_DAX implementation wants to reuse this mechanism for
    receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
    static-key into a generic mechanism that either HMM or FS_DAX code paths
    can enable.

    For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
    care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
    However, we still need to support FS_DAX in the FS_DAX_LIMITED case
    implemented by the s390/dcssblk driver.
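
    One way to express this restriction in Kconfig is to select the ops
    infrastructure only when ZONE_DEVICE exists and the limited mode is not
    in use. A sketch under that assumption; the other FS_DAX dependencies
    and the help text are omitted:

        # mm/Kconfig: hidden symbol, only ever selected by users
        config DEV_PAGEMAP_OPS
                bool

        # fs/Kconfig
        config FS_DAX
                bool "Direct Access (DAX) support"
                depends on MMU
                # No ZONE_DEVICE (e.g. ARCH=um) means no DEV_PAGEMAP_OPS,
                # so the extra page-type check can be compiled out.
                select DEV_PAGEMAP_OPS if (ZONE_DEVICE && !FS_DAX_LIMITED)

    Making the select conditional, rather than adding a hard dependency,
    keeps FS_DAX itself available to the FS_DAX_LIMITED case while still
    compiling out the put_page() check where it cannot work.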

    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Michal Hocko
    Reported-by: kbuild test robot
    Reported-by: Thomas Meyer
    Reported-by: Dave Jiang
    Cc: "Jérôme Glisse"
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dan Williams

    Dan Williams
     

28 Apr, 2018

1 commit


17 Apr, 2018

1 commit


07 Feb, 2018

1 commit

  • Pull libnvdimm updates from Ross Zwisler:

    - Require struct page by default for filesystem DAX to remove a number
    of surprising failure cases. This includes failures with direct I/O,
    gdb and fork(2).

    - Add support for the new Platform Capabilities Structure added to the
    NFIT in ACPI 6.2a. This new table tells us whether the platform
    supports flushing of CPU and memory controller caches on unexpected
    power loss events.

    - Revamp vmem_altmap and dev_pagemap handling to clean up code and
    better support future PCI P2P uses.

    - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has
    become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL
    spec, and instead rely on the generic ND_CMD_CALL approach used by
    the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.

    - Enhance nfit_test so we can test some of the new things added in
    version 1.6 of the DSM specification. This includes testing firmware
    download and simulating the Last Shutdown State (LSS) status.

    * tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
    libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
    acpi, nfit: fix register dimm error handling
    libnvdimm, namespace: make min namespace size 4K
    tools/testing/nvdimm: force nfit_test to depend on instrumented modules
    libnvdimm/nfit_test: adding support for unit testing enable LSS status
    libnvdimm/nfit_test: add firmware download emulation
    nfit-test: Add platform cap support from ACPI 6.2a to test
    libnvdimm: expose platform persistence attribute for nd_region
    acpi: nfit: add persistent memory control flag for nd_region
    acpi: nfit: Add support for detect platform CPU cache flush on power loss
    device-dax: Fix trailing semicolon
    libnvdimm, btt: fix uninitialized err_lock
    dax: require 'struct page' by default for filesystem dax
    ext2: auto disable dax instead of failing mount
    ext4: auto disable dax instead of failing mount
    mm, dax: introduce pfn_t_special()
    mm: Fix devm_memremap_pages() collision handling
    mm: Fix memory size alignment in devm_memremap_pages_release()
    memremap: merge find_dev_pagemap into get_dev_pagemap
    memremap: change devm_memremap_pages interface to use struct dev_pagemap
    ...

    Linus Torvalds
     

02 Feb, 2018

1 commit

  • Pull staging/IIO updates from Greg KH:
    "Here are the big Staging and IIO driver patches for 4.16-rc1.

    There is the normal amount of new IIO drivers added, like all
    releases.

    The IPX networking code and the ncpfs filesystem are moved into the
    staging tree, as they are on their way out of the kernel due to lack of
    use.

    The visorbus subsystem has finally started moving out of the staging
    tree to the "real" part of the kernel, and the most and fsl-mc
    codebases are almost ready to move out; that will probably happen for
    4.17-rc1 if all goes well.

    Other than that, there is a bunch of license header cleanups in the
    tree, along with the normal amount of coding style churn that we all
    know and love for this codebase. I also got frustrated at the
    Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting
    huge chunks of it that were never even being used.

    Full details of everything are in the shortlog.

    All of these patches have been in linux-next for a while with no
    reported issues"

    * tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits)
    staging: rtlwifi: remove redundant initialization of 'cfg_cmd'
    staging: rtl8723bs: remove a couple of redundant initializations
    staging: comedi: reformat lines to 80 chars or less
    staging: lustre: separate a connection destroy from free struct kib_conn
    Staging: rtl8723bs: Use !x instead of NULL comparison
    Staging: rtl8723bs: Remove dead code
    Staging: rtl8723bs: Change names to conform to the kernel code
    staging: ccree: Fix missing blank line after declaration
    staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd'
    staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST
    staging: fbtft: remove unused FB_TFT_SSD1325 kconfig
    staging: comedi: dt2811: remove redundant initialization of 'ns'
    staging: wilc1000: fix alignments to match open parenthesis
    staging: wilc1000: removed unnecessary defined enums typedef
    staging: wilc1000: remove unnecessary use of parentheses
    staging: rtl8192u: remove redundant initialization of 'timeout'
    staging: sm750fb: fix CamelCase for dispSet var
    staging: lustre: lnet/selftest: fix compile error on UP build
    staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons
    staging: rts5208: Fix "seg_no" calculation in reset_ms_card()
    ...

    Linus Torvalds
     

20 Jan, 2018

1 commit

  • If a dax buffer from a device that does not map pages is passed to
    read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If
    gdb attempts to examine the contents of a dax buffer from a device that
    does not map pages it triggers SIGBUS. If fork(2) is called on a process
    with a dax mapping from a device that does not map pages it triggers
    SIGBUS. 'struct page' is required; otherwise several kernel code paths
    break in surprising ways. Disable filesystem-dax on devices that do not
    map pages.

    In addition to needing pfn_to_page() to be valid we also require devmap
    pages. We need this to detect dax pages in the get_user_pages_fast()
    path and so that we can stop managing the VM_MIXEDMAP flag. For DAX
    drivers that have not supported get_user_pages() to date we allow them
    to opt-in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration
    option which requires ->direct_access() to return pfn_t_special() pfns.
    This leaves DAX support in brd disabled and scheduled for removal.
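
    A sketch of how the opt-in can be wired, assuming FS_DAX depends on
    either ZONE_DEVICE (struct page backed devices) or the limited mode,
    and that a driver opts in by selecting a hidden symbol.
    EXAMPLE_DAX_DRIVER is a hypothetical driver entry, not a real option:

        # fs/Kconfig
        # Hidden; selected only by DAX drivers whose ->direct_access()
        # returns pfn_t_special() pfns and that accept the loss of
        # get_user_pages() support on their mappings.
        config FS_DAX_LIMITED
                bool

        config FS_DAX
                bool "Direct Access (DAX) support"
                depends on ZONE_DEVICE || FS_DAX_LIMITED

        # Hypothetical driver Kconfig opting in to the limited mode
        config EXAMPLE_DAX_DRIVER
                tristate "Example DAX-capable block driver (hypothetical)"
                select FS_DAX_LIMITED

    A driver like brd that neither maps pages nor selects FS_DAX_LIMITED
    simply loses filesystem-dax support, which matches the behaviour
    described above.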

    Note that when the initial dax support was being merged a few years back
    there was concern that struct page was unsuitable for use with next
    generation persistent memory devices. The theoretical concern was that
    struct page access, being such a hotly used data structure in the
    kernel, would lead to media wear out. While that was a reasonable
    conservative starting position it has not held true in practice. We have
    long since committed to using devm_memremap_pages() to support higher
    order kernel functionality that needs get_user_pages() and
    pfn_to_page().

    Cc: Jeff Moyer
    Cc: Ross Zwisler
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Gerald Schaefer
    Signed-off-by: Dan Williams

    Dan Williams
     

02 Jan, 2018

1 commit

  • This link is replicated in most filesystems' config stanzas. Referring
    to an archived version of that site is pointless as it mostly deals with
    patches; user documentation is available elsewhere.

    Signed-off-by: Adam Borowski
    CC: Alexander Viro
    Reviewed-by: Darrick J. Wong
    Acked-by: Jan Kara
    Acked-by: Dave Kleikamp
    Acked-by: David Sterba
    Acked-by: "Yan, Zheng"
    Acked-by: Chao Yu
    Acked-by: Jaegeuk Kim
    Acked-by: Steve French
    Signed-off-by: Jonathan Corbet

    Adam Borowski
     

28 Nov, 2017

1 commit


13 Jul, 2017

1 commit

    As of commit bf3eac84c42d ("percpu-rwsem: kill CONFIG_PERCPU_RWSEM") we
    unconditionally build pcpu-rwsems. Remove a leftover reference to it
    from FILE_LOCKING.

    Link: http://lkml.kernel.org/r/20170518180115.2794-1-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

09 May, 2017

1 commit

  • For configurations that do not enable DAX filesystems or drivers, do not
    require the DAX core to be built.

    Given that the 'direct_access' method has been removed from
    'block_device_operations', we can also go ahead and move the
    block-related dax helper functions from fs/block_dev.c to
    drivers/dax/super.c. This keeps dax details out of the block layer and
    lets the DAX core be built as a module in the FS_DAX=n case.

    Filesystems need to include dax.h to call bdev_dax_supported().
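
    On the Kconfig side, the idea can be sketched as letting the
    filesystem option pull in the DAX core, so nothing else has to. The
    prompt strings are paraphrased and unrelated attributes are omitted:

        # drivers/dax/Kconfig: the DAX core, which may be a module
        config DAX
                tristate "DAX: direct access to differentiated memory"

        # fs/Kconfig: filesystem-dax brings in the DAX core itself, so a
        # kernel with FS_DAX=n and no DAX drivers need not build it
        config FS_DAX
                bool "Direct Access (DAX) support"
                select DAX

    Since FS_DAX is a bool, selecting DAX from it forces DAX=y; DAX can be
    built as a module only when FS_DAX=n, which is the case the text
    describes.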

    Cc: linux-xfs@vger.kernel.org
    Cc: Jens Axboe
    Cc: "Theodore Ts'o"
    Cc: Matthew Wilcox
    Cc: Alexander Viro
    Cc: "Darrick J. Wong"
    Cc: Ross Zwisler
    Reviewed-by: Jan Kara
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Dan Williams

    Dan Williams
     

25 Jan, 2017

1 commit

  • As reported by Arnd:

    https://lkml.org/lkml/2017/1/10/756

    Compiling with the following configuration:

    # CONFIG_EXT2_FS is not set
    # CONFIG_EXT4_FS is not set
    # CONFIG_XFS_FS is not set
    # CONFIG_FS_IOMAP depends on the above filesystems, and is not set
    CONFIG_FS_DAX=y

    generates build warnings about unused functions in fs/dax.c:

    fs/dax.c:878:12: warning: `dax_insert_mapping' defined but not used [-Wunused-function]
    static int dax_insert_mapping(struct address_space *mapping,
    ^~~~~~~~~~~~~~~~~~
    fs/dax.c:572:12: warning: `copy_user_dax' defined but not used [-Wunused-function]
    static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size,
    ^~~~~~~~~~~~~
    fs/dax.c:542:12: warning: `dax_load_hole' defined but not used [-Wunused-function]
    static int dax_load_hole(struct address_space *mapping, void **entry,
    ^~~~~~~~~~~~~
    fs/dax.c:312:14: warning: `grab_mapping_entry' defined but not used [-Wunused-function]
    static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
    ^~~~~~~~~~~~~~~~~~

    Now that the struct buffer_head based DAX fault paths and I/O path have
    been removed we really depend on iomap support being present for DAX.
    Make this explicit by selecting FS_IOMAP if we compile in DAX support.

    This allows us to remove conditional selections of FS_IOMAP when FS_DAX
    was present for ext2 and ext4, and to remove an #ifdef in fs/dax.c.
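
    A before/after sketch of that Kconfig change, showing only the lines
    that matter; the conditional form for ext2 is inferred from the text
    above, and prompts are abbreviated:

        # Before: each DAX-capable filesystem pulled in iomap itself
        config EXT2_FS
                tristate "Second extended fs support"
                select FS_IOMAP if FS_DAX

        # After: DAX always brings iomap along, and EXT2_FS/EXT4_FS
        # drop their conditional select
        config FS_DAX
                bool "Direct Access (DAX) support"
                select FS_IOMAP

    With the reported configuration (no ext2, ext4 or xfs, FS_DAX=y),
    FS_IOMAP is now enabled, so the iomap-based callers of those static
    helpers in fs/dax.c are compiled and the unused-function warnings go
    away.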

    Link: http://lkml.kernel.org/r/1484087383-29478-1-git-send-email-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reported-by: Arnd Bergmann
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     

17 Dec, 2016

1 commit

  • Pull vfs updates from Al Viro:

    - more ->d_init() stuff (work.dcache)

    - pathname resolution cleanups (work.namei)

    - a few missing iov_iter primitives - copy_from_iter_full() and
    friends. Either copy the full requested amount, advance the iterator
    and return true, or fail, return false and do _not_ advance the
    iterator. Quite a few open-coded callers converted (and became more
    readable and harder to fuck up that way) (work.iov_iter)

    - several assorted patches, the big one being logfs removal

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    logfs: remove from tree
    vfs: fix put_compat_statfs64() does not handle errors
    namei: fold should_follow_link() with the step into not-followed link
    namei: pass both WALK_GET and WALK_MORE to should_follow_link()
    namei: invert WALK_PUT logics
    namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
    namei: saner calling conventions for mountpoint_last()
    namei.c: get rid of user_path_parent()
    switch getfrag callbacks to ..._full() primitives
    make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
    [iov_iter] new primitives - copy_from_iter_full() and friends
    don't open-code file_inode()
    ceph: switch to use of ->d_init()
    ceph: unify dentry_operations instances
    lustre: switch to use of ->d_init()

    Linus Torvalds
     

15 Dec, 2016

1 commit

    Logfs was introduced to the kernel in 2009, and hasn't seen any
    non-drive-by changes since 2012, while having lots of unsolved issues
    including the complete lack of error handling, with more and more
    issues popping up without any fixes.

    The logfs.org domain has been bouncing mail, and the maintainer on the
    non-logfs.org domain hasn't responded to past queries either.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

08 Nov, 2016

1 commit


08 Oct, 2016

1 commit

    Avoid letting the #ifdef become unwieldy as more architectures gain
    gigantic page support. No functional change with this patch.
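
    A sketch of the usual Kconfig pattern for this, assuming a hidden
    capability symbol named ARCH_HAS_GIGANTIC_PAGE that architectures
    select; C code then tests one CONFIG_ symbol instead of a growing list
    of architecture names. EXAMPLE_ARCH is hypothetical:

        # fs/Kconfig: hidden capability flag, never set by the user
        config ARCH_HAS_GIGANTIC_PAGE
                bool

        # An architecture's Kconfig (hypothetical entry)
        config EXAMPLE_ARCH
                def_bool y
                select ARCH_HAS_GIGANTIC_PAGE

    Enabling the generic code on another architecture then needs a select
    line in that architecture's Kconfig rather than another term in a
    shared #ifdef.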

    Link: http://lkml.kernel.org/r/1475227569-63446-2-git-send-email-xieyisheng1@huawei.com
    Signed-off-by: Yisheng Xie
    Suggested-by: Michal Hocko
    Acked-by: Michal Hocko
    Acked-by: Naoya Horiguchi
    Acked-by: Hillf Danton
    Cc: Hanjun Guo
    Cc: Will Deacon
    Cc: Dave Hansen
    Cc: Sudeep Holla
    Cc: Catalin Marinas
    Cc: Mark Rutland
    Cc: Rob Herring
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yisheng Xie
     

22 Sep, 2016

1 commit

  • As Oleg suggested, replace file_lock_list with a structure containing
    the hlist head and a spinlock.

    This completely removes the lglock from fs/locks.

    Suggested-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Al Viro
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: der.herr@hofr.at
    Cc: paulmck@linux.vnet.ibm.com
    Cc: riel@redhat.com
    Cc: tj@kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

05 Aug, 2016

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Trond made a change to the server's tcp logic that allows a fast
    client to better take advantage of high bandwidth networks, but may
    increase the risk that a single client could starve other clients;
    a new sunrpc.svc_rpc_per_connection_limit parameter should help
    mitigate this in the (hopefully unlikely) event this becomes a
    problem in practice.

    - Tom Haynes added a minimal flex-layout pnfs server, which is of no
    use in production for now--don't build it unless you're doing
    client testing or further server development"

    * tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits)
    nfsd: remove some dead code in nfsd_create_locked()
    nfsd: drop unnecessary MAY_EXEC check from create
    nfsd: clean up bad-type check in nfsd_create_locked
    nfsd: remove unnecessary positive-dentry check
    nfsd: reorganize nfsd_create
    nfsd: check d_can_lookup in fh_verify of directories
    nfsd: remove redundant zero-length check from create
    nfsd: Make creates return EEXIST instead of EACCES
    SUNRPC: Detect immediate closure of accepted sockets
    SUNRPC: accept() may return sockets that are still in SYN_RECV
    nfsd: allow nfsd to advertise multiple layout types
    nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
    nfsd/blocklayout: Make sure calculate signature/designator length aligned
    xfs: abstract block export operations from nfsd layouts
    SUNRPC: Remove unused callback xpo_adjust_wspace()
    SUNRPC: Change TCP socket space reservation
    SUNRPC: Add a server side per-connection limit
    SUNRPC: Micro optimisation for svc_data_ready
    SUNRPC: Call the default socket callbacks instead of open coding
    SUNRPC: lock the socket while detaching it
    ...

    Linus Torvalds
     

16 Jul, 2016

1 commit


21 Jun, 2016

1 commit

    Add infrastructure for multipage buffered writes. This is implemented
    using a main iterator that applies an actor function to a range that
    can be written.

    This infrastructure is used to implement a buffered write helper, one
    to zero file ranges and one to implement the ->page_mkwrite VM
    operations. All of them borrow a fair amount of code from fs/buffer.c
    for now, by using an internal version of __block_write_begin that
    gets passed an iomap and builds the corresponding buffer head.

    The file system gets a set of paired ->iomap_begin and ->iomap_end
    calls which allow it to map/reserve a range and get a notification
    once the write code is finished with it.

    Based on earlier code from Dave Chinner.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bob Peterson
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

20 May, 2016

1 commit

  • Currently the handling of huge pages for DAX is racy. For example the
    following can happen:

    CPU0 (THP write fault)                  CPU1 (normal read fault)

    __dax_pmd_fault()                       __dax_fault()
      get_block(inode, block, &bh, 0) -> not mapped
                                              get_block(inode, block, &bh, 0)
                                                -> not mapped
      if (!buffer_mapped(&bh) && write)
        get_block(inode, block, &bh, 1) -> allocates blocks
      truncate_pagecache_range(inode, lstart, lend);
                                              dax_load_hole();

    This results in data corruption since the process on CPU1 won't see
    the changes CPU0 made to the file.

    The race can happen even if two normal faults race; however, with THP
    the situation is even worse because the two faults don't operate on
    the same entries in the radix tree, and we want to use these entries
    for serialization. So make THP support in DAX code depend on
    CONFIG_BROKEN for now.

    Signed-off-by: Jan Kara
    Signed-off-by: Ross Zwisler

    Jan Kara
     

27 Mar, 2016

1 commit

  • Pull orangefs filesystem from Mike Marshall.

    This finally merges the long-pending orangefs filesystem, which has been
    much cleaned up with input from Al Viro over the last six months. From
    the documentation file:

    "OrangeFS is an LGPL userspace scale-out parallel storage system. It
    is ideal for large storage problems faced by HPC, BigData, Streaming
    Video, Genomics, Bioinformatics.

    Orangefs, originally called PVFS, was first developed in 1993 by Walt
    Ligon and Eric Blumer as a parallel file system for Parallel Virtual
    Machine (PVM) as part of a NASA grant to study the I/O patterns of
    parallel programs.

    Orangefs features include:

    - Distributes file data among multiple file servers
    - Supports simultaneous access by multiple clients
    - Stores file data and metadata on servers using local file system
    and access methods
    - Userspace implementation is easy to install and maintain
    - Direct MPI support
    - Stateless"

    See Documentation/filesystems/orangefs.txt for more in-depth details.

    * tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (174 commits)
    orangefs: fix orangefs_superblock locking
    orangefs: fix do_readv_writev() handling of error halfway through
    orangefs: have ->kill_sb() evict the VFS side of things first
    orangefs: sanitize ->llseek()
    orangefs-bufmap.h: trim unused junk
    orangefs: saner calling conventions for getting a slot
    orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer
    orangefs: get rid of readdir_handle_s
    ornagefs: ensure that truncate has an up to date inode size
    orangefs: move code which sets i_link to orangefs_inode_getattr
    orangefs: remove needless wrapper around GFP_KERNEL
    orangefs: remove wrapper around mutex_lock(&inode->i_mutex)
    orangefs: refactor inode type or link_target change detection
    orangefs: use new getattr for revalidate and remove old getattr
    orangefs: use new getattr in inode getattr and permission
    orangefs: use new orangefs_inode_getattr to get size in write and llseek
    orangefs: use new orangefs_inode_getattr to create new inodes
    orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr
    orangefs: remove inode->i_lock wrapper
    orangefs: put register_chrdev immediately before register_filesystem
    ...

    Linus Torvalds
     

18 Mar, 2016

1 commit

  • This patch adds the renamed functions moved from the f2fs crypto files.

    1. definitions for per-file encryption used by ext4 and f2fs.

    2. crypto.c for encrypt/decrypt functions
    a. IO preparation:
    - fscrypt_get_ctx / fscrypt_release_ctx
    b. before IOs:
    - fscrypt_encrypt_page
    - fscrypt_decrypt_page
    - fscrypt_zeroout_range
    c. after IOs:
    - fscrypt_decrypt_bio_pages
    - fscrypt_pullback_bio_page
    - fscrypt_restore_control_page

    3. policy.c supporting context management.
    a. For ioctls:
    - fscrypt_process_policy
    - fscrypt_get_policy
    b. For context permission
    - fscrypt_has_permitted_context
    - fscrypt_inherit_context

    4. keyinfo.c to handle permissions
    - fscrypt_get_encryption_info
    - fscrypt_free_encryption_info

    5. fname.c to support filename encryption
    a. general wrapper functions
    - fscrypt_fname_disk_to_usr
    - fscrypt_fname_usr_to_disk
    - fscrypt_setup_filename
    - fscrypt_free_filename

    b. specific filename handling functions
    - fscrypt_fname_alloc_buffer
    - fscrypt_fname_free_buffer

    6. Makefile and Kconfig

    Cc: Al Viro
    Signed-off-by: Michael Halcrow
    Signed-off-by: Ildar Muslukhov
    Signed-off-by: Uday Savagaonkar
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jaegeuk Kim

    Jaegeuk Kim
     

15 Mar, 2016

1 commit


16 Jan, 2016

2 commits

  • Now that the get_user_pages() path knows how to handle dax-pmd mappings,
    remove the protections that disabled dax-pmd support.

    Tests available from github.com/pmem/ndctl:

    make TESTS="lib/test-dax.sh lib/test-mmap.sh" check

    Signed-off-by: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Merge tag 'v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into current

    Linux 4.4

    Mike Marshall
     

13 Jan, 2016

1 commit

  • Pull file locking updates from Jeff Layton:
    "File locking related changes for v4.5 (pile #1)

    Highlights:
    - new Kconfig option to allow disabling mandatory locking (which is
    racy anyway)
    - new tracepoints for setlk and close codepaths
    - fix for a long-standing bug in code that handles races between
    setting a POSIX lock and close()"

    * tag 'locks-v4.5-1' of git://git.samba.org/jlayton/linux:
    locks: rename __posix_lock_file to posix_lock_inode
    locks: prink more detail when there are leaked locks
    locks: pass inode pointer to locks_free_lock_context
    locks: sprinkle some tracepoints around the file locking code
    locks: don't check for race with close when setting OFD lock
    locks: fix unlock when fcntl_setlk races with a close
    fs: make locks.c explicitly non-modular
    locks: use list_first_entry_or_null()
    locks: Don't allow mounts in user namespaces to enable mandatory locking
    locks: Allow disabling mandatory locking at compile time

    Linus Torvalds
     

17 Nov, 2015

1 commit

  • While dax pmd mappings are functional in the nominal path they trigger
    kernel crashes in the following paths:

    BUG: unable to handle kernel paging request at ffffea0004098000
    IP: [] follow_trans_huge_pmd+0x117/0x3b0
    [..]
    Call Trace:
    [] follow_page_mask+0x2d3/0x380
    [] __get_user_pages+0xe8/0x6f0
    [] get_user_pages_unlocked+0x165/0x1e0
    [] get_user_pages_fast+0xa1/0x1b0

    kernel BUG at arch/x86/mm/gup.c:131!
    [..]
    Call Trace:
    [] gup_pud_range+0x1bc/0x220
    [] get_user_pages_fast+0x124/0x1b0

    BUG: unable to handle kernel paging request at ffffea0004088000
    IP: [] copy_huge_pmd+0x159/0x350
    [..]
    Call Trace:
    [] copy_page_range+0x34c/0x9f0
    [] copy_process+0x1b7f/0x1e10
    [] _do_fork+0x91/0x590

    All of these paths are interpreting a dax pmd mapping as a transparent
    huge page and making the assumption that the pfn is covered by the
    memmap, i.e. that the pfn has an associated struct page. PTE mappings
    do not suffer the same fate since they have the _PAGE_SPECIAL flag to
    cause the gup path to fault. We can do something similar for the PMD
    path, or otherwise defer pmd support for cases where a struct page is
    available. For now, 4.4-rc and -stable need to disable dax pmd support
    by default.

    For development the "depends on BROKEN" line can be removed from
    CONFIG_FS_DAX_PMD.
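    The reasoning above can be sketched as a toy model in C. PTE mappings
    are safe because _PAGE_SPECIAL makes gup refuse the entry, while the
    buggy PMD paths assume every pfn has a struct page. Everything in the
    sketch (struct mapping_entry, gup_lookup(), the 16-pfn memmap) is
    hypothetical and only mirrors the idea; it is not the kernel's gup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code: only pfns [0, MEMMAP_PFNS)
 * have a struct page in the memmap. */
#define MEMMAP_PFNS 16UL

struct page {
    unsigned long flags;            /* placeholder payload */
};

static struct page memmap[MEMMAP_PFNS];

struct mapping_entry {
    unsigned long pfn;
    bool special;                   /* stands in for _PAGE_SPECIAL */
};

/* Safe lookup, as on the PTE path: a special entry carries no struct
 * page, so return NULL and let the caller fall back to faulting. */
static struct page *gup_lookup(const struct mapping_entry *e)
{
    assert(e != NULL);
    if (e->special)
        return NULL;
    /* Non-special entries are assumed to be memmap-backed; this is the
     * assumption the dax pmd paths violated. */
    assert(e->pfn < MEMMAP_PFNS);
    return &memmap[e->pfn];
}
```

    In this model a dax entry that lacked the special marking would index
    the memmap with a pfn that has no struct page, which is loosely
    analogous to the bad paging requests in the traces above.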

    Cc:
    Cc: Jan Kara
    Cc: Dave Chinner
    Cc: Matthew Wilcox
    Cc: Kirill A. Shutemov
    Reported-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

16 Nov, 2015

1 commit

  • Mandatory locking appears to be almost unused and buggy, and there
    appears to be no real interest in doing anything with it. Since effectively
    no one uses the code and since the code is buggy let's allow it to be
    disabled at compile time. I would just suggest removing the code but
    undoubtedly that will break some piece of userspace code somewhere.

    For the distributions that don't care about this piece of code
    this gives a nice starting point to make mandatory locking go away.

    Cc: Benjamin Coddington
    Cc: Dmitry Vyukov
    Cc: Jeff Layton
    Cc: J. Bruce Fields
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Jeff Layton

    Jeff Layton
     

03 Oct, 2015

1 commit


24 Jul, 2015

1 commit

  • The functionality of ext3 is fully supported by the ext4 driver. Major
    distributions (SUSE, RedHat) have already been using the ext4 driver to
    handle ext3 filesystems for quite some time. There is some ugliness in mm
    resulting from jbd cleaning buffers in a dirty page without clearing the
    page dirty bit, and support for buffer bouncing in the block layer when
    stable pages are required exists only because of jbd. So let's remove the
    ext3 driver. This saves us some 28k lines of duplicated code.

    Acked-by: Theodore Ts'o
    Signed-off-by: Jan Kara

    Jan Kara
     

11 Apr, 2015

1 commit


17 Feb, 2015

2 commits

  • The DAX code accesses the underlying storage through the kernel's linear
    mapping, which may not be cache-coherent with user mappings on ARM, MIPS
    or SPARC. Temporarily disable the DAX code until this problem is
    resolved.

    The original XIP code also had this problem, but it was never noticed.

    Signed-off-by: Matthew Wilcox
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Ralf Baechle
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The fewer Kconfig options we have the better. Use the generic
    CONFIG_FS_DAX to enable XIP support in ext2 as well as in the core.

    Signed-off-by: Matthew Wilcox
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

05 Jan, 2015

1 commit

  • efivarfs is currently enabled under MISC_FILESYSTEMS, which is described
    as "such as filesystems that came from other operating systems".
    In reality, it is a pseudo filesystem, providing access to the kernel
    UEFI variable interface.

    Since this interface, rather than the legacy efivars interface, is the
    preferred way to access UEFI variables, also build it as a module by
    default when CONFIG_EFI is enabled.

    Signed-off-by: Leif Lindholm
    Signed-off-by: Matt Fleming

    Leif Lindholm
     

24 Oct, 2014

1 commit

  • Overlayfs allows one, usually read-write, directory tree to be
    overlaid onto another, read-only directory tree. All modifications
    go to the upper, writable layer.

    This type of mechanism is most often used for live CDs but there's a
    wide variety of other uses.

    The implementation differs from other "union filesystem"
    implementations in that after a file is opened all operations go
    directly to the underlying, lower or upper, filesystems. This
    simplifies the implementation and allows native performance in these
    cases.

    The dentry tree is duplicated from the underlying filesystems; this
    enables fast cached lookups without adding special support into the
    VFS. This uses slightly more memory than union mounts, but dentries
    are relatively small.

    Currently inodes are duplicated as well, but it is a possible
    optimization to share inodes for non-directories.

    Opening a non-directory results in the open being forwarded to the
    underlying filesystem. This makes the behavior very similar to union
    mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file
    descriptors).

    Usage:

    mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay
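    The lookup rule described above (the upper layer shadows the lower one,
    and all modifications land in the upper layer) can be sketched as a toy
    single-directory model in C. The names (struct layer, ovl_toy_lookup(),
    ovl_toy_copy_up()) are hypothetical, and the whiteout flag is a
    simplified stand-in for how a deleted lower entry is hidden; real
    overlayfs works on the dentries and inodes of the underlying
    filesystems.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of one merged directory; not the overlayfs implementation. */
#define MAX_ENTRIES 8

struct layer {
    const char *names[MAX_ENTRIES];
    bool whiteout[MAX_ENTRIES];   /* upper only: entry deleted in merged view */
    size_t count;
};

enum origin { NOT_FOUND, FROM_UPPER, FROM_LOWER };

static int layer_find(const struct layer *l, const char *name)
{
    for (size_t i = 0; i < l->count; i++)
        if (strcmp(l->names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Upper shadows lower; a whiteout in upper hides the lower entry. */
static enum origin ovl_toy_lookup(const struct layer *upper,
                                  const struct layer *lower, const char *name)
{
    int i = layer_find(upper, name);
    if (i >= 0)
        return upper->whiteout[i] ? NOT_FOUND : FROM_UPPER;
    return layer_find(lower, name) >= 0 ? FROM_LOWER : NOT_FOUND;
}

/* Modifications go to upper, so copy a lower entry up before writing. */
static bool ovl_toy_copy_up(struct layer *upper, const struct layer *lower,
                            const char *name)
{
    int i = layer_find(upper, name);
    if (i >= 0)
        return !upper->whiteout[i];   /* already up, or deleted */
    if (layer_find(lower, name) < 0 || upper->count == MAX_ENTRIES)
        return false;
    upper->names[upper->count] = name;
    upper->whiteout[upper->count] = false;
    upper->count++;
    return true;
}
```

    After a copy-up, later lookups of that name resolve to the upper layer,
    while the lower tree is never modified.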

    The following contributions have been folded into this patch:

    Neil Brown:
    - minimal remount support
    - use correct seek function for directories
    - initialise is_real before use
    - rename ovl_fill_cache to ovl_dir_read

    Felix Fietkau:
    - fix a deadlock in ovl_dir_read_merged
    - fix a deadlock in ovl_remove_whiteouts

    Erez Zadok
    - fix cleanup after WARN_ON

    Sedat Dilek
    - fix up permission to conform to new API

    Robin Dong
    - fix possible leak in ovl_new_inode
    - create new inode in ovl_link

    Andy Whitcroft
    - switch to __inode_permission()
    - copy up i_uid/i_gid from the underlying inode

    AV:
    - ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits
    - ovl_clear_empty() - one failure exit forgetting to do unlock_rename(),
    lack of check for udir being the parent of upper, dropping and regaining
    the lock on udir (which would require _another_ check for parent being
    right).
    - bogus d_drop() in copyup and rename [fix from your mail]
    - copyup/remove and copyup/rename races [fix from your mail]
    - ovl_dir_fsync() leaving ERR_PTR() in ->realfile
    - ovl_entry_free() is pointless - it's just a kfree_rcu()
    - fold ovl_do_lookup() into ovl_lookup()
    - manually assigning ->d_op is wrong. Just use ->s_d_op.
    [patches picked from Miklos]:
    * copyup/remove and copyup/rename races
    * bogus d_drop() in copyup and rename

    Also thanks to the following people for testing and reporting bugs:

    Jordi Pujol
    Andy Whitcroft
    Michal Suchanek
    Felix Fietkau
    Erez Zadok
    Randy Dunlap

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

18 Sep, 2014

1 commit

  • Currently, all of the grace period handling is part of lockd. Eventually,
    though, we'd like to be able to build v4-only servers, at which point
    we'll need to put all of this elsewhere.

    Move the code itself into fs/nfs_common and have it build a grace.ko
    module. Then, rejigger the Kconfig options so that both nfsd and lockd
    enable it automatically.
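    The idea of one shared grace period that both lockd and nfsd feed into
    can be sketched as below, assuming the server counts as "in grace" while
    any registered lock manager has not yet ended its grace period. All
    names here (struct grace_state, toy_start_grace() and friends) are
    hypothetical and do not describe the interface grace.ko actually
    exports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: each lock manager (e.g. lockd, nfsd) stays on
 * the list while it is in its grace period. */
struct lock_manager {
    struct lock_manager *next;
    bool registered;
};

struct grace_state {
    struct lock_manager *head;    /* managers currently in grace */
};

static void toy_start_grace(struct grace_state *gs, struct lock_manager *lm)
{
    if (lm->registered)
        return;                   /* already in grace */
    lm->next = gs->head;
    gs->head = lm;
    lm->registered = true;
}

static void toy_end_grace(struct grace_state *gs, struct lock_manager *lm)
{
    /* Unlink lm from the singly linked list, if present. */
    for (struct lock_manager **pp = &gs->head; *pp; pp = &(*pp)->next) {
        if (*pp == lm) {
            *pp = lm->next;
            lm->registered = false;
            return;
        }
    }
}

/* True while any manager is still in grace; callers would use this to
 * refuse non-reclaim lock requests. */
static bool toy_in_grace(const struct grace_state *gs)
{
    return gs->head != NULL;
}
```

    Keeping this state outside lockd is what lets a server that never loads
    lockd still track its grace period.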

    Signed-off-by: Jeff Layton

    Jeff Layton