17 May, 2016

1 commit

  • Fault handlers currently take complete_unwritten argument to convert
    unwritten extents after PTEs are updated. However no filesystem uses
    this anymore as the code is racy. Remove the unused argument.

    Reviewed-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Signed-off-by: Vishal Verma

    Jan Kara
     

28 Feb, 2016

1 commit

  • As it is currently written ext4_dax_mkwrite() assumes that the call into
    __dax_mkwrite() will not have to do a block allocation so it doesn't create
    a journal entry. For a read that creates a zero page to cover a hole
    followed by a write that actually allocates storage this is incorrect. The
    ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
    get_blocks() to allocate storage.

    Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
    as this function already has all the logic needed to allocate a journal
    entry and call __dax_fault().

    Also update the ext2 fault handlers in this same way to remove duplicate
    code and keep the logic between ext2 and ext4 the same.

    Reviewed-by: Jan Kara
    Signed-off-by: Ross Zwisler
    Signed-off-by: Theodore Ts'o

    Ross Zwisler
     

23 Jan, 2016

1 commit

  • To properly support the new DAX fsync/msync infrastructure filesystems
    need to call dax_pfn_mkwrite() so that DAX can track when user pages are
    dirtied.

    Signed-off-by: Ross Zwisler
    Cc: "H. Peter Anvin"
    Cc: "J. Bruce Fields"
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andreas Dilger
    Cc: Dave Chinner
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jeff Layton
    Cc: Matthew Wilcox
    Cc: Thomas Gleixner
    Cc: Dan Williams
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     

19 Oct, 2015

1 commit

  • Add locking to ensure that DAX faults are isolated from ext2 operations
    that modify the data blocks allocation for an inode. This is intended to
    be analogous to the work being done in XFS by Dave Chinner:

    http://www.spinics.net/lists/linux-fsdevel/msg90260.html

    Compared with XFS the ext2 case is greatly simplified by the fact that ext2
    already allocates and zeros new blocks before they are returned as part of
    ext2_get_block(), so DAX doesn't need to worry about getting unmapped or
    unwritten buffer heads.

    This means that the only work we need to do in ext2 is to isolate the DAX
    faults from inode block allocation changes. I believe this just means that
    we need to isolate the DAX faults from truncate operations.

    The newly introduced dax_sem is intended to replicate the protection
    offered by i_mmaplock in XFS. In addition to truncate the i_mmaplock also
    protects XFS operations like hole punching, fallocate down, extent
    manipulation IOCTLS like xfs_ioc_space() and extent swapping. Truncate is
    the only one of these operations supported by ext2.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Jan Kara

    Ross Zwisler
     

09 Sep, 2015

2 commits

  • Use DAX to provide support for huge pages.

    Signed-off-by: Matthew Wilcox
    Cc: Hillf Danton
    Cc: "Kirill A. Shutemov"
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to
    return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is
    defined in linux/mm.h. Given that we don't want to include
    in , the easiest solution is to move the DAX-related
    functions to a new header, . We could also have moved
    VM_FAULT_* definitions to a new header, or a different header that isn't
    quite such a boil-the-ocean header as , but this felt like
    the best option.

    Signed-off-by: Matthew Wilcox
    Cc: Hillf Danton
    Cc: "Kirill A. Shutemov"
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

04 Jun, 2015

1 commit

  • dax_fault() currently relies on the get_block callback to attach an
    io completion callback to the mapping buffer head so that it can
    run unwritten extent conversion after zeroing allocated blocks.

    Instead of this hack, pass the conversion callback directly into
    dax_fault() similar to the get_block callback. When the filesystem
    allocates unwritten extents, it will set the buffer_unwritten()
    flag, and hence the dax_fault code can call the completion function
    in the contexts where it is necessary without overloading the
    mapping buffer head.

    Note: The changes to ext4 to use this interface are suspect at best.
    In fact, the way ext4 did this end_io assignment in the first place
    looks suspect because it only set a completion callback when there
    wasn't already some other write() call taking place on the same
    inode. The ext4 end_io code looks rather intricate and fragile with
    all it's reference counting and passing to different contexts for
    modification via inode private pointers that aren't protected by
    locks...

    Signed-off-by: Dave Chinner
    Acked-by: Jan Kara
    Signed-off-by: Dave Chinner

    Dave Chinner
     

16 Apr, 2015

3 commits

  • Merge second patchbomb from Andrew Morton:

    - the rest of MM

    - various misc bits

    - add ability to run /sbin/reboot at reboot time

    - printk/vsprintf changes

    - fiddle with seq_printf() return value

    * akpm: (114 commits)
    parisc: remove use of seq_printf return value
    lru_cache: remove use of seq_printf return value
    tracing: remove use of seq_printf return value
    cgroup: remove use of seq_printf return value
    proc: remove use of seq_printf return value
    s390: remove use of seq_printf return value
    cris fasttimer: remove use of seq_printf return value
    cris: remove use of seq_printf return value
    openrisc: remove use of seq_printf return value
    ARM: plat-pxa: remove use of seq_printf return value
    nios2: cpuinfo: remove use of seq_printf return value
    microblaze: mb: remove use of seq_printf return value
    ipc: remove use of seq_printf return value
    rtc: remove use of seq_printf return value
    power: wakeup: remove use of seq_printf return value
    x86: mtrr: if: remove use of seq_printf return value
    linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK
    MAINTAINERS: CREDITS: remove Stefano Brivio from B43
    .mailmap: add Ricardo Ribalda
    CREDITS: add Ricardo Ribalda Delgado
    ...

    Linus Torvalds
     
  • The original dax patchset split the ext2/4_file_operations because of the
    two NULL splice_read/splice_write in the dax case.

    In the vfs if splice_read/splice_write are NULL we then call
    default_splice_read/write.

    What we do here is make generic_file_splice_read aware of IS_DAX() so the
    original ext2/4_file_operations can be used as is.

    For write it appears that iter_file_splice_write is just fine. It uses
    the regular f_op->write(file,..) or new_sync_write(file, ...).

    Signed-off-by: Boaz Harrosh
    Reviewed-by: Jan Kara
    Cc: Dave Chinner
    Cc: Matthew Wilcox
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Boaz Harrosh
     
  • From: Yigal Korman

    [v1]
    Without this patch, c/mtime is not updated correctly when mmap'ed page is
    first read from and then written to.

    A new xfstest is submitted for testing this (generic/080)

    [v2]
    Jan Kara has pointed out that if we add the
    sb_start/end_pagefault pair in the new pfn_mkwrite we
    are then fixing another bug where: A user could start
    writing to the page while filesystem is frozen.

    Signed-off-by: Yigal Korman
    Signed-off-by: Boaz Harrosh
    Reviewed-by: Jan Kara
    Cc: Matthew Wilcox
    Cc: Dave Chinner
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Kirill A. Shutemov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Boaz Harrosh
     

12 Apr, 2015

1 commit

  • All places outside of core VFS that checked ->read and ->write for being NULL or
    called the methods directly are gone now, so NULL {read,write} with non-NULL
    {read,write}_iter will do the right thing in all cases.

    Signed-off-by: Al Viro

    Al Viro
     

17 Feb, 2015

4 commits

  • To help people transition, accept the 'xip' mount option (and report it in
    /proc/mounts), but print a message encouraging people to switch over to
    the 'dax' option.

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Mathieu Desnoyers
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • The fewer Kconfig options we have the better. Use the generic
    CONFIG_FS_DAX to enable XIP support in ext2 as well as in the core.

    Signed-off-by: Matthew Wilcox
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Instead of calling aops->get_xip_mem from the fault handler, the
    filesystem passes a get_block_t that is used to find the appropriate
    blocks.

    This requires that all architectures implement copy_user_page(). At the
    time of writing, mips and arm do not. Patches exist and are in progress.

    [akpm@linux-foundation.org: remap_file_pages went away]
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Jan Kara
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Russell King
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Use the generic AIO infrastructure instead of custom read and write
    methods. In addition to giving us support for AIO, this adds the missing
    locking between read() and truncate().

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

12 Jun, 2014

1 commit

  • iter_file_splice_write() - a ->splice_write() instance that gathers the
    pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
    it to ->write_iter(). A bunch of simple cases coverted to that...

    [AV: fixed the braino spotted by Cyrill]

    Signed-off-by: Al Viro

    Al Viro
     

07 May, 2014

2 commits


26 Jan, 2014

1 commit


26 Jul, 2011

1 commit

  • Replace the ->check_acl method with a ->get_acl method that simply reads an
    ACL from disk after having a cache miss. This means we can replace the ACL
    checking boilerplate code with a single implementation in namei.c.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

21 Jul, 2011

1 commit

  • Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push down taking i_mutex and
    the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
    file systems can drop taking the i_mutex altogether it seems, like ext3 and
    ocfs2. For correctness sake I just pushed everything down in all cases to make
    sure that we keep the current behavior the same for everybody, and then each
    individual fs maintainer can make up their mind about what to do from there.
    Thanks,

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     

28 May, 2010

3 commits

  • I also have commented a possible bug in existing ext2 code, marked with XXX.

    Cc: linux-ext4@vger.kernel.org
    Cc: Christoph Hellwig
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     
  • We don't name our generic fsync implementations very well currently.
    The no-op implementation for in-memory filesystems currently is called
    simple_sync_file which doesn't make too much sense to start with,
    the the generic one for simple filesystems is called simple_fsync
    which can lead to some confusion.

    This patch renames the generic file fsync method to generic_file_fsync
    to match the other generic_file_* routines it is supposed to be used
    with, and the no-op implementation to noop_fsync to make it obvious
    what to expect. In addition add some documentation for both methods.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

05 Mar, 2010

2 commits

  • Get rid of the initialize dquot operation - it is now always called from
    the filesystem and if a filesystem really needs it's own (which none
    currently does) it can just call into it's own routine directly.

    Rename the now static low-level dquot_initialize helper to __dquot_initialize
    and vfs_dq_init to dquot_initialize to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently various places in the VFS call vfs_dq_init directly. This means
    we tie the quota code into the VFS. Get rid of that and make the
    filesystem responsible for the initialization. For most metadata operations
    this is a straight forward move into the methods, but for truncate and
    open it's a bit more complicated.

    For truncate we currently only call vfs_dq_init for the sys_truncate case
    because open already takes care of it for ftruncate and open(O_TRUNC) - the
    new code causes an additional vfs_dq_init for those which is harmless.

    For open the initialization is moved from do_filp_open into the open method,
    which means it happens slightly earlier now, and only for regular files.
    The latter is fine because we don't need to initialize it for operations
    on special files, and we already do it as part of the namespace operations
    for directories.

    Add a dquot_file_open helper that filesystems that support generic quotas
    can use to fill in ->open.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

16 Dec, 2009

1 commit

  • When an IO error happens while writing metadata buffers, we should better
    report it and call ext2_error since the filesystem is probably no longer
    consistent. Sometimes such IO errors happen while flushing thread does
    background writeback, the buffer gets later evicted from memory, and thus
    the only trace of the error remains as AS_EIO bit set in blockdevice's
    mapping. So we check this bit in ext2_fsync and report the error although
    we cannot be really sure which buffer we failed to write.

    Signed-off-by: Jan Kara
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

09 Sep, 2009

1 commit


12 Jun, 2009

1 commit


04 Oct, 2008

1 commit

  • Any block based fs (this patch includes ext3) just has to declare its own
    fiemap() function and then call this generic function with its own
    get_block_t. This works well for block based filesystems that will map
    multiple contiguous blocks at one time, but will work for filesystems that
    only map one block at a time, you will just end up with an "extent" for each
    block. One gotcha is this will not play nicely where there is hole+data
    after the EOF. This function will assume its hit the end of the data as soon
    as it hits a hole after the EOF, so if there is any data past that it will
    not pick that up. AFAIK no block based fs does this anyway, but its in the
    comments of the function anyway just in case.

    Signed-off-by: Josef Bacik
    Signed-off-by: Mark Fasheh
    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org

    Josef Bacik
     

07 Feb, 2008

1 commit


17 Oct, 2007

1 commit

  • Val's cross-port of the ext3 reservations code into ext2.

    [mbligh@mbligh.org: Small type error for printk
    [akpm@linux-foundation.org: fix types, sync with ext3]
    [mbligh@mbligh.org: Bring ext2 reservations code in line with latest ext3]
    [akpm@linux-foundation.org: kill noisy printk]
    [akpm@linux-foundation.org: remember to dirty the gdp's block]
    [akpm@linux-foundation.org: cross-port the missed 5dea5176e5c32ef9f0d1a41d28427b3bf6881b3a]
    [akpm@linux-foundation.org: cross-port e6022603b9aa7d61d20b392e69edcdbbc1789969]
    [akpm@linux-foundation.org: Port the omitted 08fb306fe63d98eb86e3b16f4cc21816fa47f18e]
    [akpm@linux-foundation.org: Backport the missed 20acaa18d0c002fec180956f87adeb3f11f635a6]
    [akpm@linux-foundation.org: fixes]
    [cmm@us.ibm.com: fix reservation extension]
    [bunk@stusta.de: make ext2_get_blocks() static]
    [hugh@veritas.com: fix hang]
    [hugh@veritas.com: ext2_new_blocks should reset the reservation window size]
    [hugh@veritas.com: ext2 balloc: fix off-by-one against rsv_end]
    [hugh@veritas.com: grp_goal 0 is a genuine goal (unlike -1), so ext2_try_to_allocate_with_rsv should treat it as such]
    [hugh@veritas.com: rbtree usage cleanup]
    [pbadari@us.ibm.com: Fix for ext2 reservation]
    [bunk@kernel.org: remove fs/ext2/balloc.c:reserve_blocks()]
    [hugh@veritas.com: ext2 balloc: use io_error label]
    Cc: "Martin J. Bligh"
    Cc: Valerie Henson
    Cc: Mingming Cao
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Signed-off-by: Adrian Bunk
    Signed-off-by: Hugh Dickins
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin J. Bligh
     

17 Jul, 2007

1 commit


10 Jul, 2007

2 commits

  • This patch removes xip_file_sendfile, the sendfile implementation for
    xip without replacement. Those customers that use xip on s390 are not
    using sendfile() as far as we know, and so far s390 is the only platform
    this could potentially be used on so far.
    Having sendfile is not a popular feature for execute in place file
    systems, however we have a working implementation of splice_read() based
    on fs/splice.c if anyone asks for it.
    At this point in time, it does not seem preferable to merge
    splice_read() for xip because it causes extra maintenence effort due to
    code duplication and it requires struct page behind the xip memory
    segment. We'd like to get rid of that in favor of supporting flash based
    embedded platforms (Monta Vista work) soon.

    Signed-off-by: Carsten Otte
    Signed-off-by: Jens Axboe

    Carsten Otte
     
  • They can use generic_file_splice_read() instead. Since sys_sendfile() now
    prefers that, there should be no change in behaviour.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

13 Feb, 2007

1 commit

  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

01 Oct, 2006

3 commits


31 Mar, 2006

1 commit

  • This adds support for the sys_splice system call. Using a pipe as a
    transport, it can connect to files or sockets (latter as output only).

    From the splice.c comments:

    "splice": joining two ropes together by interweaving their strands.

    This is the "extended pipe" functionality, where a pipe is used as
    an arbitrary in-memory buffer. Think of a pipe as a small kernel
    buffer that you can use to transfer data from one end to the other.

    The traditional unix read/write is extended with a "splice()" operation
    that transfers data buffers to or from a pipe buffer.

    Named by Larry McVoy, original implementation from Linus, extended by
    Jens to support splicing to files and fixing the initial implementation
    bugs.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe