05 Apr, 2016

2 commits

  • Mostly direct substitution with occasional adjustment or removing
    outdated comments.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

18 Mar, 2016

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "Performance improvements in SEEK_DATA and xattr scalability
    improvements, plus a lot of clean ups and bug fixes"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (38 commits)
    ext4: clean up error handling in the MMP support
    jbd2: do not fail journal because of frozen_buffer allocation failure
    ext4: use __GFP_NOFAIL in ext4_free_blocks()
    ext4: fix compile error while opening the macro DOUBLE_CHECK
    ext4: print ext4 mount option data_err=abort correctly
    ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
    ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry()
    ext4: fix misspellings in comments.
    jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path
    ext4: more efficient SEEK_DATA implementation
    ext4: cleanup handling of bh->b_state in DAX mmap
    ext4: return hole from ext4_map_blocks()
    ext4: factor out determining of hole size
    ext4: fix setting of referenced bit in ext4_es_lookup_extent()
    ext4: remove i_ioend_count
    ext4: simplify io_end handling for AIO DIO
    ext4: move trans handling and completion deferal out of _ext4_get_block
    ext4: rename and split get blocks functions
    ext4: use i_mutex to serialize unaligned AIO DIO
    ext4: pack ioend structure better
    ...

    Linus Torvalds
     

28 Feb, 2016

5 commits

  • Merge fixes from Andrew Morton:
    "10 fixes"

    * emailed patches from Andrew Morton :
    dax: move writeback calls into the filesystems
    dax: give DAX clearing code correct bdev
    ext4: online defrag not supported with DAX
    ext2, ext4: only set S_DAX for regular inodes
    block: disable block device DAX by default
    ocfs2: unlock inode if deleting inode from orphan fails
    mm: ASLR: use get_random_long()
    drivers: char: random: add get_random_long()
    mm: numa: quickly fail allocations for NUMA balancing on full nodes
    mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED

    Linus Torvalds
     
  • As it is currently written ext4_dax_mkwrite() assumes that the call into
    __dax_mkwrite() will not have to do a block allocation so it doesn't create
    a journal entry. For a read that creates a zero page to cover a hole
    followed by a write that actually allocates storage this is incorrect. The
    ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
    get_blocks() to allocate storage.

    Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
    as this function already has all the logic needed to allocate a journal
    entry and call __dax_fault().

    Also update the ext2 fault handlers in this same way to remove duplicate
    code and keep the logic between ext2 and ext4 the same.

    Reviewed-by: Jan Kara
    Signed-off-by: Ross Zwisler
    Signed-off-by: Theodore Ts'o

    Ross Zwisler
     
  • Previously calls to dax_writeback_mapping_range() for all DAX filesystems
    (ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().

    dax_writeback_mapping_range() needs a struct block_device, and it used
    to get that from inode->i_sb->s_bdev. This is correct for normal inodes
    mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
    block devices and for XFS real-time files.

    Instead, call dax_writeback_mapping_range() directly from the filesystem
    ->writepages function so that it can supply us with a valid block
    device. This also fixes DAX code to properly flush caches in response
    to sync(2).

    Signed-off-by: Ross Zwisler
    Signed-off-by: Jan Kara
    Cc: Al Viro
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • dax_clear_blocks() needs a valid struct block_device and previously it
    was using inode->i_sb->s_bdev in all cases. This is correct for normal
    inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
    DAX raw block devices and for XFS real-time devices.

    Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
    its arguments to take a bdev and a sector instead of an inode and a
    block. This better reflects what the function does, and it allows the
    filesystem and raw block device code to pass in an appropriate struct
    block_device.

    Signed-off-by: Ross Zwisler
    Suggested-by: Dan Williams
    Reviewed-by: Jan Kara
    Cc: Theodore Ts'o
    Cc: Al Viro
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • When S_DAX is set on an inode we assume that if there are pages attached
    to the mapping (mapping->nrpages != 0), those pages are clean zero pages
    that were used to service reads from holes. Any dirty data associated
    with the inode should be in the form of DAX exceptional entries
    (mapping->nrexceptional) that is written back via
    dax_writeback_mapping_range().

    With the current code, though, this isn't always true. For example,
    ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
    data stored as dirty page cache entries. For these types of inodes,
    having S_DAX set doesn't really make sense since their I/O doesn't
    actually happen through the DAX code path.

    Instead, only allow S_DAX to be set for regular inodes for ext2 and
    ext4. This allows us to have strict DAX vs non-DAX paths in the
    writeback code.

    Signed-off-by: Ross Zwisler
    Reviewed-by: Jan Kara
    Cc: Theodore Ts'o
    Cc: Al Viro
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     

23 Feb, 2016

3 commits

  • To reduce amount of damage caused by single bad block, we limit number
    of inodes sharing an xattr block to 1024. Thus there can be more xattr
    blocks with the same contents when there are lots of files with the same
    extended attributes. These xattr blocks naturally result in hash
    collisions and can form long hash chains and we unnecessarily check each
    such block only to find out we cannot use it because it is already
    shared by too many inodes.

    Add a reusable flag to cache entries which is cleared when a cache entry
    has reached its maximum refcount. Cache entries which are not marked
    reusable are skipped by mb_cache_entry_find_{first,next}. This
    significantly speeds up mbcache when there are many same xattr blocks.
    For example for xattr-bench with 5 values and each process handling
    20000 files, the run for 64 processes is 25x faster with this patch.
    Even for 8 processes the speedup is almost 3x. We have also verified
    that for situations where there is only one xattr block of each kind,
    the patch doesn't have a measurable cost.

    [JK: Remove handling of setting the same value since it is not needed
    anymore, check for races in e_reusable setting, improve changelog,
    add measurements]

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Andreas Gruenbacher
     
  • Since old mbcache code is gone, let's rename new code to mbcache since
    number 2 is now meaningless. This is just a mechanical replacement.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     
  • The conversion is generally straightforward. We convert filesystem from
    a global cache to per-fs one. Similarly to ext4 the tricky part is that
    xattr block corresponding to found mbcache entry can get freed before we
    get buffer lock for that block. So we have to check whether the entry is
    still valid after getting the buffer lock.

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Jan Kara
     

24 Jan, 2016

1 commit

  • Pull final vfs updates from Al Viro:

    - The ->i_mutex wrappers (with small prereq in lustre)

    - a fix for too early freeing of symlink bodies on shmem (they need to
    be RCU-delayed) (-stable fodder)

    - followup to dedupe stuff merged this cycle

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: abort dedupe loop if fatal signals are pending
    make sure that freeing shmem fast symlinks is RCU-delayed
    wrappers for ->i_mutex access
    lustre: remove unused declaration

    Linus Torvalds
     

23 Jan, 2016

2 commits

  • To properly support the new DAX fsync/msync infrastructure filesystems
    need to call dax_pfn_mkwrite() so that DAX can track when user pages are
    dirtied.

    Signed-off-by: Ross Zwisler
    Cc: "H. Peter Anvin"
    Cc: "J. Bruce Fields"
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andreas Dilger
    Cc: Dave Chinner
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jeff Layton
    Cc: Matthew Wilcox
    Cc: Thomas Gleixner
    Cc: Dan Williams
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     
  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
    most filesystems overwrite the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

13 Jan, 2016

1 commit

  • Pull misc vfs updates from Al Viro:
    "All kinds of stuff. That probably should've been 5 or 6 separate
    branches, but by the time I'd realized how large and mixed that bag
    had become it had been too close to -final to play with rebasing.

    Some fs/namei.c cleanups there, memdup_user_nul() introduction and
    switching open-coded instances, burying long-dead code, whack-a-mole
    of various kinds, several new helpers for ->llseek(), assorted
    cleanups and fixes from various people, etc.

    One piece probably deserves special mention - Neil's
    lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
    called without ->i_mutex and tries to avoid ever taking it. That, of
    course, means that it's not useful for any directory modifications,
    but things like getting inode attributes in nfds readdirplus are fine
    with that. I really should've asked for moratorium on lookup-related
    changes this cycle, but since I hadn't done that early enough... I
    *am* asking for that for the coming cycle, though - I'm going to try
    and get conversion of i_mutex to rwsem with ->lookup() done under lock
    taken shared.

    There will be a patch closer to the end of the window, along the lines
    of the one Linus had posted last May - mechanical conversion of
    ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
    inode_is_locked()/inode_lock_nested(). To quote Linus back then:

    -----
    | This is an automated patch using
    |
    | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    | with a very few manual fixups
    -----

    I'm going to send that once the ->i_mutex-affecting stuff in -next
    gets mostly merged (or when Linus says he's about to stop taking
    merges)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    nfsd: don't hold i_mutex over userspace upcalls
    fs:affs:Replace time_t with time64_t
    fs/9p: use fscache mutex rather than spinlock
    proc: add a reschedule point in proc_readfd_common()
    logfs: constify logfs_block_ops structures
    fcntl: allow to set O_DIRECT flag on pipe
    fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
    fs: xattr: Use kvfree()
    [s390] page_to_phys() always returns a multiple of PAGE_SIZE
    nbd: use ->compat_ioctl()
    fs: use block_device name vsprintf helper
    lib/vsprintf: add %*pg format specifier
    fs: use gendisk->disk_name where possible
    poll: plug an unused argument to do_poll
    amdkfd: don't open-code memdup_user()
    cdrom: don't open-code memdup_user()
    rsxx: don't open-code memdup_user()
    mtip32xx: don't open-code memdup_user()
    [um] mconsole: don't open-code memdup_user_nul()
    [um] hostaudio: don't open-code memdup_user()
    ...

    Linus Torvalds
     

12 Jan, 2016

1 commit

  • Pull vfs xattr updates from Al Viro:
    "Andreas' xattr cleanup series.

    It's a followup to his xattr work that went in last cycle; -0.5KLoC"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    xattr handlers: Simplify list operation
    ocfs2: Replace list xattr handler operations
    nfs: Move call to security_inode_listsecurity into nfs_listxattr
    xfs: Change how listxattr generates synthetic attributes
    tmpfs: listxattr should include POSIX ACL xattrs
    tmpfs: Use xattr handler infrastructure
    btrfs: Use xattr handler infrastructure
    vfs: Distinguish between full xattr names and proper prefixes
    posix acls: Remove duplicate xattr name definitions
    gfs2: Remove gfs2_xattr_acl_chmod
    vfs: Remove vfs_xattr_cmp

    Linus Torvalds
     

07 Jan, 2016

1 commit


31 Dec, 2015

1 commit


14 Dec, 2015

1 commit

  • Change the list operation to only return whether or not an attribute
    should be listed. Copying the attribute names into the buffer is moved
    to the callers.

    Since the result only depends on the dentry and not on the attribute
    name, we do not pass the attribute name to list operations.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

09 Dec, 2015

2 commits

  • new method: ->get_link(); replacement of ->follow_link(). The differences
    are:
    * inode and dentry are passed separately
    * might be called both in RCU and non-RCU mode;
    the former is indicated by passing it a NULL dentry.
    * when called that way it isn't allowed to block
    and should return ERR_PTR(-ECHILD) if it needs to be called
    in non-RCU mode.

    It's a flagday change - the old method is gone, all in-tree instances
    converted. Conversion isn't hard; said that, so far very few instances
    do not immediately bail out when called in RCU mode. That'll change
    in the next commits.

    Signed-off-by: Al Viro

    Al Viro
     
  • kmap() in page_follow_link_light() needed to go - allowing to hold
    an arbitrary number of kmaps for long is a great way to deadlocking
    the system.

    new helper (inode_nohighmem(inode)) needs to be used for pagecache
    symlinks inodes; done for all in-tree cases. page_follow_link_light()
    instrumented to yell about anything missed.

    Signed-off-by: Al Viro

    Al Viro
     

07 Dec, 2015

1 commit

  • Add an additional "name" field to struct xattr_handler. When the name
    is set, the handler matches attributes with exactly that name. When the
    prefix is set instead, the handler matches attributes with the given
    prefix and with a non-empty suffix.

    This patch should avoid bugs like the one fixed in commit c361016a in
    the future.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: James Morris
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

17 Nov, 2015

1 commit

  • Similar to XFS warn when mounting DAX while it is still considered under
    development. Also, aspects of the DAX implementation, for example
    synchronization against multiple faults and faults causing block
    allocation, depend on the correct implementation in the filesystem. The
    maturity of a given DAX implementation is filesystem specific.

    Cc:
    Cc: "Theodore Ts'o"
    Cc: Matthew Wilcox
    Cc: linux-ext4@vger.kernel.org
    Cc: Kirill A. Shutemov
    Reported-by: Dave Chinner
    Acked-by: Jan Kara
    Signed-off-by: Dan Williams

    Dan Williams
     

14 Nov, 2015

2 commits

  • Pull vfs xattr cleanups from Al Viro.

    * 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    f2fs: xattr simplifications
    squashfs: xattr simplifications
    9p: xattr simplifications
    xattr handlers: Pass handler to operations instead of flags
    jffs2: Add missing capability check for listing trusted xattrs
    hfsplus: Remove unused xattr handler list operations
    ubifs: Remove unused security xattr handler
    vfs: Fix the posix_acl_xattr_list return value
    vfs: Check attribute names in posix acl xattr handers

    Linus Torvalds
     
  • The xattr_handler operations are currently all passed a file system
    specific flags value which the operations can use to disambiguate between
    different handlers; some file systems use that to distinguish the xattr
    namespace, for example. In some oprations, it would be useful to also have
    access to the handler prefix. To allow that, pass a pointer to the handler
    to operations instead of the flags value alone.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

10 Nov, 2015

2 commits

  • Merge third patch-bomb from Andrew Morton:
    "We're pretty much done over here - I'm still waiting for a nouveau
    merge so I can cleanly finish up Christoph's dma-mapping rework.

    - bunch of small misc stuff

    - fold abs64() into abs(), remove abs64()

    - new_valid_dev() cleanups

    - binfmt_elf_fdpic feature work"

    * emailed patches from Andrew Morton : (24 commits)
    fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries
    fs/stat.c: remove unnecessary new_valid_dev() check
    fs/reiserfs/namei.c: remove unnecessary new_valid_dev() check
    fs/nilfs2/namei.c: remove unnecessary new_valid_dev() check
    fs/ncpfs/dir.c: remove unnecessary new_valid_dev() check
    fs/jfs: remove unnecessary new_valid_dev() checks
    fs/hpfs/namei.c: remove unnecessary new_valid_dev() check
    fs/f2fs/namei.c: remove unnecessary new_valid_dev() check
    fs/ext2/namei.c: remove unnecessary new_valid_dev() check
    fs/exofs/namei.c: remove unnecessary new_valid_dev() check
    fs/btrfs/inode.c: remove unnecessary new_valid_dev() check
    fs/9p: remove unnecessary new_valid_dev() checks
    include/linux/kdev_t.h: old/new_valid_dev() can return bool
    include/linux/kdev_t.h: remove unused huge_valid_dev()
    kmap_atomic_to_page() has no users, remove it
    drivers/scsi/cxgbi: fix build with EXTRA_CFLAGS
    dma: remove external references to dma_supported
    Documentation/sysctl/vm.txt: fix misleading code reference of overcommit_memory
    remove abs64()
    kernel.h: make abs() work with 64-bit types
    ...

    Linus Torvalds
     
  • new_valid_dev() always returns 1, so the !new_valid_dev() check is not
    needed. Remove it.

    Signed-off-by: Yaowei Bai
    Cc: Alexander Viro
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yaowei Bai
     

19 Oct, 2015

1 commit

  • Add locking to ensure that DAX faults are isolated from ext2 operations
    that modify the data blocks allocation for an inode. This is intended to
    be analogous to the work being done in XFS by Dave Chinner:

    http://www.spinics.net/lists/linux-fsdevel/msg90260.html

    Compared with XFS the ext2 case is greatly simplified by the fact that ext2
    already allocates and zeros new blocks before they are returned as part of
    ext2_get_block(), so DAX doesn't need to worry about getting unmapped or
    unwritten buffer heads.

    This means that the only work we need to do in ext2 is to isolate the DAX
    faults from inode block allocation changes. I believe this just means that
    we need to isolate the DAX faults from truncate operations.

    The newly introduced dax_sem is intended to replicate the protection
    offered by i_mmaplock in XFS. In addition to truncate the i_mmaplock also
    protects XFS operations like hole punching, fallocate down, extent
    manipulation IOCTLS like xfs_ioc_space() and extent swapping. Truncate is
    the only one of these operations supported by ext2.

    Signed-off-by: Ross Zwisler
    Signed-off-by: Jan Kara

    Ross Zwisler
     

09 Sep, 2015

2 commits

  • Use DAX to provide support for huge pages.

    Signed-off-by: Matthew Wilcox
    Cc: Hillf Danton
    Cc: "Kirill A. Shutemov"
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to
    return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is
    defined in linux/mm.h. Given that we don't want to include
    in , the easiest solution is to move the DAX-related
    functions to a new header, . We could also have moved
    VM_FAULT_* definitions to a new header, or a different header that isn't
    quite such a boil-the-ocean header as , but this felt like
    the best option.

    Signed-off-by: Matthew Wilcox
    Cc: Hillf Danton
    Cc: "Kirill A. Shutemov"
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

24 Jul, 2015

1 commit


05 Jul, 2015

1 commit

  • Pull more vfs updates from Al Viro:
    "Assorted VFS fixes and related cleanups (IMO the most interesting in
    that part are f_path-related things and Eric's descriptor-related
    stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
    fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The
    file_table locking rule change by Eric Dumazet is a rather big and
    fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
    9p: cope with bogus responses from server in p9_client_{read,write}
    p9_client_write(): avoid double p9_free_req()
    9p: forgetting to cancel request on interrupted zero-copy RPC
    dax: bdev_direct_access() may sleep
    block: Add support for DAX reads/writes to block devices
    dax: Use copy_from_iter_nocache
    dax: Add block size note to documentation
    fs/file.c: __fget() and dup2() atomicity rules
    fs/file.c: don't acquire files->file_lock in fd_install()
    fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
    vfs: avoid creation of inode number 0 in get_next_ino
    namei: make set_root_rcu() return void
    make simple_positive() public
    ufs: use dir_pages instead of ufs_dir_pages()
    pagemap.h: move dir_pages() over there
    remove the pointless include of lglock.h
    fs: cleanup slight list_entry abuse
    xfs: Correctly lock inode when removing suid and file capabilities
    fs: Call security_ops->inode_killpriv on truncate
    fs: Provide function telling whether file_remove_privs() will do anything
    ...

    Linus Torvalds
     

01 Jul, 2015

1 commit

  • Pul xfs updates from Dave Chinner:
    "There's a couple of small API changes to the core DAX code which
    required small changes to the ext2 and ext4 code bases, but otherwise
    everything is within the XFS codebase.

    This update contains:

    - A new sparse on-disk inode record format to allow small extents to
    be used for inode allocation when free space is fragmented.

    - DAX support. This includes minor changes to the DAX core code to
    fix problems with lock ordering and bufferhead mapping abuse.

    - transaction commit interface cleanup

    - removal of various unnecessary XFS specific type definitions

    - cleanup and optimisation of freelist preparation before allocation

    - various minor cleanups

    - bug fixes for
    - transaction reservation leaks
    - incorrect inode logging in unwritten extent conversion
    - mmap lock vs freeze ordering
    - remote symlink mishandling
    - attribute fork removal issues"

    * tag 'xfs-for-linus-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (49 commits)
    xfs: don't truncate attribute extents if no extents exist
    xfs: clean up XFS_MIN_FREELIST macros
    xfs: sanitise error handling in xfs_alloc_fix_freelist
    xfs: factor out free space extent length check
    xfs: xfs_alloc_fix_freelist() can use incore perag structures
    xfs: remove xfs_caddr_t
    xfs: use void pointers in log validation helpers
    xfs: return a void pointer from xfs_buf_offset
    xfs: remove inst_t
    xfs: remove __psint_t and __psunsigned_t
    xfs: fix remote symlinks on V5/CRC filesystems
    xfs: fix xfs_log_done interface
    xfs: saner xfs_trans_commit interface
    xfs: remove the flags argument to xfs_trans_cancel
    xfs: pass a boolean flag to xfs_trans_free_items
    xfs: switch remaining xfs_trans_dup users to xfs_trans_roll
    xfs: check min blks for random debug mode sparse allocations
    xfs: fix sparse inodes 32-bit compile failure
    xfs: add initial DAX support
    xfs: add DAX IO path support
    ...

    Linus Torvalds
     

26 Jun, 2015

1 commit

  • Pull cgroup writeback support from Jens Axboe:
    "This is the big pull request for adding cgroup writeback support.

    This code has been in development for a long time, and it has been
    simmering in for-next for a good chunk of this cycle too. This is one
    of those problems that has been talked about for at least half a
    decade, finally there's a solution and code to go with it.

    Also see last weeks writeup on LWN:

    http://lwn.net/Articles/648292/"

    * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
    writeback, blkio: add documentation for cgroup writeback support
    vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
    writeback: do foreign inode detection iff cgroup writeback is enabled
    v9fs: fix error handling in v9fs_session_init()
    bdi: fix wrong error return value in cgwb_create()
    buffer: remove unusued 'ret' variable
    writeback: disassociate inodes from dying bdi_writebacks
    writeback: implement foreign cgroup inode bdi_writeback switching
    writeback: add lockdep annotation to inode_to_wb()
    writeback: use unlocked_inode_to_wb transaction in inode_congested()
    writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
    writeback: implement [locked_]inode_to_wb_and_lock_list()
    writeback: implement foreign cgroup inode detection
    writeback: make writeback_control track the inode being written back
    writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
    mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
    writeback: implement memcg writeback domain based throttling
    writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
    writeback: implement memcg wb_domain
    writeback: update wb_over_bg_thresh() to use wb_domain aware operations
    ...

    Linus Torvalds
     

24 Jun, 2015

1 commit


18 Jun, 2015

1 commit

  • FS_CGROUP_WRITEBACK indicates whether a file_system_type supports
    cgroup writeback; however, different super_blocks of the same
    file_system_type may or may not support cgroup writeback depending on
    filesystem options. This patch replaces FS_CGROUP_WRITEBACK with a
    per-super_block flag.

    super_block->s_flags carries some internal flags in the high bits but
    it's exposd to userland through uapi header and running out of space
    anyway. This patch adds a new field super_block->s_iflags to carry
    kernel-internal flags. It is currently only used by the new
    SB_I_CGROUPWB flag whose concatenated and abbreviated name is for
    consistency with other super_block flags.

    ext2_fill_super() is updated to set SB_I_CGROUPWB.

    v2: Added super_block->s_iflags instead of stealing another high bit
    from sb->s_flags as suggested by Christoph and Jan.

    Signed-off-by: Tejun Heo
    Cc: Alexander Viro
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: linux-ext4@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     

04 Jun, 2015

1 commit

  • dax_fault() currently relies on the get_block callback to attach an
    io completion callback to the mapping buffer head so that it can
    run unwritten extent conversion after zeroing allocated blocks.

    Instead of this hack, pass the conversion callback directly into
    dax_fault() similar to the get_block callback. When the filesystem
    allocates unwritten extents, it will set the buffer_unwritten()
    flag, and hence the dax_fault code can call the completion function
    in the contexts where it is necessary without overloading the
    mapping buffer head.

    Note: The changes to ext4 to use this interface are suspect at best.
    In fact, the way ext4 did this end_io assignment in the first place
    looks suspect because it only set a completion callback when there
    wasn't already some other write() call taking place on the same
    inode. The ext4 end_io code looks rather intricate and fragile with
    all it's reference counting and passing to different contexts for
    modification via inode private pointers that aren't protected by
    locks...

    Signed-off-by: Dave Chinner
    Acked-by: Jan Kara
    Signed-off-by: Dave Chinner

    Dave Chinner
     

02 Jun, 2015

1 commit

  • Writeback now supports cgroup writeback and the generic writeback,
    buffer, libfs, and mpage helpers that ext2 uses are all updated to
    work with cgroup writeback.

    This patch enables cgroup writeback for ext2 by adding
    FS_CGROUP_WRITEBACK to its ->fs_flags.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Cc: linux-ext4@vger.kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
     

11 May, 2015

1 commit