08 Oct, 2016

1 commit

  • Pull VFS splice updates from Al Viro:
    "There's a bunch of branches this cycle, both mine and from other folks
    and I'd rather send pull requests separately.

    This one is the conversion of ->splice_read() to ITER_PIPE iov_iter
    (and introduction of such). Gets rid of a lot of code in fs/splice.c
    and elsewhere; there will be followups, but these are for the next
    cycle... Some pipe/splice-related cleanups from Miklos in the same
    branch as well"

    * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    pipe: fix comment in pipe_buf_operations
    pipe: add pipe_buf_steal() helper
    pipe: add pipe_buf_confirm() helper
    pipe: add pipe_buf_release() helper
    pipe: add pipe_buf_get() helper
    relay: simplify relay_file_read()
    switch default_file_splice_read() to use of pipe-backed iov_iter
    switch generic_file_splice_read() to use of ->read_iter()
    new iov_iter flavour: pipe-backed
    fuse_dev_splice_read(): switch to add_to_pipe()
    skb_splice_bits(): get rid of callback
    new helper: add_to_pipe()
    splice: lift pipe_lock out of splice_to_pipe()
    splice: switch get_iovec_page_array() to iov_iter
    splice_to_pipe(): don't open-code wakeup_pipe_readers()
    consistent treatment of EFAULT on O_DIRECT read/write

    Linus Torvalds
     

06 Oct, 2016

1 commit


27 Sep, 2016

1 commit


27 Jun, 2016

1 commit

  • Make the code more readable by cleaning up the different ways of
    initializing lock holders and checking for initialized lock holders:
    mark lock holders as uninitialized by setting the holder's glock to NULL
    (gfs2_holder_mark_uninitialized) instead of zeroing out the entire
    object or using a separate flag. Recognize initialized holders by their
    non-NULL glock (gfs2_holder_initialized). Don't zero out holder objects
    which are immediately initialized via gfs2_holder_init or
    gfs2_glock_nq_init.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Bob Peterson

    Andreas Gruenbacher
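
    The convention described above can be sketched in plain C. This is a
    minimal userspace model, not the kernel code: the structures are
    simplified stand-ins, and model_holder_init() is a hypothetical helper
    invented for the illustration. Only the idea that a NULL glock pointer
    marks an uninitialized holder is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields needed to show the convention are modelled). */
struct gfs2_glock {
	int dummy;
};

struct gfs2_holder {
	struct gfs2_glock *gh_gl;  /* NULL means "holder not initialized" */
	unsigned int gh_state;     /* left untouched when marking uninitialized */
};

/* Mark a holder as uninitialized without zeroing the whole object. */
static inline void gfs2_holder_mark_uninitialized(struct gfs2_holder *gh)
{
	gh->gh_gl = NULL;
}

/* An initialized holder is recognized by its non-NULL glock. */
static inline bool gfs2_holder_initialized(const struct gfs2_holder *gh)
{
	return gh->gh_gl != NULL;
}

/* Hypothetical init helper for this model; it sets every field, so no
 * prior memset() of the holder is needed. */
static inline void model_holder_init(struct gfs2_glock *gl, unsigned int state,
				     struct gfs2_holder *gh)
{
	assert(gl != NULL);
	gh->gh_gl = gl;
	gh->gh_state = state;
}
```

    Because the check only looks at one pointer, an error path can test
    gfs2_holder_initialized() before deciding whether a holder needs to
    be released.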
     

21 May, 2016

1 commit

  • Pull GFS2 updates from Bob Peterson:
    "We've got nine patches this time:

    - Abhi Das has two patches that fix a GFS2 splice issue (and an
    adjustment).

    - Ben Marzinski has a patch which allows the proper unmount of a GFS2
    file system after hitting a withdraw error.

    - I have a patch to fix a problem where GFS2 would dereference an
    error value, plus three cosmetic / refactoring patches.

    - Daniel DeFreez has a patch to fix two glock reference count
    problems, where GFS2 was not properly "uninitializing" its glock
    holder on error paths.

    - Denys Vlasenko has a patch to change a function to not be inlined,
    thus reducing the memory footprint of the GFS2 module"

    * tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    GFS2: Refactor gfs2_remove_from_journal
    GFS2: Remove allocation parms from gfs2_rbm_find
    gfs2: use inode_lock/unlock instead of accessing i_mutex directly
    GFS2: Add calls to gfs2_holder_uninit in two error handlers
    GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid
    GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes
    gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()
    GFS2: Get rid of dead code in inode_go_demote_ok
    GFS2: ignore unlock failures after withdraw

    Linus Torvalds
     

18 May, 2016

1 commit

  • Pull vfs cleanups from Al Viro:
    "More cleanups from Christoph"

    * 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    nfsd: use RWF_SYNC
    fs: add RWF_DSYNC and RWF_SYNC
    ceph: use generic_write_sync
    fs: simplify the generic_write_sync prototype
    fs: add IOCB_SYNC and IOCB_DSYNC
    direct-io: remove the offset argument to dio_complete
    direct-io: eliminate the offset argument to ->direct_IO
    xfs: eliminate the pos variable in xfs_file_dio_aio_write
    filemap: remove the pos argument to generic_file_direct_write
    filemap: remove pos variables in generic_file_read_iter

    Linus Torvalds
     

13 May, 2016

1 commit


02 May, 2016

2 commits


20 Apr, 2016

1 commit


06 Apr, 2016

1 commit


05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
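
    Why the shift rules in the script are safe can be checked with a
    small userspace sketch. It assumes 4 KiB pages (PAGE_SHIFT of 12)
    and defines PAGE_CACHE_SHIFT equal to PAGE_SHIFT, as the commit
    message says was the case in practice; the two function names are
    hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Assumption for the sketch: 4 KiB pages. */
#define PAGE_SHIFT        12
/* The old macro was always defined as the same value. */
#define PAGE_CACHE_SHIFT  PAGE_SHIFT

/* Old style: convert a page-cache index into a page index. */
static inline unsigned long model_old_index(unsigned long index)
{
	/* Shift by (12 - 12) == 0, so this is the identity. */
	return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* New style after the semantic patch: the shift is simply dropped. */
static inline unsigned long model_new_index(unsigned long index)
{
	return index;
}
```

    Since both functions return the same value for every input, removing
    the shift changes no behaviour as long as the two constants are equal.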
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
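
    The wrapper pattern can be sketched as follows. This is a
    single-threaded userspace model: struct mutex here is a toy flag,
    not the kernel mutex, and only the naming scheme, inode_foo(inode)
    forwarding to mutex_foo(&inode->i_mutex), is taken from the commit
    message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mutex for a single-threaded model (assumption: not a real lock). */
struct mutex {
	bool locked;
};

static inline void mutex_lock(struct mutex *m)
{
	assert(!m->locked);  /* a real mutex would sleep here instead */
	m->locked = true;
}

static inline void mutex_unlock(struct mutex *m)
{
	assert(m->locked);
	m->locked = false;
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static inline int mutex_trylock(struct mutex *m)
{
	if (m->locked)
		return 0;
	m->locked = true;
	return 1;
}

static inline int mutex_is_locked(const struct mutex *m)
{
	return m->locked;
}

/* Simplified inode carrying only the lock. */
struct inode {
	struct mutex i_mutex;
};

/* The wrappers: callers no longer name ->i_mutex directly. */
static inline void inode_lock(struct inode *inode)
{
	mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct inode *inode)
{
	mutex_unlock(&inode->i_mutex);
}

static inline int inode_trylock(struct inode *inode)
{
	return mutex_trylock(&inode->i_mutex);
}

static inline int inode_is_locked(const struct inode *inode)
{
	return mutex_is_locked(&inode->i_mutex);
}
```

    Hiding the field behind wrappers means the lock type can later be
    changed in one place without touching every caller.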
     

22 Dec, 2015

1 commit

  • Commit 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()")
    moved the flock/posix lock identification code to
    locks_lock_inode_wait(), but failed to set fl_flags to FL_FLOCK, which
    causes a kernel panic in locks_lock_inode_wait().

    Fixes: 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()")
    Signed-off-by: Junxiao Bi
    Signed-off-by: Bob Peterson

    Junxiao Bi
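
    A rough model of the failure mode, under stated assumptions: the
    dispatch on fl_flags below mirrors what the commit message describes,
    but the flag values, the trimmed-down struct file_lock, and the
    model_* names are all chosen for this sketch rather than copied from
    the kernel. Where the model returns MODEL_PATH_BUG, the kernel is
    assumed to hit the panic mentioned above.

```c
#include <assert.h>

/* Flag values assumed for the sketch. */
#define FL_POSIX 1u
#define FL_FLOCK 2u

/* Trimmed-down stand-in; the real structure has many more fields. */
struct file_lock {
	unsigned int fl_flags;
};

enum model_path {
	MODEL_PATH_POSIX,  /* would call the POSIX lock helper */
	MODEL_PATH_FLOCK,  /* would call the flock helper */
	MODEL_PATH_BUG     /* neither flag set: the panic case */
};

/* Decide which locking path a request takes, based only on fl_flags. */
static inline enum model_path model_lock_dispatch(const struct file_lock *fl)
{
	switch (fl->fl_flags & (FL_POSIX | FL_FLOCK)) {
	case FL_POSIX:
		return MODEL_PATH_POSIX;
	case FL_FLOCK:
		return MODEL_PATH_FLOCK;
	default:
		return MODEL_PATH_BUG;
	}
}
```

    In this model, a request whose fl_flags was never set falls into the
    default branch, which is why the fix is to set FL_FLOCK before
    calling the helper.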
     

15 Dec, 2015

2 commits

  • This patch makes no functional changes. Its goal is to reduce the
    size of the gfs2 inode in memory by rearranging structures and
    changing the size of some variables within the structure.

    Signed-off-by: Bob Peterson

    Bob Peterson
     
  • Before this patch, multi-block reservation structures were allocated
    from a special slab. This patch folds the structure into the gfs2_inode
    structure. The disadvantage is that the gfs2_inode needs more memory,
    even when a file is opened read-only. The advantages are: (a) we don't
    need the special slab and the extra time it takes to allocate and
    deallocate from it. (b) we no longer need to worry that the structure
    exists for things like quota management. (c) This also allows us to
    remove the calls to get_write_access and put_write_access since we
    know the structure will exist.

    Signed-off-by: Bob Peterson

    Bob Peterson
     

24 Nov, 2015

1 commit

  • This patch basically reverts the majority of patch 5407e24.
    That patch eliminated the gfs2_qadata structure in favor of just
    using the reservations structure. The problem with doing that is that
    it increases the size of the reservations structure. That is not an
    issue until it comes time to fold the reservations structure into the
    inode in memory so we know it's always there. By separating out the
    quota structure again, we aren't punishing the non-quota users by
    making all the inodes bigger, requiring more slab space. This patch
    creates a new slab area to allocate the quota stuff so it's managed
    a little more sanely.

    Signed-off-by: Bob Peterson

    Bob Peterson
     

11 Nov, 2015

1 commit

  • When new files and directories are created inside a parent directory
    we automatically inherit the GFS2_DIF_SYSTEM flag (if set) and assign
    it to the new file/dirs.

    All new system files/dirs created in the metafs by, say gfs2_jadd,
    will have this flag set because they will have parent directories in
    the metafs whose GFS2_DIF_SYSTEM flag has already been set (most likely
    by a previous mkfs.gfs2)

    Signed-off-by: Abhi Das
    Signed-off-by: Bob Peterson

    Abhi Das
     

10 Nov, 2015

1 commit

  • Pull gfs2 updates from Bob Peterson:
    "Here is a list of patches we've accumulated for GFS2 for the current
    upstream merge window. There are only six patches this time:

    1. A cleanup patch from Andreas to remove the gl_spin #define in favor
    of its value for the sake of clarity.
    2. A fix from Andy Price to mark the inode dirty during fallocate.
    3. A fix from Andy Price to set s_mode on mount failures to prevent a
    stack trace.
    4. A patch from me to prevent a kernel BUG() in trans_add_meta/trans_add_data
    due to uninitialized storage.
    5. A patch from me to protect our freeing of the in-core directory
    hash table to prevent double-free.
    6. A fix for a page/block rounding problem that resulted in a metadata
    coherency problem when the block size != page size

    I've got a lot more patches in various stages of review and testing,
    but I'm afraid they'll have to wait until the next merge window. So
    next time we're likely to have a lot more"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    GFS2: Fix rgrp end rounding problem for bsize < page size
    GFS2: Protect freeing directory hash table with i_lock spin_lock
    gfs2: Remove gl_spin define
    gfs2: Add missing else in trans_add_meta/data
    GFS2: Set s_mode before parsing mount options
    GFS2: fallocate: do not rely on file_update_time to mark the inode dirty

    Linus Torvalds
     

23 Oct, 2015

1 commit


22 Sep, 2015

1 commit


28 Jun, 2015

1 commit

  • Pull GFS2 updates from Bob Peterson:
    "Here are the patches we've accumulated for GFS2 for the current
    upstream merge window. We have a good mixture this time. Here are
    some of the features:

    - Fix a problem with RO mounts writing to the journal.

    - Further improvements to quotas on GFS2.

    - Added support for rename2 and RENAME_EXCHANGE on GFS2.

    - Increase performance by making glock lru_list less of a bottleneck.

    - Increase performance by avoiding unnecessary buffer_head releases.

    - Increase performance by using average glock round trip time from all CPUs.

    - Fixes for some compiler warnings and minor white space issues.

    - Other misc bug fixes"

    * tag 'gfs2-merge-window' of git://git.kernel.org:/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    GFS2: Don't brelse rgrp buffer_heads every allocation
    GFS2: Don't add all glocks to the lru
    gfs2: Don't support fallocate on jdata files
    gfs2: s64 cast for negative quota value
    gfs2: limit quota log messages
    gfs2: fix quota updates on block boundaries
    gfs2: fix shadow warning in gfs2_rbm_find()
    gfs2: kerneldoc warning fixes
    gfs2: convert simple_str to kstr
    GFS2: make sure S_NOSEC flag isn't overwritten
    GFS2: add support for rename2 and RENAME_EXCHANGE
    gfs2: handle NULL rgd in set_rgrp_preferences
    GFS2: inode.c: indent with TABs, not spaces
    GFS2: mark the journal idle to fix ro mounts
    GFS2: Average in only non-zero round-trip times for congestion stats
    GFS2: Use average srttb value in congestion calculations

    Linus Torvalds
     

09 Jun, 2015

1 commit


06 May, 2015

1 commit


17 Apr, 2015

1 commit

  • Pull third hunk of vfs changes from Al Viro:
    "This contains the ->direct_IO() changes from Omar + saner
    generic_write_checks() + dealing with fcntl()/{read,write}() races
    (mirroring O_APPEND/O_DIRECT into iocb->ki_flags and instead of
    repeatedly looking at ->f_flags, which can be changed by fcntl(2),
    check ->ki_flags - which cannot) + infrastructure bits for dhowells'
    d_inode annotations + Christoph's switch of /dev/loop to
    vfs_iter_write()"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
    block: loop: switch to VFS ITER_BVEC
    configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode
    VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
    VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
    VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
    NFS: Don't use d_inode as a variable name
    VFS: Impose ordering on accesses of d_inode and d_flags
    VFS: Add owner-filesystem positive/negative dentry checks
    nfs: generic_write_checks() shouldn't be done on swapout...
    ocfs2: use __generic_file_write_iter()
    mirror O_APPEND and O_DIRECT into iocb->ki_flags
    switch generic_write_checks() to iocb and iter
    ocfs2: move generic_write_checks() before the alignment checks
    ocfs2_file_write_iter: stop messing with ppos
    udf_file_write_iter: reorder and simplify
    fuse: ->direct_IO() doesn't need generic_write_checks()
    ext4_file_write_iter: move generic_write_checks() up
    xfs_file_aio_write_checks: switch to iocb/iov_iter
    generic_write_checks(): drop isblk argument
    blkdev_write_iter: expand generic_file_checks() call in there
    ...

    Linus Torvalds
     

16 Apr, 2015

1 commit

  • Pull second vfs update from Al Viro:
    "Now that net-next went in... Here's the next big chunk - killing
    ->aio_read() and ->aio_write().

    There'll be one more pile today (direct_IO changes and
    generic_write_checks() cleanups/fixes), but I'd prefer to keep that
    one separate"

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
    ->aio_read and ->aio_write removed
    pcm: another weird API abuse
    infinibad: weird APIs switched to ->write_iter()
    kill do_sync_read/do_sync_write
    fuse: use iov_iter_get_pages() for non-splice path
    fuse: switch to ->read_iter/->write_iter
    switch drivers/char/mem.c to ->read_iter/->write_iter
    make new_sync_{read,write}() static
    coredump: accept any write method
    switch /dev/loop to vfs_iter_write()
    serial2002: switch to __vfs_read/__vfs_write
    ashmem: use __vfs_read()
    export __vfs_read()
    autofs: switch to __vfs_write()
    new helper: __vfs_write()
    switch hugetlbfs to ->read_iter()
    coda: switch to ->read_iter/->write_iter
    ncpfs: switch to ->read_iter/->write_iter
    net/9p: remove (now-)unused helpers
    p9_client_attach(): set fid->uid correctly
    ...

    Linus Torvalds
     

15 Apr, 2015

1 commit

  • Pull GFS2 updates from Bob Peterson:
    "Here is a list of patches we've accumulated for GFS2 for the current
    upstream merge window.

    Most of the patches fix GFS2 quotas, which were not properly enforced.
    There's another that adds me as a GFS2 co-maintainer, and a couple
    patches that fix a kernel panic doing splice_write on GFS2 as well as
    a few correctness patches"

    * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
    gfs2: fix quota refresh race in do_glock()
    gfs2: incorrect check for debugfs returns
    gfs2: allow fallocate to max out quotas/fs efficiently
    gfs2: allow quota_check and inplace_reserve to return available blocks
    gfs2: perform quota checks against allocation parameters
    GFS2: Move gfs2_file_splice_write outside of #ifdef
    GFS2: Allocate reservation during splice_write
    GFS2: gfs2_set_acl(): Cache "no acl" as well
    Add myself (Bob Peterson) as a maintainer of GFS2

    Linus Torvalds
     

12 Apr, 2015

2 commits


26 Mar, 2015

1 commit


19 Mar, 2015

4 commits

  • We can quickly get an estimate of how many blocks are available
    for allocation restricted by quota and fs size respectively, using
    the ap->allowed field in the gfs2_alloc_parms structure.
    gfs2_quota_check() and gfs2_inplace_reserve() provide these values.

    Once we have the total number of blocks available to us, we can
    compute how many bytes of data can be written using those blocks
    instead of guessing inefficiently.

    Signed-off-by: Abhi Das
    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Abhi Das
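
    The arithmetic behind "how many bytes can be written using those
    blocks" can be sketched as below. This is a simplification: it
    ignores any blocks needed for metadata, and the function name and
    parameters are hypothetical. Only the idea of converting an allowed
    block count into a byte limit comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the number of blocks still allowed (by quota or free space),
 * the block size as a shift, and the bytes the caller asked for,
 * return how many bytes can actually be written.
 * Assumption: every allowed block holds data, none hold metadata.
 */
static inline uint64_t model_max_writable_bytes(uint64_t allowed_blocks,
						unsigned int blkbits,
						uint64_t requested_bytes)
{
	uint64_t max_bytes = allowed_blocks << blkbits;

	return requested_bytes < max_bytes ? requested_bytes : max_bytes;
}
```

    With 10 allowed blocks of 4096 bytes, a 100000-byte request is
    capped at 40960 bytes, while a 1000-byte request goes through whole.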
     
  • Use struct gfs2_alloc_parms as an argument to gfs2_quota_check()
    and gfs2_quota_lock_check() to check for quota violations while
    accounting for the new blocks requested by the current operation
    in ap->target.

    Previously, the number of new blocks requested during an operation
    was not accounted for during quota_check, which allowed these
    operations to exceed quota. This was not very apparent since most
    operations allocated only 1 block at a time and quotas would get
    violated in the next operation, i.e. the quota excess would only be
    1 block or so. With fallocate (where we allocate a bunch of blocks
    at once), the quota excess is non-trivial and is addressed by this
    patch.

    Signed-off-by: Abhi Das
    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Abhi Das
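
    The difference between the old and new checks can be shown with a
    toy comparison. The function names are hypothetical, a limit of 0 is
    assumed to mean "no limit", and the real check works on on-disk
    quota structures rather than plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old behaviour: only blocks already in use are compared to the limit. */
static inline bool model_quota_exceeded_old(int64_t used, int64_t limit)
{
	return limit != 0 && used > limit;
}

/* New behaviour: blocks requested by the current operation (ap->target
 * in the commit message) are added before comparing. */
static inline bool model_quota_exceeded_new(int64_t used, int64_t target,
					    int64_t limit)
{
	return limit != 0 && used + target > limit;
}
```

    With 95 blocks used, a limit of 100 and a request for 10 blocks, the
    old check passes and the operation ends at 105 blocks, while the new
    check rejects it up front.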
     
  • This patch moves function gfs2_file_splice_write so it's not
    conditionally compiled.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     
  • This patch adds a GFS2-specific function for splice_write which
    first calls function gfs2_rs_alloc to make sure a reservation
    structure has been allocated before attempting to reserve blocks.

    Signed-off-by: Bob Peterson
    Acked-by: Steven Whitehouse

    Bob Peterson
     

18 Feb, 2015

1 commit


11 Feb, 2015

1 commit


05 Feb, 2015

1 commit

  • Add a new mount option which enables a new "lazytime" mode. This mode
    causes atime, mtime, and ctime updates to only be made to the
    in-memory version of the inode. The on-disk times will only get
    updated (a) when the inode needs to be updated for some non-time
    related change, (b) when userspace calls fsync(), syncfs() or sync(),
    or (c) just before an undeleted inode is evicted from memory.

    This is OK according to POSIX because there are no guarantees after a
    crash unless userspace explicitly requests them via an fsync(2) call.

    For workloads which feature a large number of random writes to a
    preallocated file, the lazytime mount option significantly reduces
    writes to the inode table. The repeated 4k writes to a single block
    will result in undesirable stress on flash devices and SMR disk
    drives. Even on conventional HDD's, the repeated writes to the inode
    table block will trigger Adjacent Track Interference (ATI) remediation
    latencies, which very negatively impact long tail latencies --- which
    is a very big deal for web serving tiers (for example).

    Google-Bug-Id: 18297052

    Signed-off-by: Theodore Ts'o
    Signed-off-by: Al Viro

    Theodore Ts'o
     

14 Nov, 2014

3 commits

  • gfs2_fallocate() wasn't updating ctime and mtime when modifying the
    inode. Add a call to file_update_time() to do that.

    Signed-off-by: Andrew Price
    Signed-off-by: Steven Whitehouse

    Andrew Price
     
  • This addresses an issue caught by fsx where the inode size was not being
    updated to the expected value after fallocate(2) with mode 0.

    The problem was caused by the offset and len parameters being converted
    to multiples of the file system's block size, so i_size would be rounded
    up to the nearest block size multiple instead of the requested size.

    This replaces the per-chunk i_size updates with a single i_size_write on
    successful completion of the operation. With this patch gfs2 gets
    through a complete run of fsx.

    For clarity, the check for (error == 0) following the loop is removed as
    all failures before that point jump to out_* labels or return.

    Signed-off-by: Andrew Price
    Signed-off-by: Steven Whitehouse

    Andrew Price
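
    The rounding problem can be reproduced with a few lines of
    arithmetic. This sketch assumes a power-of-two block size given as a
    shift; the function names are hypothetical and model only the final
    file size, not the allocation loop itself.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of the block size (1 << bsize_shift). */
static inline uint64_t model_round_up_block(uint64_t x, unsigned int bsize_shift)
{
	uint64_t mask = ((uint64_t)1 << bsize_shift) - 1;

	return (x + mask) & ~mask;
}

/* Buggy behaviour: size is taken from the block-aligned end of the range. */
static inline uint64_t model_isize_block_aligned(uint64_t old_size,
						 uint64_t offset, uint64_t len,
						 unsigned int bsize_shift)
{
	uint64_t end = model_round_up_block(offset + len, bsize_shift);

	return end > old_size ? end : old_size;
}

/* Fixed behaviour: a single update to exactly offset + len at the end. */
static inline uint64_t model_isize_requested(uint64_t old_size,
					     uint64_t offset, uint64_t len)
{
	uint64_t end = offset + len;

	return end > old_size ? end : old_size;
}
```

    For an empty file with 4096-byte blocks, a request for offset 0 and
    length 5000 yields 8192 in the block-aligned model and 5000 in the
    fixed one, which is the kind of mismatch a tool like fsx checks for.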
     
  • gfs2_fallocate wasn't checking inode_newsize_ok nor get_write_access.
    Split out the context setup and inode locking pieces into a separate
    function to make it more clear and add these missing calls.

    inode_newsize_ok is called conditional on FALLOC_FL_KEEP_SIZE as there
    is no need to enforce a file size limit if it isn't going to change.

    Signed-off-by: Andrew Price
    Signed-off-by: Steven Whitehouse

    Andrew Price