03 Nov, 2015

1 commit


09 Sep, 2015

3 commits


05 Jul, 2015

1 commit

  • Pull more vfs updates from Al Viro:
    "Assorted VFS fixes and related cleanups (IMO the most interesting in
    that part are f_path-related things and Eric's descriptor-related
    stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
    fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The
    file_table locking rule change by Eric Dumazet is a rather big and
    fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
    9p: cope with bogus responses from server in p9_client_{read,write}
    p9_client_write(): avoid double p9_free_req()
    9p: forgetting to cancel request on interrupted zero-copy RPC
    dax: bdev_direct_access() may sleep
    block: Add support for DAX reads/writes to block devices
    dax: Use copy_from_iter_nocache
    dax: Add block size note to documentation
    fs/file.c: __fget() and dup2() atomicity rules
    fs/file.c: don't acquire files->file_lock in fd_install()
    fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
    vfs: avoid creation of inode number 0 in get_next_ino
    namei: make set_root_rcu() return void
    make simple_positive() public
    ufs: use dir_pages instead of ufs_dir_pages()
    pagemap.h: move dir_pages() over there
    remove the pointless include of lglock.h
    fs: cleanup slight list_entry abuse
    xfs: Correctly lock inode when removing suid and file capabilities
    fs: Call security_ops->inode_killpriv on truncate
    fs: Provide function telling whether file_remove_privs() will do anything
    ...

    Linus Torvalds
     

25 Jun, 2015

5 commits

  • Previously our dcache readdir code relied on the child dentries in the
    directory dentry's d_subdir list being sorted by dentry offset in
    descending order. When adding dentries to the dcache, if a dentry
    already existed, our readdir code moved it to the head of the directory
    dentry's d_subdir list. This design relies on dcache internals.
    Al Viro suggests using ncpfs's approach: keeping an array of pointers
    to dentries in the page cache of the directory inode. The validity of
    those pointers is represented by the directory inode's complete and
    ordered flags. When a dentry gets pruned, we clear the directory inode's
    complete flag in the d_prune() callback. Before moving a dentry to
    another directory, we clear the ordered flag for both the old and the
    new directory.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
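
The flag scheme described in this commit can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct dir_cache`, its fixed-size `slots` array and the `dir_cache_*` helpers are hypothetical names invented here, not the kernel's data structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 8

/* Hypothetical stand-in for a directory inode's readdir cache. */
struct dir_cache {
    bool complete;                  /* every child is present in slots[] */
    bool ordered;                   /* slots[] still matches readdir order */
    const char *slots[CACHE_SLOTS]; /* stand-in for cached dentry pointers */
    size_t used;
};

/* Record a child in readdir order; returns false when the cache is full. */
static bool dir_cache_add(struct dir_cache *dc, const char *name)
{
    assert(dc != NULL && name != NULL);
    if (dc->used == CACHE_SLOTS)
        return false;
    dc->slots[dc->used++] = name;
    return true;
}

/* The cached pointers may be trusted only while both flags are set. */
static bool dir_cache_usable(const struct dir_cache *dc)
{
    assert(dc != NULL);
    return dc->complete && dc->ordered;
}

/* Analogue of the d_prune() callback: a child was dropped from the cache. */
static void dir_cache_on_prune(struct dir_cache *dc)
{
    assert(dc != NULL);
    dc->complete = false;
}

/* Analogue of moving a dentry: both directories lose their ordering. */
static void dir_cache_on_move(struct dir_cache *from, struct dir_cache *to)
{
    assert(from != NULL && to != NULL);
    from->ordered = false;
    to->ordered = false;
}
```

A reader of the cache would call `dir_cache_usable()` first and fall back to a full directory read whenever it returns false.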
     
  • GFP_NOFS memory allocation is required for the page writeback path,
    but there is no need to use GFP_NOFS in the syscall path or the
    readpage path.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • In most cases where the snap context is needed, we are holding a
    reference to CEPH_CAP_FILE_WR. So we can set the ceph inode's
    i_head_snapc when getting the CEPH_CAP_FILE_WR reference,
    and make the code get the snap context from i_head_snapc. This makes
    the code simpler.

    Another benefit of this change is that we can handle snap
    notifications more elegantly, especially when the snap context
    is updated while someone else is writing. The old queue
    cap_snap code may set cap_snap's context to either the old
    context or the new snap context, depending on whether i_head_snapc
    is set. The new queue cap_snap code always sets cap_snap's
    context to the old snap context.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Signed-off-by: Yan, Zheng
    Reviewed-by: Alex Elder

    Yan, Zheng
     

24 Jun, 2015

1 commit

  • file_remove_suid() is a misnomer, since it also removes file capabilities
    stored in xattrs and sets the S_NOSEC flag. Also, should_remove_suid()
    reports something other than whether a file_remove_suid() call is
    necessary, which leads to bugs.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

16 Apr, 2015

1 commit


12 Apr, 2015

5 commits


26 Mar, 2015

1 commit


13 Mar, 2015

1 commit


23 Feb, 2015

2 commits

  • Pull more vfs updates from Al Viro:
    "Assorted stuff from this cycle. The big ones here are multilayer
    overlayfs from Miklos and beginning of sorting ->d_inode accesses out
    from David"

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits)
    autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
    procfs: fix race between symlink removals and traversals
    debugfs: leave freeing a symlink body until inode eviction
    Documentation/filesystems/Locking: ->get_sb() is long gone
    trylock_super(): replacement for grab_super_passive()
    fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
    Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
    VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
    SELinux: Use d_is_positive() rather than testing dentry->d_inode
    Smack: Use d_is_positive() rather than testing dentry->d_inode
    TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR()
    Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode
    Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb
    VFS: Split DCACHE_FILE_TYPE into regular and special types
    VFS: Add a fallthrough flag for marking virtual dentries
    VFS: Add a whiteout dentry type
    VFS: Introduce inode-getting helpers for layered/unioned fs environments
    Infiniband: Fix potential NULL d_inode dereference
    posix_acl: fix reference leaks in posix_acl_create
    autofs4: Wrong format for printing dentry
    ...

    Linus Torvalds
     
  • Convert the following where appropriate:

    (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

    (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

    (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
    complicated than it appears as some calls should be converted to
    d_can_lookup() instead. The difference is whether the directory in
    question is a real dir with a ->lookup op or whether it's a fake dir with
    a ->d_automount op.

    In some circumstances, we can subsume checks for dentry->d_inode not being
    NULL into this, provided the code isn't in a filesystem that expects
    d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
    use d_inode() rather than d_backing_inode() to get the inode pointer).

    Note that the dentry type field may be set to something other than
    DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
    manages the fall-through from a negative dentry to a lower layer. In such a
    case, the dentry type of the negative union dentry is set to the same as the
    type of the lower dentry.

    However, if you know d_inode is not NULL at the call site, then you can use
    the d_is_xxx() functions even in a filesystem.

    There is one further complication: a 0,0 chardev dentry may be labelled
    DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
    intended for special directory entry types that don't have attached inodes.

    The following perl+coccinelle script was used:

    use strict;

    my @callers;
    my $fd;
    open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
    @callers = <$fd>;
    close($fd);
    unless (@callers) {
    print "No matches\n";
    exit(0);
    }

    my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

    my $coccifile = "tmp.sp.cocci";
    open($fd, ">$coccifile") || die $coccifile;
    print($fd "$_\n") || die $coccifile foreach (@cocci);
    close($fd);

    foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
    die "spatch failed";
    }

    [AV: overlayfs parts skipped]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

20 Feb, 2015

1 commit

  • Pull Ceph changes from Sage Weil:
    "On the RBD side, there is a conversion to blk-mq from Christoph,
    several long-standing bug fixes from Ilya, and some cleanup from
    Rickard Strandqvist.

    On the CephFS side there is a long list of fixes from Zheng, including
    improved session handling, a few IO path fixes, some dcache management
    correctness fixes, and several blocking while !TASK_RUNNING fixes.

    The core code gets a few cleanups and Chaitanya has added support for
    TCP_NODELAY (which has been used on the server side for ages but we
    somehow missed on the kernel client).

    There is also an update to MAINTAINERS to fix up some email addresses
    and reflect that Ilya and Zheng are doing most of the maintenance for
    RBD and CephFS these days. Do not be surprised to see a pull request
    come from one of them in the future if I am unavailable for some
    reason"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
    MAINTAINERS: update Ceph and RBD maintainers
    libceph: kfree() in put_osd() shouldn't depend on authorizer
    libceph: fix double __remove_osd() problem
    rbd: convert to blk-mq
    ceph: return error for traceless reply race
    ceph: fix dentry leaks
    ceph: re-send requests when MDS enters reconnecting stage
    ceph: show nocephx_require_signatures and notcp_nodelay options
    libceph: tcp_nodelay support
    rbd: do not treat standalone as flatten
    ceph: fix atomic_open snapdir
    ceph: properly mark empty directory as complete
    client: include kernel version in client metadata
    ceph: provide seperate {inode,file}_operations for snapdir
    ceph: fix request time stamp encoding
    ceph: fix reading inline data when i_size > PAGE_SIZE
    ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)
    ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)
    ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)
    rbd: fix error paths in rbd_dev_refresh()
    ...

    Linus Torvalds
     

19 Feb, 2015

3 commits

  • ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value
    and creates snapdir inode if it's -ENOENT

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • When an inode has inline data but its size > PAGE_SIZE (it was truncated
    to a larger size), the previous direct read code returned -EIO. This patch
    adds code to return zeros for data whose offset > PAGE_SIZE.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
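
The zero-fill behaviour described in this commit can be sketched as a plain user-space function. The name `read_inline` and its parameters are hypothetical, and the sketch uses the inline data length as the boundary (the commit speaks of PAGE_SIZE, the upper limit for inline data); it is not the kernel's read path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Produce up to len bytes of file content starting at offset off.
 * Bytes below inline_len come from inline_data; bytes between inline_len
 * and i_size read back as zeros. Returns the number of bytes produced.
 */
static size_t read_inline(char *buf, size_t len, size_t off,
                          const char *inline_data, size_t inline_len,
                          size_t i_size)
{
    size_t copied = 0;
    size_t total;

    assert(buf != NULL && inline_data != NULL);
    assert(inline_len <= i_size);

    if (off >= i_size)
        return 0;               /* reading at or past end of file */

    total = i_size - off;
    if (total > len)
        total = len;

    if (off < inline_len) {     /* part of the range is real inline data */
        copied = inline_len - off;
        if (copied > total)
            copied = total;
        memcpy(buf, inline_data + off, copied);
    }

    /* Everything past the inline data, up to i_size, is a hole. */
    memset(buf + copied, 0, total - copied);
    return total;
}
```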
     
  • A bug was found in striped_read() of fs/ceph/file.c. striped_read() calls
    ceph_zero_page_vector_range(). The first argument, page_align + read + ret,
    passed to ceph_zero_page_vector_range() is wrong.

    When a file has holes, this wrong parameter may cause memory corruption
    either in kernel space or user space. Kernel space memory may be corrupted in
    the case of non-direct IO; user space memory may be corrupted in the case of
    direct IO. In the latter case, the application doing direct IO may crash due
    to memory corruption, as we have experienced.

    The correct value should be initial_align + read + ret, where initial_align =
    o_direct ? buf_align : io_align. Compared with page_align, the current page
    offset, initial_align is the initial page offset, which should be used to
    calculate the page and offset in ceph_zero_page_vector_range().

    Reported-by: caifeng zhu
    Signed-off-by: Yan, Zheng

    Yan, Zheng
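
The corrected offset arithmetic from this commit can be worked through in a small sketch. `MOCK_PAGE_SIZE`, `struct vec_pos`, `vec_locate()` and `zero_start()` are hypothetical names used only to show how an offset measured from the start of the page vector maps to a page index and an in-page offset; they are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096u

struct vec_pos {
    size_t page;   /* index into the page vector */
    size_t offset; /* byte offset inside that page */
};

/* Map a byte offset, measured from the start of page 0, to page and offset. */
static struct vec_pos vec_locate(size_t off)
{
    struct vec_pos pos = { off / MOCK_PAGE_SIZE, off % MOCK_PAGE_SIZE };
    return pos;
}

/*
 * Start of the range to zero, per the commit message: the initial page
 * offset plus the bytes already read plus the bytes of the current read.
 */
static size_t zero_start(size_t initial_align, size_t read, size_t ret)
{
    assert(initial_align < MOCK_PAGE_SIZE);
    return initial_align + read + ret;
}
```

For example, with an initial offset of 512 bytes into the first page, 8192 bytes already read and 100 bytes returned by the current read, zeroing starts at byte 8804 of the vector, which is page 2 at in-page offset 612.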
     

21 Jan, 2015

1 commit

  • Now that we got rid of the bdi abuse on character devices we can always use
    sb->s_bdi to get at the backing_dev_info for a file, except for the block
    device special case. Export inode_to_bdi and replace uses of
    mapping->backing_dev_info with it to prepare for the removal of
    mapping->backing_dev_info.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Tejun Heo
    Reviewed-by: Jan Kara
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

18 Dec, 2014

5 commits

  • Pull ceph updates from Sage Weil:
    "The big item here is support for inline data for CephFS and for
    message signatures from Zheng. There are also several bug fixes,
    including interrupted flock request handling, 0-length xattrs, mksnap,
    cached readdir results, and a message version compat field. Finally
    there are several cleanups from Ilya, Dan, and Markus.

    Note that there is another series coming soon that fixes some bugs in
    the RBD 'lingering' requests, but it isn't quite ready yet"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
    ceph: fix setting empty extended attribute
    ceph: fix mksnap crash
    ceph: do_sync is never initialized
    libceph: fixup includes in pagelist.h
    ceph: support inline data feature
    ceph: flush inline version
    ceph: convert inline data to normal data before data write
    ceph: sync read inline data
    ceph: fetch inline data when getting Fcr cap refs
    ceph: use getattr request to fetch inline data
    ceph: add inline data to pagecache
    ceph: parse inline data in MClientReply and MClientCaps
    libceph: specify position of extent operation
    libceph: add CREATE osd operation support
    libceph: add SETXATTR/CMPXATTR osd operations support
    rbd: don't treat CEPH_OSD_OP_DELETE as extent op
    ceph: remove unused stringification macros
    libceph: require cephx message signature by default
    ceph: introduce global empty snap context
    ceph: message versioning fixes
    ...

    Linus Torvalds
     
  • Before any data write, convert inline data to normal data and set
    i_inline_version to CEPH_INLINE_NONE. The OSD request that saves
    inline data to the object contains 3 operations (CMPXATTR, WRITE and
    SETXATTR). It compares an xattr named 'inline_version' to prevent
    old data from overwriting newer data.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
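
The three-step request described in this commit behaves like a compare-and-write guarded by a version number. The sketch below is a user-space analogy only: `struct mock_object` and `uninline_write()` are hypothetical, and the comparison direction (accept only a strictly newer version) is an assumption made for illustration, not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an OSD object with an 'inline_version' xattr. */
struct mock_object {
    unsigned long long inline_version;
    char data[64];
    size_t len;
};

/*
 * Apply the three steps as one all-or-nothing unit.
 * Returns 0 on success and -1 when the request is rejected unchanged.
 */
static int uninline_write(struct mock_object *obj, unsigned long long version,
                          const char *data, size_t len)
{
    assert(obj != NULL && data != NULL);
    if (len > sizeof(obj->data))
        return -1;

    /* CMPXATTR analogue (assumed rule): stored version must be older. */
    if (obj->inline_version >= version)
        return -1;

    /* WRITE analogue: store the former inline data in the object. */
    memcpy(obj->data, data, len);
    obj->len = len;

    /* SETXATTR analogue: remember which version the object now holds. */
    obj->inline_version = version;
    return 0;
}
```

Because the compare runs first and a failure aborts the remaining steps, a delayed request carrying stale inline data cannot clobber data written from a newer version.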
     
  • We can't use getattr to fetch inline data while holding the Fr cap,
    because it can cause a deadlock. If we need to sync read inline data,
    drop cap refs first, then use getattr to fetch inline data.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • We can't use getattr to fetch inline data after getting Fcr caps,
    because it can cause a deadlock. The solution is to try bringing inline
    data into the page cache when not holding any cap, and hope the inline
    data page is still there after getting the Fcr caps. If the page
    is still there, pin it in the page cache for later IO.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • Allow specifying the position of an extent operation in a multi-operation
    OSD request. This is required for cephfs to convert inline data to
    normal data (compare xattr, then write object).

    Signed-off-by: Yan, Zheng
    Reviewed-by: Ilya Dryomov

    Yan, Zheng
     

20 Nov, 2014

2 commits


15 Oct, 2014

3 commits

  • The current code sets a new file/directory's initial ACL in a non-atomic
    manner.
    The client first sends a request to the MDS to create the new
    file/directory, then sets the initial ACL after the new file/directory is
    successfully created.

    The fix is to include the initial ACL in create/mkdir/mknod MDS requests,
    so the MDS can handle creating the file/directory and setting the initial
    ACL in one request.

    Signed-off-by: Yan, Zheng
    Reviewed-by: Sage Weil

    Yan, Zheng
     
  • ceph_sync_read and generic_file_read_iter() have already advanced the
    IO iterator.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
     
  • The following sequence of events can happen.
    - Client releases an inode, queues cap release message.
    - A 'lookup' reply brings the same inode back, but the reply
    doesn't contain xattrs because the MDS didn't receive the cap release
    message and thought the client already had up-to-date xattrs.

    The fix is to force sending a getattr request to the MDS if xattrs_version
    is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so the MDS knows the
    client does not have the xattrs.

    Signed-off-by: Yan, Zheng

    Yan, Zheng
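
The decision rule in this commit can be sketched as a one-line check. `struct mock_ceph_inode`, `MOCK_STAT_CAP_XATTR` and `xattr_getattr_mask()` are hypothetical names, and the mask value is a placeholder rather than the real CEPH_STAT_CAP_XATTR constant.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit; the real CEPH_STAT_CAP_XATTR value is not assumed here. */
#define MOCK_STAT_CAP_XATTR 0x1u

/* Hypothetical stand-in for the client's per-inode xattr state. */
struct mock_ceph_inode {
    unsigned long long xattrs_version; /* 0 means no xattrs are cached */
};

/*
 * Return the getattr mask to send to the MDS, or 0 when the cached
 * xattrs can be used without a round trip.
 */
static unsigned int xattr_getattr_mask(const struct mock_ceph_inode *ci)
{
    assert(ci != NULL);
    return ci->xattrs_version == 0 ? MOCK_STAT_CAP_XATTR : 0u;
}
```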
     

28 Jul, 2014

1 commit


21 Jul, 2014

1 commit


08 Jul, 2014

2 commits