05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with chunks bigger than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether a
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();
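
    A minimal user-space sketch of why the first two rewrites are safe: as
    long as PAGE_CACHE_SHIFT equals PAGE_SHIFT, the shift amount is zero
    and the expression is unchanged. The 4 KiB page size and the helper
    names below are assumptions made for illustration; they are not taken
    from the patch.

```c
/* Illustrative values only: 4 KiB pages, as on many architectures. */
#define PAGE_SHIFT        12
#define PAGE_SIZE         (1UL << PAGE_SHIFT)

/* Before the patch, the page-cache macros were plain aliases. */
#define PAGE_CACHE_SHIFT  PAGE_SHIFT
#define PAGE_CACHE_SIZE   PAGE_SIZE

/* Old style: scale a page-cache index to a page index (hypothetical helper). */
static inline unsigned long old_scale(unsigned long index)
{
    return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); /* shift by 0 */
}

/* New style: the zero shift is simply dropped. */
static inline unsigned long new_scale(unsigned long index)
{
    return index;
}
```

    Both helpers return the same value for every index, which is what lets
    the semantic patch replace one form with the other mechanically.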

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files. I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

08 Jun, 2015

1 commit

  • On failure, v9fs_session_init() returns with the v9fs_session_info
    struct partially initialized and expects the caller to invoke
    v9fs_session_close() to clean it up; however, it doesn't track whether
    the bdi is initialized or not and curiously also invokes bdi_destroy()
    in the v9fs_session_init() failure path.

    A. If v9fs_session_init() fails before the bdi is initialized, the
    follow-up v9fs_session_close() will invoke bdi_destroy() on an
    uninitialized bdi.

    B. If v9fs_session_init() fails after the bdi is initialized,
    bdi_destroy() will be called twice on the same bdi - once in the
    failure path of v9fs_session_init() and then by
    v9fs_session_close().

    A is broken no matter what. B used to be okay because bdi_destroy()
    allowed being invoked multiple times on the same bdi, which BTW was
    broken in its own way - if bdi_destroy() was invoked on an initialized
    but !registered bdi, it'd fail to free percpu counters. Since
    f0054bb1e1f3 ("writeback: move backing_dev_info->wb_lock and
    ->worklist into bdi_writeback"), this no longer works - bdi_destroy()
    on an initialized but not registered bdi works correctly but multiple
    invocations of bdi_destroy() are no longer allowed.

    The obvious culprit here is v9fs_session_init()'s odd and broken error
    behavior. It should simply clean up after itself on failures. This
    patch makes the following updates to v9fs_session_init().

    * @rc -> @retval error return propagation removed. It didn't serve
    any purpose. Just use @rc.

    * Move addition to v9fs_sessionlist to the end of the function so that
    incomplete sessions are not put on the list or iterated and the error
    path doesn't have to worry about it.

    * Update error handling so that it cleans up after itself.
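
    The "clean up after itself" pattern described above can be sketched in
    user space as follows. The struct, its fields and the fail_at
    parameter are hypothetical stand-ins used to force each failure point;
    this is not the actual v9fs_session_init() code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a session object. */
struct session_sketch {
    char *uname;      /* first resource */
    char *aname;      /* second resource */
    int   bdi_ready;  /* flag standing in for an initialized bdi */
};

/*
 * Acquire every resource or none. fail_at (1..3) forces a failure at that
 * step; 0 means success. On failure the function undoes its own work, so
 * the caller must not call session_close_sketch() afterwards.
 */
int session_init_sketch(struct session_sketch *s, int fail_at)
{
    int rc = -ENOMEM;

    s->uname = NULL;
    s->aname = NULL;
    s->bdi_ready = 0;

    s->uname = (fail_at == 1) ? NULL : malloc(16);
    if (!s->uname)
        goto err;

    s->aname = (fail_at == 2) ? NULL : malloc(16);
    if (!s->aname)
        goto err_free_uname;

    if (fail_at == 3) {       /* pretend the bdi setup failed */
        rc = -EIO;
        goto err_free_aname;
    }
    s->bdi_ready = 1;
    return 0;

err_free_aname:
    free(s->aname);
    s->aname = NULL;
err_free_uname:
    free(s->uname);
    s->uname = NULL;
err:
    return rc;
}

/* Only valid after a successful init: each resource is released once. */
void session_close_sketch(struct session_sketch *s)
{
    s->bdi_ready = 0;
    free(s->aname);
    free(s->uname);
    s->aname = NULL;
    s->uname = NULL;
}
```

    With this shape, neither case A nor case B can occur: teardown of the
    bdi stand-in happens exactly once, and only if setup succeeded.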

    Signed-off-by: Tejun Heo
    Reported-by: Sasha Levin
    Signed-off-by: Jens Axboe

    Tejun Heo
     

16 Apr, 2015

1 commit


10 Jan, 2014

1 commit


04 Mar, 2013

1 commit

  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems, trivially
    making things safer at no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives. This allows simple, safe,
    well understood work-arounds to known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as its module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases. The most significant is autofs, which lives in the module
    autofs4.
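
    As a rough user-space illustration of the prefixing described above,
    the requested module name is the filesystem type with "fs-" in front.
    The helper name and buffer handling are assumptions for this sketch;
    the kernel passes the formatted name to request_module() instead.

```c
#include <stdio.h>

/*
 * Build the module alias "fs-<type>" for a filesystem type string.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
int fs_module_alias_sketch(char *buf, size_t len, const char *fstype)
{
    int n = snprintf(buf, len, "fs-%s", fstype);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

    For the type "autofs" this yields "fs-autofs", which an alias can map
    to a module whose name differs from the filesystem name.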

    This is relevant to user namespaces because we can reach
    request_module() in get_fs_type() without having any special
    permissions, and people get uncomfortable when a user specified string
    (in this case the filesystem type) goes all of the way to
    request_module.

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regard to the user's permissions. In general all a filesystem
    module does once loaded is call register_filesystem() and go to sleep,
    which means there is not much attack surface exposed by loading a
    filesystem module unless the filesystem is mounted. In a user
    namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

26 Feb, 2013

1 commit

  • The following set of operations on an NFS client and server will cause
    an ESTALE error on the client:

    server# mkdir a
    client# cd a
    server# mv a a.bak
    client# sleep 30 # (or whatever the dir attrcache timeout is)
    client# stat .
    stat: cannot stat `.': Stale NFS file handle

    Obviously, we should not be getting an ESTALE error back there since the
    inode still exists on the server. The problem is that the lookup code
    will call d_revalidate on the dentry that "." refers to, because NFS has
    FS_REVAL_DOT set.

    nfs_lookup_revalidate will see that the parent directory has changed and
    will try to reverify the dentry by redoing a LOOKUP. That of course
    fails, so the lookup code returns ESTALE.

    The problem here is that d_revalidate is really a bad fit for this case.
    What we really want to know at this point is whether the inode is still
    good or not, but we don't really care what name it goes by or whether
    the dcache is still valid.

    Add a new d_op->d_weak_revalidate operation and have complete_walk call
    that instead of d_revalidate. The intent there is to allow for a
    "weaker" d_revalidate that just checks to see whether the inode is still
    good. This also gives us an opportunity to kill off the FS_REVAL_DOT
    special casing.
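
    The split between the two checks can be modelled in user space roughly
    as below. All type and function names carry a _sketch suffix because
    they are hypothetical; the real dentry_operations callbacks take
    different arguments.

```c
#include <stddef.h>

struct dentry_sketch;

/* Two optional callbacks: a full check and a weaker, inode-only check. */
struct dentry_ops_sketch {
    int (*d_revalidate)(struct dentry_sketch *);
    int (*d_weak_revalidate)(struct dentry_sketch *);
};

struct dentry_sketch {
    int name_still_valid;   /* would a fresh LOOKUP by name succeed? */
    int inode_still_valid;  /* does the inode still exist on the server? */
    const struct dentry_ops_sketch *d_op;
};

/* Full check: both the name and the inode must still be good. */
static int full_reval_sketch(struct dentry_sketch *d)
{
    return d->name_still_valid && d->inode_still_valid;
}

/* Weak check: only the inode matters, not the name it goes by. */
static int weak_reval_sketch(struct dentry_sketch *d)
{
    return d->inode_still_valid;
}

const struct dentry_ops_sketch nfs_like_ops_sketch = {
    .d_revalidate      = full_reval_sketch,
    .d_weak_revalidate = weak_reval_sketch,
};

/*
 * End-of-walk check for a dentry reached without a name lookup (such as
 * "."). Returns 1 if the dentry is usable, 0 if it is stale.
 */
int complete_walk_sketch(struct dentry_sketch *d)
{
    if (!d->d_op || !d->d_op->d_weak_revalidate)
        return 1;
    return d->d_op->d_weak_revalidate(d);
}
```

    In the rename scenario above the name check fails while the inode is
    still good, so the weak check accepts a dentry that the full check
    would have rejected.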

    [AV: changed method name, added note in porting, fixed confusion re
    having it possibly called from RCU mode (it won't be)]

    Cc: NeilBrown
    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     

14 Jul, 2012

1 commit

  • Pass mount flags to sget() so that it can use them in initialising a new
    superblock before the set function is called. They could also be passed to the
    compare function.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

29 Mar, 2012

1 commit


21 Mar, 2012

1 commit


11 Mar, 2012

1 commit


11 Jan, 2012

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    fs/9p: iattr_valid flags are kernel internal flags map them to 9p values.
    fs/9p: We should not allocate a new inode when creating hardlines.
    fs/9p: v9fs_stat2inode should update suid/sgid bits.
    9p: Reduce object size with CONFIG_NET_9P_DEBUG
    fs/9p: check schedule_timeout_interruptible return value

    Fix up trivial conflicts in fs/9p/{vfs_inode.c,vfs_inode_dotl.c} due to
    debug messages having changed to use p9_debug() on one hand, and the
    changes for umode_t on the other.

    Linus Torvalds
     

06 Jan, 2012

1 commit

  • Reduce object size by deduplicating formats.

    Use vsprintf extension %pV.
    Rename P9_DPRINTK uses to p9_debug, align arguments.
    Add function for _p9_debug and macro to add __func__.
    Add missing "\n"s to p9_debug uses.
    Remove embedded function names as p9_debug adds it.
    Remove P9_EPRINTK macro and convert use to pr_<level>.
    Add and use pr_fmt and pr_<level>.
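
    The "one function plus a macro that adds __func__" arrangement can be
    sketched in user space like this. The names, the buffer and the
    message prefix are assumptions for illustration; the kernel version
    relies on the %pV vsprintf extension rather than vsnprintf(), and the
    macro below uses the GNU ##__VA_ARGS__ extension.

```c
#include <stdarg.h>
#include <stdio.h>

static char dbg_buf[256];  /* holds the last formatted message */

/* Single out-of-line formatter: prefixes the caller's function name. */
void dbg_sketch_fn(const char *func, const char *fmt, ...)
{
    va_list ap;
    int n = snprintf(dbg_buf, sizeof dbg_buf, "-- %s: ", func);

    if (n < 0 || (size_t)n >= sizeof dbg_buf)
        return;
    va_start(ap, fmt);
    vsnprintf(dbg_buf + n, sizeof dbg_buf - (size_t)n, fmt, ap);
    va_end(ap);
}

/* The macro supplies __func__, so call sites no longer embed their name. */
#define dbg_sketch(fmt, ...) dbg_sketch_fn(__func__, fmt, ##__VA_ARGS__)

/* Example call site; returns the formatted message for inspection. */
const char *example_caller(int fid)
{
    dbg_sketch("fid %d\n", fid);
    return dbg_buf;
}
```

    Because call sites pass only their own format string, the compiler no
    longer has to emit a separate prefixed string per message.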

    $ size fs/9p/built-in.o*
       text    data     bss     dec     hex filename
      62133     984   16000   79117   1350d fs/9p/built-in.o.new
      67342     984   16928   85254   14d06 fs/9p/built-in.o.old
    $ size net/9p/built-in.o*
       text    data     bss     dec     hex filename
      88792    4148   22024  114964   1c114 net/9p/built-in.o.new
      94072    4148   23232  121452   1da6c net/9p/built-in.o.old

    Signed-off-by: Joe Perches
    Signed-off-by: Eric Van Hensbergen

    Joe Perches
     

04 Jan, 2012

1 commit


06 Sep, 2011

1 commit


16 Apr, 2011

2 commits


23 Mar, 2011

1 commit


15 Mar, 2011

8 commits


13 Jan, 2011

1 commit

  • Here we actually *want* ->d_op for root; setting it lets us get rid
    of the kludge in v9fs_kill_super() since now we have a proper
    ->d_release() for root and don't need to call it manually.

    Signed-off-by: Al Viro

    Al Viro
     

29 Oct, 2010

1 commit


28 Oct, 2010

3 commits


13 Sep, 2010

1 commit


11 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

1 commit


03 Aug, 2010

3 commits

  • During fid lookup we need to make sure that the dentry->d_parent doesn't
    change so that we can safely walk the parent dentries. To ensure that,
    we need to prevent cross-directory renames during fid_lookup. Add a
    per-superblock rename_sem rw_semaphore to prevent parallel fid lookup and
    rename.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Venkateswararao Jujjuri
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     
  • SYNOPSIS

    size[4] Tgetattr tag[2] fid[4] request_mask[8]

    size[4] Rgetattr tag[2] lstat[n]

    DESCRIPTION

    The getattr transaction inquires about the file identified by fid.
    request_mask is a bit mask that specifies which fields of the
    stat structure the client is interested in.

    The reply will contain a machine-independent directory entry,
    laid out as follows:

    st_result_mask[8]
    Bit mask that indicates which fields in the stat structure
    have been populated by the server

    qid.type[1]
    the type of the file (directory, etc.), represented as a bit
    vector corresponding to the high 8 bits of the file's mode
    word.

    qid.vers[4]
    version number for given path

    qid.path[8]
    the file server's unique identification for the file

    st_mode[4]
    Permission and flags

    st_uid[4]
    User id of owner

    st_gid[4]
    Group ID of owner

    st_nlink[8]
    Number of hard links

    st_rdev[8]
    Device ID (if special file)

    st_size[8]
    Size, in bytes

    st_blksize[8]
    Block size for file system IO

    st_blocks[8]
    Number of file system blocks allocated

    st_atime_sec[8]
    Time of last access, seconds

    st_atime_nsec[8]
    Time of last access, nanoseconds

    st_mtime_sec[8]
    Time of last modification, seconds

    st_mtime_nsec[8]
    Time of last modification, nanoseconds

    st_ctime_sec[8]
    Time of last status change, seconds

    st_ctime_nsec[8]
    Time of last status change, nanoseconds

    st_btime_sec[8]
    Time of creation (birth) of file, seconds

    st_btime_nsec[8]
    Time of creation (birth) of file, nanoseconds

    st_gen[8]
    Inode generation

    st_data_version[8]
    Data version number

    The request_mask and result_mask bit masks contain the following bits:
    #define P9_STATS_MODE 0x00000001ULL
    #define P9_STATS_NLINK 0x00000002ULL
    #define P9_STATS_UID 0x00000004ULL
    #define P9_STATS_GID 0x00000008ULL
    #define P9_STATS_RDEV 0x00000010ULL
    #define P9_STATS_ATIME 0x00000020ULL
    #define P9_STATS_MTIME 0x00000040ULL
    #define P9_STATS_CTIME 0x00000080ULL
    #define P9_STATS_INO 0x00000100ULL
    #define P9_STATS_SIZE 0x00000200ULL
    #define P9_STATS_BLOCKS 0x00000400ULL

    #define P9_STATS_BTIME 0x00000800ULL
    #define P9_STATS_GEN 0x00001000ULL
    #define P9_STATS_DATA_VERSION 0x00002000ULL

    #define P9_STATS_BASIC 0x000007ffULL
    #define P9_STATS_ALL 0x00003fffULL
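
    A short sketch of how a client might test the returned mask before
    trusting a field. The constants are copied from the list above; the
    helper stats_have_sketch() is a hypothetical name, not part of the
    patch.

```c
#include <stdint.h>

#define P9_STATS_MODE          0x00000001ULL
#define P9_STATS_NLINK         0x00000002ULL
#define P9_STATS_UID           0x00000004ULL
#define P9_STATS_GID           0x00000008ULL
#define P9_STATS_RDEV          0x00000010ULL
#define P9_STATS_ATIME         0x00000020ULL
#define P9_STATS_MTIME         0x00000040ULL
#define P9_STATS_CTIME         0x00000080ULL
#define P9_STATS_INO           0x00000100ULL
#define P9_STATS_SIZE          0x00000200ULL
#define P9_STATS_BLOCKS        0x00000400ULL
#define P9_STATS_BTIME         0x00000800ULL
#define P9_STATS_GEN           0x00001000ULL
#define P9_STATS_DATA_VERSION  0x00002000ULL
#define P9_STATS_BASIC         0x000007ffULL
#define P9_STATS_ALL           0x00003fffULL

/* Nonzero when every field in `want` was populated by the server. */
int stats_have_sketch(uint64_t result_mask, uint64_t want)
{
    return (result_mask & want) == want;
}
```

    A server that returns only P9_STATS_BASIC satisfies a check for size
    and mtime but not one for btime, so the client should leave the
    creation time unset in that case.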

    This patch implements the client side of getattr for 9P2000.L. It
    introduces a new structure, p9_stat_dotl, for getting Linux stat
    information along with the QID. The data layout is similar to the
    stat structure in Linux user space, with the following major
    differences:

    inode (st_ino) is not part of data. Instead qid is.

    device (st_dev) is not part of data because this doesn't make sense
    on the client.

    All time variables are 64 bits wide on the wire. The kernel seems to use
    32 bit variables for these values. However, some of the architectures
    have used 64 bit variables and glibc exposes 64 bit variables to user
    space on some architectures. Hence, to be on the safer side, we have made
    these 64 bit in the protocol. Refer to the comments in
    include/asm-generic/stat.h

    There are some additional fields: st_btime_sec, st_btime_nsec, st_gen,
    st_data_version apart from the bitmask, st_result_mask. The bit mask
    is filled by the server to indicate which stat fields have been
    populated by the server. Currently there is no clean way for the
    server to obtain these additional fields, so it sends back just the
    basic fields.

    Signed-off-by: Sripathi Kodi
    Signed-off-by: Eric Van Hensbergen

    Sripathi Kodi
     

22 May, 2010

2 commits

  • I made a V2 of this patch on top of my patches for VFS switches. The
    change was adding v9fs_statfs pointer to v9fs_super_ops_dotl
    instead of v9fs_super_ops.

    statfs - get file system statistics

    size[4] Tstatfs tag[2] fid[4]
    size[4] Rstatfs tag[2] type[4] bsize[4] blocks[8] bfree[8] bavail[8]
    files[8] ffree[8] fsid[8] namelen[4]

    The statfs message is used to request file system information returned
    by the statfs(2) system call, which is used by df(1) to report file
    system and disk space usage.
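
    For illustration, the Rstatfs body listed above (everything after
    size, the message type byte and tag) is 60 bytes and could be decoded
    as in this sketch. 9P integers are little-endian on the wire; the
    struct and function names are hypothetical, and header parsing is
    assumed to have happened already.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields in wire order: type[4] bsize[4] blocks[8] ... namelen[4]. */
struct rstatfs_sketch {
    uint32_t type, bsize;
    uint64_t blocks, bfree, bavail, files, ffree, fsid;
    uint32_t namelen;
};

#define RSTATFS_BODY_LEN 60  /* 4 + 4 + 6 * 8 + 4 */

/* Read a little-endian 32-bit value from p. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read a little-endian 64-bit value from p. */
static uint64_t get_le64(const uint8_t *p)
{
    return (uint64_t)get_le32(p) | (uint64_t)get_le32(p + 4) << 32;
}

/* Decode the body; returns 0 on success, -1 if the buffer is too short. */
int rstatfs_decode_sketch(const uint8_t *body, size_t len,
                          struct rstatfs_sketch *out)
{
    if (len < RSTATFS_BODY_LEN)
        return -1;
    out->type    = get_le32(body);
    out->bsize   = get_le32(body + 4);
    out->blocks  = get_le64(body + 8);
    out->bfree   = get_le64(body + 16);
    out->bavail  = get_le64(body + 24);
    out->files   = get_le64(body + 32);
    out->ffree   = get_le64(body + 40);
    out->fsid    = get_le64(body + 48);
    out->namelen = get_le32(body + 56);
    return 0;
}
```

    The decoded values correspond to the similarly named members that
    statfs(2) reports, which is what df(1) ultimately prints.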

    Signed-off-by: Jim Garlick
    Signed-off-by: Sripathi Kodi
    Signed-off-by: Eric Van Hensbergen

    Sripathi Kodi
     
  • Implements VFS switches for 9p2000.L protocol.

    Signed-off-by: Sripathi Kodi
    Signed-off-by: Eric Van Hensbergen

    Sripathi Kodi
     

22 Apr, 2010

1 commit


06 Apr, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9p: saving negative to unsigned char
    9p: return on mutex_lock_interruptible()
    9p: Creating files with names too long should fail with ENAMETOOLONG.
    9p: Make sure we are able to clunk the cached fid on umount
    9p: drop nlink remove
    fs/9p: Clunk the fid resulting from partial walk of the name
    9p: documentation update
    9p: Fix setting of protocol flags in v9fs_session_info structure.

    Linus Torvalds