04 Mar, 2013

1 commit

  • Modify the request_module to prefix the file system type with "fs-"
    and add aliases to all of the filesystems that can be built as modules
    to match.

    A common practice is to build all of the kernel code and leave code
    that is not commonly needed as modules, with the result that many
    users are exposed to any bug anywhere in the kernel.

    Looking for filesystems with a fs- prefix limits the pool of possible
    modules that can be loaded by mount to just filesystems trivially
    making things safer with no real cost.

    Using aliases means user space can control the policy of which
    filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
    with blacklist and alias directives. Allowing simple, safe,
    well understood work-arounds to known problematic software.

    This also addresses a rare but unfortunate problem where the filesystem
    name is not the same as it's module name and module auto-loading
    would not work. While writing this patch I saw a handful of such
    cases. The most significant being autofs that lives in the module
    autofs4.

    This is relevant to user namespaces because we can reach the request
    module in get_fs_type() without having any special permissions, and
    people get uncomfortable when a user specified string (in this case
    the filesystem type) goes all of the way to request_module.

    After having looked at this issue I don't think there is any
    particular reason to perform any filtering or permission checks beyond
    making it clear in the module request that we want a filesystem
    module. The common pattern in the kernel is to call request_module()
    without regards to the users permissions. In general all a filesystem
    module does once loaded is call register_filesystem() and go to sleep.
    Which means there is not much attack surface exposed by loading a
    filesytem module unless the filesystem is mounted. In a user
    namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
    which most filesystems do not set today.

    Acked-by: Serge Hallyn
    Acked-by: Kees Cook
    Reported-by: Kees Cook
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

10 Oct, 2012

1 commit


03 Oct, 2012

1 commit

  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed rcu
    free inodes are flushed before we destroy related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of a last process in IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     

23 Jul, 2012

3 commits

  • This patch makes UFS stop using the VFS '->write_super()' method along with
    the 's_dirt' superblock flag, because they are on their way out.

    The way we implement this is that we schedule a delay job instead relying on
    's_dirt' and '->write_super()'.

    The whole "superblock write-out" VFS infrastructure is served by the
    'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
    writes out all dirty superblocks using the '->write_super()' call-back. But the
    problem with this thread is that it wastes power by waking up the system every
    5 seconds, even if there are no diry superblocks, or there are no client
    file-systems which would need this (e.g., btrfs does not use
    '->write_super()'). So we want to kill it completely and thus, we need to make
    file-systems to stop using the '->write_super()' VFS service, and then remove
    it together with the kernel thread.

    Tested using fsstress from the LTP project.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • This patch does not do any functional changes. It only moves 3 functions
    in fs/ufs/super.c a little bit up in order to prepare for further changes
    where I'll need this new arrangement to avoid forward declarations.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • UFS calls 'ufs_write_super()' from 'ufs_put_super()' in order to write the
    superblocks to the media. However, it is not needed because VFS calls
    '->sync_fs()' before calling '->put_super()' - so by the time we are in
    'ufs_write_super()', the superblocks are already synchronized.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     

11 May, 2012

1 commit

  • This allows comparing hash and len in one operation on 64-bit
    architectures. Right now only __d_lookup_rcu() takes advantage of this,
    since that is the case we care most about.

    The use of anonymous struct/unions hides the alternate 64-bit approach
    from most users, the exception being a few cases where we initialize a
    'struct qstr' with a static initializer. This makes the problematic
    cases use a new QSTR_INIT() helper function for that (but initializing
    just the name pointer with a "{ .name = xyzzy }" initializer remains
    valid, as does just copying another qstr structure).

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

29 Mar, 2012

1 commit


21 Mar, 2012

2 commits


07 Jan, 2012

1 commit


04 Jan, 2012

1 commit

  • Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
    it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
    the cost of taking it into inode_init_always() will be negligible for pipes
    and sockets and negative for everything else. Not to mention the removal of
    boilerplate code from ->destroy_inode() instances...

    Signed-off-by: Al Viro

    Al Viro
     

31 Mar, 2011

1 commit


03 Mar, 2011

1 commit

  • This introduces a new per-superblock mutex in UFS to replace
    the big kernel lock. I have been careful to avoid nested
    calls to lock_ufs and to get the lock order right with
    respect to other mutexes, in particular lock_super.

    I did not make any attempt to prove that the big kernel
    lock is not needed in a particular place in the code,
    which is very possible.

    The mutex has a significant performance impact, so it is only
    used on SMP or PREEMPT configurations.

    As Nick Piggin noticed, any allocation inside of the lock
    may end up deadlocking when we get to ufs_getfrag_block
    in the reclaim task, so we now use GFP_NOFS.

    Signed-off-by: Arnd Bergmann
    Tested-by: Nick Bowler
    Cc: Evgeniy Dushistov
    Cc: Nick Piggin

    Arnd Bergmann
     

07 Jan, 2011

1 commit

  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downsides of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (ie. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

29 Oct, 2010

1 commit


05 Oct, 2010

1 commit

  • This patch is a preparation necessary to remove the BKL from do_new_mount().
    It explicitly adds calls to lock_kernel()/unlock_kernel() around
    get_sb/fill_super operations for filesystems that still uses the BKL.

    I've read through all the code formerly covered by the BKL inside
    do_kern_mount() and have satisfied myself that it doesn't need the BKL
    any more.

    do_kern_mount() is already called without the BKL when mounting the rootfs
    and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
    from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
    through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
    afs_mntpt_follow_link(). Both later functions are actually the filesystems
    follow_link inode operation. vfs_kern_mount() is calling the specified
    get_sb function and lets the filesystem do its job by calling the given
    fill_super function.

    Therefore I think it is safe to push down the BKL from the VFS to the
    low-level filesystems get_sb/fill_super operation.

    [arnd: do not add the BKL to those file systems that already
    don't use it elsewhere]

    Signed-off-by: Jan Blunck
    Signed-off-by: Arnd Bergmann
    Cc: Matthew Wilcox
    Cc: Christoph Hellwig

    Jan Blunck
     

10 Aug, 2010

1 commit


31 May, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    quota: Convert quota statistics to generic percpu_counter
    ext3 uses rb_node = NULL; to zero rb_root.
    quota: Fixup dquot_transfer
    reiserfs: Fix resuming of quotas on remount read-write
    pohmelfs: Remove dead quota code
    ufs: Remove dead quota code
    udf: Remove dead quota code
    quota: rename default quotactl methods to dquot_
    quota: explicitly set ->dq_op and ->s_qcop
    quota: drop remount argument to ->quota_on and ->quota_off
    quota: move unmount handling into the filesystem
    quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
    quota: move remount handling into the filesystem
    ocfs2: Fix use after free on remount read-only

    Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c

    Linus Torvalds
     

28 May, 2010

1 commit

  • I recently had to recover some files from an old broken machine that was
    running BorderWare Document Gateway. It's basically a drop in web server
    for sharing files. From the look of the init process and using strings on
    of a few files it seems to be based on FreeBSD 3.3.

    The process turned out to be more difficult than I imagined, but to cut a
    long story short BorderWare in their wisdom use a nonstandard magic number
    in their UFS (ufstype=44bsd) file systems. Thus Linux refuses to mount
    the file systems in order to recover the data. After a bit of hunting I
    was able to make a quick fix to fs/ufs/super.c in order to detect the new
    magic number.

    I assume that this number is the same for all installations. It's quite
    easy to find out from ufs_fs.h. The superblock sits 8k into the block
    device and the magic number its 1372 bytes into the superblock struct.

    # dd if=/dev/sda5 skip=$(( 8192 + 1372 )) bs=1 count=4 2> /dev/null | hd
    00000000 97 26 24 0f |.&$.|
    #

    Signed-off-by: Thomas Stewart
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Stewart
     

24 May, 2010

6 commits

  • UFS quota is non-functional at least since 2.6.12 because dq_op was set
    to NULL. Since the filesystem exists mainly to allow cooperation with Solaris
    and quota format isn't standard, just remove the dead code.

    CC: Evgeniy Dushistov
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Follow the dquot_* style used elsewhere in dquot.c.

    [Jan Kara: Fixed up missing conversion of ext2]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Only set the quota operation vectors if the filesystem actually supports
    quota instead of doing it for all filesystems in alloc_super().

    [Jan Kara: Export dquot_operations and vfs_quotactl_ops]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently the VFS calls into the quotactl interface for unmounting
    filesystems. This means filesystems with their own quota handling
    can't easily distinguish between user-space originating quotaoff
    and an unount. Instead move the responsibily of the unmount handling
    into the filesystem to be consistent with all other dquot handling.

    Note that we do call dquot_disable a lot later now, e.g. after
    a sync_filesystem. But this is fine as the quota code does all its
    writes via blockdev's mapping and that is synced even later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Instead of having wrappers in the VFS namespace export the dquot_suspend
    and dquot_resume helpers directly. Also rename vfs_quota_disable to
    dquot_disable while we're at it.

    [Jan Kara: Moved dquot_suspend to quotaops.h and made it inline]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently do_remount_sb calls into the dquot code to tell it about going
    from rw to ro and ro to rw. Move this code into the filesystem to
    not depend on the dquot code in the VFS - note ocfs2 already ignores
    these calls and handles remount by itself. This gets rid of overloading
    the quotactl calls and allows to unify the VFS and XFS codepaths in
    that area later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

13 Mar, 2010

1 commit

  • Recent releases of Solaris set the fs_clean state of an unmounted UFS file
    system as FSLOG ("logging fs"). However, the Linux kernel currently does
    not recognize the value which represents this state. Thus, attempting to
    mount such a file system rw produces the message

    kernel: ufs_read_super: can't grok fs_clean 0xfffffffd

    and the file system is mounted read-only. This patch makes the kernel
    recognize that value.

    Signed-off-by: Alex Viskovatoff
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Viskovatoff
     

05 Mar, 2010

2 commits

  • Get rid of the drop dquot operation - it is now always called from
    the filesystem and if a filesystem really needs it's own (which none
    currently does) it can just call into it's own routine directly.

    Rename the now static low-level dquot_drop helper to __dquot_drop
    and vfs_dq_drop to dquot_drop to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Currently clear_inode calls vfs_dq_drop directly. This means
    we tie the quota code into the VFS. Get rid of that and make the
    filesystem responsible for the drop inside the ->clear_inode
    superblock operation.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

16 Dec, 2009

1 commit


12 Jun, 2009

6 commits

  • Add a ->sync_fs method for data integrity syncs, and reimplement
    ->write_super ontop of it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • [xfs, btrfs, capifs, shmem don't need BKL, exempt]

    Signed-off-by: Alessio Igor Bogani
    Signed-off-by: Al Viro

    Alessio Igor Bogani
     
  • Push down lock_super into ->write_super instances and remove it from the
    caller.

    Following filesystem don't need ->s_lock in ->write_super and are skipped:

    * bfs, nilfs2 - no other uses of s_lock and have internal locks in
    ->write_super
    * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
    * reiserfs - no other uses of s_lock as has reiserfs_write_lock (BKL) in
    ->write_super
    * xfs - no other uses of s_lock and uses internal lock (buffer lock on
    superblock buffer) to serialize ->write_super. Also xfs_fs_write_super
    is superflous and will go away in the next merge window

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Note that since we can't run into contention between remount_fs and write_super
    (due to exclusion on s_umount), we have to care only about filesystems that
    touch lock_super() on their own. Out of those ext3, ext4, hpfs, sysv and ufs
    do need it; fat doesn't since its ->remount_fs() only accesses assign-once
    data (basically, it's "we have no atime on directories and only have atime on
    files for vfat; force nodiratime and possibly noatime into *flags").

    [folded a build fix from hch]

    Signed-off-by: Al Viro

    Al Viro
     
  • Move BKL into ->put_super from the only caller. A couple of
    filesystems had trivial enough ->put_super (only kfree and NULLing of
    s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
    hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
    of them probably don't need it, but I'd rather sort that out individually.
    Preferably after all the other BKL pushdowns in that area.

    [AV: original used to move lock_super() down as well; these changes are
    removed since we don't do lock_super() at all in generic_shutdown_super()
    now]
    [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • We just did a full fs writeout using sync_filesystem before, and if
    that's not enough for the filesystem it can perform it's own writeout
    in ->put_super, which many filesystems already do.

    Move a call to foofs_write_super into every foofs_put_super for now to
    guarantee identical behaviour until it's cleaned up by the individual
    filesystem maintainers.

    Exceptions:

    - affs already has identical copy & pasted code at the beginning of
    affs_put_super so no need to do it twice.
    - xfs does the right thing without it and I have changes pending for
    the xfs tree touching this are so I don't really need conflicts
    here..

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

03 Apr, 2009

1 commit


28 Mar, 2009

1 commit


23 Mar, 2009

1 commit


14 Oct, 2008

1 commit

  • This is a much better version of a previous patch to make the parser
    tables constant. Rather than changing the typedef, we put the "const" in
    all the various places where its required, allowing the __initconst
    exception for nfsroot which was the cause of the previous trouble.

    This was posted for review some time ago and I believe its been in -mm
    since then.

    Signed-off-by: Steven Whitehouse
    Cc: Alexander Viro
    Signed-off-by: Linus Torvalds

    Steven Whitehouse