20 Nov, 2008

1 commit

  • Peter Cordes is sorry that he rm'ed his swapfiles while they were in use:
    he then had no pathname to swapoff. It's a curious little oversight, but
    not one worth a lot of hackery. Kudos to Willy Tarreau for turning this
    around from a discussion of synthetic pathnames to how to prevent unlink.
    Mimic immutable: prohibit unlinking an active swapfile in may_delete()
    (and don't worry my little head over the tiny race window).

    Signed-off-by: Hugh Dickins
    Cc: Willy Tarreau
    Acked-by: Christoph Hellwig
    Cc: Peter Cordes
    Cc: Bodo Eggert
    Cc: David Newall
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
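
The check described above can be sketched in userspace C. This is a minimal model, not the kernel code: `struct model_inode` and `model_may_delete()` are hypothetical names, and the only assumption carried over from the message is that an active swapfile is refused in the same way as an immutable file.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode state the check looks at. */
struct model_inode {
    bool immutable;   /* models an immutable file */
    bool swapfile;    /* models a file currently in use as swap */
};

/* Returns 0 if unlink may proceed, -EPERM if it must be refused. */
int model_may_delete(const struct model_inode *victim)
{
    /* An active swapfile is treated just like an immutable file. */
    if (victim->immutable || victim->swapfile)
        return -EPERM;
    return 0;
}
```

In this model, once swapoff clears the `swapfile` flag the same inode can be unlinked again, which mirrors the intent that the pathname stays available for as long as the swapfile is in use.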
     

23 Oct, 2008

8 commits


01 Aug, 2008

2 commits


27 Jul, 2008

14 commits


23 Jun, 2008

2 commits


17 May, 2008

1 commit

  • When both EEXIST and EROFS would apply we used to
    return the former in mkdir(2) and friends. Lest anyone suspect
    us of being consistent, in the same situation knfsd gave clients
    nfs_erofs...

    ro-bind series had switched the syscall side of things to
    returning -EROFS and immediately broke an application - namely,
    mkdir -p. This patch restores the original behaviour...

    Signed-off-by: Al Viro

    Al Viro
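
The ordering at stake can be shown with a small userspace model. `model_create_error()` is a hypothetical function, not a kernel interface; it only assumes what the message states, namely that EEXIST wins when both errors would apply.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the errno chosen by a create-style call
 * such as mkdir(2) when the target may already exist and the
 * mount may be read-only.
 */
int model_create_error(bool target_exists, bool mount_readonly)
{
    if (target_exists)
        return -EEXIST;   /* checked first, so it wins when both apply */
    if (mount_readonly)
        return -EROFS;
    return 0;
}
```

A caller that treats EEXIST as "nothing left to do", as mkdir -p presumably does for path components that are already present, keeps working on a read-only mount only if the checks run in this order.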
     

29 Apr, 2008

1 commit

  • Implement a cgroup to track and enforce open and mknod restrictions on device
    files. A device cgroup associates a device access whitelist with each cgroup.
    A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block).
    'all' means it applies to all types and all major and minor numbers. Major
    and minor are either an integer or * for all. Access is a composition of r
    (read), w (write), and m (mknod).

    The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of
    the parent. Admins can then remove devices from the whitelist or add new
    entries. A child cgroup can never receive a device access which is denied its
    parent. However, when a device access is removed from a parent it will not
    also be removed from the child(ren).

    An entry is added using devices.allow, and removed using
    devices.deny. For instance

    echo 'c 1:3 mr' > /cgroups/1/devices.allow

    allows cgroup 1 to read and mknod the device usually known as
    /dev/null. Doing

    echo a > /cgroups/1/devices.deny

    will remove the default 'a *:* mrw' entry.

    CAP_SYS_ADMIN is needed to change permissions or move another task to a new
    cgroup. A cgroup may not be granted more permissions than the cgroup's parent
    has. Any task can move itself between cgroups. This won't be sufficient, but
    we can decide the best way to adequately restrict movement later.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
    Signed-off-by: Serge E. Hallyn
    Acked-by: James Morris
    Looks-good-to: Pavel Emelyanov
    Cc: Daniel Hokka Zakrisson
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
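
The whitelist lookup described above can be modelled in a few lines of userspace C. Everything here is a sketch under assumptions: the struct layout, `MODEL_ANY`, and `model_dev_allowed()` are hypothetical, and the rule that a single entry must cover every requested access letter is a simplification chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ANY (-1)   /* stands for '*' in the major or minor field */

/* Hypothetical whitelist entry with the four fields from the text. */
struct model_dev_entry {
    char type;           /* 'a' (all), 'c' (char) or 'b' (block) */
    int major;
    int minor;
    const char *access;  /* any combination of 'r', 'w', 'm' */
};

/* True if every letter in wanted also appears in granted. */
static bool access_covered(const char *granted, const char *wanted)
{
    for (; *wanted; wanted++)
        if (!strchr(granted, *wanted))
            return false;
    return true;
}

/* True if some entry in the list permits the requested access. */
bool model_dev_allowed(const struct model_dev_entry *list, size_t n,
                       char type, int major, int minor,
                       const char *wanted)
{
    for (size_t i = 0; i < n; i++) {
        const struct model_dev_entry *e = &list[i];

        if (e->type != 'a') {
            /* 'a' skips these checks: it matches every device. */
            if (e->type != type)
                continue;
            if (e->major != MODEL_ANY && e->major != major)
                continue;
            if (e->minor != MODEL_ANY && e->minor != minor)
                continue;
        }
        if (access_covered(e->access, wanted))
            return true;
    }
    return false;
}
```

With a list holding only the entry `{ 'c', 1, 3, "mr" }`, the model grants read and mknod on char device 1:3 and denies write, matching the devices.allow example in the message.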
     

19 Apr, 2008

7 commits

  • This is the first really tricky patch in the series. It elevates the writer
    count on a mount each time a non-special file is opened for write.

    We used to do this in may_open(), but Miklos pointed out that __dentry_open()
    is used as well to create filps. This will cover even those cases, while a
    call in may_open() would not have.

    There is also an elevated count around the vfs_create() call in open_namei().
    See the comments for more details, but we need this to fix a 'create, remount,
    fail r/w open()' race.

    Some filesystems forgo the use of normal vfs calls to create
    struct files. Make sure that these users elevate the mnt
    writer count because they will get __fput(), and we need
    to make sure they're balanced.

    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Hansen
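
The balancing requirement is easier to see in a toy model. The sketch below is userspace C with hypothetical names (`struct model_mount`, `model_want_write()`, `model_drop_write()`, `model_remount_ro()`); it assumes only what the series describes: a per-mount writer count that is raised when a file is opened for write, dropped when the file is finally released, and consulted before a remount to read-only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mount with a writer count. */
struct model_mount {
    bool readonly;
    int writers;
};

/* Take a write reference; fails if the mount is already read-only. */
int model_want_write(struct model_mount *m)
{
    if (m->readonly)
        return -EROFS;
    m->writers++;
    return 0;
}

/* Drop a write reference; every successful want needs one drop. */
void model_drop_write(struct model_mount *m)
{
    assert(m->writers > 0);   /* an unbalanced drop is a bug */
    m->writers--;
}

/* Remount read-only; refused while any writer is outstanding. */
int model_remount_ro(struct model_mount *m)
{
    if (m->writers > 0)
        return -EBUSY;
    m->readonly = true;
    return 0;
}
```

A file created without the usual open path but released through the usual path would reach `model_drop_write()` without a matching `model_want_write()`; in this model that trips the assertion, which is the imbalance the last paragraph of the message warns about.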
     
  • This also uses the little helper in the NFS code to make an if() a little bit
    less ugly. We introduced the helper at the beginning of the series.

    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Hansen
     
  • [AV: add missing nfsd pieces]

    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Al Viro

    Dave Hansen
     
  • This takes care of all of the direct callers of vfs_mknod().
    Since a few of these cases also handle normal file creation
    as well, this also covers some calls to vfs_create().

    So that we don't have to make three mnt_want/drop_write()
    calls inside of the switch statement, we move some of its
    logic outside of the switch and into a helper function
    suggested by Christoph.

    This also encapsulates a fix for mknod(S_IFREG) that Miklos
    found.

    [AV: merged mkdir handling, added missing nfsd pieces]

    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Hansen
     
  • Elevate the write count during vfs_rmdir() and vfs_unlink().

    [AV: merged rmdir and unlink parts, added missing pieces in nfsd]

    Acked-by: Serge Hallyn
    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Hansen
     
  • open_namei() will, in the future, need to take mount write counts
    over its creation and truncation (via may_open()) operations. It
    needs to keep these write counts until any potential filp that is
    created gets __fput()'d.

    This gets complicated in the error handling and becomes very murky
    as to how far open_namei() actually got, and whether or not that
    mount write count was taken. That makes it a bad interface.

    All that the current do_filp_open() really does is allocate the
    nameidata on the stack, then call open_namei().

    So, this merges those two functions and moves filp_open() over
    to namei.c so it can be close to its buddy: do_filp_open(). It
    also gets a kerneldoc comment in the process.

    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Hansen
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • My end goal here is to make sure all users of may_open()
    return filps. This will ensure that we properly release
    mount write counts which were taken for the filp in
    may_open().

    This patch moves the sys_open flags to namei flags
    calculation into fs/namei.c. We'll shortly be moving
    the nameidata_to_filp() calls into namei.c, and this
    gets the sys_open flags to a place where we can get
    at them when we need them.

    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Al Viro

    Dave Hansen
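
For readers unfamiliar with the flags calculation being moved, here is a sketch of the increment trick that kernels of this period are commonly described as using. It is written from memory, not quoted from the patch; the `MODEL_` constants are hypothetical and assume the usual Linux values for the access-mode bits.

```c
/* Assumed Linux values for the open(2) access-mode bits. */
enum {
    MODEL_O_RDONLY  = 0,
    MODEL_O_WRONLY  = 1,
    MODEL_O_RDWR    = 2,
    MODEL_O_ACCMODE = 3
};

/* Internal form: bit 0 means "read", bit 1 means "write". */
enum {
    MODEL_READ  = 1,
    MODEL_WRITE = 2
};

/*
 * Convert open(2) flags to the internal form. Adding one maps
 * 0, 1, 2 onto 1, 2, 3, so the low two bits become independent
 * read and write permission bits. The special value 3 is left
 * untouched, and higher flag bits are preserved because the
 * increment never carries out of the low two bits.
 */
int model_open_to_namei_flags(int flag)
{
    if ((flag + 1) & MODEL_O_ACCMODE)
        flag++;
    return flag;
}
```

Keeping this conversion next to the code that consumes the result means a caller that later needs the original sys_open flags can still reach them, which is the motivation the message gives for the move.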
     

25 Mar, 2008

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    [PATCH] get stack footprint of pathname resolution back to relative sanity
    [PATCH] double iput() on failure exit in hugetlb
    [PATCH] double dput() on failure exit in tiny-shmem
    [PATCH] fix up new filp allocators
    [PATCH] check for null vfsmount in dentry_open()
    [PATCH] reiserfs: eliminate private use of struct file in xattr
    [PATCH] sanitize hppfs
    hppfs pass vfsmount to dentry_open()
    [PATCH] restore export of do_kern_mount()

    Linus Torvalds
     

20 Mar, 2008

1 commit

  • Fix kernel-doc notation warnings in fs/.

    Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
    * mark_files_ro
    Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
    * lease_get_mtime
    Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
    * lease_get_mtime
    Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
    * lookup_one_len: filesystem helper to lookup single pathname component
    Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
    * bh_uptodate_or_lock: Test whether the buffer is uptodate
    Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
    * bh_submit_read: Submit a locked buffer for reading
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
    * writeback_acquire: attempt to get exclusive writeback access to a device
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
    * writeback_in_progress: determine whether there is writeback in progress
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
    * writeback_release: relinquish exclusive writeback access against a device.
    Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
    Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
    Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
    * void journal_invalidatepage()

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
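
The "missing initial short description" warnings concern the first line of a kernel-doc comment, which is expected to have the form `name - short description`. A minimal sketch of a comment in that form follows; `struct model_buffer` and `model_buffer_uptodate()` are hypothetical and exist only to carry the comment.

```c
#include <stdbool.h>

/* Hypothetical structure, present only for the example. */
struct model_buffer {
    bool uptodate;
};

/**
 * model_buffer_uptodate - test whether a buffer is up to date
 * @buf: buffer to examine
 *
 * The first line gives the function name, a dash, and a short
 * description. Each parameter then gets its own @name: line.
 */
bool model_buffer_uptodate(const struct model_buffer *buf)
{
    return buf->uptodate;
}
```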
     

19 Mar, 2008

1 commit

  • Somebody had put struct nameidata in stack frame of link_path_walk().
    Unfortunately, there are certain realities to deal with:
    * It's in the middle of recursion. Depth is equal to the nesting
    depth of symlinks, i.e. up to 8.
    * struct nameidata is, even if one discards the intent junk,
    at least 12 pointers + 5 ints.
    * moreover, adding a stack frame is not free in that situation.
    * there are fs methods called on top of that, and they also have
    stack footprint.
    * kernel stack is not infinite.

    The thing is, even if one chooses to deal with -ESTALE that way (and it's
    one hell of an overkill), the only thing that needs to be preserved is
    vfsmount + dentry, not the entire struct nameidata.

    Signed-off-by: Al Viro

    Al Viro
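
The arithmetic behind the complaint can be sketched with two stand-in structures. The layouts are hypothetical: `struct model_nameidata` merely has the "12 pointers + 5 ints" lower bound quoted above, and `struct model_saved_path` holds the two pointers that actually need to survive.

```c
#include <stddef.h>

/* Maximum symlink nesting depth quoted in the message. */
enum { MODEL_MAX_NESTING = 8 };

/* Lower-bound stand-in: at least 12 pointers and 5 ints. */
struct model_nameidata {
    void *ptrs[12];
    int ints[5];
};

/* What needs preserving: a vfsmount pointer and a dentry pointer. */
struct model_saved_path {
    void *mnt;
    void *dentry;
};

/* Stack bytes consumed if one object sits in every recursion frame. */
size_t model_worst_case_bytes(size_t per_frame)
{
    return per_frame * MODEL_MAX_NESTING;
}
```

On a typical 64-bit target the first structure should come to roughly 120 bytes and the second to 16, so at full nesting depth the difference is on the order of 960 bytes against 128, before counting the rest of each frame or the filesystem methods called on top.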
     

15 Feb, 2008

1 commit