27 Jul, 2008

40 commits

  • * dup2() should return -EBADF on exceeded sysctl_nr_open
    * dup() should *not* return -EINVAL even if you have rlimit set to 0;
    it should get -EMFILE instead.

    Check for orig_start exceeding rlimit taken to sys_fcntl().
    Failing expand_files() in dup{2,3}() now gets -EMFILE remapped to -EBADF.
    Consequently, remaining checks for rlimit are taken to expand_files().

    Signed-off-by: Al Viro

    Al Viro
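
A minimal userspace sketch of the two error codes described above, assuming a Linux system where the soft RLIMIT_NOFILE can be lowered without privilege. The helper names dup2_past_limit_errno() and dup_with_zero_limit_errno() are hypothetical; only dup(), dup2(), getrlimit() and setrlimit() are real APIs.

```c
#include <errno.h>
#include <limits.h>
#include <sys/resource.h>
#include <unistd.h>

/* errno from dup2(fd, <soft RLIMIT_NOFILE>), 0 if it unexpectedly succeeded,
   or -1 if the limit cannot be used as an int descriptor number. */
int dup2_past_limit_errno(int fd)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 ||
        rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > INT_MAX)
        return -1;

    /* Descriptor numbers must be < the soft limit, so this one is out of range. */
    int newfd = dup2(fd, (int)rl.rlim_cur);
    if (newfd >= 0) {
        close(newfd);
        return 0;
    }
    return errno;
}

/* errno from dup(fd) while the soft RLIMIT_NOFILE is temporarily 0,
   0 if dup() unexpectedly succeeded, or -1 on setup failure. */
int dup_with_zero_limit_errno(int fd)
{
    struct rlimit saved, zero;
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;
    zero = saved;
    zero.rlim_cur = 0;                 /* lowering the soft limit needs no privilege */
    if (setrlimit(RLIMIT_NOFILE, &zero) != 0)
        return -1;

    int newfd = dup(fd);
    int err = (newfd < 0) ? errno : 0; /* capture before any other call */
    if (newfd >= 0)
        close(newfd);

    /* Raising the soft limit back to its old value (<= hard limit) is allowed. */
    setrlimit(RLIMIT_NOFILE, &saved);
    return err;
}
```

The soft limit is restored before returning, so later descriptor allocations by the caller are unaffected.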
     
  • Since Ulrich is OK with getting rid of dup3(fd, fd, flags) completely,
    to hell the damn thing goes. Corner case for dup2() is handled in
    sys_dup2() (complete with -EBADF if dup2(fd, fd) is called with fd
    that is not open), the rest is done in dup3().

    Signed-off-by: Al Viro

    Al Viro
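
A small check of the dup2(fd, fd) corner case described above: on an open descriptor it hands back fd unchanged, and on a closed one it fails with EBADF. The helper self_dup2_errno() is hypothetical. dup3(fd, fd, flags), by contrast, is documented in the Linux dup(2) man page to fail with EINVAL; that path is not exercised here.

```c
#include <errno.h>
#include <unistd.h>

/* Returns 0 if dup2(fd, fd) handed back fd unchanged, otherwise the errno
   it set (or -1 if it somehow returned a different descriptor). */
int self_dup2_errno(int fd)
{
    int r = dup2(fd, fd);
    if (r == fd)
        return 0;
    return (r < 0) ? errno : -1;
}
```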
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • fs.h needs path.h, not namei.h; nfs_fs.h doesn't need it at all.
    Several places in the tree needed direct include.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • make it atomic_long_t; while we are at it, get rid of useless checks in affs,
    hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0.

    Signed-off-by: Al Viro

    Al Viro
     
  • Al Viro noticed one corner case in the new dup3() code. The dup2()
    function, as a special case, handles dup-ing to the same file
    descriptor. In this case the current dup3() code does nothing at
    all, i.e., it ignores the flags parameter. This shouldn't happen:
    the close-on-exec flag should be set if requested.

    If the O_CLOEXEC bit in the flags parameter is not set, the
    dup3() function should behave identically to dup2() in this respect.
    This means dup3(fd, fd, 0) should not actively reset the c-o-e
    flag.

    The patch below implements this minor change.

    [AV: credits to Artur Grabowski for bringing that up as a potential subtle point
    in dup2() behaviour]

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Al Viro

    Ulrich Drepper
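
The behaviour that dup3(fd, fd, 0) was asked to match can be observed from userspace: dup2(fd, fd) leaves FD_CLOEXEC alone. A minimal sketch assuming a POSIX system; the helper name cloexec_survives_self_dup2() is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Sets FD_CLOEXEC on fd, calls dup2(fd, fd), and reports whether the flag
   is still set: 1 = kept, 0 = cleared, -1 = some call failed. */
int cloexec_survives_self_dup2(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0)
        return -1;

    if (dup2(fd, fd) != fd)           /* same-descriptor special case */
        return -1;

    flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```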
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Preparation for untangling the intents mess: reduce the number of do_path_lookup()
    callers.

    Signed-off-by: Al Viro

    Al Viro
     
  • * do not pass nameidata; struct path is all the callers want.
    * switch to new helpers:
    user_path_at(dfd, pathname, flags, &path)
    user_path(pathname, &path)
    user_lpath(pathname, &path)
    user_path_dir(pathname, &path) (fail if not a directory)
    The last 3 are trivial macro wrappers for the first one.
    * remove nameidata in callers.

    Signed-off-by: Al Viro

    Al Viro
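
The wrapper relationship can be sketched as preprocessor macros over user_path_at(). This is a userspace model only: the stub user_path_at(), the stand-in struct path and the LOOKUP_* values below are hypothetical and merely record their arguments. The flag choices (follow symlinks for user_path(), don't for user_lpath(), additionally require a directory for user_path_dir()) follow the descriptions in the commit message.

```c
#include <fcntl.h>

#ifndef AT_FDCWD
#define AT_FDCWD -100                 /* Linux value, in case the header hides it */
#endif

/* Hypothetical flag values for this sketch; the kernel's own may differ. */
#define LOOKUP_FOLLOW    0x0001u
#define LOOKUP_DIRECTORY 0x0002u

/* Stand-in for the kernel's struct path: records what was asked for. */
struct path {
    const char *name;
    int dfd;
    unsigned flags;
};

/* Stub for the real helper: no lookup happens, arguments are just stored. */
static int user_path_at(int dfd, const char *name, unsigned flags, struct path *path)
{
    path->name = name;
    path->dfd = dfd;
    path->flags = flags;
    return 0;
}

/* The three trivial wrappers, all relative to the current directory. */
#define user_path(name, path)     user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW, path)
#define user_lpath(name, path)    user_path_at(AT_FDCWD, name, 0, path)
#define user_path_dir(name, path) \
    user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, path)
```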
     
  • Almost all users of __user_walk_fd() and friends care only about struct path.
    Get rid of the few that do not.

    Signed-off-by: Al Viro

    Al Viro
     
  • Incidentally, the name that gives hundreds of false positives on grep
    is not a good idea...

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • On Mon, May 19, 2008 at 12:01:49AM +0200, Marcin Slusarz wrote:
    > open_exec is needlessly indented, calls ERR_PTR with 0 argument
    > (which is not valid errno) and jumps into middle of function
    > just to return value.
    > So clean it up a bit.

    Still looks rather messy. See below for a better version.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the immutable and append-only checks from chmod, chown and utimes
    into notify_change(). Checks for immutable and append-only files are
    always performed by the VFS and not by the filesystem (see
    permission() and may_...() in namei.c), so these belong in
    notify_change(), and not in inode_change_ok().

    This should be completely equivalent.

    CC: Ulrich Drepper
    CC: Michael Kerrisk
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
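
A model of where the check now lives: one test in notify_change() covering mode, owner and explicit time changes, rather than separate tests in chmod, chown and utimes. Everything below is a hypothetical userspace stand-in (struct inode_model, notify_change_model(), the flag bit values); keying on ATTR_TIMES_SET assumes the flag introduced in the ATTR_TIMES_SET commit further down this log.

```c
#include <errno.h>

/* Illustrative bit values for this sketch only. */
#define ATTR_MODE      (1u << 0)
#define ATTR_UID       (1u << 1)
#define ATTR_GID       (1u << 2)
#define ATTR_TIMES_SET (1u << 16)

#define S_IMMUTABLE_M  (1u << 0)      /* hypothetical inode flag bits */
#define S_APPEND_M     (1u << 1)

struct inode_model {
    unsigned flags;
};

/* Hypothetical stand-in for the part of notify_change() that rejects
   mode/owner/explicit-time changes on immutable or append-only inodes. */
int notify_change_model(const struct inode_model *inode, unsigned ia_valid)
{
    if (ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID | ATTR_TIMES_SET)) {
        if (inode->flags & (S_IMMUTABLE_M | S_APPEND_M))
            return -EPERM;
    }
    return 0;  /* the real function would go on to apply the change */
}
```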
     
  • The FAT_IOCTL_SET_ATTRIBUTES ioctl() calls notify_change() to change
    the file mode before changing the inode attributes. Replace with
    explicit calls to security_inode_setattr(), fat_setattr() and
    fsnotify_change().

    This is equivalent to the original. The reason it is needed is that
    later in the series we move the immutable check into notify_change().
    That would break the FAT_IOCTL_SET_ATTRIBUTES ioctl, as it needs to
    perform the mode change regardless of the immutability of the file.

    [Fix error if fat is built as a module. Thanks to OGAWA Hirofumi for
    noticing.]

    Signed-off-by: Miklos Szeredi
    Acked-by: OGAWA Hirofumi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Untangle the mess that is do_utimes(). Add a kerneldoc comment to
    do_utimes().

    CC: Ulrich Drepper
    CC: Michael Kerrisk
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Add a new ia_valid flag, ATTR_TIMES_SET, to handle the
    UTIME_OMIT/UTIME_NOW and UTIME_NOW/UTIME_OMIT cases. In these
    cases neither ATTR_MTIME_SET nor ATTR_ATIME_SET is in the flags, yet
    the POSIX draft specifies that permission checking is performed the
    same way as if one or both of the times were explicitly set to a
    timestamp.

    See Michael Kerrisk's patch "vfs: utimensat(): fix error checking for
    {UTIME_NOW,UTIME_OMIT} case", which introduced this behavior.

    This is a cleanup, and it also allows filesystems (NFS/fuse/...) to
    perform their own permission checking instead of the default.

    CC: Ulrich Drepper
    CC: Michael Kerrisk
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
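
A sketch of how the ia_valid bits might be derived from a utimensat()-style pair of timespecs, showing why ATTR_TIMES_SET is needed: for {UTIME_OMIT, UTIME_NOW} neither *_SET bit is raised, yet the caller did pass explicit times. The function utimes_ia_valid() and the bit values are hypothetical; UTIME_NOW and UTIME_OMIT fall back to their Linux values if the headers do not provide them.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

#ifndef UTIME_NOW
#define UTIME_NOW  ((1l << 30) - 1l)  /* Linux values */
#define UTIME_OMIT ((1l << 30) - 2l)
#endif

/* Illustrative bit values for this sketch only. */
#define ATTR_ATIME     (1u << 4)
#define ATTR_MTIME     (1u << 5)
#define ATTR_CTIME     (1u << 6)
#define ATTR_ATIME_SET (1u << 7)
#define ATTR_MTIME_SET (1u << 8)
#define ATTR_TIMES_SET (1u << 16)

/* times[0] is atime, times[1] is mtime; NULL means "set both to now". */
unsigned utimes_ia_valid(const struct timespec *times)
{
    /* {NOW, NOW} is treated exactly like a NULL times argument. */
    if (times && times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW)
        times = NULL;

    unsigned v = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
    if (times) {
        if (times[0].tv_nsec == UTIME_OMIT)
            v &= ~ATTR_ATIME;
        else if (times[0].tv_nsec != UTIME_NOW)
            v |= ATTR_ATIME_SET;      /* explicit atime value */

        if (times[1].tv_nsec == UTIME_OMIT)
            v &= ~ATTR_MTIME;
        else if (times[1].tv_nsec != UTIME_NOW)
            v |= ATTR_MTIME_SET;      /* explicit mtime value */

        /* Explicit times were passed, even if no *_SET bit was raised. */
        v |= ATTR_TIMES_SET;
    }
    return v;
}
```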
     
  • - use kstrdup() instead of kmalloc() + memcpy()
    - return NULL if allocating ->mnt_devname failed
    - mnt_devname should be const

    Signed-off-by: Li Zefan
    Acked-by: Cyrill Gorcunov
    Signed-off-by: Al Viro

    Li Zefan
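
A userspace analogue of the three points above: copy the name with a strdup-style helper (one strlen() plus one allocation, which is what kstrdup() does), fail the whole allocation if that copy fails, and keep the field const. struct mount_model, dup_cstring() and alloc_mount_model() are hypothetical stand-ins, not the kernel's vfsmount code.

```c
#include <stdlib.h>
#include <string.h>

/* strdup-style copy: size from strlen(), one allocation, one memcpy(). */
static char *dup_cstring(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p)
        memcpy(p, s, len);
    return p;
}

struct mount_model {              /* hypothetical stand-in for struct vfsmount */
    const char *mnt_devname;      /* const: never modified after allocation */
};

/* Returns NULL (freeing the partial object) if copying the name fails. */
struct mount_model *alloc_mount_model(const char *name)
{
    struct mount_model *mnt = calloc(1, sizeof(*mnt));
    if (!mnt)
        return NULL;
    if (name) {
        mnt->mnt_devname = dup_cstring(name);
        if (!mnt->mnt_devname) {
            free(mnt);
            return NULL;
        }
    }
    return mnt;
}

void free_mount_model(struct mount_model *mnt)
{
    if (mnt) {
        free((void *)mnt->mnt_devname);  /* we own the copy, so drop const */
        free(mnt);
    }
}
```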
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • ... and get rid of the last "let's deduce mask from nameidata->flags"
    bit.

    Signed-off-by: Al Viro

    Al Viro
     
  • * MAY_CHDIR is redundant - it's equivalent to MAY_ACCESS
    * MAY_ACCESS on fuse should affect only the last step of pathname resolution
    * fchdir() and chroot() should pass MAY_ACCESS, for the same reason why
    chdir() needs that.
    * now that we pass MAY_ACCESS explicitly in all cases, LOOKUP_ACCESS can be
    removed; it has no business being in nameidata.

    Signed-off-by: Al Viro

    Al Viro
     
  • long overdue...

    Signed-off-by: Al Viro

    Al Viro
     
  • ... so we ought to pass MAY_CHDIR to vfs_permission() instead of having
    it triggered on every step of preceding pathname resolution. LOOKUP_CHDIR
    is killed by that.

    Signed-off-by: Al Viro

    Al Viro
     
  • Remove the unused mode parameter from vfs_symlink and callers.

    Thanks to Tetsuo Handa for noticing.

    CC: Tetsuo Handa
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Why not reuse "inode", which is assigned as

    struct inode *inode = old_dentry->d_inode;

    at the beginning of vfs_link()?

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Miklos Szeredi

    Tetsuo Handa
     
  • All calls to remove_suid() are made with a file pointer, because
    (similarly to file_update_time) it is called when the file is written.

    Clean up callers by passing in a file instead of a dentry.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • vfs_permission(MAY_WRITE) already checked for the inode being
    immutable, so no need to repeat it.

    Signed-off-by: Miklos Szeredi
    Acked-by: Christoph Hellwig

    Miklos Szeredi
     
  • * kill nameidata * argument; map the 3 bits in ->flags anybody cares
    about to new MAY_... ones and pass with the mask.
    * kill redundant gfs2_iop_permission()
    * sanitize ecryptfs_permission()
    * fix remaining places where ->permission() instances might barf on new
    MAY_... found in mask.

    The obvious next target in that direction is permission(9)

    folded fix for nfs_permission() breakage from Miklos Szeredi

    Signed-off-by: Al Viro

    Al Viro
     
  • hpfs_unlink() calls permission() prior to truncating the file. HPFS
    doesn't define a .permission method, so replace with explicit call to
    generic_permission().

    This is equivalent, except that devcgroup_inode_permission() and
    security_inode_permission() are not called.

    The truncation is just an implementation detail of the unlink, so
    these security checks are unnecessary.

    I suspect that even calling generic_permission() is unnecessary, since
    we shouldn't mind if the file isn't writable. But I leave that to the
    maintainer to decide.

    Signed-off-by: Miklos Szeredi
    CC: Mikulas Patocka

    Miklos Szeredi
     
  • * keep references to ctl_table_head and ctl_table in /proc/sys inodes
    * grab the former during operations, use the latter for access to
    entry if that succeeds
    * have ->d_compare() check whether the table should be seen by whoever does
    the lookup; that allows us to avoid flipping inodes - if the same name
    resolves to different things, we'll just keep several dentries and
    ->d_compare() will reject the wrong ones.
    * have ->lookup() and ->readdir() scan the table of our inode first, then
    walk all ctl_table_headers and scan ->attached_by for those that are
    attached to our directory.
    * implement ->getattr().
    * get rid of insane amounts of tree-walking
    * get rid of the need to know dentry in ->permission() and of the contortions
    induced by that.

    Signed-off-by: Al Viro

    Al Viro
     
  • In a sense, that's the heart of the series. It's based on the following
    property of the trees we are actually asked to add: they can be split into
    stem that is already covered by registered trees and crown that is entirely
    new. IOW, if a/b and a/c/d are introduced by our tree, then a/c is also
    introduced by it.

    That allows us to associate a tree and a table entry with each node in the union;
    while directory nodes might be covered by many trees, only one will cover
    the node by its crown. And that will allow much saner logic for /proc/sys
    in the next patches. This patch introduces the data structures needed to
    keep track of that.

    When adding a sysctl table, we find a "parent" one. That is, we
    find the deepest node on its stem that is already present in one of the
    tables from our table set or its ancestor sets. That table will be our
    parent and that node in it the attachment point. Add our table to the list
    anchored in the parent, and have it refer to the parent and to the contents
    of the attachment point. Also remember where its crown lives.

    Signed-off-by: Al Viro

    Al Viro
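
The stem/crown split can be illustrated with path strings. This is a simplified, hypothetical model: registered nodes are kept as a flat list of "a/b/c" strings instead of ctl_table arrays, and stem_depth() returns how many leading components of a new table's path are already registered. Every component after that belongs to the crown, and the last stem node plays the role of the attachment point.

```c
#include <stddef.h>
#include <string.h>

static int is_registered(const char *const *reg, size_t nreg, const char *p)
{
    for (size_t i = 0; i < nreg; i++)
        if (strcmp(reg[i], p) == 0)
            return 1;
    return 0;
}

/* Number of leading components of comps[] already present in reg[]. */
size_t stem_depth(const char *const *reg, size_t nreg,
                  const char *const *comps, size_t ncomps)
{
    char buf[256] = "";
    size_t depth = 0;

    for (size_t i = 0; i < ncomps; i++) {
        /* room for an optional '/', the component, and the NUL */
        if (strlen(buf) + strlen(comps[i]) + 2 > sizeof buf)
            break;
        if (i > 0)
            strcat(buf, "/");
        strcat(buf, comps[i]);
        if (!is_registered(reg, nreg, buf))
            break;                    /* first new node: the crown starts here */
        depth = i + 1;
    }
    return depth;
}
```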
     
  • Massage ipv4 initialization - make sure that net.ipv4 appears as
    non-per-net-namespace before it shows up in per-net-namespace sysctls.
    That's the only change outside of sysctl.c needed to get sane ordering
    rules and data structures for sysctls (esp. for procfs side of that
    mess).

    Signed-off-by: Al Viro

    Al Viro
     
  • Refcount the sucker; instead of freeing it by the end of unregistration
    just drop the refcount and free only when it hits zero. Make sure that
    we _always_ make ->unregistering non-NULL in start_unregistering().

    That allows anybody to get a reference to such a puppy, preventing its
    freeing and reuse. It does *not* block unregistration. Anybody who
    holds such a reference can
    * try to grab a "use" reference (ctl_head_grab()); that will
    succeed if and only if it hasn't entered unregistration yet. If it
    succeeds, we can use it in all normal ways until we release the "use"
    reference (with ctl_head_finish()). Note that this relies on having
    ->unregistering become non-NULL in all cases when one starts to unregister
    the sucker.
    * keep pointers to ctl_table entries; they *can* be freed if
    the entire thing is unregistered. However, if ctl_head_grab() succeeds,
    we know that unregistration had not happened (and will not happen until
    ctl_head_finish()) and such pointers can be used safely.

    IOW, now we can have inodes under /proc/sys keep references to ctl_table
    entries, protecting them with references to ctl_table_header and
    grabbing the latter for the duration of operations that require access
    to ctl_table. That won't cause deadlocks, since unregistration will not
    be stopped by merely keeping a reference to ctl_table_header.

    Signed-off-by: Al Viro

    Al Viro
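
A single-threaded model of the two kinds of references: a plain count that only keeps the header from being freed, and "use" references that succeed only until unregistration starts. The struct and function names are hypothetical; head_grab() and head_finish() play the roles of ctl_head_grab() and ctl_head_finish(). The real code protects these fields with a lock and has unregistration wait for outstanding use references; neither is modelled here.

```c
#include <stdbool.h>
#include <stdlib.h>

struct head_model {
    int count;                  /* keeps the structure itself from being freed */
    int used;                   /* "use" references: entries may be dereferenced */
    const void *unregistering;  /* non-NULL once unregistration has started */
};

struct head_model *head_new(void)
{
    struct head_model *h = calloc(1, sizeof(*h));
    if (h)
        h->count = 1;           /* reference held by the registration itself */
    return h;
}

void head_get(struct head_model *h) { h->count++; }

/* Drops a reference; returns true if this freed the header. */
bool head_put(struct head_model *h)
{
    if (--h->count == 0) {
        free(h);
        return true;
    }
    return false;
}

/* Succeeds only while unregistration has not started. */
bool head_grab(struct head_model *h)
{
    if (h->unregistering)
        return false;
    h->used++;
    return true;
}

void head_finish(struct head_model *h) { h->used--; }

/* Always leaves ->unregistering non-NULL, so every later grab fails. */
void head_start_unregistering(struct head_model *h)
{
    static const int marker;
    h->unregistering = &marker;
}
```

A holder such as a /proc/sys inode would take head_get() once and then bracket each operation with head_grab()/head_finish().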
     
  • New object: set of sysctls [currently - root and per-net-ns].
    Contains: pointer to parent set, list of tables and "should I see this set?"
    method (->is_seen(set)).
    Current lists of tables are subsumed by that; net-ns contains such a beast.
    ->lookup() for ctl_table_root returns a pointer to the ctl_table_set instead
    of a pointer to that ctl_table_set's ->list.

    [folded compile fixes by rdd for configs without sysctl]

    Signed-off-by: Al Viro

    Al Viro
     
  • hppfs_permission() is equivalent to the '.permission == NULL' case.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Merge fifo and pipe file_operations.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: Al Viro

    Denys Vlasenko
     
  • Lookup can install a child dentry for a deleted directory. This keeps
    the directory dentry alive, and the inode pinned in the cache and on
    disk, even after all external references have gone away.

    This isn't a big problem normally, since memory pressure or umount
    will clear out the directory dentry and its children, releasing the
    inode. But for UBIFS this causes problems because its orphan area can
    overflow.

    Fix this by returning ENOENT for all lookups on a S_DEAD directory
    before creating a child dentry.

    Thanks to Zoltan Sogor for noticing this while testing UBIFS, and
    Artem for the excellent analysis of the problem and testing.

    Reported-by: Artem Bityutskiy
    Tested-by: Artem Bityutskiy
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
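
A toy model of the ordering the fix establishes: the S_DEAD check happens before anything is cached under the directory. struct dir_model, S_DEAD_MODEL and lookup_model() are hypothetical; the counter stands in for child dentries that would otherwise pin the dead directory.

```c
#include <errno.h>

#define S_DEAD_MODEL 1u   /* hypothetical stand-in for the inode's S_DEAD flag */

struct dir_model {
    unsigned flags;
    int child_dentries;   /* children cached under this directory */
};

/* Refuse the lookup on a dead directory *before* a child dentry is created,
   so nothing new keeps the directory (and its inode) alive. */
int lookup_model(struct dir_model *dir)
{
    if (dir->flags & S_DEAD_MODEL)
        return -ENOENT;
    dir->child_dentries++;  /* a (possibly negative) dentry gets cached */
    return 0;
}
```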
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    ftrace: fix modular build
    ftrace: disable tracing on acpi idle calls
    ftrace: remove latency-tracer leftover
    ftrace: only trace preempt off with preempt tracer
    ftrace: fix 4d3702b6 (post-v2.6.26): WARNING: at kernel/lockdep.c:2731 check_flags (ftrace)

    Linus Torvalds