26 Jun, 2006

1 commit

  • When the linkat() syscall was added the flag parameter was added at the
    last minute, but it has not been used so far. The following patch should
    change that. My tests show that this is all that's needed.

    If OLDNAME is a symlink, setting the flag causes linkat to follow the
    symlink and create a hardlink to the target. This is actually the
    behavior POSIX demands for link() as well but Linux wisely does not do
    this. With this flag (which will most likely be in the next POSIX
    revision) the programmer can choose the behavior, defaulting to the safe
    variant. As a side effect it is now possible to implement a
    POSIX-compliant link(2) function for those who are interested.

    touch file
    ln -s file symlink

    linkat(fd, "symlink", fd, "newlink", 0)
    -> newlink is hardlink of symlink

    linkat(fd, "symlink", fd, "newlink", AT_SYMLINK_FOLLOW)
    -> newlink is hardlink of file

    The value of AT_SYMLINK_FOLLOW is determined by the definition we already
    use in glibc.

    Signed-off-by: Ulrich Drepper
    Acked-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
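
    The two linkat() calls described above can be tried from userspace. What
    follows is a minimal sketch, assuming a Linux system whose /tmp filesystem
    permits hardlinks to symlinks; the function name linkat_follow_demo and the
    file names "plain" and "followed" are illustrative and not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* AT_SYMLINK_FOLLOW, AT_SYMLINK_NOFOLLOW, openat() */
#include <stdlib.h>     /* mkdtemp() */
#include <sys/stat.h>   /* fstatat(), S_ISLNK(), S_ISREG() */
#include <unistd.h>     /* linkat(), symlinkat(), unlinkat(), rmdir() */

/*
 * Recreates the "touch file; ln -s file symlink" setup in a scratch
 * directory and checks both linkat() variants.  Returns 0 when the
 * observed behavior matches the commit message, -1 otherwise.
 */
int linkat_follow_demo(void)
{
    char dir[] = "/tmp/linkat-demo-XXXXXX";
    struct stat file, symlink_st, plain, followed;
    int dfd, fd, result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0) {
        rmdir(dir);
        return -1;
    }

    fd = openat(dfd, "file", O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        goto out;
    close(fd);
    if (symlinkat("file", dfd, "symlink") != 0)
        goto out;

    /* flags == 0: the new name is a hardlink of the symlink itself. */
    if (linkat(dfd, "symlink", dfd, "plain", 0) != 0)
        goto out;
    /* AT_SYMLINK_FOLLOW: the new name is a hardlink of the target file. */
    if (linkat(dfd, "symlink", dfd, "followed", AT_SYMLINK_FOLLOW) != 0)
        goto out;

    if (fstatat(dfd, "file", &file, AT_SYMLINK_NOFOLLOW) != 0 ||
        fstatat(dfd, "symlink", &symlink_st, AT_SYMLINK_NOFOLLOW) != 0 ||
        fstatat(dfd, "plain", &plain, AT_SYMLINK_NOFOLLOW) != 0 ||
        fstatat(dfd, "followed", &followed, AT_SYMLINK_NOFOLLOW) != 0)
        goto out;

    if (S_ISLNK(plain.st_mode) && plain.st_ino == symlink_st.st_ino &&
        S_ISREG(followed.st_mode) && followed.st_ino == file.st_ino)
        result = 0;
out:
    /* Best-effort cleanup; names that were never created simply fail. */
    unlinkat(dfd, "followed", 0);
    unlinkat(dfd, "plain", 0);
    unlinkat(dfd, "symlink", 0);
    unlinkat(dfd, "file", 0);
    close(dfd);
    rmdir(dir);
    return result;
}
```

    Calling linkat_follow_demo() from a small main() and checking for a zero
    return is enough to confirm the behavior on a given kernel and filesystem.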
     

23 Jun, 2006

1 commit

  • Add read_mapping_page() which is used for callers that pass
    mapping->a_ops->readpage as the filler for read_cache_page. This removes
    some duplication from filesystem code.

    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
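
    The kind of duplication this helper removes can be illustrated with a toy
    userspace model. Everything below (the model_* types, the four-slot page
    cache, the filler signature) is a hypothetical stand-in and not the kernel
    definition; it only shows a wrapper that supplies mapping->a_ops->readpage
    as the filler so callers do not have to.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NPAGES 4

struct model_page { unsigned long index; int uptodate; };

/* Toy filler signature; the kernel's real types differ. */
typedef int model_filler_t(void *data, struct model_page *page);

struct model_aops { model_filler_t *readpage; };

struct model_mapping {
    const struct model_aops *a_ops;
    struct model_page cache[MODEL_NPAGES];   /* toy page cache */
};

/* Toy read_cache_page(): fill the page through `filler` if it is stale. */
struct model_page *model_read_cache_page(struct model_mapping *m,
                                         unsigned long index,
                                         model_filler_t *filler, void *data)
{
    struct model_page *page;

    assert(m != NULL && filler != NULL);
    if (index >= MODEL_NPAGES)
        return NULL;
    page = &m->cache[index];
    page->index = index;
    if (!page->uptodate && filler(data, page) != 0)
        return NULL;
    return page;
}

/* The helper: picks the mapping's own readpage as the filler. */
struct model_page *model_read_mapping_page(struct model_mapping *m,
                                           unsigned long index, void *data)
{
    return model_read_cache_page(m, index, m->a_ops->readpage, data);
}

/* Example filler that just marks the page up to date. */
int model_readpage(void *data, struct model_page *page)
{
    (void)data;
    page->uptodate = 1;
    return 0;
}
```

    A filesystem in this model would call model_read_mapping_page(&mapping,
    index, NULL) instead of repeating the filler argument at every call site.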
     

20 Jun, 2006

1 commit

  • When an audit event involves changes to a directory entry, include
    a PATH record for the directory itself. A few other notable changes:

    - fixed audit_inode_child() hooks in fsnotify_move()
    - removed unused flags arg from audit_inode()
    - added audit log routines for logging a portion of a string

    Here's some sample output.

    before patch:
    type=SYSCALL msg=audit(1149821605.320:26): arch=40000003 syscall=39 success=yes exit=0 a0=bf8d3c7c a1=1ff a2=804e1b8 a3=bf8d3c7c items=1 ppid=739 pid=800 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 comm="mkdir" exe="/bin/mkdir" subj=root:system_r:unconfined_t:s0-s0:c0.c255
    type=CWD msg=audit(1149821605.320:26): cwd="/root"
    type=PATH msg=audit(1149821605.320:26): item=0 name="foo" parent=164068 inode=164010 dev=03:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_t:s0

    after patch:
    type=SYSCALL msg=audit(1149822032.332:24): arch=40000003 syscall=39 success=yes exit=0 a0=bfdd9c7c a1=1ff a2=804e1b8 a3=bfdd9c7c items=2 ppid=714 pid=777 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 comm="mkdir" exe="/bin/mkdir" subj=root:system_r:unconfined_t:s0-s0:c0.c255
    type=CWD msg=audit(1149822032.332:24): cwd="/root"
    type=PATH msg=audit(1149822032.332:24): item=0 name="/root" inode=164068 dev=03:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_dir_t:s0
    type=PATH msg=audit(1149822032.332:24): item=1 name="foo" inode=164010 dev=03:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_t:s0

    Signed-off-by: Amy Griffis
    Signed-off-by: Al Viro

    Amy Griffis
     

06 Jun, 2006

1 commit

  • From: Trond Myklebust

    We're presently running lock_kernel() under fs_lock via nfs's ->permission
    handler. That's a ranking bug and sometimes a sleep-in-spinlock bug. This
    problem was introduced in the openat() patchset.

    We should not need to hold the current->fs->lock for a codepath that doesn't
    use current->fs.

    [vsu@altlinux.ru: fix error path]
    Signed-off-by: Trond Myklebust
    Cc: Al Viro
    Signed-off-by: Sergey Vlasov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

01 Apr, 2006

1 commit


28 Mar, 2006

1 commit

  • In order to be able to trigger a mount using the follow_link inode method, the
    nameidata struct that is passed in needs to have the vfsmount of the autofs
    trigger, not its parent.

    During a path walk if an autofs trigger is mounted on a dentry, when the
    follow_link method is called, the nameidata struct contains the vfsmount and
    mountpoint dentry of the parent mount while the dentry that is passed in is
    the root of the autofs trigger mount. I believe it is impossible to get the
    vfsmount of the trigger mount, within the follow_link method, when only the
    parent vfsmount and the root dentry of the trigger mount are known.

    This patch updates the nameidata struct on entry to __do_follow_link if it
    detects that it is out of date. It moves the path_to_nameidata to above
    __do_follow_link to facilitate calling it from there. The dput_path is moved
    as well as that seemed sensible. No changes are made to these two functions.

    Signed-off-by: Ian Kent
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     

26 Mar, 2006

3 commits

  • * 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
    [PATCH] fix audit_init failure path
    [PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
    [PATCH] sem2mutex: audit_netlink_sem
    [PATCH] simplify audit_free() locking
    [PATCH] Fix audit operators
    [PATCH] promiscuous mode
    [PATCH] Add tty to syscall audit records
    [PATCH] add/remove rule update
    [PATCH] audit string fields interface + consumer
    [PATCH] SE Linux audit events
    [PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
    [PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
    [PATCH] Fix IA64 success/failure indication in syscall auditing.
    [PATCH] Miscellaneous bug and warning fixes
    [PATCH] Capture selinux subject/object context information.
    [PATCH] Exclude messages by message type
    [PATCH] Collect more inode information during syscall processing.
    [PATCH] Pass dentry, not just name, in fsnotify creation hooks.
    [PATCH] Define new range of userspace messages.
    [PATCH] Filter rule comparators
    ...

    Fixed trivial conflict in security/selinux/hooks.c

    Linus Torvalds
     
  • As prepare_write, commit_write and readpage are allowed to return
    AOP_TRUNCATE_PAGE, page_symlink should respond to them.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • It seems there is an error check missing in open_namei for errors returned
    through intent.open.file (from lookup_instantiate_filp).

    If a plain open is performed, then such a check is done inside
    __path_lookup_intent_open called from path_lookup_open(), but when the open
    is performed with the O_CREAT flag set, then __path_lookup_intent_open is
    only called with LOOKUP_PARENT set, where no file opening can occur yet.

    Later on lookup_hash is called, where the actual opening might take place
    and intent.open.file may be filled. If it is filled with an error value of
    some sort, then the kernel attempts to dereference this error value as an
    address (with a corresponding oops) in nameidata_to_filp() called from
    filp_open().

    While this is relatively simple to work around in the ->lookup() method by
    just checking the lookup_instantiate_filp() return value and returning an
    error as needed, this is not so easy in ->d_revalidate(), where we can only
    return "yes, dentry is valid" or "no, dentry is invalid, perform full lookup
    again", and just returning 0 on error would cause an extra lookup (with
    potentially costly extra RPCs).

    So in short, I believe that there should be no difference in error handling
    for opening a file and creating a file in open_namei() and propose this
    simple patch as a solution.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Drokin
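
    The failure mode described above comes from treating an error value
    encoded in a pointer as a real address. The sketch below models that
    encoding in userspace; the model_* helpers and structs are illustrative
    stand-ins written for this example, not the kernel's own definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static inline void *model_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Decode the errno from such a pointer. */
static inline long model_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if the pointer value lies in the reserved error range. */
static inline int model_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_file { int fd; };

/* Stands in for the intent.open.file slot a lookup may fill in. */
struct model_intent { struct model_file *file; };

/*
 * The check the message argues for: inspect the slot after the lookup
 * and propagate the error instead of dereferencing it later.
 */
long model_check_intent(const struct model_intent *it)
{
    assert(it != NULL);
    if (model_is_err(it->file))
        return model_ptr_err(it->file);
    return 0;
}
```

    With this check in place both the plain-open path and the O_CREAT path in
    the model report -ENOENT (or whatever was stored) rather than crashing.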
     

23 Mar, 2006

1 commit

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

21 Mar, 2006

2 commits

  • This patch augments the collection of inode info during syscall
    processing. It represents part of the functionality that was provided
    by the auditfs patch included in RHEL4.

    Specifically, it:

    - Collects information for target inodes created or removed during
    syscalls. Previous code only collects information for the target
    inode's parent.

    - Adds the audit_inode() hook to syscalls that operate on a file
    descriptor (e.g. fchown), enabling audit to do inode filtering for
    these calls.

    - Modifies filtering code to check audit context for either an inode #
    or a parent inode # matching a given rule.

    - Modifies logging to provide inode # for both parent and child.

    - Protects debug info from NULL audit_names.name.

    [AV: folded a later typo fix from the same author]

    Signed-off-by: Amy Griffis
    Signed-off-by: David Woodhouse
    Signed-off-by: Al Viro

    Amy Griffis
     
  • The audit hooks (to be added shortly) will want to see dentry->d_inode
    too, not just the name.

    Signed-off-by: Amy Griffis
    Signed-off-by: David Woodhouse

    Amy Griffis
     

12 Mar, 2006

1 commit

  • This patch fixes an illegal __GFP_FS allocation inside an ext3 transaction
    in ext3_symlink(). Such an allocation may re-enter ext3 code from
    try_to_free_pages. But JBD/ext3 code keeps a pointer to the current journal
    handle in task_struct and, hence, is not reentrant.

    This bug led to "Assertion failure in journal_dirty_metadata()" messages.

    http://bugzilla.openvz.org/show_bug.cgi?id=115

    Signed-off-by: Andrey Savochkin
    Signed-off-by: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     

25 Feb, 2006

1 commit

  • I'm currently at the POSIX meeting and one thing covered was the
    incompatibility of Linux's link() with the POSIX definition when the name
    is a symlink: Linux does not follow symlinks, POSIX requires that it does.

    Even if somebody thinks this is a good default behavior we cannot change this
    because it would break the ABI. But the fact remains that some application
    might want this behavior.

    We have one chance to help implement this without breaking the behavior.
    For this we could use the new linkat interface which would need a new
    flags parameter. If the new parameter is AT_SYMLINK_FOLLOW the new
    behavior could be invoked.

    I do not want to introduce such a patch now. But we could add the
    parameter now, just don't use it. The patch below would do this. Can we
    get this late patch applied before the release more or less fixes the
    syscall API?

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Ralf Baechle
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
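
    Once the flag is honored by the kernel, a link() with the POSIX
    symlink-following behavior could be layered on linkat() roughly as below.
    This is a sketch under that assumption; posix_style_link and
    posix_style_link_check are hypothetical names, and the check assumes a
    writable /tmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_FOLLOW */
#include <stdio.h>      /* snprintf() */
#include <stdlib.h>     /* mkdtemp() */
#include <sys/stat.h>   /* lstat(), S_ISREG() */
#include <unistd.h>     /* linkat(), symlink(), unlink(), rmdir() */

/* Hypothetical wrapper: link() that follows a symlink in oldpath. */
int posix_style_link(const char *oldpath, const char *newpath)
{
    assert(oldpath != NULL && newpath != NULL);
    return linkat(AT_FDCWD, oldpath, AT_FDCWD, newpath, AT_SYMLINK_FOLLOW);
}

/*
 * Self-check in a scratch directory: returns 0 if linking through a
 * symlink yields another name for the regular file, -1 otherwise.
 */
int posix_style_link_check(void)
{
    char dir[] = "/tmp/posix-link-XXXXXX";
    char file[64], sym[64], hard[64];
    struct stat a, b;
    int fd, result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(file, sizeof file, "%s/file", dir);
    snprintf(sym, sizeof sym, "%s/sym", dir);
    snprintf(hard, sizeof hard, "%s/hard", dir);

    fd = open(file, O_CREAT | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        if (symlink("file", sym) == 0 &&
            posix_style_link(sym, hard) == 0 &&
            lstat(file, &a) == 0 && lstat(hard, &b) == 0 &&
            S_ISREG(b.st_mode) && a.st_ino == b.st_ino)
            result = 0;
    }

    /* Best-effort cleanup of whatever was created. */
    unlink(hard);
    unlink(sym);
    unlink(file);
    rmdir(dir);
    return result;
}
```

    Passing 0 instead of AT_SYMLINK_FOLLOW in the wrapper would give the
    traditional Linux link() behavior, so the choice stays with the caller.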
     

08 Feb, 2006

1 commit


06 Feb, 2006

2 commits

  • Signed-off-by: Ulrich Drepper
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • When walking a path, the LOOKUP_CONTINUE flag is used by some filesystems
    (for instance NFS) in order to determine whether or not it is looking up
    the last component of the path. If this is the case, it may have to look
    at the intent information in order to perform various tasks such as atomic
    open.

    A problem currently occurs when link_path_walk() hits a symlink. In this
    case LOOKUP_CONTINUE may be cleared prematurely when we hit the end of the
    path passed by __vfs_follow_link() (i.e. the end of the symlink path)
    rather than when we hit the end of the path passed by the user.

    The solution is to have link_path_walk() clear LOOKUP_CONTINUE if and only
    if that flag was unset when we entered the function.

    Signed-off-by: Trond Myklebust
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trond Myklebust
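
    The rule "clear LOOKUP_CONTINUE if and only if it was unset on entry" can
    be modeled in a few lines. This is a toy sketch, not the kernel code: the
    model_* names, the flag value, and the recording array are assumptions
    made for illustration, and only one level of symlink nesting is modeled.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_LOOKUP_CONTINUE 0x4u   /* illustrative flag bit */
#define MODEL_MAX_SEEN 8

struct model_walk_nd {
    unsigned flags;
    unsigned seen[MODEL_MAX_SEEN];   /* flags visible at each lookup */
    size_t nseen;
};

/*
 * Walk `ncomp` path components.  If symlink_at >= 0, that component is
 * a symlink whose target has `target_len` components; the target is
 * walked by a nested call, as a symlink body would be.
 */
void model_walk(struct model_walk_nd *nd, int ncomp, int symlink_at,
                int target_len)
{
    unsigned entry;

    assert(nd != NULL);
    /* Remember whether the flag was already set when we were called. */
    entry = nd->flags & MODEL_LOOKUP_CONTINUE;

    for (int i = 0; i < ncomp; i++) {
        if (i < ncomp - 1)
            nd->flags |= MODEL_LOOKUP_CONTINUE;   /* more components follow */
        else
            nd->flags &= entry | ~MODEL_LOOKUP_CONTINUE; /* clear iff unset on entry */

        if (i == symlink_at)
            model_walk(nd, target_len, -1, 0);
        else if (nd->nseen < MODEL_MAX_SEEN)
            nd->seen[nd->nseen++] = nd->flags;
    }
}
```

    For a user path like a/link/b where link points at x/y, the call
    model_walk(&nd, 3, 1, 2) records four lookups (a, x, y, b). The flag is
    still set when y is looked up, because b follows in the user's path, and
    it is clear only for b.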
     

02 Feb, 2006

1 commit


19 Jan, 2006

1 commit

  • Here is a series of patches which introduce in total 13 new system calls
    which take a file descriptor/filename pair instead of a single file
    name. These functions, openat etc, have been discussed on numerous
    occasions. They are needed to implement race-free filesystem traversal,
    they are necessary to implement a virtual per-thread current working
    directory (think multi-threaded backup software), etc.

    We have in glibc today implementations of the interfaces which use the
    /proc/self/fd magic. But this code is rather expensive. Here are some
    results (similar to what Jim Meyering posted before).

    The test creates a deep directory hierarchy on a tmpfs filesystem. Then
    rm -fr is used to remove all directories. Without syscall support I get
    this:

    real 0m31.921s
    user 0m0.688s
    sys 0m31.234s

    With syscall support the results are much better:

    real 0m20.699s
    user 0m0.536s
    sys 0m20.149s

    The interfaces are for obvious reasons currently not much used. But they'll
    be used. coreutils (and Jeff's posixutils) are already using them.
    Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
    them. I expect a patch to make follow soon. Every program which is walking
    the filesystem tree will benefit.

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Alexey Dobriyan
    Cc: Christoph Hellwig
    Cc: Al Viro
    Acked-by: Ingo Molnar
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
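
    A small userspace example of the descriptor/filename pairing may help
    here. It is a sketch, assuming a Linux system with a writable /tmp; the
    function name openat_demo and the file name note.txt are made up for the
    example. It renames the directory after opening it to show that lookups
    relative to the descriptor do not depend on the directory's path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* openat(), O_DIRECTORY */
#include <stdio.h>      /* snprintf(), rename() */
#include <stdlib.h>     /* mkdtemp() */
#include <string.h>     /* strcmp(), strcpy() */
#include <unistd.h>     /* read(), write(), unlinkat(), rmdir() */

/* Returns 0 if a file can be created and read back via the directory fd. */
int openat_demo(void)
{
    char dir[] = "/tmp/openat-demo-XXXXXX";
    char moved[sizeof dir + 8];
    char buf[8] = {0};
    int dfd, fd, result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0) {
        rmdir(dir);
        return -1;
    }

    /* Rename the directory; dfd still refers to the same directory. */
    snprintf(moved, sizeof moved, "%s.moved", dir);
    if (rename(dir, moved) != 0) {
        strcpy(moved, dir);   /* cleanup must target the old name */
        goto out;
    }

    fd = openat(dfd, "note.txt", O_CREAT | O_WRONLY | O_EXCL, 0600);
    if (fd < 0)
        goto out;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        goto out_unlink;
    }
    close(fd);

    fd = openat(dfd, "note.txt", O_RDONLY);
    if (fd >= 0) {
        if (read(fd, buf, sizeof buf - 1) == 5 && strcmp(buf, "hello") == 0)
            result = 0;
        close(fd);
    }
out_unlink:
    unlinkat(dfd, "note.txt", 0);
out:
    close(dfd);
    rmdir(moved);
    return result;
}
```

    A tree walker built this way keeps one descriptor per directory level and
    never has to re-resolve a full path that another process might be changing.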
     

15 Jan, 2006

2 commits

  • Mark a few VFS functions as mandatory inline (based on Al Viro's request);
    these must be inline due to stack usage issues during a recursive loop that
    happens during the recursive symlink resolution (symlink to a symlink to a
    symlink ..)

    This patch at this point does not change behavior and is for documentation
    purposes only (but this changes later in the series)

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Remove the "inline" keyword from a bunch of big functions in the kernel with
    the goal of shrinking it by 30kb to 40kb

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Acked-by: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

12 Jan, 2006

1 commit


10 Jan, 2006

1 commit


09 Jan, 2006

1 commit

  • SUS requires that when truncating a file to the size that it currently
    is:
    truncate and ftruncate should NOT modify ctime or mtime
    O_TRUNC SHOULD modify ctime and mtime.

    Currently mtime and ctime are always modified on most local
    filesystems (side effect of ->truncate) or never modified (on NFS).

    With this patch:
    ATTR_CTIME|ATTR_MTIME are sent with ATTR_SIZE precisely when
    an update of these times is required whether size changes or not
    (via a new argument to do_truncate). This allows NFS to do
    the right thing for O_TRUNC.
    inode_setattr no longer forces ATTR_MTIME|ATTR_CTIME when the ATTR_SIZE
    sets the size to its current value. This allows local filesystems
    to do the right thing for f?truncate.

    Also, the logic in inode_setattr is changed a bit so there are two return
    points. One returns the error from vmtruncate if it failed, the other
    returns 0 (there can be no other failure).

    Finally, if vmtruncate succeeds, and ATTR_SIZE is the only change
    requested, we now fall-through and mark_inode_dirty. If a filesystem did
    not have a ->truncate function, then vmtruncate will have changed i_size,
    without marking the inode as 'dirty', and I think this is wrong.

    Signed-off-by: Neil Brown
    Cc: Christoph Hellwig
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
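
    The timestamp rule described in this message can be written down as a
    small decision function. This is only a model: the MODEL_* flag values and
    function names are hypothetical, and it assumes that a truncate which
    really changes the size still updates the times, as the message says
    happens through ->truncate on local filesystems.

```c
#include <assert.h>
#include <stdbool.h>

enum model_trunc_kind { MODEL_TRUNCATE, MODEL_OPEN_O_TRUNC };

#define MODEL_ATTR_SIZE  0x1u
#define MODEL_ATTR_MTIME 0x2u
#define MODEL_ATTR_CTIME 0x4u

/* Attributes the caller sends alongside the new size. */
unsigned model_truncate_attrs(enum model_trunc_kind kind)
{
    unsigned attrs = MODEL_ATTR_SIZE;

    /* O_TRUNC must update the times whether or not the size changes. */
    if (kind == MODEL_OPEN_O_TRUNC)
        attrs |= MODEL_ATTR_MTIME | MODEL_ATTR_CTIME;
    return attrs;
}

/* Whether mtime/ctime end up updated under the rules in the message. */
bool model_times_updated(enum model_trunc_kind kind, long old_size,
                         long new_size)
{
    unsigned attrs;

    assert(old_size >= 0 && new_size >= 0);
    attrs = model_truncate_attrs(kind);
    if (attrs & (MODEL_ATTR_MTIME | MODEL_ATTR_CTIME))
        return true;
    /* truncate/ftruncate: only an actual size change touches the times. */
    return old_size != new_size;
}
```

    In this model, truncating a 100-byte file to 100 bytes leaves the times
    alone, while opening it with O_TRUNC always updates them.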
     

09 Nov, 2005

4 commits

  • This patch makes the needlessly global function path_lookup_create()
    static.

    Signed-off-by: Adrian Bunk
    Acked-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • ->permission and ->lookup have a struct nameidata * argument these days to
    pass down lookup intents. Unfortunately some callers of lookup_hash don't
    actually pass this one down. For lookup_one_len() we don't have a struct
    nameidata to pass down, but as this function is a library function only
    used by filesystem code this is an acceptable limitation. All other
    callers should pass down the nameidata, so this patch changes the
    lookup_hash interface to only take a struct nameidata argument and derives
    the other two arguments to __lookup_hash from it. All callers already have
    the nameidata argument available so this is not a problem.

    At the same time I'd like to deprecate the lookup_hash interface as there
    are better exported interfaces for filesystem usage. Before it can
    actually be removed I need to fix up rpc_pipefs.

    Signed-off-by: Christoph Hellwig
    Cc: Ram Pai
    Cc: Jeff Mahoney
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • A few more callers of permission() just want to check for a different access
    pattern on an already open file. This patch adds a wrapper for permission()
    that takes a file in preparation of per-mount read-only support and to clean
    up the callers a little. The helper is not intended for new code; everything
    without the interface set in stone should use vfs_permission().

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Most permission() calls have a struct nameidata * available. This helper
    takes that as an argument and thus makes sure we pass it down for lookup
    intents and prepares for per-mount read-only support where we need a struct
    vfsmount for checking whether a file is writeable.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

07 Nov, 2005

1 commit


31 Oct, 2005

1 commit


19 Oct, 2005

2 commits


07 Oct, 2005

1 commit

  • The nameidata "last.name" is always allocated with "__getname()", and
    should always be free'd with "__putname()".

    Using "putname()" without the underscores will leak memory, because the
    allocation will have been hidden from the AUDITSYSCALL code.

    Arguably the real bug is that the AUDITSYSCALL code is really broken,
    but in the meantime this fixes the problem people see.

    Reported by Robert Derr, patch by Rick Lindsley.

    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

14 Sep, 2005

1 commit


10 Sep, 2005

2 commits


08 Sep, 2005

1 commit

  • Extract common code into inline functions to make reading easier.

    Signed-off-by: Miklos Szeredi
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

27 Aug, 2005

1 commit


20 Aug, 2005

1 commit

  • This bug could cause oopses and page state corruption, because ncpfs
    used the generic page-cache symlink handling functions. But those
    functions only work if the page cache is guaranteed to be "stable", i.e. a
    page that was installed when the symlink walk was started has to still
    be installed in the page cache at the end of the walk.

    We could have fixed ncpfs to not use the generic helper routines, but it
    is in many ways much cleaner to instead improve on the symlink walking
    helper routines so that they don't require that absolute stability.

    We do this by allowing "follow_link()" to return an error-pointer as a
    cookie, which is fed back to the cleanup "put_link()" routine. This
    also simplifies NFS symlink handling.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
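
    The cookie idea can be sketched outside the kernel: the follow step hands
    back an opaque pointer to whatever it pinned, and the cleanup step
    receives that same pointer instead of looking the resource up again. The
    model_* names below are hypothetical, and a heap copy of the link body
    stands in for the pinned page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stands in for the walk state that points at the link body. */
struct model_link_nd { const char *link; };

/* Number of cookies handed out and not yet released. */
int model_live_cookies;

/*
 * Follow step: make the link body available through nd and return a
 * cookie that owns it.  Returns NULL if the allocation fails.
 */
void *model_follow_link(struct model_link_nd *nd, const char *target)
{
    char *copy;

    assert(nd != NULL && target != NULL);
    copy = malloc(strlen(target) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, target);
    nd->link = copy;
    model_live_cookies++;
    return copy;
}

/* Cleanup step: release exactly what the cookie refers to. */
void model_put_link(struct model_link_nd *nd, void *cookie)
{
    assert(nd != NULL);
    if (cookie == NULL)
        return;
    nd->link = NULL;
    free(cookie);
    model_live_cookies--;
}
```

    Because the cleanup uses the cookie rather than a second cache lookup, it
    no longer matters in this model whether the cache entry is still installed
    when the walk finishes.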
     

17 Aug, 2005

1 commit