04 Jan, 2012

1 commit


19 Jul, 2011

2 commits


29 May, 2011

1 commit

  • Some recent benchmarking on btrfs showed that a major scaling bottleneck
    on large systems on btrfs is currently the xattr lookup on every write.

    Why xattr lookup on every write I hear you ask?

    write wants to drop suid and security related xattrs that could set o
    capabilities for executables. To do that it currently looks up
    security.capability on EVERY write (even for non executables) to decide
    whether to drop it or not.

    In btrfs this causes an additional tree walk, hitting some per file system
    locks and quite bad scalability. In a simple read workload on a 8S
    system I saw over 90% CPU time in spinlocks related to that.

    Chris Mason tells me this is also a problem in ext4, where it hits
    the global mbcache lock.

    This patch adds a simple per inode to avoid this problem. We only
    do the lookup once per file and then if there is no xattr cache
    the decision. All xattr changes clear the flag.

    I also used the same flag to avoid the suid check, although
    that one is pretty cheap.

    A file system can also set this flag when it creates the inode,
    if it has a cheap way to do so. This is done for some common file systems
    in followon patches.

    With this patch a major part of the lock contention disappears
    for btrfs. Some testing on smaller systems didn't show significant
    performance changes, but at least it helps the larger systems
    and is generally more efficient.

    v2: Rename is_sgid. add file system helper.
    Cc: chris.mason@oracle.com
    Cc: josef@redhat.com
    Cc: viro@zeniv.linux.org.uk
    Cc: agruen@linbit.com
    Cc: Serge E. Hallyn
    Signed-off-by: Andi Kleen
    Signed-off-by: Al Viro

    Andi Kleen
     

27 May, 2011

1 commit

  • Return -ENODATA when trying to read a user.* attribute which cannot
    exist: user space otherwise does not have a reasonable way to
    distinguish between non-existent and inaccessible attributes.

    Likewise, return -ENODATA when an unprivileged process tries to read a
    trusted.* attribute: to unprivileged processes, those attributes are
    invisible (listxattr() won't include them).

    Related to this bug report: https://bugzilla.redhat.com/660613

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

21 Apr, 2011

1 commit

  • For some reason generic_setxattr() did not pass flags (XATTR_CREATE,
    XATTR_REPLACE) to the filesystem specific helper. This caused that
    setxattr(2) syscall just ignored these flags.

    Fix the bug by passing flags correctly.

    Signed-off-by: Jan Kara
    Acked-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Jan Kara
     

24 Mar, 2011

1 commit


22 May, 2010

1 commit

  • The entries in xattr handler table should be immutable (ie const)
    like other operation tables.

    Later patches convert common filesystems. Uncoverted filesystems
    will still work, but will generate a compiler warning.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Al Viro

    Stephen Hemminger
     

17 Dec, 2009

1 commit

  • Add a flags argument to struct xattr_handler and pass it to all xattr
    handler methods. This allows using the same methods for multiple
    handlers, e.g. for the ACL methods which perform exactly the same action
    for the access and default ACLs, just using a different underlying
    attribute. With a little more groundwork it'll also allow sharing the
    methods for the regular user/trusted/secure handlers in extN, ocfs2 and
    jffs2 like it's already done for xfs in this patch.

    Also change the inode argument to the handlers to a dentry to allow
    using the handlers mechnism for filesystems that require it later,
    e.g. cifs.

    [with GFS2 bits updated by Steven Whitehouse ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
     

10 Sep, 2009

1 commit


12 Jun, 2009

1 commit

  • This patch speeds up lmbench lat_mmap test by about another 2% after the
    first patch.

    Before:
    avg = 462.286
    std = 5.46106

    After:
    avg = 453.12
    std = 9.58257

    (50 runs of each, stddev gives a reasonable confidence)

    It does this by introducing mnt_clone_write, which avoids some heavyweight
    operations of mnt_want_write if called on a vfsmount which we know already
    has a write count; and mnt_want_write_file, which can call mnt_clone_write
    if the file is open for write.

    After these two patches, mnt_want_write and mnt_drop_write go from 7% on
    the profile down to 1.3% (including mnt_clone_write).

    [AV: mnt_want_write_file() should take file alone and derive mnt from it;
    not only all callers have that form, but that's the only mnt about which
    we know that it's already held for write if file is opened for write]

    Cc: Dave Hansen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     

21 Apr, 2009

1 commit


14 Jan, 2009

3 commits


06 Jan, 2009

1 commit

  • We used to have rather schizophrenic set of checks for NULL ->i_op even
    though it had been eliminated years ago. You'd need to go out of your
    way to set it to NULL explicitly _and_ a bunch of code would die on
    such inodes anyway. After killing two remaining places that still
    did that bogosity, all that crap can go away.

    Signed-off-by: Al Viro

    Al Viro
     

27 Jul, 2008

2 commits


29 Apr, 2008

1 commit


23 Apr, 2008

1 commit


19 Apr, 2008

1 commit


15 Feb, 2008

2 commits

  • * Add path_put() functions for releasing a reference to the dentry and
    vfsmount of a struct path in the right order

    * Switch from path_release(nd) to path_put(&nd->path)

    * Rename dput_path() to path_put_conditional()

    [akpm@linux-foundation.org: fix cifs]
    Signed-off-by: Jan Blunck
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Christoph Hellwig
    Cc:
    Cc: Al Viro
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     
  • This is the central patch of a cleanup series. In most cases there is no good
    reason why someone would want to use a dentry for itself. This series reflects
    that fact and embeds a struct path into nameidata.

    Together with the other patches of this series
    - it enforced the correct order of getting/releasing the reference count on
    pairs
    - it prepares the VFS for stacking support since it is essential to have a
    struct path in every place where the stack can be traversed
    - it reduces the overall code size:

    without patch series:
    text data bss dec hex filename
    5321639 858418 715768 6895825 6938d1 vmlinux

    with patch series:
    text data bss dec hex filename
    5320026 858418 715768 6894212 693284 vmlinux

    This patch:

    Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix cifs]
    [akpm@linux-foundation.org: fix smack]
    Signed-off-by: Jan Blunck
    Signed-off-by: Andreas Gruenbacher
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Casey Schaufler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     

06 Feb, 2008

2 commits

  • Originally vfs_getxattr would pull the security xattr variable using
    the inode getxattr handle and then proceed to clobber it with a subsequent call
    to the LSM.

    This patch reorders the two operations such that when the xattr requested is
    in the security namespace it first attempts to grab the value from the LSM
    directly.

    If it fails to obtain the value because there is no module present or the
    module does not support the operation it will fall back to using the inode
    getxattr operation.

    In the event that both are inaccessible it returns EOPNOTSUPP.

    Signed-off-by: David P. Quigley
    Cc: Stephen Smalley
    Cc: Chris Wright
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Casey Schaufler
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David P. Quigley
     
  • This patch modifies the interface to inode_getsecurity to have the function
    return a buffer containing the security blob and its length via parameters
    instead of relying on the calling function to give it an appropriately sized
    buffer.

    Security blobs obtained with this function should be freed using the
    release_secctx LSM hook. This alleviates the problem of the caller having to
    guess a length and preallocate a buffer for this function allowing it to be
    used elsewhere for Labeled NFS.

    The patch also removed the unused err parameter. The conversion is similar to
    the one performed by Al Viro for the security_getprocattr hook.

    Signed-off-by: David P. Quigley
    Cc: Stephen Smalley
    Cc: Chris Wright
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Casey Schaufler
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David P. Quigley
     

21 Oct, 2007

1 commit


18 Jul, 2007

1 commit

  • Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
    users to it. This is done because we want to avoid bugs in the future
    where we check for only effective fsuid of the current task against a
    file's owning uid, without simultaneously checking for CAP_FOWNER as
    well, thus violating its semantics.
    [ XFS uses special macros and structures, and in general looked ...
    untouchable, so we leave it alone -- but it has been looked over. ]

    The (current->fsuid != inode->i_uid) check in generic_permission() and
    exec_permission_lite() is left alone, because those operations are
    covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
    falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.

    Signed-off-by: Satyam Sharma
    Cc: Al Viro
    Acked-by: Serge E. Hallyn
    Signed-off-by: Linus Torvalds

    Satyam Sharma
     

11 May, 2007

1 commit


09 May, 2007

1 commit


09 Dec, 2006

1 commit

  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #define's to make the transition easier for users of
    the f_dentry and f_vfsmnt.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

04 Nov, 2006

1 commit

  • The user.* extended attributes are only allowed on regular files and
    directories. Sticky directories further restrict write access to the owner
    and privileged users. (See the attr(5) man page for an explanation.)

    The original check in ext2/ext3 when user.* xattrs were merged was more
    restrictive than intended, and when the xattr permission checks were moved
    into the VFS, read access to user.* attributes on sticky directores ended
    up being denied in addition.

    Originally-from: Gerard Neil
    Signed-off-by: Andreas Gruenbacher
    Cc: Dave Kleikamp
    Cc: Jan Engelhardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Gruenbacher
     

10 Oct, 2006

1 commit

  • This patch moves code out of fs/xattr.c:listxattr into a new function -
    vfs_listxattr. The code for vfs_listxattr was originally submitted by Bill
    Nottingham to Unionfs.

    Sorry about that. The reason for this submission is to make the
    listxattr code in fs/xattr.c a little cleaner (as well as to clean up
    some code in Unionfs.)

    Currently, Unionfs has vfs_listxattr defined in its code. I think
    that's very ugly, and I'd like to see it (re)moved. The logical place
    to put it, is along side of all the other vfs_*xattr functions.

    Overall, I think this patch is benefitial for both kernel.org kernel and
    Unionfs.

    Signed-off-by: Josef "Jeff" Sipek
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Bill Nottingham
     

20 Jun, 2006

1 commit

  • When an audit event involves changes to a directory entry, include
    a PATH record for the directory itself. A few other notable changes:

    - fixed audit_inode_child() hooks in fsnotify_move()
    - removed unused flags arg from audit_inode()
    - added audit log routines for logging a portion of a string

    Here's some sample output.

    before patch:
    type=SYSCALL msg=audit(1149821605.320:26): arch=40000003 syscall=39 success=yes exit=0 a0=bf8d3c7c a1=1ff a2=804e1b8 a3=bf8d3c7c items=1 ppid=739 pid=800 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 comm="mkdir" exe="/bin/mkdir" subj=root:system_r:unconfined_t:s0-s0:c0.c255
    type=CWD msg=audit(1149821605.320:26): cwd="/root"
    type=PATH msg=audit(1149821605.320:26): item=0 name="foo" parent=164068 inode=164010 dev=03:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_t:s0

    after patch:
    type=SYSCALL msg=audit(1149822032.332:24): arch=40000003 syscall=39 success=yes exit=0 a0=bfdd9c7c a1=1ff a2=804e1b8 a3=bfdd9c7c items=2 ppid=714 pid=777 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 comm="mkdir" exe="/bin/mkdir" subj=root:system_r:unconfined_t:s0-s0:c0.c255
    type=CWD msg=audit(1149822032.332:24): cwd="/root"
    type=PATH msg=audit(1149822032.332:24): item=0 name="/root" inode=164068 dev=03:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_dir_t:s0
    type=PATH msg=audit(1149822032.332:24): item=1 name="foo" inode=164010 dev=03:00 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=root:object_r:user_home_t:s0

    Signed-off-by: Amy Griffis
    Signed-off-by: Al Viro

    Amy Griffis
     

21 Mar, 2006

1 commit

  • This patch augments the collection of inode info during syscall
    processing. It represents part of the functionality that was provided
    by the auditfs patch included in RHEL4.

    Specifically, it:

    - Collects information for target inodes created or removed during
    syscalls. Previous code only collects information for the target
    inode's parent.

    - Adds the audit_inode() hook to syscalls that operate on a file
    descriptor (e.g. fchown), enabling audit to do inode filtering for
    these calls.

    - Modifies filtering code to check audit context for either an inode #
    or a parent inode # matching a given rule.

    - Modifies logging to provide inode # for both parent and child.

    - Protect debug info from NULL audit_names.name.

    [AV: folded a later typo fix from the same author]

    Signed-off-by: Amy Griffis
    Signed-off-by: David Woodhouse
    Signed-off-by: Al Viro

    Amy Griffis
     

11 Jan, 2006

2 commits

  • )

    From: Christoph Hellwig

    The xattr code has rather complex permission checks because the rules are very
    different for different attribute namespaces. This patch moves as much as we
    can into the generic code. Currently all the major disk based filesystems
    duplicate these checks, while many minor filesystems or network filesystems
    lack some or all of them.

    To do this we need defines for the extended attribute names in common code, I
    moved them up from JFS which had the nicest defintions.

    Signed-off-by: Christoph Hellwig
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@osdl.org
     
  • Add vfs_getxattr, vfs_setxattr and vfs_removexattr helpers for common checks
    around invocation of the xattr methods. NFSD already was missing some of the
    checks and there will be more soon.

    Signed-off-by: Christoph Hellwig
    Cc: James Morris

    (James, I haven't touched selinux yet because it's doing various odd things
    and I'm not sure how it would interact with the security attribute fallbacks
    you added. Could you investigate whether it could use vfs_getxattr or if not
    add a __vfs_getxattr helper to share the bits it is fine with?)

    For NFSv4: instead of just converting it add an nfsd_getxattr helper for the
    code shared by NFSv2/3 and NFSv4 ACLs. In fact that code isn't even
    NFS-specific, but I'll wait for more users to pop up first before moving it to
    common code.

    Signed-off-by: Christoph Hellwig
    Acked-by: Dave Kleikamp
    Signed-off-by: Adrian Bunk
    Signed-off-by: Neil Brown
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

10 Jan, 2006

1 commit


13 Dec, 2005

1 commit

  • Commit f549d6c18c0e8e6cf1bf0e7a47acc1daf7e2cec1 introduced a generic
    fallback for security xattrs, but appears to include a subtle bug.

    Gentoo users with kernels with selinux compiled in, and coreutils compiled
    with acl support, noticed that they could not copy files on tmpfs using
    'cp'.

    cp (compiled with acl support) copies the file, lists the extended
    attributes on the old file, copies them all to the new file, and then
    exits. However the listxattr() calls were failing with this odd behaviour:

    llistxattr("a.out", (nil), 0) = 17
    llistxattr("a.out", 0x7fffff8c6cb0, 17) = -1 ERANGE (Numerical result out of
    range)

    I believe this is a simple problem in the logic used to check the buffer
    sizes; if the user sends a buffer the exact size of the data, then its ok
    :)

    This change solves the problem.
    More info can be found at http://bugs.gentoo.org/113138

    Signed-off-by: Daniel Drake
    Acked-by: James Morris
    Acked-by: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Drake
     

07 Nov, 2005

1 commit

  • This is the fs/ part of the big kfree cleanup patch.

    Remove pointless checks for NULL prior to calling kfree() in fs/.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     

31 Oct, 2005

1 commit

  • This patch allows SELinux to canonicalize the value returned from
    getxattr() via the security_inode_getsecurity() hook, which is called after
    the fs level getxattr() function.

    The purpose of this is to allow the in-core security context for an inode
    to override the on-disk value. This could happen in cases such as
    upgrading a system to a different labeling form (e.g. standard SELinux to
    MLS) without needing to do a full relabel of the filesystem.

    In such cases, we want getxattr() to return the canonical security context
    that the kernel is using rather than what is stored on disk.

    The implementation hooks into the inode_getsecurity(), adding another
    parameter to indicate the result of the preceding fs-level getxattr() call,
    so that SELinux knows whether to compare a value obtained from disk with
    the kernel value.

    We also now allow getxattr() to work for mountpoint labeled filesystems
    (i.e. mount with option context=foo_t), as we are able to return the
    kernel value to the user.

    Signed-off-by: James Morris
    Signed-off-by: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Morris