21 Dec, 2012

8 commits


19 Oct, 2012

1 commit

  • Commit 38f38657444d ("xattr: extract simple_xattr code from tmpfs") moved
    some code from tmpfs but introduced a subtle bug along the way.

    If the name passed to simple_xattr_remove() does not exist in the list of
    xattrs, then it is possible to call kfree(new_xattr) when new_xattr is
    actually initialized to itself on the stack via uninitialized_var().

    This causes a BUG() since the memory was not allocated via the slab
    allocator and was not bypassed through to the page allocator because it
    was too large.

    Initialize the local variable to NULL so the kfree() never takes place.

    Reported-by: Fengguang Wu
    Signed-off-by: David Rientjes
    Acked-by: Hugh Dickins
    Acked-by: Aristeu Rozanski
    Signed-off-by: Linus Torvalds

    David Rientjes
     

12 Oct, 2012

1 commit

  • Currently, this gets set mostly by happenstance when we call into
    audit_inode_child. While that might be a little more efficient, it seems
    wrong. If the syscall ends up failing before audit_inode_child ever gets
    called, then you'll have an audit_names record that shows the full path
    but has the parent inode info attached.

    Fix this by passing in a parent flag when we call audit_inode that gets
    set to the value of LOOKUP_PARENT. We can then fix up the pathname for
    the audit entry correctly from the get-go.

    While we're at it, clean up the no-op macro for audit_inode in the
    !CONFIG_AUDITSYSCALL case.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     

03 Oct, 2012

3 commits

  • Pull security subsystem updates from James Morris:
    "Highlights:

    - Integrity: add local fs integrity verification to detect offline
    attacks
    - Integrity: add digital signature verification
    - Simple stacking of Yama with other LSMs (per LSS discussions)
    - IBM vTPM support on ppc64
    - Add new driver for Infineon I2C TIS TPM
    - Smack: add rule revocation for subject labels"

    Fixed conflicts with the user namespace support in kernel/auditsc.c and
    security/integrity/ima/ima_policy.c.

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits)
    Documentation: Update git repository URL for Smack userland tools
    ima: change flags container data type
    Smack: setprocattr memory leak fix
    Smack: implement revoking all rules for a subject label
    Smack: remove task_wait() hook.
    ima: audit log hashes
    ima: generic IMA action flag handling
    ima: rename ima_must_appraise_or_measure
    audit: export audit_log_task_info
    tpm: fix tpm_acpi sparse warning on different address spaces
    samples/seccomp: fix 31 bit build on s390
    ima: digital signature verification support
    ima: add support for different security.ima data types
    ima: add ima_inode_setxattr/removexattr function and calls
    ima: add inode_post_setattr call
    ima: replace iint spinblock with rwlock/read_lock
    ima: allocating iint improvements
    ima: add appraise action keywords and default rules
    ima: integrity appraisal extension
    vfs: move ima_file_free before releasing the file
    ...

    Linus Torvalds
     
  • Pull vfs update from Al Viro:

    - big one - consolidation of descriptor-related logics; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c spiltoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values are far down into
    subsystems and filesystems as reasonable. Leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network.
    Letting compile type incompatible compile errors (present when user
    namespaces are enabled) guide me to find the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privlige checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
    root in a user names to do those things that today we only forbid to
    non-root users because it will confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/git logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
     

27 Sep, 2012

1 commit


18 Sep, 2012

1 commit

  • - In setxattr if we are setting a posix acl convert uids and gids from
    the current user namespace into the initial user namespace, before
    the xattrs are passed to the underlying filesystem.

    Untranslatable uids and gids are represented as -1 which
    posix_acl_from_xattr will represent as INVALID_UID or INVALID_GID.
    posix_acl_valid will fail if an acl from userspace has any
    INVALID_UID or INVALID_GID values. In net this guarantees that
    untranslatable posix acls will not be stored by filesystems.

    - In getxattr if we are reading a posix acl convert uids and gids from
    the initial user namespace into the current user namespace.

    Uids and gids that can not be tranlsated into the current user namespace
    will be represented as -1.

    - Replace e_id in struct posix_acl_entry with an anymouns union of
    e_uid and e_gid. For the short term retain the e_id field
    until all of the users are converted.

    - Don't set struct posix_acl.e_id in the cases where the acl type
    does not use e_id. Greatly reducing the use of ACL_UNDEFINED_ID.

    - Rework the ordering checks in posix_acl_valid so that I use kuid_t
    and kgid_t types throughout the code, and so that I don't need
    arithmetic on uid and gid types.

    Cc: Theodore Tso
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Cc: Jan Kara
    Cc: Al Viro
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

14 Sep, 2012

2 commits

  • new_xattr in __simple_xattr_set() is only initialized with a valid
    pointer if value is not NULL, which only happens if this function is
    called directly with the intention to remove an existing extended
    attribute. Even being safe to be this way, smatch warns about possible
    NULL dereference. Dan Carpenter suggested using uninitialized_var()
    which will make both gcc and smatch happy.

    Cc: Fengguang Wu
    Cc: Dan Carpenter
    Signed-off-by: Aristeu Rozanski
    Signed-off-by: Tejun Heo

    Aristeu Rozanski
     
  • v2: add function documentation instead of adding a separate file under
    Documentation/

    tj: Updated comment a bit and rolled in Randy's suggestions.

    Cc: Li Zefan
    Cc: Tejun Heo
    Cc: Hugh Dickins
    Cc: Hillf Danton
    Cc: Lennart Poettering
    Cc: Randy Dunlap
    Signed-off-by: Aristeu Rozanski
    Signed-off-by: Tejun Heo

    Aristeu Rozanski
     

08 Sep, 2012

1 commit


25 Aug, 2012

1 commit

  • Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.

    $ size vmlinux.o
    text data bss dec hex filename
    4658782 880729 5195032 10734543 a3cbcf vmlinux.o
    $ size vmlinux.o
    text data bss dec hex filename
    4658957 880729 5195032 10734718 a3cc7e vmlinux.o

    v7:
    - checkpatch warnings fixed
    - Implement the changes requested by Hugh Dickins:
    - make simple_xattrs_init and simple_xattrs_free inline
    - get rid of locking and list reinitialization in simple_xattrs_free,
    they're not needed
    v6:
    - no changes
    v5:
    - no changes
    v4:
    - move simple_xattrs_free() to fs/xattr.c
    v3:
    - in kmem_xattrs_free(), reinitialize the list
    - use simple_xattr_* prefix
    - introduce simple_xattr_add() to prevent direct list usage

    Original-patch-by: Li Zefan
    Cc: Li Zefan
    Cc: Hillf Danton
    Cc: Lennart Poettering
    Acked-by: Hugh Dickins
    Signed-off-by: Li Zefan
    Signed-off-by: Aristeu Rozanski
    Signed-off-by: Tejun Heo

    Aristeu Rozanski
     

31 Jul, 2012

1 commit


30 May, 2012

1 commit


06 Apr, 2012

3 commits


29 Feb, 2012

1 commit


04 Jan, 2012

1 commit


19 Jul, 2011

2 commits


29 May, 2011

1 commit

  • Some recent benchmarking on btrfs showed that a major scaling bottleneck
    on large systems on btrfs is currently the xattr lookup on every write.

    Why xattr lookup on every write I hear you ask?

    write wants to drop suid and security related xattrs that could set o
    capabilities for executables. To do that it currently looks up
    security.capability on EVERY write (even for non executables) to decide
    whether to drop it or not.

    In btrfs this causes an additional tree walk, hitting some per file system
    locks and quite bad scalability. In a simple read workload on a 8S
    system I saw over 90% CPU time in spinlocks related to that.

    Chris Mason tells me this is also a problem in ext4, where it hits
    the global mbcache lock.

    This patch adds a simple per inode to avoid this problem. We only
    do the lookup once per file and then if there is no xattr cache
    the decision. All xattr changes clear the flag.

    I also used the same flag to avoid the suid check, although
    that one is pretty cheap.

    A file system can also set this flag when it creates the inode,
    if it has a cheap way to do so. This is done for some common file systems
    in followon patches.

    With this patch a major part of the lock contention disappears
    for btrfs. Some testing on smaller systems didn't show significant
    performance changes, but at least it helps the larger systems
    and is generally more efficient.

    v2: Rename is_sgid. add file system helper.
    Cc: chris.mason@oracle.com
    Cc: josef@redhat.com
    Cc: viro@zeniv.linux.org.uk
    Cc: agruen@linbit.com
    Cc: Serge E. Hallyn
    Signed-off-by: Andi Kleen
    Signed-off-by: Al Viro

    Andi Kleen
     

27 May, 2011

1 commit

  • Return -ENODATA when trying to read a user.* attribute which cannot
    exist: user space otherwise does not have a reasonable way to
    distinguish between non-existent and inaccessible attributes.

    Likewise, return -ENODATA when an unprivileged process tries to read a
    trusted.* attribute: to unprivileged processes, those attributes are
    invisible (listxattr() won't include them).

    Related to this bug report: https://bugzilla.redhat.com/660613

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

21 Apr, 2011

1 commit

  • For some reason generic_setxattr() did not pass flags (XATTR_CREATE,
    XATTR_REPLACE) to the filesystem specific helper. This caused that
    setxattr(2) syscall just ignored these flags.

    Fix the bug by passing flags correctly.

    Signed-off-by: Jan Kara
    Acked-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Jan Kara
     

24 Mar, 2011

1 commit


22 May, 2010

1 commit

  • The entries in xattr handler table should be immutable (ie const)
    like other operation tables.

    Later patches convert common filesystems. Uncoverted filesystems
    will still work, but will generate a compiler warning.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Al Viro

    Stephen Hemminger
     

17 Dec, 2009

1 commit

  • Add a flags argument to struct xattr_handler and pass it to all xattr
    handler methods. This allows using the same methods for multiple
    handlers, e.g. for the ACL methods which perform exactly the same action
    for the access and default ACLs, just using a different underlying
    attribute. With a little more groundwork it'll also allow sharing the
    methods for the regular user/trusted/secure handlers in extN, ocfs2 and
    jffs2 like it's already done for xfs in this patch.

    Also change the inode argument to the handlers to a dentry to allow
    using the handlers mechnism for filesystems that require it later,
    e.g. cifs.

    [with GFS2 bits updated by Steven Whitehouse ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
     

10 Sep, 2009

1 commit


12 Jun, 2009

1 commit

  • This patch speeds up lmbench lat_mmap test by about another 2% after the
    first patch.

    Before:
    avg = 462.286
    std = 5.46106

    After:
    avg = 453.12
    std = 9.58257

    (50 runs of each, stddev gives a reasonable confidence)

    It does this by introducing mnt_clone_write, which avoids some heavyweight
    operations of mnt_want_write if called on a vfsmount which we know already
    has a write count; and mnt_want_write_file, which can call mnt_clone_write
    if the file is open for write.

    After these two patches, mnt_want_write and mnt_drop_write go from 7% on
    the profile down to 1.3% (including mnt_clone_write).

    [AV: mnt_want_write_file() should take file alone and derive mnt from it;
    not only all callers have that form, but that's the only mnt about which
    we know that it's already held for write if file is opened for write]

    Cc: Dave Hansen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     

21 Apr, 2009

1 commit


14 Jan, 2009

3 commits