22 Sep, 2016

1 commit

  • When file permissions are modified via chmod(2) and the user is not in
    the owning group or capable of CAP_FSETID, the setgid bit is cleared in
    inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file
    permissions as well as the new ACL, but doesn't clear the setgid bit in
    a similar way; this allows to bypass the check in chmod(2). Fix that.

    References: CVE-2016-7097
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: Jan Kara
    Signed-off-by: Andreas Gruenbacher

    Jan Kara
     

31 Mar, 2016

1 commit

  • When get_acl() is called for an inode whose ACL is not cached yet, the
    get_acl inode operation is called to fetch the ACL from the filesystem.
    The inode operation is responsible for updating the cached acl with
    set_cached_acl(). This is done without locking at the VFS level, so
    another task can call set_cached_acl() or forget_cached_acl() before the
    get_acl inode operation gets to calling set_cached_acl(), and then
    get_acl's call to set_cached_acl() results in caching an outdate ACL.

    Prevent this from happening by setting the cached ACL pointer to a
    task-specific sentinel value before calling the get_acl inode operation.
    Move the responsibility for updating the cached ACL from the get_acl
    inode operations to get_acl(). There, only set the cached ACL if the
    sentinel value hasn't changed.

    The sentinel values are chosen to have odd values. Likewise, the value
    of ACL_NOT_CACHED is odd. In contrast, ACL object pointers always have
    an even value (ACLs are aligned in memory). This allows to distinguish
    uncached ACLs values from ACL objects.

    In addition, switch from guarding inode->i_acl and inode->i_default_acl
    upates by the inode->i_lock spinlock to using xchg() and cmpxchg().

    Filesystems that do not want ACLs returned from their get_acl inode
    operations to be cached must call forget_cached_acl() to prevent the VFS
    from doing so.

    (Patch written by Al Viro and Andreas Gruenbacher.)

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

03 Apr, 2015

1 commit


26 Jan, 2014

3 commits


10 Feb, 2013

1 commit

  • Operations which modify extended attributes may need extra journal
    credits if inline data is used, since there is a chance that some
    extended attributes may need to get pushed to an external attribute
    block.

    Changes to reflect this was made in xattr.c, but they were missed in
    fs/ext4/acl.c. To fix this, abstract the calculation of the number of
    credits needed for xattr operations to an inline function defined in
    ext4_jbd2.h, and use it in acl.c and xattr.c.

    Also move the function declarations used in inline.c from xattr.h
    (where they are non-obviously hidden, and caused problems since
    ext4_jbd2.h needs to use the function ext4_has_inline_data), and move
    them to ext4.h.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Tao Ma
    Reviewed-by: Jan Kara

    Theodore Ts'o
     

09 Feb, 2013

1 commit

  • So we can better understand what bits of ext4 are responsible for
    long-running jbd2 handles, use jbd2__journal_start() so we can pass
    context information for logging purposes.

    The recommended way for finding the longer-running handles is:

    T=/sys/kernel/debug/tracing
    EVENT=$T/events/jbd2/jbd2_handle_stats
    echo "interval > 5" > $EVENT/filter
    echo 1 > $EVENT/enable

    ./run-my-fs-benchmark

    cat $T/trace > /tmp/problem-handles

    This will list handles that were active for longer than 20ms. Having
    longer-running handles is bad, because a commit started at the wrong
    time could stall for those 20+ milliseconds, which could delay an
    fsync() or an O_SYNC operation. Here is an example line from the
    trace file describing a handle which lived on for 311 jiffies, or over
    1.2 seconds:

    postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
    tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
    dirtied_blocks 0

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

09 Nov, 2012

1 commit


18 Sep, 2012

2 commits

  • Convert ext2, ext3, and ext4 to fully support the posix acl changes,
    using e_uid e_gid instead e_id.

    Enabled building with posix acls enabled, all filesystems supporting
    user namespaces, now also support posix acls when user namespaces are enabled.

    Cc: Theodore Tso
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Cc: Jan Kara
    Cc: Al Viro
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     
  • - Pass the user namespace the uid and gid values in the xattr are stored
    in into posix_acl_from_xattr.

    - Pass the user namespace kuid and kgid values should be converted into
    when storing uid and gid values in an xattr in posix_acl_to_xattr.

    - Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
    pass in &init_user_ns.

    In the short term this change is not strictly needed but it makes the
    code clearer. In the longer term this change is necessary to be able to
    mount filesystems outside of the initial user namespace that natively
    store posix acls in the linux xattr format.

    Cc: Theodore Tso
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Cc: Jan Kara
    Cc: Al Viro
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

01 Aug, 2011

2 commits


26 Jul, 2011

4 commits

  • Replace the ->check_acl method with a ->get_acl method that simply reads an
    ACL from disk after having a cache miss. This means we can replace the ACL
    checking boilerplate code with a single implementation in namei.c.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • new helper: posix_acl_create(&acl, gfp, mode_p). Replaces acl with
    modified clone, on failure releases acl and replaces with NULL.
    Returns 0 or -ve on error. All callers of posix_acl_create_masq()
    switched.

    Signed-off-by: Al Viro

    Al Viro
     
  • new helper: posix_acl_chmod(&acl, gfp, mode). Replaces acl with modified
    clone or with NULL if that has failed; returns 0 or -ve on error. All
    callers of posix_acl_chmod_masq() switched to that - they'd been doing
    exactly the same thing.

    Signed-off-by: Al Viro

    Al Viro
     
  • This moves logic for checking the cached ACL values from low-level
    filesystems into generic code. The end result is a streamlined ACL
    check that doesn't need to load the inode->i_op->check_acl pointer at
    all for the common cached case.

    The filesystems also don't need to check for a non-blocking RCU walk
    case in their acl_check() functions, because that is all handled at a
    VFS layer.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Al Viro

    Linus Torvalds
     

20 Jul, 2011

2 commits


24 Mar, 2011

1 commit


07 Jan, 2011

2 commits


16 Jun, 2010

1 commit


22 May, 2010

1 commit


17 Dec, 2009

1 commit

  • Add a flags argument to struct xattr_handler and pass it to all xattr
    handler methods. This allows using the same methods for multiple
    handlers, e.g. for the ACL methods which perform exactly the same action
    for the access and default ACLs, just using a different underlying
    attribute. With a little more groundwork it'll also allow sharing the
    methods for the regular user/trusted/secure handlers in extN, ocfs2 and
    jffs2 like it's already done for xfs in this patch.

    Also change the inode argument to the handlers to a dentry to allow
    using the handlers mechnism for filesystems that require it later,
    e.g. cifs.

    [with GFS2 bits updated by Steven Whitehouse ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
     

09 Sep, 2009

1 commit


24 Jun, 2009

2 commits


17 Jun, 2009

1 commit

  • If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem
    to do POSIX ACL checks on any files not owned by the caller, and it does
    this for every single pathname component that it looks up.

    That obviously can be pretty expensive if the filesystem isn't careful
    about it, especially with locking. That's doubly sad, since the common
    case tends to be that there are no ACL's associated with the files in
    question.

    ext4 already caches the ACL data so that it doesn't have to look it up
    over and over again, but it does so by taking the inode->i_lock spinlock
    on every lookup. Which is a noticeable overhead even if it's a private
    lock, especially on CPU's where the serialization is expensive (eg Intel
    Netburst aka 'P4').

    For the special case of not actually having any ACL's, all that locking is
    unnecessary. Even if somebody else were to be changing the ACL's on
    another CPU, we simply don't care - if we've seen a NULL ACL, we might as
    well use it.

    So just load the ACL speculatively without any locking, and if it was
    NULL, just use it. If it's non-NULL (either because we had a cached
    entry, or because the cache hasn't been filled in at all), it means that
    we'll need to get the lock and re-load it properly.

    (This commit was ported from a patch originally authored by Linus for
    ext3.)

    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Al Viro

    Theodore Ts'o
     

01 Apr, 2009

1 commit


27 Jul, 2008

2 commits


30 Apr, 2008

2 commits


18 Jul, 2007

1 commit

  • Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
    users to it. This is done because we want to avoid bugs in the future
    where we check for only effective fsuid of the current task against a
    file's owning uid, without simultaneously checking for CAP_FOWNER as
    well, thus violating its semantics.
    [ XFS uses special macros and structures, and in general looked ...
    untouchable, so we leave it alone -- but it has been looked over. ]

    The (current->fsuid != inode->i_uid) check in generic_permission() and
    exec_permission_lite() is left alone, because those operations are
    covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
    falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.

    Signed-off-by: Satyam Sharma
    Cc: Al Viro
    Acked-by: Serge E. Hallyn
    Signed-off-by: Linus Torvalds

    Satyam Sharma
     

12 Oct, 2006

4 commits