02 Jul, 2022

4 commits

  • commit 705191b03d507744c7e097f78d583621c14988ac upstream.

    Last cycle we extended the idmapped mounts infrastructure to support
    idmapped mounts of idmapped filesystems (no such filesystem exists yet).
    Since then, the meaning of an idmapped mount is a mount whose idmapping
    is different from the filesystem's idmapping.

    While doing that work we failed to adapt the acl translation helpers.
    They still assume that checking for the identity mapping is enough. But
    they need to use the no_idmapping() helper instead.

    Note, POSIX ACLs are always translated right at the userspace-kernel
    boundary using the caller's current idmapping and the initial idmapping.
    The order depends on whether we're coming from or going to userspace.
    The filesystem's idmapping doesn't matter at the border.

    Consequently, if a non-idmapped mount is passed we need to make sure to
    always pass the initial idmapping as the mount's idmapping and not the
    filesystem idmapping. Since the filesystem idmapping is irrelevant here,
    passing it would yield invalid ids and prevent setting acls for
    filesystems that are mountable in a userns and support posix acls
    (tmpfs and fuse).

    I verified the regression reported in [1] and verified that this patch
    fixes it. A regression test will be added to xfstests in parallel.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1]
    Fixes: bd303368b776 ("fs: support mapped mounts of mapped filesystems")
    Cc: Seth Forshee
    Cc: Christoph Hellwig
    Cc: # 5.15+
    Signed-off-by: Christian Brauner (Microsoft)
    Signed-off-by: Linus Torvalds
    Signed-off-by: Christian Brauner (Microsoft)
    Signed-off-by: Greg Kroah-Hartman

    Christian Brauner
     
  • commit bd303368b776eead1c29e6cdda82bde7128b82a7 upstream.

    In previous patches we added new and modified existing helpers to handle
    idmapped mounts of filesystems mounted with an idmapping. In this final
    patch we convert all relevant places in the vfs to actually pass the
    filesystem's idmapping into these helpers.

    With this the vfs is in shape to handle idmapped mounts of filesystems
    mounted with an idmapping. Note that this is just the generic
    infrastructure. Actually adding support for idmapped mounts to a
    filesystem mountable with an idmapping is follow-up work.

    In this patch we extend the definition of an idmapped mount from a mount
    that has an idmapping other than the initial idmapping attached to it to
    a mount that has an idmapping attached to it which is not the same as
    the idmapping the filesystem was mounted with.

    As before we do not allow the initial idmapping to be attached to a
    mount. In addition this patch prevents the idmapping the filesystem
    was mounted with from being attached to a mount created based on this
    filesystem.

    This has multiple reasons and advantages. First, attaching the initial
    idmapping or the filesystem's idmapping doesn't make much sense as in
    both cases the values of the i_{g,u}id and other places where k{g,u}ids
    are used do not change. Second, a user that really wants to do this for
    whatever reason can just create a separate dedicated identical idmapping
    to attach to the mount. Third, we can continue to use the initial
    idmapping as an indicator that a mount is not idmapped allowing us to
    continue to keep passing the initial idmapping into the mapping helpers
    to tell them that something isn't an idmapped mount even if the
    filesystem is mounted with an idmapping.

    Link: https://lore.kernel.org/r/20211123114227.3124056-11-brauner@kernel.org (v1)
    Link: https://lore.kernel.org/r/20211130121032.3753852-11-brauner@kernel.org (v2)
    Link: https://lore.kernel.org/r/20211203111707.3901969-11-brauner@kernel.org
    Cc: Seth Forshee
    Cc: Amir Goldstein
    Cc: Christoph Hellwig
    Cc: Al Viro
    CC: linux-fsdevel@vger.kernel.org
    Reviewed-by: Seth Forshee
    Signed-off-by: Christian Brauner
    Signed-off-by: Christian Brauner (Microsoft)
    Signed-off-by: Greg Kroah-Hartman

    Christian Brauner
     
  • commit 4472071331549e911a5abad41aea6e3be855a1a4 upstream.

    In a few places the vfs needs to interact with bare k{g,u}ids directly
    instead of struct inode. These are just a few. In previous patches we
    introduced low-level mapping helpers that are able to support
    filesystems mounted with an idmapping. This patch simply converts the places
    to use these new helpers.

    Link: https://lore.kernel.org/r/20211123114227.3124056-7-brauner@kernel.org (v1)
    Link: https://lore.kernel.org/r/20211130121032.3753852-7-brauner@kernel.org (v2)
    Link: https://lore.kernel.org/r/20211203111707.3901969-7-brauner@kernel.org
    Cc: Seth Forshee
    Cc: Amir Goldstein
    Cc: Christoph Hellwig
    Cc: Al Viro
    CC: linux-fsdevel@vger.kernel.org
    Reviewed-by: Seth Forshee
    Signed-off-by: Christian Brauner
    Signed-off-by: Christian Brauner (Microsoft)
    Signed-off-by: Greg Kroah-Hartman

    Christian Brauner
     
  • commit a793d79ea3e041081cd7cbd8ee43d0b5e4914a2b upstream.

    The low-level mapping helpers were so far crammed into fs.h. They are
    out of place there. The fs.h header should just contain the higher-level
    mapping helpers that interact directly with vfs objects such as struct
    super_block or struct inode and not the bare mapping helpers. Similarly,
    only vfs and specific fs code shall interact with low-level mapping
    helpers. And so they won't be made accessible automatically through
    regular {g,u}id helpers.

    Link: https://lore.kernel.org/r/20211123114227.3124056-3-brauner@kernel.org (v1)
    Link: https://lore.kernel.org/r/20211130121032.3753852-3-brauner@kernel.org (v2)
    Link: https://lore.kernel.org/r/20211203111707.3901969-3-brauner@kernel.org
    Cc: Seth Forshee
    Cc: Christoph Hellwig
    Cc: Al Viro
    CC: linux-fsdevel@vger.kernel.org
    Reviewed-by: Amir Goldstein
    Reviewed-by: Seth Forshee
    Signed-off-by: Christian Brauner
    Signed-off-by: Christian Brauner (Microsoft)
    Signed-off-by: Greg Kroah-Hartman

    Christian Brauner
     

19 Aug, 2021

2 commits

  • Overlayfs does not cache ACLs (to avoid double caching). Instead it just
    calls the underlying filesystem's i_op->get_acl(), which will return the
    cached value, if possible.

    In rcu path walk, however, get_cached_acl_rcu() is employed to get the
    value from the cache, which will fail on overlayfs resulting in dropping
    out of rcu walk mode. This can result in a big performance hit in certain
    situations.

    Fix by calling ->get_acl() with rcu=true in case of ACL_DONT_CACHE (which
    indicates pass-through).

    Reported-by: garyhuang
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Add an rcu argument to the ->get_acl() callback to allow
    get_cached_acl_rcu() to call the ->get_acl() method in the next patch.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

24 Jan, 2021

5 commits

  • Extend some inode methods with an additional user namespace argument. A
    filesystem that is aware of idmapped mounts will receive the user
    namespace the mount has been marked with. This can be used for
    additional permission checking and also to enable filesystems to
    translate between uids and gids if they need to. We have implemented all
    relevant helpers in earlier patches.

    As requested we simply extend the existing inode method instead of
    introducing new ones. This is a little more code churn but it's mostly
    mechanical and doesn't leave us with additional inode methods.

    Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • The posix acl permission checking helpers determine whether a caller is
    privileged over an inode according to the acls associated with the
    inode. Add helpers that make it possible to handle acls on idmapped
    mounts.

    The vfs and the filesystems targeted by this first iteration make use of
    posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
    translate basic posix access and default permissions such as the
    ACL_USER and ACL_GROUP type according to the initial user namespace (or
    the superblock's user namespace) to and from the caller's current user
    namespace. Adapt these two helpers to handle idmapped mounts whereby we
    either map from or into the mount's user namespace depending on in which
    direction we're translating.
    Similarly, cap_convert_nscap() is used by the vfs to translate user
    namespace and non-user namespace aware filesystem capabilities from the
    superblock's user namespace to the caller's user namespace. Enable it to
    handle idmapped mounts by accounting for the mount's user namespace.

    In addition the filesystems targeted in the first iteration of this patch
    series make use of the posix_acl_chmod() and posix_acl_update_mode()
    helpers. Both helpers perform permission checks on the target inode. Let
    them handle idmapped mounts. These two helpers are called when posix
    acls are set by the respective filesystems. To handle this case we extend
    the ->set() method to take an additional user namespace argument to pass
    the mount's user namespace down.

    Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • The inode_owner_or_capable() helper determines whether the caller is the
    owner of the inode or is capable with respect to that inode. Allow it to
    handle idmapped mounts. If the inode is accessed through an idmapped
    mount, map it according to the mount's user namespace. Afterwards the checks
    are identical to non-idmapped mounts. If the initial user namespace is
    passed nothing changes so non-idmapped mounts will see identical
    behavior as before.

    Similarly, allow the inode_init_owner() helper to handle idmapped
    mounts. It initializes a new inode on idmapped mounts by mapping the
    fsuid and fsgid of the caller from the mount's user namespace. If the
    initial user namespace is passed nothing changes so non-idmapped mounts
    will see identical behavior as before.

    Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Reviewed-by: James Morris
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • The two helpers inode_permission() and generic_permission() are used by
    the vfs to perform basic permission checking by verifying that the
    caller is privileged over an inode. In order to handle idmapped mounts
    we extend the two helpers with an additional user namespace argument.
    On idmapped mounts the two helpers will make sure to map the inode
    according to the mount's user namespace and then perform identical
    permission checks to inode_permission() and generic_permission(). If the
    initial user namespace is passed nothing changes so non-idmapped mounts
    will see identical behavior as before.

    Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • In order to determine whether a caller holds privilege over a given
    inode the capability framework exposes the two helpers
    privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former
    verifies that the inode has a mapping in the caller's user namespace and
    the latter additionally verifies that the caller has the requested
    capability in their current user namespace.
    If the inode is accessed through an idmapped mount, map it into the
    mount's user namespace. Afterwards the checks are identical to
    non-idmapped inodes. If the initial user namespace is passed all
    operations are a nop so non-idmapped mounts will not see a change in
    behavior.

    Link: https://lore.kernel.org/r/20210121131959.646623-5-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: Christian Brauner

    Christian Brauner
     

09 Jun, 2020

1 commit

  • posix_acl_permission() does not care about MAY_NOT_BLOCK, and in fact
    the permission logic internally must not check that bit (it's only for
    upper layers to decide whether they can block to do IO to look up the
    acl information or not).

    But the way the code was written, it _looked_ like it cared, since the
    function explicitly did not mask that bit off.

    But it has exactly two callers: one for when that bit is set, which
    first clears the bit before calling posix_acl_permission(), and the
    other call site when that bit was clear.

    So stop the silly games "saving" the MAY_NOT_BLOCK bit that must not be
    used for the actual permission test, and that currently is pointlessly
    cleared by the callers when the function itself should just not care.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
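
A minimal sketch of the idea, assuming a simplified permission test: the function masks the request down to the read/write/execute bits itself, so callers no longer need to clear MAY_NOT_BLOCK first. `acl_perm_allows()` is a hypothetical helper, not the kernel function; the flag values mirror the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAY_EXEC      0x01u
#define MAY_WRITE     0x02u
#define MAY_READ      0x04u
#define MAY_NOT_BLOCK 0x80u

/* Return true if every requested rwx bit is granted. Bits outside
 * rwx, such as MAY_NOT_BLOCK, are ignored by the check itself. */
bool acl_perm_allows(unsigned int granted, unsigned int want)
{
    /* The granted set is expected to hold rwx bits only. */
    assert((granted & ~(MAY_READ | MAY_WRITE | MAY_EXEC)) == 0);

    want &= MAY_READ | MAY_WRITE | MAY_EXEC;
    return (granted & want) == want;
}
```

Without the masking step, a request of `MAY_READ | MAY_NOT_BLOCK` against a read-only grant would be refused, because the non-blocking bit can never appear in the granted set.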
     

05 Jan, 2020

1 commit

  • Fix kernel-doc warnings in fs/posix_acl.c.
    Also fix one typo (setgit -> setgid).

    fs/posix_acl.c:647: warning: Function parameter or member 'inode' not described in 'posix_acl_update_mode'
    fs/posix_acl.c:647: warning: Function parameter or member 'mode_p' not described in 'posix_acl_update_mode'
    fs/posix_acl.c:647: warning: Function parameter or member 'acl' not described in 'posix_acl_update_mode'

    Link: http://lkml.kernel.org/r/29b0dc46-1f28-a4e5-b1d0-ba2b65629779@infradead.org
    Fixes: 073931017b49d ("posix_acl: Clear SGID bit when setting file permissions")

    Signed-off-by: Randy Dunlap
    Acked-by: Andreas Gruenbacher
    Reviewed-by: Jan Kara
    Cc: Jan Kara
    Cc: Andreas Gruenbacher
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

03 Jan, 2018

1 commit

  • atomic_t variables are currently used to implement reference
    counters with the following properties:
    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to use-after-free situation and be exploitable.

    The variable posix_acl.a_refcount is used as pure reference counter.
    Convert it to refcount_t and fix up the operations.

    **Important note for maintainers:

    Some functions from refcount_t API defined in lib/refcount.c
    have different memory ordering guarantees than their atomic
    counterparts.
    The full comparison can be seen in
    https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
    in state to be merged to the documentation tree.
    Normally the differences should not matter since refcount_t provides
    enough guarantees to satisfy the refcounting use cases, but in
    some rare cases it might matter.
    Please double check that you don't have some undocumented
    memory guarantees for this variable usage.

    For the posix_acl.a_refcount it might make a difference
    in following places:
    - get_cached_acl(): increment in refcount_inc_not_zero() only
    guarantees control dependency on success vs. fully ordered
    atomic counterpart. However this operation is performed under
    rcu_read_lock(), so this should be fine.
    - posix_acl_release(): decrement in refcount_dec_and_test() only
    provides RELEASE ordering and control dependency on success
    vs. fully ordered atomic counterpart

    Suggested-by: Kees Cook
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Signed-off-by: Elena Reshetova
    Signed-off-by: Jaegeuk Kim

    Elena Reshetova
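
The counter properties listed above can be approximated in userspace with C11 atomics. This is only a sketch: `struct acl_ref` and the `ref_*()` helpers are hypothetical, the kernel's refcount_t saturates on overflow where this sketch merely asserts, and the acquire fence on the final decrement is an addition made here, not something the commit message claims.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

struct acl_ref {
    atomic_uint refcount;
};

/* The counter starts at 1 for the creator's reference. */
void ref_init(struct acl_ref *r)
{
    atomic_init(&r->refcount, 1);
}

/* Take a reference unless the object is already on its way out. */
bool ref_inc_not_zero(struct acl_ref *r)
{
    unsigned int old = atomic_load_explicit(&r->refcount,
                                            memory_order_relaxed);
    do {
        if (old == 0)
            return false;        /* no increments once zero is reached */
        assert(old != UINT_MAX); /* a real refcount would saturate */
    } while (!atomic_compare_exchange_weak_explicit(&r->refcount, &old,
                                                    old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}

/* Drop a reference; true means the caller must free the object. */
bool ref_dec_and_test(struct acl_ref *r)
{
    if (atomic_fetch_sub_explicit(&r->refcount, 1,
                                  memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}
```

The relaxed ordering in `ref_inc_not_zero()` corresponds to the weaker guarantee the note to maintainers warns about for get_cached_acl().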
     

02 Mar, 2017

1 commit


10 Jan, 2017

1 commit

  • The tmpfs modification was missed in the CVE-2016-7097
    commit 073931017b49 ("posix_acl: Clear SGID bit when setting
    file permissions").
    It can be tested with xfstests generic/375, which failed to clear
    the setgid bit in the following test case on tmpfs:

    touch $testfile
    chown 100:100 $testfile
    chmod 2755 $testfile
    _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile

    Signed-off-by: Gu Zheng
    Signed-off-by: Al Viro

    Gu Zheng
     

11 Oct, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     

08 Oct, 2016

2 commits


28 Sep, 2016

2 commits

  • Remove the unnecessary typedefs and the zero-length a_entries array in
    struct posix_acl_xattr_header.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64 bit time and hence make them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
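
To illustrate what "the right granularity" means here, the sketch below truncates a timestamp to a filesystem's timestamp granularity in nanoseconds. `trunc_to_gran()` is a hypothetical userspace helper written for this illustration; it does not include the range checks the commit message mentions as future work.

```c
#include <assert.h>
#include <time.h>

/* Round a timestamp down to a multiple of gran_ns nanoseconds.
 * gran_ns == 1 keeps full resolution; one second or more drops the
 * nanosecond part entirely. */
struct timespec trunc_to_gran(struct timespec t, long gran_ns)
{
    assert(gran_ns >= 1);

    if (gran_ns >= 1000000000L)
        t.tv_nsec = 0;
    else if (gran_ns > 1)
        t.tv_nsec -= t.tv_nsec % gran_ns;
    return t;
}
```

A filesystem that stores microsecond timestamps would use a granularity of 1000, so a value read back from disk compares equal to the value that was set in memory.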
     

22 Sep, 2016

1 commit

  • When file permissions are modified via chmod(2) and the user is not in
    the owning group or capable of CAP_FSETID, the setgid bit is cleared in
    inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file
    permissions as well as the new ACL, but doesn't clear the setgid bit in
    a similar way; this allows bypassing the check in chmod(2). Fix that.

    References: CVE-2016-7097
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: Jan Kara
    Signed-off-by: Andreas Gruenbacher

    Jan Kara
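
The rule this fix applies when an ACL changes the file mode can be sketched in userspace as follows. `acl_update_mode()` is a hypothetical helper; group membership and CAP_FSETID are reduced to booleans supplied by the caller, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Compute the new mode when an ACL sets the permission bits.
 * As with chmod(2), the setgid bit is dropped unless the caller is in
 * the owning group or holds CAP_FSETID. */
mode_t acl_update_mode(mode_t old_mode, mode_t acl_perm,
                       bool in_owning_group, bool has_cap_fsetid)
{
    mode_t mode;

    /* Only rwx bits for user, group and other may come from the ACL. */
    assert((acl_perm & ~(mode_t)0777) == 0);

    mode = (old_mode & ~(mode_t)0777) | acl_perm;
    if (!in_owning_group && !has_cap_fsetid)
        mode &= ~(mode_t)S_ISGID;
    return mode;
}
```

For a file with mode 2755 whose group the caller does not belong to, setting rwx for everyone through an ACL yields 0777 in this model rather than 2777.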
     

16 Sep, 2016

1 commit


30 Jul, 2016

1 commit

  • Pull userns vfs updates from Eric Biederman:
    "This tree contains some very long awaited work on generalizing the
    user namespace support for mounting filesystems to include filesystems
    with a backing store. The real world target is fuse but the goal is
    to update the vfs to allow any filesystem to be supported. This
    patchset is based on a lot of code review and testing to approach that
    goal.

    While looking at what is needed to support the fuse filesystem it
    became clear that there were things like xattrs for security modules
    that needed special treatment. That the resolution of those concerns
    would not be fuse specific. That sorting out these general issues
    made most sense at the generic level, where the right people could be
    drawn into the conversation, and the issues could be solved for
    everyone.

    At a high level this patchset does a couple of simple things:

    - Add a user namespace owner (s_user_ns) to struct super_block.

    - Teach the vfs to handle filesystem uids and gids not mapping into
    kuids and kgids and being reported as INVALID_UID and
    INVALID_GID in vfs data structures.

    By assigning a user namespace owner filesystems that are mounted with
    only user namespace privilege can be detected. This allows security
    modules and the like to know which mounts may not be trusted. This
    also allows the set of uids and gids that are communicated to the
    filesystem to be capped at the set of kuids and kgids that are in the
    owning user namespace of the filesystem.

    One of the crazier corner cases this handles is the case of inodes
    whose i_uid or i_gid are not mapped into the vfs. Most of the code
    simply doesn't care but it is easy to confuse the inode writeback path
    so no operation that could cause an inode write-back is permitted for
    such inodes (aka only reads are allowed).

    This set of changes starts out by cleaning up the code paths involved
    in user namespace permitted mounts. Then when things are clean enough
    adds code that cleanly sets s_user_ns. Then additional restrictions
    are added that are possible now that the filesystem superblock
    contains owner information.

    These changes should not affect anyone in practice, but there are some
    parts of these restrictions that are changes in behavior.

    - Andy's restriction on suid executables that does not honor the
    suid bit when the path is from another mount namespace (think
    /proc/[pid]/fd/) or when the filesystem was mounted by a less
    privileged user.

    - The replacement of the user namespace implicit setting of MNT_NODEV
    with implicitly setting SB_I_NODEV on the filesystem superblock
    instead.

    Using SB_I_NODEV is a stronger form that happens to make this state
    user invisible. The user visibility can be managed but it caused
    problems when it was introduced from applications reasonably
    expecting mount flags to be what they were set to.

    There is a little bit of work remaining before it is safe to support
    mounting filesystems with backing store in user namespaces, beyond
    what is in this set of changes.

    - Verifying the mounter has permission to read/write the block device
    during mount.

    - Teaching the integrity modules IMA and EVM to handle filesystems
    mounted with only user namespace root and to reduce trust in their
    security xattrs accordingly.

    - Capturing the mounters credentials and using that for permission
    checks in d_automount and the like. (Given that overlayfs already
    does this, and we need the work in d_automount it makes sense to
    generalize this case).

    Furthermore there are a few changes that are on the wishlist:

    - Get all filesystems supporting posix acls using the generic posix
    acls so that posix_acl_fix_xattr_from_user and
    posix_acl_fix_xattr_to_user may be removed. [Maintainability]

    - Reducing the permission checks in places such as remount to allow
    the superblock owner to perform them.

    - Allowing the superblock owner to chown files with unmapped uids and
    gids to something that is mapped so the files may be treated
    normally.

    I am not considering even obvious relaxations of permission checks
    until it is clear there are no more corner cases that need to be
    locked down and handled generically.

    Many thanks to Seth Forshee who kept this code alive, put up
    with me rewriting substantial portions of what he did to handle more
    corner cases, and diligently tested and reviewed my
    changes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
    fs: Call d_automount with the filesystems creds
    fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
    evm: Translate user/group ids relative to s_user_ns when computing HMAC
    dquot: For now explicitly don't support filesystems outside of init_user_ns
    quota: Handle quota data stored in s_user_ns in quota_setxquota
    quota: Ensure qids map to the filesystem
    vfs: Don't create inodes with a uid or gid unknown to the vfs
    vfs: Don't modify inodes with a uid or gid unknown to the vfs
    cred: Reject inodes with invalid ids in set_create_file_as()
    fs: Check for invalid i_uid in may_follow_link()
    vfs: Verify acls are valid within superblock's s_user_ns.
    userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
    fs: Refuse uid/gid changes which don't map into s_user_ns
    selinux: Add support for unprivileged mounts from user namespaces
    Smack: Handle labels consistently in untrusted mounts
    Smack: Add support for unprivileged mounts from user namespaces
    fs: Treat foreign mounts as nosuid
    fs: Limit file caps to the user namespace of the super block
    userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
    userns: Remove implicit MNT_NODEV fragility.
    ...

    Linus Torvalds
     

01 Jul, 2016

1 commit

  • Update posix_acl_valid to verify that an acl is within a user namespace.

    Update the callers of posix_acl_valid to pass in an appropriate
    user namespace. For posix_acl_xattr_set and v9fs_xattr_set_acl pass in
    inode->i_sb->s_user_ns to posix_acl_valid. For md_unpack_acl pass in
    &init_user_ns as no inode or superblock is in sight.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

25 Jun, 2016

1 commit

  • Factor out part of posix_acl_xattr_set into a common function that takes
    a posix_acl, which nfsd can also call.

    The prototype already exists in include/linux/posix_acl.h.

    Signed-off-by: Andreas Gruenbacher
    Cc: stable@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: J. Bruce Fields

    Andreas Gruenbacher
     

28 May, 2016

1 commit


11 Apr, 2016

1 commit


31 Mar, 2016

2 commits

  • acl_by_type(inode, type) returns a pointer to either inode->i_acl or
    inode->i_default_acl depending on type. This is useful in
    fs/posix_acl.c, but should never have been visible outside that file.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • When get_acl() is called for an inode whose ACL is not cached yet, the
    get_acl inode operation is called to fetch the ACL from the filesystem.
    The inode operation is responsible for updating the cached acl with
    set_cached_acl(). This is done without locking at the VFS level, so
    another task can call set_cached_acl() or forget_cached_acl() before the
    get_acl inode operation gets to calling set_cached_acl(), and then
    get_acl's call to set_cached_acl() results in caching an outdated ACL.

    Prevent this from happening by setting the cached ACL pointer to a
    task-specific sentinel value before calling the get_acl inode operation.
    Move the responsibility for updating the cached ACL from the get_acl
    inode operations to get_acl(). There, only set the cached ACL if the
    sentinel value hasn't changed.

    The sentinel values are chosen to have odd values. Likewise, the value
    of ACL_NOT_CACHED is odd. In contrast, ACL object pointers always have
    an even value (ACLs are aligned in memory). This allows distinguishing
    uncached ACL values from ACL objects.

    In addition, switch from guarding inode->i_acl and inode->i_default_acl
    updates by the inode->i_lock spinlock to using xchg() and cmpxchg().

    Filesystems that do not want ACLs returned from their get_acl inode
    operations to be cached must call forget_cached_acl() to prevent the VFS
    from doing so.

    (Patch written by Al Viro and Andreas Gruenbacher.)

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
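
The sentinel scheme above can be sketched in userspace with C11 atomics. The helper names, `struct task` and the concrete sentinel values are assumptions made for this sketch; only the idea (odd values mean "not an ACL pointer", compare-and-exchange decides who may fill the cache) comes from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct acl  { int perm; };
struct task { int id; };

/* Odd value meaning "nothing cached yet" (assumed value). */
#define ACL_NOT_CACHED ((uintptr_t)-1)

/* Aligned ACL pointers are even, so any odd value is not an ACL. */
bool is_uncached(uintptr_t v)
{
    return (v & 1) != 0;
}

/* Task-specific sentinel: an aligned address plus one is odd. */
uintptr_t acl_sentinel(const struct task *t)
{
    return (uintptr_t)t + 1;
}

/* Step 1: claim the slot before asking the filesystem for the ACL. */
bool acl_cache_begin(_Atomic uintptr_t *slot, uintptr_t sentinel)
{
    uintptr_t expected = ACL_NOT_CACHED;

    return atomic_compare_exchange_strong(slot, &expected, sentinel);
}

/* Step 2: publish the fetched ACL only if our sentinel is still there,
 * i.e. nobody updated or invalidated the slot in the meantime. */
bool acl_cache_finish(_Atomic uintptr_t *slot, uintptr_t sentinel,
                      struct acl *acl)
{
    uintptr_t expected = sentinel;

    assert(!is_uncached((uintptr_t)acl));
    return atomic_compare_exchange_strong(slot, &expected, (uintptr_t)acl);
}

/* Invalidate whatever is in the slot, including a pending sentinel. */
void acl_cache_forget(_Atomic uintptr_t *slot)
{
    atomic_exchange(slot, ACL_NOT_CACHED);
}
```

If another task invalidates the slot between the two steps, `acl_cache_finish()` fails and the possibly outdated ACL is simply not cached.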
     

14 Dec, 2015

1 commit

  • Change the list operation to only return whether or not an attribute
    should be listed. Copying the attribute names into the buffer is moved
    to the callers.

    Since the result only depends on the dentry and not on the attribute
    name, we do not pass the attribute name to list operations.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

07 Dec, 2015

2 commits


14 Nov, 2015

3 commits

  • The xattr_handler operations are currently all passed a file system
    specific flags value which the operations can use to disambiguate between
    different handlers; some file systems use that to distinguish the xattr
    namespace, for example. In some operations, it would be useful to also have
    access to the handler prefix. To allow that, pass a pointer to the handler
    to operations instead of the flags value alone.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • When a filesystem that contains POSIX ACLs is mounted without ACL support
    (-o noacl), the appropriate behavior is not to list any existing POSIX ACL
    xattrs. The return value for list xattr handlers in this case is 0, not an
    error code: several filesystems that use the POSIX ACL xattr handlers do
    not expect the list operation to fail.

    Symlinks cannot have ACLs, so posix_acl_xattr_list will never be called for
    symlinks in the first place.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • The get and set operations of the POSIX ACL xattr handlers failed to check
    the attribute names, so all names with "system.posix_acl_access" or
    "system.posix_acl_default" as a prefix were accepted. Reject invalid names
    from now on.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
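
The difference between matching a prefix and matching the full attribute name can be shown with a small sketch. The two functions are hypothetical and only contrast the accepting behavior before the fix with the stricter behavior after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ACL_ACCESS_NAME "system.posix_acl_access"

/* Behavior before the fix: anything that starts with the name passes. */
bool matches_prefix_only(const char *name)
{
    assert(name != NULL);
    return strncmp(name, ACL_ACCESS_NAME, strlen(ACL_ACCESS_NAME)) == 0;
}

/* Behavior after the fix: only the exact name passes. */
bool matches_exactly(const char *name)
{
    assert(name != NULL);
    return strcmp(name, ACL_ACCESS_NAME) == 0;
}
```

A name such as `system.posix_acl_accessfoo` passes the first check but is rejected by the second.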
     

24 Jun, 2015

1 commit

  • If posix_acl_create() returns an error code then "*acl" and "*default_acl"
    can be uninitialized or point to freed memory. This is a dangerous thing
    to do. For example, it causes a problem in ocfs2_reflink():

    fs/ocfs2/refcounttree.c:4327 ocfs2_reflink()
    error: potentially using uninitialized 'default_acl'.

    I've re-written this so we set the pointers to NULL at the start. I've
    added a temporary "clone" variable to hold the value of "*acl" until end.
    Setting them to NULL means we don't need the "no_acl" label. We may
    as well move the "apply_umask" stuff forward and remove that label as
    well.

    Signed-off-by: Dan Carpenter
    Cc: Alexander Viro
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton

    Dan Carpenter
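
The pattern described above, clearing the output pointers first and holding the intermediate result in a temporary, might look like this in a userspace sketch. `acl_create()`, `acl_dup()` and the validation rule are hypothetical simplifications, not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct acl { int perm; };

/* Allocate a copy of an ACL; NULL on allocation failure. */
struct acl *acl_dup(const struct acl *src)
{
    struct acl *copy = malloc(sizeof(*copy));

    if (copy)
        *copy = *src;
    return copy;
}

/* Derive the ACLs for a new inode from the parent's default ACL.
 * On every return path, *acl and *default_acl are either NULL or
 * point to memory the caller owns. */
int acl_create(const struct acl *parent_default, bool is_dir,
               struct acl **acl, struct acl **default_acl)
{
    struct acl *clone;

    assert(acl != NULL && default_acl != NULL);
    *acl = NULL;
    *default_acl = NULL;

    if (!parent_default)
        return 0;
    if (parent_default->perm & ~0777)
        return -EINVAL;

    clone = acl_dup(parent_default);
    if (!clone)
        return -ENOMEM;

    if (is_dir) {
        *default_acl = acl_dup(parent_default);
        if (!*default_acl) {
            free(clone);
            return -ENOMEM;
        }
    }
    *acl = clone;
    return 0;
}
```

Because both outputs are NULL on error, a caller can release them unconditionally in its cleanup path without tracking which ones were set.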
     

16 Apr, 2015

1 commit


23 Feb, 2015

1 commit

  • Convert the following where appropriate:

    (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

    (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

    (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
    complicated than it appears as some calls should be converted to
    d_can_lookup() instead. The difference is whether the directory in
    question is a real dir with a ->lookup op or whether it's a fake dir with
    a ->d_automount op.

    In some circumstances, we can subsume checks for dentry->d_inode not being
    NULL into this, provided the code isn't in a filesystem that expects
    d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
    use d_inode() rather than d_backing_inode() to get the inode pointer).

    Note that the dentry type field may be set to something other than
    DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
    manages the fall-through from a negative dentry to a lower layer. In such a
    case, the dentry type of the negative union dentry is set to the same as the
    type of the lower dentry.

    However, if you know d_inode is not NULL at the call site, then you can use
    the d_is_xxx() functions even in a filesystem.

    There is one further complication: a 0,0 chardev dentry may be labelled
    DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
    intended for special directory entry types that don't have attached inodes.

    The following perl+coccinelle script was used:

    use strict;

    my $fd;
    my @callers;
    open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
    @callers = <$fd>;
    close($fd);
    unless (@callers) {
    print "No matches\n";
    exit(0);
    }

    my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

    my $coccifile = "tmp.sp.cocci";
    open($fd, ">$coccifile") || die $coccifile;
    print($fd "$_\n") || die $coccifile foreach (@cocci);
    close($fd);

    foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
    die "spatch failed";
    }

    [AV: overlayfs parts skipped]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells