30 Jul, 2016

1 commit

  • Pull userns vfs updates from Eric Biederman:
    "This tree contains some very long awaited work on generalizing the
    user namespace support for mounting filesystems to include filesystems
    with a backing store. The real world target is fuse but the goal is
    to update the vfs to allow any filesystem to be supported. This
    patchset is based on a lot of code review and testing to approach that
    goal.

    While looking at what is needed to support the fuse filesystem it
    became clear that there were things like xattrs for security modules
    that needed special treatment. That the resolution of those concerns
    would not be fuse specific. That sorting out these general issues
    made most sense at the generic level, where the right people could be
    drawn into the conversation, and the issues could be solved for
    everyone.

    At a high level this patchset does a couple of simple things:

    - Add a user namespace owner (s_user_ns) to struct super_block.

    - Teach the vfs to handle filesystem uids and gids not mapping into
    kuids and kgids and being reported as INVALID_UID and
    INVALID_GID in vfs data structures.

    By assigning a user namespace owner, filesystems that are mounted with
    only user namespace privilege can be detected. This allows security
    modules and the like to know which mounts may not be trusted. This
    also allows the set of uids and gids that are communicated to the
    filesystem to be capped at the set of kuids and kgids that are in the
    owning user namespace of the filesystem.

    One of the crazier corner cases this handles is the case of inodes
    whose i_uid or i_gid are not mapped into the vfs. Most of the code
    simply doesn't care but it is easy to confuse the inode writeback path
    so no operation that could cause an inode write-back is permitted for
    such inodes (aka only reads are allowed).

    This set of changes starts out by cleaning up the code paths involved
    in user namespace permitted mounts. Then when things are clean enough
    adds code that cleanly sets s_user_ns. Then additional restrictions
    are added that are possible now that the filesystem superblock
    contains owner information.

    These changes should not affect anyone in practice, but there are some
    parts of these restrictions that are changes in behavior.

    - Andy's restriction on suid executables that does not honor the
    suid bit when the path is from another mount namespace (think
    /proc/[pid]/fd/) or when the filesystem was mounted by a less
    privileged user.

    - The replacement of the user namespace implicit setting of MNT_NODEV
    with implicitly setting SB_I_NODEV on the filesystem superblock
    instead.

    Using SB_I_NODEV is a stronger form that happens to make this state
    user invisible. The user visibility can be managed but it caused
    problems when it was introduced from applications reasonably
    expecting mount flags to be what they were set to.

    There is a little bit of work remaining before it is safe to support
    mounting filesystems with backing store in user namespaces, beyond
    what is in this set of changes.

    - Verifying the mounter has permission to read/write the block device
    during mount.

    - Teaching the integrity modules IMA and EVM to handle filesystems
    mounted with only user namespace root and to reduce trust in their
    security xattrs accordingly.

    - Capturing the mounter's credentials and using that for permission
    checks in d_automount and the like. (Given that overlayfs already
    does this, and we need the work in d_automount it makes sense to
    generalize this case).

    Furthermore there are a few changes that are on the wishlist:

    - Get all filesystems supporting posix acls using the generic posix
    acls so that posix_acl_fix_xattr_from_user and
    posix_acl_fix_xattr_to_user may be removed. [Maintainability]

    - Reducing the permission checks in places such as remount to allow
    the superblock owner to perform them.

    - Allowing the superblock owner to chown files with unmapped uids and
    gids to something that is mapped so the files may be treated
    normally.

    I am not considering even obvious relaxations of permission checks
    until it is clear there are no more corner cases that need to be
    locked down and handled generically.

    Many thanks to Seth Forshee who kept this code alive, for putting up
    with me rewriting substantial portions of what he did to handle more
    corner cases, and for his diligent testing and reviewing of my
    changes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
    fs: Call d_automount with the filesystems creds
    fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
    evm: Translate user/group ids relative to s_user_ns when computing HMAC
    dquot: For now explicitly don't support filesystems outside of init_user_ns
    quota: Handle quota data stored in s_user_ns in quota_setxquota
    quota: Ensure qids map to the filesystem
    vfs: Don't create inodes with a uid or gid unknown to the vfs
    vfs: Don't modify inodes with a uid or gid unknown to the vfs
    cred: Reject inodes with invalid ids in set_create_file_as()
    fs: Check for invalid i_uid in may_follow_link()
    vfs: Verify acls are valid within superblock's s_user_ns.
    userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
    fs: Refuse uid/gid changes which don't map into s_user_ns
    selinux: Add support for unprivileged mounts from user namespaces
    Smack: Handle labels consistently in untrusted mounts
    Smack: Add support for unprivileged mounts from user namespaces
    fs: Treat foreign mounts as nosuid
    fs: Limit file caps to the user namespace of the super block
    userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
    userns: Remove implicit MNT_NODEV fragility.
    ...

    Linus Torvalds
     

06 Jul, 2016

1 commit

  • Introduce the helper qid_has_mapping and use it to ensure that the
    quota system only considers qids that map to the filesystem's
    s_user_ns.

    In practice for quota supporting filesystems today this is the exact
    same check as qid_valid, as only 0xffffffff aka (qid_t)-1 does not
    map into init_user_ns.

    Replace the qid_valid calls with qid_has_mapping as values come in
    from userspace. This is harmless today and it prepares the quota
    system to work on filesystems with quotas but mounted by unprivileged
    users.

    Call qid_has_mapping from dqget. This ensures the passed in qid has a
    representation on the underlying filesystem. Previously this was
    unnecessary as filesystems never had qids that could not map. With
    the introduction of filesystems outside of s_user_ns this will not
    remain true.

    All of this ensures the quota code never has to deal with qids that
    don't map to the underlying filesystem.
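
    The difference between the two checks can be sketched in user space.
    This is only a model under stated assumptions: struct id_range,
    model_qid_valid and model_qid_has_mapping are hypothetical names, and
    a namespace is reduced to a single contiguous id range, which is
    simpler than what the kernel actually stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_ID ((uint32_t)-1)

/* Hypothetical stand-in for a user namespace: one contiguous id range. */
struct id_range {
	uint32_t first;  /* first kernel-side id covered by the range */
	uint32_t count;  /* number of consecutive ids in the range */
};

/* Model of qid_valid: only the all-ones id is rejected. */
static bool model_qid_valid(uint32_t kid)
{
	return kid != MODEL_INVALID_ID;
}

/* Model of qid_has_mapping: the id must also fall inside the range. */
static bool model_qid_has_mapping(const struct id_range *ns, uint32_t kid)
{
	assert(ns != NULL);
	return model_qid_valid(kid) &&
	       kid >= ns->first && kid - ns->first < ns->count;
}
```

    With a range that covers every id except 0xffffffff (the model of
    init_user_ns) both checks agree. With a narrower range the second
    check rejects ids that the first would accept.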

    Cc: Jan Kara
    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

20 Jun, 2016

1 commit

  • The quota subsystem has two formats: the old v1 format uses architecture
    specific time_t values in the on-disk format, while the v2 format
    (introduced in Linux 2.5.16 and 2.4.22) uses fixed 64-bit little-endian.

    While there is no future for the v1 format beyond y2038, the v2 format
    is almost there on 32-bit architectures, as both the user interface
    and the on-disk format use 64-bit timestamps, just not the time_t
    in between.

    This changes the internal representation to use time64_t, which will
    end up doing the right thing everywhere for v2 format.
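
    A small user-space sketch of why the intermediate type matters. The
    helper names are hypothetical and the 32-bit path only models a
    timestamp passing through a signed 32-bit time_t. It is not the
    quota code itself.

```c
#include <assert.h>
#include <stdint.h>

static_assert(INT32_MAX == 2147483647, "signed 32-bit limit");

/* Model of a timestamp squeezed through a signed 32-bit time_t.
 * The wrap is computed explicitly so the result is well defined. */
static int64_t model_via_time32(int64_t t)
{
	uint32_t low = (uint32_t)t;  /* keep only the low 32 bits */

	if (low >= UINT32_C(0x80000000))
		return (int64_t)low - INT64_C(4294967296);
	return (int64_t)low;
}

/* Model of the same timestamp kept in a 64-bit type end to end. */
static int64_t model_via_time64(int64_t t)
{
	return t;
}
```

    One second past INT32_MAX (in January 2038) the 32-bit path wraps to
    a negative value, while the 64-bit path returns the timestamp
    unchanged.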

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jan Kara

    Arnd Bergmann
     

09 Feb, 2016

1 commit


08 Feb, 2016

1 commit

  • Q_XGETNEXTQUOTA is exactly like Q_XGETQUOTA, except that it
    will return quota information for the id equal to or greater
    than the id requested. In other words, if the requested id has
    no quota, the command will return quota information for the
    next higher id which does have a quota set. If no higher id
    has an active quota, -ESRCH is returned.

    This allows filesystems to do efficient iteration in kernelspace,
    much like extN filesystems do in userspace when asked to report
    all active quotas.

    The patch adds a d_id field to struct qc_dqblk so that we can
    pass back the id of the quota which was found, and return it
    to userspace.

    Today, filesystems such as XFS require getpwent-style iterations,
    and for systems which have e.g. LDAP backends, this can be very
    slow, or even impossible if iteration is not allowed in the
    configuration.
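
    The lookup semantics and the resulting iteration can be sketched with
    an in-memory table. Everything here is a hypothetical model: struct
    quota_rec, model_get_next_quota and model_count_quotas are invented
    for illustration and do not reflect the real quotactl() calling
    convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a filesystem's quota records, sorted by id. */
struct quota_rec {
	uint32_t id;
	uint64_t bytes_used;
};

/* Return the first record whose id is >= want, or -ESRCH if none. */
static int model_get_next_quota(const struct quota_rec *recs, size_t n,
				uint32_t want, struct quota_rec *out)
{
	assert(out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (recs[i].id >= want) {
			*out = recs[i];  /* out->id plays the role of d_id */
			return 0;
		}
	}
	return -ESRCH;
}

/* Visit every record by restarting each lookup at the found id + 1. */
static size_t model_count_quotas(const struct quota_rec *recs, size_t n)
{
	struct quota_rec cur;
	uint32_t next = 0;
	size_t seen = 0;

	while (model_get_next_quota(recs, n, next, &cur) == 0) {
		seen++;
		if (cur.id == UINT32_MAX)
			break;  /* do not wrap around to id 0 */
		next = cur.id + 1;
	}
	return seen;
}
```

    The caller never needs to enumerate users itself. Each call reports
    which id was found, and the next call resumes just past it.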

    Signed-off-by: Eric Sandeen
    Reviewed-by: Jan Kara
    Signed-off-by: Dave Chinner

    Eric Sandeen
     

19 Mar, 2015

1 commit


16 Mar, 2015

1 commit


04 Mar, 2015

4 commits

  • Flags in struct quota_state keep flags for each quota type and
    some common flags. This patch reorders typed flags:

    Before:

    0 USRQUOTA DQUOT_USAGE_ENABLED
    1 USRQUOTA DQUOT_LIMITS_ENABLED
    2 USRQUOTA DQUOT_SUSPENDED
    3 GRPQUOTA DQUOT_USAGE_ENABLED
    4 GRPQUOTA DQUOT_LIMITS_ENABLED
    5 GRPQUOTA DQUOT_SUSPENDED
    6 DQUOT_QUOTA_SYS_FILE
    7 DQUOT_NEGATIVE_USAGE

    After:

    0 USRQUOTA DQUOT_USAGE_ENABLED
    1 GRPQUOTA DQUOT_USAGE_ENABLED
    2 USRQUOTA DQUOT_LIMITS_ENABLED
    3 GRPQUOTA DQUOT_LIMITS_ENABLED
    4 USRQUOTA DQUOT_SUSPENDED
    5 GRPQUOTA DQUOT_SUSPENDED
    6 DQUOT_QUOTA_SYS_FILE
    7 DQUOT_NEGATIVE_USAGE

    Now we can get a bitmap of all enabled/suspended quota types without a loop.
    For example suspended: (flags / DQUOT_SUSPENDED) & ((1 << MAXQUOTAS) - 1).
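
    A user-space sketch of the reordered layout, assuming MAXQUOTAS is 2
    as in the table above. The helper names model_state_flag and
    model_suspended_types are hypothetical. The flag values follow from
    the "After" table rather than being copied from the kernel headers.

```c
#include <assert.h>

#define MAXQUOTAS 2
enum { USRQUOTA = 0, GRPQUOTA = 1 };

/* Each state owns MAXQUOTAS consecutive bits, one per quota type. */
#define DQUOT_USAGE_ENABLED  (1u << (0 * MAXQUOTAS))  /* bits 0-1 */
#define DQUOT_LIMITS_ENABLED (1u << (1 * MAXQUOTAS))  /* bits 2-3 */
#define DQUOT_SUSPENDED      (1u << (2 * MAXQUOTAS))  /* bits 4-5 */
#define DQUOT_QUOTA_SYS_FILE (1u << (3 * MAXQUOTAS))  /* bit 6 */

/* Flag bit for one state of one quota type. */
static unsigned int model_state_flag(unsigned int state, int type)
{
	assert(type >= 0 && type < MAXQUOTAS);
	return state << type;
}

/* Bitmap of suspended types (bit N set means type N is suspended). */
static unsigned int model_suspended_types(unsigned int flags)
{
	return (flags / DQUOT_SUSPENDED) & ((1u << MAXQUOTAS) - 1);
}
```

    Dividing by DQUOT_SUSPENDED shifts the suspended bits down to bit 0
    and the mask drops the common flags above them.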

    add/remove: 0/1 grow/shrink: 3/11 up/down: 56/-215 (-159)
    function old new delta
    __dquot_initialize 423 447 +24
    dquot_transfer 181 197 +16
    dquot_alloc_inode 286 302 +16
    dquot_reclaim_space_nodirty 316 313 -3
    dquot_claim_space_nodirty 314 311 -3
    dquot_resume 286 281 -5
    dquot_free_inode 332 324 -8
    __dquot_alloc_space 500 492 -8
    dquot_disable 1944 1929 -15
    dquot_quota_enable 252 236 -16
    __dquot_free_space 750 734 -16
    dquot_writeback_dquots 625 608 -17
    __dquot_transfer 1186 1154 -32
    dquot_quota_sync 299 261 -38
    dquot_active.isra 54 - -54

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Jan Kara

    Konstantin Khlebnikov
     
  • Change ->set_info to take new qc_info structure which contains all the
    necessary information both for XFS and VFS. Convert Q_SETINFO handler
    to use this structure.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • These callbacks are now unused. Remove them.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Create new internal interface for getting information about quota which
    contains everything needed for both VFS quotas and XFS quotas. Make VFS
    use this and hook it up to Q_GETINFO.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

30 Jan, 2015

4 commits


28 Jan, 2015

1 commit

  • Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
    tracks space limits and usage in 512-byte blocks. However VFS quotas
    track usage in bytes (as some filesystems require that) and we need to
    somehow pass this information. Up to now it wasn't a problem because we
    didn't do any unit conversion (thus VFS quota routines happily stuck
    number of bytes into d_bcount field of struct fs_disk_quota). Only if
    you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
    / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
    tried this but reportedly some Samba users hit the problem in practice.
    So if we want the interfaces to be compatible we need to fix this.

    We bite the bullet and define another quota structure used for passing
    information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have
    to have more conversion routines in fs/quota/quota.c and another copying
    of quota structure slows down getting of quota information by about 2%
    but it seems cleaner than overloading e.g. units of d_bcount to bytes.
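
    The unit mismatch can be sketched as follows. The helper names are
    hypothetical, and rounding up when converting bytes to 512-byte
    blocks is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_BB_SHIFT 9                      /* 512-byte basic blocks */
#define MODEL_BB_SIZE  (UINT64_C(1) << MODEL_BB_SHIFT)

static_assert(MODEL_BB_SIZE == 512, "basic block is 512 bytes");

/* Bytes to 512-byte blocks, rounding partial blocks up. */
static uint64_t model_bytes_to_bb(uint64_t bytes)
{
	return (bytes + MODEL_BB_SIZE - 1) >> MODEL_BB_SHIFT;
}

/* 512-byte blocks to bytes. */
static uint64_t model_bb_to_bytes(uint64_t blocks)
{
	return blocks << MODEL_BB_SHIFT;
}
```

    If a byte count of 4096 is stored unconverted in a field that the
    reader interprets as 512-byte blocks, the reader sees
    model_bb_to_bytes(4096), i.e. 2097152 bytes, a factor of 512 too
    large.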

    CC: stable@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

22 Jan, 2015

1 commit


10 Nov, 2014

1 commit

  • Currently all filesystems supporting VFS quota support user and group
    quotas. With introduction of project quotas this is going to change so
    make sure filesystem isn't called for quota type it doesn't support by
    introduction of a bitmask determining which quota types each filesystem
    supports.

    Acked-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

16 Jul, 2014

1 commit

  • Remove dqptr_sem to make quota code scalable: Remove the dqptr_sem,
    accessing inode->i_dquot is now protected by dquot_srcu, and changing
    inode->i_dquot is now serialized by dq_data_lock.

    Signed-off-by: Lai Siyao
    Signed-off-by: Niu Yawei
    Signed-off-by: Jan Kara

    Niu Yawei
     

05 May, 2014

1 commit

  • The Q_XQUOTARM quotactl was not working properly, because
    we weren't passing around proper flags. The xfs_fs_set_xstate()
    ioctl handler used the same flags for Q_XQUOTAON/OFF as
    well as for Q_XQUOTARM, but Q_XQUOTAON/OFF look for
    XFS_UQUOTA_ACCT, XFS_UQUOTA_ENFD, XFS_GQUOTA_ACCT etc,
    i.e. quota type + state, while Q_XQUOTARM looks only for
    the type of quota, i.e. XFS_DQ_USER, XFS_DQ_GROUP etc.

    Unfortunately these flag spaces overlap a bit, so we
    got semi-random results for Q_XQUOTARM; i.e. the value
    for XFS_DQ_USER == XFS_UQUOTA_ACCT, etc. yeargh.

    Add a new quotactl op vector specifically for the QUOTARM
    operation, since it operates with a different flag space.

    This has been broken more or less forever, AFAICT.

    Signed-off-by: Eric Sandeen
    Acked-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Eric Sandeen
     

21 Aug, 2013

1 commit

  • XFS now supports three types of quotas (user, group and project).

    The current version of Q_XGETSTAT has support for only two types of quotas.
    In order to support three types of quotas, the interface, specifically
    struct fs_quota_stat, needs to be expanded. The current version of fs_quota_stat
    does not allow expansion without breaking backward compatibility.

    So, a new quotactl command and a new fs_quota_stat structure need to be added.

    This patch adds a new command Q_XGETQSTATV to quotactl() which takes
    a new data structure fs_quota_statv. This new data structure provides
    support for future expansion and backward compatibility.

    Callers of the new quotactl command have to set the version of the data
    structure being passed, and kernel will fill as much data as requested.
    If the kernel does not support the user-space provided version, EINVAL
    will be returned. User-space can reduce the version number and call the same
    quotactl again.
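
    The retry described above can be sketched as a loop. This is a model
    only: struct model_statv, model_getqstatv and MODEL_KERNEL_MAX_VERSION
    are hypothetical stand-ins, not the real quotactl() interface or the
    real layout of fs_quota_statv.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_KERNEL_MAX_VERSION 1  /* newest version the model kernel knows */

struct model_statv {
	int version;         /* set by the caller before the call */
	unsigned int flags;  /* filled in by the "kernel" on success */
};

/* Model of the kernel side: reject versions it does not understand. */
static int model_getqstatv(struct model_statv *s)
{
	assert(s != NULL);
	if (s->version < 1 || s->version > MODEL_KERNEL_MAX_VERSION)
		return -EINVAL;
	s->flags = 0x1;  /* placeholder payload */
	return 0;
}

/* Caller side: start at the newest known version and step down. */
static int model_negotiate(int newest_known, struct model_statv *out)
{
	assert(out != NULL);
	for (int v = newest_known; v >= 1; v--) {
		int ret;

		out->version = v;
		ret = model_getqstatv(out);
		if (ret != -EINVAL)
			return ret;  /* success, or an unrelated error */
	}
	return -EINVAL;
}
```

    A caller built against version 3 of the structure would still get an
    answer from this model kernel, ending up with version 1.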

    Signed-off-by: Chandra Seetharaman
    Reviewed-by: Jan Kara
    Reviewed-by: Rich Johnston
    Signed-off-by: Ben Myers

    [v2: Applied rjohnston's suggestions as per Chandra's request. -bpm]

    Chandra Seetharaman
     

25 Jan, 2013

1 commit


13 Oct, 2012

1 commit


18 Sep, 2012

4 commits

  • Change struct dquot dq_id to a struct kqid and remove the now
    unnecessary dq_type.

    Make minimal changes to dquot, quota_tree, quota_v1, quota_v2, ext3,
    ext4, and ocfs2 to deal with the change in quota structures and
    signatures. The ocfs2 changes are larger than most because of the
    extensive tracing throughout the ocfs2 quota code that prints out
    dq_id.

    quota_tree.c:get_index is modified to take a struct kqid instead of a
    qid_t because all of its callers pass in dquot->dq_id and it allows
    me to introduce only a single conversion.

    The rest of the changes are just replacing dq_type with dq_id.type,
    adding conversions to deal with the change in type and occasionally
    adding qid_eq to allow quota id comparisons in a user namespace safe way.

    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Jan Kara
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Cc: Theodore Tso
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Modify quota_send_warning to take struct kqid instead of a type and
    identifier pair.

    When sending netlink broadcasts always convert uids and quota
    identifiers into the initial user namespace. There is as yet no way to
    send a netlink broadcast message with different contents to receivers
    in different namespaces, so for the time being just map all of the
    identifiers into the initial user namespace which preserves the
    current behavior.

    Change the callers of quota_send_warning in gfs2, xfs and dquot
    to generate a struct kqid to pass to quota_send_warning. When
    all of the user namespace conversions are complete, struct kqid
    values will be available without need for conversion, but a conversion
    is needed now to avoid needing to convert everything at once.

    Cc: Ben Myers
    Cc: Alex Elder
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Steven Whitehouse
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Update the quotactl user space interface to successfully compile with
    user namespaces support enabled and to hand off quota identifiers to
    lower layers of the kernel in struct kqid instead of type and qid
    pairs.

    The quota on function is not converted because, while it takes a quota
    type and an id, the id is the on disk quota format to use, which
    is something completely different.

    The signatures of two struct quotactl_ops methods, get_dqblk and
    set_dqblk, were changed to take struct kqid arguments.

    The dquot, xfs, and ocfs2 implementations of get_dqblk and set_dqblk
    are minimally changed so that the code continues to work with
    the change in parameter type.

    This is the first in a series of changes to always store quota
    identifiers in the kernel in struct kqid and only use raw type and qid
    values when interacting with on disk structures or userspace. Always
    using struct kqid internally makes it hard to miss places that need
    conversion to or from the kernel internal values.

    Cc: Jan Kara
    Cc: Dave Chinner
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: Alex Elder
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Add the data type struct kqid which holds the kernel internal form of
    the owning identifier of a quota. struct kqid is a replacement for
    the implicit union of uid, gid and project id stored in an unsigned
    int and the quota type field that was used in the quota data
    structures. Making the data type explicit allows the kuid_t and
    kgid_t type safety to propagate more thoroughly through the code,
    revealing more places where uid/gid conversions need to be made.

    Along with the data type struct kqid comes the helper functions
    qid_eq, qid_lt, from_kqid, from_kqid_munged, qid_valid, make_kqid,
    make_kqid_invalid, make_kqid_uid, make_kqid_gid.
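
    The idea of pairing the id with its type can be sketched in user
    space. This is a simplified model: the model_ names are hypothetical,
    and a plain 32-bit id stands in for the separate kuid_t, kgid_t and
    project id types that the real structure distinguishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_quota_type {
	MODEL_USRQUOTA = 0,
	MODEL_GRPQUOTA = 1,
	MODEL_PRJQUOTA = 2,
};

/* The id and the kind of id travel together instead of as loose ints. */
struct model_kqid {
	enum model_quota_type type;
	uint32_t id;
};

static struct model_kqid model_make_kqid(enum model_quota_type type,
					 uint32_t id)
{
	struct model_kqid qid = { type, id };

	assert(type >= MODEL_USRQUOTA && type <= MODEL_PRJQUOTA);
	return qid;
}

/* Equal only if both the type and the id match. */
static bool model_qid_eq(struct model_kqid a, struct model_kqid b)
{
	return a.type == b.type && a.id == b.id;
}

/* Order by type first, then by id within a type. */
static bool model_qid_lt(struct model_kqid a, struct model_kqid b)
{
	if (a.type != b.type)
		return a.type < b.type;
	return a.id < b.id;
}
```

    With bare integers, user 1000 and group 1000 compare equal by
    accident. With the pair they do not, and passing the wrong kind of
    id becomes visible in the code.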

    Cc: Jan Kara
    Cc: Dave Chinner
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

23 Jul, 2012

1 commit

  • Split off part of dquot_quota_sync() which writes dquots into a quota file
    to a separate function. In the next patch we will use the function from
    filesystems and we do not want to abuse ->quota_sync quotactl callback more
    than necessary.

    Acked-by: Steven Whitehouse
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

12 Jan, 2012

1 commit


27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

20 Jul, 2011

1 commit


13 Jan, 2011

1 commit

  • As Al Viro pointed out, path resolution during Q_QUOTAON calls to quotactl
    is prone to deadlocks. We hold s_umount semaphore for reading during the
    path resolution and resolution itself may need to acquire the semaphore
    for writing when e.g. an autofs mountpoint is passed.

    Solve the problem by performing the resolution before we get hold of the
    superblock (and thus s_umount semaphore). The whole thing is complicated
    by the fact that some filesystems (OCFS2) ignore the path argument. So to
    distinguish between filesystems which want the path and those which do not we
    introduce a new .quota_on_meta callback which does not get the path. OCFS2
    then uses this callback instead of the old .quota_on.

    CC: Al Viro
    CC: Christoph Hellwig
    CC: Ted Ts'o
    CC: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
     

28 May, 2010

1 commit

  • Generic per-cpu counter has some memory overhead but it is negligible for
    modern systems and embedded systems compile without quota support. And code
    reuse is a good thing. This patch should fix the complaint from preemptive
    kernels which was introduced by dde9588853b1bde.

    [Jan Kara: Fixed patch to work on 32-bit archs as well]

    Reported-by: Rafael J. Wysocki
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     

24 May, 2010

1 commit


22 May, 2010

3 commits

  • Pass the larger struct fs_disk_quota to the ->set_dqblk operation so
    that the Q_SETQUOTA and Q_XSETQUOTA operations can be implemented
    with a single filesystem operation and we can retire the ->set_xquota
    operation. The additional information (RT-subvolume accounting and
    warn counts) are left zero for the VFS quota implementation.

    Add new fieldmask values for setting the number of blocks and inodes
    values which is required for the VFS quota, but wasn't for XFS.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Pass the larger struct fs_disk_quota to the ->get_dqblk operation so
    that the Q_GETQUOTA and Q_XGETQUOTA operations can be implemented
    with a single filesystem operation and we can retire the ->get_xquota
    operation. The additional information (RT-subvolume accounting and
    warn counts) are left zero for the VFS quota implementation.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Quota stats is a mostly written data structure. Let's allocate a percpu
    bucket for each value.

    NOTE: dqstats_read() function is racy against dqstats_{inc,dec}
    and may return an inconsistent value. But this is ok since absolute
    accuracy is not required.
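
    A single-threaded sketch of the bucket-per-CPU idea. The model_ names
    and MODEL_NR_CPUS are hypothetical, and clamping a negative sum to
    zero is an assumption of this sketch about how a reader might cope
    with an inconsistent snapshot.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* One statistic, split into one bucket per CPU to avoid contention. */
struct model_dqstat {
	long bucket[MODEL_NR_CPUS];
};

/* Writers only touch the bucket of the CPU they run on. */
static void model_dqstat_inc(struct model_dqstat *s, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	s->bucket[cpu]++;
}

static void model_dqstat_dec(struct model_dqstat *s, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	s->bucket[cpu]--;
}

/* Readers sum all buckets. An individual bucket may be negative when an
 * increment and its matching decrement happened on different CPUs. */
static long model_dqstat_read(const struct model_dqstat *s)
{
	long sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += s->bucket[cpu];
	return sum < 0 ? 0 : sum;  /* clamp an inconsistent snapshot */
}
```

    Writes stay cheap because they never share a bucket. The cost moves
    to the reader, which has to walk all buckets and may see a slightly
    stale total.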

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     

05 Mar, 2010

3 commits

  • Just use 0 / -EDQUOT directly - that's what it translates to anyway.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the initialize dquot operation - it is now always called from
    the filesystem and if a filesystem really needs its own (which none
    currently does) it can just call into its own routine directly.

    Rename the now static low-level dquot_initialize helper to __dquot_initialize
    and vfs_dq_init to dquot_initialize to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     
  • Get rid of the drop dquot operation - it is now always called from
    the filesystem and if a filesystem really needs its own (which none
    currently does) it can just call into its own routine directly.

    Rename the now static low-level dquot_drop helper to __dquot_drop
    and vfs_dq_drop to dquot_drop to have a consistent namespace.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig