29 May, 2020

2 commits


30 Apr, 2020

1 commit

  • In commit 044366659c18 ("ANDROID: vfs: Add setattr2 for filesystems with
    per mount permissions") a new symbol was exported, but it should have
    been set as a _GPL symbol.

    Fix this up by properly exporting it as a _GPL symbol.

    Bug: 35848445
    Cc: Daniel Rosenberg
    Signed-off-by: Greg Kroah-Hartman
    Change-Id: I87585ad059367aa51b784ec415a1bf7f809de769

    Greg Kroah-Hartman
     

09 Feb, 2020

1 commit


09 Dec, 2019

1 commit

  • Push clamping timestamps into notify_change(), so in-kernel
    callers like nfsd and overlayfs will get similar timestamp
    set behavior as utimes.

    AV: get rid of clamping in ->setattr() instances; we don't need
    to bother with that there, with notify_change() doing normalization
    in all cases now (it already did for implicit case, since current_time()
    clamps).

    Suggested-by: Miklos Szeredi
    Fixes: 42e729b9ddbb ("utimes: Clamp the timestamps before update")
    Cc: stable@vger.kernel.org # v5.4
    Cc: Deepa Dinamani
    Cc: Jeff Layton
    Signed-off-by: Amir Goldstein
    Signed-off-by: Al Viro

    Amir Goldstein
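
    The idea of clamping in one central place can be sketched in userspace C.
    This is a simplified model, not the kernel code: struct ts64, struct
    fs_limits, clamp_ts() and model_notify_change() are hypothetical names
    that stand in for the kernel's timestamp type, the superblock time range,
    and notify_change().

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct ts64 { int64_t tv_sec; long tv_nsec; };
struct fs_limits { int64_t time_min; int64_t time_max; };

/* Clamp a timestamp into the range the filesystem is able to store. */
static struct ts64 clamp_ts(struct ts64 t, const struct fs_limits *fs)
{
    if (t.tv_sec >= fs->time_max) {
        t.tv_sec = fs->time_max;
        t.tv_nsec = 0;
    } else if (t.tv_sec <= fs->time_min) {
        t.tv_sec = fs->time_min;
        t.tv_nsec = 0;
    }
    return t;
}

/*
 * Model of a single entry point for attribute changes. Because every
 * caller goes through it, per-filesystem handlers only ever see values
 * that are already in range.
 */
static struct ts64 model_notify_change(struct ts64 requested,
                                       const struct fs_limits *fs)
{
    return clamp_ts(requested, fs);
}
```

    With limits of 0..100 seconds, a request for second 500 comes back as
    second 100 with zero nanoseconds, while an in-range request is unchanged.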
     

23 Sep, 2019

1 commit


30 Aug, 2019

1 commit

  • Update the inode timestamp updates to use timestamp_truncate()
    instead of timespec64_trunc().

    The change was mostly generated by the following coccinelle
    script.

    virtual context
    virtual patch

    @r1 depends on patch forall@
    struct inode *inode;
    identifier i_xtime =~ "^i_[acm]time$";
    expression e;
    @@

    inode->i_xtime =
    - timespec64_trunc(
    + timestamp_truncate(
    ...,
    - e);
    + inode);

    Signed-off-by: Deepa Dinamani
    Acked-by: Greg Kroah-Hartman
    Acked-by: Jeff Layton
    Cc: adrian.hunter@intel.com
    Cc: dedekind1@gmail.com
    Cc: gregkh@linuxfoundation.org
    Cc: hch@lst.de
    Cc: jaegeuk@kernel.org
    Cc: jlbec@evilplan.org
    Cc: richard@nod.at
    Cc: tj@kernel.org
    Cc: yuchao0@huawei.com
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: linux-mtd@lists.infradead.org

    Deepa Dinamani
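
    A rough userspace sketch of what truncating a timestamp to an inode's
    granularity means. The names model_inode, gran_ns and model_truncate()
    are assumptions made for illustration; the kernel helper takes a real
    struct inode and reads the granularity from its superblock.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-ins; not the kernel definitions. */
struct model_ts { int64_t tv_sec; long tv_nsec; };
struct model_inode { long gran_ns; /* timestamp granularity in ns */ };

/* Round the nanosecond part down to a multiple of the granularity. */
static struct model_ts model_truncate(struct model_ts t,
                                      const struct model_inode *inode)
{
    long gran = inode->gran_ns;

    if (gran == 1)
        return t;                       /* full nanosecond resolution */
    if (gran == MODEL_NSEC_PER_SEC)
        t.tv_nsec = 0;                  /* whole seconds only */
    else if (gran > 1 && gran < MODEL_NSEC_PER_SEC)
        t.tv_nsec -= t.tv_nsec % gran;  /* e.g. microsecond resolution */
    return t;
}
```

    Passing the inode rather than a bare granularity value lets the helper
    look up whatever per-filesystem limits it needs without every caller
    having to fetch them first.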
     

20 Jul, 2019

2 commits

  • This allows filesystems to use their mount private data to
    influence the permissions they return in permission2. It has
    been separated into a new call to avoid disrupting current
    permission users.

    Test: HiKey/X15 + Pie + android-mainline,
    and HiKey + AOSP Master + android-mainline,
    directories under /sdcard created,
    output of mount is right,
    CTS test collecting device info works

    Bug: 35848445
    Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca
    Signed-off-by: Daniel Rosenberg
    [AmitP: Minor refactoring of original patch to align with
    changes from the following upstream commit
    4bfd054ae11e ("fs: fold __inode_permission() into inode_permission()").
    Also introduce vfs_mkobj2(), because do_create()
    moved from using vfs_create() to vfs_mkobj()
    eecec19d9e70 ("mqueue: switch to vfs_mkobj(), quit abusing ->d_fsdata")
    do_create() is dropped/cleaned-up upstream so a
    minor refactoring there as well.
    066cc813e94a ("do_mq_open(): move all work prior to dentry_open() into a helper")]
    Signed-off-by: Amit Pundir
    [astrachan: Folded the following changes into this patch:
    f46c9d62dd81 ("ANDROID: fs: Export vfs_rmdir2")
    9992eb8b9a1e ("ANDROID: xattr: Pass EOPNOTSUPP to permission2")]
    Signed-off-by: Alistair Strachan
    Signed-off-by: Yongqin Liu

    Daniel Rosenberg
     
  • This allows filesystems to use their mount private data to
    influence the permissions they use in setattr2. It has
    been separated into a new call to avoid disrupting current
    setattr users.

    Test: HiKey/X15 + Pie + android-mainline,
    and HiKey + AOSP Master + android-mainline,
    directories under /sdcard created,
    output of mount is right,
    CTS test collecting device info works

    Change-Id: I19959038309284448f1b7f232d579674ef546385
    Signed-off-by: Daniel Rosenberg
    Signed-off-by: Yongqin Liu

    Daniel Rosenberg
     

04 Jul, 2018

1 commit


15 Jun, 2018

1 commit

  • Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
    "This is a late set of changes from Deepa Dinamani doing an automated
    treewide conversion of the inode and iattr structures from 'timespec'
    to 'timespec64', to push the conversion from the VFS layer into the
    individual file systems.

    As Deepa writes:

    'The series aims to switch vfs timestamps to use struct timespec64.
    Currently vfs uses struct timespec, which is not y2038 safe.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timespec64
    timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual replacement
    becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
    This is a flag day patch.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
    timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions'

    Thomas Gleixner adds:

    'I think there is no point to drag that out for the next merge
    window. The whole thing needs to be done in one go for the core
    changes which means that you're going to play that catchup game
    forever. Let's get over with it towards the end of the merge window'"

    * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
    pstore: Remove bogus format string definition
    vfs: change inode times to use struct timespec64
    pstore: Convert internal records to timespec64
    udf: Simplify calls to udf_disk_stamp_to_time
    fs: nfs: get rid of memcpys for inode times
    ceph: make inode time prints to be long long
    lustre: Use long long type to print inode time
    fs: add timespec64_truncate()

    Linus Torvalds
     

06 Jun, 2018

1 commit

  • struct timespec is not y2038 safe. Transition vfs to use
    y2038 safe struct timespec64 instead.

    The change was made with the help of the following coccinelle
    script. This catches about 80% of the changes.
    All the header file and logic changes are included in the
    first 5 rules. The rest are trivial substitutions.
    I avoid changing any of the function signatures or any other
    filesystem specific data structures to keep the patch simple
    for review.

    The script can be a little shorter by combining different cases.
    But this version was sufficient for my use case.

    virtual patch

    @ depends on patch @
    identifier now;
    @@
    - struct timespec
    + struct timespec64
    current_time ( ... )
    {
    - struct timespec now = current_kernel_time();
    + struct timespec64 now = current_kernel_time64();
    ...
    - return timespec_trunc(
    + return timespec64_trunc(
    ... );
    }

    @ depends on patch @
    identifier xtime;
    @@
    struct \( iattr \| inode \| kstat \) {
    ...
    - struct timespec xtime;
    + struct timespec64 xtime;
    ...
    }

    @ depends on patch @
    identifier t;
    @@
    struct inode_operations {
    ...
    int (*update_time) (...,
    - struct timespec t,
    + struct timespec64 t,
    ...);
    ...
    }

    @ depends on patch @
    identifier t;
    identifier fn_update_time =~ "update_time$";
    @@
    fn_update_time (...,
    - struct timespec *t,
    + struct timespec64 *t,
    ...) { ... }

    @ depends on patch @
    identifier t;
    @@
    lease_get_mtime( ... ,
    - struct timespec *t
    + struct timespec64 *t
    ) { ... }

    @te depends on patch forall@
    identifier ts;
    local idexpression struct inode *inode_node;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier fn_update_time =~ "update_time$";
    identifier fn;
    expression e, E3;
    local idexpression struct inode *node1;
    local idexpression struct inode *node2;
    local idexpression struct iattr *attr1;
    local idexpression struct iattr *attr2;
    local idexpression struct iattr attr;
    identifier i_xtime1 =~ "^i_[acm]time$";
    identifier i_xtime2 =~ "^i_[acm]time$";
    identifier ia_xtime1 =~ "^ia_[acm]time$";
    identifier ia_xtime2 =~ "^ia_[acm]time$";
    @@
    (
    (
    - struct timespec ts;
    + struct timespec64 ts;
    |
    - struct timespec ts = current_time(inode_node);
    + struct timespec64 ts = current_time(inode_node);
    )

    <+...
    (
    - timespec_equal(&inode_node->i_xtime, &ts)
    + timespec64_equal(&inode_node->i_xtime, &ts)
    |
    - timespec_equal(&ts, &inode_node->i_xtime)
    + timespec64_equal(&ts, &inode_node->i_xtime)
    |
    - timespec_compare(&inode_node->i_xtime, &ts)
    + timespec64_compare(&inode_node->i_xtime, &ts)
    |
    - timespec_compare(&ts, &inode_node->i_xtime)
    + timespec64_compare(&ts, &inode_node->i_xtime)
    |
    ts = current_time(e)
    |
    fn_update_time(..., &ts,...)
    |
    inode_node->i_xtime = ts
    |
    node1->i_xtime = ts
    |
    ts = inode_node->i_xtime
    |
    <+... attr1->ia_xtime ...+> = ts
    |
    ts = attr1->ia_xtime
    |
    ts.tv_sec
    |
    ts.tv_nsec
    |
    btrfs_set_stack_timespec_sec(..., ts.tv_sec)
    |
    btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
    |
    - ts = timespec64_to_timespec(
    + ts =
    ...
    -)
    |
    - ts = ktime_to_timespec(
    + ts = ktime_to_timespec64(
    ...)
    |
    - ts = E3
    + ts = timespec_to_timespec64(E3)
    |
    - ktime_get_real_ts(&ts)
    + ktime_get_real_ts64(&ts)
    |
    fn(...,
    - ts
    + timespec64_to_timespec(ts)
    ,...)
    )
    ...+>
    (

    )
    |
    - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
    + timespec64_equal(&node1->i_xtime1, &node2->i_xtime2)
    |
    - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
    + timespec64_equal(&node1->i_xtime1, &attr2->ia_xtime2)
    |
    - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
    + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
    |
    node1->i_xtime1 =
    - timespec_trunc(attr1->ia_xtime1,
    + timespec64_trunc(attr1->ia_xtime1,
    ...)
    |
    - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
    + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2,
    ...)
    |
    - ktime_get_real_ts(&attr1->ia_xtime1)
    + ktime_get_real_ts64(&attr1->ia_xtime1)
    |
    - ktime_get_real_ts(&attr.ia_xtime1)
    + ktime_get_real_ts64(&attr.ia_xtime1)
    )

    @ depends on patch @
    struct inode *node;
    struct iattr *attr;
    identifier fn;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    expression e;
    @@
    (
    - fn(node->i_xtime);
    + fn(timespec64_to_timespec(node->i_xtime));
    |
    fn(...,
    - node->i_xtime);
    + timespec64_to_timespec(node->i_xtime));
    |
    - e = fn(attr->ia_xtime);
    + e = fn(timespec64_to_timespec(attr->ia_xtime));
    )

    @ depends on patch forall @
    struct inode *node;
    struct iattr *attr;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier fn;
    @@
    {
    + struct timespec ts;
    <+...
    (
    + ts = timespec64_to_timespec(node->i_xtime);
    fn (...,
    - &node->i_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    fn (...,
    - &attr->ia_xtime,
    + &ts,
    ...);
    )
    ...+>
    }

    @ depends on patch forall @
    struct inode *node;
    struct iattr *attr;
    struct kstat *stat;
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier i_xtime =~ "^i_[acm]time$";
    identifier xtime =~ "^[acm]time$";
    identifier fn, ret;
    @@
    {
    + struct timespec ts;
    <+...
    (
    + ts = timespec64_to_timespec(node->i_xtime);
    ret = fn (...,
    - &node->i_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(node->i_xtime);
    ret = fn (...,
    - &node->i_xtime);
    + &ts);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    ret = fn (...,
    - &attr->ia_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    ret = fn (...,
    - &attr->ia_xtime);
    + &ts);
    |
    + ts = timespec64_to_timespec(stat->xtime);
    ret = fn (...,
    - &stat->xtime);
    + &ts);
    )
    ...+>
    }

    @ depends on patch @
    struct inode *node;
    struct inode *node2;
    identifier i_xtime1 =~ "^i_[acm]time$";
    identifier i_xtime2 =~ "^i_[acm]time$";
    identifier i_xtime3 =~ "^i_[acm]time$";
    struct iattr *attrp;
    struct iattr *attrp2;
    struct iattr attr ;
    identifier ia_xtime1 =~ "^ia_[acm]time$";
    identifier ia_xtime2 =~ "^ia_[acm]time$";
    struct kstat *stat;
    struct kstat stat1;
    struct timespec64 ts;
    identifier xtime =~ "^[acmb]time$";
    expression e;
    @@
    (
    \( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ;
    |
    node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
    |
    node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
    |
    node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
    |
    stat->xtime = node2->i_xtime1;
    |
    stat1.xtime = node2->i_xtime1;
    |
    \( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ;
    |
    \( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
    |
    - e = node->i_xtime1;
    + e = timespec64_to_timespec( node->i_xtime1 );
    |
    - e = attrp->ia_xtime1;
    + e = timespec64_to_timespec( attrp->ia_xtime1 );
    |
    node->i_xtime1 = current_time(...);
    |
    node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
    - e;
    + timespec_to_timespec64(e);
    |
    node->i_xtime1 = node->i_xtime3 =
    - e;
    + timespec_to_timespec64(e);
    |
    - node->i_xtime1 = e;
    + node->i_xtime1 = timespec_to_timespec64(e);
    )

    Signed-off-by: Deepa Dinamani

    Deepa Dinamani
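
    Why a 32-bit seconds field is not y2038 safe can be shown with a small
    check. This is an illustrative sketch: struct narrow_ts and struct
    wide_ts are hypothetical types modelling a timestamp with 32-bit and
    64-bit seconds, not the kernel's struct timespec or struct timespec64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical models of a 32-bit-seconds and a 64-bit-seconds timestamp. */
struct narrow_ts { int32_t tv_sec; long tv_nsec; };
struct wide_ts   { int64_t tv_sec; long tv_nsec; };

/* True if a seconds value can be stored without overflow in 32 bits. */
static bool fits_in_32bit_seconds(int64_t sec)
{
    return sec >= INT32_MIN && sec <= INT32_MAX;
}

/* Widening is always lossless, so it can be done unconditionally. */
static struct wide_ts widen(struct narrow_ts t)
{
    struct wide_ts w = { t.tv_sec, t.tv_nsec };
    return w;
}
```

    The largest value that fits is 2147483647 seconds after the epoch, which
    falls in January 2038; one second later no longer fits, which is the
    overflow the conversion is meant to avoid.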
     

25 May, 2018

1 commit

  • Allow users with CAP_CHOWN over the superblock of a filesystem to
    chown files when inode owner is invalid. Ordinarily the
    capable_wrt_inode_uidgid check is sufficient to allow access to files
    but when the underlying filesystem has uids or gids that don't map to
    the current user namespace it is not enough, so the chown permission
    checks need to be extended to allow this case.

    Calling chown on filesystem nodes whose uid or gid don't map is
    necessary if those nodes are going to be modified as writing back
    inodes which contain uids or gids that don't map is likely to cause
    filesystem corruption of the uid or gid fields.

    Once chown has been called the existing capable_wrt_inode_uidgid
    checks are sufficient to allow the owner of a superblock to do anything
    the global root user can do with an appropriate set of capabilities.

    An ordinary filesystem mountable by a userns root will limit all uids
    and gids to those in s_user_ns, and use INVALID_UID and INVALID_GID to
    flag all others. So having this added permission limited to just
    INVALID_UID and INVALID_GID is sufficient to handle every case on an
    ordinary filesystem.

    Of the virtual filesystems at least proc is known to set s_user_ns to
    something other than &init_user_ns, while at the same time presenting
    some files owned by GLOBAL_ROOT_UID. The mounter of proc in a user
    namespace should not be able to chown those files to get access to them.
    Limiting the relaxation in permission to just the minimum of allowing
    changing INVALID_UID and INVALID_GID prevents problems with cases like
    that.

    The original version of this patch was written by: Seth Forshee. I
    have rewritten and rethought this patch enough so it's really not the
    same thing (certainly it needs a different description), but he
    deserves credit for getting out there and getting the conversation
    started, and finding the potential gotchas and putting up with my
    semi-paranoid feedback.

    Inspired-by: Seth Forshee
    Acked-by: Seth Forshee
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
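
    The relaxed chown check described above can be modelled roughly as
    follows. This is a sketch under stated assumptions, not the kernel's
    chown_ok(): the struct, its field names and MODEL_INVALID_ID are
    hypothetical, and the usual "owner changes nothing" shortcut is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stands in for INVALID_UID / INVALID_GID in this model. */
#define MODEL_INVALID_ID UINT32_MAX

/* Hypothetical summary of the facts the check depends on. */
struct model_chown_ctx {
    uint32_t inode_uid;           /* MODEL_INVALID_ID if it does not map */
    bool cap_wrt_inode;           /* ordinary capability check passed */
    bool cap_over_superblock;     /* capability in the sb's user namespace */
};

static bool model_chown_ok(const struct model_chown_ctx *c)
{
    /* The existing rule: capable with respect to this inode's ids. */
    if (c->cap_wrt_inode)
        return true;
    /* The added rule: only for unmapped owners, and only for the sb owner. */
    if (c->inode_uid == MODEL_INVALID_ID && c->cap_over_superblock)
        return true;
    return false;
}
```

    Keeping the second rule tied to the invalid id is what stops the mounter
    of a filesystem such as proc from taking over files that have a valid
    owner outside its namespace.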
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

02 Mar, 2017

1 commit


11 Oct, 2016

1 commit

  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     

08 Oct, 2016

1 commit


28 Sep, 2016

1 commit

  • current_fs_time() uses struct super_block* as an argument.
    As per Linus's suggestion, this is changed to take struct
    inode* as a parameter instead. This is because the function
    is primarily meant for vfs inode timestamps.
    Also the function was renamed as per Arnd's suggestion.

    Change all calls to current_fs_time() to use the new
    current_time() function instead. current_fs_time() will be
    deleted.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Al Viro

    Deepa Dinamani
     

22 Sep, 2016

2 commits

  • Currently, notify_change() clears capabilities or IMA attributes by
    calling security_inode_killpriv() before calling into ->setattr. Thus it
    happens before any other permission checks in inode_change_ok() and user
    is thus allowed to trigger clearing of capabilities or IMA attributes
    for any file he can look up e.g. by calling chown for that file. This is
    unexpected and can lead to user DoSing a system.

    Fix the problem by calling security_inode_killpriv() at the end of
    inode_change_ok() instead of from notify_change(). At that moment we are
    sure user has permissions to do the requested change.

    References: CVE-2015-1350
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
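
    The ordering fix can be illustrated with a toy model in which the side
    effect (dropping file capabilities) happens only after the permission
    check has passed. All names here are hypothetical and the permission rule
    is deliberately simplified to "owner or privileged".

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a file carrying security attributes. */
struct model_priv_file {
    uint32_t owner;
    bool has_file_caps;
};

/*
 * Check first, modify second. A caller who fails the check gets an
 * error and the file's capabilities are left untouched.
 */
static int model_prepare(struct model_priv_file *f, uint32_t caller_uid,
                         bool caller_privileged)
{
    if (caller_uid != f->owner && !caller_privileged)
        return -EPERM;
    f->has_file_caps = false;   /* reached only with permission */
    return 0;
}
```

    In the buggy ordering the assignment would come before the check, so a
    failed chown attempt by an arbitrary user would still strip the
    attributes.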
     
  • inode_change_ok() will be responsible for clearing capabilities and IMA
    extended attributes and as such will need dentry. Give it as an argument
    to inode_change_ok() instead of an inode. Also rename inode_change_ok()
    to setattr_prepare() to better reflect that it also does some
    modifications in addition to checks.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

16 Sep, 2016

1 commit

  • This fixes a bug where the permission was not properly checked in
    overlayfs. The testcase is ltp/utimensat01.

    It is also cleaner and safer to do the permission checking in the vfs
    helper instead of the caller.

    This patch introduces an additional ia_valid flag ATTR_TOUCH (since
    touch(1) is the most obvious user of utimes(NULL)) that is passed into
    notify_change whenever the conditions for this special permission checking
    mode are met.

    Reported-by: Aihua Zhang
    Signed-off-by: Miklos Szeredi
    Tested-by: Aihua Zhang
    Cc: stable@vger.kernel.org # v3.18+

    Miklos Szeredi
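
    A minimal sketch of the special permission mode for "set times to now",
    assuming the rule is: refuse immutable files, allow the owner (or a
    capable caller), and otherwise require write access. The function name
    and boolean parameters are hypothetical simplifications of the checks a
    VFS helper would perform.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Model of the check applied when the caller asked for the current time
 * (utimes(NULL) style) rather than for explicit timestamps.
 */
static int model_may_touch(bool immutable, bool owner_or_capable,
                           bool has_write_access)
{
    if (immutable)
        return -EPERM;
    if (owner_or_capable)
        return 0;
    return has_write_access ? 0 : -EACCES;
}
```

    Doing this in the common helper means a stacked filesystem cannot forget
    the check, which is the class of bug the commit describes.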
     

06 Jul, 2016

1 commit

  • When a filesystem outside of init_user_ns is mounted it could have
    uids and gids stored in it that do not map to init_user_ns.

    The plan is to allow those filesystems to set i_uid to INVALID_UID and
    i_gid to INVALID_GID for unmapped uids and gids and then to handle
    that strange case in the vfs to ensure there is consistent robust
    handling of the weirdness.

    Upon a careful review of the vfs and filesystems about the only case
    where there is any possibility of confusion or trouble is when the
    inode is written back to disk. In that case filesystems typically
    read the inode->i_uid and inode->i_gid and write them to disk even
    when just an inode timestamp is being updated.

    Which leads to a rule that is very simple to implement and understand:
    inodes whose i_uid or i_gid is not valid may not be written.

    In dealing with access times this means treat those inodes as if the
    inode flag S_NOATIME was set. Reads of the inodes appear safe and
    useful, but any write or modification is disallowed. The only inode
    write that is allowed is a chown that sets the uid and gid on the
    inode to valid values. After such a chown the inode is normal and may
    be treated as such.

    Denying all writes to inodes with uids or gids unknown to the vfs also
    prevents several oddball cases where corruption would have occurred
    because the vfs does not have complete information.

    One problem case that is prevented is attempting to use the gid of a
    directory for new inodes where the directory's sgid bit is set but the
    directory's gid is not mapped.

    Another problem case avoided is attempting to update the evm hash
    after setxattr, removexattr, and setattr. As the evm hash includes
    the inode->i_uid or inode->i_gid, not knowing the uid or gid prevents
    a correct evm hash from being computed. evm hash verification also
    fails when i_uid or i_gid is unknown but that is essentially harmless
    as it does not cause filesystem corruption.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
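
    The rule "inodes whose i_uid or i_gid is not valid may not be written"
    can be sketched like this. It is a userspace model with hypothetical
    names; MODEL_NO_ID stands in for INVALID_UID and INVALID_GID, and the
    error code is chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NO_ID UINT32_MAX   /* stands in for INVALID_UID/INVALID_GID */

struct model_inode_ids { uint32_t uid; uint32_t gid; };

static bool model_has_unmapped_id(const struct model_inode_ids *i)
{
    return i->uid == MODEL_NO_ID || i->gid == MODEL_NO_ID;
}

/* Writes are refused while either id is unmapped. */
static int model_may_write(const struct model_inode_ids *i)
{
    return model_has_unmapped_id(i) ? -EACCES : 0;
}

/* Access time updates are skipped as if a "no atime" flag were set. */
static bool model_skip_atime(const struct model_inode_ids *i, bool noatime)
{
    return noatime || model_has_unmapped_id(i);
}
```

    Once a chown gives the inode valid ids, both helpers treat it like any
    other inode.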
     

28 Jun, 2016

1 commit

  • Add checks to notify_change to verify that uid and gid changes
    will map into the superblock's user namespace. If they do not,
    fail with -EOVERFLOW.

    This is mandatory so that filesystems don't have to even think
    of dealing with ia_uid and ia_gid that do not map.

    --EWB Moved the test from inode_change_ok to notify_change

    Signed-off-by: Seth Forshee
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Seth Forshee
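
    A rough model of the mapping check, assuming a user namespace that maps a
    single contiguous id range. struct model_idmap and both functions are
    hypothetical; real namespaces can hold several ranges and keep separate
    uid and gid maps.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-range id map: ids first .. first + count - 1. */
struct model_idmap { uint32_t first; uint32_t count; };

static bool model_id_maps(const struct model_idmap *m, uint32_t id)
{
    return id >= m->first && id - m->first < m->count;
}

/* Reject an ownership change whose target id the filesystem cannot store. */
static int model_check_owner_change(const struct model_idmap *m,
                                    bool set_uid, uint32_t uid,
                                    bool set_gid, uint32_t gid)
{
    if (set_uid && !model_id_maps(m, uid))
        return -EOVERFLOW;
    if (set_gid && !model_id_maps(m, gid))
        return -EOVERFLOW;
    return 0;
}
```

    Because the check runs before the filesystem's own handler, the handler
    can assume any id it is asked to store is representable.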
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

11 Jun, 2014

1 commit

  • The kernel has no concept of capabilities with respect to inodes; inodes
    exist independently of namespaces. For example, inode_capable(inode,
    CAP_LINUX_IMMUTABLE) would be nonsense.

    This patch changes inode_capable to check for uid and gid mappings and
    renames it to capable_wrt_inode_uidgid, which should make it more
    obvious what it does.

    Fixes CVE-2014-4014.

    Cc: Theodore Ts'o
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Chinner
    Cc: stable@vger.kernel.org
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
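
    The renamed check can be read as "the caller holds the capability in its
    namespace and both of the inode's ids map into that namespace". The
    sketch below models that with hypothetical types and one shared id range
    for uids and gids, which is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the caller's user namespace. */
struct model_userns {
    uint32_t first;        /* first mapped id */
    uint32_t count;        /* number of mapped ids */
    bool caller_has_cap;   /* capability held in this namespace */
};

static bool model_ns_maps(const struct model_userns *ns, uint32_t id)
{
    return id >= ns->first && id - ns->first < ns->count;
}

/* Capable with respect to an inode only if both of its ids are mapped. */
static bool model_capable_wrt_uidgid(const struct model_userns *ns,
                                     uint32_t i_uid, uint32_t i_gid)
{
    return ns->caller_has_cap &&
           model_ns_maps(ns, i_uid) &&
           model_ns_maps(ns, i_gid);
}
```

    Checking the gid as well as the uid is the part the commit adds to the
    picture: a mapped owner with an unmapped group no longer passes.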
     

06 Dec, 2013

1 commit

  • Currently notify_change directly updates i_version for size updates,
    which not only is counter to how all other fields are updated through
    struct iattr, but also breaks XFS, which needs inode updates to happen
    under its own lock, and synchronized to the structure that gets written
    to the log.

    Remove the update in the common code, and add it to btrfs and ext4.
    XFS already does a proper update internally and currently gets a
    double update with the existing code.

    IMHO this is 3.13 and -stable material and should go in through the XFS
    tree.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Andreas Dilger
    Acked-by: Jan Kara
    Reviewed-by: Dave Chinner
    Signed-off-by: Chris Mason
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

09 Nov, 2013

1 commit


20 Nov, 2012

1 commit

  • - Allow chown if CAP_CHOWN is present in the current user namespace
    and the uid of the inode maps into the current user namespace, and
    the destination uid or gid maps into the current user namespace.

    - Allow preserving setgid when changing an inode if CAP_FSETID is
    present in the current user namespace and the owner of the file has
    a mapping into the current user namespace.

    Acked-by: Serge E. Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

08 Sep, 2012

1 commit

  • Changing an inode's metadata may result in our not needing to appraise
    the file. In such cases, we must remove 'security.ima'.

    Changelog v1:
    - use ima_inode_post_setattr() stub function, if IMA_APPRAISE not configured

    Signed-off-by: Mimi Zohar
    Acked-by: Serge Hallyn
    Acked-by: Dmitry Kasatkin

    Mimi Zohar
     

14 Jul, 2012

1 commit


31 May, 2012

1 commit

  • When a file is truncated with truncate()/ftruncate() and then closed,
    iversion is not updated. This patch uses ATTR_SIZE flag as an indication
    to increment iversion.

    Mimi said:

    On fput(), i_version is used to detect and flag files that have changed
    and need to be re-measured in the IMA measurement policy. When a file
    is truncated with truncate()/ftruncate() and then closed, i_version is
    not updated. As a result, although the file has changed, it will not be
    re-measured and added to the IMA measurement list on subsequent access.

    Signed-off-by: Dmitry Kasatkin
    Acked-by: Mimi Zohar
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dmitry Kasatkin
     

03 May, 2012

1 commit


29 Feb, 2012

1 commit


04 Jan, 2012

1 commit


09 Aug, 2011

1 commit


21 Jul, 2011

2 commits

  • Let filesystems handle waiting for direct I/O requests themselves instead
    of doing it beforehand. This means filesystem-specific locks to prevent
    new dio references from appearing can be held. This is important to allow
    generalizing i_dio_count to non-DIO_LOCKING filesystems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • i_alloc_sem is a rather special rw_semaphore. It's the last one that may
    be released by a non-owner, and its write side is always mirrored by
    real exclusion. Its intended use is to wait for all pending direct I/O
    requests to finish before starting a truncate.

    Replace it with a hand-grown construct:

    - exclusion for truncates is already guaranteed by i_mutex, so it can
    simply fall away
    - the reader side is replaced by an i_dio_count member in struct inode
    that counts the number of pending direct I/O requests. Truncate can't
    proceed as long as it's non-zero
    - when i_dio_count reaches zero we wake up a pending truncate using
    wake_up_bit on a new bit in i_flags
    - new references to i_dio_count can't appear while we are waiting for
    it to read zero because the direct I/O count always needs i_mutex
    (or an equivalent like XFS's i_iolock) for starting a new operation.

    This scheme is much simpler, and saves the space of a spinlock_t and a
    struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
    system).

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
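
    The counting scheme in the list above can be sketched with C11 atomics.
    This single-threaded model only shows the bookkeeping; the struct and
    function names are hypothetical, and the real code sleeps and wakes a
    waiter where this sketch merely reports that a wakeup would be due.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical inode fragment holding the pending direct I/O count. */
struct model_dio_inode { atomic_int dio_count; };

/* Called when a direct I/O request is started. */
static void model_dio_begin(struct model_dio_inode *i)
{
    atomic_fetch_add(&i->dio_count, 1);
}

/*
 * Called when a request completes. Returns true when this was the last
 * pending request, which is the point at which a waiting truncate
 * would be woken up.
 */
static bool model_dio_end(struct model_dio_inode *i)
{
    return atomic_fetch_sub(&i->dio_count, 1) == 1;
}

/* Truncate may only proceed once no direct I/O is pending. */
static bool model_truncate_may_proceed(struct model_dio_inode *i)
{
    return atomic_load(&i->dio_count) == 0;
}
```

    The model relies on the same assumption the commit states: no new request
    can start while a truncate is waiting, because starting one needs a lock
    the truncate already holds.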
     

19 Jul, 2011

1 commit


29 May, 2011

1 commit

  • Some recent benchmarking on btrfs showed that a major scaling bottleneck
    on large systems on btrfs is currently the xattr lookup on every write.

    Why xattr lookup on every write I hear you ask?

    write wants to drop suid and security related xattrs that could set
    capabilities for executables. To do that it currently looks up
    security.capability on EVERY write (even for non executables) to decide
    whether to drop it or not.

    In btrfs this causes an additional tree walk, hitting some per file system
    locks and quite bad scalability. In a simple read workload on an 8S
    system I saw over 90% CPU time in spinlocks related to that.

    Chris Mason tells me this is also a problem in ext4, where it hits
    the global mbcache lock.

    This patch adds a simple per inode flag to avoid this problem. We only
    do the lookup once per file and then, if there is no xattr, cache
    the decision. All xattr changes clear the flag.

    I also used the same flag to avoid the suid check, although
    that one is pretty cheap.

    A file system can also set this flag when it creates the inode,
    if it has a cheap way to do so. This is done for some common file systems
    in follow-on patches.

    With this patch a major part of the lock contention disappears
    for btrfs. Some testing on smaller systems didn't show significant
    performance changes, but at least it helps the larger systems
    and is generally more efficient.

    v2: Rename is_sgid. add file system helper.
    Cc: chris.mason@oracle.com
    Cc: josef@redhat.com
    Cc: viro@zeniv.linux.org.uk
    Cc: agruen@linbit.com
    Cc: Serge E. Hallyn
    Signed-off-by: Andi Kleen
    Signed-off-by: Al Viro

    Andi Kleen
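
    The caching idea can be modelled in a few lines. The names below are
    hypothetical: nosec plays the role of the per inode flag, and the lookup
    counter exists only to make the saving visible.

```c
#include <stdbool.h>

/* Hypothetical inode fragment for this model. */
struct model_nosec_inode {
    bool nosec;               /* cached "nothing to drop" decision */
    bool has_security_xattr;  /* ground truth held by the filesystem */
    unsigned lookups;         /* how often the expensive path ran */
};

/* Stands in for the expensive xattr lookup. */
static bool model_xattr_lookup(struct model_nosec_inode *i)
{
    i->lookups++;
    return i->has_security_xattr;
}

/* Called on every write: is there anything privileged to drop? */
static bool model_needs_priv_drop(struct model_nosec_inode *i)
{
    if (i->nosec)
        return false;            /* cached negative result */
    if (model_xattr_lookup(i))
        return true;
    i->nosec = true;             /* remember that nothing was found */
    return false;
}

/* Any xattr change invalidates the cached decision. */
static void model_set_security_xattr(struct model_nosec_inode *i)
{
    i->has_security_xattr = true;
    i->nosec = false;
}
```

    Repeated writes to a file without the xattr then cost one lookup in
    total, and setting the xattr forces the next write to look again.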
     

31 Mar, 2011

1 commit