18 Dec, 2019

1 commit

  • Anything that walks all inodes on sb->s_inodes list without rescheduling
    risks softlockups.

    Previous efforts were made in 2 functions, see:

    c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
    ac05fbb inode: don't softlockup when evicting inodes

    but there hasn't been an audit of all walkers, so do that now. This
    also consistently moves the cond_resched() calls to the bottom of each
    loop in cases where it already exists.

    One loop remains: remove_dquot_ref(), because I'm not quite sure how
    to deal with that one w/o taking the i_lock.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Jan Kara
    Signed-off-by: Al Viro

    Eric Sandeen
     

07 Dec, 2019

1 commit

  • Pull vfs d_inode/d_flags memory ordering fixes from Al Viro:
    "Fallout from tree-wide audit for ->d_inode/->d_flags barriers use.
    Basically, the problem is that negative pinned dentries require
    careful treatment - unless ->d_lock is locked or parent is held at
    least shared, another thread can make them positive right under us.

    Most of the uses turned out to be safe - the main surprises as far as
    filesystems are concerned were

    - race in dget_parent() fastpath, that might end up with the caller
    observing the returned dentry _negative_, due to insufficient
    barriers. It is positive in memory, but we could end up seeing the
    wrong value of ->d_inode in CPU cache. Fixed.

    - manual checks that result of lookup_one_len_unlocked() is positive
    (and rejection of negatives). Again, insufficient barriers (we
    might end up with inconsistent observed values of ->d_inode and
    ->d_flags). Fixed by switching to a new primitive that does the
    checks itself and returns ERR_PTR(-ENOENT) instead of a negative
    dentry. That way we get rid of boilerplate converting negatives
    into ERR_PTR(-ENOENT) in the callers and have a single place to
    deal with the barrier-related mess - inside fs/namei.c rather than
    in every caller out there.

    The guts of pathname resolution *do* need to be careful - the race
    found by Ritesh is real, as well as several similar races.
    Fortunately, it turns out that we can take care of that with fairly
    local changes in there.

    The tree-wide audit had not been fun, and I hate the idea of repeating
    it. I think the right approach would be to annotate the places where
    we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse
    catch regressions. But I'm still not sure what would be the least
    invasive way of doing that and it's clearly the next cycle fodder"

    * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs/namei.c: fix missing barriers when checking positivity
    fix dget_parent() fastpath race
    new helper: lookup_positive_unlocked()
    fs/namei.c: pull positivity check into follow_managed()

    Linus Torvalds
     

16 Nov, 2019

1 commit

  • Most of the callers of lookup_one_len_unlocked() treat negatives as
    ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
    that a pinned positive dentry remains positive - its ->d_inode is
    stable, etc.; a pinned _negative_ dentry can become positive at any
    point as long as you are not holding its parent at least shared.
    So callers of lookup_one_len_unlocked() need to be careful;
    lookup_positive_unlocked() is safer and that's what the callers
    end up open-coding anyway.

    Signed-off-by: Al Viro

    Al Viro
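
The conversion described in this commit can be sketched in userspace C. This is only a toy model, not the kernel implementation: every toy_* name is hypothetical, the "lookup" just hands back its argument, and the memory barriers and dentry reference counting that the real helper has to deal with are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct dentry: a NULL d_inode means "negative". */
struct toy_dentry {
    void *d_inode;
};

#define TOY_MAX_ERRNO 4095

/* Userspace imitations of the ERR_PTR()/PTR_ERR()/IS_ERR() idiom. */
void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Hypothetical lookup that may hand back a negative dentry. */
struct toy_dentry *toy_lookup_unlocked(struct toy_dentry *d)
{
    assert(d != NULL);
    return d;
}

/* Wrapper: callers never see a negative dentry, only an error pointer. */
struct toy_dentry *toy_lookup_positive_unlocked(struct toy_dentry *d)
{
    struct toy_dentry *ret = toy_lookup_unlocked(d);

    if (!toy_is_err(ret) && ret->d_inode == NULL)
        return toy_err_ptr(-ENOENT);   /* negative becomes -ENOENT */
    return ret;
}
```

With a wrapper of this shape, a caller only has to test for an error pointer; the positivity check lives in one place instead of being open-coded at every call site.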
     

11 Nov, 2019

1 commit

  • Quota statistics are counted in 64-bit per-cpu counters. Reading sums the
    per-cpu fractions as a signed 64-bit int, filters out negative values and
    then reports the lower half as a signed 32-bit int.

    The result may look like:

    fs.quota.allocated_dquots = 22327
    fs.quota.cache_hits = -489852115
    fs.quota.drops = -487288718
    fs.quota.free_dquots = 22083
    fs.quota.lookups = -486883485
    fs.quota.reads = 22327
    fs.quota.syncs = 335064
    fs.quota.writes = 3088689

    Values bigger than 2^31-1 are reported as negative.

    All counters except "allocated_dquots" and "free_dquots" are monotonic,
    thus they should be reported as is without filtering negative values.

    The kernel doesn't have a generic helper for 64-bit sysctl yet,
    so let's at least use unsigned long.

    Link: https://lore.kernel.org/r/157337934693.2078.9842146413181153727.stgit@buzz
    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Jan Kara

    Konstantin Khlebnikov
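
The sign flip described in this commit can be reproduced with a few lines of userspace C. This is a minimal sketch, not the sysctl handler itself: both function names are hypothetical, and the sample value 3805115181 is an assumption, chosen as the smallest non-negative counter value whose lower 32 bits read back as the -489852115 shown above for fs.quota.cache_hits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the old reporting path: keep the lower half of
 * the (already non-negative) 64-bit sum and interpret it as signed 32-bit.
 */
int32_t report_as_s32(int64_t sum)
{
    uint32_t low;

    assert(sum >= 0);              /* negatives were filtered beforehand */
    low = (uint32_t)sum;           /* lower 32 bits of the counter */

    /* Two's-complement reinterpretation, spelled out in defined C. */
    if (low >= 0x80000000u)
        return (int32_t)((int64_t)low - 4294967296LL);
    return (int32_t)low;
}

/* Hypothetical model of the new reporting path: plain unsigned long. */
unsigned long report_as_ulong(int64_t sum)
{
    assert(sum >= 0);
    return (unsigned long)sum;
}
```

Calling report_as_s32(3805115181LL) yields -489852115, while report_as_ulong() returns 3805115181 for the same input. On platforms where unsigned long is 32 bits wide the counter still wraps at 2^32, which is why the commit message describes unsigned long as the least that can be done for now.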
     

06 Nov, 2019

1 commit


04 Nov, 2019

6 commits

  • Make dquot_get_state() gracefully handle a situation when there are no
    quota files present even though quotas are enabled.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Quota on and quota off are protected by s_umount semaphore held in
    exclusive mode since commit 7d6cd73d33b6 "quota: Hold s_umount in
    exclusive mode when enabling / disabling quotas". This makes it
    impossible for dquot_disable() to race with other enabling or disabling
    of quotas. Simplify the cleanup done by dquot_disable() based on this
    fact and also remove some stale comments. As a bonus this cleanup makes
    dquot_disable() properly handle a case when there are no quota inodes.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Now dquot_enable() has only two internal callers and both of them just
    need to update quota flags and don't need most of checks. Just drop
    dquot_enable() and fold necessary functionality into the two calling
    places.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Rename vfs_load_quota_inode() to dquot_load_quota_inode() to be
    consistent with naming of other functions used for enabling quota
    accounting from filesystems. Also export the function and add some
    sanity checks to assure filesystems are calling the function properly.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • We already have quota inode loaded when resuming quotas. Use
    vfs_load_quota() to avoid some pointless churn with the quota inode.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Factor out setting up of quota inode and eventual error cleanup from
    vfs_load_quota_inode(). This will simplify situation for filesystems
    that don't have any quota inodes.

    Signed-off-by: Jan Kara

    Jan Kara
     

01 Nov, 2019

2 commits

  • There is a race window where a dquot can be redirtied after we drop
    dq_list_lock inside dqput(), but before we grab dquot->dq_lock inside
    dquot_release():

    TASK1                                              TASK2 (chowner)
    ->dqput()
      we_slept:
        spin_lock(&dq_list_lock)
        if (dquot_dirty(dquot)) {
              spin_unlock(&dq_list_lock);
              dquot->dq_sb->dq_op->write_dquot(dquot);
              goto we_slept
        if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
              spin_unlock(&dq_list_lock);
              dquot->dq_sb->dq_op->release_dquot(dquot);
                                                       dqget()
                                                       mark_dquot_dirty()
                                                       dqput()
              goto we_slept;
        }

    So the dirty dquot will be released by TASK1, but on the next we_slept loop
    we detect this and call ->write_dquot() for it.
    XFSTEST: https://github.com/dmonakhov/xfstests/commit/440a80d4cbb39e9234df4d7240aee1d551c36107

    Link: https://lore.kernel.org/r/20191031103920.3919-2-dmonakhov@openvz.org
    CC: stable@vger.kernel.org
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     
  • Write only quotas which are dirty at entry.

    XFSTEST: https://github.com/dmonakhov/xfstests/commit/b10ad23566a5bf75832a6f500e1236084083cddc

    Link: https://lore.kernel.org/r/20191031103920.3919-1-dmonakhov@openvz.org
    CC: stable@vger.kernel.org
    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Dmitry Monakhov
     

04 Oct, 2019

2 commits


31 Jul, 2019

1 commit


11 Jul, 2019

1 commit

  • Pull ext2, udf and quota updates from Jan Kara:

    - some ext2 fixes and cleanups

    - a fix of udf bug when extending files

    - a fix of quota Q_XGETQSTAT[V] handling

    * tag 'for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
    ext2: Use kmemdup rather than duplicating its implementation
    quota: honor quota type in Q_XGETQSTAT[V] calls
    ext2: Always brelse bh on failure in ext2_iget()
    ext2: add missing brelse() in ext2_iget()
    ext2: Fix a typo in ext2_getattr argument
    ext2: fix a typo in comment
    ext2: add missing brelse() in ext2_new_inode()
    ext2: optimize ext2_xattr_get()
    ext2: introduce new helper for xattr entry comparison
    ext2: merge xattr next entry check to ext2_xattr_entry_valid()
    ext2: code cleanup for ext2_preread_inode()
    ext2: code cleanup by using test_opt() and clear_opt()
    doc: ext2: update description of quota options for ext2
    ext2: Strengthen xattr block checks
    ext2: Merge loops in ext2_xattr_set()
    ext2: introduce helper for xattr entry validation
    ext2: introduce helper for xattr header validation
    quota: add dqi_dirty_list description to comment of Dquot List Management

    Linus Torvalds
     

19 Jun, 2019

1 commit

  • Run the script below as root: dquot_add_space() will return -EDQUOT since
    __dquot_transfer() calls dquot_add_space() with flags=0, and
    dquot_add_space() thinks it's a preallocation. Fix it by setting flags to
    DQUOT_SPACE_WARN.

    mkfs.ext4 -O quota,project /dev/vdb
    mount -o prjquota /dev/vdb /mnt
    setquota -P 23 1 1 0 0 /dev/vdb
    dd if=/dev/zero of=/mnt/test-file bs=4K count=1
    chattr -p 23 test-file

    Fixes: 7b9ca4c61bc2 ("quota: Reduce contention on dq_data_lock")
    Signed-off-by: yangerkun
    Signed-off-by: Jan Kara

    yangerkun
     

20 May, 2019

1 commit


01 May, 2019

1 commit


25 Apr, 2019

2 commits

  • Local variable *reserved* of remove_dquot_ref() is only used when
    CONFIG_QUOTA_DEBUG is defined, but it is not enclosed in a
    CONFIG_QUOTA_DEBUG guard, which leads to an unused-but-set-variable
    warning when compiling.

    This patch encloses it in a CONFIG_QUOTA_DEBUG guard, as is done
    in add_dquot_ref().

    Signed-off-by: Jiang Biao
    Signed-off-by: Jan Kara

    Jiang Biao
     
  • We need to check return code only when calling ->read_dqblk(),
    so fix it properly.

    Signed-off-by: Chengguang Xu
    Signed-off-by: Jan Kara

    Chengguang Xu
     

26 Mar, 2019

2 commits


20 Jun, 2018

2 commits


09 Apr, 2018

1 commit

  • dquot_init() is never called in atomic context.
    This function is only set as a parameter of fs_initcall().

    Despite never getting called from atomic context,
    dquot_init() calls __get_free_pages() with GFP_ATOMIC,
    which waits busily for allocation.
    GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
    to avoid busy waiting and improve the possibility of successful allocation.

    This was found by a static analysis tool named DCNS, written by myself,
    and I also checked it manually.

    Signed-off-by: Jia-Ju Bai
    Signed-off-by: Jan Kara

    Jia-Ju Bai
     

29 Nov, 2017

1 commit

  • register_shrinker() might return -ENOMEM error since Linux 3.12.
    Call panic() as with other failure checks in this function if
    register_shrinker() failed.

    Fixes: 1d3d4437eae1 ("vmscan: per-node deferred work")
    Signed-off-by: Tetsuo Handa
    Cc: Jan Kara
    Cc: Michal Hocko
    Reviewed-by: Michal Hocko
    Signed-off-by: Jan Kara

    Tetsuo Handa
     

28 Nov, 2017

1 commit

  • Commit 6184fc0b8dd7 ("quota: Propagate error from ->acquire_dquot()")
    propagated the error from __dquot_initialize() to the caller, but we forgot
    to handle that error in add_dquot_ref(). So currently, if initializing
    quota accounting information fails for some inodes, we just ignore the
    error and go on accounting for the others, which is not a good
    implementation.

    In this patch, we choose to make the user aware of such an error, so that
    after quota has been turned on successfully we can be sure that the disk
    usage of all inodes is accounted, which is more reasonable.

    Suggested-by: Jan Kara
    Signed-off-by: Chao Yu
    Signed-off-by: Jan Kara

    Chao Yu
     

15 Nov, 2017

1 commit

  • Pull quota, ext2, isofs and udf fixes from Jan Kara:

    - two small quota error handling fixes

    - two isofs fixes for architectures with signed char

    - several udf block number overflow and signedness fixes

    - ext2 rework of mount option handling to avoid GFP_KERNEL allocation
    with spinlock held

    - ... it also contains a patch to implement auditing of responses to
    fanotify permission events. That should have been in the fanotify
    pull request but I mistakenly merged that patch into a wrong branch
    and noticed only now at which point I don't think it's worth rebasing
    and redoing.

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    quota: be aware of error from dquot_initialize
    quota: fix potential infinite loop
    isofs: use unsigned char types consistently
    isofs: fix timestamps beyond 2027
    udf: Fix some sign-conversion warnings
    udf: Fix signed/unsigned format specifiers
    udf: Fix 64-bit sign extension issues affecting blocks > 0x7FFFFFFF
    udf: Remove some outdate references from documentation
    udf: Avoid overflow when session starts at large offset
    ext2: Fix possible sleep in atomic during mount option parsing
    ext2: Parse mount options into a dedicated structure
    audit: Record fanotify access control decisions

    Linus Torvalds
     

14 Nov, 2017

1 commit


13 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

01 Nov, 2017

1 commit

  • In dquot_writeback_dquots(), we write back dquots from the dirty dquots
    list. There is a potential infinite loop if ->write_dquot() fails
    and we forget to remove the dquot from the list. This patch clears the
    dirty bit anyway to avoid it.

    Signed-off-by: zhangyi (F)
    Signed-off-by: Jan Kara

    zhangyi (F)
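
The loop hazard described in this commit can be illustrated with a toy userspace model. This is a sketch under stated assumptions rather than the kernel code: struct toy_dquot, its write_fails switch and all toy_* functions are hypothetical, and a plain array scan stands in for the dirty list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dquot {
    bool dirty;            /* still "on the dirty list" */
    bool write_fails;      /* simulate ->write_dquot() returning an error */
    int  write_attempts;   /* how often writeback tried this entry */
};

/* Simulated write: returns 0 on success, -1 on failure. */
int toy_write_dquot(struct toy_dquot *dq)
{
    dq->write_attempts++;
    return dq->write_fails ? -1 : 0;
}

/* Return the first dirty entry, or NULL when nothing is dirty. */
struct toy_dquot *toy_first_dirty(struct toy_dquot *dqs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (dqs[i].dirty)
            return &dqs[i];
    return NULL;
}

/* Write back until nothing is dirty; report the first error seen. */
int toy_writeback(struct toy_dquot *dqs, size_t n)
{
    struct toy_dquot *dq;
    int ret = 0;

    assert(dqs != NULL || n == 0);
    while ((dq = toy_first_dirty(dqs, n)) != NULL) {
        int err = toy_write_dquot(dq);

        /*
         * Clear the dirty state even when the write failed. If a failed
         * entry stayed dirty, toy_first_dirty() would keep returning it
         * and this loop would never terminate.
         */
        dq->dirty = false;
        if (err && !ret)
            ret = err;
    }
    return ret;
}
```

In this model a failing entry is attempted exactly once and the error is still reported to the caller, so termination does not come at the price of hiding the failure.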
     

10 Oct, 2017

1 commit

  • Eryu has reported that since commit 7b9ca4c61bc2 "quota: Reduce
    contention on dq_data_lock" test generic/233 occasionally fails. This is
    caused by the fact that since that commit we don't generate warning and
    set grace time for quota allocations that have DQUOT_SPACE_NOFAIL set
    (these are for example some metadata allocations in ext4). We need these
    allocations to behave regularly wrt warning generation and grace time
    setting so fix the code to return to the original behavior.

    Reported-and-tested-by: Eryu Guan
    CC: stable@vger.kernel.org
    Fixes: 7b9ca4c61bc278b771fb57d6290a31ab1fc7fdac
    Signed-off-by: Jan Kara

    Jan Kara
     

18 Sep, 2017

1 commit


08 Sep, 2017

1 commit

  • Pull quota scaling updates from Jan Kara:
    "This contains changes to make the quota subsystem more scalable.

    Reportedly it improves number of files created per second on ext4
    filesystem on fast storage by about a factor of 2x"

    * 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
    quota: Add lock annotations to struct members
    quota: Reduce contention on dq_data_lock
    fs: Provide __inode_get_bytes()
    quota: Inline dquot_[re]claim_reserved_space() into callsite
    quota: Inline inode_{incr,decr}_space() into callsites
    quota: Inline functions into their callsites
    ext4: Disable dirty list tracking of dquots when journalling quotas
    quota: Allow disabling tracking of dirty dquots in a list
    quota: Remove dq_wait_unused from dquot
    quota: Move locking into clear_dquot_dirty()
    quota: Do not dirty bad dquots
    quota: Fix possible corruption of dqi_flags
    quota: Propagate ->quota_read errors from v2_read_file_info()
    quota: Fix error codes in v2_read_file_info()
    quota: Push dqio_sem down to ->read_file_info()
    quota: Push dqio_sem down to ->write_file_info()
    quota: Push dqio_sem down to ->get_next_id()
    quota: Push dqio_sem down to ->release_dqblk()
    quota: Remove locking for writing to the old quota format
    quota: Do not acquire dqio_sem for dquot overwrites in v2 format
    ...

    Linus Torvalds
     

18 Aug, 2017

3 commits

  • dq_data_lock is currently used to protect all modifications of quota
    accounting information, consistency of quota accounting on the inode,
    and dquot pointers from inode. As a result contention on the lock can be
    pretty heavy.

    Reduce the contention on the lock by protecting quota accounting
    information by a new dquot->dq_dqb_lock and consistency of quota
    accounting with inode usage by inode->i_lock.

    This change reduces time to create 500000 files on ext4 on ramdisk by 50
    different processes in separate directories by 6% when user quota is
    turned on. When those 50 processes belong to 50 different users, the
    improvement is about 9%.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • dquot_claim_reserved_space() and dquot_reclaim_reserved_space() have
    only a single callsite. Inline them there.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • inode_incr_space() and inode_decr_space() have only two callsites.
    Inline them there as that will make locking changes simpler.

    Signed-off-by: Jan Kara

    Jan Kara