27 Aug, 2019

1 commit


04 Jul, 2019

1 commit

  • Add a new xfs_bulk_ireq flag to constrain the iteration to a single AG.
    If the passed-in startino value is zero then we start with the first
    inode in the AG that the user passes in; otherwise, we iterate only
    within the same AG as the passed-in inode.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Allison Collins
    Reviewed-by: Brian Foster

    Darrick J. Wong
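
The AG-constrained iteration described in this commit can be modelled in userspace. The sketch below is a toy, not the kernel code: the struct `demo_ireq`, the flag `DEMO_IREQ_AGNO`, and the fixed `DEMO_AGINO_BITS` split of an inode number into an AG number and an AG-relative part are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the low DEMO_AGINO_BITS bits of an inode number index
 * within an AG and the remaining high bits select the AG. */
#define DEMO_AGINO_BITS 32
#define DEMO_IREQ_AGNO  (1u << 0)   /* hypothetical "single AG" flag */

struct demo_ireq {
    uint64_t startino;  /* inode to start from, or 0 */
    uint32_t flags;
    uint32_t agno;      /* AG chosen by the caller; used when startino == 0 */
};

static inline uint32_t demo_ino_to_agno(uint64_t ino)
{
    return (uint32_t)(ino >> DEMO_AGINO_BITS);
}

/* AG the walk is confined to when the flag is set. */
static inline uint32_t demo_target_agno(const struct demo_ireq *req)
{
    return req->startino ? demo_ino_to_agno(req->startino) : req->agno;
}

/* First inode number the walk should consider. */
static inline uint64_t demo_first_ino(const struct demo_ireq *req)
{
    if ((req->flags & DEMO_IREQ_AGNO) && req->startino == 0)
        return (uint64_t)req->agno << DEMO_AGINO_BITS;
    return req->startino;
}

/* Should the walk visit this inode? */
static inline bool demo_should_visit(const struct demo_ireq *req, uint64_t ino)
{
    if (!(req->flags & DEMO_IREQ_AGNO))
        return true;    /* unconstrained walk */
    return demo_ino_to_agno(ino) == demo_target_agno(req);
}
```

With the flag set and `startino == 0`, the walk starts at the first inode of the caller's AG; with a nonzero `startino`, the AG is derived from that inode and the walk stops at the AG boundary.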
     

03 Jul, 2019

3 commits


29 Jun, 2019

1 commit

  • There are many, many xfs header files which are included but
    unneeded (or included twice) in the xfs code, so remove them.

    nb: xfs_linux.h includes about 9 headers for everyone, so those
    explicit includes get removed by this. I'm not sure what the
    preference is, but if we wanted explicit includes everywhere,
    a followup patch could remove those xfs_*.h includes from
    xfs_linux.h and move them into the files that need them.
    Or it could be left as-is.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     

23 Apr, 2019

1 commit

  • Widen the incore inode's i_delayed_blks counter to be a 64-bit integer.
    This is necessary to fix an integer overflow problem that can be
    reproduced easily now that we use the counter to track blocks that are
    assigned to the inode in memory but not on disk. This includes actual
    delalloc reservations as well as real extents in the COW fork that
    are waiting to be remapped into the data fork.

    These 'delayed mapping' blocks can easily exceed 2^32 blocks if one
    creates a very large sparse file of size approximately 2^33 bytes with
    one byte written every 2^23 bytes, sets a very large COW extent size
    hint of 2^23 blocks, reflinks the first file into a second file, and
    then writes a single byte every 2^23 blocks in the original file.

    When this happens, we'll try to create approximately 1024 2^23 extent
    reservations in the COW fork, which will overflow the counter and cause
    problems.

    Note that on x64 we end up filling a 4-byte gap in the structure so this
    doesn't increase the incore size.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Allison Collins
    Reviewed-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
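
The arithmetic behind the overflow can be checked directly: roughly 1024 reservations of 2^23 blocks each add up to 2^33 blocks, which does not fit in 32 bits. The sketch below assumes an unsigned 32-bit counter for the "old" case; the function and macro names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define DEMO_NR_RESERVATIONS 1024u                /* "approximately 1024" */
#define DEMO_BLOCKS_PER_RES  (UINT64_C(1) << 23)  /* one 2^23-block extent */

/* Accumulate the reservations in a 32-bit counter (assumed old width). */
static inline uint32_t demo_count_blocks_32(void)
{
    uint32_t total = 0;

    for (unsigned int i = 0; i < DEMO_NR_RESERVATIONS; i++)
        total += (uint32_t)DEMO_BLOCKS_PER_RES;   /* wraps modulo 2^32 */
    return total;
}

/* Accumulate the same reservations in a 64-bit counter (new width). */
static inline uint64_t demo_count_blocks_64(void)
{
    uint64_t total = 0;

    for (unsigned int i = 0; i < DEMO_NR_RESERVATIONS; i++)
        total += DEMO_BLOCKS_PER_RES;
    return total;
}
```

In this model the 32-bit sum wraps all the way back to zero, while the 64-bit sum holds the full 2^33.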
     

27 Jul, 2018

1 commit

  • Replace the IRELE macro with a proper function so that we can do proper
    typechecking and so that we can stop open-coding iput in scrub, which
    means that we'll be able to ftrace inode lifetimes going through scrub
    correctly.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Carlos Maiolino
    Reviewed-by: Brian Foster

    Darrick J. Wong
     

07 Jun, 2018

1 commit

  • Remove the verbose license text from XFS files and replace them
    with SPDX tags. This does not change the license of any of the code,
    merely refers to the common, up-to-date license files in LICENSES/

    This change was mostly scripted. fs/xfs/Makefile and
    fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
    and modified by the following command:

    for f in `git grep -l "GNU General" fs/xfs/` ; do
        echo $f
        cat $f | awk -f hdr.awk > $f.new
        mv -f $f.new $f
    done

    And the hdr.awk script that did the modification (including
    detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
    is as follows:

    $ cat hdr.awk
    BEGIN {
        hdr = 1.0
        tag = "GPL-2.0"
        str = ""
    }

    /^ \* This program is free software/ {
        hdr = 2.0;
        next
    }

    /any later version./ {
        tag = "GPL-2.0+"
        next
    }

    /^ \*\// {
        if (hdr > 0.0) {
            print "// SPDX-License-Identifier: " tag
            print str
            print $0
            str=""
            hdr = 0.0
            next
        }
        print $0
        next
    }

    /^ \* / {
        if (hdr > 1.0)
            next
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
        next
    }

    /^ \*/ {
        if (hdr > 0.0)
            next
        print $0
        next
    }

    // {
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
    }

    END { }
    $

    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

16 May, 2018

1 commit


10 May, 2018

12 commits

  • The function 'xfs_qm_dqiterate' doesn't iterate dquots at all, it
    iterates all dquot blocks of a quota inode and clears the counters.
    Therefore, change the name to something more descriptive so that we can
    introduce a real dquot iterator later.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • DQALLOC is only ever used with xfs_qm_dqget*, and the only flag that the
    _dqget family of functions cares about is DQALLOC. Therefore, change
    it to a boolean 'can alloc?' flag for the dqget interfaces where that
    makes sense.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Darrick J. Wong
     
  • The quota initialization code needs an "uncached" variant of _dqget to
    read in default quota limits and timers before the dquot cache is fully
    set up. We've already split up _dqget into its component pieces so
    create a fourth variant to address this need, and make dqread internal
    to xfs_dquot.c again.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • Quotacheck only runs during mount, which means that there are no other
    processes in the system that could be doing chown or chproj. Therefore
    there's no potential for racing to attach dquots to the inode so we can
    drop all the ILOCK and race detection bits from quotacheck.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • There are two uses of dqget here -- one is to return the dquot for a
    given type and id, and the other is to return the dquot for a given type
    and inode. Those are two separate things, so split them into two
    smaller functions.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • The flags argument is always zero, get rid of it.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     
  • When dquot flush or purge fail there's no need to spam the logs, we've
    already logged the IO error or fs shutdown that caused the flush
    failures.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster

    Darrick J. Wong
     
  • The log item flags contain a field that is protected by the AIL
    lock - the XFS_LI_IN_AIL flag. We use non-atomic RMW operations to
    set and clear these flags, but most of the updates and checks are
    not done with the AIL lock held and so are susceptible to update
    races.

    Fix this by changing the log item flags to use atomic bitops rather
    than be reliant on the AIL lock for update serialisation.

    Signed-Off-By: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
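
The difference between a plain read-modify-write and an atomic bit operation can be sketched in userspace with C11 atomics. This is only an analogy for the kernel's bitops, not the XFS code: the struct, the bit numbers, and the `demo_*` helper names are assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers for two log item flags. */
#define DEMO_LI_IN_AIL   0
#define DEMO_LI_ABORTED  1

struct demo_log_item {
    atomic_ulong li_flags;
};

/* Atomically set bit nr; concurrent updates to other bits are not lost. */
static inline void demo_set_bit(int nr, atomic_ulong *flags)
{
    atomic_fetch_or(flags, 1UL << nr);
}

/* Atomically clear bit nr. */
static inline void demo_clear_bit(int nr, atomic_ulong *flags)
{
    atomic_fetch_and(flags, ~(1UL << nr));
}

/* Read bit nr. */
static inline bool demo_test_bit(int nr, atomic_ulong *flags)
{
    return (atomic_load(flags) >> nr) & 1UL;
}

/* Atomically clear bit nr and report whether it was previously set. */
static inline bool demo_test_and_clear_bit(int nr, atomic_ulong *flags)
{
    unsigned long mask = 1UL << nr;

    return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}
```

A non-atomic `flags |= mask` compiles to a separate load, OR, and store, so two threads updating different bits without a common lock can overwrite each other's change. Each helper above performs the update as one indivisible operation.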
     
  • Add an xfs_dqblk verifier so that it can check the uuid on V5 filesystems;
    it calls the existing xfs_dquot_verify verifier to validate the
    xfs_disk_dquot_t contained inside it. This lets us move the uuid
    verification out of the crc verifier, which makes little sense.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     
  • It's a bit dicey to pass in the smaller xfs_disk_dquot and then cast it to
    something larger; pass in the full xfs_dqblk so we know the caller has sent
    us the right thing. Rename the function to xfs_dqblk_repair for
    clarity.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     
  • Long ago the flags argument was used to determine whether to issue warnings
    about corruptions, but that's done elsewhere now and the flag is unused
    here, so remove it.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     
  • Move xfs_buf_incore out of line and make it the only way to look up
    a buffer in the buffer cache from outside the buffer cache. Convert
    the external users of _xfs_buf_find() to xfs_buf_incore() and make
    _xfs_buf_find() static.

    Signed-Off-By: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Carlos Maiolino
    Reviewed-by: Darrick J. Wong
    [darrick: actually rename xfs_incore -> xfs_buf_incore]
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

03 Apr, 2018

1 commit

  • xfs_dir_ialloc() rolls the current transaction when allocation of a new
    inode required the space manager to perform an allocation and replenish
    the Inode btree.

    None of the callers of xfs_dir_ialloc() need to know if the
    transaction was committed. Hence this commit removes the "committed"
    argument of xfs_dir_ialloc.

    Signed-off-by: Chandan Rajendra
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Chandan Rajendra
     

13 Jan, 2018

1 commit

  • Starting with commit 57e734423ad ("vsprintf: refactor %pK code out of
    pointer"), the behavior of the raw '%p' printk format specifier was
    changed to print a 32-bit hash of the pointer value to avoid leaking
    kernel pointers into dmesg. For most situations that's good.

    This is /undesirable/ behavior when we're trying to debug XFS, however,
    so define a PTR_FMT that prints the actual pointer when we're in debug
    mode.

    Note that %p for tracepoints still prints the raw pointer, so in the
    long run we could consider rewriting some of these messages as
    tracepoints.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Dave Chinner

    Darrick J. Wong
     

09 Jan, 2018

2 commits


03 Jan, 2018

2 commits


09 Dec, 2017

1 commit

  • If we create a new file we will need an inode, and usually some metadata
    in the parent directory. Aiming for everything to go well despite the
    lack of a reservation leads to dirty transactions cancelled under a heavy
    create/delete load. This patch removes those nospace transactions, which
    will lead to slightly earlier ENOSPC on some workloads, but instead
    prevent file system shutdowns due to cancelling dirty transactions for
    others.

    A customer could observe assertion failures and shutdowns due to
    cancellation of dirty transactions during heavy NFS workloads as shown
    below:

    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262

    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246] [] dump_stack+0x63/0x81
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262] [] warn_slowpath_common+0x8a/0xc0
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264] [] warn_slowpath_null+0x1a/0x20
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285] [] asswarn+0x33/0x40 [xfs]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308] [] xfs_create+0x7be/0x7d0 [xfs]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329] [] xfs_generic_create+0x1fb/0x2e0 [xfs]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348] [] xfs_vn_mknod+0x14/0x20 [xfs]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366] [] xfs_vn_create+0x13/0x20 [xfs]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380] [] vfs_create+0xd5/0x140
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390] [] do_nfsd_create+0x499/0x610 [nfsd]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396] [] nfsd3_proc_create+0x135/0x210 [nfsd]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401] [] nfsd_dispatch+0xc3/0x210 [nfsd]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416] [] svc_process_common+0x453/0x6f0 [sunrpc]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423] [] svc_process+0x113/0x1f0 [sunrpc]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427] [] nfsd+0x10f/0x180 [nfsd]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432] [] ? nfsd_destroy+0x80/0x80 [nfsd]
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438] [] kthread+0xd8/0xf0
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441] [] ? kthread_create_on_node+0x1b0/0x1b0
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451] [] ret_from_fork+0x42/0x70
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453] [] ? kthread_create_on_node+0x1b0/0x1b0
    2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---

    2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x4ee/0x7d0 [xfs]

    2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
    2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

02 Sep, 2017

1 commit


24 Jul, 2017

1 commit

  • If a dquot has an id of U32_MAX, the next lookup index increment
    overflows the uint32_t back to 0. This starts the lookup sequence
    over from the beginning, repeats indefinitely and results in a
    livelock.

    Update xfs_qm_dquot_walk() to explicitly check for the lookup
    overflow and exit the loop.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
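
The livelock and its fix can be reproduced with a small userspace model. The sketch below is not `xfs_qm_dquot_walk()`: the sorted array stands in for the kernel's lookup structure, and all `demo_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an indexed lookup: return the position of the first
 * id >= start in a sorted array, or n if there is none. */
static inline size_t demo_lookup_from(const uint32_t *ids, size_t n,
                                      uint32_t start)
{
    size_t i = 0;

    while (i < n && ids[i] < start)
        i++;
    return i;
}

/* Visit every id once, one lookup per step; return the number visited. */
static inline size_t demo_walk(const uint32_t *ids, size_t n)
{
    uint32_t next_index = 0;
    size_t visited = 0;

    for (;;) {
        size_t pos = demo_lookup_from(ids, n, next_index);

        if (pos == n)
            break;                  /* nothing left to visit */
        visited++;

        next_index = ids[pos] + 1;  /* wraps to 0 when id == UINT32_MAX */
        if (next_index == 0)
            break;                  /* overflow: stop instead of restarting */
    }
    return visited;
}
```

Without the `next_index == 0` check, an entry with id `UINT32_MAX` would reset the lookup to the start of the array and the loop would never terminate.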
     

19 Jun, 2017

1 commit

  • Reclaim during quotacheck can lead to deadlocks on the dquot flush
    lock:

    - Quotacheck populates a local delwri queue with the physical dquot
      buffers.
    - Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and
      dirties all of the dquots.
    - Reclaim kicks in and attempts to flush a dquot whose buffer is
      already queued on the quotacheck queue. The flush succeeds but
      queueing to the reclaim delwri queue fails as the backing buffer is
      already queued. The flush unlock is now deferred to I/O completion
      of the buffer from the quotacheck queue.
    - The dqadjust bulkstat continues and dirties the recently flushed
      dquot once again.
    - Quotacheck proceeds to the xfs_qm_flush_one() walk which requires
      the flush lock to update the backing buffers with the in-core
      recalculated values. It deadlocks on the redirtied dquot as the
      flush lock was already acquired by reclaim, but the buffer resides
      on the local delwri queue which isn't submitted until the end of
      quotacheck.

    This is reproduced by running quotacheck on a filesystem with a
    couple million inodes in low memory (512MB-1GB) situations. This is
    a regression as of commit 43ff2122e6 ("xfs: on-stack delayed write
    buffer lists"), which removed a trylock and buffer I/O submission
    from the quotacheck dquot flush sequence.

    Quotacheck first resets and collects the physical dquot buffers in a
    delwri queue. Then, it traverses the filesystem inodes via bulkstat,
    updates the in-core dquots, flushes the corrected dquots to the
    backing buffers and finally submits the delwri queue for I/O. Since
    the backing buffers are queued across the entire quotacheck
    operation, dquot reclaim cannot possibly complete a dquot flush
    before quotacheck completes.

    Therefore, quotacheck must submit the buffer for I/O in order to
    cycle the flush lock and flush the dirty in-core dquot to the
    buffer. Add a delwri queue buffer push mechanism to submit an
    individual buffer for I/O without losing the delwri queue status and
    use it from quotacheck to avoid the deadlock. This restores
    quotacheck behavior to as before the regression was introduced.

    Reported-by: Martin Svec
    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

26 Apr, 2017

2 commits

  • The quotacheck error handling of the delwri buffer list assumes the
    resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
    on the buffers that are dequeued. This can lead to assert failures
    on buffer release and possibly other locking problems.

    Move this code to a delwri queue cancel helper function to
    encapsulate the logic required to properly release buffers from a
    delwri queue. Update the helper to clear the delwri queue flag and
    call it from quotacheck.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • The kbuild test robot caught this; in debug code we have another
    caller of do_div with a 32-bit dividend (j) which is caught now
    that we are using the kernel-supplied do_div.

    None of the values used here are 64-bit; just use simple division.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     

28 Jan, 2017

1 commit

  • Quotacheck runs at mount time in situations where quota accounting must
    be recalculated. In doing so, it uses bulkstat to visit every inode in
    the filesystem. Historically, every inode processed during quotacheck
    was released and immediately tagged for reclaim because quotacheck runs
    before the superblock is marked active by the VFS. In other words,
    the final iput() led to an immediate ->destroy_inode() call, which
    allowed the XFS background reclaim worker to start reclaiming inodes.

    Commit 17c12bcd3 ("xfs: when replaying bmap operations, don't let
    unlinked inodes get reaped") marks the XFS superblock active sooner as
    part of the mount process to support caching inodes processed during log
    recovery. This occurs before quotacheck and thus means all inodes
    processed by quotacheck are inserted to the LRU on release. The
    s_umount lock is held until the mount has completed and thus prevents
    the shrinkers from operating on the sb. This means that quotacheck can
    excessively populate the inode LRU and lead to OOM conditions on systems
    without sufficient RAM.

    Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
    inodes processed by quotacheck. This causes ->drop_inode() to return 1
    and in turn causes iput_final() to evict the inode. This preserves the
    original quotacheck behavior and prevents it from overloading the LRU
    and running out of memory.

    CC: stable@vger.kernel.org # v4.9
    Reported-by: Martin Svec
    Signed-off-by: Brian Foster
    Reviewed-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

08 Nov, 2016

1 commit

  • The open-coded pattern:

    ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)

    is all over the xfs code; provide a new helper
    xfs_iext_count(ifp) to count the number of inline extents
    in an inode fork.

    [dchinner: pick up several missed conversions]

    Signed-off-by: Eric Sandeen
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Eric Sandeen
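
The helper replaces a repeated division with a named function. A minimal userspace sketch follows, assuming a 16-byte extent record and a fork structure that tracks its inline data size in bytes; `demo_bmbt_rec`, `demo_ifork`, and `demo_iext_count()` are simplified stand-ins, not the kernel definitions.

```c
#include <stdint.h>

/* Simplified stand-in for an on-disk extent record (assumed 16 bytes). */
struct demo_bmbt_rec {
    uint64_t l0;
    uint64_t l1;
};

/* Simplified stand-in for an inode fork. */
struct demo_ifork {
    int if_bytes;   /* bytes of extent records held in the fork */
};

/* Number of extent records in the fork: bytes divided by record size. */
static inline unsigned int demo_iext_count(const struct demo_ifork *ifp)
{
    return (unsigned int)(ifp->if_bytes / (int)sizeof(struct demo_bmbt_rec));
}
```

Callers then write `demo_iext_count(ifp)` instead of repeating the division and cast at every site.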
     

06 Apr, 2016

1 commit

  • Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
    that returns a transaction with all the required log and block reservations,
    and which allows passing transaction flags directly to avoid the cumbersome
    _xfs_trans_alloc interface.

    While we're at it we also get rid of the transaction type argument that has
    been superfluous since we stopped supporting the non-CIL logging mode. The
    guts of it will be removed in another patch.

    [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     

08 Feb, 2016

1 commit

  • Default quotas are set globally for historical reasons. IRIX only
    supported user and project quotas, and the default quota was only
    applied to user quotas.

    In Linux, when a default quota is set, all quota types inherit the
    same default value.

    A user with a quota limit larger than the default quota value will
    still be limited to the default value, because the group quotas also
    inherit the default quotas, unless the group to which the user
    belongs has a custom quota limit set.

    This patch splits the default quota value by quota type, allowing
    each quota type to have a different default value.

    Default time limits are still set globally. XFS does not set a
    per-user/group timer, but a single global timer. Changing this
    behavior would require some changes in the user-space tools and
    other bugs to be fixed.

    Signed-off-by: Carlos Maiolino
    Reviewed-by: Eric Sandeen
    Signed-off-by: Dave Chinner

    Carlos Maiolino
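
Splitting the default by quota type amounts to keeping one default per type instead of a single shared value. The sketch below illustrates that idea only; the enum, the structures, and the single `bhardlimit` field are assumptions and do not reproduce the kernel's data layout.

```c
#include <stdint.h>

enum demo_qtype {
    DEMO_QTYPE_USER,
    DEMO_QTYPE_GROUP,
    DEMO_QTYPE_PROJ,
    DEMO_QTYPE_NR
};

/* Hypothetical per-type default; 0 means "no default set". */
struct demo_def_quota {
    uint64_t bhardlimit;
};

struct demo_quotainfo {
    struct demo_def_quota defq[DEMO_QTYPE_NR];  /* one default per type */
};

/* Limit that applies to an id: its own limit if set, else the default
 * for its quota type. */
static inline uint64_t demo_effective_limit(const struct demo_quotainfo *qi,
                                            enum demo_qtype type,
                                            uint64_t own_limit)
{
    return own_limit ? own_limit : qi->defq[type].bhardlimit;
}
```

With separate slots, a default on user quotas no longer implies the same default on group quotas.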
     

12 Nov, 2015

1 commit

  • Pull xfs updates from Dave Chinner:
    "There is nothing really major here - the only significant addition is
    the per-mount operation statistics infrastructure. Otherwise there's
    various ACL, xattr, DAX, AIO and logging fixes, and a smattering of
    small cleanups and fixes elsewhere.

    Summary:

    - per-mount operational statistics in sysfs
    - fixes for concurrent aio append write submission
    - various logging fixes
    - detection of zeroed logs and invalid log sequence numbers on v5 filesystems
    - memory allocation failure message improvements
    - a bunch of xattr/ACL fixes
    - fdatasync optimisation
    - miscellaneous other fixes and cleanups"

    * tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (39 commits)
    xfs: give all workqueues rescuer threads
    xfs: fix log recovery op header validation assert
    xfs: Fix error path in xfs_get_acl
    xfs: optimise away log forces on timestamp updates for fdatasync
    xfs: don't leak uuid table on rmmod
    xfs: invalidate cached acl if set via ioctl
    xfs: Plug memory leak in xfs_attrmulti_attr_set
    xfs: Validate the length of on-disk ACLs
    xfs: invalidate cached acl if set directly via xattr
    xfs: xfs_filemap_pmd_fault treats read faults as write faults
    xfs: add ->pfn_mkwrite support for DAX
    xfs: DAX does not use IO completion callbacks
    xfs: Don't use unwritten extents for DAX
    xfs: introduce BMAPI_ZERO for allocating zeroed extents
    xfs: fix inode size update overflow in xfs_map_direct()
    xfs: clear PF_NOFREEZE for xfsaild kthread
    xfs: fix an error code in xfs_fs_fill_super()
    xfs: stats are no longer dependent on CONFIG_PROC_FS
    xfs: simplify /proc teardown & error handling
    xfs: per-filesystem stats counter implementation
    ...

    Linus Torvalds
     

07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access to one of two watermarks lower than "min" which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites:

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
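
The flag split described in this commit can be illustrated with a small model. The bit values below are arbitrary, and the `DEMO_*` macros and `demo_*` helpers are hypothetical names; only the relationships between the flags follow the commit message.

```c
#include <stdbool.h>

typedef unsigned int demo_gfp_t;

/* Arbitrary bit values for this sketch. */
#define DEMO_GFP_DIRECT_RECLAIM  0x04u  /* caller may sleep in direct reclaim */
#define DEMO_GFP_KSWAPD_RECLAIM  0x08u  /* caller wants kswapd woken */

/* Per the commit: "willing to enter direct reclaim and wake kswapd". */
#define DEMO_GFP_WAIT  (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)

/* Helper in the spirit of gfpflags_allow_blocking(): a caller may block
 * only if it allows direct reclaim. */
static inline bool demo_allow_blocking(demo_gfp_t flags)
{
    return (flags & DEMO_GFP_DIRECT_RECLAIM) != 0;
}

/* Mempool-style caller: give up direct reclaim, keep waking kswapd. */
static inline demo_gfp_t demo_nonblocking_variant(demo_gfp_t flags)
{
    return flags & ~DEMO_GFP_DIRECT_RECLAIM;
}

/* Historical-style check that tests any bit of the composite mask. */
static inline bool demo_old_style_check(demo_gfp_t flags)
{
    return (flags & DEMO_GFP_WAIT) != 0;
}
```

For flags produced by `demo_nonblocking_variant()`, `demo_old_style_check()` still returns true because the kswapd bit remains set, while `demo_allow_blocking()` correctly returns false. That is the false positive the commit message warns about.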