06 Dec, 2018

1 commit

  • commit 7b38460dc8e4eafba06c78f8e37099d3b34d473c upstream.

    Kanda Motohiro reported that expanding a tiny xattr into a large xattr
    fails on XFS because we remove the tiny xattr from a shortform fork and
    then try to re-add it after converting the fork to extents format having
    not removed the ATTR_REPLACE flag. This fails because the attr is no
    longer present, causing a fs shutdown.

    This is derived from the patch in his bug report, but we really
    shouldn't ignore a nonzero retval from the remove call.
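
    The failure mode above can be sketched with a toy model. Everything here
    (struct attr_fork, attr_remove, attr_add, attr_grow, and the flag value)
    is hypothetical and only mirrors the logic described in this message, not
    the actual XFS code.

```c
#include <errno.h>
#include <stdbool.h>

#define ATTR_REPLACE 0x1	/* hypothetical flag value, for this sketch only */

/* Hypothetical toy model of an attr fork holding at most one attribute. */
struct attr_fork {
	bool has_attr;
	bool extents_format;
};

/* Remove the attribute; fails if it is not there. */
static int attr_remove(struct attr_fork *f)
{
	if (!f->has_attr)
		return -ENODATA;
	f->has_attr = false;
	return 0;
}

/* Add the attribute; a "replace" request requires an existing attribute. */
static int attr_add(struct attr_fork *f, int flags)
{
	if ((flags & ATTR_REPLACE) && !f->has_attr)
		return -ENODATA;
	f->has_attr = true;
	return 0;
}

/* Grow a tiny attribute into one that no longer fits in shortform. */
int attr_grow(struct attr_fork *f, int flags)
{
	if (flags & ATTR_REPLACE) {
		int error = attr_remove(f);

		if (error)
			return error;		/* do not ignore a failed remove */
		flags &= ~ATTR_REPLACE;		/* old attr is gone: plain add */
	}
	f->extents_format = true;		/* convert the fork */
	return attr_add(f, flags);
}
```

    Without the line that clears ATTR_REPLACE, attr_add() in this model would
    fail because the attribute it is asked to replace has just been removed.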

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199119
    Reported-by: kanda.motohiro@gmail.com
    Reviewed-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Ben Hutchings
    Signed-off-by: Sasha Levin

    Darrick J. Wong
     

10 Nov, 2018

1 commit

  • [ Upstream commit a606ebdb859e78beb757dfefa08001df366e2ef5 ]

    The truncate transaction does not ever modify the inode btree, but
    includes an associated log reservation. Update
    xfs_calc_itruncate_reservation() to remove the reservation
    associated with inobt updates.

    [Amir: This commit was merged for kernel v4.16 and a twin commit was
    merged for xfsprogs v4.16. As a result, a small xfs filesystem
    formatted with features -m rmapbt=1,reflink=1 using mkfs.xfs
    version >= v4.16 cannot be mounted with kernel < v4.16.

    For example, xfstests generic/17{1,2,3} format a small fs and
    when trying to mount it, they fail with an assert on this very
    demonic line:

    XFS (vdc): Log size 3075 blocks too small, minimum size is 3717 blocks
    XFS (vdc): AAIEEE! Log failed size checks. Abort!
    XFS: Assertion failed: 0, file: src/linux/fs/xfs/xfs_log.c, line: 666

    The simple solution for stable kernels is to apply this patch,
    because mkfs.xfs v4.16 is already in the wild, so we have to
    assume that xfs filesystems with a "too small" log exist.
    Regardless, xfsprogs maintainers should also consider reverting
    the twin patch to stop creating those filesystems for the sake
    of users with unpatched kernels.]

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Cc: # v4.9+
    Signed-off-by: Amir Goldstein
    Reviewed-by: Darrick J . Wong
    Signed-off-by: Sasha Levin

    Brian Foster
     

09 Aug, 2018

3 commits

  • commit bb3d48dcf86a97dc25fe9fc2c11938e19cb4399a upstream.

    xfs_attr3_leaf_create may have errored out before instantiating a buffer,
    for example if the blkno is out of range. In that case there is no work
    to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops
    if we try.

    This also seems to fix a flaw where the original error from
    xfs_attr3_leaf_create gets overwritten in the cleanup case, and it
    removes a pointless assignment to bp which isn't used after this.
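
    A minimal sketch of the pattern, assuming hypothetical stand-ins
    (create_leaf, shrink, convert_to_leaf) rather than the real XFS
    functions: clean up only if a buffer was actually produced, and keep the
    cleanup result in a separate variable so the original error survives.

```c
#include <errno.h>
#include <stddef.h>

struct buf { int blkno; };

static struct buf leaf_buf;
int shrink_calls;	/* counts cleanup attempts, for illustration */

/* Hypothetical: may fail before or after a buffer has been set up. */
static int create_leaf(int blkno, struct buf **bpp)
{
	*bpp = NULL;
	if (blkno < 0)
		return -EINVAL;		/* failed before a buffer existed */
	leaf_buf.blkno = blkno;
	*bpp = &leaf_buf;
	return blkno == 0 ? -ENOSPC : 0;	/* may fail with a buffer */
}

/* Hypothetical cleanup that itself reports an error. */
static int shrink(struct buf *bp)
{
	(void)bp;
	shrink_calls++;
	return -EIO;
}

int convert_to_leaf(int blkno)
{
	struct buf *bp;
	int error = create_leaf(blkno, &bp);

	if (error) {
		/* Nothing to undo if no buffer was ever instantiated. */
		if (bp) {
			int error2 = shrink(bp);

			(void)error2;	/* never overwrite the original error */
		}
		return error;
	}
	return 0;
}
```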

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969
    Reported-by: Xu, Wen
    Tested-by: Xu, Wen
    Signed-off-by: Eric Sandeen
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Cc: Eduardo Valentin
    Signed-off-by: Greg Kroah-Hartman

    Eric Sandeen
     
  • commit afca6c5b2595fc44383919fba740c194b0b76aff upstream.

    A recent fuzzed filesystem image caused random dcache corruption
    when the reproducer was run. This often showed up as panics in
    lookup_slow() on a null inode->i_ops pointer when doing pathwalks.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    ....
    Call Trace:
    lookup_slow+0x44/0x60
    walk_component+0x3dd/0x9f0
    link_path_walk+0x4a7/0x830
    path_lookupat+0xc1/0x470
    filename_lookup+0x129/0x270
    user_path_at_empty+0x36/0x40
    path_listxattr+0x98/0x110
    SyS_listxattr+0x13/0x20
    do_syscall_64+0xf5/0x280
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    but had many different failure modes including deadlocks trying to
    lock the inode that was just allocated or KASAN reports of
    use-after-free violations.

    The cause of the problem was a corrupt INOBT on a v4 fs where the
    root inode was marked as free in the inobt record. Hence when we
    allocated an inode, it chose the root inode to allocate, found it in
    the cache and re-initialised it.

    We recently fixed a similar inode allocation issue caused by inobt
    record corruption problem in xfs_iget_cache_miss() in commit
    ee457001ed6c ("xfs: catch inode allocation state mismatch
    corruption"). This change adds similar checks to the cache-hit path
    to catch it, and turns the reproducer into a corruption shutdown
    situation.
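
    The kind of check involved might look like the sketch below. The struct,
    the function name and the result codes are hypothetical; the only facts
    taken from these commit messages are that an inode being allocated must
    have a zero mode and zero block count.

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch. */
enum { CHECK_OK = 0, CHECK_NOENT = -1, CHECK_CORRUPT = -2 };

/* Hypothetical subset of the inode fields that matter here. */
struct inode_state {
	unsigned int mode;		/* 0 means "free" */
	unsigned long long nblocks;
};

int check_free_state(const struct inode_state *ip, bool allocating)
{
	if (allocating) {
		/* The inobt said "free", so the inode must look free. */
		if (ip->mode != 0 || ip->nblocks != 0)
			return CHECK_CORRUPT;
		return CHECK_OK;
	}
	/* An ordinary lookup of a free inode is simply "not found". */
	if (ip->mode == 0)
		return CHECK_NOENT;
	return CHECK_OK;
}
```

    Running the same check on both the cache-miss and the cache-hit path
    means a stale in-use inode is rejected wherever it is found.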

    Reported-by: Wen Xu
    Signed-Off-By: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Carlos Maiolino
    Reviewed-by: Darrick J. Wong
    [darrick: fix typos in comment]
    Signed-off-by: Darrick J. Wong
    Cc: Eduardo Valentin
    Signed-off-by: Greg Kroah-Hartman

    Dave Chinner
     
  • commit ee457001ed6c6f31ddad69c24c1da8f377d8472d upstream.

    We recently came across a V4 filesystem causing memory corruption
    due to a newly allocated inode being set up twice and being added to
    the superblock inode list twice. From code inspection, the only way
    this could happen is if a newly allocated inode was not marked as
    free on disk (i.e. di_mode wasn't zero).

    Running the metadump on an upstream debug kernel fails during inode
    allocation like so:

    XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 838
    ------------[ cut here ]------------
    kernel BUG at fs/xfs/xfs_message.c:114!
    invalid opcode: 0000 [#1] PREEMPT SMP
    CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    RIP: 0010:assfail+0x28/0x30
    RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202
    RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000
    RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b
    RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30
    R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10
    FS: 00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0
    Call Trace:
    xfs_ialloc+0x383/0x570
    xfs_dir_ialloc+0x6a/0x2a0
    xfs_create+0x412/0x670
    xfs_generic_create+0x1f7/0x2c0
    ? capable_wrt_inode_uidgid+0x3f/0x50
    vfs_mkdir+0xfb/0x1b0
    SyS_mkdir+0xcf/0xf0
    do_syscall_64+0x73/0x1a0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

    Extracting the inode number we crashed on from an event trace and
    looking at it with xfs_db:

    xfs_db> inode 184452204
    xfs_db> p
    core.magic = 0x494e
    core.mode = 0100644
    core.version = 2
    core.format = 2 (extents)
    core.nlinkv2 = 1
    core.onlink = 0
    .....

    Confirms that it is not a free inode on disk. xfs_repair
    also trips over this inode:

    .....
    zero length extent (off = 0, fsbno = 0) in ino 184452204
    correcting nextents for inode 184452204
    bad attribute fork in inode 184452204, would clear attr fork
    bad nblocks 1 for inode 184452204, would reset to 0
    bad anextents 1 for inode 184452204, would reset to 0
    imap claims in-use inode 184452204 is free, would correct imap
    would have cleared inode 184452204
    .....
    disconnected inode 184452204, would move to lost+found

    And so we have a situation where the directory structure and the
    inobt think the inode is free, but the inode on disk thinks it is
    still in use. Where this corruption came from is not possible to
    diagnose, but we can detect it and prevent the kernel from oopsing
    on lookup. The reproducer now results in:

    $ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
    mkdir: cannot create directory ‘/mnt/scratch/00’: File exists
    mkdir: cannot create directory ‘/mnt/scratch/01’: File exists
    mkdir: cannot create directory ‘/mnt/scratch/03’: Structure needs cleaning
    mkdir: cannot create directory ‘/mnt/scratch/04’: Input/output error
    mkdir: cannot create directory ‘/mnt/scratch/05’: Input/output error
    ....

    And this corruption shutdown:

    [ 54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not marked free on disk
    [ 54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x425/0x670
    [ 54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #443
    [ 54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    [ 54.852859] Call Trace:
    [ 54.853531] dump_stack+0x85/0xc5
    [ 54.854385] xfs_trans_cancel+0x197/0x1c0
    [ 54.855421] xfs_create+0x425/0x670
    [ 54.856314] xfs_generic_create+0x1f7/0x2c0
    [ 54.857390] ? capable_wrt_inode_uidgid+0x3f/0x50
    [ 54.858586] vfs_mkdir+0xfb/0x1b0
    [ 54.859458] SyS_mkdir+0xcf/0xf0
    [ 54.860254] do_syscall_64+0x73/0x1a0
    [ 54.861193] entry_SYSCALL_64_after_hwframe+0x42/0xb7
    [ 54.862492] RIP: 0033:0x7fb73bddf547
    [ 54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
    [ 54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73bddf547
    [ 54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffdaa55449a
    [ 54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a8670dd0
    [ 54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 00000000000001ff
    [ 54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffdaa553500
    [ 54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1024 of file fs/xfs/xfs_trans.c. Return address = ffffffff814cd050
    [ 54.882790] XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem
    [ 54.884597] XFS (loop0): Please umount the filesystem and rectify the problem(s)

    Note that this crash is only possible on v4 filesystems or v5
    filesystems mounted with the ikeep mount option. For all other V5
    filesystems, this problem cannot occur because we don't read inodes
    we are allocating from disk - we simply overwrite them with the new
    inode information.

    Signed-Off-By: Dave Chinner
    Reviewed-by: Carlos Maiolino
    Tested-by: Carlos Maiolino
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Cc: Eduardo Valentin
    Signed-off-by: Greg Kroah-Hartman

    Dave Chinner
     

11 Jul, 2018

2 commits

  • commit 80660f20252d6f76c9f203874ad7c7a4a8508cf8 upstream.

    The function's return values are confusing given the way the function is
    named. We expect a true or false return value, but it actually returns
    0/-errno, which makes the code very confusing. Change the function to
    return a bool: true if DAX is supported and false if it is not.
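
    The confusion is easy to see side by side. The two functions below are
    hypothetical stand-ins, not the real kernel helpers, and the error code
    is an assumption chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Old style: 0 means "supported", a negative errno means "not supported". */
int dax_supported_errno(bool device_has_dax)
{
	return device_has_dax ? 0 : -EOPNOTSUPP;
}

/* New style: the return value answers the question the name asks. */
bool dax_supported(bool device_has_dax)
{
	return device_has_dax;
}
```

    With the old convention, "if (dax_supported_errno(dev))" is true exactly
    when DAX is not supported, which reads as the opposite of what the name
    suggests.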

    Signed-off-by: Dave Jiang
    Signed-off-by: Ross Zwisler
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Dave Jiang
     
  • commit ba23cba9b3bdc967aabdc6ff1e3e9b11ce05bb4f upstream.

    Change bdev_dax_supported so it takes a bdev parameter. This enables
    multi-device filesystems like xfs to check that a dax device can work for
    the particular filesystem. Once that's in place, actually fix all the
    parts of XFS where we need to be able to distinguish between datadev and
    rtdev.

    This patch fixes the problem where we screw up the dax support checking
    in xfs if the datadev and rtdev have different dax capabilities.

    Signed-off-by: Darrick J. Wong
    [rez: Re-added __bdev_dax_supported() for !CONFIG_FS_DAX cases]
    Signed-off-by: Ross Zwisler
    Reviewed-by: Eric Sandeen
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     

05 Jun, 2018

2 commits

  • commit a27ba2607e60312554cbcd43fc660b2c7f29dc9c upstream.

    The struct xfs_agfl v5 header was originally introduced with
    unexpected padding that caused the AGFL to operate with one less
    slot than intended. The header has since been packed, but the fix
    left an incompatibility for users who upgrade from an old kernel
    with the unpacked header to a newer kernel with the packed header
    while the AGFL happens to wrap around the end. The newer kernel
    recognizes one extra slot at the physical end of the AGFL that the
    previous kernel did not. The new kernel will eventually attempt to
    allocate a block from that slot, which contains invalid data, and
    cause a crash.

    This condition can be detected by comparing the active range of the
    AGFL to the count. While this detects a padding mismatch, it can
    also trigger false positives for unrelated flcount corruption. Since
    we cannot distinguish a size mismatch due to padding from unrelated
    corruption, we can't trust the AGFL enough to simply repopulate the
    empty slot.

    Instead, avoid unnecessarily complex detection logic and use a
    solution that can handle any form of flcount corruption that slips
    through read verifiers: distrust the entire AGFL and reset it to an
    empty state. Any valid blocks within the AGFL are intentionally
    leaked. This requires xfs_repair to rectify (which was already
    necessary based on the state the AGFL was found in). The reset
    mitigates the side effect of the padding mismatch problem from a
    filesystem crash to a free space accounting inconsistency. The
    generic approach also means that this patch can be safely backported
    to kernels with or without a packed struct xfs_agfl.

    Check the AGF for an invalid freelist count on initial read from
    disk. If detected, set a flag on the xfs_perag to indicate that a
    reset is required before the AGFL can be used. In the first
    transaction that attempts to use a flagged AGFL, reset it to empty,
    warn the user about the inconsistency and allow the freelist fixup
    code to repopulate the AGFL with new blocks. The xfs_perag flag is
    cleared to eliminate the need for repeated checks on each block
    allocation operation.

    This allows kernels that include the packing fix commit 96f859d52bcb
    ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
    to handle older unpacked AGFL formats without a filesystem crash.
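
    Comparing the active range to the count could be sketched as follows.
    The struct layout, field names and function name are simplified
    assumptions for illustration; only the idea of comparing the range
    between the first and last index against the stored count comes from
    this message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified view of the AGF freelist fields: first and last
 * index a circular array of agfl_size slots, and count is the number of
 * blocks the header claims are on the list.
 */
struct agf_freelist {
	uint32_t first;
	uint32_t last;
	uint32_t count;
};

bool agfl_needs_reset(const struct agf_freelist *agf, uint32_t agfl_size)
{
	uint32_t active;

	/* Indexes or counts outside the array cannot be trusted at all. */
	if (agf->first >= agfl_size || agf->last >= agfl_size)
		return true;
	if (agf->count > agfl_size)
		return true;

	if (agf->count == 0)
		active = 0;
	else if (agf->last >= agf->first)
		active = agf->last - agf->first + 1;
	else
		active = agfl_size - agf->first + agf->last + 1; /* wrapped */

	return active != agf->count;
}
```

    A wrapped list that is consistent for one array size is off by one for
    an array that is one slot larger, which is how the padding mismatch
    shows up in this model.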

    Suggested-by: Dave Chinner
    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Dave Chiluk
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Brian Foster
     
  • commit a78ee256c325ecfaec13cafc41b315bd4e1dd518 upstream.

    The AGFL size calculation is about to get more complex, so let's turn
    the macro into a function first and remove the macro.

    Signed-off-by: Dave Chinner
    [darrick: forward port to newer kernel, simplify the helper]
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Signed-off-by: Greg Kroah-Hartman

    Dave Chinner
     

30 May, 2018

1 commit

  • [ Upstream commit 8c81dd46ef3c416b3b95e3020fb90dbd44e6140b ]

    Forcing the log to disk after reading the agf is wrong, we might be
    calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held.

    This can cause a deadlock when racing a fstrim with a filesystem
    shutdown.

    The deadlock has been identified due to a miscalculation bug in device-mapper
    dm-thin, which returns lack of space to its users earlier than the device itself
    really runs out of space, changing the device-mapper volume into an error state.

    The problem happened while filling the filesystem with a single file,
    triggering the bug in device-mapper, consequently causing an IO error
    and shutting down the filesystem.

    If such a file is removed and fstrim is executed before XFS finishes the
    shutdown process, the fstrim process will end up holding the buffer
    lock and going to sleep on the cil wait queue.

    At this point, the shutdown process will try to wake up all the threads
    waiting on the cil wait queue, but to do this it will try to take the
    same buffer lock already held by fstrim, locking up the filesystem.

    Signed-off-by: Carlos Maiolino
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Carlos Maiolino
     

09 May, 2018

1 commit

  • commit 7d83fb14258b9961920cd86f0b921caaeb3ebe85 upstream.

    During the "insert range" fallocate operation, i_size grows by the
    specified 'len' bytes. XFS verifies that i_size + len < s_maxbytes, as
    it should. But this comparison is done using the signed 'loff_t', and
    'i_size + len' can wrap around to a negative value, causing the check to
    incorrectly pass, resulting in an inode with "negative" i_size. This is
    possible on 64-bit platforms, where XFS sets s_maxbytes = LLONG_MAX.
    ext4 and f2fs don't run into this because they set a smaller s_maxbytes.

    Fix it by using subtraction instead.
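
    A minimal sketch of the rearranged comparison, assuming a hypothetical
    helper name and S_MAXBYTES as a stand-in for s_maxbytes. It also assumes
    0 <= i_size <= S_MAXBYTES and len >= 0, so the subtraction itself cannot
    overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the superblock limit; LLONG_MAX as described for XFS. */
#define S_MAXBYTES LLONG_MAX

/*
 * Return true if growing a file of i_size bytes by len bytes would exceed
 * S_MAXBYTES. Assumes 0 <= i_size <= S_MAXBYTES and len >= 0.
 */
bool insert_range_too_big(long long i_size, long long len)
{
	/*
	 * "i_size + len > S_MAXBYTES" could overflow the signed type.
	 * Written as a subtraction, every intermediate value stays in range.
	 */
	return i_size > S_MAXBYTES - len;
}
```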

    Reproducer:
    xfs_io -f file -c "truncate $(((1< # v4.1+
    Originally-From: Eric Biggers
    Signed-off-by: Eric Biggers
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    [darrick: fix signed integer addition overflow too]
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     

03 Mar, 2018

2 commits

  • [ Upstream commit 3a3882ff26fbdbaf5f7e13f6a0bccfbf7121041d ]

    xfs_qm_init_quotainfo() does not check result of register_shrinker()
    which was tagged as __must_check recently, reported by sparse.

    Signed-off-by: Aliaksei Karaliou
    [darrick: move xfs_qm_destroy_quotainos nearer xfs_qm_init_quotainos]
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Aliaksei Karaliou
     
  • [ Upstream commit 2196881566225f3c3428d1a5f847a992944daa5b ]

    xfs_qm_destroy_quotainfo() does not destroy quotainfo->qi_tree_lock,
    although it does destroy quotainfo->qi_quotaofflock.

    Signed-off-by: Aliaksei Karaliou
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Aliaksei Karaliou
     

04 Feb, 2018

5 commits

  • [ Upstream commit 373b0589dc8d58bc09c9a28d03611ae4fb216057 ]

    Now that the inode item writeback errors are fixed, it's time to fix the same
    problem in the dquot code.

    Although there were no reports of users hitting this bug in dquot code (at least
    none I've seen), the bug is there and I was already planning to fix it when the
    correct approach to fix the inodes part was decided.

    This patch aims to fix the same problem in dquot code, regarding failed buffers
    being unable to be resubmitted once they are flush locked.

    Tested with the test case recently sent to the fstests list by Hou Tao.

    Reviewed-by: Brian Foster
    Signed-off-by: Carlos Maiolino
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Carlos Maiolino
     
  • [ Upstream commit 22a6c83777ac7c17d6c63891beeeac24cf5da450 ]

    Fix some complaints from the UBSAN about signed integer addition overflows.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     
  • [ Upstream commit d210a9874b8f6166579408131cb74495caff1958 ]

    percpu_counter_init failure path doesn't clean up &btp->bt_lru list.
    Call list_lru_destroy in that error path. Similarly register_shrinker
    error path is not handled.

    While it is unlikely that these error paths will be triggered, it is not
    impossible; the latter in particular might fail on large NUMA systems. Let's
    handle the failures to make the code more robust.
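
    The usual way to handle such failures is to unwind in reverse order of
    setup. The sketch below uses hypothetical names (struct buftarg,
    init_step, buftarg_init) and a fail_at parameter that exists only to
    exercise each error path; it is not the actual kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical resources standing in for the LRU, counter and shrinker. */
struct buftarg {
	bool lru_ready;
	bool counter_ready;
	bool shrinker_ready;
};

/* Hypothetical init step that can be forced to fail. */
static int init_step(bool *ready, bool fail)
{
	if (fail)
		return -ENOMEM;
	*ready = true;
	return 0;
}

int buftarg_init(struct buftarg *bt, int fail_at)
{
	int error;

	bt->lru_ready = bt->counter_ready = bt->shrinker_ready = false;

	error = init_step(&bt->lru_ready, fail_at == 1);
	if (error)
		goto out;
	error = init_step(&bt->counter_ready, fail_at == 2);
	if (error)
		goto out_destroy_lru;
	error = init_step(&bt->shrinker_ready, fail_at == 3);
	if (error)
		goto out_destroy_counter;
	return 0;

	/* Tear down in reverse order of setup. */
out_destroy_counter:
	bt->counter_ready = false;
out_destroy_lru:
	bt->lru_ready = false;
out:
	return error;
}
```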

    Noticed-by: Tetsuo Handa
    Signed-off-by: Michal Hocko
    Acked-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Michal Hocko
     
  • [ Upstream commit 509955823cc9cc225c05673b1b83d70ca70c5c60 ]

    As part of testing log recovery with dm_log_writes, Amir Goldstein
    discovered an error in the deferred ops recovery that led to corruption
    of the filesystem metadata if a reflink+rmap filesystem happened to shut
    down midway through a CoW remap:

    "This is what happens [after failed log recovery]:

    "Phase 1 - find and verify superblock...
    "Phase 2 - using internal log
    " - zero log...
    " - scan filesystem freespace and inode maps...
    " - found root inode chunk
    "Phase 3 - for each AG...
    " - scan (but don't clear) agi unlinked lists...
    " - process known inodes and perform inode discovery...
    " - agno = 0
    "data fork in regular inode 134 claims CoW block 376
    "correcting nextents for inode 134
    "bad data fork in inode 134
    "would have cleared inode 134"

    Hou Tao dissected the log contents of exactly such a crash:

    "According to the implementation of xfs_defer_finish(), these ops should
    be completed in the following sequence:

    "Have been done:
    "(1) CUI: Oper (160)
    "(2) BUI: Oper (161)
    "(3) CUD: Oper (194), for CUI Oper (160)
    "(4) RUI A: Oper (197), free rmap [0x155, 2, -9]

    "Should be done:
    "(5) BUD: for BUI Oper (161)
    "(6) RUI B: add rmap [0x155, 2, 137]
    "(7) RUD: for RUI A
    "(8) RUD: for RUI B

    "Actually be done by xlog_recover_process_intents()
    "(5) BUD: for BUI Oper (161)
    "(6) RUI B: add rmap [0x155, 2, 137]
    "(7) RUD: for RUI B
    "(8) RUD: for RUI A

    "So the rmap entry [0x155, 2, -9] for COW should be freed firstly,
    then a new rmap entry [0x155, 2, 137] will be added. However, as we can see
    from the log record in post_mount.log (generated after umount) and the trace
    print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap
    entry [0x155, 2, -9] are freed."

    When reconstructing the internal log state from the log items found on
    disk, it's required that deferred ops replay in exactly the same order
    that they would have had the filesystem not gone down. However,
    replaying unfinished deferred ops can create /more/ deferred ops. These
    new deferred ops are finished in the wrong order. This causes fs
    corruption and replay crashes, so let's create a single defer_ops to
    handle the subsequent ops created during replay, then use one single
    transaction at the end of log recovery to ensure that everything is
    replayed in the same order as they're supposed to be.
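
    The ordering problem can be illustrated with a deliberately tiny model.
    The enum, the replay() function and its single_queue switch are all
    hypothetical; the model only assumes, following the sequence quoted
    above, that replaying the BUI creates RUI B while RUI A is still
    waiting to be finished.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPS 8	/* arbitrary bound for this sketch */

enum op { OP_BUI, OP_RUI_A, OP_RUI_B };

/* In this toy model only the BUI spawns follow-up work (RUI B). */
static bool spawns_followup(enum op o, enum op *next)
{
	if (o != OP_BUI)
		return false;
	*next = OP_RUI_B;
	return true;
}

/*
 * Replay n recovered intents (n <= MAX_OPS) and record in done[] the order
 * in which work is finished. done[] needs room for 2 * n entries.
 * Returns the number of entries written.
 */
size_t replay(const enum op *recovered, size_t n, bool single_queue,
	      enum op *done)
{
	enum op pending[MAX_OPS];
	size_t npending = 0, ndone = 0;

	if (n > MAX_OPS)
		return 0;

	for (size_t i = 0; i < n; i++) {
		enum op next;

		done[ndone++] = recovered[i];
		if (!spawns_followup(recovered[i], &next))
			continue;
		if (single_queue)
			pending[npending++] = next;	/* finish later */
		else
			done[ndone++] = next;		/* finished too early */
	}
	/* New work runs only after all recovered intents are finished. */
	for (size_t i = 0; i < npending; i++)
		done[ndone++] = pending[i];
	return ndone;
}
```

    With single_queue false the model finishes RUI B before RUI A, which is
    the wrong order described above; with single_queue true RUI A is
    finished first.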

    Reported-by: Amir Goldstein
    Analyzed-by: Hou Tao
    Reviewed-by: Christoph Hellwig
    Tested-by: Amir Goldstein
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     
  • [ Upstream commit 98c4f78dcdd8cec112d1cbc5e9a792ee6e5ab7a6 ]

    In xfs_ifree, we reset the data/attr forks to extents format without
    bothering to free any inline data buffer that might still be around
    after all the blocks have been truncated off the file. Prior to commit
    43518812d2 ("xfs: remove support for inlining data/extents into the
    inode fork") nobody noticed because the leftover inline data after
    truncation was small enough to fit inside the inline buffer inside the
    fork itself.

    However, now that we've removed the inline buffer, we /always/ have to
    free the inline data buffer or else we leak them like crazy. This test
    was found by turning on kmemleak for generic/001 or generic/388.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     

20 Dec, 2017

4 commits

  • [ Upstream commit 5e422f5e4fd71d18bc6b851eeb3864477b3d842e ]

    There was one spot in xfs_bmap_add_extent_unwritten_real that didn't use the
    passed in new extent state but always converted to normal, leading to wrong
    behavior when converting from normal to unwritten.

    Only found by code inspection, it seems like this code path to move partial
    extent from written to unwritten while merging it with the next extent is
    rarely exercised.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Christoph Hellwig
     
  • [ Upstream commit ed438b476b611c67089760037139f93ea8ed41d5 ]

    For an XFS_IGET_INCORE iget operation, if the inode isn't in the cache,
    return ENODATA so that we don't confuse it with the pre-existing ENOENT
    cases (inode is in cache, but freed).
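
    A sketch of the distinction, using a hypothetical array-backed cache and
    function name rather than the real inode cache:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry for this sketch. */
struct cached_inode {
	unsigned long ino;
	bool freed;
};

/* Look only at the in-memory cache; never fall back to reading from disk. */
int iget_incore(const struct cached_inode *cache, size_t n, unsigned long ino)
{
	for (size_t i = 0; i < n; i++) {
		if (cache[i].ino != ino)
			continue;
		/* In cache but freed: the pre-existing "not found" case. */
		return cache[i].freed ? -ENOENT : 0;
	}
	/* Not in cache at all: a distinct answer for the caller. */
	return -ENODATA;
}
```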

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Brian Foster
    Reviewed-by: Dave Chinner
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     
  • [ Upstream commit 9f2a4505800607e537e9dd9dea4f55c4b0c30c7a ]

    It is possible for mkfs to format very small filesystems with too
    small of an internal log with respect to the various minimum size
    and block count requirements. If this occurs when the log happens to
    be smaller than the scan window used for cycle verification and the
    scan wraps the end of the log, the start_blk calculation in
    xlog_find_head() underflows and leads to an attempt to scan an
    invalid range of log blocks. This results in log recovery failure
    and a failed mount.

    Since there may be filesystems out in the wild with this kind of
    geometry, we cannot simply refuse to mount. Instead, cap the scan
    window for cycle verification to the size of the physical log. This
    ensures that the cycle verification proceeds as expected when the
    scan wraps the end of the log.
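
    The underflow and the cap can be sketched with plain unsigned
    arithmetic. The function and parameter names are hypothetical, and the
    sketch assumes head_blk < log_blocks.

```c
/* Number of blocks to scan, never more than the log itself holds. */
unsigned int capped_scan_window(unsigned int log_blocks, unsigned int window)
{
	return window < log_blocks ? window : log_blocks;
}

/*
 * Start block of a backwards scan that ends just before head_blk and may
 * wrap around the end of a circular log of log_blocks blocks.
 * Assumes head_blk < log_blocks.
 */
unsigned int scan_start_blk(unsigned int log_blocks, unsigned int head_blk,
			    unsigned int window)
{
	unsigned int num_scan = capped_scan_window(log_blocks, window);

	if (head_blk >= num_scan)
		return head_blk - num_scan;
	/* Wrapped case: cannot underflow because num_scan <= log_blocks. */
	return log_blocks - (num_scan - head_blk);
}
```

    Without the cap, a window of 500 blocks on a 100 block log with the head
    at block 10 would compute 100 - 490 in unsigned arithmetic and land far
    outside the log.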

    Reported-by: Zorro Lang
    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Brian Foster
     
  • [ Upstream commit 350976ae21873b0d36584ea005076356431b8f79 ]

    On truncate down, if new size is not block size aligned, we zero the
    rest of block to avoid exposing stale data to user, and
    iomap_truncate_page() skips zeroing if the range is already in
    unwritten state or a hole. Then we writeback from on-disk i_size to
    the new size if this range hasn't been written to disk yet, and
    truncate page cache beyond new EOF and set in-core i_size.

    The problem is that we could write data between di_size and newsize
    before removing the page cache beyond newsize, as the extents may
    still be in unwritten state right after a buffer write. As such, the
    page of data that newsize lies in has not been zeroed by page cache
    invalidation before it is written, and xfs_do_writepage() hasn't
    triggered its "zero data beyond EOF" case because we haven't
    updated in-core i_size yet. Then a subsequent mmap read could see
    non-zeros past EOF.

    I occasionally see this in fsx runs in fstests generic/112, a
    simplified fsx operation sequence is like (assuming 4k block size
    xfs):

    fallocate 0x0 0x1000 0x0 keep_size
    write 0x0 0x1000 0x0
    truncate 0x0 0x800 0x1000
    punch_hole 0x0 0x800 0x800
    mapread 0x0 0x800 0x800

    where fallocate allocates unwritten extent but doesn't update
    i_size, buffer write populates the page cache and extent is still
    unwritten, truncate skips zeroing page past new EOF and writes the
    page to disk, punch_hole invalidates the page cache, at last mapread
    reads the block back and sees non-zero beyond EOF.

    Fix it by moving truncate_setsize() to before writeback so the page
    cache invalidation zeros the partial page at the new EOF. This also
    triggers "zero data beyond EOF" in xfs_do_writepage() at writeback
    time, because newsize has been set and page straddles the newsize.

    Also fixed the wrong 'end' param of filemap_write_and_wait_range()
    call while we're at it, the 'end' is inclusive and should be
    'newsize - 1'.
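
    Two small pieces of arithmetic are involved here: how much of the block
    containing the new EOF must read back as zero, and the inclusive last
    byte of a writeback range that ends at newsize. The helper names below
    are hypothetical, and the first one assumes a power-of-two block size.

```c
#include <stdint.h>

/*
 * Bytes between newsize and the end of the block it falls in.
 * Assumes blocksize is a power of two.
 */
uint64_t tail_bytes_to_zero(uint64_t newsize, uint64_t blocksize)
{
	uint64_t off = newsize & (blocksize - 1);

	return off ? blocksize - off : 0;
}

/*
 * Inclusive last byte of the half-open range that ends at newsize.
 * Assumes newsize > 0.
 */
uint64_t writeback_last_byte(uint64_t newsize)
{
	return newsize - 1;
}
```

    For the fsx sequence above (truncate to 0x800 with 4k blocks), the tail
    to zero is 0x800 bytes and the last byte to write back is 0x7ff.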

    Suggested-by: Dave Chinner
    Signed-off-by: Eryu Guan
    Acked-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Eryu Guan
     

14 Dec, 2017

1 commit

  • [ Upstream commit 962cc1ad6caddb5abbb9f0a43e5abe7131a71f18 ]

    In commit f2e9ad21 ("xfs: check for race with xfs_reclaim_inode"), we
    skip an inode if we're racing with freeing the inode via
    xfs_reclaim_inode, but we forgot to release the rcu read lock when
    dumping the inode, with the result that we exit to userspace with a lock
    held. Don't do that; generic/320 with a 1k block size fails this
    very occasionally.
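
    The shape of the bug is an early exit from a loop body that skips the
    unlock. In the sketch below a simple depth counter stands in for the
    RCU read lock; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock model: depth must be back at 0 when we return. */
struct lock_model { int depth; };

static void lock_acquire(struct lock_model *l) { l->depth++; }
static void lock_release(struct lock_model *l) { l->depth--; }

/* Visit each inode, skipping those being reclaimed. Returns the number processed. */
int process_cluster(const bool *being_reclaimed, size_t n,
		    struct lock_model *rcu)
{
	int processed = 0;

	for (size_t i = 0; i < n; i++) {
		lock_acquire(rcu);
		if (being_reclaimed[i]) {
			/* The easy-to-forget unlock on the skip path. */
			lock_release(rcu);
			continue;
		}
		processed++;
		lock_release(rcu);
	}
	return processed;
}
```

    Dropping the lock_release() call on the skip path would leave depth
    nonzero on return, which is the model's equivalent of the warning below.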

    ================================================
    WARNING: lock held when returning to user space!
    4.14.0-rc6-djwong #4 Tainted: G W
    ------------------------------------------------
    rm/30466 is leaving the kernel with locks still held!
    1 lock held by rm/30466:
    #0: (rcu_read_lock){....}, at: [] xfs_ifree_cluster.isra.17+0x2c3/0x6f0 [xfs]
    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 30466 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x71/0x700
    Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
    CPU: 1 PID: 30466 Comm: rm Tainted: G W 4.14.0-rc6-djwong #4
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
    task: ffff880037680000 task.stack: ffffc90001064000
    RIP: 0010:rcu_note_context_switch+0x71/0x700
    RSP: 0000:ffffc90001067e50 EFLAGS: 00010002
    RAX: 0000000000000001 RBX: ffff880037680000 RCX: ffff88003e73d200
    RDX: 0000000000000002 RSI: ffffffff819e53e9 RDI: ffffffff819f4375
    RBP: 0000000000000000 R08: 0000000000000000 R09: ffff880062c900d0
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff880037680000
    R13: 0000000000000000 R14: ffffc90001067eb8 R15: ffff880037680690
    FS: 00007fa3b8ce8700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f69bf77c000 CR3: 000000002450a000 CR4: 00000000000006e0
    Call Trace:
    __schedule+0xb8/0xb10
    schedule+0x40/0x90
    exit_to_usermode_loop+0x6b/0xa0
    prepare_exit_to_usermode+0x7a/0x90
    retint_user+0x8/0x20
    RIP: 0033:0x7fa3b87fda87
    RSP: 002b:00007ffe41206568 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
    RAX: 0000000000000000 RBX: 00000000010e88c0 RCX: 00007fa3b87fda87
    RDX: 0000000000000000 RSI: 00000000010e89c8 RDI: 0000000000000005
    RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
    R10: 000000000000015e R11: 0000000000000246 R12: 00000000010c8060
    R13: 00007ffe41206690 R14: 0000000000000000 R15: 0000000000000000
    ---[ end trace e88f83bf0cfbd07d ]---

    Fixes: f2e9ad212def50bcf4c098c6288779dd97fff0f0
    Cc: Omar Sandoval
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     

03 Nov, 2017

1 commit

  • …el/git/gregkh/driver-core

    Pull initial SPDX identifiers from Greg KH:
    "License cleanup: add SPDX license identifiers to some files

    Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the
    'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
    binding shorthand, which can be used instead of the full boiler plate
    text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart
    and Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset
    of the use cases:

    - file had no licensing information in it.

    - file was a */uapi/* one with no licensing information in it,

    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to
    license had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied
    to a file was done in a spreadsheet of side by side results from
    the output of two independent scanners (ScanCode & Windriver)
    producing SPDX tag:value files created by Philippe Ombredanne.
    Philippe prepared the base worksheet, and did an initial spot review
    of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537
    files assessed. Kate Stewart did a file by file comparison of the
    scanner results in the spreadsheet to determine which SPDX license
    identifier(s) to be applied to the file. She confirmed any
    determination that was not immediately clear with lawyers working with
    the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:

    - Files considered eligible had to be source code files.

    - Make and config files were included as candidates if they contained
    >5 lines of source

    - File already had some variant of a license header in it (even if <5
    lines).

    All documentation files were explicitly excluded.

    The following heuristics were used to determine which SPDX license
    identifiers to apply.

    - when both scanners couldn't find any license traces, file was
    considered to have no license information in it, and the top level
    COPYING file license applied.

    For non */uapi/* files that summary was:

    SPDX license identifier                             # files
    ---------------------------------------------------|-------
    GPL-2.0                                               11139

    and resulted in the first patch in this series.

    If that file was a */uapi/* path one, it was "GPL-2.0 WITH
    Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
    was:

    SPDX license identifier                             # files
    ---------------------------------------------------|-------
    GPL-2.0 WITH Linux-syscall-note                         930

    and resulted in the second patch in this series.

    - if a file had some form of licensing information in it, and was one
    of the */uapi/* ones, it was denoted with the Linux-syscall-note if
    any GPL family license was found in the file or had no licensing in
    it (per prior point). Results summary:

    SPDX license identifier                             # files
    ---------------------------------------------------|-------
    GPL-2.0 WITH Linux-syscall-note                         270
    GPL-2.0+ WITH Linux-syscall-note                        169
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
    LGPL-2.1+ WITH Linux-syscall-note                        15
    GPL-1.0+ WITH Linux-syscall-note                         14
    ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
    LGPL-2.0+ WITH Linux-syscall-note                         4
    LGPL-2.1 WITH Linux-syscall-note                          3
    ((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
    ((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1

    and that resulted in the third patch in this series.

    - when the two scanners agreed on the detected license(s), that
    became the concluded license(s).

    - when there was disagreement between the two scanners (one detected
    a license but the other didn't, or they both detected different
    licenses) a manual inspection of the file occurred.

    - In most cases a manual inspection of the information in the file
    resulted in a clear resolution of the license that should apply
    (and which scanner probably needed to revisit its heuristics).

    - When it was not immediately clear, the license identifier was
    confirmed with lawyers working with the Linux Foundation.

    - If there was any question as to the appropriate license identifier,
    the file was flagged for further research and to be revisited later
    in time.

    In total, over 70 hours of logged manual review was done on the
    spreadsheet to determine the SPDX license identifiers to apply to the
    source files by Kate, Philippe, Thomas and, in some cases,
    confirmation by lawyers working with the Linux Foundation.

    Kate also obtained a third independent scan of the 4.13 code base from
    FOSSology, and compared selected files where the other two scanners
    disagreed against that SPDX file, to see if there were new insights.
    The Windriver scanner is based on an older version of FOSSology in
    part, so they are related.

    Thomas did random spot checks in about 500 files from the spreadsheets
    for the uapi headers and agreed with SPDX license identifier in the
    files he inspected. For the non-uapi files Thomas did random spot
    checks in about 15000 files.

    In the initial set of patches against 4.14-rc6, 3 files were found to have
    copy/paste license identifier errors, and have been fixed to reflect
    the correct identifier.

    Additionally Philippe spent 10 hours this week doing a detailed manual
    inspection and review of the 12,461 patched files from the initial
    patch version early this week with:

    - a full scancode scan run, collecting the matched texts, detected
    license ids and scores

    - reviewing anything where there was a license detected (about 500+
    files) to ensure that the applied SPDX license was correct

    - reviewing anything where there was no detection but the patch
    license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
    applied SPDX license was correct

    This produced a worksheet with 20 files needing minor correction. This
    worksheet was then exported into 3 different .csv files for the
    different types of files to be modified.

    These .csv files were then reviewed by Greg. Thomas wrote a script to
    parse the csv files and add the proper SPDX tag to the file, in the
    format that the file expected. This script was further refined by Greg
    based on the output to detect more types of files automatically and to
    distinguish between header and source .c files (which need different
    comment types.) Finally Greg ran the script using the .csv files to
    generate the patches.

    Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
    Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

    * tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    License cleanup: add SPDX license identifier to uapi header files with a license
    License cleanup: add SPDX license identifier to uapi header files with no license
    License cleanup: add SPDX GPL-2.0 license identifier to files with no license
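
    The comment-style distinction mentioned in the pull message can be
    sketched in C. This is not the script Thomas and Greg used, which is
    not shown here; spdx_line() is a hypothetical helper. The only
    convention it relies on is the kernel's documented one: a "//"
    comment for .c source files and a block comment for .h headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the SPDX tag line for a file, choosing the
 * comment style from the file extension. Returns 0 on success, -1 if the
 * extension is not handled or the output buffer is too small.
 */
static int spdx_line(const char *path, const char *license,
                     char *out, size_t outlen)
{
    const char *dot;
    int n;

    assert(path != NULL && license != NULL && out != NULL);

    dot = strrchr(path, '.');
    if (dot == NULL)
        return -1;

    if (strcmp(dot, ".h") == 0)
        n = snprintf(out, outlen, "/* SPDX-License-Identifier: %s */", license);
    else if (strcmp(dot, ".c") == 0)
        n = snprintf(out, outlen, "// SPDX-License-Identifier: %s", license);
    else
        return -1;

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

    Other file types (Makefiles, assembly, scripts) need their own
    comment syntax and are deliberately rejected by this sketch.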

    Linus Torvalds
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if <5
    lines).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

26 Oct, 2017

1 commit


24 Oct, 2017

1 commit

  • Apparently our current rwsem code doesn't like doing the trylock, then
    lock for real scheme. So change our read/write methods to just do the
    trylock for the RWF_NOWAIT case. This fixes a ~25% regression in
    AIM7.

    Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads")
    Reported-by: kernel test robot
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
     

22 Oct, 2017

1 commit


19 Oct, 2017

1 commit


17 Oct, 2017

3 commits

  • The last cleanup introduced two harmless warnings:

    fs/xfs/xfs_fsmap.c:480:1: warning: '__xfs_getfsmap_rtdev' defined but not used
    fs/xfs/xfs_fsmap.c:372:1: warning: 'xfs_getfsmap_rtdev_rtbitmap_helper' defined but not used

    This moves those two functions as well.

    Fixes: bb9c2e543325 ("xfs: move more RT specific code under CONFIG_XFS_RT")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Brian Foster
    Acked-by: Geert Uytterhoeven
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Arnd Bergmann
     
  • The writeback rework in commit fbcc02561359 ("xfs: Introduce
    writeback context for writepages") introduced a subtle change in
    behavior with regard to the block mapping used across the
    ->writepages() sequence. The previous xfs_cluster_write() code would
    only flush pages up to EOF at the time of the writepage, thus
    ensuring that any pages due to file-extending writes would be
    handled on a separate cycle and with a new, updated block mapping.

    The updated code establishes a block mapping in xfs_writepage_map()
    that could extend beyond EOF if the file has post-eof preallocation.
    Because we now use the generic writeback infrastructure and pass the
    cached mapping to each writepage call, there is no implicit EOF
    limit in place. If eofblocks trimming occurs during ->writepages(),
    any post-eof portion of the cached mapping becomes invalid. The
    eofblocks code has no means to serialize against writeback because
    there are no pages associated with post-eof blocks. Therefore if an
    eofblocks trim occurs and is followed by a file-extending buffered
    write, not only has the mapping become invalid, but we could end up
    writing a page to disk based on the invalid mapping.

    Consider the following sequence of events:

    - A buffered write creates a delalloc extent and post-eof
    speculative preallocation.
    - Writeback starts and on the first writepage cycle, the delalloc
    extent is converted to real blocks (including the post-eof blocks)
    and the mapping is cached.
    - The file is closed and xfs_release() trims post-eof blocks. The
    cached writeback mapping is now invalid.
    - Another buffered write appends the file with a delalloc extent.
    - The concurrent writeback cycle picks up the just written page
    because the writeback range end is LLONG_MAX. xfs_writepage_map()
    attributes it to the (now invalid) cached mapping and writes the
    data to an incorrect location on disk (and where the file offset is
    still backed by a delalloc extent).

    This problem is reproduced by xfstests test generic/464, which
    triggers racing writes, appends, open/closes and writeback requests.

    To address this problem, trim the mapping used during writeback to
    within EOF when the mapping is validated. This ensures the mapping
    is revalidated for any pages encountered beyond EOF as of the time
    the current mapping was cached or last validated.

    Reported-by: Eryu Guan
    Diagnosed-by: Eryu Guan
    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • Recently we've had warnings arise from the vm handing us pages
    without bufferheads attached to them. This should not ever occur
    in XFS, but we don't defend against it properly if it does. The only
    place where we remove bufferheads from a page is in
    xfs_vm_releasepage(), but we can't tell the difference here between
    "page is dirty so don't release" and "page is dirty but is being
    invalidated so release it".

    Some places that invalidate pages ask for the pages to be released
    and, after calling ->releasepage, follow up by checking whether the
    page was dirty and then aborting the invalidation. This is a
    possible vector for releasing buffers from a page but then leaving
    it in the mapping, so we really do need to avoid dirty pages in
    xfs_vm_releasepage().

    To differentiate between invalidated pages and normal pages, we need
    to clear the page dirty flag when invalidating the pages. This can
    be done through xfs_vm_invalidatepage(), and will result in
    xfs_vm_releasepage() seeing the page as clean which matches the
    bufferhead state on the page after calling block_invalidatepage().

    Hence we can re-add the page dirty check in xfs_vm_releasepage to
    catch the case where we might be releasing a page that is actually
    dirty and so should not have the bufferheads on it removed. This
    will remove one possible vector of "dirty page with no bufferheads"
    and so help narrow down the search for the root cause of that
    problem.

    Signed-Off-By: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

12 Oct, 2017

6 commits

  • Jason reported that a corrupted filesystem failed to replay
    the log with a metadata block out of bounds warning:

    XFS (dm-2): _xfs_buf_find: Block out of range: block 0x80270fff8, EOFS 0x9c40000

    _xfs_buf_find() and xfs_btree_get_bufs() return NULL if
    that happens, and then when xfs_alloc_fix_freelist() calls
    xfs_trans_binval() on that NULL bp, we oops with:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8

    We don't handle _xfs_buf_find errors very well; every
    caller higher up the stack gets to guess at why it failed.
    But we should at least handle it somehow, so return
    EFSCORRUPTED here.
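
    A minimal sketch of the defensive pattern, using hypothetical helpers
    (get_buf, invalidate_block) rather than the real XFS functions: the
    lookup returns NULL for an out-of-range block, and the caller turns
    that into an error instead of dereferencing the pointer. XFS maps
    EFSCORRUPTED to EUCLEAN on Linux; the EIO fallback below is an
    assumption for builds where EUCLEAN is not defined.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifdef EUCLEAN
#define EFSCORRUPTED EUCLEAN   /* same mapping the XFS code uses on Linux */
#else
#define EFSCORRUPTED EIO       /* assumption: fallback for other platforms */
#endif

/* Hypothetical stand-in for a metadata buffer. */
struct toy_buf {
    int invalidated;
};

/* Hypothetical lookup: returns NULL when the block number is out of range. */
static struct toy_buf *get_buf(struct toy_buf *table, size_t nblocks,
                               size_t blkno)
{
    return blkno < nblocks ? &table[blkno] : NULL;
}

/* Invalidate one block, failing cleanly if the buffer cannot be found. */
static int invalidate_block(struct toy_buf *table, size_t nblocks,
                            size_t blkno)
{
    struct toy_buf *bp;

    assert(table != NULL);

    bp = get_buf(table, nblocks, blkno);
    if (bp == NULL)
        return -EFSCORRUPTED;   /* do not touch a NULL buffer */

    bp->invalidated = 1;
    return 0;
}
```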

    Reported-by: Jason L Tibbitts III
    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Eric Sandeen
     
  • xfs_attr3_root_inactive() walks the attr fork tree to invalidate the
    associated blocks. xfs_attr3_node_inactive() recursively descends
    from internal blocks to leaf blocks, caching block address values
    along the way to revisit parent blocks, locate the next entry and
    descend down that branch of the tree.

    The code that attempts to reread the parent block is unsafe because
    it assumes that the local xfs_da_node_entry pointer remains valid
    after an xfs_trans_brelse() and re-read of the parent buffer. Under
    heavy memory pressure, it is possible that the buffer has been
    reclaimed and reallocated by the time the parent block is reread.
    This means that 'btree' can point to an invalid memory address, lead
    to a random/garbage value for child_fsb and cause the subsequent
    read of the attr fork to go off the rails and return a NULL buffer
    for an attr fork offset that is most likely not allocated.

    Note that this problem can be manufactured by setting
    XFS_ATTR_BTREE_REF to 0 to prevent LRU caching of attr buffers,
    creating a file with a multi-level attr fork and removing it to
    trigger inactivation.

    To address this problem, reinit the node/btree pointers to the
    parent buffer after it has been re-read. This ensures btree points
    to a valid record and allows the walk to proceed.
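
    The hazard and the fix can be sketched in userspace C. Everything
    here is hypothetical (struct node, read_node, release_node); it only
    models a buffer that may come back at a different address after being
    released and re-read, so that any pointer derived from the old copy
    must be re-derived from the new one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 4

/* Hypothetical, simplified node: a count plus child block numbers. */
struct node {
    uint32_t count;
    uint32_t child[MAX_ENTRIES];
};

/* "Read" a node: returns a fresh in-memory copy, possibly at a new address. */
static struct node *read_node(const struct node *on_disk)
{
    struct node *n = malloc(sizeof(*n));

    if (n != NULL)
        memcpy(n, on_disk, sizeof(*n));
    return n;
}

/* Release the copy; any pointer into it is invalid from here on. */
static void release_node(struct node *n)
{
    free(n);
}

/*
 * Visit each child, dropping and re-reading the parent in between.
 * Returns the sum of the child numbers, or -1 on allocation failure.
 */
static int64_t walk_children(const struct node *on_disk)
{
    struct node *node = read_node(on_disk);
    const uint32_t *entries;
    int64_t total = 0;
    uint32_t i;

    if (node == NULL)
        return -1;
    assert(node->count <= MAX_ENTRIES);
    entries = node->child;

    for (i = 0; i < node->count; i++) {
        total += entries[i];          /* "descend" into child i */

        release_node(node);           /* parent buffer may be reclaimed */
        node = read_node(on_disk);    /* re-read the parent */
        if (node == NULL)
            return -1;
        entries = node->child;        /* re-derive: the old pointer is stale */
    }

    release_node(node);
    return total;
}
```

    Omitting the final assignment to entries inside the loop reproduces
    the class of bug described above: the loop would keep reading through
    a pointer into freed memory.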

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     
  • Bool initializations should use true and false. Bool tests don't need
    comparisons.
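
    As a small illustration of the style rule (any_set() is a made-up
    function, not one touched by this commit):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if any flag in the array is set. */
static bool any_set(const bool *flags, size_t n)
{
    bool found = false;     /* rather than "found = 0" */
    size_t i;

    assert(flags != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (flags[i])       /* rather than "flags[i] == true" */
            found = true;
    }
    return found;
}
```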

    Signed-off-by: Thomas Meyer
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Thomas Meyer
     
  • If we get ENOSPC half way through setting the ACL, the inode mode
    can still be changed even though the ACL does not exist. Reorder the
    operation to only change the mode of the inode if the ACL is set
    correctly.

    Whilst this does not fix the problem with crash consistency (that requires
    attribute addition to be a deferred op) it does allow ENOSPC and other
    non-fatal errors from setting an xattr to be handled sanely.

    This fixes xfstests generic/449.

    Signed-Off-By: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • Various utility functions and interfaces that iterate internal
    devices try to reference the realtime device even when RT support is
    not compiled into the kernel.

    Make sure this code is excluded from the CONFIG_XFS_RT=n build,
    and where appropriate stub functions to return fatal errors if
    they ever get called when RT support is not present.
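
    The stubbing pattern looks roughly like this. rt_query_free() is a
    hypothetical function, and the error code returned by the stub is an
    assumption chosen for illustration. Building with -DCONFIG_XFS_RT
    selects the "real" variant; without it, callers still compile but get
    an error at run time.

```c
#include <assert.h>
#include <errno.h>

#ifdef CONFIG_XFS_RT
/* Real variant (toy body): report a free block count for the RT device. */
static int rt_query_free(unsigned long long *free_blocks)
{
    assert(free_blocks != NULL);
    *free_blocks = 1024;
    return 0;
}
#else
/* Stub variant: RT support is compiled out, so any call is an error. */
static inline int rt_query_free(unsigned long long *free_blocks)
{
    (void)free_blocks;      /* unused when RT support is absent */
    return -ENOSYS;
}
#endif
```

    Keeping the stub in a header lets callers stay free of #ifdef blocks
    of their own.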

    Signed-Off-By: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     
  • Prevent kmemcheck from throwing warnings about reading uninitialised
    memory when formatting inodes into the incore log buffer. There are
    several issues here - we don't always log all the fields in the
    inode log format item, and we never log the inode's
    di_next_unlinked field.

    In the case of the inode log format item, this is exacerbated
    by the old xfs_inode_log_format structure padding issue. Hence make
    the padded, 64 bit aligned version of the structure the one we always
    use for formatting the log and get rid of the 64 bit variant. This
    means we'll always log the 64-bit version and so recovery only needs
    to convert from the unpadded 32 bit version from older 32 bit
    kernels.
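
    The padding issue can be sketched with two simplified structures.
    These are hypothetical and carry fewer fields than the real
    xfs_inode_log_format; the packed attribute is a GCC/Clang extension.
    The sketch shows why an explicit pad field gives one fixed layout,
    and how a record in the unpadded layout is converted with every byte
    of the destination initialised.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unpadded layout, as an old 32-bit writer would produce (hypothetical). */
struct fmt_unpadded {
    uint16_t type;
    uint16_t size;
    uint32_t fields;
    uint16_t asize;
    uint16_t dsize;
    uint64_t ino;
} __attribute__((packed));

/* Padded layout: explicit pad keeps ino 8-byte aligned on every ABI. */
struct fmt_padded {
    uint16_t type;
    uint16_t size;
    uint32_t fields;
    uint16_t asize;
    uint16_t dsize;
    uint32_t pad;
    uint64_t ino;
};

/* Convert an unpadded record, zeroing the destination (pad included). */
static void convert_to_padded(const struct fmt_unpadded *src,
                              struct fmt_padded *dst)
{
    assert(src != NULL && dst != NULL);

    memset(dst, 0, sizeof(*dst));   /* no uninitialised bytes get logged */
    dst->type = src->type;
    dst->size = src->size;
    dst->fields = src->fields;
    dst->asize = src->asize;
    dst->dsize = src->dsize;
    dst->ino = src->ino;
}
```

    In this sketch the unpadded record is 20 bytes and the padded one is
    24 bytes, so only the old, shorter form ever needs conversion.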

    Signed-Off-By: Dave Chinner
    Tested-by: Tetsuo Handa
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

04 Oct, 2017

2 commits