07 Oct, 2020

2 commits

  • Currently, regardless of whether an fsync is full or fast, we always wait
    for ordered extents to complete and only then start logging the inode.
    However, for fast fsyncs it is enough to wait for writeback to complete.
    We don't need to wait for the ordered extents to complete, since we use the
    list of modified extent maps to figure out which extents we must log, and
    we can get their checksums directly from the ordered extents that are still
    in flight, or otherwise look them up in the checksums tree.

    Until commit b5e6c3e170b770 ("btrfs: always wait on ordered extents at
    fsync time"), fast fsyncs started logging without even waiting for the
    writeback to complete first. We would wait for it to complete after
    logging, while holding a transaction open, which led to performance issues
    when using cgroups and probably in other cases too, since waiting for IO
    while holding a transaction handle should be avoided as much as possible.
    After that commit, fast fsyncs started to wait for ordered extents to
    complete before starting to log, which adds some latency to fsyncs, and we
    got at least one report of a performance drop that was bisected to that
    particular change:

    https://lore.kernel.org/linux-btrfs/20181109215148.GF23260@techsingularity.net/

    This change makes fast fsyncs only wait for writeback to finish before
    starting to log the inode, instead of waiting for both the writeback to
    finish and the ordered extents to complete. This brings back part of the
    logic we had that extracts checksums from in-flight ordered extents, which
    are not yet in the checksums tree, and makes sure transaction commits wait
    for the completion of ordered extents previously logged (most of the time
    they have already completed by the time a transaction commit starts,
    resulting in no wait at all). This avoids any data loss if an ordered
    extent completes after the transaction used to log an inode is committed,
    followed by a power failure.

    When there are no other tasks accessing the checksums and the subvolume
    btrees, the ordered extent completion is pretty fast, typically taking
    100 to 200 microseconds only in my observations. However when there are
    other tasks accessing these btrees, ordered extent completion can take a
    lot more time due to lock contention on nodes and leaves of these btrees.
    I've seen cases over 2 milliseconds, which starts to be significant. In
    particular when we do have concurrent fsyncs against different files there
    is a lot of contention on the checksums btree, since we have many tasks
    writing the checksums into the btree and other tasks that already started
    the logging phase are doing lookups for checksums in the btree.

    This change also turns all ranged fsyncs into full-range fsyncs, which
    is something we already did when not using the NO_HOLES feature or when
    doing a full fsync. This guarantees we never miss checksums when writeback
    has been triggered for only part of an extent: in that case we would end
    up logging the full extent but only the checksums for the written range,
    which results in missing checksums after log replay. Allowing ranged
    fsyncs to operate only on the original range again, when using the
    NO_HOLES feature and doing a fast fsync, is doable but requires some
    non-trivial changes to the writeback path. That can always be worked on
    later if needed, but I don't think ranged fsyncs are a very common use
    case.

    Several tests were performed using fio for different numbers of concurrent
    jobs, each writing and fsyncing its own file, for both sequential and
    random file writes. The tests were run on bare metal, no virtualization,
    on a box with 12 cores (Intel i7-8700), 64GB of RAM and an NVMe device,
    with a kernel configuration that is the default of typical distributions
    (Debian in this case), without debug options enabled (kasan, kmemleak,
    slub debug, debug of page allocations, lock debugging, etc).

    The following script that calls fio was used:

    $ cat test-fsync.sh
    #!/bin/bash

    DEV=/dev/nvme0n1
    MNT=/mnt/btrfs
    MOUNT_OPTIONS="-o ssd -o space_cache=v2"
    MKFS_OPTIONS="-d single -m single"

    if [ $# -ne 5 ]; then
        echo "Use $0 NUM_JOBS FILE_SIZE FSYNC_FREQ BLOCK_SIZE [write|randwrite]"
        exit 1
    fi

    NUM_JOBS=$1
    FILE_SIZE=$2
    FSYNC_FREQ=$3
    BLOCK_SIZE=$4
    WRITE_MODE=$5

    if [ "$WRITE_MODE" != "write" ] && [ "$WRITE_MODE" != "randwrite" ]; then
        echo "Invalid WRITE_MODE, must be 'write' or 'randwrite'"
        exit 1
    fi

    cat <<EOF > /tmp/fio-job.ini
    [writers]
    rw=$WRITE_MODE
    fsync=$FSYNC_FREQ
    fallocate=none
    group_reporting=1
    direct=0
    bs=$BLOCK_SIZE
    ioengine=sync
    size=$FILE_SIZE
    directory=$MNT
    numjobs=$NUM_JOBS
    EOF

    echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

    echo
    echo "Using config:"
    echo
    cat /tmp/fio-job.ini
    echo

    umount $MNT &> /dev/null
    mkfs.btrfs -f $MKFS_OPTIONS $DEV
    mount $MOUNT_OPTIONS $DEV $MNT
    fio /tmp/fio-job.ini
    umount $MNT

    The results were the following:

    *************************
    *** sequential writes ***
    *************************

    ==== 1 job, 8GiB file, fsync frequency 1, block size 64KiB ====

    Before patch:

    WRITE: bw=36.6MiB/s (38.4MB/s), 36.6MiB/s-36.6MiB/s (38.4MB/s-38.4MB/s), io=8192MiB (8590MB), run=223689-223689msec

    After patch:

    WRITE: bw=40.2MiB/s (42.1MB/s), 40.2MiB/s-40.2MiB/s (42.1MB/s-42.1MB/s), io=8192MiB (8590MB), run=203980-203980msec
    (+9.8% throughput, -8.8% runtime)

    ==== 2 jobs, 4GiB files, fsync frequency 1, block size 64KiB ====

    Before patch:

    WRITE: bw=35.8MiB/s (37.5MB/s), 35.8MiB/s-35.8MiB/s (37.5MB/s-37.5MB/s), io=8192MiB (8590MB), run=228950-228950msec

    After patch:

    WRITE: bw=43.5MiB/s (45.6MB/s), 43.5MiB/s-43.5MiB/s (45.6MB/s-45.6MB/s), io=8192MiB (8590MB), run=188272-188272msec
    (+21.5% throughput, -17.8% runtime)

    ==== 4 jobs, 2GiB files, fsync frequency 1, block size 64KiB ====

    Before patch:

    WRITE: bw=50.1MiB/s (52.6MB/s), 50.1MiB/s-50.1MiB/s (52.6MB/s-52.6MB/s), io=8192MiB (8590MB), run=163446-163446msec

    After patch:

    WRITE: bw=64.5MiB/s (67.6MB/s), 64.5MiB/s-64.5MiB/s (67.6MB/s-67.6MB/s), io=8192MiB (8590MB), run=126987-126987msec
    (+28.7% throughput, -22.3% runtime)

    ==== 8 jobs, 1GiB files, fsync frequency 1, block size 64KiB ====

    Before patch:

    WRITE: bw=64.0MiB/s (68.1MB/s), 64.0MiB/s-64.0MiB/s (68.1MB/s-68.1MB/s), io=8192MiB (8590MB), run=126075-126075msec

    After patch:

    WRITE: bw=86.8MiB/s (91.0MB/s), 86.8MiB/s-86.8MiB/s (91.0MB/s-91.0MB/s), io=8192MiB (8590MB), run=94358-94358msec
    (+35.6% throughput, -25.2% runtime)

    ==== 16 jobs, 512MiB files, fsync frequency 1, block size 64KiB ====

    Before patch:

    WRITE: bw=79.8MiB/s (83.6MB/s), 79.8MiB/s-79.8MiB/s (83.6MB/s-83.6MB/s), io=8192MiB (8590MB), run=102694-102694msec

    After patch:

    WRITE: bw=107MiB/s (112MB/s), 107MiB/s-107MiB/s (112MB/s-112MB/s), io=8192MiB (8590MB), run=76446-76446msec
    (+34.1% throughput, -25.6% runtime)

    ==== 32 jobs, 512MiB files, fsync frequency 1, block size 64KiB ====

    Before patch:

    WRITE: bw=93.2MiB/s (97.7MB/s), 93.2MiB/s-93.2MiB/s (97.7MB/s-97.7MB/s), io=16.0GiB (17.2GB), run=175836-175836msec

    After patch:

    WRITE: bw=111MiB/s (117MB/s), 111MiB/s-111MiB/s (117MB/s-117MB/s), io=16.0GiB (17.2GB), run=147001-147001msec
    (+19.1% throughput, -16.4% runtime)

    ==== 64 jobs, 512MiB files, fsync frequency 1, block size 64KiB ====

    Before patch:

    WRITE: bw=108MiB/s (114MB/s), 108MiB/s-108MiB/s (114MB/s-114MB/s), io=32.0GiB (34.4GB), run=302656-302656msec

    After patch:

    WRITE: bw=133MiB/s (140MB/s), 133MiB/s-133MiB/s (140MB/s-140MB/s), io=32.0GiB (34.4GB), run=246003-246003msec
    (+23.1% throughput, -18.7% runtime)

    ************************
    *** random writes ***
    ************************

    ==== 1 job, 8GiB file, fsync frequency 16, block size 4KiB ====

    Before patch:

    WRITE: bw=11.5MiB/s (12.0MB/s), 11.5MiB/s-11.5MiB/s (12.0MB/s-12.0MB/s), io=8192MiB (8590MB), run=714281-714281msec

    After patch:

    WRITE: bw=11.6MiB/s (12.2MB/s), 11.6MiB/s-11.6MiB/s (12.2MB/s-12.2MB/s), io=8192MiB (8590MB), run=705959-705959msec
    (+0.9% throughput, -1.2% runtime)

    ==== 2 jobs, 4GiB files, fsync frequency 16, block size 4KiB ====

    Before patch:

    WRITE: bw=12.8MiB/s (13.5MB/s), 12.8MiB/s-12.8MiB/s (13.5MB/s-13.5MB/s), io=8192MiB (8590MB), run=638101-638101msec

    After patch:

    WRITE: bw=13.1MiB/s (13.7MB/s), 13.1MiB/s-13.1MiB/s (13.7MB/s-13.7MB/s), io=8192MiB (8590MB), run=625374-625374msec
    (+2.3% throughput, -2.0% runtime)

    ==== 4 jobs, 2GiB files, fsync frequency 16, block size 4KiB ====

    Before patch:

    WRITE: bw=15.4MiB/s (16.2MB/s), 15.4MiB/s-15.4MiB/s (16.2MB/s-16.2MB/s), io=8192MiB (8590MB), run=531146-531146msec

    After patch:

    WRITE: bw=17.8MiB/s (18.7MB/s), 17.8MiB/s-17.8MiB/s (18.7MB/s-18.7MB/s), io=8192MiB (8590MB), run=460431-460431msec
    (+15.6% throughput, -13.3% runtime)

    ==== 8 jobs, 1GiB files, fsync frequency 16, block size 4KiB ====

    Before patch:

    WRITE: bw=19.9MiB/s (20.8MB/s), 19.9MiB/s-19.9MiB/s (20.8MB/s-20.8MB/s), io=8192MiB (8590MB), run=412664-412664msec

    After patch:

    WRITE: bw=22.2MiB/s (23.3MB/s), 22.2MiB/s-22.2MiB/s (23.3MB/s-23.3MB/s), io=8192MiB (8590MB), run=368589-368589msec
    (+11.6% throughput, -10.7% runtime)

    ==== 16 jobs, 512MiB files, fsync frequency 16, block size 4KiB ====

    Before patch:

    WRITE: bw=29.3MiB/s (30.7MB/s), 29.3MiB/s-29.3MiB/s (30.7MB/s-30.7MB/s), io=8192MiB (8590MB), run=279924-279924msec

    After patch:

    WRITE: bw=30.4MiB/s (31.9MB/s), 30.4MiB/s-30.4MiB/s (31.9MB/s-31.9MB/s), io=8192MiB (8590MB), run=269258-269258msec
    (+3.8% throughput, -3.8% runtime)

    ==== 32 jobs, 512MiB files, fsync frequency 16, block size 4KiB ====

    Before patch:

    WRITE: bw=36.9MiB/s (38.7MB/s), 36.9MiB/s-36.9MiB/s (38.7MB/s-38.7MB/s), io=16.0GiB (17.2GB), run=443581-443581msec

    After patch:

    WRITE: bw=41.6MiB/s (43.6MB/s), 41.6MiB/s-41.6MiB/s (43.6MB/s-43.6MB/s), io=16.0GiB (17.2GB), run=394114-394114msec
    (+12.7% throughput, -11.2% runtime)

    ==== 64 jobs, 512MiB files, fsync frequency 16, block size 4KiB ====

    Before patch:

    WRITE: bw=45.9MiB/s (48.1MB/s), 45.9MiB/s-45.9MiB/s (48.1MB/s-48.1MB/s), io=32.0GiB (34.4GB), run=714614-714614msec

    After patch:

    WRITE: bw=48.8MiB/s (51.1MB/s), 48.8MiB/s-48.8MiB/s (51.1MB/s-51.1MB/s), io=32.0GiB (34.4GB), run=672087-672087msec
    (+6.3% throughput, -6.0% runtime)
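
    The percentages quoted after each pair of results can be recomputed from
    the fio summary lines. The following is a minimal sketch, not part of the
    original test script: the helper names parse_fio_write and pct_change are
    hypothetical, and the parser assumes fio reports the bandwidth in MiB/s,
    as in all the results above. It uses the first sequential write result
    (1 job, 8GiB file) as input.

```shell
# Print "<bandwidth in MiB/s> <runtime in msec>" for a fio "WRITE:" summary
# line. Assumes the bandwidth is reported in MiB/s.
parse_fio_write() {
    printf '%s\n' "$1" |
        sed -n 's/.*WRITE: bw=\([0-9.]*\)MiB\/s.*run=\([0-9]*\)-[0-9]*msec.*/\1 \2/p'
}

# Print the signed percentage change from $1 (before) to $2 (after).
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%+.1f", (after - before) / before * 100 }'
}

before='WRITE: bw=36.6MiB/s (38.4MB/s), 36.6MiB/s-36.6MiB/s (38.4MB/s-38.4MB/s), io=8192MiB (8590MB), run=223689-223689msec'
after='WRITE: bw=40.2MiB/s (42.1MB/s), 40.2MiB/s-40.2MiB/s (42.1MB/s-42.1MB/s), io=8192MiB (8590MB), run=203980-203980msec'

# Word splitting is intended here: the parser prints two fields per line.
set -- $(parse_fio_write "$before") $(parse_fio_write "$after")
echo "throughput: $(pct_change "$1" "$3")%, runtime: $(pct_change "$2" "$4")%"
```

    For this input the script prints "throughput: +9.8%, runtime: -8.8%".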

    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba

    Filipe Manana
     
  • Since commit d4682ba03ef618 ("Btrfs: sync log after logging new name") we
    started to commit logs, and fall back to transaction commits when we failed
    to log the new names or commit the logs, after link and rename operations
    when the target inodes (or their parents) were previously logged in the
    current transaction. This was to avoid losing directories despite an
    explicit fsync on them when they are ancestors of some inode that got a
    new name logged, due to a link or rename operation. However, that adds the
    cost of starting IO and waiting for it to complete, which can cause higher
    latencies for applications.

    Instead of doing that, just make sure that when we log a new name for an
    inode we don't mark any of its ancestors as logged, so that if anyone
    does an fsync against any of them, without making any other change to them,
    the fsync commits the log. This way we only pay the cost of a log commit
    (or a transaction commit if something goes wrong or a new block group was
    created) if the application explicitly asks to fsync any of the parent
    directories.

    Using dbench, which mixes several filesystem operations including renames,
    revealed some significant latency gains. The following script that uses
    dbench was used to test this:

    #!/bin/bash

    DEV=/dev/nvme0n1
    MNT=/mnt/btrfs
    MOUNT_OPTIONS="-o ssd -o space_cache=v2"
    MKFS_OPTIONS="-m single -d single"
    THREADS=16

    echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    mkfs.btrfs -f $MKFS_OPTIONS $DEV
    mount $MOUNT_OPTIONS $DEV $MNT

    dbench -t 300 -D $MNT $THREADS

    umount $MNT

    The test was run on bare metal, no virtualization, on a box with 12 cores
    (Intel i7-8700), 64GB of RAM and using an NVMe device, with a kernel
    configuration that is the default of typical distributions (Debian in this
    case), without debug options enabled (kasan, kmemleak, slub debug, debug
    of page allocations, lock debugging, etc).

    Results before this patch:

    Operation      Count    AvgLat    MaxLat
    ----------------------------------------
    NTCreateX   10750455     0.011   155.088
    Close        7896674     0.001     0.243
    Rename        455222     2.158  1101.947
    Unlink       2171189     0.067   121.638
    Deltree          256     2.425     7.816
    Mkdir            128     0.002     0.003
    Qpathinfo    9744323     0.006    21.370
    Qfileinfo    1707092     0.001     0.146
    Qfsinfo      1786756     0.001    11.228
    Sfileinfo     875612     0.003    21.263
    Find         3767281     0.025     9.617
    WriteX       5356924     0.011   211.390
    ReadX       16852694     0.003     9.442
    LockX          35008     0.002     0.119
    UnlockX        35008     0.001     0.138
    Flush         753458     4.252  1102.249

    Throughput 1128.35 MB/sec 16 clients 16 procs max_latency=1102.255 ms

    Results after this patch:

    Operation      Count    AvgLat    MaxLat
    ----------------------------------------
    NTCreateX   11471098     0.012   448.281
    Close        8426396     0.001     0.925
    Rename        485746     0.123   267.183
    Unlink       2316477     0.080    63.433
    Deltree          288     2.830    11.144
    Mkdir            144     0.003     0.010
    Qpathinfo   10397420     0.006    10.288
    Qfileinfo    1822039     0.001     0.169
    Qfsinfo      1906497     0.002    14.039
    Sfileinfo     934433     0.004     2.438
    Find         4019879     0.026    10.200
    WriteX       5718932     0.011   200.985
    ReadX       17981671     0.003    10.036
    LockX          37352     0.002     0.076
    UnlockX        37352     0.001     0.109
    Flush         804018     5.015   778.033

    Throughput 1201.98 MB/sec 16 clients 16 procs max_latency=778.036 ms
    (+6.5% throughput, -29.4% max latency, -75.8% rename max latency)
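
    The per-operation latency changes can be recomputed from the two dbench
    tables. The following is a minimal sketch, not part of the original test
    script: lat_change is a hypothetical helper that takes one "before" row
    and one "after" row in dbench's "Operation Count AvgLat MaxLat" layout.
    The rows passed in are copied from the results above.

```shell
# Print the relative change in AvgLat and MaxLat between two dbench rows of
# the form "Operation Count AvgLat MaxLat" ($1 is before, $2 is after).
lat_change() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { avg = $3; max = $4 }
        NR == 2 { printf "%s: AvgLat %+.1f%%, MaxLat %+.1f%%\n", $1, ($3 - avg) / avg * 100, ($4 - max) / max * 100 }'
}

lat_change "Rename 455222 2.158 1101.947" "Rename 485746 0.123 267.183"
lat_change "Flush 753458 4.252 1102.249" "Flush 804018 5.015 778.033"
```

    For the Rename rows this prints "Rename: AvgLat -94.3%, MaxLat -75.8%",
    so the rename figure in the summary line matches the change in MaxLat,
    while the average rename latency dropped by a larger margin.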

    Test case generic/498 from fstests tests the scenario that the previously
    mentioned commit fixed.

    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba

    Filipe Manana
     

30 Apr, 2019

2 commits


17 Dec, 2018

1 commit

  • The io_err field of struct btrfs_log_ctx is no longer used after the
    recent simplification of the fast fsync path, where we now wait for
    ordered extents to complete before logging the inode. We did this in
    commit b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync
    time") and commit a2120a473a80 ("btrfs: clean up the left over
    logged_list usage") removed its last use.

    Reviewed-by: Nikolay Borisov
    Reviewed-by: Josef Bacik
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba

    Filipe Manana
     

15 Oct, 2018

1 commit

  • btrfs_pin_log_trans defines the variable "ret" for its return value, but
    it is not modified after initialization. Further, none of the callers
    handle the return value, so it is safe to drop the unneeded "ret" and
    make the function return void.

    Signed-off-by: zhong jiang
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    zhong jiang
     

23 Aug, 2018

1 commit

  • When we add a new name for an inode which was logged in the current
    transaction, we update the inode in the log so that its new name and
    ancestors are added to the log. However when we do this we do not persist
    the log, so the changes remain in memory only, and as a consequence, any
    ancestors that were created in the current transaction are updated such
    that future calls to btrfs_inode_in_log() return true. This leads to a
    subsequent fsync against such new ancestor directories returning
    immediately, without persisting the log, therefore after a power failure
    the new ancestor directories do not exist, despite fsync being called
    against them explicitly.

    Example:

    $ mkfs.btrfs -f /dev/sdb
    $ mount /dev/sdb /mnt

    $ mkdir /mnt/A
    $ mkdir /mnt/B
    $ mkdir /mnt/A/C
    $ touch /mnt/B/foo
    $ xfs_io -c "fsync" /mnt/B/foo
    $ ln /mnt/B/foo /mnt/A/C/foo
    $ xfs_io -c "fsync" /mnt/A
    <power failure>

    After the power failure, directory "A" does not exist, despite the explicit
    fsync on it.

    Instead of fixing this by changing the behaviour of the explicit fsync on
    directory "A" to persist the log instead of doing nothing, make the logging
    of the new file name (which happens when creating a hard link or renaming)
    persist the log. This approach is not only simpler, not requiring addition
    of new fields to the inode in memory structure, but also gives us the same
    behaviour as ext4, xfs and f2fs (possibly other filesystems too).

    A test case for fstests follows soon.

    Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes")
    Reported-by: Vijay Chidambaram
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba

    Filipe Manana
     

12 Apr, 2018

1 commit


26 Mar, 2018

1 commit


14 Feb, 2017

6 commits


25 Aug, 2016

1 commit

  • Commit 44f714dae50a ("Btrfs: improve performance on fsync against new
    inode after rename/unlink"), which landed in 4.8-rc2, introduced a
    possibility for a deadlock due to double locking of an inode's log mutex
    by the same task, which lockdep reports with:

    [23045.433975] =============================================
    [23045.434748] [ INFO: possible recursive locking detected ]
    [23045.435426] 4.7.0-rc6-btrfs-next-34+ #1 Not tainted
    [23045.436044] ---------------------------------------------
    [23045.436044] xfs_io/3688 is trying to acquire lock:
    [23045.436044] (&ei->log_mutex){+.+...}, at: [] btrfs_log_inode+0x13a/0xc95 [btrfs]
    [23045.436044]
    but task is already holding lock:
    [23045.436044] (&ei->log_mutex){+.+...}, at: [] btrfs_log_inode+0x13a/0xc95 [btrfs]
    [23045.436044]
    other info that might help us debug this:
    [23045.436044] Possible unsafe locking scenario:

    [23045.436044] CPU0
    [23045.436044] ----
    [23045.436044] lock(&ei->log_mutex);
    [23045.436044] lock(&ei->log_mutex);
    [23045.436044]
    *** DEADLOCK ***

    [23045.436044] May be due to missing lock nesting notation

    [23045.436044] 3 locks held by xfs_io/3688:
    [23045.436044] #0: (&sb->s_type->i_mutex_key#15){+.+...}, at: [] btrfs_sync_file+0x14e/0x425 [btrfs]
    [23045.436044] #1: (sb_internal#2){.+.+.+}, at: [] __sb_start_write+0x5f/0xb0
    [23045.436044] #2: (&ei->log_mutex){+.+...}, at: [] btrfs_log_inode+0x13a/0xc95 [btrfs]
    [23045.436044]
    stack backtrace:
    [23045.436044] CPU: 4 PID: 3688 Comm: xfs_io Not tainted 4.7.0-rc6-btrfs-next-34+ #1
    [23045.436044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
    [23045.436044] 0000000000000000 ffff88022f5f7860 ffffffff8127074d ffffffff82a54b70
    [23045.436044] ffffffff82a54b70 ffff88022f5f7920 ffffffff81092897 ffff880228015d68
    [23045.436044] 0000000000000000 ffffffff82a54b70 ffffffff829c3f00 ffff880228015d68
    [23045.436044] Call Trace:
    [23045.436044] [] dump_stack+0x67/0x90
    [23045.436044] [] __lock_acquire+0xcbb/0xe4e
    [23045.436044] [] ? mark_lock+0x24/0x201
    [23045.436044] [] ? mark_held_locks+0x5e/0x74
    [23045.436044] [] lock_acquire+0x12f/0x1c3
    [23045.436044] [] ? lock_acquire+0x12f/0x1c3
    [23045.436044] [] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
    [23045.436044] [] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
    [23045.436044] [] mutex_lock_nested+0x77/0x3a7
    [23045.436044] [] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
    [23045.436044] [] ? btrfs_release_delayed_node+0xb/0xd [btrfs]
    [23045.436044] [] btrfs_log_inode+0x13a/0xc95 [btrfs]
    [23045.436044] [] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
    [23045.436044] [] ? vprintk_emit+0x453/0x465
    [23045.436044] [] btrfs_log_inode+0x66e/0xc95 [btrfs]
    [23045.436044] [] log_new_dir_dentries+0x26c/0x359 [btrfs]
    [23045.436044] [] btrfs_log_inode_parent+0x4a6/0x628 [btrfs]
    [23045.436044] [] btrfs_log_dentry_safe+0x5a/0x75 [btrfs]
    [23045.436044] [] btrfs_sync_file+0x304/0x425 [btrfs]
    [23045.436044] [] vfs_fsync_range+0x8c/0x9e
    [23045.436044] [] vfs_fsync+0x1c/0x1e
    [23045.436044] [] do_fsync+0x31/0x4a
    [23045.436044] [] SyS_fsync+0x10/0x14
    [23045.436044] [] entry_SYSCALL_64_fastpath+0x18/0xa8
    [23045.436044] [] ? trace_hardirqs_off_caller+0x3f/0xaa

    An example reproducer for this is:

    $ mkfs.btrfs -f /dev/sdb
    $ mount /dev/sdb /mnt
    $ mkdir /mnt/dir
    $ touch /mnt/dir/foo
    $ sync
    $ mv /mnt/dir/foo /mnt/dir/bar
    $ touch /mnt/dir/foo
    $ xfs_io -c "fsync" /mnt/dir/bar

    This is because while logging the inode of file bar we end up logging its
    parent directory (since its inode has an unlink_trans field matching the
    current transaction id due to the rename operation), which in turn logs
    the inodes for all its new dentries. The new inode for the new file named
    foo therefore gets logged, which in turn triggers another logging attempt
    for the inode we are fsync'ing, since that inode had an old name that
    corresponds to the name of the new inode.

    So fix this by ensuring that when logging the inode for a new dentry that
    has a name matching an old name of some other inode, we don't log again
    the original inode that we are fsync'ing.

    Fixes: 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink")
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    Filipe Manana
     

02 Mar, 2016

1 commit

  • If we delete a snapshot, fsync its parent directory and crash/power fail
    before the next transaction commit, on the next mount when we attempt to
    replay the log tree of the root containing the parent directory we will
    fail and prevent the filesystem from mounting, which is solvable by wiping
    out the log trees with the btrfs-zero-log tool but very inconvenient as
    we will lose any data and metadata fsynced before the parent directory
    was fsynced.

    For example:

    $ mkfs.btrfs -f /dev/sdc
    $ mount /dev/sdc /mnt
    $ mkdir /mnt/testdir
    $ btrfs subvolume snapshot /mnt /mnt/testdir/snap
    $ btrfs subvolume delete /mnt/testdir/snap
    $ xfs_io -c "fsync" /mnt/testdir
    < crash / power failure and reboot >
    $ mount /dev/sdc /mnt
    mount: mount(2) failed: No such file or directory

    And in dmesg/syslog we get the following message and trace:

    [192066.361162] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
    [192066.363010] ------------[ cut here ]------------
    [192066.365268] WARNING: CPU: 4 PID: 5130 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x17a/0x354 [btrfs]()
    [192066.367250] BTRFS: Transaction aborted (error -2)
    [192066.368401] Modules linked in: btrfs dm_flakey dm_mod ppdev sha256_generic xor raid6_pq hmac drbg ansi_cprng aesni_intel acpi_cpufreq tpm_tis aes_x86_64 tpm ablk_helper evdev cryptd sg parport_pc i2c_piix4 psmouse lrw parport i2c_core pcspkr gf128mul processor serio_raw glue_helper button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
    [192066.377154] CPU: 4 PID: 5130 Comm: mount Tainted: G W 4.4.0-rc6-btrfs-next-20+ #1
    [192066.378875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
    [192066.380889] 0000000000000000 ffff880143923670 ffffffff81257570 ffff8801439236b8
    [192066.382561] ffff8801439236a8 ffffffff8104ec07 ffffffffa039dc2c 00000000fffffffe
    [192066.384191] ffff8801ed31d000 ffff8801b9fc9c88 ffff8801086875e0 ffff880143923710
    [192066.385827] Call Trace:
    [192066.386373] [] dump_stack+0x4e/0x79
    [192066.387387] [] warn_slowpath_common+0x99/0xb2
    [192066.388429] [] ? __btrfs_unlink_inode+0x17a/0x354 [btrfs]
    [192066.389236] [] warn_slowpath_fmt+0x48/0x50
    [192066.389884] [] __btrfs_unlink_inode+0x17a/0x354 [btrfs]
    [192066.390621] [] ? iput+0xb0/0x266
    [192066.391200] [] btrfs_unlink_inode+0x1c/0x3d [btrfs]
    [192066.391930] [] check_item_in_log+0x1fe/0x29b [btrfs]
    [192066.392715] [] replay_dir_deletes+0x167/0x1cf [btrfs]
    [192066.393510] [] replay_one_buffer+0x417/0x570 [btrfs]
    [192066.394241] [] walk_up_log_tree+0x10e/0x1dc [btrfs]
    [192066.394958] [] walk_log_tree+0xa5/0x190 [btrfs]
    [192066.395628] [] btrfs_recover_log_trees+0x239/0x32c [btrfs]
    [192066.396790] [] ? replay_one_extent+0x50a/0x50a [btrfs]
    [192066.397891] [] open_ctree+0x1d8b/0x2167 [btrfs]
    [192066.398897] [] btrfs_mount+0x5ef/0x729 [btrfs]
    [192066.399823] [] ? trace_hardirqs_on+0xd/0xf
    [192066.400739] [] ? lockdep_init_map+0xb9/0x1b3
    [192066.401700] [] mount_fs+0x67/0x131
    [192066.402482] [] vfs_kern_mount+0x6c/0xde
    [192066.403930] [] btrfs_mount+0x1cb/0x729 [btrfs]
    [192066.404831] [] ? trace_hardirqs_on+0xd/0xf
    [192066.405726] [] ? lockdep_init_map+0xb9/0x1b3
    [192066.406621] [] mount_fs+0x67/0x131
    [192066.407401] [] vfs_kern_mount+0x6c/0xde
    [192066.408247] [] do_mount+0x893/0x9d2
    [192066.409047] [] ? strndup_user+0x3f/0x8c
    [192066.409842] [] SyS_mount+0x75/0xa1
    [192066.410621] [] entry_SYSCALL_64_fastpath+0x12/0x6b
    [192066.411572] ---[ end trace 2de42126c1e0a0f0 ]---
    [192066.412344] BTRFS: error (device dm-0) in __btrfs_unlink_inode:3986: errno=-2 No such entry
    [192066.413748] BTRFS: error (device dm-0) in btrfs_replay_log:2464: errno=-2 No such entry (Failed to recover log tree)
    [192066.415458] BTRFS error (device dm-0): cleaner transaction attach returned -30
    [192066.444613] BTRFS: open_ctree failed

    This happens because when we are replaying the log and processing the
    directory entry pointing to the snapshot in the subvolume tree, we treat
    its btrfs_dir_item item as having a location with a key type matching
    BTRFS_INODE_ITEM_KEY, which is wrong because the type matches
    BTRFS_ROOT_ITEM_KEY and therefore must be processed differently, as the
    object id refers to a root number and not to an inode in the root
    containing the parent directory.

    So fix this by triggering a transaction commit if an fsync against the
    parent directory is requested after deleting a snapshot. This is the
    simplest approach for a rare use case. An alternative that avoids the
    transaction commit would require more code to explicitly delete the
    snapshot at log replay time (factoring out common code from ioctl.c:
    btrfs_ioctl_snap_destroy()), special care at fsync time to remove the
    log tree of the snapshot's root from the log root of the root of tree
    roots, amongst other steps.

    A test case for xfstests that triggers the issue follows.

    seq=`basename $0`
    seqres=$RESULT_DIR/$seq
    echo "QA output created by $seq"
    tmp=/tmp/$$
    status=1 # failure is the default!
    trap "_cleanup; exit \$status" 0 1 2 3 15

    _cleanup()
    {
        _cleanup_flakey
        cd /
        rm -f $tmp.*
    }

    # get standard environment, filters and checks
    . ./common/rc
    . ./common/filter
    . ./common/dmflakey

    # real QA test starts here
    _need_to_be_root
    _supported_fs btrfs
    _supported_os Linux
    _require_scratch
    _require_dm_target flakey
    _require_metadata_journaling $SCRATCH_DEV

    rm -f $seqres.full

    _scratch_mkfs >>$seqres.full 2>&1
    _init_flakey
    _mount_flakey

    # Create a snapshot at the root of our filesystem (mount point path), delete it,
    # fsync the mount point path, crash and mount to replay the log. This should
    # succeed and after the filesystem is mounted the snapshot should not be visible
    # anymore.
    _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap1
    _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap1
    $XFS_IO_PROG -c "fsync" $SCRATCH_MNT
    _flakey_drop_and_remount
    [ -e $SCRATCH_MNT/snap1 ] && \
        echo "Snapshot snap1 still exists after log replay"

    # Similar scenario as above, but this time the snapshot is created inside a
    # directory and not directly under the root (mount point path).
    mkdir $SCRATCH_MNT/testdir
    _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/testdir/snap2
    _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/testdir/snap2
    $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir
    _flakey_drop_and_remount
    [ -e $SCRATCH_MNT/testdir/snap2 ] && \
        echo "Snapshot snap2 still exists after log replay"

    _unmount_flakey

    echo "Silence is golden"
    status=0
    exit

    Signed-off-by: Filipe Manana
    Tested-by: Liu Bo
    Reviewed-by: Liu Bo
    Signed-off-by: Chris Mason

    Filipe Manana
     

27 Mar, 2015

1 commit

  • We can get into an inconsistency between inodes and directory entries
    after fsyncing a directory. The issue is that while a directory gets
    the new dentries persisted in the fsync log and replayed at mount time,
    the link count of the inode that the directory entries point to doesn't
    get updated, staying with an incorrect link count (smaller than the
    correct value). This later leads to stale file handle errors when
    accessing (including attempting to delete) some of the links if all the
    other ones are removed, which also makes it impossible to delete the
    parent directories, since the dentries cannot be removed.
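
    To make the expected link count concrete, the following is a minimal
    sketch that runs on any filesystem supporting hard links, without any
    crash simulation. It only shows the value the inode's link count should
    have after the links are created. The function name count_links_demo and
    the use of a mktemp directory are assumptions for illustration, while
    the file and link names mirror the xfstests excerpt in this commit
    message. It relies on GNU stat for the %h format.

```shell
# Create a file with four extra hard links spread over nested directories,
# using the same names as the test excerpt, and print its link count.
count_links_demo() {
    dir=$(mktemp -d) || return 1
    echo data > "$dir/foo"
    mkdir -p "$dir/mydir/x/y/z"
    ln "$dir/foo" "$dir/mydir/foo_2"
    ln "$dir/foo" "$dir/mydir/foo_3"
    ln "$dir/foo" "$dir/mydir/x/y/foo_y_link"
    ln "$dir/foo" "$dir/mydir/x/y/z/foo_z_link"
    stat -c %h "$dir/foo"    # GNU stat: %h is the number of hard links
    rm -rf "$dir"
}

count_links_demo
```

    This prints 5: the original name plus four links. With the bug described
    above, the inode instead kept a link count of 1 after log replay.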

    Another issue is that (unlike ext3/4, xfs, f2fs, reiserfs, nilfs2),
    when fsyncing a directory, new files aren't logged (their metadata and
    dentries) nor any child directories. So this patch fixes this issue too,
    since it has the same resolution as the incorrect inode link count issue
    mentioned before.

    This is very easy to reproduce, and the following excerpt from my test
    case for xfstests shows how:

    _scratch_mkfs >> $seqres.full 2>&1
    _init_flakey
    _mount_flakey

    # Create our main test file and directory.
    $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foo | _filter_xfs_io
    mkdir $SCRATCH_MNT/mydir

    # Make sure all metadata and data are durably persisted.
    sync

    # Add a hard link to 'foo' inside our test directory and fsync only the
    # directory. The btrfs fsync implementation had a bug that caused the new
    # directory entry to be visible after the fsync log replay but, the inode
    # of our file remained with a link count of 1.
    ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/foo_2

    # Add a few more links and new files.
    # This is just to verify nothing breaks or gives incorrect results after the
    # fsync log is replayed.
    ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/foo_3
    $XFS_IO_PROG -f -c "pwrite -S 0xff 0 64K" $SCRATCH_MNT/hello | _filter_xfs_io
    ln $SCRATCH_MNT/hello $SCRATCH_MNT/mydir/hello_2

    # Add some subdirectories and new files and links to them. This is to verify
    # that after fsyncing our top level directory 'mydir', all the subdirectories
    # and their files/links are registered in the fsync log and exist after the
    # fsync log is replayed.
    mkdir -p $SCRATCH_MNT/mydir/x/y/z
    ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/x/y/foo_y_link
    ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/x/y/z/foo_z_link
    touch $SCRATCH_MNT/mydir/x/y/z/qwerty

    # Now fsync only our top directory.
    $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/mydir

    # And fsync now our new file named 'hello', just to verify later that it has
    # the expected content and that the previous fsync on the directory 'mydir' had
    # no bad influence on this fsync.
    $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/hello

    # Simulate a crash/power loss.
    _load_flakey_table $FLAKEY_DROP_WRITES
    _unmount_flakey

    _load_flakey_table $FLAKEY_ALLOW_WRITES
    _mount_flakey

    # Verify the content of our file 'foo' remains the same as before, 8192 bytes,
    # all with the value 0xaa.
    echo "File 'foo' content after log replay:"
    od -t x1 $SCRATCH_MNT/foo

    # Remove the first name of our inode. Because of the directory fsync bug, the
    # inode's link count was 1 instead of 5, so removing the 'foo' name ended up
    # deleting the inode and the other names became stale directory entries (still
    # visible to applications). Attempting to remove or access the remaining
    # dentries pointing to that inode resulted in stale file handle errors and
    # made it impossible to remove the parent directories since it was impossible
    # for them to become empty.
    echo "file 'foo' link count after log replay: $(stat -c %h $SCRATCH_MNT/foo)"
    rm -f $SCRATCH_MNT/foo

    # Now verify that all files, links and directories created before fsyncing our
    # directory exist after the fsync log was replayed.
    [ -f $SCRATCH_MNT/mydir/foo_2 ] || echo "Link mydir/foo_2 is missing"
    [ -f $SCRATCH_MNT/mydir/foo_3 ] || echo "Link mydir/foo_3 is missing"
    [ -f $SCRATCH_MNT/hello ] || echo "File hello is missing"
    [ -f $SCRATCH_MNT/mydir/hello_2 ] || echo "Link mydir/hello_2 is missing"
    [ -f $SCRATCH_MNT/mydir/x/y/foo_y_link ] || \
        echo "Link mydir/x/y/foo_y_link is missing"
    [ -f $SCRATCH_MNT/mydir/x/y/z/foo_z_link ] || \
        echo "Link mydir/x/y/z/foo_z_link is missing"
    [ -f $SCRATCH_MNT/mydir/x/y/z/qwerty ] || \
        echo "File mydir/x/y/z/qwerty is missing"

    # We expect our file here to have a size of 64KiB and all the bytes having the
    # value 0xff.
    echo "file 'hello' content after log replay:"
    od -t x1 $SCRATCH_MNT/hello

    # Now remove all files/links, under our test directory 'mydir', and verify we
    # can remove all the directories.
    rm -f $SCRATCH_MNT/mydir/x/y/z/*
    rmdir $SCRATCH_MNT/mydir/x/y/z
    rm -f $SCRATCH_MNT/mydir/x/y/*
    rmdir $SCRATCH_MNT/mydir/x/y
    rmdir $SCRATCH_MNT/mydir/x
    rm -f $SCRATCH_MNT/mydir/*
    rmdir $SCRATCH_MNT/mydir

    # An fsck, run by the fstests framework every time a test finishes, also detected
    # the inconsistency and printed the following error message:
    #
    # root 5 inode 257 errors 2001, no inode item, link count wrong
    # unresolved ref dir 258 index 2 namelen 5 name foo_2 filetype 1 errors 4, no inode ref
    # unresolved ref dir 258 index 3 namelen 5 name foo_3 filetype 1 errors 4, no inode ref

    status=0
    exit

    The expected golden output for the test is:

    wrote 8192/8192 bytes at offset 0
    XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    wrote 65536/65536 bytes at offset 0
    XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    File 'foo' content after log replay:
    0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
    *
    0020000
    file 'foo' link count after log replay: 5
    file 'hello' content after log replay:
    0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    *
    0200000

    Which is the output after this patch and when running the test against
    ext3/4, xfs, f2fs, reiserfs or nilfs2. Without this patch, the test's
    output is:

    wrote 8192/8192 bytes at offset 0
    XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    wrote 65536/65536 bytes at offset 0
    XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    File 'foo' content after log replay:
    0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
    *
    0020000
    file 'foo' link count after log replay: 1
    Link mydir/foo_2 is missing
    Link mydir/foo_3 is missing
    Link mydir/x/y/foo_y_link is missing
    Link mydir/x/y/z/foo_z_link is missing
    File mydir/x/y/z/qwerty is missing
    file 'hello' content after log replay:
    0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    *
    0200000
    rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x/y/z': No such file or directory
    rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x/y': No such file or directory
    rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x': No such file or directory
    rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/foo_2': Stale file handle
    rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/foo_3': Stale file handle
    rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir': Directory not empty

    Fsck, without this fix, also complains about the wrong link count:

    root 5 inode 257 errors 2001, no inode item, link count wrong
    unresolved ref dir 258 index 2 namelen 5 name foo_2 filetype 1 errors 4, no inode ref
    unresolved ref dir 258 index 3 namelen 5 name foo_3 filetype 1 errors 4, no inode ref

    So fix this by logging the inodes that the dentries point to when
    fsyncing a directory.
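
    The fix described above can be pictured with a small user-space model. This is only a sketch under stated assumptions: `inode_sketch` and `log_dir_dentries()` are hypothetical names, not btrfs structures or functions, and the rule "recurse only into directories created after the last committed transaction" is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory inode; not the btrfs structure. */
struct inode_sketch {
    uint64_t generation;            /* transaction that created the inode */
    bool is_dir;
    bool logged;                    /* already copied into the fsync log */
    struct inode_sketch **children; /* inodes this directory's dentries point to */
    size_t nr_children;
};

/*
 * Log every inode referenced by a dentry of 'dir'. Directories created after
 * the last committed transaction are walked recursively so that their own
 * entries reach the log as well. Returns the number of inodes logged.
 */
static size_t log_dir_dentries(struct inode_sketch *dir, uint64_t last_committed)
{
    size_t nr_logged = 0;

    assert(dir != NULL && dir->is_dir);
    for (size_t i = 0; i < dir->nr_children; i++) {
        struct inode_sketch *child = dir->children[i];

        if (child->logged)
            continue; /* e.g. a second hard link to the same inode */
        child->logged = true;
        nr_logged++;
        if (child->is_dir && child->generation > last_committed)
            nr_logged += log_dir_dentries(child, last_committed);
    }
    return nr_logged;
}
```

    In this model an old file that gained a new hard link is logged once, however many new names point to it, which is what lets its link count be replayed correctly.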

    A test case for xfstests follows.

    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     

19 Sep, 2014

1 commit

  • When we do a fast fsync, we start all ordered operations and then while
    they're running in parallel we visit the list of modified extent maps
    and construct their matching file extent items and write them to the
    log btree. After that, in btrfs_sync_log() we wait for all the ordered
    operations to finish (via btrfs_wait_logged_extents).

    The problem with this is that we were completely ignoring errors that
    can happen in the extent write path, such as -ENOSPC, a temporary -ENOMEM
    or -EIO. When such an error happens, parts of the on-disk extent were
    never written, so we end up logging file extent items that point to
    extents containing garbage/random data. After a crash/reboot plus log
    replay, our inode's metadata then points to those extents.

    This contrasted with the full (non-fast) fsync path, where we
    start all ordered operations, wait for them to finish and then write
    to the log btree. In this path, after each ordered operation completes
    we check if it's flagged with an error (BTRFS_ORDERED_IOERR) and return
    -EIO if so (via btrfs_wait_ordered_range).

    So if an error happens with any ordered operation, just return -EIO
    to userspace, so that it knows that not all of its previous writes
    were durably persisted and the application can take proper action
    (redoing the writes, for example). We must also never leave file extent
    items in the log that refer to extents that were not fully written.
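
    A minimal sketch of the check described above, in plain user-space C. The `ordered_extent_sketch` type and `check_logged_extents()` are hypothetical stand-ins; only the idea of an error flag (BTRFS_ORDERED_IOERR) and the -EIO return value come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ordered extent; not the kernel structure. */
struct ordered_extent_sketch {
    bool io_error; /* models the BTRFS_ORDERED_IOERR flag */
};

/*
 * Called after waiting for the ordered extents whose file extent items were
 * logged. Returns 0 when all of them completed cleanly, -EIO otherwise, so
 * the caller can refuse to commit a log that points at unwritten data.
 */
static int check_logged_extents(const struct ordered_extent_sketch *oe, size_t n)
{
    assert(oe != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (oe[i].io_error)
            return -EIO;
    }
    return 0;
}
```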

    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     

09 Sep, 2014

1 commit

  • While we're doing a full fsync (when the inode has the flag
    BTRFS_INODE_NEEDS_FULL_SYNC set) that is ranged too (covers only a
    portion of the file), we might have ordered operations that are started
    before or while we're logging the inode and that fall outside the fsync
    range.

    Therefore, when a full ranged fsync finishes, don't remove every extent
    map from the list of modified extent maps. For those that fall outside
    our fsync range, the respective ordered operation hasn't finished yet,
    meaning the corresponding file extent item wasn't inserted into the
    fs/subvol tree yet and therefore we didn't log it. We must let the next
    fast fsync (one that checks only the modified list) see this extent map,
    log a matching file extent item to the log btree and wait for its
    ordered operation to finish (if it's still ongoing).
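
    The rule above can be sketched as a filter over the modified list. The names below are hypothetical, and treating an extent map as "covered" only when it lies entirely inside the fsync range is an assumption made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent map; not the btrfs structure. */
struct extent_map_sketch {
    uint64_t start;        /* first byte covered */
    uint64_t len;          /* length in bytes, assumed > 0 */
    bool on_modified_list; /* still waiting for a fast fsync to log it */
};

/*
 * After a full fsync of the inclusive byte range [start, end], drop from the
 * modified list only the extent maps inside that range. Maps outside it stay
 * listed so a later fast fsync can still log them. Returns how many were
 * dropped.
 */
static size_t drop_logged_extent_maps(struct extent_map_sketch *ems, size_t n,
                                      uint64_t start, uint64_t end)
{
    size_t dropped = 0;

    assert(start <= end);
    for (size_t i = 0; i < n; i++) {
        uint64_t last = ems[i].start + ems[i].len - 1;

        if (!ems[i].on_modified_list)
            continue;
        if (ems[i].start >= start && last <= end) {
            ems[i].on_modified_list = false;
            dropped++;
        }
    }
    return dropped;
}
```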

    A test case for xfstests follows.

    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     

10 Jun, 2014

1 commit


11 Mar, 2014

2 commits

  • We might commit a log sub-transaction that didn't contain the metadata we
    logged. This happened because we didn't record the log transid and just
    selected the current log sub-transaction to commit, while the right one
    might already have been committed by another task. In that case we
    needn't do anything, and it is safe to return directly.

    This patch improves the log sync based on the above idea. We record the
    transid of the log sub-transaction in which we log the metadata, and the
    transid of the log sub-transaction we have committed. If the committed
    transid is >= the transid we recorded when logging the metadata, we just
    return.
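
    The comparison described above might look like the following user-space sketch. All type, field and function names are hypothetical; the only thing taken from the commit message is the rule that a sync returns early when the committed transid is >= the transid recorded at logging time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical log root state; not the btrfs structure. */
struct log_root_sketch {
    uint64_t log_transid;      /* current log sub-transaction */
    uint64_t last_log_commit;  /* newest committed log sub-transaction */
};

/* Hypothetical per-task record of where the metadata was logged. */
struct log_ctx_sketch {
    uint64_t log_transid;
};

/* Remember which log sub-transaction received this task's metadata. */
static void log_metadata(const struct log_root_sketch *root,
                         struct log_ctx_sketch *ctx)
{
    ctx->log_transid = root->log_transid;
}

/* Returns true if this call did the commit, false if it was already done. */
static bool sync_log(struct log_root_sketch *root,
                     const struct log_ctx_sketch *ctx)
{
    assert(ctx->log_transid <= root->log_transid);
    if (root->last_log_commit >= ctx->log_transid)
        return false; /* another task already committed our metadata */
    root->last_log_commit = root->log_transid;
    root->log_transid++;
    return true;
}
```

    With two tasks logging into the same sub-transaction, the first `sync_log()` call commits it and the second one returns without committing a newer sub-transaction that never held its metadata.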

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • It is possible that many tasks sync the log tree at the same time, but
    only one task can do the sync work while the others wait for it. Those
    waiting tasks didn't get the result of the log sync and returned 0 when
    they finished waiting. This caused them to skip the error handling, and
    the serious problem was that they told users the file sync succeeded
    when in fact it had failed.

    This patch fixes this problem by introducing a log context structure,
    which we insert into a global list. When the sync fails, we set the
    error number of every log context in the list; the waiting tasks then
    get the error number from their log context and handle the error if
    needed.
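
    A rough single-threaded model of the log context idea follows. Locking and the actual waiting are left out, and every name here is hypothetical; the sketch only shows how an error set by the syncing task becomes visible to each registered waiter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-task log context; not the btrfs structure. */
struct log_ctx_sketch {
    int log_ret;                 /* result of the log sync for this task */
    struct log_ctx_sketch *next;
};

/* Hypothetical log root holding the list of registered contexts. */
struct log_root_sketch {
    struct log_ctx_sketch *ctxs;
};

/* A task registers its context before syncing or waiting. */
static void log_ctx_register(struct log_root_sketch *root,
                             struct log_ctx_sketch *ctx)
{
    assert(ctx != NULL);
    ctx->log_ret = 0;
    ctx->next = root->ctxs;
    root->ctxs = ctx;
}

/* The syncing task publishes its result to every context, then empties the list. */
static void log_sync_done(struct log_root_sketch *root, int err)
{
    struct log_ctx_sketch *ctx = root->ctxs;

    while (ctx != NULL) {
        struct log_ctx_sketch *next = ctx->next;

        if (err)
            ctx->log_ret = err;
        ctx->next = NULL;
        ctx = next;
    }
    root->ctxs = NULL;
}
```

    Each waiter would then return its own `log_ret` instead of an unconditional 0.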

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     

07 May, 2013

1 commit

  • Big patch, but all it does is add statics to functions which
    are in fact static, then remove the associated dead-code fallout.

    removed functions:

    btrfs_iref_to_path()
    __btrfs_lookup_delayed_deletion_item()
    __btrfs_search_delayed_insertion_item()
    __btrfs_search_delayed_deletion_item()
    find_eb_for_page()
    btrfs_find_block_group()
    range_straddles_pages()
    extent_range_uptodate()
    btrfs_file_extent_length()
    btrfs_scrub_cancel_devid()
    btrfs_start_transaction_lflush()

    btrfs_print_tree() is left because it is used for debugging.
    btrfs_start_transaction_lflush() and btrfs_reada_detach() are
    left for symmetry.

    ulist.c functions are left, another patch will take care of those.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Josef Bacik

    Eric Sandeen
     

22 Mar, 2012

1 commit


04 May, 2011

1 commit


25 May, 2010

1 commit


14 Oct, 2009

1 commit

  • rpm has a habit of running fdatasync when the file hasn't
    changed. We already detect if a file hasn't been changed
    in the current transaction but it might have been sent to
    the tree-log in this transaction and not changed since
    the last call to fsync.

    In this case, we want to avoid a tree log sync, which includes
    a number of synchronous writes and barriers. This commit
    extends the existing tracking of the last transaction to change
    a file to also track the last sub-transaction.

    The end result is that rpm -ivh and -Uvh are roughly twice as fast,
    and on par with ext3.

    Signed-off-by: Chris Mason

    Chris Mason
     

25 Mar, 2009

1 commit

  • The tree logging code allows individual files or directories to be logged
    without including operations on other files and directories in the FS.
    It tries to commit the minimal set of changes to disk in order to
    fsync the single file or directory that was sent to fsync or O_SYNC.

    The tree logging code was allowing files and directories to be unlinked
    if they were part of a rename operation where only one directory
    in the rename was in the fsync log. This patch adds a few new rules
    to the tree logging.

    1) on rename or unlink, if the inode being unlinked isn't in the fsync
    log, we must force a full commit before doing an fsync of the directory
    where the unlink was done. The commit isn't done during the unlink,
    but it is forced the next time we try to log the parent directory.

    Solution: record transid of last unlink/rename per directory when the
    directory wasn't already logged. For renames this is only done when
    renaming to a different directory.

    mkdir foo/some_dir
    normal commit
    rename foo/some_dir foo2/some_dir
    mkdir foo/some_dir
    fsync foo/some_dir/some_file

    The fsync above will unlink the original some_dir without recording
    it in its new location (foo2). After a crash, some_dir will be gone
    unless the fsync of some_file forces a full commit.

    2) we must log any new names for any file or dir that is in the fsync
    log. This way we make sure not to lose files that are unlinked during
    the same transaction.

    2a) we must log any new names for any file or dir during rename
    when the directory they are being removed from was logged.

    2a is actually the more important variant. Without the extra logging
    a crash might unlink the old name without recreating the new one.

    3) after a crash, we must go through any directories with a link count
    of zero and redo the rm -rf.

    mkdir f1/foo
    normal commit
    rm -rf f1/foo
    fsync(f1)

    The directory f1/foo was fully removed from the FS, but fsync was never
    called on f1/foo, only on its parent dir. After a crash the rm -rf must
    be replayed. This must be able to recurse down the entire
    directory tree. The inode link count fixup code takes care of the
    ugly details.

    Signed-off-by: Chris Mason

    Chris Mason
     

25 Sep, 2008

1 commit