29 Sep, 2008

2 commits

  • There's a race between mm->owner assignment and swapoff, more easily
    seen when task slab poisoning is turned on. The condition occurs when
    try_to_unuse() runs in parallel with an exiting task. A similar race
    can occur with callers of get_task_mm(), such as /proc//
    or ptrace or page migration.

    CPU0                                    CPU1
                                            try_to_unuse
                                            looks at mm = task0->mm
                                            increments mm->mm_users
    task 0 exits
    mm->owner needs to be updated, but no
    new owner is found (mm_users > 1, but
    no other task has task->mm = task0->mm)
    mm_update_next_owner() leaves
                                            mmput(mm) decrements mm->mm_users
    task0 freed
                                            dereferencing mm->owner fails

    The fix is to notify the subsystem via the mm_owner_changed() callback,
    if no new owner is found, by specifying the new task as NULL.

    Jiri Slaby:
    mm->owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
    must be set after that, so as not to pass NULL as old owner causing oops.

    Daisuke Nishimura:
    mm_update_next_owner() may set mm->owner to NULL, but mem_cgroup_from_task()
    and its callers need to take account of this situation to avoid oops.

    Hugh Dickins:
    Lockdep warning and hang below exec_mmap() when testing these patches.
    exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
    so exec_mmap() now needs to do the same. And with that repositioning,
    there's now no point in mm_need_new_owner() allowing for NULL mm.

    Reported-by: Hugh Dickins
    Signed-off-by: Balbir Singh
    Signed-off-by: Jiri Slaby
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • The VFS interface for the 'd_compare()' is a bit special (read: 'odd'),
    because it really just essentially replaces a memcmp(). The filesystem
    is supposed to just compare the two names with whatever case-independent
    or other function.

    And when I say 'is supposed to', I obviously mean that 'procfs does odd
    things, and actually looks at the dentry that we don't even pass down,
    rather than just the name'. Which results in problems, because we
    actually call d_compare before we have even verified that the dentry is
    still hashed at all.

    And that causes a problem since the inode that procfs looks at may have
    been free'd and the d_inode pointer is NULL. procfs just assumes that
    all dentries are positive, since procfs itself never generates a
    negative one. But memory pressure will still result in the dentry
    getting torn down, and as it is removed by RCU, it still remains visible
    on some lists - and to d_compare.

    If the filesystem just did a name comparison, we wouldn't care. And we
    could just fix procfs to know about negative dentries too. But rather
    than have the low-level filesystems know about internal VFS details,
    just move the check for an unhashed dentry up a bit, so that we will only
    call d_compare on dentries that are still active.

    The actual oops this caused didn't look like a NULL pointer dereference
    because procfs did a 'container_of(inode, struct proc_inode, vfs_inode)'
    to get at its internal proc_inode information from the inode pointer,
    and accessed a field below the inode. So the oops would look something
    like

    BUG: unable to handle kernel paging request at fffffffffffffff0
    IP: [] proc_sys_compare+0x36/0x50

    and was seen on both x86-64 (Alexey Dobriyan and Hugh Dickins) and
    ppc64 (Hugh Dickins).
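
    The address arithmetic behind such an oops can be sketched in
    userspace. This is only an illustration: the struct layouts below are
    hypothetical and far smaller than the real inode and proc_inode, and
    proc_inode_addr() is a made-up helper that does the container_of()
    subtraction on integers so that no invalid pointer is ever formed.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of(), shown for reference. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct inode { int i_mode; };           /* hypothetical, heavily reduced */

struct proc_inode {                     /* hypothetical layout */
    long pid;                           /* fields placed before the ... */
    long fd;
    struct inode vfs_inode;             /* ... embedded inode */
};

/* Address container_of() would yield for an inode at 'inode_addr'.
 * Unsigned arithmetic wraps, so a NULL inode maps to an address just
 * below the top of the address space instead of to NULL. */
static uintptr_t proc_inode_addr(uintptr_t inode_addr)
{
    return inode_addr - offsetof(struct proc_inode, vfs_inode);
}
```

    With a NULL inode the result wraps around to the very top of the
    address space, which is why the fault address in the oops is a large
    value rather than something near zero.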

    Reported-by: Alexey Dobriyan
    Acked-by: Hugh Dickins
    Cc: Al Viro
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

26 Sep, 2008

4 commits

  • * git://oss.sgi.com:8090/xfs/linux-2.6:
    [XFS] Remove xfs_iext_irec_compact_full()
    [XFS] Fix extent list corruption in xfs_iext_irec_compact_full().

    Linus Torvalds
     
  • * 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6:
    UBIFS: fix printk format warnings
    UBIFS: remove incorrect assert
    UBIFS: TNC / GC race fixes
    UBIFS: create the name of the background thread in every case

    Linus Torvalds
     
  • Yet another bug was found in xfs_iext_irec_compact_full(). The source of
    the bug was eventually found, but tracking it down was not easy because
    the conditions are very difficult to reproduce.

    A HUGE thank-you goes to Russell Cattelan and Eric Sandeen for their
    significant effort in tracking down the source of this corruption.

    xfs_iext_irec_compact_full() and xfs_iext_irec_compact_pages() are almost
    identical - they both compact indirect extent lists by moving extents from
    subsequent buffers into earlier ones. xfs_iext_irec_compact_pages() only
    moves extents if all of the extents in the next buffer will fit into the
    empty space in the buffer before it. xfs_iext_irec_compact_full() will go
    a step further and move part of the next buffer if all the extents won't
    fit. It will then shift the remaining extents in the next buffer up to the
    start of the buffer. The bug here was that we did not update er_extoff and
    this caused extent list corruption.

    It does not appear that this extra functionality gains us much. Calling
    xfs_iext_irec_compact_pages() instead will do a good enough job at
    compacting the indirect list and will be quicker too.

    For the case in xfs_iext_indirect_to_direct() the total number of extents
    in the indirect list will fit into one buffer so we will never need the
    extra functionality of xfs_iext_irec_compact_full() there.

    Also xfs_iext_irec_compact_pages() doesn't need to do a memmove() (the
    buffers will never overlap), so we avoid the performance hit that can
    incur.

    SGI-PV: 987159

    SGI-Modid: xfs-linux-melb:xfs-kern:32166a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Eric Sandeen

    Lachlan McIlroy
     
  • If we don't move all the records from the next buffer into the current
    buffer then we need to update the er_extoff field of the next buffer as we
    shift the remaining records to the start of the buffer.
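
    A minimal sketch of the bookkeeping described above, assuming a much
    simplified record buffer. The struct, BUF_CAP and compact_pair() are
    hypothetical names for illustration and are not the XFS code; only
    the er_extcount and er_extoff field names come from the description.

```c
#include <string.h>

#define BUF_CAP 4   /* hypothetical number of records per buffer */

/* Simplified stand-in for one indirect extent record buffer. */
struct irec {
    int recs[BUF_CAP];  /* extent records (plain ints in this sketch) */
    int er_extcount;    /* number of records in use */
    int er_extoff;      /* index of recs[0] within the whole list */
};

/* Move as many records as fit from 'next' into the free tail of 'cur'.
 * Returns the number of records moved. */
static int compact_pair(struct irec *cur, struct irec *next)
{
    int avail = BUF_CAP - cur->er_extcount;
    int moved = next->er_extcount < avail ? next->er_extcount : avail;

    memcpy(&cur->recs[cur->er_extcount], next->recs, moved * sizeof(int));
    cur->er_extcount += moved;

    /* Shift what is left in 'next' to the start of its buffer ... */
    next->er_extcount -= moved;
    memmove(next->recs, &next->recs[moved], next->er_extcount * sizeof(int));
    /* ... and record that its first entry now sits 'moved' slots later. */
    next->er_extoff += moved;
    return moved;
}
```

    Without the final er_extoff adjustment, the next buffer would still
    claim to start at its old position in the list even though its first
    records now live in the previous buffer.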

    SGI-PV: 987159

    SGI-Modid: xfs-linux-melb:xfs-kern:32165a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Eric Sandeen
    Signed-off-by: Russell Cattelan

    Lachlan McIlroy
     

25 Sep, 2008

1 commit

  • In case of error, the function p9_client_walk returns an ERR pointer, but
    never returns a NULL pointer. So a NULL test that comes after an IS_ERR
    test should be deleted.

    The semantic match that finds this problem is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    // <smpl>
    @match_bad_null_test@
    expression x, E;
    statement S1,S2;
    @@
    x = p9_client_walk(...)
    ... when != x = E
    * if (x != NULL)
    S1 else S2
    // </smpl>
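
    The error-pointer convention that makes the NULL test redundant can be
    sketched in userspace as follows. The helpers imitate the kernel's
    ERR_PTR(), PTR_ERR() and IS_ERR(); struct fid, walk() and lookup() are
    hypothetical stand-ins and are not the 9p code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation of the kernel's error-pointer helpers: small
 * negative errno values are encoded at the top of the address space. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fid { int id; };     /* hypothetical, reduced to one field */

/* Hypothetical function that, like p9_client_walk(), reports failure
 * with an error pointer and never returns NULL. */
static struct fid *walk(struct fid *from, int fail)
{
    if (fail)
        return ERR_PTR(-ENOENT);
    return from;
}

/* Caller: the IS_ERR() test is the only check that is needed. */
static long lookup(struct fid *from, int fail)
{
    struct fid *f = walk(from, fail);

    if (IS_ERR(f))
        return PTR_ERR(f);
    return f->id;   /* no "if (f != NULL)" test: it would always be true */
}
```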

    Signed-off-by: Julien Brunel
    Signed-off-by: Julia Lawall
    Signed-off-by: Eric Van Hensbergen
    Signed-off-by: Andrew Morton

    Julien Brunel
     

18 Sep, 2008

1 commit

  • fs/ubifs/dir.c:428: warning: format '%llu' expects type 'long long
    unsigned int', but argument 5 has type 'long unsigned int'

    fs/ubifs/debug.c:541: warning: format '%llu' expects type 'long long
    unsigned int', but argument 2 has type 'long unsigned int'
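
    One common way to silence this class of warning is to cast the
    argument to the type the format expects. The sketch below uses
    snprintf() in userspace rather than printk(), and format_ino() is a
    hypothetical helper; it is not a claim about how this commit changed
    the UBIFS call sites.

```c
#include <stdio.h>

/* Formats a value whose type is 'unsigned long'. Passing it straight to
 * "%llu" triggers the warning quoted above on targets where that type
 * differs from 'unsigned long long', so the argument is cast to match
 * the format specifier. */
static int format_ino(char *buf, size_t len, unsigned long ino)
{
    return snprintf(buf, len, "ino %llu", (unsigned long long)ino);
}
```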

    Signed-off-by: Alexander Beregalov
    Signed-off-by: Artem Bityutskiy

    Alexander Beregalov
     

17 Sep, 2008

10 commits

  • The assert was not valid because one of the variables,
    'taken_empty_lebs', has transient values out of sync
    with the other variables.

    Signed-off-by: Adrian Hunter
    Signed-off-by: Artem Bityutskiy

    Adrian Hunter
     
  • - update GC sequence number if any nodes may have been moved
    even if GC did not finish the LEB
    - don't ignore error return when reading

    Signed-off-by: Adrian Hunter
    Signed-off-by: Artem Bityutskiy

    Adrian Hunter
     
  • If the ubifs partition is mounted RO and then remounted RW we end
    up with no thread name in ubifs_remount_rw() and the thread appears
    nameless.

    Signed-off-by: Sebastian Siewior
    Signed-off-by: Artem Bityutskiy

    Sebastian Siewior
     
  • When unreserving space with boundaries that are not block aligned we round
    up the start and round down the end boundaries and then use this function,
    xfs_zero_remaining_bytes(), to zero the parts of the blocks that got
    dropped during the rounding. The problem is we don't consider if these
    blocks are beyond eof. Worse still is if we encounter delayed allocations
    beyond eof we will try to use the magic delayed allocation block number as
    a real block number. If the file size is ever extended to expose these
    blocks then we'll go through xfs_zero_eof() to zero them anyway.
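
    The rounding described above can be sketched as follows, assuming a
    power-of-two block size and half-open byte ranges. struct range,
    split_range() and clip_to_eof() are hypothetical helpers written for
    this illustration and are not the XFS implementation.

```c
#include <stdint.h>

struct range { uint64_t start, end; };   /* half-open: [start, end) */

/* Split a byte range into the block-aligned interior plus the partial
 * head and tail pieces that the rounding drops. */
static void split_range(struct range r, uint64_t bsize,
                        struct range *head, struct range *mid,
                        struct range *tail)
{
    uint64_t up = (r.start + bsize - 1) & ~(bsize - 1);  /* round start up */
    uint64_t down = r.end & ~(bsize - 1);                 /* round end down */

    if (up > down) {          /* range lies inside a single block */
        *head = r;
        mid->start = mid->end = tail->start = tail->end = r.end;
        return;
    }
    head->start = r.start; head->end = up;
    mid->start = up;       mid->end = down;
    tail->start = down;    tail->end = r.end;
}

/* Bytes at or beyond end-of-file need no zeroing, so clip the range. */
static struct range clip_to_eof(struct range r, uint64_t isize)
{
    if (r.end > isize)
        r.end = isize;
    if (r.start > r.end)
        r.start = r.end;
    return r;
}
```

    In this sketch only the head and tail pieces would need zeroing, and
    clipping them against the file size first means nothing beyond eof is
    ever touched.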

    SGI-PV: 983683

    SGI-Modid: xfs-linux-melb:xfs-kern:32055a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    Lachlan McIlroy
     
  • We have a use-after-free issue where log completions access buffers via
    the buffer log item and the buffer has already been freed. Fix this by
    taking a reference on the buffer when attaching the buffer log item and
    releasing the hold when the buffer log item is detached and we no longer
    need the buffer. Also create a new function xfs_buf_item_free() to combine
    some common code.

    SGI-PV: 985757

    SGI-Modid: xfs-linux-melb:xfs-kern:32025a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    Lachlan McIlroy
     
  • If we call xfs_lock_two_inodes() to grab both the iolock and the ilock,
    then drop the ilocks on both inodes, then grab them again (as
    xfs_swap_extents() does) then lockdep will report a locking order problem.
    This is a false positive.

    To avoid this, disallow xfs_lock_two_inodes() from locking both inode locks
    at once - force callers to make two separate calls. This means that nested
    dropping and regaining of the ilocks will retain the same lockdep subclass
    and so lockdep will not see anything wrong with this code.

    SGI-PV: 986238

    SGI-Modid: xfs-linux-melb:xfs-kern:31999a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Peter Leckie
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • The current code in xlog_iodone() uses the wrong macro to check if the
    barrier has been cleared due to an EOPNOTSUPP error from the lower layer.

    SGI-PV: 986143

    SGI-Modid: xfs-linux-melb:xfs-kern:31984a

    Signed-off-by: David Chinner
    Signed-off-by: Nathaniel W. Turner
    Signed-off-by: Peter Leckie
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • With the help from some tracing I found that we try to map extents beyond
    eof when doing a direct I/O read. It appears that the way to inform the
    generic direct I/O path (i.e. do_direct_IO()) that we have breached eof is
    to return an unmapped buffer from xfs_get_blocks_direct(). This will cause
    do_direct_IO() to jump to the hole handling code where it will check for
    eof and then abort.

    This problem was found because a direct I/O read was trying to map beyond
    eof and was encountering delayed allocations. The delayed allocations
    beyond eof are speculative allocations and they didn't get converted when
    the direct I/O flushed the file because there was only enough space in the
    current AG to convert and write out the dirty pages within eof. Note that
    xfs_iomap_write_allocate() won't necessarily convert all the delayed
    allocation passed to it - it will return after allocating the first extent
    - so if the delayed allocation extends beyond eof then it will stay that
    way.

    SGI-PV: 983683

    SGI-Modid: xfs-linux-melb:xfs-kern:31929a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    Lachlan McIlroy
     
  • Logically, the xfs_fs_remount code should return an error for mount
    options that cannot be changed on remount, so that users are not misled
    into believing that a change took effect.

    But unfortunately mount(8) adds all options from mtab and fstab to the
    mount arguments in some cases so we can't blindly reject options, but have
    to check for each specified option if it actually differs from the
    currently set option and only reject it if that's the case.

    Until that is implemented we return success for every remount request, and
    silently ignore all options that we can't actually change.

    SGI-PV: 985710

    SGI-Modid: xfs-linux-melb:xfs-kern:31908a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Memory allocations for log->l_grant_trace and iclog->ic_trace are done on
    demand when the first event is logged. In xlog_state_get_iclog_space() we
    call xlog_trace_iclog() under a spinlock and allocating memory here can
    cause us to sleep with a spinlock held and deadlock the system.

    For the log grant tracing we use KM_NOSLEEP but that means we can lose
    trace entries. Since there is no locking to serialize the log grant
    tracing we could race and have multiple allocations and leak memory.

    So move the allocations to where we initialize the log/iclog structures.
    Use KM_NOFS to avoid recursing into the filesystem and drop log->l_trace
    since it's not even used.

    SGI-PV: 983738

    SGI-Modid: xfs-linux-melb:xfs-kern:31896a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    Lachlan McIlroy
     

14 Sep, 2008

4 commits

  • Herton Krzesinski reports that the error-checking changes in
    04ebd4aee52b06a2c38127d9208546e5b96f3a19 ("block/ioctl.c and
    fs/partition/check.c: check value returned by add_partition") cause his
    buggy USB camera to no longer mount. "The camera is an Olympus X-840.
    The original issue comes from the camera itself: its format program
    creates a partition with an off by one error".

    Buggy devices happen. It is better for the kernel to warn and to proceed
    with the mount.

    Reported-by: Herton Ronaldo Krzesinski
    Cc: Abdel Benamrouche
    Cc: Jens Axboe
    Cc: Alan Stern
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • A "Quicklists: 0 kB" line has just started appearing in
    /proc/meminfo, but most architectures (including x86) don't have
    them configured, so #ifdef it, like the highmem lines.

    And those architectures which do have quicklists configured are
    using them for page tables: so let's place it next to PageTables.

    Signed-off-by: Hugh Dickins
    Acked-by: Christoph Lameter
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • This fixes:

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.27-rc5-00283-g70bb089 #68
    ---------------------------------------------
    touch/6855 is trying to acquire lock:
    (&info->bfs_lock){--..}, at: [] bfs_delete_inode+0x9e/0x18c

    but task is already holding lock:
    (&info->bfs_lock){--..}, at: [] bfs_create+0x45/0x187

    other info that might help us debug this:
    2 locks held by touch/6855:
    #0: (&type->i_mutex_dir_key#5){--..}, at: [] do_filp_open+0x10b/0x62f
    #1: (&info->bfs_lock){--..}, at: [] bfs_create+0x45/0x187

    stack backtrace:
    Pid: 6855, comm: touch Not tainted 2.6.27-rc5-00283-g70bb089 #68
    [] validate_chain+0x458/0x9f4
    [] ? trace_hardirqs_off+0xb/0xd
    [] __lock_acquire+0x666/0x6e0
    [] lock_acquire+0x5b/0x77
    [] ? bfs_delete_inode+0x9e/0x18c
    [] mutex_lock_nested+0xbc/0x234
    [] ? bfs_delete_inode+0x9e/0x18c
    [] ? bfs_delete_inode+0x9e/0x18c
    [] bfs_delete_inode+0x9e/0x18c
    [] ? bfs_delete_inode+0x0/0x18c
    [] generic_delete_inode+0x94/0xfe
    [] generic_drop_inode+0x12/0x12f
    [] iput+0x4b/0x4e
    [] bfs_create+0x163/0x187
    [] vfs_create+0xa6/0x114
    [] do_filp_open+0x1ad/0x62f
    [] ? native_sched_clock+0x82/0x96
    [] ? _spin_unlock+0x27/0x3c
    [] ? alloc_fd+0xbf/0xc9
    [] ? sub_preempt_count+0x9d/0xab
    [] ? alloc_fd+0xbf/0xc9
    [] do_sys_open+0x42/0xb8
    [] ? trace_hardirqs_on_thunk+0xc/0x10
    [] sys_open+0x1e/0x26
    [] sysenter_do_call+0x12/0x31
    =======================

    The problem is that we don't unlock the bfs->lock mutex before calling
    iput (we do in the other cases).

    Signed-off-by: Eric Sesterhenn
    Cc: Tigran Aivazian
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sesterhenn
     
  • Print parent directory name as well.

    The aim is to catch non-creation of the parent directory, when proc_mkdir
    returns NULL and all subsequent registrations go directly into /proc instead
    of the intended directory.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    [ Fixed insane printk string while at it. - Linus ]
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

11 Sep, 2008

1 commit


10 Sep, 2008

2 commits

  • ocfs2 will become read-only if we try to read bytes past the end of
    i_size. This can be easily reproduced with the following steps:
    1. mkfs an ocfs2 volume with bs=4k cs=4k and nosparse.
    2. create a small file (say less than 100 bytes); it will be
    allocated 1 cluster.
    3. read 8196 bytes from the file using O_DIRECT, which exceeds the limit.
    4. The ocfs2 volume becomes read-only and dmesg shows:
    OCFS2: ERROR (device sda13): ocfs2_direct_IO_get_blocks:
    Inode 66010 has a hole at block 1
    File system is now read-only due to the potential of on-disk corruption.
    Please run fsck.ocfs2 once the file system is unmounted.

    So suppress the ERROR message.

    Signed-off-by: Tao Ma
    Signed-off-by: Mark Fasheh

    Tao Ma
     
  • * 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6:
    UBIFS: make minimum fanout 3
    UBIFS: fix division by zero
    UBIFS: amend f_fsid
    UBIFS: fill f_fsid
    UBIFS: improve statfs reporting even more
    UBIFS: introduce LEB overhead
    UBIFS: add forgotten gc_idx_lebs component
    UBIFS: fix assertion
    UBIFS: improve statfs reporting
    UBIFS: remove incorrect index space check
    UBIFS: push empty flash hack down
    UBIFS: do not update min_idx_lebs in stafs
    UBIFS: allow for racing between GC and TNC
    UBIFS: always read hashed-key nodes under TNC mutex
    UBIFS: fix zero-length truncations

    Linus Torvalds
     

09 Sep, 2008

2 commits

  • Automounter maps can contain mount options valid for other NFS
    implementations but not for Linux. The Linux automounter uses the
    mount command's "-s" command line option ("s" for "sloppy") so that
    mount requests containing such options are not rejected.

    Commit f45663ce5fb30f76a3414ab3ac69f4dd320e760a attempted to address a
    known regression with text-based NFS mount option parsing. Unrecognized
    mount options would cause mount requests to fail, even if the "-s"
    option was used on the mount command line.

    Unfortunately, this commit was not complete as submitted. It adds a
    new mount option, "sloppy". But it is missing a hunk, so it now allows
    NFS mounts with unrecognized mount options, even if the "sloppy" option
    is not present. This could be a problem if a required critical mount
    option such as "sync" is misspelled, for example, and is considered a
    regression from 2.6.26.

    This patch restores the missing hunk. Now, the default behavior of
    text-based NFS mount options is as before: any unrecognized mount option
    will cause the mount to fail.
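
    The intended behavior can be sketched as a small option parser. The
    option table, is_known() and parse_options() are hypothetical and much
    simpler than the NFS mount parsing code; the sketch only shows
    unrecognized options being tolerated when "sloppy" is present.

```c
#include <string.h>

/* Hypothetical table of recognised options for this sketch. */
static const char *const known[] = { "sync", "ro", "rw", "sloppy" };

static int is_known(const char *opt, size_t len)
{
    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
        if (strlen(known[i]) == len && strncmp(known[i], opt, len) == 0)
            return 1;
    return 0;
}

/* Returns 1 if the comma-separated option string is acceptable, 0 if the
 * mount should fail. Unrecognised options are tolerated only when the
 * "sloppy" option appears somewhere in the string. */
static int parse_options(const char *opts)
{
    int sloppy = 0, unknown = 0;

    while (*opts) {
        size_t len = strcspn(opts, ",");

        if (len == 6 && strncmp(opts, "sloppy", 6) == 0)
            sloppy = 1;
        else if (len > 0 && !is_known(opts, len))
            unknown = 1;
        opts += len;
        if (*opts == ',')
            opts++;
    }
    return !unknown || sloppy;
}
```

    In this sketch a misspelled "sync" makes parse_options() reject the
    string unless "sloppy" was also supplied, which is the default
    behavior the commit message describes.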

    Please include this in 2.6.27-rc.

    Thanks to Neil Brown for reporting this.

    Signed-off-by: Chuck Lever
    Acked-by: J. Bruce Fields
    Signed-off-by: Linus Torvalds

    Chuck Lever
     
  • UDF currently doesn't set a llseek method for regular files, which
    means it will fall back to default_llseek. This means no one can seek
    beyond 2 gigabytes on UDF, and that there's no protection against
    the i_size updates from writers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Christoph Hellwig
     

06 Sep, 2008

3 commits

  • UBIFS does not really work correctly when fanout is 2,
    because of the way we manage the indexing tree. It may
    just become a list and UBIFS screws up.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • If fanout is 3, we have division by zero in
    'ubifs_read_superblock()':

    divide error: 0000 [#1] PREEMPT SMP

    Pid: 28744, comm: mount Not tainted (2.6.27-rc4-ubifs-2.6 #23)
    EIP: 0060:[] EFLAGS: 00010202 CPU: 0
    EIP is at ubifs_reported_space+0x2d/0x69 [ubifs]
    EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
    ESI: 00000000 EDI: f0ae64b0 EBP: f1f9fcf4 ESP: f1f9fce0
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • Spencer reported a problem where utime and stime were going negative despite
    the fixes in commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa. The suspected
    reason for the problem is that signal_struct maintains its own utime and
    stime (of exited tasks). These are not updated using the new task_utime()
    routine, hence sig->utime can go backwards and cause the same problem
    to occur (sig->utime adds tsk->utime and not task_utime()). This patch
    fixes the problem.

    TODO: using max(task->prev_utime, derived utime) works for now, but a more
    generic solution is to implement cputime_max() and use the cputime_gt()
    function for comparison.
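
    The max(task->prev_utime, derived utime) idea from the TODO can be
    sketched as below. The cputime_t typedef, struct task and
    report_utime() are simplified stand-ins for illustration, not the
    scheduler code.

```c
typedef unsigned long long cputime_t;   /* stand-in type for this sketch */

struct task {
    cputime_t prev_utime;   /* largest value reported so far */
};

/* Report a user-time value that never goes backwards, even if the
 * freshly derived estimate is smaller than one reported earlier. */
static cputime_t report_utime(struct task *t, cputime_t derived)
{
    if (derived > t->prev_utime)
        t->prev_utime = derived;
    return t->prev_utime;
}
```

    Any accumulator that sums the reported values rather than the raw
    ones then sees a monotonic series.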

    Reported-by: spencer@bluehost.com
    Signed-off-by: Balbir Singh
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Balbir Singh
     

03 Sep, 2008

4 commits

  • David Woodhouse suggested being consistent with other FSes
    and xor-ing the beginning and the end of the UUID.
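
    A minimal sketch of folding a 16-byte UUID into a two-word fsid by
    xor-ing its first half with its second half. uuid_to_fsid() is a
    hypothetical helper, and byte order handling is deliberately left out,
    so this is not the UBIFS code.

```c
#include <stdint.h>
#include <string.h>

/* Fold a 16-byte UUID into two 32-bit words by xor-ing its first half
 * with its second half. The words are read in host byte order here,
 * which an on-disk format would normally not rely on. */
static void uuid_to_fsid(const unsigned char uuid[16], uint32_t fsid[2])
{
    uint32_t w[4];

    memcpy(w, uuid, sizeof(w));
    fsid[0] = w[0] ^ w[2];
    fsid[1] = w[1] ^ w[3];
}
```

    Using both halves means every byte of the UUID influences the
    resulting fsid, rather than only the leading bytes.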

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • Quicklists can consume several GB of memory. We should provide a means of
    monitoring this.

    After this patch is applied, /proc/meminfo will output the following:

    % cat /proc/meminfo

    MemTotal: 7715392 kB
    MemFree: 5401600 kB
    Buffers: 80384 kB
    Cached: 300800 kB
    SwapCached: 0 kB
    Active: 235584 kB
    Inactive: 262656 kB
    SwapTotal: 2031488 kB
    SwapFree: 2031488 kB
    Dirty: 3520 kB
    Writeback: 0 kB
    AnonPages: 117696 kB
    Mapped: 38528 kB
    Slab: 1589952 kB
    SReclaimable: 23104 kB
    SUnreclaim: 1566848 kB
    PageTables: 14656 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 5889152 kB
    Committed_AS: 393152 kB
    VmallocTotal: 17592177655808 kB
    VmallocUsed: 29056 kB
    VmallocChunk: 17592177626432 kB
    Quicklists: 130944 kB
    HugePages_Total: 0
    HugePages_Free: 0
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 262144 kB

    Signed-off-by: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Keiichiro Tokunaga
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Update the location of the NTFS homepage in several files.

    Signed-off-by: Adrian Bunk
    Cc: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • * 'for-2.6.27' of git://linux-nfs.org/~bfields/linux:
    nfsd: fix buffer overrun decoding NFSv4 acl
    sunrpc: fix possible overrun on read of /proc/sys/sunrpc/transports
    nfsd: fix compound state allocation error handling
    svcrdma: Fix race between svc_rdma_recvfrom thread and the dto_tasklet

    Linus Torvalds
     

02 Sep, 2008

2 commits


31 Aug, 2008

4 commits

  • UBIFS stores a 16-byte UUID in the superblock, and it is a good
    idea to return part of it in the 'f_fsid' field of the kstatfs structure.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • Since the free space we report in statfs is the size of a file that
    should fit into the FS, change the way we calculate free space and use
    leb_overhead instead of dark_wm in the calculations.

    Results of "freespace" test (120MiB volume, 16KiB LEB size,
    512 bytes page size). Before the change:

    freespace: Test 1: fill the space we have 3 times
    freespace: was free: 85204992 bytes 81.3 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 11284480 bytes 10.8 MiB, wrote 13.2% more than predicted
    freespace: was free: 83554304 bytes 79.7 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 12935168 bytes 12.3 MiB, wrote 15.5% more than predicted
    freespace: was free: 83554304 bytes 79.7 MiB, wrote: 96493568 bytes 92.0 MiB, delta: 12939264 bytes 12.3 MiB, wrote 15.5% more than predicted
    freespace: Test 1 finished

    freespace: Test 2: gradually lessen amount of free space and fill the FS
    freespace: do 10 steps, lessen free space by 7596218 bytes 7.2 MiB each time
    freespace: was free: 78675968 bytes 75.0 MiB, wrote: 88903680 bytes 84.8 MiB, delta: 10227712 bytes 9.8 MiB, wrote 13.0% more than predicted
    freespace: was free: 72015872 bytes 68.7 MiB, wrote: 81514496 bytes 77.7 MiB, delta: 9498624 bytes 9.1 MiB, wrote 13.2% more than predicted
    freespace: was free: 63938560 bytes 61.0 MiB, wrote: 72589312 bytes 69.2 MiB, delta: 8650752 bytes 8.2 MiB, wrote 13.5% more than predicted
    freespace: was free: 56127488 bytes 53.5 MiB, wrote: 63762432 bytes 60.8 MiB, delta: 7634944 bytes 7.3 MiB, wrote 13.6% more than predicted
    freespace: was free: 48336896 bytes 46.1 MiB, wrote: 54935552 bytes 52.4 MiB, delta: 6598656 bytes 6.3 MiB, wrote 13.7% more than predicted
    freespace: was free: 40587264 bytes 38.7 MiB, wrote: 46157824 bytes 44.0 MiB, delta: 5570560 bytes 5.3 MiB, wrote 13.7% more than predicted
    freespace: was free: 32841728 bytes 31.3 MiB, wrote: 37384192 bytes 35.7 MiB, delta: 4542464 bytes 4.3 MiB, wrote 13.8% more than predicted
    freespace: was free: 25100288 bytes 23.9 MiB, wrote: 28618752 bytes 27.3 MiB, delta: 3518464 bytes 3.4 MiB, wrote 14.0% more than predicted
    freespace: was free: 17342464 bytes 16.5 MiB, wrote: 19841024 bytes 18.9 MiB, delta: 2498560 bytes 2.4 MiB, wrote 14.4% more than predicted
    freespace: was free: 9605120 bytes 9.2 MiB, wrote: 11063296 bytes 10.6 MiB, delta: 1458176 bytes 1.4 MiB, wrote 15.2% more than predicted
    freespace: Test 2 finished

    freespace: Test 3: gradually lessen amount of free space by trashing and fill the FS
    freespace: do 10 steps, lessen free space by 7606272 bytes 7.3 MiB each time
    freespace: trashing: was free: 83668992 bytes 79.8 MiB, need free: 7606272 bytes 7.3 MiB, files created: 248297, delete 225724 (90.9% of them)
    freespace: was free: 70803456 bytes 67.5 MiB, wrote: 82485248 bytes 78.7 MiB, delta: 11681792 bytes 11.1 MiB, wrote 16.5% more than predicted
    freespace: trashing: was free: 81080320 bytes 77.3 MiB, need free: 15212544 bytes 14.5 MiB, files created: 248711, delete 202047 (81.2% of them)
    freespace: was free: 59867136 bytes 57.1 MiB, wrote: 71897088 bytes 68.6 MiB, delta: 12029952 bytes 11.5 MiB, wrote 20.1% more than predicted
    freespace: trashing: was free: 82243584 bytes 78.4 MiB, need free: 22818816 bytes 21.8 MiB, files created: 248866, delete 179817 (72.3% of them)
    freespace: was free: 50905088 bytes 48.5 MiB, wrote: 63168512 bytes 60.2 MiB, delta: 12263424 bytes 11.7 MiB, wrote 24.1% more than predicted
    freespace: trashing: was free: 83402752 bytes 79.5 MiB, need free: 30425088 bytes 29.0 MiB, files created: 248920, delete 158114 (63.5% of them)
    freespace: was free: 42651648 bytes 40.7 MiB, wrote: 55406592 bytes 52.8 MiB, delta: 12754944 bytes 12.2 MiB, wrote 29.9% more than predicted
    freespace: trashing: was free: 84402176 bytes 80.5 MiB, need free: 38031360 bytes 36.3 MiB, files created: 248709, delete 136641 (54.9% of them)
    freespace: was free: 35233792 bytes 33.6 MiB, wrote: 48250880 bytes 46.0 MiB, delta: 13017088 bytes 12.4 MiB, wrote 36.9% more than predicted
    freespace: trashing: was free: 82530304 bytes 78.7 MiB, need free: 45637632 bytes 43.5 MiB, files created: 248778, delete 111208 (44.7% of them)
    freespace: was free: 27287552 bytes 26.0 MiB, wrote: 40267776 bytes 38.4 MiB, delta: 12980224 bytes 12.4 MiB, wrote 47.6% more than predicted
    freespace: trashing: was free: 85114880 bytes 81.2 MiB, need free: 53243904 bytes 50.8 MiB, files created: 248508, delete 93052 (37.4% of them)
    freespace: was free: 22437888 bytes 21.4 MiB, wrote: 35328000 bytes 33.7 MiB, delta: 12890112 bytes 12.3 MiB, wrote 57.4% more than predicted
    freespace: trashing: was free: 84103168 bytes 80.2 MiB, need free: 60850176 bytes 58.0 MiB, files created: 248637, delete 68743 (27.6% of them)
    freespace: was free: 15536128 bytes 14.8 MiB, wrote: 28319744 bytes 27.0 MiB, delta: 12783616 bytes 12.2 MiB, wrote 82.3% more than predicted
    freespace: trashing: was free: 84357120 bytes 80.4 MiB, need free: 68456448 bytes 65.3 MiB, files created: 248567, delete 46852 (18.8% of them)
    freespace: was free: 9015296 bytes 8.6 MiB, wrote: 22044672 bytes 21.0 MiB, delta: 13029376 bytes 12.4 MiB, wrote 144.5% more than predicted
    freespace: trashing: was free: 84942848 bytes 81.0 MiB, need free: 76062720 bytes 72.5 MiB, files created: 248636, delete 25993 (10.5% of them)
    freespace: was free: 6086656 bytes 5.8 MiB, wrote: 8331264 bytes 7.9 MiB, delta: 2244608 bytes 2.1 MiB, wrote 36.9% more than predicted
    freespace: Test 3 finished

    freespace: finished successfully

    After the change:

    freespace: Test 1: fill the space we have 3 times
    freespace: was free: 94048256 bytes 89.7 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 2441216 bytes 2.3 MiB, wrote 2.6% more than predicted
    freespace: was free: 92246016 bytes 88.0 MiB, wrote: 96493568 bytes 92.0 MiB, delta: 4247552 bytes 4.1 MiB, wrote 4.6% more than predicted
    freespace: was free: 92254208 bytes 88.0 MiB, wrote: 96489472 bytes 92.0 MiB, delta: 4235264 bytes 4.0 MiB, wrote 4.6% more than predicted
    freespace: Test 1 finished

    freespace: Test 2: gradually lessen amount of free space and fill the FS
    freespace: do 10 steps, lessen free space by 8386001 bytes 8.0 MiB each time
    freespace: was free: 86605824 bytes 82.6 MiB, wrote: 88252416 bytes 84.2 MiB, delta: 1646592 bytes 1.6 MiB, wrote 1.9% more than predicted
    freespace: was free: 78667776 bytes 75.0 MiB, wrote: 80715776 bytes 77.0 MiB, delta: 2048000 bytes 2.0 MiB, wrote 2.6% more than predicted
    freespace: was free: 69615616 bytes 66.4 MiB, wrote: 71630848 bytes 68.3 MiB, delta: 2015232 bytes 1.9 MiB, wrote 2.9% more than predicted
    freespace: was free: 61018112 bytes 58.2 MiB, wrote: 62783488 bytes 59.9 MiB, delta: 1765376 bytes 1.7 MiB, wrote 2.9% more than predicted
    freespace: was free: 52424704 bytes 50.0 MiB, wrote: 53968896 bytes 51.5 MiB, delta: 1544192 bytes 1.5 MiB, wrote 2.9% more than predicted
    freespace: was free: 43880448 bytes 41.8 MiB, wrote: 45199360 bytes 43.1 MiB, delta: 1318912 bytes 1.3 MiB, wrote 3.0% more than predicted
    freespace: was free: 35332096 bytes 33.7 MiB, wrote: 36425728 bytes 34.7 MiB, delta: 1093632 bytes 1.0 MiB, wrote 3.1% more than predicted
    freespace: was free: 26771456 bytes 25.5 MiB, wrote: 27643904 bytes 26.4 MiB, delta: 872448 bytes 852.0 KiB, wrote 3.3% more than predicted
    freespace: was free: 18231296 bytes 17.4 MiB, wrote: 18878464 bytes 18.0 MiB, delta: 647168 bytes 632.0 KiB, wrote 3.5% more than predicted
    freespace: was free: 9674752 bytes 9.2 MiB, wrote: 10088448 bytes 9.6 MiB, delta: 413696 bytes 404.0 KiB, wrote 4.3% more than predicted
    freespace: Test 2 finished

    freespace: Test 3: gradually lessen amount of free space by trashing and fill the FS
    freespace: do 10 steps, lessen free space by 8397544 bytes 8.0 MiB each time
    freespace: trashing: was free: 92372992 bytes 88.1 MiB, need free: 8397552 bytes 8.0 MiB, files created: 248296, delete 225723 (90.9% of them)
    freespace: was free: 71909376 bytes 68.6 MiB, wrote: 82472960 bytes 78.7 MiB, delta: 10563584 bytes 10.1 MiB, wrote 14.7% more than predicted
    freespace: trashing: was free: 88989696 bytes 84.9 MiB, need free: 16795096 bytes 16.0 MiB, files created: 248794, delete 201838 (81.1% of them)
    freespace: was free: 60354560 bytes 57.6 MiB, wrote: 71782400 bytes 68.5 MiB, delta: 11427840 bytes 10.9 MiB, wrote 18.9% more than predicted
    freespace: trashing: was free: 90304512 bytes 86.1 MiB, need free: 25192640 bytes 24.0 MiB, files created: 248733, delete 179342 (72.1% of them)
    freespace: was free: 51187712 bytes 48.8 MiB, wrote: 62943232 bytes 60.0 MiB, delta: 11755520 bytes 11.2 MiB, wrote 23.0% more than predicted
    freespace: trashing: was free: 91209728 bytes 87.0 MiB, need free: 33590184 bytes 32.0 MiB, files created: 248779, delete 157160 (63.2% of them)
    freespace: was free: 42704896 bytes 40.7 MiB, wrote: 55050240 bytes 52.5 MiB, delta: 12345344 bytes 11.8 MiB, wrote 28.9% more than predicted
    freespace: trashing: was free: 92700672 bytes 88.4 MiB, need free: 41987728 bytes 40.0 MiB, files created: 248848, delete 136135 (54.7% of them)
    freespace: was free: 35250176 bytes 33.6 MiB, wrote: 48115712 bytes 45.9 MiB, delta: 12865536 bytes 12.3 MiB, wrote 36.5% more than predicted
    freespace: trashing: was free: 93986816 bytes 89.6 MiB, need free: 50385272 bytes 48.1 MiB, files created: 248723, delete 115385 (46.4% of them)
    freespace: was free: 29995008 bytes 28.6 MiB, wrote: 41582592 bytes 39.7 MiB, delta: 11587584 bytes 11.1 MiB, wrote 38.6% more than predicted
    freespace: trashing: was free: 91881472 bytes 87.6 MiB, need free: 58782816 bytes 56.1 MiB, files created: 248645, delete 89569 (36.0% of them)
    freespace: was free: 22511616 bytes 21.5 MiB, wrote: 34705408 bytes 33.1 MiB, delta: 12193792 bytes 11.6 MiB, wrote 54.2% more than predicted
    freespace: trashing: was free: 91774976 bytes 87.5 MiB, need free: 67180360 bytes 64.1 MiB, files created: 248580, delete 66616 (26.8% of them)
    freespace: was free: 16908288 bytes 16.1 MiB, wrote: 26898432 bytes 25.7 MiB, delta: 9990144 bytes 9.5 MiB, wrote 59.1% more than predicted
    freespace: trashing: was free: 92450816 bytes 88.2 MiB, need free: 75577904 bytes 72.1 MiB, files created: 248654, delete 45381 (18.3% of them)
    freespace: was free: 10170368 bytes 9.7 MiB, wrote: 19111936 bytes 18.2 MiB, delta: 8941568 bytes 8.5 MiB, wrote 87.9% more than predicted
    freespace: trashing: was free: 93282304 bytes 89.0 MiB, need free: 83975448 bytes 80.1 MiB, files created: 248513, delete 24794 (10.0% of them)
    freespace: was free: 3911680 bytes 3.7 MiB, wrote: 7872512 bytes 7.5 MiB, delta: 3960832 bytes 3.8 MiB, wrote 101.3% more than predicted
    freespace: Test 3 finished

    freespace: finished successfully

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • This is a preparatory patch for the following statfs()
    report fix.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • We add this component at other similar places, but not in this
    one.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy