03 Jun, 2016

3 commits

  • The self-tests code assumes 4k as the sectorsize and nodesize. This
    commit fixes the hardcoded 4k values, enabling the self-tests code to
    be executed on non-4k page sized systems (e.g. ppc64).

    Reviewed-by: Josef Bacik
    Signed-off-by: Feifei Xu
    Signed-off-by: Chandan Rajendra
    Signed-off-by: David Sterba

    Feifei Xu
     
  • On ppc64, bytes_per_bitmap will be (65536*8*65536), which overflows a
    32-bit integer. Hence append UL to the constant to fix the integer
    overflow.
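
    A minimal, standalone userspace illustration of the overflow (the
    constants mirror the commit; this is not the btrfs code itself):

      #include <stdio.h>

      int main(void)
      {
          /* With 64K pages: bytes_per_bitmap = 65536 * 8 * 65536 = 2^35 */
          int page_bits  = 65536 * 8;   /* 2^19, fits in an int */
          int sectorsize = 65536;       /* 2^16 */

          /* int * int is evaluated in int: 2^35 overflows 32 bits */
          unsigned long bad  = (unsigned long)(page_bits * sectorsize);

          /* Promoting one operand (what the UL suffix does) makes the
           * multiplication happen in 64-bit arithmetic. */
          unsigned long good = (unsigned long)page_bits * sectorsize;

          printf("overflowed: %lu\ncorrect:    %lu\n", bad, good);
          return 0;
      }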

    Reviewed-by: Josef Bacik
    Reviewed-by: Chandan Rajendra
    Signed-off-by: Feifei Xu
    Signed-off-by: David Sterba

    Feifei Xu
     
  • On a ppc64 machine using 64K as the block size, assume that the RB
    tree at btrfs_free_space_ctl->free_space_offset contains following
    two entries:

    1. A bitmap entry having an offset value of 0 and having the bits
    corresponding to the address range [128M+512K, 128M+768K] set.
    2. An extent entry corresponding to the address range
    [128M-256K, 128M-128K]

    In such a scenario, test_check_exists() invoked for checking the
    existence of address range [128M+768K, 256M] can lead to an
    infinite loop as explained below:

    - Checking for the extent entry fails.
    - Checking for a bitmap entry results in the free space info in
    range [128M+512K, 128M+768K] being returned.
    - rb_prev(info) returns NULL because the bitmap entry starting from
    offset 0 comes first in the RB tree.
    - current_node = bitmap node.
    - while (current_node)
    tmp = rb_next(bitmap_node);/*tmp is extent based free space entry*/
    Since the extent based free space entry's last address is smaller
    than the address being searched for (i.e. 128M+768K), we
    incorrectly obtain the extent node again as the "next right node"
    of the RB tree and thus end up looping infinitely.

    This patch fixes the issue by checking the "tmp" variable, which
    points to the most recently searched free space node.
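
    A schematic sketch of a terminating walk (hypothetical names and
    simplified logic, not the actual free-space-cache.c diff): the key
    point is that each iteration advances from the most recently
    searched node ("tmp") instead of re-deriving rb_next() from a fixed
    starting node.

      struct btrfs_free_space *tmp = bitmap_entry;
      struct rb_node *n;

      while (tmp) {
          if (tmp->offset >= offset + bytes)
              break;                    /* walked past the searched range */
          if (tmp->offset + tmp->bytes > offset)
              return true;              /* overlaps [offset, offset+bytes) */
          n = rb_next(&tmp->offset_index);  /* advance from tmp itself */
          tmp = n ? rb_entry(n, struct btrfs_free_space, offset_index)
                  : NULL;
      }
      return false;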

    Reviewed-by: Josef Bacik
    Reviewed-by: Chandan Rajendra
    Signed-off-by: Feifei Xu
    Signed-off-by: David Sterba

    Feifei Xu
     

26 May, 2016

1 commit


05 Apr, 2016

1 commit

  • The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
    time ago with the promise that one day it would be possible to
    implement the page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion as to whether the
    PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
    much breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straightforward:

    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();
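
    As a concrete illustration (a hypothetical fragment, not taken from
    the patch), the script rewrites code like this:

      /* before the conversion: */
      page_cache_get(page);
      offset = pos & ~PAGE_CACHE_MASK;   /* offset within the page */
      index  = pos >> PAGE_CACHE_SHIFT;  /* page index in the file */
      page_cache_release(page);

      /* after the conversion (PAGE_CACHE_* == PAGE_*): */
      get_page(page);
      offset = pos & ~PAGE_MASK;
      index  = pos >> PAGE_SHIFT;
      put_page(page);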

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

11 Jan, 2016

2 commits


07 Jan, 2016

3 commits


24 Dec, 2015

1 commit


19 Dec, 2015

1 commit


16 Dec, 2015

2 commits

  • …manana/linux into for-linus-4.4

    Chris Mason
     
  • Dave Jones found a warning from kasan in setup_cluster_bitmap()

    ==================================================================
    BUG: KASAN: stack-out-of-bounds in setup_cluster_bitmap+0xc4/0x5a0 at
    addr ffff88039bef6828
    Read of size 8 by task nfsd/1009
    page:ffffea000e6fbd80 count:0 mapcount:0 mapping: (null)
    index:0x0
    flags: 0x8000000000000000()
    page dumped because: kasan: bad access detected
    CPU: 1 PID: 1009 Comm: nfsd Tainted: G W
    4.4.0-rc3-backup-debug+ #1
    ffff880065647b50 000000006bb712c2 ffff88039bef6640 ffffffffa680a43e
    0000004559c00000 ffff88039bef66c8 ffffffffa62638d1 ffffffffa61121c0
    ffff8803a5769de8 0000000000000296 ffff8803a5769df0 0000000000046280
    Call Trace:
    [] dump_stack+0x4b/0x6d
    [] kasan_report_error+0x501/0x520
    [] ? debug_show_all_locks+0x1e0/0x1e0
    [] kasan_report+0x58/0x60
    [] ? rb_last+0x10/0x40
    [] ? setup_cluster_bitmap+0xc4/0x5a0
    [] __asan_load8+0x5d/0x70
    [] setup_cluster_bitmap+0xc4/0x5a0
    [] ? setup_cluster_no_bitmap+0x6a/0x400
    [] btrfs_find_space_cluster+0x4b6/0x640
    [] ? btrfs_alloc_from_cluster+0x4e0/0x4e0
    [] ? btrfs_return_cluster_to_free_space+0x9e/0xb0
    [] ? _raw_spin_unlock+0x27/0x40
    [] find_free_extent+0xba1/0x1520

    Andrey noticed this was because we were doing list_first_entry on a list
    that might be empty. Rework the tests a bit so we don't do that.
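
    The hazard, sketched with hypothetical context (the real rework is in
    setup_cluster_bitmap()): list_first_entry() on an empty list yields a
    pointer computed from the list head itself, so dereferencing it reads
    out of bounds, which is what KASAN caught above.

      /* buggy shape: no emptiness check */
      entry = list_first_entry(&bitmaps, struct btrfs_free_space, list);

      /* safe shape: e.g. via list_first_entry_or_null() */
      entry = list_first_entry_or_null(&bitmaps,
                                       struct btrfs_free_space, list);
      if (!entry) {
          /* no bitmap available: fall back to another path */
      }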

    Signed-off-by: Chris Mason
    Reported-by: Andrey Ryabinin
    Reported-by: Dave Jones

    Chris Mason
     

10 Dec, 2015

1 commit


03 Dec, 2015

1 commit


08 Nov, 2015

1 commit

  • Merge second patch-bomb from Andrew Morton:

    - most of the rest of MM

    - procfs

    - lib/ updates

    - printk updates

    - bitops infrastructure tweaks

    - checkpatch updates

    - nilfs2 update

    - signals

    - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
    dma-debug, dma-mapping, ...

    * emailed patches from Andrew Morton : (102 commits)
    ipc,msg: drop dst nil validation in copy_msg
    include/linux/zutil.h: fix usage example of zlib_adler32()
    panic: release stale console lock to always get the logbuf printed out
    dma-debug: check nents in dma_sync_sg*
    dma-mapping: tidy up dma_parms default handling
    pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
    kexec: use file name as the output message prefix
    fs, seqfile: always allow oom killer
    seq_file: reuse string_escape_str()
    fs/seq_file: use seq_* helpers in seq_hex_dump()
    coredump: change zap_threads() and zap_process() to use for_each_thread()
    coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
    signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
    signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
    signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
    signals: kill block_all_signals() and unblock_all_signals()
    nilfs2: fix gcc uninitialized-variable warnings in powerpc build
    nilfs2: fix gcc unused-but-set-variable warnings
    MAINTAINERS: nilfs2: add header file for tracing
    nilfs2: add tracepoints for analyzing reading and writing metadata files
    ...

    Linus Torvalds
     

07 Nov, 2015

1 commit

  • There are many places which use mapping_gfp_mask to restrict a more
    generic gfp mask which would be used for allocations which are not
    directly related to the page cache but they are performed in the same
    context.

    Let's introduce a helper function which makes the restriction explicit and
    easier to track. This patch doesn't introduce any functional changes.
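
    The helper amounts to something like the following (a sketch of its
    shape, paraphrased rather than quoted from include/linux/pagemap.h),
    with a typical call-site conversion shown as a comment:

      /* Restrict a caller-supplied gfp mask by the mapping's own mask. */
      static inline gfp_t mapping_gfp_constraint(struct address_space *mapping,
                                                 gfp_t gfp_mask)
      {
          return mapping_gfp_mask(mapping) & gfp_mask;
      }

      /* typical conversion at a call site:
       *   before: mapping_gfp_mask(mapping) & ~__GFP_FS
       *   after:  mapping_gfp_constraint(mapping, ~__GFP_FS)
       */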

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Michal Hocko
    Suggested-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

22 Oct, 2015

6 commits


08 Oct, 2015

1 commit


29 Jul, 2015

1 commit

  • When we clear the dirty bits in btrfs_delete_unused_bgs for extents
    in the empty block group, it results in btrfs_finish_extent_commit being
    unable to discard the freed extents.

    The block group removal patch added an alternate path to forget extents
    other than btrfs_finish_extent_commit. As a result, any extents that
    would be freed when the block group is removed aren't discarded. In my
    test run, with a large copy of mixed sized files followed by removal, it
    left nearly 2/3 of extents undiscarded.

    To clean up the block groups, we add the removed block group onto a
    list and discard its extents after the transaction commits, as
    sketched below.
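
    A hedged sketch of the idea (field and function names are
    approximate): removal queues the block group on a per-transaction
    list instead of forgetting its extents, and the post-commit path
    walks that list and issues the discards.

      /* at removal time: defer rather than forget */
      spin_lock(&fs_info->unused_bgs_lock);
      list_move(&block_group->bg_list, &trans->transaction->deleted_bgs);
      spin_unlock(&fs_info->unused_bgs_lock);

      /* after the transaction commit: */
      list_for_each_entry_safe(block_group, tmp, &deleted_bgs, bg_list) {
          u64 trimmed;

          btrfs_discard_extent(root, block_group->key.objectid,
                               block_group->key.offset, &trimmed);
          list_del_init(&block_group->bg_list);
          btrfs_put_block_group(block_group);
      }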

    Signed-off-by: Jeff Mahoney
    Reviewed-by: Filipe Manana
    Tested-by: Filipe Manana
    Signed-off-by: Chris Mason

    Jeff Mahoney
     

03 Jun, 2015

1 commit

  • If the call to btrfs_truncate_inode_items() fails and we don't have a
    block group, we were unlocking the cache_write_mutex without having
    locked it (we lock it only if we have a block group).

    Fixes: 1bbc621ef284 ("Btrfs: allow block group cache writeout
    outside critical section in commit")

    Signed-off-by: Filipe Manana
    Reviewed-by: David Sterba
    Signed-off-by: Chris Mason

    Filipe Manana
     

11 May, 2015

1 commit

  • If the writeback of an inode cache failed, we were unnecessarily
    attempting to release, a second time, the delalloc metadata that we
    previously reserved. However, attempting to do this a second time
    triggers an assertion at drop_outstanding_extent() because we have no
    more outstanding extents for our inode cache's inode. If we were able
    to start writeback of the cache, the reserved metadata space is
    released at btrfs_finish_ordered_io(), even if an error happens
    during writeback.

    So make sure we don't repeat the metadata space release if writeback
    started for our inode cache.

    This issue was trivial to reproduce by running the fstest btrfs/088
    with "-o inode_cache", which triggered the assertion leading to a
    BUG() call and requiring a reboot in order to run the remaining
    fstests. Trace produced by btrfs/088:

    [255289.385904] BTRFS: assertion failed: BTRFS_I(inode)->outstanding_extents >= num_extents, file: fs/btrfs/extent-tree.c, line: 5276
    [255289.388094] ------------[ cut here ]------------
    [255289.389184] kernel BUG at fs/btrfs/ctree.h:4057!
    [255289.390125] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    (...)
    [255289.392068] Call Trace:
    [255289.392068] [] drop_outstanding_extent+0x3d/0x6d [btrfs]
    [255289.392068] [] btrfs_delalloc_release_metadata+0x54/0xe3 [btrfs]
    [255289.392068] [] btrfs_write_out_ino_cache+0x95/0xad [btrfs]
    [255289.392068] [] btrfs_save_ino_cache+0x275/0x2dc [btrfs]
    [255289.392068] [] commit_fs_roots.isra.12+0xaa/0x137 [btrfs]
    [255289.392068] [] ? trace_hardirqs_on+0xd/0xf
    [255289.392068] [] ? btrfs_commit_transaction+0x4b1/0x9c9 [btrfs]
    [255289.392068] [] ? _raw_spin_unlock+0x32/0x46
    [255289.392068] [] btrfs_commit_transaction+0x4c0/0x9c9 [btrfs]
    (...)

    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     

07 May, 2015

1 commit

  • We were passing a flags value that differed from the intention in commit
    2b108268006e ("Btrfs: don't use highmem for free space cache pages").

    This caused problems on an ARM machine, leaving btrfs unusable there.

    Reported-by: Merlijn Wajer
    Tested-by: Merlijn Wajer
    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     

26 Apr, 2015

1 commit


25 Apr, 2015

1 commit

  • __btrfs_write_out_cache is holding the ctl->tree_lock while it prepares
    a list of bitmaps to record in the free space cache. It was dropping
    the lock while it worked on other components, which made a window for
    free_bitmap() to free the bitmap struct without removing it from the
    list.

    This changes things to hold the lock the whole time, and also makes sure
    we hold the lock during enospc cleanup.
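
    Schematically (with hypothetical helper names), the race window
    looked like this; the fix holds ctl->tree_lock across the whole
    sequence instead.

      spin_lock(&ctl->tree_lock);
      collect_bitmaps(ctl, &bitmap_list);      /* build the private list */
      spin_unlock(&ctl->tree_lock);            /* <-- window opens here:
                                                * free_bitmap() can free a
                                                * struct still linked on
                                                * bitmap_list */
      write_extent_entries(io_ctl);
      spin_lock(&ctl->tree_lock);
      write_bitmap_entries(io_ctl, &bitmap_list);  /* may touch freed memory */
      spin_unlock(&ctl->tree_lock);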

    Reported-by: Filipe Manana
    Signed-off-by: Chris Mason

    Chris Mason
     

24 Apr, 2015

1 commit

  • The code to fix stalls during free space cache IO wasn't using
    the correct root when waiting on the IO for inode caches. This
    is only a problem when the inode cache is enabled with

    mount -o inode_cache

    This fixes the inode cache writeout to preserve any error values and
    makes sure not to override the root when inode cache writeout is done.

    Reported-by: Filipe Manana
    Signed-off-by: Chris Mason

    Chris Mason
     

11 Apr, 2015

5 commits

  • We loop through all of the dirty block groups during commit and write
    the free space cache. In order to make sure the cache is correct, we do
    this while no other writers are allowed in the commit.

    If a large number of block groups are dirty, this can introduce long
    stalls during the final stages of the commit, which can block new procs
    trying to change the filesystem.

    This commit changes the block group cache writeout to take appropriate
    locks and allow it to run earlier in the commit. We'll still have to
    redo some of the block groups, but it means we can get most of the work
    out of the way without blocking the entire FS.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • In order to create the free space cache concurrently with FS modifications,
    we need to take a few block group locks.

    The cache code also does kmap, which would schedule with the locks held.
    Instead of going through kmap_atomic, let's just use lowmem for the cache
    pages.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Block group cache writeout is currently waiting on the pages for each
    block group cache before moving on to writing the next one. This commit
    switches things around to send down all the caches and then wait on them
    in batches.

    The end result is much faster, since we're keeping the disk pipeline
    full.
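
    The shape of the change, sketched with hypothetical helpers: one pass
    to start writeback on every cache, then a second pass to wait, rather
    than submitting and waiting one cache at a time.

      /* before: the disk pipeline drains between block groups */
      list_for_each_entry(cache, &dirty_bgs, dirty_list) {
          start_cache_writeback(cache);
          wait_cache_writeback(cache);
      }

      /* after: submit everything, then wait in batches */
      list_for_each_entry(cache, &dirty_bgs, dirty_list)
          start_cache_writeback(cache);
      list_for_each_entry(cache, &dirty_bgs, dirty_list)
          wait_cache_writeback(cache);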

    Signed-off-by: Chris Mason

    Chris Mason
     
  • We'll need to put the io_ctl into the block_group cache struct, so
    name it struct btrfs_io_ctl and move it into ctree.h

    Signed-off-by: Chris Mason

    Chris Mason
     
  • When we are deleting large files with large extents, we are building up
    a huge set of delayed refs for processing. Truncate isn't checking
    often enough to see if we need to back off and process those, or let
    a commit proceed.

    The end result is long stalls after the rm, and very long commit times.
    During the commits, other processes back up waiting to start new
    transactions and we get into trouble.

    Signed-off-by: Chris Mason

    Chris Mason
     

04 Mar, 2015

3 commits