10 May, 2016

4 commits


05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion about whether the
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in the page cache are special. They
    are not.

    The changes are pretty straight-forward:

    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

04 Feb, 2016

1 commit


08 Nov, 2015

1 commit

  • Merge second patch-bomb from Andrew Morton:

    - most of the rest of MM

    - procfs

    - lib/ updates

    - printk updates

    - bitops infrastructure tweaks

    - checkpatch updates

    - nilfs2 update

    - signals

    - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
    dma-debug, dma-mapping, ...

    * emailed patches from Andrew Morton : (102 commits)
    ipc,msg: drop dst nil validation in copy_msg
    include/linux/zutil.h: fix usage example of zlib_adler32()
    panic: release stale console lock to always get the logbuf printed out
    dma-debug: check nents in dma_sync_sg*
    dma-mapping: tidy up dma_parms default handling
    pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
    kexec: use file name as the output message prefix
    fs, seqfile: always allow oom killer
    seq_file: reuse string_escape_str()
    fs/seq_file: use seq_* helpers in seq_hex_dump()
    coredump: change zap_threads() and zap_process() to use for_each_thread()
    coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
    signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
    signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
    signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
    signals: kill block_all_signals() and unblock_all_signals()
    nilfs2: fix gcc uninitialized-variable warnings in powerpc build
    nilfs2: fix gcc unused-but-set-variable warnings
    MAINTAINERS: nilfs2: add header file for tracing
    nilfs2: add tracepoints for analyzing reading and writing metadata files
    ...

    Linus Torvalds
     

07 Nov, 2015

1 commit

  • There are many places which use mapping_gfp_mask to restrict a more
    generic gfp mask which would be used for allocations which are not
    directly related to the page cache but are performed in the same
    context.

    Let's introduce a helper function which makes the restriction explicit and
    easier to track. This patch doesn't introduce any functional changes.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Michal Hocko
    Suggested-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

22 Oct, 2015

2 commits


11 Oct, 2015

1 commit


14 Aug, 2015

1 commit

  • We can always fill up the bio now, no need to estimate the possible
    size based on queue parameters.

    Acked-by: Steven Whitehouse
    Signed-off-by: Kent Overstreet
    [hch: rebased and wrote a changelog]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not being persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

26 Mar, 2015

1 commit


04 Mar, 2015

1 commit


17 Feb, 2015

1 commit


13 Dec, 2014

1 commit

  • Pull btrfs update from Chris Mason:
    "From a feature point of view, most of the code here comes from Miao
    Xie and others at Fujitsu to implement scrubbing and replacing devices
    on raid56. This has been in development for a while, and it's a big
    improvement.

    Filipe and Josef have a great assortment of fixes, many of which solve
    problems corruptions either after a crash or in error conditions. I
    still have a round two from Filipe for next week that solves
    corruptions with discard and block group removal"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
    Btrfs: make get_caching_control unconditionally return the ctl
    Btrfs: fix unprotected deletion from pending_chunks list
    Btrfs: fix fs mapping extent map leak
    Btrfs: fix memory leak after block remove + trimming
    Btrfs: make btrfs_abort_transaction consider existence of new block groups
    Btrfs: fix race between writing free space cache and trimming
    Btrfs: fix race between fs trimming and block group remove/allocation
    Btrfs, replace: enable dev-replace for raid56
    Btrfs: fix freeing used extents after removing empty block group
    Btrfs: fix crash caused by block group removal
    Btrfs: fix invalid block group rbtree access after bg is removed
    Btrfs, raid56: fix use-after-free problem in the final device replace procedure on raid56
    Btrfs, replace: write raid56 parity into the replace target device
    Btrfs, replace: write dirty pages into the replace target device
    Btrfs, raid56: support parity scrub on raid56
    Btrfs, raid56: use a variant to record the operation type
    Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted
    Btrfs, raid56: don't change bbio and raid_map
    Btrfs: remove unnecessary code of stripe_index assignment in __btrfs_map_block
    Btrfs: remove noused bbio_ret in __btrfs_map_block in condition
    ...

    Linus Torvalds
     

01 Dec, 2014

1 commit

  • Don Bailey noticed that our page zeroing for compression at end-io time
    isn't complete. This reworks a patch from Linus to push the zeroing
    into the zlib and lzo specific functions instead of trying to handle the
    corners inside btrfs_decompress_buf2page.

    Signed-off-by: Chris Mason
    Reviewed-by: Josef Bacik
    Reported-by: Don A. Bailey
    cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Chris Mason
     

21 Nov, 2014

1 commit

  • Our compressed bio write end callback was essentially ignoring the error
    parameter. When a write error happens, it must pass a value of 0 to the
    inode's write_page_end_io_hook callback, SetPageError on the respective
    pages and set AS_EIO in the inode's mapping flags, so that a call to
    filemap_fdatawait_range() / filemap_fdatawait() can find out that errors
    happened (we surely don't want silent failures on fsync for example).

    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     

02 Oct, 2014

1 commit


18 Sep, 2014

1 commit


29 Jun, 2014

1 commit


10 Jun, 2014

1 commit

  • The btrfs compression wrappers translated errors from workspace
    allocation to either -ENOMEM or -1. The compression type workspace
    allocators are already returning an ERR_PTR(-ENOMEM). Just return that
    and get rid of the magical -1.

    This helps a future patch return errors from the compression wrappers.

    Signed-off-by: Zach Brown
    Reviewed-by: David Sterba
    Signed-off-by: Chris Mason

    Zach Brown
     

04 Apr, 2014

1 commit

  • shmem mappings already contain exceptional entries where swap slot
    information is remembered.

    To be able to store eviction information for regular page cache, prepare
    every site dealing with the radix trees directly to handle entries other
    than pages.

    The common lookup functions will filter out non-page entries and return
    NULL for page cache holes, just as before. But provide a raw version of
    the API which returns non-page entries as well, and switch shmem over to
    use it.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

10 Feb, 2014

1 commit

  • Pull btrfs fixes from Chris Mason:
    "This is a small collection of fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix data corruption when reading/updating compressed extents
    Btrfs: don't loop forever if we can't run because of the tree mod log
    btrfs: reserve no transaction units in btrfs_ioctl_set_features
    btrfs: commit transaction after setting label and features
    Btrfs: fix assert screwup for the pending move stuff

    Linus Torvalds
     

09 Feb, 2014

1 commit

  • When using a mix of compressed file extents and prealloc extents, it
    is possible to fill a page of a file with random, garbage data from
    some unrelated previous use of the page, instead of a sequence of zeroes.

    A simple sequence of steps to get into such case, taken from the test
    case I made for xfstests, is:

    _scratch_mkfs
    _scratch_mount "-o compress-force=lzo"
    $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
    $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
    $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
    $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

    This results in the following file items in the fs tree:

    item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
    inode generation 6 transid 6 size 542872 block group 0 mode 100600
    item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
    inode ref index 2 namelen 6 name: foobar
    item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
    extent data disk byte 0 nr 0 gen 6
    extent data offset 0 nr 24576 ram 266240
    extent compression 0
    item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
    prealloc data disk byte 12849152 nr 241664 gen 6
    prealloc data offset 0 nr 241664
    item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
    extent data disk byte 12845056 nr 4096 gen 6
    extent data offset 0 nr 20480 ram 20480
    extent compression 2
    item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
    prealloc data disk byte 13090816 nr 405504 gen 6
    prealloc data offset 0 nr 258048

    The on disk extent at offset 266240 (which corresponds to 1 single disk block),
    contains 5 compressed chunks of file data. Each of the first 4 compress 4096
    bytes of file data, while the last one only compresses 3024 bytes of file data.
    Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
    1072 bytes) should always return zeroes (our next extent is a prealloc one).

    The solution here is for the compression code path to zero the remaining
    (untouched) bytes of the last page it uncompressed data into, as the
    information about how much space the file data consumes in the last page
    is not known in the upper layer fs/btrfs/extent_io.c:__do_readpage(). In
    __do_readpage we were correctly zeroing the remainder of the page, but
    only if it corresponds to the last page of the inode and the inode's
    size is not a multiple of the page size.

    This would cause not only returning random data on reads, but also permanently
    storing random data when updating parts of the region that should be zeroed.
    For the example above, it means updating a single byte in the region [285648 ; 286720[
    would store that byte correctly but also store random data on disk.

    A test case for xfstests follows soon.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     

31 Jan, 2014

1 commit

  • Pull btrfs updates from Chris Mason:
    "This is a pretty big pull, and most of these changes have been
    floating in btrfs-next for a long time. Filipe's properties work is a
    cool building block for inheriting attributes like compression down on
    a per inode basis.

    Jeff Mahoney kicked in code to export filesystem info into sysfs.

    Otherwise, lots of performance improvements, cleanups and bug fixes.

    Looks like there are still a few other small pending incrementals, but
    I wanted to get the bulk of this in first"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
    Btrfs: fix spin_unlock in check_ref_cleanup
    Btrfs: setup inode location during btrfs_init_inode_locked
    Btrfs: don't use ram_bytes for uncompressed inline items
    Btrfs: fix btrfs_search_slot_for_read backwards iteration
    Btrfs: do not export ulist functions
    Btrfs: rework ulist with list+rb_tree
    Btrfs: fix memory leaks on walking backrefs failure
    Btrfs: fix send file hole detection leading to data corruption
    Btrfs: add a reschedule point in btrfs_find_all_roots()
    Btrfs: make send's file extent item search more efficient
    Btrfs: fix to catch all errors when resolving indirect ref
    Btrfs: fix protection between walking backrefs and root deletion
    btrfs: fix warning while merging two adjacent extents
    Btrfs: fix infinite path build loops in incremental send
    btrfs: undo sysfs when open_ctree() fails
    Btrfs: fix snprintf usage by send's gen_unique_name
    btrfs: fix defrag 32-bit integer overflow
    btrfs: sysfs: list the NO_HOLES feature
    btrfs: sysfs: don't show reserved incompat feature
    btrfs: call permission checks earlier in ioctls and return EPERM
    ...

    Linus Torvalds
     

29 Jan, 2014

1 commit


24 Nov, 2013

2 commits

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     
  • With immutable biovecs we don't want code accessing bi_io_vec directly -
    the uses this patch changes weren't incorrect since they all own the
    bio, but it makes the code harder to audit for no good reason - also,
    this will help with multipage bvecs later.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: Jaegeuk Kim
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust

    Kent Overstreet
     

12 Nov, 2013

2 commits


01 Sep, 2013

2 commits

  • u64 is "unsigned long long" on all architectures now, so there's no need to
    cast it when formatting it using the "ll" length modifier.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Geert Uytterhoeven
     
  • We want this for btrfs_extent_same. Basically readpage and friends do their
    own extent locking but for the purposes of dedupe, we want to have both
    files locked down across a set of readpage operations (so that we can
    compare data). Introduce this variant and a flag which can be set for
    extent_read_full_page() to indicate that we are already locked.

    Partial credit for this patch goes to Gabriel de Perthuis
    as I have included a fix from him to the original patch which avoids a
    deadlock on compressed extents.

    Signed-off-by: Mark Fasheh
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Mark Fasheh
     

07 May, 2013

2 commits

  • Big patch, but all it does is add statics to functions which
    are in fact static, then remove the associated dead-code fallout.

    removed functions:

    btrfs_iref_to_path()
    __btrfs_lookup_delayed_deletion_item()
    __btrfs_search_delayed_insertion_item()
    __btrfs_search_delayed_deletion_item()
    find_eb_for_page()
    btrfs_find_block_group()
    range_straddles_pages()
    extent_range_uptodate()
    btrfs_file_extent_length()
    btrfs_scrub_cancel_devid()
    btrfs_start_transaction_lflush()

    btrfs_print_tree() is left because it is used for debugging.
    btrfs_start_transaction_lflush() and btrfs_reada_detach() are
    left for symmetry.

    ulist.c functions are left, another patch will take care of those.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Josef Bacik

    Eric Sandeen
     
  • Argument 'root' is no longer used in btrfs_csum_data().

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik

    Liu Bo
     

02 Feb, 2013

1 commit


13 Dec, 2012

1 commit

  • With the addition of the device replace procedure, it is possible
    for btrfs_map_bio(READ) to report an error. This happens when the
    specific mirror is requested which is located on the target disk,
    and the copy operation has not yet copied this block. Hence the
    block cannot be read and this error state is indicated by
    returning EIO.
    Some background information follows now. A new mirror is added
    while the device replace procedure is running.
    btrfs_get_num_copies() returns one more, and
    btrfs_map_bio(GET_READ_MIRROR) adds one more mirror if a disk
    location is involved that was already handled by the device
    replace copy operation. The assigned mirror num is the highest
    mirror number, e.g. the value 3 in case of RAID1.
    If btrfs_map_bio() is invoked with mirror_num == 0 (i.e., select
    any mirror), the copy on the target drive is never selected
    because that disk shall be able to perform the write requests as
    quickly as possible. The parallel execution of read requests would
    only slow down the disk copy procedure. Second case is that
    btrfs_map_bio() is called with mirror_num > 0. This is done from
    the repair code only. In this case, the highest mirror num is
    assigned to the target disk, since it is used last. And when this
    mirror is not available because the copy procedure has not yet
    handled this area, an error is returned. Everywhere in the code
    the handling of such errors is added now.

    Signed-off-by: Stefan Behrens
    Signed-off-by: Chris Mason

    Stefan Behrens
     

09 Oct, 2012

1 commit


29 Aug, 2012

1 commit

  • We need a barrier before calling waitqueue_active otherwise we will miss
    wakeups. So in places that do atomic_dec() and then atomic_read(), use
    atomic_dec_return(), which implies a memory barrier (see
    memory-barriers.txt), and then add an explicit memory barrier everywhere
    else that needs one.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik