20 Jun, 2017

3 commits


28 Feb, 2017

3 commits


30 Nov, 2016

2 commits


27 Sep, 2016

1 commit


05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely to.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion whether a
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();
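    The rewrites above are safe because the old page cache macros were plain
    aliases of the page macros. Below is a minimal userspace sketch. It assumes
    4 KiB pages (PAGE_SHIFT of 12) and uses simplified copies of the macro
    definitions rather than the kernel headers. It shows that a shift by
    (PAGE_CACHE_SHIFT - PAGE_SHIFT) is a shift by zero.

```c
#include <assert.h>

/* Assumption for this sketch: 4 KiB pages. Not the kernel's headers. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(addr) (((addr) + PAGE_SIZE - 1) & PAGE_MASK)

/* The old page cache names were simple aliases of the page names. */
#define PAGE_CACHE_SHIFT PAGE_SHIFT
#define PAGE_CACHE_SIZE  PAGE_SIZE
#define PAGE_CACHE_MASK  PAGE_MASK
#define PAGE_CACHE_ALIGN(addr) (((addr) + PAGE_CACHE_SIZE - 1) & PAGE_CACHE_MASK)

static_assert(PAGE_CACHE_SIZE == PAGE_SIZE, "page cache pages are ordinary pages");

/* Before the conversion: a shift by (PAGE_CACHE_SHIFT - PAGE_SHIFT) == 0. */
static unsigned long old_to_pages(unsigned long index)
{
	return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* After the conversion: the expression is used unchanged. */
static unsigned long new_to_pages(unsigned long index)
{
	return index;
}
```

    In this model, PAGE_CACHE_ALIGN(5000UL) and PAGE_ALIGN(5000UL) both yield
    8192.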

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header
    files; I've called spatch for them manually.

    The only adjustment after coccinelle is the revert of changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov

17 Feb, 2015

1 commit


01 Dec, 2014

1 commit

  • Don Bailey noticed that our page zeroing for compression at end-io time
    isn't complete. This reworks a patch from Linus to push the zeroing
    into the zlib- and lzo-specific functions instead of trying to handle the
    corners inside btrfs_decompress_buf2page().
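    Below is a minimal userspace sketch of the basic operation. It assumes
    4 KiB pages and uses a hypothetical helper, zero_page_tail(), that is not
    btrfs code. When a decompressor fills only part of a page, the helper
    clears the unused remainder so stale memory is not exposed through the
    page cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: `valid` bytes at the start of `page` hold
 * decompressed data; everything after them is zeroed.
 */
static void zero_page_tail(unsigned char *page, size_t valid)
{
	assert(valid <= SKETCH_PAGE_SIZE);
	if (valid < SKETCH_PAGE_SIZE)
		memset(page + valid, 0, SKETCH_PAGE_SIZE - valid);
}
```

    A fully populated page (valid == SKETCH_PAGE_SIZE) is left untouched.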

    Signed-off-by: Chris Mason
    Reviewed-by: Josef Bacik
    Reported-by: Don A. Bailey
    cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Chris Mason

18 Sep, 2014

2 commits

  • `struct workspace' used for zlib compression contains two zlib
    z_stream-s: `def_strm' used in zlib_compress_pages(), and `inf_strm'
    used in zlib_decompress()/zlib_decompress_biovec(). None of these
    functions uses `inf_strm' and `def_strm' simultaneously, meaning that
    for every compress/decompress operation we need only one z_stream
    (out of the two available).

    `inf_strm' and `def_strm' differ in the size of ->workspace. For the
    inflate stream we vmalloc() zlib_inflate_workspacesize() bytes, and for
    the deflate stream zlib_deflate_workspacesize() bytes. On my system zlib
    returns the following workspace sizes, respectively: 42312 and 268104
    (+ guard pages).

    Keep only one `z_stream' in `struct workspace' and use it for both
    compression and decompression. Hence, instead of vmalloc() of two
    z_stream->workspace-s, allocate only one of size:
    max(zlib_deflate_workspacesize(), zlib_inflate_workspacesize())
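    Below is a minimal userspace sketch of the single shared workspace. The
    sketch_* functions are hypothetical stand-ins for the kernel's size
    helpers and return the sizes quoted above. Real values depend on the
    build. malloc() replaces vmalloc() because this runs outside the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins returning the sizes reported in the commit message. */
static size_t sketch_deflate_workspacesize(void) { return 268104; }
static size_t sketch_inflate_workspacesize(void) { return 42312; }

/* One buffer must be large enough for whichever stream uses it. */
static size_t shared_workspace_size(void)
{
	size_t d = sketch_deflate_workspacesize();
	size_t i = sketch_inflate_workspacesize();

	return d > i ? d : i;
}

struct sketch_workspace {
	void *buf;	/* shared by compression and decompression */
	size_t size;
};

static int sketch_workspace_init(struct sketch_workspace *ws)
{
	ws->size = shared_workspace_size();
	ws->buf = malloc(ws->size);	/* the kernel code uses vmalloc() */
	return ws->buf ? 0 : -1;
}

static void sketch_workspace_free(struct sketch_workspace *ws)
{
	free(ws->buf);
	ws->buf = NULL;
}
```

    With these numbers, one allocation of 268104 bytes replaces two
    allocations totaling 310416 bytes.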

    Reviewed-by: David Sterba
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Chris Mason

    Sergey Senozhatsky
  • The form

    (value + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT

    is equivalent to

    (value + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE

    The rest is a simple substitution, with no difference in the generated
    assembly code.
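    Below is a minimal sketch of the equivalence. It assumes 4 KiB pages and
    uses hypothetical sketch_* names. For unsigned operands and a power-of-two
    divisor, the shift form and the division form give the same round-up page
    count.

```c
#include <assert.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Round-up page count written with a shift... */
static unsigned long pages_shift(unsigned long value)
{
	return (value + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* ...and with a division by the (power-of-two) page size. */
static unsigned long pages_div(unsigned long value)
{
	return (value + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

    Because the divisor is an unsigned power-of-two constant, a compiler can
    lower the division to the same shift. This matches the observation that
    the generated assembly does not change.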

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba

03 Jul, 2014

1 commit


10 Jun, 2014

1 commit

  • The compression layer seems to have been built to return -1 and have
    callers make up errors that make sense. This isn't great because there
    are different errors that originate down in the compression layer.

    Let's return real negative errnos from the compression layer so that
    callers can pass on the error without having to guess what happened.
    ENOMEM for allocation failure, E2BIG when compression exceeds the
    uncompressed input, and EIO for everything else.

    This helps a future patch return errors from btrfs_decompress().
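    Below is a minimal sketch of the error convention described above. The
    struct and function are hypothetical and do not come from btrfs. The
    function returns 0 or a negative errno (ENOMEM, E2BIG or EIO) instead of
    a bare -1, so a caller can pass the value straight up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical summary of one compression attempt (illustrative fields). */
struct sketch_result {
	int alloc_failed;	/* could not allocate pages or workspace */
	int stream_error;	/* the compressor itself reported a failure */
	size_t in_len;		/* uncompressed bytes consumed */
	size_t out_len;		/* compressed bytes produced */
};

/* Return 0 on success or a negative errno describing the failure. */
static int sketch_compress_status(const struct sketch_result *r)
{
	if (r->alloc_failed)
		return -ENOMEM;
	if (r->stream_error)
		return -EIO;
	if (r->out_len > r->in_len)
		return -E2BIG;	/* output exceeded the uncompressed input */
	return 0;
}
```

    A caller can then write `ret = sketch_compress_status(&r); if (ret)
    return ret;` without inventing its own error code.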

    Signed-off-by: Zach Brown
    Signed-off-by: Chris Mason

    Zach Brown

29 Jan, 2014

1 commit


09 Oct, 2012

1 commit


20 Mar, 2012

1 commit


23 Mar, 2011

1 commit

  • Instead of always creating a huge (268K) deflate_workspace with the
    maximum compression parameters (windowBits=15, memLevel=8), allow the
    caller to obtain a smaller workspace by specifying smaller parameter
    values.

    For example, when capturing oops and panic reports to a medium with
    limited capacity, such as NVRAM, compression may be the only way to
    capture the whole report. In this case, a small workspace (24K works
    fine) is a win, whether you allocate the workspace when you need it (i.e.,
    during an oops or panic) or at boot time.
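    zlib's zconf.h documents deflate's memory requirement as
    (1 << (windowBits + 2)) + (1 << (memLevel + 9)) bytes, plus a few
    kilobytes for small objects. The sketch below evaluates that documented
    formula. It is only an approximation, not the kernel's
    zlib_deflate_workspacesize(). With windowBits=15 and memLevel=8 it gives
    256K, which is roughly consistent with the 268K figure above once overhead
    is added. Smaller parameters shrink it quickly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Approximate deflate memory use per zlib's documented formula, ignoring
 * the small-object overhead. A negative windowBits (raw deflate) uses the
 * same window size as its absolute value.
 */
static size_t approx_deflate_mem(int windowBits, int memLevel)
{
	int wbits = windowBits < 0 ? -windowBits : windowBits;

	assert(wbits >= 8 && wbits <= 15);
	assert(memLevel >= 1 && memLevel <= 9);
	return ((size_t)1 << (wbits + 2)) + ((size_t)1 << (memLevel + 9));
}
```

    For example, windowBits=12 with memLevel=4 gives 16384 + 8192 = 24576
    bytes under this formula.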

    I've verified that this patch works with all accepted values of windowBits
    (positive and negative), memLevel, and compression level.

    Signed-off-by: Jim Keniston
    Cc: Herbert Xu
    Cc: David Miller
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Keniston

22 Dec, 2010

4 commits


30 Oct, 2010

1 commit

  • These are all the cases where a variable is set but never read and
    which are not bugs as far as I can see, but simply leftovers.

    Still needs more review.

    Found by gcc 4.6's new warnings.

    Signed-off-by: Andi Kleen
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Chris Mason

    Andi Kleen

08 Aug, 2009

1 commit

  • find_zlib_workspace returns an ERR_PTR value in an error case instead of NULL.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    //
    @match exists@
    expression x, E;
    statement S1, S2;
    @@

    x = find_zlib_workspace(...)
    ... when != x = E
    (
    * if (x == NULL || ...) S1 else S2
    |
    * if (x == NULL && ...) S1 else S2
    )
    //
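    Below is a minimal userspace sketch of the bug class the semantic match
    finds. It uses simplified copies of the kernel's ERR_PTR()/IS_ERR()/
    PTR_ERR() helpers and assumes that long and pointers have the same width,
    as on Linux. sketch_find_workspace() is a hypothetical stand-in for
    find_zlib_workspace(). It never returns NULL, so only an IS_ERR() check
    catches its failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified copies: small negative errnos live in the top address page. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for find_zlib_workspace(): errors are ERR_PTRs. */
static int workspace_storage;
static void *sketch_find_workspace(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : &workspace_storage;
}

/* Correct caller: an `ws == NULL` test would never see the failure. */
static int sketch_use_workspace(int fail)
{
	void *ws = sketch_find_workspace(fail);

	if (IS_ERR(ws))
		return (int)PTR_ERR(ws);
	return 0;
}
```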

    Signed-off-by: Julia Lawall
    Signed-off-by: Chris Mason

    Julia Lawall

06 Jan, 2009

1 commit


02 Dec, 2008

1 commit


11 Nov, 2008

1 commit


07 Nov, 2008

1 commit

  • When reading compressed extents, try to put pages into the page cache
    for any pages covered by the compressed extent that readpages didn't already
    preload.

    Add an async work queue to handle transformations at delayed allocation processing
    time. Right now this is just compression. The workflow is:

    1) Find offsets in the file marked for delayed allocation
    2) Lock the pages
    3) Lock the state bits
    4) Call the async delalloc code

    The async delalloc code clears the state lock bits and delalloc bits. It is
    important that this happens before the range goes into the work queue because
    otherwise it might deadlock with other work queue items that try to lock
    those extent bits.

    The file pages are compressed, and if the compression doesn't work the
    pages are written back directly.

    An ordered work queue is used to make sure the inodes are written in the same
    order that pdflush or writepages sent them down.
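    Below is a single-threaded sketch of the ordering idea. It is a simplified
    model, not btrfs's multi-threaded work queue, and all names are
    hypothetical. Items get sequence numbers at submission and may finish in
    any order, but the in-order step runs strictly by sequence number.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ITEMS 64

/* Items finish out of order; their ordered step runs in submission order. */
struct sketch_ordered_queue {
	int done[SKETCH_MAX_ITEMS];	/* done[seq] != 0 once item seq finished */
	size_t next_to_release;		/* lowest sequence not yet released */
	size_t released[SKETCH_MAX_ITEMS];
	size_t nreleased;
};

/* Mark item `seq` finished, then release every item now in order. */
static void sketch_complete(struct sketch_ordered_queue *q, size_t seq)
{
	assert(seq < SKETCH_MAX_ITEMS);
	q->done[seq] = 1;
	while (q->next_to_release < SKETCH_MAX_ITEMS &&
	       q->done[q->next_to_release]) {
		/* the in-order step (e.g. submitting the write) would go here */
		q->released[q->nreleased++] = q->next_to_release;
		q->next_to_release++;
	}
}
```

    If items finish in the order 2, 0, 1, nothing is released until item 0
    finishes. After that, items 0, 1 and 2 are released in order.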

    This changes extent_write_cache_pages to let the writepage function
    update the wbc nr_written count.

    Signed-off-by: Chris Mason

    Chris Mason

30 Oct, 2008

1 commit

  • This is a large change for adding compression on reading and writing,
    both for inline and regular extents. It does some fairly large
    surgery to the writeback paths.

    Compression is off by default and enabled by mount -o compress. Even
    when the -o compress mount option is not used, it is possible to read
    compressed extents off the disk.

    If compression for a given set of pages fails to make them smaller, the
    file is flagged to avoid further compression attempts.
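    Below is a minimal sketch of that fallback rule. The inode struct, flag
    and enum are hypothetical illustrations, not btrfs's definitions. If the
    compressed output is not smaller than the input, the range is written
    uncompressed and the file is marked so later writes skip the attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state. */
struct sketch_inode {
	bool nocompress;
};

enum sketch_write_mode { SKETCH_WRITE_COMPRESSED, SKETCH_WRITE_PLAIN };

/* Checked before spending CPU on a compression attempt. */
static bool sketch_should_compress(const struct sketch_inode *inode)
{
	return !inode->nocompress;
}

/* Decide how to write a range after an attempt produced out_len bytes. */
static enum sketch_write_mode
sketch_after_attempt(struct sketch_inode *inode, size_t in_len, size_t out_len)
{
	if (out_len >= in_len) {
		inode->nocompress = true;	/* no gain: stop trying for this file */
		return SKETCH_WRITE_PLAIN;
	}
	return SKETCH_WRITE_COMPRESSED;
}
```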

    * While finding delalloc extents, the pages are locked before being sent down
    to the delalloc handler. This allows the delalloc handler to do complex things
    such as cleaning the pages, marking them writeback and starting IO on their
    behalf.

    * Inline extents are inserted at delalloc time now. This allows us to compress
    the data before inserting the inline extent, and it allows us to insert
    an inline extent that spans multiple pages.

    * All of the in-memory extent representations (extent_map.c, ordered-data.c etc)
    are changed to record both an in-memory size and an on disk size, as well
    as a flag for compression.

    From a disk format point of view, the extent pointers in the file are changed
    to record the on disk size of a given extent and some encoding flags.
    Space in the disk format is allocated for compression encoding, as well
    as encryption and a generic 'other' field. Neither the encryption nor the
    'other' field is currently used.

    In order to limit the amount of data read for a single random read in the
    file, the size of a compressed extent is limited to 128k. This is a
    software-only limit; the disk format supports u64-sized compressed extents.

    In order to limit the RAM consumed while processing extents, the uncompressed
    size of a compressed extent is limited to 256k. This is a software-only limit
    and will be subject to tuning later.
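    Below is a minimal arithmetic sketch of the two software limits. The
    sketch_* names are hypothetical, and the simple chunking scheme is an
    illustration rather than btrfs's exact policy. It cuts a delalloc range
    into pieces of at most 256k uncompressed and checks a compressed result
    against the 128k cap.

```c
#include <assert.h>
#include <stddef.h>

/* Software limits quoted in the commit message. */
#define SKETCH_MAX_COMPRESSED   (128 * 1024)
#define SKETCH_MAX_UNCOMPRESSED (256 * 1024)

/* Number of pieces needed for a range of `len` bytes. */
static size_t sketch_chunk_count(size_t len)
{
	return (len + SKETCH_MAX_UNCOMPRESSED - 1) / SKETCH_MAX_UNCOMPRESSED;
}

/* Length of piece `i` (0-based) of a range of `len` bytes. */
static size_t sketch_chunk_len(size_t len, size_t i)
{
	size_t start = i * SKETCH_MAX_UNCOMPRESSED;

	assert(start < len);
	size_t rest = len - start;
	return rest < SKETCH_MAX_UNCOMPRESSED ? rest : SKETCH_MAX_UNCOMPRESSED;
}

/* A compressed piece must also respect the on-disk extent cap. */
static int sketch_fits_compressed_limit(size_t out_len)
{
	return out_len <= SKETCH_MAX_COMPRESSED;
}
```

    For a 600k range this gives three pieces of 256k, 256k and 88k.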

    Checksumming is still done on compressed extents, and it is done on the
    uncompressed version of the data. This way additional encodings can be
    layered on without having to figure out which encoding to checksum.
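    Below is a minimal sketch of checksumming the uncompressed bytes. It
    assumes CRC32C, btrfs's default checksum, and uses the slow bitwise
    reference form. The seeding and finalization btrfs stores on disk may
    differ. The value depends only on the data, not on the encoding used to
    store it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32C (Castagnoli polynomial, reflected constant 0x82F63B78),
 * with the conventional 0xFFFFFFFF seed and final inversion.
 */
static uint32_t sketch_crc32c(const unsigned char *p, size_t n)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (n--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}
```

    Computing this over the uncompressed page means a zlib-encoded extent and
    an unencoded extent with the same contents carry the same checksum.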

    Compression happens at delalloc time, which is basically single-threaded because
    it is usually done by a single pdflush thread. This makes it tricky to
    spread the compression load across all the cpus on the box. We'll have to
    look at parallel pdflush walks of dirty inodes at a later time.

    Decompression is hooked into readpages and it does spread across CPUs nicely.

    Signed-off-by: Chris Mason

    Chris Mason