15 Sep, 2022

1 commit

  • [ Upstream commit 2f44013e39984c127c6efedf70e6b5f4e9dcf315 ]

    During stress testing with CONFIG_SMP disabled, KASAN reports as below:

    ==================================================================
    BUG: KASAN: use-after-free in __mutex_lock+0xe5/0xc30
    Read of size 8 at addr ffff8881094223f8 by task stress/7789

    CPU: 0 PID: 7789 Comm: stress Not tainted 6.0.0-rc1-00002-g0d53d2e882f9 #3
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    Call Trace:

    ..
    __mutex_lock+0xe5/0xc30
    ..
    z_erofs_do_read_page+0x8ce/0x1560
    ..
    z_erofs_readahead+0x31c/0x580
    ..
    Freed by task 7787
    kasan_save_stack+0x1e/0x40
    kasan_set_track+0x20/0x30
    kasan_set_free_info+0x20/0x40
    __kasan_slab_free+0x10c/0x190
    kmem_cache_free+0xed/0x380
    rcu_core+0x3d5/0xc90
    __do_softirq+0x12d/0x389

    Last potentially related work creation:
    kasan_save_stack+0x1e/0x40
    __kasan_record_aux_stack+0x97/0xb0
    call_rcu+0x3d/0x3f0
    erofs_shrink_workstation+0x11f/0x210
    erofs_shrink_scan+0xdc/0x170
    shrink_slab.constprop.0+0x296/0x530
    drop_slab+0x1c/0x70
    drop_caches_sysctl_handler+0x70/0x80
    proc_sys_call_handler+0x20a/0x2f0
    vfs_write+0x555/0x6c0
    ksys_write+0xbe/0x160
    do_syscall_64+0x3b/0x90

    The root cause is that the !CONFIG_SMP version of
    erofs_workgroup_unfreeze() doesn't reset the reference count back to
    orig_val, which opens a race window in which a pcluster can be reused
    unexpectedly before it's freed.

    Since UP platforms are quite rare now, such a specially-designed path
    has become unnecessary. Let's just drop it instead.
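
    For reference, a sketch of the UP-specific path removed by this patch
    (reconstructed from fs/erofs/internal.h of that era, so treat details
    as approximate); note how unfreeze ignores orig_val entirely:

        /* !CONFIG_SMP variants removed by this fix */
        static inline bool erofs_workgroup_try_to_freeze(struct erofs_workgroup *grp,
                                                         int val)
        {
                preempt_disable();
                /* no need to spin on UP platforms, just disable preemption */
                if (val != atomic_read(&grp->refcount)) {
                        preempt_enable();
                        return false;
                }
                return true;
        }

        static inline void erofs_workgroup_unfreeze(struct erofs_workgroup *grp,
                                                    int orig_val)
        {
                /* BUG: orig_val is never written back to grp->refcount */
                preempt_enable();
        }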

    Fixes: 73f5c66df3e2 ("staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}'")
    Reviewed-by: Yue Hu
    Reviewed-by: Chao Yu
    Link: https://lore.kernel.org/r/20220902045710.109530-1-hsiangkao@linux.alibaba.com
    Signed-off-by: Gao Xiang
    Signed-off-by: Sasha Levin

    Gao Xiang
     

17 Aug, 2022

1 commit

  • [ Upstream commit 448b5a1548d87c246c3d0c3df8480d3c6eb6c11a ]

    Currently, vmap()s are avoided if the physical addresses of
    decompressed buffers are consecutive.

    I observed that this is very common for 4KiB pclusters since the
    number of decompressed pages is almost always 2 or 3.

    However, such detection doesn't work for highmem pages on 32-bit
    machines; let's fix it now.
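
    A minimal sketch of the underlying idea (a hypothetical helper, not
    the upstream diff): contiguity detection based on page_address() has
    to bail out on highmem pages, which have no permanent kernel mapping:

        static void *try_direct_map(struct page **pages, unsigned int nrpages)
        {
                void *kaddr = NULL;
                unsigned int i;

                for (i = 0; i < nrpages; ++i) {
                        /* highmem pages have no stable kernel address */
                        if (PageHighMem(pages[i]))
                                return NULL;
                        if (!i)
                                kaddr = page_address(pages[i]);
                        else if (kaddr + i * PAGE_SIZE != page_address(pages[i]))
                                return NULL;    /* not virtually consecutive */
                }
                return kaddr;   /* safe to use directly without vmap() */
        }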

    Reported-by: Liu Jinbao
    Fixes: 7fc45dbc938a ("staging: erofs: introduce generic decompression backend")
    Link: https://lore.kernel.org/r/20220708101001.21242-1-hsiangkao@linux.alibaba.com
    Signed-off-by: Gao Xiang
    Signed-off-by: Sasha Levin

    Gao Xiang
     

01 May, 2022

1 commit

  • commit 4fdccaa0d184c202f98d73b24e3ec8eeee88ab8d upstream

    Add a done_before argument to iomap_dio_rw that indicates how much of
    the request has already been transferred. When the request succeeds, we
    report that done_before additional bytes were transferred. This is
    useful for finishing a request asynchronously when part of the request
    has already been completed synchronously.

    We'll use that to allow iomap_dio_rw to be used with page faults
    disabled: when a page fault occurs while submitting a request, we
    synchronously complete the part of the request that has already been
    submitted. The caller can then take care of the page fault and call
    iomap_dio_rw again for the rest of the request, passing in the number of
    bytes already transferred.
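
    A hedged sketch of the calling pattern this enables, loosely modeled
    on the gfs2 caller this series targets (fs_iomap_ops/fs_dio_ops and
    the exact retry condition are placeholders):

        ssize_t ret, done = 0;

        retry:
        ret = iomap_dio_rw(iocb, iter, &fs_iomap_ops, &fs_dio_ops,
                           0 /* dio_flags */, done);
        if (ret > 0)
                done = ret;     /* already includes done_before */
        if (ret == -EFAULT && iov_iter_count(iter)) {
                /* handle the fault outside the dio path, then resume */
                fault_in_iov_iter_readable(iter, iov_iter_count(iter));
                goto retry;
        }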

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Anand Jain
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     

01 Dec, 2021

1 commit

  • [ Upstream commit 57bbeacdbee72a54eb97d56b876cf9c94059fc34 ]

    We observed the following deadlock in a stress test under a low
    memory scenario:

    Thread A                       Thread B
    - erofs_shrink_scan
     - erofs_try_to_release_workgroup
      - erofs_workgroup_try_to_freeze  -- A
                                   - z_erofs_do_read_page
                                    - z_erofs_collection_begin
                                     - z_erofs_register_collection
                                      - erofs_insert_workgroup
                                       - xa_lock(&sbi->managed_pslots)  -- B
                                       - erofs_workgroup_get
                                        - erofs_wait_on_workgroup_freezed  -- A
      - xa_erase
       - xa_lock(&sbi->managed_pslots)  -- B

    To fix this, we need to hold xa_lock before freezing the workgroup,
    since the XArray will be touched afterwards anyway. So let's hold the
    lock before accessing each workgroup, just like we did with the radix
    tree before.
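
    A sketch of the reordered locking in erofs_try_to_release_workgroup()
    after this fix (reconstructed; surrounding details are approximate):

        xa_lock(&sbi->managed_pslots);          /* take the XArray lock first */
        if (!erofs_workgroup_try_to_freeze(grp, 1)) {
                xa_unlock(&sbi->managed_pslots);
                return false;                   /* busy, retry later */
        }
        /* ... drop cached pages attached to this workgroup ... */

        /* the lock is already held, so erasing can't race with insertion */
        DBG_BUGON(__xa_erase(&sbi->managed_pslots, grp->index) != grp);
        xa_unlock(&sbi->managed_pslots);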

    [ Gao Xiang: Jianhua Hao also reports this issue at
    https://lore.kernel.org/r/b10b85df30694bac8aadfe43537c897a@xiaomi.com ]

    Link: https://lore.kernel.org/r/20211118135844.3559-1-huangjianan@oppo.com
    Fixes: 64094a04414f ("erofs: convert workstn to XArray")
    Reviewed-by: Chao Yu
    Reviewed-by: Gao Xiang
    Signed-off-by: Huang Jianan
    Reported-by: Jianhua Hao
    Signed-off-by: Gao Xiang
    Signed-off-by: Sasha Levin

    Huang Jianan
     

19 Nov, 2021

2 commits

  • commit 86432a6dca9bed79111990851df5756d3eb5f57c upstream.

    There are pclusters at runtime marked with Z_EROFS_PCLUSTER_TAIL
    before actual I/O submission. Thus, the decompression chain can be
    extended if the following pcluster chain hooks up such a tail pcluster.

    As the related comment mentioned, if some page is made of a hooked
    pcluster and another followed pcluster, it can be reused for in-place
    I/O (since I/O should be submitted anyway):
     ________________________________________________________________
    |  tail (partial) page  |          head (partial) page           |
    |_____PRIMARY_HOOKED____|___________PRIMARY_FOLLOWED_____________|

    However, it's by no means safe to reuse such pages for pagevec usage,
    since such PRIMARY_HOOKED pclusters may finally move into the bypass
    chain without I/O submission. It's somewhat hard to reproduce with
    LZ4, and I just hit it (a general protection fault) by ro_fsstressing
    an LZMA image for a long time.

    I'm going to actively clean up the related code together with the
    multi-page folio adaptation in the next few months. Let's address it
    directly for easier backporting for now.

    Call trace for reference:
    z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs]
    z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs]
    z_erofs_runqueue+0x5f3/0x840 [erofs]
    z_erofs_readahead+0x1e8/0x320 [erofs]
    read_pages+0x91/0x270
    page_cache_ra_unbounded+0x18b/0x240
    filemap_get_pages+0x10a/0x5f0
    filemap_read+0xa9/0x330
    new_sync_read+0x11b/0x1a0
    vfs_read+0xf1/0x190

    Link: https://lore.kernel.org/r/20211103182006.4040-1-xiang@kernel.org
    Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
    Cc: stable # 4.19+
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • [ Upstream commit a0961f351d82d43ab0b845304caa235dfe249ae9 ]

    syzbot reported a WARNING [1] due to corrupted compressed data.

    As Dmitry said, "If this is not a kernel bug, then the code should
    not use WARN. WARN is for kernel bugs and is recognized as such by
    all testing systems and humans."

    [1] https://lore.kernel.org/r/000000000000b3586105cf0ff45e@google.com

    Link: https://lore.kernel.org/r/20211025074311.130395-1-hsiangkao@linux.alibaba.com
    Cc: Dmitry Vyukov
    Reviewed-by: Chao Yu
    Reported-by: syzbot+d8aaffc3719597e8cfb4@syzkaller.appspotmail.com
    Signed-off-by: Gao Xiang
    Signed-off-by: Sasha Levin

    Gao Xiang
     

23 Sep, 2021

2 commits

  • Currently, the whole indexes will only be compacted with 4B if
    compacted_4b_initial > totalidx. So the calculated compacted_2b is
    worthless in that case and merely wastes CPU resources.

    There is no need to update compacted_4b_initial the way mkfs does,
    since it's used to fulfill the alignment of the 1st compacted_2b
    pack and already handles the case above.

    We also need to clarify compacted_4b_end here. It's used for the
    last lclusters which don't fit in the previous compacted_2b packs.

    Some messages are from Xiang.
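
    A sketch of the resulting calculation (reconstructed from the compact
    index loader in fs/erofs/zmap.c of that era; treat as approximate):

        if (compacted_4b_initial == 32 / 4)
                compacted_4b_initial = 0;

        /* only bother with 2B packs when some indexes actually reach them */
        if ((vi->z_advise & Z_EROFS_ADVISE_COMPACTED_2B) &&
            compacted_4b_initial < totalidx)
                compacted_2b = rounddown(totalidx - compacted_4b_initial, 16);
        else
                compacted_2b = 0;

        /* the remaining (totalidx - compacted_4b_initial - compacted_2b)
         * lclusters form the trailing compacted_4b_end part */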

    Link: https://lore.kernel.org/r/20210914035915.1190-1-zbestahu@gmail.com
    Signed-off-by: Yue Hu
    Reviewed-by: Gao Xiang
    Reviewed-by: Chao Yu
    [ Gao Xiang: it's enough to use "compacted_4b_initial < totalidx". ]
    Signed-off-by: Gao Xiang

    Yue Hu
     
  • Unsupported chunk format should be checked with
    "if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL)"

    Found when checking with 4k-byte blockmap (although currently mkfs
    uses inode chunk indexes format by default.)
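
    A sketch of the corrected check (close to the actual code in
    fs/erofs/inode.c, reconstructed here):

        if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL) {
                erofs_err(inode->i_sb, "unsupported chunk format %x of nid %llu",
                          vi->chunkformat, vi->nid);
                return -EOPNOTSUPP;
        }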

    Link: https://lore.kernel.org/r/20210922095141.233938-1-hsiangkao@linux.alibaba.com
    Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files")
    Reviewed-by: Liu Bo
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     

10 Sep, 2021

1 commit

  • Pull libnvdimm updates from Dan Williams:

    - Fix a race condition in the teardown path of raw mode pmem
    namespaces.

    - Cleanup the code that filesystems use to detect filesystem-dax
    capabilities of their underlying block device.

    * tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    dax: remove bdev_dax_supported
    xfs: factor out a xfs_buftarg_is_dax helper
    dax: stub out dax_supported for !CONFIG_FS_DAX
    dax: remove __generic_fsdax_supported
    dax: move the dax_read_lock() locking into dax_supported
    dax: mark dax_get_by_host static
    dm: use fs_dax_get_by_bdev instead of dax_get_by_host
    dax: stop using bdevname
    fsdax: improve the FS_DAX Kconfig description and help text
    libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind

    Linus Torvalds
     

03 Sep, 2021

1 commit

  • Pull overlayfs update from Miklos Szeredi:

    - Copy up immutable/append/sync/noatime attributes (Amir Goldstein)

    - Improve performance by enabling RCU lookup.

    - Misc fixes and improvements

    The reason this touches so many files is that the ->get_acl() method now
    gets a "bool rcu" argument. The ->get_acl() API was updated based on
    comments from Al and Linus:

    Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguQxpd6Wgc0Jd3ks77zcsAv_bn0q17L3VNnnmPKu11t8A@mail.gmail.com/

    * tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: enable RCU'd ->get_acl()
    vfs: add rcu argument to ->get_acl() callback
    ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
    ovl: use kvalloc in xattr copy-up
    ovl: update ctime when changing fileattr
    ovl: skip checking lower file's i_writecount on truncate
    ovl: relax lookup error on mismatch origin ftype
    ovl: do not set overlay.opaque for new directories
    ovl: add ovl_allow_offline_changes() helper
    ovl: disable decoding null uuid with redirect_dir
    ovl: consistent behavior for immutable/append-only inodes
    ovl: copy up sync/noatime fileattr flags
    ovl: pass ovl_fs to ovl_check_setxattr()
    fs: add generic helper for filling statx attribute flags

    Linus Torvalds
     

25 Aug, 2021

1 commit

  • Dan reported a new smatch warning [1]
    "fs/erofs/inode.c:210 erofs_read_inode() error: double free of 'copied'"

    Due to new chunk-based format handling logic, the error path can be
    called after kfree(copied).

    Set "copied = NULL" after kfree(copied) to fix this.

    [1] https://lore.kernel.org/r/202108251030.bELQozR7-lkp@intel.com

    Link: https://lore.kernel.org/r/20210825120757.11034-1-hsiangkao@linux.alibaba.com
    Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files")
    Reported-by: kernel test robot
    Reported-by: Dan Carpenter
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     

20 Aug, 2021

2 commits

  • Add runtime support for chunk-based uncompressed files
    described in the previous patch.

    Link: https://lore.kernel.org/r/20210820100019.208490-2-hsiangkao@linux.alibaba.com
    Reviewed-by: Liu Bo
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Currently, uncompressed data except for tail-packing inline is
    consecutive on disk.

    In order to support chunk-based data deduplication, add a new
    corresponding inode data layout.

    In the future, the data source of chunks can be either (un)compressed.
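
    For reference, the on-disk bits this layout introduces look roughly
    like the following (reconstructed from erofs_fs.h; comments are
    approximate):

        #define EROFS_CHUNK_FORMAT_BLKBITS_MASK 0x001F
        #define EROFS_CHUNK_FORMAT_INDEXES      0x0020
        #define EROFS_CHUNK_FORMAT_ALL  \
                (EROFS_CHUNK_FORMAT_BLKBITS_MASK | EROFS_CHUNK_FORMAT_INDEXES)

        struct erofs_inode_chunk_info {
                __le16 format;          /* chunk blkbits, etc. */
                __le16 reserved;
        };

        /* 8-byte inode chunk index */
        struct erofs_inode_chunk_index {
                __le16 advise;          /* always 0, don't care for now */
                __le16 device_id;       /* back-end storage id, 0 for now */
                __le32 blkaddr;         /* start block address of this chunk */
        };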

    Link: https://lore.kernel.org/r/20210820100019.208490-1-hsiangkao@linux.alibaba.com
    Reviewed-by: Liu Bo
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     

19 Aug, 2021

3 commits

  • Add a rcu argument to the ->get_acl() callback to allow
    get_cached_acl_rcu() to call the ->get_acl() method in the next patch.
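
    The updated method signature, sketched (the -ECHILD convention is how
    an implementation asks the caller to retry without RCU):

        /* in struct inode_operations: */
        struct posix_acl *(*get_acl)(struct inode *inode, int type, bool rcu);

        /* when rcu is true, the implementation must not block; returning
         * ERR_PTR(-ECHILD) tells the caller to fall back to non-RCU mode. */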

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • This adds fiemap support for both uncompressed files and compressed
    files by using iomap infrastructure.

    Link: https://lore.kernel.org/r/20210813052931.203280-3-hsiangkao@linux.alibaba.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Previously, there was no need to get the full decompressed length
    since EROFS supports partial decompression. However, for some other
    cases such as fiemap, the full decompressed length is necessary for
    iomap to work properly.

    This patch adds a way to get the full decompressed length. Note that
    it takes more metadata overhead, so it'd better be avoided if
    possible in performance-sensitive scenarios.

    Link: https://lore.kernel.org/r/20210818152231.243691-1-hsiangkao@linux.alibaba.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     


10 Aug, 2021

3 commits

  • Since tail-packing inline has been supported by iomap now, let's
    convert all EROFS uncompressed data I/O to iomap, which is pretty
    straightforward.

    Link: https://lore.kernel.org/r/20210805003601.183063-4-hsiangkao@linux.alibaba.com
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • DAX is quite useful for some VM use cases, since guest memory can be
    saved dramatically with a minimal, lightweight EROFS.

    In order to prepare for such use cases, add preliminary DAX support
    for non-tailpacking regular files for now.

    Tested with the DRAM-emulated PMEM and the EROFS image generated by
    "mkfs.erofs -Enoinline_data enwik9.fsdax.img enwik9"

    Link: https://lore.kernel.org/r/20210805003601.183063-3-hsiangkao@linux.alibaba.com
    Cc: nvdimm@lists.linux.dev
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Add iomap support for non-tailpacking uncompressed data in order to
    support DIO and DAX.

    Direct I/O is useful in certain scenarios for uncompressed files.
    For example, the double page cache can be avoided by using direct
    I/O when a loop device is used over uncompressed files containing an
    upper-layer compressed filesystem.

    This adds iomap DIO support for the non-tailpacking cases first;
    tail-packing inline files are handled in a follow-up patch.

    Link: https://lore.kernel.org/r/20210805003601.183063-2-hsiangkao@linux.alibaba.com
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Chao Yu
    Signed-off-by: Huang Jianan
    Signed-off-by: Gao Xiang

    Huang Jianan
     

08 Jun, 2021

3 commits

  • - Remove my outdated, misleading email address;

    - Get rid of all unnecessary trailing newlines added by accident.

    Link: https://lore.kernel.org/r/20210602160634.10757-1-xiang@kernel.org
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • The variable `occupied' has no effect in z_erofs_attach_page(),
    which is the only caller of z_erofs_pagevec_enqueue(), so remove it.

    Link: https://lore.kernel.org/r/20210419102623.2015-1-zbestahu@gmail.com
    Signed-off-by: Yue Hu
    Reviewed-by: Gao Xiang
    Signed-off-by: Gao Xiang

    Yue Hu
     
  • 'ret' will be overwritten to 0 if erofs_sb_has_sb_chksum() returns true,
    thus 0 will be returned in some error handling cases. Fix it by
    returning the negative error code -EINVAL instead of 0.
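
    A sketch of the fix in erofs_read_superblock() (reconstructed from
    fs/erofs/super.c; surrounding checks elided):

        ret = -EINVAL;
        /* ... layout checks that "goto out" with ret == -EINVAL ... */

        if (erofs_sb_has_sb_chksum(sbi)) {
                ret = erofs_superblock_csum_verify(sb, data);
                if (ret)
                        goto out;
        }

        /* the fix: re-arm the error code, since a successful checksum
         * verification leaves ret == 0 here */
        ret = -EINVAL;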

    Link: https://lore.kernel.org/r/20210519141657.3062715-1-weiyongjun1@huawei.com
    Fixes: b858a4844cfb ("erofs: support superblock checksum")
    Cc: stable # 5.5+
    Reported-by: Hulk Robot
    Signed-off-by: Wei Yongjun
    Reviewed-by: Gao Xiang
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Wei Yongjun
     

13 May, 2021

1 commit

  • If the lcluster following a HEAD lcluster is another HEAD or PLAIN
    type rather than a CBLKCNT NONHEAD type, it means the pclustersize
    _must_ be 1 lcluster (since its uncompressed size is < 2 lclusters),
    as illustrated below:

       HEAD                HEAD / PLAIN    lcluster type
         ____________ ____________
        |_:__________|_________:__|        file data (uncompressed)
         .            .
        .____________.
        |____________|                     pcluster data (compressed)

    Such an on-disk case was explained before [1] but wasn't handled
    properly in the runtime implementation.

    It can be observed by manually generating a 1 lcluster-sized
    pcluster covering 2 lclusters (thus no CBLKCNT exists.) Let's fix
    it now.

    [1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org

    Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org
    Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes")
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     

10 Apr, 2021

9 commits

  • Enable COMPR_CFGS and BIG_PCLUSTER since the implementations are
    all settled properly.

    Link: https://lore.kernel.org/r/20210407043927.10623-11-xiang@kernel.org
    Acked-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Prior to big pcluster, there was only one compressed page, so it was
    easy to map. However, when big pcluster is enabled, more work needs
    to be done to handle multiple compressed pages. In detail,

    - (maptype 0) if there is only one compressed page + no need
    to copy inplace I/O, just map it directly as we did before;

    - (maptype 1) if there are more compressed pages + no need to
    copy inplace I/O, vmap such compressed pages instead;

    - (maptype 2) if inplace I/O needs to be copied, use per-CPU
    buffers for decompression then.

    Another thing is how to detect whether inplace decompression is
    feasible or not (it's still quite easy for non-big pclusters): apart
    from the inplace margin calculation, the inplace I/O page reusing
    order also needs to be considered for each compressed page.
    Currently, if the compressed page is the xth page, it shouldn't be
    reused as [0 ... nrpages_out - nrpages_in + x], otherwise a full
    copy will be triggered.

    Although there are some extra optimization ideas for this, I'd like
    to make big pcluster work correctly first; obviously it can be
    further optimized later since it has nothing to do with the on-disk
    format at all.
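
    The three maptypes above, condensed into a hedged sketch (helper
    names here are hypothetical; the real logic lives in
    fs/erofs/decompressor.c):

        if (nrpages_in == 1 && !must_copy_inplace)
                src = kmap_atomic(compressed_pages[0]);         /* maptype 0 */
        else if (!must_copy_inplace)
                src = vm_map_ram(compressed_pages, nrpages_in, -1); /* maptype 1 */
        else
                src = copy_to_percpu_buffer(compressed_pages,
                                            nrpages_in);        /* maptype 2 */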

    Link: https://lore.kernel.org/r/20210407043927.10623-10-xiang@kernel.org
    Acked-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Different from non-compact indexes, several lclusters are packed
    into the compact form at once, and a unique base blkaddr is stored
    for each pack, so each lcluster index takes less space on average
    (e.g. 2 bytes for COMPACT_2B.) Btw, that is also why the
    BIG_PCLUSTER switch should be consistent for compact head0/1.

    Prior to big pcluster, the size of all pclusters was 1 lcluster.
    Therefore, when a new HEAD lcluster was scanned, blkaddr would be
    bumped by 1 lcluster. However, that way doesn't work anymore for
    big pcluster since we actually don't know the compressed size of
    pclusters in advance (before reading CBLKCNT lcluster).

    So, instead, let blkaddr of each pack be the first pcluster blkaddr
    with a valid CBLKCNT, in detail,

    1) if CBLKCNT starts at the pack, this first valid pcluster is
       itself, e.g.
         _____________________________________________________________
        |_CBLKCNT0_|_NONHEAD_| .. |_HEAD_|_CBLKCNT1_| ... |_HEAD_| ...
         ^ = blkaddr base          ^ += CBLKCNT0           ^ += CBLKCNT1

    2) if CBLKCNT doesn't start at the pack, the first valid pcluster
       is the next pcluster, e.g.
         _________________________________________________________
        | NONHEAD_| .. |_HEAD_|_CBLKCNT0_| ... |_HEAD_|_HEAD_| ...
         ^ = blkaddr base        ^ += CBLKCNT0
                                 ^ += 1

    When a CBLKCNT is found, blkaddr will be increased by CBLKCNT
    lclusters; or when a new HEAD is found immediately after, blkaddr is
    bumped by 1 instead (see the picture above.)

    Also note that if a CBLKCNT is at the end of the pack, instead of
    storing delta1 (the distance to the next HEAD lcluster) as normal
    NONHEADs do, it still stores the compressed block count (delta0),
    since delta1 can be calculated indirectly but the block count can't.

    Adjust decoding logic to fit big pcluster compact indexes as well.
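
    The blkaddr advance described above, as pseudo-helpers (hypothetical
    names, for illustration only):

        /* while scanning lcluster indexes inside one compact pack */
        if (lcluster_is_cblkcnt(m))
                blkaddr += cblkcnt(m);  /* skip the whole big pcluster */
        else if (lcluster_is_head(m))
                blkaddr += 1;           /* 1-lcluster-sized pcluster */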

    Link: https://lore.kernel.org/r/20210407043927.10623-9-xiang@kernel.org
    Acked-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • When the INCOMPAT_BIG_PCLUSTER sb feature is enabled, legacy compress
    indexes will also have the same on-disk header as compact indexes to
    keep per-file configurations instead of leaving it zeroed.

    If ADVISE_BIG_PCLUSTER is set for a file, CBLKCNT will be loaded for
    each pcluster in this file by parsing the 1st non-head lcluster.

    Link: https://lore.kernel.org/r/20210407043927.10623-8-xiang@kernel.org
    Acked-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Adjust per-CPU buffers on demand since the big pcluster definition is
    available. Also, bail out for unsupported pcluster sizes according to
    Z_EROFS_PCLUSTER_MAX_SIZE.

    Link: https://lore.kernel.org/r/20210407043927.10623-7-xiang@kernel.org
    Acked-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Big pcluster indicates that the size of compressed data for each
    physical pcluster is no longer fixed as the block size, but could be
    more than 1 block (more accurately, 1 logical cluster).

    When the big pcluster feature is enabled for head0/1, delta0 of the
    1st non-head lcluster index will keep the block count of this
    pcluster in lcluster units instead of 1. Otherwise, the compressed
    size of the pcluster should be 1 lcluster if the pcluster has no
    non-head lcluster index.

    Also note that BIG_PCLUSTER feature reuses COMPR_CFGS feature since
    it depends on COMPR_CFGS and will be released together.

    Link: https://lore.kernel.org/r/20210407043927.10623-6-xiang@kernel.org
    Acked-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • When picking up inplace I/O pages, they should be traversed in
    reverse order, in alignment with the traversal order of file-backed
    online pages. Also, the index should be updated together when
    preloading compressed pages.

    Previously, only page-sized pclustersize was supported, so there was
    no problem at all. Also rename `compressedpages' to `icpage_ptr' to
    reflect its functionality.

    Link: https://lore.kernel.org/r/20210407043927.10623-5-xiang@kernel.org
    Acked-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Since multiple pcluster sizes could be used at once, the number of
    compressed pages will become a variable factor. It's necessary to
    introduce slab pools rather than a single slab cache now.

    This limits the pclustersize to 1M (Z_EROFS_PCLUSTER_MAX_SIZE) and
    gets rid of the obsolete EROFS_FS_CLUSTER_PAGE_LIMIT, which has no
    use now.
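
    A sketch of the slab pool table this introduces (reconstructed from
    fs/erofs/zdata.c; the exact size ladder may differ):

        struct z_erofs_pcluster_slab {
                struct kmem_cache *slab;
                unsigned int maxpages;
                char name[48];
        };

        #define _PCLP(n) { .maxpages = n }

        static struct z_erofs_pcluster_slab pcluster_pool[] __read_mostly = {
                _PCLP(1), _PCLP(4), _PCLP(16), _PCLP(64), _PCLP(128),
                _PCLP(Z_EROFS_PCLUSTER_MAX_PAGES)
        };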

    Link: https://lore.kernel.org/r/20210407043927.10623-4-xiang@kernel.org
    Acked-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • To deal with cases in which inplace decompression is infeasible for
    some inplace I/O, per-CPU buffers were introduced to get rid of page
    allocation latency and thrashing for low-latency decompression
    algorithms such as lz4.

    For the big pcluster feature, introduce multipage per-CPU buffers to
    keep such inplace I/O pclusters temporarily as well, but note that
    per-CPU pages are just consecutive virtually.

    When a new big pcluster fs is mounted, its max pclustersize will be
    read and per-CPU buffers can be grown if needed. Shrinking adjustable
    per-CPU buffers is more complex (because we don't know if such a size
    is still in use), so currently we just release them all when
    unloading.

    Link: https://lore.kernel.org/r/20210409190630.19569-1-xiang@kernel.org
    Acked-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     

07 Apr, 2021

1 commit

  • The formal big pcluster design is actually more powerful / flexible
    than previously thought, whose pclustersize was fixed as power-of-2
    blocks, which was obviously inefficient and space-wasting. Instead,
    pclustersize can now be set independently for each pcluster, so
    various pcluster sizes can also be used together in one file if mkfs
    wants (for example, according to data type and/or compression ratio).

    Let's get rid of the previous physical_clusterbits[] setting (also
    note that the corresponding on-disk fields are still 0 for now).
    Therefore, head1/2 can be used for at most 2 different algorithms in
    one file, and again pclustersize is now independent of these.

    Link: https://lore.kernel.org/r/20210407043927.10623-2-xiang@kernel.org
    Acked-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     


29 Mar, 2021

4 commits

  • Add a bitmap for available compression algorithms and a variable-sized
    on-disk table for compression options in preparation for the upcoming
    big pcluster and LZMA algorithm; the table follows the end of the
    super block.

    To parse the compression options, the bitmap is scanned bit by bit.
    For each available algorithm, there is a 2-byte `length' field
    followed by its option data correspondingly (that's enough for most
    cases, otherwise entire fs blocks should be used.)

    With such an available algorithm bitmap, the kernel itself can also
    refuse to mount such a filesystem if any unsupported compression
    algorithm exists.

    Note that COMPR_CFGS feature will be enabled with BIG_PCLUSTER.
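
    The parsing loop, loosely reconstructed (details approximate; the
    real code is erofs_load_compr_cfgs()):

        for (algs = sbi->available_compr_algs; algs; algs >>= 1, ++alg) {
                void *data;

                if (!(algs & 1))
                        continue;

                /* reads the 2-byte length prefix, then the payload */
                data = erofs_read_metadata(sb, &page, &offset, &len);
                if (IS_ERR(data)) {
                        ret = PTR_ERR(data);
                        break;
                }

                switch (alg) {
                case Z_EROFS_COMPRESSION_LZ4:
                        ret = z_erofs_load_lz4_config(sb, dsb, data, len);
                        break;
                default:
                        ret = -EOPNOTSUPP;      /* refuse unsupported algorithms */
                        break;
                }
                kfree(data);
                if (ret)
                        break;
        }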

    Link: https://lore.kernel.org/r/20210329100012.12980-1-hsiangkao@aol.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Introduce z_erofs_lz4_cfgs to store all lz4 configurations.
    Currently it's only max_distance, but will be used for new
    features later.

    Link: https://lore.kernel.org/r/20210329012308.28743-4-hsiangkao@aol.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • lz4 uses LZ4_DISTANCE_MAX to record history preservation. When
    using rolling decompression, a block with a higher compression
    ratio will cause a larger memory allocation (up to 64k). It may
    cause a large resource burden in extreme cases on devices with
    small memory and a large number of concurrent IOs. So appropriately
    reducing this value can improve performance.

    Decreasing this value will reduce the compression ratio (except when
    input_size < LZ4_DISTANCE_MAX).
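
    For reference, a sketch of the per-sb state this adds (the Gao Xiang
    note below mentions struct erofs_sb_lz4_info; reconstructed,
    approximate):

        struct erofs_sb_lz4_info {
                /* # of pages needed for EROFS lz4 rolling decompression */
                u16 max_distance_pages;
        };

        /* used when the on-disk value is 0, i.e. the full history window */
        #define LZ4_MAX_DISTANCE_PAGES \
                (DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE) + 1)
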
    Signed-off-by: Huang Jianan
    Signed-off-by: Guo Weichao
    [ Gao Xiang: introduce struct erofs_sb_lz4_info for configurations. ]
    Signed-off-by: Gao Xiang

    Huang Jianan
     
  • Introduce erofs_sb_has_xxx() to make long checks short, especially
    for later big pcluster & LZMA features.
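
    A sketch of the helper-generating macro (reconstructed from
    fs/erofs/internal.h):

        #define EROFS_FEATURE_FUNCS(name, compat, feature) \
        static inline bool erofs_sb_has_##name(struct erofs_sb_info *sbi) \
        { \
                return sbi->feature_##compat & EROFS_FEATURE_##feature; \
        }

        EROFS_FEATURE_FUNCS(lz4_0padding, incompat, INCOMPAT_LZ4_0PADDING)
        EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)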

    Link: https://lore.kernel.org/r/20210329012308.28743-2-hsiangkao@aol.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang