03 Mar, 2020

4 commits

  • As Lasse pointed out, "Looking at fs/erofs/decompress.c,
    the return value from LZ4_decompress_safe_partial is only
    checked for negative value to catch errors. ... So if
    I understood it correctly, if there is bad data whose
    uncompressed size is much less than it should be, it can
    leave part of the output buffer untouched and expose the
    previous data as the file content. "

    Let's fix it now.

    Cc: Lasse Collin
    Fixes: 7fc45dbc938a ("staging: erofs: introduce generic decompression backend")
    [ Gao Xiang: v5.3+, I will manually backport this to stable later. ]
    Link: https://lore.kernel.org/r/20200226081008.86348-3-gaoxiang25@huawei.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
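
A minimal userspace sketch of the kind of check described above, assuming a decoder that returns the number of bytes written or a negative value on error. The helper name `check_decoded_length` and the choice of `-EIO` are illustrative assumptions, not the code from this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel code): validate a decoder's return
 * value against the output size recorded in the metadata. A short result
 * is treated as corruption, and the untouched tail of the output buffer
 * is zeroed so that stale data is never exposed as file content.
 */
static int check_decoded_length(int ret, unsigned char *out, size_t expected)
{
	assert(out != NULL);

	if (ret < 0)
		return -EIO;	/* the decoder itself reported an error */

	if ((size_t)ret < expected) {
		/* wipe the part of the buffer the decoder never wrote */
		memset(out + ret, 0, expected - (size_t)ret);
		return -EIO;	/* output shorter than the metadata claims */
	}
	return 0;
}
```

Checking only for a negative return value would accept the short case and leave the tail of the buffer as it was before decoding.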
     
  • As Lasse pointed out, "EROFS uses LZ4_decompress_safe_partial
    for both partial and full blocks. Thus when it is decoding a
    full block, it doesn't know if the LZ4 decoder actually decoded
    all the input. The real uncompressed size could be bigger than
    the value stored in the file system metadata.

    Using LZ4_decompress_safe instead of _safe_partial when
    decompressing a full block would help to detect errors."

    So it's reasonable to use _safe to guard against potentially
    corrupted images, and it might bring some speed gain as well,
    although I didn't observe much difference.

    Note that the legacy compressor (< 5.3, no LZ4_0PADDING) could
    encode extra data in a pcluster, so that case is excluded as well.

    Cc: Lasse Collin
    Fixes: 0ffd71bcc3a0 ("staging: erofs: introduce LZ4 decompression inplace")
    [ Gao Xiang: v5.3+, I will manually backport this to stable later. ]
    Link: https://lore.kernel.org/r/20200226081008.86348-2-gaoxiang25@huawei.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • The remaining count should not include successful
    shrink attempts.

    Fixes: e7e9a307be9d ("staging: erofs: introduce workstation for decompression")
    Cc: # 4.19+
    Link: https://lore.kernel.org/r/20200226081008.86348-1-gaoxiang25@huawei.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • XArray has friendly APIs and it will replace the old radix
    tree in the near future.

    This conversion uses __xa_cmpxchg when inserting at an index
    where another thread has just inserted an item. In detail,
    instead of looking everything up again as we did with the old
    radix tree, it tries to legitimize the current in-tree item
    in the XArray, which is more efficient.

    In addition, naming is rather a challenge for a non-English
    speaker like me. The basic idea of workstn is to provide
    a runtime sparse array with items arranged in physical
    block number order. Such items (previously called workgroups)
    can be used to record compressed clusters or for later features.

    However, neither workgroup nor workstn is a good name from
    any point of view, so I'd like to rename them to pslot
    and managed_pslots, which stand for physical slots. This patch
    handles the latter as part of the radix tree conversion.

    Cc: Matthew Wilcox
    Link: https://lore.kernel.org/r/20200220024642.91529-1-gaoxiang25@huawei.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
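
The insert-or-reuse pattern described above can be modelled in userspace with C11 atomics. This is only a sketch under stated assumptions: the fixed-size `slots` array stands in for the XArray, and `struct pslot_item`, `item_tryget` and `insert_or_get` are hypothetical names rather than the kernel's `__xa_cmpxchg` interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SLOTS 16

/* Hypothetical item; a refcount of 0 means it is being freed. */
struct pslot_item {
	atomic_int refcount;
	unsigned long index;
};

/* Stand-in for the sparse array: one atomic pointer per index. */
static _Atomic(struct pslot_item *) slots[NR_SLOTS];

/* Take a reference unless the item is already dying. */
static int item_tryget(struct pslot_item *it)
{
	int old = atomic_load(&it->refcount);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&it->refcount, &old, old + 1))
			return 1;
	}
	return 0;
}

/*
 * Insert 'item' if its slot is empty. If another thread got there first,
 * try to legitimize the in-tree item by taking a reference instead of
 * performing a second lookup. Returns NULL if the in-tree item is dying,
 * in which case the caller is expected to retry.
 */
static struct pslot_item *insert_or_get(struct pslot_item *item)
{
	struct pslot_item *expected = NULL;

	assert(item->index < NR_SLOTS);

	if (atomic_compare_exchange_strong(&slots[item->index], &expected, item))
		return item;		/* we won the race */

	/* 'expected' now holds the item that was already in the slot */
	return item_tryget(expected) ? expected : NULL;
}
```

The compare-and-exchange hands back the competing item directly, which is what removes the need for a separate lookup after a failed insert.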
     

21 Jan, 2020

2 commits


11 Jan, 2020

1 commit

  • rq->out[1] must be valid before it is accessed. Otherwise,
    in very rare cases, an out-of-bounds, uninitialized on-stack
    rq->out[1] can equal *in and lead to unintended memmove behavior.

    Link: https://lore.kernel.org/r/20200107022546.19432-1-gaoxiang25@huawei.com
    Fixes: 7fc45dbc938a ("staging: erofs: introduce generic decompression backend")
    Cc: # 5.3+
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     

07 Jan, 2020

4 commits


12 Dec, 2019

1 commit

  • Pull erofs fixes from Gao Xiang:
    "Mainly address a regression reported by David recently observed
    together with overlayfs due to the improper return value of
    listxattr() without xattr. Update outdated expressions in document as
    well.

    Summary:

    - Fix improper return value of listxattr() with no xattr

    - Keep up documentation with latest code"

    * tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
    erofs: update documentation
    erofs: zero out when listxattr is called with no xattr

    Linus Torvalds
     

04 Dec, 2019

1 commit

  • As David reported [1], ENODATA is returned when attempting
    to modify files while using EROFS as an overlayfs lower layer.

    The root cause is that listxattr could return unexpected
    -ENODATA by mistake for inodes without xattr. That breaks
    listxattr return value convention and it can cause copy
    up failure when used with overlayfs.

    Resolve by zeroing out if no xattr is found for listxattr.

    [1] https://lore.kernel.org/r/CAEvUa7nxnby+rxK-KRMA46=exeOMApkDMAV08AjMkkPnTPV4CQ@mail.gmail.com
    Link: https://lore.kernel.org/r/20191201084040.29275-1-hsiangkao@aol.com
    Fixes: cadf1ccf1b00 ("staging: erofs: add error handling for xattr submodule")
    Cc: # 4.19+
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
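
The listxattr() return value convention referred to above can be illustrated with a small userspace model. The function `list_names` and its argument layout are hypothetical; only the convention itself (0 for an empty list, the required size when `size` is 0, `-ERANGE` for a short buffer) reflects listxattr(2).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical model of the listxattr() convention:
 *  - no attributes: return 0, never -ENODATA
 *  - size == 0: return the buffer size needed
 *  - buffer too small: return -ERANGE
 *  - otherwise: copy the NUL-separated names and return their total size
 */
static ssize_t list_names(const char *const *names, size_t count,
			  char *buf, size_t size)
{
	size_t needed = 0, off = 0, i;

	for (i = 0; i < count; i++)
		needed += strlen(names[i]) + 1;	/* name plus its NUL */

	if (needed == 0 || size == 0)
		return (ssize_t)needed;		/* empty list or size query */
	if (size < needed)
		return -ERANGE;

	assert(buf != NULL);
	for (i = 0; i < count; i++) {
		size_t len = strlen(names[i]) + 1;

		memcpy(buf + off, names[i], len);
		off += len;
	}
	return (ssize_t)needed;
}
```

A caller such as overlayfs copy-up treats a negative return as a failure, which is why an empty attribute list has to be reported as 0.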
     

24 Nov, 2019

6 commits

  • The cache_strategy option is already handled carefully,
    so an incorrect setting cannot pass option parsing.
    Meanwhile, printing 'cache_strategy=(unknown)' can cause
    a failure on remount.

    Link: https://lore.kernel.org/r/20191119115049.3401-1-cgxu519@mykernel.net
    Signed-off-by: Chengguang Xu
    Reviewed-by: Gao Xiang
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Chengguang Xu
     
  • VLE was an old informal name for fixed-sized output
    compression, which came from the published ATC'19 paper [1].

    Drop those old annotations since erofs can handle
    all encoded clusters on a block-aligned basis, which
    is broader than fixed-sized output compression once
    the larger clustersize feature is fully implemented.

    Unaligned encoding won't be considered in EROFS
    since it's not friendly to inplace I/O and perhaps
    decompression inplace.

    a) Fixed-sized output compression with 16KB pcluster:
    ___________________________________
    |xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|
    |___ 0___|___ 1___|___ 2___|___ 3___| physical blocks

    b) Block-aligned fixed-sized input compression with
    16KB pcluster:
    ___________________________________
    |xxxxxxxx|xxxxxxxx|xxxxxxxx|xxx00000|
    |___ 0___|___ 1___|___ 2___|___ 3___| physical blocks

    c) Block-unaligned fixed-sized input compression with
    16KB compression unit:
    ____________________________________________
    |..xxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|x.......|
    |___ 0___|___ 1___|___ 2___|___ 3___|___ 4___| physical blocks

    Refine better names for those as well.

    [1] https://www.usenix.org/conference/atc19/presentation/gao

    Link: https://lore.kernel.org/r/20191108033733.63919-1-gaoxiang25@huawei.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Introduce a superblock checksum feature so that the
    superblock can be verified at mount time.

    Note that the first 1024 bytes are ignored for x86
    boot sectors and other oddities.

    Link: https://lore.kernel.org/r/20191104024937.113939-1-gaoxiang25@huawei.com
    Signed-off-by: Pratik Shinde
    Reviewed-by: Chao Yu
    Cc: Dan Carpenter
    Signed-off-by: Gao Xiang

    Pratik Shinde
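
A rough sketch of how such a checksum could be computed while skipping the first 1024 bytes. The 4096-byte block size, the position of the checksum field (`CSUM_FIELD`), the host-endian storage and all function names are assumptions made for illustration; they are not taken from the on-disk format or from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SB_OFFSET	1024u	/* leading bytes that are not checksummed */
#define SB_BLOCK_SIZE	4096u	/* assumed size of the first block */
#define CSUM_FIELD	4u	/* assumed offset of the 32-bit checksum field */

/* Bitwise CRC32C (Castagnoli), reflected polynomial, no final XOR. */
static uint32_t crc32c_update(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Checksum everything after SB_OFFSET with the checksum field zeroed. */
static uint32_t sb_checksum(const unsigned char *block)
{
	unsigned char tmp[SB_BLOCK_SIZE - SB_OFFSET];

	assert(block != NULL);
	memcpy(tmp, block + SB_OFFSET, sizeof(tmp));
	memset(tmp + CSUM_FIELD, 0, sizeof(uint32_t));
	return crc32c_update(~0u, tmp, sizeof(tmp));
}

/* Store the checksum in host byte order (a real format would fix the endianness). */
static void sb_store_checksum(unsigned char *block)
{
	uint32_t csum = sb_checksum(block);

	memcpy(block + SB_OFFSET + CSUM_FIELD, &csum, sizeof(csum));
}

/* Returns 1 if the stored checksum matches the recomputed one. */
static int sb_verify(const unsigned char *block)
{
	uint32_t stored;

	memcpy(&stored, block + SB_OFFSET + CSUM_FIELD, sizeof(stored));
	return stored == sb_checksum(block);
}
```

Because the summed region starts at `SB_OFFSET`, changes to a boot sector in the first 1024 bytes do not invalidate the checksum.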
     
  • Tasks waiting on I/O for sync decompression
    should be marked as being in the IO wait state.

    Link: https://lore.kernel.org/r/20191008125616.183715-5-gaoxiang25@huawei.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Previously, both z_erofs_unzip_io and z_erofs_unzip_io_sb
    record decompress queues for backend to use.

    The only difference is that z_erofs_unzip_io is used for
    on-stack sync decompression so that it doesn't have a super
    block field (since the caller can pass it in its context),
    but it increases complexity with only a pointer saving.

    Rename z_erofs_unzip_io to z_erofs_decompressqueue with
    a fixed super_block member and kill the other entirely.
    It can also fall back to sync decompression if memory
    allocation fails.

    Link: https://lore.kernel.org/r/20191008125616.183715-4-gaoxiang25@huawei.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • Now open code is much cleaner due to iterative development.

    Link: https://lore.kernel.org/r/20191124025217.12345-1-hsiangkao@aol.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     

16 Oct, 2019

2 commits

  • After commit 4279f3f9889f ("staging: erofs: turn cache
    strategies into mount options"), cache strategies are
    changed into mount options rather than old build configs.

    Let's kill useless code for obsoleted build options.

    Link: https://lore.kernel.org/r/20191008125616.183715-2-gaoxiang25@huawei.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • - change return value to int since collection is
    already returned within the collector.
    - better function naming.

    Link: https://lore.kernel.org/r/20191008125616.183715-1-gaoxiang25@huawei.com
    Reviewed-by: Chao Yu
    Signed-off-by: Gao Xiang

    Gao Xiang
     

01 Oct, 2019

3 commits

  • Fix a recent cleanup patch. noio (bypass) chain is
    handled asynchronously against submit chain, therefore
    inplace I/O or pagevec cannot be applied to such pages.
    Add detailed comment for this as well.

    Fixes: 97e86a858bc3 ("staging: erofs: tidy up decompression frontend")
    Reviewed-by: Chao Yu
    Link: https://lore.kernel.org/r/20190922100434.229340-1-gaoxiang25@huawei.com
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • After doing more drop_caches stress test on
    our products, I found the mistake introduced by
    a very recent cleanup [1].

    The current rule is that "erofs_get_meta_page"
    should be returned with page locked (although
    it's mostly unnecessary for read-only fs after
    pages are PG_uptodate), but a fix should be
    done for this.

    [1] https://lore.kernel.org/r/20190904020912.63925-26-gaoxiang25@huawei.com
    Fixes: 618f40ea026b ("erofs: use read_cache_page_gfp for erofs_get_meta_page")
    Reviewed-by: Chao Yu
    Link: https://lore.kernel.org/r/20190921184355.149928-1-gaoxiang25@huawei.com
    Signed-off-by: Gao Xiang

    Gao Xiang
     
  • In case of error, the function read_mapping_page() returns
    ERR_PTR() not NULL. The NULL test in the return value check
    should be replaced with IS_ERR().

    Fixes: fe7c2423570d ("erofs: use read_mapping_page instead of sb_bread")
    Reviewed-by: Gao Xiang
    Reviewed-by: Chao Yu
    Signed-off-by: Wei Yongjun
    Link: https://lore.kernel.org/r/20190918083033.47780-1-weiyongjun1@huawei.com
    Signed-off-by: Gao Xiang

    Wei Yongjun
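
Why a NULL test misses this kind of failure can be seen in a small userspace re-creation of the error-pointer helpers. The helpers below mirror the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idea, while `read_page_model` and `use_page` are hypothetical stand-ins, not read_mapping_page() or the erofs caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creation of the error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* error pointers occupy the top MAX_ERRNO values of the address space */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a function that returns ERR_PTR() on failure. */
static void *read_page_model(int fail)
{
	static char page[16];

	return fail ? ERR_PTR(-EIO) : (void *)page;
}

static int use_page(int fail)
{
	void *page = read_page_model(fail);

	/* A "!page" test would not catch this: ERR_PTR(-EIO) is not NULL. */
	if (IS_ERR(page))
		return (int)PTR_ERR(page);

	assert(page != NULL);	/* the model never returns NULL */
	return 0;
}
```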
     

06 Sep, 2019

16 commits

  • As Christoph said [1], "I'd much prefer to just use
    read_cache_page_gfp, and live with the fact that this
    allocates bufferheads behind you for now. I'll try to
    speed up my attempts to get rid of the buffer heads on
    the block device mapping instead. "

    This simplifies the code a lot and a minor thing is
    "no REQ_META (e.g. for blktrace) on metadata at all..."

    [1] https://lore.kernel.org/r/20190903153704.GA2201@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-26-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph said [1] [2], "Just use the slightly
    more complicated 32-bit version everywhere so that
    you have a single actually tested code path.
    And then remove this helper. "

    [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
    [2] https://lore.kernel.org/r/20190902125320.GA16726@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-25-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph said [1], "This seems to be your only direct
    use of buffer heads, which while not deprecated are a bit
    of an ugly step child. So if you can easily avoid creating
    a buffer_head dependency in a new filesystem I think you
    should avoid it. "

    [1] https://lore.kernel.org/r/20190902125109.GA9826@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-24-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • Add prefix "erofs_" to these functions and print
    sb->s_id as a prefix to erofs_{err, info} so that
    the user knows which file system is affected.

    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-23-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph said [1], ".. and save one
    level of indentation."

    [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-22-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph said [1],
    "vm_map_ram is supposed to generally behave better. So if
    it doesn't please report that to the arch maintainer
    and linux-mm so that they can look into the issue. Having
    user make choices of deep down kernel internals is just
    a horrible interface.

    Please talk to maintainers of other bits of the kernel
    if you see issues and / or need enhancements. "

    Let's redo the previous conclusion and kill the vmap
    approach.

    [1] https://lore.kernel.org/r/20190830165533.GA10909@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-21-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph suggested [1], "Please just use plain kmalloc
    everywhere and let the normal kernel error injection code
    take care of injecting any errors."

    [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-20-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • Add erofs_ prefix to free_inode, alloc_inode, ...

    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-19-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph pointed out [1], "
    Why is there __submit_bio which really just obfuscates
    what is going on? Also why is __submit_bio using
    bio_set_op_attrs instead of opencode it as the comment
    right next to it asks you to? "

    Let's use submit_bio directly instead.

    [1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-18-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph pointed out [1],
    "Why is there __erofs_get_meta_page with the two weird
    booleans instead of a single erofs_get_meta_page that
    gets and gfp_t for additional flags and an unsigned int
    for additional bio op flags."

    And since all callers can handle errors, let's kill
    prio and nofail and erofs_get_inline_page() now.

    [1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-17-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph pointed out [1], "erofs_grab_bio tries to
    handle a bio_alloc failure, except that the function will
    not actually fail due to the mempool backing it."

    Sorry about useless code, fix it now and
    localize erofs_grab_bio [2].

    [1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/
    [2] https://lore.kernel.org/r/20190902122016.GL15931@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-16-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph said [1], "That is some very verbose
    debug info. We usually don't add that and let
    people trace the function instead. "

    [1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-15-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph pointed out [1], "Why is the variable name
    for the on-disk superblock layout? We usually still
    calls this something with sb in the name, e.g. dsb.
    for disksuper block. " Let's fix it.

    [1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-14-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • Fix as Christoph suggested [1] [2], "remove is_inode_fast_symlink
    and just opencode it in the few places using it"

    and
    "Please just set the ops directly instead of obfuscating that in
    a single caller, single line inline function. And please set it
    instead of the normal symlink iops in the same place where you
    also set those."

    [1] https://lore.kernel.org/r/20190830163910.GB29603@infradead.org/
    [2] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-13-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph suggested [1], update them all.

    [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-12-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang
     
  • As Christoph said [1] [2], update it now.

    [1] https://lore.kernel.org/r/20190902124521.GA22153@infradead.org/
    [2] https://lore.kernel.org/r/20190902120548.GB15931@infradead.org/
    Reported-by: Christoph Hellwig
    Signed-off-by: Gao Xiang
    Link: https://lore.kernel.org/r/20190904020912.63925-11-gaoxiang25@huawei.com
    Signed-off-by: Greg Kroah-Hartman

    Gao Xiang