08 Apr, 2022

10 commits

  • commit 705757274599e2e064dd3054aabc74e8af31a095 upstream.

    When renaming the whiteout file, the old whiteout file is not deleted.
    Therefore, we add the old dentry size to the old dir like XFS.
    Otherwise, an error may be reported due to `fscki->calc_sz != fscki->size`
    in check_inodes().

    Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
    Reported-by: Zhihao Cheng
    Signed-off-by: Baokun Li
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Baokun Li
     
  • commit 3b67db8a6ca83e6ff90b756d3da0c966f61cd37b upstream.

    MM defined the rule [1] very clearly that once page was set with PG_private
    flag, we should increment the refcount in that page, also main flows like
    pageout(), migrate_page() will assume there is one additional page
    reference count if page_has_private() returns true. Otherwise, we may
    get a BUG in page migration:

    page:0000000080d05b9d refcount:-1 mapcount:0 mapping:000000005f4d82a8
    index:0xe2 pfn:0x14c12
    aops:ubifs_file_address_operations [ubifs] ino:8f1 dentry name:"f30e"
    flags: 0x1fffff80002405(locked|uptodate|owner_priv_1|private|node=0|
    zone=1|lastcpupid=0x1fffff)
    page dumped because: VM_BUG_ON_PAGE(page_count(page) != 0)
    ------------[ cut here ]------------
    kernel BUG at include/linux/page_ref.h:184!
    invalid opcode: 0000 [#1] SMP
    CPU: 3 PID: 38 Comm: kcompactd0 Not tainted 5.15.0-rc5
    RIP: 0010:migrate_page_move_mapping+0xac3/0xe70
    Call Trace:
    ubifs_migrate_page+0x22/0xc0 [ubifs]
    move_to_new_page+0xb4/0x600
    migrate_pages+0x1523/0x1cc0
    compact_zone+0x8c5/0x14b0
    kcompactd+0x2bc/0x560
    kthread+0x18c/0x1e0
    ret_from_fork+0x1f/0x30

    Before going further, we should be clear about what the refcount means
    for a page obtained from grab_cache_page_write_begin(). There are 2
    situations:
    Situation 1: refcount is 3, page is created by __page_cache_alloc.
      TYPE_A - the write process is using this page
      TYPE_B - page is assigned to one certain mapping by calling
               __add_to_page_cache_locked()
      TYPE_C - page is added into the pagevec list of the current cpu by
               calling lru_cache_add()
    Situation 2: refcount is 2, page is obtained from the mapping's tree
      TYPE_B - page has been assigned to one certain mapping
      TYPE_A - the write process is using this page (by calling
               page_cache_get_speculative())
    The filesystem releases one refcount by calling put_page() in
    xxx_write_end(); the released refcount corresponds to TYPE_A (the write
    task is using it). If any process is using a page, the page migration
    process skips the page by checking whether expected_page_refs() equals
    the page refcount.

    The BUG is caused by following process:
    PA(cpu 0)                                   kcompactd(cpu 1)
                                                compact_zone
    ubifs_write_begin
      page_a = grab_cache_page_write_begin
        add_to_page_cache_lru
          lru_cache_add
            pagevec_add // put page into cpu 0's pagevec
      (refcnt = 3, for page creation process)
    ubifs_write_end
      SetPagePrivate(page_a) // doesn't increase page count !
      unlock_page(page_a)
      put_page(page_a) // refcnt = 2
                                                [...]

    PB(cpu 0)
    filemap_read
      filemap_get_pages
        add_to_page_cache_lru
          lru_cache_add
            __pagevec_lru_add // traverse all pages in cpu 0's pagevec
              __pagevec_lru_add_fn
                SetPageLRU(page_a)
                                                isolate_migratepages
                                                  isolate_migratepages_block
                                                    get_page_unless_zero(page_a)
                                                    // refcnt = 3
                                                    list_add(page_a, from_list)
                                                migrate_pages(from_list)
                                                  __unmap_and_move
                                                    move_to_new_page
                                                      ubifs_migrate_page(page_a)
                                                        migrate_page_move_mapping
                                                          expected_page_refs get 3
                                                          (migration[1] + mapping[1] + private[1])
            release_pages
              put_page_testzero(page_a) // refcnt = 3
                                                          page_ref_freeze // refcnt = 0
                page_ref_dec_and_test(0 - 1 = -1)
                                                          page_ref_unfreeze
                                                            VM_BUG_ON_PAGE(-1 != 0, page)

    UBIFS doesn't increase the page refcount after setting the private flag,
    which leads the page migration task to believe the page is not used by
    any other process, so the page is migrated. This causes concurrent
    access to the page refcount between put_page() called by another process
    (e.g. a read process calling lru_cache_add) and page_ref_unfreeze()
    called by the migration task.

    Actually zhangjun has tried to fix this problem [2] by recalculating page
    refcnt in ubifs_migrate_page(). It's better to follow MM rules [1], because
    just like Kirill suggested in [2], we need to check all users of
    page_has_private() helper. Like f2fs does in [3], fix it by adding/deleting
    refcount when setting/clearing private for a page. BTW, according to [4],
    we set 'page->private' as 1 because ubifs just simply SetPagePrivate().
    And, [5] provided a common helper to set/clear page private, ubifs can
    use this helper following the example of iomap, afs, btrfs, etc.
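
    The refcount rule described above can be illustrated with a small
    userspace model. This is only a sketch: struct toy_page and every
    toy_* function are hypothetical stand-ins invented for this example,
    not kernel APIs, and the model ignores locking and freezing entirely.
    It assumes a page that is held by the mapping and by a per-cpu pagevec
    (refcount 2) at the moment migration isolates it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the fields this sketch needs. */
struct toy_page {
	int refcount;      /* models page_count() */
	bool has_private;  /* models the PG_private flag */
	bool in_mapping;   /* page is attached to an address_space */
};

/* Buggy pattern: set the flag but take no reference. */
static void toy_set_private_no_ref(struct toy_page *p)
{
	p->has_private = true;
}

/* Pattern the MM rule asks for: the private flag owns one reference. */
static void toy_attach_private(struct toy_page *p)
{
	assert(!p->has_private);
	p->refcount++;
	p->has_private = true;
}

/* Clearing the flag drops the reference taken by toy_attach_private(). */
static void toy_detach_private(struct toy_page *p)
{
	assert(p->has_private);
	p->has_private = false;
	p->refcount--;
}

/* References migration expects: itself + mapping + private. */
static int toy_expected_refs(const struct toy_page *p)
{
	return 1 + (p->in_mapping ? 1 : 0) + (p->has_private ? 1 : 0);
}

/*
 * Returns true if migration would go ahead, i.e. it concludes that
 * nobody else holds the page. The isolation reference that migration
 * takes itself is added before comparing.
 */
static bool toy_migration_proceeds(const struct toy_page *p)
{
	int with_isolation = p->refcount + 1;

	return with_isolation == toy_expected_refs(p);
}
```

    In this model, a page with refcount 2 that only gets the flag set
    passes the check (3 == 3) although the pagevec still holds it, while
    a page that went through toy_attach_private() fails the check
    (4 != 3) and would be skipped.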

    Jump [6] to find a reproducer.

    [1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com
    [2] https://www.spinics.net/lists/linux-mtd/msg04018.html
    [3] http://lkml.iu.edu/hypermail/linux/kernel/1903.0/03313.html
    [4] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org
    [5] https://lore.kernel.org/all/20200517214718.468-1-guoqing.jiang@cloud.ionos.com
    [6] https://bugzilla.kernel.org/show_bug.cgi?id=214961

    Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
    Signed-off-by: Zhihao Cheng
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Zhihao Cheng
     
  • commit 4f2262a334641e05f645364d5ade1f565c85f20b upstream.

    Function ubifs_wbuf_write_nolock() may access buf out of bounds in
    following process:

    ubifs_wbuf_write_nolock():
      aligned_len = ALIGN(len, 8); // Assume len = 4089, aligned_len = 4096
      if (aligned_len <= wbuf->avail) ... // Not satisfied
      if (wbuf->used) {
        ubifs_leb_write() // Fill some data in avail wbuf
        len -= wbuf->avail; // len is still not 8-bytes aligned
        aligned_len -= wbuf->avail;
      }
      n = aligned_len >> c->max_write_shift;
      if (n) {
        n <<= c->max_write_shift;
        err = ubifs_leb_write(c, wbuf->lnum, buf + written,
                              wbuf->offs, n);
        // n > len, read out of bounds less than 8(n-len) bytes
      }

    , which can be caught by KASAN:
    =========================================================
    BUG: KASAN: slab-out-of-bounds in ecc_sw_hamming_calculate+0x1dc/0x7d0
    Read of size 4 at addr ffff888105594ff8 by task kworker/u8:4/128
    Workqueue: writeback wb_workfn (flush-ubifs_0_0)
    Call Trace:
    kasan_report.cold+0x81/0x165
    nand_write_page_swecc+0xa9/0x160
    ubifs_leb_write+0xf2/0x1b0 [ubifs]
    ubifs_wbuf_write_nolock+0x421/0x12c0 [ubifs]
    write_head+0xdc/0x1c0 [ubifs]
    ubifs_jnl_write_inode+0x627/0x960 [ubifs]
    wb_workfn+0x8af/0xb80

    Function ubifs_wbuf_write_nolock() accepts that parameter 'len' is not 8
    bytes aligned, the 'len' represents the true length of buf (which is
    allocated in 'ubifs_jnl_xxx', eg. ubifs_jnl_write_inode), so
    ubifs_wbuf_write_nolock() must handle the length read from 'buf' carefully
    to write leb safely.
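
    The arithmetic behind the out-of-bounds read can be checked with a
    small userspace sketch. The helper names (toy_align_up,
    toy_overread_bytes) and the 'filled' parameter are hypothetical and
    exist only for this illustration; 'filled' stands for the bytes that
    were first used to top up a partially used write buffer, and it is
    assumed to be a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a; a must be a power of two. */
static size_t toy_align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Number of bytes a write of n = whole write units would read past the
 * end of a buffer that holds only len valid bytes.
 * max_write_shift is log2 of the write unit size.
 */
static size_t toy_overread_bytes(size_t len, size_t filled,
				 unsigned int max_write_shift)
{
	size_t aligned_len = toy_align_up(len, 8);
	size_t n;

	assert(filled % 8 == 0 && filled <= len);
	len -= filled;          /* len stays unaligned */
	aligned_len -= filled;  /* aligned_len stays a multiple of 8 */
	n = (aligned_len >> max_write_shift) << max_write_shift;

	return n > len ? n - len : 0;
}
```

    With len = 4089 and a 2048-byte write unit (shift 11), the aligned
    length is 4096, so n = 4096 and the write would read 7 bytes past
    the valid data.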

    Fetch a reproducer in [Link].

    Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=214785
    Reported-by: Chengsong Ke
    Signed-off-by: Zhihao Cheng
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Zhihao Cheng
     
  • commit 1b83ec057db16b4d0697dc21ef7a9743b6041f72 upstream.

    Make 'ui->data_len' aligned with 8 bytes before it is assigned to
    dirtied_ino_d. Since 8871d84c8f8b0c6b("ubifs: convert to fileattr")
    was applied, 'setflags()' only affects regular files and directories;
    only xattr inodes, symlink inodes and special inodes
    (pipe/char_dev/block_dev) have a non-zero 'ui->data_len' field, so the
    assertion '!(req->dirtied_ino_d & 7)' cannot fail in
    ubifs_budget_space(). To avoid assertion failures in future evolution
    (e.g. setflags operating on special inodes), it's better to make
    dirtied_ino_d 8 bytes aligned; after all, the aligned size is still
    zero for regular files.

    Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system")
    Signed-off-by: Zhihao Cheng
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Zhihao Cheng
     
  • commit a6dab6607d4681d227905d5198710b575dbdb519 upstream.

    UBIFS should make sure the flash has enough space to store dirty data
    (data in memory that is newer than what is on disk); the space budget
    is designed to do exactly that. If the space budget accounts for less
    data than we need, 'make_reservation()' will do more work (returning
    -ENOSPC if no free space is left; sometimes we can see "cannot reserve
    xxx bytes in jhead xxx, error -28" in ubifs error messages) with ubifs
    inodes locked, which may affect other syscalls.

    A simple way to decide how much space we need when making a budget:
    see how much space is needed by 'make_reservation()' in the
    ubifs_jnl_xxx() function for the corresponding operation.

    It's better to report ENOSPC in ubifs_budget_space(), as early as we can.

    Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE")
    Fixes: 1e51764a3c2ac05 ("UBIFS: add new flash file system")
    Signed-off-by: Zhihao Cheng
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Zhihao Cheng
     
  • commit 60eb3b9c9f11206996f57cb89521824304b305ad upstream.

    'ui->dirty' is not protected by 'ui_mutex' in function do_tmpfile() which
    may race with ubifs_write_inode[wb_workfn] to access/update 'ui->dirty',
    finally dirty space is released twice.

    open(O_TMPFILE)                                     wb_workfn
    do_tmpfile
      ubifs_budget_space(ino_req = { .dirtied_ino = 1})
      d_tmpfile // mark inode(tmpfile) dirty
      ubifs_jnl_update // without holding tmpfile's ui_mutex
        mark_inode_clean(ui)
          if (ui->dirty)
            ubifs_release_dirty_inode_budget(ui) // release first time
                                                        ubifs_write_inode
                                                          mutex_lock(&ui->ui_mutex)
                                                          ubifs_release_dirty_inode_budget(ui)
                                                          // release second time
                                                          mutex_unlock(&ui->ui_mutex)
          ui->dirty = 0

    Running generic/476 easily reproduces the following message
    (see the reproducer in [Link]):

    UBIFS error (ubi0:0 pid 2578): ubifs_assert_failed [ubifs]: UBIFS assert
    failed: c->bi.dd_growth >= 0, in fs/ubifs/budget.c:554
    UBIFS warning (ubi0:0 pid 2578): ubifs_ro_mode [ubifs]: switched to
    read-only mode, error -22
    Workqueue: writeback wb_workfn (flush-ubifs_0_0)
    Call Trace:
    ubifs_ro_mode+0x54/0x60 [ubifs]
    ubifs_assert_failed+0x4b/0x80 [ubifs]
    ubifs_release_budget+0x468/0x5a0 [ubifs]
    ubifs_release_dirty_inode_budget+0x53/0x80 [ubifs]
    ubifs_write_inode+0x121/0x1f0 [ubifs]
    ...
    wb_workfn+0x283/0x7b0

    Fix it by holding tmpfile ubifs inode lock during ubifs_jnl_update().
    Similar problem exists in whiteout renaming, but previous fix("ubifs:
    Rename whiteout atomically") has solved the problem.
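
    The double release can be replayed deterministically in userspace.
    The sketch below is an assumption-laden toy: struct toy_state, the
    toy_* functions and the budget value of 160 are made up for this
    illustration, and the "race" is a fixed interleaving written out by
    hand rather than real threads.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_INODE_BUDGET = 160 };  /* arbitrary illustrative amount */

struct toy_state {
	long dd_growth;  /* models c->bi.dd_growth */
	bool dirty;      /* models ui->dirty */
};

/* "If dirty, release the budget"; clearing dirty is a separate step. */
static bool toy_check_and_release(struct toy_state *s)
{
	if (!s->dirty)
		return false;
	s->dd_growth -= TOY_INODE_BUDGET;
	return true;
}

/*
 * Runs the journal-update path and the writeback path once each and
 * returns the final dd_growth. With hold_ui_mutex set, the first path
 * finishes (including dirty = false) before the second one starts.
 */
static long toy_tmpfile_vs_writeback(bool hold_ui_mutex)
{
	struct toy_state s = { .dd_growth = TOY_INODE_BUDGET, .dirty = true };

	assert(s.dd_growth >= 0);
	if (hold_ui_mutex) {
		if (toy_check_and_release(&s))  /* journal update path */
			s.dirty = false;
		if (toy_check_and_release(&s))  /* writeback path: sees clean */
			s.dirty = false;
	} else {
		bool a = toy_check_and_release(&s);  /* journal update path */
		bool b = toy_check_and_release(&s);  /* writeback slips in */

		if (a || b)
			s.dirty = false;
	}
	return s.dd_growth;
}
```

    Without serialization the counter ends at -160, which corresponds to
    the failed 'c->bi.dd_growth >= 0' assertion; with serialization it
    ends at 0.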

    Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=214765
    Signed-off-by: Zhihao Cheng
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Zhihao Cheng
     
  • commit 278d9a243635f26c05ad95dcf9c5a593b9e04dc6 upstream.

    Currently, rename whiteout has 3 steps:
      1. create tmpfile (which associates old dentry to tmpfile inode) for
         whiteout, and store tmpfile to disk
      2. link whiteout, associate whiteout inode to old dentry again and
         store old dentry, old inode, new dentry on disk
      3. writeback dirty whiteout inode to disk

    A sudden power cut or an error (e.g. ENOSPC returned by budget, memory
    allocation failure) during the above steps may cause several problems:
      Problem 1: ENOSPC returned by whiteout space budget (before step 2),
                 old dentry will disappear after rename syscall, whiteout
                 file cannot be found either.

                 ls dir // we get file, whiteout
                 rename(dir/file, dir/whiteout, RENAME_WHITEOUT)
                 ENOSPC = ubifs_budget_space(&wht_req) // return
                 ls dir // empty (no file, no whiteout)
      Problem 2: Power cut happens before step 3, whiteout inode with
                 'nlink=1' is not stored on disk, whiteout dentry (old
                 dentry) is written on disk, whiteout file is lost on next
                 mount (we get "dead directory entry" after executing
                 'ls -l' on whiteout file).

    Now, we use the following 3 steps to finish rename whiteout:
      1. create an in-mem inode with 'nlink = 1' as whiteout
      2. ubifs_jnl_rename (write on disk to finish associating old dentry
         to whiteout inode, associating new dentry with old inode)
      3. iput(whiteout)

    Rely on ubifs_jnl_rename() writing the in-mem inode to disk to finish
    rename whiteout, which avoids intermediate disk states caused by a
    sudden power cut or an error.

    Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
    Signed-off-by: Zhihao Cheng
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Zhihao Cheng
     
  • commit 716b4573026bcbfa7b58ed19fe15554bac66b082 upstream.

    The whiteout inode should be put when do_tmpfile() fails if the inode
    has been initialized. Otherwise we will get the following warning during
    umount:
    UBIFS error (ubi0:0 pid 1494): ubifs_assert_failed [ubifs]: UBIFS
    assert failed: c->bi.dd_growth == 0, in fs/ubifs/super.c:1930
    VFS: Busy inodes after unmount of ubifs. Self-destruct in 5 seconds.

    Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
    Signed-off-by: Zhihao Cheng
    Suggested-by: Sascha Hauer
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Zhihao Cheng
     
  • commit afd427048047e8efdedab30e8888044e2be5aa9c upstream.

    Following hung tasks:
    [ 77.028764] task:kworker/u8:4 state:D stack: 0 pid: 132
    [ 77.028820] Call Trace:
    [ 77.029027] schedule+0x8c/0x1b0
    [ 77.029067] mutex_lock+0x50/0x60
    [ 77.029074] ubifs_write_inode+0x68/0x1f0 [ubifs]
    [ 77.029117] __writeback_single_inode+0x43c/0x570
    [ 77.029128] writeback_sb_inodes+0x259/0x740
    [ 77.029148] wb_writeback+0x107/0x4d0
    [ 77.029163] wb_workfn+0x162/0x7b0

    [ 92.390442] task:aa state:D stack: 0 pid: 1506
    [ 92.390448] Call Trace:
    [ 92.390458] schedule+0x8c/0x1b0
    [ 92.390461] wb_wait_for_completion+0x82/0xd0
    [ 92.390469] __writeback_inodes_sb_nr+0xb2/0x110
    [ 92.390472] writeback_inodes_sb_nr+0x14/0x20
    [ 92.390476] ubifs_budget_space+0x705/0xdd0 [ubifs]
    [ 92.390503] do_rename.cold+0x7f/0x187 [ubifs]
    [ 92.390549] ubifs_rename+0x8b/0x180 [ubifs]
    [ 92.390571] vfs_rename+0xdb2/0x1170
    [ 92.390580] do_renameat2+0x554/0x770

    , are caused by concurrent rename whiteout and inode writeback processes:
    rename_whiteout(Thread 1)                       wb_workfn(Thread2)
    ubifs_rename
      do_rename
        lock_4_inodes (Hold ui_mutex)
        ubifs_budget_space
          make_free_space
            shrink_liability
              __writeback_inodes_sb_nr
                bdi_split_work_to_wbs (Queue new wb work)
                                                    wb_do_writeback(wb work)
                                                      __writeback_single_inode
                                                        ubifs_write_inode
                                                          LOCK(ui_mutex)

                  wb_wait_for_completion (Wait wb work)

    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Zhihao Cheng
     
  • commit 40a8f0d5e7b3999f096570edab71c345da812e3e upstream.

    'whiteout_ui->data' will be freed twice if the space budget fails for
    a rename whiteout operation, as in the following process:

    rename_whiteout
      dev = kmalloc
      whiteout_ui->data = dev
      kfree(whiteout_ui->data) // Free first time
      iput(whiteout)
        ubifs_free_inode
          kfree(ui->data) // Double free!

    KASAN reports:
    ==================================================================
    BUG: KASAN: double-free or invalid-free in ubifs_free_inode+0x4f/0x70
    Call Trace:
    kfree+0x117/0x490
    ubifs_free_inode+0x4f/0x70 [ubifs]
    i_callback+0x30/0x60
    rcu_do_batch+0x366/0xac0
    __do_softirq+0x133/0x57f

    Allocated by task 1506:
    kmem_cache_alloc_trace+0x3c2/0x7a0
    do_rename+0x9b7/0x1150 [ubifs]
    ubifs_rename+0x106/0x1f0 [ubifs]
    do_syscall_64+0x35/0x80

    Freed by task 1506:
    kfree+0x117/0x490
    do_rename.cold+0x53/0x8a [ubifs]
    ubifs_rename+0x106/0x1f0 [ubifs]
    do_syscall_64+0x35/0x80

    The buggy address belongs to the object at ffff88810238bed8 which
    belongs to the cache kmalloc-8 of size 8
    ==================================================================

    Let ubifs_free_inode() free 'whiteout_ui->data'. BTW, delete the unused
    assignment 'whiteout_ui->data_len = 0'; the process 'ubifs_evict_inode()
    -> ubifs_jnl_delete_inode() -> ubifs_jnl_write_inode()' doesn't need it
    (because 'inc_nlink(whiteout)' won't be executed by 'goto out_release',
    and the nlink of whiteout inode is 0).

    Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
    Signed-off-by: Zhihao Cheng
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Zhihao Cheng
     

27 Jan, 2022

1 commit

  • commit 3fea4d9d160186617ff40490ae01f4f4f36b28ff upstream.

    It seems freeing the write buffers in the error path of
    ubifs_remount_rw() is wrong. It later leads to a kernel oops like this:

    [10016.431274] UBIFS (ubi0:0): start fixing up free space
    [10090.810042] UBIFS (ubi0:0): free space fixup complete
    [10090.814623] UBIFS error (ubi0:0 pid 512): ubifs_remount_fs: cannot
    spawn "ubifs_bgt0_0", error -4
    [10101.915108] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started,
    PID 517
    [10105.275498] Unable to handle kernel NULL pointer dereference at
    virtual address 0000000000000030
    [10105.284352] Mem abort info:
    [10105.287160] ESR = 0x96000006
    [10105.290252] EC = 0x25: DABT (current EL), IL = 32 bits
    [10105.295592] SET = 0, FnV = 0
    [10105.298652] EA = 0, S1PTW = 0
    [10105.301848] Data abort info:
    [10105.304723] ISV = 0, ISS = 0x00000006
    [10105.308573] CM = 0, WnR = 0
    [10105.311564] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000f03d1000
    [10105.318034] [0000000000000030] pgd=00000000f6cee003,
    pud=00000000f4884003, pmd=0000000000000000
    [10105.326783] Internal error: Oops: 96000006 [#1] PREEMPT SMP
    [10105.332355] Modules linked in: ath10k_pci ath10k_core ath mac80211
    libarc4 cfg80211 nvme nvme_core cryptodev(O)
    [10105.342468] CPU: 3 PID: 518 Comm: touch Tainted: G O
    5.4.3 #1
    [10105.349517] Hardware name: HYPEX CPU (DT)
    [10105.353525] pstate: 40000005 (nZcv daif -PAN -UAO)
    [10105.358324] pc : atomic64_try_cmpxchg_acquire.constprop.22+0x8/0x34
    [10105.364596] lr : mutex_lock+0x1c/0x34
    [10105.368253] sp : ffff000075633aa0
    [10105.371563] x29: ffff000075633aa0 x28: 0000000000000001
    [10105.376874] x27: ffff000076fa80c8 x26: 0000000000000004
    [10105.382185] x25: 0000000000000030 x24: 0000000000000000
    [10105.387495] x23: 0000000000000000 x22: 0000000000000038
    [10105.392807] x21: 000000000000000c x20: ffff000076fa80c8
    [10105.398119] x19: ffff000076fa8000 x18: 0000000000000000
    [10105.403429] x17: 0000000000000000 x16: 0000000000000000
    [10105.408741] x15: 0000000000000000 x14: fefefefefefefeff
    [10105.414052] x13: 0000000000000000 x12: 0000000000000fe0
    [10105.419364] x11: 0000000000000fe0 x10: ffff000076709020
    [10105.424675] x9 : 0000000000000000 x8 : 00000000000000a0
    [10105.429986] x7 : ffff000076fa80f4 x6 : 0000000000000030
    [10105.435297] x5 : 0000000000000000 x4 : 0000000000000000
    [10105.440609] x3 : 0000000000000000 x2 : ffff00006f276040
    [10105.445920] x1 : ffff000075633ab8 x0 : 0000000000000030
    [10105.451232] Call trace:
    [10105.453676] atomic64_try_cmpxchg_acquire.constprop.22+0x8/0x34
    [10105.459600] ubifs_garbage_collect+0xb4/0x334
    [10105.463956] ubifs_budget_space+0x398/0x458
    [10105.468139] ubifs_create+0x50/0x180
    [10105.471712] path_openat+0x6a0/0x9b0
    [10105.475284] do_filp_open+0x34/0x7c
    [10105.478771] do_sys_open+0x78/0xe4
    [10105.482170] __arm64_sys_openat+0x1c/0x24
    [10105.486180] el0_svc_handler+0x84/0xc8
    [10105.489928] el0_svc+0x8/0xc
    [10105.492808] Code: 52800013 17fffffb d2800003 f9800011 (c85ffc05)
    [10105.498903] ---[ end trace 46b721d93267a586 ]---

    To reproduce the problem:

    1. Filesystem initially mounted read-only, free space fixup flag set.

    2. mount -o remount,rw

    3. it takes some time (free space fixup running)
    ... try to terminate running mount by CTRL-C
    ... does not respond, only after free space fixup is complete
    ... then "ubifs_remount_fs: cannot spawn "ubifs_bgt0_0", error -4"

    4. mount -o remount,rw
    ... now finished instantly (fixup already done).

    5. Create file or just unmount the filesystem and we get the oops.

    Cc:
    Fixes: b50b9f408502 ("UBIFS: do not free write-buffers when in R/O mode")
    Signed-off-by: Petr Cvachoucek
    Signed-off-by: Richard Weinberger
    Signed-off-by: Greg Kroah-Hartman

    Petr Cvachoucek
     

26 Jul, 2021

1 commit

  • The stat() family of syscalls report the wrong size for encrypted
    symlinks, which has caused breakage in several userspace programs.

    Fix this by calling fscrypt_symlink_getattr() after ubifs_getattr() for
    encrypted symlinks. This function computes the correct size by reading
    and decrypting the symlink target (if it's not already cached).

    For more details, see the commit which added fscrypt_symlink_getattr().

    Fixes: ca7f85be8d6c ("ubifs: Add support for encrypted symlinks")
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210702065350.209646-5-ebiggers@kernel.org
    Signed-off-by: Eric Biggers

    Eric Biggers
     

22 Jun, 2021

2 commits

  • xfstests-generic/476 reports a warning message as below:

    WARNING: CPU: 2 PID: 30347 at fs/inode.c:361 inc_nlink+0x52/0x70
    Call Trace:
    do_rename+0x502/0xd40 [ubifs]
    ubifs_rename+0x8b/0x180 [ubifs]
    vfs_rename+0x476/0x1080
    do_renameat2+0x67c/0x7b0
    __x64_sys_renameat2+0x6e/0x90
    do_syscall_64+0x66/0xe0
    entry_SYSCALL_64_after_hwframe+0x44/0xae

    Following race case can cause this:
    rename_whiteout(Thread 1)                   wb_workfn(Thread 2)
    ubifs_rename
      do_rename
                                                __writeback_single_inode
                                                  spin_lock(&inode->i_lock)
        whiteout->i_state |= I_LINKABLE
                                                  inode->i_state &= ~dirty;
    ---- How race happens on i_state:
        (tmp = whiteout->i_state | I_LINKABLE)
                                                  (tmp = inode->i_state & ~dirty)
        (whiteout->i_state = tmp)
                                                  (inode->i_state = tmp)
    ----
                                                  spin_unlock(&inode->i_lock)
        inc_nlink(whiteout)
        WARN_ON(!(inode->i_state & I_LINKABLE)) !!!

    Fix it by adding i_lock protection to avoid the i_state update race
    condition.
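
    The lost update on i_state can be reproduced without threads by
    writing the interleaving out by hand. This is a sketch only: the
    TOY_I_* bit values and the toy_* function names are hypothetical and
    do not match the kernel's definitions.

```c
#include <assert.h>

#define TOY_I_DIRTY    0x1u  /* illustrative bit, not the kernel value */
#define TOY_I_LINKABLE 0x2u  /* illustrative bit, not the kernel value */

/* Interleaving from the diagram: both sides load first, then both store. */
static unsigned int toy_unlocked_update(unsigned int i_state)
{
	unsigned int tmp_rename = i_state | TOY_I_LINKABLE;  /* rename: load, modify */
	unsigned int tmp_wb = i_state & ~TOY_I_DIRTY;        /* writeback: load, modify */

	i_state = tmp_rename;  /* rename stores its result */
	i_state = tmp_wb;      /* writeback stores and overwrites I_LINKABLE */
	return i_state;
}

/* With both read-modify-write sequences under one lock, they serialize. */
static unsigned int toy_locked_update(unsigned int i_state)
{
	i_state |= TOY_I_LINKABLE;  /* rename, complete before the next step */
	i_state &= ~TOY_I_DIRTY;    /* writeback sees the rename's result */
	return i_state;
}
```

    Starting from a dirty inode, the unlocked interleaving ends with the
    linkable bit cleared, which is the state the WARN_ON in inc_nlink()
    complains about; the serialized version keeps it set.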

    Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
    Signed-off-by: Zhihao Cheng
    Signed-off-by: Richard Weinberger

    Zhihao Cheng
     
  • Fix some spelling mistakes in comments:
    withoug ==> without
    numer ==> number
    aswell ==> as well
    referes ==> refers
    childs ==> children
    unnecesarry ==> unnecessary

    Signed-off-by: Zheng Yongjun
    Reviewed-by: Alexander Dahl
    Signed-off-by: Richard Weinberger

    Zheng Yongjun
     

19 Jun, 2021

4 commits

  • Since ubifs_xattr_get and ubifs_xattr_set can no longer be executed in
    parallel after the introduction of @host_ui->xattr_sem, we can now
    remove the ui_mutex usage introduced by commit ab92a20bce3b4c2 ("ubifs:
    make ubifs_[get|set]xattr atomic").

    @xattr_size, @xattr_names and @xattr_cnt can't be taken out of the
    protection of @host_ui->mutex yet; they are still accessed in other
    places, such as pack_inode() called by ubifs_write_inode() triggered
    by page-writeback.

    Signed-off-by: Zhihao Cheng
    Signed-off-by: Richard Weinberger

    Zhihao Cheng
     
  • UBIFS may run into problems with concurrent xattr_{set|get} and
    listxattr operations, such as assertion failure, memory corruption,
    stale xattr value[1].

    Fix it by introducing a new rw-lock in @ubifs_inode to serialize write
    operations on xattr; concurrent read operations are still allowed,
    just like ext4.

    [1] https://lore.kernel.org/linux-mtd/20200630130438.141649-1-houtao1@huawei.com

    Fixes: 1e51764a3c2ac05a23 ("UBIFS: add new flash file system")
    Cc: stable@vger.kernel.org # v2.6+
    Signed-off-by: Zhihao Cheng
    Reviewed-by: Sascha Hauer
    Signed-off-by: Richard Weinberger

    Zhihao Cheng
     
  • The snprintf() function returns the number of characters (not
    counting the NUL terminator) that it would have printed if we
    had space.

    This buffer has UBIFS_DFS_DIR_LEN characters plus one extra for
    the terminator. Printing UBIFS_DFS_DIR_LEN is okay but anything
    higher will result in truncation. Thus the comparison needs to be
    changed from == to >.

    These strings are compile time constants so this patch doesn't
    affect runtime.
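
    The return-value rule can be demonstrated in plain userspace C. The
    sketch below assumes a made-up constant TOY_DIR_LEN as a stand-in for
    UBIFS_DFS_DIR_LEN and a made-up helper toy_format_dir_name(); the
    format string is purely illustrative.

```c
#include <assert.h>
#include <stdio.h>

enum { TOY_DIR_LEN = 8 };  /* hypothetical stand-in for UBIFS_DFS_DIR_LEN */

/*
 * Formats a name into a buffer of TOY_DIR_LEN + 1 bytes (the extra byte
 * is for the NUL terminator). Returns 0 on success, -1 if the name did
 * not fit or formatting failed.
 */
static int toy_format_dir_name(char *out, int ubi, int vol)
{
	int n;

	assert(out != NULL);
	n = snprintf(out, TOY_DIR_LEN + 1, "ubi%d_%d", ubi, vol);
	if (n < 0)
		return -1;
	/* n == TOY_DIR_LEN still fits; only n > TOY_DIR_LEN was truncated. */
	return n > TOY_DIR_LEN ? -1 : 0;
}
```

    A name of exactly TOY_DIR_LEN characters such as "ubi10_10" is
    accepted, whereas a check written with == would have wrongly
    rejected it and let longer, truncated names through.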

    Fixes: ae380ce04731 ("UBIFS: lessen the size of debugging info data structure")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Alexander Dahl
    Signed-off-by: Richard Weinberger

    Dan Carpenter
     
  • Fix to return a negative error code from the error handling case instead
    of 0, as done elsewhere in this function.

    Fixes: 9ca2d7326444 ("ubifs: Limit number of xattrs per inode")
    Reported-by: Hulk Robot
    Signed-off-by: Zhen Lei
    Signed-off-by: Richard Weinberger

    Zhen Lei
     

05 May, 2021

1 commit

  • Pull JFFS2, UBI and UBIFS updates from Richard Weinberger:
    "JFFS2:
    - Use splice_write()
    - Fix for a slab-out-of-bounds bug

    UBI:
    - Fix for clang related warnings
    - Code cleanup

    UBIFS:
    - Fix for inode rebirth at replay
    - Set s_uuid
    - Use zstd for default filesystem"

    * tag 'for-linus-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
    ubi: Remove unnecessary struct declaration
    jffs2: Hook up splice_write callback
    jffs2: avoid Wempty-body warnings
    jffs2: Fix kasan slab-out-of-bounds problem
    ubi: Fix fall-through warnings for Clang
    ubifs: Report max LEB count at mount time
    ubifs: Set s_uuid in super block to support ima/evm uuid options
    ubifs: Default to zstd compression
    ubifs: Only check replay with inode type to judge if inode linked

    Linus Torvalds
     

28 Apr, 2021

1 commit

  • Pull fileattr conversion updates from Miklos Szeredi via Al Viro:
    "This splits the handling of FS_IOC_[GS]ETFLAGS from ->ioctl() into a
    separate method.

    The interface is reasonably uniform across the filesystems that
    support it and gives nice boilerplate removal"

    * 'miklos.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
    ovl: remove unneeded ioctls
    fuse: convert to fileattr
    fuse: add internal open/release helpers
    fuse: unsigned open flags
    fuse: move ioctl to separate source file
    vfs: remove unused ioctl helpers
    ubifs: convert to fileattr
    reiserfs: convert to fileattr
    ocfs2: convert to fileattr
    nilfs2: convert to fileattr
    jfs: convert to fileattr
    hfsplus: convert to fileattr
    efivars: convert to fileattr
    xfs: convert to fileattr
    orangefs: convert to fileattr
    gfs2: convert to fileattr
    f2fs: convert to fileattr
    ext4: convert to fileattr
    ext2: convert to fileattr
    btrfs: convert to fileattr
    ...

    Linus Torvalds
     

16 Apr, 2021

4 commits

  • There is no other way to directly report/query this
    quantity. It is useful when planning how a given filesystem
    can be resized.

    Signed-off-by: Martin Devera
    Signed-off-by: Richard Weinberger

    Martin Devera
     
  • This is required to provide uuid based integrity functionality for:
    ima_policy (fsuuid option) and the 'evmctl' command ('--uuid' option).

    Co-developed-by: Oleksij Rempel
    Co-developed-by: Juergen Borleis
    Signed-off-by: Steffen Trumtrar
    Reviewed-by: Andy Shevchenko
    Signed-off-by: Richard Weinberger

    Steffen Trumtrar
     
  • Compared to lzo and zlib, zstd is the best all-around performer, both in terms
    of speed and compression ratio. Set it as the default, if available.

    Signed-off-by: Rui Salvaterra
    Signed-off-by: Richard Weinberger

    Rui Salvaterra
     
  • Consider the following case: write a big file into flash, delete the
    file once writing completes, and then power off promptly.
    At the next power-on, we'll get a replay list like:
    ...
    LEB 1105:211344 len 4144 deletion 0 sqnum 428783 key type 1 inode 80
    LEB 15:233544 len 160 deletion 1 sqnum 428785 key type 0 inode 80
    LEB 1105:215488 len 4144 deletion 0 sqnum 428787 key type 1 inode 80
    ...
    In the replay list, the data nodes' deletion is 0, and the inode node's
    deletion is 1. With the current logic, the file's dentry is removed,
    but the inode and the flash space it occupies are kept.
    The user will see that much free space has disappeared.

    We only need to check the deletion value of the following inode type
    node of the replay entry.
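
    The idea of letting only inode-type entries decide can be sketched in
    userspace as follows. Everything here is hypothetical: struct
    toy_replay_entry, the enum and toy_inode_still_linked() are invented
    for this example and are not the kernel's replay structures. Returning
    true when no inode-type entry exists is a choice made for this toy
    only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Key types as printed in the replay list above: 0 = inode, 1 = data. */
enum toy_key_type { TOY_INO_KEY = 0, TOY_DATA_KEY = 1 };

struct toy_replay_entry {
	unsigned long long sqnum;
	unsigned int inum;
	enum toy_key_type type;
	bool deletion;
};

/*
 * Entries are ordered by ascending sqnum. Walk from newest to oldest and
 * let the most recent inode-type entry for inum decide; data entries are
 * ignored even if they are newer.
 */
static bool toy_inode_still_linked(const struct toy_replay_entry *list,
				   size_t n, unsigned int inum)
{
	assert(list != NULL || n == 0);
	for (size_t i = n; i-- > 0; ) {
		if (list[i].inum == inum && list[i].type == TOY_INO_KEY)
			return !list[i].deletion;
	}
	return true;  /* toy choice: no inode entry means "not deleted" */
}
```

    For the three entries shown in the replay list, the newest entry is a
    data node with deletion 0, but the inode node with deletion 1 is the
    one consulted, so inode 80 is reported as no longer linked.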

    Fixes: e58725d51fa8 ("ubifs: Handle re-linking of inodes correctly while recovery")
    Cc: stable@vger.kernel.org
    Signed-off-by: Guochun Mao
    Signed-off-by: Richard Weinberger

    Guochun Mao
     

12 Apr, 2021

1 commit


09 Apr, 2021

1 commit

  • list_sort() internally casts the comparison function passed to it
    to a different type with constant struct list_head pointers, and
    uses this pointer to call the functions, which trips indirect call
    Control-Flow Integrity (CFI) checking.

    Instead of removing the consts, this change defines the
    list_cmp_func_t type and changes the comparison function types of
    all list_sort() callers to use const pointers, thus avoiding type
    mismatches.
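
    A userspace sketch of the approach: declare one comparator typedef
    with const node pointers and give every comparison function exactly
    that signature, so the sorter can call it without casting the function
    pointer. All toy_* names below are hypothetical and only mirror the
    shape of the kernel types.

```c
#include <assert.h>
#include <stddef.h>

struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* Comparator type with const node pointers, in the spirit of list_cmp_func_t. */
typedef int (*toy_list_cmp_func_t)(void *priv,
				   const struct toy_list_head *a,
				   const struct toy_list_head *b);

struct toy_item {
	int key;
	struct toy_list_head node;
};

/* const-preserving container_of for this one struct. */
static const struct toy_item *toy_item_of(const struct toy_list_head *n)
{
	return (const struct toy_item *)
		((const char *)n - offsetof(struct toy_item, node));
}

/* Declared with exactly the typedef's parameter types: no cast needed. */
static int toy_item_cmp(void *priv, const struct toy_list_head *a,
			const struct toy_list_head *b)
{
	int ka = toy_item_of(a)->key;
	int kb = toy_item_of(b)->key;

	(void)priv;
	return (ka > kb) - (ka < kb);
}

/* Calls through the typed pointer, the way a sorting routine would. */
static int toy_compare_via(toy_list_cmp_func_t cmp,
			   const struct toy_list_head *a,
			   const struct toy_list_head *b)
{
	assert(cmp != NULL);
	return cmp(NULL, a, b);
}
```

    Because the function's declared type and the type used at the
    indirect call site are identical, a type-based indirect-call check
    has nothing to object to.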

    Suggested-by: Nick Desaulniers
    Signed-off-by: Sami Tolvanen
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Kees Cook
    Tested-by: Nick Desaulniers
    Tested-by: Nathan Chancellor
    Signed-off-by: Kees Cook
    Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com

    Sami Tolvanen
     

24 Feb, 2021

1 commit

  • Pull idmapped mounts from Christian Brauner:
    "This introduces idmapped mounts which has been in the making for some
    time. Simply put, different mounts can expose the same file or
    directory with different ownership. This initial implementation comes
    with ports for fat, ext4 and with Christoph's port for xfs with more
    filesystems being actively worked on by independent people and
    maintainers.

    Idmapping mounts handle a wide range of long standing use-cases. Here
    are just a few:

    - Idmapped mounts make it possible to easily share files between
    multiple users or multiple machines especially in complex
    scenarios. For example, idmapped mounts will be used in the
    implementation of portable home directories in
    systemd-homed.service(8) where they allow users to move their home
    directory to an external storage device and use it on multiple
    computers where they are assigned different uids and gids. This
    effectively makes it possible to assign random uids and gids at
    login time.

    - It is possible to share files from the host with unprivileged
    containers without having to change ownership permanently through
    chown(2).

    - It is possible to idmap a container's rootfs without having to
    mangle every file. For example, Chromebooks use it to share the
    user's Download folder with their unprivileged containers in their
    Linux subsystem.

    - It is possible to share files between containers with
    non-overlapping idmappings.

    - Filesystems that lack a proper concept of ownership such as fat can
    use idmapped mounts to implement discretionary access (DAC)
    permission checking.

    - They allow users to efficiently change ownership on a per-mount
    basis without having to (recursively) chown(2) all files. In
    contrast to chown(2), changing ownership of large sets of files is
    instantaneous with idmapped mounts. This is especially useful when
    ownership of a whole root filesystem of a virtual machine or
    container is changed. With idmapped mounts a single
    mount_setattr() syscall will be sufficient to change the ownership of
    all files.

    - Idmapped mounts always take the current ownership into account as
    idmappings specify what a given uid or gid is supposed to be mapped
    to. This contrasts with the chown(2) syscall which cannot by itself
    take the current ownership of the files it changes into account. It
    simply changes the ownership to the specified uid and gid. This is
    especially problematic when recursively chown(2)ing a large set of
    files which is common with the aforementioned portable home
    directory and container and vm scenarios.

    - Idmapped mounts allow changing ownership locally, restricting it
    to specific mounts, and temporarily as the ownership changes only
    apply as long as the mount exists.

    Several userspace projects have either already put up patches and
    pull-requests for this feature or will do so should you decide to pull
    this:

    - systemd: In a wide variety of scenarios but especially right away
    in their implementation of portable home directories.

    https://systemd.io/HOME_DIRECTORY/

    - container runtimes: containerd, runC, LXD: To share data between
    host and unprivileged containers, unprivileged and privileged
    containers, etc. The pull request for idmapped mounts support in
    containerd, the default Kubernetes runtime, has been up for quite
    a while now: https://github.com/containerd/containerd/pull/4734

    - The virtio-fs developers and several users have expressed interest
    in using this feature with virtual machines once virtio-fs is
    ported.

    - ChromeOS: Sharing host-directories with unprivileged containers.

    I've tightly synced with all those projects and all of those listed
    here have also expressed their need/desire for this feature on the
    mailing list. For more info on how people use this there's a bunch of
    talks about this too. Here's just two recent ones:

    https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
    https://fosdem.org/2021/schedule/event/containers_idmap/

    This comes with an extensive xfstests suite covering both ext4 and
    xfs:

    https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

    It covers truncation, creation, opening, xattrs, vfscaps, setid
    execution, setgid inheritance and more both with idmapped and
    non-idmapped mounts. It already helped to discover an unrelated xfs
    setgid inheritance bug which has since been fixed in mainline. It will
    be sent for inclusion with the xfstests project should you decide to
    merge this.

    In order to support per-mount idmappings vfsmounts are marked with
    user namespaces. The idmapping of the user namespace will be used to
    map the ids of vfs objects when they are accessed through that mount.
    By default all vfsmounts are marked with the initial user namespace.
    The initial user namespace is used to indicate that a mount is not
    idmapped. All operations behave as before and this is verified in the
    testsuite.

    Based on prior discussions we want to attach the whole user namespace
    and not just a dedicated idmapping struct. This allows us to reuse all
    the helpers that already exist for dealing with idmappings instead of
    introducing a whole new range of helpers. In addition, if we decide in
    the future that we are confident enough to enable unprivileged users
    to setup idmapped mounts the permission checking can take into account
    whether the caller is privileged in the user namespace the mount is
    currently marked with.

    The user namespace the mount will be marked with can be specified by
    passing a file descriptor referring to the user namespace as an
    argument to the new mount_setattr() syscall together with the new
    MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
    of extensibility.

    The following conditions must be met in order to create an idmapped
    mount:

    - The caller must currently have the CAP_SYS_ADMIN capability in the
    user namespace the underlying filesystem has been mounted in.

    - The underlying filesystem must support idmapped mounts.

    - The mount must not already be idmapped. This also implies that the
    idmapping of a mount cannot be altered once it has been idmapped.

    - The mount must be a detached/anonymous mount, i.e. it must have
    been created by calling open_tree() with the OPEN_TREE_CLONE flag
    and it must not already have been visible in the filesystem.

    The last two points guarantee easier semantics for userspace and the
    kernel and make the implementation significantly simpler.

    By default vfsmounts are marked with the initial user namespace and no
    behavioral or performance changes are observed.

    The manpage with a detailed description can be found here:

    https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8

    In order to support idmapped mounts, filesystems need to be changed
    and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
    patches to convert individual filesystems are not very large or
    complicated overall as can be seen from the included fat, ext4, and
    xfs ports. Patches for other filesystems are actively worked on and
    will be sent out separately. The xfstests suite can be used to verify
    that a port has been done correctly.

    The mount_setattr() syscall is motivated independently of the idmapped
    mounts patches and it's been around since July 2019. One of the most
    valuable features of the new mount api is the ability to perform
    mounts based on file descriptors only.

    Together with the lookup restrictions available in the openat2()
    RESOLVE_* flag namespace which we added in v5.6 this is the first time
    we are close to hardened and race-free (e.g. symlinks) mounting and
    path resolution.

    While userspace has started porting to the new mount api to mount
    proper filesystems and create new bind-mounts it is currently not
    possible to change mount options of an already existing bind mount in
    the new mount api since the mount_setattr() syscall is missing.

    With the addition of the mount_setattr() syscall we remove this last
    restriction and userspace can now fully port to the new mount api,
    covering every use-case the old mount api could. We also add the
    crucial ability to recursively change mount options for a whole mount
    tree, both removing and adding mount options at the same time. This
    syscall has been requested multiple times by various people and
    projects.

    There is a simple tool available at

    https://github.com/brauner/mount-idmapped

    that allows creating idmapped mounts so people can play with this
    patch series. I'll add support for the regular mount binary in the
    following weeks should you decide to pull this.

    Here's an example of a simple idmapped mount of another user's home
    directory:

    u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

    u1001@f2-vm:/$ ls -al /home/ubuntu/
    total 28
    drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
    drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
    -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
    -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
    -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
    -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
    -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
    -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

    u1001@f2-vm:/$ ls -al /mnt/
    total 28
    drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
    drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
    -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
    -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
    -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
    -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
    -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
    -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo

    u1001@f2-vm:/$ touch /mnt/my-file

    u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

    u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

    u1001@f2-vm:/$ ls -al /mnt/my-file
    -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

    u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
    -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

    u1001@f2-vm:/$ getfacl /mnt/my-file
    getfacl: Removing leading '/' from absolute path names
    # file: mnt/my-file
    # owner: u1001
    # group: u1001
    user::rw-
    user:u1001:rwx
    group::rw-
    mask::rwx
    other::r--

    u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
    getfacl: Removing leading '/' from absolute path names
    # file: home/ubuntu/my-file
    # owner: ubuntu
    # group: ubuntu
    user::rw-
    user:ubuntu:rwx
    group::rw-
    mask::rwx
    other::r--"

    * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
    xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
    xfs: support idmapped mounts
    ext4: support idmapped mounts
    fat: handle idmapped mounts
    tests: add mount_setattr() selftests
    fs: introduce MOUNT_ATTR_IDMAP
    fs: add mount_setattr()
    fs: add attr_flags_to_mnt_flags helper
    fs: split out functions to hold writers
    namespace: only take read lock in do_reconfigure_mnt()
    mount: make {lock,unlock}_mount_hash() static
    namespace: take lock_mount_hash() directly when changing flags
    nfs: do not export idmapped mounts
    overlayfs: do not mount on top of idmapped mounts
    ecryptfs: do not mount on top of idmapped mounts
    ima: handle idmapped mounts
    apparmor: handle idmapped mounts
    fs: make helpers idmap mount aware
    exec: handle idmapped mounts
    would_dump: handle idmapped mounts
    ...

    Linus Torvalds
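
The pull message above describes marking a detached mount with a user namespace through mount_setattr() and MOUNT_ATTR_IDMAP. A minimal sketch of that single step follows, assuming the caller already holds a detached mount fd (from open_tree() with OPEN_TREE_CLONE) and a user namespace fd. The `sk_` names are local stand-ins introduced here so the example does not depend on recent headers; the constant values are copied from the kernel UAPI headers and should be checked against the headers in use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copies of UAPI values (assumption: they match linux/mount.h and
 * linux/fcntl.h on the target system). */
#define SK_MOUNT_ATTR_IDMAP 0x00100000
#define SK_AT_EMPTY_PATH    0x1000

/* Mirrors the layout of struct mount_attr from linux/mount.h. */
struct sk_mount_attr {
    uint64_t attr_set;
    uint64_t attr_clr;
    uint64_t propagation;
    uint64_t userns_fd;
};

/* Build the attribute block that requests an idmapping from userns_fd. */
struct sk_mount_attr idmap_attr(int userns_fd)
{
    struct sk_mount_attr attr = { 0 };

    attr.attr_set = SK_MOUNT_ATTR_IDMAP;
    attr.userns_fd = (uint64_t)userns_fd;
    return attr;
}

/* Apply the idmapping to the detached mount referred to by tree_fd.
 * Returns 0 on success, -1 with errno set on failure. */
int idmap_detached_mount(int tree_fd, int userns_fd)
{
#ifdef SYS_mount_setattr
    struct sk_mount_attr attr = idmap_attr(userns_fd);

    return (int)syscall(SYS_mount_setattr, tree_fd, "", SK_AT_EMPTY_PATH,
                        &attr, sizeof(attr));
#else
    (void)tree_fd;
    (void)userns_fd;
    errno = ENOSYS; /* headers predate the syscall */
    return -1;
#endif
}
```

After a successful call, the mount would still have to be attached with move_mount() to become visible in the filesystem. The call is expected to fail unless all the conditions listed in the pull message are met.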
     

14 Feb, 2021

1 commit

  • Fix to return PTR_ERR() error code from the error handling case instead
    of 0 in function alloc_wbufs(), as done elsewhere in this function.

    Fixes: 6a98bc4614de ("ubifs: Add authentication nodes to journal")
    Signed-off-by: Wang ShaoBo
    Reviewed-by: Sascha Hauer
    Signed-off-by: Richard Weinberger

    Wang ShaoBo
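
The class of bug fixed above can be illustrated with a small userspace sketch. The helpers `err_ptr()`, `ptr_err()` and `is_err()` are stand-ins written here to mimic the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(); `alloc_buf()` and `setup()` are hypothetical and are not the ubifs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, as the kernel helpers do. */
static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that reports failure through an error pointer. */
static void *alloc_buf(int fail)
{
    return fail ? err_ptr(-ENOMEM) : malloc(16);
}

/* Returns 0 on success or a negative errno on failure. */
static int setup(int fail)
{
    int err = 0;
    void *buf = alloc_buf(fail);

    if (is_err(buf)) {
        /* Without this assignment err would still be 0 and the caller
         * would treat the failure as success. */
        err = (int)ptr_err(buf);
        goto out;
    }
    free(buf);
out:
    return err;
}
```

With this shape, `setup(1)` propagates `-ENOMEM` to the caller instead of silently returning 0.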
     

13 Feb, 2021

3 commits

  • An inode is allowed to have ubifs_xattr_max_cnt() xattrs, so we must
    complain only when an inode has more xattrs, having exactly
    ubifs_xattr_max_cnt() xattrs is fine.
    With this the maximum number of xattrs can be created without hitting
    the "has too many xattrs" warning when removing it.

    Signed-off-by: Sascha Hauer
    Signed-off-by: Richard Weinberger

    Sascha Hauer
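
The off-by-one described above can be sketched as follows. This is an illustration only: `xattr_max_cnt()` is a hypothetical stand-in for ubifs_xattr_max_cnt() with an arbitrary limit, and the assumption is that the old check used `>=` where `>` was intended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit standing in for ubifs_xattr_max_cnt(). */
static int xattr_max_cnt(void)
{
    return 4;
}

/* Assumed old behaviour: complains even when the count equals the limit. */
static bool too_many_xattrs_old(int xattr_cnt)
{
    return xattr_cnt >= xattr_max_cnt();
}

/* Fixed behaviour: only complain when the limit is actually exceeded. */
static bool too_many_xattrs(int xattr_cnt)
{
    return xattr_cnt > xattr_max_cnt();
}
```

An inode holding exactly the maximum number of xattrs passes the fixed check but trips the old one.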
     
  • An earlier commit moved out some functions to not be inlined by gcc, but
    after some other rework to remove one of those, clang started inlining
    the other one and ran into the same problem as gcc did before:

    fs/ubifs/replay.c:1174:5: error: stack frame size of 1152 bytes in function 'ubifs_replay_journal' [-Werror,-Wframe-larger-than=]

    Mark the function as noinline_for_stack to ensure it doesn't happen
    again.

    Fixes: f80df3851246 ("ubifs: use crypto_shash_tfm_digest()")
    Fixes: eb66eff6636d ("ubifs: replay: Fix high stack usage")
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Nathan Chancellor
    Signed-off-by: Richard Weinberger

    Arnd Bergmann
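
A rough userspace sketch of the idea behind the fix above: a helper with a large on-stack buffer is kept out of line so its frame is not merged into the caller's. The macro below approximates the kernel's noinline_for_stack with the GCC/Clang noinline attribute; the function names and buffer size are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Approximation of the kernel annotation for GCC and Clang. */
#define noinline_for_stack __attribute__((noinline))

/* Helper with a large local buffer. Kept out of line so that the buffer
 * only occupies stack space while the helper is running. */
static noinline_for_stack int sum_with_big_buffer(const char *s)
{
    unsigned char scratch[1024];
    size_t n = strlen(s);
    int sum = 0;

    if (n > sizeof(scratch))
        n = sizeof(scratch);
    memcpy(scratch, s, n);
    for (size_t i = 0; i < n; i++)
        sum += scratch[i];
    return sum;
}

/* Caller: if the helper were inlined twice here, the compiler could
 * account both buffers to this function's frame. */
static int replay(const char *a, const char *b)
{
    return sum_with_big_buffer(a) + sum_with_big_buffer(b);
}
```

The attribute does not change results; it only constrains where the stack usage is accounted, which is what a -Wframe-larger-than= warning measures.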
     
    When crypto_shash_digestsize() fails, c->hmac_tfm
    has not been freed before returning, which leads
    to a memory leak.

    Fixes: 49525e5eecca5 ("ubifs: Add helper functions for authentication support")
    Signed-off-by: Dinghao Liu
    Reviewed-by: Zhihao Cheng
    Signed-off-by: Richard Weinberger

    Dinghao Liu
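
The usual way to avoid this kind of leak is a chain of cleanup labels, so that every error path releases what was acquired before it. The sketch below is a userspace illustration under that assumption; `tfm_alloc()`, `tfm_free()` and `init_auth()` are hypothetical names, and the allocation counter exists only to make the leak observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_allocs; /* outstanding allocations, for demonstration */

static void *tfm_alloc(void)
{
    void *p = malloc(32);

    if (p)
        live_allocs++;
    return p;
}

static void tfm_free(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

/* digest_ok stands in for the digest-size check that may fail. */
static int init_auth(int digest_ok)
{
    void *hash_tfm, *hmac_tfm;
    int err = 0;

    hash_tfm = tfm_alloc();
    if (!hash_tfm)
        return -ENOMEM;

    hmac_tfm = tfm_alloc();
    if (!hmac_tfm) {
        err = -ENOMEM;
        goto out_free_hash;
    }

    if (!digest_ok) {
        err = -EINVAL;
        goto out_free_hmac; /* jumping to out_free_hash would leak hmac_tfm */
    }

    /* Real code would keep both handles on success; this demo releases
     * them so the counter returns to zero either way. */
out_free_hmac:
    tfm_free(hmac_tfm);
out_free_hash:
    tfm_free(hash_tfm);
    return err;
}
```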
     

24 Jan, 2021

5 commits

  • Extend some inode methods with an additional user namespace argument. A
    filesystem that is aware of idmapped mounts will receive the user
    namespace the mount has been marked with. This can be used for
    additional permission checking and also to enable filesystems to
    translate between uids and gids if they need to. We have implemented all
    relevant helpers in earlier patches.

    As requested we simply extend the existing inode methods instead of
    introducing new ones. This is a little more code churn but it's mostly
    mechanical and doesn't leave us with additional inode methods.

    Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • The generic_fillattr() helper fills in the basic attributes associated
    with an inode. Enable it to handle idmapped mounts. If the inode is
    accessed through an idmapped mount, map it into the mount's user
    namespace before we store the uid and gid. If the initial user namespace
    is passed nothing changes so non-idmapped mounts will see identical
    behavior as before.

    Link: https://lore.kernel.org/r/20210121131959.646623-12-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Reviewed-by: James Morris
    Signed-off-by: Christian Brauner

    Christian Brauner
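
The mapping step described above can be modelled in a few lines. This is a simplified sketch, not the kernel's helpers: a single extent stands in for the mount's idmapping, a NULL extent stands in for the initial user namespace (no idmapping), and all names are invented for the example. The extent values follow the `both:1000:1001:1` example from the idmapped mounts pull message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID UINT32_MAX /* marks an id with no mapping */

/* One contiguous range: ids fs_first..fs_first+count-1 as stored by the
 * filesystem appear as mnt_first..mnt_first+count-1 through the mount. */
struct id_extent {
    uint32_t fs_first;
    uint32_t mnt_first;
    uint32_t count;
};

/* Map an id stored by the filesystem to the id seen through the mount. */
static uint32_t map_into_mnt(const struct id_extent *e, uint32_t fs_id)
{
    if (!e)
        return fs_id; /* non-idmapped mount: nothing changes */
    if (fs_id >= e->fs_first && fs_id - e->fs_first < e->count)
        return e->mnt_first + (fs_id - e->fs_first);
    return INVALID_ID;
}

/* Reverse direction, e.g. when a new file is created through the mount. */
static uint32_t map_from_mnt(const struct id_extent *e, uint32_t mnt_id)
{
    if (!e)
        return mnt_id;
    if (mnt_id >= e->mnt_first && mnt_id - e->mnt_first < e->count)
        return e->fs_first + (mnt_id - e->mnt_first);
    return INVALID_ID;
}
```

With the extent `{1000, 1001, 1}`, a file stored with uid 1000 is reported as uid 1001 through the mount, while passing NULL leaves every id unchanged, matching the "identical behavior as before" guarantee for non-idmapped mounts.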
     
  • The posix acl permission checking helpers determine whether a caller is
    privileged over an inode according to the acls associated with the
    inode. Add helpers that make it possible to handle acls on idmapped
    mounts.

    The vfs and the filesystems targeted by this first iteration make use of
    posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
    translate basic posix access and default permissions such as the
    ACL_USER and ACL_GROUP type according to the initial user namespace (or
    the superblock's user namespace) to and from the caller's current user
    namespace. Adapt these two helpers to handle idmapped mounts whereby we
    either map from or into the mount's user namespace depending on in which
    direction we're translating.
    Similarly, cap_convert_nscap() is used by the vfs to translate user
    namespace and non-user namespace aware filesystem capabilities from the
    superblock's user namespace to the caller's user namespace. Enable it to
    handle idmapped mounts by accounting for the mount's user namespace.

    In addition the filesystems targeted in the first iteration of this patch
    series make use of the posix_acl_chmod() and posix_acl_update_mode()
    helpers. Both helpers perform permission checks on the target inode. Let
    them handle idmapped mounts. These two helpers are called when posix
    acls are set by the respective filesystems. To handle this case we extend
    the ->set() method to take an additional user namespace argument to pass
    the mount's user namespace down.

    Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • When file attributes are changed most filesystems rely on the
    setattr_prepare(), setattr_copy(), and notify_change() helpers for
    initialization and permission checking. Let them handle idmapped mounts.
    If the inode is accessed through an idmapped mount, map it into the
    mount's user namespace. Afterwards the checks are identical to
    non-idmapped mounts. If the initial user namespace is passed nothing
    changes so non-idmapped mounts will see identical behavior as before.

    Helpers that perform checks on the ia_uid and ia_gid fields in struct
    iattr assume that ia_uid and ia_gid are intended values and have already
    been mapped correctly at the userspace-kernelspace boundary as we
    already do today. If the initial user namespace is passed nothing
    changes so non-idmapped mounts will see identical behavior as before.

    Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Christian Brauner

    Christian Brauner
     
  • The inode_owner_or_capable() helper determines whether the caller is the
    owner of the inode or is capable with respect to that inode. Allow it to
    handle idmapped mounts. If the inode is accessed through an idmapped
    mount, map it according to the mount's user namespace. Afterwards the checks
    are identical to non-idmapped mounts. If the initial user namespace is
    passed nothing changes so non-idmapped mounts will see identical
    behavior as before.

    Similarly, allow the inode_init_owner() helper to handle idmapped
    mounts. It initializes a new inode on idmapped mounts by mapping the
    fsuid and fsgid of the caller from the mount's user namespace. If the
    initial user namespace is passed nothing changes so non-idmapped mounts
    will see identical behavior as before.

    Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig
    Cc: David Howells
    Cc: Al Viro
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Reviewed-by: James Morris
    Signed-off-by: Christian Brauner

    Christian Brauner
     

18 Dec, 2020

2 commits

  • Pull jffs2, ubi and ubifs updates from Richard Weinberger:
    "JFFS2:
    - Fix for a remount regression
    - Fix for an abnormal GC exit
    - Fix for a possible NULL pointer issue while mounting

    UBI:
    - Add support for ECC-ed NOR flash
    - Removal of dead code

    UBIFS:
    - Make node dumping debug code more reliable
    - Various cleanups: less ifdefs, less typos
    - Fix for an info leak"

    * tag 'for-linus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
    ubifs: ubifs_dump_node: Dump all branches of the index node
    ubifs: ubifs_dump_sleb: Remove unused function
    ubifs: Pass node length in all node dumping callers
    Revert "ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len"
    ubifs: Limit dumping length by size of memory which is allocated for the node
    ubifs: Remove the redundant return in dbg_check_nondata_nodes_order
    jffs2: Fix NULL pointer dereference in rp_size fs option parsing
    ubifs: Fixed print foramt mismatch in ubifs
    ubi: Do not zero out EC and VID on ECC-ed NOR flashes
    jffs2: remove trailing semicolon in macro definition
    ubifs: Fix error return code in ubifs_init_authentication()
    ubifs: wbuf: Don't leak kernel memory to flash
    ubi: Remove useless code in bytes_str_to_int
    ubifs: Fix the printing type of c->big_lpt
    jffs2: Allow setting rp_size to zero during remounting
    jffs2: Fix ignoring mounting options problem during remounting
    jffs2: Fix GC exit abnormally
    ubifs: Code cleanup by removing ifdef macro surrounding
    jffs2: Fix if/else empty body warnings
    ubifs: Delete duplicated words + other fixes

    Linus Torvalds
     
  • Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've put more work into per-file compression support.

    For example, F2FS_IOC_GET | SET_COMPRESS_OPTION provides a way to
    change the algorithm or cluster size per file. F2FS_IOC_COMPRESS |
    DECOMPRESS_FILE provides a way to compress and decompress the existing
    normal files manually.

    There is also a new mount option, compress_mode=fs|user, which can
    control who compresses the data.

    Chao also added a checksum feature with a mount option so that
    we are able to detect any corrupted cluster.

    In addition, Daniel contributed a casefolding with encryption patch,
    which will be used for Android devices.

    Summary:

    Enhancements:
    - add ioctls and mount option to manage per-file compression feature
    - support casefolding with encryption
    - support checksum for compressed cluster
    - avoid IO starvation by replacing mutex with rwsem
    - add sysfs, max_io_bytes, to control max bio size

    Bug fixes:
    - fix use-after-free issue when compression and fsverity are enabled
    - fix consistency corruption during fault injection test
    - fix data offset for lseek
    - get rid of buffer_head which has 32bits limit in fiemap
    - fix some bugs in multi-partitions support
    - fix nat entry count calculation in shrinker
    - fix some stat information

    And, we've refactored some logic and fixed minor bugs as well"

    * tag 'f2fs-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
    f2fs: compress: fix compression chksum
    f2fs: fix shift-out-of-bounds in sanity_check_raw_super()
    f2fs: fix race of pending_pages in decompression
    f2fs: fix to account inline xattr correctly during recovery
    f2fs: inline: fix wrong inline inode stat
    f2fs: inline: correct comment in f2fs_recover_inline_data
    f2fs: don't check PAGE_SIZE again in sanity_check_raw_super()
    f2fs: convert to F2FS_*_INO macro
    f2fs: introduce max_io_bytes, a sysfs entry, to limit bio size
    f2fs: don't allow any writes on readonly mount
    f2fs: avoid race condition for shrinker count
    f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE
    f2fs: add compress_mode mount option
    f2fs: Remove unnecessary unlikely()
    f2fs: init dirty_secmap incorrectly
    f2fs: remove buffer_head which has 32bits limit
    f2fs: fix wrong block count instead of bytes
    f2fs: use new conversion functions between blks and bytes
    f2fs: rename logical_to_blk and blk_to_logical
    f2fs: fix kbytes written stat for multi-device case
    ...

    Linus Torvalds
     

15 Dec, 2020

1 commit

  • Pull crypto updates from Herbert Xu:
    "API:
    - Add speed testing on 1420-byte blocks for networking

    Algorithms:
    - Improve performance of chacha on ARM for network packets
    - Improve performance of aegis128 on ARM for network packets

    Drivers:
    - Add support for Keem Bay OCS AES/SM4
    - Add support for QAT 4xxx devices
    - Enable crypto-engine retry mechanism in caam
    - Enable support for crypto engine on sdm845 in qce
    - Add HiSilicon PRNG driver support"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (161 commits)
    crypto: qat - add capability detection logic in qat_4xxx
    crypto: qat - add AES-XTS support for QAT GEN4 devices
    crypto: qat - add AES-CTR support for QAT GEN4 devices
    crypto: atmel-i2c - select CONFIG_BITREVERSE
    crypto: hisilicon/trng - replace atomic_add_return()
    crypto: keembay - Add support for Keem Bay OCS AES/SM4
    dt-bindings: Add Keem Bay OCS AES bindings
    crypto: aegis128 - avoid spurious references crypto_aegis128_update_simd
    crypto: seed - remove trailing semicolon in macro definition
    crypto: x86/poly1305 - Use TEST %reg,%reg instead of CMP $0,%reg
    crypto: x86/sha512 - Use TEST %reg,%reg instead of CMP $0,%reg
    crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%reg
    crypto: cpt - Fix sparse warnings in cptpf
    hwrng: ks-sa - Add dependency on IOMEM and OF
    crypto: lib/blake2s - Move selftest prototype into header file
    crypto: arm/aes-ce - work around Cortex-A57/A72 silion errata
    crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()
    crypto: ccree - rework cache parameters handling
    crypto: cavium - Use dma_set_mask_and_coherent to simplify code
    crypto: marvell/octeontx - Use dma_set_mask_and_coherent to simplify code
    ...

    Linus Torvalds
     

14 Dec, 2020

1 commit