03 Aug, 2022

1 commit

  • commit c80af0c250c8f8a3c978aa5aafbe9c39b336b813 upstream.

    This reverts commit 912f655d78c5d4ad05eac287f23a435924df7144.

    This commit introduced a regression that can cause mount to hang. The
    changes in __ocfs2_find_empty_slot let any node with a non-zero node
    number grab the slot that was already taken by node 0, so node 1 will
    access the same journal as node 0. When it tries to grab the journal
    cluster lock, it will hang because the lock was already acquired by
    node 0. It's very easy to reproduce this: in one cluster, mount node 0
    first, then node 1, and you will see the following call trace from node 1.

    [13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
    [13148.739691] Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
    [13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [13148.745846] task:mount.ocfs2 state:D stack: 0 pid:53045 ppid: 53044 flags:0x00004000
    [13148.749354] Call Trace:
    [13148.750718]
    [13148.752019] ? usleep_range+0x90/0x89
    [13148.753882] __schedule+0x210/0x567
    [13148.755684] schedule+0x44/0xa8
    [13148.757270] schedule_timeout+0x106/0x13c
    [13148.759273] ? __prepare_to_swait+0x53/0x78
    [13148.761218] __wait_for_common+0xae/0x163
    [13148.763144] __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
    [13148.765780] ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
    [13148.768312] ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
    [13148.770968] ocfs2_journal_init+0x91/0x340 [ocfs2]
    [13148.773202] ocfs2_check_volume+0x39/0x461 [ocfs2]
    [13148.775401] ? iput+0x69/0xba
    [13148.777047] ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
    [13148.779646] ocfs2_fill_super+0x54b/0x853 [ocfs2]
    [13148.781756] mount_bdev+0x190/0x1b7
    [13148.783443] ? ocfs2_remount+0x440/0x440 [ocfs2]
    [13148.785634] legacy_get_tree+0x27/0x48
    [13148.787466] vfs_get_tree+0x25/0xd0
    [13148.789270] do_new_mount+0x18c/0x2d9
    [13148.791046] __x64_sys_mount+0x10e/0x142
    [13148.792911] do_syscall_64+0x3b/0x89
    [13148.794667] entry_SYSCALL_64_after_hwframe+0x170/0x0
    [13148.797051] RIP: 0033:0x7f2309f6e26e
    [13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
    [13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
    [13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
    [13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
    [13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
    [13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
    [13148.816564]

    To fix it, we could just fix __ocfs2_find_empty_slot. But the original
    commit introduced the feature to mount ocfs2 locally even if it is
    cluster based, which is very dangerous: it can easily cause serious data
    corruption, because there is no way to stop other nodes from mounting the
    fs and corrupting it. Setting up ha or another cluster-aware stack is just
    the cost that we have to pay to avoid corruption; otherwise we would have
    to do it in the kernel.
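
    The slot collision described above can be illustrated with a small
    userspace model. This is only a sketch: struct model_slot,
    slot_is_free_buggy(), slot_is_free_fixed() and find_empty_slot() are
    hypothetical names, not the kernel's code, and the assumption that the
    regressed check treated a slot owned by node number 0 as free is inferred
    from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one entry of the slot map. */
struct model_slot {
    bool valid;             /* slot is currently owned by a node */
    unsigned int node_num;  /* owner's node number, meaningful only if valid */
};

/* Assumed shape of the regressed check: node number 0 looks like "free". */
static bool slot_is_free_buggy(const struct model_slot *s)
{
    return !s->valid || s->node_num == 0;
}

/* Check used by this model after the revert: only validity decides. */
static bool slot_is_free_fixed(const struct model_slot *s)
{
    return !s->valid;
}

/* Return the index of the first free slot, or -1 if there is none. */
static int find_empty_slot(const struct model_slot *slots, size_t n,
                           bool (*is_free)(const struct model_slot *))
{
    for (size_t i = 0; i < n; i++) {
        if (is_free(&slots[i]))
            return (int)i;
    }
    return -1;
}
```

    With node 0 holding slot 0, the buggy predicate hands slot 0 to the next
    node as well, while the fixed predicate moves on to slot 1.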

    Link: https://lkml.kernel.org/r/20220603222801.42488-1-junxiao.bi@oracle.com
    Fixes: 912f655d78c5 ("ocfs2: mount shared volume without ha stack")
    Signed-off-by: Junxiao Bi
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Junxiao Bi
     

09 Jun, 2022

1 commit

  • commit 863e0d81b6683c4cbc588ad831f560c90e494bef upstream.

    When user_dlm_destroy_lock failed, it didn't clean up the flags it set
    before exit. For USER_LOCK_IN_TEARDOWN, if this function fails because
    the lock is still in use, the next time unlink invokes this function it
    will return success, and then unlink will remove the inode and dentry if
    the lock is not in use (file closed). But the dlm lock is still linked in
    the dlm lock resource, so when a bast comes in, it will trigger a panic
    due to use-after-free. See the following panic call trace. To fix this,
    USER_LOCK_IN_TEARDOWN should be reverted on failure. An error should also
    be returned if USER_LOCK_IN_TEARDOWN is set, to let the user know that
    unlink failed.

    For the case of ocfs2_dlm_unlock failure, besides USER_LOCK_IN_TEARDOWN,
    USER_LOCK_BUSY also needs to be cleared. Even though the spin lock is
    released in between, USER_LOCK_IN_TEARDOWN is still set. For
    USER_LOCK_BUSY, if USER_LOCK_IN_TEARDOWN is checked to bail out before
    every place that waits on this flag, no flow waits on the busy flag set
    by user_dlm_destroy_lock(), and we can simply revert USER_LOCK_BUSY when
    ocfs2_dlm_unlock fails. Fix user_dlm_cluster_lock(), which is the only
    function not following this.

    [ 941.336392] (python,26174,16):dlmfs_unlink:562 ERROR: unlink
    004fb0000060000b5a90b8c847b72e1, error -16 from destroy
    [ 989.757536] ------------[ cut here ]------------
    [ 989.757709] kernel BUG at fs/ocfs2/dlmfs/userdlm.c:173!
    [ 989.757876] invalid opcode: 0000 [#1] SMP
    [ 989.758027] Modules linked in: ksplice_2zhuk2jr_ib_ipoib_new(O)
    ksplice_2zhuk2jr(O) mptctl mptbase xen_netback xen_blkback xen_gntalloc
    xen_gntdev xen_evtchn cdc_ether usbnet mii ocfs2 jbd2 rpcsec_gss_krb5
    auth_rpcgss nfsv4 nfsv3 nfs_acl nfs fscache lockd grace ocfs2_dlmfs
    ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc
    fcoe libfcoe libfc scsi_transport_fc sunrpc ipmi_devintf bridge stp llc
    rds_rdma rds bonding ib_sdp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
    rdma_cm ib_cm iw_cm falcon_lsm_serviceable(PE) falcon_nf_netcontain(PE)
    mlx4_vnic falcon_kal(E) falcon_lsm_pinned_13402(E) mlx4_ib ib_sa ib_mad
    ib_core ib_addr xenfs xen_privcmd dm_multipath iTCO_wdt iTCO_vendor_support
    pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core ipmi_ssif i2c_core ipmi_si
    ipmi_msghandler
    [ 989.760686] ioatdma sg ext3 jbd mbcache sd_mod ahci libahci ixgbe dca ptp
    pps_core vxlan udp_tunnel ip6_udp_tunnel megaraid_sas mlx4_core crc32c_intel
    be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio
    libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi wmi
    dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
    ksplice_2zhuk2jr_ib_ipoib_old]
    [ 989.761987] CPU: 10 PID: 19102 Comm: dlm_thread Tainted: P OE
    4.1.12-124.57.1.el6uek.x86_64 #2
    [ 989.762290] Hardware name: Oracle Corporation ORACLE SERVER
    X5-2/ASM,MOTHERBOARD,1U, BIOS 30350100 06/17/2021
    [ 989.762599] task: ffff880178af6200 ti: ffff88017f7c8000 task.ti:
    ffff88017f7c8000
    [ 989.762848] RIP: e030:[] []
    __user_dlm_queue_lockres.part.4+0x76/0x80 [ocfs2_dlmfs]
    [ 989.763185] RSP: e02b:ffff88017f7cbcb8 EFLAGS: 00010246
    [ 989.763353] RAX: 0000000000000000 RBX: ffff880174d48008 RCX:
    0000000000000003
    [ 989.763565] RDX: 0000000000120012 RSI: 0000000000000003 RDI:
    ffff880174d48170
    [ 989.763778] RBP: ffff88017f7cbcc8 R08: ffff88021f4293b0 R09:
    0000000000000000
    [ 989.763991] R10: ffff880179c8c000 R11: 0000000000000003 R12:
    ffff880174d48008
    [ 989.764204] R13: 0000000000000003 R14: ffff880179c8c000 R15:
    ffff88021db7a000
    [ 989.764422] FS: 0000000000000000(0000) GS:ffff880247480000(0000)
    knlGS:ffff880247480000
    [ 989.764685] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 989.764865] CR2: ffff8000007f6800 CR3: 0000000001ae0000 CR4:
    0000000000042660
    [ 989.765081] Stack:
    [ 989.765167] 0000000000000003 ffff880174d48040 ffff88017f7cbd18
    ffffffffc07d455f
    [ 989.765442] ffff88017f7cbd88 ffffffff816fb639 ffff88017f7cbd38
    ffff8800361b5600
    [ 989.765717] ffff88021db7a000 ffff88021f429380 0000000000000003
    ffffffffc0453020
    [ 989.765991] Call Trace:
    [ 989.766093] [] user_bast+0x5f/0xf0 [ocfs2_dlmfs]
    [ 989.766287] [] ? schedule_timeout+0x169/0x2d0
    [ 989.766475] [] ? o2dlm_lock_ast_wrapper+0x20/0x20
    [ocfs2_stack_o2cb]
    [ 989.766738] [] o2dlm_blocking_ast_wrapper+0x1a/0x20
    [ocfs2_stack_o2cb]
    [ 989.767010] [] dlm_do_local_bast+0x46/0xe0 [ocfs2_dlm]
    [ 989.767217] [] ? dlm_lockres_calc_usage+0x4c/0x60
    [ocfs2_dlm]
    [ 989.767466] [] dlm_thread+0xa31/0x1140 [ocfs2_dlm]
    [ 989.767662] [] ? __schedule+0x24a/0x810
    [ 989.767834] [] ? __schedule+0x23e/0x810
    [ 989.768006] [] ? __schedule+0x24a/0x810
    [ 989.768178] [] ? __schedule+0x23e/0x810
    [ 989.768349] [] ? __schedule+0x24a/0x810
    [ 989.768521] [] ? __schedule+0x23e/0x810
    [ 989.768693] [] ? __schedule+0x24a/0x810
    [ 989.768893] [] ? __schedule+0x23e/0x810
    [ 989.769067] [] ? __schedule+0x24a/0x810
    [ 989.769241] [] ? wait_woken+0x90/0x90
    [ 989.769411] [] ? dlm_kick_thread+0x80/0x80 [ocfs2_dlm]
    [ 989.769617] [] kthread+0xcb/0xf0
    [ 989.769774] [] ? __schedule+0x24a/0x810
    [ 989.769945] [] ? __schedule+0x24a/0x810
    [ 989.770117] [] ? kthread_create_on_node+0x180/0x180
    [ 989.770321] [] ret_from_fork+0x61/0x90
    [ 989.770492] [] ? kthread_create_on_node+0x180/0x180
    [ 989.770689] Code: d0 00 00 00 f0 45 7d c0 bf 00 20 00 00 48 89 83 c0 00 00
    00 48 89 83 c8 00 00 00 e8 55 c1 8c c0 83 4b 04 10 48 83 c4 08 5b 5d c3
    0b 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 48 83
    [ 989.771892] RIP []
    __user_dlm_queue_lockres.part.4+0x76/0x80 [ocfs2_dlmfs]
    [ 989.772174] RSP
    [ 989.772704] ---[ end trace ebd1e38cebcc93a8 ]---
    [ 989.772907] Kernel panic - not syncing: Fatal exception
    [ 989.773173] Kernel Offset: disabled
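
    The flag handling described in this message can be sketched as a small
    userspace model. Everything here is hypothetical: the MODEL_* flag
    values, struct model_lockres, the error codes, and the synchronous
    "unlock" are assumptions made for illustration, not the dlmfs code.

```c
#include <errno.h>

#define MODEL_LOCK_BUSY        0x1u  /* hypothetical flag values */
#define MODEL_LOCK_IN_TEARDOWN 0x2u

struct model_lockres {
    unsigned int flags;
    int holders;      /* > 0 means the lock is still in use */
    int unlock_fails; /* nonzero simulates a failing unlock call */
};

static int model_destroy_lock(struct model_lockres *res)
{
    /* A teardown already happened or is in progress: report an error. */
    if (res->flags & MODEL_LOCK_IN_TEARDOWN)
        return -EBUSY;

    res->flags |= MODEL_LOCK_IN_TEARDOWN;

    if (res->holders > 0) {
        /* Still in use: undo the flag so a later attempt starts clean. */
        res->flags &= ~MODEL_LOCK_IN_TEARDOWN;
        return -EBUSY;
    }

    res->flags |= MODEL_LOCK_BUSY;
    if (res->unlock_fails) {
        /* Unlock failed: clear both flags set by this function. */
        res->flags &= ~(MODEL_LOCK_BUSY | MODEL_LOCK_IN_TEARDOWN);
        return -EIO;
    }

    /* In this model the unlock completes synchronously. */
    res->flags &= ~MODEL_LOCK_BUSY;
    return 0;
}
```

    A failed attempt leaves no flags behind, so a retry once the lock is no
    longer in use can succeed, and a call after a successful teardown reports
    an error instead of silently succeeding.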

    Link: https://lkml.kernel.org/r/20220518235224.87100-2-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Joseph Qi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Junxiao Bi via Ocfs2-devel
     

08 Apr, 2022

1 commit

  • commit de19433423c7bedabbd4f9a25f7dbc62c5e78921 upstream.

    There is a reported crash when mounting ocfs2 with quota enabled.

    RIP: 0010:ocfs2_qinfo_lock_res_init+0x44/0x50 [ocfs2]
    Call Trace:
    ocfs2_local_read_info+0xb9/0x6f0 [ocfs2]
    dquot_load_quota_sb+0x216/0x470
    dquot_load_quota_inode+0x85/0x100
    ocfs2_enable_quotas+0xa0/0x1c0 [ocfs2]
    ocfs2_fill_super.cold+0xc8/0x1bf [ocfs2]
    mount_bdev+0x185/0x1b0
    legacy_get_tree+0x27/0x40
    vfs_get_tree+0x25/0xb0
    path_mount+0x465/0xac0
    __x64_sys_mount+0x103/0x140

    It is caused by the corresponding dqi_type and dqi_sb not being properly
    initialized when dqi_gqlock is initialized.

    This issue was introduced by commit 6c85c2c72819, which wanted to avoid
    accessing uninitialized variables in error cases. So make sure the global
    quota info is properly initialized.

    Link: https://lkml.kernel.org/r/20220323023644.40084-1-joseph.qi@linux.alibaba.com
    Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1007141
    Fixes: 6c85c2c72819 ("ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info()")
    Signed-off-by: Joseph Qi
    Reported-by: Dayvison
    Tested-by: Valentin Vidic
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Joseph Qi
     

23 Mar, 2022

1 commit

  • commit 7b0b1332cfdb94489836b67d088a779699f8e47e upstream.

    Once s_root is set, generic_shutdown_super() will be called if
    fill_super() fails. That means we will call ocfs2_dismount_volume()
    twice in such a case, which can lead to a kernel crash.

    Fix this issue by initializing filecheck kobj before setting s_root.

    Link: https://lkml.kernel.org/r/20220310081930.86305-1-joseph.qi@linux.alibaba.com
    Fixes: 5f483c4abb50 ("ocfs2: add kobject for online file check")
    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Joseph Qi
     

02 Feb, 2022

1 commit

  • commit ddf4b773aa40790dfa936bd845c18e735a49c61c upstream.

    commit 6f1b228529ae introduces a regression which can deadlock as
    follows:

    Task1:                              Task2:
    jbd2_journal_commit_transaction     ocfs2_test_bg_bit_allocatable
    spin_lock(&jh->b_state_lock)        jbd_lock_bh_journal_head
    __jbd2_journal_remove_checkpoint    spin_lock(&jh->b_state_lock)
    jbd2_journal_put_journal_head
    jbd_lock_bh_journal_head

    Task1 and Task2 lock bh->b_state and jh->b_state_lock in different
    orders, which finally results in a deadlock.

    So use jbd2_journal_[grab|put]_journal_head instead in
    ocfs2_test_bg_bit_allocatable() to fix it.
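
    One way to picture the grab/put pattern is the userspace sketch below:
    the journal head is pinned with a reference while a short-lived bit lock
    is held, and that lock is dropped again before the caller does anything
    else. The names struct model_bh, struct model_jh, model_grab_jh() and
    model_put_jh() are hypothetical, and the atomic_flag only stands in for
    the buffer head bit lock; this is not the jbd2 implementation.

```c
#include <stdatomic.h>
#include <stddef.h>

struct model_jh {
    int refcount;           /* references pinning this journal head */
};

struct model_bh {
    atomic_flag jh_lock;    /* stands in for the BH_JournalHead bit lock */
    struct model_jh *jh;    /* NULL when no journal head is attached */
};

static void model_bh_lock(struct model_bh *bh)
{
    while (atomic_flag_test_and_set(&bh->jh_lock))
        ; /* spin until the bit lock is free */
}

static void model_bh_unlock(struct model_bh *bh)
{
    atomic_flag_clear(&bh->jh_lock);
}

/* Pin the journal head if one is attached; return NULL otherwise. */
static struct model_jh *model_grab_jh(struct model_bh *bh)
{
    struct model_jh *jh;

    model_bh_lock(bh);
    jh = bh->jh;
    if (jh)
        jh->refcount++;
    model_bh_unlock(bh);
    return jh;
}

/* Drop a reference taken by model_grab_jh(). */
static void model_put_jh(struct model_bh *bh, struct model_jh *jh)
{
    model_bh_lock(bh);
    jh->refcount--;
    model_bh_unlock(bh);
}
```

    Because the bit lock is released before model_grab_jh() returns, a caller
    in this model never holds it while taking any other lock, which removes
    the inverted ordering shown for Task2.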

    Link: https://lkml.kernel.org/r/20220121071205.100648-3-joseph.qi@linux.alibaba.com
    Fixes: 6f1b228529ae ("ocfs2: fix race between searching chunks and release journal_head from buffer_head")
    Signed-off-by: Joseph Qi
    Reported-by: Gautham Ananthakrishna
    Tested-by: Gautham Ananthakrishna
    Reported-by: Saeed Mirzamohammadi
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Joel Becker
    Cc: Jun Piao
    Cc: Junxiao Bi
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Joseph Qi
     

19 Nov, 2021

1 commit

  • commit 839b63860eb3835da165642923120d305925561d upstream.

    Patch series "ocfs2: Truncate data corruption fix".

    As further testing has shown, commit 5314454ea3f ("ocfs2: fix data
    corruption after conversion from inline format") didn't fix all the data
    corruption issues the customer started observing after 6dbf7bb55598
    ("fs: Don't invalidate page buffers in block_write_full_page()"). This
    time I have tracked them down to two bugs in ocfs2 truncation code.

    One bug (truncating page cache before clearing tail cluster and setting
    i_size) could cause data corruption even before 6dbf7bb55598, but before
    that commit it needed a race with a page fault; after 6dbf7bb55598 it
    started to be pretty deterministic.

    Another bug (zeroing pages beyond old i_size) used to be a harmless
    inefficiency before commit 6dbf7bb55598. But after commit 6dbf7bb55598,
    in combination with the first bug, it resulted in deterministic data
    corruption.

    Although fixing only the first problem is needed to stop data
    corruption, I've fixed both issues to make the code more robust.

    This patch (of 2):

    ocfs2_truncate_file() did unmap and invalidate page cache pages before
    zeroing the partial tail cluster and setting i_size. Thus some pages
    could be left (and likely were left if the cluster zeroing happened) in
    the page cache beyond i_size after truncate finished, letting the user
    possibly see stale data once the file was extended again. Also, the tail
    cluster zeroing was not guaranteed to finish before truncate finished,
    causing possible stale data exposure. The problem started to be
    particularly easy to hit after commit 6dbf7bb55598 ("fs: Don't
    invalidate page buffers in block_write_full_page()") stopped
    invalidation of pages beyond i_size from the page writeback path.

    Fix these problems by unmapping and invalidating pages in the page cache
    after the i_size is reduced and tail cluster is zeroed out.

    Link: https://lkml.kernel.org/r/20211025150008.29002-1-jack@suse.cz
    Link: https://lkml.kernel.org/r/20211025151332.11301-1-jack@suse.cz
    Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
    Signed-off-by: Jan Kara
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     

29 Oct, 2021

1 commit

  • Encountered a race between ocfs2_test_bg_bit_allocatable() and
    jbd2_journal_put_journal_head() resulting in the below vmcore.

    PID: 106879 TASK: ffff880244ba9c00 CPU: 2 COMMAND: "loop3"
    Call trace:
    panic
    oops_end
    no_context
    __bad_area_nosemaphore
    bad_area_nosemaphore
    __do_page_fault
    do_page_fault
    page_fault
    [exception RIP: ocfs2_block_group_find_clear_bits+316]
    ocfs2_block_group_find_clear_bits [ocfs2]
    ocfs2_cluster_group_search [ocfs2]
    ocfs2_search_chain [ocfs2]
    ocfs2_claim_suballoc_bits [ocfs2]
    __ocfs2_claim_clusters [ocfs2]
    ocfs2_claim_clusters [ocfs2]
    ocfs2_local_alloc_slide_window [ocfs2]
    ocfs2_reserve_local_alloc_bits [ocfs2]
    ocfs2_reserve_clusters_with_limit [ocfs2]
    ocfs2_reserve_clusters [ocfs2]
    ocfs2_lock_refcount_allocators [ocfs2]
    ocfs2_make_clusters_writable [ocfs2]
    ocfs2_replace_cow [ocfs2]
    ocfs2_refcount_cow [ocfs2]
    ocfs2_file_write_iter [ocfs2]
    lo_rw_aio
    loop_queue_work
    kthread_worker_fn
    kthread
    ret_from_fork

    When ocfs2_test_bg_bit_allocatable() called bh2jh(bg_bh), the
    bg_bh->b_private was NULL because jbd2_journal_put_journal_head() raced
    and released the journal head from the buffer head. The bit lock for the
    bit 'BH_JournalHead' needs to be taken to fix this race.

    Link: https://lkml.kernel.org/r/1634820718-6043-1-git-send-email-gautham.ananthakrishna@oracle.com
    Signed-off-by: Gautham Ananthakrishna
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gautham Ananthakrishna
     

19 Oct, 2021

2 commits

  • Starting with kernel 5.11 built with CONFIG_FORTIFY_SOURCE, mounting an
    ocfs2 filesystem with either the o2cb or pcmk cluster stack fails with
    the trace below. The problem seems to be that the strings for cluster
    stack and cluster name are not guaranteed to be null terminated in the
    disk representation, while strlcpy assumes that the source string is
    always null terminated. This causes a read outside of the source string,
    triggering the buffer overflow detection.

    detected buffer overflow in strlen
    ------------[ cut here ]------------
    kernel BUG at lib/string.c:1149!
    invalid opcode: 0000 [#1] SMP PTI
    CPU: 1 PID: 910 Comm: mount.ocfs2 Not tainted 5.14.0-1-amd64 #1
    Debian 5.14.6-2
    RIP: 0010:fortify_panic+0xf/0x11
    ...
    Call Trace:
    ocfs2_initialize_super.isra.0.cold+0xc/0x18 [ocfs2]
    ocfs2_fill_super+0x359/0x19b0 [ocfs2]
    mount_bdev+0x185/0x1b0
    legacy_get_tree+0x27/0x40
    vfs_get_tree+0x25/0xb0
    path_mount+0x454/0xa20
    __x64_sys_mount+0x103/0x140
    do_syscall_64+0x3b/0xc0
    entry_SYSCALL_64_after_hwframe+0x44/0xae
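
    A bounded copy avoids running strlen() over a field that may lack a
    terminator. The sketch below shows one such approach in userspace;
    MODEL_LABEL_LEN and copy_disk_label() are hypothetical names and sizes
    chosen for illustration, not the ocfs2 definitions, and this is not
    necessarily the change made by this commit.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_LABEL_LEN 4   /* hypothetical on-disk field width */

/*
 * Copy a fixed-width field that may lack a trailing NUL into dst, which
 * must hold at least MODEL_LABEL_LEN + 1 bytes. Exactly MODEL_LABEL_LEN
 * bytes of src are read, so nothing scans past the end of the field.
 */
static void copy_disk_label(char *dst, const char *src)
{
    memcpy(dst, src, MODEL_LABEL_LEN);
    dst[MODEL_LABEL_LEN] = '\0';
}
```

    The destination is terminated explicitly, so later string functions can
    be used on it safely even when the source field is completely full.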

    Link: https://lkml.kernel.org/r/20210929180654.32460-1-vvidic@valentin-vidic.from.hr
    Signed-off-by: Valentin Vidic
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Valentin Vidic
     
  • Commit 6dbf7bb55598 ("fs: Don't invalidate page buffers in
    block_write_full_page()") uncovered a latent bug in ocfs2 conversion
    from inline inode format to a normal inode format.

    The code in ocfs2_convert_inline_data_to_extents() attempts to zero out
    the whole cluster allocated for file data by grabbing, zeroing, and
    dirtying all pages covering this cluster. However these pages are
    beyond i_size, thus writeback code generally ignores these dirty pages
    and no blocks were ever actually zeroed on the disk.

    This oversight was fixed by commit 693c241a5f6a ("ocfs2: No need to zero
    pages past i_size.") for the standard ocfs2 write path, but the inline
    conversion path was apparently forgotten; the commit log also explains
    why the zeroing actually is not needed.

    After commit 6dbf7bb55598, things became worse as writeback code stopped
    invalidating buffers on pages beyond i_size and thus these pages end up
    with clean PageDirty bit but with buffers attached to these pages being
    still dirty. So when a file is converted from inline format, then
    writeback triggers, and then the file is grown so that these pages
    become valid, the invalid dirtiness state is preserved,
    mark_buffer_dirty() does nothing on these pages (buffers are already
    dirty) but page is never written back because it is clean. So data
    written to these pages is lost once pages are reclaimed.

    Simple reproducer for the problem is:

    xfs_io -f -c "pwrite 0 2000" -c "pwrite 2000 2000" -c "fsync" \
    -c "pwrite 4000 2000" ocfs2_file

    After unmounting and mounting the fs again, you can observe that end of
    'ocfs2_file' has lost its contents.

    Fix the problem by not doing the pointless zeroing during conversion
    from inline format similarly as in the standard write path.
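
    The mismatch between page and buffer dirty state can be followed in a
    toy state machine. This is a deliberately simplified userspace model:
    struct model_page, model_mark_buffer_dirty() and model_writeback() are
    hypothetical, and each page is assumed to carry a single buffer.

```c
#include <stdbool.h>

struct model_page {
    bool page_dirty;    /* models PageDirty */
    bool buffer_dirty;  /* models the dirty bit of the attached buffer */
    bool on_disk;       /* true once the latest data has been written */
};

/* Dirty the page only on a clean-to-dirty transition of the buffer. */
static void model_mark_buffer_dirty(struct model_page *p)
{
    p->on_disk = false;     /* new data exists only in memory */
    if (p->buffer_dirty)
        return;             /* already dirty: page state is left alone */
    p->buffer_dirty = true;
    p->page_dirty = true;
}

/*
 * Writeback of one page. Pages beyond i_size are skipped and, in this
 * model, their buffers are not invalidated.
 */
static void model_writeback(struct model_page *p, bool beyond_isize)
{
    if (!p->page_dirty)
        return;             /* clean pages are never written */
    p->page_dirty = false;
    if (beyond_isize)
        return;             /* skipped: buffer stays dirty, nothing written */
    p->buffer_dirty = false;
    p->on_disk = true;
}
```

    Dirtying a page beyond i_size, running writeback, and then writing to the
    page again after the file has grown leaves the page clean with a dirty
    buffer, so the second write never reaches the disk in this model.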

    [akpm@linux-foundation.org: fix whitespace, per Joseph]

    Link: https://lkml.kernel.org/r/20210930095405.21433-1-jack@suse.cz
    Fixes: 6dbf7bb55598 ("fs: Don't invalidate page buffers in block_write_full_page()")
    Signed-off-by: Jan Kara
    Reviewed-by: Joseph Qi
    Tested-by: Joseph Qi
    Acked-by: Gang He
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Jun Piao
    Cc: "Markov, Andrey"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

25 Sep, 2021

1 commit

  • ocfs2_data_convert_worker() is currently dropping any cached acl info
    for FILE before down-converting meta lock. It should also drop for
    DIRECTORY. Otherwise the second acl lookup returns the cached one (from
    VFS layer) which could be already stale.

    The problem we are seeing is that the acl changes on one node don't
    get refreshed on other nodes in the following case:

    Node 1 Node 2
    -------------- ----------------
    getfacl dir1

    getfacl dir1
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wengang Wang
     

04 Sep, 2021

4 commits

  • Merge misc updates from Andrew Morton:
    "173 patches.

    Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
    pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
    bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
    hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
    oom-kill, migration, ksm, percpu, vmstat, and madvise)"

    * emailed patches from Andrew Morton : (173 commits)
    mm/madvise: add MADV_WILLNEED to process_madvise()
    mm/vmstat: remove unneeded return value
    mm/vmstat: simplify the array size calculation
    mm/vmstat: correct some wrong comments
    mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
    selftests: vm: add COW time test for KSM pages
    selftests: vm: add KSM merging time test
    mm: KSM: fix data type
    selftests: vm: add KSM merging across nodes test
    selftests: vm: add KSM zero page merging test
    selftests: vm: add KSM unmerge test
    selftests: vm: add KSM merge test
    mm/migrate: correct kernel-doc notation
    mm: wire up syscall process_mrelease
    mm: introduce process_mrelease system call
    memblock: make memblock_find_in_range method private
    mm/mempolicy.c: use in_task() in mempolicy_slab_node()
    mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
    mm/mempolicy: advertise new MPOL_PREFERRED_MANY
    mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
    ...

    Linus Torvalds
     
  • Usually, the ocfs2_downconvert_lock() function always downconverts the
    dlm lock to the expected level to satisfy dlm bast requests from the
    other nodes.

    But there is a rare situation. When dlm lock conversion is being
    canceled, ocfs2_downconvert_lock() will return -EBUSY. Note that
    ocfs2_cancel_convert() is asynchronous in the fsdlm implementation.

    If we do not requeue this lockres entry, the ocfs2 downconvert thread no
    longer handles this dlm lock bast request. Then the other nodes will not
    get the dlm lock again, and the current node's process will be blocked
    when it acquires this dlm lock again.
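
    The effect of requeueing can be seen in a minimal userspace model of a
    worker that processes one queued entry per pass. The names struct
    model_dc_lockres, model_downconvert() and model_worker_pass() are
    hypothetical, and the pending cancel is modeled as a simple counter.

```c
#include <errno.h>
#include <stdbool.h>

struct model_dc_lockres {
    int cancel_pending;   /* > 0: a convert cancel is still in flight */
    bool queued;          /* entry is on the downconvert queue */
    bool downconverted;   /* blocking request has been satisfied */
};

/* A downconvert attempt that fails while a cancel is in flight. */
static int model_downconvert(struct model_dc_lockres *res)
{
    if (res->cancel_pending > 0) {
        res->cancel_pending--;   /* the asynchronous cancel makes progress */
        return -EBUSY;
    }
    res->downconverted = true;
    return 0;
}

/* One pass of the worker over a queued entry. */
static void model_worker_pass(struct model_dc_lockres *res,
                              bool requeue_on_busy)
{
    int ret;

    if (!res->queued)
        return;
    res->queued = false;                 /* dequeue before processing */
    ret = model_downconvert(res);
    if (ret == -EBUSY && requeue_on_busy)
        res->queued = true;              /* try again on a later pass */
}
```

    Without the requeue the entry is dropped after the first -EBUSY and the
    blocking request is never satisfied; with it, a later pass completes the
    downconvert.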

    Link: https://lkml.kernel.org/r/20210830044621.12544-1-ghe@suse.com
    Signed-off-by: Gang He
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gang He
     
  • A memory block is allocated through kmalloc(), and its return value is
    assigned to the pointer oinfo. However, oinfo->dqi_gqinode is not
    initialized but it is accessed in:
    iput(oinfo->dqi_gqinode);

    To fix this possible uninitialized-variable access, assign NULL to
    oinfo->dqi_gqinode, and add ocfs2_qinfo_lock_res_init() behind the
    assignment in ocfs2_local_read_info(). Remove ocfs2_qinfo_lock_res_init()
    in ocfs2_global_read_info().
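
    The pattern of clearing a pointer right after allocation, so that a
    shared error path can release it unconditionally, looks roughly like
    this in userspace. The names struct model_info, model_iput() and
    model_read_info() are hypothetical; malloc() stands in for kmalloc()
    and model_iput() for a release helper that tolerates NULL.

```c
#include <errno.h>
#include <stdlib.h>

struct model_inode { int unused; };

struct model_info {
    struct model_inode *gqinode;
};

static int model_iput_calls;   /* counts releases of a real inode */

/* Release helper that tolerates NULL. */
static void model_iput(struct model_inode *inode)
{
    if (!inode)
        return;
    model_iput_calls++;
    free(inode);
}

/* This model always releases at the end to keep the example short. */
static int model_read_info(int fail_early)
{
    struct model_info *info = malloc(sizeof(*info));
    int ret = 0;

    if (!info)
        return -ENOMEM;

    /* malloc() leaves the field indeterminate; make it safe to release. */
    info->gqinode = NULL;

    if (fail_early) {
        ret = -EIO;
        goto out;
    }

    info->gqinode = malloc(sizeof(*info->gqinode));
    if (!info->gqinode)
        ret = -ENOMEM;
out:
    model_iput(info->gqinode);
    free(info);
    return ret;
}
```

    Without the NULL assignment, the early error path would pass an
    indeterminate pointer to the release helper.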

    Link: https://lkml.kernel.org/r/20210804031832.57154-1-islituo@gmail.com
    Signed-off-by: Tuo Li
    Reported-by: TOTE Robot
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tuo Li
     
  • The case where "tmp_oh" is NULL is handled at the start of the function.
    At this point we know it's non-NULL so this will always return 1.

    Link: https://lkml.kernel.org/r/YOcItgIXtisi3MaO@mwanda
    Signed-off-by: Dan Carpenter
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Cc: Larry Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     

03 Sep, 2021

1 commit

  • Pull overlayfs update from Miklos Szeredi:

    - Copy up immutable/append/sync/noatime attributes (Amir Goldstein)

    - Improve performance by enabling RCU lookup.

    - Misc fixes and improvements

    The reason this touches so many files is that the ->get_acl() method now
    gets a "bool rcu" argument. The ->get_acl() API was updated based on
    comments from Al and Linus:

    Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguQxpd6Wgc0Jd3ks77zcsAv_bn0q17L3VNnnmPKu11t8A@mail.gmail.com/

    * tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    ovl: enable RCU'd ->get_acl()
    vfs: add rcu argument to ->get_acl() callback
    ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
    ovl: use kvalloc in xattr copy-up
    ovl: update ctime when changing fileattr
    ovl: skip checking lower file's i_writecount on truncate
    ovl: relax lookup error on mismatch origin ftype
    ovl: do not set overlay.opaque for new directories
    ovl: add ovl_allow_offline_changes() helper
    ovl: disable decoding null uuid with redirect_dir
    ovl: consistent behavior for immutable/append-only inodes
    ovl: copy up sync/noatime fileattr flags
    ovl: pass ovl_fs to ovl_check_setxattr()
    fs: add generic helper for filling statx attribute flags

    Linus Torvalds
     

23 Aug, 2021

1 commit

  • We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
    off in Fedora and RHEL8. Several other distros have followed suit.

    I've heard of one problem in all that time: Someone migrated from an
    older distro that supported "-o mand" to one that didn't, and the host
    had a fstab entry with "mand" in it which broke on reboot. They didn't
    actually _use_ mandatory locking so they just removed the mount option
    and moved on.

    This patch rips out mandatory locking support wholesale from the kernel,
    along with the Kconfig option and the Documentation file. It also
    changes the mount code to ignore the "mand" mount option instead of
    erroring out, and to throw a big, ugly warning.

    Signed-off-by: Jeff Layton

    Jeff Layton
     

19 Aug, 2021

1 commit


31 Jul, 2021

2 commits

  • For punching holes in EOF blocks, fallocate used buffered write to zero
    the EOF blocks in the last cluster. But since ->writepage will ignore
    EOF pages, those zeros will not be flushed.

    This "looks" ok as commit 6bba4471f0cc ("ocfs2: fix data corruption by
    fallocate") will zero the EOF blocks when extending the file size, but
    it isn't. The problem happened on those EOF pages: before writeback,
    those pages had the DIRTY flag set and all buffer_heads in them also had
    the DIRTY flag set; when writeback was run by write_cache_pages(), the
    DIRTY flag on the page was cleared, but the DIRTY flag on the
    buffer_head was not.

    When the next write happened to those EOF pages, since the buffer_head
    already had the DIRTY flag set, it would not mark the page DIRTY again.
    That made writeback ignore them forever, which causes data corruption.
    Even a direct I/O write can't work, because it fails when trying to drop
    the page cache before direct I/O: it finds the buffer_head for those
    pages still has the DIRTY flag set and falls back to buffered I/O mode.

    To summarize the issue: as writeback ignores EOF pages, once any EOF
    page is generated, any write to it will only go to the page cache and
    will never be flushed to disk, even after the file size extends and that
    page is no longer an EOF page. The fix is to avoid zeroing EOF blocks
    with buffered write.

    The following code snippet from qemu-img could trigger the corruption.

    656 open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
    ...
    660 fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680
    660 fallocate(11, 0, 2275868672, 327680) = 0
    658 pwrite64(11, "

    Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • If the append-dio feature is enabled, direct-io write and fallocate
    could run in parallel to extend the file size. fallocate used
    "orig_isize" to record i_size before taking "ip_alloc_sem", so by the
    time ocfs2_zeroout_partial_cluster() zeroes out EOF blocks, i_size may
    already have been extended by ocfs2_dio_end_io_write(), which causes
    valid data to be zeroed out.

    Link: https://lkml.kernel.org/r/20210722054923.24389-1-junxiao.bi@oracle.com
    Fixes: 6bba4471f0cc ("ocfs2: fix data corruption by fallocate")
    Signed-off-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Joel Becker
    Cc: Jun Piao
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     

01 Jul, 2021

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "In addition to bug fixes and cleanups, there are two new features for
    ext4 in 5.14:

    - Allow applications to poll on changes to
    /sys/fs/ext4/*/errors_count

    - Add the ioctl EXT4_IOC_CHECKPOINT which allows the journal to be
    checkpointed, truncated and discarded or zero'ed"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
    jbd2: export jbd2_journal_[un]register_shrinker()
    ext4: notify sysfs on errors_count value change
    fs: remove bdev_try_to_free_page callback
    ext4: remove bdev_try_to_free_page() callback
    jbd2: simplify journal_clean_one_cp_list()
    jbd2,ext4: add a shrinker to release checkpointed buffers
    jbd2: remove redundant buffer io error checks
    jbd2: don't abort the journal when freeing buffers
    jbd2: ensure abort the journal if detect IO error when writing original buffer back
    jbd2: remove the out label in __jbd2_journal_remove_checkpoint()
    ext4: no need to verify new add extent block
    jbd2: clean up misleading comments for jbd2_fc_release_bufs
    ext4: add check to prevent attempting to resize an fs with sparse_super2
    ext4: consolidate checks for resize of bigalloc into ext4_resize_begin
    ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
    ext4: fsmap: fix the block/inode bitmap comment
    ext4: fix comment for s_hash_unsigned
    ext4: use local variable ei instead of EXT4_I() macro
    ext4: fix avefreec in find_group_orlov
    ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
    ...

    Linus Torvalds
     

30 Jun, 2021

7 commits

  • Remove the CONFIG_BLOCK default to __set_page_dirty_buffers and just wire
    that method up for the missing instances.

    [hch@lst.de: ecryptfs: add a ->set_page_dirty cludge]
    Link: https://lkml.kernel.org/r/20210624125250.536369-1-hch@lst.de

    Link: https://lkml.kernel.org/r/20210614061512.3966143-4-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Jan Kara
    Cc: Al Viro
    Cc: Matthew Wilcox (Oracle)
    Cc: Tyler Hicks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The variable ret is being initialized with a value that is never read;
    the assignment is redundant and can be removed.

    Addresses-Coverity: ("Unused value")
    Link: https://lkml.kernel.org/r/20210613135148.74658-1-colin.king@canonical.com
    Signed-off-by: Colin Ian King
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     
  • simple_strtoull() is deprecated in some situations since it does not
    check for range overflow; use kstrtoull() instead.

    Link: https://lkml.kernel.org/r/20210526092020.554341-3-chenhuang5@huawei.com
    Signed-off-by: Chen Huang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Huang
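
As a rough userspace analogue of the difference, the sketch below wraps strtoull() so that overflow and trailing garbage are reported as errors instead of being silently accepted. parse_u64_checked() is a hypothetical helper written for this illustration; it only approximates kstrtoull() behaviour (for example, accepting one trailing newline) and is not the kernel implementation.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse an unsigned 64-bit value strictly.
 * Returns 0 on success, -ERANGE on overflow, -EINVAL on malformed
 * input. Hypothetical helper, approximating kstrtoull() semantics. */
static int parse_u64_checked(const char *s, int base,
                             unsigned long long *res)
{
    char *end;
    unsigned long long val;

    /* strtoull() would skip whitespace and accept '-'; reject both. */
    if (s == NULL || *s == '\0' || *s == '-' ||
        isspace((unsigned char)*s))
        return -EINVAL;

    errno = 0;
    val = strtoull(s, &end, base);
    if (errno == ERANGE)
        return -ERANGE;          /* value does not fit */
    if (end == s)
        return -EINVAL;          /* no digits consumed */
    if (*end == '\n')
        end++;                   /* tolerate one trailing newline */
    if (*end != '\0')
        return -EINVAL;          /* trailing garbage */

    *res = val;
    return 0;
}
```

A caller can then distinguish "18446744073709551616" (overflow) from "12abc" (malformed), which a bare strtoull() or simple_strtoull() style call would not flag.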
     
  • In commit 60f91826ca62 ("buffer: Avoid setting buffer bits that are
    already set"), a test_bit() check was added to set_buffer_##name, which
    is the same check that buffer_##name performs. The
    !buffer_uptodate(bh) test here is therefore redundant. Remove it.

    Link: https://lkml.kernel.org/r/20210425025702.13628-1-wanjiabing@vivo.com
    Signed-off-by: Wan Jiabing
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wan Jiabing
     
  • The pointer queue is being initialized with a value that is never read and
    it is being updated later with a new value. The initialization is
    redundant and can be removed.

    Addresses-Coverity: ("Unused value")
    Link: https://lkml.kernel.org/r/20210513113957.57539-1-colin.king@canonical.com
    Signed-off-by: Colin Ian King
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     
  • The snprintf() function returns the number of bytes which would have been
    printed if the buffer was large enough. In other words it can return ">=
    remain" but this code assumes it returns "== remain".

    The run time impact of this bug is not very severe. The next iteration
    through the loop would trigger a WARN() when we pass a negative limit to
    snprintf(). We would then return success instead of -E2BIG.

    The kernel implementation of snprintf() will never return negatives so
    there is no need to check and I have deleted that dead code.

    Link: https://lkml.kernel.org/r/20210511135350.GV1955@kadam
    Fixes: a860f6eb4c6a ("ocfs2: sysfile interfaces for online file check")
    Fixes: 74ae4e104dfc ("ocfs2: Create stack glue sysfs files.")
    Signed-off-by: Dan Carpenter
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
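
The snprintf() behaviour described in this entry can be shown with a small userspace sketch. join_names() is a hypothetical function made up for this example, not the ocfs2 code that was patched; it treats any return value greater than or equal to the remaining space as truncation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Append each name followed by a newline into buf.
 * Returns the number of bytes written (excluding the NUL), or
 * -E2BIG if the output would not fit. Hypothetical example. */
static int join_names(char *buf, size_t size,
                      const char *const *names, size_t count)
{
    size_t remain = size;
    size_t total = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        int ret = snprintf(buf + total, remain, "%s\n", names[i]);

        if (ret < 0)
            return -EINVAL;  /* userspace snprintf() may fail */
        /* ret is the length that WOULD have been written, so it can
         * exceed remain; ">=" catches every truncated case. */
        if ((size_t)ret >= remain)
            return -E2BIG;
        total += (size_t)ret;
        remain -= (size_t)ret;
    }
    return (int)total;
}
```

Checking only for "== remain" would miss the case where the formatted string is longer than the space left, which is exactly the situation the commit message describes.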
     
  • The list_head o2hb_node_events is initialized statically, so
    initializing it again with INIT_LIST_HEAD() is unnecessary.

    Link: https://lkml.kernel.org/r/20210511115847.3817395-1-yangyingliang@huawei.com
    Signed-off-by: Yang Yingliang
    Reported-by: Hulk Robot
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Yingliang
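
A minimal userspace sketch of why the runtime call is redundant: the macros below are simplified copies in the style of the kernel's list helpers, and toy_node_events is a hypothetical list name. A head defined with LIST_HEAD() already points at itself before any code runs.

```c
/* Simplified userspace versions of the kernel list primitives. */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Runtime initialisation: make the head point at itself. */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

/* A list is empty when the head's next pointer is the head itself. */
static inline int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Statically initialised: a valid empty list from program start. */
static LIST_HEAD(toy_node_events);
```

Calling INIT_LIST_HEAD(&toy_node_events) afterwards would store the same two pointers again, so it changes nothing.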
     

23 Jun, 2021

1 commit


05 Jun, 2021

1 commit

  • When fallocate punches holes beyond the inode size and the original
    isize is in the middle of the last cluster, the part from isize to the
    end of the cluster is zeroed with a buffered write. At that time isize
    has not yet been updated to match the new size, so if writeback kicks
    in, it invokes ocfs2_writepage()->block_write_full_page(), where the
    pages beyond the inode size are dropped. That causes file corruption.
    Fix this by zeroing out the EOF blocks when extending the inode size.

    Running the following command with qemu-img 4.2.1 easily produces a
    corrupted converted image file.

    qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
    -O qcow2 -o compat=1.1 $qcow_image.conv

    qemu uses fallocate like this: it first punches holes beyond the inode
    size, then extends the inode size.

    fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
    fallocate(11, 0, 2276196352, 65536) = 0

    v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
    v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/

    Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Jan Kara
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
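
The byte range at issue, from isize to the end of the cluster that contains it, can be computed with simple mask arithmetic. The sketch below is illustrative only: eof_cluster_tail() is a hypothetical helper, and the cluster size is a caller-supplied assumption that must be a power of two.

```c
#include <stdint.h>

/* Compute the range [*start, *end) between isize and the end of the
 * cluster containing it. Returns 1 if there is a partial tail to
 * zero, or 0 if isize is already cluster-aligned.
 * cluster_size must be a power of two. Hypothetical helper. */
static int eof_cluster_tail(uint64_t isize, uint64_t cluster_size,
                            uint64_t *start, uint64_t *end)
{
    uint64_t mask = cluster_size - 1;

    if ((isize & mask) == 0)
        return 0;                    /* nothing past EOF in this cluster */

    *start = isize;
    *end = (isize + mask) & ~mask;   /* round up to the cluster boundary */
    return 1;
}
```

For example, with an assumed 4096-byte cluster and isize 5000, the tail is bytes 5000 through 8191; with isize 8192 there is no partial cluster to zero.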
     

07 May, 2021

1 commit

  • The section "19) Editor modelines and other cruft" in
    Documentation/process/coding-style.rst clearly says, "Do not include any
    of these in source files."

    I recently received a patch to explicitly add a new one.

    Let's do treewide cleanups, otherwise some people follow the existing code
    and attempt to upstream their favorite editor setups.

    It is even nicer if scripts/checkpatch.pl can check it.

    If we want to impose coding style in an editor-independent manner, I think
    editorconfig (patch [1]) is a saner solution.

    [1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/

    Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
    Signed-off-by: Masahiro Yamada
    Acked-by: Geert Uytterhoeven
    Reviewed-by: Miguel Ojeda [auxdisplay]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

01 May, 2021

4 commits

  • Fix the following clang warning:

    fs/ocfs2/dlm/dlmrecovery.c:129:20: warning: unused function 'dlm_reset_recovery' [-Wunused-function].

    Link: https://lkml.kernel.org/r/1618382761-5784-1-git-send-email-jiapeng.chong@linux.alibaba.com
    Signed-off-by: Jiapeng Chong
    Reported-by: Abaci Robot
    Reviewed-by: Wengang Wang
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiapeng Chong
     
  • s/cluter/cluster/

    Link: https://lkml.kernel.org/r/20210324072931.5056-1-unixbhaskar@gmail.com
    Signed-off-by: Bhaskar Chowdhury
    Acked-by: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaskar Chowdhury
     
  • Using the map_flag() macro is tricky, and coccicheck outputs the
    following warning:

    fs/ocfs2/stack_o2cb.c:69:5-16: Unneeded variable: "o2dlm_flags"

    So map the flags directly in flags_to_o2dlm() to make coccicheck happy.
    Also remove the BUG_ON() here to simplify the code, since it has run
    well for a long time.

    Link: https://lkml.kernel.org/r/1616138664-35935-1-git-send-email-joseph.qi@linux.alibaba.com
    Signed-off-by: Joseph Qi
    Reviewed-by: Wengang Wang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
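
A direct flag mapping of the kind described in this entry might look like the sketch below. All names and bit values here (SRC_LKF_*, DST_*, toy_flags_to_dst()) are hypothetical placeholders chosen for illustration; they are not the real DLM flag definitions or the actual flags_to_o2dlm() body.

```c
/* Hypothetical source and destination flag bits, illustration only. */
#define SRC_LKF_NOQUEUE  0x01
#define SRC_LKF_CANCEL   0x02
#define SRC_LKF_CONVERT  0x04

#define DST_NOQUEUE      0x100
#define DST_CANCEL       0x200
#define DST_CONVERT      0x400

/* Translate each known source bit to its destination bit with plain
 * if-statements instead of a helper macro. Unknown bits are ignored. */
static int toy_flags_to_dst(int src_flags)
{
    int dst_flags = 0;

    if (src_flags & SRC_LKF_NOQUEUE)
        dst_flags |= DST_NOQUEUE;
    if (src_flags & SRC_LKF_CANCEL)
        dst_flags |= DST_CANCEL;
    if (src_flags & SRC_LKF_CONVERT)
        dst_flags |= DST_CONVERT;

    return dst_flags;
}
```

Writing the mapping out explicitly keeps the result in a single variable that is visibly read and returned, which is easier for static checkers to follow than a macro that modifies variables implicitly.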
     
  • Fix the following coccicheck warning:

    fs/ocfs2/blockcheck.c:232:0-23: WARNING: blockcheck_fops should be defined with DEFINE_DEBUGFS_ATTRIBUTE

    Link: https://lkml.kernel.org/r/1614155230-57292-1-git-send-email-yang.lee@linux.alibaba.com
    Signed-off-by: Yang Li
    Reported-by: Abaci Robot
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Li
     

28 Apr, 2021

2 commits

  • Pull fileattr conversion updates from Miklos Szeredi via Al Viro:
    "This splits the handling of FS_IOC_[GS]ETFLAGS from ->ioctl() into a
    separate method.

    The interface is reasonably uniform across the filesystems that
    support it and gives nice boilerplate removal"

    * 'miklos.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
    ovl: remove unneeded ioctls
    fuse: convert to fileattr
    fuse: add internal open/release helpers
    fuse: unsigned open flags
    fuse: move ioctl to separate source file
    vfs: remove unused ioctl helpers
    ubifs: convert to fileattr
    reiserfs: convert to fileattr
    ocfs2: convert to fileattr
    nilfs2: convert to fileattr
    jfs: convert to fileattr
    hfsplus: convert to fileattr
    efivars: convert to fileattr
    xfs: convert to fileattr
    orangefs: convert to fileattr
    gfs2: convert to fileattr
    f2fs: convert to fileattr
    ext4: convert to fileattr
    ext2: convert to fileattr
    btrfs: convert to fileattr
    ...

    Linus Torvalds
     
  • Pull vfs inode type handling updates from Al Viro:
    "We should never change the type bits of ->i_mode or the method tables
    (->i_op and ->i_fop) of a live inode.

    Unfortunately, not all filesystems took care to prevent that"

    * 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    spufs: fix bogosity in S_ISGID handling
    9p: missing chunk of "fs/9p: Don't update file type when updating file attributes"
    openpromfs: don't do unlock_new_inode() until the new inode is set up
    hostfs_mknod(): don't bother with init_special_inode()
    cifs: have cifs_fattr_to_inode() refuse to change type on live inode
    cifs: have ->mkdir() handle race with another client sanely
    do_cifs_create(): don't set ->i_mode of something we had not created
    gfs2: be careful with inode refresh
    ocfs2_inode_lock_update(): make sure we don't change the type bits of i_mode
    orangefs_inode_is_stale(): i_mode type bits do *not* form a bitmap...
    vboxsf: don't allow to change the inode type
    afs: Fix updating of i_mode due to 3rd party change
    ceph: don't allow type or device number to change on non-I_NEW inodes
    ceph: fix up error handling with snapdirs
    new helper: inode_wrong_type()

    Linus Torvalds
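
The point that the i_mode type bits "do *not* form a bitmap" can be seen with a short userspace sketch. The TOY_S_* constants mirror the octal file-type values used on Linux, and both functions are hypothetical helpers written for this example rather than the kernel's inode_wrong_type() itself. Comparing the masked type fields is reliable, while testing a single type bit is not.

```c
#include <stdbool.h>

/* File-type field of a mode value, using the Linux octal layout. */
#define TOY_S_IFMT   0170000u
#define TOY_S_IFCHR  0020000u
#define TOY_S_IFDIR  0040000u
#define TOY_S_IFBLK  0060000u
#define TOY_S_IFREG  0100000u

/* True if the two modes describe different file types.
 * Permission bits are ignored. */
static bool toy_mode_wrong_type(unsigned int old_mode,
                                unsigned int new_mode)
{
    return ((old_mode ^ new_mode) & TOY_S_IFMT) != 0;
}

/* Correct directory test: compare the whole type field. */
static bool toy_is_dir(unsigned int mode)
{
    return (mode & TOY_S_IFMT) == TOY_S_IFDIR;
}
```

Because TOY_S_IFBLK shares a bit with TOY_S_IFDIR, an expression such as (mode & TOY_S_IFDIR) is non-zero for block devices too, which is why the type field has to be compared as a whole.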
     

12 Apr, 2021

1 commit


10 Apr, 2021

1 commit

  • The following deadlock is detected:

    The truncate -> setattr path is waiting for pending direct IO to finish
    (inode->i_dio_count becomes zero) with inode->i_rwsem held (down_write).

    PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9"
    #0 __schedule at ffffffff818667cc
    #1 schedule at ffffffff81866de6
    #2 inode_dio_wait at ffffffff812a2d04
    #3 ocfs2_setattr at ffffffffc05f322e [ocfs2]
    #4 notify_change at ffffffff812a5a09
    #5 do_truncate at ffffffff812808f5
    #6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2
    #7 sys_ftruncate at ffffffff81280d8e
    #8 do_syscall_64 at ffffffff81003949
    #9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad

    The dio completion path is about to complete one direct IO (decrementing
    inode->i_dio_count), but before that it hangs trying to lock
    inode->i_rwsem:

    #0 __schedule+700 at ffffffff818667cc
    #1 schedule+54 at ffffffff81866de6
    #2 rwsem_down_write_failed+536 at ffffffff8186aa28
    #3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7
    #4 down_write+45 at ffffffff81869c9d
    #5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
    #6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
    #7 dio_complete+140 at ffffffff812c873c
    #8 dio_aio_complete_work+25 at ffffffff812c89f9
    #9 process_one_work+361 at ffffffff810b1889
    #10 worker_thread+77 at ffffffff810b233d
    #11 kthread+261 at ffffffff810b7fd5
    #12 ret_from_fork+62 at ffffffff81a0035e

    Thus the above forms an ABBA deadlock. The same deadlock was mentioned
    in upstream commit 28f5a8a7c033 ("ocfs2: should wait dio before inode
    lock in ocfs2_setattr()"). It seems that commit only removed the
    cluster lock (the victim of the above deadlock) from the ABBA deadlock
    party.

    End-user visible effects: processes hang in the truncate ->
    ocfs2_setattr path, and other processes hang in the
    ocfs2_dio_end_io_write path.

    This fixes the deadlock itself. It removes the inode_lock() call from
    the dio completion path to remove the deadlock, and adds the
    ip_alloc_sem lock in the setattr path to synchronize the inode
    modifications.

    [wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested]
    Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com

    Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com
    Signed-off-by: Wengang Wang
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wengang Wang
     

13 Mar, 2021

1 commit


25 Feb, 2021

1 commit

  • Fix the following coccicheck warnings:

    fs/ocfs2/refcounttree.c:981:16-18: WARNING !A || A && B is equivalent to !A || B.

    Link: https://lkml.kernel.org/r/1612235424-80367-1-git-send-email-jiapeng.chong@linux.alibaba.com
    Signed-off-by: Jiapeng Chong
    Reported-by: Abaci Robot
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Gang He
    Cc: Jun Piao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiapeng Chong
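
The equivalence reported by coccicheck can be confirmed by checking all four input combinations. The function names below are hypothetical and exist only for this check.

```c
#include <stdbool.h>

/* The form flagged by the warning. */
static bool cond_original(bool a, bool b)
{
    return !a || (a && b);
}

/* The simplified form. */
static bool cond_simplified(bool a, bool b)
{
    return !a || b;
}

/* Returns true if both forms agree for every combination of inputs. */
static bool forms_equivalent(void)
{
    int a, b;

    for (a = 0; a <= 1; a++)
        for (b = 0; b <= 1; b++)
            if (cond_original(a != 0, b != 0) !=
                cond_simplified(a != 0, b != 0))
                return false;
    return true;
}
```

The second operand of || is only evaluated when a is true, so the extra "a &&" in the original form never changes the result.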