Eric Lee / smarc-fsl-linux-kernel

04 Feb, 2020

2 commits

45586c707 treewide: remove redundant IS_ERR() before error code check ... Browse Code »

'PTR_ERR(p) == -E*' is a stronger condition than IS_ERR(p).
Hence, IS_ERR(p) is unneeded.

The semantic patch that generates this commit is as follows:

//
@@
expression ptr;
constant error_code;
@@
-IS_ERR(ptr) && (PTR_ERR(ptr) == - error_code)
+PTR_ERR(ptr) == - error_code
//

Link: http://lkml.kernel.org/r/20200106045833.1725-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada
Cc: Julia Lawall
Acked-by: Stephen Boyd [drivers/clk/clk.c]
Acked-by: Bartosz Golaszewski [GPIO]
Acked-by: Wolfram Sang [drivers/i2c]
Acked-by: Rafael J. Wysocki [acpi/scan.c]
Acked-by: Rob Herring
Cc: Eric Biggers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2020-02-04 11:05:27 +0800
2d797e9ff ocfs2: fix oops when writing cloned file ... Browse Code »

Writing a cloned file triggers a kernel oops and the user-space command
process is also killed by the system. The bug can be reproduced stably
via:

1) create a file under ocfs2 file system directory.

journalctl -b > aa.txt

2) create a cloned file for this file.

reflink aa.txt bb.txt

3) write the cloned file with dd command.

dd if=/dev/zero of=bb.txt bs=512 count=1 conv=notrunc

The dd command is killed by the kernel, then you can see the oops message
via dmesg command.

[ 463.875404] BUG: kernel NULL pointer dereference, address: 0000000000000028
[ 463.875413] #PF: supervisor read access in kernel mode
[ 463.875416] #PF: error_code(0x0000) - not-present page
[ 463.875418] PGD 0 P4D 0
[ 463.875425] Oops: 0000 [#1] SMP PTI
[ 463.875431] CPU: 1 PID: 2291 Comm: dd Tainted: G OE 5.3.16-2-default
[ 463.875433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 463.875500] RIP: 0010:ocfs2_refcount_cow+0xa4/0x5d0 [ocfs2]
[ 463.875505] Code: 06 89 6c 24 38 89 eb f6 44 24 3c 02 74 be 49 8b 47 28
[ 463.875508] RSP: 0018:ffffa2cb409dfce8 EFLAGS: 00010202
[ 463.875512] RAX: ffff8b1ebdca8000 RBX: 0000000000000001 RCX: ffff8b1eb73a9df0
[ 463.875515] RDX: 0000000000056a01 RSI: 0000000000000000 RDI: 0000000000000000
[ 463.875517] RBP: 0000000000000001 R08: ffff8b1eb73a9de0 R09: 0000000000000000
[ 463.875520] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 463.875522] R13: ffff8b1eb922f048 R14: 0000000000000000 R15: ffff8b1eb922f048
[ 463.875526] FS: 00007f8f44d15540(0000) GS:ffff8b1ebeb00000(0000) knlGS:0000000000000000
[ 463.875529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 463.875532] CR2: 0000000000000028 CR3: 000000003c17a000 CR4: 00000000000006e0
[ 463.875546] Call Trace:
[ 463.875596] ? ocfs2_inode_lock_full_nested+0x18b/0x960 [ocfs2]
[ 463.875648] ocfs2_file_write_iter+0xaf8/0xc70 [ocfs2]
[ 463.875672] new_sync_write+0x12d/0x1d0
[ 463.875688] vfs_write+0xad/0x1a0
[ 463.875697] ksys_write+0xa1/0xe0
[ 463.875710] do_syscall_64+0x60/0x1f0
[ 463.875743] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 463.875758] RIP: 0033:0x7f8f4482ed44
[ 463.875762] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00
[ 463.875765] RSP: 002b:00007fff300a79d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 463.875769] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8f4482ed44
[ 463.875771] RDX: 0000000000000200 RSI: 000055f771b5c000 RDI: 0000000000000001
[ 463.875774] RBP: 0000000000000200 R08: 00007f8f44af9c78 R09: 0000000000000003
[ 463.875776] R10: 000000000000089f R11: 0000000000000246 R12: 000055f771b5c000
[ 463.875779] R13: 0000000000000200 R14: 0000000000000000 R15: 000055f771b5c000

This regression problem was introduced by commit e74540b28556 ("ocfs2:
protect extent tree in ocfs2_prepare_inode_for_write()").

Link: http://lkml.kernel.org/r/20200121050153.13290-1-ghe@suse.com
Fixes: e74540b28556 ("ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()").
Signed-off-by: Gang He
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Jun Piao
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gang He
2020-02-04 11:05:23 +0800

01 Feb, 2020

7 commits

25b69918d ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction ... Browse Code »

For the uniform format, we use ocfs2_update_inode_fsync_trans() to
access t_tid in handle->h_transaction

Link: http://lkml.kernel.org/r/6ff9a312-5f7d-0e27-fb51-bc4e062fcd97@huawei.com
Signed-off-by: Yan Wang
Reviewed-by: Jun Piao
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Cc: Changwei Ge
Cc: Gang He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

wangyan
2020-02-01 02:30:36 +0800
9f16ca48f ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans() ... Browse Code »

I found a NULL pointer dereference in ocfs2_update_inode_fsync_trans(),
handle->h_transaction may be NULL in this situation:

ocfs2_file_write_iter
->__generic_file_write_iter
->generic_perform_write
->ocfs2_write_begin
->ocfs2_write_begin_nolock
->ocfs2_write_cluster_by_desc
->ocfs2_write_cluster
->ocfs2_mark_extent_written
->ocfs2_change_extent_flag
->ocfs2_split_extent
->ocfs2_try_to_merge_extent
->ocfs2_extend_rotate_transaction
->ocfs2_extend_trans
->jbd2_journal_restart
->jbd2__journal_restart
// handle->h_transaction is NULL here
->handle->h_transaction = NULL;
->start_this_handle
/* journal aborted due to storage
network disconnection, return error */
->return -EROFS;
/* line 3806 in ocfs2_try_to_merge_extent (),
it will ignore ret error. */
->ret = 0;
->...
->ocfs2_write_end
->ocfs2_write_end_nolock
->ocfs2_update_inode_fsync_trans
// NULL pointer dereference
->oi->i_sync_tid = handle->h_transaction->t_tid;

The information of NULL pointer dereference as follows:
JBD2: Detected IO errors while flushing file data on dm-11-45
Aborting journal on device dm-11-45.
JBD2: Error -5 detected when updating journal superblock for dm-11-45.
(dd,22081,3):ocfs2_extend_trans:474 ERROR: status = -30
(dd,22081,3):ocfs2_try_to_merge_extent:3877 ERROR: status = -30
Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000008
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000e74e1338
[0000000000000008] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Process dd (pid: 22081, stack limit = 0x00000000584f35a9)
CPU: 3 PID: 22081 Comm: dd Kdump: loaded
Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
pstate: 60400009 (nZCv daif +PAN -UAO)
pc : ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2]
lr : ocfs2_write_end_nolock+0x2a0/0x550 [ocfs2]
sp : ffff0000459fba70
x29: ffff0000459fba70 x28: 0000000000000000
x27: ffff807ccf7f1000 x26: 0000000000000001
x25: ffff807bdff57970 x24: ffff807caf1d4000
x23: ffff807cc79e9000 x22: 0000000000001000
x21: 000000006c6cd000 x20: ffff0000091d9000
x19: ffff807ccb239db0 x18: ffffffffffffffff
x17: 000000000000000e x16: 0000000000000007
x15: ffff807c5e15bd78 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000001
x9 : 0000000000000228 x8 : 000000000000000c
x7 : 0000000000000fff x6 : ffff807a308ed6b0
x5 : ffff7e01f10967c0 x4 : 0000000000000018
x3 : d0bc661572445600 x2 : 0000000000000000
x1 : 000000001b2e0200 x0 : 0000000000000000
Call trace:
ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2]
ocfs2_write_end+0x4c/0x80 [ocfs2]
generic_perform_write+0x108/0x1a8
__generic_file_write_iter+0x158/0x1c8
ocfs2_file_write_iter+0x668/0x950 [ocfs2]
__vfs_write+0x11c/0x190
vfs_write+0xac/0x1c0
ksys_write+0x6c/0xd8
__arm64_sys_write+0x24/0x30
el0_svc_common+0x78/0x130
el0_svc_handler+0x38/0x78
el0_svc+0x8/0xc

To prevent NULL pointer dereference in this situation, we use
is_handle_aborted() before using handle->h_transaction->t_tid.

Link: http://lkml.kernel.org/r/03e750ab-9ade-83aa-b000-b9e81e34e539@huawei.com
Signed-off-by: Yan Wang
Reviewed-by: Jun Piao
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Cc: Changwei Ge
Cc: Gang He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

wangyan
2020-02-01 02:30:36 +0800
dd3e7cba1 ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use ... Browse Code »

There are users already and will be more of BITS_TO_BYTES() macro. Move
it to bitops.h for wider use.

In the case of ocfs2 the replacement is identical.

As for bnx2x, there are two places where floor version is used. In the
first case to calculate the amount of structures that can fit one memory
page. In this case obviously the ceiling variant is correct and
original code might have a potential bug, if amount of bits % 8 is not
0. In the second case the macro is used to calculate bytes transmitted
in one microsecond. This will work for all speeds which is multiply of
1Gbps without any change, for the rest new code will give ceiling value,
for instance 100Mbps will give 13 bytes, while old code gives 12 bytes
and the arithmetically correct one is 12.5 bytes. Further the value is
used to setup timer threshold which in any case has its own margins due
to certain resolution. I don't see here an issue with slightly shifting
thresholds for low speed connections, the card is supposed to utilize
highest available rate, which is usually 10Gbps.

Link: http://lkml.kernel.org/r/20200108121316.22411-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko
Reviewed-by: Joseph Qi
Acked-by: Sudarsana Reddy Kalluru
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Shevchenko
2020-02-01 02:30:36 +0800
d8f187506 ocfs2/dlm: remove redundant assignment to ret ... Browse Code »

The variable ret is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.

Addresses Coverity ("Unused value")

Link: http://lkml.kernel.org/r/20191202164833.62865-1-colin.king@canonical.com
Signed-off-by: Colin Ian King
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Colin Ian King
2020-02-01 02:30:36 +0800
ca322fb60 ocfs2: make local header paths relative to C files ... Browse Code »

Gang He reports the failure of building fs/ocfs2/ as an external module
of the kernel installed on the system:

$ cd fs/ocfs2
$ make -C /lib/modules/`uname -r`/build M=`pwd` modules

If you want to make it work reliably, I'd recommend to remove ccflags-y
from the Makefiles, and to make header paths relative to the C files. I
think this is the correct usage of the #include "..." directive.

Link: http://lkml.kernel.org/r/20191227022950.14804-1-ghe@suse.com
Signed-off-by: Masahiro Yamada
Signed-off-by: Gang He
Reported-by: Gang He
Reviewed-by: Gang He
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Cc: Changwei Ge
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masahiro Yamada
2020-02-01 02:30:36 +0800
5b43d6453 ocfs2: remove unneeded semicolons ... Browse Code »

Fixes coccicheck warnings:

fs/ocfs2/cluster/quorum.c:76:2-3: Unneeded semicolon
fs/ocfs2/dlmglue.c:573:2-3: Unneeded semicolon

Link: http://lkml.kernel.org/r/6ee3aa16-9078-30b1-df3f-22064950bd98@linux.alibaba.com
Signed-off-by: zhengbin
Reported-by: Hulk Robot
Acked-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

zhengbin
2020-02-01 02:30:36 +0800
67e2d2eb5 fs: ocfs: remove unnecessary assertion in dlm_migrate_lockres ... Browse Code »

In the only caller of dlm_migrate_lockres() - dlm_empty_lockres(),
target is checked for O2NM_MAX_NODES. Thus, the assertion in
dlm_migrate_lockres() is unnecessary and can be removed. The patch
eliminates such a check.

Link: http://lkml.kernel.org/r/20191218194111.26041-1-pakki001@umn.edu
Signed-off-by: Aditya Pakki
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Aditya Pakki
2020-02-01 02:30:36 +0800

05 Jan, 2020

2 commits

b73eba2a8 ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less ... Browse Code »

Because ocfs2_get_dlm_debug() function is called once less here, ocfs2
file system will trigger the system crash, usually after ocfs2 file
system is unmounted.

This system crash is caused by a generic memory corruption, these crash
backtraces are not always the same, for exapmle,

ocfs2: Unmounting device (253,16) on (node 172167785)
general protection fault: 0000 [#1] SMP PTI
CPU: 3 PID: 14107 Comm: fence_legacy Kdump:
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:__kmalloc+0xa5/0x2a0
Code: 00 00 4d 8b 07 65 4d 8b
RSP: 0018:ffffaa1fc094bbe8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: d310a8800d7a3faf RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff96e68fc036c0
RBP: d310a8800d7a3faf R08: ffff96e6ffdb10a0 R09: 00000000752e7079
R10: 000000000001c513 R11: 0000000004091041 R12: 0000000000000dc0
R13: 0000000000000039 R14: ffff96e68fc036c0 R15: ffff96e68fc036c0
FS: 00007f699dfba540(0000) GS:ffff96e6ffd80000(0000) knlGS:00000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f3a9d9b768 CR3: 000000002cd1c000 CR4: 00000000000006e0
Call Trace:
ext4_htree_store_dirent+0x35/0x100 [ext4]
htree_dirblock_to_tree+0xea/0x290 [ext4]
ext4_htree_fill_tree+0x1c1/0x2d0 [ext4]
ext4_readdir+0x67c/0x9d0 [ext4]
iterate_dir+0x8d/0x1a0
__x64_sys_getdents+0xab/0x130
do_syscall_64+0x60/0x1f0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f699d33a9fb

This regression problem was introduced by commit e581595ea29c ("ocfs: no
need to check return value of debugfs_create functions").

Link: http://lkml.kernel.org/r/20191225061501.13587-1-ghe@suse.com
Fixes: e581595ea29c ("ocfs: no need to check return value of debugfs_create functions")
Signed-off-by: Gang He
Acked-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Cc: [5.3+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gang He
2020-01-05 05:55:09 +0800
397eac17f ocfs2: call journal flush to mark journal as empty after journal recovery when mount ... Browse Code »

If journal is dirty when mount, it will be replayed but jbd2 sb log tail
cannot be updated to mark a new start because journal->j_flag has
already been set with JBD2_ABORT first in journal_init_common.

When a new transaction is committed, it will be recored in block 1
first(journal->j_tail is set to 1 in journal_reset). If emergency
restart happens again before journal super block is updated
unfortunately, the new recorded trans will not be replayed in the next
mount.

The following steps describe this procedure in detail.
1. mount and touch some files
2. these transactions are committed to journal area but not checkpointed
3. emergency restart
4. mount again and its journals are replayed
5. journal super block's first s_start is 1, but its s_seq is not updated
6. touch a new file and its trans is committed but not checkpointed
7. emergency restart again
8. mount and journal is dirty, but trans committed in 6 will not be
replayed.

This exception happens easily when this lun is used by only one node.
If it is used by multi-nodes, other node will replay its journal and its
journal super block will be updated after recovery like what this patch
does.

ocfs2_recover_node->ocfs2_replay_journal.

The following jbd2 journal can be generated by touching a new file after
journal is replayed, and seq 15 is the first valid commit, but first seq
is 13 in journal super block.

logdump:
Block 0: Journal Superblock
Seq: 0 Type: 4 (JBD2_SUPERBLOCK_V2)
Blocksize: 4096 Total Blocks: 32768 First Block: 1
First Commit ID: 13 Start Log Blknum: 1
Error: 0
Feature Compat: 0
Feature Incompat: 2 block64
Feature RO compat: 0
Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36
FS Share Cnt: 1 Dynamic Superblk Blknum: 0
Per Txn Block Limit Journal: 0 Data: 0

Block 1: Journal Commit Block
Seq: 14 Type: 2 (JBD2_COMMIT_BLOCK)

Block 2: Journal Descriptor
Seq: 15 Type: 1 (JBD2_DESCRIPTOR_BLOCK)
No. Blocknum Flags
0. 587 none
UUID: 00000000000000000000000000000000
1. 8257792 JBD2_FLAG_SAME_UUID
2. 619 JBD2_FLAG_SAME_UUID
3. 24772864 JBD2_FLAG_SAME_UUID
4. 8257802 JBD2_FLAG_SAME_UUID
5. 513 JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG
...
Block 7: Inode
Inode: 8257802 Mode: 0640 Generation: 57157641 (0x3682809)
FS Generation: 2839773110 (0xa9437fb6)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x1) InlineData
User: 0 (root) Group: 0 (root) Size: 7
Links: 1 Clusters: 0
ctime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
atime: 0x5de5d870 0x113181a1 -- Tue Dec 3 11:37:20.288457121 2019
mtime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
dtime: 0x0 -- Thu Jan 1 08:00:00 1970
...
Block 9: Journal Commit Block
Seq: 15 Type: 2 (JBD2_COMMIT_BLOCK)

The following is journal recovery log when recovering the upper jbd2
journal when mount again.

syslog:
ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it.
fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0
fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1
fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2
fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13

Due to first commit seq 13 recorded in journal super is not consistent
with the value recorded in block 1(seq is 14), journal recovery will be
terminated before seq 15 even though it is an unbroken commit, inode
8257802 is a new file and it will be lost.

Link: http://lkml.kernel.org/r/20191217020140.2197-1-li.kai4@h3c.com
Signed-off-by: Kai Li
Reviewed-by: Joseph Qi
Reviewed-by: Changwei Ge
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kai Li
2020-01-05 05:55:09 +0800

02 Dec, 2019

2 commits

596cf45cb Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge updates from Andrew Morton:
"Incoming:

- a small number of updates to scripts/, ocfs2 and fs/buffer.c

- most of MM

I still have quite a lot of material (mostly not MM) staged after
linux-next due to -next dependencies. I'll send those across next week
as the preprequisites get merged up"

* emailed patches from Andrew Morton : (135 commits)
mm/page_io.c: annotate refault stalls from swap_readpage
mm/Kconfig: fix trivial help text punctuation
mm/Kconfig: fix indentation
mm/memory_hotplug.c: remove __online_page_set_limits()
mm: fix typos in comments when calling __SetPageUptodate()
mm: fix struct member name in function comments
mm/shmem.c: cast the type of unmap_start to u64
mm: shmem: use proper gfp flags for shmem_writepage()
mm/shmem.c: make array 'values' static const, makes object smaller
userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
userfaultfd: wrap the common dst_vma check into an inlined function
userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
userfaultfd: use vma_pagesize for all huge page size calculation
mm/madvise.c: use PAGE_ALIGN[ED] for range checking
mm/madvise.c: replace with page_size() in madvise_inject_error()
mm/mmap.c: make vma_merge() comment more easy to understand
mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
autonuma: reduce cache footprint when scanning page tables
autonuma: fix watermark checking in migrate_balanced_pgdat()
...

Linus Torvalds
2019-12-02 12:36:41 +0800
0da522107 Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground ... Browse Code »

Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
"As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need
support for time64_t.

In completely unrelated work, I spent time on cleaning up parts of
this file in the past, moving things out into drivers instead.

After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the
rest of it and move it all into drivers.

This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which
is the scsi ioctl handlers. I have patches for those as well, but they
need more testing or possibly a rewrite"

* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
scsi: sd: enable compat ioctls for sed-opal
pktcdvd: add compat_ioctl handler
compat_ioctl: move SG_GET_REQUEST_TABLE handling
compat_ioctl: ppp: move simple commands into ppp_generic.c
compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
compat_ioctl: unify copy-in of ppp filters
tty: handle compat PPP ioctls
compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
compat_ioctl: handle SIOCOUTQNSD
af_unix: add compat_ioctl support
compat_ioctl: reimplement SG_IO handling
compat_ioctl: move WDIOC handling into wdt drivers
fs: compat_ioctl: move FITRIM emulation into file systems
gfs2: add compat_ioctl support
compat_ioctl: remove unused convert_in_user macro
compat_ioctl: remove last RAID handling code
compat_ioctl: remove /dev/raw ioctl translation
compat_ioctl: remove PCI ioctl translation
compat_ioctl: remove joystick ioctl translation
...

Linus Torvalds
2019-12-02 05:46:15 +0800

01 Dec, 2019

4 commits

188c523e1 ocfs2: fix passing zero to 'PTR_ERR' warning ... Browse Code »

Fix a static code checker warning:
fs/ocfs2/acl.c:331
ocfs2_acl_chmod() warn: passing zero to 'PTR_ERR'

Link: http://lkml.kernel.org/r/1dee278b-6c96-eec2-ce76-fe6e07c6e20f@linux.alibaba.com
Fixes: 5ee0fbd50fd ("ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang")
Signed-off-by: Ding Xiang
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ding Xiang
2019-12-01 22:29:17 +0800
6a965666b Merge tag 'notifications-pipe-prep-20191115' of git://git.kernel.org/pub/scm/lin… ... Browse Code »

…ux/kernel/git/dhowells/linux-fs

Pull pipe rework from David Howells:
"This is my set of preparatory patches for building a general
notification queue on top of pipes. It makes a number of significant
changes:

- It removes the nr_exclusive argument from __wake_up_sync_key() as
this is always 1. This prepares for the next step:

- Adds wake_up_interruptible_sync_poll_locked() so that poll can be
woken up from a function that's holding the poll waitqueue
spinlock.

- Change the pipe buffer ring to be managed in terms of unbounded
head and tail indices rather than bounded index and length. This
means that reading the pipe only needs to modify one index, not
two.

- A selection of helper functions are provided to query the state of
the pipe buffer, plus a couple to apply updates to the pipe
indices.

- The pipe ring is allowed to have kernel-reserved slots. This allows
many notification messages to be spliced in by the kernel without
allowing userspace to pin too many pages if it writes to the same
pipe.

- Advance the head and tail indices inside the pipe waitqueue lock
and use wake_up_interruptible_sync_poll_locked() to poke poll
without having to take the lock twice.

- Rearrange pipe_write() to preallocate the buffer it is going to
write into and then drop the spinlock. This allows kernel
notifications to then be added the ring whilst it is filling the
buffer it allocated. The read side is stalled because the pipe
mutex is still held.

- Don't wake up readers on a pipe if there was already data in it
when we added more.

- Don't wake up writers on a pipe if the ring wasn't full before we
removed a buffer"

* tag 'notifications-pipe-prep-20191115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
pipe: Remove sync on wake_ups
pipe: Increase the writer-wakeup threshold to reduce context-switch count
pipe: Check for ring full inside of the spinlock in pipe_write()
pipe: Remove redundant wakeup from pipe_write()
pipe: Rearrange sequence in pipe_write() to preallocate slot
pipe: Conditionalise wakeup in pipe_read()
pipe: Advance tail pointer inside of wait spinlock in pipe_read()
pipe: Allow pipes to have kernel-reserved slots
pipe: Use head and tail pointers for the ring, not cursor and length
Add wake_up_interruptible_sync_poll_locked()
Remove the nr_exclusive argument from __wake_up_sync_key()
pipe: Reduce #inclusion of pipe_fs_i.h

Linus Torvalds
2019-12-01 06:12:13 +0800
b8072d5b3 Merge tag 'for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull ext2, quota, reiserfs cleanups and fixes from Jan Kara:

- Refactor the quota on/off kernel internal interfaces (mostly for
ubifs quota support as ubifs does not want to have inodes holding
quota information)

- A few other small quota fixes and cleanups

- Various small ext2 fixes and cleanups

- Reiserfs xattr fix and one cleanup

* tag 'for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
ext2: code cleanup for descriptor_loc()
fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
ext2: fix improper function comment
ext2: code cleanup for ext2_try_to_allocate()
ext2: skip unnecessary operations in ext2_try_to_allocate()
ext2: Simplify initialization in ext2_try_to_allocate()
ext2: code cleanup by calling ext2_group_last_block_no()
ext2: introduce new helper ext2_group_last_block_no()
reiserfs: replace open-coded atomic_dec_and_mutex_lock()
ext2: check err when partial != NULL
quota: Handle quotas without quota inodes in dquot_get_state()
quota: Make dquot_disable() work without quota inodes
quota: Drop dquot_enable()
fs: Use dquot_load_quota_inode() from filesystems
quota: Rename vfs_load_quota_inode() to dquot_load_quota_inode()
quota: Simplify dquot_resume()
quota: Factor out setup of quota inode
quota: Check that quota is not dirty before release
quota: fix livelock in dquot_writeback_dquots
ext2: don't set *count in the case of failure in ext2_try_to_allocate()
...

Linus Torvalds
2019-12-01 03:16:07 +0800
50b8b3f85 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 updates from Ted Ts'o:
"This merge window saw the the following new featuers added to ext4:

- Direct I/O via iomap (required the iomap-for-next branch from
Darrick as a prereq).

- Support for using dioread-nolock where the block size < page size.

- Support for encryption for file systems where the block size < page
size.

- Rework of journal credits handling so a revoke-heavy workload will
not cause the journal to run out of space.

- Replace bit-spinlocks with spinlocks in jbd2

Also included were some bug fixes and cleanups, mostly to clean up
corner cases from fuzzed file systems and error path handling"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (59 commits)
ext4: work around deleting a file with i_nlink == 0 safely
ext4: add more paranoia checking in ext4_expand_extra_isize handling
jbd2: make jbd2_handle_buffer_credits() handle reserved handles
ext4: fix a bug in ext4_wait_for_tail_page_commit
ext4: bio_alloc with __GFP_DIRECT_RECLAIM never fails
ext4: code cleanup for get_next_id
ext4: fix leak of quota reservations
ext4: remove unused variable warning in parse_options()
ext4: Enable encryption for subpage-sized blocks
fs/buffer.c: support fscrypt in block_read_full_page()
ext4: Add error handling for io_end_vec struct allocation
jbd2: Fine tune estimate of necessary descriptor blocks
jbd2: Provide trace event for handle restarts
ext4: Reserve revoke credits for freed blocks
jbd2: Make credit checking more strict
jbd2: Rename h_buffer_credits to h_total_credits
jbd2: Reserve space for revoke descriptor blocks
jbd2: Drop jbd2_space_needed()
jbd2: Account descriptor blocks into t_outstanding_credits
jbd2: Factor out common parts of stopping and restarting a handle
...

Linus Torvalds
2019-12-01 02:53:02 +0800

27 Nov, 2019

1 commit

168829ad0 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:

- A comprehensive rewrite of the robust/PI futex code's exit handling
to fix various exit races. (Thomas Gleixner et al)

- Rework the generic REFCOUNT_FULL implementation using
atomic_fetch_* operations so that the performance impact of the
cmpxchg() loops is mitigated for common refcount operations.

With these performance improvements the generic implementation of
refcount_t should be good enough for everybody - and this got
confirmed by performance testing, so remove ARCH_HAS_REFCOUNT and
REFCOUNT_FULL entirely, leaving the generic implementation enabled
unconditionally. (Will Deacon)

- Other misc changes, fixes, cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
lkdtm: Remove references to CONFIG_REFCOUNT_FULL
locking/refcount: Remove unused 'refcount_error_report()' function
locking/refcount: Consolidate implementations of refcount_t
locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
locking/refcount: Move saturation warnings out of line
locking/refcount: Improve performance of generic REFCOUNT_FULL code
locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the header
locking/refcount: Remove unused refcount_*_checked() variants
locking/refcount: Ensure integer operands are treated as signed
locking/refcount: Define constants for saturation and max refcount values
futex: Prevent exit livelock
futex: Provide distinct return value when owner is exiting
futex: Add mutex around futex exit
futex: Provide state handling for exec() as well
futex: Sanitize exit state handling
futex: Mark the begin of futex exit explicitly
futex: Set task::futex_state to DEAD right after handling futex exit
futex: Split futex_mm_release() for exit/exec
exit/exec: Seperate mm_release()
futex: Replace PF_EXITPIDONE with a state
...

Linus Torvalds
2019-11-27 08:02:40 +0800

23 Nov, 2019

1 commit

94b07b6f9 Revert "fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()" ... Browse Code »

This reverts commit 56e94ea132bb5c2c1d0b60a6aeb34dcb7d71a53d.

Commit 56e94ea132bb ("fs: ocfs2: fix possible null-pointer dereferences
in ocfs2_xa_prepare_entry()") introduces a regression that fail to
create directory with mount option user_xattr and acl. Actually the
reported NULL pointer dereference case can be correctly handled by
loc->xl_ops->xlo_add_entry(), so revert it.

Link: http://lkml.kernel.org/r/1573624916-83825-1-git-send-email-joseph.qi@linux.alibaba.com
Fixes: 56e94ea132bb ("fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()")
Signed-off-by: Joseph Qi
Reported-by: Thomas Voegtle
Acked-by: Changwei Ge
Cc: Jia-Ju Bai
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Gang He
Cc: Jun Piao
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joseph Qi
2019-11-23 01:11:18 +0800

07 Nov, 2019

1 commit

e74540b28 ocfs2: protect extent tree in ocfs2_prepare_inode_for_write() ... Browse Code »

When the extent tree is modified, it should be protected by inode
cluster lock and ip_alloc_sem.

The extent tree is accessed and modified in the
ocfs2_prepare_inode_for_write, but isn't protected by ip_alloc_sem.

The following is a case. The function ocfs2_fiemap is accessing the
extent tree, which is modified at the same time.

kernel BUG at fs/ocfs2/extent_map.c:475!
invalid opcode: 0000 [#1] SMP
Modules linked in: tun ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue [...]
CPU: 16 PID: 14047 Comm: o2info Not tainted 4.1.12-124.23.1.el6uek.x86_64 #2
Hardware name: Oracle Corporation ORACLE SERVER X7-2L/ASM, MB MECH, X7-2L, BIOS 42040600 10/19/2018
task: ffff88019487e200 ti: ffff88003daa4000 task.ti: ffff88003daa4000
RIP: ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2]
Call Trace:
ocfs2_fiemap+0x1e3/0x430 [ocfs2]
do_vfs_ioctl+0x155/0x510
SyS_ioctl+0x81/0xa0
system_call_fastpath+0x18/0xd8
Code: 18 48 c7 c6 60 7f 65 a0 31 c0 bb e2 ff ff ff 48 8b 4a 40 48 8b 7a 28 48 c7 c2 78 2d 66 a0 e8 38 4f 05 00 e9 28 fe ff ff 0f 1f 00 0b 66 0f 1f 44 00 00 bb 86 ff ff ff e9 13 fe ff ff 66 0f 1f
RIP ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2]
---[ end trace c8aa0c8180e869dc ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled

This issue can be reproduced every week in a production environment.

This issue is related to the usage mode. If others use ocfs2 in this
mode, the kernel will panic frequently.

[akpm@linux-foundation.org: coding style fixes]
[Fix new warning due to unused function by removing said function - Linus ]
Link: http://lkml.kernel.org/r/1568772175-2906-2-git-send-email-sunny.s.zhang@oracle.com
Signed-off-by: Shuning Zhang
Reviewed-by: Junxiao Bi
Reviewed-by: Gang He
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Joseph Qi
Cc: Changwei Ge
Cc: Jun Piao
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shuning Zhang
2019-11-07 00:47:08 +0800

06 Nov, 2019

4 commits

dae82c7fd Pull series refactoring quota enabling and disabling code. Browse Code »

Jan Kara
2019-11-06 17:52:10 +0800
a6d404084 Merge branch 'jk/jbd2-revoke-overflow' Browse Code »

Theodore Ts'o
2019-11-06 05:02:20 +0800
fdc3ef882 jbd2: Reserve space for revoke descriptor blocks ... Browse Code »

Extend functions for starting, extending, and restarting transaction
handles to take number of revoke records handle must be able to
accommodate. These functions then make sure transaction has enough
credits to be able to store resulting revoke descriptor blocks. Also
revoke code tracks number of revoke records created by a handle to catch
situation where some place didn't reserve enough space for revoke
records. Similarly to standard transaction credits, space for unused
reserved revoke records is released when the handle is stopped.

On the ext4 side we currently take a simplistic approach of reserving
space for 1024 revoke records for any transaction. This grows amount of
credits reserved for each handle only by a few and is enough for any
normal workload so that we don't hit warnings in jbd2. We will refine
the logic in following commits.

Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800
9797a9024 ocfs2: Use accessor function for h_buffer_credits ... Browse Code »

Use the jbd2 accessor function for h_buffer_credits.

Reviewed-by: Theodore Ts'o
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20191105164437.32602-12-jack@suse.cz
Signed-off-by: Theodore Ts'o

Jan Kara
2019-11-06 05:00:48 +0800

04 Nov, 2019

1 commit

7212b95e6 fs: Use dquot_load_quota_inode() from filesystems ... Browse Code »

Use dquot_load_quota_inode from filesystems instead of dquot_enable().
In all three cases we want to load quota inode and never use the
function to update quota flags.

Signed-off-by: Jan Kara

Jan Kara
2019-11-04 16:58:05 +0800

01 Nov, 2019

1 commit

df4bb5d12 quota: Check that quota is not dirty before release ... Browse Code »

There is a race window where quota was redirted once we drop dq_list_lock inside dqput(),
but before we grab dquot->dq_lock inside dquot_release()

TASK1 TASK2 (chowner)
->dqput()
we_slept:
spin_lock(&dq_list_lock)
if (dquot_dirty(dquot)) {
spin_unlock(&dq_list_lock);
dquot->dq_sb->dq_op->write_dquot(dquot);
goto we_slept
if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
spin_unlock(&dq_list_lock);
dquot->dq_sb->dq_op->release_dquot(dquot);
dqget()
mark_dquot_dirty()
dqput()
goto we_slept;
}
So dquot dirty quota will be released by TASK1, but on next we_sleept loop
we detect this and call ->write_dquot() for it.
XFSTEST: https://github.com/dmonakhov/xfstests/commit/440a80d4cbb39e9234df4d7240aee1d551c36107

Link: https://lore.kernel.org/r/20191031103920.3919-2-dmonakhov@openvz.org
CC: stable@vger.kernel.org
Signed-off-by: Dmitry Monakhov
Signed-off-by: Jan Kara

Dmitry Monakhov
2019-11-01 02:07:42 +0800

24 Oct, 2019

1 commit

d055b4fb4 pipe: Reduce #inclusion of pipe_fs_i.h ... Browse Code »

Remove some #inclusions of linux/pipe_fs_i.h that don't seem to be
necessary any more.

Signed-off-by: David Howells

David Howells
2019-10-24 00:02:34 +0800

23 Oct, 2019

1 commit

314999dcb fs: compat_ioctl: move FITRIM emulation into file systems ... Browse Code »

Remove the special case for FITRIM, and make file systems
handle that like all other ioctl commands with their own
handlers.

Cc: linux-ext4@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: Mikulas Patocka
Cc: linux-nilfs@vger.kernel.org
Cc: ocfs2-devel@oss.oracle.com
Signed-off-by: Arnd Bergmann

Arnd Bergmann
2019-10-23 23:23:46 +0800

21 Oct, 2019

1 commit

464170647 jbd2: Make state lock a spinlock ... Browse Code »

Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep
on RT, e.g. spin_lock(), alloc/free(), are invoked inside the lock held
region because bit spinlocks disable preemption even on RT.

A first attempt was to replace state lock with a spinlock placed in struct
buffer_head and make the locking conditional on PREEMPT_RT and
DEBUG_BIT_SPINLOCKS.

Jan pointed out that there is a 4 byte hole in struct journal_head where a
regular spinlock fits in and he would not object to convert the state lock
to a spinlock unconditionally.

Aside of solving the RT problem, this also gains lockdep coverage for the
journal head state lock (bit-spinlocks are not covered by lockdep as it's
hard to fit a lockdep map into a single bit).

The trivial change would have been to convert the jbd_*lock_bh_state()
inlines, but that comes with the downside that these functions take a
buffer head pointer which needs to be converted to a journal head pointer
which adds another level of indirection.

As almost all functions which use this lock have a journal head pointer
readily available, it makes more sense to remove the lock helper inlines
and write out spin_*lock() at all call sites.

Fixup all locking comments as well.

Suggested-by: Jan Kara
Signed-off-by: Thomas Gleixner
Signed-off-by: Jan Kara
Cc: "Theodore Ts'o"
Cc: Mark Fasheh
Cc: Joseph Qi
Cc: Joel Becker
Cc: Jan Kara
Cc: linux-ext4@vger.kernel.org
Link: https://lore.kernel.org/r/20190809124233.13277-7-jack@suse.cz
Signed-off-by: Theodore Ts'o

Thomas Gleixner
2019-10-21 21:16:46 +0800

19 Oct, 2019

2 commits

b918c4302 ocfs2: fix panic due to ocfs2_wq is null ... Browse Code »

mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters
an error. ocfs2_initialize_super() returns before allocating ocfs2_wq.
ocfs2_dismount_volume() triggers the following panic.

Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
------------[ cut here ]------------
Oops: 0002 [#1] SMP NOPTI
CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G E
4.14.148-200.ckv.x86_64 #1
Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
task: ffff967af0520000 task.stack: ffffa5f05484000
RIP: 0010:mutex_lock+0x19/0x20
Call Trace:
flush_workqueue+0x81/0x460
ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
ocfs2_dismount_volume+0x84/0x400 [ocfs2]
ocfs2_fill_super+0xa4/0x1270 [ocfs2]
? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
mount_bdev+0x17f/0x1c0
mount_fs+0x3a/0x160

Link: http://lkml.kernel.org/r/1571139611-24107-1-git-send-email-yili@winhong.com
Signed-off-by: Yi Li
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yi Li
2019-10-19 18:32:32 +0800
ce750f43f ocfs2: fix error handling in ocfs2_setattr() ... Browse Code »

Should set transfer_to[USRQUOTA/GRPQUOTA] to NULL on error case before
jumping to do dqput().

Link: http://lkml.kernel.org/r/20191010082349.1134-1-cgxu519@mykernel.net
Signed-off-by: Chengguang Xu
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chengguang Xu
2019-10-19 18:32:32 +0800

09 Oct, 2019

1 commit

5facae4f3 locking/lockdep: Remove unused @nested argument from lock_release() ... Browse Code »

Since the following commit:

b4adfe8e05f1 ("locking/lockdep: Remove unused argument in __lock_release")

@nested is no longer used in lock_release(), so remove it from all
lock_release() calls and friends.

Signed-off-by: Qian Cai
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Will Deacon
Acked-by: Daniel Vetter
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: alexander.levin@microsoft.com
Cc: daniel@iogearbox.net
Cc: davem@davemloft.net
Cc: dri-devel@lists.freedesktop.org
Cc: duyuyang@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: hannes@cmpxchg.org
Cc: intel-gfx@lists.freedesktop.org
Cc: jack@suse.com
Cc: jlbec@evilplan.or
Cc: joonas.lahtinen@linux.intel.com
Cc: joseph.qi@linux.alibaba.com
Cc: jslaby@suse.com
Cc: juri.lelli@redhat.com
Cc: maarten.lankhorst@linux.intel.com
Cc: mark@fasheh.com
Cc: mhocko@kernel.org
Cc: mripard@kernel.org
Cc: ocfs2-devel@oss.oracle.com
Cc: rodrigo.vivi@intel.com
Cc: sean@poorly.run
Cc: st@kernel.org
Cc: tj@kernel.org
Cc: tytso@mit.edu
Cc: vdavydov.dev@gmail.com
Cc: vincent.guittot@linaro.org
Cc: viro@zeniv.linux.org.uk
Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw
Signed-off-by: Ingo Molnar

Qian Cai
2019-10-09 18:46:10 +0800

08 Oct, 2019

4 commits

2abb7d3b1 fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc() ... Browse Code »

In ocfs2_info_scan_inode_alloc(), there is an if statement on line 283
to check whether inode_alloc is NULL:

if (inode_alloc)

When inode_alloc is NULL, it is used on line 287:

ocfs2_inode_lock(inode_alloc, &bh, 0);
ocfs2_inode_lock_full_nested(inode, ...)
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);

Thus, a possible null-pointer dereference may occur.

To fix this bug, inode_alloc is checked on line 286.

This bug is found by a static analysis tool STCheck written by us.

Link: http://lkml.kernel.org/r/20190726033717.32359-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jia-Ju Bai
2019-10-08 06:47:19 +0800
583fee3e1 fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock() ... Browse Code »

In ocfs2_write_end_nolock(), there are an if statement on lines 1976,
2047 and 2058, to check whether handle is NULL:

if (handle)

When handle is NULL, it is used on line 2045:

ocfs2_update_inode_fsync_trans(handle, inode, 1);
oi->i_sync_tid = handle->h_transaction->t_tid;

Thus, a possible null-pointer dereference may occur.

To fix this bug, handle is checked before calling
ocfs2_update_inode_fsync_trans().

This bug is found by a static analysis tool STCheck written by us.

Link: http://lkml.kernel.org/r/20190726033705.32307-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jia-Ju Bai
2019-10-08 06:47:19 +0800
56e94ea13 fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry() ... Browse Code »

In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to
check whether loc->xl_entry is NULL:

if (loc->xl_entry)

When loc->xl_entry is NULL, it is used on line 2158:

ocfs2_xa_add_entry(loc, name_hash);
loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);

and line 2164:

ocfs2_xa_add_namevalue(loc, xi);
loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len);
loc->xl_entry->xe_name_len = xi->xi_name_len;

Thus, possible null-pointer dereferences may occur.

To fix these bugs, if loc-xl_entry is NULL, ocfs2_xa_prepare_entry()
abnormally returns with -EINVAL.

These bugs are found by a static analysis tool STCheck written by us.

[akpm@linux-foundation.org: remove now-unused ocfs2_xa_add_entry()]
Link: http://lkml.kernel.org/r/20190726101447.9153-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jia-Ju Bai
2019-10-08 06:47:19 +0800
7a243c82e ocfs2: clear zero in unaligned direct IO ... Browse Code »

Unused portion of a part-written fs-block-sized block is not set to zero
in unaligned append direct write.This can lead to serious data
inconsistencies.

Ocfs2 manage disk with cluster size(for example, 1M), part-written in
one cluster will change the cluster state from UN-WRITTEN to WRITTEN,
VFS(function dio_zero_block) doesn't do the cleaning because bh's state
is not set to NEW in function ocfs2_dio_wr_get_block when we write a
WRITTEN cluster. For example, the cluster size is 1M, file size is 8k
and we direct write from 14k to 15k, then 12k~14k and 15k~16k will
contain dirty data.

We have to deal with two cases:
1.The starting position of direct write is outside the file.
2.The starting position of direct write is located in the file.

We need set bh's state to NEW in the first case. In the second case, we
need mapped twice because bh's state of area out file should be set to
NEW while area in file not.

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/5292e287-8f1a-fd4a-1a14-661e555e0bed@huawei.com
Signed-off-by: Jia Guo
Reviewed-by: Yiwen Jiang
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jia Guo
2019-10-08 06:47:19 +0800

25 Sep, 2019

4 commits

1c3ce5417 ocfs2: fix spelling mistake "ambigous" -> "ambiguous" ... Browse Code »

There is a spelling mistake in a mlog_bug_on_msg message. Fix it.

Link: http://lkml.kernel.org/r/831bdff4-064e-038b-f45d-c4d265cbff1e@linux.alibaba.com
Signed-off-by: Colin Ian King
Acked-by: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Colin Ian King
2019-09-25 06:54:07 +0800
d7283b39d ocfs2: checkpoint appending truncate log transaction before flushing ... Browse Code »

Appending truncate log(TA) and and flushing truncate log(TF) are two
separated transactions. They can be both committed but not checkpointed.
If crash occurs then, both transaction will be replayed with several
already released to global bitmap clusters. Then truncate log will be
replayed resulting in cluster double free.

To reproduce this issue, just crash the host while punching hole to files.

Signed-off-by: Changwei Ge
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Changwei Ge
2019-09-25 06:54:07 +0800
0a3775e4f ocfs2: wait for recovering done after direct unlock request ... Browse Code »

There is a scenario causing ocfs2 umount hang when multiple hosts are
rebooting at the same time.

NODE1 NODE2 NODE3
send unlock requset to NODE2
dies
become recovery master
recover NODE2
find NODE2 dead
mark resource RECOVERING
directly remove lock from grant list
calculate usage but RECOVERING marked
**miss the window of purging
clear RECOVERING

To reproduce this issue, crash a host and then umount ocfs2
from another node.

To solve this, just let unlock progress wait for recovery done.

Link: http://lkml.kernel.org/r/1550124866-20367-1-git-send-email-gechangwei@live.cn
Signed-off-by: Changwei Ge
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Changwei Ge
2019-09-25 06:54:07 +0800
a89bd89fa ocfs2: delete unnecessary checks before brelse() ... Browse Code »

brelse() tests whether its argument is NULL and then returns immediately.
Thus the tests around the shown calls are not needed.

This issue was detected by using the Coccinelle software.

Link: http://lkml.kernel.org/r/55cde320-394b-f985-56ce-1a2abea782aa@web.de
Signed-off-by: Markus Elfring
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Markus Elfring
2019-09-25 06:54:07 +0800