Eric Lee / smarc-fsl-linux-kernel

10 Oct, 2018

1 commit

de0e2a92c ocfs2: fix locking for res->tracking and dlm->tracking_list ... Browse Code »

commit cbe355f57c8074bc4f452e5b6e35509044c6fa23 upstream.

In dlm_init_lockres() we access and modify res->tracking and
dlm->tracking_list without holding dlm->track_lock. This can cause list
corruptions and can end up in kernel panic.

Fix this by locking res->tracking and dlm->tracking_list with
dlm->track_lock instead of dlm->spinlock.

Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant
Reviewed-by: Changwei Ge
Acked-by: Joseph Qi
Acked-by: Jun Piao
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

Ashish Samant
2018-10-10 14:54:28 +0800

29 Sep, 2018

1 commit

7c1ca8fb8 ocfs2: fix ocfs2 read block panic ... Browse Code »

commit 234b69e3e089d850a98e7b3145bd00e9b52b1111 upstream.

While reading block, it is possible that io error return due to underlying
storage issue, in this case, BH_NeedsValidate was left in the buffer head.
Then when reading the very block next time, if it was already linked into
journal, that will trigger the following panic.

[203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342!
[203748.702533] invalid opcode: 0000 [#1] SMP
[203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod
[203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2
[203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015
[203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000
[203748.703088] RIP: 0010:[] [] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.703130] RSP: 0018:ffff88006ff4b818 EFLAGS: 00010206
[203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000
[203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe
[203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0
[203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000
[203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000
[203748.705871] FS: 00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000
[203748.706370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670
[203748.707124] Stack:
[203748.707371] ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001
[203748.707885] 0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00
[203748.708399] 00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000
[203748.708915] Call Trace:
[203748.709175] [] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]
[203748.709680] [] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2]
[203748.710185] [] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2]
[203748.710691] [] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2]
[203748.711204] [] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2]
[203748.711716] [] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2]
[203748.712227] [] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2]
[203748.712737] [] ocfs2_mknod+0x4b2/0x1370 [ocfs2]
[203748.713003] [] ocfs2_create+0x65/0x170 [ocfs2]
[203748.713263] [] vfs_create+0xdb/0x150
[203748.713518] [] do_last+0x815/0x1210
[203748.713772] [] ? path_init+0xb9/0x450
[203748.714123] [] path_openat+0x80/0x600
[203748.714378] [] ? handle_pte_fault+0xd15/0x1620
[203748.714634] [] do_filp_open+0x3a/0xb0
[203748.714888] [] ? __alloc_fd+0xa7/0x130
[203748.715143] [] do_sys_open+0x12c/0x220
[203748.715403] [] ? syscall_trace_enter_phase1+0x11b/0x180
[203748.715668] [] ? system_call_after_swapgs+0xe9/0x190
[203748.715928] [] SyS_open+0x1e/0x20
[203748.716184] [] system_call_fastpath+0x18/0xd7
[203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10
[203748.717505] RIP [] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.717775] RSP

Joesph ever reported a similar panic.
Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html

Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi
Cc: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Changwei Ge
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

Junxiao Bi
2018-09-29 18:06:05 +0800

22 Jul, 2018

2 commits

1ccab2bf7 ocfs2: ip_alloc_sem should be taken in ocfs2_get_block() ... Browse Code »

commit 3e4c56d41eef5595035872a2ec5a483f42e8917f upstream.

ip_alloc_sem should be taken in ocfs2_get_block() when reading file in
DIRECT mode to prevent concurrent access to extent tree with
ocfs2_dio_end_io_write(), which may cause BUGON in the following
situation:

read file 'A' end_io of writing file 'A'
vfs_read
__vfs_read
ocfs2_file_read_iter
generic_file_read_iter
ocfs2_direct_IO
__blockdev_direct_IO
do_blockdev_direct_IO
do_direct_IO
get_more_blocks
ocfs2_get_block
ocfs2_extent_map_get_blocks
ocfs2_get_clusters
ocfs2_get_clusters_nocache()
ocfs2_search_extent_list
return the index of record which
contains the v_cluster, that is
v_cluster > rec[i]->e_cpos.
ocfs2_dio_end_io
ocfs2_dio_end_io_write
down_write(&oi->ip_alloc_sem);
ocfs2_mark_extent_written
ocfs2_change_extent_flag
ocfs2_split_extent
...
--> modify the rec[i]->e_cpos, resulting
in v_cluster < rec[i]->e_cpos.
BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))

[alex.chen@huawei.com: v3]
Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com
Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com
Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Alex Chen
Reviewed-by: Jun Piao
Reviewed-by: Joseph Qi
Reviewed-by: Gang He
Acked-by: Changwei Ge
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Cc: Salvatore Bonaccorso
Signed-off-by: Greg Kroah-Hartman

alex chen
2018-07-22 20:28:42 +0800
c59a8f13f ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent ... Browse Code »

commit 853bc26a7ea39e354b9f8889ae7ad1492ffa28d2 upstream.

The subsystem.su_mutex is required while accessing the item->ci_parent,
otherwise, NULL pointer dereference to the item->ci_parent will be
triggered in the following situation:

add node delete node
sys_write
vfs_write
configfs_write_file
o2nm_node_store
o2nm_node_local_write
do_rmdir
vfs_rmdir
configfs_rmdir
mutex_lock(&subsys->su_mutex);
unlink_obj
item->ci_group = NULL;
item->ci_parent = NULL;
to_o2nm_cluster_from_node
node->nd_item.ci_parent->ci_parent
BUG since of NULL pointer dereference to nd_item.ci_parent

Moreover, the o2nm_cluster also should be protected by the
subsystem.su_mutex.

[alex.chen@huawei.com: v2]
Link: http://lkml.kernel.org/r/59EEAA69.9080703@huawei.com
Link: http://lkml.kernel.org/r/59E9B36A.10700@huawei.com
Signed-off-by: Alex Chen
Reviewed-by: Jun Piao
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Cc: Salvatore Bonaccorso
Signed-off-by: Greg Kroah-Hartman

alex chen
2018-07-22 20:28:42 +0800

21 Jun, 2018

1 commit

12ddc2639 ocfs2: take inode cluster lock before moving reflinked inode from orphan dir ... Browse Code »

[ Upstream commit e4383029201470523c3ffe339bd7d57e9b4a7d65 ]

While reflinking an inode, we create a new inode in orphan directory,
then take EX lock on it, reflink the original inode to orphan inode and
release EX lock. Once the lock is released another node could request
it in EX mode from ocfs2_recover_orphans() which causes downconvert of
the lock, on this node, to NL mode.

Later we attempt to initialize security acl for the orphan inode and
move it to the reflink destination. However, while doing this we dont
take EX lock on the inode. This could potentially cause problems
because we could be starting transaction, accessing journal and
modifying metadata of the inode while holding NL lock and with another
node holding EX lock on the inode.

Fix this by taking orphan inode cluster lock in EX mode before
initializing security and moving orphan inode to reflink destination.
Use the __tracker variant while taking inode lock to avoid recursive
locking in the ocfs2_init_security_and_acl() call chain.

Link: http://lkml.kernel.org/r/1523475107-7639-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant
Reviewed-by: Joseph Qi
Reviewed-by: Junxiao Bi
Acked-by: Jun Piao
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Changwei Ge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Ashish Samant
2018-06-21 03:02:57 +0800

30 May, 2018

1 commit

839c27f71 ocfs2/dlm: don't handle migrate lockres if already in shutdown ... Browse Code »

[ Upstream commit bb34f24c7d2c98d0c81838a7700e6068325b17a0 ]

We should not handle migrate lockres if we are already in
'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
dlm domain. At last other nodes will get stuck into infinite loop when
requsting lock from us.

The problem is caused by concurrency umount between nodes. Before
receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
migrate target. So N2 will continue sending lockres to N1 even though
N1 has left domain.

N1 N2 (owner)
touch file

access the file,
and get pr lock

begin leave domain and
pick up N1 as new owner

begin leave domain and
migrate all lockres done

begin migrate lockres to N1

end leave domain, but
the lockres left
unexpectedly, because
migrate task has passed

[piaojun@huawei.com: v3]
Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.com
Signed-off-by: Jun Piao
Reviewed-by: Yiwen Jiang
Reviewed-by: Joseph Qi
Reviewed-by: Changwei Ge
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Jun Piao
2018-05-30 13:52:24 +0800

26 Apr, 2018

3 commits

a7fbc7f31 ocfs2: return error when we attempt to access a dirty bh in jbd2 ... Browse Code »

[ Upstream commit d984187e3a1ad7d12447a7ab2c43ce3717a2b5b3 ]

We should not reuse the dirty bh in jbd2 directly due to the following
situation:

1. When removing extent rec, we will dirty the bhs of extent rec and
truncate log at the same time, and hand them over to jbd2.

2. The bhs are submitted to jbd2 area successfully.

3. The write-back thread of device help flush the bhs to disk but
encounter write error due to abnormal storage link.

4. After a while the storage link become normal. Truncate log flush
worker triggered by the next space reclaiming found the dirty bh of
truncate log and clear its 'BH_Write_EIO' and then set it uptodate in
__ocfs2_journal_access():

ocfs2_truncate_log_worker
ocfs2_flush_truncate_log
__ocfs2_flush_truncate_log
ocfs2_replay_truncate_records
ocfs2_journal_access_di
__ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata.

5. Then jbd2 will flush the bh of truncate log to disk, but the bh of
extent rec is still in error state, and unfortunately nobody will
take care of it.

6. At last the space of extent rec was not reduced, but truncate log
flush worker have given it back to globalalloc. That will cause
duplicate cluster problem which could be identified by fsck.ocfs2.

Sadly we can hardly revert this but set fs read-only in case of ruining
atomicity and consistency of space reclaim.

Link: http://lkml.kernel.org/r/5A6E8092.8090701@huawei.com
Fixes: acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access")
Signed-off-by: Jun Piao
Reviewed-by: Yiwen Jiang
Reviewed-by: Changwei Ge
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

piaojun
2018-04-26 17:02:14 +0800
a66174eb4 ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute ... Browse Code »

[ Upstream commit 16c8d569f5704a84164f30ff01b29879f3438065 ]

The race between *set_acl and *get_acl will cause getting incomplete
xattr data as below:

processA processB

ocfs2_set_acl
ocfs2_xattr_set
__ocfs2_xattr_set_handle

ocfs2_get_acl_nolock
ocfs2_xattr_get_nolock:

processB may get incomplete xattr data if processA hasn't set_acl done.

So we should use 'ip_xattr_sem' to protect getting extended attribute in
ocfs2_get_acl_nolock(), as other processes could be changing it
concurrently.

Link: http://lkml.kernel.org/r/5A5DDCFF.7030001@huawei.com
Signed-off-by: Jun Piao
Reviewed-by: Alex Chen
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Cc: Changwei Ge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

piaojun
2018-04-26 17:02:14 +0800
66aaeed27 ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid ... Browse Code »

[ Upstream commit 025bcbde3634b2c9b316f227fed13ad6ad6817fb ]

If metadata is corrupted such as 'invalid inode block', we will get
failed by calling 'mount()' and then set filesystem readonly as below:

ocfs2_mount
ocfs2_initialize_super
ocfs2_init_global_system_inodes
ocfs2_iget
ocfs2_read_locked_inode
ocfs2_validate_inode_block
ocfs2_error
ocfs2_handle_error
ocfs2_set_ro_flag(osb, 0); // set readonly

In this situation we need return -EROFS to 'mount.ocfs2', so that user
can fix it by fsck. And then mount again. In addition, 'mount.ocfs2'
should be updated correspondingly as it only return 1 for all errno.
And I will post a patch for 'mount.ocfs2' too.

Link: http://lkml.kernel.org/r/5A4302FA.2010606@huawei.com
Signed-off-by: Jun Piao
Reviewed-by: Alex Chen
Reviewed-by: Joseph Qi
Reviewed-by: Changwei Ge
Reviewed-by: Gang He
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

piaojun
2018-04-26 17:02:14 +0800

22 Feb, 2018

1 commit

2e7e8bd8f ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE ... Browse Code »

commit ff26cc10aec128c3f86b5611fd5f59c71d49c0e3 upstream.

If we can't get inode lock immediately in the function
ocfs2_inode_lock_with_page() when reading a page, we should not return
directly here, since this will lead to a softlockup problem when the
kernel is configured with CONFIG_PREEMPT is not set. The method is to
get a blocking lock and immediately unlock before returning, this can
avoid CPU resource waste due to lots of retries, and benefits fairness
in getting lock among multiple nodes, increase efficiency in case
modifying the same file frequently from multiple nodes.

The softlockup crash (when set /proc/sys/kernel/softlockup_panic to 1)
looks like:

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:

dump_stack+0x5c/0x82
panic+0xd5/0x21e
watchdog_timer_fn+0x208/0x210
__hrtimer_run_queues+0xcc/0x200
hrtimer_interrupt+0xa6/0x1f0
smp_apic_timer_interrupt+0x34/0x50
apic_timer_interrupt+0x96/0xa0

RIP: 0010:unlock_page+0x17/0x30
RSP: 0000:ffffaf154080bc88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
RAX: dead000000000100 RBX: fffff21e009f5300 RCX: 0000000000000004
RDX: dead0000000000ff RSI: 0000000000000202 RDI: fffff21e009f5300
RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaf154080bb00
R10: ffffaf154080bc30 R11: 0000000000000040 R12: ffff993749a39518
R13: 0000000000000000 R14: fffff21e009f5300 R15: fffff21e009f5300
ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
ocfs2_readpage+0x41/0x2d0 [ocfs2]
filemap_fault+0x12b/0x5c0
ocfs2_fault+0x29/0xb0 [ocfs2]
__do_fault+0x1a/0xa0
__handle_mm_fault+0xbe8/0x1090
handle_mm_fault+0xaa/0x1f0
__do_page_fault+0x235/0x4b0
trace_do_page_fault+0x3c/0x110
async_page_fault+0x28/0x30
RIP: 0033:0x7fa75ded638e
RSP: 002b:00007ffd6657db18 EFLAGS: 00010287
RAX: 000055c7662fb700 RBX: 0000000000000001 RCX: 000055c7662fb700
RDX: 0000000000001770 RSI: 00007fa75e909000 RDI: 000055c7662fb700
RBP: 0000000000000003 R08: 000000000000000e R09: 0000000000000000
R10: 0000000000000483 R11: 00007fa75ded61b0 R12: 00007fa75e90a770
R13: 000000000000000e R14: 0000000000001770 R15: 0000000000000000

About performance improvement, we can see the testing time is reduced,
and CPU utilization decreases, the detailed data is as follows. I ran
multi_mmap test case in ocfs2-test package in a three nodes cluster.

Before applying this patch:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2754 ocfs2te+ 20 0 170248 6980 4856 D 80.73 0.341 0:18.71 multi_mmap
1505 root rt 0 222236 123060 97224 S 2.658 6.015 0:01.44 corosync
5 root 20 0 0 0 0 S 1.329 0.000 0:00.19 kworker/u8:0
95 root 20 0 0 0 0 S 1.329 0.000 0:00.25 kworker/u8:1
2728 root 20 0 0 0 0 S 0.997 0.000 0:00.24 jbd2/sda1-33
2721 root 20 0 0 0 0 S 0.664 0.000 0:00.07 ocfs2dc-3C8CFD4
2750 ocfs2te+ 20 0 142976 4652 3532 S 0.664 0.227 0:00.28 mpirun

ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
Tests with "-b 4096 -C 32768"
Thu Dec 28 14:44:52 CST 2017
multi_mmap..................................................Passed.
Runtime 783 seconds.

After apply this patch:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2508 ocfs2te+ 20 0 170248 6804 4680 R 54.00 0.333 0:55.37 multi_mmap
155 root 20 0 0 0 0 S 2.667 0.000 0:01.20 kworker/u8:3
95 root 20 0 0 0 0 S 2.000 0.000 0:01.58 kworker/u8:1
2504 ocfs2te+ 20 0 142976 4604 3480 R 1.667 0.225 0:01.65 mpirun
5 root 20 0 0 0 0 S 1.000 0.000 0:01.36 kworker/u8:0
2482 root 20 0 0 0 0 S 1.000 0.000 0:00.86 jbd2/sda1-33
299 root 0 -20 0 0 0 S 0.333 0.000 0:00.13 kworker/2:1H
335 root 0 -20 0 0 0 S 0.333 0.000 0:00.17 kworker/1:1H
535 root 20 0 12140 7268 1456 S 0.333 0.355 0:00.34 haveged
1282 root rt 0 222284 123108 97224 S 0.333 6.017 0:01.33 corosync

ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
Tests with "-b 4096 -C 32768"
Thu Dec 28 15:04:12 CST 2017
multi_mmap..................................................Passed.
Runtime 487 seconds.

Link: http://lkml.kernel.org/r/1514447305-30814-1-git-send-email-ghe@suse.com
Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
Signed-off-by: Gang He
Reviewed-by: Eric Ren
Acked-by: alex chen
Acked-by: piaojun
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Cc: Changwei Ge
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Gang He
2018-02-22 22:42:16 +0800

24 Nov, 2017

2 commits

44ec0aecc ocfs2: should wait dio before inode lock in ocfs2_setattr() ... Browse Code »

commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.

we should wait dio requests to finish before inode lock in
ocfs2_setattr(), otherwise the following deadlock will happen:

process 1 process 2 process 3
truncate file 'A' end_io of writing file 'A' receiving the bast messages
ocfs2_setattr
ocfs2_inode_lock_tracker
ocfs2_inode_lock_full
inode_dio_wait
__inode_dio_wait
-->waiting for all dio
requests finish
dlm_proxy_ast_handler
dlm_do_local_bast
ocfs2_blocking_ast
ocfs2_generic_handle_bast
set OCFS2_LOCK_BLOCKED flag
dio_end_io
dio_bio_end_aio
dio_complete
ocfs2_dio_end_io
ocfs2_dio_end_io_write
ocfs2_inode_lock
__ocfs2_cluster_lock
ocfs2_wait_for_mask
-->waiting for OCFS2_LOCK_BLOCKED
flag to be cleared, that is waiting
for 'process 1' unlocking the inode lock
inode_dio_end
-->here dec the i_dio_count, but will never
be called, so a deadlock happened.

Link: http://lkml.kernel.org/r/59F81636.70508@huawei.com
Signed-off-by: Alex Chen
Reviewed-by: Jun Piao
Reviewed-by: Joseph Qi
Acked-by: Changwei Ge
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

alex chen
2017-11-24 15:37:04 +0800
e250a1993 ocfs2: fix cluster hang after a node dies ... Browse Code »

commit 1c01967116a678fed8e2c68a6ab82abc8effeddc upstream.

When a node dies, other live nodes have to choose a new master for an
existed lock resource mastered by the dead node.

As for ocfs2/dlm implementation, this is done by function -
dlm_move_lockres_to_recovery_list which marks those lock rsources as
DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM
changes lock resource's master later.

So without invoking dlm_move_lockres_to_recovery_list, no master will be
choosed after dlm recovery accomplishment since no lock resource can be
found through ::resource list.

What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock
resources mastered a dead node, it will break up synchronization among
nodes.

So invoke dlm_move_lockres_to_recovery_list again.

Fixs: 'commit ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")'
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com
Signed-off-by: Changwei Ge
Reported-by: Vitaly Mayatskih
Tested-by: Vitaly Mayatskikh
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Changwei Ge
2017-11-24 15:37:04 +0800

03 Nov, 2017

1 commit

105ddc93f ocfs2: fstrim: Fix start offset of first cluster group during fstrim ... Browse Code »

The first cluster group descriptor is not stored at the start of the
group but at an offset from the start. We need to take this into
account while doing fstrim on the first cluster group. Otherwise we
will wrongly start fstrim a few blocks after the desired start block and
the range can cross over into the next cluster group and zero out the
group descriptor there. This can cause filesytem corruption that cannot
be fixed by fsck.

Link: http://lkml.kernel.org/r/1507835579-7308-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant
Reviewed-by: Junxiao Bi
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ashish Samant
2017-11-03 22:39:19 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license ... Browse Code »

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

15 Sep, 2017

1 commit

0f0d12728 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull mount flag updates from Al Viro:
"Another chunk of fmount preparations from dhowells; only trivial
conflicts for that part. It separates MS_... bits (very grotty
mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
only a small subset of MS_... stuff).

This does *not* convert the filesystems to new constants; only the
infrastructure is done here. The next step in that series is where the
conflicts would be; that's the conversion of filesystems. It's purely
mechanical and it's better done after the merge, so if you could run
something like

list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

sed -i -e 's/\/SB_RDONLY/g' \
-e 's/\/SB_NOSUID/g' \
-e 's/\/SB_NODEV/g' \
-e 's/\/SB_NOEXEC/g' \
-e 's/\/SB_SYNCHRONOUS/g' \
-e 's/\/SB_MANDLOCK/g' \
-e 's/\/SB_DIRSYNC/g' \
-e 's/\/SB_NOATIME/g' \
-e 's/\/SB_NODIRATIME/g' \
-e 's/\/SB_SILENT/g' \
-e 's/\/SB_POSIXACL/g' \
-e 's/\/SB_KERNMOUNT/g' \
-e 's/\/SB_I_VERSION/g' \
-e 's/\/SB_LAZYTIME/g' \
$list

and commit it with something along the lines of 'convert filesystems
away from use of MS_... constants' as commit message, it would save a
quite a bit of headache next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Differentiate mount flags (MS_*) from internal superblock flags
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

Linus Torvalds
2017-09-15 09:54:01 +0800

08 Sep, 2017

2 commits

ae8ac6b7d Merge branch 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull quota scaling updates from Jan Kara:
"This contains changes to make the quota subsystem more scalable.

Reportedly it improves number of files created per second on ext4
filesystem on fast storage by about a factor of 2x"

* 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
quota: Add lock annotations to struct members
quota: Reduce contention on dq_data_lock
fs: Provide __inode_get_bytes()
quota: Inline dquot_[re]claim_reserved_space() into callsite
quota: Inline inode_{incr,decr}_space() into callsites
quota: Inline functions into their callsites
ext4: Disable dirty list tracking of dquots when journalling quotas
quota: Allow disabling tracking of dirty dquots in a list
quota: Remove dq_wait_unused from dquot
quota: Move locking into clear_dquot_dirty()
quota: Do not dirty bad dquots
quota: Fix possible corruption of dqi_flags
quota: Propagate ->quota_read errors from v2_read_file_info()
quota: Fix error codes in v2_read_file_info()
quota: Push dqio_sem down to ->read_file_info()
quota: Push dqio_sem down to ->write_file_info()
quota: Push dqio_sem down to ->get_next_id()
quota: Push dqio_sem down to ->release_dqblk()
quota: Remove locking for writing to the old quota format
quota: Do not acquire dqio_sem for dquot overwrites in v2 format
...

Linus Torvalds
2017-09-08 06:19:35 +0800
a0725ab0c Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:

- Fix for a registration race in loop, from Anton Volkov.

- Overflow complaint fix from Arnd for DAC960.

- Series of drbd changes from the usual suspects.

- Conversion of the stec/skd driver to blk-mq. From Bart.

- A few BFQ improvements/fixes from Paolo.

- CFQ improvement from Ritesh, allowing idling for group idle.

- A few fixes found by Dan's smatch, courtesy of Dan.

- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.

- A few nbd fixes from Josef.

- Support for cgroup info in blktrace, from Shaohua.

- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.

- Various corner cases and error handling fixes from Weiping Zhang.

- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.

- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.

- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...

Linus Torvalds
2017-09-08 02:59:42 +0800

07 Sep, 2017

4 commits

d34fc1adf Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge updates from Andrew Morton:

- various misc bits

- DAX updates

- OCFS2

- most of MM

* emailed patches from Andrew Morton : (119 commits)
mm,fork: introduce MADV_WIPEONFORK
x86,mpx: make mpx depend on x86-64 to free up VMA flag
mm: add /proc/pid/smaps_rollup
mm: hugetlb: clear target sub-page last when clearing huge page
mm: oom: let oom_reap_task and exit_mmap run concurrently
swap: choose swap device according to numa node
mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
mm, oom: do not rely on TIF_MEMDIE for memory reserves access
z3fold: use per-cpu unbuddied lists
mm, swap: don't use VMA based swap readahead if HDD is used as swap
mm, swap: add sysfs interface for VMA based swap readahead
mm, swap: VMA based swap readahead
mm, swap: fix swap readahead marking
mm, swap: add swap readahead hit statistics
mm/vmalloc.c: don't reinvent the wheel but use existing llist API
mm/vmstat.c: fix wrong comment
selftests/memfd: add memfd_create hugetlbfs selftest
mm/shmem: add hugetlbfs support to memfd_create()
mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
...

Linus Torvalds
2017-09-07 11:49:49 +0800
964f14a0d ocfs2: clean up some dead code ... Browse Code »

clean up some unused functions and parameters.

Link: http://lkml.kernel.org/r/598A5E21.2080807@huawei.com
Signed-off-by: Jun Piao
Reviewed-by: Alex Chen
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jun Piao
2017-09-07 08:27:24 +0800
01ffb56bc ocfs2: make ocfs2_set_acl() static ... Browse Code »

The function is never called outside of fs/ocfs2/acl.c.

Link: http://lkml.kernel.org/r/20170801141252.19675-2-jack@suse.cz
Signed-off-by: Jan Kara
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2017-09-07 08:27:24 +0800
ec3604c7a Merge tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux ... Browse Code »

Pull writeback error handling updates from Jeff Layton:
"This pile continues the work from last cycle on better tracking
writeback errors. In v4.13 we added some basic errseq_t infrastructure
and converted a few filesystems to use it.

This set continues refining that infrastructure, adds documentation,
and converts most of the other filesystems to use it. The main
exception at this point is the NFS client"

* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
ecryptfs: convert to file_write_and_wait in ->fsync
mm: remove optimizations based on i_size in mapping writeback waits
fs: convert a pile of fsync routines to errseq_t based reporting
gfs2: convert to errseq_t based writeback error reporting for fsync
fs: convert sync_file_range to use errseq_t based error-tracking
mm: add file_fdatawait_range and file_write_and_wait
fuse: convert to errseq_t based error tracking for fsync
mm: consolidate dax / non-dax checks for writeback
Documentation: add some docs for errseq_t
errseq: rename __errseq_set to errseq_set

Linus Torvalds
2017-09-07 05:11:03 +0800

24 Aug, 2017

1 commit

74d46992e block: replace bi_bdev with a gendisk pointer and partitions index ... Browse Code »

This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2017-08-24 02:49:55 +0800

18 Aug, 2017

5 commits

7b9ca4c61 quota: Reduce contention on dq_data_lock ... Browse Code »

dq_data_lock is currently used to protect all modifications of quota
accounting information, consistency of quota accounting on the inode,
and dquot pointers from inode. As a result contention on the lock can be
pretty heavy.

Reduce the contention on the lock by protecting quota accounting
information by a new dquot->dq_dqb_lock and consistency of quota
accounting with inode usage by inode->i_lock.

This change reduces time to create 500000 files on ext4 on ramdisk by 50
different processes in separate directories by 6% when user quota is
turned on. When those 50 processes belong to 50 different users, the
improvement is about 9%.

Signed-off-by: Jan Kara

Jan Kara
2017-08-18 04:07:59 +0800
42fdb8583 quota: Push dqio_sem down to ->read_file_info() ... Browse Code »

Push down acquisition of dqio_sem into ->read_file_info() callback. This
is for consistency with other operations and it also allows us to get
rid of an ugliness in OCFS2.

Reviewed-by: Andreas Dilger
Signed-off-by: Jan Kara

Jan Kara
2017-08-18 01:16:24 +0800
9a8ae30e7 quota: Push dqio_sem down to ->write_file_info() ... Browse Code »

Push down acquisition of dqio_sem into ->write_file_info() callback.
Mostly for consistency with other operations.

Reviewed-by: Andreas Dilger
Signed-off-by: Jan Kara

Jan Kara
2017-08-18 01:11:23 +0800
d6ab36610 quota: Acquire dqio_sem for reading in vfs_load_quota_inode() ... Browse Code »

vfs_load_quota_inode() needs dqio_sem only for reading. In fact dqio_sem
is not needed there at all since the function can be called only during
quota on when quota file cannot be modified but let's leave the
protection there since it is logical and the path is in no way
performance critical.

Reviewed-by: Andreas Dilger
Signed-off-by: Jan Kara

Jan Kara
2017-08-18 00:59:04 +0800
bc8230ee8 quota: Convert dqio_mutex to rwsem ... Browse Code »

Convert dqio_mutex to rwsem and call it dqio_sem. No functional changes
yet.

Signed-off-by: Jan Kara

Jan Kara
2017-08-18 00:52:48 +0800

03 Aug, 2017

1 commit

19ec8e485 ocfs2: don't clear SGID when inheriting ACLs ... Browse Code »

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of ocfs2_set_acl()
into ocfs2_iop_set_acl(). That way the function will not be called when
inheriting ACLs which is what we want as it prevents SGID bit clearing
and the mode has been properly set by posix_acl_create() anyway. Also
posix_acl_chmod() that is calling ocfs2_set_acl() takes care of updating
mode itself.

Fixes: 073931017b4 ("posix_acl: Clear SGID bit when setting file permissions")
Link: http://lkml.kernel.org/r/20170801141252.19675-3-jack@suse.cz
Signed-off-by: Jan Kara
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2017-08-03 08:16:13 +0800

01 Aug, 2017

1 commit

3b49c9a1e fs: convert a pile of fsync routines to errseq_t based reporting ... Browse Code »

This patch converts most of the in-kernel filesystems that do writeback
out of the pagecache to report errors using the errseq_t-based
infrastructure that was recently added. This allows them to report
errors once for each open file description.

Most filesystems have a fairly straightforward fsync operation. They
call filemap_write_and_wait_range to write back all of the data and
wait on it, and then (sometimes) sync out the metadata.

For those filesystems this is a straightforward conversion from calling
filemap_write_and_wait_range in their fsync operation to calling
file_write_and_wait_range.

Acked-by: Jan Kara
Acked-by: Dave Kleikamp
Signed-off-by: Jeff Layton

Jeff Layton
2017-08-01 20:39:29 +0800

17 Jul, 2017

1 commit

bc98a42c1 VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) ... Browse Code »

Firstly by applying the following with coccinelle's spatch:

@@ expression SB; @@
-SB->s_flags & MS_RDONLY
+sb_rdonly(SB)

to effect the conversion to sb_rdonly(sb), then by applying:

@@ expression A, SB; @@
(
-(!sb_rdonly(SB)) && A
+!sb_rdonly(SB) && A
|
-A != (sb_rdonly(SB))
+A != sb_rdonly(SB)
|
-A == (sb_rdonly(SB))
+A == sb_rdonly(SB)
|
-!(sb_rdonly(SB))
+!sb_rdonly(SB)
|
-A && (sb_rdonly(SB))
+A && sb_rdonly(SB)
|
-A || (sb_rdonly(SB))
+A || sb_rdonly(SB)
|
-(sb_rdonly(SB)) != A
+sb_rdonly(SB) != A
|
-(sb_rdonly(SB)) == A
+sb_rdonly(SB) == A
|
-(sb_rdonly(SB)) && A
+sb_rdonly(SB) && A
|
-(sb_rdonly(SB)) || A
+sb_rdonly(SB) || A
)

@@ expression A, B, SB; @@
(
-(sb_rdonly(SB)) ? 1 : 0
+sb_rdonly(SB)
|
-(sb_rdonly(SB)) ? A : B
+sb_rdonly(SB) ? A : B
)

to remove left over excess bracketage and finally by applying:

@@ expression A, SB; @@
(
-(A & MS_RDONLY) != sb_rdonly(SB)
+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
|
-(A & MS_RDONLY) == sb_rdonly(SB)
+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
)

to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.

Signed-off-by: David Howells

David Howells
2017-07-17 15:45:34 +0800

07 Jul, 2017

4 commits

b74271e40 ocfs2: constify attribute_group structures ... Browse Code »

attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by work with
const attribute_group. So mark the non-const structs as const.

File size before:
text data bss dec hex filename
4402 1088 38 5528 1598 fs/ocfs2/stackglue.o

File size After adding 'const':
text data bss dec hex filename
4442 1024 38 5504 1580 fs/ocfs2/stackglue.o

Link: http://lkml.kernel.org/r/cab4e59b4918db3ed2ec77073a4cb310c4429ef5.1498808026.git.arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arvind Yadav
2017-07-07 07:24:30 +0800
25b1c72e1 ocfs2: free 'dummy_sc' in sc_fop_release() to prevent memory leak ... Browse Code »

'sd->dbg_sock' is malloced in sc_common_open(), but not freed at the end
of sc_fop_release().

Link: http://lkml.kernel.org/r/594FB0A4.2050105@huawei.com
Signed-off-by: Jun Piao
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

piaojun
2017-07-07 07:24:30 +0800
62aa81d7c ocfs2: use magic.h ... Browse Code »

Filesystems generally use SUPER_MAGIC values from magic.h instead of a
local definition.

Link: http://lkml.kernel.org/r/20170521154217.27917-1-fabf@skynet.be
Signed-off-by: Fabian Frederick
Reviewed-by: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fabian Frederick
2017-07-07 07:24:30 +0800
8c4d5a438 ocfs2: fix a static checker warning ... Browse Code »

Fix a static code checker warning:

fs/ocfs2/inode.c:179 ocfs2_iget() warn: passing zero to 'ERR_PTR'

Fixes: d56a8f32e4c6 ("ocfs2: check/fix inode block for online file check")
Link: http://lkml.kernel.org/r/1495516634-1952-1-git-send-email-ghe@suse.com
Signed-off-by: Gang He
Reviewed-by: Joseph Qi
Reviewed-by: Eric Ren
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gang He
2017-07-07 07:24:30 +0800

04 Jul, 2017

2 commits

c6b1e36c8 Merge branch 'for-4.13/block' of git://git.kernel.dk/linux-block ... Browse Code »

Pull core block/IO updates from Jens Axboe:
"This is the main pull request for the block layer for 4.13. Not a huge
round in terms of features, but there's a lot of churn related to some
core cleanups.

Note this depends on the UUID tree pull request, that Christoph
already sent out.

This pull request contains:

- A series from Christoph, unifying the error/stats codes in the
block layer. We now use blk_status_t everywhere, instead of using
different schemes for different places.

- Also from Christoph, some cleanups around request allocation and IO
scheduler interactions in blk-mq.

- And yet another series from Christoph, cleaning up how we handle
and do bounce buffering in the block layer.

- A blk-mq debugfs series from Bart, further improving on the support
we have for exporting internal information to aid debugging IO
hangs or stalls.

- Also from Bart, a series that cleans up the request initialization
differences across types of devices.

- A series from Goldwyn Rodrigues, allowing the block layer to return
failure if we will block and the user asked for non-blocking.

- Patch from Hannes for supporting setting loop devices block size to
that of the underlying device.

- Two series of patches from Javier, fixing various issues with
lightnvm, particular around pblk.

- A series from me, adding support for write hints. This comes with
NVMe support as well, so applications can help guide data placement
on flash to improve performance, latencies, and write
amplification.

- A series from Ming, improving and hardening blk-mq support for
stopping/starting and quiescing hardware queues.

- Two pull requests for NVMe updates. Nothing major on the feature
side, but lots of cleanups and bug fixes. From the usual crew.

- A series from Neil Brown, greatly improving the bio rescue set
support. Most notably, this kills the bio rescue work queues, if we
don't really need them.

- Lots of other little bug fixes that are all over the place"

* 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits)
lightnvm: pblk: set line bitmap check under debug
lightnvm: pblk: verify that cache read is still valid
lightnvm: pblk: add initialization check
lightnvm: pblk: remove target using async. I/Os
lightnvm: pblk: use vmalloc for GC data buffer
lightnvm: pblk: use right metadata buffer for recovery
lightnvm: pblk: schedule if data is not ready
lightnvm: pblk: remove unused return variable
lightnvm: pblk: fix double-free on pblk init
lightnvm: pblk: fix bad le64 assignations
nvme: Makefile: remove dead build rule
blk-mq: map all HWQ also in hyperthreaded system
nvmet-rdma: register ib_client to not deadlock in device removal
nvme_fc: fix error recovery on link down.
nvmet_fc: fix crashes on bad opcodes
nvme_fc: Fix crash when nvme controller connection fails.
nvme_fc: replace ioabort msleep loop with completion
nvme_fc: fix double calls to nvme_cleanup_cmd()
nvme-fabrics: verify that a controller returns the correct NQN
nvme: simplify nvme_dev_attrs_are_visible
...

Linus Torvalds
2017-07-04 01:34:51 +0800
81e3e0448 Merge tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid ... Browse Code »

Pull uuid subsystem from Christoph Hellwig:
"This is the new uuid subsystem, in which Amir, Andy and I have started
consolidating our uuid/guid helpers and improving the types used for
them. Note that various other subsystems have pulled in this tree, so
I'd like it to go in early.

UUID/GUID summary:

- introduce the new uuid_t/guid_t types that are going to replace the
somewhat confusing uuid_be/uuid_le types and make the terminology
fit the various specs, as well as the userspace libuuid library.
(me, based on a previous version from Amir)

- consolidated generic uuid/guid helper functions lifted from XFS and
libnvdimm (Amir and me)

- conversions to the new types and helpers (Amir, Andy and me)"

* tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid: (34 commits)
ACPI: hns_dsaf_acpi_dsm_guid can be static
mmc: sdhci-pci: make guid intel_dsm_guid static
uuid: Take const on input of uuid_is_null() and guid_is_null()
thermal: int340x_thermal: fix compile after the UUID API switch
thermal: int340x_thermal: Switch to use new generic UUID API
acpi: always include uuid.h
ACPI: Switch to use generic guid_t in acpi_evaluate_dsm()
ACPI / extlog: Switch to use new generic UUID API
ACPI / bus: Switch to use new generic UUID API
ACPI / APEI: Switch to use new generic UUID API
acpi, nfit: Switch to use new generic UUID API
MAINTAINERS: add uuid entry
tmpfs: generate random sb->s_uuid
scsi_debug: switch to uuid_t
nvme: switch to uuid_t
sysctl: switch to use uuid_t
partitions/ldm: switch to use uuid_t
overlayfs: use uuid_t instead of uuid_be
fs: switch ->s_uuid to uuid_t
ima/policy: switch to use uuid_t
...

Linus Torvalds
2017-07-04 00:55:26 +0800

24 Jun, 2017

1 commit

8818efaaa ocfs2: fix deadlock caused by recursive locking in xattr ... Browse Code »

Another deadlock path caused by recursive locking is reported. This
kind of issue was introduced since commit 743b5f1434f5 ("ocfs2: take
inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths have been
fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking
inode lock at vfs entry points"). Yes, we intend to fix this kind of
case in incremental way, because it's hard to find out all possible
paths at once.

This one can be reproduced like this. On node1, cp a large file from
home directory to ocfs2 mountpoint. While on node2, run
setfacl/getfacl. Both nodes will hang up there. The backtraces:

On node1:
__ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
ocfs2_write_begin+0x43/0x1a0 [ocfs2]
generic_perform_write+0xa9/0x180
__generic_file_write_iter+0x1aa/0x1d0
ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
__vfs_write+0xc3/0x130
vfs_write+0xb1/0x1a0
SyS_write+0x46/0xa0

On node2:
__ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
ocfs2_set_acl+0x22d/0x260 [ocfs2]
ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
set_posix_acl+0x75/0xb0
posix_acl_xattr_set+0x49/0xa0
__vfs_setxattr+0x69/0x80
__vfs_setxattr_noperm+0x72/0x1a0
vfs_setxattr+0xa7/0xb0
setxattr+0x12d/0x190
path_setxattr+0x9f/0xb0
SyS_setxattr+0x14/0x20

Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic
to avoid recursive cluster lock").

Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren
Reported-by: Thomas Voegtle
Tested-by: Thomas Voegtle
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Ren
2017-06-24 07:15:55 +0800

13 Jun, 2017

1 commit

fdd050b5b Merge branch 'uuid-types' of bombadil.infradead.org:public_git/uuid into nvme-base Browse Code »

Christoph Hellwig
2017-06-13 17:45:14 +0800

12 Jun, 2017

1 commit

8f66439ee Merge tag 'v4.12-rc5' into for-4.13/block ... Browse Code »

We've already got a few conflicts and upcoming work depends on some of the
changes that have gone into mainline as regression fixes for this series.

Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream
trees to continue working on 4.13 changes.

Signed-off-by: Jens Axboe

Jens Axboe
2017-06-12 22:30:13 +0800

09 Jun, 2017

1 commit

4e4cbee93 block: switch bios to blk_status_t ... Browse Code »

Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2017-06-09 23:27:32 +0800