20 Jan, 2017

1 commit

  • commit e7ee2c089e94067d68475990bdeed211c8852917 upstream.

    The crash happens rather often when we reset some cluster nodes while
    the nodes contend fiercely to truncate and append.

    The crash backtrace is below:

    dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
    dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
    ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
    ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
    ocfs2: Beginning quota recovery on device (253,18) for slot 2
    ocfs2: Finishing quota recovery on device (253,18) for slot 2
    (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
    (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
    ------------[ cut here ]------------
    kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
    Supported: No, Unsupported modules are loaded
    CPU: 1 PID: 30154 Comm: truncate Tainted: G OE N 4.4.21-69-default #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
    task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
    RIP: 0010:[] [] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
    RSP: 0018:ffff880074e6bd50 EFLAGS: 00010282
    RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
    R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
    R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
    FS: 00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
    Call Trace:
    ocfs2_setattr+0x698/0xa90 [ocfs2]
    notify_change+0x1ae/0x380
    do_truncate+0x5e/0x90
    do_sys_ftruncate.constprop.11+0x108/0x160
    entry_SYSCALL_64_fastpath+0x12/0x6d
    Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
    RIP [] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]

    It's because ocfs2_inode_lock() gets us a stale LVB in which the i_size
    is not equal to the disk i_size. We mistakenly trust the LVB because the
    underlying fsdlm dlm_lock() doesn't set lkb_sbflags with
    DLM_SBF_VALNOTVALID properly for us. But, why?

    For a PR->NULL conversion, the current code downconverts the lock
    without the DLM_LKF_VALBLK flag, to tell o2cb not to update the RSB's
    LVB, even if the lock resource type needs an LVB. This is not the right
    way for fsdlm.

    The fsdlm plugin behaves differently on DLM_LKF_VALBLK: it depends on
    DLM_LKF_VALBLK to decide whether we care about the LVB in the LKB. If
    DLM_LKF_VALBLK is not set, fsdlm skips recovering the RSB's LVB from
    this LKB, so DLM_SBF_VALNOTVALID does not get set when a node failure
    happens.

    The following diagram briefly illustrates how this crash happens:

    RSB1 is an inode metadata lock resource with LOCK_TYPE_USES_LVB.

    The 1st round:

                 Node1                                Node2
    RSB1: PR
                                                      RSB1(master): NULL->EX
    ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
      ocfs2_dlm_lock(no DLM_LKF_VALBLK)

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    dlm_lock(no DLM_LKF_VALBLK)
      convert_lock(overwrite lkb->lkb_exflags
                   with no DLM_LKF_VALBLK)

    RSB1: NULL                                        RSB1: EX
                                                      reset Node2
    dlm_recover_rsbs()
      recover_lvb()

    /* The LVB is not trustable if the node with EX fails and
     * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
     */

    if (!(lkb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to
            return;                       * invalidate the LVB here.
                                          */

    The 2nd round:

                 Node1                                Node2
    RSB1(become master from recovery)

    ocfs2_setattr()
      ocfs2_inode_lock(NULL->EX)
        /* dlm_lock() returns the stale lvb without setting DLM_SBF_VALNOTVALID */
        ocfs2_meta_lvb_is_trustable() returns 1 /* so we don't refresh inode from disk */
      ocfs2_truncate_file()
        mlog_bug_on_msg(disk isize != i_size_read(inode)) /* crash! */

    The fix is quite straightforward: keep the DLM_LKF_VALBLK flag set for
    dlm_lock() if the lock resource type needs an LVB and the fsdlm plugin
    is used.
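The flag logic above can be made concrete with a small userspace model. This is a sketch under stated assumptions and not the ocfs2 code. The flag values, struct sketch_lockres, downconvert_flags() and recovery_can_invalidate_lvb() are all hypothetical. Recovery is reduced to the single VALBLK check shown in the diagram.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch only; the real constants
 * come from the kernel's dlm and ocfs2 headers. */
#define SKETCH_LKF_VALBLK 0x1u
#define SKETCH_USES_LVB   0x1u

struct sketch_lockres {
    unsigned int type_flags; /* SKETCH_USES_LVB for inode metadata locks */
};

/* dlm flags for a downconvert. set_lvb is the caller's own choice
 * (0 for a PR->NULL downconvert); o2cb_active selects the stack. */
static unsigned int downconvert_flags(const struct sketch_lockres *res,
                                      bool set_lvb, bool o2cb_active)
{
    bool lvb = set_lvb;

    /* fsdlm relies on VALBLK to know that this LKB's LVB matters during
     * recovery, so keep it set for every LVB-using lock type when o2cb
     * is not the active stack. */
    if (!o2cb_active && (res->type_flags & SKETCH_USES_LVB))
        lvb = true;

    return lvb ? SKETCH_LKF_VALBLK : 0u;
}

/* Toy recovery check: a stale LVB can only be flagged as not valid if
 * some surviving LKB still carries VALBLK (cf. recover_lvb() above). */
static bool recovery_can_invalidate_lvb(const unsigned int *exflags, int n)
{
    for (int i = 0; i < n; i++)
        if (exflags[i] & SKETCH_LKF_VALBLK)
            return true;
    return false;
}
```

Under o2cb a PR->NULL downconvert still omits VALBLK. Under fsdlm it is kept, so the modeled recovery step gets its chance to invalidate the LVB.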

    Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com
    Signed-off-by: Eric Ren
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Eric Ren
     

12 Nov, 2016

1 commit

  • The following panic was caught when running the ocfs2 discontig single
    test (block size 512 and cluster size 8192). ocfs2_journal_dirty()
    returned -ENOSPC, which means the credits were used up.

    The total credits should include 3 times "num_dx_leaves" from
    ocfs2_dx_dir_rebalance(), because 2 times are consumed in
    ocfs2_dx_dir_transfer_leaf() and 1 time is consumed in
    ocfs2_dx_dir_new_cluster() -> __ocfs2_dx_dir_new_cluster() ->
    ocfs2_dx_dir_format_cluster(). But only 2 times are included in
    ocfs2_dx_dir_rebalance_credits(); fix it.

    This can cause a read-only fs (v4.1+) or a panic on mainline Linux,
    depending on the mount option.
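The credit arithmetic can be checked against the geometry from the report. This sketch assumes that "num_dx_leaves" equals the number of blocks in one cluster (cluster size divided by block size). The helper names are hypothetical.

```c
#include <assert.h>

/* Assumption: num_dx_leaves is one cluster's worth of blocks. */
static unsigned int num_dx_leaves(unsigned int block_size,
                                  unsigned int cluster_size)
{
    return cluster_size / block_size;
}

/* Leaf-block credits a rebalance consumes, per the message:
 * 2x in ocfs2_dx_dir_transfer_leaf() plus 1x in
 * ocfs2_dx_dir_format_cluster(). */
static unsigned int leaf_credits_needed(unsigned int leaves)
{
    return 2 * leaves + leaves;
}

/* Leaf-block credits reserved before (2x) and after (3x) the fix. */
static unsigned int leaf_credits_reserved(unsigned int leaves, int fixed)
{
    return (fixed ? 3u : 2u) * leaves;
}
```

With 512-byte blocks and 8192-byte clusters, one cluster holds 16 leaf blocks. Reserving 2 times therefore covers 32 credits, while the rebalance path consumes 48.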

    ------------[ cut here ]------------
    kernel BUG at fs/ocfs2/journal.c:775!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport acpi_cpufreq i2c_piix4 i2c_core pcspkr ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
    CPU: 2 PID: 10601 Comm: dd Not tainted 4.1.12-71.el6uek.bug24939243.x86_64 #2
    Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
    task: ffff8800b6de6200 ti: ffff8800a7d48000 task.ti: ffff8800a7d48000
    RIP: ocfs2_journal_dirty+0xa7/0xb0 [ocfs2]
    RSP: 0018:ffff8800a7d4b6d8 EFLAGS: 00010286
    RAX: 00000000ffffffe4 RBX: 00000000814d0a9c RCX: 00000000000004f9
    RDX: ffffffffa008e990 RSI: ffffffffa008f1ee RDI: ffff8800622b6460
    RBP: ffff8800a7d4b6f8 R08: ffffffffa008f288 R09: ffff8800622b6460
    R10: 0000000000000000 R11: 0000000000000282 R12: 0000000002c8421e
    R13: ffff88006d0cad00 R14: ffff880092beef60 R15: 0000000000000070
    FS: 00007f9b83e92700(0000) GS:ffff8800be880000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fb2c0d1a000 CR3: 0000000008f80000 CR4: 00000000000406e0
    Call Trace:
    ocfs2_dx_dir_transfer_leaf+0x159/0x1a0 [ocfs2]
    ocfs2_dx_dir_rebalance+0xd9b/0xea0 [ocfs2]
    ocfs2_find_dir_space_dx+0xd3/0x300 [ocfs2]
    ocfs2_prepare_dx_dir_for_insert+0x219/0x450 [ocfs2]
    ocfs2_prepare_dir_for_insert+0x1d6/0x580 [ocfs2]
    ocfs2_mknod+0x5a2/0x1400 [ocfs2]
    ocfs2_create+0x73/0x180 [ocfs2]
    vfs_create+0xd8/0x100
    lookup_open+0x185/0x1c0
    do_last+0x36d/0x780
    path_openat+0x92/0x470
    do_filp_open+0x4a/0xa0
    do_sys_open+0x11a/0x230
    SyS_open+0x1e/0x20
    system_call_fastpath+0x12/0x71
    Code: 1d 3f 29 09 00 48 85 db 74 1f 48 8b 03 0f 1f 80 00 00 00 00 48 8b 7b 08 48 83 c3 10 4c 89 e6 ff d0 48 8b 03 48 85 c0 75 eb eb 90 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54
    RIP ocfs2_journal_dirty+0xa7/0xb0 [ocfs2]
    ---[ end trace 91ac5312a6ee1288 ]---
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: disabled

    Link: http://lkml.kernel.org/r/1478248135-31963-1-git-send-email-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     

12 Oct, 2016

1 commit

  • In dlm_migrate_request_handler(), when `ret' is -EEXIST, the mle
    should be freed; otherwise the memory is leaked.
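Below is a minimal userspace model of the fix. It uses a hypothetical sketch_mle type, and a live-allocation counter stands in for the mle slab cache. It assumes only what the message states: on the -EEXIST path nobody else takes ownership of the freshly allocated mle, so the handler must free it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a master list entry (mle). */
struct sketch_mle { int refs; };

static int live_mles; /* allocations not yet freed */

static struct sketch_mle *mle_alloc(void)
{
    struct sketch_mle *mle = calloc(1, sizeof(*mle));
    if (mle)
        live_mles++;
    return mle;
}

static void mle_free(struct sketch_mle *mle)
{
    free(mle);
    live_mles--;
}

/* Model of the handler's tail: add_result is what adding the new mle
 * returned. On -EEXIST the mle was never handed off, so free it. */
static int migrate_request_handler(int add_result, struct sketch_mle **out)
{
    struct sketch_mle *mle = mle_alloc();
    if (!mle)
        return -ENOMEM;

    if (add_result == -EEXIST) {
        mle_free(mle);  /* the fix: without this the mle leaks */
        *out = NULL;
        return add_result;
    }
    *out = mle;         /* ownership passes to the caller/list */
    return 0;
}
```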

    Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA4A3D3522A@H3CMLB12-EX.srv.huawei-3com.com
    Signed-off-by: Guozhonghua
    Reviewed-by: Mark Fasheh
    Cc: Eric Ren
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guozhonghua
     

11 Oct, 2016

4 commits

  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     
  • Al Viro
     
  • Pull vfs xattr updates from Al Viro:
    "xattr stuff from Andreas

    This completes the switch to xattr_handler ->get()/->set() from
    ->getxattr/->setxattr/->removexattr"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Remove {get,set,remove}xattr inode operations
    xattr: Stop calling {get,set,remove}xattr inode operations
    vfs: Check for the IOP_XATTR flag in listxattr
    xattr: Add __vfs_{get,set,remove}xattr helpers
    libfs: Use IOP_XATTR flag for empty directory handling
    vfs: Use IOP_XATTR flag for bad-inode handling
    vfs: Add IOP_XATTR inode operations flag
    vfs: Move xattr_resolve_name to the front of fs/xattr.c
    ecryptfs: Switch to generic xattr handlers
    sockfs: Get rid of getxattr iop
    sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
    kernfs: Switch to generic xattr handlers
    hfs: Switch to generic xattr handlers
    jffs2: Remove jffs2_{get,set,remove}xattr macros
    xattr: Remove unnecessary NULL attribute name check

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     

08 Oct, 2016

9 commits

  • Al Viro
     
  • Merge updates from Andrew Morton:

    - fsnotify updates

    - ocfs2 updates

    - all of MM

    * emailed patches from Andrew Morton : (127 commits)
    console: don't prefer first registered if DT specifies stdout-path
    cred: simpler, 1D supplementary groups
    CREDITS: update Pavel's information, add GPG key, remove snail mail address
    mailmap: add Johan Hovold
    .gitattributes: set git diff driver for C source code files
    uprobes: remove function declarations from arch/{mips,s390}
    spelling.txt: "modeled" is spelt correctly
    nmi_backtrace: generate one-line reports for idle cpus
    arch/tile: adopt the new nmi_backtrace framework
    nmi_backtrace: do a local dump_stack() instead of a self-NMI
    nmi_backtrace: add more trigger_*_cpu_backtrace() methods
    min/max: remove sparse warnings when they're nested
    Documentation/filesystems/proc.txt: add more description for maps/smaps
    mm, proc: fix region lost in /proc/self/smaps
    proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
    proc: add LSM hook checks to /proc//timerslack_ns
    proc: relax /proc//timerslack_ns capability requirements
    meminfo: break apart a very long seq_printf with #ifdefs
    seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
    proc: faster /proc/*/status
    ...

    Linus Torvalds
     
  • These inode operations are no longer used; remove them.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • The extern struct variable ocfs2_inode_cache is not defined. It was
    meant to use ocfs2_inode_cachep, defined in super.c, I think. Fortunately
    it is not used anywhere now, so there is no actual impact. Clean it up to
    fix this mistake.

    Link: http://lkml.kernel.org/r/57E1E49D.8050503@huawei.com
    Signed-off-by: Joseph Qi
    Reviewed-by: Eric Ren
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • The workqueue "dlm_worker" queues a single work item &dlm->dispatched_work
    and thus it doesn't require execution ordering. Hence, alloc_workqueue
    has been used to replace the deprecated create_singlethread_workqueue
    instance.

    The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
    memory pressure.

    Since there is a fixed number of work items, an explicit concurrency
    limit is unnecessary here.

    Link: http://lkml.kernel.org/r/2b5ad8d6688effe1a9ddb2bc2082d26fbbe00302.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaktipriya Shridhar
     
  • The workqueue "ocfs2_wq" queues multiple work items viz
    &osb->la_enable_wq, &journal->j_recovery_work, &os->os_orphan_scan_work,
    &osb->osb_truncate_log_wq which require strict execution ordering. Hence,
    an ordered dedicated workqueue has been used.

    WQ_MEM_RECLAIM has been set to ensure forward progress under memory
    pressure because the workqueue is being used on a memory reclaim path.
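ocfs2_wq is described as carrying several distinct work items that must run in strict order. That guarantee is what an ordered workqueue provides; in the kernel such a queue is created with alloc_ordered_workqueue(). The userspace toy below uses hypothetical names throughout and only illustrates the guarantee: a single worker drains a FIFO, so items never overlap and run in queueing order. It does not model WQ_MEM_RECLAIM's rescuer behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work function: appends its id to an output log. */
typedef void (*work_fn)(int *out, size_t *n);

#define QCAP 8

/* FIFO of pending work items, drained by a single worker. */
struct ordered_q {
    work_fn items[QCAP];
    size_t head, tail;
};

/* Queue an item; returns -1 if the ring is full. */
static int oq_queue(struct ordered_q *q, work_fn fn)
{
    if (q->tail - q->head == QCAP)
        return -1;
    q->items[q->tail++ % QCAP] = fn;
    return 0;
}

/* Drain in queueing order, one item at a time, so no two items overlap. */
static void oq_run(struct ordered_q *q, int *out, size_t *n)
{
    while (q->head != q->tail)
        q->items[q->head++ % QCAP](out, n);
}

/* Stand-ins named after the work items listed in the message. */
static void la_enable(int *out, size_t *n)    { out[(*n)++] = 1; }
static void recovery(int *out, size_t *n)     { out[(*n)++] = 2; }
static void orphan_scan(int *out, size_t *n)  { out[(*n)++] = 3; }
static void truncate_log(int *out, size_t *n) { out[(*n)++] = 4; }
```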

    Link: http://lkml.kernel.org/r/66279de510a7f4cfc6e386d99b7e04b3f65fb11b.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaktipriya Shridhar
     
  • The workqueue "o2net_wq" queues multiple work items viz
    &old_sc->sc_shutdown_work, &sc->sc_rx_work, &sc->sc_connect_work which
    require strict execution ordering. Hence, an ordered dedicated
    workqueue has been used.

    WQ_MEM_RECLAIM has been set to ensure forward progress under memory
    pressure.

    Link: http://lkml.kernel.org/r/ddc12e5766c79ba26f8a00d98049107f8a1d4866.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaktipriya Shridhar
     
  • The workqueue "user_dlm_worker" queues a single work item
    &lockres->l_work per user_lock_res instance and so it doesn't require
    execution ordering. Hence, alloc_workqueue has been used to replace the
    deprecated create_singlethread_workqueue instance.

    The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
    memory pressure.

    Since there is a fixed number of work items, an explicit concurrency
    limit is unnecessary here.

    Link: http://lkml.kernel.org/r/9748136d3a3b18138ad1d6ba708367aa1fe9f98c.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar
    Acked-by: Tejun Heo
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bhaktipriya Shridhar
     
  • Pull VFS splice updates from Al Viro:
    "There's a bunch of branches this cycle, both mine and from other folks
    and I'd rather send pull requests separately.

    This one is the conversion of ->splice_read() to ITER_PIPE iov_iter
    (and introduction of such). Gets rid of a lot of code in fs/splice.c
    and elsewhere; there will be followups, but these are for the next
    cycle... Some pipe/splice-related cleanups from Miklos in the same
    branch as well"

    * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    pipe: fix comment in pipe_buf_operations
    pipe: add pipe_buf_steal() helper
    pipe: add pipe_buf_confirm() helper
    pipe: add pipe_buf_release() helper
    pipe: add pipe_buf_get() helper
    relay: simplify relay_file_read()
    switch default_file_splice_read() to use of pipe-backed iov_iter
    switch generic_file_splice_read() to use of ->read_iter()
    new iov_iter flavour: pipe-backed
    fuse_dev_splice_read(): switch to add_to_pipe()
    skb_splice_bits(): get rid of callback
    new helper: add_to_pipe()
    splice: lift pipe_lock out of splice_to_pipe()
    splice: switch get_iovec_page_array() to iov_iter
    splice_to_pipe(): don't open-code wakeup_pipe_readers()
    consistent treatment of EFAULT on O_DIRECT read/write

    Linus Torvalds
     

06 Oct, 2016

1 commit


01 Oct, 2016

1 commit

  • The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.

    In this testcase, we create a 2*CLUSTER_SIZE file and mmap() it; two
    processes then repeatedly perform the following operations: one does
    memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a', 1), while the other does
    ftruncate(fd, 2*CLUSTER_SIZE) and then ftruncate(fd, CLUSTER_SIZE),
    again and again.

    This is the backtrace when the deadlock happens:

    __wait_on_bit_lock+0x50/0xa0
    __lock_page+0xb7/0xc0
    ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
    ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
    do_page_mkwrite+0x66/0xc0
    handle_mm_fault+0x685/0x1350
    __do_page_fault+0x1d8/0x4d0
    trace_do_page_fault+0x37/0xf0
    do_async_page_fault+0x19/0x70
    async_page_fault+0x28/0x30

    In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
    disk space for this write. If -ENOSPC is returned,
    ocfs2_try_to_free_truncate_log() is called, and if we are lucky enough
    to get enough clusters back, which is usually the case, we start over
    again.

    But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we
    will deadlock when trying to grab the target page again.

    Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
    Another deadlock will happen in __do_page_mkwrite() if
    ocfs2_page_mkwrite() returns something other than VM_FAULT_LOCKED
    while leaving the target page locked.

    Both errors fail on the same path, so fix them by unlocking the target
    page manually before ocfs2_free_write_ctxt().
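The retry path can be modeled with a non-recursive page lock. This is a hypothetical userspace sketch, not the ocfs2 code. sk_trylock_page() reports the self-deadlock instead of blocking, and sk_write_begin() takes the -ENOSPC branch once. The `fixed` switch decides whether the target page is unlocked before the write context is freed, as the fix does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy page with a non-recursive lock. */
struct sk_page { bool locked; };

/* Returns false instead of blocking when the page is already locked:
 * in the real path that means waiting on ourselves forever. */
static bool sk_trylock_page(struct sk_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

static void sk_unlock_page(struct sk_page *p) { p->locked = false; }

/* Model of the write_begin retry loop. With enospc_once set, the first
 * allocation fails with -ENOSPC and we start over. With fixed set, the
 * target page is unlocked before the write context is freed. Returns 0
 * on success, or -EDEADLK where the real code would hang. */
static int sk_write_begin(struct sk_page *target, bool enospc_once, bool fixed)
{
    bool first_attempt = true;

try_again:
    if (!sk_trylock_page(target))
        return -EDEADLK;

    if (enospc_once && first_attempt) {
        first_attempt = false;
        if (fixed)
            sk_unlock_page(target);  /* the fix */
        /* ... free the write ctxt, try to free the truncate log ... */
        goto try_again;
    }

    sk_unlock_page(target);  /* stands in for the normal write_end path */
    return 0;
}
```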

    Jan Kara helped me clear up the JBD2 part and suggested the hint for the
    root cause.

    Changes since v1:
    1. Also take the ENOMEM error case into consideration.

    Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com
    Signed-off-by: Eric Ren
    Reviewed-by: He Gang
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Ren
     

28 Sep, 2016

1 commit

  • The CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64 bit time and hence make them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.
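The key point is that a filesystem timestamp should respect the filesystem's own time granularity. current_time() does this and CURRENT_TIME does not. The userspace sketch below uses hypothetical names and is loosely modeled on the kernel's timespec_trunc() helper of this era. It shows what truncating a timestamp to a given granularity (in nanoseconds) means.

```c
#include <assert.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Hypothetical timestamp type for this sketch (seconds + nanoseconds). */
struct sketch_ts {
    long long sec;
    long nsec;
};

/* Truncate a timestamp to a filesystem's granularity, given in ns. */
static struct sketch_ts trunc_to_gran(struct sketch_ts t, long gran)
{
    if (gran == SKETCH_NSEC_PER_SEC)
        t.nsec = 0;                  /* whole seconds only */
    else if (gran > 1 && gran < SKETCH_NSEC_PER_SEC)
        t.nsec -= t.nsec % gran;     /* round down to a multiple of gran */
    /* gran <= 1: nanosecond granularity, nothing to drop */
    return t;
}
```

For example, {100 s, 123456789 ns} stays unchanged at 1 ns granularity and becomes 123456000 ns at 1 µs granularity. At 1 s granularity it drops to 0 ns.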

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
     

27 Sep, 2016

2 commits

  • Generated patch:

    sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
    sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • This is trivial to do:

    - add flags argument to foo_rename()
    - check if flags is zero
    - assign foo_rename() to .rename2 instead of .rename

    This doesn't mean it's impossible to support RENAME_NOREPLACE for these
    filesystems, but it is not as trivial as it is for local filesystems.
    RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
    for a file to be created on one host while it is overwritten by rename on
    another host).
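The three conversion steps listed above can be sketched in userspace. foo_rename() and the flag constant are hypothetical. The sketch assumes, as the steps suggest, that a converted filesystem rejects any non-zero flags; -EINVAL is used as the assumed error code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag value for this sketch; the real RENAME_* constants
 * come from the kernel's uapi headers. */
#define SKETCH_RENAME_NOREPLACE 0x1u

/* A rename that has gained a flags argument but implements no flags
 * yet: refuse anything non-zero, otherwise behave as before. */
static int foo_rename(const char *old_name, const char *new_name,
                      unsigned int flags)
{
    (void)old_name;
    (void)new_name;
    if (flags)
        return -EINVAL;
    /* ... existing rename logic would run here ... */
    return 0;
}
```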

    Filesystems converted:

    9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.

    After this, we can get rid of the duplicate interfaces for rename.

    Signed-off-by: Miklos Szeredi
    Acked-by: Greg Kroah-Hartman
    Acked-by: David Howells [AFS]
    Acked-by: Mike Marshall
    Cc: Eric Van Hensbergen
    Cc: Ilya Dryomov
    Cc: Jan Harkes
    Cc: Tyler Hicks
    Cc: Oleg Drokin
    Cc: Trond Myklebust
    Cc: Mark Fasheh

    Miklos Szeredi
     

22 Sep, 2016

2 commits

  • inode_change_ok() will be responsible for clearing capabilities and IMA
    extended attributes and as such will need the dentry. Give it a dentry
    as an argument instead of an inode. Also rename inode_change_ok() to
    setattr_prepare() to better reflect that it also makes some
    modifications in addition to checks.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     
  • When file permissions are modified via chmod(2) and the user is not in
    the owning group or capable of CAP_FSETID, the setgid bit is cleared in
    inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file
    permissions as well as the new ACL, but doesn't clear the setgid bit in
    a similar way; this allows bypassing the check in chmod(2). Fix that.

    References: CVE-2016-7097
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jeff Layton
    Signed-off-by: Jan Kara
    Signed-off-by: Andreas Gruenbacher

    Jan Kara
     

20 Sep, 2016

6 commits

  • This reverts commit 38b52efd218b ("ocfs2: bump up o2cb network protocol
    version").

    This commit made rolling upgrades fail. When one node is upgraded to the
    new version with this commit, the remaining nodes fail to establish
    connections to it, so applications such as VMs on the remaining nodes
    cannot be live-migrated to the upgraded one. This causes an outage.
    Since the heartbeat timeout negotiation behavior does not change
    without this commit, revert it.

    Fixes: 38b52efd218bf ("ocfs2: bump up o2cb network protocol version")
    Link: http://lkml.kernel.org/r/1471396924-10375-1-git-send-email-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Joseph Qi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • If we punch a hole on a reflink such that following conditions are met:

    1. start offset is on a cluster boundary
    2. end offset is not on a cluster boundary
    3. (end offset is somewhere in another extent) or
    (hole range > MAX_CONTIG_BYTES(1MB)),

    we don't COW the first cluster starting at the start offset. But in this
    case, we were wrongly passing this cluster to
    ocfs2_zero_range_for_truncate() to zero out. This will modify the
    cluster in place and zero it in the source too.

    Fix this by skipping this cluster in such a scenario.
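The three conditions listed above can be written as a predicate. This is a sketch. skip_first_cluster() is hypothetical, and end_in_other_extent stands in for the extent lookup the real code performs. The checks assume a 4 KB cluster size.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CONTIG_BYTES (1024 * 1024) /* 1MB, per the message */

/* True when the punch-hole range hits the case described above: the
 * first cluster at `start` was not COWed, so it must be skipped rather
 * than zeroed in place (which would also zero the reflink source). */
static bool skip_first_cluster(unsigned long long start,
                               unsigned long long len,
                               unsigned long long cluster_size,
                               bool end_in_other_extent)
{
    unsigned long long end = start + len;

    return start % cluster_size == 0 &&      /* 1. start on a boundary */
           end % cluster_size != 0 &&        /* 2. end not on a boundary */
           (end_in_other_extent ||           /* 3. end in another extent, */
            len > MAX_CONTIG_BYTES);         /*    or range above 1MB */
}
```

With 4 KB clusters, the reproducer's `fallocate -p -o 0 -l 1048615` ends 39 bytes past a cluster boundary and is longer than 1MB, so it falls into this case.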

    To reproduce:

    1. Create a random file of say 10 MB
    xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
    2. Reflink it
    reflink -f 10MBfile reflnktest
    3. Punch a hole starting at a cluster boundary with a range greater than
    1MB. You can also use a range that will put the end offset in another
    extent.
    fallocate -p -o 0 -l 1048615 reflnktest
    4. sync
    5. Check the first cluster in the source file. (It will be zeroed out).
    dd if=10MBfile iflag=direct bs= count=1 | hexdump -C

    Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com
    Signed-off-by: Ashish Samant
    Reported-by: Saar Maoz
    Reviewed-by: Srinivas Eeda
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Cc: Eric Ren
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ashish Samant
     
  • If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
    free the truncate log and then retry. Since
    ocfs2_try_to_free_truncate_log() locks and unlocks the global bitmap
    inode, we have to unlock it before calling this function. But if the
    retried reservation then fails without the global bitmap inode lock
    taken, the error-handling branch unlocks it again and hits a BUG.

    The same issue exists when no retry is needed and ocfs2_inode_lock()
    fails. So fix it.
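The lock-tracking fix can be modeled with a flag that records whether the global bitmap inode lock is held. In this hypothetical userspace sketch, assert() plays the role of the BUG on a double unlock and -EIO is an arbitrary stand-in error. The three booleans select the failure scenarios from the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy global bitmap inode lock; assert() stands in for the BUG. */
struct sketch_lock { bool held; };

static void sk_lock(struct sketch_lock *l)   { assert(!l->held); l->held = true; }
static void sk_unlock(struct sketch_lock *l) { assert(l->held);  l->held = false; }

/* Model of the reservation path with the fix applied. On success the
 * lock is returned held to the caller; on error it is dropped only if
 * this function actually holds it. */
static int reserve_bits(struct sketch_lock *bitmap, bool lock_fails,
                        bool first_try_enospc, bool relock_fails)
{
    bool locked = false;
    int ret = 0;

    if (lock_fails) {            /* no retry needed, initial lock fails */
        ret = -EIO;
        goto bail;
    }
    sk_lock(bitmap);
    locked = true;

    if (first_try_enospc) {
        /* Drop the lock so the truncate log can be flushed (that helper
         * takes and releases the lock itself), then retry. */
        sk_unlock(bitmap);
        locked = false;
        if (relock_fails) {      /* retry fails before relocking */
            ret = -EIO;
            goto bail;
        }
        sk_lock(bitmap);
        locked = true;
    }
    return 0;

bail:
    if (locked)                  /* the fix: no unconditional unlock */
        sk_unlock(bitmap);
    return ret;
}
```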

    Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in truncate log")
    Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.com
    Signed-off-by: Joseph Qi
    Signed-off-by: Jiufei Xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • The root cause of this issue is the same as the one fixed by the previous
    patch, but this time the credits for the allocator inode and group
    descriptor may not be consumed before the transaction is extended.

    The following error was caught:

    WARNING: CPU: 0 PID: 2037 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
    Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_netfront parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
    CPU: 0 PID: 2037 Comm: rm Tainted: G W 4.1.12-37.6.3.el6uek.bug24573128v2.x86_64 #2
    Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
    Call Trace:
    dump_stack+0x48/0x5c
    warn_slowpath_common+0x95/0xe0
    warn_slowpath_null+0x1a/0x20
    start_this_handle+0x4c3/0x510 [jbd2]
    jbd2__journal_restart+0x161/0x1b0 [jbd2]
    jbd2_journal_restart+0x13/0x20 [jbd2]
    ocfs2_extend_trans+0x74/0x220 [ocfs2]
    ocfs2_free_cached_blocks+0x16b/0x4e0 [ocfs2]
    ocfs2_run_deallocs+0x70/0x270 [ocfs2]
    ocfs2_commit_truncate+0x474/0x6f0 [ocfs2]
    ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
    ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
    ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
    ocfs2_evict_inode+0x28/0x60 [ocfs2]
    evict+0xab/0x1a0
    iput_final+0xf6/0x190
    iput+0xc8/0xe0
    do_unlinkat+0x1b7/0x310
    SyS_unlinkat+0x22/0x40
    system_call_fastpath+0x12/0x71
    ---[ end trace a62437cb060baa71 ]---
    JBD2: rm wants too many credits (149 > 128)

    Link: http://lkml.kernel.org/r/1473674623-11810-2-git-send-email-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • Every time, ocfs2_extend_trans() included a credit for the truncate log
    inode, but since that inode had already been added to the running jbd2
    transaction the first time, that credit is not consumed until
    jbd2_journal_restart().

    Since the total credits to extend always included the unconsumed ones,
    unconsumed credits keep piling up, and eventually jbd2_journal_restart()
    fails because the credit count exceeds half of the maximum transaction
    credits.
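Below is a toy model of the accounting described above. It is not jbd2 code, and the parameters are hypothetical. It assumes that each extension requests `per_round` credits, of which `unused_per_round` (the truncate log inode credit) are never consumed. Every unconsumed credit is carried into the next request. The limit of 128 used in the checks is taken from the log below.

```c
#include <assert.h>

/* Returns how many rounds succeed before a request would exceed
 * `limit`, or max_rounds if it never does. */
static int rounds_before_overflow(int start, int per_round,
                                  int unused_per_round, int limit,
                                  int max_rounds)
{
    int outstanding = start;  /* unconsumed credits on the handle */

    for (int round = 0; round < max_rounds; round++) {
        int request = outstanding + per_round;  /* carried + new */
        if (request > limit)
            return round;                       /* restart refused */
        outstanding = request - (per_round - unused_per_round);
    }
    return max_rounds;
}
```

With 10 credits per round and one left unconsumed each time, the outstanding count grows by one per round, so the 128 limit is exceeded after 119 rounds. When nothing is left over, the request never grows.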

    The following error was caught when unlinking a large file with many
    extents:

    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 13626 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
    Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
    CPU: 0 PID: 13626 Comm: unlink Tainted: G W 4.1.12-37.6.3.el6uek.x86_64 #2
    Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
    Call Trace:
    dump_stack+0x48/0x5c
    warn_slowpath_common+0x95/0xe0
    warn_slowpath_null+0x1a/0x20
    start_this_handle+0x4c3/0x510 [jbd2]
    jbd2__journal_restart+0x161/0x1b0 [jbd2]
    jbd2_journal_restart+0x13/0x20 [jbd2]
    ocfs2_extend_trans+0x74/0x220 [ocfs2]
    ocfs2_replay_truncate_records+0x93/0x360 [ocfs2]
    __ocfs2_flush_truncate_log+0x13e/0x3a0 [ocfs2]
    ocfs2_remove_btree_range+0x458/0x7f0 [ocfs2]
    ocfs2_commit_truncate+0x1b3/0x6f0 [ocfs2]
    ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
    ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
    ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
    ocfs2_evict_inode+0x28/0x60 [ocfs2]
    evict+0xab/0x1a0
    iput_final+0xf6/0x190
    iput+0xc8/0xe0
    do_unlinkat+0x1b7/0x310
    SyS_unlink+0x16/0x20
    system_call_fastpath+0x12/0x71
    ---[ end trace 28aa7410e69369cf ]---
    JBD2: unlink wants too many credits (251 > 128)

    Link: http://lkml.kernel.org/r/1473674623-11810-1-git-send-email-junxiao.bi@oracle.com
    Signed-off-by: Junxiao Bi
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
    checks whether the lockres master has changed to determine whether the
    new master has finished recovery. This introduces a race: right after
    the old master unmounts (which means the master will change), a new
    convert request arrives.

    In this case, the lockres state is reset to DLM_RECOVERING and the
    convert is retried, which then fails with lockres->l_action set to
    OCFS2_AST_INVALID. This causes an inconsistent lock level between
    ocfs2 and dlm, and finally a BUG.

    Since dlm recovery clears lock->convert_pending in
    dlm_move_lockres_to_recovery_list, we can use it to correctly identify
    the race between convert and recovery, and fix it that way.

    Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
    Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com
    Signed-off-by: Joseph Qi
    Signed-off-by: Jun Piao
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
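To make the credit arithmetic in the ocfs2_extend_trans() entry above concrete, here is a deliberately simplified C model; it is not ocfs2 or jbd2 code. It assumes every extend falls through to a restart that requests the remaining credits plus a fixed per-record amount (per_record, an arbitrary parameter), one credit of which is for the truncate log inode and is never consumed because that inode is already in the running transaction. The "half of the maximum transaction size" limit follows the description above; model_handle, model_restart and model_replay are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified handle: only the reserved credit count is modelled. */
struct model_handle {
    int credits;    /* credits still reserved by the handle */
};

/*
 * Models the jbd2 sanity check described above: a single operation may ask
 * for at most half of the maximum transaction size.
 */
static bool model_restart(struct model_handle *h, int wanted, int max_trans)
{
    if (wanted > max_trans / 2)
        return false;           /* "wants too many credits" */
    h->credits = wanted;
    return true;
}

/*
 * Replays nrecords truncate records. Each record asks for per_record credits,
 * one of them for the truncate log inode. That inode is already part of the
 * running transaction, so only per_record - 1 credits are really consumed.
 * Assumption: every extend falls through to a restart requesting
 * "remaining + per_record". Returns how many records were replayed before
 * the restart was refused (nrecords if it never was).
 */
static int model_replay(int nrecords, int per_record, int max_trans)
{
    struct model_handle h = { .credits = per_record };

    assert(per_record >= 1);
    for (int i = 0; i < nrecords; i++) {
        if (i > 0 && !model_restart(&h, h.credits + per_record, max_trans))
            return i;
        h.credits -= per_record - 1;    /* truncate log inode credit left over */
    }
    return nrecords;
}
```

With per_record = 4 and a maximum of 256 (giving the 128 limit seen in the logs), each restart asks for one more credit than the previous one, and the request is refused at record 125 even though every record only ever uses 3 credits.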
     

06 Aug, 2016

1 commit

  • Pull qstr constification updates from Al Viro:
    "Fairly self-contained bunch - surprising lot of places passes struct
    qstr * as an argument when const struct qstr * would suffice; it
    complicates analysis for no good reason.

    I'd prefer to feed that separately from the assorted fixes (those are
    in #for-linus and with somewhat trickier topology)"

    * 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    qstr: constify instances in adfs
    qstr: constify instances in lustre
    qstr: constify instances in f2fs
    qstr: constify instances in ext2
    qstr: constify instances in vfat
    qstr: constify instances in procfs
    qstr: constify instances in fuse
    qstr constify instances in fs/dcache.c
    qstr: constify instances in nfs
    qstr: constify instances in ocfs2
    qstr: constify instances in autofs4
    qstr: constify instances in hfs
    qstr: constify instances in hfsplus
    qstr: constify instances in logfs
    qstr: constify dentry_init_security

    Linus Torvalds
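The constification argument above can be shown with a tiny example. struct toy_qstr below is a hypothetical, simplified stand-in for struct qstr, not the kernel definition; the point is only that a const-qualified parameter states in the signature that the callee does not modify the name, and lets callers pass const data without casts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct qstr: a name and its length. */
struct toy_qstr {
    const unsigned char *name;
    unsigned int len;
};

/*
 * Takes const pointers: the signature alone tells a reader (or an analysis
 * tool) that neither argument is modified, without reading the body.
 */
static int toy_qstr_eq(const struct toy_qstr *a, const struct toy_qstr *b)
{
    assert(a != NULL && b != NULL);
    return a->len == b->len && memcmp(a->name, b->name, a->len) == 0;
}
```

Had the parameters been plain struct toy_qstr *, passing the const objects in a caller would require casting away const.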
     

03 Aug, 2016

5 commits

  • We found a situation, described below, in which dlm becomes blocked
    because recovery masters go down one after another. To solve this
    problem, we purge the recovery lock as soon as we detect that the
    recovery master has gone down.

    N2: goes down.
    N1 (reco master): picks up the recovery lock and begins recovering
        for N2.
    N1: goes down.
    N3: fails to pick up the recovery lock, then purges it:
        dlm_purge_lockres
        ->DROPPING_REF is set
    N3: sending deref to N1 fails, so the recovery lock is not purged.
    N3: finds N1 down and begins recovering for N1, but is blocked in
        dlm_do_recovery because DROPPING_REF is set:
        dlm_do_recovery
        ->dlm_pick_recovery_master
        ->dlmlock
        ->dlm_get_lock_resource
        ->__dlm_wait_on_lockres_flags(tmpres,
        DLM_LOCK_RES_DROPPING_REF);

    Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
    Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com
    Signed-off-by: Jun Piao
    Reviewed-by: Joseph Qi
    Reviewed-by: Jiufei Xue
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    piaojun
     
  • We found a BUG, described below, in which the lockres is migrated
    during deref. To solve it, we purge the lockres directly when the other
    node says we do not hold a ref. Additionally, we purge the lockres if
    the master goes down, since no one will respond with deref done.

    N1: dlm_purge_lockres, sends deref to N2.
    N2 (old master): leaves the domain and migrates the lockres to N3.
    N3 (new master): finishes migration and sends do assert master to N1.
    N1: receives the do assert message from N3 but cannot find the lockres
        because DROPPING_REF is set, so the owner is still N2.
    N2: receives deref from N1 and responds -EINVAL because the lockres
        has been migrated.
    N1: BUGs when receiving -EINVAL in dlm_drop_lockres_ref.

    Fixes: 842b90b62461d ("ocfs2/dlm: return in progress if master can not clear the refmap bit right now")

    Link: http://lkml.kernel.org/r/57845103.3070406@huawei.com
    Signed-off-by: Jun Piao
    Reviewed-by: Joseph Qi
    Reviewed-by: Jiufei Xue
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    piaojun
     
  • …eref_lockres_done_handler

    We found a BUG, described below, in which DLM_LOCK_RES_DROPPING_REF is
    cleared unexpectedly. To solve it, we disable the BUG_ON and purge the
    lockres in dlm_do_local_recovery_cleanup.

    N1: dlm_purge_lockres
    N2 (master): dlm_deref_lockres_handler; DLM_LOCK_RES_SETREF_INPROG
        is set, so it responds DLM_DEREF_RESPONSE_INPROG.
    N1: receives DLM_DEREF_RESPONSE_INPROG, stops purging in
        dlm_purge_lockres and waits for DLM_DEREF_RESPONSE_DONE.
    N2: dispatches dlm_deref_lockres_worker, which responds
        DLM_DEREF_RESPONSE_DONE.
    N1: receives DLM_DEREF_RESPONSE_DONE and prepares to purge the lockres.
    N2: goes down.
    N1: finds N2 down and does local cleanup for N2:
        dlm_do_local_recovery_cleanup
        -> clear DLM_LOCK_RES_DROPPING_REF
    N1: when purging the lockres, the BUG_ON fires because
        DLM_LOCK_RES_DROPPING_REF is clear:
        dlm_deref_lockres_done_handler
        ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));

    [akpm@linux-foundation.org: fix duplicated write to `ret']
    Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message")
    Link: http://lkml.kernel.org/r/57845055.9080702@huawei.com
    Signed-off-by: Jun Piao <piaojun@huawei.com>
    Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
    Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
    Reviewed-by: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    piaojun
     
  • The testcase "mmaptruncate" in the ocfs2 test suite always fails with
    an ENOSPC error on a small volume (say, less than 10G). This testcase
    repeatedly performs "extend" and "truncate" on a file: it truncates the
    file to half of its size and then extends it back to full size. The
    main bitmap quickly runs out of space because the "truncate" code
    prevents the truncate log from being flushed by
    ocfs2_schedule_truncate_log_flush(osb, 1), while the truncate log may
    have cached lots of clusters.

    So, when ENOSPC is returned, flush the truncate log and retry the
    allocation. The deleted blocks cannot be reused before the transaction
    is committed. Fortunately, a function that does this already exists:
    ocfs2_try_to_free_truncate_log(). It just needs its "static" modifier
    removed and to be put in the right place.

    The "unlock"/"lock" code isn't elegant, but there seems to be no better
    option.

    [zren@suse.com: locking fix]
    Link: http://lkml.kernel.org/r/1468031546-4797-1-git-send-email-zren@suse.com
    Link: http://lkml.kernel.org/r/1466586469-5541-1-git-send-email-zren@suse.com
    Signed-off-by: Eric Ren
    Reviewed-by: Gang He
    Reviewed-by: Joseph Qi
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Ren
     
  • We encountered a bug reported by a customer: the user ran fsck.ocfs2 on
    the file system and it exited abnormally, leaving its lockspace (with
    LVB size = 32) in kernel space. When the user next mounted this file
    system, the kernel module did not create a new lockspace (LVB size =
    64) by calling dlm_new_lockspace() at mount time; it just used the
    existing lockspace created by the user space tool. As a result, the
    user was not able to mount this file system from the other nodes, with
    error messages like:

    dlm: 032F5......: config mismatch: 64,0 nodeid 177127961: 32,0
    (mount.ocfs2,26981,46):ocfs2_dlm_init:2995 ERROR: status = -71
    ocfs2_mount_volume:1881 ERROR: status = -71
    ocfs2_fill_super:1236 ERROR: status = -71

    The user found the root cause very difficult to track down, so we
    wrote this patch to relieve such problems.

    First, we add one more flag when calling dlm_new_lockspace() to make
    sure the lockspace is created by the kernel module itself; this change
    does not affect backward compatibility.

    Second, an obvious error message is reported in the kernel log, making
    it easier for the user to find the root cause.

    This patch ensures that the dlm lockspace is created by the kernel
    module when mounting an ocfs2 file system. A lockspace can be created
    from either user space or kernel space, but lockspaces with the same
    name may have different lvblen lengths/flags.

    To avoid this mixed use, we add one more flag, DLM_LSFL_NEWEXCL, which
    makes sure the dlm lockspace is created by the kernel module when
    mounting. Second, if a user space program (ocfs2-tools) is running on a
    file system and the user tries to mount this file system in the
    cluster, the DLM module will return -EEXIST or -EPROTO; we should then
    give the user an obvious error message so that the user can make that
    user space tool exit before mounting the file system again.

    Link: http://lkml.kernel.org/r/1463731940-13044-2-git-send-email-ghe@suse.com
    Signed-off-by: Gang He
    Reviewed-by: Goldwyn Rodrigues
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gang He
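The DLM_LSFL_NEWEXCL behaviour from the lockspace entry above can be modelled as an exclusive-create flag. The sketch below is hypothetical and is not the dlm implementation: MODEL_NEWEXCL merely stands in for DLM_LSFL_NEWEXCL, the table and function names are invented, and only the local -EEXIST case is modelled (the cross-node -EPROTO config mismatch is not).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_MAX_LS  8
#define MODEL_NEWEXCL 0x1u   /* hypothetical stand-in for DLM_LSFL_NEWEXCL */

struct model_ls {
    char name[32];
    int lvblen;     /* LVB length chosen by whoever created the lockspace */
    int users;
};

static struct model_ls model_table[MODEL_MAX_LS];
static int model_count;

/* Returns the lockspace index (>= 0) on success, a negative errno on failure. */
static int model_new_lockspace(const char *name, int lvblen, unsigned int flags)
{
    assert(strlen(name) < sizeof(model_table[0].name));
    for (int i = 0; i < model_count; i++) {
        if (strcmp(model_table[i].name, name) != 0)
            continue;
        if (flags & MODEL_NEWEXCL)
            return -EEXIST;         /* refuse to reuse an existing lockspace */
        model_table[i].users++;     /* silently reuse it, old lvblen and all */
        return i;
    }
    if (model_count == MODEL_MAX_LS)
        return -ENOMEM;

    struct model_ls *ls = &model_table[model_count];
    strcpy(ls->name, name);
    ls->lvblen = lvblen;
    ls->users = 1;
    return model_count++;
}
```

Without the flag, a second caller asking for a 64-byte LVB silently gets the leftover lockspace with its 32-byte LVB, which mirrors the fsck.ocfs2 scenario; with the flag, the mount path gets -EEXIST and can report a clear error.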
     

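The ENOSPC retry described in the mmaptruncate entry of the 03 Aug, 2016 list above can be sketched with a toy allocator. This is a hypothetical model, not ocfs2 code: model_fs, model_claim, model_truncate, model_flush_truncate_log and model_alloc_with_retry are invented names, and the flush only stands in for what ocfs2_try_to_free_truncate_log() is described as doing (flushing the truncate log so that its clusters become reusable once committed). Locking and journaling are ignored.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a cluster allocator with a truncate log. */
struct model_fs {
    int free_clusters;  /* clusters available in the main bitmap */
    int truncate_log;   /* clusters freed by truncate, not yet reusable */
};

/* Truncate: freed clusters go to the log, not straight back to the bitmap. */
static void model_truncate(struct model_fs *fs, int clusters)
{
    assert(clusters >= 0);
    fs->truncate_log += clusters;
}

/* Stand-in for flushing the log and committing; returns clusters reclaimed. */
static int model_flush_truncate_log(struct model_fs *fs)
{
    int n = fs->truncate_log;

    fs->free_clusters += n;
    fs->truncate_log = 0;
    return n;
}

static int model_claim(struct model_fs *fs, int want)
{
    if (fs->free_clusters < want)
        return -ENOSPC;
    fs->free_clusters -= want;
    return 0;
}

/* Allocate; on -ENOSPC, flush the truncate log once and retry if it helped. */
static int model_alloc_with_retry(struct model_fs *fs, int want)
{
    int ret = model_claim(fs, want);

    if (ret == -ENOSPC && model_flush_truncate_log(fs) > 0)
        ret = model_claim(fs, want);
    return ret;
}
```

After a truncate the freed clusters sit in the log, so a plain allocation would fail even though the space is reclaimable; the single flush-and-retry recovers it, and a genuine shortage still returns -ENOSPC.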
29 Jul, 2016

4 commits

  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    fat: fix error message for bogus number of directory entries
    fat: fix typo s/supeblock/superblock/
    ASoC: max9877: Remove unused function declaration
    dw2102: don't output spurious blank lines to the kernel log
    init: fix Kconfig text
    ARM: io: fix comment grammar
    ocfs: fix ocfs2_xattr_user_get() argument name
    scsi/qla2xxx: Remove erroneous unused macro qla82xx_get_temp_val1()

    Linus Torvalds
     
  • Pull quota update from Jan Kara:
    "time64 support for quota"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    quota: use time64_t internally

    Linus Torvalds
     
  • Pull vfs updates from Al Viro:
    "Assorted cleanups and fixes.

    Probably the most interesting part long-term is ->d_init() - that will
    have a bunch of followups in (at least) ceph and lustre, but we'll
    need to sort the barrier-related rules before it can get used for
    really non-trivial stuff.

    Another fun thing is the merge of ->d_iput() callers (dentry_iput()
    and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
    except the one in __d_lookup_lru())"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
    fs/dcache.c: avoid soft-lockup in dput()
    vfs: new d_init method
    vfs: Update lookup_dcache() comment
    bdev: get rid of ->bd_inodes
    Remove last traces of ->sync_page
    new helper: d_same_name()
    dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
    vfs: clean up documentation
    vfs: document ->d_real()
    vfs: merge .d_select_inode() into .d_real()
    unify dentry_iput() and dentry_unlink_inode()
    binfmt_misc: ->s_root is not going anywhere
    drop redundant ->owner initializations
    ufs: get rid of redundant checks
    orangefs: constify inode_operations
    missed comment updates from ->direct_IO() prototype change
    file_inode(f)->i_mapping is f->f_mapping
    trim fsnotify hooks a bit
    9p: new helper - v9fs_parent_fid()
    debugfs: ->d_parent is never NULL or negative
    ...

    Linus Torvalds
     
  • This changes the vfs dentry hashing to mix in the parent pointer at the
    _beginning_ of the hash, rather than at the end.

    That actually improves both the hash and the code generation, because we
    can move more of the computation to the "static" part of the dcache
    setup, and do less at lookup runtime.

    It turns out that a lot of other hash users also really wanted to mix in
    a base pointer as a 'salt' for the hash, and so the slightly extended
    interface ends up working well for other cases too.

    Users that want a string hash that is purely about the string pass in a
    'salt' pointer of NULL.

    * merge branch 'salted-string-hash':
    fs/dcache.c: Save one 32-bit multiply in dcache lookup
    vfs: make the string hashes salt the hash

    Linus Torvalds
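The idea of mixing the parent pointer in at the beginning of the hash can be illustrated with a toy FNV-1a style hash. This is not the kernel's dcache hash, and toy_seed, toy_hash_continue and toy_salted_hash are hypothetical names. The point is structural: the salt-dependent part (the seed) can be computed once per parent, leaving only the loop over the name at lookup time, and a NULL salt gives a hash that depends on the string alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FNV_BASIS 2166136261u
#define TOY_FNV_PRIME 16777619u

/* Salt-dependent part: computable once per parent (the "static" part). */
static uint32_t toy_seed(const void *salt)
{
    uintptr_t s = (uintptr_t)salt;
    uint32_t h = TOY_FNV_BASIS;

    for (size_t i = 0; i < sizeof(s); i++) {
        h ^= (uint32_t)((s >> (8 * i)) & 0xffu);
        h *= TOY_FNV_PRIME;
    }
    return h;
}

/* Per-lookup part: hash the name bytes starting from a precomputed seed. */
static uint32_t toy_hash_continue(uint32_t seed, const char *name, size_t len)
{
    uint32_t h = seed;

    assert(name != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)name[i];
        h *= TOY_FNV_PRIME;
    }
    return h;
}

/* Salt mixed in first; salt == NULL yields a purely string-based hash. */
static uint32_t toy_salted_hash(const void *salt, const char *name, size_t len)
{
    return toy_hash_continue(toy_seed(salt), name, len);
}
```

With this split, a directory lookup would compute toy_seed(parent) once and call toy_hash_continue() for each name it hashes under that parent.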
     

27 Jul, 2016

1 commit

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2

    - most(?) of MM

    * emailed patches from Andrew Morton : (125 commits)
    thp: fix comments of __pmd_trans_huge_lock()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root
    mm: memcontrol: fix documentation for compound parameter
    mm: memcontrol: remove BUG_ON in uncharge_list
    mm: fix build warnings in
    mm, thp: convert from optimistic swapin collapsing to conservative
    mm, thp: fix comment inconsistency for swapin readahead functions
    thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
    shmem: split huge pages beyond i_size under memory pressure
    thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
    khugepaged: add support of collapse for tmpfs/shmem pages
    shmem: make shmem_inode_info::lock irq-safe
    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
    thp: extract khugepaged from mm/huge_memory.c
    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
    shmem: add huge pages support
    shmem: get_unmapped_area align huge page
    shmem: prepare huge= mount option and sysfs knob
    mm, rmap: account shmem thp pages
    ...

    Linus Torvalds