06 Nov, 2015

2 commits

  • A simplified test case (from Ryan):
    1) dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct;
    2) truncate /mnt/hello -s 2097152
    The file 'hello' does not exist before the test. After these commands,
    'hello' should be all zeros, but bytes 512~4096 contain random data.

    Set the bh state to new when a new block is allocated, so that
    direct_io_worker()->dio_zero_block() fills the unused portion
    of the block with zeros.

    Signed-off-by: Yiwen Jiang
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    jiangyiwen
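
    To illustrate the zeroing described above, here is a minimal userspace
    sketch in C. It is not the kernel's dio_zero_block(); the helper name
    zero_unwritten() and the buffer layout are assumptions made purely for
    illustration. It shows why everything outside the written bytes of a
    freshly allocated block has to be cleared.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not kernel code): given a freshly allocated block
 * that still holds stale bytes, zero everything outside the written range
 * [off, off + len). The caller must ensure off + len <= block_size.
 */
static inline void zero_unwritten(uint8_t *block, size_t block_size,
                                  size_t off, size_t len)
{
    /* Head of the block, before the written range. */
    memset(block, 0, off);
    /* Tail of the block, after the written range. */
    memset(block + off + len, 0, block_size - off - len);
}
```

    For a 512-byte write at offset 0 of a 4096-byte block, this clears
    bytes 512 through 4095, the range in which the test case found random
    data.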
     
  • If ocfs2_is_overwrite fails, ocfs2_direct_IO_write may still return
    success to the caller.

    Signed-off-by: Norton.Zhu
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Norton.Zhu
     

05 Sep, 2015

5 commits

  • These uses sometimes do and sometimes don't have '\n' terminations. Make
    the uses consistently use '\n' terminations and remove the newline from
    the functions.

    Miscellanea:

    o Coalesce formats
    o Realign arguments

    Signed-off-by: Joe Perches
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • 1. After we call ocfs2_journal_access_di() in ocfs2_write_begin(),
    jbd2_journal_restart() may also be called. That function decrements
    transaction A's t_updates and obtains a new transaction B. If
    jbd2_journal_commit_transaction() happens to be committing transaction
    A, then once t_updates==0 it goes on to complete the commit and unfile
    the buffer.

    So when jbd2_journal_dirty_metadata() is called, the handle points to
    the new transaction B and the buffer head's journal head has already
    been freed (jh->b_transaction == NULL, jh->b_next_transaction == NULL),
    so it returns EINVAL, which triggers the BUG_ON(status).

    thread 1                                  jbd2
    ocfs2_write_begin                         jbd2_journal_commit_transaction
    ocfs2_write_begin_nolock
      ocfs2_start_trans
        jbd2__journal_start(t_updates+1,
                            transaction A)
      ocfs2_journal_access_di
      ocfs2_write_cluster_by_desc
        ocfs2_mark_extent_written
          ocfs2_change_extent_flag
            ocfs2_split_extent
              ocfs2_extend_rotate_transaction
                jbd2_journal_restart
                (t_updates-1,transaction B)   t_updates==0
                                              __jbd2_journal_refile_buffer
                                              (jh->b_transaction = NULL)
    ocfs2_write_end
      ocfs2_write_end_nolock
        ocfs2_journal_dirty
          jbd2_journal_dirty_metadata(bug)
        ocfs2_commit_trans
    2. In ext4, I found that jbd2_journal_get_write_access() is called
    from ext4_write_end:

    ext4_write_begin
      ext4_journal_start
        __ext4_journal_start_sb
          ext4_journal_check_start
          jbd2__journal_start

    ext4_write_end
      ext4_mark_inode_dirty
        ext4_reserve_inode_write
          ext4_journal_get_write_access
            jbd2_journal_get_write_access
        ext4_mark_iloc_dirty
          ext4_do_update_inode
            ext4_handle_dirty_metadata
              jbd2_journal_dirty_metadata

    3. So I think we should call ocfs2_journal_access_di before
    ocfs2_journal_dirty in ocfs2_write_end. It works well after my
    modification.

    Signed-off-by: vicky
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Zhangguanghui
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    yangwenfang
     
  • In ocfs2, ip_alloc_sem is used to protect allocation changes on the
    node. In direct IO, we take ip_alloc_sem to keep data consistent when
    direct IO races with ocfs2_truncate_file (buffered IO already uses
    ip_alloc_sem). Although the inode->i_mutex lock avoids concurrency in
    the above situation, I think ip_alloc_sem is still needed because
    protecting allocation changes is significant.

    Other filesystems such as ext4 also use an rw_semaphore to keep data
    consistent in the get_block-vs-truncate race, by other means. So
    ip_alloc_sem is needed in ocfs2 direct IO.

    Signed-off-by: Weiwei Wang
    Signed-off-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WeiWei Wang
     
  • 1) Take rw EX lock in case of append dio.
    2) Explicitly treat the error code -EIOCBQUEUED as normal.
    3) Set di_bh to NULL after brelse if it may be used again later.

    Signed-off-by: Joseph Qi
    Cc: Yiwen Jiang
    Cc: Weiwei Wang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
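
    Point 3 above is the familiar release-and-clear pattern. The following
    is a minimal userspace sketch of it, assuming a toy reference-counted
    buffer; struct buf, buf_put() and buf_put_and_clear() are hypothetical
    names for illustration and are not the ocfs2 code.

```c
#include <stddef.h>

/* Toy stand-in for a reference-counted buffer head. */
struct buf {
    int refcount;
};

/* Drop one reference. Like brelse(), a NULL pointer is a no-op. */
static inline void buf_put(struct buf *b)
{
    if (b)
        b->refcount--;
}

/*
 * Drop the reference and clear the caller's pointer, so that a later
 * shared cleanup path cannot drop the same reference a second time.
 */
static inline void buf_put_and_clear(struct buf **bp)
{
    buf_put(*bp);
    *bp = NULL;
}
```

    After buf_put_and_clear(&p), a later buf_put(p) on a common exit path
    sees NULL and does nothing, so the reference count is decremented only
    once.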
     
  • During direct io the inode is first added to the orphan dir and then
    deleted from it. There is a race window in which the orphan entry may
    be deleted twice, which triggers the BUG when validating
    OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan.

    ocfs2_direct_IO_write
      ...
      ocfs2_add_inode_to_orphan
      >>>>>>>> race window.
               1) another node may rm the file and then go down; this node
                  takes care of orphan recovery and clears the flag
                  OCFS2_DIO_ORPHANED_FL.
               2) since the rw lock is unlocked, it may race with another
                  orphan recovery and append dio.
      ocfs2_del_inode_from_orphan

    So take inode mutex lock when recovering orphans and make rw unlock at the
    end of aio write in case of append dio.

    Signed-off-by: Joseph Qi
    Reported-by: Yiwen Jiang
    Cc: Weiwei Wang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

07 Aug, 2015

1 commit

  • When using a large volume, for example 9T volume with 2T already used,
    frequent creation of small files with O_DIRECT when the IO is not
    cluster aligned may clear sectors in the wrong place. This will cause
    filesystem corruption.

    This is because p_cpos is a u32. When calculating the corresponding
    sector it should be converted to u64 first, otherwise it may overflow.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: [4.0+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
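
    The overflow described above can be reproduced in a few lines of
    userspace C. This is a sketch under stated assumptions, not the ocfs2
    code: the function names are hypothetical, sectors are taken to be 512
    bytes, and cluster_bits is log2 of the cluster size in bytes.

```c
#include <stdint.h>

/*
 * Buggy variant: the shift is evaluated in 32 bits, so the high bits are
 * discarded before the result is widened to 64 bits.
 */
static inline uint64_t cluster_to_sector_wrong(uint32_t p_cpos,
                                               unsigned cluster_bits)
{
    return p_cpos << (cluster_bits - 9);
}

/* Fixed variant: widen to 64 bits first, then shift. */
static inline uint64_t cluster_to_sector(uint32_t p_cpos,
                                         unsigned cluster_bits)
{
    return (uint64_t)p_cpos << (cluster_bits - 9);
}
```

    With an assumed 4 KiB cluster size (cluster_bits = 12), cluster
    0x20000000 sits 2 TiB into the volume. The fixed variant yields sector
    0x100000000, while the buggy one wraps to sector 0, so the zeroing
    would land at the start of the volume.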
     

25 Jun, 2015

3 commits

  • contig_blocks gotten from ocfs2_extent_map_get_blocks cannot be compared
    with clusters_to_alloc. So convert it to clusters first.

    Signed-off-by: Joseph Qi
    Reviewed-by: Weiwei Wang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
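
    The unit mismatch above is easy to see with numbers. Below is a minimal
    sketch, assuming power-of-two block and cluster sizes. The helper name
    blocks_to_clusters() is hypothetical, and rounding down is an assumption
    of this sketch rather than something the commit message states.

```c
#include <stdint.h>

/*
 * Convert a count of blocks to a count of whole clusters.
 * cluster_bits and block_bits are log2 of the respective sizes in bytes,
 * with cluster_bits >= block_bits. Partial clusters are rounded down.
 */
static inline uint32_t blocks_to_clusters(uint64_t blocks,
                                          unsigned cluster_bits,
                                          unsigned block_bits)
{
    return (uint32_t)(blocks >> (cluster_bits - block_bits));
}
```

    With 4 KiB blocks and 32 KiB clusters, 16 contiguous blocks are only 2
    clusters. Comparing the raw block count against a request for 4
    clusters would wrongly conclude that enough contiguous space is mapped.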
     
  • In ocfs2 direct read/write, the OCFS2_IOCB_SEM lock type was used to
    protect the inode->i_alloc_sem rw semaphore in earlier kernel versions.
    However, in the latest kernel inode->i_alloc_sem is not used at all,
    so the OCFS2_IOCB_SEM lock type needs to be removed.

    Signed-off-by: Weiwei Wang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Reviewed-by: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WeiWei Wang
     
  • If dio crashes, it leaves an entry in the orphan dir, and the orphan
    scan takes care of the cleanup. There is a tiny race in which the same
    entry is truncated twice, which then triggers the BUG in
    ocfs2_del_inode_from_orphan.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

17 Apr, 2015

1 commit

  • Pull third hunk of vfs changes from Al Viro:
    "This contains the ->direct_IO() changes from Omar + saner
    generic_write_checks() + dealing with fcntl()/{read,write}() races
    (mirroring O_APPEND/O_DIRECT into iocb->ki_flags and instead of
    repeatedly looking at ->f_flags, which can be changed by fcntl(2),
    check ->ki_flags - which cannot) + infrastructure bits for dhowells'
    d_inode annotations + Christoph's switch of /dev/loop to
    vfs_iter_write()"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
    block: loop: switch to VFS ITER_BVEC
    configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode
    VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
    VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
    VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
    NFS: Don't use d_inode as a variable name
    VFS: Impose ordering on accesses of d_inode and d_flags
    VFS: Add owner-filesystem positive/negative dentry checks
    nfs: generic_write_checks() shouldn't be done on swapout...
    ocfs2: use __generic_file_write_iter()
    mirror O_APPEND and O_DIRECT into iocb->ki_flags
    switch generic_write_checks() to iocb and iter
    ocfs2: move generic_write_checks() before the alignment checks
    ocfs2_file_write_iter: stop messing with ppos
    udf_file_write_iter: reorder and simplify
    fuse: ->direct_IO() doesn't need generic_write_checks()
    ext4_file_write_iter: move generic_write_checks() up
    xfs_file_aio_write_checks: switch to iocb/iov_iter
    generic_write_checks(): drop isblk argument
    blkdev_write_iter: expand generic_file_checks() call in there
    ...

    Linus Torvalds
     

15 Apr, 2015

5 commits

  • Merge first patchbomb from Andrew Morton:

    - arch/sh updates

    - ocfs2 updates

    - kernel/watchdog feature

    - about half of mm/

    * emailed patches from Andrew Morton: (122 commits)
    Documentation: update arch list in the 'memtest' entry
    Kconfig: memtest: update number of test patterns up to 17
    arm: add support for memtest
    arm64: add support for memtest
    memtest: use phys_addr_t for physical addresses
    mm: move memtest under mm
    mm, hugetlb: abort __get_user_pages if current has been oom killed
    mm, mempool: do not allow atomic resizing
    memcg: print cgroup information when system panics due to panic_on_oom
    mm: numa: remove migrate_ratelimited
    mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
    mm: split ET_DYN ASLR from mmap ASLR
    s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
    mm: expose arch_mmap_rnd when available
    s390: standardize mmap_rnd() usage
    powerpc: standardize mmap_rnd() usage
    mips: extract logic for mmap_rnd()
    arm64: standardize mmap_rnd() usage
    x86: standardize mmap_rnd() usage
    arm: factor out mmap ASLR into mmap_rnd
    ...

    Linus Torvalds
     
  • In ocfs2_direct_IO_write, we use ocfs2_zero_extend to zero allocated
    clusters when the write is not cluster aligned. But ocfs2_zero_extend
    goes through the page cache, so it may clear data that
    blockdev_direct_IO has already written.

    We should use blkdev_issue_zeroout instead of ocfs2_zero_extend during
    direct IO.

    So fix this issue by introducing ocfs2_direct_IO_zero_extend and
    ocfs2_direct_IO_extend_no_holes.

    Reported-by: Yiwen Jiang
    Signed-off-by: Joseph Qi
    Tested-by: Yiwen Jiang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • We need to take the inode lock when calling ocfs2_get_clusters,
    and use GFP_NOFS instead of GFP_KERNEL.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • Since di_bh won't be used when zeroing extend, set it to NULL.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • Zeroing out for writes that are not cluster aligned only needs to be
    considered when direct IO succeeds.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

12 Apr, 2015

3 commits


26 Mar, 2015

1 commit


17 Feb, 2015

2 commits

  • Allow block allocation in ocfs2_direct_IO_get_blocks.

    Signed-off-by: Joseph Qi
    Cc: Weiwei Wang
    Cc: Junxiao Bi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: Xuejiufei
    Cc: alex chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • Implement ocfs2_direct_IO_write. Add the inode to orphan dir first, and
    then delete it once append O_DIRECT finished.

    This is to make sure block allocation and inode size are consistent.

    [akpm@linux-foundation.org: fix it for "block: Add discard flag to blkdev_issue_zeroout() function"]
    Signed-off-by: Joseph Qi
    Cc: Weiwei Wang
    Cc: Junxiao Bi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: Xuejiufei
    Cc: alex chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     

19 Dec, 2014

1 commit

  • For buffered writes, the page lock is taken in write_begin and released
    in write_end. In ocfs2_write_end_nolock(), before the page is unlocked
    in ocfs2_free_write_ctxt(), ocfs2_run_deallocs() is called, and this
    asks for the read lock of journal->j_trans_barrier. Holding the page
    lock while asking for journal->j_trans_barrier breaks the locking order.

    This causes a deadlock with the journal commit threads: ocfs2cmt takes
    the write lock of journal->j_trans_barrier first, then wakes up
    kjournald2 to do the commit work, and finally waits until it is done.
    To commit the journal, kjournald2 needs to flush data first, which
    requires taking the cache page lock.

    Since some ocfs2 cluster locks are held by the writing process, this
    deadlock may hang the whole cluster.

    Unlocking the pages before ocfs2_run_deallocs() fixes the locking
    order. Also move the unlock before ocfs2_commit_trans() so that the
    page lock is released before j_trans_barrier, preserving the unlocking
    order.

    Signed-off-by: Junxiao Bi
    Reviewed-by: Wengang Wang
    Cc:
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     

11 Dec, 2014

1 commit

  • Do not set the filesystem readonly if the storage link is down. In this
    case, metadata is not corrupted and only -EIO is returned. If the
    metadata is indeed corrupted, ocfs2_error() has already been called in
    ocfs2_validate_inode_block().

    Signed-off-by: Yiwen Jiang
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    jiangyiwen
     

10 Oct, 2014

1 commit

  • To commit the ocfs2 journal, the ocfs2 journal thread acquires the mutex
    osb->journal->j_trans_barrier and wakes up the jbd2 commit thread, then
    waits until the jbd2 commit thread is done. In ordered journal mode, jbd2
    needs to flush dirty data pages first, which requires the page lock.
    So osb->journal->j_trans_barrier should be taken before the page lock.

    But ocfs2_write_zero_page() and ocfs2_write_begin_inline() violate this
    locking order, which causes a deadlock and hangs the whole cluster.

    One deadlock caught is the following:

    PID: 13449 TASK: ffff8802e2f08180 CPU: 31 COMMAND: "oracle"
    #0 [ffff8802ee3f79b0] __schedule at ffffffff8150a524
    #1 [ffff8802ee3f7a58] schedule at ffffffff8150acbf
    #2 [ffff8802ee3f7a68] rwsem_down_failed_common at ffffffff8150cb85
    #3 [ffff8802ee3f7ad8] rwsem_down_read_failed at ffffffff8150cc55
    #4 [ffff8802ee3f7ae8] call_rwsem_down_read_failed at ffffffff812617a4
    #5 [ffff8802ee3f7b50] ocfs2_start_trans at ffffffffa0498919 [ocfs2]
    #6 [ffff8802ee3f7ba0] ocfs2_zero_start_ordered_transaction at ffffffffa048b2b8 [ocfs2]
    #7 [ffff8802ee3f7bf0] ocfs2_write_zero_page at ffffffffa048e9bd [ocfs2]
    #8 [ffff8802ee3f7c80] ocfs2_zero_extend_range at ffffffffa048ec83 [ocfs2]
    #9 [ffff8802ee3f7ce0] ocfs2_zero_extend at ffffffffa048edfd [ocfs2]
    #10 [ffff8802ee3f7d50] ocfs2_extend_file at ffffffffa049079e [ocfs2]
    #11 [ffff8802ee3f7da0] ocfs2_setattr at ffffffffa04910ed [ocfs2]
    #12 [ffff8802ee3f7e70] notify_change at ffffffff81187d29
    #13 [ffff8802ee3f7ee0] do_truncate at ffffffff8116bbc1
    #14 [ffff8802ee3f7f50] sys_ftruncate at ffffffff8116bcbd
    #15 [ffff8802ee3f7f80] system_call_fastpath at ffffffff81515142
    RIP: 00007f8de750c6f7 RSP: 00007fffe786e478 RFLAGS: 00000206
    RAX: 000000000000004d RBX: ffffffff81515142 RCX: 0000000000000000
    RDX: 0000000000000200 RSI: 0000000000028400 RDI: 000000000000000d
    RBP: 00007fffe786e040 R8: 0000000000000000 R9: 000000000000000d
    R10: 0000000000000000 R11: 0000000000000206 R12: 000000000000000d
    R13: 00007fffe786e710 R14: 00007f8de70f8340 R15: 0000000000028400
    ORIG_RAX: 000000000000004d CS: 0033 SS: 002b

    crash64> bt
    PID: 7610 TASK: ffff88100fd56140 CPU: 1 COMMAND: "ocfs2cmt"
    #0 [ffff88100f4d1c50] __schedule at ffffffff8150a524
    #1 [ffff88100f4d1cf8] schedule at ffffffff8150acbf
    #2 [ffff88100f4d1d08] jbd2_log_wait_commit at ffffffffa01274fd [jbd2]
    #3 [ffff88100f4d1d98] jbd2_journal_flush at ffffffffa01280b4 [jbd2]
    #4 [ffff88100f4d1dd8] ocfs2_commit_cache at ffffffffa0499b14 [ocfs2]
    #5 [ffff88100f4d1e38] ocfs2_commit_thread at ffffffffa0499d38 [ocfs2]
    #6 [ffff88100f4d1ee8] kthread at ffffffff81090db6
    #7 [ffff88100f4d1f48] kernel_thread_helper at ffffffff81516284

    crash64> bt
    PID: 7609 TASK: ffff88100f2d4480 CPU: 0 COMMAND: "jbd2/dm-20-86"
    #0 [ffff88100def3920] __schedule at ffffffff8150a524
    #1 [ffff88100def39c8] schedule at ffffffff8150acbf
    #2 [ffff88100def39d8] io_schedule at ffffffff8150ad6c
    #3 [ffff88100def39f8] sleep_on_page at ffffffff8111069e
    #4 [ffff88100def3a08] __wait_on_bit_lock at ffffffff8150b30a
    #5 [ffff88100def3a58] __lock_page at ffffffff81110687
    #6 [ffff88100def3ab8] write_cache_pages at ffffffff8111b752
    #7 [ffff88100def3be8] generic_writepages at ffffffff8111b901
    #8 [ffff88100def3c48] journal_submit_data_buffers at ffffffffa0120f67 [jbd2]
    #9 [ffff88100def3cf8] jbd2_journal_commit_transaction at ffffffffa0121372[jbd2]
    #10 [ffff88100def3e68] kjournald2 at ffffffffa0127a86 [jbd2]
    #11 [ffff88100def3ee8] kthread at ffffffff81090db6
    #12 [ffff88100def3f48] kernel_thread_helper at ffffffff81516284

    Signed-off-by: Junxiao Bi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Alex Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     

07 May, 2014

2 commits


04 Apr, 2014

2 commits

  • Currently, ocfs2_sync_file grabs i_mutex and forces the current journal
    transaction to complete. This isn't terribly efficient, since sync_file
    really only needs to wait for the last transaction involving that inode
    to complete, and this doesn't require i_mutex.

    Therefore, implement the necessary bits to track the newest tid
    associated with an inode, and teach sync_file to wait for that instead
    of waiting for everything in the journal to commit. Furthermore, only
    issue the flush request to the drive if jbd2 hasn't already done so.

    This also eliminates the deadlock between ocfs2_file_aio_write() and
    ocfs2_sync_file(). aio_write takes i_mutex then calls
    ocfs2_aiodio_wait() to wait for unaligned dio writes to finish.
    However, if that dio completion involves calling fsync, then we can get
    into trouble when some ocfs2_sync_file tries to take i_mutex.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
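
    The idea of tracking the newest tid per inode can be sketched in
    userspace C as follows. This is an illustration under assumptions, not
    the actual patch: struct inode_sync_state and the helper names are
    hypothetical, and the wraparound-safe comparison is written in the style
    of the sequence-number comparisons jbd2 uses for transaction IDs.

```c
#include <stdint.h>

typedef uint32_t tid_t;   /* transaction IDs are sequence numbers that may wrap */

/* Wraparound-safe test for "a is newer than b". */
static inline int tid_newer(tid_t a, tid_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Hypothetical per-inode state: the newest transaction that touched it. */
struct inode_sync_state {
    tid_t last_tid;
};

/* Record that transaction `tid` modified the inode. */
static inline void note_inode_tid(struct inode_sync_state *st, tid_t tid)
{
    if (tid_newer(tid, st->last_tid))
        st->last_tid = tid;
}

/* A sync only has to wait if the inode's newest tid is not yet committed. */
static inline int sync_must_wait(const struct inode_sync_state *st,
                                 tid_t committed_tid)
{
    return tid_newer(st->last_tid, committed_tid);
}
```

    A sync on an inode whose last_tid is already covered by the committed
    tid returns immediately instead of forcing the whole journal to commit,
    and it needs no i_mutex to make that decision.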
     
  • There is a problem that waitqueue_active() may check stale data and thus
    miss a wakeup of threads waiting on ip_unaligned_aio.

    The only valid values of ip_unaligned_aio are 0 and 1, so we can change
    it to a mutex and thereby avoid the above problem. Another benefit is
    that a mutex, which works as a FIFO, is fairer than wake_up_all().

    Signed-off-by: Wengang Wang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wengang Wang
     

13 Nov, 2013

4 commits

  • Ocfs2 doesn't do data journalling. Thus its ->invalidatepage and
    ->releasepage functions never get called on buffers that have journal
    heads attached. So just use standard variants of functions from
    buffer.c.

    Signed-off-by: Jan Kara
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • When ocfs2_write_cluster_by_desc() fails in ocfs2_write_begin_nolock()
    because of ENOSPC, it goes to out_quota, freeing data_ac (meta_ac). Then
    it calls ocfs2_try_to_free_truncate_log() to free space. If enough
    space is freed, it tries the write again. Unfortunately, if some error
    happens before ocfs2_lock_allocators(), it goes to out and frees
    data_ac (meta_ac) again.

    Signed-off-by: joyce
    Reviewed-by: Jie Liu
    Acked-by: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • The only reason for sb_getblk() failing is if it can't allocate the
    buffer_head. So return ENOMEM instead when it fails.

    [joseph.qi@huawei.com: ocfs2_symlink_get_block() and ocfs2_read_blocks_sync() and ocfs2_read_blocks() need the same change]
    Signed-off-by: Rui Xiang
    Reviewed-by: Jie Liu
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Cc: Joseph Qi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rui Xiang
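
    As a rough userspace analogy of the error mapping above: when the only
    way a lookup helper can fail is an allocation failure, the caller
    reports -ENOMEM. In this sketch struct fake_bh, fake_getblk() and
    read_block() are hypothetical stand-ins, with calloc() playing the role
    of the buffer_head allocation.

```c
#include <errno.h>
#include <stdlib.h>

/* Toy stand-in for a buffer head. */
struct fake_bh {
    char data[512];
};

/* Stand-in for sb_getblk(): returns NULL only when allocation fails. */
static inline struct fake_bh *fake_getblk(void)
{
    return calloc(1, sizeof(struct fake_bh));
}

/* Caller maps the NULL return to -ENOMEM, since no I/O has happened yet. */
static inline int read_block(struct fake_bh **out)
{
    struct fake_bh *bh = fake_getblk();

    if (!bh)
        return -ENOMEM;
    *out = bh;
    return 0;
}
```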
     
  • Code cleanup to remove an unnecessary variable that is passed to
    ocfs2_calc_extend_credits but never used.

    Signed-off-by: Goldwyn Rodrigues
    Cc: Joel Becker
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     

12 Sep, 2013

1 commit

  • Though ocfs2 uses inode->i_mutex to protect i_size, there are both
    i_size_read/write() and direct accesses. Clean up all direct access to
    eliminate confusion.

    Signed-off-by: Junxiao Bi
    Cc: Jie Liu
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     

04 Sep, 2013

1 commit

  • Add support to the core direct-io code to defer AIO completions to user
    context using a workqueue. This replaces opencoded and less efficient
    code in XFS and ext4 (we save a memory allocation for each direct IO)
    and will be needed to properly support O_(D)SYNC for AIO.

    The communication between the filesystem and the direct I/O code requires
    a new buffer head flag, which is a bit ugly but not avoidable until the
    direct I/O code stops abusing the buffer_head structure for communicating
    with the filesystems.

    Currently this creates a per-superblock unbound workqueue for these
    completions, which is taken from an earlier patch by Jan Kara. I'm
    not really convinced about this use and would prefer a "normal" global
    workqueue with a high concurrency limit, but this needs further discussion.

    JK: Fixed ext4 part, dynamic allocation of the workqueue.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Christoph Hellwig
     

14 Aug, 2013

1 commit

  • Since ocfs2_cow_file_pos invokes ocfs2_refcount_icow with NULL as the
    struct file pointer, it ends in a NULL pointer dereference in
    ocfs2_duplicate_clusters_by_page.

    This patch replaces the file pointer with an inode pointer in
    cow_duplicate_clusters to fix this issue.

    [jeff.liu@oracle.com: rebased patch against linux-next tree]
    Signed-off-by: Tiger Yang
    Signed-off-by: Jie Liu
    Cc: Joel Becker
    Cc: Mark Fasheh
    Acked-by: Tao Ma
    Tested-by: David Weber
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tiger Yang
     

22 May, 2013

3 commits

  • ->invalidatepage() aop now accepts range to invalidate so we can make
    use of it in ocfs2_invalidatepage().

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jan Kara
    Acked-by: Joel Becker

    Lukas Czerner
     
  • invalidatepage now accepts a range to invalidate, and two file systems
    using jbd2 also implement the punch hole feature and can benefit from
    this. We need to implement the same thing in the jbd2 layer to allow
    those file systems to benefit from this functionality.

    This commit adds a length argument to jbd2_journal_invalidatepage()
    and updates all instances in ext4 and ocfs2.

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jan Kara

    Lukas Czerner
     
  • Currently there is no way to truncate partial page where the end
    truncate point is not at the end of the page. This is because it was not
    needed and the functionality was enough for file system truncate
    operation to work properly. However more file systems now support punch
    hole feature and it can benefit from mm supporting truncating page just
    up to the certain point.

    Specifically, with this functionality truncate_inode_pages_range() can
    be changed so it supports truncating partial page at the end of the
    range (currently it will BUG_ON() if 'end' is not at the end of the
    page).

    This commit changes the invalidatepage() address space operation
    prototype to accept range to be invalidated and update all the instances
    for it.

    We also change the block_invalidatepage() in the same way and actually
    make a use of the new length argument implementing range invalidation.

    Actual file system implementations will follow, except for the file
    systems where the changes are really simple and should not change the
    behaviour in any way. An implementation of truncate_page_range(), which
    will be able to accept page-unaligned ranges, will follow as well.

    Signed-off-by: Lukas Czerner
    Cc: Andrew Morton
    Cc: Hugh Dickins

    Lukas Czerner