03 Aug, 2016

1 commit

  • The header file "include/linux/nilfs2_fs.h" is composed of parts for
    ioctl and disk format, and both are intended to be shared with user
    space programs.

    This moves them to the uapi directory "include/uapi/linux" splitting the
    file to "nilfs2_api.h" and "nilfs2_ondisk.h". The following minor
    changes are accompanied by this migration:

    - nilfs_direct_node struct in nilfs2/direct.h is converged to
    nilfs2_ondisk.h because it's an on-disk structure.
    - inline functions nilfs_rec_len_from_disk() and
    nilfs_rec_len_to_disk() are moved to nilfs2/dir.c.

    Link: http://lkml.kernel.org/r/1465825507-3407-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     

24 May, 2016

4 commits

  • This fixes block comments with proper formatting to eliminate the
    following checkpatch.pl warnings:

    "WARNING: Block comments use * on subsequent lines"
    "WARNING: Block comments use a trailing */ on a separate line"

    Link: http://lkml.kernel.org/r/1462886671-3521-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This fixes checkpatch.pl warning "WARNING: Prefer 'unsigned int' to
    bare use of 'unsigned'".

    Link: http://lkml.kernel.org/r/1462886671-3521-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • E-mail addresses of osrg.net domain are no longer available. This
    removes them from authorship notices and prevents reporters from being
    confused.

    Link: http://lkml.kernel.org/r/1461935747-10380-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This removes the extra paragraph which mentions FSF address in GPL
    notices from source code of nilfs2 and avoids the checkpatch.pl error
    related to it.

    Link: http://lkml.kernel.org/r/1461935747-10380-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     

07 Nov, 2015

1 commit

  • This patch adds a tracepoint for tracking stage transition of block
    collection in segment construction. With the tracepoint, we can analyze
    the behavior of segment construction in depth. It would be useful for
    bottleneck detection and debugging, etc.

    The tracepoint is created with the standard trace API of Linux (like ext3,
    ext4, f2fs and btrfs), so it can be analyzed easily with existing tools. Of
    course, more detailed analysis will be possible if we can create
    nilfs-specific analysis tools.

    Below is an example of an event dump taken with Brendan Gregg's perf-tools
    (https://github.com/brendangregg/perf-tools). The time spent between
    stages can be read from the timestamps.

    $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
    Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
    segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
    segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
    segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
    segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
    segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
    segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
    segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
    segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
    segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE

    To capture transitions correctly, this patch adds wrappers for the
    member scnt of nilfs_cstage. With this change, every transition of the
    stage produces a trace event.

    Signed-off-by: Hitoshi Mitake
    Signed-off-by: Ryusuke Konishi
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hitoshi Mitake
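
    The wrapper idea in the commit above can be sketched in user space as
    follows. This is a minimal model, not the kernel code: the demo_* names,
    the log array that stands in for the tracepoint, and the numeric stage
    values are all assumptions made for illustration. Only the stage names
    come from the trace output.

```c
#include <assert.h>
#include <stddef.h>

/* Stage numbers named after the stages in the trace output (values assumed). */
enum demo_stage {
    ST_INIT, ST_GC, ST_FILE, ST_IFILE, ST_CPFILE,
    ST_SUFILE, ST_DAT, ST_SR, ST_DONE
};

#define DEMO_LOG_MAX 16

/* Hypothetical stand-in for the collection stage of the log writer. */
struct demo_cstage {
    int scnt;                  /* current stage; changed only via wrappers */
    int log[DEMO_LOG_MAX];     /* stand-in for emitted trace events */
    size_t nlogged;
};

/* Stand-in for the tracepoint: record the stage that was just entered. */
static void demo_trace_transition(struct demo_cstage *cs)
{
    if (cs->nlogged < DEMO_LOG_MAX)
        cs->log[cs->nlogged++] = cs->scnt;
}

static int demo_cstage_get(const struct demo_cstage *cs)
{
    return cs->scnt;
}

/* Jump to an arbitrary stage and report the transition. */
static void demo_cstage_set(struct demo_cstage *cs, int next)
{
    assert(next >= ST_INIT && next <= ST_DONE);
    cs->scnt = next;
    demo_trace_transition(cs);
}

/* Advance to the following stage and report the transition. */
static void demo_cstage_inc(struct demo_cstage *cs)
{
    assert(cs->scnt < ST_DONE);
    cs->scnt++;
    demo_trace_transition(cs);
}
```

    Because no caller writes scnt directly, a transition cannot happen
    without passing through the hook, which is the property the commit
    message asks for.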
     

06 Feb, 2015

1 commit

  • Nilfs2 eventually hangs in a stress test with the fsstress program. The
    hang is caused by the following deadlock over the I_SYNC flag between
    nilfs_segctor_thread() and writeback_sb_inodes():

    nilfs_segctor_thread()
      nilfs_segctor_thread_construct()
        nilfs_segctor_unlock()
          nilfs_dispose_list()
            iput()
              iput_final()
                evict()
                  inode_wait_for_writeback()  * wait for I_SYNC flag

    writeback_sb_inodes()
       * set I_SYNC flag on inode->i_state
      __writeback_single_inode()
        do_writepages()
          nilfs_writepages()
            nilfs_construct_dsync_segment()
              nilfs_segctor_sync()
                 * wait for completion of segment constructor
      inode_sync_complete()
         * clear I_SYNC flag after __writeback_single_inode() completed

    writeback_sb_inodes() calls do_writepages() for dirty inodes after
    setting I_SYNC flag on inode->i_state. do_writepages() in turn calls
    nilfs_writepages(), which can run segment constructor and wait for its
    completion. On the other hand, segment constructor calls iput(), which
    can call evict() and wait for the I_SYNC flag on
    inode_wait_for_writeback().

    Since segment constructor doesn't know when I_SYNC will be set, it
    cannot know whether iput() will block or not unless inode->i_nlink has a
    non-zero count. We can prevent evict() from being called in iput() by
    implementing sop->drop_inode(), but it's not preferable to leave inodes
    with i_nlink == 0 for long periods because it even defers file
    truncation and inode deallocation. So, this instead resolves the
    deadlock by calling iput() asynchronously with a workqueue for inodes
    with i_nlink == 0.

    Signed-off-by: Ryusuke Konishi
    Cc: Al Viro
    Tested-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
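
    The fix described above, deferring the final iput() for inodes with
    i_nlink == 0, can be modeled in user space roughly as below. All demo_*
    types and functions are hypothetical; a simple linked list stands in for
    the kernel workqueue and demo_iput() stands in for iput().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified inode: only the fields this sketch needs. */
struct demo_inode {
    unsigned int nlink;       /* 0 means the final put would evict the inode */
    int released;             /* set once the reference has been dropped */
    struct demo_inode *next;  /* link in the deferred queue */
};

/* Hypothetical deferred-work queue, drained outside the log writer. */
struct demo_workqueue {
    struct demo_inode *head;
    size_t pending;
};

/* Stand-in for iput(); in the kernel this may block in evict(). */
static void demo_iput(struct demo_inode *inode)
{
    inode->released = 1;
}

/*
 * Log writer context: drop references that cannot lead to eviction right
 * away, and queue the rest so the log writer never waits on I_SYNC.
 */
static void demo_dispose(struct demo_inode **inodes, size_t n,
                         struct demo_workqueue *wq)
{
    assert(inodes != NULL && wq != NULL);
    for (size_t i = 0; i < n; i++) {
        struct demo_inode *inode = inodes[i];

        if (inode->nlink == 0) {
            inode->next = wq->head;
            wq->head = inode;
            wq->pending++;
        } else {
            demo_iput(inode);
        }
    }
}

/* Worker context: blocking is acceptable here, so do the deferred puts. */
static void demo_run_deferred(struct demo_workqueue *wq)
{
    while (wq->head) {
        struct demo_inode *inode = wq->head;

        wq->head = inode->next;
        inode->next = NULL;
        wq->pending--;
        demo_iput(inode);
    }
}
```

    The point of the split is that only the worker can end up waiting for
    I_SYNC, so the segment constructor that writeback is waiting on keeps
    making progress.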
     

10 May, 2011

1 commit

  • Previously, nilfs cloned pages of mmapped regions to freeze their
    data and keep checksums consistent during writeback cycles. A private
    page allocator was used for this page cloning. This is no longer
    needed: clear_page_dirty_for_io() sets up the pte so that
    vm_ops->page_mkwrite() is called right before mmapped pages are
    modified, and nilfs_page_mkwrite() can safely wait for the pages to be
    written back to disk.

    So, this stops making a copy of mmapped pages during writeback, and
    eliminates the private page allocation and deallocation functions from
    nilfs.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

09 Mar, 2011

3 commits

  • This directly uses sb->s_fs_info to keep a nilfs filesystem object and
    fully removes the intermediate nilfs_sb_info structure. With this
    change, the hierarchy of on-memory structures of nilfs will be
    simplified as follows:

    Before:
      super_block
        -> nilfs_sb_info
             -> the_nilfs
                  -> cptree --+-> nilfs_root (current file system)
                              +-> nilfs_root (snapshot A)
                              +-> nilfs_root (snapshot B)
                              :
             -> nilfs_sc_info (log writer structure)
    After:
      super_block
        -> the_nilfs
             -> cptree --+-> nilfs_root (current file system)
                         +-> nilfs_root (snapshot A)
                         +-> nilfs_root (snapshot B)
                         :
             -> nilfs_sc_info (log writer structure)

    It was not designed this way from the beginning because the initial
    shape also differed from the above. The early hierarchy was composed of
    "per-mount-point" super_block -> nilfs_sb_info pairs and a shared nilfs
    object. In kernel 2.6.37, it was changed to the current shape in order
    to unify super block instances into one per device, and this cleanup
    became applicable as a result.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
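
    The before/after hierarchy in the commit above can be illustrated with
    a small user-space model. The demo_* structures and their field names
    are simplified assumptions, not the kernel definitions; the sketch only
    shows that one pointer hop disappears once s_fs_info refers to the
    filesystem object directly.

```c
#include <assert.h>
#include <stddef.h>

struct demo_sc_info { int id; };                 /* log writer structure */

struct demo_the_nilfs {
    struct demo_sc_info *writer;                 /* used in the "after" layout */
};

/* Intermediate object that existed only in the "before" layout. */
struct demo_sb_info {
    struct demo_the_nilfs *nilfs;
    struct demo_sc_info *sc_info;
};

struct demo_super_block {
    void *s_fs_info;                             /* filesystem private data */
};

/* Before: super_block -> nilfs_sb_info -> the_nilfs (two hops). */
static struct demo_the_nilfs *demo_nilfs_before(const struct demo_super_block *sb)
{
    const struct demo_sb_info *sbi = sb->s_fs_info;

    assert(sbi != NULL);
    return sbi->nilfs;
}

/* After: super_block -> the_nilfs (one hop). */
static struct demo_the_nilfs *demo_nilfs_after(const struct demo_super_block *sb)
{
    assert(sb->s_fs_info != NULL);
    return sb->s_fs_info;
}
```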
     
  • This replaces sbi uses with direct reference to sb instance.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Removes sci->sc_sbi which is a back pointer to nilfs_sb_info struct
    from log writer object (nilfs_sc_info).

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

23 Oct, 2010

2 commits

  • This rewrites functions using ifile so that they get ifile from
    nilfs_root object, and will remove sbi->s_ifile. Some functions that
    don't know the root object are extended to receive it from caller.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • On-memory inode structures of nilfs have a member "i_cno" which stores
    a checkpoint number related to the inode. For gc-inodes, this field
    indicates version of data each gc-inode caches for GC. Log writer
    temporarily uses "i_cno" to transfer the latest checkpoint number.

    This stops the latter use and lets only gc-inodes use it.

    The purpose of this patch is to allow a subsequent change to use
    "i_cno" for inode lookup.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

23 Jul, 2010

2 commits


31 May, 2010

1 commit


10 May, 2010

3 commits


14 Mar, 2010

2 commits


13 Feb, 2010

1 commit


30 Nov, 2009

1 commit

  • This separates the wait function for submitted logs from the write
    function nilfs_segctor_write(). A new list of segment buffers,
    "sc_write_logs", is added to hold logs being written, and double
    buffering is partially applied to hide I/O latency.

    At this point, the double buffering is disabled for blocksize <
    pagesize because page dirty flag is turned off during write and dirty
    buffers are not properly collected for pages crossing over segments.

    To receive the full benefit of the double buffering, further refinement
    is needed to move the I/O wait outside the lock section of the log writer.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

10 Jun, 2009

2 commits

  • This will eliminate obsolete list operations of nilfs_segment_entry
    structure which has been used to handle multiple segment numbers.

    The patch ("nilfs2: remove list of freeing segments") removed use of
    the structure from the segment constructor code, and this patch
    simplifies the remaining code by integrating it into recovery.c.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This will clean up the removal list of segments and the related
    functions from segment.c and ioctl.c, which have hurt code
    readability.

    This elimination is applied by using nilfs_sufile_updatev() previously
    introduced in the patch ("nilfs2: add sufile function that can modify
    multiple segment usages").

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

11 May, 2009

1 commit

  • This is a companion patch to ("nilfs2: fix possible circular locking
    for get information ioctls").

    This corrects lock order reversal between mm->mmap_sem and
    nilfs->ns_segctor_sem in nilfs_clean_segments() which was detected by
    lockdep check:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.30-rc3-nilfs-00003-g360bdc1 #7
    -------------------------------------------------------
    mmap/5294 is trying to acquire lock:
    (&nilfs->ns_segctor_sem){++++.+}, at: [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]

    but task is already holding lock:
    (&mm->mmap_sem){++++++}, at: [] do_page_fault+0x1d8/0x30a

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&mm->mmap_sem){++++++}:
    [] __lock_acquire+0x1066/0x13b0
    [] lock_acquire+0xba/0xdd
    [] might_fault+0x68/0x88
    [] copy_from_user+0x2a/0x111
    [] nilfs_ioctl_prepare_clean_segments+0x1d/0xf1 [nilfs2]
    [] nilfs_clean_segments+0x6d/0x1b9 [nilfs2]
    [] nilfs_ioctl+0x2ad/0x318 [nilfs2]
    [] vfs_ioctl+0x22/0x69
    [] do_vfs_ioctl+0x460/0x499
    [] sys_ioctl+0x40/0x5a
    [] sysenter_do_call+0x12/0x38
    [] 0xffffffff

    -> #0 (&nilfs->ns_segctor_sem){++++.+}:
    [] __lock_acquire+0xdcc/0x13b0
    [] lock_acquire+0xba/0xdd
    [] down_read+0x2a/0x3e
    [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
    [] __do_fault+0x165/0x376
    [] handle_mm_fault+0x287/0x5d1
    [] do_page_fault+0x2fb/0x30a
    [] error_code+0x72/0x78
    [] 0xffffffff

    where nilfs_clean_segments() holds:

    nilfs->ns_segctor_sem -> copy_from_user()
    --> page fault -> mm->mmap_sem

    And, page fault path may hold:

    page fault -> mm->mmap_sem
    --> nilfs_page_mkwrite() -> nilfs->ns_segctor_sem

    Even though nilfs_clean_segments() does not perform write access on
    given user pages, it may cause deadlock because nilfs->ns_segctor_sem
    is shared per device and mm->mmap_sem can be shared with other tasks.

    To avoid this problem, this patch moves all calls of copy_from_user()
    outside the nilfs->ns_segctor_sem lock in the ioctl.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
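
    The ordering fix above (copy user arguments first, take the lock
    afterwards) can be sketched as follows. This is a user-space model under
    stated assumptions: demo_lock is a flag standing in for
    nilfs->ns_segctor_sem, demo_copy_from_user() is a memcpy-based stand-in
    for copy_from_user(), and summing the segment numbers is a placeholder
    for the real cleaning work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ARGS 8

/* Stand-in for nilfs->ns_segctor_sem; only tracks whether it is held. */
struct demo_lock {
    int held;
};

/*
 * Stand-in for copy_from_user(). A real copy may page-fault and take
 * mm->mmap_sem, so the sketch asserts that the lock is not held here.
 */
static void demo_copy_from_user(const struct demo_lock *lock,
                                unsigned long *dst,
                                const unsigned long *src, size_t n)
{
    assert(!lock->held);
    memcpy(dst, src, n * sizeof(*dst));
}

/* Hypothetical ioctl body: returns the sum of segment numbers as a result. */
static unsigned long demo_clean_segments(struct demo_lock *lock,
                                         const unsigned long *user_segnums,
                                         size_t n)
{
    unsigned long kbuf[DEMO_MAX_ARGS];
    unsigned long sum = 0;

    assert(n <= DEMO_MAX_ARGS);

    /* Step 1: copy every user argument while no filesystem lock is held. */
    demo_copy_from_user(lock, kbuf, user_segnums, n);

    /* Step 2: take the lock and touch only the kernel-side copy. */
    lock->held = 1;
    for (size_t i = 0; i < n; i++)
        sum += kbuf[i];
    lock->held = 0;

    return sum;
}
```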
     

07 Apr, 2009

6 commits

  • Former versions had no spare super block. This addresses that weak
    point by introducing a second super block in the unused region at the
    tail of the partition.

    This doesn't break disk format compatibility; older versions just ignore
    the secondary super block, and new versions just recover it if it doesn't
    exist. A partition created by an old mkfs may not have an unused region,
    and in that case the secondary super block will not be added.

    This doesn't add further redundant copies of the super block; that is
    left for future work.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
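
    As an illustration of placing a secondary super block at the tail of
    the partition, the sketch below computes an offset for it. The commit
    message does not state the exact placement rule, so the rule used here
    (the start of the last complete 4 KiB block of the device) is an
    assumption, and demo_sb2_offset_bytes() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed rule: round the device size down to a 4 KiB boundary and step
 * back one block, so the copy sits in the last complete 4 KiB block.
 */
static uint64_t demo_sb2_offset_bytes(uint64_t devsize)
{
    assert(devsize >= 2 * 4096);    /* need room for at least two blocks */
    return ((devsize >> 12) - 1) << 12;
}
```

    For a 1 MiB device this yields 1044480, that is, 4096 bytes before the
    end of the device.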
     
  • will reduce some lines of segment constructor. Previously, the state was
    controlled in a complex way through a list of segments in order to keep
    the metadata on segment usage consistent. Instead, this presents
    "calculated" active flags to the userland cleaner program and stops
    maintaining the real flag on disk.

    With only this calculated flag, the cleaner cannot know exactly whether
    each segment is reclaimable. However, the recent extension of the
    nilfs_sustat ioctl struct (nilfs2-extend-nilfs_sustat-ioctl-struct.patch)
    can prevent the cleaner from wrongly reclaiming an in-use segment.

    So, now I can apply this for simplification.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
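
    One way to picture a "calculated" active flag is shown below. The
    commit message does not say how the flag is derived, so this sketch
    assumes that a segment counts as active when it is the segment
    currently being written or the one reserved to be written next. The
    demo_* names and flag bits are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SU_ACTIVE (1u << 0)
#define DEMO_SU_DIRTY  (1u << 1)

/* Hypothetical in-memory state of the log writer. */
struct demo_writer_state {
    uint64_t cur_segnum;    /* segment being written */
    uint64_t next_segnum;   /* segment reserved for the next write */
};

static int demo_segment_is_active(const struct demo_writer_state *ws,
                                  uint64_t segnum)
{
    return segnum == ws->cur_segnum || segnum == ws->next_segnum;
}

/*
 * Flags reported to user space: the on-disk active bit is ignored and
 * replaced by a value computed from the in-memory writer state.
 */
static unsigned int demo_reported_flags(const struct demo_writer_state *ws,
                                        uint64_t segnum,
                                        unsigned int ondisk_flags)
{
    unsigned int flags = ondisk_flags & ~DEMO_SU_ACTIVE;

    assert(ws != NULL);
    if (demo_segment_is_active(ws, segnum))
        flags |= DEMO_SU_ACTIVE;
    return flags;
}
```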
     
  • Nilfs creates checkpoints even for garbage collection or metadata updates
    such as checkpoint mode change. So, users often see checkpoints created
    only by such internal operations.

    This is inconvenient in some situations. For example, an application that
    monitors checkpoints and changes them to snapshots will fall into an
    infinite loop because it cannot distinguish internally created
    checkpoints.

    This patch solves this sort of problem by adding a flag to checkpoint for
    identification.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
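
    With such a flag, a monitoring application could skip internally
    created checkpoints, as in the sketch below. The structure, the flag
    names, and their bit positions are hypothetical stand-ins for whatever
    the checkpoint information ioctl actually returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CP_SNAPSHOT (1u << 0)  /* already a snapshot */
#define DEMO_CP_MINOR    (1u << 1)  /* created only by internal operations */

/* Hypothetical per-checkpoint record as seen by the application. */
struct demo_cpinfo {
    uint64_t cno;          /* checkpoint number */
    unsigned int flags;
};

/*
 * Collect the checkpoint numbers worth turning into snapshots: skip
 * existing snapshots and internally created checkpoints.
 * Returns the number of entries written to out.
 */
static size_t demo_pick_for_snapshot(const struct demo_cpinfo *cps, size_t n,
                                     uint64_t *out)
{
    size_t picked = 0;

    assert(cps != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cps[i].flags & (DEMO_CP_SNAPSHOT | DEMO_CP_MINOR))
            continue;
        out[picked++] = cps[i].cno;
    }
    return picked;
}
```

    Since changing a checkpoint to a snapshot is itself a metadata update,
    filtering on the flag is what breaks the loop described in the commit
    message.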
     
  • The sketch file is a file to mark checkpoints with user data. It was
    experimentally introduced in the original implementation, and is now
    obsolete. The file was handled differently from regular files; the file
    size got truncated when a checkpoint was created.

    This stops the special treatment and will treat it as a regular file.
    Most users are not affected because mkfs.nilfs2 no longer makes this file.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • Chris Mason pointed out that there is a missed sync issue in
    nilfs_writepages():

    On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote:
    > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
    > do_sync_mapping_range().

    where WB_SYNC_NONE in do_sync_mapping_range() was replaced with
    WB_SYNC_ALL by Nick's patch (commit:
    ee53a891f47444c53318b98dac947ede963db400).

    This fixes the problem by letting nilfs_writepages() write out the log of
    file data within the range if sync_mode is WB_SYNC_ALL.

    This involves removal of nilfs_file_aio_write() which was previously
    needed to ensure O_SYNC sync writes.

    Cc: Chris Mason
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
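
    The decision described above can be modeled in a few lines. This is a
    user-space sketch: the demo_* types are hypothetical, and recording the
    requested range in demo_flush stands in for writing out the log of file
    data within that range.

```c
#include <assert.h>
#include <stddef.h>

enum demo_sync_mode { DEMO_WB_SYNC_NONE, DEMO_WB_SYNC_ALL };

/* Hypothetical, trimmed-down writeback request. */
struct demo_wbc {
    enum demo_sync_mode sync_mode;
    long long range_start;
    long long range_end;
};

/* Records whether a ranged flush was requested, and for which range. */
struct demo_flush {
    int called;
    long long start;
    long long end;
};

/*
 * Flush the given range only for data-integrity writeback; background
 * writeback (DEMO_WB_SYNC_NONE) is left to the regular log writer.
 */
static int demo_writepages(const struct demo_wbc *wbc, struct demo_flush *flush)
{
    assert(wbc != NULL && flush != NULL);
    if (wbc->sync_mode == DEMO_WB_SYNC_ALL) {
        flush->called = 1;
        flush->start = wbc->range_start;
        flush->end = wbc->range_end;
    }
    return 0;
}
```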
     
  • This adds the segment constructor (also called log writer).

    The segment constructor collects dirty buffers for every dirty inode,
    makes summaries of the buffers, assigns disk block addresses to the
    buffers, and then submits BIOs for the buffers.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
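
    The steps listed in the commit above (collect dirty buffers, summarize
    them, assign disk block addresses, submit) can be sketched as a toy
    model. Everything here is an assumption made for illustration: the
    demo_* types, the sequential address assignment, and the clearing of
    the dirty flag as a stand-in for submitting a BIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_buf {
    int dirty;
    uint64_t blocknr;      /* disk address, assigned at construction time */
};

struct demo_file {
    struct demo_buf *bufs;
    size_t nbufs;
};

/* Minimal summary of one constructed log. */
struct demo_summary {
    size_t nblocks;        /* number of blocks written in this log */
    uint64_t start;        /* first disk block of the log */
};

/*
 * Walk every file, pick up its dirty buffers, give each one the next
 * free block address, and mark it clean as a stand-in for submission.
 */
static struct demo_summary demo_construct_log(struct demo_file *files,
                                              size_t nfiles, uint64_t start)
{
    struct demo_summary sum = { 0, start };

    assert(files != NULL);
    for (size_t i = 0; i < nfiles; i++) {
        for (size_t j = 0; j < files[i].nbufs; j++) {
            struct demo_buf *buf = &files[i].bufs[j];

            if (!buf->dirty)
                continue;
            buf->blocknr = start + sum.nblocks;
            sum.nblocks++;
            buf->dirty = 0;
        }
    }
    return sum;
}
```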