22 Nov, 2011

1 commit

  • There is no reason to export two functions for entering the
    refrigerator. Calling refrigerator() instead of try_to_freeze()
    doesn't save anything noticeable or remove any race condition.

    * Rename refrigerator() to __refrigerator() and make it return bool
    indicating whether it scheduled out for freezing.

    * Update try_to_freeze() to return bool and relay the return value of
    __refrigerator() if freezing().

    * Convert all refrigerator() users to try_to_freeze().

    * Update documentation accordingly.

    * While at it, add might_sleep() to try_to_freeze().

    Signed-off-by: Tejun Heo
    Cc: Samuel Ortiz
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: Andrew Morton
    Cc: Jan Kara
    Cc: KONISHI Ryusuke
    Cc: Christoph Hellwig

    Tejun Heo
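The relationship between the two functions after this change can be modelled in user space. The sketch below is not kernel code: `freeze_requested`, `refrigerator_model()` and `try_to_freeze_model()` are hypothetical stand-ins, and the thaw is simulated by clearing the flag immediately instead of sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's freezing() condition. */
static bool freeze_requested;

/* Model of __refrigerator(): returns true if the caller actually "froze". */
static bool refrigerator_model(void)
{
    bool was_frozen = false;

    while (freeze_requested) {
        was_frozen = true;
        /* A real task would sleep here until thawed; we thaw at once. */
        freeze_requested = false;
    }
    assert(!freeze_requested);
    return was_frozen;
}

/* Model of try_to_freeze(): the single entry point callers use. */
static bool try_to_freeze_model(void)
{
    if (!freeze_requested)
        return false;
    /* Relay whether the refrigerator really scheduled out. */
    return refrigerator_model();
}
```

A caller that needs to know whether it was frozen (for example, to recompute a timeout) can test the return value rather than calling the refrigerator directly.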
     

11 Jun, 2011

1 commit

  • The checkpoint generation interval of nilfs goes wrong after the user
    has changed the interval parameter with the nilfs-tune tool.

    segctord starting. Construction interval = 5 seconds,
    CP frequency < 30 seconds
    segctord starting. Construction interval = 0 seconds,
    CP frequency < 30 seconds

    This turned out to be caused by a trivial bug in the initialization
    code of the log writer. This patch fixes it.

    Reported-by: Andrea Gelmini
    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

10 May, 2011

7 commits


09 Mar, 2011

7 commits


02 Mar, 2011

1 commit

  • According to the report from Jiro SEKIBA titled "regression in
    2.6.37?" (Message-Id: ), on 2.6.37 and
    later kernels, lscp command no longer displays "i" flag on checkpoints
    that snapshot operations or garbage collection created.

    This is a regression of nilfs2 checkpointing function, and it's
    critical since it broke behavior of a part of nilfs2 applications.
    For instance, snapshot manager of TimeBrowse gets to create
    meaningless snapshots continuously; snapshot creation triggers another
    checkpoint, but applications cannot distinguish whether the new
    checkpoint contains meaningful changes or not without the i-flag.

    This patch fixes the regression and brings that application behavior
    back to normal.

    Reported-by: Jiro SEKIBA
    Signed-off-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Tested-by: Jiro SEKIBA
    Cc: stable [2.6.37]

    Ryusuke Konishi
     

10 Jan, 2011

3 commits

  • nilfs_dat_inode function was a wrapper to switch between normal dat
    inode and gcdat, a clone of the dat inode for garbage collection.

    This function became obsolete when the gcdat inode was removed, and
    now we can access the dat inode directly from a nilfs object. So, we
    will unfold the wrapper and remove it.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Nilfs does not allocate new blocks on disk until they are actually
    written to. To implement fiemap, we need to deal with such blocks.

    To allow successive fiemap patch to distinguish mapped but unallocated
    regions, this marks buffer heads of those new blocks as delayed and
    clears the flag after the blocks are written to disk.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Some functions using nilfs bmap routines can wrongly return invalid
    argument error (i.e. -EINVAL) that bmap returns as an internal code
    for btree corruption.

    This fixes the issue by catching and converting the internal EINVAL to
    EIO and calling nilfs_error function inside bmap routines.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
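The conversion described in this commit can be sketched as a small filter applied to return values on their way out of the bmap layer. This is a user-space illustration only: `convert_bmap_error()` and `corruption_reports` are hypothetical names, and the message printed to stderr stands in for whatever reporting nilfs_error performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical counter standing in for the side effect of nilfs_error(). */
static int corruption_reports;

/*
 * Translate the internal "btree is corrupted" code into an I/O error so
 * that callers never mistake it for a genuine invalid-argument error.
 * All other values pass through unchanged.
 */
static int convert_bmap_error(int err, const char *caller)
{
    assert(caller != NULL);

    if (err == -EINVAL) {
        fprintf(stderr, "%s: broken bmap\n", caller);
        corruption_reports++;
        err = -EIO;
    }
    return err;
}
```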
     

27 Oct, 2010

1 commit

  • To help developers and applications gain visibility into writeback
    behaviour this patch adds two counters to /proc/vmstat.

    # grep nr_dirtied /proc/vmstat
    nr_dirtied 3747
    # grep nr_written /proc/vmstat
    nr_written 3618

    These entries allow user apps to understand writeback behaviour over time
    and learn how it is impacting their performance. Currently there is no
    way to inspect dirtying and writeback speed over time; the existing
    nr_dirty/nr_writeback counters cannot show it.

    These entries are necessary to give visibility into writeback behaviour.
    We have /proc/diskstats which lets us understand the io in the block
    layer. We have blktrace for more in depth understanding. We have
    e2fsprogs and debugfs to give insight into the file system's behaviour,
    but we don't offer our users the ability to understand what writeback is
    doing. There is no way to know how active it is over the whole system, if
    it's falling behind or to quantify its efforts. With these values
    exported users can easily see how much data applications are sending
    through writeback and also at what rates writeback is processing this
    data. Comparing the rates of change between the two allows developers to
    see when writeback is not able to keep up with incoming traffic and the
    rate of dirty memory being sent to the IO back end. This allows folks to
    understand their io workloads and track kernel issues. Non kernel
    engineers at Google often use these counters to solve puzzling performance
    problems.

    Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written.

    Patch #5 adds writeback thresholds to /proc/vmstat.

    Currently these values are in debugfs. But they should be promoted to
    /proc since they are useful for developers who are writing databases
    and file servers and are not debugging the kernel.

    The output is as below:

    # grep threshold /proc/vmstat
    nr_pages_dirty_threshold 409111
    nr_pages_dirty_background_threshold 818223

    This patch:

    This allows code outside of the mm core to safely manipulate page
    writeback state and not worry about the other accounting. Not using these
    routines means that some code will lose track of the accounting and we get
    bugs.

    Modify nilfs2 to use this interface.

    Signed-off-by: Michael Rubin
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: Wu Fengguang
    Cc: KONISHI Ryusuke
    Cc: Jiro SEKIBA
    Cc: Dave Chinner
    Cc: Jens Axboe
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Rubin
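To compare the rates of change of the two counters, a monitoring tool can sample /proc/vmstat twice and subtract. The following is a minimal sketch that works on text already read into memory; `vmstat_value()` and `backlog_growth()` are hypothetical helper names, and reading the file itself is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "key value" at the start of a line in vmstat-style text.
 * Returns the value, or -1 if the key is not present.
 */
static long vmstat_value(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    assert(text != NULL);
    while (line && *line) {
        /* Require a space after the key so nr_dirty != nr_dirtied. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
            return strtol(line + klen + 1, NULL, 10);
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/*
 * Pages dirtied minus pages written between two samples. A value that
 * stays positive across samples suggests writeback is falling behind.
 */
static long backlog_growth(long dirtied0, long written0,
                           long dirtied1, long written1)
{
    return (dirtied1 - dirtied0) - (written1 - written0);
}
```

With the sample values from the commit message as the first sample, `vmstat_value()` yields 3747 for nr_dirtied and 3618 for nr_written.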
     

23 Oct, 2010

9 commits

  • Insert sparse annotations to fix the following sparse warning.

    fs/nilfs2/segment.c:2681:3: warning: context imbalance in 'nilfs_segctor_kill_thread' - unexpected unlock

    nilfs_segctor_kill_thread is only called with sc_state_lock held.
    sparse doesn't detect this context and warns "unexpected unlock".
    __acquires/__releases pretend to lock/unlock sc_state_lock for sparse.

    Signed-off-by: Jiro SEKIBA
    Signed-off-by: Ryusuke Konishi

    Jiro SEKIBA
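The pattern behind such a warning is a function that is entered with a lock held, drops it temporarily, and retakes it before returning. The sketch below models that shape in user space. The names `DEMO_ACQUIRES`, `DEMO_RELEASES`, `struct demo_lock` and `demo_kill_thread()` are hypothetical, and the `context` attribute used under `__CHECKER__` is written from memory of how the kernel headers define these annotations, so treat it as an assumption. With an ordinary compiler the annotations expand to nothing.

```c
#include <assert.h>

#ifdef __CHECKER__
# define DEMO_ACQUIRES(x) __attribute__((context(x, 0, 1)))
# define DEMO_RELEASES(x) __attribute__((context(x, 1, 0)))
#else
# define DEMO_ACQUIRES(x)
# define DEMO_RELEASES(x)
#endif

/* Toy lock: a flag is enough to show the balance of take/drop calls. */
struct demo_lock { int held; };

static void lock_take(struct demo_lock *l) { assert(!l->held); l->held = 1; }
static void lock_drop(struct demo_lock *l) { assert(l->held);  l->held = 0; }

struct demo_state {
    struct demo_lock lock;
    int thread_alive;
};

/* Caller must hold state->lock; it is dropped and retaken inside. */
static void demo_kill_thread(struct demo_state *state)
    DEMO_ACQUIRES(&state->lock)
    DEMO_RELEASES(&state->lock)
{
    while (state->thread_alive) {
        lock_drop(&state->lock);
        /* Stand-in for waiting until the worker thread has exited. */
        state->thread_alive = 0;
        lock_take(&state->lock);
    }
}
```

The annotations do not change the generated code; they only tell the checker that the unlock inside the function is balanced by a lock taken elsewhere.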
     
  • Nilfs hasn't supported the freeze/thaw feature because it didn't work
    due to the peculiar design that multiple super block instances could
    be allocated for a device. This limitation was removed by the patch
    "nilfs2: do not allocate multiple super block instances for a device".

    So now this adds the freeze/thaw support to nilfs.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The nilfs object holds a back pointer to a writable super block
    instance in nilfs->ns_writer. It can now be eliminated since sb is made
    per device and all inodes have a valid pointer to it.

    This deletes the ns_writer pointer and a reader/writer semaphore
    protecting it.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This applies the previously prepared rollback and redirect functions
    of metadata files to the DAT file, and eliminates the GCDAT inode.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • During garbage collection (GC), the DAT file, which converts virtual
    block numbers to real block numbers, may return a disk block number
    that is not yet written to the device.

    To avoid access to unwritten blocks, the current implementation stores
    changes in the caches of GCDAT during GC and atomically commits the
    changes into the DAT file after they are written to the device.

    This patch, instead, adds a function that makes a copy of specified
    buffer and stores it in nilfs_shadow_map, and a function to get the
    backup copy as needed (nilfs_mdt_freeze_buffer and
    nilfs_mdt_get_frozen_buffer respectively).

    Before DAT changes a block number in an entry block, it makes a copy and
    redirects access to the buffer so that the address conversion function
    (i.e. nilfs_dat_translate) refers to the old address saved in the
    copy.

    This patch provides the prerequisites for such redirection.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
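The copy-then-redirect idea can be illustrated with a small user-space model. This is only a sketch: `struct entry_block`, `freeze_block()`, `translate()` and `thaw_block()` are hypothetical names, and a single pointer per block replaces the shadow map that the patch uses to track frozen copies.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ENTRIES_PER_BLOCK 4

struct entry_block {
    uint64_t blocknr[ENTRIES_PER_BLOCK]; /* live entries, may be unwritten */
    uint64_t *frozen;                    /* backup copy, or NULL if none */
};

/* Save a copy of the block before it is modified. Returns 0 or -1. */
static int freeze_block(struct entry_block *b)
{
    if (b->frozen)
        return 0; /* already frozen; keep the oldest copy */
    b->frozen = malloc(sizeof(b->blocknr));
    if (!b->frozen)
        return -1;
    memcpy(b->frozen, b->blocknr, sizeof(b->blocknr));
    return 0;
}

/* Address translation reads the frozen copy when one exists. */
static uint64_t translate(const struct entry_block *b, unsigned int idx)
{
    assert(idx < ENTRIES_PER_BLOCK);
    return b->frozen ? b->frozen[idx] : b->blocknr[idx];
}

/* Drop the copy once the new addresses are safely on disk. */
static void thaw_block(struct entry_block *b)
{
    free(b->frozen);
    b->frozen = NULL;
}
```

While a block is frozen, readers keep seeing the old address even though the live entry has already been changed; after the thaw they see the new one.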
     
  • This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root
    object.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This rewrites functions using ifile so that they get ifile from
    nilfs_root object, and will remove sbi->s_ifile. Some functions that
    don't know the root object are extended to receive it from caller.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This uses the inode hash function that vfs provides instead of nilfs's
    own hash table for caching gc inodes. This finally removes the private
    inode hash from nilfs.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • On-memory inode structures of nilfs have a member "i_cno" which stores
    a checkpoint number related to the inode. For gc-inodes, this field
    indicates version of data each gc-inode caches for GC. Log writer
    temporarily uses "i_cno" to transfer the latest checkpoint number.

    This stops the latter use and lets only gc-inodes use it.

    The purpose of this patch is to allow the successive change to use
    "i_cno" for inode lookup.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

23 Jul, 2010

4 commits

  • Super blocks of nilfs are periodically overwritten in order to record
    the recent log position. This shortens recovery time after unclean
    unmount, but the current implementation performs the update even for a
    few blocks of change. If the filesystem gets small changes slowly and
    continually, super blocks may be updated excessively.

    This moderates the issue by skipping update of log cursor if it does
    not cross a segment boundary.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
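The skip condition amounts to comparing the segment that holds the recorded log cursor with the segment that holds the current one. A minimal sketch, assuming equally sized segments numbered from block 0 (the real on-disk layout may offset the first segment); `segment_of()` and `cursor_update_needed()` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the segment that contains the given block. */
static uint64_t segment_of(uint64_t blocknr, uint64_t blocks_per_segment)
{
    assert(blocks_per_segment > 0);
    return blocknr / blocks_per_segment;
}

/*
 * The super block's log cursor only needs rewriting when the log has
 * moved into a different segment since the cursor was last recorded.
 */
static bool cursor_update_needed(uint64_t recorded_blocknr,
                                 uint64_t current_blocknr,
                                 uint64_t blocks_per_segment)
{
    return segment_of(recorded_blocknr, blocks_per_segment) !=
           segment_of(current_blocknr, blocks_per_segment);
}
```

The trade-off is that recovery may have to scan up to one segment of logs past the recorded cursor, in exchange for far fewer super block writes under slow, continual changes.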
     
  • This syncs the super blocks in turns instead of syncing the duplicate
    super blocks at the same time. This helps in finding a valid super root
    when a super block is written to disk before the log is written, which
    happens when barrier-less block devices are unmounted uncleanly. In
    that situation, the old super block likely points to a valid log.

    This patch introduces the ns_sbwcount member to the nilfs object and adds
    the nilfs_sb_will_flip() function; ns_sbwcount counts how many times the
    super blocks have been written back to disk, and nilfs_sb_will_flip()
    decides whether flipping is required based on ns_sbwcount, so that the
    super blocks are synced asymmetrically.

    The following functions are also changed:

    - nilfs_prepare_super(): flips super blocks according to the
    argument. The argument is calculated by nilfs_sb_will_flip()
    function.

    - nilfs_cleanup_super(): sets "clean" flag to both super blocks if
    they point to the same checkpoint.

    To update the information in both super blocks, the caller of
    nilfs_commit_super must set the information on both super blocks.

    Signed-off-by: Jiro SEKIBA
    Signed-off-by: Ryusuke Konishi

    Jiro SEKIBA
     
  • This function checks validity of super block pointers.
    If first super block is invalid, it will swap the super blocks.
    The function should be called before any super block information updates.
    Caller must obtain nilfs->ns_sem.

    Signed-off-by: Jiro SEKIBA
    Signed-off-by: Ryusuke Konishi

    Jiro SEKIBA
     
  • This removes macros to test segment summary flags and redefines a few
    relevant macros with inline functions.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

10 May, 2010

6 commits

  • This kills the following sparse warnings:

    fs/nilfs2/segment.c:567:28: warning: symbol 'nilfs_sc_file_ops' was not declared. Should it be static?
    fs/nilfs2/segment.c:617:28: warning: symbol 'nilfs_sc_dat_ops' was not declared. Should it be static?
    fs/nilfs2/segment.c:625:28: warning: symbol 'nilfs_sc_dsync_ops' was not declared. Should it be static?

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • In nilfs_segctor_thread(), timer is a local variable allocated on the
    stack. Its address must not be stored in sci->sc_timer and passed to
    other procedures.

    It works now only by chance, because the other procedures are called by
    nilfs_segctor_thread() directly or indirectly and the stack frame hasn't
    been deallocated yet.

    Signed-off-by: Li Hong
    Signed-off-by: Ryusuke Konishi

    Li Hong
     
  • There are only two lines of code in nilfs_segctor_init(). From a logic
    design view, the first line 'sci->sc_seq_done = sci->sc_seq_request;'
    should be put in nilfs_segctor_new(). Even in nilfs_segctor_new(),
    this initialization is needless because sci is kzalloc-ed. So
    nilfs_segctor_init() is only a wrapper around
    nilfs_segctor_start_thread().

    Signed-off-by: Li Hong
    Signed-off-by: Ryusuke Konishi

    Li Hong
     
  • This adds a field to record the latest checkpoint number in the
    nilfs_segment_summary structure. This will help to recover the latest
    checkpoint number from logs on disk. This field is intended for
    crucial cases in which super blocks have lost pointer to the latest
    log.

    Even though this will change the disk format, both backward and
    forward compatibility is preserved by a size field prepared in the
    segment summary header.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
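Compatibility through a size field works because a reader can tell from the recorded header size whether a later-added field is present. The sketch below shows the idea with a simplified, hypothetical header: `struct demo_summary` and its members are not the real nilfs_segment_summary layout, and on-disk endianness is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified header; "bytes" records how large the writer made it. */
struct demo_summary {
    uint16_t bytes;   /* size of the header as written */
    uint16_t flags;
    uint32_t nblocks;
    uint64_t cno;     /* field added later: latest checkpoint number */
};

/* True if the writer's header was large enough to include cno. */
static bool summary_has_cno(const struct demo_summary *s)
{
    assert(s != NULL);
    return s->bytes >= offsetof(struct demo_summary, cno) + sizeof(s->cno);
}

/* Read cno when present, otherwise fall back to a caller-chosen value. */
static uint64_t summary_cno(const struct demo_summary *s, uint64_t fallback)
{
    return summary_has_cno(s) ? s->cno : fallback;
}
```

An old reader skips the header using the recorded size and never looks at the extra bytes, while a new reader treats a short header as "field absent", which is how both directions of compatibility are kept.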
     
  • This cleanup patch gives several improvements:

    - Move all kmem_cache_{create,destroy} calls into one place, which removes
    some small function calls, cleans up error check code and clarifies the logic.

    - Mark all initialization code as __init.

    - Remove some very obvious comments.

    - Adjust some declarations.

    - Fix some space-tab issues.

    Signed-off-by: Li Hong
    Signed-off-by: Ryusuke Konishi

    Li Hong
     
  • This moves the checksum routines of the log writer out to segbuf.c
    for cleanup.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi