23 Jul, 2010

40 commits

  • This applies read-ahead to nilfs_btree_do_lookup and
    nilfs_btree_lookup_contig functions and extends them to read ahead
    siblings of level 1 btree nodes that hold data blocks.

    At present, read-ahead is not applied to most btree operations;
    only the get_block() callback function, which is used when reading
    regular files or directories, benefits from it.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • nilfs_btree_get_block() may now return an unverified buffer because
    of read-ahead. This adds a new flag for buffer heads so that the btree
    code can check whether the buffer has already been verified.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This adds __nilfs_btree_get_block() function that can issue a series
    of read-ahead requests for sibling btree nodes.

    This read-ahead needs parent node block, so nilfs_btree_readahead_info
    structure is added to pass the information that
    __nilfs_btree_get_block() needs.

    This also replaces the previous nilfs_btree_get_block() implementation
    with a wrapper function of __nilfs_btree_get_block().

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This adds mode argument to nilfs_btnode_submit_block() function and
    allows it to issue a read-ahead request.

    An optional submit_ptr argument is also added to store the actual
    block address for which the bio is sent. submit_ptr is used for a
    series of read-ahead requests, and helps to decide whether each
    requested block is contiguous with the previous one on disk.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
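
    The contiguity decision described above can be sketched as follows.
    This is a minimal user-space illustration, not the nilfs code: the
    function name demo_count_readahead_run, the use of 0 as "nothing
    submitted yet", and the policy of stopping at the first gap are all
    assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: walk a list of on-disk block addresses and count
 * how many would be submitted as one run before the first block that is
 * not contiguous with the previously submitted one.
 */
static unsigned int demo_count_readahead_run(const uint64_t *blocknrs,
					     unsigned int n)
{
	uint64_t submit_ptr = 0;	/* assumption: 0 means "none yet" */
	unsigned int i, submitted = 0;

	for (i = 0; i < n; i++) {
		/* stop when the block does not directly follow the last one */
		if (submit_ptr != 0 && blocknrs[i] != submit_ptr + 1)
			break;
		submit_ptr = blocknrs[i];	/* remember what was "sent" */
		submitted++;
	}
	return submitted;
}
```

    Here the remembered address plays the role that the entry above
    gives to submit_ptr: it carries the last submitted position from one
    request to the next.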
     
  • nilfs_btnode_submit_block() refers to buffer head just before
    returning from the function, but it releases the buffer head earlier
    than that if nilfs_dat_translate() gets an error.

    This can cause an oops in the error case. This fixes the issue.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This removes all inline uses from btree.c. GCC now aggressively
    applies inline expansion even to functions declared without the
    keyword, so the inline use in btree.c looks excessive.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The patch "reduce repetitive calculation of max number of child nodes"
    gathered up the calculation of maximum number of child nodes into
    nilfs_btree_nchildren_per_block() function. This makes the function
    get resultant value from a private variable in bmap object instead of
    calculating it for each call.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The current btree implementation repeats the same calculation on the
    maximum number of child nodes. This is because a few low level
    routines use the calculation for index addressing in a btree node
    block.

    This reduces the calculation by explicitly passing the maximum number
    of child nodes (ncmax) through their argument.

    This changes parameter passing of the following functions:

    - nilfs_btree_node_dptrs
    - nilfs_btree_node_get_ptr
    - nilfs_btree_node_set_ptr
    - nilfs_btree_node_init
    - nilfs_btree_node_move_left
    - nilfs_btree_node_move_right
    - nilfs_btree_node_insert
    - nilfs_btree_node_delete, and
    - nilfs_btree_get_node

    The following functions are removed:

    - nilfs_btree_node_nchildren_min
    - nilfs_btree_node_nchildren_max

    Most middle level btree operations are rewritten to pass a proper
    ncmax value depending on whether each occurrence of node is "root" or
    not.

    A constant NILFS_BTREE_ROOT_NCHILDREN_MAX is used for the root node,
    whereas nilfs_btree_nchildren_per_block() function is used for
    non-root nodes. If a node could be either root or a non-root node, an
    output argument of nilfs_btree_get_node() is used to set up ncmax.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
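
    The idea of passing ncmax explicitly can be sketched as below. The
    node layout (ncmax keys followed by ncmax pointers in one array),
    the struct demo_node type and every demo_* name are assumptions for
    illustration only and do not reproduce the nilfs on-disk format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: a small header, then ncmax keys and ncmax pointers. */
struct demo_node {
	uint16_t nchildren;
	uint64_t slots[];
};

/* Computed once per block size instead of on every pointer access. */
static int demo_nchildren_per_block(size_t blocksize)
{
	return (int)((blocksize - sizeof(struct demo_node)) /
		     (2 * sizeof(uint64_t)));
}

/* The pointer area starts right after the ncmax key slots. */
static uint64_t *demo_node_ptrs(struct demo_node *node, int ncmax)
{
	return node->slots + ncmax;
}

static uint64_t demo_node_get_ptr(struct demo_node *node, int index,
				  int ncmax)
{
	return demo_node_ptrs(node, ncmax)[index];
}

static void demo_node_set_ptr(struct demo_node *node, int index,
			      uint64_t ptr, int ncmax)
{
	demo_node_ptrs(node, ncmax)[index] = ptr;
}
```

    A caller would compute ncmax once (a constant for the root, the
    per-block value otherwise) and hand it to each low-level helper, so
    the helpers never recompute it.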
     
  • nilfs_btree_node_nchildren_max() and nilfs_btree_node_nchildren_min()
    functions switch return value depending on whether target node is the
    root or a node block. In most uses of these functions, however, the
    node type is fixed, and moreover the same calculation is repeatedly
    performed in loop.

    This unfolds these functions depending on context and moves them
    outside loops wherever possible.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • nilfs_bmap_lookup and its variants are supposed to take a valid
    pointer argument to return a block address, thus pointer checks in
    nilfs_btree_lookup and nilfs_direct_lookup are needless.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This removes nilfs_bmap_union and finally unifies three structures and
    the union in bmap/btree code into one.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This unifies two similar functions nilfs_btree_set_target_v and
    nilfs_direct_set_target_v into one, nilfs_bmap_set_target_v.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This replaces all uses of nilfs_btree struct in implementation of
    btree mapping with nilfs_bmap struct.

    The local variable name "btree" is kept to avoid bloating the
    change. In addition, some local variables named "bmap" are renamed
    to "btree" to keep the naming consistent.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This replaces all uses of nilfs_direct struct in implementation of
    direct mapping with nilfs_bmap struct.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The first argument of the bops->bop_propagate operation has a const
    qualifier, which causes a compilation error once the cast to a
    pointer to the nilfs_btree structure type is removed. This fixes the
    issue to prepare for the successive removal of the nilfs_btree struct.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This removes nilfs_bmap_key_to_dkey(), nilfs_bmap_dkey_to_key(),
    nilfs_bmap_ptr_to_dptr(), and nilfs_bmap_dptr_to_ptr() for simplicity.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This inserts sanity checks soon after a btree node is read from
    disk. This allows early detection of broken btree nodes, and helps
    to narrow down problems due to file system corruption.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • According to the report titled "problem with nilfs_cleanerd" from
    Łukasz Wójcicki, nilfs_btree_lookup_dirty_buffers or
    nilfs_btree_add_dirty_buffer got memory violation during garbage
    collection.

    This could happen if a level field of given btree node buffer is
    incorrect, which is a crucial internal bug.

    This inserts a sanity check to figure out the problem.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This adds is_remount argument to the parse_options() function that
    obtains mount options from strings.

    Previously, parse_options did not distinguish whether it was
    called for a new mount or a remount, so the caller needed additional
    verification outside the function.

    This allows parse_options to verify options and print messages
    depending on the context.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
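
    A context-aware option parser of this kind might look like the
    sketch below. It is a user-space illustration under assumptions:
    demo_parse_options and the "demo_fixed=" option (something that may
    only be given at initial mount) are invented for the example and are
    not nilfs options.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parser: walks a comma-separated option string (modified
 * in place by strtok) and rejects "demo_fixed=" when called for a
 * remount, printing a message that depends on the context.
 */
static int demo_parse_options(char *options, int is_remount)
{
	char *p;

	for (p = strtok(options, ","); p != NULL; p = strtok(NULL, ",")) {
		if (strcmp(p, "rw") == 0 || strcmp(p, "ro") == 0)
			continue;
		if (strncmp(p, "demo_fixed=", 11) == 0) {
			if (is_remount) {
				fprintf(stderr,
					"\"%s\" cannot be changed on remount\n",
					p);
				return -EINVAL;
			}
			continue;
		}
		fprintf(stderr, "unrecognized mount option \"%s\"\n", p);
		return -EINVAL;
	}
	return 0;
}
```

    With the check inside the parser, the caller no longer needs a
    second pass over the options to reject remount-only violations.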
     
  • This replaces seq_printf() with seq_puts() in nilfs_show_options for
    mount options which have no argument.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Nilfs has "discard" mount option which issues discard/TRIM commands to
    underlying block device, but it lacks a complementary option and has
    no way to disable the feature through remount.

    This adds "nodiscard" option to resolve this imbalance.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Nilfs enables write barriers by default and has "nobarrier" mount
    option to disable this feature. But it lacks the complementary option
    and has no way to re-enable the feature on remount.

    This adds "barrier" option to resolve this imbalance.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
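
    The value of having complementary options can be sketched with a
    small flag-handling function. The flag names, bit values and
    demo_apply_option below are assumptions for illustration; only the
    four option strings come from the entries above.

```c
#include <string.h>

#define DEMO_MOUNT_BARRIER 0x1u	/* hypothetical flag bits */
#define DEMO_MOUNT_DISCARD 0x2u

/*
 * Hypothetical sketch: each feature has one option that sets its flag
 * and one that clears it, so a remount can move in either direction.
 * Returns 0 on success and -1 for an unknown option.
 */
static int demo_apply_option(unsigned int *flags, const char *opt)
{
	if (strcmp(opt, "barrier") == 0)
		*flags |= DEMO_MOUNT_BARRIER;
	else if (strcmp(opt, "nobarrier") == 0)
		*flags &= ~DEMO_MOUNT_BARRIER;
	else if (strcmp(opt, "discard") == 0)
		*flags |= DEMO_MOUNT_DISCARD;
	else if (strcmp(opt, "nodiscard") == 0)
		*flags &= ~DEMO_MOUNT_DISCARD;
	else
		return -1;
	return 0;
}
```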
     
  • Super blocks of nilfs are periodically overwritten in order to record
    the recent log position. This shortens recovery time after unclean
    unmount, but the current implementation performs the update even for a
    few blocks of change. If the filesystem gets small changes slowly and
    continually, super blocks may be updated excessively.

    This moderates the issue by skipping update of log cursor if it does
    not cross a segment boundary.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
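
    The boundary test described above can be sketched as follows. This
    assumes fixed-size segments numbered from block 0, which is a
    simplification; demo_crosses_segment and its parameters are
    hypothetical names, and the actual kernel check may be expressed
    differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: the log cursor stored in the super block is only
 * worth updating when the new log position lies in a different segment
 * from the position recorded last time.
 */
static bool demo_crosses_segment(uint64_t recorded_blocknr,
				 uint64_t new_blocknr,
				 uint64_t blocks_per_segment)
{
	return recorded_blocknr / blocks_per_segment !=
	       new_blocknr / blocks_per_segment;
}
```

    Under this sketch, many small writes inside one segment trigger no
    super block update at all, while the first write into the next
    segment does.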
     
  • Although nilfs redundantly uses two super blocks and each may point to
    a different position in the log, the current version of nilfs does
    not try to fall back to the spare super block when it doesn't find
    any valid log at the position that the primary super block points to.

    This has been a cause of mount failures due to write order reversals
    on barrier-less block devices.

    This inserts fallback code in the error path of the
    nilfs_search_super_root routine to resolve the mount failure problem.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • nilfs_search_super_root can return -ENOMEM, but this error code is not
    described in its kernel-doc comment. This fixes the discrepancy.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This separates a setup routine of log cursor from init_nilfs(). The
    routine, nilfs_store_log_cursor, reads the last position of the log
    containing a super root, and initializes relevant state on the nilfs
    object.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This syncs super blocks in turns instead of syncing the duplicate
    super blocks at the same time. This helps the search for a valid
    super root when a super block is written to disk before the log is
    written, which happens when barrier-less block devices are unmounted
    uncleanly. In that situation, the old super block likely points to a
    valid log.

    This patch introduces the ns_sbwcount member to the nilfs object and
    adds the nilfs_sb_will_flip() function; ns_sbwcount counts how many
    times super blocks have been written back to disk, and
    nilfs_sb_will_flip() decides whether flipping is required based on
    that count, so that super blocks are synced asymmetrically.

    The following functions are also changed:

    - nilfs_prepare_super(): flips super blocks according to the
    argument. The argument is calculated by nilfs_sb_will_flip()
    function.

    - nilfs_cleanup_super(): sets "clean" flag to both super blocks if
    they point to the same checkpoint.

    To update the information in both super blocks, the caller of
    nilfs_commit_super must set the information on both super blocks.

    Signed-off-by: Jiro SEKIBA
    Signed-off-by: Ryusuke Konishi

    Jiro SEKIBA
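
    A counter-driven flip decision can be sketched as below. The policy
    shown (flip after every write except when the low three bits of the
    counter are all set) is purely an assumption chosen to make the
    writes asymmetric; it is not taken from the patch, and struct
    demo_fs and the demo_* functions are hypothetical.

```c
/* Hypothetical state: a write counter and the copy to be written next. */
struct demo_fs {
	unsigned long sbwcount;
	int next;		/* 0 or 1 */
};

/* Assumed policy: skip the flip on every eighth write. */
static int demo_sb_will_flip(const struct demo_fs *fs)
{
	return (fs->sbwcount & 0x7) != 0x7;
}

/* Returns the index of the copy "written" and advances the state. */
static int demo_write_super(struct demo_fs *fs)
{
	int target = fs->next;

	if (demo_sb_will_flip(fs))
		fs->next ^= 1;	/* the other copy is written next time */
	fs->sbwcount++;
	return target;
}
```

    Because only one copy is written at a time, the other copy keeps an
    older log position that can still be valid if the newest write
    reached disk before its log did.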
     
  • This function checks the validity of the super block pointers.
    If the first super block is invalid, it swaps the super blocks.
    The function should be called before any super block information updates.
    The caller must obtain nilfs->ns_sem.

    Signed-off-by: Jiro SEKIBA
    Signed-off-by: Ryusuke Konishi

    Jiro SEKIBA
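
    The check-and-swap behaviour described above can be sketched like
    this. The validity test (a magic number comparison), the
    DEMO_SB_MAGIC value, the -EIO result when neither copy is valid, and
    all demo_* names are assumptions for illustration; locking is
    omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SB_MAGIC 0x1234u	/* hypothetical magic number */

struct demo_super_block {
	uint32_t magic;
	uint64_t last_cno;
};

static int demo_sb_is_valid(const struct demo_super_block *sb)
{
	return sb != NULL && sb->magic == DEMO_SB_MAGIC;
}

/*
 * Hypothetical sketch: make sure sbp[0] refers to a valid super block,
 * swapping the two pointers if only the second one is valid.
 */
static int demo_sb_check_and_swap(struct demo_super_block *sbp[2])
{
	if (!demo_sb_is_valid(sbp[0])) {
		struct demo_super_block *tmp;

		if (!demo_sb_is_valid(sbp[1]))
			return -EIO;	/* assumption: no usable copy */
		tmp = sbp[0];
		sbp[0] = sbp[1];
		sbp[1] = tmp;
	}
	return 0;
}
```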
     
  • This moves the section that updates the recent log position stored
    in super blocks out of nilfs_commit_super into a new routine named
    nilfs_set_log_cursor.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This function marks the error state and writes it to the super
    blocks. This is a preparation for alternating super block writeback.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This function writes out the filesystem state to the super blocks in
    order to share the same cleanup work. This is a preparation for
    alternating super block writeback.

    Cc: Jiro SEKIBA
    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Mount time field in super block is wrongly updated when nilfs remounts
    the partition from read-write to read-only. This fixes the issue.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This counter is unused.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This removes macros to test segment summary flags and redefines a few
    relevant macros with inline functions.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This will get rid of nilfs_segsum_info use from recovery functions for
    simplicity.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • load_segment_summary function has two distinct roles: getting summary
    header of a log, and verifying consistencies of the log.

    This divides it into two corresponding functions,
    nilfs_read_log_header and nilfs_validate_log, to clarify the meaning.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The function name of nilfs_recover_logical_segments makes no sense.
    This changes the name into nilfs_salvage_orphan_logs to clarify the
    role of the function.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Most functions in the recovery code take an argument of a super block
    instance or a nilfs_sb_info struct for convenience's sake.

    This aggressively replaces them with a nilfs object by using
    __bread and __breadahead in the routines that used sb_bread and
    sb_breadahead.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This stores blocksize in nilfs objects for the successive refactoring
    of recovery logic.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Linus Torvalds