30 Aug, 2010

1 commit


23 Aug, 2010

1 commit


19 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    fs: brlock vfsmount_lock
    fs: scale files_lock
    lglock: introduce special lglock and brlock spin locks
    tty: fix fu_list abuse
    fs: cleanup files_lock locking
    fs: remove extra lookup in __lookup_hash
    fs: fs_struct rwlock to spinlock
    apparmor: use task path helpers
    fs: dentry allocation consolidation
    fs: fix do_lookup false negative
    mbcache: Limit the maximum number of cache entries
    hostfs ->follow_link() braino
    hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
    remove SWRITE* I/O types
    kill BH_Ordered flag
    vfs: update ctime when changing the file's permission by setfacl
    cramfs: only unlock new inodes
    fix reiserfs_evict_inode end_writeback second call

    Linus Torvalds
     

18 Aug, 2010

2 commits

  • nilfs_discard_segment() doesn't wait for completion of discard
    requests. This specifies BLKDEV_IFL_WAIT flag when calling
    blkdev_issue_discard() in order to fix the sync failure.

    Reported-by: Christoph Hellwig
    Signed-off-by: Ryusuke Konishi
    Cc: Christoph Hellwig

    Ryusuke Konishi
     
  • Instead of abusing a buffer_head flag just add a variant of
    sync_dirty_buffer which allows passing the exact type of write
    flag required.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

16 Aug, 2010

2 commits

  • After applying commit b2ac86e1, the following message appeared
    after an unclean shutdown:

    > NILFS warning: broken superblock. using spare superblock.

    This turns out to be a false message due to the change which updates
    two super blocks alternately. The secondary super block now can be
    selected if it's newer than the primary one.

    This kills the false warning by suppressing it if the other super block
    is not actually broken.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • If nilfs_attach_checkpoint() gets a memory allocation failure during
    creation of the ifile, it returns without removing the nilfs_sb_info
    struct from the ns_supers list. When a concurrently mounted snapshot is
    unmounted or another new snapshot is mounted after that, this causes
    a kernel oops as below:

    > BUG: unable to handle kernel NULL pointer dereference at (null)
    > IP: [] nilfs_find_sbinfo+0x74/0xa4 [nilfs2]
    > *pde = 00000000
    > Oops: 0000 [#1] SMP

    > Call Trace:
    > [] ? nilfs_get_sb+0x165/0x532 [nilfs2]
    > [] ? ida_get_new_above+0x16d/0x187
    > [] ? alloc_vfsmnt+0x7e/0x10a
    > [] ? kstrdup+0x2c/0x40
    > [] ? vfs_kern_mount+0x96/0x14e
    > [] ? do_kern_mount+0x32/0xbd
    > [] ? do_mount+0x642/0x6a1
    > [] ? do_page_fault+0x0/0x2d1
    > [] ? copy_mount_options+0x80/0xe2
    > [] ? strndup_user+0x48/0x67
    > [] ? sys_mount+0x61/0x90
    > [] ? sysenter_do_call+0x12/0x22

    This fixes the problem.

    Signed-off-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Cc: stable@kernel.org

    Ryusuke Konishi
     

11 Aug, 2010

2 commits

  • * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
    block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
    xen-blkfront: fix missing out label
    blkdev: fix blkdev_issue_zeroout return value
    block: update request stacking methods to support discards
    block: fix missing export of blk_types.h
    writeback: fix bad _bh spinlock nesting
    drbd: revert "delay probes", feature is being re-implemented differently
    drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
    drbd: Disable delay probes for the upcomming release
    writeback: cleanup bdi_register
    writeback: add new tracepoints
    writeback: remove unnecessary init_timer call
    writeback: optimize periodic bdi thread wakeups
    writeback: prevent unnecessary bdi threads wakeups
    writeback: move bdi threads exiting logic to the forker thread
    writeback: restructure bdi forker loop a little
    writeback: move last_active to bdi
    writeback: do not remove bdi from bdi_list
    writeback: simplify bdi code a little
    writeback: do not lose wake-ups in bdi threads
    ...

    Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
    drivers/scsi/scsi_error.c as per Jens.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

7 commits

  • [folded build fix from sfr]

    Signed-off-by: Al Viro

    Al Viro
     
  • add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is
    equivalent to I_FREEING for almost all code looking at either;
    it's there to keep track of having called clear_inode() exactly
    once per inode lifetime, at some point after having set I_FREEING.
    I_CLEAR and I_FREEING never get set at the same time with the
    current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
    instead of I_CLEAR without loss of information. As the result of
    such change, checks become simpler and the amount of code that needs
    to know about I_CLEAR shrinks a lot.

    Signed-off-by: Al Viro

    Al Viro
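
The flag reasoning in the commit above can be illustrated with a small user-space sketch. The flag values and the `sketch_` helper names are assumptions made for this example; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define SKETCH_I_FREEING 0x1u
#define SKETCH_I_CLEAR   0x2u

/* Old-style check: either flag alone may mean "inode is going away". */
static bool sketch_going_away_old(unsigned int state)
{
    return (state & (SKETCH_I_FREEING | SKETCH_I_CLEAR)) != 0;
}

/*
 * New-style check: I_CLEAR is only ever set together with I_FREEING,
 * so testing I_FREEING alone is enough.
 */
static bool sketch_going_away_new(unsigned int state)
{
    return (state & SKETCH_I_FREEING) != 0;
}

/* Mark an inode as cleared by setting both bits, as the commit describes. */
static unsigned int sketch_mark_cleared(void)
{
    return SKETCH_I_FREEING | SKETCH_I_CLEAR;
}
```

For every state reachable under the new scheme (no flag, I_FREEING alone, or both), the two checks agree, which is why most callers no longer need to know about I_CLEAR.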
     
  • Replace inode_setattr with opencoded variants of it in all callers. This
    moves the remaining call to vmtruncate into the filesystem methods where it
    can be replaced with the proper truncate sequence.

    In a few cases it was obvious that we would never end up calling vmtruncate
    so it was left out in the opencoded variant:

    spufs: explicitly checks for ATTR_SIZE earlier
    btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
    ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above

    In addition to that ncpfs called inode_setattr with handcrafted iattrs,
    which allowed trimming down the opencoded variant.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of excessive blocks to the callers
    in preparation for the new truncate sequence and rename the non-truncating
    version to block_write_begin.

    While we're at it also remove several unused arguments to block_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Split up the block_write_begin implementation - __block_write_begin is a new
    trivial wrapper for block_prepare_write that always takes an already
    allocated page and can be either called from block_write_begin or filesystem
    code that already has a page allocated. Remove the handling of already
    allocated pages from block_write_begin after switching all callers that
    do it to __block_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • For filesystems that implement directories in pagecache we call
    block_write_begin with an already allocated page for this code, while the
    normal regular file write path uses the default block_write_begin behaviour.

    Get rid of the __foofs_write_begin helper and opencode the normal write_begin
    call in foofs_write_begin, while adding a new foofs_prepare_chunk helper for
    the directory code. The added benefit is that foofs_prepare_chunk has
    a much saner calling convention.

    Note that the interruptible flag passed into block_write_begin is always
    ignored if we already pass in a page (see next patch for details), and
    we never were doing truncations of excessive blocks for this case either so we
    can switch directly to block_write_begin_newtrunc.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of excessive blocks to the callers
    in preparation for the new truncate calling sequence. This was only done
    for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
    was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
    its _newtrunc variant while at it as just opencoding the two additional
    parameters is shorter than the name suffix.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

25 Jul, 2010

2 commits

  • This inserts a sanity check that refuses to mount a filesystem with
    an unsupported block size.

    Previously, the nilfs kernel code only checked the limitations of
    devices, even though mkfs.nilfs2 limits the range of block sizes; there
    was no check to prevent rec_len overflow with larger block sizes.

    With this change, block sizes larger than 64KB or smaller than 1KB
    are rejected explicitly by the kernel.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
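
A minimal sketch of the range check described in the commit above. The 1KB and 64KB bounds come from the commit text; the macro and function names are hypothetical and do not reflect the actual nilfs2 identifiers.

```c
#include <stdbool.h>

/* Bounds taken from the commit message; names are illustrative only. */
#define SKETCH_MIN_BLOCK_SIZE 1024UL    /* 1KB  */
#define SKETCH_MAX_BLOCK_SIZE 65536UL   /* 64KB */

/* Return true if a filesystem with this block size may be mounted. */
static bool sketch_block_size_supported(unsigned long blocksize)
{
    return blocksize >= SKETCH_MIN_BLOCK_SIZE &&
           blocksize <= SKETCH_MAX_BLOCK_SIZE;
}
```

A mount path following this idea would call the check right after reading the block size from the superblock and fail the mount if it returns false.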
     
  • With 64KB blocksize, a directory entry can have size 64KB which does
    not fit into the 16 bits we have for the entry length. So this patch
    stores 0xffff instead and converts the value when read from / written to disk.

    Nilfs derives its directory implementation from ext2 filesystem, and
    this draws upon the corresponding change on ext2.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
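
The conversion described in the commit above can be sketched in user space as follows. The helper names are assumptions for illustration, and byte-order handling, which on-disk code would need, is deliberately left out.

```c
#include <stdint.h>

/* Largest value a 16-bit on-disk length field can hold. */
#define SKETCH_REC_LEN_ESCAPE 0xffffu
/* The one length that does not fit: a 64KB directory entry. */
#define SKETCH_REC_LEN_64K    (1u << 16)

/* Decode the on-disk 16-bit field into an in-memory length. */
static unsigned int sketch_rec_len_from_disk(uint16_t dlen)
{
    if (dlen == SKETCH_REC_LEN_ESCAPE)
        return SKETCH_REC_LEN_64K;   /* 0xffff stands for 64KB */
    return dlen;
}

/* Encode an in-memory length into the on-disk 16-bit field. */
static uint16_t sketch_rec_len_to_disk(unsigned int len)
{
    if (len == SKETCH_REC_LEN_64K)
        return SKETCH_REC_LEN_ESCAPE;
    return (uint16_t)len;
}
```

The sketch relies on 0xffff never occurring as a genuine entry length, which is an assumption of this example.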
     

24 Jul, 2010

1 commit

  • The implementation of nilfs_get_page() is somewhat outdated:

    - A common read_mapping_page inline function is now available and can
      replace its use of read_cache_page.
    - The wait_on_page_locked() call in the function can be eliminated since
      the read_cache_page function does the same thing through wait_on_page_read().
    - The PageUptodate() check can be eliminated for the same reason.

    This renews nilfs_get_page() based on these points.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

23 Jul, 2010

20 commits

  • This forces nilfs to check compatibility of feature flags so as to
    reject a filesystem with unknown features when it mounts or remounts
    the filesystem.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
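
A minimal sketch of a feature-flag compatibility check of the kind the commit above describes. The supported mask value and all names are hypothetical; the commit does not spell out the actual flag layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask of feature bits this implementation understands. */
#define SKETCH_FEATURE_SUPP 0x3ULL

/* Return the feature bits set on disk that are not in the supported mask. */
static uint64_t sketch_unknown_features(uint64_t on_disk, uint64_t supported)
{
    return on_disk & ~supported;
}

/* A mount or remount is allowed only if no unknown feature bit is set. */
static bool sketch_may_mount(uint64_t on_disk)
{
    return sketch_unknown_features(on_disk, SKETCH_FEATURE_SUPP) == 0;
}
```

Running the same check on both mount and remount keeps a filesystem with unknown features from being accepted through either path.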
     
  • This applies read-ahead to nilfs_btree_do_lookup and
    nilfs_btree_lookup_contig functions and extends them to read ahead
    siblings of level 1 btree nodes that hold data blocks.

    At present, the read-ahead is not applied to most btree operations;
    only get_block() callback function, which is used during read of
    regular files or directories, receives the benefit.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • nilfs_btree_get_block() may now return an untested buffer due to
    read-ahead. This adds a new flag for buffer heads so that the btree
    code can check whether the buffer is already verified or not.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This adds __nilfs_btree_get_block() function that can issue a series
    of read-ahead requests for sibling btree nodes.

    This read-ahead needs parent node block, so nilfs_btree_readahead_info
    structure is added to pass the information that
    __nilfs_btree_get_block() needs.

    This also replaces the previous nilfs_btree_get_block() implementation
    with a wrapper function of __nilfs_btree_get_block().

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This adds a mode argument to the nilfs_btnode_submit_block() function
    and allows it to issue a read-ahead request.

    An optional submit_ptr argument is also added to store the actual
    block address for which the bio is sent. submit_ptr is used for a series
    of read-ahead requests, and helps to decide if each requested block is
    contiguous with the previous one on disk.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • nilfs_btnode_submit_block() refers to the buffer head just before
    returning from the function, but it releases the buffer head earlier
    than that if nilfs_dat_translate() gets an error.

    This can cause an oops in the error case. This fixes the
    issue.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This removes all inline uses from btree.c. GCC now aggressively applies
    inline expansion even to functions declared without the keyword;
    the inline use in btree.c looks excessive.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The patch "reduce repetitive calculation of max number of child nodes"
    gathered up the calculation of maximum number of child nodes into
    nilfs_btree_nchildren_per_block() function. This makes the function
    get resultant value from a private variable in bmap object instead of
    calculating it for each call.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The current btree implementation repeats the same calculation on the
    maximum number of child nodes. This is because a few low level
    routines use the calculation for index addressing in a btree node
    block.

    This reduces the calculation by explicitly passing the maximum number
    of child nodes (ncmax) through their argument.

    This changes parameter passing of the following functions:

    - nilfs_btree_node_dptrs
    - nilfs_btree_node_get_ptr
    - nilfs_btree_node_set_ptr
    - nilfs_btree_node_init
    - nilfs_btree_node_move_left
    - nilfs_btree_node_move_right
    - nilfs_btree_node_insert
    - nilfs_btree_node_delete, and
    - nilfs_btree_get_node

    The following functions are removed:

    - nilfs_btree_node_nchildren_min
    - nilfs_btree_node_nchildren_max

    Most middle level btree operations are rewritten to pass a proper
    ncmax value depending on whether each occurrence of node is "root" or
    not.

    A constant NILFS_BTREE_ROOT_NCHILDREN_MAX is used for the root node,
    whereas nilfs_btree_nchildren_per_block() function is used for
    non-root nodes. If a node could be either root or a non-root node, an
    output argument of nilfs_btree_get_node() is used to set up ncmax.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
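
The effect of passing ncmax explicitly can be sketched as follows. The node layout used here (a small header, then ncmax keys, then ncmax pointers) and all `sketch_` names are assumptions for illustration and are not taken from the commit.

```c
#include <stdint.h>

/* Hypothetical 8-byte node header; keys and pointers follow it in the block. */
struct sketch_node {
    uint8_t  flags;
    uint8_t  level;
    uint16_t nchildren;
    uint32_t pad;
};

/* Keys start right after the header. */
static uint64_t *sketch_node_dkeys(struct sketch_node *node)
{
    return (uint64_t *)(node + 1);
}

/* Pointers start after ncmax keys; the caller supplies ncmax. */
static uint64_t *sketch_node_dptrs(struct sketch_node *node, int ncmax)
{
    return sketch_node_dkeys(node) + ncmax;
}

static uint64_t sketch_node_get_ptr(struct sketch_node *node, int index,
                                    int ncmax)
{
    return sketch_node_dptrs(node, ncmax)[index];
}

static void sketch_node_set_ptr(struct sketch_node *node, int index,
                                uint64_t ptr, int ncmax)
{
    sketch_node_dptrs(node, ncmax)[index] = ptr;
}
```

Because the accessors no longer derive ncmax themselves, a caller that walks a node in a loop computes it once and reuses it for every access.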
     
  • nilfs_btree_node_nchildren_max() and nilfs_btree_node_nchildren_min()
    functions switch return value depending on whether target node is the
    root or a node block. In most uses of these functions, however, the
    node type is fixed, and moreover the same calculation is repeatedly
    performed in loop.

    This unfolds these functions depending on context and moves them outside
    loops wherever possible.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • nilfs_bmap_lookup and its variants are supposed to take a valid
    pointer argument to return a block address, thus pointer checks in
    nilfs_btree_lookup and nilfs_direct_lookup are needless.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This removes nilfs_bmap_union and finally unifies three structures and
    the union in bmap/btree code into one.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This unifies two similar functions nilfs_btree_set_target_v and
    nilfs_direct_set_target_v into one, nilfs_bmap_set_target_v.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This replaces all uses of nilfs_btree struct in implementation of
    btree mapping with nilfs_bmap struct.

    The name of the local variable "btree" is kept so as not to bloat the
    amount of change. Also, some local variables named "bmap" are renamed
    to "btree" to make the naming rule uniform.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This replaces all uses of nilfs_direct struct in implementation of
    direct mapping with nilfs_bmap struct.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The first argument of the bops->bop_propagate operation takes a const
    qualifier, which causes a compilation error once the cast to a pointer
    to the nilfs_btree structure type is removed. This fixes the issue to
    prepare for the subsequent removal of the nilfs_btree struct.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This removes nilfs_bmap_key_to_dkey(), nilfs_bmap_dkey_to_key(),
    nilfs_bmap_ptr_to_dptr(), and nilfs_bmap_dptr_to_ptr() for simplicity.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This inserts sanity checks right after a btree node is read from disk.
    This allows early detection of broken btree nodes, and helps to narrow
    down problems due to file system corruption.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
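
As a rough illustration of what such a post-read sanity check could look like, the sketch below validates two header fields. The specific fields, the level bounds, and every name are assumptions for this example; the commit message does not list the checks it performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds on the level field, for illustration only. */
#define SKETCH_LEVEL_MIN 1
#define SKETCH_LEVEL_MAX 14

/* Hypothetical subset of a btree node header. */
struct sketch_node_header {
    uint8_t  level;
    uint16_t nchildren;
};

/* Return true if the header looks corrupted and the node must be rejected. */
static bool sketch_node_broken(const struct sketch_node_header *h, int ncmax)
{
    return h->level < SKETCH_LEVEL_MIN ||
           h->level >= SKETCH_LEVEL_MAX ||
           h->nchildren > ncmax;
}
```

Rejecting a node at read time keeps later code from indexing with a bad level or child count.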
     
  • According to the report titled "problem with nilfs_cleanerd" from
    Łukasz Wójcicki, nilfs_btree_lookup_dirty_buffers or
    nilfs_btree_add_dirty_buffer caused a memory access violation during
    garbage collection.

    This could happen if the level field of a given btree node buffer is
    incorrect, which is a crucial internal bug.

    This inserts a sanity check to figure out the problem.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This adds an is_remount argument to the parse_options() function that
    obtains mount options from strings.

    Previously, parse_options did not distinguish whether it was called
    for a new mount or a remount, so the caller needed additional
    verifications outside the function.

    This allows parse_options to verify options and print messages
    depending on the context.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
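
A user-space sketch of context-aware option parsing in the spirit of the commit above. The option names, the return convention, and the function name are assumptions for illustration; the real parse_options() is kernel code and differs in detail.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parser: returns 1 if every option is accepted, 0 otherwise.
 * The string is modified in place by strtok().
 */
static int sketch_parse_options(char *options, int is_remount)
{
    char *p;

    if (options == NULL)
        return 1;

    for (p = strtok(options, ","); p != NULL; p = strtok(NULL, ",")) {
        if (strcmp(p, "barrier") == 0 || strcmp(p, "nobarrier") == 0) {
            continue;   /* assumed to be valid for mount and remount */
        } else if (strncmp(p, "cp=", 3) == 0) {
            /* Assumed example of an option that is mount-only. */
            if (is_remount) {
                fprintf(stderr, "sketch: \"%s\" is invalid for remount\n", p);
                return 0;
            }
        } else {
            fprintf(stderr, "sketch: unrecognized option \"%s\"\n", p);
            return 0;
        }
    }
    return 1;
}
```

With the context passed in, the parser itself can reject a mount-only option on remount and report it, so the caller no longer needs a separate verification step.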