23 Oct, 2010

10 commits


10 Aug, 2010

4 commits

  • [folded build fix from sfr]

    Signed-off-by: Al Viro

    Al Viro
     
  • Replace inode_setattr with opencoded variants of it in all callers. This
    moves the remaining call to vmtruncate into the filesystem methods where it
    can be replaced with the proper truncate sequence.

    In a few cases it was obvious that we would never end up calling vmtruncate
    so it was left out in the opencoded variant:

    spufs: explicitly checks for ATTR_SIZE earlier
    btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
    ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above

    In addition, ncpfs called inode_setattr with handcrafted iattrs,
    which allowed the opencoded variant to be trimmed down.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate, which gets rid of excess blocks, into the
    callers in preparation for the new truncate sequence, and rename the
    non-truncating version to block_write_begin.

    While we're at it also remove several unused arguments to block_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate, which gets rid of excess blocks, into the
    callers in preparation for the new truncate calling sequence. This was
    only done for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc
    variant was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
    its _newtrunc variant while at it, as just opencoding the two additional
    parameters is shorter than the name suffix.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

22 May, 2010

1 commit


10 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming their availability. As this
    conversion needs to touch a large number of source files, the following
    script was used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. gfp.h if only gfp is
    used, slab.h if slab is used.

    * When the script inserts a new include, it looks at the include
    blocks and tries to place the new include so that its order conforms
    to its surroundings. It is put in the include block which contains
    core kernel includes, in the same order that the rest are ordered:
    alphabetical, Christmas tree, reverse Christmas tree, or at the end
    if there doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, while for others adding it to an
    implementation .h or embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as the contents of gfp.h were
    usually widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as a bisection point.

    Given that I had only a couple of failures from the tests in step 7,
    I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the
    arch headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

27 Nov, 2009

2 commits


20 Nov, 2009

2 commits

  • Previously, nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() called
    mark_inode_dirty() after they changed the number of data blocks.

    This moves these calls outside the outermost bmap functions such as
    nilfs_bmap_insert() or nilfs_bmap_truncate().

    This mitigates overhead for truncate or delete operations, since
    they repeatedly remove sets of blocks. A nearly 10 percent improvement
    was observed for removal of a large file:

    # dd if=/dev/zero of=/test/aaa bs=1M count=512
    # time rm /test/aaa

    real 2.968s -> 2.705s

    Further optimization may be possible by eliminating these
    mark_inode_dirty() uses though I avoid mixing separate changes here.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This lock can be eliminated because inodes on the buffer can be updated
    independently. Although the log writer also fills in bmap data on the
    on-disk inodes, that update is serialized by the log writer lock.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

15 Nov, 2009

1 commit


29 Sep, 2009

1 commit


22 Sep, 2009

1 commit


14 Sep, 2009

1 commit


24 Jun, 2009

1 commit


10 Jun, 2009

3 commits

  • Although the get_block() callback function can return an extent of
    contiguous blocks via bh->b_size, the nilfs_get_block() function did
    not support this feature.

    This adds a contiguous lookup feature to the block mapping code of
    nilfs, and allows the nilfs_get_blocks() function to return extent
    information by applying the feature.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This adds a missing sync_page method, which unplugs bio requests when
    waiting for page locks. This improves the read performance of nilfs.

    Here is a measurement result using dd command.

    Without this patch:

    # mount -t nilfs2 /dev/sde1 /test
    # dd if=/test/aaa of=/dev/null bs=512k
    1024+0 records in
    1024+0 records out
    536870912 bytes (537 MB) copied, 6.00688 seconds, 89.4 MB/s

    With this patch:

    # mount -t nilfs2 /dev/sde1 /test
    # dd if=/test/aaa of=/dev/null bs=512k
    1024+0 records in
    1024+0 records out
    536870912 bytes (537 MB) copied, 3.54998 seconds, 151 MB/s

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Hi,

    I introduced "is_partially_uptodate" aops for NILFS2.

    A page can have multiple buffers, and even if the page is not uptodate,
    some of those buffers can be uptodate in environments where
    pagesize != blocksize.
    This aops checks that all buffers which correspond to the part of the
    file that we want to read are uptodate. If so, we do not have to issue
    an actual read I/O to the disk even if the page is not uptodate, because
    the portion we want to read is uptodate.
    The "block_is_partially_uptodate" function is already used by ext2/3/4.
    With the following patch, random read/write mixed workloads or random
    reads after random writes can be optimized, giving a performance
    improvement.

    I did a performance test using the sysbench.

    1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --file-rw-ratio=1 run

    -2.6.30-rc5

    Test execution summary:
    total time: 151.2907s
    total number of events: 200000
    total time taken by event execution: 2409.8387
    per-request statistics:
    min: 0.0000s
    avg: 0.0120s
    max: 0.9306s
    approx. 95 percentile: 0.0439s

    Threads fairness:
    events (avg/stddev): 12500.0000/238.52
    execution time (avg/stddev): 150.6149/0.01

    -2.6.30-rc5-patched

    Test execution summary:
    total time: 140.8828s
    total number of events: 200000
    total time taken by event execution: 2240.8577
    per-request statistics:
    min: 0.0000s
    avg: 0.0112s
    max: 0.8750s
    approx. 95 percentile: 0.0418s

    Threads fairness:
    events (avg/stddev): 12500.0000/218.43
    execution time (avg/stddev): 140.0536/0.01

    arch: ia64
    pagesize: 16k

    Thanks.

    Signed-off-by: Hisashi Hifumi
    Signed-off-by: Ryusuke Konishi

    Hisashi Hifumi
     

07 Apr, 2009

6 commits

  • After reviewing user feedback while looking for other compatibility
    issues, I found that nilfs improperly initializes timestamps in the
    inode; CURRENT_TIME was used there instead of CURRENT_TIME_SEC even
    though nilfs didn't have nanosecond timestamps on disk. A few users
    reported that the tar program sometimes failed to expand symbolic links
    on nilfs, and this turned out to be the cause.

    Instead of applying the above replacement, I've decided to support
    nanosecond timestamps on this occasion. Fortunately, an unused 64-bit
    field was present in the nilfs_inode struct, and I found it usable for
    this purpose without impact on users.

    So, this makes the enhancement and resolves the tar problem.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • The sketch file is a file to mark checkpoints with user data. It was
    experimentally introduced in the original implementation and is now
    obsolete. The file was handled differently from regular files; its
    size got truncated when a checkpoint was created.

    This stops the special treatment and treats it as a regular file.
    Most users are not affected because mkfs.nilfs2 no longer creates this
    file.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • Pekka Enberg advised me:
    > It would be nice if BUG(), BUG_ON(), and panic() calls would be
    > converted to proper error handling using WARN_ON() calls. The BUG()
    > call in nilfs_cpfile_delete_checkpoints(), for example, looks to be
    > triggerable from user-space via the ioctl() system call.

    This will follow the comment and keep them to a minimum.

    Acked-by: Pekka Enberg
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • Pekka Enberg pointed out that the double error handling found after
    nilfs_transaction_end() can be avoided by separating the abort
    operation:

    > OK, I don't understand this. The only way nilfs_transaction_end() can
    > fail is if we have NILFS_TI_SYNC set and we fail to construct the
    > segment. But why do we want to construct a segment if we don't commit?
    >
    > I guess what I'm asking is why don't we have a separate
    > nilfs_transaction_abort() function that can't fail for the erroneous
    > case to avoid this double error value tracking thing?

    This does the separation and renames nilfs_transaction_end() to
    nilfs_transaction_commit() for clarification.

    Since some calls of these functions were used just for exclusion
    control against the segment constructor, they are replaced with
    semaphore operations.

    Acked-by: Pekka Enberg
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • Chris Mason pointed out that there is a missed sync issue in
    nilfs_writepages():

    On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote:
    > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by
    > do_sync_mapping_range().

    where WB_SYNC_NONE in do_sync_mapping_range() was replaced with
    WB_SYNC_ALL by Nick's patch (commit:
    ee53a891f47444c53318b98dac947ede963db400).

    This fixes the problem by letting nilfs_writepages() write out the log of
    file data within the range if sync_mode is WB_SYNC_ALL.

    This involves removal of nilfs_file_aio_write() which was previously
    needed to ensure O_SYNC sync writes.

    Cc: Chris Mason
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This adds inode level operations of the nilfs2 file system.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi