30 May, 2009

1 commit


22 May, 2009

1 commit


12 May, 2009

1 commit

  • Although some ioctls of nilfs2 exchange data in the form of an indirectly
    referenced array, some of them lack a size check on the array elements.

    This inserts the missing checks and rejects requests if data of ioctl
    does not have a valid format.

    We usually don't have to check the size of structures associated with
    ioctl commands because the size is tested implicitly when the ioctl
    command is identified; the checks this patch adds are for the cases
    where the implicit check is not applied.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
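
    The kind of check described above can be sketched in user-space C as
    follows. This is only an illustration, not the patch itself: the
    descriptor struct argv_desc, the element type struct item, the helper
    check_argv() and the max_nmembs limit are all hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

/* Hypothetical descriptor of an indirectly referenced array; the field
 * names are assumptions made for this sketch. */
struct argv_desc {
    uint64_t base;    /* user-space address of the array */
    uint32_t nmembs;  /* number of elements */
    uint16_t size;    /* size of one element, as claimed by the caller */
};

/* Hypothetical element type that the handler expects. */
struct item {
    uint64_t id;
    uint64_t blocknr;
};

/* Return 0 if the descriptor has a valid format, -EINVAL otherwise. */
static int check_argv(const struct argv_desc *d, size_t expected_size,
                      uint32_t max_nmembs)
{
    if (d->size != expected_size)
        return -EINVAL;  /* element layout does not match */
    if (d->nmembs > max_nmembs)
        return -EINVAL;  /* bound the total amount of data handled */
    return 0;
}
```

    The element size is not part of the ioctl command number, so it has to
    be compared explicitly before any element is interpreted.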
     

11 May, 2009

2 commits

  • This is a companion patch to ("nilfs2: fix possible circular locking
    for get information ioctls").

    This corrects lock order reversal between mm->mmap_sem and
    nilfs->ns_segctor_sem in nilfs_clean_segments() which was detected by
    lockdep check:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.30-rc3-nilfs-00003-g360bdc1 #7
    -------------------------------------------------------
    mmap/5294 is trying to acquire lock:
    (&nilfs->ns_segctor_sem){++++.+}, at: [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]

    but task is already holding lock:
    (&mm->mmap_sem){++++++}, at: [] do_page_fault+0x1d8/0x30a

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&mm->mmap_sem){++++++}:
    [] __lock_acquire+0x1066/0x13b0
    [] lock_acquire+0xba/0xdd
    [] might_fault+0x68/0x88
    [] copy_from_user+0x2a/0x111
    [] nilfs_ioctl_prepare_clean_segments+0x1d/0xf1 [nilfs2]
    [] nilfs_clean_segments+0x6d/0x1b9 [nilfs2]
    [] nilfs_ioctl+0x2ad/0x318 [nilfs2]
    [] vfs_ioctl+0x22/0x69
    [] do_vfs_ioctl+0x460/0x499
    [] sys_ioctl+0x40/0x5a
    [] sysenter_do_call+0x12/0x38
    [] 0xffffffff

    -> #0 (&nilfs->ns_segctor_sem){++++.+}:
    [] __lock_acquire+0xdcc/0x13b0
    [] lock_acquire+0xba/0xdd
    [] down_read+0x2a/0x3e
    [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
    [] __do_fault+0x165/0x376
    [] handle_mm_fault+0x287/0x5d1
    [] do_page_fault+0x2fb/0x30a
    [] error_code+0x72/0x78
    [] 0xffffffff

    where nilfs_clean_segments() holds:

    nilfs->ns_segctor_sem -> copy_from_user()
    --> page fault -> mm->mmap_sem

    And, page fault path may hold:

    page fault -> mm->mmap_sem
    --> nilfs_page_mkwrite() -> nilfs->ns_segctor_sem

    Even though nilfs_clean_segments() does not perform write access on
    given user pages, it may cause deadlock because nilfs->ns_segctor_sem
    is shared per device and mm->mmap_sem can be shared with other tasks.

    To avoid this problem, this patch moves all calls of copy_from_user()
    outside the nilfs->ns_segctor_sem lock in the ioctl.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
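
    The reordering described above can be illustrated with a small
    user-space model. It is a sketch under stated assumptions: lock() and
    unlock() stand in for nilfs->ns_segctor_sem, copy_in() stands in for
    copy_from_user(), and clean_segments() is a hypothetical handler, not
    the kernel function.

```c
#include <stddef.h>
#include <string.h>

static int segctor_locked;     /* stand-in for nilfs->ns_segctor_sem */
static int copies_under_lock;  /* counts copies done while the lock is held */

static void lock(void)   { segctor_locked = 1; }
static void unlock(void) { segctor_locked = 0; }

/* Stand-in for copy_from_user(): in the kernel this may page-fault and
 * take mm->mmap_sem, so it must not run while the lock above is held. */
static void copy_in(void *dst, const void *src, size_t n)
{
    if (segctor_locked)
        copies_under_lock++;
    memcpy(dst, src, n);
}

/* Hypothetical handler: copy all arguments first, then take the lock. */
static int clean_segments(const int *user_args, size_t n)
{
    int kbuf[8];
    int sum = 0;
    size_t i;

    if (n > 8)
        return -1;
    copy_in(kbuf, user_args, n * sizeof(int));  /* no lock held here */

    lock();
    for (i = 0; i < n; i++)  /* work only on the private copy */
        sum += kbuf[i];
    unlock();
    return sum;
}
```

    With the copy done up front, the locked region never touches user
    memory, so the lock order seen from the page fault path is no longer
    reversed.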
     
  • This is one of two patches which are to correct possible circular
    locking between mm->mmap_sem and nilfs->ns_segctor_sem.

    The problem was detected by lockdep check as follows:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.30-rc3-nilfs-00002-g3552613 #6
    -------------------------------------------------------
    mmap/5418 is trying to acquire lock:
    (&nilfs->ns_segctor_sem){++++.+}, at: [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]

    but task is already holding lock:
    (&mm->mmap_sem){++++++}, at: [] do_page_fault+0x1d8/0x30a

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&mm->mmap_sem){++++++}:
    [] __lock_acquire+0x1066/0x13b0
    [] lock_acquire+0xba/0xdd
    [] might_fault+0x68/0x88
    [] copy_to_user+0x2c/0xfc
    [] nilfs_ioctl_wrap_copy+0x103/0x160 [nilfs2]
    [] nilfs_ioctl+0x30a/0x3b0 [nilfs2]
    [] vfs_ioctl+0x22/0x69
    [] do_vfs_ioctl+0x460/0x499
    [] sys_ioctl+0x40/0x5a
    [] sysenter_do_call+0x12/0x38
    [] 0xffffffff

    -> #0 (&nilfs->ns_segctor_sem){++++.+}:
    [] __lock_acquire+0xdcc/0x13b0
    [] lock_acquire+0xba/0xdd
    [] down_read+0x2a/0x3e
    [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
    [] __do_fault+0x165/0x376
    [] handle_mm_fault+0x287/0x5d1
    [] do_page_fault+0x2fb/0x30a
    [] error_code+0x72/0x78
    [] 0xffffffff

    other info that might help us debug this:

    1 lock held by mmap/5418:
    #0: (&mm->mmap_sem){++++++}, at: [] do_page_fault+0x1d8/0x30a

    stack backtrace:
    Pid: 5418, comm: mmap Not tainted 2.6.30-rc3-nilfs-00002-g3552613 #6
    Call Trace:
    [] ? printk+0xf/0x12
    [] print_circular_bug_tail+0xaa/0xb5
    [] __lock_acquire+0xdcc/0x13b0
    [] ? nilfs_sufile_get_stat+0x1e/0x105 [nilfs2]
    [] ? up_read+0x16/0x2c
    [] ? nilfs_sufile_get_stat+0xfa/0x105 [nilfs2]
    [] lock_acquire+0xba/0xdd
    [] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] down_read+0x2a/0x3e
    [] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
    [] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
    [] __do_fault+0x165/0x376
    [] handle_mm_fault+0x287/0x5d1
    [] ? do_page_fault+0x1d8/0x30a
    [] ? down_read_trylock+0x39/0x43
    [] do_page_fault+0x2fb/0x30a
    [] ? do_page_fault+0x0/0x30a
    [] error_code+0x72/0x78
    [] ? do_page_fault+0x0/0x30a

    This makes the lock granularity of nilfs->ns_segctor_sem finer than
    that of the mmap semaphore for ioctl commands except
    nilfs_clean_segments().

    The successive patch ("nilfs2: fix lock order reversal in
    nilfs_clean_segments ioctl") is required to fully resolve the problem.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

10 May, 2009

1 commit

  • This would fix the following failure during GC:

    nilfs_cpfile_delete_checkpoints: cannot delete block
    NILFS: GC failed during preparation: cannot delete checkpoints: err=-2

    The problem was caused by a break in state consistency between page
    cache and btree; the above block was removed from the btree but the
    page buffering the block was remaining in the page cache in dirty
    state.

    This resolves the inconsistency by ensuring to clear dirty state of
    the page buffering the deleted block.

    Reported-by: David Arendt
    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

09 May, 2009

2 commits

  • This fixes the following circular locking dependency problem:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.30-rc3 #5
    -------------------------------------------------------
    segctord/3895 is trying to acquire lock:
    (&nilfs->ns_writer_mutex){+.+...}, at: []
    nilfs_mdt_get_block+0x89/0x20f [nilfs2]

    but task is already holding lock:
    (&bmap->b_sem){++++..}, at: []
    nilfs_bmap_propagate+0x14/0x2e [nilfs2]

    which lock already depends on the new lock.

    The bugfix is done by replacing call sites of nilfs_get_writer() which
    are never called from read-only context with direct dereferencing of
    pointer to a writable FS-instance.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • Some function calls in nilfs_prepare_segment_for_recovery() may fail
    because they can create blocks on meta data files without configuring
    a writable FS-instance. Concretely, nilfs_mdt_create_block() routine
    of meta data files will fail in that case.

    This fixes the problem by temporarily attaching a writable FS-instance
    while the function is called.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

13 Apr, 2009

8 commits

  • On-disk counters ndirtysegs and ncleansegs of sufile can go wrong
    after roll-forward recovery because
    nilfs_prepare_segment_for_recovery() function marks segments dirty
    without adjusting value of these counters.

    This fixes the problem by adding a function to sufile which does the
    operation adjusting the counters, and by letting the recovery function
    use it.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
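
    A minimal sketch of keeping the two counters in step when a segment is
    marked dirty is shown below. The names su_header, su_mod_counter(),
    su_mark_dirty() and SEG_DIRTY are hypothetical, and the sketch assumes
    that a segment is either clean or dirty, which simplifies the real
    on-disk state.

```c
#include <stddef.h>
#include <stdint.h>

#define SEG_DIRTY 0x1u  /* hypothetical flag bit for this sketch */

/* Hypothetical header holding the two counters. */
struct su_header {
    uint64_t ncleansegs;
    uint64_t ndirtysegs;
};

/* Single place where both counters are adjusted together. */
static void su_mod_counter(struct su_header *h, int64_t ncleanadd,
                           int64_t ndirtyadd)
{
    h->ncleansegs += (uint64_t)ncleanadd;  /* modular add handles -1 */
    h->ndirtysegs += (uint64_t)ndirtyadd;
}

/* Mark a segment dirty and adjust the counters exactly once. */
static void su_mark_dirty(struct su_header *h, unsigned *flags,
                          size_t segnum)
{
    if (flags[segnum] & SEG_DIRTY)
        return;                /* already counted as dirty */
    flags[segnum] |= SEG_DIRTY;
    su_mod_counter(h, -1, 1);  /* one clean segment became dirty */
}
```

    Marking the same segment twice leaves the counters unchanged the
    second time, which is the property the recovery path needs.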
     
  • This will simplify sufile.c by sharing common code which repeatedly
    appears in routines updating a segment usage entry; a wrapper function
    nilfs_sufile_update() is introduced for the purpose, and counter
    modifications are integrated to a new function
    nilfs_sufile_mod_counter().

    This is a preparation for the successive bugfix patch ("nilfs2: fix
    possible mismatch of sufile counters on recovery").

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The nilfs_sufile_set_error() function wrongly adjusts the number of
    dirty segments instead of the number of clean segments. In addition,
    the function calls brelse() twice for the same buffer head.

    This fixes these bugs.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This fixes a bug of ("nilfs2: simplify handling of active state of
    segments") patch. The patch did not take account that a base index is
    increased in nilfs_sufile_get_suinfo() function if requested entries
    go across block boundary on sufile.

    Due to this bug, the active flag sometimes appears on wrong segments
    and has induced malfunction of garbage collection.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • A MODULE_VERSION() macro has been used in out-of-tree nilfs modules,
    but it's needless and not updated in tree. So, this removes it along
    with the version declaration.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This fixes the following false detection of lockdep against nilfs meta
    data files:

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.29 #26
    ---------------------------------------------
    mount.nilfs2/4185 is trying to acquire lock:
    (&mi->mi_sem){----}, at: [] nilfs_sufile_get_stat+0x1e/0x105 [nilfs2]
    but task is already holding lock:
    (&mi->mi_sem){----}, at: [] nilfs_count_free_blocks+0x48/0x84 [nilfs2]

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • The bmap semaphore of DAT file can be held while a bmap of other files
    is locked. This has caused the following false detection of lockdep
    check:

    mount.nilfs2/4667 is trying to acquire lock:
    (&bmap->b_sem){..--}, at: [] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2]

    but task is already holding lock:
    (&bmap->b_sem){..--}, at: [] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2]

    This will fix the false detection by distinguishing semaphores of the
    DAT and other files.

    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     
  • This follows the change of Coly Li's series ("fs: return f_fsid for
    statfs(2)"), and makes nilfs2 return f_fsid info for statfs(2).

    Acked-by: Coly Li
    Signed-off-by: Ryusuke Konishi

    Ryusuke Konishi
     

07 Apr, 2009

24 commits

  • After a review of user's feedback for finding out other compatibility
    issues, I found nilfs improperly initializes timestamps in inode;
    CURRENT_TIME was used there instead of CURRENT_TIME_SEC even though nilfs
    didn't have nanosecond timestamps on disk. A few users gave us the report
    that the tar program sometimes failed to expand symbolic links on nilfs,
    and it turned out to be the cause.

    Instead of applying the above displacement, I've decided to support
    nanosecond timestamps on this occasion. Fortunately, a needless 64-bit
    field was in the nilfs_inode struct, and I found it's available for this
    purpose without impact for the users.

    So, this will do the enhancement and resolve the tar problem.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • The former versions didn't have extra super blocks. This improves the
    weak point by introducing another super block at unused region in tail of
    the partition.

    This doesn't break disk format compatibility; older versions just ignore
    the secondary super block, and new versions just recover it if it doesn't
    exist. The partition created by an old mkfs may not have unused region,
    but in that case, the secondary super block will not be added.

    This doesn't make more redundant copies of the super block; it is a future
    work.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This will reduce some lines of the segment constructor. Previously, the
    state was complexly controlled through a list of segments in order to
    keep consistency in meta data of usage state of segments. Instead, this
    presents ``calculated'' active flags to the userland cleaner program and
    stops maintaining its real flag on disk.

    Only by this fake flag, the cleaner cannot exactly know if each segment is
    reclaimable or not. However, the recent extension of nilfs_sustat ioctl
    struct (nilfs2-extend-nilfs_sustat-ioctl-struct.patch) can prevent the
    cleaner from reclaiming in-use segment wrongly.

    So, now I can apply this for simplification.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • Nilfs creates checkpoints even for garbage collection or metadata updates
    such as checkpoint mode change. So, user often sees checkpoints created
    only by such internal operations.

    This is inconvenient in some situations. For example, an application that
    monitors checkpoints and changes them to snapshots will fall into an
    infinite loop because it cannot distinguish internally created
    checkpoints.

    This patch solves this sort of problem by adding a flag to checkpoint for
    identification.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • The sketch file is a file to mark checkpoints with user data. It was
    experimentally introduced in the original implementation, and now
    obsolete. The file was handled differently with regular files; the file
    size got truncated when a checkpoint was created.

    This stops the special treatment and will treat it as a regular file.
    Most users are not affected because mkfs.nilfs2 no longer makes this file.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This adds a missing endian conversion of checksum field in the super
    block. This fixes compatibility issue on big endian machines which will
    come to surface after supporting recovery of super block.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
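
    The effect of an explicit endian conversion can be sketched with
    portable helpers that decode and encode a 32-bit little-endian field
    byte by byte. The names le32_decode() and le32_encode() are
    hypothetical; kernel code would normally use le32_to_cpu() and
    cpu_to_le32() instead.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian on-disk field; the result is the same
 * on little-endian and big-endian hosts. */
static uint32_t le32_decode(const unsigned char b[4])
{
    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

/* Encode a host value as a 32-bit little-endian on-disk field. */
static void le32_encode(unsigned char b[4], uint32_t v)
{
    b[0] = (unsigned char)(v & 0xff);
    b[1] = (unsigned char)((v >> 8) & 0xff);
    b[2] = (unsigned char)((v >> 16) & 0xff);
    b[3] = (unsigned char)((v >> 24) & 0xff);
}
```

    Comparing a computed checksum with a raw, unconverted on-disk value
    happens to work on little-endian hosts and fails on big-endian ones,
    which is why the missing conversion only shows up there.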
     
  • Pekka Enberg advised me:
    > It would be nice if BUG(), BUG_ON(), and panic() calls would be
    > converted to proper error handling using WARN_ON() calls. The BUG()
    > call in nilfs_cpfile_delete_checkpoints(), for example, looks to be
    > triggerable from user-space via the ioctl() system call.

    This will follow the comment and keep them to a minimum.

    Acked-by: Pekka Enberg
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This adds a new argument to the nilfs_sustat structure.

    The extended field allows deleting the volatile active state of segments,
    which was needed to protect freshly-created segments from garbage
    collection but has confused code dealing with segments. This
    extension alleviates the mess and gives room for further
    simplifications.

    The volatile active flag is not persistent, so it's eliminable on this
    occasion without affecting compatibility other than the ioctl change.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • Pekka Enberg suggested converting ->ioctl operations to use
    ->unlocked_ioctl to avoid BKL.

    The conversion was verified to be safe, so I will take it on this
    occasion.

    Cc: Pekka Enberg
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This removes compat code from the nilfs ioctls and applies the same
    function for both .ioctl and .compat_ioctl file operations.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • Nilfs ioctl had structures not having fixed sized types such as:

    struct nilfs_argv {
    void *v_base;
    size_t v_nmembs;
    size_t v_size;
    int v_index;
    int v_flags;
    };

    Further, some of them are wrongly aligned:

    e.g.

    struct nilfs_cpmode {
    __u64 cm_cno;
    int cm_mode;
    };

    The size of wrongly aligned structures varies depending on
    architectures, and it breaks the identity of ioctl commands, which
    leads to arch dependent errors.

    Previously, these were compensated for by using compat_ioctl.

    This fixes these problems and allows removal of compat ioctl.

    Since this will change the sizes of those structures, binary
    compatibility with past utilities will break once; new utilities have
    to be used instead. However, it should help avoid platform-dependent
    problems in the long term.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
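
    The alignment problem can be made concrete with a short sketch. The
    struct names below are hypothetical and the fixed layout is an assumed
    example of explicit padding, not necessarily the layout the patch
    chose.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout with implicit tail padding: its size depends on how the ABI
 * aligns 64-bit integers (for example 12 bytes on i386 and 16 bytes on
 * x86-64), so the size encoded in the ioctl command number differs. */
struct cpmode_implicit {
    uint64_t cm_cno;
    int      cm_mode;
};

/* Assumed fixed layout: only fixed-size types, with the padding spelled
 * out so that the size is the same on every architecture. */
struct cpmode_fixed {
    uint64_t cm_cno;
    uint32_t cm_mode;
    uint32_t cm_pad;
};

/* Compile-time checks that the fixed layout has no hidden padding. */
_Static_assert(sizeof(struct cpmode_fixed) == 16,
               "fixed layout must be 16 bytes");
_Static_assert(offsetof(struct cpmode_fixed, cm_pad) == 12,
               "explicit pad must directly follow cm_mode");

/* Helper so that the size can also be checked at run time. */
static size_t cpmode_fixed_size(void)
{
    return sizeof(struct cpmode_fixed);
}
```

    Because the structure size is part of the ioctl command encoding, a
    layout whose size is identical on 32-bit and 64-bit ABIs lets one
    handler serve both native and compat callers.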
     
  • This removes NILFS_IOCTL_TIMEDWAIT command from ioctl interface along
    with the related flags and wait queue.

    The command is terrible because it just sleeps in the ioctl. I prefer
    to avoid this by devising means of event polling in userland program.
    By reconsidering the userland GC daemon, I found this is possible
    without changing behaviour of the daemon and sacrificing efficiency.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This will fix the weird behavior of lscp command in listing continuously
    created checkpoints; the output of lscp is rewound regularly for the
    recent nilfs. As a result of debugging, a defect was found in
    nilfs_cpfile_do_get_cpinfo() function.

    Though the function can be repeatedly called to enumerate checkpoints and
    it can skip invalid checkpoint entries, the index value was not carried
    between successive calls.

    The bug has long been present, and came to surface after applying a bugfix
    nilfs2-fix-problems-of-memory-allocation-in-ioctl.patch, which increased
    frequency of calling the function. The similar bugfix was already applied
    for ``snapshots'' by
    nilfs2-fix-gc-failure-on-volumes-keeping-numerous-snapshots.patch.

    This fixes the problem by making the index argument bidirectional on the
    function.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
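
    A bidirectional index argument can be sketched as below. The table
    layout and the names cp_entry and get_cpinfo() are hypothetical; the
    point is only that the callee writes back the position where the next
    call should resume, including any invalid entries it skipped.

```c
#include <stddef.h>

/* Hypothetical checkpoint table entry. */
struct cp_entry {
    unsigned long cno;
    int invalid;  /* nonzero if the entry must be skipped */
};

/* Copy up to nmax valid checkpoint numbers starting at *indexp.
 * On return, *indexp holds the table position to resume from.
 * Returns the number of entries written to out. */
static size_t get_cpinfo(const struct cp_entry *tbl, size_t ntbl,
                         size_t *indexp, unsigned long *out, size_t nmax)
{
    size_t i = *indexp;
    size_t n = 0;

    while (i < ntbl && n < nmax) {
        if (!tbl[i].invalid)
            out[n++] = tbl[i].cno;
        i++;  /* advances over invalid entries as well */
    }
    *indexp = i;  /* carried to the next call by the caller */
    return n;
}
```

    If the caller instead advanced its own index by the returned count,
    skipped entries would not be accounted for and a later call would
    return entries that were already listed.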
     
  • This cleans up the strange indirect function calling convention used in
    nilfs to follow the normal kernel coding style.

    Signed-off-by: Pekka Enberg
    Acked-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • A few tool developers gave me requests for fixing inconvenient return
    value of nilfs_get_cpinfo() ioctl; if the requested mode is NILFS_SNAPSHOT
    and the specified start entry is not a snapshot, the ioctl unnaturally
    returns one as the number of acquired snapshot item.

    In addition, the ioctl function returns an ENOENT error for checkpoints
    within blocks deleted by garbage collection.

    These behaviors require corrections for programs which enumerate
    snapshots. This resolves the inconvenience by changing the return values
    to zero for the above cases.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This resolves the following failure of nilfs2 cleaner daemon:

    nilfs_cleanerd[20670]: cannot clean segments: No such file or directory
    nilfs_cleanerd[20670]: shutdown

    When creating thousands of snapshots, the cleaner daemon had rarely died
    as above due to an error returned from the kernel code.

    After applying the recent patch which fixed memory allocation problems in
    ioctl (Message-Id: ), the
    problem gets more frequent.

    It turned out to be a bug of nilfs_ioctl_wrap_copy function and one of its
    callback routines to read out information of snapshots; if the
    nilfs_ioctl_wrap_copy function divided a large read request into multiple
    requests, the second and later requests have failed since a restart
    position on snapshot meta data was not properly set forward.

    It's a deficiency of the callback interface that cannot pass the restart
    position among multiple requests. This patch fixes the issue by allowing
    nilfs_ioctl_wrap_copy and snapshot read functions to exchange a position
    argument.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • The file gcinode.c gives buffer cache functions for on-disk blocks
    moved in garbage collection. Joern Engel has suggested inserting its
    explanations in the source file (Message-ID:
    and
    ).

    This follows the comment.

    Cc: Joern Engel
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • Pekka Enberg pointed out that double error handlings found after
    nilfs_transaction_end() can be avoided by separating abort operation:

    OK, I don't understand this. The only way nilfs_transaction_end() can
    fail is if we have NILFS_TI_SYNC set and we fail to construct the
    segment. But why do we want to construct a segment if we don't commit?

    I guess what I'm asking is why don't we have a separate
    nilfs_transaction_abort() function that can't fail for the erroneous
    case to avoid this double error value tracking thing?

    This does the separation and renames nilfs_transaction_end() to
    nilfs_transaction_commit() for clarification.

    Since some calls of these functions were used just for exclusion control
    against the segment constructor, they are replaced with semaphore
    operations.

    Acked-by: Pekka Enberg
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
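
    The separation of commit and abort can be sketched with a toy model.
    All names here (struct txn, txn_begin(), txn_commit(), txn_abort(),
    do_op() and the construct_result variable) are hypothetical and only
    mirror the idea that abort cannot fail.

```c
#include <errno.h>

/* Hypothetical transaction state. */
struct txn {
    int active;
    int sync;  /* nonzero if commit must construct a segment */
};

/* Result of the simulated segment construction; 0 means success. */
static int construct_result;

static void txn_begin(struct txn *t, int sync)
{
    t->active = 1;
    t->sync = sync;
}

/* Commit may fail, but only through synchronous segment construction. */
static int txn_commit(struct txn *t)
{
    t->active = 0;
    return t->sync ? construct_result : 0;
}

/* Abort never fails, so the caller has one error value to track. */
static void txn_abort(struct txn *t)
{
    t->active = 0;
}

/* Hypothetical operation: op_err simulates a failure inside the
 * transaction body. */
static int do_op(int op_err, int sync)
{
    struct txn t;

    txn_begin(&t, sync);
    if (op_err) {
        txn_abort(&t);
        return op_err;  /* the original error is reported unchanged */
    }
    return txn_commit(&t);
}
```

    With a single end function that can fail, the error path would have
    to reconcile two error values; here each path returns exactly one.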
     
  • This will remove the following unnecessary locks and cleanup code in
    nilfs_clear_inode():

    - unnecessary protection using nilfs_transaction_begin() and
    nilfs_transaction_end().

    - cleanup code of i_dirty list field which is never chained
    when this function is called.

    - spinlock used when releasing i_bh field.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This is another patch for fixing the following problems of a memory
    copy function in nilfs2 ioctl:

    (1) It tries to allocate 128KB size of memory even for small objects.

    (2) Though the function repeatedly tries large memory allocations
    while reducing the size, GFP_NOWAIT flag is not specified.
    This increases the possibility of system memory shortage.

    (3) During the retries of (2), verbose warnings are printed
    because __GFP_NOWARN flag is not used for the kmalloc calls.

    The first patch was still doing large allocations by kmalloc which are
    repeatedly tried while reducing the size.

    Andi Kleen told me that using copy_from_user for large memory is not
    good from the viewpoint of preempt latency:

    On Fri, 12 Dec 2008 21:24:11 +0100, Andi Kleen wrote:
    > > In the current interface, each data item is copied twice: one is to
    > > the allocated memory from user space (via copy_from_user), and another
    >
    > For such large copies it is better to use multiple smaller (e.g. 4K)
    > copy user, that gives better real time preempt latencies. Each cfu has a
    > cond_resched(), but only one, not multiple times in the inner loop.

    He also advised me that:

    On Sun, 14 Dec 2008 16:13:27 +0100, Andi Kleen wrote:
    > Better would be if you could go to PAGE_SIZE. order 0 allocations
    > are typically the fastest / least likely to stall.
    >
    > Also in this case it's a good idea to use __get_free_pages()
    > directly, kmalloc tends to be become less efficient at larger
    > sizes.

    For the function in question, the size of buffer memory can be reduced
    since the buffer is repeatedly used for a number of small objects. On
    the other hand, it may incur large preempt latencies for larger buffer
    because a copy_from_user (and a copy_to_user) was applied only once
    each cycle.

    With that, this revision uses the order 0 allocations with
    __get_free_pages() to fix the original problems.

    Cc: Andi Kleen
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
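
    The chunked approach described above can be sketched in user-space C.
    This is an illustration under assumptions: CHUNK_SIZE stands for one
    page, the static buffer stands in for a page obtained with
    __get_free_pages(), the memcpy() calls stand in for copy_from_user()
    and copy_to_user(), and copy_in_chunks() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096  /* assumed page size for this sketch */

/* Move nitems objects of item_size bytes from src to dst through one
 * page-sized buffer. Returns the number of chunks used, or -1 if a
 * single item does not fit in the buffer. */
static long copy_in_chunks(unsigned char *dst, const unsigned char *src,
                           size_t nitems, size_t item_size)
{
    static unsigned char page[CHUNK_SIZE];  /* stand-in for one page */
    size_t per_chunk;
    size_t done = 0;
    long chunks = 0;

    if (item_size == 0 || item_size > CHUNK_SIZE)
        return -1;
    per_chunk = CHUNK_SIZE / item_size;  /* whole items per chunk */

    while (done < nitems) {
        size_t n = nitems - done;

        if (n > per_chunk)
            n = per_chunk;
        /* stand-in for copy_from_user() of one small chunk */
        memcpy(page, src + done * item_size, n * item_size);
        /* a real handler would process the items in the page here */
        /* stand-in for copy_to_user() of the processed chunk */
        memcpy(dst + done * item_size, page, n * item_size);
        done += n;
        chunks++;
    }
    return chunks;
}
```

    Each iteration copies at most one page, so the buffer never needs a
    large allocation and every copy stays short.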
     
  • This adds a Makefile for the nilfs2 file system, and updates the
    makefile and Kconfig file in the file system directory.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This adds userland interface implemented with ioctl.

    Signed-off-by: Koji Sato
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Koji Sato
     
  • This adds the cache of on-disk blocks to be moved in garbage
    collection. The disk blocks are held with dummy inodes (called
    gcinodes), and this file provides lookup function of the dummy inodes,
    and their buffer read function.

    Signed-off-by: Seiji Kihara
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Yoshiji Amagai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • NILFS2 uses another DAT inode during garbage collection to ensure
    atomicity and consistency of the DAT in the transient state. This
    twin inode is called GCDAT.

    This adds functions to initialize the GCDAT and to switch page caches
    and B-tree node caches between these two inodes.

    Signed-off-by: Seiji Kihara
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Yoshiji Amagai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi