27 Jan, 2007

1 commit

  • In __writeback_single_inode(), when we find a locked inode and we're not
    doing a data-integrity sync, we used to just skip writing entirely,
    since we didn't want to wait for the inode to unlock.

    However, there's really no reason to skip writing the data pages, which
    are likely to be the bulk of the dirty state anyway (and the main
    reason why writeback was started for the non-data-integrity case, of
    course!)

    Acked-by: Nick Piggin
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: David Howells
    Signed-off-by: Linus Torvalds

    Linus Torvalds
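A minimal sketch of the new decision, written as a standalone user-space model rather than kernel code: `struct model_inode`, `model_writeback_single_inode()` and the `MODEL_SYNC_*` modes are hypothetical stand-ins for the inode, `__writeback_single_inode()` and the writeback sync modes. Waiting on the inode lock is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the writeback sync modes (not kernel code). */
enum model_sync_mode { MODEL_SYNC_NONE, MODEL_SYNC_ALL };

struct model_inode {
	bool locked;		/* inode lock held by someone else */
	int dirty_pages;	/* data pages awaiting writeback */
	bool dirty_metadata;	/* the inode itself is dirty */
};

/*
 * Returns the number of data pages written.  A locked inode in a
 * non-data-integrity sync is no longer skipped entirely: its data pages
 * are still written, and only the inode itself is left alone.
 */
static int model_writeback_single_inode(struct model_inode *inode,
					enum model_sync_mode mode)
{
	int written = inode->dirty_pages;

	/* Data pages go out in every case (this is the change). */
	inode->dirty_pages = 0;

	if (inode->locked && mode != MODEL_SYNC_ALL)
		return written;	/* don't wait; leave the inode dirty */

	/* Unlocked (the real code waits for the lock in the sync case). */
	inode->dirty_metadata = false;
	return written;
}
```

Before the change, the locked, non-integrity case would have returned 0 without touching `dirty_pages`.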
     

01 Oct, 2006

2 commits


01 Jul, 2006

2 commits

  • Conversion of nr_unstable to a per zone counter

    We need to do some special modifications to the nfs code since there are
    multiple cases of disposition and we need to have a page ref for proper
    accounting.

    This converts the last critical page state of the VM and therefore we need to
    remove several functions that were depending on GET_PAGE_STATE_LAST in order
    to make the kernel compile again. We are only left with event type counters
    in page state.

    [akpm@osdl.org: bugfixes]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This makes nr_dirty a per zone counter. Looping over all processors is
    avoided during writeback state determination.

    The counter aggregation for nr_dirty had to be undone in the NFS layer since
    we summed up the page counts from multiple zones. Someone more familiar with
    NFS should probably review what I have done.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
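A minimal user-space sketch of the per-zone counter idea, assuming a simplified layout: each zone keeps its own counter array and a global total is updated alongside it, so reading the total needs no loop over processors. `struct model_zone`, `struct model_page`, the `MODEL_NR_*` items and the helpers are hypothetical, and batching of per-CPU deltas is left out. The page carries its zone, so debiting a counter needs the page in hand, which is one plausible reading of why the NFS code needs a page reference for proper accounting.

```c
#include <assert.h>

/* Hypothetical counter items, loosely modelled on nr_dirty/nr_unstable. */
enum model_stat_item { MODEL_NR_DIRTY, MODEL_NR_UNSTABLE, MODEL_NR_ITEMS };

#define MODEL_NR_ZONES 3

struct model_zone {
	long vm_stat[MODEL_NR_ITEMS];
};

struct model_page {
	int zone;	/* index of the zone the page lives in */
};

static struct model_zone model_zones[MODEL_NR_ZONES];
static long model_global_stat[MODEL_NR_ITEMS];

/* Account one page in its own zone and in the global total. */
static void model_inc_page_state(const struct model_page *page,
				 enum model_stat_item item)
{
	model_zones[page->zone].vm_stat[item]++;
	model_global_stat[item]++;
}

/* The page identifies which zone to debit. */
static void model_dec_page_state(const struct model_page *page,
				 enum model_stat_item item)
{
	model_zones[page->zone].vm_stat[item]--;
	model_global_stat[item]--;
}

/* Writeback state is read directly: no per-processor loop. */
static long model_global_page_state(enum model_stat_item item)
{
	return model_global_stat[item];
}

static long model_zone_page_state(int zone, enum model_stat_item item)
{
	return model_zones[zone].vm_stat[item];
}
```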
     

23 Jun, 2006

2 commits

  • A process flag to indicate whether we are doing sync io is incredibly
    ugly. It also causes performance problems when one does a lot of async
    io and then proceeds to sync it: part of the io goes out as async and
    the other part as sync. This creates a disconnect between the
    previously submitted io and the synced io, which costs io schedulers
    such as CFQ lost merges and leads to suboptimal scheduling behaviour.

    Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
    the O_DIRECT path just directly indicate that the writes are sync
    by using WRITE_SYNC instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • When a writeback_control's `start' and `end' fields are used to
    indicate a one-byte range starting at file offset zero, the required
    values of .start=0,.end=0 mean that the ->writepages() implementation
    has no way of telling that it is being asked to perform a range
    request, because we currently overload (start == 0 && end == 0)
    to mean "this is not a write-a-range request".

    To make all this sane, this patch changes how writeback_control
    expresses the range.

    Any caller of ->writepages() now always sets a range: either
    range_start/range_end, or range_cyclic.

    If range_cyclic is true, ->writepages() treats the range as cyclic;
    otherwise it just uses range_start and range_end.

    This patch does the following:

    - Add LLONG_MAX, LLONG_MIN and ULLONG_MAX to include/linux/kernel.h.
    -1 is usually fine for range_end (its type is long long). But if
    someone does

    range_end += val;             range_end is "val - 1"
    u64val = range_end >> bits;   u64val is "~(0ULL)"

    or something similar, the result is wrong. So this adds LLONG_MAX to
    avoid such nasty surprises, and uses LLONG_MAX for range_end.

    - All callers of ->writepages() set range_start/end or range_cyclic.

    - Fix updates of ->writeback_index, which already seemed a bit strange.
    If a scan starts at 0 and is ended by the nr_to_write check, storing
    that last index may reduce the chance of scanning the end of the file.
    So this updates ->writeback_index only if range_cyclic is true or the
    whole file is scanned.

    Signed-off-by: OGAWA Hirofumi
    Cc: Nathan Scott
    Cc: Anton Altaparmakov
    Cc: Steven French
    Cc: "Vladimir V. Saveliev"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
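A minimal sketch of the range convention in the second commit above, as a user-space model: `struct model_wbc` carries only the three range fields, and `model_set_whole_file()`, `model_page_range()` and `model_should_update_writeback_index()` are hypothetical helpers, not kernel functions. `MODEL_PAGE_SHIFT` assumes 4 KiB pages. It shows a one-byte request at offset 0 staying distinguishable from whole-file writeback once the latter ends at LLONG_MAX, and encodes the "cyclic or whole file scanned" rule for ->writeback_index.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Only the range fields of writeback_control (hypothetical model). */
struct model_wbc {
	long long range_start;	/* first byte, inclusive */
	long long range_end;	/* last byte, inclusive */
	bool range_cyclic;	/* ignore the range, resume from writeback_index */
};

/* Whole-file, non-cyclic writeback: end at LLONG_MAX rather than -1. */
static void model_set_whole_file(struct model_wbc *wbc)
{
	wbc->range_start = 0;
	wbc->range_end = LLONG_MAX;
	wbc->range_cyclic = false;
}

/*
 * Convert the byte range into page indices as a ->writepages()
 * implementation might.  Returns false for a cyclic request, whose
 * starting point comes from the mapping instead.
 */
static bool model_page_range(const struct model_wbc *wbc,
			     unsigned long long *first,
			     unsigned long long *last)
{
	if (wbc->range_cyclic)
		return false;
	*first = (unsigned long long)wbc->range_start >> MODEL_PAGE_SHIFT;
	*last = (unsigned long long)wbc->range_end >> MODEL_PAGE_SHIFT;
	return true;
}

/*
 * ->writeback_index is only updated for cyclic writeback or when the
 * whole file was scanned, i.e. the scan was not cut short by nr_to_write.
 */
static bool model_should_update_writeback_index(const struct model_wbc *wbc,
						bool stopped_by_nr_to_write)
{
	bool range_whole = wbc->range_start == 0 &&
			   wbc->range_end == LLONG_MAX;

	return wbc->range_cyclic || (range_whole && !stopped_by_nr_to_write);
}
```

With LLONG_MAX as the end, the last page index is a large positive value rather than the all-ones pattern that a shifted -1 can produce.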
     

26 Mar, 2006

1 commit


07 Nov, 2005

2 commits

  • Convert to proper kernel-doc format.

    Some have extra blank lines (not allowed immediately after the function
    name) or need blank lines (after all parameters). The function summary
    must be only one line.

    A colon (":") in a function description does weird things: it causes
    kernel-doc to think that it's a new section heading.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • If the backing_dev_info has BDI_CAP_NO_WRITEBACK, we're not supposed
    to write back an inode's pages. But in this situation write_inode_now()
    also refuses to write the inode itself. Fix that.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
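A minimal sketch of the write_inode_now() fix in the second commit above, as a user-space model: when the backing device is marked BDI_CAP_NO_WRITEBACK, the page budget drops to zero but the inode is still written, instead of the function returning early. `MODEL_BDI_CAP_NO_WRITEBACK`, `struct model_wb_result` and `model_write_inode_now()` are hypothetical, and the flag value is invented.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Invented flag value standing in for BDI_CAP_NO_WRITEBACK. */
#define MODEL_BDI_CAP_NO_WRITEBACK 0x1u

struct model_wb_result {
	long nr_to_write;	/* page budget handed to the page writer */
	bool inode_written;	/* whether the inode itself was written */
};

static struct model_wb_result model_write_inode_now(unsigned int bdi_caps)
{
	struct model_wb_result r = {
		.nr_to_write = LONG_MAX,
		.inode_written = false,
	};

	/*
	 * Before the fix, this case returned early and the inode was never
	 * written.  Now only the page writeback is suppressed.
	 */
	if (bdi_caps & MODEL_BDI_CAP_NO_WRITEBACK)
		r.nr_to_write = 0;

	r.inode_written = true;
	return r;
}
```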
     

01 Nov, 2005

1 commit

  • When the inode count is zero in inode writeback, the

    WARN_ON(!(inode->i_state & I_WILL_FREE));

    is broken, and needs to test for either I_WILL_FREE or I_FREEING.

    When the inode is in I_FREEING state, it's already out of the visibility
    of the VM, so the VM can't free it under us and it doesn't require the
    __iget. The generic_delete_inode path can call the sync from within the
    low-level fs callback during the last iput. So the inode being in
    I_FREEING is also a valid condition for calling the sync with
    i_count == 0.

    The specific stack trace is this:

    0xc00000007b8fb6e0 0xc00000000010118c .__writeback_single_inode +0x5c
    0xc00000007b8fb6e0 0xc0000000001014dc (lr) .sync_inode +0x3c
    0xc00000007b8fb790 0xc0000000001014dc .sync_inode +0x3c
    0xc00000007b8fb820 0xc0000000001a5020 .ext2_sync_inode +0x64
    0xc00000007b8fb8f0 0xc0000000001a65b4 .ext2_truncate +0x3f8
    0xc00000007b8fba40 0xc0000000001a6940 .ext2_delete_inode +0xdc
    0xc00000007b8fbac0 0xc0000000000f7a5c .generic_delete_inode +0x124
    0xc00000007b8fbb50 0xc0000000000f5fe0 .iput +0xb8
    0xc00000007b8fbbe0 0xc0000000000e9fd4 .sys_unlink +0x2a8
    0xc00000007b8fbd10 0xc00000000001048c .ret_from_syscall_1 +0x0

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
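The corrected check can be modelled as a predicate; the WARN_ON would fire when it returns false. This is a user-space sketch: `model_writeback_state_ok()` and the `MODEL_I_*` flags are hypothetical, and the bit values are invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented flag values; only the bit test matters here. */
#define MODEL_I_WILL_FREE	0x1u
#define MODEL_I_FREEING		0x2u

/*
 * Returns true when writing back an inode with this reference count and
 * state is legitimate.  With i_count == 0 the caller must be on a
 * freeing path: I_WILL_FREE or I_FREEING (the fixed check).  The old
 * WARN_ON accepted only I_WILL_FREE.
 */
static bool model_writeback_state_ok(int i_count, unsigned int i_state)
{
	if (i_count != 0)
		return true;
	return (i_state & (MODEL_I_WILL_FREE | MODEL_I_FREEING)) != 0;
}
```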
     

31 Oct, 2005

1 commit

  • list_move(&inode->i_list, &inode_in_use);
    } else {
    list_move(&inode->i_list, &inode_unused);
    + inodes_stat.nr_unused++;
    }
    }
    wake_up_inode(inode);

    Are you sure the above diff is correct? It was added somewhere between
    2.6.5 and 2.6.8. I think it's wrong.

    The only way I can imagine the i_count to be zero in the above path, is
    that I_WILL_FREE is set. And if I_WILL_FREE is set, then we must not
    increase nr_unused. So I believe the above change is buggy and it will
    definitely overstate the number of unused inodes and it should be backed
    out.

    Note that __writeback_single_inode can drop the spinlock before calling
    __sync_single_inode, so we can have both the dirty and locked bit flags
    clear here:

    spin_unlock(&inode_lock);
    __wait_on_inode(inode);
    iput(inode);
    XXXXXXX
    spin_lock(&inode_lock);
    }
    use inode again here

    a construct like the above makes zero sense from a reference counting
    standpoint.

    Either we don't ever use the inode again after the iput, or the
    inode_lock should be taken _before_ executing the iput (i.e. a __iput
    would be required). Taking the inode_lock after iput means the iget was
    useless if we keep using the inode after the iput.

    So the only way 2.6 could safely call __writeback_single_inode with
    i_count == 0 is if I_WILL_FREE is set (I_WILL_FREE prevents the VM from
    freeing the inode at XXXXX).

    Potentially calling the above iput with I_WILL_FREE set was also wrong,
    because it would recurse into iput_final (the second mainline bug).

    The below (untested) patch fixes the nr_unused accounting, avoids recursing
    in iput when I_WILL_FREE is set and makes sure (with the BUG_ON) that we
    don't corrupt memory and that all holders that don't set I_WILL_FREE keep
    a reference on the inode!

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
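A user-space sketch of the accounting rule argued for above, assuming simplified types: an inode with a reference goes back on the in-use list; one with i_count == 0 must have I_WILL_FREE set (checked with assert, standing in for the BUG_ON) and goes on the unused list without bumping nr_unused. All names are hypothetical, the flag value is invented, and the `pre_fix` switch reproduces the questioned increment for comparison.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_I_WILL_FREE 0x1u	/* invented flag value */

enum model_list { MODEL_LIST_NONE, MODEL_LIST_IN_USE, MODEL_LIST_UNUSED };

struct model_wb_inode {
	int i_count;
	unsigned int i_state;
	enum model_list list;
};

struct model_stats {
	long nr_unused;
};

/*
 * Requeue a clean inode after writeback.  'pre_fix' reproduces the
 * questioned nr_unused++ so the overcount can be observed.
 */
static void model_requeue_clean_inode(struct model_wb_inode *inode,
				      struct model_stats *st, bool pre_fix)
{
	if (inode->i_count > 0) {
		inode->list = MODEL_LIST_IN_USE;
		return;
	}

	/* Stands in for the BUG_ON: i_count == 0 requires I_WILL_FREE. */
	assert(inode->i_state & MODEL_I_WILL_FREE);
	inode->list = MODEL_LIST_UNUSED;

	/* An inode about to be freed must not be counted as unused. */
	if (pre_fix)
		st->nr_unused++;
}
```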
     

24 Jun, 2005

1 commit

  • This patch removes O(n^2) super block loops in sync_inodes(),
    sync_filesystems() etc. in favour of using __put_super_and_need_restart(),
    which I introduced earlier. We observed noticeably long freezes during sb
    syncing when there are thousands of super blocks in the system.

    Signed-Off-By: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
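A user-space sketch of the restart pattern, assuming a simplified singly linked list of `struct model_sb`: each super block is synced once, and the walk restarts from the head only when the current entry was unlinked while the lock was dropped, which is the condition `__put_super_and_need_restart()` reports. `model_put_super_and_need_restart()`, `model_unlink()` and `model_sync_supers()` are hypothetical; locking and reference counting are not modelled, and the `racer` argument simulates a concurrent unmount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sb {
	struct model_sb *next;
	bool on_list;	/* false once unlinked, e.g. by a racing unmount */
	bool synced;
};

/* Unlink sb from the list headed by *head. */
static void model_unlink(struct model_sb **head, struct model_sb *sb)
{
	struct model_sb **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == sb) {
			*pp = sb->next;
			sb->on_list = false;
			return;
		}
	}
}

/* True only if sb left the list while the lock was dropped. */
static bool model_put_super_and_need_restart(const struct model_sb *sb)
{
	return !sb->on_list;
}

/* Sync every sb once; returns the number of list nodes visited. */
static int model_sync_supers(struct model_sb **head, struct model_sb *racer)
{
	struct model_sb *sb;
	int visited = 0;

restart:
	for (sb = *head; sb != NULL; sb = sb->next) {
		visited++;
		if (sb->synced)
			continue;
		sb->synced = true;
		/* The real code drops sb_lock here to sync; simulate a race. */
		if (sb == racer)
			model_unlink(head, sb);
		if (model_put_super_and_need_restart(sb))
			goto restart;	/* sb->next is no longer trustworthy */
	}
	return visited;
}
```

Without a race, a list of n super blocks is walked once (n visits); a restart only happens when an entry actually disappears.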
     

01 May, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds