17 Oct, 2007

4 commits

  • Implement nobh in new aops. This is a bit tricky. FWIW, nobh_truncate is
    now implemented in a way that does not create blocks in sparse regions,
    which is a silly thing for it to have been doing (isn't it?)

    ext2 survives fsx and fsstress. jfs is converted as well... ext3
    should be easy to do (but not done yet).

    [akpm@linux-foundation.org: coding-style fixes]
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Rework the generic block "cont" routines to handle the new aops. Supporting
    cont_prepare_write would take quite a lot of code to support, so remove it
    instead (and we later convert all filesystems to use it).

    write_begin gets passed AOP_FLAG_CONT_EXPAND when called from
    generic_cont_expand, so filesystems can avoid the old hacks they used.

    Signed-off-by: Nick Piggin
    Cc: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • These are intended to replace prepare_write and commit_write with more
    flexible alternatives that are also able to avoid the buffered write
    deadlock problems efficiently (which prepare_write is unable to do).

    [mark.fasheh@oracle.com: API design contributions, code review and fixes]
    [akpm@linux-foundation.org: various fixes]
    [dmonakhov@sw.ru: new aop block_write_begin fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Mark Fasheh
    Signed-off-by: Dmitriy Monakhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

19 Jul, 2007

1 commit

  • Many filesystems need a ->page-mkwrite callout to correctly
    set up pages that have been written to by mmap. This is especially
    important when mmap is writing into holes as it allows filesystems
    to correctly account for and allocate space before the mmap
    write is allowed to proceed.

    Protection against truncate races is provided by locking the page
    and checking to see whether the page mapping is correct and whether
    it is beyond EOF so we don't end up allowing allocations beyond
    the current EOF or changing EOF as a result of a mmap write.

    SGI-PV: 940392
    SGI-Modid: 2.6.x-xfs-melb:linux:29146a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     

08 May, 2007

2 commits

  • Remove duplicate work in kill_bdev().

    It currently invalidates and then truncates the bdev's mapping.
    invalidate_mapping_pages() will opportunistically remove pages from the
    mapping. And truncate_inode_pages() will forcefully remove all pages.

    The only thing truncate doesn't do is flush the bh lrus. So do that
    explicitly. This avoids (very unlikely) but possible invalid lookup
    results if the same bdev is quickly re-issued.

    It also will prevent extreme kernel latencies which are observed when
    blockdevs which have a large amount of pagecache are unmounted, by avoiding
    invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no
    cond_resched (it can be called under spinlock), whereas truncate_inode_pages()
    has one.

    [akpm@linux-foundation.org: restore nrpages==0 optimisation]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
    been used in 6 years (so akpm says).

    find * -name \*.[ch] | xargs grep -l invalidate_bdev |
    while read file; do
    quilt add $file;
    sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
    done

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

13 Feb, 2007

2 commits

  • While compiling my code with -Wconversion using gcc-trunk, I always get a
    bunch of warrning from headers, here is fix for them:

    __getblk is alawys called with unsigned argument,
    but it takes signed, the same story with __bread,__breadahead and so on.

    Signed-off-by: Tomasz Kvarsin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomasz Kvarsin
     
  • Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a
    bufferhead. Recently, I found the long standing mmap/unwritten extent
    conversion bug, and it was to do with partial page invalidation not clearing
    the unwritten flag from bufferheads attached to the page but beyond EOF. See
    here for a full explaination:

    http://oss.sgi.com/archives/xfs/2006-12/msg00196.html

    The solution I have checked into the XFS dev tree involves duplicating code
    from block_invalidatepage to clear the unwritten flag from the bufferhead(s),
    and then calling block_invalidatepage() to do the rest.

    Christoph suggested that this would be better solved by pushing the unwritten
    flag into the common buffer head flags and just adding the call to
    discard_buffer():

    http://oss.sgi.com/archives/xfs/2006-12/msg00239.html

    The following patch makes BH_Unwritten a first class citizen.

    Signed-off-by: Dave Chinner
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Chinner
     

17 Oct, 2006

1 commit

  • When IO error happens on metadata buffer, buffer is freed from memory and
    later fsync() is called, filesystems like ext2 fail to report EIO. We

    solve the problem by introducing a pointer to associated address space into
    the buffer_head. When a buffer is removed from a list of metadata buffers
    associated with an address space, IO error is transferred from the buffer to
    the address space, so that fsync can later report it.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

01 Oct, 2006

2 commits

  • Make it possible to disable the block layer. Not all embedded devices require
    it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
    the block layer to be present.

    This patch does the following:

    (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
    support.

    (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
    an item that uses the block layer. This includes:

    (*) Block I/O tracing.

    (*) Disk partition code.

    (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

    (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
    block layer to do scheduling. Some drivers that use SCSI facilities -
    such as USB storage - end up disabled indirectly from this.

    (*) Various block-based device drivers, such as IDE and the old CDROM
    drivers.

    (*) MTD blockdev handling and FTL.

    (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
    taking a leaf out of JFFS2's book.

    (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
    linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
    however, still used in places, and so is still available.

    (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
    parts of linux/fs.h.

    (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

    (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

    (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
    is not enabled.

    (*) fs/no-block.c is created to hold out-of-line stubs and things that are
    required when CONFIG_BLOCK is not set:

    (*) Default blockdev file operations (to give error ENODEV on opening).

    (*) Makes some /proc changes:

    (*) /proc/devices does not list any blockdevs.

    (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

    (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

    (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
    given command other than Q_SYNC or if a special device is specified.

    (*) In init/do_mounts.c, no reference is made to the blockdev routines if
    CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

    (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
    error ENOSYS by way of cond_syscall if so).

    (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
    CONFIG_BLOCK is not set, since they can't then happen.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Move some functions out of the buffering code that aren't strictly buffering
    specific. This is a precursor to being able to disable the block layer.

    (*) Moved some stuff out of fs/buffer.c:

    (*) The file sync and general sync stuff moved to fs/sync.c.

    (*) The superblock sync stuff moved to fs/super.c.

    (*) do_invalidatepage() moved to mm/truncate.c.

    (*) try_to_release_page() moved to mm/filemap.c.

    (*) Moved some related declarations between header files:

    (*) declarations for do_invalidatepage() and try_to_release_page() moved
    to linux/mm.h.

    (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

28 Jun, 2006

1 commit

  • - add a proper prototype for the following global function:
    - buffer_init()

    - make the following needlessly global function static:
    - end_buffer_async_write()

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

27 Mar, 2006

4 commits

  • Pass amount of disk needs to be mapped to get_block(). This way one can
    modify the fs ->get_block() functions to map multiple blocks at the same time.

    [akpm@osdl.org: performance tweak]
    [akpm@osdl.org: remove unneeded assignments]
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Increase the size of the buffer_head b_size field (only) for 64 bit platforms.
    Update some old and moldy comments in and around the structure as well.

    The b_size increase allows us to perform larger mappings and allocations for
    large I/O requests from userspace, which tie in with other changes allowing
    the get_block_t() interface to map multiple blocks at once.

    Signed-off-by: Nathan Scott
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • The return value of this function is never used, so let's be honest and
    declare it as void.

    Some places where invalidatepage returned 0, I have inserted comments
    suggesting a BUG_ON.

    [akpm@osdl.org: JBD BUG fix]
    [akpm@osdl.org: rework for git-nfs]
    [akpm@osdl.org: don't go BUG in block_invalidate_page()]
    Signed-off-by: Neil Brown
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The only user ignores the return value, and the only instanace
    (block_sync_page) always returns 0...

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

09 Jan, 2006

1 commit

  • This patch changes generic_cont_expand(), in order to share the code
    with fatfs.

    - Use vmtruncate() if ->prepare_write() returns a error.

    Even if ->prepare_write() returns an error, it may already have added some
    blocks. So, this truncates blocks outside of ->i_size by vmtruncate().

    - Add generic_cont_expand_simple().

    The generic_cont_expand_simple() assumes that ->prepare_write() can handle
    the block boundary. With this, we don't need to care the extra byte.

    And for expanding a file size by truncate(), fatfs uses the
    added generic_cont_expand_simple().

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     

31 Oct, 2005

1 commit

  • Fix the problem (BUG 4964) with unmapped buffers in transaction's
    t_sync_data list. The problem is we need to call filesystem's own
    invalidatepage() from block_write_full_page().

    block_write_full_page() must call filesystem's invalidatepage(). Otherwise
    following nasty race can happen:

    proc 1 proc 2
    ------ ------
    - write some new data to 'offset'
    => bh gets to the transactions data list
    - starts truncate
    => i_size set to new size
    - mpage_writepages()
    - ext3_ordered_writepage() to 'offset'
    - block_write_full_page()
    - page->index > end_index+1
    - block_invalidatepage()
    - discard_buffer()
    - clear_buffer_mapped()

    - commit triggers and finds unmapped buffer - BOOM!

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

30 Oct, 2005

1 commit

  • Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
    a many-threaded application which concurrently initializes different parts of
    a large anonymous area.

    This patch corrects that, by using a separate spinlock per page table page, to
    guard the page table entries in that page, instead of using the mm's single
    page_table_lock. (But even then, page_table_lock is still used to guard page
    table allocation, and anon_vma allocation.)

    In this implementation, the spinlock is tucked inside the struct page of the
    page table page: with a BUILD_BUG_ON in case it overflows - which it would in
    the case of 32-bit PA-RISC with spinlock debugging enabled.

    Splitting the lock is not quite for free: another cacheline access. Ideally,
    I suppose we would use split ptlock only for multi-threaded processes on
    multi-cpu machines; but deciding that dynamically would have its own costs.
    So for now enable it by config, at some number of cpus - since the Kconfig
    language doesn't support inequalities, let preprocessor compare that with
    NR_CPUS. But I don't think it's worth being user-configurable: for good
    testing of both split and unsplit configs, split now at 4 cpus, and perhaps
    change that to 8 later.

    There is a benefit even for singly threaded processes: kswapd can be attacking
    one part of the mm while another part is busy faulting.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

28 Oct, 2005

1 commit

  • - ->releasepage() annotated (s/int/gfp_t), instances updated
    - missing gfp_t in fs/* added
    - fixed misannotation from the original sweep caught by bitwise checks:
    XFS used __nocast both for gfp_t and for flags used by XFS allocator.
    The latter left with unsigned int __nocast; we might want to add a
    different type for those but for now let's leave them alone. That,
    BTW, is a case when __nocast use had been actively confusing - it had
    been used in the same code for two different and similar types, with
    no way to catch misuses. Switch of gfp_t to bitwise had caught that
    immediately...

    One tricky bit is left alone to be dealt with later - mapping->flags is
    a mix of gfp_t and error indications. Left alone for now.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc point of view we replaced unsigned int with
    typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

08 Jul, 2005

1 commit

  • Use a bit spin lock in the first buffer of the page to synchronise asynch
    IO buffer completions, instead of the global page_uptodate_lock, which is
    showing some scalabilty problems.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds