17 Oct, 2006

1 commit

  • When IO error happens on metadata buffer, buffer is freed from memory and
    later fsync() is called, filesystems like ext2 fail to report EIO. We

    solve the problem by introducing a pointer to associated address space into
    the buffer_head. When a buffer is removed from a list of metadata buffers
    associated with an address space, IO error is transferred from the buffer to
    the address space, so that fsync can later report it.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

12 Oct, 2006

2 commits

  • A couple of flush_dcache_page()s are missing on the I/O-error paths.

    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Monakhov Dmitriy
     
  • If grow_buffers() is for some reason passed a block number which wants to lie
    outside the maximum-addressable pagecache range (PAGE_SIZE * 4G bytes) then it
    will accidentally truncate `index' and will then instnatiate a page at the
    wrong pagecache offset. This causes __getblk_slow() to go into an infinite
    loop.

    This can happen with corrupted disks, or with software errors elsewhere.

    Detect that, and handle it.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

10 Oct, 2006

1 commit

  • This was triggered, but not the fault of, the dirty page accounting
    patches. Suitable for -stable as well, after it goes upstream.

    Unable to handle kernel NULL pointer dereference at virtual address 0000004c
    EIP is at _spin_lock+0x12/0x66
    Call Trace:
    [] __set_page_dirty_buffers+0x15/0xc0
    [] set_page_dirty+0x2c/0x51
    [] set_page_dirty_balance+0xb/0x3b
    [] __do_fault+0x1d8/0x279
    [] __handle_mm_fault+0x125/0x951
    [] do_page_fault+0x440/0x59f
    [] error_code+0x39/0x40
    [] 0x8048a33

    Signed-off-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

01 Oct, 2006

1 commit

  • Move some functions out of the buffering code that aren't strictly buffering
    specific. This is a precursor to being able to disable the block layer.

    (*) Moved some stuff out of fs/buffer.c:

    (*) The file sync and general sync stuff moved to fs/sync.c.

    (*) The superblock sync stuff moved to fs/super.c.

    (*) do_invalidatepage() moved to mm/truncate.c.

    (*) try_to_release_page() moved to mm/filemap.c.

    (*) Moved some related declarations between header files:

    (*) declarations for do_invalidatepage() and try_to_release_page() moved
    to linux/mm.h.

    (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

26 Sep, 2006

1 commit

  • Tracking of dirty pages in shared writeable mmap()s.

    The idea is simple: write protect clean shared writeable pages, catch the
    write-fault, make writeable and set dirty. On page write-back clean all the
    PTE dirty bits and write protect them once again.

    The implementation is a tad harder, mainly because the default
    backing_dev_info capabilities were too loosely maintained. Hence it is not
    enough to test the backing_dev_info for cap_account_dirty.

    The current heuristic is as follows, a VMA is eligible when:
    - its shared writeable
    (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
    - it is not a 'special' mapping
    (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
    - the backing_dev_info is cap_account_dirty
    mapping_cap_account_dirty(vma->vm_file->f_mapping)
    - f_op->mmap() didn't change the default page protection

    Page from remap_pfn_range() are explicitly excluded because their COW
    semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
    because they don't have a backing store anyway.

    mprotect() is taught about the new behaviour as well. However it overrides
    the last condition.

    Cleaning the pages on write-back is done with page_mkclean() a new rmap call.
    It can be called on any page, but is currently only implemented for mapped
    pages, if the page is found the be of a VMA that accounts dirty pages it will
    also wrprotect the PTE.

    Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from
    under ->private_lock. This seems to be safe, since ->private_lock is used to
    serialize access to the buffers, not the page itself. This is needed because
    clear_page_dirty() will call into page_mkclean() and would thereby violate
    locking order.

    [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
    Signed-off-by: Peter Zijlstra
    Cc: Hugh Dickins
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

01 Aug, 2006

1 commit

  • We can immediately bail from invalidate_bdev() if the blockdev has no
    pagecache.

    This solves the huge IPI storms which hald is causing on the big ia64
    machines when it polls CDROM drives.

    Acked-by: Jes Sorensen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

01 Jul, 2006

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
    Remove obsolete #include
    remove obsolete swsusp_encrypt
    arch/arm26/Kconfig typos
    Documentation/IPMI typos
    Kconfig: Typos in net/sched/Kconfig
    v9fs: do not include linux/version.h
    Documentation/DocBook/mtdnand.tmpl: typo fixes
    typo fixes: specfic -> specific
    typo fixes in Documentation/networking/pktgen.txt
    typo fixes: occuring -> occurring
    typo fixes: infomation -> information
    typo fixes: disadvantadge -> disadvantage
    typo fixes: aquire -> acquire
    typo fixes: mecanism -> mechanism
    typo fixes: bandwith -> bandwidth
    fix a typo in the RTC_CLASS help text
    smb is no longer maintained

    Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S

    Linus Torvalds
     
  • This makes nr_dirty a per zone counter. Looping over all processors is
    avoided during writeback state determination.

    The counter aggregation for nr_dirty had to be undone in the NFS layer since
    we summed up the page counts from multiple zones. Someone more familiar with
    NFS should probably review what I have done.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Signed-off-by: Jörn Engel
    Signed-off-by: Adrian Bunk

    Jörn Engel
     

29 Jun, 2006

1 commit


28 Jun, 2006

1 commit

  • - add a proper prototype for the following global function:
    - buffer_init()

    - make the following needlessly global function static:
    - end_buffer_async_write()

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

23 Jun, 2006

1 commit

  • A process flag to indicate whether we are doing sync io is incredibly
    ugly. It also causes performance problems when one does a lot of async
    io and then proceeds to sync it. Part of the io will go out as async,
    and the other part as sync. This causes a disconnect between the
    previously submitted io and the synced io. For io schedulers such as CFQ,
    this will cause us lost merges and suboptimal behaviour in scheduling.

    Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
    the O_DIRECT path just directly indicate that the writes are sync
    by using WRITE_SYNC instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

28 Mar, 2006

1 commit


27 Mar, 2006

6 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
    drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment
    Kconfig help: MTD_JEDECPROBE already supports Intel
    Remove ugly debugging stuff
    do_mounts.c: Minor ROOT_DEV comment cleanup
    BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c
    BUG_ON() Conversion in mm/mempool.c
    BUG_ON() Conversion in mm/memory.c
    BUG_ON() Conversion in kernel/fork.c
    BUG_ON() Conversion in ipc/sem.c
    BUG_ON() Conversion in fs/ext2/
    BUG_ON() Conversion in fs/hfs/
    BUG_ON() Conversion in fs/dcache.c
    BUG_ON() Conversion in fs/buffer.c
    BUG_ON() Conversion in input/serio/hp_sdc_mlc.c
    BUG_ON() Conversion in md/dm-table.c
    BUG_ON() Conversion in md/dm-path-selector.c
    BUG_ON() Conversion in drivers/isdn
    BUG_ON() Conversion in drivers/char
    BUG_ON() Conversion in drivers/mtd/

    Linus Torvalds
     
  • Pass amount of disk needs to be mapped to get_block(). This way one can
    modify the fs ->get_block() functions to map multiple blocks at the same time.

    [akpm@osdl.org: performance tweak]
    [akpm@osdl.org: remove unneeded assignments]
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Increase the size of the buffer_head b_size field (only) for 64 bit platforms.
    Update some old and moldy comments in and around the structure as well.

    The b_size increase allows us to perform larger mappings and allocations for
    large I/O requests from userspace, which tie in with other changes allowing
    the get_block_t() interface to map multiple blocks at once.

    Signed-off-by: Nathan Scott
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • The return value of this function is never used, so let's be honest and
    declare it as void.

    Some places where invalidatepage returned 0, I have inserted comments
    suggesting a BUG_ON.

    [akpm@osdl.org: JBD BUG fix]
    [akpm@osdl.org: rework for git-nfs]
    [akpm@osdl.org: don't go BUG in block_invalidate_page()]
    Signed-off-by: Neil Brown
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The only user ignores the return value, and the only instanace
    (block_sync_page) always returns 0...

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • this changes if() BUG(); constructs to BUG_ON() which is
    cleaner and can better optimized away

    Signed-off-by: Eric Sesterhenn
    Signed-off-by: Adrian Bunk

    Eric Sesterhenn
     

26 Mar, 2006

1 commit


24 Mar, 2006

4 commits

  • Pull the guts out of do_fsync() - we can use it elsewhere.

    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • We need set_page_dirty() to return true if it actually transitioned the page
    from a clean to dirty state. This wasn't right in a couple of places. Do a
    kernel-wide audit, fix things up.

    This leaves open the possibility of returning a negative errno from
    set_page_dirty() sometime in the future. But we don't do that at present.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Instead of using for_each_cpu(i), we can use for_each_online_cpu(i).

    When a CPU goes offline (ie removed from online map), it might have a non
    null bh_accounting.nr, so this patch adds a transfer of this counter to an
    online CPU counter.

    We already have a hotcpu_notifier, (function buffer_cpu_notify()), where we
    can do this bh_accounting.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • Change the kmem_cache_create calls for certain slab caches to support cpuset
    memory spreading.

    See the previous patches, cpuset_mem_spread, for an explanation of cpuset
    memory spreading, and cpuset_mem_spread_slab_cache for the slab cache support
    for memory spreading.

    The slab caches marked for now are: dentry_cache, inode_cache, some xfs slab
    caches, and buffer_head. This list may change over time. In particular,
    other file system types that are used extensively on large NUMA systems may
    want to allow for spreading their directory and inode slab cache entries.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

23 Mar, 2006

1 commit

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

22 Mar, 2006

1 commit

  • Centralize the page migration functions in anticipation of additional
    tinkering. Creates a new file mm/migrate.c

    1. Extract buffer_migrate_page() from fs/buffer.c

    2. Extract central migration code from vmscan.c

    3. Extract some components from mempolicy.c

    4. Export pageout() and remove_from_swap() from vmscan.c

    5. Make it possible to configure NUMA systems without page migration
    and non-NUMA systems with page migration.

    I had to so some #ifdeffing in mempolicy.c that may need a cleanup.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

15 Mar, 2006

1 commit

  • page migration currently simply retries a couple of times if try_to_unmap()
    fails without inspecting the return code.

    However, SWAP_FAIL indicates that the page is in a vma that has the
    VM_LOCKED flag set (if ignore_refs ==1). We can check for that return code
    and avoid retrying the migration.

    migrate_page_remove_references() now needs to return a reason why the
    failure occured. So switch migrate_page_remove_references to use -Exx
    style error messages.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

04 Feb, 2006

1 commit


02 Feb, 2006

2 commits

  • The b_private field in buffer heads needs to be zero filled when the
    buffers are allocated. Thanks to Nathan Scott for finding this. It was
    causing problems on systems with both XFS and reiserfs.

    Signed-off-by: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Mason
     
  • Migrate a page with buffers without requiring writeback

    This introduces a new address space operation migratepage() that may be used
    by a filesystem to implement its own version of page migration.

    A version is provided that migrates buffers attached to pages. Some
    filesystems (ext2, ext3, xfs) are modified to utilize this feature.

    The swapper address space operation are modified so that a regular
    migrate_page() will occur for anonymous pages without writeback (migrate_pages
    forces every anonymous page to have a swap entry).

    Signed-off-by: Mike Kravetz
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 Jan, 2006

1 commit


15 Jan, 2006

1 commit


12 Jan, 2006

1 commit


10 Jan, 2006

1 commit


09 Jan, 2006

3 commits

  • We've had two instances recently of overflows when doing

    64_bit_value = (32_bit_value << PAGE_CACHE_SHIFT)

    I did a tree-wide grep of `<page_base)

    Cc: Oleg Drokin
    Cc: David Howells
    Cc: David Woodhouse
    Cc:
    Cc: Christoph Hellwig
    Cc: Anton Altaparmakov
    Cc: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Cc: Roman Zippel
    Cc:
    Cc: Miklos Szeredi
    Cc: Russell King
    Cc: Trond Myklebust
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This patch add EXPORT_SYMBOL(filemap_write_and_wait) and use it.

    See mm/filemap.c:

    And changes the filemap_write_and_wait() and filemap_write_and_wait_range().

    Current filemap_write_and_wait() doesn't wait if filemap_fdatawrite()
    returns error. However, even if filemap_fdatawrite() returned an
    error, it may have submitted the partially data pages to the device.
    (e.g. in the case of -ENOSPC)

    Andrew Morton writes,

    If filemap_fdatawrite() returns an error, this might be due to some
    I/O problem: dead disk, unplugged cable, etc. Given the generally
    crappy quality of the kernel's handling of such exceptions, there's a
    good chance that the filemap_fdatawait() will get stuck in D state
    forever.

    So, this patch doesn't wait if filemap_fdatawrite() returns the -EIO.

    Trond, could you please review the nfs part? Especially I'm not sure,
    nfs must use the "filemap_fdatawrite(inode->i_mapping) == 0", or not.

    Acked-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • This patch changes generic_cont_expand(), in order to share the code
    with fatfs.

    - Use vmtruncate() if ->prepare_write() returns a error.

    Even if ->prepare_write() returns an error, it may already have added some
    blocks. So, this truncates blocks outside of ->i_size by vmtruncate().

    - Add generic_cont_expand_simple().

    The generic_cont_expand_simple() assumes that ->prepare_write() can handle
    the block boundary. With this, we don't need to care the extra byte.

    And for expanding a file size by truncate(), fatfs uses the
    added generic_cont_expand_simple().

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     

07 Nov, 2005

1 commit


31 Oct, 2005

1 commit

  • If a filesystem passes an idiotic blocksize into bread(), __getblk_slow() will
    warn and will return NULL. We have a report (from Hubert Tonneau
    ) of isofs_fill_super() doing this (passing in
    a silly block size) against an unplugged CDROM drive.

    But a couple of __getblk_slow() callers forgot to check for the NULL bh, hence
    oops.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton