22 May, 2007

1 commit


17 May, 2007

2 commits

  • grow_dev_page() simply passes GFP_NOFS to find_or_create_page. This means
    the allocation of radix tree nodes is done with GFP_NOFS and the allocation
    of a new page is done using GFP_NOFS.

    The mapping has a flags field that contains the necessary allocation flags
    for the page cache allocation. These need to be consulted in order to get
    DMA and HIGHMEM allocations etc right. And yes a blockdev could be
    allowing Highmem allocations if its a ramdisk.

    Cc: Hugh Dickins
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

10 May, 2007

2 commits

  • Since nonboot CPUs are now disabled after tasks and devices have been
    frozen and the CPU hotplug infrastructure is used for this purpose, we need
    special CPU hotplug notifications that will help the CPU-hotplug-aware
    subsystems distinguish normal CPU hotplug events from CPU hotplug events
    related to a system-wide suspend or resume operation in progress. This
    patch introduces such notifications and causes them to be used during
    suspend and resume transitions. It also changes all of the
    CPU-hotplug-aware subsystems to take these notifications into consideration
    (for now they are handled in the same way as the corresponding "normal"
    ones).

    [oleg@tv-sign.ru: cleanups]
    Signed-off-by: Rafael J. Wysocki
    Cc: Gautham R Shenoy
    Cc: Pavel Machek
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • It's very common for file systems to need to zero part or all of a page,
    the simplist way is just to use kmap_atomic() and memset(). There's
    actually a library function in include/linux/highmem.h that does exactly
    that, but it's confusingly named memclear_highpage_flush(), which is
    descriptive of *how* it does the work rather than what the *purpose* is.
    So this patchset renames the function to zero_user_page(), and calls it
    from the various places that currently open code it.

    This first patch introduces the new function call, and converts all the
    core kernel callsites, both the open-coded ones and the old
    memclear_highpage_flush() ones. Following this patch is a series of
    conversions for each file system individually, per AKPM, and finally a
    patch deprecating the old call. The diffstat below shows the entire
    patchset.

    [akpm@linux-foundation.org: fix a few things]
    Signed-off-by: Nate Diller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nate Diller
     

09 May, 2007

2 commits


08 May, 2007

4 commits

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code object
    manipulation of the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove duplicate work in kill_bdev().

    It currently invalidates and then truncates the bdev's mapping.
    invalidate_mapping_pages() will opportunistically remove pages from the
    mapping. And truncate_inode_pages() will forcefully remove all pages.

    The only thing truncate doesn't do is flush the bh lrus. So do that
    explicitly. This avoids (very unlikely) but possible invalid lookup
    results if the same bdev is quickly re-issued.

    It also will prevent extreme kernel latencies which are observed when
    blockdevs which have a large amount of pagecache are unmounted, by avoiding
    invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no
    cond_resched (it can be called under spinlock), whereas truncate_inode_pages()
    has one.

    [akpm@linux-foundation.org: restore nrpages==0 optimisation]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
    been used in 6 years (so akpm says).

    find * -name \*.[ch] | xargs grep -l invalidate_bdev |
    while read file; do
    quilt add $file;
    sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
    done

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • __block_write_full_page is calling SetPageUptodate without the page locked.
    This is unusual, but not incorrect, as PG_writeback is still set.

    However the next patch will require that SetPageUptodate always be called with
    the page locked. Simply don't bother setting the page uptodate in this case
    (it is unusual that the write path does such a thing anyway). Instead just
    leave it to the read side to bring the page uptodate when it notices that all
    buffers are uptodate.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

07 Mar, 2007

1 commit

  • This fixes a regression caused by 22c8ca78f20724676b6006232bf06cc3e9299539.

    nobh_prepare_write() no longer marks the page uptodate, so
    nobh_truncate_page() needs to do it.

    Signed-off-by: Dave Kleikamp
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     

21 Feb, 2007

2 commits

  • nobh_prepare_write leaks data similarly to how simple_prepare_write did. Fix
    by not marking the page uptodate until nobh_commit_write time. Again, this
    could break weird use-cases, but none appear to exist in the tree.

    We can safely remove the set_page_dirty, because as the comment says,
    nobh_commit_write does set_page_dirty. If a filesystem wants to allocate
    backing store for a page dirtied via mmap, page_mkwrite is the suggested
    approach.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Andrew noticed that unlocking the page before submitting all buffers for
    writeout could cause problems if the IO completes before we've finished
    messing around with the page buffers, and they subsequently get freed.

    Even if there were no bug, it is a good idea to bring the error case
    into line with the common case here.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

13 Feb, 2007

2 commits

  • While compiling my code with -Wconversion using gcc-trunk, I always get a
    bunch of warrning from headers, here is fix for them:

    __getblk is alawys called with unsigned argument,
    but it takes signed, the same story with __bread,__breadahead and so on.

    Signed-off-by: Tomasz Kvarsin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tomasz Kvarsin
     
  • Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a
    bufferhead. Recently, I found the long standing mmap/unwritten extent
    conversion bug, and it was to do with partial page invalidation not clearing
    the unwritten flag from bufferheads attached to the page but beyond EOF. See
    here for a full explaination:

    http://oss.sgi.com/archives/xfs/2006-12/msg00196.html

    The solution I have checked into the XFS dev tree involves duplicating code
    from block_invalidatepage to clear the unwritten flag from the bufferhead(s),
    and then calling block_invalidatepage() to do the rest.

    Christoph suggested that this would be better solved by pushing the unwritten
    flag into the common buffer head flags and just adding the call to
    discard_buffer():

    http://oss.sgi.com/archives/xfs/2006-12/msg00239.html

    The following patch makes BH_Unwritten a first class citizen.

    Signed-off-by: Dave Chinner
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Chinner
     

12 Feb, 2007

2 commits

  • unlock_buffer(), like unlock_page(), must not clear the lock without
    ensuring that the critical section is closed.

    Mingming later sent the same patch, saying:

    We are running SDET benchmark and saw double free issue for ext3 extended
    attributes block, which complains the same xattr block already being freed (in
    ext3_xattr_release_block()). The problem could also been triggered by
    multiple threads loop untar/rm a kernel tree.

    The race is caused by missing a memory barrier at unlock_buffer() before the
    lock bit being cleared, resulting in possible concurrent h_refcounter update.
    That causes a reference counter leak, then later leads to the double free that
    we have seen.

    Inside unlock_buffer(), there is a memory barrier is placed *after* the lock
    bit is being cleared, however, there is no memory barrier *before* the bit is
    cleared. On some arch the h_refcount update instruction and the clear bit
    instruction could be reordered, thus leave the critical section re-entered.

    The race is like this: For example, if the h_refcount is initialized as 1,

    cpu 0: cpu1
    -------------------------------------- -----------------------------------
    lock_buffer() /* test_and_set_bit */
    clear_buffer_locked(bh);
    lock_buffer() /* test_and_set_bit */
    h_refcount = h_refcount+1; /* = 2*/ h_refcount = h_refcount + 1; /*= 2 */
    clear_buffer_locked(bh);
    .... ......

    We lost a h_refcount here. We need a memory barrier before the buffer head lock
    bit being cleared to force the order of the two writes. Please apply.

    Signed-off-by: Nick Piggin
    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Convert all calls to invalidate_inode_pages() into open-coded calls to
    invalidate_mapping_pages().

    Leave the invalidate_inode_pages() wrapper in place for now, marked as
    deprecated.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

30 Jan, 2007

1 commit

  • Fix commit ecdfc9787fe527491baefc22dce8b2dbd5b2908d

    Not to put too fine a point on it, but in a nutshell...

    __set_page_dirty_buffers() | try_to_free_buffers()
    ---------------------------+---------------------------
    | spin_lock(private_lock);
    | drop_bufers()
    | spin_unlock(private_lock);
    spin_lock(private_lock) |
    !page_has_buffers() |
    spin_unlock(private_lock) |
    SetPageDirty() |
    | cancel_dirty_page()

    oops!

    Signed-off-by: Nick Piggin
    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

27 Jan, 2007

1 commit

  • It's not pretty, but it appears that ext3 with data=journal will clean
    pages without ever actually telling the VM that they are clean. This,
    in turn, will result in the VM (and balance_dirty_pages() in particular)
    to never realize that the pages got cleaned, and wait forever for an
    event that already happened.

    Technically, this seems to be a problem with ext3 itself, but it used to
    be hidden by 'try_to_free_buffers()' noticing this situation on its own,
    and just working around the filesystem problem.

    This commit re-instates that hack, in order to avoid a regression for
    the 2.6.20 release. This fixes bugzilla 7844:

    http://bugzilla.kernel.org/show_bug.cgi?id=7844

    Peter Zijlstra points out that we should probably retain the debugging
    code that this removes from cancel_dirty_page(), and I agree, but for
    the imminent release we might as well just silence the warning too
    (since it's not a new bug: anything that triggers that warning has been
    around forever).

    Acked-by: Randy Dunlap
    Acked-by: Jens Axboe
    Acked-by: Peter Zijlstra
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

12 Jan, 2007

1 commit

  • Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
    xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.

    (XFS unlocks the semaphore from a different task, by design. The mutex
    code warns about this)

    Signed-off-by: Dave Chinner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Chinner
     

22 Dec, 2006

1 commit


11 Dec, 2006

3 commits

  • Account for the number of byte writes which this process caused to not happen
    after all.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Accounting writes is fairly simple: whenever a process flips a page from clean
    to dirty, we accuse it of having caused a write to underlying storage of
    PAGE_CACHE_SIZE bytes.

    This may overestimate the amount of writing: the page-dirtying may cause only
    one buffer_head's worth of writeout. Fixing that is possible, but probably a
    bit messy and isn't obviously important.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Save a tabstop in __set_page_dirty_nobuffers() and __set_page_dirty_buffers()
    and a few other places. No functional changes.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

08 Dec, 2006

2 commits

  • There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
    prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
    generating compiler warnings of unused symbols, hence forcing people to add
    #ifdefs.

    the compiler can skip truly unused functions just fine:

    text data bss dec hex filename
    1624412 728710 3674856 6027978 5bfaca vmlinux.before
    1624412 728710 3674856 6027978 5bfaca vmlinux.after

    [akpm@osdl.org: topology.c fix]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
    quilt add $file
    sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
    mv /tmp/$$ $file
    quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

17 Oct, 2006

1 commit

  • When IO error happens on metadata buffer, buffer is freed from memory and
    later fsync() is called, filesystems like ext2 fail to report EIO. We

    solve the problem by introducing a pointer to associated address space into
    the buffer_head. When a buffer is removed from a list of metadata buffers
    associated with an address space, IO error is transferred from the buffer to
    the address space, so that fsync can later report it.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

12 Oct, 2006

2 commits

  • A couple of flush_dcache_page()s are missing on the I/O-error paths.

    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Monakhov Dmitriy
     
  • If grow_buffers() is for some reason passed a block number which wants to lie
    outside the maximum-addressable pagecache range (PAGE_SIZE * 4G bytes) then it
    will accidentally truncate `index' and will then instnatiate a page at the
    wrong pagecache offset. This causes __getblk_slow() to go into an infinite
    loop.

    This can happen with corrupted disks, or with software errors elsewhere.

    Detect that, and handle it.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

10 Oct, 2006

1 commit

  • This was triggered, but not the fault of, the dirty page accounting
    patches. Suitable for -stable as well, after it goes upstream.

    Unable to handle kernel NULL pointer dereference at virtual address 0000004c
    EIP is at _spin_lock+0x12/0x66
    Call Trace:
    [] __set_page_dirty_buffers+0x15/0xc0
    [] set_page_dirty+0x2c/0x51
    [] set_page_dirty_balance+0xb/0x3b
    [] __do_fault+0x1d8/0x279
    [] __handle_mm_fault+0x125/0x951
    [] do_page_fault+0x440/0x59f
    [] error_code+0x39/0x40
    [] 0x8048a33

    Signed-off-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

01 Oct, 2006

1 commit

  • Move some functions out of the buffering code that aren't strictly buffering
    specific. This is a precursor to being able to disable the block layer.

    (*) Moved some stuff out of fs/buffer.c:

    (*) The file sync and general sync stuff moved to fs/sync.c.

    (*) The superblock sync stuff moved to fs/super.c.

    (*) do_invalidatepage() moved to mm/truncate.c.

    (*) try_to_release_page() moved to mm/filemap.c.

    (*) Moved some related declarations between header files:

    (*) declarations for do_invalidatepage() and try_to_release_page() moved
    to linux/mm.h.

    (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

26 Sep, 2006

1 commit

  • Tracking of dirty pages in shared writeable mmap()s.

    The idea is simple: write protect clean shared writeable pages, catch the
    write-fault, make writeable and set dirty. On page write-back clean all the
    PTE dirty bits and write protect them once again.

    The implementation is a tad harder, mainly because the default
    backing_dev_info capabilities were too loosely maintained. Hence it is not
    enough to test the backing_dev_info for cap_account_dirty.

    The current heuristic is as follows, a VMA is eligible when:
    - its shared writeable
    (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
    - it is not a 'special' mapping
    (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
    - the backing_dev_info is cap_account_dirty
    mapping_cap_account_dirty(vma->vm_file->f_mapping)
    - f_op->mmap() didn't change the default page protection

    Page from remap_pfn_range() are explicitly excluded because their COW
    semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
    because they don't have a backing store anyway.

    mprotect() is taught about the new behaviour as well. However it overrides
    the last condition.

    Cleaning the pages on write-back is done with page_mkclean() a new rmap call.
    It can be called on any page, but is currently only implemented for mapped
    pages, if the page is found the be of a VMA that accounts dirty pages it will
    also wrprotect the PTE.

    Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from
    under ->private_lock. This seems to be safe, since ->private_lock is used to
    serialize access to the buffers, not the page itself. This is needed because
    clear_page_dirty() will call into page_mkclean() and would thereby violate
    locking order.

    [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
    Signed-off-by: Peter Zijlstra
    Cc: Hugh Dickins
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

01 Aug, 2006

1 commit

  • We can immediately bail from invalidate_bdev() if the blockdev has no
    pagecache.

    This solves the huge IPI storms which hald is causing on the big ia64
    machines when it polls CDROM drives.

    Acked-by: Jes Sorensen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

01 Jul, 2006

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
    Remove obsolete #include
    remove obsolete swsusp_encrypt
    arch/arm26/Kconfig typos
    Documentation/IPMI typos
    Kconfig: Typos in net/sched/Kconfig
    v9fs: do not include linux/version.h
    Documentation/DocBook/mtdnand.tmpl: typo fixes
    typo fixes: specfic -> specific
    typo fixes in Documentation/networking/pktgen.txt
    typo fixes: occuring -> occurring
    typo fixes: infomation -> information
    typo fixes: disadvantadge -> disadvantage
    typo fixes: aquire -> acquire
    typo fixes: mecanism -> mechanism
    typo fixes: bandwith -> bandwidth
    fix a typo in the RTC_CLASS help text
    smb is no longer maintained

    Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S

    Linus Torvalds
     
  • This makes nr_dirty a per zone counter. Looping over all processors is
    avoided during writeback state determination.

    The counter aggregation for nr_dirty had to be undone in the NFS layer since
    we summed up the page counts from multiple zones. Someone more familiar with
    NFS should probably review what I have done.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Signed-off-by: Jörn Engel
    Signed-off-by: Adrian Bunk

    Jörn Engel
     

29 Jun, 2006

1 commit


28 Jun, 2006

1 commit

  • - add a proper prototype for the following global function:
    - buffer_init()

    - make the following needlessly global function static:
    - end_buffer_async_write()

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

23 Jun, 2006

1 commit

  • A process flag to indicate whether we are doing sync io is incredibly
    ugly. It also causes performance problems when one does a lot of async
    io and then proceeds to sync it. Part of the io will go out as async,
    and the other part as sync. This causes a disconnect between the
    previously submitted io and the synced io. For io schedulers such as CFQ,
    this will cause us lost merges and suboptimal behaviour in scheduling.

    Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
    the O_DIRECT path just directly indicate that the writes are sync
    by using WRITE_SYNC instead.

    Signed-off-by: Jens Axboe

    Jens Axboe