08 Oct, 2006

1 commit

  • Init list is called with a list parameter that is not equal to the
    cachep->nodelists entry under NUMA if more than one node exists. This is
    fully legitimate. One may want to populate the list fields before
    switching nodelist pointers.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

06 Oct, 2006

2 commits

  • Add a way for a no_page() handler to request a retry of the faulting
    instruction. It goes back to userland on page faults and just tries again
    in get_user_pages(). I added a cond_resched() in the loop in that latter
    case.
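
A minimal userspace sketch of the retry idea described above. Everything here is hypothetical: the status enum, the handler type and the function names are illustrative stand-ins and are not the kernel's no_page() interface, which returns a page pointer or a sentinel value.

```c
/* Hypothetical status codes; the kernel's no_page() returns a page pointer
 * or a sentinel value instead. */
enum fault_status { FAULT_OK, FAULT_REFAULT, FAULT_ERROR };

/* Hypothetical handler type: ctx stands in for the vma/address arguments. */
typedef enum fault_status (*fault_handler)(void *ctx);

/* Call the handler until it stops asking for a retry.
 * Returns the final status and stores the number of calls in *calls. */
static enum fault_status fault_retry_loop(fault_handler handler, void *ctx,
                                          int *calls)
{
    enum fault_status st;

    *calls = 0;
    do {
        st = handler(ctx);
        (*calls)++;
        /* A kernel loop would call cond_resched() here before retrying. */
    } while (st == FAULT_REFAULT);
    return st;
}

/* Example handler: asks for a retry until its countdown reaches zero. */
static enum fault_status countdown_handler(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return FAULT_REFAULT;
    }
    return FAULT_OK;
}
```

The loop gives the handler a way to say "nothing to map yet, try again" without treating that as an error.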

    The problem I have with signal and spufs is an actual bug affecting apps and I
    don't see other ways of fixing it.

    In addition, we are having issues with infiniband and 64k pages (related to
    the way the hypervisor deals with some HV cards) that will require us to muck
    around with the MMU from within the IB driver's no_page() (it's a pSeries
    specific driver) and return to the caller the same way using NOPAGE_REFAULT.

    And to add to this, the graphics folks have been following a new approach of
    memory management that involves transparently swapping objects between video
    ram and main memory. To do that, they need to install PTEs from a no_page()
    handler as well and that also requires returning with NOPAGE_REFAULT.

    (For the latter, they are currently using io_remap_pfn_range to install one PTE
    from no_page() which is a bit racy, we need to add a check for the PTE having
    already been installed after taking the lock, but that's ok, they are only at
    the proof-of-concept stage. I'll send a patch adding a "clean" function to do
    that, we can use that from spufs too and get rid of the sparsemem hacks we do
    to create struct page for SPEs. Basically, that provides a generic solution
    for being able to have no_page() map hardware devices, which is something that
    I think sound driver folks have been asking for some time too).

    All of these things depend on having the NOPAGE_REFAULT exit path from
    no_page() handlers.

    Signed-off-by: Benjamin Herrenschmidt
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • Reduce the NUMA text size of mm/slab.o a little on x86 by using a local
    variable to store the result of numa_node_id().

       text    data     bss     dec     hex filename
      16858    2584      16   19458    4c02 mm/slab.o (before)
      16804    2584      16   19404    4bcc mm/slab.o (after)

    [akpm@osdl.org: use better names]
    [pbadari@us.ibm.com: fix that]
    Cc: Christoph Lameter
    Signed-off-by: Pekka Enberg
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     

05 Oct, 2006

2 commits

  • * master.kernel.org:/pub/scm/linux/kernel/git/davej/configh:
    Remove all inclusions of

    Manually resolved trivial path conflicts due to removed files in
    the sound/oss/ subdirectory.

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6: (292 commits)
    [GFS2] Fix endian bug for de_type
    [GFS2] Initialize SELinux extended attributes at inode creation time.
    [GFS2] Move logging code into log.c (mostly)
    [GFS2] Mark nlink cleared so VFS sees it happen
    [GFS2] Two redundant casts removed
    [GFS2] Remove uneeded endian conversion
    [GFS2] Remove duplicate sb reading code
    [GFS2] Mark metadata reads for blktrace
    [GFS2] Remove iflags.h, use FS_
    [GFS2] Fix code style/indent in ops_file.c
    [GFS2] streamline-generic_file_-interfaces-and-filemap gfs fix
    [GFS2] Remove readv/writev methods and use aio_read/aio_write instead (gfs bits)
    [GFS2] inode-diet: Eliminate i_blksize from the inode structure
    [GFS2] inode_diet: Replace inode.u.generic_ip with inode.i_private (gfs)
    [GFS2] Fix typo in last patch
    [GFS2] Fix direct i/o logic in filemap.c
    [GFS2] Fix bug in Makefiles for lock modules
    [GFS2] Remove (extra) fs_subsys declaration
    [GFS2/DLM] Fix trailing whitespace
    [GFS2] Tidy up meta_io code
    ...

    Linus Torvalds
     

04 Oct, 2006

10 commits

  • - rename ____kmalloc to kmalloc_track_caller so that people have a chance
    to guess what it does just from its name. Add a comment describing it
    for those who don't. Also move it after kmalloc in slab.h so people get
    less confused when they are just looking for kmalloc - move things around
    in slab.c a little to reduce the ifdef mess.

    [penberg@cs.helsinki.fi: Fix up reversed #ifdef]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Fix kernel-doc and function declaration (missing "void") in
    mm/page_alloc.c.

    Add mm/page_alloc.c to kernel-api.tmpl in DocBook.

    mm/page_alloc.c:2589:38: warning: non-ANSI function declaration of function 'remove_all_active_ranges'

    Signed-off-by: Randy Dunlap
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Hugh spotted that a hugetlb page is freed back to the global pool before
    any TLB flush is performed in unmap_hugepage_range(). This potentially
    allows threads to abuse a free-alloc race condition.

    The generic tlb gather code is unsuitable for use by hugetlb, so I just open
    coded a page gathering list and delayed put_page until the tlb flush is
    performed.
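
The ordering described above (gather pages, flush, then drop references) can be sketched in userspace as follows. This is an illustration only: the struct, the globals and the function names are hypothetical and do not reproduce the kernel's unmap_hugepage_range().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct page_sketch {
    int refcount;
    struct page_sketch *next;   /* link for the local gather list */
};

static int tlb_flushed;          /* set once the simulated flush has run */
static int freed_before_flush;   /* counts ordering violations */

static void put_page_sketch(struct page_sketch *p)
{
    if (!tlb_flushed)
        freed_before_flush++;    /* would be the racy case */
    p->refcount--;
}

static void unmap_range_sketch(struct page_sketch **pages, size_t n)
{
    struct page_sketch *gathered = NULL;
    size_t i;

    /* Step 1: tear down mappings, but only gather the pages. */
    for (i = 0; i < n; i++) {
        pages[i]->next = gathered;
        gathered = pages[i];
    }

    /* Step 2: simulated TLB flush. */
    tlb_flushed = 1;

    /* Step 3: only now drop the references. */
    while (gathered) {
        struct page_sketch *p = gathered;

        gathered = p->next;
        put_page_sketch(p);
    }
}
```

Deferring the reference drop means no page can return to the pool while a stale translation may still point at it.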

    Cc: Hugh Dickins
    Signed-off-by: Ken Chen
    Acked-by: William Irwin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     
  • Having min be a signed quantity means gcc can't turn high latency divides
    into shifts. There happen to be two such divides for GFP_ATOMIC (ie.
    networking, ie. important) allocations, one of which depends on the other.
    Fixing this makes code smaller as a bonus.
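
A small sketch of why signedness matters here. The function names and the exact arithmetic are assumptions made for illustration, not the patched kernel code. For an unsigned operand, division by a power of two is exactly a right shift; for a signed operand, C requires rounding toward zero, so the compiler needs extra fix-up work for negative values.

```c
/* Unsigned: min / 2 and min / 4 can each become a single shift. */
static unsigned long watermark_unsigned(unsigned long min)
{
    min -= min / 2;
    min -= min / 4;   /* depends on the result of the first divide */
    return min;
}

/* Signed: each divide must round toward zero, so a bare shift is not enough. */
static long watermark_signed(long min)
{
    min -= min / 2;
    min -= min / 4;
    return min;
}
```

Both versions agree for non-negative inputs, so switching to an unsigned type is safe as long as the value can never be negative.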

    Shame on somebody (probably me).

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Fixes a kerneldoc error.

    Signed-off-by: Henrik Kretzschmar
    Cc: "Randy.Dunlap"
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Kretzschmar
     
  • kbuild explicitly includes this at build time.

    Signed-off-by: Dave Jones

    Dave Jones
     
  • This patch fixes a spelling mistake ("control" instead of "cotrol").

    Signed-off-by: Michael Opdenacker
    Acked-by: Alan Cox
    Signed-off-by: Adrian Bunk

    Michael Opdenacker
     
  • Many files include the filename at the beginning; several used a wrong one.

    Signed-off-by: Uwe Zeisberger
    Signed-off-by: Adrian Bunk

    Uwe Zeisberger
     
  • Randy brought it to my attention that in proper English "can not" should always
    be written "cannot". I donot see any reason to argue, even if I mightnot
    understand why this rule exists. This patch fixes "can not" in several
    Documentation files as well as three Kconfigs.

    Signed-off-by: Matt LaPlante
    Acked-by: Randy Dunlap
    Signed-off-by: Adrian Bunk

    Matt LaPlante
     
  • Signed-off-by: Adrian Bunk

    Matt LaPlante
     

02 Oct, 2006

1 commit


01 Oct, 2006

21 commits

  • Implement lazy MMU update hooks which are SMP safe for both direct and shadow
    page tables. The idea is that PTE updates and page invalidations while in
    lazy mode can be batched into a single hypercall. We use this in VMI for
    shadow page table synchronization, and it is a win. It also can be used by
    PPC and for direct page tables on Xen.

    For SMP, the enter / leave must happen under protection of the page table
    locks for page tables which are being modified. This is because otherwise,
    you end up with stale state in the batched hypercall, which other CPUs can
    race ahead of. Doing this under the protection of the locks guarantees the
    synchronization is correct, and also means that spurious faults which are
    generated during this window by remote CPUs are properly handled, as the page
    fault handler must re-check the PTE under protection of the same lock.
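
A toy userspace model of the batching idea, under loud assumptions: the struct, the fixed queue size and all function names are hypothetical, a counter stands in for a hypercall, and locking is omitted entirely.

```c
#include <stddef.h>

#define BATCH_MAX 8

struct pte_update {
    unsigned long *ptep;
    unsigned long val;
};

struct lazy_batch {
    int lazy;                        /* nonzero while in lazy mode */
    size_t n;                        /* queued updates */
    struct pte_update q[BATCH_MAX];
    unsigned long flushes;           /* simulated hypercall count */
};

/* Apply every queued update in one simulated hypercall. */
static void batch_flush(struct lazy_batch *b)
{
    size_t i;

    if (b->n == 0)
        return;
    for (i = 0; i < b->n; i++)
        *b->q[i].ptep = b->q[i].val;
    b->n = 0;
    b->flushes++;
}

static void enter_lazy(struct lazy_batch *b)
{
    b->lazy = 1;
}

static void leave_lazy(struct lazy_batch *b)
{
    batch_flush(b);
    b->lazy = 0;
}

static void set_pte_sketch(struct lazy_batch *b, unsigned long *ptep,
                           unsigned long val)
{
    if (!b->lazy) {                  /* eager mode: one call per update */
        *ptep = val;
        b->flushes++;
        return;
    }
    if (b->n == BATCH_MAX)           /* queue full: flush early */
        batch_flush(b);
    b->q[b->n].ptep = ptep;
    b->q[b->n].val = val;
    b->n++;
}
```

Until leave_lazy() runs, the entries in memory still hold their old values, which is why code must not read a PTE back directly while a batch is open.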

    Signed-off-by: Zachary Amsden
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • Change pte_clear_full to a more appropriately named pte_clear_not_present,
    allowing optimizations when not-present mapping changes need not be reflected
    in the hardware TLB for protected page table modes. There is also another
    case that can use it in the fremap code.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • We don't want to read PTEs directly like this after they have been modified,
    as a lazy MMU implementation of direct page tables may not have written the
    updated PTE back to memory yet.

    Signed-off-by: Zachary Amsden
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • The recent fix to invalidate_inode_pages() (git commit 016eb4a) managed to
    unfix invalidate_inode_pages2().

    The problem is that various bits of code in the kernel can take transient refs
    on pages: the page scanner will do this when inspecting a batch of pages, and
    the lru_cache_add() batching pagevecs also hold a ref.

    Net result is transient failures in invalidate_inode_pages2(). This affects
    NFS directory invalidation (observed) and presumably also block-backed
    direct-io (not yet reported).

    Fix it by reverting invalidate_inode_pages2() back to the old version which
    ignores the page refcounts.

    We may come up with something more clever later, but for now we need a 2.6.18
    fix for NFS.

    Cc: Chuck Lever
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This is mostly included for parity with dec_nlink(), where we will have some
    more hooks. This one should stay pretty darn straightforward for now.

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • When a filesystem decrements i_nlink to zero, it means that a write must be
    performed in order to drop the inode from the filesystem.

    We're shortly going to have to keep filesystems from being remounted r/o
    between the time of this i_nlink decrement and the time that write occurs.

    So, add a little helper function to do the decrements. We'll tie into it in a
    bit to note when i_nlink hits zero.
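
A sketch of what such a helper might look like. The struct and function names are hypothetical stand-ins, not the kernel's definitions.

```c
/* Hypothetical stand-in for the relevant part of struct inode. */
struct inode_sketch {
    unsigned int i_nlink;
};

/* Single place to decrement the link count. A later change could hook in
 * here to note when the count reaches zero. */
static inline void dec_nlink_sketch(struct inode_sketch *inode)
{
    inode->i_nlink--;
}
```

Routing every decrement through one function means the zero-crossing check only has to be added once.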

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • This patch cleans up generic_file_*_read/write() interfaces. Christoph
    Hellwig gave me the idea for this cleanup.

    In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
    do_sync_read()/do_sync_write() as their .read/.write methods. This allows us
    to clean up all variants of generic_file_* routines.

    Final available interfaces:

    generic_file_aio_read() - read handler
    generic_file_aio_write() - write handler
    generic_file_aio_write_nolock() - no lock write handler

    __generic_file_aio_write_nolock() - internal worker routine

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • This patch removes readv() and writev() methods and replaces them with
    aio_read()/aio_write() methods.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • This patch vectorizes aio_read() and aio_write() methods to prepare for
    collapsing all aio & vectored operations into one interface - which is
    aio_read()/aio_write().

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Cc: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • One of the idiomatic ways to duplicate a region of memory is

        dst = kmalloc(len, GFP_KERNEL);
        if (!dst)
                return -ENOMEM;
        memcpy(dst, src, len);

    which is neat code, except that the programmer needs to write the size twice,
    which sometimes leads to mistakes. If the len passed to kmalloc is smaller than
    the len passed to memcpy, it's a straight overwrite-beyond-end. If the len
    passed to memcpy is smaller than the len passed to kmalloc, it's either a)
    legit behaviour ;-), or b) the cloned buffer will contain garbage in its
    second half.

    Slight trolling of commit lists shows several duplication bugs
    caused exactly by diverged lengths:

    Linux:
    [CRYPTO]: Fix memcpy/memset args.
    [PATCH] memcpy/memset fixes
    OpenBSD:
    kerberosV/src/lib/asn1: der_copy.c:1.4

    If programmer is given only one place to play with lengths, I believe, such
    mistakes could be avoided.

    With kmemdup, the snippet above will be rewritten as:

        dst = kmemdup(src, len, GFP_KERNEL);
        if (!dst)
                return -ENOMEM;

    This also leads to smaller code (kzalloc effect). Quick grep shows
    200+ places where kmemdup() can be used.
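
For readers without a kernel tree at hand, here is a userspace sketch of the same idea. It assumes malloc() as a stand-in for kmalloc() and drops the GFP flags argument. The function name is hypothetical and this is not the kernel implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate len bytes and copy len bytes from src. The length is given once,
 * so the allocation size and the copy size cannot diverge.
 * Returns NULL if the allocation fails. */
static void *memdup_sketch(const void *src, size_t len)
{
    void *dst = malloc(len);

    if (dst)
        memcpy(dst, src, len);
    return dst;
}
```

The caller still checks for NULL and owns the returned buffer, exactly as with the open-coded pair.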

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The api for hot-add memory already has a construct for finding nodes based on
    an address, memory_add_physaddr_to_nid. This patch allows the function to do
    something besides return 0. It uses the nodes_add information to look up the
    node info for a hot add event.

    Signed-off-by: Keith Mannthey
    Cc: KAMEZAWA Hiroyuki
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keith Mannthey
     
  • Migrate CONFIG_MEMORY_HOTPLUG to CONFIG_MEMORY_HOTPLUG_SPARSE where needed.

    Signed-off-by: Keith Mannthey
    Cc: KAMEZAWA Hiroyuki
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keith Mannthey
     
  • Create Kconfig namespace for MEMORY_HOTPLUG_RESERVE and MEMORY_HOTPLUG_SPARSE.
    This is needed to create a distinction between the 2 paths. Selecting the
    high level option of MEMORY_HOTPLUG will get you MEMORY_HOTPLUG_SPARSE if you
    have sparsemem enabled or MEMORY_HOTPLUG_RESERVE if you are x86_64 with
    discontig and ACPI numa support.

    Signed-off-by: Keith Mannthey
    Cc: KAMEZAWA Hiroyuki
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keith Mannthey
     
  • Fix up externs in memory_hotplug.c. Cleanup.

    Signed-off-by: Keith Mannthey
    Cc: KAMEZAWA Hiroyuki
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keith Mannthey
     
  • Don't try and give NULL to fput() in the error handling in do_mmap_pgoff()
    as it'll cause an oops.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Lambert
     
  • Make it possible to disable the block layer. Not all embedded devices require
    it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
    the block layer to be present.

    This patch does the following:

    (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
    support.

    (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
    an item that uses the block layer. This includes:

        (*) Block I/O tracing.

        (*) Disk partition code.

        (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

        (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
            block layer to do scheduling. Some drivers that use SCSI facilities -
            such as USB storage - end up disabled indirectly from this.

        (*) Various block-based device drivers, such as IDE and the old CDROM
            drivers.

        (*) MTD blockdev handling and FTL.

        (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
            taking a leaf out of JFFS2's book.

    (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
    linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
    however, still used in places, and so is still available.

    (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
    parts of linux/fs.h.

    (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

    (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

    (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
    is not enabled.

    (*) fs/no-block.c is created to hold out-of-line stubs and things that are
    required when CONFIG_BLOCK is not set:

        (*) Default blockdev file operations (to give error ENODEV on opening).

    (*) Makes some /proc changes:

        (*) /proc/devices does not list any blockdevs.

        (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

    (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

    (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
    given command other than Q_SYNC or if a special device is specified.

    (*) In init/do_mounts.c, no reference is made to the blockdev routines if
    CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

    (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
    error ENOSYS by way of cond_syscall if so).

    (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
    CONFIG_BLOCK is not set, since they can't then happen.
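
The general pattern behind several of the items above is a compile-time switch with a stub for the disabled case. A minimal sketch follows. The macro name CONFIG_BLOCK_SKETCH and the function are hypothetical and chosen so as not to be mistaken for the real kernel symbols.

```c
#include <errno.h>

#ifdef CONFIG_BLOCK_SKETCH
/* Real implementation would live in another file when the feature is on. */
int blkdev_open_sketch(void);
#else
/* Feature compiled out: a stub that reports "no such device". */
static inline int blkdev_open_sketch(void)
{
    return -ENODEV;
}
#endif
```

Callers compile unchanged in both configurations; only the behaviour behind the call differs.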

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Dissociate the generic_writepages() function from the mpage stuff, moving its
    declaration to linux/mm.h and actually emitting a full implementation into
    mm/page-writeback.c.

    The implementation is a partial duplicate of mpage_writepages() with all BIO
    references removed.

    It is used by NFS to do writeback.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Move the bounce buffer code from mm/highmem.c to mm/bounce.c so that it can be
    more easily disabled when the block layer is disabled.

    !!!NOTE!!! There may be a bug in this code: Should init_emergency_pool() be
    contingent on CONFIG_HIGHMEM?

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Stop fallback_migrate_page() from using page_has_buffers() since that might not
    be available. Use PagePrivate() instead since that's more general.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Move some functions out of the buffering code that aren't strictly buffering
    specific. This is a precursor to being able to disable the block layer.

    (*) Moved some stuff out of fs/buffer.c:

        (*) The file sync and general sync stuff moved to fs/sync.c.

        (*) The superblock sync stuff moved to fs/super.c.

        (*) do_invalidatepage() moved to mm/truncate.c.

        (*) try_to_release_page() moved to mm/filemap.c.

    (*) Moved some related declarations between header files:

        (*) declarations for do_invalidatepage() and try_to_release_page() moved
            to linux/mm.h.

        (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

30 Sep, 2006

3 commits