28 Jul, 2005

4 commits

  • Originally __free_pages_bulk used the relative page number within a zone to
    define its buddies. This meant that to maintain the "maximally aligned"
    requirements (that an allocation of size N will be aligned at least to N
    physically) zones had to also be aligned to 1<
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • The madvise() system call returns -EBADF for areas which do not map to
    files, but only for the *behaviour* request MADV_WILLNEED.

    According to man pages, madvise returns :

    EBADF - the map exists, but the area maps something that isn't a file.

    Fixes bug 2995.

    Signed-off-by: Suzuki K P
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    suzuki
     
  • Fix bug identified by Richard Purdie.

    oprofile calls check_user_page_readable() from interrupt context, so we
    deadlock over various VFS locks.

    But check_user_page_readable() doesn't imply either a read or a write of the
    page's contents. Change __follow_page() so that check_user_page_readable()
    can tell __follow_page() that we're not accessing the page's contents, and use
    that info to avoid the troublesome lock-takings.

    Also, make follow_page() inline for the single callsite in memory.c to save a
    bit of stack space.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • All mempolicy changes must be inside the spinlock, and re-adding the
    rb_erase prevents a crash while doing:

    > echo "1" > /tmp/numatest
    > numactl --length=0x4000 --shm /tmp/numatest --localalloc
    > numactl --length=0x2000 --offset=0 --shm /tmp/numatest --membind=0
    > numactl --length=0x2000 --offset=0x2000 --shm /tmp/numatest --membind=1
    > ipcs
    > ipcrm -M "the_key_value_of_this_shm_area"

    Based on a patch by John Blackwood

    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

16 Jul, 2005

1 commit

  • This patch includes feedback from Andrew and Christoph. Thanks for
    taking time to review.

    Use of empty_zero_page was eliminated to fix compilation for architectures
    that don't have it.

    This patch removes setting pages up-to-date in ext2_get_xip_page and all
    bug checks to verify that the page is indeed up to date. Setting the page
    state on mapping to userland is bogus. None of the code paths involved
    with these pages in mm cares about the page state.

    Still on my to-do list: identify a place outside second extended where
    __inode_direct_access should reside.

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     

13 Jul, 2005

1 commit

  • mm/filemap_xip.c: In function `__xip_unmap':
    mm/filemap_xip.c:194: request for member `pte' in something not a structure or union

    Apparently pte_pfn() takes a pte_t, not a pointer to a pte_t. From looking
    at asm/page.h, it seems to be the same on ia32 or ppc (iff
    STRICT_MM_TYPECHECKS is enabled, which is disabled by default on ppc).
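
    The compile error can be reproduced in miniature. The sketch below is a
    userspace model, not the kernel's headers: the definitions of pte_t,
    pte_val and pte_pfn only mimic the strict-typecheck style, PAGE_SHIFT = 12
    is an assumption, and pfn_of is a hypothetical helper.

```c
/* Userspace model of a strictly typed pte. The names mirror the kernel's,
 * but these definitions are illustrative assumptions. */
#define PAGE_SHIFT 12                          /* assumed 4 KiB pages */

typedef struct { unsigned long pte; } pte_t;   /* strict type: a struct wrapper */

#define pte_val(x)  ((x).pte)
#define pte_pfn(x)  (pte_val(x) >> PAGE_SHIFT)

/* Hypothetical helper: takes a pointer to a pte, as a page-table walker
 * would hold one. */
static unsigned long pfn_of(const pte_t *ptep)
{
    /* pte_pfn(ptep) would not compile here: '.pte' applied to a pointer is
     * a "request for member in something not a structure or union".
     * Dereference first so the macro sees a pte_t value. */
    return pte_pfn(*ptep);
}
```

    When pte_t is a plain integer typedef instead of a struct, a macro of
    this shape has no member access to trip over, which is presumably why the
    mistake goes unnoticed without strict type checks.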

    Acked-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

08 Jul, 2005

4 commits


07 Jul, 2005

1 commit

  • This patch used to be in Andrew's tree before the NUMA slab allocator went
    in. Either this patch or the NUMA slab allocator is needed in order for
    kmalloc_node to work correctly.

    pcibus_to_node may be used to generate the node information passed to
    kmalloc_node. pcibus_to_node returns -1 if it was not able to determine
    on which node a pcibus is located. For that case kmalloc_node must
    work like kmalloc.
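
    The required fallback can be modelled in userspace. This is a minimal
    sketch, not the slab code: model_kmalloc_node, alloc_on_node, NO_NODE and
    last_node_used are hypothetical names, and malloc stands in for both
    allocators.

```c
#include <stdlib.h>

#define NO_NODE (-1)   /* the value pcibus_to_node reports for "unknown node" */

/* Records which path the last allocation took (for illustration only). */
static int last_node_used = NO_NODE;

/* Hypothetical stand-in for a node-local allocator. */
static void *alloc_on_node(size_t size, int node)
{
    last_node_used = node;
    return malloc(size);
}

/* Model of the behaviour described above: an unknown node degrades to a
 * plain, node-agnostic allocation instead of failing. */
static void *model_kmalloc_node(size_t size, int node)
{
    if (node == NO_NODE) {
        last_node_used = NO_NODE;
        return malloc(size);          /* behave like kmalloc */
    }
    return alloc_on_node(size, node);
}
```

    A caller can then pass the result of a node lookup straight through
    without checking for -1 first.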

    Signed-off-by: Christoph Lameter
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

29 Jun, 2005

1 commit


28 Jun, 2005

1 commit

  • I spotted this issue while in memmap_init last week. I can't say the
    change has any test coverage by me. start_pfn was formerly used in the
    main "for" loop. The fix is to replace start_pfn with pfn.

    Signed-off-by: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Picco
     

26 Jun, 2005

8 commits

  • Linus Torvalds
     
  • 1. Establish a simple API for process freezing defined in include/linux/sched.h:

    frozen(process)          Check for frozen process
    freezing(process)        Check if a process is being frozen
    freeze(process)          Tell a process to freeze (go to refrigerator)
    thaw_process(process)    Restart process
    frozen_process(process)  Process is frozen now

    2. Remove all references to PF_FREEZE and PF_FROZEN from all
    kernel sources except sched.h

    3. Fix numerous locations where try_to_freeze is manually done by a driver

    4. Remove the argument that is no longer necessary from two function calls.

    5. Some whitespace cleanup

    6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
    cleared before setting PF_FROZEN, recalc_sigpending does not check
    PF_FROZEN).

    This patch does not address the problem of freeze_processes() violating the rule
    that a task may only modify its own flags by setting PF_FREEZE. This is not clean
    in an SMP environment. freeze(process) is therefore not SMP safe!

    Signed-off-by: Christoph Lameter
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This patch makes use of ALIGN() to remove duplicate round-up code.
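
    For readers unfamiliar with the macro, here is a small sketch of the
    round-up idiom. The ALIGN definition below follows the usual
    power-of-two form and is reproduced from memory, so treat it as an
    assumption; round_up_open_coded is a hypothetical example of the kind of
    duplicate such a patch would replace.

```c
/* Round x up to the next multiple of a. Only valid when a is a power of two. */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical open-coded equivalent, the sort of duplicate being removed. */
static unsigned long round_up_open_coded(unsigned long x, unsigned long a)
{
    return (x + a - 1) & ~(a - 1);
}
```

    For example, ALIGN(5UL, 8UL) yields 8 and ALIGN(16UL, 8UL) stays at 16.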

    Signed-off-by: Nick Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Wilson
     
  • This patch retrieves the max_pfn being used by the previous kernel and
    stores it in a safe location (saved_max_pfn) before it is overwritten by a
    user-defined memory map. This pfn is used to make sure that the user does
    not try to read physical memory beyond saved_max_pfn.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • Here is the fix for the problem described in

    http://bugzilla.kernel.org/show_bug.cgi?id=4721

    Basically, the problem is that generic_file_buffered_write() accesses
    memory beyond the end of the iov[] vector after handling the last vector.
    If we happen to cross a page boundary, we get a fault.

    I think this simple patch is good enough. If we really don't want to
    depend on the "count", then we need to pass nr_segs to
    filemap_set_next_iovec(), decrement it, and check it.
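
    The failure mode can be sketched in userspace. This is a model under
    stated assumptions, not the kernel function: advance_iovec and next_buf
    are hypothetical names, and the "count" guard mirrors the approach
    described above rather than the actual patch.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Advance (*iovp, *basep) past `bytes` bytes that have been consumed. */
static void advance_iovec(const struct iovec **iovp, size_t *basep, size_t bytes)
{
    const struct iovec *iov = *iovp;
    size_t base = *basep;

    while (bytes) {
        size_t room = iov->iov_len - base;
        size_t copy = bytes < room ? bytes : room;

        bytes -= copy;
        base += copy;
        if (base == iov->iov_len) {   /* segment exhausted: step to the next */
            iov++;
            base = 0;
        }
    }
    *iovp = iov;
    *basep = base;
}

/* Return the next source address, or NULL when nothing remains.
 * After the last segment is consumed, iov points one past the array, so it
 * must not be dereferenced; the remaining byte count is the guard. */
static const char *next_buf(const struct iovec *iov, size_t base, size_t count)
{
    if (count == 0)
        return NULL;
    return (const char *)iov->iov_base + base;
}
```

    Passing the number of remaining segments and checking that instead would
    remove the dependence on the byte count, at the cost of one more
    parameter.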

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • CONFIG_PM_DISK is long gone, but it still managed to survive in a few
    places.

    Signed-off-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • Out-of-tree user of remap_pfn_range hit kernel BUG at mm/memory.c:1112! It
    passes an unrounded size to remap_pfn_range, which was okay before 2.6.12,
    but misses remap_pte_range's new end condition. An audit of all the other
    ptwalks confirms that this is the only one so exposed.
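
    Why an unrounded size trips an equality-based end condition can be shown
    with a toy walker. This is a sketch under assumptions: PAGE_SIZE is taken
    as 4096, pages_walked is a hypothetical function, and the step limit
    stands in for the kernel BUG.

```c
#define PAGE_SIZE     4096UL                 /* assumed page size */
#define PAGE_MASK     (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)

/* Walk [addr, addr + size) one page at a time, stopping only when addr
 * equals end. Returns the number of pages visited, or -1 if the walk ran
 * past the end (the failure case in this model). */
static long pages_walked(unsigned long addr, unsigned long size, int round_size)
{
    unsigned long end = addr + (round_size ? PAGE_ALIGN(size) : size);
    long n = 0;

    do {
        n++;
        addr += PAGE_SIZE;
        if (n > 1024)
            return -1;    /* end was never hit exactly */
    } while (addr != end);
    return n;
}
```

    With a size of 0x2800 the rounded walk visits three pages, while the
    unrounded walk steps over an end address that is not page aligned.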

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Fix a bug on error handling in the direct I/O function.

    Currently, if a file is opened with the O_DIRECT|O_SYNC flag, the write()
    syscall cannot receive the EIO error after an I/O error (SCSI cable is
    disconnected etc.).

    Return values of other points that call generic_osync_inode() are treated
    appropriately.

    Signed-off-by: Hisashi Hifumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hifumi Hisashi
     

24 Jun, 2005

19 commits

  • Make sys_madvise/fadvise return sane values with xip.

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     
  • This patch reworks filemap_xip.c with the goal to reduce code duplication
    from mm/filemap.c. It applies against 2.6.12-rc6-mm1. Instead of
    implementing the aio functions, this one implements the synchronous
    read/write functions only. For readv and writev, the generic fallback is
    used. For aio, we rely on the application doing the fallback. Since our
    "synchronous" function does memcpy immediately anyway, there is no
    performance difference between using the fallbacks or implementing each
    operation.

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     
  • - generic_file* file operations no longer have a xip/non-xip split
    - filemap_xip.c implements a new set of fops that require the get_xip_page
      aop to work properly. All new fops are exported GPL-only (don't like to
      see whatever code use those except GPL modules)
    - __xip_unmap now uses page_check_address, which is no longer static
      in rmap.c, and defined in linux/rmap.h
    - mm/filemap.h is now much cleaner, plainly having just Linus'
      inline funcs moved here from filemap.c
    - fix includes in filemap_xip to make it build cleanly on i386

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     
  • This patch updates some comments to match code changes.

    Signed-off-by: Martin Waitz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Waitz
     
  • The following patch removes the f_error field and all checks of f_error.

    Trond said:

    f_error was introduced for NFS, and made sense when we were guaranteed
    always to have a file pointer around when write errors occurred. Since
    then, we have (for various reasons) had to introduce the nfs_open_context in
    order to track the file read/write state, and it made sense to move our
    f_error tracking there too.

    Signed-off-by: Christoph Lameter
    Acked-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Here's a small patch to improve the performance of mempool_alloc by only
    initializing the wait queue when we're about to wait.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin LaHaise
     
  • This patch removes redundant VM_ClearReadHint from mm/madvise.c which was
    left there by Prasanna's patch.

    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • This patch creates a new kstrdup library function and changes the "local"
    implementations in several places to use this function.
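
    As a rough illustration of what such a helper does, here is a userspace
    model. It is a sketch, not the kernel function: model_kstrdup is a
    hypothetical name, and malloc stands in for kmalloc, so the allocation
    flags argument of the real helper is omitted.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace model of a kstrdup-style helper: duplicate a NUL-terminated
 * string into freshly allocated memory. Returns NULL for a NULL input or
 * when the allocation fails. */
static char *model_kstrdup(const char *s)
{
    size_t len;
    char *buf;

    if (!s)
        return NULL;
    len = strlen(s) + 1;          /* include the terminating NUL */
    buf = malloc(len);
    if (buf)
        memcpy(buf, s, len);
    return buf;
}
```

    Each "local" copy of this pattern replaced by one shared function is one
    less place to get the length or the NULL check wrong.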

    Most of the changes come from the sound and net subsystems. The sound part
    had already been acknowledged by Takashi Iwai and the net part by David S.
    Miller.

    I left UML alone for now because I would need more time to read the code
    carefully before making changes there.

    Signed-off-by: Paulo Marques
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paulo Marques
     
  • Patch to allocate the control structures for IDE devices on the node of
    the device itself (for NUMA systems). The patch depends on the Slab API
    change patch by Manfred and me (in mm) and the pcidev_to_node patch that I
    posted today.

    Does some realignment too.

    Signed-off-by: Justin M. Forbes
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pravin Shelar
    Signed-off-by: Shobhit Dayal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Make sparse's initialization be accessible at runtime. This allows sparse
    mappings to be created after boot in a hotplug situation.

    This patch is separated from the previous one just to give an indication how
    much of the sparse infrastructure is *just* for hotplug memory.

    The section_mem_map doesn't really store a pointer. It stores something that
    is convenient to do some math against to get a pointer. It isn't valid to
    just do *section_mem_map, so I don't think it should be stored as a pointer.

    There are a couple of things I'd like to store about a section. First of all,
    the fact that it is !NULL does not mean that it is present. There could be
    such a combination where section_mem_map *is* NULL, but the math gets you
    properly to a real mem_map. So, I don't think that check is safe.

    Since we're storing 32-bit-aligned structures, we have a few bits in the
    bottom of the pointer to play with. Use one bit to encode whether there's
    really a mem_map there, and the other one to tell whether there's a valid
    section there. We need to distinguish between the two because sometimes
    there's a gap between when a section is discovered to be present and when we
    can get the mem_map for it.
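
    The encoding described above can be sketched in userspace. This is a
    model under assumptions, not the kernel's code: struct page is a
    placeholder, the helper functions are hypothetical, and the arithmetic is
    done on integers so the biased value is never treated as a pointer.

```c
#include <stdint.h>

/* Two low bits of the stored word carry state; the rest is the biased map. */
#define SECTION_MARKED_PRESENT ((uintptr_t)1 << 0)
#define SECTION_HAS_MEM_MAP    ((uintptr_t)1 << 1)
#define SECTION_MAP_MASK       (~(SECTION_MARKED_PRESENT | SECTION_HAS_MEM_MAP))

struct page { unsigned long flags; };        /* placeholder; only size and alignment matter */
struct mem_section { uintptr_t section_mem_map; };

/* A section can be known to exist before its mem_map is available. */
static void section_set_present(struct mem_section *ms)
{
    ms->section_mem_map |= SECTION_MARKED_PRESENT;
}

/* Store (mem_map - start_pfn) as an integer, so that adding a pfn later
 * yields the page directly. The result is not a dereferenceable pointer. */
static void section_set_map(struct mem_section *ms, struct page *mem_map,
                            unsigned long start_pfn)
{
    uintptr_t coded = (uintptr_t)mem_map
                    - (uintptr_t)start_pfn * sizeof(struct page);

    ms->section_mem_map = (ms->section_mem_map & ~SECTION_MAP_MASK)
                        | coded | SECTION_HAS_MEM_MAP;
}

static int section_present(const struct mem_section *ms)
{
    return (ms->section_mem_map & SECTION_MARKED_PRESENT) != 0;
}

static int section_has_map(const struct mem_section *ms)
{
    return (ms->section_mem_map & SECTION_HAS_MEM_MAP) != 0;
}

/* Strip the state bits, then apply the pfn offset to recover the page. */
static struct page *section_pfn_to_page(const struct mem_section *ms,
                                        unsigned long pfn)
{
    uintptr_t base = ms->section_mem_map & SECTION_MAP_MASK;

    return (struct page *)(base + (uintptr_t)pfn * sizeof(struct page));
}
```

    Testing the explicit bits rather than comparing the stored value with
    NULL avoids the case where the biased value happens to be zero for a
    perfectly valid mem_map.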

    Signed-off-by: Dave Hansen
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Jack Steiner
    Signed-off-by: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • The part of the sparsemem patch which modifies memmap_init_zone() has recently
    become a problem. It changes behavior so that there is a call to
    pfn_to_page() for each individual page inside of a node's range:
    node_start_pfn through node_end_pfn. It used to simply do this once, at the
    beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside
    of a node made it necessary to change.

    Mike Kravetz recently wrote a patch which made the NUMA code accept some new
    kinds of layouts. The system's memory was laid out like this, with node 0's
    memory in two pieces: one before and one after node 1's memory:

    Node 0: +++++     +++++
    Node 1:      +++++

    Previous behavior before Mike's patch was to assign nodes like this:

    Node 0: 00000     XXXXX
    Node 1:      11111

    Where the 'X' areas were simply thrown away. The new behavior was to make the
    pg_data_t span node 0 across all of its areas, including areas that are really
    node 1's:

    Node 0: 000000000000000
    Node 1:      11111

    This wastes a little bit of mem_map space, but ends up being OK, and more
    fully utilizes the system's memory. memmap_init_zone() initializes all of the
    "struct page"s for node 0, even for the "hole", but those never get used,
    because there is no pfn_to_page() that resolves to those pages. However, only
    calling pfn_to_page() once, memmap_init_zone() always uses the pages that were
    allocated for node0->node_mem_map because:

    struct page *start = pfn_to_page(start_pfn);
    /* effectively start = &node->node_mem_map[0] */
    for (page = start; page < (start + size); page++) {
            init_page_here(); ...
    }

    Slow, and wasteful, but generally harmless.

    But, modify that to call pfn_to_page() for each loop iteration (like sparsemem
    does):

    for (pfn = start_pfn; pfn < (start_pfn + size); pfn++) {
            page = pfn_to_page(pfn);
    }

    And you end up trying to initialize node 1's pages too early, along with bogus
    data from node 0. This patch checks for those weird layouts and declines to
    touch the pages, making the more frequent pfn_to_page() calls OK to do.
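
    The idea of declining pages that belong to another node can be modelled
    with a toy layout. This is a sketch under assumptions: the pfn ranges,
    pfn_nid, page_owner, reset_owners and init_node_span are all
    hypothetical, and the real check in the patch may look different.

```c
#define NR_PFNS 15

/* Hypothetical layout matching the diagram above:
 * node 0 owns pfns 0-4 and 10-14, node 1 owns pfns 5-9. */
static int pfn_nid(unsigned long pfn)
{
    return (pfn >= 5 && pfn <= 9) ? 1 : 0;
}

/* Which node initialised each page; -1 means untouched. */
static int page_owner[NR_PFNS];

static void reset_owners(void)
{
    unsigned long pfn;

    for (pfn = 0; pfn < NR_PFNS; pfn++)
        page_owner[pfn] = -1;
}

/* Initialise every page in a node's span, skipping pfns that fall inside
 * the span but really belong to a different node. */
static void init_node_span(int nid, unsigned long start_pfn, unsigned long size)
{
    unsigned long pfn;

    for (pfn = start_pfn; pfn < start_pfn + size; pfn++) {
        if (pfn_nid(pfn) != nid)
            continue;             /* another node's memory: leave it alone */
        page_owner[pfn] = nid;
    }
}
```

    Initialising node 0 over its whole span of 15 pfns then leaves pfns 5-9
    untouched until node 1's own pass reaches them.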

    Signed-off-by: Dave Hansen
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
    mem_map[] is needed by discontiguous memory machines (like in the old
    CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
    replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
    become a complete replacement.

    A significant advantage over DISCONTIGMEM is that it's completely separated
    from CONFIG_NUMA. When producing this patch, it became apparent that NUMA
    and DISCONTIG are often confused.

    Another advantage is that sparse doesn't require each NUMA node's ranges to be
    contiguous. It can handle overlapping ranges between nodes with no problems,
    where DISCONTIGMEM currently throws away that memory.

    Sparsemem uses an array to provide different pfn_to_page() translations for
    each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
    to be chopped up.
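
    A per-section lookup of this kind can be sketched as follows. It is a
    simplified model, not the sparsemem implementation: the section size of
    16 pages, the array section_map and the function model_pfn_to_page are
    assumptions chosen to keep the example small.

```c
#include <stddef.h>

#define PFN_SECTION_SHIFT 4                       /* assumed: 16 pages per section */
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
#define NR_SECTIONS       8

struct page { unsigned long flags; };             /* placeholder */

/* One mem_map chunk per section; NULL means no memory in that section. */
static struct page *section_map[NR_SECTIONS];

/* Translate a pfn by picking its section first, then indexing into that
 * section's own chunk of the mem_map. */
static struct page *model_pfn_to_page(unsigned long pfn)
{
    unsigned long nr = pfn >> PFN_SECTION_SHIFT;

    if (nr >= NR_SECTIONS || !section_map[nr])
        return NULL;
    return &section_map[nr][pfn & (PAGES_PER_SECTION - 1)];
}
```

    Because each section carries its own base, the chunks need not be
    adjacent in memory, and sections with no memory cost only one array
    slot.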

    In order to do quick pfn_to_page() operations, the section number of the page
    is encoded in page->flags. Part of the sparsemem infrastructure enables
    sharing of these bits more dynamically (at compile-time) between the
    page_zone() and sparsemem operations. However, on 32-bit architectures, the
    number of bits is quite limited, and may require growing the size of the
    page->flags type in certain conditions. Several things might force this to
    occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
    memory), an increase in the physical address space, or an increase in the
    number of used page->flags.

    One thing to note is that, once sparsemem is present, the NUMA node
    information no longer needs to be stored in the page->flags. It might provide
    speed increases on certain platforms and will be stored there if there is
    room. But, if out of room, an alternate (theoretically slower) mechanism is
    used.

    This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
    there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
    often have to compile out the same areas of code.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Martin Bligh
    Signed-off-by: Adrian Bunk
    Signed-off-by: Yasunori Goto
    Signed-off-by: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Allow architectures to indicate that they will be providing hooks to indicate
    installed memory areas, memory_present(). Provide prototypes for the i386
    implementation.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • This gives DISCONTIGMEM a bit more help text to explain what it does, not just
    when to choose it.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • I got some feedback from users who think that the new "Memory Model" menu is a
    little invasive. This patch will hide that menu, except when
    CONFIG_EXPERIMENTAL is enabled *or* when an individual architecture wants it.

    An individual arch may want to enable it because they've removed their
    arch-specific DISCONTIG prompt in favor of the mm/Kconfig one.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • The following patch applies on top of 2.6.12-rc2-mm1. It fixes a minor
    user interaction issue, and an early reference to SPARSEMEM.

    This "choice" menu would always default to FLATMEM, as it was listed first.
    Move it to the end so that the other defaults have a chance first.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • Some confusion arose when working on the SPARSEMEM patch between
    what is needed for DISCONTIG vs. NUMA.

    Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently.
    All of the current NUMA implementations require an implementation of
    DISCONTIG. Because of this, quite a lot of code which is really needed for
    NUMA is actually under DISCONTIG #ifdefs. For SPARSEMEM, we changed some
    of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n
    case.

    Introducing this new NEED_MULTIPLE_NODES config option allows code that is
    needed for either NUMA or DISCONTIG to be separated out from code that is
    specific to DISCONTIG.

    One great advantage of this approach is that it doesn't require every
    architecture to be converted over. All of the current implementations
    should "just work", only the ones implementing SPARSEMEM will have to be
    fixed up.

    The change to free_area_init() makes it work inside, or out of the new
    config option.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • With sparsemem being introduced, we need a central place for new
    memory-related .config options: mm/Kconfig. This allows us to remove many
    of the duplicated arch-specific options.

    The new option, CONFIG_FLATMEM, is there to enable us to detangle NUMA and
    DISCONTIGMEM. This is a requirement for sparsemem because sparsemem uses
    the NUMA code without the presence of DISCONTIGMEM. The sparsemem patches
    use CONFIG_FLATMEM in generic code, so this patch is a requirement before
    applying them.

    Almost all places that used to do '#ifndef CONFIG_DISCONTIGMEM' should use
    '#ifdef CONFIG_FLATMEM' instead.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • Generify the value fields in the page_flags. The aim is to allow the location
    and size of these fields to be varied. Additionally we want to move away from
    fixed allocations per field whilst still enforcing the overall bit utilisation
    limits. We rely on the compiler to spot and optimise the accessor functions.
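
    One way to picture generic value fields is a set of widths from which
    shifts and masks are derived, plus a compile-time budget check. The
    sketch below is an illustration under assumptions: the widths, the
    32-bit flags word, FLAGS_RESERVED = 20 and the accessor names are chosen
    for the example and are not taken from the patch.

```c
#include <stdint.h>

#define FLAGS_BITS     32
#define FLAGS_RESERVED 20   /* assumed: low bits kept for ordinary page flags */
#define NODES_WIDTH    4    /* assumed field widths */
#define ZONES_WIDTH    2

/* Positions are derived from the widths, so changing a width moves the
 * fields without touching the accessors. */
#define NODES_PGSHIFT  (FLAGS_BITS - NODES_WIDTH)
#define ZONES_PGSHIFT  (NODES_PGSHIFT - ZONES_WIDTH)
#define NODES_MASK     ((UINT32_C(1) << NODES_WIDTH) - 1)
#define ZONES_MASK     ((UINT32_C(1) << ZONES_WIDTH) - 1)

/* Fails to compile (negative array size) if the fields exceed the budget. */
typedef char field_budget_check[
    (NODES_WIDTH + ZONES_WIDTH <= FLAGS_BITS - FLAGS_RESERVED) ? 1 : -1];

static void set_field(uint32_t *flags, unsigned shift, uint32_t mask, uint32_t val)
{
    *flags = (*flags & ~(mask << shift)) | ((val & mask) << shift);
}

static uint32_t get_field(uint32_t flags, unsigned shift, uint32_t mask)
{
    return (flags >> shift) & mask;
}

static void set_page_node(uint32_t *flags, uint32_t nid)
{
    set_field(flags, NODES_PGSHIFT, NODES_MASK, nid);
}

static void set_page_zone(uint32_t *flags, uint32_t zone)
{
    set_field(flags, ZONES_PGSHIFT, ZONES_MASK, zone);
}

static uint32_t page_node(uint32_t flags)
{
    return get_field(flags, NODES_PGSHIFT, NODES_MASK);
}

static uint32_t page_zone_nr(uint32_t flags)
{
    return get_field(flags, ZONES_PGSHIFT, ZONES_MASK);
}
```

    Since every shift and mask is a compile-time constant, an optimising
    compiler can typically reduce each accessor to a shift and an AND.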

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen