11 Dec, 2006

7 commits

  • This patch introduces users of the round_jiffies() function in the slab code.

    The slab code has a few "run every second" timers for background work; these
    are obviously not timing critical as long as they happen roughly at the right
    frequency.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • The only time it is safe to call aio_complete() is when the ->ki_retry
    function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has
    historically done this by relying on its caller to translate positive return
    codes into -EIOCBQUEUED for the aio case. It did this by trying to keep
    conditionals in sync. direct_io_worker() knew when finished_one_bio() was
    going to call aio_complete(). It would reverse the test and wait and free the
    dio in the cases it thought that finished_one_bio() wasn't going to.

    Not surprisingly, it ended up getting it wrong. 'ret' could be a negative
    errno from the submission path but it failed to communicate this to
    finished_one_bio(). direct_io_worker() would return < 0, it's callers
    wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the
    future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
    would be called for a second time which can manifest as an oops.

    The previous cleanups have whittled the sync and async completion paths down
    to the point where we can collapse them and clearly reassert the invariant
    that we must only call aio_complete() after returning -EIOCBQUEUED.
    direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
    drop the dio refcount and the aio bio completion path will only call
    aio_complete() when it is the last to drop the dio refcount.
    direct_io_worker() can ensure that it is the last to drop the reference count
    by waiting for bios to drain. It does this for sync ops, of course, and for
    partial dio writes that must fall back to buffered and for aio ops that saw
    errors during submission.

    This means that operations that end up waiting, even if they were issued as
    aio ops, will not call aio_complete() from dio. Instead we return the return
    code of the operation and let the aio core call aio_complete(). This is
    purposely done to fix a bug where AIO DIO file extensions would call
    aio_complete() before their callers have a chance to update i_size.

    Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
    no longer have to translate for it. XFS needs to be careful not to free
    resources that will be used during AIO completion if -EIOCBQUEUED is returned.
    We maintain the previous behaviour of trying to write fs metadata for O_SYNC
    aio+dio writes.

    Signed-off-by: Zach Brown
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Acked-by: Jeff Moyer
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • nfs's ->readpages uses read_cache_pages(). Wire it up there.

    [wfg@mail.ustc.edu.cn: account only successful nfs/fuse reads]
    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Account for the number of byte writes which this process caused to not happen
    after all.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Accounting writes is fairly simple: whenever a process flips a page from clean
    to dirty, we accuse it of having caused a write to underlying storage of
    PAGE_CACHE_SIZE bytes.

    This may overestimate the amount of writing: the page-dirtying may cause only
    one buffer_head's worth of writeout. Fixing that is possible, but probably a
    bit messy and isn't obviously important.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Save a tabstop in __set_page_dirty_nobuffers() and __set_page_dirty_buffers()
    and a few other places. No functional changes.

    Cc: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Cc: David Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range: kernel
    bugzilla 7645. Right: read_zero_pagealigned uses down_read of mmap_sem,
    but another thread's racing read of /dev/zero, or a normal fault, can
    easily set that pte again, in between zap_page_range and zeromap_page_range
    getting there. It's been wrong ever since 2.4.3.

    The simple fix is to use down_write instead, but that would serialize reads
    of /dev/zero more than at present: perhaps some app would be badly
    affected. So instead let zeromap_page_range return the error instead of
    BUG_ON, and read_zero_pagealigned break to the slower clear_user loop in
    that case - there's no need to optimize for it.

    Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other user of
    zeromap_page_range), though it really isn't interesting there. And since
    mmap_zero wants -EAGAIN for out-of-memory, the zeromaps better return that
    than -ENOMEM.

    Signed-off-by: Hugh Dickins
    Cc: Ramiro Voicu:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

09 Dec, 2006

7 commits

  • Assign defaults most likely to please a new user:
    1) generate some logging output
    (verbose=2)
    2) avoid injecting failures likely to lock up UI
    (ignore_gfp_wait=1, ignore_gfp_highmem=1)

    Signed-off-by: Don Mullis
    Cc: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Don Mullis
     
  • This patch provides fault-injection capability for alloc_pages()

    Boot option:

    fail_page_alloc=,,,

    -- specifies the interval of failures.

    -- specifies how often it should fail in percent.

    -- specifies the size of free space where memory can be
    allocated safely in pages.

    -- specifies how many times failures may happen at most.

    Debugfs:

    /debug/fail_page_alloc/interval
    /debug/fail_page_alloc/probability
    /debug/fail_page_alloc/specifies
    /debug/fail_page_alloc/times
    /debug/fail_page_alloc/ignore-gfp-highmem
    /debug/fail_page_alloc/ignore-gfp-wait

    Example:

    fail_page_alloc=10,100,0,-1

    The page allocation (alloc_pages(), ...) fails once per 10 times.

    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This patch provides fault-injection capability for kmalloc.

    Boot option:

    failslab=,,,

    -- specifies the interval of failures.

    -- specifies how often it should fail in percent.

    -- specifies the size of free space where memory can be
    allocated safely in bytes.

    -- specifies how many times failures may happen at most.

    Debugfs:

    /debug/failslab/interval
    /debug/failslab/probability
    /debug/failslab/specifies
    /debug/failslab/times
    /debug/failslab/ignore-gfp-highmem
    /debug/failslab/ignore-gfp-wait

    Example:

    failslab=10,100,0,-1

    slab allocation (kmalloc(), kmem_cache_alloc(),..) fails once per 10 times.

    Cc: Pekka Enberg
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • This facility provides three entry points:

    ilog2() Log base 2 of unsigned long
    ilog2_u32() Log base 2 of u32
    ilog2_u64() Log base 2 of u64

    These facilities can either be used inside functions on dynamic data:

    int do_something(long q)
    {
    ...;
    y = ilog2(x)
    ...;
    }

    Or can be used to statically initialise global variables with constant values:

    unsigned n = ilog2(27);

    When performing static initialisation, the compiler will report "error:
    initializer element is not constant" if asked to take a log of zero or of
    something not reducible to a constant. They treat negative numbers as
    unsigned.

    When not dealing with a constant, they fall back to using fls() which permits
    them to use arch-specific log calculation instructions - such as BSR on
    x86/x86_64 or SCAN on FRV - if available.

    [akpm@osdl.org: MMC fix]
    Signed-off-by: David Howells
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Herbert Xu
    Cc: David Howells
    Cc: Wojtek Kaniewski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Signed-off-by: Josef Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Sipek
     
  • Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in linux/mm/.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     
  • fallback_alloc() could end up calling cpuset_zone_allowed() with interrupts
    disabled (by code in kmem_cache_alloc_node()), but without __GFP_HARDWALL
    set, leading to a possible call of a sleeping function with interrupts
    disabled.

    This results in the BUG report:

    BUG: sleeping function called from invalid context at kernel/cpuset.c:1520
    in_atomic():0, irqs_disabled():1

    Thanks to Paul Menage for catching this one.

    Signed-off-by: Paul Jackson
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

08 Dec, 2006

26 commits

  • - move some file_operations structs into the .rodata section

    - move static strings from policy_types[] array into the .rodata section

    - fix generic seq_operations usages, so that those structs may be defined
    as "const" as well

    [akpm@osdl.org: couple of fixes]
    Signed-off-by: Helge Deller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Helge Deller
     
  • In time for 2.6.20, we can get rid of this junk.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Burman Yan
     
  • Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
    prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
    generating compiler warnings of unused symbols, hence forcing people to add
    #ifdefs.

    the compiler can skip truly unused functions just fine:

    text data bss dec hex filename
    1624412 728710 3674856 6027978 5bfaca vmlinux.before
    1624412 728710 3674856 6027978 5bfaca vmlinux.after

    [akpm@osdl.org: topology.c fix]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • It has no users and it's doubtful that we'll need it again.

    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Use put_pages_list() instead of opencoding it.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Move process freezing functions from include/linux/sched.h to freezer.h, so
    that modifications to the freezer or the kernel configuration don't require
    recompiling just about everything.

    [akpm@osdl.org: fix ueagle driver]
    Signed-off-by: Nigel Cunningham
    Cc: "Rafael J. Wysocki"
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nigel Cunningham
     
  • Currently swsusp saves the contents of highmem pages by copying them to the
    normal zone which is quite inefficient (eg. it requires two normal pages
    to be used for saving one highmem page). This may be improved by using
    highmem for saving the contents of saveable highmem pages.

    Namely, during the suspend phase of the suspend-resume cycle we try to
    allocate as many free highmem pages as there are saveable highmem pages.
    If there are not enough highmem image pages to store the contents of all of
    the saveable highmem pages, some of them will be stored in the "normal"
    memory. Next, we allocate as many free "normal" pages as needed to store
    the (remaining) image data. We use a memory bitmap to mark the allocated
    free pages (ie. highmem as well as "normal" image pages).

    Now, we use another memory bitmap to mark all of the saveable pages
    (highmem as well as "normal") and the contents of the saveable pages are
    copied into the image pages. Then, the second bitmap is used to save the
    pfns corresponding to the saveable pages and the first one is used to save
    their data.

    During the resume phase the pfns of the pages that were saveable during the
    suspend are loaded from the image and used to mark the "unsafe" page
    frames. Next, we try to allocate as many free highmem page frames as to
    load all of the image data that had been in the highmem before the suspend
    and we allocate so many free "normal" page frames that the total number of
    allocated free pages (highmem and "normal") is equal to the size of the
    image. While doing this we have to make sure that there will be some extra
    free "normal" and "safe" page frames for two lists of PBEs constructed
    later.

    Now, the image data are loaded, if possible, into their "original" page
    frames. The image data that cannot be written into their "original" page
    frames are loaded into "safe" page frames and their "original" kernel
    virtual addresses, as well as the addresses of the "safe" pages containing
    their copies, are stored in one of two lists of PBEs.

    One list of PBEs is for the copies of "normal" suspend pages (ie. "normal"
    pages that were saveable during the suspend) and it is used in the same way
    as previously (ie. by the architecture-dependent parts of swsusp). The
    other list of PBEs is for the copies of highmem suspend pages. The pages
    in this list are restored (in a reversible way) right before the
    arch-dependent code is called.

    Signed-off-by: Rafael J. Wysocki
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Make swsusp use block device offsets instead of swap offsets to identify swap
    locations and make it use the same code paths for writing as well as for
    reading data.

    This allows us to use the same code for handling swap files and swap
    partitions and to simplify the code, eg. by dropping rw_swap_page_sync().

    Signed-off-by: Rafael J. Wysocki
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • The Linux kernel handles swap files almost in the same way as it handles swap
    partitions and there are only two differences between these two types of swap
    areas:

    (1) swap files need not be contiguous,

    (2) the header of a swap file is not in the first block of the partition
    that holds it. From the swsusp's point of view (1) is not a problem,
    because it is already taken care of by the swap-handling code, but (2) has
    to be taken into consideration.

    In principle the location of a swap file's header may be determined with the
    help of appropriate filesystem driver. Unfortunately, however, it requires
    the filesystem holding the swap file to be mounted, and if this filesystem is
    journaled, it cannot be mounted during a resume from disk. For this reason we
    need some other means by which swap areas can be identified.

    For example, to identify a swap area we can use the partition that holds the
    area and the offset from the beginning of this partition at which the swap
    header is located.

    The following patch allows swsusp to identify swap areas this way. It changes
    swap_type_of() so that it takes an additional argument representing an offset
    of the swap header within the partition represented by its first argument.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Make radix tree lookups safe to be performed without locks. Readers are
    protected against nodes being deleted by using RCU based freeing. Readers
    are protected against new node insertion by using memory barriers to ensure
    the node itself will be properly written before it is visible in the radix
    tree.

    Each radix tree node keeps a record of their height (above leaf nodes).
    This height does not change after insertion -- when the radix tree is
    extended, higher nodes are only inserted in the top. So a lookup can take
    the pointer to what is *now* the root node, and traverse down it even if
    the tree is concurrently extended and this node becomes a subtree of a new
    root.

    "Direct" pointers (tree height of 0, where root->rnode points directly to
    the data item) are handled by using the low bit of the pointer to signal
    whether rnode is a direct pointer or a pointer to a radix tree node.

    When a reader wants to traverse the next branch, they will take a copy of
    the pointer. This pointer will be either NULL (and the branch is empty) or
    non-NULL (and will point to a valid node).

    [akpm@osdl.org: cleanups]
    [Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications]
    [clameter@sgi.com: build fix]
    Signed-off-by: Nick Piggin
    Cc: "Paul E. McKenney"
    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Currently we we use the lru head link of the second page of a compound page
    to hold its destructor. This was ok when it was purely an internal
    implmentation detail. However, hugetlbfs overrides this destructor
    violating the layering. Abstract this out as explicit calls, also
    introduce a type for the callback function allowing them to be type
    checked. For each callback we pre-declare the function, causing a type
    error on definition rather than on use elsewhere.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Currently we simply attempt to allocate from all allowed nodes using
    GFP_THISNODE. However, GFP_THISNODE does not do reclaim (it wont do any at
    all if the recent GFP_THISNODE patch is accepted). If we truly run out of
    memory in the whole system then fallback_alloc may return NULL although
    memory may still be available if we would perform more thorough reclaim.

    This patch changes fallback_alloc() so that we first only inspect all the
    per node queues for available slabs. If we find any then we allocate from
    those. This avoids slab fragmentation by first getting rid of all partial
    allocated slabs on every node before allocating new memory.

    If we cannot satisfy the allocation from any per node queue then we extend
    a slab. We now call into the page allocator without specifying
    GFP_THISNODE. The page allocator will then implement its own fallback (in
    the given cpuset context), perform necessary reclaim (again considering not
    a single node but the whole set of allowed nodes) and then return pages for
    a new slab.

    We identify from which node the pages were allocated and then insert the
    pages into the corresponding per node structure. In order to do so we need
    to modify cache_grow() to take a parameter that specifies the new slab.
    kmem_getpages() can no longer set the GFP_THISNODE flag since we need to be
    able to use kmem_getpage to allocate from an arbitrary node. GFP_THISNODE
    needs to be specified when calling cache_grow().

    One key advantage is that the decision from which node to allocate new
    memory is removed from slab fallback processing. The patch allows to go
    back to use of the page allocators fallback/reclaim logic.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The intent of GFP_THISNODE is to make sure that an allocation occurs on a
    particular node. If this is not possible then NULL needs to be returned so
    that the caller can choose what to do next on its own (the slab allocator
    depends on that).

    However, GFP_THISNODE currently triggers reclaim before returning a failure
    (GFP_THISNODE means GFP_NORETRY is set). If we have over allocated a node
    then we will currently do some reclaim before returning NULL. The caller
    may want memory from other nodes before reclaim should be triggered. (If
    the caller wants reclaim then he can directly use __GFP_THISNODE instead).

    There is no flag to avoid reclaim in the page allocator and adding yet
    another GFP_xx flag would be difficult given that we are out of available
    flags.

    So just compare and see if all bits for GFP_THISNODE (__GFP_THISNODE,
    __GFP_NORETRY and __GFP_NOWARN) are set. If so then we return NULL before
    waking up kswapd.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This addresses two issues:

    1. Kmalloc_node() may intermittently return NULL if we are allocating
    from the current node and are unable to obtain memory for the current
    node from the page allocator. This is because we call ___cache_alloc()
    if nodeid == numa_node_id() and ____cache_alloc is not able to fallback
    to other nodes.

    This was introduced in the 2.6.19 development cycle.
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_DMA is an alias of GFP_DMA. This is the last one so we
    remove the leftover comment too.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_LEVEL_MASK is only used internally to the slab and is
    and alias of GFP_LEVEL_MASK.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • It is only used internally in the slab.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • David Binderman and his Intel C compiler rightly observe that
    install_file_pte no longer has any use for its pte_val.

    Signed-off-by: Hugh Dickins
    Cc: d binderman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • These patches introduced new switch statements which are indented contrary
    to the concensus in mm/*.c. Fix them up to match that concensus.

    [PATCH] node local per-cpu-pages
    [PATCH] ZVC: Scale thresholds depending on the size of the system
    commit e7c8d5c9955a4d2e88e36b640563f5d6d5aba48a
    commit df9ecaba3f152d1ea79f2a5e0b87505e03f47590

    Signed-off-by: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • The fsfuzzer found this; with a corrupt small swapfile that claims to have
    many pages:

    [root]# file swap.741.img
    swap.741.img: Linux/i386 swap file (new style) 1 (4K pages) size 1040191487 pages
    [root]# ls -l swap.741.img
    -rw-r--r-- 1 root root 16777216 Nov 22 05:18 swap.741.img

    sys_swapon() will try to vmalloc all those pages, and -then- check to see if
    the file is actually that large:

    if (!(p->swap_map = vmalloc(maxpages * sizeof(short)))) {

    if (swapfilesize && maxpages > swapfilesize) {
    printk(KERN_WARNING
    "Swap area shorter than signature indicates\n");

    It seems to me that it would make more sense to move this test up before
    the vmalloc, with the other checks, to avoid the OOM-killer in this
    situation...

    Signed-off-by: Eric Sandeen
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • NUMA node ids are passed as either int or unsigned int almost exclusivly
    page_to_nid and zone_to_nid both return unsigned long. This is a throw
    back to when page_to_nid was a #define and was thus exposing the real type
    of the page flags field.

    In addition to fixing up the definitions of page_to_nid and zone_to_nid I
    audited the users of these functions identifying the following incorrect
    uses:

    1) mm/page_alloc.c show_node() -- printk dumping the node id,
    2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison
    against numa_node_id() which returns an int from cpu_to_node(), and
    3) mm/mpolicy.c check_pte_range -- used as an index in node_isset which
    uses bit_set which in generic code takes an int.

    Signed-off-by: Andy Whitcroft
    Cc: Christoph Lameter
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • drain_node_pages() currently drains the complete pageset of all pages. If
    there are a large number of pages in the queues then we may hold off
    interrupts for too long.

    Duplicate the method used in free_hot_cold_page. Only drain pcp->batch
    pages at one time.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter