09 Sep, 2005

1 commit

  • Run PCI driver initialization on local node

    Instead of adding messy kmalloc_node() calls everywhere, run the
    PCI driver probe on the node local to the device.

    This would not have helped for IDE, but it should help
    other, cleaner drivers that do more of their initialization in probe().
    It won't help drivers that do most of their work
    on first open (like many network drivers).
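
    A user-space analogue of the idea, not the kernel code from this commit:
    pin the current process to the CPUs of one node while some initialization
    runs, then restore the old affinity. The helper names (parse_cpulist,
    node_cpus, run_on_node) are hypothetical; the sketch assumes a Linux
    system that exposes /sys/devices/system/node/node<N>/cpulist.

```python
import os
from contextlib import contextmanager


def parse_cpulist(text):
    """Parse a sysfs cpulist string such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single CPU ("8") has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def node_cpus(node):
    """Read the CPUs that belong to a NUMA node (assumed sysfs layout)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


@contextmanager
def run_on_node(node):
    """Hypothetical helper: run the enclosed block on the node's CPUs."""
    old = os.sched_getaffinity(0)
    os.sched_setaffinity(0, node_cpus(node))
    try:
        yield
    finally:
        # Always restore the caller's original affinity.
        os.sched_setaffinity(0, old)
```

    Usage would look like "with run_on_node(1): init_device()", where
    init_device is whatever setup should allocate its memory node-locally.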

    Signed-off-by: Andi Kleen
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     

05 Sep, 2005

1 commit

  • This patch was recently discussed on linux-mm:
    http://marc.theaimsgroup.com/?t=112085728500002&r=1&w=2

    I inherited a large code base from Ray for page migration. There was a
    small patch in there that I find very useful, since it displays the
    locality of the pages in use by a process. I reworked that
    patch and came up with a /proc/<pid>/numa_maps that gives more information
    about the vmas of a process. numa_maps is indexed by the start address
    found in /proc/<pid>/maps. For example, with this patch you can see the page use
    of the "getty" process:

    margin:/proc/12008 # cat maps
    00000000-00004000 r--p 00000000 00:00 0
    2000000000000000-200000000002c000 r-xp 00000000 08:04 516 /lib/ld-2.3.3.so
    2000000000038000-2000000000040000 rw-p 00028000 08:04 516 /lib/ld-2.3.3.so
    2000000000040000-2000000000044000 rw-p 2000000000040000 00:00 0
    2000000000058000-2000000000260000 r-xp 00000000 08:04 54707842 /lib/tls/libc.so.6.1
    2000000000260000-2000000000268000 ---p 00208000 08:04 54707842 /lib/tls/libc.so.6.1
    2000000000268000-2000000000274000 rw-p 00200000 08:04 54707842 /lib/tls/libc.so.6.1
    2000000000274000-2000000000280000 rw-p 2000000000274000 00:00 0
    2000000000280000-20000000002b4000 r--p 00000000 08:04 9126923 /usr/lib/locale/en_US.utf8/LC_CTYPE
    2000000000300000-2000000000308000 r--s 00000000 08:04 60071467 /usr/lib/gconv/gconv-modules.cache
    2000000000318000-2000000000328000 rw-p 2000000000318000 00:00 0
    4000000000000000-4000000000008000 r-xp 00000000 08:04 29576399 /sbin/mingetty
    6000000000004000-6000000000008000 rw-p 00004000 08:04 29576399 /sbin/mingetty
    6000000000008000-600000000002c000 rw-p 6000000000008000 00:00 0 [heap]
    60000fff7fffc000-60000fff80000000 rw-p 60000fff7fffc000 00:00 0
    60000ffffff44000-60000ffffff98000 rw-p 60000ffffff44000 00:00 0 [stack]
    a000000000000000-a000000000020000 ---p 00000000 00:00 0 [vdso]

    cat numa_maps
    2000000000000000 default MaxRef=43 Pages=11 Mapped=11 N0=4 N1=3 N2=2 N3=2
    2000000000038000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
    2000000000040000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
    2000000000058000 default MaxRef=43 Pages=61 Mapped=61 N0=14 N1=15 N2=16 N3=16
    2000000000268000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
    2000000000274000 default MaxRef=1 Pages=3 Mapped=3 Anon=3 N0=3
    2000000000280000 default MaxRef=8 Pages=3 Mapped=3 N0=3
    2000000000300000 default MaxRef=8 Pages=2 Mapped=2 N0=2
    2000000000318000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N2=1
    4000000000000000 default MaxRef=6 Pages=2 Mapped=2 N1=2
    6000000000004000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
    6000000000008000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
    60000fff7fffc000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
    60000ffffff44000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1

    getty uses ld.so. The first vma is the code segment which is used by 43
    other processes and the pages are evenly distributed over the 4 nodes.

    The second vma is the process specific data portion for ld.so. This is
    only one page.

    The display format is:

    <start address>   Links to information in /proc/<pid>/maps
    <memory policy>   This can be "default", "interleave={}", "prefer=" or "bind={}"
    MaxRef=
    Pages=
    Mapped=
    Anon=
    Nx=

    The content of the proc file is self-evident. If this were tied into
    the sparsemem system, the contents of this file would not be very
    useful.
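
    A minimal sketch of reading the format shown above, assuming each line is
    whitespace-separated, starts with a hex address and a policy string that
    contains no spaces, and is followed by key=value fields. NumaMapEntry and
    parse_numa_maps_line are hypothetical names, not part of the patch.

```python
from dataclasses import dataclass, field


@dataclass
class NumaMapEntry:
    start: int                                           # vma start address
    policy: str                                          # e.g. "default"
    counters: dict = field(default_factory=dict)         # MaxRef, Pages, ...
    pages_per_node: dict = field(default_factory=dict)   # node id -> pages


def parse_numa_maps_line(line):
    """Split one numa_maps line into its address, policy and counters."""
    fields = line.split()
    entry = NumaMapEntry(start=int(fields[0], 16), policy=fields[1])
    for item in fields[2:]:
        key, _, value = item.partition("=")
        if key.startswith("N") and key[1:].isdigit():
            # Nx=<count> fields give the page count on node x.
            entry.pages_per_node[int(key[1:])] = int(value)
        else:
            entry.counters[key] = int(value)
    return entry


sample = ("2000000000000000 default MaxRef=43 Pages=11 Mapped=11 "
          "N0=4 N1=3 N2=2 N3=2")
entry = parse_numa_maps_line(sample)
print(entry.pages_per_node)
```

    For the first line of the getty example, the per-node counts add up to
    the Pages value, which makes a handy sanity check when scripting this.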

    Signed-off-by: Christoph Lameter
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

02 Aug, 2005

1 commit

  • A kernel BUG() is triggered by a call to set_mempolicy() with a negative
    first argument. This is because the mode is declared as an int, and the
    validity check doesn't reject values < 0. Alternatively, mode could be
    declared as unsigned int or unsigned long.
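
    A small model of the problem, not the kernel source: an upper-bound-only
    check lets a negative signed mode through, while either an explicit
    lower bound or an unsigned interpretation rejects it. MPOL_MAX = 3 and
    the function names are assumptions made for illustration.

```python
MPOL_MAX = 3  # assumed highest valid policy mode, for illustration only


def mode_ok_upper_bound_only(mode):
    """Models a check that only rejects modes above MPOL_MAX."""
    return not (mode > MPOL_MAX)


def mode_ok_with_lower_bound(mode):
    """Models the fix: also reject negative values."""
    return 0 <= mode <= MPOL_MAX


def mode_ok_as_unsigned(mode):
    """Models the alternative: treat mode as a 32-bit unsigned value."""
    as_unsigned = mode & 0xFFFFFFFF  # -1 becomes 0xFFFFFFFF
    return as_unsigned <= MPOL_MAX


for mode in (-1, 0, MPOL_MAX, MPOL_MAX + 1):
    print(mode,
          mode_ok_upper_bound_only(mode),
          mode_ok_with_lower_bound(mode),
          mode_ok_as_unsigned(mode))
```

    Only the first variant accepts -1; all three agree on non-negative
    modes.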

    Signed-off-by: Eric Dumazet
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

28 Jul, 2005

1 commit

  • All mempolicy changes must be made inside the spinlock, and re-adding the
    rb_erase prevents a crash while doing:

    > echo "1" > /tmp/numatest
    > numactl --length=0x4000 --shm /tmp/numatest --localalloc
    > numactl --length=0x2000 --offset=0 --shm /tmp/numatest --membind=0
    > numactl --length=0x2000 --offset=0x2000 --shm /tmp/numatest --membind=1
    > ipcs
    > ipcrm -M "the_key_value_of_this_shm_area"

    Based on a patch by John Blackwood

    Cc:
    Cc:
    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

22 Jun, 2005

3 commits

  • Strict mbind's check that currently mapped pages are on the right node has
    been using a slow loop which re-evaluates pgd, pud, pmd and pte for each
    entry: replace that with a standard four-level page table walk like others
    in mm. Since mmap_sem is held for writing, page_table_lock can be taken at
    the inner level to limit latency.
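
    A toy model of why the nested walk is cheaper, using nested Python lists
    in place of real page tables. FANOUT and all function names are
    hypothetical; locking is not modelled. The per-entry variant descends
    from the top for every leaf, the nested variant looks up each upper-level
    entry once.

```python
from itertools import product

FANOUT = 4  # hypothetical number of entries per table level


def build_tables(node=0):
    """Four nested lists standing in for pgd -> pud -> pmd -> pte."""
    return [[[[node for _ in range(FANOUT)] for _ in range(FANOUT)]
             for _ in range(FANOUT)] for _ in range(FANOUT)]


def check_per_entry(pgd, allowed):
    """Re-walk all four levels from the top for every leaf entry."""
    lookups = 0
    for a, b, c, d in product(range(FANOUT), repeat=4):
        node = pgd[a][b][c][d]
        lookups += 4  # one lookup per level
        if node not in allowed:
            return False, lookups
    return True, lookups


def check_nested_walk(pgd, allowed):
    """Walk level by level, visiting each upper-level entry only once."""
    lookups = 0
    for pud in pgd:
        lookups += 1
        for pmd in pud:
            lookups += 1
            for pte_table in pmd:
                lookups += 1
                for node in pte_table:
                    lookups += 1
                    if node not in allowed:
                        return False, lookups
    return True, lookups


pgd = build_tables(node=0)
print(check_per_entry(pgd, {0}))    # (True, 1024)
print(check_nested_walk(pgd, {0}))  # (True, 340)
```

    Both variants reach the same verdict; the nested walk just gets there
    with roughly a third of the lookups in this toy configuration.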

    Signed-off-by: Hugh Dickins
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Strict mbind's check that pages already mapped are on right node has been
    using pte_page without checking if pfn_valid, and without page_table_lock
    to prevent spurious failures when try_to_unmap_one intervenes between the
    pte_present and the pte_page.

    Signed-off-by: Hugh Dickins
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • This patch modifies the way pagesets in struct zone are managed.

    Each zone has a per-cpu array of pagesets, so any particular CPU has some
    memory in each zone structure which belongs to itself, even if that CPU is
    not local to that zone.

    So the patch relocates the pagesets for each cpu to the node that is nearest
    to the cpu instead of allocating the pagesets in the (possibly remote) target
    zone. This means that the operations to manage pages on a remote zone can be
    done with information available locally.

    We play a macro trick so that non-NUMA machines avoid the additional
    pointer chase on the page allocator fastpath.

    AIM7 benchmark on a 32 CPU SGI Altix

    w/o patches:
    Tasks  jobs/min  jti  jobs/min/task    real      cpu
        1    484.68  100       484.6769   12.01     1.97  Fri Mar 25 11:01:42 2005
      100  27140.46   89       271.4046   21.44   148.71  Fri Mar 25 11:02:04 2005
      200  30792.02   82       153.9601   37.80   296.72  Fri Mar 25 11:02:42 2005
      300  32209.27   81       107.3642   54.21   451.34  Fri Mar 25 11:03:37 2005
      400  34962.83   78        87.4071   66.59   588.97  Fri Mar 25 11:04:44 2005
      500  31676.92   75        63.3538   91.87   742.71  Fri Mar 25 11:06:16 2005
      600  36032.69   73        60.0545   96.91   885.44  Fri Mar 25 11:07:54 2005
      700  35540.43   77        50.7720  114.63  1024.28  Fri Mar 25 11:09:49 2005
      800  33906.70   74        42.3834  137.32  1181.65  Fri Mar 25 11:12:06 2005
      900  34120.67   73        37.9119  153.51  1325.26  Fri Mar 25 11:14:41 2005
     1000  34802.37   74        34.8024  167.23  1465.26  Fri Mar 25 11:17:28 2005

    with slab API changes and pageset patch:

    Tasks  jobs/min  jti  jobs/min/task    real      cpu
        1    485.00  100       485.0000   12.00     1.96  Fri Mar 25 11:46:18 2005
      100  28000.96   89       280.0096   20.79   150.45  Fri Mar 25 11:46:39 2005
      200  32285.80   79       161.4290   36.05   293.37  Fri Mar 25 11:47:16 2005
      300  40424.15   84       134.7472   43.19   438.42  Fri Mar 25 11:47:59 2005
      400  39155.01   79        97.8875   59.46   590.05  Fri Mar 25 11:48:59 2005
      500  37881.25   82        75.7625   76.82   730.19  Fri Mar 25 11:50:16 2005
      600  39083.14   78        65.1386   89.35   872.79  Fri Mar 25 11:51:46 2005
      700  38627.83   77        55.1826  105.47  1022.46  Fri Mar 25 11:53:32 2005
      800  39631.94   78        49.5399  117.48  1169.94  Fri Mar 25 11:55:30 2005
      900  36903.70   79        41.0041  141.94  1310.78  Fri Mar 25 11:57:53 2005
     1000  36201.23   77        36.2012  160.77  1458.31  Fri Mar 25 12:00:34 2005

    Signed-off-by: Christoph Lameter
    Signed-off-by: Shobhit Dayal
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

25 Apr, 2005

1 commit

  • zonelist_policy() forgot to mask non-zone bits from gfp when comparing
    zone number with policy_zone.
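
    A sketch of the effect, assuming the zone number lives in the low bits of
    gfp and the comparison is of the form "zone >= policy_zone". The constant
    values, the flag name and the function names below are hypothetical and
    chosen only to make the example concrete.

```python
GFP_ZONEMASK = 0x03        # hypothetical: low bits that select the zone
SOME_NON_ZONE_FLAG = 0x10  # hypothetical: any gfp flag outside the zone bits


def policy_applies_unmasked(gfp, policy_zone):
    """Models the bug: non-zone flag bits inflate the compared value."""
    return gfp >= policy_zone


def policy_applies_masked(gfp, policy_zone):
    """Models the fix: compare only the zone bits with policy_zone."""
    return (gfp & GFP_ZONEMASK) >= policy_zone


policy_zone = 2                    # hypothetical zone the policy covers
gfp = 0x00 | SOME_NON_ZONE_FLAG    # lowest zone, plus an unrelated flag

print(policy_applies_unmasked(gfp, policy_zone))
print(policy_applies_masked(gfp, policy_zone))
```

    Without the mask the unrelated flag bit makes a low-zone request look
    like it meets the policy zone; with the mask it does not.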

    ACKed-by: Andi Kleen
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds