24 Jun, 2005

18 commits

  • Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
    mem_map[] is needed by discontiguous memory machines (like in the old
    CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
    replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
    become a complete replacement.

    A significant advantage over DISCONTIGMEM is that it's completely separated
    from CONFIG_NUMA. When producing this patch, it became apparent that NUMA
    and DISCONTIG are often confused.

    Another advantage is that sparse doesn't require each NUMA node's ranges to be
    contiguous. It can handle overlapping ranges between nodes with no problems,
    where DISCONTIGMEM currently throws away that memory.

    Sparsemem uses an array to provide different pfn_to_page() translations for
    each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
    to be chopped up.
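
    The per-section translation described above can be sketched in plain C.
    This is a minimal user-space illustration, not the kernel code: the sizes,
    the section_set_map() helper and the sketch_pfn_to_page() name are all
    assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes chosen for illustration only. */
#define PAGES_PER_SECTION 4UL
#define NR_SECTIONS       8UL

struct page { unsigned long flags; };

/* One entry per SECTION_SIZE area; map == NULL means "no memory here". */
struct mem_section { struct page *map; };

static struct mem_section sections[NR_SECTIONS];

/* Record that a section is backed by the given chunk of mem_map. */
static void section_set_map(unsigned long nr, struct page *map)
{
    assert(nr < NR_SECTIONS);
    sections[nr].map = map;
}

/* Translate a pfn through its section; returns NULL for a hole. */
static struct page *sketch_pfn_to_page(unsigned long pfn)
{
    unsigned long nr = pfn / PAGES_PER_SECTION;

    if (nr >= NR_SECTIONS || sections[nr].map == NULL)
        return NULL;
    return &sections[nr].map[pfn % PAGES_PER_SECTION];
}
```

    Because each section carries its own pointer, the chunks of mem_map[] do
    not have to be adjacent in memory, and unpopulated sections cost only one
    table entry.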

    In order to do quick pfn_to_page() operations, the section number of the page
    is encoded in page->flags. Part of the sparsemem infrastructure enables
    sharing of these bits more dynamically (at compile-time) between the
    page_zone() and sparsemem operations. However, on 32-bit architectures, the
    number of bits is quite limited, and may require growing the size of the
    page->flags type in certain conditions. Several things might force this to
    occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
    memory), an increase in the physical address space, or an increase in the
    number of used page->flags.
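
    The bit pressure on 32-bit can be made concrete with a little arithmetic.
    The sketch below assumes one section number per 2^section_shift bytes of
    physical address space; the function names and every number passed to them
    are hypothetical.

```c
#include <assert.h>

/* Bits needed to number every section: one section per 2^section_shift
 * bytes of a 2^phys_bits physical address space. */
static unsigned int section_bits(unsigned int phys_bits,
                                 unsigned int section_shift)
{
    assert(section_shift <= phys_bits);
    return phys_bits - section_shift;
}

/* Do the section number, the zone field and the ordinary flag bits all
 * fit in a flags word of word_bits bits? */
static int flags_fit(unsigned int word_bits, unsigned int phys_bits,
                     unsigned int section_shift, unsigned int zone_bits,
                     unsigned int flag_bits)
{
    return section_bits(phys_bits, section_shift) + zone_bits + flag_bits
           <= word_bits;
}
```

    Each of the three pressures named above (smaller sections, a larger
    physical address space, more flags) increases one term of the sum.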

    One thing to note is that, once sparsemem is present, the NUMA node
    information no longer needs to be stored in the page->flags. It might provide
    speed increases on certain platforms and will be stored there if there is
    room. But, if out of room, an alternate (theoretically slower) mechanism is
    used.

    This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
    there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
    often have to compile out the same areas of code.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Martin Bligh
    Signed-off-by: Adrian Bunk
    Signed-off-by: Yasunori Goto
    Signed-off-by: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Allow architectures to indicate that they will be providing hooks to indicate
    installed memory areas, memory_present(). Provide prototypes for the i386
    implementation.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Provide a default implementation for early_pfn_to_nid returning node 0. Allow
    architectures to override this with their own implementation out of
    asm/mmzone.h.
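
    One common way to express an overridable default in C is a macro guarded
    by #ifndef. The sketch below assumes that mechanism; the commit message
    does not say how the override is implemented, and node_for_boot_pfn() is a
    hypothetical caller.

```c
#include <assert.h>

/* An architecture header (asm/mmzone.h in the commit message) could define
 * early_pfn_to_nid before this point; none does in this sketch. */
#ifndef early_pfn_to_nid
/* Generic fallback: every pfn is assumed to live on node 0. */
#define early_pfn_to_nid(pfn) ((void)(pfn), 0)
#endif

/* Hypothetical caller that works with either definition. */
static int node_for_boot_pfn(unsigned long pfn)
{
    return early_pfn_to_nid(pfn);
}
```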

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • This patch changes some of the default behavior in the ppc64 Kconfig file
    that was recently changed/added to 2.6.12-rc2-mm1 by Dave Hansen in
    preparation for SPARSEMEM. Patch allows the display of both FLAT and
    DISCONTIG models on pseries. As before, default is DISCONTIG for SMP and
    PSERIES and FLAT for others.

    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • This gives DISCONTIGMEM a bit more help text to explain what it does, not just
    when to choose it.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • I got some feedback from users who think that the new "Memory Model" menu is a
    little invasive. This patch will hide that menu, except when
    CONFIG_EXPERIMENTAL is enabled *or* when an individual architecture wants it.

    An individual arch may want to enable it because they've removed their
    arch-specific DISCONTIG prompt in favor of the mm/Kconfig one.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • This used to be used to disable FLATMEM selection, but I decided to change it
    to be done generically when DISCONTIG is enabled. The option is unused, so
    this kills it.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • The following patch applies on top of 2.6.12-rc2-mm1. It fixes a minor
    user interaction issue, and an early reference to SPARSEMEM.

    This "choice" menu would always default to FLATMEM, as it was listed first.
    Move it to the end so that the other defaults have a chance first.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • There is some confusion that arose when working on the SPARSEMEM patch
    between what is needed for DISCONTIG vs. NUMA.

    Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently.
    All of the current NUMA implementations require an implementation of
    DISCONTIG. Because of this, quite a lot of code which is really needed for
    NUMA is actually under DISCONTIG #ifdefs. For SPARSEMEM, we changed some
    of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n
    case.

    Introducing this new NEED_MULTIPLE_NODES config option allows code that is
    needed for either NUMA or DISCONTIG to be separated out from code that is
    specific to DISCONTIG.

    One great advantage of this approach is that it doesn't require every
    architecture to be converted over. All of the current implementations
    should "just work", only the ones implementing SPARSEMEM will have to be
    fixed up.

    The change to free_area_init() makes it work inside, or out of the new
    config option.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • This will at least suppress one prompt that users would have received the
    first time they compile with the new DISCONTIG arch option. They'll still
    get the "Memory Model" prompt, but 99% of them will have the default work
    there.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • For all architectures, this just means that you'll see a "Memory Model"
    choice in your architecture menu. For those that implement DISCONTIGMEM,
    you may eventually want to make your ARCH_DISCONTIGMEM_ENABLE a "def_bool
    y" and make your users select DISCONTIGMEM right out of the new choice
    menu. The only disadvantage might be if you have some specific things that
    you need in your help option to explain something about DISCONTIGMEM.

    Signed-off-by: Dave Hansen
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • With sparsemem being introduced, we need a central place for new
    memory-related .config options: mm/Kconfig. This allows us to remove many
    of the duplicated arch-specific options.

    The new option, CONFIG_FLATMEM, is there to enable us to detangle NUMA and
    DISCONTIGMEM. This is a requirement for sparsemem because sparsemem uses
    the NUMA code without the presence of DISCONTIGMEM. The sparsemem patches
    use CONFIG_FLATMEM in generic code, so this patch is a requirement before
    applying them.

    Almost all places that used to do '#ifndef CONFIG_DISCONTIGMEM' should use
    '#ifdef CONFIG_FLATMEM' instead.
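
    Why the old test stops being safe can be shown with a small preprocessor
    sketch. It assumes a hypothetical configuration with sparsemem enabled and
    neither DISCONTIGMEM nor FLATMEM set; both function names are invented for
    the example.

```c
#include <assert.h>

/* Hypothetical configuration: sparsemem, no discontigmem, no flatmem. */
#define CONFIG_SPARSEMEM 1

/* Old-style test: "not discontig" was taken to mean "flat". */
static int old_test_selects_flat(void)
{
#ifndef CONFIG_DISCONTIGMEM
    return 1;
#else
    return 0;
#endif
}

/* New-style test: flat code is compiled only when FLATMEM is set. */
static int new_test_selects_flat(void)
{
#ifdef CONFIG_FLATMEM
    return 1;
#else
    return 0;
#endif
}
```

    With a third memory model in the picture, the negative test wrongly pulls
    in flat-memory code, while the positive test does not.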

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • discontig.c has some assumptions that mem_map[]s inside of a node are
    contiguous. Teach it to make sure that each region that it's bringing online
    is actually made up of valid ranges of ram.

    Written-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • Generify the value fields in the page_flags. The aim is to allow the location
    and size of these fields to be varied. Additionally we want to move away from
    fixed allocations per field whilst still enforcing the overall bit utilisation
    limits. We rely on the compiler to spot and optimise the accessor functions.
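
    A rough sketch of what "generified" fields could look like: every shift is
    derived from the field widths, so changing one width relocates the fields
    below it without touching the accessors. The widths, the packing order and
    the get_field()/set_field() helpers are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical field widths; changing them moves every field below. */
#define SECTIONS_WIDTH 6
#define NODES_WIDTH    3
#define ZONES_WIDTH    2

#define FLAGS_BITS     (sizeof(unsigned long) * 8)

/* Fields are packed downward from the top bit of the word. */
#define SECTIONS_SHIFT (FLAGS_BITS - SECTIONS_WIDTH)
#define NODES_SHIFT    (SECTIONS_SHIFT - NODES_WIDTH)
#define ZONES_SHIFT    (NODES_SHIFT - ZONES_WIDTH)

#define FIELD_MASK(width) ((1UL << (width)) - 1)

/* Store val in the field at shift, leaving all other bits untouched. */
static unsigned long set_field(unsigned long flags, unsigned long shift,
                               unsigned long width, unsigned long val)
{
    assert(val <= FIELD_MASK(width));
    flags &= ~(FIELD_MASK(width) << shift);
    return flags | (val << shift);
}

/* Extract the field at shift. */
static unsigned long get_field(unsigned long flags, unsigned long shift,
                               unsigned long width)
{
    return (flags >> shift) & FIELD_MASK(width);
}
```

    Since the shifts and masks are compile-time constants at each call site,
    a compiler can usually fold them into a single shift-and-mask.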

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • Introduce a simple allocator for the NUMA remap space. This space is very
    scarce, used for structures which are best allocated node local.

    This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep
    the pgdat->node_mem_map initialized in a consistent place for all
    architectures.
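
    A "simple allocator" for a scarce per-node region is often a bump
    allocator. The sketch below assumes that design; remap_init(),
    sketch_alloc_remap(), the node count and the alignment are hypothetical
    and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES   4    /* hypothetical node count */
#define REMAP_ALIGN 64   /* hypothetical allocation granularity */

struct remap_region {
    char *next;  /* first free byte */
    char *end;   /* one past the last byte */
};

static struct remap_region remap[MAX_NODES];

/* Hand a node its region of remap space. */
static void remap_init(int nid, void *base, size_t size)
{
    assert(nid >= 0 && nid < MAX_NODES);
    remap[nid].next = base;
    remap[nid].end = (char *)base + size;
}

/* Carve size bytes out of the node's region. Returns NULL when the node
 * has no region or not enough room, so the caller can fall back to an
 * ordinary allocation. */
static void *sketch_alloc_remap(int nid, size_t size)
{
    struct remap_region *r;
    void *p;

    if (nid < 0 || nid >= MAX_NODES)
        return NULL;
    r = &remap[nid];
    size = (size + REMAP_ALIGN - 1) & ~(size_t)(REMAP_ALIGN - 1);
    if (r->next == NULL || size > (size_t)(r->end - r->next))
        return NULL;
    p = r->next;
    r->next += size;
    return p;
}
```

    Nothing is ever freed here, which suits boot-time structures that live for
    the lifetime of the system.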

    Issues:
    o alloc_remap takes a node_id where we might expect a pgdat which was intended
    to allow us to allocate the pgdat's using this mechanism; which we do not yet
    do. Could have alloc_remap_node() and alloc_remap_nid() for this purpose.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • The following four patches provide the last needed changes before the
    introduction of sparsemem. For a more complete description of what this
    will do, please see this patch:

    http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch

    or previous posts on the subject:
    http://marc.theaimsgroup.com/?t=110868540700001&r=1&w=2
    http://marc.theaimsgroup.com/?l=linux-mm&m=109897373315016&w=2

    Three of these are i386-only, but one of them reorganizes the macros
    used to manage the space in page->flags, and will affect all platforms.
    There are analogous patches to the i386 ones for ppc64, ia64, and
    x86_64, but those will be submitted by the normal arch maintainers.

    The combination of the four patches has been test-booted on a variety of
    i386 hardware, and compiled for ppc64, i386, and x86-64 with about 17
    different .configs. It's also been runtime-tested on ia64 configs (with
    more patches on top).

    This patch:

    We _know_ which node pages in general belong to, at least at a very gross
    level in node_{start,end}_pfn[]. Use those to target the allocations of
    pages.
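
    As an illustration of the lookup this relies on, the sketch below scans
    per-node pfn ranges to find the owning node. The array contents, the
    half-open [start, end) convention and sketch_pfn_to_nid() are assumptions.

```c
#include <assert.h>

#define MAX_NODES 3  /* hypothetical */

/* Per-node pfn ranges, [start, end). The values are made up. */
static const unsigned long node_start_pfn[MAX_NODES] = { 0,    4096, 8192 };
static const unsigned long node_end_pfn[MAX_NODES]   = { 4096, 8192, 10240 };

/* Return the node whose range contains pfn, or -1 if none does. */
static int sketch_pfn_to_nid(unsigned long pfn)
{
    int nid;

    for (nid = 0; nid < MAX_NODES; nid++) {
        if (pfn >= node_start_pfn[nid] && pfn < node_end_pfn[nid])
            return nid;
    }
    return -1;
}
```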

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • This patch effectively eliminates direct use of pgdat->node_mem_map outside
    of the DISCONTIG code. On a flat memory system, these fields aren't
    currently used, neither are they on a sparsemem system.

    There was also a node_mem_map(nid) macro on many architectures. Its use
    along with the use of ->node_mem_map itself was not consistent. It has
    been removed in favor of two new, more explicit, arch-independent macros:

    pgdat_page_nr(pgdat, pagenr)
    nid_page_nr(nid, pagenr)

    I called them "pgdat" and "nid" because we overload the term "node" to mean
    "NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways. I
    believe the newer names are much clearer.
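
    For the DISCONTIG case, the two macros could plausibly be written as
    below. This is a guess at their shape based only on the names and
    arguments given above; the stripped-down pg_data_t and the node_data[]
    table are stand-ins for the real structures.

```c
#include <assert.h>

struct page { unsigned long flags; };

/* Stand-in for the kernel's pg_data_t, reduced to the one field used. */
typedef struct pglist_data {
    struct page *node_mem_map;
} pg_data_t;

#define MAX_NODES 2  /* hypothetical */
static pg_data_t *node_data[MAX_NODES];
#define NODE_DATA(nid) (node_data[(nid)])

/* Index into a node's mem_map, starting from a pg_data_t or a node id. */
#define pgdat_page_nr(pgdat, pagenr) ((pgdat)->node_mem_map + (pagenr))
#define nid_page_nr(nid, pagenr)     pgdat_page_nr(NODE_DATA(nid), (pagenr))
```

    Keeping every ->node_mem_map access behind these two names is what lets
    another memory model substitute a different definition.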

    These macros can be overridden in the sparsemem case with a theoretically
    slower operation using node_start_pfn and pfn_to_page(), instead. We could
    make this the only behavior if people want, but I don't want to change too
    much at once. One thing at a time.

    This patch removes more code than it adds.

    Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386
    generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64. Full list
    here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/

    Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs.

    Signed-off-by: Dave Hansen
    Signed-off-by: Martin J. Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • Linus Torvalds
     

23 Jun, 2005

22 commits

  • This patch fixes an obvious and nasty bug where we could exit the transmit
    routine while holding tx_lock.

    Signed-off-by: Mitch Williams

    Mitch Williams
     
  • Linus Torvalds
     
  • Linus Torvalds
     
  • Don't error out if something "bad" happens when trying to bind a driver to a
    device. We want the sysfs attributes to be present for later when we try to
    tear down the device.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • Drivers need to return -ENODEV when they can't bind to a device.
    Anything else stops the "bind a device to a driver" search.
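
    The search behavior described here can be modeled in a few lines. The
    loop below is a user-space sketch, not driver core code: bind_device() and
    the three probe functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*probe_fn)(int device_id);

/* Try each driver in turn. -ENODEV means "not mine, keep looking"; any
 * other error ends the search. Returns the index of the driver that bound,
 * or a negative errno. */
static int bind_device(const probe_fn *drivers, size_t n, int device_id)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int ret = drivers[i](device_id);

        if (ret == 0)
            return (int)i;
        if (ret != -ENODEV)
            return ret;
    }
    return -ENODEV;
}

/* A well-behaved driver that cannot handle the device. */
static int probe_declines(int id) { (void)id; return -ENODEV; }
/* A driver that reports the wrong error and so blocks later drivers. */
static int probe_wrong_errno(int id) { (void)id; return -EIO; }
/* A driver that binds successfully. */
static int probe_accepts(int id) { (void)id; return 0; }
```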

    From: Stelian Pop
    Signed-off-by: Greg Kroah-Hartman

    Stelian Pop
     
  • Use ssleep() / msleep() [as appropriate]
    instead of schedule_timeout() to guarantee the task delays as expected.

    Signed-off-by: Nishanth Aravamudan
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: Maximilian Attems
    Signed-off-by: Domen Puncer
    Signed-off-by: David S. Miller

    Nishanth Aravamudan
     
  • This patch is a follow up to patch 1 regarding "Selective Sub Address
    matching with call user data". It allows use of the Fast-Select-Acceptance
    optional user facility for X.25.

    This patch just implements fast select with no restriction on response
    (NRR). What this means (according to ITU-T Recommendation 10/96 section
    6.16) is that if in an incoming call packet, the relevant facility bits are
    set for fast-select-NRR, then the called DTE can issue a direct response to
    the incoming packet using a call-accepted packet that contains
    call-user-data. This patch allows such a response.

    The called DTE can also respond with a clear-request packet that contains
    call-user-data. However, this feature is currently not implemented by the
    patch.

    How is Fast Select Acceptance used?

    By default, the system does not allow fast select acceptance (as before).
    To enable a response to fast select acceptance, first create and bind a
    listen socket as follows:

        socket(AF_X25, SOCK_SEQPACKET, 0);
        bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr));

    Before the listen system call is made, issue the following ioctl:

        ioctl(call_soc,SIOCX25CALLACCPTAPPRV);

    Now the listen system call can be made:

        listen(call_soc, 4);

    After this, an incoming-call packet will be accepted, but no call-accepted
    packet will be sent back until the following system call is made on the
    socket that accepts the call:

        ioctl(vc_soc,SIOCX25SENDCALLACCPT);

    The network (or cisco xot router used for testing here) will allow the
    application server's call-user-data in the call-accepted packet,
    provided the call-request was made with Fast-select NRR.

    Signed-off-by: Shaun Pereira
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Shaun Pereira
     
  • From: Shaun Pereira

    This is the first (independent of the second) patch of two that I am
    working on with x25 on linux (tested with xot on a cisco router). Details
    are as follows.

    Current state of module:

    A server using the current implementation (2.6.11.7) of the x25 module will
    accept a call request/ incoming call packet at the listening x.25 address,
    from all callers to that address, as long as NO call user data is present
    in the packet header.

    If the server needs to choose to accept a particular call request/ incoming
    call packet arriving at its listening x25 address, then the kernel has to
    allow a match of call user data present in the call request packet with its
    own. This is required when multiple servers listen at the same x25 address
    and device interface. The kernel currently matches ALL call user data, if
    present.

    Current Changes:

    This patch is a follow up to the patch submitted previously by Andrew
    Hendry, and allows the user to selectively control the number of octets of
    call user data in the call request packet, that the kernel will match. By
    default no call user data is matched, even if call user data is present.
    To allow call user data matching, a cudmatchlength > 0 has to be passed
    into the kernel after which the passed number of octets will be matched.
    Otherwise the kernel behavior is exactly as the original implementation.
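
    The matching rule can be sketched as a prefix comparison. cud_matches()
    is a hypothetical helper, and rejecting a call whose user data is shorter
    than the requested match length is an assumption of this sketch, not
    something the commit message states.

```c
#include <assert.h>
#include <string.h>

/* Decide whether a listener should take an incoming call.
 * cudmatchlength == 0 means no call user data is compared, so every call
 * matches. Otherwise the first cudmatchlength octets must be identical. */
static int cud_matches(const unsigned char *listener_cud,
                       size_t cudmatchlength,
                       const unsigned char *call_cud, size_t call_cud_len)
{
    if (cudmatchlength == 0)
        return 1;
    if (call_cud_len < cudmatchlength)
        return 0;  /* assumption: too little data cannot match */
    return memcmp(listener_cud, call_cud, cudmatchlength) == 0;
}
```

    Several servers on the same address can then be told apart by giving each
    a distinct prefix and a non-zero match length.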

    This patch also ensures that, as is normally the case, no call user data
    will be present in the call accepted / call connected packet sent back to
    the caller.

    Future Changes on next patch:

    There are cases however when call user data may be present in the call
    accepted packet. According to the X.25 recommendation (ITU-T 10/96)
    section 5.2.3.2 call user data may be present in the call accepted packet
    provided the fast select facility is used. My next patch will include this
    fast select utility and the ability to send up to 128 octets call user data
    in the call accepted packet provided the fast select facility is used. I
    am currently testing this, again with xot on linux and cisco.

    Signed-off-by: Shaun Pereira

    (With a fix from Alexey Dobriyan )
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Shaun Pereira
     
  • From: jlamanna@gmail.com

    ebtables.c vfree() checking cleanups.

    Signed-off by: James Lamanna
    Signed-off-by: Domen Puncer
    Signed-off-by: David S. Miller

    James Lamanna
     
  • From: Nishanth Aravamudan

    Use msleep() instead of schedule_timeout() to guarantee the task
    delays as expected. The current code is not wrong, but it does not account for
    early return due to signals, so I think msleep() should be appropriate.

    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Domen Puncer
    Signed-off-by: David S. Miller

    Nishanth Aravamudan
     
  • Signed-off by: Chuck Short
    Signed-off-by: David S. Miller

    Chuck Short
     
  • This patch provides support for registering multiple netpoll clients to the
    same network device. Only one of these clients may register an rx_hook,
    however. In practice, this restriction has not been problematic. It is
    worth mentioning, though, that the current design can be easily extended to
    allow for the registration of multiple rx_hooks.

    The basic idea of the patch is that the rx_np pointer in the netpoll_info
    structure points to the struct netpoll that has rx_hook filled in. Aside
    from this one case, there is no need for a pointer from the struct
    net_device to an individual struct netpoll.

    A lock is introduced to protect the setting and clearing of the rx_np
    pointer. The pointer will only be cleared upon netpoll client module
    removal, and the lock should be uncontested.

    Signed-off-by: Jeff Moyer
    Signed-off-by: David S. Miller

    Jeff Moyer
     
  • This patch introduces a netpoll_info structure, which the struct net_device
    will now point to instead of pointing to a struct netpoll. The reason for
    this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock
    should be maintained per net_device, not per netpoll; and 2) this is a first
    step in providing support for multiple netpoll clients to register against the
    same net_device.

    The struct netpoll is now pointed to by the netpoll_info structure. As
    such, the previous behaviour of the code is preserved.

    Signed-off-by: Jeff Moyer
    Signed-off-by: David S. Miller

    Jeff Moyer
     
  • This trivial patch moves the assignment of poll_owner to -1 inside of
    the lock. This fixes a potential SMP race in the code.

    Signed-off-by: Jeff Moyer
    Signed-off-by: David S. Miller

    Jeff Moyer
     
  • The boot_pageset needs to be preserved for hotplugging and for offline
    processors and nodes. Otherwise pointers will point into memory that now
    has a different use. /proc/zoneinfo is currently showing strange results
    if processors / nodes are not present.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Linus Torvalds
     
  • Small patch to save an unnecessary call to strlen(): sprintf() gave us
    the length, just trust it.
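
    The idea in isolation: the printf family returns the number of characters
    written, so a following strlen() on the same buffer is redundant. The
    sketch uses snprintf() instead of sprintf() purely for bounds safety;
    format_entry() and its format string are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "name=value\n" into buf and return its length, or -1 if the
 * output did not fit. No strlen() call is needed afterwards. */
static int format_entry(char *buf, size_t size, const char *name, int value)
{
    int len = snprintf(buf, size, "%s=%d\n", name, value);

    if (len < 0 || (size_t)len >= size)
        return -1;
    return len;
}
```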

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Linus Torvalds
     
  • Since meminfo.bank[] array contains page-aligned start/size, we
    no longer need to explicitly round up/down the addresses when
    converting to PFNs.

    Signed-off-by: Russell King

    Russell King
     
  • Ensure that meminfo.bank[] array contains page-aligned start/size
    information.
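
    A sketch of what page-aligning a bank might involve, and why PFN
    conversion then needs no rounding. The struct layout, the 4 KiB page size
    and the policy of shrinking the bank inward are assumptions; the commit
    message does not describe the actual adjustment.

```c
#include <assert.h>

#define PAGE_SHIFT 12                  /* hypothetical: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

struct bank { unsigned long start, size; };

/* Shrink a bank inward to page boundaries: round the start up and the
 * end down, so only whole pages remain. */
static void bank_page_align(struct bank *b)
{
    unsigned long start = (b->start + PAGE_SIZE - 1) & PAGE_MASK;
    unsigned long end = (b->start + b->size) & PAGE_MASK;

    b->start = start;
    b->size = end > start ? end - start : 0;
}

/* With aligned banks, a plain shift gives exact PFNs. */
static unsigned long bank_start_pfn(const struct bank *b)
{
    return b->start >> PAGE_SHIFT;
}

static unsigned long bank_end_pfn(const struct bank *b)
{
    return (b->start + b->size) >> PAGE_SHIFT;
}
```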

    Signed-off-by: Russell King

    Russell King
     
  • After using this facility for a while to test my changes to the
    cipher crypt() layer, I realised that I should've listened to Dave
    and made this thing use CPU cycle counters :) As it is it's too
    jittery for me to feel safe about relying on the results.

    So here is a patch to make it use CPU cycles by default but fall
    back to jiffies if the user specifies a non-zero sec value.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The existing keys used in the speed tests do not pass the 3DES quality check.
    This patch makes it use the template keys instead.

    Other algorithms can supply template keys through the same interface if needed.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu