05 Dec, 2011

1 commit

  • There was a mixup when the SGI UV2 hub chip was sent to be
    fabricated, and it ended up with the wrong part number in the
    HRP_NODE_ID MMR. Future versions of the chip may have the
    correct part number. Change the UV infrastructure to recognize
    both part numbers as valid IDs of a UV2 hub chip.

    Signed-off-by: Jack Steiner
    Link: http://lkml.kernel.org/r/20111129210058.GA20452@sgi.com
    Signed-off-by: Ingo Molnar

    Jack Steiner
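
The two-part-number check described in the commit above can be sketched roughly as follows. This is only an illustration: the part-number values, the field position, and every `example_` name are hypothetical placeholders, not the definitions used by the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration only; the real part numbers
 * live in the kernel's UV headers and are not given in this log. */
#define EXAMPLE_UV2_PART_NUMBER     0x1234u /* intended part number */
#define EXAMPLE_UV2_PART_NUMBER_ALT 0x5678u /* number the first chips report */

/* Assumed layout: a 16-bit part number starting at bit 12. */
#define EXAMPLE_PART_SHIFT 12
#define EXAMPLE_PART_MASK  0xffffu

/* Extract the part-number field from a node-id register value. */
static inline unsigned int example_part_number(uint64_t node_id_mmr)
{
	return (unsigned int)((node_id_mmr >> EXAMPLE_PART_SHIFT) &
			      EXAMPLE_PART_MASK);
}

/* Accept either part number as identifying a UV2 hub. */
static inline bool example_is_uv2_hub(uint64_t node_id_mmr)
{
	unsigned int part = example_part_number(node_id_mmr);

	return part == EXAMPLE_UV2_PART_NUMBER ||
	       part == EXAMPLE_UV2_PART_NUMBER_ALT;
}
```

Keeping both values behind one predicate means callers do not need to know that early chips report the wrong number.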
     

28 Oct, 2011

1 commit


21 Sep, 2011

1 commit

  • This is a workaround for a UV2 hub bug that affects the format of system
    global addresses.

    The GRU API for UV2 was inadvertently broken by a hardware change. The
    format of the physical address used for TLB dropins and for addresses used
    with instructions running in unmapped mode has changed. This change was
    not documented and became apparent only when diags failed running on
    system simulators.

    For UV1, TLB and GRU instruction physical addresses are identical to
    socket physical addresses (although high NASID bits must be OR'ed into the
    address).

    For UV2, socket physical addresses need to be converted. The NODE portion
    of the physical address needs to be shifted so that the low bit is in bit
    39 or bit 40, depending on an MMR value.

    It is not yet clear if this bug will be fixed in a silicon respin. If it
    is fixed, the hub revision will be incremented & the workaround disabled.

    Signed-off-by: Jack Steiner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Thomas Gleixner

    Jack Steiner
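
A minimal sketch of the UV2 address conversion described in the commit above, assuming the socket physical address consists of a node field directly above an in-node offset. The function name and both parameters are hypothetical; the commit only states that the node field's low bit must end up in bit 39 or bit 40, as selected by an MMR value.

```c
#include <stdint.h>

/*
 * Move the node field of a socket physical address so that its low bit
 * lands at bit `node_lshift` (39 or 40 according to the commit text).
 *
 * `offset_bits` is an assumed parameter: the number of low address bits
 * that form the offset within a node. It must be less than 64 and no
 * larger than `node_lshift`.
 */
static inline uint64_t example_soc_to_gru_paddr(uint64_t paddr,
						unsigned int offset_bits,
						unsigned int node_lshift)
{
	uint64_t offset = paddr & ((UINT64_C(1) << offset_bits) - 1);
	uint64_t node = paddr >> offset_bits;

	return (node << node_lshift) | offset;
}
```

When `node_lshift` equals `offset_bits` the function returns the address unchanged, which corresponds to the UV1 case where no conversion is needed (apart from the NASID bits mentioned in the commit).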
     

30 Aug, 2011

1 commit


21 Jun, 2011

4 commits

  • Correct the UV2 broadcast assist unit's destination timeout
    period. And the activation status register in UV2 should be
    tested for a destination timeout with a 4, not a 2. The values
    for Active versus Timeout were reversed.

    This patch is critical for TLB shootdown on an Altix UV2 system
    (i.e. the follow-on to the current Altix UV).

    Destination timeout period:
    The period is set in 4 bits of memory-mapped register MISC_CONTROL.
    The left bit toggles base period between 10us and 80us.
    The other 3 bits are the multiplier.
    Decimal 15, hex f, gives the maximum: 7 * 80us

    Signed-off-by: Cliff Wickman
    Link: http://lkml.kernel.org/r/20110621122243.117324443@sgi.com
    Signed-off-by: Ingo Molnar

    cpw@sgi.com
     
  • Remove the large stack-resident cpumask_t from
    reset_with_ipi()'s stack by allocating one per uvhub.

    Due to the limited size of the stack the potentially huge cpumask_t may
    cause stack overrun. We haven't seen it happen yet, but we need to make it
    a practice not to push such structures onto the stack.

    Signed-off-by: Cliff Wickman
    Reviewed-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/20110621122242.832589130@sgi.com
    Signed-off-by: Ingo Molnar

    cpw@sgi.com
     
  • Rename 'bau_targ_hubmask' to 'pnmask' for clarity.

    The BAU distribution bit mask is indexed by pnode number, not hub or
    blade number. This important fact is not clear while the mask is
    called a 'hubmask'.

    Signed-off-by: Cliff Wickman
    Link: http://lkml.kernel.org/r/20110621122242.630995969@sgi.com
    Signed-off-by: Ingo Molnar

    cpw@sgi.com
     
  • Make all the functions in uv_bau.h inline so that it can
    be included in the fake prom (used in simulations).

    If not inlined the unused functions will generate compiler warnings.

    Signed-off-by: Cliff Wickman
    Reviewed-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/20110621122242.230529678@sgi.com
    Signed-off-by: Ingo Molnar

    cpw@sgi.com
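
The destination timeout encoding described in the first commit of this group (a 4-bit field whose left bit selects a 10us or 80us base and whose other 3 bits are a multiplier) can be decoded as in the sketch below. The function and macro names are hypothetical; only the bit meanings come from the commit text.

```c
/* Bit 3 of the 4-bit field selects the base period. */
#define EXAMPLE_TIMEOUT_BASE_BIT  0x8u
/* Bits 2:0 hold the multiplier. */
#define EXAMPLE_TIMEOUT_MULT_MASK 0x7u

/* Return the destination timeout period in microseconds. */
static inline unsigned int example_dest_timeout_us(unsigned int field)
{
	unsigned int base_us = (field & EXAMPLE_TIMEOUT_BASE_BIT) ? 80u : 10u;
	unsigned int mult = field & EXAMPLE_TIMEOUT_MULT_MASK;

	return mult * base_us;
}
```

For the value 0xf this gives 7 * 80us = 560us, the maximum quoted in the commit message.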
     

30 May, 2011

1 commit

  • No code changes. Reformat definitions to make them more readable.

    I fixed alignment of comments in the structure definitions.

    Also aligned comments and most field definitions & values. Also
    sorted the defines for the SHIFT & MASK values for each MMR.
    This makes the file visually much more acceptable.

    Some of the symbol names are still quite long. The file is based
    on post-processing of verilog definitions that are used for the
    node controller chip design. Although some symbol names are not
    what I would choose, I would like to maintain compatibility with
    the names used by the chip designers. We have a number of
    cross-reference utilities & having common names is important.

    Signed-off-by: Jack Steiner
    Link: http://lkml.kernel.org/r/20110527145256.GA31224@sgi.com
    Signed-off-by: Ingo Molnar
    --
    arch/x86/include/asm/uv/uv_mmrs.h | 2873 +++++++++++++++++++++-----------------
    1 file changed, 1600 insertions(+), 1273 deletions(-)

    Jack Steiner
     

25 May, 2011

2 commits

  • SGI UV's uv_tlb.c driver has become rather hard to read, with overly large
    functions, non-standard coding style and (way) too long variable, constant
    and function names and non-obvious code flow sequences.

    This patch improves the readability and maintainability of the driver
    significantly, by doing the following strict code cleanups with no side
    effects:

    - Split long functions into shorter logical functions.

    - Shortened some variable and structure member names.

    - Added special functions for reads and writes of MMR regs with
    very long names.

    - Added the 'tunables' table to shorten tunables_write().

    - Added the 'stat_description' table to shorten uv_ptc_proc_write().

    - Pass fewer 'stat' arguments where it can be derived from the 'bcp'
    argument.

    - Put function definitions consistently on one line, and made them inline in a few (short) cases.

    - Moved some small structures and an atomic inline function to the header file.

    - Moved some local variables to the blocks where they are used.

    - Updated the copyright date.

    - Shortened uv_write_global_mmr64() etc. using some aliasing; no
    line breaks. Renamed many uv_.. functions that are not exported.

    - Aligned structure fields.
    [ note that not all structures are aligned the same way though; I'd like
    to keep the extensive commenting in some of them. ]

    - Shortened some long structure names.

    - Standard pass/fail exit from init_per_cpu()

    - Vertical alignment for mass initializations.

    - More separation between blocks of code.

    Tested on a 16-processor Altix UV.

    Signed-off-by: Cliff Wickman
    Cc: penberg@kernel.org
    Link: http://lkml.kernel.org/r/E1QOw12-0004MN-Lp@eag09.americas.sgi.com
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     
  • This patch adds support for a new version of the SGI UV hub
    chip. The hub chip is the node controller that connects multiple
    blades into a larger coherent SSI.

    For the most part, UV2 is compatible with UV1. The majority of
    the changes are in the addresses of MMRs and in a few cases, the
    contents of MMRs. These changes are the result of changes in the
    system topology such as node configuration, processor types,
    maximum nodes, physical address sizes, etc.

    Signed-off-by: Jack Steiner
    Link: http://lkml.kernel.org/r/20110511175028.GA18006@sgi.com
    Signed-off-by: Ingo Molnar

    Jack Steiner
     

13 May, 2011

1 commit

  • This is a fix for the SGI Altix-UV Broadcast Assist Unit code,
    which is used for TLB flushing.

    Certain hardware configurations (that customers are ordering)
    cause nasids (numa address space id's) to be non-consecutive.
    Specifically, once you have more than 4 blades in an IRU
    (Individual Rack Unit - or 1/2 rack) but less than the maximum
    of 16, the nasid numbering becomes non-consecutive. This
    currently results in a 'catastrophic error' (CATERR) detected by
    the firmware during OS boot. The BAU is generating an 'INTD'
    request that is targeting a non-existent nasid value. Such
    configurations may also occur when a blade is configured off
    because of hardware errors. (There is one UV hub per blade.)

    This patch is required to support such configurations.

    The problem with the tlb_uv.c code is that it is using the
    consecutive hub numbers as indices to the BAU distribution bit
    map. These are simply the ordinal position of the hub or blade
    within its partition. It should be using physical node numbers
    (pnodes), which correspond to the physical nasid values. Use of
    the hub number only works as long as the nasids in the partition
    are consecutive and increase with a stride of 1.

    This patch changes the index to be the pnode number, thus
    allowing nasids to be non-consecutive.
    It also provides a table in local memory for each cpu to
    translate target cpu number to target pnode and nasid.
    And it improves naming to properly reflect 'node' and 'uvhub'
    versus 'nasid'.

    Signed-off-by: Cliff Wickman
    Cc:
    Link: http://lkml.kernel.org/r/E1QJmxX-0002Mz-Fk@eag09.americas.sgi.com
    Signed-off-by: Ingo Molnar

    Cliff Wickman
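
A rough sketch of the indexing change described in the commit above: the distribution bit map is indexed by pnode (taken from a per-cpu translation table) rather than by the ordinal hub number. All names, the mask size, and the choice to store bits relative to a base pnode are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_MAX_PNODES 256u
#define EXAMPLE_MASK_WORDS (EXAMPLE_MAX_PNODES / 64u)

/* Distribution bit map: one bit per pnode (hypothetical layout). */
struct example_pnmask {
	uint64_t bits[EXAMPLE_MASK_WORDS];
};

static inline void example_pnmask_clear(struct example_pnmask *m)
{
	memset(m, 0, sizeof(*m));
}

static inline bool example_pnmask_test(const struct example_pnmask *m,
				       unsigned int idx)
{
	return idx < EXAMPLE_MAX_PNODES &&
	       ((m->bits[idx / 64u] >> (idx % 64u)) & 1u);
}

/*
 * Add the hub that owns `cpu` to the mask. The bit index is the cpu's
 * pnode (relative to `base_pnode`, an assumed convention), looked up in
 * a cpu-to-pnode table, so gaps in the pnode numbering are harmless.
 * Returns 0 on success, -1 if the pnode is out of range.
 */
static inline int example_pnmask_add_cpu(struct example_pnmask *m,
					 const unsigned short *cpu_to_pnode,
					 unsigned int cpu,
					 unsigned int base_pnode)
{
	unsigned int pnode = cpu_to_pnode[cpu];
	unsigned int idx;

	if (pnode < base_pnode)
		return -1;
	idx = pnode - base_pnode;
	if (idx >= EXAMPLE_MAX_PNODES)
		return -1;
	m->bits[idx / 64u] |= UINT64_C(1) << (idx % 64u);
	return 0;
}
```

With a table such as `{ 0, 0, 2, 2, 6, 6 }` (three hubs at pnodes 0, 2 and 6), targeting cpu 4 sets bit 6. Indexing by ordinal hub number would have set bit 2 instead, which addresses a different hub.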
     

10 May, 2011

1 commit

  • This fixes problems seen on UV systems handling NMIs from the
    node controller.

    I isolated the "dazed..." messages that I saw earlier to a bug in
    the BMC on our platform. It was sending NMIs w/o properly setting
    a register that indicated the source of NMI.

    So rather than _assuming_ any unhandled NMI came from the UV system
    maintenance console (SMC), add a check to verify that the SMC actually
    sent the NMI.

    Signed-off-by: Jack Steiner
    Cc: gorcunov@gmail.com
    Cc: dzickus@redhat.com
    Signed-off-by: Ingo Molnar

    Jack Steiner
     

09 Mar, 2011

1 commit


04 Jan, 2011

1 commit

  • Fix a hard-coded limit of a maximum of 16 cpu's per socket.

    The UV Broadcast Assist Unit code initializes by scanning the
    cpu topology of the system and assigning a master cpu for each
    socket and UV hub. That scan had an assumption of a limit of 16
    cpus per socket. With Westmere we are going over that limit.
    The UV hub hardware will allow up to 32.

    If the scan finds the system has gone over that limit it returns
    an error and we print a warning and fall back to doing TLB
    shootdowns without the BAU.

    Signed-off-by: Cliff Wickman
    Cc: # .37.x
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     

18 Nov, 2010

1 commit

  • This patch for SGI UV systems addresses a problem whereby
    interrupt transactions being looped back from a local IOH,
    through the hub to a local CPU can (erroneously) conflict with
    IO port operations and other transactions.

    To work around this we set a high bit in the APIC IDs used for
    interrupts. This bit appears to be ignored by the sockets, but
    it avoids the conflict in the hub.

    Signed-off-by: Dimitri Sivanich
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    ___

    arch/x86/include/asm/uv/uv_hub.h | 4 ++++
    arch/x86/include/asm/uv/uv_mmrs.h | 19 ++++++++++++++++++-
    arch/x86/kernel/apic/x2apic_uv_x.c | 25 +++++++++++++++++++++++--
    arch/x86/platform/uv/tlb_uv.c | 2 +-
    arch/x86/platform/uv/uv_time.c | 4 +++-
    5 files changed, 49 insertions(+), 5 deletions(-)

    Dimitri Sivanich
     

10 Nov, 2010

1 commit

  • A new version of the SGI UV hub node controller is being
    developed. A few of the MMRs (control registers) that exist on
    the current hub no longer exist on the new hub. Fortunately,
    there are alternate MMRs that are functionally equivalent
    and that exist on both hubs.

    This patch changes the UV code to use MMRs that exist in BOTH
    versions of the hub node controller.

    Signed-off-by: Jack Steiner
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jack Steiner
     

27 Oct, 2010

1 commit

  • Enable Westmere support on SGI UV. The UV initialization code is dependent on
    the APICID bits. Westmere-EX uses a different APIC bit mapping than Nehalem-EX.
    This code reads the apic shift value from a UV MMR to do the proper bit
    decoding to determine the pnode.

    Signed-off-by: Russ Anderson
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Russ Anderson
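
The decoding described in the commit above amounts to shifting the APIC ID by a value read at run time instead of a compile-time constant. The sketch below assumes the pnode is a contiguous field of the APIC ID; the function name and the mask parameter are hypothetical.

```c
/*
 * Extract the pnode from an APIC ID. `pnode_shift` stands for the shift
 * value read from the UV MMR; `pnode_mask` (an assumed parameter) limits
 * the result to the width of the pnode field.
 */
static inline unsigned int example_apicid_to_pnode(unsigned int apicid,
						   unsigned int pnode_shift,
						   unsigned int pnode_mask)
{
	return (apicid >> pnode_shift) & pnode_mask;
}
```

The same APIC ID therefore decodes to different pnodes depending on the shift: 0x145 gives 5 with a shift of 6 and 10 with a shift of 5.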
     

09 Jun, 2010

8 commits

  • Streamline the large uv_flush_send_and_wait() function by use of
    a couple of helper functions.

    And remove some excess comments.

    Signed-off-by: Cliff Wickman
    Cc: gregkh@suse.de
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     
  • Make the Broadcast Assist Unit driver use the BAU for TLB
    shootdowns of cpu's on the local uvhub.

    It was previously thought that IPI might be faster to the cpu's
    on the local hub. But the IPI operation would have to follow
    the completion of the BAU broadcast anyway. So we broadcast to
    the local uvhub in all cases except when the current cpu was the
    only local cpu in the mask.

    This simplifies uv_flush_send_and_wait() in that it returns
    either all shootdowns complete, or none.

    Adjust the statistics to account for shootdowns on the local
    uvhub.

    Signed-off-by: Cliff Wickman
    Cc: gregkh@suse.de
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     
  • Remove a faulty assumption that a long running BAU request has
    encountered a hardware problem and will never finish.

    Numalink congestion can make a request appear to have
    encountered such a problem, but it is not safe to cancel the
    request. If such a cancel is done but a reply is later received
    we can miss a TLB shootdown.

    We depend upon the max_bau_concurrent 'throttle' to prevent the
    stay-busy case from happening.

    Signed-off-by: Cliff Wickman
    Cc: gregkh@suse.de
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     
  • Move some structure definitions from the C code to the BAU
    header file, and change the organization of that header file a
    little.

    Signed-off-by: Cliff Wickman
    Cc: gregkh@suse.de
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     
  • Use a pointer from the per-cpu BAU control structure to the
    per-cpu BAU statistics structure.
    We nearly always know the first before needing the second.

    Signed-off-by: Cliff Wickman
    Cc: gregkh@suse.de
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     
  • The numalink network can become so congested that TLB shootdown
    using the Broadcast Assist Unit becomes slower than using IPI's.

    In that case, disable the use of the BAU for a period of time.
    The period is tunable. When the period expires the use of the
    BAU is re-enabled. A count of these actions is added to the
    statistics file.

    Signed-off-by: Cliff Wickman
    Cc: gregkh@suse.de
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     
  • Make the Broadcast Assist Unit driver's nine tuning values variable by
    making them accessible through a read/write debugfs file.

    The file will normally be mounted as
    /sys/kernel/debug/sgi_uv/bau_tunables. The tunables are kept in each
    cpu's per-cpu BAU structure.

    The patch also does a little name improvement, and corrects the reset of
    two destination timeout counters.

    Signed-off-by: Cliff Wickman
    Cc: gregkh@suse.de
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     
  • Calculate the Broadcast Assist Unit's destination timeout period from the
    values in the relevant MMR's.

    Store it in each cpu's per-cpu BAU structure so that a destination
    timeout can be differentiated from a 'plugged' situation in which all
    software ack resources are already allocated and a timeout is pending.
    That case returns an immediate destination error.

    Signed-off-by: Cliff Wickman
    Cc: gregkh@suse.de
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cliff Wickman
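
The temporary-disable behaviour described in this group (turn the BAU off for a tunable period when it becomes too slow, re-enable it when the period expires, and count each disable) might be sketched as below. The structure, its field names, and the use of a fixed completion-time threshold to detect congestion are assumptions; the commit messages do not say how slowness is measured.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cpu state; times are in arbitrary but consistent units. */
struct example_bau_state {
	bool disabled;                 /* BAU currently switched off */
	uint64_t reenable_at;          /* time at which it may be used again */
	uint64_t disable_period;       /* tunable length of the off period */
	uint64_t slow_limit;           /* assumed congestion threshold */
	unsigned long period_disables; /* statistic: number of disables */
};

/* Return true if the BAU may be used now, re-enabling it if the period expired. */
static inline bool example_bau_usable(struct example_bau_state *s, uint64_t now)
{
	if (s->disabled && now >= s->reenable_at)
		s->disabled = false;
	return !s->disabled;
}

/* Record how long a broadcast took and disable the BAU if it was too slow. */
static inline void example_bau_record(struct example_bau_state *s,
				      uint64_t now, uint64_t elapsed)
{
	if (!s->disabled && elapsed > s->slow_limit) {
		s->disabled = true;
		s->reenable_at = now + s->disable_period;
		s->period_disables++;
	}
}
```

While `example_bau_usable()` returns false the caller would fall back to IPI-based shootdowns.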
     

17 Apr, 2010

1 commit

  • Fix all sparse warnings in building uv_irq.c.

    arch/x86/kernel/uv_irq.c:46:17: warning: symbol 'uv_irq_chip' was not declared. Should it be static?
    arch/x86/kernel/uv_irq.c:143:50: error: no identifier for function argument
    arch/x86/kernel/uv_irq.c:162:13: error: typename in expression
    arch/x86/kernel/uv_irq.c:162:13: error: undefined identifier 'restrict'
    arch/x86/kernel/uv_irq.c:250:44: error: no identifier for function argument
    arch/x86/kernel/uv_irq.c:260:17: error: typename in expression
    arch/x86/kernel/uv_irq.c:260:17: error: undefined identifier 'restrict'
    arch/x86/kernel/uv_irq.c:233:50: warning: incorrect type in argument 3 (different signedness)
    arch/x86/kernel/uv_irq.c:233:50: expected int *pnode
    arch/x86/kernel/uv_irq.c:233:50: got unsigned int *
    arch/x86/include/asm/uv/uv_hub.h:318:44: warning: incorrect type in argument 2 (different address spaces)
    arch/x86/include/asm/uv/uv_hub.h:318:44: expected void volatile [noderef] *addr
    arch/x86/include/asm/uv/uv_hub.h:318:44: got unsigned long *

    Signed-off-by: Randy Dunlap
    Cc: Dimitri Sivanich
    Cc: Russ Anderson
    Cc: Robin Holt
    Cc: Mike Travis
    Cc: Cliff Wickman
    Cc: Jack Steiner
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     

15 Apr, 2010

1 commit

  • - increase performance of the interrupt handler

    - release timed-out software acknowledge resources

    - recover from continuous-busy status due to a hardware issue

    - add a 'throttle' to keep a uvhub from sending more than a
    specified number of broadcasts concurrently (work around the hardware issue)

    - provide a 'nobau' boot command line option

    - rename 'pnode' and 'node' to 'uvhub' (the 'node' terminology
    is ambiguous)

    - add some new statistics about the scope of broadcasts, retries, the
    hardware issue and the 'throttle'

    - split off new function uv_bau_retry_msg() from
    uv_bau_process_message() per community coding style feedback.

    - simplify the argument list to uv_bau_process_message(), per
    community coding style feedback.

    Signed-off-by: Cliff Wickman
    Cc: linux-mm@kvack.org
    Cc: Jack Steiner
    Cc: Russ Anderson
    Cc: Mike Travis
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cliff Wickman
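
The 'throttle' listed in the commit above limits how many broadcasts a uvhub has in flight at once. A minimal user-space sketch of such a limiter, using C11 atomics and hypothetical names, could look like this; it is not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-uvhub throttle state. */
struct example_throttle {
	atomic_int active;  /* broadcasts currently in flight */
	int max_concurrent; /* limit on concurrent broadcasts */
};

/* Try to claim a broadcast slot; returns false if the limit is reached. */
static inline bool example_throttle_try_acquire(struct example_throttle *t)
{
	int cur = atomic_load(&t->active);

	while (cur < t->max_concurrent) {
		/* On failure `cur` is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&t->active, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Give the slot back once the broadcast has completed. */
static inline void example_throttle_release(struct example_throttle *t)
{
	atomic_fetch_sub(&t->active, 1);
}
```

A sender that fails to acquire a slot would wait and retry, or use another shootdown method, rather than add to the load on the hub.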
     

11 Mar, 2010

1 commit


01 Mar, 2010

1 commit


27 Feb, 2010

1 commit

  • Enable NMI on all cpus in UV system and add an NMI handler
    to dump_stack on each cpu.

    By default on x86 all the cpus except the boot cpu have NMI
    masked off. This patch enables NMI on all cpus in UV system
    and adds an NMI handler to dump_stack on each cpu. This
    way if a system hangs we can NMI the machine and get a
    backtrace from all the cpus.

    Version 2: Use x86_platform driver mechanism for nmi init, per
    Ingo's suggestion.

    Version 3: Clean up Ingo's nits.

    Signed-off-by: Russ Anderson
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Russ Anderson
     

06 Feb, 2010

1 commit


16 Jan, 2010

1 commit


08 Jan, 2010

1 commit


29 Dec, 2009

1 commit

  • The wrong address was being used to write the SCIR led regs on
    remote hubs. Also, there was an inconsistency between how BIOS
    and the kernel indexed these regs. Standardize on using the
    lower 6 bits of the APIC ID as the index.

    This patch fixes writes going to an errant address for cpu
    numbers >= 64.

    Signed-off-by: Mike Travis
    Reviewed-by: Jack Steiner
    Cc: Robin Holt
    Cc: Linus Torvalds
    Cc: stable@kernel.org
    LKML-Reference:
    [ v2: fix a number of annoying checkpatch artifacts and whitespace noise ]
    Signed-off-by: Ingo Molnar

    Mike Travis
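
The indexing rule adopted in the commit above (use the lower 6 bits of the APIC ID) is a single mask operation. The names below are hypothetical.

```c
/* The lower 6 bits of the APIC ID select the register. */
#define EXAMPLE_SCIR_INDEX_MASK 0x3fu

static inline unsigned int example_scir_index(unsigned int apicid)
{
	return apicid & EXAMPLE_SCIR_INDEX_MASK;
}
```

Because of the mask the index always falls in the range 0 to 63, whatever the APIC ID.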
     

18 Dec, 2009

1 commit


16 Dec, 2009

4 commits

  • Create a function to generate the value that is written to the UV hub MMR
    to cause an IPI interrupt to be sent. The function will be used in the
    GRU message queue error recovery code that sends IPIs to nodes in remote
    partitions.

    Signed-off-by: Jack Steiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jack Steiner
     
  • The UV BIOS has moved the location of some of their pointers to the
    "partition reserved page" from memory into a uv hub MMR. The GRU does not
    support bcopy operations from MMR space so we need to special case the MMR
    addresses using VLOAD operations.

    Additionally, the BIOS call for registering a message queue watchlist has
    removed the 'blade' value and eliminated the structure that was being
    passed in. This is also reflected in this patch.

    Signed-off-by: Robin Holt
    Cc: Jack Steiner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
     
  • Provide a mechanism for determining if a global physical address is
    pointing to a UV hub MMR.

    Signed-off-by: Robin Holt
    Cc: Jack Steiner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
     
  • The UV BIOS has been updated to implement some of our interface
    functionality differently than originally expected. These patches update
    the kernel to the BIOS implementation and include a few minor bug fixes
    which prevent us from doing significant testing on real hardware.

    This patch:

    For SGI UV systems, translate from a global physical address back to a
    socket physical address. This does nothing to ensure the socket physical
    address is actually addressable by the kernel. That is the responsibility
    of the user of the function.

    Signed-off-by: Robin Holt
    Cc: Jack Steiner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
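
The two address helpers described in this group (test whether a global physical address falls in hub MMR space, and translate a global physical address back to a socket physical address) might look roughly like the sketch below. The layout structure, its fields, and the assumption that the translation is a simple mask are all hypothetical; the real layout depends on the hub configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the address layout. */
struct example_hub_layout {
	uint64_t mmr_base; /* assumed start of the hub MMR window */
	uint64_t mmr_size; /* assumed size of that window */
	uint64_t gpa_mask; /* assumed mask selecting the socket-address bits */
};

/* True if `gpa` lies inside the hub MMR window. */
static inline bool example_gpa_in_mmr_space(const struct example_hub_layout *l,
					    uint64_t gpa)
{
	return gpa >= l->mmr_base && gpa - l->mmr_base < l->mmr_size;
}

/*
 * Strip the global bits to recover a socket physical address. As the
 * commit notes, nothing here checks that the result is addressable by
 * the kernel; that remains the caller's responsibility.
 */
static inline uint64_t example_gpa_to_soc_paddr(const struct example_hub_layout *l,
						uint64_t gpa)
{
	return gpa & l->gpa_mask;
}
```

A caller copying data out of such an address would first use the range check to choose between an ordinary copy and the special-case MMR path described in the second commit of this group.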