30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
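
    The ordering check described above can be sketched in C as follows.
    This is only an illustration, not the logic of slabh-sweep.py: the
    names classify_order() and enum order_sketch are hypothetical, and
    it assumes that "Christmas tree" means include lines of
    non-decreasing length and "rev-Xmas-tree" means non-increasing
    length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for the orderings named in the commit message. */
enum order_sketch { ORDER_ALPHA, ORDER_XMAS, ORDER_REV_XMAS, ORDER_NONE };

/*
 * Classify how a block of include paths is ordered.
 * Alphabetical order is tested first, then line length growing
 * (Christmas tree), then line length shrinking (reverse Christmas tree).
 */
static enum order_sketch classify_order(const char *const *inc, size_t n)
{
    int alpha = 1, xmas = 1, rev = 1;
    size_t i;

    assert(inc != NULL || n == 0);

    for (i = 1; i < n; i++) {
        size_t prev = strlen(inc[i - 1]);
        size_t cur = strlen(inc[i]);

        if (strcmp(inc[i - 1], inc[i]) > 0)
            alpha = 0;          /* not sorted by name */
        if (prev > cur)
            xmas = 0;           /* a line got shorter */
        if (prev < cur)
            rev = 0;            /* a line got longer */
    }

    if (alpha)
        return ORDER_ALPHA;
    if (xmas)
        return ORDER_XMAS;
    if (rev)
        return ORDER_REV_XMAS;
    return ORDER_NONE;          /* caller would append at the end */
}
```

    A caller would classify the existing block first and then choose the
    insertion point that keeps that ordering, falling back to the end of
    the block when ORDER_NONE is returned.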

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

14 Nov, 2008

1 commit

  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.
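
    The wrapper idea can be illustrated with a small userspace sketch.
    It is not the kernel's definition of these accessors: struct
    task_sketch, id_sketch_t and the sketch_task_*() helpers are
    hypothetical names used only to show why callers should stop
    naming the fields directly.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int id_sketch_t;

/* Hypothetical stand-in for the credential fields kept in a task. */
struct task_sketch {
    id_sketch_t uid, euid;
    id_sketch_t gid, egid;
};

/*
 * Accessors: once every caller goes through these, the storage behind
 * them can move out of the task structure without touching the callers.
 */
static inline id_sketch_t sketch_task_uid(const struct task_sketch *t)
{
    assert(t != NULL);
    return t->uid;
}

static inline id_sketch_t sketch_task_euid(const struct task_sketch *t)
{
    assert(t != NULL);
    return t->euid;
}

static inline id_sketch_t sketch_task_egid(const struct task_sketch *t)
{
    assert(t != NULL);
    return t->egid;
}
```

    A check written as "t->euid == 0" becomes "sketch_task_euid(t) == 0",
    which keeps working if the fields are later placed behind a pointer.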

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Tony Luck
    Cc: linux-ia64@vger.kernel.org
    Signed-off-by: James Morris

    David Howells
     

05 Feb, 2008

1 commit


12 May, 2007

1 commit


09 May, 2007

1 commit


09 Mar, 2007

2 commits

  • Similar to memory error recovery, when a cache error is consumed
    by a user process, terminate the user process instead of crashing
    the system.

    Signed-off-by: Russ Anderson (rja@sgi.com)
    Acked-by: Hidetoshi Seto
    Signed-off-by: Tony Luck

    Russ Anderson
     
  • Jack Steiner noticed that duplicate TLB DTC entries do not cause a
    linux panic. See discussion:

    http://www.gelato.unsw.edu.au/archives/linux-ia64/0307/6108.html

    The current TLB recovery code is recovering from the duplicate itr.d
    dropins, masking the underlying problem. This change modifies
    the MCA recovery code to look for the TLB check signature of the
    duplicate TLB entry and panic in that case.

    Signed-off-by: Russ Anderson (rja@sgi.com)
    Signed-off-by: Tony Luck

    Russ Anderson
     

01 Nov, 2006

1 commit

  • The information in MCA records is filled in slightly differently on
    Montecito than on Madison/McKinley. Usually, the cache check and bus
    check target identifiers have the same address. On Montecito the
    cache check and bus check target identifiers can be different if
    a corrected error (ie SBE or unconsumed poison data) was encountered and
    then an uncorrected error (ie DBE) was consumed. In that case, the
    cache check target identifier is the physical address of the DBE (that
    caused the MCA to surface) while the bus check target identifier is the
    physical address of the SBE. This patch correctly finds the target
    identifier that triggered the MCA.

    If there are multiple valid cache target identifiers in the same
    error record then use the one with the lowest cache level.
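
    The selection rule above can be sketched as follows. The structure
    is a hypothetical simplification of a cache check entry, not the
    layout of the real error record, and the sketch assumes that
    "lowest cache level" means the smallest numeric level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one cache check entry. */
struct cache_check_sketch {
    int      tv;        /* nonzero if the target identifier is valid */
    unsigned level;     /* cache level the check refers to */
    uint64_t target;    /* target identifier (physical address) */
};

/*
 * Return the valid entry with the lowest cache level,
 * or NULL if no entry carries a valid target identifier.
 */
static const struct cache_check_sketch *
pick_lowest_level(const struct cache_check_sketch *c, size_t n)
{
    const struct cache_check_sketch *best = NULL;
    size_t i;

    assert(c != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (!c[i].tv)
            continue;   /* skip entries without a valid target */
        if (best == NULL || c[i].level < best->level)
            best = &c[i];
    }
    return best;
}
```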

    Signed-off-by: Russ Anderson (rja@sgi.com)
    Signed-off-by: Tony Luck

    Russ Anderson
     

27 Sep, 2006

1 commit

  • Printing messages to the console from the MCA/INIT handlers is
    useful, but setting oops_in_progress = 1 in them leaves the
    kernel in a wrong state. This is especially ugly if the
    system misbehaves after returning from a recoverable MCA.

    This patch adds an ia64_mca_printk() function that collects
    messages into a small temporary message buffer while in the
    MCA/INIT environment and prints them out later, after
    returning to normal context or when the handlers decide to
    bring the system down.

    The print function is also exported for use in extension
    MCA handlers, where it is useful for describing details of
    the recovery.
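
    The collect-now, print-later scheme can be sketched in userspace C.
    This is not the ia64_mca_printk() implementation: the names
    buffered_printf(), flush_messages() and MSG_BUF_SIZE are
    hypothetical, and the sketch ignores the locking and context
    checks a real handler would need.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MSG_BUF_SIZE 256        /* hypothetical, deliberately small */

static char msg_buf[MSG_BUF_SIZE];
static size_t msg_len;          /* bytes stored, excluding the NUL */

/*
 * Append a formatted message to the buffer instead of printing it.
 * Text that does not fit is truncated. Returns the bytes stored.
 */
static size_t buffered_printf(const char *fmt, ...)
{
    size_t room = MSG_BUF_SIZE - msg_len;   /* includes the NUL */
    va_list ap;
    int n;

    assert(fmt != NULL);
    if (room <= 1)
        return 0;               /* buffer is full */

    va_start(ap, fmt);
    n = vsnprintf(msg_buf + msg_len, room, fmt, ap);
    va_end(ap);

    if (n < 0)
        return 0;               /* formatting error */
    if ((size_t)n >= room)
        n = (int)(room - 1);    /* output was truncated */
    msg_len += (size_t)n;
    return (size_t)n;
}

/* Write the collected text to 'out' and reset the buffer. */
static size_t flush_messages(FILE *out)
{
    size_t flushed = msg_len;

    assert(out != NULL);
    if (flushed)
        fwrite(msg_buf, 1, flushed, out);
    msg_len = 0;
    msg_buf[0] = '\0';
    return flushed;
}
```

    The handler side would call buffered_printf() only, and the flush
    would run once normal context is reached or just before the system
    is brought down.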

    NOTE:
    I don't think it is sane to enlarge the temporary message
    buffer enough to hold whole stack dumps from INIT, so
    buffering is disabled during the stack dump from the INIT monarch
    (= default_monarch_init_process). Please fix this in the future.

    Signed-off-by: Hidetoshi Seto
    Acked-by: Russ Anderson
    Signed-off-by: Tony Luck

    Hidetoshi Seto
     

01 Jul, 2006

1 commit


28 Apr, 2006

1 commit

  • When the mca recovery code encounters a condition that makes
    the MCA non-recoverable, print the reason it could not recover.
    This will make it easier to identify why the recovery code did
    not recover.

    Signed-off-by: Russ Anderson
    Signed-off-by: Tony Luck

    Russ Anderson
     

25 Mar, 2006

1 commit

  • Memory errors encountered by user applications may surface
    when the CPU is running in kernel context. The current code
    will not attempt recovery if the MCA surfaces in kernel
    context (privilege mode 0). This patch adds a check for cases
    where the user initiated the load that surfaces in kernel
    interrupt code.

    An example is a user process launching a load from memory
    and the data in memory had bad ECC. Before the bad data
    gets to the CPU register, an interrupt comes in. The
    code jumps to the IVT interrupt entry point and begins
    execution in kernel context. The process of saving the
    user registers (SAVE_REST) causes the bad data to be loaded
    into a CPU register, triggering the MCA. The MCA surfaces in
    kernel context, even though the load was initiated from
    user context.

    As suggested by David and Tony, this patch uses an exception
    table like approach, putting the tagged recovery addresses in
    a searchable table. One difference from the exception table
    is that MCAs do not surface in precise places (such as with
    a TLB miss), so instead of tagging specific instructions,
    address ranges are registered. A single macro is used to do
    the tagging, with the input parameter being the label
    of the starting address and the macro being the ending
    address. This limits clutter in the code.

    This patch only tags one spot, the interrupt ivt entry.
    Testing showed that spot to be a "heavy hitter" with
    MCAs surfacing while saving user registers. Other spots
    can be added as needed by adding a single macro.
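
    The range lookup described above can be sketched as a plain table
    search. The structure and function names are hypothetical and the
    sketch treats each range as half-open, [start, end); the patch
    itself builds its table through an assembler macro, which is not
    reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one tagged code range [start, end). */
struct range_sketch {
    uint64_t start;     /* address of the starting label */
    uint64_t end;       /* address where the tagging macro was placed */
};

/*
 * Return 1 if the interrupted instruction address falls inside any
 * tagged range, 0 otherwise. A linear scan is enough for a short table.
 */
static int addr_in_tagged_range(const struct range_sketch *tbl, size_t n,
                                uint64_t ip)
{
    size_t i;

    assert(tbl != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (ip >= tbl[i].start && ip < tbl[i].end)
            return 1;
    }
    return 0;
}
```

    Because MCAs surface imprecisely, the test is on membership in a
    range rather than on equality with one tagged instruction address.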

    Signed-off-by: Russ Anderson (rja@sgi.com)
    Signed-off-by: Tony Luck

    Russ Anderson
     

08 Mar, 2006

2 commits

  • When there is no bus check, the return code should be failure, not success.

    Signed-off-by: Russ Anderson (rja@sgi.com)
    Signed-off-by: Tony Luck

    Russ Anderson
     
  • The MCA recovery messages are currently KERN_DEBUG,
    so they don't show up in /var/log/messages (by default).
    Increase the severity to KERN_ERR, for the initial
    message (and also add the physical address to this
    message). Leave the successful isolation message as
    KERN_DEBUG, but increase the severity when isolation
    fails to KERN_CRIT.

    [Russ' patch made these all KERN_CRIT]

    Signed-off-by: Russ Anderson (rja@sgi.com)
    Signed-off-by: Tony Luck

    Russ Anderson
     

10 Feb, 2006

1 commit


11 Nov, 2005

1 commit


09 Nov, 2005

3 commits

  • When a page has a memory uncorrectable ECC error, the recovery
    code wants to prevent the page from being reused. This change
    bumps the reference count to prevent the page from getting back
    on the free list.
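
    Why an extra reference keeps a page off the free list can be shown
    with a toy reference counter. struct page_sketch and both helpers
    are hypothetical and only model the counting, not the kernel's
    page management.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor with just the fields needed here. */
struct page_sketch {
    int refcount;       /* number of current users */
    int on_free_list;   /* set once the last reference is dropped */
};

/* Take one more reference; the recovery path uses this to pin the page. */
static void get_page_sketch(struct page_sketch *p)
{
    assert(p != NULL);
    p->refcount++;
}

/* Drop a reference; the page is freed only when the count reaches zero. */
static void put_page_sketch(struct page_sketch *p)
{
    assert(p != NULL && p->refcount > 0);
    if (--p->refcount == 0)
        p->on_free_list = 1;
}
```

    With the extra reference held and never dropped, the owner's final
    put leaves the count at one, so the bad page is never handed out
    again.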

    Signed-off-by: Russ Anderson (rja@sgi.com)
    Signed-off-by: Tony Luck

    Russ Anderson
     
  • paddr needs to be shifted by PAGE_SHIFT to be valid
    input for pfn_valid().
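
    The conversion can be illustrated with a small sketch. The page
    shift of 14 (16 KiB pages) is an assumption for the example, and
    pfn_valid_sketch() is a hypothetical stand-in that only compares
    against a maximum frame number.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 14    /* assumption: 16 KiB pages */

/* Convert a physical address to a page frame number. */
static uint64_t paddr_to_pfn(uint64_t paddr)
{
    return paddr >> PAGE_SHIFT_SKETCH;
}

/* Hypothetical validity check: frame numbers below max_pfn are valid. */
static int pfn_valid_sketch(uint64_t pfn, uint64_t max_pfn)
{
    assert(max_pfn > 0);
    return pfn < max_pfn;
}
```

    Passing the raw physical address instead of the shifted value makes
    the check compare a byte address against a frame count, so it can
    reject addresses that are in fact backed by memory.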

    Signed-off-by: Russ Anderson
    Signed-off-by: Tony Luck

    Russ Anderson
     
  • The determination of whether an MCA is recoverable or not must
    be based on the bits set in the PSP (Processor State Parameter).
    The specific bits are shown in the Intel IA-64 Architecture Software
    Developer's Manual, Vol 2, Table 11-6 Software Recovery Bits in
    Processor State Parameter. Those bits should be consistent
    across the entire IA-64 family of processors.

    Signed-off-by: Russ Anderson
    Signed-off-by: Tony Luck

    Russ Anderson
     

23 Sep, 2005

1 commit


17 Sep, 2005

1 commit


12 Sep, 2005

1 commit

  • The bulk of the change. Use per cpu MCA/INIT stacks. Change the SAL
    to OS state (sos) to be per process. Do all the assembler work on the
    MCA/INIT stacks, leaving the original stack alone. Pass per cpu state
    data to the C handlers for MCA and INIT, which also means changing the
    mca_drv interfaces slightly. Lots of verification on whether the
    original stack is usable before converting it to a sleeping process.

    Signed-off-by: Keith Owens
    Signed-off-by: Tony Luck

    Keith Owens
     

04 May, 2005

1 commit

  • Jack Steiner uncovered some opportunities for improvement in
    the MCA recovery code.

    1) Set bsp to save registers on the kernel stack.
    2) Disable interrupts while in the MCA recovery code.
    3) Change the way the user process is killed, to avoid
    a panic in schedule.

    Testing shows that these changes make the recovery code much
    more reliable with the 2.6.12 kernel.

    Signed-off-by: Russ Anderson
    Signed-off-by: Tony Luck

    Russ Anderson
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds