02 Jul, 2013

8 commits


26 Jun, 2013

3 commits

  • Critical interrupts are not handled on PPC64 BookE machines,
    so when the first watchdog interrupt fires the machine will
    freeze without a warning until it's rebooted by the second
    watchdog trigger.
    Plus, the interrupt isn't used anyway since the driver
    expects a usermode app to ping the watchdog periodically.

    Signed-off-by: Laurentiu Tudor
    Signed-off-by: Scott Wood

    Tudor Laurentiu
     
  • In case of a collision on the i2c bus, the controller which lost bus mastership
    stays as a slave for all subsequent transfers. This results in the i2c
    controller never writing to the bus for future transactions, resulting
    in i2c transfer timeouts.
    This fix checks for a collision on the last I2C transaction and sets the
    I2COM_MASTER bit for the new transaction.

    Signed-off-by: Sachin Surendran
    Signed-off-by: Scott Wood

    Sachin Surendran
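
The recovery step described above can be sketched in userspace as follows. This is only an illustration: `struct mock_i2c`, `mock_collision()` and `mock_start_xfer()` are hypothetical names, and the bit values are assumptions rather than values taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values for this sketch; check the real driver before relying on them. */
#define I2COM_MASTER 0x01u /* controller acts as bus master */
#define I2COM_START  0x80u /* start a transmission */

/* Hypothetical model of the controller state. */
struct mock_i2c {
    uint8_t i2com;           /* command register */
    bool last_xfer_collided; /* set when the previous transfer lost arbitration */
};

/* Models the hardware behaviour: losing arbitration drops master mode. */
static void mock_collision(struct mock_i2c *c)
{
    c->i2com &= (uint8_t)~I2COM_MASTER;
    c->last_xfer_collided = true;
}

/* Before starting a new transfer, restore master mode if the last one collided. */
static void mock_start_xfer(struct mock_i2c *c)
{
    if (c->last_xfer_collided) {
        c->i2com |= I2COM_MASTER;
        c->last_xfer_collided = false;
    }
    c->i2com |= I2COM_START;
}
```

Without the check in `mock_start_xfer()`, the model would stay in slave mode after a collision, which is the timeout scenario the commit message describes.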
     
  • Use the module_i2c_driver() macro to make the code smaller
    and a bit simpler.

    dpatch engine is used to auto generate this patch.
    (https://github.com/weiyj/dpatch)

    Signed-off-by: Wei Yongjun
    Signed-off-by: Scott Wood

    Wei Yongjun
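
As a rough illustration of the boilerplate such a helper macro removes, here is a userspace mock. The struct, the two stub functions and `mock_module_i2c_driver()` are simplified stand-ins written for this sketch, not the kernel definitions.

```c
/* Userspace stand-ins for the kernel type and helpers (assumptions for this sketch). */
struct i2c_driver {
    const char *name;
    int registered;
};

static int i2c_add_driver(struct i2c_driver *drv)
{
    drv->registered = 1;
    return 0;
}

static void i2c_del_driver(struct i2c_driver *drv)
{
    drv->registered = 0;
}

/*
 * Simplified mock of the helper macro: it generates the init/exit pair
 * that a driver would otherwise spell out by hand.
 */
#define mock_module_i2c_driver(drv)                                \
    static int drv##_init(void) { return i2c_add_driver(&drv); }   \
    static void drv##_exit(void) { i2c_del_driver(&drv); }

static struct i2c_driver demo_driver = { .name = "demo", .registered = 0 };

/* One line replaces two hand-written functions. */
mock_module_i2c_driver(demo_driver)
```

In the real kernel the generated functions are also wired up as the module entry and exit points; the mock leaves that part out.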
     

25 Jun, 2013

1 commit

  • Interlaken is a narrow, high speed channelized chip-to-chip interface. To
    facilitate interoperability between a data path device and a look-aside
    co-processor, the Interlaken Look-Aside protocol is defined for short
    transaction-related transfers. Although based on the Interlaken protocol,
    Interlaken Look-Aside is not directly compatible with Interlaken and can be
    considered a different operation mode.

    The Interlaken LA controller connects the internal platform to the Interlaken serial
    interface. It accepts LA commands through software portals, which are system
    memory mapped 4KB spaces. The LA commands are then translated into
    Interlaken control words and data words, which are sent on the TX side to the TCAM
    through SerDes lanes.

    Signed-off-by: Joe Liccese
    Signed-off-by: Scott Wood

    Joe Liccese
     

21 Jun, 2013

20 commits

  • Hugepage invalidate involves invalidating multiple hpte entries.
    Optimize the operation using H_BULK_REMOVE on lpar platforms.
    On native, reduce the number of tlb flushes.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • We enable only if we support 16MB page size.

    Reviewed-by: David Gibson
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • We find all the overlapping vmas and mark them such that we don't allocate
    hugepages in that range. Also we split existing huge pages so that the
    normal page hash can be invalidated and new pages faulted in with new
    protection bits.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • With THP we set the pmd to none before we do pte_clear. Hence we can't
    walk the page table to get the pte lock ptr and verify whether it is locked.
    THP does take the pte lock before calling pte_clear, so we don't change the locking
    rules here. It is just that we can't use page table walking to check whether
    pte locks are held with THP.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • GCC is very likely to read the pagetables just once and cache them in
    the local stack or in a register, but it can also decide to re-read
    the pagetables. The problem is that the pagetable in those places can
    change from under gcc.

    With THP/hugetlbfs the pmd (and pgd for hugetlbfs giga pages) can
    change under gup_fast. The pages won't be freed until we finish
    gup fast because we have irqs disabled and we free these pages via
    rcu callback.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
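
One common way to avoid the re-read problem described above is to load the entry exactly once into a local variable and run every check against that snapshot. The sketch below is a userspace illustration only; `pmd_val_t`, `READ_ENTRY_ONCE()`, the flag value and `classify_entry()` are hypothetical names, and the volatile cast is similar in spirit to the kernel's ACCESS_ONCE().

```c
#include <stdint.h>

typedef uint64_t pmd_val_t; /* stand-in for a page table entry value */

/* Force a single load of the entry so the compiler cannot re-read it later. */
#define READ_ENTRY_ONCE(p) (*(volatile pmd_val_t *)(p))

#define ENTRY_NONE 0ull
#define ENTRY_HUGE 0x1ull /* hypothetical "huge mapping" flag */

/*
 * Classify an entry: 0 = none, 1 = huge, 2 = normal.
 * All checks use the same on-stack snapshot, so a concurrent update
 * cannot make the second check disagree with the first.
 */
static int classify_entry(pmd_val_t *pmdp)
{
    pmd_val_t pmd = READ_ENTRY_ONCE(pmdp);

    if (pmd == ENTRY_NONE)
        return 0;
    if (pmd & ENTRY_HUGE)
        return 1;
    return 2;
}
```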
     
  • We need to have irqs disabled to handle all the possible parallel updates to the
    linux page table without holding locks.

    Events that we are interested in while walking page tables are
    1) Page fault
    2) unmap
    3) THP split
    4) THP collapse

    A) local_irq_disabled:
    ------------------------
    1) page fault:
    A none to valid transition via page fault is not an issue because we
    would either see a none or valid. If it is none, we would error out
    the page table walk. We may need to use on stack values when checking for
    the type of page table elements, because if we do

    if (!is_hugepd()) {
    if (!pmd_none()) {
    if (pmd_bad()) {

    We could take that bad condition because the pmd got converted to a hugepd
    after the !is_hugepd check via a hugetlb fault.

    The right way would be to check for pmd_none higher up or use an on stack value.

    2) A valid to none conversion via unmap:
    We can safely walk the upper level table, because we don't remove the
    page table entries until the rcu grace period. So even if we followed a
    wrong pointer we still have the pointer valid till the grace period.

    A PTE pointer returned needs to be atomically checked for _PAGE_PRESENT and
    _PAGE_BUSY. A valid pointer returned could become none later. To prevent
    pte_clear we take _PAGE_BUSY.

    3) THP split:
    A valid transparent hugepage is converted to a normal page. Before we split we
    do pmd_splitting_flush, which sets the hugepage PTE to _PAGE_SPLITTING.
    So when walking the page table we need to check for pmd_trans_splitting and
    handle that. The pte returned also needs to be checked for
    _PAGE_SPLITTING before setting _PAGE_BUSY, similar to _PAGE_PRESENT. We save
    the value of the PTE on stack and check for the flag in the local pte value.
    If we don't have the value set we can safely operate on the local pte value
    and we atomically set _PAGE_BUSY.

    4) THP collapse:
    A normal page gets converted to a hugepage. In the collapse path, we
    mark the pmd none early (pmdp_clear_flush). With irqs disabled, if we
    are already walking the page table we would see the pmd_none and won't continue.
    If we see a valid PMD, we should still check for _PAGE_PRESENT before
    setting _PAGE_BUSY, to make sure we didn't collapse the PTE to a Huge PTE.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
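
The "check the flags on a local copy, then atomically set _PAGE_BUSY" step from cases 2 to 4 can be sketched with C11 atomics. This is a userspace model under stated assumptions: the flag values are placeholders, `pte_try_lock()` is a hypothetical helper, and a compare-and-swap stands in for whatever primitive the kernel actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for this sketch; not the kernel's bit layout. */
#define PAGE_PRESENT   0x1ull
#define PAGE_BUSY      0x2ull
#define PAGE_SPLITTING 0x4ull

/*
 * Try to take the BUSY bit on a PTE.
 * The entry is read into a local value, the flags are checked on that
 * copy, and BUSY is set only if the entry is still unchanged.
 * Returns false if the entry is not present, is being split, or is
 * already busy (a real caller might retry in the busy case).
 */
static bool pte_try_lock(_Atomic uint64_t *ptep, uint64_t *snapshot)
{
    uint64_t old = atomic_load(ptep);

    for (;;) {
        if (!(old & PAGE_PRESENT) || (old & PAGE_SPLITTING))
            return false;
        if (old & PAGE_BUSY)
            return false;
        /* On failure, 'old' is refreshed with the current value and re-checked. */
        if (atomic_compare_exchange_weak(ptep, &old, old | PAGE_BUSY)) {
            *snapshot = old;
            return true;
        }
    }
}
```

Because the flag checks and the update are tied together by the compare-and-swap, a concurrent unmap, split or collapse that changes the entry between the read and the write makes the swap fail and the checks run again.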
     
  • The deposited PTE page in the second half of the PMD table is used to
    track the state of hash PTEs. After updating the HPTE, we mark the
    corresponding slot in the deposited PTE page valid.

    Reviewed-by: David Gibson
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • Reviewed-by: David Gibson
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • We can find ptes that are splitting while walking page tables. Return
    a none pte in that case.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly
    document why we don't need to handle transparent hugepages at callsites.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • Reviewed-by: David Gibson
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • We will use this in a later patch for handling THP pages.

    Reviewed-by: David Gibson
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • We now have pmd entries covering a 16MB range and the PMD table doubles its original size.
    We use the second half of the PMD table to deposit the pgtable (PTE page).
    The deposited PTE page is further used to track the HPTE information. The information
    includes [ secondary group | 3 bit hidx | valid ]. We use one byte per HPTE entry.
    With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need
    4096 entries. Both will fit in a 4K PTE page. On hugepage invalidate we need to walk
    the PTE page and invalidate all valid HPTEs.

    This patch implements the necessary arch specific functions for THP support and also
    the hugepage invalidate logic. These PMD related functions are intentionally kept
    similar to their PTE counterparts.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
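
The entry counts and the one-byte-per-HPTE layout above can be checked with a small sketch. The exact bit positions are an assumption made for illustration (valid in bit 0, the 3 bit hidx above it, the secondary group bit above that), and all helper names are hypothetical.

```c
#include <stdint.h>

#define HUGEPAGE_SIZE (16u * 1024u * 1024u) /* 16MB hugepage */

/* Number of HPTEs needed to cover one hugepage for a given HPTE page size. */
static unsigned int hpte_count(unsigned int hpte_page_size)
{
    return HUGEPAGE_SIZE / hpte_page_size;
}

/* Pack [ secondary group | 3 bit hidx | valid ] into one byte (assumed bit order). */
static uint8_t slot_encode(unsigned int secondary, unsigned int hidx)
{
    return (uint8_t)(((secondary & 0x1u) << 4) | ((hidx & 0x7u) << 1) | 0x1u);
}

static int slot_valid(uint8_t slot)
{
    return slot & 0x1;
}

static unsigned int slot_hidx(uint8_t slot)
{
    return (slot >> 1) & 0x7u;
}

static unsigned int slot_secondary(uint8_t slot)
{
    return (slot >> 4) & 0x1u;
}
```

With one byte per entry, the 64K case needs 256 bytes and the 4K case needs 4096 bytes, so both fit in a 4K PTE page as the commit message states.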
     
  • THP code does PTE page allocation along with the large page request and deposits them
    for later use. This is to ensure that we won't have any failures when we split
    hugepages to regular pages.

    On powerpc we want to use the deposited PTE page for storing hash pte slot and
    secondary bit information for the HPTEs. We use the second half
    of the pmd table to save the deposited PTE page.

    Reviewed-by: David Gibson
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • If a hash bucket gets full, we "evict" a more or less random entry from it.
    When we do that we don't invalidate the TLB (hpte_remove) because we assume
    the old translation is still technically "valid". This implies that when
    we are invalidating or updating a pte, even if the HPTE entry is not valid
    we should do a tlb invalidate. With hugepages, we need to pass the correct
    actual page size value for tlb invalidation.

    This change updates the patch 0608d692463598c1d6e826d9dd7283381b4f246c
    "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle
    transparent hugepages correctly.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Benjamin Herrenschmidt

    Aneesh Kumar K.V
     
  • The patch creates debugfs entries (powerpc/PCIxxxx/err_injct) for
    injecting EEH errors for testing purposes.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch creates one debugfs directory ("powerpc/PCIxxxx") for
    each PHB so that we can hook the EEH error injection debugfs entry
    there in a following patch.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch registers an OPAL event notifier and processes the PCI errors
    from firmware. If we have pending PCI errors, a special EEH event
    (without binding PE) will be sent to the EEH core for processing.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • While we're restarting or powering off the system, we no longer need
    the OPAL notifier, so just disable it.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • This patch implements a notifier to receive a notification on OPAL
    event mask changes. The notifier is only called as a result of an OPAL
    interrupt, which will happen upon reception of FSP messages or PCI errors.
    Any event mask change detected as a result of opal_poll_events() will not
    result in a notifier call.

    [benh: changelog]
    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
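
A minimal userspace sketch of a notifier driven by event mask changes is shown below. Everything here is hypothetical (`mock_notifier_register()`, `mock_interrupt()`, the fixed-size callback table), and skipping the callbacks when nothing changed is a choice made for this sketch rather than something the commit message states.

```c
#include <stddef.h>
#include <stdint.h>

/* Callback receives the current event bits and the bits that changed. */
typedef void (*event_cb)(uint64_t events, uint64_t changed);

#define MAX_CBS 4

static event_cb cbs[MAX_CBS];
static size_t n_cbs;
static uint64_t last_mask; /* event mask seen at the previous notification */

static int mock_notifier_register(event_cb cb)
{
    if (n_cbs == MAX_CBS)
        return -1;
    cbs[n_cbs++] = cb;
    return 0;
}

/* Called only from the (mock) interrupt path, never from polling. */
static void mock_interrupt(uint64_t events)
{
    uint64_t changed = last_mask ^ events;

    last_mask = events;
    if (!changed)
        return;
    for (size_t i = 0; i < n_cbs; i++)
        cbs[i](events, changed);
}

/* Example callback that records what it was told. */
static uint64_t seen_events, seen_changed;
static unsigned int seen_calls;

static void record_cb(uint64_t events, uint64_t changed)
{
    seen_events = events;
    seen_changed = changed;
    seen_calls++;
}
```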
     

20 Jun, 2013

8 commits

  • It's meaningless to handle a frozen PE if we already have a fenced PHB.
    The patch checks the PHB state before checking the PE. If the
    PHB has been put into the fenced state, we need to take care of that first.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch enables the EEH check and lets the EEH core process the EEH
    errors for the PowerNV platform while accessing config space. Originally,
    the implementation already had a mechanism to check EEH errors and
    tried to recover from them. However, we never let the EEH core handle
    the EEH errors.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch initializes EEH for the PowerNV platform. Because the OPAL
    APIs require the HUB ID, we need to track that through struct pnv_phb.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch adds EEH backends for the PowerNV platform. Note that
    some of those EEH backends call into the I/O chip dependent backends.

    [Removed pointless change to eeh_pseries.c -- BenH]

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch implements the backend for the EEH core to retrieve the next
    EEH error to handle. For informational errors, we won't bother
    the EEH core. Otherwise, the EEH core should take appropriate action
    depending on the return value:

    0 - No further errors detected
    1 - Frozen PE
    2 - Fenced PHB
    3 - Dead PHB
    4 - Dead IOC

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
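
The return values listed above map naturally onto an enum. The names below are hypothetical and only illustrate how a caller might branch on the result; they are not the identifiers used in the patch.

```c
/* Hypothetical names for the return codes listed above. */
enum next_error {
    NEXT_ERR_NONE       = 0, /* no further errors detected */
    NEXT_ERR_FROZEN_PE  = 1,
    NEXT_ERR_FENCED_PHB = 2,
    NEXT_ERR_DEAD_PHB   = 3,
    NEXT_ERR_DEAD_IOC   = 4,
};

/* Human-readable label for a return code. */
static const char *next_error_name(int rc)
{
    switch (rc) {
    case NEXT_ERR_NONE:       return "no further errors";
    case NEXT_ERR_FROZEN_PE:  return "frozen PE";
    case NEXT_ERR_FENCED_PHB: return "fenced PHB";
    case NEXT_ERR_DEAD_PHB:   return "dead PHB";
    case NEXT_ERR_DEAD_IOC:   return "dead IOC";
    default:                  return "unknown";
    }
}

/* 1 if the caller has an error to act on, 0 if there is nothing to do, -1 if unknown. */
static int next_error_needs_action(int rc)
{
    if (rc == NEXT_ERR_NONE)
        return 0;
    if (rc >= NEXT_ERR_FROZEN_PE && rc <= NEXT_ERR_DEAD_IOC)
        return 1;
    return -1;
}
```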
     
  • The patch adds backends to retrieve error log and configure p2p
    bridges for the indicated PE.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
     
  • The patch adds the I/O chip backend to do PE reset. For now, we
    focus on PCI bus dependent PEs. If the PHB PE has been put into an error
    state, the PHB will take a complete reset. Besides, the root bridge
    will take a fundamental or hot reset accordingly if the indicated
    PE is located at the top of the PCI hierarchy tree. Otherwise, the
    upstream p2p bridge will take a hot reset.

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
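
The reset selection described above amounts to a three-way decision, sketched here with hypothetical types. The choice between a fundamental and a hot reset on the root bridge is left to the caller, since the commit message does not spell out how it is made.

```c
#include <stdbool.h>

/* Hypothetical description of the PE being reset. */
struct mock_pe {
    bool is_phb_pe;           /* the PE represents the whole PHB */
    bool at_top_of_hierarchy; /* the PE sits directly below the root bridge */
};

enum reset_target {
    RESET_PHB_COMPLETE,    /* complete reset of the PHB */
    RESET_ROOT_BRIDGE,     /* fundamental or hot reset on the root bridge */
    RESET_UPSTREAM_BRIDGE, /* hot reset on the upstream p2p bridge */
};

/* Pick what to reset, in the order the commit message describes. */
static enum reset_target choose_reset(const struct mock_pe *pe)
{
    if (pe->is_phb_pe)
        return RESET_PHB_COMPLETE;
    if (pe->at_top_of_hierarchy)
        return RESET_ROOT_BRIDGE;
    return RESET_UPSTREAM_BRIDGE;
}
```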
     
  • The patch adds the I/O chip backend to retrieve the state for the
    indicated PE. While the PE state is temporarily unavailable,
    the upper layer (powernv platform) should return the default delay
    (1 second).

    Signed-off-by: Gavin Shan
    Signed-off-by: Benjamin Herrenschmidt

    Gavin Shan
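
The delay handling can be modelled as below. The state codes, the millisecond unit and `mock_get_state()` are assumptions for illustration; only the 1 second default comes from the commit message.

```c
/* Hypothetical PE state codes. */
enum mock_pe_state {
    MOCK_PE_NORMAL,
    MOCK_PE_FROZEN,
    MOCK_PE_UNAVAILABLE, /* state temporarily unavailable */
};

#define MOCK_DEFAULT_DELAY_MS 1000 /* 1 second, as stated above */

/*
 * Pass the chip-level state through and, when it is temporarily
 * unavailable, tell the caller how long to wait before asking again.
 */
static enum mock_pe_state mock_get_state(enum mock_pe_state chip_state, int *delay_ms)
{
    if (delay_ms)
        *delay_ms = (chip_state == MOCK_PE_UNAVAILABLE) ? MOCK_DEFAULT_DELAY_MS : 0;
    return chip_state;
}
```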