12 Feb, 2007

14 commits

  • Values are readily available via ZVC per node and global sums.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Function is unnecessary now. We can use the summing features of the ZVCs to
    get the values we need.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • nr_free_pages is now a simple access to a global variable. Make it a macro
    instead of a function.

    nr_free_pages now requires vmstat.h to be included. There is one
    occurrence in power management where we need to add the include. Directly
    refer to global_page_state() there to clarify why the #include was added.

    [akpm@osdl.org: arm build fix]
    [akpm@osdl.org: sparc64 build fix]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
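
To illustrate why the free page count can become a plain read of a global, here is a minimal user-space sketch of the zoned counter idea. Every name ending in _sketch is hypothetical and is not the kernel's code. The sketch also ignores the per-cpu deltas and atomic updates that a real implementation would need.

```c
#include <stddef.h>

/* Hypothetical, simplified model of zoned VM counters (ZVCs). */
enum stat_item_sketch {
    NR_FREE_SKETCH,
    NR_ACTIVE_SKETCH,
    NR_INACTIVE_SKETCH,
    NR_ITEMS_SKETCH
};

struct zone_sketch {
    long vm_stat[NR_ITEMS_SKETCH];      /* per-zone totals */
};

/* Global totals, updated together with every per-zone update. */
static long vm_stat_sketch[NR_ITEMS_SKETCH];

/* Apply a delta to one counter of one zone and to the global sum. */
static void mod_zone_state_sketch(struct zone_sketch *zone,
                                  enum stat_item_sketch item, long delta)
{
    zone->vm_stat[item] += delta;
    vm_stat_sketch[item] += delta;
}

/* Reading the system-wide value is a single array access, no summing. */
static long global_state_sketch(enum stat_item_sketch item)
{
    return vm_stat_sketch[item];
}

/* With the sum always current, the helper can be a macro. */
#define nr_free_pages_sketch() global_state_sketch(NR_FREE_SKETCH)
```

The cost of keeping the sum current is paid on each update, so readers never have to walk nodes and zones.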
     
  • The global and per zone counter sums are in arrays of longs. Reorder the ZVCs
    so that the most frequently used ZVCs are put into the same cacheline. That
    way calculations of the global, node and per zone vm state touch only a
    single cacheline. This is mostly important for 64 bit systems where one 128
    byte cacheline takes only 8 longs.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This again simplifies some of the VM counter calculations through the use
    of the ZVC consolidated counters.

    [michal.k.k.piotrowski@gmail.com: build fix]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Michal Piotrowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The determination of the dirty ratio to determine writeback behavior is
    currently based on the number of total pages on the system.

    However, not all pages in the system may be dirtied. Thus the ratio is always
    too low and can never reach 100%. The ratio may be particularly skewed if
    large hugepage allocations, slab allocations or device driver buffers make
    large sections of memory not available anymore. In that case we may get into
    a situation in which f.e. the background writeback ratio of 40% cannot be
    reached anymore which leads to undesired writeback behavior.

    This patchset fixes that issue by determining the ratio based on the actual
    pages that may potentially be dirty. These are the pages on the active and
    the inactive list plus free pages.

    The problem with those counts has so far been that it is expensive to
    calculate these because counts from multiple nodes and multiple zones will
    have to be summed up. This patchset makes these counters ZVC counters. This
    means that a current sum per zone, per node and for the whole system is always
    available via global variables and not expensive anymore to calculate.

    The patchset results in some other good side effects:

    - Removal of the various functions that sum up free, active and inactive
    page counts

    - Cleanup of the functions that display information via the proc filesystem.

    This patch:

    The use of a ZVC for nr_inactive and nr_active allows a simplification of some
    counter operations. More ZVC functionality is used for sums etc in the
    following patches.

    [akpm@osdl.org: UP build fix]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
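
The effect of changing the base of the dirty ratio can be shown with a small calculation. This is only a sketch under assumed numbers: the struct, the function names and the page counts are made up for illustration and are not taken from the patchset.

```c
/* Hypothetical page counts for one system (units: pages). */
struct vm_counts_sketch {
    unsigned long total;     /* every page in the system */
    unsigned long free;
    unsigned long active;
    unsigned long inactive;
};

/* Pages that may potentially be dirtied: active + inactive + free. */
static unsigned long dirtyable_pages_sketch(const struct vm_counts_sketch *c)
{
    return c->free + c->active + c->inactive;
}

/* Threshold in pages for a given ratio (percent) of a base page count. */
static unsigned long dirty_limit_sketch(unsigned long base_pages,
                                        unsigned int ratio_percent)
{
    return (unsigned long)((unsigned long long)base_pages *
                           ratio_percent / 100);
}
```

With 1000000 total pages of which only 300000 are free, active or inactive, a 40% threshold based on total pages is 400000 pages, which is more than can ever be dirty. Based on dirtyable pages the same ratio gives 120000 pages, which is reachable.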
     
  • After do_wp_page has tested page_mkwrite, it must release old_page after
    acquiring page table lock, not before: at some stage that ordering got
    reversed, leaving a (very unlikely) window in which old_page might be
    truncated, freed, and reused in the same position.

    Signed-off-by: Hugh Dickins
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • With CONFIG_SPARSEMEM=y:

    mm/rmap.c:579: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'int'

    Make __page_to_pfn() return unsigned long.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
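
The warning above comes from passing a value that is not unsigned long to a '%lx' conversion. A minimal sketch of the general remedy follows. The flat mem_map_sketch array and the macro are hypothetical stand-ins and do not reproduce the SPARSEMEM calculation.

```c
#include <stdio.h>

/* Hypothetical stand-ins for struct page and a flat memory map. */
struct page_sketch {
    int flags;
};

static struct page_sketch mem_map_sketch[64];

/*
 * The pointer difference has type ptrdiff_t. Casting the result makes the
 * macro yield unsigned long, which is what '%lx' expects.
 */
#define page_to_pfn_sketch(page) \
    ((unsigned long)((page) - mem_map_sketch))

/* Format a page frame number in hex without a format warning. */
static int format_pfn_sketch(char *buf, size_t len,
                             const struct page_sketch *page)
{
    return snprintf(buf, len, "%lx", page_to_pfn_sketch(page));
}
```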
     
  • This early break prevents us from displaying info for the vm stats thresholds
    if the zone doesn't have any pages in its per-cpu pagesets.

    So my 800MB i386 box says:

    Node 0, zone DMA
    pages free 2365
    min 16
    low 20
    high 24
    active 0
    inactive 0
    scanned 0 (a: 0 i: 0)
    spanned 4096
    present 4044
    nr_anon_pages 0
    nr_mapped 1
    nr_file_pages 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 0
    nr_page_table_pages 0
    nr_dirty 0
    nr_writeback 0
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 0
    protection: (0, 868, 868)
    pagesets
    all_unreclaimable: 0
    prev_priority: 12
    start_pfn: 0
    Node 0, zone Normal
    pages free 199713
    min 934
    low 1167
    high 1401
    active 10215
    inactive 4507
    scanned 0 (a: 0 i: 0)
    spanned 225280
    present 222420
    nr_anon_pages 2685
    nr_mapped 1110
    nr_file_pages 12055
    nr_slab_reclaimable 2216
    nr_slab_unreclaimable 1527
    nr_page_table_pages 213
    nr_dirty 0
    nr_writeback 0
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 0
    protection: (0, 0, 0)
    pagesets
    cpu: 0 pcp: 0
    count: 152
    high: 186
    batch: 31
    cpu: 0 pcp: 1
    count: 13
    high: 62
    batch: 15
    vm stats threshold: 16
    cpu: 1 pcp: 0
    count: 34
    high: 186
    batch: 31
    cpu: 1 pcp: 1
    count: 10
    high: 62
    batch: 15
    vm stats threshold: 16
    all_unreclaimable: 0
    prev_priority: 12
    start_pfn: 4096

    Just nuke all that search-for-the-first-non-empty-pageset code. Dunno why it
    was there in the first place..

    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • find_min_pfn_for_node() and find_min_pfn_with_active_regions() sort
    early_node_map[] on every call. This is an excessive amount of sorting and
    that can be avoided. This patch always searches the whole early_node_map[]
    in find_min_pfn_for_node() instead of returning the first value found. The
    map is then only sorted once when required. Successfully boot tested on a
    number of machines.

    [akpm@osdl.org: cleanup]
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
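
Searching the whole map instead of relying on sort order might look like the sketch below. The struct is a simplified, hypothetical version of an early_node_map[] entry, and ANY_NODE_SKETCH is an assumed wildcard value, so treat this as an illustration rather than the patch itself.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified entry of an early node map. */
struct node_region_sketch {
    int nid;
    unsigned long start_pfn;
    unsigned long end_pfn;
};

#define ANY_NODE_SKETCH (-1)    /* assumed wildcard: match every node */

/*
 * Scan every entry and keep the lowest start_pfn for the requested node.
 * No sorting is needed because the result does not depend on entry order.
 * Returns ULONG_MAX if no region matches.
 */
static unsigned long find_min_pfn_sketch(const struct node_region_sketch *map,
                                         size_t count, int nid)
{
    unsigned long min_pfn = ULONG_MAX;
    size_t i;

    for (i = 0; i < count; i++) {
        if (nid != ANY_NODE_SKETCH && map[i].nid != nid)
            continue;
        if (map[i].start_pfn < min_pfn)
            min_pfn = map[i].start_pfn;
    }
    return min_pfn;
}
```

A linear scan costs one pass per call, which is cheaper than sorting the map on every call.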
     
  • Remove the last vestiges of the long-deprecated "MAP_ANON" page protection
    flag: use "MAP_ANONYMOUS" instead.

    Signed-off-by: Robert P. J. Day
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • Use the pointer passed to cache_reap to determine the work pointer and
    consolidate exit paths.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Clean up __cache_alloc and __cache_alloc_node functions a bit. We no
    longer need to do NUMA_BUILD tricks and the UMA allocation path is much
    simpler. No functional changes in this patch.

    Note: saves a few kernel text bytes on x86 NUMA builds due to using gotos in
    __cache_alloc_node() and moving the __GFP_THISNODE check into
    fallback_alloc().

    Cc: Andy Whitcroft
    Cc: Christoph Hellwig
    Cc: Manfred Spraul
    Acked-by: Christoph Lameter
    Cc: Paul Jackson
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • The PageSlab debug check in kfree_debugcheck() is broken for compound
    pages. It is also redundant as we already do BUG_ON for non-slab pages in
    page_get_cache() and page_get_slab() which are always called before we free
    any actual objects.

    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     

10 Feb, 2007

26 commits

  • The ATA_ENABLE_PATA define was never meant to be permanent, and in
    recent kernels, it's already been unconditionally enabled. Remove.

    Signed-off-by: Jeff Garzik

    Jeff Garzik
     
  • Signed-off-by: Jeff Garzik

    Jeff Garzik
     
  • If we are doing a PIO setup for a CFA card and it blows up with a device
    error then assume it is an older CFA card which doesn't support this
    rather than failing the device out of existence.

    Stands separate from the quieting patch but that is obviously useful with
    this change.

    Signed-off-by: Alan Cox
    Signed-off-by: Jeff Garzik

    Alan
     
  • ata_pci_device_do_resume can fail if the PCI device couldn't be re-enabled.
    Update sata_nv to propagate the return value from this call and to not try
    to do any other resume activities if it fails. Fixes a compile warning.

    Signed-off-by: Robert Hancock
    Signed-off-by: Andrew Morton
    Signed-off-by: Jeff Garzik

    Robert Hancock
     
  • Update sata_nv to wait for the controller to indicate via the status
    register that it has entered the requested state when switching between
    ADMA mode and register mode. This issue came up recently when debugging
    some problems with cache flush command timeouts and while it didn't appear
    to fix that problem, this is something we should likely be doing in any
    case.

    Signed-off-by: Robert Hancock
    Cc: Tejun Heo
    Cc: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Jeff Garzik

    Robert Hancock
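
Waiting for a controller to confirm a mode switch is usually a bounded polling loop. The sketch below shows that pattern only. The fake controller, the STAT_IDLE_SKETCH bit and all function names are hypothetical and do not describe the sata_nv registers.

```c
#include <stdbool.h>
#include <stdint.h>

#define STAT_IDLE_SKETCH 0x0100u    /* hypothetical "mode entered" bit */

/* Fake controller: reports the bit only after a number of reads. */
struct fake_ctrl_sketch {
    int polls_until_idle;
};

/* Stand-in for reading the hardware status register. */
static uint16_t fake_read_status_sketch(void *ctx)
{
    struct fake_ctrl_sketch *ctrl = ctx;

    if (ctrl->polls_until_idle > 0) {
        ctrl->polls_until_idle--;
        return 0;
    }
    return STAT_IDLE_SKETCH;
}

/*
 * Poll until (status & mask) matches the wanted state or the poll budget
 * is used up. Returns true on success and false on timeout.
 */
static bool wait_for_status_sketch(uint16_t (*read_status)(void *ctx),
                                   void *ctx, uint16_t mask,
                                   bool want_set, int max_polls)
{
    int i;

    for (i = 0; i < max_polls; i++) {
        bool is_set = (read_status(ctx) & mask) != 0;

        if (is_set == want_set)
            return true;
        /* A driver would delay briefly between reads at this point. */
    }
    return false;
}
```

Bounding the loop matters because a controller that never reports the state must not hang the caller.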
     
  • Some problems showed up recently with cache flush commands timing out on
    sata_nv. Previously these commands were always handled by transitioning to
    legacy mode from ADMA mode first. The timeout problem was worked around
    already by a change to the interrupt handling code for legacy mode, but for
    non-data commands like these it appears we can handle them in ADMA mode, so
    the switch to legacy mode is not needed.

    This patch changes the behavior so that we use ADMA mode to submit
    interrupt-driven commands with ATA_PROT_NODATA protocol. In addition to
    avoiding the problem mentioned above entirely, this avoids the overhead of
    switching to legacy mode and back to ADMA mode for handling cache flushes.
    When handling non-DMA-mapped commands, we leave the APRD blank and clear
    the NV_CPB_CTL_APRD_VALID field in the CPB so the controller does not
    attempt to read it.

    Signed-off-by: Robert Hancock
    Cc: Jeff Garzik
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Jeff Garzik

    Robert Hancock
     
  • This cleans up a few issues with the error handling in sata_nv in ADMA mode
    to make it more consistent with other NCQ-capable drivers like ahci and
    sata_sil24:

    - When a command failed, we would effectively set AC_ERR_DEV on the
    queued command always. In the case of NCQ commands this prevents libata
    from doing a log page query to determine the details of the failed
    command, since it thinks we've already analyzed. Just set flags in the
    port ehi->err_mask, then freeze or abort and let libata figure out what
    went wrong.

    - The code handled NV_ADMA_STAT_CPBERR as a "really bad error" which
    caused it to set error flags on every queued command. I don't know
    exactly what this flag means (no docs, grr!) but from what I can guess
    from the standard ADMA spec, it just means that one or more of the CPBs
    had an error, so we just need to go through and do our normal checks in
    this case.

    - In the error_handler function the code would always dump the state of
    all the CPBs. This output seems redundant at this point since libata
    already dumps the state of all active commands on errors (and it also
    triggers at times when it shouldn't, like when suspending). Take this
    out.

    [akpm@osdl.org: many coding-style fixes]
    Signed-off-by: Robert Hancock
    Cc: Jeff Garzik
    Cc: Tejun Heo
    Cc: Allen Martin
    Signed-off-by: Andrew Morton
    Signed-off-by: Jeff Garzik

    Robert Hancock
     
  • MPIIX has only single channel IDE which can be configured for either primary or
    secondary legacy I/O ports and IRQ. So, get rid of the unneeded second probe
    entry in mpiix_init_one() and of the invalid (but unused anyway) enable bits in
    mpiix_pre_reset().

    Warning: this cleanup has only been compile-tested...

    Signed-off-by: Sergei Shtylyov
    Signed-off-by: Jeff Garzik

    Sergei Shtylyov
     
  • Fix clearing/setting the wrong TIME/IE/PPE bits for a slave drive caused by a
    wrong shift count.
    Fix the PIO mode 1 being overclocked by wrongly selecting the fast timing bank.
    Also, fix/rephrase some comments while at it.

    Signed-off-by: Sergei Shtylyov
    Signed-off-by: Jeff Garzik

    Sergei Shtylyov
     
  • Fix the PIO mode 2 using mode 0 timings -- this driver should enable the
    fast timing bank starting with PIO2, just like the ata_piix driver does.
    Also, fix/rephrase some comments while at it.

    Signed-off-by: Sergei Shtylyov
    Signed-off-by: Jeff Garzik

    Sergei Shtylyov
     
  • IORDY and IORDY enable/disable flags.

    Signed-off-by: Alan Cox
    Signed-off-by: Jeff Garzik

    Alan
     
  • People are getting confused about which drivers to enable for PATA PIIX
    type devices. Change the ATA_PIIX line and help to make it clearer.

    Signed-off-by: Alan Cox
    Signed-off-by: Jeff Garzik

    Alan
     
  • * Hardreset must not exit without actually performing reset regardless
    of link status. We're resetting the link after all.

    * Minor message update.

    * 150ms delay is meaningful iff link is online after reset is
    complete.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • Follow the old SRST rule and delay 150ms between completion of
    hardreset and status checking. Debouncing delay should usually cover
    this but debounce duration could be shorter than 150ms under certain
    circumstances.

    Usefulness depends on host controller implementation but it can't hurt
    and serves as a reminder that 2s delay for GoVault should also be
    added here.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • Per Jeff's suggestion, this patch rearranges the info printed for ATA
    drives into dmesg to add the full ATA firmware revision and model
    information, while keeping the output to 2 lines.

    Signed-off-by: Eric D. Mudama
    Signed-off-by: Jeff Garzik

    Eric D. Mudama
     
  • Fix the wrong "compatible" PIO mode choices: MWDMA0 has 480 ns cycle while PIO1
    only has 383 ns cycle, and MWDMA2 timings match those of PIO4 exactly.

    Signed-off-by: Jeff Garzik

    Sergei Shtylyov
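
The reasoning above can be checked with the minimum cycle times of the ATA modes. The sketch below picks, for each MWDMA mode, the fastest PIO mode whose cycle is not shorter than the MWDMA cycle. The function name is hypothetical. The cycle times are the ATA minimum values as I understand them, and only the 480, 383 and PIO4 = MWDMA2 figures are stated in the commit message.

```c
/* Minimum cycle times in ns, indexed by mode number (assumed ATA values). */
static const unsigned int pio_cycle_ns_sketch[] = { 600, 383, 240, 180, 120 };
static const unsigned int mwdma_cycle_ns_sketch[] = { 480, 150, 120 };

/*
 * Return the fastest PIO mode that is not faster than the given MWDMA
 * mode, or -1 if the MWDMA mode number is out of range.
 */
static int compatible_pio_sketch(int mwdma_mode)
{
    int pio;
    int best = -1;

    if (mwdma_mode < 0 || mwdma_mode > 2)
        return -1;

    /* PIO cycle times shrink as the mode number grows. */
    for (pio = 0; pio <= 4; pio++) {
        if (pio_cycle_ns_sketch[pio] >= mwdma_cycle_ns_sketch[mwdma_mode])
            best = pio;
    }
    return best;
}
```

Under these numbers MWDMA0 maps to PIO0 because PIO1 at 383 ns would be faster than the 480 ns the device asked for, and MWDMA2 maps to PIO4.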
     
  • This patch is against each libata driver.

    Two IRQ calls are added in ata_port_operations.
    - irq_on() is used to enable interrupts.
    - irq_ack() is used to acknowledge a device interrupt.

    In most drivers, ata_irq_on() and ata_irq_ack() are used for
    irq_on and irq_ack respectively.

    In some drivers (ex: ahci, sata_sil24) which cannot use them
    as is, ata_dummy_irq_on() and ata_dummy_irq_ack() are used.

    Signed-off-by: Kou Ishizaki
    Signed-off-by: Akira Iguchi
    Signed-off-by: Jeff Garzik

    Akira Iguchi
     
  • This patch is against the libata core and headers.

    Two IRQ calls are added in ata_port_operations.
    - irq_on() is used to enable interrupts.
    - irq_ack() is used to acknowledge a device interrupt.

    In most drivers, ata_irq_on() and ata_irq_ack() are used for
    irq_on and irq_ack respectively.

    In some drivers (ex: ahci, sata_sil24) which cannot use them
    as is, ata_dummy_irq_on() and ata_dummy_irq_ack() are used.

    Signed-off-by: Kou Ishizaki
    Signed-off-by: Akira Iguchi
    Signed-off-by: Jeff Garzik

    Akira Iguchi
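
The split between generic and dummy callbacks can be pictured with a small operations table. This is a user-space sketch with hypothetical types and names. It does not reproduce struct ata_port_operations or the libata helpers.

```c
#include <stdint.h>

struct port_sketch;

/* Hypothetical per-port operations table with the two IRQ hooks. */
struct port_ops_sketch {
    uint8_t (*irq_on)(struct port_sketch *ap);
    uint8_t (*irq_ack)(struct port_sketch *ap);
};

struct port_sketch {
    const struct port_ops_sketch *ops;
    uint8_t status;         /* stand-in for the device status register */
    int irq_enabled;
};

/* Generic versions: enable interrupts, or read and clear the status. */
static uint8_t generic_irq_on_sketch(struct port_sketch *ap)
{
    ap->irq_enabled = 1;
    return ap->status;
}

static uint8_t generic_irq_ack_sketch(struct port_sketch *ap)
{
    uint8_t status = ap->status;

    ap->status = 0;
    return status;
}

/* Dummy versions for controllers that handle interrupts another way. */
static uint8_t dummy_irq_on_sketch(struct port_sketch *ap)
{
    (void)ap;
    return 0;
}

static uint8_t dummy_irq_ack_sketch(struct port_sketch *ap)
{
    (void)ap;
    return 0;
}

static const struct port_ops_sketch generic_ops_sketch = {
    generic_irq_on_sketch, generic_irq_ack_sketch
};

static const struct port_ops_sketch dummy_ops_sketch = {
    dummy_irq_on_sketch, dummy_irq_ack_sketch
};
```

Core code can then call ap->ops->irq_on(ap) without knowing which kind of controller it is talking to.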
     
  • In file included from drivers/infiniband/hw/ipath/ipath_diag.c:44:
    include/linux/io.h:35: warning: 'struct device' declared inside parameter list
    include/linux/io.h:35: warning: its scope is only this definition or declaration

    Cc: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Jeff Garzik

    Andrew Morton
     
  • Convert libata core layer and LLDs to use iomap.

    * managed iomap is used. Pointer to pcim_iomap_table() is cached at
    host->iomap and used throughout LLDs. This basically replaces
    host->mmio_base.

    * if possible, pcim_iomap_regions() is used

    Most iomap operation conversions are taken from Jeff Garzik's iomap
    branch.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • devres updates for pata_platform were dropped while merging devres
    patches due to merge conflict. This is the updated version.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • devres change moved iomap.o from obj-$(CONFIG_GENERIC_IOMAP) to lib-y
    making it not linked if no in-kernel driver uses it. Fix it.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • Signed-off-by: Jeff Garzik

    Jeff Garzik
     
  • Implement pcim_iomap_regions(). This function takes mask of BARs to
    request and iomap. No BAR should have length of zero. BARs are
    iomapped using pcim_iomap_table().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
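
Walking a BAR mask and rejecting zero-length BARs could look roughly like this. The fake device, the function name and the rollback behaviour are assumptions made for illustration. This is not the pcim_iomap_regions() implementation.

```c
#include <errno.h>

#define NUM_BARS_SKETCH 6

/* Hypothetical device: BAR lengths plus a record of what got mapped. */
struct fake_dev_sketch {
    unsigned long bar_len[NUM_BARS_SKETCH];
    int mapped[NUM_BARS_SKETCH];
};

/*
 * Map every BAR whose bit is set in mask. A selected BAR of length zero
 * is an error; in that case the BARs mapped so far are unmapped again.
 */
static int iomap_regions_sketch(struct fake_dev_sketch *dev, unsigned int mask)
{
    int i;

    for (i = 0; i < NUM_BARS_SKETCH; i++) {
        if (!(mask & (1u << i)))
            continue;
        if (dev->bar_len[i] == 0)
            goto err_inval;
        dev->mapped[i] = 1;
    }
    return 0;

err_inval:
    /* Undo the selected BARs below the one that failed. */
    while (--i >= 0) {
        if (mask & (1u << i))
            dev->mapped[i] = 0;
    }
    return -EINVAL;
}
```

A caller interested in BAR 0 and BAR 2 would pass (1u << 0) | (1u << 2) as the mask.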
     
  • Now that all LLDs are converted to use devres, default stop callbacks
    are unused. Remove them.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
     
  • Update libata LLDs to use devres. Core layer is already converted to
    support managed LLDs. This patch simplifies initialization and fixes
    many resource related bugs in init failure and detach path. For
    example, all converted drivers now handle ata_device_add() failure
    gracefully without excessive resource rollback code.

    As most resources are released automatically on driver detach, many
    drivers don't need or can do with much simpler ->{port|host}_stop().
    In general, stop callbacks are needed iff port or host needs to be given
    commands to shut it down. Note that freezing is enough in many cases
    and ports are automatically frozen before being detached.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jeff Garzik

    Tejun Heo
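
The idea of managed resources can be sketched in a few lines of user-space C: each resource is registered with a release callback, and detaching the device releases everything in reverse order of acquisition. All names here are hypothetical. The kernel devres API differs in detail and this is not its implementation.

```c
#include <stdlib.h>

typedef void (*release_fn_sketch)(void *res);

/* One managed resource: what to release and how. */
struct devres_node_sketch {
    release_fn_sketch release;
    void *res;
    struct devres_node_sketch *next;
};

/* Hypothetical device holding a stack of managed resources. */
struct device_sketch {
    struct devres_node_sketch *head;
};

/* Register a resource; the newest entry goes to the front of the list. */
static int devres_add_sketch(struct device_sketch *dev,
                             release_fn_sketch release, void *res)
{
    struct devres_node_sketch *node = malloc(sizeof(*node));

    if (!node)
        return -1;
    node->release = release;
    node->res = res;
    node->next = dev->head;
    dev->head = node;
    return 0;
}

/* On detach or init failure: release everything, newest first. */
static void devres_release_all_sketch(struct device_sketch *dev)
{
    while (dev->head) {
        struct devres_node_sketch *node = dev->head;

        dev->head = node->next;
        node->release(node->res);
        free(node);
    }
}

/* Demo release callback: records the order in which ids are released. */
static int release_log_sketch[8];
static int release_count_sketch;

static void log_release_sketch(void *res)
{
    if (release_count_sketch < 8)
        release_log_sketch[release_count_sketch++] = *(int *)res;
}
```

Because cleanup is driven by the list, an init path that fails halfway needs no hand-written rollback code; it only has to trigger the release of what was registered so far.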