12 Jan, 2012

1 commit

  • frv, h8300, m68k, microblaze, openrisc, score, um and xtensa currently
    do not register a CPU device. Add the config option GENERIC_CPU_DEVICES
    which causes a generic CPU device to be registered for each present CPU,
    and make all these architectures select it.

    Richard Weinberger covered UML and suggested using
    per_cpu.

    Signed-off-by: Ben Hutchings
    Signed-off-by: Linus Torvalds

    Ben Hutchings
     

11 Jan, 2012

1 commit

  • lib: use generic pci_iomap on all architectures

    Many architectures don't want to pull in iomap.c,
    so they ended up duplicating pci_iomap from that file.
    That function isn't trivial, and we are going to modify it
    https://lkml.org/lkml/2011/11/14/183
    so the duplication hurts.

    This reduces the scope of the problem significantly,
    by moving pci_iomap to a separate file and
    referencing that from all architectures.

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    alpha: drop pci_iomap/pci_iounmap from pci-noop.c
    mn10300: switch to GENERIC_PCI_IOMAP
    mn10300: add missing __iomap markers
    frv: switch to GENERIC_PCI_IOMAP
    tile: switch to GENERIC_PCI_IOMAP
    tile: don't panic on iomap
    sparc: switch to GENERIC_PCI_IOMAP
    sh: switch to GENERIC_PCI_IOMAP
    powerpc: switch to GENERIC_PCI_IOMAP
    parisc: switch to GENERIC_PCI_IOMAP
    mips: switch to GENERIC_PCI_IOMAP
    microblaze: switch to GENERIC_PCI_IOMAP
    arm: switch to GENERIC_PCI_IOMAP
    alpha: switch to GENERIC_PCI_IOMAP
    lib: add GENERIC_PCI_IOMAP
    lib: move GENERIC_IOMAP to lib/Kconfig

    Fix up trivial conflicts due to changes nearby in arch/{m68k,score}/Kconfig

    Linus Torvalds
     

04 Dec, 2011

1 commit

  • frv uses a version of pci_iomap that simply
    casts and returns back the start address.
    Looking closely, both ioremap and ioport_map seem to
    do this on this platform, so the generic pci_iomap
    will DTRT automatically.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

30 Oct, 2011

1 commit


03 Aug, 2011

1 commit

  • cmpxchg() is widely used by lockless code, including NMI-safe lockless
    code. But on some architectures, the cmpxchg() implementation is not
    NMI-safe, on these architectures the lockless code may need a
    spin_trylock_irqsave() based implementation.

    This patch adds a Kconfig option: ARCH_HAVE_NMI_SAFE_CMPXCHG, so that
    NMI-safe lockless code can depend on it or provide different
    implementation according to it.

    On many architectures, cmpxchg is only NMI-safe for several specific
    operand sizes. So, ARCH_HAVE_NMI_SAFE_CMPXCHG define in this patch
    only guarantees cmpxchg is NMI-safe for sizeof(unsigned long).

    Signed-off-by: Huang Ying
    Acked-by: Mike Frysinger
    Acked-by: Paul Mundt
    Acked-by: Hans-Christian Egtvedt
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Chris Metcalf
    Acked-by: Richard Henderson
    CC: Mikael Starvik
    Acked-by: David Howells
    CC: Yoshinori Sato
    CC: Tony Luck
    CC: Hirokazu Takata
    CC: Geert Uytterhoeven
    CC: Michal Simek
    Acked-by: Ralf Baechle
    CC: Kyle McMartin
    CC: Martin Schwidefsky
    CC: Chen Liqin
    CC: "David S. Miller"
    CC: Ingo Molnar
    CC: Chris Zankel
    Signed-off-by: Len Brown

    Huang Ying
     

27 May, 2011

1 commit

  • By the previous style change, CONFIG_GENERIC_FIND_NEXT_BIT,
    CONFIG_GENERIC_FIND_BIT_LE, and CONFIG_GENERIC_FIND_LAST_BIT are not used
    to test for existence of find bitops anymore.

    Signed-off-by: Akinobu Mita
    Acked-by: Greg Ungerer
    Cc: Arnd Bergmann
    Cc: Russell King
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

30 Mar, 2011

1 commit


29 Mar, 2011

3 commits


24 Mar, 2011

1 commit

  • This introduces CONFIG_GENERIC_FIND_BIT_LE to tell whether to use generic
    implementation of find_*_bit_le() in lib/find_next_bit.c or not.

    For now we select CONFIG_GENERIC_FIND_BIT_LE for all architectures which
    enable CONFIG_GENERIC_FIND_NEXT_BIT.

    But m68knommu wants to define own faster find_next_zero_bit_le() and
    continues using generic find_next_{,zero_}bit().
    (CONFIG_GENERIC_FIND_NEXT_BIT and !CONFIG_GENERIC_FIND_BIT_LE)

    Signed-off-by: Akinobu Mita
    Cc: Greg Ungerer
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

21 Jan, 2011

2 commits

  • No functional change.

    Signed-off-by: Thomas Gleixner
    Acked-by: David Howells

    Thomas Gleixner
     
  • All architectures are finally converted. Remove the cruft.

    Signed-off-by: Thomas Gleixner
    Cc: Richard Henderson
    Cc: Mike Frysinger
    Cc: David Howells
    Cc: Tony Luck
    Cc: Greg Ungerer
    Cc: Michal Simek
    Acked-by: David Howells
    Cc: Kyle McMartin
    Acked-by: Benjamin Herrenschmidt
    Cc: Chen Liqin
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Jeff Dike

    Thomas Gleixner
     

29 Oct, 2010

1 commit

  • * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (38 commits)
    kbuild: convert `arch/tile' to the kconfig mainmenu upgrade
    README: cite nconfig
    Revert "kconfig: Temporarily disable dependency warnings"
    kconfig: Use PATH_MAX instead of 128 for path buffer sizes.
    kconfig: Fix realloc usage()
    kconfig: Propagate const
    kconfig: Don't go out from read config loop when you read new symbol
    kconfig: fix menuconfig on debian lenny
    kbuild: migrate all arch to the kconfig mainmenu upgrade
    kconfig: expand file names
    kconfig: use the file's name of sourced file
    kconfig: constify file name
    kconfig: don't emit warning upon rootmenu's prompt redefinition
    kconfig: replace KERNELVERSION usage by the mainmenu's prompt
    kconfig: delay gconf window initialization
    kconfig: expand by default the rootmenu's prompt
    kconfig: add a symbol string expansion helper
    kconfig: regen parser
    kconfig: implement the `mainmenu' directive
    kconfig: allow PACKAGE to be defined on the compiler's command-line
    ...

    Fix up trivial conflict in arch/mn10300/Kconfig

    Linus Torvalds
     

19 Oct, 2010

1 commit

  • Provide a mechanism that allows running code in IRQ context. It is
    most useful for NMI code that needs to interact with the rest of the
    system -- like wakeup a task to drain buffers.

    Perf currently has such a mechanism, so extract that and provide it as
    a generic feature, independent of perf so that others may also
    benefit.

    The IRQ context callback is generated through self-IPIs where
    possible, or on architectures like powerpc the decrementer (the
    built-in timer facility) is set to generate an interrupt immediately.

    Architectures that don't have anything like this get to do with a
    callback from the timer tick. These architectures can call
    irq_work_run() at the tail of any IRQ handlers that might enqueue such
    work (like the perf IRQ handler) to avoid undue latencies in
    processing the work.

    Signed-off-by: Peter Zijlstra
    Acked-by: Kyle McMartin
    Acked-by: Martin Schwidefsky
    [ various fixes ]
    Signed-off-by: Huang Ying
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

12 Oct, 2010

1 commit


20 Sep, 2010

1 commit


27 Jul, 2010

1 commit

  • Now that all arches have been converted over to use generic time via
    clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME
    config option and simplify the generic code.

    Signed-off-by: John Stultz
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    John Stultz
     

21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in one, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks goes to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility is not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

02 Jul, 2009

1 commit


12 Jun, 2009

1 commit


20 Oct, 2008

1 commit

  • This patch implements a new freezer subsystem in the control groups
    framework. It provides a way to stop and resume execution of all tasks in
    a cgroup by writing in the cgroup filesystem.

    The freezer subsystem in the container filesystem defines a file named
    freezer.state. Writing "FROZEN" to the state file will freeze all tasks
    in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
    the cgroup. Reading will return the current state.

    * Examples of usage :

    # mkdir /containers/freezer
    # mount -t cgroup -ofreezer freezer /containers
    # mkdir /containers/0
    # echo $some_pid > /containers/0/tasks

    to get status of the freezer subsystem :

    # cat /containers/0/freezer.state
    RUNNING

    to freeze all tasks in the container :

    # echo FROZEN > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    FREEZING
    # cat /containers/0/freezer.state
    FROZEN

    to unfreeze all tasks in the container :

    # echo RUNNING > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    RUNNING

    This is the basic mechanism which should do the right thing for user space
    task in a simple scenario.

    It's important to note that freezing can be incomplete. In that case we
    return EBUSY. This means that some tasks in the cgroup are busy doing
    something that prevents us from completely freezing the cgroup at this
    time. After EBUSY, the cgroup will remain partially frozen -- reflected
    by freezer.state reporting "FREEZING" when read. The state will remain
    "FREEZING" until one of these things happens:

    1) Userspace cancels the freezing operation by writing "RUNNING" to
    the freezer.state file
    2) Userspace retries the freezing operation by writing "FROZEN" to
    the freezer.state file (writing "FREEZING" is not legal
    and returns EIO)
    3) The tasks that blocked the cgroup from entering the "FROZEN"
    state disappear from the cgroup's set of tasks.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: export thaw_process]
    Signed-off-by: Cedric Le Goater
    Signed-off-by: Matt Helsley
    Acked-by: Serge E. Hallyn
    Tested-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley
     

09 Feb, 2008

2 commits

  • To allow flexible configuration of IDE introduce HAVE_IDE.
    All archs except arm, um and s390 unconditionally select it.
    For arm the actual configuration determine if IDE is supported.

    This is a step towards introducing drivers/Kconfig for arm.

    Signed-off-by: Sam Ravnborg
    Acked-by: Russell King - ARM Linux
    Acked-by: Bartlomiej Zolnierkiewicz

    Sam Ravnborg
     
  • When the conversion factor between jiffies and milli- or microseconds is
    not a single multiply or divide, as for the case of HZ == 300, we currently
    do a multiply followed by a divide. The intervening result, however, is
    subject to overflows, especially since the fraction is not simplified (for
    HZ == 300, we multiply by 300 and divide by 1000).

    This is exposed to the user when passing a large timeout to poll(), for
    example.

    This patch replaces the multiply-divide with a reciprocal multiplication on
    32-bit platforms. When the input is an unsigned long, there is no portable
    way to do this on 64-bit platforms there is no portable way to do this
    since it requires a 128-bit intermediate result (which gcc does support on
    64-bit platforms but may generate libgcc calls, e.g. on 64-bit s390), but
    since the output is a 32-bit integer in the cases affected, just simplify
    the multiply-divide (*3/10 instead of *300/1000).

    The reciprocal multiply used can have off-by-one errors in the upper half
    of the valid output range. This could be avoided at the expense of having
    to deal with a potential 65-bit intermediate result. Since the intent is
    to avoid overflow problems and most of the other time conversions are only
    semiexact, the off-by-one errors were considered an acceptable tradeoff.

    At Ralf Baechle's suggestion, this version uses a Perl script to compute
    the necessary constants. We already have dependencies on Perl for kernel
    compiles. This does, however, require the Perl module Math::BigInt, which
    is included in the standard Perl distribution starting with version 5.8.0.
    In order to support older versions of Perl, include a table of canned
    constants in the script itself, and structure the script so that
    Math::BigInt isn't required if pulling values from said table.

    Running the script requires that the HZ value is available from the
    Makefile. Thus, this patch also adds the Kconfig variable CONFIG_HZ to the
    architectures which didn't already have it (alpha, cris, frv, h8300, m32r,
    m68k, m68knommu, sparc, v850, and xtensa.) It does *not* touch the sh or
    sh64 architectures, since Paul Mundt has dealt with those separately in the
    sh tree.

    Signed-off-by: H. Peter Anvin
    Cc: Ralf Baechle ,
    Cc: Sam Ravnborg ,
    Cc: Paul Mundt ,
    Cc: Richard Henderson ,
    Cc: Michael Starvik ,
    Cc: David Howells ,
    Cc: Yoshinori Sato ,
    Cc: Hirokazu Takata ,
    Cc: Geert Uytterhoeven ,
    Cc: Roman Zippel ,
    Cc: William L. Irwin ,
    Cc: Chris Zankel ,
    Cc: H. Peter Anvin ,
    Cc: Jan Engelhardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H. Peter Anvin
     

06 Feb, 2008

1 commit

  • Permit the memory to be located somewhere other than address 0xC0000000 in
    NOMMU mode. The configuration options are already present, it just
    requires wiring up in the linker script.

    Note that only a limited set of locations of runtime addresses are available
    because of the way the CPU protection registers work.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

04 Feb, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (79 commits)
    Jesper Juhl is the new trivial patches maintainer
    Documentation: mention email-clients.txt in SubmittingPatches
    fs/binfmt_elf.c: spello fix
    do_invalidatepage() comment typo fix
    Documentation/filesystems/porting fixes
    typo fixes in net/core/net_namespace.c
    typo fix in net/rfkill/rfkill.c
    typo fixes in net/sctp/sm_statefuns.c
    lib/: Spelling fixes
    kernel/: Spelling fixes
    include/scsi/: Spelling fixes
    include/linux/: Spelling fixes
    include/asm-m68knommu/: Spelling fixes
    include/asm-frv/: Spelling fixes
    fs/: Spelling fixes
    drivers/watchdog/: Spelling fixes
    drivers/video/: Spelling fixes
    drivers/ssb/: Spelling fixes
    drivers/serial/: Spelling fixes
    drivers/scsi/: Spelling fixes
    ...

    Linus Torvalds
     

03 Feb, 2008

2 commits

  • My first guess for "fujitsu" was it might be related to the
    fujitsu-laptop.c driver...

    Move the frv directory one level up since frv is the name of the
    architecture in the Linux kernel.

    Signed-off-by: Adrian Bunk

    Adrian Bunk
     
  • Move the instrumentation Kconfig to

    arch/Kconfig for architecture dependent options
    - oprofile
    - kprobes

    and

    init/Kconfig for architecture independent options
    - profiling
    - markers

    Remove the "Instrumentation Support" menu. Everything moves to "General setup".
    Delete the kernel/Kconfig.instrumentation file.

    Signed-off-by: Mathieu Desnoyers
    Cc: Linus Torvalds
    Cc:
    Signed-off-by: Sam Ravnborg

    Mathieu Desnoyers
     

02 Feb, 2008

3 commits

  • * 'suspend' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (38 commits)
    suspend: cleanup reference to swsusp_pg_dir[]
    PM: Remove obsolete /sys/devices/.../power/state docs
    Hibernation: Invoke suspend notifications after console switch
    Suspend: Invoke suspend notifications after console switch
    Suspend: Clean up suspend_64.c
    Suspend: Add config option to disable the freezer if architecture wants that
    ACPI: Print message before calling _PTS
    ACPI hibernation: Call _PTS before suspending devices
    Hibernation: Introduce begin() and end() callbacks
    ACPI suspend: Call _PTS before suspending devices
    ACPI: Separate disabling of GPEs from _PTS
    ACPI: Separate invocations of _GTS and _BFS from _PTS and _WAK
    Suspend: Introduce begin() and end() callbacks
    suspend: fix ia64 allmodconfig build
    ACPI: clear GPE earily in resume to avoid warning
    Suspend: Clean up Kconfig (V2)
    Hibernation: Clean up Kconfig (V2)
    Hibernation: Update messages
    Suspend: Use common prefix in messages
    Hibernation: Remove unnecessary variable declaration
    ...

    Linus Torvalds
     
  • This cleans up the suspend Kconfig and removes the need to
    declare centrally which architectures support suspend. All
    architectures that currently support suspend are modified
    accordingly.

    Signed-off-by: Johannes Berg
    Acked-by: Russell King
    Acked-by: Paul Mackerras
    Acked-by: Ralf Baechle
    Acked-by: Paul Mundt
    Cc: Pavel Machek
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Len Brown

    Johannes Berg
     
  • A HOWTO that hasn't been updated for half a dozen years no longer
    "contains valuable information about which PCI hardware does work under
    Linux and which doesn't".

    Signed-off-by: Adrian Bunk
    Signed-off-by: Greg Kroah-Hartman

    Adrian Bunk
     

20 Oct, 2007

1 commit

  • Quoting Randy:

    "It seems sad that this patch sources Kconfig.marker, a 7-line file,
    20-something times. Yes, you (we) don't want to put those 7 lines into
    20-something different files, so sourcing is the right thing.

    However, what you did for avr32 seems more on the right track to me: make
    _one_ Instrumentation support menu that includes PROFILING, OPROFILE, KPROBES,
    and MARKERS and then use (source) that in all of the arches."

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     

17 May, 2007

1 commit

  • Currently we have a maze of configuration variables that determine the
    maximum slab size. Worst of all it seems to vary between SLAB and SLUB.

    So define a common maximum size for kmalloc. For conveniences sake we use
    the maximum size ever supported which is 32 MB. We limit the maximum size
    to a lower limit if MAX_ORDER does not allow such large allocations.

    For many architectures this patch will have the effect of adding large
    kmalloc sizes. x86_64 adds 5 new kmalloc sizes. So a small amount of
    memory will be needed for these caches (contemporary SLAB has dynamically
    sizeable node and cpu structure so the waste is less than in the past)

    Most architectures will then be able to allocate object with sizes up to
    MAX_ORDER. We have had repeated breakage (in fact whenever we doubled the
    number of supported processors) on IA64 because one or the other struct
    grew beyond what the slab allocators supported. This will avoid future
    issues and f.e. avoid fixes for 2k and 4k cpu support.

    CONFIG_LARGE_ALLOCS is no longer necessary so drop it.

    It fixes sparc64 with SLAB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

10 May, 2007

1 commit


08 May, 2007

1 commit

  • This is a new slab allocator which was motivated by the complexity of the
    existing code in mm/slab.c. It attempts to address a variety of concerns
    with the existing implementation.

    A. Management of object queues

    A particular concern was the complex management of the numerous object
    queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
    each allocating CPU and use objects from a slab directly instead of
    queueing them up.

    B. Storage overhead of object queues

    SLAB Object queues exist per node, per CPU. The alien cache queue even
    has a queue array that contain a queue for each processor on each
    node. For very large systems the number of queues and the number of
    objects that may be caught in those queues grows exponentially. On our
    systems with 1k nodes / processors we have several gigabytes just tied up
    for storing references to objects for those queues This does not include
    the objects that could be on those queues. One fears that the whole
    memory of the machine could one day be consumed by those queues.

    C. SLAB meta data overhead

    SLAB has overhead at the beginning of each slab. This means that data
    cannot be naturally aligned at the beginning of a slab block. SLUB keeps
    all meta data in the corresponding page_struct. Objects can be naturally
    aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
    boundaries and can fit tightly into a 4k page with no bytes left over.
    SLAB cannot do this.

    D. SLAB has a complex cache reaper

    SLUB does not need a cache reaper for UP systems. On SMP systems
    the per CPU slab may be pushed back into partial list but that
    operation is simple and does not require an iteration over a list
    of objects. SLAB expires per CPU, shared and alien object queues
    during cache reaping which may cause strange hold offs.

    E. SLAB has complex NUMA policy layer support

    SLUB pushes NUMA policy handling into the page allocator. This means that
    allocation is coarser (SLUB does interleave on a page level) but that
    situation was also present before 2.6.13. SLABs application of
    policies to individual slab objects allocated in SLAB is
    certainly a performance concern due to the frequent references to
    memory policies which may lead a sequence of objects to come from
    one node after another. SLUB will get a slab full of objects
    from one node and then will switch to the next.

    F. Reduction of the size of partial slab lists

    SLAB has per node partial lists. This means that over time a large
    number of partial slabs may accumulate on those lists. These can
    only be reused if allocator occur on specific nodes. SLUB has a global
    pool of partial slabs and will consume slabs from that pool to
    decrease fragmentation.

    G. Tunables

    SLAB has sophisticated tuning abilities for each slab cache. One can
    manipulate the queue sizes in detail. However, filling the queues still
    requires the uses of the spin lock to check out slabs. SLUB has a global
    parameter (min_slab_order) for tuning. Increasing the minimum slab
    order can decrease the locking overhead. The bigger the slab order the
    less motions of pages between per CPU and partial lists occur and the
    better SLUB will be scaling.

    G. Slab merging

    We often have slab caches with similar parameters. SLUB detects those
    on boot up and merges them into the corresponding general caches. This
    leads to more effective memory use. About 50% of all caches can
    be eliminated through slab merging. This will also decrease
    slab fragmentation because partial allocated slabs can be filled
    up again. Slab merging can be switched off by specifying
    slub_nomerge on boot up.

    Note that merging can expose heretofore unknown bugs in the kernel
    because corrupted objects may now be placed differently and corrupt
    differing neighboring objects. Enable sanity checks to find those.

    H. Diagnostics

    The current slab diagnostics are difficult to use and require a
    recompilation of the kernel. SLUB contains debugging code that
    is always available (but is kept out of the hot code paths).
    SLUB diagnostics can be enabled via the "slab_debug" option.
    Parameters can be specified to select a single or a group of
    slab caches for diagnostics. This means that the system is running
    with the usual performance and it is much more likely that
    race conditions can be reproduced.

    I. Resiliency

    If basic sanity checks are on then SLUB is capable of detecting
    common error conditions and recover as best as possible to allow the
    system to continue.

    J. Tracing

    Tracing can be enabled via the slab_debug=T, option
    during boot. SLUB will then protocol all actions on that slabcache
    and dump the object contents on free.

    K. On demand DMA cache creation.

    Generally DMA caches are not needed. If a kmalloc is used with
    __GFP_DMA then just create this single slabcache that is needed.
    For systems that have no ZONE_DMA requirement the support is
    completely eliminated.

    L. Performance increase

    Some benchmarks have shown speed improvements on kernbench in the
    range of 5-10%. The locking overhead of slub is based on the
    underlying base allocation size. If we can reliably allocate
    larger order pages then it is possible to increase slub
    performance much further. The anti-fragmentation patches may
    enable further performance increases.

    Tested on:
    i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

    SLUB Boot options

    slub_nomerge Disable merging of slabs
    slub_min_order=x Require a minimum order for slab caches. This
    increases the managed chunk size and therefore
    reduces meta data and locking overhead.
    slub_min_objects=x Mininum objects per slab. Default is 8.
    slub_max_order=x Avoid generating slabs larger than order specified.
    slub_debug Enable all diagnostics for all caches
    slub_debug= Enable selective options for all caches
    slub_debug=, Enable selective options for a certain set of
    caches

    Available Debug options
    F Double Free checking, sanity and resiliency
    R Red zoning
    P Object / padding poisoning
    U Track last free / alloc
    T Trace all allocs / frees (only use for individual slabs).

    To use SLUB: Apply this patch and then select SLUB as the default slab
    allocator.

    [hugh@veritas.com: fix an oops-causing locking error]
    [akpm@linux-foundation.org: various stupid cleanups and small fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

12 Feb, 2007

1 commit

  • This patch simply defines CONFIG_ZONE_DMA for all arches. We later do special
    things with CONFIG_ZONE_DMA after the VM and an arch are prepared to work
    without ZONE_DMA.

    CONFIG_ZONE_DMA can be defined in two ways depending on how an architecture
    handles ISA DMA.

    First if CONFIG_GENERIC_ISA_DMA is set by the arch then we know that the arch
    needs ZONE_DMA because ISA DMA devices are supported. We can catch this in
    mm/Kconfig and do not need to modify arch code.

    Second, arches may use ZONE_DMA in an unknown way. We set CONFIG_ZONE_DMA for
    all arches that do not set CONFIG_GENERIC_ISA_DMA in order to insure backwards
    compatibility. The arches may later undefine ZONE_DMA if their arch code has
    been verified to not depend on ZONE_DMA.

    Signed-off-by: Christoph Lameter
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Kyle McMartin
    Cc: Matthew Wilcox
    Cc: James Bottomley
    Cc: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 Dec, 2006

1 commit

  • This facility provides three entry points:

    ilog2() Log base 2 of unsigned long
    ilog2_u32() Log base 2 of u32
    ilog2_u64() Log base 2 of u64

    These facilities can either be used inside functions on dynamic data:

    int do_something(long q)
    {
    ...;
    y = ilog2(x)
    ...;
    }

    Or can be used to statically initialise global variables with constant values:

    unsigned n = ilog2(27);

    When performing static initialisation, the compiler will report "error:
    initializer element is not constant" if asked to take a log of zero or of
    something not reducible to a constant. They treat negative numbers as
    unsigned.

    When not dealing with a constant, they fall back to using fls() which permits
    them to use arch-specific log calculation instructions - such as BSR on
    x86/x86_64 or SCAN on FRV - if available.

    [akpm@osdl.org: MMC fix]
    Signed-off-by: David Howells
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Herbert Xu
    Cc: David Howells
    Cc: Wojtek Kaniewski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

03 Oct, 2006

1 commit


26 Sep, 2006

2 commits