28 Jun, 2008

1 commit

  • This patch adds saved stack-traces to the backtrace suite of self-tests.

    Note that we don't depend on or unconditionally enable CONFIG_STACKTRACE
    because not all architectures may have it (and we still want to enable the
    other tests for those architectures).

    Cc: Arjan van de Ven
    Signed-off-by: Vegard Nossum
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Vegard Nossum
     

15 May, 2008

4 commits

  • * 'for-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9p: fix error path during early mount
    9p: make cryptic unknown error from server less scary
    9p: fix flags length in net
    9p: Correct fidpool creation failure in p9_client_create
    9p: use struct mutex instead of struct semaphore
    9p: propagate parse_option changes to client and transports
    fs/9p/v9fs.c (v9fs_parse_options): Handle kstrdup and match_strdup failure.
    9p: Documentation updates
    add match_strlcpy(), use it to make v9fs uname and remotename parsing more robust

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
    sparc64: Use a TS_RESTORE_SIGMASK
    lmb: Make lmb debugging more useful.
    lmb: Fix inconsistent alignment of size argument.
    sparc: Fix mremap address range validation.

    Linus Torvalds
     
  • Add a common hex array in hexdump.c so everyone can use it.

    Add a common hi/lo helper to avoid the shifting and masking that is
    done to get the upper and lower nibbles of a byte value.

    Pull the pack_hex_byte helper out of kgdb, as it is open-coded in
    many places in the tree that will be consolidated.
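
    For reference, a minimal sketch of what these helpers look like
    (reconstructed here from the description, not quoted from the patch):

        extern const char hex_asc[];            /* "0123456789abcdef" */
        #define hex_asc_lo(x)  hex_asc[((x) & 0x0f)]
        #define hex_asc_hi(x)  hex_asc[((x) & 0xf0) >> 4]

        static inline char *pack_hex_byte(char *buf, u8 byte)
        {
                *buf++ = hex_asc_hi(byte);      /* upper nibble first */
                *buf++ = hex_asc_lo(byte);
                return buf;
        }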

    Signed-off-by: Harvey Harrison
    Acked-by: Paul Mundt
    Cc: Jason Wessel
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • match_strcpy() is a somewhat creepy function: the caller needs to make sure
    that the destination buffer is big enough, and when he screws up or
    forgets, match_strcpy() happily overruns the buffer.

    There's exactly one customer: v9fs_parse_options(). I believe it currently
    can't overflow its buffer, but that's not exactly obvious.

    The source string is a substring of the mount options. The kernel silently
    truncates those to PAGE_SIZE bytes, including the terminating zero. See
    compat_sys_mount() and do_mount().

    The destination buffer is obtained from __getname(), which allocates from
    name_cachep, which is initialized by vfs_caches_init() for size PATH_MAX.

    We're safe as long as PATH_MAX is no smaller than PAGE_SIZE.
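
    As a hedged sketch (buffer handling shown only for illustration),
    the match_strlcpy() replacement bounds the copy the way strlcpy()
    does, so the invariant above no longer has to hold for safety:

        /* copies at most size - 1 bytes and always NUL-terminates,
         * so the destination cannot be overrun */
        char *name = __getname();               /* PATH_MAX-sized buffer */
        match_strlcpy(name, &args[0], PATH_MAX);
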
    Cc: Latchesar Ionkov
    Cc: Jim Meyering
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Eric Van Hensbergen

    Markus Armbruster
     

13 May, 2008

3 commits

  • They aren't used. They were briefly used as part of some other patches to
    provide an alternative format for displaying some /proc and /sys cpumasks.
    They probably should have been removed when those other patches were dropped,
    in favor of a different solution.

    Signed-off-by: Paul Jackson
    Cc: "Mike Travis"
    Cc: "Bert Wesarg"
    Cc: Alexey Dobriyan
    Cc: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
    Having to muck with the build and set DEBUG just to get
    lmb_dump_all() to print things isn't very useful.

    So use pr_info() and an early boot param "lmb=debug", so we can
    simply ask users to reboot with this option when we need some
    debugging from them.
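
    The boot-param hook is roughly this shape (a sketch; the flag
    variable name is assumed):

        static int __init early_lmb(char *p)
        {
                if (p && strstr(p, "debug"))
                        lmb_debug = 1;  /* enables lmb_dump_all() output */
                return 0;
        }
        early_param("lmb", early_lmb);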

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When allocating, if we will align up the size when making
    the reservation, we should also align the size for the
    check that the space is actually available.

    The simplest thing is to just align the size up from the beginning;
    then we can use plain 'size' throughout.
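
    The shape of the fix is roughly the following, using the generic
    ALIGN() helper for illustration (the lmb code may use its own
    rounding helper):

        size = ALIGN(size, align);      /* round up once, at the top */
        /* both the availability check and the reservation below now
         * see the same, already-aligned 'size' */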

    Signed-off-by: David S. Miller

    David S. Miller
     

11 May, 2008

1 commit

  • The generic semaphore rewrite had a huge performance regression on AIM7
    (and potentially other BKL-heavy benchmarks) because the generic
    semaphores had been rewritten to be simple to understand and fair. The
    latter, in particular, turns a semaphore-based BKL implementation into a
    mess of scheduling.

    The attempt to fix the performance regression failed miserably (see the
    previous commit 00b41ec2611dc98f87f30753ee00a53db648d662 'Revert
    "semaphore: fix"'), and so for now the simple and sane approach is to
    instead just go back to the old spinlock-based BKL implementation that
    never had any issues like this.

    According to Yanmin Zhang, this patch also has the advantage of
    fixing the regression completely, unlike the semaphore hack, which
    still left a couple of percentage points of regression.

    As a spinlock, the BKL obviously has the potential to be a latency
    issue, but it's not really any different from any other spinlock in that
    respect. We do want to get rid of the BKL asap, but that has been the
    plan for several years.

    These days, the biggest users are in the tty layer (open/release in
    particular) and Alan holds out some hope:

    "tty release is probably a few months away from getting cured - I'm
    afraid it will almost certainly be the very last user of the BKL in
    tty to get fixed as it depends on everything else being sanely locked."

    so while we're not there yet, we do have a plan of action.

    Tested-by: Yanmin Zhang
    Cc: Ingo Molnar
    Cc: Andi Kleen
    Cc: Matthew Wilcox
    Cc: Alexander Viro
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 May, 2008

8 commits

  • The return inside the loop makes us free only a single layer.

    Signed-off-by: Nadia Derbey
    Cc: "Paul E. McKenney"
    Cc: Manfred Spraul
    Cc: Jim Houston
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Add a new sysfs_streq() string comparison function, which ignores
    the trailing newlines found in sysfs inputs. By example:

    sysfs_streq("a", "b") ==> false
    sysfs_streq("a", "a") ==> true
    sysfs_streq("a", "a\n") ==> true
    sysfs_streq("a\n", "a") ==> true

    This is intended to simplify parsing of sysfs inputs, letting them
    avoid the need to manually strip off newlines from inputs.
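
    The comparison is roughly the following (a sketch matching the
    examples above, not necessarily the exact patch text):

        bool sysfs_streq(const char *s1, const char *s2)
        {
                while (*s1 && *s1 == *s2) {
                        s1++;
                        s2++;
                }
                if (*s1 == *s2)
                        return true;    /* identical strings */
                if (!*s1 && *s2 == '\n' && !s2[1])
                        return true;    /* s2 has a trailing newline */
                if (*s1 == '\n' && !s1[1] && !*s2)
                        return true;    /* s1 has a trailing newline */
                return false;
        }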

    Signed-off-by: David Brownell
    Acked-by: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Brownell
     
  • Rename div64_64 to div64_u64 to make it consistent with the other divide
    functions, so it clearly includes the type of the divide. Move its definition
    to math64.h as currently no architecture overrides the generic implementation.
    They can still override it of course, but the duplicated declarations are
    avoided.

    Signed-off-by: Roman Zippel
    Cc: Avi Kivity
    Cc: Russell King
    Cc: Geert Uytterhoeven
    Cc: Ralf Baechle
    Cc: David Howells
    Cc: Jeff Dike
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
    The current do_div doesn't explicitly say that it's unsigned, and
    the signed counterpart is missing, which is needed e.g. when dealing
    with time values.

    This introduces 64bit signed/unsigned divide functions which also
    attempt to clean up the somewhat awkward calling API, which often
    requires the use of temporary variables for the dividend. To avoid
    the need for temporary variables everywhere for the remainder, each
    divide variant also provides a version which doesn't return the
    remainder.

    Each architecture can now provide optimized versions of these
    functions; otherwise generic fallback implementations will be used.

    As an example I provided an alternative for the current x86 divide,
    which avoids the asm casts; using a union allows gcc to generate
    better code. It also avoids the upper divide in a few more cases,
    where the result is known (i.e. the upper quotient is zero).
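
    The resulting interface in <linux/math64.h> looks roughly like this
    (signatures given from memory; treat as a sketch):

        u64 div_u64(u64 dividend, u32 divisor);
        u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder);
        s64 div_s64(s64 dividend, s32 divisor);
        s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder);
        u64 div64_u64(u64 dividend, u64 divisor);

        /* e.g. no temporary needed for the dividend, unlike do_div(): */
        u32 rem;
        u64 secs = div_u64_rem(ns, NSEC_PER_SEC, &rem);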

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • Finally clean up the odd spacing in these files.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
    Use a resource_size_t instead of unsigned long, since some
    architectures are capable of having ioremap() deal with addresses
    greater than the size of an unsigned long.
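
    The changed prototype is approximately:

        void __iomem *devm_ioremap(struct device *dev,
                                   resource_size_t offset,  /* was unsigned long */
                                   unsigned long size);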

    Signed-off-by: Kumar Gala
    Cc: Tejun Heo
    Cc: Jeff Garzik
    Signed-off-by: Greg Kroah-Hartman

    Kumar Gala
     
  • This prevents a few unneeded copies.

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     
  • Add klist_add_after() and klist_add_before() which puts a new node
    after and before an existing node, respectively. This is useful for
    callers which need to keep klist ordered. Note that synchronizing
    between simultaneous additions for ordering is the caller's
    responsibility.
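
    The new entry points are, approximately:

        /* insert n immediately after / before pos in the klist */
        void klist_add_after(struct klist_node *n, struct klist_node *pos);
        void klist_add_before(struct klist_node *n, struct klist_node *pos);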

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

30 Apr, 2008

5 commits

  • __FUNCTION__ is gcc specific, use __func__

    Signed-off-by: Harvey Harrison
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
    Add calls to the generic object debugging infrastructure and provide
    fixup functions which make it possible to keep the system alive when
    recoverable problems have been detected by the object debugging core
    code.
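
    As an illustration, one of the timer fixup handlers is roughly
    shaped like this (a sketch based on the description, not quoted
    from the patch):

        static int timer_fixup_free(void *addr, enum debug_obj_state state)
        {
                struct timer_list *timer = addr;

                switch (state) {
                case ODEBUG_STATE_ACTIVE:
                        /* freeing an active timer: stop it first,
                         * then let the free proceed */
                        del_timer_sync(timer);
                        return 1;       /* fixed up, system stays alive */
                default:
                        return 0;
                }
        }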

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • We can see an ever repeating problem pattern with objects of any kind in the
    kernel:

    1) freeing of active objects
    2) reinitialization of active objects

    Both problems can be hard to debug because the crash happens at a point where
    we have no chance to decode the root cause anymore. One problem spot is
    kernel timers, where the detection of the problem often happens in interrupt
    context and usually causes the machine to panic.

    While working on a timer related bug report I had to hack specialized code
    into the timer subsystem to get a reasonable hint for the root cause. This
    debug hack was fine for temporary use, but far from a mergeable solution due
    to the intrusiveness into the timer code.

    The code further lacked the ability to detect and report the root cause
    instantly and keep the system operational.

    Keeping the system operational is important to get hold of the debug
    information without special debugging aids like serial consoles and special
    knowledge of the bug reporter.

    The problems described above are not restricted to timers, but timers tend to
    expose it usually in a full system crash. Other objects are less explosive,
    but the symptoms caused by such mistakes can be even harder to debug.

    Instead of creating specialized debugging code for the timer
    subsystem, a generic infrastructure is created which allows
    developers to verify their code and provides an easy-to-enable debug
    facility for users in case of trouble.

    The debugobjects core code keeps track of operations on static and
    dynamic objects by inserting them into a hashed list and sanity
    checking them on object operations; it also provides additional
    checks whenever kernel memory is freed.

    The tracked object operations are:
    - initializing an object
    - adding an object to a subsystem list
    - deleting an object from a subsystem list

    Each operation is sanity checked before it is executed, and the
    subsystem specific code can provide a fixup function which makes it
    possible to prevent damage from the operation. When a sanity check
    triggers, a warning message and a stack trace are printed.

    The list of operations can be extended if the need arises. For now it's
    limited to the requirements of the first user (timers).

    The core code enqueues the objects into hash buckets. The hash index is
    generated from the address of the object to simplify the lookup for the check
    on kfree/vfree. Each bucket has its own spinlock to avoid contention on a
    global lock.

    The debug code can be compiled in without being active. The runtime overhead
    is minimal and could be optimized by asm alternatives. A kernel command line
    option enables the debugging code.
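
    A subsystem hooks in by describing its object type and fixup
    callbacks; the descriptor and tracked operations look roughly like
    this (a sketch following the description above):

        struct debug_obj_descr {
                const char *name;
                int (*fixup_init)(void *addr, enum debug_obj_state state);
                int (*fixup_activate)(void *addr, enum debug_obj_state state);
                int (*fixup_destroy)(void *addr, enum debug_obj_state state);
                int (*fixup_free)(void *addr, enum debug_obj_state state);
        };

        /* called from the subsystem's own code paths: */
        debug_object_init(obj, &descr);
        debug_object_activate(obj, &descr);
        debug_object_deactivate(obj, &descr);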

    Thanks to Ingo Molnar for review, suggestions and cleanup patches.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Add "max_ratio" to /sys/class/bdi. This indicates the maximum percentage of
    the global dirty threshold allocated to this bdi.

    [mszeredi@suse.cz]

    - fix parsing in max_ratio_store().
    - export bdi_set_max_ratio() to modules
    - limit bdi_dirty with bdi->max_ratio
    - document new sysfs attribute
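
    The exported setter is approximately (signature assumed from the
    description):

        /* cap this bdi at max_ratio percent of the global dirty threshold */
        int bdi_set_max_ratio(struct backing_dev_info *bdi,
                              unsigned int max_ratio);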

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object.
    This allows us to see and set the various BDI specific variables.

    In particular this properly exposes the read-ahead window for all relevant
    users, and /sys/block/<device>/queue/read_ahead_kb should be deprecated.

    With patient help from Kay Sievers and Greg KH

    [mszeredi@suse.cz]

    - split off NFS and FUSE changes into separate patches
    - document new sysfs attributes under Documentation/ABI
    - do bdi_class_init as a core_initcall, otherwise the "default" BDI
    won't be initialized
    - remove bdi_init_fmt macro, it's not used very much

    [akpm@linux-foundation.org: fix ia64 warning]
    Signed-off-by: Peter Zijlstra
    Cc: Kay Sievers
    Acked-by: Greg KH
    Cc: Trond Myklebust
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

29 Apr, 2008

11 commits

  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
    [RAPIDIO] Change RapidIO doorbell source and target ID field to 16-bit
    [RAPIDIO] Add RapidIO connection info print out and re-training for broken connections
    [RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641
    [RAPIDIO] Add RapidIO node probing into MPC86xx_HPCN board id table
    [RAPIDIO] Add RapidIO node into MPC8641HPCN dts file
    [RAPIDIO] Auto-probe the RapidIO system size
    [RAPIDIO] Add OF-tree support to RapidIO controller driver
    [RAPIDIO] Add RapidIO multi mport support
    [RAPIDIO] Move include/asm-ppc/rio.h to asm-powerpc
    [RAPIDIO] Add RapidIO option to kernel configuration
    [RAPIDIO] Change RIO function mpc85xx_ to fsl_
    [POWERPC] Provide walk_memory_resource() for powerpc
    [POWERPC] Update lmb data structures for hotplug memory add/remove
    [POWERPC] Hotplug memory remove notifications for powerpc
    [POWERPC] windfarm: Add PowerMac 12,1 support
    [POWERPC] Fix building of pmac32 when CONFIG_NVRAM=m
    [POWERPC] Add IRQSTACKS support on ppc32
    [POWERPC] Use __always_inline for xchg* and cmpxchg*
    [POWERPC] Add fast little-endian switch system call

    Linus Torvalds
     
    The mapsize optimizations which were moved from x86 to the generic
    code in commit 64970b68d2b3ed32b964b0b30b1b98518fde388e increased the
    binary size on non-x86 architectures.

    Looking into the real effects of the "optimizations" it turned out
    that they are not used in find_next_bit() and find_next_zero_bit().

    The ones in find_first_bit() and find_first_zero_bit() are used in a
    couple of places but none of them is a real hot path.

    Remove the "optimizations" all together and call the library functions
    unconditionally.

    Boot-tested on x86 and compile tested on every cross compiler I have.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
    Avoid a possible kmem_cache_create() failure by creating
    idr_layer_cache unconditionally at boot time rather than creating it
    on-demand the first time idr_init() is called.

    This change also enables us to eliminate the check every time idr_init() is
    called.

    [akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
    [akpm@linux-foundation.org: fix alpha build]
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Change all ia64 machvecs to use the new dma_*map*_attrs() interfaces.
    Implement the old dma_*map_*() interfaces in terms of the corresponding new
    interfaces. For ia64/sn, make use of one dma attribute,
    DMA_ATTR_WRITE_BARRIER. Introduce swiotlb_*map*_attrs() functions.
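
    Callers pass attributes roughly like this (a sketch of the new
    interface; buffer and direction are illustrative):

        DEFINE_DMA_ATTRS(attrs);
        dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
        dma_addr = dma_map_single_attrs(dev, buf, size,
                                        DMA_TO_DEVICE, &attrs);
        /* the old dma_map_single() becomes a wrapper with NULL attrs */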

    Signed-off-by: Arthur Kepner
    Cc: Tony Luck
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Cc: James Bottomley
    Cc: David Miller
    Cc: Benjamin Herrenschmidt
    Cc: Grant Grundler
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Kepner
     
    A WARN_ON in rcupreempt.h triggered repeatedly and left me with a 2G
    syslog file. For some serious kernel complaints we do need to repeat
    the warnings, so here I isolate the ratelimit part of printk.c into
    a standalone file.
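
    Typical call sites keep the familiar pattern (message text invented
    for illustration):

        if (printk_ratelimit())         /* now backed by lib/ratelimit.c */
                printk(KERN_WARNING "suspicious state detected\n");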

    Signed-off-by: Dave Young
    Acked-by: Paul E. McKenney
    Tested-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     
  • iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs
    (commit 3715863aa142c4f4c5208f5f3e5e9bac06006d2f). SWIOTLB can use it instead
    of the homegrown function.
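
    For reference, the shared helper's signature is approximately:

        int iommu_is_span_boundary(unsigned int index, unsigned int nr,
                                   unsigned long shift,
                                   unsigned long boundary_size);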

    Signed-off-by: FUJITA Tomonori
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     
  • There's a pointlessly braced block of code in there. Remove the braces and
    save a tabstop.

    Cc: Andi Kleen
    Cc: FUJITA Tomonori
    Cc: Jan Beulich
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
    Almost all implementations of pci_iomap() in the kernel, including
    the generic lib/iomap.c one, copy the content of a struct resource
    into unsigned long's, which will break on 32-bit platforms with
    64-bit resources.

    This fixes all definitions of pci_iomap() to use resource_size_t. I
    also "fixed" the 64-bit architectures for consistency.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • lib/inflate.c (inflate_dynamic): Don't deref NULL upon failed malloc.

    Signed-off-by: Jim Meyering
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Meyering
     
  • Provide walk_memory_resource() for 64-bit powerpc. PowerPC maintains
    logical memory region mapping in the lmb.memory structure. Walk
    through these structures and do the callbacks for the contiguous
    chunks.

    Signed-off-by: Badari Pulavarty
    Cc: Yasunori Goto
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Paul Mackerras

    Badari Pulavarty
     
  • The powerpc kernel maintains information about logical memory blocks
    in the lmb.memory structure, which is initialized and updated at boot
    time, but not when memory is added or removed while the kernel is
    running.

    This adds a hotplug memory notifier which updates lmb.memory when
    memory is added or removed. This information is useful for eHEA
    driver to find out the memory layout and holes.

    NOTE: No special locking is needed for lmb_add() and lmb_remove().
    Calls to these are serialized by the caller (pSeries_reconfig_chain).

    Signed-off-by: Badari Pulavarty
    Cc: Yasunori Goto
    Cc: Benjamin Herrenschmidt
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Paul Mackerras

    Badari Pulavarty
     

28 Apr, 2008

2 commits

  • The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
    with the usual cpumask and nodemask wrappers.

    The bitmap_onto() operator computes one bitmap relative to another. If the
    n-th bit in the origin mask is set, then the m-th bit of the destination mask
    will be set, where m is the position of the n-th set bit in the relative mask.

    The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
    the input bitmap has some bit n set, where m == n mod sz, for the specified sz
    value.
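
    A small worked example of the two operators (values invented for
    illustration):

        /* bitmap_onto(): orig = {1, 3}, relmap = {4, 9, 18, 20}.
         * The 1st and 3rd set bits of relmap (counting from 0) are
         * 9 and 20, so dst = {9, 20}. */
        bitmap_onto(dst, orig, relmap, bits);

        /* bitmap_fold(): orig = {12, 15}, sz = 8. Bit positions are
         * taken modulo 8, so dst = {4, 7}. */
        bitmap_fold(dst, orig, 8, bits);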

    There are two substantive changes between this patch and its
    predecessor bitmap_relative:
    1) Renamed bitmap_relative() to be bitmap_onto().
    2) Added bitmap_fold().

    The essential motivation for bitmap_onto() is to provide a mechanism for
    converting a cpuset-relative CPU or Node mask to an absolute mask. Cpuset
    relative masks are written as if the current task were in a cpuset whose CPUs
    or Nodes were just the consecutive ones numbered 0..N-1, for some N. The
    bitmap_onto() operator is provided in anticipation of adding support for the
    first such cpuset relative mask, by the mbind() and set_mempolicy() system
    calls, using a planned flag of MPOL_F_RELATIVE_NODES. These bitmap operators
    (and their nodemask wrappers, in particular) will be used in code that
    converts the user specified cpuset relative memory policy to a specific system
    node numbered policy, given the current mems_allowed of the task's cpuset.

    Such cpuset relative mempolicies will address two deficiencies
    of the existing interface between cpusets and mempolicies:
    1) A task cannot at present reliably establish a cpuset
    relative mempolicy because there is an essential race
    condition, in that the task's cpuset may be changed in
    between the time the task can query its cpuset placement,
    and the time the task can issue the applicable mbind or
    set_mempolicy system call.
    2) A task cannot at present establish what cpuset relative
    mempolicy it would like to have, if it is in a smaller
    cpuset than it might have mempolicy preferences for,
    because the existing interface only allows specifying
    mempolicies for nodes currently allowed by the cpuset.

    Cpuset relative mempolicies are useful for tasks that don't distinguish
    particularly between one CPU or Node and another, but only between how many of
    each are allowed, and the proper placement of threads and memory pages on the
    various CPUs and Nodes available.

    The motivation for the added bitmap_fold() can be seen in the following
    example.

    Let's say an application has specified some mempolicies that presume 16 memory
    nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
    relative) nodes 12-15. Then let's say that application is crammed into a
    cpuset that only has 8 memory nodes, 0-7. If one just uses bitmap_onto(),
    this mempolicy, mapped to that cpuset, would ignore the requested relative
    nodes above 7, leaving it empty of nodes. That's not good; better to fold the
    higher nodes down, so that some nodes are included in the resulting mapped
    mempolicy. In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
    weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
    specifying nodes 4-7.

    Signed-off-by: Paul Jackson
    Signed-off-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Migrate flags must be set on slab creation as agreed upon when the antifrag
    logic was reviewed. Otherwise some slabs of a slabcache will end up in the
    unmovable and others in the reclaimable section depending on which flag was
    active when a new slab page was allocated.

    This likely slid in somehow when antifrag was merged. Remove it.

    The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
    SLAB_RECLAIM_ACCOUNT option is set. The set_migrateflags() never had any
    effect there.

    Radix tree allocations are not directly reclaimable but they are allocated
    with __GFP_RECLAIMABLE set on each allocation. We now set
    SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
    tree slabs are consistently placed in the reclaimable section. Radix tree
    slabs will also be accounted as such.

    There is then no user left of set_migrateflags(). So remove it.

    Signed-off-by: Christoph Lameter
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter