08 Dec, 2006

24 commits

  • Introduce pagefault_{disable,enable}() and use these where previously we did
    manual preempt increments/decrements to make the pagefault handler do the
    atomic thing.

    Currently they still rely on the increased preempt count, but do not rely on
    the disabled preemption; this might go away in the future.

    (NOTE: the extra barrier() in pagefault_disable might fix some holes on
    machines which have too many registers for their own good)
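
    For reference, a minimal sketch of what the pair looks like (assuming the
    then-current inc_preempt_count()/dec_preempt_count()/preempt_check_resched()
    primitives; not the verbatim kernel code):

        static inline void pagefault_disable(void)
        {
                inc_preempt_count();
                /* make sure the store to the preempt count is issued
                 * before a pagefault can hit */
                barrier();
        }

        static inline void pagefault_enable(void)
        {
                /* issue the outstanding loads/stores before the fault
                 * handler is allowed to sleep again */
                barrier();
                dec_preempt_count();
                barrier();
                preempt_check_resched();
        }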

    [heiko.carstens@de.ibm.com: s390 fix]
    Signed-off-by: Peter Zijlstra
    Acked-by: Nick Piggin
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • In light of the recent pagefault and filemap_copy_from_user work I've gone
    through all the arch pagefault handlers to make sure the inc_preempt_count()
    'feature' works as expected.

    Several sections of code (including the new filemap_copy_from_user) rely on
    the fact that faults do not take locks under increased preempt count.

    arch/x86_64 - good
    arch/powerpc - good
    arch/cris - fixed
    arch/i386 - good
    arch/parisc - fixed
    arch/sh - good
    arch/sparc - good
    arch/s390 - good
    arch/m68k - fixed
    arch/ppc - good
    arch/alpha - fixed
    arch/mips - good
    arch/sparc64 - good
    arch/ia64 - good
    arch/arm - fixed
    arch/um - good
    arch/avr32 - good
    arch/h8300 - NA
    arch/m32r - good
    arch/v850 - good
    arch/frv - fixed
    arch/m68knommu - NA
    arch/arm26 - fixed
    arch/sh64 - fixed
    arch/xtensa - good
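
    The check each handler is expected to carry is roughly the following (the
    no_context label and the mm test are per-arch details; illustrative sketch
    only):

        /*
         * If we are in an interrupt, have no user context, or are running
         * with pagefaults disabled (raised preempt count), we must not
         * take the fault and must not take any locks.
         */
        if (in_atomic() || !mm)
                goto no_context;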

    Signed-off-by: Peter Zijlstra
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • When using numa=fake on non-NUMA hardware there is no benefit to having the
    alien caches, and they consume much memory.

    Add a kernel boot option to disable them.

    Christoph sayeth "This is good to have even on large NUMA. The problem is
    that the alien caches grow by the square of the size of the system in terms of
    nodes."

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Here's an attempt towards doing away with lock_cpu_hotplug in the slab
    subsystem. This approach also fixes a bug which shows up when cpus are
    being offlined/onlined and slab caches are being tuned simultaneously.

    http://marc.theaimsgroup.com/?l=linux-kernel&m=116098888100481&w=2

    The patch has been stress tested overnight on a 2 socket 4 core AMD box with
    repeated cpu online and offline, while dbench and kernbench processes were
    running and slab caches were being tuned at the same time.
    There were no lockdep warnings either. (This test was on 2.6.18, as 2.6.19-rc
    crashes at __drain_pages:
    http://marc.theaimsgroup.com/?l=linux-kernel&m=116172164217678&w=2 )

    The approach here is to hold cache_chain_mutex from CPU_UP_PREPARE until
    CPU_ONLINE (similar in approach to workqueue_mutex). Slab code sensitive
    to cpu_online_map (kmem_cache_create, kmem_cache_destroy, slabinfo_write,
    __cache_shrink) is already serialized with cache_chain_mutex. (This patch
    lengthens the cache_chain_mutex hold time at kmem_cache_destroy to cover
    this.) This patch also takes cache_chain_mutex at kmem_cache_shrink to
    protect the sanity of cpu_online_map at __cache_shrink, as viewed by slab
    (kmem_cache_shrink->__cache_shrink->drain_cpu_caches). But, really,
    kmem_cache_shrink is used at just one place in the acpi subsystem! Do we
    really need to keep kmem_cache_shrink at all?

    Another note: it looks like a cpu hotplug event can send CPU_UP_CANCELED to
    a registered subsystem even if that subsystem did not receive CPU_UP_PREPARE.
    This can happen when a subsystem registered for notification earlier than the
    current one craps out with NOTIFY_BAD. Badness can occur within the
    CPU_UP_CANCELED code path at slab if this happens (the same would apply to
    workqueue.c as well). To overcome this, we might have to either
    a) use a per-subsystem flag and avoid handling of CPU_UP_CANCELED, or
    b) use special notifier events like LOCK_ACQUIRE/RELEASE, as Gautham was
    using in his experiments, or
    c) not send CPU_UP_CANCELED to a subsystem which did not receive
    CPU_UP_PREPARE.

    I would prefer c).
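
    A sketch of the cpu-up path of the notifier under this scheme (error
    handling and the per-cpu cache setup itself elided; illustrative only):

        static int cpuup_callback(struct notifier_block *nfb,
                                  unsigned long action, void *hcpu)
        {
                switch (action) {
                case CPU_UP_PREPARE:
                        mutex_lock(&cache_chain_mutex);
                        /* allocate per-cpu array caches for the new cpu */
                        break;
                case CPU_ONLINE:
                        /* start the per-cpu reap timer */
                        mutex_unlock(&cache_chain_mutex);
                        break;
                case CPU_UP_CANCELED:
                        /* free what CPU_UP_PREPARE allocated */
                        mutex_unlock(&cache_chain_mutex);
                        break;
                }
                return NOTIFY_OK;
        }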

    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
  • When CONFIG_SLAB_DEBUG is used in combination with ARCH_SLAB_MINALIGN, the
    debug flags that depend on BYTES_PER_WORD alignment should be disabled.

    The disabling of these debug flags is not properly handled when
    BYTES_PER_WORD < ARCH_SLAB_MINALIGN < cache_line_size().

    This patch fixes that and also adds an alignment check to
    cache_alloc_debugcheck_after() when ARCH_SLAB_MINALIGN is used.
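
    The two changes, roughly (flag and variable names as in mm/slab.c of that
    era; this is a sketch, not the literal hunk):

        /* kmem_cache_create(): any alignment beyond BYTES_PER_WORD --
         * whether from ARCH_SLAB_MINALIGN or a caller-supplied align --
         * is incompatible with red-zoning and user-store debugging */
        if (ralign > BYTES_PER_WORD)
                flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);

        /* cache_alloc_debugcheck_after(): complain if the object we are
         * about to return is not ARCH_SLAB_MINALIGN aligned */
        #if ARCH_SLAB_MINALIGN
        if ((unsigned long)objp & (ARCH_SLAB_MINALIGN - 1))
                printk(KERN_ERR "0x%p: not aligned to ARCH_SLAB_MINALIGN=%d\n",
                       objp, ARCH_SLAB_MINALIGN);
        #endif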

    Signed-off-by: Kevin Hilman
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kevin Hilman
     
  • Imprecise RSS accounting is an irritating ill effect of pt sharing. After
    consulting several VM experts, I tried various methods to solve the
    problem: (1) iterate through all mm_structs that share the PT and increment
    the count; (2) keep an RSS count in the page table structure and sum it up
    at reporting time. Neither method yielded a satisfactory implementation.

    Since process RSS accounting is purely informational, I propose we don't
    count it at all for hugetlb pages. rlimit has such a field, though there is
    absolutely no enforcement on limiting that resource. One other method would
    be to account all RSS at hugetlb mmap time, regardless of whether the pages
    are faulted or not. I opt for the simplicity of no accounting at all.

    Hugetlb pages are special: they are reserved up front in a global
    reservation pool and are not reclaimable. From a physical memory resource
    point of view, they are already consumed regardless of whether anyone is
    using them.

    If the concern is that RSS can be used to control resource allocation, we
    can already specify a hugetlbfs size limit and the sysadmin can enforce it
    at mount time. Combined with the two points mentioned above, I fail to see
    anything that would be adversely affected by this patch.

    Signed-off-by: Ken Chen
    Acked-by: Hugh Dickins
    Cc: Dave McCracken
    Cc: William Lee Irwin III
    Cc: "Luck, Tony"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: David Gibson
    Cc: Adam Litke
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     
  • Following up on the shared page table work done by Dave McCracken, this
    set of patches targets shared page tables for hugetlb memory only.

    Shared page tables are particularly useful when a large number of
    independent processes share large shared memory segments. In the normal
    page case, the amount of memory saved from the processes' page tables is
    quite significant. For hugetlb, the saving on page table memory is not the
    primary objective (as hugetlb itself already cuts down page table overhead
    significantly); instead, the purpose of using shared page tables for
    hugetlb is to allow faster TLB refills and less cache pollution on a TLB
    miss.

    With PT sharing, pte entries are shared among hundreds of processes, so the
    cache footprint of all the page tables is smaller and, in return, the
    application gets a much higher cache hit ratio. A further effect is that
    the hardware page walker is more likely to find the pte in the cache, which
    helps reduce TLB miss latency. These two effects contribute to higher
    application performance.

    Signed-off-by: Ken Chen
    Acked-by: Hugh Dickins
    Cc: Dave McCracken
    Cc: William Lee Irwin III
    Cc: "Luck, Tony"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: David Gibson
    Cc: Adam Litke
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     
  • Despaghettify balance_pgdat() a bit.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Add an arch_alloc_page to match arch_free_page.
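
    Roughly, the new hook is a no-op that architectures can override, called
    when a page leaves the allocator (sketch; the exact call site in the page
    allocator is paraphrased):

        #ifndef HAVE_ARCH_ALLOC_PAGE
        static inline void arch_alloc_page(struct page *page, int order) { }
        #endif

        /* called from the page allocator while preparing a freshly
         * allocated page, mirroring the arch_free_page() call on free */
        arch_alloc_page(page, order);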

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The new swap token patches replace the current token traversal algo. The old
    algo had a crude timeout parameter that was used to hand the token over from
    one task to another. The new algo transfers the token to the tasks that are
    in need of it. The urgency for the token is based on the number of times a
    task is required to swap in pages; accordingly, the priority of a task is
    incremented if it has been badly affected by swap-outs. To ensure that the
    token doesn't bounce around rapidly, token holders are given a priority
    boost. The priority of a task is also decremented if its rate of swap-ins
    keeps falling. This way, the decision whether to preempt the swap token is
    a matter of comparing two tasks' priority fields.
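
    A condensed sketch of the new grab_swap_token() logic (field names such as
    token_priority, faultstamp and last_interval follow the patch as described
    above; locking and details are simplified):

        void grab_swap_token(void)
        {
                struct mm_struct *mm = current->mm;
                int interval = global_faults - mm->faultstamp;

                global_faults++;

                if (swap_token_mm == NULL) {
                        /* first come, first served */
                        mm->token_priority += 2;
                        swap_token_mm = mm;
                } else if (mm == swap_token_mm) {
                        /* holder faulted again: boost it so the token
                         * does not bounce around rapidly */
                        mm->token_priority += 2;
                } else {
                        /* swapping in more often than before raises our
                         * urgency, less often lowers it */
                        if (interval < mm->last_interval)
                                mm->token_priority++;
                        else if (mm->token_priority > 0)
                                mm->token_priority--;

                        /* preempt the token on a simple priority compare */
                        if (mm->token_priority > swap_token_mm->token_priority)
                                swap_token_mm = mm;
                }

                mm->faultstamp = global_faults;
                mm->last_interval = interval;
        }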

    [akpm@osdl.org: cleanups]
    Signed-off-by: Ashwin Chaugule
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ashwin Chaugule
     
  • Make sure the contention for the token happens _before_ any read-in, and
    kick the swap-token algo only when the VM is under pressure.

    Signed-off-by: Ashwin Chaugule
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ashwin Chaugule
     
  • Some drivers are returning OOM when it is not in response to a memory
    shortage.

    Signed-off-by: Nick Piggin
    Cc: Dave Airlie
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Don't cause all threads in all other thread groups to gain TIF_MEMDIE,
    otherwise we'll get a thundering herd eating our memory reserve. This may
    not be the optimal scheme, but it fits our policy of allowing just one
    TIF_MEMDIE in the system at once.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Clean up the OOM killer messages to be more consistent.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Abort the kill if any of our threads have OOM_DISABLE set. Having this
    test here also prevents any OOM_DISABLE child of the "selected" process
    from being killed.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Rearrange the struct members in the 'struct zonelist_cache' structure, so
    as to put the readonly (once initialized) z_to_n[] array first, where it
    will come right after the zones[] array in struct zonelist.

    This pretty much eliminates the chance that the two frequently written
    elements of 'struct zonelist_cache', the fullzones bitmap and the
    last_full_zap time, will end up on the same cache line as the performance
    sensitive, frequently read, never (after init) written zones[] array.

    Keeping frequently written data off frequently read cache lines is good for
    performance.

    Thanks to Rohit Seth for the suggestion.
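
    The resulting layout, roughly (names as in the zonelist cache patch below;
    shown for illustration):

        struct zonelist_cache {
                /* read-only after init: zone index -> node id */
                unsigned short z_to_n[MAX_ZONES_PER_ZONELIST];
                /* frequently written: which zones were recently full */
                DECLARE_BITMAP(fullzones, MAX_ZONES_PER_ZONELIST);
                /* frequently written: jiffies of the last bitmap zap */
                unsigned long last_full_zap;
        };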

    Signed-off-by: Paul Jackson
    Cc: Rohit Seth
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Optimize the critical zonelist scanning for free pages in the kernel memory
    allocator by caching the zones that were found to be full recently, and
    skipping them.

    It remembers the zones in a zonelist that were short of free memory in the
    last second, and it stashes a zone-to-node table in the zonelist struct to
    optimize that conversion (and minimize its cache footprint).

    Recent changes:

    This differs in a significant way from a similar patch that I
    posted a week ago. Now, instead of having a nodemask_t of
    recently full nodes, I have a bitmask of recently full zones.
    This solves a problem that last week's patch had: on systems
    with multiple zones per node (such as a DMA zone), it would
    take seeing any one of those zones full as meaning that all
    zones on that node were full.

    Also I changed names - from "zonelist faster" to "zonelist cache",
    as that seemed to better convey what we're doing here - caching
    some of the key zonelist state (for faster access.)

    See below for some performance benchmark results. After all that
    discussion with David on why I didn't need them, I went and got
    some ;). I wanted to verify that I had not noticeably hurt the
    normal case of memory allocation. At least for my one little
    microbenchmark, I found that (1) the normal case wasn't affected,
    and (2) workloads that forced scanning across multiple nodes for
    memory improved, with up to 10% fewer system CPU cycles and lower
    elapsed clock time ('sys' and 'real'). Good. See details below.

    I didn't have the logic in get_page_from_freelist() for various
    full nodes and zone reclaim failures correct. That should be
    fixed up now - notice the new goto labels zonelist_scan,
    this_zone_full, and try_next_zone, in get_page_from_freelist().

    There are two reasons I pursued this alternative over some earlier
    proposals that would have focused on optimizing the fake numa
    emulation case by caching the last useful zone:

    1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems)
    have seen real customer loads where the cost to scan the zonelist
    was a problem, due to many nodes being full of memory before
    we got to a node we could use. Or at least, I think we have.
    This was related to me by another engineer, based on experiences
    from some time past. So this is not guaranteed. Most likely, though.

    The following approach should help such real numa systems just as
    much as it helps fake numa systems, or any combination thereof.

    2) The effort to distinguish fake from real numa, using node_distance,
    so that we could cache a fake numa node and optimize choosing
    it over equivalent distance fake nodes, while continuing to
    properly scan all real nodes in distance order, was going to
    require a nasty blob of zonelist and node distance munging.

    The following approach has no new dependency on node distances or
    zone sorting.

    See comment in the patch below for a description of what it actually does.

    Technical details of note (or controversy):

    - See the use of "zlc_active" and "did_zlc_setup" below, to delay
    adding any work for this new mechanism until we've looked at the
    first zone in zonelist. I figured the odds of the first zone
    having the memory we needed were high enough that we should just
    look there, first, then get fancy only if we need to keep looking.

    - Some odd hackery was needed to add items to struct zonelist, while
    not tripping up the custom zonelists built by the mm/mempolicy.c
    code for MPOL_BIND. My usual wordy comments below explain this.
    Search for "MPOL_BIND".

    - Some per-node data in the struct zonelist is now modified frequently,
    with no locking. Multiple CPU cores on a node could hit and mangle
    this data. The theory is that this is just performance hint data,
    and the memory allocator will work just fine despite any such mangling.
    The fields at risk are the struct 'zonelist_cache' fields 'fullzones'
    (a bitmask) and 'last_full_zap' (unsigned long jiffies). It should
    all be self correcting after at most a one second delay.

    - This still does a linear scan of the same lengths as before. All
    I've optimized is making the scan faster, not algorithmically
    shorter. It is now able to scan a compact array of 'unsigned
    short' in the case of many full nodes, so one cache line should
    cover quite a few nodes, rather than each node hitting another
    one or two new and distinct cache lines.

    - If both Andi and Nick don't find this too complicated, I will be
    (pleasantly) flabbergasted.

    - I removed the comment claiming we only use one cache line's worth of
    zonelist. We seem, at least in the fake numa case, to have put the
    lie to that claim.

    - I pay no attention to the various watermarks and such in this performance
    hint. A node could be marked full for one watermark, and then skipped
    over when searching for a page using a different watermark. I think
    that's actually quite ok, as it will tend to slightly increase the
    spreading of memory over other nodes, away from a memory stressed node.
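
    A sketch of the two cache helpers used from get_page_from_freelist()
    (names follow the patch; the bodies are simplified, and the once-a-second
    zap of fullzones, done in zlc_setup(), is only noted in a comment):

        static int zlc_zone_worth_trying(struct zonelist *zonelist,
                                         struct zone **z,
                                         nodemask_t *allowednodes)
        {
                struct zonelist_cache *zlc = zonelist->zlcache_ptr;
                int i, n;

                if (!zlc)
                        return 1;               /* no cache built (UMA) */

                i = z - zonelist->zones;        /* index of this zone */
                n = zlc->z_to_n[i];             /* its node, no deref of *z */

                /* worth a look only if not recently seen full and the
                 * node is allowed by the cpuset/GFP constraints */
                return !test_bit(i, zlc->fullzones) &&
                        node_isset(n, *allowednodes);
        }

        static void zlc_mark_zone_full(struct zonelist *zonelist,
                                       struct zone **z)
        {
                struct zonelist_cache *zlc = zonelist->zlcache_ptr;

                if (!zlc)
                        return;
                /* self-correcting hint: zlc_setup() zeroes fullzones and
                 * resets last_full_zap about once a second */
                set_bit(z - zonelist->zones, zlc->fullzones);
        }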

    ===============

    Performance - some benchmark results and analysis:

    This benchmark runs a memory hog program that uses multiple
    threads to touch a lot of memory as quickly as it can.

    Multiple runs were made, touching 12, 38, 64 or 90 GBytes out of
    the total 96 GBytes on the system, and using 1, 19, 37, or 55
    threads (on a 56 CPU system.) System, user and real (elapsed)
    timings were recorded for each run, shown in units of seconds,
    in the table below.

    Two kernels were tested - 2.6.18-mm3 and the same kernel with
    this zonelist caching patch added. The table also shows the
    percentage improvement the zonelist caching sys time is over
    (lower than) the stock *-mm kernel.

     number          2.6.18-mm3      zonelist-cache     delta (< 0 good)  percent
    GBs       N   ------------       --------------     ----------------  systime
    mem threads     sys user  real     sys user  real     sys user real     better
     12       1     153   24   177     151   24   176      -2    0   -1        1%
     12      19      99   22     8      99   22     8       0    0    0        0%
     12      37     111   25     6     112   25     6       1    0    0       -0%
     12      55     115   25     5     110   23     5      -5   -2    0        4%
     38       1     502   74   576     497   73   570      -5   -1   -6        0%
     38      19     426   78    48     373   76    39     -53   -2   -9       12%
     38      37     544   83    36     547   82    36       3   -1    0       -0%
     38      55     501   77    23     511   80    24      10    3    1       -1%
     64       1     917  125  1042     890  124  1014     -27   -1  -28        2%
     64      19    1118  138   119     965  141   103    -153    3  -16       13%
     64      37    1202  151    94    1136  150    81     -66   -1  -13        5%
     64      55    1118  141    61    1072  140    58     -46   -1   -3        4%
     90       1    1342  177  1519    1275  174  1450     -67   -3  -69        4%
     90      19    2392  199   192    2116  189   176    -276  -10  -16       11%
     90      37    3313  238   175    2972  225   145    -341  -13  -30       10%
     90      55    1948  210   104    1843  213   100    -105    3   -4        5%

    Notes:
    1) This test ran a memory hog program that started a specified number N of
    threads, and had each thread allocate and touch 1/N'th of
    the total memory to be used in the test run in a single loop,
    writing a constant word to memory, one store every 4096 bytes.
    Watching this test during some earlier trial runs, I would see
    each of these threads sit down on one CPU and stay there, for
    the remainder of the pass, a different CPU for each thread.

    2) The 'real' column is not comparable to the 'sys' or 'user' columns.
    The 'real' column is seconds wall clock time elapsed, from beginning
    to end of that test pass. The 'sys' and 'user' columns are total
    CPU seconds spent on that test pass. For a 19 thread test run,
    for example, the sum of 'sys' and 'user' could be up to 19 times the
    number of 'real' elapsed wall clock seconds.

    3) Tests were run on a fresh, single-user boot, to minimize the amount
    of memory already in use at the start of the test, and to minimize
    the amount of background activity that might interfere.

    4) Tests were done on a 56 CPU, 28 Node system with 96 GBytes of RAM.

    5) Notice that the 'real' time gets large for the single thread runs, even
    though the measured 'sys' and 'user' times are modest. I'm not sure what
    that means - probably something to do with it being slow for one thread to
    be accessing memory a long way away. Perhaps the fake numa system, running
    ostensibly the same workload, would not show this substantial degradation
    of 'real' time for one thread on many nodes -- let's hope not.

    6) The high thread count passes (one thread per CPU - on 55 of 56 CPUs)
    ran quite efficiently, as one might expect. Each pair of threads needed
    to allocate and touch the memory on the node the two threads shared, a
    pleasantly parallelizable workload.

    7) The intermediate thread count passes, when asking for a lot of memory,
    forcing them to go to a few neighboring nodes, improved the most with this
    zonelist caching patch.

    Conclusions:
    * This zonelist cache patch probably makes little difference one way or the
    other for most workloads on real numa hardware, if those workloads avoid
    heavy off node allocations.
    * For memory intensive workloads requiring substantial off-node allocations
    on real numa hardware, this patch improves both kernel and elapsed timings
    by up to ten percent.
    * For fake numa systems, I'm optimistic, but will have to leave that up to
    Rohit Seth to actually test (once I get him a 2.6.18 backport.)

    Signed-off-by: Paul Jackson
    Cc: Rohit Seth
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • The zone table is mostly not needed. If we have a node in the page flags
    then we can get to the zone via NODE_DATA() which is much more likely to be
    already in the cpu cache.

    In the SMP and UP cases NODE_DATA() is a constant pointer, which allows us
    to access an exact replica of the zone table via the node_zones field. In
    all of the above cases there is no need at all for the zone table.

    The only remaining case is if, in a NUMA system, the node numbers do not
    fit into the page flags. In that case we make sparsemem generate a table
    that maps sections to nodes and use that table to figure out the node
    number. This table is sized to fit in a single cache line for the known
    32 bit NUMA platform, which makes it very likely that the information can
    be obtained without a cache miss.

    For sparsemem the zone table seems to have been fairly large, based on
    the maximum possible number of sections and the number of zones per node.
    There is some memory saving from removing zone_table. The main benefit is
    to reduce the cache footprint of the VM from the frequent lookups of zones.
    Plus it simplifies the page allocator.
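
    With the table gone, the lookup boils down to the following (sketch of the
    resulting helper):

        static inline struct zone *page_zone(struct page *page)
        {
                /* node id from the page flags (or the sparsemem
                 * section-to-node table), then index into node_zones */
                return &NODE_DATA(page_to_nid(page))->
                                node_zones[page_zonenum(page)];
        }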

    [akpm@osdl.org: build fix]
    Signed-off-by: Christoph Lameter
    Cc: Dave Hansen
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Signed-off-by: Ken Chen
    Cc: David Gibson
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     
  • - s/freeliest/freelist/ spelling fix

    - Check for NULL *z zone seems useless - even if it could happen, so
    what? Perhaps we should have a check later on if we are faced with an
    allocation request that is not allowed to fail - shouldn't that be a
    serious kernel error, passing an empty zonelist with a mandate to not
    fail?

    - Initializing 'z' to zonelist->zones can wait until after the first
    get_page_from_freelist() fails; we only use 'z' in the wakeup_kswapd()
    loop, so let's initialize 'z' there, in a 'for' loop. Seems clearer.

    - Remove superfluous braces around a break

    - Fix a couple errant spaces

    - Adjust indentation on the cpuset_zone_allowed() check, to match the
    lines just before it -- seems easier to read in this case.

    - Add another set of braces to the zone_watermark_ok logic

    From: Paul Jackson

    Back out one item from a previous "memory page_alloc minor cleanups" patch.
    Until and unless we are certain that no one can ever pass an empty zonelist
    to __alloc_pages(), this check for an empty zonelist (or some BUG
    equivalent) is essential. The code in get_page_from_freelist() blows up if
    passed an empty zonelist.
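
    The restored check in __alloc_pages(), roughly:

        struct zone **z = zonelist->zones;  /* suitable zones for gfp_mask */

        if (unlikely(*z == NULL)) {
                /* Should this ever happen?? */
                return NULL;
        }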

    Signed-off-by: Paul Jackson
    Acked-by: Christoph Lameter
    Cc: Nick Piggin
    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • arch/um/drivers/chan_kern.c:643: error: conflicting types for 'chan_interrupt'
    arch/um/include/chan_kern.h:31: error: previous declaration of 'chan_interrupt'

    Cc: David Howells
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The OpenVZ Linux kernel team has found a problem with mounting in compat
    mode.

    The simple command "mount -t smbfs ..." on the Fedora Core 5 distro in
    32-bit mode leads to an oops:

    Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: compat_sys_mount+0xd6/0x290
    Process mount (pid: 14656, veid=300, threadinfo ffff810034d30000, task ffff810034c86bc0)
    Call Trace: ia32_sysret+0x0/0xa

    The problem is that data_page pointer can be NULL, so we should skip data
    conversion in this case.
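
    The shape of the fix in compat_sys_mount() (the per-filesystem conversion
    helpers are those of fs/compat.c at the time and are shown loosely; treat
    this as a sketch):

        /* only convert filesystem-specific mount data if userspace
         * actually passed a data page */
        if (data_page) {
                if (!strcmp(kernel_type, SMBFS_NAME))
                        do_smb_super_data_conv((void *)data_page);
                else if (!strcmp(kernel_type, NCPFS_NAME))
                        do_ncp_super_data_conv((void *)data_page);
                /* ... */
        }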

    Signed-off-by: Andrey Mirkin
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Mirkin
     
  • Fix http://bugzilla.kernel.org/show_bug.cgi?id=7606

    WARNING: "drm_sman_set_manager" [drivers/char/drm/sis.ko] undefined!

    Cc:
    Cc: Dave Airlie
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • With CONFIG_SMP=n:

    drivers/input/ff-memless.c:384: warning: implicit declaration of function 'local_bh_disable'
    drivers/input/ff-memless.c:393: warning: implicit declaration of function 'local_bh_enable'

    Really linux/spinlock.h should include linux/interrupt.h. But interrupt.h
    includes sched.h which will need spinlock.h.

    So the patch breaks the _bh declarations out into a separate header and
    includes it in both interrupt.h and spinlock.h.
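
    The new header boils down to just the declarations (sketch from memory;
    the exact set in include/linux/bottom_half.h may differ slightly):

        #ifndef _LINUX_BH_H
        #define _LINUX_BH_H

        extern void local_bh_disable(void);
        extern void __local_bh_enable(void);
        extern void _local_bh_enable(void);
        extern void local_bh_enable(void);
        extern void local_bh_enable_ip(unsigned long ip);

        #endif /* _LINUX_BH_H */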

    Cc: "Randy.Dunlap"
    Cc: Andi Kleen
    Cc:
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

07 Dec, 2006

16 commits