Eric Lee / smarc-fsl-linux-kernel

17 Oct, 2007

12 commits

f438d914b kprobes: support kretprobe blacklist

Introduce architecture-dependent kretprobe blacklists to prohibit users
from inserting return probes on functions where kprobes can be inserted
but kretprobes cannot.

This patch also removes the "__kprobes" mark from "__switch_to" on x86-64
and registers "__switch_to" in the x86-64 blacklist, because that mark was
there only to prohibit users from inserting kretprobes.
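
A minimal userspace sketch of the blacklist idea, not the kernel implementation: a NULL-terminated, per-architecture table of symbol names is consulted before a return probe is accepted. The names `blacklist_entry`, `demo_x86_64_blacklist`, and `kretprobe_allowed()` are hypothetical and exist only for illustration; only `__switch_to` comes from the commit message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one blacklist entry: just a symbol name. */
struct blacklist_entry {
    const char *name;
};

/* Hypothetical x86-64 table; the NULL name terminates the list. */
const struct blacklist_entry demo_x86_64_blacklist[] = {
    { "__switch_to" },
    { NULL },
};

/* Return 1 if a return probe may be placed on `symbol`, 0 if the
 * symbol appears in the blacklist. */
int kretprobe_allowed(const struct blacklist_entry *bl, const char *symbol)
{
    for (; bl->name != NULL; bl++) {
        if (strcmp(bl->name, symbol) == 0)
            return 0;
    }
    return 1;
}
```

With this table, `kretprobe_allowed(demo_x86_64_blacklist, "__switch_to")` returns 0, while a symbol that is not listed returns 1.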

Signed-off-by: Masami Hiramatsu
Cc: Prasanna S Panchamukhi
Acked-by: Ananth N Mavinakayanahalli
Cc: Anil S Keshavamurthy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masami Hiramatsu
2007-10-17 00:43:10 +0800
607717a65 cpuset: remove sched domain hooks from cpusets

Remove the cpuset hooks that defined sched domains depending on the setting
of the 'cpu_exclusive' flag.

The cpu_exclusive flag can only be set on a child if it is set on the
parent.

This made that flag painfully unsuitable for use as a flag defining a
partitioning of a system.

It was entirely unobvious to a cpuset user what partitioning of sched
domains they would be causing when they set that one cpu_exclusive bit on
one cpuset, because it depended on what CPUs were in the remainder of that
cpuset's siblings and child cpusets, after subtracting out other
cpu_exclusive cpusets.

Furthermore, there was no way on production systems to query the
result.

Using the cpu_exclusive flag for this was simply wrong from the get go.

Fortunately, it was sufficiently borked that so far as I know, almost no
successful use has been made of this. One real time group did use it to
effectively isolate CPUs from any load balancing efforts. They are willing
to adapt to alternative mechanisms for this, such as some way to manipulate
the list of isolated CPUs on a running system. They can do without this
present cpu_exclusive based mechanism while we develop an alternative.

There is a real risk, to the best of my understanding, of users
accidentally setting up partitioned scheduler domains, inhibiting desired
load balancing across all their CPUs, due to the nonobvious (from the
cpuset perspective) side effects of the cpu_exclusive flag.

Furthermore, since there was no way on a running system to see what one was
doing with sched domains, this change will be invisible to any using code.
Unless they have real insight into the scheduler load balancing choices, they
will be unable to detect that this change has been made in the kernel's
behaviour.

Initial discussion on lkml of this patch has generated much comment. My
(probably controversial) take on that discussion is that it has reached a
rough consensus that the current cpuset cpu_exclusive mechanism for
defining sched domains is borked. There is no consensus on the
replacement. But since we can remove this mechanism, and since its
continued presence risks causing unwanted partitioning of the scheduler's
load balancing, we should remove it while we can, as we proceed to work on
the replacement scheduler domain mechanisms.

Signed-off-by: Paul Jackson
Cc: Ingo Molnar
Cc: Nick Piggin
Cc: Christoph Lameter
Cc: Dinakar Guniguntala
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Jackson
2007-10-17 00:43:09 +0800
0ac155591 m32r: convert to generic sys_ptrace

Convert m32r to the generic sys_ptrace. The conversion requires an
architecture hook after ptrace_attach which this patch adds. The hook
will also be needed for a conversion of ia64 to the generic ptrace code.

Thanks to Hirokazu Takata for fixing a bug in the first version of this
code.

Signed-off-by: Christoph Hellwig
Cc: Hirokazu Takata
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2007-10-17 00:43:04 +0800
54f9f80d6 hugetlb: Add hugetlb_dynamic_pool sysctl

The maximum size of the huge page pool can be controlled using the overall
size of the hugetlb filesystem (via its 'size' mount option). However, in the
common case this will not be set, as the pool is traditionally fixed in
size at boot time. In order to maintain the expected semantics, we need to
prevent the pool expanding by default.

This patch introduces a new sysctl controlling dynamic pool resizing. When
this is enabled the pool will expand beyond its base size up to the size of
the hugetlb filesystem. It is disabled by default.

Signed-off-by: Adam Litke
Acked-by: Andy Whitcroft
Acked-by: Dave McCracken
Cc: William Irwin
Cc: David Gibson
Cc: Ken Chen
Cc: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adam Litke
2007-10-17 00:43:02 +0800
75884fb1c memory unplug: memory hotplug cleanup

A clean up patch for "scanning memory resource [start, end)" operation.

Currently, the find_next_system_ram() function is used in memory hotplug, but
this interface is not easy to use and the resulting code is complicated.

This patch adds the walk_memory_resource(start,len,arg,func) function.
The function 'func' is called for each valid memory resource range in
[start, start+len).
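
The callback-driven walk can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: the resource list is modeled as a plain array, the walked window is assumed to be [start, start+len), and `struct mem_range`, `walk_ranges()`, and `count_pages()` are hypothetical names. A non-zero return from the callback stops the walk and is passed back to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one valid memory range: [start, start + len). */
struct mem_range {
    unsigned long start;
    unsigned long len;
};

typedef int (*walk_fn)(unsigned long start, unsigned long len, void *arg);

/* Call func once for every part of `ranges` that overlaps the window
 * [start, start + len). Stops early if func returns non-zero. */
int walk_ranges(const struct mem_range *ranges, size_t nr,
                unsigned long start, unsigned long len,
                void *arg, walk_fn func)
{
    unsigned long end = start + len;

    assert(func != NULL);
    for (size_t i = 0; i < nr; i++) {
        unsigned long r_start = ranges[i].start;
        unsigned long r_end = r_start + ranges[i].len;
        /* Clip the range to the requested window. */
        unsigned long lo = r_start > start ? r_start : start;
        unsigned long hi = r_end < end ? r_end : end;

        if (lo >= hi)
            continue;           /* no overlap with the window */

        int ret = func(lo, hi - lo, arg);
        if (ret != 0)
            return ret;         /* propagate the callback's error */
    }
    return 0;
}

/* Example callback: add the length of each visited piece to *arg. */
int count_pages(unsigned long start, unsigned long len, void *arg)
{
    (void)start;
    *(unsigned long *)arg += len;
    return 0;
}
```

For ranges [0, 100) and [200, 300), walking the window [50, 250) with `count_pages` visits [50, 100) and [200, 250), so the accumulated total is 100.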

[pbadari@us.ibm.com: Error handling in walk_memory_resource()]
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2007-10-17 00:43:01 +0800
e12ba74d8 Group short-lived and reclaimable kernel allocations

This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations. When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.

This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved, i.e. they can be "migrated" by deleting
them and re-reading the information from elsewhere.

Signed-off-by: Mel Gorman
Cc: Andy Whitcroft
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2007-10-17 00:43:00 +0800
0e1e7c7a7 Memoryless nodes: Use N_HIGH_MEMORY for cpusets

cpusets try to ensure that any node added to a cpuset's mems_allowed is
on-line and contains memory. The assumption was that online nodes contained
memory. Thus, it is possible to add memoryless nodes to a cpuset and then add
tasks to this cpuset. This results in a continuous series of oom-kills and
an apparent system hang.

Change cpusets to use node_states[N_HIGH_MEMORY] [a.k.a. node_memory_map] in
place of node_online_map when vetting memories. Return an error if the admin
attempts to write a non-empty mems_allowed node mask containing only
memoryless nodes.

Signed-off-by: Lee Schermerhorn
Signed-off-by: Bob Picco
Signed-off-by: Nishanth Aravamudan
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-10-17 00:42:59 +0800
4199cfa02 Memoryless nodes: Allow profiling data to fall back to other nodes

Processors on memoryless nodes must be able to fall back to remote nodes in
order to get a profiling buffer. This may lead to excessive NUMA traffic but
I think we should allow this rather than failing.

Signed-off-by: Christoph Lameter
Acked-by: Nishanth Aravamudan
Acked-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-10-17 00:42:58 +0800
74a0b5762 x86: optimize page faults like all other architectures and kill notifier cruft

x86(-64) are the last architectures still using the page fault notifier
cruft for the kprobes page fault hook. This patch converts them to the
proper direct calls, and removes the now unused pagefault notifier bits
as well as the cruft in kprobes.c that was related to this mess.

I know Andi didn't really like this, but all other architecture maintainers
agreed the direct calls are much better, and besides the obvious cruft
removal, a common way of dealing with kprobes across architectures is
important as well.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc64]
Signed-off-by: Christoph Hellwig
Cc: Andi Kleen
Cc:
Cc: Prasanna S Panchamukhi
Cc: Ananth N Mavinakayanahalli
Cc: Anil S Keshavamurthy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2007-10-17 00:42:50 +0800
d5a7430dd Convert cpu_sibling_map to be a per cpu variable

Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu
variable. This saves sizeof(cpumask_t) * NR unused cpus. Access is mostly
from startup and CPU HOTPLUG functions.

Signed-off-by: Mike Travis
Cc: Andi Kleen
Cc: Christoph Lameter
Cc: "Siddha, Suresh B"
Cc: "David S. Miller"
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Travis
2007-10-17 00:42:50 +0800
bfe8df3d3 slow down printk during boot

Optionally add a boot delay after each kernel printk() call, crudely
measured in milliseconds, with a maximum delay of 10 seconds per printk.

Enable CONFIG_BOOT_PRINTK_DELAY=y and then add (e.g.):
"lpj=loops_per_jiffy boot_delay=100"
to the kernel command line.

It has been useful in cases like "during boot, my machine just reboots or
the screen goes black": by slowing down printk (and adding initcall_debug),
we can usually see the last thing that happened before the lights went out,
which is usually a valuable clue.

[akpm@linux-foundation.org: not all architectures implement CONFIG_HZ]
[akpm@linux-foundation.org: fix lots of stuff]
[bunk@stusta.de: kernel/printk.c: make 2 variables static]
[heiko.carstens@de.ibm.com: fix slow down printk on boot compile error]
Signed-off-by: Randy Dunlap
Signed-off-by: Dave Jones
Signed-off-by: Adrian Bunk
Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2007-10-17 00:42:49 +0800
1bcf54829 Consolidate PTRACE_DETACH

Identical handlers of PTRACE_DETACH go into ptrace_request().
Not touching compat code.
Not touching archs that don't call ptrace_request.

Signed-off-by: Alexey Dobriyan
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2007-10-17 00:42:49 +0800

16 Oct, 2007

2 commits

f4921aff5 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6

* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
NFSv4: Fix a typo in nfs_inode_reclaim_delegation
NFS: Add a boot parameter to disable 64 bit inode numbers
NFS: nfs_refresh_inode should clear cache_validity flags on success
NFS: Fix a connectathon regression in NFSv3 and NFSv4
NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
SUNRPC: Don't call xprt_release in call refresh
SUNRPC: Don't call xprt_release() if call_allocate fails
SUNRPC: Fix buggy UDP transmission
[23/37] Clean up duplicate includes in
[2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
SUNRPC: Use correct type in buffer length calculations
SUNRPC: Fix default hostname created in rpc_create()
nfs: add server port to rpc_pipe info file
NFS: Get rid of some obsolete macros
NFS: Simplify filehandle revalidation
NFS: Ensure that nfs_link() returns a hashed dentry
NFS: Be strict about dentry revalidation when doing exclusive create
NFS: Don't zap the readdir caches upon error
NFS: Remove the redundant nfs_reval_fsid()
NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
...

Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c

Linus Torvalds
2007-10-16 01:47:35 +0800
419217cb1 Merge branch 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/…

…peterz/linux-2.6-lockdep

* 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep:
lockdep: annotate dir vs file i_mutex
lockdep: per filesystem inode lock class
lockdep: annotate kprobes irq fiddling
lockdep: annotate rcu_read_{,un}lock{,_bh}
lockdep: annotate journal_start()
lockdep: s390: connect the sysexit hook
lockdep: x86_64: connect the sysexit hook
lockdep: i386: connect the sysexit hook
lockdep: syscall exit check
lockdep: fixup mutex annotations
lockdep: fix mismatched lockdep_depth/curr_chain_hash
lockdep: Avoid /proc/lockdep & lock_stat infinite output
lockdep: maintainers

Linus Torvalds
2007-10-16 01:40:41 +0800

15 Oct, 2007

26 commits

9c63d9c02 sched: sync wakeups preempt too

make sure sync wakeups preempt too - the scheduler will not
overschedule as we've got various throttles against that.
As a result, sync wakeups can be used more widely in the kernel
(to signal wakeup affinity between tasks), and no arbitrary
latencies will be introduced either.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:20 +0800
71e20f187 sched: affine sync wakeups

make sync wakeups affine for cache-cold tasks: if a cache-cold task
is woken up by a sync wakeup then use the opportunity to migrate it
straight away. (the two tasks are 'related' because they communicate)

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:19 +0800
94886b84b sched: guest CPU accounting: maintain stats in account_system_time()

modify account_system_time() to add cputime to cpustat->guest if we are
running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code
although it is executed in the kernel. We duplicate VCPU time between
guest and user to allow an unmodified "top(1)" to display correct value.
A modified "top(1)" is able to display good cpu user time and cpu guest
time by subtracting cpu guest time from cpu user time. Update "gtime" in
task_struct accordingly.

Signed-off-by: Laurent Vivier
Acked-by: Avi Kivity
Signed-off-by: Ingo Molnar

Laurent Vivier
2007-10-15 23:00:19 +0800
9ac52315d sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields

like for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task children) fields for the
tasks. Modify signal_struct and task_struct.

Modify /proc/<pid>/stat to display these new fields.

Signed-off-by: Laurent Vivier
Acked-by: Avi Kivity
Signed-off-by: Ingo Molnar

Laurent Vivier
2007-10-15 23:00:19 +0800
6323469f9 sched: domain sysctl fixes: add terminator comment

we had an incorrect-terminator bug in sd_alloc_ctl_domain_table()
before, so add a comment that documents it.

Signed-off-by: Milton Miller
Signed-off-by: Ingo Molnar

Milton Miller
2007-10-15 23:00:19 +0800
ad1cdc1d7 sched: domain sysctl fixes: do not crash on allocation failure

Now that we are calling this at runtime, a more relaxed error path is
suggested. If an allocation fails, we just register the partial table,
which will show empty directories.

Signed-off-by: Milton Miller
Signed-off-by: Ingo Molnar

Milton Miller
2007-10-15 23:00:19 +0800
6382bc90f sched: domain sysctl fixes: unregister the sysctl table before domains

Unregister and free the sysctl table before destroying domains, then
rebuild and register after creating the new domains. This prevents the
sysctl table from pointing to freed memory for root to write.

Signed-off-by: Milton Miller
Signed-off-by: Ingo Molnar

Milton Miller
2007-10-15 23:00:19 +0800
97b6ea7b6 sched: domain sysctl fixes: use for_each_online_cpu()

init_sched_domain_sysctl was walking cpus 0-n and referencing per_cpu
variables. If the cpus_possible mask is not contiguous this will result
in a crash referencing unallocated data. If the online mask is not
contiguous then we would show offline cpus and miss online ones.
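
The difference between walking 0..n-1 and walking only the set bits of a sparse mask can be sketched as follows. This is a userspace illustration, not the kernel's `for_each_online_cpu()` macro: the online mask is modeled as a single `unsigned long`, and `collect_online_cpus()` is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Store the index of every set bit of `mask` into out[] (at most
 * `cap` entries) and return how many were stored. Indices whose bit
 * is clear are skipped, so gaps in the mask are never visited. */
size_t collect_online_cpus(unsigned long mask, int *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    for (unsigned cpu = 0; cpu < sizeof(mask) * CHAR_BIT && n < cap; cpu++) {
        if (mask & (1UL << cpu))
            out[n++] = (int)cpu;
    }
    return n;
}
```

For the mask 0x25 (bits 0, 2, and 5 set), the helper reports three CPUs: 0, 2, and 5. A loop over 0..2 would instead have touched CPU 1, which is absent, and missed CPU 5.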

Signed-off-by: Milton Miller
Signed-off-by: Ingo Molnar

Milton Miller
2007-10-15 23:00:19 +0800
5cf9f062c sched: domain sysctl fixes: use kcalloc()

kcalloc checks for n * sizeof(element) overflows and it zeros.
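
The two properties mentioned here, an overflow check on n * sizeof(element) and zeroed memory, can be illustrated with a userspace sketch. This is not the kernel's kcalloc(); `checked_zalloc_array()` is a hypothetical name, and it is built on malloc() and memset() purely for demonstration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate an array of n elements of `size` bytes each, zero-filled.
 * Returns NULL if n * size would overflow size_t or if the
 * underlying allocation fails. */
void *checked_zalloc_array(size_t n, size_t size)
{
    /* n * size overflows exactly when n > SIZE_MAX / size. */
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;

    void *p = malloc(n * size);
    if (p != NULL)
        memset(p, 0, n * size);
    return p;
}
```

A request such as `checked_zalloc_array(SIZE_MAX, 2)` is rejected up front instead of silently allocating a too-small buffer, and a successful allocation can be used without a separate clearing step.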

Signed-off-by: Milton Miller
Signed-off-by: Ingo Molnar

Milton Miller
2007-10-15 23:00:19 +0800
0dbee3a6b Make scheduler debug file operations const

In general, struct file_operations are const in the kernel, to not have
false cacheline sharing and to catch bugs at compile time with accidental
writes to them. The new scheduler code introduces a new non-const one;
fix this up.

Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar

Arjan van de Ven
2007-10-15 23:00:19 +0800
6bc1665ba sched: allow the immediate migration of cache-cold tasks

allow the immediate migration of cache-cold tasks.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:18 +0800
cc367732f sched: debug, improve migration statistics

add new migration statistics when SCHED_DEBUG and SCHEDSTATS
are enabled. Available in /proc/<pid>/sched.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:18 +0800
2d92f2278 sched: debug: increase width of debug line

increase width of debug line - in preparation of more debugging info.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:18 +0800
ff56b2f01 sched: activate task_hot() only on fair-scheduled tasks

activate task_hot() only for fair-scheduled tasks (i.e. disable it
for RT tasks).

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar

Peter Zijlstra
2007-10-15 23:00:18 +0800
da84d9617 sched: reintroduce cache-hot affinity

reintroduce a simplified version of cache-hot/cold scheduling
affinity. This improves performance with certain SMP workloads,
such as sysbench.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:18 +0800
e5f32a385 sched: speed up context-switches a bit

speed up context-switches a bit by not clearing p->exec_start.

(as a side-effect, this also makes p->exec_start a universal timestamp
available to cache-hot estimations.)

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:18 +0800
91c234b4e sched: do not wakeup-preempt with SCHED_BATCH tasks

do not wakeup-preempt with SCHED_BATCH tasks, their preemption
is batched too, driven by the tick.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:18 +0800
fb7dde37e sched: generate uevents for user creation/destruction

Generate uevents when a user is being created/destroyed. These events
can be used to configure cpu share of a new user.

Signed-off-by: Srivatsa Vaddagiri
Signed-off-by: Dhaval Giani
Signed-off-by: Ingo Molnar

Srivatsa Vaddagiri
2007-10-15 23:00:18 +0800
178be7934 sched: do not normalize kernel threads via SysRq-N

do not normalize kernel threads via SysRq-N: the migration threads,
softlockup threads, etc. might be essential for the system to
function properly. So only zap user tasks.

pointed out by Andi Kleen.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:18 +0800
1666703af sched: remove stale comment from sched_group_set_shares()

remove stale comment from sched_group_set_shares().

Function never returns -EINVAL.

Signed-off-by: Andi Kleen
Signed-off-by: Ingo Molnar

Andi Kleen
2007-10-15 23:00:18 +0800
d5036e89d sched: clean up is_migration_thread()

clean up is_migration_thread() and turn it into an inline function.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:15 +0800
3a5e4dc12 sched: cleanup: refactor normalize_rt_tasks

Replace a particularly ugly ifdef with an inline and a new macro.
Also split up the function to be easier to read.

Signed-off-by: Andi Kleen
Signed-off-by: Ingo Molnar

Andi Kleen
2007-10-15 23:00:15 +0800
8cbbe86df sched: cleanup: refactor common code of sleep_on / wait_for_completion

Refactor common code of sleep_on / wait_for_completion

These functions were largely cut'n'pasted. This moves
the common code into single helpers instead. The advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non-timeout version of
the functions; I don't expect this to be measurable.

Signed-off-by: Andi Kleen
Signed-off-by: Ingo Molnar

Andi Kleen
2007-10-15 23:00:14 +0800
3a5c359a5 sched: cleanup: remove unnecessary gotos

Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...;
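
The kind of transformation described can be illustrated with a small standalone pair of functions. These are hypothetical examples written for this note, not code from the scheduler: `find_first_goto()` spells a loop and a single-exit return with gotos, and `find_first_loop()` is the equivalent with a real loop and direct returns.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the loop and the "err = ...; goto out; out: return err;"
 * pattern are both written with gotos. */
int find_first_goto(const int *v, int n, int key)
{
    int i = 0;
    int err;

    assert(v != NULL || n == 0);
again:
    if (i >= n) {
        err = -1;
        goto out;
    }
    if (v[i] == key) {
        err = i;
        goto out;
    }
    i++;
    goto again;
out:
    return err;
}

/* After: same behaviour, expressed as a for loop with plain returns. */
int find_first_loop(const int *v, int n, int key)
{
    assert(v != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (v[i] == key)
            return i;
    }
    return -1;
}
```

Both functions return the index of the first match, or -1 when the key is absent, which is the sense in which such a cleanup makes no functional change.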

No functional changes.

Signed-off-by: Andi Kleen
Signed-off-by: Ingo Molnar

Andi Kleen
2007-10-15 23:00:14 +0800
d274a4cee sched: update comment

update comment: clarify time-slices and remove obsolete tuning detail.

Signed-off-by: Ingo Molnar

Ingo Molnar
2007-10-15 23:00:14 +0800
95938a35c sched: prevent wakeup over-scheduling

Prevent wakeup over-scheduling. Once a task has been preempted by a
task of the same or lower priority, it becomes ineligible for repeated
preemption by same until it has been ticked, or slept. Instead, the
task is marked for preemption at the next tick. Tasks of higher
priority still preempt immediately.

Signed-off-by: Mike Galbraith
Signed-off-by: Ingo Molnar

Mike Galbraith
2007-10-15 23:00:14 +0800