10 Jul, 2013
40 commits
-
Merge together the unicore32, arm, and x86 reboot= command line
parameter handling.
Signed-off-by: Robin Holt
Cc: H. Peter Anvin
Cc: Russell King
Cc: Guan Xuetao
Cc: Russ Anderson
Cc: Robin Holt
Acked-by: Ingo Molnar
Acked-by: Guan Xuetao
Acked-by: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
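As a rough illustration of the merged reboot= handling described in the commit above, the sketch below lets the first letter of each comma-separated option select a mode. The enum, its value names, and `parse_reboot_option()` are assumptions made for this example; it is a userspace model and not the code that landed in kernel/reboot.c.

```c
#include <stddef.h>

/* Hypothetical mode enum; names chosen for this sketch only. */
enum reboot_mode_sketch {
	SKETCH_REBOOT_COLD,
	SKETCH_REBOOT_WARM,
	SKETCH_REBOOT_HARD,
	SKETCH_REBOOT_SOFT,
};

/*
 * Walk a comma-separated option string such as "warm" or "x,s" and let
 * the first letter of each token select a mode. Unknown tokens are
 * ignored, so the last recognised token wins.
 */
static enum reboot_mode_sketch
parse_reboot_option(const char *str, enum reboot_mode_sketch def)
{
	enum reboot_mode_sketch mode = def;

	while (str && *str) {
		switch (*str) {
		case 'c': mode = SKETCH_REBOOT_COLD; break;
		case 'w': mode = SKETCH_REBOOT_WARM; break;
		case 'h': mode = SKETCH_REBOOT_HARD; break;
		case 's': mode = SKETCH_REBOOT_SOFT; break;
		default: break;
		}
		/* Skip to the character after the next comma, if any. */
		while (*str && *str != ',')
			str++;
		if (*str == ',')
			str++;
	}
	return mode;
}
```

Keeping one parser and one enum means each architecture only has to switch on the resulting mode value instead of re-parsing the string itself.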
Preparing to move the parsing of reboot= to generic kernel code forces
the change in reboot_mode handling to use the enum.
[akpm@linux-foundation.org: fix arch/arm/mach-socfpga/socfpga.c]
Signed-off-by: Robin Holt
Cc: Russell King
Cc: Russ Anderson
Cc: Robin Holt
Cc: H. Peter Anvin
Cc: Guan Xuetao
Acked-by: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Prepare for moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.
Signed-off-by: Robin Holt
Cc: Russell King
Cc: Russ Anderson
Cc: Robin Holt
Cc: H. Peter Anvin
Cc: Guan Xuetao
Acked-by: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
These restart_mode fields are not used at all. Remove them to make
moving the reboot= cmdline options to the general kernel easier.
Signed-off-by: Robin Holt
Cc: Russell King
Cc: Russ Anderson
Cc: Robin Holt
Cc: H. Peter Anvin
Cc: Guan Xuetao
Acked-by: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Prepare for moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.
Signed-off-by: Robin Holt
Cc: Guan Xuetao
Cc: Russ Anderson
Cc: Robin Holt
Cc: Russell King
Cc: H. Peter Anvin
Acked-by: Guan Xuetao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Prepare for moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.
Signed-off-by: Robin Holt
Cc: H. Peter Anvin
Cc: Miguel Boton
Cc: Russ Anderson
Cc: Robin Holt
Cc: Russell King
Cc: Guan Xuetao
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Get the new file to pass scripts/checkpatch.pl
Signed-off-by: Robin Holt
Cc: H. Peter Anvin
Cc: Russ Anderson
Cc: Robin Holt
Cc: Russell King
Cc: Guan Xuetao
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch is preparatory. It moves the reboot-related syscall and
associated functions from kernel/sys.c to kernel/reboot.c.
Signed-off-by: Robin Holt
Cc: H. Peter Anvin
Cc: Russ Anderson
Cc: Robin Holt
Cc: Russell King
Cc: Guan Xuetao
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove the prior patch's #define for easier backporting to the stable
releases.
Signed-off-by: Robin Holt
Cc: H. Peter Anvin
Cc: Russ Anderson
Cc: Robin Holt
Cc: Russell King
Cc: Guan Xuetao
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Graft AIX partitions enumeration into partitions/msdos.c

There is already AIX disk detection logic in msdos.c. When an AIX disk
has been found, and if configured to, call the AIX partitions recognizer.
This avoids removing the AIX disk protection from msdos.c, avoids code
duplication, and ensures that AIX partitions enumeration is called before
plain msdos partitions enumeration.
Signed-off-by: Philippe De Muyter
Cc: Karel Zak
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add partitions/aix.h and partitions/aix.c.

AIX LVM allows "logical volumes" to be made of multiple slices of
multiple disks. The new code only gives access to the "logical volumes"
which are made of one slice on the probed disk, a slice being a
contiguous disk area. The code also detects "logical volumes" made of
multiple slices on the probed disk, but cannot describe them to the
partition layer, because the partition layer generic code does not
support that. When such non-contiguous "logical volumes" are detected,
a diagnostic message is printed.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Philippe De Muyter
Cc: Karel Zak
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Philippe De Muyter
Cc: Karel Zak
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Smatch complains that on 64 bit systems, there is a hole in the
MW_ABILITIES struct between ->component_count and ->component_list[].
It leaks stack information from the mwave_ioctl() function.

I've added a memset() to initialize the struct to zero.
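The pattern behind this fix can be sketched in userspace as follows. The struct below is a hypothetical stand-in with a similar shape, not the real MW_ABILITIES definition, and `fill_abilities()` and `hole_is_zero()` are names invented for the example. On typical 64-bit ABIs the compiler pads the gap after the short member, and the memset() is what guarantees those padding bytes carry no stale stack data.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a struct with a padding hole: on typical
 * 64-bit ABIs the compiler inserts padding after 'component_count'
 * so that 'component_list' is aligned for unsigned long.
 */
struct abilities_sketch {
	unsigned short component_count;
	unsigned long component_list[2];
};

/* Fill the struct after zeroing it, so padding bytes carry no stale data. */
static void fill_abilities(struct abilities_sketch *a)
{
	memset(a, 0, sizeof(*a));
	a->component_count = 2;
	a->component_list[0] = 0x11;
	a->component_list[1] = 0x22;
}

/* Return 1 if every byte in the hole after component_count is zero. */
static int hole_is_zero(const struct abilities_sketch *a)
{
	const unsigned char *p = (const unsigned char *)a;
	size_t i;

	for (i = sizeof(a->component_count);
	     i < offsetof(struct abilities_sketch, component_list); i++)
		if (p[i] != 0)
			return 0;
	return 1;
}
```

Without the memset(), assigning the members one by one would leave the hole holding whatever was on the stack before, and copying the whole struct out would expose it.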
Signed-off-by: Dan Carpenter
Cc: Greg KH
Cc: Jiri Kosina
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cleanup: Some minor points that I noticed while writing the previous
patches

1) The name try_atomic_semop() is misleading: The function performs the
   operation (if it is possible).

2) Some documentation updates.

No real code change, a rename and documentation changes.
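To illustrate why "perform" describes the function better than "try", here is a simplified userspace model of an all-or-nothing semaphore operation. The struct, the function name, and the return convention are assumptions for this sketch; the kernel function handles more cases (flags, undo adjustments, limits) than shown here.

```c
#include <stddef.h>

/* Simplified operation: add 'op' to semaphore 'num'; op == 0 waits for zero. */
struct sop_sketch {
	size_t num;
	int op;
};

/*
 * Apply all operations or none. Returns 0 if every operation was
 * applied, 1 if the caller would have to sleep. On the "would sleep"
 * path the already-applied operations are rolled back.
 */
static int perform_atomic_semop_sketch(int *semval,
				       const struct sop_sketch *sops,
				       size_t nsops)
{
	size_t i;

	for (i = 0; i < nsops; i++) {
		int result = semval[sops[i].num];

		if (sops[i].op == 0 && result != 0)
			goto would_block;	/* wait-for-zero not satisfied */

		result += sops[i].op;
		if (result < 0)
			goto would_block;	/* decrement would go negative */

		semval[sops[i].num] = result;
	}
	return 0;

would_block:
	/* Undo operations 0..i-1 in reverse order. */
	while (i-- > 0)
		semval[sops[i].num] -= sops[i].op;
	return 1;
}
```

When the function returns 0 the semaphore values have really changed, so the call is more than a test of whether the operation could succeed.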
Signed-off-by: Manfred Spraul
Cc: Rik van Riel
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
sem_otime contains the time of the last semaphore operation that
completed successfully. Every operation updates this value, thus access
from multiple cpus can cause thrashing.

Therefore the patch replaces the variable with a per-semaphore variable.
The per-array sem_otime is only calculated when required.

No performance improvement on a single-socket i3 - only important for
larger systems.
Signed-off-by: Manfred Spraul
Cc: Rik van Riel
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
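The per-semaphore sem_otime idea from the commit above can be sketched like this: each semaphore keeps its own timestamp, and the per-array value is derived only when someone asks for it. The struct layout and the function name are assumptions for this example, and the sketch assumes the array holds at least one semaphore.

```c
#include <stddef.h>
#include <time.h>

struct sem_sketch {
	int semval;
	time_t sem_otime;	/* last successful operation on this semaphore */
};

/*
 * Derive the per-array value on demand: the newest per-semaphore stamp.
 * Assumes nsems >= 1.
 */
static time_t get_semotime_sketch(const struct sem_sketch *sems, size_t nsems)
{
	time_t res = sems[0].sem_otime;
	size_t i;

	for (i = 1; i < nsems; i++)
		if (sems[i].sem_otime > res)
			res = sems[i].sem_otime;
	return res;
}
```

The hot path then only writes the timestamp of the semaphore it touched, and the loop over all semaphores is paid by the comparatively rare readers.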
There are two places that can contain alter operations:
- the global queue: sma->pending_alter
- the per-semaphore queues: sma->sem_base[].pending_alter.

Since one of the queues must be processed first, this causes an odd
prioritization of the wakeups: complex operations have priority over
simple ops.

The patch restores the behavior of linux [...] pending_alter is used.
- otherwise, the per-semaphore queues are used.

As a side effect, do_smart_update_queue() becomes much simpler: no more
goto logic.
Signed-off-by: Manfred Spraul
Cc: Rik van Riel
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Introduce separate queues for operations that do not modify the
semaphore values. Advantages:

- Simpler logic in check_restart().
- Faster update_queue(): Right now, all wait-for-zero operations are
  always tested, even if the semaphore value is not 0.
- wait-for-zero gets priority again, as in linux
Cc: Rik van Riel
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As now each semaphore has its own spinlock and parallel operations are
possible, give each semaphore its own cacheline.

On an i3 laptop, this gives up to 28% better performance:

#semscale 10 | grep "interleave 2"
- before:
Cpus 1, interleave 2 delay 0: 36109234 in 10 secs
Cpus 2, interleave 2 delay 0: 55276317 in 10 secs
Cpus 3, interleave 2 delay 0: 62411025 in 10 secs
Cpus 4, interleave 2 delay 0: 81963928 in 10 secs
- after:
Cpus 1, interleave 2 delay 0: 35527306 in 10 secs
Cpus 2, interleave 2 delay 0: 70922909 in 10 secs <<< + 28%
Cpus 3, interleave 2 delay 0: 80518538 in 10 secs
Cpus 4, interleave 2 delay 0: 89115148 in 10 secs <<< + 8.7%

i3, with 2 cores and with hyperthreading enabled. Interleave 2 in order
to use the full cores first. HT partially hides the delay from cacheline
thrashing, thus the improvement is "only" 8.7% if 4 threads are running.
Signed-off-by: Manfred Spraul
Cc: Rik van Riel
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
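A userspace approximation of "one cacheline per semaphore", as in the commit above, is shown below. The 64-byte line size, the struct, and `sem_stride()` are assumptions for this sketch; the kernel uses its own alignment annotations and the real line size of the target CPU.

```c
#include <stdalign.h>
#include <stddef.h>

#define SKETCH_CACHELINE 64	/* assumed line size; real hardware varies */

/* Each element is aligned to, and therefore padded out to, a full line. */
struct padded_sem_sketch {
	alignas(SKETCH_CACHELINE) int semval;
	int lock;		/* stand-in for the per-semaphore spinlock */
};

/* Byte distance between two neighbouring array elements. */
static size_t sem_stride(void)
{
	static struct padded_sem_sketch sems[2];

	return (size_t)((char *)&sems[1] - (char *)&sems[0]);
}
```

Because neighbouring semaphores no longer share a line, two CPUs working on different semaphores of the same array stop invalidating each other's cache, at the cost of a larger array.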
Enforce that ipc_rcu_alloc returns a cacheline aligned pointer on SMP.

Rationale:

The SysV sem code tries to move the main spinlock into a separate
cacheline (____cacheline_aligned_in_smp). This works only if
ipc_rcu_alloc returns cacheline aligned pointers. vmalloc and kmalloc
return cacheline aligned pointers, the implementation of ipc_rcu_alloc
breaks that.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul
Cc: Rik van Riel
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
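The alignment problem described in the commit above arises when an allocator puts a bookkeeping header in front of the payload: the payload only stays line-aligned if the header itself occupies a whole number of cachelines. The sketch below shows one way to get that property in userspace. The header struct, the 64-byte line size, and both function names are assumptions for illustration and do not mirror the ipc_rcu_alloc implementation.

```c
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CACHELINE 64	/* assumed line size for this example */

/* Bookkeeping header placed in front of the payload. */
struct hdr_sketch {
	alignas(SKETCH_CACHELINE) int refcount;
};

/* Allocate header + payload; the returned payload pointer is line-aligned. */
static void *alloc_with_hdr(size_t size)
{
	size_t total = sizeof(struct hdr_sketch) + size;
	struct hdr_sketch *hdr;

	/* aligned_alloc() wants a size that is a multiple of the alignment. */
	total = (total + SKETCH_CACHELINE - 1) / SKETCH_CACHELINE
		* SKETCH_CACHELINE;
	hdr = aligned_alloc(SKETCH_CACHELINE, total);
	if (!hdr)
		return NULL;
	hdr->refcount = 1;
	return hdr + 1;
}

/* Release a payload pointer obtained from alloc_with_hdr(). */
static void free_with_hdr(void *payload)
{
	if (payload)
		free((struct hdr_sketch *)payload - 1);
}
```

Since the aligned header is padded to exactly one line, stepping past it with `hdr + 1` keeps the payload on a line boundary.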
We can now drop the msg_lock and msg_lock_check functions along with a
bogus comment introduced previously in semctl_down.
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
do_msgrcv() is the last msg queue function that abuses the ipc lock.
Take it only when needed, when actually updating msq.
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Tested-by: Sedat Dilek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
do_msgsnd() is another function that does too many things with the ipc
object lock acquired. Take it only when needed, when actually updating
msq.
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
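The locking change in the commit above follows a general shape: do read-only checks first, and hold the object lock only around the actual update. The toy model below shows that shape in single-threaded userspace code. Every name in it is invented for the example, the "lock" is just a counter so the effect is observable, and the RCU protection the kernel relies on for the unlocked part is not modelled.

```c
/* Toy queue with a counting stand-in for the ipc object lock. */
struct msq_sketch {
	int locked;		/* 1 while the "lock" is held */
	int lock_acquisitions;	/* how often the lock was taken */
	unsigned int mode;	/* permission bits, checked without the lock */
	unsigned long qnum;	/* number of queued messages, lock-protected */
};

static void msq_lock(struct msq_sketch *q)
{
	q->locked = 1;
	q->lock_acquisitions++;
}

static void msq_unlock(struct msq_sketch *q)
{
	q->locked = 0;
}

/* Returns 0 on success, -1 if the caller lacks the wanted permission. */
static int msgsnd_sketch(struct msq_sketch *q, unsigned int want)
{
	/* Permission check first: it only reads, so no lock is taken. */
	if ((q->mode & want) != want)
		return -1;

	/* The lock covers just the update of the queue state. */
	msq_lock(q);
	q->qnum++;
	msq_unlock(q);
	return 0;
}
```

A caller that fails the permission check never touches the lock at all, which is where the reduced contention comes from.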
While the INFO cmd doesn't take the ipc lock, the STAT commands do
acquire it unnecessarily. We can do the permissions and security checks
only holding the rcu lock.

This function now mimics semctl_nolock().
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add msq_obtain_object() and msq_obtain_object_check(), which will allow
us to get the ipc object without acquiring the lock. Just as with
semaphores, these functions are basically wrappers around
ipc_obtain_object*().
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Similar to semctl, when calling msgctl, the *_INFO and *_STAT commands
can be performed without acquiring the ipc object.

Add a msgctl_nolock() function and move the logic of *_INFO and *_STAT
out of msgctl(). This change still takes the lock and it will be
properly lockless in the next patch.
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Instead of holding the ipc lock for the entire function, use the
ipcctl_pre_down_nolock and only acquire the lock for specific commands:
RMID and SET.
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This function currently acquires both the rw_mutex and the rcu lock on
successful lookups, leaving the callers to explicitly unlock them,
creating another two level locking situation.

Make the callers (including those that still use ipcctl_pre_down())
explicitly lock and unlock the rwsem and rcu lock.
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Simple helpers around the (kern_ipc_perm *)->lock spinlock.
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patchset continues the work that began in the sysv ipc semaphore
scaling series, see

https://lkml.org/lkml/2013/3/20/546

Just like semaphores used to be, sysv shared memory and msg queues also
abuse the ipc lock, unnecessarily holding it for operations such as
permission and security checks.

This patchset mostly deals with mqueues, and while shared mem can be
done in a very similar way, I want to get these patches out in the open
first. It also does some pending cleanups, mostly focused on the two
level locking we have in ipc code, taking care of ipc_addid() and
ipcctl_pre_down_nolock() - yes there are still functions that need to be
updated as well.

This patch:

Make all callers explicitly take and release the RCU read lock.

This addresses the two level locking seen in newary(), newseg() and
newqueue(). For the last two, explicitly unlock the ipc object and the
rcu lock, instead of calling the custom shm_unlock and msg_unlock
functions. The next patch will deal with the open coded locking for
->perm.lock.
Signed-off-by: Davidlohr Bueso
Cc: Andi Kleen
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
flush_ptrace_hw_breakpoint() destroys the counters set by ptrace, but
"leaks" ->debugreg6 and ->ptrace_dr7.

The problem is minor, but still it doesn't look right and flush_thread()
did this until commit 66cb59172959 ("hw-breakpoints: use the new wrapper
routines to access debug registers in process/thread code"). Now that
PTRACE_DETACH does flush_ too this makes even more sense.
Signed-off-by: Oleg Nesterov
Cc: Benjamin Herrenschmidt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Jan Kratochvil
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change ptrace_detach() to call flush_ptrace_hw_breakpoint(child). This
frees the slots for non-ptrace PERF_TYPE_BREAKPOINT users, and this
ensures that the tracee won't be killed by SIGTRAP triggered by the
active breakpoints.

Test-case:

	unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
	{
		unsigned long dr7;

		dr7 = ((len | type) & 0xf)
			<< (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
		if (enable)
			dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));

		return dr7;
	}

	int write_dr(int pid, int dr, unsigned long val)
	{
		return ptrace(PTRACE_POKEUSER, pid,
				offsetof (struct user, u_debugreg[dr]),
				val);
	}

	void func(void)
	{
	}

	int main(void)
	{
		int pid, stat;
		unsigned long dr7;

		pid = fork();
		if (!pid) {
			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
			kill(getpid(), SIGHUP);

			func();
			return 0x13;
		}

		assert(pid == waitpid(-1, &stat, 0));
		assert(WSTOPSIG(stat) == SIGHUP);

		assert(write_dr(pid, 0, (long)func) == 0);
		dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
		assert(write_dr(pid, 7, dr7) == 0);

		assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
		assert(pid == waitpid(-1, &stat, 0));
		assert(stat == 0x1300);

		return 0;
	}

Before this patch the child is killed after PTRACE_DETACH.
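To make the DR7 value written by the test case concrete, the sketch below repeats the encode_dr7() arithmetic with locally defined constants. The `SK_`-prefixed values are assumptions chosen to match the x86 debug-register layout (control fields start at bit 16 with 4 bits per slot, enable bits come 2 per slot); they are defined here only so the example is self-contained.

```c
/* Values assumed to match the x86 debug-register layout used by the test. */
#define SK_DR_CONTROL_SHIFT 16
#define SK_DR_CONTROL_SIZE   4
#define SK_DR_ENABLE_SIZE    2
#define SK_DR_GLOBAL_ENABLE  0x2
#define SK_DR_RW_EXECUTE     0x0
#define SK_DR_RW_WRITE       0x1
#define SK_DR_LEN_1          0x0
#define SK_DR_LEN_4          0xC

/* Same arithmetic as encode_dr7() in the test case, with local constants. */
static unsigned long encode_dr7_sketch(int drnum, int enable,
				       unsigned int type, unsigned int len)
{
	unsigned long dr7;

	dr7 = (unsigned long)((len | type) & 0xf)
		<< (SK_DR_CONTROL_SHIFT + drnum * SK_DR_CONTROL_SIZE);
	if (enable)
		dr7 |= (unsigned long)SK_DR_GLOBAL_ENABLE
			<< (drnum * SK_DR_ENABLE_SIZE);
	return dr7;
}
```

Under these assumptions the test's call for slot 0, execute, length 1 has an all-zero control field, so only the global-enable bit is set and the result is 0x2. A 4-byte write watchpoint in slot 1 would give 0xD00008.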
Signed-off-by: Oleg Nesterov
Acked-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Ingo Molnar
Cc: Jan Kratochvil
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ptrace_set_debugreg() is trivial but looks horrible. Kill the unnecessary
goto's and return's to cleanup the code.

This matches ptrace_get_debugreg() which also needs the trivial whitespace
cleanups.
Signed-off-by: Oleg Nesterov
Acked-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Ingo Molnar
Cc: Jan Kratochvil
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 24f1e32c60c4 ("hw-breakpoints: Rewrite the hw-breakpoints layer
on top of perf events") introduced a minor regression. Before this
commit

	PTRACE_POKEUSER DR7, enableDR0
	PTRACE_POKEUSER DR0, address

was perfectly valid, now PTRACE_POKEUSER(DR7) fails if DR0 was not
previously initialized by PTRACE_POKEUSER(DR0).

Change ptrace_write_dr7() to do ptrace_register_breakpoint(addr => 0) if
!bp && !disabled.

This fixes watchpoint-zeroaddr from ptrace-tests, see
https://bugzilla.redhat.com/show_bug.cgi?id=660204.
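The "!bp && !disabled" rule can be modelled in a few lines of userspace C. This is a toy state machine, not the kernel code: the slot struct, the per-slot enable array standing in for the DR7 bits, and both function names are assumptions for the sketch.

```c
#define SK_NR_SLOTS 4

struct bp_sketch {
	int registered;
	int disabled;
	unsigned long addr;
};

/* Model of writing DR0..DR3: register the slot if needed, then set addr. */
static void write_addr_sketch(struct bp_sketch *bps, int nr, unsigned long addr)
{
	if (!bps[nr].registered) {
		bps[nr].registered = 1;
		bps[nr].disabled = 1;	/* stays off until DR7 enables it */
	}
	bps[nr].addr = addr;
}

/*
 * Model of writing DR7, reduced to one enable flag per slot. A slot that
 * is enabled before its address was written gets registered at address 0
 * instead of causing a failure.
 */
static int write_dr7_sketch(struct bp_sketch *bps, const int *enable)
{
	int i;

	for (i = 0; i < SK_NR_SLOTS; i++) {
		int disabled = !enable[i];

		if (!bps[i].registered) {
			if (disabled)
				continue;	/* nothing to do */
			bps[i].registered = 1;
			bps[i].addr = 0;	/* addr => 0 */
		}
		bps[i].disabled = disabled;
	}
	return 0;
}
```

With this rule, enabling slot 0 through DR7 and only then writing DR0 works in either order, which is the sequence the regression had broken.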
Signed-off-by: Oleg Nesterov
Reported-by: Jan Kratochvil
Acked-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Ingo Molnar
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
No functional changes, preparation.

Extract the "register breakpoint" code from ptrace_get_debugreg() into
the new/generic helper, ptrace_register_breakpoint(). It will have more
users.

The patch also adds another simple helper, ptrace_fill_bp_fields(), to
factor out the arch_bp_generic_fields() logic in register/modify.
Signed-off-by: Oleg Nesterov
Acked-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Ingo Molnar
Cc: Jan Kratochvil
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ptrace_write_dr7() skips ptrace_modify_breakpoint(disabled => true)
unless second_pass, this buys nothing but complicates the code and means
that we always do the main loop twice even if "disabled" was never true.

The comment says:

	Don't unregister the breakpoints right-away,
	unless all register_user_hw_breakpoint()
	requests have succeeded.

Firstly, we do not do register_user_hw_breakpoint(), it was removed by
commit 24f1e32c60c4 ("hw-breakpoints: Rewrite the hw-breakpoints layer
on top of perf events").

We are going to restore register_user_hw_breakpoint() (see the next
patch) but this doesn't matter: after commit 44234adcdce3
("hw-breakpoints: Modify breakpoints without unregistering them")
perf_event_disable() can not hurt, hw_breakpoint_del() does not free the
slot.

Remove the "second_pass" check from the main loop and simplify the code.
Since we have to check "bp != NULL" anyway, the patch also removes the
same check in ptrace_modify_breakpoint() and moves the comment into
ptrace_write_dr7().

With this patch the second pass is only needed to restore the saved
old_dr7. This should never fail, so the patch adds WARN_ON() to catch
the potential problems as Frederic suggested.
Signed-off-by: Oleg Nesterov
Acked-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Ingo Molnar
Cc: Jan Kratochvil
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ptrace_write_dr7() looks unnecessarily overcomplicated. We can factor
out ptrace_modify_breakpoint() and avoid doing "continue" twice; we just
need to pass the proper "disabled" argument to
ptrace_modify_breakpoint().
Signed-off-by: Oleg Nesterov
Acked-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Ingo Molnar
Cc: Jan Kratochvil
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This reverts commit bf26c018490c ("Prepare to fix racy accesses on task
breakpoints").

The patch was fine but we can no longer race with SIGKILL after commit
9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no callers,
we can kill them and remove task->ptrace_bp_refcnt.
Signed-off-by: Oleg Nesterov
Acked-by: Frederic Weisbecker
Acked-by: Michael Neuling
Cc: Benjamin Herrenschmidt
Cc: Ingo Molnar
Cc: Jan Kratochvil
Cc: Paul Mackerras
Cc: Paul Mundt
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This reverts commit e0ac8457d020 ("hw_breakpoints: Fix racy access to
ptrace breakpoints").

The patch was fine but we can no longer race with SIGKILL after commit
9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.
Signed-off-by: Oleg Nesterov
Cc: Paul Mundt
Cc: Benjamin Herrenschmidt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Jan Kratochvil
Cc: Michael Neuling
Cc: Paul Mackerras
Cc: Will Deacon
Cc: Prasad
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds