07 Dec, 2010
3 commits
-
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM / Hibernate: Fix memory corruption related to swap
PM / Hibernate: Use async I/O when reading compressed hibernation image -
There is a problem that swap pages allocated before the creation of
a hibernation image can be released and used for storing the contents
of different memory pages while the image is being saved. Since the
kernel stored in the image doesn't know of that, it causes memory
corruption to occur after resume from hibernation, especially on
systems with relatively small RAM that need to swap often.This issue can be addressed by keeping the GFP_IOFS bits clear
in gfp_allowed_mask during the entire hibernation, including the
saving of the image, until the system is finally turned off or
the hibernation is aborted. Unfortunately, for this purpose
it's necessary to rework the way in which the hibernate and
suspend code manipulates gfp_allowed_mask.This change is based on an earlier patch from Hugh Dickins.
Signed-off-by: Rafael J. Wysocki
Reported-by: Ondrej Zary
Acked-by: Hugh Dickins
Reviewed-by: KAMEZAWA Hiroyuki
Cc: stable@kernel.org -
This is a fix for reading LZO compressed image using async I/O.
Essentially, instead of having just one page into which we keep
reading blocks from swap, we allocate enough of them to cover the
largest compressed size and then let block I/O pick them all up. Once
we have them all (and here we wait), we decompress them, as usual.
Obviously, the very first block we still pick up synchronously,
because we need to know the size of the lot before we pick up the
rest.Also fixed the copyright line, which I've forgotten before.
Signed-off-by: Bojan Smojver
Signed-off-by: Rafael J. Wysocki
03 Dec, 2010
1 commit
-
If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not
otherwise reset before do_exit(). do_exit may later (via mm_release in
fork.c) do a put_user to a user-controlled address, potentially allowing
a user to leverage an oops into a controlled write into kernel memory.This is only triggerable in the presence of another bug, but this
potentially turns a lot of DoS bugs into privilege escalations, so it's
worth fixing. I have proof-of-concept code which uses this bug along
with CVE-2010-3849 to write a zero to an arbitrary kernel address, so
I've tested that this is not theoretical.A more logical place to put this fix might be when we know an oops has
occurred, before we call do_exit(), but that would involve changing
every architecture, in multiple places.Let's just stick it in do_exit instead.
[akpm@linux-foundation.org: update code comment]
Signed-off-by: Nelson Elhage
Cc: KOSAKI Motohiro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Nov, 2010
1 commit
-
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Fix the software context switch counter
perf, x86: Fixup Kconfig deps
x86, perf, nmi: Disable perf if counters are not accessible
perf: Fix inherit vs. context rotation bug
27 Nov, 2010
3 commits
-
…el/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
posix-cpu-timers: Rcu_read_lock/unlock protect find_task_by_vpid call -
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf symbols: Remove incorrect open-coded container_of()
perf record: Handle restrictive permissions in /proc/{kallsyms,modules}
x86/kprobes: Prevent kprobes to probe on save_args()
irq_work: Drop cmpxchg() result
perf: Fix owner-list vs exit
x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG
tracing: Fix recursive user stack trace
perf,hw_breakpoint: Initialize hardware api earlier
x86: Ignore trap bits on single step exceptions
tracing: Force arch_local_irq_* notrace for paravirt
tracing: Fix module use of trace_bprintk() -
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix idle balancing
sched: Fix volanomark performance regression
26 Nov, 2010
2 commits
-
Stephane noticed that because the perf_sw_event() call is inside the
perf_event_task_sched_out() call it won't get called unless we
have a per-task counter.Reported-by: Stephane Eranian
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
It was found that sometimes children of tasks with inherited events had
one extra event. Eventually it turned out to be due to the list rotation
no being exclusive with the list iteration in the inheritance code.Cure this by temporarily disabling the rotation while we inherit the events.
Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra
LKML-Reference:
Cc:
Signed-off-by: Ingo Molnar
20 Nov, 2010
1 commit
-
This reverts commit 59365d136d205cc20fe666ca7f89b1c5001b0d5a.
It turns out that this can break certain existing user land setups.
Quoth Sarah Sharp:"On Wednesday, I updated my branch to commit 460781b from linus' tree,
and my box would not boot. klogd segfaulted, which stalled the whole
system.At first I thought it actually hung the box, but it continued booting
after 5 minutes, and I was able to log in. It dropped back to the
text console instead of the graphical bootup display for that period
of time. dmesg surprisingly still works. I've bisected the problem
down to this commit (commit 59365d136d205cc20fe666ca7f89b1c5001b0d5a)The box is running klogd 1.5.5ubuntu3 (from Jaunty). Yes, I know
that's old. I read the bit in the commit about changing the
permissions of kallsyms after boot, but if I can't boot that doesn't
help."So let's just keep the old default, and encourage distributions to do
the "chmod -r /proc/kallsyms" in their bootup scripts. This is not
worth a kernel option to change default behavior, since it's so easily
done in user space.Reported-and-bisected-by: Sarah Sharp
Cc: Marcus Meissner
Cc: Tejun Heo
Cc: Eugene Teo
Cc: Jesper Juhl
Signed-off-by: Linus Torvalds
19 Nov, 2010
1 commit
-
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb,ppc: Fix regression in evr register handling
kgdb,x86: fix regression in detach handling
kdb: fix crash when KDB_BASE_CMD_MAX is exceeded
kdb: fix memory leak in kdb_main.c
18 Nov, 2010
9 commits
-
The compiler warned us about:
kernel/irq_work.c: In function 'irq_work_run':
kernel/irq_work.c:148: warning: value computed is not usedDropping the cmpxchg() result is indeed weird, but correct -
so annotate away the warning.Signed-off-by: Sergio Aguirre
Cc: Huang Ying
Cc: Martin Schwidefsky
Cc: Kyle McMartin
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Oleg noticed that a perf-fd keeping a reference on the creating task
leads to a few funny side effects.There's two different aspects to this:
- kernel based perf-events, these should not take out
a reference on the creating task and appear on the task's
event list since they're not bound to fds nor visible
to userspace.- fork() and pthread_create(), these can lead to the creating
task dying (and thus the task's event-list becomming useless)
but keeping the list and ref alive until the event is closed.Combined they lead to malfunction of the ptrace hw_tracepoints.
Cure this by not considering kernel based perf_events for the
owner-list and destroying the owner-list when the owner dies.Reported-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
Acked-by: Oleg Nesterov
LKML-Reference:
Signed-off-by: Ingo Molnar -
An earlier commit reverts idle balancing throttling reset to fix a 30%
regression in volanomark throughput. We still need to reset idle_stamp
when we pull a task in newidle balance.Reported-by: Alex Shi
Signed-off-by: Nikhil Rao
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
Commit fab4762 triggers excessive idle balancing, causing a ~30% loss in
volanomark throughput. Remove idle balancing throttle reset.Originally-by: Alex Shi
Signed-off-by: Mike Galbraith
Acked-by: Nikhil Rao
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
…eric/random-tracing into perf/urgent
-
…t/rostedt/linux-2.6-trace into perf/urgent
-
When the number of dyanmic kdb commands exceeds KDB_BASE_CMD_MAX, the
kernel will fault.Signed-off-by: Jovi Zhang
Signed-off-by: Jason Wessel -
Call kfree in the error path as well as the success path in kdb_ll().
Signed-off-by: Jovi Zhang
Signed-off-by: Jason Wessel -
The big kernel lock has been removed from all these files at some point,
leaving only the #include.Remove this too as a cleanup.
Signed-off-by: Arnd Bergmann
Signed-off-by: Linus Torvalds
17 Nov, 2010
5 commits
-
Making /proc/kallsyms readable only for root by default makes it
slightly harder for attackers to write generic kernel exploits by
removing one source of knowledge where things are in the kernel.This is the second submit, discussion happened on this on first submit
and mostly concerned that this is just one hole of the sieve ... but
one of the bigger ones.Changing the permissions of at least System.map and vmlinux is also
required to fix the same set, but a packaging issue.Target of this starter patch and follow ups is removing any kind of
kernel space address information leak from the kernel.[ Side note: the default of root-only reading is the "safe" value, and
it's easy enough to then override at any time after boot. The /proc
filesystem allows root to change the permissions with a regular
chmod, so you can "revert" this at run-time by simply doingchmod og+r /proc/kallsyms
as root if you really want regular users to see the kernel symbols.
It does help some tools like "perf" figure them out without any
setup, so it may well make sense in some situations. - Linus ]Signed-off-by: Marcus Meissner
Acked-by: Tejun Heo
Acked-by: Eugene Teo
Reviewed-by: Jesper Juhl
Signed-off-by: Linus Torvalds -
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix cross-sched-class wakeup preemption
sched: Fix runnable condition for stoptask
sched: Use group weight, idle cpu metrics to fix imbalances during idle -
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM / PM QoS: Fix reversed min and max
PM / OPP: Hide OPP configuration when SoCs do not provide an implementation
PM: Allow devices to be removed during late suspend and early resume -
* 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Address compiler warnings in exit_robust_list -
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] kprobes: Fix the return address of multiple kretprobes
[S390] kprobes: disable interrupts throughout
[S390] ftrace: build without frame pointers on s390
[S390] mm: add devmem_is_allowed() for STRICT_DEVMEM checking
[S390] vmlogrdr: purge after recording is switched off
[S390] cio: fix incorrect ccw_device_init_count
[S390] tape: add medium state notifications
[S390] fix get_user_pages_fast
16 Nov, 2010
3 commits
-
Sigh...
Signed-off-by: Joe Perches
Acked-by: Eric Paris
Signed-off-by: Linus Torvalds -
The addition of CONFIG_SECURITY_DMESG_RESTRICT resulted in a build
failure when CONFIG_PRINTK=n. This is because the capabilities code
which used the new option was built even though the variable in question
didn't exist.The patch here fixes this by moving the capabilities checks out of the
LSM and into the caller. All (known) LSMs should have been calling the
capabilities hook already so it actually makes the code organization
better to eliminate the hook altogether.Signed-off-by: Eric Paris
Acked-by: James Morris
Signed-off-by: Linus Torvalds -
pm_qos_get_value had min and max reversed, causing all pm_qos
requests to have no effect.Signed-off-by: Colin Cross
Acked-by: mark
Signed-off-by: Rafael J. Wysocki
Cc: stable@kernel.org
13 Nov, 2010
3 commits
-
The user stack trace can fault when examining the trace. Which
would call the do_page_fault handler, which would trace again,
which would do the user stack trace, which would fault and call
do_page_fault again ...Thus this is causing a recursive bug. We need to have a recursion
detector here.[ Resubmitted by Jiri Olsa ]
[ Eric Dumazet recommended using __this_cpu_* instead of __get_cpu_* ]
Cc: Eric Dumazet
Signed-off-by: Jiri Olsa
LKML-Reference:
Signed-off-by: Steven Rostedt -
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits)
block: remove unused copy_io_context()
Documentation: remove anticipatory scheduler info
block: remove REQ_HARDBARRIER
ioprio: rcu_read_lock/unlock protect find_task_by_vpid call (V2)
ioprio: fix RCU locking around task dereference
block: ioctl: fix information leak to userland
block: read i_size with i_size_read()
cciss: fix proc warning on attempt to remove non-existant directory
bio: take care not overflow page count when mapping/copying user data
block: limit vec count in bio_kmalloc() and bio_alloc_map_data()
block: take care not to overflow when calculating total iov length
block: check for proper length of iov entries in blk_rq_map_user_iov()
cciss: remove controllers supported by hpsa
cciss: use usleep_range not msleep for small sleeps
cciss: limit commands allocated on reset_devices
cciss: Use kernel provided PCI state save and restore functions
cciss: fix board status waiting code
drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs
drbd: REQ_HARDBARRIER -> REQ_FUA transition for meta data accesses
drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code
... -
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf, amd: Use kmalloc_node(,__GFP_ZERO) for northbridge structure allocation
perf_events: Fix time tracking in samples
perf trace: update usage
perf trace: update Documentation with new perf trace variants
perf trace: live-mode command-line cleanup
perf trace record: handle commands correctly
perf record: make the record options available outside perf record
perf trace scripting: remove system-wide param from shell scripts
perf trace scripting: fix some small memory leaks and missing error checks
perf: Fix usages of profile_cpu in builtin-top.c to use cpu_list
perf, ui: Eliminate stack-smashing protection compiler complaint
12 Nov, 2010
4 commits
-
The kernel syslog contains debugging information that is often useful
during exploitation of other vulnerabilities, such as kernel heap
addresses. Rather than futilely attempt to sanitize hundreds (or
thousands) of printk statements and simultaneously cripple useful
debugging functionality, it is far simpler to create an option that
prevents unprivileged users from reading the syslog.This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the
dmesg_restrict sysctl. When set to "0", the default, no restrictions are
enforced. When set to "1", only users with CAP_SYS_ADMIN can read the
kernel syslog via dmesg(8) or other mechanisms.[akpm@linux-foundation.org: explain the config option in kernel.txt]
Signed-off-by: Dan Rosenberg
Acked-by: Ingo Molnar
Acked-by: Eugene Teo
Acked-by: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Per task latencytop accumulator prematurely terminates due to erroneous
placement of latency_record_count. It should be incremented whenever a
new record is allocated instead of increment on every latencytop event.Also fix search iterator to only search known record events instead of
blindly searching all pre-allocated space.Signed-off-by: Ken Chen
Reviewed-by: Arjan van de Ven
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
clean_sort_range() should return a number of nonempty elements of range
array, but if the array is full clean_sort_range() returns 0.The problem is that the number of nonempty elements is evaluated by
finding the first empty element of the array. If there is no such element
it returns an initial value of local variable nr_range that is zero.The fix is trivial: it changes initial value of nr_range to size of the
array.The bug can lead to loss of information regarding all ranges, since
typically returned value of clean_sort_range() is considered as an actual
number of ranges in the array after a series of add/subtract operations.Found by Analytical Verification project of Linux Verification Center
(linuxtesting.org), thanks to Alexander Kolosov.Signed-off-by: Alexey Khoroshilov
Cc: Yinghai Lu
Cc: "H. Peter Anvin"
Cc: Geert Uytterhoeven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When using early debugging, the kernel does not initialize the
hw_breakpoint API early enough and causes the late initialization of
the kernel debugger to fail. The boot arguments are:earlyprintk=vga ekgdboc=kbd kgdbwait
Then simply type "go" at the kdb prompt and boot. The kernel will
later emit the message:kgdb: Could not allocate hwbreakpoints
And at that point the kernel debugger will cease to work correctly.
The solution is to initialize the hw_breakpoint at the same time that
all the other perf call backs are initialized instead of using a
core_initcall() initialization which happens well after the kernel
debugger can make use of hardware breakpoints.Signed-off-by: Jason Wessel
CC: Frederic Weisbecker
CC: Ingo Molnar
CC: Peter Zijlstra
LKML-Reference:
Signed-off-by: Frederic Weisbecker
11 Nov, 2010
4 commits
-
Instead of dealing with sched classes inside each check_preempt_curr()
implementation, pull out this logic into the generic wakeup preemption
path.This fixes a hang in KVM (and others) where we are waiting for the
stop machine thread to run ...Reported-by: Markus Trippelsdorf
Tested-by: Marcelo Tosatti
Tested-by: Sergey Senozhatsky
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar -
On use of trace_printk() there's a macro that determines if the format
is static or a variable. If it is static, it defaults to __trace_bprintk()
otherwise it uses __trace_printk().A while ago, Lai Jiangshan added __trace_bprintk(). In that patch, we
discussed a way to allow modules to use it. The difference between
__trace_bprintk() and __trace_printk() is that for faster processing,
just the format and args are stored in the trace instead of running
it through a sprintf function. In order to do this, the format used
by the __trace_bprintk() had to be persistent.See commit 1ba28e02a18cbdbea123836f6c98efb09cbf59ec
The problem comes with trace_bprintk() where the module is unloaded.
The pointer left in the buffer is still pointing to the format.To solve this issue, the formats in the module were copied into kernel
core. If the same format was used, they would use the same copy (to prevent
memory leak). This all worked well until we tried to merge everything.At the time this was written, Lai Jiangshan, Frederic Weisbecker,
Ingo Molnar and myself were all touching the same code. When this was
merged, we lost the part of it that was in module.c. This kept out the
copying of the formats and unloading the module could cause bad pointers
left in the ring buffer.This patch adds back (with updates required for current kernel) the
module code that sets up the necessary pointers.Cc: Lai Jiangshan
Cc: Rusty Russell
Signed-off-by: Steven Rostedt -
Since the OPP API is only useful with an appropraite SoC-specific
implementation there is no point in offering the ability to enable
the API on general systems. Provide an ARCH_HAS OPP Kconfig symbol
which masks out the option unless selected by an implementation.Signed-off-by: Mark Brown
Acked-by: Nishanth Menon
Acked-by: Kevin Hilman
Signed-off-by: Rafael J. Wysocki -
Heiko reported that the TASK_RUNNING check is not sufficient for
CONFIG_PREEMPT=y since we can get preempted with !TASK_RUNNING.He suggested adding a ->se.on_rq test to the existing TASK_RUNNING
one, however TASK_RUNNING will always have ->se.on_rq, so we might as
well reduce that to a single test.[ stop tasks should never get preempted, but its good to handle
this case correctly should this ever happen ]Reported-by: Heiko Carstens
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar