07 Dec, 2010

3 commits

  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
    PM / Hibernate: Fix memory corruption related to swap
    PM / Hibernate: Use async I/O when reading compressed hibernation image

    Linus Torvalds
     
  • There is a problem that swap pages allocated before the creation of
    a hibernation image can be released and used for storing the contents
    of different memory pages while the image is being saved. Since the
    kernel stored in the image doesn't know of that, it causes memory
    corruption to occur after resume from hibernation, especially on
    systems with relatively small RAM that need to swap often.

    This issue can be addressed by keeping the GFP_IOFS bits clear
    in gfp_allowed_mask during the entire hibernation, including the
    saving of the image, until the system is finally turned off or
    the hibernation is aborted. Unfortunately, for this purpose
    it's necessary to rework the way in which the hibernate and
    suspend code manipulates gfp_allowed_mask.

    This change is based on an earlier patch from Hugh Dickins.

    Signed-off-by: Rafael J. Wysocki
    Reported-by: Ondrej Zary
    Acked-by: Hugh Dickins
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: stable@kernel.org

    Rafael J. Wysocki
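
The mask handling described in the hibernation fix above can be sketched in user space. This is a toy model, not the kernel code: the `toy_` names and the flag bit values are invented for illustration. Only the idea comes from the commit message, namely keeping the I/O and FS bits clear in an allowed mask for the whole hibernation window and restoring them afterwards.

```c
#include <assert.h>

typedef unsigned int toy_gfp_t;

/* Made-up bit values; the real kernel flags differ. */
#define TOY_GFP_WAIT 0x10u
#define TOY_GFP_IO   0x40u
#define TOY_GFP_FS   0x80u
#define TOY_GFP_IOFS (TOY_GFP_IO | TOY_GFP_FS)

static toy_gfp_t allowed_mask = TOY_GFP_WAIT | TOY_GFP_IOFS;
static toy_gfp_t saved_mask;   /* 0 means "not currently restricted" */

/* Clear the I/O and FS bits and remember the previous mask. */
static void toy_restrict_gfp_mask(void)
{
    assert(saved_mask == 0);   /* restricting twice would lose the old mask */
    saved_mask = allowed_mask;
    allowed_mask &= ~TOY_GFP_IOFS;
}

/* Restore the mask saved by toy_restrict_gfp_mask(), if any. */
static void toy_restore_gfp_mask(void)
{
    if (saved_mask) {
        allowed_mask = saved_mask;
        saved_mask = 0;
    }
}

/* Every allocation request is filtered through the allowed mask. */
static toy_gfp_t toy_effective_flags(toy_gfp_t requested)
{
    return requested & allowed_mask;
}
```

In this model, restricting before the image is created and restoring only when hibernation is aborted means that no allocation made in between can be allowed to start swap I/O.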
     
  • This is a fix for reading an LZO-compressed image using async I/O.
    Essentially, instead of having just one page into which we keep
    reading blocks from swap, we allocate enough of them to cover the
    largest compressed size and then let block I/O pick them all up. Once
    we have them all (and here we wait), we decompress them, as usual.
    Obviously, the very first block we still pick up synchronously,
    because we need to know the size of the lot before we pick up the
    rest.

    Also fixed the copyright line, which I've forgotten before.

    Signed-off-by: Bojan Smojver
    Signed-off-by: Rafael J. Wysocki

    Bojan Smojver
     

03 Dec, 2010

1 commit

  • If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not
    otherwise reset before do_exit(). do_exit may later (via mm_release in
    fork.c) do a put_user to a user-controlled address, potentially allowing
    a user to leverage an oops into a controlled write into kernel memory.

    This is only triggerable in the presence of another bug, but this
    potentially turns a lot of DoS bugs into privilege escalations, so it's
    worth fixing. I have proof-of-concept code which uses this bug along
    with CVE-2010-3849 to write a zero to an arbitrary kernel address, so
    I've tested that this is not theoretical.

    A more logical place to put this fix might be when we know an oops has
    occurred, before we call do_exit(), but that would involve changing
    every architecture, in multiple places.

    Let's just stick it in do_exit instead.

    [akpm@linux-foundation.org: update code comment]
    Signed-off-by: Nelson Elhage
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nelson Elhage
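
The effect of resetting the address limit at the top of do_exit() can be illustrated with a small user-space model. Everything here is hypothetical (the `toy_` names, the address split, the single global limit). It only shows why a put_user-style range check stops protecting kernel addresses while the limit is still set to the kernel value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical limits: user space ends at 0x7fffffff in this model. */
#define TOY_USER_DS   ((uintptr_t)0x7fffffffu)
#define TOY_KERNEL_DS (UINTPTR_MAX)

/* Models the per-task "fs" address limit. */
static uintptr_t addr_limit = TOY_USER_DS;

/* Toy put_user(): the write is allowed only below the current limit. */
static int toy_put_user(uintptr_t addr)
{
    return addr <= addr_limit ? 0 : -EFAULT;
}

/*
 * Toy do_exit(): with the fix applied, the limit is reset before the
 * exit path writes to the user-supplied address.
 */
static int toy_do_exit(uintptr_t user_supplied_addr, int apply_fix)
{
    assert(apply_fix == 0 || apply_fix == 1);
    if (apply_fix)
        addr_limit = TOY_USER_DS;
    return toy_put_user(user_supplied_addr);
}
```

If the model task "oopses" with `addr_limit` set to `TOY_KERNEL_DS`, the unfixed exit path accepts a kernel address, while the fixed path rejects it with `-EFAULT`.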
     

29 Nov, 2010

1 commit


27 Nov, 2010

3 commits


26 Nov, 2010

2 commits

  • Stephane noticed that because the perf_sw_event() call is inside the
    perf_event_task_sched_out() call it won't get called unless we
    have a per-task counter.

    Reported-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • It was found that sometimes children of tasks with inherited events had
    one extra event. Eventually it turned out to be due to the list rotation
    not being exclusive with the list iteration in the inheritance code.

    Cure this by temporarily disabling the rotation while we inherit the events.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

20 Nov, 2010

1 commit

  • This reverts commit 59365d136d205cc20fe666ca7f89b1c5001b0d5a.

    It turns out that this can break certain existing user land setups.
    Quoth Sarah Sharp:

    "On Wednesday, I updated my branch to commit 460781b from linus' tree,
    and my box would not boot. klogd segfaulted, which stalled the whole
    system.

    At first I thought it actually hung the box, but it continued booting
    after 5 minutes, and I was able to log in. It dropped back to the
    text console instead of the graphical bootup display for that period
    of time. dmesg surprisingly still works. I've bisected the problem
    down to this commit (commit 59365d136d205cc20fe666ca7f89b1c5001b0d5a)

    The box is running klogd 1.5.5ubuntu3 (from Jaunty). Yes, I know
    that's old. I read the bit in the commit about changing the
    permissions of kallsyms after boot, but if I can't boot that doesn't
    help."

    So let's just keep the old default, and encourage distributions to do
    the "chmod -r /proc/kallsyms" in their bootup scripts. This is not
    worth a kernel option to change default behavior, since it's so easily
    done in user space.

    Reported-and-bisected-by: Sarah Sharp
    Cc: Marcus Meissner
    Cc: Tejun Heo
    Cc: Eugene Teo
    Cc: Jesper Juhl
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

19 Nov, 2010

1 commit


18 Nov, 2010

9 commits


17 Nov, 2010

5 commits

  • Making /proc/kallsyms readable only for root by default makes it
    slightly harder for attackers to write generic kernel exploits by
    removing one source of knowledge where things are in the kernel.

    This is the second submission. Discussion of the first submission was
    mostly concerned that this is just one hole of the sieve ... but it is
    one of the bigger ones.

    Changing the permissions of at least System.map and vmlinux is also
    required to address the same problem, but that is a packaging issue.

    The target of this starter patch and its follow-ups is to remove any
    kind of kernel space address information leak from the kernel.

    [ Side note: the default of root-only reading is the "safe" value, and
    it's easy enough to then override at any time after boot. The /proc
    filesystem allows root to change the permissions with a regular
    chmod, so you can "revert" this at run-time by simply doing

    chmod og+r /proc/kallsyms

    as root if you really want regular users to see the kernel symbols.
    It does help some tools like "perf" figure them out without any
    setup, so it may well make sense in some situations. - Linus ]

    Signed-off-by: Marcus Meissner
    Acked-by: Tejun Heo
    Acked-by: Eugene Teo
    Reviewed-by: Jesper Juhl
    Signed-off-by: Linus Torvalds

    Marcus Meissner
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Fix cross-sched-class wakeup preemption
    sched: Fix runnable condition for stoptask
    sched: Use group weight, idle cpu metrics to fix imbalances during idle

    Linus Torvalds
     
  • * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
    PM / PM QoS: Fix reversed min and max
    PM / OPP: Hide OPP configuration when SoCs do not provide an implementation
    PM: Allow devices to be removed during late suspend and early resume

    Linus Torvalds
     
  • * 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    futex: Address compiler warnings in exit_robust_list

    Linus Torvalds
     
  • * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
    [S390] kprobes: Fix the return address of multiple kretprobes
    [S390] kprobes: disable interrupts throughout
    [S390] ftrace: build without frame pointers on s390
    [S390] mm: add devmem_is_allowed() for STRICT_DEVMEM checking
    [S390] vmlogrdr: purge after recording is switched off
    [S390] cio: fix incorrect ccw_device_init_count
    [S390] tape: add medium state notifications
    [S390] fix get_user_pages_fast

    Linus Torvalds
     

16 Nov, 2010

3 commits


13 Nov, 2010

3 commits

  • The user stack trace can fault when examining the trace, which
    would call the do_page_fault handler, which would trace again,
    which would do the user stack trace, which would fault and call
    do_page_fault again ...

    This causes unbounded recursion, so we need a recursion detector
    here.

    [ Resubmitted by Jiri Olsa ]

    [ Eric Dumazet recommended using __this_cpu_* instead of __get_cpu_* ]

    Cc: Eric Dumazet
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Steven Rostedt
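
A recursion detector of the kind described above can be sketched in user space. This is a minimal illustration under stated assumptions: the function names are invented, the "page fault" is simulated by a direct call, and a thread-local counter stands in for the per-CPU counter that the commit message mentions (via `__this_cpu_*`).

```c
#include <assert.h>

/* Per-thread guard; the kernel change uses a per-CPU counter instead. */
static _Thread_local int user_stack_depth;

static int traces_recorded;   /* traces that actually ran */
static int traces_skipped;    /* re-entrant calls rejected by the guard */

static void record_user_stack(int simulate_fault);

/* Hypothetical fault path: the fault handler is itself traced. */
static void simulated_page_fault(void)
{
    record_user_stack(1);
}

static void record_user_stack(int simulate_fault)
{
    if (user_stack_depth) {       /* already tracing on this thread */
        traces_skipped++;
        return;
    }
    user_stack_depth++;

    traces_recorded++;
    if (simulate_fault)
        simulated_page_fault();   /* would recurse forever without the guard */

    user_stack_depth--;
    assert(user_stack_depth == 0);
}
```

The guard is taken before any work that might fault and released on the way out, so a nested call made from the fault path returns immediately instead of tracing again.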
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits)
    block: remove unused copy_io_context()
    Documentation: remove anticipatory scheduler info
    block: remove REQ_HARDBARRIER
    ioprio: rcu_read_lock/unlock protect find_task_by_vpid call (V2)
    ioprio: fix RCU locking around task dereference
    block: ioctl: fix information leak to userland
    block: read i_size with i_size_read()
    cciss: fix proc warning on attempt to remove non-existant directory
    bio: take care not overflow page count when mapping/copying user data
    block: limit vec count in bio_kmalloc() and bio_alloc_map_data()
    block: take care not to overflow when calculating total iov length
    block: check for proper length of iov entries in blk_rq_map_user_iov()
    cciss: remove controllers supported by hpsa
    cciss: use usleep_range not msleep for small sleeps
    cciss: limit commands allocated on reset_devices
    cciss: Use kernel provided PCI state save and restore functions
    cciss: fix board status waiting code
    drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs
    drbd: REQ_HARDBARRIER -> REQ_FUA transition for meta data accesses
    drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code
    ...

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf, amd: Use kmalloc_node(,__GFP_ZERO) for northbridge structure allocation
    perf_events: Fix time tracking in samples
    perf trace: update usage
    perf trace: update Documentation with new perf trace variants
    perf trace: live-mode command-line cleanup
    perf trace record: handle commands correctly
    perf record: make the record options available outside perf record
    perf trace scripting: remove system-wide param from shell scripts
    perf trace scripting: fix some small memory leaks and missing error checks
    perf: Fix usages of profile_cpu in builtin-top.c to use cpu_list
    perf, ui: Eliminate stack-smashing protection compiler complaint

    Linus Torvalds
     

12 Nov, 2010

4 commits

  • The kernel syslog contains debugging information that is often useful
    during exploitation of other vulnerabilities, such as kernel heap
    addresses. Rather than futilely attempt to sanitize hundreds (or
    thousands) of printk statements and simultaneously cripple useful
    debugging functionality, it is far simpler to create an option that
    prevents unprivileged users from reading the syslog.

    This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the
    dmesg_restrict sysctl. When set to "0", the default, no restrictions are
    enforced. When set to "1", only users with CAP_SYS_ADMIN can read the
    kernel syslog via dmesg(8) or other mechanisms.

    [akpm@linux-foundation.org: explain the config option in kernel.txt]
    Signed-off-by: Dan Rosenberg
    Acked-by: Ingo Molnar
    Acked-by: Eugene Teo
    Acked-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Rosenberg
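
The policy that the dmesg_restrict sysctl describes fits in a few lines. The sketch below is a user-space model, not the kernel implementation: `syslog_read_permitted()` is a hypothetical helper, and the boolean argument stands in for a CAP_SYS_ADMIN capability check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* 0 = no restriction (the default), 1 = CAP_SYS_ADMIN required. */
static int dmesg_restrict;

/* Returns 0 if the caller may read the kernel log, -EPERM otherwise. */
static int syslog_read_permitted(bool has_cap_sys_admin)
{
    assert(dmesg_restrict == 0 || dmesg_restrict == 1);
    if (dmesg_restrict && !has_cap_sys_admin)
        return -EPERM;
    return 0;
}
```

With the default value of 0 every caller is allowed. After the value is set to 1, only callers for which the capability check succeeds can read the log.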
     
  • The per-task latencytop accumulator terminates prematurely due to
    erroneous placement of the latency_record_count increment. It should be
    incremented whenever a new record is allocated instead of on every
    latencytop event.

    Also fix the search iterator to only search known record events instead
    of blindly searching all pre-allocated space.

    Signed-off-by: Ken Chen
    Reviewed-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Chen
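
The accounting rule from the latencytop fix above can be shown with a simplified accumulator. This is a sketch under assumptions: records are keyed by a plain integer rather than a backtrace, and the `toy_` names and the limit of 4 are invented. It shows the record count growing only when a new record is allocated, and the search covering only the records in use.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVECOUNT 4   /* hypothetical per-task record limit */

struct toy_record {
    int key;              /* stands in for the recorded backtrace */
    unsigned int count;
    unsigned long time;
};

struct toy_task {
    struct toy_record rec[TOY_SAVECOUNT];
    int record_count;     /* number of records in use */
};

/* Returns the index of the updated record, or -1 if the table is full. */
static int toy_account_latency(struct toy_task *t, int key, unsigned long usecs)
{
    int i;

    assert(t != NULL);

    /* Search only the records that are actually in use. */
    for (i = 0; i < t->record_count; i++) {
        if (t->rec[i].key == key) {
            t->rec[i].count++;
            t->rec[i].time += usecs;
            return i;
        }
    }

    if (t->record_count >= TOY_SAVECOUNT)
        return -1;

    /* The count grows only here, when a new record is allocated. */
    i = t->record_count++;
    t->rec[i].key = key;
    t->rec[i].count = 1;
    t->rec[i].time = usecs;
    return i;
}
```

Repeated events with the same key keep updating one record, so the table fills up only when distinct keys are seen.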
     
  • clean_sort_range() should return the number of nonempty elements of the
    range array, but if the array is full clean_sort_range() returns 0.

    The problem is that the number of nonempty elements is evaluated by
    finding the first empty element of the array. If there is no such element
    it returns the initial value of the local variable nr_range, which is zero.

    The fix is trivial: it changes the initial value of nr_range to the size
    of the array.

    The bug can lead to loss of information regarding all ranges, since
    the value returned by clean_sort_range() is typically treated as the
    actual number of ranges in the array after a series of add/subtract
    operations.

    Found by the Analytical Verification project of the Linux Verification
    Center (linuxtesting.org), thanks to Alexander Kolosov.

    Signed-off-by: Alexey Khoroshilov
    Cc: Yinghai Lu
    Cc: "H. Peter Anvin"
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Khoroshilov
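
The counting bug described above is easy to reproduce in isolation. The following is a simplified sketch, not the kernel function: it models only the "find the first empty element" step, `struct range` is a stand-in definition, and the helper names are hypothetical. The initial value of `nr_range` is passed in so that the behaviour before and after the fix can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's range type; end == 0 marks an empty slot. */
struct range {
    unsigned long start;
    unsigned long end;
};

/*
 * Count the leading nonempty entries, assuming empty slots have already
 * been moved to the tail of the array.
 */
static int count_ranges(const struct range *range, int az, int initial)
{
    int i, nr_range = initial;

    assert(range != NULL && az >= 0);

    for (i = 0; i < az; i++) {
        if (!range[i].end) {
            nr_range = i;   /* index of the first empty slot */
            break;
        }
    }
    return nr_range;
}

/* Before the fix: a full array falls through with nr_range == 0. */
static int count_ranges_buggy(const struct range *range, int az)
{
    return count_ranges(range, az, 0);
}

/* After the fix: a full array reports all az entries. */
static int count_ranges_fixed(const struct range *range, int az)
{
    return count_ranges(range, az, az);
}
```

Both variants agree whenever the array contains at least one empty slot. They differ only for a completely full array, which is the case the commit addresses.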
     
  • When using early debugging, the kernel does not initialize the
    hw_breakpoint API early enough and causes the late initialization of
    the kernel debugger to fail. The boot arguments are:

    earlyprintk=vga ekgdboc=kbd kgdbwait

    Then simply type "go" at the kdb prompt and boot. The kernel will
    later emit the message:

    kgdb: Could not allocate hwbreakpoints

    And at that point the kernel debugger will cease to work correctly.

    The solution is to initialize the hw_breakpoint at the same time that
    all the other perf call backs are initialized instead of using a
    core_initcall() initialization which happens well after the kernel
    debugger can make use of hardware breakpoints.

    Signed-off-by: Jason Wessel
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: Peter Zijlstra
    Signed-off-by: Frederic Weisbecker

    Jason Wessel
     

11 Nov, 2010

4 commits

  • Instead of dealing with sched classes inside each check_preempt_curr()
    implementation, pull out this logic into the generic wakeup preemption
    path.

    This fixes a hang in KVM (and others) where we are waiting for the
    stop machine thread to run ...

    Reported-by: Markus Trippelsdorf
    Tested-by: Marcelo Tosatti
    Tested-by: Sergey Senozhatsky
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • On use of trace_printk() there's a macro that determines if the format
    is static or a variable. If it is static, it defaults to
    __trace_bprintk(); otherwise it uses __trace_printk().

    A while ago, Lai Jiangshan added __trace_bprintk(). In that patch, we
    discussed a way to allow modules to use it. The difference between
    __trace_bprintk() and __trace_printk() is that for faster processing,
    just the format and args are stored in the trace instead of running
    it through a sprintf function. In order to do this, the format used
    by the __trace_bprintk() had to be persistent.

    See commit 1ba28e02a18cbdbea123836f6c98efb09cbf59ec

    The problem comes with __trace_bprintk() when the module is unloaded.
    The pointer left in the buffer is still pointing to the format.

    To solve this issue, the formats in the module were copied into kernel
    core. If the same format was used, they would use the same copy (to
    prevent a memory leak). This all worked well until we tried to merge
    everything.

    At the time this was written, Lai Jiangshan, Frederic Weisbecker,
    Ingo Molnar and I were all touching the same code. When this was
    merged, we lost the part of it that was in module.c. This kept out the
    copying of the formats, and unloading the module could leave bad
    pointers in the ring buffer.

    This patch adds back (with updates required for current kernel) the
    module code that sets up the necessary pointers.

    Cc: Lai Jiangshan
    Cc: Rusty Russell
    Signed-off-by: Steven Rostedt

    Steven Rostedt
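
The dispatch between the two trace paths can be sketched in user space. This is an illustration only: the `my_` names are invented, the two back ends merely count calls, and the macro relies on the GCC/Clang extensions `__builtin_constant_p` and `##__VA_ARGS__`. It is not the kernel's trace_printk() definition.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int bprintk_calls;            /* calls that stored only the pointer */
static int printk_calls;             /* calls that formatted immediately */
static const char *last_stored_fmt;  /* must stay valid until it is read */
static char last_text[128];

/* Fast path: keep the format pointer, defer the formatting. */
static void my_trace_bprintk(const char *fmt, ...)
{
    last_stored_fmt = fmt;
    bprintk_calls++;
}

/* Slow path: format right away, so no pointer outlives the call. */
static void my_trace_printk(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_text, sizeof(last_text), fmt, ap);
    va_end(ap);
    printk_calls++;
}

/* Pick the fast path only when the format is a compile-time constant. */
#define my_trace(fmt, ...)                              \
    do {                                                \
        if (__builtin_constant_p(fmt))                  \
            my_trace_bprintk(fmt, ##__VA_ARGS__);       \
        else                                            \
            my_trace_printk(fmt, ##__VA_ARGS__);        \
    } while (0)
```

Because the fast path stores only a pointer, the string it points to has to outlive the buffer entry. That is the property that breaks when the format lives in a module that is later unloaded.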
     
  • Since the OPP API is only useful with an appropriate SoC-specific
    implementation there is no point in offering the ability to enable
    the API on general systems. Provide an ARCH_HAS_OPP Kconfig symbol
    which masks out the option unless selected by an implementation.

    Signed-off-by: Mark Brown
    Acked-by: Nishanth Menon
    Acked-by: Kevin Hilman
    Signed-off-by: Rafael J. Wysocki

    Mark Brown
     
  • Heiko reported that the TASK_RUNNING check is not sufficient for
    CONFIG_PREEMPT=y since we can get preempted with !TASK_RUNNING.

    He suggested adding a ->se.on_rq test to the existing TASK_RUNNING
    one, however TASK_RUNNING will always have ->se.on_rq, so we might as
    well reduce that to a single test.

    [ stop tasks should never get preempted, but it's good to handle
    this case correctly should this ever happen ]

    Reported-by: Heiko Carstens
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra