06 Mar, 2010

1 commit

  • …nel/git/tip/linux-2.6-tip

    * 'perf-probes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Issue at least one memory barrier in stop_machine_text_poke()
    perf probe: Correct probe syntax on command line help
    perf probe: Add lazy line matching support
    perf probe: Show more lines after last line
    perf probe: Check function address range strictly in line finder
    perf probe: Use libdw callback routines
    perf probe: Use elfutils-libdw for analyzing debuginfo
    perf probe: Rename probe finder functions
    perf probe: Fix bugs in line range finder
    perf probe: Update perf probe document
    perf probe: Do not show --line option without dwarf support
    kprobes: Add documents of jump optimization
    kprobes/x86: Support kprobes jump optimization on x86
    x86: Add text_poke_smp for SMP cross modifying code
    kprobes/x86: Cleanup save/restore registers
    kprobes/x86: Boost probes when reentering
    kprobes: Jump optimization sysctl interface
    kprobes: Introduce kprobes jump optimization
    kprobes: Introduce generic insn_slot framework
    kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE

    Linus Torvalds
     

01 Mar, 2010

1 commit


26 Feb, 2010

1 commit

  • Add /proc/sys/debug/kprobes-optimization sysctl which enables
    and disables kprobes jump optimization on the fly for debugging.

    Changes in v7:
    - Remove ctl_name = CTL_UNNUMBERED for upstream compatibility.

    Changes in v6:
    - Update comments and coding style.

    Signed-off-by: Masami Hiramatsu
    Cc: systemtap
    Cc: DLE
    Cc: Ananth N Mavinakayanahalli
    Cc: Jim Keniston
    Cc: Srikar Dronamraju
    Cc: Christoph Hellwig
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Anders Kaseorg
    Cc: Tim Abbott
    Cc: Andi Kleen
    Cc: Jason Baron
    Cc: Mathieu Desnoyers
    Cc: Frederic Weisbecker
    Cc: Ananth N Mavinakayanahalli
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

18 Dec, 2009

2 commits


17 Dec, 2009

1 commit

  • In NOMMU mode clamp dac_mmap_min_addr to zero to cause the tests on it to be
    skipped by the compiler. We do this as the minimum mmap address doesn't make
    any sense in NOMMU mode.

    mmap_min_addr and round_hint_to_min() can be discarded entirely in NOMMU mode.

    Signed-off-by: David Howells
    Acked-by: Eric Paris
    Signed-off-by: James Morris

    David Howells
     

16 Dec, 2009

2 commits

  • Jan Engelhardt reported we have this problem:

    setting max_map_count to a value large enough results in programs dying at
    first try. This is on 2.6.31.6:

    15:59 borg:/proc/sys/vm # echo $[1<max_map_count
    15:59 borg:/proc/sys/vm # cat max_map_count
    1073741824
    15:59 borg:/proc/sys/vm # echo $[1<max_map_count
    15:59 borg:/proc/sys/vm # cat max_map_count
    Killed

    This is because we have a chance to make 'max_map_count' negative. but
    it's meaningless. Make it only accept non-negative values.

    Reported-by: Jan Engelhardt
    Signed-off-by: WANG Cong
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: James Morris
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • This patch derives a "nodes_allowed" node mask from the numa mempolicy of
    the task modifying the number of persistent huge pages to control the
    allocation, freeing and adjusting of surplus huge pages when the pool page
    count is modified via the new sysctl or sysfs attribute
    "nr_hugepages_mempolicy". The nodes_allowed mask is derived as follows:

    * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
    is produced. This will cause the hugetlb subsystem to use
    node_online_map as the "nodes_allowed". This preserves the
    behavior before this patch.
    * For "preferred" mempolicy, including explicit local allocation,
    a nodemask with the single preferred node will be produced.
    "local" policy will NOT track any internode migrations of the
    task adjusting nr_hugepages.
    * For "bind" and "interleave" policy, the mempolicy's nodemask
    will be used.
    * Other than to inform the construction of the nodes_allowed node
    mask, the actual mempolicy mode is ignored. That is, all modes
    behave like interleave over the resulting nodes_allowed mask
    with no "fallback".

    See the updated documentation [next patch] for more information
    about the implications of this patch.

    Examples:

    Starting with:

    Node 0 HugePages_Total: 0
    Node 1 HugePages_Total: 0
    Node 2 HugePages_Total: 0
    Node 3 HugePages_Total: 0

    Default behavior [with or without this patch] balances persistent
    hugepage allocation across nodes [with sufficient contiguous memory]:

    sysctl vm.nr_hugepages[_mempolicy]=32

    yields:

    Node 0 HugePages_Total: 8
    Node 1 HugePages_Total: 8
    Node 2 HugePages_Total: 8
    Node 3 HugePages_Total: 8

    Of course, we only have nr_hugepages_mempolicy with the patch,
    but with default mempolicy, nr_hugepages_mempolicy behaves the
    same as nr_hugepages.

    Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
    '--membind' because it allows multiple nodes to be specified
    and it's easy to type]--we can allocate huge pages on
    individual nodes or sets of nodes. So, starting from the
    condition above, with 8 huge pages per node, add 8 more to
    node 2 using:

    numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40

    This yields:

    Node 0 HugePages_Total: 8
    Node 1 HugePages_Total: 8
    Node 2 HugePages_Total: 16
    Node 3 HugePages_Total: 8

    The incremental 8 huge pages were restricted to node 2 by the
    specified mempolicy.

    Similarly, we can use mempolicy to free persistent huge pages
    from specified nodes:

    numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32

    yields:

    Node 0 HugePages_Total: 4
    Node 1 HugePages_Total: 4
    Node 2 HugePages_Total: 16
    Node 3 HugePages_Total: 8

    The 8 huge pages freed were balanced over nodes 0 and 1.

    [rientjes@google.com: accomodate reworked NODEMASK_ALLOC]
    Signed-off-by: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Acked-by: Mel Gorman
    Reviewed-by: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: Randy Dunlap
    Cc: Nishanth Aravamudan
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

13 Dec, 2009

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
    sched: Remove forced2_migrations stats
    sched: Fix memory leak in two error corner cases
    sched: Fix build warning in get_update_sysctl_factor()
    sched: Update normalized values on user updates via proc
    sched: Make tunable scaling style configurable
    sched: Fix missing sched tunable recalculation on cpu add/remove
    sched: Fix task priority bug
    sched: cgroup: Implement different treatment for idle shares
    sched: Remove unnecessary RCU exclusion
    sched: Discard some old bits
    sched: Clean up check_preempt_wakeup()
    sched: Move update_curr() in check_preempt_wakeup() to avoid redundant call
    sched: Sanitize fork() handling
    sched: Clean up ttwu() rq locking
    sched: Remove rq->clock coupling from set_task_cpu()
    sched: Consolidate select_task_rq() callers
    sched: Remove sysctl.sched_features
    sched: Protect sched_rr_get_param() access to task->sched_class
    sched: Protect task->cpus_allowed access in sched_getaffinity()
    sched: Fix balance vs hotplug race
    ...

    Fixed up conflicts in kernel/sysctl.c (due to sysctl cleanup)

    Linus Torvalds
     

09 Dec, 2009

3 commits

  • The normalized values are also recalculated in case the scaling factor
    changes.

    This patch updates the internally used scheduler tuning values that are
    normalized to one cpu in case a user sets new values via sysfs.

    Together with patch 2 of this series this allows to let user configured
    values scale (or not) to cpu add/remove events taking place later.

    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    [ v2: fix warning ]
    Signed-off-by: Ingo Molnar

    Christian Ehrhardt
     
  • As scaling now takes place on all kind of cpu add/remove events a user
    that configures values via proc should be able to configure if his set
    values are still rescaled or kept whatever happens.

    As the comments state that log2 was just a second guess that worked the
    interface is not just designed for on/off, but to choose a scaling type.
    Currently this allows none, log and linear, but more important it allwos
    us to keep the interface even if someone has an even better idea how to
    scale the values.

    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Christian Ehrhardt
     
  • Since we've had a much saner debugfs interface to this, remove the
    sysctl one.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    [ v2: build fix ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

08 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
    security/tomoyo: Remove now unnecessary handling of security_sysctl.
    security/tomoyo: Add a special case to handle accesses through the internal proc mount.
    sysctl: Drop & in front of every proc_handler.
    sysctl: Remove CTL_NONE and CTL_UNNUMBERED
    sysctl: kill dead ctl_handler definitions.
    sysctl: Remove the last of the generic binary sysctl support
    sysctl net: Remove unused binary sysctl code
    sysctl security/tomoyo: Don't look at ctl_name
    sysctl arm: Remove binary sysctl support
    sysctl x86: Remove dead binary sysctl support
    sysctl sh: Remove dead binary sysctl support
    sysctl powerpc: Remove dead binary sysctl support
    sysctl ia64: Remove dead binary sysctl support
    sysctl s390: Remove dead sysctl binary support
    sysctl frv: Remove dead binary sysctl support
    sysctl mips/lasat: Remove dead binary sysctl support
    sysctl drivers: Remove dead binary sysctl support
    sysctl crypto: Remove dead binary sysctl support
    sysctl security/keys: Remove dead binary sysctl support
    sysctl kernel: Remove binary sysctl logic
    ...

    Linus Torvalds
     

06 Dec, 2009

1 commit


19 Nov, 2009

1 commit


12 Nov, 2009

1 commit


11 Nov, 2009

3 commits


06 Nov, 2009

1 commit


24 Sep, 2009

4 commits

  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
    HWPOISON: Enable error_remove_page on btrfs
    HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
    HWPOISON: Add madvise() based injector for hardware poisoned pages v4
    HWPOISON: Enable error_remove_page for NFS
    HWPOISON: Enable .remove_error_page for migration aware file systems
    HWPOISON: The high level memory error handler in the VM v7
    HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
    HWPOISON: shmem: call set_page_dirty() with locked page
    HWPOISON: Define a new error_remove_page address space op for async truncation
    HWPOISON: Add invalidate_inode_page
    HWPOISON: Refactor truncate to allow direct truncating of page v2
    HWPOISON: check and isolate corrupted free pages v2
    HWPOISON: Handle hardware poisoned pages in try_to_unmap
    HWPOISON: Use bitmask/action code for try_to_unmap behaviour
    HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
    HWPOISON: Add poison check to page fault handling
    HWPOISON: Add basic support for poisoned pages in fault handler v3
    HWPOISON: Add new SIGBUS error codes for hardware poison signals
    HWPOISON: Add support for poison swap entries v2
    HWPOISON: Export some rmap vma locking to outside world
    ...

    Linus Torvalds
     
  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places at arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Introduce core pipe limiting sysctl.

    Since we can dump cores to pipe, rather than directly to the filesystem,
    we create a condition in which a user can create a very high load on the
    system simply by running bad applications.

    If the pipe reader specified in core_pattern is poorly written, we can
    have lots of ourstandig resources and processes in the system.

    This sysctl introduces an ability to limit that resource consumption.
    core_pipe_limit defines how many in-flight dumps may be run in parallel,
    dumps beyond this value are skipped and a note is made in the kernel log.
    A special value of 0 in core_pipe_limit denotes unlimited core dumps may
    be handled (this is the default value).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Neil Horman
    Reported-by: Earl Chew
    Cc: Oleg Nesterov
    Cc: Andi Kleen
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
     
  • * remove asm/atomic.h inclusion from linux/utsname.h --
    not needed after kref conversion
    * remove linux/utsname.h inclusion from files which do not need it

    NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
    due to some personality stuff it _is_ needed -- cowardly leave ELF-related
    headers and files alone.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

23 Sep, 2009

1 commit

  • When syslog is not possible, at the same time there's no serial/net
    console available, it will be hard to read the printk messages. For
    example oops/panic/warning messages in shutdown phase.

    Add a printk delay feature, we can make each printk message delay some
    milliseconds.

    Setting the delay by proc/sysctl interface: /proc/sys/kernel/printk_delay

    The value range from 0 - 10000, default value is 0

    [akpm@linux-foundation.org: fix a few things]
    Signed-off-by: Dave Young
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     

22 Sep, 2009

1 commit

  • Decouple kernel.h from ratelimit.h: the global declaration of
    printk's ratelimit_state is not needed, and it leads to messy
    circular dependencies due to ratelimit.h's (new) adding of a
    spinlock_types.h include.

    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: David S. Miller
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in one, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks goes to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility is not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

16 Sep, 2009

2 commits

  • Add the high level memory handler that poisons pages
    that got corrupted by hardware (typically by a two bit flip in a DIMM
    or a cache) on the Linux level. The goal is to prevent everyone
    from accessing these pages in the future.

    This done at the VM level by marking a page hwpoisoned
    and doing the appropriate action based on the type of page
    it is.

    The code that does this is portable and lives in mm/memory-failure.c

    To quote the overview comment:

    High level machine check handler. Handles pages reported by the
    hardware as being corrupted usually due to a 2bit ECC memory or cache
    failure.

    This focuses on pages detected as corrupted in the background.
    When the current CPU tries to consume corruption the currently
    running process can just be killed directly instead. This implies
    that if the error cannot be handled for some reason it's safe to
    just ignore it because no corruption has been consumed yet. Instead
    when that happens another machine check will happen.

    Handles page cache pages in various states. The tricky part
    here is that we can access any page asynchronous to other VM
    users, because memory failures could happen anytime and anywhere,
    possibly violating some of their assumptions. This is why this code
    has to be extremely careful. Generally it tries to use normal locking
    rules, as in get the standard locks, even if that means the
    error handling takes potentially a long time.

    Some of the operations here are somewhat inefficient and have non
    linear algorithmic complexity, because the data structures have not
    been optimized for this case. This is in particular the case
    for the mapping from a vma to a process. Since this case is expected
    to be rare we hope we can get away with this.

    There are in principle two strategies to kill processes on poison:
    - just unmap the data and wait for an actual reference before
    killing
    - kill as soon as corruption is detected.
    Both have advantages and disadvantages and should be used
    in different situations. Right now both are implemented and can
    be switched with a new sysctl vm.memory_failure_early_kill
    The default is early kill.

    The patch does some rmap data structure walking on its own to collect
    processes to kill. This is unusual because normally all rmap data structure
    knowledge is in rmap.c only. I put it here for now to keep
    everything together and rmap knowledge has been seeping out anyways

    Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu,
    Nick Piggin (who did a lot of great work) and others.

    Cc: npiggin@suse.de
    Cc: riel@redhat.com
    Signed-off-by: Andi Kleen
    Acked-by: Rik van Riel
    Reviewed-by: Hidehiro Kawai

    Andi Kleen
     
  • kernel/built-in.o:(.data+0x17b0): undefined reference to `blk_iopoll_enabled'

    Since the extern declaration makes the compile work, but the actual
    symbol is missing when block/blk-iopoll.o isn't linked in.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Sep, 2009

1 commit

  • * 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
    block: use blkdev_issue_discard in blk_ioctl_discard
    Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
    block: don't assume device has a request list backing in nr_requests store
    block: Optimal I/O limit wrapper
    cfq: choose a new next_req when a request is dispatched
    Seperate read and write statistics of in_flight requests
    aoe: end barrier bios with EOPNOTSUPP
    block: trace bio queueing trial only when it occurs
    block: enable rq CPU completion affinity by default
    cfq: fix the log message after dispatched a request
    block: use printk_once
    cciss: memory leak in cciss_init_one()
    splice: update mtime and atime on files
    block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
    cfq-iosched: get rid of must_alloc flag
    block: use interrupts disabled version of raise_softirq_irqoff()
    block: fix comment in blk-iopoll.c
    block: adjust default budget for blk-iopoll
    block: fix long lines in block/blk-iopoll.c
    block: add blk-iopoll, a NAPI like approach for block devices
    ...

    Linus Torvalds
     

12 Sep, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
    sched: Fix sched::sched_stat_wait tracepoint field
    sched: Disable NEW_FAIR_SLEEPERS for now
    sched: Keep kthreads at default priority
    sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
    sched: Turn off child_runs_first
    sched: Ensure that a child can't gain time over it's parent after fork()
    sched: enable SD_WAKE_IDLE
    sched: Deal with low-load in wake_affine()
    sched: Remove short cut from select_task_rq_fair()
    sched: Turn on SD_BALANCE_NEWIDLE
    sched: Clean up topology.h
    sched: Fix dynamic power-balancing crash
    sched: Remove reciprocal for cpu_power
    sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
    sched: Try to deal with low capacity
    sched: Scale down cpu_power due to RT tasks
    sched: Implement dynamic cpu_power
    sched: Add smt_gain
    sched: Update the cpu_power sum during load-balance
    sched: Add SD_PREFER_SIBLING
    ...

    Linus Torvalds
     

11 Sep, 2009

1 commit

  • This borrows some code from NAPI and implements a polled completion
    mode for block devices. The idea is the same as NAPI - instead of
    doing the command completion when the irq occurs, schedule a dedicated
    softirq in the hopes that we will complete more IO when the iopoll
    handler is invoked. Devices have a budget of commands assigned, and will
    stay in polled mode as long as they continue to consume their budget
    from the iopoll softirq handler. If they do not, the device is set back
    to interrupt completion mode.

    This patch holds the core bits for blk-iopoll, device driver support
    sold separately.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

09 Sep, 2009

1 commit

  • Set child_runs_first default to off.

    It hurts 'optimal' make -j workloads as make jobs
    get preempted by child tasks, reducing parallelism.

    Note, this patch might make existing races in user
    applications more prominent than before - so breakages
    might be bisected to this commit.

    Child-runs-first is broken on SMP to begin with, and we
    already had it off briefly in v2.6.23 so most of the
    offenders ought to be fixed. Would be nice not to revert
    this commit but fix those apps finally ...

    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    LKML-Reference:
    [ made the sysctl independent of CONFIG_SCHED_DEBUG, in case
    people want to work around broken apps. ]
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

07 Sep, 2009

1 commit


04 Sep, 2009

1 commit

  • Keep an average on the amount of time spend on RT tasks and use
    that fraction to scale down the cpu_power for regular tasks.

    Signed-off-by: Peter Zijlstra
    Tested-by: Andreas Herrmann
    Acked-by: Andreas Herrmann
    Acked-by: Gautham R Shenoy
    Cc: Balbir Singh
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

17 Aug, 2009

1 commit

  • Currently SELinux enforcement of controls on the ability to map low memory
    is determined by the mmap_min_addr tunable. This patch causes SELinux to
    ignore the tunable and instead use a seperate Kconfig option specific to how
    much space the LSM should protect.

    The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
    permissions will always protect the amount of low memory designated by
    CONFIG_LSM_MMAP_MIN_ADDR.

    This allows users who need to disable the mmap_min_addr controls (usual reason
    being they run WINE as a non-root user) to do so and still have SELinux
    controls preventing confined domains (like a web server) from being able to
    map some area of low memory.

    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris
     

29 Jun, 2009

1 commit

  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, delay: tsc based udelay should have rdtsc_barrier
    x86, setup: correct include file in <asm/boot.h>
    x86, setup: Fix typo "CONFIG_x86_64" in <asm/boot.h>
    x86, mce: percpu mcheck_timer should be pinned
    x86: Add sysctl to allow panic on IOCK NMI error
    x86: Fix uv bau sending buffer initialization
    x86, mce: Fix mce resume on 32bit
    x86: Move init_gbpages() to setup_arch()
    x86: ensure percpu lpage doesn't consume too much vmalloc space
    x86: implement percpu_alloc kernel parameter
    x86: fix pageattr handling for lpage percpu allocator and re-enable it
    x86: reorganize cpa_process_alias()
    x86: prepare setup_pcpu_lpage() for pageattr fix
    x86: rename remap percpu first chunk allocator to lpage
    x86: fix duplicate free in setup_pcpu_remap() failure path
    percpu: fix too lazy vunmap cache flushing
    x86: Set cpu_llc_id on AMD CPUs

    Linus Torvalds
     

26 Jun, 2009

1 commit

  • This patch introduces a new sysctl:

    /proc/sys/kernel/panic_on_io_nmi

    which defaults to 0 (off).

    When enabled, the kernel panics when the kernel receives an NMI
    caused by an IO error.

    The IO error triggered NMI indicates a serious system
    condition, which could result in IO data corruption. Rather
    than contiuing, panicing and dumping might be a better choice,
    so one can figure out what's causing the IO error.

    This could be especially important to companies running IO
    intensive applications where corruption must be avoided, e.g. a
    bank's databases.

    [ SuSE has been shipping it for a while, it was done at the
    request of a large database vendor, for their users. ]

    Signed-off-by: Kurt Garloff
    Signed-off-by: Roberto Angelino
    Signed-off-by: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Kurt Garloff
     

23 Jun, 2009

1 commit

  • Poornima Nayek reported:

    | Timer migration interface /proc/sys/kernel/timer_migration in
    | 2.6.30-git9 accepts any numerical value as input.
    |
    | Steps to reproduce:
    | 1. echo -6666666 > /proc/sys/kernel/timer_migration
    | 2. cat /proc/sys/kernel/timer_migration
    | -6666666
    |
    | 1. echo 44444444444444444444444444444444444444444444444444444444444 > /proc/sys/kernel/timer_migration
    | 2. cat /proc/sys/kernel/timer_migration
    | -1357789412
    |
    | Expected behavior: Should 'echo: write error: Invalid argument' while
    | setting any value other then 0 & 1

    Restrict valid values to 0 and 1.

    Reported-by: Poornima Nayak
    Tested-by: Poornima Nayak
    Signed-off-by: Arun R Bharadwaj
    Cc: poornima nayak
    Cc: Arun Bharadwaj
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arun R Bharadwaj
     

19 Jun, 2009

1 commit

  • Remoce the unused variable 'val' from __do_proc_dointvec()

    The integer has been declared and used as 'val = -val' and there is no
    reference to it anywhere.

    Signed-off-by: Sukanto Ghosh
    Cc: Jaswinder Singh Rajput
    Cc: Sukanto Ghosh
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukanto Ghosh