09 Dec, 2009

3 commits

  • Currently we try to do task placement in wake_up_new_task() after we do
    the load-balance pass in sched_fork(). This yields complicated semantics
    in that we have to deal with tasks on different RQs and the
    set_task_cpu() calls in copy_process() and sched_fork().

    Rename ->task_new() to ->task_fork() and call it from sched_fork()
    before the balancing; this gives the policy a clear point to place the
    task.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since we've had a much saner debugfs interface to this, remove the
    sysctl one.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    [ v2: build fix ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • sched_rr_get_param calls
    task->sched_class->get_rr_interval(task) without protection
    against a concurrent sched_setscheduler() call which modifies
    task->sched_class.

    Serialize the access with task_rq_lock(task) and hand the rq
    pointer into get_rr_interval() as it's needed at least in the
    sched_fair implementation.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

07 Dec, 2009

1 commit

  • Since (e761b77: cpu hotplug, sched: Introduce cpu_active_map and redo
    sched domain managment) we have cpu_active_mask which is supposed to rule
    scheduler migration and load-balancing, except it never (fully) did.

    The particular problem being solved here is a crash in try_to_wake_up()
    where select_task_rq() ends up selecting an offline cpu because
    select_task_rq_fair() trusts the sched_domain tree to reflect the
    current state of affairs; similarly, select_task_rq_rt() trusts the
    root_domain.

    However, the sched_domains are updated from CPU_DEAD, which is after the
    cpu is taken offline and after stop_machine is done. Therefore it can
    race perfectly well with code assuming the domains are right.

    Cure this by building the domains from cpu_active_mask on
    CPU_DOWN_PREPARE.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
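
The effect of the fix can be pictured with a small userspace sketch. This is an illustration only, not the kernel's select_task_rq(): CPU sets are modelled as plain 64-bit masks, and pick_active_cpu() is a hypothetical helper. Once a CPU's bit is cleared from the active mask, which the commit above arranges to be reflected in the domains at CPU_DOWN_PREPARE, placement can no longer choose that CPU even if the task's affinity still allows it.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the lowest-numbered CPU that is both
 * allowed for the task and currently active, or -1 if there is none.
 * Bit n of each mask stands for CPU n.
 */
static int pick_active_cpu(uint64_t allowed, uint64_t active)
{
	uint64_t candidates = allowed & active;
	int cpu;

	for (cpu = 0; cpu < 64; cpu++) {
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1; /* no usable CPU; the caller needs a fallback */
}
```

With allowed = 0xF and active = 0xF the helper returns CPU 0; after clearing bit 0 of active (CPU 0 going down) it returns CPU 1 instead of the CPU that is about to disappear.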
     

06 Dec, 2009

16 commits

  • * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size
    x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h
    x86/alternatives: Check replacementlen t use the strict copy checks when branch profiling is in use
    x86, 64-bit: Move K8 B step iret fixup to fault entry asm
    x86: Generate cmpxchg build failures
    x86: Add a Kconfig option to turn the copy_from_user warnings into errors
    x86: Turn the copy_from_user check into an (optional) compile time warning
    x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
    x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
    sched, cputime: Introduce thread_group_times()
    sched, cputime: Cleanups related to task_times()
    Revert "sched, x86: Optimize branch hint in __switch_to()"
    sched: Fix isolcpus boot option
    sched: Revert 498657a478c60be092208422fefa9c7b248729c2
    sched, time: Define nsecs_to_jiffies()
    sched: Remove task_{u,s,g}time()
    sched: Introduce task_times() to replace task_{u,s}time() pair
    sched: Limit the number of scheduler debug messages
    sched.c: Call debug_show_all_locks() when dumping all tasks
    sched, x86: Optimize branch hint in __switch_to()
    sched: Optimize branch hint in context_switch()
    sched: Optimize branch hint in pick_next_task_fair()
    sched_feat_write(): Update ppos instead of file->f_pos
    sched: Sched_rt_periodic_timer vs cpu hotplug
    sched, kvm: Fix race condition involving sched_in_preempt_notifers
    sched: More generic WAKE_AFFINE vs select_idle_sibling()
    sched: Cleanup select_task_rq_fair()
    sched: Fix granularity of task_u/stime()
    sched: Fix/add missing update_rq_clock() calls
    ...

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (470 commits)
    x86: Fix comments of register/stack access functions
    perf tools: Replace %m with %a in sscanf
    hw-breakpoints: Keep track of user disabled breakpoints
    tracing/syscalls: Make syscall events print callbacks static
    tracing: Add DEFINE_EVENT(), DEFINE_SINGLE_EVENT() support to docbook
    perf: Don't free perf_mmap_data until work has been done
    perf_event: Fix compile error
    perf tools: Fix _GNU_SOURCE macro related strndup() build error
    trace_syscalls: Remove unused syscall_name_to_nr()
    trace_syscalls: Simplify syscall profile
    trace_syscalls: Remove duplicate init_enter_##sname()
    trace_syscalls: Add syscall_nr field to struct syscall_metadata
    trace_syscalls: Remove enter_id exit_id
    trace_syscalls: Set event_enter_##sname->data to its metadata
    trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit
    perf_event: Initialize data.period in perf_swevent_hrtimer()
    perf probe: Simplify event naming
    perf probe: Add --list option for listing current probe events
    perf probe: Add argv_split() from lib/argv_split.c
    perf probe: Move probe event utility functions to probe-event.c
    ...

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits)
    tracing: Separate raw syscall from syscall tracer
    ring-buffer-benchmark: Add parameters to set produce/consumer priorities
    tracing, function tracer: Clean up strstrip() usage
    ring-buffer benchmark: Run producer/consumer threads at nice +19
    tracing: Remove the stale include/trace/power.h
    tracing: Only print objcopy version warning once from recordmcount
    tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used
    ring-buffer: Move access to commit_page up into function used
    tracing: do not disable interrupts for trace_clock_local
    ring-buffer: Add multiple iterations between benchmark timestamps
    kprobes: Sanitize struct kretprobe_instance allocations
    tracing: Fix to use __always_unused attribute
    compiler: Introduce __always_unused
    tracing: Exit with error if a weak function is used in recordmcount.pl
    tracing: Move conditional into update_funcs() in recordmcount.pl
    tracing: Add regex for weak functions in recordmcount.pl
    tracing: Move mcount section search to front of loop in recordmcount.pl
    tracing: Fix objcopy revision check in recordmcount.pl
    tracing: Check absolute path of input file in recordmcount.pl
    tracing: Correct the check for number of arguments in recordmcount.pl
    ...

    Linus Torvalds
     
  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: Fix trace_marker output
    tracing: Fix event format export
    tracing: Fix return value of tracing_stats_read()

    Linus Torvalds
     
  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    genirq: Fix spurious irq seqfile conversion
    genirq: switch /proc/irq/*/spurious to seq_file
    irq: Do not attempt to create subdirectories if /proc/irq/ failed
    irq: Remove unused debug_poll_all_shared_irqs()
    irq: Fix docbook comments
    irq: trivial: Fix typo in comment for #endif

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'core-signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    signal: Print warning message when dropping signals
    signal: Fix alternate signal stack check

    Linus Torvalds
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
    rcu: Make RCU's CPU-stall detector be default
    rcu: Add expedited grace-period support for preemptible RCU
    rcu: Enable fourth level of TREE_RCU hierarchy
    rcu: Rename "quiet" functions
    rcu: Re-arrange code to reduce #ifdef pain
    rcu: Eliminate unneeded function wrapping
    rcu: Fix grace-period-stall bug on large systems with CPU hotplug
    rcu: Eliminate __rcu_pending() false positives
    rcu: Further cleanups of use of lastcomp
    rcu: Simplify association of forced quiescent states with grace periods
    rcu: Accelerate callback processing on CPUs not detecting GP end
    rcu: Mark init-time-only rcu_bootup_announce() as __init
    rcu: Simplify association of quiescent states with grace periods
    rcu: Rename dynticks_completed to completed_fqs
    rcu: Enable synchronize_sched_expedited() fastpath
    rcu: Remove inline from forward-referenced functions
    rcu: Fix note_new_gpnum() uses of ->gpnum
    rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter
    rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter
    rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls
    ...

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    ratelimit: Make suppressed output messages more useful
    printk: Remove ratelimit.h from kernel.h
    ratelimit: Fix/allow use in atomic contexts
    ratelimit: Use per ratelimit context locking

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    mutex: Fix missing conditions to build mutex_spin_on_owner()
    mutex: Better control mutex adaptive spinning config
    locking, task_struct: Reduce size on TRACE_IRQFLAGS and 64bit
    locking: Use __[SPIN|RW]_LOCK_UNLOCKED in [spin|rw]_lock_init()
    locking: Remove unused prototype
    locking: Reduce ifdefs in kernel/spinlock.c
    locking: Make inlining decision Kconfig based

    Linus Torvalds
     
  • * 'core-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    generic-ipi: Add smp_call_function_any()
    generic-ipi: Fix misleading smp_call_function*() description

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
    x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
    x86/amd-iommu: Remove amd_iommu_pd_table
    x86/amd-iommu: Move reset_iommu_command_buffer out of locked code
    x86/amd-iommu: Cleanup DTE flushing code
    x86/amd-iommu: Introduce iommu_flush_device() function
    x86/amd-iommu: Cleanup attach/detach_device code
    x86/amd-iommu: Keep devices per domain in a list
    x86/amd-iommu: Add device bind reference counting
    x86/amd-iommu: Use dev->arch->iommu to store iommu related information
    x86/amd-iommu: Remove support for domain sharing
    x86/amd-iommu: Rearrange dma_ops related functions
    x86/amd-iommu: Move some pte allocation functions in the right section
    x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc
    x86/amd-iommu: Use get_device_id and check_device where appropriate
    x86/amd-iommu: Move find_protection_domain to helper functions
    x86/amd-iommu: Simplify get_device_resources()
    x86/amd-iommu: Let domain_for_device handle aliases
    x86/amd-iommu: Remove iommu specific handling from dma_ops path
    x86/amd-iommu: Remove iommu parameter from __(un)map_single
    x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (31 commits)
    GFS2: Fix glock refcount issues
    writeback: remove unused nonblocking and congestion checks (gfs2)
    GFS2: drop rindex glock to refresh rindex list
    GFS2: Tag all metadata with jid
    GFS2: Locking order fix in gfs2_check_blk_state
    GFS2: Remove dirent_first() function
    GFS2: Display nobarrier option in /proc/mounts
    GFS2: add barrier/nobarrier mount options
    GFS2: remove division from new statfs code
    GFS2: Improve statfs and quota usability
    GFS2: Use dquot_send_warning()
    VFS: Export dquot_send_warning
    GFS2: Add set_xquota support
    GFS2: Add get_xquota support
    GFS2: Clean up gfs2_adjust_quota() and do_glock()
    GFS2: Remove constant argument from qd_get()
    GFS2: Remove constant argument from qdsb_get()
    GFS2: Add proper error reporting to quota sync via sysfs
    GFS2: Add get_xstate quota function
    GFS2: Remove obsolete code in quota.c
    ...

    Linus Torvalds
     
  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (30 commits)
    TOMOYO: Add recursive directory matching operator support.
    remove CONFIG_SECURITY_FILE_CAPABILITIES compile option
    SELinux: print denials for buggy kernel with unknown perms
    Silence the existing API for capability version compatibility check.
    LSM: Move security_path_chmod()/security_path_chown() to after mutex_lock().
    SELinux: header generation may hit infinite loop
    selinux: Fix warnings
    security: report the module name to security_module_request
    Config option to set a default LSM
    sysctl: require CAP_SYS_RAWIO to set mmap_min_addr
    tpm: autoload tpm_tis based on system PnP IDs
    tpm_tis: TPM_STS_DATA_EXPECT workaround
    define convenient securebits masks for prctl users (v2)
    tpm: fix header for modular build
    tomoyo: improve hash bucket dispersion
    tpm add default function definitions
    LSM: imbed ima calls in the security hooks
    SELinux: add .gitignore files for dynamic classes
    security: remove root_plug
    SELinux: fix locking issue introduced with c6d3aaa4e35c71a3
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6: (50 commits)
    pcmcia: rework the irq_req_t typedef
    pcmcia: remove deprecated handle_to_dev() macro
    pcmcia: pcmcia_request_window() doesn't need a pointer to a pointer
    pcmcia: remove unused "window_t" typedef
    pcmcia: move some window-related code to pcmcia_ioctl.c
    pcmcia: Change window_handle_t logic to unsigned long
    pcmcia: Pass struct pcmcia_socket to pcmcia_get_mem_page()
    pcmcia: Pass struct pcmcia_device to pcmcia_map_mem_page()
    pcmcia: Pass struct pcmcia_device to pcmcia_release_window()
    drivers/pcmcia: remove unnecessary kzalloc
    pcmcia: correct handling for Zoomed Video registers in topic.h
    pcmcia: fix printk formats
    pcmcia: autoload module pcmcia
    pcmcia/staging: update comedi drivers
    PCMCIA: stop duplicating pci_irq in soc_pcmcia_socket
    PCMCIA: ss: allow PCI IRQs > 255
    PCMCIA: soc_common: remove 'dev' member from soc_pcmcia_socket
    PCMCIA: soc_common: constify soc_pcmcia_socket ops member
    PCMCIA: sa1111: remove duplicated initializers
    PCMCIA: sa1111: wrap soc_pcmcia_socket to contain sa1111 specific data
    ...

    Linus Torvalds
     
  • Starting with version 4.5, GCC has a new built-in function
    __builtin_unreachable() that can be used in places like the kernel's
    BUG() where inline assembly is used to transfer control flow. This
    eliminates the need for an endless loop in these places.

    The patch adds a new macro 'unreachable()' that will expand to either
    __builtin_unreachable() or an endless loop depending on the compiler
    version.

    Change from v1: Simplify unreachable() for non-GCC 4.5 case.

    Signed-off-by: David Daney
    Acked-by: Ralf Baechle
    Signed-off-by: Linus Torvalds

    David Daney
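
A minimal userspace sketch of the idea described above, assuming a GCC-compatible version check. The macro name my_unreachable() and the function sign_of() are hypothetical and chosen only for illustration; this is not the kernel's actual definition.

```c
/*
 * Hypothetical stand-in for the unreachable() macro: use the compiler
 * built-in where GCC >= 4.5 provides it, otherwise fall back to an
 * endless loop so the compiler still sees that control never continues.
 */
#if defined(__GNUC__) && \
	(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define my_unreachable() __builtin_unreachable()
#else
#define my_unreachable() do { } while (1)
#endif

/* Every int is handled by one of the three branches below. */
static int sign_of(int x)
{
	if (x > 0)
		return 1;
	if (x < 0)
		return -1;
	if (x == 0)
		return 0;
	/* Never reached; tells the compiler no return value is needed here. */
	my_unreachable();
}
```

The marker is only safe where control truly cannot arrive, as in sign_of() above; reaching __builtin_unreachable() at run time is undefined behaviour.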
     

04 Dec, 2009

1 commit


03 Dec, 2009

8 commits

  • There are two spare fields in the header common to all GFS2
    metadata. One is just the right size to fit a journal id
    in it, and this patch updates the journal code so that each
    time a metadata block is modified, we tag it with the journal
    id of the node which is performing the modification.

    The reason for this is that it should make it much easier to
    debug issues which arise if we can tell which node was the
    last to modify a particular metadata block.

    Since the field is updated before the block is written into
    the journal, each journal should only contain metadata which
    is tagged with its own journal id. The one exception to this
    is the journal header block, which might have a different node's
    id in it, if that journal was recovered by another node in the
    cluster.

    Thus each journal will contain a record of which nodes recovered
    it, via the journal header.

    The other field in the metadata header could potentially be
    used to hold information about what kind of operation was
    performed, but for the time being we just zero it on each
    transaction so that if we use it for that in future, we'll
    know that the information (where it exists) is reliable.

    I did consider using the other field to hold the journal
    sequence number, however since in GFS2's journaling we write
    the modified data into the journal and not the original
    data, this gives no information as to what action caused the
    modification, so I think we can probably come up with a better
    use for those 64 bits in the future.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
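
The tagging step can be sketched as follows. The struct layout and every name in it are hypothetical simplifications for illustration (on-disk byte order is ignored); this is not the GFS2 on-disk format definition.

```c
#include <stdint.h>

/* Hypothetical, simplified metadata header with two formerly spare fields. */
struct meta_header {
	uint32_t magic;
	uint32_t type;
	uint64_t spare;  /* 64-bit spare field, zeroed on each transaction */
	uint32_t format;
	uint32_t jid;    /* journal id of the node that last modified the block */
};

/* Called whenever a metadata block is modified, before it is journaled. */
static void tag_modified_block(struct meta_header *mh, uint32_t jid)
{
	mh->jid = jid;  /* record which node made the change */
	mh->spare = 0;  /* keep the other field reliably zero for future use */
}
```

Because the tag is written before the block goes into the journal, a block found with a given jid was last modified by the node that owns that journal.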
     
  • Sending a message to userspace in a generic format to warn
    of events (e.g. quota exceeded) in the quota subsystem is
    a generically useful feature. This patch makes some minor
    changes to the send_message function from dquot.c, renaming
    it quota_send_message, moving it to quota.c and exporting it
    for use by filesystems which do not use the dquot code.

    Signed-off-by: Steven Whitehouse

    Steven Whitehouse
     
  • This is required for cluster filesystems which want to use
    cached ACLs so that they can invalidate the cache when
    required.

    Signed-off-by: Steven Whitehouse
    Cc: Alexander Viro
    Cc: Christoph Hellwig

    Steven Whitehouse
     
  • James Morris
     
  • Maybe 4.1.0 doesn't either, but this fixed it for me.

    Caused by:

    4a31276: x86: Turn the copy_from_user check into an (optional) compile time warning
    9f0cf4a: x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()

    Signed-off-by: Andrew Morton
    Cc: Arjan van de Ven
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Andrew Morton
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
    mfd: Correct WM831X_MAX_ISEL_VALUE

    Linus Torvalds
     
  • This is a real fix for the problem of utime/stime values decreasing,
    described in the thread:

    http://lkml.org/lkml/2009/11/3/522

    Now cputime is accounted in the following way:

    - {u,s}time in task_struct are increased every time the thread
    is interrupted by a tick (timer interrupt).

    - When a thread exits, its {u,s}time are added to signal->{u,s}time,
    after being adjusted by task_times().

    - When all threads in a thread_group exit, the accumulated {u,s}time
    (and also c{u,s}time) in the signal struct are added to c{u,s}time
    in the signal struct of the group's parent.

    So {u,s}time in the task struct are "raw" tick counts, while
    {u,s}time and c{u,s}time in the signal struct are "adjusted" values.

    The accounted values are used by:

    - task_times(), to get the cputime of a thread:
    This function returns adjusted values that originate from the raw
    {u,s}time and are scaled by sum_exec_runtime as accounted by CFS.

    - thread_group_cputime(), to get the cputime of a thread group:
    This function returns the sum of all {u,s}time of the living threads in
    the group, plus {u,s}time in the signal struct, which is the sum of the
    adjusted cputimes of all exited threads that belonged to the group.

    The problem is the return value of thread_group_cputime(),
    because it is a mixed sum of "raw" values and "adjusted" values:

    group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

    This misbehavior can break {u,s}time monotonicity.
    Assume there is a thread whose raw values are greater
    than its adjusted values (e.g. it was interrupted by 1000Hz ticks 50 times
    but only ran for 45ms); when it exits, cputime will decrease (e.g.
    by 5ms).

    To fix this, we could do:

    group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

    But task_times() contains hard divisions, so applying it for
    every thread should be avoided.

    This patch fixes the above problem in the following way:

    - Modify the thread's exit (= __exit_signal()) not to use task_times().
    This means {u,s}time in the signal struct accumulates raw values instead
    of adjusted values. As a result, thread_group_cputime()
    returns a pure sum of "raw" values.

    - Introduce a new function thread_group_times(*task, *utime, *stime)
    that converts the "raw" values of thread_group_cputime() to "adjusted"
    values, using the same calculation procedure as task_times().

    - Modify the group's exit (= wait_task_zombie()) to use the newly
    introduced thread_group_times(). This makes c{u,s}time in the signal
    struct hold adjusted values, as before this patch.

    - Replace some thread_group_cputime() calls with thread_group_times().
    These replacements are applied only where the "adjusted" cputime is
    conveyed to users and where task_times() is already used nearby
    (i.e. sys_times(), getrusage(), and /proc/<pid>/stat).

    This patch has a positive side effect:

    - Before this patch, if a group contained many short-lived threads
    (e.g. each runs 0.9ms and is not interrupted by ticks), the group's
    cputime could be invisible, since each thread's cputime was accumulated
    after being adjusted: imagine the adjustment function as adj(ticks, runtime),
    {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
    After this patch this will not happen, because the adjustment is
    applied after accumulation.

    v2:
    - remove if()s, put new variables into signal_struct.

    Signed-off-by: Hidetoshi Seto
    Acked-by: Peter Zijlstra
    Cc: Spencer Candland
    Cc: Americo Wang
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Stanislaw Gruszka
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
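
The side effect described above, adjusting after accumulation rather than before, can be shown with a simplified sketch. It assumes HZ=1000 (one tick per millisecond) and reduces the adjustment to a truncating conversion of runtime into whole ticks. The user/system split and the real task_times() scaling are left out, and all names are hypothetical.

```c
#include <stdint.h>

#define NSEC_PER_TICK UINT64_C(1000000) /* assumes HZ=1000 */

/* Simplified adjustment: CPU time in whole ticks, truncating the rest. */
static uint64_t adj_ticks(uint64_t runtime_ns)
{
	return runtime_ns / NSEC_PER_TICK;
}

/* Old behaviour: adjust each exited thread, then accumulate. */
static uint64_t adjust_then_sum(const uint64_t *runtime_ns, int n)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < n; i++)
		total += adj_ticks(runtime_ns[i]);
	return total;
}

/* New behaviour: accumulate raw values, then adjust once for the group. */
static uint64_t sum_then_adjust(const uint64_t *runtime_ns, int n)
{
	uint64_t raw = 0;
	int i;

	for (i = 0; i < n; i++)
		raw += runtime_ns[i];
	return adj_ticks(raw);
}
```

For ten threads that each ran 0.9ms, adjust_then_sum() yields 0 ticks, while sum_then_adjust() yields 9 ticks, so the group's CPU time becomes visible.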
     
  • - Remove if({u,s}t)s because no one calls it with NULL now.
    - Use cputime_{add,sub}().
    - Add ifndef-endif for prev_{u,s}time since they are used
    only when !VIRT_CPU_ACCOUNTING.

    Signed-off-by: Hidetoshi Seto
    Cc: Peter Zijlstra
    Cc: Spencer Candland
    Cc: Americo Wang
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Stanislaw Gruszka
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hidetoshi Seto
     

02 Dec, 2009

10 commits


01 Dec, 2009

1 commit