19 Feb, 2013

1 commit

  • IA64 relied on it through sched.h inclusion:

    arch/ia64/kernel/init_task.c:38:11: error: 'MAX_PRIO' undeclared here (not in a function)
    arch/ia64/kernel/init_task.c:38:11: error: 'RR_TIMESLICE' undeclared here (not in a function)

    Reported-by: kbuild test robot
    Cc: Clark Williams
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/n/tip-xaan1twswggedMR0airtpjui@git.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

28 Jan, 2013

1 commit

  • While remotely reading the cputime of a task running in a
    full dynticks CPU, the values stored in utime/stime fields
    of struct task_struct may be stale. Its values may be those
    of the last kernel user transition time snapshot and
    we need to add the tickless time spent since this snapshot.

    To fix this, flush the cputime of the dynticks CPUs on
    kernel user transition and record the time / context
    where we did this. Then on top of this snapshot and the current
    time, perform the fixup on the reader side from task_times()
    accessors.
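
    The reader-side fixup described above can be sketched roughly as follows.
    This is only an illustration under stated assumptions: the struct, the
    enum and the function name are hypothetical, not the kernel's, and the
    locking a real implementation would need is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical context tags; the names are illustrative only. */
enum snap_ctx { SNAP_INACTIVE, SNAP_USER, SNAP_KERNEL };

/* Hypothetical per-task record written on each kernel/user transition. */
struct cputime_snap {
    uint64_t utime;     /* user time accounted up to the snapshot */
    uint64_t stime;     /* system time accounted up to the snapshot */
    uint64_t stamp;     /* time at which the snapshot was taken */
    enum snap_ctx ctx;  /* context the task entered at the snapshot */
};

/* Reader side: add the tickless time elapsed since the snapshot. */
static void read_cputime(const struct cputime_snap *s, uint64_t now,
                         uint64_t *utime, uint64_t *stime)
{
    uint64_t delta;

    assert(s && utime && stime);
    *utime = s->utime;
    *stime = s->stime;

    if (s->ctx == SNAP_INACTIVE || now <= s->stamp)
        return;         /* nothing has accumulated since the snapshot */

    delta = now - s->stamp;
    if (s->ctx == SNAP_USER)
        *utime += delta;    /* task has been in user mode since then */
    else
        *stime += delta;    /* task has been in kernel mode since then */
}
```

    The stored values are never modified by the reader; the correction is
    applied only to the copies handed back to the caller.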

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    [fixed kvm module related build errors]
    Signed-off-by: Sedat Dilek

    Frederic Weisbecker
     

18 Sep, 2012

1 commit

  • Always store audit loginuids in type kuid_t.

    Print loginuids by converting them into uids in the appropriate user
    namespace, and then printing the resulting uid.

    Modify audit_get_loginuid to return a kuid_t.

    Modify audit_set_loginuid to take a kuid_t.

    Modify /proc/<pid>/loginuid on read to convert the loginuid into the
    user namespace of the opener of the file.

    Modify /proc/<pid>/loginuid on write to convert the loginuid
    from the user namespace of the opener of the file.

    Cc: Al Viro
    Cc: Eric Paris
    Cc: Paul Moore ?
    Cc: David Miller
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

27 Jul, 2012

1 commit

  • Pull scheduler changes from Ingo Molnar:
    "The biggest change is a performance improvement on SMP systems:

    | 4 socket 40 core + SMT Westmere box, single 30 sec tbench
    | runs, higher is better:
    |
    | clients     1     2     4     8    16    32     64    128
    |..........................................................................
    | pre        30    41   118   645  3769  6214  12233  14312
    | post      299   603  1211  2418  4697  6847  11606  14557
    |
    | A nice increase in performance.

    which speedup is particularly noticeable on heavily interacting
    few-tasks workloads, so the changes should help desktop-style Xorg
    workloads and interactivity as well, on multi-core CPUs.

    There are also cpuset suspend behavior fixes/restructuring and various
    smaller tweaks."

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Fix race in task_group()
    sched: Improve balance_cpu() to consider other cpus in its group as target of (pinned) task
    sched: Reset loop counters if all tasks are pinned and we need to redo load balance
    sched: Reorder 'struct lb_env' members to reduce its size
    sched: Improve scalability via 'CPU buddies', which withstand random perturbations
    cpusets: Remove/update outdated comments
    cpusets, hotplug: Restructure functions that are invoked during hotplug
    cpusets, hotplug: Implement cpuset tree traversal in a helper function
    CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume
    sched/x86: Remove broken power estimation

    Linus Torvalds
     

24 Jul, 2012

1 commit

  • Stefan reported a crash on a kernel before a3e5d1091c1 ("sched:
    Don't call task_group() too many times in set_task_rq()"); he
    found the reason to be that the multiple task_group()
    invocations in set_task_rq() returned different values.

    Looking at all that I found a lack of serialization and plain
    wrong comments.

    The below tries to fix it using an extra pointer which is
    updated under the appropriate scheduler locks. It's not pretty,
    but I can't really see another way given how all the cgroup
    stuff works.

    Reported-and-tested-by: Stefan Bader
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

03 Jul, 2012

1 commit


30 May, 2012

1 commit


22 Mar, 2012

1 commit

  • Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
    changing cpuset's mems") wins a super prize for the largest number of
    memory barriers entered into fast paths for one commit.

    [get|put]_mems_allowed is incredibly heavy with pairs of full memory
    barriers inserted into a number of hot paths. This was detected while
    investigating a large page allocator slowdown introduced some time
    after 2.6.32. The largest portion of this overhead was shown by
    oprofile to be at an mfence introduced by this commit into the page
    allocator hot path.

    For extra style points, the commit introduced the use of yield() in an
    implementation of what looks like a spinning mutex.

    This patch replaces the full memory barriers on both read and write
    sides with a sequence counter with just read barriers on the fast path
    side. This is much cheaper on some architectures, including x86. The
    main bulk of the patch is the retry logic if the nodemask changes in a
    manner that can cause a false failure.

    While updating the nodemask, a check is made to see if a false failure
    is a risk. If it is, the sequence number gets bumped and parallel
    allocators will briefly stall while the nodemask update takes place.

    In a page fault test microbenchmark, oprofile samples from
    __alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The
    actual results were

                                     3.3.0-rc3             3.3.0-rc3
                                   rc3-vanilla        nobarrier-v2r1
    Clients 1 UserTime          0.07 (  0.00%)        0.08 (-14.19%)
    Clients 2 UserTime          0.07 (  0.00%)        0.07 (  2.72%)
    Clients 4 UserTime          0.08 (  0.00%)        0.07 (  3.29%)
    Clients 1 SysTime           0.70 (  0.00%)        0.65 (  6.65%)
    Clients 2 SysTime           0.85 (  0.00%)        0.82 (  3.65%)
    Clients 4 SysTime           1.41 (  0.00%)        1.41 (  0.32%)
    Clients 1 WallTime          0.77 (  0.00%)        0.74 (  4.19%)
    Clients 2 WallTime          0.47 (  0.00%)        0.45 (  3.73%)
    Clients 4 WallTime          0.38 (  0.00%)        0.37 (  1.58%)
    Clients 1 Flt/sec/cpu  497620.28 (  0.00%)   520294.53 (  4.56%)
    Clients 2 Flt/sec/cpu  414639.05 (  0.00%)   429882.01 (  3.68%)
    Clients 4 Flt/sec/cpu  257959.16 (  0.00%)   258761.48 (  0.31%)
    Clients 1 Flt/sec      495161.39 (  0.00%)   517292.87 (  4.47%)
    Clients 2 Flt/sec      820325.95 (  0.00%)   850289.77 (  3.65%)
    Clients 4 Flt/sec     1020068.93 (  0.00%)  1022674.06 (  0.26%)

    MMTests Statistics: duration
    Sys Time Running Test (seconds)           135.68    132.17
    User+Sys Time Running Test (seconds)      164.2     160.13
    Total Elapsed Time (seconds)              123.46    120.87

    The overall improvement is small but the System CPU time is much
    improved and roughly in correlation to what oprofile reported (these
    performance figures are without profiling so skew is expected). The
    actual number of page faults is noticeably improved.

    For benchmarks like kernel builds, the overall benefit is marginal but
    the system CPU time is slightly reduced.

    To test the actual bug the commit fixed I opened two terminals. The
    first ran within a cpuset and continually ran a small program that
    faulted 100M of anonymous data. In a second window, the nodemask of the
    cpuset was continually randomised in a loop.

    Without the commit, the program would fail every so often (usually
    within 10 seconds) and obviously with the commit everything worked fine.
    With this patch applied, it also worked fine so the fix should be
    functionally equivalent.

    Signed-off-by: Mel Gorman
    Cc: Miao Xie
    Cc: David Rientjes
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

22 Feb, 2012

1 commit

  • Currently the initial SCHED_RR timeslice of init_task is HZ, which means
    1s, and is not the same as the default SCHED_RR timeslice DEF_TIMESLICE.

    Change that initial timeslice to the DEF_TIMESLICE.
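
    The difference is easy to see in ticks. The sketch below assumes the
    default round-robin slice is 100 ms, i.e. (100 * HZ / 1000) ticks; the
    function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Assumed default round-robin slice of 100 ms, expressed in ticks. */
static long rr_timeslice_ticks(long hz)
{
    assert(hz > 0);
    return 100 * hz / 1000;
}

/* The old initial value: HZ ticks, which is one second at any HZ. */
static long old_init_timeslice_ticks(long hz)
{
    assert(hz > 0);
    return hz;
}
```

    For HZ=250 the default slice is 25 ticks while the old initial value
    was 250 ticks, ten times longer.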

    Signed-off-by: Hiroshi Shimamoto
    [ s/DEF_TIMESLICE/RR_TIMESLICE/g ]
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4F3C9995.3010800@ct.jp.nec.com
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
     

10 Jan, 2012

1 commit

  • * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
    cgroup: fix to allow mounting a hierarchy by name
    cgroup: move assignement out of condition in cgroup_attach_proc()
    cgroup: Remove task_lock() from cgroup_post_fork()
    cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
    cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
    cgroup: only need to check oldcgrp==newgrp once
    cgroup: remove redundant get/put of task struct
    cgroup: remove redundant get/put of old css_set from migrate
    cgroup: Remove unnecessary task_lock before fetching css_set on migration
    cgroup: Drop task_lock(parent) on cgroup_fork()
    cgroups: remove redundant get/put of css_set from css_set_check_fetched()
    resource cgroups: remove bogus cast
    cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
    cgroup, cpuset: don't use ss->pre_attach()
    cgroup: don't use subsys->can_attach_task() or ->attach_task()
    cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
    cgroup: improve old cgroup handling in cgroup_attach_proc()
    cgroup: always lock threadgroup during migration
    threadgroup: extend threadgroup_lock() to cover exit and exec
    threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
    ...

    Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups:
    fix a css_set not found bug in cgroup_attach_proc" that already
    mentioned that the bug is fixed (differently) in Tejun's cgroup
    patchset. This one, in other words.

    Linus Torvalds
     

13 Dec, 2011

1 commit

  • Make the following renames to prepare for extension of threadgroup
    locking.

    * s/signal->threadgroup_fork_lock/signal->group_rwsem/
    * s/threadgroup_fork_read_lock()/threadgroup_change_begin()/
    * s/threadgroup_fork_read_unlock()/threadgroup_change_end()/
    * s/threadgroup_fork_write_lock()/threadgroup_lock()/
    * s/threadgroup_fork_write_unlock()/threadgroup_unlock()/

    This patch doesn't cause any behavior change.

    -v2: Rename threadgroup_change_done() to threadgroup_change_end() per
    KAMEZAWA's suggestion.

    Signed-off-by: Tejun Heo
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Li Zefan
    Cc: Oleg Nesterov
    Cc: Andrew Morton
    Cc: Paul Menage

    Tejun Heo
     

06 Dec, 2011

1 commit

  • * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched, x86: Avoid unnecessary overflow in sched_clock
    sched: Fix buglet in return_cfs_rq_runtime()
    sched: Avoid SMT siblings in select_idle_sibling() if possible
    sched: Set the command name of the idle tasks in SMP kernels
    sched, rt: Provide means of disabling cross-cpu bandwidth sharing
    sched: Document wait_for_completion_*() return values
    sched_fair: Fix a typo in the comment describing update_sd_lb_stats
    sched: Add a comment to effective_load() since it's a pain

    Linus Torvalds
     

17 Nov, 2011

1 commit


14 Nov, 2011

1 commit

  • In UP systems, the idle task is initialized using the init_task
    structure from which the command name is taken (currently "swapper").

    In SMP systems, one idle task per CPU is forked by the worker thread
    from which the task structure is copied. The command name is, therefore,
    "kworker/0:0" or "kworker/0:1", if not updated. Since such an update was
    lacking, all idle tasks in SMP systems were incorrectly named. This
    longtime bug was not discovered immediately, because there is no /proc/0
    entry - the bug only becomes apparent when tracing is enabled.

    This patch sets the command name of the idle tasks in SMP systems to the
    name that is used in the INIT_TASK structure suffixed by a slash and the
    number of the CPU.
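
    The naming rule amounts to a single formatted write. A minimal sketch,
    assuming a 16-byte command-name buffer; the function name and the
    COMM_LEN macro are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

#define COMM_LEN 16  /* assumed size of a command-name buffer */

/* Build "<base>/<cpu>", truncating if it does not fit in COMM_LEN. */
static void idle_task_name(char buf[COMM_LEN], const char *base, int cpu)
{
    assert(buf && base && cpu >= 0);
    snprintf(buf, COMM_LEN, "%s/%d", base, cpu);
}
```

    Called with the base name "swapper" and CPU 3, this fills the buffer
    with "swapper/3".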

    Signed-off-by: Carsten Emde
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111026211708.768925506@osadl.org
    Signed-off-by: Ingo Molnar

    Carsten Emde
     

13 Sep, 2011

1 commit

  • The thread_group_cputimer lock can be taken in atomic context and therefore
    cannot be preempted on -rt - annotate it.

    In mainline this change documents the low level nature of
    the lock - otherwise there's no functional difference. Lockdep
    and Sparse checking will work as usual.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

12 Jul, 2011

1 commit

  • fs_excl is a poor man's priority inheritance for filesystems to hint to
    the block layer that an operation is important. It was never clearly
    specified, not widely adopted, and will not prevent starvation in many
    cases (like across cgroups).

    fs_excl was introduced with the time sliced CFQ IO scheduler, to
    indicate when a process held FS exclusive resources and thus needed
    a boost.

    It doesn't cover all file systems, and it was never fully complete.
    Let's kill it.

    Signed-off-by: Justin TerAvest
    Signed-off-by: Jens Axboe

    Justin TerAvest
     

27 May, 2011

1 commit

  • Adds functionality to read/write lock CLONE_THREAD fork()ing per-threadgroup

    Add an rwsem that lives in a threadgroup's signal_struct that's taken for
    reading in the fork path, under CONFIG_CGROUPS. If another part of the
    kernel later wants to use such a locking mechanism, the CONFIG_CGROUPS
    ifdefs should be changed to a higher-up flag that CGROUPS and the other
    system would both depend on.

    This is a pre-patch for cgroup-procs-write.patch.

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

24 May, 2011

1 commit


24 Apr, 2011

1 commit

  • Neil Brown pointed out that lock_depth somehow escaped the BKL
    removal work. Let's get rid of it now.

    Note that the perf scripting utilities still have a bunch of
    code for dealing with common_lock_depth in tracepoints; I have
    left that in place in case anybody wants to use that code with
    older kernels.

    Suggested-by: Neil Brown
    Signed-off-by: Jonathan Corbet
    Cc: Arnd Bergmann
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20110422111910.456c0e84@bike.lwn.net
    Signed-off-by: Ingo Molnar

    Jonathan Corbet
     

04 Apr, 2011

1 commit


07 Jan, 2011

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
    sched: Change wait_for_completion_*_timeout() to return a signed long
    sched, autogroup: Fix reference leak
    sched, autogroup: Fix potential access to freed memory
    sched: Remove redundant CONFIG_CGROUP_SCHED ifdef
    sched: Fix interactivity bug by charging unaccounted run-time on entity re-weight
    sched: Move periodic share updates to entity_tick()
    printk: Use this_cpu_{read|write} api on printk_pending
    sched: Make pushable_tasks CONFIG_SMP dependant
    sched: Add 'autogroup' scheduling feature: automated per session task groups
    sched: Fix unregister_fair_sched_group()
    sched: Remove unused argument dest_cpu to migrate_task()
    mutexes, sched: Introduce arch_mutex_cpu_relax()
    sched: Add some clock info to sched_debug
    cpu: Remove incorrect BUG_ON
    cpu: Remove unused variable
    sched: Fix UP build breakage
    sched: Make task dump print all 15 chars of proc comm
    sched: Update tg->shares after cpu.shares write
    sched: Allow update_cfs_load() to update global load
    sched: Implement demand based update_cfs_load()
    ...

    Linus Torvalds
     

23 Dec, 2010

1 commit


09 Dec, 2010

1 commit

  • As noted by Peter Zijlstra at https://lkml.org/lkml/2010/11/10/391
    (while reviewing other stuff, though), tracking pushable tasks
    only makes sense on SMP systems.

    Signed-off-by: Dario Faggioli
    Acked-by: Steven Rostedt
    Acked-by: Gregory Haskins
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Dario Faggioli
     

30 Nov, 2010

1 commit

  • Add priority boosting, but only for TINY_PREEMPT_RCU. This is enabled
    by the default-off RCU_BOOST kernel parameter. The priority to which to
    boost preempted RCU readers is controlled by the RCU_BOOST_PRIO kernel
    parameter (defaulting to real-time priority 1) and the time to wait
    before boosting the readers blocking a given grace period is controlled
    by the RCU_BOOST_DELAY kernel parameter (defaulting to 500 milliseconds).

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

28 Oct, 2010

1 commit

  • Oleg Nesterov pointed out that we have to prevent
    multiple-threads-inside-exec itself and that we can reuse
    ->cred_guard_mutex for it. A concurrent execve() serves no purpose anyway.

    Let's move ->cred_guard_mutex from task_struct to signal_struct. This
    naturally prevents multiple-threads-inside-exec.

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Oleg Nesterov
    Acked-by: Roland McGrath
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

20 Aug, 2010

2 commits

  • Implement a small-memory-footprint uniprocessor-only implementation of
    preemptible RCU. This implementation uses but a single blocked-tasks
    list rather than the combinatorial number used per leaf rcu_node by
    TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
    processing. This version also takes advantage of uniprocessor execution
    to accelerate grace periods in the case where there are no readers.

    The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.

    This implementation is a step towards having RCU implementation driven
    off of the SMP and PREEMPT kernel configuration variables, which can
    happen once this implementation has accumulated sufficient experience.

    Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
    suggested by Steve Rostedt in order to avoid the compiler-reordering
    issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).

    As can be seen below, CONFIG_TINY_PREEMPT_RCU represents almost 5Kbyte
    savings compared to CONFIG_TREE_PREEMPT_RCU. Of course, for non-real-time
    workloads, CONFIG_TINY_RCU is even better.

    CONFIG_TREE_PREEMPT_RCU

       text    data     bss     dec filename
         13       0       0      13 kernel/rcupdate.o
       6170     825      28    7023 kernel/rcutree.o
                               ----
                               7026 Total

    CONFIG_TINY_PREEMPT_RCU

       text    data     bss     dec filename
         13       0       0      13 kernel/rcupdate.o
       2081      81       8    2170 kernel/rcutiny.o
                               ----
                               2183 Total

    CONFIG_TINY_RCU (non-preemptible)

       text    data     bss     dec filename
         13       0       0      13 kernel/rcupdate.o
        719      25       0     744 kernel/rcutiny.o
                                ---
                                757 Total

    Requested-by: Loïc Minier
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This adds annotations for RCU operations in core kernel components

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul E. McKenney
    Cc: Al Viro
    Cc: Jens Axboe
    Cc: Andrew Morton
    Reviewed-by: Josh Triplett

    Arnd Bergmann
     

01 Jun, 2010

1 commit

  • * 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits)
    kbuild: Revert part of e8d400a to resolve a conflict
    kbuild: Fix checking of scm-identifier variable
    gconfig: add support to show hidden options that have prompts
    menuconfig: add support to show hidden options which have prompts
    gconfig: remove show_debug option
    gconfig: remove dbg_print_ptype() and dbg_print_stype()
    kconfig: fix zconfdump()
    kconfig: some small fixes
    add random binaries to .gitignore
    kbuild: Include gen_initramfs_list.sh and the file list in the .d file
    kconfig: recalc symbol value before showing search results
    .gitignore: ignore *.lzo files
    headerdep: perlcritic warning
    scripts/Makefile.lib: Align the output of LZO
    kbuild: Generate modules.builtin in make modules_install
    Revert "kbuild: specify absolute paths for cscope"
    kbuild: Do not unnecessarily regenerate modules.builtin
    headers_install: use local file handles
    headers_check: fix perl warnings
    export_report: fix perl warnings
    ...

    Linus Torvalds
     

28 May, 2010

4 commits

  • Cosmetic, no changes in the compiled code. Just s/NULL/SIG_DFL/ to make
    it more readable and grep-friendly.

    Note: probably SIG_IGN makes more sense, we could kill ignore_signals().
    But then kernel_init() should do flush_signal_handlers() before exec().

    Signed-off-by: Oleg Nesterov
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Eric Biederman
    Cc: Herbert Poetzl
    Cc: Mathias Krause
    Acked-by: Roland McGrath
    Acked-by: Serge Hallyn
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • "statically initialize struct pid for swapper" commit 820e45db says:

    Statically initialize a struct pid for the swapper process (pid_t == 0)
    and attach it to init_task. This is needed so task_pid(), task_pgrp()
    and task_session() interfaces work on the swapper process also.

    OK, but:

    - it doesn't make sense to add init_task.pids[].node into
    init_struct_pid.tasks[], and in fact this is just wrong.

    idle threads are special, they shouldn't be visible on any
    global list. In particular do_each_pid_task(init_struct_pid)
    shouldn't see swapper.

    This is the actual reason why kill(0, SIGKILL) from /sbin/init
    (which starts with 0,0 special pids) crashes the kernel. The
    signal sent to pgid/sid == 0 must never see idle threads, even
    if the previous patch fixed the crash itself.

    - we have other idle threads running on the non-boot CPUs, see
    the next patch.

    Change INIT_STRUCT_PID/INIT_PID_LINK to create the empty/unhashed
    hlist_head/hlist_node. Like any other idle thread swapper can never exit,
    so detach_pid()->__hlist_del() is not possible, but we could change
    INIT_PID_LINK() to set pprev = &next if needed.
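
    The remark about pprev = &next can be illustrated with a stripped-down
    node type. This is a sketch with hypothetical names, modelled on the
    usual next/pprev layout of a hashed list node: a node whose pprev
    points at its own next field can be passed to the delete routine
    without touching any list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type using the next/pprev layout. */
struct hnode {
    struct hnode *next;
    struct hnode **pprev;   /* address of the pointer that points at us */
};

/* Make the node self-contained: pprev refers to its own next field. */
static void hnode_init_self(struct hnode *n)
{
    assert(n != NULL);
    n->next = NULL;
    n->pprev = &n->next;
}

/* Unlink a node; dereferences pprev, so pprev must not be NULL. */
static void hnode_del(struct hnode *n)
{
    struct hnode *next;
    struct hnode **pprev;

    assert(n != NULL && n->pprev != NULL);
    next = n->next;
    pprev = n->pprev;
    *pprev = next;          /* for a self-contained node: n->next = NULL */
    if (next)
        next->pprev = pprev;
}
```

    With a NULL pprev the same delete would dereference a null pointer,
    which is why the text treats deletion of an unhashed node as a case
    that must not occur.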

    All we need is the valid swapper->pids[].pid == &init_struct_pid.

    Reported-by: Mathias Krause
    Signed-off-by: Oleg Nesterov
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Eric Biederman
    Cc: Herbert Poetzl
    Cc: Mathias Krause
    Acked-by: Roland McGrath
    Acked-by: Serge Hallyn
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • The trivial /sbin/init doing

    #include <signal.h>

    int main(void)
    {
            kill(0, SIGKILL);
            return 0;
    }

    crashes the kernel.

    This happens because __kill_pgrp_info(init_struct_pid) also sends SIGKILL
    to the swapper process which runs with the uninitialized ->thread_group.

    Change INIT_TASK() to initialize ->thread_group properly.

    Note: the real problem is that the swapper process must not be visible to
    signals, see the next patch. But this change is right anyway and fixes
    the crash.

    Reported-and-tested-by: Mathias Krause
    Signed-off-by: Oleg Nesterov
    Cc: Cedric Le Goater
    Cc: Dave Hansen
    Cc: Eric Biederman
    Cc: Herbert Poetzl
    Cc: Mathias Krause
    Acked-by: Roland McGrath
    Acked-by: Serge Hallyn
    Acked-by: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • No functional changes, just s/atomic_t count/int nr_threads/.

    With the recent changes this counter has a single user, get_nr_threads().
    Moreover, none of its callers needs a really accurate number of threads, not
    to mention that each caller obviously races with fork/exit. It is only used to
    report this value to the user-space, except first_tid() uses it to avoid
    the unnecessary while_each_thread() loop in the unlikely case.

    It is a bit sad we need a word in struct signal_struct for this, perhaps
    we can change get_nr_threads() to approximate the number of threads using
    signal->live and kill ->nr_threads later.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Oleg Nesterov
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Acked-by: Roland McGrath
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

12 May, 2010

1 commit


13 Mar, 2010

1 commit

  • Remove INIT_NSPROXY(), use C99 initializer.
    Remove INIT_IPC_NS(), INIT_NET_NS() while I'm at it.
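
    For readers unfamiliar with the style, a C99 designated initializer
    names each member it sets and zero-fills the rest. A small sketch with
    a hypothetical struct (the member names and string values are
    placeholders, not the kernel's definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a namespace proxy structure. */
struct demo_nsproxy {
    int count;
    const char *uts_ns;
    const char *ipc_ns;
    const char *mnt_ns;
    const char *net_ns;
};

/* Members are set by name; order is free and omissions are allowed. */
static struct demo_nsproxy demo_init_nsproxy = {
    .count  = 1,
    .uts_ns = "init_uts_ns",
    .ipc_ns = "init_ipc_ns",
    .net_ns = "init_net",
    /* .mnt_ns is not named, so it is zero-initialised to NULL */
};

/* Trivial accessor used to show the initialised value. */
static int demo_ns_count(const struct demo_nsproxy *p)
{
    assert(p != NULL);
    return p->count;
}
```

    Compared with a wrapper macro, the initializer shows at the point of
    definition which members are set and which are left at zero.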

    Note: headers trim will be done later, now it's quite pointless because
    results will be invalidated by merge window.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

03 Mar, 2010

1 commit


18 Dec, 2009

1 commit

  • This reverts commit e4c570c4cb7a95dbfafa3d016d2739bf3fdfe319, as
    requested by Alexey:

    "I think I gave a good enough arguments to not merge it.
    To iterate:
    * patch makes impossible to start using ext3 on EXT3_FS=n kernels
    without reboot.
    * this is done only for one pointer on task_struct

    None of config options which define task_struct are tristate directly
    or effectively."

    Requested-by: Alexey Dobriyan
    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

16 Dec, 2009

2 commits

  • …el/git/tip/linux-2.6-tip

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
    clockevents: Convert to raw_spinlock
    clockevents: Make tick_device_lock static
    debugobjects: Convert to raw_spinlocks
    perf_event: Convert to raw_spinlock
    hrtimers: Convert to raw_spinlocks
    genirq: Convert irq_desc.lock to raw_spinlock
    smp: Convert smplocks to raw_spinlocks
    rtmutes: Convert rtmutex.lock to raw_spinlock
    sched: Convert pi_lock to raw_spinlock
    sched: Convert cpupri lock to raw_spinlock
    sched: Convert rt_runtime_lock to raw_spinlock
    sched: Convert rq->lock to raw_spinlock
    plist: Make plist debugging raw_spinlock aware
    bkl: Fixup core_lock fallout
    locking: Cleanup the name space completely
    locking: Further name space cleanups
    alpha: Fix fallout from locking changes
    locking: Implement new raw_spinlock
    locking: Convert raw_rwlock functions to arch_rwlock
    locking: Convert raw_rwlock to arch_rwlock
    ...

    Linus Torvalds
     
  • journal_info in task_struct is used in journaling file system only. So
    introduce CONFIG_FS_JOURNAL_INFO and make it conditional.

    Signed-off-by: Hiroshi Shimamoto
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Steven Whitehouse
    Cc: KONISHI Ryusuke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hiroshi Shimamoto
     

15 Dec, 2009

1 commit


24 Nov, 2009

1 commit

  • As far as I know, all distros currently ship kernels with default
    CONFIG_SECURITY_FILE_CAPABILITIES=y. Since having the option on
    leaves a 'no_file_caps' option to boot without file capabilities,
    the main reason to keep the option is that turning it off saves
    you (on my s390x partition) 5k. In particular, vmlinux sizes
    came to:

    without patch fscaps=n: 53598392
    without patch fscaps=y: 53603406
    with this patch applied: 53603342

    with the security-next tree.

    Against this we must weigh the fact that there is no simple way for
    userspace to figure out whether file capabilities are supported,
    while things like per-process securebits, capability bounding
    sets, and adding bits to pI if CAP_SETPCAP is in pE are not supported
    with SECURITY_FILE_CAPABILITIES=n, leaving a bit of a problem for
    applications wanting to know whether they can use them and/or why
    something failed.

    It also adds another subtly different set of semantics which we must
    maintain at the risk of severe security regressions.

    So this patch removes the SECURITY_FILE_CAPABILITIES compile
    option. It drops the kernel size by about 50k over the stock
    SECURITY_FILE_CAPABILITIES=y kernel, by removing the
    cap_limit_ptraced_target() function.

    Changelog:
    Nov 20: remove cap_limit_ptraced_target() as its logic
    was ifndef'ed.

    Signed-off-by: Serge E. Hallyn
    Acked-by: Andrew G. Morgan
    Signed-off-by: James Morris

    Serge E. Hallyn