20 Sep, 2011

3 commits


15 Sep, 2011

1 commit

  • Take cwq->gcwq->lock to avoid racing between drain_workqueue checking to
    make sure the workqueues are empty and cwq_dec_nr_in_flight decrementing
    and then incrementing nr_active when it activates a delayed work.

    We discovered this when a corner case in one of our drivers resulted in
    us trying to destroy a workqueue in which the remaining work would
    always requeue itself again in the same workqueue. We would hit this
    race condition and trip the BUG_ON on workqueue.c:3080.

    Signed-off-by: Thomas Tuttle
    Acked-by: Tejun Heo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Tuttle
     

12 Sep, 2011

1 commit

  • If an irq_chip provides .irq_shutdown(), but neither of .irq_disable() or
    .irq_mask(), free_irq() crashes when jumping to NULL.
    Fix this by only trying .irq_disable() and .irq_mask() if there's no
    .irq_shutdown() provided.

    This revives the symmetry with irq_startup(), which tries .irq_startup(),
    .irq_enable(), and .irq_unmask(), and makes it consistent with the comment for
    irq_chip.irq_shutdown() in , which says:

    * @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL)

    This is also how __free_irq() behaved before the big overhaul, cfr. e.g.
    3b56f0585fd4c02d047dc406668cb40159b2d340 ("genirq: Remove bogus conditional"),
    where the core interrupt code always overrode .irq_shutdown() to
    .irq_disable() if .irq_shutdown() was NULL.

    Signed-off-by: Geert Uytterhoeven
    Cc: linux-m68k@lists.linux-m68k.org
    Link: http://lkml.kernel.org/r/1315742394-16036-2-git-send-email-geert@linux-m68k.org
    Cc: stable@kernel.org
    Signed-off-by: Thomas Gleixner

    Geert Uytterhoeven
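
The fallback order described above can be sketched in user space. This is only a toy model, not kernel code: the `IrqChip` class and `shutdown()` function are hypothetical names, and every callback is treated as optional so that the crashing case (only a shutdown callback present) can be represented.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IrqChip:
    """Hypothetical stand-in for an irq_chip; every callback is optional."""
    irq_shutdown: Optional[Callable[[], None]] = None
    irq_disable: Optional[Callable[[], None]] = None
    irq_mask: Optional[Callable[[], None]] = None


def shutdown(chip: IrqChip) -> None:
    """Try shutdown first; fall back to disable, then mask, only if needed."""
    if chip.irq_shutdown is not None:
        chip.irq_shutdown()
    elif chip.irq_disable is not None:
        chip.irq_disable()
    elif chip.irq_mask is not None:
        chip.irq_mask()


if __name__ == "__main__":
    # A chip with only a shutdown callback no longer reaches a missing one.
    shutdown(IrqChip(irq_shutdown=lambda: print("shutdown called")))
```

The point of the ordering is that a missing callback is never called: the fallbacks are consulted only when the preferred callback is absent.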
     

08 Sep, 2011

2 commits


31 Aug, 2011

1 commit

  • We detected a serious issue with PERF_SAMPLE_READ and
    timing information when events were being multiplexed.

    Samples would have time_running > time_enabled. That
    was easy to reproduce with a libpfm4 example (ran 3
    times to cause multiplexing on Core 2):

    $ syst_smpl -e uops_retired:freq=1 &
    $ syst_smpl -e uops_retired:freq=1 &
    $ syst_smpl -e uops_retired:freq=1 &
    IIP:0x0000000040062d ... PERIOD:2355332948 ENA=40144625315 RUN=60014875184
    syst_smpl: WARNING: time_running > time_enabled
    63277537998 uops_retired:freq=1 , scaled

    The bug was not present in kernels up to (and including) 3.0. It turns
    out the bug was introduced by the following commit:

    commit c4794295917ebeda8013b6cb9c8d71ab4f74a1fa

    events: Move lockless timer calculation into helper function

    The parameters of the function got reversed yet the call sites
    were not updated to reflect the change. That led to time_running
    and time_enabled being swapped. That had no effect when there was
    no multiplexing because in that case time_running = time_enabled
    but it would show up in any other scenario.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110829124112.GA4828@quad
    Signed-off-by: Ingo Molnar

    Eric B Munson
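
A minimal sketch of why the swap matters, assuming the usual multiplexing correction in which a raw count is scaled by time_enabled / time_running. The helper names are hypothetical; the two timestamps are the ENA and RUN values from the sample above.

```python
def times_are_sane(time_enabled: int, time_running: int) -> bool:
    """An event cannot have been running longer than it was enabled."""
    return time_running <= time_enabled


def scale_count(raw: int, time_enabled: int, time_running: int) -> float:
    """Estimate the full count of a multiplexed event from its raw count."""
    if time_running == 0:
        return 0.0
    return raw * time_enabled / time_running


if __name__ == "__main__":
    ena, run = 40144625315, 60014875184  # values as reported in the sample
    print(times_are_sane(ena, run))      # reported order fails the check
    print(times_are_sane(run, ena))      # swapped back, the check passes
```

With swapped values the scale factor drops below 1, so a multiplexed count is shrunk instead of extrapolated. When there is no multiplexing both times are equal and the swap is invisible, which matches the description above.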
     

29 Aug, 2011

4 commits

  • The current cgroup context switch code was incorrect, leading
    to bogus counts. Furthermore, as soon as there was an active
    cgroup event on a CPU, the context switch cost on that CPU
    would increase by a significant amount as demonstrated by a
    simple ping/pong example:

    $ ./pong
    Both processes pinned to CPU1, running for 10s
    10684.51 ctxsw/s

    Now start a cgroup perf stat:
    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100

    $ ./pong
    Both processes pinned to CPU1, running for 10s
    6674.61 ctxsw/s

    That's a 37% penalty.

    Note that pong is not even in the monitored cgroup.

    The results shown by perf stat are bogus:
    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100

    Performance counter stats for 'sleep 100':

    CPU1 cycles test
    CPU1 16,984,189,138 cycles # 0.000 GHz

    The second 'cycles' event should report a count @ CPU clock
    (here 2.4GHz) as it is counting across all cgroups.

    The patch below fixes the bogus accounting and bypasses any
    cgroup switches in case the outgoing and incoming tasks are
    in the same cgroup.

    With this patch the same test now yields:
    $ ./pong
    Both processes pinned to CPU1, running for 10s
    10775.30 ctxsw/s

    Start perf stat with cgroup:

    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

    Run pong outside the cgroup:
    $ ./pong
    Both processes pinned to CPU1, running for 10s
    10687.80 ctxsw/s

    The penalty is now less than 2%.

    And the results for perf stat are correct:

    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

    Performance counter stats for 'sleep 10':

    CPU1 cycles test # 0.000 GHz
    CPU1 23,933,981,448 cycles # 0.000 GHz

    Now perf stat reports the correct counts for
    the non-cgroup event.

    If we run pong inside the cgroup, then we also get the
    correct counts:

    $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10

    Performance counter stats for 'sleep 10':

    CPU1 22,297,726,205 cycles test # 0.000 GHz
    CPU1 23,933,981,448 cycles # 0.000 GHz

    10.001457237 seconds time elapsed

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
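
The percentages quoted above can be rechecked from the ctxsw/s figures. This is a small arithmetic sketch with hypothetical helper names. It assumes the 10.001457237 s elapsed time from the last perf stat output is the right denominator for the cycle count.

```python
def penalty_percent(baseline: float, measured: float) -> float:
    """Relative slowdown of `measured` against `baseline`, in percent."""
    return (baseline - measured) / baseline * 100.0


def effective_ghz(cycles: int, seconds: float) -> float:
    """Average clock rate implied by a cycle count over a wall-clock time."""
    return cycles / seconds / 1e9


if __name__ == "__main__":
    # Before the patch: pong alone vs. pong with a cgroup event active.
    print(round(penalty_percent(10684.51, 6674.61), 1))
    # After the patch: the same two measurements.
    print(round(penalty_percent(10775.30, 10687.80), 1))
    # The non-cgroup 'cycles' event should come out near the CPU clock.
    print(round(effective_ghz(23_933_981_448, 10.001457237), 2))
```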
     
  • This patch fixes the following memory leak:

    unreferenced object 0xffff880107266800 (size 512):
    comm "sched-powersave", pid 3718, jiffies 4323097853 (age 27495.450s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] create_object+0x187/0x28b
    [] kmemleak_alloc+0x73/0x98
    [] __kmalloc_node+0x104/0x159
    [] kzalloc_node.clone.97+0x15/0x17
    [] build_sched_domains+0xb7/0x7f3
    [] partition_sched_domains+0x1db/0x24a
    [] do_rebuild_sched_domains+0x3b/0x47
    [] rebuild_sched_domains+0x10/0x12
    [] sched_power_savings_store+0x6c/0x7b
    [] sched_mc_power_savings_store+0x16/0x18
    [] sysdev_class_store+0x20/0x22
    [] sysfs_write_file+0x108/0x144
    [] vfs_write+0xaf/0x102
    [] sys_write+0x4d/0x74
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff

    Signed-off-by: WANG Cong
    Signed-off-by: Peter Zijlstra
    Cc: stable@kernel.org # 3.0
    Link: http://lkml.kernel.org/r/1313671017-4112-1-git-send-email-amwang@redhat.com
    Signed-off-by: Ingo Molnar

    WANG Cong
     
  • There is no real reason to run blk_schedule_flush_plug() with
    interrupts and preemption disabled.

    Move it into schedule() and call it when the task is going voluntarily
    to sleep. There might be false positives when the task is woken
    between that call and actually scheduling, but that's not really
    different from being woken immediately after switching away.

    This fixes a deadlock in the scheduler where the
    blk_schedule_flush_plug() callchain enables interrupts and thereby
    allows a wakeup to happen of the task that's going to sleep.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Jens Axboe
    Cc: Linus Torvalds
    Cc: stable@kernel.org # 2.6.39+
    Link: http://lkml.kernel.org/n/tip-dwfxtra7yg1b5r65m32ywtct@git.kernel.org
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     
  • Block-IO and workqueues call into notifier functions from the
    scheduler core code with interrupts and preemption disabled. These
    calls should be made before entering the scheduler core.

    To simplify this, separate the scheduler core code into
    __schedule(). __schedule() is directly called from the places which
    set PREEMPT_ACTIVE and from schedule(). This allows us to add the work
    checks into schedule(), so they are only called when a task voluntarily
    goes to sleep.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra
    Cc: Tejun Heo
    Cc: Jens Axboe
    Cc: Linus Torvalds
    Cc: stable@kernel.org # 2.6.39+
    Link: http://lkml.kernel.org/r/20110622174918.813258321@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

27 Aug, 2011

1 commit


26 Aug, 2011

2 commits

  • It seems that 7bf693951a8e ("console: allow to retain boot console via
    boot option keep_bootcon") doesn't always achieve what it aims for, as when
    printk_late_init() runs it unconditionally turns off all boot consoles.
    With this patch, I am able to see more messages on the boot console in
    KVM guests than I can without, when keep_bootcon is specified.

    I think it is appropriate for the relevant -stable trees. However, it's
    more of an annoyance than a serious bug (ideally you don't need to keep
    the boot console around as console handover should be working -- I was
    encountering a situation where the console handover wasn't working and
    not having the boot console available meant I couldn't see why).

    Signed-off-by: Nishanth Aravamudan
    Cc: David S. Miller
    Cc: Alan Cox
    Cc: Greg KH
    Acked-by: Fabio M. Di Nitto
    Cc: [2.6.39.x, 3.0.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
     
  • I ran into a couple of programs which broke with the new Linux 3.0
    version. Some of those were binary only. I tried to use LD_PRELOAD to
    work around it, but it was quite difficult and in one case impossible
    because of a mix of 32bit and 64bit executables.

    For example, all kinds of management software from HP don't work, unless
    we pretend to run a 2.6 kernel.

    $ uname -a
    Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux

    $ hpacucli ctrl all show

    Error: No controllers detected.

    $ rpm -qf /usr/sbin/hpacucli
    hpacucli-8.75-12.0

    Another notable case is that Python now reports "linux3" from
    sys.platform, which in turn can break things that were checking
    sys.platform == "linux2":

    https://bugzilla.mozilla.org/show_bug.cgi?id=664564

    It seems pretty clear to me though it's a bug in the apps that are using
    '==' instead of .startswith(), but this allows us to unbreak broken
    programs.

    This patch adds a UNAME26 personality that makes the kernel report a
    2.6.40+x version number instead. The x is the x in 3.x.

    I know this is somewhat ugly, but I didn't find a better workaround, and
    compatibility to existing programs is important.

    Some programs also read /proc/sys/kernel/osrelease. This can be worked
    around in user space with mount --bind (and a mount namespace).

    To use:

    wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
    gcc -o uname26 uname26.c
    ./uname26 program

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
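
Two points from this message can be sketched in a few lines: the robust platform test (`.startswith()` instead of `==`), and the 3.x to 2.6.(40+x) numbering. The function names are hypothetical. Only the numeric mapping stated above is modelled; how the kernel treats the rest of the release string is not.

```python
import re
import sys


def is_linux(platform: str = sys.platform) -> bool:
    """Prefix test that accepts 'linux2', 'linux3' and plain 'linux'."""
    return platform.startswith("linux")


def uname26_version(release: str) -> str:
    """Map a '3.x...' release to '2.6.(40+x)'; other input is returned as-is
    (that pass-through is an assumption of this sketch)."""
    match = re.match(r"3\.(\d+)", release)
    if match is None:
        return release
    return f"2.6.{40 + int(match.group(1))}"


if __name__ == "__main__":
    print(is_linux("linux3"))
    print(uname26_version("3.0.0-08107-g97cd98f"))
```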
     

24 Aug, 2011

2 commits

  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: fix tracing builds inside the source tree
    xfs: remove subdirectories
    xfs: don't expect xfs headers to be in subdirectories

    Linus Torvalds
     
  • This reverts commit f3637a5f2e2eb391ff5757bc83fb5de8f9726464.

    It turns out that this breaks several drivers, one example being OMAP
    boards which use the on-board OMAP UARTs and the omap-serial driver that
    will not boot to userspace after the commit.

    Paul Walmsley reports that enabling CONFIG_DEBUG_SHIRQ reveals 'IRQ
    handler type mismatch' errors:

    IRQ handler type mismatch for IRQ 74
    current handler: serial idle
    ...

    and the reason is that setting IRQF_ONESHOT will now result in those
    interrupt handlers having different IRQF flags, and thus being
    unsharable. So the commit log in the reverted commit:

    "Since it is required for those users and
    there is no difference for others it makes sense to add this flag
    unconditionally."

    is simply not true: there may not be any difference from an "actions at
    irq time" point of view, but there is a *big* difference in how irq
    management tests this flag (see __setup_irq() in kernel/irq/manage.c).

    One solution may be to stop verifying IRQF_ONESHOT in __setup_irq(), but
    right now the safe course of action is to revert the change. Let's
    revisit this in a later merge window.

    Reported-by: Paul Walmsley
    Cc: Sebastian Andrzej Siewior
    Requested-by: Alan Cox
    Acked-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Linus Torvalds
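
The sharing problem can be illustrated with a simplified model of the kind of flag comparison the message refers to. This is a sketch under stated assumptions: the flag values below are illustrative placeholders, `can_share()` is a hypothetical name, and the real check in __setup_irq() may compare more than these two bits.

```python
# Illustrative bit values; treat them as placeholders for this sketch.
IRQF_SHARED = 0x0080
IRQF_ONESHOT = 0x2000


def can_share(old_flags: int, new_flags: int) -> bool:
    """Two handlers may share a line only if both request sharing and
    they agree on the ONESHOT bit."""
    both_shared = bool(old_flags & new_flags & IRQF_SHARED)
    same_oneshot = not ((old_flags ^ new_flags) & IRQF_ONESHOT)
    return both_shared and same_oneshot


if __name__ == "__main__":
    # The two handlers originally agree on their flags.
    print(can_share(IRQF_SHARED, IRQF_SHARED))
    # Forcing ONESHOT onto only one of them makes the flags differ.
    print(can_share(IRQF_SHARED | IRQF_ONESHOT, IRQF_SHARED))
```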
     

20 Aug, 2011

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
    Revert "cfq: Remove special treatment for metadata rqs."
    block: fix flush machinery for stacking drivers with differring flush flags
    block: improve rq_affinity placement
    blktrace: add FLUSH/FUA support
    Move some REQ flags to the common bio/request area
    allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
    xen/blkback: Make description more obvious.
    cfq-iosched: Add documentation about idling
    block: Make rq_affinity = 1 work as expected
    block: swim3: fix unterminated of_device_id table
    block/genhd.c: remove useless cast in diskstats_show()
    drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
    drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
    bsg-lib: add module.h include
    cfq-iosched: Reduce linked group count upon group destruction
    blk-throttle: correctly determine sync bio
    loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
    loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
    loop: add management interface for on-demand device allocation
    loop: replace linked list of allocated devices with an idr index
    ...

    Linus Torvalds
     

19 Aug, 2011

1 commit


18 Aug, 2011

3 commits


14 Aug, 2011

1 commit


13 Aug, 2011

1 commit

  • Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
    annoying subdirectories in the XFS source code. Besides the large
    number of file renames, the only changes are to the Makefile, a few
    files including headers with the subdirectory prefix, and the binary
    sysctl compat code that includes a header under fs/xfs/ from
    kernel/.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

12 Aug, 2011

2 commits

  • The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
    check in set_user() to check for NPROC exceeding via setuid() and
    similar functions.

    Before the check, an unprivileged user could greatly exceed the allowed
    number of processes if the program relied on rlimit only. But the check
    created a new security threat: many poorly written programs simply don't
    check the setuid() return code and believe it cannot fail if executed
    with root privileges. So the check is removed in this patch, because
    privilege escalations related to such buggy programs are too frequent.

    The NPROC limit can still be enforced in the common code flow of daemons
    spawning user processes. Most daemons do fork()+setuid()+execve().
    The check introduced in execve() (1) enforces the same limit as in
    setuid() and (2) doesn't create similar security issues.

    Neil Brown suggested tracking which specific process has exceeded the
    limit by setting the PF_NPROC_EXCEEDED process flag. With this change only
    this process would fail on execve(), and other processes' execve()
    behaviour is not changed.

    Solar Designer suggested re-checking whether the NPROC limit is still
    exceeded at the moment of execve(). If the process was sleeping for
    days between set*uid() and execve(), and the NPROC counter stepped down
    under the limit, a deferred execve() failure because the NPROC limit was
    exceeded days ago would be unexpected. If the limit is not exceeded
    anymore, we clear the flag on successful calls to execve() and fork().

    The flag is also cleared on successful calls to set_user() as the limit
    was exceeded for the previous user, not the current one.

    A similar check was introduced in the -ow patches (without the process flag).

    v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().

    Reviewed-by: James Morris
    Signed-off-by: Vasiliy Kulikov
    Acked-by: NeilBrown
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
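
The flag handling described above can be followed in a toy state model. This is not the kernel implementation: `Task`, `set_user()` and `execve_allowed()` are hypothetical stand-ins, and the exact comparison against the limit (strict or not) is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    nproc_exceeded: bool = False  # models the PF_NPROC_EXCEEDED flag


def over_limit(user_procs: int, limit: int) -> bool:
    """Assumed comparison; the message does not spell out the operator."""
    return user_procs > limit


def set_user(task: Task, user_procs: int, limit: int) -> None:
    """Never fails; only records whether the new user is over the limit."""
    task.nproc_exceeded = over_limit(user_procs, limit)


def execve_allowed(task: Task, user_procs: int, limit: int) -> bool:
    """Fail only if this task was flagged and the limit is still exceeded."""
    if task.nproc_exceeded and over_limit(user_procs, limit):
        return False
    task.nproc_exceeded = False
    return True


if __name__ == "__main__":
    task = Task()
    set_user(task, user_procs=11, limit=10)
    print(execve_allowed(task, user_procs=11, limit=10))  # still over
    print(execve_allowed(task, user_procs=5, limit=10))   # count dropped
```

In this model a task that never tripped the limit in set_user() is not affected at execve() time, which mirrors the point that other processes' behaviour is unchanged.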
     
  • …l/git/tip/linux-2.6-tip

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf symbols: Check '/tmp/perf-' symbol file ownership
    perf sched: Usage leftover from trace -> script rename
    perf sched: Do not delete session object prematurely
    perf tools: Check $HOME/.perfconfig ownership
    perf, x86: Add model 45 SandyBridge support
    perf tools: Add support to install perf python extension
    perf tools: do not look at ./config for configuration
    perf tools: Make clean leaves some files
    perf lock: Dropping unsupported ':r' modifier
    perf probe: Fix coredump introduced by probe module option
    jump label: Reduce the cycle count by changing the link order
    perf report: Use ui__warning in some more places
    perf python: Add PERF_RECORD_{LOST,READ,SAMPLE} routine tables
    perf evlist: Introduce 'disable' method
    trace events: Update version number reference to new 3.x scheme for EVENT_POWER_TRACING_DEPRECATED
    perf buildid-cache: Zero out buffer of filenames when adding/removing buildid

    Linus Torvalds
     

11 Aug, 2011

2 commits

  • Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
    FUA follows WRITE, use the same 'F' flag for both cases and
    distinguish them by their (relative) position. The end results
    look like (other flags might be shown also):

    - WRITE: W
    - WRITE_FLUSH: FW
    - WRITE_FUA: WF
    - WRITE_FLUSH_FUA: FWF

    Note that we reuse TC_BARRIER due to lack of bit space of act_mask
    so that the older versions of blktrace tools will report flush
    requests as barriers from now on.

    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Signed-off-by: Namhyung Kim
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Namhyung Kim
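
The positional use of the 'F' flag can be shown with a small helper that builds only the part of the flag string listed above. The function name and boolean parameters are hypothetical; other flags that blktrace may print are left out.

```python
def write_flags(flush: bool = False, fua: bool = False) -> str:
    """Build the flag string for a write: 'F' before 'W' marks a flush,
    'F' after 'W' marks FUA."""
    flags = ""
    if flush:
        flags += "F"
    flags += "W"
    if fua:
        flags += "F"
    return flags


if __name__ == "__main__":
    for flush in (False, True):
        for fua in (False, True):
            print(flush, fua, write_flags(flush, fua))
```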
     
  • It's possible to jam up the alarm timers by setting very small interval
    timers, which will cause the alarmtimer subsystem to spend all of its time
    firing and restarting timers. This can effectively lock up a box.

    A deeper fix is needed, closely mimicking the hrtimer code, but for now
    just cap the interval to 100us to avoid userland hanging the system.

    CC: Thomas Gleixner
    CC: stable@kernel.org
    Signed-off-by: John Stultz

    John Stultz
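
A minimal sketch of the stopgap, assuming intervals are expressed in nanoseconds. The names are hypothetical. Leaving a zero interval untouched is an assumption of this sketch, not something the message states.

```python
MIN_INTERVAL_NS = 100_000  # 100us, the bound named in the message


def cap_interval_ns(interval_ns: int) -> int:
    """Raise very small non-zero intervals to 100us so that a timer cannot
    re-fire fast enough to monopolise the system."""
    if 0 < interval_ns < MIN_INTERVAL_NS:
        return MIN_INTERVAL_NS
    return interval_ns


if __name__ == "__main__":
    print(cap_interval_ns(1))          # tiny interval is raised
    print(cap_interval_ns(5_000_000))  # 5ms interval is left alone
```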
     

10 Aug, 2011

3 commits

  • Following common_timer_get, zero out the itimerspec passed in.

    CC: Thomas Gleixner
    CC: stable@kernel.org
    Signed-off-by: John Stultz

    John Stultz
     
  • We don't check if old_setting is non-NULL before assigning it, so
    correct this.

    CC: Thomas Gleixner
    CC: stable@kernel.org
    Signed-off-by: John Stultz

    John Stultz
     
  • syslog-ng versions before 3.3.0beta1 (2011-05-12) assume that
    CAP_SYS_ADMIN is sufficient to access syslog, so ever since CAP_SYSLOG
    was introduced (2010-11-25) they have triggered a warning.

    Commit ee24aebffb75 ("cap_syslog: accept CAP_SYS_ADMIN for now")
    improved matters a little by making syslog-ng work again, just keeping
    the WARN_ONCE(). But still, this is a warning that writes a stack trace
    we don't care about to syslog, sets a taint flag, and alarms sysadmins
    when nothing worse has happened than use of an old userspace with a
    recent kernel.

    Convert the WARN_ONCE to a printk_once to avoid that while continuing to
    give userspace developers a hint that this is an unwanted
    backward-compatibility feature and won't be around forever.

    Reported-by: Ralf Hildebrandt
    Reported-by: Niels
    Reported-by: Paweł Sikora
    Signed-off-by: Jonathan Nieder
    Liked-by: Gergely Nagy
    Acked-by: Serge Hallyn
    Acked-by: James Morris
    Signed-off-by: Linus Torvalds

    Jonathan Nieder
     

09 Aug, 2011

1 commit

  • match_held_lock() was assuming it was being called on a lock class
    that had already seen usage.

    This condition was true for bug-free code using lockdep_assert_held(),
    since you're in fact holding the lock when calling it. However the
    assumption fails the moment you assume the assertion can fail, which
    is the whole point of having the assertion in the first place.

    Anyway, now that there are more lockdep_is_held() users, notably
    __rcu_dereference_check(), it's much easier to trigger this since we
    test for a number of locks and we only need to hold any one of them to
    be good.

    Reported-by: Sergey Senozhatsky
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1312547787.28695.2.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

06 Aug, 2011

1 commit

  • In the course of testing jump labels for use with the CFS
    bandwidth controller, Paul Turner discovered that using jump
    labels reduced the branch count and the instruction count, but
    did not reduce the cycle count or wall time.

    I noticed that having the jump_label.o included in the kernel
    but not used in any way still caused this increase in cycle
    count and wall time. Thus, I moved jump_label.o in the
    kernel/Makefile, thus changing the link order, and presumably
    moving it out of hot icache areas. This brought down the cycle
    count/time as expected.

    In addition to Paul's testing, I've tested the patch using a
    single 'static_branch()' in the getppid() path, and basically
    running tight loops of calls to getppid(). Here are my results
    for the branch disabled case:

    With jump labels turned on (CONFIG_JUMP_LABEL), branch disabled:

    Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs):

    3,969,510,217 instructions # 0.864 IPC ( +-0.000% )
    4,592,334,954 cycles ( +- 0.046% )
    751,634,470 branches ( +- 0.000% )

    1.722635797 seconds time elapsed ( +- 0.046% )

    Jump labels turned off (CONFIG_JUMP_LABEL not set), branch
    disabled:

    Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs):

    4,009,611,846 instructions # 0.867 IPC ( +-0.000% )
    4,622,210,580 cycles ( +- 0.012% )
    771,662,904 branches ( +- 0.000% )

    1.734341454 seconds time elapsed ( +- 0.022% )

    Signed-off-by: Jason Baron
    Cc: rth@redhat.com
    Cc: a.p.zijlstra@chello.nl
    Cc: rostedt@goodmis.org
    Link: http://lkml.kernel.org/r/20110805204040.GG2522@redhat.com
    Signed-off-by: Ingo Molnar
    Tested-by: Paul Turner

    Jason Baron
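
The two result sets above can be compared directly. This sketch only redoes the arithmetic on the reported counters; the dictionary names are hypothetical.

```python
# Counters copied from the two 'perf stat' outputs above (branch disabled).
JUMP_LABEL_ON = {
    "instructions": 3_969_510_217,
    "cycles": 4_592_334_954,
    "branches": 751_634_470,
}
JUMP_LABEL_OFF = {
    "instructions": 4_009_611_846,
    "cycles": 4_622_210_580,
    "branches": 771_662_904,
}


def ipc(counters: dict) -> float:
    """Instructions per cycle for one measurement."""
    return counters["instructions"] / counters["cycles"]


# Events saved by enabling jump labels, per counter.
SAVED = {name: JUMP_LABEL_OFF[name] - JUMP_LABEL_ON[name] for name in JUMP_LABEL_ON}

if __name__ == "__main__":
    for name, delta in SAVED.items():
        share = delta / JUMP_LABEL_OFF[name] * 100.0
        print(f"{name}: {delta:,} fewer ({share:.2f}%)")
    print(f"IPC on/off: {ipc(JUMP_LABEL_ON):.3f} / {ipc(JUMP_LABEL_OFF):.3f}")
```

With the link order changed, the cycle count drops along with the branch and instruction counts, which is the outcome the message describes.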
     

05 Aug, 2011

2 commits


04 Aug, 2011

5 commits

  • lockdep_init_map() only initializes parts of lockdep_map and triggers
    a kmemcheck warning when it is copied as a whole. There isn't anything
    to be gained by clearing selectively. memset() the whole structure
    and remove the loop for ->class_cache[] clearing.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=35532

    Signed-off-by: Tejun Heo
    Reported-and-tested-by: Christian Casteyde
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=35532
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110714131909.GJ3455@htj.dyndns.org
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • On Sun, 2011-07-24 at 21:06 -0400, Arnaud Lacombe wrote:

    > /src/linux/linux/kernel/lockdep.c: In function 'mark_held_locks':
    > /src/linux/linux/kernel/lockdep.c:2471:31: warning: comparison of
    > distinct pointer types lacks a cast

    The warning is harmless in this case, but the below makes it go away.

    Reported-by: Arnaud Lacombe
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1311588599.2617.56.camel@laptop
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Commit dd4e5d3ac4a ("lockdep: Fix trace_[soft,hard]irqs_[on,off]()
    recursion") made a bit of a mess of the various checks and error
    conditions.

    In particular it moved the check for !irqs_disabled() before the
    spurious enable test, resulting in some warnings.

    Reported-by: Arnaud Lacombe
    Reported-by: Dave Jones
    Reported-and-tested-by: Sergey Senozhatsky
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1311679697.24752.28.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The core device layer sends tons of uevent notifications for each device
    it finds, and if the kernel has been built with a non-empty
    CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode
    helper binary for all these events very early in the boot.

    Not only won't the root filesystem even be mounted at that point, we
    literally won't have necessarily even initialized all the process
    handling data structures at that point, which causes no end of silly
    problems even when the usermode helper doesn't actually succeed in
    executing.

    So just use our existing infrastructure to disable the usermodehelpers
    to make the kernel start out with them disabled. We enable them when
    we've at least initialized stuff a bit.

    Problems related to an uninitialized

    init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex

    were reported by various people.

    Reported-by: Manuel Lauss
    Reported-by: Richard Weinberger
    Reported-by: Marc Zyngier
    Acked-by: Kay Sievers
    Cc: Andrew Morton
    Cc: Vasiliy Kulikov
    Cc: Greg KH
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Ingo Molnar