07 Nov, 2011

1 commit

  • …/kernel/git/jeremy/xen

    * 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
    jump-label: initialize jump-label subsystem much earlier
    x86/jump_label: add arch_jump_label_transform_static()
    s390/jump-label: add arch_jump_label_transform_static()
    jump_label: add arch_jump_label_transform_static() to optimise non-live code updates
    sparc/jump_label: drop arch_jump_label_text_poke_early()
    x86/jump_label: drop arch_jump_label_text_poke_early()
    jump_label: if a key has already been initialized, don't nop it out
    stop_machine: make stop_machine safe and efficient to call early
    jump_label: use proper atomic_t initializer

    Conflicts:
    - arch/x86/kernel/jump_label.c
    Added __init_or_module to arch_jump_label_text_poke_early vs
    removal of that function entirely
    - kernel/stop_machine.c
    same patch ("stop_machine: make stop_machine safe and efficient
    to call early") merged twice, with whitespace fix in one version

    Linus Torvalds
     

03 Nov, 2011

3 commits

  • When I tried to send a patch to remove it, Andi told me we still need to
    keep compatibility with old libc, so we can't remove this completely.
    Instead, just make it default to n and remove the entry from
    feature-removal-schedule.txt.

    Signed-off-by: WANG Cong
    Cc: Eric Biederman
    Cc: Andi Kleen
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     
  • Expand root=PARTUUID=UUID syntax to support selecting a root partition by
    integer offset from a known, unique partition. This approach provides
    similar properties to specifying a device and partition number, but using
    the UUID as the unique path prior to evaluating the offset.

    For example,
    root=PARTUUID=99DE9194-FC15-4223-9192-FC243948F88B/PARTNROFF=1
    selects the partition with UUID 99DE.., then selects the next
    partition.

    This change is motivated by a particular usecase in Chromium OS where the
    bootloader can easily determine what partition it is on (by UUID) but
    doesn't perform general partition table walking.

    That said, support for this model provides a direct mechanism for the user
    to modify the root partition to boot without specifically needing to
    extract each UUID or update the bootloader explicitly when the root
    partition UUID is changed (if it is recreated to be larger, for instance).
    Pinning to a /boot-style partition UUID allows the arbitrary root
    partition reconfiguration/modifications with slightly less ambiguity than
    just [dev][partition] and less stringency than the specific root partition
    UUID.
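
    As a rough illustration of the syntax described above, the sketch below
    splits such a string into its UUID and offset. The names root_spec and
    parse_root_spec are hypothetical, the UUID characters are not validated,
    and this is not the kernel's actual parsing code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for a parsed root= specification. */
struct root_spec {
    char uuid[37];   /* 36 UUID characters plus the terminating NUL */
    int  partnroff;  /* partition-number offset; 0 means the named partition */
};

/*
 * Parse "PARTUUID=<36-char uuid>[/PARTNROFF=<int>]".
 * Returns 0 on success, -1 if the string is malformed.
 */
int parse_root_spec(const char *arg, struct root_spec *out)
{
    static const char prefix[] = "PARTUUID=";
    static const char off_key[] = "/PARTNROFF=";
    const char *p;
    char trailing;

    if (strncmp(arg, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    p = arg + sizeof(prefix) - 1;

    if (strlen(p) < 36)
        return -1;
    memcpy(out->uuid, p, 36);
    out->uuid[36] = '\0';
    p += 36;

    out->partnroff = 0;
    if (*p == '\0')
        return 0;                /* plain PARTUUID=<uuid>, no offset */

    if (strncmp(p, off_key, sizeof(off_key) - 1) != 0)
        return -1;
    p += sizeof(off_key) - 1;

    /* The extra %c detects trailing garbage after the number. */
    if (sscanf(p, "%d%c", &out->partnroff, &trailing) != 1)
        return -1;
    return 0;
}
```

    With the example string above, the sketch yields the UUID 99DE.. and an
    offset of 1; a caller would then look up that partition and step one
    partition number forward.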

    [sfr@canb.auug.org.au: fix init sections warning]
    Signed-off-by: Will Drewry
    Cc: Kay Sievers
    Cc: Randy Dunlap
    Cc: Namhyung Kim
    Cc: Trond Myklebust
    Cc: Jens Axboe
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Drewry
     
  • When a cramfs ramdisk padded with 512 bytes is given to the kernel, the
    current identify_ramdisk_image function fails to identify it.

    Tested with a padded cramfs image on an ARM based board.
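
    A minimal sketch of the idea, assuming a little-endian image and the
    usual cramfs magic number: look for the superblock at offset 0 first and
    then at offset 512. The function name find_cramfs_superblock is
    hypothetical; the real identify_ramdisk_image() does considerably more.

```c
#include <stddef.h>
#include <stdint.h>

#define CRAMFS_MAGIC 0x28cd3d45u  /* cramfs superblock magic */
#define CRAMFS_PAD   512          /* optional padding before the superblock */

/* Read a 32-bit little-endian value (assumption: little-endian image). */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return the byte offset of the cramfs superblock (0 or 512),
 * or -1 if neither location carries the magic.
 */
int find_cramfs_superblock(const unsigned char *buf, size_t len)
{
    if (len >= 4 && read_le32(buf) == CRAMFS_MAGIC)
        return 0;
    if (len >= CRAMFS_PAD + 4 && read_le32(buf + CRAMFS_PAD) == CRAMFS_MAGIC)
        return CRAMFS_PAD;
    return -1;
}
```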

    Signed-off-by: Neil Armstrong
    Cc: Namhyung Kim
    Cc: Davidlohr Bueso
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Armstrong
     

26 Oct, 2011

4 commits

  • * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    llist: Add back llist_add_batch() and llist_del_first() prototypes
    sched: Don't use tasklist_lock for debug prints
    sched: Warn on rt throttling
    sched: Unify the ->cpus_allowed mask copy
    sched: Wrap scheduler p->cpus_allowed access
    sched: Request for idle balance during nohz idle load balance
    sched: Use resched IPI to kick off the nohz idle balance
    sched: Fix idle_cpu()
    llist: Remove cpu_relax() usage in cmpxchg loops
    sched: Convert to struct llist
    llist: Add llist_next()
    irq_work: Use llist in the struct irq_work logic
    llist: Return whether list is empty before adding in llist_add()
    llist: Move cpu_relax() to after the cmpxchg()
    llist: Remove the platform-dependent NMI checks
    llist: Make some llist functions inline
    sched, tracing: Show PREEMPT_ACTIVE state in trace_sched_switch
    sched: Remove redundant test in check_preempt_tick()
    sched: Add documentation for bandwidth control
    sched: Return unused runtime on group dequeue
    ...

    Linus Torvalds
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
    rcu: Move propagation of ->completed from rcu_start_gp() to rcu_report_qs_rsp()
    rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states
    rcu: Wire up RCU_BOOST_PRIO for rcutree
    rcu: Make rcu_torture_boost() exit loops at end of test
    rcu: Make rcu_torture_fqs() exit loops at end of test
    rcu: Permit rt_mutex_unlock() with irqs disabled
    rcu: Avoid having just-onlined CPU resched itself when RCU is idle
    rcu: Suppress NMI backtraces when stall ends before dump
    rcu: Prohibit grace periods during early boot
    rcu: Simplify unboosting checks
    rcu: Prevent early boot set_need_resched() from __rcu_pending()
    rcu: Dump local stack if cannot dump all CPUs' stacks
    rcu: Move __rcu_read_unlock()'s barrier() within if-statement
    rcu: Improve rcu_assign_pointer() and RCU_INIT_POINTER() documentation
    rcu: Make rcu_assign_pointer() unconditionally insert a memory barrier
    rcu: Make rcu_implicit_dynticks_qs() locals be correct size
    rcu: Eliminate in_irq() checks in rcu_enter_nohz()
    nohz: Remove nohz_cpu_mask
    rcu: Document interpretation of RCU-lockdep splats
    rcu: Allow rcutorture's stat_interval parameter to be changed at runtime
    ...

    Linus Torvalds
     
  • The user may use "foo-bar" for a kernel parameter defined as "foo_bar".
    Make sure it works the other way around too.

    Apply the equality of dashes and underscores on early_params and __setup
    params as well.

    The example given in Documentation/kernel-parameters.txt indicates that
    this is the intended behaviour.

    With the patch the kernel accepts "log-buf-len=1M" as expected.
    https://bugzilla.redhat.com/show_bug.cgi?id=744545
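
    The comparison can be sketched roughly as follows: both names are
    normalised character by character before being compared, so the match is
    symmetric. The function name param_names_equal is made up for this
    example and is not necessarily what the patch uses.

```c
#include <stdbool.h>

/* Treat '-' as '_' so both spellings compare equal. */
static char dash2underscore(char c)
{
    return c == '-' ? '_' : c;
}

/* Compare two parameter names, ignoring the dash/underscore distinction. */
bool param_names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (dash2underscore(*a) != dash2underscore(*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;   /* both strings must end at the same point */
}
```

    Because both sides are normalised, "log-buf-len" matches "log_buf_len"
    regardless of which spelling the kernel source or the user chose.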

    Signed-off-by: Michal Schmidt
    Signed-off-by: Rusty Russell (neatened implementations)

    Michal Schmidt
     
  • Initialize jump_labels much, much earlier, so they're available for use
    during system setup.

    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Peter Zijlstra

    Jeremy Fitzhardinge
     

04 Oct, 2011

1 commit


01 Oct, 2011

1 commit


30 Sep, 2011

1 commit

  • Commit d5767c53535a ("bootup: move 'usermodehelper_enable()' to the end
    of do_basic_setup()") moved 'usermodehelper_enable()' to the end of
    do_basic_setup(), after the initcalls. But since that change uvesafb
    no longer works on my computer, and I lose the boot splash.

    So maybe we could call usermodehelper_enable() a little earlier, so
    that tasks which need early init with the help of user mode can work.

    [ I would *really* prefer that initcalls not call into user space - even
    the real 'init' hasn't been execve'd yet, after all! But for uvesafb
    it really does look like we don't have much choice.

    I considered doing this when we mount the root filesystem, but
    depending on config options that is in multiple places. We could do
    the usermode helper enable as a rootfs_initcall()..

    So I'm just using wang yanqing's trivial patch. It's not wonderful,
    but it's simple and should work. We should revisit this some day,
    though. - Linus ]

    Signed-off-by: Linus Torvalds

    wangyanqing
     

29 Sep, 2011

2 commits

  • This commit eliminates the possibility of running TREE_PREEMPT_RCU
    when SMP=n and of running TINY_RCU when PREEMPT=y. People who really
    want these combinations can hand-edit init/Kconfig, but eliminating
    them as choices for production systems reduces the amount of testing
    required. It will also allow cutting out a few #ifdefs.

    Note that running TREE_RCU and TINY_RCU on single-CPU systems using
    SMP-built kernels is still supported.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Doing it just before starting to call into cpu_idle() made a sick kind
    of sense only because the original bug we fixed (see commit
    288d5abec831: "Boot up with usermodehelper disabled") was about problems
    with some scheduler data structures not being initialized, and they had
    better be initialized at that point.

    But it really didn't make any other conceptual sense, and doing it after
    the initial "schedule()" call for the idle thread actually opened up a
    race: what if the main initialization thread did everything without
    needing to sleep, and got all the way into user land too? Without
    actually having scheduled back to the idle thread?

    Now, in normal circumstances that doesn't ever happen, but it looks like
    Richard Cochran triggered exactly that on his ARM IXP4xx machines:

    "I have some ARM IXP4xx based machines that use the two on chip MAC
    ports (aka NPEs). The NPE needs a firmware in order to function.
    Ever since the following commit [that 288d5abec831 one], it is no
    longer possible to bring up the interfaces during the init scripts."

    with a call trace showing an ioctl coming from user space. Richard says:

    "The init is busybox, and the startup script does mount, syslogd, and
    then ifup, so that all can go by quickly."

    The fix is to move the usermodehelper_enable() into the main 'init'
    thread, and just put it after we've done all our initcalls. By then,
    everything really should be up, but we've obviously not actually started
    the user-mode portion of init yet.

    Reported-and-tested-by: Richard Cochran
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

22 Sep, 2011

1 commit

  • When a malformed loglevel value (for example "${abc}") is passed on the
    kernel cmdline, the loglevel itself is being set to 0.

    That then suppresses all following messages, including all the errors
    and crashes caused by other malformed cmdline options. This could make
    the debugging process quite tricky.

    This patch leaves the previous value of loglevel if the new value is
    incorrect and reports an error code in this case.
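
    A user-space sketch of that behaviour, assuming strtol() as the parser
    (the kernel uses its own helpers): the stored level is only overwritten
    once the whole string has been validated. The name set_loglevel is
    hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a loglevel argument. On success store it in *level and return 0.
 * On malformed input leave *level untouched and return -EINVAL.
 */
int set_loglevel(const char *str, int *level)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(str, &end, 10);
    if (end == str || *end != '\0' || errno == ERANGE ||
        val < INT_MIN || val > INT_MAX)
        return -EINVAL;

    *level = (int)val;   /* written only after validation succeeded */
    return 0;
}
```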

    Signed-off-by: Alexander Sverdlin
    Signed-off-by: Linus Torvalds

    Alexander Sverdlin
     

14 Aug, 2011

1 commit

  • In this patch we introduce the notion of CFS bandwidth, partitioned into
    globally unassigned bandwidth, and locally claimed bandwidth.

    - The global bandwidth is per task_group, it represents a pool of unclaimed
    bandwidth that cfs_rqs can allocate from.
    - The local bandwidth is tracked per-cfs_rq, this represents allotments from
    the global pool bandwidth assigned to a specific cpu.

    Bandwidth is managed via cgroupfs, adding two new interfaces to the cpu subsystem:
    - cpu.cfs_period_us : the bandwidth period in usecs
    - cpu.cfs_quota_us : the cpu bandwidth (in usecs) that this tg will be allowed
    to consume over period above.
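
    The split between global and local bandwidth can be pictured with the
    simplified model below. The structure and function names are invented
    for illustration, locking is ignored, and the slice size handed out per
    request is an assumption of this sketch rather than something stated
    above.

```c
#include <stdint.h>

/* Hypothetical per-task_group pool: quota not yet handed to any CPU. */
struct global_pool {
    uint64_t quota_us;    /* bandwidth allowed per period (cpu.cfs_quota_us) */
    uint64_t runtime_us;  /* unclaimed bandwidth left in the current period */
};

/* Hypothetical per-cfs_rq state: bandwidth already claimed by one CPU. */
struct local_rq {
    uint64_t runtime_us;
};

/* Start a new period: the global pool is refilled to the full quota. */
void pool_refill(struct global_pool *g)
{
    g->runtime_us = g->quota_us;
}

/*
 * Move up to slice_us from the global pool to a local runqueue.
 * Returns the amount actually granted (0 means the group is out of quota).
 */
uint64_t pool_claim(struct global_pool *g, struct local_rq *rq,
                    uint64_t slice_us)
{
    uint64_t grant = slice_us < g->runtime_us ? slice_us : g->runtime_us;

    g->runtime_us -= grant;
    rq->runtime_us += grant;
    return grant;
}
```

    With a quota of 10000us, two CPUs each asking for 6000us would receive
    6000us and 4000us respectively; any further request in the same period
    is granted nothing until the pool is refilled.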

    Signed-off-by: Paul Turner
    Signed-off-by: Nikhil Rao
    Signed-off-by: Bharata B Rao
    Reviewed-by: Hidetoshi Seto
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20110721184756.972636699@google.com
    Signed-off-by: Ingo Molnar

    Paul Turner
     

04 Aug, 2011

2 commits

  • The core device layer sends tons of uevent notifications for each device
    it finds, and if the kernel has been built with a non-empty
    CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode
    helper binary for all these events very early in the boot.

    Not only won't the root filesystem even be mounted at that point, we
    literally won't have necessarily even initialized all the process
    handling data structures at that point, which causes no end of silly
    problems even when the usermode helper doesn't actually succeed in
    executing.

    So just use our existing infrastructure to disable the usermodehelpers
    to make the kernel start out with them disabled. We enable them when
    we've at least initialized stuff a bit.

    Problems related to an uninitialized

    init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex

    reported by various people.

    Reported-by: Manuel Lauss
    Reported-by: Richard Weinberger
    Reported-by: Marc Zyngier
    Acked-by: Kay Sievers
    Cc: Andrew Morton
    Cc: Vasiliy Kulikov
    Cc: Greg KH
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • While it's at its least, make a number of boring nitpicky cleanups to
    shmem.c, mostly for consistency of variable naming. Things like "swap"
    instead of "entry", "pgoff_t index" instead of "unsigned long idx".

    And since everything else here is prefixed "shmem_", better change
    init_tmpfs() to shmem_init().

    Signed-off-by: Hugh Dickins
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

26 Jul, 2011

2 commits

  • For each CPU, do the calibration delay only once. For subsequent calls,
    use the cached per-CPU value of loops_per_jiffy.

    This saves about 200ms of resume time on dual core Intel Atom N5xx based
    systems. This helps bring down the kernel resume time on such systems
    from about 500ms to about 300ms.
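
    The caching idea can be sketched as below. A plain array stands in for
    real per-CPU storage, and measure_loops_per_jiffy() is a made-up
    stand-in for the expensive calibration loop; only the caching pattern is
    the point.

```c
#define MAX_CPUS 8  /* illustrative limit for this sketch */

static unsigned long cached_lpj[MAX_CPUS];  /* 0 means "not calibrated yet" */
int calibration_runs;                       /* counts trips through the slow path */

/* Stand-in for the real, slow measurement (hypothetical values). */
static unsigned long measure_loops_per_jiffy(int cpu)
{
    calibration_runs++;
    return 1000000UL + (unsigned long)cpu;
}

/* Calibrate a CPU once; later calls reuse the cached value. */
unsigned long calibrate_delay_cached(int cpu)
{
    if (cached_lpj[cpu] == 0)
        cached_lpj[cpu] = measure_loops_per_jiffy(cpu);
    return cached_lpj[cpu];
}
```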

    [akpm@linux-foundation.org: make cpu_loops_per_jiffy static]
    [akpm@linux-foundation.org: clean up message text]
    [akpm@linux-foundation.org: fix things up after upstream rmk changes]
    Signed-off-by: Sameer Nanda
    Cc: Phil Carmody
    Cc: Andrew Worsley
    Cc: David Daney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sameer Nanda
     
  • In commit a2c8990aed5ab ("memsw: remove noswapaccount kernel parameter"),
    Michal forgot to remove some left pieces of noswapaccount in the tree,
    this patch removes them all.

    Signed-off-by: WANG Cong
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     

24 Jul, 2011

1 commit

  • …us' and 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    um: Make rwsem.S depend on CONFIG_RWSEM_XCHGADD_ALGORITHM

    * 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    debug: Make CONFIG_EXPERT select CONFIG_DEBUG_KERNEL to unhide debug options

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    genirq: Remove unused CHECK_IRQ_PER_CPU()

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf tools, x86: Fix 32-bit compile on 64-bit system

    Linus Torvalds
     

23 Jul, 2011

1 commit

  • …rnel/git/tip/linux-2.6-tip

    * 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    mips: Fix i8253 clockevent fallout
    i8253: Cleanup outb/inb magic
    arm: Footbridge: Use common i8253 clockevent
    mips: Use common i8253 clockevent
    x86: Use common i8253 clockevent
    i8253: Create common clockevent implementation
    i8253: Export i8253_lock unconditionally
    pcpskr: MIPS: Make config dependencies finer grained
    pcspkr: Cleanup Kconfig dependencies
    i8253: Move remaining content and delete asm/i8253.h
    i8253: Consolidate definitions of PIT_LATCH
    x86: i8253: Consolidate definitions of global_clock_event
    i8253: Alpha, PowerPC: Remove unused asm/8253pit.h
    alpha: i8253: Cleanup remaining users of i8253pit.h
    i8253: Remove I8253_LOCK config
    i8253: Make pcsp sound driver use the shared i8253_lock
    i8253: Make pcspkr input driver use the shared i8253_lock
    i8253: Consolidate all kernel definitions of i8253_lock
    i8253: Unify all kernel declarations of i8253_lock
    i8253: Create linux/i8253.h and use it in all 8253 related files

    Linus Torvalds
     

23 Jun, 2011

1 commit

  • Secondary CPU bringup typically calls calibrate_delay() during its
    initialization. However, calibrate_delay() modifies a global variable
    (loops_per_jiffy) used for udelay() and __delay().

    A side effect of 71c696b1 ("calibrate: extract fall-back calculation
    into own helper") introduced in the 2.6.39 merge window means that we
    end up with a substantial period where loops_per_jiffy is zero. This
    causes the spinlock debugging code to malfunction:

        u64 loops = loops_per_jiffy * HZ;
        for (;;) {
                for (i = 0; i < loops; i++) {
                        if (arch_spin_trylock(&lock->raw_lock))
                                return;
                        __delay(1);
                }
                ...
        }

    by never calling arch_spin_trylock() - resulting in the CPU locking
    up in an infinite loop inside __spin_lock_debug().

    Work around this by writing to loops_per_jiffy only once we have
    completed all the calibration decisions.
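
    In outline, the workaround amounts to computing into a local variable
    and publishing the result with a single store. The three try_*() helpers
    below are hypothetical placeholders for the calibration strategies and
    return fixed values purely for illustration.

```c
unsigned long loops_per_jiffy = 4096;  /* shared value read by delay loops */

/* Stand-ins for the individual calibration strategies (hypothetical). */
static unsigned long try_preset(void)
{
    return 0;        /* not available in this sketch */
}

static unsigned long try_timer(void)
{
    return 0;        /* not available in this sketch */
}

static unsigned long try_converge(void)
{
    return 250000;   /* fall-back measurement */
}

void calibrate_delay_sketch(void)
{
    unsigned long lpj;   /* local scratch, never visible to other CPUs */

    lpj = try_preset();
    if (!lpj)
        lpj = try_timer();
    if (!lpj)
        lpj = try_converge();

    /* Single store, issued only after every decision has been made. */
    loops_per_jiffy = lpj;
}
```

    Other CPUs spinning in a loop bounded by loops_per_jiffy then see either
    the old value or the final one, never an intermediate zero.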

    Tested-by: Santosh Shilimkar
    Signed-off-by: Russell King
    Cc: (2.6.39-stable)
    --
    Better solutions (such as omitting the calibration for secondary CPUs,
    or arranging for calibrate_delay() to return the LPJ value and leave
    it to the caller to decide where to store it) are a possibility, but
    would be much more invasive into each architecture.

    I think this is the best solution for -rc and stable, but it should be
    revisited for the next merge window.

    init/calibrate.c | 14 ++++++++------
    1 files changed, 8 insertions(+), 6 deletions(-)
    Signed-off-by: Linus Torvalds

    Russell King
     

17 Jun, 2011

1 commit

  • kdump (the 2nd kernel) sometimes hangs due to a pending IPI from the
    1st kernel. The kernel panics because the IPI arrives before
    call_single_queue is initialized.

    To fix the crash, rename init_call_single_data() to call_function_init()
    and call it in start_kernel() so that call_single_queue can be
    initialized before enabling interrupts.

    The details of the crash are:

    (1) 2nd kernel boots up

    (2) A pending IPI from 1st kernel comes when irqs are first enabled
    in start_kernel().

    (3) Kernel tries to handle the interrupt, but call_single_queue
    is not initialized yet at this point. As a result, in the
    generic_smp_call_function_single_interrupt(), NULL pointer
    dereference occurs when list_replace_init() tries to access
    &q->list.next.

    Therefore this patch changes the name of init_call_single_data()
    to call_function_init() and calls it before local_irq_enable()
    in start_kernel().

    Signed-off-by: Takao Indoh
    Reviewed-by: WANG Cong
    Acked-by: Neil Horman
    Acked-by: Vivek Goyal
    Acked-by: Peter Zijlstra
    Cc: Milton Miller
    Cc: Jens Axboe
    Cc: Paul E. McKenney
    Cc: kexec@lists.infradead.org
    Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
    Signed-off-by: Ingo Molnar

    Takao Indoh
     

16 Jun, 2011

3 commits

  • CONFIG_CONSTRUCTORS controls support for running constructor functions at
    kernel init time. According to commit b99b87f70c7785ab ("kernel:
    constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this. However,
    CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
    and CONFIG_GCOV_KERNEL depends on it. Instead, default it to n and have
    CONFIG_GCOV_KERNEL select it, so that the normal case of
    CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.

    Observed in the short list of =y values in a minimal kernel configuration.

    Signed-off-by: Josh Triplett
    Acked-by: WANG Cong
    Acked-by: Peter Oberparleiter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • Remove calibrate_delay_direct()'s KERN_DEBUG printk related to bogomips
    calculation as it appears when booting every core on setups with
    'ignore_loglevel' which dmesg people scan for possible issues. As the
    message doesn't show very useful information to the widest audience of
    kernel boot message gazers, it should be removed.

    Introduced by commit d2b463135f84 ("init/calibrate.c: fix for critical
    bogoMIPS intermittent calculation failure").

    Signed-off-by: Borislav Petkov
    Cc: Andrew Worsley
    Cc: Phil Carmody
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • The "hostname" tool falls back to setting the hostname to "localhost" if
    /etc/hostname does not exist. Distribution init scripts have the same
    fallback. However, if userspace never calls sethostname, such as when
    booting with init=/bin/sh, or otherwise booting a minimal system without
    the usual init scripts, the default hostname of "(none)" remains,
    unhelpfully appearing in various places such as prompts ("root@(none):~#")
    and logs. Furthermore, "(none)" doesn't typically resolve to anything
    useful.

    Make the default hostname configurable. This removes the need for the
    standard fallback, provides a useful default for systems that never call
    sethostname, and makes minimal systems that much more useful with less
    configuration. Distributions could choose to use "localhost" here to
    avoid the fallback, while embedded systems may wish to use a specific
    target hostname.

    Signed-off-by: Josh Triplett
    Acked-by: Linus Torvalds
    Acked-by: David Miller
    Cc: Serge Hallyn
    Cc: Kel Modderman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     

09 Jun, 2011

2 commits

  • Lengthy lists of the kind "depends on ARCH1 || ARCH2 ... || ARCH123" are
    usually either wrong or too coarse grained. Or simply an ugly sin.

    [ tglx: Fixed up amigaone ]

    Signed-off-by: Ralf Baechle
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Acked-by: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Gerhard Pircher
    Link: http://lkml.kernel.org/r/20110601180610.984881988@duck.linux-mips.net
    Signed-off-by: Thomas Gleixner

    Ralf Baechle
     
  • Move them to drivers/clocksource/i8253.c and remove the
    implementations in arch/

    [ tglx: Avoid the extra file in lib - folded arch patches in. The
    export will become conditional in a later step ]

    Signed-off-by: Ralf Baechle
    Link: http://lkml.kernel.org/r/20110601180610.221426078@duck.linux-mips.net
    Cc: Russell King
    Signed-off-by: Thomas Gleixner

    Ralf Baechle
     

07 Jun, 2011

1 commit

  • Several debugging options currently default to y, such as
    CONFIG_DEBUG_BUGVERBOSE and CONFIG_DEBUG_RODATA. Embedded users
    might want to turn those options off to save space; however,
    turning them off requires turning on CONFIG_DEBUG_KERNEL to
    unhide them. Since CONFIG_DEBUG_KERNEL exists specifically to
    unhide debugging options, and CONFIG_EXPERT exists specifically
    to unhide options potentially needed by experts and/or embedded
    users, make CONFIG_EXPERT automatically imply
    CONFIG_DEBUG_KERNEL.

    Signed-off-by: Josh Triplett
    Acked-by: Frederic Weisbecker
    Cc: Sam Ravnborg
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20110606012358.GA1909@leaf
    Signed-off-by: Ingo Molnar

    Josh Triplett
     

30 May, 2011

1 commit

  • Thomas Gleixner reports that we now have a boot crash triggered by
    CONFIG_CPUMASK_OFFSTACK=y:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] find_next_bit+0x55/0xb0
    Call Trace:
    [] cpumask_any_but+0x2a/0x70
    [] flush_tlb_mm+0x2b/0x80
    [] pud_populate+0x35/0x50
    [] pgd_alloc+0x9a/0xf0
    [] mm_init+0xec/0x120
    [] mm_alloc+0x53/0xd0

    which was introduced by commit de03c72cfce5 ("mm: convert
    mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
    mm_init() vs mm_init_cpumask.

    Thomas wrote a patch to just fix the ordering of initialization, but I
    hate the new double allocation in the fork path, so I ended up instead
    doing some more radical surgery to clean it all up.

    Reported-by: Thomas Gleixner
    Reported-by: Ingo Molnar
    Cc: KOSAKI Motohiro
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

27 May, 2011

1 commit

  • The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
    leads to some problems:

    * cgroup creation is out-of-control
    * cgroup name can conflict when pids are looping
    * it is not possible to have a single process handling a lot of
    namespaces without falling into an exponential creation time
    * we may want to create a namespace without creating a cgroup

    The ns_cgroup was replaced by a compatibility flag 'clone_children',
    where a newly created cgroup will copy the parent cgroup values.
    The userspace has to manually create a cgroup and add a task to
    the 'tasks' file.

    This patch removes the ns_cgroup as suggested in the following thread:

    https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

    The 'cgroup_clone' function is removed because it is no longer used.

    This is a userspace-visible change. Commit 45531757b45c ("cgroup: notify
    ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
    printk warning users that the feature is planned for removal. Since that
    time we have heard from XXX users who were affected by this.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Serge E. Hallyn
    Cc: Eric W. Biederman
    Cc: Jamal Hadi Salim
    Reviewed-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Lezcano
     

25 May, 2011

4 commits

  • On larger systems, because of the numerous ACPI, Bootmem and EFI messages,
    the static log buffer overflows before the larger one specified by the
    log_buf_len param is allocated. Minimize the overflow by allocating the
    new log buffer as soon as possible.

    On kernels without memblock, a later call to setup_log_buf from
    kernel/init.c is the fallback.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix CONFIG_PRINTK=n build]
    Signed-off-by: Mike Travis
    Cc: Yinghai Lu
    Cc: "H. Peter Anvin"
    Cc: Jack Steiner
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Travis
     
  • A fix to the TSC (Time Stamp Counter) based bogoMIPS calculation used on
    secondary CPUs which has two faults:

    1: Not handling wrapping of the lower 32 bits of the TSC counter on
    32bit kernel - perhaps TSC is not reset by a warm reset?

    2: TSC and jiffies are not incrementing together properly. Either
    jiffies increment too quickly, or the Time Stamp Counter isn't
    incremented during an SMI while the real time clock is and jiffies are
    incremented.

    Case 1 can result in a factor of 16 too large a value which makes udelay()
    values too small and can cause mysterious driver errors. Case 2 appears
    to give smaller 10-15% errors after averaging but enough to cause
    occasional failures on my own board.
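
    One way to picture the handling of both cases is the filter below. It is
    a simplified sketch, not the patch itself: samples whose 32-bit counter
    wrapped are skipped, and samples far below the best rate are dropped
    before averaging. The 7/8 threshold and the names tsc_sample and
    robust_rate are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

struct tsc_sample {
    uint32_t start;  /* low 32 bits of the counter at the first tick */
    uint32_t end;    /* low 32 bits of the counter at the last tick */
};

/*
 * Average the per-sample rates, skipping samples where the counter wrapped
 * (start >= end) and samples whose rate is below 7/8 of the largest good
 * rate. Returns 0 if no usable sample remains.
 */
uint32_t robust_rate(const struct tsc_sample *s, size_t n)
{
    uint32_t max = 0;
    uint64_t sum = 0;
    size_t good = 0, i;

    /* First pass: find the largest rate among non-wrapped samples. */
    for (i = 0; i < n; i++)
        if (s[i].start < s[i].end && s[i].end - s[i].start > max)
            max = s[i].end - s[i].start;

    /* Second pass: average only the plausible samples. */
    for (i = 0; i < n; i++) {
        uint32_t rate;

        if (s[i].start >= s[i].end)
            continue;    /* case 1: counter wrapped, discard */
        rate = s[i].end - s[i].start;
        if ((uint64_t)rate * 8 < (uint64_t)max * 7)
            continue;    /* case 2: implausibly low, discard */
        sum += rate;
        good++;
    }
    return good ? (uint32_t)(sum / good) : 0;
}
```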

    I have tested this code on my own branch and attach patch suitable for
    current kernel code. See below for examples of the failures and how the
    fix handles these situations now.

    I reported this issue earlier here:
    Intermittent problem with BogoMIPs calculation on Intel AP CPUs -
    http://marc.info/?l=linux-kernel&m=129947246316875&w=4

    I suspect this issue has been seen by others but as it is intermittent and
    bogoMIPS for secondary CPUs are no longer printed out it might have been
    difficult to identify this as the cause. Perhaps these unresolved issues,
    although quite old, might be relevant as possibly this fault has been
    around for a while. In particular Case 1 may only be relevant to 32bit
    kernels on newer HW (most people run 64bit kernels?). Case 2 is less
    dramatic since the earlier fix in this area and also intermittent.

    Re: bogomips discrepancy on Intel Core2 Quad CPU -
    http://marc.info/?l=linux-kernel&m=118929277524298&w=4
    slow system and bogus bogomips -
    http://marc.info/?l=linux-kernel&m=116791286716107&w=4
    Re: Re: [RFC-PATCH] clocksource: update lpj if clocksource has -
    http://marc.info/?l=linux-kernel&m=128952775819467&w=4

    This issue is masked a little by commit feae3203d711db0a ("timers, init:
    Limit the number of per cpu calibration bootup messages") which only
    prints out the first bogoMIPS value making it much harder to notice other
    values differing. Perhaps it should be changed to only suppress them when
    they are similar values?

    Here are some outputs showing faults occurring and the new code handling
    them properly. See my earlier message for examples of the original
    failure.

    Case 1: A Time Stamp Counter wrap:
    ...
    Calibrating delay loop (skipped), value calculated using timer
    frequency.. 6332.70 BogoMIPS (lpj=31663540)
    ....
    calibrate_delay_direct() timer_rate_max=31666493
    timer_rate_min=31666151 pre_start=4170369255 pre_end=4202035539
    calibrate_delay_direct() timer_rate_max=2425955274
    timer_rate_min=2425954941 pre_start=4265368533 pre_end=2396356387
    calibrate_delay_direct() ignoring timer_rate as we had a TSC wrap
    around start=4265368581 >=post_end=2396356511
    calibrate_delay_direct() timer_rate_max=31666274
    timer_rate_min=31665942 pre_start=2440373374 pre_end=2472039515
    calibrate_delay_direct() timer_rate_max=31666492
    timer_rate_min=31666160 pre_start=2535372139 pre_end=2567038422
    calibrate_delay_direct() timer_rate_max=31666455
    timer_rate_min=31666207 pre_start=2630371084 pre_end=2662037415
    Calibrating delay using timer specific routine.. 6333.28 BogoMIPS (lpj=31666428)
    Total of 2 processors activated (12665.99 BogoMIPS).
    ....

    Case 2: Something (presumably the SMM interrupt?) causing a
    very low increase in the TSC counter for the DELAY_CALIBRATION_TICKS
    increase in jiffies
    ...
    Calibrating delay loop (skipped), value calculated using timer
    frequency.. 6333.25 BogoMIPS (lpj=31666270)
    ...
    calibrate_delay_direct() timer_rate_max=31666483
    timer_rate_min=31666074 pre_start=4199536526 pre_end=4231202809
    calibrate_delay_direct() timer_rate_max=864348 timer_rate_min=864016
    pre_start=2405343672 pre_end=2406207897
    calibrate_delay_direct() timer_rate_max=31666483
    timer_rate_min=31666179 pre_start=2469540464 pre_end=2501206823
    calibrate_delay_direct() timer_rate_max=31666511
    timer_rate_min=31666122 pre_start=2564539400 pre_end=2596205712
    calibrate_delay_direct() timer_rate_max=31666084
    timer_rate_min=31665685 pre_start=2659538782 pre_end=2691204657
    calibrate_delay_direct() dropping min bogoMips estimate 1 = 864348
    Calibrating delay using timer specific routine.. 6333.27 BogoMIPS (lpj=31666390)
    Total of 2 processors activated (12666.53 BogoMIPS).
    ...

    After 70 boots I saw 2 variations
    Reviewed-by: Phil Carmody
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Worsley
     
  • cpumask_t is a very big struct and cpu_vm_mask is placed in the wrong
    position. This might reduce the cache hit ratio.

    This patch makes two changes.
    1) Move cpumask to the end of mm_struct, because usually only the front
    bits of cpumask are accessed when the system has cpu-hotplug capability.
    2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce memory
    footprint if cpumask_size() uses nr_cpumask_bits properly in the future.

    In addition, this patch renames cpu_vm_mask to cpu_vm_mask_var.
    It may help to detect out-of-tree cpu_vm_mask users.

    This patch has no functional change.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KOSAKI Motohiro
    Cc: David Howells
    Cc: Koichi Yasutake
    Cc: Hugh Dickins
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
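
The layout idea in this commit can be illustrated with a userspace analogue. Everything below is hypothetical (the demo_* names, the field names hot_a and hot_b, the 4096-CPU maximum); it is not the kernel's mm_struct or cpumask API. It only shows why a large embedded bitmap in the middle of a struct pushes neighbouring fields apart, and how a mask placed last and reached through a pointer can be sized at run time.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define DEMO_NR_CPUS 4096 /* hypothetical compile-time maximum */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_LONGS(bits) (((bits) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Fixed-size mask: always as large as the compile-time maximum. */
struct demo_mask_fixed {
	unsigned long bits[DEMO_LONGS(DEMO_NR_CPUS)];
};

/* Before: the big mask sits between two frequently used fields. */
struct demo_mm_before {
	int hot_a;
	struct demo_mask_fixed cpu_vm_mask;
	int hot_b;
};

/* After: hot fields stay adjacent; the mask comes last, behind a pointer. */
struct demo_mm_after {
	int hot_a;
	int hot_b;
	unsigned int nr_cpus;
	unsigned long *cpu_vm_mask_var;
};

/* Allocate a zeroed mask sized for nr_cpus bits. Returns 0 on success. */
int demo_mm_init(struct demo_mm_after *mm, unsigned int nr_cpus)
{
	assert(nr_cpus > 0);
	mm->hot_a = 0;
	mm->hot_b = 0;
	mm->nr_cpus = nr_cpus;
	mm->cpu_vm_mask_var = calloc(DEMO_LONGS(nr_cpus), sizeof(unsigned long));
	return mm->cpu_vm_mask_var ? 0 : -1;
}

void demo_mask_set(struct demo_mm_after *mm, unsigned int cpu)
{
	assert(cpu < mm->nr_cpus);
	mm->cpu_vm_mask_var[cpu / DEMO_BITS_PER_LONG] |= 1UL << (cpu % DEMO_BITS_PER_LONG);
}

int demo_mask_test(const struct demo_mm_after *mm, unsigned int cpu)
{
	assert(cpu < mm->nr_cpus);
	return (mm->cpu_vm_mask_var[cpu / DEMO_BITS_PER_LONG] >> (cpu % DEMO_BITS_PER_LONG)) & 1UL;
}

void demo_mm_destroy(struct demo_mm_after *mm)
{
	free(mm->cpu_vm_mask_var);
	mm->cpu_vm_mask_var = NULL;
}
```

In this sketch the memory saving comes from allocating only as many bits as the running system needs, which corresponds to the "may help in future" remark in the commit message rather than to something the patch itself is stated to achieve.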
     
  • * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
    kbuild: make KBUILD_NOCMDDEP=1 handle empty built-in.o
    scripts/kallsyms.c: fix potential segfault
    scripts/gen_initramfs_list.sh: Convert to a /bin/sh script
    kbuild: Fix GNU make v3.80 compatibility
    kbuild: Fix passing -Wno-* options to gcc 4.4+
    kbuild: move scripts/basic/docproc.c to scripts/docproc.c
    kbuild: Fix Makefile.asm-generic for um
    kbuild: Allow to combine multiple W= levels
    kbuild: Disable -Wunused-but-set-variable for gcc 4.6.0
    Fix handling of backlash character in LINUX_COMPILE_BY name
    kbuild: asm-generic support
    kbuild: implement several W= levels
    kbuild: Fix build with binutils <= 2.19
    initramfs: Use KBUILD_BUILD_TIMESTAMP for generated entries
    kbuild: Allow to override LINUX_COMPILE_BY and LINUX_COMPILE_HOST macros
    kbuild: Drop unused LINUX_COMPILE_TIME and LINUX_COMPILE_DOMAIN macros
    kbuild: Use the deterministic mode of ar
    kbuild: Call gzip with -n
    kbuild: move KALLSYMS_EXTRA_PASS from Kconfig to Makefile
    Kconfig: improve KALLSYMS_ALL documentation

    Fix up trivial conflict in Makefile

    Linus Torvalds
     

23 May, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6: (28 commits)
    sparc32: fix build, fix missing cpu_relax declaration
    SCHED_TTWU_QUEUE is not longer needed since sparc32 now implements IPI
    sparc32,leon: Remove unnecessary page_address calls in LEON DMA API.
    sparc: convert old cpumask API into new one
    sparc32, sun4d: Implemented SMP IPIs support for SUN4D machines
    sparc32, sun4m: Implemented SMP IPIs support for SUN4M machines
    sparc32,leon: Implemented SMP IPIs for LEON CPU
    sparc32: implement SMP IPIs using the generic functions
    sparc32,leon: SMP power down implementation
    sparc32,leon: added some SMP comments
    sparc: add {read,write}*_be routines
    sparc32,leon: don't rely on bootloader to mask IRQs
    sparc32,leon: operate on boot-cpu IRQ controller registers
    sparc32: always define boot_cpu_id
    sparc32: removed unused code, implemented by generic code
    sparc32: avoid build warning at mm/percpu.c:1647
    sparc32: always register a PROM based early console
    sparc32: probe for cpu info only during startup
    sparc: consolidate show_cpuinfo in cpu.c
    sparc32,leon: implement genirq CPU affinity
    ...

    Linus Torvalds
     
  • I still happen to believe that I$ miss costs are a major thing, but
    sadly, -Os doesn't seem to be the solution. With or without it, gcc
    will miss some obvious code size improvements, and with it enabled gcc
    will sometimes make choices that aren't good even with high I$ miss
    ratios.

    For example, with -Os, gcc on x86 will turn a 20-byte constant memcpy
    into a "rep movsl". While I sincerely hope that x86 CPUs will some day
    do a good job at that, they certainly don't do it yet, and the cost is
    higher than an L1 I$ miss would be.

    Some day I hope we can re-enable this.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
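
To see what a compiler does with the kind of copy described above, a small probe like the following can be compiled once with `gcc -Os -S` and once with `gcc -O2 -S`, and the generated assembly compared. The struct and function names are made up for this sketch, and whether a given gcc emits "rep movsl" for it depends on the compiler version and target, so this is a way to check rather than a guaranteed reproduction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Five 32-bit words: a 20-byte object whose size is known at compile time. */
struct demo_hdr {
	uint32_t w[5];
};

/* Constant-size copy; the compiler is free to inline it however it likes. */
void demo_copy_hdr(struct demo_hdr *dst, const struct demo_hdr *src)
{
	assert(dst != src); /* memcpy requires non-overlapping objects */
	memcpy(dst, src, sizeof(*dst));
}
```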
     

21 May, 2011

1 commit


20 May, 2011

2 commits

  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits)
    Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
    net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree
    batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu
    batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree()
    batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu
    net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu()
    net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu()
    net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu()
    perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu()
    perf,rcu: convert call_rcu(free_ctx) to kfree_rcu()
    net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(net_generic_release) to kfree_rcu()
    net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu()
    net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu()
    security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu()
    net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu()
    net,rcu: convert call_rcu(xps_map_release) to kfree_rcu()
    net,rcu: convert call_rcu(rps_map_release) to kfree_rcu()
    ...

    Linus Torvalds
     
  • …kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
    sched: Fix and optimise calculation of the weight-inverse
    sched: Avoid going ahead if ->cpus_allowed is not changed
    sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
    sched: Remove unused parameters from sched_fork() and wake_up_new_task()
    sched: Shorten the construction of the span cpu mask of sched domain
    sched: Wrap the 'cfs_rq->nr_spread_over' field with CONFIG_SCHED_DEBUG
    sched: Remove unused 'this_best_prio arg' from balance_tasks()
    sched: Remove noop in alloc_rt_sched_group()
    sched: Get rid of lock_depth
    sched: Remove obsolete comment from scheduler_tick()
    sched: Fix sched_domain iterations vs. RCU
    sched: Next buddy hint on sleep and preempt path
    sched: Make set_*_buddy() work on non-task entities
    sched: Remove need_migrate_task()
    sched: Move the second half of ttwu() to the remote cpu
    sched: Restructure ttwu() some more
    sched: Rename ttwu_post_activation() to ttwu_do_wakeup()
    sched: Remove rq argument from ttwu_stat()
    sched: Remove rq->lock from the first half of ttwu()
    sched: Drop rq->lock from sched_exec()
    ...

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: Fix rt_rq runtime leakage bug

    Linus Torvalds