11 Nov, 2014

1 commit

  • Currently if the user passes an invalid value on the kernel command line
    then the kernel will crash during argument parsing. On most systems this
    is very hard to debug because the console hasn't been initialized yet.

    This is a regression due to commit 51e158c12aca ("param: hand arguments
    after -- straight to init") which, in response to the systemd debug
    controversy, made it possible to explicitly pass arguments to init. To
    achieve this parse_args() was extended from simply returning an error
    code to returning a pointer. Regrettably the new init args logic does not
    perform a proper validity check on the pointer, resulting in a crash.

    This patch fixes the validity check. Should the check fail then no arguments
    will be passed to init. This is reasonable and matches how the kernel treats
    its own arguments (i.e. no error recovery).

    Signed-off-by: Daniel Thompson
    Cc: stable@vger.kernel.org
    Signed-off-by: Rusty Russell

    Daniel Thompson
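
The validity check described in the commit above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel patch: it assumes the kernel convention of encoding small negative errno values at the top of the pointer range, and the names err_ptr, is_err_or_null and init_args_or_null are hypothetical stand-ins. The commit message does not say which helper the real fix uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Largest errno value that can be encoded in a pointer (assumed here). */
#define MAX_ERRNO 4095

/* Encode a negative errno value such as -EINVAL as a pointer. */
void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True if p is NULL or an encoded errno rather than a usable pointer. */
int is_err_or_null(const void *p)
{
    return p == NULL || (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

/*
 * Given the pointer returned by a parse step, return the argument string
 * to hand to init, or NULL if there is nothing valid to pass on.
 * Checking only for NULL would let an encoded error through, and the
 * first dereference of that "string" would crash.
 */
char *init_args_or_null(char *parse_result)
{
    return is_err_or_null(parse_result) ? NULL : parse_result;
}
```

A caller would pass arguments to init only when init_args_or_null() returns non-NULL, which matches the "no error recovery" behaviour the message describes.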
     

28 Oct, 2014

1 commit

  • introduce two configs:
    - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
    depend on
    - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use

    that solves several problems:
    - tracing and others that wish to use eBPF don't need to depend on NET.
    They can use BPF_SYSCALL to allow loading from userspace or select BPF
    to use it directly from kernel in NET-less configs.
    - in 3.18 programs cannot be attached to events yet, so don't force it on
    - when the rest of eBPF infra is there in 3.19+, it's still useful to
    switch it off to minimize kernel size

    bloat-o-meter on x64 shows:
    add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)

    tested with many different config combinations. Hopefully didn't miss anything.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

14 Oct, 2014

3 commits


13 Oct, 2014

2 commits

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
    Hansen)

    - Various sched/idle refinements for better idle handling (Nicolas
    Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

    - sched/numa updates and optimizations (Rik van Riel)

    - sysbench speedup (Vincent Guittot)

    - capacity calculation cleanups/refactoring (Vincent Guittot)

    - Various cleanups to thread group iteration (Oleg Nesterov)

    - Double-rq-lock removal optimization and various refactorings
    (Kirill Tkhai)

    - various sched/deadline fixes

    ... and lots of other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
    sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
    sched/fair: Delete resched_cpu() from idle_balance()
    sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
    sched: Improve sysbench performance by fixing spurious active migration
    sched/x86: Fix up typo in topology detection
    x86, sched: Add new topology for multi-NUMA-node CPUs
    sched/rt: Use resched_curr() in task_tick_rt()
    sched: Use rq->rd in sched_setaffinity() under RCU read lock
    sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
    sched: Use dl_bw_of() under RCU read lock
    sched/fair: Remove duplicate code from can_migrate_task()
    sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
    sched: print_rq(): Don't use tasklist_lock
    sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
    sched: Fix the task-group check in tg_has_rt_tasks()
    sched/fair: Leverage the idle state info when choosing the "idlest" cpu
    sched: Let the scheduler see CPU idle states
    sched/deadline: Fix inter- exclusive cpusets migrations
    sched/deadline: Clear dl_entity params when setscheduling to different class
    sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - changes related to No-CBs CPUs and NO_HZ_FULL

    - RCU-tasks implementation

    - torture-test updates

    - miscellaneous fixes

    - locktorture updates

    - RCU documentation updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
    workqueue: Use cond_resched_rcu_qs macro
    workqueue: Add quiescent state between work items
    locktorture: Cleanup header usage
    locktorture: Cannot hold read and write lock
    locktorture: Fix __acquire annotation for spinlock irq
    locktorture: Support rwlocks
    rcu: Eliminate deadlock between CPU hotplug and expedited grace periods
    locktorture: Document boot/module parameters
    rcutorture: Rename rcutorture_runnable parameter
    locktorture: Add test scenario for rwsem_lock
    locktorture: Add test scenario for mutex_lock
    locktorture: Make torture scripting account for new _runnable name
    locktorture: Introduce torture context
    locktorture: Support rwsems
    locktorture: Add infrastructure for torturing read locks
    torture: Address race in module cleanup
    locktorture: Make statistics generic
    locktorture: Teach about lock debugging
    locktorture: Support mutexes
    locktorture: Add documentation
    ...

    Linus Torvalds
     

10 Oct, 2014

1 commit

  • ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
    _PAGE_NUMA using _PROT_NONE. This saved using an additional PTE bit and
    relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
    fault scanner. This was found to be conceptually confusing with a lot of
    implicit assumptions and it was asked that an alternative be found.

    Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
    PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
    bits and shrunk the maximum possible swap size but it did not go far
    enough. There are no architectures that reuse _PROT_NONE as _PROT_NUMA
    but the relics still exist.

    This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
    duplication in powerpc vs the generic implementation by defining the types
    the core NUMA helpers expected to exist from x86 with their ppc64
    equivalent. This necessitated that a PTE bit mask be created that
    identified the bits that distinguish present from NUMA pte entries but it
    is expected this will only differ between arches based on _PAGE_PROTNONE.
    The naming for the generic helpers was taken from x86 originally but ppc64
    has types that are equivalent for the purposes of the helper so they are
    mapped instead of duplicating code.

    Signed-off-by: Mel Gorman
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Cyrill Gorcunov
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

09 Oct, 2014

1 commit

  • Pull timer fixes from Ingo Molnar:
    "Main changes:

    - Fix the deadlock reported by Dave Jones et al
    - Clean up and fix nohz_full interaction with arch abilities
    - nohz init code consolidation/cleanup"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    nohz: nohz full depends on irq work self IPI support
    nohz: Consolidate nohz full init code
    arm64: Tell irq work about self IPI support
    arm: Tell irq work about self IPI support
    x86: Tell irq work about self IPI support
    irq_work: Force raised irq work to run on irq work interrupt
    irq_work: Introduce arch_irq_work_has_interrupt()
    nohz: Move nohz full init call to tick init

    Linus Torvalds
     

08 Oct, 2014

2 commits

  • Pull "trivial tree" updates from Jiri Kosina:
    "Usual pile from trivial tree everyone is so eagerly waiting for"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Remove MN10300_PROC_MN2WS0038
    mei: fix comments
    treewide: Fix typos in Kconfig
    kprobes: update jprobe_example.c for do_fork() change
    Documentation: change "&" to "and" in Documentation/applying-patches.txt
    Documentation: remove obsolete pcmcia-cs from Changes
    Documentation: update links in Changes
    Documentation: Docbook: Fix generated DocBook/kernel-api.xml
    score: Remove GENERIC_HAS_IOMAP
    gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
    tty: doc: Fix grammar in serial/tty
    dma-debug: modify check_for_stack output
    treewide: fix errors in printk
    genirq: fix reference in devm_request_threaded_irq comment
    treewide: fix synchronize_rcu() in comments
    checkstack.pl: port to AArch64
    doc: queue-sysfs: minor fixes
    init/do_mounts: better syntax description
    MIPS: fix comment spelling
    powerpc/simpleboot: fix comment
    ...

    Linus Torvalds
     
  • Pull module update from Rusty Russell:
    "Nothing major: support for compressing modules, and auto-tainting
    params.

    PS. My virtio-next tree is empty: DaveM took the patches I had. There
    might be a virtio-rng starvation fix, but so far it's a bit voodoo
    so I will get to that in the next two days or it will wait"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    moduleparam: Resolve missing-field-initializer warning
    kbuild: handle module compression while running 'make modules_install'.
    modinst: wrap long lines in order to enhance cmd_modules_install
    modsign: lookup lines ending in .ko in .mod files
    modpost: simplify file name generation of *.mod.c files
    modpost: reduce visibility of symbols and constify r/o arrays
    param: check for tainting before calling set op.
    drm/i915: taint the kernel if unsafe module parameters are set
    module: add module_param_unsafe and module_param_named_unsafe
    module: make it possible to have unsafe, tainting module params
    module: rename KERNEL_PARAM_FL_NOARG to avoid confusion

    Linus Torvalds
     

07 Oct, 2014

1 commit

  • Pull "tinification" patches from Josh Triplett.

    Work on making smaller kernels.

    * tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux:
    bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_
    mm: Support compiling out madvise and fadvise
    x86: Support compiling out human-friendly processor feature names
    x86: Drop support for /proc files when !CONFIG_PROC_FS
    x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK
    x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE
    x86, boot: Use the usual -y -n mechanism for objects in vmlinux
    x86: Add "make tinyconfig" to configure the tiniest possible kernel
    x86, platform, kconfig: move kvmconfig functionality to a helper

    Linus Torvalds
     

05 Oct, 2014

1 commit


04 Oct, 2014

2 commits

  • commit 03b8c7b623c80af264c4c8d6111e5c6289933666 ("futex: Allow
    architectures to skip futex_atomic_cmpxchg_inatomic() test") added the
    HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in
    the middle of the options for the EXPERT menu. However,
    HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
    placing items in the EXPERT menu, and displays the remaining several
    EXPERT items (starting with EPOLL) directly in the General Setup menu.

    Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
    HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the
    subsequent items display as part of the EXPERT menu again; the EMBEDDED
    menu now appears as the next top-level item in the General Setup menu,
    which makes General Setup much shorter and more usable.

    Signed-off-by: Josh Triplett
    Acked-by: Randy Dunlap
    Cc: stable

    Josh Triplett
     
  • The buffers sized by CONFIG_LOG_BUF_SHIFT and
    CONFIG_LOG_CPU_MAX_BUF_SHIFT do not exist if CONFIG_PRINTK=n, so don't
    ask about their size at all.

    Signed-off-by: Josh Triplett
    Acked-by: Randy Dunlap
    Cc: stable

    Josh Triplett
     

23 Sep, 2014

1 commit

  • …/linux-rcu into core/rcu

    Pull the v3.18 RCU changes from Paul E. McKenney:

    "
    * Update RCU documentation. These were posted to LKML at
    https://lkml.org/lkml/2014/8/28/378.

    * Miscellaneous fixes. These were posted to LKML at
    https://lkml.org/lkml/2014/8/28/386. An additional fix that
    eliminates a documented (but now inconvenient) deadlock between
    RCU hotplug and expedited grace periods was posted at
    https://lkml.org/lkml/2014/8/28/573.

    * Changes related to No-CBs CPUs and NO_HZ_FULL. These were posted
    to LKML at https://lkml.org/lkml/2014/8/28/412.

    * Torture-test updates. These were posted to LKML at
    https://lkml.org/lkml/2014/8/28/546 and at
    https://lkml.org/lkml/2014/9/11/1114.

    * RCU-tasks implementation. These were posted to LKML at
    https://lkml.org/lkml/2014/8/28/540.
    "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

19 Sep, 2014

1 commit

  • Tasks get their end of stack set to STACK_END_MAGIC with the
    aim to catch stack overruns. Currently this feature does not
    apply to init_task. This patch removes this restriction.

    Note that a similar patch was posted by Prarit Bhargava
    some time ago but was never merged:

    http://marc.info/?l=linux-kernel&m=127144305403241&w=2

    Signed-off-by: Aaron Tomlin
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Oleg Nesterov
    Acked-by: Michael Ellerman
    Cc: aneesh.kumar@linux.vnet.ibm.com
    Cc: dzickus@redhat.com
    Cc: bmr@redhat.com
    Cc: jcastillo@redhat.com
    Cc: jgh@redhat.com
    Cc: minchan@kernel.org
    Cc: tglx@linutronix.de
    Cc: hannes@cmpxchg.org
    Cc: Alex Thorlton
    Cc: Andrew Morton
    Cc: Benjamin Herrenschmidt
    Cc: Daeseok Youn
    Cc: David Rientjes
    Cc: Fabian Frederick
    Cc: Geert Uytterhoeven
    Cc: Jiri Olsa
    Cc: Kees Cook
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Michael Opdenacker
    Cc: Paul Mackerras
    Cc: Prarit Bhargava
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Seiji Aguchi
    Cc: Steven Rostedt
    Cc: Vladimir Davydov
    Cc: Yasuaki Ishimatsu
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com
    Signed-off-by: Ingo Molnar

    Aaron Tomlin
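
The stack-end canary idea from the commit above can be sketched with a toy stack in user-space C. This is a simplified illustration, not kernel code: the toy_ names and the fixed-size array are hypothetical, and the magic value is assumed to match the kernel's STACK_END_MAGIC constant.

```c
#include <assert.h>
#include <stddef.h>

/* Canary value; assumed here to match the kernel's STACK_END_MAGIC. */
#define STACK_END_MAGIC 0x57AC6E9DUL

#define TOY_STACK_WORDS 64

/*
 * Toy stack that grows downward: words[TOY_STACK_WORDS - 1] is used
 * first and words[0] is the last word reached before an overrun.
 */
struct toy_stack {
    unsigned long words[TOY_STACK_WORDS];
};

/* Write the canary at the far end of the stack. */
void toy_set_stack_end_magic(struct toy_stack *s)
{
    assert(s != NULL);
    s->words[0] = STACK_END_MAGIC;
}

/* Nonzero if something has overwritten the canary. */
int toy_stack_end_corrupted(const struct toy_stack *s)
{
    assert(s != NULL);
    return s->words[0] != STACK_END_MAGIC;
}
```

The check only detects an overrun after the fact, and only if the overrun actually wrote over the last word, so it is a debugging aid rather than a protection mechanism.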
     

17 Sep, 2014

3 commits

  • This reverts commit 4dfe694f616e00e6fd83e5bbcd7a3c4d7113493d.

    In that, we did:

        Here we move the rootdelay code to be right beside the rootwait code, so
        that their behaviour is consistent.

    ...which is fine, but in hindsight, perhaps moving the rootwait to be
    beside the rootdelay would have been better. We also indicated:

        It should be noted that in doing so, the actions based on the
        saved_root_name[0] and initrd_load() were previously put on hold by
        rootdelay=N and now currently will not be delayed. However, I think
        consistent behaviour is more important than matching historical behaviour
        of delaying the above two operations.

    But Pavel reported an instance where an ARM target with root on MMC
    was failing to mount root, and Russell diagnosed it to the fact that
    the call to set ROOT_DEV within the saved_root_name[0] processing
    block mentioned above was no longer being delayed.

    Rather than moving both wait clauses to the original position of
    rootdelay and risking unearthing other possible corner case breakage
    at this point in time, we simply revert now and we can revisit
    trying the alternate/earlier location in another development cycle.

    Cc: Pavel Machek
    Cc: Russell King
    Cc: Andrew Morton
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     
  • rcu-tasks.2014.09.10a: Add RCU-tasks flavor of RCU.

    Paul E. McKenney
     
  • Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically
    requested) failed to adjust the callback lists of the CPUs that are
    known to be no-CBs CPUs only because they are also nohz_full= CPUs.
    This failure can result in callbacks that are posted during early boot
    getting stranded on nxtlist for CPUs whose no-CBs property becomes
    apparent late, and there can also be spurious warnings about offline
    CPUs posting callbacks.

    This commit fixes these problems by adding an early-boot rcu_init_nohz()
    that properly initializes the no-CBs CPUs.

    Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with
    CONFIG_RCU_NOCB_CPU=n do not exhibit this bug. Neither do kernels
    booted without the nohz_full= boot parameter.

    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar
    Tested-by: Paul Gortmaker

    Paul E. McKenney
     

14 Sep, 2014

1 commit

  • This way we unbloat main.c a bit and, more importantly, we initialize
    nohz full after init_IRQ(). This dependency will be needed in further
    patches because nohz full needs irq work to raise its own IRQ.
    Information about the support for this ability on ARM64 is obtained in
    init_IRQ(), which initializes the pointer to __smp_call_function.

    Since tick_init() is called right after init_IRQ(), this is a good place
    to call tick_nohz_init() and prepare for that dependency.

    Acked-by: Peter Zijlstra (Intel)
    Cc: Ingo Molnar
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

08 Sep, 2014

1 commit

  • This commit adds a new RCU-tasks flavor of RCU, which provides
    call_rcu_tasks(). This RCU flavor's quiescent states are voluntary
    context switch (not preemption!) and userspace execution (not the idle
    loop -- use some sort of schedule_on_each_cpu() if you need to handle the
    idle tasks). Note that unlike other RCU flavors, these quiescent states
    occur in tasks, not necessarily CPUs. Includes fixes from Steven Rostedt.

    This RCU flavor is assumed to have very infrequent latency-tolerant
    updaters. This assumption permits significant simplifications, including
    a single global callback list protected by a single global lock, along
    with a single task-private linked list containing all tasks that have not
    yet passed through a quiescent state. If experience shows this assumption
    to be incorrect, the required additional complexity will be added.

    Suggested-by: Steven Rostedt
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

28 Aug, 2014

1 commit


27 Aug, 2014

1 commit

  • Since module-init-tools (gzip) and kmod (gzip and xz) support compressed
    modules, it could be useful to support compressing modules right after
    they are installed. Doing this in kbuild instead of per distro makes this
    kind of usage more generic.

    This patch adds a Kconfig entry to the "Enable loadable module support"
    menu and lets you choose to compress using gzip (default) or xz.

    Neither gzip nor xz is passed any extra -[1-9] option, since Andi Kleen
    and Rusty Russell showed that no gain is made using them. gzip is called
    with the -n argument to avoid storing the original filename inside the
    compressed file, which saves some more bytes.

    On a v3.16 kernel, 'make allmodconfig' generated 4680 modules for a
    total of 378MB (no strip, no sign, no compress). The following table
    shows the observed disk space gain based on the allmodconfig .config:

            |             time              |
            +-------------+-----------------+
            | manual .ko  | make            | size | percent
            | compression | modules_install |      | gain
            +-------------+-----------------+------+--------
    -       |             | 18.61s          | 378M |
    GZIP    | 3m16s       | 3m37s           | 102M | 73.41%
    XZ      | 5m22s       | 5m39s           | 77M  | 79.83%

    The gain for restricted environments seems worthwhile. Decompression
    can be time consuming, but it happens only while loading a module, which
    is generally done only once.

    This is fully compatible with signed modules, because it is the signed
    module that gets compressed. module-init-tools or kmod handles
    decompression and provides the uncompressed but signed payload to the
    other layers.

    Reviewed-by: Willy Tarreau
    Signed-off-by: Bertrand Jacquin
    Signed-off-by: Rusty Russell

    Bertrand Jacquin
     

26 Aug, 2014

1 commit


18 Aug, 2014

1 commit

  • Many embedded systems will not need these syscalls, and omitting them
    saves space. Add a new EXPERT config option CONFIG_ADVISE_SYSCALLS
    (default y) to support compiling them out.

    bloat-o-meter:
    add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-2250 (-2250)
    function                 old     new   delta
    sys_fadvise64             57       -     -57
    sys_fadvise64_64         691       -    -691
    sys_madvise             1502       -   -1502

    Signed-off-by: Josh Triplett

    Josh Triplett
     

15 Aug, 2014

1 commit


09 Aug, 2014

8 commits

  • Merge more incoming from Andrew Morton:
    "Two new syscalls:

    memfd_create in "shm: add memfd_create() syscall"
    kexec_file_load in "kexec: implementation of new syscall kexec_file_load"

    And:

    - Most (all?) of the rest of MM

    - Lots of the usual misc bits

    - fs/autofs4

    - drivers/rtc

    - fs/nilfs

    - procfs

    - fork.c, exec.c

    - more in lib/

    - rapidio

    - Janitorial work in filesystems: fs/ufs, fs/reiserfs, fs/adfs,
    fs/cramfs, fs/romfs, fs/qnx6.

    - initrd/initramfs work

    - "file sealing" and the memfd_create() syscall, in tmpfs

    - add pci_zalloc_consistent, use it in lots of places

    - MAINTAINERS maintenance

    - kexec feature work"

    * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (193 commits)
    MAINTAINERS: update nomadik patterns
    MAINTAINERS: update usb/gadget patterns
    MAINTAINERS: update DMA BUFFER SHARING patterns
    kexec: verify the signature of signed PE bzImage
    kexec: support kexec/kdump on EFI systems
    kexec: support for kexec on panic using new system call
    kexec-bzImage64: support for loading bzImage using 64bit entry
    kexec: load and relocate purgatory at kernel load time
    purgatory: core purgatory functionality
    purgatory/sha256: provide implementation of sha256 in purgaotory context
    kexec: implementation of new syscall kexec_file_load
    kexec: new syscall kexec_file_load() declaration
    kexec: make kexec_segment user buffer pointer a union
    resource: provide new functions to walk through resources
    kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc()
    kexec: move segment verification code in a separate function
    kexec: rename unusebale_pages to unusable_pages
    kernel: build bin2c based on config option CONFIG_BUILD_BIN2C
    bin2c: move bin2c in scripts/basic
    shm: wait for pins to be released when sealing
    ...

    Linus Torvalds
     
  • Currently bin2c builds only if CONFIG_IKCONFIG=y. But bin2c will now be
    used by kexec too, so make its compilation depend on CONFIG_BUILD_BIN2C,
    a config option that can be selected by CONFIG_KEXEC and CONFIG_IKCONFIG.

    Signed-off-by: Vivek Goyal
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • Fix some checkpatch warnings (remove global initialization, move
    __initdata, coalesce formats, ...)

    Signed-off-by: Fabian Frederick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • On a system with low memory extracting the initramfs may fail. If this
    happens the user gets "Failed to execute /init" instead of an initramfs
    error.

    Check return value of sys_write and call error() when the write was
    incomplete or failed.

    Signed-off-by: David Engraf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Engraf
     
  • With 64-bit bzImage and kexec tools we now support a ramdisk larger than
    2G, since it can be placed above 4G.

    However, a compressed initramfs image of that size could not be
    decompressed properly. It turns out that the image length is an int during
    decompressor detection, so it becomes negative when the length is more
    than 2G. Furthermore, during decompression len is used as an int for the
    inbuf count, which has the same problem.

    Change len to long. That should be OK, as long is still 32 bits on 32-bit
    platforms.

    Tested with the following compressed initramfs images as root with kexec:
    gzip, bzip2, xz, lzma, lzop, lz4.
    Run time for populate_rootfs():
          size  name            Nehalem-EX  Westmere-EX  Ivybridge-EX
    9034400256  root_img     :         26s          24s           30s
    3561095057  root_img.lz4 :         28s          27s           27s
    3459554629  root_img.lzo :         29s          29s           28s
    3219399480  root_img.gz  :         64s          62s           49s
    2251594592  root_img.xz  :        262s         260s          183s
    2226366598  root_img.lzma:        386s         376s          277s
    2901482513  root_img.bz2 :        635s         599s

    Signed-off-by: Yinghai Lu
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Rashika Kheria
    Cc: Josh Triplett
    Cc: Kyungsik Lee
    Cc: P J P
    Cc: Al Viro
    Cc: Tetsuo Handa
    Cc: "Daniel M. Weeks"
    Cc: Alexandre Courbot
    Cc: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • When an initrd (compressed or not) is used, the kernel reports corrupted
    data on /dev/ram0.

    The root cause:
    During initramfs checking, if the image is an initrd, it is transferred to
    /initrd.image with sys_write.
    sys_write only supports writes of up to 2G-4K, so if the initrd is larger
    than that, /initrd.image is never written completely.

    Add a local xwrite that calls sys_write in a loop to work around the
    problem.

    xwrite is also needed in write_buffer() to handle the case where the
    image is an uncompressed cpio with one big file (>2G) in it:
    unpack_to_rootfs ===> write_buffer ===> actions[]/do_copy

    At the same time, we don't need to worry about sys_read/sys_write in
    do_mounts_rd.c::crd_load, as the decompressor has fill/flush and a local
    buffer that is smaller than 2G.

    Tested with an uncompressed initrd, and compressed ones with gz, bz2, lzma,
    xz, lzop.

    Signed-off-by: Yinghai Lu
    Acked-by: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Geert Uytterhoeven
    Cc: Tetsuo Handa
    Cc: "Daniel M. Weeks"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
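
The "loop calling sys_write" approach from the commit above can be sketched in user-space C with write(2). This is a minimal sketch under stated assumptions, not the kernel's xwrite: the name write_all is hypothetical, and retrying on EINTR and EAGAIN is a choice made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write count bytes from buf to fd, looping because a single write()
 * may transfer fewer bytes than requested. Returns the number of bytes
 * written, or -1 if nothing could be written because of an error.
 */
ssize_t write_all(int fd, const char *buf, size_t count)
{
    ssize_t done = 0;

    assert(buf != NULL || count == 0);
    while (count > 0) {
        ssize_t rv = write(fd, buf, count);

        if (rv < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;              /* transient: retry the same chunk */
            return done ? done : -1;   /* report progress if any */
        }
        if (rv == 0)
            break;                     /* no progress is possible */
        buf += rv;
        done += rv;
        count -= (size_t)rv;
    }
    return done;
}
```

A caller should compare the return value with the requested count and report an error on mismatch, which is the same kind of check the earlier initramfs commit adds for incomplete or failed writes.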
     
  • Currently rootdelay=N and rootwait behave differently (aside from the
    obvious unbounded wait duration) because they are at different places in
    the init sequence.

    The difference manifests itself for md devices because the call to
    md_run_setup() lives between rootdelay and rootwait, so if you try to use
    rootdelay=20 to try and allow a slow RAID0 array to assemble, you get
    this:

    [ 4.526011] sd 6:0:0:0: [sdc] Attached SCSI removable disk
    [ 22.972079] md: Waiting for all devices to be available before autodetect

    i.e. you've achieved nothing other than delaying the probing 20s, when
    what you wanted was a 20s delay _after_ the probing for md devices was
    initiated.

    Here we move the rootdelay code to be right beside the rootwait code, so
    that their behaviour is consistent.

    It should be noted that in doing so, the actions based on the
    saved_root_name[0] and initrd_load() were previously put on hold by
    rootdelay=N and now currently will not be delayed. However, I think
    consistent behaviour is more important than matching historical behaviour
    of delaying the above two operations.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     
  • Pull ARM SoC platform changes from Olof Johansson:
    "This is the bulk of new SoC enablement and other platform changes for
    3.17:

    - Samsung S5PV210 has been converted to DT and multiplatform
    - Clock drivers and bindings for some of the lower-end i.MX 1/2
    platforms
    - Kirkwood, one of the popular Marvell platforms, is folded into the
    mvebu platform code, removing mach-kirkwood
    - Hwmod data for TI AM43xx and DRA7 platforms
    - More additions of Renesas shmobile platform support
    - Removal of plat-samsung contents that can be removed with S5PV210
    being multiplatform/DT-enabled and the other two old platforms
    being removed

    New platforms (most with only basic support right now):

    - Hisilicon X5HD2 settop box chipset is introduced
    - Mediatek MT6589 (mobile chipset) is introduced
    - Broadcom BCM7xxx settop box chipset is introduced

    + as usual a lot other pieces all over the platform code"

    * tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (240 commits)
    ARM: hisi: remove smp from machine descriptor
    power: reset: move hisilicon reboot code
    ARM: dts: Add hix5hd2-dkb dts file.
    ARM: debug: Rename Hi3716 to HIX5HD2
    ARM: hisi: enable hix5hd2 SoC
    ARM: hisi: add ARCH_HISI
    MAINTAINERS: add entry for Broadcom ARM STB architecture
    ARM: brcmstb: select GISB arbiter and interrupt drivers
    ARM: brcmstb: add infrastructure for ARM-based Broadcom STB SoCs
    ARM: configs: enable SMP in bcm_defconfig
    ARM: add SMP support for Broadcom mobile SoCs
    Documentation: arm: misc updates to Marvell EBU SoC status
    Documentation: arm: add URLs to public datasheets for the Marvell Armada XP SoC
    ARM: mvebu: fix build without platforms selected
    ARM: mvebu: add cpuidle support for Armada 38x
    ARM: mvebu: add cpuidle support for Armada 370
    cpuidle: mvebu: add Armada 38x support
    cpuidle: mvebu: add Armada 370 support
    cpuidle: mvebu: rename the driver from armada-370-xp to mvebu-v7
    ARM: mvebu: export the SCU address
    ...

    Linus Torvalds
     

07 Aug, 2014

1 commit

  • The default size of the ring buffer is too small for machines with a
    large amount of CPUs under heavy load. What ends up happening when
    debugging is the ring buffer overlaps and chews up old messages making
    debugging impossible unless the size is passed as a kernel parameter.
    An idle system upon boot up will on average spew out only about one or
    two extra lines but where this really matters is on heavy load and that
    will vary widely depending on the system and environment.

    There are mechanisms to help increase the kernel ring buffer for tracing
    through debugfs, and those interfaces even allow growing the kernel ring
    buffer per CPU. We also have a static value which can be passed upon
    boot. Relying on debugfs however is not ideal for production, and
    relying on the value passed upon bootup can only be used *after* an
    issue has crept up. Instead of being reactive this adds a proactive
    measure which lets you scale the amount of contributions you'd expect to
    the kernel ring buffer under load by each CPU in the worst case
    scenario.

    We use num_possible_cpus() to avoid complexities which could be
    introduced by dynamically changing the ring buffer size at run time.
    num_possible_cpus() lets us use the upper limit on the possible number of
    CPUs, therefore avoiding having to deal with hotplugging CPUs on and off.
    This introduces the kernel configuration option LOG_CPU_MAX_BUF_SHIFT
    which is used to specify the maximum amount of contributions to the
    kernel ring buffer in the worst case before the kernel ring buffer flips
    over; the size is specified as a power of 2. The total amount of
    contributions made by each CPU must be greater than half of the default
    kernel ring buffer size (1 << LOG_BUF_SHIFT bytes) in order to trigger
    an increase upon bootup. The kernel ring buffer is increased to the
    next power of two that would fit the required minimum kernel ring buffer
    size plus the additional CPU contribution. For example if LOG_BUF_SHIFT
    is 18 (256 KB) you'd require more than 128 KB of contributions by other
    CPUs in order to trigger an increase of the kernel ring buffer. With a
    LOG_CPU_MAX_BUF_SHIFT of 12 (4 KB per CPU) that means more than 32 other
    CPUs, i.e. at least 34 possible CPUs. If you had 128 possible CPUs the
    minimum required kernel ring buffer size bumps to:

    ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB

    Since we require the ring buffer to be a power of two the new required
    size would be 1024 KB.

    These CPU contributions are ignored when the "log_buf_len" kernel
    parameter is used, as it forces the exact size of the ring buffer to an
    expected power of two value.

    [pmladek@suse.cz: fix build]
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Petr Mladek
    Tested-by: Davidlohr Bueso
    Tested-by: Petr Mladek
    Reviewed-by: Davidlohr Bueso
    Cc: Andrew Lunn
    Cc: Stephen Warren
    Cc: Michal Hocko
    Cc: Petr Mladek
    Cc: Joe Perches
    Cc: Arun KS
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Chris Metcalf
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
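
The sizing rule described in the commit above can be written out as a small C function. This is a sketch of the arithmetic as the message states it, not the kernel implementation: the function names are hypothetical, and the assert bounds are illustrative limits chosen only to keep the arithmetic far from overflow.

```c
#include <assert.h>

/* Smallest power of two that is >= n, for n >= 1. */
unsigned long long roundup_pow2(unsigned long long n)
{
    unsigned long long p = 1;

    assert(n >= 1 && n <= (1ULL << 62));
    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Ring buffer size in bytes for a given number of possible CPUs.
 * The default size is 1 << log_buf_shift. Every CPU except the first
 * contributes 1 << log_cpu_max_buf_shift bytes, and the buffer only
 * grows if those contributions exceed half of the default size.
 */
unsigned long long log_buf_len_for(unsigned int possible_cpus,
                                   unsigned int log_buf_shift,
                                   unsigned int log_cpu_max_buf_shift)
{
    unsigned long long base, cpu_extra;

    assert(possible_cpus >= 1);
    assert(log_buf_shift <= 25 && log_cpu_max_buf_shift <= 21);

    base = 1ULL << log_buf_shift;
    cpu_extra = (unsigned long long)(possible_cpus - 1)
                << log_cpu_max_buf_shift;

    if (cpu_extra <= base / 2)
        return base;                       /* keep the default buffer */
    return roundup_pow2(base + cpu_extra); /* grow to next power of two */
}
```

For the example in the message, log_buf_len_for(128, 18, 12) works through (1 << 18) + 127 * (1 << 12) = 782336 bytes (764 KB), which rounds up to 1048576 bytes (1024 KB).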
     

10 Jul, 2014

2 commits


08 Jul, 2014

1 commit

  • Enabling NO_HZ_FULL currently has the side effect of enabling callback
    offloading on all CPUs. This results in lots of additional rcuo kthreads,
    and can also increase context switching and wakeups, even in cases where
    callback offloading is neither needed nor particularly desirable. This
    commit therefore enables callback offloading on a given CPU only if
    specifically requested at build time or boot time, or if that CPU has
    been specifically designated (again, either at build time or boot time)
    as a nohz_full CPU.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

17 Jun, 2014

1 commit