15 Aug, 2014

1 commit


09 Aug, 2014

8 commits

  • Merge more incoming from Andrew Morton:
    "Two new syscalls:

    memfd_create in "shm: add memfd_create() syscall"
    kexec_file_load in "kexec: implementation of new syscall kexec_file_load"

    And:

    - Most (all?) of the rest of MM

    - Lots of the usual misc bits

    - fs/autofs4

    - drivers/rtc

    - fs/nilfs

    - procfs

    - fork.c, exec.c

    - more in lib/

    - rapidio

    - Janitorial work in filesystems: fs/ufs, fs/reiserfs, fs/adfs,
    fs/cramfs, fs/romfs, fs/qnx6.

    - initrd/initramfs work

    - "file sealing" and the memfd_create() syscall, in tmpfs

    - add pci_zalloc_consistent, use it in lots of places

    - MAINTAINERS maintenance

    - kexec feature work"

    * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (193 commits)
    MAINTAINERS: update nomadik patterns
    MAINTAINERS: update usb/gadget patterns
    MAINTAINERS: update DMA BUFFER SHARING patterns
    kexec: verify the signature of signed PE bzImage
    kexec: support kexec/kdump on EFI systems
    kexec: support for kexec on panic using new system call
    kexec-bzImage64: support for loading bzImage using 64bit entry
    kexec: load and relocate purgatory at kernel load time
    purgatory: core purgatory functionality
    purgatory/sha256: provide implementation of sha256 in purgaotory context
    kexec: implementation of new syscall kexec_file_load
    kexec: new syscall kexec_file_load() declaration
    kexec: make kexec_segment user buffer pointer a union
    resource: provide new functions to walk through resources
    kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc()
    kexec: move segment verification code in a separate function
    kexec: rename unusebale_pages to unusable_pages
    kernel: build bin2c based on config option CONFIG_BUILD_BIN2C
    bin2c: move bin2c in scripts/basic
    shm: wait for pins to be released when sealing
    ...

    Linus Torvalds
     
  • Currently bin2c is built only if CONFIG_IKCONFIG=y, but bin2c will now be
    used by kexec too. So make its compilation depend on CONFIG_BUILD_BIN2C,
    a config option that can be selected by CONFIG_KEXEC and CONFIG_IKCONFIG.

    Signed-off-by: Vivek Goyal
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • Fix some checkpatch warnings (remove global initialization, move
    __initdata, coalesce formats ...)

    Signed-off-by: Fabian Frederick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • On a system with low memory extracting the initramfs may fail. If this
    happens the user gets "Failed to execute /init" instead of an initramfs
    error.

    Check return value of sys_write and call error() when the write was
    incomplete or failed.

    Signed-off-by: David Engraf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Engraf
     
  • Now with 64-bit bzImage and kexec tools, we support a ramdisk whose size
    is bigger than 2G, as we can put it above 4G.

    We found that a compressed initramfs image could not be decompressed
    properly. It turns out that the image length is an int during decompression
    detection, and it becomes < 0 when the length is more than 2G. Furthermore,
    during decompression len is used as an int for the inbuf count, which has
    the same problem.

    Change len to long. That should be OK, as on 32-bit platforms long is
    32 bits.

    Tested with the following compressed initramfs images as root with kexec:
    gzip, bzip2, xz, lzma, lzop, lz4.
    Run time for populate_rootfs():

    size        name            Nehalem-EX  Westmere-EX  Ivybridge-EX
    9034400256  root_img     :         26s          24s           30s
    3561095057  root_img.lz4 :         28s          27s           27s
    3459554629  root_img.lzo :         29s          29s           28s
    3219399480  root_img.gz  :         64s          62s           49s
    2251594592  root_img.xz  :        262s         260s          183s
    2226366598  root_img.lzma:        386s         376s          277s
    2901482513  root_img.bz2 :        635s         599s

    Signed-off-by: Yinghai Lu
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Rashika Kheria
    Cc: Josh Triplett
    Cc: Kyungsik Lee
    Cc: P J P
    Cc: Al Viro
    Cc: Tetsuo Handa
    Cc: "Daniel M. Weeks"
    Cc: Alexandre Courbot
    Cc: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
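
The overflow described above can be sketched in user-space C. This is only an illustration, not the kernel change itself: the helper names are hypothetical, and the sizes are taken from the table in the commit message.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if an image of `len` bytes can be counted in a plain int. */
static int length_fits_in_int(int64_t len)
{
	return len >= 0 && len <= INT_MAX;
}

/*
 * Returns 1 if the same length can be counted in a long on this platform.
 * On platforms where long is 64 bits wide this covers images above 2G;
 * where long is 32 bits it behaves like the int case.
 */
static int length_fits_in_long(int64_t len)
{
	return len >= 0 && len <= LONG_MAX;
}
```

With a 32-bit int, root_img.gz (3219399480 bytes) does not fit, which is why a length stored in an int can show up as a negative value.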
     
  • When an initrd (compressed or not) is used, the kernel reports corrupted
    data on /dev/ram0.

    The root cause:
    During initramfs checking, if the image is an initrd, it is transferred to
    /initrd.image with sys_write.
    sys_write only supports writes of up to 2G-4K, so if the initrd is larger
    than that, /initrd.image will not be written completely.

    Add a local xwrite that calls sys_write in a loop to work around the
    problem.

    xwrite is also needed in write_buffer() to handle the case where the image
    is an uncompressed cpio with one big file (>2G) in it:
    unpack_to_rootfs ===> write_buffer ===> actions[]/do_copy

    At the same time, we don't need to worry about sys_read/sys_write in
    do_mounts_rd.c::crd_load, as the decompressor has fill/flush and a local
    buffer that is smaller than 2G.

    Tested with an uncompressed initrd and with ones compressed with gz, bz2,
    lzma, xz and lzop.

    Signed-off-by: Yinghai Lu
    Acked-by: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Geert Uytterhoeven
    Cc: Tetsuo Handa
    Cc: "Daniel M. Weeks"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
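
A minimal user-space sketch of the "loop until everything is written" idea follows. It is not the kernel's xwrite: it uses the POSIX write() call instead of sys_write, and the max_chunk parameter is a hypothetical knob standing in for the per-call size limit so the loop can be exercised with small buffers.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Keep calling write() until `count` bytes have been written or an
 * unrecoverable error occurs. Returns the number of bytes written, or -1
 * if the very first write fails.
 */
static ssize_t xwrite_sketch(int fd, const char *p, size_t count,
			     size_t max_chunk)
{
	ssize_t out = 0;

	while (count) {
		size_t chunk = count < max_chunk ? count : max_chunk;
		ssize_t rv = write(fd, p, chunk);

		if (rv < 0) {
			/* Retry on transient errors, otherwise give up. */
			if (errno == EINTR || errno == EAGAIN)
				continue;
			return out ? out : -1;
		}
		if (rv == 0)
			break;	/* nothing was written; stop looping */

		p += rv;
		out += rv;
		count -= (size_t)rv;
	}
	return out;
}
```

A caller can compare the return value with the requested count to detect an incomplete write.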
     
  • Currently rootdelay=N and rootwait behave differently (aside from the
    obvious unbounded wait duration) because they are at different places in
    the init sequence.

    The difference manifests itself for md devices because the call to
    md_run_setup() lives between rootdelay and rootwait, so if you try to use
    rootdelay=20 to try and allow a slow RAID0 array to assemble, you get
    this:

    [ 4.526011] sd 6:0:0:0: [sdc] Attached SCSI removable disk
    [ 22.972079] md: Waiting for all devices to be available before autodetect

    i.e. you've achieved nothing other than delaying the probing 20s, when
    what you wanted was a 20s delay _after_ the probing for md devices was
    initiated.

    Here we move the rootdelay code to be right beside the rootwait code, so
    that their behaviour is consistent.

    It should be noted that in doing so, the actions based on
    saved_root_name[0] and initrd_load(), which were previously put on hold by
    rootdelay=N, will no longer be delayed. However, I think
    consistent behaviour is more important than matching the historical
    behaviour of delaying the above two operations.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     
  • Pull ARM SoC platform changes from Olof Johansson:
    "This is the bulk of new SoC enablement and other platform changes for
    3.17:

    - Samsung S5PV210 has been converted to DT and multiplatform
    - Clock drivers and bindings for some of the lower-end i.MX 1/2
    platforms
    - Kirkwood, one of the popular Marvell platforms, is folded into the
    mvebu platform code, removing mach-kirkwood
    - Hwmod data for TI AM43xx and DRA7 platforms
    - More additions of Renesas shmobile platform support
    - Removal of plat-samsung contents that can be removed with S5PV210
    being multiplatform/DT-enabled and the other two old platforms
    being removed

    New platforms (most with only basic support right now):

    - Hisilicon X5HD2 settop box chipset is introduced
    - Mediatek MT6589 (mobile chipset) is introduced
    - Broadcom BCM7xxx settop box chipset is introduced

    + as usual a lot of other pieces all over the platform code"

    * tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (240 commits)
    ARM: hisi: remove smp from machine descriptor
    power: reset: move hisilicon reboot code
    ARM: dts: Add hix5hd2-dkb dts file.
    ARM: debug: Rename Hi3716 to HIX5HD2
    ARM: hisi: enable hix5hd2 SoC
    ARM: hisi: add ARCH_HISI
    MAINTAINERS: add entry for Broadcom ARM STB architecture
    ARM: brcmstb: select GISB arbiter and interrupt drivers
    ARM: brcmstb: add infrastructure for ARM-based Broadcom STB SoCs
    ARM: configs: enable SMP in bcm_defconfig
    ARM: add SMP support for Broadcom mobile SoCs
    Documentation: arm: misc updates to Marvell EBU SoC status
    Documentation: arm: add URLs to public datasheets for the Marvell Armada XP SoC
    ARM: mvebu: fix build without platforms selected
    ARM: mvebu: add cpuidle support for Armada 38x
    ARM: mvebu: add cpuidle support for Armada 370
    cpuidle: mvebu: add Armada 38x support
    cpuidle: mvebu: add Armada 370 support
    cpuidle: mvebu: rename the driver from armada-370-xp to mvebu-v7
    ARM: mvebu: export the SCU address
    ...

    Linus Torvalds
     

07 Aug, 2014

1 commit

  • The default size of the ring buffer is too small for machines with a
    large number of CPUs under heavy load. What ends up happening when
    debugging is that the ring buffer wraps around and overwrites old
    messages, making debugging impossible unless the size is passed as a
    kernel parameter. An idle system upon boot up will on average spew out
    only about one or two extra lines, but where this really matters is under
    heavy load, and that will vary widely depending on the system and
    environment.

    There are mechanisms to help increase the kernel ring buffer for tracing
    through debugfs, and those interfaces even allow growing the kernel ring
    buffer per CPU. We also have a static value which can be passed upon
    boot. Relying on debugfs however is not ideal for production, and
    the value passed upon bootup can only be used *after* an
    issue has crept up. Instead of being reactive, this adds a proactive
    measure which lets you scale the amount of contributions you'd expect to
    the kernel ring buffer under load by each CPU in the worst case
    scenario.

    We use num_possible_cpus() to avoid complexities which could be
    introduced by dynamically changing the ring buffer size at run time:
    num_possible_cpus() lets us use the upper limit on the possible number of
    CPUs, therefore avoiding having to deal with hotplugging CPUs on and off.
    This introduces the kernel configuration option LOG_CPU_MAX_BUF_SHIFT,
    which is used to specify the maximum amount of contributions to the
    kernel ring buffer in the worst case before the kernel ring buffer flips
    over; the size is specified as a power of 2. The total amount of
    contributions made by each CPU must be greater than half of the default
    kernel ring buffer size (1 << LOG_BUF_SHIFT bytes) in order to trigger
    an increase upon bootup. The kernel ring buffer is increased to the
    next power of two that would fit the required minimum kernel ring buffer
    size plus the additional CPU contribution. For example, if LOG_BUF_SHIFT
    is 18 (256 KB) you'd require at least 128 KB of contributions by other
    CPUs in order to trigger an increase of the kernel ring buffer. With a
    LOG_CPU_MAX_BUF_SHIFT of 12 (4 KB) you'd require more than 64
    possible CPUs to trigger an increase. If you had 128 possible CPUs the
    minimum required kernel ring buffer size bumps to:

    ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB

    Since we require the ring buffer to be a power of two, the new required
    size would be 1024 KB.

    These CPU contributions are ignored when the "log_buf_len" kernel
    parameter is used, as it forces the exact size of the ring buffer to an
    expected power of two value.

    [pmladek@suse.cz: fix build]
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Petr Mladek
    Tested-by: Davidlohr Bueso
    Tested-by: Petr Mladek
    Reviewed-by: Davidlohr Bueso
    Cc: Andrew Lunn
    Cc: Stephen Warren
    Cc: Michal Hocko
    Cc: Petr Mladek
    Cc: Joe Perches
    Cc: Arun KS
    Cc: Kees Cook
    Cc: Davidlohr Bueso
    Cc: Chris Metcalf
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
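
The sizing arithmetic in this commit message can be checked with a short stand-alone C sketch. The function names are hypothetical and the code only mirrors the formula and the "more than half of the default size" rule as worded above; it assumes at least one possible CPU and is not the kernel's implementation.

```c
#include <stddef.h>

/* Round `n` up to the next power of two (assumes n > 0). */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Default buffer plus the worst-case contribution of the other CPUs. */
static size_t required_log_buf_bytes(unsigned int log_buf_shift,
				     unsigned int cpu_max_buf_shift,
				     unsigned int possible_cpus)
{
	size_t base = (size_t)1 << log_buf_shift;
	size_t cpu_extra = (size_t)(possible_cpus - 1) *
			   ((size_t)1 << cpu_max_buf_shift);

	return base + cpu_extra;
}

/* Final size: unchanged unless the extra exceeds half of the default. */
static size_t scaled_log_buf_bytes(unsigned int log_buf_shift,
				   unsigned int cpu_max_buf_shift,
				   unsigned int possible_cpus)
{
	size_t base = (size_t)1 << log_buf_shift;
	size_t required = required_log_buf_bytes(log_buf_shift,
						 cpu_max_buf_shift,
						 possible_cpus);

	if (required - base <= base / 2)
		return base;
	return roundup_pow2(required);
}
```

For LOG_BUF_SHIFT=18, a per-CPU shift of 12 and 128 possible CPUs, the required size is 782336 bytes (764 KB), which rounds up to 1048576 bytes (1024 KB), matching the figures in the message.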
     

10 Jul, 2014

2 commits


08 Jul, 2014

1 commit

  • Enabling NO_HZ_FULL currently has the side effect of enabling callback
    offloading on all CPUs. This results in lots of additional rcuo kthreads,
    and can also increase context switching and wakeups, even in cases where
    callback offloading is neither needed nor particularly desirable. This
    commit therefore enables callback offloading on a given CPU only if
    specifically requested at build time or boot time, or if that CPU has
    been specifically designated (again, either at build time or boot time)
    as a nohz_full CPU.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

17 Jun, 2014

1 commit


12 Jun, 2014

1 commit

  • Pull module updates from Rusty Russell:
    "Most of this is cleaning up various driver sysfs permissions so we can
    re-add the perm check (we unified the module param and sysfs checks,
    but the module ones were stronger so we weakened them temporarily).

    Param parsing gets documented, and also "--" now forces args to be
    handed to init (and ignored by the kernel).

    Module NX/RO protections get tightened: we now set them before calling
    parse_args()"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    module: set nx before marking module MODULE_STATE_COMING.
    samples/kobject/: avoid world-writable sysfs files.
    drivers/hid/hid-picolcd_fb: avoid world-writable sysfs files.
    drivers/staging/speakup/: avoid world-writable sysfs files.
    drivers/regulator/virtual: avoid world-writable sysfs files.
    drivers/scsi/pm8001/pm8001_ctl.c: avoid world-writable sysfs files.
    drivers/hid/hid-lg4ff.c: avoid world-writable sysfs files.
    drivers/video/fbdev/sm501fb.c: avoid world-writable sysfs files.
    drivers/mtd/devices/docg3.c: avoid world-writable sysfs files.
    speakup: fix incorrect perms on speakup_acntsa.c
    cpumask.h: silence warning with -Wsign-compare
    Documentation: Update kernel-parameters.tx
    param: hand arguments after -- straight to init
    modpost: Fix resource leak in read_dump()

    Linus Torvalds
     

05 Jun, 2014

11 commits

  • Pull x86-64 espfix changes from Peter Anvin:
    "This is the espfix64 code, which fixes the IRET information leak as
    well as the associated functionality problem. With this code applied,
    16-bit stack segments finally work as intended even on a 64-bit
    kernel.

    Consequently, this patchset also removes the runtime option that we
    added as an interim measure.

    To help the people working on Linux kernels for very small systems,
    this patchset also makes these compile-time configurable features"

    * 'x86/espfix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
    x86, espfix: Make it possible to disable 16-bit support
    x86, espfix: Make espfix64 a Kconfig option, fix UML
    x86, espfix: Fix broken header guard
    x86, espfix: Move espfix definitions into a separate header file
    x86-32, espfix: Remove filter for espfix32 due to race
    x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack

    Linus Torvalds
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • 1. Remove CLONE_KERNEL, it has no users and it is dangerous.

    The (old) comment says "List of flags we want to share for kernel
    threads" but this is not true, we do not want to share ->sighand by
    default. This flag can only be used if the caller is sure that both
    parent/child will never play with signals (say, allow_signal/etc).

    2. Change rest_init() to clone kernel_init() without CLONE_SIGHAND.

    In this case CLONE_SIGHAND does not really hurt, and it looks like
    optimization because copy_sighand() can avoid kmem_cache_alloc().

    But in fact this only adds a minor pessimization. kernel_init()
    is going to exec the init process, and de_thread() will need to
    unshare ->sighand and do kmem_cache_alloc(sighand_cachep) anyway,
    but it needs to do more work and take tasklist_lock and siglock.

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Acked-by: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • When a module is built into the kernel the module_init() function
    becomes an initcall. Sometimes debugging through dynamic debug can
    help, however, debugging built in kernel modules is typically done by
    changing the .config, recompiling, and booting the new kernel in an
    effort to determine exactly which module caused a problem.

    This patchset can be useful stand-alone or combined with initcall_debug.
    There are cases where some initcalls can hang the machine before the
    console can be flushed, which can make initcall_debug output inaccurate.
    Having the ability to skip initcalls can help further debugging of these
    scenarios.

    Usage: initcall_blacklist=<list of comma separated initcalls>

    ex) added "initcall_blacklist=sgi_uv_sysfs_init" as a kernel parameter and
    the log contains:

    blacklisting initcall sgi_uv_sysfs_init
    ...
    ...
    initcall sgi_uv_sysfs_init blacklisted

    ex) added "initcall_blacklist=foo_bar,sgi_uv_sysfs_init" as a kernel parameter
    and the log contains:

    blacklisting initcall foo_bar
    blacklisting initcall sgi_uv_sysfs_init
    ...
    ...
    initcall sgi_uv_sysfs_init blacklisted

    [akpm@linux-foundation.org: tweak printk text]
    Signed-off-by: Prarit Bhargava
    Cc: Richard Weinberger
    Cc: Andi Kleen
    Cc: Josh Boyer
    Cc: Rob Landley
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prarit Bhargava
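
As an illustration of how a comma-separated initcall_blacklist= value could be matched against an initcall name, here is a small user-space sketch. The function name is hypothetical and the kernel's own data structures and parsing are not shown here; only the list format comes from the examples above.

```c
#include <string.h>

/* Returns 1 if `name` is one of the comma-separated entries in `list`. */
static int initcall_listed(const char *list, const char *name)
{
	size_t name_len = strlen(name);

	while (*list) {
		/* Length of the current entry, up to the next comma or end. */
		size_t entry_len = strcspn(list, ",");

		if (entry_len == name_len &&
		    strncmp(list, name, name_len) == 0)
			return 1;

		list += entry_len;
		if (*list == ',')
			list++;	/* skip the separator */
	}
	return 0;
}
```

With the second example's value, "foo_bar,sgi_uv_sysfs_init", both foo_bar and sgi_uv_sysfs_init match, while a partial name such as "foo" does not.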
     
  • Partially revert commit ea676e846a81 ("init/main.c: convert to
    pr_foo()").

    Unbeknownst to me, pr_debug() is different from the other pr_foo()
    levels: pr_debug() is a no-op when DEBUG is not defined.

    Happily, init/main.c does have a #define DEBUG so we didn't break
    initcall_debug. But the functioning of initcall_debug should not be
    dependent upon the presence of that #define DEBUG.

    Reported-by: Russell King
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
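
The pitfall described here can be sketched with two stand-in macros. These are hypothetical user-space macros, not the kernel's pr_debug()/pr_info() definitions; they only model the point made above, that the debug variant compiles to nothing unless DEBUG is defined.

```c
#include <stdio.h>

static int messages_emitted;

#ifdef DEBUG
/* With DEBUG defined, the debug macro prints like any other level. */
#define sketch_pr_debug(...) \
	do { messages_emitted++; fprintf(stderr, __VA_ARGS__); } while (0)
#else
/* Without DEBUG, the debug macro expands to an empty statement. */
#define sketch_pr_debug(...) do { } while (0)
#endif

#define sketch_pr_info(...) \
	do { messages_emitted++; fprintf(stderr, __VA_ARGS__); } while (0)

/* Emits one debug and one info message; returns how many were printed. */
static int count_emitted_messages(void)
{
	messages_emitted = 0;
	sketch_pr_debug("calling initcall\n");
	sketch_pr_info("initcall returned\n");
	return messages_emitted;
}
```

Built without -DDEBUG, only the info message is printed, which is the kind of silent dependency on a #define that the partial revert removes for initcall_debug.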
     
  • ... instead of naked numbers.

    Stuff in sysrq.c used to set it to 8 which is supposed to mean above
    default level so set it to DEBUG instead as we're terminating/killing all
    tasks and we want to be verbose there.

    Also, correct the check in x86_64_start_kernel which should be >= as
    we're clearly issuing the string there for all debug levels, not only
    the magical 10.

    Signed-off-by: Borislav Petkov
    Acked-by: Kees Cook
    Acked-by: Randy Dunlap
    Cc: Joe Perches
    Cc: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • sys_sgetmask and sys_ssetmask are obsolete system calls no longer
    supported in libc.

    This patch replaces the architecture-related __ARCH_WANT_SYS_SGETMASK with
    an expert mode configuration option. That option is enabled by default for
    those architectures.

    Signed-off-by: Fabian Frederick
    Cc: Steven Miao
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Koichi Yasutake
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Greg Ungerer
    Cc: Heiko Carstens
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • CONFIG_CROSS_MEMORY_ATTACH adds a couple of syscalls, process_vm_readv and
    process_vm_writev, a kind of IPC for copying data between processes.
    Currently this option is placed inside "Processor type and features".

    This patch moves it into "General setup" (where all other arch-independent
    syscalls and IPC features are placed) and makes the prompt string less
    cryptic.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Christopher Yeoh
    Cc: Davidlohr Bueso
    Cc: Hugh Dickins
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Remove start_kernel()->mm_init_owner(&init_mm, &init_task).

    This doesn't really hurt but is unnecessary and misleading. init_task is the
    "swapper" thread == current, its ->mm is always NULL. And init_mm can
    only be used as ->active_mm, not as ->mm.

    mm_init_owner() has a single caller with this patch, perhaps it should
    die. mm_init() can initialize ->owner under #ifdef.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Michal Hocko
    Cc: Balbir Singh
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Peter Chiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • CONFIG_MM_OWNER makes no sense. It is not user-selectable, it is only
    selected by CONFIG_MEMCG automatically. So we can kill this option in
    init/Kconfig and do s/CONFIG_MM_OWNER/CONFIG_MEMCG/ globally.

    Signed-off-by: Oleg Nesterov
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Kmemcg is currently under development and lacks some important features.
    In particular, it does not have support of kmem reclaim on memory pressure
    inside cgroup, which practically makes it unusable in real life. Let's
    warn about it in both Kconfig and Documentation to prevent complaints
    arising.

    Signed-off-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

22 May, 2014

1 commit


06 May, 2014

1 commit


05 May, 2014

1 commit

  • Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML
    build which had broken due to the non-existence of init_espfix_bsp()
    in UML: since UML uses its own Kconfig, this option does not appear in
    the UML build.

    This also makes it possible to make support for 16-bit segments a
    configuration option, for the people who want to minimize the size of
    the kernel.

    Reported-by: Ingo Molnar
    Signed-off-by: H. Peter Anvin
    Cc: Richard Weinberger
    Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com

    H. Peter Anvin
     

01 May, 2014

1 commit

  • The IRET instruction, when returning to a 16-bit segment, only
    restores the bottom 16 bits of the user space stack pointer. This
    causes some 16-bit software to break, but it also leaks kernel state
    to user space. We have a software workaround for that ("espfix") for
    the 32-bit kernel, but it relies on a nonzero stack segment base which
    is not available in 64-bit mode.

    In checkin:

    b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

    we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
    the logic that 16-bit support is crippled on 64-bit kernels anyway (no
    V86 support), but it turns out that people are doing stuff like
    running old Win16 binaries under Wine and expect it to work.

    This works around this by creating percpu "ministacks", each of which
    is mapped 2^16 times 64K apart. When we detect that the return SS is
    on the LDT, we copy the IRET frame to the ministack and use the
    relevant alias to return to userspace. The ministacks are mapped
    readonly, so if IRET faults we promote #GP to #DF which is an IST
    vector and thus has its own stack; we then do the fixup in the #DF
    handler.

    (Making #GP an IST exception would make the msr_safe functions unsafe
    in NMI/MC context, and quite possibly have other effects.)

    Special thanks to:

    - Andy Lutomirski, for the suggestion of using very small stack slots
    and copy (as opposed to map) the IRET frame there, and for the
    suggestion to mark them readonly and let the fault promote to #DF.
    - Konrad Wilk for paravirt fixup and testing.
    - Borislav Petkov for testing help and useful comments.

    Reported-by: Brian Gerst
    Signed-off-by: H. Peter Anvin
    Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
    Cc: Konrad Rzeszutek Wilk
    Cc: Borislav Petkov
    Cc: Andrew Lutomirski
    Cc: Linus Torvalds
    Cc: Dirk Hohndel
    Cc: Arjan van de Ven
    Cc: comex
    Cc: Alexander van Heukelum
    Cc: Boris Ostrovsky
    Cc: # consider after upstream merge

    H. Peter Anvin
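
The leak described in this commit message can be modelled with plain integer arithmetic. This is a sketch with hypothetical function names and made-up example values; it only illustrates the statement that IRET to a 16-bit segment restores the bottom 16 bits of the stack pointer, leaving the upper bits as they were in the kernel.

```c
#include <stdint.h>

/*
 * Model of the stack pointer seen by user space after IRET to a 16-bit
 * stack segment: the low 16 bits come from the user value, the high 16
 * bits are whatever the stack pointer held before the return.
 */
static uint32_t esp_after_16bit_iret(uint32_t esp_before_iret,
				     uint32_t user_esp)
{
	return (esp_before_iret & 0xffff0000u) | (user_esp & 0x0000ffffu);
}

/* The part of the kernel-side value that becomes visible to user space. */
static uint16_t leaked_high_bits(uint32_t esp_seen_by_user)
{
	return (uint16_t)(esp_seen_by_user >> 16);
}
```

For example, with a pre-return value of 0xc12f3e00 and a user stack pointer of 0x0000fff0, user space would observe 0xc12ffff0, so the bits 0xc12f are exposed.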
     

28 Apr, 2014

1 commit

  • The kernel passes any args it doesn't need through to init, except it
    assumes anything containing '.' belongs to the kernel (for a module).
    This change means all users can clearly distinguish which arguments
    are for init.

    For example, the kernel uses debug ("dee-bug") to mean log everything to
    the console, where systemd uses the debug from the Scandinavian "day-boog"
    meaning "fail to boot". If a future version uses argv[] instead of
    reading /proc/cmdline, this confusion will be avoided.

    eg: test 'FOO="this is --foo"' -- 'systemd.debug="true true true"'

    Gives:
    argv[0] = '/debug-init'
    argv[1] = 'test'
    argv[2] = 'systemd.debug=true true true'
    envp[0] = 'HOME=/'
    envp[1] = 'TERM=linux'
    envp[2] = 'FOO=this is --foo'

    Signed-off-by: Rusty Russell

    Rusty Russell
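
The "--" rule can be sketched as a small helper that finds where the init-only arguments start. The function name is hypothetical, the input is assumed to be already split into words, and this is not the kernel's parse_args() code; it only shows that everything after a stand-alone "--" is set aside for init.

```c
#include <string.h>

/*
 * Returns the index of the first argument after a stand-alone "--",
 * or argc if there is no such separator.
 */
static int first_init_arg(int argc, const char *const argv[])
{
	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "--") == 0)
			return i + 1;
	}
	return argc;
}
```

Note that an argument which merely contains "--", such as FOO="this is --foo" in the example above, is not treated as the separator.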
     

19 Apr, 2014

1 commit


13 Apr, 2014

1 commit

  • Pull audit updates from Eric Paris.

    * git://git.infradead.org/users/eparis/audit: (28 commits)
    AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
    audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
    audit: do not cast audit_rule_data pointers pointlesly
    AUDIT: Allow login in non-init namespaces
    audit: define audit_is_compat in kernel internal header
    kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
    sched: declare pid_alive as inline
    audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
    syscall_get_arch: remove useless function arguments
    audit: remove stray newline from audit_log_execve_info() audit_panic() call
    audit: remove stray newlines from audit_log_lost messages
    audit: include subject in login records
    audit: remove superfluous new- prefix in AUDIT_LOGIN messages
    audit: allow user processes to log from another PID namespace
    audit: anchor all pid references in the initial pid namespace
    audit: convert PPIDs to the inital PID namespace.
    pid: get pid_t ppid of task in init_pid_ns
    audit: rename the misleading audit_get_context() to audit_take_context()
    audit: Add generic compat syscall support
    audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
    ...

    Linus Torvalds
     

08 Apr, 2014

2 commits

  • This can greatly aid in narrowing down the real source of initramfs
    problems such as failures related to the compression of the in-kernel
    initramfs when an external initramfs is in use as well. Existing errors
    are ambiguous as to which initramfs is a problem and why.

    [akpm@linux-foundation.org: use pr_debug()]
    Signed-off-by: Daniel M. Weeks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel M. Weeks
     
  • "make allnoconfig" exists to ease testing of minimal configurations.
    Documentation/SubmitChecklist includes a note to test with allnoconfig.
    This helps catch missing dependencies on common-but-not-required
    functionality, which might otherwise go unnoticed.

    However, allnoconfig still leaves many symbols enabled, because they're
    hidden behind CONFIG_EMBEDDED or CONFIG_EXPERT. For instance, allnoconfig
    still has CONFIG_PRINTK and CONFIG_BLOCK enabled, so drivers don't
    typically get build-tested with those disabled.

    To address this, introduce a new Kconfig option "allnoconfig_y", used on
    symbols which only exist to hide other symbols. Set it on CONFIG_EMBEDDED
    (which then selects CONFIG_EXPERT). allnoconfig will then disable all the
    symbols hidden behind those.

    Signed-off-by: Josh Triplett
    Tested-by: Paul E. McKenney
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     

04 Apr, 2014

5 commits

  • Merge first patch-bomb from Andrew Morton:
    - Various misc bits
    - kmemleak fixes
    - small befs, codafs, cifs, efs, freevxfs, hfsplus, minixfs, reiserfs things
    - fanotify
    - I appear to have become SuperH maintainer
    - ocfs2 updates
    - direct-io tweaks
    - a bit of the MM queue
    - printk updates
    - MAINTAINERS maintenance
    - some backlight things
    - lib/ updates
    - checkpatch updates
    - the rtc queue
    - nilfs2 updates
    - Small Documentation/ updates

    * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (237 commits)
    Documentation/SubmittingPatches: remove references to patch-scripts
    Documentation/SubmittingPatches: update some dead URLs
    Documentation/filesystems/ntfs.txt: remove changelog reference
    Documentation/kmemleak.txt: updates
    fs/reiserfs/super.c: add __init to init_inodecache
    fs/reiserfs: move prototype declaration to header file
    fs/hfsplus/attributes.c: add __init to hfsplus_create_attr_tree_cache()
    fs/hfsplus/extents.c: fix concurrent acess of alloc_blocks
    fs/hfsplus/extents.c: remove unused variable in hfsplus_get_block
    nilfs2: update project's web site in nilfs2.txt
    nilfs2: update MAINTAINERS file entries fix
    nilfs2: verify metadata sizes read from disk
    nilfs2: add FITRIM ioctl support for nilfs2
    nilfs2: add nilfs_sufile_trim_fs to trim clean segs
    nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl
    nilfs2: add nilfs_sufile_set_suinfo to update segment usage
    nilfs2: add struct nilfs_suinfo_update and flags
    nilfs2: update MAINTAINERS file entries
    fs/coda/inode.c: add __init to init_inodecache()
    BEFS: logging cleanup
    ...

    Linus Torvalds
     
  • Signed-off-by: chishanmingshen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    chishanmingshen
     
  • uselib hasn't been used since libc5; glibc does not use it. Support
    turning it off.

    When disabled, also omit the load_elf_library implementation from
    binfmt_elf.c, which only uselib invokes.

    bloat-o-meter:
    add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785)
    function old new delta
    padzero 39 36 -3
    uselib_flags 20 - -20
    sys_uselib 168 - -168
    SyS_uselib 168 - -168
    load_elf_library 426 - -426

    The new CONFIG_USELIB defaults to `y'.

    Signed-off-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • sys_sysfs is an obsolete system call no longer supported by libc.

    - This patch adds a default CONFIG_SYSFS_SYSCALL=y

    - Option can be turned off in expert mode.

    - cond_syscall added to kernel/sys_ni.c

    [akpm@linux-foundation.org: tweak Kconfig help text]
    Signed-off-by: Fabian Frederick
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Pull cgroup updates from Tejun Heo:
    "A lot of updates for cgroup:

    - The biggest one is cgroup's conversion to kernfs. cgroup took
    after the long abandoned vfs-entangled sysfs implementation and
    made it even more convoluted over time. cgroup's internal objects
    were fused with vfs objects which also brought in vfs locking and
    object lifetime rules. Naturally, there are places where vfs rules
    don't fit and nasty hacks, such as credential switching or lock
    dance interleaving inode mutex and cgroup_mutex with object serial
    number comparison thrown in to decide whether the operation is
    actually necessary, needed to be employed.

    After conversion to kernfs, internal object lifetime and locking
    rules are mostly isolated from vfs interactions allowing shedding
    of several nasty hacks and overall simplification. This will also
    allow implementation of operations which may affect multiple cgroups
    which weren't possible before as it would have required nesting
    i_mutexes.

    - Various simplifications including dropping of module support,
    easier cgroup name/path handling, simplified cgroup file type
    handling and task_cg_lists optimization.

    - Preparatory changes for the planned unified hierarchy, which is still
    a patchset away from being actually operational. The dummy
    hierarchy is updated to serve as the default unified hierarchy.
    Controllers which aren't claimed by other hierarchies are
    associated with it, which BTW was what the dummy hierarchy was for
    anyway.

    - Various fixes from Li and others. This pull request includes some
    patches to add missing slab.h to various subsystems. This was
    triggered by the xattr.h include removal from cgroup.h. cgroup.h
    indirectly got included in a lot of files, which brought in xattr.h,
    which brought in slab.h.

    There are several merge commits - one to pull in kernfs updates
    necessary for converting cgroup (already in upstream through
    driver-core), others for interfering changes in the fixes branch"

    * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
    cgroup: remove useless argument from cgroup_exit()
    cgroup: fix spurious lockdep warning in cgroup_exit()
    cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
    cgroup: break kernfs active_ref protection in cgroup directory operations
    cgroup: fix cgroup_taskset walking order
    cgroup: implement CFTYPE_ONLY_ON_DFL
    cgroup: make cgrp_dfl_root mountable
    cgroup: drop const from @buffer of cftype->write_string()
    cgroup: rename cgroup_dummy_root and related names
    cgroup: move ->subsys_mask from cgroupfs_root to cgroup
    cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
    cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
    cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
    cgroup: reorganize cgroup bootstrapping
    cgroup: relocate setting of CGRP_DEAD
    cpuset: use rcu_read_lock() to protect task_cs()
    cgroup_freezer: document freezer_fork() subtleties
    cgroup: update cgroup_transfer_tasks() to either succeed or fail
    cgroup: drop task_lock() protection around task->cgroups
    cgroup: update how a newly forked task gets associated with css_set
    ...

    Linus Torvalds