17 Mar, 2016

1 commit

  • Merge first patch-bomb from Andrew Morton:

    - some misc things

    - ocfs2 updates

    - about half of MM

    - checkpatch updates

    - autofs4 update

    * emailed patches from Andrew Morton : (120 commits)
    autofs4: fix string.h include in auto_dev-ioctl.h
    autofs4: use pr_xxx() macros directly for logging
    autofs4: change log print macros to not insert newline
    autofs4: make autofs log prints consistent
    autofs4: fix some white space errors
    autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked()
    autofs4: fix coding style line length in autofs4_wait()
    autofs4: fix coding style problem in autofs4_get_set_timeout()
    autofs4: coding style fixes
    autofs: show pipe inode in mount options
    kallsyms: add support for relative offsets in kallsyms address table
    kallsyms: don't overload absolute symbol type for percpu symbols
    x86: kallsyms: disable absolute percpu symbols on !SMP
    checkpatch: fix another left brace warning
    checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses
    checkpatch: warn on bare unsigned or signed declarations without int
    checkpatch: exclude asm volatile from complex macro check
    mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate()
    mm: migrate: consolidate mem_cgroup_migrate() calls
    mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
    ...

    Linus Torvalds
     

16 Mar, 2016

2 commits

  • Use list_for_each_entry() instead of list_for_each() to simplify the code.

    Signed-off-by: Geliang Tang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geliang Tang
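
    As an illustration of the simplification described in this commit, here is a minimal userspace sketch. The list helpers below are simplified stand-ins for the kernel's list macros, and `struct item` and the two sum functions are hypothetical names that do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's list helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

#define list_for_each_entry(pos, head, member)                           \
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);                                      \
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

void list_add_tail(struct list_head *item, struct list_head *head)
{
	assert(item != head);
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical payload type, not taken from the patch. */
struct item {
	int value;
	struct list_head node;
};

/* Before: iterate over raw nodes and convert each one by hand. */
int sum_with_list_for_each(struct list_head *head)
{
	struct list_head *pos;
	int sum = 0;

	list_for_each(pos, head) {
		struct item *it = container_of(pos, struct item, node);
		sum += it->value;
	}
	return sum;
}

/* After: the macro yields the containing structure directly. */
int sum_with_list_for_each_entry(struct list_head *head)
{
	struct item *it;
	int sum = 0;

	list_for_each_entry(it, head, node)
		sum += it->value;
	return sum;
}
```

    Both functions visit the same elements; the second form drops the temporary node pointer and the explicit container_of() call.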
     
  • Pull cpu hotplug updates from Thomas Gleixner:
    "This is the first part of the ongoing cpu hotplug rework:

    - Initial implementation of the state machine

    - Runs all online and prepare down callbacks on the plugged cpu and
    not on some random processor

    - Replaces busy loop waiting with completions

    - Adds tracepoints so the states can be followed"

    More detailed commentary on this work from an earlier email:
    "What's wrong with the current cpu hotplug infrastructure?

    - Asymmetry

    The hotplug notifier mechanism is asymmetric versus the bringup and
    teardown. This is mostly caused by the notifier mechanism.

    - Largely undocumented dependencies

    While some notifiers use explicitly defined notifier priorities,
    we have quite some notifiers which use numerical priorities to
    express dependencies without any documentation why.

    - Control processor driven

    Most of the bringup/teardown of a cpu is driven by a control
    processor. While it is understandable, that preparatory steps,
    like idle thread creation, memory allocation for and initialization
    of essential facilities needs to be done before a cpu can boot,
    there is no reason why everything else must run on a control
    processor. Before this patch series, bringup looks like this:

    Control CPU                   Booting CPU

    do preparatory steps
    kick cpu into life

                                  do low level init

    sync with booting cpu         sync with control cpu

    bring the rest up

    - All or nothing approach

    There is no way to do partial bringups. That's something which is
    really desired because we waste e.g. at boot substantial amount of
    time just busy waiting that the cpu comes to life. That's stupid
    as we could very well do preparatory steps and the initial IPI for
    other cpus and then go back and do the necessary low level
    synchronization with the freshly booted cpu.

    - Minimal debuggability

    Due to the notifier based design, it's impossible to switch between
    two stages of the bringup/teardown back and forth in order to test
    the correctness. So in many hotplug notifiers the cancel
    mechanisms are either not existent or completely untested.

    - Notifier [un]registering is tedious

    To [un]register notifiers we need to protect against hotplug at
    every callsite. There is no mechanism that bringup/teardown
    callbacks are issued on the online cpus, so every caller needs to
    do it itself. That also includes error rollback.

    What's the new design?

    The base of the new design is a symmetric state machine, where both
    the control processor and the booting/dying cpu execute a well
    defined set of states. Each state is symmetric in the end, except
    for some well defined exceptions, and the bringup/teardown can be
    stopped and reversed at almost all states.

    So the bringup of a cpu will look like this in the future:

    Control CPU                   Booting CPU

    do preparatory steps
    kick cpu into life

                                  do low level init

    sync with booting cpu         sync with control cpu

                                  bring itself up

    The synchronization step does not require the control cpu to wait.
    That mechanism can be done asynchronously via a worker or some
    other mechanism.

    The teardown can be made very similar, so that the dying cpu cleans
    up and brings itself down. Cleanups which need to be done after
    the cpu is gone, can be scheduled asynchronously as well.

    There is a long way to this, as we need to refactor the notion when a
    cpu is available. Today we set the cpu online right after it comes
    out of the low level bringup, which is not really correct.

    The proper mechanism is to set it to available, i.e. cpu local
    threads, like softirqd, hotplug thread etc. can be scheduled on that
    cpu, and once it finished all booting steps, it's set to online, so
    general workloads can be scheduled on it. The reverse happens on
    teardown. First thing to do is to forbid scheduling of general
    workloads, then teardown all the per cpu resources and finally shut it
    off completely.

    This patch series implements the basic infrastructure for this at the
    core level. This includes the following:

    - Basic state machine implementation with well defined states, so
    ordering and prioritization can be expressed.

    - Interfaces to [un]register state callbacks

    This invokes the bringup/teardown callback on all online cpus with
    the proper protection in place and [un]installs the callbacks in
    the state machine array.

    For callbacks which have no particular ordering requirement we have
    a dynamic state space, so that drivers don't have to register an
    explicit hotplug state.

    If a callback fails, the code automatically does a rollback to the
    previous state.

    - Sysfs interface to drive the state machine to a particular step.

    This is only partially functional today. Full functionality and
    therefore testability will be achieved once we have converted all
    existing hotplug notifiers over to the new scheme.

    - Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying
    processor:

    Control CPU                   Booting CPU

    do preparatory steps
    kick cpu into life

                                  do low level init

    sync with booting cpu         sync with control cpu
    wait for boot
                                  bring itself up

                                  Signal completion to control cpu

    In a previous step of this work we've done a full tree mechanical
    conversion of all hotplug notifiers to the new scheme. The balance
    is a net removal of about 4000 lines of code.

    This is not included in this series, as we decided to take a
    different approach. Instead of mechanically converting everything
    over, we will do a proper overhaul of the usage sites one by one so
    they nicely fit into the symmetric callback scheme.

    I decided to do that after I looked at the ugliness of some of the
    converted sites and figured out that their hotplug mechanism is
    completely buggered anyway. So there is no point to do a
    mechanical conversion first as we need to go through the usage
    sites one by one again in order to achieve a fully symmetric and
    testable behaviour"

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    cpu/hotplug: Document states better
    cpu/hotplug: Fix smpboot thread ordering
    cpu/hotplug: Remove redundant state check
    cpu/hotplug: Plug death reporting race
    rcu: Make CPU_DYING_IDLE an explicit call
    cpu/hotplug: Make wait for dead cpu completion based
    cpu/hotplug: Let upcoming cpu bring itself fully up
    arch/hotplug: Call into idle with a proper state
    cpu/hotplug: Move online calls to hotplugged cpu
    cpu/hotplug: Create hotplug threads
    cpu/hotplug: Split out the state walk into functions
    cpu/hotplug: Unpark smpboot threads from the state machine
    cpu/hotplug: Move scheduler cpu_online notifier to hotplug core
    cpu/hotplug: Implement setup/removal interface
    cpu/hotplug: Make target state writeable
    cpu/hotplug: Add sysfs state interface
    cpu/hotplug: Hand in target state to _cpu_up/down
    cpu/hotplug: Convert the hotplugged cpu work to a state machine
    cpu/hotplug: Convert to a state machine for the control processor
    cpu/hotplug: Add tracepoints
    ...

    Linus Torvalds
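
    The symmetric state machine with automatic rollback described in the pull message above can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct hp_step`, `hp_bring_up()`, `hp_bring_down()` and the demo callbacks are hypothetical names, not the kernel's interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* One step of a hypothetical symmetric state machine. */
struct hp_step {
	const char *name;
	int (*startup)(void);   /* run on the way up, may fail */
	void (*teardown)(void); /* undoes what startup did */
};

/* Walk down from *state to target, undoing steps in reverse order. */
void hp_bring_down(const struct hp_step *steps, size_t *state, size_t target)
{
	assert(target <= *state);
	while (*state > target) {
		--*state;
		if (steps[*state].teardown)
			steps[*state].teardown();
	}
}

/*
 * Walk up from *state to target. If a startup callback fails, roll back
 * to the state the walk started from and return the callback's error.
 */
int hp_bring_up(const struct hp_step *steps, size_t nsteps,
		size_t *state, size_t target)
{
	size_t start = *state;

	assert(target <= nsteps && start <= target);
	while (*state < target) {
		const struct hp_step *s = &steps[*state];
		int ret = s->startup ? s->startup() : 0;

		if (ret) {
			hp_bring_down(steps, state, start);
			return ret;
		}
		++*state;
	}
	return 0;
}

/* Demo callbacks (hypothetical): count how many steps are currently up. */
int hp_active;
int hp_fail_online; /* when non-zero, the last step refuses to start */

static int step_up(void) { hp_active++; return 0; }
static void step_down(void) { hp_active--; }

static int online_up(void)
{
	if (hp_fail_online)
		return -1;
	hp_active++;
	return 0;
}

const struct hp_step hp_demo_steps[] = {
	{ "prepare", step_up,   step_down },
	{ "kick",    step_up,   step_down },
	{ "online",  online_up, step_down },
};
```

    Because every step carries both directions, the walk can stop at any intermediate target and be reversed from there, which is the property the pull message relies on for testability.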
     

15 Mar, 2016

1 commit

  • Pull read-only kernel memory updates from Ingo Molnar:
    "This tree adds two (security related) enhancements to the kernel's
    handling of read-only kernel memory:

    - extend read-only kernel memory to a new class of formerly writable
    kernel data: 'post-init read-only memory' via the __ro_after_init
    attribute, and mark the ARM and x86 vDSO as such read-only memory.

    This kind of attribute can be used for data that requires a once
    per bootup initialization sequence, but is otherwise never modified
    after that point.

    This feature was based on the work by PaX Team and Brad Spengler.

    (by Kees Cook, the ARM vDSO bits by David Brown.)

    - make CONFIG_DEBUG_RODATA always enabled on x86 and remove the
    Kconfig option. This simplifies the kernel and also signals that
    read-only memory is the default model and a first-class citizen.
    (Kees Cook)"

    * 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ARM/vdso: Mark the vDSO code read-only after init
    x86/vdso: Mark the vDSO code read-only after init
    lkdtm: Verify that '__ro_after_init' works correctly
    arch: Introduce post-init read-only memory
    x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
    mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
    asm-generic: Consolidate mark_rodata_ro()

    Linus Torvalds
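
    The idea behind post-init read-only data (initialise once, never write again) can be illustrated with a userspace analogue. The sketch below is not the kernel mechanism: it assumes a POSIX system, uses mmap() and mprotect() in place of the __ro_after_init section handling, and `struct boot_config` and its accessors are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical configuration that is written once during start-up. */
struct boot_config {
	int max_widgets;
	int debug_level;
};

static struct boot_config *cfg;

/*
 * Allocate a private page, fill it in, then drop write permission so any
 * later store to it faults. Returns 0 on success, -1 on failure.
 */
int boot_config_init(int max_widgets, int debug_level)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	assert(page > 0 && (size_t)page >= sizeof(*cfg));
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	cfg = p;
	cfg->max_widgets = max_widgets;
	cfg->debug_level = debug_level;

	/* End of "init": from here on the page is read-only. */
	if (mprotect(p, (size_t)page, PROT_READ) != 0) {
		munmap(p, (size_t)page);
		cfg = NULL;
		return -1;
	}
	return 0;
}

/* Readers only ever get a const view of the data. */
const struct boot_config *boot_config_get(void)
{
	return cfg;
}
```

    After boot_config_init() succeeds, reads through boot_config_get() keep working, while a stray write to the page would raise SIGSEGV instead of silently corrupting the data.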
     

02 Mar, 2016

2 commits

  • Handle the smpboot threads in the state machine.

    Signed-off-by: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: Rik van Riel
    Cc: Rafael Wysocki
    Cc: "Srivatsa S. Bhat"
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Cc: Sebastian Siewior
    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: Paul McKenney
    Cc: Linus Torvalds
    Cc: Paul Turner
    Link: http://lkml.kernel.org/r/20160226182341.295777684@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Move the split out steps into a callback array and let the cpu_up/down
    code iterate through the array functions. For now most of the
    callbacks are asymmetric to resemble the current hotplug maze.

    Signed-off-by: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: Rik van Riel
    Cc: Rafael Wysocki
    Cc: "Srivatsa S. Bhat"
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Cc: Sebastian Siewior
    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: Paul McKenney
    Cc: Linus Torvalds
    Cc: Paul Turner
    Link: http://lkml.kernel.org/r/20160226182340.671816690@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

22 Feb, 2016

1 commit

  • It may be useful to debug writes to the readonly sections of memory,
    so provide a cmdline "rodata=off" to allow for this. This can be
    expanded in the future to support "log" and "write" modes, but that
    will need to be architecture-specific.

    This also makes KDB software breakpoints more usable, as read-only
    mappings can now be disabled on any kernel.

    Suggested-by: H. Peter Anvin
    Signed-off-by: Kees Cook
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: David Brown
    Cc: Denys Vlasenko
    Cc: Emese Revfy
    Cc: Linus Torvalds
    Cc: Mathias Krause
    Cc: Michael Ellerman
    Cc: PaX Team
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: kernel-hardening@lists.openwall.com
    Cc: linux-arch
    Link: http://lkml.kernel.org/r/1455748879-21872-3-git-send-email-keescook@chromium.org
    Signed-off-by: Ingo Molnar

    Kees Cook
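
    A boot parameter such as "rodata=off" boils down to finding one key on the command line and mapping its value to a flag. The sketch below shows that in plain userspace C. The function name is hypothetical, and accepting only "on" and "off" is an assumption; the kernel's own parser may accept other spellings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return whether read-only mappings should stay enabled for the given
 * space-separated command line. The default is true; the last
 * "rodata=on" or "rodata=off" token wins. Other values are ignored.
 */
bool rodata_enabled_from_cmdline(const char *cmdline)
{
	static const char key[] = "rodata=";
	const size_t keylen = sizeof(key) - 1;
	bool enabled = true;
	const char *p = cmdline;

	assert(cmdline != NULL);
	while (*p) {
		size_t len;

		p += strspn(p, " ");      /* skip separators */
		len = strcspn(p, " ");    /* length of the current token */
		if (len > keylen && strncmp(p, key, keylen) == 0) {
			const char *val = p + keylen;
			size_t vlen = len - keylen;

			if (vlen == 3 && strncmp(val, "off", 3) == 0)
				enabled = false;
			else if (vlen == 2 && strncmp(val, "on", 2) == 0)
				enabled = true;
		}
		p += len;
	}
	return enabled;
}
```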
     

09 Feb, 2016

1 commit

  • Lockdep is initialized at compile time now. Get rid of lockdep_init().

    Signed-off-by: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Cc: Linus Torvalds
    Cc: Mike Krinkin
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Cc: mm-commits@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Andrey Ryabinin
     

21 Jan, 2016

1 commit


05 Dec, 2015

1 commit

  • This commit adds the invocation of rcu_end_inkernel_boot() just before
    init is invoked. This allows the CONFIG_RCU_EXPEDITE_BOOT Kconfig
    option to do something useful and prepares for the upcoming
    rcupdate.rcu_normal_after_boot kernel parameter.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

11 Sep, 2015

1 commit

  • We need to launch the usermodehelper kernel threads with the widest
    affinity and this is partly why we use khelper. This workqueue has
    unbound properties and thus a wide affinity inherited by all its children.

    Now khelper also has special properties that we aren't much interested in:
    ordered and singlethread. There is really no need for ordering, as all
    we do is create kernel threads. This can be done concurrently. And
    singlethread is a useless limitation as well.

    The workqueue engine already provides generic unbound workqueues that
    don't share these useless properties and handle parallel jobs well.

    The only worrisome specific is their affinity to the node of the current
    CPU. It's fine for creating the usermodehelper kernel threads but those
    inherit this affinity for longer jobs such as requesting modules.

    This patch proposes to use these node affine unbound workqueues assuming
    that a node is sufficient to handle several parallel usermodehelper
    requests.

    Signed-off-by: Frederic Weisbecker
    Cc: Rik van Riel
    Reviewed-by: Oleg Nesterov
    Cc: Christoph Lameter
    Cc: Tejun Heo
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
     

07 Aug, 2015

1 commit

  • Dave Hansen reported the following:

    My laptop has been behaving strangely with 4.2-rc2. Once I log
    in to my X session, I start getting all kinds of strange errors
    from applications and see this in my dmesg:

    VFS: file-max limit 8192 reached

    The problem is that the file-max is calculated before memory is fully
    initialised and miscalculates how much memory the kernel is using. This
    patch recalculates file-max after deferred memory initialisation. Note
    that using memory hotplug infrastructure would not have avoided this
    problem as the value is not recalculated after memory hot-add.

    4.1: files_stat.max_files = 6582781
    4.2-rc2: files_stat.max_files = 8192
    4.2-rc2 patched: files_stat.max_files = 6562467

    Small differences with the patch applied and 4.1 but not enough to matter.

    Signed-off-by: Mel Gorman
    Reported-by: Dave Hansen
    Cc: Nicolai Stange
    Cc: Dave Hansen
    Cc: Alex Ng
    Cc: Fengguang Wu
    Cc: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
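
    To see why computing the limit too early collapses it to 8192, here is a sketch of a memory-based file-max heuristic. The exact formula (about 1 KiB per file, at most 10% of the memory not reserved by the kernel, with a floor of 8192) is an assumption modelled on the behaviour described above, and the function name is hypothetical.

```c
#include <assert.h>

#define NR_FILE_FLOOR 8192UL /* the limit quoted in the report above */

/*
 * Estimate a file-max value from page counts. Pages that are not free
 * are treated as kernel usage and padded by 50% before being subtracted.
 */
unsigned long max_files_estimate(unsigned long total_pages,
				 unsigned long free_pages,
				 unsigned long page_size)
{
	unsigned long used, reserve, n;

	assert(total_pages > 0 && free_pages <= total_pages);
	assert(page_size >= 1024);

	used = total_pages - free_pages;
	reserve = used * 3 / 2;
	if (reserve > total_pages - 1)
		reserve = total_pages - 1;

	n = (total_pages - reserve) * (page_size / 1024) / 10;
	return n > NR_FILE_FLOOR ? n : NR_FILE_FLOOR;
}
```

    With deferred initialisation most pages are not yet free when the value is first computed, so nearly all memory looks like it is in use and the result falls back to the floor. Recomputing after initialisation, as the patch does, sees the real amount of free memory.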
     

01 Jul, 2015

1 commit

  • Waiman Long reported that 24TB machines hit OOM during basic setup when
    struct page initialisation was deferred. One approach is to initialise
    memory on demand but it interferes with page allocator paths. This patch
    creates dedicated threads to initialise memory before basic setup. It
    then blocks on a rw_semaphore until completion as a wait_queue and counter
    is overkill. This may be slower to boot but it's simpler overall and
    also gets rid of a section mangling which existed so kswapd could do the
    initialisation.

    [akpm@linux-foundation.org: include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast]
    Signed-off-by: Mel Gorman
    Cc: Waiman Long
    Cc: Dave Hansen
    Cc: Scott Norton
    Tested-by: Daniel J Blueman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

27 Jun, 2015

1 commit

  • Pull driver core updates from Greg KH:
    "Here are the driver core / firmware changes for 4.2-rc1.

    A number of small changes all over the place in the driver core, and
    in the firmware subsystem. Nothing really major, full details in the
    shortlog. Some of it is a bit of churn, given that the platform
    driver probing changes were found to not work well, so they were
    reverted.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
    Revert "base/platform: Only insert MEM and IO resources"
    Revert "base/platform: Continue on insert_resource() error"
    Revert "of/platform: Use platform_device interface"
    Revert "base/platform: Remove code duplication"
    firmware: add missing kfree for work on async call
    fs: sysfs: don't pass count == 0 to bin file readers
    base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
    base/platform: Remove code duplication
    of/platform: Use platform_device interface
    base/platform: Continue on insert_resource() error
    base/platform: Only insert MEM and IO resources
    firmware: use const for remaining firmware names
    firmware: fix possible use after free on name on asynchronous request
    firmware: check for file truncation on direct firmware loading
    firmware: fix __getname() missing failure check
    drivers: of/base: move of_init to driver_init
    drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
    sysfs: disambiguate between "error code" and "failure" in comments
    driver-core: fix build for !CONFIG_MODULES
    driver-core: make __device_attach() static
    ...

    Linus Torvalds
     

11 Jun, 2015

1 commit

  • Commit 73f7d1ca3263 "ACPI / init: Run acpi_early_init() before
    timekeeping_init()" moved the ACPI subsystem initialization,
    including the ACPI mode enabling, to an earlier point in the
    initialization sequence, to allow the timekeeping subsystem
    to use ACPI early. Unfortunately, that resulted in boot regressions
    on some systems and the early ACPI initialization was moved toward
    its original position in the kernel initialization code by commit
    c4e1acbb35e4 "ACPI / init: Invoke early ACPI initialization later".

    However, that turns out to be insufficient, as boot is still broken
    on the Tyan S8812 mainboard.

    To fix that issue, split the ACPI early initialization code into
    two pieces so the majority of it is still located in acpi_early_init()
    and the part switching over the platform into the ACPI mode goes into
    a new function, acpi_subsystem_init(), executed at the original early
    ACPI initialization spot.

    That fixes the Tyan S8812 boot problem, but still allows ACPI
    tables to be loaded earlier which is useful to the EFI code in
    efi_enter_virtual_mode().

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=97141
    Fixes: 73f7d1ca3263 "ACPI / init: Run acpi_early_init() before timekeeping_init()"
    Reported-and-tested-by: Marius Tolzmann
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Toshi Kani
    Reviewed-by: Hanjun Guo
    Reviewed-by: Lee, Chun-Yi

    Rafael J. Wysocki
     

20 May, 2015

1 commit

  • This adds an extra argument onto parse_args() to be used
    as a way to make the unused callback a bit more useful and
    generic by allowing the caller to pass on a data structure
    of its choice. An example use case is to allow us to easily
    make module parameters for every module which we will do
    next.

    @ parse @
    identifier name, args, params, num, level_min, level_max;
    identifier unknown, param, val, doing;
    type s16;
    @@
    extern char *parse_args(const char *name,
    char *args,
    const struct kernel_param *params,
    unsigned num,
    s16 level_min,
    s16 level_max,
    + void *arg,
    int (*unknown)(char *param, char *val,
    const char *doing
    + , void *arg
    ));

    @ parse_mod @
    identifier name, args, params, num, level_min, level_max;
    identifier unknown, param, val, doing;
    type s16;
    @@
    char *parse_args(const char *name,
    char *args,
    const struct kernel_param *params,
    unsigned num,
    s16 level_min,
    s16 level_max,
    + void *arg,
    int (*unknown)(char *param, char *val,
    const char *doing
    + , void *arg
    ))
    {
    ...
    }

    @ parse_args_found @
    expression R, E1, E2, E3, E4, E5, E6;
    identifier func;
    @@

    (
    R =
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    func);
    |
    R =
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    &func);
    |
    R =
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    NULL);
    |
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    func);
    |
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    &func);
    |
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    NULL);
    )

    @ parse_args_unused depends on parse_args_found @
    identifier parse_args_found.func;
    @@

    int func(char *param, char *val, const char *unused
    + , void *arg
    )
    {
    ...
    }

    @ mod_unused depends on parse_args_found @
    identifier parse_args_found.func;
    expression A1, A2, A3;
    @@

    - func(A1, A2, A3);
    + func(A1, A2, A3, NULL);

    Generated-by: Coccinelle SmPL
    Cc: cocci@systeme.lip6.fr
    Cc: Tejun Heo
    Cc: Arjan van de Ven
    Cc: Greg Kroah-Hartman
    Cc: Rusty Russell
    Cc: Christoph Hellwig
    Cc: Felipe Contreras
    Cc: Ewan Milne
    Cc: Jean Delvare
    Cc: Hannes Reinecke
    Cc: Jani Nikula
    Cc: linux-kernel@vger.kernel.org
    Reviewed-by: Tejun Heo
    Acked-by: Rusty Russell
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Greg Kroah-Hartman

    Luis R. Rodriguez
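
    The effect of threading an extra `void *arg` through to the unknown-parameter callback can be shown with a simplified stand-in. `parse_args_sketch()` below is hypothetical and far smaller than the real parse_args() (no kernel_param table, no levels, an int return instead of char *). It only demonstrates how the caller's context reaches the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*unknown_fn)(char *param, char *val, const char *doing,
			  void *arg);

/*
 * Split args on spaces, split each token on the first '=', and hand every
 * pair to the callback together with the caller-supplied arg. The args
 * buffer is modified in place. Returns 0, or the first non-zero callback
 * result.
 */
int parse_args_sketch(const char *doing, char *args, void *arg,
		      unknown_fn unknown)
{
	char *tok = args;

	assert(args != NULL && unknown != NULL);
	while (*tok) {
		char *end, *val;
		int ret;

		tok += strspn(tok, " ");
		if (!*tok)
			break;
		end = tok + strcspn(tok, " ");
		if (*end)
			*end++ = '\0';
		val = strchr(tok, '=');
		if (val)
			*val++ = '\0';

		ret = unknown(tok, val, doing, arg);
		if (ret)
			return ret;
		tok = end;
	}
	return 0;
}

/* Hypothetical caller-defined context passed through the new argument. */
struct param_stats {
	int seen;
	int with_value;
};

int count_params(char *param, char *val, const char *doing, void *arg)
{
	struct param_stats *st = arg;

	(void)param;
	(void)doing;
	st->seen++;
	if (val)
		st->with_value++;
	return 0;
}
```

    Without the extra argument the callback would need a global to accumulate results; with it, each caller can pass its own structure (or NULL, as the generated patch does for existing callers).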
     

17 Apr, 2015

1 commit

  • PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
    THREAD_SIZE.

    E.g. architecture hexagon may have page size 1M and thread size 4096.
    This would lead to a division by zero in the calculation of max_threads.

    With this patch the buggy code is moved to a separate function
    set_max_threads. The error is not fixed.

    After fixing the problem in a separate patch the new function can be
    reused to adjust max_threads after adding or removing memory.

    Argument mempages of function fork_init() is removed as totalram_pages is
    an exported symbol.

    The creation of separate patches for refactoring to a new function and for
    fixing the logic was suggested by Ingo Molnar.

    Signed-off-by: Heinrich Schuchardt
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Guenter Roeck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinrich Schuchardt
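
    The division by zero follows directly from integer arithmetic. The sketch below assumes the calculation has the form mempages / (8 * THREAD_SIZE / PAGE_SIZE). The function names are hypothetical, and the checked variant is only one possible guard, not the fix that was eventually applied.

```c
#include <assert.h>

/* Divisor of the max_threads calculation, using integer division. */
unsigned long max_threads_divisor(unsigned long thread_size,
				  unsigned long page_size)
{
	assert(page_size > 0);
	return 8 * thread_size / page_size;
}

/*
 * Returns mempages / divisor, or -1 when the divisor truncates to zero
 * (that is, when PAGE_SIZE is larger than 8 * THREAD_SIZE).
 */
long max_threads_checked(unsigned long mempages, unsigned long thread_size,
			 unsigned long page_size)
{
	unsigned long div = max_threads_divisor(thread_size, page_size);

	if (div == 0)
		return -1;
	return (long)(mempages / div);
}
```

    With the hexagon-style values from the message, 8 * 4096 / 1048576 truncates to 0, whereas a configuration with 8 KiB thread stacks and 4 KiB pages gives a divisor of 16.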
     

15 Apr, 2015

4 commits

  • Merge first patchbomb from Andrew Morton:

    - arch/sh updates

    - ocfs2 updates

    - kernel/watchdog feature

    - about half of mm/

    * emailed patches from Andrew Morton : (122 commits)
    Documentation: update arch list in the 'memtest' entry
    Kconfig: memtest: update number of test patterns up to 17
    arm: add support for memtest
    arm64: add support for memtest
    memtest: use phys_addr_t for physical addresses
    mm: move memtest under mm
    mm, hugetlb: abort __get_user_pages if current has been oom killed
    mm, mempool: do not allow atomic resizing
    memcg: print cgroup information when system panics due to panic_on_oom
    mm: numa: remove migrate_ratelimited
    mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
    mm: split ET_DYN ASLR from mmap ASLR
    s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
    mm: expose arch_mmap_rnd when available
    s390: standardize mmap_rnd() usage
    powerpc: standardize mmap_rnd() usage
    mips: extract logic for mmap_rnd()
    arm64: standardize mmap_rnd() usage
    x86: standardize mmap_rnd() usage
    arm: factor out mmap ASLR into mmap_rnd
    ...

    Linus Torvalds
     
  • Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when
    I/O mappings with pud/pmd are enabled on the kernel.

    ioremap_huge_init() calls arch_ioremap_pud_supported() and
    arch_ioremap_pmd_supported() to initialize the capabilities at boot-time.

    A new kernel option "nohugeiomap" is also added, so that the user can disable
    the huge I/O map capabilities when necessary.

    Signed-off-by: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Dave Hansen
    Cc: Robert Elliott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Pull RCU changes from Ingo Molnar:
    "The main changes in this cycle were:

    - changes permitting use of call_rcu() and friends very early in
    boot, for example, before rcu_init() is invoked.

    - add in-kernel API to enable and disable expediting of normal RCU
    grace periods.

    - improve RCU's handling of (hotplug-) outgoing CPUs.

    - NO_HZ_FULL_SYSIDLE fixes.

    - tiny-RCU updates to make it more tiny.

    - documentation updates.

    - miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
    cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well
    cpu: Defer smpboot kthread unparking until CPU known to scheduler
    rcu: Associate quiescent-state reports with grace period
    rcu: Yet another fix for preemption and CPU hotplug
    rcu: Add diagnostics to grace-period cleanup
    rcutorture: Default to grace-period-initialization delays
    rcu: Handle outgoing CPUs on exit from idle loop
    cpu: Make CPU-offline idle-loop transition point more precise
    rcu: Eliminate ->onoff_mutex from rcu_node structure
    rcu: Process offlining and onlining only at grace-period start
    rcu: Move rcu_report_unblock_qs_rnp() to common code
    rcu: Rework preemptible expedited bitmask handling
    rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
    rcutorture: Enable slow grace-period initializations
    rcu: Provide diagnostic option to slow down grace-period initialization
    rcu: Detect stalls caused by failure to propagate up rcu_node tree
    rcu: Eliminate empty HOTPLUG_CPU ifdef
    rcu: Simplify sync_rcu_preempt_exp_init()
    rcu: Put all orphan-callback-related code under same comment
    rcu: Consolidate offline-CPU callback initialization
    ...

    Linus Torvalds
     
  • Pull trivial tree from Jiri Kosina:
    "Usual trivial tree updates. Nothing outstanding -- mostly printk()
    and comment fixes and unused identifier removals"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    goldfish: goldfish_tty_probe() is not using 'i' any more
    powerpc: Fix comment in smu.h
    qla2xxx: Fix printks in ql_log message
    lib: correct link to the original source for div64_u64
    si2168, tda10071, m88ds3103: Fix firmware wording
    usb: storage: Fix printk in isd200_log_config()
    qla2xxx: Fix printk in qla25xx_setup_mode
    init/main: fix reset_device comment
    ipwireless: missing assignment
    goldfish: remove unreachable line of code
    coredump: Fix do_coredump() comment
    stacktrace.h: remove duplicate declaration task_struct
    smpboot.h: Remove unused function prototype
    treewide: Fix typo in printk messages
    treewide: Fix typo in printk messages
    mod_devicetable: fix comment for match_flags

    Linus Torvalds
     

13 Apr, 2015

1 commit

  • Currently, smpboot_unpark_threads() is invoked before the incoming CPU
    has been added to the scheduler's runqueue structures. This might
    potentially cause the unparked kthread to run on the wrong CPU, since the
    correct CPU isn't fully set up yet.

    That causes a sporadic, hard to debug boot crash triggering on some
    systems, reported by Borislav Petkov, and bisected down to:

    2a442c9c6453 ("x86: Use common outgoing-CPU-notification code")

    This patch places smpboot_unpark_threads() in a CPU hotplug
    notifier with priority set so that these kthreads are unparked just after
    the CPU has been added to the runqueues.

    Reported-and-tested-by: Borislav Petkov
    Signed-off-by: Paul E. McKenney
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

07 Mar, 2015

1 commit


05 Mar, 2015

1 commit

  • Now we call ss->bind() in cgroup_init(), so cgroup_init() will
    call cpuset_bind() and then the latter will access top_cpuset's
    cpumask, which is NULL, because cpuset_init() is called after
    cgroup_init()

    The simplest fix is to swap cgroup_init() and cpuset_init().

    Cc: Vladimir Davydov
    Fixes: 295458e67284 ("cgroup: call cgroup_subsys->bind on cgroup subsys initialization")
    Reported-by: Ming Lei
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo
    Acked-by: Vladimir Davydov

    Zefan Li
     

14 Feb, 2015

1 commit

  • CONFIG_INIT_FALLBACK adds config bloat without an obvious use case that
    makes it worth keeping around. Delete it.

    Signed-off-by: Andy Lutomirski
    Cc: Rusty Russell
    Cc: Chuck Ebbert
    Cc: Frank Rowand
    Reviewed-by: Josh Triplett
    Cc: Randy Dunlap
    Cc: Rob Landley
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

22 Jan, 2015

1 commit

  • The UP local APIC support can be set up from an early initcall. No need
    for horrible hackery in the init code.

    Signed-off-by: Thomas Gleixner
    Cc: Jiang Liu
    Cc: Joerg Roedel
    Cc: Tony Luck
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/20150115211703.827943883@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

17 Dec, 2014

2 commits

  • Pull vfs pile #2 from Al Viro:
    "Next pile (and there'll be one or two more).

    The large piece in this one is getting rid of /proc/*/ns/* weirdness;
    among other things, it allows to (finally) make nameidata completely
    opaque outside of fs/namei.c, making for easier further cleanups in
    there"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    coda_venus_readdir(): use file_inode()
    fs/namei.c: fold link_path_walk() call into path_init()
    path_init(): don't bother with LOOKUP_PARENT in argument
    fs/namei.c: new helper (path_cleanup())
    path_init(): store the "base" pointer to file in nameidata itself
    make default ->i_fop have ->open() fail with ENXIO
    make nameidata completely opaque outside of fs/namei.c
    kill proc_ns completely
    take the targets of /proc/*/ns/* symlinks to separate fs
    bury struct proc_ns in fs/proc
    copy address of proc_ns_ops into ns_common
    new helpers: ns_alloc_inum/ns_free_inum
    make proc_ns_operations work with struct ns_common * instead of void *
    switch the rest of proc_ns_operations to working with &...->ns
    netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
    make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
    common object embedded into various struct ....ns

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "As the merge window is still open, and this code was not as complex as
    I thought it might be, I'm pushing this in now.

    This will allow Thomas to debug his irq work for 3.20.

    This adds two new features:

    1) Allow tracepoints to be enabled right after mm_init().

    By passing in the trace_event= kernel command line parameter,
    tracepoints can be enabled at boot up. For debugging things like
    the initialization of interrupts, it is needed to have tracepoints
    enabled very early. People have asked about this before and this
    has been on my todo list. As it can be helpful for Thomas to debug
    his upcoming 3.20 IRQ work, I'm pushing this now. This way he can
    add tracepoints into the IRQ set up and have users enable them when
    things go wrong.

    2) Have the tracepoints printed via printk() (the console) when they
    are triggered.

    If the irq code locks up or reboots the box, having the tracepoint
    output go into the kernel ring buffer is useless for debugging.
    But being able to add the tp_printk kernel command line option
    along with the trace_event= option will have these tracepoints
    printed as they occur, and that can be really useful for debugging
    early lock up or reboot problems.

    This code is not that intrusive and it passed all my tests. Thomas
    tried them out too and it works for his needs.

    Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org"

    * tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Add tp_printk cmdline to have tracepoints go to printk()
    tracing: Move enabling tracepoints to just after rcu_init()

    Linus Torvalds
     

15 Dec, 2014

2 commits

  • Enabling tracepoints at boot up can be very useful. The tracepoint
    can be initialized right after RCU has been. There's no need to
    wait for the early_initcall() to be called. That's too late for some
    things that can use tracepoints for debugging. Move the logic to
    enable tracepoints out of the initcalls and into init/main.c to
    right after rcu_init().

    This also allows trace_printk() to be used early too.

    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
    Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org

    Reviewed-by: Paul E. McKenney
    Suggested-by: Thomas Gleixner
    Tested-by: Thomas Gleixner
    Acked-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Pull security layer updates from James Morris:
    "In terms of changes, there's general maintenance to the Smack,
    SELinux, and integrity code.

    The IMA code adds a new kconfig option, IMA_APPRAISE_SIGNED_INIT,
    which allows IMA appraisal to require signatures. Support for reading
    keys from rootfs before init is called is also added"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
    selinux: Remove security_ops extern
    security: smack: fix out-of-bounds access in smk_parse_smack()
    VFS: refactor vfs_read()
    ima: require signature based appraisal
    integrity: provide a hook to load keys when rootfs is ready
    ima: load x509 certificate from the kernel
    integrity: provide a function to load x509 certificate from the kernel
    integrity: define a new function integrity_read_file()
    Security: smack: replace kzalloc with kmem_cache for inode_smack
    Smack: Lock mode for the floor and hat labels
    ima: added support for new kernel cmdline parameter ima_template_fmt
    ima: allocate field pointers array on demand in template_desc_init_fields()
    ima: don't allocate a copy of template_fmt in template_desc_init_fields()
    ima: display template format in meas. list if template name length is zero
    ima: added error messages to template-related functions
    ima: use atomic bit operations to protect policy update interface
    ima: ignore empty and with whitespaces policy lines
    ima: no need to allocate entry for comment
    ima: report policy load status
    ima: use path names cache
    ...

    Linus Torvalds
     

14 Dec, 2014

1 commit

  • When we debug something, we'd like to attach some information to every
    page. For this purpose, we sometimes modify struct page itself. But
    this has drawbacks. First, it requires a recompile. This makes us
    hesitate to use the powerful debug feature, so the development process
    is slowed down. Second, sometimes it is impossible to rebuild the
    kernel due to third-party module dependencies. Third, system behaviour
    would be largely different after a recompile, because it changes the
    size of struct page greatly and this structure is accessed by every
    part of the kernel. Keeping struct page as it is makes it easier to
    reproduce the erroneous situation.

    This feature is intended to overcome the above-mentioned problems. It
    allocates memory for extended per-page data in a separate place
    rather than in struct page itself. This memory can be accessed through
    the accessor functions provided by this code. During the boot process, it
    checks whether allocating this huge chunk of memory is needed or not. If
    not, it avoids allocating memory at all. With this advantage, we can
    include this feature in the kernel by default, avoid rebuilds, and
    solve the related problems.

    Until now, memcg used this technique. But memcg has now decided to embed
    its variable in struct page itself, and its code to extend struct page
    has been removed. I'd like to use this code to develop a debug feature, so
    this patch resurrects it.

    To help these things work well, this patch introduces two callbacks for
    clients. One is the need callback, which is mandatory if the user wants to
    avoid useless memory allocation at boot time. The other is the optional
    init callback, which is used to do proper initialization after memory is
    allocated. A detailed explanation of the purpose of these functions is in
    the code comments. Please refer to them.

    Everything else is the same as the previous extension code in memcg.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
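
    The need/init callback scheme described in this commit can be sketched
    in userspace roughly as follows. This is only an illustration under
    stated assumptions: struct ext_client, ext_setup() and the example
    callbacks are hypothetical names, not the kernel's interface, and
    calloc() stands in for the boot-time allocator. The sketch also assumes
    that a client without a need callback never requests memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical client descriptor; the names are illustrative only. */
struct ext_client {
    bool (*need)(void);                      /* does this client want per-page storage? */
    void (*init)(void *base, size_t npages); /* optional, may be NULL */
};

/*
 * Allocate the extended per-page array only if at least one client asks
 * for it. Returns NULL when nobody needs it or when allocation fails.
 */
static void *ext_setup(const struct ext_client *clients, size_t nclients,
                       size_t npages, size_t entry_size)
{
    bool needed = false;

    for (size_t i = 0; i < nclients; i++)
        if (clients[i].need && clients[i].need())
            needed = true;

    if (!needed)
        return NULL; /* no client: skip the large allocation entirely */

    void *base = calloc(npages, entry_size);
    if (!base)
        return NULL;

    /* Run the optional init callback of every client that asked for memory. */
    for (size_t i = 0; i < nclients; i++)
        if (clients[i].need && clients[i].need() && clients[i].init)
            clients[i].init(base, npages);

    return base;
}

/* Example callbacks, purely for demonstration. */
static int init_calls;
static bool never_needed(void) { return false; }
static bool always_needed(void) { return true; }
static void count_init(void *base, size_t npages)
{
    (void)base;
    (void)npages;
    init_calls++;
}
```

    With only never_needed() clients registered, ext_setup() returns NULL
    and allocates nothing, which is the property that lets such a feature
    be compiled in by default.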
     

11 Dec, 2014

4 commits

  • Al Viro
     
  • New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
    It's not mountable (not even registered, so it's not in /proc/filesystems,
    etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

    This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
    get_proc_ns() is a macro now (it's simply returning ->i_private; would
    have been an inline, if not for header ordering headache).
    proc_ns_inode() is an ex-parrot. The interface used in procfs is
    ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

    Dentries and inodes are never hashed; a non-counting reference to dentry
    is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
    if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
    of that mechanism.

    As a result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
    it does nd_jump_link() on a consistent pair it gets
    from ns_get_path().

    Signed-off-by: Al Viro

    Al Viro
     
  • If a user puts init=/whatever on the command line and /whatever can't be
    run, then the kernel will try a few default options before giving up. If
    init=/whatever came from a bootloader prompt, then this is unexpected but
    probably harmless. On the other hand, if it comes from a script (e.g. a
    tool like virtme or perhaps a future kselftest script), then the fallbacks
    are likely to exist, but they'll do the wrong thing. For example, they
    might unexpectedly invoke systemd.

    This adds a config option CONFIG_INIT_FALLBACK. If unset, then a failure
    to run the specified init= process will be fatal.

    The tentative plan is to remove CONFIG_INIT_FALLBACK for 3.20.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Andy Lutomirski
    Cc: Rob Landley
    Cc: Chuck Ebbert
    Cc: Randy Dunlap
    Cc: Shuah Khan
    Cc: Frank Rowand
    Cc: Josh Triplett
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
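
    The behaviour described in this commit can be sketched in userspace
    as below. This is a simplified model, not the kernel code: pick_init(),
    try_run_fn and only_sbin_init_exists() are hypothetical names, the
    allow_fallback flag stands in for CONFIG_INIT_FALLBACK, and the list of
    default paths is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "try to execute this path"; true on success. */
typedef bool (*try_run_fn)(const char *path);

/*
 * Return the path that was started, or NULL when boot should fail
 * (the point at which a kernel would panic).
 */
static const char *pick_init(const char *requested, bool allow_fallback,
                             try_run_fn try_run)
{
    /* Assumed default candidates, for illustration only. */
    static const char *const defaults[] = {
        "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
    };

    if (requested) {
        if (try_run(requested))
            return requested;
        if (!allow_fallback)
            return NULL; /* init= was given and failed: do not guess */
    }

    for (size_t i = 0; i < sizeof(defaults) / sizeof(defaults[0]); i++)
        if (try_run(defaults[i]))
            return defaults[i];

    return NULL;
}

/* Example environment in which only /sbin/init can be executed. */
static bool only_sbin_init_exists(const char *path)
{
    return strcmp(path, "/sbin/init") == 0;
}
```

    In this model, pick_init("/whatever", false, only_sbin_init_exists)
    returns NULL instead of silently starting /sbin/init, which is the
    outcome a test script passing init= would want.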
     
  • Memory cgroups used to have 5 per-page pointers. To allow users to
    disable that amount of overhead during runtime, those pointers were
    allocated in a separate array, with a translation layer between them and
    struct page.

    There is now only one page pointer remaining: the memcg pointer, that
    indicates which cgroup the page is associated with when charged. The
    complexity of runtime allocation and the runtime translation overhead is
    no longer justified to save that *potential* 0.19% of memory. With
    CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
    after the page->private member and doesn't even increase struct page,
    and then this patch actually saves space. Remaining users that care can
    still compile their kernels without CONFIG_MEMCG.

    text data bss dec hex filename
    8828345 1725264 983040 11536649 b00909 vmlinux.old
    8827425 1725264 966656 11519345 afc571 vmlinux.new

    [mhocko@suse.cz: update Documentation/cgroups/memory.txt]
    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: Vladimir Davydov
    Acked-by: David S. Miller
    Acked-by: KAMEZAWA Hiroyuki
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Acked-by: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
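
    The 0.19% figure quoted in this commit can be checked with a short
    calculation. The sketch below assumes an 8-byte pointer and 4096-byte
    pages, neither of which is stated in the commit message, and
    per_page_overhead_percent() is a hypothetical helper name.

```c
/*
 * Memory overhead, in percent, of keeping field_bytes of metadata for
 * every page of page_bytes.
 */
static double per_page_overhead_percent(unsigned long field_bytes,
                                        unsigned long page_bytes)
{
    return 100.0 * (double)field_bytes / (double)page_bytes;
}
```

    Under those assumptions, per_page_overhead_percent(8, 4096) evaluates
    to about 0.195, in line with the quoted figure. The size table in the
    commit shows a total reduction of 17304 bytes (920 in text and 16384
    in bss).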
     

19 Nov, 2014

1 commit


18 Nov, 2014

1 commit

  • Keys can only be loaded once the rootfs is mounted. Initcalls
    are not suitable for that. This patch defines a special hook
    to load the x509 public keys onto the IMA keyring, before
    attempting to access any file. The keys are required for
    verifying the file's signature. The hook is called after the
    root filesystem is mounted and before the kernel calls 'init'.

    Changes in v3:
    * added more explanation to the patch description (Mimi)

    Changes in v2:
    * Hook renamed as 'integrity_load_keys()' to handle both IMA and EVM
    keys by integrity subsystem.
    * Hook patch moved after defining loading functions

    Signed-off-by: Dmitry Kasatkin
    Signed-off-by: Mimi Zohar

    Dmitry Kasatkin
     

11 Nov, 2014

1 commit

  • Currently if the user passes an invalid value on the kernel command line
    then the kernel will crash during argument parsing. On most systems this
    is very hard to debug because the console hasn't been initialized yet.

    This is a regression due to commit 51e158c12aca ("param: hand arguments
    after -- straight to init") which, in response to the systemd debug
    controversy, made it possible to explicitly pass arguments to init. To
    achieve this parse_args() was extended from simply returning an error
    code to returning a pointer. Regrettably the new init args logic does not
    perform a proper validity check on the pointer, resulting in a crash.

    This patch fixes the validity check. Should the check fail then no arguments
    will be passed to init. This is reasonable and matches how the kernel treats
    its own arguments (i.e. no error recovery).

    Signed-off-by: Daniel Thompson
    Cc: stable@vger.kernel.org
    Signed-off-by: Rusty Russell

    Daniel Thompson
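
    The class of bug described in this commit arises when a function
    returns either a valid pointer, NULL, or an error code encoded in the
    pointer value, and the caller checks only some of those cases. The
    userspace sketch below illustrates such a check. It is not the kernel
    patch: parse_cmdline(), init_args() and the lower-case helpers are
    hypothetical, loosely modelled on the kernel's error-pointer convention,
    and the "malformed parameter" rule is invented for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer value lies in the range reserved for error codes. */
static inline int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static inline int is_err_or_null(const void *p)
{
    return !p || is_err(p);
}

/*
 * Hypothetical parser. Returns a pointer to the text after "--", NULL if
 * there is no "--", or an error-encoded pointer if a parameter is
 * malformed (here: a token that starts with '=').
 */
static char *parse_cmdline(char *line)
{
    char *p = line;

    while (*p) {
        while (*p == ' ')
            p++;
        if (!*p)
            break;
        if (p[0] == '-' && p[1] == '-' && (p[2] == ' ' || p[2] == '\0')) {
            p += 2;
            while (*p == ' ')
                p++;
            return p;
        }
        if (*p == '=')
            return err_ptr(-EINVAL);
        while (*p && *p != ' ')
            p++;
    }
    return NULL;
}

/* Return the arguments for init, or NULL if there are none or parsing failed. */
static const char *init_args(char *line)
{
    char *after = parse_cmdline(line);

    /* Checking only for NULL would treat an error value as a string. */
    if (is_err_or_null(after))
        return NULL;
    return after;
}
```

    With this check, a malformed command line yields no init arguments
    rather than a dereference of an error value, matching the "no error
    recovery" behaviour the commit describes.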
     

14 Oct, 2014

1 commit


13 Oct, 2014

1 commit

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
    Hansen)

    - Various sched/idle refinements for better idle handling (Nicolas
    Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

    - sched/numa updates and optimizations (Rik van Riel)

    - sysbench speedup (Vincent Guittot)

    - capacity calculation cleanups/refactoring (Vincent Guittot)

    - Various cleanups to thread group iteration (Oleg Nesterov)

    - Double-rq-lock removal optimization and various refactorings
    (Kirill Tkhai)

    - various sched/deadline fixes

    ... and lots of other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
    sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
    sched/fair: Delete resched_cpu() from idle_balance()
    sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
    sched: Improve sysbench performance by fixing spurious active migration
    sched/x86: Fix up typo in topology detection
    x86, sched: Add new topology for multi-NUMA-node CPUs
    sched/rt: Use resched_curr() in task_tick_rt()
    sched: Use rq->rd in sched_setaffinity() under RCU read lock
    sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
    sched: Use dl_bw_of() under RCU read lock
    sched/fair: Remove duplicate code from can_migrate_task()
    sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
    sched: print_rq(): Don't use tasklist_lock
    sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
    sched: Fix the task-group check in tg_has_rt_tasks()
    sched/fair: Leverage the idle state info when choosing the "idlest" cpu
    sched: Let the scheduler see CPU idle states
    sched/deadline: Fix inter- exclusive cpusets migrations
    sched/deadline: Clear dl_entity params when setscheduling to different class
    sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
    ...

    Linus Torvalds