16 Dec, 2015

1 commit


20 Nov, 2015

1 commit


22 Jul, 2015

1 commit

  • commit b064a8fa77dfead647564c46ac8fc5b13bd1ab73 upstream.

    Commit 73f7d1ca3263 "ACPI / init: Run acpi_early_init() before
    timekeeping_init()" moved the ACPI subsystem initialization,
    including the ACPI mode enabling, to an earlier point in the
    initialization sequence, to allow the timekeeping subsystem to
    use ACPI early. Unfortunately, that resulted in boot regressions
    on some systems and the early ACPI initialization was moved toward
    its original position in the kernel initialization code by commit
    c4e1acbb35e4 "ACPI / init: Invoke early ACPI initialization later".

    However, that turns out to be insufficient, as boot is still broken
    on the Tyan S8812 mainboard.

    To fix that issue, split the ACPI early initialization code into
    two pieces, so that the majority of it is still located in
    acpi_early_init() and the part switching the platform over into ACPI
    mode goes into a new function, acpi_subsystem_init(), executed at the
    original early ACPI initialization spot.

    That fixes the Tyan S8812 boot problem, but still allows ACPI
    tables to be loaded earlier which is useful to the EFI code in
    efi_enter_virtual_mode().

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=97141
    Fixes: 73f7d1ca3263 "ACPI / init: Run acpi_early_init() before timekeeping_init()"
    Reported-and-tested-by: Marius Tolzmann
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Toshi Kani
    Reviewed-by: Hanjun Guo
    Reviewed-by: Lee, Chun-Yi
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     

06 May, 2015

1 commit

  • Commit 283e7ad02 ("init: stricter checking of major:minor root=
    values") was so strict that it exposed the fact that a previously
    unknown device format was being used.

    Distributions like Ubuntu use klibc (rather than uswsusp) to resume
    the system from hibernation. klibc expresses the swap partition/file
    in the form major:minor:offset. For example, 8:3:0 represents a swap
    partition in klibc, and klibc's resume process in the initrd finally
    echoes 8:3:0 to /sys/power/resume to resume manually. However, due
    to commit 283e7ad02's stricter checking, 8:3:0 is treated as an
    invalid device format, and manual resume from hibernation fails.

    Fix this by adding support for devices with major:minor:offset format
    when resuming from hibernation.

    Reported-by: Prigent, Christophe
    Signed-off-by: Chen Yu
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Mike Snitzer

    Chen Yu
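
The device format described in this commit message can be sketched as follows. This is a minimal Python illustration, not the kernel's C code: the function name `parse_resume_device` and the regular expression are assumptions made for this example.

```python
import re

# Hypothetical pattern: "major:minor" with an optional ":offset" suffix.
_DEVICE_RE = re.compile(r"([0-9]+):([0-9]+)(?::([0-9]+))?")


def parse_resume_device(spec):
    """Return (major, minor, offset), or None if spec is malformed.

    offset is None when only "major:minor" was given.
    """
    match = _DEVICE_RE.fullmatch(spec)
    if match is None:
        return None
    major, minor, offset = match.groups()
    return (int(major), int(minor), None if offset is None else int(offset))


print(parse_resume_device("8:3:0"))   # swap partition in klibc notation
print(parse_resume_device("8:3:0x"))  # trailing garbage is still rejected
```

The optional third group keeps the strict check on trailing characters while admitting the extra offset field.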
     

18 Apr, 2015

2 commits

  • Pull documentation updates from Jonathan Corbet:
    "Numerous fixes, the overdue removal of the i2o docs, some new Chinese
    translations, and, hopefully, the README fix that will end the flow of
    identical patches to that file"

    * tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (34 commits)
    Documentation/memcg: update memcg/kmem status
    Documentation: blackfin: Makefile: Typo building issue
    Documentation/vm/pagemap.txt: correct location of page-types tool
    Documentation/memory-barriers.txt: typo fix
    doc: Add guest_nice column to example output of `cat /proc/stat'
    Documentation/kernel-parameters: Move "eagerfpu" to its right place
    Documentation: gpio: Update ACPI part of the document to mention _DSD
    docs/completion.txt: Various tweaks and corrections
    doc: completion: context, scope and language fixes
    Documentation:Update Documentation/zh_CN/arm64/memory.txt
    Documentation:Update Documentation/zh_CN/arm64/booting.txt
    Documentation: Chinese translation of arm64/legacy_instructions.txt
    DocBook media: fix broken EIA hyperlink
    Documentation: tweak the maintainers entry
    README: Change gzip/bzip2 to xz compression format
    README: Update version number reference
    doc:pci: Fix typo in Documentation/PCI
    Documentation: drm: Use '->' when describing access through pointers.
    Documentation: Remove mentioning of block barriers
    Documentation/email-clients.txt: Fix one grammar mistake, add extra info about TB
    ...

    Linus Torvalds
     
  • Pull device mapper updates from Mike Snitzer:

    - the most extensive changes this cycle are the DM core improvements to
    add full blk-mq support to request-based DM.

    - disabled by default but user can opt-in with CONFIG_DM_MQ_DEFAULT
    - depends on some blk-mq changes from Jens' for-4.1/core branch so
    that explains why this pull is built on linux-block.git

    - update DM to use name_to_dev_t() rather than open-coding a less
    capable device parser.

    - includes a couple of small improvements to name_to_dev_t() that offer
    the stricter constraints that DM's code provided.

    - improvements to the dm-cache "mq" cache replacement policy.

    - a DM crypt crypt_ctr() error path fix and an async crypto deadlock
    fix

    - a small efficiency improvement for DM crypt decryption by leveraging
    immutable biovecs

    - add error handling modes for corrupted blocks to DM verity

    - a new "log-writes" DM target from Josef Bacik that is meant for file
    system developers to test file system integrity at particular points
    in the life of a file system

    - a few DM log userspace cleanups and fixes

    - a few Documentation fixes (for thin, cache, crypt and switch)

    * tag 'dm-4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (34 commits)
    dm crypt: fix missing error code return from crypt_ctr error path
    dm crypt: fix deadlock when async crypto algorithm returns -EBUSY
    dm crypt: leverage immutable biovecs when decrypting on read
    dm crypt: update URLs to new cryptsetup project page
    dm: add log writes target
    dm table: use bool function return values of true/false not 1/0
    dm verity: add error handling modes for corrupted blocks
    dm thin: remove stale 'trim' message documentation
    dm delay: use msecs_to_jiffies for time conversion
    dm log userspace base: fix compile warning
    dm log userspace transfer: match wait_for_completion_timeout return type
    dm table: fall back to getting device using name_to_dev_t()
    init: stricter checking of major:minor root= values
    init: export name_to_dev_t and mark name argument as const
    dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr
    dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq
    dm: add full blk-mq support to request-based DM
    dm: impose configurable deadline for dm_request_fn's merge heuristic
    dm sysfs: introduce ability to add writable attributes
    dm: don't start current request if it would've merged with the previous
    ...

    Linus Torvalds
     

17 Apr, 2015

1 commit

  • PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
    THREAD_SIZE.

    For example, the hexagon architecture may have a page size of 1M and a
    thread size of 4096. This would lead to a division by zero in the
    calculation of max_threads.

    With this patch the buggy code is moved to a separate function
    set_max_threads. The error is not fixed.

    After fixing the problem in a separate patch the new function can be
    reused to adjust max_threads after adding or removing memory.

    Argument mempages of function fork_init() is removed as totalram_pages is
    an exported symbol.

    The creation of separate patches for refactoring to a new function and for
    fixing the logic was suggested by Ingo Molnar.

    Signed-off-by: Heinrich Schuchardt
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Guenter Roeck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinrich Schuchardt
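
The division by zero mentioned in this commit message can be reproduced with a small Python sketch. It assumes the calculation has the form `mempages / (8 * THREAD_SIZE / PAGE_SIZE)` with integer division; the commit message implies this but does not quote the code, and the function names below are hypothetical.

```python
def thread_divisor(thread_size, page_size):
    """Integer divisor assumed for this sketch; truncates like C division."""
    return 8 * thread_size // page_size


def max_threads(mempages, thread_size, page_size):
    """Raises ZeroDivisionError when the divisor truncates to zero."""
    return mempages // thread_divisor(thread_size, page_size)


# Hexagon-like values from the commit message: 1M pages, 4096-byte threads.
print(thread_divisor(thread_size=4096, page_size=1 << 20))
```

With a 1M page and a 4096-byte thread size, `8 * 4096` is smaller than the page size, so the divisor truncates to zero before the outer division runs.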
     

16 Apr, 2015

3 commits

  • There are a lot of embedded systems that run most or all of their
    functionality in init, running as root:root. For these systems,
    supporting multiple users is not necessary.

    This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
    non-root users, non-root groups, and capabilities optional. It is enabled
    under the CONFIG_EXPERT menu.

    When this symbol is not defined, UID and GID are zero in any possible case
    and processes always have all capabilities.

    The following syscalls are compiled out: setuid, setregid, setgid,
    setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
    getgroups, setfsuid, setfsgid, capget, capset.

    Also, groups.c is compiled out completely.

    In kernel/capability.c, the capable() function was moved in order to
    avoid adding two ifdef blocks.

    This change saves about 25 KB on a defconfig build. The most minimal
    kernels have total text sizes in the high hundreds of kB rather than
    low MB. (The 25k goes down a bit with allnoconfig, but not that much.)

    The kernel was booted in QEMU. All common functionality works.
    Adding users/groups is not possible, failing with -ENOSYS.

    Bloat-o-meter output:
    add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Iulia Manda
    Reviewed-by: Josh Triplett
    Acked-by: Geert Uytterhoeven
    Tested-by: Paul E. McKenney
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Iulia Manda
     
  • In the kernel command-line, previously, root=1:2jakshflaksjdhfa would
    be accepted and interpreted just like root=1:2. This patch adds
    stricter checking so that additional characters after major:minor are
    rejected by root=.

    The goal of this change is to help in unifying DM's interpretation of
    its block device argument by using existing kernel code (name_to_dev_t).
    Since DM rejects malformed major:minor pairs, it seems reasonable for
    root= to reject them as well.

    Signed-off-by: Dan Ehrenberg
    Signed-off-by: Mike Snitzer

    Dan Ehrenberg
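
The difference between the old and the stricter behavior can be illustrated with a short Python sketch. Both function names are hypothetical, and the prefix match only imitates a parser that ignores trailing characters; it is not the kernel's implementation.

```python
import re


def parse_lenient(spec):
    """Accept "major:minor" and silently ignore anything after it."""
    match = re.match(r"([0-9]+):([0-9]+)", spec)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_strict(spec):
    """Accept "major:minor" only if nothing follows the minor number."""
    match = re.fullmatch(r"([0-9]+):([0-9]+)", spec)
    return (int(match.group(1)), int(match.group(2))) if match else None


print(parse_lenient("1:2jakshflaksjdhfa"))  # treated like "1:2"
print(parse_strict("1:2jakshflaksjdhfa"))   # rejected
```

Matching the whole string instead of a prefix is the only change needed to reject the malformed value.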
     
  • DM will switch its device lookup code to using name_to_dev_t() so it
    must be exported. Also, the @name argument should be marked const.

    Signed-off-by: Dan Ehrenberg
    Signed-off-by: Mike Snitzer

    Dan Ehrenberg
     

15 Apr, 2015

5 commits

  • Merge first patchbomb from Andrew Morton:

    - arch/sh updates

    - ocfs2 updates

    - kernel/watchdog feature

    - about half of mm/

    * emailed patches from Andrew Morton: (122 commits)
    Documentation: update arch list in the 'memtest' entry
    Kconfig: memtest: update number of test patterns up to 17
    arm: add support for memtest
    arm64: add support for memtest
    memtest: use phys_addr_t for physical addresses
    mm: move memtest under mm
    mm, hugetlb: abort __get_user_pages if current has been oom killed
    mm, mempool: do not allow atomic resizing
    memcg: print cgroup information when system panics due to panic_on_oom
    mm: numa: remove migrate_ratelimited
    mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
    mm: split ET_DYN ASLR from mmap ASLR
    s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
    mm: expose arch_mmap_rnd when available
    s390: standardize mmap_rnd() usage
    powerpc: standardize mmap_rnd() usage
    mips: extract logic for mmap_rnd()
    arm64: standardize mmap_rnd() usage
    x86: standardize mmap_rnd() usage
    arm: factor out mmap ASLR into mmap_rnd
    ...

    Linus Torvalds
     
  • Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when
    I/O mappings with pud/pmd are enabled in the kernel.

    ioremap_huge_init() calls arch_ioremap_pud_supported() and
    arch_ioremap_pmd_supported() to initialize the capabilities at boot-time.

    A new kernel option "nohugeiomap" is also added, so that the user can
    disable the huge I/O map capabilities when necessary.

    Signed-off-by: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Dave Hansen
    Cc: Robert Elliott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Pull perf changes from Ingo Molnar:
    "Core kernel changes:

    - One of the more interesting features in this cycle is the ability
    to attach eBPF programs (user-defined, sandboxed bytecode executed
    by the kernel) to kprobes.

    This allows user-defined instrumentation on a live kernel image
    that can never crash, hang or interfere with the kernel negatively.
    (Right now it's limited to root-only, but in the future we might
    allow unprivileged use as well.)

    (Alexei Starovoitov)

    - Another non-trivial feature is per event clockid support: this
    allows, amongst other things, the selection of different clock
    sources for event timestamps traced via perf.

    This feature is sought by people who'd like to merge perf generated
    events with external events that were measured with different
    clocks:

    - cluster wide profiling

    - for system wide tracing with user-space events,

    - JIT profiling events

    etc. Matching perf tooling support is added as well, available via
    the -k, --clockid parameter to perf record et al.

    (Peter Zijlstra)

    Hardware enablement kernel changes:

    - x86 Intel Processor Trace (PT) support: which is a hardware tracer
    on steroids, available on Broadwell CPUs.

    The hardware trace stream is directly output into the user-space
    ring-buffer, using the 'AUX' data format extension that was added
    to the perf core to support hardware constraints such as the
    necessity to have the tracing buffer physically contiguous.

    This patch-set was developed for two years and this is the result.
    A simple way to make use of this is to use BTS tracing, the PT
    driver emulates BTS output - available via the 'intel_bts' PMU.
    More explicit PT specific tooling support is in the works as well -
    will probably be ready by 4.2.

    (Alexander Shishkin, Peter Zijlstra)

    - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
    feature of Intel Xeon CPUs that allows the measurement and
    allocation/partitioning of caches to individual workloads.

    These kernel changes expose the measurement side as a new PMU
    driver, which exposes various QoS related PMU events. (The
    partitioning change is work in progress and is planned to be merged
    as a cgroup extension.)

    (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
    Waskiewicz Jr)

    - x86 Intel Haswell LBR call stack support: this is a new Haswell
    feature that allows the hardware recording of call chains, plus
    tooling support. To activate this feature you have to enable it
    via the new 'lbr' call-graph recording option:

    perf record --call-graph lbr
    perf report

    or:

    perf top --call-graph lbr

    This hardware feature is a lot faster than stack walk or dwarf
    based unwinding, but has some limitations:

    - It reuses the current LBR facility, so LBR call stack and
    branch record can not be enabled at the same time.

    - It is only available for user-space callchains.

    (Yan, Zheng)

    - x86 Intel Broadwell CPU support and various event constraints and
    event table fixes for earlier models.

    (Andi Kleen)

    - x86 Intel HT CPUs event scheduling workarounds. This is a complex
    CPU bug affecting the SNB,IVB,HSW families that results in counter
    value corruption. The mitigation code is automatically enabled and
    is transparent.

    (Maria Dimakopoulou, Stephane Eranian)

    The perf tooling side had a ton of changes in this cycle as well, so
    I'm only able to list the user visible changes here, in addition to
    the tooling changes outlined above:

    User visible changes affecting all tools:

    - Improve support of compressed kernel modules (Jiri Olsa)
    - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
    - Bash completion for subcommands (Yunlong Song)
    - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
    - Support missing -f to override perf.data file ownership. (Yunlong Song)
    - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

    User visible changes in individual tools:

    'perf data':

    New tool for converting perf.data to other formats, initially
    for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
    Sebastian Siewior)

    'perf diff':

    Add --kallsyms option (David Ahern)

    'perf list':

    Allow listing events with 'tracepoint' prefix (Yunlong Song)

    Sort the output of the command (Yunlong Song)

    'perf kmem':

    Respect -i option (Jiri Olsa)

    Print big numbers using thousands' group (Namhyung Kim)

    Allow -v option (Namhyung Kim)

    Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

    Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

    Support unnamed union/structure members data collection. (Masami Hiramatsu)

    Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

    Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

    Support recording running/enabled time (Andi Kleen)

    'perf sched':

    Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

    Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

    Indicate which callchain entries are annotated in the
    TUI hists browser (Arnaldo Carvalho de Melo)

    Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

    Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
    cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
    events (Arnaldo Carvalho de Melo)

    'perf stat':

    Report unsupported events properly (Suzuki K. Poulose)

    Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

    Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

    Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

    Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

    Dump stack on segfaults (Arnaldo Carvalho de Melo)

    No need to explicitly enable evsels for workload started from perf, let it
    be enabled via perf_event_attr.enable_on_exec, removing some events that take
    place in the 'perf trace' before a workload is really started by it.
    (Arnaldo Carvalho de Melo)

    Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

    There's also been a ton of infrastructure work done, such as the
    split-out of perf's build system into tools/build/ and other changes -
    see the shortlog and changelog for details"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
    perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
    perf evlist: Fix type for references to data_head/tail
    perf probe: Check the orphaned -x option
    perf probe: Support multiple probes on different binaries
    perf buildid-list: Fix segfault when show DSOs with hits
    perf tools: Fix cross-endian analysis
    perf tools: Fix error path to do closedir() when synthesizing threads
    perf tools: Fix synthesizing fork_event.ppid for non-main thread
    perf tools: Add 'I' event modifier for exclude_idle bit
    perf report: Don't call map__kmap if map is NULL.
    perf tests: Fix attr tests
    perf probe: Fix ARM 32 building error
    perf tools: Merge all perf_event_attr print functions
    perf record: Add clockid parameter
    perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
    perf sched replay: Support using -f to override perf.data file ownership
    perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
    perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
    perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
    perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
    ...

    Linus Torvalds
     
  • Pull RCU changes from Ingo Molnar:
    "The main changes in this cycle were:

    - changes permitting use of call_rcu() and friends very early in
    boot, for example, before rcu_init() is invoked.

    - add in-kernel API to enable and disable expediting of normal RCU
    grace periods.

    - improve RCU's handling of (hotplug-) outgoing CPUs.

    - NO_HZ_FULL_SYSIDLE fixes.

    - tiny-RCU updates to make it more tiny.

    - documentation updates.

    - miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
    cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well
    cpu: Defer smpboot kthread unparking until CPU known to scheduler
    rcu: Associate quiescent-state reports with grace period
    rcu: Yet another fix for preemption and CPU hotplug
    rcu: Add diagnostics to grace-period cleanup
    rcutorture: Default to grace-period-initialization delays
    rcu: Handle outgoing CPUs on exit from idle loop
    cpu: Make CPU-offline idle-loop transition point more precise
    rcu: Eliminate ->onoff_mutex from rcu_node structure
    rcu: Process offlining and onlining only at grace-period start
    rcu: Move rcu_report_unblock_qs_rnp() to common code
    rcu: Rework preemptible expedited bitmask handling
    rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
    rcutorture: Enable slow grace-period initializations
    rcu: Provide diagnostic option to slow down grace-period initialization
    rcu: Detect stalls caused by failure to propagate up rcu_node tree
    rcu: Eliminate empty HOTPLUG_CPU ifdef
    rcu: Simplify sync_rcu_preempt_exp_init()
    rcu: Put all orphan-callback-related code under same comment
    rcu: Consolidate offline-CPU callback initialization
    ...

    Linus Torvalds
     
  • Pull trivial tree from Jiri Kosina:
    "Usual trivial tree updates. Nothing outstanding -- mostly printk()
    and comment fixes and unused identifier removals"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    goldfish: goldfish_tty_probe() is not using 'i' any more
    powerpc: Fix comment in smu.h
    qla2xxx: Fix printks in ql_log message
    lib: correct link to the original source for div64_u64
    si2168, tda10071, m88ds3103: Fix firmware wording
    usb: storage: Fix printk in isd200_log_config()
    qla2xxx: Fix printk in qla25xx_setup_mode
    init/main: fix reset_device comment
    ipwireless: missing assignment
    goldfish: remove unreachable line of code
    coredump: Fix do_coredump() comment
    stacktrace.h: remove duplicate declaration task_struct
    smpboot.h: Remove unused function prototype
    treewide: Fix typo in printk messages
    treewide: Fix typo in printk messages
    mod_devicetable: fix comment for match_flags

    Linus Torvalds
     

13 Apr, 2015

1 commit

  • Currently, smpboot_unpark_threads() is invoked before the incoming CPU
    has been added to the scheduler's runqueue structures. This might
    potentially cause the unparked kthread to run on the wrong CPU, since the
    correct CPU isn't fully set up yet.

    That causes a sporadic, hard to debug boot crash triggering on some
    systems, reported by Borislav Petkov, and bisected down to:

    2a442c9c6453 ("x86: Use common outgoing-CPU-notification code")

    This patch places smpboot_unpark_threads() in a CPU hotplug
    notifier with priority set so that these kthreads are unparked just after
    the CPU has been added to the runqueues.

    Reported-and-tested-by: Borislav Petkov
    Signed-off-by: Paul E. McKenney
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

11 Apr, 2015

1 commit


02 Apr, 2015

1 commit

  • So bpf_tracing.o depends on CONFIG_BPF_SYSCALL - but that's not its only
    dependency, it also depends on the tracing infrastructure and on kprobes,
    without which it will fail to build with:

    In file included from kernel/trace/bpf_trace.c:14:0:
    kernel/trace/trace.h: In function ‘trace_test_and_set_recursion’:
    kernel/trace/trace.h:491:28: error: ‘struct task_struct’ has no member named ‘trace_recursion’
    unsigned int val = current->trace_recursion;
    [...]

    It took quite some time to trigger this build failure, because right now
    BPF_SYSCALL is very obscure and depends on CONFIG_EXPERT. So also make
    BPF_SYSCALL more configurable, not just under CONFIG_EXPERT.

    If BPF_SYSCALL, tracing and kprobes are enabled then enable the bpf_tracing
    gateway as well.

    We might want to make this an interactive option later on, although
    I'd not complicate it unnecessarily: enabling BPF_SYSCALL is enough of
    an indicator that the user wants BPF support.

    Cc: Alexei Starovoitov
    Cc: Andrew Morton
    Cc: Arnaldo Carvalho de Melo
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

07 Mar, 2015

1 commit


05 Mar, 2015

1 commit

  • Now we call ss->bind() in cgroup_init(), so cgroup_init() will
    call cpuset_bind(), which then accesses top_cpuset's cpumask.
    That cpumask is still NULL, because cpuset_init() is called after
    cgroup_init().

    The simplest fix is to swap cgroup_init() and cpuset_init().

    Cc: Vladimir Davydov
    Fixes: 295458e67284 ("cgroup: call cgroup_subsys->bind on cgroup subsys initialization")
    Reported-by: Ming Lei
    Signed-off-by: Zefan Li
    Signed-off-by: Tejun Heo
    Acked-by: Vladimir Davydov

    Zefan Li
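
The ordering problem in this commit message can be modelled with a toy Python sketch. It only mirrors the dependency between the two init steps; the dictionary state and the helper `boots_cleanly` are assumptions for illustration and do not reflect the kernel's data structures.

```python
def boots_cleanly(init_order):
    """Run toy init functions in the given order; report whether bind works."""
    state = {"top_cpuset_cpumask": None}  # NULL until cpuset_init() runs

    def cpuset_init():
        state["top_cpuset_cpumask"] = set()

    def cgroup_init():
        # Stands in for ss->bind() calling cpuset_bind().
        if state["top_cpuset_cpumask"] is None:
            raise RuntimeError("cpuset_bind() touched a NULL cpumask")
        state["top_cpuset_cpumask"].add(0)

    steps = {"cpuset_init": cpuset_init, "cgroup_init": cgroup_init}
    try:
        for name in init_order:
            steps[name]()
    except RuntimeError:
        return False
    return True


print(boots_cleanly(["cgroup_init", "cpuset_init"]))  # order before the fix
print(boots_cleanly(["cpuset_init", "cgroup_init"]))  # swapped order
```

Swapping the two calls is enough in this model because the bind step only needs the mask to exist.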
     

27 Feb, 2015

1 commit

  • This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter
    that emulates a very early boot rcu_expedite_gp(). A late-boot
    call to rcu_end_inkernel_boot() will provide the corresponding
    rcu_unexpedite_gp(). The late-boot call to rcu_end_inkernel_boot()
    should be made just before init is spawned.

    According to Arjan:

    > To show the boot time, I'm using the timestamp of the "Write protecting"
    > line, that's pretty much the last thing we print prior to ring 3 execution.
    >
    > A kernel with default RCU behavior (inside KVM, only virtual devices)
    > looks like this:
    >
    > [ 0.038724] Write protecting the kernel read-only data: 10240k
    >
    > a kernel with expedited RCU (using the command line option, so that I
    > don't have to recompile between measurements and thus am completely
    > oranges-to-oranges)
    >
    > [ 0.031768] Write protecting the kernel read-only data: 10240k
    >
    > which, in percentage, is an 18% improvement.

    Reported-by: Arjan van de Ven
    Signed-off-by: Paul E. McKenney
    Tested-by: Arjan van de Ven

    Paul E. McKenney
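
The 18% figure quoted above follows from the two timestamps. A minimal Python check, with a hypothetical helper name, could look like this:

```python
def improvement_percent(before, after):
    """Relative reduction from before to after, in percent."""
    return (before - after) / before * 100.0


default_rcu = 0.038724    # seconds, default RCU behavior
expedited_rcu = 0.031768  # seconds, expedited RCU

print(round(improvement_percent(default_rcu, expedited_rcu)))
```

The difference is 0.006956 seconds, which is roughly 18% of the default-RCU timestamp.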
     

20 Feb, 2015

2 commits

  • Pull kconfig updates from Michal Marek:
    "Yann E Morin was supposed to take over kconfig maintainership, but
    this hasn't happened. So I'm sending a few kconfig patches that I
    collected:

    - Fix for missing va_end in kconfig
    - merge_config.sh displays usage if given too few arguments
    - s/boolean/bool/ in Kconfig files for consistency, with the plan to
    only support bool in the future"

    * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    kconfig: use va_end to match corresponding va_start
    merge_config.sh: Display usage if given too few arguments
    kconfig: use bool instead of boolean for type definition attributes

    Linus Torvalds
     
  • Pull misc kbuild changes from Michal Marek:
    "Just a few non-critical kbuild changes:

    - builddeb adds the actual distribution name in the changelog
    - documentation fixes"

    * 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    kbuild: trivial - fix the help doc of CONFIG_CC_OPTIMIZE_FOR_SIZE
    kbuild: Update documentation of clean-files and clean-dirs
    builddeb: Try to determine distribution
    builddeb: Update year and git repository URL in debian/copyright

    Linus Torvalds
     

14 Feb, 2015

1 commit

  • CONFIG_INIT_FALLBACK adds config bloat without an obvious use case that
    makes it worth keeping around. Delete it.

    Signed-off-by: Andy Lutomirski
    Cc: Rusty Russell
    Cc: Chuck Ebbert
    Cc: Frank Rowand
    Reviewed-by: Josh Triplett
    Cc: Randy Dunlap
    Cc: Rob Landley
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

10 Feb, 2015

1 commit

  • Pull x86 APIC updates from Ingo Molnar:
    "Continued fallout of the conversion of the x86 IRQ code to the
    hierarchical irqdomain framework: more cleanups, simplifications,
    memory allocation behavior enhancements, mainly in the interrupt
    remapping and APIC code"

    * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
    x86, init: Fix UP boot regression on x86_64
    iommu/amd: Fix irq remapping detection logic
    x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC
    x86: Consolidate boot cpu timer setup
    x86/apic: Reuse apic_bsp_setup() for UP APIC setup
    x86/smpboot: Sanitize uniprocessor init
    x86/smpboot: Move apic init code to apic.c
    init: Get rid of x86isms
    x86/apic: Move apic_init_uniprocessor code
    x86/smpboot: Cleanup ioapic handling
    x86/apic: Sanitize ioapic handling
    x86/ioapic: Add proper checks to setp/enable_IO_APIC()
    x86/ioapic: Provide stub functions for IOAPIC=n
    x86/smpboot: Move smpboot inlines to code
    x86/x2apic: Use state information for disable
    x86/x2apic: Split enable and setup function
    x86/x2apic: Disable x2apic from nox2apic setup
    x86/x2apic: Add proper state tracking
    x86/x2apic: Clarify remapping mode for x2apic enablement
    x86/x2apic: Move code in conditional region
    ...

    Linus Torvalds
     

22 Jan, 2015

1 commit

  • The UP local APIC support can be set up from an early initcall. No need
    for horrible hackery in the init code.

    Signed-off-by: Thomas Gleixner
    Cc: Jiang Liu
    Cc: Joerg Roedel
    Cc: Tony Luck
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/20150115211703.827943883@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

16 Jan, 2015

2 commits

  • …rcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD

    doc.2015.01.07a: Documentation updates.
    fixes.2015.01.15a: Miscellaneous fixes.
    preempt.2015.01.06a: Changes to handling of lists of preempted tasks.
    srcu.2015.01.06a: SRCU updates.
    stall.2015.01.16a: RCU CPU stall-warning updates and fixes.
    torture.2015.01.11a: RCU torture-test updates and fixes.

    Paul E. McKenney
     
  • Recent testing has shown that under heavy load, running RCU's grace-period
    kthreads at real-time priority can improve performance (according to 0day
    test robot) and reduce the incidence of RCU CPU stall warnings. However,
    most systems do just fine with the default non-realtime priorities for
    these kthreads, and it does not make sense to expose the entire user
    base to any risk stemming from this change, given that this change is
    of use only to a few users running extremely heavy workloads.

    Therefore, this commit allows users to specify realtime priorities
    for the grace-period kthreads, but leaves them running SCHED_OTHER
    by default. The realtime priority may be specified at build time
    via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the
    rcutree.kthread_prio parameter. Either way, 0 says to continue the
    default SCHED_OTHER behavior and values from 1-99 specify that priority
    of SCHED_FIFO behavior. Note that a value of 0 is not permitted when
    the RCU_BOOST Kconfig parameter is specified.
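
    The mapping described above can be sketched in plain userspace C. This
    is an illustration only, not the kernel's code: gp_kthread_sched() and
    struct gp_sched are hypothetical names, and clamping out-of-range
    values (including raising 0 to 1 under RCU_BOOST) is an assumption
    about how invalid input might be handled.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical result type: which policy and priority to request. */
struct gp_sched {
    int policy;    /* SCHED_OTHER or SCHED_FIFO */
    int priority;  /* 0 for SCHED_OTHER, 1-99 for SCHED_FIFO */
};

/*
 * Hypothetical helper: translate a kthread_prio-style value into a
 * scheduling policy, following the rules in the commit message.
 */
static struct gp_sched gp_kthread_sched(int kthread_prio, bool rcu_boost)
{
    struct gp_sched s;

    /* Assumption: out-of-range values are clamped, not rejected. */
    if (kthread_prio > 99)
        kthread_prio = 99;
    if (kthread_prio < 0)
        kthread_prio = 0;

    /* 0 is not permitted with RCU_BOOST; assume it is raised to 1. */
    if (rcu_boost && kthread_prio == 0)
        kthread_prio = 1;

    /* 0 keeps the default SCHED_OTHER; 1-99 selects SCHED_FIFO. */
    s.policy = kthread_prio ? SCHED_FIFO : SCHED_OTHER;
    s.priority = kthread_prio;
    assert(s.priority >= 0 && s.priority <= 99);
    return s;
}
```

    With this sketch, gp_kthread_sched(0, false) keeps SCHED_OTHER, while
    any value from 1 to 99 selects SCHED_FIFO at that priority.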

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

08 Jan, 2015

1 commit


07 Jan, 2015

3 commits

  • Support for keyword 'boolean' will be dropped later on.

    No functional change.

    Reference: http://lkml.kernel.org/r/cover.1418003065.git.cj@linux.com
    Signed-off-by: Christoph Jaeger
    Signed-off-by: Michal Marek

    Christoph Jaeger
     
  • SRCU does not need to be compiled by default in all cases. For tinification
    efforts, not compiling SRCU unless necessary is desirable.

    The current patch tries to make compiling SRCU optional by introducing a new
    Kconfig option CONFIG_SRCU which is selected when any of the components making
    use of SRCU are selected.

    If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.

    text    data    bss    dec     hex    filename
    2007    0       0      2007    7d7    kernel/rcu/srcu.o

    Size of arch/powerpc/boot/zImage changes from

    text      data     bss      dec       hex      filename
    831552    64180    23944    919676    e087c    arch/powerpc/boot/zImage : before
    829504    64180    23952    917636    e0084    arch/powerpc/boot/zImage : after

    so the savings are about 2000 bytes.

    Signed-off-by: Pranith Kumar
    CC: Paul E. McKenney
    CC: Josh Triplett
    CC: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    [ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]

    Pranith Kumar
     
  • The 48a7639ce80c ("rcu: Make callers awaken grace-period kthread")
    removed the irq_work_queue(), so the TREE_RCU doesn't need
    irq work any more. This commit therefore updates RCU's Kconfig and

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Paul E. McKenney

    Lai Jiangshan
     

17 Dec, 2014

3 commits

  • If mount flags don't have MS_RDONLY, iso9660 returns EACCES without actually
    checking if it's an iso image.

    This tricks mount_block_root() into retrying with MS_RDONLY. This results
    in a read-only root despite the "rw" boot parameter if the actual
    filesystem was checked after iso9660.

    I believe the behavior of iso9660 is okay, while that of mount_block_root()
    is not. It should rather try all types without MS_RDONLY and only then
    retry with MS_RDONLY.

    This change also makes the code more robust against the case when EACCES is
    returned despite MS_RDONLY, which would've resulted in a lockup.
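
    The retry order argued for above can be sketched as a small userspace
    simulation. Nothing here is the kernel's mount_block_root():
    mount_root_sketch(), probe_fn, SKETCH_RDONLY and the two toy probes
    are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_RDONLY 1   /* stand-in for MS_RDONLY */

/* Hypothetical per-filesystem probe: returns 0 on success or -errno. */
typedef int (*probe_fn)(int flags);

/*
 * Try every filesystem type with the current flags. Only after all of
 * them have failed, and at least one returned EACCES, retry the whole
 * list read-only. Returns the index of the type that mounted, or -1.
 */
static int mount_root_sketch(const probe_fn *fs, size_t n, int *flags)
{
    assert(fs != NULL && flags != NULL);

    for (;;) {
        int saw_eacces = 0;

        for (size_t i = 0; i < n; i++) {
            int err = fs[i](*flags);

            if (err == 0)
                return (int)i;
            if (err == -EACCES)
                saw_eacces = 1;
        }

        /*
         * Fall back to read-only at most once. If EACCES comes back
         * even with the read-only flag set, give up instead of looping.
         */
        if (!saw_eacces || (*flags & SKETCH_RDONLY))
            return -1;
        *flags |= SKETCH_RDONLY;
    }
}

/* Toy probe: refuses read-write mounts without inspecting the image. */
static int probe_iso9660(int flags)
{
    return (flags & SKETCH_RDONLY) ? -EINVAL : -EACCES;
}

/* Toy probe: stands in for the filesystem that is really on the disk. */
static int probe_ext4(int flags)
{
    (void)flags;
    return 0;
}
```

    With the probes ordered iso9660 first and ext4 second, the sketch
    mounts ext4 read-write and never sets the read-only flag, which is
    the behaviour the "rw" boot parameter asks for.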

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Al Viro

    Miklos Szeredi
     
  • Pull vfs pile #2 from Al Viro:
    "Next pile (and there'll be one or two more).

    The large piece in this one is getting rid of /proc/*/ns/* weirdness;
    among other things, it allows to (finally) make nameidata completely
    opaque outside of fs/namei.c, making for easier further cleanups in
    there"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    coda_venus_readdir(): use file_inode()
    fs/namei.c: fold link_path_walk() call into path_init()
    path_init(): don't bother with LOOKUP_PARENT in argument
    fs/namei.c: new helper (path_cleanup())
    path_init(): store the "base" pointer to file in nameidata itself
    make default ->i_fop have ->open() fail with ENXIO
    make nameidata completely opaque outside of fs/namei.c
    kill proc_ns completely
    take the targets of /proc/*/ns/* symlinks to separate fs
    bury struct proc_ns in fs/proc
    copy address of proc_ns_ops into ns_common
    new helpers: ns_alloc_inum/ns_free_inum
    make proc_ns_operations work with struct ns_common * instead of void *
    switch the rest of proc_ns_operations to working with &...->ns
    netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
    make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
    common object embedded into various struct ....ns

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "As the merge window is still open, and this code was not as complex as
    I thought it might be, I'm pushing this in now.

    This will allow Thomas to debug his irq work for 3.20.

    This adds two new features:

    1) Allow tracepoints to be enabled right after mm_init().

    By passing in the trace_event= kernel command line parameter,
    tracepoints can be enabled at boot up. For debugging things like
    the initialization of interrupts, it is needed to have tracepoints
    enabled very early. People have asked about this before and this
    has been on my todo list. As it can be helpful for Thomas to debug
    his upcoming 3.20 IRQ work, I'm pushing this now. This way he can
    add tracepoints into the IRQ set up and have users enable them when
    things go wrong.

    2) Have the tracepoints printed via printk() (the console) when they
    are triggered.

    If the irq code locks up or reboots the box, having the tracepoint
    output go into the kernel ring buffer is useless for debugging.
    But being able to add the tp_printk kernel command line option
    along with the trace_event= option will have these tracepoints
    printed as they occur, and that can be really useful for debugging
    early lock up or reboot problems.

    This code is not that intrusive and it passed all my tests. Thomas
    tried them out too and it works for his needs.

    Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org"

    * tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Add tp_printk cmdline to have tracepoints go to printk()
    tracing: Move enabling tracepoints to just after rcu_init()

    Linus Torvalds
     

15 Dec, 2014

2 commits

  • Enabling tracepoints at boot up can be very useful. The tracepoint
    can be initialized right after RCU has been. There's no need to
    wait for the early_initcall() to be called. That's too late for some
    things that can use tracepoints for debugging. Move the logic to
    enable tracepoints out of the initcalls and into init/main.c to
    right after rcu_init().

    This also allows trace_printk() to be used early too.

    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
    Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org

    Reviewed-by: Paul E. McKenney
    Suggested-by: Thomas Gleixner
    Tested-by: Thomas Gleixner
    Acked-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Pull security layer updates from James Morris:
    "In terms of changes, there's general maintenance to the Smack,
    SELinux, and integrity code.

    The IMA code adds a new kconfig option, IMA_APPRAISE_SIGNED_INIT,
    which allows IMA appraisal to require signatures. Support for reading
    keys from rootfs before init is called is also added"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
    selinux: Remove security_ops extern
    security: smack: fix out-of-bounds access in smk_parse_smack()
    VFS: refactor vfs_read()
    ima: require signature based appraisal
    integrity: provide a hook to load keys when rootfs is ready
    ima: load x509 certificate from the kernel
    integrity: provide a function to load x509 certificate from the kernel
    integrity: define a new function integrity_read_file()
    Security: smack: replace kzalloc with kmem_cache for inode_smack
    Smack: Lock mode for the floor and hat labels
    ima: added support for new kernel cmdline parameter ima_template_fmt
    ima: allocate field pointers array on demand in template_desc_init_fields()
    ima: don't allocate a copy of template_fmt in template_desc_init_fields()
    ima: display template format in meas. list if template name length is zero
    ima: added error messages to template-related functions
    ima: use atomic bit operations to protect policy update interface
    ima: ignore empty and with whitespaces policy lines
    ima: no need to allocate entry for comment
    ima: report policy load status
    ima: use path names cache
    ...

    Linus Torvalds
     

14 Dec, 2014

1 commit

  • When we debug something, we'd like to attach some information to every
    page. For this purpose, we sometimes modify struct page itself. But
    this has drawbacks. First, it requires a recompile, which makes us
    hesitate to use this powerful debug feature, so the development process
    is slowed down. Second, sometimes it is impossible to rebuild the
    kernel due to third-party module dependencies. Third, system behaviour
    would be largely different after a recompile, because it changes the
    size of struct page greatly and this structure is accessed by every part
    of the kernel. Keeping it as it is would be better for reproducing the
    erroneous situation.

    This feature is intended to overcome the above-mentioned problems. It
    allocates memory for extended per-page data in a separate place
    rather than in struct page itself. This memory can be accessed through
    the accessor functions provided by this code. During the boot process, it
    checks whether allocating a huge chunk of memory is needed or not. If
    not, it avoids allocating memory at all. With this advantage, we can
    include this feature in the kernel by default, avoid rebuilds, and
    solve the related problems.

    Until now, memcg used this technique. But now memcg has decided to embed
    its variable in struct page itself, and its code to extend struct page
    has been removed. I'd like to use this code to develop a debug feature,
    so this patch resurrects it.

    To help these things work well, this patch introduces two callbacks for
    clients. One is the need callback, which is mandatory if the user wants
    to avoid useless memory allocation at boot time. The other is the
    optional init callback, which is used to do proper initialization after
    memory is allocated. A detailed explanation of the purpose of these
    functions is in the code comments. Please refer to them.
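
    A minimal userspace sketch of how the two callbacks might fit together
    is given here. It is not the code from this patch: struct ext_client,
    ext_setup() and the toy callbacks are hypothetical names, and calling
    init for every client once memory exists is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical client descriptor modeled on the two callbacks. */
struct ext_client {
    bool (*need)(void);  /* does this client want per-page data? */
    void (*init)(void);  /* optional: run after memory is allocated */
};

/*
 * Allocate one extension record per page, but only if at least one
 * client reports that it needs the memory. Returns NULL when nothing
 * was allocated.
 */
static void *ext_setup(const struct ext_client *clients, size_t nclients,
                       size_t npages, size_t record_size)
{
    bool needed = false;
    void *table;

    assert(clients != NULL || nclients == 0);

    for (size_t i = 0; i < nclients; i++)
        if (clients[i].need && clients[i].need())
            needed = true;
    if (!needed)
        return NULL;  /* skip the large allocation entirely */

    table = calloc(npages, record_size);
    if (!table)
        return NULL;

    /* Assumption: every client with an init callback is initialized. */
    for (size_t i = 0; i < nclients; i++)
        if (clients[i].init)
            clients[i].init();
    return table;
}

/* Toy callbacks for illustration. */
static int init_calls;
static bool debug_need(void) { return true; }
static bool idle_need(void) { return false; }
static void debug_init(void) { init_calls++; }
```

    A client whose need callback returns false costs no memory in this
    sketch, which is the property that lets such a feature be built in by
    default.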

    Everything else is the same as the previous extension code in memcg.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

11 Dec, 2014

2 commits

  • Al Viro
     
  • New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
    It's not mountable (not even registered, so it's not in /proc/filesystems,
    etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

    This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
    get_proc_ns() is a macro now (it's simply returning ->i_private; would
    have been an inline, if not for header ordering headache).
    proc_ns_inode() is an ex-parrot. The interface used in procfs is
    ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

    Dentries and inodes are never hashed; a non-counting reference to dentry
    is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
    if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
    of that mechanism.

    As a result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
    it does nd_jump_link() on a consistent pair it gets
    from ns_get_path().

    Signed-off-by: Al Viro

    Al Viro