04 Jul, 2015

3 commits

  • Pull scheduler fixes from Ingo Molnar:
    "Debug info and other statistics fixes and related enhancements"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/numa: Fix numa balancing stats in /proc/pid/sched
    sched/numa: Show numa_group ID in /proc/sched_debug task listings
    sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h
    sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=y
    sched/stat: Simplify the sched_info accounting dependency

    Linus Torvalds
     
  • Pull max log buf size increase from Ingo Molnar:
    "Ran into this limit recently, so increase it by an order of magnitude"

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    printk: Increase maximum CONFIG_LOG_BUF_SHIFT from 21 to 25

    Linus Torvalds
     
  • Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task
    sched_info, which results in ugly #if clauses.

    Simplify the code by introducing a synthethic CONFIG_SCHED_INFO
    switch, selected by both.

    Signed-off-by: Naveen N. Rao
    Cc: Balbir Singh
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: a.p.zijlstra@chello.nl
    Cc: ricklind@us.ibm.com
    Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Naveen N. Rao
     

02 Jul, 2015

2 commits

  • Merge third patchbomb from Andrew Morton:

    - the rest of MM

    - scripts/gdb updates

    - ipc/ updates

    - lib/ updates

    - MAINTAINERS updates

    - various other misc things

    * emailed patches from Andrew Morton : (67 commits)
    genalloc: rename of_get_named_gen_pool() to of_gen_pool_get()
    genalloc: rename dev_get_gen_pool() to gen_pool_get()
    x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit
    MAINTAINERS: add zpool
    MAINTAINERS: BCACHE: Kent Overstreet has changed email address
    MAINTAINERS: move Jens Osterkamp to CREDITS
    MAINTAINERS: remove unused nbd.h pattern
    MAINTAINERS: update brcm gpio filename pattern
    MAINTAINERS: update brcm dts pattern
    MAINTAINERS: update sound soc intel patterns
    MAINTAINERS: remove website for paride
    MAINTAINERS: update Emulex ocrdma email addresses
    bcache: use kvfree() in various places
    libcxgbi: use kvfree() in cxgbi_free_big_mem()
    target: use kvfree() in session alloc and free
    IB/ehca: use kvfree() in ipz_queue_{cd}tor()
    drm/nouveau/gem: use kvfree() in u_free()
    drm: use kvfree() in drm_free_large()
    cxgb4: use kvfree() in t4_free_mem()
    cxgb3: use kvfree() in cxgb_free_mem()
    ...

    Linus Torvalds
     
  • Pull module updates from Rusty Russell:
    "Main excitement here is Peter Zijlstra's lockless rbtree optimization
    to speed module address lookup. He found some abusers of the module
    lock doing that too.

    A little bit of parameter work here too; including Dan Streetman's
    breaking up the big param mutex so writing a parameter can load
    another module (yeah, really). Unfortunately that broke the usual
    suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
    appended too"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
    modules: only use mod->param_lock if CONFIG_MODULES
    param: fix module param locks when !CONFIG_SYSFS.
    rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
    module: add per-module param_lock
    module: make perm const
    params: suppress unused variable error, warn once just in case code changes.
    modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
    kernel/module.c: avoid ifdefs for sig_enforce declaration
    kernel/workqueue.c: remove ifdefs over wq_power_efficient
    kernel/params.c: export param_ops_bool_enable_only
    kernel/params.c: generalize bool_enable_only
    kernel/module.c: use generic module param operaters for sig_enforce
    kernel/params: constify struct kernel_param_ops uses
    sysfs: tightened sysfs permission checks
    module: Rework module_addr_{min,max}
    module: Use __module_address() for module_address_lookup()
    module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
    module: Optimize __module_address() using a latched RB-tree
    rbtree: Implement generic latch_tree
    seqlock: Introduce raw_read_seqcount_latch()
    ...

    Linus Torvalds
     

01 Jul, 2015

2 commits

  • So I tried to some kernel debugging that produced a ton of kernel messages
    on a big box, and wanted to save them all: but CONFIG_LOG_BUF_SHIFT maxes
    out at 21 (2 MB).

    Increase it to 25 (32 MB).

    This does not affect any existing config or defaults.

    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Waiman Long reported that 24TB machines hit OOM during basic setup when
    struct page initialisation was deferred. One approach is to initialise
    memory on demand but it interferes with page allocator paths. This patch
    creates dedicated threads to initialise memory before basic setup. It
    then blocks on a rw_semaphore until completion as a wait_queue and counter
    is overkill. This may be slower to boot but it's simplier overall and
    also gets rid of a section mangling which existed so kswapd could do the
    initialisation.

    [akpm@linux-foundation.org: include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast]
    Signed-off-by: Mel Gorman
    Cc: Waiman Long
    Cc: Dave Hansen
    Cc: Scott Norton
    Tested-by: Daniel J Blueman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

27 Jun, 2015

3 commits

  • Pull cgroup updates from Tejun Heo:

    - threadgroup_lock got reorganized so that its users can pick the
    actual locking mechanism to use. Its only user - cgroups - is
    updated to use a percpu_rwsem instead of per-process rwsem.

    This makes things a bit lighter on hot paths and allows cgroups to
    perform and fail multi-task (a process) migrations atomically.
    Multi-task migrations are used in several places including the
    unified hierarchy.

    - Delegation rule and documentation added to unified hierarchy. This
    will likely be the last interface update from the cgroup core side
    for unified hierarchy before lifting the devel mask.

    - Some groundwork for the pids controller which is scheduled to be
    merged in the coming devel cycle.

    * 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: add delegation section to unified hierarchy documentation
    cgroup: require write perm on common ancestor when moving processes on the default hierarchy
    cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
    kernfs: make kernfs_get_inode() public
    MAINTAINERS: add a cgroup core co-maintainer
    cgroup: fix uninitialised iterator in for_each_subsys_which
    cgroup: replace explicit ss_mask checking with for_each_subsys_which
    cgroup: use bitmask to filter for_each_subsys
    cgroup: add seq_file forward declaration for struct cftype
    cgroup: simplify threadgroup locking
    sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
    sched, cgroup: reorganize threadgroup locking
    cgroup: switch to unsigned long for bitmasks
    cgroup: reorganize include/linux/cgroup.h
    cgroup: separate out include/linux/cgroup-defs.h
    cgroup: fix some comment typos

    Linus Torvalds
     
  • Pull driver core updates from Greg KH:
    "Here is the driver core / firmware changes for 4.2-rc1.

    A number of small changes all over the place in the driver core, and
    in the firmware subsystem. Nothing really major, full details in the
    shortlog. Some of it is a bit of churn, given that the platform
    driver probing changes was found to not work well, so they were
    reverted.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
    Revert "base/platform: Only insert MEM and IO resources"
    Revert "base/platform: Continue on insert_resource() error"
    Revert "of/platform: Use platform_device interface"
    Revert "base/platform: Remove code duplication"
    firmware: add missing kfree for work on async call
    fs: sysfs: don't pass count == 0 to bin file readers
    base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
    base/platform: Remove code duplication
    of/platform: Use platform_device interface
    base/platform: Continue on insert_resource() error
    base/platform: Only insert MEM and IO resources
    firmware: use const for remaining firmware names
    firmware: fix possible use after free on name on asynchronous request
    firmware: check for file truncation on direct firmware loading
    firmware: fix __getname() missing failure check
    drivers: of/base: move of_init to driver_init
    drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
    sysfs: disambiguate between "error code" and "failure" in comments
    driver-core: fix build for !CONFIG_MODULES
    driver-core: make __device_attach() static
    ...

    Linus Torvalds
     
  • Merge second patchbomb from Andrew Morton:

    - most of the rest of MM

    - lots of misc things

    - procfs updates

    - printk feature work

    - updates to get_maintainer, MAINTAINERS, checkpatch

    - lib/ updates

    * emailed patches from Andrew Morton : (96 commits)
    exit,stats: /* obey this comment */
    coredump: add __printf attribute to cn_*printf functions
    coredump: use from_kuid/kgid when formatting corename
    fs/reiserfs: remove unneeded cast
    NILFS2: support NFSv2 export
    fs/befs/btree.c: remove unneeded initializations
    fs/minix: remove unneeded cast
    init/do_mounts.c: add create_dev() failure log
    kasan: remove duplicate definition of the macro KASAN_FREE_PAGE
    fs/efs: femove unneeded cast
    checkpatch: emit "NOTE: " message only once after multiple files
    checkpatch: emit an error when there's a diff in a changelog
    checkpatch: validate MODULE_LICENSE content
    checkpatch: add multi-line handling for PREFER_ETHER_ADDR_COPY
    checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
    checkpatch: fix processing of MEMSET issues
    checkpatch: suggest using ether_addr_equal*()
    checkpatch: avoid NOT_UNIFIED_DIFF errors on cover-letter.patch files
    checkpatch: remove local from codespell path
    checkpatch: add --showfile to allow input via pipe to show filenames
    ...

    Linus Torvalds
     

26 Jun, 2015

3 commits

  • If create_dev() function fails to create the root mount device
    (/dev/root), then it goes to panic as root device not found but there is
    no printk in this case. So I have added the log in case it fails to
    create the root device. It will help in debugging.

    [akpm@linux-foundation.org: simplify printk(), use pr_emerg(), display errno]
    Signed-off-by: Vishnu Pratap Singh
    Acked-by: Pavel Machek
    Cc: Paul Gortmaker
    Cc: Mike Snitzer
    Cc: Dan Ehrenberg
    Cc: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vishnu Pratap Singh
     
  • Commit 818411616baf ("fs, proc: introduce /proc//task//children
    entry") introduced the children entry for checkpoint restore and the
    file is only available on kernels configured with CONFIG_EXPERT and
    CONFIG_CHECKPOINT_RESTORE.

    This is available in most distributions (Fedora, Debian, Ubuntu, CoreOS)
    because they usually enable CONFIG_EXPERT and CONFIG_CHECKPOINT_RESTORE.
    But Arch does not enable CONFIG_EXPERT or CONFIG_CHECKPOINT_RESTORE.

    However, the children proc file is useful outside of checkpoint restore.
    I would like to use it in rkt. The rkt process exec() another program
    it does not control, and that other program will fork()+exec() a child
    process. I would like to find the pid of the child process from an
    external tool without iterating in /proc over all processes to find
    which one has a parent pid equal to rkt.

    This commit introduces CONFIG_PROC_CHILDREN and makes
    CONFIG_CHECKPOINT_RESTORE select it. This allows enabling
    /proc//task//children without needing to enable
    CONFIG_CHECKPOINT_RESTORE and CONFIG_EXPERT.

    Alban tested that /proc//task//children is present when the
    kernel is configured with CONFIG_PROC_CHILDREN=y but without
    CONFIG_CHECKPOINT_RESTORE

    Signed-off-by: Iago López Galeiras
    Tested-by: Alban Crequy
    Reviewed-by: Cyrill Gorcunov
    Cc: Oleg Nesterov
    Cc: Kees Cook
    Cc: Pavel Emelyanov
    Cc: Serge Hallyn
    Cc: KAMEZAWA Hiroyuki
    Cc: Alexander Viro
    Cc: Andy Lutomirski
    Cc: Djalal Harouni
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Iago López Galeiras
     
  • Pull cgroup writeback support from Jens Axboe:
    "This is the big pull request for adding cgroup writeback support.

    This code has been in development for a long time, and it has been
    simmering in for-next for a good chunk of this cycle too. This is one
    of those problems that has been talked about for at least half a
    decade, finally there's a solution and code to go with it.

    Also see last weeks writeup on LWN:

    http://lwn.net/Articles/648292/"

    * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
    writeback, blkio: add documentation for cgroup writeback support
    vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
    writeback: do foreign inode detection iff cgroup writeback is enabled
    v9fs: fix error handling in v9fs_session_init()
    bdi: fix wrong error return value in cgwb_create()
    buffer: remove unusued 'ret' variable
    writeback: disassociate inodes from dying bdi_writebacks
    writeback: implement foreign cgroup inode bdi_writeback switching
    writeback: add lockdep annotation to inode_to_wb()
    writeback: use unlocked_inode_to_wb transaction in inode_congested()
    writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
    writeback: implement [locked_]inode_to_wb_and_lock_list()
    writeback: implement foreign cgroup inode detection
    writeback: make writeback_control track the inode being written back
    writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
    mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
    writeback: implement memcg writeback domain based throttling
    writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
    writeback: implement memcg wb_domain
    writeback: update wb_over_bg_thresh() to use wb_domain aware operations
    ...

    Linus Torvalds
     

24 Jun, 2015

1 commit

  • Pull power management and ACPI updates from Rafael Wysocki:
    "The rework of backlight interface selection API from Hans de Goede
    stands out from the number of commits and the number of affected
    places perspective. The cpufreq core fixes from Viresh Kumar are
    quite significant too as far as the number of commits goes and because
    they should reduce CPU online/offline overhead quite a bit in the
    majority of cases.

    From the new featues point of view, the ACPICA update (to upstream
    revision 20150515) adding support for new ACPI 6 material to ACPICA is
    the one that matters the most as some new significant features will be
    based on it going forward. Also included is an update of the ACPI
    device power management core to follow ACPI 6 (which in turn reflects
    the Windows' device PM implementation), a PM core extension to support
    wakeup interrupts in a more generic way and support for the ACPI _CCA
    device configuration object.

    The rest is mostly fixes and cleanups all over and some documentation
    updates, including new DT bindings for Operating Performance Points.

    There is one fix for a regression introduced in the 4.1 cycle, but it
    adds quite a number of lines of code, it wasn't really ready before
    Thursday and you were on vacation, so I refrained from pushing it on
    the last minute for 4.1.

    Specifics:

    - ACPICA update to upstream revision 20150515 including basic support
    for ACPI 6 features: new ACPI tables introduced by ACPI 6 (STAO,
    XENV, WPBT, NFIT, IORT), changes related to the other tables (DTRM,
    FADT, LPIT, MADT), new predefined names (_BTH, _CR3, _DSD, _LPI,
    _MTL, _PRR, _RDI, _RST, _TFP, _TSN), fixes and cleanups (Bob Moore,
    Lv Zheng).

    - ACPI device power management core code update to follow ACPI 6
    which reflects the ACPI device power management implementation in
    Windows (Rafael J Wysocki).

    - rework of the backlight interface selection logic to reduce the
    number of kernel command line options and improve the handling of
    DMI quirks that may be involved in that and to make the code
    generally more straightforward (Hans de Goede).

    - fixes for the ACPI Embedded Controller (EC) driver related to the
    handling of EC transactions (Lv Zheng).

    - fix for a regression related to the ACPI resources management and
    resulting from a recent change of ACPI initialization code ordering
    (Rafael J Wysocki).

    - fix for a system initialization regression related to ACPI
    introduced during the 3.14 cycle and caused by running the code
    that switches the platform over to the ACPI mode too early in the
    initialization sequence (Rafael J Wysocki).

    - support for the ACPI _CCA device configuration object related to
    DMA cache coherence (Suravee Suthikulpanit).

    - ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov).

    - ACPI battery driver cleanups (Luis Henriques, Mathias Krause).

    - ACPI processor driver cleanups (Hanjun Guo).

    - cleanups and documentation update related to the ACPI device
    properties interface based on _DSD (Rafael J Wysocki).

    - ACPI device power management fixes (Rafael J Wysocki).

    - assorted cleanups related to ACPI (Dominik Brodowski, Fabian
    Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki).

    - fix for a long-standing issue causing General Protection Faults to
    be generated occasionally on return to user space after resume from
    ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar).

    - fix to make the suspend core code return -EBUSY consistently in all
    cases when system suspend is aborted due to wakeup detection (Ruchi
    Kandoi).

    - support for automated device wakeup IRQ handling allowing drivers
    to make their PM support more starightforward (Tony Lindgren).

    - new tracepoints for suspend-to-idle tracing and rework of the
    prepare/complete callbacks tracing in the PM core (Todd E Brandt,
    Rafael J Wysocki).

    - wakeup sources framework enhancements (Jin Qian).

    - new macro for noirq system PM callbacks (Grygorii Strashko).

    - assorted cleanups related to system suspend (Rafael J Wysocki).

    - cpuidle core cleanups to make the code more efficient (Rafael J
    Wysocki).

    - powernv/pseries cpuidle driver update (Shilpasri G Bhat).

    - cpufreq core fixes related to CPU online/offline that should reduce
    the overhead of these operations quite a bit, unless the CPU in
    question is physically going away (Viresh Kumar, Saravana Kannan).

    - serialization of cpufreq governor callbacks to avoid race
    conditions in some cases (Viresh Kumar).

    - intel_pstate driver fixes and cleanups (Doug Smythies, Prarit
    Bhargava, Joe Konno).

    - cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep
    Holla, Felipe Balbi, Tang Yuantian).

    - assorted cleanups in cpufreq drivers and core (Shailendra Verma,
    Fabian Frederick, Wang Long).

    - new Device Tree bindings for representing Operating Performance
    Points (Viresh Kumar).

    - updates for the common clock operations support code in the PM core
    (Rajendra Nayak, Geert Uytterhoeven).

    - PM domains core code update (Geert Uytterhoeven).

    - Intel Knights Landing support for the RAPL (Running Average Power
    Limit) power capping driver (Dasaratharaman Chandramouli).

    - fixes related to the floor frequency setting on Atom SoCs in the
    RAPL power capping driver (Ajay Thomas).

    - runtime PM framework documentation update (Ben Dooks).

    - cpupower tool fix (Herton R Krzesinski)"

    * tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (194 commits)
    cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state
    x86: Load __USER_DS into DS/ES after resume
    PM / OPP: Add binding for 'opp-suspend'
    PM / OPP: Allow multiple OPP tables to be passed via DT
    PM / OPP: Add new bindings to address shortcomings of existing bindings
    ACPI: Constify ACPI device IDs in documentation
    ACPI / enumeration: Document the rules regarding the PRP0001 device ID
    ACPI / video: Make acpi_video_unregister_backlight() private
    acpi-video-detect: Remove old API
    toshiba-acpi: Port to new backlight interface selection API
    thinkpad-acpi: Port to new backlight interface selection API
    sony-laptop: Port to new backlight interface selection API
    samsung-laptop: Port to new backlight interface selection API
    msi-wmi: Port to new backlight interface selection API
    msi-laptop: Port to new backlight interface selection API
    intel-oaktrail: Port to new backlight interface selection API
    ideapad-laptop: Port to new backlight interface selection API
    fujitsu-laptop: Port to new backlight interface selection API
    eeepc-laptop: Port to new backlight interface selection API
    dell-wmi: Port to new backlight interface selection API
    ...

    Linus Torvalds
     

23 Jun, 2015

2 commits

  • Andreas turned this option on, only to find out Debian (and Ubuntu!)
    don't enable support in their kmod builds.

    Shorten the text, and suggest N at the bottom (at least for now).

    Reported-by: Andreas Mohr
    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Pull perf updates from Ingo Molnar:
    "Kernel side changes mostly consist of work on x86 PMU drivers:

    - x86 Intel PT (hardware CPU tracer) improvements (Alexander
    Shishkin)

    - x86 Intel CQM (cache quality monitoring) improvements (Thomas
    Gleixner)

    - x86 Intel PEBSv3 support (Peter Zijlstra)

    - x86 Intel PEBS interrupt batching support for lower overhead
    sampling (Zheng Yan, Kan Liang)

    - x86 PMU scheduler fixes and improvements (Peter Zijlstra)

    There's too many tooling improvements to list them all - here are a
    few select highlights:

    'perf bench':

    - Introduce new 'perf bench futex' benchmark: 'wake-parallel', to
    measure parallel waker threads generating contention for kernel
    locks (hb->lock). (Davidlohr Bueso)

    'perf top', 'perf report':

    - Allow disabling/enabling events dynamicaly in 'perf top':
    a 'perf top' session can instantly become a 'perf report'
    one, i.e. going from dynamic analysis to a static one,
    returning to a dynamic one is possible, to toogle the
    modes, just press 'f' to 'freeze/unfreeze' the sampling. (Arnaldo Carvalho de Melo)

    - Make Ctrl-C stop processing on TUI, allowing interrupting the load of big
    perf.data files (Namhyung Kim)

    'perf probe': (Masami Hiramatsu)

    - Support glob wildcards for function name
    - Support $params special probe argument: Collect all function arguments
    - Make --line checks validate C-style function name.
    - Add --no-inlines option to avoid searching inline functions
    - Greatly speed up 'perf probe --list' by caching debuginfo.
    - Improve --filter support for 'perf probe', allowing using its arguments
    on other commands, as --add, --del, etc.

    'perf sched':

    - Add option in 'perf sched' to merge like comms to lat output (Josef Bacik)

    Plus tons of infrastructure work - in particular preparation for
    upcoming threaded perf report support, but also lots of other work -
    and fixes and other improvements. See (much) more details in the
    shortlog and in the git log"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (305 commits)
    perf tools: Configurable per thread proc map processing time out
    perf tools: Add time out to force stop proc map processing
    perf report: Fix sort__sym_cmp to also compare end of symbol
    perf hists browser: React to unassigned hotkey pressing
    perf top: Tell the user how to unfreeze events after pressing 'f'
    perf hists browser: Honour the help line provided by builtin-{top,report}.c
    perf hists browser: Do not exit when 'f' is pressed in 'report' mode
    perf top: Replace CTRL+z with 'f' as hotkey for enable/disable events
    perf annotate: Rename source_line_percent to source_line_samples
    perf annotate: Display total number of samples with --show-total-period
    perf tools: Ensure thread-stack is flushed
    perf top: Allow disabling/enabling events dynamicly
    perf evlist: Add toggle_enable() method
    perf trace: Fix race condition at the end of started workloads
    perf probe: Speed up perf probe --list by caching debuginfo
    perf probe: Show usage even if the last event is skipped
    perf tools: Move libtraceevent dynamic list to separated LDFLAGS variable
    perf tools: Fix a problem when opening old perf.data with different byte order
    perf tools: Ignore .config-detected in .gitignore
    perf probe: Fix to return error if no probe is added
    ...

    Linus Torvalds
     

11 Jun, 2015

1 commit

  • Commit 73f7d1ca3263 "ACPI / init: Run acpi_early_init() before
    timekeeping_init()" moved the ACPI subsystem initialization,
    including the ACPI mode enabling, to an earlier point in the
    initialization sequence, to allow the timekeeping subsystem
    use ACPI early. Unfortunately, that resulted in boot regressions
    on some systems and the early ACPI initialization was moved toward
    its original position in the kernel initialization code by commit
    c4e1acbb35e4 "ACPI / init: Invoke early ACPI initialization later".

    However, that turns out to be insufficient, as boot is still broken
    on the Tyan S8812 mainboard.

    To fix that issue, split the ACPI early initialization code into
    two pieces so the majority of it still located in acpi_early_init()
    and the part switching over the platform into the ACPI mode goes into
    a new function, acpi_subsystem_init(), executed at the original early
    ACPI initialization spot.

    That fixes the Tyan S8812 boot problem, but still allows ACPI
    tables to be loaded earlier which is useful to the EFI code in
    efi_enter_virtual_mode().

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=97141
    Fixes: 73f7d1ca3263 "ACPI / init: Run acpi_early_init() before timekeeping_init()"
    Reported-and-tested-by: Marius Tolzmann
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Toshi Kani
    Reviewed-by: Hanjun Guo
    Reviewed-by: Lee, Chun-Yi

    Rafael J. Wysocki
     

02 Jun, 2015

1 commit

  • cgroup writeback requires support from both bdi and filesystem sides.
    Add BDI_CAP_CGROUP_WRITEBACK and FS_CGROUP_WRITEBACK to indicate
    support and enable BDI_CAP_CGROUP_WRITEBACK on block based bdi's by
    default. Also, define CONFIG_CGROUP_WRITEBACK which is enabled if
    both MEMCG and BLK_CGROUP are enabled.

    inode_cgwb_enabled() which determines whether a given inode's both bdi
    and fs support cgroup writeback is added.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Jan Kara
    Signed-off-by: Jens Axboe

    Tejun Heo
     

28 May, 2015

10 commits

  • Andrew worried about the overhead on small systems; only use the fancy
    code when either perf or tracing is enabled.

    Cc: Rusty Russell
    Cc: Steven Rostedt
    Requested-by: Andrew Morton
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Rusty Russell

    Peter Zijlstra
     
  • The RCU implementation is chosen based on PREEMPT and SMP config options
    and is not really a user-selectable choice. This commit removes the
    menu entry, given that there is not much point in calling something a
    choice when there is in fact no choice.. The TINY_RCU, TREE_RCU, and
    PREEMPT_RCU Kconfig options continue to be selected based solely on the
    values of the PREEMPT and SMP options.

    Signed-off-by: Pranith Kumar
    Signed-off-by: Paul E. McKenney

    Pranith Kumar
     
  • This commit updates the initialization of the kthread_prio boot parameter
    so that RCU will build even when CONFIG_RCU_KTHREAD_PRIO is undefined.
    The kthread_prio boot parameter is set to CONFIG_RCU_KTHREAD_PRIO if
    that is defined, otherwise to 1 if CONFIG_RCU_BOOST is defined and
    to zero otherwise. This commit then makes CONFIG_RCU_KTHREAD_PRIO
    depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about
    CONFIG_RCU_KTHREAD_PRIO unless they want to be.

    Reported-by: Linus Torvalds
    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
     
  • This commit introduces an RCU_FANOUT_LEAF C-preprocessor macro so
    that RCU will build even when CONFIG_RCU_FANOUT_LEAF is undefined.
    The RCU_FANOUT_LEAF macro is set to the value of CONFIG_RCU_FANOUT_LEAF
    when defined, otherwise it is set to 32 for 32-bit systems and 64 for
    64-bit systems. This commit then makes CONFIG_RCU_FANOUT_LEAF depend
    on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about
    CONFIG_RCU_FANOUT_LEAF unless they want to be.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
     
  • This commit introduces an RCU_FANOUT C-preprocessor macro so that RCU will
    build even when CONFIG_RCU_FANOUT is undefined. The RCU_FANOUT macro is
    set to the value of CONFIG_RCU_FANOUT when defined, otherwise it is set
    to 32 for 32-bit systems and 64 for 64-bit systems. This commit then
    makes CONFIG_RCU_FANOUT depend on CONFIG_RCU_EXPERT, so that Kconfig
    users won't be asked about CONFIG_RCU_FANOUT unless they want to be.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
     
  • RCU_FANOUT_LEAF's range and default values depend on the value of
    RCU_FANOUT, which at the time seemed like a cute way to save two lines
    of Kconfig code. However, adding a dependency from both of these
    Kconfig parameters on RCU_EXPERT requires that RCU_FANOUT_LEAF operate
    correctly even if RCU_FANOUT is undefined. This commit therefore
    allows RCU_FANOUT_LEAF to take on the full range of permitted values,
    even in cases where RCU_FANOUT is undefined.

    Signed-off-by: Paul E. McKenney
    [ paulmck: Eliminate redundant "default" as suggested by Pranith Kumar. ]
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
     
  • This commit creates an RCU_EXPERT Kconfig and hides the independent
    boolean RCU-related user-visible Kconfig parameters behind it, namely
    RCU_FAST_NO_HZ and RCU_BOOST. This prevents Kconfig from asking about
    these parameters unless the user really wants to be asked.

    Reported-by: Linus Torvalds
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
     
  • The CONFIG_RCU_FANOUT_EXACT Kconfig parameter is used primarily (and
    perhaps only) by rcutorture to verify that RCU works correctly in specific
    rcu_node combining-tree configurations. It therefore does not make
    much sense have this as a question to people attempting to configure
    their kernels. So this commit creates an rcutree.rcu_fanout_exact=
    boot parameter that rcutorture can use, and eliminates the original
    CONFIG_RCU_FANOUT_EXACT Kconfig parameter.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
     
  • Currently, Kconfig will ask the user whether RCU_USER_QS should be set.
    This is silly because Kconfig already has all the information that it
    needs to set this parameter. This commit therefore directly drives
    the value of RCU_USER_QS via NO_HZ_FULL's "select" statement.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Pranith Kumar
    Acked-by: Frederic Weisbecker

    Paul E. McKenney
     
  • Currently, Kconfig will ask the user whether TASKS_RCU should be set.
    This is silly because Kconfig already has all the information that it
    needs to set this parameter. This commit therefore directly drives
    the value of TASKS_RCU via "select" statements. Which means that
    as subsystems require TASKS_RCU, those subsystems will need to add
    "select" statements of their own.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Cc: Steven Rostedt
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
     

27 May, 2015

2 commits

  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The cgroup side of threadgroup locking uses signal_struct->group_rwsem
    to synchronize against threadgroup changes. This per-process rwsem
    adds small overhead to thread creation, exit and exec paths, forces
    cgroup code paths to do lock-verify-unlock-retry dance in a couple
    places and makes it impossible to atomically perform operations across
    multiple processes.

    This patch replaces signal_struct->group_rwsem with a global
    percpu_rwsem cgroup_threadgroup_rwsem which is cheaper on the reader
    side and contained in cgroups proper. This patch converts one-to-one.

    This does make writer side heavier and lower the granularity; however,
    cgroup process migration is a fairly cold path, we do want to optimize
    thread operations over it and cgroup migration operations don't take
    enough time for the lower granularity to matter.

    Signed-off-by: Tejun Heo
    Cc: Ingo Molnar
    Cc: Peter Zijlstra

    Tejun Heo
     

20 May, 2015

1 commit

  • This adds an extra argument onto parse_params() to be used
    as a way to make the unused callback a bit more useful and
    generic by allowing the caller to pass on a data structure
    of its choice. An example use case is to allow us to easily
    make module parameters for every module which we will do
    next.

    @ parse @
    identifier name, args, params, num, level_min, level_max;
    identifier unknown, param, val, doing;
    type s16;
    @@
    extern char *parse_args(const char *name,
    char *args,
    const struct kernel_param *params,
    unsigned num,
    s16 level_min,
    s16 level_max,
    + void *arg,
    int (*unknown)(char *param, char *val,
    const char *doing
    + , void *arg
    ));

    @ parse_mod @
    identifier name, args, params, num, level_min, level_max;
    identifier unknown, param, val, doing;
    type s16;
    @@
    char *parse_args(const char *name,
    char *args,
    const struct kernel_param *params,
    unsigned num,
    s16 level_min,
    s16 level_max,
    + void *arg,
    int (*unknown)(char *param, char *val,
    const char *doing
    + , void *arg
    ))
    {
    ...
    }

    @ parse_args_found @
    expression R, E1, E2, E3, E4, E5, E6;
    identifier func;
    @@

    (
    R =
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    func);
    |
    R =
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    &func);
    |
    R =
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    NULL);
    |
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    func);
    |
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    &func);
    |
    parse_args(E1, E2, E3, E4, E5, E6,
    + NULL,
    NULL);
    )

    @ parse_args_unused depends on parse_args_found @
    identifier parse_args_found.func;
    @@

    int func(char *param, char *val, const char *unused
    + , void *arg
    )
    {
    ...
    }

    @ mod_unused depends on parse_args_found @
    identifier parse_args_found.func;
    expression A1, A2, A3;
    @@

    - func(A1, A2, A3);
    + func(A1, A2, A3, NULL);

    Generated-by: Coccinelle SmPL
    Cc: cocci@systeme.lip6.fr
    Cc: Tejun Heo
    Cc: Arjan van de Ven
    Cc: Greg Kroah-Hartman
    Cc: Rusty Russell
    Cc: Christoph Hellwig
    Cc: Felipe Contreras
    Cc: Ewan Milne
    Cc: Jean Delvare
    Cc: Hannes Reinecke
    Cc: Jani Nikula
    Cc: linux-kernel@vger.kernel.org
    Reviewed-by: Tejun Heo
    Acked-by: Rusty Russell
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: Greg Kroah-Hartman

    Luis R. Rodriguez
     

08 May, 2015

1 commit

  • On powerpc the perf event interrupt is not masked when interrupts are
    disabled, allowing it to function as an NMI.

    This causes problems if perf is using vmalloc. If we take a page fault
    on the vmalloc region the fault handler will fail the page fault because
    it detects we are coming in from an NMI (see do_hash_page()).

    We don't actually need or want vmalloc backed perf so just disable it on
    powerpc.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Andrew Morton
    Cc: Anton Blanchard
    Cc: Borislav Petkov
    Cc: H. Peter Anvin
    Cc: Paul Mackerras
    Cc: Thomas Gleixner
    Cc: acme@ghostprotocols.net
    Cc: sukadev@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/r/1430720799-18426-1-git-send-email-mpe@ellerman.id.au
    Signed-off-by: Ingo Molnar

    Michael Ellerman
     

06 May, 2015

1 commit

  • Commit 283e7ad02 ("init: stricter checking of major:minor root=
    values") was so strict that it exposed the fact that a previously
    unknown device format was being used.

    Distributions like Ubuntu uses klibc (rather than uswsusp) to resume
    system from hibernation. klibc expressed the swap partition/file in
    the form of major:minor:offset. For example, 8:3:0 represents a swap
    partition in klibc, and klibc's resume process in initrd will finally
    echo 8:3:0 to /sys/power/resume for manually resuming. However, due
    to commit 283e7ad02's stricter checking, 8:3:0 will be treated as an
    invalid device format, and manual resuming from hibernation will fail.

    Fix this by adding support for devices with major:minor:offset format
    when resuming from hibernation.

    Reported-by: Prigent, Christophe
    Signed-off-by: Chen Yu
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Mike Snitzer

    Chen Yu
     

18 Apr, 2015

2 commits

  • Pull documentation updates from Jonathan Corbet:
    "Numerous fixes, the overdue removal of the i2o docs, some new Chinese
    translations, and, hopefully, the README fix that will end the flow of
    identical patches to that file"

    * tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (34 commits)
    Documentation/memcg: update memcg/kmem status
    Documentation: blackfin: Makefile: Typo building issue
    Documentation/vm/pagemap.txt: correct location of page-types tool
    Documentation/memory-barriers.txt: typo fix
    doc: Add guest_nice column to example output of `cat /proc/stat'
    Documentation/kernel-parameters: Move "eagerfpu" to its right place
    Documentation: gpio: Update ACPI part of the document to mention _DSD
    docs/completion.txt: Various tweaks and corrections
    doc: completion: context, scope and language fixes
    Documentation:Update Documentation/zh_CN/arm64/memory.txt
    Documentation:Update Documentation/zh_CN/arm64/booting.txt
    Documentation: Chinese translation of arm64/legacy_instructions.txt
    DocBook media: fix broken EIA hyperlink
    Documentation: tweak the maintainers entry
    README: Change gzip/bzip2 to xz compression format
    README: Update version number reference
    doc:pci: Fix typo in Documentation/PCI
    Documentation: drm: Use '->' when describing access through pointers.
    Documentation: Remove mentioning of block barriers
    Documentation/email-clients.txt: Fix one grammar mistake, add extra info about TB
    ...

    Linus Torvalds
     
  • Pull device mapper updates from Mike Snitzer:

    - the most extensive changes this cycle are the DM core improvements to
    add full blk-mq support to request-based DM.

    - disabled by default but user can opt-in with CONFIG_DM_MQ_DEFAULT
    - depends on some blk-mq changes from Jens' for-4.1/core branch so
    that explains why this pull is built on linux-block.git

    - update DM to use name_to_dev_t() rather than open-coding a less
    capable device parser.

    - includes a couple small improvements to name_to_dev_t() that offer
    stricter constraints that DM's code provided.

    - improvements to the dm-cache "mq" cache replacement policy.

    - a DM crypt crypt_ctr() error path fix and an async crypto deadlock
    fix

    - a small efficiency improvement for DM crypt decryption by leveraging
    immutable biovecs

    - add error handling modes for corrupted blocks to DM verity

    - a new "log-writes" DM target from Josef Bacik that is meant for file
    system developers to test file system integrity at particular points
    in the life of a file system

    - a few DM log userspace cleanups and fixes

    - a few Documentation fixes (for thin, cache, crypt and switch)

    * tag 'dm-4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (34 commits)
    dm crypt: fix missing error code return from crypt_ctr error path
    dm crypt: fix deadlock when async crypto algorithm returns -EBUSY
    dm crypt: leverage immutable biovecs when decrypting on read
    dm crypt: update URLs to new cryptsetup project page
    dm: add log writes target
    dm table: use bool function return values of true/false not 1/0
    dm verity: add error handling modes for corrupted blocks
    dm thin: remove stale 'trim' message documentation
    dm delay: use msecs_to_jiffies for time conversion
    dm log userspace base: fix compile warning
    dm log userspace transfer: match wait_for_completion_timeout return type
    dm table: fall back to getting device using name_to_dev_t()
    init: stricter checking of major:minor root= values
    init: export name_to_dev_t and mark name argument as const
    dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr
    dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq
    dm: add full blk-mq support to request-based DM
    dm: impose configurable deadline for dm_request_fn's merge heuristic
    dm sysfs: introduce ability to add writable attributes
    dm: don't start current request if it would've merged with the previous
    ...

    Linus Torvalds
     

17 Apr, 2015

1 commit

  • PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
    THREAD_SIZE.

    E.g. architecture hexagon may have page size 1M and thread size 4096.
    This would lead to a division by zero in the calculation of max_threads.

    With this patch the buggy code is moved to a separate function
    set_max_threads. The error is not fixed.

    After fixing the problem in a separate patch the new function can be
    reused to adjust max_threads after adding or removing memory.

    Argument mempages of function fork_init() is removed as totalram_pages is
    an exported symbol.

    The creation of separate patches for refactoring to a new function and for
    fixing the logic was suggested by Ingo Molnar.

    Signed-off-by: Heinrich Schuchardt
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Guenter Roeck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinrich Schuchardt
     

16 Apr, 2015

3 commits

  • There are a lot of embedded systems that run most or all of their
    functionality in init, running as root:root. For these systems,
    supporting multiple users is not necessary.

    This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
    non-root users, non-root groups, and capabilities optional. It is enabled
    under CONFIG_EXPERT menu.

    When this symbol is not defined, UID and GID are zero in any possible case
    and processes always have all capabilities.

    The following syscalls are compiled out: setuid, setregid, setgid,
    setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
    getgroups, setfsuid, setfsgid, capget, capset.

    Also, groups.c is compiled out completely.

    In kernel/capability.c, capable function was moved in order to avoid
    adding two ifdef blocks.

    This change saves about 25 KB on a defconfig build. The most minimal
    kernels have total text sizes in the high hundreds of kB rather than
    low MB. (The 25k goes down a bit with allnoconfig, but not that much.

    The kernel was booted in Qemu. All the common functionalities work.
    Adding users/groups is not possible, failing with -ENOSYS.

    Bloat-o-meter output:
    add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Iulia Manda
    Reviewed-by: Josh Triplett
    Acked-by: Geert Uytterhoeven
    Tested-by: Paul E. McKenney
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Iulia Manda
     
  • In the kernel command-line, previously, root=1:2jakshflaksjdhfa would
    be accepted and interpreted just like root=1:2. This patch adds
    stricter checking so that additional characters after major:minor are
    rejected by root=.

    The goal of this change is to help in unifying DM's interpretation of
    its block device argument by using existing kernel code (name_to_dev_t).
    But DM rejects malformed major:minor pairs, it seems reasonable for
    root= to reject them as well.

    Signed-off-by: Dan Ehrenberg
    Signed-off-by: Mike Snitzer

    Dan Ehrenberg
     
  • DM will switch its device lookup code to using name_to_dev_t() so it
    must be exported. Also, the @name argument should be marked const.

    Signed-off-by: Dan Ehrenberg
    Signed-off-by: Mike Snitzer

    Dan Ehrenberg
     

15 Apr, 2015

1 commit

  • Merge first patchbomb from Andrew Morton:

    - arch/sh updates

    - ocfs2 updates

    - kernel/watchdog feature

    - about half of mm/

    * emailed patches from Andrew Morton : (122 commits)
    Documentation: update arch list in the 'memtest' entry
    Kconfig: memtest: update number of test patterns up to 17
    arm: add support for memtest
    arm64: add support for memtest
    memtest: use phys_addr_t for physical addresses
    mm: move memtest under mm
    mm, hugetlb: abort __get_user_pages if current has been oom killed
    mm, mempool: do not allow atomic resizing
    memcg: print cgroup information when system panics due to panic_on_oom
    mm: numa: remove migrate_ratelimited
    mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
    mm: split ET_DYN ASLR from mmap ASLR
    s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
    mm: expose arch_mmap_rnd when available
    s390: standardize mmap_rnd() usage
    powerpc: standardize mmap_rnd() usage
    mips: extract logic for mmap_rnd()
    arm64: standardize mmap_rnd() usage
    x86: standardize mmap_rnd() usage
    arm: factor out mmap ASLR into mmap_rnd
    ...

    Linus Torvalds