17 Dec, 2014

2 commits

  • Pull vfs pile #2 from Al Viro:
    "Next pile (and there'll be one or two more).

    The large piece in this one is getting rid of /proc/*/ns/* weirdness;
    among other things, it allows to (finally) make nameidata completely
    opaque outside of fs/namei.c, making for easier further cleanups in
    there"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    coda_venus_readdir(): use file_inode()
    fs/namei.c: fold link_path_walk() call into path_init()
    path_init(): don't bother with LOOKUP_PARENT in argument
    fs/namei.c: new helper (path_cleanup())
    path_init(): store the "base" pointer to file in nameidata itself
    make default ->i_fop have ->open() fail with ENXIO
    make nameidata completely opaque outside of fs/namei.c
    kill proc_ns completely
    take the targets of /proc/*/ns/* symlinks to separate fs
    bury struct proc_ns in fs/proc
    copy address of proc_ns_ops into ns_common
    new helpers: ns_alloc_inum/ns_free_inum
    make proc_ns_operations work with struct ns_common * instead of void *
    switch the rest of proc_ns_operations to working with &...->ns
    netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
    make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
    common object embedded into various struct ....ns

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "As the merge window is still open, and this code was not as complex as
    I thought it might be, I'm pushing this in now.

    This will allow Thomas to debug his irq work for 3.20.

    This adds two new features:

    1) Allow tracepoints to be enabled right after mm_init().

    By passing in the trace_event= kernel command line parameter,
    tracepoints can be enabled at boot up. For debugging things like
    the initialization of interrupts, tracepoints need to be enabled
    very early. People have asked about this before and this
    has been on my todo list. As it can be helpful for Thomas to debug
    his upcoming 3.20 IRQ work, I'm pushing this now. This way he can
    add tracepoints into the IRQ set up and have users enable them when
    things go wrong.

    2) Have the tracepoints printed via printk() (the console) when they
    are triggered.

    If the irq code locks up or reboots the box, having the tracepoint
    output go into the kernel ring buffer is useless for debugging.
    But being able to add the tp_printk kernel command line option
    along with the trace_event= option will have these tracepoints
    printed as they occur, and that can be really useful for debugging
    early lock up or reboot problems.

    This code is not that intrusive and it passed all my tests. Thomas
    tried them out too and it works for his needs.

    Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org"
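    As an illustration of how the two options combine, a boot line might carry trace_event=irq:irq_handler_entry tp_printk (the event name is only an example). The user-space C sketch below is not the kernel's parser; the struct and function names are hypothetical, and it only models the assumed syntax: trace_event= takes a comma-separated list of events, and tp_printk is a bare flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_EVENTS 16
#define MAX_NAME 64

/* Hypothetical result of scanning a boot command line. */
struct boot_trace_opts {
	char events[MAX_EVENTS][MAX_NAME]; /* entries of trace_event=, e.g. "irq:irq_handler_entry" */
	int nr_events;
	bool tp_printk;                    /* true if the bare word tp_printk was present */
};

/* Split a space-separated command line; trace_event= takes a comma-separated list.
 * Command lines longer than the local buffer are truncated. */
static void parse_boot_trace_opts(const char *cmdline, struct boot_trace_opts *opts)
{
	char buf[512];
	char *save_word, *word;

	memset(opts, 0, sizeof(*opts));
	snprintf(buf, sizeof(buf), "%s", cmdline);

	for (word = strtok_r(buf, " ", &save_word); word;
	     word = strtok_r(NULL, " ", &save_word)) {
		if (strcmp(word, "tp_printk") == 0) {
			opts->tp_printk = true;
		} else if (strncmp(word, "trace_event=", 12) == 0) {
			char *save_ev, *ev;

			/* Walk the comma-separated event list after "trace_event=". */
			for (ev = strtok_r(word + 12, ",", &save_ev); ev;
			     ev = strtok_r(NULL, ",", &save_ev)) {
				if (opts->nr_events == MAX_EVENTS)
					break;
				snprintf(opts->events[opts->nr_events++], MAX_NAME, "%s", ev);
			}
		}
	}
}
```

With both options present, the listed events are enabled at boot and their output goes to the console as well as to the ring buffer, which is what makes early lockups debuggable.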

    * tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Add tp_printk cmdline to have tracepoints go to printk()
    tracing: Move enabling tracepoints to just after rcu_init()

    Linus Torvalds
     

15 Dec, 2014

2 commits

  • Enabling tracepoints at boot up can be very useful. The tracepoint
    can be initialized right after RCU has been. There's no need to
    wait for the early_initcall() to be called. That's too late for some
    things that can use tracepoints for debugging. Move the logic to
    enable tracepoints out of the initcalls and into init/main.c to
    right after rcu_init().

    This also allows trace_printk() to be used early.
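    Why the placement matters can be modeled with a toy boot sequence. This is a hypothetical C sketch, not init/main.c: every function name is invented, and it assumes only what the text says, namely that tracepoint enabling used to wait for an early_initcall() and now happens right after rcu_init(). An event fired by setup code that runs between those two points is recorded only with the new placement.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of early boot; none of these functions exist in the kernel. */
static bool tracepoints_enabled;
static int recorded_events;

/* A tracepoint records an event only once tracing has been switched on. */
static void fire_tracepoint(void)
{
	if (tracepoints_enabled)
		recorded_events++;
}

static void enable_boot_tracepoints(void)
{
	tracepoints_enabled = true;
}

/* Returns how many early events were captured for the given placement. */
static int simulate_boot(bool enable_right_after_rcu)
{
	tracepoints_enabled = false;
	recorded_events = 0;

	/* rcu_init() would run here; per the text, tracepoints can be set up once RCU is. */
	if (enable_right_after_rcu)
		enable_boot_tracepoints();      /* new placement */

	fire_tracepoint();                  /* e.g. interrupt setup, still before initcalls */

	if (!enable_right_after_rcu)
		enable_boot_tracepoints();      /* old placement: from an early_initcall() */

	return recorded_events;
}
```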

    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
    Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org

    Reviewed-by: Paul E. McKenney
    Suggested-by: Thomas Gleixner
    Tested-by: Thomas Gleixner
    Acked-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Pull security layer updates from James Morris:
    "In terms of changes, there's general maintenance to the Smack,
    SELinux, and integrity code.

    The IMA code adds a new kconfig option, IMA_APPRAISE_SIGNED_INIT,
    which allows IMA appraisal to require signatures. Support for reading
    keys from rootfs before init is called is also added"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
    selinux: Remove security_ops extern
    security: smack: fix out-of-bounds access in smk_parse_smack()
    VFS: refactor vfs_read()
    ima: require signature based appraisal
    integrity: provide a hook to load keys when rootfs is ready
    ima: load x509 certificate from the kernel
    integrity: provide a function to load x509 certificate from the kernel
    integrity: define a new function integrity_read_file()
    Security: smack: replace kzalloc with kmem_cache for inode_smack
    Smack: Lock mode for the floor and hat labels
    ima: added support for new kernel cmdline parameter ima_template_fmt
    ima: allocate field pointers array on demand in template_desc_init_fields()
    ima: don't allocate a copy of template_fmt in template_desc_init_fields()
    ima: display template format in meas. list if template name length is zero
    ima: added error messages to template-related functions
    ima: use atomic bit operations to protect policy update interface
    ima: ignore empty and with whitespaces policy lines
    ima: no need to allocate entry for comment
    ima: report policy load status
    ima: use path names cache
    ...

    Linus Torvalds
     

14 Dec, 2014

1 commit

  • When we debug something, we'd like to insert some information into every
    page. For this purpose, we sometimes modify struct page itself. But
    this has drawbacks. First, it requires a re-compile. This makes us
    hesitate to use the powerful debug feature, so the development process is
    slowed down. Second, sometimes it is impossible to rebuild the kernel
    due to third-party module dependencies. Third, system behaviour would be
    largely different after a re-compile, because it greatly changes the size
    of struct page, and this structure is accessed by every part of the
    kernel. Keeping struct page as it is makes it easier to reproduce an
    erroneous situation.

    This feature is intended to overcome the above-mentioned problems. It
    allocates memory for extended per-page data in a separate place rather
    than in struct page itself. This memory can be accessed by the accessor
    functions provided by this code. During the boot process, it checks
    whether allocation of a huge chunk of memory is needed or not. If not,
    it avoids allocating memory at all. With this advantage, we can include
    this feature in the kernel by default, avoid rebuilds, and solve the
    related problems.

    Until now, memcg used this technique. But memcg has now decided to embed
    its variable in struct page itself, and its code to extend struct page
    has been removed. I'd like to use this code to develop a debug feature, so
    this patch resurrects it.

    To help these things work well, this patch introduces two callbacks for
    clients. One is the need callback, which is mandatory if the user wants to
    avoid useless memory allocation at boot time. The other is the optional
    init callback, which is used to do proper initialization after memory is
    allocated. A detailed explanation of the purpose of these functions is in
    the code comments; please refer to them.

    Everything else is the same as the previous extension code in memcg.
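    The need/init protocol can be sketched in user space as follows. This is a simplified model, not the kernel implementation: page_ext_ops, lookup_ext(), the debug client and the flat array indexed by page number are all hypothetical stand-ins for wherever the real code keeps its data. The point it illustrates is that nothing is allocated unless at least one client's need callback asks for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical client interface modeled on the need/init callbacks. */
struct page_ext_ops {
	bool (*need)(void);   /* does this client need the extra per-page memory? */
	void (*init)(void);   /* optional: runs after the memory has been allocated */
};

struct page_ext {
	unsigned long flags;  /* example per-page debug payload */
};

static struct page_ext *ext_array;  /* kept outside "struct page", one entry per page */

/* Called once at "boot": allocate only if some client says it needs the data. */
static bool page_ext_boot_init(const struct page_ext_ops **clients, size_t nr_clients,
			       size_t nr_pages)
{
	bool needed = false;
	size_t i;

	for (i = 0; i < nr_clients; i++)
		if (clients[i]->need && clients[i]->need())
			needed = true;
	if (!needed)
		return false;           /* nothing allocated, so no memory overhead */

	ext_array = calloc(nr_pages, sizeof(*ext_array));
	if (!ext_array)
		return false;
	for (i = 0; i < nr_clients; i++)
		if (clients[i]->init)
			clients[i]->init();
	return true;
}

/* Accessor: map a page index to its extension entry (NULL if not allocated). */
static struct page_ext *lookup_ext(size_t pfn)
{
	return ext_array ? &ext_array[pfn] : NULL;
}

/* Example client: a debug feature switched on by a (hypothetical) boot option. */
static bool debug_enabled;
static int debug_init_calls;
static bool debug_need(void) { return debug_enabled; }
static void debug_init(void) { debug_init_calls++; }
static const struct page_ext_ops debug_ops = { .need = debug_need, .init = debug_init };
```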

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

11 Dec, 2014

9 commits

  • Al Viro
     
  • New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
    It's not mountable (not even registered, so it's not in /proc/filesystems,
    etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

    This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
    get_proc_ns() is a macro now (it's simply returning ->i_private; would
    have been an inline, if not for header ordering headache).
    proc_ns_inode() is an ex-parrot. The interface used in procfs is
    ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

    Dentries and inodes are never hashed; a non-counting reference to dentry
    is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
    if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
    of that mechanism.

    As a result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
    it does nd_jump_link() on a consistent pair it gets from ns_get_path().
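    From user space, the visible result is that each /proc/<pid>/ns/* link resolves to an nsfs inode. The minimal C sketch below assumes the usual "type:[inode]" link text (for example net:[4026531992], where the number is only illustrative), and also assumes that stat() on the link reports that same inode number on kernels with nsfs. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Split link text such as "net:[4026531992]" into its type and inode number.
 * Returns 0 on success, -1 if the text does not have the expected shape. */
int parse_ns_link(const char *target, char *type, size_t type_len,
		  unsigned long long *inum)
{
	const char *colon = strchr(target, ':');
	size_t n;
	int end = 0;

	if (!colon || colon[1] != '[')
		return -1;
	n = (size_t)(colon - target);
	if (n == 0 || n >= type_len)
		return -1;
	memcpy(type, target, n);
	type[n] = '\0';
	/* %n is only stored if the closing ']' was matched. */
	if (sscanf(colon + 2, "%llu]%n", inum, &end) != 1 || end == 0 ||
	    colon[2 + end] != '\0')
		return -1;
	return 0;
}

/* Returns 1 if stat() on the link reports the inode number named in its text,
 * 0 if not, and -1 on error (e.g. /proc not mounted). */
int ns_link_matches_inode(const char *path)
{
	char buf[128], type[32];
	unsigned long long inum;
	struct stat st;
	ssize_t len = readlink(path, buf, sizeof(buf) - 1);

	if (len < 0)
		return -1;
	buf[len] = '\0';
	if (parse_ns_link(buf, type, sizeof(type), &inum) != 0 || stat(path, &st) != 0)
		return -1;
	return (unsigned long long)st.st_ino == inum;
}
```

On a Linux system with nsfs, ns_link_matches_inode("/proc/self/ns/net") would be expected to return 1.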

    Signed-off-by: Al Viro

    Al Viro
     
  • If a user puts init=/whatever on the command line and /whatever can't be
    run, then the kernel will try a few default options before giving up. If
    init=/whatever came from a bootloader prompt, then this is unexpected but
    probably harmless. On the other hand, if it comes from a script (e.g. a
    tool like virtme or perhaps a future kselftest script), then the fallbacks
    are likely to exist, but they'll do the wrong thing. For example, they
    might unexpectedly invoke systemd.

    This adds a config option CONFIG_INIT_FALLBACK. If unset, then a failure
    to run the specified init= process will be fatal.

    The tentative plan is to remove CONFIG_INIT_FALLBACK for 3.20.
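    The decision described above can be sketched as follows. This is a hypothetical user-space model, not kernel_init(): "running" a program is a callback, INIT_FATAL marks the point where the kernel would panic, and the default list (/sbin/init, /etc/init, /bin/init, /bin/sh) is assumed from init/main.c of that era.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for exec'ing a program: returns 0 on success, nonzero on failure. */
typedef int (*run_fn)(const char *path);

enum init_result { INIT_STARTED, INIT_FATAL };

/* Try the requested init= first; fall back to defaults only if init_fallback is set.
 * On success, *chosen names the program that was started. */
static enum init_result start_init(const char *requested, bool init_fallback,
				   run_fn run, const char **chosen)
{
	static const char *const defaults[] = {
		"/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
	};
	size_t i;

	*chosen = NULL;
	if (requested) {
		if (run(requested) == 0) {
			*chosen = requested;
			return INIT_STARTED;
		}
		if (!init_fallback)
			return INIT_FATAL;   /* CONFIG_INIT_FALLBACK unset: a failed init= is fatal */
	}
	for (i = 0; i < sizeof(defaults) / sizeof(defaults[0]); i++) {
		if (run(defaults[i]) == 0) {
			*chosen = defaults[i];
			return INIT_STARTED;
		}
	}
	return INIT_FATAL;               /* no working init found */
}

/* Example runner: only /sbin/init "exists". */
static int only_sbin_init(const char *path)
{
	return strcmp(path, "/sbin/init") == 0 ? 0 : -1;
}
```

With the fallback enabled, a bad init=/whatever silently lands in /sbin/init (which might be systemd); with it disabled, the same boot stops at once, which is what a test script wants.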

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Andy Lutomirski
    Cc: Rob Landley
    Cc: Chuck Ebbert
    Cc: Randy Dunlap
    Cc: Shuah Khan
    Cc: Frank Rowand
    Cc: Josh Triplett
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • Now that the external page_cgroup data structure and its lookup is
    gone, let the generic bad_page() check for page->mem_cgroup sanity.

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: Vladimir Davydov
    Acked-by: David S. Miller
    Cc: KAMEZAWA Hiroyuki
    Cc: "Kirill A. Shutemov"
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Memory cgroups used to have 5 per-page pointers. To allow users to
    disable that amount of overhead during runtime, those pointers were
    allocated in a separate array, with a translation layer between them and
    struct page.

    There is now only one page pointer remaining: the memcg pointer, that
    indicates which cgroup the page is associated with when charged. The
    complexity of runtime allocation and the runtime translation overhead is
    no longer justified to save that *potential* 0.19% of memory. With
    CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
    after the page->private member and doesn't even increase struct page,
    and then this patch actually saves space. Remaining users that care can
    still compile their kernels without CONFIG_MEMCG.

    text data bss dec hex filename
    8828345 1725264 983040 11536649 b00909 vmlinux.old
    8827425 1725264 966656 11519345 afc571 vmlinux.new
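    The figures above can be checked with a little arithmetic: size(1) reports dec = text + data + bss, and the "0.19%" figure matches one 8-byte pointer per 4 KiB page. Both the pointer size and the page size are assumptions (64-bit, 4 KiB pages); the commit does not state them. A small C sketch:

```c
#include <assert.h>

/* Columns of size(1) output; dec is derived as text + data + bss. */
struct size_row {
	unsigned long text, data, bss;
};

static unsigned long size_dec(struct size_row r)
{
	return r.text + r.data + r.bss;
}

/* Overhead of one extra pointer per page, in hundredths of a percent (rounded down). */
static unsigned long ptr_overhead_bp(unsigned long ptr_bytes, unsigned long page_bytes)
{
	return ptr_bytes * 10000 / page_bytes;
}
```

For the two kernels above this gives a total shrink of 17304 bytes, 16384 of them in bss, and ptr_overhead_bp(8, 4096) evaluates to 19, i.e. 0.19%.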

    [mhocko@suse.cz: update Documentation/cgroups/memory.txt]
    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: Vladimir Davydov
    Acked-by: David S. Miller
    Acked-by: KAMEZAWA Hiroyuki
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Acked-by: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Add the default enable config option after the NUMA_BALANCING option so
    that it appears related in the nconfig interface.

    Signed-off-by: Aneesh Kumar K.V
    Acked-by: David Rientjes
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • All memory accounting and limiting has been switched over to the
    lockless page counters. Bye, res_counter!

    [akpm@linux-foundation.org: update Documentation/cgroups/memory.txt]
    [mhocko@suse.cz: ditch the last remnants of res_counter]
    Signed-off-by: Johannes Weiner
    Acked-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: David Rientjes
    Cc: Paul Bolle
    Signed-off-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Abandon the spinlock-protected byte counters in favor of the unlocked
    page counters in the hugetlb controller as well.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Memory is internally accounted in bytes, using spinlock-protected 64-bit
    counters, even though the smallest accounting delta is a page. The
    counter interface is also convoluted and does too many things.

    Introduce a new lockless word-sized page counter API, then change all
    memory accounting over to it. The translation from and to bytes then only
    happens when interfacing with userspace.

    The removed locking overhead is noticeable when scaling beyond the per-cpu
    charge caches - on a 4-socket machine with 144 threads, the following test
    shows the performance differences of 288 memcgs concurrently running a
    page fault benchmark:

    vanilla:

    18631648.500498 task-clock (msec) # 140.643 CPUs utilized ( +- 0.33% )
    1,380,638 context-switches # 0.074 K/sec ( +- 0.75% )
    24,390 cpu-migrations # 0.001 K/sec ( +- 8.44% )
    1,843,305,768 page-faults # 0.099 M/sec ( +- 0.00% )
    50,134,994,088,218 cycles # 2.691 GHz ( +- 0.33% )
    <not supported> stalled-cycles-frontend
    <not supported> stalled-cycles-backend
    8,049,712,224,651 instructions # 0.16 insns per cycle ( +- 0.04% )
    1,586,970,584,979 branches # 85.176 M/sec ( +- 0.05% )
    1,724,989,949 branch-misses # 0.11% of all branches ( +- 0.48% )

    132.474343877 seconds time elapsed ( +- 0.21% )

    lockless:

    12195979.037525 task-clock (msec) # 133.480 CPUs utilized ( +- 0.18% )
    832,850 context-switches # 0.068 K/sec ( +- 0.54% )
    15,624 cpu-migrations # 0.001 K/sec ( +- 10.17% )
    1,843,304,774 page-faults # 0.151 M/sec ( +- 0.00% )
    32,811,216,801,141 cycles # 2.690 GHz ( +- 0.18% )
    <not supported> stalled-cycles-frontend
    <not supported> stalled-cycles-backend
    9,999,265,091,727 instructions # 0.30 insns per cycle ( +- 0.10% )
    2,076,759,325,203 branches # 170.282 M/sec ( +- 0.12% )
    1,656,917,214 branch-misses # 0.08% of all branches ( +- 0.55% )

    91.369330729 seconds time elapsed ( +- 0.45% )

    On top of improved scalability, this also gets rid of the icky long long
    types in the very heart of memcg, which is great for 32 bit and also makes
    the code a lot more readable.

    Notable differences between the old and new API:

    - res_counter_charge() and res_counter_charge_nofail() become
    page_counter_try_charge() and page_counter_charge() resp. to match
    the more common kernel naming scheme of try_do()/do()

    - res_counter_uncharge_until() is only ever used to cancel a local
    counter and never to uncharge bigger segments of a hierarchy, so
    it's replaced by the simpler page_counter_cancel()

    - res_counter_set_limit() is replaced by page_counter_limit(), which
    expects its callers to serialize against themselves

    - res_counter_memparse_write_strategy() is replaced by
    page_counter_memparse(), which rounds down to the nearest page size -
    rather than up. This is more reasonable for explicitly requested
    hard upper limits.

    - to keep charging light-weight, page_counter_try_charge() charges
    speculatively, only to roll back if the result exceeds the limit.
    Because of this, a failing bigger charge can temporarily lock out
    smaller charges that would otherwise succeed. The error is bounded
    to the difference between the smallest and the biggest possible
    charge size, so for memcg, this means that a failing THP charge can
    send base page charges into reclaim up to 2MB (4MB) before the limit
    would have been reached. This should be acceptable.
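    The speculative charge-and-roll-back scheme above can be modeled with C11 atomics. This is a simplified sketch, not mm/page_counter.c: the names are hypothetical, limits are assumed not to change while charging (the text notes that callers of page_counter_limit() must serialize), and there is no reclaim. Each level of the hierarchy is charged first and checked afterwards; on failure, that level and every level below it are uncharged again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of a hierarchical, lockless, word-sized page counter. */
struct page_counter {
	atomic_long count;            /* pages currently charged */
	long limit;                   /* maximum pages; assumed fixed while charging */
	struct page_counter *parent;  /* NULL at the root of the hierarchy */
};

static void counter_init(struct page_counter *c, long limit, struct page_counter *parent)
{
	atomic_init(&c->count, 0);
	c->limit = limit;
	c->parent = parent;
}

/* Remove a charge from this counter and all of its ancestors. */
static void counter_uncharge(struct page_counter *c, long nr_pages)
{
	for (; c; c = c->parent)
		atomic_fetch_sub(&c->count, nr_pages);
}

/* Charge speculatively: add first, then roll back if a level went over its limit.
 * On failure, *fail points at the counter whose limit was hit. */
static bool counter_try_charge(struct page_counter *counter, long nr_pages,
			       struct page_counter **fail)
{
	struct page_counter *c;

	for (c = counter; c; c = c->parent) {
		long new = atomic_fetch_add(&c->count, nr_pages) + nr_pages;

		if (new > c->limit) {
			struct page_counter *undo;

			atomic_fetch_sub(&c->count, nr_pages);   /* undo the failing level */
			*fail = c;
			for (undo = counter; undo != c; undo = undo->parent)
				atomic_fetch_sub(&undo->count, nr_pages);
			return false;
		}
	}
	return true;
}
```

Between the add and the subtract, a concurrent smaller charge can observe the inflated count and fail as well; that window is the bounded lockout the last point above describes.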

    [akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
    [akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: Vladimir Davydov
    Cc: Tejun Heo
    Cc: David Rientjes
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

05 Dec, 2014

2 commits


20 Nov, 2014

1 commit


19 Nov, 2014

1 commit


18 Nov, 2014

1 commit

  • Keys can only be loaded once the rootfs is mounted. Initcalls
    are not suitable for that. This patch defines a special hook
    to load the x509 public keys onto the IMA keyring, before
    attempting to access any file. The keys are required for
    verifying the file's signature. The hook is called after the
    root filesystem is mounted and before the kernel calls 'init'.

    Changes in v3:
    * added more explanation to the patch description (Mimi)

    Changes in v2:
    * Hook renamed as 'integrity_load_keys()' to handle both IMA and EVM
    keys by integrity subsystem.
    * Hook patch moved after defining loading functions

    Signed-off-by: Dmitry Kasatkin
    Signed-off-by: Mimi Zohar

    Dmitry Kasatkin
     

11 Nov, 2014

1 commit

  • Currently if the user passes an invalid value on the kernel command line
    then the kernel will crash during argument parsing. On most systems this
    is very hard to debug because the console hasn't been initialized yet.

    This is a regression due to commit 51e158c12aca ("param: hand arguments
    after -- straight to init") which, in response to the systemd debug
    controversy, made it possible to explicitly pass arguments to init. To
    achieve this parse_args() was extended from simply returning an error
    code to returning a pointer. Regrettably the new init args logic does not
    perform a proper validity check on the pointer resulting in a crash.

    This patch fixes the validity check. Should the check fail then no arguments
    will be passed to init. This is reasonable and matches how the kernel treats
    its own arguments (i.e. no error recovery).
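    The pointer-validity problem can be illustrated with user-space copies of the kernel's error-pointer helpers. ERR_PTR() and IS_ERR_OR_NULL() are real kernel names, but they are re-implemented here (assuming MAX_ERRNO is 4095), and fake_parse_args() is a hypothetical stand-in for parse_args() that returns NULL, a pointer to the text after "--", or an encoded error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* User-space copies of the kernel's error-pointer convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical parser: NULL if there is no "--", the text after "--" otherwise,
 * or ERR_PTR(-EINVAL) if it met an invalid value. */
static char *fake_parse_args(char *cmdline, bool invalid_value)
{
	char *dashes;

	if (invalid_value)
		return ERR_PTR(-EINVAL);
	dashes = strstr(cmdline, " -- ");
	return dashes ? dashes + 4 : NULL;
}

/* A check that only ruled out NULL would let an error pointer through and
 * dereference it; checking for both means init simply gets no arguments. */
static const char *args_for_init(char *cmdline, bool invalid_value)
{
	char *after_dashes = fake_parse_args(cmdline, invalid_value);

	return IS_ERR_OR_NULL(after_dashes) ? "" : after_dashes;
}
```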

    Signed-off-by: Daniel Thompson
    Cc: stable@vger.kernel.org
    Signed-off-by: Rusty Russell

    Daniel Thompson
     

30 Oct, 2014

2 commits

  • PREEMPT_RCU and TREE_PREEMPT_RCU serve the same function after
    TINY_PREEMPT_RCU has been removed. This patch removes TREE_PREEMPT_RCU
    and uses PREEMPT_RCU config option in its place.

    Signed-off-by: Pranith Kumar
    Signed-off-by: Paul E. McKenney

    Pranith Kumar
     
  • Rename CONFIG_RCU_BOOST_PRIO to CONFIG_RCU_KTHREAD_PRIO and use this
    value for both the per-CPU kthreads (rcuc/N) and the rcu boosting
    threads (rcub/n).

    Also, create the module_parameter rcutree.kthread_prio to be used on
    the kernel command line at boot to set a new value (rcutree.kthread_prio=N).

    Signed-off-by: Clark Williams
    [ paulmck: Ported to rcu/dev, applied Paul Bolle and Peter Zijlstra feedback. ]
    Signed-off-by: Paul E. McKenney

    Clark Williams
     

29 Oct, 2014

1 commit

  • Every choice item of the "Build-forced no-CBs CPUs" choice had a
    dependency on RCU_NOCB_CPU. It's more comprehensible if the choice
    itself has the dependency instead of every choice item. The choice
    itself doesn't need to be visible if there are no items selectable
    (i.e. on arch/frv) or RCU_NOCB_CPU is not defined.

    Signed-off-by: Stefan Hengelein
    Signed-off-by: Andreas Ruprecht
    Reviewed-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Paul E. McKenney

    Stefan Hengelein
     

28 Oct, 2014

1 commit

  • introduce two configs:
    - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters
    depend on
    - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use

    that solves several problems:
    - tracing and others that wish to use eBPF don't need to depend on NET.
    They can use BPF_SYSCALL to allow loading from userspace or select BPF
    to use it directly from kernel in NET-less configs.
    - in 3.18 programs cannot be attached to events yet, so don't force it on
    - when the rest of eBPF infra is there in 3.19+, it's still useful to
    switch it off to minimize kernel size

    bloat-o-meter on x64 shows:
    add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)

    tested with many different config combinations. Hopefully didn't miss anything.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

14 Oct, 2014

3 commits


13 Oct, 2014

2 commits

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
    Hansen)

    - Various sched/idle refinements for better idle handling (Nicolas
    Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

    - sched/numa updates and optimizations (Rik van Riel)

    - sysbench speedup (Vincent Guittot)

    - capacity calculation cleanups/refactoring (Vincent Guittot)

    - Various cleanups to thread group iteration (Oleg Nesterov)

    - Double-rq-lock removal optimization and various refactorings
    (Kirill Tkhai)

    - various sched/deadline fixes

    ... and lots of other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
    sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
    sched/fair: Delete resched_cpu() from idle_balance()
    sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
    sched: Improve sysbench performance by fixing spurious active migration
    sched/x86: Fix up typo in topology detection
    x86, sched: Add new topology for multi-NUMA-node CPUs
    sched/rt: Use resched_curr() in task_tick_rt()
    sched: Use rq->rd in sched_setaffinity() under RCU read lock
    sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
    sched: Use dl_bw_of() under RCU read lock
    sched/fair: Remove duplicate code from can_migrate_task()
    sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
    sched: print_rq(): Don't use tasklist_lock
    sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
    sched: Fix the task-group check in tg_has_rt_tasks()
    sched/fair: Leverage the idle state info when choosing the "idlest" cpu
    sched: Let the scheduler see CPU idle states
    sched/deadline: Fix inter- exclusive cpusets migrations
    sched/deadline: Clear dl_entity params when setscheduling to different class
    sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - changes related to No-CBs CPUs and NO_HZ_FULL

    - RCU-tasks implementation

    - torture-test updates

    - miscellaneous fixes

    - locktorture updates

    - RCU documentation updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
    workqueue: Use cond_resched_rcu_qs macro
    workqueue: Add quiescent state between work items
    locktorture: Cleanup header usage
    locktorture: Cannot hold read and write lock
    locktorture: Fix __acquire annotation for spinlock irq
    locktorture: Support rwlocks
    rcu: Eliminate deadlock between CPU hotplug and expedited grace periods
    locktorture: Document boot/module parameters
    rcutorture: Rename rcutorture_runnable parameter
    locktorture: Add test scenario for rwsem_lock
    locktorture: Add test scenario for mutex_lock
    locktorture: Make torture scripting account for new _runnable name
    locktorture: Introduce torture context
    locktorture: Support rwsems
    locktorture: Add infrastructure for torturing read locks
    torture: Address race in module cleanup
    locktorture: Make statistics generic
    locktorture: Teach about lock debugging
    locktorture: Support mutexes
    locktorture: Add documentation
    ...

    Linus Torvalds
     

10 Oct, 2014

1 commit

  • ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
    _PAGE_NUMA using _PROT_NONE. This saved using an additional PTE bit and
    relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
    fault scanner. This was found to be conceptually confusing with a lot of
    implicit assumptions and it was asked that an alternative be found.

    Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
    PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
    bits and shrunk the maximum possible swap size but it did not go far
    enough. There are no architectures that reuse _PROT_NONE as _PROT_NUMA
    but the relics still exist.

    This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
    duplication in powerpc vs the generic implementation by defining the types
    the core NUMA helpers expected to exist from x86 with their ppc64
    equivalent. This necessitated that a PTE bit mask be created that
    identified the bits that distinguish present from NUMA pte entries but it
    is expected this will only differ between arches based on _PAGE_PROTNONE.
    The naming for the generic helpers was taken from x86 originally but ppc64
    has types that are equivalent for the purposes of the helper so they are
    mapped instead of duplicating code.
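    The mask-based test can be sketched in C as follows. The bit positions are hypothetical and chosen only for illustration (each architecture defines its own). The sketch assumes, as the text describes, a mask covering the bits that distinguish present from NUMA entries, so that a NUMA hinting entry is one with the NUMA bit set and the present bit clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for illustration only. */
#define PTE_PRESENT   (1u << 0)
#define PTE_NUMA      (1u << 9)   /* a software bit used as the NUMA-hinting marker */
#define PTE_NUMA_MASK (PTE_NUMA | PTE_PRESENT)

/* A NUMA hinting entry has the NUMA bit set and the present bit clear. */
static bool pte_is_numa(uint64_t pte)
{
	return (pte & PTE_NUMA_MASK) == PTE_NUMA;
}

/* Make a hinting entry: clear present so the next access faults, set the NUMA bit. */
static uint64_t pte_mknuma(uint64_t pte)
{
	return (pte & ~(uint64_t)PTE_PRESENT) | PTE_NUMA;
}

/* Undo it after the hinting fault has been handled. */
static uint64_t pte_mknonnuma(uint64_t pte)
{
	return (pte & ~(uint64_t)PTE_NUMA) | PTE_PRESENT;
}
```

Because only PTE_NUMA_MASK differs per architecture, the helpers themselves can stay generic, which is the duplication the patch removes.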

    Signed-off-by: Mel Gorman
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Cyrill Gorcunov
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

09 Oct, 2014

1 commit

  • Pull timer fixes from Ingo Molnar:
    "Main changes:

    - Fix the deadlock reported by Dave Jones et al
    - Clean up and fix nohz_full interaction with arch abilities
    - nohz init code consolidation/cleanup"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    nohz: nohz full depends on irq work self IPI support
    nohz: Consolidate nohz full init code
    arm64: Tell irq work about self IPI support
    arm: Tell irq work about self IPI support
    x86: Tell irq work about self IPI support
    irq_work: Force raised irq work to run on irq work interrupt
    irq_work: Introduce arch_irq_work_has_interrupt()
    nohz: Move nohz full init call to tick init

    Linus Torvalds
     

08 Oct, 2014

2 commits

  • Pull "trivial tree" updates from Jiri Kosina:
    "Usual pile from trivial tree everyone is so eagerly waiting for"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Remove MN10300_PROC_MN2WS0038
    mei: fix comments
    treewide: Fix typos in Kconfig
    kprobes: update jprobe_example.c for do_fork() change
    Documentation: change "&" to "and" in Documentation/applying-patches.txt
    Documentation: remove obsolete pcmcia-cs from Changes
    Documentation: update links in Changes
    Documentation: Docbook: Fix generated DocBook/kernel-api.xml
    score: Remove GENERIC_HAS_IOMAP
    gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
    tty: doc: Fix grammar in serial/tty
    dma-debug: modify check_for_stack output
    treewide: fix errors in printk
    genirq: fix reference in devm_request_threaded_irq comment
    treewide: fix synchronize_rcu() in comments
    checkstack.pl: port to AArch64
    doc: queue-sysfs: minor fixes
    init/do_mounts: better syntax description
    MIPS: fix comment spelling
    powerpc/simpleboot: fix comment
    ...

    Linus Torvalds
     
  • Pull module update from Rusty Russell:
    "Nothing major: support for compressing modules, and auto-tainting
    params.

    PS. My virtio-next tree is empty: DaveM took the patches I had. There
    might be a virtio-rng starvation fix, but so far it's a bit voodoo
    so I will get to that in the next two days or it will wait"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    moduleparam: Resolve missing-field-initializer warning
    kbuild: handle module compression while running 'make modules_install'.
    modinst: wrap long lines in order to enhance cmd_modules_install
    modsign: lookup lines ending in .ko in .mod files
    modpost: simplify file name generation of *.mod.c files
    modpost: reduce visibility of symbols and constify r/o arrays
    param: check for tainting before calling set op.
    drm/i915: taint the kernel if unsafe module parameters are set
    module: add module_param_unsafe and module_param_named_unsafe
    module: make it possible to have unsafe, tainting module params
    module: rename KERNEL_PARAM_FL_NOARG to avoid confusion

    Linus Torvalds
     

07 Oct, 2014

1 commit

  • Pull "tinification" patches from Josh Triplett.

    Work on making smaller kernels.

    * tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux:
    bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_
    mm: Support compiling out madvise and fadvise
    x86: Support compiling out human-friendly processor feature names
    x86: Drop support for /proc files when !CONFIG_PROC_FS
    x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK
    x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE
    x86, boot: Use the usual -y -n mechanism for objects in vmlinux
    x86: Add "make tinyconfig" to configure the tiniest possible kernel
    x86, platform, kconfig: move kvmconfig functionality to a helper

    Linus Torvalds
     

05 Oct, 2014

1 commit


04 Oct, 2014

2 commits

  • commit 03b8c7b623c80af264c4c8d6111e5c6289933666 ("futex: Allow
    architectures to skip futex_atomic_cmpxchg_inatomic() test") added the
    HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in
    the middle of the options for the EXPERT menu. However,
    HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
    placing items in the EXPERT menu, and displays the remaining several
    EXPERT items (starting with EPOLL) directly in the General Setup menu.

    Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
    HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the
    subsequent items display as part of the EXPERT menu again; the EMBEDDED
    menu now appears as the next top-level item in the General Setup menu,
    which makes General Setup much shorter and more usable.

    Signed-off-by: Josh Triplett
    Acked-by: Randy Dunlap
    Cc: stable

    Josh Triplett
     
  • The buffers sized by CONFIG_LOG_BUF_SHIFT and
    CONFIG_LOG_CPU_MAX_BUF_SHIFT do not exist if CONFIG_PRINTK=n, so don't
    ask about their size at all.

    Signed-off-by: Josh Triplett
    Acked-by: Randy Dunlap
    Cc: stable

    Josh Triplett
     

23 Sep, 2014

1 commit

  • …/linux-rcu into core/rcu

    Pull the v3.18 RCU changes from Paul E. McKenney:

    "
    * Update RCU documentation. These were posted to LKML at
    https://lkml.org/lkml/2014/8/28/378.

    * Miscellaneous fixes. These were posted to LKML at
    https://lkml.org/lkml/2014/8/28/386. An additional fix that
    eliminates a documented (but now inconvenient) deadlock between
    RCU hotplug and expedited grace periods was posted at
    https://lkml.org/lkml/2014/8/28/573.

    * Changes related to No-CBs CPUs and NO_HZ_FULL. These were posted
    to LKML at https://lkml.org/lkml/2014/8/28/412.

    * Torture-test updates. These were posted to LKML at
    https://lkml.org/lkml/2014/8/28/546 and at
    https://lkml.org/lkml/2014/9/11/1114.

    * RCU-tasks implementation. These were posted to LKML at
    https://lkml.org/lkml/2014/8/28/540.
    "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

19 Sep, 2014

1 commit

  • Tasks get their end of stack set to STACK_END_MAGIC with the
    aim to catch stack overruns. Currently this feature does not
    apply to init_task. This patch removes this restriction.

    Note that a similar patch was posted by Prarit Bhargava
    some time ago but was never merged:

    http://marc.info/?l=linux-kernel&m=127144305403241&w=2
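    The canary mechanism can be modeled in user space as shown here. STACK_END_MAGIC is given the value used in the kernel's include/uapi/linux/magic.h; everything else (struct sim_task, a 256-word stack growing downward, the helper names) is a hypothetical simplification. The patch's point is that init_task now receives the same canary as every other task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STACK_END_MAGIC 0x57AC6E9D   /* same value as the kernel's magic.h */
#define STACK_WORDS 256

/* Simulated task with a downward-growing stack; word 0 is the lowest address. */
struct sim_task {
	uint32_t stack[STACK_WORDS];
};

/* Place the canary at the end of the stack (its lowest address). */
static void set_stack_end_magic(struct sim_task *t)
{
	t->stack[0] = STACK_END_MAGIC;
}

static bool stack_end_corrupted(const struct sim_task *t)
{
	return t->stack[0] != STACK_END_MAGIC;
}

static void sim_task_init(struct sim_task *t)
{
	memset(t->stack, 0, sizeof(t->stack));
	set_stack_end_magic(t);
}

/* Simulate pushing 'words' words onto the stack from the top downward. */
static void sim_push(struct sim_task *t, size_t words)
{
	size_t i;

	for (i = 0; i < words && i < STACK_WORDS; i++)
		t->stack[STACK_WORDS - 1 - i] = 0xDEADBEEF;
}
```

Using the whole stack overwrites the canary slot, which a later check then reports as an overrun.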

    Signed-off-by: Aaron Tomlin
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Oleg Nesterov
    Acked-by: Michael Ellerman
    Cc: aneesh.kumar@linux.vnet.ibm.com
    Cc: dzickus@redhat.com
    Cc: bmr@redhat.com
    Cc: jcastillo@redhat.com
    Cc: jgh@redhat.com
    Cc: minchan@kernel.org
    Cc: tglx@linutronix.de
    Cc: hannes@cmpxchg.org
    Cc: Alex Thorlton
    Cc: Andrew Morton
    Cc: Benjamin Herrenschmidt
    Cc: Daeseok Youn
    Cc: David Rientjes
    Cc: Fabian Frederick
    Cc: Geert Uytterhoeven
    Cc: Jiri Olsa
    Cc: Kees Cook
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Michael Opdenacker
    Cc: Paul Mackerras
    Cc: Prarit Bhargava
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Seiji Aguchi
    Cc: Steven Rostedt
    Cc: Vladimir Davydov
    Cc: Yasuaki Ishimatsu
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com
    Signed-off-by: Ingo Molnar

    Aaron Tomlin
     

17 Sep, 2014

1 commit

  • This reverts commit 4dfe694f616e00e6fd83e5bbcd7a3c4d7113493d.

    In that, we did:

    Here we move the rootdelay code to be right beside the rootwait code, so
    that their behaviour is consistent.

    ...which is fine, but in hindsight, perhaps moving the rootwait to be
    beside the rootdelay would have been better. We also indicated:

    It should be noted that in doing so, the actions based on the
    saved_root_name[0] and initrd_load() were previously put on hold by
    rootdelay=N and now currently will not be delayed. However, I think
    consistent behaviour is more important than matching historical behaviour
    of delaying the above two operations.

    But Pavel reported an instance where an ARM target with root on MMC
    was failing to mount root, and Russell diagnosed it to the fact that
    the call to set ROOT_DEV within the saved_root_name[0] processing
    block mentioned above was no longer being delayed.

    Rather than moving both wait clauses to the original position of
    rootdelay and risking unearthing other possible corner case breakage
    at this point in time, we simply revert now and we can revisit
    trying the alternate/earlier location in another development cycle.

    Cc: Pavel Machek
    Cc: Russell King
    Cc: Andrew Morton
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Linus Torvalds

    Paul Gortmaker