19 Apr, 2014

1 commit


13 Apr, 2014

1 commit

  • Pull audit updates from Eric Paris.

    * git://git.infradead.org/users/eparis/audit: (28 commits)
    AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
    audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
    audit: do not cast audit_rule_data pointers pointlesly
    AUDIT: Allow login in non-init namespaces
    audit: define audit_is_compat in kernel internal header
    kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
    sched: declare pid_alive as inline
    audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
    syscall_get_arch: remove useless function arguments
    audit: remove stray newline from audit_log_execve_info() audit_panic() call
    audit: remove stray newlines from audit_log_lost messages
    audit: include subject in login records
    audit: remove superfluous new- prefix in AUDIT_LOGIN messages
    audit: allow user processes to log from another PID namespace
    audit: anchor all pid references in the initial pid namespace
    audit: convert PPIDs to the inital PID namespace.
    pid: get pid_t ppid of task in init_pid_ns
    audit: rename the misleading audit_get_context() to audit_take_context()
    audit: Add generic compat syscall support
    audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
    ...

    Linus Torvalds
     

08 Apr, 2014

2 commits

  • This can greatly aid in narrowing down the real source of initramfs
    problems such as failures related to the compression of the in-kernel
    initramfs when an external initramfs is in use as well. Existing errors
    are ambiguous as to which initramfs is a problem and why.

    [akpm@linux-foundation.org: use pr_debug()]
    Signed-off-by: Daniel M. Weeks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel M. Weeks
     
  • "make allnoconfig" exists to ease testing of minimal configurations.
    Documentation/SubmitChecklist includes a note to test with allnoconfig.
    This helps catch missing dependencies on common-but-not-required
    functionality, which might otherwise go unnoticed.

    However, allnoconfig still leaves many symbols enabled, because they're
    hidden behind CONFIG_EMBEDDED or CONFIG_EXPERT. For instance, allnoconfig
    still has CONFIG_PRINTK and CONFIG_BLOCK enabled, so drivers don't
    typically get build-tested with those disabled.

    To address this, introduce a new Kconfig option "allnoconfig_y", used on
    symbols which only exist to hide other symbols. Set it on CONFIG_EMBEDDED
    (which then selects CONFIG_EXPERT). allnoconfig will then disable all the
    symbols hidden behind those.
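
As a rough illustration of the effect described above, the following sketch models Kconfig symbols in plain C. It is a toy model under stated assumptions, not how Kconfig is implemented: `struct sym`, `allnoconfig()` and `printk_after_allnoconfig()` are hypothetical names, and the model assumes gating symbols are listed before the symbols they gate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a Kconfig symbol. */
struct sym {
    const char *name;
    bool default_y;      /* value used when the prompt is hidden */
    int  visible_if;     /* index of the symbol gating the prompt, -1 = always visible */
    int  selects;        /* index of a symbol forced to y when this one is y, -1 = none */
    bool allnoconfig_y;  /* the new option: answer y even under allnoconfig */
    bool value;
};

enum { EMBEDDED, EXPERT, PRINTK, BLOCK, NSYMS };

/*
 * Assign values the way "make allnoconfig" would in this toy model.
 * Symbols must be ordered so that gating symbols come first.
 */
static void allnoconfig(struct sym *s, size_t n, bool honor_flag)
{
    bool selected[NSYMS] = { false };

    assert(n == NSYMS);
    for (size_t i = 0; i < n; i++) {
        bool visible = s[i].visible_if < 0 || s[s[i].visible_if].value;

        if (honor_flag && s[i].allnoconfig_y)
            s[i].value = true;            /* flagged symbol is set to y */
        else if (selected[i])
            s[i].value = true;            /* forced on by a select */
        else if (visible)
            s[i].value = false;           /* visible prompt is answered "n" */
        else
            s[i].value = s[i].default_y;  /* hidden prompt keeps its default */

        if (s[i].value && s[i].selects >= 0)
            selected[s[i].selects] = true;
    }
}

/* Returns the resulting PRINTK value with or without the new flag honored. */
static bool printk_after_allnoconfig(bool honor_flag)
{
    struct sym s[NSYMS] = {
        [EMBEDDED] = { .name = "EMBEDDED", .visible_if = -1,
                       .selects = EXPERT, .allnoconfig_y = true },
        [EXPERT]   = { .name = "EXPERT", .visible_if = -1, .selects = -1 },
        [PRINTK]   = { .name = "PRINTK", .default_y = true,
                       .visible_if = EXPERT, .selects = -1 },
        [BLOCK]    = { .name = "BLOCK", .default_y = true,
                       .visible_if = EXPERT, .selects = -1 },
    };

    allnoconfig(s, NSYMS, honor_flag);
    return s[PRINTK].value;
}
```

In this model, `printk_after_allnoconfig(false)` leaves PRINTK enabled because its prompt stays hidden, while `printk_after_allnoconfig(true)` disables it once EMBEDDED (and through it EXPERT) is switched on.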

    Signed-off-by: Josh Triplett
    Tested-by: Paul E. McKenney
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     

04 Apr, 2014

5 commits

  • Merge first patch-bomb from Andrew Morton:
    - Various misc bits
    - kmemleak fixes
    - small befs, codafs, cifs, efs, freexxfs, hfsplus, minixfs, reiserfs things
    - fanotify
    - I appear to have become SuperH maintainer
    - ocfs2 updates
    - direct-io tweaks
    - a bit of the MM queue
    - printk updates
    - MAINTAINERS maintenance
    - some backlight things
    - lib/ updates
    - checkpatch updates
    - the rtc queue
    - nilfs2 updates
    - Small Documentation/ updates

    * emailed patches from Andrew Morton: (237 commits)
    Documentation/SubmittingPatches: remove references to patch-scripts
    Documentation/SubmittingPatches: update some dead URLs
    Documentation/filesystems/ntfs.txt: remove changelog reference
    Documentation/kmemleak.txt: updates
    fs/reiserfs/super.c: add __init to init_inodecache
    fs/reiserfs: move prototype declaration to header file
    fs/hfsplus/attributes.c: add __init to hfsplus_create_attr_tree_cache()
    fs/hfsplus/extents.c: fix concurrent acess of alloc_blocks
    fs/hfsplus/extents.c: remove unused variable in hfsplus_get_block
    nilfs2: update project's web site in nilfs2.txt
    nilfs2: update MAINTAINERS file entries fix
    nilfs2: verify metadata sizes read from disk
    nilfs2: add FITRIM ioctl support for nilfs2
    nilfs2: add nilfs_sufile_trim_fs to trim clean segs
    nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl
    nilfs2: add nilfs_sufile_set_suinfo to update segment usage
    nilfs2: add struct nilfs_suinfo_update and flags
    nilfs2: update MAINTAINERS file entries
    fs/coda/inode.c: add __init to init_inodecache()
    BEFS: logging cleanup
    ...

    Linus Torvalds
     
  • Signed-off-by: chishanmingshen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    chishanmingshen
     
  • uselib hasn't been used since libc5; glibc does not use it. Support
    turning it off.

    When disabled, also omit the load_elf_library implementation from
    binfmt_elf.c, which only uselib invokes.

    bloat-o-meter:
    add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785)
    function old new delta
    padzero 39 36 -3
    uselib_flags 20 - -20
    sys_uselib 168 - -168
    SyS_uselib 168 - -168
    load_elf_library 426 - -426

    The new CONFIG_USELIB defaults to `y'.
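
The general pattern of compiling out an entry point together with its only helper can be sketched in user-space C as follows. This is an illustration only: `SKETCH_CONFIG_USELIB`, `sketch_sys_uselib()` and `sketch_load_library()` are hypothetical stand-ins, and returning `-ENOSYS` directly from the stub is an assumption of this sketch rather than the kernel's actual mechanism.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the Kconfig symbol; like CONFIG_USELIB it defaults to on. */
#ifndef SKETCH_CONFIG_USELIB
#define SKETCH_CONFIG_USELIB 1
#endif

#if SKETCH_CONFIG_USELIB
/* Only the uselib path calls this helper, so it is compiled out with it. */
static int sketch_load_library(const char *path)
{
    assert(path != NULL);
    return path[0] != '\0' ? 0 : -ENOENT;
}

static long sketch_sys_uselib(const char *path)
{
    return sketch_load_library(path);
}
#else
/* With the option off, the entry point only reports "not implemented". */
static long sketch_sys_uselib(const char *path)
{
    (void)path;
    return -ENOSYS;
}
#endif
```

Building with `-DSKETCH_CONFIG_USELIB=0` drops both functions' real bodies from the object file, which is the kind of size saving the bloat-o-meter output above reports.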

    Signed-off-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • sys_sysfs is an obsolete system call no longer supported by libc.

    - This patch adds a default CONFIG_SYSFS_SYSCALL=y

    - Option can be turned off in expert mode.

    - cond_syscall added to kernel/sys_ni.c

    [akpm@linux-foundation.org: tweak Kconfig help text]
    Signed-off-by: Fabian Frederick
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Pull cgroup updates from Tejun Heo:
    "A lot of updates for cgroup:

    - The biggest one is cgroup's conversion to kernfs. cgroup took
    after the long abandoned vfs-entangled sysfs implementation and
    made it even more convoluted over time. cgroup's internal objects
    were fused with vfs objects which also brought in vfs locking and
    object lifetime rules. Naturally, there are places where vfs rules
    don't fit and nasty hacks, such as credential switching or lock
    dance interleaving inode mutex and cgroup_mutex with object serial
    number comparison thrown in to decide whether the operation is
    actually necessary, needed to be employed.

    After conversion to kernfs, internal object lifetime and locking
    rules are mostly isolated from vfs interactions allowing shedding
    of several nasty hacks and overall simplification. This will also
    allow implementation of operations which may affect multiple cgroups
    which weren't possible before as it would have required nesting
    i_mutexes.

    - Various simplifications including dropping of module support,
    easier cgroup name/path handling, simplified cgroup file type
    handling and task_cg_lists optimization.

    - Preparatory changes for the planned unified hierarchy, which is still
    a patchset away from being actually operational. The dummy
    hierarchy is updated to serve as the default unified hierarchy.
    Controllers which aren't claimed by other hierarchies are
    associated with it, which BTW was what the dummy hierarchy was for
    anyway.

    - Various fixes from Li and others. This pull request includes some
    patches to add missing slab.h to various subsystems. This was
    triggered by the xattr.h include removal from cgroup.h. cgroup.h
    indirectly got included in a lot of files, which brought in xattr.h,
    which brought in slab.h.

    There are several merge commits - one to pull in kernfs updates
    necessary for converting cgroup (already in upstream through
    driver-core), others for interfering changes in the fixes branch"

    * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
    cgroup: remove useless argument from cgroup_exit()
    cgroup: fix spurious lockdep warning in cgroup_exit()
    cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
    cgroup: break kernfs active_ref protection in cgroup directory operations
    cgroup: fix cgroup_taskset walking order
    cgroup: implement CFTYPE_ONLY_ON_DFL
    cgroup: make cgrp_dfl_root mountable
    cgroup: drop const from @buffer of cftype->write_string()
    cgroup: rename cgroup_dummy_root and related names
    cgroup: move ->subsys_mask from cgroupfs_root to cgroup
    cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
    cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
    cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
    cgroup: reorganize cgroup bootstrapping
    cgroup: relocate setting of CGRP_DEAD
    cpuset: use rcu_read_lock() to protect task_cs()
    cgroup_freezer: document freezer_fork() subtleties
    cgroup: update cgroup_transfer_tasks() to either succeed or fail
    cgroup: drop task_lock() protection around task->cgroups
    cgroup: update how a newly forked task gets associated with css_set
    ...

    Linus Torvalds
     

01 Apr, 2014

1 commit

  • Pull core locking updates from Ingo Molnar:
    "The biggest change is the MCS spinlock generalization changes from Tim
    Chen, Peter Zijlstra, Jason Low et al. There's also lockdep
    fixes/enhancements from Oleg Nesterov, in particular a false negative
    fix related to lockdep_set_novalidate_class() usage"

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
    locking/mutex: Fix debug checks
    locking/mutexes: Add extra reschedule point
    locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
    locking/mutexes: Unlock the mutex without the wait_lock
    locking/mutexes: Modify the way optimistic spinners are queued
    locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
    locking: Move mcs_spinlock.h into kernel/locking/
    m68k: Skip futex_atomic_cmpxchg_inatomic() test
    futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
    Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
    lockdep: Change lockdep_set_novalidate_class() to use _and_name
    lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate
    lockdep: Don't create the wrong dependency on hlock->check == 0
    lockdep: Make held_lock->check and "int check" argument bool
    locking/mcs: Allow architecture specific asm files to be used for contended case
    locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
    sched/wait: Suppress Sparse 'variable shadowing' warning
    hung_task/Documentation: Fix hung_task_warnings description
    locking/mcs: Allow architectures to hook in to contended paths
    locking/mcs: Micro-optimize the MCS code, add extra comments
    ...

    Linus Torvalds
     

20 Mar, 2014

2 commits

  • Currently AUDITSYSCALL has a long list of architecture dependencies:
    depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
    SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
    The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
    for simplicity.

    Signed-off-by: AKASHI Takahiro
    Acked-by: Will Deacon (arm)
    Acked-by: Richard Guy Briggs (audit)
    Acked-by: Matt Turner (alpha)
    Acked-by: Michael Ellerman (powerpc)
    Signed-off-by: Eric Paris

    AKASHI Takahiro
     
  • Signed-off-by: Zhenglong.cai
    Signed-off-by: Matt Turner

    蔡正龙
     

13 Mar, 2014

1 commit

  • Commit 73f7d1ca3263 (ACPI / init: Run acpi_early_init() before
    timekeeping_init()) optimistically moved the early ACPI initialization
    before timekeeping_init(), but that didn't work, because it broke fast
    TSC calibration for Julian Wollrath on Thinkpad x121e (and most likely
    for others too). The reason is that acpi_early_init() enables the SCI
    and that interferes with the fast TSC calibration mechanism.

    Thus follow the original idea to execute acpi_early_init() before
    efi_enter_virtual_mode() to help the EFI people for now and we can
    revisit the other problem that commit 73f7d1ca3263 attempted to
    address in the future (if really necessary).

    Fixes: 73f7d1ca3263 (ACPI / init: Run acpi_early_init() before timekeeping_init())
    Reported-by: Julian Wollrath
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

03 Mar, 2014

1 commit

  • If an architecture has futex_atomic_cmpxchg_inatomic() implemented and
    no runtime check is necessary, allow the test within futex_init() to be
    skipped.

    This gets rid of some code which would always give the same result,
    and also allows the compiler to optimize a couple of if statements away.
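
A minimal user-space sketch of the idea: when a capability is known at build time, the runtime flag becomes a compile-time constant and the checks on it can be folded away. The names `SKETCH_HAVE_CMPXCHG`, `sketch_cmpxchg_enabled`, `sketch_probe()`, `sketch_init()` and `sketch_do_op()` are hypothetical and only loosely mirror the commit.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef SKETCH_HAVE_CMPXCHG
/* Capability known at build time: a constant, so no probe is needed. */
#define sketch_cmpxchg_enabled true
static void sketch_init(void) { }
#else
/* Capability unknown at build time: probe once during init. */
static bool sketch_cmpxchg_enabled;

static bool sketch_probe(void)
{
    return true; /* placeholder for a real runtime test */
}

static void sketch_init(void)
{
    sketch_cmpxchg_enabled = sketch_probe();
}
#endif

/* With the constant variant, the compiler can drop this branch entirely. */
static int sketch_do_op(void)
{
    if (!sketch_cmpxchg_enabled)
        return -1;
    return 0;
}
```

Compiling with `-DSKETCH_HAVE_CMPXCHG` removes both the flag variable and the probe, leaving `sketch_do_op()` with no conditional at all.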

    Signed-off-by: Heiko Carstens
    Cc: Finn Thain
    Cc: Geert Uytterhoeven
    Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris
    Signed-off-by: Thomas Gleixner

    Heiko Carstens
     

12 Feb, 2014

1 commit

  • cgroup filesystem code was derived from the original sysfs
    implementation which was heavily intertwined with vfs objects and
    locking with the goal of re-using the existing vfs infrastructure.
    That experiment turned out rather disastrous and sysfs switched, a
    long time ago, to a distributed filesystem model where a separate
    representation is maintained which is queried by vfs. Unfortunately,
    cgroup stuck with the failed experiment all these years and
    accumulated even more problems over time.

    Locking and object lifetime management being entangled with vfs is
    probably the most egregious. vfs was never designed to be misused like
    this and cgroup ends up jumping through various convoluted hoops to
    make things work. Even then, operations across multiple cgroups can't
    be done safely as it'll deadlock with rename locking.

    Recently, kernfs was separated out from sysfs so that it can be used by
    users other than sysfs. This patch converts cgroup to use kernfs,
    which will bring the following benefits.

    * Separation from vfs internals. Locking and object lifetime
    management is contained in cgroup proper making things a lot
    simpler. This removes a significant amount of locking convolutions,
    hairy object lifetime rules and the restriction on multi-cgroup
    operations.

    * Can drop a lot of code to implement filesystem interface as most are
    provided by kernfs.

    * Proper "severing" semantics, which allows controllers to not worry
    about lingering file accesses after offline.

    While the preceding patches did as much as possible to make the
    transition less painful, a large part of the conversion has to be one
    discrete step making this patch rather large. The rest of the commit
    message lists notable changes in different areas.

    Overall
    -------

    * vfs constructs replaced with kernfs ones. cgroup->dentry w/ ->kn,
    cgroupfs_root->sb w/ ->kf_root.

    * All dentry accessors are removed. Helpers to map from kernfs
    constructs are added.

    * All vfs plumbing around dentry, inode and bdi removed.

    * cgroup_mount() now directly looks for matching root and then
    proceeds to create a new one if not found.

    Synchronization and object lifetime
    -----------------------------------

    * vfs inode locking removed. Among other things, this removes the
    need for the convolution in cgroup_cfts_commit(). Future patches
    will further simplify it.

    * vfs refcnting replaced with cgroup internal ones. cgroup->refcnt,
    cgroupfs_root->refcnt added. cgroup_put_root() now directly puts
    root->refcnt and when it reaches zero proceeds to destroy it thus
    merging cgroup_put_root() and the former cgroup_kill_sb().
    Similarly, cgroup_put() now directly schedules cgroup_free_rcu()
    when refcnt reaches zero.

    * Unlike before, kernfs objects don't hold onto cgroup objects. When
    cgroup destroys a kernfs node, all existing operations are drained
    and the association is broken immediately. The same for
    cgroupfs_roots and mounts.

    * All operations which come through kernfs guarantee that the
    associated cgroup is and stays valid for the duration of operation;
    however, there are two paths which need to find out the associated
    cgroup from dentry without going through kernfs -
    css_tryget_from_dir() and cgroupstats_build(). For these two,
    kernfs_node->priv is RCU managed so that they can dereference it
    under RCU read lock.
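
The "put drops the reference and destroys at zero" pattern from the list above can be sketched as below. This is a simplified, single-threaded illustration: `struct sketch_obj`, `sketch_get()` and `sketch_put()` are hypothetical, the counter is a plain `int` rather than an atomic, and the object is freed immediately instead of through an RCU-deferred callback.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; not the kernel's struct cgroup. */
struct sketch_obj {
    int  refcnt;
    int *freed_counter; /* lets the caller observe destruction */
};

/* Create an object holding one initial reference. */
static struct sketch_obj *sketch_obj_new(int *freed_counter)
{
    struct sketch_obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    o->refcnt = 1;
    o->freed_counter = freed_counter;
    return o;
}

/* Take an additional reference on a live object. */
static void sketch_get(struct sketch_obj *o)
{
    assert(o->refcnt > 0);
    o->refcnt++;
}

/* Drop a reference; the last put destroys the object directly. */
static void sketch_put(struct sketch_obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        ++*o->freed_counter;
        free(o);
    }
}
```

The point of the design is that no separate "kill" step exists: whoever drops the last reference triggers destruction.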

    File and directory handling
    ---------------------------

    * File and directory operations converted to kernfs_ops and
    kernfs_syscall_ops.

    * xattrs are implicitly supported by kernfs. No need to worry about
    them from cgroup. This means that the "xattr" mount option is no
    longer necessary. A future patch will add a deprecation warning
    message when sane_behavior is set.

    * When cftype->max_write_len > PAGE_SIZE, it's necessary to make a
    private copy of one of the kernfs_ops to set its atomic_write_len.
    cftype->kf_ops is added and cgroup_init/exit_cftypes() are updated
    to handle it.

    * cftype->lockdep_key added so that kernfs lockdep annotation can be
    per cftype.

    * Individual file entries and open states are now managed by kernfs.
    No need to worry about them from cgroup. cfent, cgroup_open_file
    and their friends are removed.

    * kernfs_nodes are created deactivated and kernfs_activate()
    invocations are added to places where creation of new nodes is
    committed.

    * cgroup_rmdir() uses kernfs_[un]break_active_protection() for
    self-removal.

    v2: - Li pointed out in an earlier patch that specifying "name="
    during mount without subsystem specification should succeed if
    there's an existing hierarchy with a matching name although it
    should fail with -EINVAL if a new hierarchy should be created.
    Prior to the conversion, this used to be handled by deferring
    failure from NULL return from cgroup_root_from_opts(), which was
    necessary because root was being created before checking for
    existing ones. Note that cgroup_root_from_opts() returned an
    ERR_PTR() value for error conditions which require immediate
    mount failure.

    As we now have separate search and creation steps, deferring
    failure from cgroup_root_from_opts() is no longer necessary.
    cgroup_root_from_opts() is updated to always return ERR_PTR()
    value on failure.

    - The logic to match existing roots is updated so that a mount
    attempt with a matching name but different subsys_mask is
    rejected. This was handled by a separate matching loop under
    the comment "Check for name clashes with existing mounts" but
    got lost during conversion. Merge the check into the main
    search loop.

    - Add __rcu __force casting in RCU_INIT_POINTER() in
    cgroup_destroy_locked() to avoid the sparse address space
    warning reported by kbuild test bot. Maybe we want an explicit
    interface to use kn->priv as RCU protected pointer?

    v3: Make CONFIG_CGROUPS select CONFIG_KERNFS.

    v4: Rebased on top of 0ab02ca8f887 ("cgroup: protect modifications to
    cgroup_idr with cgroup_mutex").

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Cc: kbuild test robot <fengguang.wu@intel.com>

    Tejun Heo
     

06 Feb, 2014

1 commit

  • This changes 'do_execve()' to get the executable name as a 'struct
    filename', and to free it when it is done. This is what the normal
    users want, and it simplifies and streamlines their error handling.

    The controlled lifetime of the executable name also fixes a
    use-after-free problem with the trace_sched_process_exec tracepoint: the
    lifetime of the passed-in string for kernel users was not at all
    obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
    the pathname allocation lifetime with the execve() having finished,
    which in turn meant that the trace point that happened after
    mm_release() of the old process VM ended up using already free'd memory.

    To solve the kernel string lifetime issue, this simply introduces
    "getname_kernel()" that works like the normal user-space getname()
    function, except with the source coming from kernel memory.

    As Oleg points out, this also means that we could drop the tcomm[] array
    from 'struct linux_binprm', since the pathname lifetime now covers
    setup_new_exec(). That would be a separate cleanup.
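
The ownership rule described above (the callee receives an allocated name and frees it on every path) can be sketched in user-space C. All identifiers here (`sketch_filename`, `sketch_getname_kernel()`, `sketch_putname()`, `sketch_do_execve()`, `sketch_live_names`) are hypothetical, and the "exec" step is a placeholder check rather than anything the kernel does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int sketch_live_names; /* number of names currently allocated */

struct sketch_filename {
    char *name;
};

/* Copy a caller-owned string into a separately allocated name object. */
static struct sketch_filename *sketch_getname_kernel(const char *src)
{
    struct sketch_filename *fn;
    size_t len;

    assert(src != NULL);
    fn = malloc(sizeof(*fn));
    if (!fn)
        return NULL;
    len = strlen(src) + 1;
    fn->name = malloc(len);
    if (!fn->name) {
        free(fn);
        return NULL;
    }
    memcpy(fn->name, src, len);
    sketch_live_names++;
    return fn;
}

/* Release a name object and its string copy. */
static void sketch_putname(struct sketch_filename *fn)
{
    free(fn->name);
    free(fn);
    sketch_live_names--;
}

/* Takes ownership of fn and frees it on success and failure alike. */
static int sketch_do_execve(struct sketch_filename *fn)
{
    int ret;

    if (!fn)
        return -1;
    ret = fn->name[0] == '/' ? 0 : -1; /* placeholder for the real work */
    sketch_putname(fn);
    return ret;
}
```

Because the callee owns the name, callers need no cleanup code of their own, and the name stays valid for as long as the callee is still using it.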

    Reported-by: Igor Zhbanov
    Tested-by: Steven Rostedt
    Cc: Oleg Nesterov
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Feb, 2014

1 commit


29 Jan, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus
    assorted cleanups and fixes all over the place...

    There will be another pile later this week"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
    __dentry_path() fixes
    vfs: Remove second variable named error in __dentry_path
    vfs: Is mounted should be testing mnt_ns for NULL or error.
    Fix race when checking i_size on direct i/o read
    hfsplus: remove can_set_xattr
    nfsd: use get_acl and ->set_acl
    fs: remove generic_acl
    nfs: use generic posix ACL infrastructure for v3 Posix ACLs
    gfs2: use generic posix ACL infrastructure
    jfs: use generic posix ACL infrastructure
    xfs: use generic posix ACL infrastructure
    reiserfs: use generic posix ACL infrastructure
    ocfs2: use generic posix ACL infrastructure
    jffs2: use generic posix ACL infrastructure
    hfsplus: use generic posix ACL infrastructure
    f2fs: use generic posix ACL infrastructure
    ext2/3/4: use generic posix ACL infrastructure
    btrfs: use generic posix ACL infrastructure
    fs: make posix_acl_create more useful
    fs: make posix_acl_chmod more useful
    ...

    Linus Torvalds
     

28 Jan, 2014

1 commit


26 Jan, 2014

1 commit

  • Pull user namespaces work from Eric Biederman:
    "The work to convert the kernel to use kuid_t and kgid_t has been
    finished since 3.12 so it is time to remove the scaffolding that
    allowed the work to progress incrementally.

    The first patch on this branch just removes the scaffolding, ensuring
    we will always get compile errors if people accidentally mix the
    userspace and the kernel uid and gid types. The second patch removes
    an overlooked and unused chunk of mips code that fails to build
    after the first patch.

    The code hasn't been in linux-next for long (as I was out of it and
    could not shepherd the code properly) but the patch has been around
    for a long time just waiting for the day when I had finished the
    uid/gid conversions. Putting the code in linux-next did find the
    compile failure on mips so I took the time to get that fix reviewed
    and included. Beyond that I am not too worried about errors because
    all these two patches do is delete a modest amount of code"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    MIPS: VPE: Remove vpe_getuid and vpe_getgid
    userns: userns: Remove UIDGID_STRICT_TYPE_CHECKS

    Linus Torvalds
     

25 Jan, 2014

2 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Pull ACPI and power management updates from Rafael Wysocki:
    "As far as the number of commits goes, the top spot belongs to ACPI
    this time with cpufreq in the second position and a handful of PM
    core, PNP and cpuidle updates. They are fixes and cleanups mostly, as
    usual, with a couple of new features in the mix.

    The most visible change is probably that we will create struct
    acpi_device objects (visible in sysfs) for all devices represented in
    the ACPI tables regardless of their status and there will be a new
    sysfs attribute under those objects allowing user space to check that
    status via _STA.

    Consequently, ACPI device eject or generally hot-removal will not
    delete those objects, unless the table containing the corresponding
    namespace nodes is unloaded, which is extremely rare. Also ACPI
    container hotplug will be handled quite a bit differently and cpufreq
    will support CPU boost ("turbo") generically and not only in the
    acpi-cpufreq driver.

    Specifics:

    - ACPI core changes to make it create a struct acpi_device object for
    every device represented in the ACPI tables during all namespace
    scans regardless of the current status of that device. In
    accordance with this, ACPI hotplug operations will not delete those
    objects, unless the underlying ACPI tables go away.

    - On top of the above, new sysfs attribute for ACPI device objects
    allowing user space to check device status by triggering the
    execution of _STA for its ACPI object. From Srinivas Pandruvada.

    - ACPI core hotplug changes reducing code duplication, integrating
    the PCI root hotplug with the core and reworking container hotplug.

    - ACPI core simplifications making it use ACPI_COMPANION() in the
    code "glueing" ACPI device objects to "physical" devices.

    - ACPICA update to upstream version 20131218. This adds support for
    the DBG2 and PCCT tables to ACPICA, fixes some bugs and improves
    debug facilities. From Bob Moore, Lv Zheng and Betty Dall.

    - Init code change to carry out the early ACPI initialization
    earlier. That should allow us to use ACPI during the timekeeping
    initialization and possibly to simplify the EFI initialization too.
    From Chun-Yi Lee.

    - Cleanups of the inclusions of ACPI headers in many places all over
    from Lv Zheng and Rashika Kheria (work in progress).

    - New helper for ACPI _DSM execution and rework of the code in
    drivers that uses _DSM to execute it via the new helper. From
    Jiang Liu.

    - New Win8 OSI blacklist entries from Takashi Iwai.

    - Assorted ACPI fixes and cleanups from Al Stone, Emil Goode, Hanjun
    Guo, Lan Tianyu, Masanari Iida, Oliver Neukum, Prarit Bhargava,
    Rashika Kheria, Tang Chen, Zhang Rui.

    - intel_pstate driver updates, including proper Baytrail support,
    from Dirk Brandewie and intel_pstate documentation from Ramkumar
    Ramachandra.

    - Generic CPU boost ("turbo") support for cpufreq from Lukasz
    Majewski.

    - powernow-k6 cpufreq driver fixes from Mikulas Patocka.

    - cpufreq core fixes and cleanups from Viresh Kumar, Jane Li, Mark
    Brown.

    - Assorted cpufreq drivers fixes and cleanups from Anson Huang, John
    Tobias, Paul Bolle, Paul Walmsley, Sachin Kamat, Shawn Guo, Viresh
    Kumar.

    - cpuidle cleanups from Bartlomiej Zolnierkiewicz.

    - Support for hibernation APM events from Bin Shi.

    - Hibernation fix to avoid bringing up nonboot CPUs with ACPI EC
    disabled during thaw transitions from Bjørn Mork.

    - PM core fixes and cleanups from Ben Dooks, Leonardo Potenza, Ulf
    Hansson.

    - PNP subsystem fixes and cleanups from Dmitry Torokhov, Levente
    Kurusa, Rashika Kheria.

    - New tool for profiling system suspend from Todd E Brandt and a
    cpupower tool cleanup from One Thousand Gnomes"

    * tag 'pm+acpi-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (153 commits)
    thermal: exynos: boost: Automatic enable/disable of BOOST feature (at Exynos4412)
    cpufreq: exynos4x12: Change L0 driver data to CPUFREQ_BOOST_FREQ
    Documentation: cpufreq / boost: Update BOOST documentation
    cpufreq: exynos: Extend Exynos cpufreq driver to support boost
    cpufreq / boost: Kconfig: Support for software-managed BOOST
    acpi-cpufreq: Adjust the code to use the common boost attribute
    cpufreq: Add boost frequency support in core
    intel_pstate: Add trace point to report internal state.
    cpufreq: introduce cpufreq_generic_get() routine
    ARM: SA1100: Create dummy clk_get_rate() to avoid build failures
    cpufreq: stats: create sysfs entries when cpufreq_stats is a module
    cpufreq: stats: free table and remove sysfs entry in a single routine
    cpufreq: stats: remove hotplug notifiers
    cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly
    cpufreq: speedstep: remove unused speedstep_get_state
    platform: introduce OF style 'modalias' support for platform bus
    PM / tools: new tool for suspend/resume performance optimization
    ACPI: fix module autoloading for ACPI enumerated devices
    ACPI: add module autoloading support for ACPI enumerated devices
    ACPI: fix create_modalias() return value handling
    ...

    Linus Torvalds
     

24 Jan, 2014

2 commits


22 Jan, 2014

4 commits

  • Merge first patch-bomb from Andrew Morton:

    - a couple of misc things

    - inotify/fsnotify work from Jan

    - ocfs2 updates (partial)

    - about half of MM

    * emailed patches from Andrew Morton: (117 commits)
    mm/migrate: remove unused function, fail_migrate_page()
    mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages
    mm/migrate: correct failure handling if !hugepage_migration_support()
    mm/migrate: add comment about permanent failure path
    mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure
    mm: compaction: reset scanner positions immediately when they meet
    mm: compaction: do not mark unmovable pageblocks as skipped in async compaction
    mm: compaction: detect when scanners meet in isolate_freepages
    mm: compaction: reset cached scanner pfn's before reading them
    mm: compaction: encapsulate defer reset logic
    mm: compaction: trace compaction begin and end
    memcg, oom: lock mem_cgroup_print_oom_info
    sched: add tracepoints related to NUMA task migration
    mm: numa: do not automatically migrate KSM pages
    mm: numa: trace tasks that fail migration due to rate limiting
    mm: numa: limit scope of lock for NUMA migrate rate limiting
    mm: numa: make NUMA-migrate related functions static
    lib/show_mem.c: show num_poisoned_pages when oom
    mm/hwpoison: add '#' to hwpoison_inject
    mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameter
    ...

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:
    "The bulk of changes are cleanups and preparations for the upcoming
    kernfs conversion.

    - cgroup_event mechanism which is and will be used only by memcg is
    moved to memcg.

    - pidlist handling is updated so that it can be served by seq_file.

    Also, the list is not sorted if sane_behavior. cgroup
    documentation explicitly states that the file is not sorted but it
    has been for quite some time.

    - All cgroup file handling now happens on top of seq_file. This is
    to prepare for kernfs conversion. In addition, all operations are
    restructured so that they map 1-1 to kernfs operations.

    - Other cleanups and low-pri fixes"

    * 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits)
    cgroup: trivial style updates
    cgroup: remove stray references to css_id
    doc: cgroups: Fix typo in doc/cgroups
    cgroup: fix fail path in cgroup_load_subsys()
    cgroup: fix missing unlock on error in cgroup_load_subsys()
    cgroup: remove for_each_root_subsys()
    cgroup: implement for_each_css()
    cgroup: factor out cgroup_subsys_state creation into create_css()
    cgroup: combine css handling loops in cgroup_create()
    cgroup: reorder operations in cgroup_create()
    cgroup: make for_each_subsys() useable under cgroup_root_mutex
    cgroup: css iterations and css_from_dir() are safe under cgroup_mutex
    cgroup: unify pidlist and other file handling
    cgroup: replace cftype->read_seq_string() with cftype->seq_show()
    cgroup: attach cgroup_open_file to all cgroup files
    cgroup: generalize cgroup_pidlist_open_file
    cgroup: unify read path so that seq_file is always used
    cgroup: unify cgroup_write_X64() and cgroup_write_string()
    cgroup: remove cftype->read(), ->read_map() and ->write()
    hugetlb_cgroup: convert away from cftype->read()
    ...

    Linus Torvalds
     
  • Switch to memblock interfaces for early memory allocator instead of
    bootmem allocator. No functional change in behavior from the point of
    view of existing bootmem users.

    Archs already converted to NO_BOOTMEM now directly use memblock
    interfaces instead of bootmem wrappers built on top of memblock. For
    the archs which still use bootmem, these new APIs just fall back to
    the existing bootmem APIs.

    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: Grygorii Strashko
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Santosh Shilimkar
     
  • If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on x86_64
    is 72 bytes. For page->ptl they will be allocated from the kmalloc-96
    slab, so we lose 24 bytes on each. An average system can easily allocate
    a few tens of thousands of page->ptl, so the overhead is significant.

    Let's create a separate slab for page->ptl allocation to solve this.

    To make sure that it really works this time, some numbers from my test
    machine (just booted, no load):

    Before:
    # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
    kmalloc-96 31987 32190 128 30 1 : tunables 120 60 8 : slabdata 1073 1073 92
    After:
    # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
    page->ptl 27516 28143 72 53 1 : tunables 120 60 8 : slabdata 531 531 9
    kmalloc-96 3853 5280 128 30 1 : tunables 120 60 8 : slabdata 176 176 0

    Note that the patch is useful not only for the debug case, but also for
    PREEMPT_RT, where spinlock_t is always bloated.

    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
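
    The slabinfo lines quoted in this commit allow a rough estimate of the
    saving. The sketch below is only arithmetic on those numbers: it reads
    the last-but-one slabdata field as the slab count and the "1" before
    the colon as pages per slab. The helper names are hypothetical, and
    kmalloc-96 is shared with other users, so the result is approximate.

```c
/* Pages backing one cache: number of slabs times pages per slab. */
static unsigned long slab_pages(unsigned long num_slabs,
				unsigned long pages_per_slab)
{
	return num_slabs * pages_per_slab;
}

/* Difference in backing pages between the "Before" and "After" listings. */
static unsigned long ptl_pages_saved(void)
{
	/* Before: kmalloc-96 alone, 1073 slabs of 1 page each. */
	unsigned long before = slab_pages(1073, 1);
	/* After: page->ptl (531 slabs) plus what is left in kmalloc-96 (176). */
	unsigned long after = slab_pages(531, 1) + slab_pages(176, 1);

	return before - after;
}
```

    With these figures the difference is 366 pages, which would be about
    1.4 MiB if pages are 4 KiB (an assumption, the usual x86_64 size).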
     

16 Jan, 2014

1 commit

  • This is a variant of Rafael J. Wysocki's patch
    ACPI / init: Run acpi_early_init() before efi_enter_virtual_mode()

    According to Matt Fleming, if acpi_early_init() was executed before
    efi_enter_virtual_mode(), the EFI initialization could benefit from
    it, so Rafael's patch makes that happen.

    We also want to access the ACPI TAD device to set the system clock, so
    move acpi_early_init() before timekeeping_init(). This final position is
    also before efi_enter_virtual_mode().

    Tested-by: Toshi Kani
    Signed-off-by: Lee, Chun-Yi
    Signed-off-by: Rafael J. Wysocki

    Lee, Chun-Yi
     

12 Jan, 2014

1 commit


11 Dec, 2013

1 commit

  • Introduce mul_u64_u32_shr() as proposed by Andy a while back; it
    allows using 64x64->128 muls on 64bit archs and recent GCC
    which defines __SIZEOF_INT128__ and __int128.

    (This new method will be used by the scheduler.)

    Signed-off-by: Peter Zijlstra
    Cc: fweisbec@gmail.com
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/n/tip-hxjoeuzmrcaumR0uZwjpe2pv@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
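
    As an illustration of what such a helper computes, here is a userspace
    sketch of (a * mul) >> shift. Only the name mul_u64_u32_shr() comes
    from the commit message; the bodies, the _fallback name and the use of
    <stdint.h> types are assumptions made for this example, not the kernel
    source.

```c
#include <stdint.h>

/*
 * Portable path: split the 64-bit operand into 32-bit halves so each
 * partial product fits in 64 bits. Only valid for shift <= 32.
 */
static inline uint64_t mul_u64_u32_shr_fallback(uint64_t a, uint32_t mul,
						unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret = ((uint64_t)al * mul) >> shift;

	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);
	return ret;
}

/* Use a single 128-bit multiply when the compiler provides __int128. */
static inline uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
#ifdef __SIZEOF_INT128__
	return (uint64_t)(((unsigned __int128)a * mul) >> shift);
#else
	return mul_u64_u32_shr_fallback(a, mul, shift);
#endif
}
```

    Both paths keep only the low 64 bits of the shifted product, so they
    agree for any shift from 0 to 32.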
     

03 Dec, 2013

1 commit


27 Nov, 2013

1 commit

  • Removing UIDGID_STRICT_TYPE_CHECKS simplifies the code and always
    generates a compile error if the uids and kuids or gids and kgids are
    mixed by accident. Now that the appropriate conversions have been
    placed throughout the kernel there is no longer a need for a mode where
    we don't detect them as compile errors.

    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
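
    The compile-time check described here relies on wrapping the raw id in
    a one-member struct, so that a plain integer and a kernel-internal id
    are different types. The sketch below shows the idea in userspace; all
    names ending in _sketch are hypothetical stand-ins, not the kernel's
    definitions.

```c
#include <stdbool.h>

typedef unsigned int raw_uid_sketch_t;			/* plain integer id */
typedef struct { raw_uid_sketch_t val; } kuid_sketch_t;	/* wrapped id */

/* Explicit conversions are the only way between the two types. */
#define KUID_SKETCH_INIT(value) ((kuid_sketch_t){ (value) })

static inline raw_uid_sketch_t kuid_sketch_val(kuid_sketch_t uid)
{
	return uid.val;
}

static inline bool kuid_sketch_eq(kuid_sketch_t left, kuid_sketch_t right)
{
	return kuid_sketch_val(left) == kuid_sketch_val(right);
}

/*
 * Accidental mixing is rejected by the compiler, for example:
 *
 *	kuid_sketch_t k = 1000;		// error: invalid initializer
 *	raw_uid_sketch_t u = k;		// error: incompatible types
 */
```

    Because the struct has a single integer member, the wrapper is not
    expected to add runtime cost on common ABIs; it only changes what the
    type checker accepts.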
     

23 Nov, 2013

2 commits

  • Merge v3.12 based patch series to move cgroup_event implementation to
    memcg into for-3.14. The following two commits cause a conflict in
    kernel/cgroup.c

    2ff2a7d03bbe4 ("cgroup: kill css_id")
    79bd9814e5ec9 ("cgroup, memcg: move cgroup_event implementation to memcg")

    Each patch removes a struct definition from kernel/cgroup.c. As the
    two are adjacent, they cause a context conflict. Easily resolved by
    removing both structs.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • cgroup_event is way over-designed and tries to build a generic
    flexible event mechanism into cgroup - fully customizable event
    specification for each user of the interface. This is utterly
    unnecessary and overboard especially in the light of the planned
    unified hierarchy as there's gonna be a single agent. Simply generating
    events at fixed points, or if that's too restrictive, a configurable
    cadence or a single set of configurable points should be enough.

    Thankfully, memcg is the only user and gets to keep it. Replacing it
    with something simpler on sane_behavior is strongly recommended.

    This patch moves cgroup_event and "cgroup.event_control"
    implementation to mm/memcontrol.c. Clearing of events on cgroup
    destruction is moved from cgroup_destroy_locked() to
    mem_cgroup_css_offline(), which shouldn't make any noticeable
    difference.

    cgroup_css() and __file_cft() are exported to enable the move;
    however, this will soon be reverted once the event code is updated to
    be memcg specific.

    Note that "cgroup.event_control" will now exist only on the hierarchy
    with memcg attached to it. While this change is visible to userland,
    it is unlikely to be noticeable as the file has never been meaningful
    outside memcg.

    Aside from the above change, this is pure code relocation.

    v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
    poll.h inclusion moved from cgroup.c to memcontrol.c.

    Signed-off-by: Tejun Heo
    Acked-by: Li Zefan
    Acked-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Balbir Singh

    Tejun Heo
     

22 Nov, 2013

2 commits

  • Pull security subsystem updates from James Morris:
    "In this patchset, we finally get an SELinux update, with Paul Moore
    taking over as maintainer of that code.

    Also a significant update for the Keys subsystem, as well as
    maintenance updates to Smack, IMA, TPM, and Apparmor"

    and since I wanted to know more about the updates to key handling,
    here's the explanation from David Howells on that:

    "Okay. There are a number of separate bits. I'll go over the big bits
    and the odd important other bit, most of the smaller bits are just
    fixes and cleanups. If you want the small bits accounting for, I can
    do that too.

    (1) Keyring capacity expansion.

    KEYS: Consolidate the concept of an 'index key' for key access
    KEYS: Introduce a search context structure
    KEYS: Search for auth-key by name rather than target key ID
    Add a generic associative array implementation.
    KEYS: Expand the capacity of a keyring

    Several of the patches are providing an expansion of the capacity of a
    keyring. Currently, the maximum size of a keyring payload is one page.
    Subtract a small header and then divide up into pointers, and that only
    gives you ~500 pointers on an x86_64 box. However, since the NFS idmapper uses
    a keyring to store ID mapping data, that has proven to be insufficient to
    the cause.

    Whatever data structure I use to handle the keyring payload, it can only
    store pointers to keys, not the keys themselves because several keyrings
    may point to a single key. This precludes inserting, say, an rb_node
    struct into the key struct for this purpose.

    I could make an rbtree of records such that each record has an rb_node
    and a key pointer, but that would use four words of space per key stored
    in the keyring. It would, however, be able to use much existing code.

    I selected instead a non-rebalancing radix-tree type approach as that
    could have a better space-used/key-pointer ratio. I could have used the
    radix tree implementation that we already have and insert keys into it by
    their serial numbers, but that means any sort of search must iterate over
    the whole radix tree. Further, its nodes are a bit on the capacious side
    for what I want - especially given that key serial numbers are randomly
    allocated, thus leaving a lot of empty space in the tree.

    So what I have is an associative array that internally is a radix-tree
    with 16 pointers per node where the index key is constructed from the key
    type pointer and the key description. This means that an exact lookup by
    type+description is very fast as this tells us how to navigate directly to
    the target key.

    I made the data structure general in lib/assoc_array.c; as far as it is
    concerned, its index key is just a sequence of bits that leads to a
    pointer. It's possible that someone else will be able to make use of it
    also. FS-Cache might, for example.

    (2) Mark keys as 'trusted' and keyrings as 'trusted only'.

    KEYS: verify a certificate is signed by a 'trusted' key
    KEYS: Make the system 'trusted' keyring viewable by userspace
    KEYS: Add a 'trusted' flag and a 'trusted only' flag
    KEYS: Separate the kernel signature checking keyring from module signing

    These patches allow keys carrying asymmetric public keys to be marked as
    being 'trusted' and allow keyrings to be marked as only permitting the
    addition or linkage of trusted keys.

    Keys loaded from hardware during kernel boot or compiled into the kernel
    during build are marked as being trusted automatically. New keys can be
    loaded at runtime with add_key(). They are checked against the system
    keyring contents and if their signatures can be validated with keys that
    are already marked trusted, then they are marked trusted also and can
    thus be added into the master keyring.

    Patches from Mimi Zohar make this usable with the IMA keyrings also.

    (3) Remove the date checks on the key used to validate a module signature.

    X.509: Remove certificate date checks

    It's not reasonable to reject a signature just because the key that it was
    generated with is no longer valid datewise - especially if the kernel
    hasn't yet managed to set the system clock when the first module is
    loaded - so just remove those checks.

    (4) Make it simpler to deal with additional X.509 being loaded into the kernel.

    KEYS: Load *.x509 files into kernel keyring
    KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate

    The builder of the kernel now just places files with the extension ".x509"
    into the kernel source or build trees and they're concatenated by the
    kernel build and stuffed into the appropriate section.

    (5) Add support for userspace kerberos to use keyrings.

    KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
    KEYS: Implement a big key type that can save to tmpfs

    Fedora went to, by default, storing kerberos tickets and tokens in tmpfs.
    We looked at storing it in keyrings instead as that confers certain
    advantages such as tickets being automatically deleted after a certain
    amount of time and the ability for the kernel to get at these tokens more
    easily.

    To make this work, two things were needed:

    (a) A way for the tickets to persist beyond the lifetime of all a user's
    sessions so that cron-driven processes can still use them.

    The problem is that a user's session keyrings are deleted when the
    session that spawned them logs out and the user's user keyring is
    deleted when the UID is deleted (typically when the last log out
    happens), so neither of these places is suitable.

    I've added a system keyring into which a 'persistent' keyring is
    created for each UID on request. Each time a user requests their
    persistent keyring, the expiry time on it is set anew. If the user
    doesn't ask for it for, say, three days, the keyring is automatically
    expired and garbage collected using the existing gc. All the kerberos
    tokens it held are then also gc'd.

    (b) A key type that can hold really big tickets (up to 1MB in size).

    The problem is that Active Directory can return huge tickets with lots
    of auxiliary data attached. We don't, however, want to eat up huge
    tracts of unswappable kernel space for this, so if the ticket is
    greater than a certain size, we create a swappable shmem file and dump
    the contents in there and just live with the fact we then have an
    inode and a dentry overhead. If the ticket is smaller than that, we
    slap it in a kmalloc()'d buffer"

    * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits)
    KEYS: Fix keyring content gc scanner
    KEYS: Fix error handling in big_key instantiation
    KEYS: Fix UID check in keyctl_get_persistent()
    KEYS: The RSA public key algorithm needs to select MPILIB
    ima: define '_ima' as a builtin 'trusted' keyring
    ima: extend the measurement list to include the file signature
    kernel/system_certificate.S: use real contents instead of macro GLOBAL()
    KEYS: fix error return code in big_key_instantiate()
    KEYS: Fix keyring quota misaccounting on key replacement and unlink
    KEYS: Fix a race between negating a key and reading the error set
    KEYS: Make BIG_KEYS boolean
    apparmor: remove the "task" arg from may_change_ptraced_domain()
    apparmor: remove parent task info from audit logging
    apparmor: remove tsk field from the apparmor_audit_struct
    apparmor: fix capability to not use the current task, during reporting
    Smack: Ptrace access check mode
    ima: provide hash algo info in the xattr
    ima: enable support for larger default filedata hash algorithms
    ima: define kernel parameter 'ima_template=' to change configured default
    ima: add Kconfig default measurement list template
    ...

    Linus Torvalds
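
    The keyring notes in this merge describe an associative array that is
    internally a radix tree with 16 pointers per node, navigated by the
    bits of an index key. The sketch below shows only that navigation idea
    with a fixed 32-bit key consumed 4 bits per level. The fixed key width
    and the names trie_node, trie_insert and trie_lookup are assumptions
    for illustration; this is not the lib/assoc_array.c implementation.

```c
#include <stdint.h>
#include <stdlib.h>

#define FAN_OUT	16	/* 16 pointers per node, i.e. 4 key bits per level */
#define LEVELS	8	/* 32-bit key / 4 bits per level */

struct trie_node {
	void *slot[FAN_OUT];	/* child nodes, or items at the last level */
};

/* Extract the 4-bit chunk of the key used at a given level (0 = top). */
static unsigned int key_nibble(uint32_t key, int level)
{
	return (key >> (28 - 4 * level)) & 0xF;
}

/* Store item under key, creating interior nodes on demand. 0 on success. */
static int trie_insert(struct trie_node *root, uint32_t key, void *item)
{
	struct trie_node *node = root;

	for (int level = 0; level < LEVELS - 1; level++) {
		unsigned int i = key_nibble(key, level);

		if (!node->slot[i]) {
			node->slot[i] = calloc(1, sizeof(struct trie_node));
			if (!node->slot[i])
				return -1;
		}
		node = node->slot[i];
	}
	node->slot[key_nibble(key, LEVELS - 1)] = item;
	return 0;
}

/* Exact lookup: the key bits say which slot to follow at every level. */
static void *trie_lookup(const struct trie_node *root, uint32_t key)
{
	const struct trie_node *node = root;

	for (int level = 0; level < LEVELS - 1; level++) {
		node = node->slot[key_nibble(key, level)];
		if (!node)
			return NULL;
	}
	return node->slot[key_nibble(key, LEVELS - 1)];
}
```

    An exact lookup touches at most one node per level and never scans
    unrelated entries, which is the property the notes rely on for fast
    type+description lookups.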
     
  • Pull audit updates from Eric Paris:
    "Nothing amazing. Formatting, small bug fixes, couple of fixes where
    we didn't get records due to some old VFS changes, and a change to how
    we collect execve info..."

    Fixed conflict in fs/exec.c as per Eric and linux-next.

    * git://git.infradead.org/users/eparis/audit: (28 commits)
    audit: fix type of sessionid in audit_set_loginuid()
    audit: call audit_bprm() only once to add AUDIT_EXECVE information
    audit: move audit_aux_data_execve contents into audit_context union
    audit: remove unused envc member of audit_aux_data_execve
    audit: Kill the unused struct audit_aux_data_capset
    audit: do not reject all AUDIT_INODE filter types
    audit: suppress stock memalloc failure warnings since already managed
    audit: log the audit_names record type
    audit: add child record before the create to handle case where create fails
    audit: use given values in tty_audit enable api
    audit: use nlmsg_len() to get message payload length
    audit: use memset instead of trying to initialize field by field
    audit: fix info leak in AUDIT_GET requests
    audit: update AUDIT_INODE filter rule to comparator function
    audit: audit feature to set loginuid immutable
    audit: audit feature to only allow unsetting the loginuid
    audit: allow unsetting the loginuid (with priv)
    audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
    audit: loginuid functions coding style
    selinux: apply selinux checks on new audit message types
    ...

    Linus Torvalds
     

21 Nov, 2013

1 commit

  • This reverts commit ea1e7ed33708c7a760419ff9ded0a6cb90586a50.

    Al points out that while the commit *does* actually create a separate
    slab for the page->ptl allocation, that slab is never actually used, and
    the code continues to use kmalloc/kfree.

    Damien Wyart points out that the original patch did have the conversion
    to use kmem_cache_alloc/free, so it got lost somewhere on its way to me.

    Revert the half-arsed attempt that didn't do anything. If we really do
    want the special slab (remember: this is all relevant just for debug
    builds, so it's not necessarily all that critical) we might as well redo
    the patch fully.

    Reported-by: Al Viro
    Acked-by: Andrew Morton
    Cc: Kirill A Shutemov
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

18 Nov, 2013

1 commit

  • This reverts commit 69f0554ec261fd686ac7fa1c598cc9eb27b83a80.

    This patch breaks randconfig on at least the x86-64 architecture, and
    most likely on others. There is work underway to support uncompressed
    kernels in a generic way, but it looks like it will amount to
    rewriting the support from scratch; see the LKML thread in the Link:
    for info.

    Therefore, revert this change and wait for the fix.

    Reported-by: Pavel Roskin
    Cc: Christian Ruppert
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20131113113418.167b8ffd@IRBT4585
    Signed-off-by: H. Peter Anvin
    Signed-off-by: Linus Torvalds

    H. Peter Anvin
     

16 Nov, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "Usual earth-shaking, news-breaking, rocket science pile from
    trivial.git"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
    doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
    doc: add missing files to timers/00-INDEX
    timekeeping: Fix some trivial typos in comments
    mm: Fix some trivial typos in comments
    irq: Fix some trivial typos in comments
    NUMA: fix typos in Kconfig help text
    mm: update 00-INDEX
    doc: Documentation/DMA-attributes.txt fix typo
    DRM: comment: `halve' -> `half'
    Docs: Kconfig: `devlopers' -> `developers'
    doc: typo on word accounting in kprobes.c in mutliple architectures
    treewide: fix "usefull" typo
    treewide: fix "distingush" typo
    mm/Kconfig: Grammar s/an/a/
    kexec: Typo s/the/then/
    Documentation/kvm: Update cpuid documentation for steal time and pv eoi
    treewide: Fix common typo in "identify"
    __page_to_pfn: Fix typo in comment
    Correct some typos for word frequency
    clk: fixed-factor: Fix a trivial typo
    ...

    Linus Torvalds