05 Oct, 2012

1 commit

  • Pull KVM updates from Avi Kivity:
    "Highlights of the changes for this release include support for vfio
    level triggered interrupts, improved big real mode support on older
    Intels, a streamlines guest page table walker, guest APIC speedups,
    PIO optimizations, better overcommit handling, and read-only memory."

    * tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
    KVM: s390: Fix vcpu_load handling in interrupt code
    KVM: x86: Fix guest debug across vcpu INIT reset
    KVM: Add resampling irqfds for level triggered interrupts
    KVM: optimize apic interrupt delivery
    KVM: MMU: Eliminate pointless temporary 'ac'
    KVM: MMU: Avoid access/dirty update loop if all is well
    KVM: MMU: Eliminate eperm temporary
    KVM: MMU: Optimize is_last_gpte()
    KVM: MMU: Simplify walk_addr_generic() loop
    KVM: MMU: Optimize pte permission checks
    KVM: MMU: Update accessed and dirty bits after guest pagetable walk
    KVM: MMU: Move gpte_access() out of paging_tmpl.h
    KVM: MMU: Optimize gpte_access() slightly
    KVM: MMU: Push clean gpte write protection out of gpte_access()
    KVM: clarify kvmclock documentation
    KVM: make processes waiting on vcpu mutex killable
    KVM: SVM: Make use of asm.h
    KVM: VMX: Make use of asm.h
    KVM: VMX: Make lto-friendly
    KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
    ...

    Conflicts:
    arch/s390/include/asm/processor.h
    arch/x86/kvm/i8259.c

    Linus Torvalds
     

03 Oct, 2012

8 commits

  • Pull security subsystem updates from James Morris:
    "Highlights:

    - Integrity: add local fs integrity verification to detect offline
    attacks
    - Integrity: add digital signature verification
    - Simple stacking of Yama with other LSMs (per LSS discussions)
    - IBM vTPM support on ppc64
    - Add new driver for Infineon I2C TIS TPM
    - Smack: add rule revocation for subject labels"

    Fixed conflicts with the user namespace support in kernel/auditsc.c and
    security/integrity/ima/ima_policy.c.

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits)
    Documentation: Update git repository URL for Smack userland tools
    ima: change flags container data type
    Smack: setprocattr memory leak fix
    Smack: implement revoking all rules for a subject label
    Smack: remove task_wait() hook.
    ima: audit log hashes
    ima: generic IMA action flag handling
    ima: rename ima_must_appraise_or_measure
    audit: export audit_log_task_info
    tpm: fix tpm_acpi sparse warning on different address spaces
    samples/seccomp: fix 31 bit build on s390
    ima: digital signature verification support
    ima: add support for different security.ima data types
    ima: add ima_inode_setxattr/removexattr function and calls
    ima: add inode_post_setattr call
    ima: replace iint spinblock with rwlock/read_lock
    ima: allocating iint improvements
    ima: add appraise action keywords and default rules
    ima: integrity appraisal extension
    vfs: move ima_file_free before releasing the file
    ...

    Linus Torvalds
     
  • Pull vfs update from Al Viro:

    - big one - consolidation of descriptor-related logics; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c spiltoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • Pull power management updates from Rafael J Wysocki:

    - Improved system suspend/resume and runtime PM handling for the SH
    TMU, CMT and MTU2 clock event devices (also used by ARM/shmobile).

    - Generic PM domains framework extensions related to cpuidle support
    and domain objects lookup using names.

    - ARM/shmobile power management updates including improved support for
    the SH7372's A4S power domain containing the CPU core.

    - cpufreq changes related to AMD CPUs support from Matthew Garrett,
    Andre Przywara and Borislav Petkov.

    - cpu0 cpufreq driver from Shawn Guo.

    - cpufreq governor fixes related to the relaxing of limit from Michal
    Pecio.

    - OMAP cpufreq updates from Axel Lin and Richard Zhao.

    - cpuidle ladder governor fixes related to the disabling of states from
    Carsten Emde and me.

    - Runtime PM core updates related to the interactions with the system
    suspend core from Alan Stern and Kevin Hilman.

    - Wakeup sources modification allowing more helper functions to be
    called from interrupt context from John Stultz and additional
    diagnostic code from Todd Poynor.

    - System suspend error code path fix from Feng Hong.

    Fixed up conflicts in cpufreq/powernow-k8 that stemmed from the
    workqueue fixes conflicting fairly badly with the removal of support for
    hardware P-state chips. The changes were independent but somewhat
    intertwined.

    * tag 'pm-for-3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
    Revert "PM QoS: Use spinlock in the per-device PM QoS constraints code"
    PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE, even when disabled, v2
    cpuidle: rename function name "__cpuidle_register_driver", v2
    cpufreq: OMAP: Check IS_ERR() instead of NULL for omap_device_get_by_hwmod_name
    cpuidle: remove some empty lines
    PM: Prevent runtime suspend during system resume
    PM QoS: Use spinlock in the per-device PM QoS constraints code
    PM / Sleep: use resume event when call dpm_resume_early
    cpuidle / ACPI : move cpuidle_device field out of the acpi_processor_power structure
    ACPI / processor: remove pointless variable initialization
    ACPI / processor: remove unused function parameter
    cpufreq: OMAP: remove loops_per_jiffy recalculate for smp
    sections: fix section conflicts in drivers/cpufreq
    cpufreq: conservative: update frequency when limits are relaxed
    cpufreq / ondemand: update frequency when limits are relaxed
    properly __init-annotate pm_sysrq_init()
    cpufreq: Add a generic cpufreq-cpu0 driver
    PM / OPP: Initialize OPP table from device tree
    ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp
    cpufreq: Remove support for hardware P-state chips from powernow-k8
    ...

    Linus Torvalds
     
  • Pull networking changes from David Miller:

    1) GRE now works over ipv6, from Dmitry Kozlov.

    2) Make SCTP more network namespace aware, from Eric Biederman.

    3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

    4) Make openvswitch network namespace aware, from Pravin B Shelar.

    5) IPV6 NAT implementation, from Patrick McHardy.

    6) Server side support for TCP Fast Open, from Jerry Chu and others.

    7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

    8) Increate the loopback default MTU to 64K, from Eric Dumazet.

    9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic. This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

    10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail. Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

    11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

    12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

    13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

    Fix up various fairly trivial conflicts, mostly caused by the user
    namespace changes.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
    hyperv: Add buffer for extended info after the RNDIS response message.
    hyperv: Report actual status in receive completion packet
    hyperv: Remove extra allocated space for recv_pkt_list elements
    hyperv: Fix page buffer handling in rndis_filter_send_request()
    hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
    hyperv: Fix the max_xfer_size in RNDIS initialization
    vxlan: put UDP socket in correct namespace
    vxlan: Depend on CONFIG_INET
    sfc: Fix the reported priorities of different filter types
    sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
    sfc: Fix loopback self-test with separate_tx_channels=1
    sfc: Fix MCDI structure field lookup
    sfc: Add parentheses around use of bitfield macro arguments
    sfc: Fix null function pointer in efx_sriov_channel_type
    vxlan: virtual extensible lan
    igmp: export symbol ip_mc_leave_group
    netlink: add attributes to fdb interface
    tg3: unconditionally select HWMON support when tg3 is enabled.
    Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
    gre: fix sparse warning
    ...

    Linus Torvalds
     
  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values are far down into
    subsystems and filesystems as reasonable. Leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network.
    Letting compile type incompatible compile errors (present when user
    namespaces are enabled) guide me to find the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privlige checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
    root in a user names to do those things that today we only forbid to
    non-root users because it will confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/git logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
     
  • Pull cgroup hierarchy update from Tejun Heo:
    "Currently, different cgroup subsystems handle nested cgroups
    completely differently. There's no consistency among subsystems and
    the behaviors often are outright broken.

    People at least seem to agree that the broken hierarhcy behaviors need
    to be weeded out if any progress is gonna be made on this front and
    that the fallouts from deprecating the broken behaviors should be
    acceptable especially given that the current behaviors don't make much
    sense when nested.

    This patch makes cgroup emit warning messages if cgroups for
    subsystems with broken hierarchy behavior are nested to prepare for
    fixing them in the future. This was put in a separate branch because
    more related changes were expected (didn't make it this round) and the
    memory cgroup wanted to pull in this and make changes on top."

    * 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:

    - xattr support added. The implementation is shared with tmpfs. The
    usage is restricted and intended to be used to manage per-cgroup
    metadata by system software. tmpfs changes are routed through this
    branch with Hugh's permission.

    - cgroup subsystem ID handling simplified.

    * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: Define CGROUP_SUBSYS_COUNT according the configuration
    cgroup: Assign subsystem IDs during compile time
    cgroup: Do not depend on a given order when populating the subsys array
    cgroup: Wrap subsystem selection macro
    cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
    cgroup: net_prio: Do not define task_netpioidx() when not selected
    cgroup: net_cls: Do not define task_cls_classid() when not selected
    cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h
    cgroup: trivial fixes for Documentation/cgroups/cgroups.txt
    xattr: mark variable as uninitialized to make both gcc and smatch happy
    fs: add missing documentation to simple_xattr functions
    cgroup: add documentation on extended attributes usage
    cgroup: rename subsys_bits to subsys_mask
    cgroup: add xattr support
    cgroup: revise how we re-populate root directory
    xattr: extract simple_xattr code from tmpfs

    Linus Torvalds
     
  • Pull workqueue changes from Tejun Heo:
    "This is workqueue updates for v3.7-rc1. A lot of activities this
    round including considerable API and behavior cleanups.

    * delayed_work combines a timer and a work item. The handling of the
    timer part has always been a bit clunky leading to confusing
    cancelation API with weird corner-case behaviors. delayed_work is
    updated to use new IRQ safe timer and cancelation now works as
    expected.

    * Another deficiency of delayed_work was lack of the counterpart of
    mod_timer() which led to cancel+queue combinations or open-coded
    timer+work usages. mod_delayed_work[_on]() are added.

    These two delayed_work changes make delayed_work provide interface
    and behave like timer which is executed with process context.

    * A work item could be executed concurrently on multiple CPUs, which
    is rather unintuitive and made flush_work() behavior confusing and
    half-broken under certain circumstances. This problem doesn't
    exist for non-reentrant workqueues. While non-reentrancy check
    isn't free, the overhead is incurred only when a work item bounces
    across different CPUs and even in simulated pathological scenario
    the overhead isn't too high.

    All workqueues are made non-reentrant. This removes the
    distinction between flush_[delayed_]work() and
    flush_[delayed_]_work_sync(). The former is now as strong as the
    latter and the specified work item is guaranteed to have finished
    execution of any previous queueing on return.

    * In addition to the various bug fixes, Lai redid and simplified CPU
    hotplug handling significantly.

    * Joonsoo introduced system_highpri_wq and used it during CPU
    hotplug.

    There are two merge commits - one to pull in IRQ safe timer from
    tip/timers/core and the other to pull in CPU hotplug fixes from
    wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

    Fixed a number of trivial conflicts, but the more interesting conflicts
    were silent ones where the deprecated interfaces had been used by new
    code in the merge window, and thus didn't cause any real data conflicts.

    Tejun pointed out a few of them, I fixed a couple more.

    * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
    workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
    workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
    workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
    workqueue: remove @delayed from cwq_dec_nr_in_flight()
    workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
    workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
    workqueue: use __cpuinit instead of __devinit for cpu callbacks
    workqueue: rename manager_mutex to assoc_mutex
    workqueue: WORKER_REBIND is no longer necessary for idle rebinding
    workqueue: WORKER_REBIND is no longer necessary for busy rebinding
    workqueue: reimplement idle worker rebinding
    workqueue: deprecate __cancel_delayed_work()
    workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
    workqueue: use mod_delayed_work() instead of __cancel + queue
    workqueue: use irqsafe timer for delayed_work
    workqueue: clean up delayed_work initializers and add missing one
    workqueue: make deferrable delayed_work initializer names consistent
    workqueue: cosmetic whitespace updates for macro definitions
    workqueue: deprecate system_nrt[_freezable]_wq
    workqueue: deprecate flush[_delayed]_work_sync()
    ...

    Linus Torvalds
     

02 Oct, 2012

9 commits

  • Pull TTY changes from Greg Kroah-Hartman:
    "As we skipped the merge window for 3.6-rc1 for the tty tree,
    everything is now settled down and working properly, so we are ready
    for 3.7-rc1. Here's the patchset, it's big, but the large changes are
    removing a firmware file and adding a staging tty driver (it depended
    on the tty core changes, so it's going through this tree instead of
    the staging tree.)

    All of these patches have been in the linux-next tree for a while.

    Signed-off-by: Greg Kroah-Hartman "

    Fix up more-or-less trivial conflicts in
    - drivers/char/pcmcia/synclink_cs.c:
    tty NULL dereference fix vs tty_port_cts_enabled() helper function
    - drivers/staging/{Kconfig,Makefile}:
    add-add conflict (dgrp driver added close to other staging drivers)
    - drivers/staging/ipack/devices/ipoctal.c:
    "split ipoctal_channel from iopctal" vs "TTY: use tty_port_register_device"

    * tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (235 commits)
    tty/serial: Add kgdb_nmi driver
    tty/serial/amba-pl011: Quiesce interrupts in poll_get_char
    tty/serial/amba-pl011: Implement poll_init callback
    tty/serial/core: Introduce poll_init callback
    kdb: Turn KGDB_KDB=n stubs into static inlines
    kdb: Implement disable_nmi command
    kernel/debug: Mask KGDB NMI upon entry
    serial: pl011: handle corruption at high clock speeds
    serial: sccnxp: Make 'default' choice in switch last
    serial: sccnxp: Remove mask termios caps for SW flow control
    serial: sccnxp: Report actual baudrate back to core
    serial: samsung: Add poll_get_char & poll_put_char
    Powerpc 8xx CPM_UART setting MAXIDL register proportionaly to baud rate
    Powerpc 8xx CPM_UART maxidl should not depend on fifo size
    Powerpc 8xx CPM_UART too many interrupts
    Powerpc 8xx CPM_UART desynchronisation
    serial: set correct baud_base for EXSYS EX-41092 Dual 16950
    serial: omap: fix the reciever line error case
    8250: blacklist Winbond CIR port
    8250_pnp: do pnp probe before legacy probe
    ...

    Linus Torvalds
     
  • Pull arm64 support from Catalin Marinas:
    "Linux support for the 64-bit ARM architecture (AArch64)

    Features currently supported:
    - 39-bit address space for user and kernel (each)
    - 4KB and 64KB page configurations
    - Compat (32-bit) user applications (ARMv7, EABI only)
    - Flattened Device Tree (mandated for all AArch64 platforms)
    - ARM generic timers"

    * tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (35 commits)
    arm64: ptrace: remove obsolete ptrace request numbers from user headers
    arm64: Do not set the SMP/nAMP processor bit
    arm64: MAINTAINERS update
    arm64: Build infrastructure
    arm64: Miscellaneous header files
    arm64: Generic timers support
    arm64: Loadable modules
    arm64: Miscellaneous library functions
    arm64: Performance counters support
    arm64: Add support for /proc/sys/debug/exception-trace
    arm64: Debugging support
    arm64: Floating point and SIMD
    arm64: 32-bit (compat) applications support
    arm64: User access library functions
    arm64: Signal handling support
    arm64: VDSO support
    arm64: System calls handling
    arm64: ELF definitions
    arm64: SMP support
    arm64: DMA mapping API
    ...

    Linus Torvalds
     
  • Pull x86/asm changes from Ingo Molnar:
    "The one change that stands out is the alternatives patching change
    that prevents us from ever patching back instructions from SMP to UP:
    this simplifies things and speeds up CPU hotplug.

    Other than that it's smaller fixes, cleanups and improvements."

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Unspaghettize do_trap()
    x86_64: Work around old GAS bug
    x86: Use REP BSF unconditionally
    x86: Prefer TZCNT over BFS
    x86/64: Adjust types of temporaries used by ffs()/fls()/fls64()
    x86: Drop unnecessary kernel_eflags variable on 64-bit
    x86/smp: Don't ever patch back to UP if we unplug cpus

    Linus Torvalds
     
  • Pull timer changes from Ingo Molnar:
    "Timer enhancements, generalizations and cleanups from Tejun Heo, in
    preparation for workqueue facility enhancements."

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timer: Implement TIMER_IRQSAFE
    timer: Clean up timer initializers
    timer: Relocate declarations of init_timer_on_stack_key()
    timer: Generalize timer->base flags handling

    Linus Torvalds
     
  • Pull scheduler changes from Ingo Molnar:
    "Continued quest to clean up and enhance the cputime code by Frederic
    Weisbecker, in preparation for future tickless kernel features.

    Other than that, smallish changes."

    Fix up trivial conflicts due to additions next to each other in arch/{x86/}Kconfig

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    cputime: Make finegrained irqtime accounting generally available
    cputime: Gather time/stats accounting config options into a single menu
    ia64: Reuse system and user vtime accounting functions on task switch
    ia64: Consolidate user vtime accounting
    vtime: Consolidate system/idle context detection
    cputime: Use a proper subsystem naming for vtime related APIs
    sched: cpu_power: enable ARCH_POWER
    sched/nohz: Clean up select_nohz_load_balancer()
    sched: Fix load avg vs. cpu-hotplug
    sched: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW
    sched: Fix nohz_idle_balance()
    sched: Remove useless code in yield_to()
    sched: Add time unit suffix to sched sysctl knobs
    sched/debug: Limit sd->*_idx range on sysctl
    sched: Remove AFFINE_WAKEUPS feature flag
    s390: Remove leftover account_tick_vtime() header
    cputime: Consolidate vtime handling on context switch
    sched: Move cputime code to its own file
    cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
    tile: Remove SD_PREFER_LOCAL leftover
    ...

    Linus Torvalds
     
  • Pull perf update from Ingo Molnar:
    "Lots of changes in this cycle as well, with hundreds of commits from
    over 30 contributors. Most of the activity was on the tooling side.

    Higher level changes:

    - New 'perf kvm' analysis tool, from Xiao Guangrong.

    - New 'perf trace' system-wide tracing tool

    - uprobes fixes + cleanups from Oleg Nesterov.

    - Lots of patches to make perf build on Android out of box, from
    Irina Tirdea

    - Extend ftrace function tracing utility to be more dynamic for its
    users. It allows for data passing to the callback functions, as
    well as reading regs as if a breakpoint were to trigger at function
    entry.

    The main goal of this patch series was to allow kprobes to use
    ftrace as an optimized probe point when a probe is placed on an
    ftrace nop. With lots of help from Masami Hiramatsu, and going
    through lots of iterations, we finally came up with a good
    solution.

    - Add cpumask for uncore pmu, use it in 'stat', from Yan, Zheng.

    - Various tracing updates from Steve Rostedt

    - Clean up and improve 'perf sched' performance by elliminating lots
    of needless calls to libtraceevent.

    - Event group parsing support, from Jiri Olsa

    - UI/gtk refactorings and improvements from Namhyung Kim

    - Add support for non-tracepoint events in perf script python, from
    Feng Tang

    - Add --symbols to 'script', similar to the one in 'report', from
    Feng Tang.

    Infrastructure enhancements and fixes:

    - Convert the trace builtins to use the growing evsel/evlist
    tracepoint infrastructure, removing several open coded constructs
    like switch like series of strcmp to dispatch events, etc.
    Basically what had already been showcased in 'perf sched'.

    - Add evsel constructor for tracepoints, that uses libtraceevent just
    to parse the /format events file, use it in a new 'perf test' to
    make sure the libtraceevent format parsing regressions can be more
    readily caught.

    - Some strange errors were happening in some builds, but not on the
    next, reported by several people, problem was some parser related
    files, generated during the build, didn't had proper make deps, fix
    from Eric Sandeen.

    - Introduce struct and cache information about the environment where
    a perf.data file was captured, from Namhyung Kim.

    - Fix handling of unresolved samples when --symbols is used in
    'report', from Feng Tang.

    - Add union member access support to 'probe', from Hyeoncheol Lee.

    - Fixups to die() removal, from Namhyung Kim.

    - Render fixes for the TUI, from Namhyung Kim.

    - Don't enable annotation in non symbolic view, from Namhyung Kim.

    - Fix pipe mode in 'report', from Namhyung Kim.

    - Move related stats code from stat to util/, will be used by the
    'stat' kvm tool, from Xiao Guangrong.

    - Remove die()/exit() calls from several tools.

    - Resolve vdso callchains, from Jiri Olsa

    - Don't pass const char pointers to basename, so that we can
    unconditionally use libgen.h and thus avoid ifdef BIONIC lines,
    from David Ahern

    - Refactor hist formatting so that it can be reused with the GTK
    browser, From Namhyung Kim

    - Fix build for another rbtree.c change, from Adrian Hunter.

    - Make 'perf diff' command work with evsel hists, from Jiri Olsa.

    - Use the only field_sep var that is set up: symbol_conf.field_sep,
    fix from Jiri Olsa.

    - .gitignore compiled python binaries, from Namhyung Kim.

    - Get rid of die() in more libtraceevent places, from Namhyung Kim.

    - Rename libtraceevent 'private' struct member to 'priv' so that it
    works in C++, from Steven Rostedt

    - Remove lots of exit()/die() calls from tools so that the main perf
    exit routine can take place, from David Ahern

    - Fix x86 build on x86-64, from David Ahern.

    - {int,str,rb}list fixes from Suzuki K Poulose

    - perf.data header fixes from Namhyung Kim

    - Allow user to indicate objdump path, needed in cross environments,
    from Maciek Borzecki

    - Fix hardware cache event name generation, fix from Jiri Olsa

    - Add round trip test for sw, hw and cache event names, catching the
    problem Jiri fixed, after Jiri's patch, the test passes
    successfully.

    - Clean target should do clean for lib/traceevent too, fix from David
    Ahern

    - Check the right variable for allocation failure, fix from Namhyung
    Kim

    - Set up evsel->tp_format regardless of evsel->name being set
    already, fix from Namhyung Kim

    - Oprofile fixes from Robert Richter.

    - Remove perf_event_attr needless version inflation, from Jiri Olsa

    - Introduce libtraceevent strerror like error reporting facility,
    from Namhyung Kim

    - Add pmu mappings to perf.data header and use event names from cmd
    line, from Robert Richter

    - Fix include order for bison/flex-generated C files, from Ben
    Hutchings

    - Build fixes and documentation corrections from David Ahern

    - Assorted cleanups from Robert Richter

    - Let O= makes handle relative paths, from Steven Rostedt

    - perf script python fixes, from Feng Tang.

    - Initial bash completion support, from Frederic Weisbecker

    - Allow building without libelf, from Namhyung Kim.

    - Support DWARF CFI based unwind to have callchains when %bp based
    unwinding is not possible, from Jiri Olsa.

    - Symbol resolution fixes, while fixing support PPC64 files with an
    .opt ELF section was the end goal, several fixes for code that
    handles all architectures and cleanups are included, from Cody
    Schafer.

    - Assorted fixes for Documentation and build in 32 bit, from Robert
    Richter

    - Cache the libtraceevent event_format associated to each evsel
    early, so that we avoid relookups, i.e. calling pevent_find_event
    repeatedly when processing tracepoint events.

    [ This is to reduce the surface contact with libtraceevents and
    make clear what is that the perf tools needs from that lib: so
    far parsing the common and per event fields. ]

    - Don't stop the build if the audit libraries are not installed, fix
    from Namhyung Kim.

    - Fix bfd.h/libbfd detection with recent binutils, from Markus
    Trippelsdorf.

    - Improve warning message when libunwind devel packages not present,
    from Jiri Olsa"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits)
    perf trace: Add aliases for some syscalls
    perf probe: Print an enum type variable in "enum variable-name" format when showing accessible variables
    perf tools: Check libaudit availability for perf-trace builtin
    perf hists: Add missing period_* fields when collapsing a hist entry
    perf trace: New tool
    perf evsel: Export the event_format constructor
    perf evsel: Introduce rawptr() method
    perf tools: Use perf_evsel__newtp in the event parser
    perf evsel: The tracepoint constructor should store sys:name
    perf evlist: Introduce set_filter() method
    perf evlist: Renane set_filters method to apply_filters
    perf test: Add test to check we correctly parse and match syscall open parms
    perf evsel: Handle endianity in intval method
    perf evsel: Know if byte swap is needed
    perf tools: Allow handling a NULL cpu_map as meaning "all cpus"
    perf evsel: Improve tracepoint constructor setup
    tools lib traceevent: Fix error path on pevent_parse_event
    perf test: Fix build failure
    trace: Move trace event enable from fs_initcall to core_initcall
    tracing: Add an option for disabling markers
    ...

    Linus Torvalds
     
  • Pull trivial irq core update from Ingo Molnar:
    "Two symbol exports for modular irq-chip drivers"

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Export dummy_irq_chip
    genirq: Export irq_set_chip_and_handler_name()

    Linus Torvalds
     
  • Pull core locking changes from Ingo Molnar:
    "It includes a lockdep improvement plus a spinlock inlining Kconfig
    cleanup."

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking: Adjust spin lock inlining Kconfig options
    lockdep: Check if nested lock is actually held

    Linus Torvalds
     
  • Pull core kernel fixes from Ingo Molnar:
    "This is a complex task_work series from Oleg that fixes the bug that
    this VFS commit tried to fix:

    d35abdb28824 hold task_lock around checks in keyctl

    but solves the problem without the lockup regression that d35abdb28824
    introduced in v3.6.

    This series came late in v3.6 and I did not feel confident about it so
    late in the cycle. Might be worth backporting to -stable if it proves
    itself upstream."

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    task_work: Simplify the usage in ptrace_notify() and get_signal_to_deliver()
    task_work: Revert "hold task_lock around checks in keyctl"
    task_work: task_work_add() should not succeed after exit_task_work()
    task_work: Make task_work_add() lockless

    Linus Torvalds
     

29 Sep, 2012

1 commit

  • Conflicts:
    drivers/net/team/team.c
    drivers/net/usb/qmi_wwan.c
    net/batman-adv/bat_iv_ogm.c
    net/ipv4/fib_frontend.c
    net/ipv4/route.c
    net/l2tp/l2tp_netlink.c

    The team, fib_frontend, route, and l2tp_netlink conflicts were simply
    overlapping changes.

    qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

    With help from Antonio Quartulli.

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Sep, 2012

1 commit


27 Sep, 2012

7 commits


26 Sep, 2012

13 commits

  • Checking "user" before "is_idle_task()" allows better optimizations
    in cases where inlining is possible. Also, "bool" should be passed
    "true" or "false" rather than "1" or "0". This commit therefore makes
    these changes, as noted in Josh's review.

    Reported-by: Josh Triplett
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • Provide a config option that enables the userspace
    RCU extended quiescent state on every CPUs by default.

    This is for testing purpose.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Josh Triplett
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • When exceptions or irq are about to resume userspace, if
    the task needs to be rescheduled, the arch low level code
    calls schedule() directly.

    If we call it, it is because we have the TIF_RESCHED flag:

    - It can be set after random local calls to set_need_resched()
    (RCU, drm, ...)

    - A wake up happened and the CPU needs preemption. This can
    happen in several ways:

    * Remotely: the remote waking CPU has set TIF_RESCHED and send the
    wakee an IPI to schedule the new task.
    * Remotely enqueued: the remote waking CPU sends an IPI to the target
    and the wake up is made by the target.
    * Locally: waking CPU == wakee CPU and the wakeup is done locally.
    set_need_resched() is called without IPI.

    In the case of local and remotely enqueued wake ups, the tick can
    be restarted when we enqueue the new task and RCU can exit the
    extended quiescent state at the same time. Then by the time we reach
    irq exit path and we call schedule, we are not in RCU user mode.

    But if we call schedule() only because something called set_need_resched(),
    RCU may still be in user mode when we reach schedule.

    Also if a wake up is done remotely, the CPU might see the TIF_RESCHED
    flag and call schedule while the IPI has not yet happen to restart the
    tick and exit RCU user mode.

    We need to manually protect against these corner cases.

    Create a new API schedule_user() that calls schedule() inside
    rcu_user_exit()-rcu_user_enter() in order to protect it. Archs
    will need to rely on it now to implement user preemption safely.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Josh Triplett
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • When an exception or an irq exits, and we are going to resume into
    interrupted kernel code, the low level architecture code calls
    preempt_schedule_irq() if there is a need to reschedule.

    If the interrupt/exception occured between a call to rcu_user_enter()
    (from syscall exit, exception exit, do_notify_resume exit, ...) and
    a real resume to userspace (iret,...), preempt_schedule_irq() can be
    called whereas RCU thinks we are in userspace. But preempt_schedule_irq()
    is going to run kernel code and may be some RCU read side critical
    section. We must exit the userspace extended quiescent state before
    we call it.

    To solve this, just call rcu_user_exit() in the beginning of
    preempt_schedule_irq().

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Josh Triplett
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • Clear the syscalls hook of a task when it's scheduled out so that if
    the task migrates, it doesn't run the syscall slow path on a CPU
    that might not need it.

    Also set the syscalls hook on the next task if needed.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Josh Triplett
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • By default we don't want to enter into RCU extended quiescent
    state while in userspace because doing this produces some overhead
    (eg: use of syscall slowpath). Set it off by default and ready to
    run when some feature like adaptive tickless need it.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Josh Triplett
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • Allow calls to rcu_user_enter() even if we are already
    in userspace (as seen by RCU) and allow calls to rcu_user_exit()
    even if we are already in the kernel.

    This makes the APIs more flexible to be called from architectures.
    Exception entries for example won't need to know if they come from
    userspace before calling rcu_user_exit().

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Josh Triplett
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • Create a new config option under the RCU menu that put
    CPUs under RCU extended quiescent state (as in dynticks
    idle mode) when they run in userspace. This require
    some contribution from architectures to hook into kernel
    and userspace boundaries.

    Signed-off-by: Frederic Weisbecker
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Josh Triplett
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • The current implementation of RCU_FAST_NO_HZ tries reasonably hard to rid
    the current CPU of RCU callbacks. This is appropriate when the CPU is
    entering idle, where it doesn't have much useful to do anyway, but is most
    definitely not what you want when transitioning to user-mode execution.
    This commit therefore detects the adaptive-tick case, and refrains from
    burning CPU time getting rid of RCU callbacks in that case.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Paul E. McKenney
     
  • In some cases, it is necessary to enter or exit userspace-RCU-idle mode
    from an interrupt handler, for example, if some other CPU sends this
    CPU a resched IPI. In this case, the current CPU would enter the IPI
    handler in userspace-RCU-idle mode, but would need to exit the IPI handler
    after having exited that mode.

    To allow this to work, this commit adds two new APIs to TREE_RCU:

    - rcu_user_enter_after_irq(). This must be called from an interrupt between
    rcu_irq_enter() and rcu_irq_exit(). After the irq calls rcu_irq_exit(),
    the irq handler will return into an RCU extended quiescent state.
    In theory, this interrupt is never a nested interrupt, but in practice
    it might interrupt softirq, which looks to RCU like a nested interrupt.

    - rcu_user_exit_after_irq(). This must be called from a non-nesting
    interrupt, interrupting an RCU extended quiescent state, also
    between rcu_irq_enter() and rcu_irq_exit(). After the irq calls
    rcu_irq_exit(), the irq handler will return in an RCU non-quiescent
    state.

    [ Combined with "Allow calls to rcu_exit_user_irq from nesting irqs." ]

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • RCU currently insists that only idle tasks can enter RCU idle mode, which
    prohibits an adaptive tickless kernel (AKA nohz cpusets), which in turn
    would mean that usermode execution would always take scheduling-clock
    interrupts, even when there is only one task runnable on the CPU in
    question.

    This commit therefore adds rcu_user_enter() and rcu_user_exit(), which
    allow non-idle tasks to enter RCU idle mode. These are quite similar
    to rcu_idle_enter() and rcu_idle_exit(), respectively, except that they
    omit the idle-task checks.

    [ Updated to use "user" flag rather than separate check functions. ]

    [ paulmck: Updated to drop exports of new functions based on Josh's patch
    getting rid of the need for them. ]

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Paul E. McKenney
    Cc: Alessio Igor Bogani
    Cc: Andrew Morton
    Cc: Avi Kivity
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Daniel Lezcano
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Max Krasnyansky
    Cc: Peter Zijlstra
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Sven-Thorsten Dietrich
    Cc: Thomas Gleixner
    Reviewed-by: Josh Triplett

    Frederic Weisbecker
     
  • Resolved conflict in kernel/sched/core.c using Peter Zijlstra's
    approach from https://lkml.org/lkml/2012/9/5/585.

    Paul E. McKenney
     
  • The conflicts between kernel/rcutree.h and kernel/rcutree_plugin.h
    were due to adjacent insertions and deletions, which were resolved
    by simply accepting the changes on both branches.

    Paul E. McKenney