11 Dec, 2013

1 commit

  • Introduce mul_u64_u32_shr() as proposed by Andy a while back; it
    allows using 64x64->128 muls on 64bit archs and recent GCC
    which defines __SIZEOF_INT128__ and __int128.

    (This new method will be used by the scheduler.)
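
    As a rough userspace illustration of the idea (not the kernel source
    itself), the sketch below computes (a * mul) >> shift either with a
    single 128-bit multiply, when the compiler defines __SIZEOF_INT128__,
    or with two 32x32->64 multiplies otherwise. The helper names
    mul_shr_wide() and mul_shr_split() are made up for this example, and
    the split path assumes shift <= 32.

```c
#include <stdint.h>

#if defined(__SIZEOF_INT128__)
/* Wide path: one 64x64->128 multiply, then shift. Needs __int128 support. */
static uint64_t mul_shr_wide(uint64_t a, uint32_t mul, unsigned int shift)
{
    return (uint64_t)(((unsigned __int128)a * mul) >> shift);
}
#endif

/* Split path: two 32x32->64 multiplies. Only valid for shift <= 32. */
static uint64_t mul_shr_split(uint64_t a, uint32_t mul, unsigned int shift)
{
    uint32_t ah = (uint32_t)(a >> 32);
    uint32_t al = (uint32_t)a;
    uint64_t ret = ((uint64_t)al * mul) >> shift;

    if (ah)
        ret += ((uint64_t)ah * mul) << (32 - shift);
    return ret;
}

/* Pick the wide path when the compiler offers a 128-bit integer type. */
static uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
#if defined(__SIZEOF_INT128__)
    return mul_shr_wide(a, mul, shift);
#else
    return mul_shr_split(a, mul, shift);
#endif
}
```

    Both paths truncate the result to 64 bits, so they agree whenever
    shift <= 32.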

    Signed-off-by: Peter Zijlstra
    Cc: fweisbec@gmail.com
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/n/tip-hxjoeuzmrcaumR0uZwjpe2pv@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

22 Nov, 2013

2 commits

  • Pull security subsystem updates from James Morris:
    "In this patchset, we finally get an SELinux update, with Paul Moore
    taking over as maintainer of that code.

    Also a significant update for the Keys subsystem, as well as
    maintenance updates to Smack, IMA, TPM, and Apparmor"

    and since I wanted to know more about the updates to key handling,
    here's the explanation from David Howells on that:

    "Okay. There are a number of separate bits. I'll go over the big bits
    and the odd important other bit; most of the smaller bits are just
    fixes and cleanups. If you want the small bits accounting for, I can
    do that too.

    (1) Keyring capacity expansion.

    KEYS: Consolidate the concept of an 'index key' for key access
    KEYS: Introduce a search context structure
    KEYS: Search for auth-key by name rather than target key ID
    Add a generic associative array implementation.
    KEYS: Expand the capacity of a keyring

    Several of the patches expand the capacity of a keyring. Currently, the
    maximum size of a keyring payload is one page. Subtract a small header
    and divide the rest up into pointers, and that only gives you ~500
    pointers on an x86_64 box. Since the NFS idmapper uses a keyring to
    store ID mapping data, that has proven to be insufficient.
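
    The "~500 pointers" figure can be checked with a line of arithmetic.
    The sketch below assumes a 4096-byte page, 8-byte pointers and a
    16-byte header; the header size is a guess made for illustration,
    since the message only says "small".

```c
#include <stddef.h>

/* Assumed figures: a 4096-byte page and a hypothetical 16-byte header. */
enum { PAGE_BYTES = 4096, HEADER_BYTES = 16 };

/* Number of key pointers that fit in a one-page keyring payload. */
static size_t keyring_capacity(size_t page, size_t header, size_t ptr_size)
{
    return (page - header) / ptr_size;
}
```

    With these assumptions keyring_capacity(PAGE_BYTES, HEADER_BYTES, 8)
    gives 510, in line with the rough figure quoted above.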

    Whatever data structure I use to handle the keyring payload, it can only
    store pointers to keys, not the keys themselves because several keyrings
    may point to a single key. This precludes inserting, say, an rb_node
    struct into the key struct for this purpose.

    I could make an rbtree of records such that each record has an rb_node
    and a key pointer, but that would use four words of space per key stored
    in the keyring. It would, however, be able to use much existing code.

    I selected instead a non-rebalancing radix-tree type approach as that
    could have a better space-used/key-pointer ratio. I could have used the
    radix tree implementation that we already have and insert keys into it by
    their serial numbers, but that means any sort of search must iterate over
    the whole radix tree. Further, its nodes are a bit on the capacious side
    for what I want - especially given that key serial numbers are randomly
    allocated, thus leaving a lot of empty space in the tree.

    So what I have is an associative array that internally is a radix-tree
    with 16 pointers per node where the index key is constructed from the key
    type pointer and the key description. This means that an exact lookup by
    type+description is very fast as this tells us how to navigate directly to
    the target key.

    I made the data structure general and put it in lib/assoc_array.c; as far
    as it is concerned, its index key is just a sequence of bits that leads to
    a pointer. It's possible that someone else will be able to make use of it
    also. FS-Cache might, for example.
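
    To make the "sequence of bits that leads to a pointer" idea concrete,
    here is a deliberately simplified sketch of a 16-way tree that consumes
    four index-key bits per level. It is not the lib/assoc_array.c
    implementation: it assumes a fixed 32-bit index key and a fixed depth,
    and the names trie_node, trie_insert() and trie_lookup() are invented
    for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define FAN_OUT 16          /* 16 pointers per node */
#define LEVELS  (32 / 4)    /* 4 index-key bits consumed per level */

struct trie_node {
    void *slot[FAN_OUT];    /* child nodes, or stored pointers at the last level */
};

/* Extract the 4-bit chunk of the index key used at the given level. */
static unsigned int nibble(uint32_t index_key, int level)
{
    return (index_key >> (level * 4)) & 0xf;
}

/* Store ptr under index_key; returns 0 on success, -1 if allocation fails. */
static int trie_insert(struct trie_node *root, uint32_t index_key, void *ptr)
{
    struct trie_node *node = root;

    for (int level = 0; level < LEVELS - 1; level++) {
        unsigned int i = nibble(index_key, level);

        if (!node->slot[i]) {
            node->slot[i] = calloc(1, sizeof(struct trie_node));
            if (!node->slot[i])
                return -1;
        }
        node = node->slot[i];
    }
    node->slot[nibble(index_key, LEVELS - 1)] = ptr;
    return 0;
}

/* Follow the index key nibble by nibble; no iteration over the whole tree. */
static void *trie_lookup(const struct trie_node *root, uint32_t index_key)
{
    const struct trie_node *node = root;

    for (int level = 0; level < LEVELS - 1; level++) {
        node = node->slot[nibble(index_key, level)];
        if (!node)
            return NULL;
    }
    return node->slot[nibble(index_key, LEVELS - 1)];
}
```

    An exact lookup touches at most LEVELS nodes regardless of how many
    pointers are stored, which is the property the message relies on.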

    (2) Mark keys as 'trusted' and keyrings as 'trusted only'.

    KEYS: verify a certificate is signed by a 'trusted' key
    KEYS: Make the system 'trusted' keyring viewable by userspace
    KEYS: Add a 'trusted' flag and a 'trusted only' flag
    KEYS: Separate the kernel signature checking keyring from module signing

    These patches allow keys carrying asymmetric public keys to be marked as
    being 'trusted' and allow keyrings to be marked as only permitting the
    addition or linkage of trusted keys.

    Keys loaded from hardware during kernel boot or compiled into the kernel
    during build are marked as being trusted automatically. New keys can be
    loaded at runtime with add_key(). They are checked against the system
    keyring contents and if their signatures can be validated with keys that
    are already marked trusted, then they are marked trusted also and can
    thus be added into the master keyring.

    Patches from Mimi Zohar make this usable with the IMA keyrings also.

    (3) Remove the date checks on the key used to validate a module signature.

    X.509: Remove certificate date checks

    It's not reasonable to reject a signature just because the key that it was
    generated with is no longer valid datewise - especially if the kernel
    hasn't yet managed to set the system clock when the first module is
    loaded - so just remove those checks.

    (4) Make it simpler to deal with additional X.509 certificates being loaded into the kernel.

    KEYS: Load *.x509 files into kernel keyring
    KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate

    The builder of the kernel now just places files with the extension ".x509"
    into the kernel source or build trees and they're concatenated by the
    kernel build and stuffed into the appropriate section.

    (5) Add support for userspace kerberos to use keyrings.

    KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
    KEYS: Implement a big key type that can save to tmpfs

    Fedora went to, by default, storing kerberos tickets and tokens in tmpfs.
    We looked at storing it in keyrings instead as that confers certain
    advantages such as tickets being automatically deleted after a certain
    amount of time and the ability for the kernel to get at these tokens more
    easily.

    To make this work, two things were needed:

    (a) A way for the tickets to persist beyond the lifetime of all a user's
    sessions so that cron-driven processes can still use them.

    The problem is that a user's session keyrings are deleted when the
    session that spawned them logs out and the user's user keyring is
    deleted when the UID is deleted (typically when the last log out
    happens), so neither of these places is suitable.

    I've added a system keyring into which a 'persistent' keyring is
    created for each UID on request. Each time a user requests their
    persistent keyring, the expiry time on it is set anew. If the user
    doesn't ask for it for, say, three days, the keyring is automatically
    expired and garbage collected using the existing gc. All the kerberos
    tokens it held are then also gc'd.

    (b) A key type that can hold really big tickets (up to 1MB in size).

    The problem is that Active Directory can return huge tickets with lots
    of auxiliary data attached. We don't, however, want to eat up huge
    tracts of unswappable kernel space for this, so if the ticket is
    greater than a certain size, we create a swappable shmem file and dump
    the contents in there and just live with the fact we then have an
    inode and a dentry overhead. If the ticket is smaller than that, we
    slap it in a kmalloc()'d buffer"

    * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits)
    KEYS: Fix keyring content gc scanner
    KEYS: Fix error handling in big_key instantiation
    KEYS: Fix UID check in keyctl_get_persistent()
    KEYS: The RSA public key algorithm needs to select MPILIB
    ima: define '_ima' as a builtin 'trusted' keyring
    ima: extend the measurement list to include the file signature
    kernel/system_certificate.S: use real contents instead of macro GLOBAL()
    KEYS: fix error return code in big_key_instantiate()
    KEYS: Fix keyring quota misaccounting on key replacement and unlink
    KEYS: Fix a race between negating a key and reading the error set
    KEYS: Make BIG_KEYS boolean
    apparmor: remove the "task" arg from may_change_ptraced_domain()
    apparmor: remove parent task info from audit logging
    apparmor: remove tsk field from the apparmor_audit_struct
    apparmor: fix capability to not use the current task, during reporting
    Smack: Ptrace access check mode
    ima: provide hash algo info in the xattr
    ima: enable support for larger default filedata hash algorithms
    ima: define kernel parameter 'ima_template=' to change configured default
    ima: add Kconfig default measurement list template
    ...

    Linus Torvalds
     
  • Pull audit updates from Eric Paris:
    "Nothing amazing. Formatting, small bug fixes, couple of fixes where
    we didn't get records due to some old VFS changes, and a change to how
    we collect execve info..."

    Fixed conflict in fs/exec.c as per Eric and linux-next.

    * git://git.infradead.org/users/eparis/audit: (28 commits)
    audit: fix type of sessionid in audit_set_loginuid()
    audit: call audit_bprm() only once to add AUDIT_EXECVE information
    audit: move audit_aux_data_execve contents into audit_context union
    audit: remove unused envc member of audit_aux_data_execve
    audit: Kill the unused struct audit_aux_data_capset
    audit: do not reject all AUDIT_INODE filter types
    audit: suppress stock memalloc failure warnings since already managed
    audit: log the audit_names record type
    audit: add child record before the create to handle case where create fails
    audit: use given values in tty_audit enable api
    audit: use nlmsg_len() to get message payload length
    audit: use memset instead of trying to initialize field by field
    audit: fix info leak in AUDIT_GET requests
    audit: update AUDIT_INODE filter rule to comparator function
    audit: audit feature to set loginuid immutable
    audit: audit feature to only allow unsetting the loginuid
    audit: allow unsetting the loginuid (with priv)
    audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
    audit: loginuid functions coding style
    selinux: apply selinux checks on new audit message types
    ...

    Linus Torvalds
     

21 Nov, 2013

1 commit

  • This reverts commit ea1e7ed33708c7a760419ff9ded0a6cb90586a50.

    Al points out that while the commit *does* actually create a separate
    slab for the page->ptl allocation, that slab is never actually used, and
    the code continues to use kmalloc/kfree.

    Damien Wyart points out that the original patch did have the conversion
    to use kmem_cache_alloc/free, so it got lost somewhere on its way to me.

    Revert the half-arsed attempt that didn't do anything. If we really do
    want the special slab (remember: this is all relevant just for debug
    builds, so it's not necessarily all that critical) we might as well redo
    the patch fully.

    Reported-by: Al Viro
    Acked-by: Andrew Morton
    Cc: Kirill A Shutemov
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

18 Nov, 2013

1 commit

  • This reverts commit 69f0554ec261fd686ac7fa1c598cc9eb27b83a80.

    This patch breaks randconfig on at least the x86-64 architecture, and
    most likely on others. There is work underway to support uncompressed
    kernels in a generic way, but it looks like it will amount to
    rewriting the support from scratch; see the LKML thread in the Link:
    for info.

    Therefore, revert this change and wait for the fix.

    Reported-by: Pavel Roskin
    Cc: Christian Ruppert
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20131113113418.167b8ffd@IRBT4585
    Signed-off-by: H. Peter Anvin
    Signed-off-by: Linus Torvalds

    H. Peter Anvin
     

16 Nov, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "Usual earth-shaking, news-breaking, rocket science pile from
    trivial.git"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
    doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
    doc: add missing files to timers/00-INDEX
    timekeeping: Fix some trivial typos in comments
    mm: Fix some trivial typos in comments
    irq: Fix some trivial typos in comments
    NUMA: fix typos in Kconfig help text
    mm: update 00-INDEX
    doc: Documentation/DMA-attributes.txt fix typo
    DRM: comment: `halve' -> `half'
    Docs: Kconfig: `devlopers' -> `developers'
    doc: typo on word accounting in kprobes.c in mutliple architectures
    treewide: fix "usefull" typo
    treewide: fix "distingush" typo
    mm/Kconfig: Grammar s/an/a/
    kexec: Typo s/the/then/
    Documentation/kvm: Update cpuid documentation for steal time and pv eoi
    treewide: Fix common typo in "identify"
    __page_to_pfn: Fix typo in comment
    Correct some typos for word frequency
    clk: fixed-factor: Fix a trivial typo
    ...

    Linus Torvalds
     

15 Nov, 2013

2 commits

  • Pull module updates from Rusty Russell:
    "Mainly boring here, too. rmmod --wait finally removed, though"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    modpost: fix bogus 'exported twice' warnings.
    init: fix in-place parameter modification regression
    asmlinkage, module: Make ksymtab and kcrctab symbols and __this_module __visible
    kernel: add support for init_array constructors
    modpost: Optionally ignore secondary errors seen if a single module build fails
    module: remove rmmod --wait option.

    Linus Torvalds
     
  • If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on x86_64
    is 72 bytes. For page->ptl they will be allocated from the kmalloc-96 slab,
    so we lose 24 bytes on each. An average system can easily allocate a few
    tens of thousands of page->ptl, so the overhead is significant.

    Let's create a separate slab for page->ptl allocation to solve this.
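
    The 24-byte figure follows from rounding a 72-byte object up to the
    next general-purpose size class. A small sketch of that arithmetic,
    assuming the list of size classes below (a subset chosen for
    illustration) and with hypothetical helper names:

```c
#include <stddef.h>

/* Assumed subset of the general-purpose kmalloc size classes, in bytes. */
static const size_t kmalloc_classes[] = { 8, 16, 32, 64, 96, 128, 192, 256 };

/* Smallest listed class that fits size, or 0 if none does. */
static size_t kmalloc_class_for(size_t size)
{
    size_t n = sizeof(kmalloc_classes) / sizeof(kmalloc_classes[0]);

    for (size_t i = 0; i < n; i++)
        if (kmalloc_classes[i] >= size)
            return kmalloc_classes[i];
    return 0;
}

/* Bytes wasted per object when it is served from the general classes. */
static size_t kmalloc_waste(size_t size)
{
    size_t cls = kmalloc_class_for(size);

    return cls ? cls - size : 0;
}
```

    For a 72-byte spinlock_t this gives the 96-byte class and 24 wasted
    bytes; at, say, 30,000 locks that is 720,000 bytes.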

    Signed-off-by: Kirill A. Shutemov
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

13 Nov, 2013

7 commits

  • Pull networking updates from David Miller:

    1) The addition of nftables. No longer will we need protocol aware
    firewall filtering modules, it can all live in userspace.

    At the core of nftables is a, for lack of a better term, virtual
    machine that executes byte codes to inspect packet or metadata
    (arriving interface index, etc.) and make verdict decisions.

    Besides support for loading packet contents and comparing them, the
    interpreter supports lookups in various data structures as
    fundamental operations. For example sets are supported, and
    therefore one could create a set of whitelist IP address entries
    which have ACCEPT verdicts attached to them, and use the appropriate
    byte codes to do such lookups.

    Since the interpreted code is composed in userspace, userspace can
    do things like optimize things before giving it to the kernel.

    Another major improvement is the capability of atomically updating
    portions of the ruleset. In the existing netfilter implementation,
    one has to update the entire rule set in order to make a change and
    this is very expensive.

    Userspace tools exist to create nftables rules using existing
    netfilter rule sets, but both kernel implementations will need to
    co-exist for quite some time as we transition from the old to the
    new stuff.

    Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
    worked so hard on this.

    2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
    to our pseudo-random number generator, mostly used for things like
    UDP port randomization and netfilter, amongst other things.

    In particular the taus88 generator is updated to taus113, and test
    cases are added.

    3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
    and Yang Yingliang.

    4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
    Sujir.

    5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
    Neal Cardwell, and Yuchung Cheng.

    6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
    control message data, much like other socket option attributes.
    From Francesco Fusco.

    7) Allow applications to specify a cap on the rate computed
    automatically by the kernel for pacing flows, via a new
    SO_MAX_PACING_RATE socket option. From Eric Dumazet.

    8) Make the initial autotuned send buffer sizing in TCP more closely
    reflect actual needs, from Eric Dumazet.

    9) Currently early socket demux only happens for TCP sockets, but we
    can do it for connected UDP sockets too. Implementation from Shawn
    Bohrer.

    10) Refactor inet socket demux with the goal of improving hash demux
    performance for listening sockets. The main goals are being able to use
    RCU lookups even on request sockets, and eliminating the listening lock
    contention. From Eric Dumazet.

    11) The bonding layer has many demuxes in its fast path, and an RCU
    conversion was started back in 3.11. Several changes here extend the
    RCU usage to even more locations. From Ding Tianhong and Wang
    Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
    Falico.

    12) Allow stackability of segmentation offloads to, in particular, allow
    segmentation offloading over tunnels. From Eric Dumazet.

    13) Significantly improve the handling of secret keys we input into the
    various hash functions in the inet hashtables, TCP fast open, as
    well as syncookies. From Hannes Frederic Sowa. The key fundamental
    operation is "net_get_random_once()" which uses static keys.

    Hannes even extended this to ipv4/ipv6 fragmentation handling and
    our generic flow dissector.

    14) The generic driver layer takes care now to set the driver data to
    NULL on device removal, so it's no longer necessary for drivers to
    explicitly set it to NULL any more. Many drivers have been cleaned
    up in this way, from Jingoo Han.

    15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.

    16) Improve CRC32 interfaces and generic SKB checksum iterators so that
    SCTP's checksumming can more cleanly be handled. Also from Daniel
    Borkmann.

    17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
    using the interface MTU value. This helps avoid PMTU attacks,
    particularly on DNS servers. From Hannes Frederic Sowa.

    18) Use generic XPS for transmit queue steering rather than internal
    (re-)implementation in virtio-net. From Jason Wang.
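
    For item 2, a taus113-style generator can be sketched in userspace as
    four combined Tausworthe components XORed together. The shift and mask
    constants below are the ones commonly given for taus113; the seeding
    helper taus113_seed() and the floor_seed() rule it uses are simplified
    assumptions for this example rather than the kernel's seeding code.

```c
#include <stdint.h>

struct rnd_state {
    uint32_t s1, s2, s3, s4;
};

/* One Tausworthe component step: a, b, d are shifts and c is a mask. */
#define TAUSWORTHE(s, a, b, c, d) \
    ((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/* Advance all four components and combine them into one 32-bit output. */
static uint32_t taus113_next(struct rnd_state *st)
{
    st->s1 = TAUSWORTHE(st->s1,  6U, 13U, 4294967294U, 18U);
    st->s2 = TAUSWORTHE(st->s2,  2U, 27U, 4294967288U,  2U);
    st->s3 = TAUSWORTHE(st->s3, 13U, 21U, 4294967280U,  7U);
    st->s4 = TAUSWORTHE(st->s4,  3U, 12U, 4294967168U, 13U);

    return st->s1 ^ st->s2 ^ st->s3 ^ st->s4;
}

/* Each component needs a minimum seed value; bump seeds that are too small. */
static uint32_t floor_seed(uint32_t x, uint32_t min)
{
    return x < min ? x + min : x;
}

/* Hypothetical seeding helper: derive all four words from a single seed. */
static void taus113_seed(struct rnd_state *st, uint32_t seed)
{
    st->s1 = floor_seed(seed, 2U);
    st->s2 = floor_seed(seed, 8U);
    st->s3 = floor_seed(seed, 16U);
    st->s4 = floor_seed(seed, 128U);
}
```

    The generator is deterministic: two states seeded with the same value
    produce the same sequence, which is what makes fixed test vectors
    possible.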

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
    random32: add test cases for taus113 implementation
    random32: upgrade taus88 generator to taus113 from errata paper
    random32: move rnd_state to linux/random.h
    random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
    random32: add periodic reseeding
    random32: fix off-by-one in seeding requirement
    PHY: Add RTL8201CP phy_driver to realtek
    xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
    macmace: add missing platform_set_drvdata() in mace_probe()
    ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
    ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
    vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
    ixgbe: add warning when max_vfs is out of range.
    igb: Update link modes display in ethtool
    netfilter: push reasm skb through instead of original frag skbs
    ip6_output: fragment outgoing reassembled skb properly
    MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
    net_sched: tbf: support of 64bit rates
    ixgbe: deleting dfwd stations out of order can cause null ptr deref
    ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
    ...

    Linus Torvalds
     
  • make menuconfig allows one to choose the compression format of an initial
    ramdisk image, but this choice does not result in a correspondingly
    compressed ramdisk image, because 'make install' does not pass the
    selected compression choice on to the dracut(8) tool, which creates the
    initramfs file. dracut(8) generates the image with the default compression,
    i.e. gzip(1).

    This patch exports the selected compression option to a sub-shell
    environment, so that it could be used by dracut(8) tool to generate
    appropriately compressed initramfs images.

    There isn't a straightforward way to pass options to dracut(8) via
    positional parameters, because it is invoked indirectly at the end of a
    'make install' sequence:

    # make install
    -> arch/$arch/boot/Makefile
    -> arch/$arch/boot/install.sh
    -> /sbin/installkernel ...
    -> /sbin/new-kernel-pkg ...
    -> /sbin/dracut ...

    Signed-off-by: P J P
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    P J P
     
  • Some ARC users say they can boot faster without kernel compression.
    This probably depends on things like the FLASH chip they use etc.

    Until now, kernel compression can only be disabled by removing "select
    HAVE_" lines from the architecture Kconfig. So add the
    Kconfig logic to permit disabling of kernel compression.

    Signed-off-by: Christian Ruppert
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christian Ruppert
     
  • This patch proposes to make init failures more explicit.

    Before this, the "No init found" message didn't help much. It could
    sometimes be misleading and actually mean "No *working* init found".

    This message could hide many different issues:
    - no init program candidates found at all
    - some init program candidates exist but can't be executed (missing
    execute permissions, failed to load shared libraries, executable
    compiled for an unknown architecture...)

    This patch notifies the kernel user when a candidate init program is found
    but can't be executed. In each failure situation, the error code is
    displayed, to quickly find the root cause. "No init found" is also
    replaced by "No working init found", which is more correct.
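
    A userspace analogue of this reporting logic might look like the
    sketch below. It uses access(2) with X_OK as a stand-in for actually
    executing the candidate, so it only approximates what the kernel does;
    probe_init() and find_working_init() are hypothetical names, and the
    message wording is illustrative.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 0 if the candidate looks executable, otherwise a negative errno. */
static int probe_init(const char *path)
{
    if (access(path, X_OK) == 0)
        return 0;
    return -errno;
}

/*
 * Try each candidate in order. A candidate that exists but cannot be run
 * is reported with its error code; a missing one is skipped silently.
 */
static const char *find_working_init(const char *const *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int err = probe_init(candidates[i]);

        if (err == 0)
            return candidates[i];
        if (err != -ENOENT)
            fprintf(stderr, "%s exists but couldn't be executed (error %d)\n",
                    candidates[i], err);
    }
    fprintf(stderr, "No working init found.\n");
    return NULL;
}
```

    Printing the error code separates "file is missing" from "file is
    present but unusable", which is the distinction the old message hid.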

    This will help embedded Linux developers (especially the newcomers),
    regularly making and debugging new root filesystems.

    Credits to Geert Uytterhoeven and Janne Karhunen for their improvement
    suggestions.

    Signed-off-by: Michael Opdenacker
    Cc: Geert Uytterhoeven
    Tested-by: Janne Karhunen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Opdenacker
     
  • make menuconfig allows one to choose the compression format of an initial
    ramdisk image, but this choice does not result in a correspondingly
    compressed initial ramdisk image, because 'make install' does not pass the
    selected compression choice on to the dracut(8) tool, which creates the
    initramfs file. dracut(8) generates the image with the default compression,
    i.e. gzip(1).

    If a user chose any compression other than gzip(1), it leads to a
    crash due to a NULL pointer dereference in crd_load(), caused by a NULL
    function pointer returned by the 'decompress_method()' routine. This
    happens because the initramfs image is gzip(1) compressed, whereas the
    kernel only knows how to decompress the chosen format and not gzip(1).

    This patch replaces the crash by an explicit panic() call with an
    appropriate error message. This shall prevent the kernel from
    eventually panicking in: init/do_mounts.c: mount_block_root() with
    -> panic("VFS: Unable to mount root fs on %s", b);
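
    The failure mode and the guard can be modelled in userspace as a
    format table whose decompressor pointer is NULL when support is not
    built in. In this sketch the table contents, the stub decompressor and
    load_ramdisk() are assumptions made for illustration, and an error
    return stands in for the kernel's panic().

```c
#include <stddef.h>
#include <stdio.h>

typedef int (*decompress_fn)(const unsigned char *buf, size_t len);

/* Stand-in decompressor; a real one would inflate buf. */
static int bunzip2_stub(const unsigned char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 0;
}

struct compress_format {
    unsigned char magic[2];
    const char *name;
    decompress_fn decompressor;    /* NULL: support not built in */
};

static const struct compress_format formats[] = {
    { { 0x1f, 0x8b }, "gzip",  NULL },          /* recognised, not built in */
    { { 0x42, 0x5a }, "bzip2", bunzip2_stub },  /* format chosen at build time */
};

/* Identify the format by its magic bytes; may legitimately return NULL. */
static decompress_fn decompress_method(const unsigned char *buf, size_t len,
                                       const char **name)
{
    *name = NULL;
    if (len < 2)
        return NULL;
    for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++) {
        if (buf[0] == formats[i].magic[0] && buf[1] == formats[i].magic[1]) {
            *name = formats[i].name;
            return formats[i].decompressor;
        }
    }
    return NULL;
}

/* Check the pointer before calling it instead of jumping through NULL. */
static int load_ramdisk(const unsigned char *buf, size_t len)
{
    const char *name;
    decompress_fn fn = decompress_method(buf, len, &name);

    if (!fn) {
        fprintf(stderr, "ramdisk: cannot decompress image (%s)\n",
                name ? name : "unknown format");
        return -1;
    }
    return fn(buf, len);
}
```

    A gzip image loaded into a build that only has the bzip2 decompressor
    now yields a clear error instead of a call through a NULL pointer.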

    [akpm@linux-foundation.org: mention that the problem is with the ramdisk, don't print known-to-be-NULL value]
    Signed-off-by: P J P
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    P J P
     
  • It's already available in

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • The name_to_dev_t function has a comment block which lists the supported
    syntaxes for the device name. Add a bullet for the :
    syntax, which is already supported in the code

    Signed-off-by: Sebastian Capella
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sebastian Capella
     

12 Nov, 2013

2 commits

  • Pull timer changes from Ingo Molnar:
    "Main changes in this cycle were:

    - Updated full dynticks support.

    - Event stream support for architected (ARM) timers.

    - ARM clocksource driver updates.

    - Move arm64 to using the generic sched_clock framework & resulting
    cleanup in the generic sched_clock code.

    - Misc fixes and cleanups"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
    x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC
    clocksource: sun4i: remove IRQF_DISABLED
    clocksource: sun4i: Report the minimum tick that we can program
    clocksource: sun4i: Select CLKSRC_MMIO
    clocksource: Provide timekeeping for efm32 SoCs
    clocksource: em_sti: convert to clk_prepare/unprepare
    time: Fix signedness bug in sysfs_get_uname() and its callers
    timekeeping: Fix some trivial typos in comments
    alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
    clocksource: arch_timer: Do not register arch_sys_counter twice
    timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
    sched_clock: Remove sched_clock_func() hook
    arch_timer: Move to generic sched_clock framework
    clocksource: tcb_clksrc: Remove IRQF_DISABLED
    clocksource: tcb_clksrc: Improve driver robustness
    clocksource: tcb_clksrc: Replace clk_enable/disable with clk_prepare_enable/disable_unprepare
    clocksource: arm_arch_timer: Use clocksource for suspend timekeeping
    clocksource: dw_apb_timer_of: Mark a few more functions as __init
    clocksource: Put nodes passed to CLOCKSOURCE_OF_DECLARE callbacks centrally
    arm: zynq: Enable arm_global_timer
    ...

    Linus Torvalds
     
  • Pull scheduler changes from Ingo Molnar:
    "The main changes in this cycle are:

    - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
    van Riel, Peter Zijlstra et al. Yay!

    - optimize preemption counter handling: merge the NEED_RESCHED flag
    into the preempt_count variable, by Peter Zijlstra.

    - wait.h fixes and code reorganization from Peter Zijlstra

    - cfs_bandwidth fixes from Ben Segall

    - SMP load-balancer cleanups from Peter Zijlstra

    - idle balancer improvements from Jason Low

    - other fixes and cleanups"
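
    One way to picture the preempt_count/NEED_RESCHED merge listed above
    is to keep the flag inverted in the top bit of the counter, so that a
    single comparison with zero answers "count is zero and a reschedule is
    pending". The sketch below assumes that encoding; the function names
    mirror kernel concepts but the code is a standalone model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Top bit of the counter, stored inverted: set means "no resched needed". */
#define PREEMPT_NEED_RESCHED 0x80000000u

/* Count 0, no reschedule pending. */
static uint32_t pc = PREEMPT_NEED_RESCHED;

static void set_need_resched(void)   { pc &= ~PREEMPT_NEED_RESCHED; }
static void clear_need_resched(void) { pc |= PREEMPT_NEED_RESCHED; }
static void preempt_disable(void)    { pc += 1; }

/*
 * Drop one level of preemption disabling. The result is zero only when
 * the nesting count reached zero AND the inverted flag bit is clear,
 * i.e. a reschedule is wanted and allowed.
 */
static bool preempt_enable_should_resched(void)
{
    return --pc == 0;
}
```

    The decrement and the flag test collapse into one operation on one
    variable, which is the point of the optimisation.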

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
    ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
    stop_machine: Fix race between stop_two_cpus() and stop_cpus()
    sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
    sched: Fix asymmetric scheduling for POWER7
    sched: Move completion code from core.c to completion.c
    sched: Move wait code from core.c to wait.c
    sched: Move wait.c into kernel/sched/
    sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
    sched: Avoid throttle_cfs_rq() racing with period_timer stopping
    sched: Guarantee new group-entities always have weight
    sched: Fix hrtimer_cancel()/rq->lock deadlock
    sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
    sched: Fix race on toggling cfs_bandwidth_used
    sched: Remove extra put_online_cpus() inside sched_setaffinity()
    sched/rt: Fix task_tick_rt() comment
    sched/wait: Fix build breakage
    sched/wait: Introduce prepare_to_wait_event()
    sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
    sched: Remove get_online_cpus() usage
    sched: Fix race in migrate_swap_stop()
    ...

    Linus Torvalds
     

08 Nov, 2013

1 commit


06 Nov, 2013

1 commit

  • After trying to use this feature in Fedora we found that hard coding
    policy like this into the kernel was a bad idea. Surprise surprise.
    We ran into these problems because it was impossible to launch a
    container as a logged in user and run a login daemon inside that container.
    This reverts back to the old behavior before this option was added. The
    option will be re-added in a userspace selectable manner such that
    userspace can choose when it is and when it is not appropriate.

    Signed-off-by: Eric Paris
    Signed-off-by: Richard Guy Briggs
    Signed-off-by: Eric Paris

    Eric Paris
     

31 Oct, 2013

1 commit

  • Before commit 026cee0086fe1df4cf74691cf273062cc769617d
    ("params: _initcall-like kernel parameters") the __setup
    parameter parsing code could modify parameters in the
    static_command_line buffer and such modifications were kept. After
    that commit such modifications are destroyed during per-initcall level
    parameter parsing because the same static_command_line buffer is used
    and only parameters for appropriate initcall level are parsed.

    That change broke at least the parsing of the "ubd" parameter in the ubd
    driver when the COW file is used.

    Now a separate buffer is used for per-initcall parameter parsing.
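
    The general pattern, running a destructive parser on a scratch copy so
    that the saved command line survives, can be sketched in userspace as
    follows. split_param() and lookup_param() are hypothetical helpers,
    and the "ubd0=cow,back" string used with them is an invented example
    value.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Destructive: overwrites '=' with NUL and returns the value part, or NULL. */
static char *split_param(char *param)
{
    char *eq = strchr(param, '=');

    if (!eq)
        return NULL;
    *eq = '\0';
    return eq + 1;
}

/*
 * Look up key in cmdline. All in-place modification happens on a private
 * copy, so cmdline itself is left untouched for later parsing passes.
 * Returns 1 if found, 0 if not found, -1 on allocation failure.
 */
static int lookup_param(const char *cmdline, const char *key,
                        char *out, size_t out_len)
{
    size_t len = strlen(cmdline) + 1;
    char *scratch = malloc(len);
    int found = 0;

    if (!scratch)
        return -1;
    memcpy(scratch, cmdline, len);

    for (char *tok = strtok(scratch, " "); tok; tok = strtok(NULL, " ")) {
        char *val = split_param(tok);

        if (val && strcmp(tok, key) == 0) {
            snprintf(out, out_len, "%s", val);
            found = 1;
            break;
        }
    }
    free(scratch);
    return found;
}
```

    Because each pass works on its own copy, a later pass still sees the
    parameters exactly as they were saved.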

    Signed-off-by: Krzysztof Mazur
    Signed-off-by: Rusty Russell

    Krzysztof Mazur
     

24 Oct, 2013

1 commit


20 Oct, 2013

1 commit

  • The static key primitives that toggle a branch must not be used
    before jump_label_init() is called from init/main.c. jump_label_init()
    reorganizes and wires up the jump_entries, so usage before that could
    have unforeseen consequences.

    The following primitives are now checked for correct use:
    * static_key_slow_inc
    * static_key_slow_dec
    * static_key_slow_dec_deferred
    * jump_label_rate_limit

    The x86 architecture already checks this by testing if the default_nop
    was already replaced with an optimal nop or with a branch instruction. It
    will panic then. Other architectures don't check for this.

    Because we need to relax this check for the x86 arch to allow code to
    transition from default_nop to the enabled state, and other architectures
    did not check for this at all, this patch introduces checking on the
    static_key primitives in a non-arch dependent manner.

    All checked functions are considered slow-path so the additional check
    does no harm to performance.

    The warnings are best observed with earlyprintk.
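
    An arch-independent check of this kind can be modelled with a single
    "initialized" flag that the slow-path primitives test on entry. The
    sketch below is a userspace model under that assumption: the warning
    counter and check_use() helper are invented for the example, and the
    primitives only bump a counter instead of patching code.

```c
#include <stdbool.h>
#include <stdio.h>

struct static_key {
    int enabled;
};

static bool static_key_initialized;   /* set once by jump_label_init() */
static int early_use_warnings;        /* how many early uses were caught */

/* Warn (rather than crash) when a primitive runs before initialization. */
static void check_use(const char *func)
{
    if (!static_key_initialized) {
        early_use_warnings++;
        fprintf(stderr, "%s used before call to jump_label_init\n", func);
    }
}

#define STATIC_KEY_CHECK_USE() check_use(__func__)

static void jump_label_init(void)
{
    /* The real function also sorts and wires up the jump entries. */
    static_key_initialized = true;
}

static void static_key_slow_inc(struct static_key *key)
{
    STATIC_KEY_CHECK_USE();
    key->enabled++;
}

static void static_key_slow_dec(struct static_key *key)
{
    STATIC_KEY_CHECK_USE();
    key->enabled--;
}
```

    The check costs one flag test per call, which is acceptable because
    these functions are slow-path only.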

    Based on a patch from Andi Kleen.

    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Andi Kleen
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

14 Oct, 2013

1 commit


11 Oct, 2013

2 commits


30 Sep, 2013

2 commits

  • The CONFIG_64BIT requirement on vtime can finally be removed
    since we now depend on HAVE_VIRT_CPU_ACCOUNTING_GEN which
    already takes care of the arch ability to handle nsecs based
    cputime_t safely.

    Signed-off-by: Kevin Hilman
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Paul E. McKenney
    Cc: Arm Linux
    Signed-off-by: Frederic Weisbecker

    Kevin Hilman
     
  • With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit. In order
    to use that feature, arch code should be audited to ensure there are no
    races in concurrent read/write of cputime_t. For example,
    reading/writing 64-bit cputime_t on some 32-bit arches may require
    multiple accesses for low and high value parts, so proper locking
    is needed to protect against concurrent accesses.

    Therefore, add CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN which arches can
    enable after they've been audited for potential races.

    This option is automatically enabled on 64-bit platforms.

    Feature requested by Frederic Weisbecker.

    Signed-off-by: Kevin Hilman
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Paul E. McKenney
    Cc: Arm Linux
    Signed-off-by: Frederic Weisbecker

    Kevin Hilman
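
The race described above can be illustrated with a small userspace sketch. It assumes a 64-bit value stored as two 32-bit halves that are written low half first; the store order and all `demo_` names are hypothetical and are not taken from the kernel. The helper returns what a reader would observe if it ran between the two stores.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit counter as a 32-bit machine might hold it: two separate words. */
struct demo_cputime64 {
	uint32_t lo;
	uint32_t hi;
};

/* Reader: combines both halves, which takes two separate loads. */
static uint64_t demo_cputime_read(const struct demo_cputime64 *c)
{
	return ((uint64_t)c->hi << 32) | c->lo;
}

/* Writer, step 1: store the low half (assumed store order). */
static void demo_store_lo(struct demo_cputime64 *c, uint64_t v)
{
	c->lo = (uint32_t)v;
}

/* Writer, step 2: store the high half. */
static void demo_store_hi(struct demo_cputime64 *c, uint64_t v)
{
	c->hi = (uint32_t)(v >> 32);
}

/*
 * Simulate a reader that runs after the writer has stored the new low
 * half but before it has stored the new high half.
 */
static uint64_t demo_torn_read(uint64_t old_val, uint64_t new_val)
{
	struct demo_cputime64 c;

	demo_store_lo(&c, old_val);
	demo_store_hi(&c, old_val);

	demo_store_lo(&c, new_val);    /* writer is interrupted here */
	return demo_cputime_read(&c);  /* reader sees a mixed value */
}

/* Example: stepping from 0xFFFFFFFF to 0x100000000 briefly reads as 0. */
void demo_cputime_example(void)
{
	uint64_t seen = demo_torn_read(0xFFFFFFFFULL, 0x100000000ULL);

	assert(seen != 0xFFFFFFFFULL && seen != 0x100000000ULL);
}
```

The observed value is neither the old nor the new one, which is why an arch needs locking (or another way to make the access atomic) before it opts in.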
     

26 Sep, 2013

1 commit


25 Sep, 2013

1 commit

  • Replace the single preempt_count() 'function' that's an lvalue with
    two proper functions:

    preempt_count() - returns the preempt_count value as rvalue
    preempt_count_set() - Allows setting the preempt-count value

    Also provide preempt_count_ptr() as a convenience wrapper to implement
    all modifying operations.

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-orxrbycjozopqfhb4dxdkdvb@git.kernel.org
    [ Fixed build failure. ]
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
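
A minimal userspace sketch of the accessor split described above. The `demo_` names and the single static counter are assumptions made for illustration; the kernel keeps the real counter elsewhere and its helpers may differ.

```c
#include <assert.h>

/* Hypothetical stand-in for the real preempt counter storage. */
static int demo_preempt_count_storage;

/* Convenience wrapper: every modifying operation goes through this pointer. */
static inline int *demo_preempt_count_ptr(void)
{
	return &demo_preempt_count_storage;
}

/* Read accessor: returns a plain value (an rvalue), so it cannot be assigned to. */
static inline int demo_preempt_count(void)
{
	return *demo_preempt_count_ptr();
}

/* Explicit setter, replacing the old "preempt_count() = val" idiom. */
static inline void demo_preempt_count_set(int val)
{
	*demo_preempt_count_ptr() = val;
}

/* Modifying helpers built on the pointer wrapper. */
static inline void demo_preempt_count_add(int val)
{
	*demo_preempt_count_ptr() += val;
}

static inline void demo_preempt_count_sub(int val)
{
	*demo_preempt_count_ptr() -= val;
}

/* Example use of the three kinds of access. */
void demo_preempt_count_example(void)
{
	demo_preempt_count_set(0);
	demo_preempt_count_add(1);
	assert(demo_preempt_count() == 1);
	demo_preempt_count_sub(1);
	assert(demo_preempt_count() == 0);
}
```

With this shape, a statement such as `demo_preempt_count() = 1;` no longer compiles, so every write is visible as a call to the setter or to a helper using the pointer wrapper.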
     

23 Sep, 2013

1 commit


15 Sep, 2013

1 commit

  • Pull SLAB update from Pekka Enberg:
    "Nothing terribly exciting here apart from Christoph's kmalloc
    unification patches that bring sl[aou]b implementations closer to
    each other"

    * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    slab: Use correct GFP_DMA constant
    slub: remove verify_mem_not_deleted()
    mm/sl[aou]b: Move kmallocXXX functions to common code
    mm, slab_common: add 'unlikely' to size check of kmalloc_slab()
    mm/slub.c: beautify code for removing redundancy 'break' statement.
    slub: Remove unnecessary page NULL check
    slub: don't use cpu partial pages on UP
    mm/slub: beautify code for 80 column limitation and tab alignment
    mm/slub: remove 'per_cpu' which is useless variable

    Linus Torvalds
     

12 Sep, 2013

3 commits

    Add a rootfstype=ramfs command-line option to restore the old initramfs
    behavior, and use ramfs instead of tmpfs for the stub when root= is
    defined (for cosmetic reasons).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
     
  • Conditionally call the appropriate fs_init function and fill_super
    functions. Add a use once guard to shmem_init() to simply succeed on a
    second call.

    (Note that IS_ENABLED() is a compile time constant so dead code
    elimination removes unused function calls when CONFIG_TMPFS is disabled.)

    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
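
The two ideas in this commit message, a compile-time switch that the compiler can fold away and a use-once guard, can be sketched in plain C as follows. The `DEMO_CONFIG_TMPFS` macro is a simplified stand-in for `IS_ENABLED(CONFIG_TMPFS)`, and all `demo_` functions and counters are hypothetical, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for IS_ENABLED(CONFIG_TMPFS): a 0/1 compile-time constant. */
#ifndef DEMO_CONFIG_TMPFS
#define DEMO_CONFIG_TMPFS 1
#endif

/* Counters so the example can observe which init path actually ran. */
static int demo_shmem_init_calls;
static int demo_ramfs_init_calls;

/* Use-once guard: a second call simply succeeds without re-initialising. */
static int demo_shmem_init(void)
{
	static bool done;

	if (done)
		return 0;
	done = true;
	demo_shmem_init_calls++;
	return 0;
}

static int demo_ramfs_init(void)
{
	demo_ramfs_init_calls++;
	return 0;
}

/*
 * The condition is a constant expression, so an optimising compiler can
 * drop the branch that is never taken, along with the call inside it.
 */
static int demo_rootfs_init(void)
{
	if (DEMO_CONFIG_TMPFS)
		return demo_shmem_init();
	return demo_ramfs_init();
}

/* Example: two calls succeed, but the tmpfs-style init runs only once. */
void demo_rootfs_example(void)
{
	assert(demo_rootfs_init() == 0);
	assert(demo_rootfs_init() == 0);
}
```

Using an ordinary `if` rather than `#ifdef` keeps both branches visible to the compiler for type checking while still letting the unused one be eliminated.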
     
  • When the rootfs code was a wrapper around ramfs, having them in the same
    file made sense. Now that it can wrap another filesystem type, move it in
    with the init code instead.

    This also allows a subsequent patch to access the rootfstype=
    command-line argument.

    Signed-off-by: Rob Landley
    Cc: Jeff Layton
    Cc: Jens Axboe
    Cc: Stephen Warren
    Cc: Rusty Russell
    Cc: Jim Cromie
    Cc: Sam Ravnborg
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Alexander Viro
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Landley
     

11 Sep, 2013

1 commit

  • Pull kconfig updates from Michal Marek:
    "This is the kconfig part of kbuild for v3.12-rc1:
    - post-3.11 search code fixes and micro-optimizations
    - CONFIG_MODULES is no longer a special case; this is needed to
    eventually fix the bug that using KCONFIG_ALLCONFIG breaks
    allmodconfig
    - long long is used to store hex and int values
    - make silentoldconfig no longer warns when a symbol changes from
    tristate to bool (it's a job for make oldconfig)
    - scripts/diffconfig updated to work with newer Pythons
    - scripts/config does not rely on GNU sed extensions"

    * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    kconfig: do not allow more than one symbol to have 'option modules'
    kconfig: regenerate bison parser
    kconfig: do not special-case 'MODULES' symbol
    diffconfig: Update script to support python versions 2.5 through 3.3
    diffconfig: Gracefully exit if the default config files are not present
    modules: do not depend on kconfig to set 'modules' option to symbol MODULES
    kconfig: silence warning when parsing auto.conf when a symbol has changed type
    scripts/config: use sed's POSIX interface
    kconfig: switch to "long long" for sanity
    kconfig: simplify symbol-search code
    kconfig: don't allocate n+1 elements in temporary array
    kconfig: minor style fixes in symbol-search code
    kconfig/[mn]conf: shorten title in search-box
    kconfig: avoid multiple calls to strlen
    Documentation/kconfig: more concise and straightforward search explanation

    Linus Torvalds
     

10 Sep, 2013

1 commit

  • Pull xfs updates from Ben Myers:
    "For 3.12-rc1 there are a number of bugfixes in addition to work to
    ease usage of shared code between libxfs and the kernel, the rest of
    the work to enable project and group quotas to be used simultaneously,
    performance optimisations in the log and the CIL, directory entry file
    type support, fixes for log space reservations, some spelling/grammar
    cleanups, and the addition of user namespace support.

    - introduce readahead to log recovery
    - add directory entry file type support
    - fix a number of spelling errors in comments
    - introduce new Q_XGETQSTATV quotactl for project quotas
    - add USER_NS support
    - log space reservation rework
    - CIL optimisations
    - kernel/userspace libxfs rework"

    * tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
    xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
    xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
    Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
    xfs: finish removing IOP_* macros.
    xfs: inode log reservations are too small
    xfs: check correct status variable for xfs_inobt_get_rec() call
    xfs: inode buffers may not be valid during recovery readahead
    xfs: check LSN ordering for v5 superblocks during recovery
    xfs: btree block LSN escaping to disk uninitialised
    XFS: Assertion failed: first < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
    xfs: fix bad dquot buffer size in log recovery readahead
    xfs: don't account buffer cancellation during log recovery readahead
    xfs: check for underflow in xfs_iformat_fork()
    xfs: xfs_dir3_sfe_put_ino can be static
    xfs: introduce object readahead to log recovery
    xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
    xfs: Register hotcpu notifier after initialization
    xfs: add xfs sb v4 support for dirent filetype field
    xfs: Add write support for dirent filetype field
    xfs: Add read-only support for dirent filetype field
    ...

    Linus Torvalds
     

05 Sep, 2013

1 commit

  • Pull timers/nohz changes from Ingo Molnar:
    "It mostly contains fixes and full dynticks off-case optimizations, by
    Frederic Weisbecker"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    nohz: Include local CPU in full dynticks global kick
    nohz: Optimize full dynticks's sched hooks with static keys
    nohz: Optimize full dynticks state checks with static keys
    nohz: Rename a few state variables
    vtime: Always debug check snapshot source _before_ updating it
    vtime: Always scale generic vtime accounting results
    vtime: Optimize full dynticks accounting off case with static keys
    vtime: Describe overriden functions in dedicated arch headers
    m68k: hardirq_count() only need preempt_mask.h
    hardirq: Split preempt count mask definitions
    context_tracking: Split low level state headers
    vtime: Fix racy cputime delta update
    vtime: Remove a few unneeded generic vtime state checks
    context_tracking: User/kernel broundary cross trace events
    context_tracking: Optimize context switch off case with static keys
    context_tracking: Optimize guest APIs off case with static key
    context_tracking: Optimize main APIs off case with static key
    context_tracking: Ground setup for static key use
    context_tracking: Remove full dynticks' hacky dependency on wide context tracking
    nohz: Only enable context tracking on full dynticks CPUs
    ...

    Linus Torvalds
     

03 Sep, 2013

1 commit

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    "
    * Update RCU documentation. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/611.

    * Miscellaneous fixes. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/619.

    * Full-system idle detection. This is for use by Frederic
    Weisbecker's adaptive-ticks mechanism. Its purpose is
    to allow the timekeeping CPU to shut off its tick when
    all other CPUs are idle. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/648.

    * Improve rcutorture test coverage. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/675.
    "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

24 Aug, 2013

1 commit

    The swapaccount kernel parameter without any values has been removed by
    commit a2c8990aed5a ("memsw: remove noswapaccount kernel parameter") but
    it seems that we didn't get rid of all the leftovers.

    Make sure that the menuconfig help text and kernel-parameters.txt are
    clear about the value for the parameter, and remove the stale comment,
    which is not very useful on its own.

    Signed-off-by: Michal Hocko
    Reported-by: Gergely Risko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

19 Aug, 2013

1 commit

  • TREE_RCU and TREE_PREEMPT_RCU both cause kernel/rcutree.c to be built,
    but only TREE_RCU selects IRQ_WORK, which can result in an undefined
    reference to irq_work_queue for some (random) configs:

    kernel/built-in.o In function `rcu_start_gp_advanced':
    kernel/rcutree.c:1564: undefined reference to `irq_work_queue'

    Select IRQ_WORK from TREE_PREEMPT_RCU too to fix this.

    Signed-off-by: James Hogan
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Dipankar Sarma
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Josh Triplett

    James Hogan