06 Feb, 2015

2 commits

  • commit 14bf61ffe6ac54afcd1e888a4407fe16054483db upstream.

    Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
    tracks space limits and usage in 512-byte blocks. However VFS quotas
    track usage in bytes (as some filesystems require that) and we need to
    somehow pass this information. Up to now it wasn't a problem because we
    didn't do any unit conversion (thus VFS quota routines happily stuck
    the number of bytes into the d_bcount field of struct fs_disk_quota). Only
    if you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
    / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
    tried this but reportedly some Samba users hit the problem in practice.
    So if we want the interfaces to be compatible, we need to fix this.

    We bite the bullet and define another quota structure used for passing
    information from/to ->get_dqblk()/->set_dqblk(). It's somewhat sad we have
    to have more conversion routines in fs/quota/quota.c and another copying
    of quota structure slows down getting of quota information by about 2%
    but it seems cleaner than overloading e.g. units of d_bcount to bytes.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
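
The unit mismatch described in this commit can be illustrated with a small user-space sketch. The names `SKETCH_BLOCK_SHIFT`, `sketch_blocks_to_bytes()` and `sketch_bytes_to_blocks()` are hypothetical and are not taken from the patch; the only fact carried over from the commit message is the 512-byte block size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one quota "block" is 512 bytes, as stated in the commit message. */
#define SKETCH_BLOCK_SHIFT 9

/* Convert a count of 512-byte blocks to bytes. */
static inline uint64_t sketch_blocks_to_bytes(uint64_t blocks)
{
	/* The caller must not pass a value whose byte count exceeds 64 bits. */
	assert(blocks <= (UINT64_MAX >> SKETCH_BLOCK_SHIFT));
	return blocks << SKETCH_BLOCK_SHIFT;
}

/* Convert bytes to 512-byte blocks, rounding up so usage is never under-reported. */
static inline uint64_t sketch_bytes_to_blocks(uint64_t bytes)
{
	uint64_t remainder = bytes & ((1u << SKETCH_BLOCK_SHIFT) - 1);

	return (bytes >> SKETCH_BLOCK_SHIFT) + (remainder != 0);
}
```

A byte count stored unconverted in a field that the reader interprets as 512-byte blocks is off by a factor of 512, which is the kind of bogus result the message describes.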
     
  • commit 33692f27597fcab536d7cbbcc8f52905133e4aa7 upstream.

    The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
    "you should SIGSEGV" error, because the SIGSEGV case was generally
    handled by the caller - usually the architecture fault handler.

    That results in lots of duplication - all the architecture fault
    handlers end up doing very similar "look up vma, check permissions, do
    retries etc" - but it generally works. However, there are cases where
    the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.

    In particular, when accessing the stack guard page, libsigsegv expects a
    SIGSEGV. And it usually got one, because the stack growth is handled by
    that duplicated architecture fault handler.

    However, when the generic VM layer started propagating the error return
    from the stack expansion in commit fee7e49d4514 ("mm: propagate error
    from stack expansion even for guard page"), that now exposed the
    existing VM_FAULT_SIGBUS result to user space. And user space really
    expected SIGSEGV, not SIGBUS.

    To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
    duplicate architecture fault handlers about it. They all already have
    the code to handle SIGSEGV, so it's about just tying that new return
    value to the existing code, but it's all a bit annoying.

    This is the mindless minimal patch to do this. A more extensive patch
    would be to try to gather up the mostly shared fault handling logic into
    one generic helper routine, and long-term we really should do that
    cleanup.

    Just from this patch, you can generally see that most architectures just
    copied (directly or indirectly) the old x86 way of doing things, but in
    the meantime that original x86 model has been improved to hold the VM
    semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
    "newer" things, so it would be a good idea to bring all those
    improvements to the generic case and teach other architectures about
    them too.

    Reported-and-tested-by: Takashi Iwai
    Tested-by: Jan Engelhardt
    Acked-by: Heiko Carstens # "s390 still compiles and boots"
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
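
As a rough illustration of "tying that new return value to the existing code", the following user-space sketch maps fault result bits to the signal a handler would deliver. The `SKETCH_FAULT_*` values and `sketch_fault_to_signal()` are made up for this example; the real VM_FAULT_* bits and the per-architecture handlers differ in detail.

```c
#include <assert.h>
#include <signal.h>

/* Illustrative result bits only; not the kernel's VM_FAULT_* values. */
#define SKETCH_FAULT_SIGBUS  0x1u
#define SKETCH_FAULT_SIGSEGV 0x2u	/* stands in for the newly added result */

/* Return the signal a fault handler would deliver, or 0 if none is needed. */
static int sketch_fault_to_signal(unsigned int fault)
{
	/* Only the two bits defined above are modelled here. */
	assert((fault & ~(SKETCH_FAULT_SIGBUS | SKETCH_FAULT_SIGSEGV)) == 0);

	if (fault & SKETCH_FAULT_SIGSEGV)
		return SIGSEGV;	/* reuse the path the handler already has */
	if (fault & SKETCH_FAULT_SIGBUS)
		return SIGBUS;
	return 0;
}
```

Before the new result existed, a failed stack expansion could only be reported through the SIGBUS branch, which is what user space noticed.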
     

30 Jan, 2015

8 commits

  • commit 5d26a105b5a73e5635eae0629b42fa0a90e07b7b upstream.

    This prefixes all crypto module loading with "crypto-" so we never run
    the risk of exposing module auto-loading to userspace via a crypto API,
    as demonstrated by Mathias Krause:

    https://lkml.org/lkml/2013/3/4/70

    Signed-off-by: Kees Cook
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
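
The prefixing idea can be sketched in user space as follows. The helper names are hypothetical; only the "crypto-" prefix comes from the commit message. The point is that a name supplied through the crypto API can only ever resolve to a module inside the crypto namespace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PREFIX "crypto-"

/* Build the namespaced module name; returns 0 on success, -1 if it does not fit. */
static int sketch_crypto_module_name(char *buf, size_t len, const char *alg)
{
	int n;

	assert(buf != NULL && alg != NULL);
	n = snprintf(buf, len, SKETCH_PREFIX "%s", alg);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Non-zero if a requested module name lies inside the crypto namespace. */
static int sketch_is_crypto_module(const char *name)
{
	return strncmp(name, SKETCH_PREFIX, strlen(SKETCH_PREFIX)) == 0;
}
```

With this scheme a request for an algorithm called "vfat" would look for "crypto-vfat" rather than the filesystem module of the same name.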
     
  • commit 175f8e2650f7ca6b33d338be3ccc1c00e89594ea upstream.

    In some cases acpi_device_wakeup() may be called to ensure wakeup
    power to be off for a given device even though that device's wakeup
    GPE has not been enabled so far. It calls acpi_disable_gpe() on a
    GPE that's not enabled and this causes ACPICA to return the AE_LIMIT
    status code from that call which then is reported as an error by the
    ACPICA's debug facilities (if enabled). This may lead to a fair
    amount of confusion, so introduce a new ACPI device wakeup flag
    to store the wakeup GPE status and avoid disabling wakeup GPEs
    that have not been enabled.

    Reported-and-tested-by: Venkat Raghavulu
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     
  • commit 45f87de57f8fad59302fd263dd81ffa4843b5b24 upstream.

    Commit 2457aec63745 ("mm: non-atomically mark page accessed during page
    cache allocation where possible") has added a separate parameter for
    specifying gfp mask for radix tree allocations.

    Not only is this less than optimal from the API point of view because it
    is error prone, it is also currently buggy because
    grab_cache_page_write_begin is using GFP_KERNEL for radix tree and if
    fgp_flags doesn't contain FGP_NOFS (mostly controlled by fs by
    AOP_FLAG_NOFS flag) but the mapping_gfp_mask has __GFP_FS cleared then
    the radix tree allocation wouldn't obey the restriction and might
    recurse into filesystem and cause deadlocks. This is the case for most
    filesystems unfortunately because only ext4 and gfs2 are using
    AOP_FLAG_NOFS.

    Let's simply remove radix_gfp_mask parameter because the allocation
    context is same for both page cache and for the radix tree. Just make
    sure that the radix tree gets only the sane subset of the mask (e.g. do
    not pass __GFP_WRITE).

    Long term it is preferable to convert remaining users of
    AOP_FLAG_NOFS to use mapping_gfp_mask instead and simplify this
    interface even further.

    Reported-by: Dave Chinner
    Signed-off-by: Michal Hocko
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michal Hocko
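
The approach of deriving both allocation masks from one source can be sketched like this. All `SKETCH_*` bit values and both function names are assumptions made for the example and do not match the kernel's definitions; the sketch only shows that the tree mask is a subset of the page mask, so a cleared "may enter filesystem" bit cannot be lost.

```c
#include <assert.h>

/* Illustrative bits only; the real __GFP_* and FGP_* flags are kernel-defined. */
#define SKETCH_GFP_FS    0x1u	/* allocation may recurse into the filesystem */
#define SKETCH_GFP_WRITE 0x2u	/* page-cache hint, meaningless for tree nodes */
#define SKETCH_GFP_WAIT  0x4u
#define SKETCH_FGP_WRITE 0x1u
#define SKETCH_FGP_NOFS  0x2u

/* Mask for the page itself: start from the mapping's own mask. */
static unsigned int sketch_page_gfp(unsigned int mapping_gfp, unsigned int fgp_flags)
{
	unsigned int gfp = mapping_gfp;

	if (fgp_flags & SKETCH_FGP_WRITE)
		gfp |= SKETCH_GFP_WRITE;
	if (fgp_flags & SKETCH_FGP_NOFS)
		gfp &= ~SKETCH_GFP_FS;
	return gfp;
}

/* Mask for the radix tree node: same context, minus page-only hints. */
static unsigned int sketch_tree_gfp(unsigned int page_gfp)
{
	unsigned int gfp = page_gfp & ~SKETCH_GFP_WRITE;

	/* The filesystem restriction must survive unchanged. */
	assert((gfp & SKETCH_GFP_FS) == (page_gfp & SKETCH_GFP_FS));
	return gfp;
}
```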
     
  • commit 6ada1fc0e1c4775de0e043e1bd3ae9d065491aa5 upstream.

    An unvalidated user input is multiplied by a constant, which can result in
    undefined behaviour for large values. While this is validated later,
    we should avoid triggering undefined behaviour.

    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Sasha Levin
    [jstultz: include trivial millisecond->microsecond correction noticed
    by Andy]
    Signed-off-by: John Stultz
    Signed-off-by: Greg Kroah-Hartman

    Sasha Levin
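
The general pattern of validating before multiplying can be sketched as below. The accepted range (0 up to, but not including, one second's worth of microseconds) and the helper name are assumptions for this example, not details taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_USEC_PER_SEC  1000000L
#define SKETCH_NSEC_PER_USEC 1000L

/* Convert microseconds to nanoseconds; 0 on success, -1 for out-of-range input. */
static int sketch_usec_to_nsec(long usec, long *nsec)
{
	assert(nsec != NULL);
	/* Check first: multiplying an arbitrary long can overflow, which is undefined. */
	if (usec < 0 || usec >= SKETCH_USEC_PER_SEC)
		return -1;
	*nsec = usec * SKETCH_NSEC_PER_USEC;	/* at most 999999000, fits in 32 bits */
	return 0;
}
```

Rejected input leaves `*nsec` untouched, so the multiplication is never evaluated on a value that has not been range-checked.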
     
  • commit f331a859e0ee5a898c1f47596eddad4c4f02d657 upstream.

    Enable a mechanism for devices to quirk that they do not behave when
    doing a PCI bus reset. We require a modest level of spec compliant
    behavior in order to do a reset, for instance the device should come
    out of reset without throwing errors and PCI config space should be
    accessible after reset. This is too much to ask for some devices.

    Link: http://lkml.kernel.org/r/20140923210318.498dacbd@dualc.maya.org
    Signed-off-by: Alex Williamson
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Greg Kroah-Hartman

    Alex Williamson
     
  • commit 8505e729a2f6eb0803ff943a15f133dd10afff3a upstream.

    Add pci_claim_bridge_resource() to claim a PCI-PCI bridge window. This is
    like regular pci_claim_resource(), except that if we fail to claim the
    window, we check to see if we can reduce the size of the window and try
    again.

    This is for scenarios like this:

    pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
    pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
    pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]

    The 00:01.0 window is illegal: it starts before the host bridge window, so
    we have to assume the [0xbdf00000-0xbfffffff] region is inaccessible. We
    can make it legal by clipping it to [mem 0xc0000000-0xddefffff 64bit pref].

    Previously we discarded the 00:01.0 window and tried to reassign that part
    of the hierarchy from scratch. That is a problem because Linux doesn't
    always assign things optimally. For example, in this case, BIOS put the
    01:00.0 device in a prefetchable window below 4GB, but after 5b28541552ef,
    Linux puts the prefetchable window above 4GB where the 32-bit 01:00.0
    device can't use it.

    Clipping the 00:01.0 window is less intrusive than completely reassigning
    things and is sufficient to let us use most of the BIOS configuration. Of
    course, it's possible that devices below 00:01.0 will no longer fit. If
    that's the case, we'll have to reassign things. But that's a separate
    problem.

    [bhelgaas: changelog, split into separate patch]
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
    Reported-by: Marek Kordik
    Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
    Signed-off-by: Yinghai Lu
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Greg Kroah-Hartman

    Yinghai Lu
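
The clipping step can be sketched with plain address ranges. `struct sketch_range` and `sketch_clip_window()` are hypothetical names and the sketch ignores resource flags; the addresses used in the usage note are the ones from the log excerpt in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Clip win to parent. Returns false if no part of the window lies inside parent. */
static bool sketch_clip_window(struct sketch_range *win, const struct sketch_range *parent)
{
	assert(win->start <= win->end && parent->start <= parent->end);

	if (win->end < parent->start || win->start > parent->end)
		return false;	/* no overlap: clipping cannot help */
	if (win->start < parent->start)
		win->start = parent->start;
	if (win->end > parent->end)
		win->end = parent->end;
	return true;
}
```

Clipping the window [0xbdf00000-0xddefffff] against the root bus resource [0xc0000000-0xffffffff] yields [0xc0000000-0xddefffff], the legal window named in the message.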
     
  • commit 72dd299d5039a336493993dcc63413cf31d0e662 upstream.

    Ronny reports: https://bugzilla.kernel.org/show_bug.cgi?id=87101
    "Since commit 8a4aeec8d "libata/ahci: accommodate tag ordered
    controllers" the access to the harddisk on the first SATA-port is
    failing on its first access. The access to the harddisk on the
    second port is working normal.

    When reverting the above commit, access to both harddisks is working
    fine again."

    Maintain tag ordered submission as the default, but allow sata_sil24 to
    continue with the old behavior.

    Cc: Tejun Heo
    Reported-by: Ronny Hegewald
    Signed-off-by: Dan Williams
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     
  • commit 6cfda7fbebe8a4fd33ea5722fa0212f98f643c35 upstream.

    During the CAN FD standardization process within the ISO it turned out that
    the failure detection capability has to be improved.

    The CAN in Automation organization (CiA) defined the already implemented CAN
    FD controllers as 'non-ISO' and the upcoming improved CAN FD controllers as
    'ISO' compliant. See at http://www.can-cia.com/index.php?id=1937

    Finally there will be three types of CAN FD controllers in the future:

    1. ISO compliant (fixed)
    2. non-ISO compliant (fixed, like the M_CAN IP v3.0.1 in m_can.c)
    3. ISO/non-ISO CAN FD controllers (switchable, like the PEAK USB FD)

    So the current M_CAN driver for the M_CAN IP v3.0.1 has to expose its non-ISO
    implementation by setting the CAN_CTRLMODE_FD_NON_ISO ctrlmode at startup.
    As this bit cannot be switched at configuration time CAN_CTRLMODE_FD_NON_ISO
    must not be set in ctrlmode_supported of the current M_CAN driver.

    Signed-off-by: Oliver Hartkopp
    Signed-off-by: Marc Kleine-Budde
    Signed-off-by: Greg Kroah-Hartman

    Oliver Hartkopp
     

28 Jan, 2015

4 commits

  • commit c291ee622165cb2c8d4e7af63fffd499354a23be upstream.

    Since the rework of the sparse interrupt code to actually free the
    unused interrupt descriptors there exists a race between the /proc
    interfaces to the irq subsystem and the code which frees the interrupt
    descriptor.

    CPU0                            CPU1
                                    show_interrupts()
                                      desc = irq_to_desc(X);
    free_desc(desc)
      remove_from_radix_tree();
      kfree(desc);
                                      raw_spinlock_irq(&desc->lock);

    /proc/interrupts is the only interface which can actively corrupt
    kernel memory via the lock access. /proc/stat can only read from freed
    memory. Extremely hard to trigger, but possible.

    The interfaces in /proc/irq/N/ are not affected by this because the
    removal of the proc file is serialized in procfs against concurrent
    readers/writers. The removal happens before the descriptor is freed.

    For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
    as the descriptor is never freed. It's merely cleared out with the irq
    descriptor lock held. So any concurrent proc access will either see
    the old correct value or the cleared out ones.

    Protect the lookup and access to the irq descriptor in
    show_interrupts() with the sparse_irq_lock.

    Provide kstat_irqs_usr() which is protecting the lookup and access
    with sparse_irq_lock and switch /proc/stat to use it.

    Document the existing kstat_irqs interfaces so it's clear that the
    caller needs to take care about protection. The users of these
    interfaces are either not affected due to SPARSE_IRQ=n or already
    protected against removal.

    Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator"
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • commit 3875f15207f9ecb3f24a8e91e7ad196899139595 upstream.

    scripts/headers_install.sh will transform __packed to
    __attribute__((packed)), so the #ifndef is not necessary.
    (and, in fact, it's problematic, because we'll end up with the header
    containing:
    #ifndef __attribute__((packed))
    #define __attribu...
    and so forth.)

    Signed-off-by: Kyle McMartin
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Greg Kroah-Hartman

    Kyle McMartin
     
  • [ Upstream commit 5f35227ea34bb616c436d9da47fc325866c428f3 ]

    GSO isn't the only offload feature with restrictions that
    potentially can't be expressed with the current features mechanism.
    Checksum is another although it's a general issue that could in
    theory apply to anything. Even if it may be possible to
    implement these restrictions in other ways, it can result in
    duplicate code or inefficient per-packet behavior.

    This generalizes ndo_gso_check so that drivers can remove any
    features that don't make sense for a given packet, similar to
    netif_skb_features(). It also converts existing driver
    restrictions to the new format, completing the work that was
    done to support tunnel protocols since the issues apply to
    checksums as well.

    By actually removing features from the set that are used to do
    offloading, it solves another problem with the existing
    interface. In these cases, GSO would run with the original set
    of features and not do anything because it appears that
    segmentation is not required.

    CC: Tom Herbert
    CC: Joe Stringer
    CC: Eric Dumazet
    CC: Hayes Wang
    Signed-off-by: Jesse Gross
    Acked-by: Tom Herbert
    Fixes: 04ffcb255f22 ("net: Add ndo_gso_check")
    Tested-by: Hayes Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jesse Gross
     
  • [ Upstream commit 6d08acd2d32e3e877579315dc3202d7a5f336d98 ]

    Resolve conflicts between glibc definition of IPV6 socket options
    and those defined in Linux headers. Looks like earlier efforts to
    solve this did not cover all the definitions.

    It resolves warnings during iproute2 build.
    Please consider for stable as well.

    Signed-off-by: Stephen Hemminger
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    stephen hemminger
     

16 Jan, 2015

7 commits

  • commit fee7e49d45149fba60156f5b59014f764d3e3728 upstream.

    Jay Foad reports that the address sanitizer test (asan) sometimes gets
    confused by a stack pointer that ends up being outside the stack vma
    that is reported by /proc/maps.

    This happens due to an interaction between RLIMIT_STACK and the guard
    page: when we do the guard page check, we ignore the potential error
    from the stack expansion, which effectively results in a missing guard
    page, since the expected stack expansion won't have been done.

    And since /proc/maps explicitly ignores the guard page (commit
    d7824370e263: "mm: fix up some user-visible effects of the stack guard
    page"), the stack pointer ends up being outside the reported stack area.

    This is the minimal patch: it just propagates the error. It also
    effectively makes the guard page part of the stack limit, which in turn
    means that the actual real stack is one page less than the stack limit.

    Let's see if anybody notices. We could teach acct_stack_growth() to
    allow an extra page for a grow-up/grow-down stack in the rlimit test,
    but I don't want to add more complexity if it isn't needed.

    Reported-and-tested-by: Jay Foad
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit 2d6d7f98284648c5ed113fe22a132148950b140f upstream.

    Tejun, while reviewing the code, spotted the following race condition
    between the dirtying and truncation of a page:

    __set_page_dirty_nobuffers()      __delete_from_page_cache()
      if (TestSetPageDirty(page))
                                        page->mapping = NULL
                                        if (PageDirty())
                                          dec_zone_page_state(page, NR_FILE_DIRTY);
                                          dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
        if (page->mapping)
          account_page_dirtied(page)
            __inc_zone_page_state(page, NR_FILE_DIRTY);
            __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);

    which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.

    Dirtiers usually lock out truncation, either by holding the page lock
    directly, or in case of zap_pte_range(), by pinning the mapcount with
    the page table lock held. The notable exception to this rule, though,
    is do_wp_page(), for which this race exists. However, do_wp_page()
    already waits for a locked page to unlock before setting the dirty bit,
    in order to prevent a race where clear_page_dirty() misses the page bit
    in the presence of dirty ptes. Upgrade that wait to a fully locked
    set_page_dirty() to also cover the situation explained above.

    Afterwards, the code in set_page_dirty() dealing with a truncation race
    is no longer needed. Remove it.

    Reported-by: Tejun Heo
    Signed-off-by: Johannes Weiner
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Johannes Weiner
     
  • commit 1e359a5de861a57aa04d92bb620f52a5c1d7f8b1 upstream.

    This reverts commit ca34e3b5c808385b175650605faa29e71e91991b.

    It turns out that the p54 and cw1200 drivers assume that there's
    tailroom even when they don't say they really need it. However,
    there's currently no way for them to explicitly say they do need
    it, so for now revert this.

    This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331.

    Fixes: ca34e3b5c808 ("mac80211: Fix accounting of the tailroom-needed counter")
    Reported-by: Christopher Chavez
    Bisected-by: Larry Finger
    Debugged-by: Christian Lamparter
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg
     
  • commit 31d4ea1a093fcf668d5f95af44b8d41488bdb7ec upstream.

    An attempt to fix fcopy on i586 (bc5a5b0 Drivers: hv: util: Properly pack the data
    for file copy functionality) led to a regression on x86_64 (and actually didn't fix
    i586 breakage). Fcopy messages from Hyper-V host come in the following format:

    struct do_fcopy_hdr | 36 bytes
    0000 | 4 bytes
    offset | 8 bytes
    size | 4 bytes
    data | 6144 bytes

    On x86_64 struct hv_do_fcopy matched this format without '__attribute__((packed))'
    and on i586 adding '__attribute__((packed))' to it doesn't change anything. Keep
    the structure packed and add padding to match reality. Tested both i586 and x86_64
    on Hyper-V Server 2012 R2.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
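
The layout listed in this commit message can be checked at compile time with a sketch like the one below. `struct sketch_fcopy_msg` is a stand-in, not the kernel's definition: the 36-byte header is treated as opaque bytes, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DATA_FRAGMENT 6144

struct sketch_fcopy_msg {
	uint8_t  hdr[36];	/* stands in for struct do_fcopy_hdr (opaque here) */
	uint32_t pad;		/* the "0000" word, explicit because the struct is packed */
	uint64_t offset;
	uint32_t size;
	uint8_t  data[SKETCH_DATA_FRAGMENT];
} __attribute__((packed));

/* Compile-time checks against the sizes listed in the commit message. */
static_assert(offsetof(struct sketch_fcopy_msg, pad) == 36, "pad follows header");
static_assert(offsetof(struct sketch_fcopy_msg, offset) == 40, "offset at 36 + 4");
static_assert(offsetof(struct sketch_fcopy_msg, size) == 48, "size at 40 + 8");
static_assert(offsetof(struct sketch_fcopy_msg, data) == 52, "data at 48 + 4");
static_assert(sizeof(struct sketch_fcopy_msg) == 36 + 4 + 8 + 4 + SKETCH_DATA_FRAGMENT,
	      "no hidden padding");
```

Without the explicit `pad` member, a compiler that aligns 64-bit members to 8 bytes would insert the same 4 bytes implicitly, while one that aligns them to 4 bytes inside structures (as the 32-bit x86 ABI does, to the best of my understanding) would not. That is consistent with the message's observation that the unpacked structure happened to match on x86_64 only.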
     
  • commit aee4e5f3d3abb7a2239dd02f6d8fb173413fd02f upstream.

    When recording the state of a task for the sched_switch tracepoint a check of
    task_preempt_count() is performed to see if PREEMPT_ACTIVE is set. This is
    because, technically, a task being preempted is really in the TASK_RUNNING
    state, and that is what should be recorded when tracing a sched_switch,
    even if the task put itself into another state (it hasn't scheduled out
    in that state yet).

    But with the change to use per_cpu preempt counts, the
    task_thread_info(p)->preempt_count is no longer used, and instead
    task_preempt_count(p) is used.

    The problem is that this does not use the current preempt count but a stale
    one from a previous sched_switch. The task_preempt_count(p) uses
    saved_preempt_count and not preempt_count(). But for tracing sched_switch,
    if p is current, we really want preempt_count().

    I hit this bug when I was tracing sleep and the call from do_nanosleep()
    scheduled out in the "RUNNING" state.

    sleep-4290 [000] 537272.259992: sched_switch: sleep:4290 [120] R ==> swapper/0:0 [120]
    sleep-4290 [000] 537272.260015: kernel_stack:
    => __schedule (ffffffff8150864a)
    => schedule (ffffffff815089f8)
    => do_nanosleep (ffffffff8150b76c)
    => hrtimer_nanosleep (ffffffff8108d66b)
    => SyS_nanosleep (ffffffff8108d750)
    => return_to_handler (ffffffff8150e8e5)
    => tracesys_phase2 (ffffffff8150c844)

    After a bit of hair pulling, I found that the state was really
    TASK_INTERRUPTIBLE, but the saved_preempt_count had an old PREEMPT_ACTIVE
    set and caused the sched_switch tracepoint to show it as RUNNING.

    Link: http://lkml.kernel.org/r/20141210174428.3cb7542a@gandalf.local.home

    Acked-by: Ingo Molnar
    Cc: Peter Zijlstra
    Fixes: 01028747559a "sched: Create more preempt_count accessors"
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     
  • commit 027bc8b08242c59e19356b4b2c189f2d849ab660 upstream.

    On some ARMs the memory can be mapped pgprot_noncached() and still
    be working for atomic operations. As pointed out by Colin Cross,
    in some cases you do want to use pgprot_noncached() if the SoC
    supports it to see a debug printk just before a write hanging the
    system.

    On ARMs, the atomic operations on strongly ordered memory are
    implementation defined. So let's provide an optional kernel parameter
    for configuring pgprot_noncached(), and use pgprot_writecombine() by
    default.

    Cc: Arnd Bergmann
    Cc: Rob Herring
    Cc: Randy Dunlap
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Cc: Olof Johansson
    Cc: Russell King
    Acked-by: Kees Cook
    Signed-off-by: Tony Lindgren
    Signed-off-by: Tony Luck
    Signed-off-by: Greg Kroah-Hartman

    Tony Lindgren
     
  • commit 63f13448d81c910a284b096149411a719cbed501 upstream.

    Since both ppc and ppc64 have LE variants which are now reported by uname, add
    that flag (__AUDIT_ARCH_LE) to syscall_get_arch() and add AUDIT_ARCH_PPC64LE
    variant.

    Without this, perf trace and auditctl fail.

    Mainline kernel reports ppc64le (per a058801) but there is no matching
    AUDIT_ARCH_PPC64LE.

    Since 32-bit PPC LE is not supported by audit, don't advertise it in
    AUDIT_ARCH_PPC* variants.

    See:
    https://www.redhat.com/archives/linux-audit/2014-August/msg00082.html
    https://www.redhat.com/archives/linux-audit/2014-December/msg00004.html

    Signed-off-by: Richard Guy Briggs
    Acked-by: Paul Moore
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Richard Guy Briggs
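
How the audit architecture token for 64-bit PowerPC is composed can be sketched as follows. The constants are reproduced here as assumptions under `SKETCH_` names; the authoritative values are in the kernel's uapi headers. How 32-bit tasks are reported is outside what this sketch models.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, mirrored from the uapi headers as I understand them. */
#define SKETCH_EM_PPC64         21u		/* ELF machine number */
#define SKETCH_AUDIT_ARCH_64BIT 0x80000000u
#define SKETCH_AUDIT_ARCH_LE    0x40000000u

/* The flag bits must not collide with the machine number. */
static_assert((SKETCH_EM_PPC64 & (SKETCH_AUDIT_ARCH_64BIT | SKETCH_AUDIT_ARCH_LE)) == 0,
	      "flags overlap machine number");

/* Compose the audit arch token for a 64-bit PowerPC task. */
static uint32_t sketch_ppc64_audit_arch(bool little_endian)
{
	uint32_t arch = SKETCH_EM_PPC64 | SKETCH_AUDIT_ARCH_64BIT;

	if (little_endian)
		arch |= SKETCH_AUDIT_ARCH_LE;
	return arch;
}
```

Tools that match on the token can only recognise a little-endian system if a constant for the LE combination exists, which is the gap this commit describes.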
     

09 Jan, 2015

4 commits

  • commit 041d7b98ffe59c59fdd639931dea7d74f9aa9a59 upstream.

    A regression was caused by commit 780a7654cee8:
    audit: Make testing for a valid loginuid explicit.
    (which in turn attempted to fix a regression caused by e1760bd)

    When audit_krule_to_data() fills in the rules to get a listing, there was a
    missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.

    This broke userspace by not returning the same information that was sent and
    expected.

    The rule:
    auditctl -a exit,never -F auid=-1
    gives:
    auditctl -l
    LIST_RULES: exit,never f24=0 syscall=all
    when it should give:
    LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all

    Tag it so that it is reported the same way it was set. Create a new
    private flags audit_krule field (pflags) to store it that won't interact with
    the public one from the API.

    Signed-off-by: Richard Guy Briggs
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Richard Guy Briggs
     
  • commit 9cc46516ddf497ea16e8d7cb986ae03a0f6b92f8 upstream.

    - Expose the knob to user space through a proc file /proc/<pid>/setgroups

    A value of "deny" means the setgroups system call is disabled in the
    current process's user namespace and can not be enabled in the
    future in this user namespace.

    A value of "allow" means the setgroups system call is enabled.

    - Descendant user namespaces inherit the value of setgroups from
    their parents.

    - A proc file is used (instead of a sysctl) as sysctls currently do
    not allow checking the permissions at open time.

    - Writing to the proc file is restricted to before the gid_map
    for the user namespace is set.

    This ensures that disabling setgroups at a user namespace
    level will never remove the ability to call setgroups
    from a process that already has that ability.

    A process may opt in to the setgroups disable for itself by
    creating, entering and configuring a user namespace or by calling
    setns on an existing user namespace with setgroups disabled.
    Processes without privileges already can not call setgroups so this
    is a noop. Processes with privilege become processes without
    privilege when entering a user namespace and as with any other path
    to dropping privilege they would not have the ability to call
    setgroups. So this remains within the bounds of what is possible
    without a knob to disable setgroups permanently in a user namespace.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
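
The rules listed in this commit can be captured in a simplified state model. The structure and function names below are hypothetical, and the model follows the wording of the message rather than the kernel implementation: a child inherits the parent's setting, "deny" cannot be undone, and writes are refused once the gid_map is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct sketch_userns {
	bool setgroups_allowed;
	bool gid_map_set;
};

/* A new namespace inherits its parent's setting and has no gid_map yet. */
static struct sketch_userns sketch_userns_create(const struct sketch_userns *parent)
{
	struct sketch_userns ns = { parent ? parent->setgroups_allowed : true, false };

	return ns;
}

/* Model a write to the setgroups file; 0 on success, -1 if refused. */
static int sketch_write_setgroups(struct sketch_userns *ns, const char *val)
{
	assert(ns != NULL && val != NULL);
	if (ns->gid_map_set)
		return -1;	/* too late once the gid_map has been written */
	if (strcmp(val, "allow") == 0)
		return ns->setgroups_allowed ? 0 : -1;	/* "deny" is permanent */
	if (strcmp(val, "deny") == 0) {
		ns->setgroups_allowed = false;
		return 0;
	}
	return -1;	/* unknown value */
}
```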
     
  • commit 273d2c67c3e179adb1e74f403d1e9a06e3f841b5 upstream.

    setgroups is unique in not needing a valid mapping before it can be called,
    in the case of setgroups(0, NULL) which drops all supplemental groups.

    The design of the user namespace assumes that CAP_SETGID can not actually
    be used until a gid mapping is established. Therefore add a helper function
    to see if the user namespace gid mapping has been established and call
    that function in the setgroups permission check.

    This is part of the fix for CVE-2014-8989, being able to drop groups
    without privilege using user namespaces.

    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 7ff4d90b4c24a03666f296c3d4878cd39001e81e upstream.

    Today there are 3 instances of setgroups and due to an oversight their
    permission checking has diverged. Add a common function so that
    they may all share the same permission checking code.

    This corrects the current oversight in the current permission checks
    and adds a helper to avoid this in the future.

    A user namespace security fix will update this new helper, shortly.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

17 Dec, 2014

1 commit


05 Dec, 2014

1 commit


29 Nov, 2014

2 commits

  • Pull staging/IIO driver fixes from Greg KH:
    "Here are some staging and IIO driver fixes for 3.18-rc7 that resolve a
    number of reported issues, and a new device id for a staging wireless
    driver.

    All of these have been in linux-next"

    * tag 'staging-3.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    staging: r8188eu: Add new device ID for DLink GO-USB-N150
    staging: r8188eu: Fix scheduling while atomic error introduced in commit fadbe0cd
    iio: accel: bmc150: set low default thresholds
    iio: accel: bmc150: Fix iio_event_spec direction
    iio: accel: bmc150: Send x, y and z motion separately
    iio: accel: bmc150: Error handling when mode set fails
    iio: gyro: bmg160: Fix iio_event_spec direction
    iio: gyro: bmg160: Send x, y and z motion separately
    iio: gyro: bmg160: Don't let interrupt mode to be open drain
    iio: gyro: bmg160: Error handling when mode set fails
    iio: adc: men_z188_adc: Add terminating entry for men_z188_ids
    iio: accel: kxcjk-1013: Fix kxcjk10013_set_range
    iio: Fix IIO_EVENT_CODE_EXTRACT_DIR bit mask

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "No excitement, here are only minor fixes: an endian fix for the new
    DSD format we added in 3.18, a fix for HP mute LED, and a fix for
    Native Instrument quirk"

    * tag 'sound-3.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: pcm: Add big-endian DSD sample formats and fix XMOS DSD sample format
    ALSA: hda - One more HP machine needs to change mute led quirk
    ALSA: usb-audio: Use snd_usb_ctl_msg() for Native Instruments quirk

    Linus Torvalds
     

28 Nov, 2014

1 commit

  • Pull networking fixes from David Miller:
    "Several small fixes here:

    1) Don't crash in tg3 driver when the number of tx queues has been
    configured to be different from the number of rx queues. From
    Thadeu Lima de Souza Cascardo.

    2) VLAN filter not disabled properly in promisc mode in ixgbe driver,
    from Vlad Yasevich.

    3) Fix OOPS on dellink op in VTI tunnel driver, from Xin Long.

    4) IPV6 GRE driver WCCP code checks skb->protocol for ETH_P_IP
    instead of ETH_P_IPV6, whoops. From Yuri Chislov.

    5) Socket matching in ping driver is buggy when packet AF does not
    match socket's AF. Fix from Jane Zhou.

    6) Fix checksum calculation errors in VXLAN due to where the
    udp_tunnel6_xmit_skb() helper gets its saddr/daddr from. From
    Alexander Duyck.

    7) Fix 5G detection problem in rtlwifi driver, from Larry Finger.

    8) Fix NULL deref in tcp_v{4,6}_send_reset, from Eric Dumazet.

    9) Various missing netlink attribute verifications in bridging code,
    from Thomas Graf.

    10) tcp_recvmsg() unconditionally calls ipv4 ip_recv_error even for
    ipv6 sockets, whoops. Fix from Willem de Bruijn"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
    net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks
    bridge: Sanitize IFLA_EXT_MASK for AF_BRIDGE:RTM_GETLINK
    bridge: Add missing policy entry for IFLA_BRPORT_FAST_LEAVE
    net: Check for presence of IFLA_AF_SPEC
    net: Validate IFLA_BRIDGE_MODE attribute length
    bridge: Validate IFLA_BRIDGE_FLAGS attribute length
    stmmac: platform: fix default values of the filter bins setting
    net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
    net: dsa: bcm_sf2: reset switch prior to initialization
    net: dsa: bcm_sf2: fix unmapping registers in case of errors
    tg3: fix ring init when there are more TX than RX channels
    tcp: fix possible NULL dereference in tcp_vX_send_reset()
    rtlwifi: Change order in device startup
    rtlwifi: rtl8821ae: Fix 5G detection problem
    Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse"
    vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX]
    ip6_udp_tunnel: Fix checksum calculation
    net-timestamp: Fix a documentation typo
    net/ping: handle protocol mismatching scenario
    af_packet: fix sparse warning
    ...

    Linus Torvalds
     

27 Nov, 2014

2 commits

  • TCP timestamping introduced MSG_ERRQUEUE handling for TCP sockets.
    If the socket is of family AF_INET6, call ipv6_recv_error instead
    of ip_recv_error.

    This change is more complex than a single branch due to the loadable
    ipv6 module. It reuses a pre-existing indirect function call from
    ping. The ping code is safe to call, because it is part of the core
    ipv6 module and always present when AF_INET6 sockets are active.
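
    The dispatch described above can be sketched in userspace as follows.
    This is only an illustration, not the kernel code: the enum, the
    sketch_* functions and ipv6_recv_error_hook are hypothetical names, and
    the hook stands in for the pre-existing indirect call that the loadable
    ipv6 module fills in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the address families; not the real AF_* values. */
enum sketch_family { SKETCH_INET, SKETCH_INET6 };

typedef int (*recv_error_fn)(void);

/* Each handler returns a marker so the caller can see which one ran. */
static int sketch_ip_recv_error(void)   { return 4; }
static int sketch_ipv6_recv_error(void) { return 6; }

/* Set by the (possibly modular) IPv6 code when it loads; NULL before that. */
static recv_error_fn ipv6_recv_error_hook;

/* Pick the error-queue handler from the socket family. */
static int sketch_recv_error(enum sketch_family family)
{
    if (family == SKETCH_INET6) {
        /* No direct symbol reference: go through the registered hook. */
        if (ipv6_recv_error_hook == NULL)
            return -1;
        return ipv6_recv_error_hook();
    }
    return sketch_ip_recv_error();
}

void recv_error_demo(void)
{
    ipv6_recv_error_hook = sketch_ipv6_recv_error; /* "module loaded" */
    assert(sketch_recv_error(SKETCH_INET) == 4);
    assert(sketch_recv_error(SKETCH_INET6) == 6);
}
```

    The indirect call avoids a link-time dependency of the core code on a
    symbol that only exists once the ipv6 module is loaded.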

    Fixes: 4ed2d765 (net-timestamp: TCP timestamping)
    Signed-off-by: Willem de Bruijn

    ----

    It may also be worthwhile to add WARN_ON_ONCE(sk->family == AF_INET6)
    to ip_recv_error.
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • …/jic23/iio into staging-linus

    Jonathan writes:

    Third set of IIO fixes for the 3.18 cycle.

    Most of these are fairly standard little fixes, a bmc150 and bmg160 patch
    is to make an ABI change to indicate a specific axis in an event rather
    than the generic option in the original drivers. As both of these drivers
    are new in this cycle it would be ideal to push this minor change through
    even though it isn't strictly a fix. A couple of other 'fixes' change
    defaults for some settings on these new drivers to more intuitive values.
    Looks like some useful feedback has been coming in for this driver
    since it was applied.

    * IIO_EVENT_CODE_EXTRACT_DIR bit mask was wrong and has been for a while
    0xCF clearly doesn't give a contiguous bitmask.
    * kxcjk-1013 range setting was failing to mask out the previous value
    in the register and hence was 'enable only'.
    * men_z188 device id table wasn't null terminated.
    * bmg160 and bmc150 both failed to correctly handle an error in mode
    setting.
    * bmg160 and bmc150 both had a bug in setting the event direction in the
    event spec (leads to an attribute name being incorrect)
    * bmg160 defaulted to an open drain output for the interrupt - as a default
    this obviously only works with some interrupt chips - hence change the
    default to push-pull (note this is a new driver so we aren't going to
    cause any regressions with this change).
    * bmc150 had an unintuitive default for the rate of change (motion detector)
    so change it to 0 (new driver so change of default won't cause any
    regressions).
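
    Two of the bug classes in the list above (a non-contiguous bit mask and
    a register update that never clears the old field) can be illustrated
    with a small sketch. The helper names are hypothetical and are not taken
    from the IIO code; 0x7F and 0x18 are used only as example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the set bits of mask form one unbroken run (e.g. 0x7F, 0x38). */
static bool mask_is_contiguous(uint32_t mask)
{
    uint32_t lowest = mask & (~mask + 1u); /* isolate the lowest set bit */
    uint32_t carried = mask + lowest;      /* a single run carries out entirely */

    return mask != 0 && (carried & mask) == 0;
}

/* Read-modify-write: clear the field first, then set the new value. */
static uint8_t update_bits(uint8_t reg, uint8_t mask, uint8_t val)
{
    return (uint8_t)((reg & (uint8_t)~mask) | (val & mask));
}

void mask_demo(void)
{
    assert(!mask_is_contiguous(0xCF)); /* 1100 1111: two separate runs */
    assert(mask_is_contiguous(0x7F));  /* 0111 1111: one run */

    /* OR-ing without clearing keeps stale bits: 0x18 | 0x08 is still 0x18. */
    assert((0x18 | 0x08) == 0x18);
    /* Masking out the old field first gives the intended 0x08. */
    assert(update_bits(0x18, 0x18, 0x08) == 0x08);
}
```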

    Greg Kroah-Hartman
     

26 Nov, 2014

3 commits

  • This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in
    kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.

    The problem being addressed by the patch above was that some ARM code
    based the memory mapping attributes of a pfn on the return value of
    kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
    be mapped as device memory.

    However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
    and the existing non-ARM users were already using it in a way which
    suggests that its name should probably have been 'kvm_is_reserved_pfn'
    from the beginning, e.g., whether or not to call get_page/put_page on
    it etc. This means that returning false for the zero page is a mistake
    and the patch above should be reverted.

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Paolo Bonzini

    Ard Biesheuvel
     
  • Pull powerpc fixes from Ben Herrenschmidt:
    "This series fixes a nasty issue with radeon adapters on powerpc servers,
    it's all CC'ed stable and has the relevant maintainers ack's/reviews.

    Basically, some (radeon) adapters have issues with MSI addresses above
    1T (only support 40-bits). We had powerpc specific quirk but it only
    listed a specific revision of an adapter that we shipped with our
    machines and didn't properly handle the audio function which some
    distros enable nowadays.

    So we made the quirk generic and fixed both the graphic and audio
    drivers properly to use it.

    Without that, ppc64 server machines will crash at boot with a radeon
    adapter.

    Note: This has been brewing for a while, it just needed a last respin
    which got delayed due to us moving ozlabs to a new location in town
    and other such things taking priority"

    * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc/pci: Remove unused force_32bit_msi quirk
    powerpc/pseries: Honor the generic "no_64bit_msi" flag
    powerpc/powernv: Honor the generic "no_64bit_msi" flag
    sound/radeon: Move 64-bit MSI quirk from arch to driver
    gpu/radeon: Set flag to indicate broken 64-bit MSI
    PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
    ALSA: hda - Limit 40bit DMA for AMD HDMI controllers

    Linus Torvalds
     
  • Pull clock fixes from Mike Turquette:
    "The fixes for the clock framework are all regressions in drivers, plus
    a single fix in one of the basic clock templates. No fixes to the
    core this time around.

    As with most clock driver fixes these run the gamut from fixing a
    build warning to fixing wrecked memory timings, with a little USB
    tossed in for fun"

    * tag 'clk-fixes-for-linus' of https://git.linaro.org/people/mike.turquette/linux:
    clk: pxa: fix pxa27x CCCR bit usage
    clk-divider: Fix READ_ONLY when divider > 1
    clk: qcom: Fix duplicate rbcpr clock name
    clk: at91: usb: fix at91sam9x5 recalc, round and set rate
    clk: at91: usb: fix at91rm9200 round and set rate

    Linus Torvalds
     

24 Nov, 2014

3 commits

  • This can be set by quirks/drivers to be used by the architecture code
    that assigns the MSI addresses.

    We additionally add verification in the core MSI code that the values
    assigned by the architecture do satisfy the limitation in order to fail
    gracefully if they don't (ie. the arch hasn't been updated to deal with
    that quirk yet).
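
    A minimal userspace sketch of the verification described above,
    assuming a message layout with separate low and high address words. The
    struct and function names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an assigned MSI message. */
struct sketch_msi_msg {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

/*
 * If the device cannot take 64-bit MSI addresses, reject any assignment
 * whose upper address word is non-zero instead of programming it.
 */
static int sketch_verify_msi(bool no_64bit_msi,
                             const struct sketch_msi_msg *msgs, size_t n)
{
    size_t i;

    if (!no_64bit_msi)
        return 0;

    for (i = 0; i < n; i++) {
        if (msgs[i].address_hi != 0)
            return -EIO; /* arch code ignored the limitation */
    }
    return 0;
}

void msi_demo(void)
{
    struct sketch_msi_msg ok  = { 0xfee00000u, 0, 0x41 };
    struct sketch_msi_msg bad = { 0x00001000u, 0x100, 0x41 };

    assert(sketch_verify_msi(true, &ok, 1) == 0);
    assert(sketch_verify_msi(true, &bad, 1) == -EIO);
    assert(sketch_verify_msi(false, &bad, 1) == 0);
}
```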

    Signed-off-by: Benjamin Herrenschmidt
    CC:
    Acked-by: Bjorn Helgaas

    Benjamin Herrenschmidt
     
  • Pull percpu fix from Tejun Heo:
    "This contains one patch to fix a race condition which can lead to
    percpu_ref using a percpu pointer which is corrupted with a set DEAD
    bit. The bug was introduced while separating out the ATOMIC mode flag
    from the DEAD flag. The fix is pretty straight forward.

    I just committed the patch to the percpu tree but am sending out the
    pull request early as I'll be on vacation for a week. The patch
    should be fairly safe and while the latency will be higher I'll be
    checking emails"

    * 'for-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu-ref: fix DEAD flag contamination of percpu pointer

    Linus Torvalds
     
  • While decoupling ATOMIC and DEAD flags, f47ad4578461 ("percpu_ref:
    decouple switching to percpu mode and reinit") updated
    __ref_is_percpu() so that it only tests ATOMIC flag to determine
    whether the ref is in percpu mode or not; however, while DEAD implies
    ATOMIC, the two flags are set separately during percpu_ref_kill() and
    if __ref_is_percpu() races percpu_ref_kill(), it may see DEAD w/o
    ATOMIC. Because __ref_is_percpu() returns @ref->percpu_count_ptr
    value verbatim as the percpu pointer after testing ATOMIC, the pointer
    may now be contaminated with the DEAD flag.

    This can be fixed by clearing the flag bits before returning the
    pointer which was the fix proposed by Shaohua; however, as DEAD
    implies ATOMIC, we can just test for both flags at once and avoid the
    explicit masking.

    Update __ref_is_percpu() so that it tests that both ATOMIC and DEAD
    are clear before returning @ref->percpu_count_ptr as the percpu
    pointer.
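
    The idea can be modelled in userspace roughly as below. This is a
    simplified sketch, not the kernel implementation: the SKETCH_* flag
    names, the struct and the function are hypothetical, and the real code
    additionally has to read the field exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags kept in the low bits of the pointer-sized field. */
#define SKETCH_ATOMIC      ((uintptr_t)1)
#define SKETCH_DEAD        ((uintptr_t)2)
#define SKETCH_ATOMIC_DEAD (SKETCH_ATOMIC | SKETCH_DEAD)

struct sketch_ref {
    uintptr_t percpu_count_ptr;
};

/*
 * Return true and hand out the pointer only when both flags are clear.
 * Testing ATOMIC alone would let a value with just DEAD set through,
 * and the caller would then use a pointer with a stray low bit.
 */
static bool sketch_ref_is_percpu(const struct sketch_ref *ref, uintptr_t *out)
{
    uintptr_t v = ref->percpu_count_ptr; /* one snapshot of the field */

    if (v & SKETCH_ATOMIC_DEAD)
        return false;

    *out = v;
    return true;
}

void percpu_ref_demo(void)
{
    struct sketch_ref ref = { 0x1000 };
    uintptr_t p = 0;

    assert(sketch_ref_is_percpu(&ref, &p) && p == 0x1000);

    /* The racy window: DEAD is visible, ATOMIC is not yet. */
    ref.percpu_count_ptr = 0x1000 | SKETCH_DEAD;
    assert(!sketch_ref_is_percpu(&ref, &p));
}
```

    Testing both flags in one comparison costs the same as testing one,
    which is why no explicit masking of the returned value is needed.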

    Signed-off-by: Tejun Heo
    Reported-and-Reviewed-by: Shaohua Li
    Link: http://lkml.kernel.org/r/995deb699f5b873c45d667df4add3b06f73c2c25.1416638887.git.shli@kernel.org
    Fixes: f47ad4578461 ("percpu_ref: decouple switching to percpu mode and reinit")

    Tejun Heo
     

22 Nov, 2014

2 commits

  • Pull networking fixes from David Miller:

    1) Fix BUG when decrypting empty packets in mac80211, from Ronald Wahl.

    2) nf_nat_range is not fully initialized and this is copied back to
    userspace, from Daniel Borkmann.

    3) Fix read past end of buffer in netfilter ipset, also from Dan
    Carpenter.

    4) Signed integer overflow in ipv4 address mask creation helper
    inet_make_mask(), from Vincent BENAYOUN.

    5) VXLAN, be2net, mlx4_en, and qlcnic need ->ndo_gso_check() methods to
    properly describe the device's capabilities, from Joe Stringer.

    6) Fix memory leaks and checksum miscalculations in openvswitch, from
    Pravin B Shelar and Jesse Gross.

    7) FIB rules pass back an ambiguous error code for unreachable routes,
    making behavior confusing for userspace. Fix from Panu Matilainen.

    8) ieee802154fake_probe() doesn't release resources properly on error,
    from Alexey Khoroshilov.

    9) Fix skb_over_panic in add_grhead(), from Daniel Borkmann.

    10) Fix access of stale slave pointers in bonding code, from Nikolay
    Aleksandrov.

    11) Fix stack info leak in PPP pptp code, from Mathias Krause.

    12) Cure locking bug in IPX stack, from Jiri Bohac.

    13) Revert SKB fclone memory freeing optimization that is racy and can
    allow accesses to freed up memory, from Eric Dumazet.
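
    As an illustration of item 4, shifting a signed 1 into the sign bit is
    undefined behaviour in C, while the same shift on an unsigned operand is
    well defined. The sketch below builds a prefix mask in host byte order;
    the function name is hypothetical and byte-order conversion is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build an IPv4 netmask (host byte order) from a prefix length 0..32.
 * The shift operand is unsigned, so a prefix length of 1 (shift by 31)
 * does not overflow a signed int. Prefix length 0 is handled separately
 * because a shift by 32 on a 32-bit type is undefined.
 */
static uint32_t sketch_make_mask(unsigned int prefix_len)
{
    if (prefix_len == 0)
        return 0;
    return (uint32_t)~((UINT32_C(1) << (32 - prefix_len)) - 1);
}

void make_mask_demo(void)
{
    assert(sketch_make_mask(24) == 0xFFFFFF00u);
    assert(sketch_make_mask(1)  == 0x80000000u);
    assert(sketch_make_mask(32) == 0xFFFFFFFFu);
    assert(sketch_make_mask(0)  == 0);
}
```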

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (71 commits)
    tcp: Restore RFC5961-compliant behavior for SYN packets
    net: Revert "net: avoid one atomic operation in skb_clone()"
    virtio-net: validate features during probe
    cxgb4 : Fix DCB priority groups being returned in wrong order
    ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
    openvswitch: Don't validate IPv6 label masks.
    pptp: fix stack info leak in pptp_getname()
    brcmfmac: don't include linux/unaligned/access_ok.h
    cxgb4i : Don't block unload/cxgb4 unload when remote closes TCP connection
    ipv6: delete protocol and unregister rtnetlink when cleanup
    net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too
    bonding: fix curr_active_slave/carrier with loadbalance arp monitoring
    mac80211: minstrel_ht: fix a crash in rate sorting
    vxlan: Inline vxlan_gso_check().
    can: m_can: update to support CAN FD features
    can: m_can: fix incorrect error messages
    can: m_can: add missing delay after setting CCCR_INIT bit
    can: m_can: fix not set can_dlc for remote frame
    can: m_can: fix possible sleep in napi poll
    can: m_can: add missing message RAM initialization
    ...

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "This batch ended up as a relatively high volume due to pending ASoC
    fixes. But most of the fixes there are trivial and/or device-specific
    fixes and quirks, so safe to apply. The only (ASoC) core fixes are
    the DPCM race fix and the machine-driver matching fix for
    componentization"

    * tag 'sound-3.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: hda - fix the mic mute led problem for Latitude E5550
    ALSA: hda - move DELL_WMI_MIC_MUTE_LED to the tail in the quirk chain
    ASoC: wm_adsp: Avoid attempt to free buffers that might still be in use
    ALSA: usb-audio: Set the Control Selector to SU_SELECTOR_CONTROL for UAC2
    ALSA: usb-audio: Add ctrl message delay quirk for Marantz/Denon devices
    ASoC: sgtl5000: Fix SMALL_POP bit definition
    ASoC: cs42l51: re-hook of_match_table pointer
    ASoC: rt5670: change dapm routes of PLL connection
    ASoC: rt5670: correct the incorrect default values
    ASoC: samsung: Add MODULE_DEVICE_TABLE for Snow
    ASoC: max98090: Correct pclk divisor settings
    ASoC: dpcm: Fix race between FE/BE updates and trigger
    ASoC: Fix snd_soc_find_dai() matching component by name
    ASoC: rsnd: remove unsupported PAUSE flag
    ASoC: fsi: remove unsupported PAUSE flag
    ASoC: rt5645: Mark RT5645_TDM_CTRL_3 as readable
    ASoC: rockchip-i2s: fix infinite loop in rockchip_snd_rxctrl
    ASoC: es8328-i2c: Fix i2c_device_id name field in es8328_id
    ASoC: fsl_asrc: Add reg_defaults for regmap to fix kernel dump

    Linus Torvalds