12 Apr, 2014

9 commits


11 Apr, 2014

4 commits


31 Mar, 2014

10 commits

  • Linus Torvalds
     
  • Pull vfs fixes from Al Viro:
    "Switch mnt_hash to hlist, turning the races between __lookup_mnt() and
    hash modifications into false negatives from __lookup_mnt() (instead
    of hangs)"

    On the false negatives from __lookup_mnt():
    "The *only* thing we care about is not getting stuck in __lookup_mnt().
    If it misses an entry because something in front of it just got moved
    around, etc, we are fine. We'll notice that mount_lock mismatch and
    that'll be it"

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch mnt_hash to hlist
    don't bother with propagate_mnt() unless the target is shared
    keep shadowed vfsmounts together
    resizable namespace.c hashes

    Linus Torvalds
     
  • I am the new kernel tree Documentation maintainer (except for parts that
    are handled by other people, of course).

    Signed-off-by: Randy Dunlap
    Acked-by: Rob Landley
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Pull input updates from Dmitry Torokhov:
    "Some more updates for the input subsystem.

    You will get a fix for race in mousedev that has been causing quite a
    few oopses lately and a small fixup for force feedback support in
    evdev"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: mousedev - fix race when creating mixed device
    Input: don't modify the id of ioctl-provided ff effect on upload failure

    Linus Torvalds
     
  • It is possible to configure your PAM stack to refuse login if audit
    messages (about the login) were unable to be sent. This is common in
    many distros and thus normal configuration of many containers. The PAM
    modules determine if audit is enabled/disabled in the kernel based on
    the return value from sending an audit message on the netlink socket.
    If userspace gets back ECONNREFUSED it believes audit is disabled in the
    kernel. If it gets any other error it refuses to let the login
    proceed.

    Just about ever since the introduction of namespaces the kernel audit
    subsystem has returned EPERM if the task sending a message was not in
    the init user or pid namespace. So many forms of containers have never
    worked if audit was enabled in the kernel.

    BUT if the container was not in net_init then the kernel network code
    would send ECONNREFUSED (instead of the audit code sending EPERM). Thus
    by pure accident/dumb luck/bug if an admin configured the PAM stack to
    reject all logins that didn't talk to audit, but then ran the login
    utility in the non-init_net namespace, it would work!! Clearly this was
    a bug, but it is a bug some people expected.

    With the introduction of network namespace support in 3.14-rc1 the two
    bugs stopped cancelling each other out. Now, containers in the
    non-init_net namespace refused to let users log in (just like PAM was
    configured!) Obviously some people were not happy that what used to let
    users log in, now didn't!

    This fix is kinda hacky. We return ECONNREFUSED for all non-init
    relevant namespaces. That means that not only will the old broken
    non-init_net setups continue to work, now the broken non-init_pid or
    non-init_user setups will 'work'. They don't really work, since audit
    isn't logging things. But it's what most users want.

    In 3.15 we should have patches to support not only the non-init_net
    (3.14) namespace but also the non-init_pid and non-init_user namespace.
    So all will be right in the world. This just opens the doors wide open
    on 3.14 and hopefully makes users happy, if not the audit system...

    Reported-by: Andre Tomt
    Reported-by: Adam Richter
    Signed-off-by: Eric Paris
    Signed-off-by: Linus Torvalds

    Eric Paris
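
The errno handling described in this commit can be sketched in plain C. This is only an illustration of the decision logic, not the PAM or kernel audit code: the names `decide_login` and `audit_send_result`, and the reduction of the namespace checks to two booleans, are assumptions made for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of a login attempt as seen by a PAM-like module. */
enum login_decision { LOGIN_ALLOW, LOGIN_DENY };

/*
 * send_result is 0 on success, or a negative errno obtained when sending
 * the audit message over the netlink socket.
 */
enum login_decision decide_login(int send_result)
{
    if (send_result == 0)
        return LOGIN_ALLOW;   /* audit record was delivered */
    if (send_result == -ECONNREFUSED)
        return LOGIN_ALLOW;   /* read as "audit is disabled in the kernel" */
    return LOGIN_DENY;        /* any other error, e.g. -EPERM, blocks login */
}

/*
 * Hypothetical model of the kernel-side answer after the fix: a sender
 * outside the init user or pid namespace gets ECONNREFUSED, not EPERM.
 */
int audit_send_result(bool in_init_user_ns, bool in_init_pid_ns)
{
    if (!in_init_user_ns || !in_init_pid_ns)
        return -ECONNREFUSED;
    return 0;
}
```

With the old EPERM answer, `decide_login(-EPERM)` denies the login. With the fix, a container task gets ECONNREFUSED, so the login proceeds even though nothing is actually audited.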
     
  • Use cmpxchg() to atomically set i_flags instead of clearing out the
    S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
    EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
    where an immutable file has the immutable flag cleared for a brief
    window of time.

    Reported-by: John Sullivan
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
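
A minimal userspace sketch of the compare-and-exchange pattern this commit describes, using C11 atomics rather than the kernel's cmpxchg(). The flag values and the name `set_managed_flags` are made up for the example and do not match S_IMMUTABLE, S_APPEND or the ext4 code.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; the real S_* values differ. */
#define F_IMMUTABLE 0x1u
#define F_APPEND    0x2u
#define F_MANAGED   (F_IMMUTABLE | F_APPEND)

/*
 * Replace the managed bits of *flags with new_bits in a single atomic
 * step. Readers see either the old word or the new word, never an
 * intermediate state in which the managed bits are all cleared.
 */
void set_managed_flags(_Atomic unsigned int *flags, unsigned int new_bits)
{
    unsigned int old = atomic_load(flags);
    unsigned int updated;

    do {
        /* Keep unrelated bits, swap in the requested managed bits. */
        updated = (old & ~F_MANAGED) | (new_bits & F_MANAGED);
        /* On failure, 'old' is reloaded with the current value. */
    } while (!atomic_compare_exchange_weak(flags, &old, updated));
}
```

The clear-then-set approach is two separate stores, which is where the brief window without the immutable bit comes from. The loop above retries only when another writer changed the word in between.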
     
  • fixes RCU bug - walking through hlist is safe in face of element moves,
    since it's self-terminating. Cyclic lists are not - if we end up jumping
    to another hash chain, we'll loop infinitely without ever hitting the
    original list head.

    [fix for dumb braino folded]

    Spotted by: Max Kellermann
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
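
The difference between the two list shapes can be shown with a small standalone sketch. It does not use the kernel's hlist or list_head types; `struct node`, `lookup_terminated` and `walk_circular_bounded` are hypothetical names, and the step bound exists only so the example can report a walk that would otherwise never end.

```c
#include <stddef.h>

struct node {
    int key;
    struct node *next;
};

/*
 * NULL-terminated chain: the walk ends at the tail of whatever chain it
 * is currently on, so at worst it misses an entry (a false negative).
 */
const struct node *lookup_terminated(const struct node *first, int key)
{
    for (const struct node *n = first; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}

/*
 * Circular list: the walk ends only when it gets back to 'head'. If the
 * walker has been carried onto a different ring, 'head' is never seen.
 * Returns the number of steps taken, or -1 once max_steps is exceeded.
 */
int walk_circular_bounded(const struct node *head, const struct node *start,
                          int max_steps)
{
    int steps = 0;

    for (const struct node *n = start; n != head; n = n->next)
        if (++steps > max_steps)
            return -1;
    return steps;
}
```

Starting `walk_circular_bounded` on a node that belongs to another ring exhausts the bound, which models the hang. The NULL-terminated lookup on the same node simply returns NULL.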
     
  • If the dest_mnt is not shared, propagate_mnt() does nothing -
    there are no mounts to propagate to and thus no copies to create.
    Might as well not bother calling it in that case.

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     
  • preparation to switching mnt_hash to hlist

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     
  • * switch allocation to alloc_large_system_hash()
    * make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
    * switch mountpoint_hashtable from list_head to hlist_head

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     

30 Mar, 2014

5 commits

  • Pull timer fix from Ingo Molnar:
    "A late breaking fix from John. (The bug fixed has a hard lockup
    potential, but that was not observed, warnings were)"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time: Revert to calling clock_was_set_delayed() while in irq context

    Linus Torvalds
     
  • Pull Ceph fix from Sage Weil:
    "This drops a bad assert that a few users have been hitting but we've
    only recently been able to track down"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: drop an unsafe assertion

    Linus Torvalds
     
  • We should not be using static variable mousedev_mix in methods that can be
    called before that singleton gets assigned. While at it let's add open and
    close methods to mousedev structure so that we do not need to test if we
    are dealing with multiplexor or normal device and simply call appropriate
    method directly.

    This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=71551

    Reported-by: GiulioDP
    Tested-by: GiulioDP
    Cc: stable@vger.kernel.org
    Signed-off-by: Dmitry Torokhov

    Dmitry Torokhov
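
The second half of this change, calling per-device open and close methods instead of testing for the multiplexor, could look roughly like the sketch below. It is an illustration of method dispatch through function pointers only; `struct mdev`, its fields and the init helpers are hypothetical and are not the mousedev structures.

```c
#include <stddef.h>

struct mdev {
    int open_count;
    struct mdev **children;   /* used only by the mixed device */
    size_t nchildren;
    int (*open_device)(struct mdev *);
    void (*close_device)(struct mdev *);
};

static int plain_open(struct mdev *m)
{
    m->open_count++;
    return 0;
}

static void plain_close(struct mdev *m)
{
    m->open_count--;
}

/* The mixed device forwards open/close to every underlying device. */
static int mix_open(struct mdev *m)
{
    for (size_t i = 0; i < m->nchildren; i++)
        m->children[i]->open_device(m->children[i]);
    m->open_count++;
    return 0;
}

static void mix_close(struct mdev *m)
{
    for (size_t i = 0; i < m->nchildren; i++)
        m->children[i]->close_device(m->children[i]);
    m->open_count--;
}

void mdev_init_plain(struct mdev *m)
{
    *m = (struct mdev){ .open_device = plain_open,
                        .close_device = plain_close };
}

void mdev_init_mix(struct mdev *m, struct mdev **children, size_t n)
{
    *m = (struct mdev){ .children = children, .nchildren = n,
                        .open_device = mix_open,
                        .close_device = mix_close };
}
```

Callers invoke `m->open_device(m)` without comparing `m` against a global singleton, so nothing depends on that singleton having been assigned yet.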
     
  • If a new (id == -1) ff effect was uploaded from userspace,
    ff-core.c::input_ff_upload() will have assigned a positive number to the
    new effect id. Currently, evdev.c::evdev_do_ioctl() will save this new id
    to userspace, regardless of whether the upload succeeded or not.

    On upload failure, this can be confusing because the dev->ff->effects[]
    array will not contain an element at the index of that new effect id.

    This patch fixes this by leaving the id unchanged after upload fails.

    Note: Unfortunately applications should still expect changed effect id for
    quite some time.

    This has been discussed on:
    http://www.mail-archive.com/linux-input@vger.kernel.org/msg08513.html
    ("ff-core effect id handling in case of a failed effect upload")

    Suggested-by: Dmitry Torokhov
    Signed-off-by: Elias Vanderstuyft
    Signed-off-by: Dmitry Torokhov

    Elias Vanderstuyft
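
A small sketch of the save-and-restore idea in this patch. The `upload` stub, the `fail` parameter, the slot number 3 and the struct name are all invented for the example; the real code paths are input_ff_upload() and evdev_do_ioctl().

```c
#include <errno.h>

/* Hypothetical, reduced stand-in for an ff effect descriptor. */
struct ff_effect_sketch {
    int id;   /* -1 asks the core to pick a free slot */
};

/*
 * Hypothetical upload: like the behaviour described above, it assigns
 * an id to a new effect before the upload itself can fail.
 */
static int upload(struct ff_effect_sketch *e, int fail)
{
    if (e->id == -1)
        e->id = 3;   /* arbitrary slot chosen for the example */
    return fail ? -ENOSPC : 0;
}

/* Report the id back only when the upload actually succeeded. */
int upload_preserving_id(struct ff_effect_sketch *e, int fail)
{
    int saved_id = e->id;
    int err = upload(e, fail);

    if (err)
        e->id = saved_id;   /* leave the caller's id untouched */
    return err;
}
```

After a failed call the caller still sees `id == -1`, so it cannot mistake the unused slot number for a valid effect.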
     
  • Olivier Bonvalet reported having repeated crashes due to a failed
    assertion he was hitting in rbd_img_obj_callback():

    Assertion failure in rbd_img_obj_callback() at line 2165:
    rbd_assert(which >= img_request->next_completion);

    With a lot of help from Olivier with reproducing the problem
    we were able to determine the object and image requests had
    already been completed (and often freed) at the point the
    assertion failed.

    There was a great deal of discussion on the ceph-devel mailing list
    about this. The problem only arose when there were two (or more)
    object requests in an image request, and the problem was always
    seen when the second request was being completed.

    The problem is due to a race in the window between setting the
    "done" flag on an object request and checking the image request's
    next completion value. When the first object request completes, it
    checks to see if its successor request is marked "done", and if
    so, that request is also completed. In the process, the image
    request's next_completion value is updated to reflect that both
    the first and second requests are completed. By the time the
    second request is able to check the next_completion value, it
    has been set to a value *greater* than its own "which" value,
    which caused an assertion to fail.

    Fix this problem by skipping over any completion processing
    unless the completing object request is the next one expected.
    Test only for inequality (not >=), and eliminate the bad
    assertion.

    Tested-by: Olivier Bonvalet
    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil
    Reviewed-by: Ilya Dryomov

    Alex Elder
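
The fix described above, skipping completion processing unless the completing request is the next one expected, can be modelled in a few lines. This is a single-threaded sketch with hypothetical types; it leaves out the locking and reference counting the real rbd code needs and only shows why an inequality test replaces the `>=` assertion.

```c
#include <stdbool.h>

#define MAX_OBJ 8

/* Hypothetical, reduced model of an image request. */
struct img_request_sketch {
    unsigned int next_completion;   /* index of the next request to finish */
    unsigned int count;             /* number of object requests */
    bool done[MAX_OBJ];
};

/* Called when object request 'which' finishes. */
void obj_complete(struct img_request_sketch *img, unsigned int which)
{
    img->done[which] = true;

    /*
     * Not the next one expected: either an earlier request has yet to
     * finish, or an earlier request already swept past this one. Both
     * cases are legal, so just return instead of asserting.
     */
    if (which != img->next_completion)
        return;

    /* Complete this request and every consecutive "done" successor. */
    while (img->next_completion < img->count &&
           img->done[img->next_completion])
        img->next_completion++;
}
```

If request 1 is marked done first and request 0 then completes, request 0 advances `next_completion` to 2. When request 1 later reaches the check, its `which` is smaller than `next_completion`, which is exactly the state the old assertion rejected.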
     

29 Mar, 2014

12 commits

  • Pull networking fixes from David Miller:

    1) We've discovered a common error in several networking drivers, they
    put VLAN offload features into ->vlan_features, which would suggest
    that they support offloading 2 or more levels of VLAN encapsulation.
    Not only do these devices not do that, but we don't have the
    infrastructure yet to handle that at all.

    Fixes from Vlad Yasevich.

    2) Fix tcpdump crash with bridging and vlans, also from Vlad.

    3) Some MAINTAINERS updates for random32 and bonding.

    4) Fix late reseeds of prandom generator, from Sasha Levin.

    5) Bridge doesn't handle stacked vlans properly, fix from Toshiaki
    Makita.

    6) Fix deadlock in openvswitch, from Flavio Leitner.

    7) get_timewait4_sock() doesn't report delay times correctly, fix from
    Eric Dumazet.

    8) Duplicate address detection and addrconf verification need to run in
    contexts where RTNL can be obtained. Move them to run from a
    workqueue. From Hannes Frederic Sowa.

    9) Fix route refcount leaking in ip tunnels, from Pravin B Shelar.

    10) Don't return -EINTR from non-blocking recvmsg() on AF_UNIX sockets,
    from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
    vlan: Warn the user if lowerdev has bad vlan features.
    veth: Turn off vlan rx acceleration in vlan_features
    ifb: Remove vlan acceleration from vlan_features
    qlge: Do not propaged vlan tag offloads to vlans
    bridge: Fix crash with vlan filtering and tcpdump
    net: Account for all vlan headers in skb_mac_gso_segment
    MAINTAINERS: bonding: change email address
    MAINTAINERS: bonding: change email address
    ipv6: move DAD and addrconf_verify processing to workqueue
    tcp: fix get_timewait4_sock() delay computation on 64bit
    openvswitch: fix a possible deadlock and lockdep warning
    bridge: Fix handling stacked vlan tags
    bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled
    vhost: validate vhost_get_vq_desc return value
    vhost: fix total length when packets are too short
    random32: avoid attempt to late reseed if in the middle of seeding
    random32: assign to network folks in MAINTAINERS
    net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset
    core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
    vlan: Set hard_header_len according to available acceleration
    ...

    Linus Torvalds
     
  • Vlad Yasevich says:

    ====================
    Audit all drivers for correct vlan_features.

    Some drivers set vlan acceleration features in vlan_features. This causes
    issues with Q-in-Q/802.1ad configurations.

    Audit all the drivers for correct vlan_features. Fix broken ones.
    Add a warning to vlan code to help catch future offenders.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Some drivers incorrectly assign vlan acceleration features to
    vlan_features thus causing issues for Q-in-Q vlan configurations.
    Warn the user of such cases.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • For completeness, turn off vlan rx acceleration in vlan_features so
    that it doesn't show up on q-in-q setups.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Do not include vlan acceleration features in vlan_features as that
    precludes correct Q-in-Q operation.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • qlge driver turns off NETIF_F_HW_CTAG_FILTER, but forgets to
    turn off HW_CTAG_TX and HW_CTAG_RX on vlan devices. With the
    current settings, q-in-q will only generate a single vlan header.
    Remember to mask off CTAG_TX and CTAG_RX features in vlan_features.

    CC: Shahed Shaikh
    CC: Jitendra Kalsaria
    CC: Ron Mercer
    Signed-off-by: Vlad Yasevich
    Acked-by: Jitendra Kalsaria
    Signed-off-by: David S. Miller

    Vlad Yasevich
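
Masking the tag offloads out of the feature set inherited by vlan devices amounts to a bitwise operation like the one below. The bit positions are hypothetical and do not match the kernel's NETIF_F_* values; `vlan_features_from` is a name chosen for this sketch.

```c
#include <stdint.h>

/* Hypothetical feature bits for illustration only. */
#define F_SG              (UINT64_C(1) << 0)
#define F_HW_CTAG_TX      (UINT64_C(1) << 1)
#define F_HW_CTAG_RX      (UINT64_C(1) << 2)
#define F_HW_CTAG_FILTER  (UINT64_C(1) << 3)

#define F_VLAN_ACCEL (F_HW_CTAG_TX | F_HW_CTAG_RX | F_HW_CTAG_FILTER)

/*
 * Derive the features a stacked vlan device may inherit: everything the
 * lower device offers except the vlan tag acceleration bits.
 */
uint64_t vlan_features_from(uint64_t dev_features)
{
    return dev_features & ~F_VLAN_ACCEL;
}
```

Clearing only the filter bit would leave TX and RX tag offload advertised on the vlan device, which is the situation the commit message describes for q-in-q.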
     
  • When the vlan filtering is enabled on the bridge, but
    the filter is not configured on the bridge device itself,
    running tcpdump on the bridge device will result in
    an Oops with NULL pointer dereference. The reason
    is that br_pass_frame_up() will bypass the vlan
    check because promisc flag is set. It will then try
    to get the table pointer and process the packet based
    on the table. Since the table pointer is NULL, we oops.
    Catch this special condition in br_handle_vlan().

    Reported-by: Toshiaki Makita
    CC: Toshiaki Makita
    Signed-off-by: Vlad Yasevich
    Acked-by: Toshiaki Makita
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • skb_network_protocol() already accounts for multiple vlan
    headers that may be present in the skb. However, skb_mac_gso_segment()
    doesn't know anything about it and assumes that skb->mac_len
    is set correctly to skip all mac headers. That may not
    always be the case. If we are simply forwarding the packet (via
    bridge or macvtap), all vlan headers may not be accounted for.

    A simple solution is to allow skb_network_protocol to return
    the vlan depth it has calculated. This way skb_mac_gso_segment
    will correctly skip all mac headers.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
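
To illustrate what "returning the vlan depth" means, here is a standalone parser over a raw Ethernet frame. It does not use sk_buff or the kernel helpers; the function name, the constants and the choice to return 0 for a truncated frame are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_ALEN    6
#define SKETCH_ETH_P_8021Q  0x8100
#define SKETCH_ETH_P_8021AD 0x88A8

/*
 * Walk past any stacked vlan tags and return the encapsulated ethertype
 * in host byte order, or 0 if the frame is too short. On success, *depth
 * is the number of bytes occupied by the MAC header plus all vlan tags.
 */
uint16_t network_protocol(const uint8_t *frame, size_t len, size_t *depth)
{
    size_t off = 2 * SKETCH_ETH_ALEN;   /* skip destination and source MAC */
    uint16_t type;

    for (;;) {
        if (off + 2 > len)
            return 0;
        type = (uint16_t)((frame[off] << 8) | frame[off + 1]);
        off += 2;
        if (type != SKETCH_ETH_P_8021Q && type != SKETCH_ETH_P_8021AD)
            break;
        off += 2;   /* skip the tag control field; next ethertype follows */
    }

    *depth = off;
    return type;
}
```

An untagged frame yields a depth of 14 bytes, and each vlan tag adds 4. A caller that needs to skip all MAC-level headers can use the returned depth instead of trusting a precomputed length.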
     
  • Signed-off-by: Veaceslav Falico
    Signed-off-by: David S. Miller

    Veaceslav Falico
     
  • Update my email address.

    Signed-off-by: Jay Vosburgh
    Signed-off-by: David S. Miller

    Jay Vosburgh
     
  • Merge two fixes from Andrew Morton:
    "The x86 fix should come from x86 guys but they appear to be
    conferencing or otherwise distracted.

    The ocfs2 fix is a bit of a mess - the code runs into an immediate
    NULL deref and we're trying to work out how this got through test and
    review, but we haven't heard from Goldwyn in the past few days.
    Sasha's patch fixes the oops, but the feature as a whole is probably
    broken. So this is a stopgap for 3.14 - I'll aim to get the real
    fixes into 3.14.x"

    * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
    x86: fix boot on uniprocessor systems
    ocfs2: check if cluster name exists before deref

    Linus Torvalds
     
  • On x86 uniprocessor systems topology_physical_package_id() returns -1
    which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized
    which leads to GPF in rapl_pmu_init().

    See arch/x86/kernel/cpu/perf_event_intel_rapl.c.

    It turns out that physical_package_id and core_id can actually be
    retrieved for uniprocessor systems too. Enabling them also fixes
    rapl_pmu code.

    Signed-off-by: Artem Fetishev
    Cc: Stephane Eranian
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Artem Fetishev