21 Jul, 2013

1 commit


19 Jul, 2013

5 commits

  • Pull networking fixes from David Miller:
    "A couple interesting SKB fragment handling fixes, plus the usual small
    bits here and there:

    1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
    Tim Gardner.

    2) Get rid of a stupid reimplementation of "%*phC" in our sysfs MAC
    address printing helper.

    3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
    device can't do checksumming offloads then it shouldn't say it can
    do SG either. From Haiyang Zhang.

    4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.

    5) Don't leak DMA mappings on mapping failures, from Neil Horman.

    6) We need to reset the transport header of SKBs in ipv4 before we
    attempt to perform early socket demux, just like ipv6 does. From
    Eric Dumazet.

    7) Add missing locking on vxlan device removal, from Stephen
    Hemminger.

    8) xen-netfront has to make two passes over an SKB to prepare it for
    transfer. One pass calculates the number of slots needed, the
    second massages the SKB and fills the slots. Unfortunately, the
    first pass doesn't calculate the number of slots properly so we
    can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
    work out so well. Fix from Jan Beulich with help and discussion
    with several others.

    9) Fix a similar problem in tun and macvtap, which have to split up
    scatter-gather elements at PAGE_SIZE boundaries. Don't do
    zerocopy if it would result in a > MAX_SKB_FRAGS skb. Fixes from
    Jason Wang.

    10) On receive, once we've decoded the VLAN state completely, clear
    skb->vlan_tci. Otherwise demuxed tunnels underneath can trigger
    the VLAN code again, corrupting the packet. Fix from Eric
    Dumazet"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    vlan: fix a race in egress prio management
    vlan: mask vlan prio bits
    macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
    tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
    pkt_sched: sch_qfq: remove a source of high packet delay/jitter
    xen-netfront: pull on receive skb may need to happen earlier
    vxlan: add necessary locking on device removal
    hyperv: Fix the NETIF_F_SG flag setting in netvsc
    net: Fix sysfs_format_mac() code duplication.
    be2net: Fix to avoid hardware workaround when not needed
    macvtap: do not assume 802.1Q when send vlan packets
    macvtap: fix the missing ret value of TUNSETQUEUE
    ipv4: set transport header earlier
    mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
    bgmac: add dependency to phylib
    net/irda: fixed style issues in irlan_eth
    ethtool: fixed trailing statements in ethtool
    ndisc: bool initializations should use true and false
    atl1e: unmap partially mapped skb on dma error and free skb

    Linus Torvalds
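
Item 9 above depends on how many page-sized pieces an I/O vector occupies once each element is split at page boundaries. The following is a minimal sketch of that count, assuming 4 KiB pages and a fragment limit of 17. `struct seg`, `seg_pages`, `can_zerocopy` and both constants are hypothetical names chosen for illustration; this is not the tun or macvtap code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#define SKETCH_MAX_FRAGS 17u     /* assumption: stand-in for MAX_SKB_FRAGS */

struct seg {                     /* hypothetical stand-in for one iovec element */
    uintptr_t base;
    size_t len;
};

/* Pages touched by all segments when each one is split at page boundaries. */
static size_t seg_pages(const struct seg *v, size_t n)
{
    size_t pages = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t off;

        if (v[i].len == 0)
            continue;
        /* Offset of the segment start inside its first page. */
        off = v[i].base & (SKETCH_PAGE_SIZE - 1);
        pages += (off + v[i].len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    }
    return pages;
}

/* Zerocopy is only attempted when the page count fits the fragment limit. */
static int can_zerocopy(const struct seg *v, size_t n)
{
    return seg_pages(v, n) <= SKETCH_MAX_FRAGS;
}
```

Note that an unaligned start can push a buffer over the limit even when its length alone would fit: 17 pages of data starting one byte into a page touch 18 pages.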
     
  • egress_priority_map[] hash table updates are protected by rtnl,
    and we never remove elements until device is dismantled.

    We have to make sure that before inserting a new element in hash table,
    all its fields are committed to memory or else another cpu could
    find corrupt values and crash.

    Signed-off-by: Eric Dumazet
    Cc: Patrick McHardy
    Signed-off-by: David S. Miller

    Eric Dumazet
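
The ordering requirement described above (fill in every field, then make the element reachable) can be sketched in user space with C11 atomics. This is an analogue under stated assumptions, not the kernel patch: the kernel uses its own barrier primitives, and `struct prio_map`, `map_insert`, `map_lookup` and `NBUCKETS` are hypothetical names. Writers are assumed to be serialized externally (the role rtnl plays in the text), and elements are never removed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16u                 /* assumption: table size for the sketch */

struct prio_map {                    /* hypothetical element type */
    uint32_t priority;
    uint16_t vlan_qos;
    struct prio_map *next;
};

static _Atomic(struct prio_map *) buckets[NBUCKETS];

/* Writer side: callers are assumed to hold an external lock. */
static void map_insert(struct prio_map *np, uint32_t prio, uint16_t qos)
{
    _Atomic(struct prio_map *) *head = &buckets[prio % NBUCKETS];

    assert(np != NULL);
    np->priority = prio;
    np->vlan_qos = qos;
    np->next = atomic_load_explicit(head, memory_order_relaxed);
    /* Release store: the field writes above are ordered before the publish. */
    atomic_store_explicit(head, np, memory_order_release);
}

/* Lock-free reader: the acquire load pairs with the release store above. */
static uint16_t map_lookup(uint32_t prio)
{
    struct prio_map *mp =
        atomic_load_explicit(&buckets[prio % NBUCKETS], memory_order_acquire);

    for (; mp != NULL; mp = mp->next)
        if (mp->priority == prio)
            return mp->vlan_qos;
    return 0;
}
```

Without the release/acquire pairing, a reader on another CPU could observe the new head pointer before the element's fields, which is the failure mode the commit message describes.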
     
  • In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e
    ("vlan: don't deliver frames for unknown vlans to protocols")
    Florian made sure we set pkt_type to PACKET_OTHERHOST
    if the vlan id is set and we could find a vlan device for this
    particular id.

    But we also have a problem if prio bits are set.

    Steinar reported an issue on a router receiving IPv6 frames with a
    vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
    because skb->vlan_tci is set.

    Forwarded frame is completely corrupted : We can see (8100:4000)
    being inserted in the middle of IPv6 source address :

    16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
    9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
    0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651
    0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213
    0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
    0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233

    It seems we are not really ready to properly cope with this right now.

    We can probably do better in future kernels :
    vlan_get_ingress_priority() should be a netdev property instead of
    a per vlan_dev one.

    For stable kernels, let's clear vlan_tci to fix the bugs.

    Reported-by: Steinar H. Gunderson
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
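
The reported tag value 4000 (hexadecimal) can be decoded with the standard 802.1Q layout: the top three bits carry the priority and the low twelve bits the VLAN ID. A small sketch follows; the macro and function names are hypothetical and only the bit layout is taken from the standard.

```c
#include <stdint.h>

/* Field layout of the 16-bit 802.1Q tag control information (TCI). */
#define TCI_PRIO_MASK  0xe000u
#define TCI_PRIO_SHIFT 13
#define TCI_VID_MASK   0x0fffu

static unsigned int tci_prio(uint16_t tci)
{
    return (tci & TCI_PRIO_MASK) >> TCI_PRIO_SHIFT;
}

static unsigned int tci_vid(uint16_t tci)
{
    return tci & TCI_VID_MASK;
}

/* A tag with VLAN ID 0 can still be non-zero because of its priority bits. */
static int tci_is_prio_only(uint16_t tci)
{
    return tci_vid(tci) == 0 && tci_prio(tci) != 0;
}
```

For 0x4000 this gives priority 2 and ID 0, matching the report: a check that looks only at the VLAN ID sees nothing, yet the tag as a whole is still non-zero.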
     
  • QFQ+ inherits from QFQ a design choice that may cause a high packet
    delay/jitter and a severe short-term unfairness. Like QFQ, QFQ+ uses a
    special quantity, the system virtual time, to track the service
    provided by the ideal system it approximates. When a packet is
    dequeued, this quantity must be incremented by the size of the packet,
    divided by the sum of the weights of the aggregates waiting to be
    served. Tracking this sum correctly is a non-trivial task, because, to
    preserve tight service guarantees, the decrement of this sum must be
    delayed in a special way [1]: this sum can be decremented only after
    its value would also have decreased in the ideal system approximated by
    QFQ+. For efficiency, QFQ+ keeps track only of the 'instantaneous'
    weight sum, increased and decreased immediately as the weight of an
    aggregate changes, and as an aggregate is created or destroyed (which,
    in its turn, happens as a consequence of some class being
    created/destroyed/changed). However, to avoid the problems caused to
    service guarantees by these immediate decreases, QFQ+ increments the
    system virtual time using the maximum value allowed for the weight
    sum, 2^10, in place of the dynamic, instantaneous value. The
    instantaneous value of the weight sum is used only to check whether a
    request of weight increase or a class creation can be satisfied.

    Unfortunately, the problems caused by this choice are worse than the
    temporary degradation of the service guarantees that may occur, when a
    class is changed or destroyed, if the instantaneous value of the
    weight sum was used to update the system virtual time. In fact, the
    fraction of the link bandwidth guaranteed by QFQ+ to each aggregate is
    equal to the ratio between the weight of the aggregate and the sum of
    the weights of the competing aggregates. The packet delay guaranteed
    to the aggregate is instead inversely proportional to the guaranteed
    bandwidth. By using the maximum possible value, and not the actual
    value of the weight sum, QFQ+ provides each aggregate with the worst
    possible service guarantees, and not with service guarantees related
    to the actual set of competing aggregates. To see the consequences of
    this fact, consider the following simple example.

    Suppose that only the following aggregates are backlogged, i.e., that
    only the classes in the following aggregates have packets to transmit:
    one aggregate with weight 10, say A, and ten aggregates with weight 1,
    say B1, B2, ..., B10. In particular, suppose that these aggregates are
    always backlogged. Given the weight distribution, the smoothest and
    fairest service order would be:
    A B1 A B2 A B3 A B4 A B5 A B6 A B7 A B8 A B9 A B10 A B1 A B2 ...

    QFQ+ would provide exactly this optimal service if it used the actual
    value for the weight sum instead of the maximum possible value, i.e.,
    11 instead of 2^10. In contrast, since QFQ+ uses the latter value, it
    serves aggregates as follows (easy to prove and to reproduce
    experimentally):
    A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 A A A A A A A A A A B1 B2 ... B10 A A ...

    By replacing 10 with N in the above example, and by increasing N, one
    can increase at will the maximum packet delay and the jitter
    experienced by the classes in aggregate A.

    This patch addresses this issue by just using the above
    'instantaneous' value of the weight sum, instead of the maximum
    possible value, when updating the system virtual time. After the
    instantaneous weight sum is decreased, QFQ+ may deviate from the ideal
    service for a time interval in the order of the time to serve one
    maximum-size packet for each backlogged class. The worst-case extent
    of the deviation exhibited by QFQ+ during this time interval [1] is
    basically the same as that of the deviation described above (but, without
    this patch, QFQ+ suffers from such a deviation all the time). Finally,
    this patch modifies the comment to the function qfq_slot_insert, to
    make it coherent with the fact that the weight sum used by QFQ+ can
    now be lower than the maximum possible value.

    [1] P. Valente, "Extending WF2Q+ to support a dynamic traffic mix",
    Proceedings of AAA-IDEA'05, June 2005.

    Signed-off-by: Paolo Valente
    Signed-off-by: David S. Miller

    Paolo Valente
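
The update rule described above (virtual time grows by the packet size divided by the weight sum) can be sketched in fixed-point arithmetic. The 30-bit fraction and the names `FRAC_BITS`, `ONE_FP` and `vtime_increment` are assumptions made for this illustration; only the maximum weight sum of 2^10 comes from the text.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 30                          /* assumption: fixed-point precision */
#define ONE_FP    (UINT64_C(1) << FRAC_BITS)
#define MAX_WSUM  1024u                       /* 2^10, maximum weight sum */

/* Fixed-point advance of the system virtual time for one packet: len / wsum. */
static uint64_t vtime_increment(uint32_t len, uint32_t wsum)
{
    assert(wsum > 0 && wsum <= MAX_WSUM);
    return (uint64_t)len * (ONE_FP / wsum);
}
```

With a 1500-byte packet, a weight sum of 16 advances virtual time 64 times faster than the maximum sum of 1024. Always dividing by the maximum therefore makes virtual time crawl relative to the timestamps of the backlogged aggregates, which is the source of the bursty service order shown in the example.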
     
  • Pull phase two of __cpuinit removal from Paul Gortmaker:
    "With the __cpuinit infrastructure removed earlier, this group of
    commits only removes the function/data tagging that was done with the
    various (now no-op) __cpuinit related prefixes.

    Now that the dust has settled with yesterday's v3.11-rc1, there
    hopefully shouldn't be any new users leaking back in tree, but I think
    we can leave the harmless no-op stubs there for a release as a
    courtesy to those who still have out of tree stuff and weren't paying
    attention.

    Although the commits are against the recent tag to allow for minor
    context refreshes for things like yesterday's v3.11-rc1~ slab content,
    the patches have been largely unchanged for weeks, aside from such
    trivial updates.

    For detail junkies, the largely boring and mostly irrelevant history
    of the patches can be viewed at:

    http://git.kernel.org/cgit/linux/kernel/git/paulg/cpuinit-delete.git

    If nothing else, I guess it does at least demonstrate the level of
    involvement required to shepherd such a treewide change to completion.

    This is the same repository of patches that has been applied to the
    end of the daily linux-next branches for the past several weeks"

    * 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (28 commits)
    block: delete __cpuinit usage from all block files
    drivers: delete __cpuinit usage from all remaining drivers files
    kernel: delete __cpuinit usage from all core kernel files
    rcu: delete __cpuinit usage from all rcu files
    net: delete __cpuinit usage from all net files
    acpi: delete __cpuinit usage from all acpi files
    hwmon: delete __cpuinit usage from all hwmon files
    cpufreq: delete __cpuinit usage from all cpufreq files
    clocksource+irqchip: delete __cpuinit usage from all related files
    x86: delete __cpuinit usage from all x86 files
    score: delete __cpuinit usage from all score files
    xtensa: delete __cpuinit usage from all xtensa files
    openrisc: delete __cpuinit usage from all openrisc files
    m32r: delete __cpuinit usage from all m32r files
    hexagon: delete __cpuinit usage from all hexagon files
    frv: delete __cpuinit usage from all frv files
    cris: delete __cpuinit usage from all cris files
    metag: delete __cpuinit usage from all metag files
    tile: delete __cpuinit usage from all tile files
    sh: delete __cpuinit usage from all sh files
    ...

    Linus Torvalds
     

18 Jul, 2013

1 commit


17 Jul, 2013

5 commits


15 Jul, 2013

4 commits

  • My static checker marks everything from ntohl() as untrusted and it
    complains we could have an underflow problem doing:

    return (u32 *)&ary->wc_array[nchunks];

    Also on 32 bit systems the upper bound check could overflow.

    Cc: stable@vger.kernel.org
    Signed-off-by: Dan Carpenter
    Signed-off-by: J. Bruce Fields

    Dan Carpenter
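
One common way to validate an untrusted element count without risking integer overflow is to divide the available space instead of multiplying the count. The sketch below shows that idea only; `array_fits` is a hypothetical helper and not the code from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if `count` elements of `elem_size` bytes starting at `start`
 * fit before `end`. The product count * elem_size is never computed.
 */
static int array_fits(uintptr_t start, uintptr_t end,
                      size_t elem_size, uint32_t count)
{
    assert(elem_size > 0);
    if (start > end)
        return 0;
    /* Division cannot overflow, unlike start + count * elem_size. */
    return count <= (end - start) / elem_size;
}
```

Taking the count as an unsigned value also rules out the negative-index case that the checker warns about.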
     
  • Fix the error pathway if rpcauth_create() fails.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the net/* uses of the __cpuinit macros
    from all C files.

    [1] https://lkml.org/lkml/2013/5/20/589

    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • Pull more vfs stuff from Al Viro:
    "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
    making simple_lookup() usable for filesystems with non-NULL s_d_op,
    which allows us to get rid of quite a bit of ugliness"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    sunrpc: now we can just set ->s_d_op
    cgroup: we can use simple_lookup() now
    efivarfs: we can use simple_lookup() now
    make simple_lookup() usable for filesystems that set ->s_d_op
    configfs: don't open-code d_alloc_name()
    __rpc_lookup_create_exclusive: pass string instead of qstr
    rpc_create_*_dir: don't bother with qstr
    llist: llist_add() can use llist_add_batch()
    llist: fix/simplify llist_add() and llist_add_batch()
    fput: turn "list_head delayed_fput_list" into llist_head
    fs/file_table.c:fput(): add comment
    Safer ABI for O_TMPFILE

    Linus Torvalds
     

14 Jul, 2013

4 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • ... and use d_hash_and_lookup() instead of open-coding it, for fsck sake...

    Signed-off-by: Al Viro

    Al Viro
     
  • just pass the name

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull networking fixes from David Miller:
    "Just a bunch of small fixes and tidy ups:

    1) Finish the "busy_poll" renames, from Eliezer Tamir.

    2) Fix RCU stalls in IFB driver, from Ding Tianhong.

    3) Linearize buffers properly in tun/macvtap zerocopy code.

    4) Don't crash on rmmod in vxlan, from Pravin B Shelar.

    5) Spinlock used before init in alx driver, from Maarten Lankhorst.

    6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
    Kravkov.

    7) Dummy and ifb driver load failure paths can oops, fixes from Tan
    Xiaojun and Ding Tianhong.

    8) Correct MTU calculations in IP tunnels, from Alexander Duyck.

    9) Account all TCP retransmits in SNMP stats properly, from Yuchung
    Cheng.

    10) atl1e and via-rhine do not handle DMA mapping failures properly,
    from Neil Horman.

    11) Various equal-cost multipath route fixes in ipv6 from Hannes
    Frederic Sowa"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
    ipv6: only static routes qualify for equal cost multipathing
    via-rhine: fix dma mapping errors
    atl1e: fix dma mapping warnings
    tcp: account all retransmit failures
    usb/net/r815x: fix cast to restricted __le32
    usb/net/r8152: fix integer overflow in expression
    net: access page->private by using page_private
    net: strict_strtoul is obsolete, use kstrtoul instead
    drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
    drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
    drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
    net/usb: add relative mii functions for r815x
    net/tipc: use %*phC to dump small buffers in hex form
    qlcnic: Adding Maintainers.
    gre: Fix MTU sizing check for gretap tunnels
    pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
    pkt_sched: sch_qfq: improve efficiency of make_eligible
    gso: Update tunnel segmentation to support Tx checksum offload
    inet: fix spacing in assignment
    ifb: fix oops when loading the ifb failed
    ...

    Linus Torvalds
     

13 Jul, 2013

4 commits

  • Static routes in this case are non-expiring routes which did not get
    configured by autoconf or by icmpv6 redirects.

    To make sure we actually get an ecmp route while searching for the first
    one in this fib6_node's leafs, also make sure it matches the ecmp route
    assumptions.

    v2:
    a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF
    already ensures that this route, even if added again without
    RTF_EXPIRES (in case of a RA announcement with infinite timeout),
    does not cause the rt6i_nsiblings logic to go wrong if a later RA
    updates the expiration time.

    v3:
    a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so,
    because a pmtu event could update the RTF_EXPIRES flag and we would
    not count this route, if another route joins this set. We now filter
    only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that
    don't get changed after rt6_info construction.

    Cc: Nicolas Dichtel
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
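
The v3 filter can be pictured as a single mask test. The sketch below assumes that a route qualifies when it has a gateway and was created neither by autoconf nor by a redirect; the flag values are reproduced from memory of the Linux UAPI headers and should be treated as assumptions, and `qualifies_for_ecmp` is a hypothetical name.

```c
#include <stdint.h>

/* Assumed flag values, copied here so the sketch stands alone. */
#define RTF_GATEWAY   0x0002u
#define RTF_DYNAMIC   0x0010u
#define RTF_ADDRCONF  0x00040000u
#define RTF_EXPIRES   0x00400000u

/* Only flags that stay fixed after route construction take part in the test. */
static int qualifies_for_ecmp(uint32_t flags)
{
    return (flags & (RTF_GATEWAY | RTF_ADDRCONF | RTF_DYNAMIC)) == RTF_GATEWAY;
}
```

Because RTF_EXPIRES is not part of the mask, a route whose expiry flag is toggled later gives the same answer before and after the change.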
     
  • Change snmp RETRANSFAILS stat to include timeout retransmit failures
    in addition to other loss recoveries.

    Signed-off-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • Signed-off-by: Sunghan Suh
    Signed-off-by: David S. Miller

    Sunghan Suh
     
  • patch found using checkpatch.pl

    Signed-off-by: Cosmin Stanescu
    Signed-off-by: David S. Miller

    Cosmin Stanescu
     

12 Jul, 2013

11 commits

  • Instead of passing each byte via the stack, let's use a nice specifier for that.

    Signed-off-by: Andy Shevchenko
    Signed-off-by: David S. Miller

    Andy Shevchenko
     
  • This change fixes an MTU sizing issue seen with gretap tunnels when non-gso
    packets are sent from the interface.

    In my case I was able to reproduce the issue by simply sending a ping of
    1421 bytes with the gretap interface created on a device with a standard
    1500 mtu.

    This fix is based on the fact that the tunnel mtu is already adjusted by
    dev->hard_header_len so it would make sense that any packets being compared
    against that mtu should also be adjusted by hard_header_len and the tunnel
    header instead of just the tunnel header.

    Signed-off-by: Alexander Duyck
    Reported-by: Cong Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch removes the forward declaration of qfq_update_agg_ts, by moving
    the definition of the function above its first call. This patch also
    removes a useless forward declaration of qfq_schedule_agg.

    Reported-by: David S. Miller
    Signed-off-by: Paolo Valente
    Signed-off-by: David S. Miller

    Paolo Valente
     
  • In make_eligible, a mask is used to decide which groups must become eligible:
    the i-th group becomes eligible only if the i-th bit of the mask (from the
    right) is set. The mask is computed by left-shifting a 1 by a given number of
    places, and decrementing the result. The shift is performed on a ULL to avoid
    problems in case the number of places to shift is higher than 31. On a 32-bit
    machine, this is more costly than working on an UL. This patch replaces such a
    costly operation with two cheaper branches.

    The trick is based on the following fact: in case of a shift of at least 32
    places, the resulting mask has at least the 32 less significant bits set,
    whereas the total number of groups is lower than 32. As a consequence, in this
    case it is enough to just set the 32 less significant bits of the mask with a
    cheaper ~0UL. In the other case, the shift can be safely performed on a UL.

    Reported-by: David S. Miller
    Reported-by: David Laight
    Signed-off-by: Paolo Valente
    Signed-off-by: David S. Miller

    Paolo Valente
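
The branch-based mask described above can be sketched with fixed-width types so that it behaves the same on 32-bit and 64-bit hosts. `low_bits_mask` and `low_bits_mask_ref` are hypothetical names; the second function is the 64-bit shift that the patch is said to replace, truncated to 32 bits for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Two cheap branches instead of a 64-bit shift. */
static uint32_t low_bits_mask(unsigned int shift)
{
    if (shift > 31)
        return ~UINT32_C(0);            /* all 32 low bits set */
    return (UINT32_C(1) << shift) - 1;  /* safe: shift is below 32 here */
}

/* Reference: 64-bit shift, then keep only the low 32 bits. */
static uint32_t low_bits_mask_ref(unsigned int shift)
{
    assert(shift < 64);
    return (uint32_t)((UINT64_C(1) << shift) - 1);
}
```

The two functions agree for every shift from 0 to 63, which is the property the commit message relies on: with fewer than 32 groups, only the low 32 bits of the mask are ever consulted.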
     
  • This change makes it so that the GRE and VXLAN tunnels can make use of Tx
    checksum offload support provided by some drivers via the hw_enc_features.
    Without this fix enabling GSO means sacrificing Tx checksum offload and
    this actually leads to a performance regression as shown below:

                Utilization
                Send
    Throughput  local         GSO
    10^6bits/s  % S           state
       6276.51  8.39          enabled
       7123.52  8.42          disabled

    To resolve this it was necessary to address two items. First
    netif_skb_features needed to be updated so that it would correctly handle
    the Trans Ether Bridging protocol without impacting the need to check for
    Q-in-Q tagging. To do this it was necessary to update harmonize_features
    so that it used skb_network_protocol instead of just using the outer
    protocol.

    Second it was necessary to update the GRE and UDP tunnel segmentation
    offloads so that they would reset the encapsulation bit and inner header
    offsets after the offload was complete.

    As a result of this change I have seen the following results on an interface
    with Tx checksum enabled for encapsulated frames:

                Utilization
                Send
    Throughput  local         GSO
    10^6bits/s  % S           state
       7123.52  8.42          disabled
       8321.75  5.43          enabled

    v2: Instead of replacing reference to skb->protocol with
    skb_network_protocol just replace the protocol reference in
    harmonize_features to allow for double VLAN tag checks.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • Pull second set of NFS client updates from Trond Myklebust:
    "This mainly contains some small readdir optimisations that had
    dependencies on Al Viro's readdir rewrite. There is also a fix for a
    nasty deadlock which surfaced earlier in this merge window.

    Highlights include:
    - Fix an rpc_pipefs regression that causes a deadlock on mount
    - Readdir optimisations by Scott Mayhew and Jeff Layton
    - clean up the rpc_pipefs dentry operation setup"

    * tag 'nfs-for-3.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    SUNRPC: Fix a deadlock in rpc_client_register()
    rpc_pipe: rpc_dir_inode_operations can be static
    NFS: Allow nfs_updatepage to extend a write under additional circumstances
    NFS: Make nfs_readdir revalidate less often
    NFS: Make nfs_attribute_cache_expired() non-static
    rpc_pipe: set dentry operations at d_alloc time
    nfs: set verifier on existing dentries in nfs_prime_dcache

    Linus Torvalds
     
  • Found using checkpatch.pl

    Signed-off-by: Camelia Groza
    Signed-off-by: David S. Miller

    Camelia Groza
     
  • This is a follow-up patch to 3630d40067a21d4dfbadc6002bb469ce26ac5d52
    ("ipv6: rt6_check_neigh should successfully verify neigh if no NUD
    information are available").

    Since the removal of rt->n in rt6_info we can end up with a dst ==
    NULL in rt6_check_neigh. In case the kernel is not compiled with
    CONFIG_IPV6_ROUTER_PREF we should also select a route with unknown
    NUD state but we must not avoid doing round robin selection on routes
    with the same target. So introduce and pass down a boolean ``do_rr'' to
    indicate when we should update rt->rr_ptr. As soon as no route is valid
    we do backtracking and do a lookup on a higher level in the fib trie.

    v2:
    a) Improved rt6_check_neigh logic (no need to create neighbour there)
    and documented return values.

    v3:
    a) Introduce enum rt6_nud_state to get rid of the magic numbers
    (thanks to David Miller).
    b) Update and shorten commit message a bit to actually reflect
    the source.

    Reported-by: Pierre Emeriaud
    Cc: YOSHIFUJI Hideaki
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • p9_release_pages() would attempt to dereference one value past the end of
    pages[]. This would cause the following crashes:

    [ 6293.171817] BUG: unable to handle kernel paging request at ffff8807c96f3000
    [ 6293.174146] IP: [] p9_release_pages+0x3b/0x60
    [ 6293.176447] PGD 79c5067 PUD 82c1e3067 PMD 82c197067 PTE 80000007c96f3060
    [ 6293.180060] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [ 6293.180060] Modules linked in:
    [ 6293.180060] CPU: 62 PID: 174043 Comm: modprobe Tainted: G W 3.10.0-next-20130710-sasha #3954
    [ 6293.180060] task: ffff8807b803b000 ti: ffff880787dde000 task.ti: ffff880787dde000
    [ 6293.180060] RIP: 0010:[] [] p9_release_pages+0x3b/0x60
    [ 6293.214316] RSP: 0000:ffff880787ddfc28 EFLAGS: 00010202
    [ 6293.214316] RAX: 0000000000000001 RBX: ffff8807c96f2ff8 RCX: 0000000000000000
    [ 6293.222017] RDX: ffff8807b803b000 RSI: 0000000000000001 RDI: ffffea001c7e3d40
    [ 6293.222017] RBP: ffff880787ddfc48 R08: 0000000000000000 R09: 0000000000000000
    [ 6293.222017] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
    [ 6293.222017] R13: 0000000000000001 R14: ffff8807cc50c070 R15: ffff8807cc50c070
    [ 6293.222017] FS: 00007f572641d700(0000) GS:ffff8807f3600000(0000) knlGS:0000000000000000
    [ 6293.256784] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 6293.256784] CR2: ffff8807c96f3000 CR3: 00000007c8e81000 CR4: 00000000000006e0
    [ 6293.256784] Stack:
    [ 6293.256784] ffff880787ddfcc8 ffff880787ddfcc8 0000000000000000 ffff880787ddfcc8
    [ 6293.256784] ffff880787ddfd48 ffffffff84128be8 ffff880700000002 0000000000000001
    [ 6293.256784] ffff8807b803b000 ffff880787ddfce0 0000100000000000 0000000000000000
    [ 6293.256784] Call Trace:
    [ 6293.256784] [] p9_virtio_zc_request+0x598/0x630
    [ 6293.256784] [] ? wake_up_bit+0x40/0x40
    [ 6293.256784] [] p9_client_zc_rpc+0x111/0x3a0
    [ 6293.256784] [] ? sched_clock_cpu+0x108/0x120
    [ 6293.256784] [] p9_client_read+0xe1/0x2c0
    [ 6293.256784] [] v9fs_file_read+0x90/0xc0
    [ 6293.256784] [] vfs_read+0xc3/0x130
    [ 6293.256784] [] ? trace_hardirqs_on+0xd/0x10
    [ 6293.256784] [] SyS_read+0x62/0xa0
    [ 6293.256784] [] tracesys+0xdd/0xe2
    [ 6293.256784] Code: 66 90 48 89 fb 41 89 f5 48 8b 3f 48 85 ff 74 29 85 f6 74 25 45 31 e4 66 0f 1f 84 00 00 00 00 00 e8 eb 14 12 fd 41 ff c4 49 63 c4 8b 3c c3 48 85 ff 74 05 45 39 e5 75 e7 48 83 c4 08 5b 41 5c
    [ 6293.256784] RIP [] p9_release_pages+0x3b/0x60
    [ 6293.256784] RSP
    [ 6293.256784] CR2: ffff8807c96f3000
    [ 6293.256784] ---[ end trace 50822ee72cd360fc ]---

    Signed-off-by: Sasha Levin
    Signed-off-by: David S. Miller

    Sasha Levin
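
An off-by-one of this kind typically comes from reading the array element before checking the index against the bound. The sketch below shows a loop that checks the bound first; `struct page_stub`, `put_stub` and `release_pages_sketch` are hypothetical stand-ins and this is not the p9 code itself.

```c
#include <stddef.h>

struct page_stub {               /* hypothetical stand-in for struct page */
    int refs;
};

static void put_stub(struct page_stub *p)
{
    p->refs--;
}

/* Release up to nr entries; pages[i] is read only after i < nr is known. */
static int release_pages_sketch(struct page_stub **pages, int nr)
{
    int released = 0;

    for (int i = 0; i < nr; i++) {
        if (pages[i] != NULL) {
            put_stub(pages[i]);
            released++;
        }
    }
    return released;
}
```

When every slot of the array is in use there is no terminating NULL entry, so a loop that tests the element before the count would read one slot past the end, which is consistent with the faulting address in the trace sitting exactly on a page boundary.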
     
  • …inux/kernel/git/ericvh/v9fs

    Pull second round of 9p patches from Eric Van Hensbergen:
    "Several of these patches were rebased in order to correct style
    issues. Only stylistic changes were made versus the patches which
    were in linux-next for two weeks. The rebases have been in linux-next
    for 3 days and have passed my regressions.

    The bulk of these are RDMA fixes and improvements. There's also some
    additions on the extended attributes front to support some additional
    namespaces and a new option for TCP to force allocation of mount
    requests from a privileged port"

    * tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    fs/9p: Remove the unused variable "err" in v9fs_vfs_getattr()
    9P: Add cancelled() to the transport functions.
    9P/RDMA: count posted buffers without a pending request
    9P/RDMA: Improve error handling in rdma_request
    9P/RDMA: Do not free req->rc in error handling in rdma_request()
    9P/RDMA: Use a semaphore to protect the RQ
    9P/RDMA: Protect against duplicate replies
    9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB
    9pnet: refactor struct p9_fcall alloc code
    9P/RDMA: rdma_request() needs not allocate req->rc
    9P: Fix fcall allocation for rdma
    fs/9p: xattr: add trusted and security namespaces
    net/9p: add privport option to 9p tcp transport

    Linus Torvalds
     
  • Pull nfsd changes from Bruce Fields:
    "Changes this time include:

    - 4.1 enabled on the server by default: the last 4.1-specific issues
    I know of are fixed, so we're not going to find the rest of the
    bugs without more exposure.
    - Experimental support for NFSv4.2 MAC Labeling (to allow running
    selinux over NFS), from Dave Quigley.
    - Fixes for some delicate cache/upcall races that could cause rare
    server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
    debugging persistence.
    - Fixes for some bugs found at the recent NFS bakeathon, mostly v4
    and v4.1-specific, but also a generic bug handling fragmented rpc
    calls"

    * 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
    nfsd4: support minorversion 1 by default
    nfsd4: allow destroy_session over destroyed session
    svcrpc: fix failures to handle -1 uid's
    sunrpc: Don't schedule an upcall on a replaced cache entry.
    net/sunrpc: xpt_auth_cache should be ignored when expired.
    sunrpc/cache: ensure items removed from cache do not have pending upcalls.
    sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
    sunrpc/cache: remove races with queuing an upcall.
    nfsd4: return delegation immediately if lease fails
    nfsd4: do not throw away 4.1 lock state on last unlock
    nfsd4: delegation-based open reclaims should bypass permissions
    svcrpc: don't error out on small tcp fragment
    svcrpc: fix handling of too-short rpc's
    nfsd4: minor read_buf cleanup
    nfsd4: fix decoding of compounds across page boundaries
    nfsd4: clean up nfs4_open_delegation
    NFSD: Don't give out read delegations on creates
    nfsd4: allow client to send no cb_sec flavors
    nfsd4: fail attempts to request gss on the backchannel
    nfsd4: implement minimal SP4_MACH_CRED
    ...

    Linus Torvalds
     

11 Jul, 2013

5 commits

  • We could end up expiring a route which is part of an ecmp route set. Doing
    so would invalidate the rt->rt6i_nsiblings calculations and could provoke
    the following panic:

    [ 80.144667] ------------[ cut here ]------------
    [ 80.145172] kernel BUG at net/ipv6/ip6_fib.c:733!
    [ 80.145172] invalid opcode: 0000 [#1] SMP
    [ 80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables
    +snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk
    [ 80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118
    [ 80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000
    [ 80.145172] RIP: 0010:[] [] fib6_add+0x75d/0x830
    [ 80.145172] RSP: 0018:ffff880118771798 EFLAGS: 00010202
    [ 80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480
    [ 80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738
    [ 80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001
    [ 80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680
    [ 80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90
    [ 80.145172] FS: 00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
    [ 80.145172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0
    [ 80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 80.145172] Stack:
    [ 80.145172] 0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280
    [ 80.145172] 0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa
    [ 80.145172] 0000000000000000 0000000000000000 0000000000000000 ffff88011350f680
    [ 80.145172] Call Trace:
    [ 80.145172] [] ? rt6_bind_peer+0x4b/0x90
    [ 80.145172] [] __ip6_ins_rt+0x45/0x70
    [ 80.145172] [] ip6_ins_rt+0x35/0x40
    [ 80.145172] [] ip6_pol_route.isra.44+0x3a4/0x4b0
    [ 80.145172] [] ip6_pol_route_output+0x2a/0x30
    [ 80.145172] [] fib6_rule_action+0xd7/0x210
    [ 80.145172] [] ? ip6_pol_route_input+0x30/0x30
    [ 80.145172] [] fib_rules_lookup+0xc6/0x140
    [ 80.145172] [] fib6_rule_lookup+0x44/0x80
    [ 80.145172] [] ? ip6_pol_route_input+0x30/0x30
    [ 80.145172] [] ip6_route_output+0x73/0xb0
    [ 80.145172] [] ip6_dst_lookup_tail+0x2c3/0x2e0
    [ 80.145172] [] ? list_del+0x11/0x40
    [ 80.145172] [] ? remove_wait_queue+0x3c/0x50
    [ 80.145172] [] ip6_dst_lookup_flow+0x3d/0xa0
    [ 80.145172] [] rawv6_sendmsg+0x267/0xc20
    [ 80.145172] [] inet_sendmsg+0x63/0xb0
    [ 80.145172] [] ? selinux_socket_sendmsg+0x23/0x30
    [ 80.145172] [] sock_sendmsg+0xa6/0xd0
    [ 80.145172] [] SYSC_sendto+0x128/0x180
    [ 80.145172] [] ? update_curr+0xec/0x170
    [ 80.145172] [] ? kvm_clock_get_cycles+0x9/0x10
    [ 80.145172] [] ? __getnstimeofday+0x3e/0xd0
    [ 80.145172] [] SyS_sendto+0xe/0x10
    [ 80.145172] [] system_call_fastpath+0x16/0x1b
    [ 80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53
    [ 80.145172] RIP [] fib6_add+0x75d/0x830
    [ 80.145172] RSP
    [ 80.387413] ---[ end trace 02f20b7a8b81ed95 ]---
    [ 80.390154] Kernel panic - not syncing: Fatal exception in interrupt

    Cc: Nicolas Dichtel
    Cc: YOSHIFUJI Hideaki
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • Rename LL_SO to BUSY_POLL_SO
    Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
    Fix up users of these variables.
    Fix documentation for sysctl.

    A patch for the socket.7 man page will follow separately,
    because of limitations of my mail setup.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     
  • Rename ndo_ll_poll to ndo_busy_poll.
    Rename sk_mark_ll to sk_mark_napi_id.
    Rename skb_mark_ll to skb_mark_napi_id.
    Correct all users of these functions.
    Update comments and defines in include/net/busy_poll.h

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     
  • Rename the file and correct all the places where it is included.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     
  • Commit 384816051ca9125cd54750e59c780c2a2655fa4f (SUNRPC: fix races on
    PipeFS MOUNT notifications) introduces a regression when we call
    rpc_setup_pipedir() with RPCSEC_GSS as the auth flavour.

    By calling rpcauth_create() while holding the sn->pipefs_sb_lock, we
    end up deadlocking in gss_pipes_dentries_create_net().
    Fix is to register the client and release the mutex before calling
    rpcauth_create().

    Reported-by: Weston Andros Adamson
    Tested-by: Weston Andros Adamson
    Cc: Stanislav Kinsbursky
    Cc: # : 3848160: SUNRPC: fix races on PipeFS MOUNT
    Cc: # : e73f4cc: SUNRPC: split client creation
    Signed-off-by: Trond Myklebust

    Trond Myklebust