29 Mar, 2013

1 commit

  • Pull userns fixes from Eric W Biederman:
    "The bulk of the changes are fixing the worst consequences of the user
    namespace design oversight in not considering what happens when one
    namespace starts off as a clone of another namespace, as happens with
    the mount namespace.

    The rest of the changes are just plain bug fixes.

    Many thanks to Andy Lutomirski for pointing out many of these issues."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Restrict when proc and sysfs can be mounted
    ipc: Restrict mounting the mqueue filesystem
    vfs: Carefully propogate mounts across user namespaces
    vfs: Add a mount flag to lock read only bind mounts
    userns: Don't allow creation if the user is chrooted
    yama: Better permission check for ptraceme
    pid: Handle the exit of a multi-threaded init.
    scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.

    Linus Torvalds
     

27 Mar, 2013

4 commits

  • Pull networking fixes from David Miller:

    1) Always increment IPV4 ID field in encapsulated GSO packets, even
    when DF is set. Regression fix from Pravin B Shelar.

    2) Fix per-net subsystem initialization in netfilter conntrack,
    otherwise we may access dynamically allocated memory before it is
    actually allocated. From Gao Feng.

    3) Fix DMA buffer lengths in iwl3945 driver, from Stanislaw Gruszka.

    4) Fix race between submission of sync vs async commands in mwifiex
    driver, from Amitkumar Karwar.

    5) Add missing cancel of command timer in mwifiex driver, from Bing
    Zhao.

    6) Missing SKB free in rtlwifi USB driver, from Jussi Kivilinna.

    7) Thermal layer tries to use a genetlink multicast string that is
    longer than the 16 character limit. Fix it and add a BUG check to
    prevent this kind of thing from happening in the future. From
    Masatake YAMATO.

    8) Fix many bugs in the handling of the teardown of L2TP connections,
    UDP encapsulation instances, and sockets. From Tom Parkin.

    9) Missing socket release in IRDA, from Kees Cook.

    10) Fix fec driver modular build, from Fabio Estevam.

    11) Erroneous use of kfree() instead of free_netdev() in lantiq_etop,
    from Wei Yongjun.

    12) Fix bugs in handling of queue numbers and steering rules in mlx4
    driver, from Moshe Lazer, Hadar Hen Zion, and Or Gerlitz.

    13) Some FOO_DIAG_MAX constants were defined off by one, fix from Andrey
    Vagin.

    14) TCP segmentation deferral is unintentionally done too strongly,
    breaking ACK clocking. Fix from Eric Dumazet.

    15) net_enable_timestamp() can legitimately be invoked from software
    interrupts, and in a way that is safe, so remove the WARN_ON().
    Also from Eric Dumazet.

    16) Fix use after free in VLANs, from Cong Wang.

    17) Fix TCP slow start retransmit storms after SACK reneging, from
    Yuchung Cheng.

    18) Unix socket release should mark a socket dead before NULL'ing out
    sock->sk, otherwise we can race. Fix from Paul Moore.

    19) IPV6 addrconf code can try to free static memory, from Hong Zhiguo.

    20) Fix register mis-programming, NULL pointer derefs, and wrong PHC
    clock frequency in IGB driver. From Lior Levy, Alex Williamson, Jiri
    Benc, and Jeff Kirsher.

    21) skb->ip_summed logic in pch_gbe driver is reversed, breaking packet
    forwarding. Fix from Veaceslav Falico.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
    ipv4: Fix ip-header identification for gso packets.
    bonding: remove already created master sysfs link on failure
    af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
    pch_gbe: fix ip_summed checksum reporting on rx
    igb: fix PHC stopping on max freq
    igb: make sensor info static
    igb: SR-IOV init reordering
    igb: Fix null pointer dereference
    igb: fix i350 anti spoofing config
    ixgbevf: don't release the soft entries
    ipv6: fix bad free of addrconf_init_net
    unix: fix a race condition in unix_release()
    tcp: undo spurious timeout after SACK reneging
    bnx2x: fix assignment of signed expression to unsigned variable
    bridge: fix crash when set mac address of br interface
    8021q: fix a potential use-after-free
    net: remove a WARN_ON() in net_enable_timestamp()
    tcp: preserve ACK clocking in TSO
    net: fix *_DIAG_MAX constants
    net/mlx4_core: Disallow releasing VF QPs which have steering rules
    ...

    Linus Torvalds
     
  • Pull NFS client bugfixes from Trond Myklebust:
    - Fix an NFSv4 idmapper regression
    - Fix an Oops in the pNFS blocks client
    - Fix up various issues with pNFS layoutcommit
    - Ensure correct read ordering of variables in
    rpc_wake_up_task_queue_locked

    * tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked
    NFSv4.1: Add a helper pnfs_commit_and_return_layout
    NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
    NFSv4.1: Fix a race in pNFS layoutcommit
    pnfs-block: removing DM device maybe cause oops when call dev_remove
    NFSv4: Fix the string length returned by the idmapper

    Linus Torvalds
     
  • ip-header id needs to be incremented even if IP_DF flag is set.
    This behaviour was changed in commit 490ab08127cebc25e3a26
    (IP_GRE: Fix IP-Identification).

    Following patch fixes it so that identification is always
    incremented.

    Reported-by: Cong Wang
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
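
The rule stated in the commit above (the identification field advances for every segment, whether or not DF is set) can be illustrated with a small userspace sketch. This is not the kernel code: `struct toy_iphdr` and `toy_assign_segment_ids` are hypothetical names, and the ID is kept in host byte order for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an IPv4 header (not struct iphdr). */
struct toy_iphdr {
    uint16_t id;  /* identification field, host byte order in this sketch */
    bool df;      /* Don't Fragment flag */
};

/*
 * Assign IDs to the n segments produced from one GSO packet.
 * The ID advances for every segment regardless of the DF flag.
 * Returns the next unused ID (wraps modulo 2^16).
 */
uint16_t toy_assign_segment_ids(struct toy_iphdr *segs, size_t n, uint16_t first_id)
{
    uint16_t id = first_id;

    assert(segs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        segs[i].id = id;
        id++;  /* unconditional: there is no "skip when DF is set" branch */
    }
    return id;
}
```

The point of the sketch is only the absence of a DF test around the increment; how the real segmentation path obtains and stores the ID is outside its scope.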
     
  • SCM_CREDENTIALS should apply to write() syscalls only if either the source or destination
    socket asserted SOCK_PASSCRED. The original implementation in maybe_add_creds is wrong,
    and breaks several LSB testcases (e.g. /tset/LSB.os/netowkr/recvfrom/T.recvfrom).

    Originally-authored-by: Karel Srot
    Signed-off-by: Ding Tianhong
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    dingtianhong
     

26 Mar, 2013

2 commits

  • Signed-off-by: Hong Zhiguo
    Signed-off-by: David S. Miller

    Hong Zhiguo
     
  • As reported by Jan, and others over the past few years, there is a
    race condition caused by unix_release setting the sock->sk pointer
    to NULL before properly marking the socket as dead/orphaned. This
    can cause a problem with the LSM hook security_unix_may_send() if
    there is another socket attempting to write to this partially
    released socket in between when sock->sk is set to NULL and it is
    marked as dead/orphaned. This patch fixes this by only setting
    sock->sk to NULL after the socket has been marked as dead; I also
    take the opportunity to make unix_release_sock() a void function
    as it only ever returned 0/success.

    Dave, I think this one should go on the -stable pile.

    Special thanks to Jan for coming up with a reproducer for this
    problem.

    Reported-by: Jan Stancek
    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller

    Paul Moore
     

25 Mar, 2013

5 commits

  • We need to be careful when testing task->tk_waitqueue in
    rpc_wake_up_task_queue_locked, because it can be changed while we
    are holding the queue->lock.
    By adding appropriate memory barriers, we can ensure that it is safe to
    test task->tk_waitqueue for equality if the RPC_TASK_QUEUED bit is set.

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
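
The ordering requirement described above can be sketched in userspace with C11 atomics, using release/acquire ordering as an analogue of paired write/read barriers. This is an illustration under assumptions, not the SUNRPC code: `toy_task`, `toy_queue`, `toy_enqueue` and `toy_is_queued_on` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_queue { int id; };

struct toy_task {
    struct toy_queue *waitqueue;  /* plain field, published by the flag below */
    atomic_bool queued;           /* stands in for the RPC_TASK_QUEUED bit */
};

/* Writer: store the queue pointer first, then publish the flag (release). */
void toy_enqueue(struct toy_task *t, struct toy_queue *q)
{
    assert(t != NULL && q != NULL);
    t->waitqueue = q;
    atomic_store_explicit(&t->queued, true, memory_order_release);
}

/*
 * Reader: only compare the queue pointer after observing the flag (acquire),
 * so the pointer read cannot be ordered before the flag read.
 */
bool toy_is_queued_on(struct toy_task *t, const struct toy_queue *q)
{
    assert(t != NULL);
    if (!atomic_load_explicit(&t->queued, memory_order_acquire))
        return false;
    return t->waitqueue == q;
}
```

If the reader sees the flag set, the release/acquire pair guarantees it also sees the pointer value written before the flag, which is the property the commit message asks for.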
     
  • On SACK reneging the sender immediately retransmits and forces a
    timeout but disables Eifel (undo). If the (buggy) receiver does not
    drop any packet this can trigger a false slow-start retransmit storm
    driven by the ACKs of the original packets. This can be detected with
    undo and TCP timestamps.

    Signed-off-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • When I tried to set the MAC address of a bridge interface to a MAC
    address that was already learned on this bridge, the system hung.

    The cause is straightforward: function br_fdb_change_mac_address
    calls fdb_insert with NULL source nbp. Then an fdb lookup is
    performed. If an fdb entry is found and it's local, it's OK. But
    if it's not local, source is dereferenced for printk without NULL
    check.

    Signed-off-by: Hong Zhiguo
    Signed-off-by: David S. Miller

    Hong zhi guo
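
The failure mode described above (dereferencing a NULL source port only on the "entry exists and is not local" branch) can be shown with a minimal userspace sketch. All names here (`toy_port`, `toy_fdb_entry`, `toy_fdb_insert_check`) are hypothetical, and the fallback label is an assumption made for illustration; the actual patch may choose a different message or fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_port { const char *name; };
struct toy_fdb_entry { bool is_local; };

/*
 * Decide what to do when an address is inserted and a lookup returned 'found'.
 * 'source' may be NULL when the bridge itself is the origin of the change.
 * Returns 1 and writes a warning into buf if a non-local entry already exists,
 * 0 otherwise.
 */
int toy_fdb_insert_check(const struct toy_port *source,
                         const struct toy_fdb_entry *found,
                         char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (found == NULL || found->is_local)
        return 0;  /* nothing to warn about */

    /* Guard the dereference: fall back to a fixed label when source is NULL. */
    snprintf(buf, len, "%s: address was already learned on this bridge",
             source ? source->name : "(bridge)");
    return 1;
}
```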
     
  • vlan_vid_del() could possibly free ->vlan_info after an RCU grace
    period; however, we may still refer to the freed memory area
    through the 'grp' pointer. Found by code inspection.

    This patch moves the vlan_vid_del() call as late as possible.

    Cc: Patrick McHardy
    Cc: "David S. Miller"
    Signed-off-by: Cong Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Cong Wang
     
  • The WARN_ON(in_interrupt()) in net_enable_timestamp() can trigger a false
    positive in the socket clone path, which runs from softirq context:

    [ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80()
    [ 3641.668811] Call Trace:
    [ 3641.671254] [] warn_slowpath_common+0x87/0xc0
    [ 3641.677871] [] warn_slowpath_null+0x1a/0x20
    [ 3641.683683] [] net_enable_timestamp+0x7b/0x80
    [ 3641.689668] [] sk_clone_lock+0x425/0x450
    [ 3641.695222] [] inet_csk_clone_lock+0x16/0x170
    [ 3641.701213] [] tcp_create_openreq_child+0x29/0x820
    [ 3641.707663] [] ? ipt_do_table+0x222/0x670
    [ 3641.713354] [] tcp_v4_syn_recv_sock+0xab/0x3d0
    [ 3641.719425] [] tcp_check_req+0x3da/0x530
    [ 3641.724979] [] ? inet_hashinfo_init+0x60/0x80
    [ 3641.730964] [] ? tcp_v4_rcv+0x79f/0xbe0
    [ 3641.736430] [] tcp_v4_do_rcv+0x38d/0x4f0
    [ 3641.741985] [] tcp_v4_rcv+0xa7a/0xbe0

    It's safe at this point because the parent socket owns a reference
    on netstamp_needed, so we can't have a 0 -> 1 transition, which
    requires locking a mutex.

    Instead of refining the check, let's remove it, as all known callers
    are safe. If this ever changes in the future, static_key_slow_inc()
    will complain anyway.

    Reported-by: Laurent Chavey
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Mar, 2013

1 commit

  • A long-standing problem with TSO is that tcp_tso_should_defer()
    rearms the deferred timer when it should not.

    The current code leads to the following bursty behavior:

    20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
    20:11:24.484337 IP B > A: . ack 263721 win 1117
    20:11:24.485086 IP B > A: . ack 265241 win 1117
    20:11:24.485925 IP B > A: . ack 266761 win 1117
    20:11:24.486759 IP B > A: . ack 268281 win 1117
    20:11:24.487594 IP B > A: . ack 269801 win 1117
    20:11:24.488430 IP B > A: . ack 271321 win 1117
    20:11:24.489267 IP B > A: . ack 272841 win 1117
    20:11:24.490104 IP B > A: . ack 274361 win 1117
    20:11:24.490939 IP B > A: . ack 275881 win 1117
    20:11:24.491775 IP B > A: . ack 277401 win 1117
    20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
    20:11:24.492620 IP B > A: . ack 278921 win 1117
    20:11:24.493448 IP B > A: . ack 280441 win 1117
    20:11:24.494286 IP B > A: . ack 281961 win 1117
    20:11:24.495122 IP B > A: . ack 283481 win 1117
    20:11:24.495958 IP B > A: . ack 285001 win 1117
    20:11:24.496791 IP B > A: . ack 286521 win 1117
    20:11:24.497628 IP B > A: . ack 288041 win 1117
    20:11:24.498459 IP B > A: . ack 289561 win 1117
    20:11:24.499296 IP B > A: . ack 291081 win 1117
    20:11:24.500133 IP B > A: . ack 292601 win 1117
    20:11:24.500970 IP B > A: . ack 294121 win 1117
    20:11:24.501388 IP B > A: . ack 295641 win 1117
    20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119

    The expected behavior is more like:

    20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
    20:19:49.260446 IP B > A: . ack 154281 win 1212
    20:19:49.261282 IP B > A: . ack 155801 win 1212
    20:19:49.262125 IP B > A: . ack 157321 win 1212
    20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
    20:19:49.262958 IP B > A: . ack 158841 win 1212
    20:19:49.263795 IP B > A: . ack 160361 win 1212
    20:19:49.264628 IP B > A: . ack 161881 win 1212
    20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
    20:19:49.265465 IP B > A: . ack 163401 win 1212
    20:19:49.265886 IP B > A: . ack 164921 win 1212
    20:19:49.266722 IP B > A: . ack 166441 win 1212
    20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
    20:19:49.267559 IP B > A: . ack 167961 win 1212
    20:19:49.268394 IP B > A: . ack 169481 win 1212
    20:19:49.269232 IP B > A: . ack 171001 win 1212
    20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119

    Signed-off-by: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Van Jacobson
    Cc: Neal Cardwell
    Cc: Nandita Dukkipati
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Mar, 2013

17 commits

  • …wireless into for-davem

    John W. Linville
     
  • This makes sure that release_sock is called for all error conditions in
    irda_getsockopt.

    Signed-off-by: Kees Cook
    Reported-by: Brad Spengler
    Cc: stable@vger.kernel.org
    Signed-off-by: David S. Miller

    Kees Cook
     
  • When using ipconfig the logs currently look like:

    Single name server:
    [ 3.467270] IP-Config: Complete:
    [ 3.470613] device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
    [ 3.480670] host=infigo-1, domain=, nis-domain=(none)
    [ 3.486166] bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
    [ 3.492910] nameserver0=172.16.42.1[ 3.496853] ALSA device list:

    Three name servers:
    [ 3.496949] IP-Config: Complete:
    [ 3.500293] device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
    [ 3.510367] host=infigo-1, domain=, nis-domain=(none)
    [ 3.515864] bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
    [ 3.522635] nameserver0=172.16.42.1, nameserver1=172.16.42.100
    [ 3.529149] , nameserver2=172.16.42.200

    Fix newline handling for these cases

    Signed-off-by: Martin Fuzzey
    Signed-off-by: David S. Miller

    Martin Fuzzey
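
The intended log shape (all name servers on one line, comma separated, with exactly one trailing newline) can be sketched as a small formatting helper. This is a userspace illustration with hypothetical names (`toy_format_nameservers`), not the ipconfig code, and it writes into a buffer rather than the kernel log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "nameserver0=A, nameserver1=B\n" into buf.
 * Returns the number of characters written (excluding the NUL),
 * 0 if there are no servers, or -1 if the output does not fit.
 */
int toy_format_nameservers(char *buf, size_t len,
                           const char *const *servers, int n)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        /* The separator goes before every entry except the first. */
        int w = snprintf(buf + used, len - used, "%snameserver%d=%s",
                         i ? ", " : "", i, servers[i]);
        if (w < 0 || (size_t)w >= len - used)
            return -1;
        used += (size_t)w;
    }
    if (n > 0) {
        if (len - used < 2)
            return -1;
        buf[used++] = '\n';  /* single newline, after the last entry only */
        buf[used] = '\0';
    }
    return (int)used;
}
```

Emitting the separator before each entry and the newline once at the end avoids both symptoms shown in the logs above: a missing newline with one server and a stray line break with three.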
     
  • In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
    doing the work here anyway, also store thoff for a later usage, e.g. in
    the BPF filter.

    Suggested-by: Eric Dumazet
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • If we postpone unhashing of l2tp sessions until the structure is freed, we
    risk:

    1. further packets arriving and getting queued while the pseudowire is being
    closed down
    2. the recv path hitting "scheduling while atomic" errors in the case that
    recv drops the last reference to a session and calls l2tp_session_free
    while in atomic context

    As such, l2tp sessions should be unhashed from l2tp_core data structures early
    in the teardown process prior to calling pseudowire close. For pseudowires
    like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • l2tp's u64_stats writers were incorrectly synchronised, making it possible to
    deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp
    code netlink commands while passing data through l2tp sessions.

    Previous discussion on netdev determined that alternative solutions such as
    spinlock writer synchronisation or per-cpu data would bring unjustified
    overhead, given that most users interested in high volume traffic will likely
    be running 64bit kernels on 64bit hardware.

    As such, this patch replaces l2tp's use of u64_stats with atomic_long_t,
    thereby avoiding the deadlock.

    Ref:
    http://marc.info/?l=linux-netdev&m=134029167910731&w=2
    http://marc.info/?l=linux-netdev&m=134079868111131&w=2

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
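
As a rough userspace analogue of replacing sequence-counter protected 64-bit statistics with atomic long counters, the sketch below uses C11 `atomic_long`. The struct and function names are hypothetical, and C11 atomics only approximate the kernel's atomic_long_t; the point is that each update is a single atomic operation, so writers need no extra synchronisation between themselves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-session counters, one atomic long per statistic. */
struct toy_l2tp_stats {
    atomic_long tx_packets;
    atomic_long tx_bytes;
};

void toy_stats_init(struct toy_l2tp_stats *s)
{
    assert(s != NULL);
    atomic_init(&s->tx_packets, 0);
    atomic_init(&s->tx_bytes, 0);
}

/* Safe to call concurrently from several contexts: no writer lock needed. */
void toy_stats_tx(struct toy_l2tp_stats *s, long bytes)
{
    atomic_fetch_add_explicit(&s->tx_packets, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->tx_bytes, bytes, memory_order_relaxed);
}

long toy_stats_tx_packets(struct toy_l2tp_stats *s)
{
    return atomic_load_explicit(&s->tx_packets, memory_order_relaxed);
}

long toy_stats_tx_bytes(struct toy_l2tp_stats *s)
{
    return atomic_load_explicit(&s->tx_bytes, memory_order_relaxed);
}
```

The trade-off mentioned in the commit message still applies: a long is only 32 bits wide on 32-bit targets, so counters wrap sooner there.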
     
  • If userspace deletes a ppp pseudowire using the netlink API, either by
    directly deleting the session or by deleting the tunnel that contains the
    session, we need to tear down the corresponding pppox channel.

    Rather than trying to manage two pppox unbind codepaths, switch the netlink
    and l2tp_core session_close handlers to close via the l2tp_ppp socket
    .release handler.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall
    and l2tp_session_delete. Pseudowire implementations which are deleted only
    via l2tp_core l2tp_session_delete calls can dispense with their own code for
    flushing the reorder queue.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • If an l2tp session is deleted, it is necessary to delete skbs in-flight
    on the session's reorder queue before taking it down.

    Rather than having each pseudowire implementation reaching into the
    l2tp_session struct to handle this itself, provide a function in l2tp_core to
    purge the session queue.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • It is valid for an existing struct sock object to have a NULL sk_socket
    pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • When looking up the tunnel socket in struct l2tp_tunnel, hold a reference
    whether the socket was created by the kernel or by userspace.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • When a user deletes a tunnel using netlink, all the sessions in the tunnel
    should also be deleted. Since running sessions will pin the tunnel socket
    with the references they hold, have the l2tp_tunnel_delete close all sessions
    in a tunnel before finally closing the tunnel socket.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel
    socket being closed from userspace. We need to do the same thing for
    IP-encapsulation sockets.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a
    tunnel when a UDP-encapsulation socket is destroyed. We need to do something
    similar for IP-encapsulation sockets.

    Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to
    call it from their .destroy handlers.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • L2TP sessions hold a reference to the tunnel socket to prevent it going away
    while sessions are still active. However, since tunnel destruction is handled
    by the sock sk_destruct callback there is a catch-22: a tunnel with sessions
    cannot be deleted since each session holds a reference to the tunnel socket.
    If userspace closes a managed tunnel socket, or dies, the tunnel will persist
    and it will be necessary to individually delete the sessions using netlink
    commands. This is ugly.

    To prevent this occurring, this patch leverages the udp encapsulation socket
    destroy callback to gain early notification when the tunnel socket is closed.
    This allows us to safely close the sessions running in the tunnel, dropping
    the tunnel socket references in the process. The tunnel socket is then
    destroyed as normal, and the tunnel resources deallocated in sk_destruct.

    While we're at it, ensure that l2tp_tunnel_closeall correctly drops session
    references to allow the sessions to be deleted rather than leaking.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • Users of udp encapsulation currently have an encap_rcv callback which they can
    use to hook into the udp receive path.

    In situations where an encapsulation user allocates resources associated with a
    udp encap socket, it may be convenient to be able to also hook the proto
    .destroy operation. For example, if an encap user holds a reference to the
    udp socket, the destroy hook might be used to relinquish this reference.

    This patch adds a socket destroy hook into udp, which is set and enabled
    in the same way as the existing encap_rcv hook.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
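
The idea of an optional destroy hook sitting next to the existing receive hook can be sketched with plain function pointers. This is a simplified userspace model with hypothetical names (`toy_udp_sock`, `toy_tunnel`); how the kernel gates the hook on the encapsulation type is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct toy_udp_sock {
    /* Existing style of hook: called on the receive path. */
    int (*encap_rcv)(struct toy_udp_sock *sk, const void *pkt, size_t len);
    /* New style of hook: called once when the socket is destroyed. */
    void (*encap_destroy)(struct toy_udp_sock *sk);
    void *user_data;  /* owned by the encapsulation user */
};

/* Generic destroy path: give the encap user a chance to clean up first. */
void toy_udp_destroy_sock(struct toy_udp_sock *sk)
{
    assert(sk != NULL);
    if (sk->encap_destroy)
        sk->encap_destroy(sk);
    /* generic socket teardown would follow here */
}

/* Example encap user that holds a reference it must give back. */
struct toy_tunnel { int sock_refs; };

void toy_tunnel_encap_destroy(struct toy_udp_sock *sk)
{
    struct toy_tunnel *t = sk->user_data;

    if (t != NULL && t->sock_refs > 0)
        t->sock_refs--;  /* relinquish the reference held on the socket */
}
```

A socket without an encapsulation user simply leaves `encap_destroy` as NULL and the generic path is unchanged.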
     
  • Trigger BUG_ON if a group name is longer than GENL_NAMSIZ.

    Signed-off-by: Masatake YAMATO
    Signed-off-by: David S. Miller

    Masatake YAMATO
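
A length check of this kind can be sketched as follows. `TOY_GENL_NAMSIZ` mirrors the 16 character limit mentioned in the pull request text; treating that limit as including the terminating NUL is an assumption of this sketch, and a boolean result stands in for the BUG_ON so the example stays runnable in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed buffer size for a group name, including the terminating NUL. */
#define TOY_GENL_NAMSIZ 16

/* Returns true if 'name' fits in a TOY_GENL_NAMSIZ-byte buffer. */
bool toy_group_name_fits(const char *name)
{
    assert(name != NULL);
    /* A NUL must appear within the first TOY_GENL_NAMSIZ bytes. */
    return memchr(name, '\0', TOY_GENL_NAMSIZ) != NULL;
}
```

Under that assumption a name of up to 15 visible characters is accepted and a 16 character name is rejected, because its terminator would fall outside the buffer.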
     

20 Mar, 2013

3 commits

  • Pablo Neira Ayuso says:

    ====================
    The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are:

    * Restrict IPv6 stateless NPT targets to the mangle table. Many users are
    complaining that this target does not work in the nat table, which is the
    wrong table for it, from Florian Westphal.

    * Fix possible use before initialization in the netns init path of several
    conntrack protocol trackers (introduced recently while improving conntrack
    netns support), from Gao Feng.

    * Fix incorrect initialization of copy_range in nfnetlink_queue, spotted
    by Eric Dumazet during the NFWS2013, patch from myself.

    * Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov.

    * Remove the rcu_read_lock section in IPVS around the ipv4_update_pmtu call,
    which is no longer required after a change introduced in 3.7, again from Julian.

    * Fix SYN looping in IPVS state sync if the backup is used as a real server
    in DR/TUN modes. This required a new /proc entry to disable the director
    function when acting as backup, also from Julian.

    * Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by
    Paul Bolle.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Kconfig symbol IP_NF_QUEUE is unused since commit
    d16cf20e2f2f13411eece7f7fb72c17d141c4a84 ("netfilter: remove ip_queue
    support"). Let's remove it too.

    Signed-off-by: Paul Bolle
    Signed-off-by: Pablo Neira Ayuso

    Paul Bolle
     
  • Pull networking fixes from David Miller:

    1) Fix ARM BPF JIT handling of negative 'k' values, from Chen Gang.

    2) Insufficient space reserved for bridge netlink values, fix from
    Stephen Hemminger.

    3) Some dst_neigh_lookup*() callers don't interpret error pointer
    correctly, fix from Zhouyi Zhou.

    4) Fix transport match in SCTP active_path loops, from Xugeng Zhang.

    5) Fix qeth driver handling of multi-order SKB frags, from Frank
    Blaschka.

    6) fec driver is missing napi_disable() call, resulting in crashes on
    unload, from Georg Hofmann.

    7) Don't try to handle PMTU events on a listening socket, fix from Eric
    Dumazet.

    8) Fix timestamp location calculations in IP option processing, from
    David Ward.

    9) FIB_TABLE_HASHSZ setting is not controlled by the correct kconfig
    tests, from Denis V Lunev.

    10) Fix TX descriptor push handling in SFC driver, from Ben Hutchings.

    11) Fix isdn/hisax and tulip/de4x5 kconfig dependencies, from Arnd
    Bergmann.

    12) bnx2x statistics don't handle 4GB rollover correctly, fix from
    Maciej Żenczykowski.

    13) Openvswitch bug fixes for vport del/new error reporting, missing
    genlmsg_end() call in netlink processing, and mis-parsing of
    LLC/SNAP ethernet types. From Rich Lane.

    14) SKB pfmemalloc state should only be propagated from the head page of
    a compound page, fix from Pavel Emelyanov.

    15) Fix link handling in tg3 driver for 5715 chips when autonegotiation
    is disabled. From Nithin Sujir.

    16) Fix inverted test of cpdma_check_free_tx_desc return value in
    davinci_emac driver, from Mugunthan V N.

    17) vlan_depth is incorrectly calculated in skb_network_protocol(), from
    Li RongQing.

    18) Fix probing of Gobi 1K devices in qmi_wwan driver, and fix NCM
    device mode backwards compat in cdc_ncm driver. From Bjørn Mork.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
    inet: limit length of fragment queue hash table bucket lists
    qeth: Fix scatter-gather regression
    qeth: Fix invalid router settings handling
    qeth: delay feature trace
    tcp: dont handle MTU reduction on LISTEN socket
    bnx2x: fix occasional statistics off-by-4GB error
    vhost/net: fix heads usage of ubuf_info
    bridge: Add support for setting BR_ROOT_BLOCK flag.
    bnx2x: add missing napi deletion in error path
    drivers: net: ethernet: ti: davinci_emac: fix usage of cpdma_check_free_tx_desc()
    ethernet/tulip: DE4x5 needs VIRT_TO_BUS
    isdn: hisax: netjet requires VIRT_TO_BUS
    net: cdc_ncm, cdc_mbim: allow user to prefer NCM for backwards compatibility
    rtnetlink: Mask the rta_type when range checking
    Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally"
    Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug
    smsc75xx: configuration help incorrectly mentions smsc95xx
    net: fec: fix missing napi_disable call
    net: fec: restart the FEC when PHY speed changes
    skb: Propagate pfmemalloc on skb from head page only
    ...

    Linus Torvalds
     

19 Mar, 2013

6 commits

  • This patch introduces a constant limit of the fragment queue hash
    table bucket list lengths. Currently the limit of 128 is chosen somewhat
    arbitrarily and just ensures that we can fill up the fragment cache with
    empty packets up to the default ip_frag_high_thresh limits. It should
    just protect from list iteration eating considerable amounts of cpu.

    If we reach the maximum length in one hash bucket a warning is printed.
    This is implemented on the caller side of inet_frag_find to distinguish
    between the different users of inet_fragment.c.

    I dropped the out of memory warning in the ipv4 fragment lookup path,
    because we already get a warning by the slab allocator.

    Cc: Eric Dumazet
    Cc: Jesper Dangaard Brouer
    Signed-off-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
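
The bucket length limit can be pictured with a small lookup routine that counts how many entries it walks and refuses to let the chain grow past a fixed depth. The names are hypothetical, and the exact boundary (whether the 128th or 129th entry is refused) is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHAIN_DEPTH 128  /* the limit named in the commit message */

struct toy_frag_queue {
    unsigned int key;
    struct toy_frag_queue *next;
};

enum toy_lookup {
    TOY_FOUND,        /* an entry with this key already exists */
    TOY_MAY_CREATE,   /* not found, and the chain is short enough to extend */
    TOY_BUCKET_FULL   /* not found, and the chain has reached the limit */
};

enum toy_lookup toy_frag_lookup(const struct toy_frag_queue *bucket,
                                unsigned int key)
{
    int depth = 0;

    for (const struct toy_frag_queue *q = bucket; q != NULL; q = q->next) {
        if (q->key == key)
            return TOY_FOUND;
        depth++;
    }
    assert(depth >= 0);
    return depth < TOY_MAX_CHAIN_DEPTH ? TOY_MAY_CREATE : TOY_BUCKET_FULL;
}
```

A caller would print its warning on `TOY_BUCKET_FULL`, which matches the description above of the warning living on the caller side of the lookup.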
     
  • In 3.7 we added code that uses ipv4_update_pmtu but after commit
    c5ae7d4192 (ipv4: must use rcu protection while calling fib_lookup)
    the RCU lock is not needed.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
     
  • Dmitry Akindinov reported a problem where SYNs loop
    between the master and backup server when the backup server is used as
    a real server in DR mode and has IPVS rules to function as a director.

    Even when the backup function is enabled we continue to forward
    traffic and schedule new connections when the current master is using
    the backup server as a real server. While this is not a problem for NAT,
    for the DR and TUN methods the backup server cannot determine whether a
    request comes from a client or from the director.

    To avoid such loops, add a new sysctl flag, backup_only. It can be needed
    for DR/TUN setups that do not need backup and director function at the
    same time. When the backup function is enabled we stop any forwarding
    and pass the traffic to the local stack (real server mode). The flag
    disables the director function when the backup function is enabled.

    For setups that enable the backup function for some virtual services and
    the director function for other virtual services, a more complex
    solution would be needed to support DR/TUN mode, perhaps assigning a
    per-virtual-service syncid value so that we can differentiate the
    requests.

    Reported-by: Dmitry Akindinov
    Tested-by: German Myzovsky
    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
     
  • Fix wrong but non-fatal access to chunk length.
    sch->length is in network byte order and must be converted before use,
    and the next chunk is aligned to 4 bytes. Problem noticed in sparse output.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
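
The two points in the commit above (convert the length from network byte order, then round up to a 4-byte boundary) can be written out as a short helper. The struct and function names are hypothetical; the header layout (type, flags, 16-bit length) follows the SCTP chunk format.

```c
#include <arpa/inet.h>  /* ntohs */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified SCTP chunk header; 'length' is stored in network byte order. */
struct toy_sctp_chunkhdr {
    uint8_t type;
    uint8_t flags;
    uint16_t length;
};

/* Offset of the chunk that follows 'ch', given that 'ch' starts at cur_off. */
size_t toy_next_chunk_offset(size_t cur_off, const struct toy_sctp_chunkhdr *ch)
{
    size_t len;

    assert(ch != NULL);
    len = ntohs(ch->length);                 /* convert before doing arithmetic */
    return cur_off + ((len + 3) & ~(size_t)3);  /* chunks are padded to 4 bytes */
}
```

For example, a chunk at offset 12 with a length of 17 is followed by a chunk at offset 32, since 17 rounds up to 20.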
     
  • John W. Linville
     
  • When an ICMP ICMP_FRAG_NEEDED (or ICMPV6_PKT_TOOBIG) message finds a
    LISTEN socket, and this socket is currently owned by the user, we
    set TCP_MTU_REDUCED_DEFERRED flag in listener tsq_flags.

    This is bad because if we clone the parent before it had a chance to
    clear the flag, the child inherits the tsq_flags value, and next
    tcp_release_cb() on the child will decrement sk_refcnt.

    Result is that we might free a live TCP socket, as reported by
    Dormando.

    IPv4: Attempt to release TCP socket in state 1

    Fix this issue by testing sk_state against TCP_LISTEN early, so that we
    set TCP_MTU_REDUCED_DEFERRED on appropriate sockets (not a LISTEN one)

    This bug was introduced in commit 563d34d05786
    (tcp: dont drop MTU reduction indications)

    Reported-by: dormando
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
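
The shape of the fix (test for the LISTEN state before touching the deferred flag) can be modelled in userspace as below. Everything here is a hypothetical stand-in: `toy_sock`, its fields, and the flag value are illustrative, and the immediate-handling branch is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_tcp_state { TOY_TCP_ESTABLISHED = 1, TOY_TCP_LISTEN = 10 };

#define TOY_MTU_REDUCED_DEFERRED (1u << 0)

struct toy_sock {
    enum toy_tcp_state state;
    bool owned_by_user;       /* socket currently locked by a user context */
    unsigned int tsq_flags;   /* deferred-work flags, inherited by clones */
    int refcnt;
    int mtu_events_handled;   /* stands in for the immediate MTU handling */
};

/* Called when a "fragmentation needed" style ICMP message matches 'sk'. */
void toy_handle_frag_needed(struct toy_sock *sk)
{
    assert(sk != NULL);

    /* Early exit: a listener never gets the deferred flag or an extra ref. */
    if (sk->state == TOY_TCP_LISTEN)
        return;

    if (!sk->owned_by_user) {
        sk->mtu_events_handled++;  /* handle right away */
        return;
    }

    /* Defer: take one reference the first time the flag is set. */
    if (!(sk->tsq_flags & TOY_MTU_REDUCED_DEFERRED)) {
        sk->tsq_flags |= TOY_MTU_REDUCED_DEFERRED;
        sk->refcnt++;
    }
}
```

Because the listener's flags stay clear, a child cloned from it cannot inherit a deferred flag whose matching reference it never took.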
     

18 Mar, 2013

1 commit