03 Jul, 2013

3 commits

  • If L2TP data sequence numbers are enabled and reordering is not
    enabled, data reception stops if a packet is lost since the kernel
    waits for a sequence number that is never resent. (When reordering is
    enabled, data reception restarts when the reorder timeout expires.) If
    no reorder timeout is set, we should count the number of in-sequence
    packets after the out-of-sequence (OOS) condition is detected, and reset
    sequence number state after a number of such packets are received.

    For now, the number of in-sequence packets while in OOS state which
    cause the sequence number state to be reset is hard-coded to 5. This
    could be configurable later.

    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    James Chapman
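
The recovery rule described above can be modelled in userspace as follows. This is a minimal sketch, not the kernel code: the struct, its field names and the function seq_rx() are hypothetical, and the exact point at which the count reaches the threshold of 5 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OOS_RESET_THRESHOLD 5	/* hard-coded value from the commit message */

/* Hypothetical per-session receive state (illustrative names only). */
struct seq_state {
	uint32_t nr;		/* next expected sequence number */
	uint32_t nr_max;	/* largest sequence number, of the form 2^n - 1 */
	uint32_t last_oos;	/* last out-of-sequence number seen */
	uint32_t oos_run;	/* consecutive packets following last_oos */
	bool in_oos;		/* true once an OOS packet has been seen */
};

/* Returns true if the packet is accepted, false if it is dropped. */
static bool seq_rx(struct seq_state *s, uint32_t ns)
{
	assert(ns <= s->nr_max);

	if (ns == s->nr) {
		/* Expected packet: advance and leave the OOS state. */
		s->nr = (s->nr + 1) & s->nr_max;
		s->in_oos = false;
		s->oos_run = 0;
		return true;
	}

	/* Out of sequence: count packets that follow each other in order. */
	if (s->in_oos && ns == ((s->last_oos + 1) & s->nr_max))
		s->oos_run++;
	else
		s->oos_run = 0;
	s->in_oos = true;
	s->last_oos = ns;

	if (s->oos_run >= OOS_RESET_THRESHOLD) {
		/* Enough in-order packets seen: resynchronise on this one. */
		s->nr = (ns + 1) & s->nr_max;
		s->in_oos = false;
		s->oos_run = 0;
		return true;
	}
	return false;
}
```

With nr = 11 and packet 11 lost, this model drops packets 12 to 16 and accepts packet 17, after which reception continues normally.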
     
  • The L2TP datapath is not currently RFC-compliant when sequence numbers
    are used in L2TP data packets. According to the L2TP RFC, any received
    sequence number NR greater than or equal to the next expected NR is
    acceptable, where the "greater than or equal to" test is determined by
    the NR wrap point. This differs for L2TPv2 and L2TPv3, so add state in
    the session context to hold the max NR value and the NR window size in
    order to do the acceptable sequence number value check. These might be
    configurable later, but for now we derive it from the tunnel L2TP
    version, which determines the sequence number field size.

    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    James Chapman
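
A wrap-aware "greater than or equal to" test of the kind described above might look like the following userspace sketch. The struct and function names are hypothetical. The sequence number widths (16 bits for L2TPv2, 24 bits for L2TPv3) follow from the field sizes; the choice of half the number space as the window size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical session state holding the values the commit describes. */
struct nr_state {
	uint32_t nr;		 /* next expected sequence number */
	uint32_t nr_max;	 /* wrap point: 0xffff (v2) or 0xffffff (v3) */
	uint32_t nr_window_size; /* how far ahead of nr is still acceptable */
};

/* Derive the limits from the tunnel's L2TP version (2 or 3). */
static void nr_state_init(struct nr_state *s, int l2tp_version)
{
	assert(l2tp_version == 2 || l2tp_version == 3);
	s->nr = 0;
	s->nr_max = (l2tp_version == 2) ? 0xffff : 0xffffff;
	s->nr_window_size = s->nr_max / 2;	/* assumption: half the space */
}

/* True if nr is at or ahead of the expected value, allowing for wrap. */
static bool nr_in_window(const struct nr_state *s, uint32_t nr)
{
	uint32_t ahead;

	assert(nr <= s->nr_max);
	if (nr >= s->nr)
		ahead = nr - s->nr;
	else
		ahead = (s->nr_max + 1) - (s->nr - nr);	/* distance across wrap */

	return ahead < s->nr_window_size;
}
```

For an L2TPv2 session expecting 0xfffe, this accepts 0xfffe and 1 (three ahead, across the wrap) and rejects 0xfff0, which is behind the expected value.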
     
  • This change moves some code handling data sequence numbers into a
    separate function to avoid too much indentation. This is to prepare
    for some changes to data sequence number handling in subsequent
    patches.

    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    James Chapman
     

02 Jul, 2013

1 commit


13 Jun, 2013

2 commits


08 Apr, 2013

2 commits


23 Mar, 2013

1 commit


21 Mar, 2013

11 commits

  • If we postpone unhashing of l2tp sessions until the structure is freed, we
    risk:

    1. further packets arriving and getting queued while the pseudowire is being
    closed down
    2. the recv path hitting "scheduling while atomic" errors in the case that
    recv drops the last reference to a session and calls l2tp_session_free
    while in atomic context

    As such, l2tp sessions should be unhashed from l2tp_core data structures early
    in the teardown process prior to calling pseudowire close. For pseudowires
    like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • l2tp's u64_stats writers were incorrectly synchronised, making it possible to
    deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp
    code netlink commands while passing data through l2tp sessions.

    Previous discussion on netdev determined that alternative solutions such as
    spinlock writer synchronisation or per-cpu data would bring unjustified
    overhead, given that most users interested in high volume traffic will likely
    be running 64bit kernels on 64bit hardware.

    As such, this patch replaces l2tp's use of u64_stats with atomic_long_t,
    thereby avoiding the deadlock.

    Ref:
    http://marc.info/?l=linux-netdev&m=134029167910731&w=2
    http://marc.info/?l=linux-netdev&m=134079868111131&w=2

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
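
The idea of lock-free counters that need no writer synchronisation can be illustrated in userspace with C11 atomics. This is only an analogy to the kernel's atomic_long_t, assuming a C11 compiler; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stats block: one word-sized atomic counter per statistic. */
struct stats_sketch {
	atomic_long tx_packets;
	atomic_long tx_bytes;
};

static void stats_init(struct stats_sketch *st)
{
	atomic_init(&st->tx_packets, 0);
	atomic_init(&st->tx_bytes, 0);
}

/* Writers update each counter with a single atomic add; no lock is taken,
 * so a reader can never spin waiting for an interrupted writer. */
static void stats_tx(struct stats_sketch *st, long len)
{
	assert(len >= 0);
	atomic_fetch_add_explicit(&st->tx_packets, 1, memory_order_relaxed);
	atomic_fetch_add_explicit(&st->tx_bytes, len, memory_order_relaxed);
}

static long stats_tx_packets(struct stats_sketch *st)
{
	return atomic_load_explicit(&st->tx_packets, memory_order_relaxed);
}

static long stats_tx_bytes(struct stats_sketch *st)
{
	return atomic_load_explicit(&st->tx_bytes, memory_order_relaxed);
}
```

The trade-off is that a long-sized counter is only 32 bits wide on a 32-bit build and so wraps sooner than a 64-bit one.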
     
  • If userspace deletes a ppp pseudowire using the netlink API, either by
    directly deleting the session or by deleting the tunnel that contains the
    session, we need to tear down the corresponding pppox channel.

    Rather than trying to manage two pppox unbind codepaths, switch the netlink
    and l2tp_core session_close handlers to close via the l2tp_ppp socket
    .release handler.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall
    and l2tp_session_delete. Pseudowire implementations which are deleted only
    via l2tp_core l2tp_session_delete calls can dispense with their own code for
    flushing the reorder queue.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • If an l2tp session is deleted, it is necessary to delete skbs in-flight
    on the session's reorder queue before taking it down.

    Rather than having each pseudowire implementation reaching into the
    l2tp_session struct to handle this itself, provide a function in l2tp_core to
    purge the session queue.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • It is valid for an existing struct sock object to have a NULL sk_socket
    pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • When looking up the tunnel socket in struct l2tp_tunnel, hold a reference
    whether the socket was created by the kernel or by userspace.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • When a user deletes a tunnel using netlink, all the sessions in the tunnel
    should also be deleted. Since running sessions will pin the tunnel socket
    with the references they hold, have l2tp_tunnel_delete close all sessions
    in a tunnel before finally closing the tunnel socket.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel
    socket being closed from userspace. We need to do the same thing for
    IP-encapsulation sockets.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a
    tunnel when a UDP-encapsulation socket is destroyed. We need to do something
    similar for IP-encapsulation sockets.

    Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to
    call it from their .destroy handlers.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • L2TP sessions hold a reference to the tunnel socket to prevent it going away
    while sessions are still active. However, since tunnel destruction is handled
    by the sock sk_destruct callback there is a catch-22: a tunnel with sessions
    cannot be deleted since each session holds a reference to the tunnel socket.
    If userspace closes a managed tunnel socket, or dies, the tunnel will persist
    and it will be necessary to individually delete the sessions using netlink
    commands. This is ugly.

    To prevent this occurring, this patch leverages the udp encapsulation socket
    destroy callback to gain early notification when the tunnel socket is closed.
    This allows us to safely close the sessions running in the tunnel, dropping
    the tunnel socket references in the process. The tunnel socket is then
    destroyed as normal, and the tunnel resources deallocated in sk_destruct.

    While we're at it, ensure that l2tp_tunnel_closeall correctly drops session
    references to allow the sessions to be deleted rather than leaking.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     

06 Mar, 2013

1 commit

  • Pull networking fixes from David Miller:
    "A moderately sized pile of fixes, some specifically for merge window
    introduced regressions although others are for longer standing items
    and have been queued up for -stable.

    I'm kind of tired of all the RDS protocol bugs over the years, to be
    honest, it's way out of proportion to the number of people who
    actually use it.

    1) Fix missing range initialization in netfilter IPSET, from Jozsef
    Kadlecsik.

    2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes
    Berg.

    3) Fix DMA syncing in SFC driver, from Ben Hutchings.

    4) Fix regression in BOND device MAC address setting, from Jiri
    Pirko.

    5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko.

    6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips,
    fix from Dmitry Kravkov.

    7) Missing cfgspace_lock initialization in BCMA driver.

    8) Validate parameter size for SCTP assoc stats getsockopt(), from
    Guenter Roeck.

    9) Fix SCTP association hangs, from Lee A Roberts.

    10) Fix jumbo frame handling in r8169, from Francois Romieu.

    11) Fix phy_device memory leak, from Petr Malat.

    12) Omit trailing FCS from frames received in BGMAC driver, from Hauke
    Mehrtens.

    13) Missing socket refcount release in L2TP, from Guillaume Nault.

    14) sctp_endpoint_init should respect passed in gfp_t, rather than use
    GFP_KERNEL unconditionally. From Dan Carpenter.

    15) Add ASIX AX88179 USB driver, from Freddy Xin.

    16) Remove MAINTAINERS entries for drivers deleted during the merge
    window, from Cesar Eduardo Barros.

    17) RDS protocol can try to allocate huge amounts of memory, check
    that the user's request length makes sense, from Cong Wang.

    18) SCTP should use the provided KMALLOC_MAX_SIZE instead of its own,
    bogus, definition. From Cong Wang.

    19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll,
    from Frank Li. Also, fix a build error introduced in the merge
    window.

    20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti.

    21) Don't double count RTT measurements when we leave the TCP receive
    fast path, from Neal Cardwell."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
    tcp: fix double-counted receiver RTT when leaving receiver fast path
    CAIF: fix sparse warning for caif_usb
    rds: simplify a warning message
    net: fec: fix build error in no MXC platform
    net: ipv6: Don't purge default router if accept_ra=2
    net: fec: put tx to napi poll function to fix dead lock
    sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
    rds: limit the size allocated by rds_message_alloc()
    MAINTAINERS: remove eexpress
    MAINTAINERS: remove drivers/net/wan/cycx*
    MAINTAINERS: remove 3c505
    caif_dev: fix sparse warnings for caif_flow_cb
    ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver
    sctp: use the passed in gfp flags instead GFP_KERNEL
    ipv[4|6]: correct dropwatch false positive in local_deliver_finish
    l2tp: Restore socket refcount when sendmsg succeeds
    net/phy: micrel: Disable asymmetric pause for KSZ9021
    bgmac: omit the fcs
    phy: Fix phy_device_free memory leak
    bnx2x: Fix KR2 work-around condition
    ...

    Linus Torvalds
     

02 Mar, 2013

1 commit

  • The sendmsg() syscall handler for PPPoL2TP doesn't decrease the socket
    reference counter after successful transmissions. Any successful
    sendmsg() call from userspace will then increase the reference counter
    forever, thus preventing the kernel's session and tunnel data from
    being freed later on.

    The problem only happens when writing directly on L2TP sockets.
    PPP sockets attached to L2TP are unaffected as the PPP subsystem
    uses pppol2tp_xmit(), which symmetrically increases and decreases the
    reference counters.

    This patch adds the missing call to sock_put() before returning from
    pppol2tp_sendmsg().

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
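
The leak and its fix can be illustrated with a userspace model of a reference-counted object. This is a sketch only: sock_sketch, sketch_hold(), sketch_put() and sendmsg_sketch() are hypothetical stand-ins and do not reproduce pppol2tp_sendmsg().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a reference-counted socket. */
struct sock_sketch {
	int refcnt;
};

static void sketch_hold(struct sock_sketch *sk)
{
	sk->refcnt++;
}

static void sketch_put(struct sock_sketch *sk)
{
	assert(sk->refcnt > 0);
	sk->refcnt--;
}

/* Model of a send handler: takes a reference for the duration of the call
 * and must drop it on every exit path, success included. */
static int sendmsg_sketch(struct sock_sketch *sk, int xmit_ok, int len)
{
	sketch_hold(sk);

	if (!xmit_ok)
		goto error_put;

	/* The bug described above corresponds to returning here without
	 * dropping the reference; the fix is this balancing put. */
	sketch_put(sk);
	return len;

error_put:
	sketch_put(sk);
	return -EIO;
}
```

With the balancing put in place the count is unchanged after any number of calls, whether they succeed or fail.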
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived
    differently from the list ones, which look like this:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
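
A three-argument hlist iterator of the shape described above can be sketched in userspace as follows. This is a simplified model, not the contents of linux/list.h: the node is singly linked here, session_sketch and the *_sketch helpers are hypothetical, and the macros assume a compiler with GNU extensions (statement expressions and __typeof__), such as gcc or clang.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified hash-list types (singly linked for brevity). */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Map a node pointer to its containing entry, or NULL at the list end. */
#define hlist_entry_safe(ptr, type, member) \
	({ struct hlist_node *p_ = (ptr); \
	   p_ ? container_of_sketch(p_, type, member) : (type *)NULL; })

/* New style: no separate node cursor, same shape as list_for_each_entry. */
#define hlist_for_each_entry(pos, head, member) \
	for (pos = hlist_entry_safe((head)->first, __typeof__(*(pos)), member); \
	     pos; \
	     pos = hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

/* Hypothetical entry type embedding a list node. */
struct session_sketch {
	int id;
	struct hlist_node hlist;
};

static void hlist_add_head_sketch(struct hlist_node *n, struct hlist_head *h)
{
	assert(n != NULL && h != NULL);
	n->next = h->first;
	h->first = n;
}

/* Walk the list with only the entry pointer as the cursor. */
static int sum_ids(struct hlist_head *head)
{
	struct session_sketch *s;
	int sum = 0;

	hlist_for_each_entry(s, head, hlist)
		sum += s->id;
	return sum;
}
```

Deriving the entry type from pos with __typeof__ is what removes the need for the extra node parameter.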
     

22 Feb, 2013

1 commit

  • Pull driver core patches from Greg Kroah-Hartman:
    "Here is the big driver core merge for 3.9-rc1

    There are two major series here, both of which touch lots of drivers
    all over the kernel, and will cause you some merge conflicts:

    - add a new function called devm_ioremap_resource() to properly be
    able to check return values.

    - remove CONFIG_EXPERIMENTAL

    Other than those patches, there's not much here, some minor fixes and
    updates"

    Fix up trivial conflicts

    * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
    base: memory: fix soft/hard_offline_page permissions
    drivercore: Fix ordering between deferred_probe and exiting initcalls
    backlight: fix class_find_device() arguments
    TTY: mark tty_get_device call with the proper const values
    driver-core: constify data for class_find_device()
    firmware: Ignore abort check when no user-helper is used
    firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
    firmware: Make user-mode helper optional
    firmware: Refactoring for splitting user-mode helper code
    Driver core: treat unregistered bus_types as having no devices
    watchdog: Convert to devm_ioremap_resource()
    thermal: Convert to devm_ioremap_resource()
    spi: Convert to devm_ioremap_resource()
    power: Convert to devm_ioremap_resource()
    mtd: Convert to devm_ioremap_resource()
    mmc: Convert to devm_ioremap_resource()
    mfd: Convert to devm_ioremap_resource()
    media: Convert to devm_ioremap_resource()
    iommu: Convert to devm_ioremap_resource()
    drm: Convert to devm_ioremap_resource()
    ...

    Linus Torvalds
     

19 Feb, 2013

2 commits

  • proc_net_remove is only used to remove proc entries that are
    directly under /proc/net; it is not a general function for
    removing the proc entries of a netns. To remove proc entries
    under /proc/net/stat/, we still need to call remove_proc_entry.

    This patch uses remove_proc_entry to replace proc_net_remove.
    proc_net_remove can be removed after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    This looks a little chaotic. This patch changes all uses of
    proc_net_fops_create to proc_create. proc_net_fops_create can be
    removed after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

09 Feb, 2013

1 commit


08 Feb, 2013

1 commit

  • Andrew Savchenko reported a DNS failure and we diagnosed that
    some UDP sockets were unable to send more packets because their
    sk_wmem_alloc was corrupted after a while (see the tx_queue column
    in the following trace):

    $ cat /proc/net/udp
    sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
    ...
    459: 00000000:0270 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4507 2 ffff88003d612380 0
    466: 00000000:0277 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4802 2 ffff88003d613180 0
    470: 076A070A:007B 00000000:0000 07 FFFF4600:00000000 00:00000000 00000000 123 0 5552 2 ffff880039974380 0
    470: 010213AC:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4986 2 ffff88003dbd3180 0
    470: 010013AC:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4985 2 ffff88003dbd2e00 0
    470: 00FCA8C0:007B 00000000:0000 07 FFFFFB00:00000000 00:00000000 00000000 0 0 4984 2 ffff88003dbd2a80 0
    ...

    Playing with skb->truesize is tricky, especially when
    skb is attached to a socket, as we can fool memory charging.

    Just remove this code, it's not worth trying to be ultra
    precise in xmit path.

    Reported-by: Andrew Savchenko
    Tested-by: Andrew Savchenko
    Signed-off-by: Eric Dumazet
    Cc: James Chapman
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Feb, 2013

5 commits

  • The infrastructure is already pretty much entirely there
    to allow this conversion.

    The tunnel and session lookups have per-namespace tables,
    and the ipv4 bind lookup includes the namespace in the
    lookup key.

    Set netns_ok in l2tp_ip_protocol.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When creating unmanaged tunnel sockets we should honour the network namespace
    passed to l2tp_tunnel_create. Furthermore, unmanaged tunnel sockets should
    not hold a reference to the network namespace lest they accidentally keep
    alive a namespace which should otherwise have been released.

    Unmanaged tunnel sockets now drop their namespace reference via sk_change_net,
    and are released in a new pernet exit callback, l2tp_exit_net.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • l2tp_tunnel_create is passed a pointer to the network namespace for the
    tunnel, along with an optional file descriptor for the tunnel which may
    be passed in from userspace via netlink.

    In the case where the file descriptor is defined, ensure that the namespace
    associated with that socket matches the namespace explicitly passed to
    l2tp_tunnel_create.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • The L2TP netlink code can run in namespaces. Set the netnsok flag in
    genl_family to true to reflect that fact.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • To allow l2tp_tunnel_delete to be called from an atomic context, place the
    tunnel socket release calls on a workqueue for asynchronous execution.

    Tunnel memory is eventually freed in the tunnel socket destructor.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     

01 Feb, 2013

2 commits

  • l2tp_ip6 is incorrectly using the IPv4-specific ip_cmsg_recv to handle
    ancillary data. This means that socket options such as IPV6_RECVPKTINFO are
    not honoured in userspace.

    Convert l2tp_ip6 to use the IPv6-specific handler.

    Ref: net/ipv6/udp.c

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: Chris Elston
    Signed-off-by: David S. Miller

    Tom Parkin
     
  • The datagram_*_ctl functions in net/ipv6/datagram.c are IPv6-specific. Since
    datagram_send_ctl is publicly exported it should be appropriately named to
    reflect the fact that it's for IPv6 only.

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     

30 Jan, 2013

1 commit

  • If a tunnel socket is created by userspace, l2tp hooks the socket destructor
    in order to clean up resources if userspace closes the socket or crashes. It
    also caches a pointer to the struct sock for use in the data path and in the
    netlink interface.

    While it is safe to use the cached sock pointer in the data path, where the
    skb references keep the socket alive, it is not safe to use it elsewhere as
    such access introduces a race with userspace closing the socket. In
    particular, l2tp_tunnel_delete is prone to oopsing if a multithreaded
    userspace application closes a socket at the same time as sending a netlink
    delete command for the tunnel.

    This patch fixes this oops by forcing l2tp_tunnel_delete to explicitly look up
    a tunnel socket held by userspace using sockfd_lookup().

    Signed-off-by: Tom Parkin
    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    Tom Parkin
     

12 Jan, 2013

1 commit

  • The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
    while now and is almost always enabled by default. As agreed during the
    Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

    CC: "David S. Miller"
    Signed-off-by: Kees Cook
    Acked-by: David S. Miller

    Kees Cook
     

11 Nov, 2012

1 commit


03 Nov, 2012

1 commit

  • When creating an L2TPv3 Ethernet session, if register_netdev() should fail for
    any reason (for example, automatic naming for "l2tpeth%d" interfaces hits the
    32k-interface limit), the netdev is freed in the error path. However, the
    l2tp_eth_sess structure's dev pointer is left uncleared, and this results in
    l2tp_eth_delete() then attempting to unregister the same netdev later in the
    session teardown. This results in an oops.

    To avoid this, clear the session dev pointer in the error path.

    Signed-off-by: Tom Parkin
    Signed-off-by: David S. Miller

    Tom Parkin
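
The error-path fix can be illustrated with a userspace model in which a failed registration frees the device and a later teardown inspects the same pointer. This is a sketch under stated assumptions: all types and functions here are hypothetical stand-ins, with calloc()/free() in place of netdev allocation and release.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the netdev and the session's private data. */
struct netdev_sketch { int registered; };
struct eth_sess_sketch { struct netdev_sketch *dev; };

/* Model of a registration step that may fail. */
static int register_sketch(struct netdev_sketch *dev, int force_fail)
{
	if (force_fail)
		return -1;
	dev->registered = 1;
	return 0;
}

static int eth_create_sketch(struct eth_sess_sketch *sess, int force_fail)
{
	struct netdev_sketch *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return -1;
	sess->dev = dev;

	if (register_sketch(dev, force_fail) < 0) {
		free(dev);
		sess->dev = NULL;	/* the fix: leave no dangling pointer */
		return -1;
	}
	return 0;
}

/* Teardown releases the device only if the session still owns one. */
static void eth_delete_sketch(struct eth_sess_sketch *sess)
{
	assert(sess != NULL);
	if (sess->dev) {
		free(sess->dev);
		sess->dev = NULL;
	}
}
```

Without the assignment to NULL in the error path, the teardown function would act on memory that has already been freed.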
     

26 Oct, 2012

1 commit