12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->s_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    to freed up memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about it's
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

15 Feb, 2014

1 commit


13 Feb, 2014

2 commits

  • Include appropriate header file fs/dlm/ast.h in fs/dlm/ast.c because it
    contains function prototypes of some functions defined in fs/dlm/ast.c.

    This also eliminates the following warning in fs/dlm/ast:
    fs/dlm/ast.c:52:5: warning: no previous prototype for ‘dlm_add_lkb_callback’ [-Wmissing-prototypes]
    fs/dlm/ast.c:113:5: warning: no previous prototype for ‘dlm_rem_lkb_callback’ [-Wmissing-prototypes]
    fs/dlm/ast.c:174:6: warning: no previous prototype for ‘dlm_add_cb’ [-Wmissing-prototypes]
    fs/dlm/ast.c:212:6: warning: no previous prototype for ‘dlm_callback_work’ [-Wmissing-prototypes]
    fs/dlm/ast.c:267:5: warning: no previous prototype for ‘dlm_callback_start’ [-Wmissing-prototypes]
    fs/dlm/ast.c:278:6: warning: no previous prototype for ‘dlm_callback_stop’ [-Wmissing-prototypes]
    fs/dlm/ast.c:284:6: warning: no previous prototype for ‘dlm_callback_suspend’ [-Wmissing-prototypes]
    fs/dlm/ast.c:292:6: warning: no previous prototype for ‘dlm_callback_resume’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: David Teigland

    Rashika Kheria
     
  • We pass the freed "r" pointer back to the caller. It's harmless but it
    upsets the static checkers.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David Teigland

    Dan Carpenter
     

26 Jan, 2014

1 commit

  • Pull networking updates from David Miller:

    1) BPF debugger and asm tool by Daniel Borkmann.

    2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

    3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

    4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match. From Ben Hutchings.

    5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

    6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

    7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

    8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

    9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

    10) Fix ipv6 router reachability probing, from Jiri Benc.

    11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

    12) Support tunneling in GRO layer, from Jerry Chu.

    13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

    14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI. From Atzm Watanabe.

    15) New "Heavy Hitter" qdisc, from Terry Lam.

    16) Significantly improve the IPSEC support in pktgen, from Fan Du.

    17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
    Herbert.

    18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

    19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

    20) Key TCP metrics blobs also by source address, not just destination
    address. From Christoph Paasch.

    21) Support 10G in generic phylib. From Andy Fleming.

    22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided. From Tom Herbert.

    The wireless and netfilter folks have been busy little bees too.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
    net/cxgb4: Fix referencing freed adapter
    ipv6: reallocate addrconf router for ipv6 address when lo device up
    fib_frontend: fix possible NULL pointer dereference
    rtnetlink: remove IFLA_BOND_SLAVE definition
    rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
    qlcnic: update version to 5.3.55
    qlcnic: Enhance logic to calculate msix vectors.
    qlcnic: Refactor interrupt coalescing code for all adapters.
    qlcnic: Update poll controller code path
    qlcnic: Interrupt code cleanup
    qlcnic: Enhance Tx timeout debugging.
    qlcnic: Use bool for rx_mac_learn.
    bonding: fix u64 division
    rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
    sfc: Use the correct maximum TX DMA ring size for SFC9100
    Add Shradha Shah as the sfc driver maintainer.
    net/vxlan: Share RX skb de-marking and checksum checks with ovs
    tulip: cleanup by using ARRAY_SIZE()
    ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
    net/cxgb4: Don't retrieve stats during recovery
    ...

    Linus Torvalds
     

22 Jan, 2014

1 commit


16 Dec, 2013

1 commit

  • The recovery time for a failed node was taking a long
    time because the failed node could not perform the full
    shutdown process. Removing the linger time speeds this
    up. The dlm does not care what happens to messages to
    or from the failed node.

    Signed-off-by: Dongmao Zhang
    Signed-off-by: David Teigland

    Dongmao Zhang
     

20 Nov, 2013

1 commit

  • As suggested by David Miller, make genl_register_family_with_ops()
    a macro and pass only the array, evaluating ARRAY_SIZE() in the
    macro, this is a little safer.

    The openvswitch has some indirection, assing ops/n_ops directly in
    that code. This might ultimately just assign the pointers in the
    family initializations, saving the struct genl_family_and_ops and
    code (once mcast groups are handled differently.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

16 Oct, 2013

1 commit

  • When dlm_release_lockspace(ls, 1) is invoked on a busy system
    immediately after the last dlm_unlock() AST has finished it can occur
    that lkb_idr_is_local() is invoked for the unlocked LKB since removal
    from ls_lkbidr only occurs after the AST has returned. If that happens
    dlm_release_lockspace(ls, 1) will return -EBUSY instead of releasing
    the lockspace. Fix this race condition by changing lkb_idr_is_local()
    such that it only returns true for LKB's that have not yet been
    unlocked.

    Signed-off-by: Bart Van Assche
    Signed-off-by: David Teigland

    Bart Van Assche
     

13 Aug, 2013

1 commit


30 Jul, 2013

1 commit


27 Jun, 2013

1 commit


26 Jun, 2013

2 commits


19 Jun, 2013

1 commit


15 Jun, 2013

6 commits

  • For TCP we disable Nagle and I cannot think of why it would be needed
    for SCTP. When disabled it seems to improve dlm_lock operations like it
    does for TCP.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     
  • Currently if a SCTP send fails, we lose the data we were trying
    to send because the writequeue_entry is released when we do the send.
    When this happens other nodes will then hang waiting for a reply.

    This adds support for SCTP to retry the send operation.

    I also removed the retry limit for SCTP use, because we want
    to make sure we try every path during init time and for longer
    failures we want to continually retry in case paths come back up
    while trying other paths. We will do this until userspace tells us
    to stop.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     
  • Currently, if we cannot create a association to the first IP addr
    that is added to DLM, the SCTP init assoc code will just retry
    the same IP. This patch adds a simple failover schemes where we
    will try one of the addresses that was passed into DLM.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     
  • We should be testing and cleaing the init pending bit because later
    when sctp_init_assoc is recalled it will be checking that it is not set
    and set the bit.

    We do not want to touch CF_CONNECT_PENDING here because we will queue
    swork and process_send_sockets will then call the connect_action function.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     
  • sctp_assoc was not getting set so later lookups failed.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     
  • We were clearing the base con's init pending flags, but the
    con for the node was the one with the pending bit set.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     

02 May, 2013

1 commit

  • Pull networking updates from David Miller:
    "Highlights (1721 non-merge commits, this has to be a record of some
    sort):

    1) Add 'random' mode to team driver, from Jiri Pirko and Eric
    Dumazet.

    2) Make it so that any driver that supports configuration of multiple
    MAC addresses can provide the forwarding database add and del
    calls by providing a default implementation and hooking that up if
    the driver doesn't have an explicit set of handlers. From Vlad
    Yasevich.

    3) Support GSO segmentation over tunnels and other encapsulating
    devices such as VXLAN, from Pravin B Shelar.

    4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

    5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
    Dukkipati.

    6) In the PHY layer, allow supporting wake-on-lan in situations where
    the PHY registers have to be written for it to be configured.

    Use it to support wake-on-lan in mv643xx_eth.

    From Michael Stapelberg.

    7) Significantly improve firewire IPV6 support, from YOSHIFUJI
    Hideaki.

    8) Allow multiple packets to be sent in a single transmission using
    network coding in batman-adv, from Martin Hundebøll.

    9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

    10) Generalize the VXLAN forwarding tables so that there is more
    flexibility in configurating various aspects of the endpoints.
    From David Stevens.

    11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
    from Dmitry Kravkov.

    12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
    Neira Ayuso.

    13) Start adding networking selftests.

    14) In situations of overload on the same AF_PACKET fanout socket, or
    per-cpu packet receive queue, minimize drop by distributing the
    load to other cpus/fanouts. From Willem de Bruijn and Eric
    Dumazet.

    15) Add support for new payload offset BPF instruction, from Daniel
    Borkmann.

    16) Convert several drivers over to mdoule_platform_driver(), from
    Sachin Kamat.

    17) Provide a minimal BPF JIT image disassembler userspace tool, from
    Daniel Borkmann.

    18) Rewrite F-RTO implementation in TCP to match the final
    specification of it in RFC4138 and RFC5682. From Yuchung Cheng.

    19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
    you like netlink, so I implemented netlink dumping of netlink
    sockets.") From Andrey Vagin.

    20) Remove ugly passing of rtnetlink attributes into rtnl_doit
    functions, from Thomas Graf.

    21) Allow userspace to be able to see if a configuration change occurs
    in the middle of an address or device list dump, from Nicolas
    Dichtel.

    22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
    Frederic Sowa.

    23) Increase accuracy of packet length used by packet scheduler, from
    Jason Wang.

    24) Beginning set of changes to make ipv4/ipv6 fragment handling more
    scalable and less susceptible to overload and locking contention,
    from Jesper Dangaard Brouer.

    25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
    instead. From Hong Zhiguo.

    26) Optimize route usage in IPVS by avoiding reference counting where
    possible, from Julian Anastasov.

    27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

    28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
    Eitzenberger.

    29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
    nfnetlink_log, and nfnetlink_queue. From Gao feng.

    30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

    31) Support several new r8169 chips, from Hayes Wang.

    32) Support tokenized interface identifiers in ipv6, from Daniel
    Borkmann.

    33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

    34) Add 802.1ad vlan offload support, from Patrick McHardy.

    35) Support mmap() based netlink communication, also from Patrick
    McHardy.

    36) Support HW timestamping in mlx4 driver, from Amir Vadai.

    37) Rationalize AF_PACKET packet timestamping when transmitting, from
    Willem de Bruijn and Daniel Borkmann.

    38) Bring parity to what's provided by /proc/net/packet socket dumping
    and the info provided by netlink socket dumping of AF_PACKET
    sockets. From Nicolas Dichtel.

    39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
    Poirier"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
    filter: fix va_list build error
    af_unix: fix a fatal race with bit fields
    bnx2x: Prevent memory leak when cnic is absent
    bnx2x: correct reading of speed capabilities
    net: sctp: attribute printl with __printf for gcc fmt checks
    netlink: kconfig: move mmap i/o into netlink kconfig
    netpoll: convert mutex into a semaphore
    netlink: Fix skb ref counting.
    net_sched: act_ipt forward compat with xtables
    mlx4_en: fix a build error on 32bit arches
    Revert "bnx2x: allow nvram test to run when device is down"
    bridge: avoid OOPS if root port not found
    drivers: net: cpsw: fix kernel warn on cpsw irq enable
    sh_eth: use random MAC address if no valid one supplied
    3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
    tg3: fix to append hardware time stamping flags
    unix/stream: fix peeking with an offset larger than data in queue
    unix/dgram: fix peeking with an offset larger than data in queue
    unix/dgram: peek beyond 0-sized skbs
    openvswitch: Remove unneeded ovs_netdev_get_ifindex()
    ...

    Linus Torvalds
     

10 Apr, 2013

1 commit

  • This patch introduces an UAPI header for the SCTP protocol,
    so that we can facilitate the maintenance and development of
    user land applications or libraries, in particular in terms
    of header synchronization.

    To not break compatibility, some fragments from lksctp-tools'
    netinet/sctp.h have been carefully included, while taking care
    that neither kernel nor user land breaks, so both compile fine
    with this change (for lksctp-tools I tested with the old
    netinet/sctp.h header and with a newly adapted one that includes
    the uapi sctp header). lksctp-tools smoke test run through
    successfully as well in both cases.

    Suggested-by: Neil Horman
    Cc: Neil Horman
    Cc: Vlad Yasevich
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

09 Apr, 2013

1 commit

  • When the kernel clears flocks/plocks during close, it calls posix
    unlock when there are flocks but no posix locks. Without this
    patch, that unnecessary posix unlock is passed to userland
    (dlm_controld), across the cluster, and back to the kernel.
    This can create a lot of plock activity, even when no posix
    locks had been used.

    This patch copies the nfs approach, and skips the full posix
    unlock if there is no plock found during the vfs unlock phase.

    Signed-off-by: David Teigland

    David Teigland
     

28 Feb, 2013

4 commits

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • Convert to the much saner new idr interface. Error return values from
    recover_idr_add() mix -1 and -errno. The conversion doesn't change
    that but it looks iffy.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated.

    The conversion isn't completely trivial for recover_idr_clear() as it's
    the only place in kernel which makes legitimate use of idr_remove_all()
    w/o idr_destroy(). Replace it with idr_remove() call inside
    idr_for_each_entry() loop. It goes on top so that it matches the
    operation order in recover_idr_del().

    Signed-off-by: Tejun Heo
    Cc: Christine Caulfield
    Cc: David Teigland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Convert recover_idr_clear() to use idr_for_each_entry() instead of
    idr_for_each(). It's somewhat less efficient this way but it shouldn't
    matter in an error path. This is to help with deprecation of
    idr_remove_all().

    Signed-off-by: Tejun Heo
    Cc: Christine Caulfield
    Cc: David Teigland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

26 Feb, 2013

1 commit

  • According to SUSv3:

    [EACCES] Permission denied. An attempt was made to access a file in a way
    forbidden by its file access permissions.

    [EPERM] Operation not permitted. An attempt was made to perform an operation
    limited to processes with appropriate privileges or to the owner of a file
    or other resource.

    So -EPERM should be returned if capability checks fails.

    Strictly speaking this is an API change since the error code user sees is
    altered.

    Signed-off-by: Zhao Hongjiang
    Acked-by: Jan Kara
    Acked-by: Steven Whitehouse
    Acked-by: Ian Kent
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Zhao Hongjiang
     

22 Feb, 2013

1 commit


05 Feb, 2013

1 commit


08 Jan, 2013

1 commit

  • Keep track of whether a toss list contains any
    shrinkable rsbs. If not, dlm_scand can avoid
    scanning the list for rsbs to shrink. Unnecessary
    scanning can otherwise waste a lot of time because
    the toss lists can contain a large number of rsbs
    that are non-shrinkable (directory records).

    Signed-off-by: David Teigland

    David Teigland
     

17 Nov, 2012

1 commit

  • When a node is removed that held a PW/EX lock, the
    existing master node should invalidate the lvb on the
    resource due to the purged lock.

    Previously, the existing master node was invalidating
    the lvb if it found only NL/CR locks on the resource
    during recovery for the removed node. This could lead
    to cases where it invalidated the lvb and shouldn't
    have, or cases where it should have invalidated and
    didn't.

    When recovery selects a *new* master node for a
    resource, and that new master finds only NL/CR locks
    on the resource after lock recovery, it should
    invalidate the lvb. This case was handled correctly
    (but was incorrectly applied to the existing master
    case also.)

    When a process exits while holding a PW/EX lock,
    the lvb on the resource should be invalidated.
    This was not happening.

    The lvb contents and VALNOTVALID flag should be
    recovered before granting locks in recovery so that
    the recovered lvb state is provided in the callback.
    The lvb was being recovered after the lock was granted.

    Signed-off-by: David Teigland

    David Teigland
     

02 Nov, 2012

2 commits


03 Oct, 2012

1 commit

  • Pull networking changes from David Miller:

    1) GRE now works over ipv6, from Dmitry Kozlov.

    2) Make SCTP more network namespace aware, from Eric Biederman.

    3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

    4) Make openvswitch network namespace aware, from Pravin B Shelar.

    5) IPV6 NAT implementation, from Patrick McHardy.

    6) Server side support for TCP Fast Open, from Jerry Chu and others.

    7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

    8) Increate the loopback default MTU to 64K, from Eric Dumazet.

    9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic. This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

    10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail. Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

    11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

    12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

    13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

    Fix up various fairly trivial conflicts, mostly caused by the user
    namespace changes.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
    hyperv: Add buffer for extended info after the RNDIS response message.
    hyperv: Report actual status in receive completion packet
    hyperv: Remove extra allocated space for recv_pkt_list elements
    hyperv: Fix page buffer handling in rndis_filter_send_request()
    hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
    hyperv: Fix the max_xfer_size in RNDIS initialization
    vxlan: put UDP socket in correct namespace
    vxlan: Depend on CONFIG_INET
    sfc: Fix the reported priorities of different filter types
    sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
    sfc: Fix loopback self-test with separate_tx_channels=1
    sfc: Fix MCDI structure field lookup
    sfc: Add parentheses around use of bitfield macro arguments
    sfc: Fix null function pointer in efx_sriov_channel_type
    vxlan: virtual extensible lan
    igmp: export symbol ip_mc_leave_group
    netlink: add attributes to fdb interface
    tg3: unconditionally select HWMON support when tg3 is enabled.
    Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
    gre: fix sparse warning
    ...

    Linus Torvalds
     

11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

10 Sep, 2012

1 commit

  • device_write only checks whether the request size is big enough, but it doesn't
    check if the size is too big.

    At that point, it also tries to allocate as much memory as the user has requested
    even if it's too much. This can lead to OOM killer kicking in, or memory corruption
    if (count + 1) overflows.

    Signed-off-by: Sasha Levin
    Signed-off-by: David Teigland

    Sasha Levin
     

13 Aug, 2012

1 commit