15 Oct, 2014

1 commit


10 Sep, 2014

1 commit


09 Aug, 2014

1 commit


12 Jun, 2014

1 commit

  • The connection struct with nodeid 0 is the listening socket,
    not a connection to another node. The sctp resend function
    was not checking that the nodeid was valid (non-zero), so it
    would mistakenly get and resend on the listening connection
    when nodeid was zero.

    Signed-off-by: Lidong Zhong
    Signed-off-by: David Teigland

    Lidong Zhong
     

07 Jun, 2014

3 commits


12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->s_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    to freed up memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about it's
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
     

15 Feb, 2014

1 commit


13 Feb, 2014

2 commits

  • Include appropriate header file fs/dlm/ast.h in fs/dlm/ast.c because it
    contains function prototypes of some functions defined in fs/dlm/ast.c.

    This also eliminates the following warning in fs/dlm/ast:
    fs/dlm/ast.c:52:5: warning: no previous prototype for ‘dlm_add_lkb_callback’ [-Wmissing-prototypes]
    fs/dlm/ast.c:113:5: warning: no previous prototype for ‘dlm_rem_lkb_callback’ [-Wmissing-prototypes]
    fs/dlm/ast.c:174:6: warning: no previous prototype for ‘dlm_add_cb’ [-Wmissing-prototypes]
    fs/dlm/ast.c:212:6: warning: no previous prototype for ‘dlm_callback_work’ [-Wmissing-prototypes]
    fs/dlm/ast.c:267:5: warning: no previous prototype for ‘dlm_callback_start’ [-Wmissing-prototypes]
    fs/dlm/ast.c:278:6: warning: no previous prototype for ‘dlm_callback_stop’ [-Wmissing-prototypes]
    fs/dlm/ast.c:284:6: warning: no previous prototype for ‘dlm_callback_suspend’ [-Wmissing-prototypes]
    fs/dlm/ast.c:292:6: warning: no previous prototype for ‘dlm_callback_resume’ [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: David Teigland

    Rashika Kheria
     
  • We pass the freed "r" pointer back to the caller. It's harmless but it
    upsets the static checkers.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David Teigland

    Dan Carpenter
     

26 Jan, 2014

1 commit

  • Pull networking updates from David Miller:

    1) BPF debugger and asm tool by Daniel Borkmann.

    2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

    3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

    4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match. From Ben Hutchings.

    5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

    6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

    7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

    8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

    9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

    10) Fix ipv6 router reachability probing, from Jiri Benc.

    11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

    12) Support tunneling in GRO layer, from Jerry Chu.

    13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

    14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI. From Atzm Watanabe.

    15) New "Heavy Hitter" qdisc, from Terry Lam.

    16) Significantly improve the IPSEC support in pktgen, from Fan Du.

    17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
    Herbert.

    18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

    19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

    20) Key TCP metrics blobs also by source address, not just destination
    address. From Christoph Paasch.

    21) Support 10G in generic phylib. From Andy Fleming.

    22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided. From Tom Herbert.

    The wireless and netfilter folks have been busy little bees too.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
    net/cxgb4: Fix referencing freed adapter
    ipv6: reallocate addrconf router for ipv6 address when lo device up
    fib_frontend: fix possible NULL pointer dereference
    rtnetlink: remove IFLA_BOND_SLAVE definition
    rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
    qlcnic: update version to 5.3.55
    qlcnic: Enhance logic to calculate msix vectors.
    qlcnic: Refactor interrupt coalescing code for all adapters.
    qlcnic: Update poll controller code path
    qlcnic: Interrupt code cleanup
    qlcnic: Enhance Tx timeout debugging.
    qlcnic: Use bool for rx_mac_learn.
    bonding: fix u64 division
    rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
    sfc: Use the correct maximum TX DMA ring size for SFC9100
    Add Shradha Shah as the sfc driver maintainer.
    net/vxlan: Share RX skb de-marking and checksum checks with ovs
    tulip: cleanup by using ARRAY_SIZE()
    ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
    net/cxgb4: Don't retrieve stats during recovery
    ...

    Linus Torvalds
     

22 Jan, 2014

1 commit


16 Dec, 2013

1 commit

  • The recovery time for a failed node was taking a long
    time because the failed node could not perform the full
    shutdown process. Removing the linger time speeds this
    up. The dlm does not care what happens to messages to
    or from the failed node.

    Signed-off-by: Dongmao Zhang
    Signed-off-by: David Teigland

    Dongmao Zhang
     

20 Nov, 2013

1 commit

  • As suggested by David Miller, make genl_register_family_with_ops()
    a macro and pass only the array, evaluating ARRAY_SIZE() in the
    macro, this is a little safer.

    The openvswitch has some indirection, assing ops/n_ops directly in
    that code. This might ultimately just assign the pointers in the
    family initializations, saving the struct genl_family_and_ops and
    code (once mcast groups are handled differently.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

16 Oct, 2013

1 commit

  • When dlm_release_lockspace(ls, 1) is invoked on a busy system
    immediately after the last dlm_unlock() AST has finished it can occur
    that lkb_idr_is_local() is invoked for the unlocked LKB since removal
    from ls_lkbidr only occurs after the AST has returned. If that happens
    dlm_release_lockspace(ls, 1) will return -EBUSY instead of releasing
    the lockspace. Fix this race condition by changing lkb_idr_is_local()
    such that it only returns true for LKB's that have not yet been
    unlocked.

    Signed-off-by: Bart Van Assche
    Signed-off-by: David Teigland

    Bart Van Assche
     

13 Aug, 2013

1 commit


30 Jul, 2013

1 commit


27 Jun, 2013

1 commit


26 Jun, 2013

2 commits


19 Jun, 2013

1 commit


15 Jun, 2013

6 commits

  • For TCP we disable Nagle and I cannot think of why it would be needed
    for SCTP. When disabled it seems to improve dlm_lock operations like it
    does for TCP.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     
  • Currently if a SCTP send fails, we lose the data we were trying
    to send because the writequeue_entry is released when we do the send.
    When this happens other nodes will then hang waiting for a reply.

    This adds support for SCTP to retry the send operation.

    I also removed the retry limit for SCTP use, because we want
    to make sure we try every path during init time and for longer
    failures we want to continually retry in case paths come back up
    while trying other paths. We will do this until userspace tells us
    to stop.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     
  • Currently, if we cannot create a association to the first IP addr
    that is added to DLM, the SCTP init assoc code will just retry
    the same IP. This patch adds a simple failover schemes where we
    will try one of the addresses that was passed into DLM.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     
  • We should be testing and cleaing the init pending bit because later
    when sctp_init_assoc is recalled it will be checking that it is not set
    and set the bit.

    We do not want to touch CF_CONNECT_PENDING here because we will queue
    swork and process_send_sockets will then call the connect_action function.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     
  • sctp_assoc was not getting set so later lookups failed.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     
  • We were clearing the base con's init pending flags, but the
    con for the node was the one with the pending bit set.

    Signed-off-by: Mike Christie
    Signed-off-by: David Teigland

    Mike Christie
     

02 May, 2013

1 commit

  • Pull networking updates from David Miller:
    "Highlights (1721 non-merge commits, this has to be a record of some
    sort):

    1) Add 'random' mode to team driver, from Jiri Pirko and Eric
    Dumazet.

    2) Make it so that any driver that supports configuration of multiple
    MAC addresses can provide the forwarding database add and del
    calls by providing a default implementation and hooking that up if
    the driver doesn't have an explicit set of handlers. From Vlad
    Yasevich.

    3) Support GSO segmentation over tunnels and other encapsulating
    devices such as VXLAN, from Pravin B Shelar.

    4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

    5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
    Dukkipati.

    6) In the PHY layer, allow supporting wake-on-lan in situations where
    the PHY registers have to be written for it to be configured.

    Use it to support wake-on-lan in mv643xx_eth.

    From Michael Stapelberg.

    7) Significantly improve firewire IPV6 support, from YOSHIFUJI
    Hideaki.

    8) Allow multiple packets to be sent in a single transmission using
    network coding in batman-adv, from Martin Hundebøll.

    9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

    10) Generalize the VXLAN forwarding tables so that there is more
    flexibility in configurating various aspects of the endpoints.
    From David Stevens.

    11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
    from Dmitry Kravkov.

    12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
    Neira Ayuso.

    13) Start adding networking selftests.

    14) In situations of overload on the same AF_PACKET fanout socket, or
    per-cpu packet receive queue, minimize drop by distributing the
    load to other cpus/fanouts. From Willem de Bruijn and Eric
    Dumazet.

    15) Add support for new payload offset BPF instruction, from Daniel
    Borkmann.

    16) Convert several drivers over to mdoule_platform_driver(), from
    Sachin Kamat.

    17) Provide a minimal BPF JIT image disassembler userspace tool, from
    Daniel Borkmann.

    18) Rewrite F-RTO implementation in TCP to match the final
    specification of it in RFC4138 and RFC5682. From Yuchung Cheng.

    19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
    you like netlink, so I implemented netlink dumping of netlink
    sockets.") From Andrey Vagin.

    20) Remove ugly passing of rtnetlink attributes into rtnl_doit
    functions, from Thomas Graf.

    21) Allow userspace to be able to see if a configuration change occurs
    in the middle of an address or device list dump, from Nicolas
    Dichtel.

    22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
    Frederic Sowa.

    23) Increase accuracy of packet length used by packet scheduler, from
    Jason Wang.

    24) Beginning set of changes to make ipv4/ipv6 fragment handling more
    scalable and less susceptible to overload and locking contention,
    from Jesper Dangaard Brouer.

    25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
    instead. From Hong Zhiguo.

    26) Optimize route usage in IPVS by avoiding reference counting where
    possible, from Julian Anastasov.

    27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

    28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
    Eitzenberger.

    29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
    nfnetlink_log, and nfnetlink_queue. From Gao feng.

    30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

    31) Support several new r8169 chips, from Hayes Wang.

    32) Support tokenized interface identifiers in ipv6, from Daniel
    Borkmann.

    33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

    34) Add 802.1ad vlan offload support, from Patrick McHardy.

    35) Support mmap() based netlink communication, also from Patrick
    McHardy.

    36) Support HW timestamping in mlx4 driver, from Amir Vadai.

    37) Rationalize AF_PACKET packet timestamping when transmitting, from
    Willem de Bruijn and Daniel Borkmann.

    38) Bring parity to what's provided by /proc/net/packet socket dumping
    and the info provided by netlink socket dumping of AF_PACKET
    sockets. From Nicolas Dichtel.

    39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
    Poirier"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
    filter: fix va_list build error
    af_unix: fix a fatal race with bit fields
    bnx2x: Prevent memory leak when cnic is absent
    bnx2x: correct reading of speed capabilities
    net: sctp: attribute printl with __printf for gcc fmt checks
    netlink: kconfig: move mmap i/o into netlink kconfig
    netpoll: convert mutex into a semaphore
    netlink: Fix skb ref counting.
    net_sched: act_ipt forward compat with xtables
    mlx4_en: fix a build error on 32bit arches
    Revert "bnx2x: allow nvram test to run when device is down"
    bridge: avoid OOPS if root port not found
    drivers: net: cpsw: fix kernel warn on cpsw irq enable
    sh_eth: use random MAC address if no valid one supplied
    3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
    tg3: fix to append hardware time stamping flags
    unix/stream: fix peeking with an offset larger than data in queue
    unix/dgram: fix peeking with an offset larger than data in queue
    unix/dgram: peek beyond 0-sized skbs
    openvswitch: Remove unneeded ovs_netdev_get_ifindex()
    ...

    Linus Torvalds
     

10 Apr, 2013

1 commit

  • This patch introduces an UAPI header for the SCTP protocol,
    so that we can facilitate the maintenance and development of
    user land applications or libraries, in particular in terms
    of header synchronization.

    To not break compatibility, some fragments from lksctp-tools'
    netinet/sctp.h have been carefully included, while taking care
    that neither kernel nor user land breaks, so both compile fine
    with this change (for lksctp-tools I tested with the old
    netinet/sctp.h header and with a newly adapted one that includes
    the uapi sctp header). lksctp-tools smoke test run through
    successfully as well in both cases.

    Suggested-by: Neil Horman
    Cc: Neil Horman
    Cc: Vlad Yasevich
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

09 Apr, 2013

1 commit

  • When the kernel clears flocks/plocks during close, it calls posix
    unlock when there are flocks but no posix locks. Without this
    patch, that unnecessary posix unlock is passed to userland
    (dlm_controld), across the cluster, and back to the kernel.
    This can create a lot of plock activity, even when no posix
    locks had been used.

    This patch copies the nfs approach, and skips the full posix
    unlock if there is no plock found during the vfs unlock phase.

    Signed-off-by: David Teigland

    David Teigland
     

28 Feb, 2013

4 commits

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • Convert to the much saner new idr interface. Error return values from
    recover_idr_add() mix -1 and -errno. The conversion doesn't change
    that but it looks iffy.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated.

    The conversion isn't completely trivial for recover_idr_clear() as it's
    the only place in kernel which makes legitimate use of idr_remove_all()
    w/o idr_destroy(). Replace it with idr_remove() call inside
    idr_for_each_entry() loop. It goes on top so that it matches the
    operation order in recover_idr_del().

    Signed-off-by: Tejun Heo
    Cc: Christine Caulfield
    Cc: David Teigland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Convert recover_idr_clear() to use idr_for_each_entry() instead of
    idr_for_each(). It's somewhat less efficient this way but it shouldn't
    matter in an error path. This is to help with deprecation of
    idr_remove_all().

    Signed-off-by: Tejun Heo
    Cc: Christine Caulfield
    Cc: David Teigland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

26 Feb, 2013

1 commit

  • According to SUSv3:

    [EACCES] Permission denied. An attempt was made to access a file in a way
    forbidden by its file access permissions.

    [EPERM] Operation not permitted. An attempt was made to perform an operation
    limited to processes with appropriate privileges or to the owner of a file
    or other resource.

    So -EPERM should be returned if capability checks fails.

    Strictly speaking this is an API change since the error code user sees is
    altered.

    Signed-off-by: Zhao Hongjiang
    Acked-by: Jan Kara
    Acked-by: Steven Whitehouse
    Acked-by: Ian Kent
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Zhao Hongjiang
     

22 Feb, 2013

1 commit


05 Feb, 2013

1 commit


08 Jan, 2013

1 commit

  • Keep track of whether a toss list contains any
    shrinkable rsbs. If not, dlm_scand can avoid
    scanning the list for rsbs to shrink. Unnecessary
    scanning can otherwise waste a lot of time because
    the toss lists can contain a large number of rsbs
    that are non-shrinkable (directory records).

    Signed-off-by: David Teigland

    David Teigland