04 Jan, 2014

2 commits

  • Function never used in current upstream code.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Pablo Neira Ayuso

    Stephen Hemminger
     
  • We currently use prandom_u32() for allocation of ports in tcp bind(0)
    and udp code. In case of plain SNAT we try to keep the ports as is
    or increment on collision.

    SNAT --random mode does use per-destination incrementing port
    allocation. As a recent paper [1] pointed out, this mode of
    port allocation makes it possible for an attacker to find the
    randomly allocated ports through a timing side-channel in a
    socket-overloading attack conducted by an off-path attacker.

    So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
    in regard to the attack described in this paper. As we need to keep
    compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
    that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
    selection algorithm with a simple prandom_u32() in order to mitigate
    this attack vector. Note that the internal state of lfsr113 (the
    PRNG behind prandom_u32()) is periodically reseeded by the kernel
    through a local secure entropy source.
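    As a usage sketch, the new fully random mode could then be requested
    from userspace roughly as below (the `--random-fully` option is the
    iptables counterpart of this flag; exact spelling and availability
    depend on the iptables release, so treat it as an assumption):

```
iptables -t nat -A POSTROUTING -o eth0 \
         -j SNAT --to-source 198.51.100.1 --random-fully
```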

    More details can be found in [1], the basic idea is to send bursts
    of packets to a socket to overflow its receive queue and measure
    the latency to detect a possible retransmit when the port is found.
    Because ports are incremented per destination, further port
    allocations can be predicted. This information could then be used by
    an attacker, e.g., for cache-poisoning, NS pinning, and degradation
    of service attacks against DNS servers [1]:

    The best defense against the poisoning attacks is to properly
    deploy and validate DNSSEC; DNSSEC provides security not only
    against off-path attacker but even against MitM attacker. We hope
    that our results will help motivate administrators to adopt DNSSEC.
    However, full DNSSEC deployment may take significant time, and
    until that happens, we recommend short-term, non-cryptographic
    defenses. We recommend to support full port randomisation,
    according to practices recommended in [2], and to avoid
    per-destination sequential port allocation, which we show may be
    vulnerable to derandomisation attacks.

    Joint work between Hannes Frederic Sowa and Daniel Borkmann.

    [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
    [2] http://arxiv.org/pdf/1205.5190v1.pdf

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
     

27 Dec, 2013

1 commit

  • net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master':
    net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable]

    Commit 35a2af94c7ce7130ca292c68b1d27fcfdb648f6b ("sched/wait: Make the
    __wait_event*() interface more friendly") changed how the interruption
    state is returned. However, sync_thread_master() ignores this state,
    now causing a compile warning.

    According to Julian Anastasov, this behavior is OK:

    "Yes, your patch looks ok to me. In the past we used ssleep() but IPVS
    users were confused why IPVS threads increase the load average. So, we
    switched to _interruptible calls and later the socket polling was
    added."

    Document this, as requested by Peter Zijlstra, to keep developers
    from falling into this pitfall in the future.

    Signed-off-by: Geert Uytterhoeven
    Acked-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Geert Uytterhoeven
     

24 Dec, 2013

1 commit

    With this match, a user can specify that IPComp packets tagged with
    a certain CPI the host is not interested in be DROPped, or be
    subjected to any other action.

    For example:
    iptables -A INPUT -p 108 -m ipcomp --ipcompspi 0x87 -j DROP
    ip6tables -A INPUT -p 108 -m ipcomp --ipcompspi 0x87 -j DROP

    Incoming IPComp packets whose CPI equals 0x87 will then no longer
    reach the upper layers.

    Signed-off-by: Fan Du
    Signed-off-by: Pablo Neira Ayuso

    Fan Du
     

21 Dec, 2013

1 commit

  • Thanks to commits 41063e9 (ipv4: Early TCP socket demux) and 421b388
    (udp: ipv4: Add udp early demux) it is now possible to parse UID and
    GID socket info also for incoming TCP and UDP connections. Having
    this info available, it is convenient to let NFQUEUE parse it in
    order to improve and refine the traffic analysis in userspace.

    Signed-off-by: Valentina Giusti
    Signed-off-by: Pablo Neira Ayuso

    Valentina Giusti
     

20 Dec, 2013

2 commits

    Useful for setting only a particular range of the conntrack mark
    while leaving the existing parts of the value alone, e.g. when
    setting conntrack marks via NFQUEUE.

    Follows the same scheme as the MARK/CONNMARK targets, i.e. the mask
    defines the bits that should be altered. No mask is equivalent to
    '~0', i.e. the old value is replaced by the new one.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
    All these users need only an initial seed value for jhash; prandom
    is perfectly fine for that. This avoids draining the entropy pool
    where it's not strictly required.

    nfnetlink_log did not use the random value at all.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

13 Dec, 2013

7 commits

    Reorder struct netns_ct so that atomic_t "count" changes don't
    slow down users of read-mostly fields.

    This is based on Eric Dumazet's proposed patch:
    "netfilter: conntrack: remove the central spinlock"
    http://thread.gmane.org/gmane.linux.network/268758/focus=47306

    The tricky part of cache-aligning this structure is that it gets
    inlined into struct net (include/net/net_namespace.h); thus, changes
    to other netns_xxx structures affect our alignment.

    Eric's original patch contained an ambiguity on 32-bit regarding
    alignment in struct net. This patch also takes 32-bit into account,
    and in case of changed (struct net) alignment sysctl_xxx entries have
    been ordered according to how often they are accessed.

    Signed-off-by: Jesper Dangaard Brouer
    Reviewed-by: Jiri Benc
    Signed-off-by: Pablo Neira Ayuso

    Jesper Dangaard Brouer
     
  • Introduced by 1397ed35f22d7c30d0b89ba74b6b7829220dfcfd
    "ipv6: add flowinfo for tcp6 pkt_options for all cases"

    Reported-by: kbuild test robot

    V2: fix the title, add an empty line after the declaration (feedback
    from Sergei Shtylyov)

    Signed-off-by: David S. Miller

    Florent Fourcot
     
  • Commit c45f812f0280 ('8390 : Replace ei_debug with msg_enable/NETIF_MSG_*
    feature') ended up moving the printout of version[] from something that
    will be compiled out due to defines, to something that is now evaluated
    at runtime.

    That means that what always used to be an access to an __initdata string
    from non-__init code started showing up as a section mismatch when it
    didn't before.

    All other 8390 versions skip __initdata on the version string, and
    starting to annotate the whole chain of callers with __init seems like
    more churn than it's worth on this driver, so remove it from etherh.c as well.

    Fixes: c45f812f0280 ('8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature')
    Signed-off-by: Olof Johansson
    Signed-off-by: David S. Miller

    Olof Johansson
     
  • This patch modifies the GRO stack to avoid the use of "network_header"
    and associated macros like ip_hdr() and ipv6_hdr() in order to allow
    an arbitrary number of IP hdrs (v4 or v6) to be used in the
    encapsulation chain. This lays the foundation for various IP
    tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later.

    With this patch, GRO stack traversal is now mostly based on
    skb_gro_offset rather than on special hdr offsets saved in the skb
    (e.g., skb->network_header). As a result, all but the top layer
    (i.e., the transport layer) must have hdrs of the same length in
    order for a pkt to be considered for aggregation. Therefore, when adding a new
    encap layer (e.g., for tunneling), one must check and skip flows
    (e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a
    different hdr length.

    Note that unlike the network header, the transport header can and
    will continue to be set by the GRO code since there will be at
    most one "transport layer" in the encap chain.

    Signed-off-by: H.K. Jerry Chu
    Suggested-by: Eric Dumazet
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jerry Chu
     
  • Vlad Yasevich says:

    ====================
    Add packet capture support on macvtap device

    Change from RFC:
    - moved to the rx_handler approach.

    This series adds support for packet capturing on macvtap device.
    The initial approach was to simply export the capturing code as
    a function from the core network. While simple, it was not
    a very architecturally clean approach.

    The new approach is to provide macvtap with its own rx_handler,
    which is attached to the macvtap device itself. Macvlan will simply
    requeue the packet with an updated skb->dev. BTW, the macvlan layer
    already does this for macvlan devices. So now macvtap and macvlan
    have almost exactly the same input path.

    I've toyed with short-circuiting the input path for macvtap by returning
    RX_HANDLER_ANOTHER, but that just made the code more complicated and
    didn't provide any kind of measurable gain (at least according to
    netperf and perf runs on the host).

    To see if there was a performance regression, I ran 1, 2 and 4 netperf
    STREAM and MAERTS tests against the VM from both a remote host and another
    guest on the same system. The command ran was
    netperf -H $host -t $test -l 20 -i 10 -I 95 -c -C

    The numbers I was getting with the new code were consistently very
    slightly (1-2%) better than the old code. I don't consider this
    an improvement, but it's not a regression! :)

    Running 'perf record' on the host didn't show any new hot spots
    and cpu utilization stayed about the same. This was better
    than I expected from simply looking at the code.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Since now macvlan and macvtap use the same receive and
    forward handlers, we can remove them completely and use
    netif_rx and dev_forward_skb() directly.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
    The macvtap device currently does not allow a user to capture
    traffic on it, because it steals packets from the network stack
    before skb->dev is set correctly on the receive side, and because
    it uses the macvlan transmit path directly on the send side. As a
    result, we never get a chance to hand traffic to the taps while
    the correct device is set in the skb.

    This patch makes the macvtap device behave almost exactly like
    macvlan. On the send side, we switch to using dev_queue_xmit().
    On the receive side, to deliver packets to macvtap, we now
    use netif_rx and dev_forward_skb just like macvlan. The only
    difference now is that macvtap has its own rx_handler which is
    attached to the macvtap netdev. It is here that we now steal
    the packet and provide it to the socket.

    As a result, we can now capture traffic on the macvtap device:
    tcpdump -i macvtap0

    It also gives us the ability to add tc actions to the macvtap
    device and actually utilize different bandwidth management
    queues on output.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

12 Dec, 2013

17 commits

  • Daniel Borkmann says:

    ====================
    bpf/filter updates

    This set adds just two minimal helper tools that complement the
    already available bpf_jit_disasm and complete the BPF tooling, plus
    an extensive documentation update of filter.txt.

    Please see individual descriptions for details.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch significantly updates the BPF documentation and describes
    its internal architecture, Linux extensions, and handling of the
    kernel's BPF and JIT engine, plus documents how development can be
    facilitated with the help of bpf_dbg, bpf_asm, bpf_jit_disasm.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
    There are a couple of valid use cases for a minimal, low-level BPF
    asm-like tool: for example, when using/linking to libpcap is not an
    option, when the required BPF filters use Linux extensions that are
    not supported by libpcap's compiler, when a filter is more complex
    and not cleanly implementable with libpcap's compiler, when
    particular filter code should be optimized differently than
    libpcap's internal BPF compiler does, or for security audits of
    emitted BPF JIT code for a prepared set of BPF instructions, resp.
    BPF JIT compiler development in general.

    In such cases, writing such a filter in low-level syntax can be
    a good alternative; for example, xt_bpf and cls_bpf users might have
    requirements that could result in more complex filter code, or in
    code that cannot be expressed with libpcap (e.g. different return
    codes in cls_bpf for flowids on various BPF code paths).

    Moreover, BPF JIT implementors may wish to manually write test cases
    in order to verify the resulting JIT image, and thus need low-level
    access to BPF code generation as well. Therefore, complete the available
    toolchain for BPF with this small bpf_asm helper tool for the tools/net/
    directory. These 3 complementary minimal helper tools round up and
    facilitate BPF development.
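    For illustration, a tiny filter in such low-level syntax (mnemonics
    follow the bpf_asm style; treat exact syntax details as an
    assumption here) could load the EtherType half-word at offset 12,
    jump to drop unless it equals 0x806 (ARP), and otherwise accept the
    full packet:

```
ldh [12]
jne #0x806, drop
ret #-1
drop: ret #0
```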

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • This patch adds a minimal BPF debugger that "emulates" the kernel's
    BPF engine (w/o extensions) and allows for single stepping (forwards
    and backwards through BPF code) or running with >=1 breakpoints through
    selected or all packets from a pcap file with a provided user filter
    in order to facilitate verification of a BPF program. When a breakpoint
    is being hit, it dumps all register contents, decoded instructions and
    in case of branches both decoded branch targets as well as other useful
    information.

    Having this facility is in particular useful to verify BPF programs
    against given test traffic *before* attaching to a live system.

    With the general availability of cls_bpf, xt_bpf, socket filters,
    the team driver, and e.g. PTP code (all BPF users), quite often a
    single, more complex BPF program is used. The main reason for a more
    complex BPF program is to optimize execution time for making a
    verdict when multiple simple BPF programs are combined into one in
    order to avoid parsing the same headers multiple times. This can be
    the case in particular for cls_bpf, which can have various return
    paths for encoding flowids, and for xt_bpf, which must come to a
    firewall verdict.

    Therefore, as this can result in more complex and harder to debug
    code, it would be very useful to have this minimal tool for testing
    purposes. It can also be of help for BPF JIT developers as filters
    are "test attached" to the kernel on a temporary socket thus
    triggering a JIT image dump when enabled. The tool uses an interactive
    libreadline shell with auto-completion and history support.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Minor fix for printk format of a phys_addr_t, and the switch of two local
    functions to static since they're not used outside of the file.

    Signed-off-by: Olof Johansson
    Signed-off-by: David S. Miller

    Olof Johansson
     
  • Only used locally. Found by sparse.

    Signed-off-by: Olof Johansson
    Signed-off-by: David S. Miller

    Olof Johansson
     
    Silences the below warnings when building with ARM_LPAE enabled,
    which gives a longer dma_addr_t by default:

    drivers/net/ethernet/ti/davinci_cpdma.c: In function 'cpdma_desc_pool_create':
    drivers/net/ethernet/ti/davinci_cpdma.c:182:3: warning: passing argument 3 of 'dma_alloc_attrs' from incompatible pointer type [enabled by default]
    drivers/net/ethernet/ti/davinci_cpdma.c: In function 'desc_phys':
    drivers/net/ethernet/ti/davinci_cpdma.c:222:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    drivers/net/ethernet/ti/davinci_cpdma.c:223:8: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

    Signed-off-by: Olof Johansson
    Signed-off-by: David S. Miller

    Olof Johansson
     
  • Removed the shared ei_debug variable. Replaced it by adding u32 msg_enable to
    the private struct ei_device. Now each 8390 ethernet instance has a per-device
    logging variable.

    Changed older style printk() calls to more canonical forms.

    Tested on: ne, ne2k-pci, smc-ultra, and wd hardware.

    V4.0
    - Substituted pr_info() and pr_debug() for printk() KERN_INFO and KERN_DEBUG

    V3.0
    - Checked for cases where pr_cont() was most appropriate choice.
    - Changed module parameter from 'debug' to 'msg_enable' because debug was
    no longer the best description.

    V2.0
    - Changed netif_msg_(drv|probe|ifdown|rx_err|tx_err|tx_queued|intr|rx_status|hw)
    to netif_(dbg|info|warn|err) where possible.

    Signed-off-by: Matthew Whitehead
    Signed-off-by: David S. Miller

    Matthew Whitehead
     
  • RFC 4191 states in 3.5:

    When a host avoids using any non-reachable router X and instead sends
    a data packet to another router Y, and the host would have used
    router X if router X were reachable, then the host SHOULD probe each
    such router X's reachability by sending a single Neighbor
    Solicitation to that router's address. A host MUST NOT probe a
    router's reachability in the absence of useful traffic that the host
    would have sent to the router if it were reachable. In any case,
    these probes MUST be rate-limited to no more than one per minute per
    router.

    Currently, when the neighbour corresponding to a router falls into
    NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
    value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
    should be probed with a single NS. The probe is ratelimited by the existing
    code. To better distinguish meanings of the failure values, rename
    RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.

    Signed-off-by: Jiri Benc
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Jiri Benc
     
    In sctp_err_lookup, we goto out only while asoc is not NULL, so
    remove the NULL check. Also, sctp_err_finish, which is called by
    sctp_v4_err and sctp_v6_err, is always passed a non-NULL asoc, so
    remove that check as well.

    Signed-off-by: Wang Weidong
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wang Weidong
     
    qdisc_put_rtab() already performs a NULL pointer check on rtab.
    Remove the redundant checks outside of qdisc_put_rtab().

    Signed-off-by: Yang Yingliang
    Signed-off-by: David S. Miller

    Yang Yingliang
     
    The help text of this function says: "in_dev: only on this
    interface, 0=any interface", but since commit 39a6d0630012
    ("[NETNS]: Process inet_confirm_addr in the correct namespace."),
    the code assumes that it will never be NULL. This function is never
    called with in_dev == NULL, but it's exported and may be used by an
    external module.

    Because this patch restores the ability to call inet_confirm_addr()
    with in_dev == NULL, I partially revert the above commit, as
    suggested by Julian.

    CC: Julian Anastasov
    Signed-off-by: Nicolas Dichtel
    Reviewed-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
    vxlan_group_used only allows a device to leave the multicast group
    when the remote_ip of this vxlan device differs from the remote_ip
    of every other vxlan device. This keeps the device from leaving the
    multicast group until the vn_sock of this vxlan device is released.

    The check in vxlan_group_used is not quite precise, since even when
    the remote_ip is the same, the vxlan devices may use different
    lower devices and thus different vn_socks.

    Only when vxlan devices use the same vn_sock, the same lower
    device, and the same remote_ip should the mc_list of the vn_sock
    be left unchanged.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
    In vxlan_open, vxlan_group_used always returns true, because the
    vxlan device we want to open is already in the running state and
    already on vxlan_list.

    Since ip_mc_join_group takes care of the reference on struct
    ip_mc_list, removing vxlan_group_used here is safe.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Signed-off-by: Rafał Miłecki
    Signed-off-by: David S. Miller

    Rafał Miłecki
     
    Without this, bgmac_adjust_link didn't know it should re-initialize
    the MAC state. This led to the MAC not working after an interface
    down/up cycle.

    Signed-off-by: Rafał Miłecki
    Signed-off-by: David S. Miller

    Rafał Miłecki
     
    SKIP_NONLOCAL hides the control flow. The control flow should be
    inlined and expanded explicitly in the code so that someone reading
    it can tell that the statement may change the control flow.

    Signed-off-by: Yang Yingliang
    Signed-off-by: David S. Miller

    Yang Yingliang
     

11 Dec, 2013

9 commits

    When adjusting the link speed, the target frequency is determined
    by a 'switch (LINK_SPEED)' statement that assigns the target rate
    only for valid and expected LINK_SPEED values. This incomplete
    switch statement leads to the following build warning:
    drivers/net/ethernet/cadence/macb.c: In function 'macb_handle_link_change':
    >> drivers/net/ethernet/cadence/macb.c:241:14: warning: 'rate' may be used uninitialized in this function [-Wmaybe-uninitialized]
    netdev_warn(dev, "unable to generate target frequency: %ld Hz\n",
    ^
    drivers/net/ethernet/cadence/macb.c:215:13: note: 'rate' was declared here
    long ferr, rate, rate_rounded;

    Fix this by bailing out of the function in the switch's default
    case, before the rate variable is used.

    Reported-by: kbuild test robot
    Signed-off-by: Soren Brinkmann
    Signed-off-by: David S. Miller

    Soren Brinkmann
     
  • Jon Maloy says:

    ====================
    tipc: cleanups in media and bearer layer

    This commit series performs a number of cleanups in order to make the
    bearer and media part of the code more comprehensible and manageable.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • In early versions of TIPC it was possible to administratively block
    individual links through the use of the member flag 'blocked'. This
    functionality was deemed redundant, and since commit 7368dd ("tipc:
    clean out all instances of #if 0'd unused code"), this flag has been
    unused.

    In the current code, a link only needs to be blocked for sending and
    reception if it is subject to an ongoing link failover. In that case,
    it is sufficient to check if the number of expected failover packets
    is non-zero, something which is done via the function 'link_blocked()'.

    This commit finally removes the redundant 'blocked' flag completely.

    Signed-off-by: Ying Xue
    Reviewed-by: Paul Gortmaker
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Currently TIPC supports two L2 media types, Ethernet and Infiniband.
    Because both these media are accessed through the common net_device API,
    several functions in the two media adaptation files turn out to be
    fully or almost identical, leading to unnecessary code duplication.

    In this commit we extract this common code from the two media files
    and move it to the generic bearer.c. Additionally, we change
    the function names to reflect their real role: to access L2 media,
    irrespective of type.

    Signed-off-by: Ying Xue
    Cc: Patrick McHardy
    Reviewed-by: Paul Gortmaker
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Currently, registering a TIPC stack handler in the network device layer
    is done twice, once each for Ethernet (eth_media) and Infiniband
    (ib_media). But, as this registration is not media specific, we can
    avoid some code duplication by moving the registering function to
    the generic bearer layer, to the file bearer.c, and call it only once.
    The same is true for the network device event notifier.

    As a side effect, the two workqueues we are using for setting up/
    cleaning up media can now be eliminated. Furthermore, the array for
    storing the specific media type structs, media_array[], can be entirely
    deleted.

    Note that the eth_started and ib_started flags were removed during the
    code relocation. There is now only one call to bearer_setup and
    bearer_cleanup, and these can logically not race against each other.

    Despite its size, this cleanup work incurs no functional changes in TIPC.
    In particular, it should be noted that the sequence ordering of received
    packets is unaffected by this change, since packet reception never was
    subject to any work queue handling in the first place.

    Signed-off-by: Ying Xue
    Cc: Patrick McHardy
    Signed-off-by: Jon Maloy
    Reviewed-by: Paul Gortmaker
    Signed-off-by: David S. Miller

    Ying Xue
     
  • TIPC is currently using the field 'af_packet_priv' in struct net_device
    as a handle to find the bearer instance associated to the given network
    device. But, by doing so it is blocking other networking cleanups, such
    as the one discussed here:

    http://patchwork.ozlabs.org/patch/178044/

    This commit removes this usage from TIPC. Instead, we introduce a new
    field, 'tipc_ptr', to the net_device structure, to serve this purpose.
    When a TIPC bearer is enabled, the bearer object is associated with
    'tipc_ptr'. When a TIPC packet arrives in the recv_msg() upcall
    from a networking device, the bearer object can now be obtained from
    'tipc_ptr'. When a bearer is disabled, the bearer object is detached
    from its underlying network device by setting 'tipc_ptr' to NULL.

    Additionally, an RCU lock is used to protect the new pointer.
    Henceforth, the existing tipc_net_lock is used in write mode to
    serialize write accesses to this pointer, while the new RCU lock is
    applied on the read side to ensure that the pointer is 100% valid
    within its wrapped area for all readers.

    Signed-off-by: Ying Xue
    Cc: Patrick McHardy
    Reviewed-by: Paul Gortmaker
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • struct 'tipc_media' represents the specific info that the media
    layer adaptors (eth_media and ib_media) expose to the generic
    bearer layer. We clarify this by improved commenting, and by giving
    the 'media_list' array the more appropriate name 'media_info_array'.

    There are no functional changes in this commit.

    Signed-off-by: Ying Xue
    Reviewed-by: Paul Gortmaker
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • Communication media types are abstracted through the struct 'tipc_media',
    one per media type. These structs are allocated statically inside their
    respective media file.

    Furthermore, in order to be able to reach all instances from a central
    location, we keep a static array with pointers to these structs. This
    array is currently initialized at runtime, under protection of
    tipc_net_lock. However, since the contents of the array itself never
    changes after initialization, we can just as well initialize it at
    compile time and make it 'const', at the same time making it obvious
    that no lock protection is needed here.

    This commit makes the array constant and removes the redundant lock
    protection.

    Signed-off-by: Ying Xue
    Reviewed-by: Paul Gortmaker
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
    sk_buff lists are currently released by looping over the list and
    explicitly releasing each buffer.

    We replace all occurrences of this loop with a call to kfree_skb_list().

    Signed-off-by: Ying Xue
    Reviewed-by: Paul Gortmaker
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue