04 Aug, 2013

1 commit


28 Jul, 2013

1 commit


15 Jul, 2013

2 commits

  • commit 681f130f39e10 ("netfilter: xt_socket: add XT_SOCKET_NOWILDCARD
    flag") added a potential NULL dereference if an old iptables package
    uses v0 of the match.

    Fix this by removing the test on @info in the fast path.

    IPv6 can drop the test as well, since it only uses v1 or v2.

    Reported-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Cc: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
     
  • nf_ct_expect_alloc leaves the expectation's NAT fields uninitialized.
    However, ctnetlink_exp_dump_expect expects them to be zeroed when they
    are not used, which is not guaranteed. As a result, the NAT tuple of the
    expectation may be dumped when it should not be.

    Fix it by zeroing the NAT fields of the expectation.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

04 Jul, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/freescale/fec_main.c
    drivers/net/ethernet/renesas/sh_eth.c
    net/ipv4/gre.c

    The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
    and the splitting of the gre.c code into separate files.

    The FEC conflict was two sets of changes adding ethtool support code
    in an "!CONFIG_M5272" CPP protected block.

    Finally, the sh_eth.c conflict was between one commit adding bits
    to the .eesr_err_check mask whilst another commit removed the
    .tx_error_check member and assignments.

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Jul, 2013

1 commit

  • The common case is that TCP/IP checksums have already been
    verified, e.g. by hardware (rx checksum offload), or conntrack.

    Userspace can use the NFQA_SKB_CSUM_NOTVERIFIED flag to determine
    when the checksum has not been validated yet.

    A set flag doesn't necessarily mean that the packet has an invalid
    checksum; it may simply be set because the NIC doesn't support rx
    checksum offload.

    Userspace that successfully enabled the NFQA_CFG_F_GSO queue feature
    flag can infer that the IP/TCP checksum has already been validated if
    either the SKB_INFO attribute is not present or the
    NFQA_SKB_CSUM_NOTVERIFIED flag is unset.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
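The inference in the last paragraph of this commit can be written as a small userspace helper. This is only a sketch: it assumes the caller has already parsed NFQA_SKB_INFO (if present) into a host-order flags word, and `ASSUMED_CSUM_NOTVERIFIED` is a hypothetical local stand-in for NFQA_SKB_CSUM_NOTVERIFIED whose real value should be taken from the uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for NFQA_SKB_CSUM_NOTVERIFIED; take the real
 * value from <linux/netfilter/nfnetlink_queue.h>. */
#define ASSUMED_CSUM_NOTVERIFIED (1u << 2)

/* Returns true when userspace may treat the IP/TCP checksum as already
 * validated by the kernel. The inference is only valid when the queue
 * was successfully configured with the NFQA_CFG_F_GSO feature flag. */
static bool csum_already_validated(bool gso_flag_enabled,
                                   bool has_skb_info,
                                   uint32_t skb_info_flags)
{
	if (!gso_flag_enabled)
		return false;   /* no basis for the inference: verify in userspace */
	if (!has_skb_info)
		return true;    /* SKB_INFO absent: checksum already validated */
	return (skb_info_flags & ASSUMED_CSUM_NOTVERIFIED) == 0;
}
```

A `false` result means "verify the checksum yourself", not "the checksum is bad", matching the caveat in the commit message.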
     

26 Jun, 2013

6 commits

  • Add a sync_persist_mode flag to reduce sync traffic
    by syncing only persistent templates.

    Signed-off-by: Julian Anastasov
    Tested-by: Aleksey Chudov
    Signed-off-by: Simon Horman

    Julian Anastasov
     
  • By default the SH scheduler rejects connections that are hashed onto a
    realserver of weight 0. This patch adds a flag to make SH choose a
    different realserver in this case, instead of rejecting the connection.

    The patch also adds a flag to make SH include the source port (TCP, UDP,
    SCTP) in the hash as well as the source address. This basically allows
    for deterministic round-robin load balancing (i.e., where any director
    in a cluster of directors with identical config will send the same
    packet the same way).

    The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options
    can be set per service. They are set using a new option to ipvsadm.

    Signed-off-by: Alexander Frolkin
    Acked-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Alexander Frolkin
     
  • Drop SCTP connections under load (dropentry context) depending
    on the protocol state, just like for TCP: INIT conns are
    dropped immediately, established are dropped randomly while
    connections in progress or shutdown are skipped.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
     
  • Convert the SCTP state table so that it is more readable.
    Change the states to follow the diagram in RFC 2960
    and add more states suitable for a middlebox. Note that this
    change in states introduces an incompatibility in a sync
    setup where some systems include this change and others do not.

    With this change we also have proper transitions in INPUT-ONLY
    mode (DR/TUN), where we see packets only from the client. We
    should no longer switch to the 10-second CLOSED state at a time
    when we should stay in the ESTABLISHED state.

    The state names are short because ipvsadm has a 16-char field
    and the connection list format has an 11-char limit. This is a
    consequence of the TCP implementation, where the longest
    state name is ESTABLISHED.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
     
  • This adds support for sloppy TCP and SCTP modes to IPVS.

    When enabled (via the sysctls net.ipv4.vs.sloppy_tcp and
    net.ipv4.vs.sloppy_sctp), IPVS may create connection state on any
    packet, not just a TCP SYN (or SCTP INIT).

    This allows connections to fail over from one IPVS director to another
    mid-flight.

    Signed-off-by: Alexander Frolkin
    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Alexander Frolkin
     
  • Until now the schedulers only needed access to IP
    addresses, which were easy to get from the skb by
    using ip_vs_fill_iph_addr_only.

    Upcoming changes to the SH scheduler need the protocol
    and ports, which are difficult to get from the skb in the
    IPv6 case. As all the data is already in the iph structure,
    provide the iph to the schedulers to avoid repeating the same
    slow lookups.

    Signed-off-by: Julian Anastasov
    Acked-by: Hans Schillstrom
    Signed-off-by: Simon Horman

    Julian Anastasov
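The two SH scheduler options described in the commit above (hashing on the source port as well as the address, and falling back to another realserver instead of rejecting when the hashed one has weight 0) can be illustrated with a user-space sketch. The table size, the multiplicative hash and the linear probing used for the fallback are all assumptions chosen for the example; they are not taken from the IPVS source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAB_BITS 3                 /* hypothetical: 8 buckets for the example */
#define TAB_SIZE (1u << TAB_BITS)

struct server {
	int id;
	int weight;                /* 0: should not receive new connections */
};

/* Hypothetical multiplicative hash over the source address and,
 * optionally, the source port; the top TAB_BITS bits pick a bucket. */
static unsigned int sh_hash(uint32_t saddr, uint16_t sport, int use_port)
{
	uint32_t key = saddr;

	if (use_port)
		key ^= (uint32_t)sport << 16;
	return (uint32_t)(key * 2654435761u) >> (32 - TAB_BITS);
}

/* Without fallback, a weight-0 server means reject (NULL). With
 * fallback, probe the following buckets for a usable server. */
static const struct server *sh_schedule(const struct server *tab[],
                                        uint32_t saddr, uint16_t sport,
                                        int use_port, int fallback)
{
	unsigned int h = sh_hash(saddr, sport, use_port);

	if (!fallback)
		return tab[h]->weight > 0 ? tab[h] : NULL;
	for (unsigned int i = 0; i < TAB_SIZE; i++) {
		const struct server *s = tab[(h + i) % TAB_SIZE];

		if (s->weight > 0)
			return s;
	}
	return NULL;               /* every server has weight 0 */
}
```

Because the result depends only on the packet's source fields and the table contents, every director with an identical table picks the same server, which is the property the commit relies on for deterministic balancing across a cluster.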
     

24 Jun, 2013

2 commits

  • commit 0ceabd83875b72a29f33db4ab703d6ba40ea4c58
    (netfilter: ctnetlink: deliver labels to userspace) sets the event bit
    when we raced with another packet, instead of raising the event bit
    when the label bit is set for the first time.

    commit 9b21f6a90924dfe8e5e686c314ddb441fb06501e
    (netfilter: ctnetlink: allow userspace to modify labels) forgot to update
    the event mask in the "conntrack already exists" case.

    Both issues result in CTA_LABELS attribute not getting included in the
    conntrack event.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • In (b20ab9c netfilter: nf_ct_helper: better logging for dropped packets)
    some braces were missing around the logging statement, so the helper
    always returned drop.

    Closes https://bugzilla.kernel.org/show_bug.cgi?id=60061

    Signed-off-by: Balazs Peter Odor
    Signed-off-by: Pablo Neira Ayuso

    Balazs Peter Odor
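The first fix in this group is about raising the label event only for the packet that actually sets a label bit for the first time. A user-space analogue uses C11 `atomic_fetch_or`, which returns the previous value, so exactly one caller observes the 0 to 1 transition. The function name is hypothetical and this is not the kernel's ctnetlink code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set label bit `bit` (0..31) in *labels. Returns true only for the
 * caller that changed the bit from 0 to 1, i.e. when an event should be
 * raised. A caller that lost the race to another packet gets false. */
static bool label_set_first_time(atomic_uint *labels, unsigned int bit)
{
	assert(bit < 32);
	unsigned int mask = 1u << bit;
	unsigned int old = atomic_fetch_or(labels, mask);

	return (old & mask) == 0;
}
```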
     

21 Jun, 2013

1 commit

  • The xt_socket module can be a nice replacement for the conntrack module
    in some cases (SYN filtering, for example).

    However, it lacks the ability to match the third packet of the TCP
    handshake (the ACK coming from the client).

    Add an XT_SOCKET_NOWILDCARD flag to disable the wildcard mechanism.

    The wildcard is the legacy socket match behavior, which ignores
    LISTEN sockets bound to INADDR_ANY (or the IPv6 equivalent).

    iptables -I INPUT -p tcp --syn -j SYN_CHAIN
    iptables -I INPUT -m socket --nowildcard -j ACCEPT

    Signed-off-by: Eric Dumazet
    Cc: Patrick McHardy
    Cc: Jesper Dangaard Brouer
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
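The match decision described in the xt_socket commit can be sketched as below. `struct sock_info` is a hypothetical summary of the lookup result, and only the IPv4 any-address case is modelled. Presumably the handshake's final ACK finds only the listening socket, which is why the legacy wildcard rule fails to match it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what the socket lookup found. */
struct sock_info {
	bool listening;        /* socket is in LISTEN state */
	uint32_t bound_addr;   /* IPv4 local address; 0 is INADDR_ANY */
};

/* Returns true if the packet matches. Without nowildcard (the legacy
 * behaviour), a LISTEN socket bound to INADDR_ANY is ignored. */
static bool socket_match(const struct sock_info *sk, bool nowildcard)
{
	if (sk == NULL)
		return false;
	if (!nowildcard && sk->listening && sk->bound_addr == 0)
		return false;
	return true;
}
```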
     

20 Jun, 2013

3 commits

  • When loose tracking is enabled (the default), non-SYN packets cause
    the creation of new conntracks in the established state with the
    default timeout for that state (5 days). This causes the table to fill
    up with UNREPLIED entries when the 'new' ACK packet happens to be the
    last ACK of a previous, already timed-out connection.

    Consider:

    A 192.168.x.52792 > 10.184.y.80: F, 426:426(0) ack 9237 win 255
    B 10.184.y.80 > 192.168.x.52792: ., ack 427 win 123

    C 10.184.y.80 > 192.168.x.52792: F, 9237:9237(0) ack 427 win 123
    D 192.168.x.52792 > 10.184.y.80: ., ack 9238 win 255

    B moves the conntrack to CLOSE_WAIT; it is killed after a 60-second
    timeout. C is ignored (FIN set), but the last packet (D) creates a new
    conntrack with a 5-day timeout.

    Use the UNACK timeout (5 minutes) instead, to get rid of these entries
    sooner when in the ESTABLISHED state without having seen traffic in
    both directions.

    Signed-off-by: Florian Westphal
    Acked-by: Jozsef Kadlecsik
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • These are the only calls under net/ that do not check the error code
    of nla_parse_nested() but simply continue execution. If parsing of
    netlink attributes fails, we should return with an error instead of
    continuing. In nearly all of these calls a policy is attached and
    type-verified during nla_parse_nested(); without checking the return
    value, a verification failure would go unnoticed.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Pablo Neira Ayuso

    Daniel Borkmann
     
  • Conflicts:
    drivers/net/wireless/ath/ath9k/Kconfig
    drivers/net/xen-netback/netback.c
    net/batman-adv/bat_iv_ogm.c
    net/wireless/nl80211.c

    The ath9k Kconfig conflict was a change of a Kconfig option name right
    next to the deletion of another option.

    The xen-netback conflict was overlapping changes involving the
    handling of the notify list in xen_netbk_rx_action().

    The Batman conflict resolution was provided by Antonio Quartulli;
    basically, keep everything in both conflict hunks.

    The nl80211 conflict is a little more involved. In 'net' we added a
    dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
    Linus reported. Meanwhile in 'net-next' the handlers were converted
    to use pre and post doit handlers which use a flag to determine
    whether to hold the RTNL mutex around the operation.

    However, the dump handlers do not use this logic. Instead they have
    to explicitly do the locking. There were apparent bugs in the
    conversion of nl80211_dump_wiphy() in that we were not dropping the
    RTNL mutex in all the return paths, and it seems we very much should
    be doing so. So I fixed that whilst handling the overlapping changes.

    To simplify the initial returns, I take the RTNL mutex after we try
    to allocate 'tb'.

    Signed-off-by: David S. Miller

    David S. Miller
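For the loose-tracking fix in the first commit of this group, the timeout choice reduces to a small rule. The helper below is hypothetical; only the two durations come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Timeouts quoted in the commit message, in seconds. */
#define TIMEOUT_ESTABLISHED (5 * 24 * 60 * 60)  /* 5 days */
#define TIMEOUT_UNACK       (5 * 60)            /* 5 minutes */

/* Hypothetical helper: an ESTABLISHED conntrack that has not yet seen
 * traffic in both directions (e.g. one picked up mid-stream by loose
 * tracking) gets the short UNACK timeout instead of 5 days. */
static unsigned int established_timeout(bool seen_reply)
{
	return seen_reply ? TIMEOUT_ESTABLISHED : TIMEOUT_UNACK;
}
```

For packet D in the trace, no reply has been seen, so the new entry would expire after 300 seconds rather than 432000.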
     

19 Jun, 2013

1 commit


13 Jun, 2013

1 commit

  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

12 Jun, 2013

2 commits

  • Similar to commit bc6bcb59 ("netfilter: xt_TCPOPTSTRIP: fix
    possible mangling beyond packet boundary"), add safe fragment
    handling to xt_TCPMSS.

    Signed-off-by: Phil Oester
    Signed-off-by: Pablo Neira Ayuso

    Phil Oester
     
  • As a followup to commit 409b545a ("netfilter: xt_TCPMSS: Fix violation
    of RFC879 in absence of MSS option"), John Heffner points out that IPv6
    has a higher minimum MTU than IPv4, and thus a higher minimum MSS.
    Update the TCPMSS target to account for this, and update the RFC comment.

    While at it, point to the more recent RFC1122 as the reference instead
    of RFC879.

    Signed-off-by: Phil Oester
    Signed-off-by: Pablo Neira Ayuso

    Phil Oester
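The minimum MSS values behind the TCPMSS change follow from each protocol's minimum MTU minus the fixed IP and TCP headers: 576 - 20 - 20 = 536 for IPv4 and 1280 - 40 - 20 = 1220 for IPv6. The sketch below encodes that arithmetic; the helper names are hypothetical and it does not claim to reproduce xt_TCPMSS.

```c
#include <assert.h>

#define IPV4_MIN_MTU 576   /* every IPv4 host must be able to receive this */
#define IPV6_MIN_MTU 1280  /* minimum link MTU required by IPv6 */
#define IPV4_HDR_LEN 20    /* no options */
#define IPV6_HDR_LEN 40    /* fixed header, no extension headers */
#define TCP_HDR_LEN  20    /* no options */

enum family { FAMILY_IPV4, FAMILY_IPV6 };

/* Default MSS to assume when the peer sent no MSS option. */
static unsigned int default_mss(enum family f)
{
	if (f == FAMILY_IPV4)
		return IPV4_MIN_MTU - IPV4_HDR_LEN - TCP_HDR_LEN;  /* 536 */
	return IPV6_MIN_MTU - IPV6_HDR_LEN - TCP_HDR_LEN;      /* 1220 */
}

/* Clamp a target MSS to the default when no MSS option was present. */
static unsigned int clamp_without_mss_option(unsigned int newmss,
                                             enum family f)
{
	unsigned int def = default_mss(f);

	return newmss < def ? newmss : def;
}
```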
     

11 Jun, 2013

2 commits

  • struct gnet_stats_rate_est contains u32 fields, so the bytes per second
    field can wrap at 34360Mbit.

    Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
    and switch the kernel to use this structure natively.

    This structure is dumped to user space as a new attribute:

    TCA_STATS_RATE_EST64

    An old tc binary will now display the capped bps (34360Mbit) instead
    of wrapped values, while an updated tc binary will display correct
    information.

    Output of an old tc binary after this patch:

    eric:~# tc -s -d qd sh dev lo
    qdisc pfifo 8001: root refcnt 2 limit 1000p
    Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
    rate 34360Mbit 189696pps backlog 0b 0p requeues 0

    This patch carefully reorganizes "struct Qdisc" layout to get optimal
    performance on SMP.

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In (bc6bcb5 netfilter: xt_TCPOPTSTRIP: fix possible mangling beyond
    packet boundary), the use of tcp_hdr was introduced. However, we
    cannot assume that skb->transport_header is set for non-local packets.

    Cc: Florian Westphal
    Reported-by: Phil Oester
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
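The wrap point quoted in the rate estimator commit follows from the 32-bit field: at most 2^32 - 1 bytes per second, times 8 bits, is about 34359.7 Mbit/s, displayed as 34360Mbit. The sketch below shows the saturating conversion a legacy 32-bit attribute needs instead of a wrapping cast; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy 32-bit view: saturate instead of letting the value wrap. */
static uint32_t cap_to_u32(uint64_t v)
{
	return v > UINT32_MAX ? UINT32_MAX : (uint32_t)v;
}

/* Convert a bytes-per-second rate to whole megabits per second. */
static uint64_t bytes_per_sec_to_mbit(uint64_t bps)
{
	return bps * 8 / 1000000;
}
```

A plain `(uint32_t)` cast of 5000000000 bytes/s would report 705032704, which is the kind of wrapped value the commit avoids showing.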
     

10 Jun, 2013

1 commit

  • The entry struct has a 2-byte hole after ->port and another 4-byte
    hole after ->stats.outpkts. You must have CAP_NET_ADMIN in your
    namespace to hit this information leak.

    Signed-off-by: Dan Carpenter
    Acked-by: Julian Anastasov
    Signed-off-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Dan Carpenter
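The usual cure for this kind of padding leak is to clear the whole structure before filling its fields and copying it out. The struct below is hypothetical and only mirrors the shape described (a 16-bit port followed by a 32-bit field, and a 32-bit counter followed by a 64-bit field); the exact hole sizes depend on the ABI, and the commit message does not say whether the fix uses exactly this pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry layout; on common 64-bit ABIs the compiler inserts
 * a 2-byte hole after `port` and a 4-byte hole after `outpkts`. */
struct entry {
	uint32_t addr;
	uint16_t port;      /* 2-byte hole follows on typical ABIs */
	uint32_t weight;
	uint32_t outpkts;   /* 4-byte hole follows where uint64_t is 8-aligned */
	uint64_t inbytes;
};

/* Fill an entry destined for userspace. Clearing the whole struct first
 * makes the padding bytes zero instead of leftover stack contents. */
static void fill_entry(struct entry *e, uint32_t addr, uint16_t port,
                       uint32_t weight, uint32_t outpkts, uint64_t inbytes)
{
	memset(e, 0, sizeof(*e));
	e->addr = addr;
	e->port = port;
	e->weight = weight;
	e->outpkts = outpkts;
	e->inbytes = inbytes;
}
```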
     

08 Jun, 2013

1 commit


06 Jun, 2013

2 commits

  • Conflicts:
    net/netfilter/nf_log.c

    The conflict in nf_log.c is that in 'net' we added CONFIG_PROC_FS
    protection around foo_proc_entry() calls to fix a build failure,
    whereas in Pablo's tree a guard if() test around a call to
    remove_proc_entry() was removed. Trivially resolved.

    Pablo Neira Ayuso says:

    ====================
    The following patchset contains the first batch of
    Netfilter/IPVS updates for your net-next tree, they are:

    * Three patches with improvements and code refactoring
    for nfnetlink_queue, from Florian Westphal.

    * FTP helper now parses replies without brackets, as RFC1123
    recommends, from Jeff Mahoney.

    * Raise a warning to tell everyone about the ULOG deprecation:
    NFLOG has already been in the kernel tree for a long time
    and supersedes the old logging over netlink stub, from
    myself.

    * Don't panic if we fail to load netfilter core framework,
    just bail out instead, from myself.

    * Add cond_resched_rcu, used by IPVS to allow rescheduling
    while walking over big hashtables, from Simon Horman.

    * Change type of IPVS sysctl_sync_qlen_max sysctl to avoid
    possible overflow, from Zhang Yanfei.

    * Use strlcpy instead of strncpy to avoid zeroing an already
    initialized area when writing the extension names in ebtables,
    from Chen Gang.

    * Use already existing per-cpu notrack object from xt_CT,
    from Eric Dumazet.

    * Avoid an explicit socket lookup in xt_socket now that we have
    early demux, also from Eric Dumazet.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Merge 'net' bug fixes into 'net-next' as we have patches
    that will build on top of them.

    This merge commit includes a change from Emil Goode
    (emilgoode@gmail.com) that fixes a warning that would
    have been introduced by this merge. Specifically it
    fixes the pingv6_ops method ipv6_chk_addr() to add a
    "const" to the "struct net_device *dev" argument and
    likewise update the dummy_ipv6_chk_addr() declaration.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Jun, 2013

5 commits


01 Jun, 2013

1 commit

  • This corrects a regression introduced by "net: Use 16bits for *_headers
    fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In
    that case skb->tail is a pointer whereas skb->network_header is an
    offset from head. The fix uses wrappers that ensure that calculations
    are always made using pointers.

    Reported-by: Stephen Rothwell
    Reported-by: Chen Gang
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
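The pointer-versus-offset mix-up can be illustrated with a miniature buffer. `struct mini_buf` and its helpers are hypothetical stand-ins, not `struct sk_buff`; the point is that the wrapper converts the stored offset into a pointer before any arithmetic with `tail`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature buffer: `tail` is a pointer (as in the
 * !NET_SKBUFF_DATA_USES_OFFSET case) while `network_header` is a
 * 16-bit offset from `head`. */
struct mini_buf {
	unsigned char *head;
	unsigned char *tail;
	uint16_t network_header;
};

/* Wrapper: always turn the stored offset into a pointer. */
static unsigned char *mini_network_header(const struct mini_buf *b)
{
	return b->head + b->network_header;
}

/* Bytes from the network header to the end of the data. Writing
 * `tail - network_header` directly would yield a pointer, not a length;
 * here both operands are pointers. */
static ptrdiff_t mini_network_len(const struct mini_buf *b)
{
	return b->tail - mini_network_header(b);
}
```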
     

29 May, 2013

3 commits

  • kfree_rcu() requires offsetof(..., rcu_head) < 4096, which can be
    violated with a sufficiently high CONFIG_IP_VS_SH_TAB_BITS.

    Signed-off-by: Jan Beulich
    Signed-off-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Jan Beulich
     
  • In dump_ipv6_packet(), the "recurse" parameter is zero only when
    dumping the contents of a packet embedded in an ICMPv6 error
    message. Therefore we want to log the packet mark if recurse is
    non-zero, not when it is zero.

    Signed-off-by: Michal Kubecek
    Signed-off-by: Pablo Neira Ayuso

    Michal Kubeček
     
  • Until now, only a net_device * could be passed along with a netdevice
    notifier event. This patch makes it possible to pass a custom structure
    carrying the information the event listener needs.

    Signed-off-by: Jiri Pirko

    v2->v3: fix typo on simeth
    shortened dev_getter
    shortened notifier_info struct name
    v1->v2: fix notifier_call parameter in call_netdevice_notifier()
    Signed-off-by: David S. Miller

    Jiri Pirko
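For the kfree_rcu() constraint in the first commit of this group, one layout that satisfies it regardless of the table size is to place the RCU head before the large bucket array. Everything below is a hypothetical stand-in (including `fake_rcu_head` and `TAB_BITS`), and the commit message does not say that this is how the fix was done.

```c
#include <assert.h>
#include <stddef.h>

#define TAB_BITS 10              /* hypothetical stand-in for CONFIG_IP_VS_SH_TAB_BITS */
#define TAB_SIZE (1 << TAB_BITS)

struct fake_rcu_head {           /* stand-in for struct rcu_head */
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *);
};

/* Problematic layout: the head's offset grows with the table size. */
struct state_bad {
	void *buckets[TAB_SIZE];
	struct fake_rcu_head rcu;
};

/* Head first: its offset is 0 whatever TAB_BITS is. */
struct state_good {
	struct fake_rcu_head rcu;
	void *buckets[TAB_SIZE];
};

_Static_assert(offsetof(struct state_good, rcu) < 4096,
               "kfree_rcu()-style offset limit");
```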
     

27 May, 2013

3 commits

  • The FTP conntrack code currently only accepts the following format for
    the 227 response for PASV:
    227 Entering Passive Mode (148,100,81,40,31,161).

    It doesn't accept the following format from an obscure server:
    227 Data transfer will passively listen to 67,218,99,134,50,144

    From RFC 1123:
    The format of the 227 reply to a PASV command is not
    well standardized. In particular, an FTP client cannot
    assume that the parentheses shown on page 40 of RFC-959
    will be present (and in fact, Figure 3 on page 43 omits
    them). Therefore, a User-FTP program that interprets
    the PASV reply must scan the reply for the first digit
    of the host and port numbers.

    This patch adds support for the RFC 1123 clarification by:
    - Allowing a search filter to specify NUL as the terminator so that
    try_number will return successfully if the array of numbers has been
    filled when an unexpected character is encountered.
    - Using space as the separator for the 227 reply and then scanning for
    the first digit of the number sequence. The number sequence is parsed
    out using the existing try_rfc959 but with a NUL terminator.

    References: https://bugzilla.novell.com/show_bug.cgi?id=466279
    References: http://bugzilla.netfilter.org/show_bug.cgi?id=574
    Reported-by: Mark Post
    Signed-off-by: Jeff Mahoney
    Signed-off-by: Jiri Slaby
    Cc: Pablo Neira Ayuso
    Cc: Patrick McHardy
    Cc: "David S. Miller"
    Cc: netfilter-devel@vger.kernel.org
    Cc: netfilter@vger.kernel.org
    Cc: coreteam@netfilter.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Pablo Neira Ayuso

    Jeff Mahoney
     
  • Expire the cached connection for a new TCP/SCTP connection if the real
    server is down. Otherwise, IPVS uses the dead server for the
    reused connection instead of a new, working one.

    Signed-off-by: Grzegorz Lyczba
    Acked-by: Hans Schillstrom
    Acked-by: Julian Anastasov
    Signed-off-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Grzegorz Lyczba
     
  • The portid is set to NETLINK_CB(skb).portid at create time.
    The run-time check will always be false.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
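The RFC 1123 rule quoted in the FTP commit (scan for the first digit, then read the six numbers) can be sketched in user space as follows. `parse_pasv_227` is a hypothetical helper, not the conntrack helper's try_rfc959, and it only accepts comma-separated decimal values of 0 to 255.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Parse a "227" PASV reply the RFC 1123 way: scan for the first digit,
 * then read h1,h2,h3,h4,p1,p2. Any character may follow the sixth
 * number (")", ".", end of string, ...). Returns 0 on success and -1 on
 * error; the address is returned in host byte order. */
static int parse_pasv_227(const char *reply, uint32_t *addr, uint16_t *port)
{
	unsigned int v[6];
	const char *p;

	if (strncmp(reply, "227 ", 4) != 0)
		return -1;
	for (p = reply + 4; *p && !isdigit((unsigned char)*p); p++)
		;                       /* skip free text and "(" */
	for (int i = 0; i < 6; i++) {
		if (!isdigit((unsigned char)*p))
			return -1;
		v[i] = 0;
		while (isdigit((unsigned char)*p)) {
			v[i] = v[i] * 10 + (unsigned int)(*p - '0');
			if (v[i] > 255)
				return -1;
			p++;
		}
		if (i < 5) {
			if (*p != ',')
				return -1;
			p++;            /* skip the separator */
		}
	}
	*addr = (uint32_t)v[0] << 24 | v[1] << 16 | v[2] << 8 | v[3];
	*port = (uint16_t)(v[4] << 8 | v[5]);
	return 0;
}
```

Both replies quoted in the commit parse: the first yields port 31 * 256 + 161 = 8097, the second 50 * 256 + 144 = 12944.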