27 May, 2011

1 commit


25 May, 2011

1 commit

  • In igmp_group_dropped() we call ip_mc_clear_src(), which resets the number
    of source filters per multicast. However, igmp_group_dropped() is also
    called on NETDEV_DOWN, NETDEV_PRE_TYPE_CHANGE and NETDEV_UNREGISTER, which
    means that the group might get added back on NETDEV_UP, NETDEV_REGISTER and
    NETDEV_POST_TYPE_CHANGE respectively, leaving us with broken source
    filters.

    To fix that, we must clear the source filters only when there are no users
    in the ip_mc_list, i.e. in ip_mc_dec_group() and on device destroy.

    Acked-by: David L Stevens
    Signed-off-by: Veaceslav Falico
    Signed-off-by: David S. Miller

    Veaceslav Falico
     

24 May, 2011

3 commits

  • All static seqlocks should be initialized with the lockdep friendly
    __SEQLOCK_UNLOCKED() macro.

    Remove legacy SEQLOCK_UNLOCKED() macro.

    Signed-off-by: Eric Dumazet
    Cc: David Miller
    Link: http://lkml.kernel.org/r/%3C1306238888.3026.31.camel%40edumazet-laptop%3E
    Signed-off-by: Thomas Gleixner

    Eric Dumazet
     
  • The %pK format specifier is designed to hide exposed kernel pointers,
    specifically via /proc interfaces. Exposing these pointers provides an
    easy target for kernel write vulnerabilities, since they reveal the
    locations of writable structures containing easily triggerable function
    pointers. The behavior of %pK depends on the kptr_restrict sysctl.

    If kptr_restrict is set to 0, no deviation from the standard %p behavior
    occurs. If kptr_restrict is set to 1 (the default) and the current user
    (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
    (currently in the LSM tree), kernel pointers using %pK are printed as 0's.
    If kptr_restrict is set to 2, kernel pointers using %pK are printed as
    0's regardless of privileges. Replacing with 0's was chosen over the
    default "(null)", which cannot be parsed by userland %p, which expects
    "(nil)".

    The supporting code for kptr_restrict and %pK is currently in the -mm
    tree. This patch converts users of %p in net/ to %pK. Cases of printing
    pointers to the syslog are not covered, since this would eliminate useful
    information for postmortem debugging and the reading of the syslog is
    already optionally protected by the dmesg_restrict sysctl.
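
    The three kptr_restrict levels described above can be modelled in
    userspace roughly as follows. This is only a sketch: format_pK(), its
    parameters and the KPTR_* names are hypothetical, and the real logic
    lives inside the kernel's vsnprintf() rather than in a helper like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names for the three sysctl levels in the commit message. */
enum { KPTR_SHOW = 0, KPTR_NEED_CAP = 1, KPTR_HIDE = 2 };

/*
 * Userspace model of %pK (an illustration, not the kernel code).
 * kptr_restrict:  modelled sysctl value (0, 1 or 2).
 * has_cap_syslog: whether the modelled reader holds CAP_SYSLOG.
 * Returns the number of characters written, as snprintf() does.
 */
int format_pK(char *buf, size_t len, const void *ptr,
              int kptr_restrict, bool has_cap_syslog)
{
    uintptr_t value = (uintptr_t)ptr;

    assert(buf != NULL && len > 0);

    /* Level 2 always hides; level 1 hides unless the reader is privileged. */
    if (kptr_restrict == KPTR_HIDE ||
        (kptr_restrict == KPTR_NEED_CAP && !has_cap_syslog))
        value = 0;

    /* Zero-padded hex, so a hidden pointer is a run of 0's, not "(null)". */
    return snprintf(buf, len, "%0*jx",
                    (int)(2 * sizeof(void *)), (uintmax_t)value);
}
```

    Printing zeros keeps the field width and stays parseable as a hex
    number, which is the property the message gives for preferring 0's
    over "(null)".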

    Signed-off-by: Dan Rosenberg
    Cc: James Morris
    Cc: Eric Dumazet
    Cc: Thomas Graf
    Cc: Eugene Teo
    Cc: Kees Cook
    Cc: Ingo Molnar
    Cc: David S. Miller
    Cc: Peter Zijlstra
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Dan Rosenberg
     
  • net/ipv4/ping.c: In function ‘ping_v4_unhash’:
    net/ipv4/ping.c:140:28: warning: variable ‘hslot’ set but not used

    Signed-off-by: Eric Dumazet
    CC: Vasiliy Kulikov
    Acked-by: Vasiliy Kulikov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 May, 2011

4 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
    bnx2x: allow device properly initialize after hotplug
    bnx2x: fix DMAE timeout according to hw specifications
    bnx2x: properly handle CFC DEL in cnic flow
    bnx2x: call dev_kfree_skb_any instead of dev_kfree_skb
    net: filter: move forward declarations to avoid compile warnings
    pktgen: refactor pg_init() code
    pktgen: use vzalloc_node() instead of vmalloc_node() + memset()
    net: skb_trim explicitely check the linearity instead of data_len
    ipv4: Give backtrace in ip_rt_bug().
    net: avoid synchronize_rcu() in dev_deactivate_many
    net: remove synchronize_net() from netdev_set_master()
    rtnetlink: ignore NETDEV_RELEASE and NETDEV_JOIN event
    net: rename NETDEV_BONDING_DESLAVE to NETDEV_RELEASE
    bridge: call NETDEV_JOIN notifiers when add a slave
    netpoll: disable netpoll when enslave a device
    macvlan: Forward unicast frames in bridge mode to lowerdev
    net: Remove linux/prefetch.h include from linux/skbuff.h
    ipv4: Include linux/prefetch.h in fib_trie.c
    netlabel: Remove prefetches from list handlers.
    drivers/net: add prefetch header for prefetch users
    ...

    Fixed up prefetch parts: removed a few duplicate prefetch.h includes,
    fixed the location of the igb prefetch.h, took my version of the
    skbuff.h code without the extra parentheses etc.

    Linus Torvalds
     
  • After discovering that wide use of prefetch on modern CPUs
    could be a net loss instead of a win, net drivers which were
    relying on the implicit inclusion of prefetch.h via the list
    headers showed up in the resulting cleanup fallout. Give
    them an explicit include via the following $0.02 script.

    =========================================
    #!/bin/bash
    MANUAL=""
    for i in `git grep -l 'prefetch(.*)' .` ; do
    grep -q '' $i
    if [ $? = 0 ] ; then
    continue
    fi

    ( echo '?^#include '
    echo .
    echo w
    echo q
    ) | ed -s $i > /dev/null 2>&1
    if [ $? != 0 ]; then
    echo $i needs manual fixup
    MANUAL="$i $MANUAL"
    fi
    done
    echo ------------------- 8\
    [ Fixed up some incorrect #include placements, and added some
    non-network drivers and the fib_trie.c case - Linus ]
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     
  • Add a stack backtrace to the ip_rt_bug path for debugging

    Signed-off-by: Dave Jones
    Signed-off-by: David S. Miller

    Dave Jones
     
  • Signed-off-by: David S. Miller

    David S. Miller
     

21 May, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
    macvlan: fix panic if lowerdev in a bond
    tg3: Add braces around 5906 workaround.
    tg3: Fix NETIF_F_LOOPBACK error
    macvlan: remove one synchronize_rcu() call
    networking: NET_CLS_ROUTE4 depends on INET
    irda: Fix error propagation in ircomm_lmp_connect_response()
    irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
    irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
    be2net: Kill set but unused variable 'req' in lancer_fw_download()
    irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
    atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
    rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
    pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
    isdn: capi: Use pr_debug() instead of ifdefs.
    tg3: Update version to 3.119
    tg3: Apply rx_discards fix to 5719/5720
    ...

    Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
    as per Davem.

    Linus Torvalds
     

20 May, 2011

4 commits

  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits)
    Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
    net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree
    batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu
    batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree()
    batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu
    net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu()
    net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu()
    net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu()
    perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu()
    perf,rcu: convert call_rcu(free_ctx) to kfree_rcu()
    net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu()
    net,rcu: convert call_rcu(net_generic_release) to kfree_rcu()
    net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu()
    net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu()
    security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu()
    net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu()
    net,rcu: convert call_rcu(xps_map_release) to kfree_rcu()
    net,rcu: convert call_rcu(rps_map_release) to kfree_rcu()
    ...

    Linus Torvalds
     
  • v3 -> v4: fix return boolean false instead of 0 for ic_is_init_dev

    Currently the ip auto configuration has a hardcoded delay of 1 second.
    When (ethernet) link takes longer to come up (e.g. more than 3 seconds),
    nfs root may not be found.

    Remove the hardcoded delay, and wait for carrier on at least one network
    device.

    Signed-off-by: Micha Nelissen
    Cc: David Miller
    Signed-off-by: David S. Miller

    Micha Nelissen
     
  • A line should be no longer than 80 characters.

    Signed-off-by: Changli Gao
    Signed-off-by: David S. Miller

    Changli Gao
     
  • As these functions are only used in this file.

    Signed-off-by: Changli Gao
    Signed-off-by: David S. Miller

    Changli Gao
     

19 May, 2011

4 commits


18 May, 2011

2 commits


17 May, 2011

2 commits

  • Commit 6623e3b24a5e (ipv4: IP defragmentation must be ECN aware) was an
    attempt to not lose "Congestion Experienced" (CE) indications when
    performing datagram defragmentation.

    Stefanos Harhalakis raised the point that RFC 3168 requirements were not
    completely met by this commit.

    In particular, we MUST detect invalid combinations and eventually drop
    illegal frames.
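
    As an illustration of the rule, the sketch below combines the ECN
    codepoints of a datagram's fragments. It is not the kernel
    implementation: ecn_reassemble() and ECN_DROP are hypothetical names,
    and only the two cases RFC 3168 is explicit about are modelled (CE
    must survive reassembly, and CE mixed with Not-ECT is invalid). For
    any other mix the sketch simply returns the first fragment's
    codepoint, which is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ECN codepoints from RFC 3168: the low two bits of the TOS byte. */
enum { ECN_NOT_ECT = 0, ECN_ECT_1 = 1, ECN_ECT_0 = 2, ECN_CE = 3 };

#define ECN_MASK 0x03
#define ECN_DROP 0xff /* hypothetical marker: drop the reassembled datagram */

/*
 * Combine the ECN fields of n fragments, given their TOS bytes.
 * Returns ECN_DROP for a CE + Not-ECT mix, ECN_CE if any fragment
 * carried CE, otherwise the first fragment's codepoint.
 */
uint8_t ecn_reassemble(const uint8_t *tos, int n)
{
    unsigned seen = 0; /* one bit per codepoint observed */
    int i;

    assert(tos != NULL && n > 0);

    for (i = 0; i < n; i++)
        seen |= 1u << (tos[i] & ECN_MASK);

    if ((seen & (1u << ECN_CE)) && (seen & (1u << ECN_NOT_ECT)))
        return ECN_DROP; /* invalid combination */
    if (seen & (1u << ECN_CE))
        return ECN_CE;   /* never lose a congestion indication */
    return tos[0] & ECN_MASK;
}
```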

    Reported-by: Stefanos Harhalakis
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • At these points we have a fully filled in value via the IP
    header in the form of ip_hdr(skb)->saddr.

    Signed-off-by: David S. Miller

    David S. Miller
     

16 May, 2011

1 commit

  • udp_ioctl() really handles UDP and UDPLite protocols.

    1) It can increment UDP_MIB_INERRORS in case first_packet_length() finds
    a frame with bad checksum.

    2) It has a dependency on sizeof(struct udphdr), not applicable to
    ICMP/PING

    If ping sockets need to handle SIOCINQ/SIOCOUTQ ioctl, this should be
    done differently.

    Signed-off-by: Eric Dumazet
    CC: Vasiliy Kulikov
    Acked-by: Vasiliy Kulikov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 May, 2011

1 commit

  • ping_table is not __read_mostly, since it contains one rwlock,
    and is static to ping.c

    ping_port_rover & ping_v4_lookup are static

    Signed-off-by: Eric Dumazet
    Acked-by: Vasiliy Kulikov
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 May, 2011

5 commits

  • At this point iph->daddr equals what rt->rt_dst would hold.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pass in the sk_buff so that we can fetch the necessary keys from
    the packet header when working with input routes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This will allow ip_options_build() to reliably look at the values of
    iph->{daddr,saddr}

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This code block executes when opt->srr_is_hit is set. It will be
    set only by ip_options_rcv_srr().

    ip_options_rcv_srr() walks until it hits a matching nexthop in the SRR
    option addresses, and when it matches one 1) looks up the route for
    that nexthop and 2) on route lookup success it writes that nexthop
    value into iph->daddr.

    ip_forward_options() runs later, and again walks the SRR option
    addresses looking for the option matching the destination of the route
    stored in skb_rtable(). This route will be the same exact one looked
    up for the nexthop by ip_options_rcv_srr().

    Therefore "rt->rt_dst == iph->daddr" must be true.

    All it really needs to do is record the route's source address in the
    matching SRR option address. It need not write iph->daddr again,
    since that has already been done by ip_options_rcv_srr() as detailed
    above.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch adds an IPPROTO_ICMP socket kind. It makes it possible to send
    ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
    without any special privileges. In other words, the patch makes it
    possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In
    order not to increase the kernel's attack surface, the new functionality
    is disabled by default, but is enabled at bootup by supporting Linux
    distributions, optionally with restriction to a group or a group range
    (see below).

    Similar functionality is implemented in Mac OS X:
    http://www.manpagez.com/man/4/icmp/

    A new ping socket is created with

    socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP)

    Message identifiers (octets 4-5 of ICMP header) are interpreted as local
    ports. Addresses are stored in struct sockaddr_in. No port numbers are
    reserved for privileged processes, port 0 is reserved for API ("let the
    kernel pick a free number"). There is no notion of remote ports, remote
    port numbers provided by the user (e.g. in connect()) are ignored.

    Data sent and received include ICMP headers. This is deliberate to:
    1) Avoid the need to transport header values like sequence numbers by
    other means.
    2) Make it easier to port existing programs using raw sockets.

    ICMP headers given to send() are checked and sanitized. The type must be
    ICMP_ECHO and the code must be zero (future extensions might relax this,
    see below). The id is set to the number (local port) of the socket, the
    checksum is always recomputed.
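
    A minimal userspace sketch of the send path described above, assuming
    a Linux system where the caller's group is allowed to create ping
    sockets. build_echo() and send_echo() are hypothetical helper names;
    the protocol constant is IPPROTO_ICMP from <netinet/in.h>. The id and
    checksum fields are left at zero because, per the text above, the
    kernel overwrites the id with the socket's local port and recomputes
    the checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

#define ICMP_ECHO_TYPE 8 /* ICMP_ECHO */
#define ICMP_HDR_LEN   8

/*
 * Fill buf with an ICMP echo request (header plus payload).
 * Returns the total length, or 0 if buf is too small.
 */
size_t build_echo(uint8_t *buf, size_t buflen, uint16_t seq,
                  const void *payload, size_t plen)
{
    assert(buf != NULL);
    if (buflen < ICMP_HDR_LEN + plen)
        return 0;

    memset(buf, 0, ICMP_HDR_LEN);  /* code, checksum and id stay 0 */
    buf[0] = ICMP_ECHO_TYPE;
    buf[6] = (uint8_t)(seq >> 8);  /* sequence number, network order */
    buf[7] = (uint8_t)(seq & 0xff);
    if (plen)
        memcpy(buf + ICMP_HDR_LEN, payload, plen);
    return ICMP_HDR_LEN + plen;
}

/* Send one echo request to a dotted-quad address; 0 on success, -1 on error. */
int send_echo(const char *addr, uint16_t seq)
{
    uint8_t pkt[64];
    struct sockaddr_in dst;
    size_t len = build_echo(pkt, sizeof(pkt), seq, "ping", 4);
    int fd, rc = -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;      /* sin_port is ignored for ping sockets */
    if (inet_pton(AF_INET, addr, &dst.sin_addr) != 1)
        return -1;

    fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP);
    if (fd < 0)
        return -1;                 /* fails when the caller may not create ping sockets */

    if (sendto(fd, pkt, len, 0,
               (struct sockaddr *)&dst, sizeof(dst)) == (ssize_t)len)
        rc = 0;
    close(fd);
    return rc;
}
```

    A caller would follow send_echo() with recv() on the same socket to
    read the ICMP_ECHOREPLY, header included; that part is omitted here.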

    ICMP reply packets received from the network are demultiplexed according
    to their id's, and are returned by recv() without any modifications.
    IP header information and ICMP errors of those packets may be obtained
    via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
    quenches and redirects are reported as fake errors via the error queue
    (IP_RECVERR); the next hop address for redirects is saved to ee_info (in
    network order).

    socket(2) is restricted to the group range specified in
    "/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning
    that nobody (not even root) may create ping sockets. Setting it to "100
    100" would grant permissions to a single group (to either make
    /sbin/ping g+s and owned by this group or to grant permissions to the
    "netadmins" group), "0 4294967295" would enable it for the world, "100
    4294967295" would enable it for the users, but not daemons.
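
    The range semantics in the examples above can be sketched as an
    inclusive "LOW HIGH" check. gid_may_ping() is a hypothetical helper
    written for illustration; it tests a single group ID against a range
    string and does not model how the kernel walks the caller's groups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse a "LOW HIGH" string in the ping_group_range format and report
 * whether gid falls inside the inclusive range. An empty range such as
 * "1 0" (LOW > HIGH) matches no group at all.
 */
bool gid_may_ping(const char *range, unsigned long gid)
{
    unsigned long low, high;

    assert(range != NULL);
    if (sscanf(range, "%lu %lu", &low, &high) != 2)
        return false; /* malformed input: deny */
    return low <= gid && gid <= high;
}
```

    With the default "1 0" no gid satisfies both comparisons, which is why
    not even root (gid 0) can create a ping socket until the range is
    changed.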

    The existing code might be (in the unlikely case anyone needs it)
    extended rather easily to handle other similar pairs of ICMP messages
    (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
    etc.).

    Userspace ping util & patch for it:
    http://openwall.info/wiki/people/segoon/ping

    For Openwall GNU/*/Linux it was the last step on the road to the
    setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels)
    is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
    http://mirrors.kernel.org/openwall/Owl/current/iso/

    Initially this functionality was written by Pavel Kankovsky for
    Linux 2.4.32, but unfortunately it was never made public.

    All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I) are tested with
    the patch.

    PATCH v3:
    - switched to flowi4.
    - minor changes to be consistent with raw sockets code.

    PATCH v2:
    - changed ping_debug() to pr_debug().
    - removed CONFIG_IP_PING.
    - removed ping_seq_fops.owner field (unused for procfs).
    - switched to proc_net_fops_create().
    - switched to %pK in seq_printf().

    PATCH v1:
    - fixed checksumming bug.
    - CAP_NET_RAW may not create icmp sockets anymore.

    RFC v2:
    - minor cleanups.
    - introduced sysctl'able group range to restrict socket(2).

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: David S. Miller

    Vasiliy Kulikov
     

13 May, 2011

4 commits


12 May, 2011

1 commit


11 May, 2011

6 commits

  • As it is, we assign the outer mode's output function to the dst entry
    when we create the xfrm bundle. This leads to two problems on interfamily
    scenarios. We might insert ipv4 packets into ip6_fragment when called
    from xfrm6_output. The system crashes if we try to fragment an ipv4
    packet with ip6_fragment. This issue was introduced with git commit
    ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
    as needed). The second issue is that we might insert ipv4 packets in
    netfilter6 and vice versa on interfamily scenarios.

    With this patch we assign the inner mode output function to the dst entry
    when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner
    mode is used and the right fragmentation and netfilter functions are called.
    We switch then to outer mode with the output_finish functions.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • Commit e67f88dd12f6 (net: dont hold rtnl mutex during netlink dump
    callbacks) switched rtnl protection to RCU, but we forgot to adjust two
    rcu_dereference() lockdep annotations:

    inet_get_link_af_size() or inet_fill_link_af() might be called with
    rcu_read_lock or rtnl held, so use rcu_dereference_rtnl()
    instead of rtnl_dereference()

    Reported-by: Valdis Kletnieks
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Rearrange xfrm4_dst_lookup() so that it works by calling a helper
    function __xfrm_dst_lookup() that takes an explicit flow key storage
    area as an argument.

    Use this new helper in xfrm4_get_saddr() so we can fetch the selected
    source address from the flow instead of from rt->rt_src

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We already track and pass around the correct flow key,
    so simply use it in udp_send_skb().

    Signed-off-by: David S. Miller

    David S. Miller
     
  • On input packets, rt->rt_src always equals ip_hdr(skb)->saddr

    Anything that mangles or otherwise changes the IP header must
    relookup the route found at skb_rtable(). Therefore this
    invariant must always hold true.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This eliminates an access to rt->rt_src.

    Signed-off-by: David S. Miller

    David S. Miller