27 Sep, 2010

1 commit


20 Sep, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
    dca: disable dca on IOAT ver.3.0 multiple-IOH platforms
    netpoll: Disable IRQ around RCU dereference in netpoll_rx
    sctp: Do not reset the packet during sctp_packet_config().
    net/llc: storing negative error codes in unsigned short
    MAINTAINERS: move atlx discussions to netdev
    drivers/net/cxgb3/cxgb3_main.c: prevent reading uninitialized stack memory
    drivers/net/eql.c: prevent reading uninitialized stack memory
    drivers/net/usb/hso.c: prevent reading uninitialized memory
    xfrm: dont assume rcu_read_lock in xfrm_output_one()
    r8169: Handle rxfifo errors on 8168 chips
    3c59x: Remove atomic context inside vortex_{set|get}_wol
    tcp: Prevent overzealous packetization by SWS logic.
    net: RPS needs to depend upon USE_GENERIC_SMP_HELPERS
    phylib: fix PAL state machine restart on resume
    net: use rcu_barrier() in rollback_registered_many
    bonding: correctly process non-linear skbs
    ipv4: enable getsockopt() for IP_NODEFRAG
    ipv4: force_igmp_version ignored when a IGMPv3 query received
    ppp: potential NULL dereference in ppp_mp_explode()
    net/llc: make opt unsigned in llc_ui_setsockopt()
    ...

    Linus Torvalds
     

18 Sep, 2010

1 commit

  • sctp_packet_config() is called when getting the packet ready
    for appending of chunks. The function should not touch the
    current state, since it's possible to ping-pong between two
    transports when sending, and that can result in packet corruption
    followed by an skb overflow crash.

    Reported-by: Thomas Dreibholz
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

17 Sep, 2010

2 commits


15 Sep, 2010

3 commits

  • You cannot invoke __smp_call_function_single() unless the
    architecture sets this symbol.

    Reported-by: Daniel Hellstrom
    Signed-off-by: David S. Miller

    David S. Miller
     
  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies
    statfs() gives ESTALE error
    NFS: Fix a typo in nfs_sockaddr_match_ipaddr6
    sunrpc: increase MAX_HASHTABLE_BITS to 14
    gss:spkm3 miss returning error to caller when import security context
    gss:krb5 miss returning error to caller when import security context
    Remove incorrect do_vfs_lock message
    SUNRPC: cleanup state-machine ordering
    SUNRPC: Fix a race in rpc_info_open
    SUNRPC: Fix race corrupting rpc upcall
    Fix null dereference in call_allocate

    Linus Torvalds
     
  • netdev_wait_allrefs() waits until all references to a device vanish.

    It currently uses a _very_ pessimistic 250 ms delay between each probe.
    Some users reported that no more than 4 devices can be dismantled per
    second, which is a pretty serious problem for some setups.

    Most of the time, a refcount is about to be released by an RCU callback,
    that is still in flight because rollback_registered_many() uses a
    synchronize_rcu() call instead of rcu_barrier(). The problem is visible if
    the number of online CPUs is one, because synchronize_rcu() is then a no-op.

    time to remove 50 ipip tunnels on a UP machine :

    before patch : real 11.910s
    after patch : real 1.250s

    Reported-by: Nicolas Dichtel
    Reported-by: Octavian Purdila
    Reported-by: Benjamin LaHaise
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

14 Sep, 2010

3 commits

  • While integrating your man-pages patch for IP_NODEFRAG, I noticed
    that this option is settable by setsockopt(), but not gettable by
    getsockopt(). I suppose this is not intended. The (untested,
    trivial) patch below adds getsockopt() support.

    Signed-off-by: Michael Kerrisk
    Acked-by: Jiri Olsa
    Signed-off-by: David S. Miller

    Michael Kerrisk
     
  • After all these years, it turns out that the
    /proc/sys/net/ipv4/conf/*/force_igmp_version
    parameter isn't fully implemented.

    *Symptom*:
    When force_igmp_version is set to a value of 2, the kernel should only perform
    multicast IGMPv2 operations (IETF rfc2236). A host-initiated Join message
    will be sent as an IGMPv2 Join message. But if an IGMPv3 query message is
    received, the host responds with an IGMPv3 join message. Per rfc3376 and
    rfc2236, an IGMPv2 host should treat an IGMPv3 query as an IGMPv2 query and
    respond with an IGMPv2 Join message.

    *Consequences*:
    This is an issue when an IGMPv3 capable switch is the querier and will only
    issue IGMPv3 queries (which double as IGMPv2 queries) and there's an
    intermediate switch that is only IGMPv2 capable. The intermediate switch
    processes the initial v2 Join, but fails to recognize the IGMPv3 Join responses
    to the Query, resulting in a dropped connection when the intermediate v2-only
    switch times it out.

    *Identifying issue in the kernel source*:
    The issue is in this section of code (in net/ipv4/igmp.c), which is called when
    an IGMP query is received (from mainline 2.6.36-rc3 gitweb):
    ...
    An IGMPv3 query has a length >= 12 and no sources. This routine will exit after
    line 880, setting the general query timer (random timeout between 0 and query
    response time). This calls igmp_gq_timer_expire():
    ...
    .. which only sends a v3 response. So if a v3 query is received, the kernel
    always sends a v3 response.

    IGMP queries happen once every 60 sec (per vlan), so the traffic is low. An
    IGMPv3 query *is* a strict superset of an IGMPv2 query, so this patch properly
    short-circuits the v3 behaviour.

    One issue is that this does not address force_igmp_version=1. Then again, I've
    never seen any IGMPv1 multicast equipment in the wild. However there is a lot
    of v2-only equipment. If it's necessary to support the IGMPv1 case as well:

    837 if (len == 8 || IGMP_V2_SEEN(in_dev) || IGMP_V1_SEEN(in_dev)) {

    Signed-off-by: David S. Miller

    Bob Arendt
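
The version decision described in the commit above can be sketched in isolation. This is a simplified model, not the code in net/ipv4/igmp.c: the function name, the enum and the `forced_version` parameter are hypothetical, and the sketch deliberately ignores IGMPv1 and the case where a v2 querier was seen on the link without any forcing.

```c
/* Hypothetical names; a simplified model of the response-version choice. */
enum sketch_igmp { SKETCH_V2 = 2, SKETCH_V3 = 3 };

/*
 * query_len:      length of the received IGMP query in bytes
 * forced_version: value of force_igmp_version (0 means "not forced")
 *
 * An 8-byte query is an IGMPv2-style query. A longer (v3) query is still
 * answered as v2 when the administrator forced version 2.
 */
static enum sketch_igmp sketch_response_version(int query_len, int forced_version)
{
    if (query_len == 8 || forced_version == 2)
        return SKETCH_V2;
    return SKETCH_V3;
}
```

With `forced_version` set to 2, a 12-byte v3 query yields `SKETCH_V2`, which is the behaviour the commit message argues for.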
     
  • The members of struct llc_sock are unsigned so if we pass a negative
    value for "opt" it can cause a sign bug. Also it can cause an integer
    overflow when we multiply "opt * HZ".

    CC: stable@kernel.org
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
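
A minimal sketch of the two hazards named in the commit above, a negative option value and an overflowing multiplication by HZ. The helper name and the tick rate of 1000 are assumptions for illustration; this is not the llc_ui_setsockopt() code.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumed tick rate for illustration; the kernel's HZ is a build-time option. */
#define SKETCH_HZ 1000u

/*
 * Hypothetical helper: convert a user-supplied timeout in seconds to ticks.
 * Taking the option as unsigned rules out negative input, and the explicit
 * bound rejects values whose product with SKETCH_HZ would wrap around.
 */
static bool sketch_seconds_to_ticks(unsigned int opt, unsigned int *ticks)
{
    if (opt > UINT_MAX / SKETCH_HZ)
        return false;              /* opt * SKETCH_HZ would overflow */
    *ticks = opt * SKETCH_HZ;
    return true;
}
```

A caller that passes -1 through an unsigned parameter ends up with `UINT_MAX`, which the bound check rejects instead of storing a wrapped value.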
     

13 Sep, 2010

9 commits

  • Four memory leak fixes in the 9P code.

    Signed-off-by: Latchesar Ionkov
    Signed-off-by: Eric Van Hensbergen

    Latchesar Ionkov
     
  • The maximum size of the authcache is now set to 1024 (10 bits),
    but on our server we need at least 4096 (12 bits). Increase
    MAX_HASHTABLE_BITS to 14. This is a maximum of 16384 entries,
    each containing a pointer (8 bytes on x86_64). This is
    exactly the limit of kmalloc() (128K).

    Signed-off-by: Miquel van Smoorenburg
    Signed-off-by: Trond Myklebust

    Miquel van Smoorenburg
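
The sizing arithmetic in the commit above can be checked with a short sketch. The function names are hypothetical; the pointer size is passed in explicitly so the x86_64 figure of 8 bytes from the message can be reproduced on any host.

```c
#include <stddef.h>

/* Number of buckets in a hash table indexed by `bits` bits. */
static size_t sketch_table_entries(unsigned int bits)
{
    return (size_t)1 << bits;
}

/* Bytes needed for the bucket array, with one pointer of ptr_size bytes per bucket. */
static size_t sketch_table_bytes(unsigned int bits, size_t ptr_size)
{
    return sketch_table_entries(bits) * ptr_size;
}
```

For 14 bits and 8-byte pointers this gives 16384 entries and 131072 bytes, the 128K figure quoted in the message.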
     
  • spkm3 fails to return an error to the upper layer when importing a security
    context; it may return OK even though the import has failed.

    Signed-off-by: Bian Naimeng
    Signed-off-by: Trond Myklebust

    Bian Naimeng
     
  • krb5 fails to return an error to the upper layer when importing a security
    context; it may return OK even though the import has failed.

    Signed-off-by: Bian Naimeng
    Signed-off-by: Trond Myklebust

    Bian Naimeng
     
  • This is just a minor cleanup: net/sunrpc/clnt.c clarifies the rpc client
    state machine by commenting each state and by laying out the functions
    implementing each state in the order that each state is normally
    executed (in the absence of errors).

    The previous patch "Fix null dereference in call_allocate" changed the
    order of the states. Move the functions and update the comments to
    reflect the change.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    J. Bruce Fields
     
  • There is a race between rpc_info_open and rpc_release_client()
    in that nothing stops a process from opening the file after
    the clnt->cl_kref goes to zero.

    Fix this by using atomic_inc_unless_zero()...

    Reported-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
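
The "increment unless zero" idea named in the commit above can be sketched in userspace with C11 atomics. This is an illustration of the pattern only, with a hypothetical function name; in the kernel, atomic_inc_not_zero() provides these semantics.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the count is still non-zero. Once the count has
 * reached zero the object is being torn down and must not be revived.
 */
static bool sketch_ref_get_unless_zero(atomic_int *refcount)
{
    int old = atomic_load(refcount);

    while (old != 0) {
        /* On failure, `old` is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(refcount, &old, old + 1))
            return true;
    }
    return false;
}
```

An open path built this way fails cleanly when it loses the race against the final release, instead of handing out a reference to an object that is already being freed.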
     
  • If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just
    after rpc_pipe_release calls rpc_purge_list(), but before it calls
    gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter
    will free a message without deleting it from the rpci->pipe list.

    We will be left with a freed object on the rpci->pipe list. Most
    frequent symptoms are kernel crashes in rpc.gssd system calls on the
    pipe in question.

    Reported-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     
  • In call_allocate we need to reach the auth in order to factor au_cslack
    into the allocation.

    As of a17c2153d2e271b0cbacae9bed83b0eaa41db7e1 "SUNRPC: Move the bound
    cred to struct rpc_rqst", call_allocate attempts to do this by
    dereferencing tk_client->cl_auth, however this is not guaranteed to be
    defined--cl_auth can be zero in the case of gss context destruction (see
    rpc_free_auth).

    Reorder the client state machine to bind credentials before allocating,
    so that we can instead reach the auth through the cred.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    J. Bruce Fields
     
  • The list_head conversion unearthed an unnecessary flow
    check. Since flow is always NULL here we don't need to
    see if a matching flow exists already.

    Reported-by: Jiri Slaby
    Signed-off-by: David S. Miller

    David S. Miller
     

10 Sep, 2010

1 commit


09 Sep, 2010

6 commits

  • David S. Miller
     
  • commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
    added a secondary hash on UDP, hashed on (local addr, local port).

    Problem is that following sequence :

    fd = socket(...)
    connect(fd, &remote, ...)

    not only selects the remote endpoint (address and port), but also sets the
    local address, while the UDP stack stored the socket in the secondary hash
    table when its local address was still INADDR_ANY (or the IPv6 equivalent).

    Sequence is :
    - autobind() : choose a random local port, insert socket in hash tables
    [while local address is INADDR_ANY]
    - connect() : set remote address and port, change local address to IP
    given by a route lookup.

    When an incoming UDP frame arrives, if more than 10 sockets are found in
    the primary hash table, we switch to the secondary table, and fail to find
    the socket because its local address changed.

    One solution to this problem is to rehash datagram socket if needed.

    We add a new rehash(struct socket *) method in "struct proto", and
    implement this method for UDP v4 & v6, using a common helper.

    This rehashing only takes care of secondary hash table, since primary
    hash (based on local port only) is not changed.

    Reported-by: Krzysztof Piotr Oledzki
    Signed-off-by: Eric Dumazet
    Tested-by: Krzysztof Piotr Oledzki
    Signed-off-by: David S. Miller

    Eric Dumazet
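
The failure and the fix described in the commit above can be modelled with a toy secondary hash. Everything here is hypothetical (struct, function names, hash function, bucket count); it is a sketch of the idea, not the UDP implementation, and it models a single lookup by exact (address, port) only.

```c
#include <stdint.h>

#define SKETCH_BUCKETS 16u

struct sketch_sock {
    uint32_t local_addr;   /* 0 stands in for INADDR_ANY */
    uint16_t local_port;
    unsigned int bucket;   /* secondary-hash bucket the socket currently sits in */
};

/* Toy secondary hash over (local address, local port). */
static unsigned int sketch_hash2(uint32_t addr, uint16_t port)
{
    return ((addr * 2654435761u) ^ port) % SKETCH_BUCKETS;
}

/* Place the socket in the bucket that matches its current address and port. */
static void sketch_hash(struct sketch_sock *sk)
{
    sk->bucket = sketch_hash2(sk->local_addr, sk->local_port);
}

/* connect() analogue: a route lookup picks a concrete local address. */
static void sketch_connect(struct sketch_sock *sk, uint32_t chosen_addr, int rehash)
{
    sk->local_addr = chosen_addr;
    if (rehash)
        sketch_hash(sk);   /* move the socket to the bucket for its new address */
}

/* Lookup succeeds only if the socket sits in the bucket the packet hashes to. */
static int sketch_lookup(const struct sketch_sock *sk, uint32_t daddr, uint16_t dport)
{
    return sk->bucket == sketch_hash2(daddr, dport) &&
           sk->local_addr == daddr && sk->local_port == dport;
}
```

A socket hashed while its address is 0 and then connected without rehashing stays in the old bucket, so a lookup keyed on the new address can miss it; passing `rehash = 1` moves it and the lookup succeeds.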
     
  • Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error
    triggered by IKE for example), hence this kind of route is always
    temporary and so we should check if a better route exists for next
    packets.
    The bug was introduced by commit d11a4dc18bf41719c9f0d7ed494d295dd2973b92.

    Signed-off-by: Jianzhao Wang
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Jianzhao Wang
     
  • Hi,
    Here is one more of these warnings and a patch below:

    Sep 5 23:52:33 del kernel: [46044.244833] ===================================================
    Sep 5 23:52:33 del kernel: [46044.269681] [ INFO: suspicious rcu_dereference_check() usage. ]
    Sep 5 23:52:33 del kernel: [46044.277000] ---------------------------------------------------
    Sep 5 23:52:33 del kernel: [46044.285185] net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection!
    Sep 5 23:52:33 del kernel: [46044.293627]
    Sep 5 23:52:33 del kernel: [46044.293632] other info that might help us debug this:
    Sep 5 23:52:33 del kernel: [46044.293634]
    Sep 5 23:52:33 del kernel: [46044.325333]
    Sep 5 23:52:33 del kernel: [46044.325335] rcu_scheduler_active = 1, debug_locks = 0
    Sep 5 23:52:33 del kernel: [46044.348013] 1 lock held by pppd/1717:
    Sep 5 23:52:33 del kernel: [46044.357548] #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0xf/0x20
    Sep 5 23:52:33 del kernel: [46044.367647]
    Sep 5 23:52:33 del kernel: [46044.367652] stack backtrace:
    Sep 5 23:52:33 del kernel: [46044.387429] Pid: 1717, comm: pppd Not tainted 2.6.35.4.4a #3
    Sep 5 23:52:33 del kernel: [46044.398764] Call Trace:
    Sep 5 23:52:33 del kernel: [46044.409596] [] ? printk+0x18/0x1e
    Sep 5 23:52:33 del kernel: [46044.420761] [] lockdep_rcu_dereference+0xa9/0xb0
    Sep 5 23:52:33 del kernel: [46044.432229] [] trie_firstleaf+0x65/0x70
    Sep 5 23:52:33 del kernel: [46044.443941] [] fib_table_flush+0x14/0x170
    Sep 5 23:52:33 del kernel: [46044.455823] [] ? local_bh_enable_ip+0x62/0xd0
    Sep 5 23:52:33 del kernel: [46044.467995] [] ? _raw_spin_unlock_bh+0x2f/0x40
    Sep 5 23:52:33 del kernel: [46044.480404] [] ? fib_sync_down_dev+0x120/0x180
    Sep 5 23:52:33 del kernel: [46044.493025] [] fib_flush+0x2d/0x60
    Sep 5 23:52:33 del kernel: [46044.505796] [] fib_disable_ip+0x25/0x50
    Sep 5 23:52:33 del kernel: [46044.518772] [] fib_netdev_event+0x73/0xd0
    Sep 5 23:52:33 del kernel: [46044.531918] [] notifier_call_chain+0x2d/0x70
    Sep 5 23:52:33 del kernel: [46044.545358] [] raw_notifier_call_chain+0x1a/0x20
    Sep 5 23:52:33 del kernel: [46044.559092] [] call_netdevice_notifiers+0x27/0x60
    Sep 5 23:52:33 del kernel: [46044.573037] [] __dev_notify_flags+0x5c/0x80
    Sep 5 23:52:33 del kernel: [46044.586489] [] dev_change_flags+0x37/0x60
    Sep 5 23:52:33 del kernel: [46044.599394] [] devinet_ioctl+0x54d/0x630
    Sep 5 23:52:33 del kernel: [46044.612277] [] inet_ioctl+0x97/0xc0
    Sep 5 23:52:34 del kernel: [46044.625208] [] sock_ioctl+0x6f/0x270
    Sep 5 23:52:34 del kernel: [46044.638046] [] ? handle_mm_fault+0x420/0x6c0
    Sep 5 23:52:34 del kernel: [46044.650968] [] ? sock_ioctl+0x0/0x270
    Sep 5 23:52:34 del kernel: [46044.663865] [] vfs_ioctl+0x28/0xa0
    Sep 5 23:52:34 del kernel: [46044.676556] [] do_vfs_ioctl+0x6a/0x5c0
    Sep 5 23:52:34 del kernel: [46044.688989] [] ? up_read+0x16/0x30
    Sep 5 23:52:34 del kernel: [46044.701411] [] ? do_page_fault+0x1d6/0x3a0
    Sep 5 23:52:34 del kernel: [46044.714223] [] ? fget_light+0xf8/0x2f0
    Sep 5 23:52:34 del kernel: [46044.726601] [] ? sys_socketcall+0x208/0x2c0
    Sep 5 23:52:34 del kernel: [46044.739140] [] sys_ioctl+0x63/0x70
    Sep 5 23:52:34 del kernel: [46044.751967] [] syscall_call+0x7/0xb
    Sep 5 23:52:34 del kernel: [46044.764734] [] ? cookie_v6_check+0x3d0/0x630

    -------------->

    This patch fixes the warning:
    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    1 lock held by pppd/1717:
    #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0xf/0x20

    stack backtrace:
    Pid: 1717, comm: pppd Not tainted 2.6.35.4a #3
    Call Trace:
    [] ? printk+0x18/0x1e
    [] lockdep_rcu_dereference+0xa9/0xb0
    [] trie_firstleaf+0x65/0x70
    [] fib_table_flush+0x14/0x170
    ...

    Allow trie_firstleaf() to be called either under rcu_read_lock()
    protection or with RTNL held. The same annotation is added to
    node_parent_rcu() to prevent a similar warning a bit later.

    Followup of commits 634a4b20 and 4eaa0e3c.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • - Do not create expectation when forwarding the PORT
    command to avoid blocking the connection. The problem is that
    nf_conntrack_ftp.c:help() tries to create the same expectation later in
    POST_ROUTING and drops the packet with "dropping packet" message after
    failure in nf_ct_expect_related.

    - Change ip_vs_update_conntrack to alter the conntrack
    for related connections from real server. If we do not alter the reply in
    this direction the next packet from client sent to vport 20 comes as NEW
    connection. We alter it, but maybe some collision happens for both
    conntracks and the second conntrack gets destroyed immediately. The
    connection gets stuck too.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Julian Anastasov
     
  • The patch: "gro: fix different skb headrooms" in its part:
    "2) allocate a minimal skb for head of frag_list" is buggy. The copied
    skb has p->data set at the ip header at the moment, and skb_gro_offset
    is the length of ip + tcp headers. So, after the change the length of
    mac header is skipped. Later skb_set_mac_header() sets it into the
    NET_SKB_PAD area (if it's long enough) and ip header is misaligned at
    NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the
    original skb was wrongly allocated, so let's copy it as it was.

    bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
    fixes commit: 3d3be4333fdf6faa080947b331a6a19bce1a4f57

    Reported-by: Plamen Petrov
    Signed-off-by: Jarek Poplawski
    CC: Eric Dumazet
    Acked-by: Eric Dumazet
    Tested-by: Plamen Petrov
    Signed-off-by: David S. Miller

    Jarek Poplawski
     

08 Sep, 2010

7 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
    pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimator
    ipvs: avoid oops for passive FTP
    Revert "sky2: don't do GRO on second port"
    gro: fix different skb headrooms
    bridge: Clear INET control block of SKBs passed into ip_fragment().
    3c59x: Remove incorrect locking; correct documented lock hierarchy
    sky2: don't do GRO on second port
    ipv4: minor fix about RPF in help of Kconfig
    xfrm_user: avoid a warning with some compiler
    net/sched/sch_hfsc.c: initialize parent's cl_cfmin properly in init_vf()
    pxa168_eth: fix a mdiobus leak
    net sched: fix kernel leak in act_police
    vhost: stop worker only if created
    MAINTAINERS: Add ehea driver as Supported
    ath9k_hw: fix parsing of HT40 5 GHz CTLs
    ath9k_hw: Fix EEPROM uncompress block reading on AR9003
    wireless: register wiphy rfkill w/o holding cfg80211_mutex
    netlink: Make NETLINK_USERSOCK work again.
    irda: Correctly clean up self->ias_obj on irda_bind() failure.
    wireless extensions: fix kernel heap content leak
    ...

    Linus Torvalds
     
  • Actually iterate over the next-hops to make sure we have
    a device match. Otherwise RP filtering is always elided
    when the route matched has multiple next-hops.

    Reported-by: Igor M Podlesny
    Signed-off-by: David S. Miller

    David S. Miller
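
The check described in the commit above amounts to scanning every next-hop of the matched route for the incoming device. A minimal sketch, assuming next-hops are represented by interface indexes; the function name and the array representation are hypothetical and do not mirror the kernel's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if any next-hop of the matched route uses the interface the
 * packet arrived on. With a multipath route, checking only one next-hop
 * (or none) would make the reverse-path test meaningless.
 */
static bool sketch_rpf_dev_match(const int *nh_ifindex, size_t n, int in_ifindex)
{
    for (size_t i = 0; i < n; i++) {
        if (nh_ifindex[i] == in_ifindex)
            return true;
    }
    return false;
}
```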
     
  • We assumed that unix_autobind() never fails if kzalloc() succeeded.
    But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
    larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
    consume all names using fork()/socket()/bind().

    If all names are in use, those who call bind() with addr_len == sizeof(short)
    or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue

    while (1)
    yield();

    loop at unix_autobind() till a name becomes available.
    This patch adds a loop counter in order to give up after 1048576 attempts.

    Calling yield() once per 256 attempts may not be sufficient when many names
    are already in use, for __unix_find_socket_byname() can take a long time under
    such circumstances. Therefore, this patch also adds a cond_resched() call.

    Note that currently a local user can consume 2GB of kernel memory if the user
    is allowed to create and autobind 1048576 UNIX domain sockets. We should
    consider adding some restriction for autobind operation.

    Signed-off-by: Tetsuo Handa
    Signed-off-by: David S. Miller

    Tetsuo Handa
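
The loop counter described in the commit above can be sketched as a bounded search over the name space. The names, the callback type and the -1 return value are assumptions for illustration; the sketch omits the yield()/cond_resched() calls and is not the unix_autobind() code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NAME_SPACE 0x100000u   /* 1048576 possible autobind names */

/* Hypothetical predicate standing in for the "is this name already bound?" lookup. */
typedef bool (*sketch_in_use_fn)(uint32_t name);

/*
 * Try names starting from `start` and give up after one attempt per possible
 * name instead of spinning forever. Returns 0 and stores the name on success,
 * -1 when every name is taken.
 */
static int sketch_autobind(uint32_t start, sketch_in_use_fn in_use, uint32_t *out)
{
    uint32_t name = start & (SKETCH_NAME_SPACE - 1);
    uint32_t attempts;

    for (attempts = 0; attempts < SKETCH_NAME_SPACE; attempts++) {
        if (!in_use(name)) {
            *out = name;
            return 0;
        }
        name = (name + 1) & (SKETCH_NAME_SPACE - 1);
    }
    return -1;
}

/* Example predicates: every name taken, and only names 0..2 taken. */
static bool sketch_all_taken(uint32_t name) { (void)name; return true; }
static bool sketch_first_three_taken(uint32_t name) { return name < 3; }
```

With `sketch_all_taken` the search terminates with -1 after 1048576 attempts, where an unbounded loop would never return.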
     
  • This is an off by one. We would go past the end when we NUL terminate
    the "value" string at end of the function. The "value" buffer is
    allocated in irlan_client_parse_response() or
    irlan_provider_parse_command().

    CC: stable@kernel.org
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
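
The off-by-one described in the commit above is the classic case of forgetting that the NUL terminator needs its own byte. A minimal sketch with a hypothetical helper, not the irlan code:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `len` bytes into buf and NUL-terminate. The terminator occupies
 * buf[len], so the largest acceptable len is bufsize - 1, not bufsize.
 */
static int sketch_copy_value(char *buf, size_t bufsize, const char *src, size_t len)
{
    if (bufsize == 0 || len > bufsize - 1)   /* testing `len > bufsize` would be off by one */
        return -1;
    memcpy(buf, src, len);
    buf[len] = '\0';
    return 0;
}
```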
     
  • RFC5722 prohibits reassembling IPv6 fragments when some data overlaps.

    Bug spotted by Zhang Zuotao.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
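
RFC 5722 requires that a datagram be discarded when any of its fragments overlap. The overlap test itself can be sketched as a comparison of half-open byte ranges; the struct and function names are hypothetical and this is not the kernel's reassembly code.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_frag {
    uint32_t offset;   /* byte offset of the fragment within the datagram */
    uint32_t len;      /* payload length of the fragment in bytes */
};

/*
 * Two half-open ranges [offset, offset + len) overlap exactly when each
 * one starts before the other one ends. Adjacent fragments do not overlap.
 */
static bool sketch_frags_overlap(struct sketch_frag a, struct sketch_frag b)
{
    return a.offset < b.offset + b.len && b.offset < a.offset + a.len;
}
```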
     
  • RFC5722 prohibits reassembling fragments when some data overlaps.

    Bug spotted by Zhang Zuotao.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • When a net device is implementing the select_queue callback and is part of
    a bridge, frames coming from the bridge already have a tx queue associated
    to the socket (introduced in commit a4ee3ce3293dc931fab19beb472a8bde1295aebe,
    "net: Use sk_tx_queue_mapping for connected sockets"). The call to
    sk_tx_queue_get will then return the tx queue used by the bridge instead
    of calling the select_queue callback.

    In case of mac80211 this broke QoS which is implemented by using the
    select_queue callback. Furthermore it introduced problems with rt2x00
    because frames with the same TID and RA sometimes appeared on different
    tx queues which the hw cannot handle correctly.

    Fix this by always calling select_queue first if it is available and only
    afterwards use the socket tx queue mapping.

    Signed-off-by: Helmut Schaa
    Signed-off-by: David S. Miller

    Helmut Schaa
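
The ordering fix described in the commit above can be sketched as a small decision function: consult the driver callback first and fall back to the socket's cached queue only when the driver has none. All names and the integer queue representation are hypothetical; this is not the kernel's tx queue selection code.

```c
#include <stddef.h>

struct sketch_dev {
    /* Optional driver callback; NULL when the driver does not implement one. */
    int (*select_queue)(int prio);
};

/* sk_queue < 0 means no queue is cached on the socket. */
static int sketch_pick_tx_queue(const struct sketch_dev *dev, int sk_queue,
                                int prio, int fallback)
{
    if (dev->select_queue)
        return dev->select_queue(prio);   /* driver policy wins, e.g. a QoS mapping */
    if (sk_queue >= 0)
        return sk_queue;                  /* otherwise reuse the socket's cached queue */
    return fallback;
}

/* Example driver callback: map a priority onto one of four queues. */
static int sketch_qos_queue(int prio) { return prio & 3; }
```

With the callback present, a queue cached on the socket by another device (the bridge in the commit message) no longer overrides the driver's choice.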
     

03 Sep, 2010

2 commits

  • This patch fixes a lockdep warning:

    [ 516.287584] =========================================================
    [ 516.288386] [ INFO: possible irq lock inversion dependency detected ]
    [ 516.288386] 2.6.35b #7
    [ 516.288386] ---------------------------------------------------------
    [ 516.288386] swapper/0 just changed the state of lock:
    [ 516.288386] (&qdisc_tx_lock){+.-...}, at: [] est_timer+0x62/0x1b4
    [ 516.288386] but this lock took another, SOFTIRQ-unsafe lock in the past:
    [ 516.288386] (est_tree_lock){+.+...}
    [ 516.288386]
    [ 516.288386] and interrupts could create inverse lock ordering between them.
    ...

    So, est_tree_lock needs BH protection because it's taken by
    qdisc_tx_lock, which is used both in BH and process contexts.
    (Full warning with this patch at netdev, 02 Sep 2010.)

    Fixes commit: ae638c47dc040b8def16d05dc6acdd527628f231
    ("pkt_sched: gen_estimator: add a new lock")

    Signed-off-by: Jarek Poplawski
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Fix Passive FTP problem in ip_vs_ftp:

    - Do not oops in nf_nat_set_seq_adjust (adjust_tcp_sequence) when
    iptable_nat module is not loaded

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Julian Anastasov
     

02 Sep, 2010

4 commits

  • Packets entering GRO might have different headrooms, even for a given
    flow (because of implementation details in drivers, like copybreak).
    We can't force drivers to deliver packets with a fixed headroom.

    1) fix skb_segment()

    skb_segment() wrongly assumes that the headrooms of fragments are the same
    as that of the head. When CHECKSUM_PARTIAL is used, this can give csum_start
    errors, and crash later in skb_copy_and_csum_dev().

    2) allocate a minimal skb for head of frag_list

    skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to
    allocate a fresh skb. This adds NET_SKB_PAD to a padding already
    provided by netdevice, depending on various things, like copybreak.

    Use alloc_skb() to allocate an exact padding, to reduce cache line
    needs:
    NET_SKB_PAD + NET_IP_ALIGN

    bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626

    Many thanks to Plamen Petrov for testing many debugging patches!
    With help of Jarek Poplawski.

    Reported-by: Plamen Petrov
    Signed-off-by: Eric Dumazet
    CC: Jarek Poplawski
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In a similar vein to commit 17762060c25590bfddd68cc1131f28ec720f405f
    ("bridge: Clear IPCB before possible entry into IP stack")

    Any time we call into the IP stack we have to make sure the state
    there is as expected by the ipv4 code.

    With help from Eric Dumazet and Herbert Xu.

    Reported-by: Bandan Das
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • Attached is a small patch to remove a warning ("warning: ISO C90 forbids
    mixed declarations and code" with gcc 4.3.2).

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel