22 Apr, 2009

1 commit

  • In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
    does not agree with that specified by DEFINE_PER_CPU(). This means that
    architectures that reference a small data section relative to a base
    register may throw up linkage errors due to too great a displacement between
    where the base register points and the per-CPU variable.

    On FRV, the .h declaration says that the variable is in the .sdata section, but
    the .c definition says it's actually in the .data section. The linker throws
    up the following errors:

    kernel/built-in.o: In function `release_task':
    kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
    kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o

    To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
    as DEFINE_PER_CPU() does. This is made slightly more complex by the fact
    that there are several variants of DEFINE, so these need to be matched by
    variants of DECLARE.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

15 Apr, 2009

3 commits

  • The latest tcpdump/libpcap triggers annoying messages because of high-order
    page allocation failures (when lowmem is exhausted or fragmented).

    These allocation errors are handled correctly, so they could be silent.

    [22660.208901] tcpdump: page allocation failure. order:5, mode:0xc0d0
    [22660.208921] Pid: 13866, comm: tcpdump Not tainted 2.6.30-rc2 #170
    [22660.208936] Call Trace:
    [22660.208950] [] ? printk+0x18/0x1a
    [22660.208965] [] __alloc_pages_internal+0x357/0x460
    [22660.208980] [] __get_free_pages+0x21/0x40
    [22660.208995] [] packet_set_ring+0x105/0x3d0
    [22660.209009] [] packet_setsockopt+0x21d/0x4d0
    [22660.209025] [] ? filemap_fault+0x0/0x450
    [22660.209040] [] sys_setsockopt+0x54/0xa0
    [22660.209053] [] sys_socketcall+0xef/0x270
    [22660.209067] [] sysenter_do_call+0x12/0x26

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This reverts commit 244f46ae6e9e18f6fc0be7d1f49febde4762c34b.

    Alan Cox did the research, and, just like in the other radio protocols,
    zero-length frames have meaning because at the top level ROSE is
    X.25 PLP.

    So this zero-length filtering is invalid.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Since everybody has been focusing on bare-metal GRO performance,
    no one noticed when I added a bug that zapped gso_size for all
    GRO packets. This only gets picked up when you forward the skb
    out of an interface.

    Thanks to Mark Wagner for noticing this bug when testing kvm.

    Reported-by: Mark Wagner
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

14 Apr, 2009

4 commits

  • After switch (rthdr->type) {...}, the check below is completely useless,
    because if the type is 2, then hdrlen must be 2 and segments_left must be 1,
    so the check is clearly redundant; if the type is not 2, the code does
    goto sticky_done, so the check is useless too.

    Signed-off-by: Yang Hongyang
    Reviewed-by: Shan Wei
    Signed-off-by: David S. Miller

    Yang Hongyang
     
  • A long-standing "feature" in tcp_init_metrics() is that any of its
    goto reset jumps prevents the call to tcp_init_cwnd().

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • When vlan acceleration is used on receive, the vlan tag is maintained
    outside of the skb data. The existing vlan tag match only works on the TX
    path because it uses vlan_get_tag(), which tests for VLAN_HW_TX_ACCEL.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • gro: Normalise skb before bypassing GRO on netpoll VLAN path

    When we detect netpoll RX on the GRO VLAN path we bail out and
    call the normal VLAN receive handler. However, the packet needs
    to be normalised by calling eth_type_trans since that's what the
    normal path expects (normally the GRO path does the fixup).

    This patch adds the necessary call to vlan_gro_frags.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

11 Apr, 2009

3 commits

  • Commit b2f5e7cd3dee2ed721bf0675e1a1ddebb849aee6
    (ipv6: Fix conflict resolutions during ipv6 binding)
    introduced a regression where time-wait sockets were
    not treated correctly. This resulted in the following:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
    IP: [] ipv4_rcv_saddr_equal+0x61/0x70
    ...
    Call Trace:
    [] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
    [] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
    [] inet_csk_get_port+0x1ee/0x400
    [] inet6_bind+0x1cf/0x3a0 [ipv6]
    [] ? sockfd_lookup_light+0x3c/0xd0
    [] sys_bind+0x89/0x100
    [] ? trace_hardirqs_on_thunk+0x3a/0x3c
    [] system_call_fastpath+0x16/0x1b

    Tested-by: Brian Haley
    Tested-by: Ed Tomlinson
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Add dev_put() after dev_get_by_index() to avoid leaking a reference
    to the device.

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • Currently netif_device_attach() and netif_device_detach() only start and
    stop one queue. They should start and stop all the queues on a given device.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     


07 Apr, 2009

4 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    b44: Use kernel DMA addresses for the kernel DMA API
    forcedeth: Fix resume from hibernation regression.
    xfrm: fix fragmentation on inter family tunnels
    ibm_newemac: Fix dangerous struct assumption
    gigaset: documentation update
    gigaset: in file ops, check for device disconnect before anything else
    bas_gigaset: use tasklet_hi_schedule for timing critical tasklets
    net/802/fddi.c: add MODULE_LICENSE
    smsc911x: remove unused #include
    axnet_cs: fix phy_id detection for bogus Asix chip.
    bnx2: Use request_firmware()
    b44: Fix sizes passed to b44_sync_dma_desc_for_{device,cpu}()
    socket: use percpu_add() while updating sockets_in_use
    virtio_net: Set the mac config only when VIRTIO_NET_F_MAC
    myri_sbus: use request_firmware
    e1000: fix loss of multicast packets
    vxge: should include tcp.h

    Conflict in firmware/WHENCE (SCSI vs net firmware)

    Linus Torvalds
     
  • If an ipv4 packet (not locally generated, with the IP_DF flag not set) bigger
    than the mtu is supposed to go via an xfrm ipv6 tunnel, the packet size
    check in xfrm4_tunnel_check_size() is omitted and ipv6 drops the packet
    without sending a notice to the original sender of the ipv4 packet.

    Another issue is that ipv4 connection tracking reassembles incoming
    fragmented packets. If such a reassembled packet is supposed to go via an
    xfrm ipv6 tunnel, it will be dropped, even if the original sender did
    proper fragmentation.

    According to RFC 2473 (section 7), tunnel ipv6 packets resulting from the
    encapsulation of an original packet are considered locally generated
    packets. If such a packet passes the checks in xfrm{4,6}_tunnel_check_size(),
    fragmentation is allowed according to RFC 2473 (sections 7.1/7.2).

    This patch sets skb->local_df in xfrm6_prepare_output() to achieve
    fragmentation in this case.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • This patch adds the missing MODULE_LICENSE("GPL").

    Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Adrian Bunk
     
  • * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits)
    nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4
    nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc
    nfsd41: Documentation/filesystems/nfs41-server.txt
    nfsd41: CREATE_EXCLUSIVE4_1
    nfsd41: SUPPATTR_EXCLCREAT attribute
    nfsd41: support for 3-word long attribute bitmask
    nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify
    nfsd41: pass writable attrs mask to nfsd4_decode_fattr
    nfsd41: provide support for minor version 1 at rpc level
    nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions
    nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap
    nfsd41: access_valid
    nfsd41: clientid handling
    nfsd41: check encode size for sessions maxresponse cached
    nfsd41: stateid handling
    nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op
    nfsd41: destroy_session operation
    nfsd41: non-page DRC for solo sequence responses
    nfsd41: Add a create session replay cache
    nfsd41: create_session operation
    ...

    Linus Torvalds
     

06 Apr, 2009

4 commits

  • This patch fixes a regression (introduced by myself in commit 19abb7b:
    netfilter: ctnetlink: deliver events for conntracks changed from
    userspace) that results in an expectation re-insertion since
    __nf_ct_expect_check() may return 0 for expectation timer refreshing.

    This patch also removes an unnecessary refcount bump that was meant
    to avoid a possible race condition with event delivery and expectation
    timers (as said, it is not needed since we hold a reference to the
    object until we finish the expectation setup). This also merges
    nf_ct_expect_related_report() and nf_ct_expect_related(), which look
    basically the same.

    Reported-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Patrick McHardy

    Pablo Neira Ayuso
     
  • It's plural: LEDS_TRIGGERS, not LED_TRIGGERS.

    Signed-off-by: Alex Riesen
    Signed-off-by: Patrick McHardy

    Alex Riesen
     
  • Commit 7845447 (netfilter: iptables: lock free counters) broke
    ip6_tables by unconditionally returning ENOMEM in alloc_counters().

    Reported-by: Graham Murray
    Signed-off-by: Eric Dumazet
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits)
    cpumask: remove cpumask allocation from idle_balance, fix
    numa, cpumask: move numa_node_id default implementation to topology.h, fix
    cpumask: remove cpumask allocation from idle_balance
    x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus
    x86: cpumask: update 32-bit APM not to mug current->cpus_allowed
    x86: microcode: cleanup
    x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c
    cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash
    numa, cpumask: move numa_node_id default implementation to topology.h
    cpumask: convert node_to_cpumask_map[] to cpumask_var_t
    cpumask: remove x86 cpumask_t uses.
    cpumask: use cpumask_var_t in uv_flush_tlb_others.
    cpumask: remove cpumask_t assignment from vector_allocation_domain()
    cpumask: make Xen use the new operators.
    cpumask: clean up summit's send_IPI functions
    cpumask: use new cpumask functions throughout x86
    x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask
    cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t
    cpumask: convert node_to_cpumask_map[] to cpumask_var_t
    x86: unify 32 and 64-bit node_to_cpumask_map
    ...

    Linus Torvalds
     

05 Apr, 2009

1 commit

  • sock_alloc() currently uses the following code to update sockets_in_use:

    get_cpu_var(sockets_in_use)++;
    put_cpu_var(sockets_in_use);

    This translates to:

    c0436274: b8 01 00 00 00 mov $0x1,%eax
    c0436279: e8 42 40 df ff call c022a2c0
    c043627e: bb 20 4f 6a c0 mov $0xc06a4f20,%ebx
    c0436283: e8 18 ca f0 ff call c0342ca0
    c0436288: 03 1c 85 60 4a 65 c0 add -0x3f9ab5a0(,%eax,4),%ebx
    c043628f: ff 03 incl (%ebx)
    c0436291: b8 01 00 00 00 mov $0x1,%eax
    c0436296: e8 75 3f df ff call c022a210
    c043629b: 89 e0 mov %esp,%eax
    c043629d: 25 00 e0 ff ff and $0xffffe000,%eax
    c04362a2: f6 40 08 08 testb $0x8,0x8(%eax)
    c04362a6: 75 07 jne c04362af
    c04362a8: 8d 46 d8 lea -0x28(%esi),%eax
    c04362ab: 5b pop %ebx
    c04362ac: 5e pop %esi
    c04362ad: c9 leave
    c04362ae: c3 ret
    c04362af: e8 cc 5d 09 00 call c04cc080
    c04362b4: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
    c04362b8: eb ee jmp c04362a8

    By contrast, percpu_add(sockets_in_use, 1) translates to a single instruction:

    c0436275: 64 83 05 20 5f 6a c0 addl $0x1,%fs:0xc06a5f20

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Apr, 2009

2 commits

  • On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be
    returned. It is up to the NFSv4.1 client to resend only the operations that
    have not been processed.

    Initialize rq_usedeferral to 1 in svc_process(). It will be turned off in
    nfsd4_proc_compound() only when NFSv4.1 Sessions are used.

    Note: this isn't an adequate solution on its own. It's acceptable as a way
    to get some minimal 4.1 up and working, but we're going to have to find a
    way to avoid returning DELAY in all common cases before 4.1 can really be
    considered ready.

    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [nfsd41: reverse rq_nodeferral negative logic]
    Signed-off-by: Benny Halevy
    [sunrpc: initialize rq_usedeferral]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: J. Bruce Fields

    Andy Adamson
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
    trivial: Update my email address
    trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
    trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
    trivial: Fix misspelling of "Celsius".
    trivial: remove unused variable 'path' in alloc_file()
    trivial: fix a pdlfush -> pdflush typo in comment
    trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
    trivial: wusb: Storage class should be before const qualifier
    trivial: drivers/char/bsr.c: Storage class should be before const qualifier
    trivial: h8300: Storage class should be before const qualifier
    trivial: fix where cgroup documentation is not correctly referred to
    trivial: Give the right path in Documentation example
    trivial: MTD: remove EOL from MODULE_DESCRIPTION
    trivial: Fix typo in bio_split()'s documentation
    trivial: PWM: fix of #endif comment
    trivial: fix typos/grammar errors in Kconfig texts
    trivial: Fix misspelling of firmware
    trivial: cgroups: documentation typo and spelling corrections
    trivial: Update contact info for Jochen Hein
    trivial: fix typo "resgister" -> "register"
    ...

    Linus Torvalds
     

03 Apr, 2009

4 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    Remove two unneeded exports and make two symbols static in fs/mpage.c
    Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
    Trim includes of fdtable.h
    Don't crap into descriptor table in binfmt_som
    Trim includes in binfmt_elf
    Don't mess with descriptor table in load_elf_binary()
    Get rid of indirect include of fs_struct.h
    New helper - current_umask()
    check_unsafe_exec() doesn't care about signal handlers sharing
    New locking/refcounting for fs_struct
    Take fs_struct handling to new file (fs/fs_struct.c)
    Get rid of bumping fs_struct refcount in pivot_root(2)
    Kill unsharing fs_struct in __set_personality()

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (54 commits)
    glge: remove unused #include
    dnet: remove unused #include
    tcp: miscounts due to tcp_fragment pcount reset
    tcp: add helper for counter tweaking due mid-wq change
    hso: fix for the 'invalid frame length' messages
    hso: fix for crash when unplugging the device
    fsl_pq_mdio: Fix compile failure
    fsl_pq_mdio: Revive UCC MDIO support
    ucc_geth: Pass proper device to DMA routines, otherwise oops happens
    i.MX31: Fixing cs89x0 network building to i.MX31ADS
    tc35815: Fix build error if NAPI enabled
    hso: add Vendor/Product ID's for new devices
    ucc_geth: Remove unused header
    gianfar: Remove unused header
    kaweth: Fix locking to be SMP-safe
    net: allow multiple dev per napi with GRO
    r8169: reset IntrStatus after chip reset
    ixgbe: Fix potential memory leak/driver panic issue while setting up Tx & Rx ring parameters
    ixgbe: fix ethtool -A|a behavior
    ixgbe: Patch to fix driver panic while freeing up tx & rx resources
    ...

    Linus Torvalds
     
  • It seems that a trivial reset of pcount to one was not sufficient
    in tcp_retransmit_skb(). Multiple counters experience a positive
    miscount when the skb's pcount gets lowered without the necessary
    adjustments (which ones exactly depends on the skb's sacked bits). At
    worst, a packets_out miscount can crash at RTO if the write queue
    is empty!

    Triggering this requires an mss change, so bidirectional tcp, an mtu
    probe, or the like.

    Signed-off-by: Ilpo Järvinen
    Reported-by: Markus Trippelsdorf
    Tested-by: Uwe Bugla
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     
  • We need full-scale adjustment to fix a TCP miscount in the next
    patch, so just move it into a helper and call it from the
    other places.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
     

02 Apr, 2009

7 commits

  • GRO assumes that there is a one-to-one relationship between the NAPI
    structure and the network device. Some devices, like sky2, share multiple
    devices on a single interrupt and so have only one NAPI handler. Rather
    than split GRO from NAPI, just have GRO assume that a packet from a
    different device belongs to a different flow.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Commit 784544739a25c30637397ace5489eeb6e15d7d49
    (netfilter: iptables: lock free counters) forgot to disable BH
    in arpt_do_table(), ipt_do_table() and ip6t_do_table().

    Using rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.

    Reported-and-bisected-by: Roman Mindalev
    Signed-off-by: Eric Dumazet
    Acked-by: Patrick McHardy
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We have a 64bit value that needs to be set atomically.
    This is easy and quick on all 64bit archs, and can also be done
    on x86/32 with set_64bit() (uses cmpxchg8b). However, other
    32b archs don't have this.

    I actually changed this to the current state in preparation for
    mainline because the old way (using a spinlock on 32b) resulted in
    unsightly #ifdefs in the code. But obviously, being correct takes
    precedence.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
     
  • This fixes a bug where a connection was unexpectedly
    not on *any* list while being destroyed. It also
    cleans up some code duplication and regularizes some
    function names.

    * Grab appropriate lock in conn_free() and explain in comment
    * Ensure via locking that a conn is never not on either
    a dev's list or the nodev list
    * Add rds_xx_remove_conn() to match rds_xx_add_conn()
    * Make rds_xx_add_conn() return void
    * Rename remove_{,nodev_}conns() to
    destroy_{,nodev_}conns() and unify their implementation
    in a helper function
    * Document lock ordering as nodev conn_lock before
    dev_conn_lock

    Reported-by: Yosef Etigin
    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
     
  • rds_send_drop_to() is called during socket close. If it takes
    m_rs_lock without disabling interrupts, then
    rds_send_remove_from_sock() can run from the rx completion
    handler and thus deadlock.

    Signed-off-by: Andy Grover
    Signed-off-by: David S. Miller

    Andy Grover
     
  • Trond Myklebust
     
  • Also ensure that we use the protocol family instead of the address
    family when calling sock_create_kern().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

01 Apr, 2009

6 commits