14 Jul, 2011
1 commit
-
Currently we flush tp_status and then flush the remainder of the header+payload.
tp_status should be flushed last, to avoid user-space reading stale data.

v1 re-ordered the barriers incorrectly.
Signed-off-by: Chetan Loke
Signed-off-by: David S. Miller
07 Jul, 2011
2 commits
-
af_packet.c:(.text+0x3d130): undefined reference to `ip_defrag'
or
ERROR: "ip_defrag" [net/packet/af_packet.ko] undefined!
Reported-by: Randy Dunlap
Signed-off-by: David S. Miller
-
fanout_add() might return with fanout_mutex still held.
Reduce the indentation level while we are at it.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
06 Jul, 2011
5 commits
-
When we clone the SKB, we forget about the original
one. Avoid this problem by using skb_share_check().
Reported-by: Penttilä Mika
Signed-off-by: David S. Miller
-
Unfortunately we have to use a real modulus here, as
the multiply trick won't work as effectively with CPU
numbers as it does with rxhash values.
Signed-off-by: David S. Miller
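The difference can be seen in a small userspace sketch. Assuming a fanout group of `num` sockets, the multiply trick scales a 32-bit value into `[0, num)` with `((u64)value * num) >> 32`. That spreads rxhash values (roughly uniform over all 32 bits) well, but sends every small CPU number to socket 0. The function names below are hypothetical, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply trick: scale a 32-bit value onto [0, num). Works well when
 * the input is spread over the full 32-bit range, as rxhash values are. */
unsigned int demux_by_hash(uint32_t value, unsigned int num)
{
    assert(num > 0);
    return (unsigned int)(((uint64_t)value * num) >> 32);
}

/* CPU numbers are small (0, 1, 2, ...), so a real modulus is needed to
 * spread them across the group. */
unsigned int demux_by_cpu(unsigned int cpu, unsigned int num)
{
    assert(num > 0);
    return cpu % num;
}
```

With `num = 4`, CPUs 0 through 7 all land on socket 0 under the multiply trick, while the modulus cycles through sockets 0, 1, 2, 3.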
-
The skb->rxhash cannot be properly computed if the
packet is a fragment. To alleviate this, allow the
AF_PACKET client to ask for defragmentation to be
done at demux time.
Signed-off-by: David S. Miller
-
Fanouts allow packet capturing to be demuxed to a set of AF_PACKET
sockets. Two fanout policies are implemented:
1) Hashing based upon skb->rxhash
2) Pure round-robin

An AF_PACKET socket must be fully bound before it tries to add itself
to a fanout. All AF_PACKET sockets trying to join the same fanout
must have the same bind settings.

Fanouts are identified (within a network namespace) by a 16-bit ID.
The first socket to try to add itself to a fanout with a particular
ID creates that fanout. When the last socket leaves the fanout
(which happens only when the socket is closed), that fanout is
destroyed.
Signed-off-by: David S. Miller
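From user space, a bound socket joins a fanout with `setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &val, sizeof(val))`. The kernel splits the option value into a group ID (`val & 0xffff`) and a policy plus flags (`val >> 16`), including the PACKET_FANOUT_FLAG_DEFRAG bit that requests defragmentation at demux time. The sketch below only builds and decodes that value. The constants are local copies of the `<linux/if_packet.h>` values, renamed to avoid clashing with the real header, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of PACKET_FANOUT_HASH, PACKET_FANOUT_LB and
 * PACKET_FANOUT_FLAG_DEFRAG from <linux/if_packet.h>. */
#define FANOUT_TYPE_HASH   0x0000u
#define FANOUT_TYPE_LB     0x0001u
#define FANOUT_FLAG_DEFRAG 0x8000u

/* Pack the PACKET_FANOUT option value: group ID in bits 0-15,
 * policy and flags in bits 16-31. */
uint32_t fanout_arg(uint16_t group_id, uint16_t type_flags)
{
    return (uint32_t)group_id | ((uint32_t)type_flags << 16);
}

/* Split the value the same way the kernel does (low and high 16 bits). */
uint16_t fanout_arg_id(uint32_t arg)   { return (uint16_t)(arg & 0xffffu); }
uint16_t fanout_arg_type(uint32_t arg) { return (uint16_t)(arg >> 16); }
```

The packed bits are what the caller would copy into the `int` passed to setsockopt(). Every socket that passes the same ID (with the same bind settings) joins the same group, and the first one creates it.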
-
Signed-off-by: David S. Miller
21 Jun, 2011
1 commit
-
Conflicts:
drivers/net/wireless/iwlwifi/iwl-agn-rxon.c
drivers/net/wireless/rtlwifi/pci.c
net/netfilter/ipvs/ip_vs_core.c
12 Jun, 2011
1 commit
-
There's no need for the guest to validate the checksum if it has already been
validated by the host NIC. So this patch introduces a new flag,
VIRTIO_NET_HDR_F_DATA_VALID, which is used to bypass checksum
verification in the guest. The backend (tap/macvtap) may set this flag when
it meets skbs with CHECKSUM_UNNECESSARY, to save CPU utilization.

No feature negotiation is needed, as old drivers simply ignore this flag.
Iperf shows a 12%-30% performance improvement for UDP traffic. For TCP,
there is no difference when GRO is on, as it produces skbs with partial
checksums. But when GRO is disabled, an improvement of 20% or more
could be measured by netperf.
Signed-off-by: Jason Wang
Acked-by: Michael S. Tsirkin
Signed-off-by: David S. Miller
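A sketch of the flag handling on both sides, under stated assumptions. The constants are local copies of VIRTIO_NET_HDR_F_NEEDS_CSUM (1) and VIRTIO_NET_HDR_F_DATA_VALID (2). The enum and function names are hypothetical. The guest is assumed to check NEEDS_CSUM first, since a partially checksummed packet still has to be completed.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the virtio_net_hdr flag bits. */
#define HDR_F_NEEDS_CSUM 1u
#define HDR_F_DATA_VALID 2u

/* What the guest does with a received packet's checksum. */
enum rx_csum { RX_CSUM_PARTIAL, RX_CSUM_TRUSTED, RX_CSUM_VERIFY_IN_GUEST };

/* Guest side: trust the host's verification when DATA_VALID is set. */
enum rx_csum guest_rx_csum(uint8_t hdr_flags)
{
    if (hdr_flags & HDR_F_NEEDS_CSUM)
        return RX_CSUM_PARTIAL;         /* checksum still to be completed */
    if (hdr_flags & HDR_F_DATA_VALID)
        return RX_CSUM_TRUSTED;         /* host NIC already verified it */
    return RX_CSUM_VERIFY_IN_GUEST;     /* behavior without the flag */
}

/* Backend side (tap/macvtap): flag skbs already marked
 * CHECKSUM_UNNECESSARY. */
uint8_t backend_hdr_flags(int checksum_unnecessary)
{
    return checksum_unnecessary ? HDR_F_DATA_VALID : 0;
}
```

An old guest that predates the flag never tests bit 2 and always takes the verify-in-guest path. This is why no feature negotiation is needed.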
07 Jun, 2011
1 commit
-
In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
added a small information leak.

Add a padding field and make sure it is zeroed before copying to user.
Signed-off-by: Eric Dumazet
CC: Patrick McHardy
Signed-off-by: David S. Miller
06 Jun, 2011
2 commits
-
This saves a network device lookup on each packet transmitted,
for sockets that are bound to a network device.
Signed-off-by: Ben Greear
Signed-off-by: David S. Miller
-
Old code was probably safe, but with this change we
can actually use the netdev object, not just compare
the pointer values.
Signed-off-by: Ben Greear
Signed-off-by: David S. Miller
02 Jun, 2011
1 commit
-
Currently, user-space cannot determine whether a tp_vlan_tci of 0
means there is no VLAN tag or the VLAN ID was zero.

Add a flag to make this explicit. User-space can check for
TP_STATUS_VLAN_VALID || tp_vlan_tci > 0, which is backwards
compatible. Older code would have just checked tp_vlan_tci,
so it will work no worse than before.
Signed-off-by: Ben Greear
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
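The backwards-compatible check suggested above fits in one helper. This is a minimal sketch that assumes the caller has already read tp_status and tp_vlan_tci from a ring frame or auxdata. The constant is a local copy of TP_STATUS_VLAN_VALID (1 << 4), and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of TP_STATUS_VLAN_VALID from <linux/if_packet.h>. */
#define STATUS_VLAN_VALID (1u << 4)

/* Newer kernels set the flag, so a TCI of 0 can still mean "tagged,
 * VLAN ID 0". Older kernels never set it, so fall back to the old
 * "non-zero TCI" heuristic. */
bool frame_has_vlan(uint32_t tp_status, uint16_t tp_vlan_tci)
{
    return (tp_status & STATUS_VLAN_VALID) != 0 || tp_vlan_tci > 0;
}
```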
24 May, 2011
1 commit
-
The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs. If kptr_restrict is set to 1, the default, and the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges. Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

The supporting code for kptr_restrict and %pK is currently in the -mm
tree. This patch converts users of %p in net/ to %pK. Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging, and reading the syslog is
already optionally protected by the dmesg_restrict sysctl.
Signed-off-by: Dan Rosenberg
Cc: James Morris
Cc: Eric Dumazet
Cc: Thomas Graf
Cc: Eugene Teo
Cc: Kees Cook
Cc: Ingo Molnar
Cc: David S. Miller
Cc: Peter Zijlstra
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller
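The three kptr_restrict levels reduce to a single "hide or show" decision. The sketch below is a userspace model of those rules, not the kernel's vsprintf code. `has_cap_syslog` is a hypothetical stand-in for the reader's capability check, and the output is zero-padded to the full pointer width.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Model of %pK: kptr_restrict mirrors the sysctl, has_cap_syslog
 * stands in for the reader's CAP_SYSLOG check. */
void format_pk(char *buf, size_t len, const void *ptr,
               int kptr_restrict, bool has_cap_syslog)
{
    assert(kptr_restrict >= 0 && kptr_restrict <= 2);

    bool hide = kptr_restrict == 2 ||
                (kptr_restrict == 1 && !has_cap_syslog);

    /* Hidden pointers print as zeros, not "(null)", so userland %p
     * parsers still accept the output. */
    uintptr_t shown = hide ? 0 : (uintptr_t)ptr;

    /* Zero-padded hex, two digits per pointer byte. */
    snprintf(buf, len, "%0*" PRIxPTR, (int)(2 * sizeof(void *)), shown);
}
```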
28 Apr, 2011
1 commit
-
In order to speed up packet filtering, here is an implementation of a
JIT compiler for x86_64.

It is disabled by default, and must be enabled by the admin:
echo 1 >/proc/sys/net/core/bpf_jit_enable

It uses module_alloc() and module_free() to get memory in the 2GB text
kernel range, since we call helper functions from the generated code.

EAX : BPF A accumulator
EBX : BPF X accumulator
RDI : pointer to skb (first argument given to JIT function)
RBP : frame pointer (even if CONFIG_FRAME_POINTER=n)
r9d : skb->len - skb->data_len (headlen)
r8 : skb->data

To get a trace of generated code, use:
echo 2 >/proc/sys/net/core/bpf_jit_enable
Example of generated code :
# tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24
flen=18 proglen=147 pass=3 image=ffffffffa00b5000
JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60
JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00
JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00
JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be
JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0
JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00
JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00
JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24
JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31
JIT code: ffffffffa00b5090: c0 c9 c3

The BPF program is 144 bytes long (18 instructions of 8 bytes each), so the native program is almost the same size ;)
(000) ldh [12]
(001) jeq #0x800 jt 2 jf 8
(002) ld [26]
(003) and #0xffffff00
(004) jeq #0xc0a81400 jt 16 jf 5
(005) ld [30]
(006) and #0xffffff00
(007) jeq #0xc0a81400 jt 16 jf 17
(008) jeq #0x806 jt 10 jf 9
(009) jeq #0x8035 jt 10 jf 17
(010) ld [28]
(011) and #0xffffff00
(012) jeq #0xc0a81400 jt 16 jf 13
(013) ld [38]
(014) and #0xffffff00
(015) jeq #0xc0a81400 jt 16 jf 17
(016) ret #65535
(017) ret #0

Signed-off-by: Eric Dumazet
Cc: Arnaldo Carvalho de Melo
Cc: Ben Hutchings
Cc: Hagen Paul Pfeifer
Signed-off-by: David S. Miller
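To make the listing concrete, here is a minimal userspace interpreter that covers only the five classic BPF instructions the program uses (ldh and ld with absolute offsets, and, jeq, ret). Opcode values are the classic BPF encodings, for example BPF_LD|BPF_H|BPF_ABS = 0x28. The OP_* names, run_filter() and the program table name are hypothetical, and this is neither the kernel's sk_run_filter() nor the JIT. tcpdump's listing prints absolute jump targets, whereas an encoded program stores offsets relative to the next instruction, so each jt/jf below has been converted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic BPF opcodes used by the program above. */
enum {
    OP_LD_ABS  = 0x20,  /* BPF_LD|BPF_W|BPF_ABS */
    OP_LDH_ABS = 0x28,  /* BPF_LD|BPF_H|BPF_ABS */
    OP_AND_K   = 0x54,  /* BPF_ALU|BPF_AND|BPF_K */
    OP_JEQ_K   = 0x15,  /* BPF_JMP|BPF_JEQ|BPF_K */
    OP_RET_K   = 0x06,  /* BPF_RET|BPF_K */
};

/* Same layout as struct sock_filter: 8 bytes per instruction. */
struct insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

/* Loads are big-endian (network order); an out-of-bounds load or an
 * unknown opcode rejects the packet by returning 0. */
uint32_t run_filter(const struct insn *prog, size_t n,
                    const uint8_t *pkt, size_t len)
{
    uint32_t a = 0;

    assert(prog != NULL);
    for (size_t pc = 0; pc < n; pc++) {
        const struct insn *f = &prog[pc];

        switch (f->code) {
        case OP_LDH_ABS:
            if (f->k > len || len - f->k < 2)
                return 0;
            a = (uint32_t)pkt[f->k] << 8 | pkt[f->k + 1];
            break;
        case OP_LD_ABS:
            if (f->k > len || len - f->k < 4)
                return 0;
            a = (uint32_t)pkt[f->k] << 24 | (uint32_t)pkt[f->k + 1] << 16 |
                (uint32_t)pkt[f->k + 2] << 8 | pkt[f->k + 3];
            break;
        case OP_AND_K:
            a &= f->k;
            break;
        case OP_JEQ_K:
            /* Relative jump: the loop's pc++ adds the final +1. */
            pc += (a == f->k) ? f->jt : f->jf;
            break;
        case OP_RET_K:
            return f->k;
        default:
            return 0;
        }
    }
    return 0;
}

/* The 18-instruction program above (jt/jf = target - pc - 1). */
const struct insn net_192_168_20[] = {
    { OP_LDH_ABS, 0, 0, 12 },
    { OP_JEQ_K,   0, 6, 0x800 },
    { OP_LD_ABS,  0, 0, 26 },
    { OP_AND_K,   0, 0, 0xffffff00 },
    { OP_JEQ_K,  11, 0, 0xc0a81400 },
    { OP_LD_ABS,  0, 0, 30 },
    { OP_AND_K,   0, 0, 0xffffff00 },
    { OP_JEQ_K,   8, 9, 0xc0a81400 },
    { OP_JEQ_K,   1, 0, 0x806 },
    { OP_JEQ_K,   0, 7, 0x8035 },
    { OP_LD_ABS,  0, 0, 28 },
    { OP_AND_K,   0, 0, 0xffffff00 },
    { OP_JEQ_K,   3, 0, 0xc0a81400 },
    { OP_LD_ABS,  0, 0, 38 },
    { OP_AND_K,   0, 0, 0xffffff00 },
    { OP_JEQ_K,   0, 1, 0xc0a81400 },
    { OP_RET_K,   0, 0, 65535 },
    { OP_RET_K,   0, 0, 0 },
};
```

An Ethernet/IPv4 frame with source address 192.168.20.5 returns 65535 (accept). A frame whose source and destination are both outside 192.168.20.0/24 returns 0.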
08 Mar, 2011
1 commit
-
Signed-off-by: Hagen Paul Pfeifer
Signed-off-by: David S. Miller
12 Feb, 2011
1 commit
-
This allows user-space to send a '1500' MTU VLAN packet on a
1500 MTU Ethernet frame. The extra 4 bytes of a VLAN header are
not usually charged against the MTU when other parts of the
network stack are transmitting VLANs...
Signed-off-by: Ben Greear
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller
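The length rule can be sketched as a pure function. It assumes a raw socket on an Ethernet device with a 14-byte link-layer header and a non-GSO frame. A frame may exceed MTU plus the link header by at most one 4-byte 802.1Q tag, and only if its EtherType really is 0x8100. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN   14      /* dst MAC + src MAC + EtherType */
#define VLAN_TAG_LEN      4      /* one 802.1Q tag */
#define ETHERTYPE_8021Q  0x8100

/* Up to MTU + link header is always fine; 4 more bytes are allowed
 * only for frames that actually carry a VLAN tag. */
bool raw_frame_len_ok(const uint8_t *frame, size_t len, size_t mtu)
{
    assert(frame != NULL);
    if (len <= mtu + ETH_HEADER_LEN)
        return true;
    if (len > mtu + ETH_HEADER_LEN + VLAN_TAG_LEN)
        return false;
    /* len > mtu + 14 here, so bytes 12 and 13 are present. */
    uint16_t ethertype = (uint16_t)(frame[12] << 8 | frame[13]);
    return ethertype == ETHERTYPE_8021Q;
}
```

With a 1500-byte MTU, a 1518-byte tagged frame (14-byte header, 4-byte tag, 1500-byte payload) is accepted, while an untagged frame of the same length is rejected.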
20 Jan, 2011
1 commit
-
Clean up some unused macros in net/*:
1. Left over after code changes, e.g. PGV_FROM_VMALLOC, KMEM_SAFETYZONE.
2. Never used since being introduced to the kernel,
e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE.
Signed-off-by: Shan Wei
Acked-by: Sjur Braendeland
Signed-off-by: David S. Miller
19 Jan, 2011
1 commit
-
The packet filter (BPF) doesn't need to disable softirqs, being fully
re-entrant and lock-less.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
17 Dec, 2010
1 commit
-
Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset().
Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb).
Signed-off-by: Michał Mirosław
Signed-off-by: David S. Miller
11 Dec, 2010
1 commit
-
Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
09 Dec, 2010
3 commits
-
It was introduced in:
commit 0e3125c755445664f00ad036e4fc2cd32fd52877
Author: Neil Horman
Date: Tue Nov 16 10:26:47 2010 -0800

packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)
Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
-
Some arches don't need flush_dcache_page() and don't implement it, so
we can eliminate pgv_to_page() calls on those arches.
Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
-
sk_run_filter() doesn't write to the skb; change its prototype to reflect
this.

Fix two af_packet comments.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
07 Dec, 2010
2 commits
-
As we can check whether an address is a vmalloc address with is_vmalloc_addr(),
we remove pgv.flags. As a result, we may get more pg_vecs.
Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
-
The following commit means pgv->buffer may point to memory
returned by vmalloc(), and we can't use virt_to_page() on a vmalloc
address.

This patch introduces a new inline function, pgv_to_page(), which calls
vmalloc_to_page() for vmalloc addresses and virt_to_page() for
__get_free_pages addresses.

We used to increment the page pointer to get the page at the next page
address; after Neil's patch this is wrong, as the physical addresses may
not be contiguous. This patch also fixes this issue.

commit 0e3125c755445664f00ad036e4fc2cd32fd52877
Author: Neil Horman
Date: Tue Nov 16 10:26:47 2010 -0800

packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)
Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
22 Nov, 2010
1 commit
-
alloc_one_pg_vec_page() is supposed to return zeroed memory, so use
vzalloc() instead of vmalloc().
Signed-off-by: Eric Dumazet
Cc: Neil Horman
Acked-by: Neil Horman
Signed-off-by: David S. Miller
20 Nov, 2010
1 commit
-
Remove the pc variable to avoid the arithmetic needed to compute fentry at each
filter instruction. Jumps directly manipulate the fentry pointer.

As the last instruction of filter[] is guaranteed to be a RETURN, and
all jumps are before the last instruction, we don't need to check filter
bounds (number of instructions in the filter array) at each iteration, so we
remove it from the sk_run_filter() params.

On x86_32, remove the f_k var introduced in commit 57fe93b374a6b871
(filter: make sure filters dont read uninitialized memory).

Note: we could use a CONFIG_ARCH_HAS_{FEW|MANY}_REGISTERS in order to
avoid too many ifdefs in this code.

This helps the compiler use CPU registers to hold fentry and the A
accumulator.

On x86_32, this saves 401 bytes, and more importantly, sk_run_filter()
runs much faster because of lower register pressure (one fewer conditional
branch per BPF instruction).

# size net/core/filter.o net/core/filter_pre.o
   text    data     bss     dec     hex filename
   2948       0       0    2948     b84 net/core/filter.o
   3349       0       0    3349     d15 net/core/filter_pre.o

On x86_64:
# size net/core/filter.o net/core/filter_pre.o
   text    data     bss     dec     hex filename
   5173       0       0    5173    1435 net/core/filter.o
   5224       0       0    5224    1468 net/core/filter_pre.o

Signed-off-by: Eric Dumazet
Acked-by: Changli Gao
Signed-off-by: David S. Miller
17 Nov, 2010
1 commit
-
Version 4 of this patch.
Change notes:
1) Removed extra memset. Didn't think kcalloc added a GFP_ZERO the way kzalloc did :)

Summary:
It was shown to me recently that systems under high load were driven very deep
into swap when tcpdump was run. This happened because the
AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
application to specify how many entries an AF_PACKET socket will have and how
large each entry will be. It seems the default setting for tcpdump is to set
the ring buffer to 32 entries of 64 KB each, which implies 32 order 5
allocations. That's difficult under good circumstances, and horrid under memory
pressure.

I thought it would be good to make that a bit more usable. I was going to do a
simple conversion of the ring buffer from contiguous pages to iovecs, but
unfortunately, the metadata which AF_PACKET places in these buffers can easily
span a page boundary, and given that these buffers get mapped into user space,
and the data layout doesn't easily allow for a change to padding between frames
to avoid that, a simple iovec change is just going to break user space ABI
consistency.

So instead I've added a three-tiered mechanism to the af_packet set_ring
socket option. It attempts to allocate memory in the following order:
1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
digging into swap
2) Using vmalloc
3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
needed to get the memory

The effect is that we don't disturb the system as much when we're under load,
while still being able to conduct tcpdumps effectively.

Tested successfully by me.
Signed-off-by: Neil Horman
Acked-by: Eric Dumazet
Acked-by: Maciej Żenczykowski
Reported-by: Maciej Żenczykowski
Signed-off-by: David S. Miller
13 Nov, 2010
1 commit
-
Parameter 'len' is of type size_t, so it can never be negative.
Signed-off-by: Mariusz Kozlowski
Signed-off-by: David S. Miller
11 Nov, 2010
1 commit
-
packet_getname_spkt() doesn't initialize all members of the sa_data field of
the sockaddr struct if strlen(dev->name) < 13. This structure is then copied
to userland, leaking the contents of kernel stack memory.
We have to fully fill sa_data with strncpy() instead of strlcpy().

The same goes for packet_getname(): it doesn't initialize the sll_pkttype field of
sockaddr_ll. Set it to zero.
Signed-off-by: Vasiliy Kulikov
Signed-off-by: David S. Miller
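The difference between the two copies can be seen in userspace. This is a minimal sketch. `strlcpy_style()` is a hypothetical stand-in with strlcpy() semantics (copy at most size - 1 bytes, NUL-terminate, leave the rest alone), defined here because not every libc provides strlcpy(). `fill_sa_data()` is a hypothetical name for the fixed copy, and the 14-byte size matches sockaddr.sa_data.

```c
#include <assert.h>
#include <string.h>

#define SA_DATA_LEN 14  /* size of sockaddr.sa_data */

/* strlcpy-style copy: bytes after the terminator keep whatever was
 * there before, which is how stale stack data leaked. */
size_t strlcpy_style(char *dst, const char *src, size_t size)
{
    size_t n = strlen(src);

    if (size > 0) {
        size_t c = n < size - 1 ? n : size - 1;
        memcpy(dst, src, c);
        dst[c] = '\0';
    }
    return n;
}

/* The fix: strncpy() pads the rest of sa_data with NULs, so every byte
 * copied to userland is defined. It does not NUL-terminate when the
 * name is SA_DATA_LEN bytes or longer. */
void fill_sa_data(char *sa_data, const char *devname)
{
    assert(sa_data != NULL && devname != NULL);
    strncpy(sa_data, devname, SA_DATA_LEN);
}
```

Copying "eth0" into a 14-byte buffer pre-filled with junk leaves 9 junk bytes after the terminator with the strlcpy-style copy, and none with strncpy().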
19 Aug, 2010
1 commit
-
This patch removes the abstraction introduced by the union skb_shared_tx in
the shared skb data.

Accessing the different union elements at several places led to some
confusion about accessing the shared tx_flags, e.g. in skb_orphan_try().

http://marc.info/?l=linux-netdev&m=128084897415886&w=2
Signed-off-by: Oliver Hartkopp
Signed-off-by: David S. Miller
02 Jun, 2010
1 commit
-
This patch adds a setting, PACKET_TIMESTAMP, to specify the packet
timestamp source that packet_mmap exports to capture utilities like
tcpdump.

PACKET_TIMESTAMP accepts the same integer bit field as
SO_TIMESTAMPING. However, only the SOF_TIMESTAMPING_SYS_HARDWARE and
SOF_TIMESTAMPING_RAW_HARDWARE values are currently recognized by
PACKET_TIMESTAMP. SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over
SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.

If PACKET_TIMESTAMP is not set, a software timestamp generated inside
the networking stack is used (the behavior before this setting was
added).
Signed-off-by: Scott McMillan
Signed-off-by: David S. Miller
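The precedence rule can be written as a small selection function. This sketch only models the option bits. It assumes the value was set with `setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, &val, sizeof(val))`, and it does not model a driver that supplies no hardware stamp for a particular frame. The constants are local copies of SOF_TIMESTAMPING_SYS_HARDWARE (1 << 5) and SOF_TIMESTAMPING_RAW_HARDWARE (1 << 6). The enum and function names are hypothetical.

```c
#include <assert.h>

/* Local copies of the SO_TIMESTAMPING bits from <linux/net_tstamp.h>. */
#define TS_SYS_HARDWARE (1u << 5)
#define TS_RAW_HARDWARE (1u << 6)

enum ts_source { TS_SRC_SOFTWARE, TS_SRC_SYS_HARDWARE, TS_SRC_RAW_HARDWARE };

/* SYS_HARDWARE beats RAW_HARDWARE, other bits are ignored, and with
 * neither bit set the stack's software timestamp is used. */
enum ts_source pick_ts_source(unsigned int packet_timestamp)
{
    if (packet_timestamp & TS_SYS_HARDWARE)
        return TS_SRC_SYS_HARDWARE;
    if (packet_timestamp & TS_RAW_HARDWARE)
        return TS_SRC_RAW_HARDWARE;
    return TS_SRC_SOFTWARE;
}
```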
21 Apr, 2010
1 commit
-
Conflicts:
drivers/net/wireless/iwlwifi/iwl-6000.c
net/core/dev.c
17 Apr, 2010
1 commit
-
The af_packet protocol is used by Perl to do ioctls, as reported by
Stephane Riviere:

"Net::RawIP relies on SIOCGIFADDR and SIOCGIFHWADDR to get the IP and MAC
addresses of the network interface."

But in a new network namespace these ioctls fail, because they are disabled
for namespaces other than init_net_ns.

These two lines should not be there, as af_inet and af_packet have been
namespace aware for a long time now. I suppose we forgot to remove these
lines because we sent the af_packet changes first, before af_inet was supported.
Signed-off-by: Daniel Lezcano
Reported-by: Stephane Riviere
Signed-off-by: David S. Miller
13 Apr, 2010
1 commit
-
Enable the SO_TIMESTAMPING socket infrastructure for raw packet sockets.
We introduce PACKET_TX_TIMESTAMP for the control message cmsg_type.

Similar support for UDP and CAN sockets was added in commit
51f31cabe3ce5345b51e4a4f82138b38c4d5dc91
Signed-off-by: Richard Cochran
Signed-off-by: David S. Miller
12 Apr, 2010
1 commit
-
Conflicts:
drivers/net/stmmac/stmmac_main.c
drivers/net/wireless/wl12xx/wl1271_cmd.c
drivers/net/wireless/wl12xx/wl1271_main.c
drivers/net/wireless/wl12xx/wl1271_spi.c
net/core/ethtool.c
net/mac80211/scan.c
04 Apr, 2010
2 commits
-
Converts the list and the core code manipulating it to be the same as uc_list.
+uses two functions for adding/removing mc addresses (normal and "global"
variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
manipulating lists in a sandbox (used in bonding and 80211 drivers)
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
-
+minor renaming of unicast functions to be consistent with the multicast ones
Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller