14 Oct, 2013
1 commit
-
[ Upstream commit 703133de331a7a7df47f31fb9de51dc6f68a9de8 ]
If local fragmentation is allowed, then ip_select_ident() and
ip_select_ident_more() need to generate unique IDs to ensure
correct defragmentation on the peer.For example, if IPsec (tunnel mode) has to encrypt large skbs
that have local_df bit set, then all IP fragments that belonged
to different ESP datagrams would have used the same identificator.
If one of these IP fragments would get lost or reordered, then
peer could possibly stitch together wrong IP fragments that did
not belong to the same datagram. This would lead to a packet loss
or data corruption.Signed-off-by: Ansis Atteka
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
14 Sep, 2013
1 commit
-
[ Upstream commit c27c9322d015dc1d9dfdf31724fca71c0476c4d1 ]
ipv4: raw_sendmsg: don't use header's destination address
A sendto() regression was bisected and found to start with commit
f8126f1d5136be1 (ipv4: Adjust semantics of rt->rt_gateway.)The problem is that it tries to ARP-lookup the constructed packet's
destination address rather than the explicitly provided address.Fix this using FLOWI_FLAG_KNOWN_NH so that given nexthop is used.
cf. commit 2ad5b9e4bd314fc685086b99e90e5de3bc59e26b
Reported-by: Chris Clark
Bisected-by: Chris Clark
Tested-by: Chris Clark
Suggested-by: Julian Anastasov
Signed-off-by: Chris Clark
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
28 Feb, 2013
1 commit
-
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;type T;
expression a,c,d,e;
identifier b;
statement S;
@@-T b;
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin
Acked-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Cc: Wu Fengguang
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
19 Feb, 2013
2 commits
-
proc_net_remove is only used to remove proc entries
that under /proc/net,it's not a general function for
removing proc entries of netns. if we want to remove
some proc entries which under /proc/net/stat/, we still
need to call remove_proc_entry.this patch use remove_proc_entry to replace proc_net_remove.
we can remove proc_net_remove after this patch.Signed-off-by: Gao feng
Signed-off-by: David S. Miller -
Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.Signed-off-by: Gao feng
Signed-off-by: David S. Miller
22 Jan, 2013
1 commit
-
This implements a socket release callback function to check
if the socket cached route got invalid during the time
we owned the socket. The function is used from udp, raw
and ping sockets.Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
03 Oct, 2012
1 commit
-
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
23 Sep, 2012
1 commit
-
icmp_filter() should not modify its input, or else its caller
would need to recompute ip_hdr() if skb->head is reallocated.Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
15 Aug, 2012
1 commit
-
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
Cc: Arnaldo Carvalho de Melo
Cc: Sridhar Samudrala
Acked-by: Vlad Yasevich
Acked-by: David S. Miller
Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman
12 Jul, 2012
1 commit
-
Signed-off-by: David S. Miller
15 Jun, 2012
1 commit
-
With ip_rt_frag_needed() removed, we have to explicitly update PMTU
information in every ICMP error handler.Create two helper functions to facilitate this.
1) ipv4_sk_update_pmtu()
This updates the PMTU when we have a socket context to
work with.2) ipv4_update_pmtu()
Raw version, used when no socket context is available. For this
interface, we essentially just pass in explicit arguments for
the flow identity information we would have extracted from the
socket.And you'll notice that ipv4_sk_update_pmtu() is simply implemented
in terms of ipv4_update_pmtu()Note that __ip_route_output_key() is used, rather than something like
ip_route_output_flow() or ip_route_output_key(). This is because we
absolutely do not want to end up with a route that does IPSEC
encapsulation and the like. Instead, we only want the route that
would get us to the node described by the outermost IP header.Reported-by: Steffen Klassert
Signed-off-by: David S. Miller
16 Apr, 2012
1 commit
-
Fix checkpatch errors of the following type:
* ERROR: "foo * bar" should be "foo *bar"
* ERROR: "(foo*)" should be "(foo *)"Signed-off-by: Daniel Baluta
Signed-off-by: David S. Miller
12 Mar, 2012
1 commit
-
Use a more current kernel messaging style.
Convert a printk block to print_hex_dump.
Coalesce formats, align arguments.
Use %s, __func__ instead of embedding function names.Some messages that were prefixed with _close are
now prefixed with _fini. Some ah4 and esp messages
are now not prefixed with "ip ".The intent of this patch is to later add something like
#define pr_fmt(fmt) "IPv4: " fmt.
to standardize the output messages.Text size is trivially reduced. (x86-32 allyesconfig)
$ size net/ipv4/built-in.o*
text data bss dec hex filename
887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new
887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.oldSigned-off-by: Joe Perches
Signed-off-by: David S. Miller
09 Feb, 2012
1 commit
-
The IP_UNICAST_IF feature is needed by the Wine project. This patch
implements the feature by setting the outgoing interface in a similar
fashion to that of IP_MULTICAST_IF. A separate option is needed to
handle this feature since the existing options do not provide all of
the characteristics required by IP_UNICAST_IF, a summary is provided
below.SO_BINDTODEVICE:
* SO_BINDTODEVICE requires administrative privileges, IP_UNICAST_IF
does not. From reading some old mailing list articles my
understanding is that SO_BINDTODEVICE requires administrative
privileges because it can override the administrator's routing
settings.
* The SO_BINDTODEVICE option restricts both outbound and inbound
traffic, IP_UNICAST_IF only impacts outbound traffic.IP_PKTINFO:
* Since IP_PKTINFO and IP_UNICAST_IF are independent options,
implementing IP_UNICAST_IF with IP_PKTINFO will likely break some
applications.
* Implementing IP_UNICAST_IF on top of IP_PKTINFO significantly
complicates the Wine codebase and reduces the socket performance
(doing this requires a lot of extra communication between the
"server" and "user" layers).bind():
* bind() does not work on broadcast packets, IP_UNICAST_IF is
specifically intended to work with broadcast packets.
* Like SO_BINDTODEVICE, bind() restricts both outbound and inbound
traffic.Signed-off-by: Erich E. Hoover
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
19 Nov, 2011
1 commit
-
ipv4: Remove all uses of LL_ALLOCATED_SPACE
The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
alignment to the sum of needed_headroom and needed_tailroom. As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv4
with the macro LL_RESERVED_SPACE and direct reference to
needed_tailroom.This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
10 Nov, 2011
1 commit
-
Le lundi 07 novembre 2011 à 15:33 +0100, Eric Dumazet a écrit :
> At least, in recent kernels we dont change dst->refcnt in forwarding
> patch (usinf NOREF skb->dst)
>
> One particular point is the atomic_inc(dst->refcnt) we have to perform
> when queuing an UDP packet if socket asked PKTINFO stuff (for example a
> typical DNS server has to setup this option)
>
> I have one patch somewhere that stores the information in skb->cb[] and
> avoid the atomic_{inc|dec}(dst->refcnt).
>OK I found it, I did some extra tests and believe its ready.
[PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference
When a socket uses IP_PKTINFO notifications, we currently force a dst
reference for each received skb. Reader has to access dst to get needed
information (rt_iif & rt_spec_dst) and must release dst reference.We also forced a dst reference if skb was put in socket backlog, even
without IP_PKTINFO handling. This happens under stress/load.We can instead store the needed information in skb->cb[], so that only
softirq handler really access dst, improving cache hit ratios.This removes two atomic operations per packet, and false sharing as
well.On a benchmark using a mono threaded receiver (doing only recvmsg()
calls), I can reach 720.000 pps instead of 570.000 pps.IP_PKTINFO is typically used by DNS servers, and any multihomed aware
UDP application.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
01 Nov, 2011
1 commit
-
These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.Signed-off-by: Paul Gortmaker
08 Aug, 2011
1 commit
-
The raw sockets can provide source address for
routing but their privileges are not considered. We
can provide non-local source address, make sure the
FLOWI_FLAG_ANYSRC flag is set if socket has privileges
for this, i.e. based on hdrincl (IP_HDRINCL) and
transparent flags.Signed-off-by: Julian Anastasov
Signed-off-by: David S. Miller
27 Jul, 2011
1 commit
-
This allows us to move duplicated code in
(atomic_inc_not_zero() for now) toSigned-off-by: Arun Sharma
Reviewed-by: Eric Dumazet
Cc: Ingo Molnar
Cc: David Miller
Cc: Eric Dumazet
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Jul, 2011
1 commit
-
Make the case labels the same indent as the switch.
git diff -w shows no difference.
Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
24 May, 2011
1 commit
-
The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs. If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges. Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".The supporting code for kptr_restrict and %pK are currently in the -mm
tree. This patch converts users of %p in net/ to %pK. Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.Signed-off-by: Dan Rosenberg
Cc: James Morris
Cc: Eric Dumazet
Cc: Thomas Graf
Cc: Eugene Teo
Cc: Kees Cook
Cc: Ingo Molnar
Cc: David S. Miller
Cc: Peter Zijlstra
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller
09 May, 2011
2 commits
-
This way rt->rt_dst accesses are unnecessary.
Signed-off-by: David S. Miller
-
This way ip_output.c no longer needs rt->rt_{src,dst}.
We already have these keys sitting, ready and waiting, on the stack or
in a socket structure.Signed-off-by: David S. Miller
29 Apr, 2011
1 commit
-
We lack proper synchronization to manipulate inet->opt ip_options
Problem is ip_make_skb() calls ip_setup_cork() and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
without any protection against another thread manipulating inet->opt.Another thread can change inet->opt pointer and free old one under us.
Use RCU to protect inet->opt (changed to inet->inet_opt).
Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.We cant insert an rcu_head in struct ip_options since its included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.Signed-off-by: Eric Dumazet
Cc: Herbert Xu
Signed-off-by: David S. Miller
23 Apr, 2011
1 commit
-
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
12 Apr, 2011
1 commit
-
Conflicts:
drivers/net/smsc911x.c
31 Mar, 2011
2 commits
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
-
Signed-off-by: David S. Miller
29 Mar, 2011
1 commit
-
Reported-by: Marc Kleine-Budde
Signed-off-by: David S. Miller
13 Mar, 2011
4 commits
-
Signed-off-by: David S. Miller
-
Signed-off-by: David S. Miller
-
Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*This will let us to create AF optimal flow instances.
It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.Signed-off-by: David S. Miller
-
I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.This is the first step to move in that direction.
Signed-off-by: David S. Miller
03 Mar, 2011
1 commit
-
Instead of on the stack.
Signed-off-by: David S. Miller
02 Mar, 2011
3 commits
-
This boolean state is now available in the flow flags.
Signed-off-by: David S. Miller
-
And set is in contexts where the route resolution can sleep.
Signed-off-by: David S. Miller
-
Since that is what the current vague "flags" argument means.
Signed-off-by: David S. Miller
30 Jan, 2011
1 commit
-
SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1,
which unfortunately means the existing infrastructure for compat networking
ioctls is insufficient. A trivial compact ioctl implementation would conflict
with:SIOCAX25ADDUID
SIOCAIPXPRISLT
SIOCGETSGCNT_IN6
SIOCGETSGCNT
SIOCRSSCAUSE
SIOCX25SSUBSCRIP
SIOCX25SDTEFACILITIESTo make this work I have updated the compat_ioctl decode path to mirror the
the normal ioctl decode path. I have added an ipv4 inet_compat_ioctl function
so that I can have ipv4 specific compat ioctls. I have added a compat_ioctl
function into struct proto so I can break out ioctls by which kind of ip socket
I am using. I have added a compat_raw_ioctl function because SIOCGETSGCNT only
works on raw sockets. I have added a ipmr_compat_ioctl that mirrors the normal
ipmr_ioctl.This was necessary because unfortunately the struct layout for the SIOCGETSGCNT
has unsigned longs in it so changes between 32bit and 64bit kernels.This change was sufficient to run a 32bit ip multicast routing daemon on a
64bit kernel.Reported-by: Bill Fenner
Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller
18 Nov, 2010
1 commit
-
Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
19 Aug, 2010
1 commit
-
This patch removes the abstraction introduced by the union skb_shared_tx in
the shared skb data.The access of the different union elements at several places led to some
confusion about accessing the shared tx_flags e.g. in skb_orphan_try().http://marc.info/?l=linux-netdev&m=128084897415886&w=2
Signed-off-by: Oliver Hartkopp
Signed-off-by: David S. Miller