19 Feb, 2013

2 commits

  • proc_net_remove is only used to remove proc entries
    that are under /proc/net; it is not a general function for
    removing the proc entries of a netns. If we want to remove
    proc entries that are under /proc/net/stat/, we still
    need to call remove_proc_entry.

    This patch uses remove_proc_entry to replace proc_net_remove.
    We can remove proc_net_remove after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    This looks a little chaotic. This patch changes all uses of
    proc_net_fops_create to proc_create. We can remove
    proc_net_fops_create after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

01 Feb, 2013

1 commit


02 Nov, 2012

1 commit

  • #if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)

    can be replaced by

    #if IS_ENABLED(CONFIG_FOO)

    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Amerigo Wang
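
The equivalence described in this commit can be tried out in userspace. The sketch below only imitates the preprocessor trick behind IS_ENABLED(); it is not the kernel header. The DEMO_ helper macros and the CONFIG_BAR and CONFIG_BAZ options are hypothetical names chosen for illustration, and it assumes Kconfig-style defines (CONFIG_FOO set to 1 for built-in code, CONFIG_FOO_MODULE set to 1 for modular code, nothing defined when the option is off).

```c
/* Userspace model of IS_ENABLED(); helper macro names are hypothetical. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_ARG_PLACEHOLDER_##val)
/* If val was 1, the placeholder expands to "0," and shifts 1 into second place. */
#define DEMO_IS_DEFINED__(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define IS_ENABLED(option) \
	(DEMO_IS_DEFINED(option) || DEMO_IS_DEFINED(option##_MODULE))

#define CONFIG_BAR 1         /* hypothetical option, built in */
#define CONFIG_FOO_MODULE 1  /* CONFIG_FOO built as a module */
/* CONFIG_BAZ is deliberately left undefined (option disabled). */

/* Old spelling from the commit message. */
#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)
#define FOO_OLD_STYLE 1
#else
#define FOO_OLD_STYLE 0
#endif

/* New spelling; it should agree with the old one. */
#if IS_ENABLED(CONFIG_FOO)
#define FOO_NEW_STYLE 1
#else
#define FOO_NEW_STYLE 0
#endif

int foo_old_style(void) { return FOO_OLD_STYLE; }
int foo_new_style(void) { return FOO_NEW_STYLE; }

/* Unlike defined(), the macro also works inside ordinary C expressions. */
int bar_enabled(void) { return IS_ENABLED(CONFIG_BAR); }
int baz_enabled(void) { return IS_ENABLED(CONFIG_BAZ); }
```

In this model both spellings select the same branch for CONFIG_FOO, and the disabled CONFIG_BAZ evaluates to 0 rather than causing a compile error.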
     

03 Oct, 2012

1 commit

  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values as far down into
    subsystems and filesystems as reasonable, leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network.
    I let the type-incompatibility compile errors (present when user
    namespaces are enabled) guide me to find the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privilege checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe, allowing
    root in a user namespace to do those things that today we only forbid to
    non-root users because it will confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/gid logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
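
The strategy in the pull message (typed ids inside the kernel, conversion only at the edges) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel implementation: the demo_ functions, the single-extent struct demo_uid_map and the example mapping are all hypothetical, and only the idea of wrapping the raw number in a struct is taken from the text above.

```c
#include <stdint.h>

/* Wrapping the raw number in a struct makes kuid_t and kgid_t distinct
 * types, so mixing them up (or passing a plain integer) fails to compile. */
typedef struct { uint32_t val; } kuid_t;
typedef struct { uint32_t val; } kgid_t;

#define DEMO_INVALID_ID UINT32_MAX

/* Hypothetical single-extent mapping for one user namespace. */
struct demo_uid_map {
	uint32_t ns_first;      /* first id as seen inside the namespace */
	uint32_t kernel_first;  /* corresponding kernel-internal id */
	uint32_t count;         /* number of ids in the extent */
};

/* Example mapping: namespace uids 0..65535 map to 100000..165535. */
const struct demo_uid_map demo_map = { 0, 100000, 65536 };

/* Inbound edge: a uid supplied by userspace becomes a typed kuid_t. */
kuid_t demo_make_kuid(const struct demo_uid_map *map, uint32_t uid)
{
	kuid_t k = { DEMO_INVALID_ID };

	if (uid >= map->ns_first && uid - map->ns_first < map->count)
		k.val = map->kernel_first + (uid - map->ns_first);
	return k;
}

/* Outbound edge: a kuid_t is translated back for a given namespace. */
uint32_t demo_from_kuid(const struct demo_uid_map *map, kuid_t k)
{
	if (k.val >= map->kernel_first && k.val - map->kernel_first < map->count)
		return map->ns_first + (k.val - map->kernel_first);
	return DEMO_INVALID_ID;
}

/* Deep inside a subsystem only typed values are compared. */
int demo_uid_eq(kuid_t a, kuid_t b)
{
	return a.val == b.val;
}
```

Passing a kgid_t to demo_uid_eq() is rejected by the compiler, which is the kind of error the message describes using as a guide.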
     

26 Sep, 2012

1 commit

  • icmpv6_filter() should not modify its input, or else its caller
    would need to recompute ipv6_hdr() if skb->head is reallocated.

    Use skb_header_pointer() instead of pskb_may_pull() and
    change the prototype to make clear both sk and skb are const.

    Also, if icmpv6 header cannot be found, do not deliver the packet,
    as we do in IPv4.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
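
A rough userspace model of the approach in this commit: read the header through a const pointer, copying into a caller-supplied buffer only when the header is not contiguous, and refuse delivery when the header is missing. struct demo_pkt, demo_header_pointer() and demo_icmp6_filter() are hypothetical stand-ins, not the kernel's skb API, and the 256-bit "blocked types" bitmap is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb: a linear head plus one fragment. */
struct demo_pkt {
	const uint8_t *head; size_t head_len;
	const uint8_t *frag; size_t frag_len;
};

/* First four bytes of an ICMPv6 header (byte fields only, alignment 1). */
struct demo_icmp6hdr { uint8_t type; uint8_t code; uint8_t cksum[2]; };

/* Return a pointer to len bytes at off without modifying the packet. */
const void *demo_header_pointer(const struct demo_pkt *p, size_t off,
				size_t len, void *tmp)
{
	size_t total = p->head_len + p->frag_len;
	uint8_t *out = tmp;

	if (off > total || len > total - off)
		return NULL;                /* header is not there at all */
	if (len <= p->head_len && off <= p->head_len - len)
		return p->head + off;       /* contiguous: no copy needed */
	for (size_t i = 0; i < len; i++) {  /* straddles the fragment: copy */
		size_t pos = off + i;
		out[i] = pos < p->head_len ? p->head[pos]
					   : p->frag[pos - p->head_len];
	}
	return tmp;
}

/* Returns 1 when the packet must not be delivered, 0 otherwise. */
int demo_icmp6_filter(const uint32_t blocked[8], const struct demo_pkt *p,
		      size_t off)
{
	struct demo_icmp6hdr tmp;
	const struct demo_icmp6hdr *hdr =
		demo_header_pointer(p, off, sizeof(tmp), &tmp);

	if (!hdr)
		return 1;                   /* no header found: do not deliver */
	unsigned int type = hdr->type;
	return (blocked[type >> 5] & (UINT32_C(1) << (type & 31))) != 0;
}

/* Example: a type-128 header that straddles head and fragment at offset 1. */
int demo_filter_example(int block_type_128, size_t off)
{
	static const uint8_t head[] = { 0x00, 128 };
	static const uint8_t frag[] = { 0x00, 0x12, 0x34 };
	const struct demo_pkt pkt = { head, sizeof(head), frag, sizeof(frag) };
	uint32_t blocked[8] = { 0 };

	if (block_type_128)
		blocked[128 >> 5] |= UINT32_C(1) << (128 & 31);
	return demo_icmp6_filter(blocked, &pkt, off);
}
```

Because both arguments are const, a caller in this model never has to recompute pointers into the packet after the filter has run.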
     

15 Aug, 2012

1 commit


12 Jul, 2012

1 commit


20 Jun, 2012

1 commit

  • Don't pretend that inet_protos[] and inet6_protos[] are hashes, they
    are just straight arrays. Remove all unnecessary hash masking.

    Document MAX_INET_PROTOS.

    Use RAW_HTABLE_SIZE when appropriate.

    Reported-by: Ben Hutchings
    Signed-off-by: David S. Miller

    David S. Miller
     

16 Jun, 2012

1 commit

  • One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts
    to handle the error pass the 32-bit info cookie in network byte order
    whereas ipv4 passes it around in host byte order.

    Like the ipv4 side, we have two helper functions. One for when we
    have a socket context and one for when we do not.

    ip6ip6 tunnels are not handled here, because they handle PMTU events
    by essentially relaying another ICMP packet-too-big message back to
    the original sender.

    This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all
    kinds of situations that simply cannot happen when we do the PMTU
    update directly using a fully resolved route.

    In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very
    likely be removed or changed into a BUG_ON() check. We should never
    have a prefixed ipv6 route when we get there.

    Another piece of strange history here is that TCP and DCCP, unlike in
    ipv4, never invoke the update_pmtu() method from their ICMP error
    handlers. This is astonishing, since this is where we have the most
    accurate context in which to make a PMTU update, namely a fully
    connected socket and its associated cached socket route.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 May, 2012

1 commit


09 Feb, 2012

1 commit


13 Jan, 2012

1 commit

  • commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
    RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
    complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
    y).

    We miss needed barriers, even on x86, when y is not NULL.

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    CC: Paul E. McKenney
    Signed-off-by: David S. Miller

    Eric Dumazet
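
The difference this commit restores can be modelled with C11 atomics. This is a sketch under assumptions, not the kernel macros: demo_assign_pointer() stands in for rcu_assign_pointer() as a release store, demo_init_pointer() stands in for RCU_INIT_POINTER() as a store with no ordering, and all demo_ names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_item { int value; };

static _Atomic(struct demo_item *) demo_shared; /* pointer seen by readers */
static struct demo_item demo_storage;

/* Model of rcu_assign_pointer(): the release store orders the earlier
 * initialisation of *p before the pointer becomes visible. */
void demo_assign_pointer(struct demo_item *p)
{
	atomic_store_explicit(&demo_shared, p, memory_order_release);
}

/* Model of RCU_INIT_POINTER(): no ordering at all. Fine for NULL, where
 * there is nothing to initialise, but not for a freshly filled object. */
void demo_init_pointer(struct demo_item *p)
{
	atomic_store_explicit(&demo_shared, p, memory_order_relaxed);
}

/* Reader side of the model. */
struct demo_item *demo_dereference(void)
{
	return atomic_load_explicit(&demo_shared, memory_order_acquire);
}

/* Publishing a non-NULL object needs the ordered store. */
void demo_publish(int value)
{
	demo_storage.value = value;
	demo_assign_pointer(&demo_storage);
}

/* Clearing the pointer is the case the unordered store is meant for. */
void demo_unpublish(void)
{
	demo_init_pointer(NULL);
}

int demo_read(void)
{
	struct demo_item *p = demo_dereference();

	return p ? p->value : -1;
}
```

If demo_publish() used the unordered store instead, the compiler would be free to move the write to demo_storage.value after the pointer store, which is one way the problem can appear even on strongly ordered hardware.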
     

23 Nov, 2011

1 commit


19 Nov, 2011

1 commit

  • ipv6: Remove all uses of LL_ALLOCATED_SPACE

    The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
    alignment to the sum of needed_headroom and needed_tailroom. As
    the amount that is then reserved for head room is needed_headroom
    with alignment, this means that the tail room left may be too small.

    This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv6
    with the macro LL_RESERVED_SPACE and direct reference to
    needed_tailroom.

    This also fixes the problem with needed_headroom changing between
    allocating the skb and reserving the head room.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
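
The arithmetic behind this commit can be checked with a small model. The two functions below follow the description above (alignment applied to the head room alone versus to the sum including the tail room); the exact formulas and the alignment unit of 16 are assumptions for the sake of the example, and struct demo_dev and every demo_ name are hypothetical.

```c
#define DEMO_HH_DATA_MOD 16u  /* assumed alignment unit */

struct demo_dev {
	unsigned int hard_header_len;
	unsigned int needed_headroom;
	unsigned int needed_tailroom;
};

/* Head room actually reserved: the header sizes, aligned. */
unsigned int demo_ll_reserved_space(const struct demo_dev *d)
{
	return ((d->hard_header_len + d->needed_headroom) &
		~(DEMO_HH_DATA_MOD - 1)) + DEMO_HH_DATA_MOD;
}

/* Old-style allocation: alignment applied to head room plus tail room. */
unsigned int demo_ll_allocated_space(const struct demo_dev *d)
{
	return ((d->hard_header_len + d->needed_headroom + d->needed_tailroom) &
		~(DEMO_HH_DATA_MOD - 1)) + DEMO_HH_DATA_MOD;
}

/* Tail room left when allocating the old way and then reserving the head. */
unsigned int demo_tail_left_old(unsigned int hh, unsigned int head,
				unsigned int tail)
{
	const struct demo_dev d = { hh, head, tail };

	return demo_ll_allocated_space(&d) - demo_ll_reserved_space(&d);
}

/* Tail room left when allocating reserved space plus needed_tailroom. */
unsigned int demo_tail_left_new(unsigned int hh, unsigned int head,
				unsigned int tail)
{
	const struct demo_dev d = { hh, head, tail };
	unsigned int alloc = demo_ll_reserved_space(&d) + d.needed_tailroom;

	return alloc - demo_ll_reserved_space(&d);
}
```

With a 14-byte link-layer header and a needed_tailroom of 1, the old computation in this model leaves no tail room at all, while the replacement always leaves exactly needed_tailroom.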
     

10 Nov, 2011

1 commit

  • On Monday 07 November 2011 at 15:33 +0100, Eric Dumazet wrote:

    > At least, in recent kernels we don't change dst->refcnt in the forwarding
    > path (using NOREF skb->dst)
    >
    > One particular point is the atomic_inc(dst->refcnt) we have to perform
    > when queuing an UDP packet if socket asked PKTINFO stuff (for example a
    > typical DNS server has to setup this option)
    >
    > I have one patch somewhere that stores the information in skb->cb[] and
    > avoid the atomic_{inc|dec}(dst->refcnt).
    >

    OK I found it, I did some extra tests and believe it's ready.

    [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference

    When a socket uses IP_PKTINFO notifications, we currently force a dst
    reference for each received skb. Reader has to access dst to get needed
    information (rt_iif & rt_spec_dst) and must release dst reference.

    We also forced a dst reference if skb was put in socket backlog, even
    without IP_PKTINFO handling. This happens under stress/load.

    We can instead store the needed information in skb->cb[], so that only
    the softirq handler actually accesses dst, improving cache hit ratios.

    This removes two atomic operations per packet, and false sharing as
    well.

    On a benchmark using a single-threaded receiver (doing only recvmsg()
    calls), I can reach 720,000 pps instead of 570,000 pps.

    IP_PKTINFO is typically used by DNS servers, and any multihomed aware
    UDP application.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Nov, 2011

1 commit


19 Oct, 2011

1 commit

  • ip6_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path).
    On IPsec the effective mtu is lower because we need to add the protocol
    headers and trailers later when we do the IPsec transformations. So after
    the IPsec transformations the packet might be too big, which then leads to
    slowpath fragmentation. This patch fixes this by building the packets
    based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr
    handling to this.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

22 Sep, 2011

1 commit

  • Conflicts:
    MAINTAINERS
    drivers/net/Kconfig
    drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
    drivers/net/ethernet/broadcom/tg3.c
    drivers/net/wireless/iwlwifi/iwl-pci.c
    drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
    drivers/net/wireless/rt2x00/rt2800usb.c
    drivers/net/wireless/wl12xx/main.c

    David S. Miller
     

31 Aug, 2011

1 commit


12 Aug, 2011

1 commit


02 Aug, 2011

1 commit

  • When assigning a NULL value to an RCU protected pointer, no barrier
    is needed. rcu_assign_pointer used to handle that case, but will soon
    change and no longer treat NULL specially.

    Convert all rcu_assign_pointer of NULL value.

    //smpl
    @@ expression P; @@

    - rcu_assign_pointer(P, NULL)
    + RCU_INIT_POINTER(P, NULL)

    //

    Signed-off-by: Stephen Hemminger
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

02 Jul, 2011

1 commit

  • Make the case labels the same indent as the switch.

    git diff -w shows 80 column reflowing,
    removal of a useless break after return, and moving
    open brace after case instead of separate line.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

24 May, 2011

1 commit

  • The %pK format specifier is designed to hide exposed kernel pointers,
    specifically via /proc interfaces. Exposing these pointers provides an
    easy target for kernel write vulnerabilities, since they reveal the
    locations of writable structures containing easily triggerable function
    pointers. The behavior of %pK depends on the kptr_restrict sysctl.

    If kptr_restrict is set to 0, no deviation from the standard %p behavior
    occurs. If kptr_restrict is set to 1, the default, if the current user
    (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
    (currently in the LSM tree), kernel pointers using %pK are printed as 0's.
    If kptr_restrict is set to 2, kernel pointers using %pK are printed as
    0's regardless of privileges. Replacing with 0's was chosen over the
    default "(null)", which cannot be parsed by userland %p, which expects
    "(nil)".

    The supporting code for kptr_restrict and %pK is currently in the -mm
    tree. This patch converts users of %p in net/ to %pK. Cases of printing
    pointers to the syslog are not covered, since this would eliminate useful
    information for postmortem debugging and the reading of the syslog is
    already optionally protected by the dmesg_restrict sysctl.

    Signed-off-by: Dan Rosenberg
    Cc: James Morris
    Cc: Eric Dumazet
    Cc: Thomas Graf
    Cc: Eugene Teo
    Cc: Kees Cook
    Cc: Ingo Molnar
    Cc: David S. Miller
    Cc: Peter Zijlstra
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Dan Rosenberg
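
The kptr_restrict rules described in this commit message reduce to a small decision function. The sketch below is a userspace illustration only: demo_pK_hidden() and demo_format_pK() are hypothetical names, the capability check is passed in as a plain boolean, and the zero-padded hex output is an assumption about formatting rather than the kernel's exact output.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Decide whether a %pK-style pointer should be replaced with zeros. */
bool demo_pK_hidden(int kptr_restrict, bool has_cap_syslog)
{
	if (kptr_restrict == 0)
		return false;            /* behaves like plain %p */
	if (kptr_restrict == 1)
		return !has_cap_syslog;  /* visible only with CAP_SYSLOG */
	return true;                     /* 2: hidden regardless of privilege */
}

/* Format the pointer as fixed-width hex, or as all zeros when hidden. */
int demo_format_pK(char *buf, size_t len, const void *ptr,
		   int kptr_restrict, bool has_cap_syslog)
{
	uintptr_t shown = demo_pK_hidden(kptr_restrict, has_cap_syslog)
				  ? 0 : (uintptr_t)ptr;

	return snprintf(buf, len, "%0*" PRIxPTR,
			(int)(2 * sizeof(void *)), shown);
}
```

Printing zeros instead of a textual marker keeps the field a valid hexadecimal number, which matches the parsing concern raised in the message.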
     

07 May, 2011

1 commit

  • When we fast path datagram sends to avoid locking by putting
    the inet_cork on the stack we use up lots of space that isn't
    necessary.

    This is because inet_cork contains a "struct flowi" which isn't
    used in these code paths.

    Split inet_cork to two parts, "inet_cork" and "inet_cork_full".
    Only the latter of which has the "struct flowi" and is what is
    stored in inet_sock.

    Signed-off-by: David S. Miller
    Acked-by: Eric Dumazet

    David S. Miller
     

23 Apr, 2011

1 commit


13 Mar, 2011

4 commits


02 Mar, 2011

1 commit

  • Route lookups follow a general pattern in the ipv6 code wherein
    we first find the non-IPSEC route, potentially override the
    flow destination address due to ipv6 options settings, and then
    finally make an IPSEC search using either xfrm_lookup() or
    __xfrm_lookup().

    __xfrm_lookup() is used when we want to generate a blackhole route
    if the key manager needs to resolve the IPSEC rules (in this case
    -EREMOTE is returned and the original 'dst' is left unchanged).

    Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
    resolution is necessary, we simply fail the lookup completely.

    All of these cases are encapsulated into two routines,
    ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which
    handles unconnected UDP datagram sockets.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Feb, 2011

1 commit


04 Feb, 2011

1 commit


21 Jan, 2011

1 commit


26 Oct, 2010

1 commit


24 Sep, 2010

1 commit


11 Jun, 2010

1 commit


07 Jun, 2010

1 commit


02 Jun, 2010

1 commit

  • There are more than a dozen occurrences of the following code in the
    IPv6 stack:

    if (opt && opt->srcrt) {
            struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
            ipv6_addr_copy(&final, &fl.fl6_dst);
            ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
            final_p = &final;
    }

    Replace those with a helper. Note that the helper overrides final_p
    in all cases. This is ok as final_p was previously initialized to
    NULL when declared.

    Signed-off-by: Arnaud Ebalard
    Signed-off-by: David S. Miller

    Arnaud Ebalard
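
One way such a helper could look is sketched below with simplified userspace types. The commit message does not name the helper, so demo_update_dst() and all demo_ structures are hypothetical; the point is only to show why the return value overrides final_p in every case.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures (hypothetical). */
struct demo_in6_addr { unsigned char bytes[16]; };
struct demo_rt0_hdr { struct demo_in6_addr addr[1]; }; /* first hop only */
struct demo_txoptions { struct demo_rt0_hdr *srcrt; };
struct demo_flow { struct demo_in6_addr fl6_dst; };

/* With a routing header: save the real destination in *orig, route to the
 * first hop instead, and return orig. Otherwise return NULL. */
struct demo_in6_addr *demo_update_dst(struct demo_flow *fl,
				      const struct demo_txoptions *opt,
				      struct demo_in6_addr *orig)
{
	if (!opt || !opt->srcrt)
		return NULL;
	*orig = fl->fl6_dst;
	fl->fl6_dst = opt->srcrt->addr[0];
	return orig;
}

/* Call site: returns 1 if the destination was swapped, 0 if left alone,
 * and -1 if the helper misbehaved. */
int demo_update_dst_example(int with_srcrt)
{
	struct demo_rt0_hdr rt0;
	struct demo_flow fl;
	struct demo_in6_addr final, *final_p;
	struct demo_txoptions opt;

	memset(&rt0, 0, sizeof(rt0));
	memset(&fl, 0, sizeof(fl));
	rt0.addr[0].bytes[15] = 2;   /* first hop ends in ...02 */
	fl.fl6_dst.bytes[15] = 1;    /* real destination ends in ...01 */
	opt.srcrt = with_srcrt ? &rt0 : NULL;

	/* final_p is always overwritten, so it needs no prior value. */
	final_p = demo_update_dst(&fl, &opt, &final);

	if (!final_p)
		return fl.fl6_dst.bytes[15] == 1 ? 0 : -1;
	return (final_p == &final && final.bytes[15] == 1 &&
		fl.fl6_dst.bytes[15] == 2) ? 1 : -1;
}
```

Since the helper returns NULL when no routing header is present, a call site gets the same result as the open-coded version that started from a NULL final_p.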
     

11 May, 2010

1 commit