03 Oct, 2012

2 commits

  • Pull networking changes from David Miller:

    1) GRE now works over ipv6, from Dmitry Kozlov.

    2) Make SCTP more network namespace aware, from Eric Biederman.

    3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

    4) Make openvswitch network namespace aware, from Pravin B Shelar.

    5) IPV6 NAT implementation, from Patrick McHardy.

    6) Server side support for TCP Fast Open, from Jerry Chu and others.

    7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

    8) Increate the loopback default MTU to 64K, from Eric Dumazet.

    9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic. This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

    10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail. Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

    11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

    12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

    13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

    Fix up various fairly trivial conflicts, mostly caused by the user
    namespace changes.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
    hyperv: Add buffer for extended info after the RNDIS response message.
    hyperv: Report actual status in receive completion packet
    hyperv: Remove extra allocated space for recv_pkt_list elements
    hyperv: Fix page buffer handling in rndis_filter_send_request()
    hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
    hyperv: Fix the max_xfer_size in RNDIS initialization
    vxlan: put UDP socket in correct namespace
    vxlan: Depend on CONFIG_INET
    sfc: Fix the reported priorities of different filter types
    sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
    sfc: Fix loopback self-test with separate_tx_channels=1
    sfc: Fix MCDI structure field lookup
    sfc: Add parentheses around use of bitfield macro arguments
    sfc: Fix null function pointer in efx_sriov_channel_type
    vxlan: virtual extensible lan
    igmp: export symbol ip_mc_leave_group
    netlink: add attributes to fdb interface
    tg3: unconditionally select HWMON support when tg3 is enabled.
    Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
    gre: fix sparse warning
    ...

    Linus Torvalds
     
  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values are far down into
    subsystems and filesystems as reasonable. Leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network.
    Letting compile type incompatible compile errors (present when user
    namespaces are enabled) guide me to find the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privlige checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
    root in a user names to do those things that today we only forbid to
    non-root users because it will confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/git logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
     

18 Sep, 2012

1 commit

  • Always store audit loginuids in type kuid_t.

    Print loginuids by converting them into uids in the appropriate user
    namespace, and then printing the resulting uid.

    Modify audit_get_loginuid to return a kuid_t.

    Modify audit_set_loginuid to take a kuid_t.

    Modify /proc//loginuid on read to convert the loginuid into the
    user namespace of the opener of the file.

    Modify /proc//loginud on write to convert the loginuid
    rom the user namespace of the opener of the file.

    Cc: Al Viro
    Cc: Eric Paris
    Cc: Paul Moore ?
    Cc: David Miller
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

15 Sep, 2012

1 commit

  • Conflicts:
    net/netfilter/nfnetlink_log.c
    net/netfilter/xt_LOG.c

    Rather easy conflict resolution, the 'net' tree had bug fixes to make
    sure we checked if a socket is a time-wait one or not and elide the
    logging code if so.

    Whereas on the 'net-next' side we are calculating the UID and GID from
    the creds using different interfaces due to the user namespace changes
    from Eric Biederman.

    Signed-off-by: David S. Miller

    David S. Miller
     

11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

05 Sep, 2012

1 commit

  • ESN for esp is defined in RFC 4303. This RFC assumes that the
    sequence number counters are always up to date. However,
    this is not true if an async crypto algorithm is employed.

    If the sequence number counters are not up to date on sequence
    number check, we may incorrectly update the upper 32 bit of
    the sequence number. This leads to a DOS.

    We workaround this by comparing the upper sequence number,
    (used for authentication) with the upper sequence number
    computed after the async processing. We drop the packet
    if these numbers are different.

    To do this, we introduce a recheck function that does this
    check in the ESN case.

    Signed-off-by: Steffen Klassert
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Steffen Klassert
     

23 Aug, 2012

1 commit


20 Aug, 2012

1 commit

  • Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
    a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
    initialized, causing a false positive result from inetpeer_ptr_is_peer(),
    which in turn causes a NULL pointer dereference in inet_putpeer().

    Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
    EIP: 0060:[] EFLAGS: 00010246 CPU: 0
    EIP is at inet_putpeer+0xe/0x16
    EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
    ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
    DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
    CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff0ff0 DR7: 00000400
    f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
    f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
    f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
    [] xfrm6_dst_destroy+0x42/0xdb
    [] dst_destroy+0x1d/0xa4
    [] xfrm_bundle_flo_delete+0x2b/0x36
    [] flow_cache_gc_task+0x85/0x9f
    [] process_one_work+0x122/0x441
    [] ? apic_timer_interrupt+0x31/0x38
    [] ? flow_cache_new_hashrnd+0x2b/0x2b
    [] worker_thread+0x113/0x3cc

    Fix by adding a init_dst() callback to struct xfrm_policy_afinfo to
    properly initialize the dst's peer pointer.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

16 Aug, 2012

1 commit


02 Aug, 2012

1 commit

  • After SA is setup, one timer is armed to detect soft/hard expiration,
    however the timer handler uses xtime to do the math. This makes hard
    expiration occurs first before soft expiration after setting new date
    with big interval. As a result new child SA is deleted before rekeying
    the new one.

    Signed-off-by: Fan Du
    Signed-off-by: David S. Miller

    Fan Du
     

19 Jul, 2012

1 commit


28 Jun, 2012

1 commit


16 May, 2012

1 commit


02 Apr, 2012

1 commit


26 Feb, 2012

1 commit


23 Nov, 2011

2 commits


12 May, 2011

1 commit


11 May, 2011

1 commit

  • As it is, we assign the outer modes output function to the dst entry
    when we create the xfrm bundle. This leads to two problems on interfamily
    scenarios. We might insert ipv4 packets into ip6_fragment when called
    from xfrm6_output. The system crashes if we try to fragment an ipv4
    packet with ip6_fragment. This issue was introduced with git commit
    ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
    as needed). The second issue is, that we might insert ipv4 packets in
    netfilter6 and vice versa on interfamily scenarios.

    With this patch we assign the inner mode output function to the dst entry
    when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner
    mode is used and the right fragmentation and netfilter functions are called.
    We switch then to outer mode with the output_finish functions.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

23 Apr, 2011

1 commit


11 Apr, 2011

1 commit

  • The reverse path filter interferes with IPsec subnet-to-subnet tunnels,
    especially when the link to the IPsec peer is on an interface other than
    the one hosting the default route.

    With dynamic routing, where the peer might be reachable through eth0
    today and eth1 tomorrow, it's difficult to keep rp_filter enabled unless
    fake routes to the remote subnets are configured on the interface
    currently used to reach the peer.

    IPsec provides a much stronger anti-spoofing policy than rp_filter, so
    this patch disables the rp_filter for packets with a security path.

    Signed-off-by: Michael Smith
    Signed-off-by: David S. Miller

    Michael Smith
     

29 Mar, 2011

1 commit

  • When we clone a xfrm state we have to assign the replay_esn
    and the preplay_esn pointers to the state if we use the
    new replay detection method. To this end, we add a
    xfrm_replay_clone() function that allocates memory for
    the replay detection and takes over the necessary values
    from the original state.

    Signed-off-by: Steffen Klassert
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Steffen Klassert
     

22 Mar, 2011

1 commit

  • Commit 'xfrm: Move IPsec replay detection functions to a separate file'
    (9fdc4883d92d20842c5acea77a4a21bb1574b495)
    introduce repl field to struct xfrm_state, and only initialize it
    under SA's netlink create path, the other path, such as pf_key,
    ipcomp/ipcomp6 etc, the repl field remaining uninitialize. So if
    the SA is created by pf_key, any input packet with SA's encryption
    algorithm will cause panic.

    int xfrm_input()
    {
    ...
    x->repl->advance(x, seq);
    ...
    }

    This patch fixed it by introduce new function __xfrm_init_state().

    Pid: 0, comm: swapper Not tainted 2.6.38-next+ #14 Bochs Bochs
    EIP: 0060:[] EFLAGS: 00010206 CPU: 0
    EIP is at xfrm_input+0x31c/0x4cc
    EAX: dd839c00 EBX: 00000084 ECX: 00000000 EDX: 01000000
    ESI: dd839c00 EDI: de3a0780 EBP: dec1de88 ESP: dec1de64
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process swapper (pid: 0, ti=dec1c000 task=c09c0f20 task.ti=c0992000)
    Stack:
    00000000 00000000 00000002 c0ba27c0 00100000 01000000 de3a0798 c0ba27c0
    00000033 dec1de98 c0786848 00000000 de3a0780 dec1dea4 c0786868 00000000
    dec1debc c074ee56 e1da6b8c de3a0780 c074ed44 de3a07a8 dec1decc c074ef32
    Call Trace:
    [] xfrm4_rcv_encap+0x22/0x27
    [] xfrm4_rcv+0x1b/0x1d
    [] ip_local_deliver_finish+0x112/0x1b1
    [] ? ip_local_deliver_finish+0x0/0x1b1
    [] NF_HOOK.clone.1+0x3d/0x44
    [] ip_local_deliver+0x3e/0x44
    [] ? ip_local_deliver_finish+0x0/0x1b1
    [] ip_rcv_finish+0x30a/0x332
    [] ? ip_rcv_finish+0x0/0x332
    [] NF_HOOK.clone.1+0x3d/0x44
    [] ip_rcv+0x20b/0x247
    [] ? ip_rcv_finish+0x0/0x332
    [] __netif_receive_skb+0x373/0x399
    [] netif_receive_skb+0x4b/0x51
    [] cp_rx_poll+0x210/0x2c4 [8139cp]
    [] net_rx_action+0x9a/0x17d
    [] __do_softirq+0xa1/0x149
    [] ? __do_softirq+0x0/0x149

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     

14 Mar, 2011

4 commits


13 Mar, 2011

3 commits


02 Mar, 2011

1 commit


28 Feb, 2011

4 commits


24 Feb, 2011

5 commits