01 Jun, 2013

1 commit

  • In some cases after deleting a policy from the SPD the policy would
    remain in the dst/flow/route cache for an extended period of time
    which caused problems for SELinux as its dynamic network access
    controls key off of the number of XFRM policy and state entries.
    This patch corrects this problem by forcing an XFRM garbage collection
    whenever a policy is successfully removed.

    Reported-by: Ondrej Moris
    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller

    Paul Moore
     

06 Mar, 2013

2 commits

  • There is no need to modify the netlink dispatch table at runtime.
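A minimal user-space sketch of what a `const` dispatch table buys, assuming hypothetical message types and handler names (the real xfrm table has a different layout). Because the table is never written at runtime, it can be declared `static const` and placed in read-only memory, so an accidental write fails to compile.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message types, used as table indices. */
enum msg_type { MSG_NEWSA, MSG_DELSA, MSG_GETSA, MSG_MAX };

typedef int (*msg_handler)(const void *payload);

static int handle_newsa(const void *payload) { (void)payload; return 1; }
static int handle_delsa(const void *payload) { (void)payload; return 2; }

/* const: never modified after build time, so it can live in .rodata.
 * Entries not listed (MSG_GETSA here) are zero-initialized to NULL. */
static const msg_handler dispatch_table[MSG_MAX] = {
	[MSG_NEWSA] = handle_newsa,
	[MSG_DELSA] = handle_delsa,
};

/* Returns the handler's result, or -1 for unknown or unhandled types. */
int dispatch(int type, const void *payload)
{
	if (type < 0 || type >= MSG_MAX || dispatch_table[type] == NULL)
		return -1;
	return dispatch_table[type](payload);
}
```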

    Signed-off-by: Mathias Krause
    Signed-off-by: Steffen Klassert

    Mathias Krause
     
  • By default, DSCP is copied during encapsulation.
    Copying the DSCP in IPsec tunneling may be a bit dangerous because packets with
    different DSCP may get reordered relative to each other in the network and then
    dropped by the remote IPsec GW if the reordering becomes too big compared to the
    replay window.

    It is possible to avoid this copy with netfilter rules, but it's very convenient
    to be able to configure it for each SA directly.

    This patch adds a toggle for this purpose. By default, it's not set, to maintain
    backward compatibility.

    The flags field in struct xfrm_usersa_info is full, hence I added a new attribute.
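DSCP occupies the upper six bits of the IPv4 TOS / IPv6 Traffic Class byte and ECN the lower two. The sketch below shows the effect of such a per-SA toggle on the outer header. The `dont_copy_dscp` parameter is a hypothetical stand-in for the new attribute, and ECN is simply passed through, which is a simplification of real tunnel ECN handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DSCP_MASK 0xFCu /* upper 6 bits of the TOS / Traffic Class byte */
#define ECN_MASK  0x03u /* lower 2 bits */

/* Outer TOS byte for tunnel-mode encapsulation.
 * dont_copy_dscp models the per-SA toggle (hypothetical name). */
uint8_t outer_tos(uint8_t inner_tos, bool dont_copy_dscp)
{
	uint8_t tos = inner_tos; /* default: copy the inner DSCP */

	if (dont_copy_dscp)
		tos &= ECN_MASK; /* clear DSCP so the outer packet uses DSCP 0 */
	return tos;
}
```

With the toggle set, packets with different inner DSCP values all leave with the same outer DSCP, so the network has no reason to reorder them relative to each other.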

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     

30 Jan, 2013

1 commit


19 Nov, 2012

1 commit

  • Allow an unprivileged user who has created a user namespace, and then
    created a network namespace, to effectively use the new network
    namespace, by reducing capable(CAP_NET_ADMIN) and
    capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
    CAP_NET_ADMIN), or ns_capable(net->user_ns, CAP_NET_RAW) calls.

    Allow creation of af_key sockets.
    Allow creation of llc sockets.
    Allow creation of af_packet sockets.

    Allow sending xfrm netlink control messages.

    Allow binding to netlink multicast groups.
    Allow sending to netlink multicast groups.
    Allow adding and dropping netlink multicast groups.
    Allow sending to all netlink multicast groups and port ids.

    Allow reading the netfilter SO_IP_SET socket option.
    Allow sending netfilter netlink messages.
    Allow setting and getting ip_vs netfilter socket options.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

03 Oct, 2012

2 commits

  • Pull networking changes from David Miller:

    1) GRE now works over ipv6, from Dmitry Kozlov.

    2) Make SCTP more network namespace aware, from Eric Biederman.

    3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

    4) Make openvswitch network namespace aware, from Pravin B Shelar.

    5) IPV6 NAT implementation, from Patrick McHardy.

    6) Server side support for TCP Fast Open, from Jerry Chu and others.

    7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

    8) Increase the loopback default MTU to 64K, from Eric Dumazet.

    9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic. This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

    10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher-order page allocations fail. Benefits are
    a) fewer segments for the driver to process, b) fewer calls to the page
    allocator, c) less wasted space.

    From Eric Dumazet.

    11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

    12) VXLAN device driver, one way to work around VLAN issues such as the
    limit of 4096 VLAN IDs while still providing some level of isolation.
    From Stephen Hemminger.

    13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.
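Item 10 describes trying a large page-fragment allocation first and falling back to smaller sizes. A minimal sketch of that fallback loop, assuming 4 KiB pages (so 32K is order 3) and a hypothetical allocator hook rather than the kernel's page allocator API:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES     4096u /* assumption: 4 KiB pages */
#define FRAG_MAX_ORDER 3     /* 4 KiB << 3 == 32 KiB */

/* Hypothetical allocator hook: returns NULL when 2^order contiguous
 * pages cannot be allocated. */
typedef void *(*page_alloc_fn)(unsigned int order);

/* Try the largest fragment size first, then fall back to smaller
 * orders. On success, *size_out receives the fragment size in bytes. */
void *alloc_frag(page_alloc_fn alloc, size_t *size_out)
{
	for (int order = FRAG_MAX_ORDER; order >= 0; order--) {
		void *p = alloc((unsigned int)order);

		if (p) {
			*size_out = (size_t)PAGE_BYTES << order;
			return p;
		}
	}
	*size_out = 0;
	return NULL;
}

/* Stand-in allocator for demonstration: fails above order 1. */
static unsigned char demo_arena[2 * PAGE_BYTES];
void *demo_alloc(unsigned int order)
{
	return order <= 1 ? demo_arena : NULL;
}

/* Stand-in allocator that always fails. */
void *demo_alloc_fail(unsigned int order)
{
	(void)order;
	return NULL;
}
```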

    Fix up various fairly trivial conflicts, mostly caused by the user
    namespace changes.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
    hyperv: Add buffer for extended info after the RNDIS response message.
    hyperv: Report actual status in receive completion packet
    hyperv: Remove extra allocated space for recv_pkt_list elements
    hyperv: Fix page buffer handling in rndis_filter_send_request()
    hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
    hyperv: Fix the max_xfer_size in RNDIS initialization
    vxlan: put UDP socket in correct namespace
    vxlan: Depend on CONFIG_INET
    sfc: Fix the reported priorities of different filter types
    sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
    sfc: Fix loopback self-test with separate_tx_channels=1
    sfc: Fix MCDI structure field lookup
    sfc: Add parentheses around use of bitfield macro arguments
    sfc: Fix null function pointer in efx_sriov_channel_type
    vxlan: virtual extensible lan
    igmp: export symbol ip_mc_leave_group
    netlink: add attributes to fdb interface
    tg3: unconditionally select HWMON support when tg3 is enabled.
    Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
    gre: fix sparse warning
    ...

    Linus Torvalds
     
  • Pull user namespace changes from Eric Biederman:
    "This is a mostly modest set of changes to enable basic user namespace
    support. This allows the code to compile with user namespaces
    enabled and removes the assumption there is only the initial user
    namespace. Everything is converted except for the most complex of the
    filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
    nfs, ocfs2 and xfs as those patches need a bit more review.

    The strategy is to push kuid_t and kgid_t values as far down into
    subsystems and filesystems as reasonable, leaving the make_kuid and
    from_kuid operations to happen at the edge of userspace, as the values
    come off the disk, and as the values come in from the network, and
    letting the type-incompatibility compile errors (present when user
    namespaces are enabled) guide me to the issues.

    The most tricky areas have been the places where we had an implicit
    union of uid and gid values and were storing them in an unsigned int.
    Those places were converted into explicit unions. I made certain to
    handle those places with simple trivial patches.

    Out of that work I discovered we have generic interfaces for storing
    quota by projid. I had never heard of the project identifiers before.
    Adding full user namespace support for project identifiers accounts
    for most of the code size growth in my git tree.

    Ultimately there will be work to relax privilege checks from
    "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe, allowing
    root in a user namespace to do those things that today we forbid to
    non-root users only because they would confuse suid root applications.

    While I was pushing kuid_t and kgid_t changes deep into the audit code
    I made a few other cleanups. I capitalized on the fact we process
    netlink messages in the context of the message sender. I removed
    usage of NETLINK_CRED, and started directly using current->tty.

    Some of these patches have also made it into maintainer trees, with no
    problems from identical code from different trees showing up in
    linux-next.

    After reading through all of this code I feel like I might be able to
    win a game of kernel trivial pursuit."

    Fix up some fairly trivial conflicts in netfilter uid/gid logging code.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
    userns: Convert the ufs filesystem to use kuid/kgid where appropriate
    userns: Convert the udf filesystem to use kuid/kgid where appropriate
    userns: Convert ubifs to use kuid/kgid
    userns: Convert squashfs to use kuid/kgid where appropriate
    userns: Convert reiserfs to use kuid and kgid where appropriate
    userns: Convert jfs to use kuid/kgid where appropriate
    userns: Convert jffs2 to use kuid and kgid where appropriate
    userns: Convert hpfs to use kuid and kgid where appropriate
    userns: Convert btrfs to use kuid/kgid where appropriate
    userns: Convert bfs to use kuid/kgid where appropriate
    userns: Convert affs to use kuid/kgid wherwe appropriate
    userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
    userns: On ia64 deal with current_uid and current_gid being kuid and kgid
    userns: On ppc convert current_uid from a kuid before printing.
    userns: Convert s390 getting uid and gid system calls to use kuid and kgid
    userns: Convert s390 hypfs to use kuid and kgid where appropriate
    userns: Convert binder ipc to use kuids
    userns: Teach security_path_chown to take kuids and kgids
    userns: Add user namespace support to IMA
    userns: Convert EVM to deal with kuids and kgids in it's hmac computation
    ...

    Linus Torvalds
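The make_kuid/from_kuid conversions mentioned above translate ids through a namespace's id map, whose lines have the form "first lower_first count". A simplified single-level model of that translation, with hypothetical function names (`map_id_down`, `map_id_up`) rather than the kernel's internals:

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1) /* mirrors the (uid_t)-1 "no mapping" sentinel */

/* One line of a uid_map: "first lower_first count". */
struct id_extent {
	uint32_t first;       /* first id inside the namespace */
	uint32_t lower_first; /* corresponding id outside the namespace */
	uint32_t count;       /* number of consecutive ids mapped */
};

/* Namespace id -> outer id (the direction make_kuid goes). */
uint32_t map_id_down(const struct id_extent *map, int n, uint32_t id)
{
	for (int i = 0; i < n; i++)
		/* unsigned wrap-around makes id < first fail the range test */
		if (id - map[i].first < map[i].count)
			return map[i].lower_first + (id - map[i].first);
	return INVALID_ID;
}

/* Outer id -> namespace id (the direction from_kuid goes). */
uint32_t map_id_up(const struct id_extent *map, int n, uint32_t id)
{
	for (int i = 0; i < n; i++)
		if (id - map[i].lower_first < map[i].count)
			return map[i].first + (id - map[i].lower_first);
	return INVALID_ID;
}
```

Ids that have no mapping come back as the sentinel, which is why conversions belong at the edges (syscalls, disk, network) where a failed mapping can be reported.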
     

29 Sep, 2012

1 commit

  • Conflicts:
    drivers/net/team/team.c
    drivers/net/usb/qmi_wwan.c
    net/batman-adv/bat_iv_ogm.c
    net/ipv4/fib_frontend.c
    net/ipv4/route.c
    net/l2tp/l2tp_netlink.c

    The team, fib_frontend, route, and l2tp_netlink conflicts were simply
    overlapping changes.

    qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

    With help from Antonio Quartulli.

    Signed-off-by: David S. Miller

    David S. Miller
     

21 Sep, 2012

6 commits

  • The ESN replay window was already fully initialized in
    xfrm_alloc_replay_state_esn(). No need to copy it again.

    Cc: Steffen Klassert
    Signed-off-by: Mathias Krause
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Mathias Krause
     
  • The current code fails to ensure that the netlink message actually
    contains as many bytes as the header indicates. If a user creates a new
    state or updates an existing one but does not supply the bytes for the
    whole ESN replay window, the kernel copies random heap bytes into the
    replay bitmap, the ones that happen to follow the XFRMA_REPLAY_ESN_VAL
    netlink attribute. This leads to the following issues:

    1. The replay window has random bits set confusing the replay handling
    code later on.

    2. A malicious user could use this flaw to leak up to ~3.5kB of heap
    memory when she has access to the XFRM netlink interface (requires
    CAP_NET_ADMIN).

    Known users of the ESN replay window are strongSwan and Steffen's
    iproute2 patch (). The latter
    uses the interface with a bitmap supplied while the former does not.
    strongSwan is therefore prone to run into issue 1.

    To fix both issues without breaking existing userland, allow using the
    XFRMA_REPLAY_ESN_VAL netlink attribute with either an empty bitmap or a
    fully specified one. For the former case we initialize the in-kernel
    bitmap with zero, for the latter we copy the user supplied bitmap. For
    state updates the full bitmap must be supplied.

    To prevent overflows in the bitmap length calculation the maximum size
    of bmp_len is limited to 128 by this patch -- resulting in a maximum
    replay window of 4096 packets. This should be sufficient for all real
    life scenarios (RFC 4303 recommends a default replay window size of 64).
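A sketch of the validation rules described above: a header-only attribute (empty bitmap) or a fully specified one on creation, the full bitmap on update, and bmp_len capped at 128 words (128 × 32 = 4096 packets). The struct is a simplified mirror of the ESN replay state, and `verify_replay_esn` is a hypothetical name, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BMP_LEN 128u /* 128 words * 32 bits = 4096-packet window */

/* Simplified mirror of an ESN replay-state header followed by a
 * bitmap of bmp_len 32-bit words. */
struct replay_esn {
	uint32_t bmp_len;
	uint32_t oseq, seq, oseq_hi, seq_hi;
	uint32_t replay_window;
	uint32_t bmp[];
};

static size_t replay_esn_len(const struct replay_esn *r)
{
	return sizeof(*r) + (size_t)r->bmp_len * sizeof(uint32_t);
}

/* Validate an attribute payload of attr_len bytes.
 * Returns 0 if acceptable, -1 otherwise. */
int verify_replay_esn(const struct replay_esn *r, size_t attr_len, int is_update)
{
	if (attr_len < sizeof(*r))
		return -1;            /* too short even for the header */
	if (r->bmp_len > MAX_BMP_LEN)
		return -1;            /* keeps the length calculation bounded */
	if (attr_len == replay_esn_len(r))
		return 0;             /* full bitmap supplied */
	if (!is_update && attr_len == sizeof(*r))
		return 0;             /* empty bitmap: kernel side zero-fills */
	return -1;                    /* partial bitmap: reject */
}
```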

    Cc: Steffen Klassert
    Cc: Martin Willi
    Cc: Ben Hutchings
    Signed-off-by: Mathias Krause
    Signed-off-by: David S. Miller

    Mathias Krause
     
  • The memory used for the template copy is a local stack variable. As
    struct xfrm_user_tmpl contains multiple holes added by the compiler for
    alignment, not initializing the memory will lead to leaking stack bytes
    to userland. Add an explicit memset(0) to avoid the info leak.
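The leak comes from padding bytes that member assignments never touch. A small demonstration with a hypothetical struct that has holes on common ABIs: the buffer is first filled with 0xAA to simulate stale stack contents, and the `memset()` in `fill_tmpl` is what clears the gaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct with compiler-inserted holes after 'mode'
 * and after 'share' on common ABIs. */
struct tmpl {
	uint8_t  mode;
	uint32_t reqid;
	uint16_t share;
	uint64_t aalgos;
};

/* Fill a buffer destined for userspace. Without the memset(), the
 * padding would keep whatever bytes were there before. */
void fill_tmpl(struct tmpl *t, uint8_t mode, uint32_t reqid,
	       uint16_t share, uint64_t aalgos)
{
	memset(t, 0, sizeof(*t));
	t->mode = mode;
	t->reqid = reqid;
	t->share = share;
	t->aalgos = aalgos;
}

/* Returns 1 if every padding byte (including tail padding) is zero. */
int padding_is_zero(const struct tmpl *t)
{
	const unsigned char *b = (const unsigned char *)t;
	size_t i;

	for (i = offsetof(struct tmpl, mode) + sizeof(t->mode);
	     i < offsetof(struct tmpl, reqid); i++)
		if (b[i])
			return 0;
	for (i = offsetof(struct tmpl, share) + sizeof(t->share);
	     i < offsetof(struct tmpl, aalgos); i++)
		if (b[i])
			return 0;
	for (i = offsetof(struct tmpl, aalgos) + sizeof(t->aalgos);
	     i < sizeof(*t); i++)
		if (b[i])
			return 0;
	return 1;
}
```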

    Initial version of the patch by Brad Spengler.

    Cc: Brad Spengler
    Signed-off-by: Mathias Krause
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Mathias Krause
     
  • The memory reserved to dump the xfrm policy includes multiple padding
    bytes added by the compiler for alignment (padding bytes in struct
    xfrm_selector and struct xfrm_userpolicy_info). Add an explicit
    memset(0) before filling the buffer to avoid the heap info leak.

    Signed-off-by: Mathias Krause
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Mathias Krause
     
  • The memory reserved to dump the xfrm state includes the padding bytes of
    struct xfrm_usersa_info added by the compiler for alignment (7 for
    amd64, 3 for i386). Add an explicit memset(0) before filling the buffer
    to avoid the info leak.

    Signed-off-by: Mathias Krause
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Mathias Krause
     
  • copy_to_user_auth() fails to initialize the remainder of alg_name and
    therefore discloses up to 54 bytes of heap memory via netlink to
    userland.

    Use strncpy() instead of strcpy() to fill the trailing bytes of alg_name
    with null bytes.
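The difference between the two calls: `strcpy()` writes the name plus one terminator and leaves the rest of the buffer as it was, while `strncpy()` pads the remainder with '\0'. A sketch assuming a 64-byte `alg_name` field; `copy_alg_name` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

#define ALG_NAME_LEN 64 /* assumed size of the alg_name field */

/* Copy an algorithm name into a fixed-size buffer sent to userspace.
 * strncpy() zero-fills the tail, so no stale heap bytes follow the
 * name. It does not terminate on truncation, hence the last line. */
void copy_alg_name(char dst[ALG_NAME_LEN], const char *src)
{
	strncpy(dst, src, ALG_NAME_LEN);
	dst[ALG_NAME_LEN - 1] = '\0';
}
```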

    Signed-off-by: Mathias Krause
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Mathias Krause
     

19 Sep, 2012

2 commits

  • When dump_one_policy() returns an error, e.g. because the buffer is too
    small to hold the whole xfrm policy, xfrm_policy_netlink() returns
    NULL instead of an error pointer. But its caller expects an error
    pointer and therefore continues to operate on a NULL skbuff.

    Signed-off-by: Mathias Krause
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Mathias Krause
     
  • When dump_one_state() returns an error, e.g. because the buffer is too
    small to hold the whole xfrm state, xfrm_state_netlink() returns NULL
    instead of an error pointer. But its callers expect an error pointer
    and therefore continue to operate on a NULL skbuff.

    This could lead to a privilege escalation (execution of user code in
    kernel context) if the attacker has CAP_NET_ADMIN and is able to map
    address 0.
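Why returning NULL is dangerous when callers test with IS_ERR(): error pointers encode small negative errno values in the top page of the address space, and NULL is not in that range. Below is a simplified user-space re-creation of the kernel's error-pointer helpers (using `intptr_t` for portability), plus a hypothetical `build_msg` that mimics the buggy and fixed return paths.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified re-creations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static char msg_buf[64]; /* stands in for an allocated skbuff */

/* Hypothetical builder. With fixed == 0 it mimics the bug: on failure
 * it returns NULL, which an IS_ERR() check does not catch. */
void *build_msg(int fail, int fixed)
{
	if (!fail)
		return msg_buf;
	return fixed ? ERR_PTR(-EMSGSIZE) : NULL;
}
```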

    Signed-off-by: Mathias Krause
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Mathias Krause
     

18 Sep, 2012

1 commit

  • Always store audit loginuids in type kuid_t.

    Print loginuids by converting them into uids in the appropriate user
    namespace, and then printing the resulting uid.

    Modify audit_get_loginuid to return a kuid_t.

    Modify audit_set_loginuid to take a kuid_t.

    Modify /proc//loginuid on read to convert the loginuid into the
    user namespace of the opener of the file.

    Modify /proc//loginuid on write to convert the loginuid
    from the user namespace of the opener of the file.

    Cc: Al Viro
    Cc: Eric Paris
    Cc: Paul Moore
    Cc: David Miller
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

11 Sep, 2012

2 commits

  • When a policy expiration is triggered from user space the request
    travels through km_policy_expired and ultimately into
    xfrm_exp_policy_notify which calls build_polexpire. build_polexpire
    uses the netlink port passed to km_policy_expired as the source port for
    the netlink message it builds.

    When a state expiration is triggered from user space the request travels
    through km_state_expired and ultimately into xfrm_exp_state_notify which
    calls build_expire. build_expire uses the netlink port passed to
    km_state_expired as the source port for the netlink message it builds.

    Pass nlh->nlmsg_pid from the user generated netlink message that
    requested the expiration to km_policy_expired and km_state_expired
    instead of current->pid which is not a netlink port number.

    Cc: Jamal Hadi Salim
    Signed-off-by: "Eric W. Biederman"
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers to portid instead of pid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

09 Sep, 2012

1 commit


16 Aug, 2012

1 commit


30 Jun, 2012

1 commit

  • This patch adds the following structure:

    struct netlink_kernel_cfg {
        unsigned int groups;
        void (*input)(struct sk_buff *skb);
        struct mutex *cb_mutex;
    };

    It can be passed to netlink_kernel_create() to set optional configuration
    for netlink kernel sockets.

    I've populated this structure by looking for NULL and zero parameters in the
    existing code. The remaining parameters that always need to be set are still
    left in the original interface.

    Because the structure holds the optional parameters for netlink socket
    creation, the interface is easy to extend in the future.

    This patch also adapts all callers to use this new interface.
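The pattern here is "mandatory arguments in the signature, optional ones in a config struct". A user-space sketch of that pattern with hypothetical types (`nl_cfg`, `nl_sock`, `nl_create`) and a made-up default group count; it is not the kernel API, only an illustration of how zero/NULL fields fall back to defaults.

```c
#include <assert.h>
#include <stddef.h>

struct sk_buff; /* opaque in this sketch */

/* Optional settings; zero/NULL means "use the default". */
struct nl_cfg {
	unsigned int groups;
	void (*input)(struct sk_buff *skb);
};

struct nl_sock {
	int unit;
	unsigned int groups;
	void (*input)(struct sk_buff *skb);
};

#define DEFAULT_GROUPS 32u /* assumption for the sketch */

/* Mandatory parameters stay in the argument list; cfg may be NULL
 * when the caller needs no optional settings. */
void nl_create(struct nl_sock *sk, int unit, const struct nl_cfg *cfg)
{
	sk->unit = unit;
	sk->groups = (cfg && cfg->groups) ? cfg->groups : DEFAULT_GROUPS;
	sk->input = cfg ? cfg->input : NULL;
}

/* Example input handler for demonstration. */
void demo_input(struct sk_buff *skb) { (void)skb; }
```

Callers that need only one option can use a designated initializer such as `{ .input = demo_input }`, and new fields can be added later without touching existing call sites.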

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

28 Jun, 2012

1 commit


02 Apr, 2012

1 commit


27 Feb, 2012

1 commit


15 Jan, 2012

1 commit

  • * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
    capabilities: remove __cap_full_set definition
    security: remove the security_netlink_recv hook as it is equivalent to capable()
    ptrace: do not audit capability check when outputing /proc/pid/stat
    capabilities: remove task_ns_* functions
    capabitlies: ns_capable can use the cap helpers rather than lsm call
    capabilities: style only - move capable below ns_capable
    capabilites: introduce new has_ns_capabilities_noaudit
    capabilities: call has_ns_capability from has_capability
    capabilities: remove all _real_ interfaces
    capabilities: introduce security_capable_noaudit
    capabilities: reverse arguments to security_capable
    capabilities: remove the task from capable LSM hook entirely
    selinux: sparse fix: fix several warnings in the security server cod
    selinux: sparse fix: fix warnings in netlink code
    selinux: sparse fix: eliminate warnings for selinuxfs
    selinux: sparse fix: declare selinux_disable() in security.h
    selinux: sparse fix: move selinux_complete_init
    selinux: sparse fix: make selinux_secmark_refcount static
    SELinux: Fix RCU deref check warning in sel_netport_insert()

    Manually fix up a semantic mis-merge wrt security_netlink_recv():

    - the interface was removed in commit fd7784615248 ("security: remove
    the security_netlink_recv hook as it is equivalent to capable()")

    - a new user of it appeared in commit a38f7907b926 ("crypto: Add
    userspace configuration API")

    causing no automatic merge conflict, but Eric Paris pointed out the
    issue.

    Linus Torvalds
     

13 Jan, 2012

1 commit

  • commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
    RCU_INIT_POINTER) made many incorrect changes, since it converted
    every rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x, y), not only
    the cases where y is NULL.

    We miss needed barriers, even on x86, when y is not NULL.
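An analogy in C11 atomics, not the kernel's RCU API: publishing a non-NULL pointer needs a release store so that a reader who sees the pointer also sees the initialized object (this also stops the compiler from reordering, which is why the barrier matters even on x86). Storing NULL publishes no new data, so a relaxed store is enough. All names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct conf { int value; };

static _Atomic(struct conf *) shared = NULL;

/* Analogous to rcu_assign_pointer(): the release store orders the
 * initialization of *p before the pointer becomes visible. */
void publish(struct conf *p)
{
	atomic_store_explicit(&shared, p, memory_order_release);
}

/* Analogous to RCU_INIT_POINTER(p, NULL): no data is published, so no
 * ordering is needed. Using this for non-NULL pointers is the bug. */
void clear_shared(void)
{
	atomic_store_explicit(&shared, NULL, memory_order_relaxed);
}

/* Reader side; acquire pairs with the release in publish(). */
int read_value(int fallback)
{
	struct conf *p = atomic_load_explicit(&shared, memory_order_acquire);

	return p ? p->value : fallback;
}
```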

    Signed-off-by: Eric Dumazet
    CC: Stephen Hemminger
    CC: Paul E. McKenney
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Jan, 2012

1 commit


12 Dec, 2011

1 commit


02 Aug, 2011

1 commit

  • When assigning a NULL value to an RCU protected pointer, no barrier
    is needed. rcu_assign_pointer() used to handle that special case, but
    will soon change to no longer handle it.

    Convert all rcu_assign_pointer() calls that assign a NULL value.

    //smpl
    @@ expression P; @@

    - rcu_assign_pointer(P, NULL)
    + RCU_INIT_POINTER(P, NULL)

    //

    Signed-off-by: Stephen Hemminger
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

10 Jun, 2011

1 commit

  • The message size allocated for rtnl ifinfo dumps was limited to
    a single page. This is not enough for additional interface info
    available with devices that support SR-IOV and caused a bug in
    which VF info would not be displayed if more than approximately
    40 VFs were created per interface.

    Implement a new function pointer for the rtnl_register service that will
    calculate the amount of data required for the ifinfo dump and allocate
    a buffer large enough to satisfy the request.
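The idea of the new callback is to size the dump buffer from the device's actual needs instead of a fixed page. A minimal sketch with a hypothetical function name; the base and per-VF byte counts are parameters here, not the kernel's real figures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size estimate for one interface's link dump: a fixed
 * base plus a per-VF amount, never less than one page. */
size_t ifinfo_dump_alloc_size(size_t base, size_t per_vf,
			      unsigned int num_vfs, size_t page_size)
{
	size_t needed = base + per_vf * num_vfs;

	return needed > page_size ? needed : page_size;
}
```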

    Signed-off-by: Greg Rose
    Signed-off-by: Jeff Kirsher

    Greg Rose
     

27 Apr, 2011

1 commit


31 Mar, 2011

1 commit


29 Mar, 2011

2 commits


22 Mar, 2011

1 commit

  • Commit 'xfrm: Move IPsec replay detection functions to a separate file'
    (9fdc4883d92d20842c5acea77a4a21bb1574b495)
    introduced the repl field in struct xfrm_state, but only initializes it
    on the SA's netlink create path; on the other paths, such as pf_key and
    ipcomp/ipcomp6, the repl field remains uninitialized. So if the SA is
    created by pf_key, any input packet processed by that SA will cause a
    panic.

    int xfrm_input()
    {
        ...
        x->repl->advance(x, seq);
        ...
    }

    This patch fixes it by introducing a new function, __xfrm_init_state().

    Pid: 0, comm: swapper Not tainted 2.6.38-next+ #14 Bochs Bochs
    EIP: 0060:[] EFLAGS: 00010206 CPU: 0
    EIP is at xfrm_input+0x31c/0x4cc
    EAX: dd839c00 EBX: 00000084 ECX: 00000000 EDX: 01000000
    ESI: dd839c00 EDI: de3a0780 EBP: dec1de88 ESP: dec1de64
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process swapper (pid: 0, ti=dec1c000 task=c09c0f20 task.ti=c0992000)
    Stack:
    00000000 00000000 00000002 c0ba27c0 00100000 01000000 de3a0798 c0ba27c0
    00000033 dec1de98 c0786848 00000000 de3a0780 dec1dea4 c0786868 00000000
    dec1debc c074ee56 e1da6b8c de3a0780 c074ed44 de3a07a8 dec1decc c074ef32
    Call Trace:
    [] xfrm4_rcv_encap+0x22/0x27
    [] xfrm4_rcv+0x1b/0x1d
    [] ip_local_deliver_finish+0x112/0x1b1
    [] ? ip_local_deliver_finish+0x0/0x1b1
    [] NF_HOOK.clone.1+0x3d/0x44
    [] ip_local_deliver+0x3e/0x44
    [] ? ip_local_deliver_finish+0x0/0x1b1
    [] ip_rcv_finish+0x30a/0x332
    [] ? ip_rcv_finish+0x0/0x332
    [] NF_HOOK.clone.1+0x3d/0x44
    [] ip_rcv+0x20b/0x247
    [] ? ip_rcv_finish+0x0/0x332
    [] __netif_receive_skb+0x373/0x399
    [] netif_receive_skb+0x4b/0x51
    [] cp_rx_poll+0x210/0x2c4 [8139cp]
    [] net_rx_action+0x9a/0x17d
    [] __do_softirq+0xa1/0x149
    [] ? __do_softirq+0x0/0x149

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     

14 Mar, 2011

2 commits

  • This patch adds a netlink-based user interface to configure
    ESN and big anti-replay windows. The new netlink attribute
    XFRMA_REPLAY_ESN_VAL is used to configure the new implementation.
    If the XFRM_STATE_ESN flag is set, we use ESN and support for big
    anti-replay windows for the configured state. If this flag is not
    set, we use the new implementation with 32-bit sequence numbers.
    A big anti-replay window can be configured in this case anyway.
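A sketch of a configurable-size anti-replay window for 32-bit sequence numbers, using a circular bitmap indexed by `seq % size`. This is one common technique and not necessarily the kernel's exact code; ESN handling of the upper 32 bits is omitted, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_WINDOW 4096u /* 128 x 32-bit words */

/* Bit (seq % size) records whether seq, inside the window, was seen. */
struct replay_window {
	uint32_t top;  /* highest sequence number accepted so far */
	uint32_t size; /* window size in packets, 1..MAX_WINDOW */
	uint32_t bmp[MAX_WINDOW / 32];
};

static int  bit_get(const uint32_t *b, uint32_t i) { return (b[i / 32] >> (i % 32)) & 1u; }
static void bit_set(uint32_t *b, uint32_t i) { b[i / 32] |= 1u << (i % 32); }
static void bit_clr(uint32_t *b, uint32_t i) { b[i / 32] &= ~(1u << (i % 32)); }

void replay_init(struct replay_window *w, uint32_t size)
{
	assert(size >= 1 && size <= MAX_WINDOW);
	memset(w, 0, sizeof(*w));
	w->size = size;
}

/* Returns true (and records seq) if seq is new and not too old. */
bool replay_accept(struct replay_window *w, uint32_t seq)
{
	if (seq == 0)
		return false;             /* ESP sequence numbers start at 1 */

	if (seq > w->top) {               /* window slides forward */
		uint32_t diff = seq - w->top;

		if (diff >= w->size)
			memset(w->bmp, 0, sizeof(w->bmp));
		else                      /* recycle slots top+1 .. seq */
			for (uint32_t i = 1; i <= diff; i++)
				bit_clr(w->bmp, (w->top + i) % w->size);
		w->top = seq;
		bit_set(w->bmp, seq % w->size);
		return true;
	}

	if (w->top - seq >= w->size)
		return false;             /* left of the window: too old */
	if (bit_get(w->bmp, seq % w->size))
		return false;             /* already seen: replay */
	bit_set(w->bmp, seq % w->size);
	return true;
}
```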

    Signed-off-by: Steffen Klassert
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • To support multiple versions of replay detection, we move the replay
    detection functions to a separate file and make them accessible
    via function pointers contained in the struct xfrm_replay.
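A sketch of the function-pointer dispatch described here, loosely modeled on struct xfrm_replay (field and function names are hypothetical). Each replay-detection variant supplies its own ops table; a shared init routine selects one, so the input path never calls through a NULL `repl` pointer, the failure mode described in the 22 Mar 2011 entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct state;

/* One table of operations per replay-detection variant. */
struct replay_ops {
	int  (*check)(struct state *x, uint32_t seq);   /* 0 = accept */
	void (*advance)(struct state *x, uint32_t seq);
};

struct state {
	const struct replay_ops *repl; /* selected once at init */
	uint32_t top;                  /* highest sequence number seen */
	uint32_t bitmap;               /* bit n: (top - n) was seen */
};

/* Classic 32-packet window. */
static int legacy_check(struct state *x, uint32_t seq)
{
	if (seq > x->top)
		return 0;
	uint32_t diff = x->top - seq;
	if (diff >= 32)
		return -1;
	return ((x->bitmap >> diff) & 1u) ? -1 : 0;
}

static void legacy_advance(struct state *x, uint32_t seq)
{
	if (seq > x->top) {
		uint32_t diff = seq - x->top;
		x->bitmap = diff < 32 ? (x->bitmap << diff) | 1u : 1u;
		x->top = seq;
	} else {
		x->bitmap |= 1u << (x->top - seq);
	}
}

static const struct replay_ops legacy_ops = { legacy_check, legacy_advance };

/* Every creation path must call this so that x->repl is never NULL. */
void state_init(struct state *x)
{
	x->repl = &legacy_ops;
	x->top = 0;
	x->bitmap = 0;
}

/* Input path: dispatch through whichever ops table was selected. */
int state_input(struct state *x, uint32_t seq)
{
	if (x->repl->check(x, seq) != 0)
		return -1;
	x->repl->advance(x, seq);
	return 0;
}
```

A second variant (for example ESN with a larger bitmap) would add its own ops table and have the init routine select it, leaving the input path unchanged.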

    Signed-off-by: Steffen Klassert
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Steffen Klassert
     

04 Mar, 2011

1 commit


28 Feb, 2011

1 commit