12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But the moment we place the SKB onto the socket receive queue, it
    can be consumed and freed. So this skb->len access potentially
    touches freed memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no implementation of this callback actually uses
    the length argument. And since nobody cared about its
    value, lots of call sites pass in arbitrary values such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller
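
The hazard described above can be sketched in userspace C. This is a minimal illustration, not kernel code: struct buf, struct sock_sketch, queue_tail(), consume() and deliver() are hypothetical stand-ins for the SKB, the socket and its receive queue. The point is that the producer stops touching the buffer once it is queued, and the notification callback takes no length argument.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct sk_buff and struct sock. */
struct buf {
    size_t len;
    struct buf *next;
};

struct sock_sketch {
    struct buf *head, *tail;                    /* receive queue */
    int wakeups;                                /* number of notifications seen */
    void (*data_ready)(struct sock_sketch *sk); /* no length argument */
};

static void count_wakeup(struct sock_sketch *sk)
{
    sk->wakeups++;
}

static void queue_tail(struct sock_sketch *sk, struct buf *b)
{
    b->next = NULL;
    if (sk->tail)
        sk->tail->next = b;
    else
        sk->head = b;
    sk->tail = b;
}

/* Consumer: takes ownership of the first queued buffer and frees it. */
static size_t consume(struct sock_sketch *sk)
{
    struct buf *b = sk->head;
    size_t len;

    if (!b)
        return 0;
    sk->head = b->next;
    if (!sk->head)
        sk->tail = NULL;
    len = b->len;
    free(b);
    return len;
}

/* Producer: once queued, the buffer belongs to the consumer. */
static int deliver(struct sock_sketch *sk, size_t len)
{
    struct buf *b = malloc(sizeof(*b));

    if (!b)
        return -1;
    b->len = len;
    queue_tail(sk, b);
    /* b may already be freed by a consumer here; b->len is not read again. */
    sk->data_ready(sk);
    return 0;
}
```

In this single-threaded sketch nothing can free the buffer between queueing and notification; in the kernel a concurrent reader can, which is why the callback cannot safely be handed skb->len.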
     

26 Mar, 2014

1 commit


10 Mar, 2014

2 commits

  • security_xfrm_policy_alloc can be called in atomic context so the
    allocation should be done with GFP_ATOMIC. Add an argument to let the
    callers choose the appropriate way. In order to do so a gfp argument
    needs to be added to the method xfrm_policy_alloc_security in struct
    security_operations and to the internal function
    selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic
    callers and leave GFP_KERNEL as before for the rest.
    The path that needed the gfp argument addition is:
    security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security ->
    all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) ->
    selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only)

    Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also
    add it to security_context_to_sid which is used inside and prior to this
    patch did only GFP_KERNEL allocation. So add gfp argument to
    security_context_to_sid and adjust all of its callers as well.

    CC: Paul Moore
    CC: Dave Jones
    CC: Steffen Klassert
    CC: Fan Du
    CC: David S. Miller
    CC: LSM list
    CC: SELinux list

    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Paul Moore
    Signed-off-by: Steffen Klassert

    Nikolay Aleksandrov
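
The pattern of threading an allocation-mode argument down a call chain can be sketched as follows. This is a userspace illustration only: enum alloc_mode, ctx_dup() and policy_alloc() are hypothetical names standing in for gfp_t and the functions named above, and malloc() cannot actually distinguish the two modes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's gfp_t flags. */
enum alloc_mode { MODE_MAY_SLEEP, MODE_ATOMIC };

static int sleeping_allocs; /* counts allocations that were allowed to sleep */

/* Innermost helper: honours the caller's mode instead of hard-coding one. */
static char *ctx_dup(const char *ctx, enum alloc_mode mode)
{
    size_t n;
    char *copy;

    assert(ctx != NULL);
    n = strlen(ctx) + 1;
    if (mode == MODE_MAY_SLEEP)
        sleeping_allocs++;
    copy = malloc(n);
    if (copy)
        memcpy(copy, ctx, n);
    return copy;
}

/* Middle layer: does not decide anything, it only forwards the mode. */
static char *policy_alloc(const char *ctx, enum alloc_mode mode)
{
    return ctx_dup(ctx, mode);
}
```

A caller in atomic context would pass MODE_ATOMIC and every other caller MODE_MAY_SLEEP, so the decision stays with the code that knows its own context.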
     
  • There's a kmalloc with GFP_KERNEL in a helper
    (pfkey_sadb2xfrm_user_sec_ctx) used in pfkey_compile_policy which is
    called under rcu_read_lock. Adjust pfkey_sadb2xfrm_user_sec_ctx to have
    a gfp argument and adjust the users.

    CC: Dave Jones
    CC: Steffen Klassert
    CC: Fan Du
    CC: David S. Miller

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: Steffen Klassert

    Nikolay Aleksandrov
     

07 Mar, 2014

1 commit


21 Feb, 2014

1 commit


17 Feb, 2014

1 commit

  • The goal of this patch is to allow userland to dump only a subset of the SAs
    by specifying a filter during the dump.
    The kernel is in charge of filtering the SAs, which avoids generating useless
    netlink traffic (it also saves some CPU cycles). This is particularly useful
    when a large number of SAs are configured on the system.

    Note that I removed the union in struct xfrm_state_walk to fix a problem on arm.
    struct netlink_callback->args is defined as an array of 6 longs and the first long
    is used in xfrm code to flag the cb as initialized. Hence, we must have:
    sizeof(struct xfrm_state_walk) <= sizeof(long) * 5
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
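
The size constraint on the walk state can be checked at compile time. The sketch below assumes a typical Linux ABI where a pointer is as wide as a long; struct walk_sketch and its field names are illustrative and not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

#define CB_ARGS 6 /* netlink_callback->args holds 6 longs, as stated above */

/* Hypothetical walk state; the fields are illustrative only. */
struct walk_sketch {
    void *all_next;
    void *all_prev;
    unsigned char state;
    unsigned char dying;
    unsigned char proto;
    unsigned int seq;
    void *filter;
};

/* args[0] is the "initialized" flag, so the state must fit in args[1..5]. */
static_assert(sizeof(struct walk_sketch) <= sizeof(long) * (CB_ARGS - 1),
              "walk state does not fit in the remaining callback args");

/* Number of long-sized slots the walk state occupies, rounded up. */
static size_t walk_slots_needed(void)
{
    return (sizeof(struct walk_sketch) + sizeof(long) - 1) / sizeof(long);
}
```

With such an assertion in place, a layout change that pushes the struct past five longs breaks the build instead of corrupting memory at run time.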
     

13 Feb, 2014

1 commit

  • In the case when KMs have no listeners, km_query() will fail and
    temporary SAs are garbage collected immediately after their allocation.
    This causes strain on memory allocation, leading even to OOM since
    temporary SA alloc/free cycle is performed for every packet
    and garbage collection does not keep up the pace.

    The sane thing to do is to make sure we have an audience before
    allocating a temporary SA.

    Signed-off-by: Horia Geanta
    Signed-off-by: Steffen Klassert

    Horia Geanta
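
The "check for an audience first" idea can be sketched as below. The types and the function acquire_temp_sa() are hypothetical; the real code paths and their names are not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a key manager and a temporary SA. */
struct km_sketch { int listeners; };
struct sa_sketch { int temporary; };

static int temp_sa_allocs; /* how many temporary SAs were really allocated */

/* Allocate a temporary SA only if someone can answer the acquire request. */
static struct sa_sketch *acquire_temp_sa(const struct km_sketch *km)
{
    struct sa_sketch *sa;

    if (km->listeners == 0)
        return NULL; /* no audience: skip the alloc/free cycle entirely */

    sa = calloc(1, sizeof(*sa));
    if (sa) {
        sa->temporary = 1;
        temp_sa_allocs++;
    }
    return sa;
}
```

Without listeners the function returns before touching the allocator, so a per-packet call no longer produces garbage that has to be collected.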
     

16 Dec, 2013

1 commit


06 Dec, 2013

3 commits

  • We now queue packets to the policy if the states are not yet resolved;
    this replaces the ancient sleeping code. The sleeping could also cause
    indefinite task hangs if the needed state never got resolved.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • Semantically, the xfrm layer is fully namespace aware, and so
    should its locks be, e.g. xfrm_state/policy_lock.
    Guarding the state/policy linked lists of different namespaces
    with one global lock is wrong semantically in the first place,
    since the namespaces are mutually independent, but more
    seriously it also causes a scalability problem.

    One practical scenario is an Open Network Stack where
    hundreds of lxc tenants act as routers within one host and
    the global xfrm_state/policy_lock becomes the bottleneck.
    Once those locks are decoupled in a per-namespace fashion,
    lock contention stays within the scope of one namespace and
    causes no additional SPD/SAD access delay for the other
    namespaces.

    This patch thus improves scalability without changing the
    original xfrm behavior.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
     
  • because the home agent could well be run in a net namespace
    other than init_net. The original behavior could lead to
    inconsistent key info.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
     

21 Nov, 2013

1 commit


17 Sep, 2013

1 commit

  • For the legacy IPsec anti-replay mechanism:

    The bitmap in struct xfrm_replay_state can only provide a 32-bit
    window in the current design, so the user-level parameter
    sadb_sa_replay should honor this limit. Otherwise setkey -D prints
    misleading output ("replay=244"):

    192.168.25.2 192.168.22.2
    esp mode=transport spi=147561170(0x08cb9ad2) reqid=0(0x00000000)
    E: aes-cbc 9a8d7468 7655cf0b 719d27be b0ddaac2
    A: hmac-sha1 2d2115c2 ebf7c126 1c54f186 3b139b58 264a7331
    seq=0x00000000 replay=244 flags=0x00000000 state=mature
    created: Sep 17 14:00:00 2013 current: Sep 17 14:00:22 2013
    diff: 22(s) hard: 30(s) soft: 26(s)
    last: Sep 17 14:00:00 2013 hard: 0(s) soft: 0(s)
    current: 1408(bytes) hard: 0(bytes) soft: 0(bytes)
    allocated: 22 hard: 0 soft: 0
    sadb_seq=1 pid=4854 refcnt=0
    192.168.22.2 192.168.25.2
    esp mode=transport spi=255302123(0x0f3799eb) reqid=0(0x00000000)
    E: aes-cbc 6485d990 f61a6bd5 e5660252 608ad282
    A: hmac-sha1 0cca811a eb4fa893 c47ae56c 98f6e413 87379a88
    seq=0x00000000 replay=244 flags=0x00000000 state=mature
    created: Sep 17 14:00:00 2013 current: Sep 17 14:00:22 2013
    diff: 22(s) hard: 30(s) soft: 26(s)
    last: Sep 17 14:00:00 2013 hard: 0(s) soft: 0(s)
    current: 1408(bytes) hard: 0(bytes) soft: 0(bytes)
    allocated: 22 hard: 0 soft: 0
    sadb_seq=0 pid=4854 refcnt=0

    Also optimize the window check in xfrm_replay_check by setting the
    desired x->props.replay_window once, when the xfrm_state is first
    created, instead of doing the comparison every time.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
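
Clamping the requested window to what a 32-bit bitmap can represent might look like the sketch below. clamp_replay_window() and REPLAY_MAX_WINDOW are hypothetical names; the only facts taken from the text are the 32-bit bitmap and the example value 244.

```c
#include <stdint.h>

/* The legacy replay state tracks its window in a single 32-bit bitmap. */
enum { REPLAY_MAX_WINDOW = sizeof(uint32_t) * 8 };

/*
 * Clamp a user-supplied window size once, at state creation time,
 * so later checks can use the stored value without comparing again.
 */
static uint8_t clamp_replay_window(uint8_t requested)
{
    return requested < REPLAY_MAX_WINDOW ? requested
                                         : (uint8_t)REPLAY_MAX_WINDOW;
}
```

With this in place a request for 244 is stored as 32, so a later dump reports a window the bitmap can actually track.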
     

07 Aug, 2013

1 commit

  • present_and_same_family has already validated the address family of both
    SADB_EXT_ADDRESS_SRC and SADB_EXT_ADDRESS_DST at the beginning.
    pfkey_sadb_addr2xfrm_addr therefore doesn't need to do the check again.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
     

05 Aug, 2013

2 commits


31 Jul, 2013

1 commit

  • This is inspired by a5cc68f3d6 "af_key: fix info leaks in notify
    messages". There are some struct members which don't get initialized
    and could disclose small amounts of private information.

    Acked-by: Mathias Krause
    Signed-off-by: Dan Carpenter
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Dan Carpenter
     

27 Jun, 2013

1 commit

  • key_notify_sa_flush() and key_notify_policy_flush() fail to initialize
    the sadb_msg_reserved member of the broadcast message and thereby
    leak 2 bytes of heap memory to listeners. Fix that.

    Signed-off-by: Mathias Krause
    Cc: Steffen Klassert
    Cc: "David S. Miller"
    Cc: Herbert Xu
    Signed-off-by: David S. Miller

    Mathias Krause
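
One way to avoid this class of leak is to clear the whole header before filling it in. The sketch below uses a hypothetical struct msg_hdr_sketch whose field order follows the sadb_msg base header of RFC 2367; fill_flush_hdr() is an illustrative helper, not the kernel's function, and the actual fix may set the fields individually instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical header; field order follows the RFC 2367 base message header. */
struct msg_hdr_sketch {
    uint8_t  version;
    uint8_t  type;
    uint8_t  err;
    uint8_t  satype;
    uint16_t len;      /* length in 64-bit words */
    uint16_t reserved; /* must not carry stale heap contents */
    uint32_t seq;
    uint32_t pid;
};

static void fill_flush_hdr(struct msg_hdr_sketch *hdr, uint8_t type,
                           uint32_t seq, uint32_t pid)
{
    /* Zero everything first so fields not set below cannot leak old data. */
    memset(hdr, 0, sizeof(*hdr));
    hdr->version = 2; /* PF_KEY_V2 */
    hdr->type = type;
    hdr->len = sizeof(*hdr) / 8;
    hdr->seq = seq;
    hdr->pid = pid;
}
```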
     

01 Jun, 2013

1 commit

  • In some cases after deleting a policy from the SPD the policy would
    remain in the dst/flow/route cache for an extended period of time
    which caused problems for SELinux as its dynamic network access
    controls key off of the number of XFRM policy and state entries.
    This patch corrects this problem by forcing an XFRM garbage collection
    whenever a policy is successfully removed.

    Reported-by: Ondrej Moris
    Signed-off-by: Paul Moore
    Signed-off-by: David S. Miller

    Paul Moore
     

28 Mar, 2013

1 commit

  • Steffen Klassert says:

    ====================
    1) Initialize the satype field in key_notify_policy_flush(),
    this was left uninitialized. From Nicolas Dichtel.

    2) The sequence number difference for replay notifications
    was miscalculated on ESN sequence number wrap. We need
    a separate replay notify function for ESN.

    3) Fix an off by one in the esn replay notify function.
    From Mathias Krause.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

08 Mar, 2013

1 commit


28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for-each-entry iterators were conceived
    differently from the list ones. The list iterator is:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small number of places were using the 'node' parameter; these
    were modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
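
The difference between the two iterator shapes can be sketched with a tiny userspace list. This is not the code from linux/list.h: struct hnode, struct hhead, entry_of() and both macros are simplified, hypothetical versions, and __typeof__ is a GCC/Clang extension.

```c
#include <stddef.h>

struct hnode { struct hnode *next; };
struct hhead { struct hnode *first; };

/* Recover the containing object from a pointer to its embedded node. */
#define entry_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Old shape: the caller supplies an entry cursor AND a node cursor. */
#define for_each_entry_old(tpos, pos, head, member)                     \
    for (pos = (head)->first;                                           \
         pos && (tpos = entry_of(pos, __typeof__(*tpos), member), 1);   \
         pos = pos->next)

/* New shape: a single entry cursor, like list_for_each_entry. */
#define for_each_entry_new(pos, head, member)                           \
    for (pos = (head)->first                                            \
             ? entry_of((head)->first, __typeof__(*pos), member)        \
             : NULL;                                                    \
         pos;                                                           \
         pos = pos->member.next                                         \
             ? entry_of(pos->member.next, __typeof__(*pos), member)     \
             : NULL)

struct item {
    int value;
    struct hnode link;
};

static int sum_old(struct hhead *head)
{
    struct item *it;
    struct hnode *n; /* extra cursor needed only by the old shape */
    int sum = 0;

    for_each_entry_old(it, n, head, link)
        sum += it->value;
    return sum;
}

static int sum_new(struct hhead *head)
{
    struct item *it;
    int sum = 0;

    for_each_entry_new(it, head, link)
        sum += it->value;
    return sum;
}
```

Both loops visit the same entries; the new shape simply derives the next node from the entry itself, which is what the 'obj->member' replacement above refers to.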
     

21 Feb, 2013

1 commit


19 Feb, 2013

2 commits

  • proc_net_remove is only used to remove proc entries
    directly under /proc/net; it is not a general function for
    removing the proc entries of a netns. If we want to remove
    proc entries under /proc/net/stat/, we still
    need to call remove_proc_entry.

    This patch replaces proc_net_remove with remove_proc_entry.
    We can remove proc_net_remove after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create
    to create proc entries under /proc/net/, and other modules
    such as ipv4 use proc_net_fops_create.

    This is a little chaotic. This patch changes all uses of
    proc_net_fops_create to proc_create. We can remove
    proc_net_fops_create after this patch.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

15 Feb, 2013

1 commit

  • Steffen Klassert says:

    ====================
    1) Remove a duplicated call to skb_orphan() in pf_key, from Cong Wang.

    2) Prepare xfrm and pf_key for algorithms without pf_key support,
    from Jussi Kivilinna.

    3) Fix an unbalanced lock in xfrm_output_one(), from Li RongQing.

    4) Add an IPsec state resolution packet queue to handle
    packets that are sent before the states are resolved.

    5) xfrm4_policy_fini() is unused since 2.6.11, time to remove it.
    From Michal Kubecek.

    6) The xfrm gc threshold was configurable just in the initial
    namespace, make it configurable in all namespaces. From
    Michal Kubecek.

    7) We currently can not insert policies with mark and mask
    such that some flows would be matched from both policies.
    Allow this if the priorities of these policies are different,
    the one with the higher priority is used in this case.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Feb, 2013

1 commit


30 Jan, 2013

1 commit


28 Jan, 2013

1 commit


19 Nov, 2012

1 commit

  • Allow an unprivileged user who has created a user namespace, and then
    created a network namespace, to effectively use the new network
    namespace, by reducing capable(CAP_NET_ADMIN) and
    capable(CAP_NET_RAW) calls to ns_capable(net->user_ns,
    CAP_NET_ADMIN) or ns_capable(net->user_ns, CAP_NET_RAW) calls.

    Allow creation of af_key sockets.
    Allow creation of llc sockets.
    Allow creation of af_packet sockets.

    Allow sending xfrm netlink control messages.

    Allow binding to netlink multicast groups.
    Allow sending to netlink multicast groups.
    Allow adding and dropping netlink multicast groups.
    Allow sending to all netlink multicast groups and port ids.

    Allow reading the netfilter SO_IP_SET socket option.
    Allow sending netfilter netlink messages.
    Allow setting and getting ip_vs netfilter socket options.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
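
The difference between a global capability check and a namespace-relative one can be modelled roughly as below. This is a deliberately simplified userspace model with hypothetical types and names: it ignores namespace ownership and the rest of the kernel's credential rules, and only shows why a check against net->user_ns can succeed where a check against the initial namespace fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits for this sketch only. */
#define CAP_NET_ADMIN_BIT (1u << 0)
#define CAP_NET_RAW_BIT   (1u << 1)

struct user_ns_sketch { struct user_ns_sketch *parent; };
struct net_sketch     { struct user_ns_sketch *user_ns; };
struct task_sketch {
    struct user_ns_sketch *user_ns; /* namespace the caps are held in */
    unsigned int caps;
};

static struct user_ns_sketch init_user_ns_sketch = { NULL };

/*
 * The task holds 'cap' over 'target' if target is the task's own
 * namespace or one of its descendants, and the bit is set.
 */
static bool ns_capable_sketch(const struct task_sketch *t,
                              const struct user_ns_sketch *target,
                              unsigned int cap)
{
    const struct user_ns_sketch *ns;

    for (ns = target; ns; ns = ns->parent)
        if (ns == t->user_ns)
            return (t->caps & cap) != 0;
    return false;
}

/* The global check is the same test against the initial namespace. */
static bool capable_sketch(const struct task_sketch *t, unsigned int cap)
{
    return ns_capable_sketch(t, &init_user_ns_sketch, cap);
}
```

A task that holds its capabilities only inside a child user namespace fails capable_sketch() but passes ns_capable_sketch() for a network namespace owned by that child, which is the situation the list above opens up.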
     

02 Oct, 2012

1 commit


11 Sep, 2012

1 commit

  • It is a frequent mistake to confuse the netlink port identifier with a
    process identifier. Try to reduce this confusion by renaming fields
    that hold port identifiers from pid to portid.

    I have carefully avoided changing the structures exported to
    userspace to avoid changing the userspace API.

    I have successfully built an allyesconfig kernel with this change.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

25 Aug, 2012

1 commit


16 Aug, 2012

1 commit


15 Aug, 2012

1 commit


16 Apr, 2012

1 commit


13 Apr, 2012

1 commit


12 Dec, 2011

1 commit


23 Nov, 2011

1 commit