16 Jun, 2018

1 commit

  • commit 4b66af2d6356a00e94bcdea3e7fea324e8b5c6f4 upstream.

    Key extensions (struct sadb_key) include a user-specified number of key
    bits. The kernel uses that number to determine how much key data to copy
    out of the message in pfkey_msg2xfrm_state().

    The length of the sadb_key message must be verified to be long enough,
    even in the case of SADB_X_AALG_NULL. Furthermore, the sadb_key_len value
    must be long enough to include both the key data and the struct sadb_key
    itself.

    Introduce a helper function verify_key_len(), and call it from
    parse_exthdrs() where other exthdr types are similarly checked for
    correctness.

    Signed-off-by: Kevin Easton
    Reported-by: syzbot+5022a34ca5a3d49b84223653fab632dfb7b4cf37@syzkaller.appspotmail.com
    Signed-off-by: Steffen Klassert
    Cc: Zubin Mithra
    Signed-off-by: Greg Kroah-Hartman

    Kevin Easton
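
The length rule described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel's verify_key_len(): `struct key_ext`, `key_ext_words_needed()` and `check_key_len()` are hypothetical names, and the struct merely mirrors the four 16-bit fields of struct sadb_key, with lengths counted in 64-bit words.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sadb_key (illustration only). */
struct key_ext {
	uint16_t len;      /* total extension length, in units of 8 bytes */
	uint16_t exttype;
	uint16_t bits;     /* user-specified number of key bits */
	uint16_t reserved;
};

static_assert(sizeof(struct key_ext) == 8, "header is one 64-bit word");

/* 64-bit words needed for the header plus the key data, rounded up. */
static unsigned int key_ext_words_needed(const struct key_ext *key)
{
	size_t key_bytes = (key->bits + 7u) / 8u;

	return (unsigned int)((sizeof(*key) + key_bytes + 7u) / 8u);
}

/* Reject an extension whose declared length cannot hold header + key. */
static int check_key_len(const struct key_ext *key)
{
	if (key_ext_words_needed(key) > key->len)
		return -EINVAL;
	return 0;
}
```

With this sketch, a 128-bit key needs three words (8 header bytes plus 16 key bytes), so a declared length of 1 is rejected while 3 is accepted.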
     

24 Jan, 2018

2 commits

  • commit 4e765b4972af7b07adcb1feb16e7a525ce1f6b28 upstream.

    If a message sent to a PF_KEY socket ended with an incomplete extension
    header (fewer than 4 bytes remaining), then parse_exthdrs() read past
    the end of the message, into uninitialized memory. Fix it by returning
    -EINVAL in this case.

    Reproducer:

    #include <linux/pfkeyv2.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main()
    {
            int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
            char buf[17] = { 0 };
            struct sadb_msg *msg = (void *)buf;

            msg->sadb_msg_version = PF_KEY_V2;
            msg->sadb_msg_type = SADB_DELETE;
            msg->sadb_msg_len = 2;

            /* 16-byte header plus 1 stray byte: an incomplete extension header */
            write(sock, buf, 17);
    }

    Signed-off-by: Eric Biggers
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
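
A minimal userspace sketch of the bounds check this fix describes, assuming a simplified extension header of two 16-bit fields. `struct ext_hdr` and `count_exts()` are hypothetical names for illustration; the real parse_exthdrs() does considerably more validation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte extension header; len is in units of 8 bytes. */
struct ext_hdr {
	uint16_t len;
	uint16_t type;
};

/*
 * Walk the extensions that follow the base message header.
 * Returns the number of extensions, or -EINVAL if the buffer ends with
 * an incomplete header or an extension overruns the remaining bytes.
 */
static int count_exts(const uint8_t *p, size_t len)
{
	int n = 0;

	assert(p != NULL || len == 0);
	while (len > 0) {
		struct ext_hdr hdr;
		size_t ext_len;

		/* The check the fix adds: never read a partial header. */
		if (len < sizeof(hdr))
			return -EINVAL;
		memcpy(&hdr, p, sizeof(hdr));

		ext_len = (size_t)hdr.len * 8u;
		if (ext_len < 8u || ext_len > len)
			return -EINVAL;

		p += ext_len;
		len -= ext_len;
		n++;
	}
	return n;
}
```

The reproducer's 17-byte message leaves one trailing byte after the 16-byte base header; in this sketch that corresponds to `count_exts(tail, 1)`, which is rejected.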
     
  • commit 06b335cb51af018d5feeff5dd4fd53847ddb675a upstream.

    If a message sent to a PF_KEY socket ended with one of the extensions
    that takes a 'struct sadb_address' but there were not enough bytes
    remaining in the message for the ->sa_family member of the 'struct
    sockaddr' which is supposed to follow, then verify_address_len() read
    past the end of the message, into uninitialized memory. Fix it by
    returning -EINVAL in this case.

    This bug was found using syzkaller with KMSAN.

    Reproducer:

    #include <linux/pfkeyv2.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main()
    {
            int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
            char buf[24] = { 0 };
            struct sadb_msg *msg = (void *)buf;
            struct sadb_address *addr = (void *)(msg + 1);

            msg->sadb_msg_version = PF_KEY_V2;
            msg->sadb_msg_type = SADB_DELETE;
            msg->sadb_msg_len = 3;
            addr->sadb_address_len = 1;
            addr->sadb_address_exttype = SADB_EXT_ADDRESS_SRC;

            /* 16-byte header plus an 8-byte address extension, no sockaddr */
            write(sock, buf, 24);
    }

    Reported-by: Alexander Potapenko
    Signed-off-by: Eric Biggers
    Signed-off-by: Steffen Klassert
    Signed-off-by: Greg Kroah-Hartman

    Eric Biggers
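
The check can be sketched as follows, assuming an 8-byte address extension header followed directly by a struct sockaddr. `ADDR_EXT_HDR_BYTES` and `check_addr_ext_len()` are hypothetical names; this is an illustration of the idea, not the kernel's verify_address_len().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Assumed size of the address extension header that precedes the sockaddr. */
#define ADDR_EXT_HDR_BYTES 8u

static_assert(ADDR_EXT_HDR_BYTES % 8u == 0, "header is whole 64-bit words");

/*
 * len_words is the extension length in units of 8 bytes. The extension
 * must at least reach the end of sa_family before that field is read.
 */
static int check_addr_ext_len(uint16_t len_words)
{
	size_t need = ADDR_EXT_HDR_BYTES +
		      offsetof(struct sockaddr, sa_family) +
		      sizeof(((struct sockaddr *)0)->sa_family);
	size_t need_words = (need + 7u) / 8u;

	if (len_words < need_words)
		return -EINVAL;
	return 0;
}
```

The reproducer sets the address length to 1 word, which covers only the header, so this sketch rejects it; 2 words is the smallest length that reaches sa_family.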
     

16 Aug, 2017

1 commit


15 Aug, 2017

1 commit

  • pfkey_broadcast() might be called from non process contexts,
    we can not use GFP_KERNEL in these cases [1].

    This patch partially reverts commit ba51b6be38c1 ("net: Fix RCU splat in
    af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
    section.

    [1] : syzkaller reported :

    in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
    3 locks held by syzkaller183439/2932:
    #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
    #1: (&pfk->dump_lock){+.+.+.}, at: [] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
    #2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [] spin_lock_bh include/linux/spinlock.h:304 [inline]
    #2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
    CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:16 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:52
    ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
    __might_sleep+0x95/0x190 kernel/sched/core.c:5947
    slab_pre_alloc_hook mm/slab.h:416 [inline]
    slab_alloc mm/slab.c:3383 [inline]
    kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
    skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
    pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
    pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
    dump_sp+0x3d6/0x500 net/key/af_key.c:2685
    xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
    pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
    pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
    pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
    pfkey_process+0x606/0x710 net/key/af_key.c:2814
    pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
    sock_sendmsg_nosec net/socket.c:633 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:643
    ___sys_sendmsg+0x755/0x890 net/socket.c:2035
    __sys_sendmsg+0xe5/0x210 net/socket.c:2069
    SYSC_sendmsg net/socket.c:2080 [inline]
    SyS_sendmsg+0x2d/0x50 net/socket.c:2076
    entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x445d79
    RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79
    RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008
    RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700
    R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000
    R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000

    Fixes: ba51b6be38c1 ("net: Fix RCU splat in af_key")
    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Cc: David Ahern
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 Jul, 2017

1 commit

  • After rcu conversions performance degradation in forward tests isn't that
    noticeable anymore.

    See next patch for some numbers.

    A followup patch could then also remove genid from the policies
    as we do not cache bundles anymore.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

05 Jul, 2017

1 commit

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

01 Jul, 2017

4 commits

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    This patch uses refcount_inc_not_zero() instead of
    atomic_inc_not_zero_hint() due to the absence of a _hint()
    version of the refcount API. If the _hint() version must
    be used, we might need to revisit the API.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
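
To illustrate why a dedicated reference-counter type helps, here is a userspace sketch of an "increment unless zero" operation that saturates instead of wrapping. It uses C11 atomics; `struct ref` and `ref_inc_not_zero()` are hypothetical names, and the kernel's refcount_t implementation differs in detail.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical saturating reference counter (illustration only). */
struct ref {
	atomic_uint count;
};

/*
 * Take a reference unless the count already dropped to zero.
 * A counter that reaches UINT_MAX stays pinned there rather than
 * wrapping to zero, so an overflow leaks the object instead of
 * freeing it while references are still held.
 */
static bool ref_inc_not_zero(struct ref *r)
{
	unsigned int old;

	assert(r != NULL);
	old = atomic_load(&r->count);
	do {
		if (old == 0)
			return false;   /* object is already being freed */
		if (old == UINT_MAX)
			return true;    /* saturated: do not wrap */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));

	return true;
}
```

A plain atomic increment at UINT_MAX would wrap to zero, which is the overflow that can lead to a use-after-free.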
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • A set of overlapping changes in macvlan and the rocker
    driver, nothing serious.

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2017

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2017-06-23

    1) Use memdup_user to simplify xfrm_user_policy.
    From Geliang Tang.

    2) Make xfrm_dev_register static to silence a sparse warning.
    From Wei Yongjun.

    3) Use crypto_memneq to check the ICV in the AH protocol.
    From Sabrina Dubroca.

    4) Remove some unused variables in esp6.
    From Stephen Hemminger.

    5) Extend XFRM MIGRATE to allow to change the UDP encapsulation port.
    From Antony Antony.

    6) Include the UDP encapsulation port to km_migrate announcements.
    From Antony Antony.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Jun, 2017

3 commits

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions (skb_put, __skb_put and pskb_put) return void *
    and remove all the casts across the tree, adding a (u8 *) cast only
    where the unsigned char pointer was used directly, all done with the
    following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    which actually doesn't cover pskb_put since there are only three
    users overall.

    A handful of stragglers were converted manually, notably a macro in
    drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
    instances in net/bluetooth/hci_sock.c. In the former file, I also
    had to fix one whitespace problem spatch introduced.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • A common pattern with skb_put() is to just want to memcpy()
    some data into the new space, introduce skb_put_data() for
    this.

    An spatch similar to the one for skb_put_zero() converts many
    of the places using it:

    @@
    identifier p, p2;
    expression len, skb, data;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, len);
    |
    -memcpy(p, data, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb, data;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, sizeof(*p));
    |
    -memcpy(p, data, sizeof(*p));
    )

    @@
    expression skb, len, data;
    @@
    -memcpy(skb_put(skb, len), data, len);
    +skb_put_data(skb, data, len);

    (again, manually post-processed to retain some comments)

    Reviewed-by: Stephen Hemminger
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • There were many places that my previous spatch didn't find,
    as pointed out by yuan linyu in various patches.

    The following spatch found many more and also removes the
    now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

    Apply it to the tree (with one manual fixup to keep the
    comment in vxlan.c, which spatch removed.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
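
The three skb_put() changes above share one idea: return void * from the "reserve tail space" helper and fold the common memcpy()/memset() follow-up into it. A userspace sketch on a fixed-size buffer, where `struct buf`, `buf_put()`, `buf_put_data()` and `buf_put_zero()` are hypothetical stand-ins and not the sk_buff API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_CAP 64

/* Hypothetical stand-in for an skb's linear data area. */
struct buf {
	unsigned char data[BUF_CAP];
	size_t len;                     /* bytes currently in use */
};

/* Reserve len bytes at the tail; void * means callers need no cast. */
static void *buf_put(struct buf *b, size_t len)
{
	void *tail;

	assert(b->len <= BUF_CAP);
	if (len > BUF_CAP - b->len)
		return NULL;            /* sketch: fail instead of panicking */
	tail = b->data + b->len;
	b->len += len;
	return tail;
}

/* Reserve space and copy src into it (the memcpy pattern). */
static void *buf_put_data(struct buf *b, const void *src, size_t len)
{
	void *tail = buf_put(b, len);

	if (tail)
		memcpy(tail, src, len);
	return tail;
}

/* Reserve space and clear it (the memset pattern). */
static void *buf_put_zero(struct buf *b, size_t len)
{
	void *tail = buf_put(b, len);

	if (tail)
		memset(tail, 0, len);
	return tail;
}
```

A caller can then write `struct hdr *h = buf_put_zero(&b, sizeof(*h));` with no cast and no separate memset(), which is the shape the semantic patches produce.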
     

14 Jun, 2017

2 commits

  • The default error code in pfkey_msg2xfrm_state() is -ENOBUFS. We
    added a new call to security_xfrm_state_alloc() which sets "err" to zero,
    so there are several places where we can return ERR_PTR(0) if kmalloc()
    fails. The caller is expecting error pointers, so it leads to a NULL
    dereference.

    Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Steffen Klassert

    Dan Carpenter
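
Why ERR_PTR(0) is dangerous can be shown with a userspace imitation of the error-pointer convention. `err_ptr()` and `is_err()` below are hypothetical re-creations written for this sketch, assuming the usual convention that the top 4095 address values encode negative errno codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

static_assert(sizeof(intptr_t) == sizeof(void *), "pointer-sized integer");

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

With err == 0, `err_ptr(0)` is simply NULL and `is_err(NULL)` is false, so a caller that only tests for error pointers goes on to dereference NULL.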
     
  • There are some missing error codes here so we accidentally return NULL
    instead of an error pointer. It results in a NULL pointer dereference.

    Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Steffen Klassert

    Dan Carpenter
     

12 Jun, 2017

1 commit

  • We now force garbage collection whenever a policy is removed in
    xfrm_policy_flush(). During xfrm_net_exit(), however, we call
    flow_cache_fini() first, which sets fc->percpu to NULL. When we then call
    xfrm_policy_fini() -> xfrm_policy_flush() -> flow_cache_flush(), we hit a
    NULL pointer dereference while checking percpu_empty. The code path looks
    like:

    flow_cache_fini()
      - fc->percpu = NULL
    xfrm_policy_fini()
      - xfrm_policy_flush()
        - xfrm_garbage_collect()
          - flow_cache_flush()
            - flow_cache_percpu_empty()
              - fcp = per_cpu_ptr(fc->percpu, cpu)

    To reproduce, just add ipsec in netns and then remove the netns.

    v2:
    As Xin Long suggested, since only two other places need to call it, move
    xfrm_garbage_collect() outside xfrm_policy_flush().

    v3:
    Fix subject mismatch after v2 fix.

    Fixes: 35db06912189 ("xfrm: do the garbage collection after flushing policy")
    Signed-off-by: Hangbin Liu
    Reviewed-by: Xin Long
    Signed-off-by: Steffen Klassert

    Hangbin Liu
     

07 Jun, 2017

2 commits

  • Add XFRMA_ENCAP, UDP encapsulation port, to km_migrate announcement
    to userland. Only add if XFRMA_ENCAP was in user migrate request.

    Signed-off-by: Antony Antony
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Steffen Klassert

    Antony Antony
     
  • Add UDP encapsulation port to XFRM_MSG_MIGRATE using an optional
    netlink attribute XFRMA_ENCAP.

    Devices that support the IKE MOBIKE extension (RFC-4555 Section 3.8)
    could go to sleep for a few minutes and wake up. When the device wakes
    up, the NAT mapping could have expired, so the device sends a MOBIKE
    UPDATE_SA message to migrate the IPsec SA. The change could be a change
    of UDP encapsulation port, IP address, or both.

    Reported-by: Paul Wouters
    Signed-off-by: Antony Antony
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Steffen Klassert

    Antony Antony
     

08 May, 2017

1 commit

  • The sadb_x_sec_len is stored in the unit 'byte divided by eight'.
    So we have to multiply this value by eight before we can do
    size checks. Otherwise we may get a slab-out-of-bounds when
    we memcpy the user sec_ctx.

    Fixes: df71837d502 ("[LSM-IPSec]: Security association restriction.")
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Signed-off-by: Steffen Klassert

    Steffen Klassert
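
The unit mismatch can be sketched like this. `sec_ctx_fits()` and `SADB_WORD` are hypothetical names for this illustration; the point is only that a length stored in 8-byte words must be converted to bytes before it is compared with a byte count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PF_KEY lengths are expressed in 64-bit words. */
#define SADB_WORD 8u

static_assert(SADB_WORD == sizeof(uint64_t), "one word is 8 bytes");

/*
 * remaining_bytes: bytes left in the user buffer.
 * sec_len_words:   declared security-context length, in 8-byte words.
 * Returns true if the declared extension fits in the remaining buffer.
 */
static bool sec_ctx_fits(size_t remaining_bytes, uint16_t sec_len_words)
{
	size_t sec_bytes = (size_t)sec_len_words * SADB_WORD;

	return sec_bytes >= SADB_WORD && sec_bytes <= remaining_bytes;
}
```

With 16 bytes remaining and a declared length of 3 words, comparing the raw value (3 <= 16) would pass, while the converted value (24 bytes) correctly fails.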
     

22 Apr, 2017

1 commit


18 Apr, 2017

1 commit

  • The parsing of sadb_x_ipsecrequest is broken in a number of ways.
    First of all we're not verifying sadb_x_ipsecrequest_len. This
    is needed when the structure carries addresses at the end. Worse
    we don't even look at the length when we parse those optional
    addresses.

    The migration code had similar parsing code that's better but
    it also has some deficiencies. The length is overcounted first
    of all as it includes the header itself. It also fails to check
    the length before dereferencing the sa_family field.

    This patch fixes those problems in parse_sockaddr_pair and then
    uses it in parse_ipsecrequest.

    Reported-by: Andrey Konovalov
    Signed-off-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Herbert Xu
     

03 Apr, 2017

1 commit

  • A dump may come in the middle of another dump, modifying its dump
    structure members. This race condition will result in NULL pointer
    dereference in kernel. So add a lock to prevent that race.

    Fixes: 83321d6b9872 ("[AF_KEY]: Dump SA/SP entries non-atomically")
    Signed-off-by: Yuejie Shi
    Signed-off-by: Steffen Klassert

    Yuejie Shi
     

24 Mar, 2017

1 commit


18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into a zero-based array and
    thus is an unsigned entity. Using a negative value is an out-of-bounds
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preferred to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
            g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is a 3-byte instruction which isn't necessary if the variable is
    unsigned because x86_64 is zero-extending by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
            ...
            ptr = ng->ptr[id - 1];
            ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a seemingly random artefact of code generation with the register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires a REX
    prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
    used which is longer than [r8].

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

23 Oct, 2015

1 commit


25 Aug, 2015

1 commit

  • Hit the following splat testing VRF change for ipsec:

    [ 113.475692] ===============================
    [ 113.476194] [ INFO: suspicious RCU usage. ]
    [ 113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted
    [ 113.477545] -------------------------------
    [ 113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-side critical section!
    [ 113.479288]
    [ 113.479288] other info that might help us debug this:
    [ 113.479288]
    [ 113.480207]
    [ 113.480207] rcu_scheduler_active = 1, debug_locks = 1
    [ 113.480931] 2 locks held by setkey/6829:
    [ 113.481371] #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [] pfkey_sendmsg+0xfb/0x213
    [ 113.482509] #1: (rcu_read_lock){......}, at: [] rcu_read_lock+0x0/0x6e
    [ 113.483509]
    [ 113.483509] stack backtrace:
    [ 113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED
    [ 113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
    [ 113.486845] 0000000000000001 ffff88001d4c7a98 ffffffff81518af2 ffffffff81086962
    [ 113.487732] ffff88001d538480 ffff88001d4c7ac8 ffffffff8107ae75 ffffffff8180a154
    [ 113.488628] 0000000000000b30 0000000000000000 00000000000000d0 ffff88001d4c7ad8
    [ 113.489525] Call Trace:
    [ 113.489813] [] dump_stack+0x4c/0x65
    [ 113.490389] [] ? console_unlock+0x3d6/0x405
    [ 113.491039] [] lockdep_rcu_suspicious+0xfa/0x103
    [ 113.491735] [] rcu_preempt_sleep_check+0x45/0x47
    [ 113.492442] [] ___might_sleep+0x19/0x1c8
    [ 113.493077] [] __might_sleep+0x6c/0x82
    [ 113.493681] [] cache_alloc_debugcheck_before.isra.50+0x1d/0x24
    [ 113.494508] [] kmem_cache_alloc+0x31/0x18f
    [ 113.495149] [] skb_clone+0x64/0x80
    [ 113.495712] [] pfkey_broadcast_one+0x3d/0xff
    [ 113.496380] [] pfkey_broadcast+0xb5/0x11e
    [ 113.497024] [] pfkey_register+0x191/0x1b1
    [ 113.497653] [] pfkey_process+0x162/0x17e
    [ 113.498274] [] pfkey_sendmsg+0x109/0x213

    In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes
    the RCU lock.

    Since pfkey_broadcast takes the RCU lock the allocation argument is
    pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock.
    The one call outside of rcu can be done with GFP_KERNEL.

    Fixes: 7f6b9dbd5afbd ("af_key: locking change")
    Signed-off-by: David Ahern
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    David Ahern
     

25 Jun, 2015

1 commit

  • Pull networking updates from David Miller:

    1) Add TX fast path in mac80211, from Johannes Berg.

    2) Add TSO/GRO support to ibmveth, from Thomas Falcon

    3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

    4) Lots of new rhashtable tests, from Thomas Graf.

    5) Run ingress qdisc lockless, from Alexei Starovoitov.

    6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting. From Eric Dumazet.

    7) Add mode parameter to pktgen, for testing receive. From Alexei
    Starovoitov.

    8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

    9) Move page frag allocator under mm/, also from Alexander.

    10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

    11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

    12) Extend flow dissector to be programmable and use it in new "Flower"
    classifier. From Jiri Pirko.

    13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics. From Willem de Bruijn.

    14) Add netdev driver for GENEVE tunnels, from John W Linville.

    15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

    16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

    17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

    18) Add tail call support to BPF, from Alexei Starovoitov.

    19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

    20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

    21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion. From Eric Dumazet.

    22) Add Cavium ThunderX driver, from Sunil Goutham.

    23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

    24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

    25) Double TCP Small Queues default to 256K to accommodate situations
    like the XEN driver and wireless aggregation. From Wei Liu.

    26) Add more entropy inputs to flow dissector, from Tom Herbert.

    27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

    28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

    29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
    bridge: vlan: flush the dynamically learned entries on port vlan delete
    bridge: multicast: add a comment to br_port_state_selection about blocking state
    net: inet_diag: export IPV6_V6ONLY sockopt
    stmmac: troubleshoot unexpected bits in des0 & des1
    net: ipv4 sysctl option to ignore routes when nexthop link is down
    net: track link-status of ipv4 nexthops
    net: switchdev: ignore unsupported bridge flags
    net: Cavium: Fix MAC address setting in shutdown state
    drivers: net: xgene: fix for ACPI support without ACPI
    ip: report the original address of ICMP messages
    net/mlx5e: Prefetch skb data on RX
    net/mlx5e: Pop cq outside mlx5e_get_cqe
    net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
    net/mlx5e: Remove extra spaces
    net/mlx5e: Avoid TX CQE generation if more xmit packets expected
    net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
    net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
    net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
    net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
    net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
    ...

    Linus Torvalds
     

28 May, 2015

1 commit


11 May, 2015

1 commit


01 Apr, 2015

1 commit

  • In many places, the a6 field is typecasted to struct in6_addr. As the
    fields are in union anyway, just add in6_addr type to the union and
    get rid of the typecasting.

    Modifying the uapi header is okay, the union has still the same size.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

03 Mar, 2015

1 commit

  • Now that TIPC no longer depends on the iocb argument in its internal
    implementations of the sendmsg() and recvmsg() hooks defined in the proto
    structure, no user is using the iocb argument in them at all.
    We can therefore drop the redundant iocb argument completely from all
    implementations of both sendmsg() and recvmsg() in the entire
    networking stack.

    Cc: Christoph Hellwig
    Suggested-by: Al Viro
    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     

24 Nov, 2014

1 commit


06 Nov, 2014

1 commit

  • This encapsulates all of the skb_copy_datagram_iovec() callers
    with call argument signature "skb, offset, msghdr->msg_iov, length".

    When we move to iov_iters in the networking, the iov_iter object will
    sit in the msghdr.

    Having a helper like this means there will be fewer places to touch
    during that transformation.

    Based upon descriptions and patch from Al Viro.

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Jul, 2014

1 commit


31 May, 2014

1 commit

  • This patch replaces a comma between expression statements by a semicolon.

    A simplified version of the semantic patch that performs this
    transformation is as follows:

    //
    @r@
    expression e1,e2,e;
    type T;
    identifier i;
    @@

    e1
    -,
    +;
    e2;
    //

    Signed-off-by: Himangi Saraogi
    Acked-by: Julia Lawall
    Signed-off-by: David S. Miller

    Himangi Saraogi
     

23 Apr, 2014

1 commit

  • Commit f1370cc4 "xfrm: Remove useless secid field from xfrm_audit." changed
    "struct xfrm_audit" to have either
    { audit_get_loginuid(current) / audit_get_sessionid(current) } or
    { INVALID_UID / -1 } pair.

    This means that we can represent "struct xfrm_audit" as "bool".
    This patch replaces "struct xfrm_audit" argument with "bool".

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Steffen Klassert

    Tetsuo Handa
     

22 Apr, 2014

1 commit

  • It seems to me that commit ab5f5e8b "[XFRM]: xfrm audit calls" is doing
    something strange at xfrm_audit_helper_usrinfo().
    If secid != 0 && security_secid_to_secctx(secid) != 0, the caller calls
    audit_log_task_context() which basically does
    secid != 0 && security_secid_to_secctx(secid) == 0 case
    except that secid is obtained from current thread's context.

    Oh, what happens if secid passed to xfrm_audit_helper_usrinfo() was
    obtained from other thread's context? It might audit current thread's
    context rather than other thread's context if security_secid_to_secctx()
    in xfrm_audit_helper_usrinfo() failed for some reason.

    Then, are all the callers of xfrm_audit_helper_usrinfo() passing either
    secid obtained from current thread's context or secid == 0?
    It seems to me that they are.

    If I didn't miss something, we don't need to pass secid to
    xfrm_audit_helper_usrinfo() because audit_log_task_context() will
    obtain secid from current thread's context.

    Signed-off-by: Tetsuo Handa
    Signed-off-by: Steffen Klassert

    Tetsuo Handa
     

12 Apr, 2014

1 commit

  • Several spots in the kernel perform a sequence like:

    skb_queue_tail(&sk->sk_receive_queue, skb);
    sk->sk_data_ready(sk, skb->len);

    But at the moment we place the SKB onto the socket receive queue it
    can be consumed and freed up. So this skb->len access is potentially
    to freed-up memory.

    Furthermore, the skb->len can be modified by the consumer so it is
    possible that the value isn't accurate.

    And finally, no actual implementation of this callback actually uses
    the length argument. And since nobody actually cared about its
    value, lots of call sites pass arbitrary values in such as '0' and
    even '1'.

    So just remove the length argument from the callback, that way there
    is no confusion whatsoever and all of these use-after-free cases get
    fixed as a side effect.

    Based upon a patch by Eric Dumazet and his suggestion to audit this
    issue tree-wide.

    Signed-off-by: David S. Miller

    David S. Miller