24 Mar, 2019

1 commit

  • [ Upstream commit fc2d5cfdcfe2ab76b263d91429caa22451123085 ]

    Attempting to avoid cloning the skb when broadcasting by inflating
    the refcount with sock_hold/sock_put while under RCU lock is dangerous
    and violates RCU principles. It leads to subtle race conditions when
    attempting to free the SKB, as we may reference sockets that have
    already been freed by the stack.

    Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c4b
    [006b6b6b6b6b6c4b] address between user and kernel address ranges
    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    task: fffffff78f65b380 task.stack: ffffff8049a88000
    pc : sock_rfree+0x38/0x6c
    lr : skb_release_head_state+0x6c/0xcc
    Process repro (pid: 7117, stack limit = 0xffffff8049a88000)
    Call trace:
    sock_rfree+0x38/0x6c
    skb_release_head_state+0x6c/0xcc
    skb_release_all+0x1c/0x38
    __kfree_skb+0x1c/0x30
    kfree_skb+0xd0/0xf4
    pfkey_broadcast+0x14c/0x18c
    pfkey_sendmsg+0x1d8/0x408
    sock_sendmsg+0x44/0x60
    ___sys_sendmsg+0x1d0/0x2a8
    __sys_sendmsg+0x64/0xb4
    SyS_sendmsg+0x34/0x4c
    el0_svc_naked+0x34/0x38
    Kernel panic - not syncing: Fatal exception

    Suggested-by: Eric Dumazet
    Signed-off-by: Sean Tranchetti
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin

    Sean Tranchetti
     

28 Jul, 2018

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2018-07-27

    1) Extend the output_mark to also support the input direction
    and masking the mark values before applying to the skb.

    2) Add a new lookup key for the upcoming xfrm interfaces.

    3) Extend the xfrm lookups to match xfrm interface IDs.

    4) Add virtual xfrm interfaces. The purpose of these interfaces
    is to overcome the design limitations that the existing
    VTI devices have.

    The main limitations that we see with the current VTI are the
    following:

    VTI interfaces are L3 tunnels with configurable endpoints.
    For xfrm, the tunnel endpoints are already determined by the SA.
    So the VTI tunnel endpoints must be either the same as on the
    SA or wildcards. In case VTI tunnel endpoints are same as on
    the SA, we get a one to one correlation between the SA and
    the tunnel. So each SA needs its own tunnel interface.

    On the other hand, we can have only one VTI tunnel with
    wildcard src/dst tunnel endpoints in the system because the
    lookup is based on the tunnel endpoints. The existing tunnel
    lookup won't work with multiple tunnels with wildcard
    tunnel endpoints. Some use cases require more than one
    VTI tunnel of this type, for example if somebody has multiple
    namespaces and every namespace requires such a VTI.

    VTI needs separate interfaces for IPv4 and IPv6 tunnels.
    So when routing to a VTI, we have to know to which address
    family this traffic class is going to be encapsulated.
    This is a limitation because it makes routing more complex
    and it is not always possible to know what happens behind the
    VTI, e.g. when the VTI is moved to some namespace.

    VTI works just with tunnel mode SAs. We need generic interfaces
    that ensure transformation, regardless of the xfrm mode and
    the encapsulated address family.

    VTI is configured with a combination of GRE keys and xfrm marks.
    With this we have to deal with some extra cases in the generic
    tunnel lookup because the GRE keys on the VTI are actually
    not GRE keys, the GRE keys were just reused for something else.
    All extensions to the VTI interfaces would require to add
    even more complexity to the generic tunnel lookup.

    So to overcome this, we developed xfrm interfaces with the
    following design goal:

    It should be possible to tunnel IPv4 and IPv6 through the same
    interface.

    No limitation on xfrm mode (tunnel, transport and beet).

    Should be a generic virtual interface that ensures IPsec
    transformation, no need to know what happens behind the
    interface.

    Interfaces should be configured with a new key that must match a
    new policy/SA lookup key.

    The lookup logic should stay in the xfrm codebase, no need to
    change or extend generic routing and tunnel lookups.

    Should be possible to use IPsec hardware offloads of the underlying
    interface.

    5) Remove xfrm pcpu policy cache. This was added after the flowcache
    removal, but it turned out to make things even worse.
    From Florian Westphal.

    6) Allow to update the set mark on SA updates.
    From Nathan Harold.

    7) Convert some timestamps to time64_t.
    From Arnd Bergmann.

    8) Don't check the offload_handle in xfrm code,
    it is an opaque data cookie for the driver.
    From Shannon Nelson.

    9) Remove xfrmi interface ID from flowi. After this patch
    no generic code is touched anymore to do xfrm interface
    lookups. From Benedict Wong.

    10) Allow to update the xfrm interface ID on SA updates.
    From Nathan Harold.

    11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle.
    From YueHaibing.

    12) Return more detailed errors on xfrm interface creation.
    From Benedict Wong.

    13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR.
    From the kbuild test robot.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Jun, 2018

1 commit

  • The poll() changes were not well thought out, and completely
    unexplained. They also caused a huge performance regression, because
    "->poll()" was no longer a trivial file operation that just called down
    to the underlying file operations, but instead did at least two indirect
    calls.

    Indirect calls are sadly slow now with the Spectre mitigation, but the
    performance problem could at least be largely mitigated by changing the
    "->get_poll_head()" operation to just have a per-file-descriptor pointer
    to the poll head instead. That gets rid of one of the new indirections.

    But that doesn't fix the new complexity that is completely unwarranted
    for the regular case. The (undocumented) reason for the poll() changes
    was some alleged AIO poll race fixing, but we don't make the common case
    slower and more complex for some uncommon special case, so this all
    really needs way more explanations and most likely a fundamental
    redesign.

    [ This revert is a revert of about 30 different commits, not reverted
    individually because that would just be unnecessarily messy - Linus ]

    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Jun, 2018

1 commit

  • This patch adds the xfrm interface id as a lookup key
    for xfrm states and policies. With this we can assign
    states and policies to virtual xfrm interfaces.

    Signed-off-by: Steffen Klassert
    Acked-by: Shannon Nelson
    Acked-by: Benedict Wong
    Tested-by: Benedict Wong
    Tested-by: Antony Antony
    Reviewed-by: Eyal Birger

    Steffen Klassert
     

05 Jun, 2018

1 commit

  • Pull aio updates from Al Viro:
    "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly.

    The only thing I'm holding back for a day or so is Adam's aio ioprio -
    his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case),
    but let it sit in -next for decency sake..."

    * 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    aio: sanitize the limit checking in io_submit(2)
    aio: fold do_io_submit() into callers
    aio: shift copyin of iocb into io_submit_one()
    aio_read_events_ring(): make a bit more readable
    aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
    aio: take list removal to (some) callers of aio_complete()
    aio: add missing break for the IOCB_CMD_FDSYNC case
    random: convert to ->poll_mask
    timerfd: convert to ->poll_mask
    eventfd: switch to ->poll_mask
    pipe: convert to ->poll_mask
    crypto: af_alg: convert to ->poll_mask
    net/rxrpc: convert to ->poll_mask
    net/iucv: convert to ->poll_mask
    net/phonet: convert to ->poll_mask
    net/nfc: convert to ->poll_mask
    net/caif: convert to ->poll_mask
    net/bluetooth: convert to ->poll_mask
    net/sctp: convert to ->poll_mask
    net/tipc: convert to ->poll_mask
    ...

    Linus Torvalds
     

26 May, 2018

1 commit


16 May, 2018

1 commit

  • Variants of proc_create{,_data} that directly take a struct seq_operations
    and deal with network namespaces in ->open and ->release. All callers of
    proc_create + seq_open_net converted over, and seq_{open,release}_net are
    removed entirely.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     

09 Apr, 2018

1 commit

  • Key extensions (struct sadb_key) include a user-specified number of key
    bits. The kernel uses that number to determine how much key data to copy
    out of the message in pfkey_msg2xfrm_state().

    The length of the sadb_key message must be verified to be long enough,
    even in the case of SADB_X_AALG_NULL. Furthermore, the sadb_key_len value
    must be long enough to include both the key data and the struct sadb_key
    itself.

    Introduce a helper function verify_key_len(), and call it from
    parse_exthdrs() where other exthdr types are similarly checked for
    correctness.

    Signed-off-by: Kevin Easton
    Reported-by: syzbot+5022a34ca5a3d49b84223653fab632dfb7b4cf37@syzkaller.appspotmail.com
    Signed-off-by: Steffen Klassert

    Kevin Easton
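
The length rule described in the commit above can be sketched in user space as follows. This is a minimal illustration, not the kernel's code: `struct demo_sadb_key`, `demo_div_round_up()` and `demo_verify_key_len()` are hypothetical names, and the struct only mirrors the general shape of a PF_KEY key extension (a length in 8-byte units followed by a bit count).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PF_KEY key extension header. */
struct demo_sadb_key {
    uint16_t len;       /* whole extension, in 8-byte units */
    uint16_t exttype;
    uint16_t bits;      /* user-specified number of key bits */
    uint16_t reserved;
};

static_assert(sizeof(struct demo_sadb_key) == 8, "header must be 8 bytes");

/* Divide n by d, rounding up. */
size_t demo_div_round_up(size_t n, size_t d)
{
    return (n + d - 1) / d;
}

/*
 * Return 0 when the declared length covers both the header and the key
 * data implied by the bit count, -EINVAL otherwise.
 */
int demo_verify_key_len(const struct demo_sadb_key *key)
{
    size_t key_bytes = demo_div_round_up(key->bits, 8);
    size_t needed = demo_div_round_up(sizeof(*key) + key_bytes,
                                      sizeof(uint64_t));

    return needed > key->len ? -EINVAL : 0;
}
```

A 128-bit key needs 8 + 16 = 24 bytes, so a declared length of 3 units passes and a length of 2 is rejected. A zero-bit key still needs one unit for the header itself.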
     

10 Jan, 2018

1 commit


30 Dec, 2017

2 commits

  • If a message sent to a PF_KEY socket ended with an incomplete extension
    header (fewer than 4 bytes remaining), then parse_exthdrs() read past
    the end of the message, into uninitialized memory. Fix it by returning
    -EINVAL in this case.

    Reproducer:

    #include <linux/pfkeyv2.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main()
    {
        int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
        char buf[17] = { 0 };
        struct sadb_msg *msg = (void *)buf;

        msg->sadb_msg_version = PF_KEY_V2;
        msg->sadb_msg_type = SADB_DELETE;
        msg->sadb_msg_len = 2;

        write(sock, buf, 17);
    }

    Cc: stable@vger.kernel.org
    Signed-off-by: Eric Biggers
    Signed-off-by: Steffen Klassert

    Eric Biggers
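
The bounds check described in the commit above can be sketched as a user-space extension walker. This is an illustration under stated assumptions rather than the kernel's parse_exthdrs(): `struct demo_sadb_ext` and `demo_walk_exthdrs()` are hypothetical, and the walker only counts extensions instead of recording them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the header that starts every extension. */
struct demo_sadb_ext {
    uint16_t len;   /* extension length, in 8-byte units */
    uint16_t type;  /* extension type; 0 is treated as reserved */
};

static_assert(sizeof(struct demo_sadb_ext) == 4, "header must be 4 bytes");

/*
 * Walk the extensions that follow the base message header.
 * Returns the number of extensions, or -EINVAL if the data is malformed.
 */
int demo_walk_exthdrs(const uint8_t *p, size_t len)
{
    int count = 0;

    while (len > 0) {
        struct demo_sadb_ext ehdr;
        size_t ext_len;

        /* Fewer bytes than a header remain: reject before reading. */
        if (len < sizeof(ehdr))
            return -EINVAL;

        memcpy(&ehdr, p, sizeof(ehdr));  /* copy avoids unaligned access */
        ext_len = (size_t)ehdr.len * sizeof(uint64_t);
        if (ext_len < sizeof(uint64_t) || ext_len > len || ehdr.type == 0)
            return -EINVAL;

        p += ext_len;
        len -= ext_len;
        count++;
    }
    return count;
}
```

With the reproducer's layout, one stray byte after the base header makes the walker return -EINVAL instead of reading a four-byte header that is not there.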
     
  • If a message sent to a PF_KEY socket ended with one of the extensions
    that takes a 'struct sadb_address' but there were not enough bytes
    remaining in the message for the ->sa_family member of the 'struct
    sockaddr' which is supposed to follow, then verify_address_len() read
    past the end of the message, into uninitialized memory. Fix it by
    returning -EINVAL in this case.

    This bug was found using syzkaller with KMSAN.

    Reproducer:

    #include <linux/pfkeyv2.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main()
    {
        int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
        char buf[24] = { 0 };
        struct sadb_msg *msg = (void *)buf;
        struct sadb_address *addr = (void *)(msg + 1);

        msg->sadb_msg_version = PF_KEY_V2;
        msg->sadb_msg_type = SADB_DELETE;
        msg->sadb_msg_len = 3;
        addr->sadb_address_len = 1;
        addr->sadb_address_exttype = SADB_EXT_ADDRESS_SRC;

        write(sock, buf, 24);
    }

    Reported-by: Alexander Potapenko
    Cc: stable@vger.kernel.org
    Signed-off-by: Eric Biggers
    Signed-off-by: Steffen Klassert

    Eric Biggers
     

14 Nov, 2017

1 commit


16 Aug, 2017

1 commit


15 Aug, 2017

1 commit

  • pfkey_broadcast() might be called from non process contexts,
    we can not use GFP_KERNEL in these cases [1].

    This patch partially reverts commit ba51b6be38c1 ("net: Fix RCU splat in
    af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
    section.

    [1] : syzkaller reported :

    in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
    3 locks held by syzkaller183439/2932:
    #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
    #1: (&pfk->dump_lock){+.+.+.}, at: [] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
    #2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [] spin_lock_bh include/linux/spinlock.h:304 [inline]
    #2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
    CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:16 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:52
    ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
    __might_sleep+0x95/0x190 kernel/sched/core.c:5947
    slab_pre_alloc_hook mm/slab.h:416 [inline]
    slab_alloc mm/slab.c:3383 [inline]
    kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
    skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
    pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
    pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
    dump_sp+0x3d6/0x500 net/key/af_key.c:2685
    xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
    pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
    pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
    pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
    pfkey_process+0x606/0x710 net/key/af_key.c:2814
    pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
    sock_sendmsg_nosec net/socket.c:633 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:643
    ___sys_sendmsg+0x755/0x890 net/socket.c:2035
    __sys_sendmsg+0xe5/0x210 net/socket.c:2069
    SYSC_sendmsg net/socket.c:2080 [inline]
    SyS_sendmsg+0x2d/0x50 net/socket.c:2076
    entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x445d79
    RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79
    RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008
    RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700
    R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000
    R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000

    Fixes: ba51b6be38c1 ("net: Fix RCU splat in af_key")
    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Cc: David Ahern
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 Jul, 2017

1 commit

  • After rcu conversions performance degradation in forward tests isn't that
    noticeable anymore.

    See next patch for some numbers.

    A followup patch could then also remove genid from the policies
    as we do not cache bundles anymore.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

05 Jul, 2017

1 commit

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

01 Jul, 2017

4 commits

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    This patch uses refcount_inc_not_zero() instead of
    atomic_inc_not_zero_hint() due to absence of a _hint()
    version of refcount API. If the hint() version must
    be used, we might need to revisit API.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
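
The behaviour that motivates the conversion above can be sketched with C11 atomics. This is a simplified user-space model, not the kernel's refcount_t: `struct demo_refcount`, `DEMO_REFCOUNT_SATURATED` and `demo_refcount_inc_not_zero()` are hypothetical, and the choice of UINT_MAX as the saturation value is an assumption of this sketch.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed saturation value for this sketch. */
#define DEMO_REFCOUNT_SATURATED UINT_MAX

struct demo_refcount {
    atomic_uint refs;
};

/*
 * Take a reference unless the count is already zero.
 * A saturated counter stays saturated instead of wrapping to zero,
 * which is what prevents an overflow from turning into a use-after-free.
 */
bool demo_refcount_inc_not_zero(struct demo_refcount *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        if (old == 0)
            return false;   /* object is already on its way out */
        if (old == DEMO_REFCOUNT_SATURATED)
            return true;    /* pinned: leak rather than wrap */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));

    return true;
}
```

A plain atomic increment would wrap a full counter back to zero, after which the next put could free an object that still has users. The saturating variant trades that for a leak.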
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This allows to avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • A set of overlapping changes in macvlan and the rocker
    driver, nothing serious.

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2017

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2017-06-23

    1) Use memdup_user to simplify xfrm_user_policy.
    From Geliang Tang.

    2) Make xfrm_dev_register static to silence a sparse warning.
    From Wei Yongjun.

    3) Use crypto_memneq to check the ICV in the AH protocol.
    From Sabrina Dubroca.

    4) Remove some unused variables in esp6.
    From Stephen Hemminger.

    5) Extend XFRM MIGRATE to allow to change the UDP encapsulation port.
    From Antony Antony.

    6) Include the UDP encapsulation port to km_migrate announcements.
    From Antony Antony.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Jun, 2017

3 commits

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions (skb_put, __skb_put and pskb_put) return void *
    and remove all the casts across the tree, adding a (u8 *) cast only
    where the unsigned char pointer was used directly, all done with the
    following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    which actually doesn't cover pskb_put since there are only three
    users overall.

    A handful of stragglers were converted manually, notably a macro in
    drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
    instances in net/bluetooth/hci_sock.c. In the former file, I also
    had to fix one whitespace problem spatch introduced.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • A common pattern with skb_put() is to just want to memcpy()
    some data into the new space, introduce skb_put_data() for
    this.

    An spatch similar to the one for skb_put_zero() converts many
    of the places using it:

    @@
    identifier p, p2;
    expression len, skb, data;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, len);
    |
    -memcpy(p, data, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb, data;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, sizeof(*p));
    |
    -memcpy(p, data, sizeof(*p));
    )

    @@
    expression skb, len, data;
    @@
    -memcpy(skb_put(skb, len), data, len);
    +skb_put_data(skb, data, len);

    (again, manually post-processed to retain some comments)

    Reviewed-by: Stephen Hemminger
    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • There were many places that my previous spatch didn't find,
    as pointed out by yuan linyu in various patches.

    The following spatch found many more and also removes the
    now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

    Apply it to the tree (with one manual fixup to keep the
    comment in vxlan.c, which spatch removed.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
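
The idea behind the three commits above (a void * return plus helpers that combine the tail reservation with memcpy() or memset()) can be sketched on a toy buffer. Everything here is hypothetical: `struct demo_buf` is not an sk_buff, and returning NULL when the buffer is full is an assumption of this sketch, not a statement about how skb_put() behaves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy fixed-size buffer standing in for a packet buffer. */
struct demo_buf {
    uint8_t data[64];
    size_t  len;        /* bytes currently in use */
};

/* Reserve len bytes at the tail and return a pointer to them. */
void *demo_put(struct demo_buf *b, size_t len)
{
    void *tail;

    if (len > sizeof(b->data) - b->len)
        return NULL;    /* sketch-only error handling */
    tail = b->data + b->len;
    b->len += len;
    return tail;
}

/* Reserve len bytes and clear them, replacing put + memset. */
void *demo_put_zero(struct demo_buf *b, size_t len)
{
    void *tail = demo_put(b, len);

    if (tail)
        memset(tail, 0, len);
    return tail;
}

/* Reserve len bytes and copy data into them, replacing put + memcpy. */
void *demo_put_data(struct demo_buf *b, const void *data, size_t len)
{
    void *tail = demo_put(b, len);

    if (tail)
        memcpy(tail, data, len);
    return tail;
}
```

Because the helpers return void *, a caller can assign the result to any object pointer without a cast, which is the cleanup the semantic patches automate.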
     

14 Jun, 2017

2 commits

  • The default error code in pfkey_msg2xfrm_state() is -ENOBUFS. We
    added a new call to security_xfrm_state_alloc() which sets "err" to zero
    so there are several places where we can return ERR_PTR(0) if kmalloc()
    fails. The caller is expecting error pointers so it leads to a NULL
    dereference.

    Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Steffen Klassert

    Dan Carpenter
     
  • There are some missing error codes here so we accidentally return NULL
    instead of an error pointer. It results in a NULL pointer dereference.

    Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.")
    Signed-off-by: Dan Carpenter
    Signed-off-by: Steffen Klassert

    Dan Carpenter
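
Both fixes above come down to the same pitfall: encoding an error value of zero as a pointer yields NULL, which an error-pointer check does not flag. The following user-space sketch re-creates the convention with hypothetical `demo_` helpers. They are modelled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() but are not the kernel's definitions, and the use of -ENOMEM on the fixed path is an assumption for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *demo_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode an error pointer back to its errno value. */
long demo_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only for pointers in the top DEMO_MAX_ERRNO addresses. */
bool demo_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/*
 * Model of the buggy pattern: err was reset to 0 by an earlier call
 * that succeeded, so a later failure path returns demo_err_ptr(0).
 */
void *demo_alloc_state(bool alloc_fails, bool set_err)
{
    static int state;
    long err = 0;           /* value left behind by the earlier success */

    if (alloc_fails) {
        if (set_err)
            err = -ENOMEM;  /* the fix: assign an error before bailing out */
        return demo_err_ptr(err);
    }
    return &state;
}
```

A caller that tests only demo_is_err() accepts the NULL returned by the unfixed path and dereferences it, which matches the crash described in both messages.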
     

12 Jun, 2017

1 commit

    We now force garbage collection whenever a policy is removed in
    xfrm_policy_flush(). But during xfrm_net_exit() we call flow_cache_fini()
    first, which sets fc->percpu to NULL. When we then call xfrm_policy_fini()
    -> xfrm_policy_flush() -> flow_cache_flush(), we get a NULL pointer
    dereference when checking percpu_empty. The code path looks like:

    flow_cache_fini()
    - fc->percpu = NULL
    xfrm_policy_fini()
    - xfrm_policy_flush()
    - xfrm_garbage_collect()
    - flow_cache_flush()
    - flow_cache_percpu_empty()
    - fcp = per_cpu_ptr(fc->percpu, cpu)

    To reproduce, just add ipsec in netns and then remove the netns.

    v2:
    As Xin Long suggested, since only two other places need to call it, move
    xfrm_garbage_collect() outside xfrm_policy_flush().

    v3:
    Fix subject mismatch after v2 fix.

    Fixes: 35db06912189 ("xfrm: do the garbage collection after flushing policy")
    Signed-off-by: Hangbin Liu
    Reviewed-by: Xin Long
    Signed-off-by: Steffen Klassert

    Hangbin Liu
     

07 Jun, 2017

2 commits

  • Add XFRMA_ENCAP, UDP encapsulation port, to km_migrate announcement
    to userland. Only add if XFRMA_ENCAP was in user migrate request.

    Signed-off-by: Antony Antony
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Steffen Klassert

    Antony Antony
     
  • Add UDP encapsulation port to XFRM_MSG_MIGRATE using an optional
    netlink attribute XFRMA_ENCAP.

    Devices that support the IKE MOBIKE extension (RFC-4555 Section 3.8)
    could go to sleep for a few minutes and wake up. When a device wakes up,
    the NAT mapping could have expired, so the device sends a MOBIKE UPDATE_SA
    message to migrate the IPsec SA. The change could be a change of UDP
    encapsulation port, IP address, or both.

    Reported-by: Paul Wouters
    Signed-off-by: Antony Antony
    Reviewed-by: Richard Guy Briggs
    Signed-off-by: Steffen Klassert

    Antony Antony
     

08 May, 2017

1 commit

  • The sadb_x_sec_len is stored in the unit 'byte divided by eight'.
    So we have to multiply this value by eight before we can do
    size checks. Otherwise we may get a slab-out-of-bounds when
    we memcpy the user sec_ctx.

    Fixes: df71837d502 ("[LSM-IPSec]: Security association restriction.")
    Reported-by: Andrey Konovalov
    Tested-by: Andrey Konovalov
    Signed-off-by: Steffen Klassert

    Steffen Klassert
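
The unit mismatch described in the commit above can be illustrated with a small size check. This is a sketch under stated assumptions, not the kernel's check: `struct demo_sec_ctx` and `demo_check_sec_ctx()` are hypothetical, and `remaining` stands for the number of message bytes still available to the extension.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a security context extension header. */
struct demo_sec_ctx {
    uint16_t sec_len;   /* whole extension, in 8-byte units */
    uint16_t exttype;
    uint8_t  alg;
    uint8_t  doi;
    uint16_t ctx_len;   /* context string that follows, in bytes */
};

static_assert(sizeof(struct demo_sec_ctx) == 8, "header must be 8 bytes");

/*
 * Return 0 if the extension fits into the remaining bytes and the
 * context string fits inside the extension, -EINVAL otherwise.
 */
int demo_check_sec_ctx(size_t remaining, const struct demo_sec_ctx *c)
{
    /* Convert 8-byte units to bytes before any comparison. */
    size_t ext_bytes = (size_t)c->sec_len * sizeof(uint64_t);

    if (ext_bytes < sizeof(*c) || ext_bytes > remaining)
        return -EINVAL;
    if (c->ctx_len > ext_bytes - sizeof(*c))
        return -EINVAL;
    return 0;
}
```

Comparing the raw unit count against a byte count would let a three-unit (24-byte) extension pass with only 16 bytes remaining, and a later copy of the context would then run past the end of the buffer.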
     

22 Apr, 2017

1 commit


18 Apr, 2017

1 commit

  • The parsing of sadb_x_ipsecrequest is broken in a number of ways.
    First of all we're not verifying sadb_x_ipsecrequest_len. This
    is needed when the structure carries addresses at the end. Worse
    we don't even look at the length when we parse those optional
    addresses.

    The migration code had similar parsing code that's better but
    it also has some deficiencies. The length is overcounted first
    of all as it includes the header itself. It also fails to check
    the length before dereferencing the sa_family field.

    This patch fixes those problems in parse_sockaddr_pair and then
    uses it in parse_ipsecrequest.

    Reported-by: Andrey Konovalov
    Signed-off-by: Herbert Xu
    Signed-off-by: Steffen Klassert

    Herbert Xu
     

03 Apr, 2017

1 commit

  • A dump may come in the middle of another dump, modifying its dump
    structure members. This race condition will result in NULL pointer
    dereference in kernel. So add a lock to prevent that race.

    Fixes: 83321d6b9872 ("[AF_KEY]: Dump SA/SP entries non-atomically")
    Signed-off-by: Yuejie Shi
    Signed-off-by: Steffen Klassert

    Yuejie Shi
     

24 Mar, 2017

1 commit


18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into a zero-based array and
    thus is an unsigned entity. Using a negative value is an out-of-bounds
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preferred to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
        g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is 3 byte instruction which isn't necessary if the variable is
    unsigned because x86_64 is zero extending by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
        ...
        ptr = ng->ptr[id - 1];
        ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a seemingly random artefact of code generation with register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires REX
    prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
    used which is longer than [r8]

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

23 Oct, 2015

1 commit


25 Aug, 2015

1 commit

  • Hit the following splat testing VRF change for ipsec:

    [ 113.475692] ===============================
    [ 113.476194] [ INFO: suspicious RCU usage. ]
    [ 113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted
    [ 113.477545] -------------------------------
    [ 113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-side critical section!
    [ 113.479288]
    [ 113.479288] other info that might help us debug this:
    [ 113.479288]
    [ 113.480207]
    [ 113.480207] rcu_scheduler_active = 1, debug_locks = 1
    [ 113.480931] 2 locks held by setkey/6829:
    [ 113.481371] #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [] pfkey_sendmsg+0xfb/0x213
    [ 113.482509] #1: (rcu_read_lock){......}, at: [] rcu_read_lock+0x0/0x6e
    [ 113.483509]
    [ 113.483509] stack backtrace:
    [ 113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED
    [ 113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
    [ 113.486845] 0000000000000001 ffff88001d4c7a98 ffffffff81518af2 ffffffff81086962
    [ 113.487732] ffff88001d538480 ffff88001d4c7ac8 ffffffff8107ae75 ffffffff8180a154
    [ 113.488628] 0000000000000b30 0000000000000000 00000000000000d0 ffff88001d4c7ad8
    [ 113.489525] Call Trace:
    [ 113.489813] [] dump_stack+0x4c/0x65
    [ 113.490389] [] ? console_unlock+0x3d6/0x405
    [ 113.491039] [] lockdep_rcu_suspicious+0xfa/0x103
    [ 113.491735] [] rcu_preempt_sleep_check+0x45/0x47
    [ 113.492442] [] ___might_sleep+0x19/0x1c8
    [ 113.493077] [] __might_sleep+0x6c/0x82
    [ 113.493681] [] cache_alloc_debugcheck_before.isra.50+0x1d/0x24
    [ 113.494508] [] kmem_cache_alloc+0x31/0x18f
    [ 113.495149] [] skb_clone+0x64/0x80
    [ 113.495712] [] pfkey_broadcast_one+0x3d/0xff
    [ 113.496380] [] pfkey_broadcast+0xb5/0x11e
    [ 113.497024] [] pfkey_register+0x191/0x1b1
    [ 113.497653] [] pfkey_process+0x162/0x17e
    [ 113.498274] [] pfkey_sendmsg+0x109/0x213

    In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes
    the RCU lock.

    Since pfkey_broadcast takes the RCU lock the allocation argument is
    pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock.
    The one call outside of rcu can be done with GFP_KERNEL.

    Fixes: 7f6b9dbd5afbd ("af_key: locking change")
    Signed-off-by: David Ahern
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    David Ahern
     

25 Jun, 2015

1 commit

  • Pull networking updates from David Miller:

    1) Add TX fast path in mac80211, from Johannes Berg.

    2) Add TSO/GRO support to ibmveth, from Thomas Falcon

    3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

    4) Lots of new rhashtable tests, from Thomas Graf.

    5) Run ingress qdisc lockless, from Alexei Starovoitov.

    6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting. From Eric Dumazet.

    7) Add mode parameter to pktgen, for testing receive. From Alexei
    Starovoitov.

    8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

    9) Move page frag allocator under mm/, also from Alexander.

    10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

    11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

    12) Extend flow dissector to be programmable and use it in new "Flower"
    classifier. From Jiri Pirko.

    13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics. From Willem de Bruijn.

    14) Add netdev driver for GENEVE tunnels, from John W Linville.

    15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

    16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

    17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

    18) Add tail call support to BPF, from Alexei Starovoitov.

    19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

    20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

    21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion. From Eric Dumazet.

    22) Add Cavium ThunderX driver, from Sunil Goutham.

    23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

    24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

    25) Double TCP Small Queues default to 256K to accommodate situations
    like the XEN driver and wireless aggregation. From Wei Liu.

    26) Add more entropy inputs to flow dissector, from Tom Herbert.

    27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

    28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

    29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
    bridge: vlan: flush the dynamically learned entries on port vlan delete
    bridge: multicast: add a comment to br_port_state_selection about blocking state
    net: inet_diag: export IPV6_V6ONLY sockopt
    stmmac: troubleshoot unexpected bits in des0 & des1
    net: ipv4 sysctl option to ignore routes when nexthop link is down
    net: track link-status of ipv4 nexthops
    net: switchdev: ignore unsupported bridge flags
    net: Cavium: Fix MAC address setting in shutdown state
    drivers: net: xgene: fix for ACPI support without ACPI
    ip: report the original address of ICMP messages
    net/mlx5e: Prefetch skb data on RX
    net/mlx5e: Pop cq outside mlx5e_get_cqe
    net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
    net/mlx5e: Remove extra spaces
    net/mlx5e: Avoid TX CQE generation if more xmit packets expected
    net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
    net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
    net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
    net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
    net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
    ...

    Linus Torvalds
     

28 May, 2015

1 commit


11 May, 2015

1 commit