10 Jan, 2019

1 commit

  • [ Upstream commit 69d2c86766da2ded2b70281f1bf242cb0d58a778 ]

    vr.mifi is indirectly controlled by user-space, hence leading to
    a potential exploitation of the Spectre variant 1 vulnerability.

    This issue was detected with the help of Smatch:

    net/ipv6/ip6mr.c:1845 ip6mr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
    net/ipv6/ip6mr.c:1919 ip6mr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)

    Fix this by sanitizing vr.mifi before using it to index mrt->vif_table.

    Note that because speculation windows are large, the policy is to
    kill the speculation on the first load and not worry about whether it
    can be completed with a dependent load/store [1].

    [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
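A user-space sketch of the sanitizing step follows. It reimplements the generic branchless mask that the kernel's array_index_nospec() falls back to (x86 and some other architectures supply their own), so vif_table, maxvif, struct vif_entry and get_mif_count() are hypothetical stand-ins for mrt->vif_table and the ioctl path, not the actual ip6mr code. It assumes index and size stay below LONG_MAX and that right-shifting a negative long is arithmetic, as it is with GCC and Clang. The kernel also hides the value from the optimizer so the mask cannot be turned back into a branch; this sketch only shows the arithmetic.

```c
#include <limits.h>

/* Branchless mask modeled on the kernel's generic array_index_mask_nospec():
 * ~0UL when index < size, 0 otherwise. Assumes index and size < LONG_MAX
 * and an arithmetic right shift for negative longs. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp index into [0, size): out-of-range values collapse to 0. */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}

#define MAXMIFS 32 /* matches the uapi MAXMIFS; used only as an array size */

struct vif_entry { unsigned long pkt_in; }; /* hypothetical */

static struct vif_entry vif_table[MAXMIFS];
static unsigned long maxvif = 4; /* hypothetical count of configured MIFs */

/* Bounds check first, then sanitize before the first dependent load. */
static int get_mif_count(unsigned long mifi, unsigned long *pkt_in)
{
	if (mifi >= maxvif)
		return -1; /* stands in for -EINVAL */
	mifi = index_nospec(mifi, maxvif);
	*pkt_in = vif_table[mifi].pkt_in;
	return 0;
}
```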
     

12 Jun, 2018

1 commit

  • [ Upstream commit 848235edb5c93ed086700584c8ff64f6d7fc778d ]

    Currently, raw6_sk(sk)->ip6mr_table is set unconditionally during
    ip6_mroute_setsockopt(MRT6_TABLE), even if creating the table fails.
    A subsequent attempt at the same setsockopt will then fail with
    -ENOENT, since that table was never actually created.

    A similar fix for ipv4 was included in commit 5e1859fbcc3c ("ipv4: ipmr:
    various fixes and cleanups").

    Fixes: d1db275dd3f6 ("ipv6: ip6mr: support multiple tables")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     

13 Feb, 2018

1 commit

  • [ Upstream commit 4adfa79fc254efb7b0eb3cd58f62c2c3f805f1ba ]

    When we dump the ip6mr mfc entries via proc, we initialize an iterator
    with the table to dump, but we don't clear the cache pointer, which might
    still be set from an earlier, completed read on the same descriptor. This
    can result in a lock imbalance (an unnecessary unlock), leading to other
    crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
    Thanks for the reliable reproducer.

    Here's syzbot's trace:
    WARNING: bad unlock balance detected!
    4.15.0-rc3+ #128 Not tainted
    syzkaller971460/3195 is trying to release lock (mrt_lock) at:
    [] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
    but there are no more locks to release!

    other info that might help us debug this:
    1 lock held by syzkaller971460/3195:
    #0: (&p->lock){+.+.}, at: [] seq_read+0xd5/0x13d0
    fs/seq_file.c:165

    stack backtrace:
    CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
    __lock_release kernel/locking/lockdep.c:3775 [inline]
    lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
    __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
    _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
    ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
    traverse+0x3bc/0xa00 fs/seq_file.c:135
    seq_read+0x96a/0x13d0 fs/seq_file.c:189
    proc_reg_read+0xef/0x170 fs/proc/inode.c:217
    do_loop_readv_writev fs/read_write.c:673 [inline]
    do_iter_read+0x3db/0x5b0 fs/read_write.c:897
    compat_readv+0x1bf/0x270 fs/read_write.c:1140
    do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
    C_SYSC_preadv fs/read_write.c:1209 [inline]
    compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
    do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
    do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
    entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
    RIP: 0023:0xf7f73c79
    RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
    RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
    RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    BUG: sleeping function called from invalid context at lib/usercopy.c:25
    in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
    INFO: lockdep is turned off.
    CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
    __might_sleep+0x95/0x190 kernel/sched/core.c:6013
    __might_fault+0xab/0x1d0 mm/memory.c:4525
    _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
    copy_to_user include/linux/uaccess.h:155 [inline]
    seq_read+0xcb4/0x13d0 fs/seq_file.c:279
    proc_reg_read+0xef/0x170 fs/proc/inode.c:217
    do_loop_readv_writev fs/read_write.c:673 [inline]
    do_iter_read+0x3db/0x5b0 fs/read_write.c:897
    compat_readv+0x1bf/0x270 fs/read_write.c:1140
    do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
    C_SYSC_preadv fs/read_write.c:1209 [inline]
    compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
    do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
    do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
    entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
    RIP: 0023:0xf7f73c79
    RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
    RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
    RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
    lib/usercopy.c:26

    Reported-by: syzbot
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Aleksandrov
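The imbalance can be reproduced with a small, hypothetical model of the iterator: seq_idx() takes the lock and records the list in it->cache, a read at position 0 returns the header token without touching either, and seq_stop() unlocks whenever cache points at the list. The with_fix switch is illustrative only; it toggles the one-line reset at the start of a read that this patch adds.

```c
#include <stddef.h>

/* Hypothetical single-list model of the ip6mr /proc MFC iterator. */
struct mfc_iter {
	const int *cache; /* list the iterator is on; non-NULL implies lock held */
};

static const int mfc_list;  /* stand-in for mrt->mfc6_cache_array */
static int mrt_lock_depth;  /* how many times the stand-in mrt_lock is held */

/* Position the iterator on the list, taking the lock. */
static void seq_idx(struct mfc_iter *it)
{
	mrt_lock_depth++;
	it->cache = &mfc_list;
}

/* pos == 0 yields the header token without locking anything. */
static void seq_start(struct mfc_iter *it, long pos, int with_fix)
{
	if (with_fix)
		it->cache = NULL; /* forget state left by a prior read */
	if (pos)
		seq_idx(it);
}

static void seq_stop(struct mfc_iter *it)
{
	if (it->cache == &mfc_list)
		mrt_lock_depth--; /* read_unlock(&mrt_lock) in the kernel */
}
```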
     

10 Aug, 2017

1 commit

  • This change allows us to later indicate to rtnetlink core that certain
    doit functions should be called without acquiring rtnl_mutex.

    This change should have no effect; we simply replace the last (now
    unused) calcit argument with the new flag.

    Signed-off-by: Florian Westphal
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
     

21 Jun, 2017

1 commit

  • Add Netlink notifications on cache reports in ip6mr, in addition to the
    existing mrt6msg sent to mroute6_sk.
    Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV6_MROUTE_R.

    MSGTYPE, MIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
    same data as their equivalent fields in the mrt6msg header.
    PKT attribute is the packet sent to mroute6_sk, without the added
    mrt6msg header.

    Suggested-by: Ryan Halbrook
    Signed-off-by: Julien Gomes
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Julien Gomes
     

16 Jun, 2017

1 commit

  • It seems like a historical accident that these functions return
    unsigned char *; in many places, more often than not, that means casts
    are required.

    Make these functions return void * and remove all the casts across
    the tree, adding a (u8 *) cast only where the unsigned char pointer
    was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = {
    skb_pull,
    __skb_pull,
    skb_pull_inline,
    __pskb_pull_tail,
    __pskb_pull,
    pskb_pull
    };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = {
    skb_pull,
    __skb_pull,
    skb_pull_inline,
    __pskb_pull_tail,
    __pskb_pull,
    pskb_pull
    };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
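A minimal illustration of why the void * return type removes casts, assuming a hypothetical struct buf and buf_pull() in place of struct sk_buff and skb_pull(): assigning the result to a typed pointer needs no cast (the second spatch rule), while dereferencing it directly now does (the first rule).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal buffer standing in for struct sk_buff. */
struct buf {
	unsigned char *data;
	size_t len;
};

/* Skip 'n' bytes and return the new data pointer as void *; NULL if short. */
static void *buf_pull(struct buf *b, size_t n)
{
	if (n > b->len)
		return NULL;
	b->len -= n;
	b->data += n;
	return b->data;
}

struct hdr { uint8_t type; uint8_t code; }; /* hypothetical header layout */

/* Direct dereference: a void * needs an explicit (uint8_t *) cast. */
static int read_after(struct buf *b, size_t skip, uint8_t *out)
{
	void *p = buf_pull(b, skip);

	if (!p)
		return -1;
	*out = *(uint8_t *)p;
	return 0;
}

/* Assignment: void * converts implicitly, so no (struct hdr *) cast. */
static const struct hdr *pull_hdr(struct buf *b, size_t skip)
{
	const struct hdr *h = buf_pull(b, skip);

	return h;
}
```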
     

08 Jun, 2017

1 commit

  • Network devices can allocate resources and private memory using
    netdev_ops->ndo_init(). However, the release of these resources
    can occur in one of two different places.

    Either netdev_ops->ndo_uninit() or netdev->destructor().

    The decision of which operation frees the resources depends upon
    whether it is necessary for all netdev refs to be released before it
    is safe to perform the freeing.

    netdev_ops->ndo_uninit() presumably can occur right after the
    NETDEV_UNREGISTER notifier completes and the unicast and multicast
    address lists are flushed.

    netdev->destructor(), on the other hand, does not run until the
    netdev references all go away.

    Further complicating the situation is that netdev->destructor()
    almost universally also does a free_netdev().

    This creates a problem for the logic in register_netdevice(),
    because all callers of register_netdevice() manage the freeing
    of the netdev and invoke free_netdev(dev) if register_netdevice()
    fails.

    If netdev_ops->ndo_init() succeeds, but something else fails inside
    of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
    it is not able to invoke netdev->destructor().

    This is because netdev->destructor() will do a free_netdev() and
    then the caller of register_netdevice() will do the same.

    However, this means that the resources that would normally be released
    by netdev->destructor() will not be.

    Over the years drivers have added local hacks to deal with this, by
    invoking their destructor parts by hand when register_netdevice()
    fails.

    Many drivers do not try to deal with this, and instead we have leaks.

    Let's close this hole by formalizing the distinction between what
    private things need to be freed up by netdev->destructor() and whether
    the driver needs unregister_netdevice() to perform the free_netdev().

    netdev->priv_destructor() performs all actions to free up the private
    resources that used to be freed by netdev->destructor(), except for
    free_netdev().

    netdev->needs_free_netdev is a boolean that indicates whether
    free_netdev() should be done at the end of unregister_netdevice().

    Now, register_netdevice() can sanely release all resources after
    ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
    and netdev->priv_destructor().

    And at the end of unregister_netdevice(), we invoke
    netdev->priv_destructor() and optionally call free_netdev().

    Signed-off-by: David S. Miller

    David S. Miller
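The resulting split can be modeled as follows. struct netdev, demo_priv_destructor() and the free counter are hypothetical simplifications; the sketch only shows that the register_netdevice() failure path may now run priv_destructor without freeing the device (leaving free_netdev() to the caller), while unregistration frees the device only when needs_free_netdev is set.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical net_device reduced to the fields discussed above. */
struct netdev {
	void *priv;                               /* set up by ndo_init() */
	void (*priv_destructor)(struct netdev *); /* frees priv, never the netdev */
	bool needs_free_netdev;                   /* unregister frees the netdev */
};

static int freed_netdevs; /* counts free_netdev() calls for the demo */

static void free_netdev(struct netdev *dev)
{
	free(dev);
	freed_netdevs++;
}

static void demo_priv_destructor(struct netdev *dev)
{
	free(dev->priv);
	dev->priv = NULL;
}

/* Registration failed after ndo_init(): release private state only. */
static void register_failed_cleanup(struct netdev *dev)
{
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
}

/* End of unregistration: private cleanup, then the optional free. */
static void unregister_finish(struct netdev *dev)
{
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
	if (dev->needs_free_netdev)
		free_netdev(dev);
}
```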
     

22 Apr, 2017

2 commits

  • Both conflicts were simple overlapping changes.

    In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
    conversion of the driver in net-next to use in-netdev stats.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Andrey Konovalov reported a BUG in the ip6mr code, caused by calling
    unregister_netdevice_many() for a device that is already being
    destroyed. IPv4's ipmr resolved this a long time ago, in two commits,
    by introducing a "notify" parameter to the delete function and
    skipping the unregister when called from a notifier, so let's do the
    same for ip6mr.

    The trace from Andrey:
    ------------[ cut here ]------------
    kernel BUG at net/core/dev.c:6813!
    invalid opcode: 0000 [#1] SMP KASAN
    Modules linked in:
    CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
    01/01/2011
    Workqueue: netns cleanup_net
    task: ffff880069208000 task.stack: ffff8800692d8000
    RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
    RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
    RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
    RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
    R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
    R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
    FS: 0000000000000000(0000) GS:ffff88006cb00000(0000)
    knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
    Call Trace:
    unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
    unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
    ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
    notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
    __raw_notifier_call_chain kernel/notifier.c:394
    raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
    call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
    call_netdevice_notifiers net/core/dev.c:1663
    rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
    unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
    unregister_netdevice_many net/core/dev.c:7880
    default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
    ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
    cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
    process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
    worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
    kthread+0x35e/0x430 kernel/kthread.c:231
    ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
    Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
    47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe
    0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
    RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
    ---[ end trace e0b29c57e9b3292c ]---

    Reported-by: Andrey Konovalov
    Signed-off-by: Nikolay Aleksandrov
    Tested-by: Andrey Konovalov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
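A sketch of the guard described above, using hypothetical types (struct mif_stub, mif_delete()); MIFF_REGISTER reuses the flag value from the uapi <linux/mroute6.h> header, but the function is an illustration, not the actual ip6mr code, and it returns -1 for a missing entry.

```c
#include <stdbool.h>

#define MIFF_REGISTER 0x1 /* value from the uapi header; illustrative here */

struct mif_stub {
	unsigned int flags;
	bool queued_for_unregister;
	bool present;
};

/* 'notify' is true when the delete is driven by a NETDEV_UNREGISTER
 * notifier, i.e. the device is already being torn down. */
static int mif_delete(struct mif_stub *v, bool notify)
{
	if (!v->present)
		return -1;
	v->present = false;
	/* Queue the register device for unregistration only when we are not
	 * already inside its teardown; doing it twice is the reported BUG. */
	if ((v->flags & MIFF_REGISTER) && !notify)
		v->queued_for_unregister = true;
	return 0;
}
```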
     

29 Mar, 2017

1 commit


27 Feb, 2017

1 commit

  • Commit 5e1859fbcc3c ("ipv4: ipmr: various fixes and cleanups") fixed
    the issue for ipv4 ipmr:

    ip_mroute_setsockopt() & ip_mroute_getsockopt() should not
    access/set raw_sk(sk)->ipmr_table before making sure the socket
    is a raw socket, and protocol is IGMP.

    The same fix should be done for ipv6 ipmr as well.

    This patch fixes the panic caused by ip_mroute_setsockopt()
    overwriting the field at the same offset as ipmr_table in raw_sk(sk)
    when it is called on a socket of another type.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
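For IPv6, the multicast routing control socket is a raw ICMPv6 socket rather than IGMP, so a check in the spirit of this fix compares against IPPROTO_ICMPV6. The sketch uses a hypothetical struct sock_stub; SOCK_RAW, IPPROTO_ICMPV6 and EOPNOTSUPP come from the standard headers, and returning -EOPNOTSUPP is an assumption of the sketch.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical subset of the socket fields the check needs. */
struct sock_stub {
	int sk_type;  /* SOCK_RAW, SOCK_DGRAM, ... */
	int inet_num; /* protocol number of a raw socket */
};

/* Reject anything that is not a raw ICMPv6 socket before touching
 * raw6_sk(sk)->ip6mr_table, which only exists in struct raw6_sock. */
static int check_mroute6_sock(const struct sock_stub *sk)
{
	if (sk->sk_type != SOCK_RAW || sk->inet_num != IPPROTO_ICMPV6)
		return -EOPNOTSUPP;
	return 0;
}
```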
     

19 Jan, 2017

1 commit

  • All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and
    simplify rt6_fill_node accordingly.

    rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the
    nowait arg from it as well.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

03 Jan, 2017

1 commit

  • While working with ipmr, we noticed that it is impossible to determine
    whether an entry is actually unresolved or its IIF interface has
    disappeared (e.g. a virtual interface got deleted). These entries look
    almost identical to user-space when dumping or receiving notifications.
    To tell them apart, add a new RTNH_F_UNRESOLVED flag, which is set when
    sending an unresolved cache entry to user-space.

    Suggested-by: Roopa Prabhu
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
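On the user-space side, a dump consumer could tell the two cases apart with a check like the one below. It assumes the flag is reported in rtm_flags of the route message and that the installed <linux/rtnetlink.h> is new enough to define RTNH_F_UNRESOLVED; mroute_is_unresolved() is a hypothetical helper.

```c
#include <stdbool.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: true if a multicast route message describes an
 * unresolved cache entry rather than one whose IIF has disappeared. */
static bool mroute_is_unresolved(const struct rtmsg *rtm)
{
	return (rtm->rtm_flags & RTNH_F_UNRESOLVED) != 0;
}
```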
     

25 Dec, 2016

1 commit


01 Nov, 2016

1 commit


26 Sep, 2016

1 commit

  • Since the commit below, the ipmr/ip6mr rtnl_unicast() code uses the
    portid instead of the previous dst_pid, which was copied from in_skb's
    portid. Since the skb is new, its portid is 0 at that point, so the
    packets are sent to the kernel and we get a scheduling-while-atomic bug
    or a deadlock (depending on where it happens) by trying to acquire rtnl
    twice. Also, since this is RTM_GETROUTE, it can be triggered by a
    normal user.

    Here's the sleeping while atomic trace:
    [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
    [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
    [ 7858.212881] 2 locks held by swapper/0/0:
    [ 7858.213013] #0: (((&mrt->ipmr_expire_timer))){+.-...}, at: [] call_timer_fn+0x5/0x350
    [ 7858.213422] #1: (mfc_unres_lock){+.....}, at: [] ipmr_expire_process+0x25/0x130
    [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
    [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
    [ 7858.214108] 0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
    [ 7858.214412] ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
    [ 7858.214716] 000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
    [ 7858.215251] Call Trace:
    [ 7858.215412] [] dump_stack+0x85/0xc1
    [ 7858.215662] [] ___might_sleep+0x192/0x250
    [ 7858.215868] [] __might_sleep+0x6f/0x100
    [ 7858.216072] [] mutex_lock_nested+0x33/0x4d0
    [ 7858.216279] [] ? netlink_lookup+0x25f/0x460
    [ 7858.216487] [] rtnetlink_rcv+0x1b/0x40
    [ 7858.216687] [] netlink_unicast+0x19c/0x260
    [ 7858.216900] [] rtnl_unicast+0x20/0x30
    [ 7858.217128] [] ipmr_destroy_unres+0xa9/0xf0
    [ 7858.217351] [] ipmr_expire_process+0x8f/0x130
    [ 7858.217581] [] ? ipmr_net_init+0x180/0x180
    [ 7858.217785] [] ? ipmr_net_init+0x180/0x180
    [ 7858.217990] [] call_timer_fn+0xa5/0x350
    [ 7858.218192] [] ? call_timer_fn+0x5/0x350
    [ 7858.218415] [] ? ipmr_net_init+0x180/0x180
    [ 7858.218656] [] run_timer_softirq+0x260/0x640
    [ 7858.218865] [] ? __do_softirq+0xbb/0x54f
    [ 7858.219068] [] __do_softirq+0xe8/0x54f
    [ 7858.219269] [] irq_exit+0xb8/0xc0
    [ 7858.219463] [] smp_apic_timer_interrupt+0x42/0x50
    [ 7858.219678] [] apic_timer_interrupt+0x8c/0xa0
    [ 7858.219897] [] ? native_safe_halt+0x6/0x10
    [ 7858.220165] [] ? trace_hardirqs_on+0xd/0x10
    [ 7858.220373] [] default_idle+0x23/0x190
    [ 7858.220574] [] arch_cpu_idle+0xf/0x20
    [ 7858.220790] [] default_idle_call+0x4c/0x60
    [ 7858.221016] [] cpu_startup_entry+0x39b/0x4d0
    [ 7858.221257] [] rest_init+0x135/0x140
    [ 7858.221469] [] start_kernel+0x50e/0x51b
    [ 7858.221670] [] ? early_idt_handler_array+0x120/0x120
    [ 7858.221894] [] x86_64_start_reservations+0x2a/0x2c
    [ 7858.222113] [] x86_64_start_kernel+0x13b/0x14a

    Fixes: 2942e9005056 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

21 Sep, 2016

1 commit

  • When I introduced the lastuse member I made a subtle error: it was
    returned as an absolute value, but that is meaningless to user-space
    because it doesn't show exactly how old an entry is. Let's make it
    similar to how the bridge returns such values and make it relative to
    "now" (jiffies). This allows us to show the actual age of the entries
    and is much more useful (e.g. user-space daemons can age out entries,
    and iproute2 can display the lastuse properly).

    Fixes: 43b9e1274060 ("net: ipmr/ip6mr: add support for keeping an entry age")
    Reported-by: Satish Ashok
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
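Turning an absolute timestamp into an age is a subtraction, but it has to survive counter wraparound. The sketch below, with a hypothetical entry_age(), uses unsigned subtraction and clamps negative deltas to zero in the spirit of the kernel's time_after_eq() comparison. It works in raw ticks and leaves out any unit conversion; the cast to long relies on the usual two's-complement wrap of GCC and Clang.

```c
#include <limits.h>

/* Age of an entry relative to 'now'; 0 if lastuse appears to be in the
 * future. Unsigned subtraction keeps the result correct across wraparound. */
static unsigned long entry_age(unsigned long now, unsigned long lastuse)
{
	long delta = (long)(now - lastuse);

	return delta > 0 ? (unsigned long)delta : 0;
}
```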
     

27 Jul, 2016

1 commit

  • Currently lastuse is updated on entry creation and cache hit, but it
    should also be updated on entry change. Since the ttl array is updated
    on both add and update, we can simply update lastuse in
    ipmr_update_thresholds.

    Signed-off-by: Nikolay Aleksandrov
    CC: Roopa Prabhu
    CC: Donald Sharp
    CC: David S. Miller
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

17 Jul, 2016

1 commit

  • In preparation for hardware offloading of ipmr/ip6mr, we need an
    interface that allows us to check (and later update) the age of entries.
    Relying on stats alone can show activity but not the actual age of an
    entry. Furthermore, when there are tens of thousands of entries, many
    hardware implementations only support "hit" bits, which are cleared on
    read, to denote that an entry was active and shouldn't be aged out;
    these can then be naturally translated into an age timestamp that is
    compatible with the software forwarding age. Using a lastuse entry
    doesn't affect performance because the members in that cache line are
    written to along with the age.
    Since all new users are encouraged to use ipmr via netlink, this is
    exported via the RTA_EXPIRES attribute.
    Also make a minor local variable declaration style adjustment: arrange
    them longest to shortest.

    Signed-off-by: Nikolay Aleksandrov
    CC: Roopa Prabhu
    CC: Shrijeet Mukherjee
    CC: Satish Ashok
    CC: Donald Sharp
    CC: David S. Miller
    CC: Alexey Kuznetsov
    CC: James Morris
    CC: Hideaki YOSHIFUJI
    CC: Patrick McHardy
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

10 Jul, 2016

1 commit

  • All inet6_netconf_notify_devconf() callers are in process context,
    so we can use GFP_KERNEL allocations as long as ip6mr does not hold
    a rwlock where it isn't needed (we hold RTNL there).

    Fixes: d67b8c616b48 ("netconf: advertise mc_forwarding status")
    Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding status")
    Signed-off-by: Eric Dumazet
    Cc: Nicolas Dichtel
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 Jun, 2016

1 commit


28 Apr, 2016

1 commit


22 Apr, 2016

1 commit


25 Nov, 2015

1 commit

  • Since (at least) commit b17a7c179dd3 ("[NET]: Do sysfs registration as
    part of register_netdevice."), netdev_run_todo() deals only with
    unregistration, so we don't need to do the rtnl_unlock/lock cycle to
    finish registration when pimreg or dvmrp device creation fails. In
    fact, that cycle opens a race condition: because the device is fully
    registered, someone can delete it while rtnl is unlocked. The problem
    gets worse when netlink support is introduced, as there are more points
    of entry that can trigger it, and it also makes reusing that code
    correctly impossible.

    Signed-off-by: Nikolay Aleksandrov
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

23 Nov, 2015

1 commit

  • As with IPv4, when an mrt table is destroyed, the static mfc entries and
    the static devices are kept, which leads to devices that can never be
    destroyed (because of the refcnt taken) and to leaked memory. Make sure
    that everything is cleaned up on netns destruction.

    Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code")
    CC: Benjamin Thery
    Signed-off-by: Nikolay Aleksandrov
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
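One way to express this fix is a cleanup pass with an "all" switch, sketched below with hypothetical types: ordinary flushes keep entries marked static, while namespace teardown passes all = true so nothing is left holding a device reference. The MFC_STATIC value and struct mfc_entry are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MFC_STATIC 1 /* assumed flag value marking user-installed entries */

struct mfc_entry { unsigned int flags; bool live; }; /* hypothetical */

/* Remove live entries; static ones survive unless 'all' is set.
 * Returns the number of entries removed. */
static size_t clean_entries(struct mfc_entry *e, size_t n, bool all)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!e[i].live)
			continue;
		if ((e[i].flags & MFC_STATIC) && !all)
			continue;
		e[i].live = false;
		removed++;
	}
	return removed;
}
```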
     

08 Oct, 2015

1 commit


18 Sep, 2015

4 commits

  • This is immediately motivated by the bridge code that chains functions that
    call into netfilter. Without passing net into the okfns the bridge code would
    need to guess about the best expression for the network namespace to process
    packets in.

    As net is frequently one of the first things computed in continuation
    functions after netfilter has done its job, passing in the desired
    network namespace is in many cases a code simplification.

    To support this change the function dst_output_okfn is introduced to
    simplify passing dst_output as an okfn. For the moment dst_output_okfn
    just silently drops the struct net.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
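The signature change can be illustrated with hypothetical stand-in types: continuation functions receive the namespace explicitly, and an adapter in the spirit of dst_output_okfn() accepts net and silently drops it before calling the output routine. None of these definitions are the kernel's.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct net, struct sock and struct sk_buff. */
struct net { int id; };
struct sock;
struct sk_buff { int len; };

/* Continuation signature after the change: net is passed explicitly. */
typedef int (*okfn_t)(struct net *net, struct sock *sk, struct sk_buff *skb);

/* Output routine that has no use for the namespace. */
static int output_stub(struct sock *sk, struct sk_buff *skb)
{
	(void)sk;
	return skb->len; /* pretend the whole packet was sent */
}

/* Adapter in the spirit of dst_output_okfn(): takes net and drops it. */
static int output_okfn_stub(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	(void)net;
	return output_stub(sk, skb);
}

/* A continuation that does use the namespace it is handed. */
static int net_id_okfn(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	(void)sk;
	(void)skb;
	return net->id;
}

/* After an "accept" verdict, run the continuation with the same net. */
static int hook_accept(struct net *net, struct sock *sk, struct sk_buff *skb,
		       okfn_t okfn)
{
	return okfn(net, sk, skb);
}
```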
     
  • Pass a network namespace parameter into the netfilter hooks. At the
    call site of the netfilter hooks, the path a packet is taking through
    the network stack is well known, which allows the network namespace to
    be computed easily and reliably.

    This allows the replacement of magic code like
    "dev_net(state->in?:state->out)" that appears at the start of most
    netfilter hooks with "state->net".

    In almost all cases the network namespace passed in is derived
    from the first network device passed in, guaranteeing those
    paths will not see any changes in practice.

    The exceptions are:
    xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
    ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
    ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
    ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
    ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
    ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
    ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
    br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

    In all cases these exceptions seem to be a better expression for the
    network namespace the packet is being processed in than the historic
    "dev_net(in?in:out)". I am documenting them in case something odd
    pops up and someone starts trying to track down what happened.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Add a sock parameter to dst_output, making dst_output_sk superfluous.
    Pass skb->sk as that parameter in all of the callers of dst_output.
    Have the callers of dst_output_sk call dst_output.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

10 Sep, 2015

1 commit

  • This switches IPv6 policy routing to use the shared
    fib_default_rule_pref() function of IPv4 and DECnet. It is also used in
    multicast routing for IPv4 as well as IPv6.

    The motivation for this patch is a complaint about iproute2 behaving
    inconsistently between IPv4 and IPv6 when adding policy rules: formerly,
    IPv6 rules were assigned a fixed priority of 0x3FFF, whereas for IPv4 the
    assigned priority value was decreased with each rule added.

    Since all users of the default_pref field have now been converted to
    assign the generic function fib_default_rule_pref(), fib_nl_newrule()
    can just use it directly instead. Therefore get rid of the function
    pointer altogether and make fib_default_rule_pref() static, as it's not
    used outside fib_rules.c anymore.

    Signed-off-by: Phil Sutter
    Signed-off-by: David S. Miller

    Phil Sutter
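The shared default-priority logic can be sketched over a plain array of priorities, a hypothetical view of the pref-sorted ops->rules_list: a new rule gets one less than the priority of the second rule in the list, or 0 if there is no such rule. This is what makes successive IPv4 rules count down from 32765.

```c
#include <stddef.h>
#include <stdint.h>

/* 'prefs' is a hypothetical, pref-sorted array view of the rules list. */
static uint32_t default_rule_pref(const uint32_t *prefs, size_t n)
{
	/* Go one below the second rule (the first is the local rule at 0). */
	if (n >= 2 && prefs[1] != 0)
		return prefs[1] - 1;
	return 0;
}
```

With the stock IPv4 rules (local 0, main 32766, default 32767) the first added rule gets 32765; once it sits second in the list, the next one gets 32764.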
     

06 Sep, 2015

1 commit

  • In the IPv6 multicast routing code, the mrt_lock was not being released
    correctly in the MFC iterator; as a result, adding or deleting a MIF would
    cause a hang because the mrt_lock could not be acquired.

    This fix is a copy of the code for the IPv4 case and ensures that the lock
    is released correctly.

    Signed-off-by: Richard Laing
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Richard Laing
     

08 Apr, 2015

1 commit

  • On the output paths in particular, we sometimes have to deal with two
    socket contexts. The first, usually skb->sk, is the local socket that
    generated the frame.

    The second is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

07 Apr, 2015

1 commit

  • Conflicts:
    drivers/net/ethernet/mellanox/mlx4/cmd.c
    net/core/fib_rules.c
    net/ipv4/fib_frontend.c

    The fib_rules.c and fib_frontend.c conflicts were locking adjustments
    in 'net' overlapping addition and removal of code in 'net-next'.

    The mlx4 conflict was a bug fix in 'net' happening in the same
    place a constant was being replaced with a more suitable macro.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Apr, 2015

5 commits

  • We need to wait for in-flight timers, since we
    are going to free the mrtable right after this.

    Cc: Hannes Frederic Sowa
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • We have to hold rtnl lock for fib_rules_unregister()
    otherwise the following race could happen:

    fib_rules_unregister():          fib_nl_delrule():
    ...                              ...
    ...                              ops = lookup_rules_ops();
    list_del_rcu(&ops->list);
                                     list_for_each_entry(ops->rules) {
    fib_rules_cleanup_ops(ops);        ...
      list_del_rcu();                  list_del_rcu();
                                     }

    Note, net->rules_mod_lock is actually not needed at all,
    either upper layer netns code or rtnl lock guarantees
    we are safe.

    Cc: Alexander Duyck
    Cc: Thomas Graf
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     
  • Conflicts:
    drivers/net/usb/asix_common.c
    drivers/net/usb/sr9800.c
    drivers/net/usb/usbnet.c
    include/linux/usb/usbnet.h
    net/ipv4/tcp_ipv4.c
    net/ipv6/tcp_ipv6.c

    The TCP conflicts were overlapping changes. In 'net' we added a
    READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
    Eric Dumazet touched the surrounding code dealing with how mini
    sockets are handled.

    With USB, it's a case of the same bug fix first going into net-next
    and then I cherry picked it back into net.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Don't use dev->iflink anymore.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • The goal of this patch is to prepare the removal of the iflink field. It
    introduces a new ndo function, which will be implemented by virtual interfaces.

    There is no functional change in this patch. All readers of the iflink
    field now call dev_get_iflink().

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
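The accessor's decision can be sketched with hypothetical stub types: ask the driver through the new ndo callback if it implements one, otherwise treat the device as its own link. reg_get_iflink() illustrates a virtual device without a lower device reporting 0; that return value is an assumption of the sketch.

```c
#include <stddef.h>

struct net_device_stub;

/* Hypothetical ops table containing only the new callback. */
struct netdev_ops_stub {
	int (*ndo_get_iflink)(const struct net_device_stub *dev);
};

struct net_device_stub {
	int ifindex;
	const struct netdev_ops_stub *netdev_ops;
};

/* Prefer the driver's answer; fall back to the device's own ifindex. */
static int dev_get_iflink_stub(const struct net_device_stub *dev)
{
	if (dev->netdev_ops && dev->netdev_ops->ndo_get_iflink)
		return dev->netdev_ops->ndo_get_iflink(dev);
	return dev->ifindex;
}

/* A register-style virtual device with no lower device. */
static int reg_get_iflink(const struct net_device_stub *dev)
{
	(void)dev;
	return 0;
}
```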
     

01 Apr, 2015

1 commit

  • IP addresses are often stored in netlink attributes. Add generic functions
    to do that.

    For nla_put_in_addr, it would be nicer to pass struct in_addr, but this
    is not used universally throughout the kernel; in way too many places,
    __be32 is used to store IPv4 addresses.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
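What such a helper writes can be shown from user space with the uapi <linux/netlink.h> definitions: a struct nlattr header followed by the 16-byte address, padded to NLA_ALIGNTO. put_in6_addr_attr() is a hypothetical name; the kernel's nla_put_in6_addr() appends to an sk_buff rather than a raw buffer.

```c
#include <string.h>
#include <netinet/in.h>
#include <linux/netlink.h>

/* Write one IPv6-address attribute into 'buf'. Returns the bytes used
 * (header + payload, padded), or -1 if 'room' is too small. */
static int put_in6_addr_attr(unsigned char *buf, size_t room,
			     unsigned short type, const struct in6_addr *addr)
{
	size_t payload = sizeof(*addr);
	size_t total = NLA_ALIGN(NLA_HDRLEN + payload);
	struct nlattr nla;

	if (total > room)
		return -1;
	nla.nla_len = (unsigned short)(NLA_HDRLEN + payload);
	nla.nla_type = type;
	memcpy(buf, &nla, sizeof(nla));
	memcpy(buf + NLA_HDRLEN, addr, payload);
	/* Zero any alignment padding (none for a 16-byte payload). */
	memset(buf + NLA_HDRLEN + payload, 0, total - NLA_HDRLEN - payload);
	return (int)total;
}
```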