26 Sep, 2016

1 commit

  • Since the commit below, the ipmr/ip6mr rtnl_unicast() code uses the portid
    instead of the previous dst_pid, which was copied from in_skb's portid.
    Since the skb is new, the portid is 0 at that point, so the packets are sent
    to the kernel, and we get scheduling while atomic or a deadlock (depending
    on where it happens) by trying to acquire rtnl twice.
    Also, since this is RTM_GETROUTE, it can be triggered by a normal user.

    Here's the sleeping while atomic trace:
    [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
    [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
    [ 7858.212881] 2 locks held by swapper/0/0:
    [ 7858.213013] #0: (((&mrt->ipmr_expire_timer))){+.-...}, at: [] call_timer_fn+0x5/0x350
    [ 7858.213422] #1: (mfc_unres_lock){+.....}, at: [] ipmr_expire_process+0x25/0x130
    [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
    [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
    [ 7858.214108] 0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
    [ 7858.214412] ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
    [ 7858.214716] 000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
    [ 7858.215251] Call Trace:
    [ 7858.215412] [] dump_stack+0x85/0xc1
    [ 7858.215662] [] ___might_sleep+0x192/0x250
    [ 7858.215868] [] __might_sleep+0x6f/0x100
    [ 7858.216072] [] mutex_lock_nested+0x33/0x4d0
    [ 7858.216279] [] ? netlink_lookup+0x25f/0x460
    [ 7858.216487] [] rtnetlink_rcv+0x1b/0x40
    [ 7858.216687] [] netlink_unicast+0x19c/0x260
    [ 7858.216900] [] rtnl_unicast+0x20/0x30
    [ 7858.217128] [] ipmr_destroy_unres+0xa9/0xf0
    [ 7858.217351] [] ipmr_expire_process+0x8f/0x130
    [ 7858.217581] [] ? ipmr_net_init+0x180/0x180
    [ 7858.217785] [] ? ipmr_net_init+0x180/0x180
    [ 7858.217990] [] call_timer_fn+0xa5/0x350
    [ 7858.218192] [] ? call_timer_fn+0x5/0x350
    [ 7858.218415] [] ? ipmr_net_init+0x180/0x180
    [ 7858.218656] [] run_timer_softirq+0x260/0x640
    [ 7858.218865] [] ? __do_softirq+0xbb/0x54f
    [ 7858.219068] [] __do_softirq+0xe8/0x54f
    [ 7858.219269] [] irq_exit+0xb8/0xc0
    [ 7858.219463] [] smp_apic_timer_interrupt+0x42/0x50
    [ 7858.219678] [] apic_timer_interrupt+0x8c/0xa0
    [ 7858.219897] [] ? native_safe_halt+0x6/0x10
    [ 7858.220165] [] ? trace_hardirqs_on+0xd/0x10
    [ 7858.220373] [] default_idle+0x23/0x190
    [ 7858.220574] [] arch_cpu_idle+0xf/0x20
    [ 7858.220790] [] default_idle_call+0x4c/0x60
    [ 7858.221016] [] cpu_startup_entry+0x39b/0x4d0
    [ 7858.221257] [] rest_init+0x135/0x140
    [ 7858.221469] [] start_kernel+0x50e/0x51b
    [ 7858.221670] [] ? early_idt_handler_array+0x120/0x120
    [ 7858.221894] [] x86_64_start_reservations+0x2a/0x2c
    [ 7858.222113] [] x86_64_start_kernel+0x13b/0x14a

    Fixes: 2942e9005056 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
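The deadlock half of the portid bug above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not kernel code: every name (toy_rtnl_trylock, toy_unicast, reply_buggy, reply_fixed, struct toy_skb) is hypothetical, the lock is a plain flag standing in for rtnl, and only the "caller already holds rtnl" case is modelled (the timer-context "sleeping while atomic" case in the trace is not). It shows why taking the destination from a freshly allocated skb (portid 0, i.e. the kernel's own socket) leads back into the lock, while replying to the requester's saved portid does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in the kernel. */

/* A non-recursive lock standing in for rtnl. */
static bool toy_rtnl_held;

/* Returns false instead of blocking when the lock is already held. */
static bool toy_rtnl_trylock(void)
{
	if (toy_rtnl_held)
		return false;	/* a real mutex would sleep or deadlock here */
	toy_rtnl_held = true;
	return true;
}

static void toy_rtnl_unlock(void)
{
	toy_rtnl_held = false;
}

/* A freshly allocated reply skb carries portid 0. */
struct toy_skb {
	uint32_t portid;
};

/* Netlink portid 0 addresses the kernel's own rtnetlink socket, whose input
 * path takes the lock again; any other portid is a userspace socket. */
static bool toy_unicast(uint32_t dst_portid)
{
	if (dst_portid == 0) {
		if (!toy_rtnl_trylock())
			return false;	/* would re-acquire a lock we hold */
		toy_rtnl_unlock();
	}
	return true;
}

/* Buggy: destination taken from the new skb, which is always 0. */
static bool reply_buggy(const struct toy_skb *reply, uint32_t requester)
{
	(void)requester;
	return toy_unicast(reply->portid);
}

/* Intended: destination is the requester's saved portid. */
static bool reply_fixed(const struct toy_skb *reply, uint32_t requester)
{
	(void)reply;
	return toy_unicast(requester);
}
```

With toy_rtnl_held set to true (the caller already holds the lock), reply_buggy() reports the re-acquisition while reply_fixed() delivers to the user socket.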
     

17 Jul, 2016

1 commit

  • In preparation for hardware offloading of ipmr/ip6mr, we need an
    interface that allows checking (and later updating) the age of entries.
    Relying on stats alone can show activity but not the actual age of the
    entry. Furthermore, when there are tens of thousands of entries, many
    hardware implementations only support "hit" bits, which are cleared on
    read, to denote that the entry was active and shouldn't be aged out.
    These can then be naturally translated into an age timestamp that is
    compatible with the software forwarding age. Using a lastuse entry doesn't
    affect performance because the members in that cache line are written to
    along with the age.
    Since all new users are encouraged to use ipmr via netlink, this is
    exported via the RTA_EXPIRES attribute.
    Also make a minor local variable declaration style adjustment: arrange them
    longest to shortest.

    Signed-off-by: Nikolay Aleksandrov
    CC: Roopa Prabhu
    CC: Shrijeet Mukherjee
    CC: Satish Ashok
    CC: Donald Sharp
    CC: David S. Miller
    CC: Alexey Kuznetsov
    CC: James Morris
    CC: Hideaki YOSHIFUJI
    CC: Patrick McHardy
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
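The lastuse/age idea above, including the translation of clear-on-read "hit" bits into the same timestamp, can be sketched as follows. This is a minimal model under assumptions: struct toy_mfc, toy_mfc_age() and toy_mfc_poll_hit() are hypothetical, ticks are a plain unsigned long counter, and the wraparound-safe comparison imitates the style of the kernel's time_after_eq() rather than reproducing the actual ipmr code that fills RTA_EXPIRES.

```c
#include <stdbool.h>

/* Hypothetical model of a forwarding entry's activity timestamp. */
struct toy_mfc {
	unsigned long lastuse;	/* tick count of last observed activity */
};

/* Wraparound-safe "a is at or after b": compare via a signed difference,
 * in the style of the kernel's time_after_eq(). */
static bool toy_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Age in ticks since last use; 0 if lastuse appears to be in the future. */
static unsigned long toy_mfc_age(const struct toy_mfc *c, unsigned long now)
{
	return toy_time_after_eq(now, c->lastuse) ? now - c->lastuse : 0;
}

/* Translate a clear-on-read hardware "hit" bit into the software timestamp:
 * if the entry was hit since the previous poll, it was active "now". */
static void toy_mfc_poll_hit(struct toy_mfc *c, bool *hw_hit_bit,
			     unsigned long now)
{
	if (*hw_hit_bit) {
		*hw_hit_bit = false;	/* models the hardware clearing it on read */
		c->lastuse = now;
	}
}
```

Because both the software path and the hardware poll write the same lastuse field, a single age computation serves both, which is the compatibility the commit message describes.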
     

01 Dec, 2015

3 commits


22 Jan, 2013

1 commit


13 Oct, 2012

1 commit


05 May, 2011

1 commit


30 Jan, 2011

1 commit

  • SIOCGETSGCNT is not a unique ioctl value, as it maps to SIOCPROTOPRIVATE+1,
    which unfortunately means the existing infrastructure for compat networking
    ioctls is insufficient. A trivial compat ioctl implementation would conflict
    with:

    SIOCAX25ADDUID
    SIOCAIPXPRISLT
    SIOCGETSGCNT_IN6
    SIOCGETSGCNT
    SIOCRSSCAUSE
    SIOCX25SSUBSCRIP
    SIOCX25SDTEFACILITIES

    To make this work, I have updated the compat_ioctl decode path to mirror
    the normal ioctl decode path. I have added an ipv4 inet_compat_ioctl function
    so that I can have ipv4-specific compat ioctls. I have added a compat_ioctl
    function into struct proto so I can break out ioctls by which kind of ip socket
    I am using. I have added a compat_raw_ioctl function because SIOCGETSGCNT only
    works on raw sockets. I have added an ipmr_compat_ioctl that mirrors the normal
    ipmr_ioctl.

    This was necessary because, unfortunately, the struct layout for SIOCGETSGCNT
    has unsigned longs in it, so it changes between 32bit and 64bit kernels.

    This change was sufficient to run a 32bit ip multicast routing daemon on a
    64bit kernel.

    Reported-by: Bill Fenner
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
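The layout mismatch described above can be made concrete with two userspace mirror structs. This is a sketch under assumptions: sg_req_native and sg_req_compat are hypothetical names modelled on the SIOCGETSGCNT request (two struct in_addr fields followed by three counters), and the compat variant uses uint32_t in the role the kernel gives compat_ulong_t. The conversion helper is illustrative, not the kernel's ipmr_compat_ioctl.

```c
#include <netinet/in.h>
#include <stdint.h>

/* The request as a 64-bit kernel sees it: unsigned long is 8 bytes on LP64. */
struct sg_req_native {
	struct in_addr src;
	struct in_addr grp;
	unsigned long pktcnt;
	unsigned long bytecnt;
	unsigned long wrong_if;
};

/* The same request as laid out by a 32-bit process (4-byte counters). */
struct sg_req_compat {
	struct in_addr src;
	struct in_addr grp;
	uint32_t pktcnt;
	uint32_t bytecnt;
	uint32_t wrong_if;
};

/* Copy counters back to a 32-bit caller. Converting to uint32_t keeps the
 * low 32 bits, so counters above 2^32 - 1 wrap. */
static void sg_req_to_compat(const struct sg_req_native *in,
			     struct sg_req_compat *out)
{
	out->src = in->src;
	out->grp = in->grp;
	out->pktcnt = (uint32_t)in->pktcnt;
	out->bytecnt = (uint32_t)in->bytecnt;
	out->wrong_if = (uint32_t)in->wrong_if;
}
```

On an LP64 kernel the native struct is 32 bytes while the 32-bit caller passes 20, so the same ioctl number must be decoded differently depending on the caller, which is why a dedicated compat path is needed.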
     

04 Oct, 2010

1 commit


14 Apr, 2010

3 commits

  • This patch adds support for multiple independent multicast routing instances,
    named "tables".

    Userspace multicast routing daemons can bind to a specific table instance by
    issuing a setsockopt call using a new option MRT_TABLE. The table number is
    stored in the raw socket data and affects all following ipmr setsockopt(),
    getsockopt() and ioctl() calls. By default, a single table (RT_TABLE_DEFAULT)
    is created with a default routing rule pointing to it. Newly created pimreg
    devices have the table number appended ("pimregX"), with the exception of
    devices created in the default table, which are named just "pimreg" for
    compatibility reasons.

    Packets are directed to a specific table instance using routing rules,
    similar to how regular routing rules work. Currently iif, oif and mark
    are supported as keys; source and destination addresses could be supported
    additionally.

    Example usage:

    - bind pimd/xorp/... to a specific table:

    uint32_t table = 123;
    setsockopt(fd, IPPROTO_IP, MRT_TABLE, &table, sizeof(table));

    - create routing rules directing packets to the new table:

    # ip mrule add iif eth0 lookup 123
    # ip mrule add oif eth0 lookup 123

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Now that cache entries in unres_queue don't need to be distinguished by their
    network namespace pointer anymore, we can remove it from struct mfc_cache
    and pass the namespace as a function argument to the functions that need it.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
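The two user-visible behaviours of the MRT_TABLE commit above (pimreg device naming and rule-based table selection) can be sketched as follows. This is an illustrative model under assumptions: toy_pimreg_name(), struct toy_mrule and toy_mrule_lookup() are hypothetical, a zero key is treated as "match anything" (real rules match fwmark with a mask), and falling back to the default table stands in for the default routing rule. RT_TABLE_DEFAULT is 253 in the Linux UAPI headers; it is redefined locally to keep the block self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_RT_TABLE_DEFAULT 253u	/* RT_TABLE_DEFAULT in <linux/rtnetlink.h> */

/* Register device name for a table: plain "pimreg" for the default table,
 * "pimregX" otherwise. */
static void toy_pimreg_name(uint32_t table, char *buf, size_t len)
{
	if (table == TOY_RT_TABLE_DEFAULT)
		snprintf(buf, len, "pimreg");
	else
		snprintf(buf, len, "pimreg%u", (unsigned int)table);
}

/* Hypothetical rule: a zero key means "match anything". */
struct toy_mrule {
	int iif;	/* input ifindex */
	int oif;	/* output ifindex */
	uint32_t mark;	/* fwmark (no mask in this sketch) */
	uint32_t table;	/* table to use on match */
};

/* First matching rule wins; otherwise fall back to the default table. */
static uint32_t toy_mrule_lookup(const struct toy_mrule *rules, size_t n,
				 int iif, int oif, uint32_t mark)
{
	for (size_t i = 0; i < n; i++) {
		const struct toy_mrule *r = &rules[i];

		if (r->iif && r->iif != iif)
			continue;
		if (r->oif && r->oif != oif)
			continue;
		if (r->mark && r->mark != mark)
			continue;
		return r->table;
	}
	return TOY_RT_TABLE_DEFAULT;
}
```

A rule list of { .iif = 2, .table = 123 } corresponds roughly to "ip mrule add iif <dev with ifindex 2> lookup 123", with table 123 getting the register device "pimreg123".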
     

05 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces
    on the first line to ease grep games.

    struct something
    {

    becomes:

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Oct, 2009

1 commit

  • When a routing daemon wants to enable forwarding of multicast traffic, it
    performs something like:

    struct vifctl vc = {
    .vifc_vifi = 1,
    .vifc_flags = 0,
    .vifc_threshold = 1,
    .vifc_rate_limit = 0,
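    .vifc_lcl_addr = ip, /* <--- */
    .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
    };
    setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));

    This leads (in the kernel) to a call to vif_add(), which looks up the
    (physical) device by its assigned IP address:
    dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);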

    The current API (struct vifctl) does not allow specifying an
    interface other than by its IP address, and if more than one
    interface has the specified IP, only the first one will be found.

    The attached patch (against 2.6.30.4) allows specifying an interface
    by its index instead of its IP address:

    struct vifctl vc = {
    .vifc_vifi = 1,
    .vifc_flags = VIFF_USE_IFINDEX, /* NEW */
    .vifc_threshold = 1,
    .vifc_rate_limit = 0,
    .vifc_lcl_ifindex = if_nametoindex("eth0"), /* NEW */
    .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
    };
    setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));

    Signed-off-by: Ilia K.

    === modified file 'include/linux/mroute.h'
    Signed-off-by: David S. Miller

    Ilia K
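The ifindex-based example above can be wrapped in a small helper. This is a sketch assuming a Linux system whose UAPI <linux/mroute.h> provides struct vifctl with the vifc_lcl_addr/vifc_lcl_ifindex union and VIFF_USE_IFINDEX (as added by this change); fill_vifctl_by_ifindex() is a hypothetical name. It only builds the request; it does not open a socket.

```c
#include <arpa/inet.h>		/* htonl(); pulls in <netinet/in.h> first */
#include <linux/mroute.h>	/* struct vifctl, vifi_t, VIFF_USE_IFINDEX */
#include <string.h>

/* Build an MRT_ADD_VIF request that names the local interface by index. */
static void fill_vifctl_by_ifindex(struct vifctl *vc, vifi_t vifi, int ifindex)
{
	memset(vc, 0, sizeof(*vc));
	vc->vifc_vifi = vifi;
	vc->vifc_flags = VIFF_USE_IFINDEX;	/* interpret the union as an ifindex */
	vc->vifc_threshold = 1;
	vc->vifc_rate_limit = 0;
	vc->vifc_lcl_ifindex = ifindex;		/* shares storage with vifc_lcl_addr */
	vc->vifc_rmt_addr.s_addr = htonl(INADDR_ANY);
}
```

In a daemon, the index would typically come from if_nametoindex("eth0") (from <net/if.h>), and the filled struct would be passed to setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc)) on a raw IGMP socket after MRT_INIT, which normally requires CAP_NET_ADMIN.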
     

01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
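The "type level" safety above comes from making setsockopt's optlen unsigned. The contrast can be sketched with two hypothetical bounds checks; OPT_CAP, check_len_signed(), check_len_unsigned() and copy_opt() are illustrative names, not kernel functions. With a signed length, a single upper-bound check lets -1 through (and memcpy() would then see it as a huge size_t), so every implementation needs an extra "< 0" test; with an unsigned length, one check covers both cases.

```c
#include <errno.h>
#include <string.h>

#define OPT_CAP 16u	/* hypothetical size of a fixed option buffer */

/* Signed length: a negative value slips past the upper-bound check. */
static int check_len_signed(int optlen)
{
	if (optlen > (int)OPT_CAP)
		return -EINVAL;
	return 0;	/* -1 is accepted: each caller would need "optlen < 0" too */
}

/* Unsigned length: what used to be negative is now a large value, and the
 * same single check rejects it. */
static int check_len_unsigned(unsigned int optlen)
{
	if (optlen > OPT_CAP)
		return -EINVAL;
	return 0;
}

/* Copy an option only after the unsigned bounds check passes. */
static int copy_opt(unsigned char dst[OPT_CAP], const void *src,
		    unsigned int optlen)
{
	int err = check_len_unsigned(optlen);

	if (err)
		return err;
	memcpy(dst, src, optlen);
	return 0;
}
```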
     

23 Jan, 2009

2 commits

  • This last patch makes the appropriate changes to use and propagate the
    network namespace where needed in IPv4 multicast routing code.

    This consists mainly of replacing all the remaining init_net occurrences
    with the current netns pointer retrieved from sockets, net devices or
    mfc_caches, depending on the routines' contexts.

    Some routines receive a new 'struct net' parameter to propagate the current
    netns:
    * vif_add/vif_delete
    * ipmr_new_tunnel
    * mroute_clean_tables
    * ipmr_cache_find
    * ipmr_cache_report
    * ipmr_cache_unresolved
    * ipmr_mfc_add/ipmr_mfc_delete
    * ipmr_get_route
    * rt_fill_info (in route.c)

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     
  • This patch stores into struct mfc_cache the network namespace each
    mfc_cache belongs to. The new member is mfc_net.

    mfc_net is assigned at cache allocation and doesn't change during
    the rest of the cache entry life.
    A new net parameter is added to ipmr_cache_alloc/ipmr_cache_alloc_unres.

    This will help to retrieve the current netns around the IPv4 multicast
    routing code.

    At the moment, all mfc_cache entries are allocated in init_net.

    Signed-off-by: Benjamin Thery
    Signed-off-by: David S. Miller

    Benjamin Thery
     

30 Aug, 2008

1 commit

  • Nothing in linux/pim.h should be exported to userspace.

    This should fix the XORP build failure reported by
    Jose Calhariz, the Debian package maintainer.

    Nothing originally in linux/mroute.h was ever exported to userspace,
    but some of it started to be when it was moved into this new
    linux/pim.h, and that was wrong. If we didn't provide these
    definitions for 10 years, we can reasonably expect that applications
    defined this stuff locally or used GLIBC headers providing the
    protocol definitions. As such, the only result of exporting them can
    be conflicts and userland build breakage.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Jul, 2008

2 commits


04 Apr, 2008

2 commits


07 Nov, 2007

1 commit

  • The #ifdef CONFIG_IP_MROUTE is sometimes placed inside the if-s,
    which looks completely bad. Similar ifdefs inside the functions
    look a bit better, but they are also not recommended.

    Provide an ifdef-ed ip_mroute_opt() helper to cleanup the code.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
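The helper pattern above moves the #ifdef out of the callers and into one header-level definition. A minimal sketch, assuming: TOY_CONFIG_IP_MROUTE stands in for the Kconfig symbol, TOY_MRT_BASE uses the value 200 that <linux/mroute.h> gives MRT_BASE, TOY_MRT_MAX is an assumed upper bound (the real MRT_MAX has grown as options were added), and toy_do_ip_setsockopt() is a hypothetical caller. This version returns bool, which may differ from the kernel's exact signature.

```c
#include <stdbool.h>

#define TOY_MRT_BASE 200			/* MRT_BASE in <linux/mroute.h> */
#define TOY_MRT_MAX  (TOY_MRT_BASE + 12)	/* assumed upper bound */

#define TOY_CONFIG_IP_MROUTE 1			/* stand-in for the Kconfig symbol */

#ifdef TOY_CONFIG_IP_MROUTE
/* Multicast routing built in: claim the MRT_* option range. */
static inline bool ip_mroute_opt(int opt)
{
	return opt >= TOY_MRT_BASE && opt <= TOY_MRT_MAX;
}
#else
/* Multicast routing compiled out: never claim an option. */
static inline bool ip_mroute_opt(int opt)
{
	(void)opt;
	return false;
}
#endif

/* The caller no longer needs an #ifdef inside its if-s. */
static int toy_do_ip_setsockopt(int optname)
{
	if (ip_mroute_opt(optname))
		return 1;	/* would dispatch to ip_mroute_setsockopt() */
	return 0;		/* handle the regular IP options */
}
```

When the feature is compiled out, the stub returns false unconditionally, so the compiler can drop the multicast branch in callers while the call sites stay identical.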
     

29 Sep, 2006

2 commits


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds