16 Apr, 2014

1 commit

  • In the dst->output() path for ipv4, the code assumes the skb it has to
    transmit is attached to an inet socket, specifically via
    ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
    provider of the packet is an AF_PACKET socket.

    The dst->output() method gets an additional 'struct sock *sk'
    parameter. This needs a cascade of changes so that this parameter can
    be propagated from vxlan to final consumer.

    Fixes: 8f646c922d55 ("vxlan: keep original skb ownership")
    Reported-by: lucien xin
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
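
    The idea of handing the socket to the output routine explicitly can be pictured with a small userspace sketch. The types and names below (`enum sk_family`, `output_implicit`, `output_explicit`, and the cut-down `struct sock` / `struct sk_buff`) are simplified stand-ins assumed for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumptions for illustration). */
enum sk_family { FAMILY_INET, FAMILY_PACKET };

struct sock { enum sk_family family; };
struct sk_buff { struct sock *sk; /* socket that originally owns the skb */ };

/* Old style: the output routine trusts skb->sk to be an inet socket.
 * Returns 1 when that assumption holds, 0 when it is violated. */
static int output_implicit(const struct sk_buff *skb)
{
    return skb->sk != NULL && skb->sk->family == FAMILY_INET;
}

/* New style: the caller passes the socket to use explicitly, so the
 * original skb ownership no longer matters to the output routine. */
static int output_explicit(const struct sock *sk, const struct sk_buff *skb)
{
    (void)skb; /* payload handling is omitted in this sketch */
    return sk != NULL && sk->family == FAMILY_INET;
}
```

    With an skb owned by a packet socket, `output_implicit()` reports a violated assumption, while `output_explicit()` called with the tunnel's own inet socket does not.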
     

26 Mar, 2014

1 commit


19 Mar, 2014

1 commit

  • Steffen Klassert says:

    ====================
    One patch to rename a newly introduced struct. The rest is
    the rework of the IPsec virtual tunnel interface for ipv6 to
    support inter address family tunneling and namespace crossing.

    1) Rename the newly introduced struct xfrm_filter to avoid a
    conflict with iproute2. From Nicolas Dichtel.

    2) Introduce xfrm_input_afinfo to access the address family
    dependent tunnel callback functions properly.

    3) Add and use an IPsec protocol multiplexer for ipv6.

    4) Remove dst_entry caching. vti can look up multiple different
    dst entries, depending on the configured xfrm states. Therefore
    it does not make sense to cache a dst_entry.

    5) Remove caching of flow information. vti6 does not use the
    tunnel endpoint addresses to do route and xfrm lookups.

    6) Update the vti6 to use its own receive hook.

    7) Remove the now unused xfrm_tunnel_notifier. This was used from vti
    and is replaced by the IPsec protocol multiplexer hooks.

    8) Support inter address family tunneling for vti6.

    9) Check if the tunnel endpoints of the xfrm state and the vti interface
    are matching and return an error otherwise.

    10) Enable namespace crossing for vti devices.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

14 Mar, 2014

1 commit


13 Mar, 2014

1 commit

  • We leak an active timer, the hotcpu notifier and all allocated
    resources when we exit a namespace. Fix this by introducing a
    flow_cache_fini() function where we release the resources before
    we exit.

    Fixes: ca925cf1534e ("flowcache: Make flow cache name space aware")
    Reported-by: Jakub Kicinski
    Tested-by: Jakub Kicinski
    Cc: Eric Dumazet
    Cc: Fan Du
    Signed-off-by: Steffen Klassert
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Steffen Klassert
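
    The init/fini pairing described above can be sketched in userspace as follows. The struct and its fields are hypothetical placeholders for the timer, notifier and allocations mentioned in the message; this is not the kernel's flow cache code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-namespace cache state; field names are illustrative. */
struct flow_cache_sketch {
    void *table;              /* stands in for the allocated tables */
    int timer_active;         /* stands in for the armed timer */
    int notifier_registered;  /* stands in for the hotcpu notifier */
};

/* Set up everything the namespace needs; returns 0 on success. */
static int flow_cache_sketch_init(struct flow_cache_sketch *fc)
{
    fc->table = calloc(16, sizeof(void *));
    if (!fc->table)
        return -1;
    fc->timer_active = 1;
    fc->notifier_registered = 1;
    return 0;
}

/* Undo everything init did, in reverse order, before the namespace exits. */
static void flow_cache_sketch_fini(struct flow_cache_sketch *fc)
{
    fc->notifier_registered = 0;
    fc->timer_active = 0;
    free(fc->table);
    fc->table = NULL;
}
```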
     

10 Mar, 2014

1 commit

  • security_xfrm_policy_alloc can be called in atomic context so the
    allocation should be done with GFP_ATOMIC. Add an argument to let the
    callers choose the appropriate way. In order to do so a gfp argument
    needs to be added to the method xfrm_policy_alloc_security in struct
    security_operations and to the internal function
    selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic
    callers and leave GFP_KERNEL as before for the rest.
    The path that needed the gfp argument addition is:
    security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security ->
    all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) ->
    selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only)

    Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also
    add it to security_context_to_sid which is used inside and prior to this
    patch did only GFP_KERNEL allocation. So add gfp argument to
    security_context_to_sid and adjust all of its callers as well.

    CC: Paul Moore
    CC: Dave Jones
    CC: Steffen Klassert
    CC: Fan Du
    CC: David S. Miller
    CC: LSM list
    CC: SELinux list

    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Paul Moore
    Signed-off-by: Steffen Klassert

    Nikolay Aleksandrov
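
    The cascade of signature changes amounts to threading one allocation-mode argument through every layer of the call chain. The sketch below shows that pattern in userspace; `sketch_gfp_t` and all function names are assumptions for illustration (the kernel's gfp_t is a bitmask, not an enum).

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative flag type; not the kernel's gfp_t. */
typedef enum { SKETCH_GFP_KERNEL, SKETCH_GFP_ATOMIC } sketch_gfp_t;

/* Records which mode the innermost allocator was asked for. */
static sketch_gfp_t last_gfp_seen;

static void *sketch_alloc(size_t n, sketch_gfp_t gfp)
{
    last_gfp_seen = gfp;
    return malloc(n);
}

/* Innermost helper: used to hard-code the allocation mode,
 * now takes it from the caller. */
static void *sketch_alloc_user(size_t len, sketch_gfp_t gfp)
{
    return sketch_alloc(len, gfp);
}

/* Outer entry point: forwards the caller's choice unchanged. */
static void *sketch_policy_alloc(size_t len, sketch_gfp_t gfp)
{
    return sketch_alloc_user(len, gfp);
}
```

    Atomic callers pass `SKETCH_GFP_ATOMIC`; everyone else keeps passing `SKETCH_GFP_KERNEL`, so their behavior does not change.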
     

07 Mar, 2014

1 commit


06 Mar, 2014

1 commit

  • Conflicts:
    drivers/net/wireless/ath/ath9k/recv.c
    drivers/net/wireless/mwifiex/pcie.c
    net/ipv6/sit.c

    The SIT driver conflict consists of a bug fix being done by hand
    in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
    was created (netdev_alloc_pcpu_stats()) which takes care of this.

    The two wireless conflicts were overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

26 Feb, 2014

1 commit

  • When a policy is unlinked from the lists in thread context,
    the xfrm timer can fire before we can mark this policy as dead.
    So reinitialize the bydst hlist; hlist_unhashed() will then
    notice that this policy is not linked and will avoid a
    double unlink of that policy.

    Reported-by: Xianpeng Zhao
    Signed-off-by: Steffen Klassert

    Steffen Klassert
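
    Why reinitializing the node makes a second unlink harmless can be seen in a userspace re-implementation modeled on the kernel's hlist helpers. This is a simplified sketch, not the kernel header; `init_hlist_node` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void init_hlist_node(struct hlist_node *n)
{
    n->next = NULL;
    n->pprev = NULL;
}

/* A node with no back-pointer is not on any list. */
static int hlist_unhashed(const struct hlist_node *n)
{
    return n->pprev == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink and reinitialize; a second call sees an unhashed node and returns. */
static void hlist_del_init(struct hlist_node *n)
{
    if (hlist_unhashed(n))
        return;
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    init_hlist_node(n);
}
```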
     

25 Feb, 2014

2 commits


21 Feb, 2014

1 commit


20 Feb, 2014

3 commits


19 Feb, 2014

1 commit

  • We currently cache socket policy bundles at xfrm_policy_sk_bundles.
    These cached bundles are never used. Instead we create and cache
    a new one whenever xfrm_lookup() is called on a socket policy.

    Most protocols cache the used routes to the socket, so let's
    remove the unused caching of socket policy bundles in xfrm.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     

17 Feb, 2014

1 commit

  • The goal of this patch is to allow userland to dump only a subset of SAs by
    specifying a filter during the dump.
    The kernel is in charge of filtering the SAs, which avoids generating useless
    netlink traffic (it also saves some cpu cycles). This is particularly useful
    when a large number of SAs are set on the system.

    Note that I removed the union in struct xfrm_state_walk to fix a problem on arm.
    struct netlink_callback->args is defined as an array of 6 long and the first long
    is used in xfrm code to flag the cb as initialized. Hence, we must have:
    sizeof(struct xfrm_state_walk) <= sizeof(long) * 5

    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
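
    The size constraint follows from the layout described in the message: six long-sized callback slots, the first of which is taken by the "initialized" flag. A compile-time check of that kind might look like the sketch below. `struct walk_sketch` and its fields are hypothetical and do not reproduce the kernel's struct xfrm_state_walk; the sizes assume a typical Linux ABI where pointers and long have the same width.

```c
#include <assert.h>
#include <stddef.h>

#define CB_ARGS 6 /* netlink_callback->args is described as 6 longs */

/* Hypothetical walk state; fields are illustrative only. */
struct walk_sketch {
    void *list_next;
    void *list_prev;
    unsigned char state;
    unsigned char dying;
    unsigned char proto;
    unsigned int seq;
    void *filter;
};

/* The first slot holds the "initialized" flag, so CB_ARGS - 1 slots remain. */
_Static_assert(sizeof(struct walk_sketch) <= sizeof(long) * (CB_ARGS - 1),
               "walk state must fit in the slots after the flag");

/* Run-time form of the same check, handy in a unit test. */
static int walk_fits(void)
{
    return sizeof(struct walk_sketch) <= sizeof(long) * (CB_ARGS - 1);
}
```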
     

13 Feb, 2014

1 commit

  • In the case when KMs have no listeners, km_query() will fail and
    temporary SAs are garbage collected immediately after their allocation.
    This causes strain on memory allocation, leading even to OOM since
    temporary SA alloc/free cycle is performed for every packet
    and garbage collection does not keep up the pace.

    The sane thing to do is to make sure we have audience before
    temporary SA allocation.

    Signed-off-by: Horia Geanta
    Signed-off-by: Steffen Klassert

    Horia Geanta
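
    The ordering fix (check for listeners first, allocate second) can be sketched as below. `find_or_acquire`, `struct temp_sa` and the allocation counter are hypothetical names used only to make the control flow visible.

```c
#include <assert.h>
#include <stdlib.h>

static int temp_sa_allocs; /* counts temporary SA allocations */

struct temp_sa { int acquire_pending; };

/* Returns NULL without allocating when no key manager is listening. */
static struct temp_sa *find_or_acquire(int have_listeners)
{
    struct temp_sa *x;

    if (!have_listeners)
        return NULL; /* bail out before touching the allocator */

    x = calloc(1, sizeof(*x));
    if (!x)
        return NULL;
    temp_sa_allocs++;
    x->acquire_pending = 1;
    return x;
}
```

    With no listener, the per-packet path no longer performs an alloc/free cycle at all.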
     

12 Feb, 2014

2 commits

  • Inserting an entry into the flowcache, or flushing the flowcache, should be
    done on a per-net basis. In the original implementation, a flush triggered
    from a fat netns crammed with flow entries also wipes out the entries of a
    slim netns that has only a few flow cache entries.

    Since the flowcache is tightly coupled with IPsec, it is easier to
    put the flow cache global parameters into the xfrm namespace part. One last
    thing that needs to be done is to make bumping the flow cache genid and
    flushing the flow cache per-net as well.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
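
    A toy model of the per-net scoping: each namespace carries its own entries and its own genid, so flushing one leaves the other untouched. `struct net_sketch`, `cache_insert` and `cache_flush` are assumed names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <string.h>

#define SLOTS 8

/* Hypothetical per-namespace flow cache state. */
struct net_sketch {
    int flow_entries[SLOTS];
    int count;
    unsigned int genid;
};

static void cache_insert(struct net_sketch *net, int key)
{
    if (net->count < SLOTS)
        net->flow_entries[net->count++] = key;
}

/* Flush only this namespace and bump only its generation id. */
static void cache_flush(struct net_sketch *net)
{
    memset(net->flow_entries, 0, sizeof(net->flow_entries));
    net->count = 0;
    net->genid++;
}
```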
     
  • Remove the check that rejects ESN when the user tries to use it for AH
    through the netlink key manager, since both ESP and AH support the ESN
    feature according to the RFC.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
     

26 Jan, 2014

1 commit

  • Pull networking updates from David Miller:

    1) BPF debugger and asm tool by Daniel Borkmann.

    2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

    3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

    4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match. From Ben Hutchings.

    5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

    6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

    7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

    8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

    9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

    10) Fix ipv6 router reachability probing, from Jiri Benc.

    11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

    12) Support tunneling in GRO layer, from Jerry Chu.

    13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

    14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI. From Atzm Watanabe.

    15) New "Heavy Hitter" qdisc, from Terry Lam.

    16) Significantly improve the IPSEC support in pktgen, from Fan Du.

    17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
    Herbert.

    18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

    19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

    20) Key TCP metrics blobs also by source address, not just destination
    address. From Christoph Paasch.

    21) Support 10G in generic phylib. From Andy Fleming.

    22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided. From Tom Herbert.

    The wireless and netfilter folks have been busy little bees too.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
    net/cxgb4: Fix referencing freed adapter
    ipv6: reallocate addrconf router for ipv6 address when lo device up
    fib_frontend: fix possible NULL pointer dereference
    rtnetlink: remove IFLA_BOND_SLAVE definition
    rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
    qlcnic: update version to 5.3.55
    qlcnic: Enhance logic to calculate msix vectors.
    qlcnic: Refactor interrupt coalescing code for all adapters.
    qlcnic: Update poll controller code path
    qlcnic: Interrupt code cleanup
    qlcnic: Enhance Tx timeout debugging.
    qlcnic: Use bool for rx_mac_learn.
    bonding: fix u64 division
    rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
    sfc: Use the correct maximum TX DMA ring size for SFC9100
    Add Shradha Shah as the sfc driver maintainer.
    net/vxlan: Share RX skb de-marking and checksum checks with ovs
    tulip: cleanup by using ARRAY_SIZE()
    ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
    net/cxgb4: Don't retrieve stats during recovery
    ...

    Linus Torvalds
     

24 Jan, 2014

1 commit

  • Pull audit update from Eric Paris:
    "Again we stayed pretty well contained inside the audit system.
    Venturing out was fixing a couple of function prototypes which were
    inconsistent (didn't hurt anything, but we used the same value as an
    int, uint, u32, and I think even a long in a couple of places).

    We also made a couple of minor changes to when a couple of LSMs called
    the audit system. We hoped to add aarch64 audit support this go
    round, but it wasn't ready.

    I'm disappearing on vacation on Thursday. I should have internet
    access, but it'll be spotty. If anything goes wrong please be sure to
    cc rgb@redhat.com. He'll make fixing things his top priority"

    * git://git.infradead.org/users/eparis/audit: (50 commits)
    audit: whitespace fix in kernel-parameters.txt
    audit: fix location of __net_initdata for audit_net_ops
    audit: remove pr_info for every network namespace
    audit: Modify a set of system calls in audit class definitions
    audit: Convert int limit uses to u32
    audit: Use more current logging style
    audit: Use hex_byte_pack_upper
    audit: correct a type mismatch in audit_syscall_exit()
    audit: reorder AUDIT_TTY_SET arguments
    audit: rework AUDIT_TTY_SET to only grab spin_lock once
    audit: remove needless switch in AUDIT_SET
    audit: use define's for audit version
    audit: documentation of audit= kernel parameter
    audit: wait_for_auditd rework for readability
    audit: update MAINTAINERS
    audit: log task info on feature change
    audit: fix incorrect set of audit_sock
    audit: print error message when fail to create audit socket
    audit: fix dangling keywords in audit_log_set_loginuid() output
    audit: log on errors from filter user rules
    ...

    Linus Torvalds
     

15 Jan, 2014

1 commit


14 Jan, 2014

2 commits

  • Conflicts:
    net/xfrm/xfrm_policy.c

    Steffen Klassert says:

    ====================
    This pull request has a merge conflict between commits be7928d20bab
    ("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and
    da7c224b1baa ("net: xfrm: xfrm_policy: silence compiler warning") from
    the net-next tree and commit 2f3ea9a95c58 ("xfrm: checkpatch erros with
    inline keyword position") from the ipsec-next tree.

    The version from net-next can be used, like it is done in linux-next.

    1) Checkpatch cleanups, from Weilong Chen.

    2) Fix lockdep complaints when pktgen is used with IPsec,
    from Fan Du.

    3) Update pktgen to allow any combination of IPsec transport/tunnel mode
    and AH/ESP/IPcomp type, from Fan Du.

    4) Make pktgen_dst_metrics static, Fengguang Wu.

    5) Compile fix for pktgen when CONFIG_XFRM is not set,
    from Fan Du.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Right now the sessionid value in the kernel is a combination of u32,
    int, and unsigned int. Just use unsigned int throughout.

    Signed-off-by: Eric Paris
    Signed-off-by: Richard Guy Briggs
    Signed-off-by: Eric Paris

    Eric Paris
     

08 Jan, 2014

2 commits

  • Fix below compiler warning:

    net/xfrm/xfrm_policy.c:1644:12: warning: ‘xfrm_dst_alloc_copy’ defined but not used [-Wunused-function]

    Signed-off-by: Ying Xue
    Signed-off-by: David S. Miller

    Ying Xue
     
  • Fix three warnings related to:

    net/xfrm/xfrm_policy.c:1644:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
    net/xfrm/xfrm_policy.c:1656:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
    net/xfrm/xfrm_policy.c:1668:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]

    Just removing the inline keyword is sufficient as the compiler will
    decide on its own about inlining or not.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

03 Jan, 2014

2 commits

  • Introduce xfrm_state_lookup_byspi to find the SA specified by the user
    via "pgset spi xxx". With this scheme, any flow, regardless of its
    saddr/daddr, can be transformed by the SA with the configured
    spi.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
     
  • Acquiring xfrm_state_lock in process context is expected to turn BH off,
    as this lock is also used in BH context, namely in the xfrm state timer
    handler. Otherwise LOCKDEP complains with the messages below.

    [ 81.422781] pktgen: Packet Generator for packet performance testing. Version: 2.74
    [ 81.725194]
    [ 81.725211] =========================================================
    [ 81.725212] [ INFO: possible irq lock inversion dependency detected ]
    [ 81.725215] 3.13.0-rc2+ #92 Not tainted
    [ 81.725216] ---------------------------------------------------------
    [ 81.725218] kpktgend_0/2780 just changed the state of lock:
    [ 81.725220] (xfrm_state_lock){+.+...}, at: [] xfrm_stateonly_find+0x41/0x1f0
    [ 81.725231] but this lock was taken by another, SOFTIRQ-safe lock in the past:
    [ 81.725232] (&(&x->lock)->rlock){+.-...}
    [ 81.725232]
    [ 81.725232] and interrupts could create inverse lock ordering between them.
    [ 81.725232]
    [ 81.725235]
    [ 81.725235] other info that might help us debug this:
    [ 81.725237] Possible interrupt unsafe locking scenario:
    [ 81.725237]
    [ 81.725238]        CPU0                    CPU1
    [ 81.725240]        ----                    ----
    [ 81.725241]   lock(xfrm_state_lock);
    [ 81.725243]                                local_irq_disable();
    [ 81.725244]                                lock(&(&x->lock)->rlock);
    [ 81.725246]                                lock(xfrm_state_lock);
    [ 81.725248]   <Interrupt>
    [ 81.725249]     lock(&(&x->lock)->rlock);
    [ 81.725251]
    [ 81.725251] *** DEADLOCK ***
    [ 81.725251]
    [ 81.725254] no locks held by kpktgend_0/2780.
    [ 81.725255]
    [ 81.725255] the shortest dependencies between 2nd lock and 1st lock:
    [ 81.725269] -> (&(&x->lock)->rlock){+.-...} ops: 8 {
    [ 81.725274] HARDIRQ-ON-W at:
    [ 81.725276] [] __lock_acquire+0x65b/0x1d70
    [ 81.725282] [] lock_acquire+0x97/0x130
    [ 81.725284] [] _raw_spin_lock+0x36/0x70
    [ 81.725289] [] xfrm_timer_handler+0x43/0x290
    [ 81.725292] [] __tasklet_hrtimer_trampoline+0x17/0x40
    [ 81.725300] [] tasklet_hi_action+0xd7/0xf0
    [ 81.725303] [] __do_softirq+0xe6/0x2d0
    [ 81.725305] [] irq_exit+0x96/0xc0
    [ 81.725308] [] smp_apic_timer_interrupt+0x4a/0x60
    [ 81.725313] [] apic_timer_interrupt+0x6f/0x80
    [ 81.725316] [] arch_cpu_idle+0x26/0x30
    [ 81.725329] [] cpu_startup_entry+0x88/0x2b0
    [ 81.725333] [] start_secondary+0x190/0x1f0
    [ 81.725338] IN-SOFTIRQ-W at:
    [ 81.725340] [] __lock_acquire+0x62d/0x1d70
    [ 81.725342] [] lock_acquire+0x97/0x130
    [ 81.725344] [] _raw_spin_lock+0x36/0x70
    [ 81.725347] [] xfrm_timer_handler+0x43/0x290
    [ 81.725349] [] __tasklet_hrtimer_trampoline+0x17/0x40
    [ 81.725352] [] tasklet_hi_action+0xd7/0xf0
    [ 81.725355] [] __do_softirq+0xe6/0x2d0
    [ 81.725358] [] irq_exit+0x96/0xc0
    [ 81.725360] [] smp_apic_timer_interrupt+0x4a/0x60
    [ 81.725363] [] apic_timer_interrupt+0x6f/0x80
    [ 81.725365] [] arch_cpu_idle+0x26/0x30
    [ 81.725368] [] cpu_startup_entry+0x88/0x2b0
    [ 81.725370] [] start_secondary+0x190/0x1f0
    [ 81.725373] INITIAL USE at:
    [ 81.725375] [] __lock_acquire+0x32a/0x1d70
    [ 81.725385] [] lock_acquire+0x97/0x130
    [ 81.725388] [] _raw_spin_lock+0x36/0x70
    [ 81.725390] [] xfrm_timer_handler+0x43/0x290
    [ 81.725394] [] __tasklet_hrtimer_trampoline+0x17/0x40
    [ 81.725398] [] tasklet_hi_action+0xd7/0xf0
    [ 81.725401] [] __do_softirq+0xe6/0x2d0
    [ 81.725404] [] irq_exit+0x96/0xc0
    [ 81.725407] [] smp_apic_timer_interrupt+0x4a/0x60
    [ 81.725409] [] apic_timer_interrupt+0x6f/0x80
    [ 81.725412] [] arch_cpu_idle+0x26/0x30
    [ 81.725415] [] cpu_startup_entry+0x88/0x2b0
    [ 81.725417] [] start_secondary+0x190/0x1f0
    [ 81.725420] }
    [ 81.725421] ... key at: [] __key.46349+0x0/0x8
    [ 81.725445] ... acquired at:
    [ 81.725446] [] lock_acquire+0x97/0x130
    [ 81.725449] [] _raw_spin_lock+0x36/0x70
    [ 81.725452] [] __xfrm_state_delete+0x37/0x140
    [ 81.725454] [] xfrm_state_delete+0x2c/0x50
    [ 81.725456] [] xfrm_state_flush+0xc7/0x1b0
    [ 81.725458] [] pfkey_flush+0x7c/0x100 [af_key]
    [ 81.725465] [] pfkey_process+0x1c7/0x1f0 [af_key]
    [ 81.725468] [] pfkey_sendmsg+0x159/0x260 [af_key]
    [ 81.725471] [] sock_sendmsg+0xaf/0xc0
    [ 81.725476] [] SYSC_sendto+0xfc/0x130
    [ 81.725479] [] SyS_sendto+0xe/0x10
    [ 81.725482] [] system_call_fastpath+0x16/0x1b
    [ 81.725484]
    [ 81.725486] -> (xfrm_state_lock){+.+...} ops: 11 {
    [ 81.725490] HARDIRQ-ON-W at:
    [ 81.725493] [] __lock_acquire+0x65b/0x1d70
    [ 81.725504] [] lock_acquire+0x97/0x130
    [ 81.725507] [] _raw_spin_lock_bh+0x3b/0x70
    [ 81.725510] [] xfrm_state_flush+0x2f/0x1b0
    [ 81.725513] [] pfkey_flush+0x7c/0x100 [af_key]
    [ 81.725516] [] pfkey_process+0x1c7/0x1f0 [af_key]
    [ 81.725519] [] pfkey_sendmsg+0x159/0x260 [af_key]
    [ 81.725522] [] sock_sendmsg+0xaf/0xc0
    [ 81.725525] [] SYSC_sendto+0xfc/0x130
    [ 81.725527] [] SyS_sendto+0xe/0x10
    [ 81.725530] [] system_call_fastpath+0x16/0x1b
    [ 81.725533] SOFTIRQ-ON-W at:
    [ 81.725534] [] __lock_acquire+0x68a/0x1d70
    [ 81.725537] [] lock_acquire+0x97/0x130
    [ 81.725539] [] _raw_spin_lock+0x36/0x70
    [ 81.725541] [] xfrm_stateonly_find+0x41/0x1f0
    [ 81.725544] [] mod_cur_headers+0x793/0x7f0 [pktgen]
    [ 81.725547] [] pktgen_thread_worker+0xd42/0x1880 [pktgen]
    [ 81.725550] [] kthread+0xe4/0x100
    [ 81.725555] [] ret_from_fork+0x7c/0xb0
    [ 81.725565] INITIAL USE at:
    [ 81.725567] [] __lock_acquire+0x32a/0x1d70
    [ 81.725569] [] lock_acquire+0x97/0x130
    [ 81.725572] [] _raw_spin_lock_bh+0x3b/0x70
    [ 81.725574] [] xfrm_state_flush+0x2f/0x1b0
    [ 81.725576] [] pfkey_flush+0x7c/0x100 [af_key]
    [ 81.725580] [] pfkey_process+0x1c7/0x1f0 [af_key]
    [ 81.725583] [] pfkey_sendmsg+0x159/0x260 [af_key]
    [ 81.725586] [] sock_sendmsg+0xaf/0xc0
    [ 81.725589] [] SYSC_sendto+0xfc/0x130
    [ 81.725594] [] SyS_sendto+0xe/0x10
    [ 81.725597] [] system_call_fastpath+0x16/0x1b
    [ 81.725599] }
    [ 81.725600] ... key at: [] xfrm_state_lock+0x18/0x50
    [ 81.725606] ... acquired at:
    [ 81.725607] [] check_usage_backwards+0x110/0x150
    [ 81.725609] [] mark_lock+0x196/0x2f0
    [ 81.725611] [] __lock_acquire+0x68a/0x1d70
    [ 81.725614] [] lock_acquire+0x97/0x130
    [ 81.725616] [] _raw_spin_lock+0x36/0x70
    [ 81.725627] [] xfrm_stateonly_find+0x41/0x1f0
    [ 81.725629] [] mod_cur_headers+0x793/0x7f0 [pktgen]
    [ 81.725632] [] pktgen_thread_worker+0xd42/0x1880 [pktgen]
    [ 81.725635] [] kthread+0xe4/0x100
    [ 81.725637] [] ret_from_fork+0x7c/0xb0
    [ 81.725640]
    [ 81.725641]
    [ 81.725641] stack backtrace:
    [ 81.725645] CPU: 0 PID: 2780 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #92
    [ 81.725647] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
    [ 81.725649] ffffffff82537b80 ffff880018199988 ffffffff8176af37 0000000000000007
    [ 81.725652] ffff8800181999f0 ffff8800181999d8 ffffffff81099358 ffffffff82537b80
    [ 81.725655] ffffffff81a32def ffff8800181999f4 0000000000000000 ffff880002cbeaa8
    [ 81.725659] Call Trace:
    [ 81.725664] [] dump_stack+0x46/0x58
    [ 81.725667] [] print_irq_inversion_bug.part.42+0x1e8/0x1f0
    [ 81.725670] [] check_usage_backwards+0x110/0x150
    [ 81.725672] [] mark_lock+0x196/0x2f0
    [ 81.725675] [] ? check_usage_forwards+0x150/0x150
    [ 81.725685] [] __lock_acquire+0x68a/0x1d70
    [ 81.725691] [] ? sched_clock_local+0x25/0x90
    [ 81.725694] [] ? sched_clock_cpu+0xa8/0x120
    [ 81.725697] [] ? __lock_acquire+0x32a/0x1d70
    [ 81.725699] [] ? xfrm_stateonly_find+0x41/0x1f0
    [ 81.725702] [] lock_acquire+0x97/0x130
    [ 81.725704] [] ? xfrm_stateonly_find+0x41/0x1f0
    [ 81.725707] [] ? sched_clock_local+0x25/0x90
    [ 81.725710] [] _raw_spin_lock+0x36/0x70
    [ 81.725712] [] ? xfrm_stateonly_find+0x41/0x1f0
    [ 81.725715] [] ? lock_release_holdtime.part.26+0x1c/0x1a0
    [ 81.725717] [] xfrm_stateonly_find+0x41/0x1f0
    [ 81.725721] [] mod_cur_headers+0x793/0x7f0 [pktgen]
    [ 81.725724] [] pktgen_thread_worker+0xd42/0x1880 [pktgen]
    [ 81.725727] [] ? pktgen_thread_worker+0xb11/0x1880 [pktgen]
    [ 81.725729] [] ? trace_hardirqs_on+0xd/0x10
    [ 81.725733] [] ? _raw_spin_unlock_irq+0x30/0x40
    [ 81.725745] [] ? e1000_clean+0x9d0/0x9d0
    [ 81.725751] [] ? __init_waitqueue_head+0x60/0x60
    [ 81.725753] [] ? __init_waitqueue_head+0x60/0x60
    [ 81.725757] [] ? mod_cur_headers+0x7f0/0x7f0 [pktgen]
    [ 81.725759] [] kthread+0xe4/0x100
    [ 81.725762] [] ? flush_kthread_worker+0x170/0x170
    [ 81.725765] [] ret_from_fork+0x7c/0xb0
    [ 81.725768] [] ? flush_kthread_worker+0x170/0x170

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
     

02 Jan, 2014

5 commits


16 Dec, 2013

2 commits

  • In order to check against valid IPcomp spi range, export verify_userspi_info
    for both pfkey and netlink interface.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
     
  • An IPComp connection between two hosts is broken if the given spi is
    bigger than 0xffff.

    OUTSPI=0x87
    INSPI=0x11112

    ip xfrm policy update dst 192.168.1.101 src 192.168.1.109 dir out action allow \
    tmpl dst 192.168.1.101 src 192.168.1.109 proto comp spi $OUTSPI
    ip xfrm policy update src 192.168.1.101 dst 192.168.1.109 dir in action allow \
    tmpl src 192.168.1.101 dst 192.168.1.109 proto comp spi $INSPI

    ip xfrm state add src 192.168.1.101 dst 192.168.1.109 proto comp spi $INSPI \
    comp deflate
    ip xfrm state add dst 192.168.1.101 src 192.168.1.109 proto comp spi $OUTSPI \
    comp deflate

    tcpdump can capture the outbound ping packet, but the inbound packet is
    dropped with XfrmOutNoStates errors. It looks like the spi value used
    for IPComp is expected to be only 16 bits wide.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
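
    A range check of the kind these two commits point at might look like the sketch below. `spi_range_ok` is a hypothetical helper, not the kernel's verify_userspi_info; the 0xffff limit reflects the 16-bit width the message observes for IPComp, and the protocol numbers are the standard IP protocol values for IPComp (108) and ESP (50).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PROTO_ESP  50  /* same value as IPPROTO_ESP */
#define SKETCH_PROTO_COMP 108 /* same value as IPPROTO_COMP */

/* Returns 1 if [min, max] is an acceptable spi range for the protocol. */
static int spi_range_ok(int proto, uint32_t min, uint32_t max)
{
    if (min > max)
        return 0;
    /* IPComp identifiers are only 16 bits wide. */
    if (proto == SKETCH_PROTO_COMP && max > 0xffff)
        return 0;
    return 1;
}
```

    With the values from the reproduction above, 0x87 passes for IPComp while 0x11112 is rejected up front instead of silently breaking the inbound direction.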
     

06 Dec, 2013

4 commits

  • We now queue packets to the policy if the states are not yet resolved;
    this replaces the ancient sleeping code. The sleeping could also cause
    indefinite task hangs if the needed state did not get resolved.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • By semantics, the xfrm layer is fully name space aware,
    and so should be its locks, e.g. xfrm_state/policy_lock.
    Ensuring exclusive access to the state/policy linked lists
    of different name spaces with one global lock is not
    right semantically in the first place, as the name spaces
    are mutually independent of each other. More seriously,
    it also causes a scalability problem.

    One practical scenario is an Open Network Stack where
    hundreds of lxc tenants act as routers within one host
    and a global xfrm_state/policy_lock becomes the
    bottleneck. Once those locks are decoupled in a
    per-namespace fashion, lock contention stays within the
    scope of a specific name space, without causing
    additional SPD/SAD access delay for other name spaces.

    This patch also improves scalability without
    changing the original xfrm behavior.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
     
  • because the home agent could well be run in a net namespace
    other than init_net. The original behavior
    could lead to inconsistent key info.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
     
  • xfrm code always searches for an unused policy index for
    a newly created policy, regardless of whether user
    space supplied a policy index hint.

    This patch honors the hint, so that
    "ip xfrm ... index=xxx" can be used to set
    a specific policy index.

    Currently this behavior is broken; this patch makes
    it work as expected.

    Signed-off-by: Fan Du
    Signed-off-by: Steffen Klassert

    Fan Du
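
    The intended behavior (use the hint when one is supplied, otherwise generate an index) reduces to a small decision, sketched below. `pick_policy_index` is a hypothetical helper; it treats 0 as "no hint" and, unlike a real implementation, does not check whether the requested index is already in use.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the user-supplied hint if present, else the next automatic index.
 * Assumption: a hint of 0 means "none supplied". Collisions are ignored. */
static uint32_t pick_policy_index(uint32_t hint, uint32_t *next_auto)
{
    if (hint != 0)
        return hint;
    return (*next_auto)++;
}
```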