05 May, 2015

1 commit

  • In an environment where the KDC is running Active Directory, the
    exported composite name field returned in the context could be large
    enough to span a page boundary. Attaching a scratch buffer to the
    decoding xdr_stream helps deal with those cases.

    The case where we saw this was actually due to behavior that's been
    fixed in newer gss-proxy versions, but we're fixing it here too.

    Signed-off-by: Scott Mayhew
    Cc: stable@vger.kernel.org
    Reviewed-by: Simo Sorce
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
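A minimal userspace sketch of the scratch-buffer idea, assuming a hypothetical `page_stream` decoder over fixed-size pages rather than the kernel's real `xdr_stream`. An item that fits in the current page is returned in place. An item that straddles a page boundary can only be returned contiguously if it is first copied into a caller-supplied scratch buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 8   /* tiny "page" so a short item can cross a boundary */

/* Hypothetical decoder over a list of page-sized buffers (not xdr_stream). */
struct page_stream {
    const unsigned char *const *pages;
    size_t npages;
    size_t page;               /* index of the current page */
    size_t off;                /* offset within the current page */
    unsigned char *scratch;    /* optional bounce buffer */
    size_t scratch_len;
};

static void advance(struct page_stream *s, size_t n)
{
    s->off += n;
    if (s->off == PAGE_SZ) {   /* step onto the next page */
        s->page++;
        s->off = 0;
    }
}

/* Return a pointer to nbytes of contiguous data, or NULL. An item that
 * straddles a page boundary is copied into the scratch buffer; without
 * scratch space it cannot be returned contiguously. */
static const unsigned char *stream_decode(struct page_stream *s, size_t nbytes)
{
    if (s->page >= s->npages)
        return NULL;
    if (nbytes > (s->npages - s->page) * PAGE_SZ - s->off)
        return NULL;                          /* not enough data left */
    if (nbytes <= PAGE_SZ - s->off) {         /* fits in the current page */
        const unsigned char *p = s->pages[s->page] + s->off;
        advance(s, nbytes);
        return p;
    }
    if (s->scratch == NULL || nbytes > s->scratch_len)
        return NULL;                          /* stream state left untouched */
    size_t copied = 0;
    while (copied < nbytes) {
        size_t chunk = PAGE_SZ - s->off;
        if (chunk > nbytes - copied)
            chunk = nbytes - copied;
        memcpy(s->scratch + copied, s->pages[s->page] + s->off, chunk);
        copied += chunk;
        advance(s, chunk);
    }
    assert(copied == nbytes);
    return s->scratch;
}
```

The design choice mirrored here is that the scratch buffer is attached to the stream once, so every decode call transparently handles the page-spanning case instead of each caller special-casing it.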
     

02 May, 2015

1 commit


30 Apr, 2015

7 commits

  • eeprom-length is a switch property, not a dsa property, and thus
    needs to be attached to the switch node, not to the dsa node.

    Reported-by: Andrew Lunn
    Fixes: 6793abb4e849 ("net: dsa: Add support for switch EEPROM access")
    Signed-off-by: Guenter Roeck
    Acked-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • Currently, we try to accumulate arrived packets in the link's
    'deferred' queue during the parallel link synchronization phase.

    This entails two problems:

    - With an unlucky combination of arriving packets the algorithm
    may go into a lockstep with the out-of-sequence handling function,
    where the synch mechanism is adding a packet to the deferred queue,
    while the out-of-sequence handling is retrieving it again, thus
    ending up in a loop inside the node_lock scope.

    - Even if this is avoided, the link will very often send out
    unnecessary protocol messages, in the worst case leading to
    redundant retransmissions.

    We fix this by just dropping arriving packets on the upcoming link
    during the synchronization phase, thus relying on the retransmission
    protocol to resolve the situation once the two links have reached
    a synchronized state.

    Reviewed-by: Erik Hugne
    Reviewed-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     
  • NLM_F_MULTI must be used only when an NLMSG_DONE message is sent, and
    NLMSG_DONE is sent only at the end of a dump.

    If NLM_F_MULTI is set without a trailing NLMSG_DONE, libraries like
    libnl will wait forever for NLMSG_DONE.

    Fixes: 35b9dd7607f0 ("tipc: add bearer get/dump to new netlink api")
    Fixes: 7be57fc69184 ("tipc: add link get/dump to new netlink api")
    Fixes: 46f15c6794fb ("tipc: add media get/dump to new netlink api")
    CC: Richard Alpe
    CC: Jon Maloy
    CC: Ying Xue
    CC: tipc-discussion@lists.sourceforge.net
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
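A userspace sketch of why a stray NLM_F_MULTI hangs the receiver, assuming a simplified model of how libnl-style code decides whether to call recv() again. `parse_batch()` and `put_msg()` are hypothetical helpers; `struct nlmsghdr`, `NLM_F_MULTI`, `NLMSG_DONE` and the `NLMSG_*` macros come from `<linux/netlink.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

enum rx_result { RX_DONE, RX_NEED_MORE };

/* Decide whether the receiver is finished after this buffer. */
static enum rx_result parse_batch(const void *buf, size_t len)
{
    const struct nlmsghdr *nh = buf;
    int remaining = (int)len;
    bool multi = false;

    for (; NLMSG_OK(nh, remaining); nh = NLMSG_NEXT(nh, remaining)) {
        if (nh->nlmsg_type == NLMSG_DONE)
            return RX_DONE;            /* end of a multipart dump */
        if (nh->nlmsg_flags & NLM_F_MULTI)
            multi = true;              /* sender promised more parts */
        else
            return RX_DONE;            /* ordinary single-part reply */
    }
    /* Every message said "more follows" but NLMSG_DONE never came. */
    return multi ? RX_NEED_MORE : RX_DONE;
}

/* Append an empty netlink message at offset off; returns the new end. */
static size_t put_msg(unsigned char *buf, size_t off,
                      unsigned short type, unsigned short flags)
{
    struct nlmsghdr nh;
    memset(&nh, 0, sizeof nh);
    nh.nlmsg_len = NLMSG_LENGTH(0);
    nh.nlmsg_type = type;
    nh.nlmsg_flags = flags;
    memcpy(buf + off, &nh, sizeof nh);
    return off + NLMSG_ALIGN(nh.nlmsg_len);
}
```

A get (non-dump) reply that sets NLM_F_MULTI therefore leaves the receiver in `RX_NEED_MORE` forever, which is the hang the patches fix by clearing the flag outside dumps.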
     
  • NLM_F_MULTI must be used only when an NLMSG_DONE message is sent, and
    NLMSG_DONE is sent only at the end of a dump.

    If NLM_F_MULTI is set without a trailing NLMSG_DONE, libraries like
    libnl will wait forever for NLMSG_DONE.

    Fixes: e5a55a898720 ("net: create generic bridge ops")
    Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf")
    CC: John Fastabend
    CC: Sathya Perla
    CC: Subbu Seetharaman
    CC: Ajit Khaparde
    CC: Jeff Kirsher
    CC: intel-wired-lan@lists.osuosl.org
    CC: Jiri Pirko
    CC: Scott Feldman
    CC: Stephen Hemminger
    CC: bridge@lists.linux-foundation.org
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • NLM_F_MULTI must be used only when an NLMSG_DONE message is sent, and
    NLMSG_DONE is sent only at the end of a dump.

    If NLM_F_MULTI is set without a trailing NLMSG_DONE, libraries like
    libnl will wait forever for NLMSG_DONE.

    Fixes: 37a393bc4932 ("bridge: notify mdb changes via netlink")
    CC: Cong Wang
    CC: Stephen Hemminger
    CC: bridge@lists.linux-foundation.org
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • This action is meant to be passive, i.e. we should not alter
    skb->nfct: If nfct is present just leave it alone.

    Compile tested only.

    Cc: Jamal Hadi Salim
    Signed-off-by: Florian Westphal
    Acked-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • The commit 3cdaa5be9e81a914e633a6be7b7d2ef75b528562 ("ipv4: Don't
    increase PMTU with Datagram Too Big message") broke PMTU in cases
    where the rt_pmtu value has expired but is smaller than the new
    PMTU value.

    This obsolete rt_pmtu then prevents the new PMTU value from being
    installed.

    Fixes: 3cdaa5be9e81 ("ipv4: Don't increase PMTU with Datagram Too Big message")
    Reported-by: Gerd v. Egidy
    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
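A sketch of the intended update rule, assuming a hypothetical `route_cache` with a learned PMTU and an expiry time (not the kernel's `struct rtable`). A still-valid PMTU must not be raised by an ICMP report, but an expired one must not block a new value either; the regression described above amounts to leaving out that expiry test.

```c
#include <assert.h>

/* Hypothetical cached route entry. */
struct route_cache {
    unsigned int pmtu;      /* learned PMTU, 0 if none */
    unsigned long expires;  /* time after which pmtu is stale */
};

static int pmtu_still_valid(const struct route_cache *rt, unsigned long now)
{
    return rt->pmtu != 0 && now < rt->expires;   /* time wraparound ignored */
}

/* Apply a "Datagram Too Big" report carrying new_mtu. */
static void update_pmtu(struct route_cache *rt, unsigned int new_mtu,
                        unsigned long now, unsigned long timeout)
{
    assert(new_mtu > 0);
    /* Refuse to raise a PMTU that is still valid... */
    if (pmtu_still_valid(rt, now) && new_mtu > rt->pmtu)
        return;
    /* ...but a stale value must not prevent installing the new one. */
    rt->pmtu = new_mtu;
    rt->expires = now + timeout;
}
```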
     

28 Apr, 2015

3 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree,
    they are:

    1) Fix a crash in nf_tables when dictionaries are used from the ruleset,
    due to memory corruption, from Florian Westphal.

    2) Fix another crash in nf_queue when used with br_netfilter. Also from
    Florian.

    Both fixes are related to new stuff that got in 4.0-rc.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull networking fixes from David Miller:

    1) mlx4 doesn't check fully for supported valid RSS hash function, fix
    from Amir Vadai

    2) Off by one in ibmveth_change_mtu(), from David Gibson

    3) Prevent altera chip from reporting false error interrupts in some
    circumstances, from Chee Nouk Phoon

    4) Get rid of that stupid endless loop trying to allocate a FIN packet
    in TCP, and in the process kill deadlocks. From Eric Dumazet

    5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from
    Eric Dumazet

    6) Fix two bugs in async rhashtable resizing, from Thomas Graf

    7) Fix topology server listener socket namespace bug in TIPC, from Ying
    Xue

    8) Add some missing HAS_DMA kconfig dependencies, from Geert
    Uytterhoeven

    9) bgmac driver intends to force re-polling but does so by returning
    the wrong value from its ->poll() handler. Fix from Rafał Miłecki

    10) When the creator of an rhashtable configures a max size for it,
    don't bark in the logs and drop insertions when that is exceeded.
    Fix from Johannes Berg

    11) Recover from out of order packets in ppp mppe properly, from Sylvain
    Rochet

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
    bnx2x: really disable TPA if 'disable_tpa' option is set
    net:treewide: Fix typo in drivers/net
    net/mlx4_en: Prevent setting invalid RSS hash function
    mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions
    netfilter; Add some missing default cases to switch statements in nft_reject.
    ppp: mppe: discard late packet in stateless mode
    ppp: mppe: sanity error path rework
    net/bonding: Make DRV macros private
    net: rfs: fix crash in get_rps_cpus()
    altera tse: add support for fixed-links.
    pxa168: fix double deallocation of managed resources
    net: fix crash in build_skb()
    net: eth: altera: Resolve false errors from MSGDMA to TSE
    ehea: Fix memory hook reference counting crashes
    net/tg3: Release IRQs on permanent error
    net: mdio-gpio: support access that may sleep
    inet: fix possible panic in reqsk_queue_unlink()
    rhashtable: don't attempt to grow when at max_size
    bgmac: fix requests for extra polling calls from NAPI
    tcp: avoid looping in tcp_send_fin()
    ...

    Linus Torvalds
     
  • This fixes:

    ====================
    net/netfilter/nft_reject.c: In function ‘nft_reject_dump’:
    net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswitch]
    switch (priv->type) {
    ^
    net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_ICMPX_UNREACH’ not handled in switch [-Wswitch]
    net/netfilter/nft_reject_inet.c: In function ‘nft_reject_inet_dump’:
    net/netfilter/nft_reject_inet.c:105:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswitch]
    switch (priv->type) {
    ^
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
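A minimal illustration of the warning class, using a hypothetical enum shaped like the reject types rather than the actual nft_reject source. GCC's -Wswitch (part of -Wall) fires when a switch on an enum lacks a case for some enumerator and has no default label; adding a default silences it and states the intent.

```c
#include <assert.h>

/* Hypothetical enum mirroring the shape of the reject types. */
enum reject_type {
    REJECT_ICMP_UNREACH,
    REJECT_TCP_RST,
    REJECT_ICMPX_UNREACH,
};

/* Without the default label, -Wswitch would warn about the two
 * unhandled enumerators, exactly like the output quoted above. */
static int dump_code(enum reject_type type, int icmp_code)
{
    assert(type <= REJECT_ICMPX_UNREACH);
    switch (type) {
    case REJECT_ICMP_UNREACH:
        return icmp_code;   /* only this type dumps a code in this sketch */
    default:
        break;              /* other types: nothing extra to dump here */
    }
    return -1;
}
```

Note the distinction: -Wswitch is satisfied by a default label, whereas the stricter -Wswitch-enum still warns about unlisted enumerators even when a default is present.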
     

27 Apr, 2015

3 commits

  • Pull NFS client updates from Trond Myklebust:
    "Another set of mainly bugfixes and a couple of cleanups. No new
    functionality in this round.

    Highlights include:

    Stable patches:
    - Fix a regression in /proc/self/mountstats
    - Fix the pNFS flexfiles O_DIRECT support
    - Fix high load average due to callback thread sleeping

    Bugfixes:
    - Various patches to fix the pNFS layoutcommit support
    - Do not cache pNFS deviceids unless server notifications are enabled
    - Fix a SUNRPC transport reconnection regression
    - make debugfs file creation failure non-fatal in SUNRPC
    - Another fix for circular directory warnings on NFSv4 "junctioned"
    mountpoints
    - Fix locking around NFSv4.2 fallocate() support
    - Truncating NFSv4 file opens should also sync O_DIRECT writes
    - Prevent infinite loop in rpcrdma_ep_create()

    Features:
    - Various improvements to the RDMA transport code's handling of
    memory registration
    - Various code cleanups"

    * tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
    fs/nfs: fix new compiler warning about boolean in switch
    nfs: Remove unneeded casts in nfs
    NFS: Don't attempt to decode missing directory entries
    Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
    NFS: Rename idmap.c to nfs4idmap.c
    NFS: Move nfs_idmap.h into fs/nfs/
    NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
    NFS: Add a stub for GETDEVICELIST
    nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
    nfs: fix DIO good bytes calculation
    nfs: Fetch MOUNTED_ON_FILEID when updating an inode
    sunrpc: make debugfs file creation failure non-fatal
    nfs: fix high load average due to callback thread sleeping
    NFS: Reduce time spent holding the i_mutex during fallocate()
    NFS: Don't zap caches on fallocate()
    xprtrdma: Make rpcrdma_{un}map_one() into inline functions
    xprtrdma: Handle non-SEND completions via a callout
    xprtrdma: Add "open" memreg op
    xprtrdma: Add "destroy MRs" memreg op
    xprtrdma: Add "reset MRs" memreg op
    ...

    Linus Torvalds
     
  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     
  • Commit 567e4b79731c ("net: rfs: add hash collision detection") had one
    mistake:

    RPS_NO_CPU is no longer the marker for an invalid CPU in set_rps_cpu()
    and get_rps_cpu(), as @next_cpu was the result of an AND with
    rps_cpu_mask.

    This bug showed up on a host with 72 CPUs:
    next_cpu was 0x7f, and the code was trying to access percpu data of a
    nonexistent CPU.

    In a follow-up patch, we might get rid of compares against nr_cpu_ids
    if we init the tables with 0. It is silly to test for a very unlikely
    condition that exists only shortly after table initialization, as
    we got rid of rps_reset_sock_flow() and similar functions that were
    writing this RPS_NO_CPU magic value at flow dismantle: when a table is
    old enough, it never contains this value anymore.

    Fixes: 567e4b79731c ("net: rfs: add hash collision detection")
    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Cc: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
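The arithmetic behind the 72-CPU report, as a small sketch. It assumes `RPS_NO_CPU` is 0xffff and that the mask is the next power of two at or above nr_cpu_ids, minus one; the helper names are hypothetical. Masking the marker turns it into 0x7f, which no longer equals the marker but is still not a valid CPU number, so only a range check against nr_cpu_ids catches it.

```c
#include <assert.h>
#include <stdbool.h>

#define RPS_NO_CPU 0xffff   /* assumed value of the old "no CPU" marker */

/* Smallest power of two >= nr_cpu_ids, minus one. */
static unsigned int cpu_mask_for(unsigned int nr_cpu_ids)
{
    assert(nr_cpu_ids >= 1);
    unsigned int p = 1;
    while (p < nr_cpu_ids)
        p <<= 1;
    return p - 1;
}

/* Buggy check: compares a masked value against an unmasked marker. */
static bool cpu_valid_buggy(unsigned int next_cpu)
{
    return next_cpu != RPS_NO_CPU;
}

/* Fixed check: anything outside [0, nr_cpu_ids) is invalid. */
static bool cpu_valid(unsigned int next_cpu, unsigned int nr_cpu_ids)
{
    return next_cpu < nr_cpu_ids;
}
```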
     

26 Apr, 2015

1 commit

  • When I added pfmemalloc support in build_skb(), I forgot netlink
    was using build_skb() with a vmalloc() area.

    In this patch I introduce __build_skb() for netlink use,
    and build_skb() is a wrapper handling both skb->head_frag and
    skb->pfmemalloc.

    This means netlink no longer has to hack skb->head_frag.

    [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26!
    [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
    [ 1567.700067] Dumping ftrace buffer:
    [ 1567.700067] (ftrace buffer empty)
    [ 1567.700067] Modules linked in:
    [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167
    [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000
    [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3))
    [ 1567.700067] RSP: 0018:ffff8802467779d8 EFLAGS: 00010202
    [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c
    [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049
    [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000
    [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000
    [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000
    [ 1567.700067] FS: 00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000
    [ 1567.700067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0
    [ 1567.700067] Stack:
    [ 1567.700067] ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000
    [ 1567.700067] ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08
    [ 1567.700067] ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821
    [ 1567.700067] Call Trace:
    [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316)
    [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329)
    [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311)
    [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
    [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
    [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623)
    [ 1567.774369] sock_write_iter (net/socket.c:823)
    [ 1567.774369] ? sock_sendmsg (net/socket.c:806)
    [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491)
    [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249)
    [ 1567.774369] ? default_llseek (fs/read_write.c:487)
    [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701)
    [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4))
    [ 1567.774369] vfs_write (fs/read_write.c:539)
    [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577)
    [ 1567.774369] ? SyS_read (fs/read_write.c:577)
    [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
    [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636)
    [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
    [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261)

    Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
    Signed-off-by: Eric Dumazet
    Reported-by: Sasha Levin
    Signed-off-by: David S. Miller

    Eric Dumazet
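A sketch of the split described above, using hypothetical `fake_skb` types instead of `struct sk_buff`. The core constructor never inspects the backing page, so it is safe for vmalloc()-style memory; only the wrapper for page-fragment buffers marks head_frag and consults the page's pfmemalloc status, modeled here as a plain boolean argument. In the trace above, looking up the page for a vmalloc address is what ends in `__phys_addr`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
    void *head;
    unsigned int frag_size;
    bool head_frag;    /* head lives in a page fragment */
    bool pfmemalloc;   /* head came from emergency reserves */
};

/* Core constructor (the __build_skb() role): no page metadata touched. */
static struct fake_skb *fake_build_skb_core(void *data, unsigned int frag_size)
{
    assert(data != NULL);
    struct fake_skb *skb = calloc(1, sizeof(*skb));
    if (!skb)
        return NULL;
    skb->head = data;
    skb->frag_size = frag_size;
    return skb;
}

/* Wrapper (the build_skb() role) for page-fragment heads only. */
static struct fake_skb *fake_build_skb(void *data, unsigned int frag_size,
                                       bool page_is_pfmemalloc)
{
    struct fake_skb *skb = fake_build_skb_core(data, frag_size);
    if (skb && frag_size) {
        skb->head_frag = true;
        skb->pfmemalloc = page_is_pfmemalloc;
    }
    return skb;
}
```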
     

25 Apr, 2015

1 commit


24 Apr, 2015

6 commits

  • [ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at
    0000000000000080
    [ 3897.931025] IP: [] reqsk_timer_handler+0x1a6/0x243

    There is a race when reqsk_timer_handler() and tcp_check_req() call
    inet_csk_reqsk_queue_unlink() on the same req at the same time.

    Before commit fa76ce7328b2 ("inet: get rid of central tcp/dccp listener
    timer"), listener spinlock was held and race could not happen.

    To solve this bug, we change reqsk_queue_unlink() to not assume req
    must be found, and we return a status, to conditionally release a
    refcount on the request sock.

    This also means tcp_check_req() in the non-fastopen case may or may not
    consume a req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have
    to handle this properly.

    (Same remark for dccp_check_req() and its callers)

    inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is
    called 4 times in tcp and 3 times in dccp.

    Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
    Signed-off-by: Eric Dumazet
    Reported-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
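A sequential sketch of the fix's contract, assuming a hypothetical request type and queue (in the kernel the unlink itself runs under the queue lock). Unlink no longer assumes the request is present; it reports whether this caller removed it, and only that caller releases the queue's reference, so two racing paths cannot both drop it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request sock with a refcount. */
struct req {
    int refcnt;
    struct req *next;
};

struct req_queue {
    struct req *head;   /* real code protects this with a lock */
};

/* Unlink req if it is still queued; report whether this caller did it. */
static bool queue_unlink(struct req_queue *q, struct req *req)
{
    for (struct req **pp = &q->head; *pp; pp = &(*pp)->next) {
        if (*pp == req) {
            *pp = req->next;
            req->next = NULL;
            return true;
        }
    }
    return false;   /* the other path got there first */
}

/* Only the caller that actually unlinked releases the queue's ref. */
static void queue_drop(struct req_queue *q, struct req *req)
{
    if (queue_unlink(q, req)) {
        assert(req->refcnt > 0);
        req->refcnt--;
    }
}
```

For example, if the timer handler and tcp_check_req() both call `queue_drop()` on the same request, only the first call changes the refcount.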
     
  • Presence of an unbound loop in tcp_send_fin() had always been hard
    to explain when analyzing crash dumps involving gigantic dying processes
    with millions of sockets.

    Let's try a different strategy:

    In case of memory pressure, try to add the FIN flag to the last packet
    in the write queue, even if that packet was already sent. The TCP stack
    will be able to deliver this FIN after a timeout event. Note that since
    this FIN is delivered by a retransmit, it also carries a Push flag
    given our current implementation.

    By checking sk_under_memory_pressure(), we anticipate that cooking
    many FIN packets might deplete tcp memory.

    In case we could not allocate a packet, even with a __GFP_WAIT
    allocation, then not sending a FIN seems quite reasonable if it allows
    us to get rid of this socket, free memory, and not block the process
    from eventually doing other useful work.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
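A simplified sketch of the strategy, covering only the memory-pressure branch the message describes and using hypothetical types rather than kernel APIs. Allocation is passed in as a function pointer so the failure path can be exercised; `alloc_seg_fail()` exists purely for that demonstration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified write-queue segment and socket. */
struct seg {
    bool fin;
    struct seg *next;
};

struct sock_model {
    struct seg *tail;            /* last segment in the write queue, or NULL */
    bool under_memory_pressure;
};

enum fin_result { FIN_PIGGYBACKED, FIN_NEW_SEGMENT, FIN_NOT_SENT };

static struct seg *alloc_seg_heap(void) { return calloc(1, sizeof(struct seg)); }
static struct seg *alloc_seg_fail(void) { return NULL; }   /* demo only */

static enum fin_result send_fin(struct sock_model *sk, struct seg *(*alloc_seg)(void))
{
    assert(sk != NULL);
    /* Under pressure, mark FIN on the tail even if it was already sent;
     * a later retransmit of that segment carries the FIN. */
    if (sk->tail && sk->under_memory_pressure) {
        sk->tail->fin = true;
        return FIN_PIGGYBACKED;
    }
    struct seg *s = alloc_seg();
    if (!s)
        return FIN_NOT_SENT;     /* no unbounded retry loop */
    s->fin = true;
    s->next = NULL;
    if (sk->tail)
        sk->tail->next = s;
    sk->tail = s;
    return FIN_NEW_SEGMENT;
}
```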
     
  • NFS: NFSoRDMA Client Changes

    This patch series creates an operation vector for each of the different
    memory registration modes. This should make it easier to increase the
    credit limit, rsize, and wsize one day.

    Signed-off-by: Anna Schumaker

    Trond Myklebust
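A generic sketch of the "operation vector per mode" pattern, with hypothetical mode names, members and illustrative return values (the real xprtrdma structure may differ). The transport selects a vector once and then calls through it, instead of switching on the registration mode at every call site.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-mode operation vector. */
struct memreg_ops {
    const char *name;
    int (*map)(unsigned int npages);   /* segments used for npages */
    size_t (*max_pages)(void);
};

/* Illustrative values only, not real hardware or protocol limits. */
static int single_mr_map(unsigned int npages) { return npages ? 1 : 0; }
static size_t single_mr_max_pages(void) { return 64; }
static int per_page_map(unsigned int npages) { return (int)npages; }
static size_t per_page_max_pages(void) { return 16; }

static const struct memreg_ops single_mr_ops = {
    "single_mr", single_mr_map, single_mr_max_pages };
static const struct memreg_ops per_page_ops = {
    "per_page", per_page_map, per_page_max_pages };

/* Pick the vector once, e.g. at transport setup. */
static const struct memreg_ops *select_ops(const char *mode)
{
    assert(mode != NULL);
    if (strcmp(mode, "single_mr") == 0)
        return &single_mr_ops;
    if (strcmp(mode, "per_page") == 0)
        return &per_page_ops;
    return NULL;
}
```

Limits such as the maximum pages per request then come from one place per mode, which is the property the series relies on when it mentions raising rsize and wsize.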
     
  • * bugfixes:
    NFSv4: Return delegations synchronously in evict_inode
    SUNRPC: Fix a regression when reconnecting
    NFS: remount with security change should return EINVAL
    nfs: do not export discarded symbols
    NFSv4.1: don't export static symbol

    Trond Myklebust
     
  • v2: gracefully handle the case where some dentry pointers end up NULL
    and be more diligent about zeroing out dentry pointers

    We currently have a problem in that SELinux policy is being enforced when
    creating debugfs files. If a debugfs file is created as a side effect of
    doing some syscall, then that creation can fail if the SELinux policy
    for that process prevents it.

    This seems wrong. We don't do that for files under /proc, for instance,
    so Bruce has proposed a patch to fix that.

    While discussing that patch however, Greg K.H. stated:

    "No kernel code should care / fail if a debugfs function fails, so
    please fix up the sunrpc code first."

    This patch converts all of the sunrpc debugfs setup code to be void
    return functions, and changes the callers to not look for errors from
    those functions.

    This should allow rpc_clnt and rpc_xprt creation to work, even if the
    kernel fails to create debugfs files for some reason.

    Cc: Greg Kroah-Hartman
    Acked-by: "J. Bruce Fields"
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
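A sketch of the resulting shape, assuming a hypothetical client object and a stand-in for the debugfs create call (no real debugfs API is used). Registration returns void, a failure simply leaves a NULL entry, teardown copes with NULL, and client creation fails only for real allocation errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical client with an optional debug entry. */
struct client {
    int id;
    void *debug_entry;   /* NULL if the debug file could not be created */
};

/* Stand-in for a create call that may be refused (e.g. by policy). */
static void *create_debug_entry(bool allowed)
{
    return allowed ? malloc(1) : NULL;
}

/* void: callers have nothing to check. */
static void client_debugfs_register(struct client *clnt, bool allowed)
{
    clnt->debug_entry = create_debug_entry(allowed);
}

static void client_debugfs_unregister(struct client *clnt)
{
    assert(clnt != NULL);
    free(clnt->debug_entry);    /* free(NULL) is a no-op */
    clnt->debug_entry = NULL;   /* zero the stale pointer */
}

static struct client *client_create(int id, bool debug_allowed)
{
    struct client *clnt = calloc(1, sizeof(*clnt));
    if (!clnt)
        return NULL;            /* genuine failures still fail creation */
    clnt->id = id;
    client_debugfs_register(clnt, debug_allowed);
    return clnt;
}
```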
     
  • fixed several comment and whitespace style issues

    Signed-off-by: Jason Eastman
    Signed-off-by: David S. Miller

    Jason Eastman
     

23 Apr, 2015

10 commits

  • When link statistics are dumped over netlink, we iterate over
    the list of peer nodes and append each link's statistics to
    the netlink msg. In the case where the dump is resumed after
    filling up a nlmsg, the node refcnt is decremented without
    having been incremented previously, which may cause the node
    to be freed. When this happens, the following
    info/stacktrace will be generated, followed by a crash or
    undefined behavior.
    We fix this by removing the erroneous call to tipc_node_put
    inside the loop that iterates over nodes.

    [ 384.312303] INFO: trying to register non-static key.
    [ 384.313110] the code is fine but needs lockdep annotation.
    [ 384.313290] turning off the locking correctness validator.
    [ 384.313290] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.0+ #13
    [ 384.313290] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 384.313290] ffff88003c6d0290 ffff88003cc03ca8 ffffffff8170adf1 0000000000000007
    [ 384.313290] ffffffff82728730 ffff88003cc03d38 ffffffff810a6a6d 00000000001d7200
    [ 384.313290] ffff88003c6d0ab0 ffff88003cc03ce8 0000000000000285 0000000000000001
    [ 384.313290] Call Trace:
    [ 384.313290] [] dump_stack+0x4c/0x65
    [ 384.313290] [] __lock_acquire+0xf3d/0xf50
    [ 384.313290] [] lock_acquire+0xd5/0x290
    [ 384.313290] [] ? link_timeout+0x1c/0x170 [tipc]
    [ 384.313290] [] ? link_state_event+0x4e0/0x4e0 [tipc]
    [ 384.313290] [] _raw_spin_lock_bh+0x40/0x80
    [ 384.313290] [] ? link_timeout+0x1c/0x170 [tipc]
    [ 384.313290] [] link_timeout+0x1c/0x170 [tipc]
    [ 384.313290] [] call_timer_fn+0xb8/0x490
    [ 384.313290] [] ? process_timeout+0x10/0x10
    [ 384.313290] [] run_timer_softirq+0x21c/0x420
    [ 384.313290] [] ? link_state_event+0x4e0/0x4e0 [tipc]
    [ 384.313290] [] __do_softirq+0xf4/0x630
    [ 384.313290] [] irq_exit+0x5d/0x60
    [ 384.313290] [] smp_apic_timer_interrupt+0x41/0x50
    [ 384.313290] [] apic_timer_interrupt+0x70/0x80
    [ 384.313290] [] ? default_idle+0x20/0x210
    [ 384.313290] [] ? default_idle+0x1e/0x210
    [ 384.313290] [] arch_cpu_idle+0xa/0x10
    [ 384.313290] [] cpu_startup_entry+0x2c3/0x530
    [ 384.313290] [] ? clockevents_register_device+0x113/0x200
    [ 384.313290] [] start_secondary+0x13f/0x170

    Fixes: 8a0f6ebe8494 ("tipc: involve reference counter for node structure")
    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne
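A sketch of balanced reference handling in a resumable dump, using a hypothetical node array rather than the TIPC node list. The lookup of the resume point takes one reference and releases exactly one; the iteration loop takes none, so it must not put any.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted node. */
struct node {
    unsigned int addr;
    int refcnt;
};

static struct node *node_find_get(struct node *nodes, size_t n, unsigned int addr)
{
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].addr == addr) {
            nodes[i].refcnt++;
            return &nodes[i];
        }
    }
    return NULL;
}

static void node_put(struct node *node)
{
    assert(node->refcnt > 0);
    node->refcnt--;
}

/* Dump up to budget nodes after prev_addr (0 = from the start).
 * Returns the last address dumped, for the next resume. */
static unsigned int dump_nodes(struct node *nodes, size_t n,
                               unsigned int prev_addr, size_t budget)
{
    size_t start = 0;
    if (prev_addr) {
        struct node *prev = node_find_get(nodes, n, prev_addr);
        if (!prev)
            return 0;
        start = (size_t)(prev - nodes) + 1;
        node_put(prev);   /* balances the get above */
    }
    unsigned int last = prev_addr;
    for (size_t i = start; i < n && budget > 0; i++, budget--) {
        /* Append nodes[i] statistics here; no put: no get was taken. */
        last = nodes[i].addr;
    }
    return last;
}
```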
     
  • In the function tipc_sk_rcv(), the stack variable 'err'
    is only initialized to TIPC_ERR_NO_PORT for the first
    iteration over the link input queue. If a chain of messages
    is received from a link, failure to look up the socket for
    any but the first message will cause the message to bounce back
    out on a random link.
    We fix this by properly initializing err.

    Signed-off-by: Erik Hugne
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne
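A reduced illustration of the bug class, with hypothetical error codes and a stand-in lookup. Initializing the error once before the loop lets a value from an earlier message leak into later iterations; resetting it for each message keeps every result independent.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_OK      0
#define ERR_NO_PORT 1   /* hypothetical stand-in for TIPC_ERR_NO_PORT */

/* Hypothetical socket lookup: ports 1..9 exist. */
static int deliver(unsigned int port)
{
    return (port >= 1 && port <= 9) ? ERR_OK : ERR_NO_PORT;
}

/* Buggy shape: err is initialized only once. */
static void sk_rcv_chain_buggy(const unsigned int *ports, size_t n, int *results)
{
    int err = ERR_NO_PORT;
    for (size_t i = 0; i < n; i++) {
        if (deliver(ports[i]) == ERR_OK)
            err = ERR_OK;
        results[i] = err;   /* stale ERR_OK after the first success */
    }
}

/* Fixed shape: err is reset for every message. */
static void sk_rcv_chain(const unsigned int *ports, size_t n, int *results)
{
    assert(n == 0 || (ports != NULL && results != NULL));
    for (size_t i = 0; i < n; i++) {
        int err = ERR_NO_PORT;
        if (deliver(ports[i]) == ERR_OK)
            err = ERR_OK;
        results[i] = err;
    }
}
```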
     
  • When a new topology server is launched in a new namespace, its
    listening socket is inserted into the "init ns" namespace's socket
    hash table rather than the one owned by the new namespace. Although
    the socket's namespace is forcibly changed to the new namespace later,
    the socket is still stored in the socket hash table of the "init ns"
    namespace. When a client created in the new namespace connects to
    its own topology server, the connection fails because the server's
    socket cannot be found in its own namespace's socket table.

    If __sock_create() is used instead of the original sock_create_kern()
    to create the server's socket, specifying the expected namespace,
    the socket will be inserted into the specified namespace's socket
    table, thereby avoiding the broken topology server issue.

    Fixes: 76100a8a64bc ("tipc: fix netns refcnt leak")

    Reported-by: Erik Hugne
    Signed-off-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Ying Xue
     
  • build_skb() should look at the page pfmemalloc status.
    If set, this means page allocator allocated this page in the
    expectation it would help to free other pages. Networking
    stack can do that only if skb->pfmemalloc is also set.

    Also, we must refrain from using high order pages from the pfmemalloc
    reserve, so __page_frag_refill() must also use __GFP_NOMEMALLOC for
    them. Under memory pressure, using order-0 pages is probably the best
    strategy.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The code there just open-codes the same, so use the provided macro instead.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Pull Ceph updates from Sage Weil:
    "This time around we have a collection of CephFS fixes from Zheng
    around MDS failure handling and snapshots, support for a new CRUSH
    straw2 algorithm (to sync up with userspace) and several RBD cleanups
    and fixes from Ilya, an error path leak fix from Taesoo, and then an
    assorted collection of cleanups from others"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits)
    rbd: rbd_wq comment is obsolete
    libceph: announce support for straw2 buckets
    crush: straw2 bucket type with an efficient 64-bit crush_ln()
    crush: ensuring at most num-rep osds are selected
    crush: drop unnecessary include from mapper.c
    ceph: fix uninline data function
    ceph: rename snapshot support
    ceph: fix null pointer dereference in send_mds_reconnect()
    ceph: hold on to exclusive caps on complete directories
    libceph: simplify our debugfs attr macro
    ceph: show non-default options only
    libceph: expose client options through debugfs
    libceph, ceph: split ceph_show_options()
    rbd: mark block queue as non-rotational
    libceph: don't overwrite specific con error msgs
    ceph: cleanup unsafe requests when reconnecting is denied
    ceph: don't zero i_wrbuffer_ref when reconnecting is denied
    ceph: don't mark dirty caps when there is no auth cap
    ceph: keep i_snap_realm while there are writers
    libceph: osdmap.h: Add missing format newlines
    ...

    Linus Torvalds
     
  • The reserved implicit-NULL label isn't allowed to appear in the label
    stack for packets, so make it an error for the control plane to
    specify it as an outgoing label.

    Suggested-by: "Eric W. Biederman"
    Signed-off-by: Robert Shearman
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Robert Shearman
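A minimal validation sketch with hypothetical names. It relies on two facts from the MPLS label format (RFC 3032): labels are 20 bits wide, and label value 3 is the reserved Implicit NULL label, which may be signalled but never appears in a label stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LABEL_IMPLICIT_NULL 3u                  /* reserved (RFC 3032) */
#define LABEL_MAX           ((1u << 20) - 1)    /* labels are 20 bits */

/* Validate an outgoing label supplied by the control plane. */
static bool out_label_ok(uint32_t label)
{
    if (label > LABEL_MAX)
        return false;
    /* Implicit NULL is never carried in a packet's label stack. */
    return label != LABEL_IMPLICIT_NULL;
}
```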
     
  • An MPLS network is a single trust domain where the edges must be in
    control of what labels make their way into the core. The simplest way
    of ensuring this is for the edge device to always impose the labels,
    and not allow forwarding labeled traffic from untrusted neighbours. This
    is achieved by allowing a per-device configuration of whether MPLS
    traffic input from that interface should be processed or not.

    To be secure by default, the default state is changed to MPLS being
    disabled on all interfaces unless explicitly enabled and no global
    option is provided to change the default. Whilst this differs from
    other protocols (e.g. IPv6), network operators are used to explicitly
    enabling MPLS forwarding on interfaces, and with the number of links
    to the MPLS core typically fairly low this doesn't present too much of
    a burden on operators.

    Cc: "Eric W. Biederman"
    Signed-off-by: Robert Shearman
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Robert Shearman
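A sketch of the secure-by-default input check, assuming hypothetical per-device types. It also folds in the rule from the neighbouring commit in this list that a device without MPLS state drops labeled packets. Input starts disabled and only an explicit per-device setting enables it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device MPLS state. */
struct mpls_dev_state {
    bool input_enabled;
};

struct netdev {
    const char *name;
    struct mpls_dev_state *mpls;   /* NULL: MPLS unsupported here */
};

static void mpls_dev_init(struct mpls_dev_state *st)
{
    st->input_enabled = false;     /* secure default, no global override */
}

/* May a labeled packet arriving on dev be processed? */
static bool mpls_input_allowed(const struct netdev *dev)
{
    assert(dev != NULL);
    if (dev->mpls == NULL)
        return false;                 /* unsupported interface: drop */
    return dev->mpls->input_enabled;  /* must be explicitly enabled */
}
```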
     
  • Add per-device MPLS state to supported interfaces. Use the presence of
    this state in mpls_route_add to determine that this is a supported
    interface.

    Use the presence of mpls_dev to drop packets that arrived on an
    unsupported interface - previously they were allowed through.

    Cc: "Eric W. Biederman"
    Signed-off-by: Robert Shearman
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
    case a huge process is killed by OOM, and tcp_mem[2] is hit.

    To be able to free memory we need to make progress, so this
    patch allows FIN packets to not care about tcp_mem[2], if
    skb allocation succeeded.

    In a follow-up patch, we might abort the tcp_send_fin() infinite loop
    in case TIF_MEMDIE is set on this thread, as the memory allocator
    already did its best to get extra memory.

    This patch reverts d22e15371811 ("tcp: fix tcp fin memory accounting")

    Fixes: d22e15371811 ("tcp: fix tcp fin memory accounting")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Apr, 2015

5 commits

  • This is an improved straw bucket that correctly avoids any data movement
    between items A and B when neither A nor B's weights are changed. Said
    differently, if we adjust the weight of item C (including adding it anew
    or removing it completely), we will only see inputs move to or from C,
    never between other items in the bucket.

    Notably, there is no intermediate scaling factor that needs to be
    calculated. The mapping function is a simple function of the item weights.

    The commits below were squashed together into this one (mostly to avoid
    adding and then yanking ~6000 lines' worth of crush_ln_table):

    - crush: add a straw2 bucket type
    - crush: add crush_ln to calculate nature log efficently
    - crush: improve straw2 adjustment slightly
    - crush: change crush_ln to provide 32 more digits
    - crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
    - crush/mapper: fix divide-by-0 in straw2
    (with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
    to create a proper compat.h in ceph.git)

    Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
    32a1ead92efcd351822d22a5fc37d159c65c1338,
    6289912418c4a3597a11778bcf29ed5415117ad9,
    35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
    6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
    b5921d55d16796e12d66ad2c4add7305f9ce2353.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
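A floating-point sketch of the straw2 selection rule, assuming a hypothetical integer mixer in place of CRUSH's hash and plain doubles in place of the fixed-point `crush_ln()` table. Each item draws ln(u)/weight for a hash-derived u in (0, 1], and the largest draw wins. Because an item's draw depends only on its own id and weight, changing C's weight can only move inputs to or from C, never between A and B.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit mixer standing in for CRUSH's hash function. */
static uint32_t mix_hash(uint32_t x, uint32_t item)
{
    uint32_t h = x * 0x9e3779b1u ^ item * 0x85ebca6bu;
    h ^= h >> 16;
    h *= 0x7feb352du;
    h ^= h >> 15;
    h *= 0x846ca68bu;
    h ^= h >> 16;
    return h;
}

/* ln(v) for 0 < v <= 1 without libm: scale into [0.5, 1], then
 * ln(m) = 2 * atanh((m - 1) / (m + 1)) via its power series. */
static double ln_unit(double v)
{
    const double LN2 = 0.69314718055994530942;
    int k = 0;
    while (v < 0.5) {
        v *= 2.0;
        k++;
    }
    double z = (v - 1.0) / (v + 1.0), z2 = z * z, term = z, sum = 0.0;
    for (int n = 1; n < 40; n += 2) {
        sum += term / n;
        term *= z2;
    }
    return 2.0 * sum - k * LN2;
}

/* straw2: every item draws ln(u) / weight (<= 0); the largest draw wins.
 * Items with weight 0 are never chosen. Returns an index, or -1. */
static int straw2_choose(uint32_t x, const uint32_t *ids,
                         const double *weights, size_t n)
{
    int best = -1;
    double best_draw = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (weights[i] <= 0.0)
            continue;
        double u = ((double)mix_hash(x, ids[i]) + 1.0) / 4294967296.0;
        double draw = ln_unit(u) / weights[i];
        if (best < 0 || draw > best_draw) {
            best = (int)i;
            best_draw = draw;
        }
    }
    return best;
}
```

With these draws the winning probability of each item is proportional to its weight (an exponential race), which is why no extra scaling factor is needed.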
     
  • Crush temporary buffers are allocated according to the replica size
    configured by the user. When the rule yields more final OSDs than
    there are replicas, the buffers overflow and cause a crash. Now, at
    most num-rep OSDs are selected, even if the rule allows more.

    Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
    234b066ba04976783d15ff2abc3e81b6cc06fb10.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
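A minimal sketch of the cap, with a hypothetical helper. The output buffer is sized for num-rep entries, so the copy loop stops at num-rep regardless of how many candidates the rule produces.

```c
#include <assert.h>
#include <stddef.h>

/* Copy candidate OSD ids into out[], which holds exactly numrep entries. */
static size_t select_osds(const int *candidates, size_t ncandidates,
                          int *out, size_t numrep)
{
    size_t n = 0;
    for (size_t i = 0; i < ncandidates && n < numrep; i++)
        out[n++] = candidates[i];   /* never writes past out[numrep - 1] */
    assert(n <= numrep);
    return n;
}
```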
     
  • Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Pull networking fixes from David Miller:
    "Just a few fixes trickling in at this point.

    1) If we see an attached socket on an skb in the ipv4 forwarding path,
    bail. This can happen due to races with FIB rule addition, and
    deletion, and we should just drop such frames. From Sebastian
    Pöhn.

    2) pppoe receive should only accept packets destined for this host's
    MAC address. From Joakim Tjernlund.

    3) Handle checksum unwrapping in ppp receive properly when
    it's encapsulated in UDP in some way, fix from Tom Herbert.

    4) Fix some bugs in mv88e6xxx DSA driver resulting from the conversion
    from register offset constants to mnemonic macros. From Vivien
    Didelot.

    5) Fix handling of HCA max message size in mlx4 adapters, from Eran
    Ben Elisha"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP
    tcp: add memory barriers to write space paths
    altera tse: Error-Bit on tx-avalon-stream always set.
    net: dsa: mv88e6xxx: use PORT_DEFAULT_VLAN
    net: dsa: mv88e6xxx: fix setup of port control 1
    ppp: call skb_checksum_complete_unset in ppp_receive_frame
    net: add skb_checksum_complete_unset
    pppoe: Lacks DST MAC address check
    ip_forward: Drop frames with attached skb->sk

    Linus Torvalds
     
  • Ensure that we either see that the buffer has write space
    in tcp_poll() or that we perform a wakeup from the input
    side. We did not run into any actual problem here, but thought
    that we should make things explicit.

    Signed-off-by: Jason Baron
    Signed-off-by: David S. Miller

    jbaron@akamai.com
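A sketch of the "either the poller sees the space or the waker sees the poller" handshake, written with C11 atomics and a hypothetical struct rather than kernel barriers. Each side stores, issues a full fence, then loads the other side's variable; with sequentially consistent fences on both sides, the two loads cannot both miss the other side's store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of send-buffer space and a waiting poller. */
struct sock_space {
    atomic_int free_space;   /* bytes available in the send buffer */
    atomic_bool nospace;     /* a poller is waiting for space */
    atomic_int wakeups;
};

/* Poll side: publish interest, full fence, then re-check space. */
static bool poll_writable(struct sock_space *s, int need)
{
    if (atomic_load_explicit(&s->free_space, memory_order_relaxed) >= need)
        return true;
    atomic_store_explicit(&s->nospace, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* pairs with space_freed() */
    return atomic_load_explicit(&s->free_space, memory_order_relaxed) >= need;
}

/* Input side (space freed): update, full fence, then check for waiters. */
static void space_freed(struct sock_space *s, int bytes)
{
    atomic_fetch_add_explicit(&s->free_space, bytes, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* pairs with poll_writable() */
    if (atomic_exchange_explicit(&s->nospace, false, memory_order_relaxed))
        atomic_fetch_add_explicit(&s->wakeups, 1, memory_order_relaxed);
}
```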
     

21 Apr, 2015

1 commit

  • Initial discussion was:
    [FYI] xfrm: Don't lookup sk_policy for timewait sockets

    Forwarded frames should not have a socket attached. In particular,
    tw sockets will lead to panics later on in the stack.

    This was observed with TPROXY assigning a tw socket and broken
    (misconfigured) policy routing. As a result, the frame enters the
    forwarding path instead of input. We cannot solve this in
    TPROXY as it cannot know that policy routing is broken.

    v2:
    Remove useless comment

    Signed-off-by: Sebastian Poehn
    Signed-off-by: David S. Miller

    Sebastian Pöhn
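A minimal sketch of the added check, with a hypothetical packet type. A frame reaching the forwarding path with a socket attached is dropped before any other forwarding work; the TTL check is included only to show where such a test sits among ordinary forwarding decisions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet with an optional attached socket. */
struct packet {
    const void *sk;   /* non-NULL if a socket is attached (e.g. by TPROXY) */
    int ttl;
};

enum fwd_verdict { FWD_DROP, FWD_FORWARD };

static enum fwd_verdict forward_packet(struct packet *pkt)
{
    assert(pkt != NULL);
    /* Forwarded frames must not carry a socket, timewait or otherwise. */
    if (pkt->sk != NULL)
        return FWD_DROP;
    if (pkt->ttl <= 1)
        return FWD_DROP;   /* would expire; real code also sends ICMP */
    pkt->ttl--;
    return FWD_FORWARD;
}
```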
     

20 Apr, 2015

1 commit