07 Mar, 2014

7 commits

  • Can be invoked from non-BH context.

    Based upon a patch by Eric Dumazet.

    Fixes: f19c29e3e391 ("tcp: snmp stats for Fast Open, SYN rtx, and data pkts")
    Reported-by: Sergey Senozhatsky
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Currently we're using GFP_KERNEL, however there are some path(s) where we
    can hold some spinlocks, specifically bond->curr_slave_lock:

    [ 4.722916] BUG: sleeping function called from invalid context at mm/slub.c:965
    [ 4.724438] in_atomic(): 1, irqs_disabled(): 0, pid: 940, name: ifup-eth
    [ 4.726034] 5 locks held by ifup-eth/940:
    ...snip...
    [ 4.734646] #4: (&bond->curr_slave_lock){+...+.}, at: [] bond_enslave+0xda6/0xdd0 [bonding]
    ...snip...
    [ 4.759081] [] bond_change_active_slave+0x191/0x3b0 [bonding]
    [ 4.760917] [] bond_select_active_slave+0xf7/0x1d0 [bonding]
    [ 4.762751] [] bond_enslave+0xdae/0xdd0 [bonding]
    ...snip...

    As it's out of hot path and is a really rare event - change the gfp_t flags
    to GFP_ATOMIC to avoid sleeping under spinlock.

    v2: convert new notify calls to GFP_ATOMIC.

    CC: Thomas Glanzmann
    CC: Ding Tianhong
    CC: Jay Vosburgh
    CC: Andy Gospodarek
    Signed-off-by: Veaceslav Falico
    Signed-off-by: David S. Miller

    Veaceslav Falico
     
  • Commit e688a604807647 ("net: introduce DST_NOPEER dst flag") introduced
    DST_NOPEER because of crashes in ipv6_select_ident called from
    udp6_ufo_fragment.

    Since commit 916e4cf46d0204 ("ipv6: reuse ip6_frag_id from
    ip6_ufo_append_data") we don't call ipv6_select_ident any more from
    ip6_ufo_append_data, thus this flag lost its purpose and can be removed.

    Cc: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • Hayes Wang says:

    ====================
    r8152: cleanups

    Deal with some empty lines and spaces, replace some tp->netdev with netdev,
    and remove the unnecessary function.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • rtl8152_get_stats() only returns the address of the struct
    net_device_stats, which can be obtained from struct net_device directly.

    Signed-off-by: Hayes Wang
    Signed-off-by: David S. Miller

    hayeswang
     
  • Replace some tp->netdev with netdev.

    Signed-off-by: Hayes Wang
    Signed-off-by: David S. Miller

    hayeswang
     
  • Add or remove some empty lines. Replace the spaces with the tabs.

    Signed-off-by: Hayes Wang
    Signed-off-by: David S. Miller

    hayeswang
     

06 Mar, 2014

1 commit

  • Conflicts:
    drivers/net/wireless/ath/ath9k/recv.c
    drivers/net/wireless/mwifiex/pcie.c
    net/ipv6/sit.c

    The SIT driver conflict consists of a bug fix being done by hand
    in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
    was created (netdev_alloc_pcpu_stats()) which takes care of this.

    The two wireless conflicts were overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Mar, 2014

10 commits

  • This patch fixes some whitespace issues in Kconfig files of IEEE
    802.15.4 subsystem.

    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • Since commit 8fad346f366a72978ea942abd06bd501ebd39c22
    (ieee802154: add basic support for RF212 to at86rf230 driver)
    we support at86rf212 as well.

    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • The driver currently maps a page for DMA, divides the page into multiple
    frags and posts them to the HW. It un-maps the page after data is received
    on all the frags of the page. This scheme doesn't work when bounce buffers
    are used for DMA (swiotlb=force kernel param).

    This patch fixes this problem by calling dma_sync_single_for_cpu() for each
    frag (excepting the last one) so that the data is copied from the bounce
    buffers. The page is un-mapped only when DMA finishes on the last frag of
    the page.
    (Thanks Ben H. for suggesting the dma_sync API!)

    This patch also renames the "last_page_user" field of be_rx_page_info{}
    struct to "last_frag" to improve readability of the fixed code.

    Reported-by: Li Fengmao
    Signed-off-by: Sathya Perla
    Signed-off-by: David S. Miller

    Sathya Perla
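
    The per-frag decision described above can be sketched as follows. This
    is a userspace illustration only, not the driver's code: the Frag type
    and the split_page() and completion_action() helpers are hypothetical,
    and the sketch assumes completions arrive in frag order. Only the two
    DMA call names are taken from the kernel API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frag:
    page_id: int
    offset: int
    last_frag: bool  # True only for the final frag carved out of the page


def split_page(page_id: int, page_size: int, frag_size: int) -> List[Frag]:
    """Divide one mapped page into equally sized receive frags (hypothetical helper)."""
    count = page_size // frag_size
    return [Frag(page_id, i * frag_size, i == count - 1) for i in range(count)]


def completion_action(frag: Frag) -> str:
    """Name the DMA call a receive-completion handler would make for this frag."""
    if frag.last_frag:
        # The whole page is done: unmapping also copies back from any bounce buffer.
        return "dma_unmap_page"
    # Earlier frags: make the data visible to the CPU while the page stays mapped.
    return "dma_sync_single_for_cpu"


frags = split_page(page_id=0, page_size=4096, frag_size=2048)
actions = [completion_action(f) for f in frags]
```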
     
  • Simon Wunderlich says:

    ====================
    this series contains a header file proposal for MPLS labels. These
    labels do not seem to be properly defined in the kernel so far. We are
    developing a wired/wireless 802.21/MPLS switch and need to check the
    MPLS labels to use the traffic control info for transmissions over
    802.11 networks.

    Changes to third version:

    * rename mpls_label_stack to mpls_label (thanks Neil)
    * fix over-indented closing brace (thanks Sergei)
    * add Johannes' Ack
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • MPLS labels may contain traffic control information, which should be
    evaluated and used by the wireless subsystem if present.

    Also check for IEEE 802.21 which is always network control traffic.

    Signed-off-by: Simon Wunderlich
    Signed-off-by: Mathias Kretschmer
    Acked-by: Johannes Berg
    Signed-off-by: David S. Miller

    Simon Wunderlich
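
    A rough sketch of the classification idea, assuming the priority is
    taken directly from the 3-bit traffic class field of the first MPLS
    label stack entry and that 802.21 frames map to the highest priority
    (7). The classify() function is hypothetical and is not the kernel's
    implementation; the Ethertype values are the standard ones.

```python
import struct

ETH_P_MPLS_UC = 0x8847  # MPLS unicast
ETH_P_MPLS_MC = 0x8848  # MPLS multicast
ETH_P_80221 = 0x8917    # IEEE 802.21 Media Independent Handover


def classify(ethertype: int, payload: bytes, default: int = 0) -> int:
    """Return a priority in 0..7 for a frame (illustrative only)."""
    if ethertype == ETH_P_80221:
        # Assumption: network control traffic gets the highest priority.
        return 7
    if ethertype in (ETH_P_MPLS_UC, ETH_P_MPLS_MC):
        if len(payload) < 4:
            return default  # truncated label stack entry
        (entry,) = struct.unpack("!I", payload[:4])
        # Traffic class occupies bits 11..9 of the 32-bit entry.
        return (entry >> 9) & 0x7
    return default
```

    For example, an MPLS unicast frame whose first entry carries traffic
    class 5 would be classified as priority 5 under these assumptions.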
     
  • Labels for the Multiprotocol Label Switching are defined in RFC 3032
    which was superseded by RFC 5462. Add the definition to UAPI and a stub
    header for include/linux.

    Signed-off-by: Simon Wunderlich
    Signed-off-by: Mathias Kretschmer
    Signed-off-by: David S. Miller

    Simon Wunderlich
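
    For reference, the label stack entry layout from RFC 3032 (with the
    field renamed to "traffic class" by RFC 5462) is a 20-bit label, a
    3-bit traffic class, a 1-bit bottom-of-stack flag and an 8-bit TTL.
    The sketch below decodes a label stack in userspace; MplsEntry and
    parse_label_stack() are illustrative names, not the names used in the
    added header.

```python
import struct
from typing import List, NamedTuple


class MplsEntry(NamedTuple):
    label: int    # 20 bits
    tc: int       # 3 bits, traffic class
    bottom: bool  # bottom-of-stack flag
    ttl: int      # 8 bits


def parse_label_stack(data: bytes) -> List[MplsEntry]:
    """Decode 32-bit big-endian entries until the bottom-of-stack bit is set."""
    entries = []
    for off in range(0, len(data) - 3, 4):
        (word,) = struct.unpack_from("!I", data, off)
        entry = MplsEntry(
            label=word >> 12,
            tc=(word >> 9) & 0x7,
            bottom=bool((word >> 8) & 0x1),
            ttl=word & 0xFF,
        )
        entries.append(entry)
        if entry.bottom:
            break  # anything after this is the encapsulated payload
    return entries
```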
     
  • Add the Ethertype for IEEE Std 802.21 - Media Independent Handover
    Protocol. This Ethertype is used for network control messages.

    Signed-off-by: Simon Wunderlich
    Signed-off-by: Mathias Kretschmer
    Signed-off-by: David S. Miller

    Simon Wunderlich
     
  • Pull networking fixes from David Miller:

    1) Fix memory leak in ieee80211_prep_connection(), sta_info leaked on
    error. From Eytan Lifshitz.

    2) Unintentional switch case fallthrough in nft_reject_inet_eval(),
    from Patrick McHardy.

    3) Must check if payload length is a power of 2 in
    nft_payload_select_ops(), from Nikolay Aleksandrov.

    4) Fix mis-checksumming in xen-netfront driver, ip_hdr() is not in the
    correct place when we invoke skb_checksum_setup(). From Wei Liu.

    5) TUN driver should not advertise HW vlan offload features in
    vlan_features. Fix from Fernando Luis Vazquez Cao.

    6) IPV6_VTI needs to select NET_IP_TUNNEL to avoid build errors, fix
    from Steffen Klassert.

    7) Add missing locking in xfrm_migrate_state_find(), we must hold the
    per-namespace xfrm_state_lock while traversing the lists. Fix from
    Steffen Klassert.

    8) Missing locking in ath9k driver, access to tid->sched must be done
    under ath_txq_lock(). Fix from Stanislaw Gruszka.

    9) Fix two bugs in TCP fastopen. First respect the size argument given
    to tcp_sendmsg() in the fastopen path, and secondly prevent
    tcp_send_syn_data() from potentially using order-5 allocations.
    From Eric Dumazet.

    10) Fix handling of default neigh garbage collection params, from Jiri
    Pirko.

    11) Fix cwnd bloat and over-inflation of RTT when transmit segmentation
    is in use. From Eric Dumazet.

    12) Missing initialization of Realtek r8169 driver's statistics
    seqlocks. Fix from Kyle McMartin.

    13) Fix RTNL assertion failures in 802.3ad and AB ARP monitor of bonding
    driver, from Ding Tianhong.

    14) Bonding slave release race can cause divide by zero, fix from
    Nikolay Aleksandrov.

    15) Overzealous return from neigh_periodic_work() causes reachability
    time to not be computed. Fix from Duain Jiong.

    16) Fix regression in ipv6_find_hdr(), it should not return -ENOENT when
    a specific target is specified and found. From Hans Schillstrom.

    17) Fix VLAN tag stripping regression in BNA driver, from Ivan Vecera.

    18) Tail loss probe can calculate bogus RTTs due to missing packet
    marking on retransmit. Fix from Yuchung Cheng.

    19) We cannot do skb_dst_drop() in iptunnel_pull_header() because
    multicast loopback detection in later code paths need access to
    skb_rtable(). Fix from Xin Long.

    20) The macvlan driver regresses in that it propagates lower device
    offload support disables into itself, causing severe slowdowns when
    running over a bridge. Provide the software offloads always on
    macvlan devices to deal with this and the regression is gone. From
    Vlad Yasevich.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
    macvlan: Add support for 'always_on' offload features
    net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable
    ip_tunnel:multicast process cause panic due to skb->_skb_refdst NULL pointer
    net: cpsw: fix cpdma rx descriptor leak on down interface
    be2net: isolate TX workarounds not applicable to Skyhawk-R
    be2net: Fix skb double free in be_xmit_wrokarounds() failure path
    be2net: clear promiscuous bits in adapter->flags while disabling promiscuous mode
    be2net: Fix to reset transparent vlan tagging
    qlcnic: dcb: a couple off by one bugs
    tcp: fix bogus RTT on special retransmission
    hsr: off by one sanity check in hsr_register_frame_in()
    can: remove CAN FD compatibility for CAN 2.0 sockets
    can: flexcan: factor out soft reset into seperate funtion
    can: flexcan: flexcan_remove(): add missing netif_napi_del()
    can: flexcan: fix transition from and to freeze mode in chip_{,un}freeze
    can: flexcan: factor out transceiver {en,dis}able into seperate functions
    can: flexcan: fix transition from and to low power mode in chip_{en,dis}able
    can: flexcan: flexcan_open(): fix error path if flexcan_chip_start() fails
    can: flexcan: fix shutdown: first disable chip, then all interrupts
    USB AX88179/178A: Support D-Link DUB-1312
    ...

    Linus Torvalds
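
    Item 3 in the list above relies on a power-of-2 test. The usual bit
    trick is sketched below; the function name mirrors a common helper,
    but this is only an illustration and does not reproduce the exact
    condition used in nft_payload_select_ops().

```python
def is_power_of_2(n: int) -> bool:
    """True when exactly one bit is set in n."""
    # n & (n - 1) clears the lowest set bit; a power of 2 has no other bits.
    return n > 0 and (n & (n - 1)) == 0
```

    Lengths 1, 2 and 4 pass this check, while a 3-byte payload does not.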
     
  • Pull regulator fixes from Mark Brown:
    "A couple of fixes here which ensure that regulators using the core
    support for GPIO enables work in all cases by ensuring that helpers
    are used consistently rather than open coding in places and hence not
    having GPIO support in some of them"

    * tag 'regulator-v3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
    regulator: core: Replace direct ops->disable usage
    regulator: core: Replace direct ops->enable usage

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton.

    * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
    mm: page_alloc: exempt GFP_THISNODE allocations from zone fairness
    mm: numa: bugfix for LAST_CPUPID_NOT_IN_PAGE_FLAGS
    MAINTAINERS: add and correct types of some "T:" entries
    MAINTAINERS: use tab for separator
    rapidio/tsi721: fix tasklet termination in dma channel release
    hfsplus: fix remount issue
    zram: avoid null access when fail to alloc meta
    sh: prefix sh-specific "CCR" and "CCR2" by "SH_"
    ocfs2: fix quota file corruption
    drivers/rtc/rtc-s3c.c: fix incorrect way of save/restore of S3C2410_TICNT for TYPE_S3C64XX
    kallsyms: fix absolute addresses for kASLR
    scripts/gen_initramfs_list.sh: fix flags for initramfs LZ4 compression
    mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking
    memcg: reparent charges of children before processing parent
    memcg: fix endless loop in __mem_cgroup_iter_next()
    lib/radix-tree.c: swapoff tmpfs radix_tree: remember to rcu_read_unlock
    dma debug: account for cachelines and read-only mappings in overlap tracking
    mm: close PageTail race
    MAINTAINERS: EDAC: add Mauro and Borislav as interim patch collectors

    Linus Torvalds
     

04 Mar, 2014

22 commits

  • Jan Stancek reports manual page migration encountering allocation
    failures after some pages when there is still plenty of memory free, and
    bisected the problem down to commit 81c0a2bb515f ("mm: page_alloc: fair
    zone allocator policy").

    The problem is that GFP_THISNODE obeys the zone fairness allocation
    batches on one hand, but doesn't reset them and wake kswapd on the other
    hand. After a few of those allocations, the batches are exhausted and
    the allocations fail.

    Fixing this means either having GFP_THISNODE wake up kswapd, or
    GFP_THISNODE not participating in zone fairness at all. The latter
    seems safer as an acute bugfix, we can clean up later.

    Reported-by: Jan Stancek
    Signed-off-by: Johannes Weiner
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Cc: [3.12+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • When doing some NUMA tests on powerpc, I triggered an oops. It is
    caused by using page->_last_cpupid, which should be initialized to
    "-1 & LAST_CPUPID_MASK" rather than "-1". Otherwise, in task_numa_fault(),
    the check (last_cpupid == (-1 & LAST_CPUPID_MASK)) never matches, and
    we finally oops in task_numa_group(), since there are fewer online
    CPUs than possible CPUs. This happens with CONFIG_SPARSE_VMEMMAP disabled.

    Call trace:

    SMP NR_CPUS=64 NUMA PowerNV
    Modules linked in:
    CPU: 24 PID: 804 Comm: systemd-udevd Not tainted3.13.0-rc1+ #32
    task: c000001e2746aa80 ti: c000001e32c50000 task.ti:c000001e32c50000
    REGS: c000001e32c53510 TRAP: 0300 Not tainted(3.13.0-rc1+)
    MSR: 9000000000009032 CR:28024424 XER: 20000000
    CFAR: c000000000009324 DAR: 7265717569726857 DSISR:40000000 SOFTE: 1
    NIP .task_numa_fault+0x1470/0x2370
    LR .task_numa_fault+0x1468/0x2370
    Call Trace:
    .task_numa_fault+0x1468/0x2370 (unreliable)
    .do_numa_page+0x480/0x4a0
    .handle_mm_fault+0x4ec/0xc90
    .do_page_fault+0x3a8/0x890
    handle_page_fault+0x10/0x30
    Instruction dump:
    3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb3863af78 3c82fefb
    3884b138 48d9cfd5 60000000 e93f0100 7d2907b45529063e 7d2a07b4
    ---[ end trace 15f2510da5ae07cf ]---

    Signed-off-by: Liu Ping Fan
    Signed-off-by: Aneesh Kumar K.V
    Acked-by: Peter Zijlstra
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Liu Ping Fan
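
    The mismatch can be reproduced with plain integer arithmetic. The
    field width below is a hypothetical value chosen for illustration;
    the real width depends on the kernel configuration.

```python
LAST_CPUPID_WIDTH = 14  # hypothetical width, for illustration only
LAST_CPUPID_MASK = (1 << LAST_CPUPID_WIDTH) - 1


def is_unset(last_cpupid: int) -> bool:
    """Mirror of the check quoted above: last_cpupid == (-1 & LAST_CPUPID_MASK)."""
    return last_cpupid == (-1 & LAST_CPUPID_MASK)


buggy_init = -1                     # stored unmasked: the check never matches
fixed_init = -1 & LAST_CPUPID_MASK  # stored masked: the check matches
```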
     
  • Tree location entries should start with the appropriate type.

    Add git to some, hg to another.

    Neaten tree type description.

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Convert whitespace to single tab for separators.

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • This patch is a modification of the patch originally proposed by
    Xiaotian Feng : https://lkml.org/lkml/2012/11/5/413
    This new version disables DMA channel interrupts and ensures that the
    tasklet will not be scheduled again before calling tasklet_kill().

    Unfortunately the updated patch was not released at that time due to
    planned rework of Tsi721 mport driver to use threaded interrupts (which
    has yet to happen). Recently the issue was reported again:
    https://lkml.org/lkml/2014/2/19/762.

    Description from the original Xiaotian's patch:

    "Some drivers use tasklet_disable in device remove/release process,
    tasklet_disable will inc tasklet->count and return. If the tasklet is
    not handled yet under some softirq pressure, the tasklet will be
    placed on the tasklet_vec, never have a chance to be executed. This
    might lead to a heavy loaded ksoftirqd, wakeup with pending_softirq,
    but tasklet is disabled. tasklet_kill should be used in this case."

    This patch is applicable to kernel versions starting from v3.5.

    Signed-off-by: Alexandre Bounine
    Cc: Matt Porter
    Cc: Xiaotian Feng
    Reviewed-by: Thomas Gleixner
    Cc: Mike Galbraith
    Cc: [3.5+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • The current implementation of the HFS+ driver has a small issue with the
    remount option: you are unable to remount from RO mode into RW mode
    with the command "mount -o remount,rw /dev/loop0 /mnt/hfsplus".
    Executing the following sequence of commands results in an error
    message:

    mount /dev/loop0 /mnt/hfsplus
    mount -o remount,ro /dev/loop0 /mnt/hfsplus
    mount -o remount,rw /dev/loop0 /mnt/hfsplus

    mount: you must specify the filesystem type

    mount -t hfsplus -o remount,rw /dev/loop0 /mnt/hfsplus

    mount: /mnt/hfsplus not mounted or bad option

    The reason for this issue is the failure of the mount syscall:

    mount("/dev/loop0", "/mnt/hfsplus", 0x2282a60, MS_MGC_VAL|MS_REMOUNT, NULL) = -1 EINVAL (Invalid argument)

    Namely, the hfsplus_parse_options_remount() method receives an empty
    "input" argument and returns false in that case. As a result,
    hfsplus_remount() returns the -EINVAL error code.

    This patch fixes the issue by returning true for the case of an empty
    "input" argument in the hfsplus_parse_options_remount() method.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
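
    The control flow of the fix can be modelled as below. This is a
    simplified userspace sketch under the assumption that "force" is the
    only option of interest on remount; parse_options_remount() and
    remount() are stand-ins for the kernel functions, not their actual
    code.

```python
import errno
from typing import Optional, Tuple


def parse_options_remount(options: Optional[str]) -> Tuple[bool, bool]:
    """Return (ok, force) for a remount option string."""
    if not options:
        # Fixed behaviour: no options is valid. Before the fix this case
        # was reported as a failure.
        return True, False
    force = any(opt == "force" for opt in options.split(",") if opt)
    return True, force


def remount(options: Optional[str]) -> int:
    """Return 0 on success or a negative errno value, as the kernel does."""
    ok, _force = parse_options_remount(options)
    return 0 if ok else -errno.EINVAL
```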
     
  • zram_meta_alloc could fail so caller should check it. Otherwise, your
    system will hang.

    Signed-off-by: Minchan Kim
    Acked-by: Jerome Marchand
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Commit bcf24e1daa94 ("mmc: omap_hsmmc: use the generic config for
    omap2plus devices"), enabled the build for other platforms for compile
    testing.

    sh-allmodconfig now fails with:

    include/linux/omap-dma.h:171:8: error: expected identifier before numeric constant
    make[4]: *** [drivers/mmc/host/omap_hsmmc.o] Error 1

    This happens because SuperH #defines "CCR", which is one of the enum
    values in include/linux/omap-dma.h. There's a similar issue with "CCR2"
    on sh2a.

    As "CCR" and "CCR2" are too generic names for global #defines, prefix
    them with "SH_" to fix this.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Global quota files are accessed from different nodes. Thus we cannot
    cache offset of quota structure in the quota file after we drop our node
    reference count to it because after that moment quota structure may be
    freed and reallocated elsewhere by a different node resulting in
    corruption of quota file.

    Fix the problem by clearing dq_off when we are releasing dquot structure.
    We also remove the DQ_READ_B handling because it is useless -
    DQ_ACTIVE_B is set iff DQ_READ_B is set.

    Signed-off-by: Jan Kara
    Cc: Goldwyn Rodrigues
    Cc: Joel Becker
    Reviewed-by: Mark Fasheh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • On exynos5250, exynos5420 and exynos5260 it was observed that, after 1
    cycle of S2R, the rtc-tick occurs at a very fast rate as compared to the
    rtc-tick occurring before S2R.

    This patch fixes the above issue by correcting the wrong way of
    save/restore of S3C2410_TICNT for TYPE_S3C64XX.

    Signed-off-by: Vikas Sajjan
    Cc: Grant Likely
    Cc: Rob Herring
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vikas Sajjan
     
  • Currently symbols that are absolute addresses are incorrectly displayed
    in /proc/kallsyms if the kernel is loaded with kASLR.

    The problem was that the scripts/kallsyms.c file which generates the
    array of symbol names and addresses uses a relocatable value for all
    symbols, even absolute symbols. This patch fixes that.

    Several kallsyms output in different boot states for comparison:

    $ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.nokaslr
    0000000000000000 D __per_cpu_start
    0000000000014280 D __per_cpu_end
    ffffffff810001c8 T _stext
    $ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.kaslr1
    000000001f200000 D __per_cpu_start
    000000001f214280 D __per_cpu_end
    ffffffffa02001c8 T _stext
    $ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.kaslr2
    000000000d400000 D __per_cpu_start
    000000000d414280 D __per_cpu_end
    ffffffff8e4001c8 T _stext
    $ egrep '_(stext|_per_cpu_(start|end))' /root/kallsyms.kaslr-fixed
    0000000000000000 D __per_cpu_start
    0000000000014280 D __per_cpu_end
    ffffffffadc001c8 T _stext

    Signed-off-by: Andy Honig
    Signed-off-by: Kees Cook
    Cc: Michal Marek
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Honig
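
    The listings above are consistent with a simple model: a relocatable
    symbol is shown at its link address plus the KASLR offset, while an
    absolute symbol should be shown unchanged. The sketch below checks
    that model against the kaslr1 numbers; displayed_address() is a
    hypothetical helper, not code from scripts/kallsyms.c.

```python
U64 = 0xFFFFFFFFFFFFFFFF

# Offset implied by the _stext values in the nokaslr and kaslr1 listings.
KASLR1_OFFSET = 0xFFFFFFFFA02001C8 - 0xFFFFFFFF810001C8


def displayed_address(link_addr: int, offset: int, absolute: bool,
                      fixed: bool = True) -> int:
    """Address shown for a symbol after the kernel was relocated by offset."""
    if absolute and fixed:
        return link_addr  # absolute symbols must not be relocated
    # Relocatable symbols, and every symbol before the fix, get the offset.
    return (link_addr + offset) & U64


per_cpu_end_buggy = displayed_address(0x14280, KASLR1_OFFSET, absolute=True, fixed=False)
per_cpu_end_fixed = displayed_address(0x14280, KASLR1_OFFSET, absolute=True)
```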
     
  • LZ4 as implemented in the kernel differs from the default method now
    used by the reference implementation of LZ4. Until the in-kernel method
    is updated to support the new default, passing the legacy flag (-l) to
    the compressor is necessary. Without this flag the kernel-generated,
    LZ4-compressed initramfs is junk.

    Kyungsik said:

    : It seems that lz4 supports legacy format with the same option as lz4c
    : does. Just looking at the first few bytes of lz4 compressed image, we can
    : see whether it is new format or not.
    :
    : It shows new format magic number without this patch. New format magic
    : number is 0x184d2204.
    :
    : $ hexdump -C ./initramfs_data.cpio.lz4 |more
    : 00000000 04 22 4d 18 64 70 b9 69 (Little Endian)
    : ...
    :
    : Currently kernel supports legacy format only.

    Signed-off-by: Daniel M. Weeks
    Cc: Michal Marek
    Acked-by: Kyungsik Lee
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel M. Weeks
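
    The check Kyungsik describes can be automated by reading the first
    four bytes as a little-endian integer. The new-format magic number is
    the one quoted above; the legacy magic number 0x184C2102 is not
    stated in this message and is taken from the LZ4 format
    documentation. lz4_container() is an illustrative helper.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204   # new format, as quoted above
LZ4_LEGACY_MAGIC = 0x184C2102  # legacy format (from the LZ4 format docs)


def lz4_container(header: bytes) -> str:
    """Classify an LZ4 image by its leading magic number."""
    if len(header) < 4:
        return "unknown"
    (magic,) = struct.unpack("<I", header[:4])
    if magic == LZ4_FRAME_MAGIC:
        return "frame"
    if magic == LZ4_LEGACY_MAGIC:
        return "legacy"
    return "unknown"
```

    Feeding it the bytes from the hexdump above (04 22 4d 18) reports the
    new frame format, which the in-kernel decompressor does not accept.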
     
  • Daniel Borkmann reported a VM_BUG_ON assertion failing:

    ------------[ cut here ]------------
    kernel BUG at mm/mlock.c:528!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: ccm arc4 iwldvm [...]
    video
    CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
    Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
    task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
    RIP: 0010:[] [] munlock_vma_pages_range+0x2e0/0x2f0
    Call Trace:
    do_munmap+0x18f/0x3b0
    vm_munmap+0x41/0x60
    SyS_munmap+0x22/0x30
    system_call_fastpath+0x1a/0x1f
    RIP munlock_vma_pages_range+0x2e0/0x2f0
    ---[ end trace a0088dcf07ae10f2 ]---

    because munlock_vma_pages_range() thinks it's unexpectedly in the middle
    of a THP page. This can be reproduced with default config since 3.11
    kernels. A reproducer can be found in the kernel's selftest directory
    for networking by running ./psock_tpacket.

    The problem is that an order=2 compound page (allocated by
    alloc_one_pg_vec_page()) is part of the munlocked VM_MIXEDMAP vma (mapped
    by packet_mmap()) and mistaken for a THP page and assumed to be order=9.

    The checks for THP in munlock came with commit ff6a6da60b89 ("mm:
    accelerate munlock() treatment of THP pages"), i.e. since 3.9, but did
    not trigger a bug. It just makes munlock_vma_pages_range() skip such
    compound pages until the next 512-pages-aligned page, when it encounters
    a head page. This is however not a problem for vma's where mlocking has
    no effect anyway, but it can distort the accounting.

    Since commit 7225522bb429 ("mm: munlock: batch non-THP page isolation
    and munlock+putback using pagevec") this can trigger a VM_BUG_ON in
    PageTransHuge() check.

    This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a
    list of flags that make vma's non-mlockable and non-mergeable. The
    reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is
    already on the VM_SPECIAL list, and both are intended for non-LRU pages
    where mlocking makes no sense anyway. Related Lkml discussion can be
    found in [2].

    [1] tools/testing/selftests/net/psock_tpacket
    [2] https://lkml.org/lkml/2014/1/10/427

    Signed-off-by: Vlastimil Babka
    Signed-off-by: Daniel Borkmann
    Reported-by: Daniel Borkmann
    Tested-by: Daniel Borkmann
    Cc: Thomas Hellstrom
    Cc: John David Anglin
    Cc: HATAYAMA Daisuke
    Cc: Konstantin Khlebnikov
    Cc: Carsten Otte
    Cc: Jared Hulbert
    Tested-by: Hannes Frederic Sowa
    Cc: Kirill A. Shutemov
    Acked-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: [3.11.x+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Sometimes the cleanup after memcg hierarchy testing gets stuck in
    mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.

    There may turn out to be several causes, but a major cause is this: the
    workitem to offline parent can get run before workitem to offline child;
    parent's mem_cgroup_reparent_charges() circles around waiting for the
    child's pages to be reparented to its lrus, but it's holding
    cgroup_mutex which prevents the child from reaching its
    mem_cgroup_reparent_charges().

    Further testing showed that an ordered workqueue for cgroup_destroy_wq
    is not always good enough: percpu_ref_kill_and_confirm's call_rcu_sched
    stage on the way can mess up the order before reaching the workqueue.

    Instead, when offlining a memcg, call mem_cgroup_reparent_charges() on
    all its children (and grandchildren, in the correct order) to have their
    charges reparented first.

    Fixes: e5fca243abae ("cgroup: use a dedicated workqueue for cgroup destruction")
    Signed-off-by: Filipe Brandenburger
    Signed-off-by: Hugh Dickins
    Reviewed-by: Tejun Heo
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: [v3.10+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Filipe Brandenburger
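
    One ordering that satisfies "children and grandchildren first" is a
    post-order walk of the hierarchy. The sketch below assumes that is
    the intended order and models the hierarchy as a plain dictionary;
    reparent_order() is a hypothetical helper, not the kernel iterator.

```python
from typing import Dict, List


def reparent_order(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Return groups in the order their charges would be reparented."""
    order: List[str] = []

    def visit(node: str) -> None:
        for child in tree.get(node, []):
            visit(child)    # descendants are handled before their ancestor
        order.append(node)  # the group itself comes after its whole subtree

    visit(root)
    return order


hierarchy = {"parent": ["child_a", "child_b"], "child_a": ["grandchild"]}
order = reparent_order(hierarchy, "parent")
```

    With this ordering the parent never waits for a child that has not
    yet had its charges reparented.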
     
  • Commit 0eef615665ed ("memcg: fix css reference leak and endless loop in
    mem_cgroup_iter") got the interaction with the commit a few before it
    d8ad30559715 ("mm/memcg: iteration skip memcgs not yet fully
    initialized") slightly wrong, and we didn't notice at the time.

    It's elusive, and harder to get than the original, but for a couple of
    days before rc1, I several times saw an endless loop similar to that
    supposedly being fixed.

    This time it was a tighter loop in __mem_cgroup_iter_next(): because we
    can get here when our root has already been offlined, and the ordering
    of conditions was such that we then just cycled around forever.

    Fixes: 0eef615665ed ("memcg: fix css reference leak and endless loop in mem_cgroup_iter").
    Signed-off-by: Hugh Dickins
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Greg Thelen
    Cc: [3.12+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Running fsx on tmpfs with concurrent memhog-swapoff-swapon, lots of

    BUG: sleeping function called from invalid context at kernel/fork.c:606
    in_atomic(): 0, irqs_disabled(): 0, pid: 1394, name: swapoff
    1 lock held by swapoff/1394:
    #0: (rcu_read_lock){.+.+.+}, at: [] radix_tree_locate_item+0x1f/0x2b6

    followed by

    ================================================
    [ BUG: lock held when returning to user space! ]
    3.14.0-rc1 #3 Not tainted
    ------------------------------------------------
    swapoff/1394 is leaving the kernel with locks still held!
    1 lock held by swapoff/1394:
    #0: (rcu_read_lock){.+.+.+}, at: [] radix_tree_locate_item+0x1f/0x2b6

    after which the system recovered nicely.

    Whoops, I long ago forgot the rcu_read_unlock() on one unlikely branch.

    Fixes e504f3fdd63d ("tmpfs radix_tree: locate_item to speed up swapoff")

    Signed-off-by: Hugh Dickins
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • While debug_dma_assert_idle() checks if a given *page* is actively
    undergoing dma the valid granularity of a dma mapping is a *cacheline*.
    Sander's testing shows that the warning message "DMA-API: exceeded 7
    overlapping mappings of pfn..." is falsely triggering. The test is
    simply mapping multiple cachelines in a given page.

    Ultimately we want overlap tracking to be valid as it is a real api
    violation, so we need to track active mappings by cachelines. Update
    the active dma tracking to use the page-frame-relative cacheline of the
    mapping as the key, and update debug_dma_assert_idle() to check for all
    possible mapped cachelines for a given page.

    However, the need to track active mappings is only relevant when the
    dma-mapping is writable by the device. In fact it is fairly standard
    for read-only mappings to have hundreds or thousands of overlapping
    mappings at once. Limiting the overlap tracking to writable
    (!DMA_TO_DEVICE) eliminates this class of false-positive overlap
    reports.

    Note, the radix gang lookup is sub-optimal. It would be best if it
    stopped fetching entries once the search passed a page boundary.
    Nevertheless, this implementation does not perturb the original net_dma
    failing case. That is to say the extra overhead does not show up in
    terms of making the failing case pass due to a timing change.

    References:
    http://marc.info/?l=linux-netdev&m=139232263419315&w=2
    http://marc.info/?l=linux-netdev&m=139217088107122&w=2

    Signed-off-by: Dan Williams
    Reported-by: Sander Eikelenboom
    Reported-by: Dave Jones
    Tested-by: Dave Jones
    Tested-by: Sander Eikelenboom
    Cc: Konrad Rzeszutek Wilk
    Cc: Francois Romieu
    Cc: Eric Dumazet
    Cc: Wei Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
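
    A sketch of the page-frame-relative cacheline key, assuming 4 KiB
    pages and 64-byte cachelines (both values vary by architecture). The
    helper names are illustrative, and direction is modelled as a plain
    string.

```python
PAGE_SHIFT = 12     # assumption: 4 KiB pages
L1_CACHE_SHIFT = 6  # assumption: 64-byte cachelines
CACHELINE_PER_PAGE_SHIFT = PAGE_SHIFT - L1_CACHE_SHIFT


def cacheline_number(pfn: int, offset: int) -> int:
    """Key for the cacheline containing byte offset within page frame pfn."""
    return (pfn << CACHELINE_PER_PAGE_SHIFT) + (offset >> L1_CACHE_SHIFT)


def cachelines_in_page(pfn: int) -> range:
    """All keys an idle check would have to look up for one page."""
    first = pfn << CACHELINE_PER_PAGE_SHIFT
    return range(first, first + (1 << CACHELINE_PER_PAGE_SHIFT))


def tracks_overlap(direction: str) -> bool:
    """Only mappings the device may write to need overlap tracking."""
    return direction != "DMA_TO_DEVICE"
```

    Two mappings of different cachelines in the same page now get
    different keys, so they no longer count as overlapping.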
     
  • Commit bf6bddf1924e ("mm: introduce compaction and migration for
    ballooned pages") introduces page_count(page) into memory compaction
    which dereferences page->first_page if PageTail(page).

    This results in a very rare NULL pointer dereference on the
    aforementioned page_count(page). Indeed, anything that does
    compound_head(), including page_count() is susceptible to racing with
    prep_compound_page() and seeing a NULL or dangling page->first_page
    pointer.

    This patch uses Andrea's implementation of compound_trans_head() that
    deals with such a race and makes it the default compound_head()
    implementation. This includes a read memory barrier that ensures that
    if PageTail(head) is true that we return a head page that is neither
    NULL nor dangling. The patch then adds a store memory barrier to
    prep_compound_page() to ensure page->first_page is set.

    This is the safest way to ensure we see the head page that we are
    expecting, PageTail(page) is already in the unlikely() path and the
    memory barriers are unfortunately required.

    Hugetlbfs is the exception, we don't enforce a store memory barrier
    during init since no race is possible.

    Signed-off-by: David Rientjes
    Cc: Holger Kiehl
    Cc: Christoph Lameter
    Cc: Rafael Aquini
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: "Kirill A. Shutemov"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • We're more or less collecting EDAC patches already anyway so let's hold it
    down so that get_maintainer sees it too.

    Signed-off-by: Borislav Petkov
    Acked-by: Mauro Carvalho Chehab
    Cc: Doug Thompson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
    Macvlan currently inherits all of its features from the lower
    device. When the lower device disables offload support, macvlan
    disables offload support as well. This causes a performance
    regression when using macvlan/macvtap in bridge mode.

    It can be easily demonstrated by creating 2 namespaces using
    macvlan in bridge mode and running netperf between them:

    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    87380  16384   16384    20.00    1204.61

    To restore the performance, we add software offload features
    to the list of "always_on" features for macvlan. This way
    when a namespace or a guest using macvtap initially sends a
    packet, this packet will not be segmented at the macvlan level.
    It will only be segmented when macvlan sends the packet
    to the lower device.

    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    87380  16384   16384    20.00    5507.35
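
    The "always_on" idea can be illustrated with a small feature-mask
    sketch. The bit names and helper functions below are hypothetical
    and chosen for illustration only; the kernel uses NETIF_F_* flags in
    netdev_features_t, and the exact set of always-on flags is not
    spelled out in this message.

```c
#include <stdint.h>

typedef uint32_t feat_t;

/* Hypothetical feature bits, not the kernel's NETIF_F_* values. */
enum {
	FEAT_SG     = 1u << 0,  /* scatter-gather */
	FEAT_CSUM   = 1u << 1,  /* checksum offload */
	FEAT_GSO_SW = 1u << 2,  /* software segmentation offload */
	FEAT_HW_TSO = 1u << 3   /* hardware TSO, owned by the lower device */
};

/* Assumed set of software offloads that stay enabled on the upper device. */
#define ALWAYS_ON ((feat_t)(FEAT_SG | FEAT_CSUM | FEAT_GSO_SW))

/* Old behaviour: the upper device offers only what the lower device offers. */
static feat_t inherit_only(feat_t lower)
{
	return lower;
}

/* New behaviour: inherited features plus the always-on software offloads. */
static feat_t with_always_on(feat_t lower)
{
	return lower | ALWAYS_ON;
}
```

    With the second helper, a lower device that reports no offloads
    still leaves software segmentation enabled on the upper device, so
    segmentation is deferred until the packet reaches the lower device.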

    Fixes: 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 (macvtap: Add support of packet capture on macvtap device.)
    Fixes: 797f87f83b60685ff8a13fa0572d2f10393c50d3 (macvlan: fix netdev feature propagation from lower device)
    CC: Florian Westphal
    CC: Christian Borntraeger
    CC: Jason Wang
    CC: Michael S. Tsirkin
    Tested-by: Christian Borntraeger
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • John W. Linville says:

    ====================
    Please pull this batch of fixes intended for the 3.14 stream...

    For the mac80211 bits, Johannes says:

    "This time I have a fix to get out of an 'infinite error state' in case
    regulatory domain updates failed and two fixes for VHT associations: one
    to not disconnect immediately when the AP uses more bandwidth than the
    new regdomain would allow after a change due to association country
    information getting used, and one for an issue in the code where
    mac80211 doesn't correctly ignore a reserved field and then uses an HT
    instead of VHT association."

    For the iwlwifi bits, Emmanuel says:

    "Johannes fixes a long standing bug in the AMPDU status reporting.
    Max fixes the listen time which was way too long and causes trouble
    to several APs."

    Along with those, Bing Zhao marks the mwifiex_usb driver as _not_
    supporting USB autosuspend after a number of problems with that have
    been reported.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • RFC4895 introduced AUTH chunks for SCTP; during the SCTP
    handshake RANDOM; CHUNKS; HMAC-ALGO are negotiated (CHUNKS
    being optional though):

    ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->



    peer
    meta data (peer_random, peer_hmacs, peer_chunks) in case
    sysctl -w net.sctp.auth_enable=1 is set. If in INIT's
    SCTP_PARAM_SUPPORTED_EXT parameter SCTP_CID_AUTH is set,
    peer_random != NULL and peer_hmacs != NULL the peer is to be
    assumed asoc->peer.auth_capable=1, in any other case
    asoc->peer.auth_capable=0.

    Now, if chunk->auth_chunk is available in sctp_sf_do_5_1D_ce(),
    we set up a fake auth chunk and pass that on to
    sctp_sf_authenticate(), which at the latest in
    sctp_auth_calculate_hmac() reliably dereferences a NULL pointer
    at position 0..0008 when setting up the crypto key in
    crypto_hash_setkey(). The key used is asoc->asoc_shared_key,
    which is NULL; it is selected because the condition
    key_id == asoc->active_key_id is true if the AUTH chunk was
    injected correctly from remote. This happens no matter what
    the net.sctp.auth_enable sysctl says.

    The fix is to check for net->sctp.auth_enable and for
    asoc->peer.auth_capable before doing any operations like
    sctp_sf_authenticate(), as no key is activated in
    sctp_auth_asoc_init_active_key() in either case.

    Now as RFC4895 section 6.3 states that if the used HMAC-ALGO
    passed from the INIT chunk was not used in the AUTH chunk, we
    SHOULD send an error; however in this case it would be better
    to just silently discard such a maliciously prepared handshake
    as we didn't even receive a parameter at all. Also, as our
    endpoint has no shared key configured, section 6.3 says that
    we MUST silently discard, which we are doing from now onwards.

    Before calling sctp_sf_pdiscard(), we need not only to free
    the association, but also the chunk->auth_chunk skb, as
    commit bbd0d59809f9 created a skb clone in that case.
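
    The check and cleanup order described above can be sketched in user
    space as follows. All types and names here (struct fake_asoc,
    struct fake_chunk, fake_cookie_echo_auth_check()) are hypothetical
    stand-ins, and free() merely plays the role of releasing the cloned
    skb and the association; this is not the kernel code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the association and the received chunk. */
struct fake_asoc  { bool peer_auth_capable; };
struct fake_chunk { void *auth_chunk; };  /* cloned AUTH data, or NULL */

enum verdict { VERDICT_CONTINUE, VERDICT_AUTHENTICATE, VERDICT_DISCARD };

static enum verdict fake_cookie_echo_auth_check(bool auth_enable,
						struct fake_asoc **new_asoc,
						struct fake_chunk *chunk)
{
	/* No AUTH chunk arrived with the COOKIE_ECHO: nothing to verify. */
	if (!chunk->auth_chunk)
		return VERDICT_CONTINUE;

	/* Both we and the peer must be AUTH capable before authenticating. */
	if (!auth_enable || !(*new_asoc)->peer_auth_capable) {
		/* Release the clone and the association before discarding. */
		free(chunk->auth_chunk);
		chunk->auth_chunk = NULL;
		free(*new_asoc);
		*new_asoc = NULL;
		return VERDICT_DISCARD;
	}

	/* Only now is it safe to run the authentication step. */
	return VERDICT_AUTHENTICATE;
}
```

    The point of the ordering is that the authentication step, which
    needs an active key, is never reached unless both sides negotiated
    AUTH support.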

    I have tested this locally by using netfilter's nfqueue and
    re-injecting packets into the local stack after maliciously
    modifying the INIT chunk (removing RANDOM; HMAC-ALGO param)
    and the SCTP packet containing the COOKIE_ECHO (injecting
    AUTH chunk before COOKIE_ECHO). Fixed with this patch applied.

    Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk")
    Signed-off-by: Daniel Borkmann
    Cc: Vlad Yasevich
    Cc: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Daniel Borkmann