06 Feb, 2015

1 commit

  • Pull networking fixes from David Miller:

    1) Stretch ACKs can kill performance with Reno and CUBIC congestion
    control, largely due to LRO and GRO. Fix from Neal Cardwell.

    2) Fix userland breakage because we accidentally emit zero length netlink
    messages from the bridging code. From Roopa Prabhu.

    3) Carry handling in generic csum_tcpudp_nofold is broken, fix from
    Karl Beldan.

    4) Remove bogus dev_set_net() calls from CAIF driver, from Nicolas
    Dichtel.

    5) Make sure PPP deflation never returns a length greater than the
    output buffer, otherwise we overflow and trigger skb_over_panic().
    Fix from Florian Westphal.

    6) COSA driver needs VIRT_TO_BUS Kconfig dependencies, from Arnd
    Bergmann.

    7) Don't increase route cached MTU on datagram too big ICMPs. From Li
    Wei.

    8) Fix error path leaks in nf_tables, from Pablo Neira Ayuso.

    9) Fix bitmask handling regression in netlink that broke things like
    acpi userland tools. From Pablo Neira Ayuso.

    10) Wrong header pointer passed to param_type2af() in SCTP code, from
    Saran Maruti Ramanara.

    11) Stacked vlans not handled correctly by vlan_get_protocol(), from
    Toshiaki Makita.

    12) Add missing DMA memory barrier to xgene driver, from Iyappan
    Subramanian.

    13) Fix crash in rate estimators, from Eric Dumazet.

    14) We've been adding various workarounds, one after another, for the
    change which added the per-net tcp_sock. It was meant to reduce
    socket contention but added lots of problems.

    Reduce this instead to a proper per-cpu socket and that rids us of
    all the demons.

    From Eric Dumazet.

    15) Fix memory corruption and OOPS in mlx4 driver, from Jack
    Morgenstein.

    16) When we disabled UFO in the virtio_net device, it introduced some
    serious performance regressions. The original problem was IPV6
    fragment ID generation, so fix that properly instead. From Vlad
    Yasevich.

    17) sr9700 driver build breaks on xtensa because it defines macros with
    the same name as those used by the arch code. Use more unique
    names. From Chen Gang.

    18) Fix endianness in new virtio 1.0 mode of the vhost net driver, from
    Michael S Tsirkin.

    19) Several sysctls were setting the maxlen attribute incorrectly, from
    Sasha Levin.

    20) Don't accept an FQ scheduler quantum of zero, that leads to crashes.
    From Kenneth Klette Jonassen.

    21) Fix dumping of non-existing actions in the packet scheduler
    classifier. From Ignacy Gawędzki.

    22) Return the right work_done value when doing TX work in the qlcnic
    driver.

    23) ip6gre_err accesses the info field with the wrong endianness, from
    Sabrina Dubroca.
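Item 3 above concerns carry handling when a wide checksum accumulator is narrowed. As a minimal sketch of the general technique (not the kernel's code; the function names are hypothetical), the following contrasts a fold that wraps the final carry back in with one that silently drops it:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def fold64_to_32(total: int) -> int:
    """Fold a 64-bit accumulator into 32 bits with end-around carry."""
    total &= MASK64
    # Add the high and low halves; the result may need 33 bits.
    total = (total & MASK32) + (total >> 32)
    # Add once more so a carry out of bit 31 is wrapped back in.
    total = (total & MASK32) + (total >> 32)
    return total


def fold64_to_32_lossy(total: int) -> int:
    """Illustrative buggy variant: one fold, then truncation drops the carry."""
    total &= MASK64
    total += total >> 32
    return total & MASK32
```

With an accumulator of 0x1FFFFFFFF the first function yields 1, while the lossy variant yields 0 because the carry produced by the fold itself is discarded.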

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
    sit: fix some __be16/u16 mismatches
    ipv6: fix sparse errors in ip6_make_flowlabel()
    net: remove some sparse warnings
    flow_keys: n_proto type should be __be16
    ip6_gre: fix endianness errors in ip6gre_err
    qlcnic: Fix NAPI poll routine for Tx completion
    amd-xgbe: Set RSS enablement based on hardware features
    amd-xgbe: Adjust for zero-based traffic class count
    cls_api.c: Fix dumping of non-existing actions' stats.
    pkt_sched: fq: avoid hang when quantum 0
    net: rds: use correct size for max unacked packets and bytes
    vhost/net: fix up num_buffers endian-ness
    gianfar: correct the bad expression while writing bit-pattern
    net: usb: sr9700: Use 'SR_' prefix for the common register macros
    Revert "drivers/net: Disable UFO through virtio"
    Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets"
    ipv6: Select fragment id during UFO segmentation if not set.
    xen-netback: stop the guest rx thread after a fatal error
    net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than 80 VFs
    isdn: off by one in connect_res()
    ...

    Linus Torvalds
     

05 Feb, 2015

2 commits

  • include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types)
    include/net/ipv6.h:713:22: expected restricted __be32 [usertype] hash
    include/net/ipv6.h:713:22: got unsigned int
    include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer
    include/net/ipv6.h:719:22: warning: invalid assignment: ^=
    include/net/ipv6.h:719:22: left side has type restricted __be32
    include/net/ipv6.h:719:22: right side has type unsigned int

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • (struct flow_keys)->n_proto is in network order, use
    proper type for this.

    Fixes the following sparse errors:

    net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types)
    net/core/flow_dissector.c:139:39: expected unsigned short [unsigned] [usertype] n_proto
    net/core/flow_dissector.c:139:39: got restricted __be16 [assigned] [usertype] proto
    net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types)
    net/core/flow_dissector.c:237:23: expected unsigned short [unsigned] [usertype] n_proto
    net/core/flow_dissector.c:237:23: got restricted __be16 [assigned] [usertype] proto

    Signed-off-by: Eric Dumazet
    Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Feb, 2015

1 commit

  • If the IPv6 fragment id has not been set and we perform
    fragmentation due to UFO, select a new fragment id.
    We now consider a fragment id of 0 as unset and if the id selection
    process returns 0 (after all the perturbations), we set it to
    0x80000000, thus giving us ample space not to create collisions
    with the next packet we may have to fragment.

    When doing UFO integrity checking, we also select the
    fragment id if it has not been set yet. This is stored into
    the skb_shinfo() thus allowing UFO to function correctly.

    This patch also removes duplicate fragment id generation code
    and moves ipv6_select_ident() into the header as it may be
    used during GSO.
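The "0 means unset" convention described above can be sketched as follows. This is an illustration only: the function names are hypothetical and the raw id is passed in rather than computed, since the log does not show the kernel's actual hashing.

```python
UNSET_ID = 0
FALLBACK_ID = 0x80000000  # value named in the commit message


def normalize_ident(raw_id: int) -> int:
    """Map a freshly generated id of 0 to the fallback, since 0 means unset."""
    raw_id &= 0xFFFFFFFF
    return FALLBACK_ID if raw_id == UNSET_ID else raw_id


def ensure_fragment_id(current_id: int, generate) -> int:
    """Keep an id that is already set; otherwise generate and normalize one.

    `generate` is a hypothetical zero-argument callable standing in for the
    real id selection routine.
    """
    if current_id != UNSET_ID:
        return current_id
    return normalize_ident(generate())
```

A caller that already holds a nonzero id gets it back unchanged, and a generator that happens to return 0 still produces a usable id.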

    Signed-off-by: Vladislav Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

03 Feb, 2015

2 commits

  • Commit de966c592802 (net/mlx4_core: Support more than 64 VFs) was meant to
    allow up to 126 VFs. However, because MLX4_MFUNC_MAX was left too low,
    requesting more than 80 VFs resulted in memory corruption (and Oopses).
    In addition, the number of slaves was left too high.

    This commit fixes these issues.

    Fixes: de966c592802 ("net/mlx4_core: Support more than 64 VFs")
    Signed-off-by: Jack Morgenstein
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Jack Morgenstein
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS fixes for net

    The following patchset contains Netfilter/IPVS fixes for your net tree,
    they are:

    1) Validate hooks for nf_tables NAT expressions, otherwise users can
    crash the kernel when using them from the wrong hook. We already
    got one user trapped on this when configuring masquerading.

    2) Fix a BUG splat in nf_tables with CONFIG_DEBUG_PREEMPT=y. Reported
    by Andreas Schultz.

    3) Avoid unnecessary reroute of traffic in the local input path
    in IPVS that triggers a crash in xfrm. Reported by Florian
    Wiessner and fixed by Julian Anastasov.

    4) Fix memory and module refcount leak from the error path of
    nf_tables_newchain().
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

02 Feb, 2015

2 commits

  • In commit be9f4a44e7d41 ("ipv4: tcp: remove per net tcp_sock")
    I tried to address contention on a socket lock, but the solution
    I chose was horrible :

    commit 3a7c384ffd57e ("ipv4: tcp: unicast_sock should not land outside
    of TCP stack") addressed a selinux regression.

    commit 0980e56e506b ("ipv4: tcp: set unicast_sock uc_ttl to -1")
    took care of another regression.

    commit b5ec8eeac46 ("ipv4: fix ip_send_skb()") fixed another regression.

    commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
    was another shot in the dark.

    Really, just use a proper socket per cpu, and remove the skb_orphan()
    call, to re-enable flow control.

    This solves a serious problem with FQ packet scheduler when used in
    hostile environments, as we do not want to allocate a flow structure
    for every RST packet sent in response to a spoofed packet.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report
    on nested sleep conditions, which we generally want to avoid because the
    inner sleeping operation can re-set the thread state to TASK_RUNNING,
    but that will then cause the outer sleep loop to not actually sleep when
    it calls schedule.

    However, that's actually valid traditional behavior, with the inner
    sleep being some fairly rare case (like taking a sleeping lock that
    normally doesn't actually need to sleep).

    And the debug code would actually change the state of the task to
    TASK_RUNNING internally, which makes that kind of traditional and
    working code not work at all, because now the nested sleep doesn't just
    sometimes cause the outer one to not block, but will cause it to happen
    every time.

    In particular, it will cause the cardbus kernel daemon (pccardd) to
    basically busy-loop doing scheduling, converting a laptop into a heater,
    as reported by Bruno Prémont. But there may be other legacy uses of
    that nested sleep model in other drivers that are also likely to never
    get converted to the new model.

    This fixes both cases:

    - don't set TASK_RUNNING when the nested condition happens (note: even
    if WARN_ONCE() only _warns_ once, the return value isn't whether the
    warning happened, but whether the condition for the warning was true.
    So despite the warning only happening once, the "if (WARN_ON(..))"
    would trigger for every nested sleep).

    - in the cases where we knowingly disable the warning by using
    "sched_annotate_sleep()", don't change the task state (that is used
    for all core scheduling decisions), instead use '->task_state_change'
    that is used for the debugging decision itself.

    (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
    avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
    suggested change to use 'task_state_change' as part of the test)
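The point about the WARN_ONCE() return value can be modelled in a few lines. This is a userspace sketch with hypothetical names, not kernel code: the helper prints at most once, yet its return value reflects the condition on every call.

```python
class WarnOnce:
    """Model of a warn-once helper: logs once, always returns the condition."""

    def __init__(self):
        self.warned = False
        self.messages = []

    def __call__(self, condition, message):
        if condition and not self.warned:
            self.warned = True
            self.messages.append(message)
        # The return value is the condition, not "did we just warn".
        return bool(condition)


def count_triggers(events):
    """Return (times the guarded branch ran, warnings actually emitted)."""
    warn_once = WarnOnce()
    triggers = 0
    for nested in events:
        if warn_once(nested, "nested sleep detected"):
            triggers += 1  # side effects here run on every nested sleep
    return triggers, len(warn_once.messages)
```

For three nested sleeps the guarded branch runs three times even though only one warning is recorded, which is why a state change placed inside that branch took effect every time.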

    Reported-and-bisected-by: Bruno Prémont
    Tested-by: Rafael J Wysocki
    Acked-by: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Ilya Dryomov
    Cc: Mike Galbraith
    Cc: Ingo Molnar
    Cc: Peter Hurley
    Cc: Davidlohr Bueso
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Feb, 2015

2 commits

  • Doing the following commands on a non idle network device
    panics the box instantly, because cpu_bstats gets overwritten
    by stats.

    tc qdisc add dev eth0 root
    ... some traffic (one packet is enough) ...
    tc qdisc replace dev eth0 root est 1sec 4sec

    [ 325.355596] BUG: unable to handle kernel paging request at ffff8841dc5a074c
    [ 325.362609] IP: [] __gnet_stats_copy_basic+0x3e/0x90
    [ 325.369158] PGD 1fa7067 PUD 0
    [ 325.372254] Oops: 0000 [#1] SMP
    [ 325.375514] Modules linked in: ...
    [ 325.398346] CPU: 13 PID: 14313 Comm: tc Not tainted 3.19.0-smp-DEV #1163
    [ 325.412042] task: ffff8800793ab5d0 ti: ffff881ff2fa4000 task.ti: ffff881ff2fa4000
    [ 325.419518] RIP: 0010:[] [] __gnet_stats_copy_basic+0x3e/0x90
    [ 325.428506] RSP: 0018:ffff881ff2fa7928 EFLAGS: 00010286
    [ 325.433824] RAX: 000000000000000c RBX: ffff881ff2fa796c RCX: 000000000000000c
    [ 325.440988] RDX: ffff8841dc5a0744 RSI: 0000000000000060 RDI: 0000000000000060
    [ 325.448120] RBP: ffff881ff2fa7948 R08: ffffffff81cd4f80 R09: 0000000000000000
    [ 325.455268] R10: ffff883ff223e400 R11: 0000000000000000 R12: 000000015cba0744
    [ 325.462405] R13: ffffffff81cd4f80 R14: ffff883ff223e460 R15: ffff883feea0722c
    [ 325.469536] FS: 00007f2ee30fa700(0000) GS:ffff88407fa20000(0000) knlGS:0000000000000000
    [ 325.477630] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 325.483380] CR2: ffff8841dc5a074c CR3: 0000003feeae9000 CR4: 00000000001407e0
    [ 325.490510] Stack:
    [ 325.492524] ffff883feea0722c ffff883fef719dc0 ffff883feea0722c ffff883ff223e4a0
    [ 325.499990] ffff881ff2fa79a8 ffffffff815424ee ffff883ff223e49c 000000015cba0744
    [ 325.507460] 00000000f2fa7978 0000000000000000 ffff881ff2fa79a8 ffff883ff223e4a0
    [ 325.514956] Call Trace:
    [ 325.517412] [] gen_new_estimator+0x8e/0x230
    [ 325.523250] [] gen_replace_estimator+0x4a/0x60
    [ 325.529349] [] tc_modify_qdisc+0x52b/0x590
    [ 325.535117] [] rtnetlink_rcv_msg+0xa0/0x240
    [ 325.540963] [] ? __rtnl_unlock+0x20/0x20
    [ 325.546532] [] netlink_rcv_skb+0xb1/0xc0
    [ 325.552145] [] rtnetlink_rcv+0x25/0x40
    [ 325.557558] [] netlink_unicast+0x168/0x220
    [ 325.563317] [] netlink_sendmsg+0x2ec/0x3e0

    Let's play it safe and not use a union: percpu 'pointers' are mostly read
    anyway, and we typically have few qdiscs per host.
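The hazard of sharing storage between a pointer and a counter block can be shown with ctypes. The layout below is a simplified, hypothetical stand-in for the real structures; it only illustrates why updating one union member corrupts the other.

```python
import ctypes


class BasicStats(ctypes.Structure):
    _fields_ = [("bytes", ctypes.c_uint64), ("packets", ctypes.c_uint32)]


class SharedStats(ctypes.Union):
    # Both members start at offset 0, so they alias each other.
    _fields_ = [("bstats", BasicStats), ("cpu_bstats", ctypes.c_uint64)]


class SeparateStats(ctypes.Structure):
    # Distinct storage: updating one member cannot touch the other.
    _fields_ = [("bstats", BasicStats), ("cpu_bstats", ctypes.c_uint64)]


def account_packet(stats, length):
    """Update the plain counters, as a non-percpu accounting path would."""
    stats.bstats.bytes += length
    stats.bstats.packets += 1
```

Storing a pointer-like value in `cpu_bstats` and then accounting one packet leaves the union's pointer changed by the packet length, whereas the struct variant keeps it intact.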

    Signed-off-by: Eric Dumazet
    Cc: John Fastabend
    Fixes: 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe")
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Pull i2c fixes from Wolfram Sang:
    "i2c driver bugfixes (s3c2410, slave-eeprom, sh_mobile), size
    regression "bugfix" (i2c slave), documentation bugfix (st).

    Also, one documentation update (da9063), so some devicetrees can now
    be verified"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: sh_mobile: terminate DMA reads properly
    i2c: Only include slave support if selected
    i2c: s3c2410: fix ABBA deadlock by keeping clock prepared
    i2c: slave-eeprom: fix boundary check when using sysfs
    i2c: st: Rename clock reference to something that exists
    DT: i2c: Add devices handled by the da9063 MFD driver

    Linus Torvalds
     

31 Jan, 2015

3 commits

  • vlan_get_protocol() could not get the network protocol if an skb has an 802.1ad
    vlan tag or multiple vlans, which caused incorrect checksum calculation
    in several drivers.

    Fix vlan_get_protocol() to retrieve network protocol instead of incorrect
    vlan protocol.

    As the logic is the same as skb_network_protocol(), create a common helper
    function __vlan_get_protocol() and call it from existing functions.
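The idea of walking past every VLAN tag to reach the network protocol can be sketched in userspace as below. This parses a raw Ethernet frame and is not the kernel helper; `network_protocol` is a hypothetical name.

```python
import struct

ETH_P_IP = 0x0800
ETH_P_8021Q = 0x8100   # 802.1Q VLAN tag
ETH_P_8021AD = 0x88A8  # 802.1ad service tag
ETH_HLEN = 14          # dst MAC (6) + src MAC (6) + EtherType (2)
VLAN_HLEN = 4          # TCI (2) + encapsulated EtherType (2)


def network_protocol(frame: bytes) -> int:
    """Return the EtherType found after skipping all stacked VLAN tags."""
    if len(frame) < ETH_HLEN:
        raise ValueError("truncated Ethernet header")
    (proto,) = struct.unpack_from("!H", frame, 12)
    offset = ETH_HLEN
    while proto in (ETH_P_8021Q, ETH_P_8021AD):
        if len(frame) < offset + VLAN_HLEN:
            raise ValueError("truncated VLAN header")
        # Skip the 2-byte TCI and read the encapsulated EtherType.
        (proto,) = struct.unpack_from("!H", frame, offset + 2)
        offset += VLAN_HLEN
    return proto
```

Stopping after the first tag would report 0x8100 for a frame carrying an 802.1ad tag followed by an 802.1Q tag; the loop instead continues until a non-VLAN EtherType appears.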

    Signed-off-by: Toshiaki Makita
    Signed-off-by: David S. Miller

    Toshiaki Makita
     
  • Pull perf fixes from Ingo Molnar:
    "Mostly tooling fixes, but also an event groups fix, two PMU driver
    fixes and a CPU model variant addition"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Tighten (and fix) the grouping condition
    perf/x86/intel: Add model number for Airmont
    perf/rapl: Fix crash in rapl_scale()
    perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization
    perf probe: Fix probing kretprobes
    perf symbols: Introduce 'for' method to iterate over the symbols with a given name
    perf probe: Do not rely on map__load() filter to find symbols
    perf symbols: Introduce method to iterate symbols ordered by name
    perf symbols: Return the first entry with a given name in find_by_name method
    perf annotate: Fix memory leaks in LOCK handling
    perf annotate: Handle ins parsing failures
    perf scripting perl: Force to use stdbool
    perf evlist: Remove extraneous 'was' on error message

    Linus Torvalds
     
  • Pull quota and UDF fix from Jan Kara:
    "A fix for UDF to properly free preallocated blocks and a fix for quota
    so that Q_GETQUOTA quotactl reports correct numbers for XFS filesystem
    (and similarly Q_XGETQUOTA quotactl works properly for other
    filesystems)"

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units
    udf: Release preallocation on last writeable close

    Linus Torvalds
     

30 Jan, 2015

1 commit

  • The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
    "you should SIGSEGV" error, because the SIGSEGV case was generally
    handled by the caller - usually the architecture fault handler.

    That results in lots of duplication - all the architecture fault
    handlers end up doing very similar "look up vma, check permissions, do
    retries etc" - but it generally works. However, there are cases where
    the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.

    In particular, when accessing the stack guard page, libsigsegv expects a
    SIGSEGV. And it usually got one, because the stack growth is handled by
    that duplicated architecture fault handler.

    However, when the generic VM layer started propagating the error return
    from the stack expansion in commit fee7e49d4514 ("mm: propagate error
    from stack expansion even for guard page"), that now exposed the
    existing VM_FAULT_SIGBUS result to user space. And user space really
    expected SIGSEGV, not SIGBUS.

    To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
    duplicate architecture fault handlers about it. They all already have
    the code to handle SIGSEGV, so it's about just tying that new return
    value to the existing code, but it's all a bit annoying.

    This is the mindless minimal patch to do this. A more extensive patch
    would be to try to gather up the mostly shared fault handling logic into
    one generic helper routine, and long-term we really should do that
    cleanup.

    Just from this patch, you can generally see that most architectures just
    copied (directly or indirectly) the old x86 way of doing things, but in
    the meantime that original x86 model has been improved to hold the VM
    semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
    "newer" things, so it would be a good idea to bring all those
    improvements to the generic case and teach other architectures about
    them too.
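The dispatch that each architecture fault handler needs can be outlined roughly as follows. The flag values are placeholders chosen for this sketch, and the function name is hypothetical; only the idea of checking a separate SIGSEGV bit alongside the existing SIGBUS bit comes from the text above.

```python
# Placeholder bit values for illustration; not taken from the kernel headers.
VM_FAULT_SIGBUS = 0x02
VM_FAULT_SIGSEGV = 0x40


def signal_for_fault(result: int):
    """Pick the signal name to deliver for a fault result, or None."""
    if result & VM_FAULT_SIGSEGV:
        return "SIGSEGV"
    if result & VM_FAULT_SIGBUS:
        return "SIGBUS"
    return None
```

Without a distinct SIGSEGV bit, a failed stack expansion could only be reported through the SIGBUS path, which is the mismatch user space observed.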

    Reported-and-tested-by: Takashi Iwai
    Tested-by: Jan Engelhardt
    Acked-by: Heiko Carstens # "s390 still compiles and boots"
    Cc: linux-arch@vger.kernel.org
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

29 Jan, 2015

1 commit

  • LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that
    cover more than the RFC-specified maximum of 2 packets. These stretch
    ACKs can cause serious performance shortfalls in common congestion
    control algorithms that were designed and tuned years ago with
    receiver hosts that were not using LRO or GRO, and were instead
    politely ACKing every other packet.

    This patch series fixes Reno and CUBIC to handle stretch ACKs.

    This patch prepares for the upcoming stretch ACK bug fix patches. It
    adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future
    fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and
    changes all congestion control algorithms to pass in 1 for the ACKed
    count. It also changes tcp_slow_start() to return the number of packet
    ACK "credits" that were not processed in slow start mode, and can be
    processed by the congestion control module in additive increase mode.

    In future patches we will fix tcp_cong_avoid_ai() to handle stretch
    ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start
    and additive increase mode.
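The intended end state of the series, where slow start consumes some ACK credits and hands the rest to additive increase, might look like the sketch below. It is a simplified model, not the patch itself: the function names mirror the ones mentioned above, but the signatures, the cap at ssthresh, and the counter handling are assumptions.

```python
def slow_start(cwnd, ssthresh, acked):
    """Grow cwnd by up to `acked`; return (new_cwnd, unused_credits)."""
    if cwnd >= ssthresh:
        return cwnd, acked
    new_cwnd = min(cwnd + acked, ssthresh)  # assumed cap for this sketch
    return new_cwnd, acked - (new_cwnd - cwnd)


def cong_avoid_ai(cwnd, cwnd_cnt, w, acked):
    """Additive increase: +1 cwnd per `w` ACKed packets."""
    cwnd_cnt += acked
    if cwnd_cnt >= w:
        delta, cwnd_cnt = divmod(cwnd_cnt, w)
        cwnd += delta
    return cwnd, cwnd_cnt


def reno_on_ack(cwnd, cwnd_cnt, ssthresh, acked):
    """Process one (possibly stretch) ACK covering `acked` packets."""
    cwnd, acked = slow_start(cwnd, ssthresh, acked)
    if acked == 0:
        return cwnd, cwnd_cnt
    return cong_avoid_ai(cwnd, cwnd_cnt, cwnd, acked)
```

Passing a constant 1 for `acked`, as this preparatory patch does for all algorithms, reproduces the old per-ACK behaviour, so a stretch ACK covering five packets would be credited as one.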

    Reported-by: Eyal Perry
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neal Cardwell
     

28 Jan, 2015

3 commits

  • The fix from 9fc81d87420d ("perf: Fix events installation during
    moving group") was incomplete in that it failed to recognise that
    creating a group with events for different CPUs is semantically
    broken -- they cannot be co-scheduled.

    Furthermore, it leads to real breakage where, when we create an event
    for CPU Y and then migrate it to form a group on CPU X, the code gets
    confused where the counter is programmed -- triggered in practice
    as well by me via the perf fuzzer.

    Fix this by tightening the rules for creating groups. Only allow
    grouping of counters that can be co-scheduled in the same context.
    This means for the same task and/or the same cpu.

    Fixes: 9fc81d87420d ("perf: Fix events installation during moving group")
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
    tracks space limits and usage in 512-byte blocks. However VFS quotas
    track usage in bytes (as some filesystems require that) and we need to
    somehow pass this information. Up to now it wasn't a problem because we
    didn't do any unit conversion (thus VFS quota routines happily stuck the
    number of bytes into the d_bcount field of struct fs_disk_quota). Only if
    you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
    / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
    tried this but reportedly some Samba users hit the problem in practice.
    So if we want the interfaces to be compatible, we need to fix this.

    We bite the bullet and define another quota structure used for passing
    information from/to ->get_dqblk()/->set_dqblk(). It's somewhat sad we have
    to have more conversion routines in fs/quota/quota.c, and the extra copy
    of the quota structure slows down getting quota information by about 2%,
    but it seems cleaner than overloading e.g. units of d_bcount to bytes.
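The unit mismatch is easy to see with a pair of conversion helpers. These are hypothetical names for illustration, and rounding up when converting bytes to blocks is an assumption of this sketch.

```python
BLOCK_SHIFT = 9  # 512-byte blocks


def bytes_to_blocks(nbytes: int) -> int:
    """Convert a byte count to 512-byte blocks, rounding up (assumed)."""
    return (nbytes + (1 << BLOCK_SHIFT) - 1) >> BLOCK_SHIFT


def blocks_to_bytes(nblocks: int) -> int:
    """Convert a count of 512-byte blocks to bytes."""
    return nblocks << BLOCK_SHIFT
```

If a byte count of 1024 is stored unconverted in a field that the reader interprets as blocks, the reader sees `blocks_to_bytes(1024)`, i.e. 524288 bytes, which is the kind of bogus result described above.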

    CC: stable@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Pull networking fixes from David Miller:

    1) Don't OOPS on socket AIO, from Christoph Hellwig.

    2) Scheduled scans should be aborted upon RFKILL, from Emmanuel
    Grumbach.

    3) Fix sleep in atomic context in kvaser_usb, from Ahmed S Darwish.

    4) Fix RCU locking across copy_to_user() in bpf code, from Alexei
    Starovoitov.

    5) Lots of crash, memory leak, short TX packet et al bug fixes in
    sh_eth from Ben Hutchings.

    6) Fix memory corruption in SCTP wrt. INIT collisions, from Daniel
    Borkmann.

    7) Fix return value logic for poll handlers in netxen, enic, and bnx2x.
    From Eric Dumazet and Govindarajulu Varadarajan.

    8) Header length calculation fix in mac80211 from Fred Chou.

    9) mv643xx_eth doesn't handle highmem correctly in non-TSO code paths.
    From Ezequiel Garcia.

    10) udp_diag has bogus logic in its hash chain skipping, copy the same fix
    tcp diag used. From Herbert Xu.

    11) amd-xgbe programs wrong rx flow control register, from Thomas
    Lendacky.

    12) Fix race leading to use after free in ping receive path, from Subash
    Abhinov Kasiviswanathan.

    13) Cache redirect routes otherwise we can get a heavy backlog of rcu
    jobs liberating DST_NOCACHE entries. From Hannes Frederic Sowa.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
    net: don't OOPS on socket aio
    stmmac: prevent probe drivers to crash kernel
    bnx2x: fix napi poll return value for repoll
    ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too
    sh_eth: Fix DMA-API usage for RX buffers
    sh_eth: Check for DMA mapping errors on transmit
    sh_eth: Ensure DMA engines are stopped before freeing buffers
    sh_eth: Remove RX overflow log messages
    ping: Fix race in free in receive path
    udp_diag: Fix socket skipping within chain
    can: kvaser_usb: Fix state handling upon BUS_ERROR events
    can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT
    can: kvaser_usb: Send correct context to URB completion
    can: kvaser_usb: Do not sleep in atomic context
    ipv4: try to cache dst_entries which would cause a redirect
    samples: bpf: relax test_maps check
    bpf: rcu lock must not be held when calling copy_to_user()
    net: sctp: fix slab corruption from use after free on INIT collisions
    net: mv643xx_eth: Fix highmem support in non-TSO egress path
    sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers
    ...

    Linus Torvalds
     

27 Jan, 2015

6 commits

  • Not caching dst_entries which cause redirects could be exploited by hosts
    on the same subnet, causing a severe DoS attack. This effect has been
    aggravated since commit f88649721268999 ("ipv4: fix dst race in sk_dst_get()").

    Lookups causing redirects will be allocated with DST_NOCACHE set which
    will force dst_release to free them via RCU. Unfortunately waiting for
    RCU grace period just takes too long, we can end up with >1M dst_entries
    waiting to be released and the system will run OOM. rcuos threads cannot
    catch up under high softirq load.

    Attaching the flag to emit a redirect later on to the specific skb allows
    us to cache those dst_entries thus reducing the pressure on allocation
    and deallocation.

    This issue was discovered by Marcelo Leitner.

    Cc: Julian Anastasov
    Signed-off-by: Marcelo Leitner
    Signed-off-by: Florian Westphal
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • Merge misc fixes from Andrew Morton:
    "Six fixes"

    * emailed patches from Andrew Morton:
    drivers/rtc/rtc-s5m.c: terminate s5m_rtc_id array with empty element
    printk: add dummy routine for when CONFIG_PRINTK=n
    mm/vmscan: fix highidx argument type
    memcg: remove extra newlines from memcg oom kill log
    x86, build: replace Perl script with Shell script
    mm: page_alloc: embed OOM killing naturally into allocation slowpath

    Linus Torvalds
     
  • Pull regulator fixes from Mark Brown:
    "One correctness fix here for the s2mps11 driver which would have
    resulted in some of the regulators being completely broken together
    with a fix for locking in regulator_put() (which is fortunately rarely
    called at all in practical systems)"

    * tag 'regulator-v3.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
    regulator: s2mps11: Fix wrong calculation of register offset
    regulator: core: fix race condition in regulator_put()

    Linus Torvalds
     
  • There are missing dummy routines for log_buf_addr_get() and
    log_buf_len_get() for when CONFIG_PRINTK is not set causing build
    failures.

    This patch adds these dummy routines at the appropriate location.

    Signed-off-by: Pranith Kumar
    Cc: Michael Ellerman
    Reviewed-by: Petr Mladek
    Acked-by: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pranith Kumar
     
  • The OOM killing invocation does a lot of duplicative checks against the
    task's allocation context. Rework it to take advantage of the existing
    checks in the allocator slowpath.

    The OOM killer is invoked when the allocator is unable to reclaim any
    pages but the allocation has to keep looping. Instead of having a check
    for __GFP_NORETRY hidden in oom_gfp_allowed(), just move the OOM
    invocation to the true branch of should_alloc_retry(). The __GFP_FS
    check from oom_gfp_allowed() can then be moved into the OOM avoidance
    branch in __alloc_pages_may_oom(), along with the PF_DUMPCORE test.

    __alloc_pages_may_oom() can then signal to the caller whether the OOM
    killer was invoked, instead of requiring it to duplicate the order and
    high_zoneidx checks to guess this when deciding whether to continue.

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Make the slave support depend on CONFIG_I2C_SLAVE. Otherwise it gets
    included unconditionally, even when it is not needed.

    I2C bus drivers which implement slave support must select
    I2C_SLAVE.

    Signed-off-by: Jean Delvare
    Signed-off-by: Wolfram Sang

    Jean Delvare
     

26 Jan, 2015

1 commit

  • Pull timer fixes from Thomas Gleixner:
    "A set of small fixes:

    - regression fix for exynos_mct clocksource

    - trivial build fix for kona clocksource

    - functional one liner fix for the sh_tmu clocksource

    - two validation fixes to prevent (root only) data corruption in the
    kernel via settimeofday and adjtimex. Tagged for stable"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time: adjtimex: Validate the ADJ_FREQUENCY values
    time: settimeofday: Validate the values of tv from user
    clocksource: sh_tmu: Set cpu_possible_mask to fix SMP broadcast
    clocksource: kona: fix __iomem annotation
    clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write

    Linus Torvalds
     

24 Jan, 2015

3 commits

  • Pull PCI fixes from Bjorn Helgaas:
    "These are fixes for:

    - a resource management problem that causes a Radeon "Fatal error
    during GPU init" on machines where the BIOS programmed an invalid
    Root Port window. This was a regression in v3.16.

    - an Atheros AR93xx device that doesn't handle PCI bus resets
    correctly. This was a regression in v3.14.

    - an out-of-date email address"

    * tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
    MAINTAINERS: Update Richard Zhu's email address
    sparc/PCI: Clip bridge windows to fit in upstream windows
    powerpc/PCI: Clip bridge windows to fit in upstream windows
    parisc/PCI: Clip bridge windows to fit in upstream windows
    mn10300/PCI: Clip bridge windows to fit in upstream windows
    microblaze/PCI: Clip bridge windows to fit in upstream windows
    ia64/PCI: Clip bridge windows to fit in upstream windows
    frv/PCI: Clip bridge windows to fit in upstream windows
    alpha/PCI: Clip bridge windows to fit in upstream windows
    x86/PCI: Clip bridge windows to fit in upstream windows
    PCI: Add pci_claim_bridge_resource() to clip window if necessary
    PCI: Add pci_bus_clip_resource() to clip to fit upstream window
    PCI: Pass bridge device, not bus, when updating bridge windows
    PCI: Mark Atheros AR93xx to avoid bus reset
    PCI: Add flag for devices where we can't use bus reset

    Linus Torvalds
     
  • Pull devicetree bug fixes and documentation updates from Grant Likely:
    "A few bugfixes for the new DT overlay feature, documentation updates,
    spelling corrections, and changes to MAINTAINERS. Nothing earth
    shattering here"

    * tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux:
    of/unittest: Overlays with sub-devices tests
    of/platform: Handle of_populate drivers in notifier
    of/overlay: Do not generate duplicate nodes
    devicetree: document the "qemu" and "virtio" vendor prefixes
    devicetree: document ARM bindings for QEMU's Firmware Config interface
    Documentation: of: fix typo in graph bindings
    dma-mapping: fix debug print to display correct dma_pfn_offset
    of: replace Asahi Kasei Corp vendor prefix
    ARM: dt: GIC: Spelling s/specific/specifier/, s/flaggs/flags/
    dt/bindings: arm-boards: Spelling s/pointong/pointing/
    MAINTAINERS: Update DT website and git repository
    MAINTAINERS: drop DT regex matching on of_get_property and of_match_table

    Linus Torvalds
     
  • Pull kvm fixes from Paolo Bonzini:
    "Three small fixes.

    Two for x86 and one avoids that sparse bails out"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: x86: SYSENTER emulation is broken
    KVM: x86: Fix of previously incomplete fix for CVE-2014-8480
    KVM: fix sparse warning in include/trace/events/kvm.h

    Linus Torvalds
     

23 Jan, 2015

1 commit

  • Pull module and param fixes from Rusty Russell:
    "Surprising number of fixes this merge window :(

    The first two are minor fallout from the param rework which went in
    this merge window.

    The next three are a series which fixes a longstanding (but never
    previously reported and unlikely, so no CC stable) race between
    kallsyms and freeing the init section.

    Finally, a minor cleanup as our module refcount will now be -1 during
    unload"

    * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    module: make module_refcount() a signed integer.
    module: fix race in kallsyms resolution during module load success.
    module: remove mod arg from module_free, rename module_memfree().
    module_arch_freeing_init(): new hook for archs before module->module_init freed.
    param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC
    param: initialize store function to NULL if not available.

    Linus Torvalds
     

22 Jan, 2015

2 commits


21 Jan, 2015

1 commit

  • Pull libata fixes from Tejun Heo:

    - Bartlomiej will be co-maintaining PATA portion of libata. git
    workflow will stay the same.

    - sata_sil24 wasn't happy with tag ordered submission. An option to
    restore the old tag allocation behavior is implemented for sil24.

    - a very old race condition in PIO host state machine which can trigger
    BUG fixed.

    - other driver-specific changes

    * 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
    libata: prevent HSM state change race between ISR and PIO
    libata: allow sata_sil24 to opt-out of tag ordered submission
    ata: pata_at91: depend on !ARCH_MULTIPLATFORM
    ahci: Remove Device ID for Intel Sunrise Point PCH
    ahci: Use dev_info() to inform about the lack of Device Sleep support
    libata: Whitelist SSDs that are known to properly return zeroes after TRIM
    sata_dwc_460ex: fix resource leak on error path
    ata: add MAINTAINERS entry for libata PATA drivers
    libata: clean up MAINTAINERS entries
    libata: export ata_get_cmd_descript()
    ahci_xgene: Fix the DMA state machine lockup for the ATA_CMD_PACKET PIO mode command.
    ahci_xgene: Fix the endianess issue in APM X-Gene SoC AHCI SATA controller driver.

    Linus Torvalds
     

20 Jan, 2015

3 commits

  • Pull networking fixes from David Miller:

    1) Socket addresses returned in the error queue need to be fully
    initialized before being passed on to userspace, fix from Willem de
    Bruijn.

    2) Interrupt handling fixes to davinci_emac driver from Tony Lindgren.

    3) Fix races between receive packet steering and cpu hotplug, from Eric
    Dumazet.

    4) Allowing netlink sockets to subscribe to unknown multicast groups
    leads to crashes, don't allow it. From Johannes Berg.

    5) One to many socket races in SCTP fixed by Daniel Borkmann.

    6) Put in a guard against the mis-use of ipv6 atomic fragments, from
    Hagen Paul Pfeifer.

    7) Fix promisc mode and ethtool crashes in sh_eth driver, from Ben
    Hutchings.

    8) NULL deref and double kfree fix in sxgbe driver from Girish K.S and
    Byungho An.

    9) cfg80211 deadlock fix from Arik Nemtsov.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
    s2io: use snprintf() as a safety feature
    r8152: remove sram_read
    r8152: remove generic_ocp_read before writing
    bgmac: activate irqs only if there is nothing to poll
    bgmac: register napi before the device
    sh_eth: Fix ethtool operation crash when net device is down
    sh_eth: Fix promiscuous mode on chips without TSU
    ipv6: stop sending PTB packets for MTU < 1280
    net: sctp: fix race for one-to-many sockets in sendmsg's auto associate
    genetlink: synchronize socket closing and family removal
    genetlink: disallow subscribing to unknown mcast groups
    genetlink: document parallel_ops
    net: rps: fix cpu unplug
    net: davinci_emac: Add support for emac on dm816x
    net: davinci_emac: Fix ioremap for devices with MDIO within the EMAC address space
    net: davinci_emac: Fix incomplete code for getting the phy from device tree
    net: davinci_emac: Free clock after checking the frequency
    net: davinci_emac: Fix runtime pm calls for davinci_emac
    net: davinci_emac: Fix hangs with interrupts
    ip: zero sockaddr returned on error queue
    ...

    Linus Torvalds
     
  • Nothing needs the module pointer any more, and the next patch will
    call it from RCU, where the module itself might no longer exist.
    Removing the arg is the safest approach.

    This just codifies the use of the module_alloc/module_free pattern
    which ftrace and bpf use.

    Signed-off-by: Rusty Russell
    Acked-by: Alexei Starovoitov
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: Ralf Baechle
    Cc: Ley Foon Tan
    Cc: Benjamin Herrenschmidt
    Cc: Chris Metcalf
    Cc: Steven Rostedt
    Cc: x86@kernel.org
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Masami Hiramatsu
    Cc: linux-cris-kernel@axis.com
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: nios2-dev@lists.rocketboards.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: sparclinux@vger.kernel.org
    Cc: netdev@vger.kernel.org

    Rusty Russell
     
  • Archs have been abusing module_free() to clean up their arch-specific
    allocations. Since module_free() is also (ab)used by BPF and trace code,
    let's keep it to simple allocations, and provide a hook called before
    that.

    This means that avr32, ia64, parisc and s390 no longer need to implement
    their own module_free() at all. avr32 doesn't need module_finalize()
    either.

    Signed-off-by: Rusty Russell
    Cc: Chris Metcalf
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linux-s390@vger.kernel.org

    Rusty Russell
     

19 Jan, 2015

4 commits

  • Ronny reports: https://bugzilla.kernel.org/show_bug.cgi?id=87101
    "Since commit 8a4aeec8d "libata/ahci: accommodate tag ordered
    controllers" the access to the harddisk on the first SATA-port is
    failing on its first access. The access to the harddisk on the
    second port is working normal.

    When reverting the above commit, access to both harddisks is working
    fine again."

    Maintain tag ordered submission as the default, but allow sata_sil24 to
    continue with the old behavior.

    Cc:
    Cc: Tejun Heo
    Reported-by: Ronny Hegewald
    Signed-off-by: Dan Williams
    Signed-off-by: Tejun Heo

    Dan Williams
     
  • The user can crash the kernel if it uses any of the existing NAT
    expressions from the wrong hook, so add some code to validate this
    when loading the rule.

    This patch introduces nft_chain_validate_hooks() which is based on
    an existing function in the bridge version of the reject expression.
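
    A minimal sketch of the kind of check described above, assuming each
    expression declares a bitmask of the hooks it may run from. The enum,
    struct and function names here are hypothetical stand-ins and are not
    the nf_tables definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook numbers, loosely modelled on the IPv4 netfilter hooks. */
enum hook {
    HOOK_PRE_ROUTING,
    HOOK_LOCAL_IN,
    HOOK_FORWARD,
    HOOK_LOCAL_OUT,
    HOOK_POST_ROUTING,
    HOOK_MAX
};

/* Hypothetical chain descriptor: only base chains are attached to a hook. */
struct chain {
    int is_base;
    unsigned int hooknum;
};

/*
 * Return 0 if the chain's hook is in the allowed mask, a negative errno
 * otherwise. Chains that are not attached to a hook are accepted here.
 */
static int chain_validate_hooks(const struct chain *chain,
                                unsigned int hook_flags)
{
    assert(chain != NULL);

    if (!chain->is_base)
        return 0;
    if (chain->hooknum >= HOOK_MAX)
        return -EINVAL;
    if (hook_flags & (1u << chain->hooknum))
        return 0;
    return -EOPNOTSUPP;
}
```

    Running the check when the rule is loaded means an expression placed
    on an unsupported hook is rejected with an error instead of being
    executed later from a context it was not written for.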

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • sparse complains about
    include/trace/events/kvm.h:163:1: error: directive in argument list
    include/trace/events/kvm.h:167:1: error: directive in argument list
    include/trace/events/kvm.h:169:1: error: directive in argument list
    and sparse is right. Preprocessing directives in an argument of a
    macro are undefined behaviour as of C99 6.10.3p11.

    Let's use an indirection to fix this.
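
    The pattern can be illustrated outside the kernel. The sketch below
    is not the code from kvm.h; HAVE_EXTRA_EVENT, SYM_TABLE and
    EXTRA_SYMS are hypothetical names chosen for illustration. The
    conditional is resolved into a helper macro first, so no directive
    appears inside the macro's argument list:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature switch standing in for a config option. */
#define HAVE_EXTRA_EVENT 1

struct sym {
    int value;
    const char *name;
};

/* Variadic macro that wraps its arguments in an initializer list. */
#define SYM_TABLE(...) { __VA_ARGS__ }

/*
 * Undefined behaviour per C99 6.10.3p11 (do not write this):
 *
 *   static const struct sym table[] = SYM_TABLE(
 *       {1, "read"},
 *   #ifdef HAVE_EXTRA_EVENT
 *       {2, "extra"},
 *   #endif
 *       {3, "write"}
 *   );
 */

/* Indirection: evaluate the conditional outside any macro invocation. */
#ifdef HAVE_EXTRA_EVENT
#define EXTRA_SYMS {2, "extra"},
#else
#define EXTRA_SYMS
#endif

static const struct sym table[] = SYM_TABLE(
    {1, "read"},
    EXTRA_SYMS
    {3, "write"}
);

static size_t sym_count(void)
{
    return sizeof(table) / sizeof(table[0]);
}

/* Look up a value by name; returns -1 if the name is not in the table. */
static int sym_value(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sym_count(); i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].value;
    return -1;
}
```

    With HAVE_EXTRA_EVENT undefined, EXTRA_SYMS expands to nothing and
    the table simply has two entries.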

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Paolo Bonzini

    Christian Borntraeger
     
  • Pull input subsystem fixes from Dmitry Torokhov.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: uinput - fix ioctl nr overflow for UI_GET_SYSNAME/VERSION
    Input: I8042 - add Acer Aspire 7738 to the nomux list
    Input: elantech - support new ICs types for version 4
    Input: i8042 - reset keyboard to fix Elantech touchpad detection
    MAINTAINERS: remove Dmitry Torokhov's alternate address

    Linus Torvalds
     

17 Jan, 2015

1 commit

  • In addition to the problem Jeff Layton reported, I looked at the code
    and reproduced the same warning by subscribing and removing the genl
    family with a socket still open. This is a fairly tricky race which
    originates in the fact that generic netlink allows the family to go
    away while sockets are still open - unlike regular netlink which has
    a module refcount for every open socket so in general this cannot be
    triggered.

    Trying to resolve this issue by the obvious locking isn't possible as
    it will result in deadlocks between unregistration and group unbind
    notification (which incidentally lockdep doesn't find due to the home
    grown locking in the netlink table.)

    To really resolve this, introduce a "closing socket" reference counter
    (for generic netlink only, as it's the only affected family) in the
    core netlink code and use that in generic netlink to wait for all the
    sockets that are being closed at the same time as a generic netlink
    family is removed.
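
    As a rough userspace analogue of that counter (a sketch only: the
    kernel patch uses its own primitives, and every name below is
    hypothetical), a close path can bump a counter around its unbind
    work while the removal path blocks until the counter drops to zero:

```c
#include <assert.h>
#include <pthread.h>

/* Number of sockets currently in the middle of being closed. */
static pthread_mutex_t closing_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t closing_done = PTHREAD_COND_INITIALIZER;
static int closing_count;

/* Called at the start of the close path, before unbind notifications. */
static void socket_close_begin(void)
{
    pthread_mutex_lock(&closing_lock);
    closing_count++;
    pthread_mutex_unlock(&closing_lock);
}

/* Called once the close path has finished its unbind notifications. */
static void socket_close_end(void)
{
    pthread_mutex_lock(&closing_lock);
    assert(closing_count > 0);
    if (--closing_count == 0)
        pthread_cond_broadcast(&closing_done);
    pthread_mutex_unlock(&closing_lock);
}

/* Called by family removal: block until no close is still in flight. */
static void family_remove_wait(void)
{
    pthread_mutex_lock(&closing_lock);
    while (closing_count > 0)
        pthread_cond_wait(&closing_done, &closing_lock);
    pthread_mutex_unlock(&closing_lock);
}

/* Snapshot of the counter, for inspection only. */
static int sockets_closing(void)
{
    pthread_mutex_lock(&closing_lock);
    int n = closing_count;
    pthread_mutex_unlock(&closing_lock);
    return n;
}
```

    The removal path waits only on the counter and takes no lock that
    the unbind path needs, which is how this shape avoids the deadlock
    that the obvious locking would cause.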

    This fixes the race where, when a socket is closed, it should call
    the unbind, but if the family is removed at the same time the unbind
    will not find it, leading to the warning. The real problem though is
    that in this case the unbind could actually find a new family that is
    registered to have a multicast group with the same ID, and call its
    mcast_unbind(), leading to confusion.

    Also remove the warning since it would still trigger, but is now no
    longer a problem.

    This also moves the code in af_netlink.c to before unreferencing the
    module to avoid having the same problem in the normal non-genl case.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg