16 Feb, 2015

1 commit

  • Pull CRIS changes from Jesper Nilsson.

    * tag 'cris-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris:
    CRIS: Whitespace cleanup
    CRIS: macro whitespace fixes in uaccess.h
    CRIS: uaccess: fix sparse errors
    CRISv32: Remove unnecessary KERN_INFO from sync_serial
    CRIS: Fix missing NR_CPUS in menuconfig
    CRISv32: Avoid warning of unused variable
    CRIS: Avoid warning in cris mm/fault.c
    CRIS: Export csum_partial_copy_nocheck

    Linus Torvalds
     

15 Feb, 2015

4 commits


13 Feb, 2015

1 commit

  • If an attacker can cause a controlled kernel stack overflow, overwriting
    the restart block is a very juicy exploit target. This is because the
    restart_block is held in the same memory allocation as the kernel stack.

    Moving the restart block to struct task_struct prevents this exploit by
    making the restart_block harder to locate.

    Note that there are other fields in thread_info that are also easy
    targets, at least on some architectures.

    It's also a decent simplification, since the restart code is more or less
    identical on all architectures.

    [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
    Signed-off-by: Andy Lutomirski
    Cc: Thomas Gleixner
    Cc: Al Viro
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: David Miller
    Acked-by: Richard Weinberger
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Vineet Gupta
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Steven Miao
    Cc: Mark Salter
    Cc: Aurelien Jacquiot
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: David Howells
    Cc: Richard Kuo
    Cc: "Luck, Tony"
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Jonas Bonn
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Acked-by: Michael Ellerman (powerpc)
    Tested-by: Michael Ellerman (powerpc)
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Chen Liqin
    Cc: Lennox Wu
    Cc: Chris Metcalf
    Cc: Guan Xuetao
    Cc: Chris Zankel
    Cc: Max Filippov
    Cc: Oleg Nesterov
    Cc: Guenter Roeck
    Signed-off-by: James Hogan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

12 Feb, 2015

1 commit

  • LKP has triggered a compiler warning after my recent patch "mm: account
    pmd page tables to the process":

    mm/mmap.c: In function 'exit_mmap':
    >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]

    The code:

    > 2857 WARN_ON(mm_nr_pmds(mm) >
    2858 round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);

    In this, on tile, we have FIRST_USER_ADDRESS defined as 0. round_up() has
    the same type -- int. PUD_SHIFT.

    I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
    long. On every arch for consistency.

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

11 Feb, 2015

1 commit


30 Jan, 2015

1 commit

  • The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
    "you should SIGSEGV" error, because the SIGSEGV case was generally
    handled by the caller - usually the architecture fault handler.

    That results in lots of duplication - all the architecture fault
    handlers end up doing very similar "look up vma, check permissions, do
    retries etc" - but it generally works. However, there are cases where
    the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.

    In particular, when accessing the stack guard page, libsigsegv expects a
    SIGSEGV. And it usually got one, because the stack growth is handled by
    that duplicated architecture fault handler.

    However, when the generic VM layer started propagating the error return
    from the stack expansion in commit fee7e49d4514 ("mm: propagate error
    from stack expansion even for guard page"), that now exposed the
    existing VM_FAULT_SIGBUS result to user space. And user space really
    expected SIGSEGV, not SIGBUS.

    To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
    duplicate architecture fault handlers about it. They all already have
    the code to handle SIGSEGV, so it's about just tying that new return
    value to the existing code, but it's all a bit annoying.

    This is the mindless minimal patch to do this. A more extensive patch
    would be to try to gather up the mostly shared fault handling logic into
    one generic helper routine, and long-term we really should do that
    cleanup.

    Just from this patch, you can generally see that most architectures just
    copied (directly or indirectly) the old x86 way of doing things, but in
    the meantime that original x86 model has been improved to hold the VM
    semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
    "newer" things, so it would be a good idea to bring all those
    improvements to the generic case and teach other architectures about
    them too.

    Reported-and-tested-by: Takashi Iwai
    Tested-by: Jan Engelhardt
    Acked-by: Heiko Carstens # "s390 still compiles and boots"
    Cc: linux-arch@vger.kernel.org
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

29 Jan, 2015

4 commits


26 Jan, 2015

1 commit


20 Jan, 2015

2 commits

  • Convert file->f_dentry->d_inode to file_inode() so as to get layered
    filesystems right.

    Found with: git grep '[.>]f_dentry'

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Nothing needs the module pointer any more, and the next patch will
    call it from RCU, where the module itself might no longer exist.
    Removing the arg is the safest approach.

    This just codifies the use of the module_alloc/module_free pattern
    which ftrace and bpf use.

    Signed-off-by: Rusty Russell
    Acked-by: Alexei Starovoitov
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: Ralf Baechle
    Cc: Ley Foon Tan
    Cc: Benjamin Herrenschmidt
    Cc: Chris Metcalf
    Cc: Steven Rostedt
    Cc: x86@kernel.org
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Masami Hiramatsu
    Cc: linux-cris-kernel@axis.com
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Cc: nios2-dev@lists.rocketboards.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: sparclinux@vger.kernel.org
    Cc: netdev@vger.kernel.org

    Rusty Russell
     

20 Dec, 2014

14 commits


12 Dec, 2014

1 commit

  • Pull networking updates from David Miller:

    1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

    2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers. Thanks to Al Viro
    and Herbert Xu.

    3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

    4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

    5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

    6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

    7) Add a variable of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

    8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

    9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets. From Alexei
    Starovoitov.

    10) Support TSO/LSO in sunvnet driver, from David L Stevens.

    11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

    12) Remote checksum offload, from Tom Herbert.

    13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

    14) Add MPLS support to openvswitch, from Simon Horman.

    15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

    16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet. This tries to resolve the conflicting goals between the
    desired handling of bulk vs. RPC-like traffic.

    17) Allow userspace to ask for the CPU upon what a packet was
    received/steered, via SO_INCOMING_CPU. From Eric Dumazet.

    18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

    19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

    20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

    21) Add VLAN packet scheduler action, from Jiri Pirko.

    22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
    Fix race condition between vxlan_sock_add and vxlan_sock_release
    net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
    net/mlx4: Add support for A0 steering
    net/mlx4: Refactor QUERY_PORT
    net/mlx4_core: Add explicit error message when rule doesn't meet configuration
    net/mlx4: Add A0 hybrid steering
    net/mlx4: Add mlx4_bitmap zone allocator
    net/mlx4: Add a check if there are too many reserved QPs
    net/mlx4: Change QP allocation scheme
    net/mlx4_core: Use tasklet for user-space CQ completion events
    net/mlx4_core: Mask out host side virtualization features for guests
    net/mlx4_en: Set csum level for encapsulated packets
    be2net: Export tunnel offloads only when a VxLAN tunnel is created
    gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
    cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
    net: fec: only enable mdio interrupt before phy device link up
    net: fec: clear all interrupt events to support i.MX6SX
    net: fec: reset fep link status in suspend function
    net: sock: fix access via invalid file descriptor
    net: introduce helper macro for_each_cmsghdr
    ...

    Linus Torvalds
     

11 Dec, 2014

1 commit

  • As there are now no remaining users of arch_fast_hash(), lets kill
    it entirely.

    This basically reverts commit 71ae8aac3e19 ("lib: introduce arch
    optimized hash library") and follow-up work, that is f.e., commit
    237217546d44 ("lib: hash: follow-up fixups for arch hash"),
    commit e3fec2f74f7f ("lib: Add missing arch generic-y entries for
    asm-generic/hash.h") and last but not least commit 6a02652df511
    ("perf tools: Fix include for non x86 architectures").

    Cc: Francesco Fusco
    Cc: Thomas Graf
    Cc: Arnaldo Carvalho de Melo
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

06 Dec, 2014

1 commit

  • introduce new setsockopt() command:

    setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))

    where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
    and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER

    setsockopt() calls bpf_prog_get() which increments refcnt of the program,
    so it doesn't get unloaded while socket is using the program.

    The same eBPF program can be attached to multiple sockets.

    User task exit automatically closes socket which calls sk_filter_uncharge()
    which decrements refcnt of eBPF program

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     

12 Nov, 2014

1 commit

  • Alternative to RPS/RFS is to use hardware support for multiple
    queues.

    Then split a set of million of sockets into worker threads, each
    one using epoll() to manage events on its own socket pool.

    Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
    know after accept() or connect() on which queue/cpu a socket is managed.

    We normally use one cpu per RX queue (IRQ smp_affinity being properly
    set), so remembering on socket structure which cpu delivered last packet
    is enough to solve the problem.

    After accept(), connect(), or even file descriptor passing around
    processes, applications can use :

    int cpu;
    socklen_t len = sizeof(cpu);

    getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);

    And use this information to put the socket into the right silo
    for optimal performance, as all networking stack should run
    on the appropriate cpu, without need to send IPI (RPS/RFS).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Oct, 2014

1 commit

  • write{b,w,l}_relaxed are implemented by some architectures in order to
    permit memory-mapped I/O accesses with weaker barrier semantics than the
    non-relaxed variants.

    This patch adds dummy macros for the write accessors to Cris, in the same
    vein as the dummy definitions for the relaxed read accessors.

    Cc: Mikael Starvik
    Acked-by: Jesper Nilsson
    Signed-off-by: Will Deacon

    Will Deacon
     

13 Oct, 2014

2 commits

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
    Hansen)

    - Various sched/idle refinements for better idle handling (Nicolas
    Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

    - sched/numa updates and optimizations (Rik van Riel)

    - sysbench speedup (Vincent Guittot)

    - capacity calculation cleanups/refactoring (Vincent Guittot)

    - Various cleanups to thread group iteration (Oleg Nesterov)

    - Double-rq-lock removal optimization and various refactorings
    (Kirill Tkhai)

    - various sched/deadline fixes

    ... and lots of other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
    sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
    sched/fair: Delete resched_cpu() from idle_balance()
    sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
    sched: Improve sysbench performance by fixing spurious active migration
    sched/x86: Fix up typo in topology detection
    x86, sched: Add new topology for multi-NUMA-node CPUs
    sched/rt: Use resched_curr() in task_tick_rt()
    sched: Use rq->rd in sched_setaffinity() under RCU read lock
    sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
    sched: Use dl_bw_of() under RCU read lock
    sched/fair: Remove duplicate code from can_migrate_task()
    sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
    sched: print_rq(): Don't use tasklist_lock
    sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
    sched: Fix the task-group check in tg_has_rt_tasks()
    sched/fair: Leverage the idle state info when choosing the "idlest" cpu
    sched: Let the scheduler see CPU idle states
    sched/deadline: Fix inter- exclusive cpusets migrations
    sched/deadline: Clear dl_entity params when setscheduling to different class
    sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
    ...

    Linus Torvalds
     
  • Pull arch atomic cleanups from Ingo Molnar:
    "This is a series kept separate from the main locking tree, which
    cleans up and improves various details in the atomics type handling:

    - Remove the unused atomic_or_long() method

    - Consolidate and compress atomic ops implementations between
    architectures, to reduce linecount and to make it easier to add new
    ops.

    - Rewrite generic atomic support to only require cmpxchg() from an
    architecture - generate all other methods from that"

    * 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read()
    locking, mips: Fix atomics
    locking, sparc64: Fix atomics
    locking,arch: Rewrite generic atomic support
    locking,arch,xtensa: Fold atomic_ops
    locking,arch,sparc: Fold atomic_ops
    locking,arch,sh: Fold atomic_ops
    locking,arch,powerpc: Fold atomic_ops
    locking,arch,parisc: Fold atomic_ops
    locking,arch,mn10300: Fold atomic_ops
    locking,arch,mips: Fold atomic_ops
    locking,arch,metag: Fold atomic_ops
    locking,arch,m68k: Fold atomic_ops
    locking,arch,m32r: Fold atomic_ops
    locking,arch,ia64: Fold atomic_ops
    locking,arch,hexagon: Fold atomic_ops
    locking,arch,cris: Fold atomic_ops
    locking,arch,avr32: Fold atomic_ops
    locking,arch,arm64: Fold atomic_ops
    locking,arch,arm: Fold atomic_ops
    ...

    Linus Torvalds
     

10 Oct, 2014

1 commit


03 Oct, 2014

1 commit

  • Use the much more reader friendly ACCESS_ONCE() instead of the cast to volatile.
    This is purely a stylistic change.

    Signed-off-by: Pranith Kumar
    Acked-by: Jesper Nilsson
    Acked-by: Hans-Christian Egtvedt
    Acked-by: Max Filippov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Link: http://lkml.kernel.org/r/1411482607-20948-1-git-send-email-bobby.prani@gmail.com
    Signed-off-by: Ingo Molnar

    Pranith Kumar
     

19 Sep, 2014

1 commit

  • schedule(), io_schedule() and schedule_timeout() always return
    with TASK_RUNNING state set, so one more setting is unnecessary.

    (All places in patch are visible good, only exception is
    kiblnd_scheduler() from:

    drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c

    Its schedule() is one line above standard 3 lines of unified diff)

    No places where set_current_state() is used for mb().

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Peter Zijlstra (Intel)
    Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai
    Cc: Alasdair Kergon
    Cc: Anil Belur
    Cc: Arnd Bergmann
    Cc: Dave Kleikamp
    Cc: David Airlie
    Cc: David Howells
    Cc: Dmitry Eremin
    Cc: Frank Blaschka
    Cc: Greg Kroah-Hartman
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Isaac Huang
    Cc: James E.J. Bottomley
    Cc: James E.J. Bottomley
    Cc: J. Bruce Fields
    Cc: Jeff Dike
    Cc: Jesper Nilsson
    Cc: Jiri Slaby
    Cc: Laura Abbott
    Cc: Liang Zhen
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Masaru Nomura
    Cc: Michael Opdenacker
    Cc: Mikael Starvik
    Cc: Mike Snitzer
    Cc: Neil Brown
    Cc: Oleg Drokin
    Cc: Peng Tao
    Cc: Richard Weinberger
    Cc: Robert Love
    Cc: Steven Rostedt
    Cc: Trond Myklebust
    Cc: Ursula Braun
    Cc: Zi Shen Lim
    Cc: devel@driverdev.osuosl.org
    Cc: dm-devel@redhat.com
    Cc: dri-devel@lists.freedesktop.org
    Cc: fcoe-devel@open-fcoe.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux390@de.ibm.com
    Cc: linux-afs@lists.infradead.org
    Cc: linux-cris-kernel@axis.com
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linux-raid@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: qla2xxx-upstream@qlogic.com
    Cc: user-mode-linux-devel@lists.sourceforge.net
    Cc: user-mode-linux-user@lists.sourceforge.net
    Signed-off-by: Ingo Molnar

    Kirill Tkhai