25 Feb, 2013

1 commit


24 Feb, 2013

2 commits


04 Feb, 2013

4 commits


29 Dec, 2012

1 commit


13 Dec, 2012

1 commit

  • Pull big execve/kernel_thread/fork unification series from Al Viro:
    "All architectures are converted to new model. Quite a bit of that
    stuff is actually shared with architecture trees; in such cases it's
    literally shared branch pulled by both, not a cherry-pick.

    A lot of ugliness and black magic is gone (-3KLoC total in this one):

    - kernel_thread()/kernel_execve()/sys_execve() redesign.

    We don't do syscalls from kernel anymore for either kernel_thread()
    or kernel_execve():

    kernel_thread() is essentially clone(2) with callback run before we
    return to userland, the callbacks either never return or do
    successful do_execve() before returning.

    kernel_execve() is a wrapper for do_execve() - it doesn't need to
    do transition to user mode anymore.

    As a result kernel_thread() and kernel_execve() are
    arch-independent now - they live in kernel/fork.c and fs/exec.c
    resp. sys_execve() is also in fs/exec.c and it's completely
    architecture-independent.

    - daemonize() is gone, along with its parts in fs/*.c

    - struct pt_regs * is no longer passed to do_fork/copy_process/
    copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.

    - sys_fork()/sys_vfork()/sys_clone() unified; some architectures
    still need wrappers (ones with callee-saved registers not saved in
    pt_regs on syscall entry), but the main part of those suckers is in
    kernel/fork.c now."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
    do_coredump(): get rid of pt_regs argument
    print_fatal_signal(): get rid of pt_regs argument
    ptrace_signal(): get rid of unused arguments
    get rid of ptrace_signal_deliver() arguments
    new helper: signal_pt_regs()
    unify default ptrace_signal_deliver
    flagday: kill pt_regs argument of do_fork()
    death to idle_regs()
    don't pass regs to copy_process()
    flagday: don't pass regs to copy_thread()
    bfin: switch to generic vfork, get rid of pointless wrappers
    xtensa: switch to generic clone()
    openrisc: switch to use of generic fork and clone
    unicore32: switch to generic clone(2)
    score: switch to generic fork/vfork/clone
    c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
    take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
    mn10300: switch to generic fork/vfork/clone
    h8300: switch to generic fork/vfork/clone
    tile: switch to generic clone()
    ...

    Conflicts:
    arch/microblaze/include/asm/Kbuild

    Linus Torvalds
     

04 Dec, 2012

1 commit


17 Nov, 2012

2 commits


29 Oct, 2012

1 commit


27 Oct, 2012

1 commit

  • The Montgomery Multiply, Montgomery Square, and Multiple-Precision
    Multiply instructions work by loading a combination of the floating
    point and multiple register windows worth of integer registers
    with the inputs.

    These values are 64-bit. But for 32-bit userland processes we only
    save the low 32-bits of each integer register during a register spill.
    This is because the register window save area is in the user stack and
    has a fixed layout.

    Therefore, the only way to use these instruction in 32-bit mode is to
    perform the following sequence:

    1) Load the top-32bits of a choosen integer register with a sentinel,
    say "-1". This will be in the outer-most register window.

    The idea is that we're trying to see if the outer-most register
    window gets spilled, and thus the 64-bit values were truncated.

    2) Load all the inputs for the montmul/montsqr/mpmul instruction,
    down to the inner-most register window.

    3) Execute the opcode.

    4) Traverse back up to the outer-most register window.

    5) Check the sentinel, if it's still "-1" store the results.
    Otherwise retry the entire sequence.

    This retry is extremely troublesome. If you're just unlucky and an
    interrupt or other trap happens, it'll push that outer-most window to
    the stack and clear the sentinel when we restore it.

    We could retry forever and never make forward progress if interrupts
    arrive at a fast enough rate (consider perf events as one example).
    So we have do limited retries and fallback to software which is
    extremely non-deterministic.

    Luckily it's very straightforward to provide a mechanism to let
    32-bit applications use a 64-bit stack. Stacks in 64-bit mode are
    biased by 2047 bytes, which means that the lowest bit is set in the
    actual %sp register value.

    So if we see bit zero set in a 32-bit application's stack we treat
    it like a 64-bit stack.

    Runtime detection of such a facility is tricky, and cumbersome at
    best. For example, just trying to use a biased stack and seeing if it
    works is hard to recover from (the signal handler will need to use an
    alt stack, plus something along the lines of longjmp). Therefore, we
    add a system call to report a bitmask of arch specific features like
    this in a cheap and less hairy way.

    With help from Andy Polyakov.

    Signed-off-by: David S. Miller

    David S. Miller
     

17 Oct, 2012

1 commit


11 May, 2012

1 commit

  • Use the 32-bit compat keyctl() syscall wrapper on Sparc64 for Sparc32 binary
    compatibility.

    Without this, keyctl(KEYCTL_INSTANTIATE_IOV) is liable to malfunction as it
    uses an iovec array read from userspace - though the kernel should survive this
    as it checks pointers and sizes anyway.

    I think all the other keyctl() function should just work, provided (a) the top
    32-bits of each 64-bit argument register are cleared prior to invoking the
    syscall routine, and the 32-bit address space is right at the 0-end of the
    64-bit address space. Most of the arguments are 32-bit anyway, and so for
    those clearing is not required.

    Signed-off-by: David Howells
    cc: sparclinux@vger.kernel.org
    cc: stable@vger.kernel.org

    David Howells
     

01 Nov, 2011

1 commit


30 Aug, 2011

1 commit


27 Aug, 2011

1 commit


29 May, 2011

1 commit

  • 32bit and 64bit on x86 are tested and working. The rest I have looked
    at closely and I can't find any problems.

    setns is an easy system call to wire up. It just takes two ints so I
    don't expect any weird architecture porting problems.

    While doing this I have noticed that we have some architectures that are
    very slow to get new system calls. cris seems to be the slowest where
    the last system calls wired up were preadv and pwritev. avr32 is weird
    in that recvmmsg was wired up but never declared in unistd.h. frv is
    behind with perf_event_open being the last syscall wired up. On h8300
    the last system call wired up was epoll_wait. On m32r the last system
    call wired up was fallocate. mn10300 has recvmmsg as the last system
    call wired up. The rest seem to at least have syncfs wired up which was
    new in the 2.6.39.

    v2: Most of the architecture support added by Daniel Lezcano
    v3: ported to v2.6.36-rc4 by: Eric W. Biederman
    v4: Moved wiring up of the system call to another patch
    v5: ported to v2.6.39-rc6
    v6: rebased onto parisc-next and net-next to avoid syscall conflicts.
    v7: ported to Linus's latest post 2.6.39 tree.

    >  arch/blackfin/include/asm/unistd.h     |    3 ++-
    >  arch/blackfin/mach-common/entry.S      |    1 +
    Acked-by: Mike Frysinger

    Oh - ia64 wiring looks good.
    Acked-by: Tony Luck

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

06 May, 2011

1 commit

  • This patch adds a multiple message send syscall and is the send
    version of the existing recvmmsg syscall. This is heavily
    based on the patch by Arnaldo that added recvmmsg.

    I wrote a microbenchmark to test the performance gains of using
    this new syscall:

    http://ozlabs.org/~anton/junkcode/sendmmsg_test.c

    The test was run on a ppc64 box with a 10 Gbit network card. The
    benchmark can send both UDP and RAW ethernet packets.

    64B UDP

    batch pkts/sec
    1 804570
    2 872800 (+ 8 %)
    4 916556 (+14 %)
    8 939712 (+17 %)
    16 952688 (+18 %)
    32 956448 (+19 %)
    64 964800 (+20 %)

    64B raw socket

    batch pkts/sec
    1 1201449
    2 1350028 (+12 %)
    4 1461416 (+22 %)
    8 1513080 (+26 %)
    16 1541216 (+28 %)
    32 1553440 (+29 %)
    64 1557888 (+30 %)

    We see a 20% improvement in throughput on UDP send and 30%
    on raw socket send.

    [ Add sparc syscall entries. -DaveM ]

    Signed-off-by: Anton Blanchard
    Signed-off-by: David S. Miller

    Anton Blanchard
     

30 Mar, 2011

1 commit


19 Mar, 2011

1 commit


17 Aug, 2010

1 commit


13 Mar, 2010

2 commits

  • On an architecture that supports 32-bit compat we need to override the
    reported machine in uname with the 32-bit value. Instead of doing this
    separately in every architecture introduce a COMPAT_UTS_MACHINE define in
    and apply it directly in sys_newuname().

    Signed-off-by: Christoph Hellwig
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Hirokazu Takata
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: "Luck, Tony"
    Cc: James Morris
    Cc: Andreas Schwab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Add a generic implementation of the ipc demultiplexer syscall. Except for
    s390 and sparc64 all implementations of the sys_ipc are nearly identical.

    There are slight differences in the types of the parameters, where mips
    and powerpc as the only 64-bit architectures with sys_ipc use unsigned
    long for the "third" argument as it gets casted to a pointer later, while
    it traditionally is an "int" like most other paramters. frv goes even
    further and uses unsigned long for all parameters execept for "ptr" which
    is a pointer type everywhere. The change from int to unsigned long for
    "third" and back to "int" for the others on frv should be fine due to the
    in-register calling conventions for syscalls (we already had a similar
    issue with the generic sys_ptrace), but I'd prefer to have the arch
    maintainers looks over this in details.

    Except for that h8300, m68k and m68knommu lack an impplementation of the
    semtimedop sub call which this patch adds, and various architectures have
    gets used - at least on i386 it seems superflous as the compat code on
    x86-64 and ia64 doesn't even bother to implement it.

    [akpm@linux-foundation.org: add sys_ipc to sys_ni.c]
    Signed-off-by: Christoph Hellwig
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Hirokazu Takata
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Reviewed-by: H. Peter Anvin
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: "Luck, Tony"
    Cc: James Morris
    Cc: Andreas Schwab
    Acked-by: Jesper Nilsson
    Acked-by: Russell King
    Acked-by: David Howells
    Acked-by: Kyle McMartin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

04 Mar, 2010

1 commit


11 Dec, 2009

2 commits


08 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
    mac80211: fix reorder buffer release
    iwmc3200wifi: Enable wimax core through module parameter
    iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
    iwmc3200wifi: Coex table command does not expect a response
    iwmc3200wifi: Update wiwi priority table
    iwlwifi: driver version track kernel version
    iwlwifi: indicate uCode type when fail dump error/event log
    iwl3945: remove duplicated event logging code
    b43: fix two warnings
    ipw2100: fix rebooting hang with driver loaded
    cfg80211: indent regulatory messages with spaces
    iwmc3200wifi: fix NULL pointer dereference in pmkid update
    mac80211: Fix TX status reporting for injected data frames
    ath9k: enable 2GHz band only if the device supports it
    airo: Fix integer overflow warning
    rt2x00: Fix padding bug on L2PAD devices.
    WE: Fix set events not propagated
    b43legacy: avoid PPC fault during resume
    b43: avoid PPC fault during resume
    tcp: fix a timewait refcnt race
    ...

    Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
    CTL_UNNUMBERED removed) in
    kernel/sysctl_check.c
    net/ipv4/sysctl_net_ipv4.c
    net/ipv6/addrconf.c
    net/sctp/sysctl.c

    Linus Torvalds
     

06 Nov, 2009

1 commit


13 Oct, 2009

1 commit

  • Meaning receive multiple messages, reducing the number of syscalls and
    net stack entry/exit operations.

    Next patches will introduce mechanisms where protocols that want to
    optimize this operation will provide an unlocked_recvmsg operation.

    This takes into account comments made by:

    . Paul Moore: sock_recvmsg is called only for the first datagram,
    sock_recvmsg_nosec is used for the rest.

    . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
    works in the same fashion as the ppoll one.

    If the underlying protocol returns a datagram with MSG_OOB set, this
    will make recvmmsg return right away with as many datagrams (+ the OOB
    one) it has received so far.

    . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
    datagrams and then recvmsg returns an error, recvmmsg will return
    the successfully received datagrams, store the error and return it
    in the next call.

    This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
    where we will be able to acquire the lock only at batch start and end, not at
    every underlying recvmsg call.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in one, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks goes to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility is not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

04 Sep, 2009

1 commit


28 Jul, 2009

1 commit


16 Jun, 2009

1 commit


08 Apr, 2009

1 commit


28 Mar, 2009

1 commit


20 Jan, 2009

1 commit


14 Jan, 2009

1 commit