17 Jan, 2012

1 commit

  • * 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Move from trace_syscalls.c to asm/syscall.h
    x86, um: Fix typo in 32-bit system call modifications
    um: Use $(srctree) not $(KBUILD_SRC)
    x86, um: Mark system call tables readonly
    x86, um: Use the same style generated syscall tables as native
    um: Generate headers before generating user-offsets.s
    um: Run host archheaders, allow use of host generated headers
    kbuild, headers.sh: Don't make archheaders explicitly
    x86, syscall: Allow syscall offset to be symbolic
    x86, syscall: Re-fix typo in comment
    x86: Simplify syscallhdr.sh
    x86: Generate system call tables and unistd_*.h from tables
    checksyscalls: Use arch/x86/syscalls/syscall_32.tbl as source
    x86: Machine-readable syscall tables and scripts to process them
    trace: Include in trace_syscalls.c
    x86-64, ia32: Move compat_ni_syscall into C and its own file
    x86-64, syscall: Adjust comment spacing and remove typo
    kbuild: Add support for an "archheaders" target
    kbuild: Add support for installing generated asm headers

    Linus Torvalds
     

06 Dec, 2011

2 commits

  • system_call_after_swapgs doesn't really benefit from forcing
    alignment from it - quite the opposite, native code needlessly
    so far got a big NOP instruction inserted in front of it. Xen
    being the only user of the separate entry point can well live
    with the branch going to three bytes into a cache line.

    The compatibility mode ptregs entry points for one can make use
    of the GLOBAL() macro, and should be suitably aligned. Their
    shared continuation point (ia32_ptregs_common) otoh doesn't need
    to be global at all, but should continue to be properly aligned.

    Signed-off-by: Jan Beulich
    Reviewed-by: Andi Kleen
    Link: http://lkml.kernel.org/r/4ED4CEEA020000780006407D@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • GET_THREAD_INFO() involves a memory read immediately followed by
    an "sub" on the value read, in turn (in several cases)
    immediately followed by a use of the calculated value as the
    base address of a memory access. This combination of
    instructions has a non-negligible potential for stalls.

    In the system call entry point code, however, the (fixed) offset
    of the stack pointer from the end of the stack is generally
    known, and hence we can instead avoid the memory load and
    subtract, and instead do the memory reference using %rsp as the
    base register. To do so in a legible fashion, introduce a
    THREAD_INFO() macro which, provided a register (generally %rsp)
    and the known offset from the end of the stack, produces a
    suitable memory access operand.

    The patch attempts to only touch the fast paths (no auditing and
    alike), but manages to do so only in the 64-bit entry point
    case; the compatibility mode entry points have so many
    interdependencies between their various branch targets that it
    was necessary to also adjust the slow paths to eliminate the
    risk of having missed some register dependency during code
    analysis.

    Signed-off-by: Jan Beulich
    Reviewed-by: Andi Kleen
    Link: http://lkml.kernel.org/r/4ED4CD690200007800064075@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

18 Nov, 2011

2 commits

  • Generate system call tables and unistd_*.h automatically from the
    tables in arch/x86/syscalls. All other information, like NR_syscalls,
    is auto-generated, some of which is in asm-offsets_*.c.

    This allows us to keep all the system call information in one place,
    and allows for kernel space and user space to see different
    information; this is currently used for the ia32 system call numbers
    when building the 64-bit kernel, but will be used by the x32 ABI in
    the near future.

    This also removes some gratuitious differences between i386, x86-64
    and ia32; in particular, now all system call tables are generated with
    the same mechanism.

    Cc: H. J. Lu
    Cc: Sam Ravnborg
    Cc: Michal Marek
    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     
  • Move compat_ni_syscall out of ia32entry.S and into its own .c file.
    Although this is a trivial function, it is not performance-critical,
    and this will simplify further cleanups.

    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     

01 Nov, 2011

1 commit

  • The basic idea behind cross memory attach is to allow MPI programs doing
    intra-node communication to do a single copy of the message rather than a
    double copy of the message via shared memory.

    The following patch attempts to achieve this by allowing a destination
    process, given an address and size from a source process, to copy memory
    directly from the source process into its own address space via a system
    call. There is also a symmetrical ability to copy from the current
    process's address space into a destination process's address space.

    - Use of /proc/pid/mem has been considered, but there are issues with
    using it:
    - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
    written to would need to be contiguous.
    - Currently mem_read allows only processes who are currently
    ptrace'ing the target and are still able to ptrace the target to read
    from the target. This check could possibly be moved to the open call,
    but its not clear exactly what race this restriction is stopping
    (reason appears to have been lost)
    - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
    domain socket is a bit ugly from a userspace point of view,
    especially when you may have hundreds if not (eventually) thousands
    of processes that all need to do this with each other
    - Doesn't allow for some future use of the interface we would like to
    consider adding in the future (see below)
    - Interestingly reading from /proc/pid/mem currently actually
    involves two copies! (But this could be fixed pretty easily)

    As mentioned previously use of vmsplice instead was considered, but has
    problems. Since you need the reader and writer working co-operatively if
    the pipe is not drained then you block. Which requires some wrapping to
    do non blocking on the send side or polling on the receive. In all to all
    communication it requires ordering otherwise you can deadlock. And in the
    example of many MPI tasks writing to one MPI task vmsplice serialises the
    copying.

    There are some cases of MPI collectives where even a single copy interface
    does not get us the performance gain we could. For example in an
    MPI_Reduce rather than copy the data from the source we would like to
    instead use it directly in a mathops (say the reduce is doing a sum) as
    this would save us doing a copy. We don't need to keep a copy of the data
    from the source. I haven't implemented this, but I think this interface
    could in the future do all this through the use of the flags - eg could
    specify the math operation and type and the kernel rather than just
    copying the data would apply the specified operation between the source
    and destination and store it in the destination.

    Although we don't have a "second user" of the interface (though I've had
    some nibbles from people who may be interested in using it for intra
    process messaging which is not MPI). This interface is something which
    hardware vendors are already doing for their custom drivers to implement
    fast local communication. And so in addition to this being useful for
    OpenMPI it would mean the driver maintainers don't have to fix things up
    when the mm changes.

    There was some discussion about how much faster a true zero copy would
    go. Here's a link back to the email with some testing I did on that:

    http://marc.info/?l=linux-mm&m=130105930902915&w=2

    There is a basic man page for the proposed interface here:

    http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

    This has been implemented for x86 and powerpc, other architecture should
    mainly (I think) just need to add syscall numbers for the process_vm_readv
    and process_vm_writev. There are 32 bit compatibility versions for
    64-bit kernels.

    For arch maintainers there are some simple tests to be able to quickly
    verify that the syscalls are working correctly here:

    http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

    Signed-off-by: Chris Yeoh
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Arnd Bergmann
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: James Morris
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     

27 Aug, 2011

1 commit


04 Jun, 2011

2 commits


29 May, 2011

1 commit

  • 32bit and 64bit on x86 are tested and working. The rest I have looked
    at closely and I can't find any problems.

    setns is an easy system call to wire up. It just takes two ints so I
    don't expect any weird architecture porting problems.

    While doing this I have noticed that we have some architectures that are
    very slow to get new system calls. cris seems to be the slowest where
    the last system calls wired up were preadv and pwritev. avr32 is weird
    in that recvmmsg was wired up but never declared in unistd.h. frv is
    behind with perf_event_open being the last syscall wired up. On h8300
    the last system call wired up was epoll_wait. On m32r the last system
    call wired up was fallocate. mn10300 has recvmmsg as the last system
    call wired up. The rest seem to at least have syncfs wired up which was
    new in the 2.6.39.

    v2: Most of the architecture support added by Daniel Lezcano
    v3: ported to v2.6.36-rc4 by: Eric W. Biederman
    v4: Moved wiring up of the system call to another patch
    v5: ported to v2.6.39-rc6
    v6: rebased onto parisc-next and net-next to avoid syscall conflicts.
    v7: ported to Linus's latest post 2.6.39 tree.

    >  arch/blackfin/include/asm/unistd.h     |    3 ++-
    >  arch/blackfin/mach-common/entry.S      |    1 +
    Acked-by: Mike Frysinger

    Oh - ia64 wiring looks good.
    Acked-by: Tony Luck

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

06 May, 2011

1 commit

  • This patch adds a multiple message send syscall and is the send
    version of the existing recvmmsg syscall. This is heavily
    based on the patch by Arnaldo that added recvmmsg.

    I wrote a microbenchmark to test the performance gains of using
    this new syscall:

    http://ozlabs.org/~anton/junkcode/sendmmsg_test.c

    The test was run on a ppc64 box with a 10 Gbit network card. The
    benchmark can send both UDP and RAW ethernet packets.

    64B UDP

    batch pkts/sec
    1 804570
    2 872800 (+ 8 %)
    4 916556 (+14 %)
    8 939712 (+17 %)
    16 952688 (+18 %)
    32 956448 (+19 %)
    64 964800 (+20 %)

    64B raw socket

    batch pkts/sec
    1 1201449
    2 1350028 (+12 %)
    4 1461416 (+22 %)
    8 1513080 (+26 %)
    16 1541216 (+28 %)
    32 1553440 (+29 %)
    64 1557888 (+30 %)

    We see a 20% improvement in throughput on UDP send and 30%
    on raw socket send.

    [ Add sparc syscall entries. -DaveM ]

    Signed-off-by: Anton Blanchard
    Signed-off-by: David S. Miller

    Anton Blanchard
     

21 Mar, 2011

1 commit

  • It is frequently useful to sync a single file system, instead of all
    mounted file systems via sync(2):

    - On machines with many mounts, it is not at all uncommon for some of
    them to hang (e.g. unresponsive NFS server). sync(2) will get stuck on
    those and may never get to the one you do care about (e.g., /).
    - Some applications write lots of data to the file system and then
    want to make sure it is flushed to disk. Calling fsync(2) on each
    file introduces unnecessary ordering constraints that result in a large
    amount of sub-optimal writeback/flush/commit behavior by the file
    system.

    There are currently two ways (that I know of) to sync a single super_block:

    - BLKFLSBUF ioctl on the block device: That also invalidates the bdev
    mapping, which isn't usually desirable, and doesn't work for non-block
    file systems.
    - 'mount -o remount,rw' will call sync_filesystem as an artifact of the
    current implemention. Relying on this little-known side effect for
    something like data safety sounds foolish.

    Both of these approaches require root privileges, which some applications
    do not have (nor should they need?) given that sync(2) is an unprivileged
    operation.

    This patch introduces a new system call syncfs(2) that takes an fd and
    syncs only the file system it references. Maybe someday we can

    $ sync /some/path

    and not get

    sync: ignoring all arguments

    The syscall is motivated by comments by Al and Christoph at the last LSF.
    syncfs(2) seems like an appropriate name given statfs(2).

    A similar ioctl was also proposed a while back, see
    http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2

    Signed-off-by: Sage Weil
    Signed-off-by: Al Viro

    Sage Weil
     

16 Mar, 2011

3 commits

  • * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, binutils, xen: Fix another wrong size directive
    x86: Remove dead config option X86_CPU
    x86: Really print supported CPUs if PROCESSOR_SELECT=y
    x86: Fix a bogus unwind annotation in lib/semaphore_32.S
    um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S
    x86: Remove unused bits from lib/thunk_*.S
    x86: Use {push,pop}_cfi in more places
    x86-64: Add CFI annotations to lib/rwsem_64.S
    x86, asm: Cleanup unnecssary macros in asm-offsets.c
    x86, system.h: Drop unused __SAVE/__RESTORE macros
    x86: Use bitmap library functions
    x86: Partly unify asm-offsets_{32,64}.c
    x86: Reduce back the alignment of the per-CPU data section

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
    posix-clocks: Check write permissions in posix syscalls
    hrtimer: Remove empty hrtimer_init_hres_timer()
    hrtimer: Update hrtimer->state documentation
    hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
    timers: Export CLOCK_BOOTTIME via the posix timers interface
    timers: Add CLOCK_BOOTTIME hrtimer base
    time: Extend get_xtime_and_monotonic_offset() to also return sleep
    time: Introduce get_monotonic_boottime and ktime_get_boottime
    hrtimers: extend hrtimer base code to handle more then 2 clockids
    ntp: Remove redundant and incorrect parameter check
    mn10300: Switch do_timer() to xtimer_update()
    posix clocks: Introduce dynamic clocks
    posix-timers: Cleanup namespace
    posix-timers: Add support for fd based clocks
    x86: Add clock_adjtime for x86
    posix-timers: Introduce a syscall for clock tuning.
    time: Splitout compat timex accessors
    ntp: Add ADJ_SETOFFSET mode bit
    time: Introduce timekeeping_inject_offset
    posix-timer: Update comment
    ...

    Fix up new system-call-related conflicts in
    arch/x86/ia32/ia32entry.S
    arch/x86/include/asm/unistd_32.h
    arch/x86/include/asm/unistd_64.h
    arch/x86/kernel/syscall_table_32.S
    (name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
    due to movement of get_jiffies_64() in:
    kernel/time.c

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
    perf probe: Clean up probe_point_lazy_walker() return value
    tracing: Fix irqoff selftest expanding max buffer
    tracing: Align 4 byte ints together in struct tracer
    tracing: Export trace_set_clr_event()
    tracing: Explain about unstable clock on resume with ring buffer warning
    ftrace/graph: Trace function entry before updating index
    ftrace: Add .ref.text as one of the safe areas to trace
    tracing: Adjust conditional expression latency formatting.
    tracing: Fix event alignment: skb:kfree_skb
    tracing: Fix event alignment: mce:mce_record
    tracing: Fix event alignment: kvm:kvm_hv_hypercall
    tracing: Fix event alignment: module:module_request
    tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
    tracing: Remove lock_depth from event entry
    perf header: Stop using 'self'
    perf session: Use evlist/evsel for managing perf.data attributes
    perf top: Don't let events to eat up whole header line
    perf top: Fix events overflow in top command
    ring-buffer: Remove unused #include <linux/trace_irq.h>
    tracing: Add an 'overwrite' trace_option.
    ...

    Linus Torvalds
     

15 Mar, 2011

1 commit


09 Mar, 2011

1 commit

  • Put x86 entry code into a separate link section: .entry.text.

    Separating the entry text section seems to have performance
    benefits - caused by more efficient instruction cache usage.

    Running hackbench with perf stat --repeat showed that the change
    compresses the icache footprint. The icache load miss rate went
    down by about 15%:

    before patch:
    19417627 L1-icache-load-misses ( +- 0.147% )

    after patch:
    16490788 L1-icache-load-misses ( +- 0.180% )

    The motivation of the patch was to fix a particular kprobes
    bug that relates to the entry text section, the performance
    advantage was discovered accidentally.

    Whole perf output follows:

    - results for current tip tree:

    Performance counter stats for './hackbench/hackbench 10' (500 runs):

    19417627 L1-icache-load-misses ( +- 0.147% )
    2676914223 instructions # 0.497 IPC ( +- 0.079% )
    5389516026 cycles ( +- 0.144% )

    0.206267711 seconds time elapsed ( +- 0.138% )

    - results for current tip tree with the patch applied:

    Performance counter stats for './hackbench/hackbench 10' (500 runs):

    16490788 L1-icache-load-misses ( +- 0.180% )
    2717734941 instructions # 0.502 IPC ( +- 0.079% )
    5414756975 cycles ( +- 0.148% )

    0.206747566 seconds time elapsed ( +- 0.137% )

    Signed-off-by: Jiri Olsa
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Nick Piggin
    Cc: Eric Dumazet
    Cc: masami.hiramatsu.pt@hitachi.com
    Cc: ananth@in.ibm.com
    Cc: davem@davemloft.net
    Cc: 2nddept-manager@sdl.hitachi.co.jp
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

01 Mar, 2011

1 commit


02 Feb, 2011

1 commit


15 Sep, 2010

2 commits

  • In commit d4d6715, we reopened an old hole for a 64-bit ptracer touching a
    32-bit tracee in system call entry. A %rax value set via ptrace at the
    entry tracing stop gets used whole as a 32-bit syscall number, while we
    only check the low 32 bits for validity.

    Fix it by truncating %rax back to 32 bits after syscall_trace_enter,
    in addition to testing the full 64 bits as has already been added.

    Reported-by: Ben Hawkes
    Signed-off-by: Roland McGrath
    Signed-off-by: H. Peter Anvin

    Roland McGrath
     
  • On 64 bits, we always, by necessity, jump through the system call
    table via %rax. For 32-bit system calls, in theory the system call
    number is stored in %eax, and the code was testing %eax for a valid
    system call number. At one point we loaded the stored value back from
    the stack to enforce zero-extension, but that was removed in checkin
    d4d67150165df8bf1cc05e532f6efca96f907cab. An actual 32-bit process
    will not be able to introduce a non-zero-extended number, but it can
    happen via ptrace.

    Instead of re-introducing the zero-extension, test what we are
    actually going to use, i.e. %rax. This only adds a handful of REX
    prefixes to the code.

    Reported-by: Ben Hawkes
    Signed-off-by: H. Peter Anvin
    Cc:
    Cc: Roland McGrath
    Cc: Andrew Morton

    H. Peter Anvin
     

11 Aug, 2010

2 commits

  • As pointed out by Jiri Slaby: when I resolved the the 32-bit x85 system
    call entry tables for prlimit (due to the conflict with fanotify), I
    forgot to add the numbering in comments that we do for every fifth entry.

    Reported-by: Jiri Slaby
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
    unistd: add __NR_prlimit64 syscall numbers
    rlimits: implement prlimit64 syscall
    rlimits: switch more rlimit syscalls to do_prlimit
    rlimits: redo do_setrlimit to more generic do_prlimit
    rlimits: add rlimit64 structure
    rlimits: do security check under task_lock
    rlimits: allow setrlimit to non-current tasks
    rlimits: split sys_setrlimit
    rlimits: selinux, do rlimits changes under task_lock
    rlimits: make sure ->rlim_max never grows in sys_setrlimit
    rlimits: add task_struct to update_rlimit_cpu
    rlimits: security, add task_struct to setrlimit

    Fix up various system call number conflicts. We not only added fanotify
    system calls in the meantime, but asm-generic/unistd.h added a wait4
    along with a range of reserved per-architecture system calls.

    Linus Torvalds
     

28 Jul, 2010

2 commits


16 Jul, 2010

1 commit


21 Apr, 2010

1 commit

  • Before commit e28cbf22933d0c0ccaf3c4c27a1a263b41f73859 ("improve
    sys_newuname() for compat architectures") 64-bit x86 had a private
    implementation of sys_uname which was just called sys_uname, which other
    architectures used for the old uname.

    Due to some merge issues with the uname refactoring patches we ended up
    calling the old uname version for both the old and new system call
    slots, which lead to the domainname filed never be set which caused
    failures with libnss_nis.

    Reported-and-tested-by: Andy Isaacson
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

13 Mar, 2010

2 commits

  • Add generic implementations of the old and really old uname system calls.
    Note that sh only implements sys_olduname but not sys_oldolduname, but I'm
    not going to bother with another ifdef for that special case.

    m32r implemented an old uname but never wired it up, so kill it, too.

    Signed-off-by: Christoph Hellwig
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Hirokazu Takata
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: "Luck, Tony"
    Cc: James Morris
    Cc: Andreas Schwab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Add a generic implementation of the old select() syscall, which expects
    its argument in a memory block and switch all architectures over to use
    it.

    Signed-off-by: Christoph Hellwig
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Hirokazu Takata
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Reviewed-by: H. Peter Anvin
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: "Luck, Tony"
    Cc: James Morris
    Acked-by: Andreas Schwab
    Acked-by: Russell King
    Acked-by: Greg Ungerer
    Acked-by: David Howells
    Cc: Andreas Schwab
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

11 Dec, 2009

1 commit


08 Dec, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
    mac80211: fix reorder buffer release
    iwmc3200wifi: Enable wimax core through module parameter
    iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
    iwmc3200wifi: Coex table command does not expect a response
    iwmc3200wifi: Update wiwi priority table
    iwlwifi: driver version track kernel version
    iwlwifi: indicate uCode type when fail dump error/event log
    iwl3945: remove duplicated event logging code
    b43: fix two warnings
    ipw2100: fix rebooting hang with driver loaded
    cfg80211: indent regulatory messages with spaces
    iwmc3200wifi: fix NULL pointer dereference in pmkid update
    mac80211: Fix TX status reporting for injected data frames
    ath9k: enable 2GHz band only if the device supports it
    airo: Fix integer overflow warning
    rt2x00: Fix padding bug on L2PAD devices.
    WE: Fix set events not propagated
    b43legacy: avoid PPC fault during resume
    b43: avoid PPC fault during resume
    tcp: fix a timewait refcnt race
    ...

    Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
    CTL_UNNUMBERED removed) in
    kernel/sysctl_check.c
    net/ipv4/sysctl_net_ipv4.c
    net/ipv6/addrconf.c
    net/sctp/sysctl.c

    Linus Torvalds
     

19 Nov, 2009

1 commit


06 Nov, 2009

1 commit


26 Oct, 2009

1 commit

  • Restoring %ebp after the call to audit_syscall_exit() is not
    only unnecessary (because the register didn't get clobbered),
    but in the sysenter case wasn't even doing the right thing: It
    loaded %ebp from a location below the top of stack (RBP <
    ARGOFFSET), i.e. arbitrary kernel data got passed back to user
    mode in the register.

    Signed-off-by: Jan Beulich
    Acked-by: Roland McGrath
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

13 Oct, 2009

1 commit

  • Meaning receive multiple messages, reducing the number of syscalls and
    net stack entry/exit operations.

    Next patches will introduce mechanisms where protocols that want to
    optimize this operation will provide an unlocked_recvmsg operation.

    This takes into account comments made by:

    . Paul Moore: sock_recvmsg is called only for the first datagram,
    sock_recvmsg_nosec is used for the rest.

    . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
    works in the same fashion as the ppoll one.

    If the underlying protocol returns a datagram with MSG_OOB set, this
    will make recvmmsg return right away with as many datagrams (+ the OOB
    one) it has received so far.

    . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
    datagrams and then recvmsg returns an error, recvmmsg will return
    the successfully received datagrams, store the error and return it
    in the next call.

    This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
    where we will be able to acquire the lock only at batch start and end, not at
    every underlying recvmsg call.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     

01 Oct, 2009

1 commit

  • While 32-bit processes can't directly access R8...R15, they can
    gain access to these registers by temporarily switching themselves
    into 64-bit mode.

    Therefore, registers not preserved anyway by called C functions
    (i.e. R8...R11) must be cleared prior to returning to user mode.

    Signed-off-by: Jan Beulich
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in one, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks goes to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility is not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

09 Aug, 2009

1 commit


01 May, 2009

2 commits