24 Dec, 2014

1 commit

  • In Linux 3.18 and below, GCC hoists the lsl instructions in the
    pvclock code all the way to the beginning of __vdso_clock_gettime,
    slowing the non-paravirt case significantly. For unknown reasons,
    presumably related to the removal of a branch, the performance issue
    is gone as of

    e76b027e6408 x86,vdso: Use LSL unconditionally for vgetcpu

    but I don't trust GCC enough to expect the problem to stay fixed.

    There should be no correctness issue, because the __getcpu calls in
    __vdso_clock_gettime were never necessary in the first place.

    Note to stable maintainers: In 3.18 and below, depending on
    configuration, gcc 4.9.2 generates code like this:

    9c3:  44 0f 03 e8    lsl %ax,%r13d
    9c7:  45 89 eb       mov %r13d,%r11d
    9ca:  0f 03 d8       lsl %ax,%ebx

    This patch won't apply as is to any released kernel, but I'll send a
    trivial backported version if needed.

    Fixes: 51c19b4f5927 ("x86: vdso: pvclock gettime support")
    Cc: stable@vger.kernel.org # 3.8+
    Cc: Marcelo Tosatti
    Acked-by: Paolo Bonzini
    Signed-off-by: Andy Lutomirski

    Andy Lutomirski
     

03 Nov, 2014

1 commit

  • LSL is faster than RDTSCP and works everywhere; there's no need to
    switch between them depending on CPU.
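
    As a userspace sketch of the trick this relies on (assumptions: x86-64
    Linux, the per-CPU GDT selector 0x7b, and the cpu-in-low-12-bits limit
    encoding used by kernels of this era; on other architectures the
    sketch falls back to sched_getcpu()):

    ```c
    /* Sketch only: selector value and bit layout are assumptions taken
     * from the x86 headers of this era, not guaranteed stable ABI. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    #define PER_CPU_SEG 0x7b          /* GDT entry 15, RPL 3 (assumed) */

    static unsigned lsl_getcpu(void)
    {
    #if defined(__x86_64__)
        unsigned p;
        /* LSL loads a segment limit: no memory access and no
         * serialization, which is why it beats RDTSCP for this job. */
        __asm__("lsl %1, %0" : "=r"(p) : "r"(PER_CPU_SEG));
        return p & 0xfff;             /* low 12 bits hold the CPU number */
    #else
        return (unsigned)sched_getcpu();  /* portability fallback */
    #endif
    }

    int main(void)
    {
        printf("lsl cpu=%u sched_getcpu=%d\n", lsl_getcpu(), sched_getcpu());
        return 0;
    }
    ```

    The segment limit is refreshed by the kernel on context switch, so a
    plain unprivileged load of it is enough to identify the current CPU.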

    Signed-off-by: Andy Lutomirski
    Cc: Andi Kleen
    Link: http://lkml.kernel.org/r/72f73d5ec4514e02bba345b9759177ef03742efb.1414706021.git.luto@amacapital.net
    Signed-off-by: Thomas Gleixner

    Andy Lutomirski
     

19 Mar, 2014

1 commit

  • This patch adds VDSO time support for the IA32 emulation layer.

    Because the kernel headers assume an LP64 compiler, where the sizes
    of long and pointer types differ from those seen by a 32-bit
    compiler, some type hacking is necessary for optimal performance.

    The vsyscall_gtod_data structure must be rearranged to serve 32- and
    64-bit code access at the same time:

    - The seqcount_t was replaced by an unsigned; this makes
    vsyscall_gtod_data independent of kernel configuration and internal
    functions.
    - All kernel-internal structures are replaced by fixed-size elements
    that work for both 32- and 64-bit access.
    - The inner struct clock was removed to pack the whole struct.

    The "unsigned seq" is handled by functions derived from seqcount_t.
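
    A minimal sketch of that read protocol over a plain "unsigned seq"
    (the helper names and the field set are illustrative, not the
    kernel's):

    ```c
    #include <stdio.h>

    /* Fixed-size layout usable from both 32- and 64-bit readers. */
    struct gtod_data {
        unsigned seq;                   /* odd while an update is in flight */
        unsigned long long cycle_last;
    };

    /* Wait out any in-progress update, then return the snapshot counter. */
    static unsigned gtod_read_begin(const struct gtod_data *g)
    {
        unsigned s;
        while ((s = __atomic_load_n(&g->seq, __ATOMIC_ACQUIRE)) & 1)
            ;                           /* writer active: spin */
        return s;
    }

    /* Nonzero if the data may have changed since gtod_read_begin(). */
    static int gtod_read_retry(const struct gtod_data *g, unsigned start)
    {
        return __atomic_load_n(&g->seq, __ATOMIC_ACQUIRE) != start;
    }

    int main(void)
    {
        struct gtod_data g = { .seq = 0, .cycle_last = 42 };
        unsigned long long t;
        unsigned seq;

        do {                            /* retry until a consistent snapshot */
            seq = gtod_read_begin(&g);
            t = g.cycle_last;
        } while (gtod_read_retry(&g, seq));

        printf("cycle_last = %llu\n", t);
        return 0;
    }
    ```

    Because only a bare unsigned and two loads are involved, the same
    layout can be read by 32-bit and 64-bit vdso code alike.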

    Signed-off-by: Stefani Seibold
    Link: http://lkml.kernel.org/r/1395094933-14252-11-git-send-email-stefani@seibold.net
    Signed-off-by: H. Peter Anvin

    Stefani Seibold
     

06 Jun, 2011

1 commit

  • It's unnecessary overhead in code that's supposed to be highly
    optimized. Removing it allows us to remove one of the two
    syscall instructions in the vsyscall page.

    The only sensible use for it is for UML users, and it doesn't
    fully address inconsistent vsyscall results on UML. The real
    fix for UML is to stop using vsyscalls entirely.

    Signed-off-by: Andy Lutomirski
    Cc: Jesper Juhl
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Arjan van de Ven
    Cc: Jan Beulich
    Cc: richard -rw- weinberger
    Cc: Mikael Pettersson
    Cc: Andi Kleen
    Cc: Brian Gerst
    Cc: Louis Rilling
    Cc: Valdis.Kletnieks@vt.edu
    Cc: pageexec@freemail.hu
    Link: http://lkml.kernel.org/r/973ae803fe76f712da4b2740e66dccf452d3b1e4.1307292171.git.luto@mit.edu
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

24 May, 2011

1 commit

  • Variables that are shared between the vdso and the kernel are
    currently a bit of a mess. They are each defined with their own
    magic, they are accessed differently in the kernel, the vsyscall page,
    and the vdso, and one of them (vsyscall_clock) doesn't even really
    exist.

    This changes them all to use a common mechanism. All of them are
    declared in vvar.h with a fixed address (validated by the linker
    script). In the kernel (as before), they look like ordinary
    read-write variables. In the vsyscall page and the vdso, they are
    accessed through a new macro VVAR, which gives read-only access.

    The vdso is now loaded verbatim into memory without any fixups. As a
    side bonus, access from the vdso is faster because a level of
    indirection is removed.

    While we're at it, pack jiffies and vgetcpu_mode into the same
    cacheline.
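
    A sketch of the declaration trick (the macro names mirror the
    commit's VVAR; the backing page, offsets, and variable choices here
    are illustrative, and the real vvar.h pins the address via the
    linker script):

    ```c
    #include <assert.h>

    /* A page standing in for the fixed-address vvar area. */
    static unsigned char vvar_page[4096] __attribute__((aligned(16)));

    /* One declaration per variable: vdso/vsyscall code gets a read-only
     * alias at a fixed offset into that area.  The kernel side (an
     * ordinary read-write variable at the same address) is not modeled
     * here. */
    #define DECLARE_VVAR(offset, type, name) \
        static const type *const name##_ptr = \
            (const type *)(vvar_page + (offset));
    #define VVAR(name) (*name##_ptr)

    DECLARE_VVAR(0x0, unsigned long, jiffies)
    DECLARE_VVAR(0x8, int, vgetcpu_mode)

    int main(void)
    {
        /* Simulate the kernel writing through its read-write alias by
         * poking the backing page directly. */
        *(unsigned long *)(vvar_page + 0x0) = 12345;
        assert(VVAR(jiffies) == 12345);
        assert((const unsigned char *)&VVAR(vgetcpu_mode) == vvar_page + 0x8);
        return 0;
    }
    ```

    Reads compile down to a load from a known constant address, which is
    why the extra level of indirection (and its cost) disappears.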

    Signed-off-by: Andy Lutomirski
    Cc: Andi Kleen
    Cc: Linus Torvalds
    Cc: "David S. Miller"
    Cc: Eric Dumazet
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/%3C7357882fbb51fa30491636a7b6528747301b7ee9.1306156808.git.luto%40mit.edu%3E
    Signed-off-by: Thomas Gleixner

    Andy Lutomirski
     

22 Aug, 2009

1 commit

  • After talking with some application writers who want very fast, but
    not fine-grained, timestamps, I decided to implement two new
    clock_ids for clock_gettime(): CLOCK_REALTIME_COARSE and
    CLOCK_MONOTONIC_COARSE, which return the time at the last tick. This
    is very fast because we don't have to access any hardware (which can
    be very painful if you're using something like the acpi_pm
    clocksource), and we can even use the vdso clock_gettime() method to
    avoid the syscall. The only trade-off is that you only get
    tick-grained time resolution.

    This isn't a new idea; I know Ingo has a patch in the -rt tree that
    made the vsyscall gettimeofday() return coarse-grained time when the
    vsyscall64 sysctl was set to 2. However, that affects all
    applications on a system.

    With this method, applications can choose the proper speed/granularity
    trade-off for themselves.
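
    Opting in is a one-line change at the call site, for example:

    ```c
    #define _GNU_SOURCE   /* CLOCK_*_COARSE are Linux-specific */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec fine, coarse, res;

        clock_gettime(CLOCK_REALTIME, &fine);          /* full resolution */
        clock_gettime(CLOCK_REALTIME_COARSE, &coarse); /* last-tick value,
                                                          no hardware access */
        clock_getres(CLOCK_REALTIME_COARSE, &res);     /* tick granularity */

        printf("fine:   %lld.%09ld\n", (long long)fine.tv_sec, fine.tv_nsec);
        printf("coarse: %lld.%09ld (resolution %ld ns)\n",
               (long long)coarse.tv_sec, coarse.tv_nsec, res.tv_nsec);
        return 0;
    }
    ```

    clock_getres() reports the coarse clock's granularity, so an
    application can check at runtime whether tick resolution is good
    enough for its purposes.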

    Signed-off-by: John Stultz
    Cc: Andi Kleen
    Cc: nikolag@ca.ibm.com
    Cc: Darren Hart
    Cc: arjan@infradead.org
    Cc: jonathan@jonmasters.org
    LKML-Reference:
    Signed-off-by: Thomas Gleixner

    john stultz
     

23 Oct, 2008

2 commits