11 Jan, 2012

1 commit

  • Abstract the code sequence for adding a signal handler's sa_mask to
    current->blocked because the sequence is identical for all architectures.
    Furthermore, in the past some architectures actually got this code wrong,
    so introduce a wrapper that all architectures can use.

    Signed-off-by: Matt Fleming
    Signed-off-by: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Tejun Heo
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Fleming
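    A minimal user-space sketch of what such a wrapper computes. It assumes standard sigaction() semantics: the handler's sa_mask is added to the blocked set, and so is the delivered signal itself unless SA_NODEFER is set. Signal sets are modelled as a 64-bit mask. model_block_sigmask(), struct model_sigaction and MODEL_SA_NODEFER are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit (sig - 1) of a 64-bit mask represents signal sig. */
typedef uint64_t model_sigset_t;

#define MODEL_SA_NODEFER 0x1u /* hypothetical flag value, not the real SA_NODEFER */

struct model_sigaction {
    model_sigset_t sa_mask; /* extra signals to block while the handler runs */
    unsigned int sa_flags;
};

static model_sigset_t model_sigmask(int sig)
{
    assert(sig >= 1 && sig <= 64);
    return (model_sigset_t)1 << (sig - 1);
}

/*
 * New blocked set for delivering 'sig' to handler 'ka':
 * old blocked | sa_mask, plus 'sig' itself unless SA_NODEFER is set.
 */
static model_sigset_t model_block_sigmask(model_sigset_t blocked,
                                          const struct model_sigaction *ka,
                                          int sig)
{
    model_sigset_t newset = blocked | ka->sa_mask;

    if (!(ka->sa_flags & MODEL_SA_NODEFER))
        newset |= model_sigmask(sig);
    return newset;
}
```

    With one helper, every architecture computes the mask the same way. The entries below explain why the result should then be installed through set_current_blocked() rather than by writing ->blocked directly.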
     

15 Jul, 2011

3 commits

  • handle_signal()->set_fs() has a nice comment which explains what
    set_fs() is, but it doesn't explain why it is needed and why it
    depends on CONFIG_X86_64.

    Afaics, the history of this confusion is:

    1. I guess today nobody can explain why it was needed
    in arch/i386/kernel/signal.c; perhaps it was always
    wrong. This predates the 2.4.0 kernel.

    2. then it was copy-and-pasted to the new x86_64 arch.

    3. then it was removed from i386 (but not from x86_64)
    by b93b6ca3 "i386: remove unnecessary code".

    4. then it was reintroduced under CONFIG_X86_64 when x86
    unified i386 and x86_64, because the patch above didn't
    touch x86_64.

    Remove it. ->addr_limit should be correct. Even if it was possible
    that it is wrong, it is too late to fix it after setup_rt_frame().

    Linus commented in:
    http://lkml.kernel.org/r/alpine.LFD.0.999.0707170902570.19166@woody.linux-foundation.org

    ... about the equivalent bit from i386:

    Heh. I think it's entirely historical.

    Please realize that the whole reason that function is called "set_fs()" is
    that it literally used to set the %fs segment register, not
    "->addr_limit".

    So I think the "set_fs(USER_DS)" is there _only_ to match the other

    regs->xds = __USER_DS;
    regs->xes = __USER_DS;
    regs->xss = __USER_DS;
    regs->xcs = __USER_CS;

    things, and never mattered. And now it matters even less, and has been
    copied to all other architectures where it is just totally insane.

    Signed-off-by: Oleg Nesterov
    Link: http://lkml.kernel.org/r/20110710164424.GA20261@redhat.com
    Signed-off-by: H. Peter Anvin

    Oleg Nesterov
     
  • 1. do_signal() looks at TS_RESTORE_SIGMASK and calculates the
    mask which should be stored in the signal frame, then it
    passes "oldset" to the callees, down to setup_rt_frame().

    This is ugly, setup_rt_frame() can do this itself and nobody
    else needs this sigset_t. Move this code into setup_rt_frame.

    2. do_signal() also clears TS_RESTORE_SIGMASK if handle_signal()
    succeeds.

    We can move this to setup_rt_frame() as well, this avoids the
    unnecessary checks and makes the logic more clear.

    3. use set_current_blocked() instead of sigprocmask(SIG_SETMASK),
    sigprocmask() should be avoided.

    Signed-off-by: Oleg Nesterov
    Link: http://lkml.kernel.org/r/20110710182203.GA27979@redhat.com
    Signed-off-by: H. Peter Anvin

    Oleg Nesterov
     
  • sys_sigsuspend() and sys_sigreturn() change ->blocked directly.
    This is not correct, see the changelog in e6fa16ab
    "signal: sigprocmask() should do retarget_shared_pending()"

    Change them to use set_current_blocked().

    Signed-off-by: Oleg Nesterov
    Link: http://lkml.kernel.org/r/20110710192727.GA31759@redhat.com
    Signed-off-by: H. Peter Anvin

    Oleg Nesterov
     

28 Apr, 2011

2 commits

  • Normally sys_rt_sigreturn() restores the old current->blocked which was
    changed by handle_signal(), and unblocking is always fine.

    But the debugger or application itself can change frame->uc_sigmask and
    thus we need set_current_blocked()->retarget_shared_pending().

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • This is ugly, but if sigprocmask() needs retarget_shared_pending() then
    handle_signal() should follow this logic. In theory it is never correct to
    simply add the new signals to current->blocked: the signal handler can sleep, etc.,
    so we should notify other threads in case we block the pending signal and
    nobody else has TIF_SIGPENDING.

    Of course, this change doesn't make signals faster :/

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Matt Fleming
    Acked-by: Tejun Heo

    Oleg Nesterov
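    The two changes above depend on the retargeting done by set_current_blocked(). The model below is a simplified sketch of that retargeting under stated assumptions: threads are an array, signal sets are 64-bit masks, and a boolean stands in for TIF_SIGPENDING. Exiting threads and locking are ignored. model_set_blocked() and model_retarget_shared_pending() are hypothetical names, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_sigset_t;

struct model_task {
    model_sigset_t blocked;
    bool sigpending; /* stands in for TIF_SIGPENDING */
};

/*
 * Thread 'self' is about to block 'newly_blocked'. Any of those signals
 * pending on the shared (process-wide) queue must be picked up by another
 * thread, so wake threads that do not block them.
 */
static void model_retarget_shared_pending(struct model_task *threads, size_t n,
                                          size_t self,
                                          model_sigset_t shared_pending,
                                          model_sigset_t newly_blocked)
{
    model_sigset_t retarget = shared_pending & newly_blocked;

    for (size_t i = 0; i < n && retarget; i++) {
        struct model_task *t = &threads[i];

        if (i == self || !(retarget & ~t->blocked))
            continue;             /* t cannot take any of them */
        retarget &= t->blocked;   /* t will handle the ones it does not block */
        t->sigpending = true;     /* "wake" t so it notices */
    }
}

static void model_set_blocked(struct model_task *threads, size_t n, size_t self,
                              model_sigset_t shared_pending, model_sigset_t newset)
{
    assert(self < n);
    model_sigset_t newly_blocked = newset & ~threads[self].blocked;

    model_retarget_shared_pending(threads, n, self, shared_pending, newly_blocked);
    threads[self].blocked = newset;
}
```

    Only newly blocked signals need retargeting. Unblocking never strands a shared signal, which is why the sigreturn case above matters only when the saved mask was modified.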
     

10 Dec, 2009

1 commit


09 Dec, 2009

1 commit

  • * 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits)
    KVM: VMX: Fix comparison of guest efer with stale host value
    KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
    KVM: Drop user return notifier when disabling virtualization on a cpu
    KVM: VMX: Disable unrestricted guest when EPT disabled
    KVM: x86 emulator: limit instructions to 15 bytes
    KVM: s390: Make psw available on all exits, not just a subset
    KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
    KVM: VMX: Report unexpected simultaneous exceptions as internal errors
    KVM: Allow internal errors reported to userspace to carry extra data
    KVM: Reorder IOCTLs in main kvm.h
    KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
    KVM: only clear irq_source_id if irqchip is present
    KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
    KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
    KVM: VMX: Remove vmx->msr_offset_efer
    KVM: MMU: update invlpg handler comment
    KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
    KVM: remove duplicated task_switch check
    KVM: powerpc: Fix BUILD_BUG_ON condition
    KVM: VMX: Use shared msr infrastructure
    ...

    Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile

    Linus Torvalds
     

18 Oct, 2009

1 commit


02 Oct, 2009

1 commit

  • Add a general per-cpu notifier that is called whenever the kernel is
    about to return to userspace. The notifier uses a thread_info flag
    and existing checks, so there is no impact on user return or context
    switch fast paths.

    This will be used initially to speed up KVM task switching by lazily
    updating MSRs.

    Signed-off-by: Avi Kivity
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Avi Kivity
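    A single-CPU model of the idea: registering a notifier sets a flag, and the return-to-user path does nothing extra unless that flag is set. The names (struct model_urn, model_urn_register(), model_exit_to_user(), model_tif_user_return_notify) are hypothetical. The real notifier list is per-cpu rather than a single global list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_urn {
    void (*on_user_return)(struct model_urn *urn);
    struct model_urn *next;
};

/* One CPU only; the real notifier list is per-cpu. */
static struct model_urn *model_urn_list;
static bool model_tif_user_return_notify; /* stands in for a thread_info flag */

static void model_urn_register(struct model_urn *urn)
{
    assert(urn->on_user_return != NULL);
    urn->next = model_urn_list;
    model_urn_list = urn;
    model_tif_user_return_notify = true;
}

static void model_urn_unregister(struct model_urn *urn)
{
    for (struct model_urn **p = &model_urn_list; *p; p = &(*p)->next) {
        if (*p == urn) {
            *p = urn->next;
            break;
        }
    }
    if (!model_urn_list)
        model_tif_user_return_notify = false; /* back to the fast path */
}

/* Return-to-user hook: a single flag test when nothing is registered. */
static void model_exit_to_user(void)
{
    if (!model_tif_user_return_notify)
        return;
    /* Fetch next first so a callback may unregister itself. */
    for (struct model_urn *u = model_urn_list, *next; u; u = next) {
        next = u->next;
        u->on_user_return(u);
    }
}

/* Example client: a one-shot hook, e.g. to restore lazily switched state. */
static int model_restore_count;

static void model_one_shot_restore(struct model_urn *urn)
{
    model_restore_count++;
    model_urn_unregister(urn);
}
```

    This is the pattern the lazy MSR update relies on: switch state eagerly only when needed, and register a one-shot hook to put things back right before user space runs.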
     

18 Sep, 2009

1 commit

  • * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
    x86, mce: Fix compilation with !CONFIG_DEBUG_FS in mce-severity.c
    x86, mce: CE in last bank prevents panic by unknown MCE
    x86, mce: Fake panic support for MCE testing
    x86, mce: Move debugfs mce dir creating to mce.c
    x86, mce: Support specifying raise mode for software MCE injection
    x86, mce: Support specifying context for software mce injection
    x86, mce: fix reporting of Thermal Monitoring mechanism enabled
    x86, mce: remove never executed code
    x86, mce: add missing __cpuinit tags
    x86, mce: fix "mce" boot option handling for CONFIG_X86_NEW_MCE
    x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs
    x86: mce: Lower maximum number of banks to architecture limit
    x86: mce: macros to compute banks MSRs
    x86: mce: Move per bank data in a single datastructure
    x86: mce: Move code in mce.c
    x86: mce: Rename CONFIG_X86_NEW_MCE to CONFIG_X86_MCE
    x86: mce: Remove old i386 machine check code
    x86: mce: Update X86_MCE description in x86/Kconfig
    x86: mce: Make CONFIG_X86_ANCIENT_MCE dependent on CONFIG_X86_MCE
    x86, mce: use atomic_inc_return() instead of add by 1
    ...

    Manually fixed up trivial conflicts:
    Documentation/feature-removal-schedule.txt
    arch/x86/kernel/cpu/mcheck/mce.c

    Linus Torvalds
     

15 Sep, 2009

1 commit


02 Sep, 2009

1 commit

  • Add a keyctl to install a process's session keyring onto its parent. This
    replaces the parent's session keyring. Because the COW credential code does
    not permit one process to change another process's credentials directly, the
    change is deferred until userspace next starts executing again. Normally this
    will be after a wait*() syscall.

    To support this, three new security hooks have been provided:
    cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in
    the blank security creds and key_session_to_parent() - which asks the LSM if
    the process may replace its parent's session keyring.

    The replacement may only happen if the process has the same ownership details
    as its parent, and the process has LINK permission on the session keyring, and
    the session keyring is owned by the process, and the LSM permits it.

    Note that this requires alteration to each architecture's notify_resume path.
    This has been done for all arches barring blackfin, m68k* and xtensa, all of
    which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the
    replacement to be performed at the point the parent process resumes userspace
    execution.

    This allows the userspace AFS pioctl emulation to fully emulate newpag() and
    the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to
    alter the parent process's PAG membership. However, since kAFS doesn't use
    PAGs per se, but rather dumps the keys into the session keyring, the session
    keyring of the parent must be replaced if, for example, VIOCSETTOK is passed
    the newpag flag.

    This can be tested with the following program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <keyutils.h>

    /* Define it in case keyutils.h does not. */
    #ifndef KEYCTL_SESSION_TO_PARENT
    #define KEYCTL_SESSION_TO_PARENT 18
    #endif

    #define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0)

    int main(int argc, char **argv)
    {
        key_serial_t keyring, key;
        long ret;

        /* argv[1] may be NULL: then a new anonymous session keyring is joined. */
        keyring = keyctl_join_session_keyring(argv[1]);
        OSERROR(keyring, "keyctl_join_session_keyring");

        key = add_key("user", "a", "b", 1, keyring);
        OSERROR(key, "add_key");

        ret = keyctl(KEYCTL_SESSION_TO_PARENT);
        OSERROR(ret, "KEYCTL_SESSION_TO_PARENT");

        return 0;
    }

    Compiled and linked with -lkeyutils, you should see something like:

    [dhowells@andromeda ~]$ keyctl show
    Session Keyring
    -3 --alswrv 4043 4043 keyring: _ses
    355907932 --alswrv 4043 -1 \_ keyring: _uid.4043
    [dhowells@andromeda ~]$ /tmp/newpag
    [dhowells@andromeda ~]$ keyctl show
    Session Keyring
    -3 --alswrv 4043 4043 keyring: _ses
    1055658746 --alswrv 4043 4043 \_ user: a
    [dhowells@andromeda ~]$ /tmp/newpag hello
    [dhowells@andromeda ~]$ keyctl show
    Session Keyring
    -3 --alswrv 4043 4043 keyring: hello
    340417692 --alswrv 4043 4043 \_ user: a

    Where the test program creates a new session keyring, sticks a user key named
    'a' into it and then installs it on its parent.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

10 Jul, 2009

1 commit


17 Jun, 2009

1 commit

  • Conflicts:
    arch/x86/Kconfig
    arch/x86/kernel/traps.c
    arch/x86/power/cpu.c
    arch/x86/power/cpu_32.c
    kernel/Makefile

    Semantic conflict:
    arch/x86/kernel/hw_breakpoint.c

    Merge reason: Resolve the conflicts, move from put_cpu_no_sched() to
    put_cpu() in arch/x86/kernel/hw_breakpoint.c.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

12 Jun, 2009

1 commit


04 Jun, 2009

2 commits

  • Newer Intel CPUs support a new class of machine checks called recoverable
    action optional.

    Action Optional means that the CPU detected some form of corruption in
    the background and tells the OS about it using a machine check
    exception. The OS can then take appropriate action, like killing the
    process with the corrupted data or logging the event properly to disk.

    This is done by the new generic high level memory failure handler added
    in an earlier patch. The high level handler takes the address of the
    failed memory and takes the appropriate action, like killing the process.

    In this version of the patch the high level handler is stubbed out
    with a weak function to not create a direct dependency on the hwpoison
    branch.

    The high level handler cannot be directly called from the machine check
    exception though, because it has to run in a defined process context to
    be able to sleep when taking VM locks (it is not expected to sleep for a
    long time, just do so in some exceptional cases like lock contention).

    Thus the MCE handler has to queue a work item for process context,
    trigger process context and then call the high level handler from there.

    This patch adds two paths to process context: through a per thread kernel
    exit notify_user() callback or through a high priority work item.
    The first runs when the process exits back to user space, the other when
    it goes to sleep and there is no higher priority process.

    The machine check handler will schedule both, and whoever runs first
    will grab the event. This is done because quick reaction to this
    event is critical to avoid a potentially more fatal machine check
    when the corruption is consumed.

    There is a simple lockless ring buffer to queue the corrupted
    addresses between the exception handler and the process context handler.
    Then in process context it just calls the high level VM code with
    the corrupted PFNs.

    The patch adds the code required to extract the failed address from
    the CPU's machine check registers. It doesn't try to handle all
    possible cases (the specification has 6 different ways to specify a
    memory address); it only handles the linear address.

    Most of the required checking has been already done earlier in the
    mce_severity rule checking engine. Following the Intel
    recommendations, Action Optional errors are only enabled for known
    situations (encoded in MCACODs). The errors are ignored otherwise,
    because they are action optional.

    v2: Improve comment, disable preemption while processing ring buffer
    (reported by Ying Huang)

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     
  • Rename the mce_notify_user function to mce_notify_irq. The next
    patch will split the wakeup handling of interrupt context
    and of process context and it's better to give it a clearer
    name for this.

    Contains a fix from Ying Huang

    [ Impact: cleanup ]

    Signed-off-by: Andi Kleen
    Signed-off-by: Hidetoshi Seto
    Cc: Huang Ying
    Signed-off-by: H. Peter Anvin

    Andi Kleen
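    The lockless ring buffer in the Action Optional entry above can be sketched as a single-producer, single-consumer queue using C11 atomics. This is a model with several assumptions, not the kernel's code: the size, the names and the use of <stdatomic.h> are illustrative. It also assumes only one consumer runs at a time (the v2 note disables preemption while the ring buffer is processed), so the notify path and the work item never drain it concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RING_LEN 16 /* hypothetical size; one slot is always left empty */

struct model_pfn_ring {
    _Atomic unsigned int start; /* next slot to read (consumer-owned) */
    _Atomic unsigned int end;   /* next slot to write (producer-owned) */
    uint64_t pfn[MODEL_RING_LEN];
};

/* Producer side (think: exception handler). Never blocks or sleeps. */
static bool model_ring_add(struct model_pfn_ring *r, uint64_t pfn)
{
    unsigned int end = atomic_load_explicit(&r->end, memory_order_relaxed);
    unsigned int next = (end + 1) % MODEL_RING_LEN;

    if (next == atomic_load_explicit(&r->start, memory_order_acquire))
        return false; /* full: drop the entry rather than wait */
    r->pfn[end] = pfn;
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->end, next, memory_order_release);
    return true;
}

/* Consumer side (think: process context, where sleeping is allowed). */
static bool model_ring_get(struct model_pfn_ring *r, uint64_t *pfn)
{
    unsigned int start = atomic_load_explicit(&r->start, memory_order_relaxed);

    if (start == atomic_load_explicit(&r->end, memory_order_acquire))
        return false; /* empty */
    *pfn = r->pfn[start];
    atomic_store_explicit(&r->start, (start + 1) % MODEL_RING_LEN,
                          memory_order_release);
    return true;
}
```

    The producer only ever writes end and the consumer only ever writes start. No lock is needed, which matters because the exception handler cannot wait for one.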
     

03 Jun, 2009

1 commit


29 May, 2009

1 commit

  • The 64bit machine check code is in many ways much better than
    the 32bit machine check code: it is more specification compliant,
    is cleaner, only has a single code base versus one per CPU,
    has better infrastructure for recovery, has a cleaner way to communicate
    with user space etc. etc.

    Use the 64bit code for 32bit too.

    This is the second attempt to do this. There was one a couple of years
    ago to unify this code for 32bit and 64bit. Back then this ran into some
    trouble with K7s and was reverted.

    I believe this time the K7 problems (and some others) are addressed.
    I went over the old handlers and was very careful to retain
    all quirks.

    But of course this needs a lot of testing on old systems. On newer
    64bit capable systems I don't expect many problems because they have been
    already tested with the 64bit kernel.

    I made this a CONFIG for now that still allows selecting the old
    machine check code. This is mostly to make testing easier,
    if someone runs into a problem we can ask them to try
    with the CONFIG switched.

    The new code is default y for more coverage.

    Once there is confidence the 64bit code works well on older hardware
    too the CONFIG_X86_OLD_MCE and the associated code can be easily
    removed.

    This causes a behaviour change for 32bit installations. They now
    have to install the mcelog package to be able to log
    corrected machine checks.

    The 64bit machine check code only handles CPUs which support the
    standard Intel machine check architecture described in the IA32 SDM.
    The 32bit code has special support for some older CPUs which
    have non-standard machine check architectures, in particular
    WinChip C3 and Intel P5. I made those a separate CONFIG option
    and kept them for now. The WinChip variant could probably be
    removed without too much pain; it doesn't really do anything
    interesting. P5 is also disabled by default (like it
    was before) because many motherboards have it miswired, but
    according to Alan Cox a few embedded setups use that one.

    Forward ported/heavily changed version of old patch, original patch
    included review/fixes from Thomas Gleixner, Bert Wesarg.

    Signed-off-by: Andi Kleen
    Signed-off-by: H. Peter Anvin
    Signed-off-by: Hidetoshi Seto
    Signed-off-by: H. Peter Anvin

    Andi Kleen
     

06 Apr, 2009

2 commits

  • While going over the wakeup code I noticed delayed wakeups only work
    for hardware counters but basically all software counters rely on
    them.

    This patch unifies and generalizes the delayed wakeup to fix this
    issue.

    Since we're dealing with NMI context bits here, use a cmpxchg() based
    singly linked list implementation to track counters that have pending
    wakeups.

    [ This should really be generic code for delayed wakeups, but since we
    cannot use cmpxchg()/xchg() in generic code, I've let it live in the
    perf_counter code. -- Eric Dumazet could use it to aggregate the
    network wakeups. ]

    Furthermore, the x86 method of using TIF flags was flawed in that it's
    quite possible to end up setting the bit on the idle task, losing the
    wakeup.

    The powerpc method uses per-cpu storage and does appear to be
    sufficient.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Orig-LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Merge reason: we have gathered quite a few conflicts, need to merge upstream

    Conflicts:
    arch/powerpc/kernel/Makefile
    arch/x86/ia32/ia32entry.S
    arch/x86/include/asm/hardirq.h
    arch/x86/include/asm/unistd_32.h
    arch/x86/include/asm/unistd_64.h
    arch/x86/kernel/cpu/common.c
    arch/x86/kernel/irq.c
    arch/x86/kernel/syscall_table_32.S
    arch/x86/mm/iomap_32.c
    include/linux/sched.h
    kernel/Makefile

    Signed-off-by: Ingo Molnar

    Ingo Molnar
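    The cmpxchg()-based singly linked list from the delayed-wakeup entry above can be sketched with C11 atomics. In this hypothetical model, a node whose next pointer is NULL is not queued. Claiming it with a compare-and-swap from NULL to a sentinel ensures it is queued at most once, and the consumer detaches the whole list with a single exchange. The names are illustrative, not the perf_counter code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A node is queued iff its next pointer is non-NULL. */
struct model_pending {
    _Atomic(struct model_pending *) next;
    int id;
};

/* Sentinel marking the end of the list. */
static struct model_pending model_pending_tail;
#define MODEL_PENDING_TAIL (&model_pending_tail)

static _Atomic(struct model_pending *) model_pending_head = MODEL_PENDING_TAIL;

/* Producer: queue n unless it is already queued. Returns 1 if queued. */
static int model_pending_queue(struct model_pending *n)
{
    struct model_pending *expected = NULL;
    struct model_pending *head;

    /* Claim the node; only one caller can move next away from NULL. */
    if (!atomic_compare_exchange_strong(&n->next, &expected, MODEL_PENDING_TAIL))
        return 0;

    head = atomic_load(&model_pending_head);
    do {
        atomic_store(&n->next, head);
    } while (!atomic_compare_exchange_weak(&model_pending_head, &head, n));
    return 1;
}

/* Consumer: detach the whole list at once, then run fn on each node,
 * clearing next first so the node can be queued again. */
static int model_pending_run(void (*fn)(struct model_pending *))
{
    struct model_pending *list =
        atomic_exchange(&model_pending_head, MODEL_PENDING_TAIL);
    int count = 0;

    while (list != MODEL_PENDING_TAIL) {
        struct model_pending *entry = list;

        list = atomic_load(&entry->next);
        atomic_store(&entry->next, NULL);
        fn(entry);
        count++;
    }
    return count;
}

/* Example callback: record the order in which wakeups run. */
static int model_woken[8];
static int model_nwoken;

static void model_record_wakeup(struct model_pending *p)
{
    if (model_nwoken < 8)
        model_woken[model_nwoken++] = p->id;
}
```

    The list is LIFO, which is fine for wakeups. The important property is that queueing needs no lock, so it is usable from NMI context.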
     

01 Apr, 2009

1 commit

  • Impact: fix redundant and incorrect check

    Oleg Nesterov noticed wrt commit:

    14fc9fb: x86: signal: check signal stack overflow properly

    >> No need to check SA_ONSTACK if we're already using alternate signal stack.
    >
    > Yes, but this also mean that we don't need sas_ss_flags() under
    > "if (!onsigstack)",

    Checking on_sig_stack() in sas_ss_flags() at get_sigframe() is redundant
    and not correct on 64 bit. Checking sas_ss_size is enough.

    Reported-by: Oleg Nesterov
    Signed-off-by: Hiroshi Shimamoto
    Cc: roland@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
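    The model below sketches the stack-selection logic in get_sigframe() after this fix. A task already on its alternate stack stays there. Otherwise it switches only when SA_ONSTACK is requested and an alternate stack is configured, so a non-zero size is the only extra check needed. struct model_altstack and model_pick_stack() are hypothetical names. model_on_sig_stack() is meant to mirror the usual on_sig_stack() test for a downward-growing stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a task's alternate signal stack. */
struct model_altstack {
    uintptr_t ss_sp;
    size_t ss_size; /* 0 means no alternate stack is configured */
};

/* True if sp lies within (ss_sp, ss_sp + ss_size]. */
static bool model_on_sig_stack(const struct model_altstack *as, uintptr_t sp)
{
    return sp > as->ss_sp && sp - as->ss_sp <= as->ss_size;
}

/* Pick the stack pointer the signal frame is built below. */
static uintptr_t model_pick_stack(const struct model_altstack *as, uintptr_t sp,
                                  bool sa_onstack)
{
    /* Already on the alternate stack (nested signal): keep using it. */
    if (model_on_sig_stack(as, sp))
        return sp;
    /* Not on it: we only need to know that one is configured. */
    if (sa_onstack && as->ss_size)
        return as->ss_sp + as->ss_size;
    return sp;
}
```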
     

21 Mar, 2009

1 commit

  • Impact: cleanup

    Check alternate signal stack overflow with proper stack pointer.
    The stack pointer of the next signal frame is different if that
    task has i387 state.

    On x86_64, the red zone is included in the calculation.

    No need to check SA_ONSTACK if we're already using alternate signal stack.

    Signed-off-by: Hiroshi Shimamoto
    Cc: Roland McGrath
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
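    The point of this change is that the overflow test must use the final frame address. That address is computed after the x86_64 red zone (128 bytes below the stack pointer, per the x86_64 ABI) and any i387 state have been reserved. The sketch below is a hypothetical model under these assumptions: the names are made up and the 16-byte alignment is simplified. It returns 0 for "would overflow", where the kernel instead returns an always-bogus address so the task dies with SIGSEGV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_altstack {
    uintptr_t ss_sp;
    size_t ss_size;
};

static bool model_on_sig_stack(const struct model_altstack *as, uintptr_t sp)
{
    return sp > as->ss_sp && sp - as->ss_sp <= as->ss_size;
}

#define MODEL_REDZONE 128 /* x86_64 ABI red zone, in bytes */

/* Returns the frame address, or 0 if the frame would overflow the
 * alternate stack the task is already running on. */
static uintptr_t model_frame_sp(const struct model_altstack *as, uintptr_t sp,
                                bool is_64bit, size_t fpstate_size,
                                size_t frame_size)
{
    bool onsigstack = model_on_sig_stack(as, sp);

    if (is_64bit)
        sp -= MODEL_REDZONE;  /* never clobber the red zone */
    sp -= fpstate_size;       /* room for i387 state, if the task has any */
    sp -= frame_size;
    sp &= ~(uintptr_t)15;     /* simplified alignment (an assumption) */

    /* Check the *final* stack pointer, after all of the above. */
    if (onsigstack && !model_on_sig_stack(as, sp))
        return 0;
    return sp;
}
```

    For example, take an alternate stack at 0x10000 of size 0x1000 and start at 0x10100. A 0x40-byte frame fits (at 0x10040), but reserving 0x200 bytes of i387 state as well pushes it below 0x10000.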
     

03 Mar, 2009

1 commit

  • Impact: fix bad frame in rt_sigreturn on 64-bit

    After commit 97286a2b64725aac2d584ddd1f94871f9991d5a1 some applications
    fail to return from signal handler:

    [ 145.150133] firefox[3250] bad frame in rt_sigreturn frame:00007f902b44eb28 ip:352e80b307 sp:7f902b44ef70 orax:ffffffffffffffff in libpthread-2.9.so[352e800000+17000]
    [ 665.519017] firefox[5420] bad frame in rt_sigreturn frame:00007faa8deaeb28 ip:352e80b307 sp:7faa8deaef70 orax:ffffffffffffffff in libpthread-2.9.so[352e800000+17000]

    The root cause is that the 64-byte aligned value of fpstate was not
    kept for the next stack pointer calculation.

    Reported-by: Jaswinder Singh Rajput
    Reported-by: Mike Galbraith
    Signed-off-by: Hiroshi Shimamoto
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hiroshi Shimamoto
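    A worked sketch of the fix. It assumes the FPU state area is reserved just below the incoming stack pointer, and the 64-byte-aligned address becomes both fpstate and the new stack pointer for the next frame. The names are hypothetical. Continuing from the unaligned sp - xstate_size instead would let the next frame overlap the saved FPU state. That happens whenever this value is not already 64-byte aligned and the state is at least 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x down to a power-of-two boundary. */
static uintptr_t model_round_down(uintptr_t x, uintptr_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return x & ~(align - 1);
}

/* Reserve xstate_size bytes of 64-byte-aligned FPU state below sp.
 * Stores its address in *fpstate and returns the new stack pointer. */
static uintptr_t model_reserve_fpstate(uintptr_t sp, size_t xstate_size,
                                       uintptr_t *fpstate)
{
    sp = model_round_down(sp - xstate_size, 64);
    *fpstate = sp;
    /* Keep using the aligned value for the rest of the frame layout. */
    return sp;
}
```

    With sp = 0x7000 and 0x208 bytes of state, fpstate lands at 0x6dc0. The next frame is carved below 0x6dc0, not below the unaligned 0x6df8, which would sit inside the state area.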
     

28 Feb, 2009

5 commits


13 Feb, 2009

1 commit


12 Feb, 2009

2 commits

  • Impact: cleanup

    With the recent changes in the 32-bit code to make system calls which
    use struct pt_regs take a pointer, sys_rt_sigreturn() has become
    identical between 32 and 64 bits, and both versions are empty wrappers
    around do_rt_sigreturn(). Remove both wrappers and rename
    do_rt_sigreturn() to sys_rt_sigreturn().

    Cc: Brian Gerst
    Cc: Tejun Heo
    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     
  • Some syscalls need to access the pt_regs structure, either to copy
    user register state or to modify it. This patch adds stubs to load
    the address of the pt_regs struct into the %eax register, and changes
    the syscalls to take the pointer as an argument instead of relying on
    the assumption that the pt_regs structure overlaps the function
    arguments.

    Drop the use of regparm(1) due to concern about gcc bugs, and to move
    in the direction of the eventual removal of regparm(0) for asmlinkage.

    Signed-off-by: Brian Gerst
    Signed-off-by: H. Peter Anvin

    Brian Gerst
     

11 Feb, 2009

2 commits


10 Feb, 2009

2 commits

  • Impact: cleanup

    On x86_32, %gs is handled lazily. It's not saved and restored on
    kernel entry/exit but only when necessary which usually is during task
    switch, but there are a few other places. Currently, it's done by
    calling savesegment() and loadsegment() explicitly. Define
    get_user_gs(), set_user_gs() and task_user_gs() and use them instead.

    While at it, clean up register access macros in signal.c.

    This cleans up code a bit and will help future changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Ingo Molnar
     

24 Jan, 2009

1 commit

  • Impact: use new framework

    Use {get|put}_user_try, catch, and _ex in arch/x86/kernel/signal.c.

    Note: this patch contains "WARNING: line over 80 characters", because when
    introducing a new block I added an indent to avoid editing mistakes.

    Signed-off-by: Hiroshi Shimamoto
    Signed-off-by: H. Peter Anvin

    Hiroshi Shimamoto
     

21 Jan, 2009

1 commit

  • This reverts commit 4217458dafaa57d8e26a46f5d05ab8c53cf64191.

    Justin Madru bisected this commit, it was causing weird Firefox
    crashes.

    The reason is that GCC mis-optimizes (re-uses) the on-stack parameters of
    the calling frame, which corrupts the syscall return pt_regs state and
    thus corrupts user-space register state.

    So we go back to the slightly less clean but more optimization-safe
    method of getting to pt_regs. Also add a comment to explain this.

    Resolves: http://bugzilla.kernel.org/show_bug.cgi?id=12505

    Reported-and-bisected-by: Justin Madru
    Tested-by: Justin Madru
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

29 Dec, 2008

1 commit