28 Jun, 2011

3 commits

  • Change other callers of do_notify_parent() to check the value it
    returns, this makes the subsequent task_detached() unnecessary.
    Mark do_notify_parent() as __must_check.

    Use thread_group_leader() instead of !task_detached() to check
    if we need to notify the real parent in wait_task_zombie().

    Remove the stale comment in release_task(). "just for sanity" is
    no longer true, we have to set EXIT_DEAD to avoid the races with
    do_wait().

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     
  • Kill tracehook_notify_death(), reimplement the logic in its caller,
    exit_notify().

    Also, change the exec_id's check to use thread_group_leader() instead
    of task_detached(), this is more clear. This logic only applies to
    the exiting leader, a sub-thread must never change its exit_signal.

    Note: when the traced group leader exits the exit_signal-or-SIGCHLD
    logic looks really strange:

    - we notify the tracer even if !thread_group_empty() but
    do_wait(WEXITED) can't work until all threads exit

    - if the tracer is real_parent, it is not clear why we can't
    use ->exit_signal even if !thread_group_empty()

    -v2: do not try to fix the 2nd oddity to avoid the subtle behavior
    change mixed with reorganization, suggested by Tejun.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Tejun Heo

    Oleg Nesterov
     
  • - change do_notify_parent() to return a boolean, true if the task should
    be reaped because its parent ignores SIGCHLD.

    - update the only caller which checks the returned value, exit_notify().

    This temporarily uglifies exit_notify() even more; it will be cleaned
    up by the next change.

    Signed-off-by: Oleg Nesterov
    Acked-by: Tejun Heo

    Oleg Nesterov
     

23 Jun, 2011

6 commits

  • tracehook.h is on the way out. Rename tracehook_tracer_task() to
    ptrace_parent() and move it from tracehook.h to ptrace.h.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Cc: John Johansen
    Cc: Stephen Smalley
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • At this point, tracehooks aren't useful to mainline kernel and mostly
    just add an extra layer of obfuscation. Although they have comments,
    without actual in-kernel users, it is difficult to tell what their
    assumptions are and what they're actually trying to achieve. To mainline
    kernel, they just aren't worth keeping around.

    This patch kills the following clone and exec related tracehooks.

    tracehook_prepare_clone()
    tracehook_finish_clone()
    tracehook_report_clone()
    tracehook_report_clone_complete()
    tracehook_unsafe_exec()

    The changes are mostly trivial - logic is moved to the caller and
    comments are merged and adjusted appropriately.

    The only exception is in check_unsafe_exec() where LSM_UNSAFE_PTRACE*
    are OR'd to bprm->unsafe instead of setting it, which produces the
    same result as the field is always zero on entry. It also tests
    p->ptrace instead of (p->ptrace & PT_PTRACED) for consistency, which
    also gives the same result.

    This doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • At this point, tracehooks aren't useful to mainline kernel and mostly
    just add an extra layer of obfuscation. Although they have comments,
    without actual in-kernel users, it is difficult to tell what their
    assumptions are and what they're actually trying to achieve. To mainline
    kernel, they just aren't worth keeping around.

    This patch kills the following trivial tracehooks.

    * Ones testing whether task is ptraced. Replace with ->ptrace test.

    tracehook_expect_breakpoints()
    tracehook_consider_ignored_signal()
    tracehook_consider_fatal_signal()

    * ptrace_event() wrappers. Call directly.

    tracehook_report_exec()
    tracehook_report_exit()
    tracehook_report_vfork_done()

    * ptrace_release_task() wrapper. Call directly.

    tracehook_finish_release_task()

    * noop

    tracehook_prepare_release_task()
    tracehook_report_death()

    This doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Cc: Martin Schwidefsky
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • Move SIGTRAP on exec(2) logic from tracehook_report_exec() to
    ptrace_event(). This is part of changes to make ptrace_event()
    smarter and handle ptrace event related details in one place.

    This doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • This patch implements ptrace_event_enabled(), which tests whether a
    given PTRACE_EVENT_* is enabled, and uses it to simplify ptrace_event()
    and tracehook_prepare_clone().

    PT_EVENT_FLAG() macro is added which calculates PT_TRACE_* flag from
    PTRACE_EVENT_*. This is used to define PT_TRACE_* flags and by
    ptrace_event_enabled() to find the matching flag.

    This is used to make ptrace_event() and tracehook_prepare_clone()
    simpler.

    * ptrace_event() callers were responsible for providing the mask to
    test whether the event was enabled. This patch implements
    ptrace_event_enabled() and makes ptrace_event() drop @mask and
    determine whether the event is enabled from @event. Note that
    @event is constant and this conversion doesn't add runtime overhead.

    All conversions except tracehook_report_clone_complete() are
    trivial. tracehook_report_clone_complete() used to use 0 for @mask
    (always enabled) but now tests whether the specified event is
    enabled. This doesn't cause any behavior difference as it's
    guaranteed that the event specified by @trace is enabled.

    * tracehook_prepare_clone() now only determines which event is
    applicable and uses ptrace_event_enabled() for the enable test.

    This doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • task_ptrace(task) simply dereferences task->ptrace and isn't even used
    consistently, which only adds confusion. Kill it and directly access
    ->ptrace instead.

    This doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     

17 Jun, 2011

5 commits

  • The previous patch implemented async notification for ptrace but it
    only worked while the tracee was running. This patch introduces
    PTRACE_LISTEN, which was suggested by Oleg Nesterov.

    It's allowed iff tracee is in STOP trap and puts tracee into
    quasi-running state - tracee never really runs but wait(2) and
    ptrace(2) consider it to be running. While ptracer is listening,
    tracee is allowed to re-enter STOP to notify an async event.
    Listening state is cleared on the first notification. Ptracer can
    also clear it by issuing INTERRUPT - tracee will re-trap into STOP
    with listening state cleared.

    This allows ptracer to monitor group stop state without running tracee
    - use INTERRUPT to put tracee into STOP trap, issue LISTEN and then
    wait(2) to wait for the next group stop event. When it happens,
    PTRACE_GETSIGINFO provides information to determine the current state.

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_INTERRUPT 0x4207
    #define PTRACE_LISTEN 0x4208

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts1s = { .tv_sec = 1 };

    int main(int argc, char **argv)
    {
        pid_t tracee, tracer;
        int i;

        tracee = fork();
        if (!tracee)
            while (1)
                pause();

        tracer = fork();
        if (!tracer) {
            siginfo_t si;

            ptrace(PTRACE_SEIZE, tracee, NULL,
                   (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
            ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
        repeat:
            waitid(P_PID, tracee, NULL, WSTOPPED);

            ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
            if (!si.si_code) {
                printf("tracer: SIG %d\n", si.si_signo);
                ptrace(PTRACE_CONT, tracee, NULL,
                       (void *)(unsigned long)si.si_signo);
                goto repeat;
            }
            printf("tracer: stopped=%d signo=%d\n",
                   si.si_signo != SIGTRAP, si.si_signo);
            if (si.si_signo != SIGTRAP)
                ptrace(PTRACE_LISTEN, tracee, NULL, NULL);
            else
                ptrace(PTRACE_CONT, tracee, NULL, NULL);
            goto repeat;
        }

        for (i = 0; i < 3; i++) {
            nanosleep(&ts1s, NULL);
            printf("mother: SIGSTOP\n");
            kill(tracee, SIGSTOP);
            nanosleep(&ts1s, NULL);
            printf("mother: SIGCONT\n");
            kill(tracee, SIGCONT);
        }
        nanosleep(&ts1s, NULL);

        kill(tracer, SIGKILL);
        kill(tracee, SIGKILL);
        return 0;
    }

    This is identical to the program to test TRAP_NOTIFY except that
    tracee is PTRACE_LISTEN'd instead of PTRACE_CONT'd when group stopped.
    This allows ptracer to monitor when group stop ends without running
    tracee.

    # ./test-listen
    tracer: stopped=0 signo=5
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18

    -v2: Moved JOBCTL_LISTENING check in wait_task_stopped() into
    task_stopped_code() as suggested by Oleg.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • Currently there's no way for ptracer to find out whether group stop
    finished other than polling with INTERRUPT - GETSIGINFO - CONT
    sequence. This patch implements group stop notification for ptracer
    using STOP traps.

    When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
    is set, which schedules a STOP trap which is sticky - it isn't cleared
    by other traps and at least one STOP trap will happen eventually.
    STOP trap is synchronization point for event notification and the
    tracer can determine the current group stop state by looking at the
    signal number portion of exit code (si_status from waitid(2) or
    si_code from PTRACE_GETSIGINFO).

    Notifications are generated both on start and end of group stops but,
    because group stop participation always happens before STOP trap, this
    doesn't cause an extra trap while tracee is participating in group
    stop. The symmetry will be useful later.

    Note that this notification works iff tracee is not trapped.
    Currently there is no way to be notified of group stop state changes
    while tracee is trapped. This will be addressed by a later patch.

    An example program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_INTERRUPT 0x4207

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts1s = { .tv_sec = 1 };

    int main(int argc, char **argv)
    {
        pid_t tracee, tracer;
        int i;

        tracee = fork();
        if (!tracee)
            while (1)
                pause();

        tracer = fork();
        if (!tracer) {
            siginfo_t si;

            ptrace(PTRACE_SEIZE, tracee, NULL,
                   (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
            ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
        repeat:
            waitid(P_PID, tracee, NULL, WSTOPPED);

            ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
            if (!si.si_code) {
                printf("tracer: SIG %d\n", si.si_signo);
                ptrace(PTRACE_CONT, tracee, NULL,
                       (void *)(unsigned long)si.si_signo);
                goto repeat;
            }
            printf("tracer: stopped=%d signo=%d\n",
                   si.si_signo != SIGTRAP, si.si_signo);
            ptrace(PTRACE_CONT, tracee, NULL, NULL);
            goto repeat;
        }

        for (i = 0; i < 3; i++) {
            nanosleep(&ts1s, NULL);
            printf("mother: SIGSTOP\n");
            kill(tracee, SIGSTOP);
            nanosleep(&ts1s, NULL);
            printf("mother: SIGCONT\n");
            kill(tracee, SIGCONT);
        }
        nanosleep(&ts1s, NULL);

        kill(tracer, SIGKILL);
        kill(tracee, SIGKILL);
        return 0;
    }

    In the above program, tracer keeps tracee running and is notified of
    each group stop state change.

    # ./test-notify
    tracer: stopped=0 signo=5
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18
    mother: SIGSTOP
    tracer: SIG 19
    tracer: stopped=1 signo=19
    mother: SIGCONT
    tracer: stopped=0 signo=5
    tracer: SIG 18

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • Currently, there's no way to trap a running ptracee short of sending a
    signal which has various side effects. This patch implements
    PTRACE_INTERRUPT which traps ptracee without any signal or job control
    related side effect.

    The implementation is almost trivial. It uses the group stop trap -
    SIGTRAP | PTRACE_EVENT_STOP << 8. A new trap flag
    JOBCTL_TRAP_INTERRUPT is added, which is set on PTRACE_INTERRUPT and
    cleared when any trap happens. As INTERRUPT should be useable
    regardless of the current state of tracee, task_is_traced() test in
    ptrace_check_attach() is skipped for INTERRUPT.

    PTRACE_INTERRUPT is available iff tracee is attached with
    PTRACE_SEIZE.

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_INTERRUPT 0x4207

    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts100ms = { .tv_nsec = 100000000 };
    static const struct timespec ts1s = { .tv_sec = 1 };
    static const struct timespec ts3s = { .tv_sec = 3 };

    int main(int argc, char **argv)
    {
        pid_t tracee;

        tracee = fork();
        if (tracee == 0) {
            nanosleep(&ts100ms, NULL);
            while (1) {
                printf("tracee: alive pid=%d\n", getpid());
                nanosleep(&ts1s, NULL);
            }
        }

        if (argc > 1)
            kill(tracee, SIGSTOP);

        nanosleep(&ts100ms, NULL);

        ptrace(PTRACE_SEIZE, tracee, NULL,
               (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
        if (argc > 1) {
            waitid(P_PID, tracee, NULL, WSTOPPED);
            ptrace(PTRACE_CONT, tracee, NULL, NULL);
        }
        nanosleep(&ts3s, NULL);

        printf("tracer: INTERRUPT and DETACH\n");
        ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
        waitid(P_PID, tracee, NULL, WSTOPPED);
        ptrace(PTRACE_DETACH, tracee, NULL, NULL);
        nanosleep(&ts3s, NULL);

        printf("tracer: exiting\n");
        kill(tracee, SIGKILL);
        return 0;
    }

    When called without argument, tracee is seized from running state,
    interrupted and then detached back to running state.

    # ./test-interrupt
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracer: INTERRUPT and DETACH
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracee: alive pid=4546
    tracer: exiting

    When called with argument, tracee is seized from stopped state,
    continued, interrupted and then detached back to stopped state.

    # ./test-interrupt 1
    tracee: alive pid=4548
    tracee: alive pid=4548
    tracee: alive pid=4548
    tracer: INTERRUPT and DETACH
    tracer: exiting

    Before PTRACE_INTERRUPT, once the tracee was running, there was no way
    to trap tracee and do PTRACE_DETACH without causing side effects.

    -v2: Updated to use task_set_jobctl_pending() so that it doesn't end
    up scheduling TRAP_STOP if child is dying which may make the
    child unkillable. Spotted by Oleg.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     
  • PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
    effects on tracee signal and job control states. This patch
    implements a new ptrace request PTRACE_SEIZE which attaches a tracee
    without trapping it or affecting its signal and job control states.

    The usage is the same as PTRACE_ATTACH but it takes PTRACE_SEIZE_*
    flags in @data. Currently, the only defined flag is
    PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
    PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
    The changes will be implemented gradually and the DEVEL flag is to
    prevent programs which expect full SEIZE behavior from using it before
    all the behavior modifications are complete while allowing unit
    testing. The flag will be removed once SEIZE behaviors are completely
    implemented.

    * PTRACE_SEIZE, unlike ATTACH, doesn't force tracee to trap. After
    attaching tracee continues to run unless a trap condition occurs.

    * PTRACE_SEIZE doesn't affect signal or group stop state.

    * If PTRACE_SEIZE'd, group stop uses PTRACE_EVENT_STOP trap which uses
    exit_code of (signr | PTRACE_EVENT_STOP << 8) where signr is one of
    the stopping signals if group stop is in effect or SIGTRAP
    otherwise, and returns usual trap siginfo on PTRACE_GETSIGINFO
    instead of NULL.

    Seizing sets PT_SEIZED in ->ptrace of the tracee. This flag will be
    used to determine whether new SEIZE behaviors should be enabled.

    Test program follows.

    #define PTRACE_SEIZE 0x4206
    #define PTRACE_SEIZE_DEVEL 0x80000000

    static const struct timespec ts100ms = { .tv_nsec = 100000000 };
    static const struct timespec ts1s = { .tv_sec = 1 };
    static const struct timespec ts3s = { .tv_sec = 3 };

    int main(int argc, char **argv)
    {
        pid_t tracee;

        tracee = fork();
        if (tracee == 0) {
            nanosleep(&ts100ms, NULL);
            while (1) {
                printf("tracee: alive\n");
                nanosleep(&ts1s, NULL);
            }
        }

        if (argc > 1)
            kill(tracee, SIGSTOP);

        nanosleep(&ts100ms, NULL);

        ptrace(PTRACE_SEIZE, tracee, NULL,
               (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
        if (argc > 1) {
            waitid(P_PID, tracee, NULL, WSTOPPED);
            ptrace(PTRACE_CONT, tracee, NULL, NULL);
        }
        nanosleep(&ts3s, NULL);
        printf("tracer: exiting\n");
        return 0;
    }

    When the above program is called w/o argument, tracee is seized while
    running and remains running. When tracer exits, tracee continues to
    run and print out messages.

    # ./test-seize-simple
    tracee: alive
    tracee: alive
    tracee: alive
    tracer: exiting
    tracee: alive
    tracee: alive

    When called with an argument, tracee is seized from stopped state and
    continued, and returns to stopped state when tracer exits.

    # ./test-seize
    tracee: alive
    tracee: alive
    tracee: alive
    tracer: exiting
    # ps -el|grep test-seize
    1 T 0 4720 1 0 80 0 - 941 signal ttyS0 00:00:00 test-seize

    -v2: SEIZE doesn't schedule TRAP_STOP and leaves tracee running as Jan
    suggested.

    -v3: PTRACE_EVENT_STOP traps now report group stop state by signr. If
    group stop is in effect the stop signal number is returned as
    part of exit_code; otherwise, SIGTRAP. This was suggested by
    Denys and Oleg.

    Signed-off-by: Tejun Heo
    Cc: Jan Kratochvil
    Cc: Denys Vlasenko
    Cc: Oleg Nesterov

    Tejun Heo
     
  • do_signal_stop() implemented both normal group stop and trap for group
    stop while ptraced. This approach has been enough but scheduled
    changes require trap mechanism which can be used in more generic
    manner and using group stop trap for generic trap site simplifies both
    userland visible interface and implementation.

    This patch adds a new jobctl flag - JOBCTL_TRAP_STOP. When set, it
    triggers a trap site, which behaves like group stop trap, in
    get_signal_to_deliver() after checking for pending signals. While
    ptraced, do_signal_stop() doesn't stop itself. It initiates group
    stop if requested and schedules JOBCTL_TRAP_STOP and returns. The
    caller - get_signal_to_deliver() - is responsible for checking whether
    TRAP_STOP is pending afterwards and handling it.

    ptrace_attach() is updated to use JOBCTL_TRAP_STOP instead of
    JOBCTL_STOP_PENDING and __ptrace_unlink() to clear all pending trap
    bits and TRAPPING so that TRAP_STOP and future trap bits don't linger
    after detach.

    While at it, add proper function comment to do_signal_stop() and make
    it return bool.

    -v2: __ptrace_unlink() updated to clear JOBCTL_TRAP_MASK and TRAPPING
    instead of JOBCTL_PENDING_MASK. This avoids accidentally
    clearing JOBCTL_STOP_CONSUME. Spotted by Oleg.

    -v3: do_signal_stop() updated to return %false without dropping
    siglock while ptraced and TRAP_STOP check moved inside for(;;)
    loop after group stop participation. This avoids unnecessary
    relocking and also will help avoiding unnecessary traps by
    consuming group stop before handling pending traps.

    -v4: Jobctl trap handling moved into a separate function -
    do_jobctl_trap().

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov

    Tejun Heo
     

05 Jun, 2011

5 commits

  • Remove the following three noop tracehooks in signals.c.

    * tracehook_force_sigpending()
    * tracehook_get_signal()
    * tracehook_finish_jctl()

    The code area is about to be updated and these hooks don't do anything
    other than obfuscating the logic.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • task->jobctl currently hosts JOBCTL_STOP_PENDING and will host TRAP
    pending bits too. Setting pending conditions on a dying task may make
    the task unkillable. Currently, each setting site is responsible for
    checking for the condition but with to-be-added job control traps this
    becomes too fragile.

    This patch adds task_set_jobctl_pending() which should be used when
    setting task->jobctl bits to schedule a stop or trap. The function
    performs the following to ease setting pending bits.

    * Sanity checks.

    * If fatal signal is pending or PF_EXITING is set, no bit is set.

    * STOP_SIGMASK is automatically cleared if new value is being set.

    do_signal_stop() and ptrace_attach() are updated to use
    task_set_jobctl_pending() instead of setting STOP_PENDING explicitly.
    The surrounding structures around setting are changed to fit
    task_set_jobctl_pending() better but there should be no userland
    visible behavior difference.

    Signed-off-by: Tejun Heo
    Cc: Oleg Nesterov
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • This patch introduces JOBCTL_PENDING_MASK and replaces
    task_clear_jobctl_stop_pending() with task_clear_jobctl_pending()
    which takes an extra @mask argument.

    JOBCTL_PENDING_MASK is currently equal to JOBCTL_STOP_PENDING but
    future patches will add more bits. recalc_sigpending_tsk() is updated
    to use JOBCTL_PENDING_MASK instead.

    task_clear_jobctl_pending() takes @mask which is a subset of
    JOBCTL_PENDING_MASK and clears the relevant jobctl bits. If
    JOBCTL_STOP_PENDING is set, other STOP bits are cleared together. All
    task_clear_jobctl_stop_pending() users are updated to call
    task_clear_jobctl_pending() with JOBCTL_STOP_PENDING which is
    functionally identical to task_clear_jobctl_stop_pending().

    This patch doesn't cause any functional change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • PTRACE_INTERRUPT is going to be added which should also skip
    task_is_traced() check in ptrace_check_attach(). Rename @kill to
    @ignore_state and make it bool. Add function comment while at it.

    This patch doesn't introduce any behavior difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     
  • signal->group_stop currently hosts mostly group stop related flags;
    however, it's gonna be used for wider purposes and the GROUP_STOP_
    flag prefix becomes confusing. Rename signal->group_stop to
    signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.

    Bit position macros JOBCTL_*_BIT are defined and JOBCTL_* flags are
    defined in terms of them to allow using bitops later.

    While at it, reassign JOBCTL_TRAPPING to bit 22 to better accommodate
    future additions.

    This doesn't cause any functional change.

    -v2: JOBCTL_*_BIT macros added as suggested by Linus.

    Signed-off-by: Tejun Heo
    Cc: Linus Torvalds
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     

04 Jun, 2011

5 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
    tg3: Fix tg3_skb_error_unmap()
    net: tracepoint of net_dev_xmit sees freed skb and causes panic
    drivers/net/can/flexcan.c: add missing clk_put
    net: dm9000: Get the chip in a known good state before enabling interrupts
    drivers/net/davinci_emac.c: add missing clk_put
    af-packet: Add flag to distinguish VID 0 from no-vlan.
    caif: Fix race when conditionally taking rtnl lock
    usbnet/cdc_ncm: add missing .reset_resume hook
    vlan: fix typo in vlan_dev_hard_start_xmit()
    net/ipv4: Check for mistakenly passed in non-IPv4 address
    iwl4965: correctly validate temperature value
    bluetooth l2cap: fix locking in l2cap_global_chan_by_psm
    ath9k: fix two more bugs in tx power
    cfg80211: don't drop p2p probe responses
    Revert "net: fix section mismatches"
    drivers/net/usb/catc.c: Fix potential deadlock in catc_ctrl_run()
    sctp: stop pending timers and purge queues when peer restart asoc
    drivers/net: ks8842 Fix crash on received packet when in PIO mode.
    ip_options_compile: properly handle unaligned pointer
    iwlagn: fix incorrect PCI subsystem id for 6150 devices
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/linux-block:
    block: Use hlist_entry() for io_context.cic_list.first
    cfq-iosched: Remove bogus check in queue_fail path
    xen/blkback: potential null dereference in error handling
    xen/blkback: don't call vbd_size() if bd_disk is NULL
    block: blkdev_get() should access ->bd_disk only after success
    CFQ: Fix typo and remove unnecessary semicolon
    block: remove unwanted semicolons
    Revert "block: Remove extra discard_alignment from hd_struct."
    nbd: adjust 'max_part' according to part_shift
    nbd: limit module parameters to a sane value
    nbd: pass MSG_* flags to kernel_recvmsg()
    block: improve the bio_add_page() and bio_add_pc_page() descriptions

    Linus Torvalds
     
  • * 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    asm-generic/unistd.h: support sendmmsg syscall
    tile: enable CONFIG_BUGVERBOSE

    Linus Torvalds
     
  • This reverts commit b1c43f82c5aa265442f82dba31ce985ebb7aa71c.

    It was broken in so many ways, and results in random odd pty issues.

    It re-introduced the buggy schedule_work() in flush_to_ldisc() that can
    cause endless work-loops (see commit a5660b41af6a: "tty: fix endless
    work loop when the buffer fills up").

    It also used an "unsigned int" return value for the ->receive_buf()
    function, but then made multiple functions return a negative error code,
    and didn't actually check for the error in the caller.

    And it didn't actually work at all. BenH bisected down odd tty behavior
    to it:
    "It looks like the patch is causing some major malfunctions of the X
    server for me, possibly related to PTYs. For example, cat'ing a
    large file in a gnome terminal hangs the kernel for -minutes- in a
    loop of what looks like flush_to_ldisc/workqueue code, (some ftrace
    data in the quoted bits further down).

    ...

    Some more data: It -looks- like what happens is that the
    flush_to_ldisc work queue entry constantly re-queues itself (because
    the PTY is full ?) and the workqueue thread will basically loop
    forever calling it without ever scheduling, thus starving the consumer
    process that could have emptied the PTY."

    which is pretty much exactly the problem we fixed in a5660b41af6a.

    Milton Miller pointed out the 'unsigned int' issue.

    Reported-by: Benjamin Herrenschmidt
    Reported-by: Milton Miller
    Cc: Stefan Bigler
    Cc: Toby Gray
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Alan Cox
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • …wireless-2.6 into for-davem

    John W. Linville
     

03 Jun, 2011

2 commits

  • Because there is a possibility that skb is kfree_skb()ed and zero cleared
    after ndo_start_xmit, we should not see the contents of skb like skb->len and
    skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that
    and causes panic by NULL pointer dereference.
    This patch fixes trace_net_dev_xmit not to see the contents of skb directly.

    If you want to reproduce this panic,

    1. Turn on the net_dev_xmit tracepoint
    2. Create 2 guests on KVM
    3. Make the 2 guests use virtio_net
    4. Execute netperf from one to another for a long time as a network burden
    5. The host will panic (it takes about 30 minutes)

    Signed-off-by: Koki Sanagi
    Signed-off-by: David S. Miller

    Koki Sanagi
     
  • Signed-off-by: Chris Metcalf
    Acked-by: Arnd Bergmann

    Chris Metcalf
     

02 Jun, 2011

3 commits

  • Currently, user-space cannot determine if a 0 tp_vlan_tci
    means there is no VLAN tag or the VLAN ID was zero.

    Add flag to make this explicit. User-space can check for
    TP_STATUS_VLAN_VALID || tp_vlan_tci > 0, which will be backwards
    compatible. Older code would have just checked for tp_vlan_tci,
    so it will work no worse than before.

    Signed-off-by: Ben Greear
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Ben Greear
     
  • * git://git.infradead.org/iommu-2.6:
    intel-iommu: Fix off-by-one in RMRR setup
    intel-iommu: Add domain check in domain_remove_one_dev_info
    intel-iommu: Remove Host Bridge devices from identity mapping
    intel-iommu: Use coherent DMA mask when requested
    intel-iommu: Dont cache iova above 32bit
    intel-iommu: Speed up processing of the identity_mapping function
    intel-iommu: Check for identity mapping candidate using system dma mask
    intel-iommu: Only unlink device domains from iommu
    intel-iommu: Enable super page (2MiB, 1GiB, etc.) support
    intel-iommu: Flush unmaps at domain_exit
    intel-iommu: Remove obsolete comment from detect_intel_iommu
    intel-iommu: fix VT-d PMR disable for TXT on S3 resume

    Linus Torvalds
     
  • Commit 0a35d36 ("cfg80211: Use capability info to detect mesh beacons")
    assumed that probe response with both ESS and IBSS bits cleared
    means that the frame was sent by a mesh sta.

    However, these capabilities are also being used in the p2p_find phase,
    and the mesh-validation broke it.

    Rename the WLAN_CAPABILITY_IS_MBSS macro, and verify that mesh ies
    exist before assuming this frame was sent by a mesh sta.

    Signed-off-by: Eliad Peller
    Signed-off-by: John W. Linville

    Eliad Peller
     

01 Jun, 2011

4 commits

  • * git://git.infradead.org/mtd-2.6:
    mtd: fix physmap.h warnings

    Linus Torvalds
     
  • There are no externally-visible changes with this. In the loop in the
    internal __domain_mapping() function, we simply detect if we are mapping:
    - size >= 2MiB, and
    - virtual address aligned to 2MiB, and
    - physical address aligned to 2MiB, and
    - on hardware that supports superpages.

    (and likewise for larger superpages).

    We automatically use a superpage for such mappings. We never have to
    worry about *breaking* superpages, since we trust that we will always
    *unmap* the same range that was mapped. So all we need to do is ensure
    that dma_pte_clear_range() will also cope with superpages.

    Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
    it can return a PTE at the appropriate level rather than always
    extending the page tables all the way down to level 1. Again, this is
    simplified by the fact that we should never encounter existing small
    pages when we're creating a mapping; any old mapping that used the same
    virtual range will have been entirely removed and its obsolete page
    tables freed.

    Provide an 'intel_iommu=sp_off' argument on the command line as a
    chicken bit. Not that it should ever be required.

    ==

    The original commit seen in the iommu-2.6.git was Youquan's
    implementation (and completion) of my own half-baked code which I'd
    typed into an email. Followed by half a dozen subsequent 'fixes'.

    I've taken the unusual step of rewriting history and collapsing the
    original commits in order to keep the main history simpler, and make
    life easier for the people who are going to have to backport this to
    older kernels. And also so I can give it a more coherent commit comment
    which (hopefully) gives a better explanation of what's going on.

    The original sequence of commits leading to identical code was:

    Youquan Song (3):
    intel-iommu: super page support
    intel-iommu: Fix superpage alignment calculation error
    intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()

    David Woodhouse (4):
    intel-iommu: Precalculate superpage support for dmar_domain
    intel-iommu: Fix hardware_largepage_caps()
    intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
    intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages

    Signed-off-by: Youquan Song
    Signed-off-by: David Woodhouse

    Youquan Song
     
  • Fix build warnings in physmap.h:

    include/linux/mtd/physmap.h:25: warning: 'struct platform_device' declared inside parameter list
    include/linux/mtd/physmap.h:25: warning: its scope is only this definition or declaration, which is probably not what you want
    include/linux/mtd/physmap.h:26: warning: 'struct platform_device' declared inside parameter list
    include/linux/mtd/physmap.h:27: warning: 'struct platform_device' declared inside parameter list
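
Warnings of this kind usually appear when a struct tag is first seen inside a function prototype, which gives the type prototype scope only. A common remedy is a forward declaration at file scope, sketched below. The struct and function names prefixed `sketch_` are hypothetical and are not taken from physmap.h; whether the actual fix used a forward declaration or an `#include` is not stated in this message.

```c
#include <stddef.h>

/* Forward declaration at file scope: the incomplete type is now visible to
 * every later prototype, so each one refers to the same struct. */
struct platform_device;

/* Hypothetical platform-data structure with a hook taking the device. */
struct sketch_flash_data {
    int (*init)(struct platform_device *pdev);
};

/* Hypothetical hook: reports failure when no device is supplied. */
static int sketch_init(struct platform_device *pdev)
{
    return pdev == NULL ? -1 : 0;
}

/* Call the optional init hook, treating a missing hook as success. */
static int sketch_run_init(const struct sketch_flash_data *data,
                           struct platform_device *pdev)
{
    return data->init ? data->init(pdev) : 0;
}
```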

    Signed-off-by: Randy Dunlap
    Signed-off-by: David Woodhouse

    Randy Dunlap
     
  • If the peer restarts the association, we should not only fail any
    unsent/unacked data, but also stop the T3-rtx, SACK, and T4-rto timers
    and tear down the ASCONF queues.

    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Wei Yongjun
     

31 May, 2011

1 commit

  • Since those macro-defined functions require an additional semicolon
    from the caller, they could cause syntax errors
    when used in if-else statements.
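
To illustrate the hazard, here is a minimal sketch assuming a statement-like macro; the names `SKETCH_RESET` and `sketch_pick` are hypothetical and do not come from the patch. A macro defined as a bare `{ ... }` block followed by the caller's semicolon ends the `if` statement early, so a following `else` no longer parses. Wrapping the body in `do { } while (0)` is the usual idiom that makes the macro behave as a single statement; whether the patch used this idiom or static inline functions is not stated in the message.

```c
/* Problematic form (kept as a comment, since using it before an `else`
 * would not compile):
 *
 *   #define SKETCH_RESET(x) { (x) = 0; }
 *   if (flag) SKETCH_RESET(v); else v = 7;   // error: 'else' without 'if'
 */

/* Safe form: expands to exactly one statement once the caller adds ';'. */
#define SKETCH_RESET(x) do { (x) = 0; } while (0)

/* Hypothetical caller that uses the macro in an if-else. */
static int sketch_pick(int flag)
{
    int v = 5;

    if (flag)
        SKETCH_RESET(v);
    else
        v = 7;
    return v;
}
```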

    Signed-off-by: Namhyung Kim
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Namhyung Kim
     

30 May, 2011

6 commits

  • It was not a good idea to start dereferencing disk->queue from
    the fs sysfs strategy for displaying discard alignment. We ran
    into first a NULL pointer deref, and after fixing that we sometimes
    see invalid disk->queue pointer values.

    Since discard is the only one of the bunch actually looking into
    the queue, just revert the change.

    This reverts commit 23ceb5b7719e9276d4fa72a3ecf94dd396755276.

    Conflicts:
    fs/partitions/check.c

    Jens Axboe
     
  • Add an API that tells the other side that callbacks
    should be delayed until a lot of work has been done.
    Implement using the new event_idx feature.

    Note: it might seem advantageous to let the drivers
    ask for a callback after a specific capacity has
    been reached. However, as a single head can
    free many entries in the descriptor table,
    we don't really have a clue about capacity
    until get_buf is called. The API is the simplest
    to implement at the moment, we'll see what kind of
    hints drivers can pass when there's more than one
    user of the feature.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • With the new used_event and avail_event features, both
    host and guest need similar logic to check whether events are
    enabled, so it helps to put the common code in the header.

    Note that Xen has similar logic for notification hold-off
    in include/xen/interface/io/ring.h with req_event and req_prod
    corresponding to event_idx + 1 and new_idx respectively.
    +1 comes from the fact that req_event and req_prod in Xen start at 1,
    while event index in virtio starts at 0.
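
A minimal sketch of such a shared check, assuming free-running 16-bit ring indices: the function name `sketch_need_event` and its parameter names are illustrative, and the header's actual helper may differ in name and signature. The idea is that an event is needed when the index requested by the other side falls within the range of entries added since the last notification, with unsigned 16-bit arithmetic absorbing wraparound.

```c
#include <stdint.h>

/* Return nonzero if moving the index from old_idx to new_idx crossed the
 * entry the other side asked to be notified about (event_idx).
 * All arithmetic is modulo 2^16, so index wraparound is handled. */
static inline int sketch_need_event(uint16_t event_idx, uint16_t new_idx,
                                    uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

With `old_idx = 0` and `new_idx = 3`, the check fires for `event_idx` values 0, 1 and 2 and stays quiet for 3 and above, which matches the zero-based event index described above.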

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • Define a new feature bit for the guest and host to utilize
    an event index (like Xen) instead of a flag bit to enable/disable
    interrupts and kicks.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • It's unclear to me if it's important, but it's obviously causing my
    technical colleagues some headaches and I'd hate such imprecision to
    slow virtio adoption.

    I've emailed this to all non-trivial contributors for approval, too.

    Signed-off-by: Rusty Russell
    Acked-by: Grant Likely
    Acked-by: Ryan Harper
    Acked-by: Anthony Liguori
    Acked-by: Eric Van Hensbergen
    Acked-by: john cooper
    Acked-by: Aneesh Kumar K.V
    Acked-by: Christian Borntraeger
    Acked-by: Fernando Luis Vazquez Cao

    Rusty Russell
     
  • * 'pnfs-submit' of git://git.open-osd.org/linux-open-osd: (32 commits)
    pnfs-obj: pg_test check for max_io_size
    NFSv4.1: define nfs_generic_pg_test
    NFSv4.1: use pnfs_generic_pg_test directly by layout driver
    NFSv4.1: change pg_test return type to bool
    NFSv4.1: unify pnfs_pageio_init functions
    pnfs-obj: objlayout_encode_layoutcommit implementation
    pnfs: encode_layoutcommit
    pnfs-obj: report errors and .encode_layoutreturn Implementation.
    pnfs: encode_layoutreturn
    pnfs: layoutret_on_setattr
    pnfs: layoutreturn
    pnfs-obj: osd raid engine read/write implementation
    pnfs: support for non-rpc layout drivers
    pnfs-obj: define per-inode private structure
    pnfs: alloc and free layout_hdr layoutdriver methods
    pnfs-obj: objio_osd device information retrieval and caching
    pnfs-obj: decode layout, alloc/free lseg
    pnfs-obj: pnfs_osd XDR client implementation
    pnfs-obj: pnfs_osd XDR definitions
    pnfs-obj: objlayoutdriver module skeleton
    ...

    Linus Torvalds