08 Jun, 2011

1 commit

  • The flush_to_ldisc() work entry has special logic to notice when it has
    seen the original tail of the data queue, and it avoids continuing the
    flush if it sees that _original_ tail rather than the current tail.

    This logic can trigger in case somebody is constantly adding new data to
    the tty while the flushing is active - and the intent is to avoid
    excessive CPU usage while flushing the tty, especially as we used to do
    this from a softirq context which made it non-preemptible.

    However, since we no longer re-arm the work-queue from within itself
    (because that causes other trouble: see commit a5660b41af6a "tty: fix
    endless work loop when the buffer fills up"), this just leads to
    possible hung tty's (most easily seen in SMP and with a test-program
    that floods a pty with data - nobody seems to have reported this for any
    real-life situation yet).

    And since the workqueue isn't done from timers and softirq's any more,
    it's doubtful whether the CPU useage issue is really relevant any more.
    So just remove the logic entirely, and see if anybody ever notices.

    Alternatively, we might want to re-introduce the "re-arm the work" for
    just this case, but then we'd have to re-introduce the delayed work
    model or some explicit timer, which really doesn't seem worth it for
    this.

    Reported-and-tested-by: Guillaume Chazarain
    Cc: Alan Cox
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Jun, 2011

1 commit

  • This reverts commit b1c43f82c5aa265442f82dba31ce985ebb7aa71c.

    It was broken in so many ways, and results in random odd pty issues.

    It re-introduced the buggy schedule_work() in flush_to_ldisc() that can
    cause endless work-loops (see commit a5660b41af6a: "tty: fix endless
    work loop when the buffer fills up").

    It also used an "unsigned int" return value fo the ->receive_buf()
    function, but then made multiple functions return a negative error code,
    and didn't actually check for the error in the caller.

    And it didn't actually work at all. BenH bisected down odd tty behavior
    to it:
    "It looks like the patch is causing some major malfunctions of the X
    server for me, possibly related to PTYs. For example, cat'ing a
    large file in a gnome terminal hangs the kernel for -minutes- in a
    loop of what looks like flush_to_ldisc/workqueue code, (some ftrace
    data in the quoted bits further down).

    ...

    Some more data: It -looks- like what happens is that the
    flush_to_ldisc work queue entry constantly re-queues itself (because
    the PTY is full ?) and the workqueue thread will basically loop
    forver calling it without ever scheduling, thus starving the consumer
    process that could have emptied the PTY."

    which is pretty much exactly the problem we fixed in a5660b41af6a.

    Milton Miller pointed out the 'unsigned int' issue.

    Reported-by: Benjamin Herrenschmidt
    Reported-by: Milton Miller
    Cc: Stefan Bigler
    Cc: Toby Gray
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Alan Cox
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Apr, 2011

1 commit

  • it makes it simpler to keep track of the amount of
    bytes received and simplifies how flush_to_ldisc counts
    the remaining bytes. It also fixes a bug of lost bytes
    on n_tty when flushing too many bytes via the USB
    serial gadget driver.

    Tested-by: Stefan Bigler
    Tested-by: Toby Gray
    Signed-off-by: Felipe Balbi
    Signed-off-by: Greg Kroah-Hartman

    Felipe Balbi
     

05 Apr, 2011

1 commit

  • Commit f23eb2b2b285 ('tty: stop using "delayed_work" in the tty layer')
    ended up causing hung machines on UP with no preemption, because the
    work routine to flip the buffer data to the ldisc would endlessly re-arm
    itself if the destination buffer had filled up.

    With the delayed work, that only caused a timer-driving polling of the
    tty state every timer tick, but without the delay we just ended up with
    basically a busy loop instead.

    Stop the insane polling, and instead make the code that opens up the
    receive room re-schedule the buffer flip work. That's what we should
    have been doing anyway.

    This same "poll for tty room" issue is almost certainly also the cause
    of excessive kworker activity when idle reported by Dave Jones, who also
    reported "flush_to_ldisc executing 2500 times a second" back in Nov 2010:

    http://lkml.org/lkml/2010/11/30/592

    which is that silly flushing done every timer tick. Wasting both power
    and CPU for no good reason.

    Reported-and-tested-by: Alexander Beregalov
    Reported-and-tested-by: Sitsofe Wheeler
    Cc: Greg KH
    Cc: Alan Cox
    Cc: Dave Jones
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Mar, 2011

1 commit

  • Using delayed-work for tty flip buffers ends up causing us to wait for
    the next tick to complete some actions. That's usually not all that
    noticeable, but for certain latency-critical workloads it ends up being
    totally unacceptable.

    As an extreme case of this, passing a token back-and-forth over a pty
    will take two ticks per iteration, so even just a thousand iterations
    will take 8 seconds assuming a common 250Hz configuration.

    Avoiding the whole delayed work issue brings that ping-pong test-case
    down to 0.009s on my machine.

    In more practical terms, this latency has been a performance problem for
    things like dive computer simulators (simulating the serial interface
    using the ptys) and for other environments (Alan mentions a CP/M emulator).

    Reported-by: Jef Driesen
    Acked-by: Greg KH
    Acked-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

10 Nov, 2010

1 commit

  • There's a small window inside the flush_to_ldisc function,
    where the tty is unlocked and calling ldisc's receive_buf
    function. If in this window new buffer is added to the tty,
    the processing might never leave the flush_to_ldisc function.

    This scenario will hog the cpu, causing other tty processing
    starving, and making it impossible to interface the computer
    via tty.

    I was able to exploit this via pty interface by sending only
    control characters to the master input, causing the flush_to_ldisc
    to be scheduled, but never actually generate any output.

    To reproduce, please run multiple instances of following code.

    - SNIP
    #define _XOPEN_SOURCE
    #include
    #include
    #include
    #include
    #include

    int main(int argc, char **argv)
    {
    int i, slave, master = getpt();
    char buf[8192];

    sprintf(buf, "%s", ptsname(master));
    grantpt(master);
    unlockpt(master);

    slave = open(buf, O_RDWR);
    if (slave < 0) {
    perror("open slave failed");
    return 1;
    }

    for(i = 0; i < sizeof(buf); i++)
    buf[i] = rand() % 32;

    while(1) {
    write(master, buf, sizeof(buf));
    }

    return 0;
    }
    - SNIP

    The attached patch (based on -next tree) fixes this by checking on the
    tty buffer tail. Once it's reached, the current work is rescheduled
    and another could run.

    Signed-off-by: Jiri Olsa
    Cc: stable
    Acked-by: Alan Cox
    Signed-off-by: Greg Kroah-Hartman

    Jiri Olsa
     

05 Nov, 2010

1 commit