Discussion:
RTP over TCP bugs
Ralf Globisch
2011-07-17 09:19:08 UTC
Hi Ross,

We use the LIVE555 Streaming Media library to implement a live streaming server
(on Ubuntu) and a Windows Media Player RTSP client, streaming RTP over TCP only.
We have run into a complicated set of problems recently:

1) When a client with a *bad* internet connection joins the RTSP server, the
server becomes unresponsive (to new RTSP connections) and connected clients
are dropped.
Enabling the DEBUG preprocessor directive showed that many writes fail in
RTPInterface::sendRTPOverTCP before the server finally hangs. After searching
the mailing list for similar problems, I came across this post:
http://lists.live555.com/pipermail/live-devel/2010-June/012226.html
which seems to indicate that using the new asynchronous RTSP client resolves
that issue?

Could you please clarify this a bit? If I understand correctly, we must
therefore upgrade both our server and client code to use the latest live media
version to avoid the server hanging? Or would updating the client code to use
the new async RTSPClient code suffice?

2) We are aware that you only support the latest version, so we recently
upgraded the live555 version.
We then ran into another issue: while the stream is playing back, CPU usage is
negligible. Then, once network problems occur, the CPU usage spikes to 100%.
Matt's post (http://lists.live555.com/pipermail/live-devel/2011-June/013521.html)
could be related.
However, commenting out the addition of the dummy socket to fWriteSet and
fExceptionSet, as you advised in your response to Matt, didn't solve the
problem. Upgrading our WMP client to use the new asynchronous interface didn't
help either. Finally, we were able to replicate the problem using only
openRTSP.
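
For problem 1, the failing writes can be illustrated with a minimal sketch, assuming Linux/POSIX sockets. This is not LIVE555 code and says nothing about how RTPInterface::sendRTPOverTCP is implemented: the function name bytesAcceptedBeforeWouldBlock is hypothetical, and a local socketpair() whose far end never reads stands in for a client on a bad connection. It only shows that once the peer stops draining data, a non-blocking send() starts failing with EAGAIN/EWOULDBLOCK (and a blocking send() would stall the calling thread instead).

```cpp
#include <sys/socket.h>
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstring>

// Hypothetical helper (not part of LIVE555): writes fixed-size chunks to a
// socket whose peer never reads. Returns the number of bytes the kernel
// accepted before send() failed with EAGAIN/EWOULDBLOCK, or -1 on any
// other error.
long bytesAcceptedBeforeWouldBlock(int fd) {
  assert(fd >= 0);

  // Switch the socket to non-blocking mode so a full send buffer is reported
  // as an error instead of stalling the caller.
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) return -1;

  char chunk[1400];  // roughly one RTP packet's worth of payload
  std::memset(chunk, 0, sizeof chunk);

  long total = 0;
  for (;;) {
    // MSG_NOSIGNAL (Linux) avoids SIGPIPE if the peer has gone away.
    ssize_t n = send(fd, chunk, sizeof chunk, MSG_NOSIGNAL);
    if (n > 0) { total += n; continue; }
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
      return total;  // send buffer is full: further writes fail
    }
    return -1;  // some other failure
  }
}
```

Calling this on one end of a socketpair(AF_UNIX, SOCK_STREAM, 0, fds) returns a positive byte count (the size depends on kernel buffer settings), after which every further write fails until the other end reads.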

Here's a short summary of the steps taken so far:

LiveMedia 2010.04.09 - synchronous - WMP RTSPClient - no problems with CPU usage
LiveMedia 2011.03.14 - synchronous - WMP RTSPClient - same CPU usage problem
LiveMedia 2011.03.14 - asynchronous - WMP RTSPClient - same CPU usage problem
LiveMedia 2011.03.14 - asynchronous - openRTSP - same CPU usage problem
LiveMedia 2011.07.15 - asynchronous - openRTSP - same CPU usage problem

Ways to replicate the problem:
- One method that has worked to trigger the problem is to start openRTSP and
then restart our ADSL router once the stream is active.

Could this have anything to do with "the Windows bug that forced me to
add "dummySocket" in the first place"?

We are currently piloting the project and these errors are critical
for the success of the pilot.
Any pointers/advice would be greatly appreciated.

Best regards,
Ralf
Ralf Globisch
2011-07-17 10:41:08 UTC
In case it helps: profiling shows that the following seems to account
for the CPU usage:

void SocketDescriptor::tcpReadHandler(SocketDescriptor* socketDescriptor, int mask) {
  socketDescriptor->tcpReadHandler1(mask);
}

Commenting out the dummy descriptor on fReadSet didn't help solve the issue.

The stack trace that seems to cause the high CPU usage looks roughly as follows:

BasicTaskScheduler::SingleStep
SocketDescriptor::tcpReadHandler
socketDescriptor->tcpReadHandler1 -> [fTCPReadingState = AWAITING_PACKET_DATA]
rtpInterface->fReadHandlerProc -> MultiFramedRTPSource::networkReadHandler
MultiFramedRTPSource::networkReadHandler1 -> [packetReadWasIncomplete = false -> fPacketReadInProgress = NULL]
....
if (bPacket->dataSize() < 12) break; -> [readSuccess = false]
doGetNextFrame1();
Does this help at all in tracking down the issue? Perhaps the next read event
could be offset by a few milliseconds?
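
One plausible mechanism for such a spin (an assumption, not a confirmed diagnosis of the LIVE555 code) is a select()-based loop that keeps being woken for a socket that is permanently "readable" but never yields data, for example after the peer has closed the connection. The sketch below is plain POSIX C++ and not LIVE555 code; countEmptyWakeups and makeSocketWithClosedPeer are hypothetical names, and a local socketpair() stands in for the broken network connection.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: runs a select() loop over `fd` for at most
// `maxIterations` passes and returns how many times the descriptor was
// reported readable without delivering any data. If `removeOnEof` is true,
// the descriptor is no longer watched after the first empty read.
int countEmptyWakeups(int fd, int maxIterations, bool removeOnEof) {
  assert(fd >= 0 && fd < FD_SETSIZE);
  bool watching = true;
  int emptyWakeups = 0;

  for (int i = 0; i < maxIterations && watching; ++i) {
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);
    timeval timeout = {0, 100000};  // 100 ms; never reached once EOF is pending

    int ready = select(fd + 1, &readSet, nullptr, nullptr, &timeout);
    if (ready <= 0) continue;  // timeout or error: nothing readable

    char buf[1500];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0) continue;  // real data arrived; nothing wrong

    // read() returned 0 (EOF) or an error, yet select() will report the
    // descriptor as readable again immediately on the next pass.
    ++emptyWakeups;
    if (removeOnEof) watching = false;  // stop watching the dead socket
  }
  return emptyWakeups;
}

// Hypothetical helper: returns one end of a connected pair whose other end
// has already been closed, so every read() on it reports EOF. Returns -1 on
// failure.
int makeSocketWithClosedPeer() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;
  close(fds[0]);
  return fds[1];
}
```

With a descriptor from makeSocketWithClosedPeer(), countEmptyWakeups(fd, 1000, false) is woken on every pass without ever sleeping, which is the shape of a 100% CPU loop, while countEmptyWakeups(fd, 1000, true) stops after the first empty read. Delaying the next read by a few milliseconds would lower the CPU usage of such a loop but would not remove the underlying condition.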

Regards,
Ralf
Ross Finlayson
2011-07-18 08:02:07 UTC
Post by Ralf Globisch
Could you please clarify this a bit? If I understand correctly, we must
therefore upgrade both our server and client code to use the latest live media
version to avoid the server hanging? Or would updating the client code to use
the new async RTSPClient code suffice?
I'm not interested in any alleged server bug, unless the server is
running an up-to-date version of our software (in which case it
doesn't matter what version of our software the client is running, or
whether or not the client is using the synchronous or asynchronous
interface). Similarly, I'm not interested in any alleged client bug,
unless the client is running an up-to-date version of our software
(in which case it doesn't matter what version of our software the
server is running).

However, if you have control over *both* the client and server
software, then it would be best if you upgrade both the client and
the server to the latest version.

In any case, I can't really help you with a non-specific bug claim
(e.g., "CPU usage spikes to 100%") unless you can show how it can be
repeated using our supplied, unmodified software and demo
applications (the latest version, of course).

If, however, you have a problem that arises only with your own,
custom code, then you're going to have to track down *specifically*
what is happening, and in which part of our code. Sorry (but
Remember, You Have Complete Source Code).

Referring to old archived email messages usually doesn't help,
because the problems referred to in those emails either weren't real,
or else should have been fixed in later versions of the code.
--
Ross Finlayson
Live Networks, Inc.
http://www.live555.com/
Ralf Globisch
2011-07-18 08:30:03 UTC
Hi Ross,

Thanks for your response.
Post by Ross Finlayson
I'm not interested in any alleged server bug, unless the server is
running an up-to-date version of our software (in which case it
doesn't matter what version of our software the client is running, or
whether or not the client is using the synchronous or asynchronous
OK, thanks, point made. We are in the process of updating our server
code base to the latest version of live555. The term *bug* was
perhaps inappropriate in the title.
Post by Ross Finlayson
interface). Similarly, I'm not interested in any alleged client bug,
unless the client is running an up-to-date version of our software
(in which case it doesn't matter what version of our software the
server is running).
Since you stated that it doesn't matter what version the server is running:

Perhaps I didn't express myself clearly enough. It is for that exact
reason that I replicated the client problem using only an *unmodified*
openRTSP client built from the *latest* version of LiveMedia.
The accompanying stack trace (if you can call it that) was obtained by
running the openRTSP client.

I was able to (quickly) replicate the problem multiple times by resetting
the ADSL router. However, the problem occurs in practice without
needing to do this, since the Internet infrastructure in South Africa
is unreliable/not as advanced as in the US/Europe.
It just takes longer for it to happen (from minutes to hours).

FWIW, I will try to replicate this when connecting openRTSP to
the standard Live555 RTSP server.
Post by Ross Finlayson
Referring to old archived email messages usually doesn't help,
because the problems referred to in those emails either weren't real,
or else should have been fixed in later versions of the code.
Agreed. Please accept my apology. (At the time of writing, I thought the
second reference was valid since it was from a fairly recent discussion.)
However, profiling the code showed that it clearly wasn't relevant.