RFC 3550: RTP: A Transport Protocol for Real-Time Applications

30 Mar 2026

This algorithm may be used for sessions in which all participants are allowed to send. O The interval between RTCP packets is varied randomly over the range 0.5,1.5 times the calculated interval to avoid unintended synchronization of all participants . This allows an application to provide fast response for small sessions where, for example, identification of all participants is important, yet automatically adapt to large sessions. The algorithm described in Section 6.3 and Appendix A.7 was designed to meet the goals outlined in this section. O For all sessions, the fixed minimum SHOULD be used when calculating the participant timeout interval (see Section 6.3.5) so that implementations which do not use the reduced value for transmitting RTCP packets are not timed out by other participants prematurely.

Methods for Ensuring QoS in RTP Streams

Actual presentation occurs some time later as determined by the receiver. Therefore, although these timestamps are sufficient to reconstruct the timing of a single stream, directly comparing RTP timestamps from different media is not effective for synchronization. The resolution of the clock MUST be sufficient for the desired synchronization accuracy and for measuring packet arrival jitter (one tick per video frame is typically not sufficient). The sampling instant MUST be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculations (see Section 6.4.1).

In general, a translator SHOULD NOT aggregate SR and RR packets from different sources into one packet since that would reduce the accuracy of the propagation delay measurements based on the LSR and DLSR fields.
The specification of such protocols and mechanisms is outside the scope of this document.
In particular, this approach should be applied to the multiple sessions of a layered encoding scheme (see Section 2.4).
Other transport protocols specifically designed for multimedia sessions are SCTP and DCCP, although, as of 2012update, they were not in widespread use.
Modern implementations use adaptive jitter buffers that dynamically adjust their size based on observed network conditions.
Despite the separation, synchronized playback of a source’s audio and video can be achieved using timing information carried in the RTCP packets for both sessions.

RTP Header Structure

The only difference between the sender report (SR) and receiver report (RR) forms, besides the packet type code, is that the sender report includes a 20-byte sender information section for use by active senders. 6.4 Sender and Receiver Reports RTP receivers provide reception quality feedback using RTCP report packets which may take one of two forms depending upon whether or not the receiver is also a sender. In particular, this approach should be applied to the multiple sessions of a layered encoding scheme (see Section 2.4). For example, an application may be designed to send only CNAME, NAME and EMAIL and not any others. Rather than estimate these fractions dynamically, it is recommended that the percentages be translated statically into report interval counts based on the typical length of an item. To do this, the participant computes the deterministic (without the randomization factor) calculated interval Td for a receiver, that is, with we_sent false.
Research on audio and video over packet-switched networks dates back to the early 1970s. RTP is used in communication and entertainment systems that involve streaming media, such as telephony, video teleconference applications including WebRTC, television services and web-based push-to-talk features. The Real-time Transport Protocol (RTP) is a network protocol for delivering audio and video over IP networks.
Standards Track Page 74 RFC 3550 RTP July 2003 Appendix A – Algorithms We provide examples of C code for aspects of RTP sender and receiver algorithms. Acknowledgments This memorandum is based on discussions within the IETF Audio/Video Transport working group chaired by Stephen Casner and Colin Perkins. These names are for use by higher-level control protocols, such as the Session Description Protocol (SDP), RFC 2327 , to refer to transport methods. Rightly or not, users may be more sensitive to privacy concerns with audio and video communication than they have been with more traditional forms of network communication . In addition, RTP may be sent via IP multicast, which provides no direct means for a sender to know all the receivers of the data sent and therefore no measure of privacy.

What is SRTP?

In the context of RTP over IP multicast, the source can stripe the progressive layers of a hierarchically represented signal across multiple RTP sessions each carried on its own multicast group. Instead, responsibility for rate-adaptation can be placed at the receivers by combining a layered encoding with a layered transmission system. This does not work well with multicast transmission because of the conflicting bandwidth requirements of heterogeneous receivers. 2.4 Layered Encodings Multimedia applications should be able to adjust the transmission rate to match the capacity of the receiver or to adapt to network congestion. Other examples of translation include the connection of a group of hosts speaking only IP/UDP to a group of hosts that understand only ST-II, or the packet-by-packet encoding translation of video streams from individual sources without resynchronization or mixing.

Jitter Buffer

Examples of such protocols include the Session Initiation Protocol (SIP) (RFC 3261 ), ITU Recommendation H.323 and applications using SDP (RFC 2327 ), such as RTSP (RFC 2326 ). It is also acceptable for a third-party monitor to receive the RTP data packets but not send RTCP packets or otherwise be counted in the session. The monitor function is likely to be built into the application(s) participating in the session, but may also be a separate application that does not otherwise participate and does not send or receive the RTP data packets (since they are on a separate port). An end system can act as one or more synchronization sources in a particular RTP session, but typically only one. Standards Track Page 10 RFC 3550 RTP July 2003 was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer). A participant need not use the same SSRC identifier for all the RTP sessions in a multimedia session; the binding of the SSRC identifiers is provided through RTCP (see Section 6.5.1).

In order to track loops of the participant’s own data packets, the implementation luckygans casino MUST also keep a separate list of source transport addresses (not identifiers) that have been found to be conflicting. Note that if two sources on the same host are transmitting with the same source identifier at the time a receiver begins operation, it would be possible that the first RTP packet received came from one of the sources while the first RTCP packet received came from the other. This problem can be avoided by keeping the source transport address fixed across restarts, but in any case will be resolved after a timeout at the receivers. (As explained below, this step is taken only once in case of a loop.) If a receiver discovers that two other sources are colliding, it MAY keep the packets from one and discard the packets from the other when this can be detected by different source transport addresses or CNAMEs.

This correspondence may be used for intra- and inter-media synchronization for sources whose NTP timestamps are synchronized, and may be used by media-independent receivers to estimate the nominal RTP clock frequency.
To do this, the participant computes the deterministic (without the randomization factor) calculated interval Td for a receiver, that is, with we_sent false.
The packet-based data transmission in RTP reduces buffering and lag, and diverse payload formats allow accommodation to various codecs and resolutions.
The trade-off is that the buffer adds a small amount of latency, typically 20 to 60 ms for voice calls.
O Timing out a participant is to be based on inactivity for a number of RTCP report intervals calculated using the receiver RTCP bandwidth fraction even for active senders.
Examples of synchronization sources include the sender of a stream of packets derived from a signal source such as a microphone or a camera, or an RTP mixer (see below).

Standards Track Page 7 RFC 3550 RTP July 2003 Mixers and translators may be designed for a variety of purposes. The RTP header includes a means for mixers to identify the sources that contributed to a mixed packet so that correct talker indication can be provided at the receivers. The sequence number can also be used by the receiver to estimate how many packets are being lost. In these examples, RTP is carried on top of IP and UDP, and follows the conventions established by the profile for audio and video specified in the companion RFC 3551. A profile for audio and video data may be found in the companion RFC 3551 .
On the other hand, multiplexing multiple related sources of the same medium in one RTP session using different SSRC values is the norm for multicast sessions. The RTCP sender and receiver reports (see Section 6.4) can only describe one timing and sequence number space per SSRC and do not carry a payload type field. For example, in a teleconference composed of audio and video media encoded separately, each medium SHOULD be carried in a separate RTP session with its own destination transport address.

Can RTP stream both audio and video simultaneously?

Security Considerations RTP suffers from the same security liabilities as the underlying protocols. Those are the RTCP fraction of session bandwidth, the minimum report interval, and the bandwidth split between senders and receivers. A profile for audio and video applications may be found in the companion RFC 3551. Carrying several RTP packets in one network or transport packet reduces header overhead and may simplify synchronization between different streams. A profile MAY specify a framing method to be used even when RTP is carried in protocols that do provide framing in order to allow carrying several RTP packets in one lower-layer protocol data unit, such as a UDP packet.

ข่าวประชาสัมพันธ์