Audio/Video Transport WG                                     Y.-K. Wang 
Internet Draft                                                    Nokia 
Intended status: Standards track                             T. Schierl 
Expires: February 2009                                   Fraunhofer HHI 
                                                        August 21, 2008 
 
                                      
                     RTP Payload Format for MVC Video 
                       draft-wang-avt-rtp-mvc-02.txt 


Status of this Memo 

   By submitting this Internet-Draft, each author represents that any 
   applicable patent or other IPR claims of which he or she is aware 
   have been or will be disclosed, and any of which he or she becomes 
   aware will be disclosed, in accordance with Section 6 of BCP 79. 

   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts. 

   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 

   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt 

   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html 

   This Internet-Draft will expire on February 21, 2009. 

Copyright Notice 

   Copyright (C) The IETF Trust (2008). 

 

Abstract 

   This memo describes an RTP payload format for the multiview 
   extension of the ITU-T Recommendation H.264 video codec that is 
   technically identical to ISO/IEC International Standard 14496-10.  
 
 
 
Wang et al            Expires February 21, 2009                [Page 1] 

Internet-Draft     RTP Payload Format for MVC Video         August 2008 
    

   The RTP payload format allows for packetization of one or more 
   Network Abstraction Layer (NAL) units, produced by the video 
   encoder, in each RTP payload.  The payload format has wide 
   applicability, such as 3D video streaming, free-viewpoint video, and 
   3DTV. 
    

Table of Contents 

   1. Introduction...................................................3 
   2. Conventions....................................................3 
   3. The MVC Codec..................................................4 
      3.1. Overview..................................................4 
      3.2. Parameter Set Concept.....................................5 
      3.3. Network Abstraction Layer Unit Header.....................5 
   4. Scope..........................................................8 
   5. Definitions and Abbreviations..................................8 
      5.1. Definitions...............................................8 
         5.1.1. Definitions per MVC specification....................8 
         5.1.2. Definitions local to this memo.......................9 
      5.2. Abbreviations.............................................9 
   6. MVC RTP Payload Format.........................................9 
      6.1. Design Principles.........................................9 
      6.2. RTP Header Usage.........................................10 
      6.3. Common Structure of the RTP Payload Format...............10 
      6.4. NAL Unit Header Usage....................................10 
      6.5. Packetization Modes......................................11 
         6.5.1. Packetization Modes for single-session transmission.11 
         6.5.2. Packetization Modes for multi-session transmission..12 
      6.6. Aggregation Packets......................................12 
      6.7. Fragmentation Units (FUs)................................12 
      6.8. Payload Content Scalability Information (PACSI) NAL Unit for 
      MVC...........................................................12 
      6.9. Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs)16 
      6.10. Cross-Session DON (CS-DON) for multi-session transmission16 
   7. Packetization Rules...........................................16 
   8. De-Packetization Process (Informative)........................18 
   9. Payload Format Parameters.....................................18 
      9.1. Media Type Registration..................................18 
      9.2. SDP Parameters...........................................20 
         9.2.1. Mapping of Payload Type Parameters to SDP...........20 
         9.2.2. Usage with the SDP Offer/Answer Model...............20 
         9.2.3. Usage with multi-session transmission...............20 
         9.2.4. Usage in Declarative Session Descriptions...........20 
      9.3. Examples.................................................20 
      9.4. Parameter Set Considerations.............................20 
   10. Security Considerations......................................20 
   11. Congestion Control...........................................21 
   12. IANA Considerations..........................................21 
   13. Acknowledgments..............................................21 
   14. References...................................................21 
      14.1. Normative References....................................21 
      14.2. Informative References..................................22 
   Author's Addresses...............................................22 
   Intellectual Property Statement..................................23 
   Disclaimer of Validity...........................................23 
   15. Open issues:.................................................24 
   16. Changes Log..................................................24 
    
    

1. Introduction 

   This memo specifies an RTP [RFC3550] payload format for a forthcoming 
   new mode of the H.264/AVC video coding standard, known as Multiview 
   Video Coding (MVC).  Formally, MVC will take the form of Amendment 4 
   to ISO/IEC 14496 Part 10 [MPEG4-10], and Annex H of ITU-T Rec. H.264 
   [H.264]. The latest draft specification of MVC is available in [MVC]. 

   MVC covers a wide range of 3D video applications, including 3D video 
   streaming, free-viewpoint video as well as 3DTV. 

   This memo follows a backward compatible enhancement philosophy, by 
   keeping as close an alignment to the H.264/AVC payload format 
   [RFC3984] as possible.  It documents the enhancements relevant from 
   an RTP transport viewpoint, and defines signaling support for MVC, 
   including a new media subtype name. 

   Due to the similarity between MVC and SVC in system and transport 
   aspects, this memo reuses the design principles as well as many 
   features of the SVC RTP payload draft [I-D.draft-ietf-avt-svc]. 

    

   [Ed.Note(TS):Need text on session multiplexing and on the relation of 
   this draft to [I-D.draft-ietf-avt-svc] here.] 

2. Conventions 

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in BCP 14, RFC 2119 
   [RFC2119]. 

   This specification uses the notion of setting and clearing a bit when 
   bit fields are handled.  Setting a bit is the same as assigning that 
   bit the value of 1 (On).  Clearing a bit is the same as assigning 
   that bit the value of 0 (Off). 

3. The MVC Codec 

3.1. Overview 

   MVC provides multi-view video bitstreams.  An MVC bitstream contains 
   a base view conforming to at least one of the profiles of H.264/AVC 
   as defined in Annex A of [H.264], and one or more non-base views.  To 
   enable high compression efficiency, coding of a non-base view can 
   utilize other views for inter-view prediction, thus its decoding 
   relies on the presence of the views it depends on.  Each coded view 
   itself may be temporally scalable.  Besides temporal scalability, MVC 
   also supports view scalability, wherein a subset of the encoded views 
   can be extracted, decoded and displayed, whenever it is desired by 
   the application. 

   The concept of video coding layer (VCL) and network abstraction layer 
   (NAL) is inherited from H.264/AVC.  The VCL contains the signal 
   processing functionality of the codec: mechanisms such as transform, 
   quantization, motion-compensated prediction, loop filtering, and 
   inter-view prediction.  The NAL encapsulates each slice generated by 
   the VCL into one or more NAL units.  Please consult [RFC3984] for a 
   more in-depth discussion of the NAL unit concept.  MVC specifies the 
   decoding order of NAL units. 

   In MVC, one access unit contains all NAL units pertaining to one 
   output time instance for all the views.  Within one access unit, the 
   coded representation of each view, also called a view component, 
   consists of one or more slices. 

   The concept of temporal scalability is not newly introduced by SVC or 
   MVC, as profiles defined in Annex A of [H.264] already support it.  
   In [H.264], sub-sequences were introduced to allow optional use of 
   temporal layers.  SVC extended this approach by advertising the 
   temporal scalability information within the NAL unit header or 
   prefix NAL units; MVC inherits both mechanisms. 

3.2. Parameter Set Concept 

   The parameter set concept was first specified in [H.264].  Please 
   refer to section 1.2 of [RFC3984] for more details.  SVC introduced 
   some new parameter set mechanisms.  MVC has inherited the parameter 
   set concept from [H.264]. 

   In particular, non-base views use a different type of sequence 
   parameter set (SPS), referred to as the subset SPS, which has a 
   different NAL unit type than "the old SPS" specified in [H.264]; 
   the base view still uses "the old SPS".  Slices from different 
   views can use either 1) the same sequence or picture parameter set, 
   or 2) different sequence or picture parameter sets. 

   The inter-view dependency and the decoding order of all the encoded 
   views are indicated in a new syntax structure, the SPS MVC extension, 
   included in each subset SPS. 

3.3. Network Abstraction Layer Unit Header 

   An MVC NAL unit of type 20 or 14 consists of a header of four octets 
   and the payload byte string.  MVC NAL units of type 20 are coded 
   slices of non-base views.  A special type of an MVC NAL unit is the 
   prefix NAL unit (type 14) that includes descriptive information of 
   the associated H.264/AVC VCL NAL unit (type 1 or 5) that immediately 
   follows the prefix NAL unit. 

   MVC extends the one-byte H.264/AVC NAL unit header by three 
   additional octets.  The header indicates the type of the NAL unit, 
   the (potential) presence of bit errors or syntax violations in the 
   NAL unit payload, information regarding the relative importance of 
   the NAL unit for the decoding process, the view identification 
   information, the temporal layer identification information, and other 
   fields as discussed below. 

   The syntax and semantics of the NAL unit header are formally 
   specified in [MVC], but the essential properties of the NAL unit 
   header are summarized below. 

   The first byte of the NAL unit header has the following format (the 
   bit fields are the same as defined for the one-byte H.264/AVC NAL 
   unit header, while the semantics of some fields have changed 
   slightly, in a backward compatible way): 

         +---------------+ 
         |0|1|2|3|4|5|6|7| 
         +-+-+-+-+-+-+-+-+ 
         |F|NRI|  Type   | 
         +---------------+ 
    
   F: 1 bit 

   forbidden_zero_bit.  H.264/AVC declares a value of 1 as a syntax 
   violation. 

   NRI: 2 bits 

   nal_ref_idc.  A value of 00 indicates that the content of the NAL 
   unit is not used to reconstruct reference pictures for future 
   prediction.  Such NAL units can be discarded without risking the 
   integrity of the reference pictures in the same view.  A value higher 
   than 00 indicates that the decoding of the NAL unit is required to 
   maintain the integrity of reference pictures in the same view, or 
   that the NAL unit contains parameter sets. 

   Type: 5 bits 

   nal_unit_type.  This component specifies the NAL unit type. 
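
   As an illustration only (not part of this specification), the 
   three fields of the first header octet can be separated with 
   shifts and masks.  The sketch below assumes bit 0 in the figure is 
   the most significant bit of the octet; the function name 
   parse_nal_first_octet is hypothetical. 

```python
def parse_nal_first_octet(octet: int) -> dict:
    """Split the first NAL unit header octet into F, NRI and Type.

    Hypothetical helper for illustration; bit 0 of the figure is
    treated as the most significant bit of the octet.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return {
        "F": (octet >> 7) & 0x01,    # forbidden_zero_bit (1 bit)
        "NRI": (octet >> 5) & 0x03,  # nal_ref_idc (2 bits)
        "Type": octet & 0x1F,        # nal_unit_type (5 bits)
    }


def has_mvc_header_extension(octet: int) -> bool:
    """True if the NAL unit type (14 or 20) implies three more octets."""
    return parse_nal_first_octet(octet)["Type"] in (14, 20)
```

   A receiver could call has_mvc_header_extension() on the first 
   octet to decide whether a one-byte or a four-byte header follows. 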

   In H.264/AVC, NAL unit types 14 and 20 are reserved for future 
   extensions.  MVC uses these two NAL unit types.  NAL unit type 14 is 
   used for prefix NAL unit, and NAL unit type 20 is used for coded 
   slice of non-base view.  NAL unit types 14 and 20 indicate the 
   presence of three additional octets in the NAL unit header, as shown 
   below. 

            +---------------+---------------+---------------+ 
            |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| 
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
            |R|I|  PRID     | VID               | TID |A|V|O| 
            +---------------+---------------+---------------+ 
    
   PRID: 6 bits 

   priority_id.  This component specifies a priority identifier for the 
   NAL unit.  A lower value of PRID indicates a higher priority.  

   TID: 3 bits 

   temporal_id.  This component specifies the temporal layer (or frame 
   rate) hierarchy.  Informally put, a temporal layer consisting of view 
   components with a smaller temporal_id corresponds to a lower frame 
   rate.  A given temporal layer typically depends on the lower temporal 
   layers (i.e., the temporal layers with smaller temporal_id values) 
   but never depends on any higher temporal layer (i.e., a temporal 
   layer with a higher temporal_id value). 

   A: 1 bit 

   anchor_pic_flag.  This component specifies whether the view component 
   is an anchor picture (when equal to 1) or not (when equal to 0), as 
   specified in [MVC]. 

   VID: 10 bits 

   view_id.  This component specifies the view identifier of the view 
   the NAL unit belongs to.  

   I: 1 bit 

   idr_flag.  This component specifies whether the view component is a 
   view instantaneous decoding refresh (V-IDR) picture for the view 
   (when equal to 1) or not (when equal to 0), as specified in [MVC]. 

   V: 1 bit 

   inter_view_flag.  This component specifies whether the view component 
   is used for inter-view prediction (when equal to 1) or not (when 
   equal to 0). 

   R: 1 bit 

   reserved_zero_one_bit.  Reserved bit for future extension.  R MUST be 
   equal to 0.  Receivers SHOULD ignore the value of 
   reserved_zero_one_bit.  

   O: 1 bit 

   reserved_one_bit.  Reserved bit for future extension.  O MUST be 
   equal to 1.  Receivers SHOULD ignore the value of reserved_one_bit. 
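
   The following Python sketch, given for illustration only, decodes 
   the three additional header octets according to the bit layout in 
   the figure earlier in this section (R, I, PRID, VID, TID, A, V, O, 
   most significant bit first).  The function name 
   parse_mvc_header_extension is hypothetical, and the sketch performs 
   no validation of the reserved bits. 

```python
def parse_mvc_header_extension(ext: bytes) -> dict:
    """Decode octets 2..4 of a type 14 or type 20 NAL unit header.

    Hypothetical helper for illustration; field order and widths are
    taken from the header figure: R(1) I(1) PRID(6) VID(10) TID(3)
    A(1) V(1) O(1), with bit 0 of the figure as the MSB.
    """
    if len(ext) != 3:
        raise ValueError("the header extension is exactly three octets")
    word = int.from_bytes(ext, "big")  # 24-bit value, MSB first
    return {
        "R": (word >> 23) & 0x1,      # reserved bit
        "I": (word >> 22) & 0x1,      # idr_flag
        "PRID": (word >> 16) & 0x3F,  # priority_id
        "VID": (word >> 6) & 0x3FF,   # view_id
        "TID": (word >> 3) & 0x7,     # temporal_id
        "A": (word >> 2) & 0x1,       # anchor_pic_flag
        "V": (word >> 1) & 0x1,       # inter_view_flag
        "O": word & 0x1,              # reserved bit
    }
```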

   This memo reuses the same additional NAL unit types introduced in RFC 
   3984, which are presented in section 6.3.  In addition, this memo 
   introduces one more NAL unit type, 30, as specified in section 6.8.  
   These NAL unit types are marked as unspecified in [MVC] and 
   intentionally reserved for use in systems specifications like this 
   memo.  Moreover, this specification extends the semantics of F, NRI, 
   PRID, TID, A, and I as described in section 6.4.  

4. Scope 

   This payload specification can only be used to carry the "naked" NAL 
   unit stream over RTP, and not the byte stream format according to 
   Annex B of [MVC].  The likely applications of this specification are 
   in IP-based multimedia communications, including 3D video streaming 
   over IP, free-viewpoint video over IP, and 3DTV over IP. 

   This specification allows a given RTP packet stream to encapsulate 
   NAL units belonging to 

     o the base view only, as specified in detail in [RFC3984], or 

     o one or more non-base views, or 

     o the base view and one or more non-base views 

   [Ed.Note(YkW): To be extended to allow separate carriage of different 
   temporal layers in different RTP packet streams as in  
   [I-D.draft-ietf-avt-svc].] 

5. Definitions and Abbreviations  

5.1. Definitions 

5.1.1. Definitions per MVC specification 

   This document uses the definitions of [MVC].  The following terms, 
   defined in [MVC], are summed up for convenience: 

   access unit:  A set of NAL units always containing exactly one 
   primary coded picture with one or more view components. In addition 
   to the primary coded picture, an access unit may also contain one or 
   more redundant coded pictures, one auxiliary coded picture, or other 
   NAL units not containing slices or slice data partitions of a coded 
   picture. The decoding of an access unit always results in one decoded 
   picture. All slices or slice data partitions in an access unit have 
   the same value of picture order count.  

   prefix NAL unit:  A NAL unit with nal_unit_type equal to 14 that 
   immediately precedes a NAL unit with nal_unit_type equal to 1, 5, 
   or 12.  The NAL unit that succeeds the prefix NAL unit is also 
   referred to as the associated NAL unit.  The prefix NAL unit contains 
   data associated with the associated NAL unit, which are considered to 
   be part of the associated NAL unit.  

5.1.2. Definitions local to this memo 

   MVC NAL unit:  A NAL unit of NAL unit type 14 or 20 as specified in 
   Annex H of [MVC]. An MVC NAL unit has a four-byte NAL unit header.  

   operation point:  An operation point of an MVC bitstream represents a 
   certain level of temporal and view scalability.  An operation point 
   contains only those NAL units required for a valid bitstream to 
   represent a certain subset of views at a certain temporal level.  An 
   operation point is described by the view_id values of the subset of 
   views, and the highest temporal_id. 
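
   As a rough illustration of this definition, the Python sketch below 
   selects the NAL units of an operation point from a list of already 
   parsed NAL units.  The NalInfo type and the function name are 
   hypothetical.  The sketch assumes that the caller has already added 
   to target_views every view needed for inter-view prediction, and it 
   does not model NAL units that carry no view_id (such as parameter 
   sets). 

```python
from typing import Iterable, List, NamedTuple, Set


class NalInfo(NamedTuple):
    """Hypothetical container for the header fields used below."""
    view_id: int
    temporal_id: int
    payload: bytes


def extract_operation_point(nal_units: Iterable[NalInfo],
                            target_views: Set[int],
                            max_temporal_id: int) -> List[NalInfo]:
    """Keep the NAL units of the given views up to max_temporal_id.

    Assumption: target_views already contains all views that the
    wanted views depend on for inter-view prediction.
    """
    return [
        nal for nal in nal_units
        if nal.view_id in target_views
        and nal.temporal_id <= max_temporal_id
    ]
```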

   multi-session transmission: The transmission mode in which the MVC 
   bitstream is transmitted over multiple RTP sessions, with each stream 
   having the same SSRC.  These multiple RTP streams can be associated 
   using the RTCP CNAME, or explicit signalling of the SSRC used.  
   Dependency between RTP sessions MUST be signaled according to [I-
   D.ietf-mmusic-decoding-dependency] and this memo.  

   single-session transmission: The transmission mode in which the MVC 
   bitstream is transmitted over a single RTP session, with a single 
   SSRC and separate timestamp and sequence number spaces. 

   [Ed.Note(TS):Need more definitions here.] 

5.2. Abbreviations 

   In addition to the abbreviations defined in [RFC3984], the following 
   ones are defined.  

   MVC:       Multiview Video Coding 
   CS-DON:    Cross-Session Decoding Order Number 
   MST:       multi-session transmission 
   PACSI:     Payload Content Scalability Information 
   SST:       single-session transmission 

6. MVC RTP Payload Format 

6.1. Design Principles 

   The following design principles have been observed: 

   o Backward compatibility with [RFC3984] wherever possible. 

   o As the MVC base view is H.264/AVC compatible, the base view or any 
   H.264/AVC compatible subset of it, when transmitted in its own RTP 
   packet stream, MUST be encapsulated using [RFC3984].  Requiring this 
   has the desirable side effect that the transmitted data can be 
   received by [RFC3984] receivers and decoded by H.264/AVC decoders.  

   o Media-Aware Network Elements (MANEs) as defined in [RFC3984] are 
   signaling aware and rely on signaling information.  MANEs have state.   

   o MANEs can aggregate multiple RTP streams, possibly from multiple 
   RTP sessions.   

   o MANEs can perform media-aware stream thinning.  By using the 
   payload header information identifying Layers within an RTP session, 
   MANEs are able to remove packets from the incoming RTP packet stream.  
   This implies rewriting the RTP headers of the outgoing packet stream 
   and rewriting of RTCP Receiver Reports.  

6.2. RTP Header Usage 

   Please see section 5.1 of [RFC3984]. 

6.3. Common Structure of the RTP Payload Format 

   Please see section 5.2 of [RFC3984]. 

6.4. NAL Unit Header Usage 

   The structure and semantics of the NAL unit header were introduced in 
   section 3.3.  This section specifies the semantics of F, NRI, PRID, 
   TID, A and I according to this specification. 

   Note that, in the context of this section, "protecting a NAL unit" 
   means any RTP or network transport mechanism that could improve the 
   probability of successful delivery of the packet conveying the NAL 
   unit, including use of a QoS-enabled network, forward error 
   correction (FEC), retransmissions, and advanced scheduling behavior, 
   whenever possible. 

   The semantics of F specified in section 5.3 of [RFC3984] also apply 
   herein. 

   For NRI, for a bitstream conforming to one of the profiles defined in 
   Annex A of [H.264] and transported using [RFC3984], the semantics 
   specified in section 5.3 of [RFC3984] are applicable, i.e., NRI also 
   indicates the relative importance of NAL units.  In the MVC context, 
   in addition to the semantics specified in Annex H of [MVC], NRI also 
   indicates the relative importance of NAL units within a view.  MANEs 
   MAY use this information to protect more important NAL units better 
   than less important NAL units.  [Ed.Note(YkW): "MVC context" to be 
   clearly specified.] 

   For PRID, the semantics specified in Annex H of [MVC] apply.  Note 
   that MANEs implementing unequal error protection MAY use this 
   information to protect NAL units with smaller PRID values better than 
   those with larger PRID values, for example by including only the more 
   important NAL units in a forward error correction (FEC) protection 
   mechanism.  The importance for the decoding process decreases as the 
   PRID value increases.  

   For TID, in addition to the semantics specified in Annex H of [MVC], 
   according to this memo, values of TID indicate the relative 
   importance.  A lower value of TID indicates a higher importance for 
   NAL units within a view.  MANEs MAY use this information to protect 
   more important NAL units better than less important NAL units.  

   For A, in addition to the semantics specified in Annex H of [MVC], 
   according to this memo, MANEs MAY use this information to protect NAL 
   units with A equal to 1 better than NAL units with A equal to 0.  
   MANEs MAY also utilize information of NAL units with A equal to 1 to 
   decide when to forward more packets for an RTP packet stream.  For 
   example, when it is sensed that view switching has happened such that 
   the operation point has changed, MANEs MAY start to forward NAL units 
   for a new target view only after forwarding a NAL unit with A equal 
   to 1 for the new target view.  

   For I, in addition to the semantics specified in Annex H of [MVC], 
   according to this memo, MANEs MAY use this information to protect NAL 
   units with I equal to 1 better than NAL units with I equal to 0.  
   MANEs MAY also utilize information of NAL units with I equal to 1 to 
   decide when to forward more packets for an RTP packet stream.  For 
   example, when it is sensed that view switching has happened such that 
   the operation point has changed, MANEs MAY start to forward NAL units 
   for a new target view only after forwarding a NAL unit with I equal 
   to 1 for the new target view.  
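
   The view-switching behavior described for A and I can be sketched 
   as a small gate that a MANE might apply per view.  This is an 
   illustration under stated assumptions, not a normative procedure: 
   the class and method names are hypothetical, and the sketch treats 
   a NAL unit with either A or I equal to 1 as a point at which 
   forwarding of a newly requested view may start. 

```python
class ViewSwitchGate:
    """Hypothetical per-view forwarding gate for a MANE.

    Forwarding for a view starts only once a NAL unit with the
    anchor flag (A) or the IDR flag (I) set has been seen for it.
    """

    def __init__(self) -> None:
        self._open_views = set()  # views already being forwarded

    def should_forward(self, view_id: int, anchor_flag: int,
                       idr_flag: int) -> bool:
        """Return True if this NAL unit may be forwarded."""
        if view_id in self._open_views:
            return True
        if anchor_flag or idr_flag:
            # First A=1 or I=1 NAL unit of the new target view:
            # open the gate and forward from here on.
            self._open_views.add(view_id)
            return True
        return False

    def drop_view(self, view_id: int) -> None:
        """Stop forwarding a view, e.g. after a view switch."""
        self._open_views.discard(view_id)
```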

6.5. Packetization Modes 

   [Ed.Note(TS): Need to add text from [I-D.draft-ietf-avt-rtp-svc] to 
   this section with respect to MVC.] 

6.5.1. Packetization Modes for single-session transmission  

   This section will address the issues of sections 4.5.1 and 5.1 of 
   [I-D.draft-ietf-avt-rtp-svc]. 

6.5.2. Packetization Modes for multi-session transmission 

   This section will address the issues of sections 4.5.2 and 5.2 of 
   [I-D.draft-ietf-avt-rtp-svc]. 

6.6. Aggregation Packets 

   This section will address the issues of section 4.7 of [I-D.draft-
   ietf-avt-rtp-svc]. 

6.7. Fragmentation Units (FUs) 

   This section will address the issues of section 4.8 of [I-D.draft-
   ietf-avt-rtp-svc]. 

6.8. Payload Content Scalability Information (PACSI) NAL Unit for MVC 

   A new NAL unit type is specified in this memo, and referred to as 
   payload content scalability information (PACSI) NAL unit.  The PACSI 
   NAL unit, if present, MUST be the first NAL unit in an aggregation 
   packet, and it MUST NOT be present in other types of packets.  The 
   PACSI NAL unit indicates view and temporal scalability information 
   and other characteristics that are common for all the remaining NAL 
   units in the payload of the aggregation packet. Furthermore, a PACSI 
   NAL unit MAY include a DONC field and contain zero or more SEI NAL 
   units.  The PACSI NAL unit makes it easier for MANEs to decide 
   whether to forward, process, or discard the aggregation packet 
   containing the PACSI NAL unit.  Senders MAY create PACSI NAL units 
   and receivers MAY ignore them, or use them as hints to enable 
   efficient aggregation packet processing.  Note that the NAL unit type 
   for the PACSI NAL unit is selected among those values that are 
   unspecified in [MVC] and [RFC3984]. 

   When the first aggregation unit of an aggregation packet contains a 
   PACSI NAL unit, there MUST be at least one additional aggregation 
   unit present in the same packet.  The RTP header and payload header 
   fields of the aggregation packet are set according to the remaining 
   NAL units in the aggregation packet. 

   When a PACSI NAL unit is included in a multi-time aggregation packet 
   (MTAP), the decoding order number (DON) for the PACSI NAL unit MUST 
   be set to indicate that the PACSI NAL unit has an identical DON to 
   the first NAL unit in decoding order among the remaining NAL units in 
   the aggregation packet. 

   The structure of a PACSI NAL unit is as follows.  The first four 
   octets are exactly the same as the four-byte MVC NAL unit header as 
   discussed in section 3.3.  They are followed by two always-present 
   octets, two optional octets, and zero or more SEI NAL units, each SEI 
   NAL unit preceded by a 16-bit unsigned size field (in network byte 
   order) that indicates the size of the following NAL unit in bytes 
   (excluding these two octets, but including the NAL unit type octet of 
   the SEI NAL unit).  Figure 1 illustrates the PACSI NAL unit structure 
   and an example of a PACSI NAL unit containing two SEI NAL units.  

   The bits P, C, S, and E are specified only if the bit X is equal to 
   1. The T bit MUST NOT be equal to 1 if the aggregation packet 
   containing the PACSI NAL unit is not an STAP-A packet.  The T bit MAY 
   be equal to 1 if the aggregation packet containing the PACSI NAL unit 
   is an STAP-A packet.  The field DONC MUST NOT be present if the T bit 
   is equal to 0, and MUST be present if the T bit is equal to 1.  

      0                   1                   2                   3     
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1   
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |F|NRI|  Type   |S|   PRID    | TID |A|      VID          |I|V|R| 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |X|T|RR |P|C|S|E|    RRR        |          DONC (optional)      | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |        NAL unit size 1        |                               | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         SEI NAL unit 1        | 
      |                                                               | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
      |        NAL unit size 2        |                               | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   SEI NAL unit 2              | 
      |                                                               | 
      |                                               +-+-+-+-+-+-+-+-+ 
      |                                               | 
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
 
         Figure 1.  PACSI NAL unit structure  
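
   The layout in Figure 1 can be walked with a short parser.  The 
   Python sketch below is an illustration only: the function name 
   parse_pacsi is hypothetical, the first four octets are returned 
   unparsed, and the sketch assumes the input is exactly one PACSI NAL 
   unit (the aggregation unit size field has already been removed). 

```python
import struct


def parse_pacsi(data: bytes) -> dict:
    """Parse one PACSI NAL unit as laid out in Figure 1.

    Hypothetical helper: returns the raw 4-octet header, the flag
    bits, the optional DONC, and the list of embedded SEI NAL units.
    """
    if len(data) < 6:
        raise ValueError("a PACSI NAL unit has at least six octets")
    flags = data[4]
    fields = {
        "header": data[:4],      # four-byte NAL unit header, unparsed
        "X": (flags >> 7) & 1,
        "T": (flags >> 6) & 1,
        "RR": (flags >> 4) & 3,
        "P": (flags >> 3) & 1,   # P, C, S, E are meaningful only if X=1
        "C": (flags >> 2) & 1,
        "S": (flags >> 1) & 1,
        "E": flags & 1,
        "RRR": data[5],
        "DONC": None,
        "sei_nal_units": [],
    }
    pos = 6
    if fields["T"]:
        # DONC is present if and only if the T bit is 1.
        if len(data) < pos + 2:
            raise ValueError("truncated DONC field")
        (fields["DONC"],) = struct.unpack_from("!H", data, pos)
        pos += 2
    while pos < len(data):
        if len(data) < pos + 2:
            raise ValueError("truncated SEI size field")
        (size,) = struct.unpack_from("!H", data, pos)  # network order
        pos += 2
        # The size counts the SEI NAL unit type octet, so it is >= 1.
        if size == 0 or pos + size > len(data):
            raise ValueError("invalid or truncated SEI NAL unit")
        fields["sei_nal_units"].append(data[pos:pos + size])
        pos += size
    return fields
```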

   The values of the fields in the PACSI NAL unit MUST be set as 
   follows.  The term "target NAL units" is used in the semantics of 
   some fields.  The target NAL units are those NAL units contained in 
   the aggregation packet, but not included in the PACSI NAL unit, that 
   are within the access unit to which the first NAL unit following the 
   PACSI NAL unit in the aggregation packet belongs. 

   o The F bit MUST be set to 1 if the F bit in at least one of the 
   remaining NAL units in the aggregation packet is equal to 1.  
   Otherwise, the F bit MUST be set to 0. 

   o The NRI field MUST be set to the highest value of NRI field among 
   all the remaining NAL units in the aggregation packet. 

   o The Type field MUST be set to 30. 

   o The S bit MUST be set to 1. 

   o The PRID field MUST be set to the lowest value of the PRID values 
   of all the remaining NAL units in the aggregation packet. 

   o The TID field MUST be set to the lowest value of the TID values of 
   all the remaining NAL units with the lowest value of VID in the 
   aggregation packet. 

   o The A bit MUST be set to 1 if the A bit of at least one of the 
   remaining NAL units in the aggregation packet is equal to 1.  
   Otherwise, the A bit MUST be set to 0. 

   o The VID field MUST be set to the lowest value of the VID values of 
   all the remaining NAL units in the aggregation packet. 

   o The I bit MUST be set to 1 if the I bit of at least one of the 
   remaining NAL units in the aggregation packet is equal to 1.  
   Otherwise, the I bit MUST be set to 0. 

   o The V bit MUST be set to 1 if the V bit of at least one of the 
   remaining NAL units in the aggregation packet is equal to 1.  
   Otherwise, the V bit MUST be set to 0. 

   o The R bit MUST be set to 0.  Receivers SHOULD ignore the value of 
   R. 

   o If the X bit is equal to 1, the bits P, C, S, and E are specified 
   as below. Otherwise, the bits P, C, S, and E are unspecified, and 
   receivers MUST ignore these bits.  The X bit SHOULD be identical for 
   all the PACSI NAL units involved in all the RTP sessions conveying an 
   MVC bitstream.   

   o The RR field MUST be set to '00' (in binary form).  Receivers 
   SHOULD ignore the value of RR. 



 
 
   o If the T bit is equal to 1, the OPTIONAL field DONC MUST be present 
   and specified as below. Otherwise, the field DONC MUST NOT be 
   present. 

   o The P bit MUST be set to 1 if all the remaining NAL units in the 
   aggregation packet have redundant_pic_cnt greater than 0, i.e. the 
   slices are redundant slices.  Otherwise, the P bit MUST be set to 0. 

      Informative note: The P bit indicates whether the packet can be 
      discarded because it contains only redundant slice NAL units.  
      Without this bit, the same information has to be derived from 
      the syntax element redundant_pic_cnt, which is buried in the 
      variable-length coded slice header.  

   o The C bit MUST be set to 1 if the target NAL units belong to an 
   access unit for which the view components are intra coded.  
   Otherwise, the C bit MUST be set to 0.  The C bit SHOULD be identical 
   for all the PACSI NAL units for which the target NAL units belong to 
   the same access unit. 

      Informative note: The C bit indicates whether the packet contains 
      intra slices.  Packets containing intra slices may be the only 
      packets to be forwarded, e.g. for fast-forward playback or when 
      the network condition is extremely bad. 

   o The S bit MUST be set to 1 if the first VCL NAL unit, in 
   transmission order, of the view component containing the first NAL 
   unit following the PACSI NAL unit in the aggregation packet is 
   present in the aggregation packet.  Otherwise, the S bit MUST be set 
   to 0. 

   o The E bit MUST be set to 1 if the last VCL NAL unit, in 
   transmission order, of the view component containing the first NAL 
   unit following the PACSI NAL unit in the aggregation packet is 
   present in the aggregation packet.  Otherwise, the E bit MUST be set 
   to 0. 

      Informative note: The S and E bits indicate whether the first or 
      last slice, in transmission order, of a view component is in the 
      packet.  This enables a MANE to detect slice loss and take proper 
      action, such as requesting a retransmission as soon as possible, 
      and allows efficient playout buffer handling similar to that 
      enabled by the M bit in the RTP header.  The M bit in the RTP 
      header still indicates the end of an access unit, not the end of 
      a view component. 


 
 
   o The RRR field MUST be set to '00000000' (in binary form).  
   Receivers SHOULD ignore the value of RRR. 

   o When present, the field DONC indicates the CL-DON value for the 
   first NAL unit in the STAP-A in transmission order. 
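
   The aggregation rules above for the F, NRI, Type, PRID, TID, A, VID, 
   I, V, and R fields can be sketched as follows.  This is a minimal, 
   non-normative illustration: the NalFields container and the 
   derive_pacsi_fields function are hypothetical names introduced only 
   for this example, the sketch assumes that the header fields of the 
   remaining NAL units have already been parsed, and it does not pack 
   the result into the bit layout of Figure 1. 

```python
from dataclasses import dataclass
from typing import Dict, List

PACSI_TYPE = 30  # Type value this memo assigns to the PACSI NAL unit


@dataclass
class NalFields:
    """Parsed header fields of one NAL unit (hypothetical container)."""
    f: int
    nri: int
    prid: int
    tid: int
    a: int
    vid: int
    i: int
    v: int


def derive_pacsi_fields(remaining: List[NalFields]) -> Dict[str, int]:
    """Derive PACSI header field values from the remaining NAL units."""
    if not remaining:
        raise ValueError("at least one remaining NAL unit is required")
    lowest_vid = min(n.vid for n in remaining)
    return {
        # Flag fields: 1 if set in at least one remaining NAL unit
        "F": int(any(n.f for n in remaining)),
        "A": int(any(n.a for n in remaining)),
        "I": int(any(n.i for n in remaining)),
        "V": int(any(n.v for n in remaining)),
        # NRI takes the highest value, PRID and VID the lowest
        "NRI": max(n.nri for n in remaining),
        "PRID": min(n.prid for n in remaining),
        "VID": lowest_vid,
        # TID: lowest TID among the NAL units that carry the lowest VID
        "TID": min(n.tid for n in remaining if n.vid == lowest_vid),
        # Fixed values
        "Type": PACSI_TYPE,
        "R": 0,
    }
```

   Note that TID is taken only over the NAL units carrying the lowest 
   VID, so a NAL unit of another view with a lower TID does not affect 
   the result. 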

   SEI NAL units included in the PACSI NAL unit, if any, MUST contain a 
   subset of the SEI messages associated with the access unit of the 
   first NAL unit following the PACSI NAL unit within the aggregation 
   packet. 

      Informative note: Senders may repeat in the PACSI NAL unit those 
      SEI NAL units whose presence in more than one packet is 
      essential for packet loss robustness.  Receivers may use the 
      repeated SEI messages in place of missing SEI messages. 

   An SEI message SHOULD NOT be included both in a PACSI NAL unit and 
   in one of the remaining NAL units contained in the same aggregation 
   packet. 

6.9. Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs) 

   This section will address the issues of section 4.7.1 of [I-D.draft-
   ietf-avt-rtp-svc]. 

6.10. Cross-Session DON (CS-DON) for multi-session transmission 

   This section will address the issues of section 4.11 of [I-D.draft-
   ietf-avt-rtp-svc]. 

7. Packetization Rules 

   [Ed.Note(TS): We need to adjust this section with respect to [I-
   D.draft-ietf-avt-rtp-svc].] 

   Section 6 of [RFC3984] applies.  The following rules apply in 
   addition.  

   All receivers MUST support the single NAL unit packetization mode to 
   provide backward compatibility to endpoints supporting only the 
   single NAL unit mode of RFC 3984.  However, use of the single NAL 
   unit packetization mode SHOULD be avoided whenever possible, because 
   encapsulating small NAL units, e.g. parameter set, SEI, or prefix 
   NAL units, in their own packets is typically less efficient owing to 
   the relatively large per-packet overhead. 

 
 
   All receivers MUST support the non-interleaved packetization mode. 

      Informative note: The non-interleaved mode allows an application 
      to encapsulate a single NAL unit in a single RTP packet.  
      Historically, the single NAL unit mode was included in 
      [RFC3984] only for compatibility with ITU-T Rec. H.241 Annex A 
      [H.241].  There is no point in carrying this historic ballast 
      into a new application space such as the one provided by MVC.  
      More technically speaking, the implementation complexity 
      increase for providing the additional mechanisms of the 
      non-interleaved mode (namely STAP-A and FU-A) is so minor, and 
      the benefits are so great, that STAP-A implementation is 
      required. 

   A NAL unit of small size SHOULD be encapsulated in an aggregation 
   packet together with one or more other NAL units.  For example, 
   non-VCL NAL units such as access unit delimiters, parameter sets, or 
   SEI NAL units are typically small. 

   A prefix NAL unit SHOULD be aggregated to the same packet as the 
   associated NAL unit following the prefix NAL unit in decoding order. 
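
   One way a sender might apply the two recommendations above is 
   sketched below.  The sketch is illustrative only: the function name 
   group_for_aggregation, the greedy strategy, and the default payload 
   budget of 1400 octets are assumptions of this example and are not 
   specified by this memo.  It relies on the STAP-A layout of [RFC3984] 
   (one header octet, then a 16-bit size field before each NAL unit) 
   and on nal_unit_type 14 identifying a prefix NAL unit. 

```python
from typing import List

PREFIX_NAL_TYPE = 14  # nal_unit_type of a prefix NAL unit


def nal_type(nalu: bytes) -> int:
    """Return the 5-bit Type field of a non-empty NAL unit."""
    return nalu[0] & 0x1F


def group_for_aggregation(nalus: List[bytes],
                          budget: int = 1400) -> List[List[bytes]]:
    """Greedily group NAL units (in decoding order) into packet payloads.

    A prefix NAL unit is always kept in the same group as the NAL unit
    that follows it.  `budget` is an assumed payload size limit.
    """
    groups: List[List[bytes]] = []
    current: List[bytes] = []
    used = 1  # one octet for the aggregation packet's NAL unit header
    i = 0
    while i < len(nalus):
        chunk = [nalus[i]]
        if nal_type(nalus[i]) == PREFIX_NAL_TYPE and i + 1 < len(nalus):
            chunk.append(nalus[i + 1])
        # Each aggregation unit costs a 16-bit size field plus the NAL unit
        cost = sum(2 + len(n) for n in chunk)
        if current and used + cost > budget:
            groups.append(current)
            current, used = [], 1
        current.extend(chunk)
        used += cost
        i += len(chunk)
    if current:
        groups.append(current)
    return groups
```

   A group that ends up holding a single NAL unit, or one that exceeds 
   the budget on its own, would need a single NAL unit packet or 
   fragmentation respectively; both cases are outside this sketch. 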

   When the first aggregation unit of an aggregation packet contains a 
   PACSI NAL unit, there MUST be at least one additional aggregation 
   unit present in the same packet. 
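
   A receiver or test tool could check this rule on a STAP-A payload 
   roughly as follows.  This is a sketch under stated assumptions: the 
   function names are hypothetical, the STAP-A layout (Type 24, 16-bit 
   NAL unit size fields in network byte order) is taken from [RFC3984], 
   and only the Type field of each NAL unit is inspected. 

```python
import struct
from typing import List

STAP_A_TYPE = 24  # STAP-A payload type from RFC 3984
PACSI_TYPE = 30   # PACSI NAL unit type from this memo


def build_stap_a(nalus: List[bytes]) -> bytes:
    """Build a STAP-A payload from non-empty NAL units (helper for demos)."""
    f = max(n[0] >> 7 for n in nalus)             # F: set if any F is set
    nri = max((n[0] >> 5) & 0x3 for n in nalus)   # NRI: highest value
    out = bytearray([(f << 7) | (nri << 5) | STAP_A_TYPE])
    for n in nalus:
        out += struct.pack("!H", len(n)) + n
    return bytes(out)


def split_stap_a(payload: bytes) -> List[bytes]:
    """Split a STAP-A payload into the NAL units it aggregates."""
    if not payload or (payload[0] & 0x1F) != STAP_A_TYPE:
        raise ValueError("not a STAP-A payload")
    units = []
    pos = 1
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("invalid NAL unit size")
        units.append(payload[pos:pos + size])
        pos += size
    return units


def pacsi_rule_ok(payload: bytes) -> bool:
    """Return False only if a leading PACSI NAL unit stands alone."""
    units = split_stap_a(payload)
    leading_pacsi = bool(units) and (units[0][0] & 0x1F) == PACSI_TYPE
    return not leading_pacsi or len(units) >= 2
```

   The check deliberately says nothing about other constraints on the 
   packet; it only flags a PACSI NAL unit that has no NAL unit to 
   describe. 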

   When an MVC bitstream is transported in more than one RTP session, 
   the following applies. 

   o Interleaved mode SHOULD be used for all the RTP sessions. 

   o An RTP session that does not use interleaved mode SHOULD be 
   constrained as follows. 

     - Non-interleaved mode MUST be used. 

     - STAP-A MUST be used, and packets of any other type MUST NOT be 
       used. 

     - Each STAP-A MUST contain a PACSI NAL unit, and the DONC field 
       MUST be present in the PACSI NAL unit. 

      Informative note: The motivation for these constraints is to 
      allow the use of non-interleaved mode for the session conveying 
      the H.264/AVC compatible view, such that RFC 3984 receivers 
      without interleaved mode implementation can subscribe to the base 
      view session. 

 
 
   Non-VCL NAL units SHOULD be conveyed in the same session as the 
   associated VCL NAL units.  To meet this, SEI messages that are 
   contained in a scalable nesting SEI message and are applicable to 
   more than one session SHOULD be separated and placed into multiple 
   scalable nesting SEI messages.  The DON values MUST indicate the 
   cross-layer decoding order number values as if all these SEI messages 
   were in separate scalable nesting SEI messages and contained in the 
   beginning of the corresponding access units as specified in [MVC].  

8. De-Packetization Process (Informative) 

   For a single RTP session, the de-packetization process specified in 
   section 7 of [RFC3984] applies. 

   For a receiver that receives more than one of the RTP sessions 
   conveying an MVC bitstream, an example of a suitable implementation 
   of the de-packetization process is to be specified, similar to what 
   will finally be included in [I-D.draft-ietf-avt-rtp-svc]. 

9. Payload Format Parameters 

   This section specifies the parameters that MAY be used to select 
   optional features of the payload format and certain features of the 
   bitstream.  The parameters are specified here as part of the media 
   type registration for the MVC codec.  A mapping of the parameters 
   into the Session Description Protocol (SDP) [RFC4566] is also 
   provided for applications that use SDP.  Equivalent parameters could 
   be defined elsewhere for use with control protocols that do not use 
   SDP. 

9.1. Media Type Registration 

   The media subtype for the MVC codec is allocated from the IETF tree. 

   The receiver MUST ignore any unspecified parameter. 

      Informative note: Requiring that unspecified parameters be 
      ignored allows for backward compatibility of future extensions.  
      For example, if a future specification that is backward 
      compatible with this specification specifies some new 
      parameters, then a receiver according to this specification is 
      capable of receiving data per the new payload format while 
      ignoring the parameters newly specified in the new payload 
      specification.  This sentence is also present in RFC 3984.  

   Media Type name:     video 

 
 
   Media subtype name:  H264-MVC 

   The media subtype "H264" MUST be used for RTP streams using RFC 3984, 
   i.e. not using any of the new features introduced by this 
   specification compared to RFC 3984.  For RTP streams using any of the 
   new features introduced by this specification compared to RFC 3984, 
   the media subtype "H264-MVC" SHOULD be used, and the media subtype 
   "H264" MAY be used.  Use of the media subtype "H264" for RTP streams 
   using the new features allows RFC 3984 receivers to negotiate and 
   receive H.264/AVC or MVC streams packetized according to this 
   specification, while ignoring the media parameters and NAL unit 
   types they do not recognize.  

   Required parameters: none 

   OPTIONAL parameters: to be specified.  

   Encoding considerations: 

       This type is only defined for transfer via RTP (RFC 3550). 

   Security considerations: 

       See section 10 of RFC XXXX. 

   Public specification: 

       Please refer to RFC XXXX and its section 14. 

   Additional information: none 

   File extensions: none 

   Macintosh file type code: none 

   Object identifier or OID: none 

   Person & email address to contact for further information: 

   Intended usage: COMMON 

   Author: NN 

   Change controller: 

       IETF Audio/Video Transport working group delegated from the IESG. 

 
 
9.2. SDP Parameters 

9.2.1. Mapping of Payload Type Parameters to SDP 

   The media type video/H264-MVC string is mapped to fields in the 
   Session Description Protocol (SDP) as follows: 

   The media name in the "m=" line of SDP MUST be video. 

   The encoding name in the "a=rtpmap" line of SDP MUST be H264-MVC (the   
   media subtype). 

   The clock rate in the "a=rtpmap" line MUST be 90000. 

   The OPTIONAL parameters, when present, MUST be included in the 
   "a=fmtp" line of SDP.  These parameters are expressed as a media type 
   string, in the form of a semicolon separated list of parameter=value 
   pairs. 
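
   The mapping above can be illustrated with a small helper that emits 
   the corresponding SDP lines.  This is a non-normative sketch: the 
   function name mvc_sdp_lines, the "RTP/AVP" transport profile, and 
   the port and payload type values used with it are assumptions of 
   this example.  Since the OPTIONAL parameters are still to be 
   specified, any parameter name passed in is purely hypothetical. 

```python
from typing import Dict, List, Optional


def mvc_sdp_lines(port: int, payload_type: int,
                  optional_params: Optional[Dict[str, str]] = None
                  ) -> List[str]:
    """Return the SDP lines for one video/H264-MVC media description."""
    lines = [
        # Media name MUST be "video"
        f"m=video {port} RTP/AVP {payload_type}",
        # Encoding name MUST be H264-MVC and the clock rate MUST be 90000
        f"a=rtpmap:{payload_type} H264-MVC/90000",
    ]
    if optional_params:
        # Semicolon separated list of parameter=value pairs
        pairs = ";".join(f"{k}={v}" for k, v in optional_params.items())
        lines.append(f"a=fmtp:{payload_type} {pairs}")
    return lines
```

   When no OPTIONAL parameters are supplied, no "a=fmtp" line is 
   produced, and only the "m=" and "a=rtpmap" lines are returned. 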

9.2.2. Usage with the SDP Offer/Answer Model 

   TBD.  

9.2.3. Usage with multi-session transmission  

   If multi-session transmission is used, the rules on signaling media 
   decoding dependency in SDP as defined in 
   [I-D.draft-ietf-mmusic-decoding-dependency] apply. 

9.2.4. Usage in Declarative Session Descriptions 

   TBD.  

9.3. Examples 

   TBD.  

9.4. Parameter Set Considerations 

   Please see section 10 of [RFC3984]. 

10. Security Considerations 

   Please see section 11 of [RFC3984]. 



 
 
11. Congestion Control 

   TBD. 

12. IANA Considerations 

   Request for media type registration to be added. 

13. Acknowledgments 

   TBD. 

   This document was prepared using 2-Word-v2.0.template.dot. 

14. References 

14.1. Normative References 

   [H.264]   ITU-T Recommendation H.264, "Advanced video coding for 
             generic audiovisual services", 3rd Edition, November 2007. 

   [I-D.draft-ietf-avt-rtp-svc] Wenger, S., Wang, Y. -K., Schierl, T. 
             and A. Eleftheriadis, "RTP payload format for SVC video", 
             draft-ietf-avt-rtp-svc-13 (work in progress), July 2008. 

   [I-D.draft-ietf-mmusic-decoding-dependency] Schierl, T., and Wenger, 
             S., "Signaling media decoding dependency in Session 
             Description Protocol (SDP)", draft-ietf-mmusic-decoding-
             dependency-02 (work in progress), May 2008. 

   [MPEG4-10] 
             ISO/IEC International Standard 14496-10:2005. 

   [MVC]     Joint Video Team, "Joint Draft 7 of MVC", available from 
             http://ftp3.itu.ch/av-arch/jvt-site/2008_04_Geneva/JVT-
             AA209.zip, Geneva, Switzerland, April 2008. 

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 
             Requirement Levels", BCP 14, RFC 2119, March 1997. 

   [RFC3548] Josefsson, S., "The Base16, Base32, and Base64 Data 
             Encodings", RFC 3548, July 2003.  

   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, 
             V., "RTP: A Transport Protocol for Real-Time Applications", 
             STD 64, RFC 3550, July 2003. 

 
 
   [RFC3984] Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund, 
             M., and Singer, D., "RTP Payload Format for H.264 Video", 
             RFC 3984, February 2005. 

   [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session 
             Description Protocol", RFC 4566, July 2006. 

14.2. Informative References 

   [DVB-H]   DVB - Digital Video Broadcasting (DVB); DVB-H 
             Implementation Guidelines, ETSI TR 102 377, 2005.  

   [H.241]   ITU-T Rec. H.241, "Extended video procedures and control 
             signals for H.300-series terminals", May 2006. 

   [IGMP]    Cain, B., Deering, S., Kouvelas, I., Fenner, B., and 
             Thyagarajan, A., "Internet Group Management Protocol, 
             Version 3", RFC 3376, October 2002.  

   [McCanne] McCanne, S., Jacobson, V., and Vetterli, M., "Receiver-
             driven layered multicast", in Proc. of ACM SIGCOMM'96, 
             pages 117--130, Stanford, CA, August 1996.  

   [MBMS]    3GPP - Technical Specification Group Services and System 
             Aspects; Multimedia Broadcast/Multicast Service (MBMS); 
             Protocols and codecs (Release 6), December 2005. 

   [MPEG2]   ISO/IEC International Standard 13818-2:1993. 

   [RFC3450] Luby, M., Gemmell, J., Vicisano, L., Rizzo, L., and 
             Crowcroft, J., "Asynchronous layered coding (ALC) protocol 
             instantiation", RFC 3450, December 2002. 

Authors' Addresses 

   Ye-Kui Wang 
   Nokia Research Center 
   P.O. Box 100 
   33721 Tampere 
   Finland 
       
   Phone: +358-50-466-7004 
   EMail: ye-kui.wang@nokia.com 
    



 
 
   Thomas Schierl 
   Fraunhofer HHI 
   Einsteinufer 37 
   D-10587 Berlin 
   Germany 
       
   Phone: +49-30-31002-227 
   EMail: schierl@hhi.fhg.de 
    

Intellectual Property Statement 

   The IETF takes no position regarding the validity or scope of any 
   Intellectual Property Rights or other rights that might be claimed to 
   pertain to the implementation or use of the technology described in 
   this document or the extent to which any license under such rights 
   might or might not be available; nor does it represent that it has 
   made any independent effort to identify any such rights.  Information 
   on the procedures with respect to rights in RFC documents can be 
   found in BCP 78 and BCP 79. 

   Copies of IPR disclosures made to the IETF Secretariat and any 
   assurances of licenses to be made available, or the result of an 
   attempt made to obtain a general license or permission for the use of 
   such proprietary rights by implementers or users of this 
   specification can be obtained from the IETF on-line IPR repository at 
   http://www.ietf.org/ipr. 

   The IETF invites any interested party to bring to its attention any 
   copyrights, patents or patent applications, or other proprietary 
   rights that may cover technology that may be required to implement 
   this standard.  Please address the information to the IETF at 
   ietf-ipr@ietf.org. 

Disclaimer of Validity 

   This document and the information contained herein are provided on an 
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 

Copyright Statement 

   Copyright (C) The IETF Trust (2008). 
 
 
   This document is subject to the rights, licenses and restrictions 
   contained in BCP 78, and except as set forth therein, the authors 
   retain all their rights. 

Acknowledgment 

   Funding for the RFC Editor function is currently provided by the 
   Internet Society.  Further, the author Thomas Schierl of Fraunhofer 
   HHI is sponsored by the European Commission under the contract number 
   FP7-ICT-214063, project SEA. 

15. Open issues: 

   -  The use of CL-DON for session reordering also allows for 
     interleaved transmission with non-interleaved packetization mode.  
     There should be a clear separation between both tools.  This issue 
     should be handled the same way as for the SVC payload draft. 

   -  Since SVC session multiplexing (multi source transmission (MST)) 
     is cleared, it would be great to just reference the MST sections in 
     [I-D.draft-ietf-avt-rtp-svc]. Since the text in sections 6 and 7 
     of [I-D.draft-ietf-avt-rtp-svc] is currently very SVC specific, 
     the authors would have to try to rewrite these sections in a more 
     generic way. If this is not possible, we need to copy text from 
     [I-D.draft-ietf-avt-rtp-svc] with respect to MVC. 

 

16. Changes Log 

   Initial version 00 

      10 November 2007: YkW 
         Initial version 

      12 November 2007: TS 
         - Added definition of "Session multiplexing" 
         - Added the reference of [I-D.draft-ietf-mmusic-decoding-
   dependency], and its reference in section 9.2.3 

      12 November 2007: YkW 
         - Added the reference of [I-D.draft-ietf-avt-svc] and its 
   reference in section 1. 
         - Added in sections 3.1 and 3.2 paragraphs regarding inter-view 
   prediction 
          

 
 
   From draft-wang-avt-rtp-mvc-00 to draft-wang-avt-rtp-mvc-01 

      18 February 2008: YkW 
         - Alignment to the latest MVC draft in JVT-Z209 and version 07 
   of [I-D.draft-ietf-avt-svc].  

      25 February 2008: TS 

   -  Minor modifications and updates throughout the document 

   -  Added open issue on clear separation between "decoding order 
     recovery" and "interleaving" 

   From draft-wang-avt-rtp-mvc-01 to draft-wang-avt-rtp-mvc-02 

      09 July 2008: TS 

   -  Minor modifications and updates throughout the document 

   -  Added open issue  

   -  NAL unit header alignment with MVC spec 

   -  Section 6. References corresponding sections in [RFC3984] and [I-
     D.draft-ietf-avt-svc]. 

   -  TBD: Section 7, we may align [I-D.draft-ietf-avt-svc] in a way 
     that SVC is not mentioned in these paragraphs, so that we can 
     reference them from this document. 

     21 August 2008: 

   -  Minor modifications, editing and adding notes throughout the 
     document. 

   -  Updated references 











 
 