From: The IESG
To: IETF-Announce
Message-Id:
Date: Fri, 29 Oct 2004 15:22:57 -0400
Cc: avt chair, avt chair, Internet Architecture Board, avt mailing list, RFC Editor
Subject: [AVT] Protocol Action: 'RTP Payload Formats for European Telecommunications Standards Institute (ETSI) European Standard ES 202 050, ES 202 211, and ES 202 212 Distributed Speech Recognition Encoding' to Proposed Standard

The IESG has approved the following document:

- 'RTP Payload Formats for European Telecommunications Standards Institute (ETSI) European Standard ES 202 050, ES 202 211, and ES 202 212 Distributed Speech Recognition Encoding' as a Proposed Standard

This document is the product of the Audio/Video Transport Working Group.

The IESG contact persons are Allison Mankin and Jon Peterson.

Technical Summary

In the distributed speech recognition (DSR) architecture, a remote device acting as a thin client (the front-end) communicates with a speech recognition server (the speech engine) over a network connection to obtain speech recognition services. More details on DSR over the Internet can be found in RFC 3557.

To achieve interoperability between different client devices and speech engines, ETSI published the first standard DSR front-end, ES 201 108, in early 2000, and the IETF defined an RTP packetization for ES 201 108 frames in RFC 3557.

In ES 202 050, ETSI issued a further standard for an Advanced DSR front-end that provides substantially improved recognition performance in the presence of background noise. The codecs in ES 202 050 use a slightly different frame format from that of ES 201 108, so the two do not interoperate. The RTP packetization for the ES 202 050 front-end defined in this document uses the same RTP packet format layout as that defined in RFC 3557; the differences lie in the DSR codec frame bit definitions and the payload type MIME registration.
The two further standards for which this document defines payload formats, ES 202 211 and ES 202 212, provide extensions to each of the DSR front-end standards. These respective extensions allow the speech waveform to be reconstructed for human listening, and they can also be used to improve recognition performance for tonal languages. This is done by sending additional pitch and voicing information for each frame along with the recognition features.

Working Group Summary

The document was sent to the ietf-types list for MIME type review, and no concerns surfaced. The DSR issues were reviewed by the SPEECHSC WG at the time of RFC 3557, and the Area Director found that this document raised no new issues. The working group supported advancing this document.

Protocol Quality

This document was reviewed for the IESG by Magnus Westerlund and Allison Mankin.

RFC Editor Notes

Section 4

OLD:
   Author/Change controller:
   * Qiaobing.Xie@motorola.com
   * IETF Audio/Video transport working group

NEW:
   Author:
   * Qiaobing.Xie@motorola.com
   Change controller:
   * IETF Audio/Video transport working group delegated by the IESG

Section 5

The following paragraph should be moved out of Section 5 and become Section 4.3, "Congestion Control":

   Congestion control for RTP MUST be used in accordance with RFC 3550 [9], and any applicable RTP profile, e.g. RFC 3551 [10].