The P.861 (02/98) disk is labelled as 'C implementation of ...', but it contains only reference vectors. My experience from implementing P.861 is that the recommendation contains a couple of ambiguities that make it difficult to implement. I was able to get the short-term verification tests to work, but not the longer-term ones. I failed to reach anyone in the ITU capable of providing clarification.
I have an implementation of Stephen Voran's MNB algorithm: http://www.cs.ucl.ac.uk/staff/O.Hodson/misc/mnb-0.1e.tar.gz. This algorithm, its calibration, and its performance are described in two papers by Voran in IEEE Trans. Speech and Audio Processing, Vol. 7, No. 4, July 1999. You can modify this source to form P861-appendix2-mnb without too much effort. The implementation builds cleanly on Solaris, FreeBSD, Irix, and Linux; porting to other platforms should be trivial. Commercially interested parties should contact Stephen Voran if they want to use or implement this algorithm.
Wonho Yang, Majid Benbouchta and Robert Yantorno have an algorithm called Modified Bark Spectral Distortion (MBSD) and will provide source code if you ask (firstname.lastname@example.org,email@example.com). The algorithm is described in ICASSP98, Vol. 1, pp. 541-544, Seattle, 1998.
A problem in most of these algorithms is thresholding. They apply an energy threshold to the reference and modified signals, and only frames whose energy is above the thresholds are included in the final score computation. This is a problem if you have packet losses and are using silence substitution: the silence-substituted frames are excluded from the calculation, yet they are perceptually noticeable. An algorithm that takes this into account is described in 'Advances in objective estimation of perceived speech quality', Stephen Voran, Proc. 1999 IEEE Workshop on Speech Coding for Telecommunications, Porvoo, Finland, June 1999.
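To make the thresholding problem concrete, here is a minimal Python sketch. It is not taken from P.861, MNB, or MBSD; the function names, the frame length, and the threshold value are all assumptions chosen for illustration. It only shows how a frame replaced by silence drops out of the set of scored frames.

```python
def frame_energies(signal, frame_len):
    """Mean-square energy of each non-overlapping frame (illustrative helper)."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def scored_frames(reference, degraded, frame_len, threshold):
    """Indices of frames where BOTH signals exceed the energy threshold.

    Hypothetical stand-in for the thresholding step; real algorithms
    define their own frame sizes and thresholds.
    """
    ref_e = frame_energies(reference, frame_len)
    deg_e = frame_energies(degraded, frame_len)
    return [
        i for i, (r, d) in enumerate(zip(ref_e, deg_e))
        if r > threshold and d > threshold
    ]


# Four active frames of four samples each (made-up data).
reference = [0.5, -0.5] * 8
# Simulate a lost packet replaced by silence in the third frame.
degraded = list(reference)
degraded[8:12] = [0.0, 0.0, 0.0, 0.0]

kept = scored_frames(reference, degraded, frame_len=4, threshold=0.01)
print(kept)  # the silence-substituted frame (index 2) is missing
```

The frame with index 2 never reaches the distance computation, so the loss that a listener would clearly hear contributes nothing to the score.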
... developer of PAMS and co-developer of PESQ (P.862) I hope I can comment.
Beware of using PSQM or MNB straight from P.861. These algorithms don't include time alignment, and PSQM does not include level alignment. Neither PSQM nor MNB takes account of the filtering that occurs in many systems and networks. For these reasons they are very inaccurate in many testing applications.
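As a rough illustration of what time and level alignment mean, the Python sketch below estimates a fixed delay by brute-force cross-correlation and then scales the degraded signal to the reference RMS level. This is a generic textbook approach under simplifying assumptions (a single constant, non-negative delay and a single gain); it is not the alignment procedure used in PAMS or PESQ, and all names are hypothetical.

```python
import math


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def estimate_delay(reference, degraded, max_lag):
    """Lag in samples (0..max_lag) that maximises the cross-correlation.

    Assumes the degraded signal is only ever delayed, never advanced.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * d for r, d in zip(reference, degraded[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def level_align(reference, degraded):
    """Scale the degraded signal so that its RMS matches the reference."""
    level = rms(degraded)
    if level == 0.0:
        return list(degraded)  # nothing to scale
    gain = rms(reference) / level
    return [v * gain for v in degraded]


# Made-up test signal: the "system" delays by 3 samples and halves the level.
reference = [0.0, 1.0, -2.0, 3.0, -1.0, 0.5, 0.0, 0.0]
degraded = [0.0, 0.0, 0.0] + [0.5 * v for v in reference]

lag = estimate_delay(reference, degraded, max_lag=5)
aligned = level_align(reference, degraded[lag:lag + len(reference)])
print(lag, aligned)
```

Real networks, and VoIP in particular, can change the delay during a call, so a single global lag like this is only a starting point.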
To fix these problems, a much more capable algorithm, PESQ (perceptual evaluation of speech quality), is expected to become a new ITU recommendation, P.862, some time early in 2001, at which point P.861 will be withdrawn. PESQ is the result of a long competition and collaboration in the ITU, and has been found to perform well at predicting subjective quality in a very wide range of applications. It is also much better than PSQM and MNB, even when working with speech codecs that have none of the time alignment, level alignment, or filtering issues.
Like PESQ, PAMS includes time and level alignment and (from release 3) the ability to deal with filtering, and is suitable for use with a wide range of telephony codecs and networks. PAMS has been commercially available since 1998 and is now in wide use in areas such as VoIP testing.
Unfortunately, as P.862 is only a draft ITU recommendation, PESQ is not yet publicly available. The source code is available to ITU members only, and even then under a restrictive license. So for PESQ I am afraid that you will have to wait some months to obtain the first commercial implementations, likely to be available for purchase in Q4 of this year, or until about February 2001 before you can buy the ITU recommendation (assuming it is approved), which will have C source code attached.
If you can't wait this long, I can only suggest that you buy PAMS from one of the companies listed on my website. For references on PAMS and PESQ, see my paper list. E-mail me if you would like a copy of any of these papers.
From William D. Voiers, Diagnostic Evaluation of Speech Quality, in Speech Intelligibility and Recognition, Mones E. Hawley, ed., Dowden, Hutchinson & Ross, Inc. (Stroudsburg, PA), 1977.
List of words