Instructions for speech quality evaluation

Introduction

Your task is to evaluate the subjective quality of the speech in short (4 to 5 second) audio files. Each HIT takes about 90 seconds to complete.

Payment

We have methods that check how consistent your answers are with one another, with the answers of other workers, and with references we know to be accurate. We will use these methods to rank the submitted assignments by quality.

For this experiment, we will pay a base reward of $0.05 for every accepted HIT. A set of 36 different HITs is available. You will receive a bonus of:

Bonuses will be paid within 7 days of submission, because we can rank the submissions only once we have collected a statistically significant number of answers. The base reward will always be paid within 24 hours of submission.

Instructions

You will hear samples of computer-generated speech created using different methods. The purpose of this test is to evaluate the quality of each file, so that we (the researchers) can compare the methods and determine which ones sound better to a general audience.

Give each file a score on the following scale, known as the MOS (mean opinion score) scale for naturalness:

Score | Quality of the speech | Naturalness
5     | Excellent             | Completely natural
4     | Good                  | Mostly natural
3     | Fair                  | Equally natural and unnatural
2     | Poor                  | Mostly unnatural
1     | Bad                   | Completely unnatural

The following references illustrate the meaning of each score. Please note that you will encounter many other types of distorted or unnatural speech, so these examples do not cover the full range of conditions you can expect to hear.

Examples

The following is clean speech from a male human speaker and has a reference score of 5.0.

The following is synthesized speech with a reference score of 4.0.

This file contains synthesized speech with a reference score of 3.0.

This file contains synthesized speech with a reference score of 2.0.

Finally, this file contains synthesized speech with a reference score of 1.3.

Please keep in mind that speech can be unnatural in many ways, and these are only specific examples.

Approval/Rejection Policy

To obtain accurate results, we strongly recommend that you wear headphones and work in a quiet environment; otherwise, you might not be able to tell apart files that sound clearly different. In our experience, it is very difficult to rank in the top 50% or top 10% and earn a quality bonus without wearing headphones.

Your results will be collected and evaluated for consistency. We (the requesters) have an estimate of each file's subjective quality that is consistent with the references above. This lets us detect workers who submit random scores or do not rate according to these instructions, and such work may be rejected. You can rest assured that your work will be approved if you rate according to the instructions above.

Answers will be either reviewed or automatically approved within 24 hours.