Your task is to evaluate the subjective quality of speech in short (4 to 5 second) audio files. Each HIT should take about 90 seconds to complete.
We have methods that check the internal consistency of your answers, their agreement with the answers of other workers, and their agreement with references we know to be accurate. We will use these methods to rank the submitted assignments by quality.
For this experiment, we will pay a base reward of $0.05 for every accepted HIT. We have made 36 different HITs available. You will receive a bonus of:
Bonuses will be paid within 7 days of submission, because we can only rank submissions once we have collected a statistically significant number of answers. The base reward will always be paid within 24 hours of submission.
You will hear samples of computer-generated speech created using different methods. The purpose of this test is to evaluate the quality of each file, so that we (the researchers) can compare the methods and determine which ones sound better to a general audience.
Rate each file on the following scale, known as the MOS (mean opinion score) scale for naturalness:
| Score | Quality of the Speech | Naturalness |
|-------|-----------------------|-------------|
| 5 | Excellent | Completely natural |
| 4 | Good | Mostly natural |
| 3 | Fair | Equally natural and unnatural |
| 2 | Poor | Mostly unnatural |
| 1 | Bad | Completely unnatural |
The following references illustrate the meaning of each score. Please note that you will encounter many other types of distorted or unnatural speech, so these examples do not cover the full range of conditions you can expect to hear.
The following sample is clean speech from a male human speaker and has a reference score of 5.0.
The following sample is synthesized speech with a reference score of 4.0.
The following sample is synthesized speech with a reference score of 3.0.
The following sample is synthesized speech with a reference score of 2.0.
Finally, the following sample is synthesized speech with a reference score of 1.3.
Please keep in mind that speech can be unnatural in many ways, and these are only specific examples.
To obtain accurate results, we strongly recommend that you wear headphones and work in a quiet environment; otherwise, you may not be able to distinguish between files with clearly different characteristics. In our experience, it is very difficult to rank in the top 50% or top 10% and earn a quality bonus without wearing headphones.
Your results will be collected and evaluated for consistency. We (the requesters) have an estimate of each file's subjective quality that is consistent with the references above. This allows us to detect random scores or ratings that do not follow these instructions, and such work may be rejected. If you rate according to the instructions above, you can rest assured that your work will be approved.
Answers will be either reviewed or automatically approved within 24 hours.