Instructions for speech quality evaluation

Introduction

Your task is to rate the subjective quality of the speech in short (2-3 second) audio files. Each HIT can be completed in 90 seconds.

Payment

We have methods that check your answers for consistency with one another, with those of your fellow workers, and with references we know to be accurate. We will use these methods to rank the submitted assignments according to quality.

For this experiment we will pay a base reward of $0.10 for every accepted HIT. We have made available a set of 12 different HITs. You will receive a bonus of:

Bonuses will be paid up to 7 days after submission, because we can only rank the submissions once we have a statistically significant number of answers. The base reward will always be paid within 24 hours of submission.

Instructions

Each file should be given a score according to the following scale, known as the MOS (mean opinion score) scale:

Score   Quality of the Speech   Level of Distortion
5       Excellent               Imperceptible
4       Good                    Just perceptible, but not annoying
3       Fair                    Perceptible and slightly annoying
2       Poor                    Annoying, but not objectionable
1       Bad                     Very annoying and objectionable
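
For readers curious about where the name of the scale comes from: the individual 1-5 ratings that different listeners give to the same file are averaged, and that average is the file's mean opinion score. The following is a minimal sketch of that averaging, not the requesters' actual code; the function name, the file names, and the example ratings are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_opinion_scores(ratings):
    """Average the 1-5 ratings per file.

    `ratings` is an iterable of (file_id, score) pairs. Both the data
    layout and this function are hypothetical, for illustration only.
    """
    by_file = defaultdict(list)
    for file_id, score in ratings:
        # Only the five scores defined by the scale above are valid.
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"score out of range: {score!r}")
        by_file[file_id].append(score)
    return {file_id: mean(scores) for file_id, scores in by_file.items()}


# Made-up ratings from several listeners for two made-up files.
example = [("a.wav", 5), ("a.wav", 4), ("b.wav", 2), ("b.wav", 1), ("b.wav", 3)]
mos = mean_opinion_scores(example)
```

This is also why a reference score can be fractional (for example 4.5) even though each listener only ever submits a whole number.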

The following references illustrate the meaning of each score. Please note that you will encounter many other types of noise and distortion, so these examples do not cover the full range of conditions you can expect to hear.

Reference Samples

The following recording represents clean speech with no perceptible noise or distortion, and has a reference score of 5.0.

The following represents the best possible quality which can be obtained with a conventional telephone, and has a reference score of 4.5.

This file contains speech corrupted by background noise, and has a reference score of 2.5.

Finally, this is an example of significantly distorted speech, with a reference score of 1.5.

Approval/Rejection Policy

To obtain accurate results, we strongly recommend that you wear headphones and work in a quiet environment. Otherwise, you might not be able to discriminate between files with clearly different features.

Your results will be collected and evaluated for consistency. We (the requesters) have an estimate of each file's subjective quality that conforms to the references above. Thus, we can detect if someone submits random scores or does not rate according to these instructions, which can lead to work being rejected. You can rest assured that your work will be approved if you rate according to the chart and examples above.
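
To give a sense of how scores can be compared against known estimates, here is one possible sketch, assuming a simple Pearson correlation between a worker's scores and the reference scores for the same files. This is an illustration only: the actual evaluation method is not described in these instructions, and the function names and the 0.5 threshold are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant sequence (e.g. always answering 3) carries no
        # information about quality; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def looks_consistent(worker_scores, reference_scores, min_correlation=0.5):
    """Hypothetical check: do the scores track the reference scores?

    The 0.5 threshold is an assumption chosen for this example.
    """
    return pearson(worker_scores, reference_scores) >= min_correlation


# Reference scores of the four samples above: 5.0, 4.5, 2.5, 1.5.
reference = [5.0, 4.5, 2.5, 1.5]
careful = looks_consistent([5, 4, 3, 1], reference)    # follows the chart
inverted = looks_consistent([1, 2, 4, 5], reference)   # scale used backwards
constant = looks_consistent([3, 3, 3, 3], reference)   # same answer every time
```

Under these assumptions, a worker who rates according to the chart passes the check even when individual scores differ from the references by a point, while reversed or constant answers do not.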

Answers will be either reviewed or automatically approved within 24 hours.