Reliability means trustworthiness. A test score is called reliable when we have reasons to believe that it is stable and objective.

      According to the Merriam-Webster Dictionary: “Reliability is the extent to which an experiment, test, or measuring procedure yields the same results on repeated trials.”

      According to Hopkins & Antes (2000):  “Reliability is the consistency of observations yielded over repeated recordings either for one subject or a set of subjects.”

Types of Reliability

     Inter-Rater or Inter-Observer Reliability

Inter-rater reliability is assessed by considering the similarity of the scores awarded by two observers. There are two major ways to estimate it. If the measurement consists of categories, with the raters checking off which category each observation falls into, you can calculate the percentage of agreement between the raters. If the measure is continuous, you can instead calculate the correlation between the ratings of the two observers, which gives an estimate of the reliability, or consistency, between the raters.

     Test-Retest Reliability

Test-retest is a statistical method used to determine a test’s reliability. It is used to judge the consistency of results over time. We estimate test-retest reliability by administering the same test to the same sample on two different occasions and correlating the two sets of scores. This approach assumes that there is no substantial change in the construct being measured between the two occasions.

     Parallel-Form Reliability

In parallel-form reliability we create two different tests from the same content to measure the same learning outcomes. The easiest way to accomplish this is to write a large set of questions that address the same content and then randomly divide the questions into two sets. Both forms are administered to the same group, and the correlation between the scores on the two parallel forms is the estimate of reliability.

     Internal Consistency Reliability

The test is administered to a group of students on a single occasion to estimate reliability. The reliability of the instrument is judged by estimating how well items that reflect the same content give similar results. One way to do this is to correlate the score on each item with the score on the whole test.

     Split-Half Reliability

In split-half reliability we randomly divide all items that claim to measure the same content into two sets. The split-half reliability estimate is simply the correlation between the total scores on the two sets. This method has the advantage that only one test administration is required, so memory, practice, and maturation effects are not involved.

     Kuder-Richardson Reliability

These measure the extent to which items within one form of the test have as much in common with one another as do the items in that one form with corresponding items in an equivalent form. They indicate how consistent the results from the test are, that is, how consistently its items measure the same thing.
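
The text does not say which Kuder-Richardson formula is meant. The sketch below assumes formula 20 (KR-20), which applies to items scored 1 or 0, and uses the population variance of the total scores. The function name kr20 and the data are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for items scored 1 (correct) or 0 (incorrect)."""
    n_students = len(responses)
    k = len(responses[0])  # number of items
    # For each item, p is the proportion answering correctly and q = 1 - p.
    sum_pq = 0.0
    for item_index in range(k):
        p = sum(row[item_index] for row in responses) / n_students
        sum_pq += p * (1 - p)
    # Population variance of the students' total scores (an assumed convention).
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Hypothetical responses: one row per student, one column per item.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

coefficient = kr20(responses)  # works out to 2/3 for this data
print(f"KR-20: {coefficient:.2f}")
```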

Factors Affecting Reliability

      Test Length

Adding more equivalent questions to a test will increase the test's reliability.
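
The size of this effect is commonly predicted with the Spearman-Brown formula, which the text does not name and is brought in here only as an illustration. It assumes that the added questions are equivalent to the existing ones. The function name and the starting reliability of 0.60 are made up.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability 0.60, doubled in length
# with equivalent questions: (2 * 0.60) / (1 + 0.60) = 0.75.
doubled = spearman_brown(0.60, 2)
print(f"predicted reliability after doubling: {doubled:.2f}")
```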

      Method Used to Estimate Reliability

The reliability coefficient is an estimate that can change depending on the method used to calculate it.


      Test Difficulty

A test that is too difficult or too easy has reduced reliability. A moderate level of difficulty increases test reliability.

      Errors that Can Increase or Decrease Individual Scores

Test developers may commit errors that also affect the reliability of teacher-made tests. These errors first affect the students’ scores, causing them to deviate from the students’ true ability, and therefore affect reliability.
