Proposal 6: Data Collection and Analyses

Develop a format for Awarding Organisations (AOs) to collect and analyse data on previous rounds of assessments. Encourage AOs to use known item response data to inform test developers/senior examiners at the next round of qualification design and preparation.

Language accessibility for whom

It should be clear which test-taking groups are affected when tests are not written in accessible language. Research evidence needs to document which groups are affected so that appropriate language modifications can be made to the test tasks, test administration and/or test responses.

Addressing this research topic will help to focus the examination of language accessibility on these test-taking groups.

Differential Item Functioning (DIF) analysis of test performance

DIF analysis compares the item-level performance of test-taking sub-groups. Such analysis is expected to show whether test takers from different sub-groups who are of comparable ability (matched on total test score) perform differently on particular test items or tasks; in other words, whether those items function differentially.
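As an illustration only, the matching-on-total-score idea can be sketched with the Mantel-Haenszel procedure, one commonly used DIF statistic; this consultation does not prescribe a particular method. The function names, the 'ref'/'focal' group labels and the record layout below are assumptions made for the sketch.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for one item, stratified by total test score.

    records: iterable of (group, total_score, correct), where group is
    'ref' or 'focal' (hypothetical labels) and correct is 1 or 0.
    A value near 1.0 suggests the item behaves alike for matched groups.
    """
    # One 2x2 table per score stratum:
    # [ref correct, ref wrong, focal correct, focal wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, total_score, correct in records:
        offset = 0 if group == "ref" else 2
        strata[total_score][offset + (0 if correct else 1)] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        return None  # not estimable from these data
    return numerator / denominator


def mh_d_dif(odds_ratio):
    """Re-express the odds ratio on the ETS delta scale (-2.35 * ln(alpha)).

    Negative values indicate an item that is harder for the focal group
    than for reference-group test takers with the same total score.
    """
    return -2.35 * math.log(odds_ratio)
```

In practice an AO would run this per item across a whole paper and apply a significance test and minimum sample sizes before flagging any item for review.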

Empirical research on interaction hypothesis

The interaction hypothesis is the assumption that test accommodations or modifications will improve test scores for the test takers who need the accommodation but not for those test takers who do not need the accommodation.

Empirical research data needs to be collected from test-taking sub-groups in terms of how they performed with and without the test accommodation in order to undertake the required analyses.
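A minimal descriptive sketch of the comparison this implies is given below. It assumes scores are available for two groups (labelled 'target' and 'other' here, both hypothetical labels) under standard and accommodated conditions, and it simply contrasts the mean gain in each group. A real study would also need a significance test and adequate sample sizes.

```python
from statistics import mean


def accommodation_gains(scores):
    """Compare the mean score gain from an accommodation across two groups.

    scores: dict keyed by (group, condition) holding lists of scores, with
    group in {'target', 'other'} and condition in
    {'standard', 'accommodated'} (labels assumed for this sketch).
    Returns (gain for target group, gain for other group, difference).
    """
    gain_target = (mean(scores[("target", "accommodated")])
                   - mean(scores[("target", "standard")]))
    gain_other = (mean(scores[("other", "accommodated")])
                  - mean(scores[("other", "standard")]))
    # The interaction hypothesis predicts a clear gain for the target
    # group and a gain close to zero for the other group.
    return gain_target, gain_other, gain_target - gain_other
```

A large positive difference with a near-zero gain for the other group would be consistent with the interaction hypothesis; similar gains in both groups would suggest the change makes the test easier for everyone.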

Empirical research on the effects of language modifications on test taking group performance

Specific modifications should be examined in terms of whether the test taking groups performed better with modifications or not. Empirical research on the effects of different language or test modifications for the test taking groups can examine:

  • if the modifications helped the test-taking groups to gain higher scores
  • if the general candidate population benefits from better-designed assessment.

Validation studies across sub-groups

The main question that needs to be addressed here is whether the scores received through accommodated or modified tests are equivalent in meaning and interpretation to scores received through non-accommodated, unmodified tests. Examining score comparability between test takers who took an accommodated or modified test and test takers who took a standard unmodified test is not sufficient. Test takers’ scores in both types of administration need to relate to external criteria (such as other grades, admission test scores, etc.). Further, it is necessary to establish that the test accommodations provided did not change the construct that was being measured.

It is proposed to set up a double-blind trial [1] of modified and non-modified items using representative candidates from the identified sub-groups. The double blinding would be particularly important due to the effects that examiners could have on the study if they were to announce the “easy” and “hard” versions of the test. Develop an online model which would assign the different versions of the test without teachers being involved.
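One possible shape for the allocation step of such an online model is sketched below. It is an assumption for illustration, not a specified design. Candidates are shuffled within each sub-group and given a neutrally labelled form. The key that says which form is the modified one is returned separately, so it can be withheld from teachers and examiners until marking is complete. All names here are hypothetical.

```python
import random


def assign_versions(candidate_ids, sub_group_of, seed=None):
    """Randomly allocate test versions, balanced within each sub-group.

    candidate_ids: list of candidate identifiers.
    sub_group_of: dict mapping candidate id -> sub-group name.
    Returns (allocation, key): allocation maps each candidate to a neutral
    form label; key records which label is the modified version and
    should be stored away from anyone administering or marking the test.
    """
    rng = random.Random(seed)
    labels = ["Form A", "Form B"]
    rng.shuffle(labels)  # the label itself gives no hint about the version
    key = {"modified": labels[0], "unmodified": labels[1]}

    by_group = {}
    for cid in candidate_ids:
        by_group.setdefault(sub_group_of[cid], []).append(cid)

    allocation = {}
    for members in by_group.values():
        rng.shuffle(members)
        # Alternate the labels down the shuffled list for an even split.
        for position, cid in enumerate(members):
            allocation[cid] = labels[position % 2]
    return allocation, key
```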

Responding to Proposal 6

Please complete the response form online. The form is available to print at Annex 1 of the PDF version of this consultation.

  1. A double-blind trial is an especially rigorous way of conducting an experiment, usually on human subjects, in an attempt to eliminate subjective bias on the part of both the experimental subjects and the experimenters. In a double-blind experiment, neither the individuals nor the researchers know who belongs to the control group and who belongs to the experimental group.
