Section 5: General and Vocational Qualifications – Using Computer Adaptive tests

This last section of the consultation seeks your views on the practicalities of developing a computer adaptive test, with particular attention to methods for mitigating test and item biases that affect members of different population subgroups. Your views are sought through a brief questionnaire.

A computer adaptive test (CAT) is an assessment administered on a computer that adapts the difficulty of each question, or item, to the ability level of the candidate. In computer adaptive testing, the difficulty of each item is determined statistically from the responses of past candidates. In its simplest form, it is the ratio of the number of candidates who answered the item incorrectly to the total number of candidates who viewed it. An item that many candidates answer incorrectly is classed as difficult; an item that many candidates answer correctly is classed as easy.
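The proportion-incorrect definition of difficulty can be sketched in a few lines of Python. The item names and response counts are hypothetical, and operational CAT programmes usually estimate difficulty with an item response theory model fitted to pretest data rather than using this raw proportion directly.

```python
from dataclasses import dataclass

@dataclass
class ItemStats:
    """Pretest counts for one item (hypothetical field names)."""
    item_id: str
    viewed: int     # candidates who were shown the item
    incorrect: int  # candidates who answered it incorrectly

def difficulty(stats: ItemStats) -> float:
    """Proportion of candidates who saw the item and answered it incorrectly."""
    if stats.viewed == 0:
        raise ValueError(f"no responses recorded for {stats.item_id}")
    return stats.incorrect / stats.viewed

def rank_by_difficulty(bank: list[ItemStats]) -> list[tuple[str, float]]:
    """Return (item_id, difficulty) pairs ordered from easiest to hardest."""
    return sorted(((s.item_id, difficulty(s)) for s in bank), key=lambda pair: pair[1])

bank = [
    ItemStats("Q1", viewed=200, incorrect=30),
    ItemStats("Q2", viewed=180, incorrect=144),
    ItemStats("Q3", viewed=220, incorrect=110),
]
for item_id, d in rank_by_difficulty(bank):
    print(f"{item_id}: {d:.2f}")
```

Nothing in this calculation involves a judgement about the content of the item; difficulty is read entirely from candidate responses.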

A candidate who correctly answers items that many candidates get wrong will receive a higher score than a candidate who correctly answers only items that nearly all candidates get right. While this may seem reasonable, it is a departure from the practice of using subject-matter experts to judge the difficulty of an item. In computer adaptive testing models, there is no subjective measure of an item's difficulty: difficulty is strictly a statistical parameter.
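To illustrate how a score can follow from item statistics alone, the sketch below uses the Rasch (one-parameter logistic) model, a common choice for CAT scoring that this consultation does not itself name. Difficulties here are on the Rasch logit scale rather than the proportion-incorrect scale, and all values are hypothetical.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct answer under the Rasch (one-parameter logistic) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(difficulties: list[float], responses: list[int],
                     tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum-likelihood ability estimate using damped Newton-Raphson steps.

    responses[i] is 1 if item i was answered correctly and 0 otherwise.
    The estimate is infinite for all-correct or all-incorrect patterns,
    so those are rejected.
    """
    if len(difficulties) != len(responses):
        raise ValueError("difficulties and responses must have the same length")
    score = sum(responses)
    if score in (0, len(responses)):
        raise ValueError("no finite ML estimate for a zero or perfect score")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = score - sum(probs)                  # first derivative of the log-likelihood
        information = sum(p * (1 - p) for p in probs)  # minus the second derivative
        step = max(-1.0, min(1.0, gradient / information))  # cap each step at 1 logit
        theta += step
        if abs(step) < tol:
            break
    return theta

# Both candidates answer 2 of 3 items correctly, but candidate A was routed to harder items.
theta_a = estimate_ability([1.0, 1.5, 2.0], [1, 1, 0])
theta_b = estimate_ability([-2.0, -1.5, -1.0], [1, 1, 0])
print(f"Candidate A: {theta_a:.2f}, candidate B: {theta_b:.2f}")
```

Under the Rasch model the number correct is a sufficient statistic for a fixed set of items, so the gap between the two candidates comes entirely from which items the adaptive algorithm administered to each of them.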

If you would like to read more about this subject before completing the questionnaire, please see the Ofqual-commissioned report by language expert Michael Birdsall.

The questionnaire can be completed online, or found at Annex 4 of the PDF version of this consultation.

2 Responses to “Section 5: General and Vocational Qualifications – Using Computer Adaptive tests”

  1. Paul Simpson says:

    A BATOD member from Northern Ireland writes:

    I have a real concern about Computer Adaptive Tests: if they use voice (voiceover), they will not be suitable for deaf candidates. Also, computerised timed tests in English/literacy or in Mathematics will penalise deaf and dyslexic candidates unless there is some way to build extra time into the timings of the individual test items. In Northern Ireland, where InCAS is compulsory for all children in schools in P4–P7 (i.e. Years 3 to 6), these computerised assessments are proving very difficult for dyslexic and deaf children.

    The dyslexic children get timed out by the non-words section and cannot get on to the comprehensions, which they could well do. The deaf children have nothing to lip-read, so for them the test becomes a hearing test, and they certainly do not have equal access! Even without a disembodied voice, a straight reading test will penalise, and be less accessible to, the reader who needs more time to process the language (i.e. deaf and dyslexic candidates alike, and presumably other groups of candidates). There is a huge misconception that mathematics involves no language if it is just “sums” (fractions, algebra, geometry etc.)! On the contrary, the brain has to process the sum (adding, multiplying etc.) using language, so extra time would also be needed for such tasks. Surely this is a requirement for fair access? With a computerised task, how could that extra time be incorporated into an already timed test which just runs and then stops?


  2. Michael says:

    Hi Paul,
    I think you bring up a number of important issues.

    1) Are voice recordings suitable for deaf candidates?
    2) Can extra time be incorporated into CATs?
    3) What should be done in light of anecdotal evidence suggesting that CATs are already more difficult for dyslexic and deaf students?

    I think the first and third questions suggest the same answer: differential item functioning (DIF) analysis prior to administration of items. Regardless of whether the exam is a CAT or paper based, pretesting items before administration minimizes the likelihood that an item will have an unintended effect on population subgroups. I am curious: what subject area is being tested that requires a voice recording? A foreign language? English as a Second Language? In either case, the solution for a deaf candidate may be offering video.
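    DIF analysis asks whether candidates from two groups who have the same overall ability perform differently on a single item. One widely used method is the Mantel-Haenszel procedure, sketched below. It assumes candidates have already been grouped into bands by total score, and that "reference" and "focal" denote the two groups being compared (for example, hearing and deaf candidates). The counts are hypothetical.

```python
import math
from typing import NamedTuple

class Stratum(NamedTuple):
    """2x2 response counts for one total-score band."""
    ref_correct: int
    ref_incorrect: int
    focal_correct: int
    focal_incorrect: int

def mantel_haenszel_dif(strata: list[Stratum]) -> tuple[float, float]:
    """Return (common odds ratio alpha_MH, MH D-DIF on the ETS delta scale).

    alpha_MH > 1 (negative D-DIF) means that, after matching on total score,
    the item favours the reference group.
    """
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.ref_correct + s.ref_incorrect + s.focal_correct + s.focal_incorrect
        if n == 0:
            continue  # an empty score band contributes nothing
        numerator += s.ref_correct * s.focal_incorrect / n
        denominator += s.ref_incorrect * s.focal_correct / n
    if numerator == 0 or denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    alpha = numerator / denominator
    return alpha, -2.35 * math.log(alpha)

# Within each score band, focal-group candidates answer this item correctly less often.
strata = [Stratum(40, 10, 30, 20), Stratum(30, 20, 20, 30)]
alpha, d_dif = mantel_haenszel_dif(strata)
print(f"alpha_MH = {alpha:.2f}, MH D-DIF = {d_dif:.2f}")
```

    Values of MH D-DIF near zero suggest little DIF; items with large values would normally be reviewed before being used operationally.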

    Regarding extra time on CATs, there are several ways to allow for it.

    1) A CAT can be designed with a different termination criterion. As items are administered, a CAT collects information about a student's ability level. Once the CAT has collected enough information for the ability estimate to be statistically reliable, the exam terminates. The NCLEX, a nursing exam in the United States, is administered with this kind of termination criterion.

    2) Items of equal difficulty that require less time to complete could be administered if a student has fallen behind, or as an accommodation. This is possible only if items have been pretested. The benefit of this option is that the student wouldn't even know they were receiving the accommodation and would not be given “easier” items. As a result, the student would not feel singled out, and the exam results would be comparable to those of other students without the need for statistical manipulation.
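    Both methods can be sketched under the Rasch model. Method 1 is shown as a generic precision-based stopping rule (the standard error of the ability estimate is one over the square root of the test information), not the NCLEX's own algorithm. Method 2 assumes each pretested item carries a hypothetical median response time alongside its difficulty. All thresholds, field names and item values are illustrative.

```python
import math
from dataclasses import dataclass

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Method 1: end the test once the ability estimate is precise enough.
def standard_error(theta: float, difficulties: list[float]) -> float:
    """Standard error of the ability estimate: 1 / sqrt(test information at theta)."""
    probs = (rasch_probability(theta, b) for b in difficulties)
    information = sum(p * (1 - p) for p in probs)
    return math.inf if information == 0 else 1.0 / math.sqrt(information)

def should_stop(theta: float, difficulties: list[float], se_target: float = 0.3,
                min_items: int = 10, max_items: int = 60) -> bool:
    """Stop when the SE reaches the target, within minimum and maximum test lengths."""
    n = len(difficulties)
    if n < min_items:
        return False
    if n >= max_items:
        return True
    return standard_error(theta, difficulties) <= se_target

# Method 2: same difficulty, less time.
@dataclass(frozen=True)
class Item:
    item_id: str
    difficulty: float      # estimated from pretesting (Rasch b parameter)
    median_seconds: float  # median response time observed in pretesting

def next_item(bank: list[Item], target: float, tolerance: float = 0.2,
              prefer_short: bool = False) -> Item:
    """Pick an item whose difficulty is within `tolerance` of the target.

    By default the closest difficulty match is chosen. With prefer_short=True
    the quickest item within tolerance is chosen instead, so the difficulty
    presented is unchanged but the item should take less time.
    """
    if not bank:
        raise ValueError("item bank is empty")
    gap = lambda it: abs(it.difficulty - target)
    candidates = [it for it in bank if gap(it) <= tolerance]
    if not candidates:
        return min(bank, key=gap)  # fall back to the nearest difficulty
    if prefer_short:
        return min(candidates, key=lambda it: (it.median_seconds, gap(it)))
    return min(candidates, key=gap)

bank = [Item("A", 0.50, 95.0), Item("B", 0.45, 40.0), Item("C", 1.40, 30.0)]
print(should_stop(0.0, [0.0] * 16, se_target=0.5))       # 16 well-targeted items
print(next_item(bank, 0.5).item_id)                       # closest difficulty
print(next_item(bank, 0.5, prefer_short=True).item_id)    # quicker item, similar difficulty
```

    The tolerance in method 2 controls the trade-off: a wider tolerance offers more time-saving options but lets the difficulty drift further from the target.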

    “The dyslexic children get timed out by the non-words section and cannot get on to the comprehensions which they could well do”

    This is particularly troubling and sounds like a flaw in test construction. It sounds as though access to one content type depends on a student's ability to perform on a separate content type. Not only is this unfair, it creates a statistical nightmare by introducing a dependent probability relationship between items. For example, if the reading section preceded the non-words section, would every student obtain the same results? In general, this is addressed through content balancing, which ensures that different content types are administered in proportion throughout the exam.
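    One simple content-balancing rule is to choose, before each item, the content area whose share of administered items is furthest below its target share. The sketch below assumes two hypothetical areas with equal targets; the rules actually used by InCAS or any other test are not described here. Within the chosen area, the usual adaptive item selection would then pick the specific item.

```python
def next_content_area(targets: dict[str, float], counts: dict[str, int]) -> str:
    """Return the content area furthest below its target share of items so far.

    Ties go to the area listed first in `targets`.
    """
    total = sum(counts.values())

    def shortfall(area: str) -> float:
        share = counts.get(area, 0) / total if total else 0.0
        return targets[area] - share

    return max(targets, key=shortfall)

targets = {"non_words": 0.5, "comprehension": 0.5}  # hypothetical target shares
counts = {area: 0 for area in targets}
sequence = []
for _ in range(6):
    area = next_content_area(targets, counts)
    sequence.append(area)
    counts[area] += 1
print(sequence)
```

    Because the areas are interleaved from the start, a student who is slow on one content type still reaches items of the other type.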

    I think the most important next step is obtaining statistical data on DIF. With the data, it is easier to make the case for specific changes, whether those changes are in the accommodations offered or in something more comprehensive.

