Steps in Psychological Test Construction

Psychological test construction proceeds through several steps. Kaplan and Saccuzzo (2005) define a psychological test as a set of items designed to measure characteristics of human beings that pertain to behavior.

According to the American Psychological Association (APA), a psychological test is any standardized instrument, including scales and self-report inventories, used to measure behavior or mental attributes, such as attitudes, emotional functioning, intelligence and cognitive abilities (reasoning, comprehension, abstraction, etc.), aptitudes, values, interests, and personality characteristics.


General Steps in Test Construction:

There are seven main steps in psychological (psychometric) test construction:

  1. Planning the test
  2. Item Writing
  3. Preliminary tryout
  4. Reliability of the test
  5. Validity of the test
  6. Norms of the test
  7. Manual of the test

1. Planning the Test

Formulating a standardized test requires systematic planning.

Its objectives should be carefully defined.

The type of content should be determined, for example whether to use very short, short, or long answer questions, multiple-choice questions, and so on.

A blueprint must be prepared with instructions on the sampling method to be used and on the requirements for the preliminary and final administrations.

The length, time for completing the test and number of questions should be fixed.

Detailed and precise instructions should be given for administering the test and for scoring it.

2. Item Writing

Item writing requires a lot of creativity and depends on the writer's imagination, expertise, and knowledge. Its requirements are:
• Items must be assembled and arranged properly, generally in ascending order of difficulty.
• Detailed instructions covering the objective, the time limit, and the steps for recording answers must be given.

Types of items

DeVellis (1991) provided several simple guidelines for item writing:

  • Define clearly what you want to measure, and be as specific as possible.
  • Generate an item pool. Avoid exceptionally long items, which are rarely good.
  • Keep the level of reading difficulty appropriate for those who will complete the scale.
  • Avoid “double-barreled” items that convey two or more ideas at the same time. For example, consider an item that asks the respondent to agree or disagree with the statement, “I vote this party because I support social programs.” A respondent who disagrees may be rejecting the party, the reason, or both.
  • Consider mixing positively and negatively worded items. Sometimes respondents develop the “acquiescence response set,” a tendency to agree with most items regardless of their content.

Item Formats

  • The dichotomous format.
  • The polytomous format.
  • The Likert format.
  • The category format.
  • Checklists and Q-sorts.

Item Analysis

  • Item Difficulty
    • It is used for knowledge-based tests, such as achievement tests, ability tests, or IQ tests.
    • Item difficulty is defined as the proportion of people who get a particular item correct.
    • For example:
      • If 76% of the students taking a particular test get item no. 24 correct, the difficulty level for that item is .76.
      • If 40% answer the item correctly, the item difficulty level is .40.
      • If 15% answer the item correctly, the item difficulty level is .15.
    • A common objection is that these proportions do not really indicate item “difficulty” but item “easiness”: the higher the proportion of people who get the item correct, the easier the item (Allen & Yen, 1979).
  • Item Discrimination
    • It is used to examine the relationship between performance on particular items and performance on the whole test.
    • Item discriminability determines whether the people who have done well on particular items have also done well on the whole test.
    • There are two methods:

    1. The extreme group method compares people who have done well on the test with those who have done poorly.

    2. The point biserial method correlates performance on the item with performance on the whole test.
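
As a rough illustration of the item analysis calculations described above, the following Python sketch computes item difficulty, an extreme group discrimination index, and the point biserial correlation. The response data, the function names, and the 27% cutoff for the upper and lower groups are assumptions made for this example and do not come from the sources cited here.

```python
from math import sqrt

# Hypothetical scored responses, invented for illustration:
# each row is one test taker, each column one item (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]


def proportion_correct(rows):
    """Proportion of the given test takers who answered each item correctly."""
    n_items = len(rows[0])
    return [sum(row[j] for row in rows) / len(rows) for j in range(n_items)]


def item_difficulty(scores):
    """Item difficulty: proportion of all test takers who got each item right."""
    return proportion_correct(scores)


def extreme_group_index(scores, fraction=0.27):
    """Proportion correct in the top-scoring group minus the bottom-scoring group.

    The 27% group size is an assumed convention, not taken from the article.
    """
    ranked = sorted(scores, key=sum)  # lowest total score first
    k = max(1, round(fraction * len(ranked)))
    low = proportion_correct(ranked[:k])
    high = proportion_correct(ranked[-k:])
    return [h - l for h, l in zip(high, low)]


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def point_biserial(scores):
    """Correlation of each 0/1 item with the total score (item included in the total)."""
    totals = [sum(row) for row in scores]
    n_items = len(scores[0])
    return [pearson([row[j] for row in scores], totals) for j in range(n_items)]


print(item_difficulty(responses))      # [0.75, 0.625, 0.375, 0.125]
print(extreme_group_index(responses))  # [1.0, 0.5, 1.0, 0.5]
print(point_biserial(responses))
```

In this sketch the point biserial is simply the Pearson correlation between a 0/1 item score and the total score. Whether the item itself should be left out of the total is a design choice that this example does not address.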

3. Preliminary Tryout

After the items have been modified on the advice of experts, the test can be tried out on an experimental basis to prune out any inadequacy or weakness in the items.

The tryout highlights ambiguous items, irrelevant choices in multiple-choice questions, and items that are very difficult or very easy to answer.

It also helps determine the time limit of the test and the number of items to keep in the final version, and it helps remove repetition and vagueness in the instructions. It is done in the following three stages:

a) Preliminary try-out – This is performed individually and helps reduce the linguistic difficulty and vagueness of items. The test is administered to around 100 people, and modifications are made after observing the workability of the items.

b) The proper try-out – The test is administered to approximately 400 people, with the sample drawn to match the intended final participants of the test. It is done to remove poor or less significant items and retain the good ones, and it includes two activities:

• Item analysis – Item analysis is the process of judging the quality of an item. The difficulty of the test should be moderate, and each item should discriminate between high and low achievers.

• Post-item analysis – The final test is framed by retaining good items that have a balanced level of difficulty and satisfactory discrimination. The blueprint guides the selection of the number of items, which are then arranged by difficulty. A time limit is set.

c) Final try-out – The test is administered to a large sample in order to estimate its reliability and validity. It indicates how effective the test is when the intended sample is subjected to it.

4. Reliability of the Test

Reliability refers to the consistency of test scores. Once the test has been finalized, it is administered again to a fresh sample in order to compute the reliability coefficient. This sample, too, should not be smaller than 100. Reliability is calculated through:

  • Test-retest reliability
  • Odd-even reliability
  • Split-half reliability

5. Validity of the Test

Validity refers to what the test measures and how well it measures it. If a test measures well the trait that it is intended to measure, it can be said to be valid. In its criterion-related form, validity is expressed as the correlation of the test with some outside independent criterion. Common approaches to checking validity include content, criterion-related, and construct validation.

6. Norms of the Test

The test constructor also prepares norms for the test. Norms are defined as average performance scores, and they are prepared so that scores obtained on the test can be interpreted meaningfully. On their own, obtained scores convey nothing about the ability or trait being measured. When they are compared with norms, however, a meaningful inference can be drawn immediately. The following types of norms are used:

  • Age norms
  • Grade norms
  • Percentile norms
  • Standard score norms (z-score, T-score, stanine)
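
To make the standard score norms above concrete, here is a minimal Python sketch that converts a raw score into a z-score, a T-score, a stanine, and a percentile rank. The norm sample is invented, the function names are hypothetical, and two conventions are assumed: the stanine is approximated as 2z + 5 rounded and limited to the range 1 to 9, and the percentile rank counts half of the tied scores as falling below the raw score.

```python
from statistics import mean, pstdev

# Hypothetical raw scores of a norm group, invented for illustration.
norm_sample = [40, 45, 50, 50, 55, 60, 50, 45, 55, 50]


def z_score(raw, norms):
    """Distance of a raw score from the norm-group mean, in standard deviations."""
    return (raw - mean(norms)) / pstdev(norms)


def t_score(raw, norms):
    """T-score: z-score rescaled to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z_score(raw, norms)


def stanine(raw, norms):
    """Approximate stanine (assumed linear rule: 2z + 5, rounded, limited to 1..9)."""
    return min(9, max(1, round(2 * z_score(raw, norms) + 5)))


def percentile_rank(raw, norms):
    """Percentage of the norm group below the raw score, counting half of the ties."""
    below = sum(1 for s in norms if s < raw)
    equal = sum(1 for s in norms if s == raw)
    return 100 * (below + 0.5 * equal) / len(norms)


for raw in (50, 60):
    print(raw, round(z_score(raw, norm_sample), 2), round(t_score(raw, norm_sample), 1),
          stanine(raw, norm_sample), percentile_rank(raw, norm_sample))
```

A raw score equal to the norm-group mean maps to z = 0, T = 50, and stanine 5 under these conventions, which is a quick sanity check on any implementation.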

7. Manual of the Test

The manual is prepared as the last step. It reports the psychometric properties of the test, its norms, and references. It describes in detail how to administer the test, its duration, and the scoring technique, and it contains all instructions for the test. A manual typically includes:

  • Introduction
  • Test book
  • Answer key
  • Scoring sheet
  • Reliability
  • Validity
  • Norms (for example by age group: 5-10, 10-15, 15-20 years; sex: male, female; demography: urban, rural; region: India, USA, Pune)
  • References


Anastasi, A. & Urbina, S. (2009). Psychological testing. N.D.: Pearson Education.
Kaplan, R.M. & Saccuzzo, D.P. (2005). Psychological testing: Principles, applications and issues (6th ed.). Cengage Learning India Pvt. Ltd.
Singh, A.K. (2006). Tests, measurements and research methods in behavioural sciences. Patna: Bharati Bhavan.
