The Question Analysis Report is a comprehensive evaluation interface within OnTarget that provides systematic review and quality assurance for assessment items. This multi-section analysis tool ensures educational content meets technical standards and aligns with Texas Essential Knowledge and Skills (TEKS) requirements.

Header Section
- Question Identification: Displays the specific question number being analyzed
- Difficulty Metrics: Shows the statistical difficulty value (0.91 in the example)
- Quality Assessment: Displays the quality score (0.18 in the example) with status indicators (“Outside of Optimum Range” and “Marginal” classifications)
P-Value (Item Difficulty) Calculations
Formula
P-Value = (Number of students answering correctly) / (Total number of students)
Interpretation Scale
- P = 0.00 to 0.30: Very difficult items
- P = 0.31 to 0.70: Moderate difficulty (optimal range)
- P = 0.71 to 1.00: Easy items
Example from Screenshot
The displayed difficulty value of 0.91 indicates that 91% of students answered the item correctly (P = 0.91).
This places the item in the “easy” category, explaining why it’s marked as “Outside of Optimum Range.”
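As a concrete illustration, the P-value for each item can be computed directly from a 0/1 scored response matrix. The sketch below uses made-up data and illustrative variable names; it is not OnTarget's actual data model.

```python
# P-value (item difficulty): proportion of students answering each item correctly.
# `responses` is a hypothetical 0/1 score matrix: rows = students, columns = items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]

def p_values(matrix):
    n_students = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_students for j in range(n_items)]

def difficulty_band(p):
    if p <= 0.30:
        return "very difficult"
    if p <= 0.70:
        return "moderate (optimal range)"
    return "easy"

for j, p in enumerate(p_values(responses), start=1):
    print(f"Item {j}: P = {p:.2f} ({difficulty_band(p)})")
```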
Point Biserial Correlation (rpb)
Formula
rpb = [(M₁ - M₀) / St] × √(p × q)
Where:
- M₁ = Mean total score of students who answered the item correctly
- M₀ = Mean total score of students who answered the item incorrectly
- St = Standard deviation of total test scores
- p = Proportion answering correctly (P-value)
- q = Proportion answering incorrectly (1 – p)
Alternative Computational Formula
rpb = (∑XY - (∑X)(∑Y)/N) / √[(∑X² - (∑X)²/N)(∑Y² - (∑Y)²/N)]
Where:
- X = Item scores (1 for correct, 0 for incorrect)
- Y = Total test scores
- N = Number of students
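The following sketch implements the mean-difference form of the formula using only Python's standard library; the item and total scores are made up for illustration.

```python
import math

def point_biserial(item_scores, total_scores):
    """rpb via the mean-difference formula: [(M1 - M0) / St] * sqrt(p * q)."""
    n = len(item_scores)
    correct = [t for x, t in zip(item_scores, total_scores) if x == 1]
    incorrect = [t for x, t in zip(item_scores, total_scores) if x == 0]
    if not correct or not incorrect:
        return 0.0  # item has no variance, so the correlation is undefined
    p = len(correct) / n
    q = 1 - p
    m1 = sum(correct) / len(correct)
    m0 = sum(incorrect) / len(incorrect)
    mean_t = sum(total_scores) / n
    st = math.sqrt(sum((t - mean_t) ** 2 for t in total_scores) / n)  # population SD
    return (m1 - m0) / st * math.sqrt(p * q)

# Item scores (1 = correct, 0 = incorrect) alongside each student's total test score.
item = [1, 1, 0, 1, 0, 1]
totals = [38, 35, 22, 40, 25, 30]
print(f"rpb = {point_biserial(item, totals):.2f}")
```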
Discrimination Index Calculations
Upper-Lower 27% Method
D = (PU - PL)
Where:
- PU = Proportion of upper 27% group answering correctly
- PL = Proportion of lower 27% group answering correctly
Interpretation Guidelines
- D ≥ 0.40: Excellent discrimination
- D = 0.30-0.39: Good discrimination
- D = 0.20-0.29: Fair discrimination
- D < 0.20: Poor discrimination (review needed)
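A sketch of the upper-lower 27% method, again with made-up data; group sizes are rounded to the nearest whole student.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """D = PU - PL using the top and bottom `fraction` of students by total score."""
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])  # ascending by total score
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
totals = [38, 35, 22, 40, 25, 30, 36, 18, 33, 28]
d = discrimination_index(item, totals)
print(f"D = {d:.2f}")  # >= 0.40 excellent, 0.30-0.39 good, 0.20-0.29 fair, < 0.20 poor
```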
Quality Score Calculation
Composite Quality Formula
Quality Score = w₁(rpb) + w₂(1-|P-0.5|) + w₃(Distractor_Effectiveness)
Where weights (w) are predetermined based on assessment priorities.
Example Analysis
Given that the screenshot shows Quality = 0.18, marked as “Marginal”:
P-Value = 0.91 (too easy)
Estimated rpb ≈ 0.15-0.25 (likely low due to ceiling effect)
Quality Score = 0.18 (below acceptable threshold)
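The sketch below shows how such a composite combines the pieces. The weights and the distractor-effectiveness value are assumptions chosen for illustration, not OnTarget's actual configuration.

```python
def quality_score(rpb, p_value, distractor_effectiveness,
                  w1=0.5, w2=0.3, w3=0.2):
    """Composite quality: w1*rpb + w2*(1 - |P - 0.5|) + w3*distractor_effectiveness.
    The weights here are assumed for illustration, not OnTarget's actual values."""
    return w1 * rpb + w2 * (1 - abs(p_value - 0.5)) + w3 * distractor_effectiveness

# An item resembling the screenshot example: very easy (P = 0.91) and weakly discriminating.
score = quality_score(rpb=0.15, p_value=0.91, distractor_effectiveness=0.10)
print(f"Quality = {score:.2f}")  # a weak item yields a low composite score
```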
Statistical Flags and Recommendations
Item Revision Criteria
- P-Value Issues:
  - If P > 0.90: Item too easy, consider increasing difficulty
  - If P < 0.20: Item too difficult, review content alignment
- Point Biserial Issues:
  - If rpb < 0.15: Poor discrimination, item may need revision
  - If rpb < 0.00: Negative discrimination, item likely flawed
- Combined Analysis:
  Item_Flag = (P < 0.2 OR P > 0.9) AND rpb < 0.15
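These criteria translate directly into a flagging rule. The function below is an illustrative sketch of that logic using the thresholds listed above; the function name is not part of OnTarget.

```python
def review_flags(p_value, rpb):
    """Return the review messages implied by the P-value and point-biserial criteria."""
    flags = []
    if p_value > 0.90:
        flags.append("Item too easy: consider increasing difficulty")
    elif p_value < 0.20:
        flags.append("Item too difficult: review content alignment")
    if rpb < 0.00:
        flags.append("Negative discrimination: item likely flawed")
    elif rpb < 0.15:
        flags.append("Poor discrimination: item may need revision")
    # Combined flag: extreme difficulty AND weak discrimination.
    if (p_value < 0.2 or p_value > 0.9) and rpb < 0.15:
        flags.append("Item_Flag: candidate for revision or removal")
    return flags

for message in review_flags(p_value=0.91, rpb=0.12):
    print(message)
```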
Distractor Analysis Formula
Distractor Point Biserial
The same point-biserial formula is applied to each distractor, scoring a response 1 if the student selected that distractor and 0 otherwise:
rpb(distractor) = [(M₁ - M₀) / St] × √(p × q)
Where:
- M₁ = Mean total score of students who selected the distractor
- M₀ = Mean total score of students who did not select it
- p = Proportion of students selecting the distractor, q = 1 - p
Effective distractors should have:
- Negative point biserial correlations, typically between -0.10 and -0.30
- Attraction primarily from lower-ability students
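The sketch below computes a point biserial for each distractor by scoring “selected this distractor” as 1 and anything else as 0; the response data and option labels are made up.

```python
import math

def selection_point_biserial(selected, totals):
    """Point biserial of a 0/1 'selected this option' vector against total scores."""
    n = len(selected)
    chosen = [t for s, t in zip(selected, totals) if s == 1]
    others = [t for s, t in zip(selected, totals) if s == 0]
    if not chosen or not others:
        return 0.0
    p = len(chosen) / n
    q = 1 - p
    m1 = sum(chosen) / len(chosen)
    m0 = sum(others) / len(others)
    mean_t = sum(totals) / n
    st = math.sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    return (m1 - m0) / st * math.sqrt(p * q)

# Each student's selected option; "B" is the keyed (correct) answer in this made-up data.
choices = ["B", "B", "A", "B", "C", "B", "A", "D", "B", "C"]
totals = [38, 35, 22, 40, 25, 30, 36, 18, 33, 28]

for option in ["A", "C", "D"]:  # distractors only
    selected = [1 if c == option else 0 for c in choices]
    r = selection_point_biserial(selected, totals)
    print(f"Distractor {option}: rpb = {r:+.2f}")  # negative values are desirable
```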
Reliability Impact Calculation
Alpha-if-Item-Deleted
αnew = [(k-1)/(k-2)] × [1 - (∑Si² - Sj²)/(St² - Sj²)]
Where:
- k = Number of items on the full test (so the reduced test contains k-1 items)
- Si² = Variance of item i
- Sj² = Variance of the item being deleted
- St² = Variance of total scores (St² - Sj² approximates the total-score variance with item j removed)
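One straightforward way to obtain alpha-if-item-deleted is to recompute Cronbach's alpha on the k-1 remaining items, which is what the shortcut formula above approximates. A sketch with made-up response data:

```python
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cronbach_alpha(matrix):
    """Cronbach's alpha for a 0/1 score matrix: rows = students, columns = items."""
    k = len(matrix[0])
    item_vars = [variance([row[j] for row in matrix]) for j in range(k)]
    total_var = variance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_deleted(matrix, j):
    """Alpha recomputed with item j (0-indexed) removed from the test."""
    reduced = [[x for idx, x in enumerate(row) if idx != j] for row in matrix]
    return cronbach_alpha(reduced)

responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
print(f"alpha (all items)    = {cronbach_alpha(responses):.2f}")
print(f"alpha without item 3 = {alpha_if_deleted(responses, j=2):.2f}")
```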
Implementation in OnTarget System
The Question Analysis Report uses these calculations to automatically flag items requiring review based on:
- Statistical Thresholds: Pre-set criteria for P-values and discrimination indices
- Quality Composite Scores: Weighted combinations of multiple statistics
- Bias Detection: Statistical analysis of differential item functioning across demographic groups
- Standards Alignment: Correlation analysis between item performance and learning objectives
These statistical measures ensure that assessment items meet Texas Education Agency (TEA) technical standards for validity, reliability, and fairness in educational measurement.
Evidence of Validity

This is a validity review checklist used to systematically evaluate test questions to ensure they’re fair, accurate, and appropriately constructed. Here’s what each section helps you examine:
Standards Alignment
- Verifies the question actually measures what it’s supposed to measure
- Ensures it aligns with specific learning standards or objectives
- Confirms the question matches the intended grade level and content area
- State Standards: Does it match your curriculum requirements?
- Depth of Knowledge (DOK): Does the cognitive demand match your learning objective?
  - DOK 1: Recall facts (identify, list)
  - DOK 2: Apply skills/concepts (describe, compare)
  - DOK 3: Strategic thinking (analyze, evaluate)
  - DOK 4: Extended thinking (synthesize, create)
Example issue: Teaching students to “analyze” but asking them to only “identify” creates a mismatch.
Bias and Sensitivity
- Checks for cultural, gender, socioeconomic, or other biases
- Ensures the question doesn’t favor certain groups of students
- Reviews for potentially offensive or insensitive content
- Looks for assumptions about student backgrounds or experiences
This protects all students from unfair disadvantages:
- Cultural bias: Avoid references that favor certain backgrounds (e.g., assuming all students know about skiing)
- Stereotypes: Don’t reinforce harmful assumptions about groups
- Sensitive topics: Avoid emotionally charged content unrelated to learning objectives
Example issue: A math word problem about expensive vacations may disadvantage students from low-income families.
Language and Vocabulary
- Evaluates whether the reading level is appropriate
- Checks for unnecessarily complex or confusing language
- Ensures vocabulary doesn’t create barriers unrelated to the content being tested
- Reviews for clear, concise wording
This ensures language doesn’t become a barrier to demonstrating knowledge:
- Grade-appropriate vocabulary: Use words students at this level should know
- Clear, concise writing: Eliminate unnecessary complexity
- Consistent terminology: Don’t switch between “rectangle” and “quadrilateral” randomly
- Active voice: “Students conducted the experiment” vs. “The experiment was conducted by students”
Example issue: Using “precipitation” instead of “rain” in a 2nd-grade science question.
Structure and Context
- Examines the overall organization and flow of the question
- Checks if the context/scenario is realistic and relevant
- Ensures the question format supports the learning objective
- Reviews for logical progression of information
This ensures the question format supports learning assessment:
- Clear instructions: Students understand what they’re being asked to do
- Appropriate context: Scenarios are realistic and relevant
- No unintended clues: The structure doesn’t give away the answer
- Parallel structure: All answer choices follow the same format
Example issue: Making the correct answer noticeably longer than the distractors.
Answer Choices (for multiple choice questions)
- Evaluates whether distractors are plausible but clearly incorrect
- Checks that the correct answer is definitively right
- Ensures choices are similar in length and complexity
- Reviews for “all of the above” or “none of the above” issues
This ensures multiple-choice questions work effectively:
- Plausible distractors: Wrong answers should reflect common misconceptions
- Grade-level appropriate: Distractors based on content students should know
- Avoid “gotcha” answers: Don’t trick students who actually know the material
- One clearly correct answer: No ambiguity about which choice is right
Example issue: Having distractors like “purple elephant” that no student would reasonably choose.
Visuals
- Checks charts, graphs, images, or diagrams for clarity
- Ensures visuals support rather than confuse the question
- Reviews for appropriate size, resolution, and labeling
- Confirms visuals are accessible to all students
This ensures graphics support rather than hinder assessment:
- Purpose-driven: Charts/images directly support the question’s objective
- Complete information: Everything needed to answer is provided
- Clear and legible: Students can easily read and interpret visuals
- Accessible: Works for students with visual processing differences
Example issue: Including a blurry graph where students can’t read the numbers on the axes.
How to Use This Checklist
- Review each question systematically through all six lenses
- Mark “No” for any area that needs improvement
- Revise the question before using it with students
- Document your changes for future reference
- Build a bank of validated questions over time
This process might seem time-consuming initially, but it prevents the much larger problem of making instructional decisions based on invalid assessment data. Start with the most important assessments (unit tests, benchmarks) and gradually work through each question bank.
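For teams that document their reviews digitally, a record like the following can capture the six lenses for each question; the field names and structure are illustrative, not part of OnTarget.

```python
# A lightweight record for documenting one question's validity review.
# Field names are illustrative, not part of OnTarget.
validity_review = {
    "question_id": "Q17",
    "reviewer": "J. Smith",
    "checks": {
        "standards_alignment": True,
        "bias_and_sensitivity": True,
        "language_and_vocabulary": False,  # flagged: vocabulary above grade level
        "structure_and_context": True,
        "answer_choices": True,
        "visuals": True,
    },
    "notes": "Replace 'precipitation' with 'rain' for the 2nd-grade version.",
}

needs_revision = [lens for lens, passed in validity_review["checks"].items() if not passed]
print("Revise before use:", needs_revision or "none")
```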
Here’s why this matters so much
Analyzing assessments for validity is critical for ensuring tests actually do what they’re supposed to do.
Validity ensures the assessment actually measures the intended learning objectives, not unrelated factors like reading ability when testing math concepts, or cultural knowledge when testing science skills. Without validity, one might think a student doesn’t understand fractions when they actually just struggled with complex vocabulary.