**Summary**

The Common Assessment Analysis report helps determine the reliability and validity of locally developed assessments. Statistical analyses are conducted on the student performance data that the Local Education Agency (LEA) submits for each item.

**Assessment Reliability Comparison**

The Common Assessment Analysis report can be used for any dichotomously scored, locally developed assessment, regardless of subject area.

**Question Difficulty**

Item analysis is concerned, in part, with the difficulty of each item relative to the population of persons who were administered the test. Item difficulty, simply stated, is the proportion of persons who answered the item correctly. This proportion is also called the facility of an item and is usually denoted by the letter p. The left-hand side of the display shows the individual p-value of each question. The right-hand side shows the frequency distribution of the p-values across the entire assessment.
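
As an illustration of the definition above, the following is a minimal sketch of how item p-values and their frequency distribution could be computed from a matrix of 0/1 item scores. The function names, the input layout (one row per student), and the equal-width binning are assumptions made for this example; they are not taken from the report itself.

```python
def item_p_values(responses):
    """Proportion of students who answered each item correctly.

    responses: list of per-student lists of 0/1 item scores
    (hypothetical layout; every row must have the same length).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


def p_value_distribution(p_values, n_bins=10):
    """Count how many items fall in each equal-width p-value bin on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        # A p-value of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical data: 5 students x 4 items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]
p = item_p_values(scores)                    # [0.8, 0.8, 0.2, 0.8]
dist = p_value_distribution(p, n_bins=5)     # [0, 1, 0, 0, 3]
```

In this made-up data set, the third item (p = 0.2) is the hardest, and three of the four items fall in the easiest bin.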

**Question Quality**

The point-biserial correlation describes the relationship between a student’s performance on a multiple-choice or gridded-response item (scored correct or incorrect) and performance on the assessment as a whole. A high point-biserial correlation indicates that students who answered the item correctly tended to score higher on the entire test than those who missed the item.
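
The point-biserial correlation is the Pearson correlation between a 0/1 item score and a continuous score. The sketch below shows one way it could be computed. It assumes a hypothetical input of one row of 0/1 item scores per student, and it includes an `exclude_item` switch because the Definitions section describes the total as the sum of the remaining items; which variant the report actually uses is not confirmed here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Undefined when every student has the same item score or the same total.
        return None
    return cov / math.sqrt(var_x * var_y)


def point_biserial(responses, item, exclude_item=True):
    """Correlation between one item's 0/1 scores and the test score.

    With exclude_item=True the item is removed from each student's total
    (the "remaining items" variant); with False the full raw score is used.
    """
    item_scores = [student[item] for student in responses]
    totals = [
        sum(student) - (student[item] if exclude_item else 0)
        for student in responses
    ]
    return pearson(item_scores, totals)


# Hypothetical data: 4 students x 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
r = point_biserial(responses, item=0)  # about 0.707
```

A positive value, as in this example, means students who answered the item correctly tended to score higher on the rest of the test.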

**Question Analysis**

Validity refers to the extent to which a test measures what it is intended to measure. Validity evidence for an assessment can come from a variety of sources, including test content, response processes, internal structure, relationships with other variables, and analysis of the consequences of testing (2018-2019 Technical Digest). Validity evidence based on test content supports the assumption that the content of the test adequately reflects the intended construct. It is imperative that each test item be reviewed for alignment, appropriateness, adequacy of student preparation, and any potential bias. The statistical analysis for each item should be reviewed and a recommendation should be made on whether the item should be reused as written, revised, re-coded to a different TEKS, or rejected.

**Data Sources**

Common Assessment Data File Upload (required)

Edugence Instructions

DMAC Instructions

Eduphoria Instructions

**Definitions**

- Raw Score: The number of items that a student answers correctly on a given test.
- Mean: The average, calculated by summing all the values in a data set and dividing by the number of cases.
- Standard Deviation: A measure of how far individual scores typically deviate from the mean.
- Student Performance By Demographics Table:
  - Student Population
  - # Tested: Total number of students included in the report.
  - # Failed/Passed: Defined by the district, as identified in the common assessment data file upload.
  - % Passing: (# Passed) / (# Tested)

- Assessment Frequency Distribution:
  - x-axis: All possible raw scores on the test.
  - y-axis: Total # of students achieving each raw score.
  - Red Plot Line: Average Raw Score achieved on the test.
  - Yellow Plot Line: (Average Standard Score) +/- (Standard Deviation of Test)
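
The summary statistics and the frequency distribution defined above can be sketched as follows. This is an illustrative example only: the function name, the use of the population standard deviation, and the reading of the plot lines as the mean raw score plus or minus one standard deviation are assumptions, not confirmed details of the report.

```python
import statistics
from collections import Counter


def score_summary(raw_scores, n_items):
    """Mean, standard deviation, and frequency distribution of raw scores."""
    mean = statistics.mean(raw_scores)
    # Population SD is assumed here; the report may use the sample SD instead.
    sd = statistics.pstdev(raw_scores)
    counts = Counter(raw_scores)
    # Include every possible raw score (0..n_items), even those nobody earned.
    frequency = {score: counts.get(score, 0) for score in range(n_items + 1)}
    return {
        "mean": mean,
        "sd": sd,
        "frequency": frequency,
        "band": (mean - sd, mean + sd),  # assumed positions of the +/- SD lines
    }


# Hypothetical raw scores for 8 students on a 10-item test.
summary = score_summary([2, 4, 4, 4, 5, 5, 7, 9], n_items=10)
# summary["mean"] is 5 and summary["sd"] is 2.0, so the band is (3.0, 7.0)
```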

- Assessment Reliability Comparison:
  - STAAR/Common Assessment:
    - Reliability Coefficient (Alpha): Measures internal consistency, that is, how closely related a set of test items is as a group. Reliability indicates the precision of test scores, which also reflects the consistency of test results across testing conditions. The degree to which results are consistent is assessed using a reliability coefficient. As a general rule, reliability coefficients from 0.70 to 0.79 are considered adequate, those from 0.80 to 0.89 are considered good, and those at 0.90 or above are considered excellent (2018-2019 Technical Digest).
    - Average Raw Score: The average calculated by summing all raw scores and dividing by the total number of students.
    - Standard Deviation: A measure of how far individual scores typically deviate from the mean.
    - Mean P-Value: Mean percent correct (0 to 100%) for the multiple-choice and gridded items only.
    - Validity: The extent to which test scores help educators make appropriate inferences about student achievement.
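
For readers who want to see how a reliability coefficient of this kind is typically obtained, the sketch below computes Cronbach's alpha from 0/1 item scores using the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The function names and input layout are assumptions for illustration, and the "below adequate" label is this example's own wording for values under 0.70; the source only names the three bands above it.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-student lists of item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # The same variance type must be used for items and totals.
    item_vars = [
        statistics.pvariance([student[i] for student in responses])
        for i in range(k)
    ]
    total_var = statistics.pvariance([sum(student) for student in responses])
    if total_var == 0:
        return None  # undefined when every student has the same raw score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def reliability_band(alpha):
    """Map alpha to the general-rule labels quoted in the definition above."""
    if alpha >= 0.90:
        return "excellent"
    if alpha >= 0.80:
        return "good"
    if alpha >= 0.70:
        return "adequate"
    return "below adequate"  # label chosen for this sketch only


# Hypothetical data: 4 students x 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)   # 0.75
band = reliability_band(alpha)      # "adequate"
```

For dichotomously scored items, this formula gives the same value as KR-20.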

- Question Difficulty (p-Value): The p-value of an item is the proportion of students who answered the item correctly. It serves as a proxy for item difficulty (or, more precisely, item easiness). The higher the p-value, the easier the item; low p-values indicate difficult items.
  - Frequency Distribution:
    - x-axis: Question difficulty (p-Value)
    - y-axis: Item count in each question difficulty (p-Value) range

- Question Quality (point-biserial correlation): The correlation between the right/wrong scores that students receive on a given item and the total scores they receive when their scores are summed across the remaining items. A low or negative point-biserial suggests the item does not distinguish well between higher- and lower-scoring students. A negative value means that students who answered the item incorrectly tended to score high on the test, while students who answered it correctly tended to score low overall.
  - Frequency Distribution:
    - x-axis: Question Quality (point-biserial correlation)
    - y-axis: Item count in each question quality range


**Filters**

Single-select:

- Year
- Test Code
- Questions (Question Analysis)
- Order of Questions (Question Analysis)
- Ascending/Descending (Question Analysis)