Because assessment development at Lectica is driven by the principle that the first role of educational assessment should always be learning, we've built an assessment technology that's different from the technologies used in conventional standardized tests. Here's a side-by-side comparison.
Our tests are performance assessments. They are made up of open-ended, written-response items without "correct" answers. We're interested in the depth of understanding and level of skill, not correctness. We want to know how test-takers work with their knowledge: which elements of a situation they take into account and the level at which they understand and coordinate those elements. This is why we focus on the way students explain or justify their answers.
Multiple choice items with right and wrong answers are still the most common item type in conventional assessments. Test-takers can perform well on these items even when they don't have any real understanding of targeted content. Even the written-response items on conventional standardized assessments are often marked for correctness. In other words, conventional tests primarily focus on the accuracy of students' judgments as opposed to their justifications.
Our tests reward the kind of teaching that builds knowledge and skills that are not only "sticky" and usable, but also integrated into students' existing knowledge networks well enough to provide a solid foundation for future learning.
Conventional assessments focus on correctness. They reward students for the number of procedures, facts, and rules they have learned. They support the kind of teaching that helps students memorize large amounts of information quickly.
Our tests reward and support the kind of teaching that amplifies learning by fostering flow. They do this by helping teachers get the difficulty of learning challenges just right.
Conventional assessments focus on correctness. They reward teaching through transmission by placing an emphasis on the quantity of material covered.
We replace one or two high-stakes summative tests with several low-stakes formative tests that work together to tell a rich (summative) story about student learning. Scores are calibrated to the Lectical Scale and associated with learning resources and suggestions that are tailored to students' specific needs.
Conventional summative tests are designed to provide a sense of how much students' test scores have improved in a broad subject area during a specified period of time. They yield a score that can be used to rank students and to identify general areas of weakness or strength. They do not provide readily actionable information about how to overcome weaknesses.
All of our assessments are calibrated to a well-validated, theory-based, domain-general scale called the Lectical® Scale. Scores are a measure of the level of skill and understanding.
The scores students receive on conventional assessments are arbitrary. Scores are determined psychometrically, based on how test-takers perform on items, that is, on the items' relative difficulty.
The way to get a higher score on one of our assessments is to demonstrate deeper understanding. Because we're not looking for specific answers, there are many pathways to a given score. Additionally, our technology allows us to cover a full range of disciplines and big ideas, supporting a diversity of curricula. So, although our assessments are standardized, they won't homogenize education.
The way to get a higher score on a conventional standardized test is to get more right answers. And the standardized tests that matter most tend to cover a narrow range of subject matter, generally science, math, and literacy. There have even been efforts to mandate that all students take the exact same tests, stifling diversity and contributing to the large-scale homogenization of education.
To build an assessment, we work with educators to decide which constructs the assessment will target. Then, we conduct extensive primary research into how these constructs develop relative to a theoretically derived developmental scale (usually from grade 1 to grade 12 or higher). We then use these results to (1) calibrate our electronic scoring system, (2) describe how understanding of and skill for working with each construct develops over time, and (3) design learning activities tailored for learners performing at each level on the developmental scale.
To build an assessment, conventional test developers create a set of items based on expert and/or political opinion, then compile the items into a test. They then use item response theory (IRT) to (1) test the items and weed out those that don't "play well" with other items, and (2) place item difficulties and student performances on the same scale. This scale becomes their "learning scale." Statistics determine how many different levels of knowledge are on the scale, then information about which items students are likely to get right at each level is used to create feedback about what a given student should learn next.
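For readers unfamiliar with IRT, here is a minimal sketch of the idea of placing item difficulties and student performances on one scale. It assumes the simplest IRT variant, the one-parameter (Rasch) model, which may not be the model any particular test developer uses. The function names, item names, and numeric values are all hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a right answer under the Rasch (one-parameter) IRT model.

    Ability and difficulty sit on the same scale, so only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def likely_correct_items(ability, item_difficulties, threshold=0.5):
    """Names of the items a student at this ability is likely to get right."""
    return [
        name
        for name, difficulty in item_difficulties.items()
        if rasch_probability(ability, difficulty) >= threshold
    ]


# Hypothetical item difficulties on an arbitrary scale (assumed values).
item_difficulties = {"item_a": -1.0, "item_b": 0.0, "item_c": 1.5}

# A student at ability 0.5 is likely to get item_a and item_b right, not item_c.
print(likely_correct_items(0.5, item_difficulties))
```

In a scheme like this, feedback about "what to learn next" amounts to pointing at the items just above the student's current position on the scale, which is why it says little about how the underlying understanding develops.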
We build items, but it's just a tiny part of what we do. Our major focus is on building the Lectical Dictionary, which houses the growing body of knowledge we need to make richly educational diagnostic assessments and populate test-taker reports with formative feedback and targeted learning activities. The cost of developing new assessments decreases as the Lectical Dictionary grows.
Building knowledge about learning is not a major goal for most assessment developers. Building and storing items, hundreds of them, is. Traditional test developers need to continuously create items because the items must be rotated and eventually retired to prevent cheating. The cost of item development increases as item types become more complex in response to demands for more open-ended, open-response questions.
Our approach is to make numerous small subject-specific formative assessments that are woven into the curriculum. Over time, patterns of performance on a diversity of DiscoTests or LectaTests weave a rich story about learning. These patterns of performance become a valuable source of evidence that can inform big decisions.
Most assessment developers make tests that are designed primarily for purposes of accountability, evaluation, or selection. Taking them is often associated with hours of test prep and high levels of anxiety, both of which interfere with learning. Worse, big decisions are made based on little evidence—often only one or two tests each year.
By replacing one or two high-stakes summative tests with many low-stakes direct tests that are all calibrated to the same metric, we make cheating pointless. If a student cheats well enough to get an unusually high score, it will be obvious and have little effect on the interpretation of her growth trajectory.
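One way to picture this claim is with a robust trend estimate over a series of scores. The sketch below is illustrative only and is not a description of Lectica's scoring technology. It assumes a Theil-Sen (median of pairwise slopes) trend, and the function names, score values, and the `min_spread` and `tolerance` parameters are hypothetical.

```python
from statistics import median


def theil_sen_trend(scores):
    """Robust growth trend: median of pairwise slopes, then a median intercept."""
    n = len(scores)
    slopes = [
        (scores[j] - scores[i]) / (j - i)
        for i in range(n)
        for j in range(i + 1, n)
    ]
    slope = median(slopes)
    intercept = median(s - slope * t for t, s in enumerate(scores))
    return slope, intercept


def flag_unusual_scores(scores, tolerance=3.0, min_spread=0.05):
    """Indices of scores that sit far from the student's own growth trend."""
    slope, intercept = theil_sen_trend(scores)
    residuals = [s - (intercept + slope * t) for t, s in enumerate(scores)]
    # Typical deviation from the trend, floored so rounding noise is never flagged.
    spread = max(median(abs(r) for r in residuals), min_spread)
    return [t for t, r in enumerate(residuals) if abs(r) > tolerance * spread]


# Six hypothetical low-stakes scores; the fourth is inflated.
scores = [10.0, 10.1, 10.2, 11.5, 10.4, 10.5]
slope, _ = theil_sen_trend(scores)
print(round(slope, 3))              # estimated growth per assessment
print(flag_unusual_scores(scores))  # positions of scores that stand out
```

With many scores on one metric, a single inflated result stands out against the student's own trend and barely moves the estimated growth rate.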
High-stakes testing encourages cheating by making students, educators, and decision-makers think of academics as a contest in which success equals good test performance. The bar is set so unrealistically high for disadvantaged populations or students with less native ability that the only way to "win" is to cheat.