Evidence base

Lectical Assessments are designed to leverage the natural learning cycle. When learners take Lectical Assessments, they:

demonstrate the level of their ability to work with real-world issues and problems that pose adaptive challenges;
receive feedback directly on skills targeted by their assessment; and
are provided with skill-specific reflective activities that are tailored to their personal learning needs.

During the last several years, we have worked with numerous clients to study the impact of Lectical Assessments on learning, and there are a number of findings we think you should know about.

The benefits of Lectical Assessments

Formative use increases growth

Using Lectical Assessments as intended—as learning tools—increases learning. On average, when Lectical Assessments have been used as learning tools, they have doubled the amount of growth that usually takes place in an educational program.

Using more increases growth

When Lectical Assessments have been used as learning tools in educational programs, simply increasing the number of Lectical Assessments taken by participants increased learning.

Their feedback supports growth

Educational programs that have incorporated the reflective activities suggested in Lectical Assessment reports have produced more (and more robust) growth than those that have not.

Informal use supports growth

Simply following the learning suggestions in individual reports and reflecting with peers on one's experience has been shown to produce as much growth as the average formal education program, or more.

They predict change in behavior

The direct reports and peers of leaders who increase their scores on Lectical Assessments have reported improvement in those leaders' performance (relative to leaders whose scores have not increased).

They aid in program development

When Lectical Assessments are used in evaluations of educational programs, they equip organizations with knowledge that can help them increase program impact.

They are remarkably fair

Lectica's breakthrough developmental scoring system, CLAS, has been shown to be free of gender, racial, and ethnic bias. To learn more, see "Are Lectical Assessments fair?"

They support optimal learning

Our assessment reports help people learn to build knowledge and skills more efficiently and enjoyably while participating in everyday activities.

They have a wide range of applications

Lectical Assessments do more than support optimal learning. They can also be used in high-stakes contexts such as recruitment and admissions.

Validity

When thinking about the validity of our assessments, we focus primarily on construct validity**. Construct validity is usually thought of as the degree to which assessments measure what they are designed to measure. Because our assessments do much more than provide a score, we think of construct validity more broadly, as the degree to which our assessments accomplish what they are designed to accomplish. When we design and deliver assessments, we consider (1) the extent to which their scores capture the Lectical™ dimension (the skill level of the performance); (2) how well they target the domain or topic of interest; (3) their relevance, particularly the relevance of their feedback; and (4) their utility, particularly the value of their feedback.

Figure 1: Validity considerations for Lectical Assessments and DiscoTests

Bias

We do several things to address potential bias.

We collect demographic information, including ethnicity, from all test-takers who are willing to provide it, and periodically use this information to look for any trends that might indicate bias.
We score blind to ethnicity and all other demographic variables.
We build the developmental dictionary (the Lectical Dictionary) behind our electronic scoring system by mining assessment responses for evidence of new meanings or new ways of expressing the same meaning. This means that different ways of expressing similar ideas are continuously being captured and accounted for.

Reliability

We track two forms of reliability: (1) internal consistency, which we examine with Rasch modeling software, and (2) inter-rater reliability.

Figure 2: Maintaining the reliability of our scores

Test-retest results

Test-retest studies of assessments scored with the Lectical Assessment system show no growth, on average, from test time 1 to test time 2 (when test-takers do not receive one of our formative reports).

Research

We have been conducting research on (and with) our assessments for several years. Some of this research has been published in peer-reviewed journals. Other research is documented in reports. The following section shows how some of our publications and reports relate to various aspects of reliability and validity.

Convergent validity	References
Psychometric modeling and qualitative analyses reveal that the LAS and several longitudinally validated domain-specific cognitive developmental assessment systems assess the same dimension of performance.	Dawson, T. L. (2000). Moral reasoning and evaluative reasoning about the good life. Journal of Applied Measurement, 1, 372-397.
	Dawson, T. L. (2002). A comparison of three developmental stage scoring systems. Journal of Applied Measurement, 3, 146-189.
	Dawson, T. L. (2003). A stage is a stage is a stage: A direct comparison of two scoring systems. Journal of Genetic Psychology, 164, 335-364.
	Dawson, T. L., & Gabrielian, S. (2003). Developing conceptions of authority and contract across the life-span: Two perspectives. Developmental Review, 23, 162-218.
	Dawson, T. L., Xie, Y., & Wilson, M. (2003). Domain-general and domain-specific developmental assessments: Do they measure the same thing? Cognitive Development, 18, 61-78.
	Dawson, T. L. (2004). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11, 71-85.

Predictive validity 1

Evidence

LectaTests are designed to target real-world skills—skills that make us better at what we do at work and in our personal lives. If we are doing a good job, working with our assessments should support behavioral change.

In a preliminary analysis of Clear Impact's ambitious 40-hour, 9-month leadership training initiative involving four levels of management in a large North American city, we examined the effects of embedding up to 8 LectaTests (including pre- and post-LDMAs) on managers' growth and collaborative behavior. Most of the results reported here are restricted to the LDMA data of supervisors who (1) completed pre- and post-LDMAs and (2) had two or more supervisees who had completed pre- and post-LDMAs.

Regression results	n	r	p
The number of LectaTests taken by supervisors predicts their own Lectical growth		.19	.01
Greater Lectical growth of supervisors predicts higher 360 scores from direct reports	9	.59	.09
Higher 360 scores for supervisors predict higher average Lectical growth of direct reports	10	.60	.07
Higher average Lectical growth of direct reports predicts higher 360 scores from peers	10	.46	.18

Predictive validity 2
Upper-level managers, on average, have higher-level decision-making skills than lower-level managers.

Unidimensionality	References
Rasch modeling shows that the LAS captures a robust dimension of performance.	Dawson-Tunik, T. L. (2004). A good education is: The development of evaluative thought across the life-span. Genetic, Social, and General Psychology Monographs, 130, 4-112.
	Dawson-Tunik, T. L., Commons, M., Wilson, M., & Fischer, K. (2005). The shape of development. The European Journal of Developmental Psychology, 2, 163-196.

Transformational learning	References
Rasch modeling shows that development along the latent dimension measured by the LAS is wave-like, a pattern that is consistent with the cognitive developmental postulate that development is characterized by a series of nested, hierarchical reorganizations of knowledge structures (rather than the simple accumulation of knowledge).	Xie, Y., & Dawson, T. L. (2006). Multidimensional models in a developmental context. In M. Garner, G. Engelhard, M. Wilson & W. Fisher (Eds.), Advances in Rasch Measurement: JAM Press.
	Dawson-Tunik, T. L. (2004). A good education is: The development of evaluative thought across the life-span. Genetic, Social, and General Psychology Monographs, 130, 4-112.
	Dawson-Tunik, T. L., Commons, M., Wilson, M., & Fischer, K. (2005). The shape of development. The European Journal of Developmental Psychology, 2, 163-196.
	Dawson-Tunik, T. L. (2005, June). Cognitive change is stage-like: The cumulative evidence from a decade of Rasch modeling. Paper presented at the Annual Meeting of the Jean Piaget Society, Vancouver.
	Dawson, T. L. (2006). Stage-like patterns in the development of conceptions of energy. In X. Liu & W. Boone (Eds.), Applications of Rasch measurement in science education (pp. 111-136). Maple Grove, MN: JAM Press.

Internal consistency	References
The internal consistency of the LAS has historically been above .90. (As of 2009, we are maintaining alphas of .95 and above. In general, reliability studies show that we can have confidence in Lectical scores to within 1/4 to 1/5 of a level, which means we can detect 4-7 distinct phases of performance within a typical classroom.)	Dawson, T. L. (2000). Moral reasoning and evaluative reasoning about the good life. Journal of Applied Measurement, 1, 372-397.
	Dawson, T. L. (2002). A comparison of three developmental stage scoring systems. Journal of Applied Measurement, 3, 146-189.
	Dawson, T. L., Xie, Y., & Wilson, M. (2003). Domain-general and domain-specific developmental assessments: Do they measure the same thing? Cognitive Development, 18, 61-78.
	Dawson-Tunik, T. L. (2004). A good education is: The development of evaluative thought across the life-span. Genetic, Social, and General Psychology Monographs, 130, 4-112.
	Dawson-Tunik, T. L., Commons, M., Wilson, M., & Fischer, K. (2005). The shape of development. The European Journal of Developmental Psychology, 2, 163-196.

Inter-rater reliability	References
Inter-rater reliability for the LAS has consistently been above 85% agreement within 1/3 of a Lectical level. (As of 2014, we maintain an inter-rater agreement rate of 85% within 1/5 of a level.)	Dawson-Tunik, T. L. (2004). A good education is: The development of evaluative thought across the life-span. Genetic, Social, and General Psychology Monographs, 130, 4-112.
	Dawson-Tunik, T. L., Commons, M., Wilson, M., & Fischer, K. (2005). The shape of development. The European Journal of Developmental Psychology, 2, 163-196.
	Dawson, T. L. (2006). Stage-like patterns in the development of conceptions of energy. In X. Liu & W. Boone (Eds.), Applications of Rasch measurement in science education (pp. 111-136). Maple Grove, MN: JAM Press.
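
The agreement rates above count two scores as agreeing when they fall within a fixed fraction of a level of each other. The following is a minimal sketch of that calculation, assuming two raters score the same set of performances. The function name and the scores are hypothetical; this is not Lectica's scoring code or data.

```python
def agreement_rate(rater_a, rater_b, tolerance):
    """Share of performances where the two raters' scores differ by at most `tolerance`."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    eps = 1e-9  # guards against floating-point error at the exact boundary
    hits = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance + eps)
    return hits / len(rater_a)

# Hypothetical Lectical scores from two raters, for illustration only.
rater_a = [10.9, 11.2, 11.5, 11.0, 11.3]
rater_b = [11.0, 11.2, 11.1, 11.1, 11.4]

rate = agreement_rate(rater_a, rater_b, tolerance=1 / 5)
print(f"agreement within 1/5 of a level: {rate:.0%}")
```

In this made-up sample, four of the five pairs differ by no more than 1/5 of a level; the third pair differs by 0.4 and counts as a disagreement.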

Statistical reliability

Alphas and variance explained by factor 1 (hierarchical complexity)

In Rasch analyses of the assessments developed by DTS, the lectical dimension (hierarchical complexity) consistently explains 79–99% of the variance in performances.

Test	Project	N	item no.	Alpha	Variance explained
LDMA	CI	1099	5	.968	88.73%
FOLA	DTS	61	2	.894	91.44%
LMBE	LMBE	13	5	.984	95.13%
LLRA	LLRA	54	5	.988	95.47%
LRJA	LRJA	224	7	.958	79.89%
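
The Alpha column is an internal-consistency coefficient. Assuming it is Cronbach's alpha (the source does not name the formula), it can be computed from a respondents-by-items score matrix as sketched below. The function name and the data are hypothetical and serve only as illustration.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical item scores (5 respondents x 3 items), for illustration only.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.3f}")
```

Alpha approaches 1 when items rise and fall together across respondents, which is the pattern the high values in the table suggest.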

Evaluation studies	References
Lectical Assessments have been used in a number of evaluation studies. They have been shown to capture learning over relatively short interventions and in small cohorts. (See table below for results of several paired samples t-tests.)	Dawson-Tunik, T. L. & Stein, Z. (2004, July). Critical Thinking Seminar pre and post assessment results. Hatfield, MA: Developmental Testing Service, Inc.
	Dawson, T. L., & Stein, Z. (2006). National decision-making curriculum. Results of the pre- and post-instruction developmental assessments. Northampton, MA: Developmental Testing Service.
	Dawson, T. L., & Stein, Z. (2006). Mind Brain & Education study: Final report. Northampton, MA: Developmental Testing Service, Inc.

Impact of VCoL-rich curricula

This table shows growth during one VCoL-rich leadership training program. Total class time was 40 hours. (Upper = upper-level managers, Middle = mid-level managers, and Supers = supervisors.)

Scale averages	Overall (114)		Upper (7)		Middle (43)		Supers (57)
	Pre	Post	Pre	Post	Pre	Post	Pre	Post
Lectical score	11.3	11.5	11.5	11.6	11.4	11.6	11.2	11.4
Perspective-taking	22	38	30	35	25	41	19	36
Perspective-seeking	8	17	8	20	10	19	6	15
Perspective coordination	30	59	27	66	31	63	28	55
Collaborative capacity	34	56	33	66	37	59	31	52
Contextual thinking	31	52	33	65	33	56	29	48
Decision-making process	28	54	32	57	28	60	26	50

The table below shows average growth for several projects that included pre- and post-assessments. Some of these programs featured VCoL-rich curricula; others did not. The average growth for the 3 VCoL-rich programs was .21, whereas the average growth for the 7 more conventional content-based programs was .083.

Study	N	Interval	Program length (hrs)	VCoL-rich	Mean growth
IT 2004, LDMA	40	6 mos	60	No	0.06
IT 2005, LDMA	32	6 mos	60	Yes	0.27
MH 2010, LRJA	43	13 mos	42	No	0.13
AU 2010, LDMA	28	12 mos	43	No	0.09
ZV 2012, LDMA	18	4 mos	40	Yes	0.18
NA 2012, LDMA	24	1 mos	40	No	0.03 n/s
NA1 2013, LDMA	16	4 mos	40	No	0.05 n/s
NA2 2013, LDMA	19	4 mos	40	No	0.07
ST 2012, LDMA	27	6 mos	40	No	0.15
CI 2013, LDMA	512	9 mos	40	Yes	0.18
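
The two group averages can be checked directly from the table. The sketch below assumes an unweighted mean across programs (the source does not say whether the averages were weighted by N). The growth values are copied from the table; the variable and function names are illustrative.

```python
from statistics import mean

# (study, VCoL-rich?, mean growth), copied from the table above
programs = [
    ("IT 2004", False, 0.06),
    ("IT 2005", True, 0.27),
    ("MH 2010", False, 0.13),
    ("AU 2010", False, 0.09),
    ("ZV 2012", True, 0.18),
    ("NA 2012", False, 0.03),
    ("NA1 2013", False, 0.05),
    ("NA2 2013", False, 0.07),
    ("ST 2012", False, 0.15),
    ("CI 2013", True, 0.18),
]

def average_growth(rows, vcol_rich):
    """Unweighted mean of program-level growth for one group of programs."""
    return mean(growth for _, rich, growth in rows if rich == vcol_rich)

vcol_mean = average_growth(programs, True)
conventional_mean = average_growth(programs, False)
print(f"VCoL-rich (3 programs):    {vcol_mean:.3f}")
print(f"Conventional (7 programs): {conventional_mean:.3f}")
```

Under this assumption, the 3 programs marked "Yes" average .21 and the 7 marked "No" average about .083.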

Detecting change over short periods in small samples

In the table below, paired-samples t-tests show levels of detectable growth in several program evaluations that were conducted with Lectical Assessments. They demonstrate that our measures can detect growth as small as .05 of a level in an average-sized classroom (NA1 2013, LDMA). Moreover, they show that measurable growth can occur with minimal instruction in a well-designed program. For example, the individuals in CI 2012 met only 4 times over the course of 3 months. Finally, results reveal less growth in programs that provide few opportunities for VCoL-style learning. The course taken in the NA 2012 project did not include any reflective activity. Subsequent NA studies deliberately incorporated increasing amounts of reflective activity.

Study	N	Interval	DF	Mean time 1	Mean time 2	t2 - t1	t	p
IT 2005, LDMA	32	6 mos	31	10.98	11.17	0.27	7.05	.001
CI 2012, LDMA	31	3 mos	30	11.24	11.30	0.06	2.01	.053
ST 2012, LDMA	27	3 mos	26	11.18	11.27	0.09	2.64	.014
AU 2010, LDMA	44	12 mos	43	10.92	11.08	0.16	2.19	.034
CI 2013, LDMA	185	9 mos	184	11.31	11.49	0.17	14.39	.001
AU 2011, LDMA	57	12 mos	56	11.24	11.28	0.04	1.50	.140
AU 2011, LDMA	38	12 mos	37	11.25	11.32	0.07	2.28	.030
ZV 2012, LDMA	18	4 mos	17	11.26	11.44	0.18	5.91	.001
NA 2012, LDMA	24	1-3 mos	23	11.25	11.28	0.03	1.41	.170
NA1 2013, LDMA	16	3 mos	15	11.24	11.29	0.05	3.30	.001
NA2 2013, LDMA	19	3 mos	18	11.23	11.30	0.07	3.63	.001
MH 2010, LRJA	43	13 mos	42	11.32	11.19	0.13	2.22	.031
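
For readers unfamiliar with the test used in this table, a paired-samples t-test divides the mean of the per-person differences by its standard error, with N - 1 degrees of freedom. The sketch below uses only the Python standard library. The scores are hypothetical and are not taken from any of the studies listed; the function name is illustrative.

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Return (mean difference, t statistic, degrees of freedom) for paired scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = mean(diffs)
    # Standard error of the mean difference (sample standard deviation / sqrt(n))
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / se, len(diffs) - 1

# Hypothetical time 1 and time 2 Lectical scores, for illustration only.
pre = [11.0, 11.2, 11.1, 11.3, 11.2]
post = [11.1, 11.3, 11.1, 11.5, 11.3]

mean_diff, t, df = paired_t(pre, post)
print(f"mean growth = {mean_diff:.2f}, t({df}) = {t:.2f}")
```

Because each person serves as their own control, the test can register small average gains when individual differences are consistent, which is why small cohorts can still yield significant results.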

*All Lectical assessments meet or exceed the validity and reliability standards for educational and psychological testing set jointly by the APA, AERA, and NCME.

**See: Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35(11), 1012-1027.
