INTERNATIONAL LARGE-SCALE ASSESSMENTS IN EDUCATION:
A BRIEF GUIDE
COMPASS
BRIEFS IN EDUCATION
NUMBER 10 SEPTEMBER 2020
SUMMARY
Internaonal large-scale assessments (ILSAs) are one
of the most important tools policymakers and other
educaonal stakeholders have to inform evidence-
based decision making for educaonal reform. Despite
this, and the widespread use of ILSA data, results are
somemes misunderstood or misinterpreted. Here, we
oer a brief guide to ILSAs and illuminate some of the
important dierences and commonalies within and
across studies, limitaons, and why they remain one
of our most signicant tools for educaon evaluaon
and reform. We focus on and compare the key studies,
approaches, and structure of our own organizaon,
the Internaonal Associaon for the Evaluaon of
Educaonal Achievement (IEA), with other ILSAs.
IMPLICATIONS
IEA conducted the first ILSA study in 1958, but the mandates
for ILSAs have changed over their history and new mandates
are constantly developing. Most recently, focus has shifted
to educational outcomes rather than inputs. ILSA results
should be understood in the context of their changing remit.
There are substantial differences between organizations
and approaches to assessments. From the review process to
decision-making and fees, fundamental differences should
be accounted for when interpreting results.
Key ILSAs do share a common methodology marked by high-
quality standards carefully defined for each step in the ILSA
process; further, the data are accompanied by detailed
supporting technical reports to assist in interpreting and
reporting results.
Limitaons of ILSAs need to be acknowledged when
understanding and reporng results. In parcular, care must
be taken when considering results from ILSAs as a holisc
quality measure of the educaon system.
Despite limitaons, the assessments are unique, monitoring
systems over me within a robust internaonal framework,
being largely independent of any single polical system,
and with data freely available to the public. When properly
understood and analyzed, the data from ILSAs provide
valuable opportunies to help inform policy decisions and
research.
Internaonal Associaon for the Evaluaon of Educaonal
Achievement (IEA), Amsterdam.
Website: www.iea.nl
Follow us:
@iea_educaon
IEAResearchInEducaon
IEA
The most well-known ILSAs are the core studies of the International Association for the Evaluation of
Educational Achievement (IEA) and the Organisation for Economic Co-operation and Development (OECD).
INTRODUCTION
Internaonal large-scale assessments (ILSAs) of educaon
are empirical studies that assess educaonal abilies around
the world. The data are used in various ways to help inform
policymakers, educaonal researchers, and the general public.
However, despite the widespread use of ILSA data, how to
interpret and report results is oen misunderstood. Dierent
study results are somemes reported or, interpreted to mean
the same thing, yet there exists important dierences that need
to be accounted for when using results. As leaders in the eld of
internaonal large-scale assessment, our intenon for this brief
is to provide context for a beer understanding of ILSA results
and; how they should be interpreted and reported—we discuss
what ILSAs are, the history of their development, dierences
as well as commonalies in approaches, organizaon, and
methodology, and important limitaons—and to express why we
believe that ILSAs are unique, important, and relevant tools for
understanding educaonal systems and student achievement
around the world, and informing evidence-based change.
WHAT ARE ILSAs?
ILSAs assess student achievement in specific disciplines and
provide context for the results by collecting additional data
at the student level. Further contextual details at the teacher,
principal, and/or system levels may also be collected. To
provide statistically valid results, a representative sample of
schools (usually around 150 to 200 schools) is drawn from
each participating country or education system, and a group of
students is randomly drawn from within each of the sampled
schools, either by sampling entire classrooms or by sampling
students across classrooms.
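This two-stage design (schools first, then students within each sampled school) can be sketched in a few lines of code. The population, school names, and sample sizes below are invented for illustration; operational ILSA sampling is more elaborate (for example, stratified, probability-proportional-to-size school selection), so treat this only as a conceptual sketch.

```python
import random

def two_stage_sample(schools, n_schools=150, students_per_school=30, seed=42):
    """Illustrative two-stage sample: first draw schools, then draw
    students within each sampled school.

    `schools` maps a school id to its list of student ids. This is a
    simple random draw at both stages, not an operational ILSA design.
    """
    rng = random.Random(seed)
    sampled_schools = rng.sample(sorted(schools), min(n_schools, len(schools)))
    sample = {}
    for school in sampled_schools:
        students = schools[school]
        k = min(students_per_school, len(students))
        sample[school] = rng.sample(students, k)  # students across classrooms
    return sample

# Toy population: 500 schools with 40 to 120 students each
population = {f"school_{i}": [f"s{i}_{j}" for j in range(40 + i % 81)]
              for i in range(500)}
selected = two_stage_sample(population)
```

Sampling entire classrooms instead (as TIMSS does at the school level) would replace the second stage with a random draw over classroom lists rather than over individual students.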
BRIEF HISTORY
The rst ILSA, IEA’s Pilot Twelve-Country Study (Foshay et al.
1962), was launched by a group of researchers in 1958 (Husén
1983; IEA 2018). Scholars from various disciplines met at the
UNESCO Instute for Educaon in Hamburg (Germany) and
decided to launch a then exploratory study to test whether it
was possible to compare learning outcomes across a range of
dierent countries and cultures. They chose to assess student
achievement in mathemacs, assuming that it would be easiest
to translate into dierent languages and was thus, more likely
to result in valid comparisons across countries. Their aim
was to nd out what could be learned through internaonal
assessment, with the hope that countries could learn from
each other. As Torsten Husén phrased it: “In general terms,
internaonal studies such as this one can enable educaonalists
(and ulmately those responsible for educaonal planning and
policy making) to benet from the educaonal experiences
of other countries. It helps educaonalists to view their own
system of educaon more objecvely because for the rst me
many of the variables related to educaonal achievement had to
be quaned in a standardized way” (Husén 1967, pp. 13-14).
While the rst ILSAs were conducted by researchers with
quite minimal resourcing to sasfy academic interests in
invesgang educaon, by 1990 educaonal policymakers
had begun to realize that ILSAs could potenally provide useful
evidence-based data. However, their nancial support for ILSAs
demanded rapid outcomes. Where the rst academic study
reports were somemes launched up to eight years aer the
study was conducted (Anderson et al. 1989), this wider interest
led to a new pressure to publish results as soon as possible. An
addional consequence was an interest in measuring trends
in educaon systems, a challenge that IEA rose to meet with
TIMSS, rst conducted in 1995 and followed up by a second
Source: OECD 2019; IEA 2019
IEA Internaonal Computer and
Informaon Literacy Study (ICILS)
OECD Programme for Internaonal
Student Assessment (PISA)
IEA Trends in Internaonal Mathemacs
and Science Study (TIMSS)
IEA Progress in Internaonal
Reading Literacy Study (PIRLS)
IEA Internaonal Civic and
Cizenship Educaon Study (ICCS)
3
trend assessment four years later in 1999. The four year gap
was chosen because TIMSS assessed grade four and grade
eight students in 1995; in 1999, only grade eight students were
assessed, the aim being to compare two cohorts of grade eight
students’ results, and assess the same cohort of grade four
students four years later.
In the 1990s there was also an emerging international interest
in strong educational data to better understand economic
growth. As such, the OECD, one of the leading international
economic organizations, placed a greater emphasis on
education and the measurement of educational outcomes, and
launched its own assessment study, PISA, in 2000. Previously,
the OECD had only used other sources (including IEA studies
and its annual publication Education at a Glance) for its
educational work. However, the organization decided to focus
on the skills needed to operate in a modern economy rather
than on assessing what schools were teaching. Specifically, PISA
claims to assess what the OECD believes 15-year-old students
enrolled in school should know in reading, mathematics, and
science literacy; the assessment is conducted every three
years. Originally, the study focused on OECD countries, but an
increasing number of non-OECD countries now take part in the
assessment.
Toward the end of the 1990s and into the first decade of
the 2000s, regional ILSAs were initiated, including LLECE
(Laboratorio Latinoamericano de Evaluación de la Calidad de la
Educación) in Latin America, PASEC (Programme d'Analyse
des Systèmes Educatifs de la CONFEMEN), and SACMEQ
(the Southern and Eastern Africa Consortium for Monitoring
Educational Quality).
More recently, since 2015 when the Sustainable Development
Goals (SDGs) were declared by the UN, a new emphasis for
assessments has emerged. In contrast to the Millennium
Development Goals, the SDGs focus on the educational
outcomes of education systems rather than on inputs like
expenditure on education. As a result, all countries are urged
to report on the percentage of students reaching minimum
proficiency levels, which clearly constitutes a new challenge for
ILSAs. In this regard, IEA is playing an active role through the
implementation of the Rosetta Stone Project¹, in collaboration
with the UNESCO Institute for Statistics, LLECE, and PASEC.
The objective is to develop a concordance table that translates
scores from regional mathematics and reading assessments
onto the TIMSS and PIRLS scales.
DIFFERENT APPROACHES =
DIFFERENT ASSESSMENTS
Although the best-known ILSAs feature a number of similarities,
there are also some substantial differences that need to be
considered when comparing the results for different educational
systems. In the following table we have compiled some notable
differences that test consumers need to be aware of when
comparing the results of studies conducted by IEA and the OECD.
1. hps://www.iea.nl/studies/addionalstudies/rosea
Study philosophy
  IEA studies: Seeks to measure what is taught in schools and the
  contexts of learning.
  PISA: Seeks to measure selected acquired skills of students towards
  the end of their compulsory education.

Content selection
  IEA studies: The curricula of the participating countries are
  analyzed. Participating countries then jointly develop the assessment
  framework and test materials to ensure national interests are
  acknowledged.
  PISA: OECD-selected experts determine the skills that they think
  students should have mastered for use later in life and assemble the
  study instruments accordingly.

Cohort selection
  IEA studies: Samples are grade based (grades four, eight, or twelve)
  to reflect the structure of curricula and to establish a direct link
  to the subject teachers.
  PISA: Samples are age based (15-year-olds for PISA) to assess a
  generation, independently of school pathways or grade distribution.

Table 1: Differences in approaches between IEA studies and PISA
COMMON METHODOLOGIES
Despite dierences in organizaon and approach, ILSAs
share a quite similar set of procedures and methods for
implementaon that have been developed and rened over
the last 60 years, and have contributed to methodological
advances in regional ILSAs and naonal assessment programs
in many countries. Drawing from experse from around
the world, all major ILSAs mandate high-quality standards
carefully dened to achieve each step in the ILSA process
including: sampling rules, translaon processes, eld trial
procedures, and psychometric modelling. These methods
may appear quite complex to those unfamiliar with ILSAs,
but, for transparency, the data are accompanied by detailed
supporng technical reports and all ILSA data is made freely
available online for researchers to download. Further, both
IEA and OECD produce a large number of resources to help
promote the secondary analyses of the data. These secondary
analyses are possible and made rich by the presence of
contextual data. A recent example is ICILS where contextual
data helped to inform student digital literacy.
DIFFERENT ORGANIZATION
Beyond the studies themselves, the two organizations,
IEA and the OECD, differ in many regards (see Table 2).
Most fundamentally, their missions are not equivalent when it
comes to the relationship between results and policymaking.
The scope of the OECD is larger than the field of education,
being originally grounded in the economic sector, and one OECD
mission is to give its members recommendations on the
policies to be implemented. IEA, as an association grounded
in academic research in education, has no mandate to draft
recommendations to its members but rather provides
evidence on which each individual country can build adequate
policies for its own context. The following table
describes some of the organizational differences between
IEA and the OECD.
Type of organization
  IEA: Non-governmental association (with national members).
  OECD: Governmental organization.

Method of carrying out ILSAs
  IEA: Conducts studies cooperatively with partner organizations from
  its scientific network.
  OECD: Initiates studies and tenders the studies for each cycle.
  Separates out different tasks (e.g., framework development, sampling,
  test platform).

Review process
  IEA: All study publications are rigorously peer-reviewed.
  OECD: Written and reviewed in-house and reviewed by board members of
  participating countries.

Participation
  IEA: Open to all countries. No requirement to take part in studies.
  OECD: The flagship PISA study was originally targeted at OECD member
  countries and is now open to all economies.

Decision making
  IEA: Final content decided by the National Research Coordinators
  (NRCs) of participating countries, each of which has an equal voice.
  OECD: The organization structure reflects the original centrality of
  the OECD membership, and voting is restricted to OECD members and
  select partner economies.

Fees
  IEA: Each country makes an equal contribution to the international
  coordination costs.
  OECD: Member countries pay different contributions based on their
  GDP. Non-member participants pay a flat fee.

Table 2: Organizational differences between IEA and OECD
LIMITATIONS
There are some important limitations to consider when
understanding and reporting ILSA results. Firstly, all ILSAs
are cross-sectional studies: they measure educational
outcomes at one point in time for a specific population.
Further, although some ILSAs such as TIMSS, PIRLS, and PISA
report trends over time, they are not studies that follow
individual students. This makes it challenging, if not unfeasible,
to draw causal conclusions about student achievement, and it
is a clear limit of the data for the many policymakers seeking
specific recommendations for change to help improve their
educational systems. Although some researchers seek to use
advanced models to establish causal relationships, we
maintain that such models rest on strong assumptions
that are difficult, if not impossible, to satisfy with ILSA data.
A second limitaon of ILSAs is that the studies are not
designed to measure individual students’ achievement nor
the results of individual schools but to reect the educaonal
results and relaonship with background informaon within
educaon systems. As such, the assessments are considered
low stakes for schools, teachers, and students. Nevertheless,
the low stakes aspect of the assessment has the advantage
of lowering tesng stress on students and schools. In fact,
ILSAs require a fairly short tesng me when compared to
high stakes assessments and are only administered to a small
representave poron of the populaon in the countries.
Finally, the content domains covered by ILSAs are not an
exhaustive list of what is taught in schools. For example,
IEA's TIMSS focuses on mathematics and science curriculum
attainment at grades four and eight and, while these are critical
subjects, strong attainment in them alone cannot be considered
a reliable measure of the overall health of an education
system. Some have criticized ILSAs as exerting undue influence
on education policies, with national approaches perceived as
being displaced by a tendency to target national curricula
toward better achievement in the subjects assessed by ILSAs.
Consequently, care must be taken when considering results
from ILSAs as a holistic quality measure of an education system;
rather, the focus should be on the results as important indicators
of what students know and can do in specific subjects, and of
how students compare nationally to their peers internationally.
WHY ILSAs ARE IMPORTANT
Despite limitaons, ILSAs are vital tools for educaon
system improvement. The assessments are unique in
that the informaon provided by the data can be used to
monitor systems over me within a robust internaonal
framework, and they fall outside the governance of any
one country thus being largely independent of any single
polical system. Further, the data is available to the public
allowing researchers from around the world to explore the
data and make use of it for their own research quesons.
In many countries, research resulng from ILSAs has improved
our understanding of how educaonal systems operate,
informing policy decisions that are based on strong and reliable
evidence. Various impact studies have shown that ILSA results
have been used to support policymaking (e.g., Breakspear 2012;
Schwippert & Lenkeit 2012; Wagemaker 2013) and as reported
in the TIMSS and PIRLS Encyclopedias various educaonal
improvements have been smulated by evidence from ILSAs.
When properly understood and analyzed, the data from ILSAs
provide valuable opportunies to help inform policy decisions
as well as research into educaon system improvement.
ABOUT THE AUTHORS

DR DIRK HASTEDT
Dr Dirk Hastedt is the Executive Director of
IEA. He oversees IEA's operations, studies,
and services, and drives IEA's overall strategic
vision. Moreover, he develops and maintains
strong relationships with member countries,
researchers, policymakers, and other key
players in the education sector. Dr Hastedt
also serves as co-editor-in-chief of the IEA-
ETS Research Institute (IERI) journal Large-
scale Assessments in Education.

DR THIERRY ROCHER
Dr Thierry Rocher was elected to the IEA Chair
position at the 59th General Assembly meeting
in October 2018. Dr Rocher previously served
as a Standing Committee member and General
Assembly representative for France. He is the
Head of the Office for Student Assessment
(DEPP) at the Ministère de l'éducation
nationale in France.
WHAT NEXT?
As ILSAs connuously modernize—most recently with a move
toward computer-based assessments—new methodological
opportunies and challenges await. This is why organizaons
conducng ILSAs are acvely promong research in the
eld of internaonal assessment. For example, IEA sponsors
academic journals, conferences, and themac reports for
outside researchers as well as employing its own research
team to help further developments in the eld of assessment.
More generally, in its renewed strategy, IEA is placing a
strong emphasis on research and innovaon, including,
for example, the promising topic of “process data”(i.e.,
digital traces le by students when passing an assessment).
Another focus of ILSA development is the exploraon of larger
and more complex dimensions, so-called “21st century skills.
IEA has recently launched a curriculum study (21CS MAP) which
aims to map these skills. This fundamental study will be based
on what is taught in schools (the intended and implemented
curricula) before developing a concrete assessment program.
CONCLUSION
In an increasingly interconnected world, we believe ILSAs can
help us learn from others and, through comparison, better
understand ourselves. IEA works diligently to assist the
policy and research community by training researchers and
policymakers in how to interpret and analyze data, by writing
and commissioning in-depth reports on study findings,
and by publishing quarterly briefs that are intended to be
short, digestible summaries of interesting study results. Such
activities are undertaken in support of IEA's mission "to better
understand education practices, processes, and policies in
order to improve the quality of teaching and learning within
and across systems of education." All IEA studies and reports
follow the highest academic standards for social science
research, including complete transparency about the testing
process and robust peer review for studies and reports.
REFERENCES

Anderson, L.W., Ryan, D.W., & Shapiro, B.J. (1989). The IEA classroom environment study. Oxford, UK: Pergamon Press.

Breakspear, S. (2012). The policy impact of PISA: An exploration of the normative effects of international benchmarking in school system performance. OECD Education Working Papers No. 71. Paris, France: OECD Publishing.

Foshay, A.W., Thorndike, R.L., Hotyat, F., Pidgeon, D.A., & Walker, D.A. (1962). Educational achievements of thirteen-year-olds in twelve countries: Results of an international research project, 1959–1961. Hamburg, Germany: UNESCO Institute for Education. https://unesdoc.unesco.org/ark:/48223/pf0000131437

Husén, T. (1967). International study of achievement in mathematics: A comparison of twelve countries. Vol. I–II. Stockholm, Sweden and New York, NY: Almqvist & Wiksell and John Wiley.

Husén, T. (1983). An incurable academic: Memoirs of a professor. Oxford, UK: Pergamon Press.

IEA. (2018). 60 years of IEA 1958–2018. Amsterdam, the Netherlands: Author. https://www.iea.nl/index.php/publications/60-years-iea-1958-2018

IEA. (2019). IEA studies [webpage]. https://www.iea.nl/studies/ieastudies

Lockheed, M. (2011). Reflections on IEA from the perspective of a World Bank official. In C. Papanastasiou, T. Plomp, & E. Papanastasiou (Eds.), IEA 1958–2008: 50 years of experiences and memories, Vol. 2. Nicosia, Cyprus: Cultural Center of the Kykkos Monastery.

Lockheed, M., Prokic-Bruer, T., & Shadrova, A. (2015). The experience of middle-income countries participating in PISA 2000–2015. Paris, France: PISA/OECD Publishing.

OECD. (2019). PISA: Programme for International Student Assessment. http://www.oecd.org/pisa/

Schwippert, K., & Lenkeit, J. (Eds.). (2012). Progress in reading literacy in national and international context: The impact of PIRLS 2006 in 12 countries. Münster, Germany: Waxmann Verlag GmbH.

Travers, K., & Westbury, I. (1989). The IEA study of mathematics I: Analysis of the mathematics curricula. Amsterdam, the Netherlands: International Association for the Evaluation of Educational Achievement (IEA).

Wagemaker, H. (2013). International large-scale assessments: From research to policy. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis (p. 11). Florence, KY: Chapman and Hall/CRC.
ADDITIONAL RESOURCES
Hastedt, D. (2016, November 7). The history and development of international assessments [Podcast]. FreshEd Podcast 49. http://www.freshedpodcast.com/dirkhastedt/

Lockheed, M., & Wagemaker, H. (2013). International large-scale assessments: Thermometers, whips or useful policy tools? Research in Comparative and International Education, 8(3), 296–306.

Singer, J.D., Braun, H.I., & Chudowsky, N. (2018). International education assessments: Cautions, conundrums, and common sense. Washington, DC: National Academy of Education.
ABOUT IEA
The Internaonal Associaon for the Evaluaon of Educaonal Achievement,
known as IEA, is an independent, internaonal consorum of naonal
research instuons and governmental agencies, with headquarters in
Amsterdam. Its primary purpose is to conduct large-scale comparave
studies of educaonal achievement with the aim of gaining more in-depth
understanding of the eects of policies and pracces within and across
systems of educaon.
Thierry Rocher
IEA Chair
Dirk Hastedt
IEA Executive Director
Andrea Netten
Director of IEA Amsterdam
Gina Lamprell
IEA Publications Officer
Compass Editor
David Rutkowski
Indiana University
Please cite this publication as:
Rocher, T., & Hastedt, D. (2020, September). International large-scale assessments in education: A brief guide.
IEA Compass: Briefs in Education No. 10. Amsterdam, the Netherlands: IEA.
Copyright © 2020 International
Association for the Evaluation of
Educational Achievement (IEA)
All rights reserved. No part of this
publication may be reproduced,
stored in a retrieval system or
transmitted in any form or by any
means, electronic, electrostatic,
magnetic tape, mechanical,
photocopying, recording or
otherwise without permission in
writing from the copyright holder.
ISSN: 2589-70396
Copies of this publicaon can be
obtained from:
IEA Amsterdam
Keizersgracht 311
1016 EE Amsterdam
The Netherlands
Website: www.iea.nl