Measuring those who have their minds set: An item-level meta-analysis of the implicit theories of intelligence scale in education

Fixed and growth mindsets represent implicit theories about the nature of one's abilities or traits. The existing body of research on academic achievement and the effectiveness of mindset interventions for student learning largely relies on the premise that fixed and growth mindsets are mutually exclusive. This premise has led to the common practice in which measures of one mindset are reversed and then assumed to represent the other mindset. Focusing on K-12 and university students (N = 27,328), we tested the validity of this practice via a comprehensive item-level meta-analysis of the Implicit Theories of Intelligence Scale (ITIS). By means of meta-analytic structural equation modeling and network analysis, we examined (a) the ITIS item-item correlations and their heterogeneity across 32 primary studies; (b) the factor structure of the ITIS, including the distinction between fixed and growth mindset; and (c) moderator effects of sample, study, and measurement characteristics. We found positive item-item correlations within the sets of fixed and growth mindset items, with substantial between-study heterogeneity. The ITIS factor structure comprised two moderately correlated mindset factors (ρ = 0.63–0.65), even after reversing one mindset scale. This structure was moderated by the educational level and origin of the student sample, the assessment mode, and scale modifications. Overall, we argue that fixed and growth mindsets are not mutually exclusive but correlated constructs. We discuss the implications for the assessment of implicit theories of intelligence in education.


Introduction
Much of the research in education focuses on identifying possible factors that may determine students' learning and academic success. Over the last few decades, one of these factors has gained considerable attention, both in small- and large-scale interventions (e.g., Sisk et al., 2018; Yeager et al., 2019) and studies establishing the positive link to academic achievement (e.g., Costa & Faria, 2018; Petscher et al., 2017): students' implicit theories or so-called mindsets (Dweck & Yeager, 2019). Dweck (2000) conceptualized mindsets as self-theories about the nature of one's psychological attributes, such as abilities (e.g., intelligence, mathematical skills, creativity) or traits (e.g., personality, moral orientations, emotions). These self-theories can take two forms: People can hold entity theories, believing that their psychological attributes are fixed and can hardly change (fixed mindset), or incremental theories, believing that their psychological attributes are malleable, controllable, and can develop (growth mindset; Dweck & Leggett, 1988). The extant body of research on mindsets is largely based on the premise that fixed and growth mindsets are mutually exclusive and that the absence of one suggests the presence of the other. This implies that a student cannot hold both mindsets at the same time (Lüftenegger & Chen, 2017).
This premise has impacted mindset assessment practices in education to a large extent. Specifically, most empirical studies assessing mindsets via students' self-reports adopted one of the following practices (e.g., Burnette et al., 2013; Costa & Faria, 2018; Lüftenegger & Chen, 2017; OECD, 2021; Yeager & Dweck, 2020): (a) Assessing both fixed and growth mindset with multiple items and reverse-coding one to align it with the other; (b) Assessing either fixed or growth mindset with multiple items and assuming that low scores of one mindset indicate high scores of the other; and (c) Assessing one mindset with a single item under the same assumption as in (b). However, more and more evidence has accumulated that fixed and growth mindsets are correlated yet not mutually exclusive constructs (e.g., Cook et al., 2017; Diseth et al., 2014), and students can indeed hold both beliefs (e.g., Burgoyne & Macnamara, 2021; Petscher et al., 2017).
The present study is aimed at clarifying the consequences of the reversing practice of mindset measures meta-analytically. Utilizing primary study data from the most commonly used mindset assessment, the Implicit Theories of Intelligence Scale (ITIS) developed by Dweck (2000), we perform an item-level meta-analysis to examine the relations among fixed- and growth-mindset items, their heterogeneity, the factor structure of the ITIS, and possible moderators thereof. Ultimately, our methodological review provides recommendations guiding the use of the ITIS to assess students' implicit theories of intelligence in education. Besides, we provide meta-analytic evidence on the existence of a single or two correlated mindsets.

Implicit theories of intelligence in education
Implicit theories represent a person's beliefs about the nature and workings of a psychological attribute or phenomenon (Dweck et al., 1995; Sternberg, 1985). These theories can vary substantially between two persons, as they rely on, for instance, individual experiences, knowledge, and perceptions of the social world (Schunk, 1995). Taking a socio-cognitive perspective, Carol S. Dweck and others have coined the terms "implicit theories of intelligence" or "mindsets" to describe these self-theories (Dweck & Leggett, 1988; Dweck & Yeager, 2019). A person can hold incremental (growth) or entity (fixed) theories (mindsets) about his or her intellectual abilities or psychological attributes, yet not simultaneously, as the current body of research assumes (Lüftenegger & Chen, 2017).
Implicit theories of intellectual abilities have gained considerable attention in educational research, and at least two lines of research have emerged (Dweck & Yeager, 2019): (a) the effectiveness of interventions promoting students' growth mindsets; and (b) the relations between mindsets and educational achievement. While reviewing these research lines in great detail is beyond the scope of this article, we still point to some of their key results. By and large, mindset interventions targeted at developing students' growth mindsets and, ultimately, improving learning and academic achievement seem effective and scalable, yet with small effects and large heterogeneity (Sisk et al., 2018; Yeager et al., 2019). Some studies have indicated that these interventions were especially effective for low-achieving and at-risk students (Claro et al., 2016; Paunesku et al., 2015; Sarrasin et al., 2018). Studies focusing on the relations between growth mindset and academic achievement resulted in positive yet small correlations (r = 0.07-0.10), again with substantial heterogeneity (Burnette et al., 2013; Costa & Faria, 2018; Sisk et al., 2018). Burnette et al. (2013) established meta-analytically that the mindset-achievement relationship was partially mediated by students' goal orientations and self-regulation. Moreover, some evidence suggests that the mindset-achievement relations may vary across domains (Costa & Faria, 2018). Similar to self-beliefs about one's abilities (e.g., self-concept, self-efficacy), mindsets may indeed be domain-specific (Hass et al., 2017). For instance, Lewis et al. (2021) showed that domain-specific mindsets and a global mindset co-exist, a finding similar to the hierarchical structure of academic self-concepts (Arens et al., 2020).
Clearly, much of the evidence on the effectiveness of mindset interventions and the mindset-achievement relationship abounds in heterogeneous findings with varying effect sizes between studies. Several meta-analyses and large-scale studies explained some of this heterogeneity. For instance, Costa and Faria (2018) found higher correlations between growth mindset and academic achievement for middle-school students (r = 0.15) than for high-school and college students (r = 0.06-0.09). A similar moderation effect across developmental stages was backed by Sisk et al. (2018). These stages are, among others, characterized by different stages of cognition, performance on intelligence tests, and structures of the respective measures (Schroeders et al., 2015). Besides, the cultural background of the student samples plays another key role in explaining heterogeneity: Costa and Faria (2018) reported moderator effects of cultural background on the growth mindset-achievement relationship with positive and significant correlations for samples from Asia and Oceania, yet nonsignificant correlations for samples from Europe and North America. Moreover, the fixed mindset-achievement correlations were negative and statistically significant for North American samples, positive for European samples, and nonsignificant for Asian samples. The Programme for International Student Assessment (PISA) administered a single item measuring students' fixed mindset in 2018 and examined its relation to reading achievement. Next to the variation in this relation across countries, one finding stood out: "In East Asian countries, growth mindset was not as strongly associated with academic performance as in most OECD countries" (OECD, 2021, p. 17). Yeager and Dweck (2020) discussed this observation and argued that cross-cultural heterogeneity was likely to exist in any mindset study, and that researchers should consider exploring it.
The remaining evidence on possible explanatory variables is, in our reading, limited and adds only a few measurement characteristics: Costa and Faria (2018) found moderator effects of possible modifications of the ITIS (e.g., item wordings, translations, or response options), especially on the fixed mindset-achievement relation (original ITIS: r = −0.22, modified scale: r = 0.02). Sisk et al. (2018) observed differences in the effectiveness of growth-mindset interventions between computer-based (d = 0.03) and other formats (d = 0.06-0.27). Overall, our brief review revealed that sample and measurement characteristics can explain heterogeneity to some extent.

Measurement issues
Mindset assessments are largely based on students' self-reports, and the Implicit Theories of Intelligence Scale (ITIS) is one of the most popular, if not the most popular, assessments (Lüftenegger & Chen, 2017). The scale has been administered in a plethora of empirical studies and domains (Costa & Faria, 2018), has been translated into several languages (e.g., Wang & Ng, 2012), and has been used as a criterion to validate other mindset assessments (e.g., Burgoyne & Macnamara, 2021). The ITIS captures students' agreement with a set of statements corresponding to incremental and entity theories and comes in two versions, a six-item version for children (age 10 and older) and an eight-item version for adults (see Table 1; Dweck, 2000). While the items measuring entity theories contain mainly negative wordings to express the fixed nature of the underlying mindset (e.g., "You have a certain amount of intelligence, and you really can't do much to change it."), the incremental theory items are positively formulated to express the orientation towards growth (e.g., "No matter who you are, you can change your intelligence a lot."). While discussing the suitability of students' self-reports to capture their implicit theories is beyond the scope of this article, we notice that these assessments have their limitations, such as possible acquiescence bias or inconsistent response patterns (Freund & Kasten, 2012; Steinmann et al., 2021).
As noted earlier, three practices have dominated the assessment of mindsets (Lüftenegger & Chen, 2017): The first practice is to assess both mindsets. Specifically, researchers administer the full six- or eight-item version of the ITIS, extract students' responses, and then reverse-code one item set (e.g., the growth mindset items) to align it with the other set. Oftentimes, the sets of item responses are then combined into a single scale score (Yeager & Dweck, 2020). This practice heavily relies on the assumption that the two mindsets are mutually exclusive and that students cannot hold both simultaneously. Recently, some evidence has emerged that the ITIS, in fact, measures two correlated latent variables that correspond to the two mindsets (e.g., Glerum et al., 2020; Li & Bates, 2020; Lou et al., 2021), thus challenging this assumption. Research on self-report scales with mixed-worded items deals with a similar problem, and the respective body of evidence suggested that item reversing creates multidimensionality in the item response data (e.g., Kam, 2016). In this sense, the reversing practice may not necessarily lead to a valid representation of students' mindsets as a single construct.
Another common practice is to assess only one of the two implicit theories via multiple items. This practice is also based on the assumption that the presence of the one mindset indicates the lack of the other. While this practice may be cost- and time-efficient, given that only three or four ITIS items are administered, it is likely to be prone to acquiescence bias and bias in the estimation of the growth- or fixed-mindset scores (Lüftenegger & Chen, 2017).
Similarly, the third practice is to assess one mindset with a single item. Clearly, such single-item measures are likely to be prone to acquiescence bias and measurement error (OECD, 2021). In their study of three- and single-item measures of fixed mindsets, Rammstedt et al. (2021) showed that the relations to other constructs can differ substantially between the two types of measures and that single-item measures suffer from low test-retest reliabilities. This practice also assumes the orthogonality of the two mindsets. To our best knowledge, the extent to which the assumption of mutually exclusive mindsets underlying the reversing practices in fact holds is still unclear and requires empirical backing.
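The reverse-coding step of the first practice can be sketched as follows. This is a minimal illustration that assumes the original six-point response scale; the function names and the example responses are hypothetical and are not taken from any primary study.

```python
def reverse_code(response, n_options=6):
    """Reverse a Likert response on a 1..n_options scale (1 <-> n, 2 <-> n-1, ...)."""
    if not 1 <= response <= n_options:
        raise ValueError("response outside the scale range")
    return n_options + 1 - response


def single_scale_score(fixed_items, growth_items, n_options=6):
    """Mean of the fixed-mindset items and the reverse-coded growth-mindset items.

    This mirrors the practice questioned in the text: both item sets are
    treated as indicators of one and the same construct.
    """
    aligned = list(fixed_items) + [reverse_code(x, n_options) for x in growth_items]
    return sum(aligned) / len(aligned)


# Hypothetical respondent who agrees with BOTH item sets (1 = "Strongly agree").
fixed = [2, 1, 2]
growth = [1, 2, 2]
score = single_scale_score(fixed, growth)
print(score)
```

In this hypothetical case, the combined score falls at the scale midpoint, so a student endorsing both mindsets would be indistinguishable from a student who is neutral on both. This is one way to see why a single reversed score may misrepresent students who hold both beliefs.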

Table 1
Wordings and subscale assignments of the items in the six- and eight-item versions of the ITIS.
Original stimuli
Implicit Theories of Intelligence Scale for Children-Self Form: "Read each sentence below and then circle the one number that shows how much you agree with it. There are no right or wrong answers." (see Dweck, 2000, p. 177)
Implicit Theories of Intelligence Scale for Adults-Self Form: "This questionnaire has been designed to investigate ideas about intelligence. There are no right or wrong answers. We are interested in your ideas. Using the scale below, please indicate the extent to which you agree or disagree with each of the following statements by writing the number that corresponds to your opinion in the space next to each statement." (see Dweck, 2000, p. 178)
Original response options
1 = "Strongly agree", 2 = "Agree", 3 = "Mostly agree", 4 = "Mostly disagree", 5 = "Disagree", 6 = "Strongly disagree"
Item  Wording  Subscale
Note. The six-item ITIS comprises items 1-6. Item wordings correspond to those of Dweck's Implicit Theories of Intelligence Scales for Children (age 10 and older) and for Adults-Self Forms (Dweck, 2000).
R. Scherer and D.G. Campos

The present study
To summarize, the extant literature on students' implicit theories of intelligence (i.e., mindsets) is largely based on the premise that entity (i.e., fixed mindset) and incremental (i.e., growth mindset) theories are mutually exclusive (Lüftenegger & Chen, 2017). This premise has impacted current assessment practices of mindsets, such as assessing mindsets by (a) item sets that measure both fixed and growth mindsets, one of which is subsequently reversed to align with the other; (b) item sets that measure either fixed or growth mindsets under the assumption that agreement with one set indicates disagreement with the other; and (c) single items (Yeager & Dweck, 2020). Despite the methodological challenges associated with these practices, the substantive assumption that students cannot hold both fixed and growth mindsets at the same time seems questionable (e.g., Glerum et al., 2020; Lüftenegger & Chen, 2017; Tempelaar et al., 2015). Given the heterogeneous evidence on the existence of a single mindset factor or two correlated mindset factors representing entity and incremental theories, Lüftenegger and Chen (2017) called for large-scale evaluations and replications of the evidence base on the relationship between the two implicit theories. Identifying the possible sources of this heterogeneity is also key to understanding how the study, sample, and measurement contexts may impact the evidence base (see also Carpenter et al., 2016).
The present study is primarily aimed at synthesizing the evidence on the relationship between fixed and growth mindsets of intelligence in the context of education. Ultimately, we aim to clarify whether a single mindset or two correlated mindset factors exist. Our secondary goal is to illustrate the potential of item-level meta-analysis and meta-analytic structural equation modeling for examining research questions concerning educational assessment and measurement. Focusing on the ITIS, we specifically address three research questions (RQs):

Literature search and screening
To identify the literature eligible for addressing our RQs, we performed systematic searches in the databases PsycINFO, ERIC, PsyArXiv Preprints, and ProQuest Dissertations & Theses in August 2021, using "mindset" OR "implicit theor*" and "assessment" OR "measurement" OR "scale" as key elements of the search terms. We further supplemented these searches by reviewing the references in the comprehensive review by Costa and Faria (2018) and by searching for raw data in the OECD (https://www.oecd.org/pisa/data/) and IEA (https://ilsa-gateway.org/) databases of educational large-scale assessments. Four additional publications could be identified via personal contacts, email requests to authors, a snowball reference, and a university's thesis archive. Finally, to identify any further publications, we used the artificial intelligence-based tool "ASReview" (van de Schoot et al., 2021). Specifically, we submitted the outcomes of the PsycINFO and ERIC searches to the tool, screened 10% of the publications for their eligibility, and the tool ranked the remaining publications according to their similarity to the initially screened publications. The full documentation of the search can be accessed via https://bit.ly/3p517BF, and Fig. 1 shows the outcomes of these searches.
Note. The proportions are based on 39 study samples.
After removing duplicates, we screened the publications against the following criteria: (a) Measures: Primary studies included a quantitative measure of implicit theories/mindsets in intelligence; (b) Coverage: At least one type of mindset has been assessed with at least three items of the ITIS; (c) Sample: We included student samples enrolled in K-12 and university education (i) to align our meta-analysis with the extant literature providing evidence on the mindset-achievement relation and the effectiveness of mindset interventions in educational contexts (Costa & Faria, 2018; Sisk et al., 2018); and (ii) to cover the core levels of formal education (UNESCO Institute for Statistics, 2012). We excluded pre-K samples due to the issues of measuring mindsets by the ITIS in young children (Muradoglu et al., 2022); (d) Statistical reporting: Publications contain sufficient information to extract or derive the item-item correlations; (e) Language: Publications must be in English or, if not, authors have provided a summary of their study and the relevant information in English. In the case of unpublished theses and reports that did not undergo a peer review beyond institutional approval, we contacted the authors and asked for the raw data to retrieve the item-item correlations directly from the data. Overall, the literature search and screening procedures resulted in 29 reports of 32 primary studies which contained 39 independent samples (see Fig. 1). The full meta-analytic sample provided 487 correlation coefficients that were based on the item responses of 27,328 students.

Coding and effect sizes
As a next step, we extracted and coded study, sample, and measurement features to establish the contexts in which the ITIS had been used. Table 2 gives a detailed account of these features. Besides the features, we extracted and derived the item-item correlations as Pearson's rs from the primary studies. Specifically, some primary study authors have made available the raw data (m = 7), so that we could estimate the observed item-level correlation matrix directly (see Table 2). For m = 11 samples, the authors reported or provided us with the observed item-item correlation or covariance matrices, along with the descriptive statistics. Most study reports contained the results of factor analyses (m = 21), including factor correlations and item factor loadings. Applying (co-)variance rules, we derived the model-based item-item correlation matrices from the parameters of the exploratory factor analyses or the best-fitting confirmatory factor analysis models. In the case that authors had reported multiple factor models, we chose the parameters of the best-fitting model; in the case that authors had reported both CFA and EFA results, we chose the EFA results. The detailed procedure is described in the Supplementary Material S2. The agreement on the initial coding of the primary studies between two independent coders was 94%, and discrepancies were resolved subsequently. Curran and Hussong (2009) noted that integrative data analysis represents an ideal approach to synthesizing item-level information and testing factor structures across multiple samples. This approach is based on the premise that primary-study authors have made available their data sets, a premise that is largely not met in education (Logan et al., 2021). However, the recent advancements in item-level meta-analysis (Carpenter et al., 2016) and meta-analytic structural equation modelling (MASEM; Cheung, 2015a) allow meta-analysts to get close to the "ideal approach" by testing hypotheses on factor structures and moderators of model parameters based on item-item correlations. Drawing from these advancements, we (a) examined possible publication bias in the item-item correlations; (b) pooled the item-item correlation matrices via multivariate meta-analysis; (c) explored clusters of items via network analysis; (d) tested the factor structures of the two ITIS versions and their invariance across subgroups of primary studies; and (e) tested for moderator effects on the model parameters via one-stage MASEM.
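The derivation of model-based item-item correlations from factor-analytic parameters can be illustrated with a short sketch. It assumes a standardized factor solution, so that the off-diagonal elements of the implied correlation matrix equal those of ΛΦΛᵀ (loadings Λ, factor correlations Φ). The function name and the numerical values are hypothetical, and the authors' actual procedure is the one documented in their Supplementary Material S2.

```python
def implied_correlations(loadings, factor_corr):
    """Model-implied item-item correlations from a standardized factor solution.

    loadings:    one row per item, one standardized loading per factor
    factor_corr: factor correlation matrix (list of lists)
    Off-diagonal entries are those of Lambda * Phi * Lambda^T; the diagonal is
    set to 1 because the unique variances absorb the remaining item variance.
    """
    n_items = len(loadings)
    n_factors = len(factor_corr)
    implied = [[1.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            if i == j:
                continue
            implied[i][j] = sum(
                loadings[i][k] * factor_corr[k][m] * loadings[j][m]
                for k in range(n_factors)
                for m in range(n_factors)
            )
    return implied


# Hypothetical two-factor solution: items 1-2 load on factor 1, items 3-4 on factor 2.
loadings = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]]
phi = [[1.0, 0.6], [0.6, 1.0]]
R = implied_correlations(loadings, phi)
for row in R:
    print([round(value, 3) for value in row])
```

Within-factor correlations are the products of the two loadings, whereas between-factor correlations are additionally attenuated by the factor correlation.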

Univariate meta-analyses
Before synthesizing correlation matrices across primary studies (RQ1), we evaluated possible publication bias in the single correlations. First, we pooled the single item-item correlations via univariate random-effects models with or without robust variance estimation (RVE) in the R packages "metafor" (Viechtbauer, 2010) and "robumeta" (Fisher et al., 2017). We chose RVE random-effects models as the primary models for the univariate meta-analyses to account for the hierarchical nesting of multiple, independent samples within primary studies. Second, we performed the funnel plot test by extending these models by the sample size as a moderator. Similarly, we performed Egger's Precision-Effect Test (PET; with the sampling standard error as moderator) and the Precision-Effect Estimate with Standard Errors (PEESE; with the sampling variance as moderator) in a third step. Fourth, we performed Begg's rank correlation test, followed by trim-and-fill analyses. To rule out file-drawer issues, we examined the p-curves of each correlation (Simonsohn et al., 2014). Finally, we identified influential correlations in the meta-analytic sample via Viechtbauer and Cheung's (2010) diagnostic procedure. Supplementary Material S3 shows the respective R code.
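To make the idea of univariate random-effects pooling concrete, the following sketch pools a single item-item correlation across samples. It is a simplified stand-in, not the authors' analysis: it assumes untransformed correlations with the large-sample sampling variance (1 − r²)²/(n − 1) and uses the DerSimonian-Laird estimator instead of the RVE models fitted in "metafor" and "robumeta". The function name and the data are hypothetical.

```python
def pool_correlations_dl(rs, ns):
    """DerSimonian-Laird random-effects pooling of untransformed correlations.

    rs: correlations of the same item pair, one per independent sample
    ns: corresponding sample sizes
    Returns (pooled correlation, between-study variance tau^2, standard error).
    """
    # Large-sample sampling variance of a Pearson correlation.
    vs = [(1 - r ** 2) ** 2 / (n - 1) for r, n in zip(rs, ns)]
    w = [1 / v for v in vs]
    fixed = sum(wi * r for wi, r in zip(w, rs)) / sum(w)
    # Cochran's Q and the method-of-moments estimate of tau^2.
    q = sum(wi * (r - fixed) ** 2 for wi, r in zip(w, rs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)
    # Random-effects weights add tau^2 to each sampling variance.
    w_re = [1 / (v + tau2) for v in vs]
    pooled = sum(wi * r for wi, r in zip(w_re, rs)) / sum(w_re)
    se = (1 / sum(w_re)) ** 0.5
    return pooled, tau2, se


# Hypothetical correlations between two fixed-mindset items in four samples.
pooled, tau2, se = pool_correlations_dl([0.55, 0.66, 0.61, 0.48], [210, 540, 1200, 330])
print(round(pooled, 3), round(tau2, 4), round(se, 3))
```

A PET or PEESE check would extend such a model with the sampling standard error or the sampling variance as a moderator, as described above.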

Multivariate meta-analyses
Given that each primary study contributed multiple item-item correlation coefficients nested in correlation matrices, the meta-analytic data exhibit effect size multiplicity (López-López et al., 2018). To address this multiplicity, we applied multiple approaches of multivariate meta-analysis to pool the correlation matrices across studies and quantify their heterogeneity. These approaches included one-stage and two-stage meta-analytic structural equation modeling (OSMASEM and TSMASEM; Cheung, 2015a; Jak & Cheung, 2020) and meta-analytic aggregation of Gaussian networks (MAGNA; Epskamp et al., 2022).
Specifically, to address RQ1, we performed the first stage of TSMASEM, in which the correlation matrices $R_i$ were synthesized across studies $i = 1, \dots, I$ to a pooled correlation matrix $P$ via a multivariate random-effects model. Vectorizing these matrices to vectors $r_i$ and $\rho_R$ (Cheung, 2015a), this model is specified as $r_i = \rho_R + u_i + e_i$, with sampling errors $e_i$ and study-specific deviations $u_i$ from the pooled correlation matrix under normality assumptions, $e_i \sim \mathrm{MVN}(0, V_i)$ and $u_i \sim \mathrm{MVN}(0, T^2)$. $V_i$ represents the sampling covariance matrix of study $i$, and $T^2$ the heterogeneity covariance matrix. Given the limited number of studies in our meta-analysis, we constrained the covariances among random effects in $T^2$ (i.e., the off-diagonals) to zero. The multivariate pooling approach accounts for the dependencies among correlation coefficients instead of assuming that they are independent, which results in a more accurate pooled correlation matrix than separate meta-analyses of single correlations (Cheung, 2013).
To further address RQ2, we performed TSMASEM and MAGNA. We utilized the pooled correlation matrix from stage 1 in TSMASEM and submitted it to stage 2. Specifically, we estimated exploratory and confirmatory factor analysis models on the basis of the stage-1 correlation matrix and weights that corresponded to the inverse sampling variances and covariances via weighted least squares estimation (Cheung, 2015a). To supplement these correlation-based analyses, we further identified the relationships among mindset items via random-effects MAGNA, that is, analyses based on the partial correlations. MAGNA pools correlation matrices and estimates a multi-group Gaussian Graphical Model based on the corresponding partial correlation matrices in one stage via maximum-likelihood estimation (Epskamp et al., 2022). Ultimately, MAGNA results in a network of items (nodes) and their partial correlations (exhibited as edges). Unlike factor analysis, in which the relationships between two items are assumed to have a common cause (i.e., latent variables), the partial correlations used in MAGNA represent conditional dependencies among items after controlling for all other items (Epskamp et al., 2017). In this sense, the relations between two items represent direct dependencies accounting for the shared dependencies with other items in the network (Kan et al., 2019). Hence, MAGNA provides information about possible item connections from a different perspective than meta-analytic factor analysis and may even result in a better representation of the ITIS scale structure (e.g., Kan et al., 2019).
Addressing RQ3, we conducted subgroup analyses in the TSMASEM approach and tested the measurement invariance of the ITIS factor models. To study the specific moderator effects of sample, study, and measurement characteristics on the parameters in the measurement models, we also performed OSMASEM via maximum-likelihood estimation (Jak & Cheung, 2020). We performed TSMASEM and OSMASEM in the R package "metaSEM" (Cheung, 2015b) and MAGNA in "psychonetrics" (Epskamp, 2021).
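The link between correlations and the partial correlations that form the network edges can be sketched as follows. The example assumes the standard relation between a correlation matrix and its inverse K, in which the partial correlation of items i and j given all other items equals −K_ij / sqrt(K_ii K_jj). It only illustrates this conversion for a single matrix and does not reproduce the maximum-likelihood, multi-group estimation performed by MAGNA; function names and values are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each item pair, controlling for all other items."""
    k = invert(corr)
    n = len(corr)
    return [
        [1.0 if i == j else -k[i][j] / (k[i][i] * k[j][j]) ** 0.5 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical chain structure: items 1 and 3 are related only through item 2.
corr = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
omega = partial_correlations(corr)
for row in omega:
    print([round(value, 3) for value in row])
```

In this hypothetical chain, items 1 and 3 correlate at 0.25, yet their partial correlation is zero once item 2 is controlled for, which is the sense in which network edges represent direct dependencies.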

Structural equation modelling
To evaluate the fit of the CFA models (RQ2), we referred to the traditional cut-offs of fit indices for an acceptable model fit (e.g., Hu & Bentler, 1999): a nonsignificant χ²-value (indicating no major discrepancy between the observed and model-implied covariance matrices), the Comparative Fit Index (CFI) greater than or equal to 0.95, the Root Mean Square Error of Approximation (RMSEA) smaller than or equal to 0.06, and the Standardized Root Mean Squared Residual (SRMR) smaller than or equal to 0.08. However, as these recommendations have only been validated for a limited set of conditions, we did not consider them "golden rules" (see also Marsh et al., 2004). McNeish and Wolf (2021) developed an alternative, simulation-based procedure to adapt fit index cut-offs to the specific structural equation model and the context of the data. We report these dynamic cut-offs for the factor models relevant to RQ2.
For subsequent invariance testing across subgroups of studies (RQ3), we estimated a series of multi-group CFA models with sequentially increasing constraints of model parameters. These models represented configural, metric, and structural invariance models and were compared via likelihood-ratio tests (Putnick & Bornstein, 2016).
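As a small worked illustration of the conventional cut-offs, the sketch below computes the RMSEA and CFI from χ² statistics and checks them against the thresholds named above. It assumes the common textbook formulas (with N − 1 in the RMSEA denominator; some software uses N instead). All input values are hypothetical, and the dynamic cut-offs of McNeish and Wolf (2021) are not covered here.

```python
import math


def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation, using N - 1 in the denominator."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model, 0.0)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def meets_conventional_cutoffs(cfi_value, rmsea_value, srmr_value):
    """Check the conventional thresholds: CFI >= .95, RMSEA <= .06, SRMR <= .08."""
    return cfi_value >= 0.95 and rmsea_value <= 0.06 and srmr_value <= 0.08


# Hypothetical two-factor CFA of six items (df = 8) in a sample of N = 501.
model_rmsea = rmsea(chi2=28.0, df=8, n=501)
model_cfi = cfi(chi2_model=28.0, df_model=8, chi2_base=1028.0, df_base=28)
acceptable = meets_conventional_cutoffs(model_cfi, model_rmsea, srmr_value=0.03)
print(round(model_rmsea, 3), round(model_cfi, 3), acceptable)
```

In this hypothetical case, the CFI and SRMR pass while the RMSEA exceeds 0.06, which shows how the indices can disagree and why fixed thresholds are better treated as guidelines.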

Transparency and openness
To ensure the transparency and replicability of our research approaches and findings, we took three steps: (a) Study

Description of the meta-analytic sample, publication bias, and influential correlations
Table 2 and Fig. 2 detail the features of the meta-analytic sample.Most of the study reports were published in peer-reviewed journals or books and focused on validating the ITIS or describing the associations between mindset and other constructs (m = 34), yet hardly on the evaluation of interventions or changes over time (m = 5).Hence, the study designs were primarily cross-sectional (m = 32).The corresponding study samples contained students in secondary schools (m = 24) or universities (m = 15) who were, on average, 17.7 years old (SD = 4.4, Mdn = 17.2).The average proportion of boys in the study samples was 40.7% (SD = 11.9,Mdn = 42.3).Concerning the ITIS, the primary studies varied in the features of the scale: For instance, the number of response options ranged between 4 and 7, and most studies utilized the original six-point agreement scale (from "strongly disagree" to "strongly agree").The number of items administered to the students ranged between 3 and 8 items.More than 60% of the study samples took the ITIS in a paper-and-pencil format (m = 24) rather than a computer-based (online) format (m = 15).Most studies included the ITIS in a way that both fixed and growth mindset could be assessed (m = 24); studies including only one mindset subscale assessed mainly fixed mindset (m = 14).Two-third of the samples have taken the ITIS with the original item formulations (m = 26).Concerning the measurement quality, reliability coefficients were by and large accessible (m = 33), and authors mainly chose to represent students' mindsets by latent variables (m = 25), testing measurement models with up to three factors.The primary approaches to representing mindsets were confirmatory and exploratory factor analyses.For about 82% of the study samples, model fit indices were evaluated to indicate the goodness of fit.
As an initial step before the pooling of correlation matrices, we evaluated the evidence on possible publication bias for the single Note.The pooled correlation matrix is based on 309 correlation coefficients derived from 39 independent samples in 32 primary studies (N = 27328).item-item correlations.The respective models and statistics are detailed in the Supplementary Material S3.Overall, these analyses showed that: (a) None of the extracted correlations were influential; (b) All of the p-values had evidential value; (c) Some publication bias was evident-the PET and PEESE procedures indicated possible publication bias for the correlations r 13 , r 17 , r 23 , r 27 , r 37 , r 58 , and r 68 .However, all other tests did not flag these correlations.We therefore argue that the degree of publication bias was small.
For the six-item ITIS, the pooled correlations (r) among the fixed-mindset items ranged between 0.58 and 0.66, and between 0.58 and 0.68 for the growth-mindset items (see Table 3).The correlations between fixed-and growth mindset items were also positive yet Note.The pooled correlation matrix is based on 487 correlation coefficients derived from 39 independent samples in 32 primary studies (N = 27328).4).Similar ranges occurred for the eight-item ITIS (fixed-mindset items: r = 0.57-0.68;growth-mindset items: r = 0.57-0.66;fixed-and growth mindset items: r = 0.35-0.45).We utilized these pooled correlation matrices for the subsequent factor analyses of the full meta-analytic sample and generated group-specific correlation matrices for the subgroup analyses, following the same procedure.Notably, in both MAGNA and OSMASEM, the pooling of correlation matrices and the estimation of the analytic models are performed at once.
Six-item ITIS. The partial correlations ω_ij between the fixed-mindset items i and j ranged from 0.25 to 0.43, and those between the growth-mindset items from 0.26 to 0.41; partial correlations between fixed- and growth-mindset items were substantially lower, ω = 0.04-0.09 (ps ≤ .002). Moreover, the resultant values of the expected influence within the network ranged between 0.74 (item 3) and 0.97 (item 5). Fig. 3a depicts the overall network and indicates that two clusters of items existed, one comprising the fixed-mindset items (items 1-3) and one comprising the growth-mindset items (items 4-6). The connections between these clusters were all positive, yet weak.
Eight-item ITIS. The partial correlations were similar to those for the six-item ITIS (between fixed-mindset items: ω = 0.19-0.34; between growth-mindset items: ω = 0.14-0.32; between fixed- and growth-mindset items: ω = 0.02-0.19; ps ≤ .085). The expected influence values ranged between 0.74 (item 4) and 1.02 (item 5) and identified items 2, 5, and 7 as the most important network nodes. Similar to the six-item ITIS network, two item clusters corresponded directly to the two mindset subscales (see Fig. 3b). These clusters were positively yet weakly connected. Notably, the partial correlation between items 4 and 8 was considerably larger than the other between-cluster partial correlations, ω_48 = 0.19. Consequently, the subsequent factor analyses may contain cross-loadings or lower within-scale factor loadings involving items 4 and 8.
Overall, the network analyses suggested the existence of two connected but distinct item clusters that corresponded to fixed and growth mindset.The between-cluster connections were stronger for the eight-item ITIS than for the six-item ITIS.
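The link between a correlation matrix and the partial correlations that form such a network can be sketched as follows. This is a minimal illustration of the general relation (inverting the correlation matrix and standardizing the off-diagonal elements of the precision matrix), not the MAGNA procedure used in this study. The four-item matrix `toy_corr` and all function names are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right-hand side.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each item pair, controlling for all other items."""
    k = invert(corr)  # precision matrix
    n = len(corr)
    return [[1.0 if i == j else -k[i][j] / (k[i][i] * k[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


# Hypothetical correlations: items 1-2 and items 3-4 form two clusters.
toy_corr = [
    [1.0, 0.6, 0.4, 0.4],
    [0.6, 1.0, 0.4, 0.4],
    [0.4, 0.4, 1.0, 0.6],
    [0.4, 0.4, 0.6, 1.0],
]
toy_partial = partial_correlations(toy_corr)
within_cluster = toy_partial[0][1]   # items 1 and 2
between_cluster = toy_partial[0][2]  # items 1 and 3
```

In this toy example, the within-cluster partial correlation (0.50) clearly exceeds the between-cluster partial correlation (about 0.13), which mirrors the pattern of two distinct but connected item clusters.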

Exploratory factor analyses (RQ2)
Addressing RQ2, we first performed exploratory factor analyses on the pooled correlation matrices. For both the six- and eight-item ITIS, the Kaiser-Meyer-Olkin test provided mean sampling adequacies above 0.80, and Bartlett's sphericity test resulted in significant chi-square statistics (see Table 5). Hence, the correlation matrices supported that the items were correlated, and there was sufficient common variance (Kaiser & Rice, 1974). The empirical Kaiser-Guttman criteria for the eigenvalues indicated two factors rather than one (Braeken & van Assen, 2017; see Supplementary Material S4 and S5). The factor analyses with Oblimin rotation resulted in factor correlations of ρ = 0.63 for the six-item ITIS and ρ = 0.64 for the eight-item ITIS, and exhibited good fit to the data (see Table 5). The factor structure underlying the six-item ITIS was close to a simple structure, with items 1-3 loading on one factor, items 4-6 loading on another factor, and small cross-loadings (λ = −0.02 to 0.05). Similarly, for the eight-item ITIS, items 1-3 and 7 represented one factor, while items 4-6 and 8 represented another factor. However, cross-loadings ranged up to λ = 0.11 (item 7).
Taken together, the exploratory factor analyses suggested two correlated factors describing the structure of the six- and eight-item ITIS. Some items in the eight-item ITIS exhibited cross-loadings and may cause model fit deterioration (Li et al., 2020).
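As an illustration of the sphericity check mentioned above, Bartlett's statistic can be computed from a correlation matrix R of p items as χ² = −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom. The sketch below applies this textbook formula to a hypothetical four-item matrix. The sample size `n_obs = 500` is an assumption made for illustration; which sample size was used for the pooled matrices in this study is not stated here.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    p = len(corr)
    statistic = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return statistic, df


# Hypothetical four-item correlation matrix and sample size.
toy_corr = [
    [1.0, 0.6, 0.4, 0.4],
    [0.6, 1.0, 0.4, 0.4],
    [0.4, 0.4, 1.0, 0.6],
    [0.4, 0.4, 0.6, 1.0],
]
toy_statistic, toy_df = bartlett_sphericity(toy_corr, n_obs=500)
```

A large statistic relative to its degrees of freedom indicates that the matrix departs from an identity matrix, that is, that the items share variance worth factoring.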

Confirmatory Factor Analyses (RQ2)
To further substantiate the evidence supporting the two-factor model, we specified a single-factor and a two-factor model within the CFA framework and compared the respective model fit indices. Table 6 details the fit indices and the results of the chi-square difference tests for the full meta-analytic samples. The two-factor models not only met the commonly used model fit index cutoffs presented by Hu and Bentler (1999) but also met the dynamic model fit index cutoffs, thus pointing to their good representation of the meta-analytic data (dynamic model fit index cutoffs for the six-item ITIS: CFI = 0.979, RMSEA = 0.102, SRMR = 0.036; for the eight-item ITIS: CFI = 0.972, RMSEA = 0.090, SRMR = 0.044). For both the six- and eight-item ITIS, a two-factor model with two correlated latent variables, fixed and growth mindset (see Fig. 4), exhibited very good model fit and outperformed the corresponding single-factor models. Treating the fixed-mindset items as reverse-coded, the resultant factor correlations were ρ = 0.633 (95% CI [0.589, 0.677]) for the six-item ITIS and ρ = 0.651 (95% CI [0.616, 0.686]) for the eight-item ITIS. Fig. 4 depicts the underlying meta-analytic CFA models and further shows the high factor loadings for both mindsets, λ_F = 0.72-0.84 and λ_G = 0.74-0.83. Ultimately, these two latent variables were highly reliable in both versions of the ITIS (six-item ITIS: McDonald's ω_F = 0.84, ω_G = 0.83; eight-item ITIS: ω_F = 0.87, ω_G = 0.87). Overall, the meta-analytic CFA supported the preference of the two-factor model representing students' implicit theories of intelligence.
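Two quantities in this paragraph follow directly from the parameters of a standardized two-factor model and can be sketched in a few lines. The example assumes a congeneric model with uncorrelated residuals, in which McDonald's ω equals (Σλ)² / ((Σλ)² + Σ(1 − λ²)) and the model-implied correlation between two items equals the product of their loadings, multiplied by the factor correlation when the items belong to different factors. The loadings below are hypothetical values within the reported ranges; only ρ = 0.633 is taken from the text.

```python
def mcdonald_omega(loadings):
    """McDonald's omega from standardized loadings (uncorrelated residuals assumed)."""
    total = sum(loadings)
    residual = sum(1.0 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + residual)


def implied_correlation(loading_i, loading_j, factor_correlation=1.0):
    """Model-implied item-item correlation in a standardized factor model.

    Use factor_correlation=1.0 for two items loading on the same factor.
    """
    return loading_i * factor_correlation * loading_j


# Hypothetical standardized loadings for a three-item subscale.
toy_loadings = [0.72, 0.80, 0.84]
toy_omega = mcdonald_omega(toy_loadings)

# Same hypothetical loading for both items; rho = 0.633 is the reported value.
within = implied_correlation(0.78, 0.78)
between = implied_correlation(0.78, 0.78, factor_correlation=0.633)
```

With these values, ω is about 0.83, the implied within-factor correlation is about 0.61, and the implied between-factor correlation is about 0.39. This shows how a factor correlation clearly below 1 translates into between-mindset item correlations that stay below the within-mindset ones even after reverse-coding.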

Multi-group Confirmatory Factor Analyses (RQ3)
Consistently across subgroups of study samples (i.e., type of response scale, scale modifications, assessment mode, educational level and origin of the sample), the two-factor model fitted the data well and outperformed the single-factor model in model fit (see Appendix Table A1). The respective factor correlations ranged between ρ = 0.42 (subgroup modified scales) and 0.77 (subgroup university samples) for the six-item ITIS, and between ρ = 0.44 (subgroup modified scales) and 0.80 (subgroup university samples) for the eight-item ITIS (see Appendix Table A2). The result that the two-factor model was superior within all the a-priori defined subgroups indicated a high level of robustness of this finding. Similar to the full meta-analytic sample, scale reliabilities were high (six-item ITIS: ω_F = 0.78-0.90 and ω_G = 0.79-0.90; eight-item ITIS: ω_F = 0.81-0.93 and ω_G = 0.83-0.92). Overall, the two-factor model provided reliable factors and qualified as a baseline model for further invariance and moderation tests.

Table 5
Results from the meta-analytic exploratory factor analyses of the six- and eight-item ITIS.
To further examine possible moderator effects, we tested the two-factor model for its configural, metric, and structural invariance via multi-group CFA. As is common in invariance testing, we compared the fit of the two latter invariance models to that of the configural invariance model (Putnick & Bornstein, 2016). Table 7 contains the results of these comparisons (see also Supplementary Material S4 and S5). For both ITIS versions, only configural invariance was achieved across educational levels, scale modifications, and the East-Asian origin of the samples. We obtained evidence supporting metric invariance across assessment modes, and even structural invariance across types of response scales. These results point to the moderating roles that educational level, scale modifications, and the East-Asian origin of the samples play for the factor loadings and the factor correlation; in addition, assessment mode exhibited moderator effects on the factor correlation. Only the type of response scale did not moderate any model parameter.
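The comparison of a constrained invariance model against the configural model rests on a likelihood-ratio test and on information criteria. The sketch below shows that generic logic using only the Python standard library. The log-likelihoods, parameter counts, and sample size are hypothetical and do not reproduce any value from Table 7.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for df = 2 (even) or df = 1 (odd).
    if df % 2 == 0:
        q, nu = math.exp(-half), 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1
    # Recurrence: moving from nu to nu + 2 adds one gamma-density term.
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half
                      - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q


def compare_nested(loglik_free, npar_free, loglik_constr, npar_constr, n_total):
    """Likelihood-ratio test and AIC/BIC differences (constrained minus free)."""
    lrt = 2.0 * (loglik_free - loglik_constr)
    ddf = npar_free - npar_constr
    aic = lambda ll, k: -2.0 * ll + 2.0 * k
    bic = lambda ll, k: -2.0 * ll + k * math.log(n_total)
    return {
        "lrt": lrt,
        "df": ddf,
        "p_value": chi2_sf(lrt, ddf),
        "delta_aic": aic(loglik_constr, npar_constr) - aic(loglik_free, npar_free),
        "delta_bic": bic(loglik_constr, npar_constr) - bic(loglik_free, npar_free),
    }


# Hypothetical configural (free) versus metric (constrained) model.
result = compare_nested(loglik_free=-1000.0, npar_free=26,
                        loglik_constr=-1006.0, npar_constr=22, n_total=5000)
```

A significant likelihood-ratio test, or higher information criteria for the constrained model, would argue against the additional equality constraints and thus against the stricter level of invariance.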

Moderator effects via one-stage MASEM (RQ3)
The meta-analytic and multi-group CFA provided information about the moderator effects of the sample, study, and measurement features on the factor correlation and the set of factor loadings. However, they identified neither which loadings might be specifically affected nor the direction of these effects. We therefore performed OSMASEM to supplement this information. These analyses revealed the following (see Table 8):
• Educational level: For both the six- and eight-item ITIS, negative moderator effects on factor loadings existed for all items, indicating that school samples resulted in lower loadings. In addition, the factor correlations were moderated, with higher coefficients for university samples.
• East-Asian samples: Samples originating from East-Asian countries showed lower factor correlations. There was no consistent evidence on the moderation of factor loadings across the two ITIS versions, except for a positive effect on the loading of item 3 in the eight-item ITIS.
• Type of response scale: This feature did not show any moderator effects.
• Assessment mode: The factor correlation was moderated positively for both the six- and eight-item ITIS, with larger coefficients resulting from computer-based (online) assessments. In the six-item ITIS, we observed a tendency toward a positive moderation of the factor loading of item 4; this tendency was evident and significant for the eight-item ITIS. No further moderator effects could be found.
• Scale modification: Consistently across the two ITIS versions, the factor loadings of items 1, 2, and 6 were not moderated, whereas those of items 3-5 were, with negative effects indicating lower loadings in primary studies with modified scales. Items 7 and 8 followed the same moderator pattern. Moreover, significantly higher factor correlations could be observed in studies administering the original ITIS.

Table 8
Moderator effects on the parameters in the two-factor measurement model.

R. Scherer and D.G. Campos
By and large, OSMASEM supported the previous invariance results and highlighted some items that may function differently across the selected features.

Structure and heterogeneity of the ITIS
As we set out to examine key properties of the two ITIS versions as the most popular assessments of mindsets in education, we discovered (a) overall positive item-item correlations with substantial heterogeneity (RQ1); (b) the preference of measurement models with two moderately to highly correlated factors representing the incremental and entity theories of intelligence (RQ2); and (c) the moderation of model parameters (i.e., factor loadings and correlation) by the educational level and origin of the sample, assessment mode, and scale modifications (RQ3). Table 9 summarizes these findings in greater detail.
As expected, the positive and at least moderate item-item correlations suggested homogeneity within each mindset subscale and pointed to the existence of some underlying construct (Borsboom et al., 2003). Not surprisingly, the correlations among items measuring different mindsets were weaker; this finding remained even after reversing one set of mindset items to align with the other and was robust against all conditions and forms of grouping in our meta-analytic sample. If the "between-mindsets" correlations are indeed weaker than the "within-mindsets" correlations, then the existence of two latent variables rather than one representing the underlying constructs is indicated. In fact, the network analyses pointed in the same direction for both the six- and eight-item versions of the ITIS: Two clusters of items occurred, each of which represented one mindset, with positive item-item connections within these clusters. We note that the meta-analytic aggregation of networks via MAGNA provided a useful tool to explore the item-item connections based on partial correlations and, at the same time, quantify heterogeneity (Epskamp et al., 2022).
All item-item correlations in both ITIS versions exhibited between-sample heterogeneity. This heterogeneity may be interpreted in two ways: First, the item-item correlations could not be fully replicated across samples (Hedges & Schauer, 2019). However, the heterogeneity in item-item correlations does not necessarily imply that the core finding of our study, that is, the existence of two correlated mindset factors, is also not replicable. Second, the between-sample variation may be linked to sample, study, or measurement characteristics; information about these links may be especially useful for informing the design of future studies and benchmarking their results (Pigott & Polanin, 2020). The finding that educational level, the origin of the student samples, and measurement characteristics showed moderation effects was in line with existing meta-analyses of the mindset-achievement relation (Costa & Faria, 2018; Sisk et al., 2018). Nevertheless, our list of possible moderators was not exhaustive. For instance, students' gender may explain parts of the remaining heterogeneity. In fact, some empirical studies suggested significant gender differences in mindsets (e.g., Macnamara & Rupani, 2017; OECD, 2021). However, the gender composition of the student samples reviewed by Costa and Faria (2018) did not moderate the mindset-achievement relation, and some evidence points to the measurement invariance of mindset scales across gender (Bostwick et al., 2017) and small, if not insignificant, gender differences in mindset scores (e.g., OECD, 2021; Sigmundsson et al., 2021). Moreover, students' cognitive abilities may also moderate the parameters in the mindset measurement model. In a large-scale study of the Rosenberg Self-Esteem Scale, Gnambs and Schroeders (2017) showed that several measurement properties (i.e., dimensionality and wording effects) were associated with students' cognitive abilities, possibly via response styles. Given that the ITIS captures mindsets via similar self-reports, cognitive abilities may explain heterogeneity. Until now, most meta-analyses describing the relations between implicit theories and other constructs have quantified and explained the heterogeneity in scale scores, assuming that the measurement models and item-item correlations are homogeneous across study samples (e.g., Costa & Faria, 2018; Sisk et al., 2018). Our meta-analysis, however, showcases that heterogeneity is, in fact, located at the level of item-item correlations, which form the basis for measurement models in meta-analyses. This is where the potential of correlation-based MASEM at the level of items lies (Cheung, 2015a): in quantifying and possibly explaining heterogeneity at the level of items, the key sources of information for the structure of mindset assessments.

Table 9
Overview of the key findings.
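To make the heterogeneity indices for item-item correlations more concrete, the following sketch computes τ² and I² for a single correlation across samples. It is a deliberately simplified, univariate DerSimonian-Laird estimator on the Fisher-z scale and is not the multivariate random-effects model used in this study. The three correlations and sample sizes are hypothetical.

```python
import math


def heterogeneity(correlations, sample_sizes):
    """DerSimonian-Laird tau^2 and I^2 (in %) for one correlation, Fisher-z scale."""
    z = [math.atanh(r) for r in correlations]
    w = [n - 3 for n in sample_sizes]  # inverse of var(z) = 1 / (n - 3)
    w_sum = sum(w)
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / w_sum
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, z))
    df = len(z) - 1
    c = w_sum - sum(wi ** 2 for wi in w) / w_sum
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return tau2, i2


# Hypothetical item-item correlation observed in three samples.
toy_tau2, toy_i2 = heterogeneity([0.3, 0.5, 0.7], [103, 103, 103])
```

For these hypothetical inputs, I² is about 87%, that is, most of the observed variation in the correlation would be attributed to true between-sample differences rather than sampling error.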
Both versions of the ITIS exhibited a two-factor structure in the subsequent factor analyses for the full sample of primary studies and sub-samples.Finding that this structure, by and large, held within and across sub-groups of studies suggests its robustness and applicability in various contexts (Higgins et al., 2019).From a substantive perspective, this finding implies that students can hold both incremental and entity theories simultaneously.In their review of the conceptual and assessment issues of implicit theories, Lüftenegger and Chen (2017) argued that it may indeed be possible to observe both mindsets in achievement situations or in interventions.
Recently, more and more evidence has accumulated backing a "mixed" mindset (e.g., Glerum et al., 2020;Li & Bates, 2020;Lou et al., 2021).These findings could impact how researchers conceptualize implicit theories, that is, no longer as mutually exclusive beliefs but as two beliefs that are connected.
Although the meta-analytic evidence suggests two correlated factors, whether they indeed represent the two implicit theory constructs still needs to be examined further. We argue that additional evidence is needed to back this distinction and strengthen the validity argument (Pellegrino et al., 2016). One way to obtain such evidence is to examine the relations to other constructs, such as educational achievement and intelligence, across the two mindsets to avoid "jingle-jangle fallacies" (Gonzalez et al., 2021). In this way, evidence on the link between mindsets and their reference construct (i.e., intelligence) could support the crafting of a validity argument. A study by Macnamara and Rupani (2017) showed that such a link may exist. Such evidence is especially relevant because the multidimensionality of the ITIS could also be caused by the mixture of item wordings that are positive (e.g., incremental) or negative (e.g., entity) relative to the construct (e.g., Gnambs & Schroeders, 2017; Steinmann et al., 2021). If this applies, then the two correlated factors merely represent methodological artefacts that may be construct-irrelevant. Irrespective of the source of multidimensionality, the core finding that the ITIS comprises two correlated factors could impact the way in which researchers use this scale and draw inferences on students' mindsets.

Implications for using the Implicit Theories of Intelligence Scale
Our findings have several implications for the use of the ITIS: First, the existence of two correlated mindset factors even after reversing one item set challenges the practice of reverse coding and its assumption. Reverse-coding items does not necessarily align the construct meaning and lead to the unidimensionality of the ITIS (see also Lüftenegger & Chen, 2017). This applies to other constructs as well (Steinmann et al., 2021) and is a well-known issue in educational measurement (Kam, 2016). Hence, if educational researchers assess only one type of mindset, by multiple items or a single item, they should not draw inferences on the other type. We caution against the use of single-item measures, given their limited reliability and the possible response bias (Rammstedt et al., 2021). If researchers wish to capture implicit theories more broadly, they should assess both mindsets and draw inferences on these two constructs rather than a unified construct. This, however, requires measurement models that represent the two mindsets within the ITIS. In this situation, reporting a mindset profile is a suitable alternative to categorizing students into fixed vs. growth mindsets (Glerum et al., 2020; Lüftenegger & Chen, 2017).
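What reverse coding does, and what it cannot do, can be shown with a small sketch. Reversing a response on a scale from `minimum` to `maximum` maps a score x to minimum + maximum − x, which flips the sign of any correlation involving that item but leaves its magnitude unchanged. The responses below are hypothetical and assume a six-point agreement scale.

```python
def reverse_code(score, minimum=1, maximum=6):
    """Mirror a response on its scale, e.g. 1 <-> 6 on a six-point scale."""
    return minimum + maximum - score


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical responses of six students to one fixed- and one growth-mindset item.
fixed_item = [1, 2, 2, 4, 5, 6]
growth_item = [6, 4, 5, 4, 2, 2]

r_raw = pearson(fixed_item, growth_item)
r_reversed = pearson([reverse_code(s) for s in fixed_item], growth_item)
```

Because only the sign changes, a between-mindset correlation that is moderate before reversing remains moderate afterward; reverse coding alone cannot turn two correlated constructs into one.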
Second, the factor and network analyses suggested that adding items 7 and 8 to the six-item ITIS improves sub-scale reliabilities, yet all other psychometric properties were similar. Both versions exhibited non-invariance across educational levels so that a preference of one over the other for specific school or university samples did not become clear. The six- and eight-item ITIS seem applicable to the educational levels we have studied in our meta-analysis, yet not fully comparable across these levels.
Third, we encourage researchers who wish to use the ITIS in their studies to consider the context in which the ITIS is administered. Specifically, we showed that the context of the student sample explains heterogeneity in the meta-analytic data and that most parameters in the measurement models were not invariant across sub-groups. The functioning of the ITIS seems sensitive to the sample context and requires measurement invariance testing if group comparisons of mindsets are of interest. Moreover, the observation that measurement characteristics explained heterogeneity suggests, once more, that researchers should avoid assuming that mindset scores are fully comparable across assessment modes or scale modifications. From our perspective, being consistent with the mindset assessment in studies with multiple samples or conditions, testing for possible non-invariance as part of group comparisons, and reporting transparently the context of the study and sample are key to further mindset research.

Limitations and future directions
Our study has several limitations: First, our meta-analysis focused on the assessment of mindsets in education via the Implicit Theories of Intelligence Scale, originally developed by Dweck (2000), yet included neither other scales nor mindset assessments focusing on mathematics, reading, personality, or other key educational domains. We therefore encourage meta-analysts to initiate and extend item-level meta-analyses; this could shed light on the possible differences and similarities of different mindset assessments and their functioning across domains. Second, some item-item correlations had to be retrieved from the parameters of factor models. Although we chose the best-fitting model or EFA over CFA to approximate the observed correlation matrix, our meta-analytic sample comprised model-implied and observed correlation matrices. The extent to which this mixture may create bias toward a specific factor model in MASEM is to be clarified. Third, item-level meta-analyses require the transparent and sufficient reporting of psychometric properties, item-item correlations, or raw data in the primary studies. This requirement could create selection bias in the meta-analytic sample, preferring studies meeting the reporting requirements. Together with Plucker and Makel (2021), we therefore encourage educational researchers to adopt a culture of open science and transparency to circumvent such selection bias in subsequent meta-analyses.

Conclusion
Our item-level meta-analysis revealed that the most commonly used assessment of students' implicit theories of intelligence, the ITIS, measures two moderately correlated mindset factors, one corresponding to incremental theories and one to entity theories of intelligence. This finding held even after reverse-coding the item set of one of these theories and applied to all subgroups of primary studies, that is, across educational levels, origins of the student sample, assessment modes, and scale modifications. From the perspective of educational measurement, our study challenges current assessment practices of mindsets, especially the reverse-coding of fixed- or growth-mindset items under the assumption that the meanings of the items align after the reversing. However, it still needs to be clarified whether the two mindset factors also show differential relations to educational outcomes, such as academic achievement, and whether the cause of their existence is merely an artefact of mixing positive and negative item formulations in one scale. From a substantive perspective, we argue that the evidence we have presented in this study points to the possibility that students can hold both fixed and growth mindsets. In other words, the two implicit theories of intelligence are not mutually exclusive but can coexist. We therefore encourage educational researchers to assess both mindsets and represent them as two related but different constructs. Besides, the heterogeneity in the psychometric properties of the ITIS suggests that the sample, study, and measurement contexts matter when interpreting the resultant mindset scores. Our study further demonstrated the utility of item-level meta-analysis for reliability and validity generalization and for testing test theories about educational measurements and their applications.

Given the limited number of samples within this subgroup or unstable model parameter estimates, the stage-1 pooling of the correlation matrices was based on a multivariate fixed-effects model. To ensure the quality of these models, we considered them only if the fit indices suggested at least a reasonable model fit (see Supplementary Material S4 and S5).

RQ1. To what extent are the ITIS items correlated, and how do these correlations vary across study samples? (Pooled correlations and heterogeneity)
RQ2. To what extent can the single- and two-factor models represent the structure of the ITIS, and what characterizes the best-fitting model? (Factor structure and model parameters)
RQ3. Which study, sample, and measurement characteristics moderate the model parameters in the final ITIS factor model? (Moderation of model parameters)


Fig. 2. Distribution of study samples per (a) publication year and (b) country.

r = Average correlation under the random-effects model, 95% CI r = 95% confidence interval of r, τ² = Between-sample heterogeneity with confidence interval 95% CI τ², I² = Heterogeneity index. (R) = Items represented fixed mindsets and were treated as reverse-coded.

Fig. 3. Random-effects meta-analytic Gaussian network aggregation. Note. ITIS = Implicit Theories of Intelligence Scale. Items representing fixed mindsets were treated as reverse-coded.
Note. RMSEA = Root Mean Square Error of Approximation, df = degrees of freedom, 95% CI = 95% confidence interval, SRMR = Standardized Root Mean Squared Residual, CFI = Comparative Fit Index, AIC = Akaike's Information Criterion, BIC = Bayesian Information Criterion, N = Overall size of the study samples, m = Number of independent samples, l = Number of correlation coefficients, ITIS = Implicit Theories of Intelligence Scale.
Note. logL = Log-likelihood value, Npar = Number of parameters, AIC = Akaike's Information Criterion, BIC = Bayesian Information Criterion, χ²_LRT = Chi-square value with Δdf degrees of freedom based on the likelihood-ratio test (LRT). Model comparisons are conducted against the configural model.
RQ1. To what extent are the ITIS items correlated, and how do these correlations vary across study samples? (Pooled correlations and heterogeneity)
• Positive correlations among fixed-mindset items, r = .57-.68
• Positive correlations among growth-mindset items, r = .57-.68
• Positive but lower correlations between fixed- and growth-mindset items, r = .35-.45
• Substantial heterogeneity in item-item correlations between samples, I² = 76.5-95.4%
RQ2. To what extent can the single- and two-factor models represent the structure of the ITIS, and what characterizes the best-fitting model? (Factor structures)
• Preference of the two-factor model over the single-factor model for the full data sets
• Preference of the two-factor model over the single-factor model consistently across subgroups of primary studies
• Factor correlations between ρ = .42 and .80
• High scale reliabilities, ω = .78-.93
RQ3. Which study, sample, and measurement characteristics moderate the model parameters in the final ITIS factor model? (Moderation of model parameters)
• No moderation by the types of response scales
• Moderation of factor loadings and the factor correlation by the educational level and origin of the sample, assessment mode, and scale modifications

Table 2
Coded features of the primary studies, samples, and measures in the meta-analytic sample.
(a) Preregistration: we pre-registered the present study within the Open Science Framework (OSF), including the study goals, analytic approaches, search strategies, and screening procedures of primary literature (https://bit.ly/3p517BF); (b) Creating an open-access project: we created an open-access project within the OSF to disclose the analytic code, output files, and the data set underlying our study (https://bit.ly/3DzfwKm); (c) Providing supplementary material: to ensure that readers can access the data, analytic code, and output files directly, we submitted the respective files as supplementary material attached to this journal article (see Supplementary Material S1-S7).

Table 3
Pooled item-item correlation matrix and heterogeneity indices for the six-item ITIS.

Table 4
Pooled item-item correlation matrix and heterogeneity indices for the eight-item ITIS.
Note. Factor analyses were based on Oblimin rotation and maximum-likelihood estimation. (R) = Items represented fixed mindsets and were treated as reverse-coded. Factor loadings above 0.30 are in bold. MSA = Measure of sampling adequacy (Kaiser & Rice, 1974), RMSR = Root Mean Square of the Residuals, RMSEA = Root Mean Square Error of Approximation, df = degrees of freedom, ITIS = Implicit Theories of Intelligence Scale. *p < .001.

Table 6
Model fit indices and comparisons resulting from confirmatory factor analyses.

Table 7
Results of the measurement invariance testing across subgroups.

Table A2
Factor Correlations and Scale Reliability Coefficients for the Subgroups