Curate Science

Quantifying the Trustworthiness of Empirical Research.

Published scientific findings can only justifiably inform applied problems or theory if they are at minimum provisionally trustworthy. No platform, however, currently exists that tracks and incrementally quantifies the trustworthiness of empirical research. Curate Science is a unified framework to crowdsource the internal (transparency, analytic reproducibility, & robustness) and external (replicability & generalizability) trustworthiness of empirical research to accelerate our understanding of the world and development of applied solutions to medical and social problems.

UPDATE (June 13, 2017): New and much expanded curation framework (version 4.4.0) released (announcement; full details).


Base Case Calibration: Bem's (2011) Retroactive Recall
Example 1: Macbeth Effect
Example 2: Verbal Overshadowing Effect
Example 3: Infidelity Distress Effect

(For full details of our unified curation framework, please see here (version 4.4.0).)

Curate Science is a unified framework to quantify, in a crowdsourced manner, the internal and external trustworthiness of empirical research over time. Internal trustworthiness is quantified by estimating a study's transparency, analytic reproducibility, and analytic robustness. External trustworthiness is quantified by estimating an effect's replicability and generalizability.

Our unified framework needs to overcome several conceptual, epistemological, and statistical challenges:

  1. Quantifying internal trustworthiness requires a standardized workflow and principled metric to estimate the extent to which a study's results are (i) transparent (reporting standards, open materials, open data, & pre-registration), (ii) analytically reproducible, and (iii) analytically robust.
  2. Quantifying external trustworthiness of published findings requires a flexible (1) replication taxonomy and (2) workflow to connect follow-up studies to an original effect. Such a replication taxonomy allows us to (i) categorize replication attempts that are sufficiently methodologically similar to an original study to estimate replicability and (ii) categorize eligible generalizations of an effect to estimate an effect's generalizability across different methods, conditions, and populations. A standardized workflow is then needed to connect replication and generalization studies to an original effect in evidence collections, which form the basis for estimating replicability and generalizability.
  3. Accounting for transparency characteristics of original and replication studies, namely (i) verifiability (compliance with reporting, open materials, and open data standards), (ii) pre-registration status, and (iii) analytic reproducibility verification status, and accounting for study characteristics of replications, namely (iv) active sample evidence/positive controls, (v) investigator independence, and (vi) replication design differences.
  4. Developing a principled approach to (i) meta-analytically combining replication evidence within and across generalizations of an empirical effect and (ii) interpreting the overall meta-analytic estimate to quantify replicability and generalizability.
  5. Implementing a viable crowdsourcing web platform that (i) incentivizes the number and frequency of contributions and (ii) ensures quality-control of submitted content.
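As a rough illustration of challenges 2 and 3, a curated study could be stored as a record that carries its transparency and study characteristics, with direct replications selected by taxonomy label. The following is a minimal sketch, not the platform's actual data model: the class name, field names, helper function, and the two example records are all hypothetical, and only the "Exact", "Very Close", and "Close" labels come from the working taxonomy referred to on this page.

```python
from dataclasses import dataclass
from typing import List

# Labels treated as direct replications (taken from the page's working taxonomy).
DIRECT_TYPES = {"Exact", "Very Close", "Close"}


@dataclass
class CuratedStudy:
    """Hypothetical record for one original or replication study."""
    authors: str
    effect_size: float
    ci_half_width: float                      # the "± 95% CI" value
    replication_type: str                     # taxonomy label
    # Transparency characteristics (items i-iii in challenge 3)
    open_materials: bool = False
    open_data: bool = False
    preregistered: bool = False
    reproducibility_verified: bool = False
    # Study characteristics of replications (items iv-vi in challenge 3)
    active_sample_evidence: str = ""
    independent_investigators: bool = False
    design_differences: str = ""


def eligible_direct_replications(studies: List[CuratedStudy]) -> List[CuratedStudy]:
    """Keep only studies whose taxonomy label marks them as direct replications."""
    return [s for s in studies if s.replication_type in DIRECT_TYPES]


# Made-up records, for illustration only (not curated data).
studies = [
    CuratedStudy("Hypothetical Lab A", -0.10, 0.30, "Very Close",
                 open_data=True, preregistered=True,
                 independent_investigators=True),
    CuratedStudy("Hypothetical Lab B", 0.25, 0.40, "Conceptual"),  # assumed non-direct label
]

eligible = eligible_direct_replications(studies)
print([s.authors for s in eligible])
```

Keeping the characteristics as explicit fields makes it straightforward to filter or weight studies later, for instance by restricting an estimate to pre-registered replications run by independent investigators.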

(For full details of our unified curation framework, please see here (version 4.4.0). For experimental features under development, please see our sandbox.)

Large-Scale Replication Efforts (970 replications)
  • Reproducibility Project: Psychology [100 replications; view studies]
  • Social Psych Special Issue [31 replications]
    • Many Labs 1 [12 effects x 36 labs = 432 replications]
  • Many Labs 2 [26 effects, N = ~15,000]
  • Many Labs 3 [10 effects x 21 labs = 210 replications]
  • Many Labs 4: Impact of "expertise" on replicability
  • Many Labs 5: Can peer-review of protocols boost replicability?
  • Registered Replication Reports (RRRs) at Perspectives on Psychological Science
    • RRR1 & RRR2: Verbal overshadowing [23 replications; view studies]
    • RRR3: Grammar on intentionality [13 replications; view studies]
    • RRR4: Ego depletion [23 replications; view studies]
    • RRR5: Commitment on forgiveness [16 replications; view studies]
    • RRR6: Facial feedback hypothesis [17 replications; view studies]
    • RRR7: Intuitive-cooperation effect [20 replications]
    • RRR8: Trivial pursuit effect [data being analyzed]
    • RRR9: Hostility priming increases perceptions of hostility [data being collected]
    • RRR10: Moral reminder reduces cheating [data being collected]
    • RRR11: SNARC effect (Spatial-Numerical Association of Response Codes) [data being collected]
  • Economics Reproducibility Project [67 analytic reproductions; Chang & Li, 2015]
  • Economics Lab Experiments Replicability Project [18 replications; Camerer et al, 2016]

Last updated: June 27, 2017

All Replications (1049 replications; 382 curated, 668 being curated [see list])

List of known direct replications ("Exact", "Very Close", or "Close" direct replications) of generalizations of effects according to our working taxonomy. Please let us know about any missing replications or errors at curatescience-anti-bot-bit@gmail.com

Search via CTRL+F (Windows) or ⌘+F (Mac). Last updated: June 27, 2017

Original ⁄ Replication Study Authors Effect size [± 95% CI] Active Sample Evidence ⁄ Positive Controls Design Differences IVs DVs Other Outcomes
Relationships ⁄ Evolutionary Psychology
Commitment causally influences forgiveness effect (RRR5)
Finkel et al. (2002) Study 1 MD = -.65 ± .45 α = .78 on subjective commitment measure Same pattern of results for other 3 DVs
Bredow & Luna (2016) MD = -.42 ± .38 α = .90 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Carson, Corretti, & Kane (2016) MD = -.26 ± .39 α = .92 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Yong & Li (2016) MD = -.25 ± .46 α = .91 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Goldberg, Sinclair et al. (2016) MD = -.23 ± .38 α = .91 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Cheung, Campbell & LeBel (2016) MD = -.22 ± .41 α = .91 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Fuglestad, Leone & Kim (2016) MD = -.21 ± .58 α = .92 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Vranka, Bahnik, & Houdek (2016) MD = -.12 ± .43 α = .93 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Collins, Bowen, Winczewski et al. (2016) MD = -.11 ± .42 α = .94 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
DiDonato & Golom (2016) MD = -.04 ± .49 α = .91 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Sucharyna & Morry (2016) MD = -.03 ± .45 α = .90 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Carcedo & Fernandez-Rouco (2016) MD = +.03 ± .39 α = .86 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Aykutoglu, Uysal et al. (2016) MD = +.12 ± .43 α = .93 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Hoplock & Stinson (2016) MD = +.16 ± .35 α = .88 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Cobb, Pink, Millman & Logan (2016) MD = +.16 ± .38 α = .90 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Caprariello (2016) MD = +.23 ± .43 α = .90 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Tidwell & Kraus (2016) MD = +.29 ± .46 α = .90 on subjective commitment measure DV assessed via computer rather than paper-and-pencil
Sex differences in distress to infidelity (part of SP:Special Issue)
Buss et al. (1999) Study 2 (Young) d = +1.30 ± .28
IJzerman et al. (2014) Study 1 (Young) d = +1.11 ± .50
IJzerman et al. (2014) Study 2 (Young) d = +0.30 ± .28
IJzerman et al. (2014) Study 4 (Young) d = +0.50 ± .12
Shackelford et al. (2004) (Old) d = +0.57 ± .28
IJzerman et al. (2014) Study 3 (Old) d = -0.09 ± .34
IJzerman et al. (2014) Study 4 (Old) d = +0.05 ± .28
Playboy effect
Kenrick et al. (1989) Study 2 Δd = -.53 ± .64 Playboy centerfolds vs. control; Participant sex Love for partner (Rubin Love-scale)
Balzarini et al. (2015) Study 1 Δd = +.29 ± .46 Nudes rated as more pleasant than abstract art Updated pictures of abstract art & male/female nudes; Two attention check questions Playboy centerfolds vs. control; Participant sex Love for partner (Rubin Love-scale)
Balzarini et al. (2015) Study 2 Δd = +.30 ± .42 Nudes rated as more pleasant than abstract art Updated pictures of abstract art & male/female nudes; Two attention check questions Playboy centerfolds vs. control; Participant sex Love for partner (Rubin Love-scale)
Balzarini et al. (2015) Study 3 Δd = -.38 ± .46 Nudes rated as more pleasant than abstract art Updated pictures of abstract art & male/female nudes; Two attention check questions Playboy centerfolds vs. control; Participant sex Love for partner (Rubin Love-scale)
Romeo and Juliet effect (part of SP:Special Issue)
Driscoll et al. (1972) r = +.34 ± .32
Sinclair et al. (2014) r = -.05 ± .10 Parental interference negatively associated with partner trust r = -.18 p < .001 Two other outcomes (commitment; trust)
Self ⁄ Emotions
Facial feedback hypothesis (RRR6)
Strack et al. (1988) Study 1 MD = +.82 ± .77
Özdogru (2016) MD = -.58 ± .83 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Oosterwijk et al. (2016) MD = -.24 ± .52 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Wayand (2016) MD = -.20 ± .54 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Koch (2016) MD = -.19 ± .54 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Pacheco-Unguetti et al. (2016) MD = -.13 ± .63 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Capaldi et al. (2016) MD = -.11 ± .58 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Chasten et al. (2016) MD = -.05 ± .60 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Benning et al. (2016) MD = -.02 ± .50 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Talarico & DeCicco (2016) MD = +.02 ± .54 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Korb et al. (2016) MD = +.02 ± .67 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Bulnes et al. (2016) MD = +.12 ± .55 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Albohn et al. (2016) MD = +.14 ± .53 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Wagenmakers et al. (2016) MD = +.15 ± .42 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Allard & Zetzer (2016) MD = +.16 ± .59 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Holmes et al. (2016) MD = +.20 ± .55 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Zeelenberg et al. (2016) MD = +.35 ± .53 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Lynott et al. (2016) MD = +.37 ± .55 New set of Far Side cartoons normed to be moderately funny; instructions via video to minimize experimenter effects
Positive mood boosts helping effect
Isen & Levin (1972) Study 2 PD = +84% ± 18%
Blevins & Murphy (1974) PD = +03% ± 30%
Levin & Isen (1975) Study 1 PD = +60% ± 30%
Weyant & Clark (1977) Study 1 PD = +25% ± 29%
Weyant & Clark (1977) Study 2 PD = -07% ± 16%
Ego depletion effect (includes RRR4)
Gailliot, Baumeister et al. (2007) Study 7 d = -1.19 ± .52 sugar vs. splenda; video attention task vs. control Stroop performance
Cesario & Corker (2010) d = +0.22 ± .34 Positive correlation between baseline & post-manipulation error rates, r = .36, p < .001 No manipulation check sugar vs. splenda; video attention task vs. control Stroop performance
Wang & Dvorak (2010) d = -0.99 ± .52 sugar vs. splenda; future-discounting t1 vs. t2 future-discounting task
Lange & Eggert (2014) Study 1 d = +0.13 ± .48 Test-retest reliability of r = .80 across t1 and t2 scores Different choices in future-discounting task sugar vs. splenda; future-discounting t1 vs. t2 future-discounting task
Muraven, Tice et al. (1998) Study 2 d = -0.75 ± .71 thought suppression vs. control anagram performance
Murtagh & Todd (2004) Study 2 d = -0.01 ± .55 Very difficult solvable anagrams used rather than "unsolvable" thought suppression vs. control anagram performance
Schmeichel, Vohs et al. (2003) Study 1 d = -1.58 ± .98 video attention task vs. control GRE standardized test
Pond et al. (2011) Study 3 d = -0.35 ± .52 10 verbal GRE items used (instead of 13 analytic GRE items) video attention task vs. control GRE standardized test
Schmeichel (2007) Study 1 d = -0.37 ± .44 video attention task vs. control working memory (OSPAN)
Healy et al. (2011) Study 1 d = -1.31 ± .71 % of target words recalled (rather than total) video attention task vs. control working memory (OSPAN)
Carter & McCullough (2013) d = +0.05 ± .45 Effortful essay task vs. control in between IV and DV (perfectly confounded w/ IV) video attention task vs. control working memory (OSPAN)
Lurquin et al. (2016) d = +0.21 ± .28 Main effect of OSPAN set sizes on performance, F(1, 199) = 4439.81, p < .001 40 target words in OSPAN (rather than 48) video attention task vs. control working memory (OSPAN)
Inzlicht & Gutsell (2007) d = -1.06 ± .71 emotion suppression (video) vs. control EEG ERN during stroop task
Wang, Yang, & Wang (2014) d = -0.93 ± .73 emotion suppression (video) vs. control EEG ERN during stroop task
Sripada, Kessler, & Jonides (2014) d = -0.69 ± .59 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Ringos & Carlucci (2016) d = -0.50 ± .48 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Wolff, Muzzi & Brand (2016) d = -0.46 ± .43 German language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Calvillo & Mills (2016) d = -0.44 ± .56 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Crowell, Finley et al. (2016) d = -0.40 ± .46 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Lynch, vanDellen et al. (2016) d = -0.36 ± .44 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Birt & Muise (2016) d = -0.31 ± .52 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Yusainy, Wimbarti et al. (2016) d = -0.22 ± .31 Indonesian language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Lau & Brewer (2016) d = -0.20 ± .40 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Ullrich, Primoceri et al. (2016) d = -0.09 ± .39 German language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Elson (2016) d = -0.04 ± .42 German language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Cheung, Kroese et al. (2016) d = -0.04 ± .29 Dutch language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Hagger et al. (2016) d = +0.00 ± .39 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Schlinkert, Schrama et al. (2016) d = +0.00 ± .44 Dutch language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Philipp & Cannon (2016) d = +0.04 ± .45 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Carruth & Miyake (2016) d = +0.09 ± .36 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Brandt (2016) d = +0.11 ± .39 Dutch language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Stamos, Bruyneel et al. (2016) d = +0.12 ± .41 Dutch language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Rentzsch, Nalis et al. (2016) d = +0.18 ± .39 German language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Francis & Inzlicht (2016) d = +0.18 ± .56 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Lange, Heise et al. (2016) d = +0.23 ± .38 German language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Evans, Fay, & Mosser (2016) d = +0.27 ± .42 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Tinghög & Koppel (2016) d = +0.40 ± .43 effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Otgaar, Martijn et al. (2016) d = +0.41 ± .50 Dutch language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Muller, Zerhouni et al. (2016) d = +0.51 ± .46 French language effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
High-approach positive affect constricts attention effect
Gable & Harmon-Jones (2008) Study 2 η² = .53 [.26,.67] Neutral mood global bias effect observed, p < .01 Affect (High-approach positive affect vs. neutral) Attentional breadth (Navon, 1977 letter task RTs)
Domachowska et al. (2016) Study 1 η² = .24 [.09,.39] Neutral mood global bias effect observed, p < .01; Successful valence manipulation check, dz = 1.14 German language; Mood (PANAS) assessed at study outset Affect (High-approach positive affect vs. neutral) Attentional breadth (Navon, 1977 letter task RTs)
Domachowska et al. (2016) Study 2 η² = .08 [.00,.21] Neutral mood global bias effect observed, p < .05; Successful valence manipulation check, dz = 2.51 German-normed stimuli; Low-approach positive affect within-subject condition also included Affect (High-approach positive affect vs. neutral) Attentional breadth (Navon, 1977 letter task RTs)
Attitudes & Stereotypes
NFC amplifies strong argument persuasion effect (ELM, part of ML3)
Cacioppo et al. (1983) Study 1 ηp² = .170 [.06,.29]
Wirth (2016, OSUNewark) ηp² = .000 [.00,.00]
Fletcher et al. (2016, UofSM) ηp² = .000 [.00,.01]
Devos et al. (2016, SDSU) ηp² = .001 [.00,.02]
Vaughn (2016, Ithaca) ηp² = .001 [.00,.04]
Banks et al. (2016, NovaSU) ηp² = .001 [.00,.04]
German et al. (2016, UCDavis) ηp² = .001 [.00,.04]
Re et al. (2016, UofT) ηp² = .001 [.00,.06]
Capaldi et al. (2016, Carleton) ηp² = .001 [.00,.08]
Davis & Hicks (2016, TexasA&M) ηp² = .002 [.00,.04]
Brown et al. (2016, UofF) ηp² = .002 [.00,.04]
Grahe et al. (2016, PLU) ηp² = .002 [.00,.05]
Bernstein (2016, PSA) ηp² = .003 [.00,.05]
Hermann et al. (2016, Bradley) ηp² = .004 [.00,.06]
Baranski et al. (2016, UCRiverside) ηp² = .005 [.00,.04]
Ebersole et al. (2016, UofV) ηp² = .005 [.00,.05]
Johnson et al. (2016, MichiganSU) ηp² = .006 [.00,.06]
Bonfiglio et al. (2016, Ashland) ηp² = .007 [.00,.08]
Allen (2016, MontanaSU) ηp² = .009 [.00,.07]
Belanger et al. (2016, Miami) ηp² = .010 [.00,.08]
Cairo et al. (2016, VCU) ηp² = .024 [.00,.11]
1/f noise racial bias moderation effect
Correll (2008) d = +.59 ± .51
Madurski & LeBel (2015) Study 1 d = +.16 ± .34 Racial bias (in terms of RT) higher in use and avoid race (compared to control) conditions (d = .34 ± .35)
Madurski & LeBel (2015) Study 2 d = -.09 ± .34 Racial bias (in terms of RT) higher in use and avoid race (compared to control) conditions (d = .44 ± .35)
Applied Findings
Verbal overshadowing (RRR1 & RRR2)
Schooler & Eng...-Schooler (1990) Study 1 (RRR2) IRD = -25% ± 20%
Poirier et al. (2014) Study 2 IRD = -29% ± 20%
Delvenne et al. (2014) Study 2 IRD = -26% ± 19%
Birt & Aucoin (2014) Study 2 IRD = -25% ± 24%
Susa et al. (2014) Study 2 IRD = -24% ± 18%
Carlson et al. (2014) Study 2 IRD = -24% ± 15%
Musselman & Colarusso (2014) Study 2 IRD = -23% ± 24%
Echterhoff & Kopietz (2014) Study 2 IRD = -22% ± 20%
Mammarella et al. (2014) Study 2 IRD = -22% ± 19%
Dellapaolera & Bornstein (2014) Study 2 IRD = -21% ± 16%
Mitchell & Petro (2014) Study 2 IRD = -20% ± 19%
Ulatowska & Cislak (2014) Study 2 IRD = -17% ± 19%
Wade et al. (2014) Study 2 IRD = -17% ± 17%
Birch (2014) Study 2 IRD = -16% ± 18%
McCoy & Rancourt (2014) Study 2 IRD = -15% ± 21%
Greenberg et al. (2014) Study 2 IRD = -13% ± 25%
Alogna et al. (2014) Study 2 IRD = -12% ± 19%
Michael et al. (2014, mTURK) Study 2 IRD = -11% ± 10%
Koch et al. (2014) Study 2 IRD = -10% ± 25%
Thompson (2014) Study 2 IRD = -09% ± 22%
Rubinova et al. (2014) Study 2 IRD = -03% ± 18%
Brandimonte (2014) Study 2 IRD = -02% ± 20%
Eggleston et al. (2014) Study 2 IRD = -02% ± 20%
Kehn et al. (2014) Study 2 IRD = -01% ± 20%
Schooler & Eng...-Schooler (1990) Study 4 (RRR1) IRD = -22% ± 22%
Leite (2014) Study 1 IRD = -18% ± 19%
Echterhoff & Kopietz (2014) Study 1 IRD = -16% ± 20%
Musselman & Colarusso (2014) Study 1 IRD = -16% ± 18%
McCoy & Rancourt (2014) Study 1 IRD = -15% ± 18%
Dellapaolera & Bornstein (2014) Study 1 IRD = -15% ± 16%
Alogna et al. (2014) Study 1 IRD = -13% ± 18%
Poirier et al. (2014) Study 1 IRD = -13% ± 18%
Carlson et al. (2014) Study 1 IRD = -13% ± 16%
Mammarella et al. (2014) Study 1 IRD = -12% ± 13%
Chu, Marsh, & Skelton (2014) Study 1 IRD = -09% ± 19%
Greenberg et al. (2014) Study 1 IRD = -08% ± 20%
Wade et al. (2014) Study 1 IRD = -08% ± 18%
Eggleston et al. (2014) Study 1 IRD = -07% ± 16%
Verkoeijen et al. (2014) Study 1 IRD = -05% ± 19%
Kehn et al. (2014) Study 1 IRD = -05% ± 17%
Birt & Aucoin (2014) Study 1 IRD = -03% ± 18%
Palmer et al. (2014) Study 1 IRD = -02% ± 18%
Was et al. (2014) Study 1 IRD = -02% ± 17%
McIntyre, Langton et al. (2014) Study 1 IRD = -01% ± 18%
Birch (2014) Study 1 IRD = +00% ± 18%
Susa et al. (2014) Study 1 IRD = +00% ± 18%
Michael et al. (2014) Study 1 IRD = +00% ± 15%
Delvenne et al. (2014) Study 1 IRD = +02% ± 17%
Gabbert & Valentine (2014) Study 1 IRD = +03% ± 19%
Ulatowska & Cislak (2014) Study 1 IRD = +04% ± 17%
Mitchell & Petro (2014) Study 1 IRD = +04% ± 18%
Michael et al. (2014, mTURK) Study 1 IRD = +06% ± 10%
Rubinova et al. (2014) Study 1 IRD = +07% ± 18%
Koch et al. (2014) Study 1 IRD = +08% ± 20%
Brandimonte (2014) Study 1 IRD = +10% ± 16%
Edlund & Nichols (2014) Study 1 IRD = +12% ± 19%
Thompson (2014) Study 1 IRD = +14% ± 19%
Grammar influences perceived intentionality (RRR3)
Hart & Albarracín (2011) Study 3 MD = +1.20 ± .88 Same pattern of results for other 2 DVs (detailed processing; intention attribution)
Berger (2016) MD = -0.98 ± .74
Knepp (2016) MD = -0.95 ± .63
Michael (2016) MD = -0.41 ± .64
Prenoveau & Carlucci (2016) MD = -0.38 ± .68
Birt & Aucoin (2016) MD = -0.38 ± .59
Arnal (2016) MD = -0.35 ± .72
Eerland, Sherrill et al. (2016, online) MD = -0.33 ± .29
Kurby & Kibbe (2016) MD = -0.14 ± .60
Ferretti (2016) MD = -0.01 ± .42
Eerland, Sherrill et al. (2016) MD = +0.16 ± .65
Poirier, Capezza, & Crocker (2016) MD = +0.32 ± .66
Melcher (2016) MD = +0.65 ± .92
Reading fiction boosts empathy effect
Kidd & Castano (2013) Study 1 d = +.56 ± .43 Literary fiction vs. nonfiction RMET
Panero et al. (2016) Study 1 d = -.08 ± .23 Literary fiction vs. nonfiction RMET
Kidd & Castano (2013) Study 3 d = +.36 ± .21 Literary fiction vs. popular fiction RMET
Dijkstra et al. (2015) d = +.14 ± .23 Literary fiction vs. popular fiction RMET
Panero et al. (2016) Study 2 d = +.04 ± .22 Literary fiction vs. popular fiction RMET
Kidd & Castano (2013) Study 5 d = +.25 ± .25 Literary fiction vs. no reading RMET
Panero et al. (2016) Study 3 d = +.10 ± .20 Literary fiction vs. no reading RMET
Pre-cognition
Bem (2011) Study 1 d = +.25
Wagenmakers et al. (2012) d = -.05
Bem (2011) Study 8 DR% = +2.27 ± 2.27
Galak et al. (2012) Study 1 DR% = -1.21 ± 1.55
Galak et al. (2012) Study 2 DR% = +0.00 ± 1.35
Galak et al. (2012) Study 3 DR% = +1.17 ± 1.43
Galak et al. (2012) Study 7 DR% = -0.05 ± .100
Bem (2011) Study 9 DR% = +4.21 ± 3.01
Galak et al. (2012) Study 4 DR% = +1.59 ± 1.11
Galak et al. (2012) Study 5 DR% = -0.49 ± 1.45
Galak et al. (2012) Study 6 DR% = -0.29 ± 1.52
Ritchie et al. (2012) Study 1 DR% = +0.19 ± 3.45
Ritchie et al. (2012) Study 2 DR% = -2.72 ± 3.48
Ritchie et al. (2012) Study 3 DR% = -0.58 ± 3.46
Robinson (2011) DR% = -1.60 ± 3.45
Social Priming / Embodiment
Power posing effect
Carney et al. (2010, Risk-taking) d = +.61 ± .51
Ranehill et al. (2015) d = -.20 ± .20
Garrison et al. (2016) d = -.21 ± .30
Elderly priming
Bargh et al. (1996) Study 2a d = 1.02 ± .76
Bargh et al. (1996) Study 2b d = .77 ± .74
Hull et al. (2002) Study 1a d = .56 ± .72
Hull et al. (2002) Study 1b d = .53 ± .63
Cesario et al. (2007) Study 2 d = .22 ± .58
Pashler et al. (2008) d = -.22 ± .48
Doyen et al. (2012) Study 1 d = -.07 ± .36
Embodiment of secrecy effects
Slepian et al. (2012) Study 1 d = +.78 ± .62
Perfecto et al. (2012) d = +.19 ± .22
LeBel & Wilbur (2014) Study 1 d = +.18 ± .25
LeBel & Wilbur (2014) Study 2 d = -.32 ± .41
Pecher et al. (2015) Study 1 d = +.08 ± .39
Pecher et al. (2015) Study 2 d = +.21 ± .39
Slepian et al. (2012) Study 2 d = +.81 ± .73
Cobb et al. (2014) d = +.31 ± .52
Pecher et al. (2015) Study 3 d = +.21 ± .36
Macbeth effect
Zhong & Liljenquist (2006) Study 2 r = +.45 ± .31
Gamez et al. (2011) Study 2 r = +.04 ± .33
Siev (2012) Study 1 r = -.04 ± .11
Siev (2012) Study 2 r = -.09 ± .16
Earp et al. (2014) Study 1 r = +.00 ± .16
Earp et al. (2014) Study 2 r = -.07 ± .16
Earp et al. (2014) Study 3 r = -.11 ± .11
Zhong & Liljenquist (2006) Study 3 r = +.38 ± .30
Fayard et al. (2009) Study 1 r = +.03 ± .14
Gamez et al. (2011) Study 3 r = +.15 ± .29
Zhong & Liljenquist (2006) Study 4 r = +.33 ± .26
Fayard et al. (2009) Study 2 r = +.01 ± .18
Gamez et al. (2011) Study 4 r = +.19 ± .36
Reuven et al. (2013) r = +.39 ± .31
Cleanliness priming effect
Schnall et al. (2008a) Study 1 d = -.59 ± .55
Besman et al. (2013) d = -.44 ± .50
Lee et al. (2013) d = -.10 ± .41
Arbesfeld et al. (2014) d = -.47 ± .51
Huang (2014) Study 1 d = -.20 ± .29
Johnson et al. (2014a) Study 1 d = -.01 ± .27
Johnson et al. (2014b) d = +.05 ± .14
Schnall et al. (2008a) Study 2 d = -.84 ± .62
Johnson et al. (2014a) Study 2 d = +.02 ± .35
Money priming effect (part of ML1)
Vohs et al. (2006) Study 3 d = +.65 ± .65
Grenier et al. (2012) d = +.07 ± .62
Caruso et al. (2013) Study 2 d = +.43 ± .30
Rohrer et al. (2015) Study 2 d = +.06 ± .19
Schuler & Wänke (in press) Study 2 d = -.09 ± .39
Caruso et al. (2013) Study 3 d = +.49 ± .44
Rohrer et al. (2015) Study 3 d = -.06 ± .31
Caruso et al. (2013) Study 4 d = +.69 ± .58
Rohrer et al. (2015) Study 4 d = +.13 ± .37
Caruso et al. (2013) Study 1 d = +.77 ± .74
Hunt & Krueger (2014) d = -.26 ± .42
Cheong (2014) d = -.24 ± .39
Devos (2014) d = -.21 ± .40
Swol (2014) d = -.20 ± .40
John & Skorinko (2014) d = -.19 ± .42
Davis & Hicks (2014) Study 1 d = -.16 ± .39
Kappes (2014) d = -.16 ± .24
Klein et al. (2014) d = -.15 ± .35
Vranka (2014) d = -.14 ± .43
Packard (2014) d = -.14 ± .37
Cemalcilar (2014) d = -.13 ± .36
Bocian & Frankowska (2014) Study 2 d = -.12 ± .30
Rohrer et al. (2015) Study 1 d = -.07 ± .34
Huntsinger & Mallett (2014) d = -.07 ± .33
Schmidt & Nosek (2014, MTURK) d = -.06 ± .12
Hovermale & Joy-Gaba (2014) d = -.05 ± .40
Vianello & Galliani (2014) d = -.04 ± .34
Schmidt & Nosek (2014, Proj Impl) d = -.03 ± .11
Bernstein (2014) d = +.00 ± .43
Adams & Nelson (2014) d = +.01 ± .40
Rutchick (2014) d = +.01 ± .40
Vaughn (2014) d = +.01 ± .42
Levitan (2014) d = +.03 ± .36
Brumbaugh & Storbeck (2014) Study 1 d = +.03 ± .39
Smith (2014) d = +.10 ± .38
Kurtz (2014) d = +.11 ± .30
Brumbaugh & Storbeck (2014) Study 2 d = +.13 ± .43
Davis & Hicks (2014) Study 2 d = +.18 ± .26
Pilati (2014) d = +.18 ± .36
Wichman (2014) d = +.18 ± .40
Furrow & Thompson (2014) d = +.19 ± .43
Brandt et al. (2014) d = +.21 ± .44
Bocian & Frankowska (2014) Study 1 d = +.21 ± .45
Nier (2014) d = +.23 ± .40
Woodzicka (2014) d = +.24 ± .41
Schmidt & Nosek (2014) d = +.30 ± .44
Morris (2014) d = +.40 ± .40
Embodiment of physical warmth
Bargh & Shalev (2012) Study 1a r = +.57 ± .19
Bargh & Shalev (2012) Study 1b r = +.37 ± .20
Donnellan et al. (2015a) Study 9 r = -.13 ± .14
Donnellan et al. (2015a) Study 4 r = -.10 ± .13
Donnellan et al. (2015a) Study 1 r = -.06 ± .13
Donnellan et al. (2015b) r = -.04 ± .11
Ferrell et al. (2013) r = -.03 ± .10
McDonald & Donnellan (2015) r = -.02 ± .10
Donnellan et al. (2015a) Study 2 r = -.01 ± .09
Donnellan et al. (2015a) Study 8 r = +.02 ± .10
Donnellan et al. (2015a) Study 7 r = +.02 ± .11
Donnellan & Lucas (2014) r = +.04 ± .08
Donnellan et al. (2015a) Study 6 r = +.06 ± .08
Donnellan et al. (2015a) Study 5 r = +.10 ± .09
Donnellan et al. (2015a) Study 3 r = +.13 ± .13
Bargh & Shalev (2012) Study 2 r = +.29 ± .21
Wortman et al. (2014) r = +.01 ± .11
Williams & Bargh (2008a) Study 2 OR = 3.52 [1.06,11.73]
Lynott et al. (2014) Study 1 (Kenyon) OR = 0.61 [0.38, 0.98] Hot pack rated as warmer than the cold pack, d = 2.50
Lynott et al. (2014) Study 2 (MSU) OR = 0.92 [0.56, 1.53] Hot pack rated as warmer than the cold pack, d = 2.22
Lynott et al. (2014) Study 3 (Manchester) OR = 0.77 [0.58, 1.02] Hot pack rated as warmer than the cold pack, d = 2.61
Vess (2012) Study 1 d = +.60 ± .55
LeBel & Campbell (2013) Study 1 d = +.03 ± .27 Known sex differences in food preferences (women liked vegetables, fruits, candy, and wine more than men)
LeBel & Campbell (2013) Study 2 d = +.05 ± .26 Known sex differences in food preferences (women liked vegetables, fruits, candy, and wine more than men)
Moral Psychology
Psychological distance increases wrongness of immoral acts (SP:SI)
Eyal, Liberman, & Trope (2008) Study 2 d = +.66 ± .27
Zezelj & Jokic (2015) Study 1 d = -.06 ± .37
Eyal, Liberman, & Trope (2008) Study 3 d = +.71 ± .17
Zezelj & Jokic (2015) Study 2 d = +.61 ± .14
Eyal, Liberman, & Trope (2008) Study 4 d = +.80 ± .26
Zezelj & Jokic (2015) Study 3 d = -.26 ± .26
Gong & Medin (2012) Study 1 d = -.34 ± .24
Zezelj & Jokic (2015) Study 4 d = -.68 ± .36

Cleanliness priming -- Replications (7)  
Schnall, Benton, & Harvey (2008a)
With a Clean Conscience: Cleanliness Reduces the Severity of Moral Judgments
DOI:10.1111/j.1467-9280.2008.02227.x  

Original Studies & Replications N Effect size (d) [95% CI]
Schnall et al. (2008a) Study 1 40
Arbesfeld et al. (2014) 60
Besman et al. (2013) 60
Huang (2014) Study 1 189
Lee et al. (2013) 90
Johnson et al. (2014a) Study 1 208
Johnson et al. (2014b) 736
Current meta-analytic estimate of replications of SBH's Study 1 (random-effects):
Schnall et al. (2008a) Study 2 43
Johnson et al. (2014a) Study 2 126
Current meta-analytic estimate of all replications (random-effects):
[Underlying data (CSV)] [R-code]
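To make the "random-effects" estimates above more concrete, here is a minimal sketch of a DerSimonian-Laird random-effects meta-analysis applied to the six replications of Schnall et al.'s (2008a) Study 1. It is not the project's own R code. It assumes that each "±" value in the replication list on this page is a symmetric 95% CI half-width under a normal approximation, so that the standard error is the half-width divided by 1.96. The function name and return format are hypothetical.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% interval


def random_effects_meta(effects, half_widths):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled estimate, 95% CI half-width, tau^2).
    Assumes each half-width is a symmetric 95% CI half-width.
    """
    if len(effects) != len(half_widths) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    variances = [(hw / Z95) ** 2 for hw in half_widths]

    # Fixed-effect (inverse-variance) step, needed for the Q statistic.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)

    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sws
    se = math.sqrt(1.0 / sws)
    return pooled, Z95 * se, tau2


# d and ± values as listed on this page for replications of Study 1:
# Besman, Lee, Arbesfeld, Huang Study 1, Johnson 2014a Study 1, Johnson 2014b.
effects = [-0.44, -0.10, -0.47, -0.20, -0.01, 0.05]
half_widths = [0.50, 0.41, 0.51, 0.29, 0.27, 0.14]

pooled, half_width, tau2 = random_effects_meta(effects, half_widths)
print(f"d = {pooled:+.2f} ± {half_width:.2f} (tau^2 = {tau2:.3f})")
```

Because the inputs are rounded and the interval assumption is approximate, the output should be read as illustrative and may differ from the estimate computed from the underlying data.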

Summary (Last updated: April 7, 2016): The main finding that cleanliness priming reduces the severity of moral judgments does not (yet) appear to be replicable (overall meta-analytic effect: r = -.08 [+/-.13]). In a follow-up commentary, Schnall argued that a ceiling effect in Johnson et al.'s (2014a) studies renders their results uninterpretable and hence that their replication results should be dismissed. However, independent re-analyses by Simonsohn, Yarkoni, Schönbrodt, Inbar, Fraley, and Simkovic appear to rule out such a ceiling effect explanation; hence, Johnson et al.'s (2014a) results should be retained in gauging the replicability of the original cleanliness priming effect. Of course, it's possible "cleanliness priming" may be replicable under different operationalizations, conditions, and/or experimental designs (e.g., within-subjects). Indeed, Huang (2014) has reported new evidence suggesting cleanliness priming may only reduce severity of moral judgments under conditions of "low response effort"; however, the research appears to be low-powered (<50%) to detect the small interaction effect found (r = .12). Regardless, independent corroboration of Huang's interaction effect is required before confidence is placed in such a moderated cleanliness priming effect.

Original authors' and replicators' comments: F. Cheung mentioned a note should be added that data for the Besman et al. (2013) replication has been lost (communicated to him by K. Daubman, who has not yet responded to my request for links to original data of both her Arbesfeld et al. and Besman et al. replications). M. Frank mentioned we should consider including some of Huang's (2014) studies (baseline un-moderated conditions only), which led us to add Huang's Study 1 (only study with baseline condition comparable to Schnall et al.'s Study 1 design). S. Schnall has yet to respond (email sent March 11, 2016).

Related Commentary

Money priming -- Replications (42)  
Vohs, Mead, & Goode (2006) 
The psychological consequences of money
Caruso, Vohs, Baxter, & Waytz (2013) 
Mere exposure to money increases endorsement of free-market systems and social inequality

Original Studies & Replications N Effect size (d) [95% CI]
Vohs et al. (2006) Study 3 39
Grenier et al. (2012) 40
Caruso et al. (2013) Study 2 168
Schuler & Wänke (in press) Study 2 115
Rohrer et al. (2015) Study 2 420
Current meta-analytic estimate of replications of CVBW's Study 2 (random-effects):
Caruso et al. (2013) Study 3 80
Rohrer et al. (2015) Study 3 156
Caruso et al. (2013) Study 4 48
Rohrer et al. (2015) Study 4 116
Caruso et al. (2013) Study 1 30
Hunt & Krueger (2014) 87
Cheong (2014) 102
Devos (2014) 162
Swol (2014) 96
John & Skorinko (2014) 87
Davis & Hicks (2014) Study 1 187
Kappes (2014) 277
Klein et al. (2014) 127
Packard (2014) 112
Vranka (2014) 84
Cemalcilar (2014) 113
Bocian & Frankowska (2014) Study 2 169
Huntsinger & Mallett (2014) 146
Rohrer et al. (2015) Study 1 136
Schmidt & Nosek (2014, MTURK) 1000
Hovermale & Joy-Gaba (2014) 108
Vianello & Galliani (2014) 144
Schmidt & Nosek (2014, PI) 1329
Bernstein (2014) 84
Adams & Nelson (2014) 95
Rutchick (2014) 96
Vaughn (2014) 90
Levitan (2014) 123
Brumbaugh & Storbeck (2014) Study 1 103
Smith (2014) 107
Kurtz (2014) 174
Brumbaugh & Storbeck (2014) Study 2 86
Wichman (2014) 103
Pilati (2014) 120
Davis & Hicks (2014) Study 2 225
Furrow & Thompson (2014) 85
Bocian & Frankowska (2014) Study 1 79
Brandt et al. (2014) 80
Nier (2014) 95
Woodzicka (2014) 90
Schmidt & Nosek (2014) 81
Morris (2014) 98
Current meta-analytic estimate of replications of CVBW's Study 1 (random-effects):
Current meta-analytic estimate of all replications (random-effects):
[Underlying data (CSV)] [R-code]

Summary (Last updated: March 24, 2016): The claim that incidental exposure to money influences social behavior and beliefs does not (yet) appear to be replicable (overall meta-analytic effect: d = -.01 [+/-.05]). This appears to be the case whether money exposure is manipulated via instruction background images (Caruso et al., 2013, Study 1 & 4) or descrambling sentence task (Vohs et al., 2006, Study 3) and whether outcome variable is helping others (Vohs et al., 2006, Study 3), system justification beliefs (Caruso et al., 2013, Study 1), just world beliefs (Caruso et al., 2013, Study 2), social dominance beliefs (Caruso et al., 2013, Study 3), or fair market beliefs (Caruso et al., 2013, Study 4). Of course, it's possible money exposure reliably influences behavior under other (currently unknown) conditions, via other operationalizations, and/or using other experimental designs (e.g., within-subjects).
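As an illustration of how a random-effects meta-analytic estimate of the kind reported above can be computed, here is a minimal sketch using the DerSimonian-Laird estimator. This is an assumption: the page does not state which estimator its R-code uses, and the effect sizes and variances in the example are hypothetical placeholders rather than values from the collection.

```python
import math

def random_effects_estimate(effects, variances, z_crit=1.96):
    """Pool study effect sizes with the DerSimonian-Laird random-effects model.

    Returns (pooled estimate, 95% CI half-width, between-study variance tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                     # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments estimate of tau^2, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, z_crit * se, tau2

# Hypothetical replication effect sizes (d) and their sampling variances
example_d = [0.10, -0.05, 0.02]
example_var = [0.04, 0.02, 0.01]
estimate, margin, tau2 = random_effects_estimate(example_d, example_var)
print(f"d = {estimate:.2f} [+/-{margin:.2f}], tau^2 = {tau2:.3f}")
```

The returned margin corresponds to the "[+/-...]" notation used in the summaries, under the assumption that those values are 95% confidence interval half-widths.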

Original authors' comments: K. Vohs responded and mentioned Schuler & Wänke's (in press) replication of Caruso et al. (2013) was missing; this led us to add Schuler & Wänke (in press) Study 2 (main effect) as a direct replication of Caruso et al. (2013) Study 2. Vohs pointed out several design differences between Grenier et al. (2012) and Vohs et al.'s (2006) original Study 3, but these deviations are minor (e.g., different priming stimuli, different help target); given Grenier et al. (2012) used the same general methodology as Vohs et al. (2006) Study 3 for the independent variable (unscrambling priming task) and dependent variable (offering help to code data sheets), the study satisfies eligibility criteria for a sufficiently similar direct replication according to Curate Science's taxonomy and hence was retained. Vohs also pointed out design differences between Tate (2009) and Vohs et al. (2006) Study 3; given Tate (2009) employed a different general methodology for the IV (background image on a poster instead of unscrambling task), the study does *not* satisfy eligibility criteria for a direct replication and hence was excluded. Finally, Vohs mentioned that "replication studies" for Vohs et al. (2006) are reported in Vohs (2015); however, none of these studies were sufficiently similar methodologically to meet direct replication eligibility criteria and hence were not added.

Related Commentary

Macbeth effect -- Replications (11)  
Zhong & Liljenquist (2006)
Washing away your sins: Threatened morality and physical cleansing
DOI:10.1126/science.1130726  

Original Studies & Replications N Effect size (r) [95% CI]
Zhong & Liljenquist (2006) Study 2 27
Earp et al. (2014) Study 3 286
Siev (2012) Study 2 148
Earp et al. (2014) Study 2 156
Siev (2012) Study 1 335
Earp et al. (2014) Study 1 153
Gamez et al. (2011) Study 2 36
Current meta-analytic estimate of replications of Z&L's Study 2 (random-effects):
Zhong & Liljenquist (2006) Study 3 32
Fayard et al. (2009) Study 1 210
Gamez et al. (2011) Study 3 45
Current meta-analytic estimate of replications of Z&L's Study 3 (random-effects):
Zhong & Liljenquist (2006) Study 4 45
Fayard et al. (2009) Study 2 115
Gamez et al. (2011) Study 4 28
Reuven et al. (2013) 29
Current meta-analytic estimate of replications of Z&L's Study 4 (random-effects):
Current meta-analytic estimate of all replications (random-effects):
[Underlying data (CSV)] [R-code]

Summary (Last updated: November 11, 2016): The main finding that a threat to one's moral purity induces the need to cleanse oneself (the "Macbeth effect") does not (yet) appear to be replicable (overall meta-analytic effect: r = -.02 [+/-.05]). This appears to be the case whether moral purity threat is manipulated via recalling an unethical vs. ethical deed (Studies 3 and 4) or transcribing text describing an unethical vs. ethical act (Study 2) and whether the need to cleanse oneself is measured via desirability of cleansing products (Study 2), product choice (Study 3), or reduced volunteerism after cleansing (Study 4). Of course, it is possible the "Macbeth effect" is replicable under different operationalizations and/or experimental designs (e.g., within-subjects).

Original authors' comments: We shared a draft of the curated set of replications with both original authors, and invited them to provide feedback. Chenbo Zhong replied thanking us for the notice and mentioned two published articles that should potentially be considered (i.e., Denke et al., 2014; Reuven et al., 2013). Reuven et al. do indeed report a sufficiently close replication (in their non-OCD control group) of Zhong & Liljenquist's Study 4 and hence the control group replication was added (though we're currently clarifying an issue with their reported t-value).

Related Commentary

Physical warmth embodiment -- Replications (14)  
Bargh & Shalev (2012)
The Substitutability of Physical and Social Warmth in Daily Life
DOI:10.1037/a0023527  

Original Studies & Replications N Effect size (r) [95% CI]
Bargh & Shalev (2012) Study 1a 51
Bargh & Shalev (2012) Study 1b 41
Donnellan et al. (2015a) Study 9 197
Donnellan et al. (2015a) Study 4 228
Donnellan et al. (2015a) Study 1 235
Donnellan et al. (2015b) 291
Ferrell et al. (2013) 365
McDonald & Donnellan (2015) 356
Donnellan et al. (2015a) Study 2 480
Donnellan et al. (2015a) Study 8 365
Donnellan et al. (2015a) Study 7 311
Donnellan & Lucas (2014) 531
Donnellan et al. (2015a) Study 6 553
Donnellan et al. (2015a) Study 5 494
Donnellan et al. (2015a) Study 3 210
Current meta-analytic estimate of replications of B&S' Study 1 (random-effects):
Bargh & Shalev (2012) Study 2 75
Wortman et al. (2014) 260
Current meta-analytic estimate of all replications (random-effects):
[Underlying data (CSV)] [R-code]

Summary (Last updated: April 7, 2016): The notion that physical warmth influences psychological social warmth does not appear to be well-supported by the independent replication evidence (overall meta-analytic effect: r = .007 [+/-.035]), at least via Bargh and Shalev's (2012) Study 1 and 2 operational tests (Study 1: trait loneliness is positively associated with warmer bathing; Study 2: briefly holding a frozen cold-pack boosts reported feelings of chronic loneliness). Regarding the first operational test, the loneliness-shower effect doesn't appear replicable (1) whether trait loneliness is measured using the complete 20-item UCLA Loneliness Scale (Donnellan et al., 2015 Studies 1-4) or a 10-item modified version of the UCLA Loneliness Scale (Donnellan et al., 2015 Studies 5-9, as in Bargh & Shalev, 2012 Studies 1a and 1b), (2) whether warm bathing is measured via a "physical warmth index" (all replications as in Bargh & Shalev, 2012 Study 1a and 1b) or via the arguably more hypothesis-relevant water temperature item (all replications of Bargh & Shalev Study 1), and (3) whether participants were sampled from Michigan (Donnellan et al., 2015 Studies 1-9), Texas (Ferrell et al., 2013), or Israel (McDonald & Donnellan, 2015). Of course, different operationalizations of the idea may yield replicable evidence, e.g., in different domains, contexts, or using other experimental designs (e.g., within-subjects). In a response, Shalev & Bargh (2015) point out design differences in Donnellan et al.'s (2015) replications that could have led to discrepant results (e.g., participant awareness not probed) and report three additional studies yielding small positive correlations between loneliness and new bathing and showering items (measured separately; r = .09 [+/-.09, N=491] and r = .14 [+/-.08, N=552]).
These new findings, however, await independent corroboration (these additional studies not included in meta-analysis because they were executed by non-independent researchers, see FAQ for more details). In a rejoinder, Donnellan et al. (2015b) report an additional study that (1) probed participant awareness and found effect size unaltered by excluding participants suspected of study awareness (r=-.04, N=291 vs. r=-.05, N=323 total sample) and (2) found no evidence that individual differences in attachment style moderated the loneliness-showering link.
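The "[+/-...]" margins attached to correlations in this summary can be approximated with the Fisher z transformation. The sketch below assumes that the margins are 95% confidence interval half-widths computed this way; the page itself does not state the method. The function names are illustrative only. Under that assumption, the two correlations reported by Shalev & Bargh (2015) yield margins of roughly .09 and .08.

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """95% CI for a Pearson correlation via the Fisher z transformation."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

def ci_half_width(r, n):
    """Half the width of the back-transformed interval (the '+/-' margin)."""
    lower, upper = correlation_ci(r, n)
    return (upper - lower) / 2.0

# Correlations and sample sizes quoted in the summary above
for r, n in [(0.09, 491), (0.14, 552)]:
    print(f"r = {r:.2f} [+/-{ci_half_width(r, n):.2f}, N={n}]")
```

Because the interval is symmetric on the z scale rather than the r scale, the half-width is only an approximate summary; for correlations this close to zero the asymmetry is negligible.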

Original authors' comments: I. Shalev responded stating that they've already publicly responded to these replications and have reported three additional studies in their response, and that readers should be referred to this article (Shalev & Bargh, 2015). B. Donnellan responded stating that several open questions remain, including (1) unexplained anomalies in Bargh & Shalev's (2012) Study 1a data (i.e., 46 of the 51 participants (90%) reported taking less than one shower or bath per week) and (2) concerns regarding unclear exclusion criteria for Shalev & Bargh's (2015) new studies. Donnellan further stated that he's unconvinced by Shalev & Bargh's reply and that replication attempts by multiple independent labs would be the most constructive step forward.

Related Commentary

Strength model of self-control -- Replications (32)  
Muraven, Tice, & Baumeister (1998) 
Self-control as limited resource: Regulatory depletion patterns
Baumeister, Bratslavsky, Muraven, & Tice (1998) 
Ego depletion: Is the active self a limited resource?

Original Studies & Replications N Effect size (d) [95% CI]
Prediction 1: Glucose consumption counteracts ego depletion
Gaillot, Baumeister et al. (2007) Study 7 61
Cesario & Corker (2010) 119
Wang & Dvorak (2010) 61
Lange & Eggert (2014) Study 1 70
Current meta-analytic estimate of Prediction 1 replications (random-effects):
Prediction 2: Self-control impairs further self-control (ego depletion)
Muraven, Tice et al. (1998) Study 2 34
Murtagh & Todd (2004) Study 2 51
Schmeichel, Vohs et al. (2003) Study 1 24
Pond et al. (2011) Study 3 128
Schmeichel (2007) Study 1 79
Healy et al. (2011) Study 1 38
Carter & McCullough (2013) 138
Lurquin et al. (2016) 200
Inzlicht & Gutsell (2007) 33
Wang, Yang, & Wang (2014) 31
Sripada, Kessler, & Jonides (2014) 47
Ringos & Carlucci (2016) 68
Wolff, Muzzi & Brand (2016) 87
Calvillo & Mills (2016) 75
Crowell, Finley et al. (2016) 73
Lynch, vanDellen et al. (2016) 79
Birt & Muise (2016) 59
Yusainy, Wimbarti et al. (2016) 156
Lau & Brewer (2016) 99
Ullrich, Primoceri et al. (2016) 103
Elson (2016) 90
Cheung, Kroese et al. (2016) 181
Hagger & Chatzisarantis (2016) 101
Schlinkert, Schrama et al. (2016) 79
Philipp & Cannon (2016) 75
Carruth & Miyake (2016) 126
Brandt (2016) 102
Stamos, Bruyneel et al. (2016) 93
Rentzsch, Nalis et al. (2016) 103
Francis & Inzlicht (2016) 50
Lange, Heise et al. (2016) 106
Evans, Fay, & Mosser (2016) 89
Tinghög & Koppel (2016) 82
Otgaar, Martijn et al. (2016) 69
Muller, Zerhouni et al. (2016) 78
Current meta-analytic estimate of Prediction 2 replications (random-effects):
[Underlying data (CSV)] [R-code]
Original Studies & Replications Independent Variables Dependent Variables Design Differences Active Sample Evidence
Prediction 1: Glucose consumption counteracts ego depletion
Gaillot, Baumeister et al. (2007) Study 7 sugar vs. splenda; video attention task vs. control Stroop performance -
Cesario & Corker (2010) sugar vs. splenda; video attention task vs. control Stroop performance No manipulation check Positive correlation between baseline & post-manipulation error rates, r = .36, p < .001
Wang & Dvorak (2010) sugar vs. splenda; future-discounting t1 vs. t2 future-discounting task -
Lange & Eggert (2014) Study 1 sugar vs. splenda; future-discounting t1 vs. t2 future-discounting task different choices in future-discounting task test-retest reliability of r = .80 across t1 and t2 scores
Prediction 2: Self-control impairs further self-control (ego depletion)
Muraven, Tice et al. (1998) Study 2 thought suppression vs. control anagram performance -
Murtagh & Todd (2004) Study 2 thought suppression vs. control anagram performance very difficult solvable anagrams used rather than "unsolvable"
Schmeichel, Vohs et al. (2003) Study 1 video attention task vs. control GRE standardized test -
Pond et al. (2011) Study 3 video attention task vs. control GRE standardized test 10 verbal GRE items used (instead of 13 analytic GRE items)
Schmeichel (2007) Study 1 video attention task vs. control working memory (OSPAN) -
Healy et al. (2011) Study 1 video attention task vs. control working memory (OSPAN) % of target words recalled (rather than total)
Carter & McCullough (2013) video attention task vs. control working memory (OSPAN) Effortful essay task vs. control in between IV and DV (perfectly confounded w/ IV)
Lurquin et al. (2016) video attention task vs. control working memory (OSPAN) 40 target words in OSPAN (rather than 48) Main effect of OSPAN set sizes on performance, F(1, 199) = 4439.81, p < .001
Inzlicht & Gutsell (2007) emotion suppression (video) vs. control EEG ERN during stroop task -
Wang, Yang, & Wang (2014) emotion suppression (video) vs. control EEG ERN during stroop task
Sripada, Kessler, & Jonides (2014) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) -
Ringos & Carlucci (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Wolff, Muzzi & Brand (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) German language
Calvillo & Mills (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Crowell, Finley et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Lynch, vanDellen et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Birt & Muise (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Yusainy, Wimbarti et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Indonesian language
Lau & Brewer (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Ullrich, Primoceri et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) German language
Elson (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) German language
Cheung, Kroese et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Dutch language
Hagger & Chatzisarantis (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Schlinkert, Schrama et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Dutch language
Philipp & Cannon (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Carruth & Miyake (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Brandt (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Dutch language
Stamos, Bruyneel et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Dutch language
Rentzsch, Nalis et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) German language
Francis & Inzlicht (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Lange, Heise et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) German language
Evans, Fay & Mosser (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Tinghög & Koppel (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV)
Otgaar, Martijn et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) Dutch language
Muller, Zerhouni et al. (2016) effortful letter crossing vs. control multi-source interference task (MSIT; RTV) French language
[Underlying data (CSV)] [R-code]

Summary (Last updated: November 11, 2016): There appear to be replication difficulties across 6 different operationalizations of original studies supporting the two main predictions of the strength model of self-control (Baumeister et al., 2007). Prediction 1: Independent researchers appear unable to replicate the finding that glucose consumption counteracts ego depletion, whether self-control is measured via Stroop (Cesario & Corker, 2010, as in Gaillot et al., 2007, Study 7) or future-discounting task (Lange & Eggert, 2014, Study 1, as in Wang & Dvorak, 2010). Prediction 2: There also appear to be replication difficulties (across 4 distinct operationalizations) for the basic ego depletion effect. This is the case whether the IV is manipulated via thought suppression, video attention task, emotion suppression during video watching, or effortful letter crossing task and also whether the DV is measured via anagram performance, standardized tests, working memory, or multi-source interference task. Wang et al. (2014) do appear to successfully replicate Inzlicht & Gutsell's (2007) finding that ego depletion led to reduced activity in the anterior cingulate (region previously associated with conflict monitoring); however, this finding should be interpreted with caution given potential bias due to analytic flexibility in data exclusions and EEG analyses. Of course, ego depletion may reflect a replicable phenomenon under different conditions, contexts, and/or operationalizations; however, the replication difficulties across 6 different operationalizations suggest ego depletion might be much more nuanced than previously thought. Indeed, alternative models have recently been proposed (e.g., motivation/attention-based accounts, Inzlicht et al., 2014; mental fatigue, Inzlicht & Berkman, 2015) and novel intra-individual paradigms to measure ego depletion have also emerged (Francis, 2014; Francis et al., 2015) that offer promising avenues for future research.

Original authors' and replicators' comments: B. Schmeichel pointed out a missing replication (Healy et al., 2011, Study 1) of Schmeichel (2007, Study 1); we've added the study, though are currently clarifying with K. Healey a potential issue with their reported effect size. F. Lange mentioned that effect sizes for the RRR ego depletion replications seemed off (also pointed out by B. Schmeichel); indeed, we inadvertently sourced the effect sizes from an RRR dataset that included all exclusions (these have now been corrected and match values reported in Figure 1 of Sripada et al. RRR article). M. Inzlicht responded that he's currently developing a pre-registered study of the basic ego depletion effect using a much longer initial depletion task and adapted to be effortful for everyone via a more powerful pre-post mixed-design. R. Dvorak stated their study was not a replication of ego depletion; we clarified that the Wang & Dvorak (2010) study is used as an original study whose finding is consistent with the glucose claim of Baumeister et al.'s (2007) strength model. J. Lurquin mentioned their effect size was d=0.22 (not d=0.21), but .21 is actually correct given we use Hedges' g bias correction; we still call it d because of its greater familiarity to researchers.
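For readers unfamiliar with the bias correction mentioned in the last comment, here is a minimal sketch of the commonly used approximate small-sample correction that turns Cohen's d into Hedges' g. The group sizes and the d value in the example are hypothetical; the page does not report the exact inputs used for any study, so this does not attempt to reproduce a specific entry.

```python
def hedges_correction(n1, n2):
    """Approximate small-sample correction factor J = 1 - 3 / (4*df - 1)."""
    df = n1 + n2 - 2                  # degrees of freedom for two independent groups
    return 1.0 - 3.0 / (4.0 * df - 1.0)

def hedges_g(d, n1, n2):
    """Shrink Cohen's d toward zero to reduce its small-sample bias."""
    return d * hedges_correction(n1, n2)

# Hypothetical study: d = 0.50 with 10 participants per group
print(round(hedges_g(0.50, 10, 10), 3))
```

The correction factor is always slightly below 1, so the corrected value is never larger in magnitude than the uncorrected d, and the difference shrinks as sample size grows.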

Related Commentary

Mood on helping -- Replications (3)  
Isen & Levin (1972) 
Effect of feeling good on helping: Cookies and kindness
Levin & Isen (1975) 
Further studies on the effect of feeling good on helping

Original Studies & Replications N Effect size (Risk Difference) [95% CI]
Isen & Levin (1972) Study 2 41
Blevins & Murphy (1974) 50
Levin & Isen (1975) Study 1 24
Weyant & Clark (1977) Study 2 106
Weyant & Clark (1977) Study 1 32
Current meta-analytic estimate of L&I Study 1 replications (random-effects):
Current meta-analytic estimate of all replications (random-effects):
[Underlying data & R-code]

Summary (Last updated: March 24, 2016): The finding that positive mood boosts helping appears to have replicability problems. Across three replications, individuals presumably in a positive mood (induced via finding a dime in a telephone booth) helped at about the same rate (29.6%) as those not finding a dime (29.8%; meta-analytic risk difference estimate = .03 [+/-.19]; in original studies, 88.8% of dime-finding Ps helped compared to 13.9% of Ps in the control condition). This was the case whether helping was measured via picking up dropped papers (Blevins & Murphy, 1974 as in Isen & Levin, 1972, Study 2) or via mailing a "forgotten letter" (Weyant & Clark, 1977 Study 1 & 2 as in Levin & Isen, 1975, Study 1). These negative replication results are insufficient to declare the mood-helping link unreplicable; however, they do warrant concern that perhaps additional unmodeled factors should be considered. For instance, it seems plausible that mood may influence helping in different ways for different individuals (e.g., negative, rather than positive, mood may boost helping in some individuals), and may also influence the same person differently on different occasions. Using highly-repeated within-person (HRWP) designs (e.g., Whitsett & Shoda, 2014) would be a fruitful avenue to empirically investigate these more plausible links between mood and helping behavior.
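The risk difference used as the effect size in this collection is simply the difference between the helping rates in the two conditions. A minimal sketch for a single study follows, with a Wald-type 95% margin; the interval method is an assumption, since the page does not specify one, and the counts are hypothetical rather than taken from any listed study.

```python
import math

def risk_difference(helped_treat, n_treat, helped_ctrl, n_ctrl, z_crit=1.96):
    """Risk difference (treatment minus control) and Wald 95% CI half-width."""
    p_treat = helped_treat / n_treat
    p_ctrl = helped_ctrl / n_ctrl
    rd = p_treat - p_ctrl
    # Standard error of a difference between two independent proportions
    se = math.sqrt(p_treat * (1 - p_treat) / n_treat
                   + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return rd, z_crit * se

# Hypothetical counts: 8 of 20 helped after finding a dime, 6 of 20 in control
rd, margin = risk_difference(8, 20, 6, 20)
print(f"risk difference = {rd:.2f} [+/-{margin:.2f}]")
```

Per-study risk differences and their squared standard errors are the inputs a random-effects model would then pool across replications.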

Original authors' comments: Report your research and results thoroughly, you may no longer be around when future researchers interpret replication results of your work!

Verbal overshadowing (RRR1 & RRR2) -- Replications (23)
Schooler & Engstler-Schooler (1990)
Verbal overshadowing of visual memories: Some things are better left unsaid
DOI:10.1016/0010-0285(90)90003-M  

Original Studies & Replications N Effect size [95% CI]
Schooler & Engstler-Schooler (1990) Study 1 88
Poirer et al. (2014) 95
Delvenne et al. (2014) 98
Birt & Aucoin (2014) 65
Susa et al. (2014) 111
Carlson et al. (2014) 160
Musselman & Colarusso (2014) 78
Echterhoff & Kopietz (2014) 124
Mammarella et al. (2014) 104
Dellapaolera & Bornstein (2014) 164
Mitchell & Petro (2014) 109
Ulatowska & Cislak (2014) 106
Wade et al. (2014) 121
Birch (2014) 156
McCoy & Rancourt (2014) 89
Greenberg et al. (2014) 75
Alogna et al. (2014) 137
Michael et al. (2014, mTurk) 615
Koch et al. (2014) 67
Thompson (2014) 102
Rubinova et al. (2014) 110
Brandimonte (2014) 100
Eggleston et al. (2014) 93
Kehn et al. (2014) 113
Current meta-analytic estimate of all lab replications (random-effects):
[Underlying data (CSV) & R-code]

Summary (Last updated: March 3, 2016): The verbal overshadowing effect appears to be replicable; verbally describing a robber after a 20-minute delay decreased correct identification rate in a lineup by 16% (from 54% [control] to 38% [verbal]; meta-analytic estimate = -16% [+/-.04], equivalent to r = .17). Still in question, however, are the validity and generalizability of the effect; hence it's still premature for public policy to be informed by verbal overshadowing evidence. Validity-wise, it's unclear whether verbal overshadowing is driven by a more conservative judgmental response bias process or by a reduced memory discriminability process because no "suspect-absent" lineups were used. This is important to clarify because it directly influences how eye-witness testimony should be treated (e.g., if verbal overshadowing is primarily driven by a more conservative response bias process, identifications made after a verbal description should actually be given *more* [rather than less] weight, see Mickes & Wixted, 2015). Generalizability-wise, in a slight variant of RRR2 (i.e., RRR1), a much smaller overall verbal deficit of -4% [+/-.03] emerged when the lineup identification occurred 20 minutes after verbal description (which occurred immediately after seeing robbery). Future research needs to determine the size of verbal overshadowing when there's a delay between crime and verbal description and before lineup identification, which better reflects real-world conditions.

Original authors' comments: We shared a draft of the curated set of replications with original authors, and invited them to provide feedback. Jonathan Schooler replied stating that the information seemed fine to him.

Related Commentary

Current Contributors
Etienne P. LeBel
Western University
Founder & Lead
Wolf Vanpaemel
KU Leuven
Randy McCarthy
Northern Illinois University
Brian Earp
University of Oxford
Malte Elson
Ruhr University Bochum
Current Advisory Board (as of June 2017)
Susann Fiedler
Max Planck Institute
Anna van't Veer
Leiden University
Julia Rohrer
Max Planck Institute
Michèle Nuijten
Tilburg University
Dorothy Bishop
University of Oxford





Brent Roberts
University of Illinois - Urbana-Champaign
Hal Pashler
University of California - San Diego
Daniel Simons
University of Illinois
Alex Holcombe
University of Sydney
E-J Wagenmakers
University of Amsterdam





Lorne Campbell
Western University
Simine Vazire
Washington University in St. Louis
Richard Lucas
Michigan State University
Marco Perugini
University of Milan-Bicocca
Rogier Kievit
University of Cambridge



Eric Eich
University of British Columbia
Mark Brandt
Tilburg University
Fred Hasselman
Radboud University Nijmegen
Previous Contributors
Advisory Board (2014 - 2017)




Denny Borsboom
University of Amsterdam
Brent Donnellan
Michigan State University
Axel Cleeremans
Universite Libre de Bruxelles
Uli Schimmack
University of Toronto
Leslie John
Harvard University


Joe Cesario
Michigan State University
Jan De Houwer
Ghent University
Foundational Contributors (2014 - 2015)




Alex Kyllo
Technical Advisor
Christian Battista
Technical Advisor
Ben Coe
Technical Advisor
Stephen Demjanenko
Technical Advisor
Please sign up below to receive the Curate Science Newsletter to be automatically notified about news and updates. See past announcements.



Contact:
curatescience@gmail.com
Western University
1151 Richmond St
London, Ontario, CANADA, N6A 3K7

*Thanks to Felix Schönbrodt who is currently hosting Curate Science.