Main

In a heated debate about the proximity of COVID-19 herd immunity, White House health advisor Dr Scott Atlas proclaimed ‘You’re supposed to believe the science, and I’m telling you the science’1. A group of infectious disease experts and former colleagues from Stanford, however, publicly criticized Dr Atlas, who is a radiologist, for spreading ‘falsehoods and misrepresentation of science’ through his statements about face masks, social distancing and the safety of community transmission2. In the 2020 pandemic crisis, all eyes turned to scientific experts to provide advice, guidelines and remedies; from COVID-19 alarmists to sceptics, appeal to scientific authority appeared to be a prevalent strategy on both sides of the political spectrum. Please see the Supplementary Information for a short commentary on how the current work might relate to the COVID-19 situation.

A large body of research has shown that the credibility of a statement is heavily influenced by the perceived credibility of its source3,4,5,6,7,8,9,10. Children and adults are sensitive to the past track record of informants11,12,13,14,15,16, evidence of their benevolence toward the recipient of testimony17,18,19, as well as how credible the information is at face value20,21. From an evolutionary perspective, deference to credible authorities such as teachers, doctors and scientists is an adaptive strategy that enables effective cultural learning and knowledge transmission22,23,24,25,26,27,28. Indeed, if the source is considered a trusted expert, people are willing to believe claims from that source without fully understanding them. We dub this ‘the Einstein effect’; people simply accept that E = mc² and that antibiotics can help cure pneumonia because credible authorities such as Einstein and their doctor say so, without actually understanding what these statements truly entail.

Knowing that a statement originates from an epistemic authority may thus increase the likelihood of opaque messages being interpreted as meaningful and profound. According to Sperber29, in some cases, incomprehensible statements from credible sources may be appreciated not just in spite of, but by virtue of their incomprehensibility, as exemplified by the speech of spiritual or intellectual gurus (the ‘Guru effect’). Here, we investigate to what extent different epistemic authorities affect the perceived value of nonsensical information. To this end, we contrasted judgements of gobbledegook spoken by a spiritual leader with gobbledegook spoken by a scientist. In addition, we assessed whether the source effect is predicted by individual religiosity and varies cross-culturally, as a proxy for how scientists and spiritual authorities function as ‘gurus’ for different individuals and within different cultural contexts.

Although source credibility effects have typically been investigated for persuasion in marketing and communication, both science and spirituality may present particularly suitable contexts for inducing strong source effects. Scientists are generally considered competent and benevolent sources30,31 and scientific information is often difficult and counterintuitive32,33,34. The combination of a credible authority and intangible information can increase the probability of obscure scientific information being accepted, by enhancing perceivers’ reliance on the source9,10,35. Even indirect context cues, such as those emphasizing the scientific nature of a piece of information, can increase the probability that (dubious) information is believed36. Some experimental evidence, for instance, suggests that irrelevant neuroscience information37,38,39 or nonsense mathematical equations40 can boost the perceived quality of presented claims, though note that replication studies suggest that mere brain images may not suffice41,42. Notably, these effects were present only among non-experts (that is, people with little formal neuroscientific or mathematical training). This distinction suggests that the appeal of ‘sciencey’ information may be particularly strong when analytical assessment fails and one can only rely on secondary credibility cues.

Similar to the anticipated complexity of scientific information, previous beliefs about religious or spiritual texts instigate expectations that the information presented will be obscure. Supernatural explanations often appeal to phenomena that operate outside the natural world and to experiences deemed ineffable, mysterious and exempt from empirical validation43,44,45,46,47,48. Some scholars have argued that incomprehensible theological language and irrational beliefs may serve as a costly signal towards the religious ingroup, signalling quality by hard-to-fake moral commitment, intellectual capacity and epistemological investment49,50. However, irrespective of content biases, the evaluation of spiritual or theological obscurity critically depends on one’s personal beliefs about the credibility of spiritual gurus or religious authorities.

Various lines of evidence suggest that perceived credibility of both content and source depends on individual difference factors such as the perceiver’s (political) ideology and worldview51,52,53,54. In the absence of the means to rationally evaluate a claim and reliable source information, people probably infer credibility based on beliefs about the group to which the source belongs (for example, ‘conservatives’, ‘scientists’). In this process, similarities between one’s own worldview and that of the source’s group may serve as a proxy for being a benevolent and reliable source23,55. In a religious context, Christians were found to be more affected by an intercessory prayer when supposedly performed by a (charismatic) Christian than a non-Christian56 and to require less evidence for religious claims (for example, efficacy of prayer to cure illness) than for scientific claims (for example, efficacy of medication57,58). These differences were not present among secular individuals. Furthermore, evangelical Christians were more likely to accept statements opposing their personal views when attributed to an ingroup religious leader versus an outgroup religious leader59. This effect was moderated by the amount of contact participants had with the specific group to which the religious leader belonged, highlighting the importance of the person–source fit for message acceptance.

To account for these effects, alongside traditional dual-process models of persuasion9,10,60,61, various authors have recently proposed a Bayesian framework in which subjective beliefs about the source (for example, trustworthiness) and one’s worldviews contribute to belief updating in response to new information following Bayesian principles6,62,63,64. By including background beliefs, these Bayesian networks describe how a differential weighing of evidence and even divergent updating (belief polarization) can be considered rational and normative. This may explain, for instance, how strong religious believers can become more convinced of their beliefs in the face of disconfirmatory evidence, especially when their faith is being challenged63,65. Similarly, strong conservatives who distrust science may become less convinced of human-caused global warming when presented with scientific consensus information62. In other words, laypeople may apply their own ‘power priors’66 to calibrate evidence from different sources, whose trustworthiness is subjectively determined, partly by their broader worldview.
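The following rough illustration shows how a Bayesian treatment can let the same message move different perceivers in opposite directions. The sketch marginalizes over an unknown source type. The three source types (‘reliable’, ‘random’, ‘deceptive’), the accuracy value and both trust profiles are hypothetical assumptions chosen for illustration. They are not parameters from the cited Bayesian models or from this study.

```python
# Minimal sketch of Bayesian belief updating when the source's reliability is
# uncertain. Source types and all numbers below are illustrative assumptions.


def posterior_belief(prior_h, trust, accuracy=0.9):
    """Return P(H | source asserts H).

    prior_h  : prior probability that hypothesis H is true
    trust    : probabilities of the (hypothetical) source types
               'reliable'  asserts the truth with probability `accuracy`
               'random'    asserts H with probability 0.5 regardless of truth
               'deceptive' asserts the falsehood with probability `accuracy`
    """
    if abs(sum(trust.values()) - 1.0) > 1e-9:
        raise ValueError("source-type probabilities must sum to 1")
    reliable = trust.get("reliable", 0.0)
    random_ = trust.get("random", 0.0)
    deceptive = trust.get("deceptive", 0.0)
    # Likelihood of the assertion under H and under not-H,
    # marginalizing over the unknown source type.
    p_e_given_h = reliable * accuracy + random_ * 0.5 + deceptive * (1 - accuracy)
    p_e_given_not_h = reliable * (1 - accuracy) + random_ * 0.5 + deceptive * accuracy
    numerator = prior_h * p_e_given_h
    return numerator / (numerator + (1 - prior_h) * p_e_given_not_h)


truster = {"reliable": 0.8, "random": 0.2, "deceptive": 0.0}
distruster = {"reliable": 0.1, "random": 0.3, "deceptive": 0.6}
print(round(posterior_belief(0.5, truster), 2))     # 0.82
print(round(posterior_belief(0.5, distruster), 2))  # 0.3
```

Both perceivers start from a neutral prior of 0.5 and receive the same assertion. The trusting perceiver ends up at about 0.82, while the distrusting perceiver drops to about 0.30. This is a toy version of the divergent updating described above.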

In sum, whereas previous studies have established source credibility effects in a wide array of domains, as yet little is known about whether and to what extent people’s worldview is predictive of the relative credibility evaluation of information from scientific and spiritual sources. In the current study, we presented participants (N = 10,195, from 24 countries) with meaningless verbiage (henceforth, ‘gobbledegook’; also referred to in the literature as ‘pseudo-profound bullshit’67) randomly credited to either a spiritual authority or a scientific authority. We assessed: (1) whether trusting scientific experts over spiritual leaders is a general heuristic (that is, the Einstein effect); and (2) to what extent perceivers’ religiosity predicts relative confidence in the truth of the gobbledegook statements from both sources. Note that we chose a ‘spiritual guru’ authority frame, instead of ‘religious leader,’ because we wanted to avoid selecting an authority specific to any particular religion, to keep the study consistent across countries. Although religiosity and spirituality are overlapping but not interchangeable constructs68,69, self-reported religiosity has been positively associated with belief in spiritual phenomena such as fate, spiritual energy and a connected universe70,71,72 (though not unequivocally73). Consequently, we expected religiosity to be associated with increased receptivity to gobbledegook from a spiritual authority.

All confirmatory hypotheses and included measures were preregistered on the Open Science Framework (osf.io/faj2z/; the link contains the original preregistration file). The registered component (including additional subprojects) can be found at osf.io/xg8y5/files. In addition, for exploratory purposes, we included response time measures and a memory test to obtain insight into the cognitive processes underlying the source credibility effect (these measures were anticipated in the preregistration, but no concrete hypotheses were formulated). To further validate the findings from our experimental paradigm, we also analysed a large dataset from 117,191 individuals across 143 countries (including the same countries included in our study) that contains explicit trust ratings of scientists and traditional healers, as well as participant religiosity74.

Results

The two dependent variables that were measured (that is, importance of the message and credibility of the message) were highly correlated for both the scientific source (Spearman’s ρ = 0.772, 95% credible interval (95% CI) (0.764, 0.779)) and the spiritual source (Spearman’s ρ = 0.827, 95% CI (0.822, 0.833); Supplementary Fig. 7)75. Because the pattern of results was similar across the two dependent variables, we describe only the findings for credibility in detail (see Table 2 for the results for importance).

Effect of source on credibility

First, we assessed the extent to which the perceived credibility of a gobbledegook statement is affected by its source (that is, a scientist versus a spiritual guru). Note that our initial hypothesis was that there would be no main effect of source; that is, we expected evidence for the null model. However, visual inspection of the data (Fig. 1) suggests a clear main effect of source. To quantify the evidence for the effect of source, we compared four models:

- the null model without an effect of condition (that is, the scientist and spiritual guru are judged equally credible);
- the model with a common positive effect of condition across countries (that is, the scientist is judged more credible than the guru, to an equal degree in every country);
- the model with a varying positive effect of source (that is, the scientist is judged more credible than the guru, but to varying degrees across countries);
- the unconstrained model, which allows the source effect to vary in sign across countries (that is, in some countries the scientist is considered more credible than the guru, whereas in other countries the guru is considered more credible than the scientist).
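Order-constrained model comparisons may be unfamiliar to some readers. One common way to obtain the Bayes factor of the positive effects model against the unconstrained model is the encompassing-prior approach. Country effects are drawn from the unconstrained model's prior and posterior, and the two proportions of draws in which every effect is positive are compared. The sketch below assumes that approach; it is not necessarily the exact procedure used here. The hierarchical prior (its distributions and scales) and the helper names are hypothetical stand-ins. Comparisons against the null and common-effect models involve point restrictions and need other methods.

```python
import numpy as np


def bf_positive_vs_unconstrained(prior_effects, posterior_effects):
    """Encompassing-prior estimate of BF_{+u}.

    Both inputs have shape (n_draws, n_countries) and hold draws of the
    country-specific source effects from the *unconstrained* model's prior
    and posterior. BF_{+u} = P(all effects > 0 | data) / P(all effects > 0).
    BF_{u+} is simply the reciprocal.
    """
    prior_all_pos = np.all(np.asarray(prior_effects) > 0, axis=1)
    post_all_pos = np.all(np.asarray(posterior_effects) > 0, axis=1)
    prior_prop = prior_all_pos.mean()
    if prior_prop == 0:
        raise ValueError("no prior draws satisfy the constraint; draw more samples")
    return post_all_pos.mean() / prior_prop


def sample_hierarchical_prior(n_draws, n_countries, mu_scale=1.0, sigma_scale=0.5, seed=0):
    """Hypothetical hierarchical prior on country effects:
    mu ~ Normal(0, mu_scale), sigma ~ HalfNormal(sigma_scale),
    theta_c ~ Normal(mu, sigma) independently for each country c."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, mu_scale, size=(n_draws, 1))
    sigma = np.abs(rng.normal(0.0, sigma_scale, size=(n_draws, 1)))
    return rng.normal(mu, sigma, size=(n_draws, n_countries))


prior_draws = sample_hierarchical_prior(200_000, 24)
print("prior P(all 24 effects > 0):", (prior_draws > 0).all(axis=1).mean())
# posterior_draws would come from fitting the unconstrained model (e.g. via MCMC);
# bf_positive_vs_unconstrained(prior_draws, posterior_draws) then estimates BF_{+u}.
```

The design choice here is that the constrained model never has to be fitted separately: its Bayes factor follows from how much more often the unconstrained posterior satisfies the constraint than the unconstrained prior does.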

Fig. 1: Observed relation between religiosity and credibility ratings per source for each country.

Countries are ordered by size of the source-by-religiosity interaction (from left to right, top to bottom). Red lines and circles denote ratings for the spiritual guru and grey lines and circles denote ratings for the scientist. Circles reflect individual observations and are jittered to enhance visibility. Credibility was measured on a seven-point Likert scale.

The Bayes factor model comparison summarized in Table 1 shows that the data provide most evidence for the positive effects model, which assumes a varying but consistently positive effect across countries. The source effect is favoured 1.1 × 10²¹⁰-to-1 over the null model, which indicates strong evidence that the meaningless statement from the scientist is considered more credible than the meaningless statement from the guru. The positive effects model strongly outperforms the common effect model (BF+1 = 8.9 × 10¹⁷; explained variance (Bayesian R²) is 17.9%, 95% CI (17.0%, 18.7%)). The mean (95% CI) of the unstandardized size of the source effect in the full model is 0.70 (0.60, 0.79) on a seven-point Likert scale and the s.d. between countries is 0.16. Also note that as shown in Fig. 1, the within-country individual differences in credibility ratings are large, indicating that most of the variance is located at the lower level (that is, the individual level). The intraclass correlation coefficients quantifying the proportion of variance explained by the country clustering, as well as the total explained variance by the included effects for all models (Bayesian R²) are reported in the Supplementary Information. There, we also report MCMC diagnostics to verify the adequacy of the Bayesian models, as well as the estimates for the intercepts, source effect and the source-by-religiosity interaction effect for each country.

Table 1 Bayes factor model comparisons to test \(\mathcal{H}_{1}\) and \(\mathcal{H}_{2}\)

Interaction between source and religiosity on credibility

The source-by-religiosity interaction effect assesses to what extent the effect of source depends on raters’ own religious background (religiosity was globally standardized). Our hypothesis states that for individuals with low religiosity, credibility ratings should be higher for gobbledegook from a scientific source than for gobbledegook from a spiritual guru. For highly religious individuals, the reverse effect is expected, that is, higher credibility ratings for gobbledegook ascribed to a guru than for gobbledegook ascribed to a scientist. The interaction term was therefore constrained to be negative, in the sense that the coefficient of the source effect becomes smaller (or negative) with increased religiosity. Note that although the interaction term was constrained to have a negative sign, for consistency, we still refer to the model as the positive effects model.

For hypothesis 2, the model comparison summarized in Table 1 shows that the data provide most evidence for the common source-by-religiosity interaction model, which assumes a consistent interaction effect across countries, BF10 = 0.99 × 10¹⁵ (R² = 18.1%, 95% CI (17.2%, 19.0%)). The data are uninformative for distinguishing between the common interaction and the varying positive interaction model (BF1p = 1.28), indicating that both are about equally plausible. Although we cannot conclude whether the size of the interaction effect differs substantially between countries, both models provide strong evidence for a source-by-religiosity interaction across all countries. The mean of the unstandardized source-by-religiosity interaction effect is −0.21 (−0.29, −0.14) and the s.d. between countries is 0.09 on the seven-point Likert scale. As evident from Fig. 2d, the interaction entails that the relative preference in credibility for statements from the scientist versus the spiritual guru decreases with higher religiosity. This effect is further unpacked in Fig. 2c, which shows that in every country except Croatia, religiosity is more predictive of credibility ratings for statements from the guru than for statements from the scientist.
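To give a sense of scale, the two overall estimates can be combined into a rough predicted source effect Δ(z). Δ(z) is the scientist-minus-guru difference on the seven-point scale, expressed as a function of globally standardized religiosity z. This is only an approximation. It assumes that source is coded so that its coefficient equals the scientist-minus-guru difference, and it borrows the 0.70 source effect from the model for hypothesis 1. It also ignores uncertainty and between-country variation.

\[
\begin{aligned}
\Delta(z) &= \beta_{\text{source}} + \beta_{\text{source}\times\text{religiosity}}\, z \approx 0.70 - 0.21\,z,\\
\Delta(1) &\approx 0.70 - 0.21 = 0.49, \qquad \Delta(2) \approx 0.70 - 0.42 = 0.28,\\
z^{*} &= 0.70 / 0.21 \approx 3.3 \qquad (\text{where } \Delta(z^{*}) = 0).
\end{aligned}
\]

Under these assumptions, the hypothesized reversal (guru rated above scientist) would appear only for religiosity more than three standard deviations above the global mean, and only if the linear form held that far out.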

Fig. 2: Summary of the multilevel-model (unconstrained) estimates per country and predicted overall effects.

a,b, It is apparent that there is substantial variation across the 24 countries in (a) overall credibility judgements (that is, intercept) and (b) the effect of scientific versus spiritual source. c, Individual religiosity has a stronger effect on credibility judgements for the spiritual guru (red circles) than for the scientist (grey circles). The estimates are ordered from largest to smallest, and the open circles denote negatively valued effects. The error bars give the 95% CI for each country. The vertical lines denote the overall estimated effect with the 95% CI in the shaded bands. The dashed lines indicate zero. d, Predicted credibility as a function of source and individual religiosity, showing that the difference in credibility ratings for the scientist (grey lines) versus the guru (red lines) is less pronounced for high-religiosity individuals than low-religiosity individuals. The shaded bands reflect the 95% CI, crosses reflect the observed values for two randomly sampled participants per country, and circles reflect the corresponding estimated values. Crosses and circles are jittered to enhance visibility.

Exploratory analyses

In an exploratory fashion, we assessed to what extent the source manipulation influenced the effort participants put into processing the statements. To this end, we looked at: (1) response time for the evaluation of each statement as a proxy for processing time of the message, and (2) memory performance of words presented in the statements as a proxy for encoding quality. For these exploratory models, we assessed only evidence for a common effect, because visual inspection of the data suggested no or only very small and homogeneous effects (Fig. 3).

Fig. 3: Multilevel-model (unconstrained) estimates for the exploratory analyses.

a,b, The source effect on (a) (log-transformed) processing time and (b) memory performance (range 0–1). The estimates are ordered from largest to smallest, and the open circles denote negatively valued effects. The error bars give the 95% CI for each country. The vertical lines denote the overall estimated effect with the 95% CI in the shaded bands. The dashed lines indicate zero.

Processing time

For processing time, the data indicate a common effect of source: participants spent more time processing the statement of the scientist (median response time = 28.30 s) than that of the guru (median response time = 27.0 s; BF10 = 8,050.48). Processing times were log-transformed for the analysis to account for the positive skew that is typically observed in response time data. The standardized effect size, however, is very small: 0.058 (0.023, 0.087). There was strong evidence against an interaction between source and religiosity on processing time: religiosity does not predict the difference in processing time for the scientist versus the guru (BF10 = 0.03, BF01 = 30.78).
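The log transform helps because response times are right-skewed. Averaging log times amounts to comparing geometric means, so a difference on the log scale reads as a ratio between conditions. The sketch below is a generic illustration with hypothetical response times and helper names; it is not the authors' analysis code.

```python
import math
from statistics import mean


def geometric_mean(times):
    """exp(mean(log t)): the back-transformed mean of log response times."""
    return math.exp(mean(math.log(t) for t in times))


def rt_ratio(times_a, times_b):
    """exp(mean log RT of A - mean log RT of B), i.e. a ratio of geometric means.

    A value of 1.05 means responses in condition A took about 5% longer
    on the multiplicative scale implied by the log transform."""
    return geometric_mean(times_a) / geometric_mean(times_b)


# Hypothetical response times in seconds: one very slow response pulls the
# arithmetic mean far more than the geometric mean.
times = [20.0, 25.0, 30.0, 240.0]
print(mean(times), round(geometric_mean(times), 1))
```

For reference, the ratio of the two reported medians is 28.30 / 27.0 ≈ 1.05, that is, roughly 5% longer for the scientist.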

Memory performance

After the rating question, participants were presented with a recall item that required them to indicate which words they recognized from the statement. The list consisted of five target words (included in the statement) and five distractor words (not in the statement) for each source. For each participant and source, we calculated an F1 score: the harmonic mean of precision (the proportion of selected words that were targets) and recall (the proportion of the presented target words that were selected). F1 ranges between 0 and 1, with 1 being perfect performance.
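The F1 computation described above can be sketched as follows. The word lists are invented for illustration. Scoring a response with no correct selections (including selecting nothing) as 0 is an assumption, since the text does not say how such cases were handled.

```python
def f1_score(selected, targets):
    """F1 for one participant and one source.

    selected: words the participant marked as recognized (targets or distractors)
    targets:  the words that actually appeared in the statement
    """
    selected, targets = set(selected), set(targets)
    true_positives = len(selected & targets)
    if true_positives == 0:
        # Assumption: no correct selections (including an empty selection) scores 0.
        return 0.0
    precision = true_positives / len(selected)  # share of selected words that were targets
    recall = true_positives / len(targets)      # share of targets that were selected
    return 2 * precision * recall / (precision + recall)


# Hypothetical lists: five targets; the participant picks three targets and one distractor.
targets = {"quantum", "field", "energy", "wave", "matter"}
selected = {"quantum", "field", "energy", "spirit"}
print(round(f1_score(selected, targets), 3))  # precision 0.75, recall 0.6 -> 0.667
```

Because F1 combines both components, a participant cannot reach a high score by ticking every word: selecting all ten words gives perfect recall but precision of only 0.5.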

The analysis indicated anecdotal evidence against a common effect of source on memory performance: participants did not perform better at recognizing words from the statement by the scientist than from the statement by the guru (BF10 = 0.53; BF01 = 1.90; standardized estimate = 0.014 (0.001, 0.035)). Finally, there was moderate evidence against an interaction, BF10 = 0.31, BF01 = 3.27.

As a sanity check, we verified that there is an extremely strong effect of processing time on memory performance: participants who spent more time processing the statement also performed better on the memory task (BF10 = ).

Validation using previously collected trust ratings

In addition to the experimental data collected in this study, we also examined an existing dataset that includes surveyed trust ratings for scientists and traditional healers for 117,191 participants across 143 countries. Note that the analysis on this dataset was not preregistered. Analysis of these data corroborated the results from our experimental manipulations; on average, scientists are considered more trustworthy than traditional healers, standardized estimate = 0.30 (0.06, 0.58) (for comparison, the standardized estimate for the experimental source effect on credibility is 0.41 (0.22, 0.49)). Although the positive effects model strongly outperforms both the null model and the common effect model (BF+0, BF+1 > 10³⁰⁸; R² for the positive effects model = 28.1% (27.8%, 28.3%)), the data provide most evidence for the unconstrained model \(\mathcal{M}_{u}\), which implies that scientists are not explicitly trusted more than traditional healers in every one of the 143 countries (BFu+ = 320.76). Nonetheless, as displayed in Fig. 4a, the mean of the estimated source effect is negative in only 3 of the 143 countries, whereas the overall effect is clearly positive.
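Several Bayes factors in this section are reported in one direction, in the other, or only as a bound (> 10³⁰⁸). That bound is close to the largest value an IEEE 754 double-precision float can hold, about 1.8 × 10³⁰⁸. The sketch below illustrates two generic properties used when reading such values. First, BF01 = 1/BF10. Second, Bayes factors chain multiplicatively, so very large values are best combined on the log scale. The helper names are hypothetical. The chained value is simply derived from two reported numbers for illustration; it is not a separately reported result.

```python
import math
import sys


def bf01_from_bf10(bf10):
    """Evidence for the null over the alternative is the reciprocal of BF10."""
    return 1.0 / bf10


def log10_bf(mantissa, exponent):
    """log10 of a Bayes factor reported as mantissa x 10^exponent."""
    return math.log10(mantissa) + exponent


# Reciprocity: the memory result BF10 = 0.53 corresponds to BF01 of about 1.89
# (reported as 1.90, which is consistent with rounding of the unrounded BF10).
print(round(bf01_from_bf10(0.53), 2))

# Transitivity on the log scale: BF(common vs null) = BF_+0 / BF_+1,
# using the two values reported for the experimental source effect on credibility.
log10_common_vs_null = log10_bf(1.1, 210) - log10_bf(8.9, 17)
print(round(log10_common_vs_null, 2))

# Why logs help: multiplying values near the float maximum overflows to inf.
print(sys.float_info.max, sys.float_info.max * 10)
```

On this reading, the common-effect model alone would still be favoured over the null by a factor of roughly 10¹⁹². It is nonetheless dwarfed by the positive effects model.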

Fig. 4: Multilevel-model (unconstrained) estimates and predicted overall effects for explicit trust ratings.

a, The source effect on trust ratings for each of the 143 countries, showing that in all but three countries, scientists are trusted more than traditional healers. The estimates are ordered from largest to smallest, and open circles denote negatively valued effects. The error bars give the 95% CI for each country. The vertical lines denote the overall estimated effect with the 95% CI in the shaded bands. The dashed lines indicate zero. b, The predicted trust rating as a function of source and individual religiosity, showing that religious individuals trust scientists slightly less and traditional healers more compared with non-religious individuals. The shaded bands reflect the 95% CI, crosses reflect the observed values for two randomly sampled participants per country, and circles reflect the estimated values per condition. The crosses are jittered to enhance visibility.

We also investigated the fit effect in this dataset by including an interaction term between authority (scientists versus traditional healers) and religiosity (religious versus not religious). Because in 41 countries all of the participants indicated that they were religious, we could not reliably estimate varying effects for the authority-by-religiosity interaction. There was, however, strong evidence for an overall interaction between authority and religiosity (BF10 = 6.3 × 10¹⁴, R² = 28.1% (27.8%, 28.4%), standardized estimate = −0.09 (−0.14, −0.02); for comparison, the standardized estimate for the experimental source-by-religiosity effect on credibility is −0.12 (−0.16, −0.08)). The pattern of the interaction is the same as for the experimental credibility data: the relative difference between trust in scientists versus traditional healers is smaller for religious individuals than for non-religious individuals. Interestingly, whereas the experimental study found that religiosity was associated with increased credibility ratings for both sources, albeit to a smaller extent for the scientist (Fig. 2c), the trust data show a positive effect of religiosity on trust for traditional healers (standardized estimate = 0.03 (0.02, 0.04)), yet a negative effect of religiosity on trust for scientists (standardized estimate = −0.01 (−0.02, −0.01)). See the Supplementary Information for an additional exploratory analysis on the country-level correlation in the source effect between the primary experimental dataset and secondary validation dataset on trust.

Robustness and additional checks

We conducted eight additional analyses to check the robustness of the results, covering all specifications mentioned in the preregistration:

1. Excluding observations for which participants did not correctly recall the source of the statement (nobs = 1616 (7.95%))
2. Excluding data from Lithuania because n < 300 (as preregistered)
3. Using a different, less informative prior setting for the r scale: \(r=\frac{\sqrt{2}}{2}\approx 0.707\), corresponding to the ‘wide’ prior scale provided in the BayesFactor package76
4. Using the importance rating instead of the credibility rating as the outcome variable
5. Applying a between-subjects design by taking only the first observation per participant
6. Including all participants, even those who failed the attention check
7. Running the analyses without adding any predictors as covariates
8. Running the analyses including all covariates that might affect either the independent variable (religiosity) or the dependent variable (credibility ratings): statement version (A or B), presentation order (guru–scientist or scientist–guru), participant age (in decades), participant gender, level of education and perceived socio-economic status.
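One way to keep the robustness checks listed above reproducible is to express each sample restriction as a small function over a long-format table (one row per participant and statement), and to treat the prior, outcome and covariate changes as model settings. The sketch below assumes pandas. All column names, helper names and the demo rows are hypothetical and are not the authors' variable names or data.

```python
import pandas as pd

# Hypothetical columns: participant, trial, country,
# passed_attention (bool), recalled_source (bool).


def main_sample(df):
    """Main-analysis sample: participants who passed the attention check."""
    return df[df["passed_attention"]]


def correct_recall_only(df):  # check 1
    d = main_sample(df)
    return d[d["recalled_source"]]


def without_lithuania(df):  # check 2
    d = main_sample(df)
    return d[d["country"] != "Lithuania"]


def first_observation_only(df):  # check 5: between-subjects version
    d = main_sample(df).sort_values(["participant", "trial"])
    return d.groupby("participant").head(1)


def all_participants(df):  # check 6: keep attention-check failures
    return df


SAMPLE_SPECS = {
    "main": main_sample,
    "correct_source_recall": correct_recall_only,
    "exclude_lithuania": without_lithuania,
    "first_observation": first_observation_only,
    "include_attention_failures": all_participants,
}

# Checks 3, 4, 7 and 8 change the model rather than the sample.
MODEL_SPECS = {
    "wide_prior": {"rscale": 2 ** 0.5 / 2},
    "importance_outcome": {"outcome": "importance"},
    "no_covariates": {"covariates": []},
    "all_covariates": {"covariates": ["version", "order", "age_decades",
                                      "gender", "education", "ses"]},
}


def build_samples(df):
    return {name: spec(df) for name, spec in SAMPLE_SPECS.items()}


# Tiny invented example: three participants, two statements each.
demo = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3],
    "trial": [1, 2, 1, 2, 1, 2],
    "country": ["Spain", "Spain", "Lithuania", "Lithuania", "Japan", "Japan"],
    "passed_attention": [True, True, True, True, False, False],
    "recalled_source": [True, False, True, True, True, True],
})
samples = build_samples(demo)
print({name: len(subset) for name, subset in samples.items()})
```

Each sample in `samples` would then be crossed with the default model, and each entry of `MODEL_SPECS` with the main sample, so that every check differs from the main analysis in exactly one respect.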

The results of these robustness analyses are given in Table 2 and corroborate the conclusions from the main analyses: the data indicate (1) a source effect that varies between countries but is consistently positive (scientist > guru), and (2) a source-by-religiosity interaction in the hypothesized direction, that is, a source effect that shrinks with increasing religiosity (either a common or varying effect).

Table 2 Bayes factor of different models for robustness checks

Discussion

In the current cross-cultural study, we used a straightforward manipulation and measurement of source credibility effects at the individual level. We found a robust source effect on credibility judgements of meaningless statements ascribed to different authority figures; across all 24 countries and all levels of religiosity, gobbledegook from a scientist was considered more credible than the same gobbledegook from a spiritual guru. In addition to this robust overall Einstein effect, participants’ background beliefs predicted the credibility evaluations; individuals scoring low on religiosity considered the statement from the guru less credible than that from the scientist, whereas this difference was less pronounced for highly religious individuals. These patterns were consistent with explicit trust data collected for over 100,000 individuals from 143 countries: in 140 of these 143 countries, people indicated greater trust in scientists than in traditional healers, with a larger difference for non-religious than for religious individuals. Robustness analyses for the experimental study indicated that the effects were robust against different data inclusion criteria (for example, attention checks) and analytic choices (for example, selection of covariates, dependent variable, prior settings). Moreover, the effects also emerged compellingly when analysed as a between-subjects design (Table 2), suggesting that they are not simply explained by social desirability or participants responding in line with their guess of the research hypothesis (also note that recent empirical work indicates that online survey experiments are generally robust to experimenter demand effects77). Results of exploratory response time analyses suggest that, in addition to giving more positive evaluations, people may actually put more effort into processing information from credible sources (although they did not recall it better). In particular, participants spent more time on, and may have tried relatively harder to decipher, the gobbledegook from the scientist, whereas prior scepticism may have led some to dismiss the information from the guru immediately as nonsense.

The pattern of results suggests that variability in the source effect between individuals and countries is more strongly driven by differences in the credibility of the spiritual authority than the scientific authority. Based on the literature, one could consider various plausible hypotheses explaining cross-cultural variation in the source effects, for instance in terms of cultural religiosity, vertically versus horizontally structured societies, general trust in authorities and specific trust patterns toward religious and secular authorities78,79,80,81,82,83. However, although our analysis indicated quantitative differences in the size of the source effect between countries (that is, varying positive effects), we did not find qualitative differences (that is, changes in the direction or presence of the effect). Descriptively, the weakest source effects (that is, smallest difference between the scientific and the spiritual source) are observed in Asian countries (Japan, China, India), possibly because the spiritual guru as presented in the survey more closely fits Eastern belief systems than Abrahamic faith traditions. However, this explanation remains speculative and we are hesitant to overinterpret the cross-national variability both in the overall credibility judgements and the effect of source. Although we included main effects of age, gender, level of education and socio-economic status in the analyses, the different sampling strategies that were applied across countries also call for caution in making inferences based on direct comparisons.

Our findings could reflect a universal gullibility with regard to gobbledegook statements: only a small minority of participants, regardless of their national or religious background, displayed candid scepticism towards the nonsense statements, and 76% of participants rated the scientist’s gobbledegook at or above the midpoint of the credibility scale (compared with 55% for the guru). However, the notion of a general gullibility underlying the observed effects is not entirely supported by the data. The median response was the midpoint of the credibility scale. Participants may have primarily used the midpoint of the scale to indicate that they were uncertain about whether or not the claim was credible, that is, to refrain from passing judgement at all84,85,86. This response might appear to reflect a lack of motivation to critically reflect on the information that was presented; at the same time, saving one’s cognitive resources can also be considered ‘strategic’. First, as with most psychology experiments, our study was a zero-stakes task with no incentive for accuracy, which may have lowered effort and biased responses toward the midpoint. Second, when analytical reasoning about the plausibility of a presented claim does not yield any conclusion, the most rational thing to do may be either suspending judgement (selecting the neutral midpoint of the rating scale) or calibrating judgement to previous beliefs about the source of the claim. If one considers the group to which the source belongs generally competent and benevolent, it makes sense to give a positive judgement of their difficult-to-evaluate claim. After all, credible experts often acquired their credentials through a reputation for discovering phenomena that seem implausible at first glance55. For instance, the premise of vaccination (‘inserting a virus prevents disease’) or facts about climate change (‘humans are changing the weather’) are intuitively dubious, yet reputable scientists have convinced many laypeople of their truth.

In this study, we intentionally selected authorities that are generally considered benevolent30,31 and we generated statements that are nearly impossible to (in)validate and that bear no relation to controversial or politicized scientific topics about which people may have strong previous attitudes (efficacy of vaccinations, climate change, etc.). By using ambiguous claims without any specific ideological content, we tried to isolate the worldview effect regarding the source from any worldview effect related to the content of the claims. At the same time, we aimed to maximize the efficacy of our manipulation by varying the names, photographs and visual contexts (chalkboard versus stars) in addition to the authority’s profession. This approach makes it more difficult to single out which specific factor contributes to the source effect (for example, the observed effects might be partly driven by the authority’s appearance rather than their domain of expertise). Relatedly, some participants might have recognized the depicted men (Enrico Fermi and José Argüelles), although we consider it unlikely that many did. Because we did not ask whether participants recognized any of the depicted sources, we tried to assess recognition indirectly and retrospectively by scanning the open text items at the end of the survey (comments and awareness item) for any mention of ‘Enrico’, ‘Fermi’, ‘José’ or ‘Argüelles’ (ignoring capitalization and diacritical marks). Only one (Spanish) participant mentioned recognizing both of the sources. Although this does not rule out that other participants recognized the depicted sources, it seems unlikely that this was the case for a large proportion of participants.
On the other hand, the multifaceted nature of the manipulation also increases its ecological validity: our stimuli resemble popular internet memes, and real-life instances of source credibility also involve a combination of different features (for example, authorities typically look the part in public and appear in congruous contexts). Furthermore, a recent study showed that merely attributing pseudo-profound nonsense to a famous source such as Aristotle or the Dalai Lama enhanced its profundity ratings relative to unauthored versions, suggesting that the name of an authority alone may suffice to induce source effects87.
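
The retrospective recognition check can be sketched as follows. This is a hypothetical reconstruction, not the authors' script: it assumes whole-word matching after lower-casing and stripping diacritics through Unicode NFKD decomposition, whereas the text states only that capitalization and diacritical marks were ignored. The helper names `fold` and `mentions_source` are illustrative.

```python
import re
import unicodedata

# Name fragments searched for in the open text items.
SOURCE_TERMS = ("enrico", "fermi", "jose", "arguelles")

def fold(text):
    """Strip diacritics and case, e.g. 'Argüelles' -> 'arguelles'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def mentions_source(comment):
    """True if any name fragment occurs as a whole word (assumed matching rule)."""
    words = set(re.findall(r"\w+", fold(comment)))
    return any(term in words for term in SOURCE_TERMS)
```

Whole-word matching avoids false hits such as 'Josephine'; a plain substring search would be more permissive and would need manual review of its matches.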

The effects observed in our experimental data and the associations identified in the existing trust data were highly comparable, suggesting that our source credibility manipulation tapped into participants' attitudes about scientific and religious authorities. A noteworthy divergence, however, is that whereas our data showed a small positive relation between religiosity and credibility ratings for gobbledegook from the scientist, the trust data showed a small negative association between religiosity and trust in scientists. The finding that religious people are generally less trusting towards science has often been reported in the literature53,88,89,90. However, recent studies suggest that the negative relation between religiosity and trust in science might be specific to the USA and weak or absent in other countries91,92,93,94. In addition, although trust is probably closely linked to credibility, explicit trust assessments and credibility ratings of specific statements may diverge, perhaps particularly for the kind of obscure statements used in the current study. That is, the gobbledegook statements may still have resonated better with religious than with non-religious individuals, resulting in the main effect of religiosity on credibility ratings. This main effect may be driven by a tendency towards intuitive reasoning, which has been related to religiosity78,95,96 and to receptivity to pseudo-profound and pseudo-scientific nonsense36,67. It could thus be that mistrust in science only partially dampens the allure of impressive-sounding, science-related gobbledegook for intuitive reasoners36.

Notably, our study showed that across 24 countries, even highly religious people are prone to a scientific source credibility bias, which we have dubbed the Einstein effect. Looking ahead, there are at least six compelling directions for future research on the generalizability and underlying causes of the Einstein effect. First, it remains unclear whether scientific education diminishes the appeal of scientific authority outside its immediate domain. Although those who place faith in science are prone to Einstein effects38,40,97,98, strong scepticism is normative within the practice of science, as anyone who has experienced peer review will attest. Nearly 150 years after Charles Peirce famously argued for fixing beliefs by the 'method of science' rather than the 'method of authority', the role of appeals to scientific authority among scientists remains unclear99. Second, future researchers might investigate whether political partisanship predicts differences in scientific source credibility. Although political commitments may share common psychological features with religious commitments100,101,102,103, the rise of anti-science populist ideologies might diminish or reverse Einstein effects among political partisans. By contrast, individual differences in deference to science104 may predict enhanced Einstein effects, although a recent study failed to find this pattern for faith in science (van der Miesen et al., in preparation). Third, the historical origins of scientific source credibility across different cultures remain unclear. If we were to wind back the clock a century to Einstein's era, would we also observe preferential source credibility for scientific authority over spiritual authority? Fourth, the proximate and sustaining social and technological causes of scientific source credibility are not addressed in our study and remain ripe for investigation.
Is scientific source credibility an artefact of global information networks, country-wide science education or the sequestering of religious authority into the private domain? Fifth, although our study covers 24 countries worldwide, we cannot claim universality for our findings. Indeed, investigating source credibility in cultures where spiritual authority dominates may help to clarify the mechanistic questions that our study raises but does not address. Sixth, future work could investigate how the Einstein effect is affected by content cues (for example, the use of jargon, argument coherence and disclosure of uncertainty105) and by personal attitudes towards the topic106,107,108.

In conclusion, our results strongly suggest that scientific authority is generally considered a more reliable source of truth than spiritual authority. Indeed, there are ample examples of science serving as an important cue for credibility: the cover of the book by Donald Trump's niece about her family is adorned with 'Mary L. Trump, PhD'; advertisements for cosmetic products often claim to be 'clinically proven' and 'recommended by dermatologists'; and even the tobacco industry used to appeal to science (for example, 'more doctors smoke Camels than any other cigarette'). By systematically quantifying the difference between the acceptance of statements from a scientific and from a spiritual authority in a global sample, this work addresses the fundamental question of how people trust what others say about the world.

Methods

Participants

In total, 10,535 participants completed the online experiment. Of these, 340 participants (3.23%) were excluded because they failed the attention check (but see Table 2 for equivalent results when data from all participants are included), leaving an analytical sample of N = 10,195 participants from 24 countries (see Table 3 for descriptive statistics per country). Participants were recruited from university student samples, from personal networks and from representative samples accessed by panel agencies and online platforms (MTurk, Kieskompas, Sojump, TurkPrime, Lancers, Qualtrics panels, Crowdpanel and Prolific). Participants were compensated through financial remuneration, the chance of a reward through a raffle or course credits, or received no compensation. There were no a priori exclusion criteria; everyone over 18 years old could participate. Answers to all multiple choice questions were required, so there were no missing data (except for 36 participants who did not provide a valid age). The countries were convenience sampled (that is, through personal networks) but were selected to cover six continents and to include different ethnic and religious majorities (Christian, Muslim, Hindu, Jewish, Eastern religions, as well as highly secular societies). Table 3 displays the method of recruitment and compensation per country.

Table 3 Descriptive statistics per country

The study was approved by the local ethics committee at the Psychology Department of the University of Amsterdam (Project #2018-SP-9713). Additional approval was obtained from local IRBs at Adolfo Ibáñez University (Chile), Babeș-Bolyai University (Romania), James Cook University (Singapore), Royal Holloway, University of London (UK) and the University of Connecticut (USA).

Sampling plan

We preregistered a target sample size of n = 400 per country and 20–25 target countries. The preregistered sample size and composition allowed us to examine overall effects, effects within countries and differences between countries. Because we applied a Bayesian statistical framework, we needed a minimum of 20 countries to have sufficient data for accurate estimation in cross-country comparisons109. However, our main interest was in overall effects rather than effects for individual countries. With approximately 8,800 participants, we would have sufficient data to reliably estimate overall effects, especially as the source effect was manipulated within subjects. Data collection was terminated on 30 November 2019; data from ten participants who completed the survey after this date were retained in the dataset.

Materials

The study was part of a larger project on cross-cultural effects related to religiosity (see Supplementary Information for details about the project). The full translated survey for each included country can be found at osf.io/kywjs/. The relevant variables for the current study were individual religiosity, the manipulated source of authority and the ratings of the statements.

Participant religiosity was measured using established items taken from the World Values Survey80, covering religious behaviours (institutionalized, such as church attendance, and private, such as prayer/meditation), beliefs, identification, values and denomination (see Supplementary Table 5 for the exact items). Besides having high face validity, these measures have been applied cross-culturally in other studies79,110,111. A Bayesian reliability analysis using the Bayesrel package112 indicated good internal consistency of the religiosity measure, McDonald's omega = 0.930 (0.927, 0.931). The religious membership item was removed from the scale because it was only moderately correlated with the other items (item-rest correlation = 0.608, all others > 0.706), and dropping it improved the reliability to omega = 0.939 (0.938, 0.941). The remaining seven religiosity items were each rescaled to a 0–1 range (so that each item contributed equally to the scale), summed to create a religiosity score per participant and grand-mean standardized for the analyses.
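
The scoring steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes every item is already coded so that higher values mean more religious, that each item's possible response range is known, and that "grand-mean standardized" means a z-score over the pooled sample. The item-rest correlation helper mirrors the statistic used to flag the membership item; all function names are hypothetical.

```python
import numpy as np

def rescale_items(responses, item_min, item_max):
    """Map each item linearly from its possible response range onto 0-1.

    responses: array of shape (n_participants, n_items), higher = more religious.
    item_min / item_max: lowest and highest possible answer per item (assumed known).
    """
    responses = np.asarray(responses, dtype=float)
    lo = np.asarray(item_min, dtype=float)
    hi = np.asarray(item_max, dtype=float)
    return (responses - lo) / (hi - lo)

def religiosity_score(responses, item_min, item_max):
    """Sum the rescaled items per participant, then z-standardize over the pooled sample."""
    total = rescale_items(responses, item_min, item_max).sum(axis=1)
    return (total - total.mean()) / total.std(ddof=1)

def item_rest_correlations(items):
    """Pearson correlation of each item with the sum of all remaining items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, k], total - items[:, k])[0, 1]
        for k in range(items.shape[1])
    ])

# Hypothetical example: one item on a 1-7 scale and one item coded 0/1.
raw = np.array([[1, 0], [4, 1], [7, 1]])
scores = religiosity_score(raw, item_min=[1, 0], item_max=[7, 1])
```

Rescaling before summing keeps a seven-point item from outweighing a binary one, which is the stated reason for the 0–1 transformation.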

The experimental stimuli consisted of two gobbledegook statements that were attributed to a spiritual guru and to a scientific authority (within subjects). We created two versions of the source frame, manipulating (1) the background of the frame: an opaque new age purple galaxy background versus an opaque dark green chalkboard with physics equations; (2) the accompanying grey-scale photo of the alleged source: a man in robes (photo of José Argüelles) versus a man in an old-fashioned suit (photo of Enrico Fermi); and (3) the reported profession: spiritual leader versus scientist. In addition, in the introductory text, the source was further introduced as 'Saul J. Adrian—a spiritual authority in world religions' versus 'Edward K. Leal—a scientific authority in the field of particle physics', with names counterbalanced across sources. The names were fictitious, and the photos were taken from Wikipedia with re-use permission. The two statement texts each consisted of three sentences (37 and 38 words). We generated the statements using the New Age Bullshit Generator (http://sebpearce.com/bullshit/), which combines new age buzzwords in syntactically correct structures, resulting in meaningless but pseudo-profound-sounding texts67. The two statement texts were counterbalanced between sources. Participants were randomly assigned to the scientist–guru or the guru–scientist order condition. The stimuli in each language are provided at osf.io/qsyvw/.
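
One way to implement this within-subjects design is sketched below. The exact counterbalancing scheme (fully crossed cells, balanced quotas or simple randomization) is not specified here, so this sketch simply assumes that order, name pairing and statement pairing are each randomized independently per participant; `assign_trials` and the frame labels are hypothetical.

```python
import random

# Source frames: profession, background and photo always vary together.
FRAMES = {
    "scientist": {"profession": "scientist", "background": "chalkboard", "photo": "Enrico Fermi"},
    "guru": {"profession": "spiritual leader", "background": "galaxy", "photo": "José Argüelles"},
}
NAMES = ("Saul J. Adrian", "Edward K. Leal")  # fictitious names used in the study
TEXTS = ("A", "B")                            # the two statement versions

def assign_trials(rng):
    """Return the two trials for one participant (assumed independent randomization)."""
    # Order condition: scientist-guru or guru-scientist.
    order = rng.choice([("scientist", "guru"), ("guru", "scientist")])
    # Pair names and statement texts with sources at random.
    name_for = dict(zip(("scientist", "guru"), rng.sample(NAMES, 2)))
    text_for = dict(zip(("scientist", "guru"), rng.sample(TEXTS, 2)))
    return [
        {"source": src, "name": name_for[src], "statement": text_for[src], **FRAMES[src]}
        for src in order
    ]

trials = assign_trials(random.Random(2019))
```

Every participant sees both sources and both texts exactly once, so any difference between the two ratings can be attributed to the source frame rather than to the text.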

The main outcome variable pertained to judgements of the importance and credibility of gobbledegook, measured on a seven-point Likert scale from not at all important/not at all credible to extremely important/extremely credible, respectively. A multiple choice recognition item for the source that expressed the statement was included as a manipulation check. In our preregistration, we did not specify that we would exclude participants based on incorrect recall of the source of the statement. We therefore kept all observations in the dataset for the main analyses and additionally ran the models without the observations for which the source was not recalled correctly. The results of this robustness check are provided in Table 2. For exploratory purposes, we also measured reading and processing time for the statement, as well as depth of processing. The latter was operationalized as the number of items correctly identified as having appeared in the statement. Participants were presented with a list of ten words, including five targets and five distractors, and were asked to select the words that they recognized from the statement.
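
The depth-of-processing score and the recall-based robustness filter can be sketched as below. We assume that "correctly identified" means hits (selected words that were among the five targets), without a penalty for selected distractors; the field names `recalled_source` and `shown_source` are hypothetical.

```python
def depth_of_processing(selected, targets):
    """Count hits: selected words that actually appeared in the statement."""
    return len(set(selected) & set(targets))

def correctly_recalled(observations):
    """Keep only trials in which the recalled source matches the source shown."""
    return [obs for obs in observations if obs["recalled_source"] == obs["shown_source"]]

# Hypothetical word lists: five targets, five distractors.
targets = ["target1", "target2", "target3", "target4", "target5"]
selected = ["target1", "target2", "target3", "distractor1"]
score = depth_of_processing(selected, targets)
```

A signal-detection variant (hits minus false alarms) would additionally penalize indiscriminate clicking; the text does not indicate which variant was used.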

Procedure

Participants received a link to the Qualtrics survey by email, via social media or through an online platform. After reading the instructions and providing informed consent, they first completed items for a separate study about religiosity and trustworthiness. Next, they were presented with the first statement and source stimulus, rated its importance and credibility, completed the manipulation check to verify that they had registered the source, and completed the word recall item. These elements were then repeated for the second statement. After that, participants completed items about body–mind dualism. Finally, they provided demographics, completed a quality of life scale and the religiosity items, and were given the opportunity to provide comments. The survey took about 10 minutes to complete (median completion time was 11.4 minutes).

Data analysis

We used the R package BayesFactor76 to estimate and test the multilevel Bayesian regression models113,114. The multilevel Bayesian modelling approach allows us to systematically evaluate the evidence in the data under different models: (1) across all countries the effect is truly null; (2) all countries share a common non-zero effect; (3) countries differ, but all effects are in the same (predicted) direction; and (4) in some countries the effect is positive, whereas in others the effect is negative. The models differ in the extent to which they constrain their predictions, from the most constrained (1) to completely unconstrained (4). We refer to these models as the null model, the common effect model, the positive effects model and the unconstrained model, respectively. Note that although the predictions from model (3) are less constrained than those from model (2), it is more difficult to obtain evidence for small effects under the former model because it assumes that the effect is present in every country, rather than only in the aggregate sample.
When applied to our hypothesis for the source effect, evidence for (1) would indicate that people from these 24 countries do not differentially evaluate credibility of claims from a guru or a scientist, evidence for (2) would indicate that on average people from these 24 countries consider claims from a scientist more credible than from a guru (or vice versa) with little between-country variability in the size of the effect, evidence for (3) would indicate that in all of the 24 countries, people consider claims from a scientist more credible than from a guru (or vice versa), but there is cultural variation in the size of this effect, and evidence for (4) would indicate that in some countries people consider claims from a scientist more credible than from a guru, and in other countries people consider claims from a guru more credible than from a scientist, indicating cultural variation in the direction (and size) of the effect. We used the interpretation categories for Bayes factors proposed by Lee and Wagenmakers115, based on the original labels specified by Jeffreys116.
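
The verbal interpretation of Bayes factors and the comparison of non-null models can be sketched as follows. The label boundaries follow the Lee and Wagenmakers categories (anecdotal 1–3, moderate 3–10, strong 10–30, very strong 30–100, extreme > 100); how exact boundary values are assigned is our own choice. The second helper relies on the general property that two Bayes factors computed against the same reference model (for example, the null model) can be divided to compare the two models directly. Both function names are hypothetical.

```python
def bf_label(bf10):
    """Verbal label for a Bayes factor of H1 over H0 (Lee & Wagenmakers categories)."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 == 1:
        return "no evidence"
    favoured = "H1" if bf10 > 1 else "H0"
    strength = bf10 if bf10 > 1 else 1 / bf10  # evidence for H0 uses the reciprocal
    if strength > 100:
        label = "extreme"
    elif strength > 30:
        label = "very strong"
    elif strength > 10:
        label = "strong"
    elif strength > 3:
        label = "moderate"
    else:
        label = "anecdotal"
    return f"{label} evidence for {favoured}"

def bf_between(bf_a0, bf_b0):
    """BF of model a over model b, given both models' BFs against a common reference model 0."""
    return bf_a0 / bf_b0
```

For example, if the positive effects model and the common effect model each had a Bayes factor against the null model, their ratio would quantify the evidence for one over the other without refitting anything.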

For the main effect of source (\(\mathcal{H}_1\)), we specified the following unconstrained model. Let \(Y_{ijk}\) be the credibility rating for the \(i\)th participant, \(i = 1, \ldots, N\), in the \(j\)th country, \(j = 1, \ldots, 24\), for the \(k\)th condition, \(k = 1, 2\). Then \(Y_{ijk} \sim \mathcal{N}(\mu + \alpha_j + v_i\beta + r_i\delta_j + x_k\gamma_j, \sigma^2)\). Here, the term \(\mu + \alpha_j\) serves as the baseline credibility intercept, with \(\mu\) being the grand mean and \(\alpha_j\) the \(j\)th country's deviation from the grand mean. The term \(\beta\) reflects the fixed effect of the level of education covariate, and \(\delta_j\) is the \(j\)th country's main effect of religiosity on credibility ratings. The crucial parameter is \(\gamma_j\), the source effect for the \(j\)th country; in the common effect model, \(\gamma_j\) is replaced by a single \(\gamma\). The variable \(x_k = -0.5\) or \(0.5\) if \(k = 1\) or \(2\), respectively, where \(k = 1\) indicates the scientist condition and \(k = 2\) indicates the guru condition. The variable \(v_i\) is the standardized participant-level education covariate, and \(r_i\) is the standardized religiosity score for each participant. Finally, \(\sigma^2\) is the variance in credibility ratings across participants.
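
To make the model concrete, the sketch below draws synthetic ratings from the data-generating equation as written, treating ratings as continuous. All parameter values are hypothetical, and education and religiosity are simulated as standard normal variables. Because the scientist condition is coded \(x_1 = -0.5\) and the guru condition \(x_2 = 0.5\), the guru-minus-scientist difference equals \(\gamma_j\), so a credibility advantage for the scientist appears as a negative \(\gamma_j\).

```python
import numpy as np

def simulate_ratings(n_per_country, mu, alpha, beta, delta, gamma, sigma, rng):
    """Simulate Y_ijk ~ N(mu + alpha_j + v_i*beta + r_i*delta_j + x_k*gamma_j, sigma^2).

    alpha, delta, gamma: per-country parameter arrays of equal length J.
    Returns country index, education v, religiosity r and ratings y of shape (N, 2),
    where column 0 is k=1 (scientist, x=-0.5) and column 1 is k=2 (guru, x=+0.5).
    """
    alpha, delta, gamma = (np.asarray(a, dtype=float) for a in (alpha, delta, gamma))
    country = np.repeat(np.arange(alpha.size), n_per_country)
    n = country.size
    v = rng.standard_normal(n)          # standardized education (simulated)
    r = rng.standard_normal(n)          # standardized religiosity (simulated)
    x = np.array([-0.5, 0.5])           # source coding x_k
    person_part = mu + alpha[country] + v * beta + r * delta[country]
    mean = person_part[:, None] + x[None, :] * gamma[country][:, None]
    y = rng.normal(mean, sigma)
    return country, v, r, y
```

Setting `sigma=0` removes the noise and makes the role of each term easy to check by hand.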

To test the source-by-religiosity interaction for hypothesis 2, the model above is extended by including an interaction term: \(Y_{ijk} \sim \mathcal{N}(\mu + \alpha_j + v_i\beta + r_i\delta_j + x_k\gamma_j + r_i x_k\theta_j, \sigma^2)\), where \(\theta_j\) is the parameter of interest, the religiosity × source interaction effect, and \(r_i x_k\) is the product of the experimental condition and the standardized individual religiosity score. The parameter estimates reported in the Results section are based on this full model.
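
The interaction term enters the model simply as an extra column, the product \(r_i x_k\). The sketch below builds that design matrix and, as a sanity check, recovers hypothetical coefficients from synthetic data with an ordinary least-squares fit. This is deliberately a swapped-in, pooled frequentist fit: it ignores country clustering and the within-subject pairing of ratings, and it is not the multilevel Bayesian estimation used in the study.

```python
import numpy as np

def interaction_design(v, r, x):
    """Columns: intercept, education v, religiosity r, source x, product r*x."""
    return np.column_stack([np.ones_like(r), v, r, x, r * x])

rng = np.random.default_rng(1)
n = 5000
v = rng.standard_normal(n)
r = rng.standard_normal(n)
x = rng.choice([-0.5, 0.5], size=n)
# Hypothetical true values for mu, beta, delta, gamma and theta (no country structure).
true = np.array([4.0, 0.1, 0.2, -0.4, 0.05])
y = interaction_design(v, r, x) @ true + rng.normal(0.0, 1.0, size=n)

# Least-squares estimates should lie close to `true` at this sample size.
coef, *_ = np.linalg.lstsq(interaction_design(v, r, x), y, rcond=None)
```

Because both predictors are centred (standardized religiosity, source coded as -0.5/0.5), the main-effect coefficients keep their interpretation as average effects even when the product term is included.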

To systematically investigate which third variables should and should not be included in the statistical model, we used directed acyclic graphs117 to visually represent the causal relations between the variables in our data118,119,120. In short, this method entails specifying directed relations (arrows) between the constructs and measures (nodes) in a given design; the resulting graph makes the assumed causal structure explicit and shows which third variables should be accounted for and which should be ignored in the statistical model. Based on directed acyclic graphs created in the R package ggdag121, both country and level of education were identified as potential confounding factors that warranted inclusion, because they may affect both religiosity122,123 and overall credibility assessments (for example, through scepticism). Country was therefore added as a clustering factor, and level of education was added as a fixed covariate in all models. We also ran the models including all participant-level variables related to the primary measures, that is, gender124, age125, socio-economic status126,127, statement version (A or B) and presentation order (guru–scientist or scientist–guru). Including these covariates improved the model fit, but the qualitative results remained the same regardless of the (set of) covariates. See Supplementary Figs. 4–6 for details on the causal graphs and Table 2 for the primary results without any and with all covariates.
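
The confounder logic can be illustrated on a deliberately simplified, hypothetical version of the causal graph. The helper below applies a common-cause heuristic: it returns nodes that cause the exposure (religiosity) and also reach the outcome (credibility) without passing through the exposure. This is not the full back-door criterion implemented in tools such as ggdag or dagitty, which also handles colliders and mediators; the graph and function names are illustrative.

```python
# Hypothetical, simplified causal graph (edges point from cause to effect).
DAG = {
    "country": ["religiosity", "credibility"],
    "education": ["religiosity", "credibility"],
    "religiosity": ["credibility"],
    "source": ["credibility"],
}

def ancestors(dag, node, blocked=None):
    """All nodes with a directed path into `node` that does not pass through `blocked`."""
    parents = {}
    for parent, children in dag.items():
        for child in children:
            parents.setdefault(child, set()).add(parent)
    found, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent != blocked and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def common_causes(dag, exposure, outcome):
    """Causes of the exposure that also reach the outcome without going through the exposure."""
    return ancestors(dag, exposure) & ancestors(dag, outcome, blocked=exposure)

confounders = common_causes(DAG, "religiosity", "credibility")
```

In this toy graph, the source manipulation affects credibility but not religiosity, so it is not flagged as a confounder, whereas country and education are.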

Prior settings

The BayesFactor package applies the default priors for ANOVA and regression designs128,129, in which the researcher can set the scale for each individual predictor in the model. We used the settings for the critical priors in the multilevel models as proposed by Rouder et al.114, concerning the scales on \(\mu_\gamma\), \(\mu_\theta\) and \(\sigma_\gamma^2\), \(\sigma_\theta^2\). The scale on \(\mu_\gamma\) and \(\mu_\theta\) reflects the expected size of the overall source effect and source-by-religiosity effect, respectively, and is set to 0.4 (small–medium effect). The scale on \(\sigma_\gamma^2\) and \(\sigma_\theta^2\) reflects the expected amount of variability in these effects across countries; it is set to 60% of the scale on the overall effect, resulting in a value of 0.24 (0.6 × 0.4). The prior scale for the overall between-country variance was set to 1. We used 31,000 iterations for the Markov chain Monte Carlo sampling and discarded the first 1,000 iterations ('burn-in').
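
As a rough guide to what a scale value means, the sketch below assumes, as in the default JZS-style setup that BayesFactor builds on, that a scale r is the scale of a zero-centred Cauchy prior on a standardized effect. Under that assumption exactly half of the prior mass lies between -r and r, so a scale of 0.4 expresses a 50% prior belief that the overall effect is smaller than 0.4 in absolute size. The sketch also reproduces the arithmetic for the between-country scale and the number of retained MCMC draws; variable names are illustrative.

```python
import math

def cauchy_mass_within(bound, scale):
    """P(-bound < effect < bound) under a zero-centred Cauchy prior with the given scale."""
    return 2 * math.atan(bound / scale) / math.pi

overall_scale = 0.4                          # scale on mu_gamma and mu_theta
between_country_scale = 0.6 * overall_scale  # 60% of the overall scale, i.e. 0.24
retained_draws = 31_000 - 1_000              # iterations after discarding burn-in

half_mass = cauchy_mass_within(overall_scale, overall_scale)
```

The heavy Cauchy tails mean that, even with a small–medium scale, the prior still leaves noticeable mass for large effects.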

Deviations from preregistration

We deviated from the preregistration in the following ways. First, in our preregistration, we formulated a hypothesis about the interaction between source and perceived cultural norms of religiosity in one’s country. However, in retrospect, we realized this hypothesis lacked theoretical justification and the proposed analysis was methodologically suboptimal (see Supplementary Information for details on this analysis).

Second, as a stopping rule, we preregistered that data collection would be terminated (1) when the target of n = 400 per country was reached, or (2) by 30 September 2019. However, due to unforeseen delays in construction of the materials and recruitment, this deadline was extended to 30 November 2019. We did not download or inspect the data until after 30 November.

Third, we preregistered that we would only include countries with usable data (that is, complete data from attentive participants) from at least 300 participants. However, we decided to keep the n = 291 participants from Lithuania in the final sample, because the hierarchical models account for uncertainty in estimates from countries with smaller samples, and removing these data would reduce the overall precision of the estimates. Moreover, it would be unfortunate to discard all data from a highly understudied country.
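
The argument for keeping a slightly smaller country can be illustrated with a minimal normal-normal partial-pooling sketch. It assumes a known within-country SD (sigma) and a known between-country SD (tau); the actual BayesFactor models estimate these quantities rather than fixing them, and all numbers below are hypothetical. The country estimate is a precision-weighted average of its own mean and the grand mean, so a smaller sample is pulled more strongly towards the grand mean instead of being discarded.

```python
def partial_pool(country_mean, n, sigma, grand_mean, tau):
    """Posterior mean and SD of a country effect under a normal-normal model.

    country_mean: observed mean in the country; n: its sample size.
    sigma: within-country SD (assumed known); tau: between-country SD (assumed known).
    """
    precision_data = n / sigma**2      # information from this country's own data
    precision_prior = 1 / tau**2       # information borrowed from the other countries
    total = precision_data + precision_prior
    mean = (precision_data * country_mean + precision_prior * grand_mean) / total
    return mean, total ** -0.5

# Hypothetical numbers: the same observed mean with n = 291 versus n = 400.
small = partial_pool(1.0, 291, sigma=1.0, grand_mean=0.0, tau=0.1)
large = partial_pool(1.0, 400, sigma=1.0, grand_mean=0.0, tau=0.1)
```

The n = 291 estimate is shrunk somewhat more and is slightly less precise, but it still contributes information to the overall estimates.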

Fourth, we preregistered that we would use the R package brms130 to analyse the data and estimate model parameters. However, we ended up using the BayesFactor package76, which is arguably better suited to model comparison, and to calculating Bayes factors in particular. We also ran the models as preregistered and report these results in the Supplementary Information.

Fifth, we added level of education as a participant-level covariate to the models, which improved the model fits. Note that adjustments 3–5 did not qualitatively change any of the results (Table 2 and the Supplementary Information).

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.