False Positives

by Jeff Lindsay (12/13/2015 5:27 pm)

Fun with Dice in a Workshop for Management

"Facts" and "data" are what most people use when they make decisions, especially decisions that others see as hopelessly biased and idiotic. The challenge is learning how to interpret facts and how to analyze data so that our decisions are less likely to be blunders based on the many biases and fallacies that can mislead all of us.

In a training class on decision making that I was invited to do for managers in one of my employer's groups in China, I brought in a big bag of dice to help local managers understand a serious example of self-deception in leadership. In particular, I sought to illustrate why criticism of employees for poor results seems to work better than praise for good results. It's an example where solid experience and significant data can actually mislead and deceive.

Everybody in a group of about 50 people was given three dice. I then explained their KPIs (Key Performance Indicators--an important phrase in modern Bizspeak) for the exercise: the company needs high scores. Everyone was asked to shake their dice and roll them once, then record their score. The few with the highest scores (e.g., a sum of 15 or higher) were brought up to the front. A group with about the same number of people with the lowest scores (e.g., less than 6) was also brought up to the front.

Now it was time for their performance review. I approached the high performers and personally congratulated them. I shook each person's hand, thanked them for their amazing achievement, gave them gifts (a candy bar) and cash incentives (a Chinese bill--real money), and praised them as good examples for the rest of the company. The whole room then joined me in loud applause for these high-achievers.

Then it was time for consequences for the low-performers. I shook my head, grimaced, wagged my finger at them, and scolded them for their failure in giving such disappointing results. They were a shame to the company, and we might need to demote them or fire them if they didn't shape up.

With proper rewards and punishments having been meted out, the people in these two outlier groups were given three more dice each and allowed to roll again. Amazingly, the average score of the former top performers now dropped significantly. Almost everyone in that team did more poorly after receiving praise and rewards. But for the ones who had been scolded and criticized, a notable improvement was observed. Almost all of them showed significant gains in their scores. Wow, praise hurts and criticism helps, right?

This pattern is fairly reproducible, and coincides with the vast experience of many coaches, bosses, generals, and leaders of all kinds: criticism and punishment work better than praise; yelling works better than kindness. They have solid experience to prove it, and they are often right, in a sense, but also perhaps terribly wrong.

Obviously, the results with the dice were not likely to be affected by praise or rewards (as long as the participants behaved honestly). What was happening here is a common statistical phenomenon that results in a great deal of self-deception in many fields of life. The phenomenon is "regression to the mean." When there is a degree of randomness, as there is in much of life, random trends that depart above or below the mean tend to come back. Results that are extreme are often statistical outliers, explainable by chance, that are not necessarily caused by the explanations we try to concoct. It's why athletes who make it to the cover of famous sports magazines after a string of remarkable successes tend to disappoint immediately after, leading to the "Sports Illustrated jinx" which may not be a real jinx at all. It's the reason why highly intelligent women such as my wife tend to marry men who are less intelligent, like certain bloggers around here. Since the correlation between female and male intelligence in marriage is not perfect and therefore involves some degree of randomness, the most intelligent outliers among females will tend to marry men who are not such extreme outliers themselves, and the probability is that they will tend to be less intelligent. This works both ways.
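For anyone who wants to check the dice exercise without a bag of dice, here is a minimal sketch in Python. It assumes fair six-sided dice and uses the same illustrative cutoffs as the workshop (a sum of 15 or higher, and less than 6); the helper name `p_change` is made up for this example. It enumerates all 216 outcomes exactly rather than simulating, so there is no randomness in the result.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Exact distribution of the sum of three fair six-sided dice (assumed fair).
counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))
total = sum(counts.values())  # 216 equally likely outcomes
prob = {score: Fraction(n, total) for score, n in counts.items()}

# Long-run average score for anyone, praised or scolded.
mean_score = sum(score * p for score, p in prob.items())


def p_change(group, compare):
    """P(compare(second, first) is true | first roll landed in `group`).

    The second roll is independent of the first: no praise or scolding
    enters the calculation anywhere.
    """
    p_group = sum(prob[first] for first in group)
    joint = sum(
        prob[first] * prob[second]
        for first in group
        for second in prob
        if compare(second, first)
    )
    return joint / p_group


high = [s for s in prob if s >= 15]  # the "top performers"
low = [s for s in prob if s < 6]     # the "low performers"

p_high_drops = p_change(high, lambda second, first: second < first)
p_low_improves = p_change(low, lambda second, first: second > first)

print(f"mean score: {float(mean_score)}")
print(f"top performers who do worse next time: {float(p_high_drops):.1%}")
print(f"low performers who do better next time: {float(p_low_improves):.1%}")
```

Under these assumptions, roughly 94% of the praised group should score lower on the second roll and roughly 97% of the scolded group should score higher, purely because an extreme roll is usually followed by a more ordinary one.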

Two Books That Inspired My Workshop

My experiment with dice and other parts of my workshop were inspired in part by two outstanding books that I recommend. The exercise with dice was inspired by a story in Thinking, Fast and Slow by Daniel Kahneman (New York: Farrar, Straus and Giroux, 2011). I picked this up for airplane reading a couple of years ago and found a real gem that I have applied in a variety of ways, though I still readily fall into many of the fallacies of human thought that so easily beset us. The story that motivated my dice exercise for management came from Chapter 17, "Regression to the Mean," pp. 175-176.

Kahneman, winner of the Nobel Prize in economics, once taught Israeli air force instructors about the psychology of effective training. He stressed an important principle: that rewards for improved performance worked better than punishment for mistakes. This is something that we employees tend to understand easily, but is often a mystery to those dishing out the punishments and rewards. One of the instructors challenged him and said that this principle was refuted by his own extensive experience. When a cadet performed exceptionally well and was praised, he would usually do worse on the next exercise. But when someone performed poorly and was criticized, he usually did better on the next run. This was "a joyous moment of insight" to Kahneman, who recognized an important application of what he had been teaching for years about the regression to the mean. The instructor had been looking for a cause-and-effect explanation for natural, random fluctuations, and had developed an iron-clad theory that was dead wrong. He had extensive real data, but had been deceived by a failure to understand the impact of randomness. Real data + bad statistics (or bad math) = bogus conclusions.

Regression to the mean is one of several important principles Kahneman discusses. Many have roots in mathematics. All have connections to human psychology and the way our brains work. Kahneman is brilliant in illustrating how often we make flawed decisions, and gives us some tools to overcome these tendencies.

Related to Kahneman's work is another math-oriented book which I relied on in my workshop on decision making, and which I highly recommend: Jordan Ellenberg, How Not to be Wrong: The Power of Mathematical Thinking (New York: Penguin Press, 2014).

Ellenberg shows how basic mathematics can quickly expose many of the fallacies that we make in our thinking and decision making. Some of his discussions have application to matters that come up in discussions of LDS religion and in evaluation of evidence to support or discredit a theory. He decimates one of the classic "Texas sharpshooter" disasters in religious circles, the utterly bogus methodology used in The Bible Code in which the Hebrew text of the Old Testament was treated as a miraculous, absolutely perfect text filled with hidden prophecies that could be obtained by mathematically rearranging the letters of the text in numerous different grid patterns and then searching for new words in something of a hidden-word puzzle. With computerized tools, thousands of grids could be formed to lay out the letters in new two-dimensional patterns and then these patterns were searched to find all sorts of modern topics.

There's an old joke about a man in Texas who took a rifle and fired a few dozen random rounds into the side of a barn. Wherever the shots were clustered together, he painted a target around them and then told people that he was a sharpshooter. Drawing circles around spaced-apart Hebrew letters on, say, arrangement number 47,356 and finding a hidden prophecy is analogous to the Texas sharpshooter.

Further, the very premise of a perfect text for the study is completely without logic. There are multiple versions of the ancient Torah with obvious gaps and uncertainties (e.g., compare the Torah in the Dead Sea Scrolls to the version used today: there are differences). Changing one letter in the text would throw off the alignments on the selected grid patterns that give mystical results, making the whole exercise obviously bogus.

In "Dead Fish Don't Read Minds" (Chapter 7), Ellenberg warns of the dangers of amplifying noise into false positives when we have the resources of Big Data to play with. With numerous variables to explore and map, it is incredibly easy to find some that seem to correlate. This was brilliantly illustrated in a real but still somewhat tongue-in-cheek paper that managed to be accepted for presentation at the 2009 Organization for Human Brain Mapping meeting in San Francisco, where UC Santa Barbara neuroscientist Craig Bennett presented a poster called "Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparison corrections." (See discussion at Scientific American.) Basically, this paper reported statistical results from MRI brain mapping taken in a dead fish as the fish was shown photos of people in different emotional states. Dr. Bennett used the power of Big Data to explore MRI signals from millions of positions and found a couple of spots in the fish's neural architecture where the fluctuations corresponded well with the emotional state shown in the photos. His paper was a clever way of illustrating the dangers of using Big Data to find correlations that really don't mean anything, much like the work to find "smoking guns" for Book of Mormon plagiarism by doing computer searches of short phrases with hundreds of thousands of books to toy with.
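The arithmetic behind "it is incredibly easy to find some that seem to correlate" can be sketched in a few lines. This is only an illustration under stated assumptions: the tests are treated as independent, the 0.05 threshold is a conventional choice rather than anything taken from Bennett's poster, and the Bonferroni adjustment shown is just one well-known multiple comparison correction, not necessarily the one used there. The function names are made up for this example.

```python
def family_wise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n independent tests
    when there is no real effect anywhere (pure noise)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Stricter per-test threshold that keeps the overall chance of a
    false positive at or below alpha."""
    return alpha / n_tests


ALPHA = 0.05  # conventional significance threshold (an assumption here)

for n in (1, 10, 100, 1000):
    uncorrected = family_wise_error(ALPHA, n)
    corrected = family_wise_error(bonferroni_threshold(ALPHA, n), n)
    print(f"{n:>5} tests: uncorrected {uncorrected:.3f}, corrected {corrected:.3f}")
```

With one test, the chance of being fooled by noise is the familiar 5%. With a hundred tests and no correction it is above 99%, which is why searching enough positions in a dead fish, or enough books in a database, will reliably turn up something.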

Ellenberg is often critical of religion and believers, and probably with good reason, and mocks some aspects of the Bible a time or two, but religious people can learn much from his mathematically sound approach to thinking.

What Are the Odds of That? Actually, Unlikely Results are Guaranteed

The many fallacies explored by Kahneman and Ellenberg can affect all of us in our thinking and decision making. When it comes to apologetics, Latter-day Saints can fall into related traps if we treat anything from anywhere, anytime as potentially being a relevant parallel to the Book of Mormon or other LDS works. Not every New World drawing of people in two or more tones implies that we are looking at Nephites and Lamanites. Not every drawing of a horse means we are looking at evidence for Book of Mormon horses. And a place called Nehem on a map of Arabia is not necessarily evidence that a name like Nahom existed anywhere in Arabia in Lehi's day. Those things can be random parallels. If they are meaningful, there should be further data that can support the hypotheses put forward. Such finds would be most meaningful if they are part of a large body of information from multiple sources that can serve as convergences useful in assessing the particular question at hand, such as "Is the story of Lehi's trail plausible? Could there have been a place called Nahom where Ishmael was buried, with a fertile place like Bountiful nearly due east?" Such questions can be framed in ways that do not leave the infinite wiggle room of Bible Code explorations. (It was such a question, in fact, that motivated Warren Aston to undertake exploration in the Arabian Peninsula at great personal cost with predetermined criteria for Bountiful, a target already drawn before he ever touched the coast of Oman.)

When we are exploring a hypothesis, false positives can easily result from errors in thinking due to failure to understand randomness and regression to the mean, as well as other mathematical and logical fallacies. A key element in the field of statistics is recognizing that a random result can seem to support a hypothesis when there is not actually a cause-and-effect relationship. The science of statistics provides tools and tests to help differentiate between what is random and what is real, though certainty is almost always elusive. Statistics gives us some tools to help reduce the risk of seeing things that aren't there, or to know when we might be missing something that is (these topics involve the issues of "significance" and "power," for example). Even for those trained in statistics, there are abundant errors that can be made and false conclusions that can be drawn.

I'm not a statistician, but I did have 10 hours of graduate level statistics and have frequently had to rely on statistics to assess hypotheses. I even published a little paper on a mathematical issue related to a statistical issue known as the "collector's problem." The publication (peer-reviewed, but still lightweight, IMO) is J.D. Lindsay, "A New Solution for the Probability of Completing Sets in Random Sampling: Definition of the 'Two-Dimensional Factorial'," The Mathematical Scientist, 17: 101-110 (1992), which you can also read online as a Web page or as a Word document. But what really matters is that I am married to a statistician (M.Sc. degree in statistics, now math teacher at an international school in Shanghai)--what are the odds of that? Well, 100%, since that's what happened.

That reminds me of the many mistakes that people on both sides of the debate can make as they argue probabilities. It's easy to see significance in something that happens by chance, especially when we find something that did happen, possibly by chance, and then try to make a case for how improbable that was. Richard Feynman once joked to a class that on the way to the lecture, he saw a car with a specific license plate, ARW 357. "Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight?" This fallacy of a posteriori conclusions is an easy one to make. It's inevitable that any particular license plate will be rare, maybe even unique in all the world or at least all the state. The odds of that happening are low--a priori, before the event--but a posteriori, the odds are high, as in 100%. Making much of something because it is unusual is an error. But again, today, much of the serious evidence being raised for Book of Mormon plausibility is not of that nature.
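The license plate joke can be put into numbers with a small sketch. The plate format here (three letters followed by three digits, every combination equally likely) is an assumption made only to match the look of "ARW 357"; real plate schemes differ.

```python
from fractions import Fraction

# Assumed plate format for illustration: three letters, then three digits.
LETTERS, DIGITS = 26, 10
possible_plates = LETTERS**3 * DIGITS**3

# A priori: someone names one plate in advance, then notices a single car.
p_named_in_advance = Fraction(1, possible_plates)

# A posteriori: the car noticed carries *some* plate, so summing the
# tiny probability over every possible plate gives certainty.
p_some_plate = possible_plates * p_named_in_advance

print(f"possible plates: {possible_plates:,}")
print(f"chance of a plate named in advance: 1 in {possible_plates:,}")
print(f"chance of seeing some plate: {p_some_plate}")
```

Each individual plate is a one-in-millions event, yet one of those events is guaranteed to happen. The surprise only means something if the plate was specified before looking.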

Parallels occur all over the place. Parallel words, themes, and motifs occur across large bodies of literature, even when they are surely unrelated. I can take any two texts and find some parallels. Within a sufficiently large text, I can probably find parallel words and phrases that I can sketch out as a chiasmus. If we scour all the names introduced in the Book of Mormon, we should not be surprised to find a few might have apparent connections to ancient languages. We might not even be shocked to find an occasional one whose purported meaning might be construed to relate well to its context. But when numerous names begin to have support and offer useful new meaning for the text, when things like Hebraic wordplays occur many times and in interesting, meaningful ways, then the evidence can become more significant. When linguistic and archaeological evidence bring about interesting and repeated convergences, it may be time to take a deeper look at the evidence rather than assuming it's all chance.

Seemingly unlikely findings are actually quite likely to happen. That truism, however, is not an excuse to disregard meaningful bodies of evidence and convergences that enlighten a text in question.

Progress in Avoiding False Positives

Fortunately, the risk of methodological fallacies is frequently and openly considered among many LDS apologists, contrary to the allegations of critics who sometimes seem blind to their own biases and methodological flaws.

For example, while chiasmus can and does occur randomly, just as rhymes and other aspects of poetry can be found in random text, there are reasonable criteria for evaluating the strength of a chiasmus that can help screen random chaff from deliberately crafted gems. The possibility of false positives was an important factor in the analysis of John Welch as he explored the role of chiasmus in scripture. Others have built upon his foundation and even offered statistical tools for evaluating chiasmus. It is still possible for something that appears elegant, compact, and brilliantly crafted to be an unintended creation, but we can speak of probability and plausibility in making reasonable evaluations.

In the early days of LDS apologetics, evidence of all kinds was enthusiastically accepted. But gradually, I see LDS writers becoming more nuanced and cautious. A prime example of this is, in my opinion, Brant Gardner in his book, Traditions of the Fathers: The Book of Mormon as History (Salt Lake City: Greg Kofford Books, 2015). He is fully aware of the risk of fallacious parallels, but in evaluating the relationship between a text and history, they must be considered. What matters is how they are considered and what the data can plausibly support. On page 47 [visible in an online preview], Gardner discusses an insight from William Dever, a prominent professor of Near Eastern archaeology and anthropology at the University of Arizona who has been excavating in the Near East for several decades:

Some form of comparison between text and history is always required to discern historicity. Texts are always compared to archaeology and/or other texts. Sometimes even artifacts require explanation by comparison or analogy to similar artifacts from another culture. Comparisons must be made. The problem cannot, therefore, reside in an absolute deficit in any methodology that makes comparisons, but rather in the way the comparisons are made and made to be significant. One important type of controlled parallel is ethnographic analogy. Dever explains his version of this method:
One aspect shared by both biblical scholarship and archaeology is a dependence on analogy as a fundamental method of argument. . . .

The challenge is to find appropriate analogues, those offering the most promise yet capable of being tested in some way. Ethnoarchaeology is useful in this regard, particularly in places where unsophisticated modern cultures are still found superimposed, as it were, upon the remains of the ancient world, as in parts of the Middle East. Analogies drawn from life of modern Arab villages or Bedouin society can, with proper controls, be used to illuminate both artifacts and texts, as many studies have shown. [William G. Dever, What Did the Biblical Writers Know and When Did They Know It? What Archaeology Can Tell Us About the Reality of Ancient Israel (Grand Rapids, Michigan: Eerdmans, 2001), pp. 77-78.]

Dever's work deals with the concept of "convergences" wherein multiple lines of evidence, such as evidence from archaeology and from a text, come together and support the historicity of a text in question. In spite of the risk of fallacious parallels or false positives, convergences can be strong and can create compelling cases for the historicity of a text. Dever's work is also relied on by John Sorenson in Mormon's Codex, where extensive correspondences between Mesoamerica and the Book of Mormon in many different topical areas are explored. Gardner observes that his requirements for a convergence are more demanding than the criteria in Sorenson's approach. But I feel both authors are aware of the risk of random parallels being mistaken as evidence, and are seeking to provide a thorough methodology that combines multiple approaches to establish meaningful though tentative connections that do more than just buttress one point, but which also provide a framework that solves other problems and makes more sense of the text. Convergences that are fruitful and lead to new meaningful discoveries are the most interesting and compelling, though there is always the risk of being wrong.

Gardner's approach also draws upon wisdom from the field of linguistics, which offers many analogies to the problems of evaluating the historicity of a text based on parallels and convergences.

As a result of my orientation, I suggest that we will be best served by an approach applied with great success in the field of historical linguistics. Bruce L. Pearson describes both the problem and the solution:

Sets of words exhibiting similarities in both form and meaning may be presumed to be cognates, given that the languages involved are assumed to be related. This of course is quite circular. We need a list of cognates to show that languages are related, but we first need to know that the languages are related before we may safely look for cognates. In actual practice, therefore, the hypothesis builds slowly, and there may be a number of false starts along the way. But gradually certain correspondence patterns begin to emerge. These patterns point to unsuspected cognates that reveal additional correspondences until eventually a tightly woven web of interlocking evidence is developed. [Bruce L. Pearson, Introduction to Linguistic Concepts, 51]

Pearson’s linguistic methodology describes quite nicely the problem we have in attempting to place the Book of Mormon in history. We cannot adequately compare the text to history unless we know that it is history. We cannot know that it is history unless we compare the text to history. We cannot avoid the necessity of examining parallels between the text and history.

The problem with the fallacy of parallels is that it doesn’t protect against false positives. What is required is a methodology that is more recursive than simple parallels. We need a methodology that generates the “tightly woven web of interlocking evidence” that Pearson indicates resolves the similar issue for historical linguists.

In my studies of foreign language, I've often been intrigued by false cognates that can trick people into imagining connections between languages that might not exist. An interesting example involves the English and Chinese words for "swallow." In English "swallow" can be a verb involving the ingestion of food or liquid and it can be a noun describing a particular bird. Something similar happens in Chinese, where 燕 (yan, pronounced with a falling tone) is the Chinese character for swallow, the bird, while the same sound and nearly the same traditional character, 嚥, is the verb, to swallow. The latter just adds a square at the left, representing a mouth. It's a cool parallel. If this kind of thing happened frequently, or if there were, say, hundreds of ancient Chinese words that showed connections to English, we might have a case for a systematic relationship between the languages. But there really are not meaningful connections between the languages apart from modern borrowed words and a few rare occurrences that can be chalked up to chance. But exploring parallels between languages is a vital area for research and study--it's how relationships between languages are established in the first place and can help fill in huge gaps in the historical and archaeological record. When the parallels become numerous and show patterns that begin to make sense, it's possible that two languages share historical connections. To me, Gardner's appeal to lessons from historical linguistics makes sense. Parallels can be real and meaningful, or they can be spurious. It's a matter of exploring the data and being open to convergences that enlighten and reveal useful new ways of understanding the data.

In evaluating the Book of Mormon, I believe LDS scholars today generally recognize that there is a risk of finding impressive parallels to, say, ancient Mesoamerica or ancient Old World writings that may be merely due to chance.

In my own writings, I've often pointed to the risk that my conclusions are based on chance, misunderstanding, and so forth, and use my blog as a tool to get frequent input from critics. In spite of their repetitive dismissal of all evidence as mere blindness, bias, and methodological fallacy on our part, occasionally they engage with the data and provide some helpful balance or even strong reasons to reject a hypothesis. It's a healthy debate. We don't have all the answers, we are subject to biases of many kinds, but there is still a great deal of exploration and discovery to do that goes beyond finding random items and painting a bullseye around them.

Sometimes the target was there long before the bullet holes were discovered, as in the Arabian Peninsula, which had long been a target for criticism of the Book of Mormon before the field work was done that helped us recognize just how many impressive hits had been scored by the text in First Nephi.

Methodological Error: Not Unique to Mormons

Fallacies of logic and math, of course, aren't unique to believers.

When it comes to the Texas sharpshooter fallacy and related problems with false positives, the critics of Mormonism also have some particular gifts in this area as they scour modern sources to support theories of plagiarism or modern fabrication of the Book of Mormon. Examples of improperly finding meaning from randomness coupled with serious methodological flaws include computer-assisted database searching among thousands of texts for short phrases found in common with the Book of Mormon or allegedly pointing to implausible sources for the Book of Mormon such as The Late War Between the United States and Great Britain. Naturally, texts written in imitation of the King James style score highly with their abundance of such words as "thou" and "thee" instead of "you," but there is no substance to the claim of plagiarism.

"Parallelomania," in fact, is an increasing problem in the works of critics purporting to explain the Book of Mormon by appeals to numerous other texts. An excellent discussion of false positives from parallels in an anti-Mormon work is found in "Finding Parallels: Some Cautions and Criticisms, Part One" by Benjamin L. McGuire in Interpreter: A Journal of Mormon Scripture 5 (2013): 1-59. In Part Two, he gets more heavily into the methodology of treating parallels.

There is no doubt that there are many parallels, as there can be between any two unrelated texts. One of my early essays on the Book of Mormon sought to expose the problem of false positives for those claiming Book of Mormon plagiarism by coming up with even stronger examples of parallels than the critics were delivering. The result was my satirical essay, "Was the Book of Mormon Plagiarized from Walt Whitman's Leaves of Grass?" (May 20, 2002, but slightly updated several times since then), which Brant Gardner kindly quotes from in The Book of Mormon as History (2015, pp. 44-45). Based on the data alone, one can make a strong case that the Book of Mormon borrowed heavily from Whitman's 1855 work or at least had a common source (perhaps Solomon Spaulding was plagiarized by Whitman as well?). That's ridiculous, of course--but the parallels show just how easily one can be misled by random parallels, coupled with a little creativity and a dash of zeal. If those claiming plagiarism can't clearly outdo Whitman as a control, a reasonable case has not been made.

One of the latest posts at Mormanity dealt with my explorations of a hypothesis from Noel Reynolds about the possibility of the Book of Moses and the brass plates sharing some common material. Reynolds' article includes a detailed discussion of what it takes to determine the relationship between two texts. It begins with a consideration of the requirements for one text to depend on another. He is aware of the risk of random parallels and discusses the issues rather carefully, aware of the kind of evidence that is required to find meaningful parallels. He also offers a test case in the Book of Abraham. Does it depend on the Book of Mormon or vice versa? Very little sign of relationship is evident in that case. But further tests and more rigor are needed. It's very tentative and speculative, but interesting. Not completely illogical methodology at all, though the results are controversial.

But just as alleged evidence sometimes falls prey to the sharpshooter fallacy as it is improperly used to buttress a theory, the sharpshooter fallacy can easily be misapplied to dismiss legitimate, meaningful evidence. A noteworthy example is found when Dr. Philip Jenkins, a professor of history, dismisses the significance of the evidence for Nahom in the Arabian Peninsula, an important but small piece of the body of evidence related to the plausibility of Nephi's account of his journey through Arabia along the route we often call Lehi's Trail. Jenkins has this to say as he dismisses this evidence:

One other critical point seems never to have been addressed, and the omission is amazing, and irresponsible. Apologists argue that it is remarkable that they have found a NHM inscription – in exactly the (inconceivably vast) area suggested by the Book of Mormon. What are the odds!
By the way, the Arabian Peninsular covers well over a million square miles.

Yes indeed, what are the odds? Actually, that last question can and must be answered before any significance can be accorded to this find. When you look at all the possible permutations of NHM – as the name of a person, place, city or tribe – how common was that element in inscriptions and texts in the Middle East in the long span of ancient history? As we have seen, apologists are using rock bottom evidentiary standards to claim significance – hey, it’s the name of a tribe rather than a place, so what?

How unusual or commonplace was NHM as a name element in inscriptions? In modern terms, was it equivalent to “Steve” or to “Benedict Cumberbatch”?

So were there five such NHM inscriptions in the region in this period? A thousand? Ten thousand? And that question is answerable, because we have so many databases of inscriptions and local texts, which are open to scholars. We would need figures that are precise, and not impressionistic. You might conceivably find, in fact, that between 1000 BC and 500 AD, NHM inscriptions occur every five miles in the Arabian peninsular, not to mention being scattered over Iraq and Syria, so that finding one in this particular place is random chance. Or else, the one that has attracted so much attention really is the only one in the whole region. I have no idea. But until someone actually goes out and does some quantitative analysis on this, you can say precisely nothing about how probable or not such a supposed correlation is.

It's a fair question, but one that has been answered for many years. The NHM name turns out to be exceedingly rare in the Arabian Peninsula. As far as we can tell, it is only found in the general region associated with the ancient Nihm tribe. It's in the region required by the Book of Mormon, with a convergence of data showing that this tribal name was there in Lehi's day, in a region associated with ancient burial sites, a region where one can go nearly due east and reach a remarkable candidate for Nephi's Bountiful. It's part of an impressive set of convergences pointing to plausibility for the journey of Lehi's trail. In light of the new body of evidence, the task of critics has suddenly shifted from mocking the implausibility of Nephi's account to explaining how obvious it all is in light of the knowledge that Joseph and his technical advisory team surely must have found by searching various books and maps, all the time lacking any evidence that such materials were anywhere near, and still being unable to explain the motivation for plucking "Nehhm" or "Nehem" off a rare European map amidst the hundreds of other names, ignoring every opportunity to use the map for something useful. They also fail to explain how any of these sources could have guided Joseph to fabricate the River of Laman and Valley of Lemuel, the place Shazer, or the place Bountiful, each with excellent and plausible candidates. Here appeals to the Texas sharpshooter fallacy or any other fallacy miss the reality of serious evidence and serious convergences that demand more than casual dismissal and scenarios devoid of explanatory power.

Yes, it's fair and important to worry about false positives (and false negatives) as we approach issues of evidence for the Book of Mormon, and against the Book of Mormon as well. Methodology, logic, and intellectual soundness are fair topics for debate. Let's keep that in mind as we explore the issues and watch out for the many fallacies that can catch us on either side of the debate.