Cover Story / Ken Regan
of questions. This gives a percentage, which translates to an arbitrary grade like A, B-, C+, etc. What matters is not just the percentage but how one interprets it. If a test is especially difficult and most students do poorly on it, then an 85 percent might translate to an ‘A’ rather than the more typical ‘B’. This is called grading on a curve. Figure 2 shows the conceptual relationship between a player’s chosen moves for a
set of positions and how an engine might distribute partial credit. Each point represents a move. Good moves fall into the top left corner of the plot, while poor moves fall into the bottom right. Since average players and grandmasters both make relatively poor moves compared to an engine, all human players’ plots take on the same general L-shape. This method of converting engine evaluations into objective partial credit is the original aspect of Regan’s work. He calls it “Converting Utilities into Probabilities.” (Regan uses the technical term “probability” instead of “partial credit” because, once the partial credits are constrained to sum to full credit, they behave mathematically like probabilities.) “I made it up,” he says. “I’ve been astounded, actually, that there doesn’t seem to be precedent in the literature for it. I was dead sure people were doing this problem.” (Regan’s literature search nourished his penchant for coincidence as well. As a serious
Christian, he is sometimes asked whether he believes in the theory of evolution, which he does. But, he says, “Intelligent Design papers featured large in my initial literature search. There’s no direct connection to my work, but some of the mathematical ingredients are the same.” Intelligent Design’s leading complexity theorist is William Dembski, and Regan noted that his wife’s old roommate’s husband is Robert Sloan, chair of the computer science department at the University of Illinois, Chicago, where Dembski earned his Ph.D.) In Regan’s algorithms, it is the relative differences in move quality that matter, not the
absolute differences. So if, for example, the engine judges the three top candidate moves to be only slightly apart, each of these three moves will earn approximately 30 percent credit (with the last 10 percent spread among the remaining candidate moves). This emphasis on relative differences rather than absolute values explains why cheaters who use moves that are not always the engine’s first choice will still get caught. This also
Figure 3: Partial credit (y) plotted against drop-off from the best answer (d).
Instead of averaging between two locations on a number line and finding a “best fit” point as in Figure 1, Regan’s scoring method finds a “best fit” curve through scattered data points. Each player’s moves result in a unique curve (shown by the expression ‘y’) that can be characterized by the parameters ‘s’ and ‘c’.
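Figure 3’s curve fitting can be sketched in a few lines of code. The article does not print the formula for the curve, so this sketch assumes, purely for illustration, the form y = exp(-(d/s)^c): d is the drop-off from the engine’s best move (units assumed to be pawns), s stretches the curve left or right, and a larger c thins the tail. It recovers s and c from (d, y) points by least squares, using a brute-force grid search rather than whatever optimizer Regan actually uses. The function names, grids and sample data are hypothetical.

```python
import math


def partial_credit(d: float, s: float, c: float) -> float:
    """Assumed (illustrative) curve y = exp(-(d/s)^c).

    d: drop-off from the engine's best move (0 for the best move itself).
    s: "sensitivity", stretches the curve left or right.
    c: "consistency", a larger c thins the tail (less credit for blunders).
    """
    return math.exp(-((d / s) ** c))


def squared_error(points, s: float, c: float) -> float:
    """Sum of squared vertical gaps between observed (d, y) points and the curve."""
    return sum((y - partial_credit(d, s, c)) ** 2 for d, y in points)


def fit_s_c(points, s_grid, c_grid):
    """Least-squares best fit of (s, c) by exhaustive grid search (hypothetical helper)."""
    best_err, best_s, best_c = math.inf, None, None
    for s in s_grid:
        for c in c_grid:
            err = squared_error(points, s, c)
            if err < best_err:
                best_err, best_s, best_c = err, s, c
    return best_s, best_c


# Hypothetical data generated from s = 0.1, c = 0.5, so the fit should recover them.
drops = [0.0, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
points = [(d, partial_credit(d, 0.1, 0.5)) for d in drops]
s_grid = [round(0.02 * k, 2) for k in range(1, 26)]  # 0.02 to 0.50
c_grid = [round(0.1 * k, 1) for k in range(1, 21)]   # 0.1 to 2.0
s_fit, c_fit = fit_s_c(points, s_grid, c_grid)
print(f"s = {s_fit}, c = {c_fit}")  # prints: s = 0.1, c = 0.5
```

The grid search is deliberately simple. With real, noisy move data you would fit over many more positions and likely use a numerical optimizer, but the objective being minimized, the sum of squared gaps between the points and the curve, is the same least-squares idea the article describes.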
explains why a player cannot earn more partial credit simply by facing weak opponents. After a player’s partial credit is plotted
for a set of positions, Regan scores his exam graphically by fitting a curve through the data (see Figure 3). (In statistical jargon, this process is called a “least squares best fit.” The score on a standard multiple-choice exam can be thought of as a “best fit” too, but there the fit is calculated between the points zero and one on a number line rather than among many points on a two-dimensional plot. See Figure 1 again.) The best fit produces a curve (shown as ‘y’ in Figure 3) and two values, ‘s’ and ‘c,’ which characterize the bend in the curve. Regan calls ‘s’ the sensitivity. It shifts the curve left and right and correlates with a player’s ability to sense small differences in move quality. Regan calls ‘c’ the consistency; it thins or thickens the tail of the curve. A larger ‘c’ reflects a player’s avoidance of gross blunders (“gross” being somewhat relative to the engine’s judgment). Regan has found that different values of ‘s’ and ‘c’ translate into well-defined categories that align with Elo ratings, much as a 95 percent and an 85 percent on an exam typically translate to an A and a B, respectively. Back in the 1970s, when Arpad Elo designed the USCF and FIDE rating systems, he arbitrarily picked 2000 to mean expert, 2200 to mean master, and so on. This arbitrary assignment means chess ratings are graded on a curve, and specific values of ‘s’ and ‘c’ can be mapped directly to specific Elo ratings. The mapped rating is called the Intrinsic Performance Rating (IPR). It’s more reliable to call someone, say, a
B-player in chess than it is to call someone a B-student in school. A student can study for an individual test, but chess strength tends to change slowly. If Regan knows a player’s Elo before subjecting the player’s moves to an anti-cheating exam, he can compare how well each move’s partial credit matches the typical partial credit earned by a player with that Elo. Regan expresses this difference as a z-score, which is a fancy name for the number of standard deviations by which a player’s test performance differs from that player’s typical Elo performance. The greater the z-score, the stronger the evidence that a person has cheated (see Figure 4). The IPR and z-score are two separate
results that emerge from the same test, but the z-score is much more reliable. If Regan were to compute an IPR from only a few moves, it would be like marking an exam with very few questions: the result would be an unreliable letter grade. The z-score, however, is more accurate. “The IPR does not have forensic standing,” says Regan. “But the cheating test [z-score] is based on settings that come from training 8,500 moves of world championship games.” These moves act like questions
www.uschess.org 25
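The z-score comparison can be illustrated with a simplified sketch. Everything in it is an assumption rather than Regan’s published procedure: the curve y = exp(-(d/s)^c), the simple proportional rescaling that makes each position’s credits sum to full credit, the choice to score only whether the player picked the engine’s first choice, and the treatment of each position as an independent yes/no trial. The s and c values stand in for a player’s known Elo; in Regan’s test such settings come from training on reference games, but here they are hypothetical inputs. The z-score is (observed - expected) divided by the standard deviation.

```python
import math


def partial_credit(d: float, s: float, c: float) -> float:
    """Assumed curve y = exp(-(d/s)^c); d = 0 for the engine's best move."""
    return math.exp(-((d / s) ** c))


def move_probabilities(drops, s: float, c: float):
    """Rescale one position's credits so they sum to 1 (full credit)."""
    credits = [partial_credit(d, s, c) for d in drops]
    total = sum(credits)
    return [x / total for x in credits]


def match_z_score(positions, played_best, s: float, c: float) -> float:
    """z-score for how often a player chose the engine's first choice.

    positions: one list of drop-offs per position, best move first (d = 0).
    played_best: one bool per position, True if the player chose that move.
    s, c: hypothetical parameters standing in for the player's known Elo.
    """
    probs = [move_probabilities(drops, s, c)[0] for drops in positions]
    expected = sum(probs)                       # expected number of matches
    variance = sum(p * (1 - p) for p in probs)  # independent yes/no trials (assumption)
    observed = sum(1 for hit in played_best if hit)
    return (observed - expected) / math.sqrt(variance)


# Hypothetical excerpt: engine drop-offs (assumed pawns) for each position's candidates.
positions = [
    [0.0, 0.05, 0.30, 1.20],
    [0.0, 0.40, 0.90],
    [0.0, 0.02, 0.10, 0.50],
    [0.0, 0.25, 0.60, 2.00],
    [0.0, 0.10, 0.15],
]
played_best = [True, True, True, True, True]
z = match_z_score(positions, played_best, s=0.1, c=0.5)
print(f"z = {z:.2f}")
```

A z-score near zero means the player matched the engine about as often as the assumed rating predicts; a large positive value means far more matches than expected. Matching only the engine’s first choice is one simple statistic; the article’s description, comparing each move’s partial credit against the typical credit for that Elo, is broader.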