## The Entropy of the SAT

If you get a question wrong on the SAT, you get a 1/4 point deduction. If you skip that question, there’s no penalty. This way, the test is “guessing-neutral”. Each question has five choices, so if you guess randomly your expected score increase is $\frac{1}{5}\cdot 1 + \frac{4}{5}\cdot\left(-\frac{1}{4}\right) = 0$. On average, you neither win nor lose points by guessing.

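However, guessing still increases the variance of your score: each guess adds $\frac{1}{5}\cdot 1^2 + \frac{4}{5}\cdot\left(\frac{1}{4}\right)^2 = 0.25$ (in squared raw points), a standard deviation of half a point per guess. Maybe there is a better way to account for test-takers getting some correct answers at random?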
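As a quick check on the arithmetic, here is a small sketch that computes the mean and variance of a single random guess exactly. The function name `guess_stats` and its parameters are my own, chosen for illustration.

```python
from fractions import Fraction

def guess_stats(choices=5, penalty=Fraction(1, 4)):
    """Return (mean, variance) of the raw-score change from one random guess."""
    p = Fraction(1, choices)  # chance of guessing right
    outcomes = [(p, Fraction(1)), (1 - p, -penalty)]
    mean = sum(prob * value for prob, value in outcomes)
    variance = sum(prob * (value - mean) ** 2 for prob, value in outcomes)
    return mean, variance

mean, variance = guess_stats()
print(mean, variance)  # 0 1/4
```

Using `Fraction` keeps everything exact, so the zero mean is a true zero rather than a floating-point near miss.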

I remember the words of my 9th grade geometry teacher. We had to answer some True/False questions, and he was wise to the strategy of writing an ambiguous sort of pencil scratch which, if it were marked wrong, you could later come back and complain had been misread.

Mr. Allen told us, “Be very clear about your T’s and F’s. If I can’t tell whether what you wrote is a T or an F, I’m going to guess. And I always guess whichever one is the wrong answer. I am the worst guesser ever.” He paused a moment after that and added, “Well, I guess that means I’m the best guesser ever, too.”

What he meant was that if you guess randomly, you’ll guess that the student got it right 50% of the time. If your guess has the student getting it wrong 100% of the time, that means you, the guesser, definitely know the answers. From the student’s point of view, getting all the answers wrong on a T/F test is just as hard as getting them all right.

This makes a T/F test very easy to pass. Even if you only know the answers to 20% of the questions, you can still guess on the rest and get an expected score of 60%.
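The 60% figure comes from adding the known fraction to half of the remainder. A minimal sketch, with `expected_score` as a made-up helper name:

```python
from fractions import Fraction

def expected_score(known, choices=2):
    """Expected fraction correct when every unknown question is guessed at random."""
    return known + (1 - known) / choices

print(expected_score(Fraction(1, 5)))  # 3/5
```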

We can adjust the grading of the test to get an estimate of the student’s knowledge by analyzing the amount of information they put into the test. By “information” I mean the reduction in entropy.

Say we have a T/F test with 100 questions. There are 100891344545564193334812497256 ways to get exactly 50/100 on the test, but only 1 way to get 100/100 or 0/100. There are 100 ways to get 99/100 or 1/100.

If we take the binary log of these numbers, we find the entropy of a score of 50/100 is 96.3 bits. The entropy of a perfect score is 0 bits. To convert to a reasonable test score, we multiply this entropy by -1 and add 96.3. Then, to put it on a scale of 0 to 100, we multiply by 100/96.3. (This is equivalent to using not a binary log, but a log with base $2^{0.963} \approx 1.95$.) Since this score is based on information, let’s call it the iScore. The random guessing iScore is zero, which reflects the test-taker’s complete lack of knowledge. The perfect iScore is 100. A raw score of 99/100 is worth 93.1. One answer wrong means a lot! Here’s a plot of the iScore as a function of the raw score.
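For anyone who wants to reproduce these numbers, here is a rough sketch of the true/false iScore. The name `iscore_tf` is mine, and I am assuming the "chance" score is the most common one, $n/2$.

```python
from math import comb, log2

def iscore_tf(raw, n=100):
    """iScore for a true/false test: 0 at the chance score, 100 for a perfect paper."""
    max_entropy = log2(comb(n, n // 2))  # entropy of the most common raw score
    entropy = log2(comb(n, raw))         # entropy of the observed raw score
    return (max_entropy - entropy) * 100 / max_entropy

for raw in (50, 75, 99, 100):
    print(raw, round(iscore_tf(raw), 1))
```

Because `comb(n, raw)` is symmetric, a raw score of 1/100 gets the same iScore as 99/100, which is Mr. Allen's point about the worst guesser.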

In the general case of $n$ questions with $c$ choices each and a raw score $r$, the formula for the iScore is

$iS(r) = \left(-\log\frac{n!\,(c-1)^{n-r}}{r!\,(n-r)!} + \log\frac{n!\,(c-1)^{n-n/c}}{(n/c)!\,(n-n/c)!}\right)\frac{100}{\log\frac{n!\,(c-1)^{n-n/c}}{(n/c)!\,(n-n/c)!}}$
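Here is one way the general formula might be coded up. This is a sketch, not a polished implementation: `log_ways` and `iscore` are names I made up, and I use `math.lgamma` in place of the factorials so that $n/c$ does not have to be a whole number. The base of the logarithm cancels in the ratio, so natural logs are fine. It assumes $c \geq 2$.

```python
from math import lgamma, log

def log_ways(n, c, r):
    """Natural log of C(n, r) * (c - 1)**(n - r): the number of answer sheets with raw score r."""
    log_binom = lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)
    return log_binom + (n - r) * log(c - 1)

def iscore(r, n=100, c=4):
    """iScore for n questions with c choices each (c >= 2) and raw score r."""
    chance = log_ways(n, c, n / c)  # entropy at the chance score n/c
    return (chance - log_ways(n, c, r)) * 100 / chance

print(round(iscore(99, n=100, c=2), 1))  # 93.1, matching the true/false case
```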

For example, here’s the iScore plot for a test with 100 questions, each with 4 choices.

The more choices there are, the less difference between the iScore and the raw score. In the limit where $c \to \infty$, the iScore and the raw score are the same thing. That means that if you want to give a test so the raw score accurately determines how much students know, make it free response.
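To see where that limit comes from, split each logarithm into its binomial part and its $(c-1)$ part. This is my own rearrangement of the formula above, for fixed $n$ and $r$:

$$iS(r) = 100\cdot\frac{(r - n/c)\log(c-1) + \log\binom{n}{n/c} - \log\binom{n}{r}}{(n - n/c)\log(c-1) + \log\binom{n}{n/c}} \;\xrightarrow{\;c\to\infty\;}\; 100\cdot\frac{r}{n}$$

The binomial terms stay bounded while $\log(c-1)$ grows without limit, so only the ratio of its coefficients survives, and that ratio tends to $r/n$.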

The SAT does not report your raw score to colleges, even after its guessing adjustment. Instead it reports a scaled score, indicating how well you did relative to the other test takers. I have a suggestion for an improved algorithm for relative scoring, too. The basic idea is that you should be rewarded for getting very difficult questions correct. Meanwhile, if you make a boo boo on one of the easy questions, it shouldn’t mess you up too much because it was probably not indicative of a gap in your knowledge, and might be due to something dumb like accidentally marking the wrong bubble on the answer sheet.

When you get your SAT score report, you get to see every question on the test along with the answers you gave and the percent of test takers who got that question right. If 50% of test takers got a certain question right, that question is only as hard as a normal true/false question. We could construct an effective number of choices for each question by taking 1/(proportion of correct answers). Let’s call that the question’s “quality”. The question that 50% of people got right has quality 2, while a question that only 10% of people got right has quality 10.

We take the binary log of the quality of each question and call that the question’s “weight”. A question’s weight is the number of true/false questions it’s worth. Now we multiply your raw score on each question (1 or 0) by the question’s weight and sum to create a weighted raw score. This is a different metric than the iScore, and I’m not sure which I like better. This score allows us to account for the SAT’s free response math questions, though.
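A minimal sketch of the weighted raw score, using made-up data for three questions. The function names and the example numbers are hypothetical, not taken from any real score report.

```python
from math import log2

def question_weight(fraction_correct):
    """Weight in true/false equivalents: log2 of the quality, 1 / fraction correct."""
    return log2(1 / fraction_correct)

def weighted_raw_score(answers_correct, fractions_correct):
    """Sum the weights of the questions the test-taker got right."""
    return sum(
        question_weight(p)
        for ok, p in zip(answers_correct, fractions_correct)
        if ok
    )

# Made-up example: three questions answered right by 50%, 25%, and 12.5% of test takers.
fractions = [0.5, 0.25, 0.125]
mine = [True, False, True]  # I missed the middle one
print(weighted_raw_score(mine, fractions))  # 4.0
```

One side effect worth noticing: a question that everyone gets right has quality 1 and weight 0, so it contributes nothing to anyone's score.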

Of course, none of this actually removes the advantage of guessing. To address that problem, I propose that the test’s answer choices be randomized before the test is administered. Then, before scoring someone’s test, all the answers they left blank are automatically filled in with choice A, or filled in randomly. This simply takes the guesswork out of the test-taker’s hands and makes guessing compulsory, but it seems fair enough to me.
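The fill-in step could look something like this. It is only a sketch: I am assuming blanks are recorded as `None` and that there are five choices labelled A through E, and `fill_blanks` is a name I invented.

```python
import random

def fill_blanks(responses, choices="ABCDE", rng=None):
    """Replace each blank (None) with a uniformly random choice; leave real answers alone."""
    rng = rng or random.Random()
    return [r if r is not None else rng.choice(choices) for r in responses]

sheet = ["B", None, "D", None]
print(fill_blanks(sheet, rng=random.Random(0)))
```

Passing in a seeded `random.Random` makes the fill-in reproducible, which would matter if a score ever had to be audited.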