A reader (Paul) writes:
Hi, I love the blog. I thought you'd find this entertaining. I was running an internet blitz game through Fritz for suggestions, and it proposed the line below. I knew something was wrong, but couldn't believe just how wrong...(Fritz when asked for a hint at the end of the line sees it immediately of course). Anyhow, I hope you enjoy the small piece of entertainment in return.
1. e4 c5 2. Nf3 Nc6 3. d4 cd 4. c3 dc 5. Nxc3 Nf6 6. Bc4 e5 7. 0-0 Fritz evaluates as 0.22, and recommends instead 7. Ng5 d5 8. Nxd5 Nxd5 9. Bxd5 Bb4+ 10. Kf1 Qf6 11. Bxf7+ Kf8 12. Bd5 Nd4 13. Be3 which it evaluates as 0.75. Just lovely.
Hi Paul,
Thanks for the nice words about the blog, and I appreciate your submission.
I attempted to replicate your results, but was unsuccessful. I'd need to know which version of Fritz you were using (I'm on Fritz 9), at what move it produced that variation, at what depth, etc. My best guess is that this analysis takes place after Black's 6th move. On my computer, though, Fritz gives 7.Ng5 d5 8.Nxd5 Nxd5 9.Bxd5 Be6 as Black's best hope, and thinks White has an advantage of a pawn or so after 10.Bxc6+ bxc6. (That's at depth 13; at depth 14 it continues 11.Qxd8+ Rxd8 12.Nxe6 fxe6, with the same +1.08 evaluation.)
This doesn't change as I move further into your variation. After 9.Bxd5, it considers 9...Bb4+ as its second choice, but follows up 10.Kf1 with 10...Be6, not 10...Qf6. (At least through depth 14.) Once I've entered 10...Qf6, it advocates 11.Bxf7+ Kf8 12.Bd5, but then thinks Black should play 12...h6. Finally, once we get to 12.Bd5, 12...Nd4 is its third choice, and it immediately recommends 13.Kg1, avoiding all the ...Qa6 shenanigans it apparently overlooked in your experience.
Nevertheless, my inability to duplicate your results doesn't disprove the more general phenomenon known as the horizon effect. This refers to the propensity of computers to calculate a variation to a certain depth and evaluate it in their favor, only to find, upon getting nearer to the line's conclusion, that the evaluation was (seriously) mistaken. The problem was initially out of the computer's "sight" - it was like a ship approaching, but not yet having appeared on the horizon.
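To make the idea concrete, here is a minimal sketch of a depth-limited search on a toy game tree. Everything in it is an assumption made for illustration: the tree, the evaluations, and the names `minimax` and `best_line` are invented, and no real engine searches this naively. It only shows how a line can look best at one depth and bad one ply deeper.

```python
# Toy game tree: each node is (static_eval, children). Evaluations are in pawns
# from White's point of view. White moves at the root and the sides alternate.
# All numbers are invented for illustration.

def leaf(score):
    return (score, [])

# Line A: looks like +1.0 for the first two plies, but by the third ply
# the evaluation has collapsed to -3.0.
line_a = (1.0, [(1.0, [(-3.0, [leaf(-3.0)])])])
# Line B: a quiet move that keeps a small, stable edge.
line_b = (0.2, [(0.2, [(0.2, [leaf(0.2)])])])
root = (0.0, [line_a, line_b])

def minimax(node, depth, white_to_move):
    """Plain depth-limited minimax: stop at depth 0 and trust the static eval."""
    static_eval, children = node
    if depth == 0 or not children:
        return static_eval
    scores = [minimax(child, depth - 1, not white_to_move) for child in children]
    return max(scores) if white_to_move else min(scores)

def best_line(depth):
    """Return (line name, score) of White's preferred root move at this depth."""
    _, children = root
    scores = [minimax(child, depth - 1, False) for child in children]
    index = max(range(len(scores)), key=scores.__getitem__)
    return "AB"[index], scores[index]

for depth in (2, 3):
    print(depth, best_line(depth))
```

At depth 2 the search prefers line A, because the trouble sits just past where it stops looking. At depth 3 it switches to line B. Searching deeper moves the horizon but never removes it.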
Much has been made of this weakness in computer chess over the years, but not necessarily correctly. The reason is that this same problem befalls humans: it often happens that we calculate long variations, only to miss a zinger at or near the end of our variation. So why make fun of chess engines for the same thing? As long as computers are too weak to solve the game, and are forced to search, prune, and evaluate without certainty, horizon effect errors are guaranteed to occur. (Indeed, in a trivial sense, all (unintentional) errors are horizon effect errors.)
It seems to me that we can distinguish between types of horizon effect errors, though, in a way that illuminates the difference between human play and that of computers, and which may still be of use in games between the two. The first, not-too-interesting or useful sort is the one we've discussed so far: the missed tactic at the end of a long sequence. Chess engines are getting better and better at not missing these, but I'm not sure those errors can be stamped out completely, prior to the game's being solved.
The second and, for our purposes, more interesting sort is what I've called the frog-in-the-kettle problem. Apparently (I haven't tested this, and earnestly hope no one reading this will, either) if you put a frog in hot water, it will show good sense and jump back out if it can, but if you put it in warm water and heat the pot, it will stay put. The application of this strange fact is not that you should put your computer in a vat of water and heat it - surprise, surprise. Rather, it's this: if you engage in a slow build-up against the enemy king, but do so in such a way that there are no hard-to-meet threats coming up against the enemy king in the next 5-10 moves, it turns out that the program will tend to ignore what you're doing, and will evaluate the position in its own favor, provided everything else is going well.
As always, programmers are aware of the problem and are doing what they can to fix it, and it's not as easy to exploit this idea as it once was. Even so, as I've shown many times on my ChessBase show and on the blog, too, chess engines tend to underestimate one side's attacking prospects until the threats are right on top of them.
Thus while the first sort of horizon effect is a general problem that afflicts humans and engines alike, this second problem is distinctively silicon-based. A moderately experienced club player will know almost immediately that when the opponent starts massing troops on the border, it's time to bring in the reinforcements, or send the king elsewhere, or do something to deter the opponent's attacking ambitions. Not so for chess engines, even for one that's the strongest player in the world.
Unless Kramnik is reading this blog - and I'll go out on a limb and guess that he isn't - it's unlikely that any of us are going to face a computer in a meaningful event anytime soon. It is useful to keep this second horizon effect idea in mind when using a chess engine to analyze, however. If you're examining a position where one side seems to have a promising attack in the offing, even if it requires a bit of preparation first, then if the computer disagrees, ignore it. Finish the preparatory moves, and keep an eye on the evaluation. Is it creeping in the attacker's favor? Good! Continue in that same vein, and you'll often notice a pro-attacker trend. You'll get more out of your computer when you're aware of this, and it's useful when preparing novelties for your unsuspecting opponents - especially those who don't fully realize the danger of the frog-in-the-kettle horizon effect!
In the position 8/8/8/3k1ppp/P1p5/1PP1KP1P/8/8 w, all the engines lose half a point by playing b4??
This brings up another important point, one I've mentioned before but which certainly bears repeating: chess engines will often find the right move, even in positions they badly misevaluate.
Another plus: once the moves 1.b4 f4+ 2.Kf2 h4 are entered, the evaluation drops immediately to less than +1. 0.00 would be better, but this is progress! Interestingly, Shredder, which is usually much better than its rivals in evaluating these sorts of positions, alleges a whopping +4.93 edge with 3.a5.
P.S. For those who are unfamiliar with FEN, here's the position:
White: Ke3, p's a4, b3, c3, f3, h3.
Black: Kd5, p's c4, f5, g5, h5.
(White to move.)
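For anyone who wants to see how the FEN string maps onto that piece list, here is a small sketch in Python. The function name `parse_fen_placement` is my own, and the code reads only the first field (piece placement). A complete FEN record has six fields; the string quoted above gives only the placement and the side to move.

```python
def parse_fen_placement(fen):
    """Return {square: piece letter} from the piece-placement field of a FEN string."""
    placement = fen.split()[0]
    board = {}
    # FEN lists the ranks from 8 down to 1, separated by '/'.
    for rank, row in zip(range(8, 0, -1), placement.split("/")):
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)  # a digit means that many empty squares
            else:
                board["abcdefgh"[file_index] + str(rank)] = ch
                file_index += 1
    return board

FEN = "8/8/8/3k1ppp/P1p5/1PP1KP1P/8/8 w"
board = parse_fen_placement(FEN)

# Uppercase letters are White pieces, lowercase letters are Black pieces.
white = {sq: p for sq, p in board.items() if p.isupper()}
black = {sq: p for sq, p in board.items() if p.islower()}
print("White:", sorted(white.items()))
print("Black:", sorted(black.items()))
```

The two dictionaries should match the piece list given above.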
It seems to me there are several really large problems with your suggestion - think of this as a challenge. The most obvious one is that a 10 null-move search in every position is likely to prove extremely wasteful. How often is it going to have any value to go that deeply?
Second, is this only going to be used in the actual position, or within a given variation? It seems to me the problem could recur as a sort of second-order horizon effect: the engine evaluates the position 15 ply in as +3 in its favor, but it only realizes once it gets there, thanks to a 10-ply null-move search, that the position is a draw. So unless you want to incorporate this massive null-move search at every node, the problem will recur.
Third, in this particular position allowing White 10 consecutive moves will "hurt" Black: a4-5-6-7-8=Q. And even if we find some way to bring Black king moves into the picture, there's also this: White brings the king to a3, pushes his pawns and takes on c4 with a win. Black needs to intervene to stop White's passers and to play ...g4 once the White king goes to the b-file.
In sum, the moral of the story is that there's more to realizing what matters than can be captured by a null search - of whatever length.
That second form of the horizon effect revolves around what is usually called "intuition" -- or if viewed from the defender's side, the "sense of danger."
Christian Kongsted in his book, How to Use Computers to Improve Your Chess, gives several excellent, detailed examples from actual top-level computer-vs-human games. All those examples are similar to what Dennis is talking about. That is, they involve long-range attacks building up against an opposing king position.
Kongsted notes, for instance, that an engine will never initiate a plan of storming an enemy king position with pawns, unless the software includes specific instructions that help pinpoint situations where that is the appropriate plan (such as kings castled on opposite wings). That's because standard evaluation functions tend to deduct points for weakening your own pawn structure, while the eventual tactical justification for such a plan lies beyond the program's horizon -- the payoff (prying open files around the enemy king) is too far off in the future for the program to recognize it ahead of time.
Evidently some more-recent programs do include features that recognize when a pawn-storm might be promising, so they can initiate one when appropriate.
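The difference between those two kinds of program can be sketched roughly as follows. This is a hypothetical toy, not how Fritz or any real engine works: the `Position` fields, the weights, and both evaluation functions are invented for illustration. The only ideas taken from the discussion above are a penalty for loosening one's own pawns and a pattern-based bonus when the kings sit on opposite wings.

```python
from dataclasses import dataclass

@dataclass
class Position:
    """Hypothetical, stripped-down position summary (not a real engine structure)."""
    material: float        # material balance in pawns, from our point of view
    advanced_pawns: int    # our pawns pushed up the board toward the enemy king
    our_king_wing: str     # "kingside" or "queenside"
    their_king_wing: str

STRUCTURE_PENALTY = 0.25   # invented weight: cost per pawn that loosens our structure
STORM_BONUS = 0.5          # invented weight: reward per storming pawn when the pattern fits

def plain_eval(pos):
    """Evaluation with no storm pattern: every pawn advance just costs points."""
    return pos.material - STRUCTURE_PENALTY * pos.advanced_pawns

def pattern_eval(pos):
    """Same evaluation, plus a bonus when the kings are castled on opposite wings."""
    score = plain_eval(pos)
    if pos.our_king_wing != pos.their_king_wing:
        score += STORM_BONUS * pos.advanced_pawns
    return score

before = Position(0.0, 0, "queenside", "kingside")
after = Position(0.0, 2, "queenside", "kingside")  # two pawns pushed at the enemy king

print("plain:  ", plain_eval(before), "->", plain_eval(after))
print("pattern:", pattern_eval(before), "->", pattern_eval(after))
```

With the plain evaluation, every pawn push makes the score worse, so a search that cannot see the payoff never starts the storm. With the pattern bonus, the same pushes raise the score, and the program is nudged toward the plan without having calculated it to the end.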
In a somewhat different situation that is probably easier to program, I've seen Fritz immediately institute a "minority attack" (pushing its a- and b-pawns with the aim of forcing pawn-exchanges that will ultimately saddle its opponent with either a backward or an isolated pawn). I'm pretty sure an engine could not conceive such a plan "on its own," that is, based on pure brute-force calculation. I think it must have had specific instructions that award extra evaluation points for pushing its q-side pawns when the pawn configuration fits a certain pattern.
Kingside attack situations would be tougher to model than minority-attack situations, I think. But in principle there's no reason it can't be done.
Still, computers are likely to continue to lack what I call "defender's intuition". For a detailed discussion of the latter that includes both computer and human examples, see the initial article in my "Sense of Danger" series, in Chess Life, October 2005.
Finally, when it comes to tackling knotty problems like computers' notorious blindness about "fortress" positions, I think that an explicit pattern-recognition approach -- while probably offensive to software purists (from a programming standpoint, writing specific pattern descriptions into an engine is laborious, inelegant, and violates the idea of the computer "thinking" to reach its own conclusions) -- is ultimately more promising than calculation-based methods like multiple null-move pruning.
Anyway that’s my 2 sense
Regards
Peter