The concept is simple. Design a machine to deterministically play the perfect Scrabble game. Imagine a machine not unlike today’s modern supercomputer machine. It will be a machine with many processors because the analysis of a Scrabble board can surely be broken down into very small independent elements that should logically be processed concurrently. The very smallest independent element that a Scrabble board can be broken down into is referred to as a PlaySpace.
The PlaySpace unit refers to a subsection of a chosen row or column where a subset of new tiles from the player’s rack will be played on consecutively occurring free spaces that may have previously filled spaces between them. A PlaySpace is confined by the geometry and the rules of the game. In particular, the smallest PlaySpace is two units in size, and the largest is fifteen. Each unit will be referred to as a PlayBucket. The least number of free spaces that a PlaySpace can hold is one, and the most is seven.
PlayBuckets come in three varieties, the filled bucket, the conditional bucket, and the vacant bucket. The filled bucket contains a tile played in a previous turn and remains constant throughout the game. The conditional bucket can hold a subset of letter tiles. These tiles must satisfy the condition that all perpendicular words established as a byproduct of the primary PlaySpace word must be contained in the agreed upon lexicon. The vacant bucket is just that, it is unconstrained, and thus may hold any letter tile leading to a valid primary word.
There will be a primary processor that handles the parsing of a game state and the management of all of the sub threads handled by the other processors. There will be at least one processor that handles the calculation of the point value of each possible play. There will be one processor that will sort and organize all of the possible plays by point value and other heuristic values.
There will be at least 14 processors that will engage in play discovery. Each processor involved in this task will handle PlaySpaces of only one length. A particular processor will thus only need a small subset of the lexicon inside of its primary data structure. Due to the small size of the lexicon, the data structure will be extremely robust, and filling of PlaySpaces will proceed through the path of highest constraint.
For example, we may search for 5 letter words starting from any letter and then proceeding to any of the remaining letters, and so forth, and so on. This will give the size ‘n’ PlaySpace data structure exactly ‘n’ factorial root nodes. This approach only makes sense in theory because the size 15 PlaySpace analysis data structure will have more than 1.3x10^12 root nodes. Veteran Scrabble players know that size 15 PlaySpaces are extremely rare and contain huge constraints even using an ordered traversal.
The 120 root nodes in the size 5 PlaySpace data structure will share identical postfix structures using the optimal DAWG algorithm to reduce the size. It is noteworthy to mention that the amount of redundant information in a data structure where each root node holds rearrangements of letters in the same words is astronomical. The PlaySpace will be ordered using statistics collected about how many nodes are below the initial entry point node in the data structure.
The Parsing of the game board into PlaySpaces in this environment will likely be a bottleneck and because it can be broken down, it should be. A total of 100 tiles likely allows for every possible game state to be discovered. Even then, there are inevitably game states that do not allow for a line of plays to be made that will guarantee victory. Thus an exact probabilistic determination of the likelihood of victory must be the guiding principle in the perfect game of Scrabble. The reality is that the game of Scrabble is played with imperfect information; not knowing the tile rack of the opponent. This game more closely mirrors all real competitive situations than does a game like Chess, which lives in a space of perfect information. Inevitably, the Perfect Scrabble Machine is forced to play imperfectly. It is a game of skill and chance, but it is well known that chance really only determines a game’s outcome when the skill level of the players is roughly equivalent. Hence, the Perfect Scrabble Machine may only ever lose to itself or the luckiest human being to ever exist.
The Perfect Scrabble machine differs from all other board analysis automatons in that it does not “see” the board in quite the same way. The board that the machine sees is populated with empty squares, filled squares, and “pivot” squares. Pivots are defined as vacant squares that have anywhere between one, and four filled neighboring squares. The concept is not new. These are not anchor squares in the traditional sense, in that they are not filled. How can an anchor be vacant? It can not. There are three types of Pivot squares:
So far, this information has surely been included in Scrabble analysis automatons. The new information that will be stored inside of Pivot squares has to do with the ability of the words on the board to be prefixes, infixes, and postfixes. Across Pivot squares will store the pre-in-post-fix information for words that are above and below them. Likewise, the Down Pivot squares will store the pre-in-post-fix information about the words to the left and right of them. Clearly, if there are only single neighboring squares, the pre-in-post vector will be TRUE-TRUE-TRUE and should not be considered. The value of this information arises when a PlaySpace has a filled block inside of it that eliminates the possibility of any arrangement of letters to fill the space with valid plays. Thus we are completely eliminating the need to analyze a great many PlaySpaces, reducing the processor time requirements when this type of PlaySpace takes the longest to analyze. Less PlaySpaces will need to be considered, and that is advantageous.
In PlaySpaces that have qualified to be analyzed, the Pivot squares with the same direction to the primary play itself will become the conditional PlayBuckets. Further, if only 1 tile in the players rack can be placed into the PlayBucket, that bucket becomes virtually filled, and the players rack becomes smaller for the rest of the analysis. By attempting to virtually fill the conditional PlayBuckets inside of a particular PlaySpace, many of the potential PlaySpaces will be found to be invalid because the same tile is required to be in multiple PlayBuckets simultaneously. This elimination of potential PlaySpaces is an impossibility using traditional algorithms that proceed by discovering every play of any length that contains a Pivot square. The Pivot squares are erroneously referred to as a hook squares. The pre-in-post-fix information required for this improvement can easily be stored in a trie, where each node that represents the end of a word will contain a 3 bit value for pre-in-post-fix status. It is important to note that this will increase the size of the DAWG required to store a lexicon, but certainly the Perfect Scrabble Machine will have at least one independent processor taking care of updating the information stored inside of Pivot squares after each play is made.
A real machine is just that, real. It has finite limitations and thus, traversal of PlaySpaces in any order becomes irrational. The best compromise that any real machine can make is an ordered traversal of a PlaySpace starting from any position within a word, and toward a chosen reference point, either the start or end of the word, and then the remaining letters towards the opposing reference point. This is what will be referred to as true bi-directionality. Steven A. Gordon claims that the GADDAG incorporates bi-directionality, but this is not true in that the only way to traverse to the end of a word first is to start at the first letter in the word. The “start of word” delimiter is a waste of space compared to the UDAWG's implicit start delimiter based on the root node chosen to traverse the graph, see the treatment below. The GADDAG’s power as a sequential data structure is truly its undoing when applied to the parallel nature of a Scrabble board analysis automaton. The data structure with the most roots used in the Real Machine would then be for 15 letter words with a root count equal to 28. This number is not 30 because when dealing with only 15 letter words, traversing from the 15’th letter from the start and traversing from the 1’st letter from the end amounts to an identical procedure.
Real machines require an operating system to control all of the hardware involved in computation, and the Real Machine Compromise is no exception, only that its operating system will be designed for one purpose… Generation of plays… And that’s it. Processor instructions will have to be added to the architecture of the chips themselves to reduce the number of clock cycles required for any command used in the algorithm. A custom one purpose super computer with no compromise is still a potential reality. The Real Machine Compromise could be approximated by a 4 CPU slot motherboard with quad core CPU’s in each slot, thus 16 thread lines and 1 machine per length of PlaySpace, and 2 CPU’s to coordinate the effort and update the board between plays.
Even the Real Machine Compromise is far out of the reach of the independent software hobbyist. Multiple threads of command are confusing and for low level languages that do not require the overhead of a virtual machine, they are Operation System specific. The same can be said about even 2D graphics, so the Double Compromise Machine is thus a command line based application designed as a proof of concept, and test bed for the Real Machine Compromise. The only advantage that the Double Compromise Machine has over the two machines outlined above is that it is complete and available as a windows executable and a data file, or C source and a data file.
The Double Compromise Machine does not contain the pre-in-post-fix information yet. It has one primary data structure that does the work of the 14 outlined in the Real Machine. This data structure is referred to as an Optimal Universal Directed Acyclic Word Graph. A separate web page will explore this structure further. It requires 21 hours to compile it on an AMD 2000+ based machine for the TWL06 lexicon. In short, words of length n are able to be traversed in 2x (n-1) ways and they are all represented in the data structure. It has 28 root nodes representing the different ways that a word of up to 15 letters can be traversed. With a size of fewer than 7 Mega Bytes, it holds a huge amount of data, in a compact container. Modern CPU cache size gets up to 12 Mega Bytes by comparison. The most complex procedure in the Double Compromise Machine is the selection of the root node in the OUDAWG, or stated directly, the determination of the ordered path of highest constraint. It makes use of node layout statistics whenever possible and it is certainly non-trivial.
Download it and test it, and give me some feedback via email. That is all that I ask. If you are concerned about the inevitable widespread use of this program to cheat on the Internet Scrabble Club, well then do yourself a favor, and play the game on a board with real people in front of you.
The C code, the data file holding the OUDAWG, and the Windows console program.zip 4.92 MB