Zhao, Yongwei

Problem

Given a Gomoku board situation, identify all threat points. Threat points refer to the positions where both sides can form a winning situation after placing a piece, such as the end of a four-in-a-row (winning, immediately forming a five-in-a-row), and both ends of a living three (forming a living four after placing a piece, if the opponent does not win in this step, it will be a winning situation in the next turn). For example, the positions marked with “x” in the following board.

┌┬┬┬┬┬┬┬┬┬┬┬┬┬┐

├┼┼┼┼┼┼┼┼┼┼┼┼┼┤

├┼┼┼┼┼┼┼┼┼┼x┼┼┤

├┼┼┼┼┼┼┼┼┼●┼┼┼┤

├┼┼┼x○○○○●┼┼┼┼┤

├┼┼┼┼┼┼┼●┼┼┼┼┼┤

├┼┼┼┼┼┼x┼┼┼┼┼┼┤

├┼┼┼┼┼┼┼┼┼┼┼┼┼┤

└┴┴┴┴┴┴┴┴┴┴┴┴┴┘

During the search of AI programs, threat points must be quickly found in order to prune other moves. Because placing a piece at other positions when there is a threat point will always lead to a disadvantage.

Naive Method

A simple solution is to traverse according to the rules. I believe most people solve it this way when they first start writing programs, and I am no exception. After all, premature optimization is evil. But eventually, you will find that the speed of this process is crucial.

The first solution that comes to mind is vectorization. Use a bitboard to represent the situation, dividing the board into black and white, each board with 225 bits, which can be placed in uint16_t [16], or a YMM register __m256i. Through AVX instructions such as _mm256_slli_epi16/_mm256_srli_epi16 (VPSLLW/VPSRLW) to shift and align the entire board, and then perform logical AND to find living four, blocked four, living three, etc. This saves the overhead of enumerating the board, but due to the complexity of the threatening shapes (blocked four and living three are not necessarily continuous), the program is not simple and still takes a relatively long time.

Ternary Coding

In 2014, I independently discovered a solution and applied it to the evaluation function of kalscope v2 “Dolanaar”. This program was a practice work for me when I was an undergraduate, but it also achieved competitive strength at that time.

I treat each Yang line (horizontal or vertical line) on the board separately. There are 15 places on a line, I list all possible situations, calculate the threat points in advance to form a lookup table. However, if represented by a bitboard, the encoding of a line can reach , and the size of the lookup table reaches 2 GB. Note that there is a lot of redundancy in the bitboard encoding, with 2 bits encoding 3 situations: empty “┼” 0b00, white piece “○” 0b01, black piece “●” 0b10, and 0b11 is illegal. To achieve compact encoding, ternary counting is needed, 0 represents empty, 1 represents white piece, 2 represents black piece, each ternary digit represents a place.

The conversion to ternary involves a lot of division and modulus operations, thus it is unrealistic. Note that the board situation does not appear suddenly, but is always played gradually from an empty board, and ternary representation can be incrementally maintained. The empty line is encoded as 0; placing a white stone at the -th position is equal to adding to the encoding; placing a black piece at the -th position is equal to adding to the encoding. On backtrack, subtract the increased value from encoding. To implement efficiently, we only need to keep another lookup table of the powers of three, power3.

constexpr int power3[16] {
 1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683, 59049, 177147, 531441, 1594323, 4782969, 14348907
};
int triboard_h[15] {};
int triboard_v[15] {};
uint16_t threat_lut[14348907] { ... }; // pre-generated.

void make(int x, int y, int color) {
    triboard_h[y] += power3[x] << color;
    triboard_v[x] += power3[y] << color;
}
void unmake(int x, int y, int color) {
    triboard_h[y] -= power3[x] << color;
    triboard_v[x] -= power3[y] << color;
}
void find_threat() {
    for (int y = 0; y < 15; y++) {
        uint16_t threat = threat_lut[triboard_h[y]];
        while (threat) {
            int x = std::countr_zero(threat);

            // do something to the threat (x,y).

            threat &= threat - 1;
        }
    }
    // another loop for vertical lines.
    // TODO: how about diagonal / anti-diagonal lines?
}

The lookup table threat_lut tightly encodes all possible situations of a Yang line on a 15-line board, occupying only 27.4 MB of memory.

This method is designed specifically for a standard 15-line board, but at that time, the Gomocup rules were using 20-line boards, which made this method invalid. Otherwise, I might have opted in the competition. I am not sure if anyone else has discovered a similar method earlier by me, but I did notice that Rapfi, recent champion of Gomocup, also implemented this technique in it.

This method is not perfect. It is only for a fixed-length Yang line, and using it for the 20-line rules of Gomocup will result in an oversized LUT. In addition, the length of each Yin line (diagonal or anti-diagonal line) is different, and if you forcibly apply the lookup table for the Yang line to Yin lines, it will lead to false positives. For example, on a Yin line with a total length of only 4 places, a living three is not a threat because the board boundary will naturally block it. How should this be handled?

I did not give an answer in 2014, but left an unsolved problem.

Skew Ternary Coding

Last year, I guided doctoral student Guo, Hongrui to complete Cambricon-U, which implemented the in-place counting in SRAM using skew binary numbers. Skew numbers can not only be used for binary, but also for ternary, solving the abovementioned problem.

Skew ternary allows a digit “3” to temporarily exist in the number without carrying over. When counting, if there is no digit “3” existing, increment the least significant digit by 1; If there is a digit “3”, reset the “3” to “0” and carry over to the next digit. Under such counting rules, it is easy to find that skew ternary has the following two properties:

There is at most one digit “3” in the number;
If there is a “3”, all less significant digits must be “0”.

We use “0” to represent an empty place, “1” to represent a white stone, “2” to represent a black stone, and “3” to represent the board boundary. Such a coding can naturally and compactly encode various lengths of Yin lines into the lookup table. When the length of the Yin line is less than 15, only the high digits are used, and the boundary is set to “3”, and the less significant digits are “0”. The items in the lookup table are compiled in the order of skew ternary counting:

Lookup Table

Decimal	Skew Ternary	Line Situation	Threats
0	000000000000000	├┼┼┼┼┼┼┼┼┼┼┼┼┼┤	`0b000000000000000`
1	000000000000001	├┼┼┼┼┼┼┼┼┼┼┼┼┼○	`0b000000000000000`
2	000000000000002	├┼┼┼┼┼┼┼┼┼┼┼┼┼●	`0b000000000000000`
3	000000000000003	├┼┼┼┼┼┼┼┼┼┼┼┼┤▒	`0b000000000000000`
4	000000000000010	├┼┼┼┼┼┼┼┼┼┼┼┼○┤	`0b000000000000000`
5	000000000000011	├┼┼┼┼┼┼┼┼┼┼┼┼○○	`0b000000000000000`
6	000000000000012	├┼┼┼┼┼┼┼┼┼┼┼┼○●	`0b000000000000000`
7	000000000000013	├┼┼┼┼┼┼┼┼┼┼┼┼○▒	`0b000000000000000`
8	000000000000020	├┼┼┼┼┼┼┼┼┼┼┼┼●┤	`0b000000000000000`
9	000000000000021	├┼┼┼┼┼┼┼┼┼┼┼┼●○	`0b000000000000000`
10	000000000000022	├┼┼┼┼┼┼┼┼┼┼┼┼●●	`0b000000000000000`
11	000000000000023	├┼┼┼┼┼┼┼┼┼┼┼┼●▒	`0b000000000000000`
12	000000000000030	├┼┼┼┼┼┼┼┼┼┼┼┤▒▒	`0b000000000000000`
13	000000000000100	├┼┼┼┼┼┼┼┼┼┼┼○┼┤	`0b000000000000000`
14	000000000000101	├┼┼┼┼┼┼┼┼┼┼┼○┼○	`0b000000000000000`
15	000000000000102	├┼┼┼┼┼┼┼┼┼┼┼○┼●	`0b000000000000000`
16	000000000000103	├┼┼┼┼┼┼┼┼┼┼┼○┤▒	`0b000000000000000`
17	000000000000110	├┼┼┼┼┼┼┼┼┼┼┼○○┤	`0b000000000000000`
18	000000000000111	├┼┼┼┼┼┼┼┼┼┼┼○○○	`0b000000000000000`
19	000000000000112	├┼┼┼┼┼┼┼┼┼┼┼○○●	`0b000000000000000`
20	000000000000113	├┼┼┼┼┼┼┼┼┼┼┼○○▒	`0b000000000000000`
21	000000000000120	├┼┼┼┼┼┼┼┼┼┼┼○●┤	`0b000000000000000`
22	000000000000121	├┼┼┼┼┼┼┼┼┼┼┼○●○	`0b000000000000000`
23	000000000000122	├┼┼┼┼┼┼┼┼┼┼┼○●●	`0b000000000000000`
24	000000000000123	├┼┼┼┼┼┼┼┼┼┼┼○●▒	`0b000000000000000`
25	000000000000130	├┼┼┼┼┼┼┼┼┼┼┼○▒▒	`0b000000000000000`
26	000000000000200	├┼┼┼┼┼┼┼┼┼┼┼●┼┤	`0b000000000000000`
27	000000000000201	├┼┼┼┼┼┼┼┼┼┼┼●┼○	`0b000000000000000`
28	000000000000202	├┼┼┼┼┼┼┼┼┼┼┼●┼●	`0b000000000000000`
29	000000000000203	├┼┼┼┼┼┼┼┼┼┼┼●┤▒	`0b000000000000000`
30	000000000000210	├┼┼┼┼┼┼┼┼┼┼┼●○┤	`0b000000000000000`
31	000000000000211	├┼┼┼┼┼┼┼┼┼┼┼●○○	`0b000000000000000`
32	000000000000212	├┼┼┼┼┼┼┼┼┼┼┼●○●	`0b000000000000000`
33	000000000000213	├┼┼┼┼┼┼┼┼┼┼┼●○▒	`0b000000000000000`
34	000000000000220	├┼┼┼┼┼┼┼┼┼┼┼●●┤	`0b000000000000000`
35	000000000000221	├┼┼┼┼┼┼┼┼┼┼┼●●○	`0b000000000000000`
36	000000000000222	├┼┼┼┼┼┼┼┼┼┼┼●●●	`0b000000000000000`
37	000000000000223	├┼┼┼┼┼┼┼┼┼┼┼●●▒	`0b000000000000000`
38	000000000000230	├┼┼┼┼┼┼┼┼┼┼┼●▒▒	`0b000000000000000`
39	000000000000300	├┼┼┼┼┼┼┼┼┼┼┤▒▒▒	`0b000000000000000`
40	000000000001000	├┼┼┼┼┼┼┼┼┼┼○┼┼┤	`0b000000000000000`
41	000000000001001	├┼┼┼┼┼┼┼┼┼┼○┼┼○	`0b000000000000000`
42	000000000001002	├┼┼┼┼┼┼┼┼┼┼○┼┼●	`0b000000000000000`
43	000000000001003	├┼┼┼┼┼┼┼┼┼┼○┼┤▒	`0b000000000000000`
44	000000000001010	├┼┼┼┼┼┼┼┼┼┼○┼○┤	`0b000000000000000`
45	000000000001011	├┼┼┼┼┼┼┼┼┼┼○┼○○	`0b000000000000000`
…	…	…	…
138	000000000010110	├┼┼┼┼┼┼┼┼┼○┼○○┤	`0b000000000001000`
…	…	…	…
174	000000000011100	├┼┼┼┼┼┼┼┼┼○○○┼┤	`0b000000000100010`
175	000000000011101	├┼┼┼┼┼┼┼┼┼○○○┼○	`0b000000000000010`
176	000000000011102	├┼┼┼┼┼┼┼┼┼○○○┼●	`0b000000000100000`
177	000000000011102	├┼┼┼┼┼┼┼┼┼○○○┤▒	`0b000000000100000`
…	…	…	…
21523354	222223000000000	●●●●●▒▒▒▒▒▒▒▒▒▒	`0b000000000000000`
21523355	222230000000000	●●●●▒▒▒▒▒▒▒▒▒▒▒	`0b000000000000000`
21523356	222300000000000	●●●▒▒▒▒▒▒▒▒▒▒▒▒	`0b000000000000000`
21523357	223000000000000	●●▒▒▒▒▒▒▒▒▒▒▒▒▒	`0b000000000000000`
21523358	230000000000000	●▒▒▒▒▒▒▒▒▒▒▒▒▒▒	`0b000000000000000`

The lookup table has a total of 21523359 items, tightly encoding all situations of length-15 Yang lines and Yin lines from length 1 to 15, occupying 41.1 MB of memory. The code is slightly modified, changing the ternary weight power3 to the skew ternary weight basesk3, which is the sequence OEIS A261547.

constexpr int basesk3[16] {
    1, 4, 13, 40, 121, 364, 1093, 3280, 9841, 29524, 88573, 265720, 797161, 2391484, 7174453, 21523360
};
int sk3board_h[15] {};
int sk3board_v[15] {};
int sk3board_d[29] {};
int sk3board_a[29] {};
uint16_t threat_lut[21523359] { ... }; // pre-generated.
   
// set board bounds on init. 
void init() {
    for (int i = 0; i < 15; i++) {
        sk3board_d[i] = basesk3[13-i] * 3;
        sk3board_a[i] = basesk3[13-i] * 3;
    }
    for (int i = 15; i < 29; i++)
        sk3board_d[i] = basesk3[i-15] * 3;
        sk3board_a[i] = basesk3[i-15] * 3;
    }
}

void make(int x, int y, int color) {
    sk3board_h[y]      += basesk3[x]             << color;
    sk3board_v[x]      += basesk3[y]             << color;
    sk3board_d[14-x+y] += basesk3[x<y?y:x]       << color;
    sk3board_a[x+y]    += basesk3[x+y<15?14-y:x] << color;
}
void unmake(int x, int y, int color) {
    sk3board_h[y]      -= basesk3[x]             << color;
    sk3board_v[x]      -= basesk3[y]             << color;
    sk3board_d[14-x+y] -= basesk3[x<y?y:x]       << color;
    sk3board_a[x+y]    -= basesk3[x+y<15?14-y:x] << color;
}
void find_threat() {
    for (int y = 0; y < 15; y++) {
        uint16_t threat = threat_lut[sk3board_h[y]];
        while (threat) {
            int x = std::countr_zero(threat);

            // do something to the threat (x,y).

            threat &= threat - 1;
        }
    }
    // to process vertical, diagonal and anti-diagonal lines
}

We have completely transformed the very cumbersome threat point enumerating problem into incrementally maintaining the board representation in four directions, and then performing table lookups. A lookup table of about 41 MB can now as a whole fit in the L3 cache, thus the speed is extremely fast. In the above example, find_threat can be further vectorized, and the overhead of the entire operation can be compressed to about a hundred clock cycles.

The same idea can be used for winning checks, evaluation functions, etc. If you are preparing to use an architecture similar to AlphaGo, you might as well try to change the first layer of the neural network to 1D convolution along the four lines, and this method can directly save the first layer convolution operation with lookup.

The Application of Skew Ternary Numbers in Gomoku AI Programs

Problem

Naive Method

Ternary Coding

Skew Ternary Coding