Problem

Given a Gomoku board situation, identify all threat points. Threat points refer to the positions where both sides can form a winning situation after placing a piece, such as the end of a four-in-a-row (winning, immediately forming a five-in-a-row), and both ends of a living three (forming a living four after placing a piece, if the opponent does not win in this step, it will be a winning situation in the next turn). For example, the positions marked with “x” in the following board.

┌┬┬┬┬┬┬┬┬┬┬┬┬┬┐

├┼┼┼┼┼┼┼┼┼┼┼┼┼┤

├┼┼┼┼┼┼┼┼┼┼┼┼┼┤

├┼┼┼┼┼┼┼┼┼┼┼┼┼┤

├┼┼┼┼┼┼┼┼┼┼┼┼┼┤

├┼┼┼┼┼┼┼┼┼┼x┼┼┤

├┼┼┼┼┼┼┼┼┼●┼┼┼┤

├┼┼┼x○○○○●┼┼┼┼┤

├┼┼┼┼┼┼┼●┼┼┼┼┼┤

├┼┼┼┼┼┼x┼┼┼┼┼┼┤

├┼┼┼┼┼┼┼┼┼┼┼┼┼┤

├┼┼┼┼┼┼┼┼┼┼┼┼┼┤

├┼┼┼┼┼┼┼┼┼┼┼┼┼┤

├┼┼┼┼┼┼┼┼┼┼┼┼┼┤

└┴┴┴┴┴┴┴┴┴┴┴┴┴┘

During the search of AI programs, threat points must be quickly found in order to prune other moves. Because placing a piece at other positions when there is a threat point will always lead to a disadvantage.

Naive Method

A simple solution is to traverse according to the rules. I believe most people solve it this way when they first start writing programs, and I am no exception. After all, premature optimization is evil. But eventually, you will find that the speed of this process is crucial.

The first solution that comes to mind is vectorization. Use a bitboard to represent the situation, dividing the board into black and white, each board with 225 bits, which can be placed in uint16_t [16], or a YMM register __m256i. Through AVX instructions such as _mm256_slli_epi16/_mm256_srli_epi16 (VPSLLW/VPSRLW) to shift and align the entire board, and then perform logical AND to find living four, blocked four, living three, etc. This saves the overhead of enumerating the board, but due to the complexity of the threatening shapes (blocked four and living three are not necessarily continuous), the program is not simple and still takes a relatively long time.

Ternary Coding

In 2014, I independently discovered a solution and applied it to the evaluation function of kalscope v2 “Dolanaar”. This program was a practice work for me when I was an undergraduate, but it also achieved competitive strength at that time.

I treat each Yang line (horizontal or vertical line) on the board separately. There are 15 places on a line, I list all possible situations, calculate the threat points in advance to form a lookup table. However, if represented by a bitboard, the encoding of a line can reach , and the size of the lookup table reaches 2 GB. Note that there is a lot of redundancy in the bitboard encoding, with 2 bits encoding 3 situations: empty “┼” 0b00, white piece “○” 0b01, black piece “●” 0b10, and 0b11 is illegal. To achieve compact encoding, ternary counting is needed, 0 represents empty, 1 represents white piece, 2 represents black piece, each ternary digit represents a place.

The conversion to ternary involves a lot of division and modulus operations, thus it is unrealistic. Note that the board situation does not appear suddenly, but is always played gradually from an empty board, and ternary representation can be incrementally maintained. The empty line is encoded as 0; placing a white stone at the -th position is equal to adding to the encoding; placing a black piece at the -th position is equal to adding to the encoding. On backtrack, subtract the increased value from encoding. To implement efficiently, we only need to keep another lookup table of the powers of three, power3.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
constexpr int power3[16] {
1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683, 59049, 177147, 531441, 1594323, 4782969, 14348907
};
int triboard_h[15] {};
int triboard_v[15] {};
uint16_t threat_lut[14348907] { ... }; // pre-generated.

void make(int x, int y, int color) {
triboard_h[y] += power3[x] << color;
triboard_v[x] += power3[y] << color;
}
void unmake(int x, int y, int color) {
triboard_h[y] -= power3[x] << color;
triboard_v[x] -= power3[y] << color;
}
void find_threat() {
for (int y = 0; y < 15; y++) {
uint16_t threat = threat_lut[triboard_h[y]];
while (threat) {
int x = std::countr_zero(threat);

// do something to the threat (x,y).

threat &= threat - 1;
}
}
// another loop for vertical lines.
// TODO: how about diagonal / anti-diagonal lines?
}

The lookup table threat_lut tightly encodes all possible situations of a Yang line on a 15-line board, occupying only 27.4 MB of memory.

This method is designed specifically for a standard 15-line board, but at that time, the Gomocup rules were using 20-line boards, which made this method invalid. Otherwise, I might have opted in the competition. I am not sure if anyone else has discovered a similar method earlier by me, but I did notice that Rapfi, recent champion of Gomocup, also implemented this technique in it.

This method is not perfect. It is only for a fixed-length Yang line, and using it for the 20-line rules of Gomocup will result in an oversized LUT. In addition, the length of each Yin line (diagonal or anti-diagonal line) is different, and if you forcibly apply the lookup table for the Yang line to Yin lines, it will lead to false positives. For example, on a Yin line with a total length of only 4 places, a living three is not a threat because the board boundary will naturally block it. How should this be handled?

I did not give an answer in 2014, but left an unsolved problem.

Skew Ternary Coding

Last year, I guided doctoral student Guo, Hongrui to complete Cambricon-U, which implemented the in-place counting in SRAM using skew binary numbers. Skew numbers can not only be used for binary, but also for ternary, solving the abovementioned problem.

Skew ternary allows a digit “3” to temporarily exist in the number without carrying over. When counting, if there is no digit “3” existing, increment the least significant digit by 1; If there is a digit “3”, reset the “3” to “0” and carry over to the next digit. Under such counting rules, it is easy to find that skew ternary has the following two properties:

  • There is at most one digit “3” in the number;
  • If there is a “3”, all less significant digits must be “0”.

We use “0” to represent an empty place, “1” to represent a white stone, “2” to represent a black stone, and “3” to represent the board boundary. Such a coding can naturally and compactly encode various lengths of Yin lines into the lookup table. When the length of the Yin line is less than 15, only the high digits are used, and the boundary is set to “3”, and the less significant digits are “0”. The items in the lookup table are compiled in the order of skew ternary counting:

Lookup Table
Decimal Skew Ternary Line Situation Threats
0 000000000000000 ├┼┼┼┼┼┼┼┼┼┼┼┼┼┤ 0b000000000000000
1 000000000000001 ├┼┼┼┼┼┼┼┼┼┼┼┼┼○ 0b000000000000000
2 000000000000002 ├┼┼┼┼┼┼┼┼┼┼┼┼┼● 0b000000000000000
3 000000000000003 ├┼┼┼┼┼┼┼┼┼┼┼┼┤▒ 0b000000000000000
4 000000000000010 ├┼┼┼┼┼┼┼┼┼┼┼┼○┤ 0b000000000000000
5 000000000000011 ├┼┼┼┼┼┼┼┼┼┼┼┼○○ 0b000000000000000
6 000000000000012 ├┼┼┼┼┼┼┼┼┼┼┼┼○● 0b000000000000000
7 000000000000013 ├┼┼┼┼┼┼┼┼┼┼┼┼○▒ 0b000000000000000
8 000000000000020 ├┼┼┼┼┼┼┼┼┼┼┼┼●┤ 0b000000000000000
9 000000000000021 ├┼┼┼┼┼┼┼┼┼┼┼┼●○ 0b000000000000000
10 000000000000022 ├┼┼┼┼┼┼┼┼┼┼┼┼●● 0b000000000000000
11 000000000000023 ├┼┼┼┼┼┼┼┼┼┼┼┼●▒ 0b000000000000000
12 000000000000030 ├┼┼┼┼┼┼┼┼┼┼┼┤▒▒ 0b000000000000000
13 000000000000100 ├┼┼┼┼┼┼┼┼┼┼┼○┼┤ 0b000000000000000
14 000000000000101 ├┼┼┼┼┼┼┼┼┼┼┼○┼○ 0b000000000000000
15 000000000000102 ├┼┼┼┼┼┼┼┼┼┼┼○┼● 0b000000000000000
16 000000000000103 ├┼┼┼┼┼┼┼┼┼┼┼○┤▒ 0b000000000000000
17 000000000000110 ├┼┼┼┼┼┼┼┼┼┼┼○○┤ 0b000000000000000
18 000000000000111 ├┼┼┼┼┼┼┼┼┼┼┼○○○ 0b000000000000000
19 000000000000112 ├┼┼┼┼┼┼┼┼┼┼┼○○● 0b000000000000000
20 000000000000113 ├┼┼┼┼┼┼┼┼┼┼┼○○▒ 0b000000000000000
21 000000000000120 ├┼┼┼┼┼┼┼┼┼┼┼○●┤ 0b000000000000000
22 000000000000121 ├┼┼┼┼┼┼┼┼┼┼┼○●○ 0b000000000000000
23 000000000000122 ├┼┼┼┼┼┼┼┼┼┼┼○●● 0b000000000000000
24 000000000000123 ├┼┼┼┼┼┼┼┼┼┼┼○●▒ 0b000000000000000
25 000000000000130 ├┼┼┼┼┼┼┼┼┼┼┼○▒▒ 0b000000000000000
26 000000000000200 ├┼┼┼┼┼┼┼┼┼┼┼●┼┤ 0b000000000000000
27 000000000000201 ├┼┼┼┼┼┼┼┼┼┼┼●┼○ 0b000000000000000
28 000000000000202 ├┼┼┼┼┼┼┼┼┼┼┼●┼● 0b000000000000000
29 000000000000203 ├┼┼┼┼┼┼┼┼┼┼┼●┤▒ 0b000000000000000
30 000000000000210 ├┼┼┼┼┼┼┼┼┼┼┼●○┤ 0b000000000000000
31 000000000000211 ├┼┼┼┼┼┼┼┼┼┼┼●○○ 0b000000000000000
32 000000000000212 ├┼┼┼┼┼┼┼┼┼┼┼●○● 0b000000000000000
33 000000000000213 ├┼┼┼┼┼┼┼┼┼┼┼●○▒ 0b000000000000000
34 000000000000220 ├┼┼┼┼┼┼┼┼┼┼┼●●┤ 0b000000000000000
35 000000000000221 ├┼┼┼┼┼┼┼┼┼┼┼●●○ 0b000000000000000
36 000000000000222 ├┼┼┼┼┼┼┼┼┼┼┼●●● 0b000000000000000
37 000000000000223 ├┼┼┼┼┼┼┼┼┼┼┼●●▒ 0b000000000000000
38 000000000000230 ├┼┼┼┼┼┼┼┼┼┼┼●▒▒ 0b000000000000000
39 000000000000300 ├┼┼┼┼┼┼┼┼┼┼┤▒▒▒ 0b000000000000000
40 000000000001000 ├┼┼┼┼┼┼┼┼┼┼○┼┼┤ 0b000000000000000
41 000000000001001 ├┼┼┼┼┼┼┼┼┼┼○┼┼○ 0b000000000000000
42 000000000001002 ├┼┼┼┼┼┼┼┼┼┼○┼┼● 0b000000000000000
43 000000000001003 ├┼┼┼┼┼┼┼┼┼┼○┼┤▒ 0b000000000000000
44 000000000001010 ├┼┼┼┼┼┼┼┼┼┼○┼○┤ 0b000000000000000
45 000000000001011 ├┼┼┼┼┼┼┼┼┼┼○┼○○ 0b000000000000000
138 000000000010110 ├┼┼┼┼┼┼┼┼┼○┼○○┤ 0b000000000001000
174 000000000011100 ├┼┼┼┼┼┼┼┼┼○○○┼┤ 0b000000000100010
175 000000000011101 ├┼┼┼┼┼┼┼┼┼○○○┼○ 0b000000000000010
176 000000000011102 ├┼┼┼┼┼┼┼┼┼○○○┼● 0b000000000100000
177 000000000011102 ├┼┼┼┼┼┼┼┼┼○○○┤▒ 0b000000000100000
21523354 222223000000000 ●●●●●▒▒▒▒▒▒▒▒▒▒ 0b000000000000000
21523355 222230000000000 ●●●●▒▒▒▒▒▒▒▒▒▒▒ 0b000000000000000
21523356 222300000000000 ●●●▒▒▒▒▒▒▒▒▒▒▒▒ 0b000000000000000
21523357 223000000000000 ●●▒▒▒▒▒▒▒▒▒▒▒▒▒ 0b000000000000000
21523358 230000000000000 ●▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 0b000000000000000

The lookup table has a total of 21523359 items, tightly encoding all situations of length-15 Yang lines and Yin lines from length 1 to 15, occupying 41.1 MB of memory. The code is slightly modified, changing the ternary weight power3 to the skew ternary weight basesk3, which is the sequence OEIS A261547.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
constexpr int basesk3[16] {
1, 4, 13, 40, 121, 364, 1093, 3280, 9841, 29524, 88573, 265720, 797161, 2391484, 7174453, 21523360
};
int sk3board_h[15] {};
int sk3board_v[15] {};
int sk3board_d[29] {};
int sk3board_a[29] {};
uint16_t threat_lut[21523359] { ... }; // pre-generated.

// set board bounds on init.
void init() {
for (int i = 0; i < 15; i++) {
sk3board_d[i] = basesk3[13-i] * 3;
sk3board_a[i] = basesk3[13-i] * 3;
}
for (int i = 15; i < 29; i++)
sk3board_d[i] = basesk3[i-15] * 3;
sk3board_a[i] = basesk3[i-15] * 3;
}
}

void make(int x, int y, int color) {
sk3board_h[y] += basesk3[x] << color;
sk3board_v[x] += basesk3[y] << color;
sk3board_d[14-x+y] += basesk3[x<y?y:x] << color;
sk3board_a[x+y] += basesk3[x+y<15?14-y:x] << color;
}
void unmake(int x, int y, int color) {
sk3board_h[y] -= basesk3[x] << color;
sk3board_v[x] -= basesk3[y] << color;
sk3board_d[14-x+y] -= basesk3[x<y?y:x] << color;
sk3board_a[x+y] -= basesk3[x+y<15?14-y:x] << color;
}
void find_threat() {
for (int y = 0; y < 15; y++) {
uint16_t threat = threat_lut[sk3board_h[y]];
while (threat) {
int x = std::countr_zero(threat);

// do something to the threat (x,y).

threat &= threat - 1;
}
}
// to process vertical, diagonal and anti-diagonal lines
}

We have completely transformed the very cumbersome threat point enumerating problem into incrementally maintaining the board representation in four directions, and then performing table lookups. A lookup table of about 41 MB can now as a whole fit in the L3 cache, thus the speed is extremely fast. In the above example, find_threat can be further vectorized, and the overhead of the entire operation can be compressed to about a hundred clock cycles.

The same idea can be used for winning checks, evaluation functions, etc. If you are preparing to use an architecture similar to AlphaGo, you might as well try to change the first layer of the neural network to 1D convolution along the four lines, and this method can directly save the first layer convolution operation with lookup.