Forbidden Abelian Square-Free Factors over 4 Letters

Veikko Keränen, 8 June 2004
Rovaniemi Polytechnic

Some computational experiments in context of pattern avoidance in words are described below.

We illustrate highly nonlinear and quite nonintuitive processes where running time and required amount of memory fluctuates in an unpredictable way. A conjecture regarding abelian square-free words over four letters is presented for the ratio of the number of properly extendable words and of forbidden words (factors), where both have the same length.

Here we are interested in forbidden abelian square-free (a-2-free) factors over 4 letters ( = {a,b,c,d}) which we define to be those a-2-free words over which cannot occur as proper factors inside long, maybe infinite, a-2-free words. That is to say, over ,  these forbidden a-2-free words cannot be continued infinitely long to the left and to the right without necessarily creating an abelian square. Quite probably (though open) it is possible to extend some of these words to one direction (only) without producing any abelian squares.

Below we take a word (one of length 16, and four (or sixteen) cases of length 20) and try to extend it to the right (blue) and to the left (red) with all possible letters over .  The length of the words increases only by 1 at a time (alternately to right and left). All possible words are gathered to a list. For each word there are four possibilities: either it can be continued with  3, 2, 1, or 0  letters. The case of  0  letters means the word cannot be continued at all. If no words can be continued, then we get an empty list and the computation ends. The resulting list consists of forbidden factors that start from the originally given word, and this starting word is 'in the middle' of every non-empty word in the list. Indeed, all the computations below do end with an empty list.

The number of words are plotted according to the length of the words. The results look rather incredible: for the 'same looking' starting words of length 20, the computation time varies from a fraction of a second to several hours, and the resulting output files range from 8 kB to nearly 50 MB (some of the new ones even to 87 MB). The factor "abcbdbcbacbdcdacbdac" has the top at (82, 84218)!! Maybe some factors can even be cut to shorter ones which still have the same trajectory. This has not been checked as yet, though.

Actually, this phenomenon is not completely new to us, since we have ran into it several times when carrying out extensive searches for abelian square-free endomorphisms over four letters. Indeed, in many cases the processes have behaved in a very nonintuitive ways and this has to a great extent affected our everyday calculations (which are hard enough to carry out even without these fluctuations).

However, the phenomenon that relatively short 'good looking' words turn out to be forbidden factors after being 'safely' extendable for quite a long time (with a great number of branches) sounds rather peculiar. One might have expected the quite long buffers to be sufficient for further grow. Due to Carpi (1998), we know that the number of a-2-free words over four letters grows exponentially (≥ ) with respect to length  n.  But how do these words grow then?  We conjecture that in spite of the exponential growth, the ratio between the number of properly extendable words of length  n  and between the number of forbidden factors of the same length  n  actually tends to zero when the length  n tends to infinity! Thus we suspect that the vast majority of a-2-free words over four letters cannot occur as proper factors in the middle of very long (infinite) words. In a way, this can partly explain why it is so extremely difficult to find a-2-free endomorphisms over four letters. At present we know that at least nearly half (49.8 %) of the a-2-free words over of length 20 are actually forbidden.

The Mathematica experiments below are based on results firstly achieved by using a C program written by Antti Karhu, Veli-Matti Lahtela, and Olli-Pekka Siivola at the Rovaniemi Polytechnic in Spring 2004.

This first case (of  abcdcbcdadbcbdbc, of length 16) looks definitely boring. Until the empty set appears, each word can be continued with one way only: The empty sublist  {}  in the last (sub)list tells us that the previous word  bcbdbcbabcdcbcdadbcbdbcacbcdcbc  could not be continued to the left (there the letter  L  stood for the direction (state) of the next step), and that all words obtained are indeed, according to our definition above, forbidden factors.

However, the next case of abacabadbacbcdbacdbd  is surely not boring. Before the words die out at the length of 85, the process produces a huge number of forbidden factors. The number of words grows very high, but surprisingly enough, drops from 29323 directly down to 0:     