Repetition Free Words

Repetition Free Words and Computer Algebra


Abstract

Words or strings belong to the very basic objects in theoretical computer science. Thus, the investigation of structures in words constitutes a central research topic in this branch of science. The systematic study of word structures (combinatorics on words) was started by a Norwegian mathematician Axel Thue [7] (1863-1922) at the beginning of this century. One of the remarkable discoveries made by Thue is that the consecutive repetitions of non-empty subwords (squares) can be avoided in infinite words over a three letter alphabet.

After Thue's time, repetition-free words have been used in various fields of mathematics. For example, in group theory, in formal languages, in connection with unending games, and in symbolic dynamics (which constitutes a tool for studying chaos). Very recently repetition-free words have also aroused interest in the field of music, see eg. Laakso [5].

Let X be a four letter alphabet {a,b,c,d }. In [3] (Keränen 1992) we presented an Abelian square-free endomorphism g: X* -> X* which we found by the aid of computers. This endomorphism g: X* -> X* is a mapping that preserves the Abelian square-freeness of words. Its size is |g(abcd)| = 340, and it is defined as follows. The image word of a is

g(a) = abcacdcbcdcadcdbdabacabadbabcbdbcbacbcdcacbabdabacadcbcdcacdbcbacbcdcacdcbdcdadbdcbca

and the image words of letters b, c, d, i.e., the words g(b), g(c), g(d) are obtained by cyclic permutation of letters in the preceding words g(x), with x in X.

All currently known methods, see [1] (Carpi 1994), for constructing arbitrarily long Abelian square-free words over a four letter alphabet X are based on the structure of this endomorphism g. However, it is not known whether there exist Abelian square-free endomorphisms over X with size less than the above-mentioned 340. Solving this and other questions concerning algebral structures of words is developing into a challenging area of computer algebra.

In 1961 Erdös [2, s. 240] raised the question whether abelian squares can be avoided in infinitely long words, i.e. whether there exist infinitely many Abelian square-free words over a given alphabet. An Abelian square means a non-empty word uv, where u and v are permutations of each other. For example, abcacb is an Abelian square. A word is called Abelian square-free, if it does not contain any Abelian square as a subword. For example, the word abacaba is Abelian square-free, while abcdadcada is not (it contains the subword cda dca).

Later, in 1970, Pleasants [6] showed that there exists an infinite Abelian square-free word over five letters.

The case of four letters remained open until 1991 when we managed to show, see Keränen [3], that the iteration of the above-mentioned endomorphism g produces as a "limit" an infinite Abelian square-free word. In the picture (necklace), the beginning of this infinite self-reading word, namely g(abcacd ) = g(a) g(b) g(c) g(a) g(c) g(d), is represented by 6*85 coloured polyhedra in six circles. Circles start from below and each of them represents an image word for a letter read from the beginning of the chain. Thus the first circle is g(a), the second is g(b), then come g(c), g(a), g(c) and g(d).

It is easily seen that abelian squares cannot be avoided over a three letter alphabet. Indeed, in this alphabet, each word of length 8 contains an abelian square. Thus a four letter alphabet is special in the sense that it is the smallest alphabet over which Abelian squares can be avoided in words of unbounded length. Could this special state of affairs be in connection with the genetic code used by nature? - DNA is made up of four different characters.

A part of this abstract has been published in Finnish in [4].


References

  1. A. Carpi. On the number of abelian square-free words on four letters. Technical report, Preprint no. R.I. 110/94/IC, pages 1-15, Istituto di Cibernetica del C.N.R., Arco Felice - Napoli, Italy, 1994.

  2. P. Erdös. Some unsolved problems. Magyar Tud. Acad. Mat. Kutató Int. Közl., 6:221-254, 1961.

  3. V. Keränen. Abelian squares are avoidable on 4 letters. In W. Kuich, editor, Proc. ICALP '92, Lecture Notes in Comp. Sci., volume 623, pages 41-52. Springer-Verlag, Berlin, 1992.

  4. V. Keränen. Toistovapaat merkkijonot ja tietokonealgebra. In C. Gefwert, P. Orponen, and J. Seppänen, editors, Proc. Logic, Mathematics and the Computer, Publications of the Finnish Artificial Intelligence Society, Symposiosarja, volume 14, pages 250-257. Hakapaino, Helsinki, 1996.

  5. T. Laakso. Musical redering of an infinite repetition-free string. In C. Gefwert, P. Orponen, and J. Seppänen, editors, Proc. Logic, Mathematics and the Computer, Publications of the Finnish Artificial Intelligence Society, Symposiosarja, volume 14, pages 292-297. Hakapaino, Helsinki, 1996.

  6. P.A.B. Pleasants. Non-repetitive sequences. Proc. Cambridge Phil. Soc., 68:267- 274, 1970.

  7. A. Thue. Uber unendliche zeichenreihen. Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania, 7:1-22, 1906.