Michel Planat, Marcelo Amaral, Fang Fang, David Chester, Raymond Aschheim, Klee Irwin (2021)

Transcription factors (TFs) are proteins that recognize specific DNA fragments in order to decode the genome and ensure its optimal functioning. TFs work at both local and global scales by specifying cell type, cell growth and death, cell migration, organization and timely tasks. We investigate the structure of DNA-binding motifs with the theory of finitely generated groups. The DNA ‘word’ in the binding domain, the motif, may be seen as the generator of a finitely generated group Fdna on four letters, the bases A, T, G and C. It is shown that, most of the time, DNA-binding motifs have a subgroup structure close to that of free groups of rank three or less, a property that we call ‘syntactical freedom’. This property is associated with the aperiodicity of the motif when it is seen as a substitution sequence. Examples are provided for the major families of TFs, such as leucine zipper factors, zinc finger factors and homeodomain factors. We also discuss the exceptions to this DNA syntactical rule and their functional role. These include the TATA box in the promoter region of some genes, single nucleotide markers (SNPs) and the motifs of some genes with a ubiquitous role in transcription and regulation.
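As a point of reference for the comparison with free groups of rank three or less, the sketch below counts the subgroups of each finite index in a free group of a given rank, using Marshall Hall's classical recursion. This is an illustrative assumption: the abstract does not say which subgroup invariant is compared, and the function name `free_group_subgroup_counts` is hypothetical, not taken from the paper.

```python
from math import factorial


def free_group_subgroup_counts(rank, max_index):
    """Number of subgroups of index 1..max_index in the free group of given rank.

    Uses Marshall Hall's recursion (an assumed reference invariant, not
    necessarily the one used by the authors):
        N(n) = n * (n!)^(rank-1) - sum_{k=1}^{n-1} ((n-k)!)^(rank-1) * N(k)
    """
    counts = []  # counts[k - 1] holds N(k)
    for n in range(1, max_index + 1):
        total = n * factorial(n) ** (rank - 1)
        for k in range(1, n):
            total -= factorial(n - k) ** (rank - 1) * counts[k - 1]
        counts.append(total)
    return counts


if __name__ == "__main__":
    # Reference sequences for ranks one to three, up to index 4.
    for r in (1, 2, 3):
        print(r, free_group_subgroup_counts(r, 4))
```

A motif whose group has the same counts as one of these sequences up to some index would, under this assumption, be a candidate for the 'syntactical freedom' described above; counting subgroups up to conjugacy instead would give different reference sequences.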