E. coli −10 promoter region and the majority-rule consensus sequence. Bases matching the consensus are shown in bold. Note that no sequence in the training set exactly matches the consensus.

Because they are concise, consensus sequences are often shown in the literature to mark the position of a motif in a sequence. However, they are a very weak method for motif identification. First, they discard all information about the frequency of the letters at different positions of the motif. For instance, in the consensus sequence for the E. coli promoter, TATAAT, the final T is almost 100% conserved, indicating that it has very high functional significance. On the other hand, the T at the third position is not even present in a majority of promoters: A, C, G, and T occur almost equally, indicating that this position is essentially irrelevant. Second, consensus sequences do not include a background model. Overprediction of promoters in regions that are simply A/T rich is common, because a consensus sequence does not take into account that a sequence such as TATAAT occurs much more frequently at random in A/T-rich regions. Third, consensus sequences do not generalize well. Only a fraction of known sequences precisely match the consensus sequence, and allowing mismatches or gaps greatly increases the number of false-positive predictions, a problem that is exacerbated by the loss of information discussed above.

Gilbert, in Nuclear Architecture and Dynamics, 2018

18.3.3.1 DNA Sequence

Consensus sequences acting as replication origins were originally defined for eukaryotes in the budding yeast Saccharomyces cerevisiae as a 12–17-bp sequence capable of permitting circular plasmid replication and termed the autonomously replicating sequence (ARS) (Stinchcomb et al., 1979). ARS elements were also shown to initiate replication in yeast chromosomes and were further delineated to an 11-bp fragment still capable of initiating replication (Bell and Stillman, 1992; Marahrens and Stillman, 1992). ChIP-based methods demonstrated that ARS elements are occupied by ORC as well as MCM subunits (Wyrick et al., 2001). Later, it became evident that the requirement for ARS in budding yeast was much more flexible than originally thought (Méchali, 2010). Using DNA combing, one group deleted all ARSs within a single yeast chromosome and found that replication still initiated from other non-ARS origins throughout the chromosome (Bogenschutz et al., 2014). After over 20 metazoan origins were extensively studied and characterized, no consensus DNA sequence equivalent to ARS emerged (Gilbert, 2001). Moreover, other yeast species such as the fission yeast were found to have less sequence-specific origins than budding yeast (Dai et al., 2005). In fact, ORC from these eukaryotes exhibits sequence-independent binding to DNA.
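
To make the limits of consensus sequences for motifs such as the E. coli −10 promoter concrete, the following is a minimal Python sketch, not an implementation from the source. It derives a majority-rule consensus from a small set of aligned hexamers, keeps the per-position base frequencies that the consensus discards, and compares how often the consensus would match by chance under a uniform versus an A/T-rich background. The eight sequences, the background frequencies, and all function names are hypothetical and chosen only for illustration; they are not real promoter data.

```python
from collections import Counter
from math import prod

# Hypothetical aligned hexamers (NOT real E. coli promoters), chosen so that
# no single sequence equals the consensus derived from them.
TOY_SITES = [
    "TATGAT", "TACAAT", "TAGAAT", "TAAAAT",
    "GATAAT", "TTTAAT", "TATACT", "TACATT",
]


def position_frequencies(sites):
    """Return one dict per alignment column mapping base -> relative frequency."""
    n = len(sites)
    columns = zip(*sites)  # transpose: one tuple of bases per position
    return [
        {base: count / n for base, count in Counter(column).items()}
        for column in columns
    ]


def majority_consensus(sites):
    """Return the most frequent base at each position as a single string."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sites)
    )


def chance_match_probability(motif, background):
    """Probability that a random site equals `motif`, assuming independent
    bases drawn from the given background frequencies."""
    return prod(background[base] for base in motif)


# Assumed background models, for illustration only.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
AT_RICH = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}

if __name__ == "__main__":
    consensus = majority_consensus(TOY_SITES)
    freqs = position_frequencies(TOY_SITES)
    print("consensus:", consensus)
    print("exact matches in input:", TOY_SITES.count(consensus))
    for i, column in enumerate(freqs, start=1):
        print(f"position {i}:", column)
    ratio = (chance_match_probability(consensus, AT_RICH)
             / chance_match_probability(consensus, UNIFORM))
    print("chance-match ratio (A/T-rich vs uniform):", ratio)
```

With this toy alignment, the consensus is TATAAT although no input sequence equals it; the last position is fully conserved, while T accounts for only half of the third position; and the chance-match probability is about 7.5 times higher under the assumed A/T-rich background than under the uniform one. A position frequency table combined with a background model retains exactly the information that the consensus string throws away.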