The protein SaCas9 Base Editor 4-Gam (SaBE4-Gam) converts particular “C”s in the genome to “T”s without making a double-stranded break in the DNA, and protects the DNA if a break does occur. At the core of the protein is a compact variant of Cas9. Coupled with a guide RNA of a particular sequence, the protein can seek out specific DNA sequences in a genome with high fidelity and, once there, bring the rest of its DNA-manipulation machinery to bear. Ultimately, the single protein converts a specific C to a T in a mammalian organism’s genome, a few base pairs upstream of the targeted DNA sequence. Older variants of ‘Base Editor 4’ have already been used successfully in many applications, including editing rice genomes.
The paper describing SaBE4-Gam was published at the end of August 2017 [1]. It describes the fourth generation of a synthetic protein that involves significantly more engineering than most other synthetic proteins, and it demonstrates one way in which biological tools can be both engineered and optimized. The protein contains four major functional domains, including Cas9, with each functional domain separated by variously designed linkers, epitope tags or localization signals. Each domain has an independent function, but when physically coupled into a single protein they act in concert to produce a specifically engineered outcome. In this case, much of the engineering has focused on ensuring that the DNA is edited accurately and specifically, while minimizing the chance of DNA damage from indel accumulation.
Cas9, from the CRISPR system, guides the entire multi-domain protein to a particular sequence in the genome. Cas9 binds a strand of guide RNA with an exposed sequence that probes genomic DNA for a match. Once a match is found, the protein persists at that location while the other editing functions are performed.
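The matching step can be pictured as a string search. The following is a minimal sketch in Python, not a model of the real biochemistry: it assumes exact matching only, ignores the PAM requirement and any tolerated mismatches, and the names `reverse_complement` and `find_target_sites` are hypothetical helpers introduced for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_target_sites(genome: str, spacer: str) -> list[tuple[int, str]]:
    """Find every exact occurrence of `spacer` on either strand of `genome`.

    Returns (start_index, strand) pairs. start_index is the 0-based position
    on the forward strand; strand is "+" for a forward-strand match and "-"
    for a match to the reverse complement.
    Assumption: exact matches only; real Cas9 binding is more permissive.
    """
    genome = genome.upper()
    spacer = spacer.upper()
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            # Resume one base later so overlapping matches are also found.
            start = genome.find(query, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    toy_genome = "TTACGTACCGGATTT"  # made-up sequence, for illustration only
    print(find_target_sites(toy_genome, "ACCGGA"))
```

On a toy input like this the search returns a single site. A real genome is billions of bases long, which is why the length of the guide's variable region matters for uniqueness.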
The components below are listed in the causal order of events that results in a C->T conversion. However, each component of the protein acts persistently and independently on its surroundings, not necessarily in sequential order:
SV40 NLS
SV40 is from Simian virus 40.
PKKKRKV
SaCas9 [D10A] Nickase
SaCas9 [D10A] is a strategically mutated version of SaCas9 from Staphylococcus aureus; the mutation prevents it from cutting both strands of DNA, so it nicks only one. It is related to, but smaller than, the original CRISPR Cas9 protein found in Streptococcus pyogenes.
KRNYILGLAIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPRIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG
rApoBEC1
rApoBEC1 is the rat version of the human ApoBEC1, which regulates Apolipoprotein B.
TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML
Uracil Glycosylase Inhibitor
UGI is from Bacillus subtilis.
TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML
Gam
Gam is from Bacteriophage Mu.
MAKPAKRIKSAAAAYVPQNRDAVITDIKRIGDLQREASRLETEMNDAIAEITEKFAARIAPIKTDIETLSKGVQGWCEANRDELTNGGKVKTANLVTGDVSWRVRPPSVSIRGMDAVMETLERLGLQRFIRTKQEINKEAILLEPKAVAGVAGITVKSGIEDFSIIPFEQEAGI
Flag
Flag is a synthetic peptide that was patented by Sigma-Aldrich in 1987.
PKKKRKV
gRNA
Part of the guide RNA’s sequence is recognized by the Cas9 protein. Cas9 also requires a short “protospacer adjacent motif” (PAM) in the DNA next to the target site, and the PAM is specific to each variant of Cas9. The second part of the RNA sequence is arbitrary, and it specifies where Cas9 will bind to the DNA in the genome. At 2 bits per nucleotide, the >20 nucleotides in the gRNA’s variable region can register >40 bits of information, more than the roughly 32 bits needed to address every position in a human genome (~3 Gbp), and so generally enough to uniquely identify a site. An example SaCas9 gRNA might be as follows:
[XXXXXXXXXXX][ACCG][NNNNNNNNNNNNNNNNNNNNN]
In the case of SaBE4-Gam, whichever ‘C’ is ~6 base pairs in from the end of the gRNA (in the XXX region) will be converted to a corresponding T in the genome.
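The net effect of the edit can be illustrated on a plain string. This is a rough sketch under stated assumptions: the position is counted from the start of the string (1-based), exactly one fixed position is edited even though the text says "~6", and `convert_c_to_t` is a hypothetical helper name, not part of any published tool.

```python
def convert_c_to_t(target: str, position: int = 6) -> str:
    """Return `target` with the C at 1-based `position` replaced by T.

    The sequence is returned unchanged if the base at that position is not
    a C. Assumption: a single fixed position is edited; this is a
    simplification of the approximate "~6 base pairs in" described above.
    """
    index = position - 1
    if not 0 <= index < len(target):
        raise ValueError("position lies outside the target sequence")
    if target[index].upper() != "C":
        return target  # nothing to convert at this position
    return target[:index] + "T" + target[index + 1:]


if __name__ == "__main__":
    # Made-up 10-base sequences, for illustration only.
    print(convert_c_to_t("GATTACAGGT"))  # base 6 is C, so it becomes T
    print(convert_c_to_t("GATTAGAGGT"))  # base 6 is G, so nothing changes
```

Only the base at the chosen position is touched. Other Cs in the sequence are left alone, which mirrors the position-specific nature of the edit.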