The EMBL Protein Expression and Purification Core Facility (PEPCF) expresses proteins in bacteria, insect cells and mammalian cells, and uses a variety of chromatographic and biophysical techniques to purify and characterize them.
The first step in any protein expression and purification experiment is designing the strategy and the construct. This includes choosing the host organism for expression and deciding on the optimal construct (e.g. plasmid backbone, protein tags, localization signals, …).
At EMBL PEPCF, we work with three host systems: E. coli, insect cells and mammalian cells. We currently do not offer protein expression in yeast.
To design the optimal expression construct, collect as much information about your protein as possible and have a clear idea of the requirements of the downstream applications for which you will use the purified protein. If you have little experience with construct design, or would like advice on cloning and suitable expression vectors, don’t hesitate to contact the PEPCF staff.
E. coli is often the first host of choice because it is easy to work with, grows very fast and requires relatively cheap media. Many expression vectors and E. coli strains optimized for specific applications are also available. However, E. coli lacks most eukaryotic post-translational modifications, and complex proteins are not always folded correctly.
The most frequently used method for protein production in insect cells is baculovirus-mediated expression in lepidopteran cell lines such as Sf9, Sf21, Hi5 and Tnao38. Insect cells often provide better folding machinery for eukaryotic proteins, can perform some post-translational modifications and can secrete proteins into the extracellular medium. However, the time from construct to expressed protein is longer, and production costs are higher than with E. coli.
Mammalian cells can properly fold mammalian proteins, provide native post-translational modifications and secrete proteins into the medium. For preparative-scale production of recombinant proteins, HEK293 (human embryonic kidney) and CHO (Chinese hamster ovary) cells are currently the most commonly used cell lines. Both can be used for transient transfection or to generate stable cell lines, although the latter takes more time. Culturing mammalian cells also remains relatively expensive compared with the other host systems.
Characteristics | E. coli | Yeast | Insect cells | Mammalian cells |
---|---|---|---|---|
Growth speed (doubling time, Td) | Fast (Td ~ 20–30 min) | Fast (Td ~ 90 min) | Slow (Td ~ 24–30 h) | Slow (Td ~ 24–30 h) |
Growth medium | Simple, cheap | Simple, cheap | Complex, expensive | Complex, expensive |
Expression level | High | Low – High | Low – High | Low – Moderate |
Secretion | Periplasm | Medium | Medium | Medium |
Post-translational modifications | | | | |
N-linked glycosylation | no | yes (high mannose) | yes (simple, no sialic acid) | yes (complex) |
O-linked glycosylation | no | yes | yes | yes |
Phosphorylation | no | yes | yes | yes |
Acylation | no | yes | yes | yes |
Acetylation | no | yes | yes | yes |
γ-Carboxylation | no | no | no | yes |
Fernandez J.M. and Hoeffler J.P. (eds) (1999) Gene Expression Systems: Using Nature for the Art of Expression. Academic Press, San Diego.
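To make the comparison concrete, the sketch below encodes part of the table as a Python data structure and filters host systems by the post-translational modifications a protein needs. It also estimates how long a culture takes to increase a given number of fold from its doubling time, t = Td · log2(N/N0). The dictionary layout, the function names and the use of range midpoints (25 min for E. coli, 27 h for insect and mammalian cells) are illustrative assumptions, not a PEPCF tool.

```python
import math

# Eukaryotic modifications listed in the table above (yes/no level only).
EUKARYOTIC_PTMS = {
    "N-linked glycosylation",
    "O-linked glycosylation",
    "phosphorylation",
    "acylation",
    "acetylation",
}

# Hypothetical encoding of the table: doubling times are midpoints of the
# quoted ranges (minutes); "offered" reflects that PEPCF does not currently
# offer expression in yeast.
HOSTS = {
    "E. coli": {"doubling_min": 25, "ptms": set(), "offered": True},
    "Yeast": {"doubling_min": 90, "ptms": set(EUKARYOTIC_PTMS), "offered": False},
    "Insect cells": {"doubling_min": 27 * 60, "ptms": set(EUKARYOTIC_PTMS), "offered": True},
    "Mammalian cells": {
        "doubling_min": 27 * 60,
        "ptms": EUKARYOTIC_PTMS | {"gamma-carboxylation"},
        "offered": True,
    },
}


def candidate_hosts(required_ptms, offered_only=True):
    """Hosts (in table order) whose modification set covers every required PTM."""
    return [
        name
        for name, host in HOSTS.items()
        if set(required_ptms) <= host["ptms"] and (host["offered"] or not offered_only)
    ]


def hours_to_expand(host, fold_increase):
    """Exponential growth: t = Td * log2(N / N0), converted to hours."""
    return HOSTS[host]["doubling_min"] * math.log2(fold_increase) / 60


print(candidate_hosts({"gamma-carboxylation"}))  # ['Mammalian cells']
print(round(hours_to_expand("E. coli", 16), 1))  # 1.7
print(round(hours_to_expand("Insect cells", 16), 1))  # 108.0
```

The yes/no encoding deliberately drops the glycosylation details (high mannose, simple or complex glycans), so treat the result as a first filter only.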
To design a successful protein expression construct, it’s imperative to collect as much information as possible about your protein of interest. Some key points you need to consider are the following:
If little information about your protein is available in the literature or on UniProt, many bioinformatics tools can help you predict protein characteristics and thereby guide the construct design. Multiple sequence alignments help identify conserved regions and, together with domain and structure predictions, can help define domain boundaries.
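A crude way to see which parts of a multiple sequence alignment are conserved is to score each column by the fraction of sequences that share its most common residue, then look for runs of well-conserved columns. The sketch below assumes an alignment already produced by a tool such as Clustal Omega; the toy sequences, the 0.8 threshold and the 5-column minimum are hypothetical choices. This identity score ignores substitutions between similar residues, so it complements rather than replaces proper domain and structure predictions.

```python
from collections import Counter

# Toy, hypothetical alignment (e.g. exported from Clustal Omega); "-" marks gaps.
ALIGNMENT = [
    "MKTAYIAKQR--GSEELLKK",
    "MKTAYIAKQRQLGSDELLRK",
    "MRTAYIAKHR--GSEEILKK",
]


def column_identity(alignment):
    """Per column: fraction of sequences carrying the most common residue (gaps never count)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residues = [res for res in column if res != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(column))
    return scores


def conserved_blocks(scores, threshold=0.8, min_length=5):
    """1-based, inclusive alignment ranges where every column scores >= threshold."""
    blocks, start = [], None
    for i, score in enumerate(scores + [-1.0]):  # sentinel closes a trailing run
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_length:
                blocks.append((start + 1, i))
            start = None
    return blocks


print(conserved_blocks(column_identity(ALIGNMENT)))  # [(3, 8)]
```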
Expasy: Swiss Bioinformatics Resource Portal, offering a wide range of tools and databases for working with proteins (e.g. sequence analysis, topology prediction, prediction of post-translational modifications, secondary and tertiary structure prediction, domain prediction, …)
Clustal Omega: Multiple Sequence Alignment program
SignalP: predicts the presence of signal sequences and their cleavage sites
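Some of the simplest ProtParam-style parameters can be computed directly from the sequence, which is handy when checking a construct before ordering it. The sketch below calculates the average molecular mass of an unmodified polypeptide and its molar extinction coefficient at 280 nm from the per-residue contributions used by Expasy ProtParam (Trp 5500, Tyr 1490 and 125 per cystine, in M⁻¹ cm⁻¹). The function names are hypothetical, and modifications and non-standard residues are not handled; compare the results with ProtParam itself.

```python
# Standard average residue masses in Da (free amino acid minus one water).
# Assumed values of the kind ProtParam uses; verify before relying on them.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524


def _clean(seq):
    """Upper-case, strip whitespace and reject non-standard residues."""
    seq = "".join(seq.split()).upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return seq


def average_mass(seq):
    """Average molecular mass (Da) of an unmodified polypeptide chain."""
    seq = _clean(seq)
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def extinction_280(seq, cystines=False):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    cystines=True assumes every pair of Cys forms a disulfide bond.
    """
    seq = _clean(seq)
    eps = 5500 * seq.count("W") + 1490 * seq.count("Y")
    if cystines:
        eps += 125 * (seq.count("C") // 2)
    return eps


def abs_1mg_per_ml(seq, cystines=False):
    """Expected A280 of a 1 mg/mL solution in a 1 cm cuvette (epsilon / mass)."""
    return extinction_280(seq, cystines) / average_mass(seq)


construct = "MHHHHHHSSGENLYFQGWKYC"  # hypothetical tagged construct
print(round(average_mass(construct), 1), extinction_280(construct))
```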
The next step in the design of the expression construct is the choice of your expression vector. For commonly used host organisms, large collections of expression vectors are available commercially or in non-profit plasmid repositories such as Addgene and Gene Corner. We also have a collection of expression vectors that were created at EMBL, which we share freely with the academic community via a Material Transfer Agreement.
Another important decision is which protein tags to include in your construct. In many cases, an affinity tag is added to facilitate purification and/or detection. Small affinity tags (e.g. His6, StrepII, twinStrepII, Flag, Myc, HA, SPOT, …) are usually fused to the N- or C-terminus of the protein, and in rare cases inserted into internal loops. Solubility-enhancing tags (e.g. SUMO, Trx, NusA, DsbA, DsbC, …) are generally placed at the N-terminus. Some tags, such as GST and MBP, serve both purposes: they enhance solubility and can later be used as affinity tags during purification. Fluorescent tags (e.g. eGFP, mCherry, YFP, CFP, …) can be added to the N- or C-terminus for imaging or for studying interactions with fluorescence-based biophysical techniques. Modular tags such as HaloTag, SNAP-tag and CLIP-tag allow the attachment of different chemical functionalities and can be used to couple the protein covalently to a fluorescent dye, an affinity handle or a solid surface.
Affinity and solubility-enhancing tags can be removed during purification if a specific protease cleavage site (e.g. for TEV or HRV 3C protease) is included between the tag and the protein of interest.
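At the protein level, a tagged construct with a cleavage site can be modelled as simple string assembly, which makes it easy to check what remains on the protein after tag removal. The sketch below assumes a hypothetical N-terminal layout (Met, His6 tag, Gly-Ser linker, protease site, target) and uses the TEV (ENLYFQ↓G/S) and HRV 3C (LEVLFQ↓GP) recognition sequences. It cuts purely by sequence and says nothing about real cleavage efficiency.

```python
# Recognition motifs up to the scissile bond; both proteases cut after the Gln.
PROTEASE_SITES = {
    "TEV": "ENLYFQ",     # ENLYFQ|G (or |S)
    "HRV 3C": "LEVLFQ",  # LEVLFQ|GP
}


def build_fusion(target, tag="HHHHHH", protease="TEV", p1_prime="G", linker="GS"):
    """Hypothetical layout: Met-tag-linker-site-P1'-target (target given without its Met)."""
    return "M" + tag + linker + PROTEASE_SITES[protease] + p1_prime + target


def cleave(seq, protease="TEV"):
    """Split after the recognition motif; returns (tag fragment, released protein)."""
    site = PROTEASE_SITES[protease]
    hits = seq.count(site)
    if hits != 1:
        raise ValueError(f"expected exactly one {protease} site, found {hits}")
    cut = seq.index(site) + len(site)
    return seq[:cut], seq[cut:]


fusion = build_fusion("SKGEELFTG")  # hypothetical target sequence
tag_part, protein = cleave(fusion)
print(tag_part)  # MHHHHHHGSENLYFQ
print(protein)   # GSKGEELFTG
```

Because the P1' residue(s) stay on the released protein, the cleaved target carries an extra Gly (TEV) or Gly-Pro (HRV 3C) at its N-terminus, which is worth considering when designing the construct.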
Los G.V., Encell L.P., McDougall M.G., Hartzell D.D., Karassina N., Zimprich C., Wood M.G., Learish R., Ohana R.F., Urh M., Simpson D., Mendez J., Zimmerman K., Otto P., Vidugiris G., Zhu J., Darzins A., Klaubert D.H., Bulleit R.F. and Wood K.V. (2008) HaloTag: a novel protein labeling technology for cell imaging and protein analysis. ACS Chem Biol. 3(6):373–382
Gautier A., Juillerat A., Heinis C., Corrêa I.R. Jr, Kindermann M., Beaufils F. and Johnsson K. (2008) An engineered protein tag for multiprotein labeling in living cells. Chem Biol. 15(2):128–136
Saccardo P., Corchero J.L. and Ferrer-Miralles N. (2016) Tools to cope with difficult-to-express proteins. Appl Microbiol Biotechnol. 100:4347–4355
Bell M.R., Engleka M.J., Malik A. and Strickler J.E. (2013) To fuse or not to fuse: What is your purpose? Protein Sci. 22:1466–1477
Many cloning methods exist, each with its own advantages and disadvantages. Researchers usually select the method that best suits their cloning strategy and the available starting materials.
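Overlap-based methods such as Gibson assembly, SLIC and SLiCE join an insert to a linearized vector through short stretches of homologous sequence at their ends, which are usually added with PCR primers. The sketch below builds such primers: each carries a vector-homologous 5' extension followed by a gene-specific annealing region. The 20 bp overlap and annealing lengths are assumed defaults (suitable values depend on the method and kit), the function names are hypothetical, and the Wallace-rule Tm is only a rough guide; a nearest-neighbour Tm calculator is preferable for real designs.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Crude Tm estimate (Wallace rule): 2 °C per A/T plus 4 °C per G/C."""
    s = primer.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def overlap_primers(vector_left, vector_right, insert, overlap=20, anneal=20):
    """Primers that add vector-homologous ends to `insert`.

    vector_left / vector_right: top-strand vector sequence (5'->3') directly
    upstream / downstream of the insertion point in the linearized vector.
    """
    if len(vector_left) < overlap or len(vector_right) < overlap:
        raise ValueError("vector flanks are shorter than the requested overlap")
    if len(insert) < 2 * anneal:
        raise ValueError("insert is too short for the requested annealing length")
    forward = vector_left[-overlap:] + insert[:anneal]
    reverse = revcomp(insert[-anneal:] + vector_right[:overlap])
    return forward, reverse


# Hypothetical sequences for illustration only.
left = "GAGATATACATATGCATCACCATCACCATCAC"
right = "TAAGGATCCGAATTCGAGCTCCGTCGAC"
gene = "ATGAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAAT"
fwd, rev = overlap_primers(left, right, gene)
print(fwd, wallace_tm(gene[:20]))
print(rev, wallace_tm(gene[-20:]))
```

The PCR product then reads vector overlap, insert, vector overlap on the top strand, so its ends match the linearized vector and can anneal during assembly.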
Celie P.H.N., Parret A.H.A. and Perrakis A. (2016) Recombinant cloning strategies for protein expression. Curr Opin Struct Biol. 38:145–154
Lessard J. (2013) Molecular cloning. Methods Enzymol. 529:85–98
Zhang Y., Werling U. and Edelmann W. (2014) Seamless Ligation Cloning Extract (SLiCE) cloning method. Methods Mol Biol. 1116:235–244
Li M.Z. and Elledge S.J. (2012) SLIC: a method for sequence- and ligation-independent cloning. Methods Mol Biol. 852:51–58
Gibson D.G., Young L., Chuang R.-Y., Venter J.C., Hutchison C.A. III and Smith H.O. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods. 6:343–345
Parret A., Besir H. and Meijers R. (2016) Critical reflections on synthetic gene design for recombinant protein expression. Curr Opin Struct Biol. 38:155–162
We maintain a large database of expression vectors at the EMBL Protein Expression and Purification Core Facility. Commercially available vectors can only be shared with internal EMBL users, but vectors generated at EMBL are freely available to the entire academic research community via a Material Transfer Agreement (MTA).
If you’re interested in obtaining vectors created at EMBL via the MTA, please complete the form and send a signed and dated PDF file to Kim Remans. Because the contracting parties are the institutes (not the individual scientists), the agreement must be signed both by you and by an official representative of your institute who is authorized to sign binding agreements. Please note that we cannot negotiate changes to the wording of our MTA.