Strategy and construct design

The first step in any protein expression and purification experiment is strategy and construct design: choosing the host organism for protein expression and deciding on the optimal construct (plasmid backbone, protein tags, localization signals, etc.).

At EMBL PEPCF, we work with three host organisms: E. coli, insect cells and mammalian cells. We do not currently offer protein expression in yeast.

To design an optimal expression construct, it’s important to collect as much information about your protein as possible and to have a clear idea of the requirements of the downstream applications for which the purified protein will be used. If you have little experience with construct design, or would like advice on cloning and possible expression vectors, don’t hesitate to contact the PEPCF staff for help.

Choice of expression system

E. coli is often the first host organism of choice, as it’s easy to work with, grows very fast and its expression medium is relatively cheap. Many expression vectors and E. coli strains optimized for specific applications are also available. However, E. coli does not perform most eukaryotic post-translational modifications, and complex proteins cannot always be folded correctly.

The most frequently used method for protein production in insect cells is baculovirus-mediated expression in lepidopteran cell lines such as Sf9, Sf21, Hi5 and Tnao38 cells. Insect cells often offer better machinery for folding eukaryotic proteins and can perform some post-translational modifications. They are also able to secrete proteins to the extracellular medium. However, the timeline from construct to protein expression is longer and the costs associated with protein production in insect cells are higher than when working with E. coli.

Mammalian cells are capable of properly folding mammalian proteins, providing native post-translational modifications and secreting protein to the medium. For preparative scale production of recombinant proteins, HEK293 (human embryonic kidney) and CHO (Chinese hamster ovary) cells are currently the most commonly used cell lines. Both can be used for transient transfection or for the generation of stable cell lines, although the latter takes more time. The costs associated with culturing mammalian cells remain relatively high compared to other host organisms.

Characteristics	E. coli	Yeast	Insect cells	Mammalian cells
Growth speed	Fast (T_d ~ 20-30 min)	Fast (T_d ~ 90 min)	Slow (T_d ~ 24-30 h)	Slow (T_d ~ 24-30 h)
Growth medium	Simple, cheap	Simple, cheap	Complex, expensive	Complex, expensive
Expression level	High	Low – High	Low – High	Low – Moderate
Secretion	Periplasm	Medium	Medium	Medium
Post-translational modifications
N-linked glycosylation	no	yes (high mannose)	yes (simple, no sialic acid)	yes (complex)
O-linked glycosylation	no	yes	yes	yes
Phosphorylation	no	yes	yes	yes
Acylation	no	yes	yes	yes
Acetylation	no	yes	yes	yes
γ-carboxylation	no	no	no	yes

Characteristics of commonly used host organisms for protein production (T_d: doubling time).
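As a rough illustration of how the table can be used, the sketch below transcribes only its post-translational modification rows into Python and lists the hosts that support every modification you need. The names PTM_SUPPORT and hosts_supporting are hypothetical and chosen for this example. The sketch ignores differences in glycosylation type (high mannose, simple, complex) as well as cost, yield and timeline, so treat the result as a first filter rather than a recommendation.

```python
# PTM support per host, transcribed from the table above
# ("yes" -> listed, "no" -> absent). Glycan type is deliberately ignored.
COMMON_PTMS = {
    "N-linked glycosylation",
    "O-linked glycosylation",
    "phosphorylation",
    "acylation",
    "acetylation",
}
KNOWN_PTMS = COMMON_PTMS | {"gamma-carboxylation"}

PTM_SUPPORT = {
    "E. coli": set(),
    "Yeast": set(COMMON_PTMS),
    "Insect cells": set(COMMON_PTMS),
    "Mammalian cells": set(KNOWN_PTMS),
}


def hosts_supporting(required_ptms):
    """Return the hosts whose table entry lists every required PTM."""
    required = set(required_ptms)
    unknown = required - KNOWN_PTMS
    if unknown:
        raise ValueError(f"PTMs not covered by the table: {sorted(unknown)}")
    return [host for host, ptms in PTM_SUPPORT.items() if required <= ptms]


if __name__ == "__main__":
    print(hosts_supporting(["phosphorylation"]))
    print(hosts_supporting(["gamma-carboxylation"]))
```

Note that yeast appears in the output because it is in the table, even though yeast expression is not offered at PEPCF.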

References

Gene Expression Systems. Using nature for the art of expression (Fernandez, J.M. & Hoeffler, J.P., eds), Academic Press, San Diego, 1999

Design of expression construct

To design a successful protein expression construct, it’s imperative to collect as much information as possible about your protein of interest. Some key points you need to consider are the following:

Is it a single- or multi-domain protein? If it’s a multi-domain protein, do you want to express the full-length protein or would a specific domain be sufficient for your planned downstream applications?
What is the native localization of the protein in the cell? Is it a soluble protein or a membrane protein? Does the native sequence contain a signal sequence?
Does the protein require certain co-factors for stability and/or functionality? If yes, it might be necessary to add these to the culturing medium and/or the purification buffers.
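Does the protein contain a large number of disulfide bonds? If yes, expression in the reducing cytoplasm of E. coli may hamper correct folding, and secretion to the periplasm or a eukaryotic host may be a better choice.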
Does the protein need interaction partners for stability? If the protein isn’t stable on its own, it might be necessary to co-express it with interaction partners.
Does the protein contain post-translational modifications? If yes, are these necessary for the folding and/or the functionality of the protein?
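
Some of these questions can be given a first, very rough answer from the amino acid sequence alone. The sketch below, which is an illustration and not a PEPCF tool, counts cysteines (an upper bound on the number of disulfide bonds, not a prediction) and lists the positions of the N-X-S/T motif (X is any residue except proline), the consensus sequon for N-linked glycosylation. The function name quick_sequence_checks and the example sequence are made up for this example; a sequon only marks a potential site, and dedicated prediction tools remain the better choice.

```python
import re


def quick_sequence_checks(sequence):
    """Crude, sequence-only checks for a one-letter amino acid sequence."""
    # Remove whitespace and line breaks, normalise to upper case.
    seq = "".join(sequence.split()).upper()
    cysteines = seq.count("C")
    # A lookahead reports overlapping motifs as well; positions are 1-based.
    sequons = [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]
    return {
        "length": len(seq),
        "cysteines": cysteines,
        # Each disulfide bond needs two cysteines, so this is only an upper bound.
        "max_disulfides": cysteines // 2,
        "n_glyc_sequons": sequons,
    }


if __name__ == "__main__":
    # Hypothetical sequence, for illustration only.
    print(quick_sequence_checks("MNGSCCNPTC"))
```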

If there is not much information about your protein available in the literature or in UniProt, there are many bio-informatics tools that can help you predict several protein characteristics and thereby guide the construct design. Multiple sequence alignments are very helpful for identifying conserved regions and, together with domain and structural predictions, can help define domain boundaries.
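
To make the idea of conserved parts concrete, here is a minimal sketch that scores each column of an existing alignment by the fraction of sequences sharing the most common residue. It assumes the sequences have already been aligned (for example with an alignment program) and are given as equal-length strings with "-" for gaps; the function name column_conservation and the toy alignment are hypothetical. Real conservation scores usually also account for amino acid similarity, which this sketch does not.

```python
from collections import Counter


def column_conservation(aligned):
    """Per-column fraction of sequences that carry the most common residue.

    `aligned` is a list of equal-length, pre-aligned sequences; "-" is a gap.
    Gaps count against conservation because the divisor is the number of sequences.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*aligned):
        residues = [res for res in column if res != "-"]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    for position, score in enumerate(column_conservation(["MKT-A", "MKSLA", "MRTLA"]), start=1):
        print(position, round(score, 2))
```

Stretches of consistently high scores flanked by poorly conserved or gap-rich columns are one hint, among others, for where a domain might begin or end.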

Useful bio-informatics tools

Expasy: Swiss Bioinformatics Resource Portal. Here you can find a large variety of tools and databases that are very helpful when working with proteins (for example sequence analysis tools, topology predictions, post-translational modification predictions, 2D and 3D structural predictions and domain predictions).

Clustal Omega: Multiple Sequence Alignment program

SignalP: predicts the presence of signal sequences and their cleavage sites

The next step in the design of the expression construct is the choice of your expression vector. For commonly used host organisms, large collections of expression vectors are available commercially or in non-profit plasmid repositories such as Addgene and Gene Corner. We also have a collection of expression vectors that were created at EMBL, which we share freely with the academic community via a Material Transfer Agreement.

Another important point in the construct design is the decision of which protein tags to include in your construct. In many cases, an affinity tag to facilitate the purification (and/or detection) will be added. Small affinity tags (e.g. His₆, StrepII, twinStrepII, Flag, Myc, HA, SPOT, …) are usually added to the N- or C-terminus of the protein, although in rare cases they can also be placed inside internal loops. Solubility-enhancing tags (e.g. SUMO, Trx, NusA, DsbA, DsbC, …) are generally placed at the N-terminus of a protein. Some protein tags such as GST and MBP can fulfil both functions at once and act both as a solubility-enhancing tag and affinity tag that can be used later on during the protein purification. Fluorescent tags (e.g. eGFP, mCherry, YFP, CFP, …) can be added to the N- or C-terminus of the protein and used for imaging purposes or studying interactions via biophysical techniques based on fluorescence. Modular tags such as HALO, SNAP and CLIP allow the attachment of different chemical functionalities and can be used to couple the protein covalently to a fluorescent dye, an affinity handle or a solid surface.

Figure: Removal of protein tags using specific protease cleavage sites.

Affinity tags or solubility-enhancing tags can be removed during the protein purification when a specific protease cleavage site is included between the tag and the protein of interest.
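
The sketch below illustrates this layout at the sequence level. As an assumption for the example, it uses an N-terminal His₆ tag followed by a TEV protease site (ENLYFQG, cleaved between Q and G); the text above does not prescribe a particular protease, and the function names and the short protein sequence are hypothetical.

```python
HIS6 = "HHHHHH"
# TEV protease recognition site; cleavage occurs between Q and G.
TEV_SITE = "ENLYFQG"


def build_fusion(protein, tag=HIS6, site=TEV_SITE, start="M"):
    """Assemble start codon residue + N-terminal tag + cleavage site + protein."""
    return start + tag + site + protein


def cleave_tev(fusion, site=TEV_SITE):
    """Split a fusion at the first TEV site; returns (tag fragment, released protein)."""
    index = fusion.find(site)
    if index == -1:
        raise ValueError("no TEV site found in the fusion sequence")
    cut = index + len(site) - 1  # position directly after the Q
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    # Hypothetical protein sequence, for illustration only.
    fusion = build_fusion("SKGEELFT")
    tag_fragment, released = cleave_tev(fusion)
    print(fusion)
    print(tag_fragment)
    print(released)
```

In this layout the released protein keeps the glycine from the cleavage site at its N-terminus, which is worth considering when the exact N-terminus matters for your downstream application.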

References

Los G.V., Encell L.P., McDougall M.G., Hartzell D.D., Karassina N., Zimprich C., Wood M.G., Learish R., Ohana R.F., Urh M., Simpson D., Mendez J., Zimmerman K., Otto P., Vidugiris G., Zhu J., Darzins A., Klaubert D.H., Bulleit R.F. and Wood K.V. (2008) HaloTag: a novel protein labeling technology for cell imaging and protein analysis. ACS Chem Biol. 3(6):373-82

Gautier A., Juillerat A., Heinis C., Corrêa I.R. Jr, Kindermann M., Beaufils F. and Johnsson K. (2008) An engineered protein tag for multiprotein labeling in living cells. Chem Biol. 15(2):128-36

Saccardo P., Corchero J.L. and Ferrer-Miralles N. (2016) Tools to cope with difficult-to-express proteins. Appl Microbiol Biotechnol. 100:4347–4355

Bell M.R., Engleka M.J., Malik A. and Strickler J.E. (2013) To fuse or not to fuse: What is your purpose? Protein Science 22:1466-1477

Cloning

Many different cloning methods exist, each with its own advantages and disadvantages. Researchers usually select the most suitable method based on the cloning strategy they have designed and the available starting materials.

References

Celie P.H.N., Parret A.H.A. and Perrakis A. (2016) Recombinant cloning strategies for protein expression. Current Opinion in Structural Biology. 38:145–154

Lessard J. (2013) Molecular Cloning. Methods in Enzymology. 529:85-98

Zhang Y., Werling U. and Edelmann W. (2014) Seamless Ligation Cloning Extract (SLiCE) Cloning method. Methods Mol Biol. 1116:235-244

Li M.Z. and Elledge S.J. (2012) SLIC: A Method for Sequence- and Ligation-Independent Cloning. Methods Mol Biol. 852:51-58

Gibson D.G., Young L., Chuang R.-Y., Venter J.C., Hutchinson III C.A. and Smith H.O. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature Methods. 6:343-345

Parret A., Besir H. and Meijers R. (2016) Critical reflections on synthetic gene design for recombinant protein expression. Current Opinion in Structural Biology. 38:155–162

Collections of expression vectors

We maintain a large database of expression vectors at the EMBL Protein Expression and Purification Core Facility. Vectors that are commercially available can only be shared with internal EMBL users, but vectors generated at EMBL are freely available to the entire academic research community via a Material Transfer Agreement.

If you’re interested in obtaining vectors created at EMBL via the MTA, please complete the form and send a signed and dated PDF file to Kim Remans. Since the contracting partners are the institutes (not the scientists), the contract has to be signed by you and by an official representative of your institute who is authorized to sign binding agreements. Please understand that we are not in a position to negotiate changes to the wording of our MTA.
