Peptide#

class peptides.Peptide(object)#

A sequence of amino acids.

sequence#

The peptide primary sequence, encoded in the IUPAC one-letter code.

Type:: str

classmethod sample(length: int, frequencies: str = 'SwissProt2021') → Peptide#

Generate a peptide with the given amino-acid frequencies.

This method is useful for testing, but using amino-acid frequencies to generate a peptide is not a biologically accurate method, instead consider sampling based on dipeptide frequencies in a particular organism, or using k-mer shuffling.

Parameters:

length (int) – The desired length for the generated peptide.
frequencies (str) – The name of the amino-acid frequency table to use: either KingJukes to use the amino-acid frequencies for vertebrate organisms reported in King & Jukes (1969), or SwissProt2021 to use the amino-acid frequencies in all the proteins from the January 2021 release of SwissProt.

Returns:

Peptide – A new peptide. The first amino-acid will always be a Methionine for biological accuracy.

References

King, J. L., and T. H. Jukes. Non-Darwinian Evolution. Science. May 1969;164(3881):788–98. doi:10.1126/science.164.3881.788. PMID:5767777.
The UniProt Consortium. UniProt: The Universal Protein Knowledgebase in 2021. Nucleic Acids Research. Jan 2021;49(D1):D480–89. doi:10.1093/nar/gkaa1100. PMID:33237286.

__init__(sequence: str) → None#

Create a new peptide object with the given sequence.

Parameters:: sequence (str) – A sequence of amino acids encoded with the IUPAC one-letter code. Non-standard (O, U), ambiguous (B, Z, J) and unknown (X) residues are supported in some methods, but not all of them.

aliphatic_index() → float#

Compute the aliphatic index of the peptide.

The aliphatic index of a protein was proposed in Ikai (1980). It is defined as the relative volume occupied by aliphatic side chains (Alanine, Valine, Isoleucine, and Leucine):

\[\text{aliphatic index} = A + 2.9 V + 3.9 (I + L + J)\]

It may be regarded as a positive factor for the increase of thermostability of globular proteins.

Returns:: float – The computed aliphatic index for the peptide sequence, between 0.0 and 390.0.

Example

>>> peptide = Peptide("SDKEVDEVDAALSDLEITLE")
>>> peptide.aliphatic_index()
117.0

References

Ikai, A. Thermostability and Aliphatic Index of Globular Proteins. Journal of Biochemistry. Dec 1980;88(6):1895–98. PMID:7462208.

atchley_factors() → AtchleyFactors#

Compute the Atchley factors of the peptide.

See AtchleyFactors for more information.

Returns:: peptides.AtchleyFactors – The computed average Atchley factors for all the amino acids in the peptide.

Example

>>> peptide = Peptide("KLKLLLLLKLK")
>>> for i, kf in enumerate(peptide.atchley_factors()):
...     print(f"AF{i+1:<3} {kf: .4f}")
AF1    0.0176
AF2   -0.8321
AF3   -0.7636
AF4    0.7048
AF5    0.0189

auto_correlation(table: Dict[str, float], lag: int = 1, center: bool = True) → float#

Compute the auto-correlation index of a peptide sequence.

Example

>>> peptide = Peptide("SDKEVDEVDAALSDLEITLE")
>>> table = peptides.tables.HYDROPHOBICITY["KyteDoolittle"]
>>> peptide.auto_correlation(table=table)
-0.3519908...
>>> peptide.auto_correlation(table=table, lag=5)
0.00113355...

auto_covariance(table: Dict[str, float], lag: int = 1, center: bool = True) → float#

Compute the auto-covariance index of a peptide sequence.

Example

>>> peptide = Peptide("SDKEVDEVDAALSDLEITLE")
>>> table = peptides.tables.HYDROPHOBICITY["KyteDoolittle"]
>>> peptide.auto_covariance(table)
-0.414005...
>>> peptide.auto_covariance(table, lag=5)
0.0010003...

blosum_indices() → BLOSUMIndices#

Compute the BLOSUM62-derived indices of the peptide.

See BLOSUMIndices for more information.

Returns:: peptides.BLOSUMIndices – The computed average BLOSUM indices for all the amino acids in the peptide.

Example

>>> peptide = Peptide("KLKLLLLLKLK")
>>> for i, b in enumerate(peptide.blosum_indices()):
...     print(f"BLOSUM{i+1:<3} {b: .4f}")
BLOSUM1   -0.4827
BLOSUM2   -0.5618
BLOSUM3   -0.8509
BLOSUM4   -0.4173
BLOSUM5    0.3173
BLOSUM6    0.2527
BLOSUM7    0.1464
BLOSUM8    0.1427
BLOSUM9   -0.2145
BLOSUM10  -0.3218

boman() → float#

Compute the Boman (potential peptide interaction) index.

The potential interaction index proposed by Boman (2003) is an index computed by averaging the solubility values for all residues in a sequence. It can be used to give an overall estimate of the potential of a peptide to bind to membranes or other proteins.

Returns:: float – The Boman index for the peptide. A value greater than 2.48 indicates that a protein has high binding potential.

Example

>>> peptide = Peptide("FLPVLAGLTPSIVPKLVCLLTKKC")
>>> peptide.boman()
-1.2358...

Note

The potential protein interaction index was originally proposed as an easy way to differentiate between the action mechanism of hormones (protein/protein) and antimicrobial peptides (protein/membrane).

References

Boman, H. G. Antibacterial Peptides: Basic Facts and Emerging Concepts. Journal of Internal Medicine. 2003 Sep;254(3):197–215. doi:10.1046/j.1365-2796.2003.01228.x. PMID:12930229.

charge(pH: float = 7, pKscale: str = 'Lehninger') → float#

Compute the theoretical net charge of a peptide sequence.

This function computes the theoretical net charge of a peptide sequence, based on the Henderson-Hasselbach equation described by Dexter S. Moore (1985). The net charge can be computed at a given pH using one of the 9 pKa scales available.

Parameters:

pH (float) – The pH value for which to compute the charge.
pKscale (str) – The name of the pKa scale to be used. A list of all the allowed values can be retrieved from the keys of the peptides.tables.PK dictionary.

Returns:

float – The net charge of the peptide.

Example

>>> peptide = Peptide("FLPVLAGLTPSIVPKLVCLLTKKC")
>>> peptide.charge(pKscale="Bjellqvist")
2.7373...
>>> peptide.charge(pKscale="EMBOSS")
2.9141...
>>> peptide.charge(pKscale="Murray")
2.9075...
>>> peptide.charge(pKscale="Sillero")
2.9198...
>>> peptide.charge(pKscale="Solomon")
2.8444...
>>> peptide.charge(pKscale="Stryer")
2.8765...
>>> peptide.charge(pKscale="Lehninger")
2.8731...
>>> peptide.charge(pKscale="Dawson")
2.8444...
>>> peptide.charge(pKscale="Rodwell")
2.8197...

References

Bjellqvist, B., G. J. Hughes, C. Pasquali, N. Paquet, F. Ravier, J. C. Sanchez, S. Frutiger, and D. Hochstrasser. The Focusing Positions of Polypeptides in Immobilized pH Gradients Can Be Predicted from Their Amino Acid Sequences. Electrophoresis. 1993 Oct;14(10):1023–31. doi:10.1002/elps.11501401163. PMID:8125050.
Dawson, R. M. C. and D. C. Elliott. Data for Biochemical Research. Oxford: Clarendon Press. 2002;3:592. ISBN:978-0-19-855299-4.
Kiraga, J. Analysis and computer simulations of variability of isoelectric point of proteins in the proteomes. PhD thesis, University of Wroclaw, Poland. 2008.
Lehninger, A. L., D. L. Nelson, and M. M. Cox. Lehninger Principles of Biochemistry. 4th ed. New York: W.H. Freeman. 2005;4:1100. ISBN:978-0-7167-4339-2.
Murray, R. K. Harper’s Illustrated Biochemistry. New York: Lange Medical Books/McGraw-Hill. 2006;27. ISBN:978-0-07-146197-9.
Rodwell, J.D. Heterogeneity of Component Bands in Isoelectric Focusing Patterns. Analytical Biochemistry. 1982 Jan;119(2):440-49. doi:10.1016/0003-2697(82)90611-x. PMID:7072964.
Sillero, A., and A. Maldonado. Isoelectric Point Determination of Proteins and Other Macromolecules: Oscillating Method. Computers in Biology and Medicine. 2006 Feb;36(2): 157–66. doi:10.1016/j.compbiomed.2004.09.006. PMID:16389075.
Solomons, T. W. G. Fundamentals of Organic Chemistry. New York: Wiley. 1997;5. ISBN:978-0-471-28298-3.
Stryer, L., J. Augustyniak, and J. Michejda. Biochemia. Warszawa: Wydawnictwo Naukowe PWN. 2000. ISBN:978-83-01-12044-3.

counts() → Dict[str, int]#

Return a table of amino-acid counts in the peptide.

Returns:: dict – A dictionary mapping each amino-acid code to the number of times it occurs in the peptide sequence.

Example

>>> p = Peptide("SDKEVDEVDAALS")
>>> {k:v for k,v in p.counts().items() if v != 0}
{'A': 2, 'D': 3, 'E': 2, 'L': 1, 'K': 1, 'S': 2, 'V': 2}

cross_covariance(table1: Dict[str, float], table2: Dict[str, float], lag: int = 1, center: bool = True) → float#

Compute the cross-covariance index of a peptide sequence.

Example

>>> peptide = Peptide("SDKEVDEVDAALSDLEITLE")
>>> table1 = peptides.tables.HYDROPHOBICITY["KyteDoolittle"]
>>> table2 = peptides.tables.HYDROPHOBICITY["Eisenberg"]
>>> peptide.cross_covariance(table1, table2)
-0.3026609...
>>> peptide.cross_covariance(table1, table2, lag=5)
0.0259803...

cruciani_properties() → CrucianiProperties#

Compute the Cruciani properties of the peptide.

See CrucianiProperties for more information.

Returns:: peptides.CrucianiProperties – The computed average Cruciani properties of all the amino acids in the corresponding peptide sequence.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> for i, b in enumerate(peptide.cruciani_properties()):
...     print(f"PP{i+1:<3} {b: .4f}")
PP1   -0.1130
PP2   -0.0220
PP3    0.2735

descriptors() → Dict[str, float]#

Create a dictionary containing every protein descriptor available.

Example

>>> peptide = Peptide("SDKEVDEVDAALSDLEITLE")
>>> sorted(peptide.descriptors().keys())
['AF1', ..., 'F1', ..., 'KF1', ..., 'MSWHIM1', ..., 'PP1', ...]

Hint

Use this method to create a DataFrame containing the descriptors for several sequences.

detect_outlier() → OutlierResult#

Detect if this sequence is an outlier based on SwissProt distributions.

This method analyzes the sequence using the vetting metrics (entropy, max_frequency, longest_run) and compares them against established distributions from SwissProt proteins to identify potential outliers, artifacts, or unusual sequences. It provides an automated way to flag sequences that may require further investigation.

See OutlierResult for more information.

Returns:: OutlierResult – The outlier detection results, as a named tuple.

Example

For a real peptide, the large ribosomal subunit protein bL32 of Escherichia coli (UniProt:P0A7N4):

>>> peptide = Peptide(
...     "MAVQQNKPTRSKRGMRRSHDALTAVTSLSVDKT"
...     "SGEKHLRHHITADGYYRGRKVIAK"
... )
>>> result = peptide.detect_outlier()
>>> result.is_outlier
False

For a problematic sequence:

>>> peptide = Peptide("AAAA")
>>> result = peptide.detect_outlier()
>>> result.is_outlier
True
>>> result.issues[0]
'Entropy (0.000) below 5th percentile (3.714)'

Note

Thresholds are based on SwissProt analysis and are hardcoded for efficiency. No external files are required.

Added in version 0.5.0.

energy_cost(scale: str = 'Akashi', *, mode: str | None = None)#

Estimate the energy cost required to biosynthesize a peptide.

Parameters:

scale (str) –

The name of the energy estimation scale to use. Supports the following values:

Akashi: The energetic cost computed by Akashi & Gojobori (2002) based on major codon usage values in Escherichia coli and Bacillus subtilis.
Craig: The energetic cost computed by Craig & Weber () from amino-acid substitution probabilities in Escherichia coli.
Heizer: The energetic cost computed by Heizer et al. (2006), derived from Akashi & Gojobori (2002) for photoautotrophs (capable of the Calvin cycle reactions).
Wagner: The energetic cost computed by Wagner (2005) based on expression data in Saccharomyces cerevisiae.

Keyword Arguments:

mode (str) – For the Wagner scale, the mode of growth of the source organism, either respiration (the default) or fermentation.

Example

>>> peptide = Peptide("SDKEVDEVDAALSDLEITLEYLKW")
>>> peptide.energy_cost()
550.5...
>>> peptide.energy_cost("Heizer")
554.5...

References

Akashi, H., & Gojobori, T. (2002). Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proceedings of the National Academy of Sciences of the United States of America, 99(6), 3695–3700. PMID:11904428. doi:10.1073/pnas.062526999.
Heizer, E. M., Jr, Raiford, D. W., Raymer, M. L., Doom, T. E., Miller, R. V., & Krane, D. E. (2006). Amino acid cost and codon-usage biases in 6 prokaryotic genomes: a whole-genome analysis. Molecular biology and evolution, 23(9), 1670–1680. PMID:16754641. doi:10.1093/molbev/msl029.
Wagner A. (2005). Energy constraints on the evolution of gene expression. Molecular biology and evolution, 22(6), 1365–1374. PMID:15758206. doi:10.1093/molbev/msi126.

entropy() → float#

Compute the Shannon entropy of the amino acid sequence.

The Shannon entropy measures the diversity of amino acids in the peptide sequence. It is calculated using the formula:

\[H = -\sum_{i=1}^{n} p_i \log_2(p_i)\]

where \(p_i\) is the frequency of amino acid \(i\) and \(n\) is the number of possible amino acids (26, including ambiguous codes).

The entropy is maximized when all amino acids are equally frequent, and minimized when the sequence consists of only one amino acid type.

Returns:: float – The Shannon entropy of the peptide sequence in bits. The maximum possible value is log₂(26) ≈ 4.70 bits.

Example

>>> peptide = Peptide("AALS")
>>> peptide.entropy()
1.5
>>> peptide = Peptide("AAAA")
>>> peptide.entropy()
0.0
>>> peptide = Peptide("ACDEFGHIKLMNPQRSTVWY")
>>> peptide.entropy()
4.3219...

References

Shannon, C. E. A Mathematical Theory of Communication. Bell System Technical Journal. 1948;27(3):379-423. doi:10.1002/j.1538-7305.1948.tb01338.x.

Added in version 0.5.0.

fasgai_vectors() → FasgaiVectors#

Compute the FASGAI vectors of the peptide.

See FasgaiVectors for more information.

Returns:: peptides.FasgaiVectors – The computed average FASGAI vectors for all the amino acids in the peptide.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> for i, b in enumerate(peptide.fasgai_vectors()):
...     print(f"F{i+1:<3} {b: .5f}")
F1   -0.13675
F2   -0.45485
F3   -0.11695
F4   -0.45800
F5   -0.38015
F6    0.52740

frequencies() → Dict[str, float]#

Return a table of amino-acid frequencies in the peptide.

Returns:: dict – A dictionary mapping each amino-acid code to its frequency in the peptide sequence.

Example

>>> p = Peptide("AALS")
>>> {k:v for k,v in p.frequencies().items() if v != 0}
{'A': 0.5, 'L': 0.25, 'S': 0.25}

hydrophobic_moment(window: int = 11, angle: int = 100) → float#

Compute the maximal hydrophobic moment of a protein sequence.

This function computes the hydrophobic moment based on Eisenberg et al (1984). Hydrophobic moment is a quantitative measure of the amphiphilicity perpendicular to the axis of any periodic peptide structure, such as the α-helix or β-sheet.

Parameters:

angle (int) – A protein rotational angle, in degrees. Usual values are 100 for α-helix, and 160 for β-sheet.
window (int) – The size of the sliding window for which to compute the local hydrophobic moment.

Returns:

float – The maximal hydrophobic moment of the peptide.

Example

>>> peptide = Peptide("FLPVLAGLTPSIVPKLVCLLTKKC")
>>> peptide.hydrophobic_moment(angle=100)
0.519922...
>>> peptide.hydrophobic_moment(angle=160)
0.270590...

See also

The hydrophobic_moment method, which computes the maximal hydrophobic moment instead of building a profile.

hydrophobicity(scale: str = 'KyteDoolittle') → float#

Compute the hydrophobicity index of a protein sequence.

This function calculates the hydrophobicity index of an amino acid sequence by averaging the hydrophobicity values of each residue using one of the 39 scales from different sources.

Parameters:: scale (str) – The name of the hydrophobicity scale to be used. A list of all the allowed values can be retrieved from the keys of the peptides.tables.HYDROPHOBICITY dictionary.
Returns:: float – The hydrophobicity index of the peptide.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> peptide.hydrophobicity(scale="Aboderin")
3.84...
>>> peptide.hydrophobicity(scale="AbrahamLeo")
0.092...

Note

The hydrophobicity is an important stabilization force in protein folding; this force changes depending on the solvent in which the protein is found.

References

Aboderin, A. A. An Empirical Hydrophobicity Scale for α-Amino-Acids and Some of Its Applications. International Journal of Biochemistry. 1971 Oct;2(11):537–44. doi:10.1016/0020-711X(71)90023-1.
Abraham, D.J., and A. J. Leo. Extension of the Fragment Method to Calculate Amino Acid Zwitterion and Side Chain Partition Coefficients. Proteins: Structure, Function, and Genetics. 1987;2(2):130–52. doi:10.1002/prot.340020207.
Argos, P., J. K. Rao, and P. A. Hargrave. Structural Prediction of Membrane-Bound Proteins. European Journal of Biochemistry. Nov 1982;128(2–3):565–75. doi:10.1111/j.1432-1033.1982.tb07002.x. PMID:7151796.
Barley, M. H., N. J. Turner, and R. Goodacre. Improved Descriptors for the Quantitative Structure–Activity Relationship Modeling of Peptides and Proteins. Journal of Chemical Information and Modeling. Feb 2018;58(2):234–43. doi:10.1021/acs.jcim.7b00488. PMID:29338232.
Black, S. D., and D. R. Mould. Development of Hydrophobicity Parameters to Analyze Proteins Which Bear Post- or Cotranslational Modifications. Analytical Biochemistry. Feb 1991;193(1):72–82. doi:10.1016/0003-2697(91)90045-u. PMID:2042744.
Bull, H. B., and K. Breese. Surface Tension of Amino Acid Solutions: A Hydrophobicity Scale of the Amino Acid Residues. Archives of Biochemistry and Biophysics. Apr 1974;161(2):665–70. doi:10.1016/0003-9861(74)90352-x. PMID:4839053.
Casari, G., and M. J. Sippl. Structure-Derived Hydrophobic Potential. Hydrophobic Potential Derived from X-Ray Structures of Globular Proteins Is Able to Identify Native Folds. Journal of Molecular Biology. Apr 1992;224(3):725–32. doi:10.1016/0022-2836(92)90556-y. PMID:1569551.
Chothia, C. The Nature of the Accessible and Buried Surfaces in Proteins. Journal of Molecular Biology. Jul 2917;105(1):1–12. doi:10.1016/0022-2836(76)90191-1. PMID:994183.
Cid, H., M. Bunster, M. Canales, and F. Gazitúa. Hydrophobicity and Structural Classes in Proteins. Protein Engineering. Jul 1992;5(5):373–75. doi:10.1093/protein/5.5.373. PMID:1518784.
Cowan, R., and R. G. Whittaker. Hydrophobicity Indices for Amino Acid Residues as Determined by High-Performance Liquid Chromatography. Peptide Research. Apr 1990;3(2):75–80. PMID:2134053.
Eisenberg, D., E. Schwarz, M. Komaromy, and R. Wall. Analysis of Membrane and Surface Protein Sequences with the Hydrophobic Moment Plot. Journal of Molecular Biology. Oct 1984;179(1):125–42. doi:10.1016/0022-2836(84)90309-7. PMID:6502707.
Engelman, D. M., T. A. Steitz, and A. Goldman. Identifying Nonpolar Transbilayer Helices in Amino Acid Sequences of Membrane Proteins. Annual Review of Biophysics and Biophysical Chemistry. 1986;15:321–53. doi:10.1146/annurev.bb.15.060186.001541. PMID:3521657.
Fasman, G. D. Prediction of Protein Structure and the Principles of Protein Conformation. Springer US. 1989. doi:10.1007/978-1-4613-1571-1. ISBN:978-0-306-43131-9.
Fauchère, J-L., and Pliska V. Hydrophobic Parameters π of Amino-Acid Side Chains from the Partitioning of N-Acetyl-Amino-Acid Amides. European Journal of Medicinal Chemistry. 1983;18(4):369–75.
Goldsack, D. E., and R. C. Chalifoux. Contribution of the Free Energy of Mixing of Hydrophobic Side Chains to the Stability of the Tertiary Structure of Proteins. Journal of Theoretical Biology. Jun 1973;39(3):645–51. doi:10.1016/0022-5193(73)90075-1. PMID:4354159.
Guy, H. R. Amino Acid Side-Chain Partition Energies and Distribution of Residues in Soluble Proteins. Biophysical Journal. Jan 1985;47(1):61–70. doi:10.1016/S0006-3495(85)83877-7. PMID:3978191.
Hopp, T. P., and K. R. Woods. Prediction of Protein Antigenic Determinants from Amino Acid Sequences. Proceedings of the National Academy of Sciences of the United States of America. Jun 1981;78(6):3824–28. doi:10.1073/pnas.78.6.3824. PMID:6167991.
Janin, J. Surface and inside Volumes in Globular Proteins. Nature. Feb 1979;277(5696):491–92. doi:10.1038/277491a0. PMID:763335.
Jones, D. D. Amino Acid Properties and Side-Chain Orientation in Proteins: A Cross Correlation Approach. Journal of Theoretical Biology. Mar 1975;50(1):167–83. doi:10.1016/0022-5193(75)90031-4. PMID:1127956.
Juretić, D., D. Zucić, B. Lucić, and N. Trinajstić. Preference Functions for Prediction of Membrane-Buried Helices in Integral Membrane Proteins. Computers & Chemistry. Jun 1998;22(4):279–94. doi:10.1016/s0097-8485(97)00070-3. PMID:9680689.
Kawashima, S., H. Ogata, and M. Kanehisa. AAindex: Amino Acid Index Database. Nucleic Acids Research. Jan 1999;27(1):368–69. doi:10.1093/nar/27.1.368. PMID:9847231.
Kawashima, S., and M. Kanehisa. AAindex: Amino Acid Index Database. Nucleic Acids Research. Jan 2000;28(1):374. doi:10.1093/nar/28.1.374. PMID:10592278.
Kawashima, S., P. Pokarowski, M. Pokarowska, A. Kolinski, T. Katayama, and M. Kanehisa. AAindex: Amino Acid Index Database, Progress Report 2008. Nucleic Acids Research. Jan 2008;36:D202-205. doi:10.1093/nar/gkm998. PMID:17998252.
Kidera, A., Y. Konishi, M. Oka, T. Ooi, and H. A. Scheraga. Statistical Analysis of the Physical Properties of the 20 Naturally Occurring Amino Acids. Journal of Protein Chemistry. Feb 1985;4(1):23-55. doi:10.1007/BF01025492.
Kuhn, L. A., C. A. Swanson, M. E. Pique, J. A. Tainer, and E. D. Getzoff. Atomic and Residue Hydrophilicity in the Context of Folded Protein Structures. Proteins. Dec 1995;23(4):536–47. doi:10.1002/prot.340230408. PMID:8749849.
Kyte, J., and R. F. Doolittle. A Simple Method for Displaying the Hydropathic Character of a Protein. Journal of Molecular Biology. May 1982;157(1):105–32. doi:10.1016/0022-2836(82)90515-0. PMID:7108955.
Levitt, M. A Simplified Representation of Protein Conformations for Rapid Simulation of Protein Folding. Journal of Molecular Biology. Jun 1976;104(1):59–107. doi:10.1016/0022-2836(76)90004-8. PMID:957439.
Manavalan, P., and P. K. Ponnuswamy. Hydrophobic Character of Amino Acid Residues in Globular Proteins. Nature. Oct 1978;275(5681):673–74. doi:10.1038/275673a0. PMID:703834.
Miyazawa, S., and R. L. Jernigan. Estimation of Effective Interresidue Contact Energies from Protein Crystal Structures: Quasi-Chemical Approximation. Macromolecules. Mar 1985;18(3):534–52. doi:10.1021/ma00145a039.
Nakai, K., A. Kidera, and M. Kanehisa. Cluster Analysis of Amino Acid Indices for Prediction of Protein Structure and Function. Protein Engineering. Jul 1988;2(2):93–100. doi:10.1093/protein/2.2.93. PMID:3244698.
Nozaki, Y., and C. Tanford. The Solubility of Amino Acids and Two Glycine Peptides in Aqueous Ethanol and Dioxane Solutions. Establishment of a Hydrophobicity Scale. The Journal of Biological Chemistry. Apr 1971;246(7):2211–17. PMID:5555568.
Parker, J. M., D. Guo, and R. S. Hodges. New Hydrophilicity Scale Derived from High-Performance Liquid Chromatography Peptide Retention Data: Correlation of Predicted Surface Residues with Antigenicity and X-Ray-Derived Accessible Sites. Biochemistry. 1986;25(19):5425–32. doi:10.1021/bi00367a013. PMID:2430611.
Ponnuswamy, P. K. Hydrophobic Characteristics of Folded Proteins. Progress in Biophysics and Molecular Biology. 1993;59(1):57–103. doi:10.1016/0079-6107(93)90007-7. PMID:8419986.
Prabhakaran, M. The Distribution of Physical, Chemical and Conformational Properties in Signal and Nascent Peptides. The Biochemical Journal. Aug 1990;269(3):691–96. doi:10.1042/bj2690691. PMID:2390062.
Rao, J. K. M., and P. Argos. A Conformational Preference Parameter to Predict Helices in Integral Membrane Proteins. Biochimica Et Biophysica Acta. Jan 1986;869(2):197–214. doi:10.1016/0167-4838(86)90295-5. PMID:2935194.
Rose, G. D., A. R. Geselowitz, G. J. Lesser, R. H. Lee, and M. H. Zehfus. Hydrophobicity of Amino Acid Residues in Globular Proteins. Science (New York, N.Y.). Aug 1985;229(4716):834–38. doi:10.1126/science.4023714. PMID:4023714.
Roseman, M. A. Hydrophilicity of Polar Amino Acid Side-Chains Is Markedly Reduced by Flanking Peptide Bonds. Journal of Molecular Biology. Apr 1988;200(3):513–22. doi:10.1016/0022-2836(88)90540-2. PMID:3398047.
Sweet, R. M., and D. Eisenberg. Correlation of Sequence Hydrophobicities Measures Similarity in Three-Dimensional Protein Structure. Journal of Molecular Biology. Dec 1983;171(4):479-88. doi:10.1016/0022-2836(83)90041-4. PMID:6663622.
Tomii, K., and M. Kanehisa. Analysis of Amino Acid Indices and Mutation Matrices for Sequence Comparison and Structure Prediction of Proteins. Protein Engineering. Jan 1996;9(1):27–36. doi:10.1093/protein/9.1.27. PMID:9053899.
Welling, G. W., W. J. Weijer, R. van der Zee R, and S. Welling-Wester. Prediction of Sequential Antigenic Regions in Proteins. FEBS Letters. Feb 1985;188(2):215-8. doi:10.1016/0014-5793(85)80374-4. PMID:2411595.
White, S. H., and W. C. Wimley. Membrane Protein Folding and Stability: Physical Principles. Annual Review of Biophysics and Biomolecular Structure. 1999;28:319–65. doi:10.1146/annurev.biophys.28.1.319. PMID:10410805
White, S. H., and W. C. Wimley. Hydrophobic Interactions of Peptides with Membrane Interfaces. Biochimica Et Biophysica Acta. Nov 1998;1376(3):339-52. doi:10.1016/s0304-4157(98)00021-5. PMID:9804985.
Wilson, K. J., A. Honegger, R. P. Stötzel, and G. J. Hughes. The Behaviour of Peptides on Reverse-Phase Supports during High-Pressure Liquid Chromatography. The Biochemical Journal. Oct 1981;199(1):31-41. doi:10.1042/bj1990031. PMID:7337711.
Wimley, W. C., and S. H. White. Experimentally Determined Hydrophobicity Scale for Proteins at Membrane Interfaces. Nature Structural Biology. Oct 1996;3(10):842–48. doi:10.1038/nsb1096-842. PMID:8836100.
Wimley, W. C., T. P. Creamer, and S. H. White. Solvation Energies of Amino Acid Side Chains and Backbone in a Family of Host-Guest Pentapeptides. Biochemistry. Apr 1996;35(16):5109–24. doi:10.1021/bi9600153. PMID:8611495.
Wolfenden, R., L. Andersson, P. M. Cullis, and C. C. Southgate. Affinities of Amino Acid Side Chains for Solvent Water. Biochemistry. Feb 1981;20(4):849–55. doi:10.1021/bi00507a030. PMID:7213619.
Zimmerman, J. M., N. Eliezer, and R. Simha. The Characterization of Amino Acid Sequences in Proteins by Statistical Methods. Journal of Theoretical Biology. Nov 1968;21(2):170–201. doi:10.1016/0022-5193(68)90069-6. PMID:5700434.

hydrophobicity_profile(window: int = 11, scale: str = 'KyteDoolittle') → Sequence[float]#

Build a hydrophobicity profile of a sliding window.

Example

>>> peptide = Peptide("ARQQNLFINFCLILIFLLLI")
>>> h = peptide.hydrophobicity_profile(window=12, scale="Eisenberg")
>>> [round(x, 3) for x in h]
[0.083, 0.147, 0.446, 0.632, 0.802, 0.955, 0.955, 0.944, 0.944]

instability_index() → float#

Compute the instability index of a protein sequence.

This function calculates the instability index proposed by Guruprasad et al (1990). This index predicts the stability of a protein based on its dipeptide composition.

Returns:: float – The instability index of the peptide. A protein whose instability index is smaller than 40 is predicted as stable, a value above 40 predicts that the protein may be unstable.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> round(peptide.instability_index(), 2)
83.68

References

Guruprasad, K., B.V. Bhasker Reddy, and M. W. Pandit. Correlation between Stability of a Protein and Its Dipeptide Composition: A Novel Approach for Predicting in Vivo Stability of a Protein from Its Primary Sequence. Protein Engineering, Design and Selection. 1990 Dec;4(2):155–61. doi:10.1093/protein/4.2.155. PMID:2075190.

isoelectric_point(pKscale: str = 'EMBOSS') → float#

Compute the isoelectric point of a protein sequence.

The isoelectric point (pI), is the pH at which a particular molecule or surface carries no net electrical charge.

Parameters:: pKscale (str) – The name of the pKa scale to be used. A list of all the allowed values can be retrieved from the keys of the peptides.tables.PK dictionary.
Returns:: float – The pH at which the peptide has a neutral net charge.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> peptide.isoelectric_point(pKscale="EMBOSS")
9.71...
>>> peptide.isoelectric_point(pKscale="Murray")
9.81...
>>> peptide.isoelectric_point(pKscale="Sillero")
9.89...
>>> peptide.isoelectric_point(pKscale="Solomon")
9.58...
>>> peptide.isoelectric_point(pKscale="Stryer")
9.62...
>>> peptide.isoelectric_point(pKscale="Lehninger")
9.93...
>>> peptide.isoelectric_point(pKscale="Dawson")
9.56...
>>> peptide.isoelectric_point(pKscale="Rodwell")
9.71...

Note

The pI is a variable that affects the solubility of the peptides under certain conditions of pH. When the pH of the solvent is equal to the pI of the protein, it tends to precipitate and lose its biological function.

References

Rice, P., I. Longden, and A. Bleasby. EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics. June 2000;16(6):276–77. doi:10.1016/s0168-9525(00)02024-2. PMID:10827456

kidera_factors() → KideraFactors#

Compute the Kidera factors of the peptide.

See KideraFactors for more information.

Returns:: peptides.KideraFactors – The computed average Kidera factors for all the amino acids in the peptide.

Example

>>> peptide = Peptide("KLKLLLLLKLK")
>>> for i, kf in enumerate(peptide.kidera_factors()):
...     print(f"KF{i+1:<3} {kf: .4f}")
KF1   -0.7855
KF2    0.2982
KF3   -0.2364
KF4   -0.0818
KF5    0.2100
KF6   -1.8936
KF7    1.0291
KF8   -0.5127
KF9    0.1118
KF10   0.8100

linker_preference_profile(window: int = 15) → Sequence[float]#

Compute the linker preference profile of a protein sequence.

The linker preference profile is a measure used as a basis for the DomCut method in Suyama & Ohara (2002). The resulting profile can then be used to identify putative domain boundaries in the input protein, either:

Using prior knowledge of the estimated domain count \(D\), in which case the \(D-1\) global minimums in the sequence can be used as cutting points
Without prior knowledge of the domain count, using a fixed threshold to estimate the number of domains and linkers. A cutoff value of \(-0.09\) was selected by the authors optimizing on the specificity / selectivity tradeoff.

References

Suyama, M., O. Ohara O. DomCut: prediction of inter-domain linker regions in amino acid sequences. Bioinformatics. Mar 2003;19(5):673-4. doi:10.1093/bioinformatics/btg031. PMID:12651735.

Added in version 0.3.0.

longest_run() → int#

Return the length of the longest consecutive run of the same amino acid.

This method finds the longest stretch of consecutive identical amino acids in the sequence. Long runs of the same amino acid can indicate repetitive regions or low complexity sequences.

Returns:: int – The length of the longest consecutive run, between 1 and len(sequence).

Example

>>> peptide = Peptide("AALS")
>>> peptide.longest_run()
2
>>> peptide = Peptide("AAAA")
>>> peptide.longest_run()
4
>>> peptide = Peptide("ACDEFGHIKLMNPQRSTVWY")
>>> peptide.longest_run()
1
>>> peptide = Peptide("AALLLSSS")
>>> peptide.longest_run()
3

Added in version 0.5.0.

mass_shift(aa_shift: str | Dict[str, float] | None = 'silac_13c', monoisotopic: bool = True) → float#

Compute the mass difference of modified peptides.

This function calculates the mass difference of peptides introduced by chemical modifications or heavy isotope labelling.

Parameters:

aa_shift (str or dict) – Either the key to a pre-defined isotope label (see peptides.tables.MASS_SHIFT), or a dictionary mapping each amino acid to it mass difference in Dalton (use nTer and cTer keys for N-terminal and C-terminal modifications).
monoisotopic (bool) – Flag whether monoisotopic weights of amino-acids should be used.

Returns:

float – The mass difference of the modified peptide.

Example

>>> peptide = Peptide("EGVNDNECEGFFSAR")
>>> peptide.mass_shift(aa_shift="silac_13c")
6.020129...
>>> peptide.mass_shift(aa_shift=dict(R=10.00827))
10.00827...

References

Ong, S-E., I. Kratchmarova, and M. Mann. Properties of 13C-Substituted Arginine in Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC). Journal of Proteome Research. Apr 2003;2(2):173–81. doi:10.1021/pr0255708. PMID:12716131.
Picotti, P., B. Bodenmiller, L. N. Mueller, B. Domon, and R. Aebersold. Full Dynamic Range Proteome Analysis of S. Cerevisiae by Targeted Proteomics. Cell. Aug 2009;138(4):795–806. doi:10.1016/j.cell.2009.05.051. PMID:19664813.

max_frequency() → float#

Return the maximum frequency of any amino acid in the peptide.

This method identifies the amino acid that appears most frequently in the sequence and returns its frequency. A high maximum frequency indicates low sequence diversity.

Returns:: float – The maximum frequency of any amino acid in the peptide, between 1/len(sequence) and 1.0.

Example

>>> peptide = Peptide("AALS")
>>> peptide.max_frequency()
0.5
>>> peptide = Peptide("AAAA")
>>> peptide.max_frequency()
1.0
>>> peptide = Peptide("ACDEFGHIKLMNPQRSTVWY")
>>> peptide.max_frequency()
0.05

Added in version 0.5.0.

membrane_position_profile(window: int = 11, angle: int = 100) → List[str]#

Compute the theoretical class of a protein sequence.

This function builds a profile predicting the theoretical class of a section of the peptide based on the relationship between the hydrophobic moment and hydrophobicity scale as proposed by Eisenberg (1984).

Parameters:

window (int) – The window size to consider when building the profile.
angle (int) – The protein rotational angle, in degrees, for which to compute the hydrophobic moment profile. Usual values are 100 for α-helix, and 160 for β-sheet.

Returns:

list of str – A list containing a one-character code for each window starting position: either 'G' for globular, 'T' for transmembrane, or 'S' for surface.

Example

>>> peptide = Peptide("ARQQNLFINFCLILIFLLLI")
>>> peptide.membrane_position_profile(window=12, angle=100)
['G', 'G', 'G', 'T', 'S', 'T', 'T', 'T', 'T']
>>> peptide.membrane_position_profile(window=12, angle=160)
['G', 'G', 'G', 'S', 'S', 'S', 'S', 'S', 'S']

References

Eisenberg, D. Three-Dimensional Structure of Membrane and Surface Proteins. Annual Review of Biochemistry. July 1984;53:595–623. doi:10.1146/annurev.bi.53.070184.003115. PMID:6383201.
Eisenberg, D., E. Schwarz, M. Komaromy, and R. Wall. Analysis of Membrane and Surface Protein Sequences with the Hydrophobic Moment Plot. Journal of Molecular Biology. Oct 1984;179(1):125–42. doi:10.1016/0022-2836(84)90309-7. PMID:6502707.
Eisenberg, D., R. M. Weiss, and T. C. Terwilliger. The Helical Hydrophobic Moment: A Measure of the Amphiphilicity of a Helix. Nature. Sep 1982;299(5881):371–74. doi:10.1038/299371a0. PMID:7110359

molecular_weight(average: str = 'expasy', aa_shift: str | Dict[str, float] | None = None) → float#

Compute the molecular weight of a protein sequence.

This function calculates the molecular weight of a protein sequence. It is calculated as the sum of the mass of each amino acid using one of the 3 available scales. It also supports mass calculation of proteins with predefined or custom stable isotope mass labels.

Parameters:

average (str) – The name of the average amino acid average weight scale. See peptides.tables.MOLECULAR_WEIGHT for a list of appropriate values.
aa_shift (str, dict or None) – Either an appropriate shift value to pass to Peptide.mass_shift, or None to get the unmodified weight.

Returns:

float – The molecular weight of the peptide, in Dalton.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> peptide.molecular_weight()
2485.91...
>>> peptide.molecular_weight(average="mascot")
2485.89...
>>> peptide.molecular_weight(average="monoisotopic")
2484.11...

References

Wilkins, M. R., E. Gasteiger, A. Bairoch, J. C. Sanchez, K. L. Williams, R. D. Appel, and D. F. Hochstrasser. Protein Identification and Analysis Tools in the ExPASy Server. Methods in Molecular Biology. 1992;112: 531–52. doi:10.1385/1-59259-584-7:531. PMID:10027275

ms_whim_scores() → MSWHIMScores#

Compute the MS-WHIM scores of the peptide.

See MSWHIMScores for more information.

Returns:: peptides.MSWHIMScores – The compute average of MS-WHIM scores of all the amino acids in the peptide.

Example

>>> peptide = Peptide("KLKLLLLLKLK")
>>> for i, mw in enumerate(peptide.ms_whim_scores()):
...     print(f"MSWHIM{i+1:<3} {mw: .4f}")
MSWHIM1   -0.6564
MSWHIM2    0.4873
MSWHIM3    0.1164

mz(charge: int = 2, aa_shift: str | Dict[str, float] | None = None, cysteins: float = 57.021464) → float#

Compute the m/z (mass over charge) ratio for a peptide.

This function calculates the (monoisotopic) mass over charge ratio (m/z) for peptides, as measured in mass spectrometry.

Parameters:

charge (int) – The net charge for which the m/z should be computed.
aa_shift (str, dict or None) – Either an appropriate shift value to pass to Peptide.mass_shift, or None to get the unmodified weight.
cysteins (float) – The mass shift (in Dalton) of blocked cysteins. Default corresponds to cysteins blocked by iodoacetamide.

Returns:

float – The m/z ratio of the peptide.

Example

>>> peptide = Peptide("EGVNDNECEGFFSAR")
>>> peptide.mz()
865.857...
>>> peptide.mz(aa_shift=dict(K=6.020129, R=6.020129))
868.867...
>>> peptide.mz(aa_shift="silac_13c", cysteins=58.005479)
869.359...

nutrient_cost(nutrient: str = 'glucose', organism: str = 'yeast', relative: bool = False)#

Estimate the nutrient cost to biosynthesize a peptide.

The nutrient cost was proposed by Barton et al. to estimate the energy cost required to biosynthesize a peptide based on genome-scale metabolic modeling in Saccharomyces cerevisiae and Escherichia coli. This approach offers advantages to estimate costs in nutrient-limited environments compared to energy-based methods.

Parameters:

nutrient (str) – The name of the nutrient, one of glucose, sulphate or ammonia.
organism (str) – The reference organism for which the values were computed, either yeast or ecoli.
relative (bool) – Whether to use the absolute or relative values when computing costs.

Example

>>> peptide = Peptide("SDKEVDEVDAALSDLEITLE")
>>> peptide.nutrient_cost("glucose", "ecoli")
15.763...
>>> peptide.nutrient_cost("ammonia", "yeast", relative=True)
10.839...

References

Barton, M. D., Delneri, D., Oliver, S. G., Rattray, M., & Bergman, C. M. (2010). Evolutionary systems biology of amino acid biosynthetic cost in yeast. PloS one, 5(8), e11935. PMID:20808905 doi:10.1371/journal.pone.0011935.

pcp_descriptors() → PCPDescriptors#

Compute the Physical-Chemical Properties descriptors of the peptide.

See PCPDescriptors for more information.

Returns:: peptides.PCPDescriptors – The computed average of PCP descriptors of all the amino acids in the peptide.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> for i, pcp in enumerate(peptide.pcp_descriptors()):
...     print(f"E{i+1:<3} {pcp: .5f}")
E1    0.01090
E2    0.03810
E3    0.12505
E4    0.04095
E5   -0.10595

physical_descriptors() → PhysicalDescriptors#

Compute the Physical Descriptors of the peptide.

See PhysicalDescriptors for more information.

Returns:: peptides.PhyiscalDescriptors – The computed average of Physical Descriptors of all the amino acids in the peptide. PD1 is related to volume while PD2 is related to hydrophilicity.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> for i, pd in enumerate(peptide.physical_descriptors()):
...     print(f"PD{i+1:<3} {pd: .4f}")
PD1    0.1190
PD2    0.2825

prin_components() → PRINComponents#

Compute the PRIN components of the peptide.

See PRINComponents for more information.

Returns:: peptides.PRINComponents – The computed average of PRIN components of all the amino acids in the peptide.

profile(table: Dict[str, float], window: int = 1, default: float = 0.0) → Sequence[float]#

Compute a generic per-residue profile from per-residue indices.

Parameters:

table (dict) – The values per residue to apply to the whole protein sequence.
window (int) – The window size for computing the profile. Leave as 1 to return per-residue values.
default (float) – The default value to use for amino-acids that are not present in the given table.

Returns:

collections.abc.Sequence of float – The per-residue profile values, averaged in the given window size. When window is larger than the available number of resiudes, an empty sequence is returned.

Example

>>> peptide = Peptide("PKLVCLKKC")
>>> peptide.profile(peptides.tables.CHARGE['sign'])
[0.0, 1.0, 0.0, 0.0, -1.0, 0.0, 1.0, 1.0, -1.0]
>>> peptide.profile(peptides.tables.MOLECULAR_WEIGHT['expasy'], 5)
[108..., 111..., 111..., 114..., 115...]

Added in version 0.3.0.

protfp_descriptors() → ProtFPDescriptors#

Compute the ProtFP descriptors of the peptide.

See ProtFPDescriptors for more information.

Returns:: peptides.ProtFPDescriptors – The computed average of ProtFP descriptors of all the amino acids in the peptide.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> for i, fp in enumerate(peptide.protfp_descriptors()):
...     print(f"ProtFP{i+1:<3} {fp: .4f}")
ProtFP1    0.2065
ProtFP2   -0.0565
ProtFP3    1.9930
ProtFP4   -0.2845
ProtFP5    0.7315
ProtFP6    0.7000
ProtFP7    0.1715
ProtFP8    0.1135

sneath_vectors() → SneathVectors#

Compute the Sneath vectors for the peptide.

See SneathVectors for more information.

Returns:: peptides.SneathVectors – The computed average of Sneath vectors of all the amino acids in the peptide.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> for i, fp in enumerate(peptide.sneath_vectors()):
...     print(f"SV{i+1:<3} {fp: .5f}")
SV1    0.19620
SV2    0.04655
SV3    0.04050
SV4    0.02775

st_scales() → STScales#

Compute the ST-scales of the peptide.

See STScales for more information.

Returns:: peptides.STScales – The computed average of ST-scales of all the amino acids in the peptide.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> for i, st in enumerate(peptide.st_scales()):
...     print(f"ST{i+1:<3} {st: .5f}")
ST1   -0.63760
ST2    0.07965
ST3    0.05150
ST4    0.07135
ST5   -0.27905
ST6   -0.80995
ST7    0.58020
ST8    0.54400

structural_class(frequencies: str = 'Nakashima', distance: str = 'mahalanobis') → str#

Predict the structural class of the peptide from its sequence.

The structural class of a protein, as defined in Levitt and Chothia (1976), can be either α, β, α+β, or α/β, with ζ being later defined for irregular proteins. It depends on the secondary structure of the protein. Several methods have been proposed to elucidate the structural class from the amino acid sequence, all based on similarity with proteins which structures have been elucidated.

Parameters:

frequencies (str) – The frequencies of the amino acids in proteins of different structural classes to use as reference centroids. Use "Chou" to load the frequencies of the 64 proteins analyzed in Chou (1989), "Nakashima" to use the normalized frequencies of the 135 proteins analyzed in Nakashima et al. (1986) and Zhang & Chou (1995), or "ChouZhang" to load the frequencies of 120 proteins used in Chou & Zhang (1995).
distance (str) – The distance metric to use in the 20-D space formed by the 20 usual amino acid to find the nearest structural class for the peptide. Use "cityblock" to use the Manhattan distance like in Chou (1989), "euclidean" to use the Euclidean distance like in Nakashima et al (1986), "correlation" to use the correlation distance like in Chou & Zhang (1992), "mahalanobis" to use the Mahalanobis distance like in Chou & Zhang (1995), or "discriminant" to use the Bayes discriminant like in Chou et al. (1998).

Returns:

str – The structural class the protein most likely belongs to. Note that some classes may not be predictable, depending on the reference frequencies being used (at the moment, the ζ class can only be predicted from the Nakashima frequencies with euclidean or manhattan distances).

Example

Predict the structural class of the skipjack tuna Cytochrome C, (P0025), an α protein

>>> p = Peptide(
...     "MGDVAKGKKTFVQKCAQCHTVENGGKHKVGPNLWGLFGRKTGQAEGYSYT"
...     "DANKSKGIVWNENTLMEYLENPKKYIPGTKMIFAGIKKKGERQDLVAYLK"
...     "SATS"
... )
>>> p.structural_class("Nakashima", distance="mahalanobis")
'alpha'
>>> p.structural_class("ChouZhang", distance="mahalanobis")
'beta'
>>> p.structural_class("Chou", distance="correlation")
'alpha'
>>> p.structural_class("Nakashima", distance="euclidean")
'alpha'
>>> p.structural_class("Chou", distance="cityblock")
'alpha+beta'

Predict the structural class of the sea krait Erabutoxin B (Q90VW1), a β protein:

>>> p = Peptide(
...     "MKTLLLTLVVVTIVCLDLGYTRICFNHQSSQPQTTKTCSPGESSCYHKQW"
...     "SDFRGTIIERGCGCPTVKPGIKLSCCESEVCNN"
... )
>>> p.structural_class("Nakashima", distance="mahalanobis")
'beta'
>>> p.structural_class("ChouZhang", distance="mahalanobis")
'alpha+beta'
>>> p.structural_class("Chou", distance="correlation")
'beta'
>>> p.structural_class("Nakashima", distance="euclidean")
'zeta'
>>> p.structural_class("Chou", distance="cityblock")
'beta'

Predict the structural class of the Arthrospira platensis Ferredoxin (P00246), a ζ protein:

>>> p = Peptide(
...     "MATYKVTLINEAEGINETIDCDDDTYILDAAEEAGLDLPYSCRAGACSTC"
...     "AGTITSGTIDQSDQSFLDDDQIEAGYVLTCVAYPTSDCTIKTHQEEGLY"
... )
>>> p.structural_class("Nakashima", distance="euclidean")
'zeta'

References

Chou, K-C., W-M. Liu, G. M. Maggiora, and C-T. Zhang. Prediction and Classification of Domain Structural Classes. Proteins: Structure, Function, and Genetics. Apr 1998;31(1):97–103. PMID:9552161.
Chou, K-C., and C-T. Zhang. Prediction of Protein Structural Classes. Critical Reviews in Biochemistry and Molecular Biology. Feb 1995;30:275–349. doi:10.3109/10409239509083488. PMID:7587280.
Chou, K-C., and C-T. Zhang. A Correlation-Coefficient Method to Predicting Protein-Structural Classes from Amino Acid Compositions. European Journal of Biochemistry. 1992;207(2):429–33. doi:10.1111/j.1432-1033.1992.tb17067.x. PMID:1633801.
Chou, P. Y. Prediction of Protein Structural Classes from Amino Acid Compositions. In Prediction of Protein Structure and the Principles of Protein Conformation, edited by G. D. Fasman. Springer US. 1989:549–86. doi:10.1007/978-1-4613-1571-1. ISBN:978-0-306-43131-9.
Nakashima, H., K. Nishikawa, and T. Ooi. The Folding Type of a Protein Is Relevant to the Amino Acid Composition. Journal of Biochemistry. Jan 1986;99(1):153–62. doi:10.1093/oxfordjournals.jbchem.a135454. PMID:3957893.
Zhang, Chun-Ting, and Kuo-Chen Chou. An Eigenvalue-Eigenvector Approach to Predicting Protein Folding Types. Journal of Protein Chemistry. Jul 1995;14(5):309–26. doi:10.1007/BF01886788. PMID:8590599.
Zhou, G.P., and N. Assa-Munt. Some Insights into Protein Structural Class Prediction. Proteins: Structure, Function, and Bioinformatics. 2001;44(1):57–59. doi:10.1002/prot.1071. PMID:11354006.

svger_descriptors() → SVGERDescriptors#

Compute the SVGER descriptors of the peptide.

See SVGERDescriptors for more information.

Returns:: peptides.SVGERDescriptors – The computed average of SVGER descriptors of all the amino acids in the peptide.

Added in version 0.3.2.

t_scales() → TScales#

Compute the T-scales of the peptide.

See TScales for more information.

Returns:: peptides.TScales – The computed average of T-scales of all the amino acids in the peptide.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> for i, t in enumerate(peptide.t_scales()):
...     print(f"T{i+1:<3} {t: .4f}")
T1   -3.2700
T2   -0.0035
T3   -0.3855
T4   -0.1475
T5    0.7585

vhse_scales() → VHSEScales#

Compute the VHSE-scales of the peptide.

See VHSEScales for more information.

Returns:: peptides.VHSEScales – The computed average of VHSE-scales of the amino acids in the peptide. VHSE1 and VHSE2 represent hydrophobic properties, VHSE3 and VHSE4 represent steric properties, while VHSE5, VHSE6, VHSE7 and VHSE8 represent electronic properties.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> for i, vhse in enumerate(peptide.vhse_scales()):
...     print(f"VHSE{i+1:<3} {vhse: .4f}")
VHSE1   -0.1150
VHSE2    0.0630
VHSE3   -0.0055
VHSE4    0.7955
VHSE5    0.4355
VHSE6    0.2485
VHSE7    0.1740
VHSE8   -0.0960

vstpv_descriptors() → VSTPVDescriptors#

Compute the VSTPV descriptors of the peptide.

See VSTPVDescriptors for more information.

Returns:: peptides.VSTPVDescriptors – The computed VSTPV descriptors for the peptide.

z_scales() → ZScales#

Compute the Z-scales of the peptide.

See ZScales for more information.

Returns:: peptides.ZScales – The computed average of Z-scales of all the amino acid in the peptide.

Example

>>> peptide = Peptide("QWGRRCCGWGPGRRYCVRWC")
>>> for i, z in enumerate(peptide.z_scales()):
...     print(f"Z{i+1:<3} {0.0+round(z,5): .4f}")
Z1    0.5520
Z2    0.0985
Z3    0.0000
Z4    0.8130
Z5   -0.8285

Peptide#

This Page