| util.formula {CHNOSZ} | R Documentation |
Calculate the standard molal entropy of elements in a compound; calculate the standard molal Gibbs energy or enthalpy of formation, or standard molal entropy, from the other two; list coefficients of selected elements in a chemical formula; calculate the average oxidation number of carbon. Also, create a matrix having the chemical formulas of amino acid residues in proteins and calculate the chemical formulas of proteins from their amino acid composition.
GHS(species = NULL, DG = NA, DH = NA, S = NA, T = thermo$opt$Tr)
element(compound, property = c("mass","entropy"))
expand.formula(elements, makeup)
ZC(x)
residue.formula()
protein.formula(proteins, as.residue = FALSE)
species |
character, formula of a compound from which to calculate entropies of the elements. |
DG |
numeric, standard molal Gibbs energy of formation. |
DH |
numeric, standard molal enthalpy of formation. |
S |
numeric, standard molal entropy. |
T |
numeric, temperature in Kelvin. |
compound |
character, name of element(s) or compound(s). |
property |
character, name(s) of thermodynamic properties. |
elements |
character, name(s) of elements. |
makeup |
dataframe, elemental composition of a compound returned by makeup. |
x |
character, object representing chemical formula. |
proteins |
dataframe, amino acid composition of one or more proteins in the same format as thermo$protein. |
as.residue |
logical, return the per-residue formula of the protein(s)? |
GHS computes one of the standard molal Gibbs energy of formation from the elements (DG), the standard molal enthalpy of formation from the elements (DH), or the standard molal entropy (S) at 298.15 K and 1 bar from values of the other two. If the species argument is present, it is used to calculate the entropies of the elements (Se) using element; otherwise Se is set to zero. The equation in effect can be written as DG = DH - T * DS, where DS = S - Se and T denotes the reference temperature of 298.15 K. If two of DG, DH, and S are provided, the value of the third is returned. If all three are provided, the value of DG in the arguments is ignored and the calculated value of DG is returned. If none of DG, DH, or S is provided, the value of Se is returned. If only one of the values is provided, an error results. Units of cal mol^-1 (DG, DH) and cal K^-1 mol^-1 (S) are assumed. If T is provided, it is used instead of the reference temperature.
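The relation DG = DH - T * (S - Se) and the argument handling described above can be illustrated with a minimal base-R sketch. The function ghs_sketch is hypothetical and is not the package's implementation; it takes Se directly as a number (default zero, as when species is absent) instead of deriving it from a formula, and it handles scalar inputs only.

```r
# Hypothetical stand-in for the arithmetic described for GHS.
# Se is passed as a number here; the real function derives it from 'species'.
ghs_sketch <- function(DG = NA, DH = NA, S = NA, Se = 0, T = 298.15) {
  n <- sum(!is.na(c(DG, DH, S)))
  if (n == 0) return(Se)                        # nothing given: return Se
  if (n == 1) stop("provide two of DG, DH and S")
  if (is.na(DH)) return(DG + T * (S - Se))      # DH from DG and S
  if (is.na(S)) return((DH - DG) / T + Se)      # S from DG and DH
  DH - T * (S - Se)                             # DG from DH and S (any given DG is ignored)
}

# illustrative numbers only, in cal mol^-1 and cal K^-1 mol^-1
dg <- ghs_sketch(DH = -1000, S = 10)
dh <- ghs_sketch(DG = dg, S = 10)
s  <- ghs_sketch(DG = dg, DH = -1000)
```

Feeding the computed DG back in recovers the DH and S that were used to obtain it, which is a quick consistency check on the three rearrangements.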
element returns a dataframe of the mass and entropy of one or more elements or formulas given in compound. The property can be mass and/or entropy.
expand.formula converts a 1-column dataframe representing the elemental composition of a compound (see makeup) to a numeric vector holding the coefficient of each element named in elements. If any of these elements is not present in the makeup dataframe, its coefficient is set to zero. An element with a non-zero coefficient in the makeup dataframe is omitted from the output if it is not one of elements.
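The lookup described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: expand_sketch is a hypothetical name, and the composition is assumed to be a 1-column dataframe whose row names are the element symbols.

```r
# Hypothetical illustration of the behaviour described for expand.formula.
# Assumption: element symbols are stored as row names of a 1-column dataframe.
expand_sketch <- function(elements, makeup) {
  idx <- match(elements, rownames(makeup))  # exact matching; NA when absent
  out <- makeup[[1]][idx]
  out[is.na(out)] <- 0                      # absent elements get coefficient 0
  out
}

m <- data.frame(count = c(2, 1), row.names = c("H", "O"))  # stands in for H2O
e1 <- expand_sketch(c("H", "O"), m)        # 2 1
e2 <- expand_sketch(c("C", "H", "S"), m)   # 0 2 0; O is dropped
```

match() is used instead of indexing the dataframe by row name because character row indexing of a dataframe allows partial matches, which could confuse symbols such as C and Cl.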
ZC returns the nominal carbon oxidation state for the chemical formula represented by x. (For discussion of nominal carbon oxidation state, see Hendrickson et al., 1970; Buvet, 1983.) If carbon is not present in the formula the result is NaN.
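As a rough illustration, the nominal carbon oxidation state can be computed from element counts by assigning H +1, N -3, O -2 and S -2 and requiring that the formula's net charge Z be balanced by carbon. The sketch below assumes those oxidation states and a named numeric vector of counts; zc_sketch is a hypothetical helper, not the package function.

```r
# Hypothetical helper: ZC = (Z - nH + 3*nN + 2*nO + 2*nS) / nC
# Assumption: 'counts' is a named numeric vector of element counts.
zc_sketch <- function(counts) {
  n <- function(el) if (el %in% names(counts)) counts[[el]] else 0
  nC <- n("C")
  if (nC == 0) return(NaN)  # no carbon in the formula
  (n("Z") - n("H") + 3 * n("N") + 2 * n("O") + 2 * n("S")) / nC
}

z_co2 <- zc_sketch(c(C = 1, O = 2))   # 4
z_ch4 <- zc_sketch(c(C = 1, H = 4))   # -4
z_all <- zc_sketch(c(C = 1, H = 1, N = 1, O = 1, S = 1, Z = 1))  # 7
z_h2o <- zc_sketch(c(H = 2, O = 1))   # NaN
```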
protein.formula exists to quickly compute the chemical formulas of many proteins. The proteins argument contains the amino acid compositions of the proteins in the same format as the thermo$protein dataframe. residue.formula is called to calculate the chemical formula of each of the 20 common amino acid residues (and the terminal H- and -OH). The amino acid compositions of the proteins and the output of residue.formula are then combined by matrix multiplication to generate the result.
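The matrix multiplication can be pictured with a toy case in base R. The sketch below uses only two residues (glycine, C2H3NO, and alanine, C3H5NO) plus an H2O row standing for the terminal H- and -OH; the object names rf, aa and pf, and the layout of both matrices, are assumptions made for illustration and need not match what residue.formula returns.

```r
# Toy residue-formula matrix: rows are groups, columns are elements.
rf <- rbind(
  H2O = c(C = 0, H = 2, N = 0, O = 1),  # terminal H- and -OH
  Gly = c(C = 2, H = 3, N = 1, O = 1),  # glycine residue
  Ala = c(C = 3, H = 5, N = 1, O = 1)   # alanine residue
)

# Toy compositions: rows are peptides, columns are counts of each group.
aa <- rbind(
  GlyGlyAla = c(H2O = 1, Gly = 2, Ala = 1),
  AlaAlaAla = c(H2O = 1, Gly = 0, Ala = 3)
)

# One matrix product yields the formulas of all peptides at once.
pf <- aa %*% rf
pf["GlyGlyAla", ]  # C 7, H 13, N 3, O 4
```

Because every protein is a row of the left-hand matrix, the cost of adding more proteins is absorbed by a single vectorized product instead of a loop over proteins.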
GHS and ZC return numeric values. expand.formula returns a numeric vector.
Buvet, R. (1983) General criteria for the fulfillment of redox reactions, in Bioelectrochemistry I: Biological Redox Reactions, Milazzo, G. and Blank, M., eds., Plenum Press, New York, p. 15–50. http://www.worldcat.org/oclc/9282370
Hendrickson, J. B., Cram, D. J., and Hammond, G. S. (1970) Organic Chemistry, 3rd ed., McGraw-Hill, New York, 1279 p. http://www.worldcat.org/oclc/78308
makeup can be used to count the elements in formulas and display formulas in various formats.
## converting among Gibbs, enthalpy, entropy
GHS("H") # entropy of H (element)
# calculate enthalpy of formation of arsenopyrite
GHS("FeAsS",DG=-33843,S=68.5)
# return the value of DG calculated from DH and S
# cf. -56687.71 from subcrt("water")
GHS("H2O",DH=-68316.76,S=16.7123)
## mass and entropy of elements and compounds
element("CH4")
element(c("CH4","H2O"),"mass")
element("Z") # charge
# same mass, opposite entropy as charge
element("Z-1") # i.e., electron
## count selected elements in a formula
m <- makeup("H2O")
expand.formula(c("H","O"),m)
expand.formula(c("C","H","S"),m)
## calculate the average chemical formula of all of
## the proteins in the CHNOSZ database;
## this is much faster than a for-loop
pf <- protein.formula(thermo$protein)
colSums(pf)/nrow(pf)
## nominal carbon oxidation states
ZC("CO2") # 4
ZC("CH4") # -4
ZC("CHNOSZ") # 7
si <- info(info("LYSC_CHICK"))
ZC(si$formula) # 0.01631
## plot ZC of reference protein sequence
## for different organisms
file <- system.file("extdata/refseq/protein_refseq.csv.xz",package="CHNOSZ")
ip <- add.protein(file)
# only use those organisms with a certain
# number of sequenced bases
ip <- ip[as.numeric(thermo$protein$abbrv[ip])>100000]
pf <- protein.formula(thermo$protein[ip,])
zc <- ZC(pf)
# the organism names we search for
# "" matches all organisms
terms <- c("Streptomyces","Pseudomonas","Salmonella",
"Escherichia","Vibrio","Bacteroides","Lactobacillus",
"Staphylococcus","Streptococcus","Methano","Bacillus","Thermo","")
tps <- thermo$protein$ref[ip]
plot(0,0,xlim=c(1,13),ylim=c(-0.3,-0.05),pch="",
ylab="average oxidation state of carbon in proteins",
xlab="",xaxt="n")
for(i in seq_along(terms)) {
it <- grep(terms[i],tps)
zct <- zc[it]
points(jitter(rep(i,length(zct))),zct,pch=20)
}
terms[13] <- "all organisms"
axis(1,1:13,terms,las=2)
title(main=paste("Average Oxidation State of Carbon:",
"Total Protein per taxID in NCBI RefSeq",sep="\n"))