| get.expr {CHNOSZ} | R Documentation |
Get abundance data from a protein expression experiment and add the proteins to the working instance of CHNOSZ.
get.expr(file, idcol, abundcol, seqfile, filter=NULL,
is.log=FALSE, loga.total = 0)
| file | character, name of file with sequence IDs and abundance data. |
| idcol | character, name of the column with sequence IDs. |
| abundcol | character, name of the column with abundances. |
| seqfile | character, name of the FASTA file with protein sequences. |
| filter | list, optional filters to apply. |
| is.log | logical, are the abundances in the file in logarithmic (base 10) units? |
| loga.total | numeric, logarithm of total activity of residues. |
This function reads a CSV file that contains protein sequence IDs and protein abundance data. The header (first line) of this file contains the column names; the columns holding the sequence IDs and protein abundances are named by idcol and abundcol, respectively. Each sequence ID is searched for (using grep) in the accession lines of the FASTA file indicated by seqfile; a match can occur in any part of an accession line, and the first such match is used. Any IDs that are NA or cannot be found in seqfile are excluded from further consideration. The amino acid compositions of the matched proteins are computed (using read.fasta) and are added to the inventory of proteins in CHNOSZ (thermo$protein).
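The matching rule described above (first matching accession line wins; NA or unmatched IDs are dropped) can be sketched in base R. This is only an illustration and not the code used by get.expr: the header lines, the IDs and the helper first.match are hypothetical, and fixed = TRUE is an assumption made here so that characters such as "|" are not treated as regular expression syntax.

```r
# Hypothetical FASTA accession lines and sequence IDs from a data file
headers <- c(">sp|P0A6F5|CH60_ECOLI 60 kDa chaperonin",
             ">sp|P0A6Y8|DNAK_ECOLI Chaperone protein DnaK",
             ">sp|P0A6F5|duplicate entry")
ids <- c("P0A6Y8", NA, "P0A6F5", "Q99999")

# Hypothetical helper: index of the first header containing the ID,
# or NA if the ID is NA or is not found in any header
first.match <- function(id, headers) {
  if (is.na(id)) return(NA_integer_)
  hits <- grep(id, headers, fixed = TRUE)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

imatch <- vapply(ids, first.match, integer(1), headers = headers,
                 USE.NAMES = FALSE)

# Exclude IDs that are NA or unmatched
keep <- !is.na(imatch)
matched <- data.frame(id = ids[keep], header = headers[imatch[keep]],
                      stringsAsFactors = FALSE)
```

Here "P0A6F5" occurs in two header lines, and only the first of them is kept.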
The function returns the logarithms of activities of the proteins. We associate molality with activity (i.e., activity coefficients are implicitly unity). If loga.total is not NULL, the abundances of the proteins from the data file are scaled so that the logarithm of total activity of amino acid residues equals the value of loga.total, which is usually set to zero (see unitize). This operation preserves the relative abundances of the proteins. If the abundances of the proteins in the file are already in logarithmic units, set is.log to TRUE.
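The scaling can be written out as arithmetic. The sketch below is not the implementation of unitize or get.expr; it assumes that "total activity of residues" means the sum over proteins of protein length times protein activity, and the abundances and lengths are made-up values.

```r
# Hypothetical abundances (linear units) and protein lengths (residues)
abund <- c(P1 = 2.0, P2 = 0.5, P3 = 1.5)
len   <- c(P1 = 300, P2 = 150, P3 = 450)
loga.total <- 0  # target logarithm of total activity of residues

# Residues contributed by the unscaled abundances
total.residues <- sum(abund * len)

# Shift every log abundance by the same amount so that
# sum(len * 10^loga) equals 10^loga.total
loga <- log10(abund) + loga.total - log10(total.residues)

# The total activity of residues is now 10^loga.total (unity here)
stopifnot(all.equal(sum(len * 10^loga), 10^loga.total))
# A uniform shift in log space leaves relative abundances unchanged
stopifnot(all.equal(unname(10^loga / sum(10^loga)),
                    unname(abund / sum(abund))))
```

If the abundances were already logarithms (the is.log = TRUE case), log10(abund) in this sketch would be replaced by the values themselves.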
If seqfile is one of SGD, ECO or HUM, it refers to the database of amino acid compositions of proteins packaged with CHNOSZ for Saccharomyces cerevisiae, Escherichia coli or Homo sapiens, respectively. In this case, the search for matching IDs is performed using get.protein.
The data file can be filtered by using filter. This argument should be a list with one element, the name of which indicates the column to apply the filter to, and the value of which is a search term.
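As a rough illustration of such a filter, the following base R sketch keeps the rows whose named column contains the search term. The data frame is hypothetical, and the use of grep for the search is an assumption; get.expr may match the term differently.

```r
# Hypothetical contents of a data file
dat <- data.frame(
  ID = c("P1", "P2", "P3"),
  description = c("pyruvate kinase", "chaperonin", "adenylate kinase"),
  emPAI = c(2.0, 0.5, 1.5),
  stringsAsFactors = FALSE
)

# One-element list: the name is the column, the value is the search term
filter <- list(description = "kinase")

column <- names(filter)[1]
term <- filter[[1]]

# Keep only the rows whose column value contains the search term
dat.filtered <- dat[grep(term, dat[[column]]), ]
```

With these values, dat.filtered holds the rows for P1 and P3.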
Returns a list with objects iprotein (the indices of the proteins in thermo$protein) and loga.ref (the logarithms of activities of the proteins).
findit for finding combinations of chemical activities that optimize the fit of metastable protein assemblages to experimental protein abundances.
# let's use a sample data file
file <- system.file("extdata/abundance/ISR+08.csv",package="CHNOSZ")
# read the abundances and get the proteins from ECO.csv
expr <- get.expr(file,"ID","emPAI","ECO")
# what if we just wanted kinases?
expr <- get.expr(file,"ID","emPAI","ECO",list(description="kinase"))
# the abundances were scaled so that the total activity of residues is unity
pl <- protein.length(-expr$iprotein)
stopifnot(all.equal(sum(pl*10^expr$loga.ref),1))
# see the 'protactiv' vignette for comparison with equilibrium calculations
# if you want to read the protein sequences from a FASTA file...
# e <- get.expr(file,"ID","emPAI","ECOLI.fasta")