wiki-archive/twiki/data/Phylogenetics/MIAPAWorkshop2011Ideas.txt,v

head 1.19;
access;
symbols;
locks; strict;
comment @# @;
1.19
date 2011.11.02.13.27.44; author ArlinStoltzfus; state Exp;
branches;
next 1.18;
1.18
date 2011.10.28.21.17.50; author HilmarLapp; state Exp;
branches;
next 1.17;
1.17
date 2011.10.19.12.49.53; author ArlinStoltzfus; state Exp;
branches;
next 1.16;
1.16
date 2011.10.18.23.22.22; author RossMounce; state Exp;
branches;
next 1.15;
1.15
date 2011.10.18.19.41.39; author ArlinStoltzfus; state Exp;
branches;
next 1.14;
1.14
date 2011.10.18.17.18.32; author ArlinStoltzfus; state Exp;
branches;
next 1.13;
1.13
date 2011.10.18.16.16.52; author KarenCranstn; state Exp;
branches;
next 1.12;
1.12
date 2011.10.18.16.00.28; author ArlinStoltzfus; state Exp;
branches;
next 1.11;
1.11
date 2011.10.18.14.39.38; author KarenCranstn; state Exp;
branches;
next 1.10;
1.10
date 2011.10.18.13.34.16; author ArlinStoltzfus; state Exp;
branches;
next 1.9;
1.9
date 2011.10.18.01.00.12; author ArlinStoltzfus; state Exp;
branches;
next 1.8;
1.8
date 2011.10.17.22.21.13; author ArlinStoltzfus; state Exp;
branches;
next 1.7;
1.7
date 2011.10.17.18.40.00; author ArlinStoltzfus; state Exp;
branches;
next 1.6;
1.6
date 2011.10.17.09.32.26; author EnricoPontelli; state Exp;
branches;
next 1.5;
1.5
date 2011.10.15.16.06.51; author ArlinStoltzfus; state Exp;
branches;
next 1.4;
1.4
date 2011.10.14.15.58.03; author ArlinStoltzfus; state Exp;
branches;
next 1.3;
1.3
date 2011.10.12.19.40.46; author MaryamPanahiazar; state Exp;
branches;
next 1.2;
1.2
date 2011.10.12.15.06.16; author ArlinStoltzfus; state Exp;
branches;
next 1.1;
1.1
date 2011.10.07.16.15.09; author ArlinStoltzfus; state Exp;
branches;
next ;
desc
@none
@
1.19
log
@none
@
text
@%META:TOPICINFO{author="ArlinStoltzfus" date="1320240464" format="1.1" version="1.19"}%
%META:TOPICPARENT{name="MIAPAWorkshop2011"}%
---+++ Scoping questions
The scoping questions originally listed here were addressed at the TDWG workshop in 2011, with results available at [[http://wiki.tdwg.org/twiki/bin/view/Phylogenetics/MIAPADraft#Scope_discussion][http://wiki.tdwg.org/twiki/bin/view/Phylogenetics/MIAPADraft#Scope_discussion]].
---+++ Exploring the Role of OBI in MIAPA
We would like MIAPA to develop as a set of concepts and relations that are formalized within an ontology. In this regard, we need to recognize that a significant amount of work has already been done in the context of three ontologies:
* PhylOnt created by Leebens-Mack's team
* CDAO (including the v2 version that includes a classification of methods)
* the Ontology for Biomedical Investigations ([[http://obi-ontology.org/page/Main_Page][OBI]])
In particular, it appears that OBI already provides a rich infrastructure for the general description of experiments and protocols; we believe that much can be achieved by simply integrating these three ontologies. Our proposal is to include, in the use-case and benchmark activities discussed below, exercises that use the three ontologies to create sample reports.
Interested parties: Son Cao Tran
---+++ Annotating Phylogeny Papers with the PhylOnt Ontology and other NCBO ontologies
We implemented a tool to annotate concepts in phylogeny-related documents with concepts from ontologies such as PhylOnt and other related ontologies.
After annotation, the user can publish the annotations and retrieve them for re-use.
Interested parties: Jim Leebens-Mack and Maryam Panahiazar
---+++ Actual examples of MIAPA-compliant reports
People have been talking about MIAPA for 5 years now, but there is not a single example of a MIAPA-compliant report. Why don't we make a few? We could use the [[http://www.evoio.org/wiki/MIAPA_Draft_RFC][strawman draft of MIAPA]]. Like most MIBBI standards, it does not specify a format. Therefore, we could use currently available formats, perhaps supplemented with metadata files. Or we could use a NEXUS file with comments (square-bracketed strings) to contain the MIAPA-relevant annotations.
Because the point of this project is just to create sample records, we can avoid complexities (e.g., highly complex workflows due to iteration or reuse) and start with some reports that are easy to characterize. The rough outline for the project would be:
* identify a few (3 to 6) simple studies for which we can obtain the trees, the matrix, and a PDF of the publication
* create a record for each study that complies with the strawman draft of MIAPA
* for one of the studies, create different versions of the same record using different formats (e.g., NEXUS-with-comments vs. NeXML vs. treefile+alignmentfile+metadatafile)
The MIAPA page has [[http://www.evoio.org/wiki/MIAPA/PhyloWays#some_candidate_cases][a short list of examples, some simple]], but I don't know how many we can get PDFs for.
---+++ Create benchmark of annotated pubs relating phylogenies and biodiversity
The notion of annotating a set of phylogenetic studies was suggested in the workshop announcement. The process of annotation would help us to work out the metadata attributes needed to describe a study, e.g., describing phylogenetic methods and workflows is a thorny problem. The resulting set of annotated studies could serve as a benchmark or training set for automated methods.
The publications could be a random sample from the relevant literature, or some set of canonical or exemplary publications.
Alternatively, given that our larger goal is to facilitate data-sharing, why spend our time annotating publications whose supporting data we are not offering for public re-use via some searchable interface? Instead, why not annotate some studies that are already poised for re-use, with the data available via a searchable interface? An example would be the 300 studies with 2010 publication dates in TreeBASE. If we work with the TreeBASE providers, perhaps we could provide a search interface that takes advantage of the extra annotations. Another important set of re-usable trees is the set of trees agglomerated into the APG tree, or the ToLWeb tree.
Interested parties:
---+++ Sharing the tree of life: What can we learn from mega-phylogeny resources & use-cases?
Trees with large numbers of species have many uses in community ecology, bioinformatics, biodiversity studies, etc. NSF has poured a large amount of money into "assembling the tree of life" for the benefit of the scientific community, but there is no comprehensive tree-of-life resource for scientists to use. How can this situation be improved, and in particular, how does improvement depend on information standards?
One way to understand this situation is to study which resources promote re-use, and why. ToLWeb has hundreds of curators and a huge code-base, but it seems to be rarely used for research. By comparison, Phylomatic rocks, even though it is an extremely small project and is limited to plants. The NCBI taxonomy hierarchy provides the organizing framework for a number of projects (e.g., Rod Page's iPhylo; Ross Mounce has started [[http://www.citeulike.org/user/rossmounce/tag/ncbitaxonomy][a short list of citations for such projects]]), in spite of the fact that it is not properly a phylogeny, and it is distributed in a form (SQL database dump) that most users would find intimidating.
Another way to understand this situation is to look at what users are doing when they make use of a tree of life. The studies listed below rely on a large tree to accomplish research objectives such as computing measures of phylogenetic diversity, or studying trait correlations among a large set of species.
* [[http://www.ncbi.nlm.nih.gov/pubmed/21402914][Burns JH, Strauss SY: More closely related species are more ecologically similar in an experimental test. Proc Natl Acad Sci U S A 2011, 108(13):5302-5307.]]
* [[http://dx.doi.org/10.1111/j.1600-0706.2010.18898.x][Duarte LdS: Phylogenetic habitat filtering influences forest nucleation in grasslands. Oikos 2011, 120(2):208-215.]]
* [[http://www.amjbot.org/content/early/2011/01/25/ajb.1000154.abstract][Walls RL: Angiosperm leaf vein patterns are linked to leaf functions in a global-scale data set. American Journal of Botany 2011.]]
* [[http://dx.doi.org/10.1111/j.1466-8238.2010.00582.x][Zhang S-B, Ferry Slik JW, Zhang J-L, Cao K-F: Spatial patterns of wood traits in China are controlled by phylogeny and the environment. Global Ecology and Biogeography 2011, 20(2):241-250.]]
* [[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=21166972][Morlon H, Schwilk DW, Bryant JA, Marquet PA, Rebelo AG, Tauss C, Bohannan BJ, Green JL: Spatial patterns of phylogenetic diversity. Ecology Letters 2011, 14(2):141-149.]]
* [[http://dx.doi.org/10.1016/j.mambio.2010.03.004][Riek A: Allometry of milk intake at peak lactation. Mammalian Biology Zeitschrift fur Saugetierkunde 2011, 76(1):3-11.]]
Interested parties:
-- Main.ArlinStoltzfus - 07 Oct 2011@
1.18
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="HilmarLapp" date="1319836670" format="1.1" version="1.18"}%
d5 1
a5 12
What is the scope of "phylogenetic analysis" for which MIAPA is being developed? Some questions are below. One thing to keep in mind is that a MIAPA standard might be applied to cases for which it was not designed. The real issue is not whether MIAPA might be applied ad hoc to some odd cases, but whether we are going to go out of our way to ensure that such cases are covered. Another thing to keep in mind is that the questions below are difficult, and may not have simple yes-or-no answers. It may be necessary to enumerate sub-cases and gather some facts in order to decide how important or relevant a particular case is.
1. will MIAPA cover cases other than publication-associated archiving, i.e., will it provide coverage for trees generated by an online resource separate from any specific publication?
1. Is MIAPA restricted to studies involving molecular sequence data (as the MIAPA paper sometimes implies with its references to "alignments", as opposed to the more general term of "character-state data")?
1. is it restricted only to reports of branching phylogenies, as opposed to reports of comparative analyses that use evolutionary principles but do not derive a conventional tree (related: the Bayesian question below)? What if a method generates only splits, not (conventional) trees?
1. is the scope restricted to reports deriving a phylogeny from comparative biological data, as opposed to
1. studies that use an existing tree (perhaps pruned or grafted) to test models of character evolution, reconstruct ancestral states, or infer dates?
1. studies that derive phylogenies from simulated (not observed) data?
1. does it apply to Bayesian studies that integrate over a phylogenetic distribution to compute some statistic (e.g., [[http://dx.doi.org/10.1126/science.288.5475.2349][a gain-loss ratio]]) but do not construe any token tree or modal tree to be representative of this distribution?
1. does it apply to population studies for sexual organisms, where tree-like constructs are used, but have a somewhat different meaning, and different typical rules?
1. does it apply to tree-like constructs that can be construed as something other than a graph representing an inferred history (e.g., in the simplest case, a phenogram; or in a more complex case, a neighbor-joining tree, which some cladists don't accept as a "phylogeny" on the grounds of having been inferred using distances)
1. is it restricted to cases of phylogenetic inference, as opposed to cases in which a phylogeny is observed, generated (via computer simulations), or imposed as an experimental regime (e.g., in lab cultures)?
1. in general, will MIAPA arbitrarily exclude rare cases that clearly fall into the domain of phylogenetic analysis, but do not fit a standard workflow, e.g., Warnow and colleagues have developed an efficient method to infer a phylogeny from tens or hundreds of thousands of sequences without inferring an alignment of all the sequences. No alignment, therefore no MIAPA-compliant record?
@
1.17
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1319028593" format="1.1" reprev="1.17" version="1.17"}%
a2 4
---++ Unsorted ideas
Put your idea here. Adding your name, and links to other resources, would be helpful.
d71 1
a71 5
---+++ another idea
-- Main.ArlinStoltzfus - 07 Oct 2011
@
1.16
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="RossMounce" date="1318980142" format="1.1" reprev="1.16" version="1.16"}%
d10 11
a20 11
* will MIAPA cover cases other than publication-associated archiving, i.e., will it provide coverage for trees generated by an online resource separate from any specific publication?
* Is MIAPA restricted to studies involving molecular sequence data (as the MIAPA paper sometimes implies with its references to "alignments", as opposed to the more general term of "character-state data")?
* is it restricted only to reports of branching phylogenies, as opposed to reports of comparative analyses that use evolutionary principles but do not derive a conventional tree (related: the Bayesian question below)? What if a method generates only splits, not (conventional) trees?
* is the scope restricted to reports deriving a phylogeny from comparative biological data, as opposed to
* studies that use an existing tree (perhaps pruned or grafted) to test models of character evolution, reconstruct ancestral states, or infer dates?
* studies that derive phylogenies from simulated (not observed) data?
* does it apply to Bayesian studies that integrate over a phylogenetic distribution to compute some statistic (e.g., [[http://dx.doi.org/10.1126/science.288.5475.2349][a gain-loss ratio]]) but do not construe any token tree or modal tree to be representative of this distribution?
* does it apply to population studies for sexual organisms, where tree-like constructs are used, but have a somewhat different meaning, and different typical rules?
* does it apply to tree-like constructs that can be construed as something other than a graph representing an inferred history (e.g., in the simplest case, a phenogram; or in a more complex case, a neighbor-joining tree, which cladists don't accept as a "phylogeny" because it is a similarity-based clustering method rather than homology-based)
* is it restricted to cases of phylogenetic inference, as opposed to cases in which a phylogeny is observed, generated (via computer simulations), or imposed as an experimental regime (e.g., in lab cultures)?
* in general, will MIAPA arbitrarily exclude rare cases that clearly fall into the domain of phylogenetic analysis, but do not fit a standard workflow, e.g., Warnow and colleagues have developed an efficient method to infer a phylogeny from tens or hundreds of thousands of sequences without inferring an alignment of all the sequences. No alignment, therefore no MIAPA-compliant record?
@
1.15
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318966899" format="1.1" reprev="1.15" version="1.15"}%
d16 1
a16 1
* does it apply to Bayesian studies that integrate over a phylogenetic distribution to compute some statistic (e.g., a gain-loss ratio DOI:10.1126/science.288.5475.2349) but do not construe any token tree or modal tree to be representative of this distribution?
d18 1
a18 1
* does it apply to tree-like constructs that can be construed as something other than a graph representing an inferred history (e.g., in the simplest case, a phenogram; or in a more complex case, a neighbor-joining tree, which parsimonotheists don't accept as a "phylogeny")
d62 1
a62 1
One way to understand this situation is to study which resources promote re-use, and why. ToLWeb has hundreds of curators and a huge code-base, but it seems to be rarely used for research. By comparison, Phylomatic rocks, even though it is an extremely small project and is limited to plants. The NCBI taxonomy hierarchy provides the organizing framework for a number of projects (e.g., Rod Page's iPhylo; Ross Mounce has started [http://www.citeulike.org/user/rossmounce/tag/ncbitaxonomy a short list of citations for such projects]), in spite of the fact that it is not properly a phylogeny, and it is distributed in a form (SQL database dump) that most users would find intimidating.
d69 1
a69 1
* Zhang S-B, Ferry Slik JW, Zhang J-L, Cao K-F: Spatial patterns of wood traits in China are controlled by phylogeny and the environment. Global Ecology and Biogeography 2011, 20(2):241-250.
d71 1
a71 1
* Riek A: Allometry of milk intake at peak lactation. Mammalian Biology Zeitschrift fur Saugetierkunde 2011, 76(1):3-11.
@
1.14
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318958312" format="1.1" reprev="1.14" version="1.14"}%
d9 6
a14 6
What is the scope and complexity of "phylogenetic analysis" reports to which MIAPA will apply?
* will MIAPA provide coverage of cases other than publication-associated archiving, i.e., will it provide coverage for trees provided by an online resource separate from any specific publication?
* Is the scope restricted to studies involving molecular sequence data (as the MIAPA paper sometimes implies with its references to "alignments" as opposed to the more general term of "character-state data")?
* is it restricted only to reports of phylogenies, as opposed to reports of comparative analyses that use evolutionary principles but do not derive or report a tree (related: the Bayesian question below)?
* is the scope restricted to reports deriving a phylogeny, as opposed to
* studies that use an existing tree (perhaps pruned or grafted) for testing models of character evolution, reconstructing ancestral states, or dating?
d16 1
a16 1
* does it apply to Bayesian studies that integrate over a phylogenetic distribution to compute some other statistic (e.g., a gain-loss ratio DOI:10.1126/science.288.5475.2349) and do not report token trees or modal trees to represent this distribution?
d18 3
a20 1
* does it apply to tree-like constructs that can be construed, or that are typically construed, or that are construed by the originators, as something other than a graph representing an inferred history (e.g., dyed-in-the-wool cladists don't accept the output of NJ as a "phylogeny"; the parsimony-cladist interpretation of a cladogram is not an inferred history, apparently)
@
1.13
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="KarenCranstn" date="1318954612" format="1.1" version="1.13"}%
d10 1
d18 1
a18 1
* does it apply to tree-like constructs that can be construed, or that are typically construed, or that are construed by the originators, as something other than a graph representing a historical process of descent-with-modification (e.g., dyed-in-the-wool cladists don't accept the output of NJ as a "phylogeny"; yet the )
d77 1
a77 1
-- Main.ArlinStoltzfus - 07 Oct 2011@
1.12
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318953628" format="1.1" reprev="1.12" version="1.12"}%
a2 3
---++ Checklists
[[Group 2 checklist]]
d76 1
a76 1
-- Main.ArlinStoltzfus - 07 Oct 2011
@
1.11
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="KarenCranstn" date="1318948778" format="1.1" reprev="1.11" version="1.11"}%
d10 12
@
1.10
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318944856" format="1.1" version="1.10"}%
d3 3
d67 1
a67 1
-- Main.ArlinStoltzfus - 07 Oct 2011@
1.9
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318899612" format="1.1" version="1.9"}%
d47 1
a47 1
One way to understand this situation is to study which resources promote re-use, and why. ToLWeb has hundreds of curators and a huge code-base, but it seems to be rarely used for research. By comparison, Phylomatic rocks, even though it is an extremely small project and is limited to plants. The NCBI taxonomy hierarchy rocks even more, although it is not properly a phylogeny, and it is distributed in a form (SQL database dump) that most users would find intimidating.
@
1.8
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318890073" format="1.1" reprev="1.8" version="1.8"}%
d31 2
d64 1
a64 1
-- Main.ArlinStoltzfus - 07 Oct 2011
@
1.7
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318876800" format="1.1" version="1.7"}%
d24 1
a24 1
People have been talking about MIAPA for 5 years now, but there is not a single example of a MIAPA-compliant report. Why don't we make a few? We could use the [[strawman draft of MIAPA][http://www.evoio.org/wiki/MIAPA_Draft_RFC]]. Like most MIBBI standards, it does not specify a format. Therefore, we could use currently available formats, perhaps supplemented with metadata files. Or we could use a NEXUS file with comments (square-bracketed strings) to contain the MIAPA-relevant annotations.
d39 1
a39 1
Interested parties: Arlin Stoltzfus
d56 1
a56 1
Interested parties: Arlin Stoltzfus
d62 1
a62 1
-- Main.ArlinStoltzfus - 07 Oct 2011@
1.6
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="EnricoPontelli" date="1318843946" format="1.1" version="1.6"}%
d9 3
a11 3
* PhylOnt created by Leebens-Mack's team
* CDAO (including the v2 version that includes a classification of methods)
* the Ontology for Biomedical Investigations ([[http://obi-ontology.org/page/Main_Page][OBI]])
@
1.5
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318694811" format="1.1" version="1.5"}%
d7 9
@
1.4
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318607883" format="1.1" reprev="1.4" version="1.4"}%
d15 1
a15 1
People have been talking about MIAPA for 5 years now, but there is not a single example of a MIAPA-compliant report (maybe that explains the lack of progress?). So, why don't we make a few? We can use the [[strawman draft of MIAPA][http://www.evoio.org/wiki/MIAPA_Draft_RFC]] for guidelines. Like most MIBBI standards, it does not specify a format. Therefore, we could use currently available formats, perhaps supplemented with metadata files. Or we could use a NEXUS file with comments (square-bracketed strings) to contain the MIAPA-relevant annotations.
d32 1
a32 1
---+++ Sharing the tree of life: What can we learn from the re-use (and lack of re-use) of mega-phylogenies?
d34 1
a34 1
Here is an idea that is not fully worked out. In particular, it needs to be focused more toward standards (i.e., what kinds of standards would facilitate sharing the tree of life, given user needs and available resources).
d36 1
a36 1
Trees with large numbers of species have many uses in community ecology, bioinformatics, biodiversity studies, etc. Given the amount of money that NSF has poured into "assembling the tree of life" for the benefit of the scientific community, one would think that the scientific community would be benefiting from having a comprehensive tree to use in various types of analyses. However, the benefits seem to be slow in coming, and difficult to harvest.
d38 1
a38 3
One possible route to understanding how to improve this situation is to study which resources have promoted re-use and which ones have not. ToLWeb, APG, Phylomatic, NCBI all offer informatics resources that have been used as an organizing hierarchy of biological diversity. ToLWeb has been around for more than a decade, it has hundreds of curators and a huge code-base, but it seems to be rarely used for research. By comparison, Phylomatic rocks, even though it is an extremely small project and is limited to a plant tree. In terms of stimulating re-use, the NCBI taxonomy hierarchy rocks more, even though it is not properly a phylogeny, and it is distributed as an SQL database dump. What can we learn from this? For instance, the NCBI taxonomy hierarchy is comprehensive, accessible, computable, and readily linkable to a wealth of other resources via NCBI taxids.
Another way to learn about how to facilitate re-use of megatrees is to look at what users are doing. The studies listed below rely on a large tree to accomplish research objectives such as computing measures of phylogenetic diversity, or studying trait correlations among a large set of species. Maybe we can learn from these studies.
d53 1
a53 1
-- Main.ArlinStoltzfus - 07 Oct 2011
@
1.3
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="MaryamPanahiazar" date="1318448446" format="1.1" reprev="1.3" version="1.3"}%
d7 15
d51 1
a53 7
Interested parties: Jim Leebens-Mack and Maryam Panahiazar
---+++ Annotation Phylogeny Papers with PhylOnt Ontology and other NCBO ontologies
We implemented a tools to annotate concepts in Phylogeny related documents with concepts from ontology such as PhylOnt ontology and other related ontologies.
after annotation user can publish the annotations and retrieve annotation for re-use.
---+++ another idea
@
1.2
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318431976" format="1.1" reprev="1.2" version="1.2"}%
d38 6
@
1.1
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="ArlinStoltzfus" date="1318004109" format="1.1" reprev="1.1" version="1.1"}%
d17 19
@