wiki-archive/twiki/data/Phylogenetics/Group2Checklist.txt,v

head 1.1;
access;
symbols;
locks; strict;
comment @# @;
1.1
date 2011.10.18.15.52.58; author KarenCranstn; state Exp;
branches;
next ;
desc
@none
@
1.1
log
@none
@
text
@%META:TOPICINFO{author="KarenCranstn" date="1318953178" format="1.1" reprev="1.1" version="1.1"}%
%META:TOPICPARENT{name="MIAPAWorkshop2011Ideas"}%
---++Checklist for group 2:
Members: Bill Piel, Jim Leebens-Mack, Karen Cranston, Teodor Georgiev
Initial list of terms:
* topology in digital format
* branch lengths
* support values
* method of analysis (ML / MP / Bayes)
* samples with valid taxon names
* character data
* excluded characters
* character sets / partitions
* biorepository collection code
* specimen number
* locality data
* !GenBank accession number
* tissue sequenced
* molecular or morphological
* alignment method
* consensus method
* software & version
* character labels and states for morphological data
* evolutionary model
* heuristic search parameters / MCMC settings
* random seed
* input trees for consensus / composite trees
Agreed-upon minimums
* topology
* support values
* method of analysis (ML / MP / Bayes)
* unique and valid OTU label (valid species name, or specimen identifier that could be found in a database somewhere - using some informatics magic)
* alignment used to construct the tree
* raw data (pre-cleaning and alignment, e.g. !GenBank ID)
* alignment method
* data assembly (how did we go from databased sequence data to final alignment); in the short to medium term, this would likely be text, not machine readable
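
As a rough illustration only, the agreed-upon minimums above could be checked against a study record along these lines (Python). The field names, the example record and the not_applicable option are all hypothetical and were not decided by the group; the sketch only tests that something was supplied for each item.

```python
# Hypothetical field names, one per agreed-upon minimum in the list above.
REQUIRED_FIELDS = [
    "topology",
    "support_values",
    "analysis_method",
    "otu_labels",
    "alignment",
    "raw_data",
    "alignment_method",
    "data_assembly",
]


def missing_minimums(record, not_applicable=()):
    """Return required fields that are absent or empty in record.

    Fields named in not_applicable are skipped (an assumption of this
    sketch, for studies where an item does not apply).
    """
    return [
        field
        for field in REQUIRED_FIELDS
        if field not in not_applicable and not record.get(field)
    ]


# Made-up example record that lacks raw data and a data assembly note.
example = {
    "topology": "((A,B),C);",
    "support_values": [0.98],
    "analysis_method": "ML",
    "otu_labels": ["A", "B", "C"],
    "alignment": "alignment.nex",
    "alignment_method": "MUSCLE",
}

print(missing_minimums(example))
```

The check is deliberately shallow: it does not test whether a name is valid or whether an identifier resolves in a database.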
Discussed:
* branch lengths with units: ideal, but not required; morphological data may not produce meaningful branch lengths
* metadata about OTU label
* specimen data: if a study did new sequencing, specimen data should be included
Other interesting discussion:
* distinction between what data should be available (the best practices) and how this information should be communicated (in the text of the paper, or as a digital object)
* issue of something being required that might not be relevant for all studies (e.g. support values for a glommogram, valid species names for environmental samples)
* best practices for analysis and data management: e.g. it is better to include all characters in the matrix and exclude them using software settings than to remove those characters from the matrix itself
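
A minimal sketch of the last point, assuming a toy matrix held as a Python dictionary: the full matrix is stored untouched, and the excluded characters are recorded separately as an analysis setting. The taxon names, sequences and the included_view helper are made up for illustration.

```python
# Toy character matrix: one entry per OTU, one column per character.
matrix = {
    "taxon_a": "ACGTAC",
    "taxon_b": "ACGTTC",
    "taxon_c": "ATGTAC",
}

# Analysis setting: 0-based columns to exclude. The matrix is not edited.
excluded = {1, 4}


def included_view(matrix, excluded):
    """Build the analysed view without modifying the stored matrix."""
    return {
        name: "".join(c for i, c in enumerate(seq) if i not in excluded)
        for name, seq in matrix.items()
    }


view = included_view(matrix, excluded)
print(view["taxon_a"])    # characters actually analysed
print(matrix["taxon_a"])  # original data, still complete
```

Because the exclusion lives in a setting rather than in the data, the excluded characters stay visible to anyone who reuses the matrix.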
-- Main.KarenCranstn - 18 Oct 2011
@