wiki-archive/twiki/data/Phylogenetics/Group2Checklist.txt,v

head	1.1;
access;
symbols;
locks; strict;
comment	@# @;


1.1
date	2011.10.18.15.52.58;	author KarenCranstn;	state Exp;
branches;
next	;


desc
@none
@


1.1
log
@none
@
text
@%META:TOPICINFO{author="KarenCranstn" date="1318953178" format="1.1" reprev="1.1" version="1.1"}%
%META:TOPICPARENT{name="MIAPAWorkshop2011Ideas"}%
---++Checklist for group 2:
Members: Bill Piel, Jim Leebens-Mack, Karen Cranston, Teodor Georgiev

Initial list of terms:
   * topology in digital format
   * branch lengths
   * support values
   * method of analysis (ML / MP / Bayes)
   * samples with valid taxon names
   * character data
   * excluded characters
   * character sets / partitions
   * biorepository collection code
   * specimen number
   * locality data
   * !GenBank accession number
   * tissue sequenced
   * molecular or morphological
   * alignment method
   * consensus method
   * software & version
   * character labels and states for morphological data
   * evolutionary model
   * heuristic search parameters / MCMC settings
   * random seed
   * input trees for consensus / composite trees

Agreed-upon minimums
   * topology
   * support values
   * method of analysis (ML / MP / Bayes)
   * unique and valid OTU label (valid species name, or specimen identifier that could be found in a database somewhere - using some informatics magic)
   * alignment used to construct the tree
   * raw data (pre-cleaning and alignment, e.g. !GenBank ID)
   * alignment method
   * data assembly (how did we go from databased sequence data to final alignment); in the short to medium term, this would likely be text, not machine readable

Discussed:
   * branch lengths with units: ideal, but not required; morphological data may not produce meaningful branch lengths
   * metadata about OTU label
   * specimen data: if a study did new sequencing, specimen data should be included

Other interesting discussion:
   * distinction between what data should be available (the best practices) and how this information should be communicated (in text of paper, as digital object)
   * issue of something being required that might not be relevent for all studies (i.e. support values for a glommogram, valid species names for environmental samples)
   * best practices for analysis and data management: i.e. better to include all characters in matrix and exclude using software settings than removing those characters from the matrix itself


-- Main.KarenCranstn - 18 Oct 2011
@