generate new termlist, create instructions for workflow

2020-08-20 14:05:42 -05:00 · 2020-08-20 14:05:42 -05:00 · 8f67b01109
parent b1d1cbb926
commit 8f67b01109
6 changed files with 14568 additions and 34 deletions
--- a/build/README.md
+++ b/build/README.md
@ -1,4 +1,34 @@
-# Build script
+# Workflow for generating a new version of Darwin Core
+
+![workflow diagram](workflow_diagram.png)
+
+1. Download the CSV file for the terms in the namespace to be modified. The `dwc:` namespace terms are [here](https://github.com/tdwg/rs.tdwg.org/blob/master/terms/terms.csv), `dwciri:` terms are [here](https://github.com/tdwg/rs.tdwg.org/blob/master/iri/iri.csv), `dc:` terms are [here](https://github.com/tdwg/rs.tdwg.org/blob/master/dc-for-dwc/dc-for-dwc.csv), and the `dcterms:` terms are [here](https://github.com/tdwg/rs.tdwg.org/blob/master/dcterms-for-dwc/dcterms-for-dwc.csv).
+2. **Note:** *It is highly recommended that you do not hand-edit the raw CSVs with a text editor. Use LibreOffice or OpenOffice (NOT Excel). They reliably open, edit, and save the file while preserving and escaping commas, quotes, etc., and they will not corrupt the UTF-8 encoding if you set them up properly.* Delete the columns that do not serve as input to the system: `document_modified`, `term_isDefinedBy`, `term_created`, `term_modified`, `term_deprecated`, `replaces_term`, `replaces1_term`, and `replaces2_term`. That should leave `term_localName` and every column from `label` onwards to the right.
+3. Delete any rows whose terms are not being modified.
+4. Edit the cells whose values need to be changed.
+5. If a new term is being added, fill in a new row anywhere below the header row.
+6. Special care must be taken if columns are added (i.e. metadata properties are added). This is not for the faint of heart! The new columns must be added to every file used as source data for the various scripts and the column header mapping files also need to be edited. See [this page](for more details). This should be a rare event. DO NOT ever delete columns! If you want to eliminate values for a property, just leave empty strings in all of the cells of that property's column.
+7. Create a new branch (or fork if you don't have push rights) of the [rs.tdwg.org repo](https://github.com/tdwg/rs.tdwg.org). Save your edited CSV file using some notable name in the [process](https://github.com/tdwg/rs.tdwg.org/tree/master/process) directory. 
+8. Open the [simplified_process_rs_tdwg_org.ipynb](https://github.com/tdwg/rs.tdwg.org/blob/master/process/simplified_process_rs_tdwg_org.ipynb) Jupyter notebook and follow [these instructions](https://github.com/tdwg/rs.tdwg.org/blob/master/process/process-vocabulary.md#21-setup) to edit the configuration section of the script. 
+9. Run the script, paying careful attention to whether particular sections are appropriate for what you are trying to accomplish. NOTE: there are still some kinks to be worked out for the borrowed terms (`dc:` and `dcterms:` namespaces), but changes there should be rare. It is useful to monitor the diffs that are generated as sections of the script are run and make sure that the changes are reasonable. This is easily monitored if you are using the GitHub desktop client.
+10. If there are changes to more than one namespace, repeat all of the previous steps with the second namespace before continuing on.
+11. When you are satisfied that all of the term, term list, vocabulary, and standards metadata changes are sensible, discard the changes made to the Jupyter notebook so that it will remain in its "example" stage when the branch (or fork) is merged. Alternatively, you can download the "example" notebook from GitHub to overwrite the version that you modified, and commit it to the branch.
+12. As of 2020-08-20, updating rs.tdwg.org document metadata must be done manually. Steve Baskauf knows how to do it and will try to eventually write a script to automate the process. It's best to ask him to do the updating before merging the branch.
+13. Push the branch to GitHub and create a pull request. It is best for someone to review the changes carefully before merging.
+14. Once the branch has been merged the data are available via HTTP to the other scripts that use those data. 
+15. Create a branch of the [Darwin Core repo](https://github.com/tdwg/dwc). 
+16. Edit the [termlist-header.md](https://github.com/tdwg/dwc/blob/master/build/termlist-header.md) file, changing "Date version issued", the "This version" URL, the version URL in the citation, and the date in the "1 Introduction" to the date used for the new version of Darwin Core. Change the "Previous version" date to the date of the version that is being replaced. Save the file.
+17. Go to the `docs/list/` directory and change the name of the `index.md` file to the date of the version being replaced (e.g. `2020-08-12.md`). Open that file and add a "Replaced by" label and value to the IRI of the new version (see an older version for an example). Save the file.
+18. Run the script [build-termlist.py](https://github.com/tdwg/dwc/blob/master/build/build-termlist.py). Be patient since some steps take a few seconds. When the `Done` message appears, it's finished.
+19. Check the diff for the newly generated `index.md` file in the [docs/list/](https://github.com/tdwg/dwc/tree/master/docs/list) directory and make sure that the changes are appropriate.
+20. Run the [generate_term_versions.py](https://github.com/tdwg/dwc/blob/master/build/generate_term_versions.py) script to generate a new version of [term_versions.csv](https://github.com/tdwg/dwc/blob/master/vocabulary/term_versions.csv). This file serves as the source of data for the build script in the next step. At some point, that script may be modified to eliminate this intermediate step. 
+21. Run the [build.py](https://github.com/tdwg/dwc/blob/master/build/build.py) script to build the Quick Reference Guide.
+22. Create a pull request for the new branch.
+23. When the branch has been reviewed carefully, merge the branch. The new pages should be live as soon as Jekyll rebuilds them on GitHub.
+24. Term dereferencing to human- and machine-readable representations is handled by a server managed by GBIF. Ask Matt Blissett to reload the data from the `rs.tdwg.org` repo into the server (he has a script to do it). Because dereferencing of current terms to human-readable web pages is handled by a redirect, there won't be any noticeable difference whether the data are reloaded in this step or not. But dereferencing the term versions, or dereferencing to acquire machine-readable metadata, will not reflect the new changes until the server is reloaded.
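Steps 2 and 3 of the workflow above can also be scripted instead of being done in a spreadsheet. The following is a minimal sketch, not part of this commit, that uses only the Python standard library `csv` module, which takes care of quoting embedded commas. The function name `drop_non_input_columns`, its `keep_terms` parameter, and the sample rows are hypothetical; only the column names are taken from the README.

```python
import csv
import io

# Columns that step 2 of the README lists as not serving as input.
NON_INPUT_COLUMNS = {
    'document_modified', 'term_isDefinedBy', 'term_created', 'term_modified',
    'term_deprecated', 'replaces_term', 'replaces1_term', 'replaces2_term',
}

def drop_non_input_columns(csv_text, keep_terms=None):
    """Return CSV text without the non-input columns.

    keep_terms is an optional set of term_localName values. When given,
    rows for all other terms are dropped (step 3 of the README).
    """
    reader = csv.DictReader(io.StringIO(csv_text, newline=''))
    kept_fields = [f for f in reader.fieldnames if f not in NON_INPUT_COLUMNS]
    out = io.StringIO(newline='')
    writer = csv.DictWriter(out, fieldnames=kept_fields,
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in reader:
        if keep_terms is None or row['term_localName'] in keep_terms:
            writer.writerow(row)
    return out.getvalue()

# Made-up sample data; the comma inside the second label exercises quoting.
sample = (
    'document_modified,term_localName,label,term_modified\n'
    '2020-08-12,endDayOfYear,End Day Of Year,2017-10-06\n'
    '2020-08-12,footprintSpatialFit,"Footprint, Spatial Fit",2017-10-06\n'
)
result = drop_non_input_columns(sample, keep_terms={'footprintSpatialFit'})
print(result)
```

For a real file, the text would be read and written with `encoding='utf-8'` and `newline=''` so that the encoding and line endings are preserved. Checking the result in a diff before committing is still advisable.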
+
+
+## Build script

 The build script `build.py` uses as input:

--- a/build/build-termlist.py
+++ b/build/build-termlist.py
@ -0,0 +1,403 @@
+# Script to build Markdown pages that provide term metadata for complex vocabularies
+# Steve Baskauf 2020-08-12 CC0
+# This script merges static Markdown header and footer documents with term information tables (in Markdown) generated from data in the rs.tdwg.org repo from the TDWG Github site
+
+import re
+import requests   # best library to manage HTTP transactions
+import csv        # library to read/write/parse CSV files
+import json       # library to convert JSON to Python data structures
+import pandas as pd
+
+# -----------------
+# Configuration section
+# -----------------
+
+# !!!! NOTE !!!!
+# There is not currently an example of a complex vocabulary that has the column headers
+# used in the sample files. In order to test this script, it uses the Audubon Core files,
+# which have headers that differ from the samples. So throughout the code, there are
+# pairs of lines where the default header names are commented out and the Audubon Core
+# headers are not. To build a page using the sample files, you will need to reverse the
+# commenting of these pairs.
+
+# This is the base URL for raw files from the branch of the repo that has been pushed to GitHub
+githubBaseUri = 'https://raw.githubusercontent.com/tdwg/rs.tdwg.org/master/'
+
+headerFileName = 'termlist-header.md'
+footerFileName = 'termlist-footer.md'
+outFileName = '../docs/list/index.md'
+
+# This is a Python list of the database names of the term lists to be included in the document.
+termLists = ['terms', 'iri', 'dc-for-dwc', 'dcterms-for-dwc']
+#termLists = ['pathway']
+
+# NOTE! There may be problems unless every term list is of the same vocabulary type since the number of columns will differ
+# However, there probably aren't any circumstances where mixed types will be used to generate the same page.
+vocab_type = 1 # 1 is simple vocabulary, 2 is simple controlled vocabulary, 3 is c.v. with broader hierarchy
+
+# Terms in large vocabularies like Darwin and Audubon Cores may be organized into categories using tdwgutility_organizedInClass
+# If so, those categories can be used to group terms in the generated term list document.
+organized_in_categories = True
+
+# If organized in categories, the display_order list must contain the IRIs that are values of tdwgutility_organizedInClass
+# If not organized into categories, the value is irrelevant. There just needs to be one item in the list.
+display_order = ['', 'http://purl.org/dc/elements/1.1/', 'http://purl.org/dc/terms/', 'http://rs.tdwg.org/dwc/terms/Occurrence', 'http://rs.tdwg.org/dwc/terms/Organism', 'http://rs.tdwg.org/dwc/terms/MaterialSample', 'http://rs.tdwg.org/dwc/terms/Event', 'http://purl.org/dc/terms/Location', 'http://rs.tdwg.org/dwc/terms/GeologicalContext', 'http://rs.tdwg.org/dwc/terms/Identification', 'http://rs.tdwg.org/dwc/terms/Taxon', 'http://rs.tdwg.org/dwc/terms/MeasurementOrFact', 'http://rs.tdwg.org/dwc/terms/ResourceRelationship', 'http://rs.tdwg.org/dwc/terms/attributes/UseWithIRI']
+display_label = ['Record level', 'Dublin Core legacy namespace', 'Dublin Core terms namespace', 'Occurrence', 'Organism', 'Material Sample', 'Event', 'Location', 'Geological Context', 'Identification', 'Taxon', 'Measurement or Fact', 'Resource Relationship', 'IRI-value terms']
+display_comments = ['','','','','','','','','','','','','','']
+display_id = ['record_level', 'dc', 'dcterms', 'occurrence', 'organism', 'material_sample', 'event', 'location', 'geological_context', 'identification', 'taxon', 'measurement_or_fact', 'resource_relationship', 'use_with_iri']
+
+#display_order = ['']
+#display_label = ['Vocabulary'] # these are the section labels for the categories in the page
+#display_comments = [''] # these are the comments about the category to be appended following the section labels
+#display_id = ['Vocabulary'] # these are the fragment identifiers for the associated sections for the categories
+
+# ---------------
+# Function definitions
+# ---------------
+
+# replace URL with link
+#
+def createLinks(text):
+    def repl(match):
+        if match.group(1)[-1] == '.':
+            return '<a href="' + match.group(1)[:-1] + '">' + match.group(1)[:-1] + '</a>.'
+        return '<a href="' + match.group(1) + '">' + match.group(1) + '</a>'
+
+    pattern = r'(https?://[^\s,;\)"]*)' # raw string so that \s and \) are passed to the regex engine unchanged
+    result = re.sub(pattern, repl, text)
+    return result
+
+# ---------------
+# Retrieve term list metadata from GitHub
+# ---------------
+
+print('Retrieving term list metadata from GitHub')
+term_lists_info = []
+
+frame = pd.read_csv(githubBaseUri + 'term-lists/term-lists.csv', na_filter=False)
+for termList in termLists:
+    # Start with the database name; the remaining keys are filled in from term-lists.csv below
+    term_list_dict = {'database': termList}
+    for index,row in frame.iterrows():
+        if row['database'] == termList:
+            term_list_dict['pref_ns_prefix'] = row['vann_preferredNamespacePrefix']
+            term_list_dict['pref_ns_uri'] = row['vann_preferredNamespaceUri']
+            term_list_dict['list_iri'] = row['list']
+    term_lists_info.append(term_list_dict)
+print(term_lists_info)
+print()
+
+# ---------------
+# Create metadata table and populate using data from namespace databases in GitHub
+# ---------------
+
+# Create column list
+column_list = ['pref_ns_prefix', 'pref_ns_uri', 'term_localName', 'label', 'rdfs_comment', 'dcterms_description', 'examples', 'term_modified', 'term_deprecated', 'rdf_type', 'replaces_term', 'replaces1_term']
+#column_list = ['pref_ns_prefix', 'pref_ns_uri', 'term_localName', 'label', 'definition', 'usage', 'notes', 'term_modified', 'term_deprecated', 'type']
+if vocab_type == 2:
+    column_list += ['controlled_value_string']
+elif vocab_type == 3:
+    column_list += ['controlled_value_string', 'skos_broader']
+if organized_in_categories:
+    column_list.append('tdwgutility_organizedInClass')
+column_list.append('version_iri')
+
+print('Retrieving metadata about terms from all namespaces from GitHub')
+# Create list of lists metadata table
+table_list = []
+for term_list in term_lists_info:
+    # retrieve versions metadata for term list
+    versions_url = githubBaseUri + term_list['database'] + '-versions/' + term_list['database'] + '-versions.csv'
+    versions_df = pd.read_csv(versions_url, na_filter=False)
+    
+    # retrieve current term metadata for term list
+    data_url = githubBaseUri + term_list['database'] + '/' + term_list['database'] + '.csv'
+    frame = pd.read_csv(data_url, na_filter=False)
+    for index,row in frame.iterrows():
+        row_list = [term_list['pref_ns_prefix'], term_list['pref_ns_uri'], row['term_localName'], row['label'], row['rdfs_comment'], row['dcterms_description'], row['examples'], row['term_modified'], row['term_deprecated'], row['rdf_type'], row['replaces_term'], row['replaces1_term']]
+        #row_list = [term_list['pref_ns_prefix'], term_list['pref_ns_uri'], row['term_localName'], row['label'], row['definition'], row['usage'], row['notes'], row['term_modified'], row['term_deprecated'], row['type']]
+        if vocab_type == 2:
+            row_list += [row['controlled_value_string']]
+        elif vocab_type == 3:
+            if row['skos_broader'] =='':
+                row_list += [row['controlled_value_string'], '']
+            else:
+                row_list += [row['controlled_value_string'], term_list['pref_ns_prefix'] + ':' + row['skos_broader']]
+        if organized_in_categories:
+            row_list.append(row['tdwgutility_organizedInClass'])
+
+        # Borrowed terms really don't have implemented versions. They may be lacking values for version_status.
+        # In their case, their version IRI will be omitted.
+        found = False
+        for vindex, vrow in versions_df.iterrows():
+            if vrow['term_localName']==row['term_localName'] and vrow['version_status']=='recommended':
+                found = True
+                version_iri = vrow['version']
+                # NOTE: the current hack for non-TDWG terms without a version is to append # to the end of the term IRI
+                if version_iri.endswith('#'):
+                    version_iri = ''
+        if not found:
+            version_iri = ''
+        row_list.append(version_iri)
+
+        table_list.append(row_list)
+
+# Turn list of lists into dataframe
+terms_df = pd.DataFrame(table_list, columns = column_list)
+
+terms_sorted_by_label = terms_df.sort_values(by='label')
+#terms_sorted_by_localname = terms_df.sort_values(by='term_localName')
+
+# This makes sort case insensitive
+terms_sorted_by_localname = terms_df.iloc[terms_df.term_localName.str.lower().argsort()]
+#terms_sorted_by_localname
+print('done retrieving')
+print()
+
+# ---------------
+# generate the index of terms grouped by category and sorted alphabetically by lowercase term local name
+# ---------------
+
+print('Generating term index by CURIE')
+text = '### 3.1 Index By Term Name\n\n'
+text += '(See also [3.2 Index By Label](#32-index-by-label))\n\n'
+
+text += '**Classes**\n'
+text += '\n'
+for row_index,row in terms_sorted_by_localname.iterrows():
+    if row['rdf_type'] == 'http://www.w3.org/2000/01/rdf-schema#Class':
+        curie = row['pref_ns_prefix'] + ":" + row['term_localName']
+        curie_anchor = curie.replace(':','_')
+        text += '[' + curie + '](#' + curie_anchor + ') |\n'
+text = text[:len(text)-2] # remove final trailing vertical bar and newline
+text += '\n\n' # put back removed newline
+
+for category in range(0,len(display_order)):
+    text += '**' + display_label[category] + '**\n'
+    text += '\n'
+    if organized_in_categories:
+        filtered_table = terms_sorted_by_localname[terms_sorted_by_localname['tdwgutility_organizedInClass']==display_order[category]]
+        filtered_table.reset_index(drop=True, inplace=True)
+    else:
+        filtered_table = terms_sorted_by_localname
+        
+    for row_index,row in filtered_table.iterrows():
+        if row['rdf_type'] != 'http://www.w3.org/2000/01/rdf-schema#Class':
+            curie = row['pref_ns_prefix'] + ":" + row['term_localName']
+            curie_anchor = curie.replace(':','_')
+            text += '[' + curie + '](#' + curie_anchor + ') |\n'
+    text = text[:len(text)-2] # remove final trailing vertical bar and newline
+    text += '\n\n' # put back removed newline
+
+index_by_name = text
+
+#print(index_by_name)
+print()
+
+# ---------------
+# generate the index of terms by label
+# ---------------
+
+print('Generating term index by label')
+text = '\n\n'
+
+# Comment out the following two lines if there is no index by local names
+text = '### 3.2 Index By Label\n\n'
+text += '(See also [3.1 Index By Term Name](#31-index-by-term-name))\n\n'
+
+text += '**Classes**\n'
+text += '\n'
+for row_index,row in terms_sorted_by_label.iterrows():
+    if row['rdf_type'] == 'http://www.w3.org/2000/01/rdf-schema#Class':
+        curie_anchor = row['pref_ns_prefix'] + "_" + row['term_localName']
+        text += '[' + row['label'] + '](#' + curie_anchor + ') |\n'
+text = text[:len(text)-2] # remove final trailing vertical bar and newline
+text += '\n\n' # put back removed newline
+
+for category in range(0,len(display_order)):
+    if organized_in_categories:
+        text += '**' + display_label[category] + '**\n'
+        text += '\n'
+        filtered_table = terms_sorted_by_label[terms_sorted_by_label['tdwgutility_organizedInClass']==display_order[category]]
+        filtered_table.reset_index(drop=True, inplace=True)
+    else:
+        filtered_table = terms_sorted_by_label
+        
+    for row_index,row in filtered_table.iterrows():
+        if row_index == 0 or (row_index != 0 and row['label'] != filtered_table.iloc[row_index - 1].loc['label']): # this is a hack to prevent duplicate labels
+            if row['rdf_type'] != 'http://www.w3.org/2000/01/rdf-schema#Class':
+                curie_anchor = row['pref_ns_prefix'] + "_" + row['term_localName']
+                text += '[' + row['label'] + '](#' + curie_anchor + ') |\n'
+    text = text[:len(text)-2] # remove final trailing vertical bar and newline
+    text += '\n\n' # put back removed newline
+
+index_by_label = text
+print()
+
+#print(index_by_label)
+
+decisions_df = pd.read_csv('https://raw.githubusercontent.com/tdwg/rs.tdwg.org/master/decisions/decisions-links.csv', na_filter=False)
+
+# ---------------
+# generate a table for each term, with terms grouped by category
+# ---------------
+
+print('Generating terms table')
+# generate the Markdown for the terms table
+text = '## 4 Vocabulary\n'
+if True:
+    filtered_table = terms_sorted_by_localname
+
+#for category in range(0,len(display_order)):
+#    if organized_in_categories:
+#        text += '### 4.' + str(category + 1) + ' ' + display_label[category] + '\n'
+#        text += '\n'
+#        text += display_comments[category] # insert the comments for the category, if any.
+#        filtered_table = terms_sorted_by_localname[terms_sorted_by_localname['tdwgutility_organizedInClass']==display_order[category]]
+#        filtered_table.reset_index(drop=True, inplace=True)
+#    else:
+#        filtered_table = terms_sorted_by_localname
+
+    for row_index,row in filtered_table.iterrows():
+        text += '<table>\n'
+        curie = row['pref_ns_prefix'] + ":" + row['term_localName']
+        curieAnchor = curie.replace(':','_')
+        text += '\t<thead>\n'
+        text += '\t\t<tr>\n'
+        text += '\t\t\t<th colspan="2"><a id="' + curieAnchor + '"></a>Term Name  ' + curie + '</th>\n'
+        text += '\t\t</tr>\n'
+        text += '\t</thead>\n'
+        text += '\t<tbody>\n'
+        text += '\t\t<tr>\n'
+        text += '\t\t\t<td>Term IRI</td>\n'
+        uri = row['pref_ns_uri'] + row['term_localName']
+        text += '\t\t\t<td><a href="' + uri + '">' + uri + '</a></td>\n'
+        text += '\t\t</tr>\n'
+        text += '\t\t<tr>\n'
+        text += '\t\t\t<td>Modified</td>\n'
+        text += '\t\t\t<td>' + row['term_modified'] + '</td>\n'
+        text += '\t\t</tr>\n'
+
+        if row['version_iri'] != '':
+            text += '\t\t<tr>\n'
+            text += '\t\t\t<td>Term version IRI</td>\n'
+            text += '\t\t\t<td><a href="' + row['version_iri'] + '">' + row['version_iri'] + '</a></td>\n'
+            text += '\t\t</tr>\n'
+
+        text += '\t\t<tr>\n'
+        text += '\t\t\t<td>Label</td>\n'
+        text += '\t\t\t<td>' + row['label'] + '</td>\n'
+        text += '\t\t</tr>\n'
+
+        if row['term_deprecated'] != '':
+            text += '\t\t<tr>\n'
+            text += '\t\t\t<td></td>\n'
+            text += '\t\t\t<td><strong>This term is deprecated and should no longer be used.</strong></td>\n'
+            text += '\t\t</tr>\n'
+
+            for dep_index,dep_row in filtered_table.iterrows():
+                if dep_row['replaces_term'] == uri:
+                    text += '\t\t<tr>\n'
+                    text += '\t\t\t<td>Is replaced by</td>\n'
+                    text += '\t\t\t<td><a href="#' + dep_row['pref_ns_prefix'] + "_" + dep_row['term_localName'] + '">' + dep_row['pref_ns_uri'] + dep_row['term_localName'] + '</a></td>\n'
+                    text += '\t\t</tr>\n'
+                if dep_row['replaces1_term'] == uri:
+                    text += '\t\t<tr>\n'
+                    text += '\t\t\t<td>Is replaced by</td>\n'
+                    text += '\t\t\t<td><a href="#' + dep_row['pref_ns_prefix'] + "_" + dep_row['term_localName'] + '">' + dep_row['pref_ns_uri'] + dep_row['term_localName'] + '</a></td>\n'
+                    text += '\t\t</tr>\n'
+
+        text += '\t\t<tr>\n'
+        text += '\t\t\t<td>Definition</td>\n'
+        text += '\t\t\t<td>' + row['rdfs_comment'] + '</td>\n'
+        #text += '\t\t\t<td>' + row['definition'] + '</td>\n'
+        text += '\t\t</tr>\n'
+
+        if row['dcterms_description'] != '':
+        #if row['notes'] != '':
+            text += '\t\t<tr>\n'
+            text += '\t\t\t<td>Notes</td>\n'
+            text += '\t\t\t<td>' + createLinks(row['dcterms_description']) + '</td>\n'
+            #text += '\t\t\t<td>' + createLinks(row['notes']) + '</td>\n'
+            text += '\t\t</tr>\n'
+
+        if row['examples'] != '':
+        #if row['usage'] != '':
+            text += '\t\t<tr>\n'
+            text += '\t\t\t<td>Examples</td>\n'
+            text += '\t\t\t<td>' + createLinks(row['examples']) + '</td>\n'
+            #text += '\t\t\t<td>' + createLinks(row['usage']) + '</td>\n'
+            text += '\t\t</tr>\n'
+
+        if vocab_type == 2 or vocab_type ==3: # controlled vocabulary
+            text += '\t\t<tr>\n'
+            text += '\t\t\t<td>Controlled value</td>\n'
+            text += '\t\t\t<td>' + row['controlled_value_string'] + '</td>\n'
+            text += '\t\t</tr>\n'
+
+        if vocab_type == 3 and row['skos_broader'] != '': # controlled vocabulary with skos:broader relationships
+            text += '\t\t<tr>\n'
+            text += '\t\t\t<td>Has broader concept</td>\n'
+            curieAnchor = row['skos_broader'].replace(':','_')
+            text += '\t\t\t<td><a href="#' + curieAnchor + '">' + row['skos_broader'] + '</a></td>\n'
+            text += '\t\t</tr>\n'
+
+        text += '\t\t<tr>\n'
+        text += '\t\t\t<td>Type</td>\n'
+        if row['rdf_type'] == 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property':
+        #if row['type'] == 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property':
+            text += '\t\t\t<td>Property</td>\n'
+        elif row['rdf_type'] == 'http://www.w3.org/2000/01/rdf-schema#Class':
+        #elif row['type'] == 'http://www.w3.org/2000/01/rdf-schema#Class':
+            text += '\t\t\t<td>Class</td>\n'
+        elif row['rdf_type'] == 'http://www.w3.org/2004/02/skos/core#Concept':
+        #elif row['type'] == 'http://www.w3.org/2004/02/skos/core#Concept':
+            text += '\t\t\t<td>Concept</td>\n'
+        else:
+            text += '\t\t\t<td>' + row['rdf_type'] + '</td>\n' # this should rarely happen
+            #text += '\t\t\t<td>' + row['type'] + '</td>\n' # this should rarely happen
+        text += '\t\t</tr>\n'
+
+        # Look up decisions related to this term
+        for drow_index,drow in decisions_df.iterrows():
+            if drow['linked_affected_resource'] == uri:
+                text += '\t\t<tr>\n'
+                text += '\t\t\t<td>Executive Committee decision</td>\n'
+                text += '\t\t\t<td><a href="http://rs.tdwg.org/decisions/' + drow['decision_localName'] + '">http://rs.tdwg.org/decisions/' + drow['decision_localName'] + '</a></td>\n'
+                text += '\t\t</tr>\n'                        
+
+        text += '\t</tbody>\n'
+        text += '</table>\n'
+        text += '\n'
+    text += '\n'
+term_table = text
+print('done generating')
+print()
+
+#print(term_table)
+
+# ---------------
+# Merge term table with header and footer Markdown, then save file
+# ---------------
+
+print('Merging term table with header and footer and saving file')
+#text = index_by_label + term_table
+text = index_by_name + index_by_label + term_table
+
+# read in header and footer, merge with terms table, and output
+
+headerObject = open(headerFileName, 'rt', encoding='utf-8')
+header = headerObject.read()
+headerObject.close()
+
+footerObject = open(footerFileName, 'rt', encoding='utf-8')
+footer = footerObject.read()
+footerObject.close()
+
+output = header + text + footer
+outputObject = open(outFileName, 'wt', encoding='utf-8')
+outputObject.write(output)
+outputObject.close()
+    
+print('done')
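The behavior of the `createLinks` function in the script above can be checked in isolation. The following is a standalone sketch that mirrors its logic under the hypothetical name `create_links`, with the pattern written as a raw string. The first sample sentence is adapted from the `footprintSpatialFit` note in this commit; the second is made up to exercise the trailing-period case.

```python
import re

def create_links(text):
    """Wrap each http(s) URL found in text in an HTML anchor element."""
    def repl(match):
        url = match.group(1)
        # A period directly after a URL is treated as sentence punctuation
        # and is moved outside the link.
        if url.endswith('.'):
            return '<a href="' + url[:-1] + '">' + url[:-1] + '</a>.'
        return '<a href="' + url + '">' + url + '</a>'

    # A URL ends at whitespace, a comma, a semicolon, a closing
    # parenthesis, or a double quote.
    pattern = r'(https?://[^\s,;\)"]*)'
    return re.sub(pattern, repl, text)

note = 'See Chapman and Wieczorek, 2020 (https://doi.org/10.15468/doc-gg7h-s853).'
linked_note = create_links(note)

sentence = 'Details at http://rs.tdwg.org/dwc/terms/.'
linked_sentence = create_links(sentence)

print(linked_note)
print(linked_sentence)
```

Because the closing parenthesis ends the match, the parenthesis and the final period in the first sentence stay outside the anchor. URLs that legitimately contain a comma, semicolon, or parenthesis would be cut short by this pattern.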
+
--- a/build/termlist-header.md
+++ b/build/termlist-header.md
@ -4,7 +4,7 @@ Title
 : List of Darwin Core terms

 Date version issued
-: 2020-08-12
+: 2020-08-20

 Date created
 : 2020-08-12
@ -13,11 +13,14 @@ Part of TDWG Standard
 : <http://www.tdwg.org/standards/450>

 This version
-: <http://rs.tdwg.org/dwc/doc/list/2020-08-12>
+: <http://rs.tdwg.org/dwc/doc/list/2020-08-20>

 Latest version
 : <http://rs.tdwg.org/dwc/doc/list/>

+Previous version
+: <http://rs.tdwg.org/dwc/doc/list/2020-08-12>
+
 Abstract
 : Darwin Core is a vocabulary standard for transmitting information about biodiversity. This document lists all terms in namespaces currently used in the vocabulary.

@ -28,12 +31,12 @@ Creator
 : TDWG Darwin Core Maintenance Group

 Bibliographic citation
-: Darwin Core Maintenance Group. 2020. List of Darwin Core terms. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/doc/list/2020-08-12>
+: Darwin Core Maintenance Group. 2020. List of Darwin Core terms. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/doc/list/2020-08-20>


 ## 1 Introduction (Informative)

-This document contains terms that are part of the most recent version of the Darwin Core vocabulary (http://rs.tdwg.org/version/dwc/2020-08-12).
+This document contains terms that are part of the most recent version of the Darwin Core vocabulary (http://rs.tdwg.org/version/dwc/2020-08-20).

 This document includes terms in four namespaces that contain recommended terms: `dwc:`, `dwciri:`, `dc:`, and `dcterms:`. However, some terms in these namespaces are deprecated and should no longer be used. Deprecation is noted in the term metadata. Namespaces that contain only deprecated terms are not included in this document, but metadata about those terms can be retrieved by dereferencing their IRIs.

--- a/build/workflow_diagram.png
+++ b/build/workflow_diagram.png
--- a/docs/list/2020-08-12.md
+++ b/docs/list/2020-08-12.md
--- a/docs/list/index.md
+++ b/docs/list/index.md
@ -4,7 +4,7 @@ Title
 : List of Darwin Core terms

 Date version issued
-: 2020-08-12
+: 2020-08-20

 Date created
 : 2020-08-12
@ -13,11 +13,14 @@ Part of TDWG Standard
 : <http://www.tdwg.org/standards/450>

 This version
-: <http://rs.tdwg.org/dwc/doc/list/2020-08-12>
+: <http://rs.tdwg.org/dwc/doc/list/2020-08-20>

 Latest version
 : <http://rs.tdwg.org/dwc/doc/list/>

+Previous version
+: <http://rs.tdwg.org/dwc/doc/list/2020-08-12>
+
 Abstract
 : Darwin Core is a vocabulary standard for transmitting information about biodiversity. This document lists all terms in namespaces currently used in the vocabulary.

@ -28,12 +31,12 @@ Creator
 : TDWG Darwin Core Maintenance Group

 Bibliographic citation
-: Darwin Core Maintenance Group. 2020. List of Darwin Core terms. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/doc/list/2020-08-12>
+: Darwin Core Maintenance Group. 2020. List of Darwin Core terms. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/doc/list/2020-08-20>


 ## 1 Introduction (Informative)

-This document contains terms that are part of the most recent version of the Darwin Core vocabulary (http://rs.tdwg.org/version/dwc/2020-08-12).
+This document contains terms that are part of the most recent version of the Darwin Core vocabulary (http://rs.tdwg.org/version/dwc/2020-08-20).

 This document includes terms in four namespaces that contain recommended terms: `dwc:`, `dwciri:`, `dc:`, and `dcterms:`. However, some terms in these namespaces are deprecated and should no longer be used. Deprecation is noted in the term metadata. Namespaces that contain only deprecated terms are not included in this document, but metadata about those terms can be retrieved by dereferencing their IRIs.

@ -3036,11 +3039,11 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Modified</td>
-			<td>2017-10-06</td>
+			<td>2020-08-20</td>
 		</tr>
 		<tr>
 			<td>Term version IRI</td>
-			<td><a href="http://rs.tdwg.org/dwc/terms/version/endDayOfYear-2017-10-06">http://rs.tdwg.org/dwc/terms/version/endDayOfYear-2017-10-06</a></td>
+			<td><a href="http://rs.tdwg.org/dwc/terms/version/endDayOfYear-2020-08-20">http://rs.tdwg.org/dwc/terms/version/endDayOfYear-2020-08-20</a></td>
 		</tr>
 		<tr>
 			<td>Label</td>
@ -3048,7 +3051,7 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Definition</td>
-			<td>The latest ordinal day of the year on which the Event occurred (1 for January 1, 365 for December 31, except in a leap year, in which case it is 366).</td>
+			<td>The latest integer day of the year on which the Event occurred (1 for January 1, 365 for December 31, except in a leap year, in which case it is 366).</td>
 		</tr>
 		<tr>
 			<td>Examples</td>
@ -4362,11 +4365,11 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Modified</td>
-			<td>2017-10-06</td>
+			<td>2020-08-20</td>
 		</tr>
 		<tr>
 			<td>Term version IRI</td>
-			<td><a href="http://rs.tdwg.org/dwc/terms/version/footprintSpatialFit-2017-10-06">http://rs.tdwg.org/dwc/terms/version/footprintSpatialFit-2017-10-06</a></td>
+			<td><a href="http://rs.tdwg.org/dwc/terms/version/footprintSpatialFit-2020-08-20">http://rs.tdwg.org/dwc/terms/version/footprintSpatialFit-2020-08-20</a></td>
 		</tr>
 		<tr>
 			<td>Label</td>
@ -4374,15 +4377,15 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Definition</td>
-			<td>The ratio of the area of the footprint (footprintWKT) to the area of the true (original, or most specific) spatial representation of the Location. Legal values are 0, greater than or equal to 1, or undefined. A value of 1 is an exact match or 100% overlap. A value of 0 should be used if the given footprint does not completely contain the original representation. The footprintSpatialFit is undefined (and should be left blank) if the original representation is a point and the given georeference is not that same point. If both the original and the given georeference are the same point, the footprintSpatialFit is 1.</td>
+			<td>The ratio of the area of the footprint (footprintWKT) to the area of the true (original, or most specific) spatial representation of the Location. Legal values are 0, greater than or equal to 1, or undefined. A value of 1 is an exact match or 100% overlap. A value of 0 should be used if the given footprint does not completely contain the original representation. The footprintSpatialFit is undefined (and should be left empty) if the original representation is a point without uncertainty and the given georeference is not that same point (without uncertainty). If both the original and the given georeference are the same point, the footprintSpatialFit is 1.</td>
 		</tr>
 		<tr>
 			<td>Notes</td>
-			<td>Detailed explanations with graphical examples can be found in the Guide to Best Practices for Georeferencing, Chapman and Wieczorek, eds. 2006.</td>
+			<td>Detailed explanations with graphical examples can be found in the Georeferencing Best Practices, Chapman and Wieczorek, 2020 (<a href="https://doi.org/10.15468/doc-gg7h-s853">https://doi.org/10.15468/doc-gg7h-s853</a>).</td>
 		</tr>
 		<tr>
 			<td>Examples</td>
-			<td>Detailed explanations with graphical examples can be found in the Guide to Best Practices for Georeferencing, Chapman and Wieczorek, eds. 2006.</td>
+			<td>`0`, `1`, `1.5708`</td>
 		</tr>
 		<tr>
 			<td>Type</td>
@ -5048,11 +5051,11 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Modified</td>
-			<td>2017-10-06</td>
+			<td>2020-08-20</td>
 		</tr>
 		<tr>
 			<td>Term version IRI</td>
-			<td><a href="http://rs.tdwg.org/dwc/terms/version/georeferenceProtocol-2017-10-06">http://rs.tdwg.org/dwc/terms/version/georeferenceProtocol-2017-10-06</a></td>
+			<td><a href="http://rs.tdwg.org/dwc/terms/version/georeferenceProtocol-2020-08-20">http://rs.tdwg.org/dwc/terms/version/georeferenceProtocol-2020-08-20</a></td>
 		</tr>
 		<tr>
 			<td>Label</td>
@ -5064,7 +5067,7 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Examples</td>
-			<td>`Guide to Best Practices for Georeferencing. (Chapman and Wieczorek, eds. 2006). Global Biodiversity Information Facility.`, `MaNIS/HerpNet/ORNIS Georeferencing Guidelines`, `Georeferencing Quick Reference Guide`</td>
+			<td>`Georeferencing Quick Reference Guide (Zermoglio et al. 2020, <a href="https://doi.org/10.35035/e09p-h128">https://doi.org/10.35035/e09p-h128</a>)`</td>
 		</tr>
 		<tr>
 			<td>Type</td>
@ -9086,11 +9089,11 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Modified</td>
-			<td>2018-09-06</td>
+			<td>2020-08-20</td>
 		</tr>
 		<tr>
 			<td>Term version IRI</td>
-			<td><a href="http://rs.tdwg.org/dwc/terms/version/Occurrence-2018-09-06">http://rs.tdwg.org/dwc/terms/version/Occurrence-2018-09-06</a></td>
+			<td><a href="http://rs.tdwg.org/dwc/terms/version/Occurrence-2020-08-20">http://rs.tdwg.org/dwc/terms/version/Occurrence-2020-08-20</a></td>
 		</tr>
 		<tr>
 			<td>Label</td>
@ -9102,7 +9105,7 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Examples</td>
-			<td>A wolf pack on the shore of Kluane Lake in 1988. A virus in a plant leaf in a the New York Botanical Garden at 15:29 on 2014-10-23. A fungus in Central Park in the summer of 1929.</td>
+			<td>A wolf pack on the shore of Kluane Lake in 1988. A virus in a plant leaf in the New York Botanical Garden at 15:29 on 2014-10-23. A fungus in Central Park in the summer of 1929.</td>
 		</tr>
 		<tr>
 			<td>Type</td>
@ -10436,11 +10439,11 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Modified</td>
-			<td>2017-10-06</td>
+			<td>2020-08-20</td>
 		</tr>
 		<tr>
 			<td>Term version IRI</td>
-			<td><a href="http://rs.tdwg.org/dwc/terms/version/pointRadiusSpatialFit-2017-10-06">http://rs.tdwg.org/dwc/terms/version/pointRadiusSpatialFit-2017-10-06</a></td>
+			<td><a href="http://rs.tdwg.org/dwc/terms/version/pointRadiusSpatialFit-2020-08-20">http://rs.tdwg.org/dwc/terms/version/pointRadiusSpatialFit-2020-08-20</a></td>
 		</tr>
 		<tr>
 			<td>Label</td>
@ -10448,15 +10451,15 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Definition</td>
-			<td>The ratio of the area of the point-radius (decimalLatitude, decimalLongitude, coordinateUncertaintyInMeters) to the area of the true (original, or most specific) spatial representation of the Location. Legal values are 0, greater than or equal to 1, or undefined. A value of 1 is an exact match or 100% overlap. A value of 0 should be used if the given point-radius does not completely contain the original representation. The pointRadiusSpatialFit is undefined (and should be left blank) if the original representation is a point without uncertainty and the given georeference is not that same point (without uncertainty). If both the original and the given georeference are the same point, the pointRadiusSpatialFit is 1.</td>
+			<td>The ratio of the area of the point-radius (decimalLatitude, decimalLongitude, coordinateUncertaintyInMeters) to the area of the true (original, or most specific) spatial representation of the Location. Legal values are 0, greater than or equal to 1, or undefined. A value of 1 is an exact match or 100% overlap. A value of 0 should be used if the given point-radius does not completely contain the original representation. The pointRadiusSpatialFit is undefined (and should be left empty) if the original representation is a point without uncertainty and the given georeference is not that same point (without uncertainty). If both the original and the given georeference are the same point, the pointRadiusSpatialFit is 1.</td>
 		</tr>
 		<tr>
 			<td>Notes</td>
-			<td>Detailed explanations with graphical examples can be found in the Guide to Best Practices for Georeferencing, Chapman and Wieczorek, eds. 2006.</td>
+			<td>Detailed explanations with graphical examples can be found in the Georeferencing Best Practices, Chapman and Wieczorek, 2020 (<a href="https://doi.org/10.15468/doc-gg7h-s853">https://doi.org/10.15468/doc-gg7h-s853</a>).</td>
 		</tr>
 		<tr>
 			<td>Examples</td>
-			<td>Detailed explanations with graphical examples can be found in the Guide to Best Practices for Georeferencing, Chapman and Wieczorek, eds. 2006.</td>
+			<td>`0`, `1`, `1.5708`</td>
 		</tr>
 		<tr>
 			<td>Type</td>
@ -12700,11 +12703,11 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Modified</td>
-			<td>2017-10-06</td>
+			<td>2020-08-20</td>
 		</tr>
 		<tr>
 			<td>Term version IRI</td>
-			<td><a href="http://rs.tdwg.org/dwc/terms/version/startDayOfYear-2017-10-06">http://rs.tdwg.org/dwc/terms/version/startDayOfYear-2017-10-06</a></td>
+			<td><a href="http://rs.tdwg.org/dwc/terms/version/startDayOfYear-2020-08-20">http://rs.tdwg.org/dwc/terms/version/startDayOfYear-2020-08-20</a></td>
 		</tr>
 		<tr>
 			<td>Label</td>
@ -12712,7 +12715,7 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Definition</td>
-			<td>The earliest ordinal day of the year on which the Event occurred (1 for January 1, 365 for December 31, except in a leap year, in which case it is 366).</td>
+			<td>The earliest integer day of the year on which the Event occurred (1 for January 1, 365 for December 31, except in a leap year, in which case it is 366).</td>
 		</tr>
 		<tr>
 			<td>Examples</td>
@ -13512,11 +13515,11 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Modified</td>
-			<td>2017-10-06</td>
+			<td>2020-08-20</td>
 		</tr>
 		<tr>
 			<td>Term version IRI</td>
-			<td><a href="http://rs.tdwg.org/dwc/terms/version/verbatimCoordinateSystem-2017-10-06">http://rs.tdwg.org/dwc/terms/version/verbatimCoordinateSystem-2017-10-06</a></td>
+			<td><a href="http://rs.tdwg.org/dwc/terms/version/verbatimCoordinateSystem-2020-08-20">http://rs.tdwg.org/dwc/terms/version/verbatimCoordinateSystem-2020-08-20</a></td>
 		</tr>
 		<tr>
 			<td>Label</td>
@ -13524,7 +13527,7 @@ Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](../rdf/#
 		</tr>
 		<tr>
 			<td>Definition</td>
-			<td>The spatial coordinate system for the verbatimLatitude and verbatimLongitude or the verbatimCoordinates of the Location.</td>
+			<td>The coordinate format for the verbatimLatitude and verbatimLongitude or the verbatimCoordinates of the Location.</td>
 		</tr>
 		<tr>
 			<td>Notes</td>