This PR is motivated by https://github.com/just-the-docs/just-the-docs/pull/1259#issuecomment-1655899503. It adds a new workflow (`CI / Validate HTML (3.1)`) that validates the output of `bundle exec jekyll build`. It does this with two separate tools:

1. The [`html5validator-action`](https://github.com/Cyb3r-Jak3/html5validator-action), which is a wrapper (Docker image + argument forwarding) around the [Nu HTML checker](https://github.com/validator/validator), which is what the [W3C markup validation service](https://validator.w3.org/) uses
2. [`html-proofer`](https://github.com/gjtorikian/html-proofer), which performs auxiliary checks on the validity of script, image, and link *values*, but not the markup itself
    - note: prior versions of `html-proofer` did use nokogiri to also validate HTML, but the author has elected to remove that feature in versions 4+

I then fix a few issues that are flagged by these tools. I'll split these into two groups.

**changes affecting users**:

- strictly incorrect: in `_layouts/minimal.html`, a `<div>` had duplicate `id`s. I've removed the incorrect one, which is related to...
- semantically wrong (but not technically incorrect): in both `minimal` and `default` layouts, we had two `<div>` tags with `id="main-content-wrap"`. These don't do anything; the associated styling is with the *class* `main-content-wrap`. I've elected to remove these `id`s to avoid confusion and keep the layouts in sync; however, **this is technically a breaking change**
    - observe that `#main-content` is used for the "skip to main content" feature, which I missed in an earlier iteration of this PR

**changes affecting only our documentation**

- a broken link to mermaid docs (I've changed it to a valid one)
- an incorrectly-specified `aux_link` to our own repository
- various links that point to the bare URL `another-page`, which is clearly invalid; I've changed these to point to our homepage
- an incorrect header link
- various links to `http://example.com`, which I've changed to point to our homepage
- an incorrect link to `@flyx`'s profile for the AsciiDoctor gist
- a handful of (otherwise-valid) `http` links that should be `https`: the lunr docs, and patrick's personal website

The commit history shows the Nu validator flagging issues in CI properly in commits [4128b23](4128b23ef2) and [3527220](35272203ba).

## relevant configuration

- I exclude `github.com` URLs from external link checks with `html-proofer`. GitHub rate limits us when we ping too frequently, which in turn produces many false positives. This is aligned with their documentation, which uses this ignore.
- I've pinned the hash for the 3rd-party action that wraps the W3C markup validation service. This aligns with #1148, but means that we'll have to keep an eye on it for updates.
| layout | title | nav_order |
|---|---|---|
| default | Search | 7 |
Search
{: .no_toc }
Table of contents
{: .no_toc .text-delta }
- TOC
{:toc}
Just the Docs uses lunr.js to add a client-side search interface powered by a JSON index that Jekyll generates. All search results are shown in an auto-complete style interface (there is no search results page). By default, all generated HTML pages are indexed using the following data points:
- Page title
- Page content
- Page URL
Enable search in configuration
In your site's `_config.yml`, enable search:
# Enable or disable the site search
# Supports true (default) or false
search_enabled: true
Search granularity
Pages are split into sections that can be searched individually. The sections are defined by the headings on the page. Each section is displayed in a separate search result.
# Split pages into sections that can be searched individually
# Supports 1 - 6, default: 2
search.heading_level: 2
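To make the idea of heading-based sections concrete, here is a minimal sketch that splits a Markdown string at headings up to a given level. The function `splitIntoSections` and the sample page are hypothetical; the theme builds its index from the generated pages with its own code, so treat this only as an illustration of what `search.heading_level` controls.

```javascript
// Hypothetical sketch: split a Markdown page into sections at headings
// of level 1..headingLevel. Deeper headings stay inside their parent section.
function splitIntoSections(markdown, headingLevel = 2) {
  const heading = new RegExp(`^#{1,${headingLevel}}\\s+(.*)$`);
  const sections = [];
  let current = { title: null, lines: [] };
  for (const line of markdown.split("\n")) {
    const match = line.match(heading);
    if (match) {
      // Close the section collected so far (skip an empty leading section).
      if (current.title !== null || current.lines.length > 0) sections.push(current);
      current = { title: match[1], lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  sections.push(current);
  return sections.map((s) => ({ title: s.title, content: s.lines.join("\n").trim() }));
}

// Made-up sample page.
const page = "# Search\nIntro\n## Enable\nText\n### Detail\nMore";
console.log(splitIntoSections(page, 2).length); // 2
console.log(splitIntoSections(page, 3).length); // 3
```

With a level of 2, the `### Detail` heading stays inside the "Enable" section; raising the level to 3 turns it into its own search result.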
Search previews
A search result can contain previews that show where the search words are found in the specific section.
# Maximum amount of previews per search result
# Default: 3
search.previews: 3
# Maximum amount of words to display before a matched word in the preview
# Default: 5
search.preview_words_before: 5
# Maximum amount of words to display after a matched word in the preview
# Default: 10
search.preview_words_after: 10
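As a rough illustration of how `search.preview_words_before` and `search.preview_words_after` bound a preview, the sketch below cuts a window of words around a single match. The function `previewAround` and its sample text are hypothetical; the theme's own preview logic may differ.

```javascript
// Hypothetical sketch: build a preview from the words around one match.
// `wordsBefore` and `wordsAfter` play the role of the two preview_words_* settings.
function previewAround(text, matchedWord, wordsBefore = 5, wordsAfter = 10) {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const index = words.findIndex((w) => w.toLowerCase() === matchedWord.toLowerCase());
  if (index === -1) return null; // no match, so no preview
  const start = Math.max(0, index - wordsBefore);
  const end = Math.min(words.length, index + wordsAfter + 1);
  return words.slice(start, end).join(" ");
}

const sampleText =
  "Just the Docs uses lunr.js to add a client-side search interface powered by a JSON index";
console.log(previewAround(sampleText, "search", 2, 3));
// a client-side search interface powered by
```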
Search tokenizer
The default is for hyphens to separate tokens in search terms: `gem-based` is equivalent to `gem based`, matching either word.
To allow search for hyphenated words:
# Set the search token separator
# Default: /[\s\-/]+/
# Example: enable support for hyphenated search words
search.tokenizer_separator: /[\s/]+/
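To see the effect of the two separators, the following sketch splits a search term with plain `String.prototype.split`. It only illustrates the regular expressions; it does not call lunr.js, and the helper name `tokenize` is hypothetical.

```javascript
// Hypothetical helper: split a search term on a separator and drop empty pieces.
// This illustrates the regular expressions only; it is not lunr's tokenizer.
function tokenize(term, separator) {
  return term.split(separator).filter((token) => token.length > 0);
}

const defaultSeparator = /[\s\-/]+/; // whitespace, hyphens, and slashes
const hyphenFriendly = /[\s/]+/;     // whitespace and slashes only

console.log(tokenize("gem-based theme", defaultSeparator)); // [ 'gem', 'based', 'theme' ]
console.log(tokenize("gem-based theme", hyphenFriendly));   // [ 'gem-based', 'theme' ]
```

With the second separator, `gem-based` survives as a single token, so a search for the hyphenated word can match it as a whole.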
Display URL in search results
# Display the relative url in search results
# Supports true (default) or false
search.rel_url: false
Display search button
The search button displays in the bottom right corner of the screen and triggers the search input when clicked.
# Enable or disable the search button that appears in the bottom right corner of every page
# Supports true or false (default)
search.button: true
Hiding pages from search
Sometimes you might have a page that you don't want indexed for search or shown in search results, e.g., a 404 page.
To exclude a page from search, add the `search_exclude: true` parameter to the page's YAML front matter:
Example
{: .no_toc }
---
layout: default
title: Page not found
nav_exclude: true
search_exclude: true
---
Generate search index when used as a gem
If you use Just the Docs as a remote theme, you do not need the following steps.
If you use the theme as a gem, you must initialize the search by running this `rake` command that comes with `just-the-docs`:
$ bundle exec just-the-docs rake search:init
This command creates the `assets/js/zzzz-search-data.json` file that Jekyll uses to create your search index.
Alternatively, you can create the file manually with [this content]({{ site.github.repository_url }}/blob/main/assets/js/zzzz-search-data.json).
Custom content for search index
{: .d-inline-block }
New (v0.4.0) {: .label .label-green }
Advanced {: .label .label-yellow }
By default, the search feature indexes a page's `.content`, `.title`, and some headers within the `.content`. Other data (e.g. front matter, files in `_data` and `assets`) is not indexed. Users can customize what is indexed.
{: .warning }
Customizing search indices is an advanced feature that requires JavaScript and Liquid knowledge.
- When Just the Docs is a local or gem theme, ensure `assets/js/zzzz-search-data.json` is up-to-date (see "Generate search index when used as a gem").
- Add a new file named `_includes/lunr/custom-data.json`. Insert custom Liquid code that reads your data (e.g. the page object at `include.page`) then generates custom JavaScript fields that hold the custom data you want to index. Verify these fields in the generated `assets/js/search-data.json`.
- Add a new file named `_includes/lunr/custom-index.js`. Insert custom JavaScript code that reads your custom JavaScript fields and inserts them into the search index. You may want to inspect `assets/js/just-the-docs.js` to better understand the code.
Example
This example adds front matter `usage` and `examples` fields to the search index.
`_includes/lunr/custom-data.json` custom code reads the page `usage` and `examples` fields, normalizes the text, and writes the text to custom JavaScript `myusage` and `myexamples` fields. JavaScript fields are similar to, but not the same as, JSON. `jsonify` will probably work for most scenarios.
{% raw %}
{%- capture newline %}
{% endcapture -%}
"myusage": {{ include.page.usage | markdownify | replace:newline,' ' | strip_html | normalize_whitespace | strip | jsonify }},
"myexamples": {{ include.page.examples | markdownify | replace:newline,' ' | strip_html | normalize_whitespace | strip | jsonify }},
{% endraw %}
`_includes/lunr/custom-index.js` custom code is inserted into the JavaScript loop of `assets/js/just-the-docs.js`. All custom JavaScript fields are accessed as fields of `docs[i]` such as `docs[i].myusage`. Finally, append your custom fields on to the already existing `docs[i].content`.
const content_to_merge = [docs[i].content, docs[i].myusage, docs[i].myexamples];
docs[i].content = content_to_merge.join(' ');
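If a page leaves `usage` or `examples` empty, the joined string can pick up stray spaces. A slightly more defensive variant, written here as a standalone sketch with a hypothetical `mergeContent` helper and made-up sample data in place of the real `docs`, could look like this:

```javascript
// Hypothetical helper: join the default content with any custom fields,
// skipping fields that are missing or empty.
function mergeContent(doc, fieldNames) {
  const parts = [doc.content, ...fieldNames.map((name) => doc[name])];
  return parts
    .filter((part) => typeof part === "string" && part.trim() !== "")
    .join(" ");
}

// Made-up sample data standing in for the entries of `docs`.
const docs = [
  { content: "Page body", myusage: "How to use it", myexamples: "" },
  { content: "Other page" },
];

for (let i = 0; i < docs.length; i++) {
  docs[i].content = mergeContent(docs[i], ["myusage", "myexamples"]);
}

console.log(docs.map((d) => d.content)); // [ 'Page body How to use it', 'Other page' ]
```

Inside `_includes/lunr/custom-index.js` only the loop body would be needed, since that file is already inserted into the theme's loop.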