just-the-docs/docs/search.md
Matt Wang d7e4a808b5
2023-08-19 21:17:26 -04:00

---
layout: default
title: Search
nav_order: 7
---

Search

{: .no_toc }

Table of contents

{: .no_toc .text-delta }

  1. TOC {:toc}

Just the Docs uses lunr.js to add a client-side search interface powered by a JSON index that Jekyll generates. All search results are shown in an auto-complete style interface (there is no search results page). By default, all generated HTML pages are indexed using the following data points:

  • Page title
  • Page content
  • Page URL

Enable search in configuration

In your site's _config.yml, enable search:

# Enable or disable the site search
# Supports true (default) or false
search_enabled: true

Search granularity

Pages are split into sections that can be searched individually. The sections are defined by the headings on the page. Each section is displayed in a separate search result.

search:
  # Split pages into sections that can be searched individually
  # Supports 1 - 6, default: 2
  heading_level: 2

Search previews

A search result can contain previews that show where the search words are found in the specific section.

search:
  # Maximum number of previews per search result
  # Default: 3
  previews: 3

  # Maximum number of words to display before a matched word in the preview
  # Default: 5
  preview_words_before: 5

  # Maximum number of words to display after a matched word in the preview
  # Default: 10
  preview_words_after: 10
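
To make the preview settings concrete, here is a minimal JavaScript sketch of how a preview window around a matched word could be cut. The buildPreview helper and its ellipsis handling are hypothetical and are not the theme's actual implementation; only the defaults of 5 words before and 10 words after come from the settings above.

```javascript
// Hypothetical helper (not part of Just the Docs): cut a preview window
// of wordsBefore/wordsAfter words around the first word containing `term`.
function buildPreview(text, term, wordsBefore = 5, wordsAfter = 10) {
  const words = text.split(/\s+/).filter(Boolean);
  const needle = term.toLowerCase();
  const index = words.findIndex((word) => word.toLowerCase().includes(needle));
  if (index === -1) return null; // no match, so no preview

  const start = Math.max(0, index - wordsBefore);
  const end = Math.min(words.length, index + wordsAfter + 1);

  // Mark truncated context with an ellipsis on either side.
  const prefix = start > 0 ? "... " : "";
  const suffix = end < words.length ? " ..." : "";
  return prefix + words.slice(start, end).join(" ") + suffix;
}

const sample = "one two three four five six seven eight nine ten";
console.log(buildPreview(sample, "six", 2, 1)); // "... four five six seven ..."
```

Smaller values keep the result list compact; larger values give more context per match at the cost of longer entries.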

Search tokenizer

By default, hyphens separate tokens in search terms, so gem-based is treated as gem based and matches either word. To search for hyphenated words as single tokens, remove the hyphen from the separator:

search:
  # Set the search token separator
  # Default: /[\s\-/]+/
  # Example: enable support for hyphenated search words
  tokenizer_separator: /[\s/]+/
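
The effect of the two separators can be checked outside the theme with a few lines of JavaScript. This is only an illustrative sketch: the tokenize helper is hypothetical and uses a plain String.prototype.split rather than the lunr.js tokenizer itself, but the two regular expressions are the ones shown in the comments above.

```javascript
// The default separator and the hyphen-friendly alternative from the config.
const defaultSeparator = /[\s\-/]+/;
const hyphenFriendlySeparator = /[\s/]+/;

// Hypothetical helper: split a query into lowercase tokens.
function tokenize(query, separator) {
  return query
    .toLowerCase()
    .split(separator)
    .filter((token) => token.length > 0);
}

console.log(tokenize("gem-based theme", defaultSeparator));
// [ 'gem', 'based', 'theme' ]
console.log(tokenize("gem-based theme", hyphenFriendlySeparator));
// [ 'gem-based', 'theme' ]
```

With the default separator a query for gem-based matches pages containing either gem or based; with the alternative separator the hyphenated word is kept as one token.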

Display URL in search results

search:
  # Display the relative URL in search results
  # Supports true (default) or false
  rel_url: false

Display search button

The search button displays in the bottom right corner of the screen and triggers the search input when clicked.

search:
  # Enable or disable the search button that appears in the bottom right corner of every page
  # Supports true or false (default)
  button: true

Sometimes you might have a page that you don't want indexed or shown in search results, e.g., a 404 page. To exclude a page from search, add search_exclude: true to the page's YAML front matter:

Example

{: .no_toc }

---
layout: default
title: Page not found
nav_exclude: true
search_exclude: true
---

Generate search index when used as a gem

If you use Just the Docs as a remote theme, you do not need the following steps.

If you use the theme as a gem, you must initialize the search by running this rake command that comes with just-the-docs:

$ bundle exec just-the-docs rake search:init

This command creates the assets/js/zzzz-search-data.json file that Jekyll uses to create your search index. Alternatively, you can create the file manually with [this content]({{ site.github.repository_url }}/blob/main/assets/js/zzzz-search-data.json).

Custom content for search index

{: .d-inline-block }

New (v0.4.0) {: .label .label-green }

Advanced {: .label .label-yellow }

By default, the search feature indexes a page's .content, .title, and some headers within the .content. Other data (e.g. front matter, files in _data and assets) is not indexed. Users can customize what is indexed.

{: .warning }

Customizing search indices is an advanced feature that requires JavaScript and Liquid knowledge.

  1. When Just the Docs is a local or gem theme, ensure assets/js/zzzz-search-data.json is up to date by following the steps in Generate search index when used as a gem.
  2. Add a new file named _includes/lunr/custom-data.json. Insert custom Liquid code that reads your data (e.g. the page object at include.page) and generates custom JavaScript fields that hold the data you want to index. Verify these fields in the generated assets/js/search-data.json.
  3. Add a new file named _includes/lunr/custom-index.js. Insert custom JavaScript code that reads your custom JavaScript fields and inserts them into the search index. You may want to inspect assets/js/just-the-docs.js to better understand the code.

Example

This example adds front matter usage and examples fields to the search index.

The custom code in _includes/lunr/custom-data.json reads the page's usage and examples fields, normalizes the text, and writes it to the custom JavaScript fields myusage and myexamples. JavaScript fields are similar to, but not the same as, JSON; the jsonify filter will probably work for most scenarios.

{% raw %}

{%- capture newline %}
{% endcapture -%}
"myusage": {{ include.page.usage | markdownify | replace:newline,' ' | strip_html | normalize_whitespace | strip | jsonify }},
"myexamples": {{ include.page.examples | markdownify | replace:newline,' ' | strip_html | normalize_whitespace | strip | jsonify }},

{% endraw %}

The custom code in _includes/lunr/custom-index.js is inserted into the JavaScript loop of assets/js/just-the-docs.js. All custom JavaScript fields are accessed as fields of docs[i], such as docs[i].myusage. Finally, append your custom fields onto the existing docs[i].content.

const content_to_merge = [docs[i].content, docs[i].myusage, docs[i].myexamples];
docs[i].content = content_to_merge.join(' ');
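
To try the merge step outside the theme, the following self-contained sketch runs the same idea over a small stand-in for the docs object. The sample entries and the surrounding for loop are assumptions made for illustration (the real object is built from the generated search data), and the filter call is an optional addition that skips empty or missing custom fields so the merged content does not pick up stray spaces.

```javascript
// Stand-in for the `docs` object; the keys and values here are invented
// sample data, not output from a real site.
const docs = {
  0: {
    title: "Widgets",
    content: "How to install widgets.",
    myusage: "widget --install",
    myexamples: "widget --install foo",
  },
  // A page without usage or examples front matter.
  1: { title: "About", content: "Project history.", myusage: "" },
};

for (const i in docs) {
  // Same merge as above, but drop empty or missing fields first.
  const content_to_merge = [docs[i].content, docs[i].myusage, docs[i].myexamples]
    .filter((part) => typeof part === "string" && part.length > 0);
  docs[i].content = content_to_merge.join(" ");
}

console.log(docs[0].content);
// "How to install widgets. widget --install widget --install foo"
console.log(docs[1].content);
// "Project history."
```

Because the custom text ends up inside content, it is searched and previewed like any other page text, with no separate field in the index.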