just-the-docs/docs/search.md
Matt Wang d7e4a808b5
Fix incorrect HTML in theme & docs; validate HTML in CI (#1305)
This PR is motivated from https://github.com/just-the-docs/just-the-docs/pull/1259#issuecomment-1655899503. It adds a new workflow (`CI / Validate HTML (3.1)`) that validates the output of `bundle exec jekyll build`. It does this with two separate tools:

1. The [`html5validator-action`](https://github.com/Cyb3r-Jak3/html5validator-action), which is a wrapper (Docker image + argument forwarding) around the [Nu HTML checker](https://github.com/validator/validator), which is what is used by the [W3C markup validation service](https://validator.w3.org/)
2. [`html-proofer`](https://github.com/gjtorikian/html-proofer), which performs auxiliary checks on the validity of script, image, and link *values*, but not the markup itself
    - note: prior versions of `html-proofer` did use nokogiri to also validate HTML, but the author has elected to remove that feature in versions 4+

I then fix a few issues that are flagged by these tools. I'll split this into,

**changes affecting users**:
- strictly incorrect: in `_layouts/minimal.html`, a `<div>` had duplicate `id`s. I've removed the incorrect one, which is related to...
- semantically wrong (but not technically incorrect): in both `minimal` and `default` layouts, we had two `<div>` tags with `id="main-content-wrap"`. These don't do anything; the associated styling is with the *class* `main-content-wrap`. I've elected to remove these `id`s to avoid confusion and keep the layouts in sync; however, **this is technically a breaking change**
    - observe that `#main-content` is used for the "skip to main content" feature, which I missed in an earlier iteration of this PR

**changes affecting only our documentation**
- a broken link to mermaid docs (I've changed it to a valid one)
- an incorrectly-specified `aux_link` to our own repository
- various links that point to the bare URL `another-page`, which is clearly invalid; I've changed these to point to our homepage
- an incorrect header link
- various links to `http://example.com`, which I've changed to point to our homepage
- an incorrect link to `@flyx`'s profile for the AsciiDoctor gist
- a handful of (otherwise-valid) `http` links that should be `https`: the lunr docs, and patrick's personal website

The commit history shows the Nu validator flagging issues in CI properly in commits [4128b23](4128b23ef2) and [3527220](35272203ba).

## relevant configuration

- I exclude `github.com` URLs from external link checks with `html-proofer`. This is because GitHub does not like it when we ping too frequently, and rate limits us, which in turn provides many false positives. This is aligned with their documentation, which uses this ignore.
- I've pinned the hash for the 3rd-party action that wraps the W3C markup validation service. This aligns with #1148, but means that we'll have to keep an eye on it for updates.
2023-08-19 21:17:26 -04:00

168 lines
5.6 KiB
Markdown

---
layout: default
title: Search
nav_order: 7
---
# Search
{: .no_toc }
## Table of contents
{: .no_toc .text-delta }
1. TOC
{:toc}
---
Just the Docs uses [lunr.js](https://lunrjs.com) to add a client-side search interface powered by a JSON index that Jekyll generates.
All search results are shown in an auto-complete style interface (there is no search results page).
By default, all generated HTML pages are indexed using the following data points:
- Page title
- Page content
- Page URL
## Enable search in configuration
In your site's `_config.yml`, enable search:
```yaml
# Enable or disable the site search
# Supports true (default) or false
search_enabled: true
```
### Search granularity
Pages are split into sections that can be searched individually.
The sections are defined by the headings on the page.
Each section is displayed in a separate search result.
```yaml
# Split pages into sections that can be searched individually
# Supports 1 - 6, default: 2
search.heading_level: 2
```
### Search previews
A search result can contain previews that show where the search words are found in the specific section.
```yaml
# Maximum amount of previews per search result
# Default: 3
search.previews: 3
# Maximum amount of words to display before a matched word in the preview
# Default: 5
search.preview_words_before: 5
# Maximum amount of words to display after a matched word in the preview
# Default: 10
search.preview_words_after: 10
```
### Search tokenizer
The default is for hyphens to separate tokens in search terms:
`gem-based` is equivalent to `gem based`, matching either word.
To allow search for hyphenated words:
```yaml
# Set the search token separator
# Default: /[\s\-/]+/
# Example: enable support for hyphenated search words
search.tokenizer_separator: /[\s/]+/
```
### Display URL in search results
```yaml
# Display the relative url in search results
# Supports true (default) or false
search.rel_url: false
```
### Display search button
The search button displays in the bottom right corner of the screen and triggers the search input when clicked.
```yaml
# Enable or disable the search button that appears in the bottom right corner of every page
# Supports true or false (default)
search.button: true
```
## Hiding pages from search
Sometimes you might have a page that you don't want to be indexed for the search nor to show up in search results, e.g., a 404 page.
To exclude a page from search, add the `search_exclude: true` parameter to the page's YAML front matter:
#### Example
{: .no_toc }
```yaml
---
layout: default
title: Page not found
nav_exclude: true
search_exclude: true
---
```
## Generate search index when used as a gem
If you use Just the Docs as a remote theme, you do not need the following steps.
If you use the theme as a gem, you must initialize the search by running this `rake` command that comes with `just-the-docs`:
```bash
$ bundle exec just-the-docs rake search:init
```
This command creates the `assets/js/zzzz-search-data.json` file that Jekyll uses to create your search index.
Alternatively, you can create the file manually with [this content]({{ site.github.repository_url }}/blob/main/assets/js/zzzz-search-data.json).
## Custom content for search index
{: .d-inline-block }
New (v0.4.0)
{: .label .label-green }
Advanced
{: .label .label-yellow }
By default, the search feature indexes a page's `.content`, `.title`, and *some* headers within the `.content`. Other data (e.g. front matter, files in `_data` and `assets`) is not indexed. Users can customize what is indexed.
{: .warning }
> Customizing search indices is an advanced feature that requires Javascript and Liquid knowledge.
1. When Just the Docs is a local or gem theme, ensure `assets/js/zzzz-search-data.json` is up-to-date with [Generate search index when used as a gem](#generate-search-index-when-used-as-a-gem).
2. Add a new file named `_includes/lunr/custom-data.json`. Insert custom Liquid code that reads your data (e.g. the page object at `include.page`) then generates custom Javascript fields that hold the custom data you want to index. Verify these fields in the generated `assets/js/search-data.json`.
3. Add a new file named `_includes/lunr/custom-index.js`. Insert custom Javascript code that reads your custom Javascript fields and inserts them into the search index. You may want to inspect `assets/js/just-the-docs.js` to better understand the code.
#### Example
This example adds front matter `usage` and `examples` fields to the search index.
`_includes/lunr/custom-data.json` custom code reads the page `usage` and `examples` fields, normalizes the text, and writes the text to custom Javascript `myusage` and `myexamples` fields. Javascript fields are similar yet [not the same as JSON](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON#javascript_and_json_differences). `jsonify` will probably work for most scenarios.
{% raw %}
```liquid
{%- capture newline %}
{% endcapture -%}
"myusage": {{ include.page.usage | markdownify | replace:newline,' ' | strip_html | normalize_whitespace | strip | jsonify }},
"myexamples": {{ include.page.examples | markdownify | replace:newline,' ' | strip_html | normalize_whitespace | strip | jsonify }},
```
{% endraw %}
`_includes/lunr/custom-index.js` custom code is inserted into the Javascript loop of `assets/js/just-the-docs.js`. All custom Javascript fields are accessed as fields of `docs[i]` such as `docs[i].myusage`. Finally, append your custom fields on to the already existing `docs[i].content`.
```javascript
const content_to_merge = [docs[i].content, docs[i].myusage, docs[i].myexamples];
docs[i].content = content_to_merge.join(' ');
```