tainacan/docs/importer-flow.md

331 lines
15 KiB
Markdown

# The Importer flow
This page describes how Tainacan importer works and is a reference to write your own importer.
This documentation is still in construction. A very effective way to learn more on how to build an importer is to look at the source code of the Test Importer and the CSV Importer classes that are included in the tainacan package.
## Introduction
Importers can import items inside a collection, or even create a bunch of collections, taxonomies and items all at once.
In order to create an Importer, you have to extend the `\Tainacan\Importer` Class and register it using the global `Tainacan_Importer_Handler->register_importer()` method.
This method takes an array as argument, with the defintion of your importer. These are the expected attributes.
```
@type string $name The name of the importer. e.g. 'Example Importer'
@type string $slug A unique slug for the importer. e.g. 'example-importer'
@type string $description The importer description. e.g. 'This is an example importer description'
@type string $class_name The Importer Class. e.g. '\Tainacan\Importer\Test_Importer'
@type bool $manual_mapping Wether Tainacan must present the user with an interface to manually map
the metadata from the source to the target collection.
If set to true, Importer Class must implement the method
get_source_metadata() to return the metadata found in the source.
Note that this will only work when importing items to one single collection.
@type bool $manual_collection Wether Tainacan will let the user choose a destination collection.
If set to true, Tainacan will handle Collection creation and will assign it to
the importer object using add_collection() method.
Otherwise, the child importer class must create the collections and add them to the collections property also using add_collection()
```
Note that depending on the value of `manual_mapping` and `manual_collection` you will have to implement some methods in your importer class.
## Initializing a new importer
When the user starts a new import process, he/she first choose which import to use.
Once the Importer is chosen, the first thing that happens is the creation of a new instance of the chosen Importer. This fires the `__construct()` method.
## Choose a collection (if `manual_collection` is true)
After choosing the importer, user will be given the choice to choose the destination collection.
If called from inside a collection, this step is skipped and the current collection is set as destination.
## Set options
Now its time to set the importer options. Each importer may have its own set of options, that will be used during the import process. It could be anything, from the delimiter character in a CSV importer, to an API key for a importer that fetches something from an API.
Importer classes should declare the default values for its options in the `__construct()` method, by calling `set_default_options()`.
```
namespace Tainacan\Importer;
class MyImporter extends Importer {
function __construct() {
parent::construct();
$this->set_default_options(['
'foo' => 'bar'
']);
}
}
```
The Importer classes must also implement the `options_form` method, in which it will echo the form that the user will interact to set the options.
```
function options_form() {
$form = '<div class="field">';
$form .= '<label class="label">' . __('My Importer Option 1', 'tainacan') . '</label>';
$form .= '<div class="control">';
$form .= '<input type="text" class="input" name="my_importer_option_1" value="' . $this->get_option('my_importer_option_1') . '" />';
$form .= '</div>';
$form .= '</div>';
return $form;
}
```
## Fetch source
Next, users will choose the source. Each importer declares the kind of sources they accpet: URL, file, or both.
Importers do this by calling the `add_import_method()` and `remove_import_method` in its construction. By default, importers will support only `file` method. Here is an example of what an importer thar accepts only URLs should do:
```
function __construct() {
parent::construct();
$this->add_import_method('url');
$this->remove_import_method('file');
}
```
If the Importer accepts the `file` method, user will be prompted with a file input to upload a file. The file will then be saved and will be accessible via the `$this->get_tmp_file()` method.
If the importer accepts the `url` method, user will be prompted with an text input to enter an URL. By default, the importer will fetch any given URL to the same `file` attrribute, as if the user had uploaded it. However, each importer may override the `fetch_from_remote()` method and do whatever it want to create the file. For example, it could make several paged requests.
From that point forward, the importer will behave just as if it was using the file method.
## Mapping
At this point, if the Importer definition has `manual_mapping` set to `true`, the user is presented with an interface to map the metadata from the source to the metadata present in the chosen collection.
The Importer class must implement the `get_source_metadata()` method, that will return an array of the metadata found in the source. It can either return an hard coded array or an array that is read from the source file. For example, an importer that fetches data from an api knows beforehand what are the metadata the api will return, however, an importer that reads from a csv, may want to return whatever is found in the first line of the array.
```
// Example 1: returns a hard coded set of metadata
function get_source_metadata() {
return [
'title',
'description'
'date',
];
}
// Example 2: returns the columns of the first line of a CSV file
public function get_source_metadata(){
$file = new \SplFileObject( $this->tmp_file, 'r' );
$file->seek(0);
return $file->fgetcsv( $this->get_option('delimiter') );
}
```
## Importer steps
An Importer may have several steps, that will handle different parts of the process. Each step will be handled by a different callback in the Importer class.
First, lets have a look at a simple CSV importer, that only have one step, in which it imports the items from the source into a chosen collection. After that we will have a look on how to create custom steps.
### Simple Importer - One step that imports items
By default, the only method an Importer class must implement to function is the `process_item()` class.
This method gets two parameters, the `$index` of the item to be inserted, and the `$collection_definition`, with information on the target collection.
Inside this method you must fetch the item from the source and format it according to the `mapping` definition of the collection.
The `mapping` defines how the item metadata from the source should be mapped to the metadata present in the target collection. It was created either manually, by the user, or programatically by the importer in an earlier step (see advanced importers below). This is an array where the keys are the `metadata IDs` and the values are the `identifers` found in source.
All this method should do is return the item as an associative array, where the keys are the metadata `identifiers`, and the values are the values tha should be stored.
And that's it. Behind the scenes the Importer super class is handling everyhting and it will call `process_item()` as many times as needed to import all items into the collection
### Advanced Importer - Many steps
By default, Tainacan Importer super class is registering one single step to the importer:
```
[
'name' => 'Import Items',
'progress_label' => 'Importing Items',
'callback' => 'process_collections'
]
```
This step will lopp through all the collections added to the importer (manuall or programatically) and add the items to it.
You may register as many steps and callbacks as you want in your importer, but you should consider keeping this default step at some point to handle the items insertion. For example, see how the Test Importer adds other steps before and after but keeps this default step in the middle:
```
class Test_Importer extends Importer {
protected $steps = [
[
'name' => 'Create Taxonomies',
'progress_label' => 'Creating taxonomies',
'callback' => 'create_taxonomies'
],
[
'name' => 'Create Collections',
'progress_label' => 'Creating Collections',
'callback' => 'create_collections'
],
// we keep the default step
[
'name' => 'Import Items',
'progress_label' => 'Importing items',
'callback' => 'process_collections'
],
[
'name' => 'Post-configure taxonomies',
'progress_label' => 'post processing taxonomies',
'callback' => 'close_taxonomies'
],
[
'name' => 'Finalize',
'progress_label' => 'Finalizing',
'callback' => 'finish_processing',
'total' => 5
]
];
//...
```
#### Steps callbacks
Each step has its own callback. The callback may do anything necessary, just keep in mind that you should allow the importer to break very long processes into several requests.
In order to that, your step callback might be called several times, and each time run a part of the process and return its current status, until its done.
When you run the importer, Tainacan will automatically iterate over your steps. If a step callback returns `false`, it assumes the step is over and it will pass to the next step in the next iteration. If the step callback returns an integer, it will keep the pointer in this step and call the same step again in the next iteration. The current position, which is the integer returned the last time the callback was invoked, will be accessible via the `$this->get_in_step_count()` method.
See this example found in the Test Importer:
```
public function finish_processing() {
// Lets just pretend we are doing something really important
$important_stuff = 5;
$current = $this->get_in_step_count();
if ($current <= $important_stuff) {
// This is very important
sleep(5);
$current ++;
return $current;
} else {
return false;
}
}
```
#### Adding collections
If your importer does not use the `manual_collection` option, you might have to create the collection on your own.
You will do this using the [Tainacan internal API](internal-api.md).
After you've created one or more collections, you will have to add them to the importer queue, registering some information about them. This only if you want (and most likely you should) rely on the default step for processing items into the collections.
To add or remove a collection from the queue, use the `add_collection()` and `remove_collection()` methods passing the collection definition.
The collection definition is an array with their IDs, an identifier from the source, the total number of items to be imported, the mapping array from the source structure to the ID of the metadata metadata in tainacan.
The format of the map is an array where the keys are the metadata IDs of the destination collection and the values are the identifier from the source. This could be an ID or a string or whatever the importer finds appropriate to handle.
The source_id can be anyhting you like, that helps you relate this collection to your source. You will use it in you `process_item` method to know where to fetch the item from.
Example of the structure of this propery for one collection:
```
[
'id' => 12,
'mapping' => [
30 => 'column1'
31 => 'column2'
],
'total_items' => 1234,
'source_id' => 55
]
```
#### Using transients
Since Importers are handled as background processes, they will run accross multiple requests. For that reason, you can not simply add properties to your class and trust their values will be kept during all the time the process is running.
If you want a value to persist so you can use it accross all methods of your importer, at any time, you should use `transients` to store them in the database.
This is as simple as calling `set_transient()` and `get_transient()`. See the example below:
```
function callback_for_step_1() {
$this->add_transient('time_step_1', time());
}
// ...
function callback_for_step_5() {
$time_step_1 = get_transient('time_step_1');
}
```
#### Handling user feedback
There are two information Tainacan Importers give to the user about the status of the process while it is running in feedback. the `progress label` and the `progress value`.
The `progress label` is a string that can be anything that tells the user what is going on. By default, it is the Step Name, but you can inform a specific `progress_label` attribute when registering the steps.
The `progress value` is a number between 0 and 100 that indicates the progress of the current step or the whole importer, Thats up to you. By default, it calculates it automatically using the `total` attribute registered with the steps, against the `$this->get_in_step_count()` value. In the case of the default Process Items callback, it calculates based on the number of items found in each collection.
Remember the `finish_processing` dummy callback we saw in the Test Importer. You might have also noticed that when we registered the step, we informed a `total` attribute to this step with the value of 5. This will tell Tainacan that the total number iterations this step need to complete is 5 and allow it to calculate the progress.
If it is no possible to know `total` of a step beforehand, you can set it at any time, even inside the step callback itself, using the `set_current_step_total($value)` or `set_step_total($step, $value)` methods.
##### Final output
When the process finishes, Background processes define an "output", which is a final report to the user of what happened.
This could be simply a message saying "All good", or could be a report with the names and links to the collections created. HTML is welcome.
Remember that for a more detailed and technical report, you should use Logs (see Logs below). This output is meant as a short but descriptive user friendly message.
In order to achieve this, Importers must implement a method called `get_output()` that returns a string.
This method will be called only once, when the importer ends, so you might need to save information using `transients` during the process and use them in `get_output()` to compose your message.
#### Logs
There are two useful methods to write information to the logs: `add_log()` and `add_error_log()`. These are written into a log file related to the importer background process and a link to it will be presented to the user.
## Run importer
Finally, everything is ready. The importer runs.
This will trigger a Background Process (documentation needed) and the importer will run through as many background requests as needed.