Now that we know how to perform the HTML-to-Structured-Text conversion, only a few minor changes are needed to make it work for Markdown content as well.
As we just saw, the `datocms-html-to-structured-text` package knows how to convert a `hast` tree (HTML) into a `dast` tree (Structured Text). So if we can convert a Markdown string into `hast`, the rest of the code will stay basically the same.
Luckily, `hast` is part of the unified ecosystem, which also includes:

- an analogous specification, called `mdast`, for representing Markdown as a syntax tree;
- a tool to convert Markdown strings into `mdast` trees (`remark-parse`);
- a tool to convert `mdast` trees into `hast` trees (`mdast-util-to-hast`).
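To make the tree formats more concrete, here is a hand-written `mdast` tree for the Markdown string `# Hello *world*`, along with a toy converter that turns it into `hast`. This is a simplified sketch of the idea behind `mdast-util-to-hast`, not its actual implementation: `toyMdastToHast` is a hypothetical name, it only understands roots, headings, paragraphs, emphasis, strong and text nodes, and it omits position information.

```js
// Hypothetical toy converter, for illustration only (NOT mdast-util-to-hast).
// Maps the few mdast node types it knows to their HTML tag names.
const ELEMENT_TAGS = { paragraph: "p", emphasis: "em", strong: "strong" };

function toyMdastToHast(node) {
  if (node.type === "root") {
    return { type: "root", children: node.children.map(toyMdastToHast) };
  }
  if (node.type === "text") {
    return { type: "text", value: node.value };
  }
  // mdast keeps the heading level in `depth`; hast encodes it in the tag name
  const tagName =
    node.type === "heading" ? `h${node.depth}` : ELEMENT_TAGS[node.type];
  if (!tagName) {
    throw new Error(`Unsupported mdast node type: ${node.type}`);
  }
  return {
    type: "element",
    tagName,
    properties: {},
    children: node.children.map(toyMdastToHast),
  };
}

// Hand-written mdast for "# Hello *world*" (position info omitted)
const mdastTree = {
  type: "root",
  children: [
    {
      type: "heading",
      depth: 1,
      children: [
        { type: "text", value: "Hello " },
        { type: "emphasis", children: [{ type: "text", value: "world" }] },
      ],
    },
  ],
};

const hastTree = toyMdastToHast(mdastTree);
console.log(JSON.stringify(hastTree, null, 2));
```

The key difference is that `mdast` names nodes after Markdown concepts (a `heading` with a `depth`), while `hast` names them after HTML elements (`h1`). Once we have a `hast` tree, the existing hast-based converter can take over.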
Let's install all the packages we need:
```
npm install --save-dev unified remark-parse mdast-util-to-hast
```
We can now create a function similar to `htmlToStructuredText`, called `markdownToStructuredText`, that connects all the dots:
```js
// ./migrations/utils/markdownToStructuredText.js
const unified = require('unified');
const toHast = require('mdast-util-to-hast');
const parse = require('remark-parse');
const { hastToStructuredText } = require('datocms-html-to-structured-text');
const { validate } = require('datocms-structured-text-utils');
const inspect = require('unist-util-inspect');

module.exports = async function markdownToStructuredText(text, settings) {
  // Markdown string -> mdast
  const mdastTree = unified().use(parse).parse(text);
  // mdast -> hast
  const hastTree = toHast(mdastTree);
  // hast -> dast, using the same converter as the HTML migration
  const result = await hastToStructuredText(hastTree, settings);

  // Make sure the generated document is valid dast before returning it
  const validationResult = validate(result);
  if (!validationResult.valid) {
    console.log(inspect(result));
    throw new Error(validationResult.message);
  }
  return result;
};
```
We can now create a new migration script:
```
> dato new migration convertMarkdownArticles
Created migrations/1612340785_convertMarkdownArticles.js
```
Its content is basically a copy of the previous migration: we just replace the name of the model (from `html_article` to `markdown_article`), and the call to `htmlToStructuredText` with a call to `markdownToStructuredText`:
```js
// ./migrations/1612340785_convertMarkdownArticles.js
const { buildModularBlock } = require("datocms-client");
const { findAll } = require("unist-utils-core");
const getModelIdsByApiKey = require("./utils/getModelIdsByApiKey");
const createStructuredTextFieldFrom = require("./utils/createStructuredTextFieldFrom");
const markdownToStructuredText = require("./utils/markdownToStructuredText");
const getAllRecords = require("./utils/getAllRecords");
const swapFields = require("./utils/swapFields");

module.exports = async (client) => {
  const modelIds = await getModelIdsByApiKey(client);

  await createStructuredTextFieldFrom(client, "markdown_article", "content", [
    modelIds.image_block.id,
  ]);

  const records = await getAllRecords(client, "markdown_article");

  for (const record of records) {
    await client.items.update(record.id, {
      structuredTextContent: await markdownToStructuredText(record.content, {
        // dast only allows `block` nodes at the root level,
        // so move every <img> to the root before the conversion
        preprocess: (tree) => {
          findAll(tree, (node, index, parent) => {
            if (node.tagName === "img") {
              tree.children.push(node);
              parent.children.splice(index, 1);
              return;
            }
          });
        },
        handlers: {
          // Upload each image to DatoCMS and wrap it in an Image block
          img: async (createNode, node, context) => {
            const { src: url } = node.properties;
            const uploadPath = await client.createUploadPath(url);
            const upload = await client.uploads.create({ path: uploadPath });
            return createNode("block", {
              item: buildModularBlock({
                image: {
                  uploadId: upload.id,
                },
                itemType: modelIds.image_block.id,
              }),
            });
          },
        },
      }),
    });

    // Republish records that were not drafts, so the new content goes live
    if (record.meta.status !== "draft") {
      console.log("Republish!");
      await client.items.publish(record.id);
    }
  }

  await swapFields(client, "markdown_article", "content");
};
```
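The `preprocess` step is needed because `dast` only accepts `block` nodes as direct children of the root, so every `img` nested inside a paragraph or a list has to be moved to the top level before the `img` handler turns it into a block. The following dependency-free sketch shows the same idea on plain hast-like objects. `hoistImages` is a hypothetical name, and unlike the migration above it first collects all matches and only then detaches them, so no `children` array is modified while it is being traversed.

```js
// Hypothetical helper: move every <img> element to the root of a hast tree.
function hoistImages(tree) {
  const matches = [];

  // Depth-first walk, recording each <img> together with its parent
  function walk(node) {
    for (const child of node.children || []) {
      if (child.type === "element" && child.tagName === "img") {
        matches.push({ node: child, parent: node });
      } else {
        walk(child);
      }
    }
  }
  walk(tree);

  // Detach the matches only after the walk has finished
  for (const { node, parent } of matches) {
    if (parent === tree) continue; // already at the root level
    parent.children.splice(parent.children.indexOf(node), 1);
    // Like the migration above, append the image at the end of the document
    tree.children.push(node);
  }
  return tree;
}

const tree = {
  type: "root",
  children: [
    {
      type: "element",
      tagName: "p",
      properties: {},
      children: [
        { type: "text", value: "Look: " },
        {
          type: "element",
          tagName: "img",
          properties: { src: "https://example.com/a.png" },
          children: [],
        },
      ],
    },
  ],
};

hoistImages(tree);
console.log(tree.children.map((n) => n.tagName)); // [ 'p', 'img' ]
```

Appending to the end of the root mirrors the original code, but it means images lose their position in the text; inserting each image right after its top-level ancestor would be a possible alternative if placement matters.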
We can now run the new migration inside the sandbox environment we already created for the first migration:
```
> dato migrate --source=with-structured-text --inPlace --token=<READWRITE_API_TOKEN>
✔ Running 1612340785_convertMarkdownArticles.js...
Done!
```
To migrate Modular Content fields into Structured Text fields, we must acknowledge that both fields allow nested record blocks. The difference between the two is that Modular Content is basically an array of record blocks, while in Structured Text the record blocks live inside the `dast` tree, in nodes of type `block`. In other words, our task here is to transform, for every Modular Content field, an array of block records into a single `dast` document. It's up to us to decide how to convert each block we encounter into one or more nodes of our `dast` document.
Let's take a look at the project schema again:
The existing Modular Content field supports three block types:

- Text (which in turn contains a `text` Markdown field);
- Code (which has two fields, one that contains the actual code and another that stores the language);
- Image (which, as we already know, contains a single-asset field called `image`).
Here's the code for our migration:
```js
const { validate } = require("datocms-structured-text-utils");
const inspect = require("unist-util-inspect");
const getModelIdsByApiKey = require("./utils/getModelIdsByApiKey");
const createStructuredTextFieldFrom = require("./utils/createStructuredTextFieldFrom");
const getAllRecords = require("./utils/getAllRecords");
const swapFields = require("./utils/swapFields");
const markdownToStructuredText = require("./utils/markdownToStructuredText");

module.exports = async (client) => {
  const modelIds = await getModelIdsByApiKey(client);

  await createStructuredTextFieldFrom(
    client,
    "modular_content_article",
    "content",
    [modelIds.image_block.id]
  );

  const records = await getAllRecords(client, "modular_content_article");

  for (const record of records) {
    // Start from an empty dast root and accumulate children into it
    const rootNode = {
      type: "root",
      children: [],
    };

    for (const block of record.content) {
      switch (block.relationships.itemType.data.id) {
        case modelIds.text_block.id: {
          // Convert the Markdown and append the resulting top-level nodes
          const markdownSt = await markdownToStructuredText(block.attributes.text);
          if (markdownSt) {
            rootNode.children = [
              ...rootNode.children,
              ...markdownSt.document.children,
            ];
          }
          break;
        }
        case modelIds.code_block.id: {
          // dast has a native `code` node, so no record block is needed
          rootNode.children.push({
            type: "code",
            language: block.attributes.language,
            code: block.attributes.code,
          });
          break;
        }
        default: {
          // Image blocks: strip server-generated properties,
          // then embed the record as a dast `block` node
          delete block.id;
          delete block.meta;
          delete block.attributes.createdAt;
          delete block.attributes.updatedAt;
          rootNode.children.push({
            type: "block",
            item: block,
          });
          break;
        }
      }
    }

    const result = {
      schema: "dast",
      document: rootNode,
    };

    // Fail loudly (printing the tree) if the document is not valid dast
    const validationResult = validate(result);
    if (!validationResult.valid) {
      console.log(inspect(result));
      throw new Error(validationResult.message);
    }

    await client.items.update(record.id, {
      structuredTextContent: result,
    });

    if (record.meta.status !== "draft") {
      await client.items.publish(record.id);
    }
  }

  await swapFields(client, "modular_content_article", "content");
};
```
Every time we need to convert a Modular Content field, we start by creating an empty dast `root` node (that is, one with no children), stored in `rootNode`.

Then, for every block contained in the Modular Content field (the inner `for` loop), we accumulate children inside the `root` node, depending on the block type:

- if it is a Text block, we use the `markdownToStructuredText` function to convert its Markdown content into a dast tree, then take the children of the resulting `root` node and add them to our accumulator;
- since dast supports nodes of type `code`, if we encounter a Code block we simply convert it into a `code` node and add it to the accumulator;
- if we find an Image block (handled by the `default` branch, since it is the only remaining block type), we strip the properties generated by the server (`id`, `meta`, `createdAt` and `updatedAt`), wrap the block in a dast `block` node, and add it to the accumulator.
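Since the per-block logic is independent of the API calls, it can be isolated into a pure function and tested on its own. The sketch below is a simplified, dependency-free rendition of the loop above, under a few assumptions: block records are reduced to a hypothetical `{ type, attributes }` shape (in the real payload the type lives in `relationships.itemType.data.id`), the Markdown converter is passed in as a parameter, and that converter is synchronous here, whereas the real `markdownToStructuredText` is async and must be awaited. `modularContentToDast` and `stubMarkdown` are hypothetical names.

```js
// Hypothetical pure version of the conversion loop.
// `markdownToDast` is injected so the function can be tested without the API.
function modularContentToDast(blocks, markdownToDast) {
  const root = { type: "root", children: [] };

  for (const block of blocks) {
    switch (block.type) {
      case "text": {
        // Splice the top-level nodes of the converted Markdown into our root
        const doc = markdownToDast(block.attributes.text);
        if (doc) root.children.push(...doc.document.children);
        break;
      }
      case "code":
        // dast has a native `code` node
        root.children.push({
          type: "code",
          language: block.attributes.language,
          code: block.attributes.code,
        });
        break;
      default: {
        // Copy the record without server-generated properties (no mutation)
        const { id, meta, ...rest } = block;
        const { createdAt, updatedAt, ...attributes } = rest.attributes;
        root.children.push({ type: "block", item: { ...rest, attributes } });
      }
    }
  }

  return { schema: "dast", document: root };
}

// Stub standing in for markdownToStructuredText: one paragraph per text
const stubMarkdown = (text) => ({
  schema: "dast",
  document: {
    type: "root",
    children: [{ type: "paragraph", children: [{ type: "span", value: text }] }],
  },
});

const result = modularContentToDast(
  [
    { type: "text", attributes: { text: "Intro" } },
    { type: "code", attributes: { language: "js", code: "1 + 1" } },
    {
      id: "123",
      meta: {},
      type: "image",
      attributes: { image: { uploadId: "456" }, createdAt: "x", updatedAt: "y" },
    },
  ],
  stubMarkdown
);

console.log(result.document.children.map((n) => n.type)); // [ 'paragraph', 'code', 'block' ]
```

Using destructuring instead of `delete` leaves the original block records untouched, which is convenient if the same records are inspected again later (for example, when logging a failed validation).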
Once you get to know the Structured Text format, converting to and from its dast tree representation becomes quite straightforward, and the DatoCMS API, coupled with migrations and sandbox environments, makes it easy to apply any kind of transformation to your content.
You can download the final code from this GitHub repo.