
Supporting zk’s Markdown Extension in Hakyll

To me, one of the central drawbacks of Markdown is that it wasn’t designed with extensibility in mind. Nonetheless, a variety of tooling adds custom syntax on top of Markdown (e.g., Pollen or zk). Since this website is generated from a zk notebook using Hakyll, I also wanted to support some of zk’s Markdown extensions in my setup. Hakyll does not mandate a specific input format, but Markdown is commonly supported through Pandoc. As it turns out, Pandoc supports manipulating its document AST either through Lua filters or through its Haskell API. In the following, we use the latter to implement some of zk’s Markdown extensions for Hakyll.

Tag Syntax

A defining feature of zk is its support for tags. Tags are used to organize notes in a zk notebook and can be used to find notes through filters. A note can reference tags either in a YAML metadata block or inline through custom Markdown syntax. Pandoc already supports the former, so we concern ourselves with implementing the latter. At the time of writing, zk supports three different syntaxes for inline references to tags:

  1. #hashtags
  2. :colon:separated:tags:
  3. Bear’s #multi-word tags#

These Markdown extensions have to be enabled via zk’s configuration. I personally prefer Bear’s multi-word tags as they support tags with spaces in them and—at least in my opinion—are more readable than colon tags. Therefore, I use the following zk configuration:

[format.markdown]
hashtags = false
colon-tags = false
multiword-tags = true

To support multi-word tags in Pandoc, we need to manipulate inline elements. Pandoc’s Inline type has a Walkable instance for this purpose. We have to implement a transformation [Inline] → [Inline] rather than Inline → Inline because each multi-word tag yields two elements: a link element and the remaining text. A further challenge is that multi-word tags containing spaces are split at word boundaries. For example, the multi-word tag #Alpine Linux# is parsed as [Str "#Alpine", Space, Str "Linux#"]. As such, we cannot consider the elements of [Inline] in isolation, which makes this a bit tricky. My approach looks as follows:

{-# LANGUAGE OverloadedStrings, ViewPatterns #-}

import qualified Data.Text as T
import qualified Text.Pandoc.Definition as P

-- Replace Bear-style #multi-word tags# with links to their tag pages.
-- linkToTag :: T.Text -> P.Inline is defined elsewhere.
inlineBearTags :: [P.Inline] -> [P.Inline]
inlineBearTags (i@(P.Str (T.stripPrefix "#" -> Just tagRst)) : ix) =
  case takeTagElems (P.Str tagRst : ix) of
    Nothing -> i : inlineBearTags ix -- no closing '#': keep the element as is
    Just el ->
      let (tag, rst) = splitTag $ T.unwords el
          numElement = (length el - 1) * 2 -- count P.Space too
       in [linkToTag tag, P.Str rst] ++ inlineBearTags (drop numElement ix)
  where
    -- Collect words up to and including the one containing the closing '#'.
    takeTagElems :: [P.Inline] -> Maybe [T.Text]
    takeTagElems (P.Str str : xs)
      | T.elem '#' str = Just [str]
      | otherwise = (str :) <$> takeTagElems xs
    takeTagElems (P.Space : xs) = takeTagElems xs
    takeTagElems _ = Nothing

    -- Split at the closing '#' and drop the '#' itself.
    splitTag :: T.Text -> (T.Text, T.Text)
    splitTag t = let (b, a) = T.breakOn "#" t in (b, T.drop 1 a)
inlineBearTags (i : ix) = i : inlineBearTags ix
inlineBearTags [] = []
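
The code above calls a linkToTag helper that is not shown. A minimal sketch of what such a helper could look like follows. The URL layout /tags/<tag>.html and the link title are assumptions for illustration and have to match wherever the Hakyll rules actually route the tag pages.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import qualified Text.Pandoc.Definition as P

-- Hypothetical helper: builds a link whose text is the tag name.
-- The "/tags/<tag>.html" scheme is an assumption, not taken from the site.
linkToTag :: T.Text -> P.Inline
linkToTag tag = P.Link P.nullAttr [P.Str tag] (url, title)
  where
    url = "/tags/" <> tag <> ".html"
    title = "Notes tagged " <> tag
```

With a helper of this type in scope, inlineBearTags compiles as shown.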

To support multi-word tags, inlineBearTags merges the individual inline elements into a single Text, which is then split at the closing # character. Running this code on a Markdown input replaces occurrences of the #some tag# syntax with a link to the tag page generated by Hakyll. Now we just need to construct a custom Hakyll Compiler that invokes Pandoc with this transformation:

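-- Needs walk from Text.Pandoc.Walk; the remaining names come from Hakyll.
-- Behaves like pandocCompiler, but rewrites Bear-style tags before rendering.
pandocCompilerZk :: Compiler (Item String)
pandocCompilerZk =
  cached "pandocCompilerZk" $
    pandocCompilerWithTransform
      defaultHakyllReaderOptions
      defaultHakyllWriterOptions
      (walk transform)
  where
    -- Apply inlineBearTags to every inline list inside each block.
    transform :: P.Block -> P.Block
    transform = walk inlineBearTags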
Now within the Hakyll rules, just replace pandocCompiler with pandocCompilerZk.
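
To illustrate why a list-level function works with walk, here is a small sketch that is independent of the site's code. The mergeWords function and the doc value are made up for this example. mergeWords needs to see neighbouring elements, just like inlineBearTags, and walk applies it to every inline list in the document, including nested ones.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Text.Pandoc.Definition
import Text.Pandoc.Walk (walk)

-- Toy list-level rewrite (hypothetical): merge "Str a, Space, Str b"
-- into a single Str, which requires looking at neighbouring elements.
mergeWords :: [Inline] -> [Inline]
mergeWords (Str a : Space : Str b : rest) = mergeWords (Str (a <> " " <> b) : rest)
mergeWords (x : xs) = x : mergeWords xs
mergeWords [] = []

-- A document with one top-level and one nested inline list.
doc :: Pandoc
doc =
  Pandoc
    nullMeta
    [ Para [Str "#Alpine", Space, Str "Linux#"]
    , BulletList [[Plain [Emph [Str "nested", Space, Str "words"]]]]
    ]

-- walk picks the Walkable [Inline] Pandoc instance and rewrites both lists.
merged :: Pandoc
merged = walk mergeWords doc
```

Here merged contains Para [Str "#Alpine Linux#"] and, inside the bullet list, Emph [Str "nested words"]. An Inline → Inline function could not express this rewrite, since it only ever sees one element at a time.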

Note References

Apart from tags, due to its zettelkasten nature, zk also makes heavy use of references between notes. These references refer to the Markdown files in the zk notebook. When generating HTML for the notebook using Hakyll, we need to refer to the generated HTML instead. Conceptually, this is somewhat similar to Hakyll’s existing relativizeUrls function. That is, we need to iterate over all links, check if they reference another note, and if so, change the link to refer to the .html (instead of .md) file. For this purpose, I implemented the following fixupNoteRefs function:

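-- Needs replaceExtension and takeExtension from System.FilePath;
-- withUrls and isExternal come from Hakyll.
fixupNoteRefs :: Item String -> Compiler (Item String)
fixupNoteRefs = pure . fmap (withUrls go)
 where
  -- Point references to other notes at the generated HTML file.
  go :: String -> String
  go url
    | isZkRef url = replaceExtension url ".html"
    | otherwise = url

  -- A note reference is a local link with an .md extension or none at all;
  -- pure fragment links such as #section are left alone.
  isZkRef :: String -> Bool
  isZkRef ('#' : _) = False
  isZkRef url =
    let ext = takeExtension url
     in not (isExternal url) && (ext == "" || ext == ".md")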
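
To make the rewrite rule concrete, the following sketch applies the same logic to a handful of sample URLs without depending on Hakyll. The isExternal definition here is a stand-in written for this example (the prefixes it checks are an assumption), and the sample URLs are invented.

```haskell
import Data.List (isPrefixOf)
import System.FilePath (replaceExtension, takeExtension)

-- Stand-in for Hakyll's isExternal so the example runs without Hakyll;
-- the exact set of prefixes is an assumption.
isExternal :: String -> Bool
isExternal url = any (`isPrefixOf` url) ["http://", "https://", "//"]

-- Same predicate as in fixupNoteRefs.
isZkRef :: String -> Bool
isZkRef ('#' : _) = False
isZkRef url =
  let ext = takeExtension url
   in not (isExternal url) && (ext == "" || ext == ".md")

-- Same rewrite as the go helper in fixupNoteRefs.
rewrite :: String -> String
rewrite url
  | isZkRef url = replaceExtension url ".html"
  | otherwise = url

-- Pairs of (input URL, rewritten URL).
examples :: [(String, String)]
examples =
  [ (u, rewrite u)
  | u <- ["notes/alpine.md", "alpine", "#usage", "https://example.org/x.md", "img/logo.png"]
  ]
```

Only the first two URLs change, to notes/alpine.html and alpine.html. Note that a link such as alpine.md#usage is left untouched, since takeExtension then returns .md#usage. fixupNoteRefs applies this rewrite to the URLs in the rendered HTML.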

Within the Compiler monad, this function should be invoked directly after pandocCompilerZk. For example:

pandocCompilerZk
  >>= fixupNoteRefs
  >>= loadAndApplyTemplate "templates/note.html" (postTags <> noteCtx)
  >>= saveSnapshot "content"
  >>= loadAndApplyTemplate "templates/default.html" (sidebar <> noteCtx)
  >>= relativizeUrls

Theoretically, fixupNoteRefs could also be made a part of pandocCompilerZk.
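
One way this could look is to rewrite the link targets on the Pandoc AST instead of the rendered HTML. The following is only a sketch under that assumption. The name fixupNoteLink is made up, and unlike withUrls it only covers Link elements, not other URL-carrying elements such as images.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import Hakyll.Web.Html (isExternal)
import System.FilePath (replaceExtension, takeExtension)
import qualified Text.Pandoc.Definition as P

-- Same predicate as in fixupNoteRefs.
isZkRef :: String -> Bool
isZkRef ('#' : _) = False
isZkRef url =
  let ext = takeExtension url
   in not (isExternal url) && (ext == "" || ext == ".md")

-- Hypothetical AST-level counterpart of fixupNoteRefs: point links to
-- other notes at the generated .html file, leave everything else alone.
fixupNoteLink :: P.Inline -> P.Inline
fixupNoteLink (P.Link attr txt (url, title))
  | isZkRef path = P.Link attr txt (T.pack (replaceExtension path ".html"), title)
  where
    path = T.unpack url
fixupNoteLink inline = inline
```

Passing walk fixupNoteLink . walk transform as the third argument of pandocCompilerWithTransform would then apply both rewrites in a single pass through Pandoc.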

Future Work

I am also interested in supporting zk’s [[Wiki Links]]. However, since they refer to the note by title, supporting them in Hakyll requires an integration with zk’s SQLite database. An integration with the database would also be a prerequisite for supporting backlinks.