How to build a search engine with Strapi and Nuclia

Strapi is an open-source headless CMS. It offers a nice admin panel to manage content and a powerful API to fetch it. But what if you want to provide a search engine for your content?

Nuclia is the best search API to do it!

Nuclia is an API that indexes and processes any kind of data, including audio and video files, to add powerful search capabilities to your applications. It uses natural language processing and machine learning to understand the searcher’s intent and return more relevant results.

The steps are simple:

generate a Nuclia search widget,
use lifecycle events to index text content,
index video and audio files too!

Let’s go!

Generating the Nuclia widget: as simple as copy/paste!

To add the Nuclia search feature to your Strapi-managed website, you need to create a Nuclia account. You can do it here.

Nuclia manages content in knowledge boxes. When you create your account, Nuclia automatically creates a default knowledge box for you.

After completing the account creation, you will be redirected to the Nuclia dashboard, where you can manage your knowledge box. As you want to allow visitors to run searches on your website, you must make your knowledge box public. To do so, click the Publish button at the top right of the page.

If you go to the "Widgets" entry in the left menu, you can create a new search widget for your website.
Let’s call it strapi-search-widget.

Change the mode from input to form, and save it.

It generates a code snippet similar to:

<script src="https://cdn.nuclia.cloud/nuclia-widget.umd.js"></script>
<nuclia-search
  knowledgebox="YOUR-KNOWLEDGE-BOX-ID"
  zone="europe-1"
  widgetid="strapi-search-widget"
  type="form"
></nuclia-search>

By default, the Nuclia widget renders a selected result in a modal popup. Since you want to navigate directly to the pages corresponding to the search results, you need to add an extra attribute to the widget:

navigatetolink="true"

As the widget is just HTML, it is independent of the technology you use to implement your web pages. You just need to copy/paste it into your website's main layout to get your search feature up and running!

Note: you can also implement your own search widget. Check this previous blog post to learn how to build a search engine with Next.js and Nuclia.
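To show where the extra attribute ends up, here is a minimal sketch that assembles the complete widget markup as a string, which can be handy if your layout is generated by a template. The helper name renderSearchWidget and its parameters are hypothetical, not part of Nuclia; the attributes themselves are the ones from the generated snippet above.

```javascript
// Hypothetical helper (not a Nuclia API): builds the widget markup
// from the settings shown in the generated snippet.
const renderSearchWidget = ({ knowledgeBox, zone, widgetId }) =>
  [
    '<script src="https://cdn.nuclia.cloud/nuclia-widget.umd.js"></script>',
    '<nuclia-search',
    `  knowledgebox="${knowledgeBox}"`,
    `  zone="${zone}"`,
    `  widgetid="${widgetId}"`,
    '  type="form"',
    // Extra attribute: open the result's page instead of the modal popup.
    '  navigatetolink="true"',
    '></nuclia-search>',
  ].join('\n');

console.log(
  renderSearchWidget({
    knowledgeBox: 'YOUR-KNOWLEDGE-BOX-ID',
    zone: 'europe-1',
    widgetId: 'strapi-search-widget',
  }),
);
```

The printed markup is what you would paste into your main layout.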

Okay, for now, this widget does not do much because your knowledge box is empty. Let’s fix that!

Indexing your content

You want content to be indexed in Nuclia whenever it is published in Strapi and, conversely, removed from Nuclia whenever it is deleted or unpublished in Strapi.

Fortunately, Strapi lets you hook into the content lifecycle events to execute custom code.

Let’s say your content type is called article. You need to create a file named lifecycles.js in the src/api/article/content-types/article folder of your Strapi project.

The following code calls either index() or unindex() on the relevant events:

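module.exports = {
  // Runs after every update, including publish and unpublish actions.
  afterUpdate(event) {
    if (event.params.data.publishedAt === null) {
      // publishedAt was explicitly reset: the entry has been unpublished.
      unindex(event.result.id);
    } else if (event.params.data.publishedAt || event.result.publishedAt) {
      // The entry is being published, or is already published and was edited.
      index(event.result);
    }
  },
  async beforeDelete(event) {
    // Load the entry before it disappears to check whether it was published.
    const entry = await strapi.db.query('api::article.article').findOne({
      where: { id: event.params.where.id },
    });
    if (entry && entry.publishedAt) {
      unindex(event.params.where.id);
    }
  },
  async beforeDeleteMany(event) {
    // Bulk deletion: unindex every entry matching the filter.
    const entries = await strapi.db.query('api::article.article').findMany({
      where: event.params.where,
    });
    entries.forEach((entry) => unindex(entry.id));
  },
};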
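The branching in afterUpdate is the subtle part, so here is a small sketch that isolates it as a pure function you can run without Strapi. The name decideAction and the sample event objects are hypothetical; they only mimic the two fields the hook reads (event.params.data.publishedAt and event.result.publishedAt).

```javascript
// Hypothetical helper mirroring the afterUpdate branching shown above.
// Returns which action the hook would take for a given update event.
const decideAction = (event) => {
  const requested = event.params.data.publishedAt;
  if (requested === null) return 'unindex'; // explicit unpublish
  if (requested || event.result.publishedAt) return 'index'; // publish or edit of a published entry
  return 'none'; // draft edited, nothing to do in Nuclia
};

const now = '2023-01-01T00:00:00.000Z';
// Unpublishing an entry
console.log(decideAction({ params: { data: { publishedAt: null } }, result: { id: 1, publishedAt: null } })); // unindex
// Publishing an entry
console.log(decideAction({ params: { data: { publishedAt: now } }, result: { id: 1, publishedAt: now } })); // index
// Editing an already published entry
console.log(decideAction({ params: { data: { Title: 'New' } }, result: { id: 1, publishedAt: now } })); // index
// Editing a draft
console.log(decideAction({ params: { data: { Title: 'New' } }, result: { id: 1, publishedAt: null } })); // none
```

The strict comparison with null matters: an ordinary edit does not send publishedAt at all, so the field is undefined rather than null and the entry is not unindexed by mistake.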

That’s a good start. Now you need to implement the index() and unindex() functions. That’s where the Nuclia API comes into play.

First you need to install the following dependencies:

npm install @nuclia/core isomorphic-unfetch localstorage-polyfill
# OR
yarn add @nuclia/core isomorphic-unfetch localstorage-polyfill

Now you can create the Nuclia object:

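// The SDK expects fetch and localStorage, which these packages provide in Node.
require('isomorphic-unfetch');
require('localstorage-polyfill');
const { Nuclia } = require('@nuclia/core');
const { switchMap } = require('rxjs'); // used by index() and unindex()

const nuclia = new Nuclia({
  backend: 'https://nuclia.cloud/api',
  zone: 'europe-1',
  knowledgeBox: 'YOUR-KNOWLEDGE-BOX-ID',
  apiKey: 'YOUR-API-KEY',
});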

As you can see, you need to provide a Nuclia API key. An API key is necessary when adding or modifying content in a knowledge box. You can get your API key in the Nuclia Dashboard, in the "API keys" section:

Create a new Service Access (name it strapi for example) with Contributor role
Click on the + sign to generate a new token for this service access
Copy the generated token and paste it in the previous script
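Pasting the token into the script works for a quick test, but you may prefer to keep it out of your source code. As a minimal sketch, the options object can be built from environment variables instead. The function getNucliaOptions and the variable names NUCLIA_ZONE, NUCLIA_KB and NUCLIA_API_KEY are assumptions made for this example, not names defined by Nuclia or Strapi.

```javascript
// Hypothetical helper: builds the options passed to `new Nuclia(...)`
// from environment variables (the variable names are made up for this example).
const getNucliaOptions = (env = process.env) => {
  const { NUCLIA_ZONE = 'europe-1', NUCLIA_KB, NUCLIA_API_KEY } = env;
  if (!NUCLIA_KB || !NUCLIA_API_KEY) {
    // Fail early rather than sending unauthenticated requests later.
    throw new Error('NUCLIA_KB and NUCLIA_API_KEY must be set');
  }
  return {
    backend: 'https://nuclia.cloud/api',
    zone: NUCLIA_ZONE,
    knowledgeBox: NUCLIA_KB,
    apiKey: NUCLIA_API_KEY,
  };
};

// Example with an explicit object standing in for process.env:
console.log(getNucliaOptions({ NUCLIA_KB: 'YOUR-KNOWLEDGE-BOX-ID', NUCLIA_API_KEY: 'YOUR-API-KEY' }));
```

You would then create the client with new Nuclia(getNucliaOptions()).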

Now you are ready to use the Nuclia API to index your content. Here is the index() function:

const index = (content) => {
  const resource = {
    title: content.Title,
    slug: `article-${content.id}`,
    texts: {
      text: {
        format: 'MARKDOWN',
        body: content.Body,
      },
    },
  };
  nuclia.db
    .getKnowledgeBox()
    .pipe(switchMap((kb) => kb.createOrUpdateResource(resource)))
    .subscribe({
      next: () => console.log(`Uploaded article ${content.id} to Nuclia`),
      error: (err) => console.error(`Error with article ${content.id}`, err),
    });
};

It creates the resource data structure expected by Nuclia, passing the title and the markdown content of the article provided by Strapi (assuming your Article content type has a Title field and a Body field).
The data structure also contains a slug which will be used to identify the resource in Nuclia. It is important to make it unique, so you prefix it with the content type name.

createOrUpdateResource() creates the resource if the slug does not exist yet, and updates it otherwise.
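To make the mapping from a Strapi entry to a Nuclia resource easier to check, here is a sketch that builds the same data structure without calling the API. The helper names buildSlug and buildArticleResource are hypothetical, and the sample entry assumes the Title and Body fields mentioned above.

```javascript
// Hypothetical helpers: same resource structure as in index(), no network call.
const buildSlug = (contentType, id) => `${contentType}-${id}`;

const buildArticleResource = (content) => ({
  title: content.Title,
  slug: buildSlug('article', content.id),
  texts: {
    text: {
      format: 'MARKDOWN',
      body: content.Body,
    },
  },
});

const sample = { id: 7, Title: 'Hello', Body: '# Hello\n\nSome *markdown*.' };
console.log(buildArticleResource(sample).slug); // article-7

// The prefix keeps slugs unique across content types sharing the same id:
console.log(buildSlug('article', 7) === buildSlug('video', 7)); // false
```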

The unindex() function is similar:

const unindex = (id) => {
  nuclia.db
    .getKnowledgeBox()
    .pipe(switchMap((kb) => kb.getResourceFromData({ id: '', slug: `article-${id}` }).delete()))
    .subscribe({
      next: () => console.log(`Article ${id} deleted`),
      error: (err) => console.error(`Error when deleting article ${id}`, err),
    });
};

Indexing media files

Nuclia is really good at indexing video or audio files. But for now you are only providing the text content of your articles. Let’s fix that!

Let’s imagine you have a video content type with a Title field and a Video field. You want to index the video file in Nuclia.

You will implement a lifecycles.js file in the src/api/video/content-types/video folder very similar to the one you have just created for the article content type.

The difference is in the index() function:

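const fs = require('fs');
const crypto = require('crypto'); // used by hasFileChanged()
const { of, switchMap } = require('rxjs');

const index = (content) => {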
  const filePath = `./public${content.Video.url}`;
  const filename = filePath.split('/').pop();
  const contentType = content.Video.mime;
  const id = `video-${content.id}`;
  const resourceData = {
    title: content.Title,
    slug: id,
  };
  nuclia.db
    .getKnowledgeBox()
    .pipe(
      switchMap((kb) =>
        kb.createOrUpdateResource(resourceData).pipe(
          switchMap(() => kb.getResourceBySlug(id, ['values'])),
          switchMap((res) => {
            const fileContent = fs.readFileSync(filePath);
            if (hasFileChanged(id, res, fileContent)) {
              return uploadFile(kb, id, filename, fileContent, contentType);
            } else {
              return of(null);
            }
          }),
        ),
      ),
    )
    .subscribe({
      next: () => console.log(`Uploaded ${id} to Nuclia`),
      error: (err) => console.error(`Error with ${id}`, err),
    });
};

With Strapi's default local upload provider, media files are stored in the public folder, and the Video field contains the path to the file and its MIME type. That’s all you need to locate and describe the file.

The createOrUpdateResource() call is the same as before, passing the title and the slug. But as the file might be big, you want to upload it only when necessary.
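As an illustration of how the file path, file name and MIME type are derived from the media field, here is a sketch you can run on its own. The helper describeMedia and the sample media object are hypothetical; the sample only assumes the url and mime properties used by index() above.

```javascript
const nodePath = require('node:path');

// Hypothetical helper: derives what index() needs from a Strapi media field.
const describeMedia = (media, publicDir = './public') => ({
  filePath: `${publicDir}${media.url}`, // media.url starts with a slash
  filename: nodePath.posix.basename(media.url),
  contentType: media.mime,
});

// Sample value shaped like the Video field (the values are made up):
const video = { url: '/uploads/demo_1a2b.mp4', mime: 'video/mp4' };
console.log(describeMedia(video));
```

Using path.posix.basename is equivalent here to the split('/').pop() call in index(), since the URL always uses forward slashes.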

By calling getResourceBySlug(), you get the current content of the resource, and it contains the MD5 of the stored file (if any). That way you can compare it with the current file content MD5 and know if that is a different file or not. That’s what happens in hasFileChanged:

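const hasFileChanged = (id, resource, fileContent) => {
  if (resource.data.files && resource.data.files[id]) {
    // Compare the MD5 stored in Nuclia with the MD5 of the local file.
    const md5 = crypto.createHash('md5').update(fileContent).digest('hex');
    return md5 !== resource.data.files[id].file?.md5;
  }
  // No file stored yet for this resource: it must be uploaded.
  return true;
};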
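The MD5 comparison can be tried in isolation with Node's built-in crypto module. This is only a sketch: md5Hex and needsUpload are hypothetical names, and the stored hash is passed in directly rather than read from a Nuclia resource.

```javascript
const nodeCrypto = require('node:crypto');

// Hex-encoded MD5 digest of a Buffer or string.
const md5Hex = (data) => nodeCrypto.createHash('md5').update(data).digest('hex');

// Hypothetical helper: true when the stored hash is missing or differs.
const needsUpload = (storedMd5, fileContent) => storedMd5 !== md5Hex(fileContent);

const fileContent = Buffer.from('hello');
console.log(md5Hex(fileContent)); // 5d41402abc4b2a76b9719d911017c592
console.log(needsUpload(undefined, fileContent)); // true: nothing stored yet
console.log(needsUpload('5d41402abc4b2a76b9719d911017c592', fileContent)); // false: same file
```

MD5 is used here only as a cheap change detector, not for security.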

If the file has changed, you upload it with the uploadFile() function, passing the file content and the MIME type:

const uploadFile = (kb, id, filename, fileContent, contentType) => {
  return kb.getResourceFromData({ id: '', slug: id }).upload(id, fileContent.buffer, false, {
    contentType,
    filename,
  });
};
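One detail about fileContent.buffer is worth knowing: a Node Buffer is a view onto an ArrayBuffer, and that underlying ArrayBuffer can be larger than the Buffer itself when Node allocates from its internal pool (typically for small buffers). A defensive sketch, using a hypothetical helper named toArrayBuffer, copies exactly the bytes that belong to the file:

```javascript
// Hypothetical helper: returns an ArrayBuffer holding only the Buffer's own bytes.
const toArrayBuffer = (buf) =>
  buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);

const small = Buffer.from('abc');
const exact = toArrayBuffer(small);
console.log(exact.byteLength); // 3
console.log(Buffer.from(exact).toString()); // abc
```

If you ever see unexpected extra bytes in an uploaded file, passing toArrayBuffer(fileContent) instead of fileContent.buffer is one thing to try.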

And there you go: you can now index your video files in Nuclia!

The full code example discussed here is available on GitHub.