Programmatically trigger re-indexing of a node's content in the search index

George Tselios asked on February 4, 2020 16:44

Dear Sirs,

The portal that we are currently developing (using Kentico v12.0.51 MVC) has several pages designed to display content that is not stored directly in the page’s own fields but in the page’s descendant nodes. (We call these content-only nodes strips.) This lets us control the layout and content of a page in a more flexible and dynamic manner.

The portal also provides a full-text search feature, which we implemented using Smart Search, more specifically local search indexes of the “Pages” index type. We have enabled search only for the page types of the actual pages, not for the strips.

To also index the content stored in the strips, we developed a custom module with an event handler for the DocumentEvents.GetContent.Execute event. Each time the event is triggered, depending on the node’s ClassName, we load its descendant nodes, get their content and append it to the node’s content in the index. This way, when we search for content stored in a strip, the actual page is returned as the result, which is the desired behavior.
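For illustration, the aggregation step described above can be modelled without Kentico types. This is a minimal sketch, assuming a hypothetical PageNode class in place of TreeNode and a hypothetical page type name Custom.LandingPage. In the real handler attached to DocumentEvents.GetContent.Execute, the returned string would be appended to the event arguments' content (e.Content in Kentico's documented examples), and the descendants would come from a DocumentQuery rather than an in-memory list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a page or strip; in Kentico this data would come from TreeNode.
public sealed class PageNode
{
    public string ClassName { get; }
    public string AliasPath { get; }
    public string Text { get; }

    public PageNode(string className, string aliasPath, string text)
    {
        ClassName = className;
        AliasPath = aliasPath;
        Text = text;
    }
}

public static class StripContentAggregator
{
    // Hypothetical page types whose indexed content should include their strips.
    private static readonly HashSet<string> PagesWithStrips =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Custom.LandingPage" };

    // Returns the text a GetContent handler would append for this page.
    // Descendants are nodes whose alias path starts with the page path plus "/",
    // so "/Homepage/..." is not treated as a descendant of "/Home".
    // All descendants are included; a real handler might also filter by strip page type.
    public static string GetExtraContent(PageNode page, IEnumerable<PageNode> allNodes)
    {
        if (!PagesWithStrips.Contains(page.ClassName))
        {
            return string.Empty;
        }

        string prefix = page.AliasPath.TrimEnd('/') + "/";
        IEnumerable<string> stripTexts = allNodes
            .Where(n => n.AliasPath.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .OrderBy(n => n.AliasPath, StringComparer.OrdinalIgnoreCase)
            .Select(n => n.Text);

        return string.Join(" ", stripTexts);
    }
}
```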

The problem we are now facing concerns updates to the content of the strips. When a strip is updated in the Admin tool, the above event is not triggered for the parent page, so the strip’s new content is not added to it and the search does not return correct results. We have to rebuild the index manually to bring the indexed content up to date.

We tried enabling search for the page types of the strips, which caused the event to be triggered, but only for the strip node and not for its ancestors. (I believe this is the expected behavior.)

What we need is a way to programmatically trigger the above event for the parent page each time the content of a strip is updated.

Thanks in advance,

George

Correct Answer

Dat Nguyen answered on February 7, 2020 11:53

George, the boolean field update sounds very clever. Give it a shot. Trial and error is how I work out many things. I think it will work.

As for the query, I first determine whether I'm in preview/edit mode, because in that case I don't necessarily want the published version. Then I call the PublishedVersion method, like so:

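using System.Linq;
using CMS.DocumentEngine;
using Kentico.Content.Web.Mvc;     // HttpContext.Kentico().Preview()
using Kentico.PageBuilder.Web.Mvc; // HttpContext.Kentico().PageBuilder()
using Kentico.Web.Mvc;             // HttpContext.Kentico()

bool getLatestVersion = HttpContext.Kentico().PageBuilder().EditMode 
    || HttpContext.Kentico().Preview().Enabled;

IQueryable<TreeNode> query = DocumentHelper.GetDocuments()
    .Types("classnames")   // placeholder: the page type class names to include
    .Culture("en-us")
    .PublishedVersion(!getLatestVersion)
    // Note: string.Contains is a substring test, so this also matches the node itself
    // and can match non-ancestors (e.g. "/News" inside "/Archive/News/Item")
    .Where(x => treeNode.NodeAliasPath.Contains(x.NodeAliasPath));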

This will get the published version for the live site, latest version for preview/edit. If the node is not under workflow, it doesn't matter.
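The Where clause above relies on string.Contains, which is a substring test: it also matches the node itself, and a path such as "/News" would count as an "ancestor" of "/Archive/News/Item". A sketch of an alternative, assuming ancestors are identified by whole path segments and that the root alias path is "/", is to build the exact list of ancestor alias paths in memory. The helper AliasPaths.GetAncestorPaths is hypothetical; the resulting list could then be used to filter the query, for example with .WhereIn("NodeAliasPath", paths), assuming that WhereIn(columnName, values) overload is available in your Kentico version.

```csharp
using System;
using System.Collections.Generic;

public static class AliasPaths
{
    // Returns the alias paths of all ancestors of the given node, from the root down,
    // excluding the node itself. "/Products/Cars/Strip1" yields
    // "/", "/Products", "/Products/Cars".
    public static List<string> GetAncestorPaths(string nodeAliasPath)
    {
        var result = new List<string>();
        if (string.IsNullOrEmpty(nodeAliasPath) || nodeAliasPath == "/")
        {
            return result; // the root has no ancestors
        }

        result.Add("/");
        string[] segments = nodeAliasPath.Trim('/').Split('/');
        string current = string.Empty;

        // Build every proper prefix made of whole segments; the last segment is the node itself.
        for (int i = 0; i < segments.Length - 1; i++)
        {
            current += "/" + segments[i];
            result.Add(current);
        }

        return result;
    }
}
```

Because only whole segments are combined, "/Archive/News/Item" produces "/", "/Archive" and "/Archive/News", and never "/News".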


Recent Answers


Dat Nguyen answered on February 4, 2020 17:02

I would try updating the parent node in an event handler for the strip updates. That parent node update could in turn trigger the GetContent event for the parent node. Here is what I am suggesting, step by step:

  1. Implement an event handler for the Update event on the strip content.
  2. That event handler makes a fake update to the parent node and saves the update.
  3. The update to the parent node should trigger an index update and therefore the GetContent event.
  4. Your existing GetContent event handler does its thing.
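Step 2 needs to locate the parent page from the updated strip. A minimal sketch of that lookup walks up the alias path one segment at a time and stops at the first ancestor whose page type should carry the strip content. The dictionary of alias path to class name is a hypothetical stand-in for the content tree; in a real handler (for example one attached to DocumentEvents.Update.After, as discussed later in the thread) this information would come from a DocumentQuery or from the node's parent.

```csharp
using System;
using System.Collections.Generic;

public static class StripParentLocator
{
    // Returns the alias path of the closest ancestor whose class name is in pageTypes,
    // or null if there is none. nodeTypes is a hypothetical lookup: alias path -> class name.
    public static string FindParentPagePath(
        string stripAliasPath,
        IReadOnlyDictionary<string, string> nodeTypes,
        ISet<string> pageTypes)
    {
        string path = stripAliasPath;
        while (path != "/")
        {
            int lastSlash = path.LastIndexOf('/');
            // The parent of a top-level node such as "/Home" is the root "/".
            path = lastSlash <= 0 ? "/" : path.Substring(0, lastSlash);

            if (nodeTypes.TryGetValue(path, out string className) && pageTypes.Contains(className))
            {
                return path;
            }
        }

        return null;
    }
}
```

Nested strips are skipped naturally: for "/Home/Section/Strip" the walk checks "/Home/Section" (a strip) before reaching "/Home" (the page).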

Let me know if it works out.


George Tselios answered on February 4, 2020 17:24

Dear Dat,

First of all, thanks for your answer.

By "fake update", do you mean to just execute a node.Update() for the parent node?

I tried something like this manually in the Admin tool, i.e. just pressing the Save button, but it did not trigger the GetContent event unless I changed the value of a searchable field. Perhaps the behavior is different when done programmatically.

Two more questions:

  1. Given a TreeNode, what is the (MultiDocument) query to get all of its ancestors of a given class name(s) and culture?

  2. I presume that the GetContent event is only triggered for changes to nodes that are in the published state and not participating in any workflow. Is this correct?

Thanks in advance,

George


Dat Nguyen answered on February 4, 2020 17:51 (last edited on February 4, 2020 18:51)

A fake update likely would mean adding whitespace to a searchable field before executing node.Update().

To answer your other questions:

  1. This should work:

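        IQueryable<TreeNode> query = DocumentHelper.GetDocuments()
            .Types("classnames")
            .Culture("en-us")
            // Note: string.Contains is a substring test, so this also matches the node
            // itself and can match non-ancestors (e.g. "/News" inside "/Archive/News/Item")
            .Where(x => treeNode.NodeAliasPath.Contains(x.NodeAliasPath));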
    
  2. GetContent should trigger for nodes under workflow as well, as long as the changes are published.


Dmitry Bastron answered on February 5, 2020 14:47

Hi George and Dat,

There are a few points to keep in mind regarding your discussion.

First, the information is sent to the index only if at least one field marked as searchable is changed before .Update() is called, so a fake update won't work by design.
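A simplified model of the rule described above, for illustration only (it is not Kentico's actual implementation): a save leads to re-indexing only if at least one searchable field has a different value afterwards. The field names used with it are hypothetical. Under this model, a plain node.Update() with no modified values is ignored, while any real change to a searchable field, even an added space, counts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchReindexCheck
{
    // Simplified model: returns true only if some searchable field differs
    // between the values before and after the save. A missing field is treated as null.
    public static bool RequiresReindex(
        IReadOnlyDictionary<string, object> before,
        IReadOnlyDictionary<string, object> after,
        IEnumerable<string> searchableFields)
    {
        return searchableFields.Any(field =>
        {
            before.TryGetValue(field, out object oldValue);
            after.TryGetValue(field, out object newValue);
            return !object.Equals(oldValue, newValue);
        });
    }
}
```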

Going back to the original question, what you are trying to do is actually covered by the Pages crawler index type, as it indexes what is actually rendered on the page, and the indexed content can be customized in the SearchCrawler.OnHtmlToPlainText event. That way you can index, for example, only the content of the body tag or of divs with specified classes. I think this approach would suit you better.


George Tselios answered on February 7, 2020 11:29

Dear Dat and Dmitry,

Thank you both for your replies.

@Dmitry:

We have ruled out a pages crawler index, because parsing and extracting actual data from HTML code is far more difficult and inaccurate. Besides, we need certain search fields to be of specific types (e.g. dates, integers) to create strongly typed search criteria.

@Dat:

What we came up with is to introduce in the parent node a (hidden) boolean searchable field. Each time a (child) strip node is updated we would use the DocumentEvents.Update.After event to find the corresponding parent node, negate the value of this field and then use the node.Update() method. We suppose this would trigger a DocumentEvents.GetContent.Execute event which will index the entire sub-tree of the parent node. Would such an implementation work?
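Negating the field, rather than setting it to a fixed value, matters: setting it to true on every save would leave the value unchanged from the second save onwards, so no searchable field would differ. A minimal sketch of the toggle over a plain field dictionary follows; the field name StripsReindexToggle is hypothetical. On a real TreeNode the equivalent would presumably be reading the field, calling SetValue with the negated value and then Update(), as described above.

```csharp
using System.Collections.Generic;

public static class ReindexToggle
{
    // Hypothetical name of the hidden boolean searchable field on the parent page type.
    public const string ToggleField = "StripsReindexToggle";

    // Negates the toggle (a missing or non-boolean value counts as false),
    // stores the new value and returns it, so every call produces a changed value.
    public static bool Flip(IDictionary<string, object> fields)
    {
        bool current = fields.TryGetValue(ToggleField, out object value) && value is bool flag && flag;
        bool next = !current;
        fields[ToggleField] = next;
        return next;
    }
}
```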

Also, would the query you provided for finding the ancestors of a given node return the correct version of the nodes? That is, would it return the workflow versions of the ancestors if the given node is under workflow?

Thanks in advance,

George


Dmitry Bastron answered on February 7, 2020 15:45

We have ruled out the use of pages crawler type of index, because parsing and extracting actual data from Html code is far more difficult and inaccurate. Besides, we need certain search fields to be of certain type (e.g. dates, integers) for creating strongly typed search criteria.

1) Stripping the data from the actual HTML works quite well with Kentico, as the default implementation strips tags successfully. All you need to do is customize it in the event to cut out the body or divs with certain classes, and this can easily be done with HtmlAgilityPack.

2) A Pages crawler index still contains all these strongly typed fields (as configured for the page type). You can check this by connecting to the index with the Luke tool. It only changes the way the "_content" column is populated, so you can safely use the same search helper.


George Tselios answered on February 13, 2020 17:59

Dear Dat,

We have implemented the triggering of the GetContent event by introducing a Boolean field as described in a previous message and it works as expected as far as the re-indexing of the content is concerned.

Now we are facing another problem. The search in the MVC portal does not return any results if we use a term that is part of a strip's content. The strange thing is that if we use the Search Preview in the Kentico Admin with the exact same term, we get the expected results. This means that the content of the strip is actually added to the index as part of the parent node's content. On the MVC portal side, the search still works and returns results as long as the search term is part of the parent node's own content.

We also ran the following test: we copied the exact raw values (no variables) of the SearchParameters object that produces results in the "Search Preview" page of the Kentico Admin into the corresponding function of the MVC portal, and it still does not return any results. The only value that differs between the two applications is the user (MembershipContext.AuthenticatedUser), since in Kentico the user is the Global Admin and in MVC it is the anonymous user. Also note that the CheckPermissions property of SearchParameters is set to false.

So, is it possible that when one handles the GetContent event to add extra content to an index, additional security permissions are also added that prevent the anonymous user from searching for it?

Please advise on how to work around this issue.

Thanks in advance,

George


Dmitry Bastron answered on February 13, 2020 18:32

Hi George,

There might be a slight difference between the search indexes used by the CMS (in the admin) and by MVC. Kentico synchronizes these indexes via Web Farms, so there are a few potential issues:

  1. Index data is not synced between the CMS and MVC: check the state of the Web Farm and check for outstanding Web Farm tasks.
  2. Check for outstanding search tasks.
  3. I suspect your customization should be included in the MVC project as well (because the MVC app rebuilds the index too).

George Tselios answered on February 14, 2020 10:33

Dear Dmitry,

I found that the index-related files (.cfs, .del, .gen etc.) created under the CMS ([CMS]\App_Data\CMSModules\SmartSearch\[IndexName]) do not match the files in the MVC index folder ([MVC Portal]\App_Data\CMSModules\SmartSearch\[IndexName]). The files vary in size, and sometimes not all files created under the CMS are present under the MVC portal.
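The manual comparison described above can be scripted. A minimal sketch that lists files present in the CMS index folder but missing from the MVC folder, or differing in size; the two folder paths are supplied by the caller, and files that exist only on the MVC side are not reported. A difference only shows that the two indexes diverge, not which side is stale.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class IndexFolderComparer
{
    // Compares the top-level files of two index folders by name and size.
    public static List<string> FindDifferences(string cmsIndexDir, string mvcIndexDir)
    {
        var differences = new List<string>();
        foreach (string cmsFile in Directory.GetFiles(cmsIndexDir).OrderBy(f => f, StringComparer.OrdinalIgnoreCase))
        {
            string name = Path.GetFileName(cmsFile);
            string mvcFile = Path.Combine(mvcIndexDir, name);

            if (!File.Exists(mvcFile))
            {
                differences.Add(name + ": missing in MVC");
            }
            else if (new FileInfo(cmsFile).Length != new FileInfo(mvcFile).Length)
            {
                differences.Add(name + ": size differs");
            }
        }

        return differences;
    }
}
```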

I manually copied and pasted the index files of the CMS under the MVC and the search worked as expected.

It seems that the additional content that is inserted during the GetContent event (on the CMS side) is not present in the MVC portal.

Are there any event handlers in the MVC project in which I need to register my custom implementation to produce the same index files?

Please advise on how to proceed.

Thanks in advance,

George


Dmitry Bastron answered on February 14, 2020 10:46

Hi George,

The MVC application has the same event handlers and the same approach to customizing them. Basically, you just need the same code to be present in the MVC application. Kentico recommends moving this logic into a separate project (assembly) and including it in both the CMS and MVC solutions. Please refer to the Kentico guidelines. I hope this helps.


George Tselios answered on February 14, 2020 13:25

Dear Dmitry,

We added our custom CMS module to the MVC project as well, and now the search is working as expected.

Regards,

George

