We have a basic Kentico 13 Azure Cognitive Search implementation with a custom ISearchCrawlerContentProcessor that restricts which page builder content is indexed on included pages.
Whenever a search executes, the first highlight value for the sys_content field consistently begins with spurious content ahead of the expected output of the ISearchCrawlerContentProcessor.
The returned highlight value has the following pattern:
My page title My-page-title My Page title [expected ISearchCrawlerContentProcessor output here]
We have confirmed that the ISearchCrawlerContentProcessor output does not include this spurious content.
The spurious content looks to be titles and a page alias, but we're unable to identify what's causing it to be indexed.
We have checked the search field configuration, and that does not seem to be the cause.
We'd appreciate it if anyone can offer advice on the problem.
I think the 'content' checkbox is checked for those fields, which causes their values to be added to the 'sys_content' field.
This could be down to the default Kentico search index settings; you can check them under 'modules', in the 'pages' module, for the 'page' class.
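To illustrate why that checkbox would produce the pattern in the question, here is a rough toy model in Python. It is not Kentico code and does not call any Kentico or Azure API. The field names (title, alias, page_title) and the build_sys_content helper are hypothetical, and the ordering (flagged field values first, then the crawler output) is only an assumption that matches the highlight shown above.

```python
from dataclasses import dataclass


@dataclass
class SearchField:
    name: str       # hypothetical field name, not a real Kentico column
    value: str      # value stored on the page for this field
    content: bool   # models the 'content' checkbox in the search field settings


def build_sys_content(fields, crawler_output):
    """Toy model: join the values of all fields flagged as content,
    followed by the crawler output, into one sys_content string."""
    parts = [f.value for f in fields if f.content]
    parts.append(crawler_output)
    return " ".join(p for p in parts if p)


fields = [
    SearchField("title", "My page title", True),
    SearchField("alias", "My-page-title", True),
    SearchField("page_title", "My Page title", True),
]
processed = "[expected ISearchCrawlerContentProcessor output here]"

# With the checkbox ticked, the field values end up in front of the crawler output.
before = build_sys_content(fields, processed)

# With the checkbox cleared for every field, only the crawler output remains.
for f in fields:
    f.content = False
after = build_sys_content(fields, processed)

print(before)
print(after)
```

In this model, clearing the flag on the title and alias fields is what leaves only the processor output in sys_content, which corresponds to unchecking 'content' for those fields on the 'page' class.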
Many thanks, Arjan, that has solved the problem.
We suspected it was a case of default 'content' field indexing but weren't sure how or where that was controlled.
CC Kentico: this should be in the search documentation.