Lucene pipe delimited field search problem

Jim Piller asked on February 7, 2020 16:45

I have a couple of fields on a custom page type that hold pipe-delimited values. These need to be indexed so that smart search can find results based on the individual values. An example value would be something like this:

218|244|310

I have tried wildcard searches and escaping the pipe character in the search, and I've even tried building a custom analyzer based on the documentation located here:

https://docs.kentico.com/k12sp/configuring-kentico/setting-up-search-on-your-website/using-locally-stored-search-indexes/creating-local-search-indexes/creating-custom-smart-search-analyzers

My question is: how can I confirm that Kentico is actually calling the analyzer when it builds the index? I've used Luke to look at the index, but the field values are still listed exactly as shown above. I followed the instructions in the documentation, but I don't think Kentico is hitting the custom analyzer.

Any help on this would be appreciated.
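
For readers following along, here is a minimal sketch of the outcome the question is after. It is plain Python, not Lucene or Kentico code, and the function names whitespace_tokens and pipe_tokens are hypothetical. It only illustrates the difference between splitting a field value on whitespace, which leaves the pipes inside one token, and splitting it on pipes, which is what a pipe-aware analyzer would be expected to emit.

```python
import re


def whitespace_tokens(value: str) -> list[str]:
    """Split only on runs of whitespace, so pipes stay inside the token."""
    return value.split()


def pipe_tokens(value: str) -> list[str]:
    """Split on pipes and whitespace, dropping empty pieces."""
    return [piece for piece in re.split(r"[|\s]+", value) if piece]


field_value = "218|244|310"
print(whitespace_tokens(field_value))  # ['218|244|310']
print(pipe_tokens(field_value))        # ['218', '244', '310']
```

If a pipe-splitting analyzer were being applied, the indexed terms for the field would be expected to look like the second list rather than the first.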

Correct Answer

Dat Nguyen answered on February 7, 2020 18:27

Unfortunately, the Kentico API doesn't expose the ability to set the leading wildcard property of a query parser. You can develop your own custom search provider, but that may require some knowledge of the Lucene API.


Recent Answers


Dat Nguyen answered on February 7, 2020 17:21

If you're able to attach a debugger to your Kentico admin process, you should be able to put a breakpoint in your analyzer code and step through to see what's happening.

If your breakpoints are not being hit, check if you've actually set the custom analyzer for the index.


Jim Piller answered on February 7, 2020 17:58

Thanks for the response. I was able to resolve the issue by trying the White Space analyzer instead, and that worked. I do have a follow-up question: how do I allow a wildcard as the first character for a Kentico smart search index in code?


Juraj Ondrus answered on February 10, 2020 13:26

I just want to point out that it is not the Kentico API that disallows a wildcard as the first character; it is the Lucene parser: "Note: You cannot use a * or ? symbol as the first character of a search."


Jim Piller answered on February 10, 2020 15:30

@Juraj, you're correct, the documentation does say that. However, Luke offers an option to allow a first-character wildcard when you look at the indexes, so it must be possible somewhere in the Lucene API. On a side note for anyone else finding this thread: we resolved this problem by setting the analyzer in Kentico to "Subset", which breaks the indexed values down into smaller searchable pieces.
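
As a rough illustration of why a subset-style analyzer removes the need for a leading wildcard, the sketch below generates every substring of a value as its own token. This is plain Python under the assumption that "Subset" indexes substrings of each word. The function name substring_tokens is hypothetical, and the real analyzer may differ in detail.

```python
def substring_tokens(word: str) -> set[str]:
    """Return every contiguous, non-empty substring of the word."""
    return {
        word[start:end]
        for start in range(len(word))
        for end in range(start + 1, len(word) + 1)
    }


tokens = substring_tokens("218|244|310")

# The middle value is now a token in its own right, so an exact term
# lookup finds it without any wildcard.
print("244" in tokens)  # True
```

One trade-off worth noting: partial values such as "24" or "10" become tokens as well, so short search terms can match values they were not meant to, and the index grows accordingly.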


Dat Nguyen answered on February 10, 2020 15:40

Technically, the Lucene parser doesn't allow a true leading wildcard, but the Lucene v3 API does allow you to set the QueryParser.AllowLeadingWildcard bool property when using a PrefixQuery or a WildcardQuery. (Using leading wildcards doesn't work the same as a regular search, as the whole index is scanned for matches.)

This is what I mean when I say the Kentico API doesn't expose that property to developers. You can implement your own search provider that uses the Lucene.NET API directly; you just need to know what you're doing. See the QueryParser documentation.
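
The cost difference mentioned above, where a leading wildcard means scanning the whole index, can be sketched conceptually as follows. This is plain Python over a small made-up sorted term list, not the Lucene.NET API. The names terms, prefix_matches and wildcard_matches are hypothetical, and the sample terms are invented for illustration.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase

# Hypothetical term dictionary, kept in sorted order.
terms = sorted(["100|218", "218|244|310", "244", "310|415", "512"])


def prefix_matches(sorted_terms: list[str], prefix: str) -> list[str]:
    """Binary-search to the first candidate, then read until the prefix stops matching."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def wildcard_matches(sorted_terms: list[str], pattern: str) -> list[str]:
    """A leading wildcard gives no starting point, so every term is tested."""
    return [term for term in sorted_terms if fnmatchcase(term, pattern)]


print(prefix_matches(terms, "218"))      # ['218|244|310']
print(wildcard_matches(terms, "*244*"))  # ['218|244|310', '244']
```

The prefix lookup touches only the neighborhood of the matching terms, while the wildcard lookup visits every term, which is the reason leading wildcards are typically disabled by default.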

