Page Crawler Index

Rohan Pandit asked on May 7, 2020 20:22

Hi guys,

I have a Kentico 12 MVC site where the CMS and what I guess you'd call the "client" site are on the same server but under separate IIS entries. One is called admin.site.com and the other is called dev.site.com.

I'm trying to implement the Smart Search functionality with a Page Crawler index. The reason I want a Page Crawler index is because my content structure is as follows:

Page Container > Page Type "Product"

Then within the "Product" page type, I'm pulling in content from a different part of the content tree using widgets/page builder functionality in the Page tab. The Content tab of that page has very little actual content.

If I use a Pages index and search on that, it only picks up the page types that live in the content widget section of the site, not the pages that implement the widgets, which are the actual live pages on the site. I implemented the Page Crawler index and tried a search preview, but literally anything I search for comes back with no results. Please let me know what details you'd need from me to help. I appreciate any help!

Best, RP

page crawler index search smart search

Recent Answers

Technical support leader

Juraj Ondrus answered on May 7, 2020 21:15 (last edited on May 7, 2020 21:16)

Check the documentation and especially the note:
"We do not recommend using crawler indexes on MVC content-only sites. The crawler only selects pages from the site's content tree in Kentico, which may not match the actual structure of the website (in many cases, content-only pages only store data and do not represent pages on the live site)."

To achieve this, you will need to write your own crawler code and combine it with a custom search index.

Rohan Pandit answered on May 7, 2020 21:22

Thanks for the info Juraj!

I saw that note, but based on another thread I thought a Page Crawler index would be the best option, since it would crawl the outputted HTML, which includes the widget content that's being pulled into the actual displayed pages.

I looked at the documentation for creating a custom index and I am a little bit confused. The documentation specifies how to create custom code to go over text files that I guess are the index, but how do I create those txt files in the first place? How would I create a custom crawler to crawl over specific page types and then use the code in the documentation to traverse that custom index I've made? Any starting tips or pointers to more code or anything is much appreciated!

Best, RP

Dmitry Bastron answered on May 8, 2020 14:53

Hi Rohan,

I have just tested page crawler indexes on a test Dancing Goat website and found them working perfectly, so I guess it might be a configuration issue. As Juraj mentioned in his comment, the only drawback of using page crawler indexes is that all your indexed pages must be present in the CMS content tree as Kentico page types with a URL pattern defined (and in MVC, as you probably know, we have full freedom in how to route URLs). Could you post the following here (screenshots may be better):

What Presentation URL do you have configured for your site? (you can find it in Sites application > edit Site) Or do you have Domain configured for your Pages crawler index?
What do you have configured on Indexed content tab for your index?
When you rebuild your Page crawler index, do you see that this index contains documents (number of indexed documents > 0) and rebuild completes without errors?
Where are you trying to perform a search that returns no results? Is it in Search preview in the CMS or in your code on the MVC site? If it is on the MVC site, check that web farms are configured in automatic mode and that the web farm servers are not shown in red.

Rohan Pandit answered on May 8, 2020 18:14

Hi Dmitry,

Thank you very much for this info!

Presentation URL: http://dev.site.com. Domain: I've tried both admin.site.com and dev.site.com, as our site is split into two parts.
Currently I have /%, so the entire content tree. I have not restricted it by adding specific page types.
Yes, I see 71 indexed items and a file size of about 67 KB. I see no errors when I rebuild.
In both areas I get no results. I will look into the web farm config. I see that one server is healthy, and the other, which has _AutoExternal1 appended to the name, is also healthy. Where do I check whether the configuration is set to automatic mode?

Technical support leader

Juraj Ondrus answered on May 11, 2020 05:48

The documentation only provides a sample to show what is needed on the Kentico side. In that sample, the text files are the source of the data for the index. In your case, the source is the front-end MVC website. This means you need to create your own crawler, as mentioned above, which will crawl the MVC pages to fit your needs and provide the data to the custom index.
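
To make the "own crawler" part a bit more concrete, here is a minimal sketch in Python (not Kentico code) of the step that turns the rendered HTML of a live MVC page into a plain text file that a custom index could then read as its data source. The names `VisibleTextExtractor`, `html_to_text`, and `save_page_text`, as well as the output folder, are hypothetical. Fetching the HTML itself (for example with `urllib.request.urlopen` against the presentation URL) is left out so the sketch runs offline.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of script/style tags."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the fragments and collapse runs of whitespace into single spaces.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as one line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def save_page_text(html: str, out_dir: Path, name: str) -> Path:
    """Write the extracted text to <out_dir>/<name>.txt (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{name}.txt"
    target.write_text(html_to_text(html), encoding="utf-8")
    return target
```

For example, `save_page_text(html, Path("crawl_output"), "product-1")` would write `crawl_output/product-1.txt`. Which pages to fetch and how to name the files is up to your crawler.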

Dmitry Bastron answered on May 11, 2020 10:05

Rohan,

Web farm settings can be changed here: Settings application > Versioning & Synchronization > Web farm. But all your settings appear to be OK. There are a couple more things to check, though:

On General tab of your index what analyzer do you have selected?
What culture (language) is your website content in?

Brandon Owensby answered on June 2, 2020 00:58

Rohan, I am actually going through the process of building a search on an MVC site as well. After some research I found that the warning about using the page crawler on MVC sites is based on a HUGE assumption. In my case I'm using the MVC portion of the site for templating pages, and the actual pages are defined by the tree, which allows the page crawler to work beautifully. What I have learned is that the disclaimer exists because Kentico assumes most people will define the pages in the MVC app using MVC routing, and therefore the admin app will not be able to get the proper URLs to crawl. If you are able to get the URLs for the pages you wish to crawl, then I've seen absolutely no issue in using the page crawler on an MVC site...in fact it seems the best option for me thus far. Not to say it will be all I use...as I do also want to search custom tables...but it will be my primary.

I know this may not address why yours isn't working...but wanted to give you confidence that the page crawler can work well with MVC...just a matter of understanding the rules.

Thank you, Brandon

Brandon Owensby answered on June 2, 2020 16:21 (last edited on November 11, 2020 15:43)

Rohan, as I continued the development of my search bar, I think I figured out the issue you are facing. One big change with the MVC version of Kentico is that there are now 2 web apps that run your site. What I found is that both the CMS app and the MVC app keep their own copy of the index by default. For a completely separate reason I wanted to specify a custom directory for indexes, but I initially only configured it in the admin (thinking that was the only place that used it). I noticed I was getting 2 sets of files, and I thought the second was left in the original directory for the admin, so I deleted them. I then noticed that my search wasn't working on the MVC site even though it was in the admin. When I rebuilt my index, those files eventually came back, and I realized they were for the MVC app and not the admin. This may be addressed in the above posts, but when I read them I didn't get that at all. I didn't do anything special to get it to work...but I'm sure there is something you can mess up to stop the MVC app from knowing to rebuild the index. Hopefully that helps give you or anyone else direction...even if it isn't a complete answer.
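
One way to check for this situation is to compare the two index folders directly. The sketch below (Python, standard library only) counts the files in each folder and flags the case where the admin copy has files but the MVC copy is empty or missing. The function names and the two folder arguments are hypothetical; pass in wherever your admin and MVC applications actually store their index files.

```python
from pathlib import Path


def index_summary(index_dir):
    """Return (file_count, total_bytes) for a folder, or (0, 0) if it does not exist."""
    root = Path(index_dir)
    if not root.is_dir():
        return (0, 0)
    files = [p for p in root.rglob("*") if p.is_file()]
    return (len(files), sum(p.stat().st_size for p in files))


def compare_index_copies(admin_dir, mvc_dir):
    """Summarize both index folders and flag a missing or empty MVC copy."""
    report = {
        "admin": index_summary(admin_dir),
        "mvc": index_summary(mvc_dir),
    }
    # True when the admin app has index files but the MVC app has none.
    report["mvc_copy_missing"] = report["admin"][0] > 0 and report["mvc"][0] == 0
    return report
```

If `mvc_copy_missing` comes back as `True`, that would match what I saw: search works in the admin but returns nothing on the MVC site until the MVC copy of the index is rebuilt.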

Thank you, Brandon
