Kentico CMS 7.0 Developer's Guide

Creating an index


Before you can use the Smart search module, you need to prepare the appropriate indexes for storing information about the website's content in an efficiently searchable format.

 

The following types of search indexes are available:

 

Documents - stores information about the content of documents in the content tree.

Documents crawler - directly indexes the HTML output of documents (pages).

Forums - stores information about the content of discussion forums in the system.

Custom tables - indexes records stored in custom tables.

Users - stores information about users in the system.

General - stores information about system objects of a specified type.

Custom - allows you to use your own custom‑coded search index. Stores any kind of data depending on the implementation.

 

The following steps describe how to create smart search indexes for your website:

 

1. Go to Site Manager -> Administration -> Smart search.

 

2. Click New index. The New search index dialog opens.

 

3. Fill in the following details for the index:

 

Display name

Name of the index displayed in the administration interface.

Code name

Name of the index used as a unique identifier, typically in web part properties or in the API. You can leave the default (automatic) option to have the system generate a code name based on the display name.

 

Warning: This name is also used for the physical index file. The fully qualified name of the file must be less than 260 characters long, including the directory path.
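As a rough illustration of the length limit in the warning above, the following Python sketch checks whether a given code name keeps the full index file path under 260 characters. The directory in the example is hypothetical; substitute the folder where your installation actually stores its index files.

```python
import ntpath  # Windows path rules, usable on any platform

MAX_PATH = 260  # limit stated in the warning above


def index_path_fits(index_directory, code_name):
    """Return True if the combined path is shorter than MAX_PATH characters."""
    full_path = ntpath.join(index_directory, code_name)
    return len(full_path) < MAX_PATH


# Hypothetical directory, used for illustration only.
example_dir = r"C:\inetpub\wwwroot\MySite\IndexFiles"
print(index_path_fits(example_dir, "MySite.Documents"))  # True
```

A check like this is mainly useful when code names are generated automatically from long display names or when the website is installed in a deeply nested folder.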

Index type

Determines what type of content is stored in the index. The following index types are available:

 

Custom index - indexes any kind of data depending on the implementation.

Custom tables - indexes records in custom tables.

Documents - indexes the content of documents in the content tree.

Documents crawler - indexes the HTML output of the website's documents (pages).

Forums - indexes the content of discussion forums.

General - indexes system objects of a specified type. General indexes allow you to search through any objects within the CMS.

Users - indexes details about users in the system (fields of the CMS_User system table).

Analyzer type

Sets the type of text analyzer that the index uses to process (tokenize) content. The following analyzer types are available:

 

Custom - allows you to assign a custom‑written analyzer. This provides a way to perform text tokenization according to your own specific requirements. If selected, you need to specify the names of the assembly and class where the custom analyzer is implemented. See Using custom analyzers for more information.

Keyword - returns the entire text stream as a single token. This is useful for structured data fields like zip codes or IDs.

Simple - divides text at non-letter characters.

Standard - grammar-based analyzer that recognizes stop words, shortcuts and similar constructs. This option is very efficient for English, but may not produce satisfactory results with other languages.

Starts with - tokenizes all prefixes contained in words, which allows searching for words that start with the search keyword. Divides text at whitespace characters. For example, searching for test returns words such as test, tests, tester, etc.

Stop - uses a predefined collection of stop words to divide text.

Subset - tokenizes all possible substrings in words. Divides text at whitespace characters. Indexes with this analyzer type return results for all words that contain the search keyword. For example, searching for net returns words such as net, Internet, network, etc.

White space - divides text at whitespace characters.
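The differences between the simpler analyzer types can be approximated with a few lines of Python. This is only a sketch based on the descriptions above, not the code the system uses; the function names are made up for the example, and the real analyzers may apply additional processing such as case normalization.

```python
import re


def keyword_tokens(text):
    # Keyword: the entire text becomes a single token.
    return [text]


def simple_tokens(text):
    # Simple: split at every non-letter character (keep runs of letters).
    return re.findall(r"[^\W\d_]+", text)


def whitespace_tokens(text):
    # White space: split at whitespace characters only.
    return text.split()


def starts_with_tokens(text):
    # Starts with: every prefix of each whitespace-separated word.
    tokens = set()
    for word in text.split():
        for end in range(1, len(word) + 1):
            tokens.add(word[:end])
    return tokens


def subset_tokens(text):
    # Subset: every substring of each whitespace-separated word.
    tokens = set()
    for word in text.split():
        for start in range(len(word)):
            for end in range(start + 1, len(word) + 1):
                tokens.add(word[start:end])
    return tokens


print(keyword_tokens("ZIP: 90210-1234"))     # ['ZIP: 90210-1234']
print(simple_tokens("ZIP: 90210-1234"))      # ['ZIP']
print(whitespace_tokens("ZIP: 90210-1234"))  # ['ZIP:', '90210-1234']
print("test" in starts_with_tokens("tester"))  # True
print("net" in starts_with_tokens("Internet"))  # False
print("net" in subset_tokens("Internet"))       # True
```

The sketch also shows why Starts with and Subset indexes tend to be larger: each word contributes many tokens instead of one, and the number of substrings grows much faster with word length than the number of prefixes.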

Stop words

Selects the stop word dictionary for Stop or Standard analyzers.

 

Stop words (e.g., 'and', 'or') are excluded from the index content and the analyzer uses them to divide text into tokens.

 

You can edit the content of the dictionaries or add new ones. The application stores the dictionaries as text files in the ~\App_Data\CMSModules\SmartSearch\_StopWords folder.
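The effect of a stop word dictionary can be sketched in Python as follows. The example keeps the stop words in an in-memory set instead of reading a dictionary file, and it assumes that text is lowercased and split at non-letter characters before filtering; both are simplifications made for illustration, not a description of the actual analyzers.

```python
import re


def stop_tokens(text, stop_words):
    """Split text into lowercase words and drop the stop words."""
    stop = {word.lower() for word in stop_words}
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [word for word in words if word not in stop]


# Hypothetical two-word dictionary, for illustration only.
print(stop_tokens("Cats and dogs or birds", {"and", "or"}))
# ['cats', 'dogs', 'birds']
```

Because stop words never reach the index, searching for them alone returns no results.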

Assign index to website <sitename>

Check this box to automatically assign the new index to the currently active site.

 

[Figure: Creating a new document index]

 

4. Click Save to create the search index. The General tab of the index's editing interface opens, where you can edit the same properties that you configured when creating the index.

 

Additionally, you can set the Batch size property for the index, which sets the maximum number of records that the system retrieves in a single database query when rebuilding (or creating) the index. This allows you to optimize indexing performance. The default value is 10. Increasing the value reduces the number of queries required for large numbers of records, which may improve performance, but also increases memory consumption. The optimal value depends on the type (size) of the indexed objects and on the resources available in your hosting environment. When indexing large objects (e.g. documents), set a reasonably small batch size.
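To get a feel for the trade-off, the following Python sketch estimates how many database queries a rebuild needs for a given batch size. It assumes exactly one query per batch, which follows from the description above but ignores any additional queries the system may issue.

```python
def queries_needed(record_count, batch_size=10):
    """Estimate the number of queries: one per batch, rounded up."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Integer ceiling division: a partially filled last batch still costs a query.
    return (record_count + batch_size - 1) // batch_size


print(queries_needed(25000))       # 2500 queries with the default batch size of 10
print(queries_needed(25000, 500))  # 50 queries, but up to 500 records held in memory at once
```

The second call shows both sides of the setting: far fewer queries, at the cost of loading fifty times as many records per query.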

 

5. Switch to the Index tab and define which documents, forums, custom tables, users or other objects should be included in the index. The options available on the Index tab depend on the type of the index. You can find detailed information about setting the content of individual index types in the following topics:

 

Defining document index content

Defining forums index content

Defining custom tables index content

Defining user index content

Defining general index content

Defining custom index content

 

6. Open the Sites tab and assign the index to the websites where you wish to use it. You can implement multi-site search functionality by assigning the index to more than one website.

 

Note: If the index includes global objects that are not site-specific, the selection made on the Sites tab does not affect the index's content. However, the index is only available for use (through Smart search web parts) on the assigned sites.

 

7. If you are creating a Documents or Documents crawler type index, switch to the Cultures tab. Here you need to select which language versions of the website's documents should be indexed.

- You must assign at least one culture for the index to be functional.

- If you have a multi-site index, you can select the cultures separately for each site.

 

8. Go back to the General tab and click Rebuild to build the index.

The Index info box on the right side of the General tab displays current information about the status and properties of the index.

 

Once the system finishes building the index, you can start using it on your website. The Search preview tab allows you to test the functionality of the index.

 

Maintaining search indexes

 

You can manage existing search indexes using the actions available on the General tab of the index editing interface.

 

The system automatically updates search indexes to reflect all changes made to the indexed content. Over time, these updates can make indexes less efficient, particularly in the case of large indexes.

 

To restore optimal search performance for an index, defragment it by clicking the Optimize action. You can enable the Optimize search indexes scheduled task to have the system automatically optimize all smart search indexes once per week.

 

The Rebuild action deletes the current index file and indexes all specified content again.

 

Use the rebuild action to apply changes made to the index's configuration. This includes modifications of the analyzer settings (Analyzer type, Stop words), all options on the Index, Sites or Cultures tabs, and adjustments of the search field settings for the indexed objects.

The system automatically optimizes the index after a successful rebuild.

 

 

Note: Clicking the Rebuild action does not guarantee that the index starts rebuilding immediately. The process may be delayed if another index is already being rebuilt or if the rebuilding tasks are configured to be handled by the scheduler.