Maintenance in Kentico
Many developers think that by providing a finished website, their job is done. Even if they’ve followed all of our best practices regarding website design and setup, there is still an ongoing need to keep the overall system healthy. Websites should be serviced regularly, just as you would service your car.
There are several areas you should pay attention to when maintaining your website. The areas can be divided into:
- Linking content
- Unnecessary performance hogs
- Long term content strategy
- Database health
- File system health
We will take a look at each of these areas, and describe how to tackle them effectively in order to keep your website running smoothly for a long time.
Every page and resource in a website has a unique link. Some pages and resources can be accessed by using different methods. You should always be aware of the benefits and drawbacks of each linking method as you pick the one you will implement. Below is a brief discussion of the most commonly used types of content and their linking options.
Pages can be linked via a live URL, such as ~/Home.aspx. This is the default way in which pages are linked throughout the system (e.g. in navigation web parts) and one of the best options when it comes to linking pages (three SQL queries are required to get the PageInfo of the page). It’s also an SEO-friendly method for linking content.
An alternative method for linking content is to use a permanent URL, which looks something like this: ~/cms/getdoc/a88f82be-bb76-4b82-8faf-5253209f0f75/Home.aspx. You can force these types of URLs in Kentico via the Use Permanent URLs setting, found under URLs and SEO > Page URLs. This method gives a slight improvement in performance. Additionally, the part of the URL after the GUID (in this case Home) isn’t really important, and renaming the page doesn’t impact performance because the page is retrieved via the GUID part of the URL.
You may be asking yourself why renaming the page might degrade performance. If you rename or move a page, Kentico creates page aliases for the old name or old location of the given page (depending on the Remember original URLs when moving pages setting, which is enabled by default). If the old links for these pages are used (the old links become aliases), additional processing is required to retrieve the given page (eight SQL queries are required to get the PageInfo of the page). On top of that, a redirect could take place (depending on the Page alias settings), slowing down the user experience even further.
Another feature that can be used on pages is the Path or pattern field. This property allows you to add one alternative alias to the given page without any performance penalty. A frequently used feature of this field is the wildcard. The most performance-friendly approach is to use the wildcard in the Path or pattern field (only six SQL queries are required to get the PageInfo of the page).
Of course, if the website is set up properly, without the need to move content around, most of the issues mentioned in this section will never arise. Additionally, compared to other performance issues that we have found on some of our client websites, the performance impact of linking content is usually relatively small. However, if you have a page on your website that receives high traffic, you may want to look into what kind of URL is used to access that page. From a maintenance point of view, focus on keeping the number of aliases low. Make sure that content editors understand the implications of moving a whole subtree, and work to disable the Remember original URLs when moving pages setting where possible (I recommend using permanent URLs for this). An alternative could be to use redirects or rewrites in IIS in place of aliases to achieve the best performance. A hybrid solution, managing redirects in Kentico while using the IIS rewrite module to execute them, is explained in the “Real World Examples - Part II” webinar.
Media assets have two linking options. Option one is to use a getmedia link, such as ~/getmedia/eced9859-b422-4fbc-835e-925c24c60dc7/Lighthouse.aspx. This approach allows the use of resizing parameters, e.g. ~/getmedia/eced9859-b422-4fbc-835e-925c24c60dc7/Lighthouse.aspx?maxsidesize=100, which generates an image with a maximum side size of 100px. However, the performance of this approach is not as good as that of a direct link. The direct link to this image would be ~/CorporateSite/media/VideoGallery/Lighthouse.jpg. As you probably understand already, any added feature comes with a performance penalty. If the media files on your website are saved in the actual size in which they are displayed on the website, a direct link is the most effective strategy. However, you should also be aware that moving or renaming the file will result in a broken link. In summary, a getmedia link allows you to move, rename, and resize images, whereas the direct link allows you to serve the images faster if they were already uploaded in the size that is used on the website.
A hybrid approach that offers the best of both worlds is to use automatically generated thumbnails, which Kentico creates when a side size query string parameter is used (e.g. ~/CorporateSite/media/VideoGallery/__thumbnails/Lighthouse_jpg_100_75.jpg). With this approach, beyond proper initial setup, there isn’t much you can do on a regular basis to keep the website performing well, except to regularly clean up unused image thumbnails. You can force the system to generate direct or permanent links for media files by changing the Use permanent URLs setting under Content > Media. Finally, you should monitor the site for 404 errors in order to make sure there are no broken links to these media files.
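As a small illustration of the getmedia resizing option described above, here is a minimal Python sketch that adds the maxsidesize parameter to a getmedia link. The helper name with_max_side_size is hypothetical; only the maxsidesize parameter and the sample URL come from this article.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_max_side_size(getmedia_url: str, max_side_px: int) -> str:
    """Return the getmedia URL with its maxsidesize query parameter set.

    Hypothetical helper: it only manipulates the URL string and does not
    call Kentico in any way.
    """
    parts = urlsplit(getmedia_url)
    # Keep any query parameters that are already present.
    query = dict(parse_qsl(parts.query))
    query["maxsidesize"] = str(max_side_px)
    return urlunsplit(parts._replace(query=urlencode(query)))


link = "~/getmedia/eced9859-b422-4fbc-835e-925c24c60dc7/Lighthouse.aspx"
print(with_max_side_size(link, 100))
```

Generating links through one helper like this makes it easier to keep the requested sizes consistent, which in turn keeps the number of generated thumbnails small.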
Resources and Unmanaged Files
If your main aim is to improve performance, you can move these files into a stand-alone folder (such as the CMSGlobalFiles folder), then “minify” and combine them manually. This saves CPU cycles, and the files can be linked directly. However, this does require an additional step because you will need to exclude the URLs of custom resources referenced on the site via the Excluded URLs setting under URLs and SEO > URL format. If you don’t do this, the following will happen: Kentico will try to process the URL, and will check the main CMS_Tree database table, the CMS_DocumentAlias table, and try to match the URL via a wildcard. If Kentico cannot find the file, the system will try to access it in the file system. Excluding a folder from Kentico’s processing will improve the retrieval process of these files significantly. System folders like CMSGlobalFiles, CMSPages, etc. are automatically excluded from Kentico URL processing. Furthermore, any non .NET requests (like links ending with .jpg, .bmp, .gif, etc.) are directly processed by IIS if you don’t forward all requests to Kentico via the web.config key runAllManagedModulesForAllRequests="true".
Unnecessary Performance Hogs
There are certain things that you may not think of as performance hogs once the site goes live, but they can cause quite a few headaches if not dealt with promptly. Typical examples are unused scheduled tasks, 404 errors, and improper web farm configuration.
There are several scheduled tasks that run continually without any purpose, even if a given feature is unused. System maintenance here is easy. Simply go through all of the scheduled tasks in the Scheduled tasks application and make sure that only the ones you really need are enabled. Make sure you go through the global and site-specific tasks as well as the system tasks so that no unnecessary task is missed. You should focus on the tasks with the most executions, such as Send queued emails, Send email campaigns, or Synchronize web farm changes. If you don’t use Kentico for online marketing purposes, most of these tasks can be disabled. Another option is to use an external scheduler, so the task execution is offloaded to the actual server and not executed by the IIS process itself. When using web farms, think about distributing these scheduled tasks across multiple web farm servers. The same scheduled task, if not file-system-dependent, doesn’t have to be executed by each web farm server separately.
“Page not found” or “404” errors often go unaddressed when found in the Event log. However, now that you know that Kentico searches several database tables for possible URL pattern matches, you can also understand how having too many of these errors can cause serious issues during traffic spikes. I suggest monitoring the Event log for any recurring 404 errors and fixing them as soon as possible by creating replacement content or blocking IPs suspected of being spam bots. Typical examples of unnecessary “Page not found” errors I’ve come across include missing robots.txt, googlesitemap.xml, apple-touch-icon-*, or favicon.ico files.
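To show what monitoring for recurring 404 errors could look like in practice, here is a minimal Python sketch that counts 404 entries per URL in an exported list of event log entries. The dictionary keys ("code", "url") and the "PAGENOTFOUND" event code value are assumptions made for this illustration, so adjust them to whatever your export actually contains.

```python
from collections import Counter


def recurring_404s(events, min_hits=2):
    """Return (url, hits) pairs for URLs with at least min_hits 404 entries.

    `events` is assumed to be an iterable of dicts exported from the
    Event log; the key names and the event code are hypothetical.
    """
    hits = Counter(e["url"] for e in events if e["code"] == "PAGENOTFOUND")
    # most_common() sorts by count, highest first.
    return [(url, n) for url, n in hits.most_common() if n >= min_hits]


sample = [
    {"code": "PAGENOTFOUND", "url": "/favicon.ico"},
    {"code": "PAGENOTFOUND", "url": "/robots.txt"},
    {"code": "PAGENOTFOUND", "url": "/favicon.ico"},
    {"code": "EXCEPTION", "url": "/Home.aspx"},
]
print(recurring_404s(sample))
```

URLs at the top of this list are the ones worth fixing first, for example by adding the missing favicon.ico or robots.txt file.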
Long Term Content Strategy
Content tree architecture is very important, as explained here. But you should also have a strategy to keep the content tree organized and healthy.
Automatic content archiving and management should also be on your list of things to set up in order for your website to succeed. The content tree in Kentico is a great place to store and manage objects and files, but you should keep the number of child nodes under every parent node below 1000. Kentico’s database indexes are set up in such a way that they perform best under these parameters. To achieve this, consider automatic archiving or automatic content management. By “content archiving” I don’t mean the Archive step in a workflow. Pages in the Archive step are still in the content tree, so they count against the “1000 children per parent node” rule. The recommended way to achieve this is to save an archived page into a custom table or custom module class with a flat database structure outside of the content tree. You will need to think about the implications of storing content there (e.g. link management, taxonomy, and custom table search indexes), but on a website with a large number of documents and content changes, an archival strategy is of utmost importance. Alternative approaches are to save the given content to a custom table or module in the first place, or to delete the page altogether. You may use our advanced workflow or a scheduled task to implement this auto-archival business rule.
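The “1000 children per parent node” rule is easy to check once you have a list of parent node IDs, one per page. The following Python sketch is only an illustration: how you export that list from your database is up to you, and the function name crowded_parents is hypothetical.

```python
from collections import Counter

MAX_CHILDREN = 1000  # the limit recommended in this article


def crowded_parents(parent_ids, limit=MAX_CHILDREN):
    """Map parent node ID -> child count for parents at or above the limit.

    `parent_ids` is assumed to hold one parent node ID per page in the
    content tree (hypothetical export format).
    """
    counts = Counter(parent_ids)
    return {parent: n for parent, n in counts.items() if n >= limit}


# Parent 1 has 1000 children (too many), parent 2 has 999 (still fine).
print(crowded_parents([1] * 1000 + [2] * 999))
```

Any parent reported here is a candidate for archiving or for the automatic structuring described in this article.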
Automatic Content Structuring
An alternative is to structure the content automatically via a custom event handler, or by using an advanced workflow. You can take a look at our Kentico blog functionality to get an idea of how this works. In the case of our blogs, the system automatically creates Blog month pages based on the current month. You rarely publish 1000 blog posts in one month, so this keeps the content tree structure nicely organized.
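The idea behind month-based structuring can be sketched in a few lines of Python. This is not how Kentico names its Blog month pages; the year-month folder format and the function name month_folder are assumptions chosen purely for illustration.

```python
from datetime import date


def month_folder(parent_path: str, published: date) -> str:
    """Return the path of the month folder a page should be filed under.

    The "YYYY-MM" naming is a hypothetical convention, not Kentico's own.
    """
    return f"{parent_path.rstrip('/')}/{published.year:04d}-{published.month:02d}"


print(month_folder("/Blogs/My-blog", date(2015, 3, 17)))
```

A custom event handler or workflow step would compute a path like this when a page is created and move the page under it, so that no single parent collects all the pages.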
“Database Health” is a broad term, but in general it means keeping only the data you need and making sure there are no database inconsistencies. There are several areas you should keep an eye on, so let’s have a look.
Keep only as many event log entries as you need. The default Event log size is 1000 entries. I have seen logs with 150,000 entries and some with 500 entries (the Event log size setting can be adjusted under System > Event log). You may think that allowing a lower number of entries is the best approach, but there are some cases where this may degrade performance. By default, the oldest 10% of entries are cleared when the Event log reaches its maximum allowed size. If you have a page that generates 10 entries per page load (e.g. because of 404 errors), then a maximum of 1000 entries may already be an issue with as few as 10 concurrent users. In this case, old entries will essentially be cleared every time the page is requested, because 10 (errors) x 10 (users) = 100 entries, and 100 is 10% of 1000 entries, the default Event log size. This can lead to overhead because the system is inserting new rows into the Event log and deleting old entries at the same time. What is the ideal Event log size? We recommend keeping the Event log at around 5,000-10,000 entries. By keeping Error monitoring turned on, you can take care of any 404 or other system errors as soon as possible and reduce log entries. You may of course archive the entries to a log file (e.g. via a custom scheduled task) as well.
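The arithmetic behind this example can be written down as a short Python sketch. It is a simplified model that assumes the log is already full and that each clean-up frees exactly 10% of the maximum size; the function names are made up for this illustration.

```python
def entries_per_burst(entries_per_page_load: int, concurrent_users: int) -> int:
    """Entries written when every concurrent user loads the page once."""
    return entries_per_page_load * concurrent_users


def cleared_per_cleanup(max_log_size: int, percent: int = 10) -> int:
    """Entries removed by one clean-up (10% of the maximum by default)."""
    return max_log_size * percent // 100


def bursts_between_cleanups(max_log_size, entries_per_page_load, concurrent_users):
    """How many bursts fit into the space that one clean-up frees."""
    burst = entries_per_burst(entries_per_page_load, concurrent_users)
    return cleared_per_cleanup(max_log_size) / burst


# Default size: every burst of 10 users triggers another clean-up.
print(bursts_between_cleanups(1000, 10, 10))
# With 10,000 entries the same traffic triggers a clean-up ten times less often.
print(bursts_between_cleanups(10000, 10, 10))
```

Under this model, raising the Event log size spreads the same clean-up work over more page loads, which is the reasoning behind the 5,000-10,000 recommendation.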
Content and object versions are great Kentico features. However, if not configured properly, they may take over your free database space within a few days. That’s why you should make sure that only those objects and files that really need to be versioned actually are versioned. Also, make sure the version history length is adequately configured for both the System objects (Versioning & Synchronization > Object versioning > Version history) and the Content (Content > Content management > Workflow). I recommend you give these some serious thought. If you have 15,000 pages x 30 versions per page, that’s 450,000 versions in total. If you have a 100 kB attachment associated with each of these versions as well, then you end up with 450,000 versions x 100 kB per version, which means 45,000 MB (45 GB) of data. Finally, be aware that the Recycle Bin is just another feature using version history.
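To estimate the same figure for your own site, the calculation above can be expressed as a minimal Python sketch. It uses decimal units (1 MB = 1000 kB), as in the example, and counts attachment data only; the function name is hypothetical.

```python
def version_storage_mb(pages: int, versions_per_page: int, attachment_kb: float) -> float:
    """Estimate attachment storage used by version history, in MB.

    Assumes one attachment of `attachment_kb` per version and 1 MB = 1000 kB.
    """
    total_versions = pages * versions_per_page
    return total_versions * attachment_kb / 1000


# The example from this article: 15,000 pages x 30 versions x 100 kB.
print(version_storage_mb(15_000, 30, 100))
```

Lowering the version history length from 30 to 10 in this model cuts the estimate to a third, which is why the setting deserves some thought.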
Web Farm Tasks
This is a feature which will be addressed in Kentico version 9, which will have options for automatic web farm recovery. However, if you are running an older version of Kentico, make sure all enabled servers are functioning. If not, tasks generated for a server that is currently down won’t be processed, and these will take up your database space quickly.
If you are using Kentico’s Online Marketing modules, make sure you have enabled clean-up of inactive contacts. Usually, contacts who don’t return to your website after three to six months can be considered lost leads, and their activities and data only take up space in your database. You can find settings related to automatic contact removal in On-line marketing > Contact management > Inactive contacts.
Old Analytics Data
Web analytics data is another area where you can make use of an automatic cleaning schedule. As a rule of thumb, three months of data can be kept in the system without any performance degradation. Historic data may be kept in the form of saved reports, or by using Google Analytics alongside Kentico’s web analytics tools. You can set up automatic data removal via Kentico’s “Remove analytics data” scheduled task, where you can specify the number of days of data the system will keep.
An easy mistake to make is to configure the logging of staging tasks without having a proper staging server configured, or to remove a staging server without disabling the logging of those tasks on the leaf servers in your synchronization path. This scenario is similar to the one with web farms: the tasks pile up without ever being processed, which eats up your database space one change at a time.
Other objects may require your attention as well, but these tend not to be as performance sensitive as the ones mentioned above because the number of entries for those doesn’t grow exponentially with time or traffic. Keep an eye on the number of other objects, such as message boards, forum posts, blog comments, emails, or translations.
The last thing to consider is making sure database inconsistencies are prevented. To do this as you set up your staging environment, make sure that content is only managed in the staging instance, and that the content of the production server isn’t modified directly. If you do need to edit content on more than one server, make sure bi-directional staging is set up for all servers where the content is edited.
File System Health
File system health isn’t as big an issue as it was in previous years, when disk space was at a premium. However, make sure you are not running out of space. You can avoid running out of space by using a network drive, Azure Blob storage, or Amazon S3 services for your media files and attachments, because thumbnails of media files aren’t automatically cleared if not used. Additionally, make sure that you delete the contents of the Import/Export folders regularly. Finally, in versions of Kentico older than version 8, web analytics logs are stored in the file system. If the analytics processing of scheduled tasks isn’t enabled or working properly, the log files pile up without being processed. This can be an issue even if everything is working properly, because of the speed of specific storage systems, as described here.
Finding and fixing the issues mentioned in this article may be a tedious task. But if you keep these topics in mind for your next project, they will definitely help it to be a success. You can prepare yourself for future projects by making use of our Reporting and Health Monitoring module.
The simplest way to set up basic monitoring for your website is to use our built-in health monitoring services. You can connect them to external applications, including the Performance Monitor in Windows, to monitor your website 24/7.
An easy way to keep track of all the statistics and logs I’ve mentioned is to create a set of custom reports, to which you can subscribe, so that you can schedule updates about the size of your database tables or the number of 404 errors in your event log. This is a convenient and easy way to monitor your websites without the need for additional software packages. Here is a simple example of an SQL query that could be used to monitor all types of errors and their frequency:
select count(*) as 'Number of occurrences' , EventDescription from CMS_EventLog where EventType = 'E' group by EventDescription order by count(*) desc
Here are some more ideas for custom reports:
- Number of staging tasks grouped by type
- 404 errors grouped by type
- Event log errors grouped by type
- List of web farm servers in an auto-scale cloud environment
- List of unprocessed web farm tasks
- List of web parts where COLUMNS are not defined
- List of search indexes not used anywhere (the Smart search index is rebuilt for no reason)
- TOP N biggest tables
- TOP N big tables
Hotfixes and Upgrades
Applying hotfixes and performing upgrades is an area that can be discussed on its own, so they won’t be covered here, but make sure you apply hotfixes on a regular basis.
Subscribe to Errors From the Event log
Last but not least, make sure the Error notification email addresses setting under System > Emails is not empty, and make sure you have an SMTP server set up on your website, so that you get immediate notifications of any errors that are logged in your Event log.
You can optionally monitor the performance of your website with third-party tools like New Relic.
Understanding increased traffic with Kentico web analytics or Google Analytics can also be a great exercise to prevent application and environmental failures.
Finally, there are also third-party services that monitor the availability of a website by pinging the site on a regular basis and reporting all issues (e.g. 503 errors) back to the site owner or administrator.
As you can see, there are plenty of maintenance tasks that you may perform to keep your website healthy for years to come. An additional resource is the set of reports from KInspector, which may help you to keep your website healthy too. While this article has tried to provide a range of ideas rather than concrete approaches, you can always schedule a consulting session with our team of Solution Architects via firstname.lastname@example.org if you have any specific scenarios you’d like to discuss. We are here to make Kentico a success for you.