Architecture matters!

As a Solution Architect, I very often deal with clients’ websites getting slower and slower until they reach the point at which even the Kentico UI stops responding. That’s usually when we (the consulting department) are called in to help solve the urgent situation. Of course, we can always do something to get the website running again, but such a situation is usually caused by a bad, non-scalable architecture or a misconfiguration of Kentico or IIS, and could have been avoided. The purpose of this article is to emphasize how important it is to think about architecture carefully when building a complex website. I’ll demonstrate this with two real-life examples.

Client's scenario 1:

We have a large base of articles with about 50 new articles being created every day. We need those articles to go through a workflow, to be stored for a long period of time (10 years) and to be searchable.

Since the content tree is the only place where we can have both a workflow and smart search, it appears to be the obvious place to store these articles. But we would soon reach the limit for documents per section (1000), as well as the maximum total number of documents the content tree can hold before performance problems appear. (Note that these numbers are only estimates and depend on many factors, but the general rule of keeping the content tree as light as possible holds.) So how do we deal with such requirements, and how do we ensure our website is both scalable and sustainable?
First we need to deal with the limit of 1000 documents per section. The solution is very simple: structure. Don’t keep your content tree flat; when storing your articles, introduce some structure. For example, why not store them in an Articles / <year of publishing> / <month of publishing> structure?
Some may argue that it is too tedious to ask content editors to maintain a year/month (or even day) structure in the content tree. I agree, and I would not be happy myself if I had to do it manually for 50 articles each day. But it is very easy for developers to write a few lines of code that make this an automatic process, so that content editors can carry on as normal, creating new articles as children of the Articles section. All we need is an event for document creation and update, and Kentico already has one: the DocumentEvents.Insert.Before and DocumentEvents.Update.Before events. With them, we can store each document in its right location automatically, based on its creation date.
So the first problem is solved. By introducing this structure we avoid the 1000-documents-per-section limit. If a section still comes close to it, we can introduce yet another level of structure and separate articles by week or day of publishing, depending on how many articles we insert daily or weekly.
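The automatic filing described above comes down to deriving a target path from the creation date. Here is a minimal, language-neutral sketch in Python of that one step. It is not Kentico API code: `target_path`, its parameters, and the `/Articles` root are hypothetical names chosen for illustration, and in a real project the equivalent logic would live in a handler attached to the document events mentioned above.

```python
from datetime import date


def target_path(root: str, created: date, granularity: str = "month") -> str:
    """Return the section path an article should be filed under.

    Hypothetical helper for illustration only; not part of the Kentico API.
    """
    if granularity not in ("month", "day"):
        raise ValueError("granularity must be 'month' or 'day'")
    # Zero-padded parts keep the sections sorted chronologically by name.
    parts = [root.rstrip("/"), f"{created.year:04d}", f"{created.month:02d}"]
    if granularity == "day":
        # Extra level for sites that publish many articles per month.
        parts.append(f"{created.day:02d}")
    return "/".join(parts)


print(target_path("/Articles", date(2014, 3, 7)))         # /Articles/2014/03
print(target_path("/Articles", date(2014, 3, 7), "day"))  # /Articles/2014/03/07
```

Switching the granularity later only changes where new articles are filed; existing sections stay where they are.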
So what about the second limit of 150k documents in total? Well, we’ve actually exacerbated this problem with our new structure as each folder for year / month / day is another document in the content tree!
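A quick back-of-the-envelope calculation shows how the numbers from scenario 1 add up. The sketch below assumes 50 articles on every single day of a 365-day year (leap days ignored) and one folder per year, month and, optionally, day; `tree_size` is a hypothetical helper, not anything provided by Kentico.

```python
def tree_size(articles_per_day: int, years: int, day_folders: bool = False) -> int:
    """Estimate the number of documents in the content tree, folders included."""
    days = years * 365                      # simplification: no leap days
    articles = articles_per_day * days
    folders = years + years * 12            # one folder per year and per month
    if day_folders:
        folders += days                     # one more folder per day
    return articles + folders


print(tree_size(50, 10))                    # 182630 with year/month folders
print(tree_size(50, 10, day_folders=True))  # 186280 with year/month/day folders
print(50 * 31)                              # 1550 articles in one 31-day month
```

Under these assumptions, both totals exceed the estimated 150k limit well before the 10 years are over, and a single month already holds more than 1000 articles, which is why the day-level folders may be needed.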
Here, the magic word is “archiving”. You need to have an archival plan. And I’m not talking about the “Archive” button in the Pages application. That would not solve our problem, because the documents stay in the content tree and are merely “unpublished” (i.e. invisible on the live site). That’s the exact opposite of what we want! We want our articles to be visible and searchable. What we don’t need for these old articles is the workflow or their editing history, as nobody will ever modify them again.
So, to solve our problem we can write a scheduled task that runs periodically (e.g. once a month, during a quieter period), checks the content tree, and moves articles older than, let’s say, 2 years (the right threshold depends on the number of articles) from the content tree to a Custom table. Now we only store the last 2 years’ worth of articles in the content tree, no more, so the tree stops growing.
We will also delete the version and workflow history of the old articles so that those tables in the DB don’t grow either. The only things that keep growing are the Custom tables, but they can handle much more data than the content tree because they are much simpler objects: only the Article fields are stored, which eliminates the processing overhead of documents.
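The logic of such a scheduled task can be sketched as follows. This is an in-memory Python illustration only: the `Article` class, the `tree` dictionary and the `custom_table` list are hypothetical stand-ins for the content tree and a Custom table, and a real task would use the Kentico API and run inside its scheduler.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Article:
    title: str
    body: str
    published: date
    # Stand-in for version and workflow history kept for documents.
    versions: list = field(default_factory=list)


def archive_old_articles(tree, custom_table, today, keep_days=730):
    """Move articles older than keep_days from the tree to the custom table.

    Returns the number of archived articles.
    """
    cutoff = today - timedelta(days=keep_days)
    old_ids = [node_id for node_id, a in tree.items() if a.published < cutoff]
    for node_id in old_ids:
        article = tree[node_id]
        # Copy only the article fields; version history is deliberately dropped.
        custom_table.append(
            {"Title": article.title, "Body": article.body, "Published": article.published}
        )
        # Remove from the tree only after the copy has been stored.
        del tree[node_id]
    return len(old_ids)
```

Copying first and deleting second means that a failure halfway through leaves a duplicate rather than a lost article.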
We can still very easily display data from the Custom tables on the live site using the out-of-the-box Custom table data source web parts, which we can connect to any repeater or datalist control. (There is also a standard API for working with data from Custom tables.) Smart search is supported for Custom table data as well, and the search web parts can combine results from more than one index, so our users won’t see any difference between content served from the content tree and content served from the Custom tables. Actually, they will see one significant difference: the website will be fast, even after 2 years of article storage. :)
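To illustrate why visitors cannot tell the two sources apart, here is a small Python sketch that merges hits from two indexes into one ranked list. It does not show how the Kentico search web parts work internally; `combine_results` and the `(score, title, source)` tuples are assumptions made for this example.

```python
def combine_results(tree_hits, table_hits, limit=10):
    """Merge two lists of (score, title, source) hits into one ranking."""
    # Higher score means a better match; ties keep their original order.
    merged = sorted(tree_hits + table_hits, key=lambda hit: hit[0], reverse=True)
    return merged[:limit]


tree_hits = [(0.9, "New article", "content tree"), (0.4, "Another article", "content tree")]
table_hits = [(0.7, "Old article", "custom table")]

for score, title, source in combine_results(tree_hits, table_hits):
    print(f"{score:.1f}  {title}  ({source})")
```

The archived article is ranked between the two current ones purely by relevance, regardless of where it is stored.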

Client’s scenario 2:

We have 5 million articles to be displayed on our websites. Each article has to go through a complex workflow process where several roles are involved before publishing, so we need to have those articles in the content tree as pages are the only objects in Kentico that can be under a workflow.

How do we deal with this project? Five million documents would put far too much strain on the content tree, as the default indexes in the Kentico DB are tuned for up to 150k documents (again, an estimate that depends on many factors). The archival strategy does not help us, as we want to display all five million items at once, and there are no outdated documents that can be (re)moved from the content tree to lighten it. We also cannot simply use Kentico Custom tables, as there is no workflow for objects in Custom tables. Have we reached the limit of Kentico, and should we start looking somewhere else? Not at all!
We just need to start asking the right questions about the article publishing process and to think it through one more time. The most important question is "How many editors do you have creating the documents and what is the total number of people involved in the approval process?" In this client’s case, the answer was "5 editors and up to 10 people altogether." This means that there couldn’t be too many articles in an unpublished state, waiting in the middle of the workflow process for approval. When I asked how many unpublished (waiting-for-approval) articles there were at a given time, the answer was “less than 50”. This showed me that it wasn’t a custom archiving process they needed, but a custom publishing process!
The fact is that in this scenario the workflow is really only needed for the initial process of creating the document. Once the document is fully approved and published, it is simply displayed on the live site, with no changes made to it until it is so outdated that it can be removed. So why not keep only the articles that are still being created inside the content tree, and move each one to a Custom table upon publishing? Then we can display everything on the live site from Custom tables and not from the content tree!
This satisfies all the initial requirements: the articles can go through a complex workflow, and there is no problem storing five million records, as Custom tables are capable of that. The only thing you need to do is implement some simple logic on the Publish event that takes the data of the document and puts it into the Custom table, then removes the document from the content tree to keep the tree light.
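The publish-time logic can be sketched in a few lines. As before, this is a plain Python illustration and not Kentico code: `publish_article`, the `content_tree` dictionary and the `custom_table` list are hypothetical stand-ins, and a real implementation would hook into the Publish event through the Kentico API.

```python
def publish_article(doc_id, content_tree, custom_table):
    """Copy a just-published document to the custom table, then drop it from the tree."""
    doc = content_tree[doc_id]
    # Keep only the fields needed on the live site.
    record = {"Title": doc["title"], "Body": doc["body"]}
    custom_table.append(record)
    # Remove the document only after its data is safely stored.
    del content_tree[doc_id]
    return record


content_tree = {
    101: {"title": "Approved article", "body": "Final text", "step": "published"},
    102: {"title": "Draft article", "body": "Work in progress", "step": "edit"},
}
custom_table = []

publish_article(101, content_tree, custom_table)
print(len(content_tree), len(custom_table))  # 1 1
```

With fewer than 50 articles in the workflow at any time, the content tree stays tiny while the Custom table holds the millions of published records.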

Summary

Though these ideas are highly effective at dealing with the given scenarios, don't take this article as a silver bullet for every situation, because each project is different. Instead, take it as more proof of how important it is to think about the architecture of your solution first. And don’t just think one month into the future; think about how your site will look in a year or two. Don’t rely on tests you perform on your website with 5 testing pages; think about how many pages it will have in production. If you know a few basic facts about (and limitations of) Kentico and take your time to think about an appropriate architecture, you'll be able to handle more complex and more demanding projects far better than you may think now. If you are not sure, the Kentico Consulting team is here to help with (not just) the initial architecture. So don't hesitate to contact us at consulting@kentico.com.

 

Comments

stepank-kentico commented on

Mark, both solutions are perfectly fine with regard to full-text search (Smart Search), as custom tables support full-text search: you can create smart search indexes over them as well as over pages in the content tree. Moreover, the Kentico Smart Search web parts are able to combine results from multiple indexes, so your end users won't notice any difference when data is stored in custom tables!

Mark commented on

How do either of your proposed solutions address the issue of full-text searching?
If articles get moved to custom tables then won't they be unavailable to search?