Kentico 8 Technology - DataQuery Advanced API
You are already familiar with the concept of DataQuery and its basic capabilities. Now is the time to show you some more advanced stuff that you can leverage to write more complex queries. Want to learn more? Read this article.
Hi There,
Today, I would like to show you some more advanced scenarios that you can leverage with DataQuery. I expect that you have already read my previous two articles,
DataQuery concept and
DataQuery API. If you haven’t, please read them before you continue.
Leveraging the source placeholder
As I have shown you before, one part of the general select query is the
##SOURCE## placeholder. DataQuery, by default, selects data from the table that contains the data for a particular class. This information is retrieved from the DataClassInfo.ClassTableName property, which is stored in the corresponding column in the CMS_Class database table.
But, sometimes you may need to retrieve the information from other sources. To do that, you can leverage a method called
From – you just enter the expression that will be used as a source of the data. The most common usage is “redirecting” the source of data to a view or other table, but you can also use any expression including a call to the table-valued function in the database which would return the source data. Here is a quick example:
var contacts =
ContactInfoProvider.GetContacts()
.From("My_Contact_View")
.WhereEquals("MyColumn", true);
This outputs the following SQL code:
SELECT * FROM My_Contact_View WHERE MyColumn = 1
Note that if the source of data is compatible enough with the object type, you can still enumerate the results using the strongly typed enumeration. Otherwise, just work with the data as a DataSet.
This is just a basic example. Now let me show you how you can leverage this for JOIN.
Using JOINs
As you probably already know, each object type has a metadata called type info. Among other aspects, this metadata connects the object type to a class record that contains metadata for the storage and data structure. So in the end, every object type knows where it is stored in the database. Kentico leverages this information to provide correct data within each DataQuery, but also to provide you with the ability to use JOINs within your query.
Of course, you could just enter the full expression with JOIN as a source, but I strongly recommend you to use it only if there is no other option available. The reason is the same as for the whole DataQuery – The more you
abstract from database syntax, the more
future and upgrade-proof your code will be.
Now let me show you some examples with JOINs to give you an idea of how it works.
var contactsWithCountries =
ContactInfoProvider.GetContacts()
.Source(s => s.Join<CountryInfo>("ContactCountryID", "CountryID"));
This outputs the following SQL code:
SELECT * FROM OM_Contact INNER JOIN cms_country ON OM_Contact.ContactCountryID = cms_country.CountryID
You can see that I didn’t call the methods directly. Instead, I am starting with the
existing source, and extending it. But you can also start with any expression or even write your own source class if you’d like, by combining this approach with the previous approach:
var query =
new DataQuery()
.From(
new QuerySource("My_Contact_View")
.LeftJoin("My_Countries_View", "ContactCountryID", "CountryID")
);
Note that I haven’t used any strong types to get the data, but referred to completely custom views in this case. It yields the following SQL query:
SELECT * FROM My_Contact_View LEFT OUTER JOIN My_Countries_View ON My_Contact_View.ContactCountryID = My_Countries_View.CountryID
So those were some basics about JOINs, have a look at the Intellisense or class reference for other options.
As always, if for any reason Kentico doesn’t allow you to fully abstract from SQL syntax with any of the aspects of DataQuery, let us know. Providing more options is not a problem, we just need to know about the particular requirements, and get some idea about the priorities.
External sources of data
Speaking about sources, the previous example technically modifies only the query text, but still executes the query in database, which limits the data to a Kentico database (or other MS SQL databases for that matter if you use CMSConnectionScope).
DataQuery is more powerful than that, and can use a completely different source of data. Similar to how we have abstraction of a source expression, we have abstraction also for the whole source of data. There is a method called
WithSource, which sets up the query source to an instance of class that inherits from the
DataQuerySource. Kentico leverages this for a couple of scenarios where it needs to get the data from external sources, such as LinkedIn or Data.com data. There is also one very important source called
MemoryDataQuerySource, which can serve the data from a DataSet in memory. This can be leveraged to build a query over an XML source, or to model the whole data source in the code.
Does that sound familiar yet? It should! Kentico leverages this to support fake data in UnitTests. The
WithData method simply takes your data, models a
MemoryDataQuerySource from it and gives it to the provider, which then passes it to its queries. Here is an example of how you can leverage it:
var srcData = new DataSet();
srcData.ReadXml("c:\\mydata.xml");
var source = new MemoryDataQuerySource(srcData);
var query =
new DataQuery()
.WithSource(source)
.WhereEquals("MyColumn", true);
Note that in this case, the selection options are limited by the support of the select method in the memory DataTable. I’ve used this simple example for the purpose of demonstration, but you can leverage this for getting the data from various external sources.
The query is read-only in all such cases. We plan to look at options of supporting both-ways query in integration scenarios in the future.
Advanced paging options
As I mentioned earlier, you can set up the query for a specific range of results to leverage paging using the
Page method. For more advanced scenarios, such as batch processing, there are some more advanced options represented by methods with a prefix ForEach…
-
ForEachRow – Executes the given action for each DataRow in results (DataQuery)
-
ForEachObject - Executes the given action for each object in results (ObjectQuery)
-
ForEachPage - Executes the given action for each page in results. Each page returned is represented by the query itself, just adjusted to a correct offset.
All these methods work on the same principle. Using the
batch size provided in the method parameter, they iteratively set up the query
one page after another, calling the action for corresponding items. Imagine you have 1 million contacts in your database, and you need to process them one-by-one. Acquiring all of them with a single query would simply cause too much overhead, especially in allocated memory. Using one of these methods, you can process them in more meaningful batches. Let’s say
10k at a time. While your API call stays intuitive without any complications, the system handles the batches for you automatically:
var contacts = ContactInfoProvider.GetContacts();
contacts.ForEachObject(contact => DoSomeAction(contact), 10000);
I have used the example applying a strongly typed iterator. Always consider the general overhead of each cycle, and opt to use the
ForEachRow method for performance-critical operations if necessary.
(Im)mutability of DataQuery
I have mentioned in my earlier articles that the
DataQuery is mutable on purpose, and each setup method call modifies the existing object instead of creating a new one as a clone of the source query. The main reason for this is the
lowest overhead possible. But in some cases, you may need to alter this behavior to make sure that particular instances of DataQuery do not influence other code.
We provide several methods to cover these scenarios. For all these methods, note that once a query is made immutable, you shouldn’t call any setup methods without replacing your query variable with the result of that method. Only mutable queries can be set up by calling the methods on the original query object.
Clone
This method creates a
clone of the query that you can further modify, thereby producing two copies of the same query (original and clone) effectively.
You can use this in scenarios where you want to be sure that you don't modify the original query that you created, and need to extend it in your code right away.
Immutable
This method
"locks" the query in it's current state, making sure that the next setup operation produces a
clone of the query. This clone is mutable, so the subsequent setup steps are modifying the single object that was produced as a result of the clone.
Unlike Clone, this doesn't construct a new query until modification is actually needed, which can be used in case you want to pass query to the parameter of a method – and be sure that the method doesn't modify it. It is also recommended in scenarios where more processes need to access the same query object, but are not able to modify it.
Also, if you need to start with a single provider call to GetUsers(), and get several different sets of users by the same query, you can lock the base, and allow further modifications.
var baseQuery =
UserInfoProvider.GetUsers()
.OrderBy("FullName")
.Immutable(); // Base query made immutable so we don't change it further
var editors = baseQuery.WhereTrue("UserIsEditor"); // Editors only
var admins = baseQuery.WhereTrue("UserIsGlobalAdministrator"); // Global admins only
Snapshot
This provides a combination of the previous two methods. It produces an
Immutable clone of the current query so that the
original can be further modified. But, any further modification of the result produces a new query.
This may be useful in cases when several processes share the same query (and you want to further modify the original), but you need to make sure that query is not modified by those processes. If you want to leverage the method ForEachPage, but not process the pages immediately, just store their queries and further modify them – the method Snapshot allows you to do that with ease.
LINQ and execution in database
We sometimes get questions like, “How can you tell if the LINQ query that you write against Kentico API executes in the database or fallbacks to LINQ to objects?”
Just a friendly reminder: Kentico provides its own LINQ provider for queries that behave in a similar manner as LINQ to SQL, except that it automatically provides fallback to LINQ to objects in case it is unable to parse the expression.
When a LINQ query is executed against Kentico DataQuery, it performs the following actions:
-
Attempts to parse the query to interpret it in database
-
If successful, the resulting object is a modified clone of the original query (this is the only default scenario where DataQuery behaves as immutable)
-
If not successful, the query object is wrapped into plain IEnumerable, and the execution fallbacks to LINQ to objects
So, you can recognize which of these two scenarios occurred simply by checking the result type, as shown here:
ObjectQuery<ContactInfo> contacts = ContactInfoProvider.GetContacts();
IEnumerable<ContactInfo> filtered =
from c in contacts where c.ContactCompanyName.ToUpper() == "KENTICO" select c;
if (filtered is IDataQuery)
{
// Executed in DB
}
else
{
// Fallback to LINQ to objects was done
}
If you find out that the fallback occurs, and more than just a couple of records are processed, I strongly recommend you alter the query in order to be more efficient.
In my case, the
ToUpper method is not available for evaluation on SQL side, so fallback occurs. If you remove that method call, the query will be executed in database. I have also used specifically typed variables to demonstrate the concept, but I recommend that you use the
var keyword as much as possible – it will prepare you better for potential future updates.
Future of DataQuery API
As we get feedback from, both, our clients and also from internal development, we are further extending the DataQuery concept to cover even more scenarios. Let me just quickly summarize what we have already prepared for you in Kentico 8.1:
-
More object-oriented insides – The current DataQuery builds column lists just as a single string expression. In the upcoming version, column lists will be completely object-oriented with less overhead, and more options. We plan a similar transition with where conditions and other aspects of DataQuery in future versions.
-
More specific methods for where – We are fully covering all operators using methods such as WhereContains, to provide an easier learning curve for beginners, and better security (as seen with LIKE operators that have an uncontrolled input, which can be generally abused for DoS attacks on SQL server).
-
More general where conditions – Current where conditions are limited to format “<column name> <operator> <value>” which is quite limiting, because you can’t simply turn sides, or even include more complex expression instead of value. You also can’t compare a column to another column in Kentico 8. There will be new options in the upcoming version to give you full control over the where conditions. To give you an idea, here’s a small teaser:
var contacts =
ContactInfoProvider.GetContacts()
.WhereEquals("ContactFirstName".AsColumn(), "ContactLastName".AsColumn());
Wrap up
So that is all for today. You have learned how to leverage sources, joins, paging, and immutability of the queries – I hope you find these things helpful. There is not much more that I can tell you about the current DataQuery since I pretty much covered everything that’s important in my articles. However, if something is not yet clear, do not hesitate to ask.
As always, we would love to hear your feedback or any other requirements you may have that would help you work with DataQuery API. Feel free to comment on this article or post your ideas to
http://ideas.kentico.com/
See you next time!