Kentico 8 Technology - DataQuery Advanced API

   —   
You are already familiar with the concept of DataQuery and its basic capabilities. Now is the time to show you some more advanced stuff that you can leverage to write more complex queries. Want to learn more? Read this article.
Hi There,

Today, I would like to show you some more advanced scenarios that you can leverage with DataQuery. I expect that you have already read my previous two articles, DataQuery concept and DataQuery API. If you haven’t, please read them before you continue.

Leveraging the source placeholder

As I have shown you before, one part of the general select query is the ##SOURCE## placeholder. DataQuery, by default, selects data from the table that contains the data for a particular class. This information is retrieved from the DataClassInfo.ClassTableName property, which is stored in the corresponding column in the CMS_Class database table.

But, sometimes you may need to retrieve the information from other sources. To do that, you can leverage a method called From – you just enter the expression that will be used as a source of the data. The most common usage is “redirecting” the source of data to a view or other table, but you can also use any expression including a call to the table-valued function in the database which would return the source data. Here is a quick example:

var contacts = ContactInfoProvider.GetContacts() .From("My_Contact_View") .WhereEquals("MyColumn", true);

 
This outputs the following SQL code:

SELECT * FROM My_Contact_View WHERE MyColumn = 1

 
Note that if the source of data is compatible enough with the object type, you can still enumerate the results using the strongly typed enumeration. Otherwise, just work with the data as a DataSet.

This is just a basic example. Now let me show you how you can leverage this for JOIN.

Using JOINs

As you probably already know, each object type has a metadata called type info. Among other aspects, this metadata connects the object type to a class record that contains metadata for the storage and data structure. So in the end, every object type knows where it is stored in the database. Kentico leverages this information to provide correct data within each DataQuery, but also to provide you with the ability to use JOINs within your query.

Of course, you could just enter the full expression with JOIN as a source, but I strongly recommend you to use it only if there is no other option available. The reason is the same as for the whole DataQuery – The more you abstract from database syntax, the more future and upgrade-proof your code will be.

Now let me show you some examples with JOINs to give you an idea of how it works.

var contactsWithCountries = ContactInfoProvider.GetContacts() .Source(s => s.Join<CountryInfo>("ContactCountryID", "CountryID"));

 
This outputs the following SQL code:

SELECT * FROM OM_Contact INNER JOIN cms_country ON OM_Contact.ContactCountryID = cms_country.CountryID

 
You can see that I didn’t call the methods directly. Instead, I am starting with the existing source, and extending it. But you can also start with any expression or even write your own source class if you’d like, by combining this approach with the previous approach:

var query = new DataQuery() .From( new QuerySource("My_Contact_View") .LeftJoin("My_Countries_View", "ContactCountryID", "CountryID") );


Note that I haven’t used any strong types to get the data, but referred to completely custom views in this case. It yields the following SQL query:

SELECT * FROM My_Contact_View LEFT OUTER JOIN My_Countries_View ON My_Contact_View.ContactCountryID = My_Countries_View.CountryID

 
So those were some basics about JOINs, have a look at the Intellisense or class reference for other options.

As always, if for any reason Kentico doesn’t allow you to fully abstract from SQL syntax with any of the aspects of DataQuery, let us know. Providing more options is not a problem, we just need to know about the particular requirements, and get some idea about the priorities.

External sources of data

Speaking about sources, the previous example technically modifies only the query text, but still executes the query in database, which limits the data to a Kentico database (or other MS SQL databases for that matter if you use CMSConnectionScope).

DataQuery is more powerful than that, and can use a completely different source of data. Similar to how we have abstraction of a source expression, we have abstraction also for the whole source of data. There is a method called WithSource, which sets up the query source to an instance of class that inherits from the DataQuerySource. Kentico leverages this for a couple of scenarios where it needs to get the data from external sources, such as LinkedIn or Data.com data. There is also one very important source called MemoryDataQuerySource, which can serve the data from a DataSet in memory. This can be leveraged to build a query over an XML source, or to model the whole data source in the code.

Does that sound familiar yet? It should! Kentico leverages this to support fake data in UnitTests. The WithData method simply takes your data, models a MemoryDataQuerySource from it and gives it to the provider, which then passes it to its queries. Here is an example of how you can leverage it:

var srcData = new DataSet(); srcData.ReadXml("c:\\mydata.xml"); var source = new MemoryDataQuerySource(srcData); var query = new DataQuery() .WithSource(source) .WhereEquals("MyColumn", true);


Note that in this case, the selection options are limited by the support of the select method in the memory DataTable. I’ve used this simple example for the purpose of demonstration, but you can leverage this for getting the data from various external sources.

The query is read-only in all such cases. We plan to look at options of supporting both-ways query in integration scenarios in the future.

Advanced paging options

As I mentioned earlier, you can set up the query for a specific range of results to leverage paging using the Page method. For more advanced scenarios, such as batch processing, there are some more advanced options represented by methods with a prefix ForEach…
  • ForEachRow – Executes the given action for each DataRow in results (DataQuery)
  • ForEachObject - Executes the given action for each object in results (ObjectQuery)
  • ForEachPage - Executes the given action for each page in results. Each page returned is represented by the query itself, just adjusted to a correct offset.
All these methods work on the same principle. Using the batch size provided in the method parameter, they iteratively set up the query one page after another, calling the action for corresponding items. Imagine you have 1 million contacts in your database, and you need to process them one-by-one. Acquiring all of them with a single query would simply cause too much overhead, especially in allocated memory. Using one of these methods, you can process them in more meaningful batches. Let’s say 10k at a time. While your API call stays intuitive without any complications, the system handles the batches for you automatically:
       
var contacts = ContactInfoProvider.GetContacts(); contacts.ForEachObject(contact => DoSomeAction(contact), 10000);

 
I have used the example applying a strongly typed iterator. Always consider the general overhead of each cycle, and opt to use the ForEachRow method for performance-critical operations if necessary.

(Im)mutability of DataQuery

I have mentioned in my earlier articles that the DataQuery is mutable on purpose, and each setup method call modifies the existing object instead of creating a new one as a clone of the source query. The main reason for this is the lowest overhead possible. But in some cases, you may need to alter this behavior to make sure that particular instances of DataQuery do not influence other code.

We provide several methods to cover these scenarios. For all these methods, note that once a query is made immutable, you shouldn’t call any setup methods without replacing your query variable with the result of that method. Only mutable queries can be set up by calling the methods on the original query object.

Clone

This method creates a clone of the query that you can further modify, thereby producing two copies of the same query (original and clone) effectively.

You can use this in scenarios where you want to be sure that you don't modify the original query that you created, and need to extend it in your code right away.

Immutable

This method "locks" the query in it's current state, making sure that the next setup operation produces a clone of the query. This clone is mutable, so the subsequent setup steps are modifying the single object that was produced as a result of the clone.

Unlike Clone, this doesn't construct a new query until modification is actually needed, which can be used in case you want to pass query to the parameter of a method – and be sure that the method doesn't modify it. It is also recommended in scenarios where more processes need to access the same query object, but are not able to modify it.
Also, if you need to start with a single provider call to GetUsers(), and get several different sets of users by the same query, you can lock the base, and allow further modifications.

var baseQuery = UserInfoProvider.GetUsers() .OrderBy("FullName") .Immutable(); // Base query made immutable so we don't change it further var editors = baseQuery.WhereTrue("UserIsEditor"); // Editors only var admins = baseQuery.WhereTrue("UserIsGlobalAdministrator"); // Global admins only

Snapshot

This provides a combination of the previous two methods. It produces an Immutable clone of the current query so that the original can be further modified. But, any further modification of the result produces a new query.

This may be useful in cases when several processes share the same query (and you want to further modify the original), but you need to make sure that query is not modified by those processes. If you want to leverage the method ForEachPage, but not process the pages immediately, just store their queries and further modify them – the method Snapshot allows you to do that with ease.

LINQ and execution in database

We sometimes get questions like, “How can you tell if the LINQ query that you write against Kentico API executes in the database or fallbacks to LINQ to objects?”

Just a friendly reminder: Kentico provides its own LINQ provider for queries that behave in a similar manner as LINQ to SQL, except that it automatically provides fallback to LINQ to objects in case it is unable to parse the expression.

When a LINQ query is executed against Kentico DataQuery, it performs the following actions:
  • Attempts to parse the query to interpret it in database
    • If successful, the resulting object is a modified clone of the original query (this is the only default scenario where DataQuery behaves as immutable)
    • If not successful, the query object is wrapped into plain IEnumerable, and the execution fallbacks to LINQ to objects
So, you can recognize which of these two scenarios occurred simply by checking the result type, as shown here:

ObjectQuery<ContactInfo> contacts = ContactInfoProvider.GetContacts(); IEnumerable<ContactInfo> filtered = from c in contacts where c.ContactCompanyName.ToUpper() == "KENTICO" select c; if (filtered is IDataQuery) { // Executed in DB } else { // Fallback to LINQ to objects was done }

 
If you find out that the fallback occurs, and more than just a couple of records are processed, I strongly recommend you alter the query in order to be more efficient.

In my case, the ToUpper method is not available for evaluation on SQL side, so fallback occurs. If you remove that method call, the query will be executed in database. I have also used specifically typed variables to demonstrate the concept, but I recommend that you use the var keyword as much as possible – it will prepare you better for potential future updates.

Future of DataQuery API

As we get feedback from, both, our clients and also from internal development, we are further extending the DataQuery concept to cover even more scenarios. Let me just quickly summarize what we have already prepared for you in Kentico 8.1:
  • More object-oriented insides – The current DataQuery builds column lists just as a single string expression. In the upcoming version, column lists will be completely object-oriented with less overhead, and more options. We plan a similar transition with where conditions and other aspects of DataQuery in future versions.
  • More specific methods for where – We are fully covering all operators using methods such as WhereContains, to provide an easier learning curve for beginners, and better security (as seen with LIKE operators that have an uncontrolled input, which can be generally abused for DoS attacks on SQL server).
  • More general where conditions – Current where conditions are limited to format “<column name> <operator> <value>” which is quite limiting, because you can’t simply turn sides, or even include more complex expression instead of value. You also can’t compare a column to another column in Kentico 8. There will be new options in the upcoming version to give you full control over the where conditions. To give you an idea, here’s a small teaser:
var contacts = ContactInfoProvider.GetContacts() .WhereEquals("ContactFirstName".AsColumn(), "ContactLastName".AsColumn());

 

Wrap up

So that is all for today. You have learned how to leverage sources, joins, paging, and immutability of the queries – I hope you find these things helpful. There is not much more that I can tell you about the current DataQuery since I pretty much covered everything that’s important in my articles. However, if something is not yet clear, do not hesitate to ask.

As always, we would love to hear your feedback or any other requirements you may have that would help you work with DataQuery API. Feel free to comment on this article or post your ideas to http://ideas.kentico.com/

See you next time!
Share this article on   LinkedIn Google+

Martin Hejtmanek

Hi, I am the CTO of Kentico and I will be constantly providing you the information about current development process and other interesting technical things you might want to know about Kentico.

Comments

MartinH commented on

Not directly, you either need to iterate through objects with just their data, or a DataSet with all results. You can either use standard enumerators or ForEachXYZ (ForEachRow, ForEachObject) methods to iterate through all results.

Note that you can use ForEachRow and create the particular object from DataRow using regular constructor to have access to both, that is basically what the query does in the background. Note that constructing objects has higher overhead than using the raw DataRow data for obvious reasons, so if your code is performance-critical, I recommend to use all data directly from DataRow anyway.

jkrill-janney commented on

Thanks Martin! That's very helpful. Is there anyway to access the underlying DataRow (or some other raw data object) for each data item? That is, is there anyway to do something like this, given the previous example:

var contactsWithCountries = ContactInfoProvider.GetContacts().Source(s => s.Join<CountryInfo>("ContactCountryID", "CountryID"));
foreach(var contact in contactsWithCountries)
{
// Can I access the underlying DataRow or result data from here somehow?
contact.DataRow; // Doesn't exist.
contact.GetUnderlyingFieldValue("CountryDisplayName"); // Also doesn't exist. But would be very nice.
}

Or some similar way?

MartinH commented on

Hi,

In that case, you need to access the resulting data as a DataSet either by casting it to a DataSet variable, or by using the Result property of the query.

jkrill-janney commented on

In regards to JOINs, how do you actually access the underlying data, then?

Given this:

var contactsWithCountries =
ContactInfoProvider.GetContacts()
.Source(s => s.Join<CountryInfo>("ContactCountryID", "CountryID"));

How can I access the "CountryDisplayName" column, for example?

MartinH commented on

Hi,

Yes, you should be able to use .WhereExists(nestedQuery) to achieve that. Please contact the support if you need further clarification.

http://devnet.kentico.com/docs/8_0/api/html/M_CMS_DataEngine_WhereConditionBase_1_WhereExists.htm

Kamran commented on

Martin great article and feature.

Would it be possible to write code that would result in a query similar to shown below;

SELECT *
FROM suppliers
WHERE EXISTS (SELECT OrderID
FROM orders
WHERE suppliers.supplier_id = orders.supplier_id);