Apache Ignite.NET

The Apache Ignite .NET Developer Hub

Welcome to the Apache Ignite .NET developer hub. You'll find comprehensive guides and documentation to help you start working with Apache Ignite.NET as quickly as possible, as well as support if you get stuck. Let's jump right in!

Get Started    

Distributed DML

Overview

Apache Ignite.NET SQL Grid not only allows selecting data from the Data Grid, using SQL ANSI-99 syntax, it makes it possible to modify that data with well-known DML statements like INSERT, UPDATE, or DELETE. By taking advantage of this ability, you can work with Apache Ignite In-Memory Data Fabric as with an in-memory distributed database fully relying on its SQL capabilities.

SQL ANSI-99 Compliance

DML queries, as well as all the SELECT queries, are SQL ANSI-99 compliant.

Ignite stores all the data in memory in form of key-value pairs and hence all the DML related operations are converted into corresponding cache key-value based commands like ICache.Put(...) or ICache.InvokeAll(...). Let's take a deep look at how the DML statements are implemented in Ignite.

DML API

In general, all the DML statements can be divided into two groups - Those that add new entries into a cache (INSERT and MERGE), and those that modify the existing data (UPDATE and DELETE).

To execute DML statements in .NET code, you should use the same Ignite API that is used for SELECT queries - SqlFieldsQuery API. This API is used by DML operations the same way it is used by read-only queries, where SqlFieldsQuery returns IQueryCursor<IList>. The only difference is that as a result of a DML statement execution, IQueryCursor<List> contains a single-item IList of long type that signifies the number of cache items that were affected by the DML statement, whereas as a result of a SELECT statement, IQueryCursor<IList> will contain a list of items retrieved from the cache.

Basic Configuration

DML does not require additional configuration on top of SQL Queries configuration.

In addition to all the fields marked with [QuerySqlField] annotation or defined with QueryEntity, there will be two special predefined fields _key and _val for every object type registered in SQL Grid. These predefined fields provide reference to key-value entries stored in a cache and can be used directly inside of DML statements.

// Preparing cache configuration.
var cacheCfg = new CacheConfiguration("personCache", new QueryEntity(typeof(long), typeof(Person)));
      
// Starting the cache.
ICache<long, Person> cache = ignite.GetOrCreateCache(cacheCfg);

// Inserting a new key-value pair referring to prefedined `_key` and `_value`
// fields for Person type.
cache.QueryFields(new SqlFieldsQuery("INSERT INTO Person(_key, _val) VALUES(?, ?)", 1L, new Person("John", "Smith")));

If you prefer to work with concrete fields rather than the whole object value, you can execute a query like the one shown below:

ICache<long, Person> cache = ignite.GetOrCreateCache(cacheCfg);

cache.QueryFields(new SqlFieldsQuery(
    "INSERT INTO Person(_key, firstName, lastName) VALUES(?, ?, ?)", 1L, "John", "Smith"));

Note that the DML engine will be able to recreate a Person object from firstName and lastName, and put it into the cache but those fields have to be defined using QueryEntity or [QuerySqlField] as described above.

Advanced Configuration

Custom Keys

If you use only predefined SQL data types for cache keys, as shown below, then there is no need to perform additional manipulation with DML related configuration.

Predefined SQL Data Types

  • all the primitives (including nullable form) except char
  • string
  • decimal
  • byte[]
  • DateTime
  • Guid

However, once you decide to introduce a custom complex key and refer to its fields from DML statements, you have to set QueryField.IsKeyField to true in QueryEntity configuration.

When using attribute-based configuration, there are no extra steps required. All fields of QueryEntity.KeyType marked with [QuerySqlField] will be treated as key fields.

When using manual QueryEntity configuration, IsKeyField should be set explicitly.

var cfg = new CacheConfiguration("cars", new QueryEntity
{
	KeyTypeName = "CarKey",
	ValueTypeName = "Car",
	Fields = new[]
	{
		new QueryField("VIN", typeof(string)) {IsKeyField = true},
		new QueryField("Id", typeof(int)) {IsKeyField = true},
		new QueryField("Make", typeof(string)),
		new QueryField("Year", typeof(int))
	}
});
<cacheConfiguration name="cars">
  <queryEntities>
	<queryEntity keyTypeName="CarKey" valueTypeName="Car">
	  <fields>
		<queryField fieldType="System.String" fieldTypeName="java.lang.String" isKeyField="true" name="VIN" />
		<queryField fieldType="System.Int32" fieldTypeName="java.lang.Integer" isKeyField="true" name="Id" />
		<queryField fieldType="System.String" fieldTypeName="java.lang.String" name="Make" />
		<queryField fieldType="System.Int32" fieldTypeName="java.lang.Integer" name="Year" />
	  </fields>
	</queryEntity>
  </queryEntities>
</cacheConfiguration>

DML Operations

MERGE

MERGE is one of the most straightforward operations because it is translated into cache.Put(...) and cache.PutAll(...) operations depending on the number of rows that need to be inserted or updated as part of the MERGE query.

The examples below show how to update the data set with a MERGE command by either providing a list of entries, or injecting a result of a subquery execution.

cache.QueryFields(new SqlFieldsQuery("MERGE INTO Person(_key, firstName, lastName)" + 	"values (1, 'John', 'Smith'), (5, 'Mary', 'Jones')"));
cache.QueryFields(new SqlFieldsQuery("MERGE INTO someCache.Person(_key, firstName, lastName) (SELECT _key + 1000, firstName, lastName " +
   	"FROM anotherCache.Person WHERE _key > ? AND _key < ?)", 100, 200);

INSERT

The difference between MERGE and INSERT commands is that the latter adds only those entries into a cache whose keys are not there yet.

If a single key-value pair is being added into a cache then, eventually, an INSERT statement will be converted into a cache.PutIfAbsent(...) operation. In other cases, when multiple key-value pairs are inserted, the DML engine creates an EntryProcessor for each pair and uses cache.InvokeAll(...) to propagate the data into a cache.

The examples below show how to insert a data set with an INSERT command by either providing a list of entries or injecting a result of a subquery execution.

cache.QueryFields(new SqlFieldsQuery("INSERT INTO Person(_key, firstName, " +
         "lastName) values (1, 'John', 'Smith'), (5, 'Mary', 'Jones')"));
cache.QueryFields(new SqlFieldsQuery("INSERT INTO someCache.Person(_key, firstName, lastName) (SELECT _key + 1000, firstName, secondName " +
   	"FROM anotherCache.Person WHERE _key > ? AND _key < ?)", 100, 200);

UPDATE

This operation updates values in a cache on per field basis.

Initially, SQL engine generates and executes a SELECT query based on the UPDATE WHERE clause and only after that it modifies the existing values that satisfy the clause result.

The modification is performed via cache.InvokeAll(...) operation. Basically, it means that once the result of the SELECT query is ready, SQL Engine will prepare a number of EntryProcessors and will execute all of them using cache.invokeAll(...) operation. While the data is being modified using EntryProcessors, additional checks are performed to make sure that nobody has interfered between the SELECT and the actual update.

The following example shows how to execute an UPDATE query in Apache Ignite.

cache.QueryFields(new SqlFieldsQuery("UPDATE Person set lastName = ? " +
         "WHERE _key >= ?", "Jones", 2L));

Inability to modify a key or its fields with an UPDATE query

The reason behind that is that the state of the key determines internal data layout and its consistency (key's hashing and affinity, indexes integrity). Hence, there is no way to update a key without removing it from cache. For example, the following query:

UPDATE _key = 11 where _key = 10;

may result in the following cache operations:

val = get(10);
put(11, val);
remove(10);

DELETE

DELETE statements' execution is split into two phases and is similar to the execution of UPDATE statements.

First, using a SELECT query, the SQL engine gathers those keys that satisfy the WHERE clause in the DELETE statement. Next, after having all those keys in place, it creates a number of EntryProcessors and executes them with cache.InvokeAll(...). While the data is being deleted, additional checks are performed to make sure that nobody has interfered between the SELECT and the actual removal of the data.

The following example shows how to execute a DELETE query in Apache Ignite.

cache.QueryFields(new SqlFieldsQuery("DELETE FROM Person " +
         "WHERE _key >= ?", 2));

Modifications Order

If a DML statement inserts/updates the whole value referring to _val field and at the same time tries to modify a field that belongs to _val, then the order in which the changes are applied is :

  • The _val is updated/inserted first.
  • The field gets updated.

The order never changes regardless of how you define it in the DML statement. For example, after the statement shown below gets executed, the final Person's value will be "Mike Smith", ignoring the fact that _val field appears after firstName in the query.

cache.QueryFields(new SqlFieldsQuery("INSERT INTO Person(_key, firstName, _val)" +
           " VALUES(?, ?, ?)", 1, "Mike", new Person("John", "Smith")));

This is similar to the execution of the query like the one below where _val appears before in the statement string.

cache.QueryFields(new SqlFieldsQuery("INSERT INTO Person(_key, _val, firstName)" +
           " VALUES(?, ?, ?)", 1, new Person("John", "Smith"), "Mike"));

The order in which the changes are applied for _val and its fields is the same for INSERT, UPDATE and MERGE statements.

Concurrent Modifications

As explained above, UPDATE and DELETE statements generate SELECT queries internally in order to get a set of cache entries that have to be modified. The keys from the set are not locked and there is a chance that their values will be modified by other queries concurrently. A special technique is implemented by the DML engine that, first, avoids locking of keys and, second, guarantees that the values will be up-to-date at the time they will be updated by a DML statement.

Basically, the engine detects a subset of the cache entries which were modified concurrently and re-executes the SELECT statement limiting its scope to the modified keys only.

Let's say the following UPDATE statement is being executed.

// Adding the cache entry.
cache.Put(1, new Person("John", "Smith");
          
// Updating the entry.          
cache.QueryFields(new SqlFieldsQuery("UPDATE Person set firstName = ? " +
         "WHERE lastName = ?", "Mike", "Smith"));

Before firstName and lastName are updated, the DML engine will generate the SELECT query to get cache entries that satisfy theUPDATE statement's WHERE clause. The statement will be the following.

SELECT _key, _value, "Mike" from Person WHERE lastName = "Smith"

Right after that, the entry that was retrieved​ with the SELECT query can be updated concurrently.

cache.Put(1, new Person("Sarah", "Connor"))

The DML engine will find out that the entry with key 1 was modified at the update phase of UPDATE query execution. After that, it will stop the update and will re-execute a modified version the SELECT query in order to get latest entries' values:

SELECT _key, _value, "Mike" from Person WHERE secondName = "Smith"
    AND _key IN (SELECT * FROM TABLE(KEY long = [ 1 ]))

This query will be executed only for outdated keys. In our example, there is only one key that is 1.

This process will repeat until the DML engine is sure at the update phase that all the entries, that are going to be updated, are up-to-date. The maximum number of attempts is 4. Presently there is no configuration parameter that can change this value.

DML engine does not re-execute the SELECT query for entries that are deleted concurrently​. The query is re-executed only for entries that are still in the cache.

Known Limitations

Subqueries in WHERE clause

SELECT queries used in INSERT and MERGE statements as well as SELECT queries automatically generated by UPDATE and DELETE operations will be distributed and executed in either colocated or non-colocated distributed modes if needed.

However, if there is a subquery that is executed as part of the WHERE clause, then it will not be executed in non-colocated distributed mode. The subquery will be executed in the colocated mode over the local data set all the times.

For example, in the following query:

DELETE FROM Person WHERE _key IN
    (SELECT personId FROM "salary".Salary s WHERE s.amount > 2000)

the DML engine will generate the SELECT query in order to get a list of entries that need to be deleted. The query will be distributed and executed across the cluster and will look like the one below:

SELECT _key, _val FROM Person WHERE _key IN
    (SELECT personId FROM "salary".Salary s WHERE s.amount > 2000)

However, the subquery from IN clause (SELECT personId FROM "salary".Salary ...) will not be distributed further and will be executed over the local data set present on a cluster node.

EXPLAIN support for DML statements

Presently, EXPLAIN is not supported for DML operations.

One possible approach is to execute EXPLAIN for the SELECT query that is automatically generated (UPDATE, DELETE) or used (INSERT, MERGE) by DML statements. This will give you an insight on the indexes that are used when a DML operation is executed.

Example

Ignite distribution includes ready-to-run QueryDmlExample as a part of its sources. This example demonstrates the usage of all the above-mentioned DML operations.

Distributed DML