Overhead of Keeping the IM Column Store Transactionally Consistent
The overhead of keeping the IM column store transactionally consistent varies by application and depends on a number
of factors, including the rate of change, the in-memory compression level chosen for a table, the location of the
changed rows, and the type of operations being performed. Tables with higher compression levels incur more
overhead than tables with lower compression levels.
Changed rows that are co-located in the same block incur less overhead than changed rows that are spread
randomly across a table. Newly inserted rows are one example of changes co-located in the same blocks, since the
database will usually group them together. Data loaded using a direct path load operation is another.
For tables that have a high rate of DML, MEMCOMPRESS FOR DML is recommended, and, where possible, it is also
recommended to use partitioning to localize changes within the table. For example, range partitioning can be used
to localize data in a table by date so most changes will be confined to data stored in the most recent partition. Date
range partitioning also provides many other manageability and performance advantages.
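As a sketch of this recommendation, a high-DML table could combine date range partitioning with the lightweight compression level (the table, columns, and partition bounds below are hypothetical, chosen only for illustration):

```sql
-- Hypothetical example: range-partition a high-DML table by date so most
-- changes are confined to the most recent partition, and use the
-- DML-friendly compression level for the IM column store.
CREATE TABLE orders (
  order_id   NUMBER,
  order_date DATE,
  amount     NUMBER
)
PARTITION BY RANGE (order_date) (
  PARTITION p2023 VALUES LESS THAN (DATE '2024-01-01'),
  PARTITION p2024 VALUES LESS THAN (DATE '2025-01-01')
)
INMEMORY MEMCOMPRESS FOR DML;
```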
The In-Memory Column Store on RAC
Each node in a RAC environment has its own IM column store. It is highly recommended that the IM column stores
be equally sized on each RAC node. Any RAC node that does not require an IM column store should have the
INMEMORY_SIZE parameter set to 0. By default, all objects populated into memory will be distributed across all of the
IM column stores in the cluster. It is also possible to have the same objects appear in the IM column store on every
node (Engineered Systems only). The distribution of objects across the IM column stores in a cluster is controlled by
two additional sub-clauses to the INMEMORY attribute: DISTRIBUTE and DUPLICATE.
In a RAC environment, an object that only has the INMEMORY attribute specified on it will be distributed across all of
the IM column stores in the cluster, effectively making the IM column store a shared-nothing architecture. How an
object is distributed across the cluster is controlled by the DISTRIBUTE sub-clause. By default, Oracle decides the
best way to distribute the object across the cluster given the type of partitioning used (if any). Alternatively, you can
specify DISTRIBUTE BY ROWID RANGE to distribute by rowid range, DISTRIBUTE BY PARTITION to distribute
partitions to different nodes, or DISTRIBUTE BY SUBPARTITION to distribute sub-partitions to different nodes.
ALTER TABLE lineorder INMEMORY DISTRIBUTE BY PARTITION
Figure 32. This command distributes the lineorder table across the IM column stores in the cluster by partition.
DISTRIBUTE BY PARTITION or DISTRIBUTE BY SUBPARTITION is recommended if the tables are partitioned or
sub-partitioned by HASH and a partition-wise join plan is expected. This will allow each partition join to be co-located
within a single node. DISTRIBUTE BY ROWID RANGE can be used for non-partitioned tables, or for partitioned
tables where DISTRIBUTE BY PARTITION would lead to data skew.
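Following the pattern of Figure 32, a non-partitioned table (the table name here is hypothetical) could be distributed by rowid range instead:

```sql
-- Hypothetical non-partitioned table: spread its in-memory data across
-- the cluster's IM column stores by ranges of rowids.
ALTER TABLE customers INMEMORY DISTRIBUTE BY ROWID RANGE;
```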
If the object is very small (consists of just 1 IMCU), it will be populated into the IM column store on just one node in
the cluster.
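The DUPLICATE sub-clause mentioned earlier could be applied as in the following sketch (the table name is hypothetical, and as noted above the behavior applies on Engineered Systems only):

```sql
-- Populate a full copy of a small, frequently joined table into the IM
-- column store on every node (Engineered Systems only).
ALTER TABLE date_dim INMEMORY DUPLICATE ALL;
```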
Since data populated in-memory in a RAC environment is affinitized to a specific RAC node, parallel server
processes must be employed to execute a query on each RAC node against the piece of the object that resides in
that node’s IM column store. The query coordinator aggregates the results from each of the parallel server
processes together before returning them to the end user’s session. In order to ensure the parallel server processes
are distributed appropriately across the RAC cluster, the location of the data needs to be known. Previously,
Automatic Degree of Parallelism (Auto DOP) was required so that the query coordinator could ensure that the
22 | ORACLE DATABASE IN-MEMORY WITH ORACLE DATABASE 19C