Parallelizing a Direct Path Load
Parallel execution is a commonly used method of speeding up operations by splitting them into smaller sub-tasks. Just as you would split a long shopping list in two if your spouse came along to the grocery store, you can take advantage of parallel execution within the database to speed up both data ingestion and queries. Parallel execution in Oracle
Database is based on the principles of a coordinator (often called the Query Coordinator – QC for short) and parallel
execution (PX) server processes. The QC is the session that initiates the parallel SQL statement, and the PX servers are the
individual processes that perform work in parallel on behalf of the initiating session. The QC distributes the work to the PX
servers and aggregates their results before returning them to the end-user.
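You can observe this coordinator/PX-server relationship from a second session while a parallel statement is running. The following query against the V$PX_SESSION view is a minimal sketch; the column selection is illustrative:

    -- Run from a monitoring session while a parallel statement executes.
    -- Each row is one PX server; QCSID identifies its query coordinator.
    SELECT qcsid,          -- session ID of the query coordinator (QC)
           sid,            -- session ID of the PX server process
           server_group,   -- PX server group within the statement
           server_set,     -- producer/consumer set the server belongs to
           degree          -- degree of parallelism actually used
    FROM   v$px_session
    ORDER  BY qcsid, server_group, server_set, sid;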
To achieve scalable direct path loads, the external files must be processed in parallel. From a processing perspective, this means that the input data has to be divisible into units of work, known as granules, which are then processed concurrently by the PX server processes.
Oracle can build the parallel granules without restrictions if it can position itself within an external data file and find the beginning of the next record, for example, when the data file contains single-byte records terminated by a well-known character (a newline or a semicolon). Each external data file is divided into granules of approximately 10 MB and distributed among the parallel server processes in a round-robin fashion. For optimal parallel load performance, all files should be similar in size, be multiples of 10 MB, and have a minimum size of a few GB.
In this case, there are no constraints on the number of concurrent parallel server processes involved or the Degree of
Parallelism (DOP) other than the requested DOP for the statement.
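As a sketch of such a load, the following statements request a direct path insert at a DOP of 8 from an external table over newline-terminated text files; the table names (sensor_data, ext_sensor_data) are hypothetical:

    -- Parallel DML is disabled by default and must be enabled per session.
    ALTER SESSION ENABLE PARALLEL DML;

    -- Direct path (APPEND) parallel insert; the source files behind the
    -- external table can be split into ~10 MB granules, so the requested
    -- DOP of 8 is the only constraint on parallelism.
    INSERT /*+ APPEND PARALLEL(sensor_data, 8) */ INTO sensor_data
    SELECT /*+ PARALLEL(ext_sensor_data, 8) */ *
    FROM   ext_sensor_data;

    COMMIT;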
However, when the file format prevents Oracle from finding record boundaries to build granules (compressed data, etc.), or when the type of media does not support positionable or seekable scans, the parallelization of the load is defined by the number of data files. Oracle treats each data file as a single entity – and, therefore, a single granule. Such data loads have to be parallelized by providing multiple staging files, and the total number of staging files determines the maximum DOP possible.
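For illustration, the external table below points at eight gzip-compressed staging files. Because each compressed file is a single granule, the maximum DOP for the load is eight. All object and file names are hypothetical, and the PREPROCESSOR clause assumes a zcat executable in the exec_dir directory object:

    -- Each compressed file is one granule, so eight files cap the DOP at 8.
    CREATE TABLE ext_sensor_data (
      device_id    NUMBER,
      reading_time TIMESTAMP,
      reading      NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY iot_stage_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        PREPROCESSOR exec_dir:'zcat'   -- decompress each file on the fly
        FIELDS TERMINATED BY ','
      )
      LOCATION ('sensors_1.csv.gz', 'sensors_2.csv.gz', 'sensors_3.csv.gz',
                'sensors_4.csv.gz', 'sensors_5.csv.gz', 'sensors_6.csv.gz',
                'sensors_7.csv.gz', 'sensors_8.csv.gz')
    )
    PARALLEL 8;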
Data Compression and Direct Path Load
Loading large volumes of data always raises the question of whether or not to compress the data during the data load. There is a tradeoff between maximizing data ingest performance on the one hand and improved query performance (since less data has to be read from disk) plus space savings on the other. In terms of our grocery store analogy, think of compression as organizing the items in your cart: you can fit many more groceries in the cart if you spend a little time arranging them rather than just throwing them in.
To load data in a compressed format, you must declare the target table (or partitions) as COMPRESSED. Oracle offers the
following compression algorithms:
• COMPRESS/COMPRESS FOR DIRECT_LOAD OPERATIONS – block-level compression, for direct path operations only
• COMPRESS FOR ALL OPERATIONS – block-level compression for direct path operations and conventional DML, part of the Advanced Compression Option
• COMPRESS FOR [QUERY|ARCHIVE] [HIGH|LOW] – columnar compression, for direct path operations only, a feature of Exadata storage
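As a minimal sketch, declaring block-level compression is a single clause on the table definition; the table name and columns below are illustrative:

    -- Hypothetical IoT table declared with block-level compression;
    -- direct path loads into it will write compressed blocks.
    CREATE TABLE sensor_data (
      device_id    NUMBER,
      reading_time TIMESTAMP,
      reading      NUMBER
    )
    COMPRESS;

    -- On Exadata storage, a columnar variant could be declared instead,
    -- for example: ... COMPRESS FOR QUERY HIGH;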
Irrespective of the compression technique, additional CPU resources will be consumed during the data load operation.
However, the benefits of compressing the data far outweigh this cost for an IoT workload as the data is typically ingested
once, never changed, and queried many times.
Memoptimized Row Store For Loading Streaming Data
For many IoT workloads, data is continuously streamed into the database directly from a smart device or application. Oracle
Database 19c offers an efficient way to ingest streaming data via the Memoptimized Row Store. With Memoptimized
Rowstore Fast Ingest, the standard Oracle transaction mechanisms are bypassed, and data is ingested into a temporary
buffer in the Large Pool. The buffer's content is then periodically written to disk via a deferred, asynchronous process. Since
the application doesn't have to wait for the data to be written to disk, the insert statement returns extremely quickly. A single session can ingest approximately 1.4X more rows per second than with a conventional insert. When records are batched on the mid-tier and loaded via 100-row array inserts, a recommended approach, a single session can insert over 1.7X more rows per second. More details on the performance you can expect from the Memoptimized Row Store can be found in Appendix A.
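As a brief sketch, fast ingest is enabled per table and requested per statement via a hint; the table name and values below are hypothetical:

    -- Mark the table as eligible for fast ingest.
    ALTER TABLE sensor_readings MEMOPTIMIZE FOR WRITE;

    -- Inserts carrying the hint bypass the standard transaction path and
    -- land in a large pool buffer; a background process writes them to
    -- disk later, so the statement returns almost immediately.
    INSERT /*+ MEMOPTIMIZE_WRITE */ INTO sensor_readings
    VALUES (42, SYSTIMESTAMP, 98.6);

Because the writes are deferred, recently ingested rows are not immediately durable, so applications must tolerate a small window of potential data loss after an instance failure.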