New statistical, analytic and machine learning features in 18c

Oracle 18c has been released and comes with many new features and incremental improvements. At the moment it is currently available on the Oracle Cloud and for Oracle Engineered System. We will have to wait until later in the year before we will be able to perform out own installs either on-premises or on our own cloud instances. This will be most likely for Oracle 18.2 or 18.3 release.

In the mean time we will have to, more than likely, use the Oracle cloud to play with Oracle. For this you will need a paid cloud account or maybe you have gotten your hands on some free cloud credits.

In this blog post we will look at some of the new analytical and analytics features available in Oracle 18c. The new features listed here are not a complete list, but comprise those that will have the most impact for developers and data scientists in their everyday work.

Approximate Top-N Query Processing

Top-N queries have been introduced in Oracle 12.1c, and additional enhancements came in Oracle 12.2c. With Oracle 18c we have some additional enhancements with APPROX_COUNT and APPROX_SUM functions, which can now be used with APPROX_RANK. The following examples illustrate these new functions.

SELECT state, approx_count(*) 
FROM   insur_cust_ltv_sample
GROUP BY state 
HAVING approx_rank(partition by state order by approx_count(*) desc) <= 1;

When using APPROX_COUNT and APPROX_SUM you will need to include a HAVING clause that uses the APPROX_RANK function; otherwise, you will get the following error message:

ORA-62237: Missing HAVING clause when approximate top functions are used.

The APPROX_* functions have proven over the previous version of the database to give significant performance improvement and query response times. These functions are not suitable for all application areas, but in the era of big data and in data warehousing environments they have proved very useful.

Analytic Views Enhancements

Analytics Views comes with a few new enhancements. These include the ability to add new filters and measures to the view definition. These are enabled with the new FILTER FACT and ADD MEASURE keywords. These can be used to filter and restrict the data set before the aggregation is performed. With ADD MEASURE you can define calculations with a SELECT statement. The following example illustrates the use of the FILTER FACT keyword to filter records in the hierarchy.

SELECT time_hier.member_name, 
       TO_CHAR(sales, '999,999,999,999') AS sales

FROM ANALYTIC VIEW (                     -- inline analytic view

<code> USING sales_av HIERARCHIES(time_hier)</code>
FILTER FACT (time_hier TO level_name = 'MONTH'
               AND TO_CHAR(month_end_date, 'Q') IN (1, 2))
)
WHERE time_hier.level_name = 'YEAR')
ORDER BY time_hier.hier_order;

Analytic Views in Oracle 18c has some additional functions that can be used to calculate measures. The new functions include RANK_*, PERCENTILE_*, STATS_*, COVAR_*, HIER_DEPTH, HIER_LEVEL, HIER_MEMBER_NAME, HIER_MEMBER_UNIQUE_NAME, HIER_CAPTION, and HIER_DESCRIPTION.

These new Analytic Views functions greatly extend the types of calculation that can be performed and thereby support the use of Analytic Views with a wider range of application areas.

Machine Learning Enhancements

The Oracle Advanced Analytics option (OAA) contains all the fun machine learning algorithms. With each release of the database, we keep on getting more algorithms added into the core of the database engine. We need to be careful not to mix these up with the algorithms that come as part of the Oracle R Enterprise component of OAA.

The in-database algorithms are fully accessible by using SQL. With Oracle 18c they have added Random Forests, Neural Networks, and Time Series Analysis using Exponential Smoothing and incorporate 14 different statistical models to the core database engine.

The following illustrates an example for creating a Random Forest model. The first step is to create a settings table. This contains the algorithm settings. The second step is to create the model.

CREATE TABLE rf_settings_table ( 
     setting_name VARCHAR2(30), 
     setting_value VARCHAR2(30));

BEGIN 
    INSERT INTO decision_tree_model_settings (setting_name,                                  
                                              setting_value) 
     VALUES (dbms_data_mining.algo_name,        
             dbms_data_mining.algo_random_forest); 
     
     INSERT INTO decision_tree_model_settings (setting_name, 
                                              setting_value) 
     VALUES (dbms_data_mining.prep_auto,
            dbms_data_mining.prep_auto_on); 
    
     COMMIT; 
 END;

BEGIN  
   DBMS_DATA_MINING.CREATE_MODEL(
   model_name         => 'RANDOM_FOREST_MODEL',
   mining_function     => dbms_data_mining.classification,
   data_table_name     => 'MINING_DATA_BUILD_V',
   case_id_column_name => 'CUST_ID',
   target_column_name => 'AFFINITY_CARD',
   settings_table_name => 'rf_settings_table');
END;

If you are familiar with using the OAA SQL functions and creating machine learning models, then you will notice that the above code is very similar to what you had to do previously. All that is needed is some small changes to using Random Forests. You can build a Neural Network model in a similar way.

Inline External Tables

Accessing external tables has been enhanced to allow for inline external table definitions, by allowing the external table to be defined at runtime of the SQL statement. This can remove the need to define an external table as an object in the database. The external table definition is instead defined in the SELECT statement.

This following code illustrates how you would define a traditional external table. This external table becomes a persistent object in the database.

CREATE TABLE countries_ext (
 country_code     VARCHAR2(5),
 country_name     VARCHAR2(50),
 country_language VARCHAR2(50)
)

ORGANIZATION EXTERNAL (
 TYPE ORACLE_LOADER
 ACCESS PARAMETERS (
   RECORDS DELIMITED BY NEWLINE
   FIELDS TERMINATED BY ','
   MISSING FIELD VALUES ARE NULL
   (
     country_code     CHAR(5),
     country_name     CHAR(50),
     country_language CHAR(50)
   )
 )

LOCATION ('Countries.txt')
REJECT LIMIT UNLIMITED
);

With an Inline External Table we can bypass the creation of this object in the database and go straight to the external file. The following example illustrates an Inline External Table alternative to the persistent example.

<code>SELECT * FROM   </code>
EXTERNAL (   
  ( country_code     CHAR(5),
   country_name     CHAR(50),
    country_language CHAR(50)
   )     
   TYPE ORACLE_LOADER     
   ACCESS PARAMETERS (
     RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
   MISSING FIELD VALUES ARE NULL)     
   LOCATION ('Countries.txt') 
   REJECT LIMIT UNLIMITED) 
sales_external;

The new features listed in this post are a subset of all the new features available in Oracle 18c. But the new features listed here will be of most interest to those focused on analytics and data science.

To give Oracle 18c a try, and to test out the new features, you will need to go to cloud.oracle.com and select Database from the dropdown list from the Platform menu. Yes, you are going to need an Oracle Cloud account and some money or some free credit. Go and get some free cloud credits at the upcoming Oracle Code events.

If you want a 'free' way of trying out Oracle 18c, you can use Oracle Live SQL. They have set up some examples of the new features for you to try.

About the Author

Brendan Tierney

About the Author

Start the discussion at forums.toadworld.com