Checking Data Type Consistency in Oracle

In large databases it can be a challenge to have data type consistency across many tables and views, especially since SQL does not understand PL/SQL’s %TYPE attribute. When designing the overall structure of the tables, tools such as SQL Developer’s Data Modeller can be used to reduce the pain associated with potential data type inconsistencies. However, as databases grow and evolve, data types may diverge and cause headaches when moving data back and forth. Here I present a utility to identify and automatically fix many of these issues.

Package: plsql_utils

In a previous post I introduced the utils package, which included subprograms relevant to both PL/SQL and SQL. Since such a mixture ruins the serial reusability, which is great for reducing the memory footprint of a package, I decided to split the package into two components: plsql_utils and sql_utils. The latter is mainly interesting for functions that you can call from SQL statements, whereas the former is an extension of what was the package utils but with PRAGMA SERIALLY_REUSABLE. If you want the gory details, you can read the commit message Create logic for data type checks on the repository and the comments in the code.

Anyway, the package plsql_utils contains a procedure called fix_data_type_issues that depends on a view data_type_issues, which are all available on the GitHub repository. The view lists which data type is more common and this is used by plsql_utils.fix_data_type_issues to automatically resolve the inconsistencies.

What the procedure basically does is check whether the table is empty or not and either alters the table in place or uses DBMS_REDEFINITION to redefine the table online. Internally, the DBMS_METADATA API is used to generate the DDL statement without constraints but with partition and storage clauses, so that the table after redefinition matches the original table exactly. There is also a check whether the table has a primary key and if so uses that to move the data from the original to the interim table; if there is no (enabled) primary key, the ROWID is used.

You can install the required objects by executing the following script from the directory, which is the same as SCRIPT - deployment.pls on my Git repository:

@'TABLE - error_log.sql'
/
@'PACKAGE - errors.pks'
/
@'PACKAGE - errors.pkb'
/
@'PACKAGE - type_defs.pkg'
/
@'PACKAGE - plsql_utils.pks'
/
@'PACKAGE - plsql_utils.pkb'
/
@'PACKAGE - sql_utils.pks'
/
@'PACKAGE - sql_utils.pkb'
/
@'VIEW - data_type_issues.sql'
/

The forward slashes have been added so that the script runs successfully in SQL*Plus too.

Example

Shall we run the procedure on a couple of dummy tables?

CREATE TABLE t1
(
    id          NUMBER NOT NULL
  , num         NUMBER(4,2)
  , text        VARCHAR2(100)
  , another_num NUMBER(6,2)
  , CONSTRAINT t1_pk      PRIMARY KEY ( id )
  , CONSTRAINT t1_text_ck CHECK ( LENGTH(text) > 10 )
);

INSERT INTO t1
SELECT
  ROWNUM
, DBMS_RANDOM.NORMAL
, DBMS_RANDOM.STRING('U',15)
, DBMS_RANDOM.NORMAL
FROM
  dual
  CONNECT BY LEVEL <= 100;

CREATE TABLE t2 AS SELECT * FROM t1 WHERE 1=2;

ALTER TABLE t2 MODIFY ( num  NUMBER(5,3) );
ALTER TABLE t2 MODIFY ( another_num  NUMBER(8,2) );

CREATE SMALLFILE TABLESPACE sandbox1_ts
  DATAFILE 'df_sandbox1.dat'
  SIZE 10M AUTOEXTEND ON;

CREATE SMALLFILE TABLESPACE sandbox2_ts
  DATAFILE 'df_sandbox2.dat'
  SIZE 10M AUTOEXTEND ON;

CREATE TABLE t3
  PARTITION BY HASH (id)
  PARTITIONS 3
  STORE IN (sandbox1_ts, sandbox2_ts, sandbox1_ts)
AS
  SELECT * FROM t1 WHERE 1=2;

ALTER TABLE t3 MODIFY ( text VARCHAR2(20) );
ALTER TABLE t3 MODIFY ( another_num  NUMBER(8,2) );

INSERT INTO t3 SELECT * FROM t1;

What these SQL statements do is create three tables with the same column names but t1 has a different data type for another_num, t2 for num, and t3 for text. What our fix_data_type_issues procedure in plsql_utils is supposed to do is fix these issues based on the most prevalent data type available, so at the end we expect all three tables to have the same structure:

Name        Null     Type
----------- -------- -------------
ID          NOT NULL NUMBER
NUM                  NUMBER(4,2)
TEXT                 VARCHAR2(100)
ANOTHER_NUM          NUMBER(8,2)

Both t1 and t3 contain data, so that means the DBMS_REDEFINITION package will be used. Technically, we could get away with a simple ALTER TABLE ... MODIFY statement whenever the data type change does not involve a modification of the precision or scale even when there is data in the table. Of course, the data type after the redefinition must be able to squeeze in the data already in the table.

Now, let’s run the procedure:

EXEC plsql_utils.fix_data_type_issues;

This is what it spits out on my sandbox database (12c).

SANDBOX.T2:
  NUM - NUMBER(5,3) [1x] ->
    NUMBER(4,2) [2x]
  Successfully redefined.
SANDBOX.T3:
  TEXT - VARCHAR2(20) [1x] ->
    VARCHAR2(100) [2x]
  Successfully redefined.
SANDBOX.T1:
  ANOTHER_NUM - NUMBER(6,2) [1x] ->
    NUMBER(8,2) [2x]
  Successfully redefined.

The numbers in the square brackets show the occurrences of the data type, that is, the number of tables in which the column listed appears with that particular data type.

You can verify that indeed all tables have the same data types and that any constraints, indexes, partitions, and so on are still there.

Exceptional Cases

There are a few exceptional cases that I’d like to discuss in a bit more detail.

Ties

So, what does the procedure do when there is a tie?

Nothing. It shows the data type issue but informs you that ties cannot be resolved automatically, please redefine manually.

What If The Data Does Not Fit?

When the redefinition causes data not to fit anymore, the procedure says so: encountered an error during the redefinition. These errors can be seen from the error_log table.

For example, ORA-12899: value too large for column “SANDBOX”.”COPY_TAB_XNJVJ7SRXSND8LX”.”TEXT” (actual: 15, maximum: 10). It shows you that the temporary object COPY_TAB_XNJVJ7SRXSND8LX has values that cannot be resized to the data type suggested. Yes, internally an object with a semi-random name is generated, which should not cause any issues with name clashes unless your developers are forced to follow a naming convention that favours gibberish.

Since the redefine_column sub-procedure is placed inside its own BEGIN ... EXCEPTION ... END block, the main procedure continues unabatedly. All redefinition errors are automatically logged in the error_log table, so you can review these later. In case there are errors, the last line of the output will inform you of the number of errors encountered.

Excluding Certain Columns

What if you have the column id in 500 tables with different data types and you don’t want id to be fixed automatically? Well, the procedure currently does not handle this situation but you can add it yourself fairly easily as most of the functionality is already baked into the package:

  1. Create a SQL (schema-level) type, so you don’t run into a PLS-00642 error that tells you that local collection types are not allowed in SQL statements. In 12c you can increase the length of the VARCHAR2 to the maximum in PL/SQL, but in case you’re still stuck with 11g or below that won’t work. Just use CREATE TYPE varchar2_ntt IS TABLE OF VARCHAR2(4000) instead. Please observe that technically we can define a nested table type with fewer characters as column names are at most 30 characters. However, the schema-level type may have other uses, so it’s often advantageous to define it as generically as possible.
  2. Overload the split_objects function to return the schema-level type varchar2_ntt.
  3. Add a formal parameter to fix_data_type_issues, say exclude_cols_in, that takes as its argument a list of column names, just like cols_in in the find_table procedure. Remember to modify the procedure’s signature in the package specification accordingly.
  4. Call the overloaded split_objects function in fix_data_type_issues.
  5. Add WHERE column_name IN (SELECT * FROM TABLE(exclude_cols_in)) to the cursor, which requires you to either call the split_objects function in the declaration section before you define the cursor or move all the assignments to the main block. In the correct sequence, that is. The appearance of TABLE is an example of the so-called table expression.

Complex/Custom Data Types

Most tables use the standard data types provided by Oracle. Sometimes a table may hold composites, such as nested tables, as columns. In this case the fix_data_type_issues procedure tells you that custom data types cannot be automatically redefined at present. This functionality could be added to the package but that is something for another day, as it requires the package to dynamically build the constructor function and provide a custom mapping of the columns required. Such special cases can be done best manually. Please check Oracle’s Database Administrator’s Guide for an example of an online redefinition with object types.

Prerequisites

In order to compile and run the subprograms in the package, advanced replication must be enabled. You can check this by using the following query:

SELECT
  value
FROM
  v$option
WHERE
  parameter = 'Advanced replication';

Without it, you’ll run into trouble.

Moreover, the EXECUTE_CATALOG_ROLE cannot be used to execute DBMS_REDEFINITION from the package because roles are disabled when executing (or compiling) stored PL/SQL code. Hence, you have to grant the EXECUTE privilege on DBMS_REDEFINITION explicitly.

Clean-Up

After you’re done fiddling with the demo tables, you can kick them out of the database:

DROP TABLE t1;
DROP TABLE t2;
DROP TABLE t3;

DROP TABLESPACE sandbox1_ts INCLUDING CONTENTS AND DATAFILES;
DROP TABLESPACE sandbox2_ts INCLUDING CONTENTS AND DATAFILES;

And that’s it. I hope you like the new features in plsql_utils.