Data lineage analysis from multiple SQL Files

To get an accurate data lineage analysis result, we may provide the definition of the database objects such as table, view, procedure to the GSP(General SQL Parser).

1. Parse SQL file with ambigious table/columnn relation

Take this SQL (file1.sql) for example:

CREATE VIEW test 
AS 
  (SELECT NAME, 
          address 
   FROM   manager, 
          employee 
   WHERE  manager.id = employee.id) 

Without more information. GSP doesn’t know the column NAME, address in the select list belongs to which table in the from clause.

2. Provides the table definition

File2.sql

Create table employee (id number, name varchar2(100), address varchar2(100));

File3.sql

Create table manager (id number, age varchar2(100), country varchar2(100));

If you provide those 2 SQL files with the table definition to the GSP, then column NAME, address will be linked to the table employee correctly.

3. How to provides multiple SQL files to GSP

In GSP, gudusoft.gsqlparser.dlineage.DataFlowAnalyzer class do the actual work of data lineage analysis.

	public DataFlowAnalyzer(File sqlFile, EDbVendor dbVendor, boolean simpleOutput) {
		this.sqlFile = sqlFile;
		this.vendor = dbVendor;
		this.simpleOutput = simpleOutput;
	}

As you can see here, the first parameter of DataFlowAnalyzer accept a File type which will accept a directory that includes all SQL files that need to be processed.

You may also check the DataFlowAnalyzer demo under demos.lineage package shipped together with the GSP library to find out how to feed multiple SQL files.

4. Pulling all objects from a database (table, view, function, procedure, and trigger definitions)

Once you were pulling all objects from a database (table, view, function, procedure, and trigger definitions), it is recommended to put the definition of a single object in a single SQL file, especially for function, procedure, and trigger definitions. In this way, the processing error in one single SQL file will not affect the other SQL files.

The order of those SQL files put under a directory doesn’t matter. GSP is smart enough to get the necessary information accordingly.