General SQL Parser

General SQL Parser FAQ

2021-07-22T00:00:00+00:00

Technical support
Licensing and billing
Sales and reseller
1. Q: We are resellers. Can we purchase your products for our customers?
2. Q: Can we purchase via emailed PO?

Technical support

Q: Does General SQL Parser depend on any third-party library, software, or DLL?

General SQL Parser (GSP) doesn’t depend on any third-party library, software, or DLL. The GSP Java version requires JRE 1.5. The GSP .NET version requires .NET Framework 4.5 or higher.

Q: In order to use GSP to validate SQL syntax, do I need to connect to a database instance such as Oracle?

GSP can validate SQL syntax without a connection to a database instance and without an internet connection. GSP includes all of its SQL parser engines, so no additional files or connections are needed.

Q: How long does it take to process my feature request or bug report?

We provide email-based tech support. Feature requests and bug reports are usually processed in 2-3 weeks, but this is not guaranteed: depending on the complexity of the issue, processing can take from weeks to months. In addition to our free tech support, we offer customized services that fix a bug or implement a feature in a timely manner. Please contact info@sqlparser.com for details.

Q: When a database vendor adds new SQL syntax, how long does it take for General SQL Parser to support it?

General SQL Parser supports both PL/SQL and SQL. Although we try to support all of a database’s SQL syntax, it is quite difficult to guarantee complete coverage, especially for the most recent versions.

The goal of General SQL Parser is NOT to support all SQL syntax of a database, but to support the most commonly used SQL syntax. Our strategy is therefore to add support for new SQL syntax when users request it.

Q: Is GSP .NET version a .NET Standard library?

Yes. The General SQL Parser .NET version is .NET Standard compatible, which means it can run on all .NET platforms that implement .NET Standard.

Licensing and billing

Q: What kind of General SQL Parser license do I need?

General SQL Parser is licensed per user/developer.

There are three developer licenses: single user license, team license (2-5 developers), and site license (more than five developers).

The Single License grants one developer the right to install and use multiple copies of the Software during development.

The Team License grants a team of up to five developers the right to install and use multiple copies of the Software during development.
The Team License cannot be shared or used concurrently by more than five developers. The Team License is NOT a ‘floating’ license; that is, you cannot temporarily transfer access rights to users outside the team.

A Site License grants you the right to share or use the Software concurrently by multiple individual developers at the authorized site.

You also need to specify the database platforms that need to be included when you purchase the license.

Please refer to General SQL Parser Licensing for more details.

Q: What if I want to distribute this library?

You can’t deploy this library together with your product/service to customers outside your organization. You can deploy this library to specific machines inside your organization according to the license you purchased.

If you need to deploy this library in a cloud environment and provide a service outside your organization, you need to purchase a distribution license.

Don’t hesitate to contact us (support@sqlparser.com) for a distribution license if you need to distribute this library outside your organization as part of your product/service.

The single license entitles you to deploy this library, together with your software that depends on it, to a single machine inside your organization. The library can’t be deployed to more than one machine without purchasing an additional distribution license.

The team license entitles you to deploy this library, together with your software that depends on it, to five machines inside your organization. The library can’t be deployed to more than five machines without purchasing an additional distribution license.

The site license entitles you to deploy this library, together with your software that depends on it, to an unlimited number of machines inside your organization.

Please note that only licensed developers can access the General SQL Parser library; third-party developers or programs can’t access the General SQL Parser APIs, even through a wrapper provided by your program.

Please refer to General SQL Parser Licensing for more details.

Q: Will the license expire?

You can use the software without any time limit; the license never expires. You can also upgrade to the latest version of the software within 12 months of purchase. To upgrade to the latest version after those 12 months, you need to purchase our yearly subscription, which provides another 12 months of upgrades and free tech support. The price of the annual subscription is 20% of the original purchase price. You will be notified when it’s time to renew, but you need to renew the annual subscription yourself.

We want to emphasize that when you renew your subscription for our software, the payment must cover all fees incurred since your last renewal. It is essential that there is no interruption in your subscription period. If there is any gap in your coverage, you will be required to pay the outstanding fees to restore uninterrupted access to our services.

For instance, if your subscription expires on May 1, 2024, and you choose to renew it on July 1, 2024, please note that the new subscription period will still start from May 1, 2024.
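The renewal rule above is simple date arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes that each renewal covers exactly 12 months and costs 20% of the original purchase price, as described in this answer.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of GSP: models the renewal rule described above.
final class SubscriptionRenewal {
    // Assumption: one renewal covers 12 months.
    static final int TERM_MONTHS = 12;

    /** The renewed period starts on the previous expiry date, not on the payment date. */
    static LocalDate renewedUntil(LocalDate previousExpiry) {
        return previousExpiry.plusMonths(TERM_MONTHS);
    }

    /** Days between expiry and a late payment; they still count as part of the renewed period. */
    static long lapsedDays(LocalDate previousExpiry, LocalDate paymentDate) {
        return Math.max(0, ChronoUnit.DAYS.between(previousExpiry, paymentDate));
    }

    /** Annual subscription fee in cents: 20% of the original purchase price. */
    static long annualFeeCents(long originalPriceCents) {
        return originalPriceCents * 20 / 100;
    }

    public static void main(String[] args) {
        LocalDate expiry = LocalDate.of(2024, 5, 1);
        LocalDate paidOn = LocalDate.of(2024, 7, 1);
        System.out.println(renewedUntil(expiry));       // 2025-05-01
        System.out.println(lapsedDays(expiry, paidOn)); // 61
        System.out.println(annualFeeCents(100_000));    // 20000
    }
}
```

Under these assumptions, paying two months late does not move the end of the renewed period: it still ends 12 months after the previous expiry date.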

Q: If I buy support for one database to start, can I add additional ones at a later date?

Yes, of course. You only need to pay the price for the additional database when you need it.

Q: We have a need to create a parsing service. Can we use the components to develop a parser and then deploy the service to a web server like Tomcat for others to consume via API, or is the license specific to a specific user and a specific user’s machine?

General SQL Parser is licensed per user/developer. A developer license is needed for any user, developer, or machine that needs to access the API, even if a wrapper is created and the GSP API is not accessed directly.

Q: May I use GSP in more than one product?

Yes. There is no limit on the number of products GSP can be used in.

Q: What are the payment terms?

The full licensed version will be available to download from the official site within 2 working days after we receive the payment. You need to send the payment first.

Sales and reseller

Q: We are resellers. Can we purchase your products for our customers?

Yes, you can purchase software from our online shop on behalf of your customer. After purchasing, email us detailed information about your customer. The fully licensed version should be available to download from our site within 48 hours after we receive your order. There is no reseller discount for the first 50 licenses.

Q: Can we purchase via emailed PO?

Yes, our award-winning payment processor supports emailed POs.

Data lineage analysis from multiple SQL files

2020-12-31T00:00:00+00:00

To get an accurate data lineage analysis result, you may need to provide GSP (General SQL Parser) with the definitions of database objects such as tables, views, and procedures.

1. Parse a SQL file with ambiguous table/column relations

Take this SQL (file1.sql) for example:

CREATE VIEW test 
AS 
  (SELECT NAME, 
          address 
   FROM   manager, 
          employee 
   WHERE  manager.id = employee.id) 

Without more information, GSP cannot tell which table in the from clause the columns NAME and address in the select list belong to.

2. Provide the table definitions

File2.sql

Create table employee (id number, name varchar2(100), address varchar2(100));

File3.sql

Create table manager (id number, age varchar2(100), country varchar2(100));

If you provide these two SQL files with the table definitions to GSP, the columns NAME and address are correctly linked to the table employee.
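The effect of supplying table definitions can be illustrated with a small lookup. The sketch below is not GSP's implementation; the class, its methods, and the schema map are hypothetical and only model the idea that an unqualified column is matched against the columns of each table in the from clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not GSP code: resolves an unqualified column against known table definitions.
final class ColumnResolverSketch {
    /** Returns the tables of the from clause whose definition contains the column (case-insensitive). */
    static List<String> candidateTables(String column, List<String> fromTables,
                                        Map<String, Set<String>> schema) {
        List<String> owners = new ArrayList<>();
        for (String table : fromTables) {
            Set<String> columns = schema.get(table.toLowerCase(Locale.ROOT));
            if (columns != null && columns.contains(column.toLowerCase(Locale.ROOT))) {
                owners.add(table);
            }
        }
        return owners;
    }

    /** Column sets taken from the two CREATE TABLE statements above. */
    static Map<String, Set<String>> demoSchema() {
        Map<String, Set<String>> schema = new HashMap<>();
        schema.put("employee", new HashSet<>(Arrays.asList("id", "name", "address")));
        schema.put("manager", new HashSet<>(Arrays.asList("id", "age", "country")));
        return schema;
    }

    public static void main(String[] args) {
        List<String> from = Arrays.asList("manager", "employee");
        System.out.println(candidateTables("NAME", from, demoSchema()));    // [employee]
        System.out.println(candidateTables("address", from, demoSchema())); // [employee]
        // Without any definitions, no owner can be determined.
        System.out.println(candidateTables("NAME", from, new HashMap<String, Set<String>>())); // []
    }
}
```

With an empty schema the lookup returns no owner, which corresponds to the ambiguous case in step 1; with both definitions, NAME and address each resolve to exactly one table.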

3. How to provide multiple SQL files to GSP

In GSP, the gudusoft.gsqlparser.dlineage.DataFlowAnalyzer class does the actual work of data lineage analysis.

	public DataFlowAnalyzer(File sqlFile, EDbVendor dbVendor, boolean simpleOutput) {
		this.sqlFile = sqlFile;
		this.vendor = dbVendor;
		this.simpleOutput = simpleOutput;
	}

As you can see, the first parameter of the DataFlowAnalyzer constructor is a File, which can be a directory that contains all the SQL files to be processed.

You may also check the DataFlowAnalyzer demo under demos.lineage package shipped together with the GSP library to find out how to feed multiple SQL files.

4. Pulling all objects from a database (table, view, function, procedure, and trigger definitions)

When you pull all objects from a database (table, view, function, procedure, and trigger definitions), it is recommended to put the definition of each object in its own SQL file, especially for function, procedure, and trigger definitions. That way, a processing error in one SQL file does not affect the other SQL files.

The order of the SQL files in the directory doesn’t matter; GSP collects the information it needs from all of them.

SQL parse tree node and underlying tokens

2020-10-08T00:00:00+00:00

This document explains how to use the GSP library to parse an existing SQL script, modify the SQL parse tree, and rebuild the whole SQL statement using the TParseTreeNode.toString() method.

If you build a SQL parse tree from scratch (not from existing SQL) and then generate SQL text from that parse tree, TParseTreeNode.toScript() is the better choice.

1. TParseTreeNode setString()

Change the text of a SQL clause or the whole SQL statement.

Take this SQL for example:

SELECT *
FROM scott.employee
WHERE e.job_id = 1

We would like to change the condition in the where clause from e.job_id = 1 to e.salary > 1000.

The Java code below shows how to achieve this.

// Assumption: an Oracle parser instance, as in the later examples in this document.
TGSqlParser sqlparser = new TGSqlParser(EDbVendor.dbvoracle);
sqlparser.sqltext = "SELECT *\n" +
        "FROM scott.employee\n" +
        "WHERE e.job_id = 1";

sqlparser.parse();

TSelectSqlStatement select = (TSelectSqlStatement)sqlparser.sqlstatements.get(0);
TWhereClause whereClause = select.getWhereClause();

// Replace only the text of the condition; the rest of the statement is unchanged.
whereClause.getCondition().setString("e.salary > 1000");

System.out.println(select.toString());

After running the above Java code, the output is:

SELECT *
FROM scott.employee
WHERE e.salary > 1000

2. Remove a node

Calling the setXXX() method of the parent node with null as the input parameter removes the SQL clause from the parent node.

This Java code removes the where clause from the select statement.

sqlparser.sqltext = "SELECT * FROM TABLE_X where a>1 order by a";

sqlparser.parse();

TSelectSqlStatement select = (TSelectSqlStatement)sqlparser.sqlstatements.get(0);
select.setWhereClause(null);

System.out.println(select.toString());

Calling removeItem(int index) of TParseTreeNodeList removes an item from the node list.

This Java code removes column b from the order by clause.

sqlparser.sqltext = "SELECT * FROM TABLE_X order by a,b";

sqlparser.parse();

TSelectSqlStatement select = (TSelectSqlStatement)sqlparser.sqlstatements.get(0);
select.getOrderbyClause().getItems().removeItem(1);

System.out.println(select.toString());

3. Update a node

Use the node’s setString() method.

TGSqlParser parser = new TGSqlParser(EDbVendor.dbvoracle);
parser.sqltext = "SELECT A.COLUMN1, B.COLUMN2 from TABLE1 A, TABLE2 B where A.COLUMN1=B.COLUMN1";
parser.parse();
TSelectSqlStatement select = (TSelectSqlStatement)parser.sqlstatements.get(0);

select.getWhereClause().setString("where a>2");

System.out.println (select.toString());

4. Add a new node

Call the setXXX() method of the parent node and pass the new node as a parameter. To add a new node, you must know its parent node.

Take this SQL for example, for which TCustomSqlStatement.getWhereClause() returns null.

SELECT emp_id,salary+100 FROM emp

To add a where clause to this SQL, use the following Java code:

//create a new node
TWhereClause whereClause = new TWhereClause();
whereClause.setString("where a>2");

//link this new created node in the SELECT statement
select.setWhereClause(whereClause);

Then we get the new SQL:

SELECT emp_id,salary+100 FROM emp where a>2

Steps to add a new node:

Create a node with new TParseTreeNode(), then call TParseTreeNode.setString() to set the text of this node.
Call setXXX(TParseTreeNode) on the parent node and pass the newly created node as the parameter.

APIs available to modify the parse tree

TParseTreeNode.setString(String sqlSegment): updates the text of a node.
TParseTreeNodeList.removeItem(int index): all descendant classes of TParseTreeNodeList can use this method to remove a sub-node.
TCustomSqlStatement.setOutputClause(TOutputClause outputClause)
TCustomSqlStatement.setResultColumnList(TResultColumnList resultColumnList)
TCustomSqlStatement.setReturningClause(TReturningClause returningClause)
TCustomSqlStatement.setTargetTable(TTable targetTable)
TCustomSqlStatement.setTopClause(TTopClause topClause)
TCustomSqlStatement.setWhereClause(TWhereClause newWhereClause)
All TSelectSqlStatement.setXXX() methods.

Use the visitor pattern to search and modify nodes

A parse tree contains many nodes, and you may only need to modify nodes of a specific type. Using a visitor to search for and modify nodes of a specific type is very convenient.

To see how to search for a datatype, function, or SQL statement and modify it, see the Java demo.

SQL parse tree node and expression modification

2020-10-08T00:00:00+00:00

How to modify and rebuild an expression.

1. Remove a sub-node of an expression

After a sub-node of an expression is removed, the whole expression may be affected. Take this SQL for example:

d.cntrb_date1 >= '$From_Date$'

If either d.cntrb_date1 or '$From_Date$' is removed, the whole expression is removed as well.

The result of removing a sub-node depends on the kind of expression:

Math expression: +, -, *, / and other expressions with two operands. After one operand is removed, the other remains unchanged.
Logical expression: and, or. After one operand is removed, the other remains unchanged.
Comparison expression: <, >. After one operand is removed, the whole expression is removed.
in, between, () expression: after one operand is removed, the whole expression is removed.
Other kinds of expression: after one operand is removed, the whole expression is removed.

If removing a sub-node causes its parent expression to be removed as well, the same processing is applied recursively up to the top-level expression.
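The propagation rules above can be modeled in a few lines. The sketch below is a hypothetical, self-contained model and not the GSP implementation: ExprSketch, its Kind enum, and render() are invented names, and only binary expressions are covered.

```java
// Hypothetical model, not the GSP implementation: shows how removing an operand propagates upward.
final class ExprSketch {
    enum Kind { LEAF, ARITHMETIC, LOGICAL, COMPARISON }

    final Kind kind;
    final String text;            // operand text for a leaf, operator text otherwise
    final ExprSketch left, right;
    ExprSketch parent;
    boolean removed;

    private ExprSketch(Kind kind, String text, ExprSketch left, ExprSketch right) {
        this.kind = kind; this.text = text; this.left = left; this.right = right;
        if (left != null) left.parent = this;
        if (right != null) right.parent = this;
    }

    static ExprSketch leaf(String text) { return new ExprSketch(Kind.LEAF, text, null, null); }

    static ExprSketch binary(Kind kind, String op, ExprSketch l, ExprSketch r) {
        return new ExprSketch(kind, op, l, r);
    }

    void removeMe() {
        if (removed) return;
        removed = true;
        if (parent == null) return;
        ExprSketch sibling = (parent.left == this) ? parent.right : parent.left;
        boolean parentKeepsSibling = parent.kind == Kind.ARITHMETIC || parent.kind == Kind.LOGICAL;
        // Comparison expressions disappear with either operand; arithmetic and
        // logical expressions disappear only once both operands are gone.
        if (!parentKeepsSibling || sibling.removed) parent.removeMe();
    }

    /** Returns the remaining text, or null when the expression has been removed. */
    String render() {
        if (removed) return null;
        if (kind == Kind.LEAF) return text;
        String l = left.render(), r = right.render();
        if (l == null) return r;
        if (r == null) return l;
        return l + " " + text + " " + r;
    }

    /** Builds "a > 1 and d.cntrb_date1 >= '$From_Date$'" and removes the date operand. */
    static String demo() {
        ExprSketch date = leaf("'$From_Date$'");
        ExprSketch cmp = binary(Kind.COMPARISON, ">=", leaf("d.cntrb_date1"), date);
        ExprSketch root = binary(Kind.LOGICAL, "and",
                binary(Kind.COMPARISON, ">", leaf("a"), leaf("1")), cmp);
        date.removeMe();
        return root.render();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // a > 1
    }
}
```

Removing the date operand removes the comparison that contains it, while the enclosing and expression keeps its other operand.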

1.1 API

Call TExpression.removeMe() to remove the expression itself.

1.2 Using TParseTreeNodeList.removeItem(int index) to remove a sub-expression from an expression list

(1,2,3,4)

After calling

expressionList.removeItem(0);

The result is:

(2,3,4)

2. Modify the expression

d.cntrb_date1 >= '$From_Date$'

After setting '$From_Date$' to 1, the expression becomes

d.cntrb_date1 >= 1

The Java code to achieve this:

expression.getRightOperand().setString("1");
assertTrue(expression.toString().equalsIgnoreCase("d.cntrb_date1 >= 1"));

3. Add a new expression

If we would like to change

d.cntrb_date1 >= '$From_Date$'

to:

d.cntrb_date1 >= '$From_Date$' + 1

Here is the Java code:

expression.getRightOperand().setString(expression.getRightOperand().toString()+" + 1");
assertTrue(expression.toString().equalsIgnoreCase("d.cntrb_date1 >= '$From_Date$' + 1"));

Reference Java code

testExpression

public void testRemove1()
public void testRemoveExprList()

testModifyExpr testModifySql

SQL parse tree node and expression modification

2020-09-10T00:00:00+00:00

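How to modify expressions, and the results.

I. Remove a sub-node of an expression

After an expression is removed, the expression that encloses it may be affected. For example: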

d.cntrb_date1 >= '$From_Date$'

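In the comparison expression above, removing either d.cntrb_date1 or '$From_Date$' removes the whole expression as well.

The handling depends on the kind of expression and on its enclosing expression. The main cases are:

Math expressions: +, -, *, / and other math expressions with two operands. After one operand is removed, the other remains.
Logical expressions: and, or. After one operand is removed, the other remains.
Comparison expressions: <, > and so on. After one operand is removed, the whole expression is removed.
in, between, () expressions: after one operand is removed, the whole expression is removed.
Other expressions: after one operand is removed, the whole expression is removed.

After an expression is removed, if its parent expression is removed as a result, the higher-level expressions are processed recursively up to the top-level expression.

1. API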

TExpression.removeMe()

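2. Changes to related properties

If a node is removed, its properties are as follows:

getNodeStatus() returns ENodeStatus.nsRemoved
getStartToken() and getEndToken() return null
toString() returns null
expression.getExpressionType() == EExpressionType.removed_t

To check whether a node has been removed, test getNodeStatus() == ENodeStatus.nsRemoved.

If both the left operand and the right operand of an expression are removed, the expression itself is also in the removed state, and the properties above apply to it as well.

3. Use TParseTreeNodeList.removeItem(int index) to remove an expression from an expression list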

(1,2,3,4)

After calling

expressionList.removeItem(0);

the result is:

(2,3,4)

II. Modify an expression

d.cntrb_date1 >= '$From_Date$'

After setting '$From_Date$' to 1, the expression becomes

d.cntrb_date1 >= 1

The following code implements this change:

expression.getRightOperand().setString("1");
assertTrue(expression.toString().equalsIgnoreCase("d.cntrb_date1 >= 1"));

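III. Add an expression

Adding to an expression is usually done by modifying the existing expression. For example, to change

d.cntrb_date1 >= '$From_Date$'

to:

d.cntrb_date1 >= '$From_Date$' + 1

The following code implements this change: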

expression.getRightOperand().setString(expression.getRightOperand().toString()+" + 1");
assertTrue(expression.toString().equalsIgnoreCase("d.cntrb_date1 >= '$From_Date$' + 1"));

Reference code

In testExpression:

public void testRemove1()
public void testRemoveExprList()

testModifyExpr testModifySql

SQL parse tree node and underlying tokens

2020-09-09T00:00:00+00:00

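This article aims to help users understand how to output a new SQL statement by manipulating the AST of a SQL statement. Specifically, the SQL is output by calling TParseTreeNode.toString(), which concatenates the corresponding token list. With this approach, any SQL that GSP can parse can be output correctly. The typical use case is: parse a SQL statement, modify its AST, and output the new SQL statement.

GSP also provides TParseTreeNode.toScript() for outputting SQL. It converts each node in the AST to text according to the grammar and then concatenates the pieces into a complete SQL statement. Its main use case is building the AST of a SQL statement entirely from scratch with the GSP API and then generating SQL from that AST. TParseTreeNode.toScript() can also output SQL that GSP has parsed, but if text generation is not supported for some node in the AST, output of the whole SQL statement fails.

I. Relationship between SQL text, AST nodes, and tokens

To parse a SQL statement, GSP first uses a lexer to break the SQL text into a sequence of tokens; the parser then processes these tokens one by one and builds the syntax tree (AST). Each node in the AST corresponds to a portion of the SQL text and to a contiguous run of tokens.

SELECT emp_id,salary+100 FROM emp

The SQL above corresponds to the following token list:

[Figure: token list of the SELECT statement above]

Every node has a start token (startToken) and an end token (endToken). The tokens that make up a node run from startToken to endToken. All tokens in a node are linked as a doubly linked list.

public TSourceToken getStartToken()
public TSourceToken getEndToken()

Depending on the SQL grammar, a token can be the startToken of one or more nodes, and also the endToken of one or more nodes.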

public Stack<TParseTreeNode> getNodesStartFromThisToken()
public Stack<TParseTreeNode> getNodesEndWithThisToken()

Example 1, token: emp_id

Nodes that have it as startToken:

Node type:gudusoft.gsqlparser.nodes.TObjectName, 	Node text:emp_id
Node type:gudusoft.gsqlparser.nodes.TExpression, 	Node text:emp_id
Node type:gudusoft.gsqlparser.nodes.TResultColumn, 	Node text:emp_id

Nodes that have it as endToken:

Node type:gudusoft.gsqlparser.nodes.TObjectName, 	Node text:emp_id
Node type:gudusoft.gsqlparser.nodes.TExpression, 	Node text:emp_id
Node type:gudusoft.gsqlparser.nodes.TResultColumn, 	Node text:emp_id

Note that when a node consists of a single token, both its startToken and its endToken are that token.

Example 2, token: salary

Nodes that have it as startToken:

Node type:gudusoft.gsqlparser.nodes.TObjectName, 	Node text:salary
Node type:gudusoft.gsqlparser.nodes.TExpression, 	Node text:salary
Node type:gudusoft.gsqlparser.nodes.TExpression, 	Node text:salary+100
Node type:gudusoft.gsqlparser.nodes.TResultColumn, 	Node text:salary+100

Nodes that have it as endToken:

Node type:gudusoft.gsqlparser.nodes.TObjectName, 	Node text:salary
Node type:gudusoft.gsqlparser.nodes.TExpression, 	Node text:salary

Example 3, token: 100

Nodes that have it as startToken:

Node type:gudusoft.gsqlparser.nodes.TConstant, 	Node text:100
Node type:gudusoft.gsqlparser.nodes.TExpression, 	Node text:100

Nodes that have it as endToken:

Node type:gudusoft.gsqlparser.nodes.TConstant, 	Node text:100
Node type:gudusoft.gsqlparser.nodes.TExpression, 	Node text:100
Node type:gudusoft.gsqlparser.nodes.TExpression, 	Node text:salary+100
Node type:gudusoft.gsqlparser.nodes.TResultColumn, 	Node text:salary+100

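Example 4, token: emp

The parser parses SQL in LALR fashion, so all nodes that start at a given token are pushed onto the stack in the order they are created; that is, a child node is pushed before its parent. However, nodes that have this token as startToken or endToken may also be created in the stages before or after LALR parsing, so a node higher in the stack is not necessarily an ancestor of the nodes below it. To determine which of two nodes in the stack contains the other, compare the number of tokens they contain: the node with more tokens is the ancestor.

The emp token shows this behavior, especially for the nodes that have it as endToken.

Nodes that have it as startToken: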

Node type:gudusoft.gsqlparser.nodes.TObjectName, 	Node text:emp
Node type:gudusoft.gsqlparser.nodes.TFromTable, 	Node text:emp
Node type:gudusoft.gsqlparser.nodes.TTable, 	Node text:emp
Node type:gudusoft.gsqlparser.nodes.TJoin, 	Node text:emp

Nodes that have it as endToken:

Node type:gudusoft.gsqlparser.stmt.TSelectSqlStatement, 	Node text:SELECT emp_id,salary+100 FROM emp
Node type:gudusoft.gsqlparser.nodes.TObjectName, 	Node text:emp
Node type:gudusoft.gsqlparser.nodes.TFromTable, 	Node text:emp
Node type:gudusoft.gsqlparser.nodes.TSelectSqlNode, 	Node text:SELECT emp_id,salary+100 FROM emp
Node type:gudusoft.gsqlparser.nodes.TTable, 	Node text:emp
Node type:gudusoft.gsqlparser.nodes.TJoin, 	Node text:emp

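Numbering the entries of the endToken list above from 0: node0 (TSelectSqlStatement) is created during getrawsqlstatements(); node1, node2, and node3 (TObjectName, TFromTable, TSelectSqlNode) are created while the parser performs LALR parsing; node4 and node5 (TTable, TJoin) are created in the later semantic processing stage.

1. Nodes that are subclasses of TParseTreeNodeList

A node whose type is a subclass of TParseTreeNodeList maintains a list of nodes of the same type. Such a node does not hold a startToken and endToken of its own: its startToken is the startToken of the first node in its list, and its endToken is the endToken of the last node in its list.

Take TResultColumnList, a subclass of TParseTreeNodeList, as an example. TSelectSqlStatement.getResultColumnList() returns the emp_id,salary+100 part of the following SELECT statement.

SELECT emp_id,salary+100 FROM emp

So the startToken of this TResultColumnList is emp_id and its endToken is 100. Examples 1 and 3 also show that the nodes starting at emp_id do not include TResultColumnList, and neither do the nodes ending at 100.

Calling a node’s toString() method outputs text starting at startToken and walks every token until endToken. Therefore, when the GSP API is used to manipulate AST nodes, updating the node structure also updates the corresponding tokens, so that the node’s toString() outputs the correct text.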
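The relationship between a token chain, node spans, and toString() can be sketched with a few small classes. This is a simplified, hypothetical model (ChainToken, SpanNode, and TokenChainDemo are not GSP classes); it only mirrors the ideas described above: tokens form a doubly linked list, a node is a start/end pair, and a list node borrows its boundaries from its first and last elements.

```java
// Hypothetical model of tokens and nodes; the class names are not GSP classes.
final class ChainToken {
    final String text;
    ChainToken prev, next;
    ChainToken(String text) { this.text = text; }
}

final class SpanNode {
    final ChainToken startToken, endToken;

    SpanNode(ChainToken startToken, ChainToken endToken) {
        this.startToken = startToken;
        this.endToken = endToken;
    }

    /** A list node has no tokens of its own: it borrows them from its first and last element. */
    static SpanNode listOf(SpanNode... items) {
        return new SpanNode(items[0].startToken, items[items.length - 1].endToken);
    }

    /** Walks the chain from startToken to endToken and concatenates the token texts. */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (ChainToken t = startToken; t != null; t = t.next) {
            sb.append(t.text);
            if (t == endToken) break;
        }
        return sb.toString();
    }
}

final class TokenChainDemo {
    /** Links the given texts into a doubly linked token chain. */
    static ChainToken[] link(String... texts) {
        ChainToken[] tokens = new ChainToken[texts.length];
        for (int i = 0; i < texts.length; i++) {
            tokens[i] = new ChainToken(texts[i]);
            if (i > 0) {
                tokens[i - 1].next = tokens[i];
                tokens[i].prev = tokens[i - 1];
            }
        }
        return tokens;
    }

    static String resultColumnListText() {
        ChainToken[] t = link("SELECT", " ", "emp_id", ",", "salary", "+", "100", " ", "FROM", " ", "emp");
        SpanNode column1 = new SpanNode(t[2], t[2]); // emp_id
        SpanNode column2 = new SpanNode(t[4], t[6]); // salary+100
        return SpanNode.listOf(column1, column2).toString();
    }

    public static void main(String[] args) {
        System.out.println(resultColumnListText()); // emp_id,salary+100
    }
}
```

The comma between the two columns belongs to neither column node, yet it appears in the list's text because the walk covers every token between the borrowed start and end tokens.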

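2. Relationship between a node and its child nodes

A syntax tree (AST) contains multiple nodes, and a node can in turn contain multiple child nodes, so a top-level node is itself a syntax tree, for example TSelectSqlStatement.

The same kind of SQL statement in different databases, for example the SELECT statement, is represented in GSP by the same node type, TSelectSqlStatement. Its child nodes may differ from database to database; for example, Oracle has no TTopClause child node. A visitor accesses TSelectSqlStatement nodes in the same way regardless of which database’s SELECT statement they represent.

A node can contain multiple child nodes. The token lists of sibling child nodes do not overlap. A node’s token list contains the token lists of all its child nodes and may also contain auxiliary tokens that belong only to the node itself; for example, TSelectSqlStatement has the SELECT token, which does not belong to any child node.

A node’s startToken and endToken may coincide with those of a child node. There are three cases:

The node and the child node share both startToken and endToken.
The node and the child node share the startToken, but the node’s endToken comes after the child node’s endToken.
The node and the child node share the endToken, but the node’s startToken comes before the child node’s startToken.

Therefore, when a node’s startToken or endToken changes, the nodes that share those tokens must also update their startToken and endToken references.

II. How GSP keeps AST nodes and tokens in sync

Take this SQL expression: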

fx(2)+1

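Its token list:

[Figure: token list of fx(2)+1]

Its node relationship diagram:

[Figure: node tree of fx(2)+1]

The diagram shows that fx is the startToken of TObjectName, TFunctionCall, TExpression (3), and TExpression (4). When we use TFunctionCall.setString("gx(2)") to change fx(2) to gx(2), the startToken of the child node TFunctionCall becomes gx. Without synchronization, the startToken of the other three nodes would still point to fx, which is wrong: calling TExpression (3).toString() would return fx(2) instead of the updated gx(2).

Note that calling toString() from a higher-level node still produces the correct output. For example:

WHERE fx(2)+1>1

After TFunctionCall.setString("gx(2)") changes fx(2) to gx(2), TWhereClause.toString() still outputs the correct result, because:

TFunctionCall.setString() does not affect the startToken of TWhereClause, which is still WHERE.
During TFunctionCall.setString(), gx replaces fx in the token list of TWhereClause.

Next, we discuss which data structures GSP provides to keep nodes and the token list in sync while the AST is modified, and how that synchronization is maintained when the API is used to manipulate the AST.

1. Data structures that keep AST nodes and tokens in sync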

//  TParseTreeNode:

public TSourceToken getStartToken()
public TSourceToken getEndToken()
public void setStartToken(TSourceToken newStartToken)
public void setEndToken(TSourceToken newEndToken) 

public void removeTokens() // removes all tokens of this node from the linked list and keeps the state of the node, startToken, and endToken accurate
public void appendNewNode(TParseTreeNode newNode, boolean needCommaBefore)
public void replaceWithNewNode(TParseTreeNode newNode)
public void setText(String nodeText)

public void setNewSubNode( TParseTreeNode oldSubNode, TParseTreeNode newSubNode,TParseTreeNode anchorNode)

public void setAnchorNode(TParseTreeNode anchorNode)

public ENodeStatus getNodeStatus()

Setting a node’s startToken and endToken

public void setStartToken(TSourceToken newStartToken)

Sets the node’s startToken. If the node already has a startToken and the node is at the top of the NodesStartFromThisToken stack maintained by that token, the node is popped from the old token’s NodesStartFromThisToken stack. Then the new token’s NodesStartFromThisToken stack is checked; if the node is not in it, the node is pushed.

As this shows, setting a node’s startToken means maintaining the two-way relationship between the node and the token.

public void setEndToken(TSourceToken newEndToken)

setEndToken() follows the same logic as setStartToken().

// TSourceToken:

public Stack<TParseTreeNode> getNodesStartFromThisToken()
public Stack<TParseTreeNode> getNodesEndWithThisToken()

// In the doubly linked list, the following methods add a token to the chain or remove it from the chain.
public TSourceToken getNextTokenInChain()
public void setNextTokenInChain(TSourceToken nextTokenInChain)
public TSourceToken getPrevTokenInChain()
public void setPrevTokenInChain(TSourceToken prevTokenInChain)

public void removeFromChain()

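After the API is used to manipulate the AST, the data structures and methods above synchronize the nodes and the token list. They provide three main functions:

Insert one or more tokens at a given position in the doubly linked token list, or update or remove tokens in the list.
After a token is updated or removed, update the nodes that use it as startToken or endToken.
After a node is updated or removed, update the status of the node and its child nodes, so that later operations know the state of these nodes and can handle them appropriately.

Initial linking of tokens

When TGSqlParser parses a SQL statement, all tokens are first linked into the doubly linked list in dosqltexttotokenlist().

2. Operations on the AST through the API

2.1 TParseTreeNode setString()

When text is set on a node, GSP converts the text into tokens and replaces the node’s original tokens in the AST token list with the new tokens. The structure of the node and its child nodes does not change.

The nodeStatus of the node and its child nodes is updated to nsPartitial or nsDetached.
The startToken and endToken of parent nodes that point to the same startToken or endToken as this node are updated.
In the AST token list, the node’s original tokens are replaced with the new tokens.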
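The synchronization performed by setString() can be sketched as follows. This is a hypothetical model, not GSP source code: SyncTok and SyncNode are invented names, the new text is passed in already split into tokens (GSP runs its lexer instead), and node status handling is left out.

```java
// Hypothetical model of the token/node synchronization; not GSP source code.
final class SyncTok {
    final String text;
    SyncTok prev, next;
    SyncTok(String text) { this.text = text; }
}

final class SyncNode {
    SyncTok start, end;
    final SyncNode parent;

    SyncNode(SyncTok start, SyncTok end, SyncNode parent) {
        this.start = start; this.end = end; this.parent = parent;
    }

    /** Replaces this node's tokens and re-points every ancestor that shared its boundaries. */
    void setString(String... newTexts) {
        if (newTexts.length == 0) return; // empty text has no effect
        SyncTok first = null, last = null;
        for (String s : newTexts) {       // build the replacement chain
            SyncTok t = new SyncTok(s);
            if (first == null) first = t; else { last.next = t; t.prev = last; }
            last = t;
        }
        SyncTok oldStart = start, oldEnd = end;
        first.prev = oldStart.prev;       // splice the new chain into the list
        if (oldStart.prev != null) oldStart.prev.next = first;
        last.next = oldEnd.next;
        if (oldEnd.next != null) oldEnd.next.prev = last;
        for (SyncNode n = this; n != null; n = n.parent) {
            if (n.start == oldStart) n.start = first; // shared startToken
            if (n.end == oldEnd) n.end = last;        // shared endToken
        }
    }

    String text() {
        StringBuilder sb = new StringBuilder();
        for (SyncTok t = start; t != null; t = t.next) {
            sb.append(t.text);
            if (t == end) break;
        }
        return sb.toString();
    }

    /** fx(2)+1: the function call shares its start token with the enclosing expression. */
    static String demo() {
        String[] texts = {"fx", "(", "2", ")", "+", "1"};
        SyncTok[] t = new SyncTok[texts.length];
        for (int i = 0; i < texts.length; i++) {
            t[i] = new SyncTok(texts[i]);
            if (i > 0) { t[i - 1].next = t[i]; t[i].prev = t[i - 1]; }
        }
        SyncNode expr = new SyncNode(t[0], t[5], null);
        SyncNode call = new SyncNode(t[0], t[3], expr);
        call.setString("gx", "(", "2", ")");
        return expr.text();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // gx(2)+1
    }
}
```

If the loop over ancestors were omitted, the enclosing expression would still start at the old fx token and would print stale text, which is the problem described for TExpression (3) above.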

SELECT *
FROM scott.employee
WHERE e.job_id = 1

sqlparser.sqltext = "SELECT *\n" +
        "FROM scott.employee\n" +
        "WHERE e.job_id = 1";
sqlparser.parse();
TSelectSqlStatement select = (TSelectSqlStatement)sqlparser.sqlstatements.get(0);
TWhereClause whereClause = select.getWhereClause();
whereClause.getCondition().setString("e.salary > 1000");
System.out.println(select.toString());

After running the Java code above, the SQL statement is:

SELECT *
FROM scott.employee
WHERE e.salary > 1000

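To remove the node, call setXXX(null) on the parent node. Passing an empty string (length 0) to setString() has no effect.

Because lexical rules differ between databases, converting text into tokens requires knowing which database is in use. To avoid specifying the database on every setString() call, GSP introduces a static global variable, TGSqlParser.currentDBVendor. It is set whenever a new TGSqlParser instance is created, so its value always matches the database of the most recently created TGSqlParser instance. To change the lexical rules used by the next setString() call, change this value. This design may cause problems in a multithreaded environment.

2.2 Remove a node

Method 1: Call the parent node’s setXXX() method and pass null to remove the node from its parent. Internally, GSP does the following:

Calls TParseTreeNode.removeTokens() to remove the corresponding tokens from the AST token list.
To keep the SQL generated by toString() syntactically correct, some auxiliary tokens before or after the node may also need to be removed, especially when a TParseTreeNodeList removes one of its elements.
Sets the parent node’s reference to this node to null.

Method 2: To remove an element from a node that is a subclass of TParseTreeNodeList, call removeItem(int index), which automatically calls removeAndSyncTokens(int index). If the removed element is the first node in the list and is followed by a comma token, that comma token is removed with it. If the removed element is not the first node in the list and is preceded by a comma token, that comma token is removed with it.

Take this SQL as an example:

SELECT e.emp_id,e.fname,e.lname,j.job_desc
FROM scott.employee AS e,jobs AS j

To remove e.emp_id from the select list, the comma after e.emp_id must be removed as well. To remove j.job_desc, the comma before j.job_desc must be removed as well.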
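The comma rule for removing a list element can be sketched on a flat token list. The code below is a hypothetical illustration, not GSP's removeAndSyncTokens(): it assumes the tokens strictly alternate between items and commas, with no whitespace tokens in between.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the comma rule; assumes tokens alternate item, comma, item, ...
final class CommaListSketch {
    /** Removes the item at the given index together with its adjacent comma. */
    static List<String> removeItem(List<String> tokens, int index) {
        List<String> out = new ArrayList<>(tokens);
        int pos = 2 * index; // item i sits at position 2*i
        if (pos < 0 || pos >= out.size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        out.remove(pos);                       // the item itself
        if (index == 0) {
            if (!out.isEmpty()) out.remove(0); // first item: drop the comma after it
        } else {
            out.remove(pos - 1);               // other items: drop the comma before it
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList(
                "e.emp_id", ",", "e.fname", ",", "e.lname", ",", "j.job_desc");
        System.out.println(String.join("", removeItem(columns, 0))); // e.fname,e.lname,j.job_desc
        System.out.println(String.join("", removeItem(columns, 3))); // e.emp_id,e.fname,e.lname
    }
}
```

Either way the remaining list stays syntactically valid: no leading, trailing, or doubled comma is left behind.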

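2.3 Update a node

Call the parent node’s setXXX() method to set a new node. In this case it is generally recommended to use the existing node’s setString() method instead: the effect is the same and it is more efficient.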

TGSqlParser parser = new TGSqlParser(EDbVendor.dbvoracle);
parser.sqltext = "SELECT A.COLUMN1, B.COLUMN2 from TABLE1 A, TABLE2 B where A.COLUMN1=B.COLUMN1";
parser.parse();
TSelectSqlStatement select = (TSelectSqlStatement)parser.sqlstatements.get(0);

//create a new node
TWhereClause whereClause = new TWhereClause();
whereClause.setText("where a>2");

//replace with the new created node
select.setWhereClause(whereClause);

System.out.println (select.toString());

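2.4 Add a new node

When the parent node’s reference to the child node is null, add the new node by calling the corresponding setXXX() method on the parent node.

For example, for this SELECT statement, TCustomSqlStatement.getWhereClause() returns null.

SELECT emp_id,salary+100 FROM emp

To add a where clause to this statement, we can do this: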

//create a new node
TWhereClause whereClause = new TWhereClause();
whereClause.setText("where a>2");

//link this new created node in the SELECT statement
select.setWhereClause(whereClause);

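This gives the new SELECT statement:

SELECT emp_id,salary+100 FROM emp where a>2

Adding a node generally involves the following steps:

Create the node with new TParseTreeNode(), then call TParseTreeNode.setString() to set the text of the node.
Call setXXX(TParseTreeNode) on the parent node and pass the new node as the parameter.

Determining the insertion position

Internally, GSP has to find the right position in the AST token list to insert the node’s tokens. For the SQL above, it has to find the emp token and insert the new tokens after it.

Take TWhereClause as an example. In TCustomSqlStatement:

public void setWhereClause(TWhereClause newWhereClause)

With this method, the default insertion point is the last token of the parent node, as in the SQL example above. Sometimes this assumption produces an incorrect result. Consider this statement:

SELECT emp_id,salary+100 FROM emp order by 1

Using setWhereClause(TWhereClause newWhereClause) as before produces the following incorrect SQL statement.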

SELECT emp_id,salary+100 FROM emp order by 1 where a>2

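Because SQL is so flexible, GSP cannot determine by itself where to insert the new node’s tokens. TParseTreeNode therefore provides the following method so that the caller decides the insertion position.

public void setAnchorNode(TParseTreeNode anchorNode)

anchorNode is a node at the same level as the new node that already exists in the AST. For the SQL in the example above, we can do this: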

select.setAnchorNode(select.joins);
select.setWhereClause(whereClause);

The where clause is then inserted after the anchor node joins (that is, the from clause), giving the following correct result:

SELECT emp_id,salary+100 FROM emp where a>2 order by 1
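The difference between the default insertion point and an anchor can be shown with a simplified model in which a statement is just an ordered list of clause texts. The class and method below are hypothetical and do not use the GSP API; they only contrast "append at the end" with "insert after the anchor".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: a statement as an ordered list of clause texts.
final class AnchorInsertSketch {
    /** Inserts newClause after anchor; with no (or an unknown) anchor it is appended at the end. */
    static String insertClause(List<String> clauses, String anchor, String newClause) {
        List<String> out = new ArrayList<>(clauses);
        int at = (anchor == null) ? -1 : out.indexOf(anchor);
        if (at < 0) {
            out.add(newClause);         // default: after the last token of the parent
        } else {
            out.add(at + 1, newClause); // directly after the anchor clause
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        List<String> select = Arrays.asList("SELECT emp_id,salary+100", "FROM emp", "order by 1");
        // Without an anchor the where clause lands after order by, which is invalid SQL.
        System.out.println(insertClause(select, null, "where a>2"));
        // With the from clause as anchor the where clause lands in the right place.
        System.out.println(insertClause(select, "FROM emp", "where a>2"));
    }
}
```

The second call produces SELECT emp_id,salary+100 FROM emp where a>2 order by 1, matching the corrected statement above.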

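Adding auxiliary tokens that may be needed

Auxiliary tokens may need to be added to keep the SQL syntactically correct. (Not yet implemented.)
When TParseTreeNodeList.addElement(T ptn) inserts a child node, it handles the required auxiliary tokens in a uniform way.

Take this SQL as an example:

SELECT e.emp_id,e.fname,e.lname
FROM scott.employee AS e,jobs AS j

When j.job_desc is added after e.lname, a comma must be inserted before j.job_desc to keep the syntax correct.

3. APIs currently implemented in GSP for manipulating the AST

Manipulating the AST means adding, removing, and updating nodes (updating the node itself or updating the node’s text).

The GSP API fully supports removing and updating nodes. Adding nodes requires a separate setXXX() method for each node type, so support is being added step by step. The following methods are currently implemented:

TParseTreeNode.setString(String sqlSegment): updates the text of a node.
TParseTreeNodeList.removeItem(int index): removes a node.
TCustomSqlStatement.setWhereClause()
All TSelectSqlStatement.setXXX() methods

Using a visitor to access and modify nodes

Using a visitor is an efficient way to find nodes of a given type. When a visitor traverses the whole syntax tree and modifies nodes, keep the following points in mind:

Minimal change: if a specific child node can be modified, do not modify the whole parent node. Modifications to sibling nodes do not affect each other.
After a node is modified with setString(), the node and all of its child nodes are no longer in the ENodeStatus.nsNormal state, that is, they no longer belong to the syntax tree. Subsequent changes to these child nodes have no effect and are not reflected in the syntax tree.
Processing nodes in the visitor’s postVisit() guarantees that child nodes are processed first.
A visitor can traverse the same node several times to process different child nodes, depending on business needs. The nodes being processed must be in the ENodeStatus.nsNormal state; otherwise the changes are not reflected in the final toString() result.

This approach can be used to convert a TSelectSqlStatement tree that represents an Oracle SELECT statement into one that represents a SQL Server SELECT statement. After the conversion, toString() outputs a SELECT statement that conforms to SQL Server syntax.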
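The ordering guarantee of postVisit() is a general property of depth-first visitors and can be shown without the GSP classes. The sketch below uses hypothetical VNode and VVisitor types, not TParseTreeVisitor; it records the order in which postVisit() sees the nodes of the expression salary+100.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tree and visitor types; they only mirror the preVisit/postVisit idea.
final class VNode {
    final String label;
    final List<VNode> children = new ArrayList<>();

    VNode(String label, VNode... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }

    void accept(VVisitor visitor) {
        visitor.preVisit(this);                           // before any child
        for (VNode child : children) child.accept(visitor);
        visitor.postVisit(this);                          // after all children
    }
}

interface VVisitor {
    void preVisit(VNode node);
    void postVisit(VNode node);
}

final class VisitOrderDemo {
    static VNode sampleTree() {
        return new VNode("salary+100", new VNode("salary"), new VNode("100"));
    }

    /** Returns the node labels in the order postVisit() sees them. */
    static List<String> postOrder(VNode root) {
        final List<String> seen = new ArrayList<>();
        root.accept(new VVisitor() {
            public void preVisit(VNode node) { }
            public void postVisit(VNode node) { seen.add(node.label); }
        });
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(postOrder(sampleTree())); // [salary, 100, salary+100]
    }
}
```

Because children are reported before their parent, a modification made in postVisit() can still touch a child while it belongs to the tree, before any change to the parent detaches it.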

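Output the SQL statement after modifying the AST: Node toString()

When nodes are added, removed, or changed, their token lists are already synchronized into the whole AST, so outputting the text of the whole SQL statement only requires walking from startToken to endToken.

Node toScript()

As described above, when toString() is used to generate SQL text from the syntax tree, any change to a node in the tree must be synchronized with the underlying tokens.

When toScript() is used to generate SQL text from the syntax tree, changes to nodes do not need to be synchronized with the underlying tokens, but the text of every node in the statement has to be regenerated according to the grammar, even for nodes that did not change in this operation. Because GSP cannot currently regenerate text for all SQL syntax, this can easily produce incorrect SQL text.

The advantage of toScript() is that changes to nodes in the syntax tree do not require updating the underlying tokens, especially auxiliary tokens.

Basic token information

1. Token type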

public ETokenType tokentype

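The main types are:

ttkeyword: SELECT, FROM, WHERE, ORDER, and BY in the examples above.
ttidentifier: e, emp_id, and so on in the examples above.
ttwhitespace: spaces and tabs.
ttreturn: line breaks.
Symbols: ttperiod, ttcomma, and so on.
ttsimplecomment, ttbracketedcomment: comments.

2. Token code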

public int tokencode

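The code is the token’s numeric identifier. Each ttkeyword token has its own unique code. All ttidentifier tokens share the same code, 264. The code of a symbol is its ASCII value.

3. Token text

The text of the token.

Node

A node represents an element of the SQL grammar, for example:

A database object name such as e.emp_id, which consists of three tokens: e, ., and emp_id.
A clause, such as the where clause WHERE e.job_id = j.job_id, which contains ttkeyword, ttwhitespace, symbol, ttidentifier, and other tokens.
A statement, such as SELECT, which contains various SQL clauses.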

Iterator interface implemented in TParseTreeNode and Iterable interface implemented in TParseTreeNodeList

2020-08-20T00:00:00+00:00

The Iterator interface implemented in TParseTreeNode is used to iterate over all source tokens of the parse tree node.

public abstract class TParseTreeNode implements Visitable,Iterator<TSourceToken>

The Iterable interface implemented in TParseTreeNodeList is used to iterate over all parse tree nodes included in the list.

public class TParseTreeNodeList<T extends TParseTreeNode> extends TParseTreeNode implements Iterable<T> 

The Iterable interface implemented in TStatementList is used to iterate over all SQL statements included in the list.

public class TStatementList extends TParseTreeNode implements Iterable<TCustomSqlStatement> 

Let’s take this SQL for example:

SELECT e.employee_id,
       e.last_name,
       e.department_id
FROM   employees e,
       departments d
;

SELECT e.employee_id,
       e.last_name,
       e.department_id
FROM   employees e
       JOIN departments d
         ON e.department_id = d.department_id 

Print the type of every SQL statement:

for (TCustomSqlStatement sqlStatement:sqlparser.sqlstatements) {
	System.out.println(sqlStatement.sqlstatementtype);
}

Because TParseTreeNodeList is a subclass of TParseTreeNode, it supports both the Iterable and the Iterator interface. Be aware that the Iterator interface returns the source tokens that belong to this node, like this:

while(sqlStatement.tables.hasNext()){
	System.out.println(sqlStatement.tables.next().toString());
}

The Iterable interface, by contrast, returns the parse tree nodes in the list:

for(TTable table:sqlStatement.tables){
	System.out.println(table.getTableType());
}
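The difference between the two interfaces can be reproduced in a self-contained model. This is a sketch: DualListSketch and IterationSketch are hypothetical classes, not GSP's, and plain strings stand in for both source tokens and list items. The same object is an Iterator over its tokens and an Iterable over its items.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical list node: an Iterator over its token texts and an
// Iterable over its items at the same time; not a GSP class.
class DualListSketch implements Iterator<String>, Iterable<String> {
    private final List<String> tokens;
    private final List<String> items;
    private int cursor = 0; // position of the token cursor

    DualListSketch(List<String> tokens, List<String> items) {
        this.tokens = tokens;
        this.items = items;
    }

    // Iterator side: walks the source tokens of the whole list.
    @Override
    public boolean hasNext() {
        return cursor < tokens.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return tokens.get(cursor++);
    }

    // Iterable side: hands out a fresh iterator over the list items.
    @Override
    public Iterator<String> iterator() {
        return items.iterator();
    }
}

class IterationSketch {
    static DualListSketch tables() {
        return new DualListSketch(
                Arrays.asList("employees", " ", "e", ",", " ", "departments", " ", "d"),
                Arrays.asList("employees e", "departments d"));
    }

    // Uses the Iterator side, as in the while loop above.
    static int countTokens(DualListSketch list) {
        int count = 0;
        while (list.hasNext()) {
            list.next();
            count++;
        }
        return count;
    }

    // Uses the Iterable side, as in the for-each loop above.
    static int countItems(DualListSketch list) {
        int count = 0;
        for (String item : list) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTokens(tables()) + " tokens, " + countItems(tables()) + " items");
    }
}
```

In this model the token cursor is single-use: once hasNext() returns false the tokens cannot be walked again, while the for-each loop can be repeated. Whether GSP resets its token cursor is not covered here.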

General SQL Parser and SQLFrog

2020-07-15T00:00:00+00:00

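How GSP is used in the SQLFrog project

The two working modes of SQLFrog

scan mode: only finds the SQL syntax and semantics that need conversion and reports them, without converting anything.
convert mode: finds the SQL syntax and semantics that need conversion and converts them.

scan is the default mode.

The relationship between SQLFrog and GSP

SQLFrog is built on GSP's parsing capability, its visitor pattern, its syntax tree modification, and its generation of SQL text from the syntax tree.

Using GSP's visitor pattern

Once a visitor for a given node type is applied to a top-level SQL statement, all nodes of that type in the statement can be visited quickly and efficiently.

The following code visits all nodes of type TObjectName. A single visitor can handle several node types at once, depending on the actual requirements.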

int ret = sqlparser.parse();
if (ret == 0){
    objectNameVisitor objectNameVisitor = new objectNameVisitor();
    for(int i=0;i<sqlparser.sqlstatements.size();i++){
        TCustomSqlStatement sqlStatement = sqlparser.sqlstatements.get(i);
        sqlStatement.acceptChildren(objectNameVisitor);
    }
}

class objectNameVisitor extends TParseTreeVisitor {
    public void preVisit(TObjectName node){
    }
}

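GSP's visitor may miss some nodes during deep traversal. If you run into this during development, please report it promptly.

Using a visitor to check datatypes in SQL statements

For example, when converting SQL from Netezza to Snowflake, we need to check whether the datatypes are compatible. When a CREATE TABLE statement uses the ST_GEOMETRY datatype, we need to flag that this datatype must be converted to Snowflake's VARBINARY.

A datatype visitor makes this easy to implement.

class datatypeVisitor extends TParseTreeVisitor {
    public void preVisit(TTypeName node){
        // add the datatype check here
    }
}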
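The check that goes inside such a visitor could look like the following sketch. DatatypeCheckSketch is a hypothetical helper that works on plain datatype names rather than on TTypeName nodes, and its replacement table contains only the single pair mentioned above; any further entries would have to be supplied by the reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical datatype check; not part of GSP or SQLFrog.
class DatatypeCheckSketch {
    // Netezza datatype -> Snowflake replacement. Only the pair named in the
    // text is listed; a real table would need many more entries.
    static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("ST_GEOMETRY", "VARBINARY");
    }

    // Returns one report line per datatype name that needs conversion.
    static List<String> check(List<String> datatypeNames) {
        List<String> report = new ArrayList<>();
        for (String name : datatypeNames) {
            String target = REPLACEMENTS.get(name.toUpperCase(Locale.ROOT));
            if (target != null) {
                report.add(name + " -> " + target);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Names as they might be collected from a CREATE TABLE statement.
        System.out.println(check(Arrays.asList("INTEGER", "st_geometry", "VARCHAR")));
    }
}
```

In scan mode the report lines would simply be listed; in convert mode the same lookup would decide which datatype nodes to rewrite.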

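SQL functions can be checked in a similar way.

Using a visitor together with GSP's syntax tree modification to convert SQL

Once the syntax or semantic points that need conversion are found, they are converted by modifying the SQL syntax tree that GSP generates. GSP provides two ways to generate SQL text from the syntax tree: toString() and toScript().

toString()

When toString() is used to generate SQL text from the syntax tree, any change to a node in the tree must be kept in sync with the underlying tokens. SQLFrog uses this approach.

toScript()

When toScript() is used to generate SQL text from the syntax tree, changes to nodes do not need to be synchronized with the underlying tokens. However, the text of every node in the statement is regenerated from the syntax tree, even for nodes that did not change in this operation. Because GSP cannot yet regenerate text for all SQL syntax, this can easily produce incorrect SQL text.

See this article for a detailed explanation. A further document is still needed to explain how toString() and toScript() work.

Visitor-related code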

https://github.com/sqlparser/gsp_demo_java/tree/master/src/main/java/demos/visitors

SQL Function and TFunctionCall

2020-07-13T00:00:00+00:00

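In GSP, a SQL function is represented by the TFunctionCall class. All functions are represented by this class.

Basic information in TFunctionCall

The general syntax of a SQL function is:

funcName(arg1,arg2)

The corresponding property values in TFunctionCall are: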

getFunctionName() = funcName
getArgs().size() = 2
getArgs().getExpression(0) = arg1
getArgs().getExpression(1) = arg2
getFunctionType() = unknown_t

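The function types reported by getFunctionType() are currently incomplete, so identifying a function by its type is not always accurate; use it with care.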
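To make the relation between the function text and the name and argument properties concrete, here is a small text-level sketch. FunctionTextSketch is hypothetical and is not how GSP parses functions: it only splits a regular call such as funcName(arg1,arg2) at top-level commas, and it does not handle string literals or irregular arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical text-level model of a regular function call; not a GSP class.
class FunctionTextSketch {
    final String name;
    final List<String> args = new ArrayList<>();

    FunctionTextSketch(String text) {
        int open = text.indexOf('(');
        int close = text.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("not a function call: " + text);
        }
        name = text.substring(0, open).trim();
        String inner = text.substring(open + 1, close);

        // Split at commas that are not nested inside parentheses.
        int depth = 0;
        int start = 0;
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                args.add(inner.substring(start, i).trim());
                start = i + 1;
            }
        }
        String last = inner.substring(start).trim();
        if (!last.isEmpty()) {
            args.add(last);
        }
    }

    public static void main(String[] args) {
        FunctionTextSketch call = new FunctionTextSketch("funcName(arg1,arg2)");
        System.out.println(call.name + " has " + call.args.size() + " arguments: " + call.args);
    }
}
```

A nested call such as f(g(a,b),c) yields two arguments in this model, because the comma inside g(...) is not at the top level.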

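Functions with irregular arguments

Normally, the arguments of a function are obtained through TExpressionList getArgs(). Such arguments look like this:

funcName(arg1,arg2,arg3)

Here arg1, arg2, and arg3 are all expressions of type TExpression.

However, the arguments of some functions cannot be represented by expressions alone, for example the cast function:

SELECT CAST(ytd_sales AS CHAR(5)) FROM titles

Besides ytd_sales, which can be represented by a TExpression, there are the AS keyword and the CHAR(5) datatype, so the arguments of the cast function cannot be obtained through TExpressionList getArgs(). Its arguments are instead: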

getExpr1() = ytd_sales
getTypename() = CHAR(5)

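The syntactic variety of function arguments leads to inconsistent APIs for retrieving them. The API may need to be improved in the future so that arguments are retrieved in a uniform way.

Other functions with irregular arguments will be documented separately.

Determine whether a function is a built-in function of a database

Determines whether this function is a built-in function of the given database. EDbVendor specifies the database vendor, for example db2 or oracle.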
```
public boolean isBuiltIn(EDbVendor pDBVendor)
```

Static method; the function name must be specified. It behaves the same as the instance method above.

public static boolean isBuiltIn(String pName, EDbVendor pDBVendor)

aggregate function

1. ALL | DISTINCT

getAggregateType() tells whether ALL or DISTINCT is used in the aggregate function.

2. WITHIN GROUP

This information is not yet available in TFunctionCall.

window function

FUNCTION_NAME(expr) OVER {window_name | (window_specification)}

In Oracle and SQL Server, window_specification is also called window_clause.

In GSP, TWindowDef represents the window_specification. In TFunctionCall, use the following method to get it:

public TWindowDef getWindowDef()

Take this SQL with a window function as an example:

SELECT NUM, ODD,
CUME_DIST( ) OVER(PARTITION BY ODD ORDER BY NUM) cumedist
FROM test4

GSP's output is:

sstselect
--> function: CUME_DIST, type:unknown_t
	window_specification
		Parition value: ODD
		Order by clause: NUM

CASE FUNCTION

Represented by TCaseExpression.

search SQL Function

SQL Datatype and TTypeName

2020-07-06T00:00:00+00:00

SQL datatypes are represented in GSP by the class TTypeName.

Kinds of SQL datatype

TTypeName represents a datatype in SQL, for example:

char(10),
int,
float(24),
decimal(8,2)

Basic properties

Taking decimal(8,2) as an example, the basic property values of TTypeName are:

getDataType() = decimal_t
toString() = decimal(8,2)
getDataTypeName() = decimal

Extended properties

Extended properties apply only to certain datatypes.

getLength()

Taking char(10) as an example:

getLength() = 10

getPrecision(), getScale()

Taking decimal(8,2) as an example:

getPrecision() = 8
getScale() = 2
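The relation between the datatype text and these values can be sketched at the text level. TypeTextSketch is a hypothetical helper, not a GSP class, and the regular expression is an assumption that covers only simple forms such as int, char(10), and decimal(8,2); it is not how GSP computes these properties.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical text-level model of a simple datatype; not a GSP class.
class TypeTextSketch {
    // A name, then an optional "(n)" or "(n,m)" suffix, e.g. decimal(8,2).
    private static final Pattern TYPE = Pattern.compile(
            "\\s*(\\w+)\\s*(?:\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\))?\\s*");

    final String name;
    final int first;  // length or precision, -1 when absent
    final int second; // scale, -1 when absent

    TypeTextSketch(String text) {
        Matcher m = TYPE.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a simple datatype: " + text);
        }
        name = m.group(1);
        first = m.group(2) == null ? -1 : Integer.parseInt(m.group(2));
        second = m.group(3) == null ? -1 : Integer.parseInt(m.group(3));
    }

    public static void main(String[] args) {
        TypeTextSketch dec = new TypeTextSketch("decimal(8,2)");
        System.out.println(dec.name + " precision=" + dec.first + " scale=" + dec.second);
        TypeTextSketch chr = new TypeTextSketch("char(10)");
        System.out.println(chr.name + " length=" + chr.first);
    }
}
```

The single number in char(10) plays the role of a length and the two numbers in decimal(8,2) play the roles of precision and scale, which is why these properties are described as applying only to certain datatypes.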

References

For the detailed list of SQL2003 datatypes, see “SQL in a Nutshell, 3rd Edition”, p. 30, Table 2-8, SQL2003 categories and datatypes.

search SQL Datatype

Full list of EDataType

package gudusoft.gsqlparser;

/**
* @since v1.4.3.0
*/

public enum EDataType {
    unknown_t,
    /**
     * user defined datatype
     */
    generic_t,
    bfile_t,
    /**
     * ansi2003: bigint
     * postgresql
     */
    bigint_t,
    /**
     * ansi2003: blob
     */
    binary_t,
    binary_float_t,
    binary_double_t,
    /**
     * plsql binary_integer
     */
    binary_integer_t,
    /**
     * binary large object
     * Databases: DB2, teradata
     */
    binary_large_object_t,
    bit_t,
    bit_varying_t, // = varbit
    blob_t,
    /**
     * bool, boolean, ansi2003: boolean
     */
    bool_t,
    box_t,
    /**
     * teradata: byte
     */
    byte_t,
    bytea_t, //ansi2003 blob
    /**
     * teradata byteint
     */
    byteint_t,
    /**
     * char, character,  ansi2003: character
     */
    character_t,
    char_t,
    char_for_bit_data_t,
    /**
     * teradata: character large object
     */
    char_large_object_t,
    cidr_t,
    circle_t,
    clob_t,
    cursor_t,
    datalink_t,
    date_t,
    /**
     *  ansi2003: timestamp
     */
    datetime_t,
    datetimeoffset_t,// ansi2003: timestamp
    datetime2_t, //  ansi2003: timestamp with time zone
    /**
     * ansi2003: nclob
     * Databases: DB2
     */
    dbclob_t,
    /**
     * dec,decimal, ansi2003: decimal
     */
    decimal_t,
    dec_t,
    /**
     * double, double precision, ansi2003: float
     */
    double_t,
    enum_t,
    float_t,// ansi2003: double precision
    float4_t,// ansi2003: float(p)
    float8_t, // ansi2003 float(p)
    /**
     * ansi2003 blob
     */
    graphic_t,
    geography_t,
    geometry_t,
    hierarchyid_t,
    image_t,
    inet_t,
    /**
     * int, integer, ansi2003: integer
     */
    integer_t,
    int_t,
    int2_t, // ansi2003: smallint
    int4_t, // ansi2003: int, integer
    /**
     * Postgresql
     */
    interval_t,
    /**
     * teradata: interval day
     */
    interval_day_t,
    /**
     * teradata: interval day to hour
     */
    interval_day_to_hour_t,
    /**
     * teradata: interval day to minute
     */
    interval_day_to_minute_t,
    interval_day_to_second_t,
    /**
     * teradata: interval hour
     */
    interval_hour_t,
    /**
     * teradata: interval hour to minute
     */
    interval_hour_to_minute_t,
    /**
     * teradata: interval hour to second
     */
    interval_hour_to_second_t,
    /**
     * teradata: interval minute
     */
    interval_minute_t,
    /**
     * teradata: interval minute to second
     */
    interval_minute_to_second_t,
    /**
     * teradata: interval month
     */
    interval_month_t,
    /**
     * teradata:interval second
     */
    interval_second_t,
    /**
     * teradata interval year.
     */
    interval_year_t,
    interval_year_to_month_t,
    line_t,
    long_t,
    long_varchar_t,
    /**
     * long varbinary, mysql
     * MySQL Connector/ODBC defines BLOB values as LONGVARBINARY and TEXT values as LONGVARCHAR.
     */
    long_varbinary_t,
    longblob_t, // ansi2003: blob
    /**
     *  ansi2003: blob
     */
    long_raw_t,
    long_vargraphic_t,
    longtext_t,
    lseg_t,
    macaddr_t,
    mediumblob_t,
    /**
     * mediumint, middleint(MySQL) , ansi2003:  int
     */
    mediumint_t,
    mediumtext_t,
    money_t, // = decimal(9,2),INFORMIX
    /**
     * national_char_varying,nchar_varying,nvarchar, ansi2003: national character varying
     */
    nvarchar_t,
    /**
     * nchar, national char, national character,ansi2003: national character
     */
    nchar_t,
    ncharacter_t,
    /**
     * ansi2003: nclob
     */
    nclob_t,
    /**
     * ntext, national text, ansi2003: nclob
     */
    ntext_t,
    /**
     * nvarchar2(n)
     */
    nvarchar2_t,
    /**
     * number, num
     */
    number_t,
    /**
     *  ansi2003: numeric
     */
    numeric_t,
    oid_t,
    path_t,
    /**
     * teradata: period(n)
     */
    period_t,
    /**
     * plsql pls_integer
     */
    pls_integer_t,
    point_t,
    polygon_t,
    raw_t,
    /**
     * ansi2003: real
     */
    real_t,
    rowid_t,
    rowversion_t,
    serial_t,// = serial4
    serial8_t,// = bigserial
    bigserial_t,//informix
    smallfloat_t,//informix
    /**
     * MySQL: set
     */
    set_t,
    smalldatetime_t,
    /**
     * ansi2003: smallint
     */
    smallint_t,
    smallmoney_t,
    sql_variant_t,
    table_t,
    text_t,
    /**
     * ansi2003: time
     */
    time_t,
    /**
     * teradata: time with time zone
     */
    time_with_time_zone_t,
    time_without_time_zone_t,
    timespan_t, // ansi2003: interval
    timestamp_t, // ansi2003: timestamp
    /**
     * timestamp with local time zone,
     * Database: Oracle,SQL Server
     */
    timestamp_with_local_time_zone_t,
    /**
     * timestamp with time zone, timestamptz, ansi2003: timestamp with time zone
     */
    timestamp_with_time_zone_t,
    timestamp_without_time_zone_t,
    /**
     * time with time zone,  ansi2003: time with time zone
     * Databases: teradata
     */
    timetz_t,
    timentz_t,
    tinyblob_t,
    tinyint_t,
    tinytext_t,
    uniqueidentifier_t,
    urowid_t,
    /**
     *  ansi2003: blob
     */
    varbinary_t,
    /**
     * netezza, bit varying
     */
    varbit_t,
    /**
     * teradata: varbyte
     */
    varbyte_t,
    /**
     * varchar, char varying, character varying, ansi2003:character varying(n)
     */
    varchar_t,
    /**
     * ansi2003: character varying
     */
    varchar2_t,
    varchar_for_bit_data_t,// ansi2003:    bit varying
    lvarchar_t, //informix,openedge
    idssecuritylabel_t,//informix
    /**
     *  ansi2003: nchar varying
     */
    vargraphic_t,
    row_data_types_t, //informix
    collection_data_types_collection_t,
    collection_data_types_set_t,
    collection_data_types_multiset_t,
    collection_data_types_list_t,
    /**
     * ansi2003: tinyint
     */
    /**
     * datatypeAttribute in cast function will be treated as a datatype without typename
     * RW_CAST ( expr AS datatypeAttribute )
     */
    no_typename_t,
    year_t,
    xml_t, // ansi2003: xml
    xmltype_t, // ansi2003: xml
    natural_t, //plsql
    naturaln_t,//plsql
    positive_t,
    positiven_t,
    signtype_t,
    simple_integer_t,
    double_precision_t,
    boolean_t,
    string_t,
    listType_t, //hive array 
    structType_t,//hive
    mapType_t,
    unionType_t,
    refcursor_t,//postgresql
    json_t, //postgresql
    jsonb_t,//postgresql
    self_t,//oracle, constructor function
    seconddate_t,//hana
    smalldec_t,//hana
    array_t,//hana,bigquery
    alphanum_t,//hana
    shorttext_t,//hana
    bintext_t,//hana
    currency_t,//dax
    int8_t,
    lvarbinary_t,//openedge
    long_byte_t,//mysql
    object_t,//snowflake
    variant_t,//snowflake
    unsigned_int_t,//
    decfloat_t,//db2
    struct_t,//bigquery
    int64_t,//bigquery
    float64_t,//bigquery

}
