Sunday, October 30, 2022

Rule based static analysis for Java source code

Recently, someone asked me about doing static analysis on source code. Here is an experiment to see how easy (or difficult) it is to combine a rule engine and a Java source code parser for code analysis.

Around 10 years ago I did something similar: creating an abstract syntax tree (AST) from Java source code for analysis. Back then, the goal was to find all the toString() calls on BigDecimal objects, because the behavior of that method changed in Java 1.5 (JSR 13).

This time I wanted it to be more generic and be able to:

  • parse one or more Java source files into an AST
  • use YAML (or JSON, etc.) to write multiple rules that define what we are looking for and what actions to take

So, on a rainy Sunday afternoon, this is what I came up with. Instead of the PMD library I used 10 years ago, this time I used JavaParser to create the AST. The rule engine is easy-rules (sadly, the library is in maintenance mode and no further development is being done).
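To give a feel for what "find the method calls in an AST" involves, here is a minimal sketch. It does not use JavaParser. As a dependency-free stand-in it uses the JDK's own compiler Tree API (com.sun.source), so it needs a JDK rather than a bare JRE. The class and method names (MethodCallFinder, findMemberCalls) are made up for illustration. It only parses: it does not resolve types, so it cannot tell whether a receiver is a BigDecimal.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MethodCallFinder {

    /** Wraps a source string so the compiler can read it without touching disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source and returns the names of methods invoked as receiver.name(...). */
    static List<String> findMemberCalls(String className, String code) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; a JDK is required");
        }
        JavaFileObject unit = new StringSource(className, code);
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(unit));

        List<String> names = new ArrayList<>();
        for (CompilationUnitTree tree : task.parse()) {
            // Walk the whole tree; the scanner calls back for every method invocation node.
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitMethodInvocation(MethodInvocationTree call, Void unused) {
                    if (call.getMethodSelect() instanceof MemberSelectTree) {
                        MemberSelectTree select = (MemberSelectTree) call.getMethodSelect();
                        names.add(select.getIdentifier().toString());
                    }
                    // Keep descending so nested calls such as a.b().c() are also seen.
                    return super.visitMethodInvocation(call, unused);
                }
            }.scan(tree, null);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String code = "class Sample { String f(java.math.BigDecimal d) { return d.toString(); } }";
        System.out.println(findMemberCalls("Sample", code));
    }
}
```

A parser with symbol resolution can go one step further and report which class declares each called method, which is what a rule about BigDecimal needs.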

So far it can read rules from YAML files, each defining a condition and the actions to run when it matches, e.g.:

---
name: "BigDecimal explicit toString"
description: "find code that explicitly calls BigDecimal toString"
condition: "resolvedMethodRef.getPackageName().toString().equals(\"java.math\") &&
  resolvedMethodRef.getClassName().toString().equals(\"BigDecimal\") &&
  node.getName().toString() == \"toString\""
actions:
  - "System.out.print(\"Caution! Explicitly calling BigDecimal toString() in \" + file.toString());
    if (node.getRange().isPresent()) {
      System.out.println(\" at \" + node.getRange().get().toString());
    }"
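For readers who have not used a rule engine, the shape of the rule above (a condition, then actions that run when it holds) can be sketched in plain Java. This is not how easy-rules is implemented and it does not evaluate expression strings. MethodCallFact, Rule, and fire are hypothetical names invented for this illustration, and the fact fields simply mirror the values the YAML condition checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class TinyRuleDemo {

    /** Hypothetical fact: one resolved method call found in a scanned file. */
    static final class MethodCallFact {
        final String packageName;
        final String className;
        final String methodName;
        final String file;

        MethodCallFact(String packageName, String className, String methodName, String file) {
            this.packageName = packageName;
            this.className = className;
            this.methodName = methodName;
            this.file = file;
        }
    }

    /** A rule pairs a condition with the actions to run when the condition holds. */
    static final class Rule {
        final String name;
        final Predicate<MethodCallFact> condition;
        final List<Consumer<MethodCallFact>> actions;

        Rule(String name, Predicate<MethodCallFact> condition,
             List<Consumer<MethodCallFact>> actions) {
            this.name = name;
            this.condition = condition;
            this.actions = actions;
        }
    }

    /** Checks every rule against every fact and returns how many times a rule fired. */
    static int fire(List<Rule> rules, List<MethodCallFact> facts) {
        int fired = 0;
        for (MethodCallFact fact : facts) {
            for (Rule rule : rules) {
                if (rule.condition.test(fact)) {
                    for (Consumer<MethodCallFact> action : rule.actions) {
                        action.accept(fact);
                    }
                    fired++;
                }
            }
        }
        return fired;
    }

    /** Plain-Java counterpart of the YAML rule; messages are collected in the given list. */
    static Rule bigDecimalToString(List<String> report) {
        Predicate<MethodCallFact> condition = f ->
                "java.math".equals(f.packageName)
                        && "BigDecimal".equals(f.className)
                        && "toString".equals(f.methodName);
        Consumer<MethodCallFact> action = f ->
                report.add("Explicit BigDecimal toString() in " + f.file);
        return new Rule("BigDecimal explicit toString", condition,
                Collections.singletonList(action));
    }

    public static void main(String[] args) {
        List<String> report = new ArrayList<>();
        List<MethodCallFact> facts = Arrays.asList(
                new MethodCallFact("java.math", "BigDecimal", "toString", "Invoice.java"),
                new MethodCallFact("java.lang", "Object", "toString", "Util.java"));
        fire(Collections.singletonList(bigDecimalToString(report)), facts);
        report.forEach(System.out::println);
    }
}
```

Keeping the condition and actions as data is what makes it possible to move them out of Java and into a YAML file that an expression language evaluates at run time.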

 

It is still a proof of concept (POC) with these limitations:

  • it can only find and act on certain syntax (method calls and binary expressions). I still need to figure out a way to easily add all AST node types
  • it can't use reflection on the source code being scanned. For that, the source files would need to be compiled and (dynamically?) added to the class loader
  • I still need to figure out a way to easily add vocabulary for writing rules that handle complex scenarios, especially rules whose conditions span multiple files
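On the reflection limitation, one possible direction is to compile the scanned source at run time and load the result through a separate class loader. The following is only a sketch of that idea using the JDK's javax.tools API, not something the project does today. CompileAndLoad and declaredMethodNames are hypothetical names, and the sketch assumes a single class in the default package with no external dependencies.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CompileAndLoad {

    /**
     * Compiles one source file into a temp directory, loads the class from there,
     * and returns the sorted names of its declared methods via reflection.
     */
    static List<String> declaredMethodNames(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; a JDK is required");
        }

        // Write the source where the compiler can find it (temp files are not cleaned up here).
        Path dir = Files.createTempDirectory("scanned-src");
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));

        // Emit the .class file next to the source; a non-zero status means compilation failed.
        int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
        if (status != 0) {
            throw new IllegalStateException("Compilation failed for " + file);
        }

        // A dedicated class loader keeps the scanned classes apart from the analyzer's own.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> loaded = loader.loadClass(className);
            List<String> names = new ArrayList<>();
            for (Method method : loaded.getDeclaredMethods()) {
                names.add(method.getName());
            }
            Collections.sort(names);
            return names;
        }
    }

    public static void main(String[] args) throws Exception {
        String source = "public class Scanned { public int answer() { return 42; } }";
        System.out.println(declaredMethodNames("Scanned", source));
    }
}
```

Real projects would also need their dependencies on the compile classpath, which is likely the harder part of making this work for arbitrary code.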

But it seems promising. Maybe a potential Jenkins plugin? Or even a standalone application?

https://github.com/kitsook/RuleBasedStaticAnalysis