Tuesday, December 20, 2022

Java HashMap and multi-thread

Java HashMap is not thread-safe. But if the logic only accesses it in certain ways, and reader threads can retry in a manner similar to an optimistic read (without the help of StampedLock etc.), can it be done?

Wrote some code to simulate a special scenario. It seems to work, but it also seems risky... Should probably also test the performance of using StampedLock.
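Since the StampedLock comparison is left as an open question above, here is a minimal sketch (not the code from the experiment; the class name OptimisticMap is made up) of how a plain HashMap can be guarded with StampedLock so readers get a cheap optimistic path with a retry fallback. One caveat: even the optimistic path calls map.get() on a possibly mid-mutation HashMap, so the result must not be trusted until validate() succeeds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

// Hypothetical wrapper: plain HashMap + StampedLock optimistic reads.
public class OptimisticMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final StampedLock lock = new StampedLock();

    public V get(K key) {
        long stamp = lock.tryOptimisticRead(); // no blocking, just a stamp
        V value = map.get(key);
        if (!lock.validate(stamp)) {           // a write happened in between
            stamp = lock.readLock();           // retry under a real read lock
            try {
                value = map.get(key);
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return value;
    }

    public void put(K key, V value) {
        long stamp = lock.writeLock();
        try {
            map.put(key, value);
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
```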

Monday, November 21, 2022

Optimized sort vs Quickselect

With the implementation of Quickselect done yesterday, now comes the question: would it be quicker to use an optimized algorithm to sort the whole list and then pick the i-th element instead?

As expected, it depends on the list size. For short lists (maybe < 10 items), the optimized sort is faster. Otherwise Quickselect is faster because it only sorts part of the list.
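For reference, a minimal Quickselect sketch (not the implementation mentioned above): it partitions with a random pivot and then recurses only into the side of the array that contains the target index, giving expected linear time.

```java
import java.util.concurrent.ThreadLocalRandom;

// Quickselect: find the k-th smallest element (k is 0-based) in expected
// O(n) time. Only the partition containing index k is processed further.
public class Quickselect {
    public static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);      // pivot lands at its final index p
            if (p == k) return a[p];
            if (p < k) lo = p + 1; else hi = p - 1;
        }
        return a[lo];
    }

    // Lomuto partition around a randomly chosen pivot.
    private static int partition(int[] a, int lo, int hi) {
        int pivotIdx = ThreadLocalRandom.current().nextInt(lo, hi + 1);
        swap(a, pivotIdx, hi);
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```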

Sunday, November 20, 2022

Sunday, October 30, 2022

Rule based static analysis for Java source code

Recently, someone asked me about doing static analysis on source code. Here is an experiment to see how easy (or difficult) it is to combine a rule engine and a Java source code parser for code analysis.

Around 10 years ago I did something similar: create an abstract syntax tree (AST) from Java source code for analysis. Back then, it was for finding all the toString() calls on BigDecimal objects, because the behavior changed in Java 1.5 (JSR 13).

This time I wanted it to be more generic and be able to:

  • parse one or more Java source files into AST
  • use YAML (or JSON, etc.) to write multiple rules that define what we are looking for and what the actions should be

So, on a rainy Sunday afternoon, this is what I came up with. Instead of the PMD library I used 10 years ago, this time I tried JavaParser to create the AST. For the rule engine, it uses easy-rules (sadly the library is in maintenance mode with no further development being done).

So far it can read the condition and actions of rules from YAML files, e.g.:

name: "BigDecimal explicit toString"
description: "find code that explicitly calls BigDecimal toString"
condition: "resolvedMethodRef.getPackageName().toString().equals(\"java.math\") &&
  resolvedMethodRef.getClassName().toString().equals(\"BigDecimal\") &&
  node.getName().toString().equals(\"toString\")"
actions:
  - "System.out.print(\"Caution! Explicitly calling BigDecimal toString() in \" + file.toString());
    if (node.getRange().isPresent()) {
      System.out.println(\" at \" + node.getRange().get().toString());
    }"
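To illustrate the JavaParser side, here is a minimal sketch (not the POC itself, and assuming the javaparser-core dependency is on the classpath) that parses a source string and lists every method-call node with its range. Resolving a call to its declaring class, the way resolvedMethodRef is used in the rule, additionally requires the JavaSymbolSolver module and is omitted here.

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.expr.MethodCallExpr;

public class AstWalk {
    public static void main(String[] args) {
        String src = "class A { void f(java.math.BigDecimal d) { System.out.println(d.toString()); } }";
        // Parse the source into an AST ...
        CompilationUnit cu = StaticJavaParser.parse(src);
        // ... then visit every method-call node, printing its name and location.
        cu.findAll(MethodCallExpr.class).forEach(call ->
            System.out.println(call.getNameAsString()
                + call.getRange().map(r -> " at " + r).orElse("")));
    }
}
```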


It is still a POC with these limitations:

  • can only find / act on certain syntax (method calls and binary expressions). Need to figure out a way to easily add all AST node types
  • can't use reflection on the source code being scanned. For that, those source files would need to be compiled and (dynamically?) added to the class loader
  • need to figure out a way to easily add vocabulary for writing rules that handle complex scenarios, especially rules that need to check conditions across multiple files

But it seems promising. Maybe a potential Jenkins plugin? Or even a standalone application?


Sunday, September 4, 2022

Returning local references from Java JNI

Wrote some tests on Java JNI. Lesson learned: it is OK to return a local reference from JNI. Just don't explicitly call DeleteLocalRef on it, otherwise the Java side will get a null value.


Regions highlighted on the graph:

(1) Running the JNI calls normally, freeing local references as it goes. Everything works as expected.

(2) Running the JNI calls without explicitly freeing local references. It is still fine: reclaimable objects were left in memory, but triggering a GC (from VisualVM) frees up the space.

(3) Running the JNI calls incorrectly by freeing the local references before returning from JNI. Memory usage is normal, but the program printout shows 0 test cases passed, because null is returned instead of the expected HashMap.

(4) !! This will cause OutOfMemoryError !! The JNI calls explicitly create global references without freeing them, causing the out-of-memory issue.


Monday, August 1, 2022

Java Collections.shuffle poor performance with parallelism

Previously, when using a Java program to simulate the 100 prisoners problem, I noticed an interesting performance issue: Collections.shuffle gets slower when more threads run the simulation in parallel. Running the VisualVM profiler confirmed that the performance issue came from Collections.shuffle when preparing a list of randomized integers.

This exercise is an attempt to collect more information and do an initial analysis.


The prisoner problem simulation needs a list of the n integers from 1 to n (or 0 to n-1) stored in random order. Here is a simplified implementation of two approaches.

The easy approach is to add 0..n-1 to the list and then use Collections.shuffle to randomize it.

private static List<Integer> initBoxesByShuffle() {
    List<Integer> boxes = new ArrayList<>(LIST_SIZE);
    for (int i = 0; i < LIST_SIZE; i++) {
        boxes.add(i);
    }
    Collections.shuffle(boxes);
    return boxes;
}

A not-so-efficient approach is to loop from 0 to n-1 and insert each value at a random position in the list. This is slower because ArrayList.add(pos, val) needs to shift all existing elements after the insertion point by one position (using another List implementation would help, but let's assume we are stuck with ArrayList).

private static List<Integer> initBoxesByRandInsert() {
    List<Integer> boxes = new ArrayList<>(LIST_SIZE);
    Random rand = new Random();
    for (int i = 0; i < LIST_SIZE; i++) {
        // bound must be size() + 1 so insertion at the end is possible
        // (and nextInt(0) on the initially empty list would throw)
        boxes.add(rand.nextInt(boxes.size() + 1), i);
    }
    return boxes;
}

Performance was measured with the perf command, e.g.:

perf stat -e task-clock,cpu-migrations,page-faults,instructions,branch-misses,branches,cache-misses,cache-references \
    /usr/lib64/jvm/java-11-openjdk/bin/java -classpath . ShufflePerformance

Machine used was an AMD 5600G (6C12T) with 44GiB of memory running OpenSUSE Tumbleweed. JDK used was OpenJDK 11.0.16.


Performance data collected when running the two approaches with different numbers of threads (ranging from 1 to 5):

A few things to note:

  • Rows 1 and 6 show the two approaches executed with a single thread (parallelism = 1). The elapsed time (i.e. execution time) of Collections.shuffle is much better than random insert (~10 seconds vs ~30 seconds).
  • However, when the parallelism is increased to divide the same amount of work among more threads, Collections.shuffle suddenly gets slower. Looking at the elapsed-time columns for Collections.shuffle (rows 1 to 5), the execution time roughly doubled every time an extra thread was added. In contrast, for the random insert approach (rows 6 to 10), the execution time went down steadily as more threads were added.


  • Looking at the user time column (i.e. the amount of work done by all CPU cores), the values stayed roughly the same for the random insert approach. This explains why the execution time decreased when more threads were added. But the values for Collections.shuffle kept increasing as more threads were added. That means the total effort actually increased when we tried to divide the same task among more threads.
  • That was interesting because the instructions and branches columns show that (1) the numbers stayed relatively the same for each approach no matter how many threads were used; and (2) the random insert approach actually requires more work.
  • The random insert approach has much higher branch-misses (both by count and by %), but they stay relatively the same regardless of how many threads were used. However, for the Collections.shuffle approach the branch-misses % went up with the number of threads.


  • The cache-misses % stayed relatively the same for the random insert approach but reached over 80% for Collections.shuffle.


Initial analysis

If the high cache-miss rate is the reason for the poor performance, what is the cause? And would a different architecture (OS and CPU) yield a different result?

In theory Collections.shuffle should be more efficient, as it runs in linear time by swapping elements within the list without additional copying or moving. Could it be that the random-access nature of the swapping needs the whole ArrayList to be in cache for it to run efficiently?
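For context, Collections.shuffle is essentially the Fisher-Yates algorithm, which a simplified version (a sketch, not the JDK source) makes clear: a single linear pass, but each step swaps the current slot with a uniformly random earlier slot, so the reads and writes jump all over the backing array. That scattered access pattern is the suspect behind the cache misses once several threads shuffle large lists at the same time.

```java
import java.util.List;
import java.util.Random;

// Fisher-Yates shuffle: walk backwards through the list, swapping each
// position with a random position at or before it. Each iteration touches
// list[i] (sequential) and list[j] (a random, cache-unfriendly index).
public class FisherYates {
    public static <T> void shuffle(List<T> list, Random rnd) {
        for (int i = list.size() - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);   // random index in [0, i]
            T tmp = list.get(i);          // swap list[i] and list[j]
            list.set(i, list.get(j));
            list.set(j, tmp);
        }
    }
}
```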