Finding differences in JavaScript programs with the help of parse trees

I’d like to present a quick-and-dirty way to compare two JavaScript programs that can’t be compared with a simple diff. It is useful for checking whether two programs are mostly equal and for figuring out where any differences are. The result is achieved with a makeshift parse tree comparison.

Tools that will be used: Google’s Closure Compiler, Perl, and vimdiff.

First, Google’s Closure Compiler will be used to generate the parse trees.

$ java -jar compiler.jar --print_tree --js original.js > original.tree.txt
$ java -jar compiler.jar --print_tree --js modified.js > modified.tree.txt

The sourcename attribute should be stripped from the resulting files, because it carries the input file name and would otherwise show up as a difference. In fact, for most comparisons I did, it was sufficient to keep only the first word of each line. This script prints only the first word but preserves leading whitespace (the indentation is worth keeping for a better overview later):

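# treestrip.pl
# Keep the leading whitespace and the first word of each line.
while (<>) {
    # Print an empty line when nothing matches, so that a stale $1 is never
    # reused and line numbers stay aligned with the unstripped tree.
    print(m/^(\W*\w+)/ ? "$1\n" : "\n");
}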
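
If keeping only the first word throws away too much, the sourcename attribute alone can be removed instead. The following is a minimal sketch, assuming the attribute is printed in a form like `[sourcename: original.js]`; the exact format depends on the compiler version, so check your own tree output first. The function name `strip_sourcename` is made up for this example.

```shell
#!/bin/sh
# strip_sourcename: remove "[sourcename: ...]" attributes from a printed parse tree.
# ASSUMPTION: the attribute is bracketed and looks like "[sourcename: original.js]".
# Reads the named files, or standard input when no file is given.
strip_sourcename() {
    # Drop the attribute together with the single space in front of it, if any.
    sed -E 's/ ?\[sourcename: [^]]*\]//g' "$@"
}
```

It can be used in place of treestrip.pl, for example `strip_sourcename original.tree.txt > original.stripped.tree.txt`. Line numbers are preserved because lines are only shortened, never dropped.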

Next, run the script on both parse trees:

$ perl treestrip.pl < original.tree.txt > original.stripped.tree.txt
$ perl treestrip.pl < modified.tree.txt > modified.stripped.tree.txt

Finally, the stripped-down parse trees can be compared with vimdiff. The iwhite option makes vimdiff ignore changes in the amount of whitespace:

$ vimdiff -c 'set diffopt+=iwhite' original.stripped.tree.txt modified.stripped.tree.txt
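
For a quick non-interactive check, plain diff can do a similar comparison. This is only a sketch: `tree_diff` is a hypothetical helper name, and `-b` (ignore changes in the amount of whitespace) is used as the closest counterpart to vim's iwhite.

```shell
#!/bin/sh
# tree_diff: non-interactive comparison of two stripped parse trees.
#   -b  ignore changes in the amount of whitespace (e.g. deeper indentation)
#   -u  print unified hunks, whose headers carry the line numbers
# Exit status follows diff: 0 if the trees match, 1 if they differ.
tree_diff() {
    diff -b -u "$1" "$2"
}
```

Running `tree_diff original.stripped.tree.txt modified.stripped.tree.txt` prints nothing when the trees differ only in indentation, which makes it handy as a first pass before opening vimdiff.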

Suspicious blocks can be traced back to the unstripped parse tree, where they sit at the same line numbers. From there, the surrounding function or variable names lead back to the code in the JavaScript file.
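
To look up a suspicious line number in the unstripped tree without opening an editor, a small helper like the one below may be enough. It is a sketch only; `show_context` and its default radius of 5 lines are choices made for this example.

```shell
#!/bin/sh
# show_context FILE LINE [RADIUS]
# Print the lines around LINE in FILE, each prefixed with its line number.
# RADIUS is the number of lines shown before and after LINE (default: 5).
show_context() {
    file=$1
    line=$2
    radius=${3:-5}
    start=$((line - radius))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    end=$((line + radius))
    # Stop reading once the window has been printed.
    awk -v s="$start" -v e="$end" \
        'NR > e { exit } NR >= s { printf "%6d  %s\n", NR, $0 }' "$file"
}
```

For example, `show_context original.tree.txt 1234` shows lines 1229 to 1239 of the full tree, where the names of nearby functions and variables are still visible.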