Thrill  0.1
Runtime Profile Plots and DIA Dataflow Graphs

Creating Runtime Profile Plots

Thrill contains a built-in logging mechanism that goes beyond the usual stdout output. It is activated by setting the environment variable THRILL_LOG=abc. Thrill then writes rather extensive logs in JSON format, one file per host, to abc-host0.json, abc-host1.json, and so on.

The source archive contains a program that reads this raw JSON log and outputs an HTML execution profile. The raw log may also be used for other purposes in the future.

For example, try to run:

$ cd ~/thrill/build/examples/page_rank/
$ THRILL_LOG=ourlog ./page_rank_run --generate 100000
$ ls -la ourlog*
(this should show ourlog-host0.json and ourlog-host1.json)
$ ~/thrill/build/misc/json2profile ourlog*.json > exec-profile.html

Then open exec-profile.html in a web browser to see the execution profile and further important statistics.
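
The raw JSON logs written via THRILL_LOG can also be inspected directly. The following is a minimal Python sketch, not part of Thrill. It assumes that each log file contains one JSON object per line, which is an assumption about the file layout, and the helper name count_log_keys is hypothetical. It only counts how often each top-level key occurs, so it does not depend on any particular field names.

```python
import json
from collections import Counter


def count_log_keys(paths):
    """Count top-level keys over all JSON lines in the given log files.

    Assumption: every non-empty line holds one JSON object.
    Lines that do not parse, or that are not objects, are skipped.
    """
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip lines that are not valid JSON
                if isinstance(record, dict):
                    counts.update(record.keys())
    return counts


# Possible usage with the log files from the example above:
#   import glob
#   for key, n in count_log_keys(glob.glob("ourlog-host*.json")).most_common():
#       print(n, key)
```

A key that appears in nearly every line is likely part of a common record header, while rarer keys hint at specialized events; this is only a starting point for exploring the log.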

DIA Dataflow Graph Output

It is also possible to create a .dot file of the data-flow graph from the THRILL_LOG output using a small Python program. The resulting .dot file can then be laid out using graph drawing tools such as Graphviz (dot, fdp, etc.). For the previous page rank example, try running:

$ ~/thrill/misc/json2graphviz.py ourlog-host0.json > page_rank.dot
$ dot -Tps -o page_rank.ps page_rank.dot
or
$ dot -Tsvg -o page_rank.svg page_rank.dot
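
If several output formats are wanted, the dot invocations above can be scripted. The following is a minimal Python sketch under the assumption that the dot executable is on the PATH; the helper names dot_command and render_all are hypothetical and not part of Thrill or Graphviz.

```python
import subprocess
from pathlib import Path


def dot_command(dot_file, fmt):
    """Build the argument list for: dot -T<fmt> -o <name>.<fmt> <dot_file>."""
    out_file = Path(dot_file).with_suffix("." + fmt)
    return ["dot", "-T" + fmt, "-o", str(out_file), str(dot_file)]


def render_all(dot_file, formats=("ps", "svg")):
    """Run dot once per format; raises CalledProcessError if dot fails."""
    for fmt in formats:
        subprocess.run(dot_command(dot_file, fmt), check=True)


# Possible usage for the page rank example:
#   render_all("page_rank.dot")   # writes page_rank.ps and page_rank.svg
```

Keeping the command construction separate from the call to subprocess.run makes it easy to print or check the exact command line before running it.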

The following graph is an example generated by running the DC7 suffix sorter (with one level of recursion):

dc7-dataflow.svg