Thrill 0.1
Thrill Documentation Overview

Getting Started


WordCount, PageRank, and more...
Convenient user interface for writing Big Data algorithms as dataflow graphs with imperative actions. Contains the Context and DIA classes.
See the List of DIA Operations for a comprehensive overview.
Distributed data structures and algorithms used to build the API: Shuffle/Reduce Table, StageBuilder.
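To make the idea of a dataflow graph ending in an action more concrete, here is a minimal single-process sketch of a WordCount pipeline in plain C++. It does not use Thrill's Context or DIA classes; the names `Pair`, `FlatMapWords`, `ReduceByKey`, and `WordCount` are hypothetical and chosen only for illustration. Consult the list of DIA operations for the real interface.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dataflow pipeline; NOT Thrill's API.
using Pair = std::pair<std::string, std::size_t>;

// "FlatMap" step: each input line emits zero or more (word, 1) pairs.
std::vector<Pair> FlatMapWords(const std::vector<std::string>& lines) {
    std::vector<Pair> out;
    for (const std::string& line : lines) {
        std::istringstream iss(line);
        std::string word;
        while (iss >> word) out.emplace_back(word, 1);
    }
    return out;
}

// "ReduceByKey" step: pairs with equal keys are combined by adding counters.
std::map<std::string, std::size_t> ReduceByKey(const std::vector<Pair>& pairs) {
    std::map<std::string, std::size_t> counts;
    for (const Pair& p : pairs) counts[p.first] += p.second;
    return counts;
}

// Action: evaluates the whole pipeline and returns the result.
std::map<std::string, std::size_t> WordCount(
    const std::vector<std::string>& lines) {
    return ReduceByKey(FlatMapWords(lines));
}
```

Calling `WordCount` on a vector of text lines returns a map from each word to its count. In a distributed setting the same two steps would run on every worker, with the reduction also combining results across workers.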
Manages the transfer of large amounts of data between workers. Contains Serialization, File, BlockWriter, Channel, and Multiplexer.
Controls connections between compute nodes. Contains collective communication primitives such as Broadcast and AllReduce for simple data types.
Backends: net::mock, net::tcp, net::mpi.
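The semantics of the two collective primitives named above can be sketched with an in-process model in which element `i` of a vector holds the local value of worker `i`. This is only an illustration of what each worker ends up with, under that assumption; the function names and signatures below are hypothetical, involve no network communication, and are not the interface of any of the listed backends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: local[i] is the value held by worker i.

// Broadcast: every worker ends up with the value held by worker `root`.
template <typename T>
std::vector<T> Broadcast(const std::vector<T>& local, std::size_t root) {
    assert(root < local.size());
    return std::vector<T>(local.size(), local[root]);
}

// AllReduce: every worker ends up with the combination of all workers'
// values under the binary operation `op` (assumed associative).
template <typename T, typename Op>
std::vector<T> AllReduce(const std::vector<T>& local, Op op) {
    assert(!local.empty());
    T total = local[0];
    for (std::size_t i = 1; i < local.size(); ++i) total = op(total, local[i]);
    return std::vector<T>(local.size(), total);
}
```

With four workers holding 1, 2, 3, and 4, an AllReduce with addition leaves every worker with the sum of all four values, while a Broadcast from worker 2 leaves every worker with that worker's value.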
Asynchronous I/O, direct file access implementation, and disk allocation (shared with STXXL).
Provides transparent access to POSIX, S3, and, in the future, HDFS.
Independent common tools such as Logger, ThreadPool, Delegates, ConcurrentQueue, and CmdlineParser.
Keeps track of the memory consumption of all stakeholders in the system. Includes an extra memory pool for I/O data structures.



Thrill is free software provided under the BSD 2-clause license.

If you use Thrill in an academic context or publication, please cite our paper:

author = {Timo Bingmann and Michael Axtmann and Emanuel J{\"{o}}bstl and Sebastian Lamm and Huyen Chau Nguyen and Alexander Noe and Sebastian Schlag and Matthias Stumpp and Tobias Sturm and Peter Sanders},
title = {{Thrill}: High-Performance Algorithmic Distributed Batch Data Processing with {C++}},
booktitle = {IEEE International Conference on Big Data},
year = 2016,
pages = {172--183},
month = dec,
organization = {IEEE},
note = {preprint arXiv:1608.05634},
isbn = {978-1-4673-9005-7},