
About Thrill

Thrill is a C++ framework for distributed Big Data batch computations on a cluster of machines. It is currently being designed and developed as a research project at Karlsruhe Institute of Technology and is in early testing.

We last presented our ongoing work on Thrill at the IEEE Conference on Big Data in December 2016. A longer technical report about the design and goals is available on arXiv: https://arxiv.org/abs/1608.05634. This paper gives a good introduction to the concepts and ideas. The slides of our presentation at the conference are also available and give a visual introduction.

The development code is available on GitHub under a BSD open-source license, and outside contributors are welcome to join and contact us.

Doxygen documentation, built automatically from the master branch, is available. It also contains a tutorial, “Write your first Thrill program”.


Some of the main goals for the design are:

In the long term the framework can play a mediator role between Big Data applications and lower layer algorithms research, which may include:

Current Authors and Contributors:

Michael Axtmann, Timo Bingmann, Emanuel Jöbstl, Sebastian Lamm, Huyen Chau Nguyen, Alexander Noe, Matthias Stumpp, Peter Sanders, Sebastian Schlag, Tobias Sturm.

Weblog Posts

  • Word Count Example

    This C++ snippet shows our (unoptimized) working example of Word Count in Thrill.

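    #include <string>
    #include <utility>
    // The Thrill headers that declare Context, ReadLines, common::split and
    // api::Run must also be included here (not shown in this snippet).

    using namespace thrill;

    // Assumed definition: a (word, count) pair, matching the use of
    // .first and .second below.
    using WordCountPair = std::pair<std::string, size_t>;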
    
    size_t WordCountExample(Context& ctx) {
    
        auto lines = ReadLines(ctx, "wordcount.in");
    
        auto word_pairs = lines.FlatMap<WordCountPair>(
            [](const std::string& line, auto emit) -> void {
                /* map lambda: emit each word */
                for (const std::string& word : common::split(line, ' ')) {
                    if (word.size() != 0)
                        emit(WordCountPair(word, 1));
                }
            });
    
        auto red_words = word_pairs.ReduceBy(
            [](const WordCountPair& in) -> std::string {
                /* reduction key: the word string */
                return in.first;
            },
            [](const WordCountPair& a, const WordCountPair& b) -> WordCountPair {
                /* associative reduction operator: add counters */
                return WordCountPair(a.first, a.second + b.second);
            });
    
        red_words.Map(
            [](const WordCountPair& wc) {
                return wc.first + ": " + std::to_string(wc.second);
            })
        .WriteLinesMany(
            "wordcount_" + std::to_string(ctx.my_rank()) + ".out");
    
        return 0;
    }
    
    int main() {
        /* run the job on all workers; command-line arguments are not used */
        return api::Run(WordCountExample);
    }
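
To make the data flow of the snippet above easier to follow, here is a minimal single-machine sketch of the same three stages in plain standard C++, without Thrill. The function names `SplitIntoPairs`, `ReduceByWord` and `FormatCounts` are hypothetical and chosen only for illustration; they are not part of the Thrill API. The sketch also assumes that `WordCountPair` is a `std::pair<std::string, std::size_t>`, and it uses a `std::map` for grouping, which is a simplification and says nothing about how Thrill distributes or orders the reduction.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumed pair type: (word, count).
using WordCountPair = std::pair<std::string, std::size_t>;

// Stage 1 (corresponds to FlatMap): emit one (word, 1) pair for every
// non-empty, space-separated word of every line.
std::vector<WordCountPair> SplitIntoPairs(const std::vector<std::string>& lines) {
    std::vector<WordCountPair> pairs;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string word;
        while (std::getline(in, word, ' ')) {
            if (!word.empty())
                pairs.emplace_back(word, 1);
        }
    }
    return pairs;
}

// Stage 2 (corresponds to ReduceBy): use the word as key and add up the
// counters of all pairs that share the same key.
std::map<std::string, std::size_t> ReduceByWord(const std::vector<WordCountPair>& pairs) {
    std::map<std::string, std::size_t> counts;
    for (const WordCountPair& p : pairs)
        counts[p.first] += p.second;
    return counts;
}

// Stage 3 (corresponds to Map): format each result as "word: count".
std::vector<std::string> FormatCounts(const std::map<std::string, std::size_t>& counts) {
    std::vector<std::string> out;
    for (const auto& kv : counts)
        out.push_back(kv.first + ": " + std::to_string(kv.second));
    return out;
}
```

Calling `FormatCounts(ReduceByWord(SplitIntoPairs(lines)))` on a vector of input lines yields one formatted string per distinct word. Addition is associative, so the pairs can be combined in any grouping, which is the property the reduction operator in the Thrill snippet relies on.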