Thrill 0.1
The final step in this tutorial is to enable reading and writing of files instead of generating random points.
In Thrill, line-based text files are easily read using ReadLines(). This DIA operation creates a DIA<std::string>, which can be parsed further. The following function performs such an operation and parses each line as "<x> <y>" using std::istringstream into our Point struct.
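The per-line parsing step can be sketched in plain C++ without any Thrill dependency. This is only an illustration: the layout of Point (two double members x and y) and the helper name ParsePoint are assumptions made for this sketch, not necessarily what the tutorial source uses. In the tutorial, this kind of logic would sit inside the function applied to each line delivered by ReadLines().

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Assumed layout of the tutorial's Point struct: a 2D point.
struct Point {
    double x, y;
};

// Hypothetical helper: parse one text line of the form "<x> <y>".
// Throws std::runtime_error if the line does not start with two numbers.
Point ParsePoint(const std::string& line) {
    std::istringstream iss(line);
    Point p;
    if (!(iss >> p.x >> p.y))
        throw std::runtime_error("Could not parse point: " + line);
    return p;
}
```

Checking the stream state after extraction matters here: without the check, a malformed line would silently produce a Point with unspecified or zeroed coordinates.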
LoadPoints() returns a DIA<Point>, so we need to refactor the random point generator into a similar function.
Interestingly, we have to add an Execute() to explicitly generate the cached DIA prior to returning from the function, because otherwise the random generator objects are destroyed while still being used by the lambda function. This is one of the pitfalls of lazy DIA operation evaluation.
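The lifetime problem can be illustrated with a small standard C++ model of lazy evaluation. LazyValues below is a hypothetical stand-in, not a Thrill type: it only stores a generator function and runs it when Materialize() is called, which plays a role loosely analogous to calling Execute() on a cached DIA. The names LazyValues, Materialize and GenerateEager are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical stand-in for a lazily evaluated collection: it stores only
// the generator function and calls it when Materialize() is invoked.
struct LazyValues {
    std::function<double()> generator;
    std::size_t count;

    std::vector<double> Materialize() const {
        std::vector<double> out;
        out.reserve(count);
        for (std::size_t i = 0; i < count; ++i)
            out.push_back(generator());
        return out;
    }
};

// Unsafe pattern (shown only as a comment): returning the lazy object
// itself would leave the lambda holding references to destroyed locals.
//
//   std::default_random_engine rng(seed);
//   return LazyValues{[&rng]() { ... }, n};  // rng dies on return
//
// Safe pattern: force evaluation while rng and dist are still alive.
std::vector<double> GenerateEager(std::size_t n, unsigned seed) {
    std::default_random_engine rng(seed);
    std::uniform_real_distribution<double> dist(0.0, 1000.0);
    LazyValues lazy{[&rng, &dist]() { return dist(rng); }, n};
    return lazy.Materialize();  // evaluated before the locals go out of scope
}
```

An alternative design would be to capture the generator objects by value in a mutable lambda, so that the lambda owns its own copies. Forcing evaluation before returning, as above, mirrors the approach described in the text.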
With LoadPoints() and GeneratePoints(), we only have to add a DIA<Point> parameter to Process().
To make the output configurable, we also add an output parameter. Line-based text files can be written in Thrill using WriteLines(), which requires a DIA<std::string>. So we have to map Point objects to std::string objects prior to calling the write operation.
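The conversion from a point to an output line can be sketched in plain C++ as follows. As before, the Point layout and the helper name PointToLine are assumptions for illustration only. In the tutorial, equivalent logic would be applied to every element to obtain the DIA<std::string> that WriteLines() expects.

```cpp
#include <sstream>
#include <string>

// Assumed layout of the tutorial's Point struct: a 2D point.
struct Point {
    double x, y;
};

// Hypothetical helper: format a point as "<x> <y>", the same layout
// that the input parser reads.
std::string PointToLine(const Point& p) {
    std::ostringstream oss;
    oss << p.x << ' ' << p.y;
    return oss.str();
}
```

Note that the default stream formatting prints doubles with six significant digits, so a round trip through a text file may lose precision unless the precision is raised explicitly.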
The only remaining thing to do is to pass the command line parameters to Process(). This is a very simplistic way to process the command line; see other examples in Thrill's source for a more elaborate command line parser.
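A simplistic command line check of this kind might look like the sketch below. The argument convention used here (exactly two arguments, <input> and <output>, with the special input value "generate" selecting random points) is an assumption for illustration, as are the Options struct and the ParseArgs helper. The actual convention is defined in the example source file.

```cpp
#include <string>

// Hypothetical container for the two command line parameters.
struct Options {
    bool generate;       // true if random points should be generated
    std::string input;   // input file path (unused if generate is true)
    std::string output;  // output file path
};

// Hypothetical helper: expects exactly "<program> <input> <output>".
// Returns false if the argument count is wrong, so the caller can
// print a usage message and exit.
bool ParseArgs(int argc, const char* const* argv, Options& opt) {
    if (argc != 3)
        return false;
    opt.input = argv[1];
    opt.output = argv[2];
    // Assumed convention: the literal word "generate" replaces a file name.
    opt.generate = (opt.input == "generate");
    return true;
}
```

In main(), the result would decide whether to call the generating or the loading function before handing the DIA and the output path to Process().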
See the complete example code in examples/tutorial/k-means_step5.cpp.
The source package contains the file k-means_points.txt as an example input.