Thrill
0.1
|
A File is an ordered sequence of Block objects for storing items.
By using the Block indirection, the File can be composed using existing Block objects (via reference counting), but only contain a subset of the items in those Blocks. This may be used for Zip() and Repartition().
A File can be written using a BlockWriter instance, which is delivered by GetWriter(). Thereafter it can be read (multiple times) using a BlockReader, delivered by GetReader().
Using a prefixsum over the number of items in a Block, one can seek to the block contained any item offset in log_2(Blocks) time, though seeking within the Block goes sequentially.
#include <file.hpp>
Public Types | |
using | ConsumeReader = BlockReader< ConsumeFileBlockSource > |
using | KeepReader = BlockReader< KeepFileBlockSource > |
using | Reader = DynBlockReader |
using | Writer = BlockWriter< FileBlockSink > |
Public Member Functions | |
File (BlockPool &block_pool, size_t local_worker_id, size_t dia_id) | |
Constructor from BlockPool. More... | |
File (const File &)=delete | |
non-copyable: delete copy-constructor More... | |
File (File &&)=default | |
move-constructor: default More... | |
const Block & | block (size_t i) const |
Return reference to a block. More... | |
const std::deque< Block > & | blocks () const |
Returns constant reference to all Blocks in the File. More... | |
File | Copy () const |
Return a copy of the File (explicit copy-constructor) More... | |
bool | empty () const |
Returns true if the File is empty. More... | |
template<typename ItemType , typename CompareFunction = std::less<ItemType>> | |
size_t | GetIndexOf (const ItemType &item, size_t tie, size_t left, size_t right, const CompareFunction &func=CompareFunction()) const |
Get index of the given item, or the next greater item, in this file. More... | |
template<typename ItemType , typename CompareFunction = std::less<ItemType>> | |
size_t | GetIndexOf (const ItemType &item, size_t tie, const CompareFunction &less=CompareFunction()) const |
Get index of the given item, or the next greater item, in this file. More... | |
template<typename ItemType > | |
ItemType | GetItemAt (size_t index) const |
template<typename ItemType > | |
std::vector< Block > | GetItemRange (size_t begin, size_t end) const |
size_t | ItemsStartIn (size_t i) const |
Return number of items starting in block i. More... | |
size_t | num_blocks () const |
Return the number of blocks. More... | |
size_t | num_items () const |
Return the number of items in the file. More... | |
File & | operator= (const File &)=delete |
non-copyable: delete assignment operator More... | |
File & | operator= (File &&)=default |
move-assignment operator: default More... | |
void | set_dia_id (size_t dia_id) |
size_t | size_bytes () const |
Return the number of bytes of user data in this file. More... | |
Writers and Readers | |
Writer | GetWriter (size_t block_size=default_block_size) |
Get BlockWriter. More... | |
Reader | GetReader (bool consume, size_t prefetch_size=File::default_prefetch_size_) |
Get BlockReader or a consuming BlockReader for beginning of File. More... | |
KeepReader | GetKeepReader (size_t prefetch_size=File::default_prefetch_size_) const |
Get BlockReader for beginning of File. More... | |
ConsumeReader | GetConsumeReader (size_t prefetch_size=File::default_prefetch_size_) |
Get consuming BlockReader for beginning of File. More... | |
template<typename ItemType > | |
KeepReader | GetReaderAt (size_t index, size_t prefetch=default_prefetch_size_) const |
Get BlockReader seeked to the corresponding item index. More... | |
std::string | ReadComplete () const |
Public Member Functions inherited from BlockSink | |
BlockSink (BlockPool &block_pool, size_t local_worker_id) | |
constructor with reference to BlockPool More... | |
BlockSink (BlockPool *block_pool, size_t local_worker_id) | |
constructor with reference to BlockPool More... | |
BlockSink (const BlockSink &)=default | |
default copy-constructor More... | |
BlockSink (BlockSink &&)=default | |
move-constructor: default More... | |
virtual | ~BlockSink () |
required virtual destructor More... | |
virtual PinnedByteBlockPtr | AllocateByteBlock (size_t block_size) |
virtual void | AppendPinnedBlock (PinnedBlock &&b, bool is_last_block) |
Appends the PinnedBlock. More... | |
BlockPool * | block_pool () const |
Returns block_pool_. More... | |
size_t | local_worker_id () const |
local worker id to associate pinned block with More... | |
common::JsonLogger & | logger () |
Returns BlockPool.logger_. More... | |
BlockSink & | operator= (const BlockSink &)=default |
default assignment operator More... | |
BlockSink & | operator= (BlockSink &&)=default |
move-assignment operator: default More... | |
virtual void | ReleaseByteBlock (ByteBlockPtr &block) |
Release an unused ByteBlock with n bytes backing memory. More... | |
size_t | workers_per_host () const |
return number of workers per host More... | |
Public Member Functions inherited from ReferenceCounter | |
ReferenceCounter () noexcept | |
new objects have zero reference count More... | |
ReferenceCounter (const ReferenceCounter &) noexcept | |
coping still creates a new object with zero reference count More... | |
~ReferenceCounter () | |
bool | dec_reference () const noexcept |
Call whenever resetting (i.e. More... | |
void | inc_reference () const noexcept |
Call whenever setting a pointer to the object. More... | |
ReferenceCounter & | operator= (const ReferenceCounter &) noexcept |
assignment operator, leaves pointers unchanged More... | |
size_t | reference_count () const noexcept |
Return the number of references to this object (for debugging) More... | |
bool | unique () const noexcept |
Test if the ReferenceCounter is referenced by only one CountingPtr. More... | |
Static Public Attributes | |
static size_t | default_prefetch_size_ = 2 * default_block_size |
Static Public Attributes inherited from BlockSink | |
static constexpr bool | allocate_can_fail_ = false |
Private Attributes | |
std::deque< Block > | blocks_ |
container holding Blocks and thus shared pointers to all byte blocks. More... | |
size_t | dia_id_ |
optionally associated DIANode id More... | |
size_t | id_ |
unique file id More... | |
std::deque< size_t > | num_items_sum_ |
size_t | size_bytes_ = 0 |
Total size of this file in bytes. Sum of all block sizes. More... | |
size_t | stats_bytes_ = 0 |
size_t | stats_items_ = 0 |
Friends | |
std::ostream & | operator<< (std::ostream &os, const File &f) |
Output the Block objects contained in this File. More... | |
Methods of a BlockSink | |
static constexpr bool | allocate_can_fail_ = false |
void | AppendBlock (const Block &b) |
void | AppendBlock (Block &&b) |
void | AppendBlock (const Block &b, bool) final |
void | AppendBlock (Block &&b, bool) final |
void | Close () final |
Closes the sink. Must not be called multiple times. More... | |
~File () | |
write out stats More... | |
void | Clear () |
Free all Blocks in the File and deallocate vectors. More... | |
Additional Inherited Members | |
Protected Attributes inherited from BlockSink | |
size_t | local_worker_id_ |
local worker id to associate pinned block with More... | |
using KeepReader = BlockReader<KeepFileBlockSource> |
using Reader = DynBlockReader |
using Writer = BlockWriter<FileBlockSink> |
~File | ( | ) |
write out stats
Definition at line 26 of file file.cpp.
References File::dia_id_, die, File::id_, BlockSink::logger(), ReferenceCounter::reference_count(), File::stats_bytes_, and File::stats_items_.
Referenced by File::AppendBlock().
|
inline |
Append a block to this file, the block must contain given number of items after the offset first.
Definition at line 88 of file file.hpp.
References File::blocks_, Block::num_items(), File::num_items(), File::num_items_sum_, Block::size(), File::size_bytes_, File::stats_bytes_, and File::stats_items_.
Referenced by File::AppendBlock(), CacheBlockQueueSource::NextBlock(), and ReadBinaryNode< ValueType >::ReadBinaryNode().
|
inline |
Append a block to this file, the block must contain given number of items after the offset first.
Definition at line 99 of file file.hpp.
References File::blocks_, File::num_items(), File::num_items_sum_, File::size_bytes_, File::stats_bytes_, and File::stats_items_.
|
inlinefinalvirtual |
Append a block to this file, the block must contain given number of items after the offset first.
Implements BlockSink.
Definition at line 110 of file file.hpp.
References File::AppendBlock().
|
inlinefinalvirtual |
Append a block to this file, the block must contain given number of items after the offset first.
Implements BlockSink.
Definition at line 116 of file file.hpp.
References File::AppendBlock(), File::Clear(), File::Close(), and File::~File().
|
inline |
Return reference to a block.
Definition at line 191 of file file.hpp.
References File::blocks_.
Referenced by KeepFileBlockSource::MakeNextBlock().
|
inline |
Returns constant reference to all Blocks in the File.
Definition at line 197 of file file.hpp.
References File::blocks_.
void Clear | ( | ) |
Free all Blocks in the File and deallocate vectors.
Definition at line 57 of file file.cpp.
References File::blocks_, File::num_items_sum_, File::size_bytes_, and tlx::swap().
Referenced by File::AppendBlock(), CacheNode< ValueType >::Dispose(), RebalanceNode< ValueType >::Dispose(), BaseWindowNode< ValueType, Input, WindowFunction, PartialWindowFunction >::Dispose(), PrefixSumNode< ValueType, SumFunction, Inclusive >::Dispose(), ZipWithIndexNode< ValueType, ZipFunction >::Dispose(), ReadBinaryNode< ValueType >::Dispose(), and ConsumeFileBlockSource::~ConsumeFileBlockSource().
|
finalvirtual |
Closes the sink. Must not be called multiple times.
Implements BlockSink.
Definition at line 52 of file file.cpp.
Referenced by File::AppendBlock().
File Copy | ( | ) | const |
Return a copy of the File (explicit copy-constructor)
Definition at line 42 of file file.cpp.
References BlockSink::block_pool(), File::blocks_, File::dia_id_, BlockSink::local_worker_id(), File::num_items_sum_, File::size_bytes_, File::stats_bytes_, and File::stats_items_.
Referenced by RebalanceNode< ValueType >::OnPreOpFile(), CacheNode< ValueType >::OnPreOpFile(), PrefixSumNode< ValueType, SumFunction, Inclusive >::OnPreOpFile(), CollapseNode< ValueType >::OnPreOpFile(), ZipWithIndexNode< ValueType, ZipFunction >::OnPreOpFile(), BaseWindowNode< ValueType, Input, WindowFunction, PartialWindowFunction >::OnPreOpFile(), and SortNode< ValueType, CompareFunction, SortAlgorithm, Stable >::OnPreOpFile().
|
inline |
Returns true if the File is empty.
Definition at line 185 of file file.hpp.
References File::blocks_.
File::ConsumeReader GetConsumeReader | ( | size_t | prefetch_size = File::default_prefetch_size_ | ) |
Get consuming BlockReader for beginning of File.
Definition at line 73 of file file.cpp.
References File::ConsumeFileBlockSource, and BlockSink::local_worker_id_.
Referenced by WriteLinesOneNode< ValueType >::Execute(), JoinNode< ValueType, FirstDIA, SecondDIA, KeyExtractor1, KeyExtractor2, JoinFunction, HashFunction, UseLocationDetection >::Execute(), GroupByNode< ValueType, KeyExtractor, GroupFunction, HashFunction, UseLocationDetection >::Execute(), Stream::ScatterConsume(), and SortNode< ValueType, CompareFunction, SortAlgorithm, Stable >::TransmitItems().
|
inline |
Get index of the given item, or the next greater item, in this file.
The file has to be ordered according to the given compare function. The tie value can be used to make a decision in case of many successive equal elements. The tie is compared with the local rank of the element.
WARNING: This method uses GetItemAt combined with a binary search and is therefore not efficient. The method should be reimplemented in near future.
Definition at line 236 of file file.hpp.
References File::GetIndexOf(), File::GetItemRange(), File::num_items(), and File::operator<<.
File::KeepReader GetKeepReader | ( | size_t | prefetch_size = File::default_prefetch_size_ | ) | const |
Get BlockReader for beginning of File.
Definition at line 68 of file file.cpp.
References File::KeepFileBlockSource, and BlockSink::local_worker_id_.
Referenced by PrefixSumNode< ValueType, SumFunction, Inclusive >::OnPreOpFile(), and Stream::ScatterKeep().
File::Reader GetReader | ( | bool | consume, |
size_t | prefetch_size = File::default_prefetch_size_ |
||
) |
Get BlockReader or a consuming BlockReader for beginning of File.
Definition at line 78 of file file.cpp.
References BlockSink::local_worker_id_.
Referenced by PrefixSumNode< ValueType, SumFunction, Inclusive >::PushData(), ZipWithIndexNode< ValueType, ZipFunction >::PushData(), OverlapWindowNode< ValueType, Input, WindowFunction, PartialWindowFunction >::PushData(), DisjointWindowNode< ValueType, Input, WindowFunction, PartialWindowFunction >::PushData(), DIANode< StackInput >::PushFile(), GroupToIndexNode< ValueType, KeyExtractor, GroupFunction >::RunUserFunc(), and GroupByNode< ValueType, KeyExtractor, GroupFunction, HashFunction, UseLocationDetection >::RunUserFunc().
File::Writer GetWriter | ( | size_t | block_size = default_block_size | ) |
Get BlockWriter.
Definition at line 63 of file file.cpp.
Referenced by GroupToIndexNode< ValueType, KeyExtractor, GroupFunction >::FlushVectorToFile(), SortNode< ValueType, CompareFunction, SortAlgorithm, Stable >::StartPreOp(), GroupByNode< ValueType, KeyExtractor, GroupFunction, HashFunction, UseLocationDetection >::StartPreOp(), and JoinNode< ValueType, FirstDIA, SecondDIA, KeyExtractor1, KeyExtractor2, JoinFunction, HashFunction, UseLocationDetection >::StartPreOp().
|
inline |
Return number of items starting in block i.
Definition at line 200 of file file.hpp.
References File::blocks_, File::GetIndexOf(), File::GetItemAt(), and File::num_items_sum_.
|
inline |
Return the number of blocks.
Definition at line 177 of file file.hpp.
References File::blocks_.
Referenced by KeepFileBlockSource::NextBlock(), KeepFileBlockSource::NextBlockUnpinned(), and KeepFileBlockSource::Prefetch().
|
inline |
Return the number of items in the file.
Definition at line 180 of file file.hpp.
References File::num_items_sum_.
Referenced by File::AppendBlock(), RebalanceNode< ValueType >::Execute(), WriteLinesOneNode< ValueType >::Execute(), ZipWithIndexNode< ValueType, ZipFunction >::Execute(), OverlapWindowNode< ValueType, Input, WindowFunction, PartialWindowFunction >::Execute(), DisjointWindowNode< ValueType, Input, WindowFunction, PartialWindowFunction >::Execute(), ReduceByHashPostPhase< TableItem, Key, ValueType, KeyExtractor, ReduceFunction, thrill::api::ReduceNode::Emitter, VolatileKey, ReduceConfig, thrill::core::ReduceByHash, KeyEqualFunction >::Flush(), BlockQueue::GetBlockSource(), File::GetIndexOf(), CacheNode< ValueType >::NumItems(), RebalanceNode< ValueType >::OnPreOpFile(), CacheNode< ValueType >::OnPreOpFile(), ZipWithIndexNode< ValueType, ZipFunction >::OnPreOpFile(), BaseWindowNode< ValueType, Input, WindowFunction, PartialWindowFunction >::OnPreOpFile(), SortNode< ValueType, CompareFunction, SortAlgorithm, Stable >::OnPreOpFile(), PrefixSumNode< ValueType, SumFunction, Inclusive >::PushData(), ZipWithIndexNode< ValueType, ZipFunction >::PushData(), OverlapWindowNode< ValueType, Input, WindowFunction, PartialWindowFunction >::PushData(), and DisjointWindowNode< ValueType, Input, WindowFunction, PartialWindowFunction >::PushData().
non-copyable: delete assignment operator
Referenced by FileBlockSink::FileBlockSink().
std::string ReadComplete | ( | ) | const |
Read complete File into a std::string, obviously, this should only be used for debugging!
Definition at line 87 of file file.cpp.
References File::blocks_.
|
inline |
change dia_id after construction (needed because it may be unknown at construction)
Definition at line 252 of file file.hpp.
References File::dia_id_.
Referenced by BlockQueue::set_dia_id().
|
inline |
Return the number of bytes of user data in this file.
Definition at line 188 of file file.hpp.
References File::size_bytes_.
|
friend |
Output the Block objects contained in this File.
Definition at line 94 of file file.cpp.
Referenced by File::GetIndexOf().
|
static |
boolean flag whether to check if AllocateByteBlock can fail in any subclass (if false: accelerate BlockWriter to not be able to cope with nullptr).
|
private |
container holding Blocks and thus shared pointers to all byte blocks.
Definition at line 262 of file file.hpp.
Referenced by File::AppendBlock(), File::block(), File::blocks(), File::Clear(), File::Copy(), File::empty(), File::GetReaderAt(), File::ItemsStartIn(), ConsumeFileBlockSource::NextBlock(), ConsumeFileBlockSource::NextBlockUnpinned(), File::num_blocks(), thrill::data::operator<<(), ConsumeFileBlockSource::Prefetch(), and File::ReadComplete().
|
static |
|
private |
optionally associated DIANode id
Definition at line 259 of file file.hpp.
Referenced by File::Copy(), File::set_dia_id(), and File::~File().
|
private |
|
private |
inclusive prefixsum of number of elements of blocks, hence num_items_sum_[i] is the number of items starting in all blocks preceding and including the i-th block.
Definition at line 267 of file file.hpp.
Referenced by File::AppendBlock(), File::Clear(), File::Copy(), File::GetReaderAt(), File::ItemsStartIn(), and File::num_items().
|
private |
Total size of this file in bytes. Sum of all block sizes.
Definition at line 270 of file file.hpp.
Referenced by File::AppendBlock(), File::Clear(), File::Copy(), and File::size_bytes().
|
private |
Total number of bytes stored in the File by a Writer: for stats, never decreases.
Definition at line 274 of file file.hpp.
Referenced by File::AppendBlock(), File::Copy(), and File::~File().
|
private |
Total number of items stored in the File by a Writer: for stats, never decreases.
Definition at line 278 of file file.hpp.
Referenced by File::AppendBlock(), File::Copy(), and File::~File().