Thrill  0.1
File Class Reference

Detailed Description

A File is an ordered sequence of Block objects for storing items.

By using the Block indirection, the File can be composed using existing Block objects (via reference counting), but only contain a subset of the items in those Blocks. This may be used for Zip() and Repartition().
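
The Block indirection described above can be illustrated with a small self-contained model. This is only a sketch: ToyStorage, ToyBlock and ToyFile are hypothetical names, and std::shared_ptr stands in for the reference counting mentioned above. It is not how the class is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reference-counted byte block: shared storage.
using ToyStorage = std::shared_ptr<const std::vector<int>>;

// Hypothetical stand-in for a Block: a view [begin, end) into shared storage.
struct ToyBlock {
    ToyStorage storage;
    std::size_t begin;
    std::size_t end;

    std::size_t num_items() const { return end - begin; }
};

// Hypothetical stand-in for a File: an ordered sequence of block views.
struct ToyFile {
    std::vector<ToyBlock> blocks;

    // Appending copies only the view; the storage itself is shared.
    void AppendBlock(const ToyBlock& b) {
        assert(b.begin <= b.end && b.end <= b.storage->size());
        blocks.push_back(b);
    }

    std::size_t num_items() const {
        std::size_t n = 0;
        for (const ToyBlock& b : blocks) n += b.num_items();
        return n;
    }

    // Collect all items visible through the block views, in order.
    std::vector<int> Items() const {
        std::vector<int> out;
        for (const ToyBlock& b : blocks) {
            auto first = b.storage->begin() + static_cast<std::ptrdiff_t>(b.begin);
            auto last = b.storage->begin() + static_cast<std::ptrdiff_t>(b.end);
            out.insert(out.end(), first, last);
        }
        return out;
    }
};
```

Two ToyFile objects can reference the same storage while exposing different item ranges, which is the property that operations such as Zip() and Repartition() can exploit.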

A File can be written using a BlockWriter instance, which is delivered by GetWriter(). Thereafter it can be read (multiple times) using a BlockReader, delivered by GetReader().

Using a prefix sum over the number of items in each Block, one can seek to the Block containing any item offset in log_2(Blocks) time, though seeking within the Block proceeds sequentially.
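
The logarithmic seek can be sketched as a binary search over the inclusive prefix sum. FindBlockOfItem and OffsetInBlock are hypothetical helper names used for illustration, not members of File, and a std::vector is assumed in place of the class's own container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the block containing item `index`, given the inclusive
// prefix sum of items per block (hypothetical helper, not part of Thrill).
std::size_t FindBlockOfItem(const std::vector<std::size_t>& inclusive_sum,
                            std::size_t index) {
    assert(!inclusive_sum.empty() && index < inclusive_sum.back());
    // The first block whose inclusive sum exceeds `index` contains the item.
    auto it = std::upper_bound(inclusive_sum.begin(), inclusive_sum.end(), index);
    return static_cast<std::size_t>(it - inclusive_sum.begin());
}

// Returns how many items must be skipped sequentially inside that block.
std::size_t OffsetInBlock(const std::vector<std::size_t>& inclusive_sum,
                          std::size_t block, std::size_t index) {
    std::size_t before = (block == 0) ? 0 : inclusive_sum[block - 1];
    return index - before;
}
```

With per-block item counts 3, 0, 4, 3 the inclusive sums are 3, 3, 7, 10. Item 3 then lies in block 2 at offset 0, and the empty block 1 is skipped automatically.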

Definition at line 56 of file file.hpp.

#include <file.hpp>

Public Types

using ConsumeReader = BlockReader< ConsumeFileBlockSource >
 
using KeepReader = BlockReader< KeepFileBlockSource >
 
using Reader = DynBlockReader
 
using Writer = BlockWriter< FileBlockSink >
 

Public Member Functions

 File (BlockPool &block_pool, size_t local_worker_id, size_t dia_id)
 Constructor from BlockPool.
 
 File (const File &)=delete
 non-copyable: delete copy-constructor
 
 File (File &&)=default
 move-constructor: default
 
const Block & block (size_t i) const
 Return reference to a block.
 
const std::deque< Block > & blocks () const
 Returns constant reference to all Blocks in the File.
 
File Copy () const
 Return a copy of the File (explicit copy-constructor)
 
bool empty () const
 Returns true if the File is empty.
 
template<typename ItemType , typename CompareFunction = std::less<ItemType>>
size_t GetIndexOf (const ItemType &item, size_t tie, size_t left, size_t right, const CompareFunction &func=CompareFunction()) const
 Get index of the given item, or the next greater item, in this file.
 
template<typename ItemType , typename CompareFunction = std::less<ItemType>>
size_t GetIndexOf (const ItemType &item, size_t tie, const CompareFunction &less=CompareFunction()) const
 Get index of the given item, or the next greater item, in this file.
 
template<typename ItemType >
ItemType GetItemAt (size_t index) const
 
template<typename ItemType >
std::vector< Block > GetItemRange (size_t begin, size_t end) const
 
size_t ItemsStartIn (size_t i) const
 Return number of items starting in block i.
 
size_t num_blocks () const
 Return the number of blocks.
 
size_t num_items () const
 Return the number of items in the file.
 
File & operator= (const File &)=delete
 non-copyable: delete assignment operator
 
File & operator= (File &&)=default
 move-assignment operator: default
 
void set_dia_id (size_t dia_id)
 
size_t size_bytes () const
 Return the number of bytes of user data in this file.
 
Writers and Readers
Writer GetWriter (size_t block_size=default_block_size)
 Get BlockWriter.
 
Reader GetReader (bool consume, size_t prefetch_size=File::default_prefetch_size_)
 Get BlockReader or a consuming BlockReader for beginning of File.
 
KeepReader GetKeepReader (size_t prefetch_size=File::default_prefetch_size_) const
 Get BlockReader for beginning of File.
 
ConsumeReader GetConsumeReader (size_t prefetch_size=File::default_prefetch_size_)
 Get consuming BlockReader for beginning of File.
 
template<typename ItemType >
KeepReader GetReaderAt (size_t index, size_t prefetch=default_prefetch_size_) const
 Get BlockReader seeked to the corresponding item index.
 
std::string ReadComplete () const
 
- Public Member Functions inherited from BlockSink
 BlockSink (BlockPool &block_pool, size_t local_worker_id)
 constructor with reference to BlockPool
 
 BlockSink (BlockPool *block_pool, size_t local_worker_id)
 constructor with reference to BlockPool
 
 BlockSink (const BlockSink &)=default
 default copy-constructor
 
 BlockSink (BlockSink &&)=default
 move-constructor: default
 
virtual ~BlockSink ()
 required virtual destructor
 
virtual PinnedByteBlockPtr AllocateByteBlock (size_t block_size)
 
virtual void AppendPinnedBlock (PinnedBlock &&b, bool is_last_block)
 Appends the PinnedBlock.
 
BlockPoolblock_pool () const
 Returns block_pool_.
 
size_t local_worker_id () const
 local worker id to associate pinned block with
 
common::JsonLoggerlogger ()
 Returns BlockPool.logger_.
 
BlockSink & operator= (const BlockSink &)=default
 default assignment operator
 
BlockSink & operator= (BlockSink &&)=default
 move-assignment operator: default
 
virtual void ReleaseByteBlock (ByteBlockPtr &block)
 Release an unused ByteBlock with n bytes backing memory.
 
size_t workers_per_host () const
 return number of workers per host
 
- Public Member Functions inherited from ReferenceCounter
 ReferenceCounter () noexcept
 new objects have zero reference count
 
 ReferenceCounter (const ReferenceCounter &) noexcept
 copying still creates a new object with zero reference count
 
 ~ReferenceCounter ()
 
bool dec_reference () const noexcept
 Call whenever resetting (i.e. More...
 
void inc_reference () const noexcept
 Call whenever setting a pointer to the object.
 
ReferenceCounter & operator= (const ReferenceCounter &) noexcept
 assignment operator, leaves pointers unchanged
 
size_t reference_count () const noexcept
 Return the number of references to this object (for debugging)
 
bool unique () const noexcept
 Test if the ReferenceCounter is referenced by only one CountingPtr.
 

Static Public Attributes

static size_t default_prefetch_size_ = 2 * default_block_size
 
- Static Public Attributes inherited from BlockSink
static constexpr bool allocate_can_fail_ = false
 

Private Attributes

std::deque< Block > blocks_
 container holding Blocks and thus shared pointers to all byte blocks.
 
size_t dia_id_
 optionally associated DIANode id
 
size_t id_
 unique file id
 
std::deque< size_t > num_items_sum_
 
size_t size_bytes_ = 0
 Total size of this file in bytes. Sum of all block sizes.
 
size_t stats_bytes_ = 0
 
size_t stats_items_ = 0
 

Friends

std::ostream & operator<< (std::ostream &os, const File &f)
 Output the Block objects contained in this File.
 

Methods of a BlockSink

static constexpr bool allocate_can_fail_ = false
 
void AppendBlock (const Block &b)
 
void AppendBlock (Block &&b)
 
void AppendBlock (const Block &b, bool) final
 
void AppendBlock (Block &&b, bool) final
 
void Close () final
 Closes the sink. Must not be called multiple times.
 
 ~File ()
 write out stats
 
void Clear ()
 Free all Blocks in the File and deallocate vectors.
 

Additional Inherited Members

- Protected Attributes inherited from BlockSink
size_t local_worker_id_
 local worker id to associate pinned block with
 

Member Typedef Documentation

◆ ConsumeReader

Definition at line 62 of file file.hpp.

◆ KeepReader

Definition at line 61 of file file.hpp.

◆ Reader

Definition at line 60 of file file.hpp.

◆ Writer

Definition at line 59 of file file.hpp.

Constructor & Destructor Documentation

◆ File() [1/3]

File ( BlockPool & block_pool,
size_t  local_worker_id,
size_t  dia_id 
)

Constructor from BlockPool.

Definition at line 22 of file file.cpp.

◆ File() [2/3]

File ( const File & )
delete

non-copyable: delete copy-constructor

◆ File() [3/3]

File ( File &&  )
default

move-constructor: default

◆ ~File()

Member Function Documentation

◆ AppendBlock() [1/4]

void AppendBlock ( const Block & b)
inline

Append a block to this file. The block must contain the given number of items after the offset first.

Definition at line 88 of file file.hpp.

References File::blocks_, Block::num_items(), File::num_items(), File::num_items_sum_, Block::size(), File::size_bytes_, File::stats_bytes_, and File::stats_items_.

Referenced by File::AppendBlock(), CacheBlockQueueSource::NextBlock(), and ReadBinaryNode< ValueType >::ReadBinaryNode().

◆ AppendBlock() [2/4]

void AppendBlock ( Block &&  b)
inline

Append a block to this file. The block must contain the given number of items after the offset first.

Definition at line 99 of file file.hpp.

References File::blocks_, File::num_items(), File::num_items_sum_, File::size_bytes_, File::stats_bytes_, and File::stats_items_.

◆ AppendBlock() [3/4]

void AppendBlock ( const Block & b,
bool   
)
inline final virtual

Append a block to this file. The block must contain the given number of items after the offset first.

Implements BlockSink.

Definition at line 110 of file file.hpp.

References File::AppendBlock().

◆ AppendBlock() [4/4]

void AppendBlock ( Block &&  b,
bool   
)
inline final virtual

Append a block to this file. The block must contain the given number of items after the offset first.

Implements BlockSink.

Definition at line 116 of file file.hpp.

References File::AppendBlock(), File::Clear(), File::Close(), and File::~File().

◆ block()

const Block& block ( size_t  i) const
inline

Return reference to a block.

Definition at line 191 of file file.hpp.

References File::blocks_.

Referenced by KeepFileBlockSource::MakeNextBlock().

◆ blocks()

const std::deque<Block>& blocks ( ) const
inline

Returns constant reference to all Blocks in the File.

Definition at line 197 of file file.hpp.

References File::blocks_.

◆ Clear()

◆ Close()

void Close ( )
final virtual

Closes the sink. Must not be called multiple times.

Implements BlockSink.

Definition at line 52 of file file.cpp.

Referenced by File::AppendBlock().

◆ Copy()

◆ empty()

bool empty ( ) const
inline

Returns true if the File is empty.

Definition at line 185 of file file.hpp.

References File::blocks_.

◆ GetConsumeReader()

◆ GetIndexOf()

size_t GetIndexOf ( const ItemType &  item,
size_t  tie,
const CompareFunction &  less = CompareFunction() 
) const
inline

Get index of the given item, or the next greater item, in this file.

The file has to be ordered according to the given compare function. The tie value can be used to make a decision in case of many successive equal elements. The tie is compared with the local rank of the element.

WARNING: This method combines GetItemAt with a binary search and is therefore inefficient. It should be reimplemented in the near future.

Definition at line 236 of file file.hpp.

References File::GetIndexOf(), File::GetItemRange(), File::num_items(), and File::operator<<.
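
The search described above can be sketched over a plain sorted std::vector. IndexOfWithTie is a hypothetical name, and the tie rule shown here (an equal element at position mid counts as a match only if tie <= mid) is an assumption about how the tie value is compared with the local rank, not a statement of the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Binary search for `item` in a sorted sequence (hypothetical helper).
// Returns the index of the item or of the next greater item; among a run of
// equal items, positions below `tie` are treated as smaller than `item`.
template <typename Item, typename Less = std::less<Item>>
std::size_t IndexOfWithTie(const std::vector<Item>& sorted, const Item& item,
                           std::size_t tie, const Less& less = Less()) {
    std::size_t left = 0, right = sorted.size();
    while (left < right) {
        std::size_t mid = left + (right - left) / 2;
        const Item& cur = sorted[mid];
        bool equal = !less(item, cur) && !less(cur, item);
        if (less(item, cur) || (equal && tie <= mid)) {
            right = mid;      // answer is at mid or to its left
        } else {
            left = mid + 1;   // answer is strictly to the right of mid
        }
    }
    return left;
}
```

Each probe in this sketch is a constant-time vector access. In a File, each probe would be a GetItemAt call that has to locate and scan a Block, which is the cost the warning above refers to.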

◆ GetKeepReader()

File::KeepReader GetKeepReader ( size_t  prefetch_size = File::default_prefetch_size_) const

◆ GetReader()

◆ GetWriter()

◆ ItemsStartIn()

size_t ItemsStartIn ( size_t  i) const
inline

Return number of items starting in block i.

Definition at line 200 of file file.hpp.

References File::blocks_, File::GetIndexOf(), File::GetItemAt(), and File::num_items_sum_.

◆ num_blocks()

size_t num_blocks ( ) const
inline

Return the number of blocks.

Definition at line 177 of file file.hpp.

References File::blocks_.

Referenced by KeepFileBlockSource::NextBlock(), KeepFileBlockSource::NextBlockUnpinned(), and KeepFileBlockSource::Prefetch().

◆ num_items()

◆ operator=() [1/2]

File& operator= ( const File & )
delete

non-copyable: delete assignment operator

Referenced by FileBlockSink::FileBlockSink().

◆ operator=() [2/2]

File& operator= ( File &&  )
default

move-assignment operator: default

◆ ReadComplete()

std::string ReadComplete ( ) const

Read the complete File into a std::string. This should only be used for debugging.

Definition at line 87 of file file.cpp.

References File::blocks_.

◆ set_dia_id()

void set_dia_id ( size_t  dia_id)
inline

Change dia_id after construction (needed because it may be unknown at construction time).

Definition at line 252 of file file.hpp.

References File::dia_id_.

Referenced by BlockQueue::set_dia_id().

◆ size_bytes()

size_t size_bytes ( ) const
inline

Return the number of bytes of user data in this file.

Definition at line 188 of file file.hpp.

References File::size_bytes_.

Friends And Related Function Documentation

◆ operator<<

std::ostream& operator<< ( std::ostream &  os,
const File & f 
)
friend

Output the Block objects contained in this File.

Definition at line 94 of file file.cpp.

Referenced by File::GetIndexOf().

Member Data Documentation

◆ allocate_can_fail_

constexpr bool allocate_can_fail_ = false
static

Boolean flag indicating whether AllocateByteBlock can fail in any subclass. If false, BlockWriter is accelerated by not handling a nullptr result.

Definition at line 131 of file file.hpp.
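
How such a compile-time flag can remove a failure check is sketched below. NeverFailSink, BudgetSink and TryReserve are hypothetical names for illustration, and std::unique_ptr stands in for the byte block pointer type. This is not the BlockWriter code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical sink whose allocation never fails.
struct NeverFailSink {
    static constexpr bool allocate_can_fail_ = false;
    std::unique_ptr<char[]> Allocate(std::size_t n) {
        return std::unique_ptr<char[]>(new char[n]);
    }
};

// Hypothetical sink that refuses requests exceeding its remaining budget.
struct BudgetSink {
    static constexpr bool allocate_can_fail_ = true;
    std::size_t budget;
    std::unique_ptr<char[]> Allocate(std::size_t n) {
        if (n > budget) return nullptr;
        budget -= n;
        return std::unique_ptr<char[]>(new char[n]);
    }
};

// Writer-like helper: returns false if the sink could not provide memory.
template <typename Sink>
bool TryReserve(Sink& sink, std::size_t n) {
    std::unique_ptr<char[]> mem = sink.Allocate(n);
    // The flag is a compile-time constant, so for NeverFailSink the compiler
    // can drop the nullptr branch entirely.
    if (Sink::allocate_can_fail_ && !mem) return false;
    return true;
}
```

Because File sets the flag to false, a writer targeting a File could take the unchecked path under this scheme.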

◆ blocks_

◆ default_prefetch_size_

size_t default_prefetch_size_ = 2 * default_block_size
static

external static variable containing the default number of bytes to prefetch in File readers

Definition at line 66 of file file.hpp.

◆ dia_id_

size_t dia_id_
private

optionally associated DIANode id

Definition at line 259 of file file.hpp.

Referenced by File::Copy(), File::set_dia_id(), and File::~File().

◆ id_

size_t id_
private

unique file id

Definition at line 256 of file file.hpp.

Referenced by File::~File().

◆ num_items_sum_

std::deque<size_t> num_items_sum_
private

Inclusive prefix sum of the number of items per block: num_items_sum_[i] is the number of items starting in all blocks up to and including the i-th block.

Definition at line 267 of file file.hpp.

Referenced by File::AppendBlock(), File::Clear(), File::Copy(), File::GetReaderAt(), File::ItemsStartIn(), and File::num_items().
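
A minimal sketch of how an inclusive prefix sum of this kind can be maintained on append and queried per block. ItemCounts and its Append method are hypothetical names; only the meaning of the stored sums is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical bookkeeping structure mirroring the role of num_items_sum_.
struct ItemCounts {
    std::deque<std::size_t> num_items_sum;  // inclusive prefix sum

    // Record a newly appended block in which `n` items start.
    void Append(std::size_t n) {
        num_items_sum.push_back(num_items() + n);
    }

    // Total number of items: the last entry of the inclusive sum.
    std::size_t num_items() const {
        return num_items_sum.empty() ? 0 : num_items_sum.back();
    }

    // Items starting in block i: difference of adjacent prefix sums.
    std::size_t ItemsStartIn(std::size_t i) const {
        assert(i < num_items_sum.size());
        return num_items_sum[i] - (i == 0 ? 0 : num_items_sum[i - 1]);
    }
};
```

Storing the inclusive sum makes the total item count a constant-time lookup and keeps the per-block count recoverable with one subtraction.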

◆ size_bytes_

size_t size_bytes_ = 0
private

Total size of this file in bytes. Sum of all block sizes.

Definition at line 270 of file file.hpp.

Referenced by File::AppendBlock(), File::Clear(), File::Copy(), and File::size_bytes().

◆ stats_bytes_

size_t stats_bytes_ = 0
private

Total number of bytes stored in the File by a Writer: for stats, never decreases.

Definition at line 274 of file file.hpp.

Referenced by File::AppendBlock(), File::Copy(), and File::~File().

◆ stats_items_

size_t stats_items_ = 0
private

Total number of items stored in the File by a Writer: for stats, never decreases.

Definition at line 278 of file file.hpp.

Referenced by File::AppendBlock(), File::Copy(), and File::~File().


The documentation for this class was generated from the following files: