Thrill  0.1
SampleNode< ValueType > Class Template Reference (final)

Detailed Description

template<typename ValueType>
class thrill::api::SampleNode< ValueType >

A DIANode which performs sampling without replacement.

The implementation is an adaptation of Algorithm P from Sanders, Lamm, Hübschle-Schneider, Schrade, Dachsbacher, ACM TOMS 2017: "Efficient Random Sampling - Parallel, Vectorized, Cache-Efficient, and Online". The modification lies in how samples are assigned to workers. Instead of doing log(num_workers) splits to assign samples to ranges of workers, this implementation does O(log(input_size)) splits to assign samples to input ranges. Each worker only computes the splits for ranges that overlap its local input range, and then adds up the sample counts of the ranges that are fully contained in its local input range. This ensures consistency across workers while requiring only a single prefix-sum and two scalar broadcasts.
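The range-splitting idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in sample.hpp: the names draw_left_count and local_sample_count are hypothetical, the hypergeometric draw is a simple O(samples) simulation rather than a fast sampler such as common::hypergeometric, and seeding each split from (seed, range_begin, range_end) via std::seed_seq is one assumed way to make every worker reproduce the same split.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper: when `samples` items are drawn without replacement
// from `total` items, return how many fall into the first `left_size` items.
// Simple O(samples) simulation of a hypergeometric draw.
inline std::size_t draw_left_count(std::size_t total, std::size_t left_size,
                                   std::size_t samples, std::mt19937_64& rng) {
    assert(left_size <= total && samples <= total);
    std::size_t left_remaining = left_size, left_count = 0;
    for (std::size_t i = 0; i < samples; ++i) {
        // total - i items are still undrawn; pick one of them uniformly
        std::uniform_int_distribution<std::size_t> pick(0, total - i - 1);
        if (pick(rng) < left_remaining) {
            --left_remaining;
            ++left_count;
        }
    }
    return left_count;
}

// Hypothetical sketch: number of samples that the worker owning the input
// range [my_begin, my_end) receives out of `samples` drawn from
// [range_begin, range_end).
inline std::size_t local_sample_count(std::size_t my_begin, std::size_t my_end,
                                      std::size_t range_begin,
                                      std::size_t range_end,
                                      std::size_t samples, std::size_t seed) {
    if (samples == 0 || my_begin >= my_end || range_begin >= range_end)
        return 0;
    // range does not overlap the local input at all
    if (range_end <= my_begin || range_begin >= my_end) return 0;
    // range lies fully inside the local input: all its samples are local
    if (my_begin <= range_begin && range_end <= my_end) return samples;

    // partial overlap: split the range in half and distribute the samples
    const std::size_t mid = range_begin + (range_end - range_begin) / 2;
    // seed depends only on the range, so every worker draws the same split
    std::seed_seq seq{seed, range_begin, range_end};
    std::mt19937_64 rng(seq);
    const std::size_t left = draw_left_count(
        range_end - range_begin, mid - range_begin, samples, rng);

    return local_sample_count(my_begin, my_end, range_begin, mid, left, seed)
         + local_sample_count(my_begin, my_end, mid, range_end,
                              samples - left, seed);
}
```

Because every split is derived from the range boundaries alone, workers that call local_sample_count with the same seed and global range but different local ranges obtain counts that add up to the global sample size, without exchanging the splits themselves.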

Definition at line 50 of file sample.hpp.

#include <sample.hpp>

Public Member Functions

template<typename ParentDIA >
 SampleNode (const ParentDIA &parent, size_t sample_size)
 
void Dispose () final
 Virtual clear method. Triggers actual disposing in sub-classes.
 
void Execute () final
 Virtual execution method. Triggers actual computation in sub-classes.
 
void PushData (bool consume) final
 Virtual method for pushing data. Triggers actual pushing in sub-classes.
 
- Public Member Functions inherited from DOpNode< ValueType >
 DOpNode (Context &ctx, const char *label, const std::initializer_list< size_t > &parent_ids, const std::initializer_list< DIABasePtr > &parents)
 Constructor for a DOpNode, which sets references to the parent nodes.
 
 DOpNode (Context &ctx, const char *label, std::vector< size_t > &&parent_ids, std::vector< DIABasePtr > &&parents)
 Constructor for a DOpNode, which sets references to the parent nodes.
 
- Public Member Functions inherited from DIANode< ValueType >
 DIANode (Context &ctx, const char *label, const std::initializer_list< size_t > &parent_ids, const std::initializer_list< DIABasePtr > &parents)
 Constructor for a DIANode, which sets references to the parent nodes.
 
 DIANode (Context &ctx, const char *label, std::vector< size_t > &&parent_ids, std::vector< DIABasePtr > &&parents)
 Constructor for a DIANode, which sets references to the parent nodes.
 
virtual void AddChild (DIABase *node, const Callback &callback=Callback(), size_t parent_index=0)
 Enables children to push their "folded" function chains to their parent.
 
std::vector< DIABase * > children () const override
 Returns the children of this DIABase.
 
void PushFile (data::File &file, bool consume) const
 
void PushItem (const ValueType &item) const
 Method for derived classes to Push a single item to all children.
 
void RemoveAllChildren () override
 
void RemoveChild (DIABase *node) override
 
void RunPushData () override
 
- Public Member Functions inherited from DIABase
 DIABase (Context &ctx, const char *label, const std::initializer_list< size_t > &parent_ids, const std::initializer_list< DIABasePtr > &parents)
 The constructor for a DIABase.
 
 DIABase (Context &ctx, const char *label, std::vector< size_t > &&parent_ids, std::vector< DIABasePtr > &&parents)
 The constructor for a DIABase.
 
 DIABase (const DIABase &)=delete
 non-copyable: delete copy-constructor
 
 DIABase (DIABase &&)=default
 move-constructor: default
 
virtual ~DIABase ()
 Virtual destructor for a DIABase.
 
virtual size_t consume_counter () const
 Returns consume_counter_.
 
Context & context ()
 Returns the api::Context of this DIABase.
 
virtual void DecConsumeCounter (size_t counter)
 
const size_t & dia_id () const
 return unique id of DIANode subclass as stored by StatsNode
 
virtual bool ForwardDataOnly () const
 
virtual void IncConsumeCounter (size_t counter)
 
const char * label () const
 return label() of DIANode subclass as stored by StatsNode
 
mem::Manager & mem_manager ()
 Return the Context's memory manager.
 
DIABase & operator= (const DIABase &)=delete
 non-copyable: delete assignment operator
 
DIABase & operator= (DIABase &&)=default
 move-assignment operator: default
 
std::vector< size_t > parent_ids () const
 Returns the ids of the parents of this DIABase.
 
const std::vector< DIABasePtr > & parents () const
 Returns the parents of this DIABase.
 
void RemoveParent (DIABase *p)
 Remove a parent.
 
virtual bool RequireParentPushData (size_t) const
 
void RunScope ()
 
void set_mem_limit (const DIAMemUse &mem_limit)
 
void set_state (const DIAState &state)
 
virtual void SetConsumeCounter (size_t counter)
 
DIAState state () const
 
virtual DIAMemUse PreOpMemUse ()
 Amount of RAM used by PreOp after StartPreOp()
 
virtual void StartPreOp (size_t)
 Virtual method for preparing start of PushData.
 
virtual bool OnPreOpFile (const data::File &, size_t)
 
virtual void StopPreOp (size_t)
 Virtual method for preparing end of PushData.
 
virtual DIAMemUse ExecuteMemUse ()
 Amount of RAM used by Execute()
 
virtual DIAMemUse PushDataMemUse ()
 Amount of RAM used by PushData()
 
- Public Member Functions inherited from ReferenceCounter
 ReferenceCounter () noexcept
 new objects have zero reference count
 
 ReferenceCounter (const ReferenceCounter &) noexcept
 copying still creates a new object with zero reference count
 
 ~ReferenceCounter ()
 
bool dec_reference () const noexcept
 Call whenever resetting a pointer to the object.
 
void inc_reference () const noexcept
 Call whenever setting a pointer to the object.
 
ReferenceCounter & operator= (const ReferenceCounter &) noexcept
 assignment operator, leaves pointers unchanged
 
size_t reference_count () const noexcept
 Return the number of references to this object (for debugging)
 
bool unique () const noexcept
 Test if the ReferenceCounter is referenced by only one CountingPtr.
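
The inherited ReferenceCounter members follow the usual intrusive reference-counting pattern. The class below is a hypothetical stand-in (RefCountedSketch is not part of Thrill or tlx) that mirrors the member names and briefs listed above. That dec_reference() returns true when the last reference is dropped is an assumption suggested by its bool return type.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch of an intrusive reference counter; not the tlx code.
class RefCountedSketch {
public:
    // new objects have zero reference count
    RefCountedSketch() noexcept : count_(0) {}
    // copying creates a new object that again starts at zero references
    RefCountedSketch(const RefCountedSketch&) noexcept : count_(0) {}
    // assignment copies no counter state; existing references stay valid
    RefCountedSketch& operator=(const RefCountedSketch&) noexcept {
        return *this;
    }

    // call whenever a pointer is set to this object
    void inc_reference() const noexcept { ++count_; }
    // call whenever a pointer to this object is reset;
    // assumed to return true once no references remain
    bool dec_reference() const noexcept { return --count_ == 0; }

    std::size_t reference_count() const noexcept { return count_.load(); }
    bool unique() const noexcept { return count_.load() == 1; }

private:
    // mutable so that const objects can still be reference counted
    mutable std::atomic<std::size_t> count_;
};
```

A smart pointer built on such a counter would call inc_reference() when it starts pointing at an object and delete the object when dec_reference() reports that no references remain.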
 

Private Types

using Super = DOpNode< ValueType >
 

Private Member Functions

size_t calc_local_samples (size_t my_begin, size_t my_end, size_t range_begin, size_t range_end, size_t sample_size, size_t seed)
 
size_t hash_combine (size_t seed, size_t v)
 

Private Attributes

common::StatsTimerStopped comm_timer_
 
common::hypergeometric hyp_
 Hypergeometric distribution to calculate local sample sizes.
 
size_t local_samples_
 
size_t local_size_
 Local input size. Documented together with sample_size_ (number of samples to draw globally) and local_samples_ (number of samples to draw locally).
 
common::StatsTimerStopped local_timer_
 Timers for local work and communication.
 
const bool parent_stack_empty_
 Whether the parent stack is empty.
 
std::mt19937_64 rng_ { std::random_device { } () }
 Random generator for reservoir sampler.
 
size_t sample_size_
 
common::ReservoirSamplingFast< ValueType, decltype(rng_)> sampler_
 Reservoir sampler for pre-op.
 
std::vector< ValueType > samples_
 local samples
 

Static Private Attributes

static constexpr bool debug = false
 

Additional Inherited Members

- Public Types inherited from DOpNode< ValueType >
using Super = DIANode< ValueType >
 
- Public Types inherited from DIANode< ValueType >
using Callback = tlx::delegate< void(const ValueType &)>
 
- Public Types inherited from DIABase
using DIABasePtr = tlx::CountingPtr< DIABase >
 
- Public Attributes inherited from DIABase
common::JsonLogger logger_
 
- Static Public Attributes inherited from DIABase
static constexpr size_t kNeverConsume = static_cast<size_t>(-1)
 Never fully consume.
 
- Protected Attributes inherited from DIANode< ValueType >
std::vector< Child > children_
 Callback functions from the child nodes.
 
- Protected Attributes inherited from DIABase
Context & context_
 associated Context
 
const size_t dia_id_
 DIA serial id.
 
const char *const label_
 DOp node static label.
 
DIAState state_ = DIAState::NEW
 State of the DIANode. State is NEW on creation.
 
std::vector< DIABasePtr > parents_
 Parents of this DIABase.
 
DIAMemUse mem_limit_ = 0
 
size_t consume_counter_ = 1
 

Member Typedef Documentation

◆ Super

using Super = DOpNode<ValueType>
private

Definition at line 54 of file sample.hpp.

Constructor & Destructor Documentation

◆ SampleNode()

Member Function Documentation

◆ calc_local_samples()

size_t calc_local_samples ( size_t  my_begin,
size_t  my_end,
size_t  range_begin,
size_t  range_end,
size_t  sample_size,
size_t  seed 
)
inline, private

◆ Dispose()

void Dispose ( )
inline, final, virtual

Virtual clear method. Triggers actual disposing in sub-classes.

Reimplemented from DIABase.

Definition at line 165 of file sample.hpp.

References SampleNode< ValueType >::samples_, and tlx::vector_free().

◆ Execute()

◆ hash_combine()

size_t hash_combine ( size_t  seed,
size_t  v 
)
inline, private

Definition at line 170 of file sample.hpp.

Referenced by SampleNode< ValueType >::calc_local_samples().
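
This page gives only the signature of hash_combine(). A common way to fold a value into an existing seed is the Boost-style combiner sketched below. Whether sample.hpp uses exactly this formula is an assumption, and the name hash_combine_sketch is hypothetical.

```cpp
#include <cstddef>

// Hypothetical Boost-style combiner: mixes `v` into `seed` so that the
// result depends on both values and on their order.
constexpr std::size_t hash_combine_sketch(std::size_t seed, std::size_t v) {
    return seed ^ (v + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// The function is pure, so equal inputs always give equal outputs. This is
// what allows different workers to derive identical seeds independently.
static_assert(hash_combine_sketch(1, 2) == hash_combine_sketch(1, 2),
              "deterministic");
static_assert(hash_combine_sketch(1, 2) != hash_combine_sketch(2, 1),
              "order-sensitive for this pair of inputs");
```

In a recursive scheme such as calc_local_samples(), a combiner like this can derive a distinct seed for each sub-range from the parent seed and the range boundaries.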

◆ PushData()

void PushData ( bool  consume)
inline, final, virtual

Member Data Documentation

◆ comm_timer_

common::StatsTimerStopped comm_timer_
private

◆ debug

constexpr bool debug = false
static, private

Definition at line 52 of file sample.hpp.

◆ hyp_

common::hypergeometric hyp_
private

Hypergeometric distribution to calculate local sample sizes.

Definition at line 235 of file sample.hpp.

Referenced by SampleNode< ValueType >::calc_local_samples(), and SampleNode< ValueType >::SampleNode().

◆ local_samples_

size_t local_samples_
private

◆ local_size_

size_t local_size_
private

Local input size. Documented together with sample_size_ (number of samples to draw globally) and local_samples_ (number of samples to draw locally).

Definition at line 231 of file sample.hpp.

Referenced by SampleNode< ValueType >::Execute(), and SampleNode< ValueType >::SampleNode().

◆ local_timer_

common::StatsTimerStopped local_timer_
private

Timers for local work and communication.

Definition at line 241 of file sample.hpp.

Referenced by SampleNode< ValueType >::Execute(), and SampleNode< ValueType >::PushData().

◆ parent_stack_empty_

const bool parent_stack_empty_
private

Whether the parent stack is empty.

Definition at line 243 of file sample.hpp.

Referenced by SampleNode< ValueType >::SampleNode().

◆ rng_

std::mt19937_64 rng_ { std::random_device { } () }
private

Random generator for reservoir sampler.

Definition at line 237 of file sample.hpp.

Referenced by SampleNode< ValueType >::PushData(), and SampleNode< ValueType >::SampleNode().

◆ sample_size_

size_t sample_size_
private

◆ sampler_

common::ReservoirSamplingFast<ValueType, decltype(rng_)> sampler_
private

Reservoir sampler for pre-op.

Definition at line 239 of file sample.hpp.

Referenced by SampleNode< ValueType >::Execute(), and SampleNode< ValueType >::SampleNode().
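
For readers unfamiliar with reservoir sampling, the sketch below shows the classic Algorithm R, which keeps a uniform random sample of fixed size from a stream of unknown length. It is a simplified illustration, not common::ReservoirSamplingFast, which (as its name suggests) is presumably a faster variant. The class name ReservoirSketch and its interface are hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch of Algorithm R: after n calls to add(), every item
// seen so far is in the reservoir with probability capacity / n.
template <typename T, typename Rng = std::mt19937_64>
class ReservoirSketch {
public:
    ReservoirSketch(std::size_t capacity, Rng& rng)
        : capacity_(capacity), rng_(rng) {}

    void add(const T& item) {
        ++seen_;
        if (reservoir_.size() < capacity_) {
            // reservoir not yet full: keep every item
            reservoir_.push_back(item);
            return;
        }
        // pick a slot in [0, seen_ - 1]; replace only if it is in range
        std::uniform_int_distribution<std::size_t> pick(0, seen_ - 1);
        const std::size_t slot = pick(rng_);
        if (slot < capacity_) reservoir_[slot] = item;
    }

    const std::vector<T>& samples() const { return reservoir_; }
    std::size_t seen() const { return seen_; }

private:
    std::size_t capacity_;
    Rng& rng_;               // generator is owned by the caller
    std::size_t seen_ = 0;   // number of items offered so far
    std::vector<T> reservoir_;
};
```

Passing the generator by reference mirrors the decltype(rng_) template argument of sampler_ above, which suggests that the sampler shares the node's rng_ rather than owning its own generator.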

◆ samples_

std::vector<ValueType> samples_
private

The documentation for this class was generated from the following file: sample.hpp