Thrill  0.1
SampleNode< ValueType > Class Template Reference (final)

Detailed Description

template<typename ValueType>
class thrill::api::SampleNode< ValueType >

A DIANode which performs sampling without replacement.

The implementation adapts Algorithm P from Sanders, Lamm, Hübschle-Schneider, Schrade, Dachsbacher, ACM TOMS 2017: "Efficient Random Sampling - Parallel, Vectorized, Cache-Efficient, and Online". The modification lies in how samples are assigned to workers: instead of performing O(log(num_workers)) splits to assign samples to ranges of workers, it performs O(log(input_size)) splits to assign samples to input ranges. Each worker only computes the ranges that overlap its local input range, and then adds up the sample counts of the ranges that are fully contained in its local input range. This ensures consistency while requiring only a single prefix-sum and two scalar broadcasts.
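The range-splitting idea described above can be illustrated with a minimal, self-contained sketch. This is not the code in sample.hpp: the names `HypergeometricDraw` and `LocalSampleCount`, the naive O(k) hypergeometric draw, and the way a per-range seed is derived are all assumptions made for illustration. The sketch shows only how a worker can determine its local sample count from a shared seed; the prefix-sum and broadcasts mentioned above are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Naive hypergeometric draw (illustrative, O(k)): of k items drawn without
// replacement from `total` items, how many land among the first `left` items?
inline std::size_t HypergeometricDraw(std::mt19937_64& rng, std::size_t left,
                                      std::size_t total, std::size_t k) {
    assert(left <= total && k <= total);
    std::size_t hits = 0;
    for (std::size_t i = 0; i < k; ++i, --total) {
        std::uniform_int_distribution<std::size_t> pick(0, total - 1);
        if (pick(rng) < left) { ++hits; --left; }
    }
    return hits;
}

// Number of samples that fall into the local input range [lo, hi) when k
// samples are spread over the global input range [begin, end).
// Every worker calls this with the same seed, so all workers see the same
// split at every range they visit (hypothetical seeding scheme).
inline std::size_t LocalSampleCount(std::size_t begin, std::size_t end,
                                    std::size_t k, std::size_t lo,
                                    std::size_t hi, std::uint64_t seed) {
    if (k == 0 || begin >= hi || end <= lo) return 0;  // no overlap
    if (lo <= begin && end <= hi) return k;            // fully contained
    // Partial overlap implies at least two items, so mid splits properly.
    const std::size_t mid = begin + (end - begin) / 2;
    // Seed depends only on the shared seed and the range being split.
    std::mt19937_64 rng(seed ^ (begin * 0x9E3779B97F4A7C15ull + end));
    const std::size_t left_k =
        HypergeometricDraw(rng, mid - begin, end - begin, k);
    return LocalSampleCount(begin, mid, left_k, lo, hi, seed) +
           LocalSampleCount(mid, end, k - left_k, lo, hi, seed);
}
```

Because each split is a deterministic function of the seed and the range, workers whose local ranges partition the input obtain counts that sum to exactly k without exchanging the individual splits. Note that `std::uniform_int_distribution` is not guaranteed to produce identical values across standard library implementations, so the sketch assumes all workers run the same binary.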

Definition at line 49 of file sample.hpp.

#include <sample.hpp>

Public Member Functions

template<typename ParentDIA >
 SampleNode (const ParentDIA &parent, size_t sample_size)
 
- Public Member Functions inherited from DOpNode< ValueType >
 DOpNode (Context &ctx, const char *label, const std::initializer_list< size_t > &parent_ids, const std::initializer_list< DIABasePtr > &parents)
 Constructor for a DOpNode, which sets references to the parent nodes. More...
 
 DOpNode (Context &ctx, const char *label, std::vector< size_t > &&parent_ids, std::vector< DIABasePtr > &&parents)
 Constructor for a DOpNode, which sets references to the parent nodes. More...
 
- Public Member Functions inherited from DIANode< ValueType >
 DIANode (Context &ctx, const char *label, const std::initializer_list< size_t > &parent_ids, const std::initializer_list< DIABasePtr > &parents)
 Constructor for a DIANode, which sets references to the parent nodes. More...
 
 DIANode (Context &ctx, const char *label, std::vector< size_t > &&parent_ids, std::vector< DIABasePtr > &&parents)
 Constructor for a DIANode, which sets references to the parent nodes. More...
 
virtual void AddChild (DIABase *node, const Callback &callback=Callback(), size_t parent_index=0)
 Enables children to push their "folded" function chains to their parent. More...
 
std::vector< DIABase * > children () const override
 Returns the children of this DIABase. More...
 
void PushFile (data::File &file, bool consume) const
 
void PushItem (const ValueType &item) const
 Method for derived classes to Push a single item to all children. More...
 
void RemoveAllChildren () override
 
void RemoveChild (DIABase *node) override
 
void RunPushData () override
 
- Public Member Functions inherited from DIABase
 DIABase (Context &ctx, const char *label, const std::initializer_list< size_t > &parent_ids, const std::initializer_list< DIABasePtr > &parents)
 The constructor for a DIABase. More...
 
 DIABase (Context &ctx, const char *label, std::vector< size_t > &&parent_ids, std::vector< DIABasePtr > &&parents)
 The constructor for a DIABase. More...
 
 DIABase (const DIABase &)=delete
 non-copyable: delete copy-constructor More...
 
 DIABase (DIABase &&)=default
 move-constructor: default More...
 
virtual ~DIABase ()
 Virtual destructor for a DIABase. More...
 
virtual size_t consume_counter () const
 Returns consume_counter_. More...
 
Context & context ()
 Returns the api::Context of this DIABase. More...
 
virtual void DecConsumeCounter (size_t counter)
 
virtual bool ForwardDataOnly () const
 
const size_t & id () const
 return unique id() of DIANode subclass as stored by StatsNode More...
 
virtual void IncConsumeCounter (size_t counter)
 
const char * label () const
 return label() of DIANode subclass as stored by StatsNode More...
 
mem::Manager & mem_manager ()
 Return the Context's memory manager. More...
 
DIABase & operator= (const DIABase &)=delete
 non-copyable: delete assignment operator More...
 
DIABase & operator= (DIABase &&)=default
 move-assignment operator: default More...
 
std::vector< size_t > parent_ids () const
 Returns the ids of the parents of this DIABase. More...
 
const std::vector< DIABasePtr > & parents () const
 Returns the parents of this DIABase. More...
 
void RemoveParent (DIABase *p)
 Remove a parent. More...
 
virtual bool RequireParentPushData (size_t) const
 
void RunScope ()
 
void set_mem_limit (const DIAMemUse &mem_limit)
 
void set_state (const DIAState &state)
 
virtual void SetConsumeCounter (size_t counter)
 
DIAState state () const
 
virtual DIAMemUse PreOpMemUse ()
 Amount of RAM used by PreOp after StartPreOp() More...
 
virtual void StartPreOp (size_t)
 Virtual method for preparing start of PushData. More...
 
virtual bool OnPreOpFile (const data::File &, size_t)
 
virtual void StopPreOp (size_t)
 Virtual method for preparing end of PushData. More...
 
virtual DIAMemUse ExecuteMemUse ()
 Amount of RAM used by Execute() More...
 
virtual void Execute ()=0
 Virtual execution method. Triggers actual computation in sub-classes. More...
 
virtual DIAMemUse PushDataMemUse ()
 Amount of RAM used by PushData() More...
 
virtual void PushData (bool consume)=0
 Virtual method for pushing data. Triggers actual pushing in sub-classes. More...
 
virtual void Dispose ()
 Virtual clear method. Triggers actual disposing in sub-classes. More...
 
- Public Member Functions inherited from ReferenceCounter
 ReferenceCounter () noexcept
 new objects have zero reference count More...
 
 ReferenceCounter (const ReferenceCounter &) noexcept
 copying still creates a new object with zero reference count More...
 
 ~ReferenceCounter ()
 
bool dec_reference () const noexcept
 Call whenever resetting (i.e. More...
 
void inc_reference () const noexcept
 Call whenever setting a pointer to the object. More...
 
ReferenceCounter & operator= (const ReferenceCounter &) noexcept
 assignment operator, leaves pointers unchanged More...
 
size_t reference_count () const noexcept
 Return the number of references to this object (for debugging) More...
 
bool unique () const noexcept
 Test if the ReferenceCounter is referenced by only one CountingPtr. More...
 

Public Attributes

common::StatsTimerStopped comm_timer_
 
common::hypergeometric hyp_
 Hypergeometric distribution to calculate local sample sizes. More...
 
common::StatsTimerStopped local_timer_
 Timers for local work and communication. More...
 
const bool parent_stack_empty_
 Whether the parent stack is empty. More...
 
std::mt19937_64 rng_ { std::random_device { } () }
 Random generator for reservoir sampler. More...
 
common::ReservoirSamplingFast< ValueType, decltype(rng_)> sampler_
 Reservoir sampler for pre-op. More...
 
std::vector< ValueType > samples_
 local samples More...
 
- Public Attributes inherited from DIABase
common::JsonLogger logger_
 

Private Types

using Super = DOpNode< ValueType >
 

Static Private Attributes

static constexpr bool debug = false
 

Additional Inherited Members

- Public Types inherited from DOpNode< ValueType >
using Super = DIANode< ValueType >
 
- Public Types inherited from DIANode< ValueType >
using Callback = tlx::delegate< void(const ValueType &)>
 
- Public Types inherited from DIABase
using DIABasePtr = tlx::CountingPtr< DIABase >
 
- Static Public Attributes inherited from DIABase
static constexpr size_t kNeverConsume = static_cast<size_t>(-1)
 Never full consume. More...
 
- Protected Attributes inherited from DIANode< ValueType >
std::vector< Child > children_
 Callback functions from the child nodes. More...
 
- Protected Attributes inherited from DIABase
Context & context_
 associated Context More...
 
const size_t id_
 DIA serial id. More...
 
const char *const label_
 DOp node static label. More...
 
DIAState state_ = DIAState::NEW
 State of the DIANode. State is NEW on creation. More...
 
std::vector< DIABasePtr > parents_
 Parents of this DIABase. More...
 
DIAMemUse mem_limit_ = 0
 
size_t consume_counter_ = 1
 

Member Typedef Documentation

using Super = DOpNode<ValueType>
private

Definition at line 53 of file sample.hpp.

Constructor & Destructor Documentation

SampleNode ( const ParentDIA &  parent,
size_t  sample_size 
)
inline

Definition at line 58 of file sample.hpp.

Member Data Documentation

common::StatsTimerStopped comm_timer_

Definition at line 241 of file sample.hpp.

constexpr bool debug = false
static private

Definition at line 51 of file sample.hpp.

common::hypergeometric hyp_

Hypergeometric distribution to calculate local sample sizes.

Definition at line 235 of file sample.hpp.

common::StatsTimerStopped local_timer_

Timers for local work and communication.

Definition at line 241 of file sample.hpp.

const bool parent_stack_empty_

Whether the parent stack is empty.

Definition at line 243 of file sample.hpp.

std::mt19937_64 rng_ { std::random_device { } () }

Random generator for reservoir sampler.

Definition at line 237 of file sample.hpp.

common::ReservoirSamplingFast<ValueType, decltype(rng_)> sampler_

Reservoir sampler for pre-op.

Definition at line 239 of file sample.hpp.
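
For readers unfamiliar with reservoir sampling, the following minimal sketch shows the basic technique (the classic "Algorithm R"). It is not `common::ReservoirSamplingFast`, whose interface and internals are not documented on this page; the class name `SimpleReservoir` and its members are hypothetical. Like `sampler_`, it borrows an external random generator and writes into an external sample vector, which is an assumption suggested by the `decltype(rng_)` template argument above.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a reservoir sampler (plain Algorithm R).
// Keeps a uniform sample of at most `capacity` items from a stream.
template <typename T, typename Rng = std::mt19937_64>
class SimpleReservoir {
public:
    SimpleReservoir(std::size_t capacity, std::vector<T>& samples, Rng& rng)
        : capacity_(capacity), samples_(samples), rng_(rng) {}

    // Offer one stream item to the reservoir.
    void Add(const T& item) {
        ++count_;
        if (samples_.size() < capacity_) {
            // Reservoir not yet full: keep every item.
            samples_.push_back(item);
            return;
        }
        // Replace a random slot with probability capacity / count.
        std::uniform_int_distribution<std::size_t> pick(0, count_ - 1);
        const std::size_t pos = pick(rng_);
        if (pos < capacity_) samples_[pos] = item;
    }

    // Number of items seen so far.
    std::size_t count() const { return count_; }

private:
    std::size_t capacity_;
    std::vector<T>& samples_;
    Rng& rng_;
    std::size_t count_ = 0;
};
```

A caller would construct the sampler with a target size, feed it every item during the pre-op phase, and then read the sample from the vector it was given. Algorithm R draws one random number per item once the reservoir is full; faster variants skip over items instead, which is presumably what the "Fast" in the Thrill class name refers to.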

std::vector<ValueType> samples_

local samples

Definition at line 59 of file sample.hpp.


The documentation for this class was generated from the following file: