Thrill
0.1
A DIANode which performs sampling without replacement.
The implementation adapts Algorithm P from Sanders, Lamm, Hübschle-Schneider, Schrade, and Dachsbacher, "Efficient Random Sampling - Parallel, Vectorized, Cache-Efficient, and Online" (ACM TOMS, 2017). The modification lies in how samples are assigned to workers: instead of performing O(log(num_workers)) splits to assign samples to ranges of workers, it performs O(log(input_size)) splits to assign samples to input ranges. Each worker computes only the splits for ranges that overlap its local input range, and then adds up the sample counts of the ranges that are fully contained in its local input range. This keeps the assignment consistent across workers while requiring only a single prefix sum and two scalar broadcasts.
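The range-splitting idea described above can be sketched as follows. This is a minimal illustration, not the code in sample.hpp: the function names mirror the private members listed on this page, but their bodies are assumptions. In particular, `draw_left` is a hypothetical helper that simulates a hypergeometric draw naively in O(n) instead of using common::hypergeometric, and the Boost-style constant in `hash_combine` is assumed. The key point is that the generator is seeded from the range boundaries only, so every worker that splits the same range obtains the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Boost-style hash mixing (assumed; the real hash_combine may differ).
inline std::size_t hash_combine(std::size_t seed, std::size_t v) {
    return seed ^ (v + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// Hypothetical helper: of n samples drawn without replacement from a
// population of left + right items, count how many fall into the left part.
// Naive O(n) stand-in for a real hypergeometric distribution.
inline std::size_t draw_left(std::size_t left, std::size_t right,
                             std::size_t n, std::size_t seed) {
    assert(n <= left + right);
    std::mt19937_64 rng(seed);
    std::size_t taken = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uniform_int_distribution<std::size_t> pick(0, left + right - 1);
        if (pick(rng) < left) { --left; ++taken; }
        else { --right; }
    }
    return taken;
}

// Number of the sample_size samples assigned to [range_begin, range_end)
// that land in this worker's input range [my_begin, my_end).
inline std::size_t calc_local_samples(std::size_t my_begin, std::size_t my_end,
                                      std::size_t range_begin, std::size_t range_end,
                                      std::size_t sample_size, std::size_t seed) {
    // Empty ranges or nothing to sample.
    if (range_begin >= range_end || my_begin >= my_end || sample_size == 0) return 0;
    // Range does not overlap the local input at all.
    if (my_end <= range_begin || range_end <= my_begin) return 0;
    // Range lies fully inside the local input: all its samples are local.
    if (my_begin <= range_begin && range_end <= my_end) return sample_size;

    // Partial overlap: split the range in two and distribute the samples.
    const std::size_t left_size = (range_end - range_begin) / 2;
    const std::size_t right_size = (range_end - range_begin) - left_size;
    const std::size_t mid = range_begin + left_size;
    // Seed depends only on the range, so all workers draw the same split.
    const std::size_t range_seed =
        hash_combine(hash_combine(seed, range_begin), range_end);
    const std::size_t left_samples =
        draw_left(left_size, right_size, sample_size, range_seed);

    return calc_local_samples(my_begin, my_end, range_begin, mid,
                              left_samples, seed) +
           calc_local_samples(my_begin, my_end, mid, range_end,
                              sample_size - left_samples, seed);
}
```

If every worker calls `calc_local_samples` with its own `[my_begin, my_end)` and the same global range, sample size, and seed, the per-worker results sum to the global sample size without any further communication.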
Definition at line 50 of file sample.hpp.
#include <sample.hpp>
Public Member Functions | |
template<typename ParentDIA > | |
SampleNode (const ParentDIA &parent, size_t sample_size) | |
void | Dispose () final |
Virtual clear method. Triggers actual disposing in sub-classes. | |
void | Execute () final |
Virtual execution method. Triggers actual computation in sub-classes. | |
void | PushData (bool consume) final |
Virtual method for pushing data. Triggers actual pushing in sub-classes. | |
Public Member Functions inherited from DOpNode< ValueType > | |
DOpNode (Context &ctx, const char *label, const std::initializer_list< size_t > &parent_ids, const std::initializer_list< DIABasePtr > &parents) | |
Constructor for a DOpNode, which sets references to the parent nodes. | |
DOpNode (Context &ctx, const char *label, std::vector< size_t > &&parent_ids, std::vector< DIABasePtr > &&parents) | |
Constructor for a DOpNode, which sets references to the parent nodes. | |
Public Member Functions inherited from DIANode< ValueType > | |
DIANode (Context &ctx, const char *label, const std::initializer_list< size_t > &parent_ids, const std::initializer_list< DIABasePtr > &parents) | |
Constructor for a DIANode, which sets references to the parent nodes. | |
DIANode (Context &ctx, const char *label, std::vector< size_t > &&parent_ids, std::vector< DIABasePtr > &&parents) | |
Constructor for a DIANode, which sets references to the parent nodes. | |
virtual void | AddChild (DIABase *node, const Callback &callback=Callback(), size_t parent_index=0) |
Enables children to push their "folded" function chains to their parent. | |
std::vector< DIABase * > | children () const override |
Returns the children of this DIABase. | |
void | PushFile (data::File &file, bool consume) const |
void | PushItem (const ValueType &item) const |
Method for derived classes to Push a single item to all children. | |
void | RemoveAllChildren () override |
void | RemoveChild (DIABase *node) override |
void | RunPushData () override |
Public Member Functions inherited from DIABase | |
DIABase (Context &ctx, const char *label, const std::initializer_list< size_t > &parent_ids, const std::initializer_list< DIABasePtr > &parents) | |
The constructor for a DIABase. | |
DIABase (Context &ctx, const char *label, std::vector< size_t > &&parent_ids, std::vector< DIABasePtr > &&parents) | |
The constructor for a DIABase. | |
DIABase (const DIABase &)=delete | |
non-copyable: delete copy-constructor | |
DIABase (DIABase &&)=default | |
move-constructor: default | |
virtual | ~DIABase () |
Virtual destructor for a DIABase. | |
virtual size_t | consume_counter () const |
Returns consume_counter_. | |
Context & | context () |
Returns the api::Context of this DIABase. | |
virtual void | DecConsumeCounter (size_t counter) |
const size_t & | dia_id () const |
return unique id of DIANode subclass as stored by StatsNode | |
virtual bool | ForwardDataOnly () const |
virtual void | IncConsumeCounter (size_t counter) |
const char * | label () const |
return label() of DIANode subclass as stored by StatsNode | |
mem::Manager & | mem_manager () |
Return the Context's memory manager. | |
DIABase & | operator= (const DIABase &)=delete |
non-copyable: delete assignment operator | |
DIABase & | operator= (DIABase &&)=default |
move-assignment operator: default | |
std::vector< size_t > | parent_ids () const |
Returns the ids of the parents of this DIABase. | |
const std::vector< DIABasePtr > & | parents () const |
Returns the parents of this DIABase. | |
void | RemoveParent (DIABase *p) |
Remove a parent. | |
virtual bool | RequireParentPushData (size_t) const |
void | RunScope () |
void | set_mem_limit (const DIAMemUse &mem_limit) |
void | set_state (const DIAState &state) |
virtual void | SetConsumeCounter (size_t counter) |
DIAState | state () const |
virtual DIAMemUse | PreOpMemUse () |
Amount of RAM used by PreOp after StartPreOp() | |
virtual void | StartPreOp (size_t) |
Virtual method for preparing start of PushData. | |
virtual bool | OnPreOpFile (const data::File &, size_t) |
virtual void | StopPreOp (size_t) |
Virtual method for preparing end of PushData. | |
virtual DIAMemUse | ExecuteMemUse () |
Amount of RAM used by Execute() | |
virtual DIAMemUse | PushDataMemUse () |
Amount of RAM used by PushData() | |
Public Member Functions inherited from ReferenceCounter | |
ReferenceCounter () noexcept | |
new objects have zero reference count | |
ReferenceCounter (const ReferenceCounter &) noexcept | |
copying still creates a new object with zero reference count | |
~ReferenceCounter () | |
bool | dec_reference () const noexcept |
Call whenever resetting (i.e. More... | |
void | inc_reference () const noexcept |
Call whenever setting a pointer to the object. | |
ReferenceCounter & | operator= (const ReferenceCounter &) noexcept |
assignment operator, leaves pointers unchanged | |
size_t | reference_count () const noexcept |
Return the number of references to this object (for debugging) | |
bool | unique () const noexcept |
Test if the ReferenceCounter is referenced by only one CountingPtr. | |
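The counting semantics listed above (fresh and copied objects start at zero, assignment leaves the count untouched) can be illustrated with a small stand-alone class. `RefCountedSketch` is a hypothetical name and not the inherited ReferenceCounter itself; the sketch is not thread-safe, and it assumes that `dec_reference()` reports `true` once the count reaches zero so that an owning smart pointer knows when to delete the object.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, single-threaded sketch of an intrusive reference counter.
class RefCountedSketch {
public:
    // New objects have zero reference count.
    RefCountedSketch() noexcept : count_(0) {}
    // Copying creates a new object, which again starts at zero.
    RefCountedSketch(const RefCountedSketch&) noexcept : count_(0) {}
    // Assignment copies payload only (none here); the count is unchanged.
    RefCountedSketch& operator=(const RefCountedSketch&) noexcept { return *this; }

    // Called whenever a pointer to the object is set.
    void inc_reference() const noexcept { ++count_; }

    // Called whenever a pointer to the object is reset.
    // Assumption: returns true when no references remain.
    bool dec_reference() const noexcept {
        assert(count_ > 0);
        return --count_ == 0;
    }

    std::size_t reference_count() const noexcept { return count_; }
    bool unique() const noexcept { return count_ == 1; }

private:
    mutable std::size_t count_;  // mutable so const objects can be counted
};
```

Keeping the counter inside the object is what lets a pointer type such as DIABasePtr (a tlx::CountingPtr< DIABase >) share ownership without a separate control block.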
Private Types | |
using | Super = DOpNode< ValueType > |
Private Member Functions | |
size_t | calc_local_samples (size_t my_begin, size_t my_end, size_t range_begin, size_t range_end, size_t sample_size, size_t seed) |
size_t | hash_combine (size_t seed, size_t v) |
Private Attributes | |
common::StatsTimerStopped | comm_timer_ |
common::hypergeometric | hyp_ |
Hypergeometric distribution to calculate local sample sizes. | |
size_t | local_samples_ |
size_t | local_size_ |
local input size, number of samples to draw globally, and locally | |
common::StatsTimerStopped | local_timer_ |
Timers for local work and communication. | |
const bool | parent_stack_empty_ |
Whether the parent stack is empty. | |
std::mt19937_64 | rng_ { std::random_device { } () } |
Random generator for reservoir sampler. | |
size_t | sample_size_ |
common::ReservoirSamplingFast< ValueType, decltype(rng_)> | sampler_ |
Reservoir sampler for pre-op. | |
std::vector< ValueType > | samples_ |
local samples | |
Static Private Attributes | |
static constexpr bool | debug = false |
Additional Inherited Members | |
Public Types inherited from DOpNode< ValueType > | |
using | Super = DIANode< ValueType > |
Public Types inherited from DIANode< ValueType > | |
using | Callback = tlx::delegate< void(const ValueType &)> |
Public Types inherited from DIABase | |
using | DIABasePtr = tlx::CountingPtr< DIABase > |
Public Attributes inherited from DIABase | |
common::JsonLogger | logger_ |
Static Public Attributes inherited from DIABase | |
static constexpr size_t | kNeverConsume = static_cast<size_t>(-1) |
Never full consume. | |
Protected Attributes inherited from DIANode< ValueType > | |
std::vector< Child > | children_ |
Callback functions from the child nodes. | |
Protected Attributes inherited from DIABase | |
Context & | context_ |
associated Context | |
const size_t | dia_id_ |
DIA serial id. | |
const char *const | label_ |
DOp node static label. | |
DIAState | state_ = DIAState::NEW |
State of the DIANode. State is NEW on creation. | |
std::vector< DIABasePtr > | parents_ |
Parents of this DIABase. | |
DIAMemUse | mem_limit_ = 0 |
size_t | consume_counter_ = 1 |
using Super = DOpNode< ValueType > | private |
Definition at line 54 of file sample.hpp.
template<typename ParentDIA > SampleNode (const ParentDIA &parent, size_t sample_size) | inline |
Definition at line 59 of file sample.hpp.
References ReservoirSamplingFast< Type, RNG >::add(), SampleNode< ValueType >::hyp_, SampleNode< ValueType >::local_samples_, SampleNode< ValueType >::local_size_, SampleNode< ValueType >::parent_stack_empty_, SampleNode< ValueType >::rng_, SampleNode< ValueType >::sample_size_, SampleNode< ValueType >::sampler_, and SampleNode< ValueType >::samples_.
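size_t calc_local_samples (size_t my_begin, size_t my_end, size_t range_begin, size_t range_end, size_t sample_size, size_t seed) | inline private |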
Definition at line 176 of file sample.hpp.
References SampleNode< ValueType >::hash_combine(), SampleNode< ValueType >::hyp_, LOG, and hypergeometric_distribution< int_t, fp_t >::seed().
Referenced by SampleNode< ValueType >::Execute().
void Dispose () | inline final virtual |
Virtual clear method. Triggers actual disposing in sub-classes.
Reimplemented from DIABase.
Definition at line 165 of file sample.hpp.
References SampleNode< ValueType >::samples_, and tlx::vector_free().
void Execute () | inline final virtual |
Virtual execution method. Triggers actual computation in sub-classes.
Implements DIABase.
Definition at line 72 of file sample.hpp.
References FlowControlChannel::Broadcast(), SampleNode< ValueType >::calc_local_samples(), SampleNode< ValueType >::comm_timer_, DIABase::context_, ReservoirSamplingFast< Type, RNG >::count(), FlowControlChannel::ExPrefixSumTotal(), SampleNode< ValueType >::local_samples_, SampleNode< ValueType >::local_size_, SampleNode< ValueType >::local_timer_, min(), Context::my_rank(), Context::net, Context::num_workers(), SampleNode< ValueType >::sample_size_, SampleNode< ValueType >::sampler_, SampleNode< ValueType >::samples_, seed, and sLOG.
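The references to ExPrefixSumTotal() and min() suggest that Execute() first determines where each worker's local input sits in the global input. The sketch below simulates that step sequentially for all workers at once; it makes no Thrill calls. `LocalRange`, `local_ranges`, and `capped_sample_size` are hypothetical names, and capping the requested sample size at the global input size is an assumption about what min() is used for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Global index range [begin, end) owned by one worker (hypothetical type).
struct LocalRange {
    std::size_t begin;
    std::size_t end;
};

// Exclusive prefix sum over the workers' local input sizes. Worker i owns
// the global indices [sum of sizes before i, that sum + its own size).
// The sum over all workers is returned through `total`.
inline std::vector<LocalRange> local_ranges(
    const std::vector<std::size_t>& local_sizes, std::size_t& total) {
    std::vector<LocalRange> ranges(local_sizes.size());
    std::size_t offset = 0;
    for (std::size_t i = 0; i < local_sizes.size(); ++i) {
        ranges[i] = LocalRange{offset, offset + local_sizes[i]};
        offset += local_sizes[i];
    }
    total = offset;
    return ranges;
}

// A sample without replacement cannot exceed the input size (assumed use).
inline std::size_t capped_sample_size(std::size_t requested, std::size_t total) {
    return std::min(requested, total);
}
```

In a distributed run each worker would learn only its own offset and the total from the collective operation; the resulting `[begin, end)` pair is what a routine such as calc_local_samples() takes as `my_begin` and `my_end`.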
size_t hash_combine (size_t seed, size_t v) | inline private |
Definition at line 170 of file sample.hpp.
Referenced by SampleNode< ValueType >::calc_local_samples().
void PushData (bool consume) | inline final virtual |
Virtual method for pushing data. Triggers actual pushing in sub-classes.
Implements DIABase.
Definition at line 127 of file sample.hpp.
References SampleNode< ValueType >::comm_timer_, SampleNode< ValueType >::local_samples_, SampleNode< ValueType >::local_timer_, LOGC, DIANode< ValueType >::PushItem(), SampleNode< ValueType >::rng_, SampleNode< ValueType >::samples_, sLOG, sLOGC, and tlx::vector_free().
common::StatsTimerStopped comm_timer_ | private |
Definition at line 241 of file sample.hpp.
Referenced by SampleNode< ValueType >::Execute(), and SampleNode< ValueType >::PushData().
constexpr bool debug = false | static private |
Definition at line 52 of file sample.hpp.
common::hypergeometric hyp_ | private |
Hypergeometric distribution to calculate local sample sizes.
Definition at line 235 of file sample.hpp.
Referenced by SampleNode< ValueType >::calc_local_samples(), and SampleNode< ValueType >::SampleNode().
size_t local_samples_ | private |
Definition at line 231 of file sample.hpp.
Referenced by SampleNode< ValueType >::Execute(), SampleNode< ValueType >::PushData(), and SampleNode< ValueType >::SampleNode().
size_t local_size_ | private |
local input size, number of samples to draw globally, and locally
Definition at line 231 of file sample.hpp.
Referenced by SampleNode< ValueType >::Execute(), and SampleNode< ValueType >::SampleNode().
common::StatsTimerStopped local_timer_ | private |
Timers for local work and communication.
Definition at line 241 of file sample.hpp.
Referenced by SampleNode< ValueType >::Execute(), and SampleNode< ValueType >::PushData().
const bool parent_stack_empty_ | private |
Whether the parent stack is empty.
Definition at line 243 of file sample.hpp.
Referenced by SampleNode< ValueType >::SampleNode().
std::mt19937_64 rng_ { std::random_device { } () } | private |
Random generator for reservoir sampler.
Definition at line 237 of file sample.hpp.
Referenced by SampleNode< ValueType >::PushData(), and SampleNode< ValueType >::SampleNode().
size_t sample_size_ | private |
Definition at line 231 of file sample.hpp.
Referenced by SampleNode< ValueType >::Execute(), and SampleNode< ValueType >::SampleNode().
common::ReservoirSamplingFast< ValueType, decltype(rng_)> sampler_ | private |
Reservoir sampler for pre-op.
Definition at line 239 of file sample.hpp.
Referenced by SampleNode< ValueType >::Execute(), and SampleNode< ValueType >::SampleNode().
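For readers unfamiliar with reservoir sampling, the following is a minimal sketch of the classic technique (Algorithm R). It is not the ReservoirSamplingFast implementation, which presumably uses a faster skipping scheme; `ReservoirSketch` is a hypothetical class, and only the `add()` and `count()` member names are taken from the references on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Keeps a uniform random sample of at most `capacity` items from a stream
// of unknown length (classic Algorithm R; hypothetical illustration).
template <typename T, typename RNG>
class ReservoirSketch {
public:
    ReservoirSketch(std::size_t capacity, std::vector<T>& samples, RNG& rng)
        : capacity_(capacity), samples_(samples), rng_(rng), count_(0) {}

    // Offer one stream item to the reservoir.
    void add(const T& item) {
        ++count_;
        if (samples_.size() < capacity_) {
            // Reservoir not yet full: keep every item.
            samples_.push_back(item);
            return;
        }
        // The count_-th item replaces a random slot with probability
        // capacity_ / count_.
        std::uniform_int_distribution<std::size_t> dist(0, count_ - 1);
        const std::size_t pos = dist(rng_);
        if (pos < capacity_) samples_[pos] = item;
    }

    // Number of items seen so far.
    std::size_t count() const { return count_; }

private:
    std::size_t capacity_;
    std::vector<T>& samples_;  // storage owned by the caller
    RNG& rng_;
    std::size_t count_;
};
```

A reservoir of this kind lets the pre-op keep a bounded local sample while items stream in, before the global input size and the per-worker sample counts are known.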
std::vector< ValueType > samples_ | private |
local samples
Definition at line 233 of file sample.hpp.
Referenced by SampleNode< ValueType >::Dispose(), SampleNode< ValueType >::Execute(), SampleNode< ValueType >::PushData(), and SampleNode< ValueType >::SampleNode().