Thrill  0.1
ReservoirSamplingGrow< Type, RNG > Class Template Reference

Detailed Description

template<typename Type, typename RNG = std::default_random_engine>
class thrill::common::ReservoirSamplingGrow< Type, RNG >

Implementation of a fast approximation of adaptive reservoir sampling using http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/ The reservoir size grows logarithmically with the number given to the sampler, new items replace old ones such that all items in the stream are sampled with the same approximately uniform probability.

Growing of the reservoir is implemented trivially by selecting any current item to expand the array. This works well enough if growing steps become rarer for larger streams.

Definition at line 174 of file reservoir_sampling.hpp.

+ Inheritance diagram for ReservoirSamplingGrow< Type, RNG >:

#include <reservoir_sampling.hpp>

Public Member Functions

 ReservoirSamplingGrow (std::vector< Type > &samples, RNG &rng, double desired_imbalance=0.05)
 initialize reservoir sampler More...
 
void add (const Type &item)
 visit item, maybe add it to the sample. More...
 
size_t calc_sample_size (size_t count) const
 calculate desired sample size More...
 
size_t calc_sample_size () const
 calculate desired sample size More...
 
size_t count () const
 number of items seen More...
 
double desired_imbalance () const
 desired imbalance More...
 
const std::vector< Type > & samples () const
 access to samples More...
 
size_t size () const
 size of reservoir More...
 

Private Member Functions

size_t calc_next_gap () const
 draw gap size from geometric distribution with p = size_ / count_ More...
 
size_t calc_steps_to_next_resize () const
 

Private Attributes

size_t count_ = 0
 number of items seen More...
 
const double desired_imbalance_
 
size_t gap_ = 0
 items to skip until next sample (used in gap algorithm) More...
 
RNG & rng_
 source of randomness More...
 
std::vector< Type > & samples_
 reservoir More...
 
size_t size_ = 0
 size of reservoir More...
 
size_t steps_to_resize = 0
 items to process prior to checking for reservoir resize More...
 

Static Private Attributes

static constexpr bool debug = false
 

Constructor & Destructor Documentation

◆ ReservoirSamplingGrow()

ReservoirSamplingGrow ( std::vector< Type > &  samples,
RNG &  rng,
double  desired_imbalance = 0.05 
)
inline

initialize reservoir sampler

Definition at line 180 of file reservoir_sampling.hpp.

Member Function Documentation

◆ add()

void add ( const Type &  item)
inline

visit item, maybe add it to the sample.

Definition at line 187 of file reservoir_sampling.hpp.

Referenced by SortNode< ValueType, CompareFunction, SortAlgorithm, Stable >::PreOp().

◆ calc_next_gap()

size_t calc_next_gap ( ) const
inlineprivate

draw gap size from geometric distribution with p = size_ / count_

Definition at line 302 of file reservoir_sampling.hpp.

◆ calc_sample_size() [1/2]

size_t calc_sample_size ( size_t  count) const
inline

calculate desired sample size

Definition at line 270 of file reservoir_sampling.hpp.

Referenced by SortNode< ValueType, CompareFunction, SortAlgorithm, Stable >::wanted_sample_size().

◆ calc_sample_size() [2/2]

size_t calc_sample_size ( ) const
inline

calculate desired sample size

Definition at line 278 of file reservoir_sampling.hpp.

◆ calc_steps_to_next_resize()

size_t calc_steps_to_next_resize ( ) const
inlineprivate

calculate number of items/steps to process without checking for sample resize

Definition at line 318 of file reservoir_sampling.hpp.

◆ count()

size_t count ( ) const
inline

number of items seen

Definition at line 261 of file reservoir_sampling.hpp.

◆ desired_imbalance()

double desired_imbalance ( ) const
inline

desired imbalance

Definition at line 267 of file reservoir_sampling.hpp.

◆ samples()

const std::vector<Type>& samples ( ) const
inline

access to samples

Definition at line 264 of file reservoir_sampling.hpp.

◆ size()

size_t size ( ) const
inline

size of reservoir

Definition at line 258 of file reservoir_sampling.hpp.

Member Data Documentation

◆ count_

size_t count_ = 0
private

number of items seen

Definition at line 286 of file reservoir_sampling.hpp.

◆ debug

constexpr bool debug = false
staticprivate

Definition at line 176 of file reservoir_sampling.hpp.

◆ desired_imbalance_

const double desired_imbalance_
private

epsilon imbalance: this reservoir sampling works well for the range 0.5 to 0.01. Imbalance 0.5 results in 79 samples for 1 million items, 0.1 in 1992, 0.05 in 6643, 0.02 in 49828, and 0.01 in 199315.

Definition at line 299 of file reservoir_sampling.hpp.

◆ gap_

size_t gap_ = 0
private

items to skip until next sample (used in gap algorithm)

Definition at line 288 of file reservoir_sampling.hpp.

◆ rng_

RNG& rng_
private

source of randomness

Definition at line 294 of file reservoir_sampling.hpp.

◆ samples_

std::vector<Type>& samples_
private

reservoir

Definition at line 292 of file reservoir_sampling.hpp.

◆ size_

size_t size_ = 0
private

size of reservoir

Definition at line 284 of file reservoir_sampling.hpp.

◆ steps_to_resize

size_t steps_to_resize = 0
private

items to process prior to checking for reservoir resize

Definition at line 290 of file reservoir_sampling.hpp.


The documentation for this class was generated from the following file: