Thrill  0.1
ReservoirSamplingFast< Type, RNG > Class Template Reference

Detailed Description

template<typename Type, typename RNG = std::default_random_engine>
class thrill::common::ReservoirSamplingFast< Type, RNG >

Fast exact implementation of reservoir sampling using skip values.

Algorithm L from Kim-Hung Li: Reservoir Sampling Algorithms of Time Complexity O(n(1+log(N/n))), ACM TOMS 1994. The reservoir size is fixed, new items replace old ones such that all items in the stream are sampled with the same uniform probability.

Definition at line 85 of file reservoir_sampling.hpp.

+ Inheritance diagram for ReservoirSamplingFast< Type, RNG >:

#include <reservoir_sampling.hpp>

Public Member Functions

 ReservoirSamplingFast (size_t size, std::vector< Type > &samples, RNG &rng)
 initialize reservoir sampler More...
 
void add (const Type &item)
 visit item, maybe add it to the sample. More...
 
size_t count () const
 number of items seen More...
 
const std::vector< Type > & samples () const
 access to samples More...
 
size_t size () const
 size of reservoir More...
 

Private Member Functions

void calc_next_gap ()
 draw gap size from geometric distribution with p = size_ / count_ More...
 

Private Attributes

size_t count_ = 0
 number of items seen More...
 
size_t gap_
 number of items to skip until next sample More...
 
RNG & rng_
 source of randomness More...
 
std::vector< Type > & samples_
 reservoir More...
 
size_t size_
 size of reservoir More...
 
std::uniform_real_distribution< double > uniform
 uniform [0.0, 1.0) distribution More...
 
double W_
 

Constructor & Destructor Documentation

◆ ReservoirSamplingFast()

ReservoirSamplingFast ( size_t  size,
std::vector< Type > &  samples,
RNG &  rng 
)
inline

initialize reservoir sampler

Definition at line 89 of file reservoir_sampling.hpp.

Member Function Documentation

◆ add()

void add ( const Type &  item)
inline

visit item, maybe add it to the sample.

Definition at line 98 of file reservoir_sampling.hpp.

Referenced by SampleNode< ValueType >::SampleNode().

◆ calc_next_gap()

void calc_next_gap ( )
inlineprivate

draw gap size from geometric distribution with p = size_ / count_

Definition at line 156 of file reservoir_sampling.hpp.

◆ count()

size_t count ( ) const
inline

number of items seen

Definition at line 133 of file reservoir_sampling.hpp.

Referenced by SampleNode< ValueType >::Execute().

◆ samples()

const std::vector<Type>& samples ( ) const
inline

access to samples

Definition at line 136 of file reservoir_sampling.hpp.

◆ size()

size_t size ( ) const
inline

size of reservoir

Definition at line 130 of file reservoir_sampling.hpp.

Member Data Documentation

◆ count_

size_t count_ = 0
private

number of items seen

Definition at line 142 of file reservoir_sampling.hpp.

◆ gap_

size_t gap_
private

number of items to skip until next sample

Definition at line 144 of file reservoir_sampling.hpp.

◆ rng_

RNG& rng_
private

source of randomness

Definition at line 151 of file reservoir_sampling.hpp.

◆ samples_

std::vector<Type>& samples_
private

reservoir

Definition at line 149 of file reservoir_sampling.hpp.

◆ size_

size_t size_
private

size of reservoir

Definition at line 140 of file reservoir_sampling.hpp.

◆ uniform

std::uniform_real_distribution<double> uniform
private

uniform [0.0, 1.0) distribution

Definition at line 153 of file reservoir_sampling.hpp.

◆ W_

double W_
private

random value for gap calculation, distribution: largest value in a sample of Uniform(0, old_W) of size size_, where old_W is 1 initially

Definition at line 147 of file reservoir_sampling.hpp.


The documentation for this class was generated from the following file: