Thrill
0.1
|
Implementation of a fast approximation of adaptive reservoir sampling using http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/ The reservoir size grows logarithmically with the number given to the sampler, new items replace old ones such that all items in the stream are sampled with the same approximately uniform probability.
Growing of the reservoir is implemented trivially by selecting any current item to expand the array. This works well enough if growing steps become rarer for larger streams.
Definition at line 174 of file reservoir_sampling.hpp.
#include <reservoir_sampling.hpp>
Public Member Functions | |
ReservoirSamplingGrow (std::vector< Type > &samples, RNG &rng, double desired_imbalance=0.05) | |
initialize reservoir sampler More... | |
void | add (const Type &item) |
visit item, maybe add it to the sample. More... | |
size_t | calc_sample_size (size_t count) const |
calculate desired sample size More... | |
size_t | calc_sample_size () const |
calculate desired sample size More... | |
size_t | count () const |
number of items seen More... | |
double | desired_imbalance () const |
desired imbalance More... | |
const std::vector< Type > & | samples () const |
access to samples More... | |
size_t | size () const |
size of reservoir More... | |
Private Member Functions | |
size_t | calc_next_gap () const |
draw gap size from geometric distribution with p = size_ / count_ More... | |
size_t | calc_steps_to_next_resize () const |
Private Attributes | |
size_t | count_ = 0 |
number of items seen More... | |
const double | desired_imbalance_ |
size_t | gap_ = 0 |
items to skip until next sample (used in gap algorithm) More... | |
RNG & | rng_ |
source of randomness More... | |
std::vector< Type > & | samples_ |
reservoir More... | |
size_t | size_ = 0 |
size of reservoir More... | |
size_t | steps_to_resize = 0 |
items to process prior to checking for reservoir resize More... | |
Static Private Attributes | |
static constexpr bool | debug = false |
|
inline |
initialize reservoir sampler
Definition at line 180 of file reservoir_sampling.hpp.
|
inline |
visit item, maybe add it to the sample.
Definition at line 187 of file reservoir_sampling.hpp.
Referenced by SortNode< ValueType, CompareFunction, SortAlgorithm, Stable >::PreOp().
|
inlineprivate |
draw gap size from geometric distribution with p = size_ / count_
Definition at line 302 of file reservoir_sampling.hpp.
|
inline |
calculate desired sample size
Definition at line 270 of file reservoir_sampling.hpp.
Referenced by SortNode< ValueType, CompareFunction, SortAlgorithm, Stable >::wanted_sample_size().
|
inline |
calculate desired sample size
Definition at line 278 of file reservoir_sampling.hpp.
|
inlineprivate |
calculate number of items/steps to process without checking for sample resize
Definition at line 318 of file reservoir_sampling.hpp.
|
inline |
number of items seen
Definition at line 261 of file reservoir_sampling.hpp.
|
inline |
desired imbalance
Definition at line 267 of file reservoir_sampling.hpp.
|
inline |
access to samples
Definition at line 264 of file reservoir_sampling.hpp.
|
inline |
size of reservoir
Definition at line 258 of file reservoir_sampling.hpp.
|
private |
number of items seen
Definition at line 286 of file reservoir_sampling.hpp.
|
staticprivate |
Definition at line 176 of file reservoir_sampling.hpp.
|
private |
epsilon imbalance: this reservoir sampling works well for the range 0.5 to 0.01. Imbalance 0.5 results in 79 samples for 1 million items, 0.1 in 1992, 0.05 in 6643, 0.02 in 49828, and 0.01 in 199315.
Definition at line 299 of file reservoir_sampling.hpp.
|
private |
items to skip until next sample (used in gap algorithm)
Definition at line 288 of file reservoir_sampling.hpp.
|
private |
source of randomness
Definition at line 294 of file reservoir_sampling.hpp.
|
private |
reservoir
Definition at line 292 of file reservoir_sampling.hpp.
|
private |
size of reservoir
Definition at line 284 of file reservoir_sampling.hpp.
|
private |
items to process prior to checking for reservoir resize
Definition at line 290 of file reservoir_sampling.hpp.