Thrill 0.1
Namespaces | |
glob_local | |
Classes | |
struct | FileInfo |
General information of vfs file. | |
struct | FileList |
List of file info and additional overall info. | |
class | ReadStream |
Reader object from any source. | |
class | TemporaryDirectory |
A class which creates a temporary directory in the current directory and returns it via get(). | |
class | WriteStream |
Writer object to output data to any supported URI. | |
Typedefs | |
using | ReadStreamPtr = tlx::CountingPtr< ReadStream > |
using | WriteStreamPtr = tlx::CountingPtr< WriteStream > |
Enumerations | |
enum | GlobType { All, File, Directory } |
Type of objects to include in the glob result. | |
enum | Type { File, Directory } |
VFS object type. | |
Functions | |
void | Deinitialize () |
Deinitialize VFS layer. | |
std::string | FillFilePattern (const std::string &pathbase, size_t worker, size_t file_part) |
FileList | Glob (const std::vector< std::string > &globlist, const GlobType &gtype=GlobType::All) |
Reads a glob path list and delivers a file list, sizes, and prefix sums (in bytes) for all matching files. | |
FileList | Glob (const std::string &glob, const GlobType &gtype=GlobType::All) |
Reads a glob path list and delivers a file list, sizes, and prefix sums (in bytes) for all matching files. | |
void | Hdfs3Deinitialize () |
void | Hdfs3Glob (const std::string &, const GlobType &, FileList &) |
void | Hdfs3Initialize () |
ReadStreamPtr | Hdfs3OpenReadStream (const std::string &, const common::Range &) |
WriteStreamPtr | Hdfs3OpenWriteStream (const std::string &) |
void | Initialize () |
Initialize VFS layer. | |
bool | IsCompressed (const std::string &path) |
bool | IsRemoteUri (const std::string &path) |
Returns true if the file at path is a remote URI such as s3:// or hdfs://. | |
ReadStreamPtr | MakeBZip2ReadFilter (const ReadStreamPtr &) |
WriteStreamPtr | MakeBZip2WriteFilter (const WriteStreamPtr &) |
ReadStreamPtr | MakeGZipReadFilter (const ReadStreamPtr &) |
WriteStreamPtr | MakeGZipWriteFilter (const WriteStreamPtr &) |
ReadStreamPtr | OpenReadStream (const std::string &path, const common::Range &range=common::Range()) |
Construct a reader for the given path URI. | |
WriteStreamPtr | OpenWriteStream (const std::string &path) |
std::ostream & | operator<< (std::ostream &os, const Type &t) |
void | S3Deinitialize () |
void | S3Glob (const std::string &, const GlobType &, FileList &) |
void | S3Initialize () |
ReadStreamPtr | S3OpenReadStream (const std::string &, const common::Range &) |
WriteStreamPtr | S3OpenWriteStream (const std::string &) |
void | SysGlob (const std::string &path, const GlobType &gtype, FileList &filelist) |
Glob a path and augment the FileList with matching file names. | |
static void | SysGlobWalkRecursive (const std::string &path, FileList &filelist) |
ReadStreamPtr | SysOpenReadStream (const std::string &path, const common::Range &range=common::Range()) |
Open file for reading and return file descriptor. | |
WriteStreamPtr | SysOpenWriteStream (const std::string &path) |
Open file for writing and return file descriptor. | |
using ReadStreamPtr = tlx::CountingPtr<ReadStream> |
Definition at line 145 of file file_io.hpp.
using WriteStreamPtr = tlx::CountingPtr<WriteStream> |
Definition at line 146 of file file_io.hpp.
enum GlobType | strong |
Type of objects to include in glob result.
Enumerator | |
---|---|
All | |
File | |
Directory |
Definition at line 99 of file file_io.hpp.
enum Type | strong |
VFS object type.
void Deinitialize | ( | ) |
Deinitialize VFS layer.
Definition at line 40 of file file_io.cpp.
References Hdfs3Deinitialize(), and S3Deinitialize().
Referenced by thrill::api::Deinitialize(), and main().
std::string FillFilePattern | ( | const std::string & | pathbase, |
size_t | worker, | ||
size_t | file_part | ||
) |
Takes pathbase and replaces $$$ with the worker value and ### with the file_part value.
Definition at line 71 of file file_io.cpp.
References debug, sLOG, and tlx::ssnprintf().
Referenced by WriteBinaryNode< ValueType >::OpenNextFile(), WriteLinesNode< ValueType >::PreOp(), and WriteLinesNode< ValueType >::WriteLinesNode().
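The substitution described for FillFilePattern can be illustrated with a minimal sketch. This is not Thrill's implementation: the names ReplaceRun and FillFilePatternSketch are hypothetical, and zero-padding the number to the width of the placeholder run is an assumption that the description above does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Thrill): replace the first run of `marker`
// characters in `path` with `value`, zero-padded to the width of the run.
static void ReplaceRun(std::string& path, char marker, std::size_t value) {
    const std::string::size_type begin = path.find(marker);
    if (begin == std::string::npos) return;  // no placeholder present
    std::string::size_type end = path.find_first_not_of(marker, begin);
    if (end == std::string::npos) end = path.size();
    assert(end > begin);  // the run holds at least one marker character
    const std::size_t width = end - begin;
    std::string digits = std::to_string(value);
    if (digits.size() < width)
        digits.insert(0, width - digits.size(), '0');  // assumed zero padding
    path.replace(begin, width, digits);
}

// Sketch of the documented behaviour: $$$ -> worker, ### -> file_part.
std::string FillFilePatternSketch(const std::string& pathbase,
                                  std::size_t worker, std::size_t file_part) {
    std::string out = pathbase;
    ReplaceRun(out, '$', worker);
    ReplaceRun(out, '#', file_part);
    return out;
}
```

Under these assumptions, a pathbase of "out-$$$-###.txt" with worker 3 and file_part 12 yields "out-003-012.txt", so each worker writes to its own sequence of files.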
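FileList Glob | ( | const std::vector< std::string > & | globlist, |
const GlobType & | gtype = GlobType::All |
||
) |
Reads a glob path list and delivers a file list, sizes, and prefix sums (in bytes) for all matching files.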
Definition at line 128 of file file_io.cpp.
References FileList::contains_compressed, FileList::contains_remote_uri, Hdfs3Glob(), S3Glob(), tlx::starts_with(), SysGlob(), and FileList::total_size.
Referenced by Glob(), main(), ReadBinaryNode< ValueType >::ReadBinaryNode(), and ReadLinesNode::ReadLinesNode().
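The "sizes and prefix sums" that Glob delivers can be pictured with the following sketch. The types FileEntry and FileListing and the function BuildListing are hypothetical, simplified stand-ins rather than the real FileInfo and FileList; the sketch assumes an exclusive prefix sum, so each entry records how many bytes precede it in the concatenation of all matched files.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for FileInfo and FileList.
struct FileEntry {
    std::string path;
    std::size_t size;    // file size in bytes
    std::size_t offset;  // exclusive prefix sum: bytes in all earlier files
};

struct FileListing {
    std::vector<FileEntry> files;
    std::size_t total_size;  // sum of all file sizes in bytes
};

// Build a listing with running byte offsets from (path, size) pairs.
FileListing BuildListing(
    const std::vector<std::pair<std::string, std::size_t>>& matches) {
    FileListing listing;
    listing.total_size = 0;
    for (const auto& match : matches) {
        listing.files.push_back(
            FileEntry{match.first, match.second, listing.total_size});
        listing.total_size += match.second;
    }
    assert(listing.files.size() == matches.size());
    return listing;
}
```

With such offsets, a reader that is assigned a global byte range can locate the file containing its first byte without opening any of the files.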
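FileList Glob | ( | const std::string & | glob, |
const GlobType & | gtype = GlobType::All |
||
) |
Reads a glob path list and delivers a file list, sizes, and prefix sums (in bytes) for all matching files.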
Definition at line 172 of file file_io.cpp.
References Glob().
void Hdfs3Deinitialize | ( | ) |
Definition at line 295 of file hdfs3_file.cpp.
Referenced by Deinitialize().
void Hdfs3Initialize | ( | ) |
Definition at line 292 of file hdfs3_file.cpp.
Referenced by Initialize().
ReadStreamPtr Hdfs3OpenReadStream | ( | const std::string & | , |
const common::Range & | |||
) |
WriteStreamPtr Hdfs3OpenWriteStream | ( | const std::string & | ) |
void Initialize | ( | ) |
Initialize VFS layer.
Definition at line 35 of file file_io.cpp.
References Hdfs3Initialize(), and S3Initialize().
Referenced by thrill::api::Initialize(), and main().
bool IsCompressed | ( | const std::string & | path | ) |
Returns true if the file at path is compressed (e.g., ends with '.{gz,bz2,xz,lzo}').
Definition at line 47 of file file_io.cpp.
References tlx::ends_with().
Referenced by FileInfo::IsCompressed().
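A suffix check of this kind might look like the sketch below. The function names are hypothetical, and the suffix list is taken from the description above; the real function relies on tlx::ends_with() and may differ in detail.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `s` ends with the non-empty `suffix`.
static bool EndsWith(const std::string& s, const std::string& suffix) {
    assert(!suffix.empty());
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Sketch: decide by file name extension only; file contents are not inspected.
bool IsCompressedSketch(const std::string& path) {
    return EndsWith(path, ".gz") || EndsWith(path, ".bz2") ||
           EndsWith(path, ".xz") || EndsWith(path, ".lzo");
}
```

Because the decision rests on the name alone, a compressed file without one of these extensions would be treated as uncompressed in this sketch.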
bool IsRemoteUri | ( | const std::string & | path | ) |
Returns true if the file at path is a remote URI such as s3:// or hdfs://.
Definition at line 55 of file file_io.cpp.
References tlx::starts_with().
Referenced by FileInfo::IsRemoteUri().
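The prefix test can be sketched in the same way. The names below are hypothetical, and only the two schemes named in the description are checked; whether the real function recognises further schemes is not stated here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `s` starts with the non-empty `prefix`.
static bool StartsWith(const std::string& s, const std::string& prefix) {
    assert(!prefix.empty());
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Sketch: a path counts as remote if it carries an s3:// or hdfs:// scheme.
bool IsRemoteUriSketch(const std::string& path) {
    return StartsWith(path, "s3://") || StartsWith(path, "hdfs://");
}
```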
ReadStreamPtr MakeBZip2ReadFilter | ( | const ReadStreamPtr & | ) |
WriteStreamPtr MakeBZip2WriteFilter | ( | const WriteStreamPtr & | ) |
ReadStreamPtr MakeGZipReadFilter | ( | const ReadStreamPtr & | ) |
WriteStreamPtr MakeGZipWriteFilter | ( | const WriteStreamPtr & | ) |
ReadStreamPtr OpenReadStream | ( | const std::string & | path, |
const common::Range & | range = common::Range() |
||
) |
Construct a reader for the given path URI.
Range is the byte range [b,e) inside the file to read. If e = 0, the complete file is read.
For the POSIX SysFile implementation, the range is used only to seek to the byte offset b; bytes after e can still be read.
For the S3File implementation, however, the range [b,e) determines which data is fetched from S3. Hence, once e is reached, read() returns EOF.
Definition at line 180 of file file_io.cpp.
References Range::begin, die_unless, tlx::ends_with(), Hdfs3OpenReadStream(), MakeBZip2ReadFilter(), MakeGZipReadFilter(), S3OpenReadStream(), tlx::starts_with(), and SysOpenReadStream().
Referenced by ReadLinesNode::InputLineIteratorCompressed::HasNext(), ReadLinesNode::InputLineIteratorCompressed::InputLineIteratorCompressed(), ReadLinesNode::InputLineIteratorUncompressed::InputLineIteratorUncompressed(), main(), ReadLinesNode::InputLineIteratorUncompressed::Next(), ReadLinesNode::InputLineIteratorCompressed::Next(), and ReadBinaryNode< ValueType >::VfsFileBlockSource::VfsFileBlockSource().
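The range semantics can be made concrete with a small sketch that mimics the strict behaviour described for S3: seek to b, stop at e, and treat e = 0 as "read everything". ByteRange, RangeReader, and ReadRange are hypothetical names used only for illustration; the sketch works on a std::istream and is not the ReadStream interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for common::Range: the byte range [begin, end).
struct ByteRange {
    std::size_t begin;
    std::size_t end;  // 0 means "read to the end of the input"
};

// Reads from `in` starting at range.begin and reports EOF at range.end.
class RangeReader {
public:
    RangeReader(std::istream& in, const ByteRange& range)
        : in_(in), pos_(range.begin), end_(range.end) {
        in_.seekg(static_cast<std::streamoff>(range.begin));
    }

    // Returns the number of bytes read; 0 signals EOF.
    std::size_t read(char* buf, std::size_t size) {
        assert(buf != nullptr);
        if (end_ != 0) {
            if (pos_ >= end_) return 0;           // range exhausted
            size = std::min(size, end_ - pos_);   // never read past e
        }
        in_.read(buf, static_cast<std::streamsize>(size));
        const std::size_t got = static_cast<std::size_t>(in_.gcount());
        pos_ += got;
        return got;
    }

private:
    std::istream& in_;
    std::size_t pos_;
    std::size_t end_;
};

// Convenience wrapper for the example: read a range from an in-memory string.
std::string ReadRange(const std::string& data, const ByteRange& range) {
    std::istringstream in(data);
    RangeReader reader(in, range);
    std::string out;
    char buf[4];
    while (std::size_t n = reader.read(buf, sizeof(buf)))
        out.append(buf, n);
    return out;
}
```

A caller that depends on the lenient POSIX behaviour, such as a line reader that finishes a line crossing e, would need the end check relaxed; the sketch shows only the strict variant.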
WriteStreamPtr OpenWriteStream | ( | const std::string & | path | ) |
Definition at line 211 of file file_io.cpp.
References tlx::ends_with(), Hdfs3OpenWriteStream(), MakeBZip2WriteFilter(), MakeGZipWriteFilter(), S3OpenWriteStream(), tlx::starts_with(), and SysOpenWriteStream().
Referenced by main(), WriteLinesNode< ValueType >::PreOp(), and WriteLinesNode< ValueType >::WriteLinesNode().
std::ostream & operator<< | ( | std::ostream & | os, |
const Type & | t | ||
) |
Definition at line 60 of file file_io.cpp.
void S3Deinitialize | ( | ) |
Definition at line 734 of file s3_file.cpp.
Referenced by Deinitialize().
void S3Initialize | ( | ) |
Definition at line 731 of file s3_file.cpp.
Referenced by Initialize().
ReadStreamPtr S3OpenReadStream | ( | const std::string & | , |
const common::Range & | |||
) |
WriteStreamPtr S3OpenWriteStream | ( | const std::string & | ) |
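void SysGlob | ( | const std::string & | path, |
const GlobType & | gtype, | ||
FileList & | filelist | ||
) |
Glob a path and augment the FileList with matching file names.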
Definition at line 144 of file sys_file.cpp.
References All, CSimpleGlob, debug, die, Directory, File, LOG1, FileInfo::path, FileInfo::size, sLOG, SysGlobWalkRecursive(), thrill::mem::to_string(), and FileInfo::type.
Referenced by Glob().
static void SysGlobWalkRecursive | ( | const std::string & | path, |
FileList & | filelist | ||
) |
Definition at line 55 of file sys_file.cpp.
References Directory, File, FileInfo::path, FileInfo::size, thrill::mem::to_string(), thrill::common::ts_readdir(), and FileInfo::type.
Referenced by SysGlob().
ReadStreamPtr SysOpenReadStream | ( | const std::string & | path, |
const common::Range & | range = common::Range() |
||
) |
Open file for reading and return file descriptor.
Handles compressed files by calling a decompressor in a pipe, like "cat $f | gzip -dc |" in bash.
path | Path to open |
range | Byte range to read. The beginning of the range is used as the seek offset; the end can be 0 to read the whole file. Depending on the underlying file system, reading past the end succeeds without error; the end is not enforced. |
POSIX lseek function from current position.
Definition at line 323 of file sys_file.cpp.
References Range::begin, debug, tlx::ends_with(), LOG1, thrill::common::MakePipe(), O_BINARY, thrill::common::PortSetCloseOnExec(), and sLOG.
Referenced by OpenReadStream().
WriteStreamPtr SysOpenWriteStream | ( | const std::string & | path | ) |
Open file for writing and return file descriptor.
Handles compressed files by calling a compressor in a pipe, like "| gzip > $f" in bash.
path | Path to open |
Definition at line 414 of file sys_file.cpp.
References debug, tlx::ends_with(), LOG1, thrill::common::MakePipe(), O_BINARY, thrill::common::PortSetCloseOnExec(), and sLOG.
Referenced by OpenWriteStream().