Thrill  0.1
thrill::vfs Namespace Reference

Namespaces

 glob_local
 

Classes

struct  FileInfo
 General information about a VFS file.
 
struct  FileList
 List of file info and additional overall info.
 
class  ReadStream
 Reader object from any source.
 
class  TemporaryDirectory
 A class which creates a temporary directory in the current directory and returns it via get().
 
class  WriteStream
 Writer object to output data to any supported URI.
 

Typedefs

using ReadStreamPtr = tlx::CountingPtr< ReadStream >
 
using WriteStreamPtr = tlx::CountingPtr< WriteStream >
 

Enumerations

enum  GlobType { All, File, Directory }
 Type of objects to include in glob result.
 
enum  Type { File, Directory }
 VFS object type.
 

Functions

void Deinitialize ()
 Deinitialize VFS layer.
 
std::string FillFilePattern (const std::string &pathbase, size_t worker, size_t file_part)
 
FileList Glob (const std::vector< std::string > &globlist, const GlobType &gtype=GlobType::All)
 Reads a glob path list and delivers a file list, sizes, and prefix sums (in bytes) for all matching files.
 
FileList Glob (const std::string &glob, const GlobType &gtype=GlobType::All)
 Reads a glob path and delivers a file list, sizes, and prefix sums (in bytes) for all matching files.
 
void Hdfs3Deinitialize ()
 
void Hdfs3Glob (const std::string &, const GlobType &, FileList &)
 
void Hdfs3Initialize ()
 
ReadStreamPtr Hdfs3OpenReadStream (const std::string &, const common::Range &)
 
WriteStreamPtr Hdfs3OpenWriteStream (const std::string &)
 
void Initialize ()
 Initialize VFS layer.
 
bool IsCompressed (const std::string &path)
 
bool IsRemoteUri (const std::string &path)
 Returns true if the file at path is a remote URI such as s3:// or hdfs://.
 
ReadStreamPtr MakeBZip2ReadFilter (const ReadStreamPtr &)
 
WriteStreamPtr MakeBZip2WriteFilter (const WriteStreamPtr &)
 
ReadStreamPtr MakeGZipReadFilter (const ReadStreamPtr &)
 
WriteStreamPtr MakeGZipWriteFilter (const WriteStreamPtr &)
 
ReadStreamPtr OpenReadStream (const std::string &path, const common::Range &range=common::Range())
 Construct a reader for the given path URI.
 
WriteStreamPtr OpenWriteStream (const std::string &path)
 
std::ostream & operator<< (std::ostream &os, const Type &t)
 
void S3Deinitialize ()
 
void S3Glob (const std::string &, const GlobType &, FileList &)
 
void S3Initialize ()
 
ReadStreamPtr S3OpenReadStream (const std::string &, const common::Range &)
 
WriteStreamPtr S3OpenWriteStream (const std::string &)
 
void SysGlob (const std::string &path, const GlobType &gtype, FileList &filelist)
 Glob a path and augment the FileList with matching file names.
 
static void SysGlobWalkRecursive (const std::string &path, FileList &filelist)
 
ReadStreamPtr SysOpenReadStream (const std::string &path, const common::Range &range=common::Range())
 Open file for reading and return file descriptor.
 
WriteStreamPtr SysOpenWriteStream (const std::string &path)
 Open file for writing and return file descriptor.
 

Typedef Documentation

◆ ReadStreamPtr

Definition at line 145 of file file_io.hpp.

◆ WriteStreamPtr

Definition at line 146 of file file_io.hpp.

Enumeration Type Documentation

◆ GlobType

enum GlobType
strong

Type of objects to include in glob result.

Enumerator
All 
File 
Directory 

Definition at line 99 of file file_io.hpp.

◆ Type

enum Type
strong

VFS object type.

Enumerator
File 
Directory 

Definition at line 52 of file file_io.hpp.

Function Documentation

◆ Deinitialize()

void Deinitialize ( )

Deinitialize VFS layer.

Definition at line 40 of file file_io.cpp.

References Hdfs3Deinitialize(), and S3Deinitialize().

Referenced by thrill::api::Deinitialize(), and main().

◆ FillFilePattern()

std::string FillFilePattern ( const std::string &  pathbase,
size_t  worker,
size_t  file_part 
)

Takes pathbase and replaces $$$ with the worker value and ### with the file_part value.

Definition at line 71 of file file_io.cpp.

References debug, sLOG, and tlx::ssnprintf().

Referenced by WriteBinaryNode< ValueType >::OpenNextFile(), WriteLinesNode< ValueType >::PreOp(), and WriteLinesNode< ValueType >::WriteLinesNode().
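
The substitution described above can be sketched as follows. This is a minimal stand-alone illustration, not the Thrill implementation: the names ReplaceRun and FillPatternSketch are hypothetical, and both the zero-padding to the width of the marker run and the choice to leave the path unchanged when a marker is absent are assumptions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replace the first run of `marker` characters in `s`
// with `value`, zero-padded to the length of that run (assumed behavior).
// Returns false if `s` contains no marker.
static bool ReplaceRun(std::string& s, char marker, std::size_t value) {
    std::size_t begin = s.find(marker);
    if (begin == std::string::npos) return false;
    std::size_t end = s.find_first_not_of(marker, begin);
    if (end == std::string::npos) end = s.size();
    std::size_t width = end - begin;

    std::string digits = std::to_string(value);
    if (digits.size() < width)
        digits.insert(0, width - digits.size(), '0');
    s.replace(begin, width, digits);
    return true;
}

// Sketch of a FillFilePattern-like function: '$' runs receive the worker
// number, '#' runs receive the file part number.
std::string FillPatternSketch(const std::string& pathbase,
                              std::size_t worker, std::size_t file_part) {
    std::string out = pathbase;
    ReplaceRun(out, '$', worker);
    ReplaceRun(out, '#', file_part);
    return out;
}
```

Under these assumptions, FillPatternSketch("out-$$$-###.txt", 7, 12) yields "out-007-012.txt".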

◆ Glob() [1/2]

FileList Glob ( const std::vector< std::string > &  globlist,
const GlobType &  gtype = GlobType::All 
)

Reads a glob path list and delivers a file list, sizes, and prefix sums (in bytes) for all matching files.

Definition at line 128 of file file_io.cpp.

References FileList::contains_compressed, FileList::contains_remote_uri, Hdfs3Glob(), S3Glob(), tlx::starts_with(), SysGlob(), and FileList::total_size.

Referenced by Glob(), main(), ReadBinaryNode< ValueType >::ReadBinaryNode(), and ReadLinesNode::ReadLinesNode().
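
To illustrate what "sizes and prefix sums (in bytes)" can mean for a file list, here is a minimal sketch. EntrySketch and ComputePrefixSums are hypothetical names, and the use of an exclusive prefix sum (bytes preceding each file) is an assumption, not a statement about FileList's actual layout.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a file entry; field names are illustrative only.
struct EntrySketch {
    std::string   path;
    std::uint64_t size;    // file size in bytes
    std::uint64_t offset;  // exclusive prefix sum: bytes before this file
};

// Fill in each entry's offset and return the total size of all entries.
std::uint64_t ComputePrefixSums(std::vector<EntrySketch>& entries) {
    std::uint64_t total = 0;
    for (EntrySketch& e : entries) {
        e.offset = total;  // bytes contributed by all earlier files
        total += e.size;
    }
    return total;
}
```

With such offsets, a global byte range over all matched files can be mapped to a file and a local offset by searching the offset column.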

◆ Glob() [2/2]

FileList Glob ( const std::string &  glob,
const GlobType &  gtype = GlobType::All 
)

Reads a glob path and delivers a file list, sizes, and prefix sums (in bytes) for all matching files.

Definition at line 172 of file file_io.cpp.

References Glob().

◆ Hdfs3Deinitialize()

void Hdfs3Deinitialize ( )

Definition at line 295 of file hdfs3_file.cpp.

Referenced by Deinitialize().

◆ Hdfs3Glob()

void Hdfs3Glob ( const std::string &  ,
const GlobType &  ,
FileList &  
)

Definition at line 298 of file hdfs3_file.cpp.

References die.

Referenced by Glob().

◆ Hdfs3Initialize()

void Hdfs3Initialize ( )

Definition at line 292 of file hdfs3_file.cpp.

Referenced by Initialize().

◆ Hdfs3OpenReadStream()

ReadStreamPtr Hdfs3OpenReadStream ( const std::string &  ,
const common::Range &  
)

Definition at line 303 of file hdfs3_file.cpp.

References die.

Referenced by OpenReadStream().

◆ Hdfs3OpenWriteStream()

WriteStreamPtr Hdfs3OpenWriteStream ( const std::string &  )

Definition at line 308 of file hdfs3_file.cpp.

References die.

Referenced by OpenWriteStream().

◆ Initialize()

void Initialize ( )

Initialize VFS layer.

Definition at line 35 of file file_io.cpp.

References Hdfs3Initialize(), and S3Initialize().

Referenced by thrill::api::Initialize(), and main().

◆ IsCompressed()

bool IsCompressed ( const std::string &  path)

Returns true if the file at path is compressed (e.g., its name ends with '.{gz,bz2,xz,lzo}').

Definition at line 47 of file file_io.cpp.

References tlx::ends_with().

Referenced by FileInfo::IsCompressed().
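
A suffix check of this kind can be sketched as below. EndsWith and LooksCompressed are hypothetical names used for illustration; the suffix list is taken from the description above, and the real function may differ in detail.

```cpp
#include <string>

// Hypothetical helper: true if `s` ends with `suffix`.
static bool EndsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Sketch: decide by file name suffix only, without inspecting file contents.
bool LooksCompressed(const std::string& path) {
    return EndsWith(path, ".gz") || EndsWith(path, ".bz2") ||
           EndsWith(path, ".xz") || EndsWith(path, ".lzo");
}
```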

◆ IsRemoteUri()

bool IsRemoteUri ( const std::string &  path)

Returns true if the file at path is a remote URI such as s3:// or hdfs://.

Definition at line 55 of file file_io.cpp.

References tlx::starts_with().

Referenced by FileInfo::IsRemoteUri().
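
A prefix check of this kind can be sketched as below. StartsWith and LooksRemote are hypothetical names; only the two schemes named in the description are tested, which is an assumption about the full set of recognized schemes.

```cpp
#include <string>

// Hypothetical helper: true if `s` begins with `prefix`.
static bool StartsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Sketch: classify a path as remote purely by its URI scheme prefix.
bool LooksRemote(const std::string& path) {
    return StartsWith(path, "s3://") || StartsWith(path, "hdfs://");
}
```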

◆ MakeBZip2ReadFilter()

ReadStreamPtr MakeBZip2ReadFilter ( const ReadStreamPtr &  )

Definition at line 233 of file bzip2_filter.cpp.

References die.

Referenced by OpenReadStream().

◆ MakeBZip2WriteFilter()

WriteStreamPtr MakeBZip2WriteFilter ( const WriteStreamPtr &  )

Definition at line 228 of file bzip2_filter.cpp.

References die.

Referenced by OpenWriteStream().

◆ MakeGZipReadFilter()

ReadStreamPtr MakeGZipReadFilter ( const ReadStreamPtr &  )

Definition at line 271 of file gzip_filter.cpp.

References die.

Referenced by OpenReadStream().

◆ MakeGZipWriteFilter()

WriteStreamPtr MakeGZipWriteFilter ( const WriteStreamPtr &  )

Definition at line 266 of file gzip_filter.cpp.

References die.

Referenced by OpenWriteStream().

◆ OpenReadStream()

ReadStreamPtr OpenReadStream ( const std::string &  path,
const common::Range &  range = common::Range() 
)

Construct a reader for the given path URI.

Range is the byte range [b,e) inside the file to read. If e = 0, the complete file is read.

For the POSIX SysFile implementation, the range is used only to seek to the byte offset b; additional bytes after e can still be read.

For the S3File implementation, however, the range [b,e) determines which data is fetched from S3. Hence, once e is reached, read() returns EOF.

Definition at line 180 of file file_io.cpp.

References Range::begin, die_unless, tlx::ends_with(), Hdfs3OpenReadStream(), MakeBZip2ReadFilter(), MakeGZipReadFilter(), S3OpenReadStream(), tlx::starts_with(), and SysOpenReadStream().

Referenced by ReadLinesNode::InputLineIteratorCompressed::HasNext(), ReadLinesNode::InputLineIteratorCompressed::InputLineIteratorCompressed(), ReadLinesNode::InputLineIteratorUncompressed::InputLineIteratorUncompressed(), main(), ReadLinesNode::InputLineIteratorUncompressed::Next(), ReadLinesNode::InputLineIteratorCompressed::Next(), and ReadBinaryNode< ValueType >::VfsFileBlockSource::VfsFileBlockSource().
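
The two range policies described above can be contrasted with a small in-memory sketch. RangeSketch, ReadEnforced, and ReadSeekOnly are hypothetical names, and a std::string stands in for the file contents; no Thrill API is used.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical byte range [begin, end); end == 0 means "until end of data".
struct RangeSketch {
    std::size_t begin;
    std::size_t end;
};

// S3-like policy: only bytes inside [begin, end) are ever delivered.
std::string ReadEnforced(const std::string& data, RangeSketch r) {
    std::size_t b = std::min(r.begin, data.size());
    std::size_t e = (r.end == 0) ? data.size() : std::min(r.end, data.size());
    if (e < b) e = b;  // empty result for an inverted range
    return data.substr(b, e - b);
}

// POSIX-like policy: the range only selects the start offset; the caller
// may keep reading up to max_bytes, even past r.end.
std::string ReadSeekOnly(const std::string& data, RangeSketch r,
                         std::size_t max_bytes) {
    std::size_t b = std::min(r.begin, data.size());
    return data.substr(b, max_bytes);
}
```

A caller that must not consume bytes beyond e therefore has to track the position itself when the backend only seeks.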

◆ OpenWriteStream()

WriteStreamPtr OpenWriteStream ( const std::string &  path)

◆ operator<<()

std::ostream & operator<< ( std::ostream &  os,
const Type &  t 
)

Definition at line 60 of file file_io.cpp.

References Directory, and File.

◆ S3Deinitialize()

void S3Deinitialize ( )

Definition at line 734 of file s3_file.cpp.

Referenced by Deinitialize().

◆ S3Glob()

void S3Glob ( const std::string &  ,
const GlobType &  ,
FileList &  
)

Definition at line 737 of file s3_file.cpp.

References die.

Referenced by Glob().

◆ S3Initialize()

void S3Initialize ( )

Definition at line 731 of file s3_file.cpp.

Referenced by Initialize().

◆ S3OpenReadStream()

ReadStreamPtr S3OpenReadStream ( const std::string &  ,
const common::Range &  
)

Definition at line 742 of file s3_file.cpp.

References die.

Referenced by OpenReadStream().

◆ S3OpenWriteStream()

WriteStreamPtr S3OpenWriteStream ( const std::string &  )

Definition at line 747 of file s3_file.cpp.

References die.

Referenced by OpenWriteStream().

◆ SysGlob()

void SysGlob ( const std::string &  path,
const GlobType &  gtype,
FileList &  filelist 
)

Glob a path and augment the FileList with matching file names.

Definition at line 144 of file sys_file.cpp.

References All, CSimpleGlob, debug, die, Directory, File, LOG1, FileInfo::path, FileInfo::size, sLOG, SysGlobWalkRecursive(), thrill::mem::to_string(), and FileInfo::type.

Referenced by Glob().

◆ SysGlobWalkRecursive()

static void thrill::vfs::SysGlobWalkRecursive ( const std::string &  path,
FileList &  filelist 
)
static

◆ SysOpenReadStream()

ReadStreamPtr SysOpenReadStream ( const std::string &  path,
const common::Range &  range = common::Range() 
)

Open file for reading and return file descriptor.

Handles compressed files by calling a decompressor in a pipe, like "cat $f | gzip -dc |" in bash.

Parameters
path   Path to open.
range  Byte range to read. The begin of the range is used as the seek offset; the end can be 0 to read the whole file. Depending on the underlying file system, one can read past the end without errors; the end is not enforced.

POSIX lseek function from current position.

Definition at line 323 of file sys_file.cpp.

References Range::begin, debug, tlx::ends_with(), LOG1, thrill::common::MakePipe(), O_BINARY, thrill::common::PortSetCloseOnExec(), and sLOG.

Referenced by OpenReadStream().
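
The seek-then-read behavior described for the range parameter can be sketched with plain POSIX calls. ReadFromOffset is a hypothetical helper, not part of Thrill; it omits the decompressor pipe entirely and does not retry interrupted reads.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <string>

// Open `path`, seek to byte offset `begin`, and read up to `max_bytes`.
// Returns the bytes read; returns an empty string if open or seek fails.
std::string ReadFromOffset(const std::string& path, off_t begin,
                           std::size_t max_bytes) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return std::string();

    std::string out;
    if (::lseek(fd, begin, SEEK_SET) != static_cast<off_t>(-1)) {
        out.resize(max_bytes);
        std::size_t got = 0;
        while (got < max_bytes) {
            ssize_t n = ::read(fd, &out[got], max_bytes - got);
            if (n <= 0) break;  // end of file or error: stop reading
            got += static_cast<std::size_t>(n);
        }
        out.resize(got);
    }
    ::close(fd);
    return out;
}
```

Nothing in this sketch stops at a range end: the caller decides how many bytes to request, which mirrors the note that the end of the range is not enforced.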

◆ SysOpenWriteStream()

WriteStreamPtr SysOpenWriteStream ( const std::string &  path)

Open file for writing and return file descriptor.

Handles compressed files by calling a compressor in a pipe, like "| gzip > $f" in bash.

Parameters
path   Path to open.

Definition at line 414 of file sys_file.cpp.

References debug, tlx::ends_with(), LOG1, thrill::common::MakePipe(), O_BINARY, thrill::common::PortSetCloseOnExec(), and sLOG.

Referenced by OpenWriteStream().