Thrill  0.1
thrill::vfs Namespace Reference

Namespaces

 glob_local
 

Classes

struct  FileInfo
 General information of vfs file.
 
struct  FileList
 List of file info and additional overall info.
 
class  ReadStream
 Reader object from any source.
 
class  TemporaryDirectory
 A class which creates a temporary directory in the current directory and returns it via get().
 
class  WriteStream
 Writer object to output data to any supported URI.
 

Typedefs

using ReadStreamPtr = tlx::CountingPtr< ReadStream >
 
using WriteStreamPtr = tlx::CountingPtr< WriteStream >
 

Enumerations

enum  GlobType { All, File, Directory }
 Type of objects to include in glob result.
 
enum  Type { File, Directory }
 VFS object type.
 

Functions

void Deinitialize ()
 Deinitialize VFS layer.
 
std::string FillFilePattern (const std::string &pathbase, size_t worker, size_t file_part)
 
FileList Glob (const std::vector< std::string > &globlist, const GlobType &gtype=GlobType::All)
 Reads a list of glob paths and delivers a file list, sizes, and prefix sums (in bytes) for all matching files.
 
FileList Glob (const std::string &glob, const GlobType &gtype=GlobType::All)
 Reads a glob path and delivers a file list, sizes, and prefix sums (in bytes) for all matching files.
 
void Hdfs3Deinitialize ()
 
void Hdfs3Glob (const std::string &, const GlobType &, FileList &)
 
void Hdfs3Initialize ()
 
ReadStreamPtr Hdfs3OpenReadStream (const std::string &, const common::Range &)
 
WriteStreamPtr Hdfs3OpenWriteStream (const std::string &)
 
void Initialize ()
 Initialize VFS layer.
 
bool IsCompressed (const std::string &path)
 
bool IsRemoteUri (const std::string &path)
 Returns true if the file at path is a remote URI such as s3:// or hdfs://.
 
ReadStreamPtr MakeBZip2ReadFilter (const ReadStreamPtr &)
 
WriteStreamPtr MakeBZip2WriteFilter (const WriteStreamPtr &)
 
ReadStreamPtr MakeGZipReadFilter (const ReadStreamPtr &)
 
WriteStreamPtr MakeGZipWriteFilter (const WriteStreamPtr &)
 
ReadStreamPtr OpenReadStream (const std::string &path, const common::Range &range=common::Range())
 Construct reader for given path uri.
 
WriteStreamPtr OpenWriteStream (const std::string &path)
 
std::ostream & operator<< (std::ostream &os, const Type &t)
 
void S3Deinitialize ()
 
void S3Glob (const std::string &, const GlobType &, FileList &)
 
void S3Initialize ()
 
ReadStreamPtr S3OpenReadStream (const std::string &, const common::Range &)
 
WriteStreamPtr S3OpenWriteStream (const std::string &)
 
void SysGlob (const std::string &path, const GlobType &gtype, FileList &filelist)
 Glob a path and augment the FileList with matching file names.
 
ReadStreamPtr SysOpenReadStream (const std::string &path, const common::Range &range=common::Range())
 Open file for reading and return file descriptor.
 
WriteStreamPtr SysOpenWriteStream (const std::string &path)
 Open file for writing and return file descriptor.
 

Typedef Documentation

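using ReadStreamPtr = tlx::CountingPtr< ReadStream >

Definition at line 142 of file file_io.hpp.

using WriteStreamPtr = tlx::CountingPtr< WriteStream >

Definition at line 143 of file file_io.hpp.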

Enumeration Type Documentation

enum GlobType [strong]

Type of objects to include in glob result.

Enumerator
All 
File 
Directory 

Definition at line 96 of file file_io.hpp.

enum Type [strong]

VFS object type.

Enumerator
File 
Directory 

Definition at line 52 of file file_io.hpp.

Function Documentation

void Deinitialize ( )

Deinitialize VFS layer.

Definition at line 39 of file file_io.cpp.

References Hdfs3Deinitialize(), and S3Deinitialize().

Referenced by thrill::api::Deinitialize(), and main().

std::string FillFilePattern ( const std::string &  pathbase,
size_t  worker,
size_t  file_part 
)

Takes pathbase and replaces $$$ with the worker value and ### with the file_part value.

Definition at line 70 of file file_io.cpp.

References debug, and sLOG.

Referenced by WriteBinaryNode< ValueType >::OpenNextFile().
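
As a rough illustration of the substitution described for FillFilePattern, the following self-contained sketch replaces the last run of `$` with the worker number and the last run of `#` with the file part. The names `ReplaceRun` and `FillPatternSketch` are hypothetical, and zero-padding to the width of the run is an assumption; the actual Thrill implementation may behave differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Thrill): replace the last run of `marker`
// characters in `s` with `value`, left-padded with zeros to the run's width.
// If `s` contains no marker, it is returned unchanged.
std::string ReplaceRun(std::string s, char marker, std::size_t value) {
    std::size_t end = s.rfind(marker);
    if (end == std::string::npos) return s;

    // Find where the run of markers that ends at `end` begins.
    std::size_t before = s.find_last_not_of(marker, end);
    std::size_t begin = (before == std::string::npos) ? 0 : before + 1;
    std::size_t width = end - begin + 1;

    std::string digits = std::to_string(value);
    if (digits.size() < width) digits.insert(0, width - digits.size(), '0');
    s.replace(begin, width, digits);
    return s;
}

// Assumed behaviour: `$` runs take the worker, `#` runs take the file part.
std::string FillPatternSketch(const std::string& pathbase,
                              std::size_t worker, std::size_t file_part) {
    return ReplaceRun(ReplaceRun(pathbase, '$', worker), '#', file_part);
}
```

Under these assumptions, `FillPatternSketch("out-$$$-###.txt", 2, 15)` yields `out-002-015.txt`.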

FileList Glob ( const std::vector< std::string > &  globlist,
const GlobType &  gtype 
)

Reads a list of glob paths and delivers a file list, sizes, and prefix sums (in bytes) for all matching files.

Definition at line 127 of file file_io.cpp.

References FileList::contains_compressed, FileList::contains_remote_uri, Hdfs3Glob(), S3Glob(), tlx::starts_with(), SysGlob(), and FileList::total_size.

Referenced by Glob(), main(), ReadBinaryNode< ValueType >::ReadBinaryNode(), and ReadLinesNode::ReadLinesNode().

FileList Glob ( const std::string &  glob,
const GlobType &  gtype 
)

Reads a glob path and delivers a file list, sizes, and prefix sums (in bytes) for all matching files.

Definition at line 171 of file file_io.cpp.

References Glob().
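
To make the "sizes and prefix sums" part of the Glob result concrete, here is a small sketch that computes per-file byte offsets and a total size from a list of file sizes. `SketchFileInfo`, `SketchFileList`, the `prefix_sum` field, and `ComputePrefixSums` are hypothetical stand-ins rather than the Thrill structs, and treating the prefix sum as exclusive (bytes before the file) is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the per-file and overall information.
struct SketchFileInfo {
    std::string path;
    std::uint64_t size;        // file size in bytes
    std::uint64_t prefix_sum;  // assumed: total bytes of all files before this one
};

struct SketchFileList {
    std::vector<SketchFileInfo> files;
    std::uint64_t total_size;
};

// Fill in exclusive prefix sums and the overall size in one pass.
void ComputePrefixSums(SketchFileList& list) {
    std::uint64_t running = 0;
    for (SketchFileInfo& fi : list.files) {
        fi.prefix_sum = running;
        running += fi.size;
    }
    list.total_size = running;
}
```

With such offsets, a worker that is assigned a global byte range can find the files overlapping it by comparing the range against each file's offset and size.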

void Hdfs3Deinitialize ( )

Definition at line 295 of file hdfs3_file.cpp.

Referenced by Deinitialize().

void Hdfs3Glob ( const std::string &  ,
const GlobType &  ,
FileList &   
)

Definition at line 298 of file hdfs3_file.cpp.

References die.

Referenced by Glob().

void Hdfs3Initialize ( )

Definition at line 292 of file hdfs3_file.cpp.

Referenced by Initialize().

ReadStreamPtr Hdfs3OpenReadStream ( const std::string &  ,
const common::Range &   
)

Definition at line 303 of file hdfs3_file.cpp.

References die.

Referenced by OpenReadStream().

WriteStreamPtr Hdfs3OpenWriteStream ( const std::string &  )

Definition at line 308 of file hdfs3_file.cpp.

References die.

Referenced by OpenWriteStream().

void Initialize ( )

Initialize VFS layer.

Definition at line 34 of file file_io.cpp.

References Hdfs3Initialize(), and S3Initialize().

Referenced by thrill::api::Initialize(), and main().

bool IsCompressed ( const std::string &  path)

Returns true if the file at path is compressed (e.g., ends with '.{gz,bz2,xz,lzo}').

Definition at line 46 of file file_io.cpp.

References tlx::ends_with().

Referenced by FileInfo::IsCompressed().
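
A suffix check of the kind described for IsCompressed might look like the sketch below. `EndsWith` and `IsCompressedSketch` are hypothetical names; the documented function references tlx::ends_with(), which this sketch replaces with a local helper, and the extension list is taken from the description above.

```cpp
#include <string>

// Hypothetical helper: true if `s` ends with `suffix`.
bool EndsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Sketch only: classifies a path by its file name suffix, without opening it.
bool IsCompressedSketch(const std::string& path) {
    static const char* const kExtensions[] = {".gz", ".bz2", ".xz", ".lzo"};
    for (const char* ext : kExtensions) {
        if (EndsWith(path, ext)) return true;
    }
    return false;
}
```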

bool IsRemoteUri ( const std::string &  path)

Returns true if the file at path is a remote URI such as s3:// or hdfs://.

Definition at line 54 of file file_io.cpp.

References tlx::starts_with().

Referenced by FileInfo::IsRemoteUri().
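
The corresponding prefix check can be sketched in a few lines. `IsRemoteUriSketch` is a hypothetical name, and limiting the schemes to exactly the two mentioned above is an assumption.

```cpp
#include <string>

// Sketch only: a path counts as remote if it starts with a known URI scheme.
bool IsRemoteUriSketch(const std::string& path) {
    static const char* const kSchemes[] = {"s3://", "hdfs://"};
    for (const char* scheme : kSchemes) {
        // rfind(scheme, 0) == 0 holds exactly when `path` starts with `scheme`.
        if (path.rfind(scheme, 0) == 0) return true;
    }
    return false;
}
```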

ReadStreamPtr MakeBZip2ReadFilter ( const ReadStreamPtr &  )

Definition at line 233 of file bzip2_filter.cpp.

References die.

Referenced by OpenReadStream().

WriteStreamPtr MakeBZip2WriteFilter ( const WriteStreamPtr &  )

Definition at line 228 of file bzip2_filter.cpp.

References die.

Referenced by OpenWriteStream().

ReadStreamPtr MakeGZipReadFilter ( const ReadStreamPtr &  )

Definition at line 271 of file gzip_filter.cpp.

References die.

Referenced by OpenReadStream().

WriteStreamPtr MakeGZipWriteFilter ( const WriteStreamPtr &  )

Definition at line 266 of file gzip_filter.cpp.

References die.

Referenced by OpenWriteStream().

ReadStreamPtr OpenReadStream ( const std::string &  path,
const common::Range &  range = common::Range() 
)

Construct reader for given path uri.

Range is the byte range [b,e) inside the file to read. If e = 0, the complete file is read.

For the POSIX SysFile implementation the range is used only to seek to the byte offset b. It allows additional bytes after e to be read.

For the S3File implementation, however, the range [b,e) is used to determine which data to fetch from S3. Hence, once e is reached, read() will return EOF.

Definition at line 179 of file file_io.cpp.

References Range::begin, die_unless, tlx::ends_with(), Hdfs3OpenReadStream(), MakeBZip2ReadFilter(), MakeGZipReadFilter(), S3OpenReadStream(), tlx::starts_with(), and SysOpenReadStream().

Referenced by ReadBinaryNode< ValueType >::FileBlockSource::FileBlockSource(), ReadLinesNode::InputLineIteratorCompressed::HasNext(), ReadLinesNode::InputLineIteratorCompressed::InputLineIteratorCompressed(), ReadLinesNode::InputLineIteratorUncompressed::InputLineIteratorUncompressed(), main(), ReadLinesNode::InputLineIteratorUncompressed::Next(), and ReadLinesNode::InputLineIteratorCompressed::Next().
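
The range convention described for OpenReadStream (half-open [b,e), with e = 0 meaning the whole file) can be illustrated with standard file streams. This sketch enforces the end of the range, which matches the S3 behaviour described above rather than the POSIX one. `ByteRange` and `ReadRange` are hypothetical names and do not use the Thrill API.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical stand-in for a byte range [begin, end);
// end == 0 means "read to the end of the file".
struct ByteRange {
    std::size_t begin;
    std::size_t end;
};

// Read the bytes of `path` selected by `range`, stopping at range.end.
std::string ReadRange(const std::string& path, const ByteRange& range) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::string();
    in.seekg(static_cast<std::streamoff>(range.begin));

    if (range.end == 0) {
        // No end given: consume everything from `begin` onwards.
        return std::string(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
    }
    if (range.end <= range.begin) return std::string();

    std::string out(range.end - range.begin, '\0');
    in.read(&out[0], static_cast<std::streamsize>(out.size()));
    out.resize(static_cast<std::size_t>(in.gcount()));  // file may be shorter
    return out;
}
```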

WriteStreamPtr OpenWriteStream ( const std::string &  path)

std::ostream & operator<< ( std::ostream &  os,
const Type &  t 
)

Definition at line 59 of file file_io.cpp.

References Directory, and File.

void S3Deinitialize ( )

Definition at line 730 of file s3_file.cpp.

Referenced by Deinitialize().

void S3Glob ( const std::string &  ,
const GlobType &  ,
FileList &   
)

Definition at line 733 of file s3_file.cpp.

References die.

Referenced by Glob().

void S3Initialize ( )

Definition at line 727 of file s3_file.cpp.

Referenced by Initialize().

ReadStreamPtr S3OpenReadStream ( const std::string &  ,
const common::Range &   
)

Definition at line 738 of file s3_file.cpp.

References die.

Referenced by OpenReadStream().

WriteStreamPtr S3OpenWriteStream ( const std::string &  )

Definition at line 743 of file s3_file.cpp.

References die.

Referenced by OpenWriteStream().

void SysGlob ( const std::string &  path,
const GlobType &  gtype,
FileList &  filelist 
)

Glob a path and augment the FileList with matching file names.

Definition at line 54 of file sys_file.cpp.

References All, CSimpleGlob, die, Directory, File, FileInfo::path, FileInfo::size, and FileInfo::type.

Referenced by Glob().

ReadStreamPtr SysOpenReadStream ( const std::string &  path,
const common::Range &  range = common::Range() 
)

Open file for reading and return file descriptor.

Handles compressed files by calling a decompressor in a pipe, like "cat $f | gzip -dc |" in bash.

Parameters
path   Path to open.
range  Byte range to read. The begin of the range is used as the seek offset; end can be 0 to read the whole file. Depending on the underlying file system, one can read past end without errors; the end is not enforced.

POSIX lseek function from current position.

Definition at line 230 of file sys_file.cpp.

References Range::begin, debug, tlx::ends_with(), LOG1, lseek, thrill::common::MakePipe(), O_BINARY, thrill::common::PortSetCloseOnExec(), and sLOG.

Referenced by OpenReadStream().

WriteStreamPtr SysOpenWriteStream ( const std::string &  path)

Open file for writing and return file descriptor.

Handles compressed files by calling a compressor in a pipe, like "| gzip > $f" in bash.

Parameters
path   Path to open.

Definition at line 321 of file sys_file.cpp.

References debug, tlx::ends_with(), LOG1, thrill::common::MakePipe(), O_BINARY, thrill::common::PortSetCloseOnExec(), and sLOG.

Referenced by OpenWriteStream().