OpenMS
FASTAFile Class Reference

Reads and writes FASTA files. Unusual symbols in the protein/gene sequence (such as the translation end marker *) are kept. The aggregate methods load() and store() read or write a whole set of protein sequences at once, at the cost of memory.

#include <OpenMS/FORMAT/FASTAFile.h>


Classes

struct  FASTAEntry
 FASTA entry type (identifier, description and sequence). The identifier is the first string written after the > in the FASTA file, up to the first whitespace. The part after the first whitespace is stored in description, and the text from the next line until the next > (exclusive) is stored in sequence.
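
The split of a header line into identifier and description can be sketched with the standard library alone. This is an illustration of the rule stated above, not the OpenMS implementation: `Entry` and `splitHeader` are hypothetical names, and dropping the whitespace between the two parts is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the entry type described above (not the OpenMS struct).
struct Entry
{
  std::string identifier;
  std::string description;
  std::string sequence;
};

// Splits a header line of the form ">identifier description" into its two parts.
// Assumption: the whitespace between identifier and description is dropped.
inline void splitHeader(const std::string& header, Entry& out)
{
  assert(!header.empty() && header[0] == '>'); // precondition: a header line
  const std::size_t ws = header.find_first_of(" \t", 1);
  if (ws == std::string::npos)
  {
    out.identifier = header.substr(1); // no description present
    out.description.clear();
    return;
  }
  out.identifier = header.substr(1, ws - 1);
  const std::size_t rest = header.find_first_not_of(" \t", ws);
  out.description = (rest == std::string::npos) ? std::string() : header.substr(rest);
}
```

A header without any whitespace yields an empty description in this sketch.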
 

Public Member Functions

 FASTAFile ()=default
 Default constructor.
 
 ~FASTAFile () override=default
 Destructor.
 
void readStart (const String &filename)
 Prepares the FASTA file given by 'filename' for streamed reading using readNext().
 
bool readNext (FASTAEntry &protein)
 Reads the next FASTA entry from the file. To read all entries in one go, use load().
 
std::streampos position ()
 Returns the current stream position.
 
bool atEnd ()
 Returns whether the stream is at EOF.
 
bool setPosition (const std::streampos &pos)
 Seeks the stream to 'pos'.
 
void writeStart (const String &filename)
 Prepares the FASTA file given by 'filename' for streamed writing using writeNext().
 
void writeNext (const FASTAEntry &protein)
 Stores the data given by 'protein'. Call writeStart() once before calling writeNext(), and call writeEnd() when done to close the file.
 
void writeEnd ()
 Flushes and closes the file. Called implicitly when the FASTAFile object goes out of scope.
 
void load (const String &filename, std::vector< FASTAEntry > &data) const
 Loads the FASTA file given by 'filename' and stores the entries in 'data'. This uses more RAM than readStart() and readNext().
 
void store (const String &filename, const std::vector< FASTAEntry > &data) const
 Stores the entries given by 'data' in the file 'filename'.
 
- Public Member Functions inherited from ProgressLogger
 ProgressLogger ()
 Constructor.
 
virtual ~ProgressLogger ()
 Destructor.
 
 ProgressLogger (const ProgressLogger &other)
 Copy constructor.
 
ProgressLogger & operator= (const ProgressLogger &other)
 Assignment operator.
 
void setLogType (LogType type) const
 Sets the progress log that should be used. The default type is NONE.
 
LogType getLogType () const
 Returns the type of progress log being used.
 
void startProgress (SignedSize begin, SignedSize end, const String &label) const
 Initializes the progress display.
 
void setProgress (SignedSize value) const
 Sets the current progress.
 
void endProgress (UInt64 bytes_processed=0) const
 
void nextProgress () const
 Increments the progress by 1 (according to the range begin-end).
 

Protected Member Functions

bool readEntry_ (std::string &id, std::string &description, std::string &seq)
 Reads a protein entry from the current file position and returns the ID, description and sequence through the output parameters.
 

Protected Attributes

std::fstream infile_
 filestream for reading; init using FASTAFile::readStart()
 
std::ofstream outfile_
 filestream for writing; init using FASTAFile::writeStart()
 
Size entries_read_ {0}
 some internal book-keeping during reading
 
std::streampos fileSize_ {}
 total number of characters of filestream
 
std::string seq_
 sequence of currently read protein
 
std::string id_
 identifier of currently read protein
 
std::string description_
 description of currently read protein
 
- Protected Attributes inherited from ProgressLogger
LogType type_
 
time_t last_invoke_
 
ProgressLoggerImpl * current_logger_
 

Additional Inherited Members

- Public Types inherited from ProgressLogger
enum  LogType { CMD, GUI, NONE }
 Possible log types.
 
- Static Protected Member Functions inherited from ProgressLogger
static String logTypeToFactoryName_ (LogType type)
 Returns the name of the factory product used for this log type.
 
- Static Protected Attributes inherited from ProgressLogger
static int recursion_depth_
 

Detailed Description

Reads and writes FASTA files. Unusual symbols in the protein/gene sequence (such as the translation end marker *) are kept. The aggregate methods load() and store() read or write a whole set of protein sequences at once, at the cost of memory.

Alternatively, read or write protein sequences one at a time using readStart(), readNext() and writeStart(), writeNext(), writeEnd(), which is more memory efficient. A single FASTAFile instance can read from one FASTA file and write to another.
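
The memory argument for entry-by-entry processing can be illustrated with a standard-library sketch that copies a FASTA stream while holding only one entry at a time. It mirrors the readNext()/writeNext() pattern but is not the OpenMS code: `Entry`, `readNextEntry`, `copyStreamed` and `demoCopy` are hypothetical names, and the parsing rules are simplified assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for FASTAEntry; field names follow the description above.
struct Entry
{
  std::string identifier, description, sequence;
};

// Reads one entry from 'in' into 'e'; returns false once no header line is left.
inline bool readNextEntry(std::istream& in, Entry& e)
{
  std::string line;
  bool found = false;
  while (std::getline(in, line)) // skip forward to the next header line
  {
    if (!line.empty() && line[0] == '>') { found = true; break; }
  }
  if (!found) return false;

  e = Entry{};
  const std::size_t ws = line.find_first_of(" \t");
  e.identifier = line.substr(1, ws == std::string::npos ? ws : ws - 1);
  if (ws != std::string::npos) e.description = line.substr(ws + 1);

  // Sequence lines run until the next '>' (exclusive) or the end of input.
  // Every character is appended as-is, so symbols such as '*' are kept.
  while (in.peek() != '>' && std::getline(in, line))
  {
    e.sequence += line;
  }
  return true;
}

// Copies all entries from 'in' to 'out'; only one entry is in memory at a time.
inline std::size_t copyStreamed(std::istream& in, std::ostream& out)
{
  std::size_t count = 0;
  Entry e;
  while (readNextEntry(in, e))
  {
    out << '>' << e.identifier;
    if (!e.description.empty()) out << ' ' << e.description;
    out << '\n' << e.sequence << '\n';
    ++count;
  }
  return count;
}

// In-memory round trip: two entries in, two entries out, '*' preserved.
inline void demoCopy()
{
  std::istringstream in(">a first\nMKV\nLLT*\n>b\nGG\n");
  std::ostringstream out;
  const std::size_t count = copyStreamed(in, out);
  assert(count == 2);
  assert(out.str() == ">a first\nMKVLLT*\n>b\nGG\n");
}
```

An aggregate load would instead push every `Entry` into a vector, so its memory use grows with the size of the file.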

Constructor & Destructor Documentation

◆ FASTAFile()

FASTAFile ( )
default

Default constructor.

◆ ~FASTAFile()

~FASTAFile ( )
override default

Destructor.

Member Function Documentation

◆ atEnd()

bool atEnd ( )

Returns whether the stream is at EOF.
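
One common way to test for the end of an input stream without consuming data is to peek at the next character. The sketch below shows that technique on a plain `std::istream`; whether atEnd() is implemented this way is an assumption, and `streamAtEnd` and `demoAtEnd` are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not the OpenMS member: true if no further character can be read.
inline bool streamAtEnd(std::istream& in)
{
  // peek() looks at the next character without consuming it and
  // yields the EOF marker when the stream is exhausted.
  return in.peek() == std::char_traits<char>::eof();
}

inline void demoAtEnd()
{
  std::istringstream in("A");
  assert(!streamAtEnd(in)); // one character left
  in.get();                 // consume it
  assert(streamAtEnd(in));  // nothing left to read
}
```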

◆ load()

void load ( const String &  filename,
std::vector< FASTAEntry > &  data 
) const

Loads the FASTA file given by 'filename' and stores the entries in 'data'. This uses more RAM than readStart() and readNext().

Exceptions
Exception::FileNotFound is thrown if the file does not exist.
Exception::ParseError is thrown if the file does not conform to the FASTA format.

Referenced by NucleicAcidSearchEngine::main_().

◆ position()

std::streampos position ( )

Returns the current stream position.

◆ readEntry_()

bool readEntry_ ( std::string &  id,
std::string &  description,
std::string &  seq 
)
protected

Reads a protein entry from the current file position and returns the ID, description and sequence through the output parameters.

Returns
true if the protein entry was read and saved successfully, false otherwise

◆ readNext()

bool readNext ( FASTAEntry &  protein)

Reads the next FASTA entry from the file. To read all entries in one go, use load().

Returns
true if an entry was read; false if EOF was reached
Exceptions
Exception::FileNotFound is thrown if the file does not exist.
Exception::ParseError is thrown if the file does not conform to the FASTA format.

◆ readStart()

void readStart ( const String &  filename)

Prepares the FASTA file given by 'filename' for streamed reading using readNext().

Exceptions
Exception::FileNotFound is thrown if the file does not exist.
Exception::ParseError is thrown if the file does not conform to the FASTA format.

◆ setPosition()

bool setPosition ( const std::streampos &  pos)

Seeks the stream to 'pos'.
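
Remembering a stream position and seeking back to it later allows random access to entries without keeping them in memory. The following sketch shows the idea with `tellg()` and `seekg()` on a plain `std::istream`; it is not the OpenMS implementation, and `indexHeaders`, `lineAt` and `demoSeek` are hypothetical names.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Records the stream offset of every header line (a line starting with '>').
inline std::vector<std::streampos> indexHeaders(std::istream& in)
{
  std::vector<std::streampos> offsets;
  std::string line;
  for (;;)
  {
    const std::streampos pos = in.tellg(); // offset of the line about to be read
    if (!std::getline(in, line)) break;
    if (!line.empty() && line[0] == '>') offsets.push_back(pos);
  }
  in.clear(); // drop the EOF/fail flags so the stream can be repositioned
  return offsets;
}

// Seeks to 'pos' and returns the line found there; 'ok' reports whether that worked.
inline std::string lineAt(std::istream& in, std::streampos pos, bool& ok)
{
  in.clear();
  in.seekg(pos);
  std::string line;
  ok = static_cast<bool>(std::getline(in, line));
  return line;
}

inline void demoSeek()
{
  std::istringstream in(">a first\nMKV\n>b second\nGG\n");
  const std::vector<std::streampos> idx = indexHeaders(in);
  assert(idx.size() == 2);
  bool ok = false;
  const std::string second = lineAt(in, idx[1], ok); // jump to the second entry
  assert(ok && second == ">b second");
  const std::string first = lineAt(in, idx[0], ok);  // and back to the first
  assert(ok && first == ">a first");
}
```

Clearing the stream state before seeking matters in this sketch because a stream that has hit EOF otherwise refuses further reads.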

◆ store()

void store ( const String &  filename,
const std::vector< FASTAEntry > &  data 
) const

Stores the entries given by 'data' in the file 'filename'.

This uses more RAM than writeStart() and writeNext().

Exceptions
Exception::UnableToCreateFile is thrown if the process is not able to write the file.

◆ writeEnd()

void writeEnd ( )

Flushes and closes the file. Called implicitly when the FASTAFile object goes out of scope.
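
Closing on scope exit is the usual RAII pattern. The sketch below shows it with a small wrapper around `std::ofstream`; `ScopedFastaWriter` and `demoScopedWriter` are hypothetical names and this is not the OpenMS class. An explicit close is still useful when the file has to be complete before the writer object is destroyed, for example when it is read back in the same scope.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical writer, not the OpenMS class: the destructor closes the file.
class ScopedFastaWriter
{
public:
  explicit ScopedFastaWriter(const std::string& filename) : out_(filename) {}
  ~ScopedFastaWriter() { end(); }
  ScopedFastaWriter(const ScopedFastaWriter&) = delete;
  ScopedFastaWriter& operator=(const ScopedFastaWriter&) = delete;

  // Writes one entry as a header line followed by the sequence.
  void next(const std::string& id, const std::string& sequence)
  {
    out_ << '>' << id << '\n' << sequence << '\n';
  }

  // Flushes and closes the file; safe to call more than once.
  void end()
  {
    if (out_.is_open()) out_.close();
  }

private:
  std::ofstream out_;
};

inline void demoScopedWriter()
{
  const std::string path = "scoped_writer_demo.fasta";
  {
    ScopedFastaWriter writer(path);
    writer.next("a", "MKV");
  } // destructor runs here and closes the file

  std::ifstream in(path);
  std::string header, sequence;
  std::getline(in, header);
  std::getline(in, sequence);
  assert(header == ">a");
  assert(sequence == "MKV");
  in.close();
  std::remove(path.c_str()); // clean up the demo file
}
```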

◆ writeNext()

void writeNext ( const FASTAEntry &  protein)

Stores the data given by 'protein'. Call writeStart() once before calling writeNext(), and call writeEnd() when done to close the file.

Exceptions
Exception::UnableToCreateFile is thrown if the process is not able to write the file.
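
FASTA writers conventionally break long sequences into fixed-width lines. Whether writeNext() wraps, and at which width, is not stated here, so the following is only a sketch of the convention with an assumed default width of 80; `writeEntry` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Writes one entry; the sequence is broken into lines of at most 'width' characters.
// The default of 80 is an assumption, not taken from the OpenMS documentation.
inline void writeEntry(std::ostream& out, const std::string& id, const std::string& desc,
                       const std::string& seq, std::size_t width = 80)
{
  assert(width > 0); // a zero width would never advance through the sequence
  out << '>' << id;
  if (!desc.empty()) out << ' ' << desc;
  out << '\n';
  for (std::size_t i = 0; i < seq.size(); i += width)
  {
    out << seq.substr(i, width) << '\n';
  }
}
```

With a width of 3, the sequence `ABCDEFG` is written as the three lines `ABC`, `DEF` and `G`.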

◆ writeStart()

void writeStart ( const String &  filename)

Prepares the FASTA file given by 'filename' for streamed writing using writeNext().

Exceptions
Exception::UnableToCreateFile is thrown if the process is not able to write to the file (for example, because the disk is full).

Member Data Documentation

◆ description_

std::string description_
protected

description of currently read protein

◆ entries_read_

Size entries_read_ {0}
protected

some internal book-keeping during reading

◆ fileSize_

std::streampos fileSize_ {}
protected

total number of characters of filestream

◆ id_

std::string id_
protected

identifier of currently read protein

◆ infile_

std::fstream infile_
protected

filestream for reading; init using FASTAFile::readStart()

◆ outfile_

std::ofstream outfile_
protected

filestream for writing; init using FASTAFile::writeStart()

◆ seq_

std::string seq_
protected

sequence of currently read protein