OpenMS
ProteinIdentificationArrowIO Class Reference

Import and export ProteinIdentification data to/from Apache Arrow format.

#include <OpenMS/FORMAT/ProteinIdentificationArrowIO.h>

Static Public Member Functions

static std::shared_ptr< arrow::Table > exportProteinsToArrow (const std::vector< ProteinIdentification > &protein_identifications)
 Export protein hits to Apache Arrow Table.
 
static bool exportProteinsToParquet (const std::vector< ProteinIdentification > &protein_identifications, const String &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Export protein hits to Parquet file.
 
static std::shared_ptr< arrow::Table > exportProteinGroupsToArrow (const std::vector< ProteinIdentification > &protein_identifications)
 Export protein groups to Apache Arrow Table.
 
static bool exportProteinGroupsToParquet (const std::vector< ProteinIdentification > &protein_identifications, const String &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Export protein groups to Parquet file.
 
static std::shared_ptr< arrow::Table > exportSearchParamsToArrow (const std::vector< ProteinIdentification > &protein_identifications)
 Export search parameters to Apache Arrow Table.
 
static bool exportSearchParamsToParquet (const std::vector< ProteinIdentification > &protein_identifications, const String &filename, const ParquetWriteConfig &config=ParquetWriteConfig{})
 Export search parameters to Parquet file.
 
static bool importFromParquet (const String &proteins_filename, const String &protein_groups_filename, const String &search_params_filename, std::vector< ProteinIdentification > &protein_identifications)
 Import all three Parquet files and reconstruct ProteinIdentifications.
 
static bool importSearchParamsFromArrow (const std::shared_ptr< arrow::Table > &table, std::vector< ProteinIdentification > &protein_identifications)
 Import search parameters from Arrow Table.
 
static bool importProteinsFromArrow (const std::shared_ptr< arrow::Table > &table, std::vector< ProteinIdentification > &protein_identifications)
 Import protein hits from Arrow Table.
 
static bool importProteinGroupsFromArrow (const std::shared_ptr< arrow::Table > &table, std::vector< ProteinIdentification > &protein_identifications)
 Import protein groups from Arrow Table.
 
static bool importSearchParamsFromParquet (const String &filename, std::vector< ProteinIdentification > &protein_identifications)
 Import search parameters from Parquet file.
 
static bool importProteinsFromParquet (const String &filename, std::vector< ProteinIdentification > &protein_identifications)
 Import protein hits from Parquet file.
 
static bool importProteinGroupsFromParquet (const String &filename, std::vector< ProteinIdentification > &protein_identifications)
 Import protein groups from Parquet file.
 

Detailed Description

Import and export ProteinIdentification data to/from Apache Arrow format.

This class provides static methods to export and import ProteinIdentification data to/from Apache Arrow Tables and Parquet files. Separate tables are provided for protein hits, protein groups, and search parameters.

Experimental classes:
This API is experimental and may change in future versions.

Member Function Documentation

◆ exportProteinGroupsToArrow()

static std::shared_ptr< arrow::Table > exportProteinGroupsToArrow ( const std::vector< ProteinIdentification > &  protein_identifications)
static

Export protein groups to Apache Arrow Table.

Each ProteinGroup becomes one row with group probability and member accessions.

Parameters
[in]  protein_identifications  Vector of protein identifications
Returns
Shared pointer to Arrow Table, or nullptr on error
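
The one-row-per-group layout described above can be pictured with a small standalone model. This is only a sketch: `GroupSketch`, `RunWithGroups`, `GroupRow`, and `flattenGroups` are hypothetical stand-ins, not OpenMS or Arrow types, and the `run_identifier` column is an assumption inferred from the import functions, which match rows by run_identifier.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a protein group (NOT the OpenMS class).
struct GroupSketch
{
  double probability;                   // group probability
  std::vector<std::string> accessions;  // member accessions
};

// Hypothetical stand-in for one identification run and its groups.
struct RunWithGroups
{
  std::string run_identifier;
  std::vector<GroupSketch> groups;
};

// One table row per group; the column set is assumed for illustration.
struct GroupRow
{
  std::string run_identifier;
  double probability;
  std::vector<std::string> accessions;
};

// Flatten all groups of all runs into rows, preserving input order.
std::vector<GroupRow> flattenGroups(const std::vector<RunWithGroups>& runs)
{
  std::vector<GroupRow> rows;
  for (const RunWithGroups& run : runs)
  {
    for (const GroupSketch& group : run.groups)
    {
      rows.push_back(GroupRow{run.run_identifier, group.probability, group.accessions});
    }
  }
  return rows;
}
```

In this model, a run with two groups and a run with one group yield three rows, each carrying the identifier of the run it came from so that an importer can route the row back to its run.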

◆ exportProteinGroupsToParquet()

static bool exportProteinGroupsToParquet ( const std::vector< ProteinIdentification > &  protein_identifications,
const String &  filename,
const ParquetWriteConfig &  config = ParquetWriteConfig{} 
)
static

Export protein groups to Parquet file.

Parameters
[in]  protein_identifications  Vector of protein identifications
[in]  filename  Output file path
[in]  config  Parquet writing options
Returns
true on success, false on error

◆ exportProteinsToArrow()

static std::shared_ptr< arrow::Table > exportProteinsToArrow ( const std::vector< ProteinIdentification > &  protein_identifications)
static

Export protein hits to Apache Arrow Table.

Each ProteinHit becomes one row with identification, score, and metadata.

Parameters
[in]  protein_identifications  Vector of protein identifications
Returns
Shared pointer to Arrow Table, or nullptr on error

◆ exportProteinsToParquet()

static bool exportProteinsToParquet ( const std::vector< ProteinIdentification > &  protein_identifications,
const String &  filename,
const ParquetWriteConfig &  config = ParquetWriteConfig{} 
)
static

Export protein hits to Parquet file.

Parameters
[in]  protein_identifications  Vector of protein identifications
[in]  filename  Output file path
[in]  config  Parquet writing options
Returns
true on success, false on error

◆ exportSearchParamsToArrow()

static std::shared_ptr< arrow::Table > exportSearchParamsToArrow ( const std::vector< ProteinIdentification > &  protein_identifications)
static

Export search parameters to Apache Arrow Table.

Each ProteinIdentification's SearchParameters becomes one row.

Parameters
[in]  protein_identifications  Vector of protein identifications
Returns
Shared pointer to Arrow Table, or nullptr on error

◆ exportSearchParamsToParquet()

static bool exportSearchParamsToParquet ( const std::vector< ProteinIdentification > &  protein_identifications,
const String &  filename,
const ParquetWriteConfig &  config = ParquetWriteConfig{} 
)
static

Export search parameters to Parquet file.

Parameters
[in]  protein_identifications  Vector of protein identifications
[in]  filename  Output file path
[in]  config  Parquet writing options
Returns
true on success, false on error

◆ importFromParquet()

static bool importFromParquet ( const String &  proteins_filename,
const String &  protein_groups_filename,
const String &  search_params_filename,
std::vector< ProteinIdentification > &  protein_identifications 
)
static

Import all three Parquet files and reconstruct ProteinIdentifications.

Reads the three Parquet files and reconstructs a vector of ProteinIdentification objects with hits, groups, and search parameters.

Parameters
[in]  proteins_filename  Path to proteins Parquet file
[in]  protein_groups_filename  Path to protein groups Parquet file
[in]  search_params_filename  Path to search params Parquet file
[out]  protein_identifications  Reconstructed protein identifications
Returns
true on success, false on error

◆ importProteinGroupsFromArrow()

static bool importProteinGroupsFromArrow ( const std::shared_ptr< arrow::Table > &  table,
std::vector< ProteinIdentification > &  protein_identifications 
)
static

Import protein groups from Arrow Table.

Adds ProteinGroups and IndistinguishableProteins to the ProteinIdentifications whose run_identifier matches.

Parameters
[in]  table  Arrow Table with protein groups
[out]  protein_identifications  Protein identifications to populate
Returns
true on success, false on error

◆ importProteinGroupsFromParquet()

static bool importProteinGroupsFromParquet ( const String &  filename,
std::vector< ProteinIdentification > &  protein_identifications 
)
static

Import protein groups from Parquet file.

Parameters
[in]  filename  Path to Parquet file
[out]  protein_identifications  Protein identifications to populate
Returns
true on success, false on error

◆ importProteinsFromArrow()

static bool importProteinsFromArrow ( const std::shared_ptr< arrow::Table > &  table,
std::vector< ProteinIdentification > &  protein_identifications 
)
static

Import protein hits from Arrow Table.

Adds each ProteinHit to the ProteinIdentification whose run_identifier matches. If no matching ProteinIdentification exists, a new one is created.

Parameters
[in]  table  Arrow Table with protein hits
[out]  protein_identifications  Protein identifications to populate
Returns
true on success, false on error
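
The match-or-create behavior described above can be sketched with plain C++ stand-ins. Everything here is hypothetical and for illustration only: `HitRow`, `HitSketch`, `RunWithHits`, and `mergeHits` are not OpenMS types or functions, and the row columns are assumed. The actual implementation may differ, for example in how it looks up runs.

```cpp
#include <string>
#include <vector>

// Hypothetical row of a protein-hit table (column names are assumed).
struct HitRow
{
  std::string run_identifier;
  std::string accession;
  double score;
};

// Hypothetical stand-in for a protein hit (NOT the OpenMS class).
struct HitSketch
{
  std::string accession;
  double score;
};

// Hypothetical stand-in for one identification run and its hits.
struct RunWithHits
{
  std::string run_identifier;
  std::vector<HitSketch> hits;
};

// Attach every row to the run with the same run_identifier,
// appending a new run when no existing one matches.
void mergeHits(const std::vector<HitRow>& rows, std::vector<RunWithHits>& runs)
{
  for (const HitRow& row : rows)
  {
    RunWithHits* target = nullptr;
    for (RunWithHits& run : runs)
    {
      if (run.run_identifier == row.run_identifier)
      {
        target = &run;
        break;
      }
    }
    if (target == nullptr)
    {
      runs.push_back(RunWithHits{row.run_identifier, {}});
      target = &runs.back();  // valid until the next push_back
    }
    target->hits.push_back(HitSketch{row.accession, row.score});
  }
}
```

In this model, starting from a single run "run1" and merging rows for "run1", "run2", and "run1" leaves two runs: "run1" with two hits and a newly created "run2" with one.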

◆ importProteinsFromParquet()

static bool importProteinsFromParquet ( const String &  filename,
std::vector< ProteinIdentification > &  protein_identifications 
)
static

Import protein hits from Parquet file.

Parameters
[in]  filename  Path to Parquet file
[out]  protein_identifications  Protein identifications to populate
Returns
true on success, false on error

◆ importSearchParamsFromArrow()

static bool importSearchParamsFromArrow ( const std::shared_ptr< arrow::Table > &  table,
std::vector< ProteinIdentification > &  protein_identifications 
)
static

Import search parameters from Arrow Table.

Each row becomes a ProteinIdentification shell with run-level metadata and SearchParameters populated.

Parameters
[in]  table  Arrow Table with search parameters
[out]  protein_identifications  Reconstructed protein identifications
Returns
true on success, false on error

◆ importSearchParamsFromParquet()

static bool importSearchParamsFromParquet ( const String &  filename,
std::vector< ProteinIdentification > &  protein_identifications 
)
static

Import search parameters from Parquet file.

Parameters
[in]  filename  Path to Parquet file
[out]  protein_identifications  Reconstructed protein identifications
Returns
true on success, false on error