Workflow Type: Common Workflow Language
Stable

Protein Conformational ensembles generation

Workflow included in the ELIXIR 3D-Bioinfo Implementation Study:

Building on PDBe-KB to chart and characterize the conformation landscape of native proteins

This tutorial aims to illustrate the process of generating protein conformational ensembles from 3D structures and analysing their molecular flexibility, step by step, using the BioExcel Building Blocks library (biobb).

Conformational landscape of native proteins

Proteins are dynamic systems that adopt multiple conformational states, a property essential for many biological processes (e.g. binding other proteins, nucleic acids, or small molecule ligands, or switching between functionally active and inactive states). Characterizing the different conformational states of proteins and the transitions between them is therefore critical for gaining insight into their biological function, and can help explain the effects of genetic variants in health and disease and the action of drugs.

Structural biology has become increasingly efficient in sampling the different conformational states of proteins. The PDB currently archives more than 170,000 individual structures, but over two thirds of these structures represent multiple conformations of the same or related proteins, observed in different crystal forms, when interacting with other proteins or other macromolecules, or upon binding small molecule ligands. Charting this conformational diversity across the PDB can therefore be employed to build a useful approximation of the conformational landscape of native proteins.

A number of resources and tools that describe and characterize various, often complementary, aspects of protein conformational diversity in known structures have been developed, notably by groups in Europe. These tools include algorithms, with varying degrees of sophistication, for aligning the 3D structures of individual protein chains, domains, or protein assemblies, and for evaluating their degree of structural similarity. Using such tools, one can align structures pairwise, compute the corresponding similarity matrix, and identify ensembles of structures/conformations with a defined similarity level that tend to recur in different PDB entries, an operation typically performed using clustering methods. Such workflows form the basis of resources such as CATH, Contemplate, or PDBflex, which offer access to conformational ensembles composed of similar conformations clustered according to various criteria. Other types of tools focus on differences between protein conformations, identifying regions of proteins that undergo large collective displacements in different PDB entries, regions that act as hinges or linkers, or regions that are inherently flexible.
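The pairwise comparison and clustering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the conformations are taken to be already superposed (no alignment step is performed), the toy coordinates and the 0.5 cutoff are invented for the example, and the simple single-linkage grouping stands in for whichever clustering method a given resource actually uses.

```python
import math

def rmsd(a, b):
    """RMSD between two equally sized coordinate lists (assumed pre-superposed)."""
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))

def similarity_matrix(conformations):
    """Symmetric matrix of pairwise RMSD values."""
    n = len(conformations)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(conformations[i], conformations[j])
    return matrix

def cluster(matrix, cutoff):
    """Single-linkage grouping: conformations within `cutoff` share a label."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] <= cutoff:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical three-atom conformations: two "closed" forms and one "open" form.
closed_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
closed_b = [(0.0, 0.1, 0.0), (1.0, 0.0, 0.0), (2.0, 0.1, 0.0)]
open_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 3.0, 0.0)]

matrix = similarity_matrix([closed_a, closed_b, open_a])
labels = cluster(matrix, cutoff=0.5)  # illustrative cutoff, not a recommended value
print(labels)
```

In this toy case the two closed forms fall into one cluster and the open form into another; real pipelines would superpose the structures first and work on many more atoms.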

To build a meaningful approximation of the conformational landscape of native proteins, the conformational ensembles (and the differences between them), identified on the basis of structural similarity/dissimilarity measures alone, need to be biophysically characterized. This may be approached at two different levels.

  • At the biological level, it is important to link observed conformational ensembles to their functional roles by evaluating the correspondence with protein family classifications based on sequence information and functional annotations in public databases, e.g. UniProt and the PDBe-Knowledge Base (PDBe-KB). These links should provide valuable mechanistic insights into how the conformational and dynamic properties of proteins are exploited by evolution to regulate their biological function.

  • At the physical level, one needs to introduce energetic considerations to evaluate the likelihood that the identified conformational ensembles represent conformational states that the protein (or domain under study) samples in isolation. Such evaluation is notoriously challenging and can only be roughly approximated by using computational methods to evaluate the extent to which the observed conformational ensembles can be reproduced by algorithms that simulate the dynamic behavior of protein systems. These algorithms include the computationally expensive classical molecular dynamics (MD) simulations, which sample local thermal fluctuations, but also faster, more approximate methods such as Elastic Network Models and Normal Mode Analysis (NMA), which model low-energy collective motions. Alternatively, enhanced sampling molecular dynamics can be used to model complex types of conformational changes, but at a very high computational cost.
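To make the Elastic Network Model idea more concrete, the sketch below builds the connectivity (Kirchhoff) matrix of a Gaussian Network Model, in which residues are nodes joined by identical springs whenever they lie within a distance cutoff. The coordinates, the 7.0 Å cutoff, and the function name are assumptions chosen for illustration; the diagonalisation that yields the low-frequency collective modes is not shown.

```python
import math

def kirchhoff_matrix(coords, cutoff=7.0):
    """Build a GNM-style connectivity matrix.

    Off-diagonal entries are -1 for residue pairs within `cutoff`
    (an illustrative value in Angstrom), 0 otherwise; each diagonal
    entry is the number of contacts of that residue.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma

# Hypothetical C-alpha trace: four residues on a line, 3.8 Angstrom apart.
ca_trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff_matrix(ca_trace)
for row in gamma:
    print(row)
```

Each row sums to zero by construction, which reflects the fact that a rigid translation of the whole network costs no energy in this model.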

The ELIXIR 3D-Bioinfo Implementation Study "Building on PDBe-KB to chart and characterize the conformation landscape of native proteins" focuses on:

  1. Mapping the conformational diversity of proteins and their homologs across the PDB.
  2. Characterizing the different flexibility properties of protein regions, and linking this information to sequence and functional annotation.
  3. Benchmarking computational methods that can predict a biophysical description of protein motions.

This notebook is part of the third objective, in which a list of computational resources that are able to predict protein flexibility and conformational ensembles has been collected, evaluated, and integrated into reproducible and interoperable workflows using the BioExcel Building Blocks library. Note that the list is not meant to be exhaustive; it is built following the expertise of the implementation study partners.


Copyright & Licensing

This software has been developed in the MMB group at the BSC & IRB for the European BioExcel, funded by the European Commission (EU H2020 823830, EU H2020 675728).

Licensed under the Apache License 2.0, see the file LICENSE for details.

Inputs

ID Name Description Type
step0_extract_model_input_structure_path n/a n/a File
step0_extract_model_output_structure_path n/a n/a string
step0_extract_model_config n/a n/a string
step1_extract_chain_output_structure_path n/a n/a string
step1_extract_chain_config n/a n/a string
step2_cpptraj_mask_output_cpptraj_path n/a n/a string
step2_cpptraj_mask_config n/a n/a string
step3_cpptraj_mask_output_cpptraj_path n/a n/a string
step3_cpptraj_mask_config n/a n/a string
step4_concoord_dist_output_pdb_path n/a n/a string
step4_concoord_dist_output_gro_path n/a n/a string
step4_concoord_dist_output_dat_path n/a n/a string
step4_concoord_dist_config n/a n/a string
step5_concoord_disco_output_traj_path n/a n/a string
step5_concoord_disco_output_rmsd_path n/a n/a string
step5_concoord_disco_output_bfactor_path n/a n/a string
step4_concoord_disco_config n/a n/a string
step6_cpptraj_rms_output_cpptraj_path n/a n/a string
step6_cpptraj_rms_config n/a n/a string
step7_cpptraj_convert_output_cpptraj_path n/a n/a string
step7_cpptraj_convert_config n/a n/a string
step8_prody_anm_output_pdb_path n/a n/a string
step8_prody_anm_config n/a n/a string
step9_cpptraj_rms_output_cpptraj_path n/a n/a string
step9_cpptraj_rms_config n/a n/a string
step10_cpptraj_convert_output_cpptraj_path n/a n/a string
step10_cpptraj_convert_config n/a n/a string
step11_bd_run_output_crd_path n/a n/a string
step11_bd_run_output_log_path n/a n/a string
step11_bd_run_config n/a n/a string
step12_cpptraj_rms_output_cpptraj_path n/a n/a string
step12_cpptraj_rms_output_traj_path n/a n/a string
step12_cpptraj_rms_config n/a n/a string
step13_dmd_run_output_crd_path n/a n/a string
step13_dmd_run_output_log_path n/a n/a string
step13_dmd_run_config n/a n/a string
step14_cpptraj_rms_output_cpptraj_path n/a n/a string
step14_cpptraj_rms_output_traj_path n/a n/a string
step14_cpptraj_rms_config n/a n/a string
step15_nma_run_output_crd_path n/a n/a string
step15_nma_run_output_log_path n/a n/a string
step15_nma_run_config n/a n/a string
step16_cpptraj_rms_output_cpptraj_path n/a n/a string
step16_cpptraj_rms_config n/a n/a string
step17_cpptraj_convert_output_cpptraj_path n/a n/a string
step17_cpptraj_convert_config n/a n/a string
step18_nolb_nma_output_pdb_path n/a n/a string
step18_nolb_nma_config n/a n/a string
step19_cpptraj_rms_output_cpptraj_path n/a n/a string
step19_cpptraj_rms_config n/a n/a string
step20_cpptraj_convert_output_cpptraj_path n/a n/a string
step20_cpptraj_convert_config n/a n/a string
step21_imod_imode_output_dat_path n/a n/a string
step21_imod_imode_config n/a n/a string
step22_imod_imc_output_traj_path n/a n/a string
step22_imod_imc_config n/a n/a string
step23_cpptraj_rms_output_cpptraj_path n/a n/a string
step23_cpptraj_rms_config n/a n/a string
step24_cpptraj_convert_output_cpptraj_path n/a n/a string
step24_cpptraj_convert_config n/a n/a string
step26_make_ndx_output_ndx_path n/a n/a string
step26_make_ndx_config n/a n/a string
step27_gmx_cluster_output_pdb_path n/a n/a string
step27_gmx_cluster_config n/a n/a string
step28_cpptraj_rms_output_cpptraj_path n/a n/a string
step28_cpptraj_rms_output_traj_path n/a n/a string
step28_cpptraj_rms_config n/a n/a string
step29_pcz_zip_output_pcz_path n/a n/a string
step29_pcz_zip_config n/a n/a string
step30_pcz_zip_output_pcz_path n/a n/a string
step30_pcz_zip_config n/a n/a string
step31_pcz_info_output_json_path n/a n/a string
step32_pcz_evecs_output_json_path n/a n/a string
step32_pcz_evecs_config n/a n/a string
step33_pcz_animate_output_crd_path n/a n/a string
step33_pcz_animate_config n/a n/a string
step34_cpptraj_convert_output_cpptraj_path n/a n/a string
step34_cpptraj_convert_config n/a n/a string
step35_pcz_bfactor_output_dat_path n/a n/a string
step35_pcz_bfactor_output_pdb_path n/a n/a string
step35_pcz_bfactor_config n/a n/a string
step36_pcz_hinges_output_json_path n/a n/a string
step36_pcz_hinges_config n/a n/a string
step37_pcz_hinges_output_json_path n/a n/a string
step37_pcz_hinges_config n/a n/a string
step38_pcz_hinges_output_json_path n/a n/a string
step38_pcz_hinges_config n/a n/a string
step39_pcz_stiffness_output_json_path n/a n/a string
step39_pcz_stiffness_config n/a n/a string
step40_pcz_collectivity_output_json_path n/a n/a string
step40_pcz_collectivity_config n/a n/a string
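
A job file for a CWL runner has to supply a value for each of the input IDs listed above. The Python sketch below assembles the first few entries and serialises them as JSON, which CWL runners accept as an input object. Several things here are assumptions rather than facts from this page: the file names are hypothetical, the `*_config` strings are assumed to carry JSON-encoded tool properties, and the property keys `models` and `chains` are assumed names for the options of the extract_model and extract_chain blocks. Only the first two steps are covered; the remaining inputs would need to be filled in the same way.

```python
import json

def build_job(input_pdb, chain="A", model=1):
    """Return a partial CWL input object for the first two workflow steps."""
    return {
        # The only File-typed input in the table; all others are strings.
        "step0_extract_model_input_structure_path": {
            "class": "File",
            "path": input_pdb,
        },
        # Output names are hypothetical placeholders.
        "step0_extract_model_output_structure_path": "model.pdb",
        # Assumption: config is a JSON string of tool properties.
        "step0_extract_model_config": json.dumps({"models": [model]}),
        "step1_extract_chain_output_structure_path": "chain.pdb",
        "step1_extract_chain_config": json.dumps({"chains": [chain]}),
    }

job = build_job("protein.pdb")  # hypothetical input structure
print(json.dumps(job, indent=2))
```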

Steps

ID Name Description
step0_extract_model extract_model Extracts a model from a 3D structure.
step1_extract_chain extract_chain Extracts a chain from a 3D structure.
step2_cpptraj_mask cpptraj_mask Extracts a selection of atoms from a given cpptraj compatible trajectory.
step3_cpptraj_mask cpptraj_mask Extracts a selection of atoms from a given cpptraj compatible trajectory.
step4_concoord_dist concoord_dist Wrapper of the Concoord_dist software.
step5_concoord_disco concoord_disco Wrapper of the Concoord_disco software.
step6_cpptraj_rms cpptraj_rms Calculates the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step7_cpptraj_convert cpptraj_convert Converts between cpptraj compatible trajectory file formats and/or extracts a selection of atoms or frames.
step8_prody_anm prody_anm Wrapper of the Prody software.
step9_cpptraj_rms cpptraj_rms Calculates the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step10_cpptraj_convert cpptraj_convert Converts between cpptraj compatible trajectory file formats and/or extracts a selection of atoms or frames.
step11_bd_run bd_run Run Brownian Dynamics from FlexServ.
step12_cpptraj_rms cpptraj_rms Calculates the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step13_dmd_run dmd_run Run Discrete Molecular Dynamics from FlexServ.
step14_cpptraj_rms cpptraj_rms Calculates the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step15_nma_run nma_run Run Normal Mode Analysis from FlexServ.
step16_cpptraj_rms cpptraj_rms Calculates the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step17_cpptraj_convert cpptraj_convert Converts between cpptraj compatible trajectory file formats and/or extracts a selection of atoms or frames.
step18_nolb_nma nolb_nma Wrapper of the Nolb software.
step19_cpptraj_rms cpptraj_rms Calculates the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step20_cpptraj_convert cpptraj_convert Converts between cpptraj compatible trajectory file formats and/or extracts a selection of atoms or frames.
step21_imod_imode imod_imode Wrapper of the imods_imode software.
step22_imod_imc imod_imc Wrapper of the imods_imc software.
step23_cpptraj_rms cpptraj_rms Calculates the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step24_cpptraj_convert cpptraj_convert Converts between cpptraj compatible trajectory file formats and/or extracts a selection of atoms or frames.
step26_make_ndx make_ndx Creates a GROMACS index file (NDX) from an input selection and an input GROMACS structure file.
step27_gmx_cluster gmx_cluster Clusters structures from a given GROMACS compatible trajectory.
step28_cpptraj_rms cpptraj_rms Calculates the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step29_pcz_zip pcz_zip Compress MD simulation trajectories with PCA suite.
step30_pcz_zip pcz_zip Compress MD simulation trajectories with PCA suite.
step31_pcz_info pcz_info Extract PCA info (variance, Dimensionality) from a compressed PCZ file.
step32_pcz_evecs pcz_evecs Extract PCA Eigen Vectors from a compressed PCZ file.
step33_pcz_animate pcz_animate Extract PCA animations from a compressed PCZ file.
step34_cpptraj_convert cpptraj_convert Converts between cpptraj compatible trajectory file formats and/or extracts a selection of atoms or frames.
step35_pcz_bfactor pcz_bfactor Extract residue bfactors x PCA mode from a compressed PCZ file.
step36_pcz_hinges pcz_hinges Compute possible hinge regions (residues around which large protein movements are organized) of a molecule from a compressed PCZ file.
step37_pcz_hinges pcz_hinges Compute possible hinge regions (residues around which large protein movements are organized) of a molecule from a compressed PCZ file.
step38_pcz_hinges pcz_hinges Compute possible hinge regions (residues around which large protein movements are organized) of a molecule from a compressed PCZ file.
step39_pcz_stiffness pcz_stiffness Extract PCA stiffness from a compressed PCZ file.
step40_pcz_collectivity pcz_collectivity Extract PCA collectivity (numerical measure of how many atoms are affected by a given mode) from a compressed PCZ file.
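
The last step reports collectivity, described above as a numerical measure of how many atoms are affected by a given mode. One widely used entropy-based definition is sketched below; whether the pcz_collectivity block uses exactly this formula is an assumption, and the toy modes are invented for the example. The value approaches 1 when all atoms move equally and 1/N when a single atom carries the whole motion.

```python
import math

def collectivity(mode):
    """Entropy-based collectivity of a mode given as per-atom (dx, dy, dz).

    Assumed definition: exp(-sum(w_i * ln w_i)) / N, where w_i is the
    fraction of the total squared displacement carried by atom i.
    """
    squared = [dx * dx + dy * dy + dz * dz for dx, dy, dz in mode]
    total = sum(squared)
    if total == 0.0:
        raise ValueError("mode has no displacement")
    entropy = 0.0
    for s in squared:
        weight = s / total
        if weight > 0.0:  # 0 * ln(0) is taken as 0
            entropy -= weight * math.log(weight)
    return math.exp(entropy) / len(mode)

# Hypothetical four-atom modes.
global_mode = [(1.0, 0.0, 0.0)] * 4                 # every atom moves equally
local_mode = [(1.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0)] * 3  # one atom moves

print(collectivity(global_mode))
print(collectivity(local_mode))
```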

Outputs

ID Name Description Type
step0_extract_model_out1 output_structure_path Path to the output file File
step1_extract_chain_out1 output_structure_path Path to the output file File
step2_cpptraj_mask_out1 output_structure_path Path to the output file File
step3_cpptraj_mask_out1 output_structure_path Path to the output file File
step4_concoord_dist_out1 output_pdb_path Path to the output file File
step4_concoord_dist_out2 output_gro_path Path to the output file File
step4_concoord_dist_out3 output_dat_path Path to the output file File
step5_concoord_disco_out1 output_traj_path Path to the output file File
step5_concoord_disco_out2 output_rmsd_path Path to the output file File
step5_concoord_disco_out3 output_bfactor_path Path to the output file File
step6_cpptraj_rms_out1 output_cpptraj_path Path to the output file File
step7_cpptraj_convert_out1 output_cpptraj_path Path to the output file File
step8_prody_anm_out1 output_pdb_path Path to the output file File
step9_cpptraj_rms_out1 output_cpptraj_path Path to the output file File
step10_cpptraj_convert_out1 output_cpptraj_path Path to the output file File
step11_bd_run_out1 output_crd_path Path to the output file File
step11_bd_run_out2 output_log_path Path to the output file File
step12_cpptraj_rms_out1 output_cpptraj_path Path to the output file File
step12_cpptraj_rms_out2 output_traj_path Path to the output file File
step13_dmd_run_out1 output_crd_path Path to the output file File
step13_dmd_run_out2 output_log_path Path to the output file File
step14_cpptraj_rms_out1 output_cpptraj_path Path to the output file File
step14_cpptraj_rms_out2 output_traj_path Path to the output file File
step15_nma_run_out1 output_crd_path Path to the output file File
step15_nma_run_out2 output_log_path Path to the output file File
step16_cpptraj_rms_out1 output_cpptraj_path Path to the output file File
step17_cpptraj_convert_out1 output_cpptraj_path Path to the output file File
step18_nolb_nma_out1 output_pdb_path Path to the output file File
step19_cpptraj_rms_out1 output_cpptraj_path Path to the output file File
step20_cpptraj_convert_out1 output_cpptraj_path Path to the output file File
step21_imod_imode_out1 output_dat_path Path to the output file File
step22_imod_imc_out1 output_traj_path Path to the output file File
step23_cpptraj_rms_out1 output_cpptraj_path Path to the output file File
step24_cpptraj_convert_out1 output_cpptraj_path Path to the output file File
step26_make_ndx_out1 output_ndx_path Path to the output file File
step27_gmx_cluster_out1 output_pdb_path Path to the output file File
step28_cpptraj_rms_out1 output_cpptraj_path Path to the output file File
step28_cpptraj_rms_out2 output_traj_path Path to the output file File
step29_pcz_zip_out1 output_pcz_path Path to the output file File
step30_pcz_zip_out1 output_pcz_path Path to the output file File
step31_pcz_info_out1 output_json_path Path to the output file File
step32_pcz_evecs_out1 output_json_path Path to the output file File
step33_pcz_animate_out1 output_crd_path Path to the output file File
step34_cpptraj_convert_out1 output_cpptraj_path Path to the output file File
step35_pcz_bfactor_out1 output_dat_path Path to the output file File
step35_pcz_bfactor_out2 output_pdb_path Path to the output file File
step36_pcz_hinges_out1 output_json_path Path to the output file File
step37_pcz_hinges_out1 output_json_path Path to the output file File
step38_pcz_hinges_out1 output_json_path Path to the output file File
step39_pcz_stiffness_out1 output_json_path Path to the output file File
step40_pcz_collectivity_out1 output_json_path Path to the output file File
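
Every output listed above is a CWL File, so a runner that reports results as a JSON object keyed by output ID will describe each one with fields such as `class` and `path` (or `location`). The sketch below, which assumes that report format and uses made-up paths, shows one way to collect the file paths for a subset of steps, for example only the PCA analysis results.

```python
import json

def collect_outputs(cwl_result_json, step_prefix=""):
    """Map output IDs to file paths for IDs starting with `step_prefix`."""
    result = json.loads(cwl_result_json)
    paths = {}
    for output_id, value in result.items():
        if not output_id.startswith(step_prefix):
            continue
        # Keep only CWL File objects; prefer `path`, fall back to `location`.
        if isinstance(value, dict) and value.get("class") == "File":
            paths[output_id] = value.get("path") or value.get("location")
    return paths

# Hypothetical runner report containing two of the outputs from the table.
example = json.dumps({
    "step0_extract_model_out1": {"class": "File", "path": "/tmp/run/model.pdb"},
    "step40_pcz_collectivity_out1": {
        "class": "File",
        "path": "/tmp/run/collectivity.json",
    },
})

print(collect_outputs(example, step_prefix="step40_"))
```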

Version History

Version 2 (latest) Created 6th Jun 2023 at 11:10 by Genís Bayarri

Updated workflow descriptors


Frozen Version-2 70eb3d2

Version 1 (earliest) Created 31st May 2023 at 14:51 by Genís Bayarri

Initial commit


Frozen Version-1 ab62bfd
Citation
Hospital, A., & Bayarri, G. (2023). CWL Protein conformational ensembles generation. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.488.1
License
Other (Open)
Created: 31st May 2023 at 14:51

Total size: 366 KB