User manual for GMX pipelines

Disclaimer - WIP

This manual for the GMX Molecular Dynamics pipelines is a work in progress. If you notice outdated descriptions or features, or mismatches with the pipelines deployed in the LENS^ai portal, please reach out to us.

The GMX pipelines are a set of Molecular Dynamics (MD) pipelines, each of which simulates proteins in solvent in a specific context:

· GMX Protein Molecular Dynamics: performs MD simulations of proteins in explicitly modelled solvent without ligands, glycans, DNA, RNA, … (amino acids only).

· GMX Membrane-protein Molecular Dynamics: performs MD simulations of transmembrane proteins (and peripheral proteins) in explicitly modelled solvent, together with an explicitly modelled cell membrane, without ligands, DNA, RNA, … (amino acids only).

· GMX Glycoprotein Molecular Dynamics: performs MD simulations of glycosylated proteins in explicitly modelled solvent without ligands, DNA, RNA, … (amino acids and glycans only).

· GMX Constant-pH Protein Molecular Dynamics: performs MD simulations of proteins in explicitly modelled solvent at constant pH without ligands, glycans, DNA, RNA, … (amino acids only). The main difference from the standard GMX Protein Molecular Dynamics pipeline is that the protonation states of residues are updated during the simulation to keep the pH constant.

In addition to computing the simulated trajectory, the MD pipelines output post-processed data, such as the Root Mean Square Deviation of the atomic coordinates with respect to the starting conformation, the number of contacts between domains, etc.

What is a Molecular Dynamics simulation?

An MD simulation aims to mimic an ensemble of atoms/molecules interacting with each other. In the context of the GMX pipelines, these are mainly proteins in water, along with some salt ions to simulate a physiological environment. The workflow is as follows:

After an input molecular model is submitted, it is checked for conformity. The atoms are then placed inside a simulation box whose size depends on the size of the system. The box is filled with water molecules and salt ions (Na⁺ and Cl⁻ by default). Periodic boundary conditions are applied, so that all water molecules interact with their neighbouring images (in principle, the box should be large enough to prevent interactions between periodic images of the proteins themselves). Combined with a force field, this represents a physical system that can evolve over time. The system can be separated into two components: a topology and coordinates. The topology does not vary over time, while the coordinates do (………)
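To illustrate what periodic boundary conditions mean for distances, here is a minimal sketch of the minimum image convention. It assumes a cubic box for simplicity (the pipelines also offer other box shapes), and the function name is hypothetical, not part of the pipelines.

```python
import math

def minimum_image_distance(a, b, box_edge):
    """Distance between points a and b in a cubic periodic box of edge box_edge."""
    total = 0.0
    for ai, bi in zip(a, b):
        d = bi - ai
        # Shift the displacement by whole box lengths so it lies within half a box.
        d -= box_edge * round(d / box_edge)
        total += d * d
    return math.sqrt(total)

# Two particles near opposite faces of a 5 nm box are neighbours through the
# boundary: the periodic distance is about 0.3 nm, not 4.7 nm.
distance = minimum_image_distance((0.2, 2.5, 2.5), (4.9, 2.5, 2.5), 5.0)
print(f"{distance:.3f} nm")
```

This is also why the box has to be large enough: if it is too small, a protein can end up closer to its own periodic image than intended.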

Uploading an input dataset

The GMX pipelines accept PDB files as input. The PDB file format is well documented in online resources, such as PDB101: Learn: Guide to Understanding PDB Data: Introduction. For Molecular Dynamics, the most important data are contained in the ATOM records, which hold the atomic coordinates.
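As an illustration of what an ATOM record holds, the following sketch reads the main fields of one line using the fixed column positions of the PDB format. The function name and the example line are made up for this manual; the pipelines use their own readers.

```python
def parse_atom_line(line):
    """Extract the main fields of a PDB ATOM record (fixed-width columns)."""
    return {
        "record": line[0:6].strip(),     # columns 1-6: record name
        "serial": int(line[6:11]),       # columns 7-11: atom serial number
        "name": line[12:16].strip(),     # columns 13-16: atom name
        "res_name": line[17:20].strip(), # columns 18-20: residue name
        "chain": line[21],               # column 22: chain identifier
        "res_seq": int(line[22:26]),     # columns 23-26: residue number
        "x": float(line[30:38]),         # columns 31-38: x coordinate (Å)
        "y": float(line[38:46]),         # columns 39-46: y coordinate (Å)
        "z": float(line[46:54]),         # columns 47-54: z coordinate (Å)
    }

# Hypothetical example record.
example = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"
atom = parse_atom_line(example)
print(atom["res_name"], atom["chain"], atom["res_seq"], atom["x"], atom["y"], atom["z"])
```

Because the format is column-based, a line that has been shifted by even one character (for instance after manual editing) will be misread, which is a common source of invalid input files.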

Assuming you have a valid PDB model, you can launch MD simulations in the LENS^ai portal after uploading the PDB file as a remote Dataset.
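One quick check you can run on a model before uploading it is to look for jumps in the residue numbering, which often point to missing residues. The sketch below is only a heuristic written for this manual (the function name is hypothetical, and a numbering jump is not always a real gap); it is not the conformity check performed by the pipelines.

```python
def find_numbering_gaps(pdb_lines):
    """Return {chain: [(previous_resnum, next_resnum), ...]} for numbering jumps."""
    residues = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]           # column 22: chain identifier
            res_seq = int(line[22:26]) # columns 23-26: residue number
            residues.setdefault(chain, set()).add(res_seq)
    gaps = {}
    for chain, numbers in residues.items():
        ordered = sorted(numbers)
        jumps = [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]
        if jumps:
            gaps[chain] = jumps
    return gaps

# Hypothetical fragment in which residues 3 and 4 of chain A are missing.
sample = [
    "ATOM      1  CA  MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  GLY A   2      30.100  25.020   3.900",
    "ATOM      3  CA  SER A   5      38.500  27.800   6.100",
]
gaps = find_numbering_gaps(sample)
print(gaps)
```

For a real file, the lines can be read with open("model.pdb").readlines(), where model.pdb stands for your own file name.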

On the main LENS^ai page, click on the button in the upper left side, then on Settings.

Then click on Datasets. You will have an overview of available datasets, sorted by the latest update. Then, you can click on + ADD DATASET.

Different options are available, but for the sake of simplicity, we will choose Upload dataset from local files. This means that the input file is located in a drive accessible from your computer. If you upload a dataset from local files, you need to add files yourself with the ADD FILES or ADD FOLDER buttons. In addition, you need to specify a Dataset name, which should be a unique logical name with respect to your other datasets.

Before uploading the Dataset, you can check its content, and tick the checkbox to agree to an additional monthly payment that depends on the dataset size.

Once the Dataset is uploaded, it should appear along with the other Datasets available.

With this, you now have a LENS^ai Dataset with a single PDB file. You can now launch jobs from the GMX pipelines.

Submitting a job

Example with GMX Protein Molecular Dynamics

Launching a simulation

Clicking on →RUN JOB opens the job submission window, which contains a set of parameters to be set by the user.

The parameters are divided into standard inputs and expert settings:

Inputs:

description: description of the job to be run.
comment: add a comment.
goals: define the goal of the job run.
input_dataset_id: LENS^ai Dataset with one PDB file of the system to be simulated. *required

Beware: the Dataset must be chosen, not the PDB file itself.

WARNING: The model should be clean: ideally there should be no missing residues or gaps, and residues should be properly named.

temperature: temperature of the simulation, in kelvin. *required
box_edge: size of the simulation box (distance between edges, in nanometers). *required
box_type: simulation box shape. To be chosen from dodecahedron, cubic, octahedron or triclinic.
ph: adjust the protonation state of residues according to the given pH value. For a standard simulation, use 'no'. WARNING: this will not lead to a constant pH simulation. *required
simulation_time: duration of the simulation, in nanoseconds. *required
receptor_chain: chain identifier(s) of the target in the PDB file (default 'no', which skips the interaction analysis). If specified, additional post-processing analyses will be performed to characterise the interactions between partners. For instance, if a trimer is submitted with chains A, B and C, and A is set as the receptor_chain, then the partners will be A and B+C, and the interactions will be computed between them. *required
output_dataset_logical_name: name of the output dataset. It should not contain any spaces (underscore _ is accepted). *required
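The constraints listed above can be sketched as a small pre-submission check, together with the way receptor_chain splits a complex into two partners. Everything here is illustrative: the function names, the example values and the assumption that several receptor chains are written as a string of one-letter identifiers (such as "AB") are not taken from the portal.

```python
# Box shapes named in this manual (spelling of the actual portal options may differ).
ALLOWED_BOX_TYPES = {"dodecahedron", "cubic", "octahedron", "triclinic"}

def check_inputs(params, chains_in_pdb):
    """Return a list of problems found in a dict of job inputs (empty if none)."""
    problems = []
    if params["temperature"] <= 0:
        problems.append("temperature must be a positive value in kelvin")
    if params["box_type"] not in ALLOWED_BOX_TYPES:
        problems.append("unknown box_type")
    if any(ch.isspace() for ch in params["output_dataset_logical_name"]):
        problems.append("output_dataset_logical_name must not contain spaces")
    receptor = params["receptor_chain"]
    if receptor != "no" and not set(receptor) <= set(chains_in_pdb):
        problems.append("receptor_chain is not a chain of the PDB file")
    return problems

def split_partners(chains_in_pdb, receptor_chain):
    """Split the chains into the receptor partner and the remaining partner."""
    receptor = [c for c in chains_in_pdb if c in receptor_chain]
    others = [c for c in chains_in_pdb if c not in receptor_chain]
    return receptor, others

# Hypothetical job inputs for a trimer with chains A, B and C.
params = {
    "temperature": 300,
    "box_type": "dodecahedron",
    "output_dataset_logical_name": "trimer_md_run1",
    "receptor_chain": "A",
}
problems = check_inputs(params, chains_in_pdb="ABC")
receptor, partner = split_partners("ABC", "A")
print(problems, receptor, partner)
```

With the trimer example, the receptor partner is chain A and the other partner is B+C, which matches the description of receptor_chain.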

Expert Settings:

mmpbsa_start: starting trajectory frame for MM-GBSA calculations. *required
mmpbsa_end: ending trajectory frame for MM-GBSA calculations. *required
mmpbsa_interval: interval between frames for MM-GBSA calculations. *required
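To see how these three settings select frames, here is a small sketch that takes mmpbsa_start as the first frame and mmpbsa_end as the last frame. It assumes that frames are numbered from 1 and that the end frame is included; this convention is an assumption and should be checked against the deployed pipeline.

```python
def mmgbsa_frames(start, end, interval):
    """Frame numbers used for the analysis (assumed 1-based, end frame included)."""
    if start < 1 or end < start or interval < 1:
        raise ValueError("expected 1 <= start <= end and interval >= 1")
    return list(range(start, end + 1, interval))

# Hypothetical settings: frames 1 to 100, keeping every 10th frame.
frames = mmgbsa_frames(1, 100, 10)
print(len(frames), frames)
```

A larger interval reduces the number of frames analysed, and therefore the computing time, at the cost of coarser statistics.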

For the input_dataset_id parameter, you need to select the corresponding Dataset, not the actual PDB file in the Dataset.

Once you have filled in the mandatory inputs, you can launch the job.

You can check the status of the job by looking in the Jobs page → History in the user Settings.

Once the job is finished, the outputs will be available as a Dataset with the name specified in the output_dataset_logical_name input field.

Checking out the results

Like the rest of this document, the content of this section is subject to regular updates.

For a standard run, you will find the following output files in the output Dataset. This list is not exhaustive, but it highlights the most important files for analysis.

md.log

The log of the MD simulation. It may contain important information about your simulation.

md.tpr

The tpr file extension stands for portable binary run input file. This file contains the starting structure of your simulation, the molecular topology and all the simulation parameters. Because this file is in binary format, it cannot be read with a regular text editor.

md.xtc

The md.xtc file is the MD trajectory. It contains the evolution of all atomic coordinates over time.

md_fit.xtc

This is the same MD trajectory as md.xtc, except that it has been adapted so that the molecule does not jump outside the simulation box.

md_movie.mp4

This is a short visualisation of the trajectory.

rmsd.svg

The RMSD (Root Mean Square Deviation) plot shows the RMSD of the atoms with respect to the starting conformation of the molecule in the system (without the solvent), in nanometers (nm). It should start at 0 nm (the RMSD of the starting conformation with itself), then deviate and oscillate around some value. It indicates how much the structure deviates along the trajectory. Flatter regions of the plot are characteristic of a stable conformation, although there is no guarantee that the system will remain in it for the rest of the trajectory, as different conformational local minima can be explored. If the RMSD varies widely or keeps increasing, this is a strong indication of an unstable system.
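For readers who want to see what the plotted quantity is, here is a minimal sketch of the RMSD between one frame and a reference conformation. It deliberately omits the structural superposition that analysis tools commonly apply before computing the RMSD, so it is an illustration rather than a reproduction of the pipeline output; the function name and coordinates are made up.

```python
import math

def rmsd(coords, reference):
    """RMSD between two conformations given as lists of (x, y, z) tuples."""
    if len(coords) != len(reference):
        raise ValueError("both conformations must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(coords))

# Hypothetical two-atom system, coordinates in nm.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.1), (1.0, 0.1, 0.0)]
print(rmsd(reference, reference))  # the starting conformation against itself
print(round(rmsd(frame, reference), 3))
```

Computing this value for every frame of the trajectory against the starting conformation gives a curve of the kind shown in rmsd.svg.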

rmsf.svg

The RMSF (Root Mean Square Fluctuation) plot represents, for each residue, the time-averaged deviation of the atomic positions from their mean position. Instead of showing a variation over time from a reference structure, the RMSF reveals which amino-acid positions fluctuate the most, which is a signature of high mobility and flexibility.
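The difference with the RMSD can be made concrete with a short sketch: the RMSF yields one value per atom (or per residue), averaged over all frames, rather than one value per frame. The code below assumes frames that are already superposed and uses made-up coordinates; the function name is hypothetical.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF from a list of frames, each a list of (x, y, z) tuples."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Mean position of atom i over the whole trajectory.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from its mean position.
        msf = sum(
            (frame[i][k] - mean[k]) ** 2 for frame in trajectory for k in range(3)
        ) / n_frames
        values.append(math.sqrt(msf))
    return values

# Hypothetical trajectory (nm): atom 0 is rigid, atom 1 oscillates along x.
trajectory = [
    [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],
]
values = rmsf(trajectory)
print([round(v, 3) for v in values])
```

The rigid atom has an RMSF of zero while the oscillating atom has a non-zero value, which is how flexible regions stand out in rmsf.svg.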

md0.pdb

This PDB file contains a structural model of the system at the start of the simulation.

md_last.pdb

This PDB file contains a structural model of the system at the end of the simulation (latest trajectory frame).

clusters.pdb

A PDB file containing conformational cluster representatives, highlighting important conformational states along the trajectory.

representative_1.pdb

This PDB file contains a structural model, which has been obtained by performing conformational clustering, and is the most representative of the overall trajectory.

If the simulated system is a protein complex and a receptor chain is specified as input, additional files will appear in the output.

interface.matrix@6.0_Ang.svg

This is a heatmap of the contact frequencies, with contacts defined by a 6 Å interatomic distance threshold between heavy atoms.
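As an illustration of what a contact frequency is, the sketch below counts the fraction of frames in which two residues are in contact. It assumes that two residues are in contact when at least one pair of their heavy atoms is within the threshold, and that a distance exactly equal to the threshold counts as a contact; both are assumptions, and the function name and coordinates are made up.

```python
import math

def contact_frequency(frames_a, frames_b, cutoff=6.0):
    """Fraction of frames in which two residues are in contact.

    frames_a and frames_b hold, for each frame, the heavy-atom (x, y, z)
    coordinates (in Å) of the first and second residue respectively.
    """
    in_contact = 0
    for atoms_a, atoms_b in zip(frames_a, frames_b):
        if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
            in_contact += 1
    return in_contact / len(frames_a)

# Hypothetical four-frame trajectory: the second residue drifts away.
residue_a = [[(0.0, 0.0, 0.0)]] * 4
residue_b = [[(4.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)], [(7.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
frequency = contact_frequency(residue_a, residue_b)
print(frequency)
```

Each cell of the heatmap corresponds to one such value for a pair of residues, one from each partner.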

interface.overall@6.0_Ang.xlsx

The contact frequencies, in Excel format.

time_trace@6.0_Ang.svg

This is the contact distance of residues over time, along the trajectory.

contactnumber.svg

The number of contacts at each trajectory frame.

interface.overall@6.0_Ang.as_bfactors.pdb

The contact frequencies mapped onto the atomic B-factor field. This can help visualise interacting residues in 3D.
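The idea of storing a per-residue value in the B-factor field can be sketched as follows: the value is written into columns 61 to 66 of each ATOM line, so that molecular viewers can colour the structure by it. The function name, the example line and the scale of the value are assumptions made for illustration.

```python
def set_bfactor(line, value):
    """Return a PDB ATOM line with the B-factor field (columns 61-66) replaced."""
    padded = line.rstrip("\n").ljust(66)  # make sure the field exists
    return padded[:60] + f"{value:6.2f}" + padded[66:]

# Hypothetical atom of a residue in contact in 75% of the frames.
original = "ATOM      1  CA  MET A   1      27.340  24.430   2.614  1.00  9.67"
updated = set_bfactor(original, 0.75)
print(updated)
```

In most molecular viewers, colouring by B-factor then highlights the residues that are most often in contact.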

FINAL_RESULTS_MMPBSA.dat

This file contains the binding energy data (statistics) computed from MM-(PB/GB)SA.

FINAL_DECOMP_MMPBSA.dat

This file contains the binding energy data computed from MM-(PB/GB)SA, decomposed per residue.

FINAL_DECOMP_MMPBSA_total_energy_decomposition_GB_L.svg

This is a plot of the binding energy decomposition per residue, for the partner defined as ligand

FINAL_DECOMP_MMPBSA_total_energy_decomposition_GB_R.svg

This is a plot of the binding energy decomposition per residue, for the partner defined as receptor

FINAL_DECOMP_MMPBSA_total_energy_decomposition_GB_L.svg

This is the raw data of the binding energy decomposition per residue, for the partner defined as ligand

FINAL_DECOMP_MMPBSA_total_energy_decomposition_GB_R.svg

This is the raw data of the binding energy decomposition per residue, for the partner defined as receptor.

Other GMX pipelines

Currently, there are 4 GMX pipelines for molecular dynamics, as mentioned in the introduction.

Most of them are set up in a similar way to GMX Protein MD, but they differ slightly in terms of parameters and outputs.

For instance, GMX Constant-pH Protein Molecular Dynamics does not have binding energy information.