rodrigo2000m/PDBandModelsTool

PDBandModelsTool

A script for downloading 3D protein models from various databases, aligning their sequences, and superimposing their structures. Generates a detailed report on the status of each protein.

Installation

PDBandModelsTool can be installed by cloning the repository with git or by downloading the files:

git clone <PDBandModelsTool-repo>

If you use Conda, you can create an environment with the necessary packages from the environment.yml file:

conda env create -f environment.yml
conda activate pdb_models_env

Quickstart

PDBandModelsTool automatically downloads multiple protein structures, using a FASTA file as input. The FASTA file can be downloaded from UniProt or generated with another tool. If you use a custom file, make sure it follows this format:

>sp|A0AVI4|TM129_HUMAN E3 ubiquitin-protein ligase TM129 OS=Homo sapiens OX=9606 GN=TMEM129 PE=1 SV=1
MDSPEVTFTLAYLVFAVCFVFTPNEFHAAGLTVQNLLSGWLGSEDAAFVPFHLRRTAATL
LCHSLLPLGYYVGMCLAASEKRLHALSQAPEAWRLFLLLAVTLPSIACILIYYWSRDRWA
CHPLARTLALYALPQSGWQAVASSVNTEFRRIDKFATGAPGARVIVTDTWVMKVTTYRVH
VAQQQDVHLTVTESRQHELSPDSNLPVQLLTIRVASTNPAVQAFDIWLNSTEYGELCEKL
RAPIRRAAHVVIHQSLGDLFLETFASLVEVNPAYSVPSSQELEACIGCMQTRASVKLVKT
CQEAATGECQQCYCRPMWCLTCMGKWFASRQDPLRPDTWLASRVPCPTCRARFCILDVCT
VR

Each entry must contain the unique and stable entry identifier (the field between the first two | characters of the header, A0AVI4 in the example above) and the correct sequence, because both are used to search for structures.
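To check a custom file before running the tool, the header format shown above can be parsed with a few lines of Python. This is only a minimal sketch: read_fasta is a hypothetical helper written for illustration, not a function of PDBandModelsTool, and it assumes every header follows the >db|IDENTIFIER|ENTRY_NAME layout of the example.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each entry identifier to its sequence (hypothetical helper)."""
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Assumed header layout: >db|IDENTIFIER|ENTRY_NAME description
            fields = line[1:].split("|")
            if len(fields) < 3 or not fields[1]:
                raise ValueError(f"unexpected FASTA header: {line}")
            current = fields[1]
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data found before any header")
        else:
            # Sequence lines are concatenated without separators.
            records[current] += line
    return records


example = """>sp|A0AVI4|TM129_HUMAN E3 ubiquitin-protein ligase TM129 OS=Homo sapiens OX=9606 GN=TMEM129 PE=1 SV=1
MDSPEVTFTLAYLVFAVCFVFTPNEFHAAGLTVQNLLSGWLGSEDAAFVPFHLRRTAATL
LCHSLLPLGYYVGMCLAASEKRLHALSQAPEAWRLFLLLAVTLPSIACILIYYWSRDRWA
"""
records = read_fasta(example.splitlines())
print(list(records))
```

A header that does not match the assumed layout raises a ValueError, which makes malformed custom files easy to spot before any download is attempted.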

Using the data in the FASTA file, the script searches PDB, AlphaFold, Swiss Model, and ESM Atlas for all structures related to each sequence.

Run the script with:

python main.py /absolute/path/to/file.fasta

When the script runs, it creates a folder named outputs in the same directory as the input FASTA file.

Note: Make sure that directory does not already contain a folder named outputs; an existing outputs folder will cause the script to fail.
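A quick pre-flight check can catch this situation before a long run starts. The sketch below is an illustration only: planned_outputs_dir and check_outputs_free are hypothetical helpers, not part of main.py, and they assume the folder is created directly beside the FASTA file as described above.

```python
from pathlib import Path


def planned_outputs_dir(fasta_path: str) -> Path:
    """Return the outputs folder expected next to the FASTA file."""
    return Path(fasta_path).resolve().parent / "outputs"


def check_outputs_free(fasta_path: str) -> Path:
    """Raise if an outputs folder already exists beside the FASTA file."""
    out_dir = planned_outputs_dir(fasta_path)
    if out_dir.exists():
        raise FileExistsError(
            f"{out_dir} already exists; move or rename it before running main.py"
        )
    return out_dir
```

Calling check_outputs_free("/absolute/path/to/file.fasta") returns the expected folder path when it is free and raises FileExistsError otherwise.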

Contents of the outputs Folder:

  • Log File: Tracks the execution process.
  • Report File: Summarizes the structures retrieved.
  • Subfolders:
    • AF_models: Contains PDB files from AlphaFold.
    • ESM_Atlas_models: Contains PDB files from ESM Atlas.
    • pdb_structures: Contains PDB files from PDB.
    • swiss_models: Contains PDB files from Swiss Model.

All PDB files are named using their unique and stable entry identifier.
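Because the files are named by identifier, the results of a run can be summarized per source with a short script. This is a sketch under stated assumptions: list_models is a hypothetical helper, the subfolder names are taken from the list above, and the structure files are assumed to carry a .pdb extension.

```python
from pathlib import Path
from typing import Dict, List

# Subfolder names as listed in the outputs folder description.
SUBFOLDERS = ("AF_models", "ESM_Atlas_models", "pdb_structures", "swiss_models")


def list_models(outputs_dir: str) -> Dict[str, List[str]]:
    """Map each subfolder name to the sorted file stems found in it."""
    root = Path(outputs_dir)
    found: Dict[str, List[str]] = {}
    for name in SUBFOLDERS:
        folder = root / name
        if folder.is_dir():
            # Assumption: structure files end in .pdb.
            found[name] = sorted(p.stem for p in folder.glob("*.pdb"))
        else:
            found[name] = []  # subfolder missing or empty run
    return found
```

For example, list_models("/absolute/path/to/outputs") returns a dictionary with one entry per subfolder, each holding the identifiers of the files found there.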

Contact

If you have any questions, comments, or suggestions, feel free to contact us via email:

Email: [email protected]


License

This project is licensed under the MIT License, meaning anyone is free to use, modify, and distribute the code under the terms of this license.
