A script for downloading 3D protein models from various databases, aligning their sequences, and superimposing their structures. Generates a detailed report on the status of each protein.
PDBandModelsTool can be installed by cloning the repository with git or by downloading the files:
git clone <PDBandModelsTool-repo>
If you're using a Conda environment, you can install the necessary packages from the environment.yml file:
conda env create -f environment.yml
conda activate pdb_models_env
PDBandModelsTool automatically downloads multiple protein structures using a FASTA file as input. The FASTA file can be downloaded from UniProt or generated with another tool. If you use a custom file, ensure that each entry follows the UniProt header format shown below:
>sp|A0AVI4|TM129_HUMAN E3 ubiquitin-protein ligase TM129 OS=Homo sapiens OX=9606 GN=TMEM129 PE=1 SV=1
MDSPEVTFTLAYLVFAVCFVFTPNEFHAAGLTVQNLLSGWLGSEDAAFVPFHLRRTAATL
LCHSLLPLGYYVGMCLAASEKRLHALSQAPEAWRLFLLLAVTLPSIACILIYYWSRDRWA
CHPLARTLALYALPQSGWQAVASSVNTEFRRIDKFATGAPGARVIVTDTWVMKVTTYRVH
VAQQQDVHLTVTESRQHELSPDSNLPVQLLTIRVASTNPAVQAFDIWLNSTEYGELCEKL
RAPIRRAAHVVIHQSLGDLFLETFASLVEVNPAYSVPSSQELEACIGCMQTRASVKLVKT
CQEAATGECQQCYCRPMWCLTCMGKWFASRQDPLRPDTWLASRVPCPTCRARFCILDVCT
VR
It is important that the file contains the unique and stable entry identifier and the correct sequence, as this information is used to search for structures.
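To check a custom FASTA file before running the tool, the header format above can be parsed with a few lines of Python. This is a minimal sketch, not the parser used by PDBandModelsTool: the function name read_fasta and the regular expression are illustrative assumptions, and it only accepts UniProt-style headers starting with sp| or tr|.

```python
import re
from typing import Iterator, List, Optional, Tuple

# Assumed header layout (UniProt style): >db|ACCESSION|ENTRY_NAME description
HEADER_RE = re.compile(r"^>(?:sp|tr)\|([A-Z0-9]+)\|(\S+)")


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) pairs from UniProt-style FASTA text.

    Hypothetical helper for illustration; not part of PDBandModelsTool.
    """
    accession: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if accession is not None:
                yield accession, "".join(chunks)
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"Unexpected FASTA header: {line}")
            accession = match.group(1)
            chunks = []
        else:
            if accession is None:
                raise ValueError("Sequence data found before any header")
            # Sequence lines are wrapped, so they are joined back together.
            chunks.append(line)
    if accession is not None:
        yield accession, "".join(chunks)
```

For the example entry above, this yields the accession A0AVI4 together with the full sequence joined into a single string.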
Based on the data in the FASTA file, the script searches PDB, AlphaFold, Swiss Model, and ESM Atlas for all structures related to the given sequence.
Run the script with:
python main.py /absolute/path/to/file.fasta
When the script is executed, a folder named outputs will be created at the same level as the input FASTA file.
Note: Ensure there is no pre-existing folder named outputs in the same directory, as this will cause the code to fail.
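Because a pre-existing outputs folder makes the run fail, it can help to check for one before launching the script. The sketch below is an assumption about how such a check might look; the function names outputs_dir_for and check_outputs_dir_is_free are hypothetical and are not part of PDBandModelsTool.

```python
from pathlib import Path


def outputs_dir_for(fasta_path: str) -> Path:
    """Return the outputs folder expected next to the input FASTA file."""
    return Path(fasta_path).resolve().parent / "outputs"


def check_outputs_dir_is_free(fasta_path: str) -> Path:
    """Raise FileExistsError if an outputs folder already exists.

    Hypothetical pre-flight check; the tool itself may report the
    problem differently.
    """
    target = outputs_dir_for(fasta_path)
    if target.exists():
        raise FileExistsError(
            f"{target} already exists; move or rename it before running main.py"
        )
    return target
```

Calling check_outputs_dir_is_free("/absolute/path/to/file.fasta") before python main.py returns the expected folder path when it is free and raises an error otherwise.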
- Log File: Tracks the execution process.
- Report File: Summarizes the structures retrieved.
- Subfolders:
  - AF_models: Contains PDB files from AlphaFold.
  - ESM_Atlas_models: Contains PDB files from ESM Atlas.
  - pdb_structures: Contains PDB files from PDB.
  - swiss_models: Contains PDB files from Swiss Model.
All PDB files are named using their unique and stable entry identifier.
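After a run, the folder layout described above can be inspected programmatically. The following sketch lists the identifiers found in each subfolder. It assumes the structure files carry a .pdb extension, which the description implies but does not state, and the function name list_structures is hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Subfolder names as documented for the outputs folder.
SUBFOLDERS = ("AF_models", "ESM_Atlas_models", "pdb_structures", "swiss_models")


def list_structures(outputs_dir: str) -> Dict[str, List[str]]:
    """Map each subfolder name to the sorted identifiers of its PDB files.

    Assumes files end in .pdb (an assumption, not confirmed by the tool).
    Missing subfolders are reported as empty lists.
    """
    root = Path(outputs_dir)
    found: Dict[str, List[str]] = {}
    for name in SUBFOLDERS:
        folder = root / name
        if folder.is_dir():
            # The file stem is the entry identifier used to name the file.
            found[name] = sorted(p.stem for p in folder.glob("*.pdb"))
        else:
            found[name] = []
    return found
```

The result is a dictionary keyed by subfolder name, which makes it easy to compare against the report file or to see which databases returned no structure for a given entry.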
If you have any questions, comments, or suggestions, feel free to contact us via email:
Email: [email protected]
This project is licensed under the MIT License, meaning anyone is free to use, modify, and distribute the code under the terms of this license.