Structural Analysis

Working environment

uv init prody -p 3.11 --name prody-project
cd prody
uv add prody

PDB files

This example demonstrates how to use the flexible PDB fetcher, fetchPDB(). Valid inputs are a PDB identifier, e.g. 2k39, or a list of PDB identifiers, e.g. ["2k39", "1mkp", "1etc"]. Compressed PDB files (pdb.gz) are saved in the current working directory or in a specified folder.

Fetch PDB files

Single file

The fetchPDB function returns a filename if the download succeeds:

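import os
from prody.proteins import localpdb

# Build the expected path with os.path.join so the check is not tied to
# Windows path separators.
file = localpdb.fetchPDB("5uoj", folder="data")
assert file == os.path.join("data", "5uoj.pdb.gz")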

Even though the file is compressed, you can view it directly with PyMOL.
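
The compressed file can also be inspected from Python without unpacking it, using the standard-library gzip module. The sketch below is only an illustration: read_records is a hypothetical helper, and the file demo.pdb.gz with its two records is made up so that the example runs without a download.

```python
import gzip
import os
import tempfile

def read_records(path, limit=5):
    """Return up to `limit` leading lines of a gzipped PDB file as text."""
    records = []
    with gzip.open(path, "rt") as handle:  # "rt" decodes the bytes to str
        for line in handle:
            records.append(line.rstrip("\n"))
            if len(records) == limit:
                break
    return records

# Stand-in file so the sketch runs offline; with a real download, pass the
# filename returned by fetchPDB instead.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.pdb.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("HEADER    DEMO STRUCTURE\n")
    handle.write("REMARK    made-up content for illustration\n")

for record in read_records(demo_path):
    print(record)
```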

Multiple files

This function also accepts a list of PDB identifiers:

files = localpdb.fetchPDB(["5uoj", "1r39", "@!~#"], folder="data")
expected = [os.path.join("data", name) for name in ("5uoj.pdb.gz", "1r39.pdb.gz")]
assert files == expected + [None]  # the invalid identifier yields None

For failed downloads, None is returned (or the list contains a None item).

ProDy reports the download results and returns a list of filenames. The report is printed to the screen, which in this case would be:

@> WARNING '@!~#' is not a valid identifier.
@> Connecting wwPDB FTP server RCSB PDB (USA).
@> Downloading PDB files via FTP failed, trying HTTP.
@> 5uoj downloaded (data\5uoj.pdb.gz)
@> 1r39 downloaded (data\1r39.pdb.gz)
@> PDB download via HTTP completed (2 downloaded, 0 failed).
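
Because failed downloads show up as None entries at the same position as the identifier that caused them, it can be handy to pair the two lists before going further. A minimal sketch, assuming the result list has the same order and length as the input list (as in the example above); split_results is a hypothetical helper, not part of ProDy:

```python
def split_results(identifiers, filenames):
    """Pair each identifier with its result; separate successes from failures."""
    downloaded = {}
    failed = []
    for pdb_id, filename in zip(identifiers, filenames):
        if filename is None:
            failed.append(pdb_id)
        else:
            downloaded[pdb_id] = filename
    return downloaded, failed

# Values copied from the example above (Windows-style paths, as in the log).
ids = ["5uoj", "1r39", "@!~#"]
files = ["data\\5uoj.pdb.gz", "data\\1r39.pdb.gz", None]

downloaded, failed = split_results(ids, files)
print(downloaded)
print(failed)
```

In real use, the second argument would be the list returned by fetchPDB for the same identifiers.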

Parse PDB files

ProDy offers a fast and flexible PDB parser, parsePDB(). The parser can be used to read well-defined subsets of atoms, specific chains, or specific models (in NMR structures) to improve performance. This example shows how to use the flexible parsing options.

Three types of user input are accepted:

  • Path to a PDB file, e.g. "../1MKP.pdb"
  • Path to a compressed (gzipped) PDB file, e.g. "5uoj.pdb.gz"
  • PDB identifier, e.g. 2k39

The output is an AtomGroup instance that stores atomic data and can be used as input to functions and classes for dynamics analysis.

Parse a file

You can parse PDB files by passing a filename (gzipped files are handled).

We do so after downloading a PDB file:

Acetylcholinesterase

from prody import AtomGroup
from prody.proteins import localpdb, pdbfile

# ...

## Parse PDB files

file = localpdb.fetchPDB("1OCE") # Acetylcholinesterase
atoms: AtomGroup = pdbfile.parsePDB(file)
assert atoms.getTitle() == "1oce"

In [6]: fetchPDB('5uoj')
Out[6]: '5uoj.pdb.gz'

In [7]: atoms = parsePDB('5uoj')

In [8]: atoms
Out[8]: <AtomGroup: 5uoj (3138 atoms)>

The parser returns an AtomGroup instance.

Note also that the time taken to parse the file is printed on the screen. This includes the time needed to evaluate coordinate lines and build the AtomGroup instance, and excludes the time spent reading the file from disk.

Use an identifier

PDB files can also be parsed by passing just an identifier. The parser looks for a PDB file matching the given identifier in the current working directory. If no matching file is found, ProDy downloads it from the PDB FTP server automatically and saves it in the current working directory.

In [9]: atoms = parsePDB('1mkp')

In [10]: atoms
Out[10]: <AtomGroup: 1mkp (1183 atoms)>

Subsets of atoms

The parser can be used to parse only backbone or Cα atoms:

In [11]: backbone = parsePDB('1mkp', subset='bb')

In [12]: backbone
Out[12]: <AtomGroup: 1mkp_bb (576 atoms)>

In [13]: calpha = parsePDB('1mkp', subset='ca')

In [14]: calpha
Out[14]: <AtomGroup: 1mkp_ca (144 atoms)>
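
To make the two subsets concrete: in the fixed-column PDB format the atom name occupies columns 13-16 of each ATOM record, Cα atoms are named CA, and the backbone atoms of a residue are N, CA, C and O, which is consistent with the counts above (576 = 4 × 144). The sketch below is a plain-Python illustration on two made-up residues, not ProDy's implementation; atom_line and select_names are hypothetical helpers.

```python
BACKBONE_NAMES = {"N", "CA", "C", "O"}

def atom_line(serial, name, resname, chain, resseq):
    """Build a minimal ATOM record with each field in its fixed columns."""
    return (f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resseq:4d}"
            f"    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")

def select_names(lines, names):
    """Keep ATOM records whose atom name (columns 13-16) is in `names`."""
    return [line for line in lines
            if line.startswith("ATOM") and line[12:16].strip() in names]

# Two made-up residues: alanine (with a CB side-chain atom) and glycine.
atoms = [(" N  ", "ALA", 1), (" CA ", "ALA", 1), (" C  ", "ALA", 1),
         (" O  ", "ALA", 1), (" CB ", "ALA", 1),
         (" N  ", "GLY", 2), (" CA ", "GLY", 2), (" C  ", "GLY", 2),
         (" O  ", "GLY", 2)]
lines = [atom_line(i, name, res, "A", num)
         for i, (name, res, num) in enumerate(atoms, start=1)]

calpha = select_names(lines, {"CA"})
backbone = select_names(lines, BACKBONE_NAMES)
print(len(lines), len(backbone), len(calpha))  # 9 8 2
```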

Specific chains

The parser can be used to parse a specific chain from a PDB file:

In [15]: chA = parsePDB('3mkb', chain='A')

In [16]: chA
Out[16]: <AtomGroup: 3mkbA (1198 atoms)>

In [17]: chC = parsePDB('3mkb', chain='C')

In [18]: chC
Out[18]: <AtomGroup: 3mkbC (1189 atoms)>

Multiple chains can also be parsed in the same way:

In [19]: chAC = parsePDB('3mkb', chain='AC')

In [20]: chAC
Out[20]: <AtomGroup: 3mkbAC (2387 atoms)>

Specific models

The parser can be used to parse a specific model from a file:

In [21]: model1 = parsePDB('2k39', model=10)

In [22]: model1
Out[22]: <AtomGroup: 2k39 (1231 atoms)>
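
In multi-model files such as NMR ensembles, each model is bracketed by MODEL and ENDMDL records, and the model argument picks one of them by number. A rough plain-Python picture of that selection, on a made-up two-model snippet with abbreviated records (model_lines is a hypothetical helper, not ProDy's implementation):

```python
def model_lines(lines, model):
    """Return the coordinate records inside the requested MODEL block."""
    selected = []
    current = None  # number of the model being read, None outside any block
    for line in lines:
        if line.startswith("MODEL"):
            current = int(line[5:].strip())  # "MODEL        2" gives 2
        elif line.startswith("ENDMDL"):
            current = None
        elif current == model and line.startswith(("ATOM", "HETATM")):
            selected.append(line)
    return selected

# Made-up ensemble: the same two atoms in two models (records abbreviated).
ensemble = [
    "MODEL        1",
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA  MET A   1",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA  MET A   1",
    "ENDMDL",
]
print(len(model_lines(ensemble, 2)))  # 2
```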

Alternate locations

When a PDB file contains alternate locations for some of the atoms, alternate locations with indicator A are parsed by default.

In [23]: altlocA = parsePDB('1ejg')

In [24]: altlocA
Out[24]: <AtomGroup: 1ejg (637 atoms)>

Specific alternate locations can be parsed as follows:

In [25]: altlocB = parsePDB('1ejg', altloc='B')

In [26]: altlocB
Out[26]: <AtomGroup: 1ejg (634 atoms)>

Note that in this case the number of atoms differs between the two atom groups. This is because the residue types of atoms with alternate locations are different.

Also, all alternate locations can be parsed as follows:

In [27]: all_altlocs = parsePDB('1ejg', altloc=True)

In [28]: all_altlocs
Out[28]: <AtomGroup: 1ejg (637 atoms; active #0 of 3 coordsets)>

Note that this time the parser returned three coordinate sets, one for each alternate location indicator found in this file (A, B, C). When parsing multiple alternate locations, the parser expects the same residue type for each atom with an alternate location. If residue names differ, a warning message is printed.
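
For orientation, the alternate location indicator is a single character in column 17 of each ATOM record, and atoms without alternates leave it blank. The sketch below shows, on made-up records, how choosing a different indicator can change the atom count when the alternates are different residue types. It is a plain-Python illustration rather than ProDy's implementation; atom_line and pick_altloc are hypothetical helpers.

```python
def atom_line(serial, name, altloc, resname, chain, resseq):
    """Build the leading columns of an ATOM record (altloc is column 17)."""
    return f"ATOM  {serial:5d} {name:<4}{altloc}{resname:>3} {chain}{resseq:4d}"

def pick_altloc(lines, altloc="A"):
    """Keep atoms with a blank indicator or with the requested indicator."""
    return [line for line in lines if line[16] in (" ", altloc)]

# Made-up position where conformer A is a proline and conformer B a serine.
records = [
    atom_line(1, " N  ", " ", "PRO", "A", 22),
    atom_line(2, " CA ", " ", "PRO", "A", 22),
    atom_line(3, " C  ", " ", "PRO", "A", 22),
    atom_line(4, " O  ", " ", "PRO", "A", 22),
    atom_line(5, " CB ", "A", "PRO", "A", 22),
    atom_line(6, " CG ", "A", "PRO", "A", 22),
    atom_line(7, " CD ", "A", "PRO", "A", 22),
    atom_line(8, " CB ", "B", "SER", "A", 22),
    atom_line(9, " OG ", "B", "SER", "A", 22),
]
print(len(pick_altloc(records, "A")), len(pick_altloc(records, "B")))  # 7 6
```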

Composite arguments

The parser can be used to parse coordinates from a specific model for a subset of atoms of a specific chain:

In [29]: composite = parsePDB('2k39', model=10, chain='A', subset='ca')

In [30]: composite
Out[30]: <AtomGroup: 2k39A_ca (76 atoms)>

Header data

The PDB parser can be used to extract header data as a dict from PDB files:

In [31]: atoms, header = parsePDB('1ubi', header=True)

In [32]: list(header)
Out[32]:
['A',
'related_entries',
'sheet',
'classification',
'reference',
'title',
'sheet_range',
'polymers',
'resolution',
'space_group',
'helix_range',
'chemicals',
'experiment',
'helix',
'version',
'authors',
'identifier',
'deposition_date',
'biomoltrans']

In [33]: header['experiment']
Out[33]: 'X-RAY DIFFRACTION'

In [34]: header['resolution']
Out[34]: 1.8
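
Since the header is an ordinary dict, it can be summarized with plain dictionary access. The sketch below uses a small stand-in dictionary so that it runs without parsing a file: the 'experiment' and 'resolution' values are the ones shown above, while the 'identifier' value is an assumption. describe is a hypothetical helper, and .get() is used in case a key is absent for some entry.

```python
def describe(header):
    """Build a one-line summary from a parsed header dictionary."""
    identifier = header.get("identifier", "unknown")
    experiment = header.get("experiment", "unknown method")
    resolution = header.get("resolution")  # may be absent for some entries
    if resolution is None:
        return f"{identifier}: {experiment}"
    return f"{identifier}: {experiment}, {resolution} angstrom resolution"

# Stand-in for the dict returned with header=True ('identifier' is assumed).
header = {
    "identifier": "1UBI",
    "experiment": "X-RAY DIFFRACTION",
    "resolution": 1.8,
}
print(describe(header))  # 1UBI: X-RAY DIFFRACTION, 1.8 angstrom resolution
```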

It is also possible to parse only header data by passing model=0 as an argument:

In [35]: header = parsePDB('1ubi', header=True, model=0)

or by using the parsePDBHeader() function:

In [36]: header = parsePDBHeader('1ubi')

Write PDB file

PDB files can be written using the writePDB() function. This example shows how to write PDB files for AtomGroup instances and subsets of atoms.

Write all atoms

All atoms in an AtomGroup can be written in PDB format as follows:

In [37]: writePDB('MKP3.pdb', atoms)
Out[37]: 'MKP3.pdb'

Upon successful writing of the PDB file, the filename is returned.

Write a subset

It is also possible to write subsets of atoms in PDB format:

In [38]: alpha_carbons = atoms.select('calpha')

In [39]: writePDB('1mkp_ca.pdb', alpha_carbons)
Out[39]: '1mkp_ca.pdb'

In [40]: backbone = atoms.select('backbone')

In [41]: writePDB('1mkp_bb.pdb', backbone)
Out[41]: '1mkp_bb.pdb'
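
One quick sanity check after writing is to count the coordinate records in the output file and compare that number with numAtoms() of the atoms that were written. A minimal sketch of the counting half, using a made-up stand-in file so that it runs on its own (count_atom_records is a hypothetical helper):

```python
import os
import tempfile

def count_atom_records(path):
    """Count ATOM and HETATM records in a PDB file."""
    with open(path) as handle:
        return sum(1 for line in handle
                   if line.startswith(("ATOM", "HETATM")))

# Stand-in for a file produced by writePDB; the content is made up.
path = os.path.join(tempfile.mkdtemp(), "demo_ca.pdb")
with open(path, "w") as handle:
    handle.write("REMARK made-up file for illustration\n")
    handle.write("ATOM      1  CA  MET A   1\n")
    handle.write("ATOM      2  CA  GLN A   2\n")
    handle.write("END\n")

print(count_atom_records(path))  # 2
```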

TODO

Structural Analysis