Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chain.jl #28

Open
wants to merge 44 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 25 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
ea40dc9
fiz uma mudança
ana-bblima Nov 19, 2024
b9b3c65
desfiz a mudança
ana-bblima Nov 19, 2024
6536de6
include Chain.jl and export Chain, eachchain
ana-bblima Nov 19, 2024
e59ed9d
first implementation of Chain iterator
ana-bblima Nov 19, 2024
0214299
adding comments in code (examples)
ana-bblima Nov 20, 2024
3c41290
new_examples
ana-bblima Nov 20, 2024
3647b4e
testitem used to test the code
ana-bblima Nov 20, 2024
3c61bcf
final version
ana-bblima Nov 20, 2024
63b4813
protein_test.pdb
ana-bblima Nov 20, 2024
f57d9f8
fix new pdb path
ana-bblima Nov 20, 2024
639d600
removed "" in the path of the the pdb
ana-bblima Nov 20, 2024
cbc6dfb
expport chains in Main.PDBTools
ana-bblima Nov 20, 2024
2696c71
removing $ fro comments
ana-bblima Nov 21, 2024
308c50f
final_version
ana-bblima Nov 21, 2024
7cd321a
Merge branch 'main' into chains
ana-bblima Nov 22, 2024
8ac108b
remove extra space on docs
ana-bblima Nov 22, 2024
147ffa7
testing mass command in Residue.jl
ana-bblima Nov 22, 2024
55e43ce
correct mass value in the tests and function documentation
ana-bblima Nov 22, 2024
93168e3
function documentation and more tests
ana-bblima Nov 22, 2024
f61fc4b
documentation of chain and eachchain
ana-bblima Nov 26, 2024
1223f15
function documentation for eachchain and more tests
ana-bblima Nov 26, 2024
2700615
removing inappropriate dependencys
ana-bblima Nov 26, 2024
53f2b4a
mass value in test fixed
ana-bblima Nov 26, 2024
ec79352
test item fixed
ana-bblima Nov 26, 2024
64b770d
Merge branch 'main' into chains
ana-bblima Nov 26, 2024
0a8d908
another throw argument error for test_item
ana-bblima Nov 27, 2024
e6227e0
remove LiveServer wrong dependency
ana-bblima Nov 27, 2024
0fd9424
improve error messages of getindex
ana-bblima Dec 5, 2024
0f8e812
Merge branch 'm3g:main' into chains
ana-bblima Dec 5, 2024
25253db
fix errors
ana-bblima Dec 5, 2024
38f8ae2
testing the @show messages in the code
ana-bblima Dec 6, 2024
0aae70a
created a new model for the pdb
ana-bblima Dec 6, 2024
98c6eb0
rename CHAINSPDB create a second model
ana-bblima Dec 6, 2024
49c957f
changed the path to chains.pdb
ana-bblima Dec 6, 2024
8798474
removing different segment name
ana-bblima Dec 6, 2024
41a35ec
funtion last implemented and new documentation
ana-bblima Dec 6, 2024
cabb40a
new examples and better descriptions
ana-bblima Dec 6, 2024
e561946
fixing documentation
ana-bblima Dec 6, 2024
4d21f36
documentation of function eachchain
ana-bblima Dec 7, 2024
e634112
modification in function documentations
ana-bblima Dec 7, 2024
e4e25f1
removing space in documentation
ana-bblima Dec 7, 2024
e083c5c
beter function documentation for eachchain
ana-bblima Dec 7, 2024
f21ad4a
modifying atom properties example
ana-bblima Dec 7, 2024
0032df5
better documentation for eachresidue
ana-bblima Dec 7, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ Format = "1fa38f19-a742-5d3f-a2b9-30dd87b9d5f8"
InlineStrings = "842dd82b-1e85-43dc-bf29-5d0ee9dffc48"
InteractiveUtils = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
LiveServer = "16fef848-5104-11e9-1b77-fb7a48bbb589"
OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
Expand All @@ -28,6 +29,7 @@ Format = "1.3"
InlineStrings = "1.4.2"
InteractiveUtils = "1.9"
LinearAlgebra = "1.9"
LiveServer = "1.4.0"
lmiq marked this conversation as resolved.
Show resolved Hide resolved
OrderedCollections = "1.6.3"
Parameters = "0.12"
PrecompileTools = "1.2.1"
Expand Down
126 changes: 125 additions & 1 deletion docs/src/selections.md
Original file line number Diff line number Diff line change
Expand Up @@ -185,7 +185,7 @@ input, and returns `true` or `false` depending on the conditions required for th

## Iterate over residues (or molecules)

The `eachresidue` iterator allows iteration over the resiudes of a structure (in PDB files distinct molecules are associated to different residues, thus this iterates similarly over the molecules of a structure). For example:
The `eachresidue` iterator allows iteration over the residues of a structure (in PDB files distinct molecules are associated to different residues, thus this iterates similarly over the molecules of a structure). For example:

```jldoctest
julia> using PDBTools
Expand Down Expand Up @@ -260,6 +260,130 @@ resname
residuename
```

## Iterate over chains

The `eachchain` iterator allows for iteration over the chains of a structure. A PDB file may contain more than one protein monomer listed as different chains, thus this iterator behaves similarly to iterating over each monomer of a structure. For example:


```jldoctest
julia> using PDBTools

julia> ats = read_pdb(PDBTools.CHAINSPDB);

julia> chain.(eachchain(ats))
3-element Vector{InlineStrings.String3}:
"A"
"B"
"C"

```
The command `chain` provides the name of all the chains stored by the iterator `eachchain`. In this example, the PDB loaded has 3 chains identified in alphabetical order with the letters A, B and C.

The chains can be stored in diferent variables. In the example below, all the atoms belonging to chain A are stored in the variable chain_A with the command `first`, which selects the first element of the iterator `eachchain`. The command `last` and the indexing list is also supported. However, this type of indexing can only be used in a vector of elements `Atom`, so the chains must first to be collected in an array.

```julia-repl
julia> using PDBTools

ats = read_pdb(PDBTools.CHAINSPDB);

julia> chain_A = first(eachchain(ats))
Chain of name A with 48 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N ASP A 1 1 133.978 119.386 -23.646 1.00 0.00 1 ASYN 1
2 CA ASP A 1 1 134.755 118.916 -22.497 1.00 0.00 1 ASYN 2
3 C ASP A 1 1 135.099 117.439 -22.652 1.00 0.00 1 ASYN 3
46 HD22 LEU A 3 3 130.704 113.003 -27.586 1.00 0.00 1 ASYN 46
47 HD23 LEU A 3 3 130.568 111.868 -26.242 1.00 0.00 1 ASYN 47
48 O LEU A 3 3 132.066 112.711 -21.739 1.00 0.00 1 ASYN 48

julia> chains = collect(eachchain(ats))
Array{Chain,1} with 3 chains.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isto não tínhamos mudado para ˋtypeof(chain)ˋ na função, de forma que aparece ˋVector{Chain}ˋ no ˋshowˋ?


julia> chain_B = chains[2]
Chain of name B with 48 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
49 N ASP B 4 4 135.661 123.866 -22.311 1.00 0.00 1 ASYN 49
50 CA ASP B 4 4 136.539 123.410 -21.227 1.00 0.00 1 ASYN 50
51 C ASP B 4 4 137.875 122.934 -21.788 1.00 0.00 1 ASYN 51
94 HD22 LEU B 6 6 139.485 119.501 -16.418 1.00 0.00 1 ASYN 94
95 HD23 LEU B 6 6 138.780 120.216 -17.864 1.00 0.00 1 ASYN 95
96 O LEU B 6 6 141.411 117.975 -21.923 1.00 0.00 1 ASYN 96

julia> chain_C = last(chains)
Chain of name C with 48 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
97 N ASP C 7 7 137.110 128.213 -20.946 1.00 0.00 1 ASYN 97
98 CA ASP C 7 7 137.938 127.668 -19.879 1.00 0.00 1 ASYN 98
99 C ASP C 7 7 137.485 128.196 -18.521 1.00 0.00 1 ASYN 99
142 HD22 LEU C 9 9 141.370 132.570 -11.550 1.00 0.00 1 ASYN 142
143 HD23 LEU C 9 9 142.275 131.839 -10.225 1.00 0.00 1 ASYN 143
144 O LEU C 9 9 141.311 127.197 -12.279 1.00 0.00 1 ASYN 144


```

Any changes made to the atoms of a chain variable will overwrite the properties of the original atoms. In the example below, the occupancy column of the atoms in chain_A is set to the value of 0.00. When calling the original ats vector, it is possible to see that the occupancy column values of the same set of atoms changed from 1.00 to 0.00.

```julia-repl
julia> using PDBTools

julia> ats = read_pdb(PDBTools.CHAINSPDB)
Vector{Atom{Nothing}} with 144 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N ASP A 1 1 133.978 119.386 -23.646 1.00 0.00 1 ASYN 1
2 CA ASP A 1 1 134.755 118.916 -22.497 1.00 0.00 1 ASYN 2
3 C ASP A 1 1 135.099 117.439 -22.652 1.00 0.00 1 ASYN 3
142 HD22 LEU C 9 9 141.370 132.570 -11.550 1.00 0.00 1 ASYN 142
143 HD23 LEU C 9 9 142.275 131.839 -10.225 1.00 0.00 1 ASYN 143
144 O LEU C 9 9 141.311 127.197 -12.279 1.00 0.00 1 ASYN 144

julia> chain_A = first(eachchain(ats));

julia> for atom in chain_A
atom.occup = 0.00
end

julia> ats
Vector{Atom{Nothing}} with 144 atoms with fields:
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N ASP A 1 1 133.978 119.386 -23.646 0.00 0.00 1 ASYN 1
2 CA ASP A 1 1 134.755 118.916 -22.497 0.00 0.00 1 ASYN 2
3 C ASP A 1 1 135.099 117.439 -22.652 0.00 0.00 1 ASYN 3
142 HD22 LEU C 9 9 141.370 132.570 -11.550 1.00 0.00 1 ASYN 142
143 HD23 LEU C 9 9 142.275 131.839 -10.225 1.00 0.00 1 ASYN 143
144 O LEU C 9 9 141.311 127.197 -12.279 1.00 0.00 1 ASYN 144

```

When iterating over chains, it is possible to make equal changes along every chain. For instance, each chain in the example below starts with the aspartic acid residue, ASP, but if one wishes to equally rename only the first ASP in each chain to its protonated state name, ASPP - as it is implemented in the CHARMM22 force field -, one would do as follows:


```julia-repl
julia> using PDBTools

ats = read_pdb(PDBTools.CHAINSPDB);

julia> for chx in eachchain(ats)
for res in first(eachresidue(chx))
res.resname = "ASPP"
end
end

julia> count(res -> resname(res) == "ASPP", eachresidue(ats))
3

```

```@docs
Chain
eachchain
```

## Using VMD

[VMD](https://www.ks.uiuc.edu/Research/vmd/) is a very popular and
Expand Down
200 changes: 200 additions & 0 deletions src/Chain.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,200 @@
"""
Chain(atoms::AbstractVector{<:Atom}, range::UnitRange{Int})

Creates a Chain data structure. It contains two fields: `atoms` which is a vector of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

não acho que faça sentido descrever o conteúdo dos campos da estrutura (sâo internos).

`Atom` elements, and `range`, which indicates which atoms of the `atoms` vector
compose the chain.

The Chain structure carries the properties of the atoms it contains, but it does not
copy the original vector of atoms. This means that any changes made in the Chain
structure atoms, will overwrite the original vector of atoms.

# Examples

```jldoctest
julia> using PDBTools

ana-bblima marked this conversation as resolved.
Show resolved Hide resolved
julia> ats = read_pdb(PDBTools.CHAINSPDB);

julia> chain.(eachchain(ats))
3-element Vector{InlineStrings.String3}:
"A"
"B"
"C"

julia> chains = collect(eachchain(ats))
Array{Chain,1} with 3 chains.

julia> chains[2].range
49:96

julia> chains[1]
Chain of name A with 48 atoms.
index name resname chain resnum residue x y z occup beta model segname index_pdb
1 N ASP A 1 1 133.978 119.386 -23.646 1.00 0.00 1 ASYN 1
2 CA ASP A 1 1 134.755 118.916 -22.497 1.00 0.00 1 ASYN 2
3 C ASP A 1 1 135.099 117.439 -22.652 1.00 0.00 1 ASYN 3
46 HD22 LEU A 3 3 130.704 113.003 -27.586 1.00 0.00 1 ASYN 46
47 HD23 LEU A 3 3 130.568 111.868 -26.242 1.00 0.00 1 ASYN 47
48 O LEU A 3 3 132.066 112.711 -21.739 1.00 0.00 1 ASYN 48

julia> mass(chains[1])
353.37881000000016
```

"""
@kwdef struct Chain{T<:Atom,Vec<:AbstractVector{T}} <: AbstractVector{T}
atoms::Vec
range::UnitRange{Int}
chain::String3
model::Int32
segname::String7
end

name(chain::Chain) = chain.chain
chain(chain::Chain) = chain.chain
model(chain::Chain) = chain.model
segname(chain::Chain) = chain.segname
mass(chain::Chain) = mass(@view chain.atoms[chain.range])

function Chain(atoms::AbstractVector{<:Atom}, range::UnitRange{<:Integer})
i = first(range)
if any(atoms[j].chain != atoms[i].chain for j in range)
error("Range $range does not correspond to a single protein chain.")
end
Chain(
atoms = atoms,
range = range,
chain = chain(atoms[i]),
model = model(atoms[i]),
segname = segname(atoms[i]),
)
end

Chain(atoms::AbstractVector{<:Atom}) = Chain(atoms, 1:length(atoms))

function Base.getindex(chain::Chain, i::Integer)
if i <= 0 || i > length(chain)
throw(ArgumentError("Index must be in 1:$(length(chain)). Attempted to access index $i."))
end
# Calculate the actual index in the atoms array
atom_index = first(chain.range) + i - 1
return chain.atoms[atom_index]
end

#
# Structure and function to define the eachchain iterator
#
struct EachChain{T<:AbstractVector{<:Atom}}
atoms::T
end

"""
eachchain(atoms::AbstractVector{<:Atom})

Iterator for the chains of a selection.

### Example

```julia-repl
julia> ats = read_pdb(PDBTools.CHAINSPDB);

julia> length(eachchain(ats))
3

julia> for chain in eachchain(ats)
println(chain.range)
end
1:48
49:96
97:144

```

"""
eachchain(atoms::AbstractVector{<:Atom}) = EachChain(atoms)
ana-bblima marked this conversation as resolved.
Show resolved Hide resolved

# Collect chains default constructor
Base.collect(c::EachChain) = collect(Chain, c)
Base.length(chains::EachChain) = sum(1 for chain in chains)
Base.firstindex(chains::EachChain) = 1
Base.lastindex(chains::EachChain) = length(chains)

function Base.getindex(::EachChain, ::Integer)
throw(ArgumentError("""\n
The eachchain iterator does not support indexing.
Use collect(eachchain(atoms)) to get an indexable list of chains.

"""))
end

Base.size(chain::Chain) = (length(chain.range),)
Base.length(chain::Chain) = length(chain.range)
Base.eltype(::Chain{T}) where {T} = T

# Iterate over residues of a structure
#
function Base.iterate(chains::EachChain, current_atom=firstindex(chains.atoms))
current_atom > length(chains.atoms) && return nothing
next_atom = current_atom + 1
while next_atom <= length(chains.atoms) &&
same_chain(chains.atoms[current_atom], chains.atoms[next_atom])
next_atom += 1
end
return (Chain(chains.atoms, current_atom:next_atom-1), next_atom)
end

# Iterate over atoms of one residue
#
function Base.iterate(chain::Chain, current_atom=nothing)
first_atom = index(first(chain))
last_atom = index(last(chain))
if isnothing(current_atom)
current_atom = first_atom
elseif current_atom > last_atom
return nothing
end
return (chain[current_atom - first_atom + 1], current_atom + 1)
end

function same_chain(atom1::Atom, atom2::Atom)
atom1.chain == atom2.chain &&
atom1.model == atom2.model &&
atom1.segname == atom2.segname
end

# io show functions
#
function Base.show(io::IO, ::MIME"text/plain", chain::Chain)
println(io, " Chain of name $(chain.chain) with $(length(chain)) atoms.")
print_short_atom_list(io, @view chain.atoms[chain.range])
end

function Base.show(io::IO, chains::EachChain)
print(io, " Iterator with $(length(chains)) chains.")
end

function Base.show(io::IO, ::MIME"text/plain", chains::AbstractVector{Chain})
print(io, " Array{Chain,1} with $(length(chains)) chains.")
end

@testitem "Chain iterator" begin
ana-bblima marked this conversation as resolved.
Show resolved Hide resolved
pdb = read_pdb(PDBTools.CHAINSPDB)
chains = eachchain(pdb)
@test Chain(pdb, 1:48).range == 1:48
@test length(chains) == 3
@test firstindex(chains) == 1
@test lastindex(chains) == 3
@test_throws ArgumentError chains[1]
chains = collect(eachchain(pdb))
@test name(chains[3]) == "C"
@test index.(filter(at -> resname(at) == "ASP" && name(at) == "CA", chains[1])) == [2]
@test length(findall(at -> resname(at) == "GLN", chains[1])) == 17
@test mass(chains[1]) == 353.37881000000016
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparações de floats tem que sempre ser feitas com aproximadamente (\approx + TAB gera o símbolo de aproximadamente - ou isapprox(x,y).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrigir este teste da massa para usar approx

@test chains[3].segname == "ASYN"
@test chains[2].model == 1
@test chains[1].chain == "A"

end

5 changes: 4 additions & 1 deletion src/PDBTools.jl
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,8 @@ export read_pdb, write_pdb, getseq, wget, edit!, oneletter, threeletter, residue
export read_mmcif, write_mmcif
export Atom, printatom, index, index_pdb, name, beta, occup, charge, pdb_element
export add_custom_field
export Residue, eachresidue, resname, residue, resnum, chain, model, segname
export Residue, eachresidue, resname, residue, resnum, model, segname
export Chain, eachchain, chain
export residue_ticks
export coor, maxmin, distance, closest
export element, mass, element_name, element_symbol, element_symbol_string
Expand All @@ -46,6 +47,7 @@ const TESTPDB = joinpath(@__DIR__,"../test/structure.pdb")
const SMALLPDB = joinpath(@__DIR__,"../test/small.pdb")
const SIRAHPDB = joinpath(@__DIR__,"../test/sirah.pdb")
const TESTCIF = joinpath(@__DIR__,"../test/small.cif")
const CHAINSPDB = joinpath(@__DIR__,"../test/protein_test.pdb")

ana-bblima marked this conversation as resolved.
Show resolved Hide resolved
# Basic chemistry
include("./elements.jl")
Expand All @@ -56,6 +58,7 @@ include("./protein_residues.jl")
#
include("./Atom.jl")
include("./Residue.jl")
include("./Chain.jl")

# Selection functions
include("./select.jl")
Expand Down
5 changes: 4 additions & 1 deletion src/Residue.jl
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
"""
Residue(atoms::AbstractVector{<:Atom}, range::UnitRange{Int})
Residue(atoms::AbstractVector{<:Atom}, range::UnitRange{Int})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aqui por algum motivo você tirou um espaço e a identação está errada.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rever


Residue data structure. It contains two fields: `atoms` which is a vector of
`Atom` elements, and `range`, which indicates which atoms of the `atoms` vector
Expand Down Expand Up @@ -31,6 +31,9 @@ julia> residues[5].chain
julia> residues[8].range
52:58

julia> mass(residues[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Não sei se aqui vai dar erro, mas testes com números de ponto flutuante (floats) em jldoctest são dor de cabeça, porque se qualquer algarismo variar o teste falha. Os jldoctests comparam exatamente o output gerado com o output esperado, e comparações de igualdade de floats nunca funcionam bem (teria que usar uma comparação aproximada).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rever

82.0385

```

"""
Expand Down
Loading
Loading