Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Overflow of the ASE calculator #98

Open
jan-janssen opened this issue Feb 19, 2023 · 3 comments
Open

Memory Overflow of the ASE calculator #98

jan-janssen opened this issue Feb 19, 2023 · 3 comments

Comments

@jan-janssen
Copy link

Describe the bug

I try to calculate the xtb energies of the QM9 database - 133885 structures. But at 6000 structures the memory usage exceeds 32GB, resulting in my calculation to crash. This can be solved by copying the ASE atoms object before every evaluation.

To Reproduce

Example workflow, which fails:

from ase.io import read
import matplotlib.pyplot as plt
import os
from tqdm import tqdm
from xtb.ase.calculator import XTB


def calc_energy(structure):
    structure.calc = XTB(method="GFN2-xTB")
    eng_pot = structure.get_potential_energy()
    return eng_pot


if __name__ == "__main__":
    path = "/home/janssen/pyiron/projects/2023-02-19-xtb-high-throughput/dsgdb9nsd"
    structure_lst = [read(os.path.join(path, f), format="xyz") for f in os.listdir(path)]
    potential_energy_lst = [calc_energy(structure=structure) for structure in tqdm(structure_lst)]
    print("Done")

Expected behaviour

This is the working example:

from ase.io import read
import matplotlib.pyplot as plt
import os
from tqdm import tqdm
from xtb.ase.calculator import XTB


def calc_energy(structure):
    structure.calc = XTB(method="GFN2-xTB")
    eng_pot = structure.get_potential_energy()
    return eng_pot


if __name__ == "__main__":
    path = "/home/janssen/pyiron/projects/2023-02-19-xtb-high-throughput/dsgdb9nsd"
    structure_lst = [read(os.path.join(path, f), format="xyz") for f in os.listdir(path)]
    potential_energy_lst = [calc_energy(structure=structure.copy()) for structure in tqdm(structure_lst)]
    print("Done")

In particular I changed the line:

< potential_energy_lst = [calc_energy(structure=structure) for structure in tqdm(structure_lst)]
> potential_energy_lst = [calc_energy(structure=structure.copy()) for structure in tqdm(structure_lst)]

Additional context

I tried both cache_api=True and cache_api=False still this does not fix the issue. A similar option would be great to achieve the same reset I get from copying the atoms object.

@jan-janssen jan-janssen added the unconfirmed This report has not yet been confirmed by the developers label Feb 19, 2023
@SingletC
Copy link

SingletC commented Jun 15, 2023

you keep initializing new calc instances each time.
structure.calc = XTB(method="GFN2-xTB")
try to reuse the same calc instance will help.
but this seems to indicate there is an issue with GC.

@awvwgk
Copy link
Member

awvwgk commented Jun 15, 2023

Note that you modify each structure in place creating a new calculator object and associate it with the structure. The API objects cannot be cleaned up by the GC because references to the respective objects still exist.

@awvwgk awvwgk removed the unconfirmed This report has not yet been confirmed by the developers label Jun 15, 2023
@SingletC
Copy link

SingletC commented Jun 18, 2023

I can confirm reuse the calc object can avoid memory problems. The below code has no issue.

from ase.atoms import Atoms
from xtb.ase.calculator import XTB
atoms = Atoms('H2',positions=[[0,0,0],[0,0,1]])
atoms.calc = XTB(method="GFN2-xTB")
for _ in range(100000):
    atoms.rattle()  
    atoms.get_forces()

However, this code demonstrate there is indeed gc issue:

from ase.atoms import Atoms
from xtb.ase.calculator import XTB
atoms = Atoms('H2',positions=[[0,0,0],[0,0,1]])
for _ in range(100000):
    atoms.rattle()  
    atoms.calc = XTB(method="GFN2-xTB")
    atoms.get_forces()

Note that it is a slightly different approach compared to the op. the same atoms was used each time. The old instance of XTB calculator should be GCed.
Although it is arguably not the best approach, it indeed consumes more and more memory in my test environment:

python3.8 
ase=3.22.1 
xtb==22.1 

an additional test to verify that is not ase's issue. The below code does not consume more and more memory.

from ase.atoms import Atoms
from ase.calculators.lj import LennardJones
atoms = Atoms('H2',positions=[[0,0,0],[0,0,1]])
for _ in range(10000):
    atoms.rattle()  
    atoms.calc = LennardJones()
    atoms.get_forces()

Correct me if I did something wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants