fix projected frequencies #368

t-young31 · 2024-11-29T16:25:47Z

Resolves #362

Modifies the calculation of projected frequencies to not depend on the molecular orientation.

This dependance arose because the previous implementation blindly followed the reference approach in which the projection is meant to create a block hessian matrix with structure

(0  0)
(0  H')

then discarding the first 5/6 rows that should be translations/rotations. However, if the rows were not ~0 then, depending on the chosen rotation vectors (which are coordinate dependent)

autodE/autode/hessians.py

Lines 177 to 179 in 6a6eebe

    
           t4 += np.cross(e_x, atom.coord - com).tolist() 
        
           t5 += np.cross(e_y, atom.coord - com).tolist() 
        
           t6 += np.cross(e_z, atom.coord - com).tolist()

then sometimes a vibrational mode could be discarded, leading to the wrong projected frequencies.

This PR

Removes the random orientation of translational vectors as I don't think they're needed
Checks the structure of the projected hessian before diagonalising a submatrix that has all the non-zero components
Zeros not-quite zero modes that should be rotations (numerical issues?)

While debugging this I found that ORCA and Gaussian do something unexpected (to me at least). Their transform to normal modes is not a projection (i.e. eigenvalue preserving). For a distorted methane molecule

you go from 2 strong imaginary frequencies unprojected (Frequency(-881.98419 cm^-1), Frequency(-197.01737 cm^-1)), to only one (-178.5054 cm-1). Plotting the energy along this mode

it's very much not a TS, so I am confused about what they're doing, and if it's ok

Checklist

The changes include an associated explanation of how/why
Test pass
Documentation has been updated
Changelog has been updated

codecov · 2024-11-29T16:34:18Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.47%. Comparing base (140c60b) to head (4e6a7c1).
Report is 1 commits behind head on v1.4.5.

Additional details and impacted files

@@           Coverage Diff           @@
##           v1.4.5     #368   +/-   ##
=======================================
  Coverage   97.46%   97.47%           
=======================================
  Files         204      204           
  Lines       23759    23789   +30     
=======================================
+ Hits        23157    23188   +31     
+ Misses        602      601    -1

Flag	Coverage Δ
unittests	`97.47% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

shoubhikraj

I wrote some comments (mostly questions), but I will have to later run some tests with the code to see how it is working and why there are some extra non-zero modes.

autode/hessians.py

shoubhikraj · 2024-11-29T20:38:29Z

autode/hessians.py


-        trans_rot_freqs = [Frequency(0.0) for _ in range(n_tr)]
+        H = self._proj_mass_weighted
+        norms = np.linalg.norm(H, axis=0)


I am curious if this will work, because the projection forces the eigenvalue to be zero, but the norm may or not be zero? Or does it have to be? I am not so familiar with linear algebra to be able to tell 😅

these are the norms post projection, which, if the projection has worked, then the matrix is block zero ((0, 0), (0, H'). (I think)

@t-young31 I tried the code with an ethane molecule and XTB hessian - but the projected matrix does not seem to be block zero at all. Here is the projected mass weighted matrix: projected.txt and here is the xyz file: test.xyz.txt. I am not sure if the projection is actually working. I will try to go through the original literature papers to try to make sense of this -- it might be that the stackoverflow page had some errors 😕

for water it looks okay

-0.00 0.00 -0.00 0.00 0.00 0.00 -0.00 0.00 -0.01 0.00 -0.00 0.00 -0.00 0.00 -0.00 -0.00 0.00 -0.00 -0.00 0.00 0.00 -0.02 0.00 0.00 0.00 -0.00 0.00 0.00 -0.00 -0.02 0.10 -0.00 -0.00 -0.00 0.00 -0.00 0.00 0.00 0.00 -0.00 -0.10 -0.00 0.00 -0.00 -0.00 0.00 -0.00 0.00 -0.00 -0.00 -0.00 0.00 -0.00 0.00 -0.00 -0.00 0.00 -0.00 0.00 0.00 3.92 -1.55 0.07 0.00 0.00 -0.00 0.00 -0.00 -0.00 -1.55 1.63 0.15 -0.01 -0.00 0.00 -0.00 -0.00 0.00 0.07 0.15 4.72

but as you say, for ethane it looks quite a bit worse 😕 (using np.savetxt("tmp.txt", H/1e9, fmt='%5.2f'))

just looking at this again and it's the numerical XTB hessian being wrong. this is the projected ORCA Hessian, which looks perfect

-0.00 -0.00 0.00 -0.00 -0.00 -0.00 -0.00 0.00 0.00 -0.00 -0.00 -0.00 -0.00 0.00 -0.00 -0.00 0.00 -0.00 -0.00 0.00 0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 0.00 0.00 0.00 -0.00 0.00 -0.00 0.00 -0.00 0.00 -0.00 0.00 0.00 -0.00 0.00 0.00 -0.00 0.00 0.00 0.00 0.00 0.00 -0.00 -0.00 -0.00 -0.00 -0.00 0.00 0.00 0.00 -0.00 -0.00 -0.00 -0.00 0.00 -0.00 -0.00 0.00 -0.00 0.00 0.00 0.00 -0.00 -0.00 -0.00 0.00 -0.00 0.00 0.00 0.00 -0.00 -0.00 0.00 0.00 -0.00 -0.00 -0.00 -0.00 0.00 -0.00 0.00 -0.00 -0.00 0.00 0.00 -0.00 0.00 0.00 0.00 -0.00 0.00 0.00 0.00 -0.00 -0.00 -0.00 0.00 0.00 -0.00 -0.00 -0.00 0.00 0.00 -0.00 -0.00 -0.00 0.00 0.00 -0.00 -0.00 0.00 0.00 0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 0.00 0.00 -0.00 -0.00 -0.00 -0.00 0.00 0.00 0.00 -0.00 -0.00 -0.00 -0.00 -0.00 0.00 0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 0.00 1.08 -0.22 -1.11 0.10 -0.03 -0.20 0.09 -0.01 -0.15 0.03 0.00 -0.01 0.02 0.00 -0.01 0.03 -0.01 -0.15 0.00 -0.00 0.00 0.00 0.00 0.00 -0.22 0.65 0.45 0.09 -0.01 -0.15 -0.11 0.06 0.23 -0.02 -0.00 -0.01 -0.03 0.00 -0.01 0.03 -0.01 0.04 0.00 0.00 0.00 0.00 0.00 -0.00 -1.11 0.45 2.74 -0.06 0.04 0.14 -0.03 -0.02 0.08 0.05 -0.01 -0.04 -0.18 0.03 -0.07 0.04 -0.03 0.16 -0.00 0.00 -0.00 -0.00 0.00 -0.00 0.10 0.09 -0.06 0.95 0.96 0.43 0.08 0.12 0.04 0.03 0.01 0.00 0.04 0.00 0.01 0.01 0.02 -0.15 -0.00 0.00 -0.00 -0.00 -0.00 -0.00 -0.03 -0.01 0.04 0.96 2.61 0.85 0.01 0.05 -0.01 -0.05 -0.03 -0.03 -0.04 -0.03 -0.02 0.18 -0.05 -0.15 -0.00 0.00 -0.00 -0.00 -0.00 -0.00 -0.20 -0.15 0.14 0.43 0.85 0.84 0.08 0.23 0.19 0.01 -0.01 -0.01 -0.05 -0.00 -0.02 0.08 -0.02 -0.06 -0.00 0.00 -0.00 -0.00 0.00 0.00 0.09 -0.11 -0.03 0.08 0.01 0.08 0.66 -0.54 0.39 -0.03 -0.01 0.01 0.04 -0.02 0.03 0.04 -0.02 -0.12 0.00 -0.00 -0.00 0.00 0.00 0.00 -0.01 0.06 -0.02 0.12 0.05 0.23 -0.54 2.31 -1.35 -0.16 -0.01 -0.00 0.01 -0.05 0.03 0.06 -0.03 0.15 -0.00 0.00 -0.00 -0.00 -0.00 0.00 -0.15 0.23 0.08 0.04 -0.01 0.19 0.39 -1.35 1.34 0.11 0.00 -0.01 -0.05 0.03 -0.04 0.01 0.01 -0.11 -0.00 0.00 -0.00 -0.00 -0.00 -0.00 0.03 -0.02 0.05 0.03 -0.05 0.01 -0.03 -0.16 0.11 0.72 -0.53 0.34 0.04 -0.16 0.11 0.05 -0.07 0.10 0.00 -0.00 0.00 -0.00 0.00 -0.00 0.00 -0.00 -0.01 0.01 -0.03 -0.01 -0.01 -0.01 0.00 -0.53 2.51 -1.15 -0.07 0.13 -0.17 -0.02 0.05 -0.16 -0.00 0.00 -0.00 -0.00 0.00 -0.00 -0.01 -0.01 -0.04 0.00 -0.03 -0.01 0.01 -0.00 -0.01 0.34 -1.15 1.45 -0.03 -0.10 0.03 0.10 -0.13 0.10 -0.00 -0.00 -0.00 0.00 0.00 -0.00 0.02 -0.03 -0.18 0.04 -0.04 -0.05 0.04 0.01 -0.05 0.04 -0.07 -0.03 1.12 -0.24 -1.12 0.08 -0.10 0.14 0.00 -0.00 0.00 0.00 -0.00 -0.00 0.00 0.00 0.03 0.00 -0.03 -0.00 -0.02 -0.05 0.03 -0.16 0.13 -0.10 -0.24 0.85 0.57 0.02 -0.01 -0.06 0.00 0.00 0.00 -0.00 -0.00 0.00 -0.01 -0.01 -0.07 0.01 -0.02 -0.02 0.03 0.03 -0.04 0.11 -0.17 0.03 -1.12 0.57 2.69 -0.05 0.09 -0.15 -0.00 0.00 -0.00 0.00 0.00 0.00 0.03 0.03 0.04 0.01 0.18 0.08 0.04 0.06 0.01 0.05 -0.02 0.10 0.08 0.02 -0.05 1.00 1.05 0.14 -0.00 0.00 -0.00 0.00 0.00 0.00 -0.01 -0.01 -0.03 0.02 -0.05 -0.02 -0.02 -0.03 0.01 -0.07 0.05 -0.13 -0.10 -0.01 0.09 1.05 2.97 0.16 0.00 -0.00 -0.00 0.00 0.00 -0.00 -0.15 0.04 0.16 -0.15 -0.15 -0.06 -0.12 0.15 -0.11 0.10 -0.16 0.10 0.14 -0.06 -0.15 0.14 0.16 0.57

autode/hessians.py

shoubhikraj · 2025-01-03T10:28:02Z

@t-young31 After looking at a lot of webpages, a paper and the stackoverflow page which was used for the original implementation - this is what I came up with - I think it bypasses the problem of modes that are not completely zero.

The projection matrix should project the hessian into a reduced dimension that does not have the rotational modes:

@cached_property
def _proj_matrix(self) -> np.ndarray:

    if self.atoms is None:
        raise ValueError("Could generate projected Hessian. Atoms not set")

    tr_vecs = list(self._tr_vecs())

    # Construct M^1/2, which as it's diagonal, is just the roots of the
    # diagonal elements
    masses = np.repeat([atom.mass for atom in self.atoms], repeats=3)
    m_half = np.sqrt(masses)

    for t_i in tr_vecs:
        t_i *= m_half

    # Generate a tranform matrix for the null-space of translation
    # rotation using SVD
    tr_bas = np.array(tr_vecs).transpose()
    U_s, s_v, _ = np.linalg.svd(tr_bas, full_matrices=True)

    if self.atoms.are_linear():
        if s_v[5] / s_v[0] > 1.e-4:
            logger.warning(
                f"Molecule detected as linear, but lowest singular value"
                f" from trans. and rot. basis is {s_v[5]:.4e}."
            )
        return U_s[:, 5:]
    else:
        return U_s[:, 6:]

Part of this is inspired from another snippet of code here that I got from this page.

The mathematical justification, as I understand from this helpful answer - SVD can obtain the null-space of the matrix columns - i.e. the orthonormal basis vectors in U_s[:, 6:] represent the directions that are not described by the translation and rotation since we use the six known vectors to form the matrix columns.

When the hessian is projected into this space - we get a 3N-6 x 3N-6 matrix. The normal modes are obtained by only getting the eigenvectors - and then transforming them back into the full 3N atom space by using the projection matrix.

@cached_property
def normal_modes_proj(self) -> List[Coordinates]:

    if self.atoms is None:
        raise ValueError(
            "Could not calculate projected normal modes, must"
            " have atoms set"
        )

    # Obtain projected eigenvectors and back-transform into
    # the non-projected dimensions
    _, proj_modes = np.linalg.eigh(self._proj_mass_weighted)
    modes = np.matmul(self._proj_matrix, proj_modes)
    return [Coordinates(mode / np.linalg.norm(mode)) for mode in modes.T]

Calculating the projected frequencies also seems to be simpler

@cached_property
def frequencies_proj(self) -> List[Frequency]:

    if self.atoms is None:
        raise ValueError(
            "Could not calculate projected frequencies, must "
            "have atoms set"
        )

    lmdas = np.linalg.eigvalsh(self._proj_mass_weighted)
    vib_freqs = self._eigenvalues_to_freqs(lmdas)
    trans_rot_freqs = [Frequency(0.0) for _ in range(self.n_tr)]
    return trans_rot_freqs + vib_freqs

Two issues I found while trying to make this work - one is that the mass weighted Hessian should not be cast into SI units. The projection matrix uses amu units for mass, and angstrom for TR vectors. The mass weighted hessian uses kg for mass. I think this causes problems with my code because the projected frequencies and modes are directly obtained from the eigenvalues of the projected matrix. Here is the modified code:

@cached_property
def _mass_weighted(self) -> np.ndarray:

    if self.atoms is None:
        raise ValueError("Could not calculate frequencies. Atoms not set")

    H = self.to("ha/ang^2")
    mass_array = np.repeat(
        [atom.mass.to("amu") for atom in self.atoms],
        repeats=3,
        axis=np.newaxis,
    )

    return np.array(
        H / np.sqrt(np.outer(mass_array, mass_array))
    )  # Ha Å^-2 amu^-1

This also means there are some extra unit conversions needed in getting frequencies

def _eigenvalues_to_freqs(self, lambdas) -> List[Frequency]:
    """
    Convert eigenvalues of the mass-weighted Hessian matrix (units of
    Ha Å^-2 amu^-1) to frequencies in wavenumber units. Will use
    ade.Config.freq_scale_factor to scale the frequencies.

    -----------------------------------------------------------------------
    Arguments:
        lambdas (np.ndarray):

    Returns:
        (list(autode.values.Frequency)):
    """
    # Cast to SI units (J m^-2 kg^-1)
    evs = np.array(lambdas) * (
        Constants.ha_to_J / (Constants.ang_to_m ** 2) / Constants.amu_to_kg
    )
    nus = np.sqrt(np.cdouble(evs)) / (
        2.0 * np.pi * Constants.c_in_cm
    )
    nus *= self._freq_scale_factor

    # Cast the purely complex eigenvalues to negative real numbers, as is
    # usual in quantum chemistry codes
    idx_to_alter = np.iscomplex(nus)
    nus[idx_to_alter] = -np.abs(nus[idx_to_alter])

    return [Frequency(np.real(nu), units=wavenumber) for nu in nus]

The second problem I noticed is in autode/transition_states/base.py in the function displaced_species_along_mode. It adds the normal mode vector to the species coordinate - however, unless I am missing something - this normal mode vector is in mass weighted coordinates not Cartesian - so ideally it should be un-mass weighted before adding.

I have been running tests for this on a branch on my fork here and most of the code above should bere there. From tests it seems that there is some small difference in the first decimal point compared to ORCA. I am not certain why.

Another possible way of projecting the translation and rotation could be using the method mentioned in this arxiv paper. The method seems simpler - but it needs the moment of inertia tensor etc. which I don't understand very well especially how it links to linear molecules.

t-young31 · 2025-01-19T13:44:07Z

Thanks for digging in @shoubhikraj 🙌🏼

I've just had a quick look and it looks like the svd and qr factorisation give almost identical results (to 3dp). Replacing https://github.com/t-young31/autodE/blob/4e6a7c16e621f9580284143e9296ac5d9e76bdf7/autode/hessians.py#L199-L219
for
https://github.com/shoubhikraj/autodE/blob/821bd75570487564e3dbc123a858f8c42602bcb5/autode/hessians.py#L199-L222

(and not updating any units – I guess normalisation squashes it) with

import autode as ade
method = ade.methods.ORCA()
m = ade.Molecule(smiles="O")
m.optimise(method=method)
m.calc_thermo(method=method)
print(m.frequencies)

I get

# SVD: [Frequency(1643.04102 cm^-1), Frequency(3865.29913 cm^-1), Frequency(3964.11681 cm^-1)]
# QR: [Frequency(0.0 cm^-1), Frequency(0.0 cm^-1), Frequency(0.0 cm^-1), Frequency(0.0 cm^-1), Frequency(0.0 cm^-1), Frequency(0.0 cm^-1),
#   Frequency(1643.03966 cm^-1), Frequency(3865.29956 cm^-1), Frequency(3964.11695 cm^-1)]

However, with SVD https://github.com/t-young31/autodE/blob/4e6a7c16e621f9580284143e9296ac5d9e76bdf7/tests/test_hessian.py#L921
doesn't pass, from what I can see. I have been cut-pasting rather than running off your branch though, so please correct me if I'm wrong!

The second problem I noticed is in autode/transition_states/base.py in the function displaced_species_along_mode. It adds the normal mode vector to the species coordinate - however, unless I am missing something - this normal mode vector is in mass weighted coordinates not Cartesian - so ideally it should be un-mass weighted before adding.

This sounds very possible. We'd need to test whether how un-mass weighted mode performed in QRC. I guess the units are wrong, but it works well in my experience so 🤷🏼

Do you understand how Gaussian and ORCA are doing a projection and not maintaining the eigenvalues in the PR description?

shoubhikraj · 2025-01-26T15:45:30Z

@t-young31 Ok, I did some more testing.

It seems like QR factorization is fine when the molecule is not linear (6) modes. The first k columns of the Q matrix of QR form the orthogonal basis for the first k columns of the original matrix (which in this case would be the translation and rotational vectors). The problem is when molecule is linear, the 6 translation and rotation vectors from hessian._tr_vecs() would be rank-deficient - i.e. one of the columns will be linearly dependent. In such cases QR factorization is not guaranteed to return the Q vectors in the correct order (i.e. the trans-rot basis space in the first 5 columns of Q and the null space in the other columns). This means when the first 5 columns are selected in normal_modes_proj it could remove a vibration and keep a rotation, depending on the coordinates of the linear molecule.

For SVD, the singular values in the s matrix are required to be in descending order. (And you also don't need to make the M matrix, only the 6 trans-rot vectors will suffice). I suspect that a pivoted QR (from scipy) on only the 6 trans-rot vectors will return the same result. The M matrix in the _proj_matrix function is probably unnecessary!

Another problem I noticed in these lines:
https://github.com/t-young31/autodE/blob/4e6a7c16e621f9580284143e9296ac5d9e76bdf7/autode/hessians.py#L212-L217

I believe that checking the norm of the TR vectors to be zero will only work if we use the eigenvectors of the moment of inertia matrix. This is because in that frame of reference with linear molecules - one of the rotation vectors will have a norm of zero. But here we are using the original coordinates, so if you apply a random rotation, the norm will no longer be zero (unless somehow one of the cartesian axes match with a rotation axis).

I am not sure why the test is failing 😕 . Which method did you use to calculate the hessian matrix? I have tried with ORCA and XTB but there is no small frequency in those cases. With the precalculated matrix, there is a small 17 cm-1 frequency which is causing the test to fail.

As for Gaussian and ORCA I am not certain. Eigenvectors are probably contaminated with some rotational and translational components, so removing them changes the eigenvalues a little. But they should not change that much!

shoubhikraj · 2025-01-26T15:46:14Z

@t-young31 Could you send me an xyz file of the distorted methane molecule so that I can do some more testing?

t-young31 added 2 commits November 24, 2024 15:57

fix projection

640e135

self review

4e6a7c1

t-young31 requested a review from shoubhikraj November 29, 2024 16:25

shoubhikraj reviewed Nov 29, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix projected frequencies #368

fix projected frequencies #368

t-young31 commented Nov 29, 2024 •

edited

Loading

codecov bot commented Nov 29, 2024 •

edited

Loading

shoubhikraj left a comment

shoubhikraj Nov 29, 2024

t-young31 Dec 7, 2024

shoubhikraj Dec 18, 2024

t-young31 Dec 18, 2024 •

edited

Loading

t-young31 Jan 19, 2025

shoubhikraj commented Jan 3, 2025 •

edited

Loading

t-young31 commented Jan 19, 2025

shoubhikraj commented Jan 26, 2025

shoubhikraj commented Jan 26, 2025

	t4 += np.cross(e_x, atom.coord - com).tolist()
	t5 += np.cross(e_y, atom.coord - com).tolist()
	t6 += np.cross(e_z, atom.coord - com).tolist()

fix projected frequencies #368

Are you sure you want to change the base?

fix projected frequencies #368

Conversation

t-young31 commented Nov 29, 2024 • edited Loading

Resolves #362

Checklist

codecov bot commented Nov 29, 2024 • edited Loading

Codecov Report

shoubhikraj left a comment

Choose a reason for hiding this comment

shoubhikraj Nov 29, 2024

Choose a reason for hiding this comment

t-young31 Dec 7, 2024

Choose a reason for hiding this comment

shoubhikraj Dec 18, 2024

Choose a reason for hiding this comment

t-young31 Dec 18, 2024 • edited Loading

Choose a reason for hiding this comment

t-young31 Jan 19, 2025

Choose a reason for hiding this comment

shoubhikraj commented Jan 3, 2025 • edited Loading

t-young31 commented Jan 19, 2025

shoubhikraj commented Jan 26, 2025

shoubhikraj commented Jan 26, 2025

t-young31 commented Nov 29, 2024 •

edited

Loading

codecov bot commented Nov 29, 2024 •

edited

Loading

t-young31 Dec 18, 2024 •

edited

Loading

shoubhikraj commented Jan 3, 2025 •

edited

Loading