SALSA – (S)ubstitution, (A)pproximation, Evo(L)utionary (S)earch, and (A)b-Initio Calculations

8 Oct 2024

(1) Sean M. Stafford, Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, 48824, USA;

(2) Alexander Aduenko, Moscow Institute of Physics and Technology, Moscow, Russia;

(3) Marcus Djokic, Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, 48824, USA;

(4) Yu-Hsiu Lin, Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, 48824, USA;

(5) Jose L. Mendoza-Cortes, Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, 48824, USA (Email: [email protected]).

Abstract and Introduction

SALSA – (S)ubstitution, (A)pproximation, Evo(L)utionary (S)earch, and (A)b-Initio Calculations

SALSA Applied to Photocatalytic Water-splitting

Discussion

Methods

Conclusions, Data Availability Statement and References

Appendix: Supplementary Material

II. SALSA – (S)UBSTITUTION, (A)PPROXIMATION, EVO(L)UTIONARY (S)EARCH, AND (A)B-INITIO CALCULATIONS

We developed a highly efficient and versatile materials discovery process, dubbed SALSA, an acronym for Substitution, Approximation, evoLutionary Search, and Ab-initio calculations. An overview of SALSA is provided in Figure 1. The process takes a target property or set of properties as input and returns a set of candidate structures as output. Instead of relying on brute-force approaches, SALSA draws on a large database of compounds with known structures and properties to rapidly search for new materials. It begins by swapping ionic components between pairs of known compounds that have similar ionic species, as guided by a substitution likelihood matrix, to produce a dataset of hybrid compounds with defined compositions but undefined structures. We then infer approximate properties for these hybrid compounds using a weighted sum of the properties of their parent compounds and discard hybrids whose estimated properties are undesirable. Promising hybrids are then subjected to an evolutionary structure search using the USPEX algorithm, which generates stable crystal structures for a given composition whenever possible. High-fidelity DFT calculations are then used to recalculate the properties of the generated structures, and structures with undesirable properties are discarded. The process produces a set of undiscovered materials that are promising candidates for various applications, including the application to artificial photosynthesis discussed in Section III. SALSA can also be applied to other materials science problems.
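The ordering of the four stages matters because each one is more expensive than the last, so cheap filters run first. The sketch below only illustrates that funnel shape. The function `salsa_funnel` and its four callable parameters are hypothetical placeholders introduced here for illustration; they are not the interfaces of USPEX, CRYSTAL17, or any code used in this work.

```python
from typing import Callable, Iterable, List, Optional, Tuple

ParentPair = Tuple[str, str]


def salsa_funnel(
    parent_pairs: Iterable[ParentPair],
    substitute: Callable[[ParentPair], Iterable[str]],
    approximate_ok: Callable[[str, ParentPair], bool],
    find_structure: Callable[[str], Optional[str]],
    ab_initio_ok: Callable[[str], bool],
) -> List[str]:
    """Run candidates through the four stages, cheapest filter first.

    All four callables are hypothetical stand-ins supplied by the caller.
    """
    survivors: List[str] = []
    for pair in parent_pairs:
        # Stage 1: substitution yields hybrid compositions (no structure yet).
        for hybrid in substitute(pair):
            # Stage 2: cheap interpolated estimate; discard unpromising hybrids.
            if not approximate_ok(hybrid, pair):
                continue
            # Stage 3: structure search; None means no structure converged.
            structure = find_structure(hybrid)
            if structure is None:
                continue
            # Stage 4: expensive recalculation decides the final set.
            if ab_initio_ok(structure):
                survivors.append(structure)
    return survivors
```

Passing the stages in as callables keeps the sketch independent of any particular database or simulation code.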

a. Substitution by Chemical Similarity. Our group reconstructed and expanded the scope of the substitution likelihood matrix introduced by Hautier et al.19 In our construction, we used the entirety of the Inorganic Crystal Structure Database (ICSD)23 and did not restrict substitutions to those that preserve the space group of the crystal structure. (Stafford et al., 2023b, in prep, will describe the details of this construction.) High values in our matrix correspond to pairs of ionic species that are empirically observed to exist in similar chemical environments. A value above a chosen threshold designates substitution between that ion pair as likely. Applying these likely substitutions to the compounds of our initial dataset forms a hypothetical set of new candidate compounds. The resulting candidate dataset is too large for us to feasibly calculate the properties of all compounds unless we are overly restrictive with unit cell size or substitution threshold. Therefore, we narrow the scope of our investigation to a subset for which we can efficiently approximate properties.
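The thresholding step can be pictured with a minimal sketch. It assumes the likelihood matrix is stored as a mapping from unordered ion pairs to scores and that a compound is a tuple of ion labels. The names `TOY_LIKELIHOOD`, `likely_pairs`, and `substitute`, and all numerical values, are invented for illustration and are not taken from the matrix constructed in this work. The sketch replaces one site at a time and does not check charge neutrality.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Compound = Tuple[str, ...]

# Invented scores for illustration only; not values from the real matrix.
TOY_LIKELIHOOD: Dict[FrozenSet[str], float] = {
    frozenset({"Ti4+", "Zr4+"}): 2.1,
    frozenset({"O2-", "S2-"}): 1.4,
    frozenset({"Ti4+", "Na+"}): 0.2,
}


def likely_pairs(
    likelihood: Dict[FrozenSet[str], float], threshold: float
) -> Set[FrozenSet[str]]:
    """Keep only ion pairs whose score exceeds the chosen threshold."""
    return {pair for pair, score in likelihood.items() if score > threshold}


def substitute(compound: Compound, pairs: Set[FrozenSet[str]]) -> List[Compound]:
    """Generate candidates by swapping one ion at a time for a likely partner."""
    candidates: Set[Compound] = set()
    for i, ion in enumerate(compound):
        for pair in pairs:
            if ion in pair and len(pair) == 2:
                (partner,) = pair - {ion}
                candidates.add(compound[:i] + (partner,) + compound[i + 1:])
    return sorted(candidates)


candidates = substitute(("Ti4+", "O2-", "O2-"), likely_pairs(TOY_LIKELIHOOD, 1.0))
for candidate in candidates:
    print(candidate)
```

Lowering the threshold admits more ion pairs and enlarges the candidate set, which is the trade-off against computational cost described above.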

b. Approximation by Linear Interpolation. We examine the class of candidate compounds that are compositional interpolations between two initial compounds, i.e., hybrid compounds. We estimate the properties of each hybrid as a weighted sum of the properties of its parent compounds, with weights equal to the ratio in which the parents appear in the hybrid composition. Next, we define the boundary of a target region of property space appropriate for our application. Finally, we eliminate hybrids that do not lie within this region. This step filters out the sizeable portion of our candidate compounds that are far removed from the target region before we proceed to intensive calculations. While this is an extremely simplistic model of property space, it is a computationally cheap way to approximate values closely enough to eliminate most of the unsuitable candidates without a high risk of eliminating suitable ones. We reduce this risk further by extending the boundary of our target region beyond the ideal region of property space, by enough to tolerate the error that comes with our interpolation method. See Figure 2 for a summary of this scheme.
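As a minimal sketch of this filter, assume a hybrid contains a fraction x of parent A and 1 - x of parent B, so that each estimated property is x P_A + (1 - x) P_B, and assume the target region is an axis-aligned box widened by a tolerance. The function names, the property key `band_gap_eV`, and the numbers below are hypothetical choices made for illustration.

```python
from typing import Dict, Tuple

Properties = Dict[str, float]
Region = Dict[str, Tuple[float, float]]  # property name -> (low, high)


def interpolate(prop_a: Properties, prop_b: Properties, fraction_a: float) -> Properties:
    """Estimate hybrid properties as a composition-weighted sum of the parents."""
    return {
        key: fraction_a * prop_a[key] + (1.0 - fraction_a) * prop_b[key]
        for key in prop_a
    }


def in_region(props: Properties, region: Region, tolerance: float = 0.0) -> bool:
    """Check every property against its bounds, widened by the tolerance."""
    return all(
        low - tolerance <= props[key] <= high + tolerance
        for key, (low, high) in region.items()
    )


# Illustrative numbers only.
parent_a = {"band_gap_eV": 3.0}
parent_b = {"band_gap_eV": 1.0}
target = {"band_gap_eV": (1.6, 2.5)}

estimate = interpolate(parent_a, parent_b, fraction_a=0.25)
print(estimate, in_region(estimate, target), in_region(estimate, target, tolerance=0.2))
```

In this toy case the estimate falls just below the ideal region and is retained only when the tolerance is applied, which is the situation the widened boundary is meant to cover.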

c. Evolutionary Search of Structure Space. Until this point, we have defined our hybrid compounds by their composition alone, but reliable property calculations require structural information. Crystal structure prediction from first principles is prohibitively difficult using just composition. Instead, we turn to an evolutionary structure search code, USPEX, to generate crystal structures for our hybrids. We provide USPEX with a hybrid composition and enable all available stochastic variation operations, which include variation of the space group. If USPEX is unable to converge a structure for a given composition, we take that as an indication that the composition is unlikely to have a thermodynamically stable structure and eliminate it from further consideration. See Section V E for a more detailed look at our USPEX methodology.
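For readers unfamiliar with evolutionary searches, the toy loop below shows the general pattern of selection, variation, and a convergence test that can fail. It is a generic sketch and not USPEX's algorithm or interface: a candidate is just a list of floats, `energy` is a caller-supplied placeholder for a fitness evaluation, and every parameter name and default value is an assumption made for illustration.

```python
import random
from typing import Callable, List, Optional, Tuple

Candidate = List[float]


def evolutionary_search(
    energy: Callable[[Candidate], float],
    dim: int,
    generations: int = 60,
    pop_size: int = 20,
    sigma: float = 0.2,
    patience: int = 10,
    tol: float = 1e-3,
    seed: int = 0,
) -> Tuple[Optional[Candidate], List[float]]:
    """Return (best candidate or None if not converged, best energy per generation)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        # Selection: rank by energy, lowest (most stable) first.
        population.sort(key=energy)
        history.append(energy(population[0]))
        # Convergence: best energy improved by less than tol over `patience` generations.
        if len(history) > patience and history[-patience - 1] - history[-1] < tol:
            return population[0], history
        # Keep the better half unchanged, refill with randomly mutated copies.
        survivors = population[: pop_size // 2]
        children = [
            [x + rng.gauss(0.0, sigma) for x in rng.choice(survivors)]
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    # Generation budget exhausted without convergence: caller discards this case.
    return None, history


best, history = evolutionary_search(lambda v: sum(x * x for x in v), dim=3)
print("converged" if best is not None else "not converged", len(history))
```

Returning `None` when the budget runs out mirrors the elimination rule described above, where compositions without a converged structure are dropped.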

d. Ab-initio Property Calculations. Our candidate set is now vastly narrowed down and contains structural information, so high-fidelity property calculations are computationally feasible. Therefore, we perform geometry optimization and property calculations with another DFT code, CRYSTAL17, at the hybrid functional level of theory.24,25 Some candidate compounds that lay within the target region according to their interpolation-inferred values shift outside the region once those values are replaced by CRYSTAL17-calculated ones, while others do not converge with CRYSTAL17 at all. We discard both groups and are left with the final products of SALSA: the structures that CRYSTAL17 converges and determines to have properties in the target region.
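The three possible outcomes of this last stage can be made explicit with a small bookkeeping sketch. It assumes each calculation is summarized by a convergence flag and a dictionary of computed properties. The record type `DftResult`, the labels returned by `classify`, and the example values are hypothetical; they do not reflect CRYSTAL17 output formats or results from this work.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Region = Dict[str, Tuple[float, float]]  # property name -> (low, high)


@dataclass
class DftResult:
    """Hypothetical summary of one high-fidelity calculation."""
    name: str
    converged: bool
    properties: Optional[Dict[str, float]] = None


def classify(result: DftResult, region: Region) -> str:
    """Label a candidate as 'not_converged', 'shifted_out', or 'kept'."""
    if not result.converged or result.properties is None:
        return "not_converged"
    inside = all(
        low <= result.properties[key] <= high
        for key, (low, high) in region.items()
    )
    return "kept" if inside else "shifted_out"


# Illustrative records only.
region = {"band_gap_eV": (1.6, 2.5)}
results = [
    DftResult("hybrid-1", True, {"band_gap_eV": 2.1}),
    DftResult("hybrid-2", True, {"band_gap_eV": 3.4}),
    DftResult("hybrid-3", False),
]
final = [r.name for r in results if classify(r, region) == "kept"]
print(final)
```

Keeping the two rejection reasons separate makes it possible to see afterwards how often the interpolated estimate was misleading versus how often the calculation itself failed.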

This paper is available on arXiv under a CC 4.0 license.