espei.parameter_selection package¶
Submodules¶
espei.parameter_selection.model_building module¶
Building candidate models
- espei.parameter_selection.model_building.build_candidate_models(configuration, features)¶
Return a dictionary of features and candidate models
- Parameters
configuration (tuple) – Configuration tuple, e.g. (('A', 'B', 'C'), 'A')
features (dict) – Dictionary of {str: list} of generic features for a model, not considering the configuration. For example: {'CPM_FORM': [sympy.S.One, v.T, v.T**2, v.T**3]}
- Returns
Dictionary of {feature: [candidate_models]}
- Return type
dict
Notes
Currently only works for binary and ternary interactions.
Candidate models match the following spec:
1. Candidates with multiple features specified will have
2. orders of parameters (L0, L0 and L1, …) have the same number of temperatures
Note that high-order parameters with multiple temperatures are not required to contain all the temperatures of the low-order parameters. For example, the following parameters can be generated:
L0: A
L1: A + BT
- espei.parameter_selection.model_building.build_feature_sets(temperature_features, interaction_features)¶
Return a list of broadcasted features
- Parameters
temperature_features (list) – List of temperature features that will become a successive_list, such as [T*log(T), T**(-1), T**2]
interaction_features (list) – List of interaction features that will become a successive_list, such as [YS, YS*Z, YS*Z**2]
- Returns
- Return type
list
Notes
This takes two sets of features, e.g. [T*log(T), T**(-1), T**2] and [YS, YS*Z, YS*Z**2], and generates a list of feature sets where the temperatures and interactions are broadcast successively.
Generates candidate feature sets like:
L0: A + BT, L1: A
L0: A, L1: A + BT
but not lists that are not successive:
L0: A + BT, L1: Nothing, L2: A
L0: Nothing, L1: A + BT
There’s still some debate whether it makes sense from an information theory perspective to add an L1 B term without an L0 B term. However, this might be more representative of how people usually model thermodynamics.
Does not distribute multiplication/sums or make assumptions about the elements of the feature lists. They can be strings, ints, objects, tuples, etc.
The number of features (related to the complexity) is a geometric series. For \(N\) temperature features and \(M\) interaction features, the total number of feature sets should be \(N(1-N^M)/(1-N)\). If \(N=1\), then there are \(M\) total feature sets.
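The count can be checked by brute force. The sketch below assumes, as the notes above describe, that each successive interaction order k (from 1 to M) lets every one of its k parameters take any of the N successive temperature lists. The helpers `successive_prefixes`, `count_feature_sets` and `closed_form` are hypothetical names used for illustration, not part of ESPEI.

```python
from itertools import product

def successive_prefixes(xs):
    """Return [[x0], [x0, x1], ...] (illustrative stand-in for a successive list)."""
    return [xs[:i + 1] for i in range(len(xs))]

def count_feature_sets(n_temps, n_inters):
    """Count feature sets by enumeration (hypothetical helper, not ESPEI API)."""
    temps = successive_prefixes(list(range(n_temps)))
    total = 0
    for order in range(1, n_inters + 1):
        # Each of the `order` parameters picks one successive temperature list.
        total += sum(1 for _ in product(temps, repeat=order))
    return total

def closed_form(n, m):
    """Geometric-series total N(1 - N^M)/(1 - N), with the N = 1 special case."""
    return m if n == 1 else n * (1 - n**m) // (1 - n)

for n, m in [(1, 4), (2, 2), (3, 3)]:
    print(n, m, count_feature_sets(n, m), closed_form(n, m))
```

The enumerated count and the closed form agree for each pair, including the \(N=1\) case where the series formula itself is undefined.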
- espei.parameter_selection.model_building.make_successive(xs)¶
Return a list of successive combinations
- Parameters
xs (list) – List of elements, e.g. [X, Y, Z]
- Returns
List of combinations where each combination includes all the preceding elements
- Return type
list
Examples
>>> make_successive(['W', 'X', 'Y', 'Z'])
[['W'], ['W', 'X'], ['W', 'X', 'Y'], ['W', 'X', 'Y', 'Z']]
espei.parameter_selection.redlich_kister module¶
Tools for constructing Redlich-Kister polynomials used in parameter selection.
- espei.parameter_selection.redlich_kister.calc_interaction_product(site_fractions)¶
Calculate the interaction product for sublattice site fractions
Callers should take care that the site fractions correspond to constituents in sorted order, since there’s an order-dependent subtraction.
- Parameters
site_fractions (List[List[float]]) – List of site fractions for each sublattice. The list should be a ragged 2D list of shape (sublattices, site fractions).
- Returns
A scalar for binary interactions and a list of 3 floats for ternary interactions
- Return type
Union[float, List[float]]
Examples
>>> # interaction product for an (A) site_fractions
>>> calc_interaction_product([[1.0]])
1.0
>>> # interaction product for [(A,B), (A,B)(A)] site fractions that are equal
>>> calc_interaction_product([[0.5, 0.5]])
0.0
>>> calc_interaction_product([[0.5, 0.5], 1])
0.0
>>> # interaction product for an [(A,B)] site_fractions
>>> calc_interaction_product([[0.1, 0.9]])
-0.8
>>> # interaction product for an [(A,B)(A,B)] site_fractions
>>> calc_interaction_product([[0.2, 0.8], [0.4, 0.6]])
0.12
>>> # ternary case, (A,B,C) interaction
>>> calc_interaction_product([[0.333, 0.333, 0.334]])
[0.333, 0.333, 0.334]
>>> # ternary 2SL case, (A,B,C)(A) interaction
>>> calc_interaction_product([[0.333, 0.333, 0.334], 1.0])
[0.333, 0.333, 0.334]
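The order dependence mentioned above can be illustrated with a minimal sketch that reproduces only the binary examples. It assumes the binary product is the difference of the two site fractions on each two-constituent sublattice, multiplied across sublattices. `binary_interaction_product` is a hypothetical name, not ESPEI's implementation, and ternary interactions are deliberately left out.

```python
def binary_interaction_product(site_fractions):
    """Product of (y_first - y_second) over two-constituent sublattices.

    Hypothetical sketch covering only the binary examples above; ternary
    interactions are not handled.
    """
    prod = 1.0
    for subl in site_fractions:
        # A bare number stands for a sublattice with a single constituent.
        fracs = [subl] if isinstance(subl, (int, float)) else list(subl)
        if len(fracs) == 2:
            # Order matters: swapping the constituents flips the sign.
            prod *= fracs[0] - fracs[1]
        elif len(fracs) > 2:
            raise NotImplementedError("sketch handles binary interactions only")
    return prod

print(binary_interaction_product([[0.1, 0.9]]))
print(binary_interaction_product([[0.9, 0.1]]))
```

The two calls differ only in constituent order and return values of opposite sign, which is why the site fractions must follow the sorted constituent order.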
espei.parameter_selection.selection module¶
Fit, score and select models
- espei.parameter_selection.selection.fit_model(feature_matrix, data_quantities, ridge_alpha, weights=None)¶
Return model coefficients fit by scikit-learn’s LinearRegression
- Parameters
feature_matrix (ArrayLike) – (\(M \times N\)) regressor matrix. The transformed model inputs (y_i, T, P, etc.)
data_quantities (ArrayLike) – Size (\(M\)) response vector. Target values of the output (e.g. HM_MIX) to reproduce.
ridge_alpha (float) – Value of the \(\alpha\) hyperparameter used in ridge regression. Defaults to 1.0e-100, which should be degenerate with ordinary least squares regression. For now, the parameter is applied to all features.
- Returns
List of model coefficients of size (\(N\))
- Return type
list
Notes
Solve \(Ax = b\), where \(x\) are the desired model coefficients, \(A\) is the feature_matrix, and \(b\) corresponds to data_quantities.
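As an illustration of the linear problem, the sketch below solves a small ridge-regularized system in closed form with NumPy. It is not how fit_model is implemented: NumPy is swapped in for scikit-learn, the data are made up, and `ridge_solve` is a hypothetical helper. With a very small \(\alpha\), the result should be indistinguishable from ordinary least squares.

```python
import numpy as np

# Hypothetical data: 4 samples (M = 4), 2 features (N = 2).
A = np.array([[1.0, 300.0],
              [1.0, 400.0],
              [1.0, 500.0],
              [1.0, 600.0]])
true_x = np.array([-1000.0, 2.5])
b = A @ true_x  # noise-free response vector

def ridge_solve(A, b, alpha):
    """Closed-form ridge solution x = (A^T A + alpha*I)^-1 A^T b."""
    n_features = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n_features), A.T @ b)

x = ridge_solve(A, b, alpha=1.0e-100)
print(x)
```

Because the response vector was generated without noise, the recovered coefficients should match `true_x` to within floating-point error.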
- espei.parameter_selection.selection.score_model(feature_matrix, data_quantities, model_coefficients, feature_list, weights, aicc_factor=None, rss_numerical_limit=1e-16)¶
Use the modified AICc to score a model that has been fit.
The modified AICc is given by
\[\mathrm{mAICc} = n \ln \frac{\mathrm{RSS}}{n} + 2pk + \frac{2p^2k^2 + 2pk}{n - pk - 1}\]
- Parameters
feature_matrix (ArrayLike) – (\(M \times N\)) regressor matrix. The transformed model inputs (y_i, T, P, etc.)
data_quantities (ArrayLike) – Size (\(M\)) response vector. Target values of the output (e.g. HM_MIX) to reproduce.
model_coefficients (list) – Size (\(N\)) list of fitted model coefficients to be scored.
feature_list (list) – Polynomial coefficients corresponding to each column of feature_matrix. Has shape (N,). Purely a logging aid.
aicc_factor (float) – Multiplication factor for the AICc’s parameter penalty.
rss_numerical_limit (float) – Anything with an absolute value smaller than this is set to zero.
- Returns
A model score
- Return type
float
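The mAICc formula can be evaluated directly once the residual sum of squares is known. The sketch below assumes symbol meanings that the text above does not state: \(n\) is the number of samples, \(k\) the number of fitted parameters, and \(p\) the aicc_factor. `modified_aicc` is a hypothetical helper; it does not handle a zero RSS or the case \(n \le pk + 1\).

```python
import math

def modified_aicc(rss, n, k, p=1.0):
    """Evaluate the mAICc formula for a residual sum of squares.

    Assumed symbol meanings (not stated in the source text): n = number of
    samples, k = number of fitted parameters, p = aicc_factor.
    """
    pk = p * k
    # 2*p^2*k^2 is written as 2*(pk)^2.
    return n * math.log(rss / n) + 2 * pk + (2 * pk**2 + 2 * pk) / (n - pk - 1)

# Same residual and sample count, different numbers of parameters.
score_two_params = modified_aicc(rss=2.0, n=10, k=2)
score_three_params = modified_aicc(rss=2.0, n=10, k=3)
print(score_two_params, score_three_params)
```

For a fixed RSS, the model with more parameters receives the larger value, since only the penalty terms change. Because \(p\) and \(k\) enter only as the product \(pk\), doubling the factor has the same effect on the penalty as doubling the parameter count.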
- espei.parameter_selection.selection.select_model(candidate_models, ridge_alpha, weights, aicc_factor=None)¶
Select a model from a series of candidates by fitting and scoring them
- Parameters
candidate_models (list) – List of tuples of (features, feature_matrix, data_quantities)
ridge_alpha (float) – Value of the \(\alpha\) hyperparameter used in ridge regression. Defaults to 1.0e-100, which should be degenerate with ordinary least squares regression. For now, the parameter is applied to all features.
aicc_factor (float) – Multiplication factor for the AICc’s parameter penalty.
- Returns
Tuple of (feature_list, model_coefficients) for the highest scoring model
- Return type
tuple
espei.parameter_selection.utils module¶
Tools used across parameter selection modules
- espei.parameter_selection.utils.get_data_quantities(desired_property, fixed_model, fixed_portions, data, sample_condition_dicts)¶
- Parameters
desired_property (str) – String property corresponding to the features that could be fit, e.g. HM, SM_FORM, CPM_MIX
fixed_model (pycalphad.Model) – Model with all lower order (in composition) terms already fit. Pure element reference state (GHSER functions) should be set to zero.
fixed_portions (List[sympy.Expr]) – SymPy expressions for model parameters and interaction products for higher order (in T) terms for this property, e.g. [0, 3.0*YS*v.T]. In [qty]/mole-formula.
data (List[Dict[str, Any]]) – ESPEI single phase datasets for this property.
- Returns
Ravelled data quantities in [qty]/mole-formula
- Return type
np.ndarray
Notes
pycalphad Model parameters (and therefore fixed_portions) are stored as per mole-formula quantities, but the calculated properties and our data are all in [qty]/mole-atoms. We multiply by mole-atoms/mole-formula to convert the units to [qty]/mole-formula.
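The unit conversion is a single multiplication. The sketch below uses a made-up two-sublattice phase and assumes there are no vacancies, so the moles of atoms per mole of formula units equal the sum of the site ratios. `to_per_mole_formula` and all values are hypothetical.

```python
def to_per_mole_formula(qty_per_mole_atoms, mole_atoms_per_mole_formula_unit):
    """Convert [qty]/mole-atoms to [qty]/mole-formula (illustrative helper)."""
    return qty_per_mole_atoms * mole_atoms_per_mole_formula_unit

# Hypothetical phase with site ratios 1 and 3 and no vacancies:
# 4 moles of atoms per mole of formula units.
site_ratios = [1.0, 3.0]
atoms_per_formula = sum(site_ratios)

hm_per_mole_atoms = -5000.0  # made-up value in J/mole-atoms
hm_per_mole_formula = to_per_mole_formula(hm_per_mole_atoms, atoms_per_formula)
print(hm_per_mole_formula)
```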
- espei.parameter_selection.utils.shift_reference_state(desired_data, feature_transform, fixed_model, mole_atoms_per_mole_formula_unit)¶
Shift _MIX or _FORM data to a common reference state in per mole-atom units.
- Parameters
desired_data (List[Dict[str, Any]]) – ESPEI single phase dataset
feature_transform (Callable) – Function to transform an AST for the GM property to the property of interest, e.g. entropy would be lambda GM: -sympy.diff(GM, v.T)
fixed_model (pycalphad.Model) – Model with all lower order (in composition) terms already fit. Pure element reference state (GHSER functions) should be set to zero.
mole_atoms_per_mole_formula_unit (float) – Number of moles of atoms in every mole of formula units.
- Returns
Data for this feature in [qty]/mole-formula in a common reference state.
- Return type
np.ndarray
- Raises
ValueError –
Notes
pycalphad Model parameters are stored as per mole-formula quantities, but the calculated properties and our data are all in [qty]/mole-atoms. We multiply by mole-atoms/mole-formula to convert the units to [qty]/mole-formula.