espei.parameter_selection package

Submodules

espei.parameter_selection.model_building module

Building candidate models

espei.parameter_selection.model_building.build_candidate_models(configuration, features)

Return a dictionary of features and candidate models

Parameters:
  • configuration (tuple) – Configuration tuple, e.g. (('A', 'B', 'C'), 'A')
  • features (dict) – Dictionary of {str: list} of generic features for a model, not considering the configuration. For example: {'CPM_FORM': [sympy.S.One, v.T, v.T**2, v.T**3]}
Returns:

Dictionary of {feature: [candidate_models]}

Return type:

dict

Notes

Currently only works for binary and ternary interactions.

Candidate models match the following spec:

  1. Candidates with multiple features specified will have
  2. orders of parameters (L0, L0 and L1, …) have the same number of temperatures

Note that high orders of parameters with multiple temperatures are not required to contain all the temperatures of the low order parameters. For example, the following parameters can be generated:

  L0: A
  L1: A + BT

espei.parameter_selection.model_building.build_feature_sets(temperature_features, interaction_features)

Return a list of broadcasted features

Parameters:
  • temperature_features (list) – List of temperature features that will become a successive_list, such as [TlogT, T-1, T2]
  • interaction_features (list) – List of interaction features that will become a successive_list, such as [YS, YS*Z, YS*Z**2]
Returns:

Return type:

list

Notes

This allows two sets of features, e.g. [TlogT, T-1, T2] and [YS, YS*Z, YS*Z**2] and generates a list of feature sets where the temperatures and interactions are broadcasted successively.

Generates candidate feature sets like:

  L0: A + BT, L1: A
  L0: A, L1: A + BT

but not lists that are not successive:

  L0: A + BT, L1: Nothing, L2: A
  L0: Nothing, L1: A + BT

There’s still some debate about whether it makes sense, from an information theory perspective, to add an L1 B term without an L0 B term. However, this might be more representative of how people usually model thermodynamics.

Does not distribute multiplication/sums or make assumptions about the elements of the feature lists. They can be strings, ints, objects, tuples, etc.

The number of features (related to the complexity) is a geometric series. For $N$ temperature features and $M$ interaction features, the total number of feature sets should be $N(1-N^{M})/(1-N)$. If $N=1$, then there are $M$ total feature sets.
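The count can be checked by brute force. The following is a minimal sketch, assuming that each active interaction order independently receives one successive list of temperature features; `successive`, `enumerate_feature_sets` and `closed_form` are hypothetical helpers written for illustration and are not ESPEI's implementation.

```python
import itertools

def successive(xs):
    """Return [[x0], [x0, x1], ...] for a list xs."""
    return [xs[:i + 1] for i in range(len(xs))]

def enumerate_feature_sets(temperature_features, interaction_features):
    """Enumerate successive feature sets by brute force (illustrative only)."""
    temps = successive(temperature_features)
    feature_sets = []
    # k interaction orders are active: L0 .. L(k-1), with no gaps
    for k in range(1, len(interaction_features) + 1):
        # each active order independently gets one successive temperature list
        feature_sets.extend(itertools.product(temps, repeat=k))
    return feature_sets

def closed_form(n, m):
    """Geometric series sum_{k=1}^{m} n**k."""
    return m if n == 1 else n * (1 - n**m) // (1 - n)

sets = enumerate_feature_sets(["TlogT", "T-1", "T2"], ["YS", "YS*Z"])
print(len(sets), closed_form(3, 2))
```

With three temperature features and two interaction features, both the enumeration and the closed form give 3 + 9 = 12 feature sets.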

espei.parameter_selection.model_building.generate_interactions(endmembers, order, symmetry)

Returns a list of sorted interactions of a certain order

Parameters:
  • endmembers (list) – List of tuples/strings of all endmembers (including symmetrically equivalent)
  • order (int) – Highest expected interaction order, e.g. ternary interactions should be 3
  • symmetry (list of lists) – List of lists containing symmetrically equivalent sublattice indices, e.g. [[0, 1], [2, 3]] means that sublattices 0 and 1 are equivalent and sublattices 2 and 3 are also equivalent.
Returns:

List of interaction tuples, e.g. [('A', ('A', 'B'))]

Return type:

list

espei.parameter_selection.model_building.generate_symmetric_group(configuration, symmetry)

For a particular configuration and list of sublattices with symmetry, generate all the symmetrically equivalent configurations.

Parameters:
  • configuration (tuple) – Tuple of a sublattice configuration.
  • symmetry (list of lists) – List of lists containing symmetrically equivalent sublattice indices, e.g. [[0, 1], [2, 3]] means that sublattices 0 and 1 are equivalent and sublattices 2 and 3 are also equivalent.
Returns:

Tuple of configuration tuples that are all symmetrically equivalent.

Return type:

tuple
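
The idea of a symmetric group can be illustrated by permuting the entries of a configuration among the sublattice indices that are declared equivalent. This is a minimal sketch under that assumption; `symmetric_group_sketch` is a hypothetical name, and the ordering of its output may differ from what ESPEI returns.

```python
import itertools

def symmetric_group_sketch(configuration, symmetry):
    """Permute entries among symmetrically equivalent sublattices (illustrative only)."""
    configurations = [tuple(configuration)]
    for equivalent in symmetry:
        expanded = []
        for config in configurations:
            for perm in itertools.permutations(equivalent):
                new_config = list(config)
                # move the entry on sublattice `src` to sublattice `dst`
                for src, dst in zip(equivalent, perm):
                    new_config[dst] = config[src]
                candidate = tuple(new_config)
                # keep each distinct configuration once, in first-seen order
                if candidate not in expanded:
                    expanded.append(candidate)
        configurations = expanded
    return tuple(configurations)

print(symmetric_group_sketch(("A", "B", "C"), [[0, 1]]))
print(symmetric_group_sketch((("A", "B"), "A", "C"), [[0, 1]]))
```

Configurations whose equivalent sublattices already hold identical entries, such as ("A", "A", "C") with symmetry [[0, 1]], yield a single member.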

espei.parameter_selection.model_building.make_successive(xs)

Return a list of successive combinations

Parameters: xs (list) – List of elements, e.g. [X, Y, Z]
Returns: List of combinations where each combination includes all the preceding elements
Return type: list

Examples

>>> make_successive(['W', 'X', 'Y', 'Z'])
[['W'], ['W', 'X'], ['W', 'X', 'Y'], ['W', 'X', 'Y', 'Z']]

espei.parameter_selection.model_building.sorted_interactions(interactions, max_interaction_order, symmetry)

Return interactions sorted by interaction order

Parameters:
  • interactions (list) – List of tuples/strings of potential interactions
  • max_interaction_order (int) – Highest expected interaction order, e.g. ternary interactions should be 3
  • symmetry (list of lists) – List of lists containing symmetrically equivalent sublattice indices, e.g. [[0, 1], [2, 3]] means that sublattices 0 and 1 are equivalent and sublattices 2 and 3 are also equivalent.
Returns:

Sorted list of interactions

Return type:

list

Notes

Sort by number of full interactions, e.g. (A:A,B) is before (A,B:A,B). The goal is to return a sort key that can sort through multiple interaction orders, e.g. (A:A,B,C), which should be before (A,B:A,B,C), which should be before (A,B,C:A,B,C).
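
One sort key that reproduces the orderings in these examples is sketched here. It is an assumption made for illustration, not necessarily the key ESPEI uses, and it ignores the symmetry argument; `interaction_sort_key` is a hypothetical name.

```python
def interaction_sort_key(interaction):
    """Sort by highest interaction order, then by the sorted sublattice occupations."""
    # a plain string is a single species; a tuple is an interacting sublattice
    sizes = [len(subl) if isinstance(subl, tuple) else 1 for subl in interaction]
    return (max(sizes), tuple(sorted(sizes)))

interactions = [
    (("A", "B", "C"), ("A", "B", "C")),  # (A,B,C:A,B,C)
    (("A", "B"), ("A", "B")),            # (A,B:A,B)
    ("A", ("A", "B", "C")),              # (A:A,B,C)
    (("A", "B"), ("A", "B", "C")),       # (A,B:A,B,C)
    ("A", ("A", "B")),                   # (A:A,B)
]
ordered = sorted(interactions, key=interaction_sort_key)
for interaction in ordered:
    print(interaction)
```

Under this key, binary interactions sort before ternary ones, and within one order the configurations with fewer fully interacting sublattices come first.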

espei.parameter_selection.selection module

Fit, score and select models

espei.parameter_selection.selection.fit_model(feature_matrix, data_quantities, ridge_alpha)

Return model coefficients fit by scikit-learn’s LinearRegression

Parameters:
  • feature_matrix (ndarray) – (M*N) regressor matrix. The transformed model inputs (y_i, T, P, etc.)
  • data_quantities (ndarray) – (M,) response vector. Target values of the output (e.g. HM_MIX) to reproduce.
  • ridge_alpha (float) – Value of the $\alpha$ hyperparameter used in ridge regression. Defaults to 1.0e-100, which should be degenerate with ordinary least squares regression. For now, the parameter is applied to all features.
Returns:

List of model coefficients of shape (N,)

Return type:

list

Notes

Solve Ax = b. x are the desired model coefficients. A is the ‘feature_matrix’. b corresponds to ‘data_quantities’.
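
The Ax = b solve can be sketched with NumPy alone. The example below assumes no intercept term and applies the ridge penalty by augmenting the least-squares system, which is a standard reformulation and not necessarily how ESPEI calls scikit-learn; `fit_ridge_sketch` is a hypothetical name.

```python
import numpy as np

def fit_ridge_sketch(feature_matrix, data_quantities, ridge_alpha=1.0e-100):
    """Minimize ||Ax - b||^2 + alpha * ||x||^2 via an augmented least-squares system."""
    A = np.asarray(feature_matrix, dtype=float)
    b = np.asarray(data_quantities, dtype=float)
    n_features = A.shape[1]
    # appending sqrt(alpha) * I to A and zeros to b adds the ridge penalty
    A_aug = np.vstack([A, np.sqrt(ridge_alpha) * np.eye(n_features)])
    b_aug = np.concatenate([b, np.zeros(n_features)])
    coefficients, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return coefficients

# Synthetic data following b = 1000 - 2*T, with features [1, T]
T = np.array([300.0, 400.0, 500.0, 600.0])
A = np.column_stack([np.ones_like(T), T])
b = 1000.0 - 2.0 * T
x = fit_ridge_sketch(A, b)
print(x)
```

With a vanishingly small alpha the penalty rows contribute nothing measurable, so the result matches ordinary least squares.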

espei.parameter_selection.selection.score_model(feature_matrix, data_quantities, model_coefficients, feature_list, rss_numerical_limit=1e-16)

Use the AICc to score a model that has been fit.

Parameters:
  • feature_matrix (ndarray) – (M*N) regressor matrix. The transformed model inputs (y_i, T, P, etc.)
  • data_quantities (ndarray) – (M,) response vector. Target values of the output (e.g. HM_MIX) to reproduce.
  • model_coefficients (list) – List of fitted model coefficients to be scored. Has shape (N,).
  • feature_list (list) – Polynomial coefficients corresponding to each column of ‘feature_matrix’. Has shape (N,). Purely a logging aid.
  • rss_numerical_limit (float) – Anything with an absolute value smaller than this is set to zero.
Returns:

A model score

Return type:

float

Notes

Solve Ax = b, where ‘feature_matrix’ is A and ‘data_quantities’ is b.

The likelihood function is a simple least squares with no regularization. The form of the AIC is valid under the assumption that all sample variances are random and Gaussian and that the model is univariate. It is assumed the model here is univariate with T.
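
For reference, the textbook least-squares form of the AICc can be sketched as follows. This is an illustration and not ESPEI's exact implementation: `aicc_sketch` is a hypothetical name, and flooring the residual sum of squares at `rss_floor` is an assumption made here only to keep the logarithm finite for exact fits.

```python
import numpy as np

def aicc_sketch(feature_matrix, data_quantities, model_coefficients, rss_floor=1e-16):
    """Textbook AICc for a least-squares fit (illustrative only)."""
    A = np.asarray(feature_matrix, dtype=float)
    b = np.asarray(data_quantities, dtype=float)
    x = np.asarray(model_coefficients, dtype=float)
    n, k = b.size, x.size
    if n - k - 1 <= 0:
        raise ValueError("this textbook form needs n > k + 1 samples")
    rss = float(np.sum((A @ x - b) ** 2))
    rss = max(rss, rss_floor)  # assumption: avoid log(0) for exact fits
    aic = 2.0 * k + n * np.log(rss / n)
    # small-sample correction: (2k^2 + 2k) / (n - k - 1)
    return aic + (2.0 * k**2 + 2.0 * k) / (n - k - 1)

# Compare a constant-only model with a linear-in-T model on near-linear data
T = np.array([300.0, 400.0, 500.0, 600.0, 700.0, 800.0])
b = 1000.0 - 2.0 * T + np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
A_const = np.ones((T.size, 1))
A_linear = np.column_stack([np.ones_like(T), T])
x_const, *_ = np.linalg.lstsq(A_const, b, rcond=None)
x_linear, *_ = np.linalg.lstsq(A_linear, b, rcond=None)
score_const = aicc_sketch(A_const, b, x_const)
score_linear = aicc_sketch(A_linear, b, x_linear)
print(score_const, score_linear)
```

A lower AICc indicates the preferred model, so here the linear model is favored over the constant one despite its extra parameter.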

espei.parameter_selection.selection.select_model(candidate_models, ridge_alpha)

Select a model from a series of candidates by fitting and scoring them

Parameters:
  • candidate_models (list) – List of tuples of (features, feature_matrix, data_quantities)
  • ridge_alpha (float) – Value of the $\alpha$ hyperparameter used in ridge regression. Defaults to 1.0e-100, which should be degenerate with ordinary least squares regression. For now, the parameter is applied to all features.
Returns:

Tuple of (feature_list, model_coefficients) for the highest scoring model

Return type:

tuple

espei.parameter_selection.ternary_parameters module

Build fittable models for ternary parameter selection

espei.parameter_selection.ternary_parameters.build_ternary_feature_matrix(prop, candidate_models, desired_data)

Return an MxN matrix of M data samples and N features.

Parameters:
  • prop (str) – String name of the property, e.g. ‘HM_MIX’
  • candidate_models (list) – List of SymPy parameters that can be fit for this property.
  • desired_data (dict) – Full dataset dictionary containing values, conditions, etc.
Returns:

An MxN matrix of M samples (from desired data) and N features.

Return type:

numpy.ndarray

espei.parameter_selection.ternary_parameters.get_muggianu_samples(desired_data)

Return the data values from desired_data, transformed to interaction products. Specifically works for Muggianu extrapolation.

Parameters: desired_data (list) – List of matched desired data, e.g. for a single property
Returns: List of sample values that are properly transformed.
Return type: list

Notes

Transforms data to interaction products, e.g. $YS \cdot {}^{xs}G = YS \cdot XS \cdot DXS^{n} \, {}^{n}L$.

Each tuple in the list is a tuple of (temperature, (site_fraction_product, interaction_product)) for each data sample. The interaction product itself is a list that corresponds to the Muggianu-corrected interaction products for components [I, J, K].

espei.parameter_selection.utils module

Tools used across parameter selection modules

espei.parameter_selection.utils.interaction_test(configuration, order=None)

Returns True if the configuration has an interaction

Parameters: order (int, optional) – Specific order to check for. E.g. a value of 3 checks for ternary interactions
Returns: True if there is an interaction.
Return type: bool

Examples

>>> configuration = [['A'], ['A','B']]
>>> interaction_test(configuration)
True  # has an interaction
>>> interaction_test(configuration, order=2)
True  # has a binary interaction
>>> interaction_test(configuration, order=3)
False  # has no ternary interaction

espei.parameter_selection.utils.shift_reference_state(desired_data, feature_transform, fixed_model)

Shift data to a new common reference state.

Module contents