.. raw:: latex

   \chapter{Recipes}

.. _Recipes:

=======
Recipes
=======

Here you can find some useful snippets of code to make using ESPEI easier.

Get optimal parameter TDB files from a trace
============================================

Creating TDBs of optimal parameters from a tracefile and probfile:

.. code-block:: python

   """
   This script updates an input TDB file with the optimal parameters from an ESPEI run.

   Change the capitalized variables to your desired input and output filenames.
   """

   INPUT_TDB_FILENAME = 'CU-MG_param_gen.tdb'
   OUTPUT_TDB_FILENAME = 'CU-MG_opt_params.tdb'
   TRACE_FILENAME = 'trace.npy'
   LNPROB_FILENAME = 'lnprob.npy'

   import numpy as np
   from pycalphad import Database
   from espei.analysis import truncate_arrays
   from espei.utils import database_symbols_to_fit, optimal_parameters

   trace = np.load(TRACE_FILENAME)
   lnprob = np.load(LNPROB_FILENAME)
   trace, lnprob = truncate_arrays(trace, lnprob)

   dbf = Database(INPUT_TDB_FILENAME)
   opt_params = dict(zip(database_symbols_to_fit(dbf), optimal_parameters(trace, lnprob)))
   dbf.symbols.update(opt_params)
   dbf.to_file(OUTPUT_TDB_FILENAME)


Plot phase equilibria data
==========================

When compiling ESPEI datasets of phase equilibria data, it can be useful to plot the data to check that it matches visually with what you are expecting.
This script plots a binary phase diagram.

*TIP:* Using this in Jupyter Notebooks make it really fast to update and watch your progress.

.. code-block:: python

   """
   This script will create a plot in a couple seconds from a datasets directory
   that you can use to check your phase equilibria data.

   Change the capitalized variables to the system information and the
   directory of datasets you want to plot.
   """

   COMPONENTS = ['CU', 'MG', 'VA']
   INDEPENDENT_COMPONENT = "MG"  # component to plot on the x-axis
   PHASES = ['BCC_A2', 'CUMG2', 'FCC_A1', 'LAVES_C15', 'LIQUID']

   DATASETS_DIRECTORY = "~/my-datasets/CU-MG"

   X_MIN, X_MAX = 0.0, 1.0
   Y_MIN, Y_MAX = 400, 1400

   # script starts here, you shouldn't have to edit below this line
   import os
   from espei.plot import dataplot
   from espei.datasets import recursive_glob, load_datasets
   from pycalphad import variables as v
   import matplotlib.pyplot as plt

   plt.figure(figsize=(10,8))

   ds = load_datasets(recursive_glob(os.path.expanduser(DATASETS_DIRECTORY), '*.json'))
   conds = {v.P: 101325, v.T: (1,1,1), v.X(INDEPENDENT_COMPONENT): (1, 1, 1)}
   dataplot(COMPONENTS, PHASES, conds, ds)
   plt.xlim(X_MIN, X_MAX)
   plt.ylim(Y_MIN, Y_MAX)
   plt.show()

The script gives the following output:

.. figure:: _static/dataplot-recipe-cu-mg.png
    :alt: Cu-Mg dataplot recipe
    :width: 500pt


Plot phase diagram with data
============================

.. code-block:: python

   %matplotlib inline
   from pycalphad import Database, binplot, variables as v
   from espei.datasets import load_datasets, recursive_glob
   from espei.plot import dataplot
   import matplotlib.pyplot as plt

   # load the experimental and DFT datasets
   dataset_dir = "input-data"
   dataset_dir = "/Users/brandon/Projects/2020-workshop-material/ESPEI/input-data/run"
   datasets = load_datasets(recursive_glob(dataset_dir, "*.json"))

   # set up the pycalphad phase diagram calculation
   dbf = Database("/Users/brandon/Projects/2020-workshop-material/ESPEI/mcmc-tmp.tdb")
   comps = ["CR", "NI", "VA"]
   phases = ["LIQUID", "FCC", "BCC"]
   conds = {v.P: 101325, v.T: (700, 2200, 10), v.X("CR"): (0, 1, 0.01)}

   # plot the phase diagram and data
   fig, ax = plt.sublots(dpi=150)
   binplot(dbf, comps, phases, conds, plot_kwargs={"ax": ax})
   dataplot(comps, phases, conds, datasets, ax=ax)
   ax.set_title("Cr-Ni\nParameter Generation")
   ax.set_ylim(conds[v.T][0], conds[v.T][1])
   ax.set_xlabel("X(CR)")
   ax.set_ylabel("Temperature (K)")
   fig.show()

.. figure:: _static/phase-diagram-recipe-cr-ni.png
   :alt: Cr-Ni phase diagram recipe
   :width: 500pt


Plot thermochemical properties parameters with data
===================================================

Parameter selection in ESPEI fits Calphad parameters to thermochemical data.
MCMC can adjust these parameters.
In both cases, it may be useful to compare the energies of specific
interactions to the model predictions. The
:py:func:`espei.plot.plot_interaction` code will plot the predicted
interaction from the database against the available data, if any.

.. code-block:: python

   """
   This script plots a single interaction in a database compared to data.
   """

   # Settings
   INPUT_TDB_FILENAME = 'CU-MG_param_gen.tdb'
   DATASET_DIRECTORY = 'input-data'
   PHASE_NAME = 'LIQUID'
   # CONFIGURATION must be a tuple of the configuration to be plotted.
   # This can only plot one endmember or interaction at a time.
   # Note that the outside tuples are the whole configuration
   # and the insides are for each individual sublattice.
   # Single sublattices *MUST* have the comma after the
   # object in order to be a tuple, not just parantheses.
   # some examples:
   # ('CU', 'MG')  # endmember
   # (('CU', 'MG'),)  # (('CU', 'MG')) is invalid because it will become ('CU', 'MG')
   # ('MG', ('CU', 'MG'))
   CONFIGURATION = (('CU', 'MG'),)

   # Plot the parameter
   import matplotlib.pyplot as plt
   from pycalphad import Database
   from espei.datasets import load_datasets, recursive_glob
   from espei.plot import plot_interaction

   dbf = Database(INPUT_TDB_FILENAME)
   comps = sorted(dbf.elements)
   ds = load_datasets(recursive_glob(DATASET_DIRECTORY, '*.json'))
   plot_interaction(dbf, comps, PHASE_NAME, CONFIGURATION, 'HM_MIX', datasets=ds)
   plt.show()


Running for the single sublattice LIQUID phase in Cu-Mg gives the following output after parameter selection:

.. figure:: _static/cu-mg-plot_parameters-liquid.png
    :alt: Cu-Mg LIQUID HM_MIX
    :scale: 75%


Visualize probability convergence
=================================

Convergence can be qualitatively estimated by looking at how the
log-probability changes for all of the chains as a function of iterations.

.. code:: python

    # remove next line if not using iPython or Juypter Notebooks
    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np
    from espei.analysis import truncate_arrays

    trace = np.load('trace.npy')
    lnprob = np.load('lnprob.npy')

    trace, lnprob = truncate_arrays(trace, lnprob)


    ax = plt.subplot()
    ax.set_yscale('log')
    ax.set_xlabel('Iterations')
    ax.set_ylabel('- lnprob')
    ax.plot(-lnprob.T)
    plt.show()


.. image:: _static/docs-analysis-example_1_0.png


Visualize the trace of each parameter
=====================================

Looking at how each parameter chain evolves across the chains can show
if any particular chains are diverging from the rest, if there are
multiple modes being explored, or how wide the distribution of parameters
are relative to each other.

.. code:: python

    # remove next line if not using iPython or Juypter Notebooks
    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np

    from espei.analysis import truncate_arrays

    trace = np.load('trace.npy')
    lnprob = np.load('lnprob.npy')

    trace, lnprob = truncate_arrays(trace, lnprob)

    num_chains = trace.shape[0]
    num_parameters = trace.shape[2]
    for parameter in range(num_parameters):
        fig, ax = plt.subplots()  # new figure for every parameter
        ax.set_xlabel('Iterations')
        ax.set_ylabel('Parameter value')
        ax.plot(trace[..., parameter].T)
    plt.show()


The example below is for *one* parameter. Running the snippet above will plot
all of the parameters on separate plots.

.. image:: _static/docs-analysis-example_3_0.png


Plot a corner plot
==================

Note: You must install the ``corner`` package before using it
(``conda install corner`` or ``pip install corner``).

In a corner plot, the distributions for each parameter are plotted along
the diagonal and covariances between them under the diagonal. A more
circular covariance means that parameters are not correlated to each
other, while elongated shapes indicate that the two parameters are
correlated. Strongly correlated parameters are expected for some
parameters in CALPHAD models within phases or for phases in equilibrium,
because increasing one parameter while decreasing another would give a
similar likelihood.

.. code:: python

    # remove next line if not using iPython or Juypter Notebooks
    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np
    import corner

    from espei.analysis import truncate_arrays

    trace = np.load('trace.npy')
    lnprob = np.load('lnprob.npy')

    trace, lnprob = truncate_arrays(trace, lnprob)

    # flatten the along the first dimension containing all the chains in parallel
    fig = corner.corner(trace.reshape(-1, trace.shape[-1]))
    plt.show()

.. image:: _static/docs-analysis-example_5_0.png


Plot ZPF driving forces
=======================
This visualization can be used as a diagnostic for understanding which ZPF data
are contributing the most driving force towards the likelihood. Note that these
driving forces are unweighted, since the weight is applied when computing the
log-likelihood of each driving force.

.. code:: python

   import numpy as np
   import matplotlib.pyplot as plt
   from pycalphad import Database, binplot, variables as v
   from pycalphad.core.utils import extract_parameters
   from espei.datasets import load_datasets, recursive_glob
   from espei.error_functions.zpf_error import get_zpf_data, calculate_zpf_driving_forces

   # User input variables
   TDB_PATH = 'mcmc.tdb'
   DATASETS_DIR = '../input-data'
   COMPS = ['CR', 'NI', 'VA']
   INDEP_COMP = v.X('NI')  # binary assumed
   CONDS = {v.N: 1, v.P: 101325, v.T: (500, 2200, 20), INDEP_COMP: (0, 1, 0.01)}
   CMAP = 'hot'
   outfile = 'driving-forces.png'
   parameters = {}  # e.g. {'VV0001': 10000.0}
   approximate_equilibrium = False

   # Script below:*
   dbf = Database(TDB_PATH)
   phases = list(dbf.phases.keys())

   # Get the datasets, construct ZPF data and compute driving forces
   # Driving forces and weights are ragged 2D arrays of shape (len(zpf_data), len(vertices in each zpf_data))
   ds = load_datasets(recursive_glob(DATASETS_DIR, '*.json'))
   zpf_data = get_zpf_data(dbf, COMPS, phases, ds, parameters=parameters)
   param_vec = extract_parameters(parameters)[1]
   driving_forces, weights = calculate_zpf_driving_forces(zpf_data, param_vec, approximate_equilibrium=approximate_equilibrium)

   # Construct the plotting compositions, temperatures and driving forces
   # Each should have len() == (number of vertices)
   # Driving forces already have the vertices unrolled so we can concatenate directly
   Xs = []
   Ts = []
   dfs = []
   for data, data_driving_forces in zip(zpf_data, driving_forces):
       for phase_region in data['phase_regions']:
           for vertex, df in zip(phase_region.vertices, data_driving_forces):
               comp_cond = vertex.comp_conds
               if vertex.has_missing_comp_cond:
                   # No composition to plot
                   continue
               dfs.append(df)
               Ts.append(phase_region.potential_conds[v.T])
               # Binary assumptions here
               assert len(comp_cond) == 1
               if INDEP_COMP in comp_cond:
                   Xs.append(comp_cond[INDEP_COMP])
               else:
                   # Switch the dependent and independent component
                   Xs.append(1.0 - tuple(comp_cond.values())[0])

   # Plot the phase diagram with driving forces
   fig, ax = plt.subplots(dpi=100)
   binplot(dbf, COMPS, phases, CONDS, plot_kwargs={'ax': ax}, eq_kwargs={'parameters': parameters})
   sm = plt.cm.ScalarMappable(cmap=CMAP)
   sm.set_array(dfs)
   ax.scatter(Xs, Ts, c=dfs, cmap=CMAP, edgecolors='k')
   fig.colorbar(sm, ax=ax, pad=0.25)
   if outfile is not None:
       fig.savefig(outfile)
   else:
       fig.show()

.. image:: _static/driving-force-diagnostic.png