Module Fitters

Contains different optimization algorithms designed to fit reflectivity data. They use parallelization to take advantage of multiprocessor systems.

The algorithms were developed by Martin Zwiebler, and I adapted them to PyXMRTool with slight changes. More information can be found in the PhD thesis of Martin Zwiebler.

The algorithms are not well developed yet. It is better to use existing optimizers, e.g. scipy.optimize.least_squares.

Only Explore() and its related functions are recommended for use. Explore() uses scipy.optimize.least_squares to explore the complete parameter range, and list_clusters(), plot_clusters_onepar(), plot_clusters_allpars() and plot_fixpoints_allpars() visualize the result.

Fitters.Explore(residualsfunction, parameter_settings, number_of_seeds, verbose=2, number_of_clusters=None)[source]

A scanning function for exploring the parameter space.

It chooses number_of_seeds different random start parameter vectors (seeds) within the given parameter range. Each seed is used as the start parameter set for a least-squares fitter that finds a minimum of the sum of squared residuals (SSR), using scipy.optimize.least_squares() with the trust region reflective algorithm. This leads to number_of_seeds fixpoints. These are then analysed with a k-means clustering algorithm, which groups the fixpoints into number_of_clusters different clusters. If number_of_clusters = None (default), the clustering with the best silhouette coefficient is used. The clusters are then analysed: What is the SSR at each cluster center? How many seeds lead to each cluster? What are the means and spreads of the parameter values within each cluster?

The return value will be a structure containing the results.

Parameters:
  • residualsfunction (callable) – A function which returns the differences between simulated and measured data points (residuals) as a list/array. It should usually be the method SampleRepresentation.ReflDataSimulator.getResiduals() of an instance of SampleRepresentation.ReflDataSimulator.
  • parameter_settings (tuple of lists/arrays of floats) – Sets start values, lower and upper limits of the parameters as (startfitparameters, lower_limits, upper_limits), where each entry is a list/array of values of the same length. The startfitparameters are not used (can be None) and are only necessary for compatibility.
  • number_of_seeds (int) – number of random seeds which should be generated
  • verbose ({0, 1, 2}) –
    Determines the verbosity level of the optimizer:
    0 : work silently.
    1 : display a termination report for each seed.
    2 (default) : display progress during iterations.
  • number_of_clusters (int) – number of clusters in which the resulting fixpoints shall be grouped
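
The multi-start step described above can be sketched with scipy.optimize.least_squares directly. This is only a minimal illustration, not the code of Explore(): the straight-line model, the helper name multistart and the rng_seed argument are assumptions made for the example, and the clustering step is left out.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical stand-in for ReflDataSimulator.getResiduals: a straight-line
# model a*x + b compared with synthetic "measured" data.
x_data = np.linspace(0.0, 1.0, 20)
y_data = 2.0 * x_data + 0.5

def residuals(fitpararray):
    a, b = fitpararray
    return a * x_data + b - y_data

# Same layout as parameter_settings: (start values, lower limits, upper limits).
# The start values are not needed for the random seeds, so None is passed.
parameter_settings = (None, [0.0, -1.0], [5.0, 1.0])

def multistart(residualsfunction, parameter_settings, number_of_seeds, rng_seed=0):
    """Run one bounded least-squares fit per random seed and collect the fixpoints."""
    _, lower, upper = parameter_settings
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    rng = np.random.default_rng(rng_seed)
    fixpoints, ssrs = [], []
    for _ in range(number_of_seeds):
        seed = rng.uniform(lower, upper)  # random start vector inside the limits
        result = least_squares(residualsfunction, seed,
                               bounds=(lower, upper), method="trf")
        fixpoints.append(result.x)
        ssrs.append(2.0 * result.cost)  # result.cost is 0.5 * sum(residuals**2)
    return np.array(fixpoints), np.array(ssrs)

fixpoints, ssrs = multistart(residuals, parameter_settings, number_of_seeds=5)
print(fixpoints)
print(ssrs)
```

For this simple model every seed should end up at the same fixpoint. With reflectivity data, several distinct fixpoints are to be expected, which is why the clustering step follows.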
Fitters.Cluster(scan_output, ssrfunction, number_of_clusters=None)[source]

Function to deal with output of Explore().

Clusters the found fixpoints and returns the result as a structure in the same format as Explore(). Explore() uses it internally.

Parameters:
  • scan_output (struct) – return value of Explore()
  • ssrfunction (callable) – A function which returns the sum of squared residuals between simulated and measured data points (SSR). It should usually be the method SampleRepresentation.ReflDataSimulator.getSSR() of an instance of SampleRepresentation.ReflDataSimulator.
  • number_of_clusters (int) – number of clusters in which the resulting fixpoints shall be grouped
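
How the number of clusters could be chosen by the silhouette coefficient when number_of_clusters is None is sketched below. The small k-means and silhouette helpers are written out in NumPy purely for illustration; they are assumptions for this example and not the routines that Cluster() calls. The fixpoints are made-up values.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center for every point."""
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(distances, axis=1)

def kmeans(points, k, iterations=50):
    """Plain Lloyd k-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(nearest)])
    centers = np.array(centers, dtype=float)
    for _ in range(iterations):
        labels = assign(points, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster runs empty
                centers[j] = points[labels == j].mean(axis=0)
    return centers, assign(points, centers)

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters count as 0."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = []
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = dist[i, same].sum() / (same.sum() - 1)   # mean distance within own cluster
        b = min(dist[i, labels == other].mean()      # mean distance to nearest other cluster
                for other in set(labels.tolist()) if other != own)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Made-up fixpoints of a two-parameter fit: two well separated groups.
fixpoints = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.1], [1.1, 1.9],
                      [4.0, 0.5], [4.1, 0.6], [3.9, 0.4], [4.0, 0.6], [4.1, 0.4]])

scores = {k: silhouette(fixpoints, kmeans(fixpoints, k)[1]) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The candidate range (2, 3, 4) is arbitrary here. The silhouette coefficient is not defined for a single cluster, so the search has to start at two clusters.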
Fitters.list_clusters(scan_output)[source]

Function to deal with output of Explore().

Lists all found clusters (which should correspond to fixpoints) and their properties.
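
The cluster properties mentioned for Explore() (SSR at the cluster center, share of seeds, spread of the parameter values) could be derived from labelled fixpoints roughly as follows. The dictionary keys, the function summarize_clusters and the toy SSR function are hypothetical; the actual structure returned by Explore() is not documented here and may look different.

```python
import numpy as np

def summarize_clusters(fixpoints, labels, ssrfunction):
    """Collect center, SSR, share of seeds and parameter range for each cluster."""
    fixpoints = np.asarray(fixpoints, dtype=float)
    labels = np.asarray(labels)
    summary = []
    for label in np.unique(labels):
        members = fixpoints[labels == label]
        center = members.mean(axis=0)
        summary.append({
            "center": center,
            "ssr": float(ssrfunction(center)),
            "fraction_of_seeds": len(members) / len(fixpoints),
            "lower": members.min(axis=0),   # smallest value of each parameter
            "upper": members.max(axis=0),   # largest value of each parameter
        })
    return sorted(summary, key=lambda cluster: cluster["ssr"])  # best cluster first

# Toy SSR function with its minimum at (1, 2); stands in for getSSR().
def toy_ssr(fitpararray):
    return np.sum((np.asarray(fitpararray) - np.array([1.0, 2.0])) ** 2)

clusters = summarize_clusters([[1.0, 2.0], [1.2, 2.0], [3.0, 0.0]], [0, 0, 1], toy_ssr)
for cluster in clusters:
    print(cluster)
```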

Fitters.plot_clusters_onepar(scan_output, p, parameter_pool=None, ssr_lim=None)[source]

Function to deal with output of Explore() (scan_output).

Shows the values of one parameter at the centers of the found clusters. The y axis shows the corresponding sum of squared residuals of the fit. The size (area) of the bubbles corresponds to the fraction of seeds which converged to this fixpoint. The bars show the range of parameter values which were assigned to this cluster/fixpoint.

Parameters:
  • scan_output (struct) – Structure as returned by Explore().
  • p (int or str) – Selects the parameter. Either with its index or its name. In the second case parameter_pool has to be given.
  • parameter_pool (Parameters.ParameterPool) – The parameter pool containing the parameters under consideration. If given, parameter names are plotted and the x axis is adjusted to the lower and upper limits stored in parameter_pool.
  • ssr_lim (list/tuple) – lower and upper limit of y-axis (ssr)
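
A plot of this kind can be approximated with plain matplotlib, as in the sketch below. It does not reproduce plot_clusters_onepar(); the cluster values are invented, and the scaling factor for the bubble size is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical cluster data for one parameter (not the real Explore() structure).
centers = [1.1, 3.0]       # cluster centers of the selected parameter
ssr = [0.01, 8.0]          # SSR at each cluster center
fractions = [0.75, 0.25]   # share of seeds that converged to each cluster
lower = [1.0, 2.9]         # smallest parameter value assigned to each cluster
upper = [1.2, 3.2]         # largest parameter value assigned to each cluster

fig, ax = plt.subplots()

# Asymmetric horizontal bars from the center to the lowest/highest member.
xerr = [[c - lo for c, lo in zip(centers, lower)],
        [hi - c for c, hi in zip(centers, upper)]]
ax.errorbar(centers, ssr, xerr=xerr, fmt="none", ecolor="gray", capsize=3)

# In scatter(), s is the marker area in points**2, so scaling s linearly with
# the fraction makes the bubble area proportional to it.
ax.scatter(centers, ssr, s=[2000 * f for f in fractions], alpha=0.5)

ax.set_xlabel("parameter 0")
ax.set_ylabel("SSR")
ax.set_ylim(-1, 10)  # plays the role of ssr_lim
```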
Fitters.plot_clusters_allpars(scan_output, parameter_pool=None, ssr_lim=None)[source]

Function to deal with output of Explore() (scan_output).

Shows the values of all parameters at the centers of the found clusters in a multiplot. The y axes show the corresponding sum of squared residuals of the fit. The size (area) of the bubbles corresponds to the fraction of seeds which converged to this fixpoint. The bars show the range of parameter values which were assigned to this cluster/fixpoint.

Parameters:
  • scan_output (struct) – Structure as returned by Explore().
  • parameter_pool (Parameters.ParameterPool) – The parameter pool containing the parameters under consideration. If given, parameter names are plotted and the x axes are adjusted to the lower and upper limits stored in parameter_pool.
  • ssr_lim (list/tuple) – lower and upper limit of y-axis (ssr)
Fitters.plot_fixpoints_allpars(scan_output, parameter_pool=None, ssr_lim=None)[source]

Function to deal with output of Explore() (scan_output).

Shows the values of all parameters at the centers of the found clusters in a multiplot. The y axes show the corresponding sum of squared residuals of the fit. The size (area) of the bubbles corresponds to the fraction of seeds which converged to this fixpoint. The bars show the range of parameter values which were assigned to this cluster/fixpoint. Additionally, all individual fixpoints are shown, colored according to their associated cluster.

Parameters:
  • scan_output (struct) – Structure as returned by Explore().
  • ssrfunction (callable) – A function which returns the sum of squared residuals between simulated and measured data points (SSR). It should usually be the method SampleRepresentation.ReflDataSimulator.getSSR() of an instance of SampleRepresentation.ReflDataSimulator.
  • parameter_pool (Parameters.ParameterPool) – The parameter pool containing the parameters under consideration. If given, parameter names are plotted and the x axes are adjusted to the lower and upper limits stored in parameter_pool.
  • ssr_lim (list/tuple) – lower and upper limit of y-axis (ssr)
Fitters.Evolution(costfunction, parameter_settings, iterations, number_of_cores=1, generation_size=300, mutation_strength=0.01, elite=2, parent_percentage=0.25, control_file=None, plotfunction=None)[source]

Evolutionary fit algorithm. Slow, but good at finding the global minimum. Returns the optimized parameter set and the corresponding value of the cost function.

Parameters:
  • costfunction (callable) –

    A function which returns a measure (cost) for the difference between measured and simulated data for the parameter set given as a list of values. Usually the sum of squared residuals (SSR) is used as cost. It should usually be the method SampleRepresentation.ReflDataSimulator.getSSR() of an instance of SampleRepresentation.ReflDataSimulator wrapped in a function. The wrapping is necessary due to implementation issues connected to the parallelization. Example for the wrapping:

    simu = SampleRepresentation.ReflDataSimulator("l")
    ...
    def cost(fitpararray):
        return simu.getSSR(fitpararray)
    

    Then pass the function cost as costfunction. It can also be any other function which takes the array of fit parameters and returns one real value which should be minimized by Evolution().

  • parameter_settings (tuple of lists of floats) – Sets start values, lower and upper limits of the parameters as (startfitparameters, lower_limits, upper_limits), where each entry is a list/array of values of the same length.
  • iterations (int) – number of iterations/generations
  • number_of_cores (int) – Number of jobs used in parallel. Best performance when set to the number of available cores on your computer.
  • generation_size (int) – Generate this many individual fit parameter sets in each generation.
  • mutation_strength (float) – Children are mutated by adding up to this factor times (upper_limit - lower_limit). Use rather small values.
  • elite (int) – Number of best individuals that are carried over to the next generation.
  • parent_percentage (float) – Use this fraction of a generation (the best individuals) for reproduction.
  • control_file (str) – Filename of a control file. If it is given, you can abort the optimization routine by writing “terminate 1” to the beginning of its first line.
  • plotfunction (callable) – Function which is used to plot the current state of fitting (simulated data with currently best parameter set) after every iteration if given. It should take only one parameter: the array of fitparameters.
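
The control-file mechanism can be pictured as a check that an optimizer loop performs once per iteration. The helper should_terminate below is a hypothetical sketch of such a check, not the code used by Evolution(); writing "terminate 0" to mean "keep running" is likewise an assumption.

```python
import os
import tempfile

def should_terminate(control_file):
    """Return True if the first line of control_file starts with 'terminate 1'."""
    if control_file is None or not os.path.exists(control_file):
        return False
    with open(control_file) as f:
        return f.readline().startswith("terminate 1")

# Usage: write the control file, then let the loop poll it.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "control.txt")

    with open(path, "w") as f:
        f.write("terminate 0\n")
    keep_running = not should_terminate(path)   # optimizer continues

    with open(path, "w") as f:
        f.write("terminate 1\n")
    stop_requested = should_terminate(path)     # optimizer would stop now

print(keep_running, stop_requested)
```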

This evolutionary algorithm is mostly the same as Martin's. Only the mutation rule has changed (s is mutation_strength):

Martin: children[i] = children[i] * (1 + s * random float(-1,1))
PyXMRTool: children[i] = children[i] + s * random float(-1,1) * (upper_limits - lower_limits)
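
The two mutation rules, and how the parameters of Evolution() might interact, can be illustrated with a small pure-Python sketch. It is not the implementation of Evolution(): the function names, the uniform crossover, the clipping to the limits and the rng_seed argument are all assumptions made for this example, and everything runs serially.

```python
import random

def mutate_relative(value, s, rng):
    """Martin's rule: the perturbation scales with the parameter value itself."""
    return value * (1 + s * rng.uniform(-1, 1))

def mutate_range(value, s, lower, upper, rng):
    """Modified rule: the perturbation scales with the allowed parameter range."""
    return value + s * rng.uniform(-1, 1) * (upper - lower)

def evolve(costfunction, parameter_settings, iterations, generation_size=30,
           mutation_strength=0.01, elite=2, parent_percentage=0.25, rng_seed=0):
    start, lower, upper = parameter_settings
    rng = random.Random(rng_seed)
    # First generation: the start values plus uniformly random individuals.
    generation = [list(start)] + [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(generation_size - 1)
    ]
    for _ in range(iterations):
        generation.sort(key=costfunction)
        parents = generation[:max(2, int(parent_percentage * generation_size))]
        children = [list(ind) for ind in generation[:elite]]  # elite survives unchanged
        while len(children) < generation_size:
            mother, father = rng.sample(parents, 2)
            # ASSUMPTION: uniform crossover; the recombination rule is not documented.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            child = [mutate_range(v, mutation_strength, lo, hi, rng)
                     for v, lo, hi in zip(child, lower, upper)]
            # ASSUMPTION: children are clipped back into the parameter limits.
            child = [min(max(v, lo), hi) for v, lo, hi in zip(child, lower, upper)]
            children.append(child)
        generation = children
    best = min(generation, key=costfunction)
    return best, costfunction(best)

def cost(fitpararray):
    # Toy cost function with its minimum at (1, -2).
    return (fitpararray[0] - 1.0) ** 2 + (fitpararray[1] + 2.0) ** 2

settings = ([0.0, 0.0], [-5.0, -5.0], [5.0, 5.0])
best, best_cost = evolve(cost, settings, iterations=40)
print(best, best_cost)
```

One practical difference between the rules: under the relative rule a parameter that is exactly 0 can never change, because it is only multiplied. The range-based rule does not have this limitation.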
Fitters.Levenberg_Marquardt_Fitter(residualandcostfunction, parameter_settings, parallel_points, number_of_cores=1, strict=True, convergence_criterium=1e-07, control_file=None, plotfunction=None)[source]

Modified Levenberg-Marquardt algorithm (see PhD thesis of Martin Zwiebler). Good convergence, but might end up in a local minimum. Returns the optimized parameter set and the corresponding value of the cost function.

Parameters:
  • residualandcostfunction (callable) –

    A function which returns the differences between simulated and measured data points (residuals) as a list, together with a scalar measure (cost) for these differences in total, for the parameter set given as a list of values. Usually the sum of squared residuals (SSR) is used as cost. It should usually be the method SampleRepresentation.ReflDataSimulator.getResidualsSSR() of an instance of SampleRepresentation.ReflDataSimulator wrapped in a function. The wrapping is necessary due to implementation issues connected to the parallelization. Example for the wrapping:

    simu = SampleRepresentation.ReflDataSimulator("l")
    ...
    def rescost(fitpararray):
        return simu.getResidualsSSR(fitpararray)
    

    Then pass the function rescost as residualandcostfunction. It can also be any other function which takes the array of fit parameters and returns a tuple of 1.) a list of residuals (used to determine derivatives) and 2.) the value of the cost function which should be minimized by Levenberg_Marquardt_Fitter().

  • parameter_settings (tuple of lists of floats) – Sets start values, lower and upper limits of the parameters as (startfitparameters, lower_limits, upper_limits), where each entry is a list/array of values of the same length.
  • parallel_points (int) – This should be roughly the number of threads that can run in parallel, i.e. the number of cores. The algorithm first finds a direction for a good descent and then checks this number of points on the line. The best one yields the new fit parameter set.
  • number_of_cores (int) – Number of jobs used in parallel. Best performance when set to the number of available cores on your computer.
  • strict (bool) – Usually this algorithm fails if the residuals are locally independent of one of the parameters. If you set strict = False, this parameter will be ignored locally.
  • convergence_criterium (float) – If the relative difference between the costs in two successive iterations is smaller than convergence_criterium, the fit is considered converged.
  • control_file (str) – Filename of a control file. If it is given, you can abort the optimization routine by writing “terminate 1” to the beginning of its first line.
  • plotfunction (callable) – Function which is used to plot the current state of fitting (simulated data with currently best parameter set) after every iteration if given. It should take only one parameter: the array of fitparameters.
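
The roles of parallel_points and convergence_criterium can be illustrated with a simplified serial sketch. It is not the modified algorithm from the thesis: the damped Gauss-Newton direction, the finite-difference Jacobian, the fixed damping value, the choice of candidate points and the name line_search_fit are all assumptions made for this example.

```python
import numpy as np

def numerical_jacobian(residuals, p, step=1e-6):
    """Forward-difference Jacobian of the residual vector."""
    r0 = residuals(p)
    jac = np.empty((len(r0), len(p)))
    for j in range(len(p)):
        shifted = p.copy()
        shifted[j] += step
        jac[:, j] = (residuals(shifted) - r0) / step
    return jac

def line_search_fit(residualandcostfunction, parameter_settings, parallel_points,
                    convergence_criterium=1e-7, damping=1e-3, max_iterations=100):
    start, lower, upper = (np.asarray(v, dtype=float) for v in parameter_settings)
    residuals = lambda p: np.asarray(residualandcostfunction(p)[0], dtype=float)
    p = start.copy()
    cost = residualandcostfunction(p)[1]
    for _ in range(max_iterations):
        jac = numerical_jacobian(residuals, p)
        # Damped Gauss-Newton (Levenberg-Marquardt-type) descent direction.
        hessian = jac.T @ jac + damping * np.eye(len(p))
        direction = np.linalg.solve(hessian, -jac.T @ residuals(p))
        # Candidate points on the line; these evaluations are independent and
        # are the part that could be distributed over several cores.
        fractions = np.linspace(0.0, 2.0, parallel_points + 1)[1:]
        candidates = [np.clip(p + f * direction, lower, upper) for f in fractions]
        costs = [residualandcostfunction(c)[1] for c in candidates]
        best = int(np.argmin(costs))
        if costs[best] >= cost:
            break  # no candidate improves the fit
        converged = (cost - costs[best]) / cost < convergence_criterium
        p, cost = candidates[best], costs[best]
        if converged:
            break
    return p, cost

# Toy problem standing in for getResidualsSSR(): exponential decay a*exp(-b*x).
x_data = np.linspace(0.0, 2.0, 20)
y_data = 2.0 * np.exp(-1.5 * x_data)

def rescost(fitpararray):
    a, b = fitpararray
    res = a * np.exp(-b * x_data) - y_data
    return res, float(np.sum(res ** 2))

p_fit, cost_fit = line_search_fit(rescost, ([1.0, 1.0], [0.0, 0.0], [5.0, 5.0]),
                                  parallel_points=4)
print(p_fit, cost_fit)
```

Evaluating several step lengths at once costs little extra time when each candidate runs on its own core, which is why parallel_points is tied to the number of cores.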