Exploring Least-Squares Fitting
=====================================

To appreciate the power of the sample variance approach to acquiring and
analyzing data, we will use Igor Pro first to generate some noisy data that
ought to fit a line, and then use Igor Pro’s fitting operation to perform a
weighted least-squares fit to the synthetic data, from which we will extract the
slope and offset values from the fit. We can then compare these values to the
ones we used to generate the noisy data in the first place, and by exploring the
statistics of these parameter determinations over many repetitions of the
procedure, judge to what extent we may trust the parameter estimates we obtain
from least-squares fitting.

Generating noisy data
----------------------------------------

The function ``jumble`` takes parameters for the ``mean_noise``, the standard
deviation of the noise level, and the number of data points to compute and
generates a set of data for a line to which random gaussian noise has been
added. The slope and offset parameters of the line are determined by the
global variables ``slope`` and ``offset``, stored in Igor's current data folder.

After computing the data, ``jumble`` prepares a plot of the noisy data, runs
a weighted fit to the data, and reports the best-fit parameters and their
uncertainties.

.. code-block:: igorpro

        // jumble(mean_noise, std_noise, npts) creates a noisy data set,
        // plots and fits a line to it, and shows the residuals, normalized
        // residuals, and fitting parameters
        //
        // Example: jumble(4, 1, 50)
        Function jumble(mean_noise, std_noise, npts)
          Variable mean_noise, std_noise, npts
          
          NVAR/Z offset, slope                       // these are global variables that you
          if (!(NVAR_Exists(offset)))                // can edit in the Data Browser (cmd-B)
            Variable/G offset = -5.7, slope = 3.23   // make them, since they didn't already exist
          endif
          
          Make/O/N=(npts) data, amp, noise           // create waves for the data and the noise
          amp = mean_noise + gnoise(std_noise)       // each point has its own noise amplitude
          noise = gnoise(amp)                        // and its own noise
          data = offset + slope * x + noise          // the data are the line + noise
          
          KillWindow/Z NoiseLine                     // the /Z flag means don't complain
          Display/K=1 data                           // the /K=1 flag means no dialog on closing window
          DoWindow/C NoiseLine                       // change the window name to "NoiseLine"
          ErrorBars/Y=3 data Y,wave=(amp,amp)
          HMC#FixGraph()                             // a utility function to apply standard styling
          HMC#CaptureGraphHistory("NoiseLine")       // So that the AddChiSqInfoToPlot command has info
          String cmd = "CurveFit/TBOX=784 line data /W=amp /I=1 /D /R"
          print cmd
          Execute cmd
          
          HMC#AddChiSqInfoToPlot()                   // Fix up the annotation to be more informative
          HMC#AddNormalizedResiduals(0)              // and add a panel of normalized residuals
          ErrorBars/Y=3 Res_data Y,wave=(amp,amp)    // tighten up the end caps on the error bars
          ModifyGraph msize=2,expand=(CmpStr(IgorInfo(2),"Macintosh") ? 1.5 : 2)
          printf "Compare: offset = %g, slope = %g\r", offset, slope
        End

An example of the output of ``jumble`` is shown in the following figure.

.. figure:: NoiseLine.png
		:scale: 75%
		:alt: Example fit of a line to noisy data
		:align: center
        
		Some data scattered with gaussian noise about a line whose slope was set
		to :math:`3.23` and whose offset was :math:`-5.7`.

Copy the code for ``jumble`` into the Procedure window (you can show the
Procedure window either selecting it from the Window|Procedures menu or using
Cmd/Ctrl-M). Click the Compile button at the bottom of the Procedure window and
then enter ``Jumble(4, 1, 50)`` in the command line. Press return. If you have
copied faithfully, and if you have successfully installed the HMC menu, you
should see a figure similar to the one above.

Modifications
--------------------------

You can modify the slope and offset constants and try again, to see whether
``jumble`` behaves as expected. The ``slope`` and ``offset`` values are stored
in the root data folder. You can either set new values from the command line
like so:

.. code::

   slope = 4.32

or by bringing up the Data Browser (Cmd/Ctrl-B or from the Data menu), making
sure that the checkbox next to **Variables** is checked, and then looking in the
list of variables.

.. figure:: Browser.png
			:align: center

			To edit a variable in the Data Browser, click the pencil icon.

Then issue the ``jumble`` command again to see if the new data are suitably
modified.

Exploring parameter distributions
-------------------------------------------

We have shown theoretically that when we weight each data point using its
standard deviation in performing the least-squares fit, we expect the parameter
values that result from the fit to be close to the true values, and that the
uncertainty in the parameter values represent the standard deviation of their
distribution. That is, if we were to gather statistics over many fitting trials,
the values we deduce for the slope will be within one :math:`\sigma` from the
true value about 68% of the time.

To check this theory, we can run a large number ``nTrials`` of repetitions, 
and then make histograms of the normalized discrepancies between the parameter
values d


.. code-block:: igorpro

   
    // StudyParamDetermination(nTrials, mean_noise, std_noise, npts) runs a fit
    // such as described in jumble many (nTrials) times, keeping track of how far
    // from the true value of the fit parameter the determined value is. It
    // then produces a histogram of the normalized values for both the slope
    // and the offset fit parameter.
    //
    // Example: StudyParamDetermination(10000, 4, 1, 60)
    
    Function StudyParamDetermination(nTrials, mean_noise, std_noise, npts)
      Variable nTrials, mean_noise, std_noise, npts
      
      // We will perform nTrials linear fits, each yielding
      // a slope and offset, which will differ from the true value
      // used to compute the noisy data by an amount we can record
      // as a multiple of the uncertainty in the fitted parameter
      // value. We record those errors in offset_error and slope_error.
      
      Make/N=(nTrials)/O offset_error, slope_error
      Make/D/N=2/O coeffs                 // slope and offset
      Make/O/N=(npts) data, amp, noise    // waves for the data and noise
      NVAR offset, slope                  // the global parameters
      
      Variable n
      for (n = 0; n < nTrials; n += 1)
        amp = mean_noise + gnoise(std_noise)  // the noise amplitude may vary
                                          // for each data point
        noise = gnoise(amp)               // draw the actual noise for each 
                                          // point from a gaussian distribution
        data = offset + slope * x + noise

        // Perform the linear fit, putting the answers in the
        // wave coeffs. The /Q flag means to do it quietly, so
        // no delays from reporting to the history area and no dialogs.

        CurveFit/Q line kwCWave=coeffs data /W=amp /I=1
        
        WAVE W_sigma    // this wave is created automatically by CurveFit; it holds
                        // the uncertainties of the fit parameters
        
        // Record the normalized discrepancy in the fitting parameters

        offset_error[n] = (coeffs[0] - offset) / W_sigma[0]
        slope_error[n] = (coeffs[1] - slope) / W_sigma[1]
      endfor
      
      // Prepare a histogram for the slope errors and the offset errors

      Make/N=100/O slope_hist, offset_hist
      Histogram/B=1/C slope_error,slope_hist
      Histogram/B=1/C offset_error,offset_hist

      // Prepare graphs of the histograms
      Display/K=1 slope_hist
      HMC#FixGraph()
      ModifyGraph msize=2
      Display/K=1 offset_hist
      HMC#FixGraph()
      ModifyGraph msize=2

      // Lastly, shift the graphs so we see both of them
      GetWindow/Z kwTopWin wsizef
      MoveWindow V_right, V_top, 2*V_right - V_left, V_bottom
    End

An example of the output of ``StudyParamDetermination`` is shown in the figure below. 

.. figure:: histograms.png

    Histograms of the normalized discrepancy for the slope coefficient and the
    offset coefficient generated by ``StudyParamDetermination`` for 100,000
    trials.

Exercises
---------------

1. Use ``StudyParamDetermination`` to generate histograms of normalized discrepancies. Then
   investigate the claim that the discrepancies are normally distributed. Do the distributions
   have the expected width? Hint: the uncertainty in a point of a histogram that has :math:`N` 
   counts should be :math:`\sqrt{N}`.
  
2. What does all of this mean? That is, summarize what all this simulation really signifies.


Example
____________________

.. figure:: slope_histogram.png
    :scale: 125%
    :align: center

    Do these data support the claim that the normalized discrepancies are
    normally distributed with the appropriate gaussian width?