SMEP: Nonlinear Models (GSOC)
Divyanshu Bandil (Josef Perktold) GSOC 2012
Status: unfinished, not merged, nonlinear least squares needs mainly cleanup
Statsmodels currently implements linear and robust linear regression models, which use the explanatory variables, the response variable and covariance matrices to estimate the parameter array beta by least squares. The proposed project aims to extend this framework to nonlinear regression models. More specifically, the completed project would allow users to fit a dataset to a nonlinear model function and estimate the parameter array, closely following the current implementations of the Ordinary Least Squares (OLS), Weighted Least Squares (WLS) and Robust Linear Model (RLM) models. In the process, advanced robust estimators for nonlinear regression, such as the MM-estimator and the tau-estimator, will be introduced in statsmodels.
Discussion on the statsmodels mailing list has resulted in the following suggestions:
- Try to reuse parts of existing models for nonlinear regression modelling
- Improve the curve fitting algorithms for the nonlinear case
- Implement advanced robust estimators, especially MM-estimator and tau-estimator mentioned in the research paper
- Write an exhaustive test suite for verifying the results
- Previous work on nonlinear models in this branch
The relevant thread on the mailing list is here
The following steps give an overview of the execution plan:
Specify the Python class for nonlinear models and the attributes needed to store the model specification. One important attribute would be the nonlinear model function of the explanatory variables and the parameters to be estimated, which represents the regression equation.
Provide an appropriate fitting method for nonlinear models. The most helpful starting point would be the scipy.optimize.curve_fit function, which is based on the nonlinear least squares approach and uses the popular Levenberg–Marquardt algorithm. The nonlinls.py module in statsmodels also tries to implement it. Note: an important research paper I went through, which transforms the nonlinear least squares problem into a linear one, is here.
Develop a nonlinear results class derived from the RegressionResults class.
Integrate and test the above machinery using the OLS and WLS models. Some candidate nonlinear functions for testing can be found here.
Move on to robust models and integrate the RLM class with the nonlinear models class to implement M-estimation for robust nonlinear models.
Implement the robust MM-estimator and tau-estimator for nonlinear regression. The paper mentioned above uses the differential evolution (DE) technique. A Python implementation of the DE algorithm is provided in pyeq2. However, I believe it may not be necessary, as SAS implements the MM-estimator by estimating the scale with an iterative algorithm, followed by the usual IRLS to calculate the final MM-estimate. The relevant SAS manual page is here.
Write a test suite for the advanced robust estimators introduced above.
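As a rough illustration of the first four steps, the sketch below wraps scipy.optimize.curve_fit in a small model class and a results container. The names NonlinearLS, NonlinearResults, expfunc, start_params and weights are hypothetical and chosen only for this example; they are not the classes in nonlinls.py or the final statsmodels API. The weights argument is an assumption about how a WLS-style fit could be mapped onto curve_fit's sigma parameter.

```python
from dataclasses import dataclass

import numpy as np
from scipy.optimize import curve_fit


@dataclass
class NonlinearResults:
    """Hypothetical results container (not the statsmodels RegressionResults)."""
    params: np.ndarray
    cov_params: np.ndarray
    resid: np.ndarray

    @property
    def bse(self):
        # Standard errors from the diagonal of the parameter covariance.
        return np.sqrt(np.diag(self.cov_params))


class NonlinearLS:
    """Hypothetical model class storing the data and the model function."""

    def __init__(self, endog, exog, func):
        self.endog = np.asarray(endog, dtype=float)
        self.exog = np.asarray(exog, dtype=float)
        self.func = func  # called as func(exog, *params)

    def fit(self, start_params, weights=None):
        # curve_fit minimizes sum(((y - f) / sigma)**2), so WLS-style
        # weights w correspond to sigma = 1 / sqrt(w).
        sigma = None if weights is None else 1.0 / np.sqrt(np.asarray(weights))
        # Without bounds, curve_fit defaults to Levenberg-Marquardt.
        params, cov = curve_fit(self.func, self.exog, self.endog,
                                p0=start_params, sigma=sigma)
        resid = self.endog - self.func(self.exog, *params)
        return NonlinearResults(params, cov, resid)


def expfunc(x, a, b):
    """Example model function y = a * exp(b * x), used only for illustration."""
    return a * np.exp(b * x)


# Synthetic data from known parameters plus small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
true_params = np.array([2.0, -1.5])
y = expfunc(x, *true_params) + 0.01 * rng.standard_normal(x.size)

res = NonlinearLS(y, x, expfunc).fit(start_params=[1.0, -1.0])
print(res.params, res.bse)
```

Keeping the model function as a plain callable attribute means the same fit method can serve the unweighted and weighted cases, which is one possible way to mirror the OLS/WLS split.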
Community Bonding Period - The time will be used for delving deeper into the solutions available for nonlinear curve fitting. It will mainly consist of going through the available references. Optimization algorithms used in other statistical packages will also be considered. The aim of this period is to survey the common practices for nonlinear models across different statistical packages, as well as new and efficient methods. Based on these, a roadmap will be developed for their implementation in statsmodels.
Week 1-2 - Begin with implementing the NonLinearModel class and related functions and attributes. Simultaneously develop the nonlinear function fitting machinery.
Week 3-4 - Implement the NonLinearModel_Results class. Integrate the developed classes with the OLS and WLS models. Refactor and add comments and docstrings as necessary. Write a test suite for verifying the above classes and the introduced fit methods.
Week 5 - Start with robust models. Integrate the RLM class with the NonLinearModel class and the respective results classes. Test it by calculating the M-estimator for different datasets and candidate nonlinear model functions, and add result statistics to the results classes.
Week 6-8 - Implement the MM-estimator and tau-estimator for robust nonlinear models, preferably using the existing solution framework. Update setup.py and make the nonlinear modelling features fully functional.
Week 9-10 - This period will be used for writing an extensive test suite using datasets similar to those used in other statistical packages. Deviations from the expected results will be documented and investigated to find suitable explanations.
Week 11-12 - Integrate the code into the master branch. Provide documentation on nonlinear modelling and the references used. This period will also serve as a buffer for any unexpected delays.
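The M-estimation step planned for Week 5 could look roughly like the following iteratively reweighted least squares (IRLS) sketch. It is only an illustration under stated assumptions: it uses Huber weights with the conventional tuning constant 1.345 and a MAD scale estimate, and it reuses scipy.optimize.curve_fit for each weighted fit instead of the RLM class. The function names fit_huber_irls, huber_weights, mad_scale and model are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, a, b):
    """Example model function y = a * exp(b * x), used only for illustration."""
    return a * np.exp(b * x)


def huber_weights(scaled_resid, t=1.345):
    """Huber weights: 1 inside [-t, t], t / |r| outside."""
    abs_r = np.abs(scaled_resid)
    w = np.ones_like(abs_r)
    mask = abs_r > t
    w[mask] = t / abs_r[mask]
    return w


def mad_scale(resid):
    """Median absolute deviation, normalized for Gaussian errors."""
    return np.median(np.abs(resid - np.median(resid))) / 0.6745


def fit_huber_irls(func, x, y, p0, maxiter=50, tol=1e-8):
    # Start from the ordinary nonlinear least squares estimate.
    params, _ = curve_fit(func, x, y, p0=p0)
    for _ in range(maxiter):
        resid = y - func(x, *params)
        scale = mad_scale(resid)
        if scale == 0.0:
            break
        w = huber_weights(resid / scale)
        # curve_fit minimizes sum(((y - f) / sigma)**2), so sigma = 1/sqrt(w).
        new_params, _ = curve_fit(func, x, y, p0=params,
                                  sigma=1.0 / np.sqrt(w))
        converged = np.max(np.abs(new_params - params)) < tol
        params = new_params
        if converged:
            break
    return params


# Synthetic data with a few gross outliers added to every tenth point.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 60)
true_params = np.array([2.0, -1.5])
y = model(x, *true_params) + 0.01 * rng.standard_normal(x.size)
y[::10] += 1.0

ols_params, _ = curve_fit(model, x, y, p0=[1.0, -1.0])
robust_params = fit_huber_irls(model, x, y, p0=[1.0, -1.0])
print("least squares:", ols_params)
print("Huber IRLS:   ", robust_params)
```

On data like this, the reweighted estimate should stay much closer to the generating parameters than the plain least squares fit, which is the behaviour the test suite for the robust models would need to check against reference results.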
Weekly progress reports will be added to the following blog: http://nonlinearstatsmodels.blogspot.in/
If the project targets are achieved ahead of schedule, the following extensions are proposed:
- Extend the generalised linear models and discrete models to nonlinear models
- Develop more curve fitting algorithms
- Implement other robust estimators
I am presently studying probability and statistics, which have applications in electronics and communication engineering, especially in the modelling and parameter estimation of different systems. The texts I am referring to are:
- Probability and Statistics for Engineering and the Sciences by Jay L. Devore
- Probability and Statistics in Engineering by William W. Hines, Douglas C. Montgomery, David M. Goldsman, Connie M. Borror
Some references for understanding the statsmodels code and developing the project plan are as follows:
- Statistics
  - Statistical Methods by George Waddel Snedecor, William Gemmell Cochran
  - Statistical Computing by William Jo Kennedy, James E. Gentle
- Econometrics
  - Basic Econometrics by Gujarati
  - Introduction to Econometrics by Christopher Dougherty