In this project, I evaluate the prices of houses in the suburbs of Boston using machine learning. The machine learning is implemented in Python 3, and the model used to produce the results is a Decision Tree Regressor.
Machine learning is a subfield of artificial intelligence, within computer science, that often uses statistical techniques to give computers the ability to "learn" from data (i.e., progressively improve performance on a specific task) without being explicitly programmed.
Decision tree regression: a decision tree builds regression or classification models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.
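To make the "breaks down a dataset into smaller subsets" step concrete, here is a minimal pure-Python sketch of a single split: it searches one feature for the threshold that minimizes the total squared error of the two resulting subsets, then uses each subset's mean as the leaf prediction. This is a simplified illustration under stated assumptions, not the project's code: it assumes squared error as the split criterion (scikit-learn's default for `DecisionTreeRegressor`), handles only one feature and one level (a "stump"), and the helper names `sse`, `best_split`, `fit_stump` and `predict` are hypothetical. A full regression tree repeats this search over every feature and recursively on each subset.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, total_sse) for the best split 'x <= threshold', or None."""
    pairs = sorted(zip(x, y))
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


def fit_stump(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Fit a depth-1 tree: one decision node and two leaf nodes (leaf = mean target)."""
    split = best_split(x, y)
    if split is None:  # nothing to split on: a single leaf predicting the mean
        mean = sum(y) / len(y)
        return float("inf"), mean, mean
    threshold, _ = split
    left = [t for v, t in zip(x, y) if v <= threshold]
    right = [t for v, t in zip(x, y) if v > threshold]
    return threshold, sum(left) / len(left), sum(right) / len(right)


def predict(stump: Tuple[float, float, float], value: float) -> float:
    """Route a value through the decision node to a leaf's prediction."""
    threshold, left_mean, right_mean = stump
    return left_mean if value <= threshold else right_mean


if __name__ == "__main__":
    # Hypothetical toy data: rooms per dwelling vs. price in $1000s.
    rooms = [4, 5, 6, 7, 8]
    prices = [10, 12, 20, 30, 32]
    stump = fit_stump(rooms, prices)
    print(stump)  # (6.5, 14.0, 31.0)
    print(predict(stump, 7.2))  # 31.0
```

On this toy data the best threshold is 6.5 rooms: houses at or below it fall in a leaf predicting 14.0, the rest in a leaf predicting 31.0.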
The following package/library versions were used when developing this program:
- Python version 3.7.1
- scikit-learn version 0.20.1
- numpy version 1.15.4
- pandas version 0.23.4
- matplotlib version 3.0.2
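A quick way to compare a local environment against the versions listed above is a small standard-library script. This is a hedged sketch rather than part of the original project: the names `EXPECTED`, `installed_versions` and `report` are hypothetical, it relies on each package exposing a `__version__` attribute, and a package that is not installed is reported as such instead of raising an error.

```python
import importlib
import platform

# Versions listed above; scikit-learn is imported under the name "sklearn".
EXPECTED_PYTHON = "3.7.1"
EXPECTED = {
    "sklearn": "0.20.1",
    "numpy": "1.15.4",
    "pandas": "0.23.4",
    "matplotlib": "3.0.2",
}


def installed_versions(modules):
    """Map each module name to its __version__, or None if it cannot be imported."""
    found = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def report():
    """Build one line per component: installed version vs. expected version."""
    lines = [f"python: {platform.python_version()} (expected {EXPECTED_PYTHON})"]
    for name, version in installed_versions(EXPECTED).items():
        lines.append(f"{name}: {version or 'not installed'} (expected {EXPECTED[name]})")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```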
The Boston data frame has 506 rows and 14 columns. It contains the following columns:
- crim: per capita crime rate by town.
- zn: proportion of residential land zoned for lots over 25,000 sq.ft.
- indus: proportion of non-retail business acres per town.
- chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
- nox: nitrogen oxides concentration (parts per 10 million).
- rm: average number of rooms per dwelling.
- age: proportion of owner-occupied units built prior to 1940.
- dis: weighted mean of distances to five Boston employment centres.
- rad: index of accessibility to radial highways.
- tax: full-value property-tax rate per $10,000.
- ptratio: pupil-teacher ratio by town.
- black: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.
- lstat: lower status of the population (percent).
- medv: median value of owner-occupied homes in $1000s.
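The `black` column is the only one defined by a formula, so a worked version of it may help when interpreting that feature. The sketch below (the function name `black_index` is hypothetical) simply evaluates 1000(Bk - 0.63)^2 for a proportion Bk between 0 and 1. Because the expression is quadratic, it ranges from 0 (at Bk = 0.63) up to 396.9 (at Bk = 0), and two different proportions on either side of 0.63 map to the same column value, so Bk cannot be uniquely recovered from the column.

```python
def black_index(bk: float) -> float:
    """Value of the 'black' column for a proportion bk: 1000 * (bk - 0.63) ** 2."""
    if not 0.0 <= bk <= 1.0:
        raise ValueError("bk must be a proportion between 0 and 1")
    return 1000 * (bk - 0.63) ** 2


if __name__ == "__main__":
    for bk in (0.0, 0.53, 0.63, 0.73, 1.0):
        print(bk, round(black_index(bk), 2))
    # 0.53 and 0.73 give the same value (10.0): the mapping is not one-to-one.
```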
This information is accurate to the best of my knowledge.