PropTech Use Case
CEO of newly established FinTech company focused on real estate digitization is looking to establish fair price of real estate (houses, apartments) in a specific area. The user here, CEO, is keen to establish daily pricing summaries. You are to design ML algorithm to deliver on this solution.
Business Example: BrickX
Functional Programming Example Here
The purpose of this Use Case / Story is
(1) to demonstrate number of ML algorithms as used in an experimental setting and
(2) to demonstrate innovation in FinTech and Real Estate Financing and Valuations
(3) to expand on previous discussions related to Regression Analysis.
(4) to showcase the reasons for using ML algorithms --> SHARPER PREDICTIONS!!!
OLS belongs to the ML supervised algorithms
As a start we need to understand the use case. We have requirement to predict value of real estate asset given certain parameters collected from other external sources. Then we need to find a method that is robust and tested to be able to confidently deliver on the solution. We start to analyze all available data sources. In this case we operate with already cleaned and pre-processed data as per below
Data Attributes & Features
For this we choose standard ML database - Boston Housing Data.
CRIM per capita crime rate by town
ZN proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX nitric oxides concentration (parts per 10 million)RM average number of rooms per dwelling
AGE proportion of owner-occupied units built prior to 1940
DIS weighted distances to five Boston employment centres
RAD index of accessibility to radial highways
TAX full-value property-tax rate per USD 10,000
PTRATIO pupil-teacher ratio by townB 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT % lower status of the population
MEDV Median value of owner-occupied homes in USD 1000's
Where MEDV is the median value (y) of real estate asset we are aiming to predict given the above features (X)
It is the Random Forest Model that we choose from model selection as this modeling framework returns to most accurate results and predictions.
MSE train: 1.638, test: 11.067
R^2 train: 0.980, test: 0.877
How did we get here?
Well, simple we start with Python and then Data Extract, Transforms and Load then Data Feature Engineering and then Model Selection and finally Implementation! Find the execution here
Post capturing data sequence and data streams it is time to enhance our understanding of the data source by using visual capability. We look at correlation matrix and detailed visual data breakdown:
Machine Learning Algorithms
Now we are in a position to proceed and look at regressing MEDV value variable across one selected feature - number of rooms in a specific real estate, here we define your variables & model parameters and design as OUR BASE CASE:
ML RANdom SAmple Consensus (RANSAC) Algorithm
Let us explore more advanced ML models here we look at RANSAC Regressors
ML Model Accuracy
It is time to consider how well our models actually are.
ML LASSO Algorithm
Least Absolute Shrinkage and Selection Operator (LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces.
"Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. A decision tree is a simple representation for classifying examples. For this section, assume that all of the inputs have finite discrete domains, and there is a single target feature called the "classification". Each element of the domain of the classification is called a class. A decision tree or a classification tree is a tree in which each internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with an input feature are labeled with each of the possible values of the target feature or the arc leads to a subordinate decision node on a different input feature. Each leaf of the tree is labeled with a class or a probability distribution over the classes, signifying that the data set has been classified by the tree into either a specific class, or into a particular probability distribution (which, if the decision tree is well-constructed, is skewed towards certain subsets of classes)" [x]
ML Random Forest
Random forests or random decision forests are an ensemble learning method for classification, regression and others, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.
MSE train: 1.639, test: 11.063
R^2 train: 0.980, test: 0.878
FINAL PRECISION MEASURES