PropTech Use Case

CEO of newly established FinTech company focused on real estate digitization is looking to establish fair price of real estate (houses, apartments) in a specific area. The user here, CEO, is keen to establish daily pricing summaries. You are to design ML algorithm to deliver on this solution.

Business Example: __BrickX__

Functional Programming Example __Here__

The purpose of this Use Case / Story is

(1) to demonstrate number of ML algorithms as used in an experimental setting and

(2) to demonstrate innovation in FinTech and Real Estate Financing and Valuations

(3) to expand on previous discussions related to Regression Analysis.

(4) to showcase the reasons for using ML algorithms -->** SHARPER PREDICTIONS!!!**

Source: Raschka(2015)

OLS belongs to the ML supervised algorithms

**Problem**

As a start we need to understand the use case. We have requirement to predict value of real estate asset given certain parameters collected from other external sources. Then we need to find a method that is robust and tested to be able to confidently deliver on the solution. We start to analyze all available data sources. In this case we operate with already cleaned and pre-processed data as per below

*Data Attributes & Features*

*Data Attributes & Features*

For this we choose standard ML database - Boston Housing Data.

CRIM per capita crime rate by town

ZN proportion of residential land zoned for lots over 25,000 sq.ft.

INDUS proportion of non-retail business acres per town

CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

NOX nitric oxides concentration (parts per 10 million)RM average number of rooms per dwelling

AGE proportion of owner-occupied units built prior to 1940

DIS weighted distances to five Boston employment centres

RAD index of accessibility to radial highways

TAX full-value property-tax rate per USD 10,000

PTRATIO pupil-teacher ratio by townB 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

LSTAT % lower status of the population

MEDV Median value of owner-occupied homes in USD 1000's

Where MEDV is the median value (y) of real estate asset we are aiming to predict given the above features (X)

**Results**

It is the Random Forest Model that we choose from model selection as this modeling framework returns to most accurate results and predictions.

MSE train: 1.638, test: 11.067

R^2 train: 0.980, test: 0.877

**How did we get here?**

**Well, simple we start with Python and then Data Extract, Transforms and Load then Data Feature Engineering and then Model Selection and finally Implementation! Find the execution **__here__

Post capturing data sequence and data streams it is time to enhance our understanding of the data source by using visual capability. We look at correlation matrix and detailed visual data breakdown:

**Machine Learning Algorithms**

Now we are in a position to proceed and look at regressing MEDV value variable across one selected feature - number of rooms in a specific real estate, here we define your variables & model parameters and design as OUR BASE CASE:

**ML RANdom SAmple Consensus (RANSAC) Algorithm**

Let us explore more advanced ML models here we look at __RANSAC Regressors __

**ML Model Accuracy**

It is time to consider how well our models actually are.

**ML LASSO Algorithm**

Least Absolute Shrinkage and Selection Operator (LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces.

**Decision Tree**

"Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. A decision tree is a simple representation for classifying examples. For this section, assume that all of the inputs have finite discrete domains, and there is a single target feature called the "classification". Each element of the domain of the classification is called a *class*. A decision tree or a classification tree is a tree in which each internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with an input feature are labeled with each of the possible values of the target feature or the arc leads to a subordinate decision node on a different input feature. Each leaf of the tree is labeled with a class or a probability distribution over the classes, signifying that the data set has been classified by the tree into either a specific class, or into a particular probability distribution (which, if the decision tree is well-constructed, is skewed towards certain subsets of classes)" [__x__]

**ML Random Forest**

Random forests or random decision forests are an ensemble learning method for classification, regression and others, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.

MSE train: 1.639, test: 11.063

R^2 train: 0.980, test: 0.878

FINAL PRECISION MEASURES