*Actions have consequences!!!*

Regression analysis is a statistical technique that studies the cause-and-effect relationship between the independent (explanatory/predictor) variables and the dependent (outcome/response) variable.

Cause and effect → one event (the cause) leads to the occurrence of another event (the effect).

The underlying objective of regression is to identify the nature of the cause-effect relationship between the variables. The regression technique is then chosen based on the kind of relationship.

*Let’s first break down: what is meant by linear regression?* Linear regression is one of the oldest, simplest, and most popular regression techniques. It is based on the assumption of a linear relationship between the predictor and outcome variables. In other words, the relationship can be expressed with a first-order equation. Linear regression is originally a statistical technique that machine learning has also adopted as a supervised learning algorithm. …
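To make the "first-order equation" idea concrete, here is a minimal sketch of fitting a straight line `y = slope * x + intercept` by ordinary least squares. The helper name `fit_line` and the data are made up for illustration, and the sketch assumes a single predictor; it is not taken from the article itself.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Co-variation of x and y around their means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Variation of x around its mean
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Illustrative data (an assumption): points that lie exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Because the example points lie exactly on a line, the fitted slope and intercept recover that line; with noisy data the same formulas give the line that minimizes the sum of squared prediction errors.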

Regression analysis is a statistical technique used for unraveling the relationship between independent (explanatory/predictor) variables and dependent (outcome/response) variables. It is used both for understanding the nature of the relationship between the variables and for forecasting. In business terms this is extremely useful, as one can tune the desired output by adjusting the independent variables.

One size doesn’t fit all!

Depending on the conditions and data characteristics, a particular regression model is better suited than others. Regression algorithms try to model the “relationship” between variables. To make precise predictions (that is, to model the data accurately), the algorithms minimize the prediction error. …

Before we start discussing EDA for quantitative variables, let’s quickly recap EDA in general and categorical EDA. EDA includes graphical and statistical techniques for finding the characteristics of a dataset. Although the approach to EDA for categorical variables is slightly different from that for quantitative variables, the underlying reasons are more or less the same.

Part I describes categorical variable EDA in depth.

Sample statistics are computed from the quantitative variables and can be used to paint an initial picture of the population distribution. The distribution of the population is characterised by its central tendency, dispersion, asymmetry, and peakedness. …
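The four characteristics named above can be sketched with a few lines of standard-library Python. This is only an illustration under stated assumptions: the helper `describe` and the sample values are hypothetical, and skewness and kurtosis are computed with the simple moment-based (population) formulas rather than bias-corrected sample estimators.

```python
from statistics import mean, median, pstdev


def describe(values):
    """Summarise central tendency, dispersion, asymmetry, and peakedness."""
    n = len(values)
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (dispersion)
    # Standardised third moment: asymmetry (skewness)
    skewness = sum((v - mu) ** 3 for v in values) / (n * sigma ** 3)
    # Standardised fourth moment minus 3: peakedness relative to a normal curve
    excess_kurtosis = sum((v - mu) ** 4 for v in values) / (n * sigma ** 4) - 3
    return {
        "mean": mu,
        "median": median(values),
        "std": sigma,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


# Made-up sample for illustration
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A positive skewness, as in this sample, indicates a longer right tail; an excess kurtosis near zero indicates peakedness close to that of a normal distribution.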

As the name suggests, EDA is a technique for analysing data by exploring it. EDA is one of the key, indispensable steps in data analysis. It can be understood as a general checkup of the patient (the data) by the doctor (the data enthusiast) before any surgery (analysis, modeling, prediction, classification, etc.).

EDA is done for two major underlying reasons: first, to understand the data, and second, to identify faults or peculiar events (data points) in the dataset. Let’s look in detail at what is meant by understanding the data and by peculiar data points.

**To understand the data characteristics**

Problem statement: Let’s say we have to create a multi-label classifier from raw audio files containing some spoken words.

Before we dive into the nitty-gritty of how to classify a sound fragment, let’s understand the indispensable basic concepts. I will try to keep things simple and conceptually easy to understand.

**Phase 1: Data Exploration**

The first and foremost step is to explore and try to understand the data. For instance, find out what the distribution of labels is, whether there are any noisy files, and what the average duration of the files is. Listen to the audio files that you find a little odd. …
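The exploration checks described above might be sketched as follows. Everything here is an assumption for illustration: the clips are taken to be WAV files, each record is a hypothetical `(label, duration_seconds)` pair, and "odd" is defined arbitrarily as a duration more than twice, or less than half, the average.

```python
import wave
from collections import Counter
from statistics import mean


def wav_duration_seconds(path):
    """Duration of a WAV file in seconds (assumes the clips are WAV)."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def summarize(records, odd_factor=2.0):
    """Summarise (label, duration_seconds) records.

    Returns the label counts, the average duration, and the clips whose
    duration is far from the average and may be worth listening to.
    """
    label_counts = Counter(label for label, _ in records)
    avg_duration = mean(duration for _, duration in records)
    odd_clips = [
        (label, duration)
        for label, duration in records
        if duration > odd_factor * avg_duration
        or duration < avg_duration / odd_factor
    ]
    return label_counts, avg_duration, odd_clips


# Hypothetical records; in practice each duration could come from
# wav_duration_seconds(path) for a file on disk.
records = [("yes", 1.0), ("no", 1.0), ("yes", 1.0), ("up", 5.0)]
label_counts, avg_duration, odd_clips = summarize(records)
print(label_counts.most_common())
print(f"average duration: {avg_duration:.2f} s")
print("clips to listen to:", odd_clips)
```

A duration threshold only flags clips of unusual length; noisy files of normal length still need to be found by listening or by inspecting the signal itself.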
