In machine learning and AI, class imbalance is a common issue when handling large datasets for prediction or forecasting. If it is a classification problem with a multiclass target label, best practice is to balance the class labels, which can be done with a variety of distance measures, ratings, and resampling approaches. Detecting outliers in multivariate data is one of the most difficult parts of the data preparation phase, and handling outliers and class imbalance are both important for developing a more generalized model.

Euclidean distance is one of the best-known distance measures for identifying outliers based on their distance from the center of the data. For a single numeric variable, a Z-score serves a similar purpose. In some circumstances, clustering methods may be preferable. Each of these approaches views outliers from a different angle, so a point flagged as an outlier by one method may not be flagged by another. As a result, the distribution of the variables should be considered when selecting a technique, and this is precisely why multiple measures exist. Let's look at how the Euclidean and Mahalanobis distance measures find outliers in multivariate data.
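As a minimal sketch of the two distance-based approaches, the snippet below (using NumPy and SciPy, with synthetic data invented purely for illustration) computes both the Euclidean distance from the centroid and the Mahalanobis distance for each point, then flags outliers with a chi-square cutoff on the Mahalanobis distance:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from scipy.stats import chi2

# Synthetic correlated 2-D data, plus one injected multivariate outlier
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 3]], size=200)
X = np.vstack([X, [8, -8]])  # point that violates the positive correlation

center = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

# Euclidean distance from the centroid: ignores correlation between variables
euclid = np.linalg.norm(X - center, axis=1)

# Mahalanobis distance: accounts for the covariance structure of the data
maha = np.array([mahalanobis(x, center, cov_inv) for x in X])

# Under multivariate normality, squared Mahalanobis distance follows a
# chi-square distribution with p degrees of freedom, so a high quantile
# (here 97.5%) is a common cutoff for flagging outliers
cutoff = np.sqrt(chi2.ppf(0.975, df=X.shape[1]))
outliers = np.where(maha > cutoff)[0]
print("Flagged indices:", outliers)
```

The injected point at index 200 sits against the correlation of the cloud, so its Mahalanobis distance is far larger than the cutoff even though its Euclidean distance is comparable to other edge points.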