Naïve Bayes is a probabilistic classification algorithm based on Bayes' theorem. Before getting into the intricacies of Naïve Bayes, we first need to understand Bayes' theorem.

Bayes' theorem states that, if event B has occurred, we can find the probability of event A given B. …
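For reference, Bayes' theorem reads P(A | B) = P(B | A) P(A) / P(B). The sketch below is only an illustration: it uses hypothetical diagnostic-test numbers (not from the original article), computes P(B) with the law of total probability, and then the posterior P(A | B). The function names are illustrative.

```python
def total_probability(p_b_given_a, p_a, p_b_given_not_a):
    """P(B) via the law of total probability over A and not-A."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

def bayes_posterior(p_b_given_a, p_a, p_b):
    """P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: A = "has the condition", B = "test is positive"
p_a = 0.01              # prior P(A)
p_b_given_a = 0.90      # likelihood P(B | A)
p_b_given_not_a = 0.05  # false-positive rate P(B | not A)

p_b = total_probability(p_b_given_a, p_a, p_b_given_not_a)
posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(round(p_b, 4), round(posterior, 3))  # 0.0585 0.154
```

Even with a fairly accurate test, the low prior keeps P(A | B) around 15%. This is the kind of reasoning Naïve Bayes applies to every class.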

Imbalanced datasets are common in real-world problems. In layman's terms, an imbalanced dataset is one in which the classes are unequally distributed. Imbalanced data can cause problems in classification tasks. …
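One way to see the problem is to measure the class distribution and compare it with a model that always predicts the majority class. The following minimal sketch uses hypothetical fraud-detection labels (the data and function names are illustrative, not from the article):

```python
from collections import Counter

def class_distribution(labels):
    """Fraction of samples belonging to each class."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

def majority_baseline_accuracy(labels):
    """Accuracy of a trivial 'model' that always predicts the most frequent class."""
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)

# Hypothetical labels: 95 legitimate transactions, 5 fraudulent ones
labels = ["legit"] * 95 + ["fraud"] * 5
print(class_distribution(labels))          # {'legit': 0.95, 'fraud': 0.05}
print(majority_baseline_accuracy(labels))  # 0.95
```

This baseline scores 95% accuracy but never detects a single fraudulent transaction. That is why accuracy alone is a poor guide on imbalanced data.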

Bag of Words (BoW) converts text into a feature vector by counting the occurrences of words in a document. It does not take the importance of words into account. **Term Frequency-Inverse Document Frequency (TF-IDF)** is based on the Bag of Words (BoW) model, which contains insights about the less…
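The sketch below contrasts the two representations on a toy corpus. It rests on a few assumptions. It uses a naive lowercase whitespace tokenizer. It uses one common TF-IDF variant: tf = count / document length and idf = ln(N / df). Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed, normalized variant by default, so their numbers will differ.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Sorted vocabulary and one word-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

def tf_idf(docs):
    """TF-IDF weights with tf = count / doc length and idf = ln(N / df)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # df: number of documents in which each vocabulary word appears
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for vec in counts:
        length = sum(vec)  # assumes every document has at least one token
        weights.append([c / length * idf[j] for j, c in enumerate(vec)])
    return vocab, weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, bow = bag_of_words(docs)
print(vocab)   # ['cat', 'dog', 'ran', 'sat', 'the']
print(bow[0])  # [1, 0, 0, 1, 1]
_, weights = tf_idf(docs)
print([round(w, 3) for w in weights[1]])  # [0.0, 0.366, 0.0, 0.135, 0.0]
```

BoW gives "the" the same count as any other word. TF-IDF drives it to zero because it appears in every document, while the rarer "dog" gets the largest weight.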

Text data is central to natural language processing (NLP), the field that deals with interaction between humans and machines through natural language. Text data helps in analyzing movie reviews, products through Amazon reviews, and more. But the question that arises here is **how do we deal with text data when building a machine learning model?**

The Receiver Operating Characteristic (ROC) curve is designed for binary classification. However, it can be extended to multiclass classification (for example, by treating each class as one-vs-rest).

In binary classification, when a model outputs probability scores, the simplest approach uses 0.5 as the threshold. If the probability of a query point is greater…
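The idea of a threshold extends naturally to the ROC curve. Instead of fixing 0.5, the curve sweeps the threshold over every distinct score and records the false positive rate (FPR) and true positive rate (TPR) at each step. The minimal sketch below makes a few assumptions. Labels are 0/1 and both classes are present. The 0.5 helper uses a strict "greater than" comparison, mirroring the wording above. The sweep counts a score equal to the threshold as positive so that every score becomes a cut point. The function names are illustrative.

```python
def predict_labels(probabilities, threshold=0.5):
    """Assign class 1 when the probability exceeds the threshold, else class 0."""
    return [1 if p > threshold else 0 for p in probabilities]

def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # a threshold above every score predicts nothing positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(predict_labels(scores))           # [0, 0, 0, 1]
print(roc_points(y_true, scores))       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(roc_points(y_true, scores)))  # 0.75
```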

Accuracy can be deceptive when dealing with imbalanced data. In this blog, we will learn about the confusion matrix and its associated terms, which look confusing but are simple. The confusion matrix, precision, recall, and F1 score give better insight into prediction results than accuracy does. …
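To make these terms concrete, the following minimal sketch counts the four cells of a binary confusion matrix and derives precision, recall, and F1 from them. The labels are hypothetical and the function names are illustrative. Reporting an undefined ratio as 0.0 is one convention among several.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1      # predicted negative, actually positive
        else:
            tn += 1      # predicted negative, actually negative
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    # Undefined ratios are reported as 0.0 here; other conventions exist
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))     # (2, 1, 1, 6)
print(precision_recall_f1(y_true, y_pred))  # each value is roughly 0.667
```

Accuracy on this toy data is 0.8, yet the model misses one of only three positives. Recall surfaces that miss directly.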

A machine learning model is built using training data (which contains both inputs and outputs). Predictions are then made on test data (data the model did not see during training, whose output labels are withheld from the model) using the same model. **But how do you measure the effectiveness of the model?** …
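A common way to measure effectiveness is a hold-out split. You fit on the training part only, predict on the test part, and only then compare those predictions with the held-back test labels. In the sketch below, the split helper and the one-dimensional threshold "model" are hypothetical stand-ins chosen for brevity, not the article's method. Accuracy is used as the example metric.

```python
import random

def train_test_split(X, y, test_ratio=0.25, seed=0):
    """Shuffle indices with a fixed seed and hold out a fraction as the test set."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

def fit_threshold(X_train, y_train):
    """Toy model: midpoint between the largest class-0 x and the smallest class-1 x."""
    max_neg = max(x for x, t in zip(X_train, y_train) if t == 0)
    min_pos = min(x for x, t in zip(X_train, y_train) if t == 1)
    return (max_neg + min_pos) / 2

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 1-D data: label is 1 when x >= 10
X = list(range(20))
y = [int(x >= 10) for x in X]
X_train, X_test, y_train, y_test = train_test_split(X, y)
threshold = fit_threshold(X_train, y_train)    # only training labels are used
y_pred = [int(x > threshold) for x in X_test]  # test labels stay hidden until scoring
print(len(X_train), len(X_test), accuracy(y_test, y_pred))
```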

- INTRODUCTION
- K-DISTANCE AND K-NEIGHBORS
- REACHABILITY DISTANCE (RD)
- LOCAL REACHABILITY DENSITY (LRD)
- LOCAL OUTLIER FACTOR (LOF)
- EXAMPLE
- ADVANTAGES OF LOF
- DISADVANTAGES OF LOF
- CONCLUSION
- REFERENCES

An outlier is a data point that differs from, or lies far away from, the rest of the data points. …
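The outline above builds up to the local outlier factor from k-distance, reachability distance, and local reachability density. The minimal from-scratch sketch below follows those steps. It rests on a few assumptions. Distance is Euclidean. Ties at the k-distance are broken by index instead of admitting every tied point into the neighbourhood, as the original LOF definition does. The data contains no duplicate points. The function names and data are illustrative.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i]; ties broken by index."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]

def local_outlier_factor(points, k):
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]
    # k-distance of p: distance from p to its k-th nearest neighbour
    k_distance = [math.dist(points[i], points[neighbors[i][-1]]) for i in range(n)]

    def reach_dist(p, o):
        # reachability distance of p from o: max(k-distance(o), d(p, o))
        return max(k_distance[o], math.dist(points[p], points[o]))

    # local reachability density: inverse of the mean reachability distance
    # (assumes no duplicate points, so the sum is never zero)
    lrd = [k / sum(reach_dist(p, o) for o in neighbors[p]) for p in range(n)]

    # LOF: average ratio of the neighbours' lrd to the point's own lrd
    return [sum(lrd[o] for o in neighbors[p]) / (k * lrd[p]) for p in range(n)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = local_outlier_factor(points, k=2)
print([round(s, 2) for s in scores])  # [1.0, 1.0, 1.0, 1.0, 13.09]
```

A LOF close to 1 means a point is about as dense as its neighbours. The isolated point at (10, 10) scores far above 1, which marks it as an outlier.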

- What is KNN?
- Working of KNN algorithm
- What happens when K changes?
- How to select appropriate K?
- Limitation of KNN
- Real-world application of KNN
- Conclusion

K-nearest neighbors (KNN) is a supervised machine learning algorithm. The goal of a supervised machine learning algorithm is to learn a function such that f(X) =…
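The basic classification variant of KNN can be sketched in a few lines. It ranks the training points by distance to the query and takes a majority vote among the k closest. The sketch assumes Euclidean distance and an unweighted vote, both choices made for illustration. `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query point
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], query))
    # Collect the labels of the k closest points and take the most common one
    nearest_labels = [y_train[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

X_train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y_train = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X_train, y_train, (2, 2), k=3))  # A
print(knn_predict(X_train, y_train, (8, 7), k=3))  # B
```

When the vote is tied, `Counter.most_common` returns the label it encountered first, which here is the label of the closer neighbour. An odd k avoids ties altogether in binary problems.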