MLAI/Preprocessing
-
Feature Scaling | MLAI/Preprocessing | 2020. 1. 18. 21:39
1. Issue Let's explain what feature scaling is and why we need to do it. As you can see, we have two columns, age and salary, that contain numerical values. Focusing on just these two, notice that the variables are not on the same scale: age goes from 27 to 50, while salary goes from 40K to about 90K. So because this age variable and the salary vari..
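A minimal sketch of the two most common ways to put columns like these on the same scale: standardization (mean 0, standard deviation 1) and min-max normalization (range [0, 1]). The excerpt is truncated, so this is not necessarily the method the full post uses. The helper names `standardize` and `normalize` and the middle sample values are hypothetical; only the ranges (age 27 to 50, salary 40K to 90K) come from the text above.

```python
from statistics import mean, pstdev

def standardize(values):
    """Standardization: subtract the mean, divide by the (population) std deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the column is not constant (sigma > 0)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Min-max normalization: rescale so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical rows spanning the ranges mentioned above.
ages = [27, 35, 42, 50]
salaries = [40000, 55000, 70000, 90000]

print(normalize(ages))
print(normalize(salaries))
print(standardize(salaries))
```

After either transformation both columns live in comparable ranges, so salary no longer dominates computations such as the Euclidean distance between two rows simply because its raw numbers are larger.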
-
Categorical Data | MLAI/Preprocessing | 2020. 1. 18. 20:02
1. Overview 2. Description 2.1 Encode Categorical Data Since machine learning models are based on mathematical equations, you can intuitively see that keeping text categorical variables in those equations would cause problems, because we only want numbers in the equations. That's why we need to encode the categorical variables, that is, to encode the text that..
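One common way to turn a text column into numbers is one-hot encoding: each distinct category becomes its own 0/1 column. The sketch below is plain Python under that assumption (the full post may use a library encoder instead); the function name `one_hot_encode` and the country values are hypothetical.

```python
def one_hot_encode(column):
    """Map each category to a 0/1 indicator vector, one position per distinct category."""
    categories = sorted(set(column))  # sorted for a fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in column:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one position is "hot"
        encoded.append(row)
    return categories, encoded

# Hypothetical text column
countries = ["France", "Spain", "Germany", "Spain"]
categories, rows = one_hot_encode(countries)
print(categories)  # ['France', 'Germany', 'Spain']
print(rows)        # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
```

Encoding the categories as 0, 1, 2 instead would also produce numbers, but it would imply an order (Spain > Germany > France) that the data does not have; separate indicator columns avoid that.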
-
Missing Data | MLAI/Preprocessing | 2020. 1. 18. 18:36
1. Overview To start preparing the data so that our machine learning models run correctly, the first problem we have to deal with is missing data in the dataset, which actually happens quite a lot in real life. 2. Description 2.1 Handling Missing Data 2.1.1 Deletion One option is simply to remove the lines that contain missing values, but that can be quite dangerous because imagine t..
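A minimal sketch contrasting deletion with one widely used alternative, mean imputation (replacing each missing value with the mean of its column). The excerpt is cut off before naming an alternative, so mean imputation here is an assumption, not necessarily the post's method. `None` marks a missing value; `drop_missing`, `impute_mean`, and the data rows are hypothetical.

```python
from statistics import mean

def drop_missing(rows):
    """Deletion: keep only rows with no missing (None) values."""
    return [row for row in rows if None not in row]

def impute_mean(rows):
    """Mean imputation: replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        col_means.append(mean(observed))  # assumes each column has at least one value
    return [[col_means[c] if row[c] is None else row[c] for c in range(n_cols)]
            for row in rows]

# Hypothetical [age, salary] rows with two missing entries
data = [[27, 40000], [35, None], [42, 70000], [None, 90000]]

print(drop_missing(data))  # [[27, 40000], [42, 70000]]
print(impute_mean(data))
```

Deletion throws away half of these four rows, including their non-missing values, which illustrates why it can be risky on small datasets; imputation keeps every row at the cost of inserting estimated values.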
-
Term frequency–inverse document frequency (TF-IDF) | MLAI/Preprocessing | 2019. 9. 25. 01:34
1. Overview A document-term or term-document matrix contains the frequencies of the terms that occur in a collection of documents. In a document-term matrix, rows represent documents in the collection and columns represent terms, whereas the term-document matrix is its transpose. 1.1 Motivation We have a large number of documents: books, academic articles, legal documents, websites, etc. A user t..
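A small sketch of the matrices described above: a document-term count matrix (rows = documents, columns = terms), its transpose (the term-document matrix), and TF-IDF weights. TF-IDF has several variants; this one assumes tf = count / document length and idf = log(N / df), an unsmoothed form that may differ from the full post and from library defaults such as scikit-learn's `TfidfVectorizer`. Whitespace tokenization, the helper names, and the three documents are hypothetical simplifications.

```python
import math
from collections import Counter

def document_term_matrix(docs):
    """Rows = documents, columns = terms (raw counts). Tokenizes on whitespace only."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[t] for t in vocab])
    return vocab, matrix

def transpose(matrix):
    """The term-document matrix is the transpose of the document-term matrix."""
    return [list(col) for col in zip(*matrix)]

def tf_idf(matrix):
    """tf = count / doc length, idf = log(N / df); assumes no empty documents."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    weights = []
    for row in matrix:
        length = sum(row)
        weights.append([(row[j] / length) * math.log(n_docs / df[j])
                        for j in range(n_terms)])
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, dtm = document_term_matrix(docs)
tdm = transpose(dtm)
weights = tf_idf(dtm)
print(vocab)  # ['cat', 'dog', 'ran', 'sat', 'the']
print(dtm[0]) # [1, 0, 0, 1, 1]
```

Because "the" appears in every document, its idf is log(3/3) = 0 and its weight is 0 everywhere, while a term found in only one document (such as "dog") gets the highest weight in that document.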