Feature Extraction
Neural Narrator
Jun 19, 2024
Series on NLP #4
Most classic ML algorithms can't take raw text as input. Feature extraction solves this: we derive numerical features from the raw text and pass those to the ML algorithm instead.
For example, we can count the occurrences of each word to map a text to numbers.
Let's discuss count vectorization along with term frequency and inverse document frequency.
messages = ["hey, let's go to the game today!", "call your sister", "want to go to walk your dogs"]
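As a rough illustration of what a count vectorizer does with these messages, here is a minimal from-scratch sketch in plain Python. The tokenizer is an assumption: it lowercases the text and keeps runs of two or more word characters, loosely mimicking scikit-learn's default `token_pattern`. The helper names `tokenize` and `count_vectorize` are hypothetical.

```python
import re
from collections import Counter

messages = ["hey, let's go to the game today!", "call your sister", "want to go to walk your dogs"]

def tokenize(text):
    # Assumption: lowercase, then keep tokens of 2+ word characters
    # (so "let's" becomes "let" and the lone "s" is dropped)
    return re.findall(r"\b\w\w+\b", text.lower())

def count_vectorize(docs):
    tokenized = [tokenize(d) for d in docs]
    # Vocabulary: every distinct token, sorted so the column order is deterministic
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        # One row per document, one column per vocabulary term, holding raw counts
        row = [0] * len(vocab)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        matrix.append(row)
    return vocab, matrix

vocab, X = count_vectorize(messages)
print(vocab)
for row in X:
    print(row)
```

Each message becomes a row of 13 counts; for instance, the third message has a 2 in the "to" column because "to" occurs twice in it.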
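An alternative to the count vectorizer is the TF-IDF vectorizer, i.e. the term frequency-inverse document frequency vectorizer.
Let's talk about what TF-IDF means. Term frequency (TF) measures how often a term occurs in a document: the more often a term appears, the higher its weight.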
However, term frequency alone is not enough for a thorough feature analysis of text.
Consider stop words such as "a" or "the".
Because the term "the" is so common, term frequency will tend to incorrectly emphasize documents that happen to use "the" more often, without giving enough weight to more meaningful terms such as "red" or "dogs".
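An inverse document frequency (IDF) factor is therefore incorporated, which diminishes the weight of terms that occur very frequently across the document set and increases the weight of terms that occur rarely.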
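One common way to write this down is shown below, where $f_{t,d}$ is the raw count of term $t$ in document $d$ and $N$ is the number of documents in the collection $D$. This is only one variant: some libraries use raw counts for tf, and scikit-learn by default uses a smoothed idf, $\log\frac{1+N}{1+\mathrm{df}(t)} + 1$.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t,D) = \log \frac{N}{\lvert \{ d \in D : t \in d \} \rvert}, \qquad
\mathrm{tfidf}(t,d,D) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t,D)
```

With this plain idf, a term that appears in every document (as "the" typically does) gets $\mathrm{idf} = \log 1 = 0$, so its tf-idf weight vanishes regardless of how often it occurs.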
How to implement it in code:
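Below is a minimal from-scratch sketch of TF-IDF in plain Python, assuming tf = count / document length and the unsmoothed idf = log(N / df). The tokenizer and the names `tokenize` and `tfidf` are hypothetical choices for illustration, not a library API.

```python
import math
import re
from collections import Counter

messages = ["hey, let's go to the game today!", "call your sister", "want to go to walk your dogs"]

def tokenize(text):
    # Assumption: lowercase, keep tokens of 2+ word characters
    return re.findall(r"\b\w\w+\b", text.lower())

def tfidf(docs):
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(t for toks in tokenized for t in set(toks))
    # Plain idf = log(N / df); a term present in every document gets weight 0
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        # tf = raw count / number of tokens in the document, then scale by idf
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, rows

vocab, X = tfidf(messages)
for term in ("to", "dogs"):
    j = vocab.index(term)
    print(term, [round(row[j], 3) for row in X])
```

In the third message "to" occurs twice and "dogs" only once, yet "dogs" ends up with the higher weight (roughly 0.157 versus 0.116) because "to" also appears in the first message. In practice, scikit-learn's `TfidfVectorizer` is the usual choice; by default it uses raw counts as tf, a smoothed idf, and L2-normalizes each row, so its numbers will differ from this sketch.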