Resolved: What is the job of a tokenizer?
Hi there,
I think I should review the NLP course and complete it, but I'd like a short answer on why we need a tokenizer every time we practice. Is the model not able to do the task without a tokenizer? What does tokenization contribute?
3 answers (1 marked as helpful)
Hi Abdulrahman!
Thanks for reaching out.
Think of the tokenizer as a bridge between human language and the model's understanding. A model doesn't understand raw text directly. It only works with numbers, and the tokenizer turns text into a sequence of numbers (token IDs).
So, is the model able to do the task without a tokenizer? The short answer is "no": without it, the model wouldn't know how to read or interpret the input.
Hope this helps.
Best,
Ivan
Great and thanks for your answer.
Isn't turning text into a sequence of numbers the job of the embeddings?
Hey Abdulrahman!
Partly.
First, during tokenization, the raw text is split into tokens, which are still parts of the text — like words, subwords, or characters, depending on the tokenizer.
Then, the tokenizer maps these tokens to token IDs — which are numbers.
Finally, the embedding layer takes these token IDs and transforms them into vectors — numerical representations that capture semantic meaning.
So, you could say the tokenizer and embedding layer work together, but they handle different stages of the transformation.
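To make the three stages concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any particular library does it: the vocabulary, the whitespace splitting, the `[UNK]` fallback, and the randomly filled embedding table are all assumptions made for the example. Real tokenizers usually learn a subword vocabulary from data, and real embedding tables are learned during training.

```python
import random

# Hypothetical toy vocabulary mapping each token to an integer ID.
VOCAB = {"[UNK]": 0, "tokenizers": 1, "split": 2, "text": 3}
EMBEDDING_DIM = 4  # assumed vector size, chosen only for the example

random.seed(0)
# Stand-in for a learned embedding table: one vector per token ID.
EMBEDDING_TABLE = [
    [random.uniform(-1, 1) for _ in range(EMBEDDING_DIM)] for _ in VOCAB
]

def tokenize(text):
    """Stage 1: split raw text into tokens (still pieces of text)."""
    return text.lower().split()

def to_ids(tokens):
    """Stage 2: map each token to its ID; unknown tokens fall back to [UNK]."""
    return [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]

def embed(token_ids):
    """Stage 3: look up one vector per token ID (the embedding layer's job)."""
    return [EMBEDDING_TABLE[i] for i in token_ids]

tokens = tokenize("Tokenizers split text")
token_ids = to_ids(tokens)
vectors = embed(token_ids)

print(tokens)                          # ['tokenizers', 'split', 'text']
print(token_ids)                       # [1, 2, 3]
print(len(vectors), len(vectors[0]))   # 3 4
```

Notice that the first two functions belong to the tokenizer and only the last one corresponds to the embedding layer, which never sees the original text, only the IDs.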
Hope this helps.
Ivan