Last answered: 30 May 2025
Posted on: 25 May 2025

Resolved: What is the job of a tokenizer?

Hi there, 

I think I should review and finish the NLP course, but for now I'd like a short answer on why we need a tokenizer every time we practice. Is the model not able to do the task without a tokenizer? What does tokenization contribute?

3 answers (1 marked as helpful)
Instructor
Posted on: 30 May 2025

Hi Abdulrahman!
Thanks for reaching out.
Think of the tokenizer as a bridge between human language and the model’s understanding. A model doesn’t understand raw text directly; it only understands numbers, and the tokenizer turns text into a sequence of numbers (token IDs).
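A minimal sketch of the "text in, numbers out" idea, assuming the simplest possible scheme: a byte-level tokenizer in which every UTF-8 byte is one token, so every ID falls between 0 and 255. The `encode`/`decode` helpers are hypothetical names for illustration; most models use a learned subword vocabulary instead, but the result is the same kind of thing: a list of integers the model can consume.

```python
# Assumption: byte-level tokenization (one token per UTF-8 byte), chosen only
# because it needs no vocabulary file. Real models usually use learned subwords.

def encode(text: str) -> list[int]:
    """Turn text into a sequence of integer token IDs (one per UTF-8 byte)."""
    return list(text.encode("utf-8"))


def decode(ids: list[int]) -> str:
    """Turn token IDs back into text."""
    return bytes(ids).decode("utf-8")


ids = encode("Hi there")
print(ids)          # [72, 105, 32, 116, 104, 101, 114, 101]
print(decode(ids))  # Hi there
```

Decoding reverses the mapping, which is how IDs produced by a model are turned back into readable text.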

So, is the model able to do the task without a tokenizer? The short answer is no, because without it the model wouldn’t know how to read or interpret the input.
Hope this helps.
Best,
Ivan

Posted on: 30 May 2025
Great, thanks for your answer.
But isn't turning text into a sequence of numbers the job of the embeddings?
Instructor
Posted on: 30 May 2025
Hey Abdulrahman!
Partly. 
First, during tokenization, the raw text is split into tokens, which are still pieces of text: words, subwords, or characters, depending on the tokenizer.
Then, the tokenizer maps each of these tokens to a token ID, which is an integer.
Finally, the embedding layer inside the model takes these token IDs and transforms them into vectors: learned numerical representations that capture semantic meaning.
So, you could say the tokenizer and embedding layer work together, but they handle different stages of the transformation.
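The three stages can be sketched end to end in plain Python. Everything here is a toy under stated assumptions: the seven-entry `VOCAB`, the greedy longest-match splitting (modeled loosely on WordPiece, with `##` marking a piece that continues a word), the 4-dimensional random `EMBEDDINGS` table, and the function names are all hypothetical. Stages 1 and 2 are the tokenizer's job; stage 3 happens inside the model.

```python
import random

# Hypothetical toy subword vocabulary (token -> ID). Real tokenizers learn a much
# larger vocabulary from training data; "##" marks a piece that continues a word.
VOCAB = {"[UNK]": 0, "token": 1, "##izer": 2, "##s": 3, "is": 4, "use": 5, "##ful": 6}
EMBED_DIM = 4  # hypothetical; real models use far more dimensions


def tokenize(text: str) -> list[str]:
    """Stage 1 (tokenizer): split text into subword tokens, greedy longest match."""
    tokens = []
    for word in text.split():
        pieces, start = [], 0
        while start < len(word):
            # Try the longest remaining substring first, shrinking until it is in VOCAB.
            for end in range(len(word), start, -1):
                piece = word[start:end] if start == 0 else "##" + word[start:end]
                if piece in VOCAB:
                    pieces.append(piece)
                    start = end
                    break
            else:  # nothing matched: the whole word becomes [UNK]
                pieces = ["[UNK]"]
                break
        tokens.extend(pieces)
    return tokens


def to_ids(tokens: list[str]) -> list[int]:
    """Stage 2 (tokenizer): map each token to its integer ID ([UNK] if missing)."""
    return [VOCAB.get(t, VOCAB["[UNK]"]) for t in tokens]


# Stage 3 (model): the embedding layer is a table with one vector per token ID.
# Random values here; in a real model these rows are learned during training.
rng = random.Random(0)
EMBEDDINGS = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]


def embed(ids: list[int]) -> list[list[float]]:
    """Look up one vector per token ID by row indexing."""
    return [EMBEDDINGS[i] for i in ids]


tokens = tokenize("tokenizers is useful")
ids = to_ids(tokens)
vectors = embed(ids)
print(tokens)                         # ['token', '##izer', '##s', 'is', 'use', '##ful']
print(ids)                            # [1, 2, 3, 4, 5, 6]
print(len(vectors), len(vectors[0]))  # 6 4
```

In PyTorch, stage 3 corresponds to `torch.nn.Embedding(num_embeddings, embedding_dim)`, which takes a tensor of integer token IDs and returns one learned vector per ID.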

Hope this helps.
Ivan
