Posted on: 27 Jul 2024

What is tokenization?

What is tokenization in the context of large language models, embeddings, words, etc.?

1 answer (0 marked as helpful)
Instructor
Posted on: 29 Jul 2024

Hey Tonci,


Thank you for reaching out!


Tokenization is the process of breaking text down into smaller units called 'tokens'. There are many tokenization models and algorithms; in the context of OpenAI's models, which use byte pair encoding (BPE), one token corresponds on average to roughly 3/4 of an English word, or about 4 characters.
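

For example, here is a minimal sketch in Python using OpenAI's open-source tiktoken library (this assumes you have installed it with pip install tiktoken; the sample sentence is just an illustration):

import tiktoken

# Load the BPE encoding used by recent OpenAI models (e.g., GPT-4)
encoding = tiktoken.get_encoding("cl100k_base")

text = "Tokenization breaks text into smaller units called tokens."
token_ids = encoding.encode(text)  # list of integer token IDs

print(len(token_ids))  # how many tokens the sentence uses
print([encoding.decode([i]) for i in token_ids])  # the text piece behind each ID

Running this shows that common words map to a single token, while rarer words are split into several sub-word pieces, which is where the 3/4-of-a-word average comes from.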


You can also explore this process interactively on OpenAI's tokenization platform: https://platform.openai.com/tokenizer


Kind regards,

365 Hristina
