GPT tokenizer

Gabriel Gruenberger

ML Engineer
Software Engineer
AI Model Developer
Jupyter
Python
Visual Studio Code

Using byte pair encoding (BPE), I created a tokenizer that converts natural language into a sequence of indices that can be passed into the token embedding table of a language model. BPE is the approach used by OpenAI, and it is useful because the vocabulary size can be adjusted to the individual requirements of the project when training the tokenizer.
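As a rough illustration of the idea (a minimal sketch, not the project's actual implementation): training starts from raw UTF-8 bytes as a 256-token base vocabulary, then repeatedly merges the most frequent adjacent token pair into a new token until the desired vocabulary size is reached.

```python
from collections import Counter

def get_pair_counts(ids):
    # Count occurrences of each adjacent token pair.
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    # Replace every occurrence of `pair` in `ids` with `new_id`.
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, vocab_size):
    # Base vocabulary: the 256 possible byte values.
    ids = list(text.encode("utf-8"))
    merges = {}  # (pair) -> new token id, in training order
    for new_id in range(256, vocab_size):
        counts = get_pair_counts(ids)
        if not counts:
            break
        best_pair = counts.most_common(1)[0][0]
        merges[best_pair] = new_id
        ids = merge(ids, best_pair, new_id)
    return merges

def encode(text, merges):
    # Tokenize new text by applying the learned merges in training order.
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids
```

Raising `vocab_size` trades longer training and a bigger embedding table for shorter token sequences, which is the adjustability mentioned above.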
