2020年6月17日 星期三

Machine Learning Foundations NLP(1): Tokenization for Natural Language Processing

Tokenization for Natural Language Processing


Tokenize is mean to break down a sentence to server work, for example:

I have a car. -> I / have / a/ car

Tensorflow provide a tokenize tool:Tokenizer, it can easily to use for tokenize input sentence.


Code:



Result:

{'have': 1, 'a': 2, 'i': 3, 'he': 4, 'apple': 5, 'bike': 6, 'pen': 7, 'car': 8}



Reference:

Machine Learning Foundations: Ep #8 - Tokenization for Natural Language Processing



沒有留言:

張貼留言

Linux driver: How to enable dynamic debug at booting time for built-in driver.

 Dynamic debug is useful for debug driver, and can be enable by: 1. Mount debug fs #>mount -t debugfs none /sys/kernel/debug 2. Enable dy...