How to tokenize words and sentences using NLTK in Python


To write Python code that splits text data into words and sentences using NLTK.


Sample text


Tokenized words
Tokenized sentences


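  Import the nltk library.

  Import word_tokenize() and sent_tokenize() from nltk.tokenize.

  Take sample text data.

  Pass the text to the functions as a plain string; they are not classes, so nothing needs to be constructed or fitted.

  Tokenize the words and sentences in the text data.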

Sample Code

from nltk.tokenize import sent_tokenize, word_tokenize

# The tokenizers need the Punkt data. If a LookupError is raised, run
# nltk.download("punkt") once (recent NLTK releases ask for "punkt_tab").

# sample text
sample_text = "Python is a scripting language. Also used as a general purpose language"
print("Original text\n", sample_text, "\n")

# tokenize the words
word_token = word_tokenize(sample_text)
print("After word tokenizing\n", word_token, "\n")

# tokenize the sentences
sent_token = sent_tokenize(sample_text)
print("After sentence tokenizing\n", sent_token)
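
For intuition about what the two functions return, here is a rough approximation that uses only the standard library. It is a sketch, not how NLTK works internally: sent_tokenize() relies on the pre-trained Punkt model, while this version just splits on punctuation with regular expressions. The names naive_sent_tokenize and naive_word_tokenize are made up for this illustration and are not part of NLTK.

```python
import re


def naive_sent_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace (illustrative only)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def naive_word_tokenize(text):
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


sample_text = "Python is a scripting language. Also used as a general purpose language"

# A list of sentence strings
print(naive_sent_tokenize(sample_text))

# A flat list of word and punctuation tokens
print(naive_word_tokenize(sample_text))
```

The sentence splitter gives a list with two strings, and the word splitter gives a flat list in which the full stop is its own token. A naive splitter like this breaks on abbreviations such as "Dr." or "e.g.", which is the kind of case the NLTK tokenizers are designed to handle better.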

