How to Perform Word Lemmatization Using the NLTK Library in NLP?

Condition for Performing Word Lemmatization Using the NLTK Library in NLP

  • Description:
    Word lemmatization reduces a word to its dictionary base form (the lemma), taking the word's part of speech into account. For example, the verb "running" is reduced to "run". The nltk library provides this through the WordNetLemmatizer class.
Step-by-Step Process
  • Install and Import NLTK:
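    Ensure nltk is installed (for example, with pip install nltk), then import the modules used in the sample code: word_tokenize, pos_tag, wordnet, and WordNetLemmatizer.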
  • Using WordNetLemmatizer:
    The lemmatize() method of WordNetLemmatizer takes a word and an optional part-of-speech (POS) argument and returns the word's base form. If no POS is given, the word is treated as a noun.
  • Lemmatization with POS:
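    For more accurate results, pass the correct POS (e.g., verb, adjective). The tags returned by pos_tag are Penn Treebank tags, so they must first be mapped to the WordNet POS constants.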
Sample Code
  • from nltk.tokenize import word_tokenize
    from nltk import pos_tag
    from nltk.corpus import wordnet
    from nltk.stem import WordNetLemmatizer

    # Note: the tokenizer, POS tagger, and WordNet each need their NLTK data
    # package, downloaded once with nltk.download(...). The resource names
    # vary by NLTK version (e.g. 'punkt' or 'punkt_tab').

    lemmatizer = WordNetLemmatizer()

    # Map Penn Treebank POS tags (from pos_tag) to WordNet POS constants
    def get_wordnet_pos(tag):
        if tag.startswith('J'):
            return wordnet.ADJ
        elif tag.startswith('V'):
            return wordnet.VERB
        elif tag.startswith('N'):
            return wordnet.NOUN
        elif tag.startswith('R'):
            return wordnet.ADV
        else:
            return wordnet.NOUN  # Default to noun

    # Text to process
    text = "The striped bats are hanging on their feet for best."

    # Tokenize the text and tag each token with its POS
    tokens = word_tokenize(text)
    tagged_tokens = pos_tag(tokens)

    # Lemmatize each word using its mapped POS tag
    lemmatized_words = [
        lemmatizer.lemmatize(word, get_wordnet_pos(tag))
        for word, tag in tagged_tokens
    ]

    print("Original:", tokens)
    print("Lemmatized:", lemmatized_words)
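
To make the tag mapping in the sample code concrete, the sketch below applies the same first-letter rule without NLTK. It relies on the WordNet POS constants being single-character strings ('a', 'v', 'n', 'r'). The helper name to_wordnet_pos is hypothetical, and the tagged tokens are hand-written Penn Treebank tags for illustration, not actual pos_tag output.

```python
# WordNet POS codes; in NLTK these are the values of
# wordnet.ADJ, wordnet.VERB, wordnet.NOUN, and wordnet.ADV.
PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def to_wordnet_pos(tag):
    """Map a Penn Treebank tag to a WordNet POS code, defaulting to noun."""
    return PREFIX_TO_WORDNET.get(tag[:1], "n")


# Hand-written (word, tag) pairs, for illustration only.
hand_tagged = [
    ("The", "DT"),
    ("striped", "JJ"),
    ("bats", "NNS"),
    ("are", "VBP"),
    ("hanging", "VBG"),
]

mapped = [(word, to_wordnet_pos(tag)) for word, tag in hand_tagged]
print(mapped)
# [('The', 'n'), ('striped', 'a'), ('bats', 'n'), ('are', 'v'), ('hanging', 'v')]
```

Tags outside the four mapped groups, such as the determiner tag DT, fall back to noun, which matches the default branch of get_wordnet_pos.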
Screenshots
  • Lemmatization.png