How to Perform Word Lemmatization Using the NLTK Library in NLP?
Overview of Word Lemmatization Using the NLTK Library in NLP
Description: Word lemmatization reduces an inflected word to its base dictionary form, known as the lemma, taking the word's part of speech into account. For example, "feet" becomes "foot", and "hanging" used as a verb becomes "hang". The nltk library provides lemmatization through WordNetLemmatizer, which looks words up in the WordNet lexical database.
Step-by-Step Process
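Install and Import NLTK: Install the library (for example, pip install nltk), download the data packages that the tokenizer, POS tagger, and lemmatizer depend on, and import word_tokenize, pos_tag, wordnet, and WordNetLemmatizer.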
Using WordNetLemmatizer: The lemmatize(word, pos) method looks the word up in WordNet and returns its lemma. The pos argument defaults to "n" (noun), and a word for which WordNet finds no lemma is returned unchanged.
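Lemmatization with POS: Because the default is noun, verb, adjective, and adverb forms are often left unchanged unless the correct tag is supplied. WordNet uses its own POS constants (wordnet.NOUN, wordnet.VERB, wordnet.ADJ, wordnet.ADV, which are the strings "n", "v", "a", "r"), so the Penn Treebank tags produced by pos_tag (such as NNS or VBG) must be mapped to them first.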
Sample Code
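```python
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

# Download tokenizer, tagger, and WordNet data (up-to-date packages are skipped).
# Recent NLTK releases use "punkt_tab" and "averaged_perceptron_tagger_eng";
# older releases use "punkt" and "averaged_perceptron_tagger".
for resource in ["punkt", "punkt_tab", "averaged_perceptron_tagger",
                 "averaged_perceptron_tagger_eng", "wordnet"]:
    nltk.download(resource, quiet=True)

lemmatizer = WordNetLemmatizer()

# Map Penn Treebank POS tags (from pos_tag) to WordNet POS constants
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN  # Default to noun (determiners, pronouns, punctuation, ...)

# Text to process
text = "The striped bats are hanging on their feet for best."

# Tokenize and POS-tag
tokens = word_tokenize(text)
tagged_tokens = pos_tag(tokens)

# Lemmatize each word with its mapped POS tag
lemmatized_words = [
    lemmatizer.lemmatize(word, get_wordnet_pos(tag)) for word, tag in tagged_tokens
]

print("Original:", tokens)
print("Lemmatized:", lemmatized_words)
```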