How to Extract Multiple Keyphrases Using the Ranking Method in KeyBERT?
Condition for Extracting Multiple Keyphrases Using the Ranking Method in KeyBERT
Description: KeyBERT extracts keyphrases by embedding the input text and each candidate phrase with a pre-trained transformer model such as BERT. It then ranks the candidates by the semantic similarity between their embeddings and the embedding of the input text. The keyphrase_ngram_range parameter sets the minimum and maximum number of words per candidate, so the same call can return single words or multi-word phrases. Each returned keyphrase carries a similarity score, which makes it easy to identify the most relevant ones.
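To make the effect of keyphrase_ngram_range concrete, here is a minimal pure-Python sketch of candidate generation. It is not KeyBERT's own code: the helper name candidate_phrases and the four-word stop list are assumptions made for illustration, and the real stop_words='english' list is much larger.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch;
# the real English stop-word list is far longer).
STOP_WORDS = {"is", "a", "of", "that"}

def candidate_phrases(text, ngram_range=(1, 3)):
    """Return the unique word n-grams of `text`, in order of first appearance."""
    # Lowercase, keep alphabetic tokens, and drop stop words before building n-grams.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
    low, high = ngram_range
    seen, phrases = set(), []
    for n in range(low, high + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase not in seen:
                seen.add(phrase)
                phrases.append(phrase)
    return phrases

sentence = "Machine learning is a method of data analysis that automates analytical model building."
candidates = candidate_phrases(sentence, ngram_range=(1, 3))
print(len(candidates), "candidates, for example:", candidates[:3])
```

Because stop words are dropped before the n-grams are formed in this sketch, a phrase such as "method data analysis" can appear even though "of" sat between the words in the original sentence. Narrowing the range to (1, 1) leaves only single words.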
Step-by-Step Process
Input Sentence: The input sentence provides the context for generating keyphrases. The sample sentence used here contains concepts such as "machine learning", "data analysis", and "analytical model".
Keyphrase Extraction:
Tokenization: Breaks the sentence into smaller units (words or phrases) based on the specified n-gram range.
Embedding: Generates embeddings for both the sentence and the candidate phrases using a pre-trained transformer model like BERT.
Similarity Scoring: Compares the embedding of each candidate phrase with the embedding of the sentence and assigns a relevance score.
Ranking: Sorts the phrases by their similarity scores in descending order.
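The similarity scoring and ranking steps can be sketched without any model at all. In the example below the three-dimensional vectors are made up purely for illustration (real embeddings come from the transformer and have hundreds of dimensions), and the function names cosine and rank_candidates are hypothetical helpers, not part of KeyBERT.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(doc_vec, candidate_vecs, top_n=5):
    """Score every candidate against the document vector and sort descending."""
    scored = [(phrase, cosine(doc_vec, vec)) for phrase, vec in candidate_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Toy vectors, invented for this sketch only.
doc_vec = [1.0, 1.0, 0.0]
candidate_vecs = {
    "building": [0.0, 0.1, 1.0],
    "data analysis": [0.8, 0.3, 0.2],
    "machine learning": [1.0, 0.9, 0.1],
}

ranked = rank_candidates(doc_vec, candidate_vecs, top_n=2)
for phrase, score in ranked:
    print(f"{phrase}: {score:.4f}")
```

With these toy vectors, "machine learning" points in almost the same direction as the document vector and is ranked first, while "building" is nearly orthogonal to it and falls outside the top two.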
Output Keyphrases: The extracted keyphrases include multi-word phrases relevant to the input sentence, each paired with its relevance score.
Sample Code
from keybert import KeyBERT
# Initialize the KeyBERT model
kw_model = KeyBERT()
# Input sentence or document
sentence = "Machine learning is a method of data analysis that automates analytical model building."
# Extract keyphrases (allowing phrases of 1 to 3 words)
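keyphrases = kw_model.extract_keywords(
    sentence,
    keyphrase_ngram_range=(1, 3),  # candidates of 1 to 3 words
    stop_words='english',          # drop common English stop words
    use_maxsum=True,               # diversify results with Max Sum Distance
    nr_candidates=20,              # candidate pool size used by use_maxsum
    top_n=5                        # number of keyphrases to return
)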
# Display the keyphrases with their scores
print("Extracted Keyphrases:")
for phrase, score in keyphrases:
    print(f"{phrase}: {score}")
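The sample sets use_maxsum=True, which means the result is not a plain top-5 by similarity: the idea is to shortlist the nr_candidates phrases most similar to the document and then pick the top_n of them that are least similar to one another. The sketch below illustrates that idea in pure Python. It is an approximation for explanation, not KeyBERT's implementation; the function name max_sum_select and every similarity value are made up.

```python
from itertools import combinations

def max_sum_select(doc_sims, pair_sims, top_n=2, nr_candidates=3):
    """Pick top_n phrases from the shortlist with the least mutual similarity."""
    # Step 1: shortlist the phrases most similar to the document.
    shortlist = sorted(doc_sims, key=doc_sims.get, reverse=True)[:nr_candidates]

    # Step 2: among all top_n-sized groups, keep the least redundant one.
    def redundancy(group):
        return sum(pair_sims[frozenset(pair)] for pair in combinations(group, 2))

    best = min(combinations(shortlist, top_n), key=redundancy)
    # Report the chosen phrases with their document similarity, highest first.
    return sorted(((p, doc_sims[p]) for p in best), key=lambda x: x[1], reverse=True)

# Invented similarity values, for illustration only.
doc_sims = {"machine learning": 0.70, "learning": 0.65,
            "data analysis": 0.60, "building": 0.10}
pair_sims = {
    frozenset(("machine learning", "learning")): 0.90,
    frozenset(("machine learning", "data analysis")): 0.30,
    frozenset(("learning", "data analysis")): 0.35,
    frozenset(("building", "machine learning")): 0.05,
    frozenset(("building", "learning")): 0.05,
    frozenset(("building", "data analysis")): 0.08,
}

selected = max_sum_select(doc_sims, pair_sims, top_n=2, nr_candidates=3)
print(selected)
```

A plain ranking of these toy scores would return "machine learning" and "learning", which overlap heavily. The diversity step swaps "learning" for "data analysis". If a strict similarity ranking is wanted instead, leaving use_maxsum at its default of False should give that behavior.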