How to Extract Multiple Keyphrases using the Ranking Method in KeyBERT?

Extract Multiple Keyphrases using the Ranking Method in KeyBERT

Conditions for Extracting Multiple Keyphrases using the Ranking Method in KeyBERT

  • Description:
    KeyBERT generates keyphrases by embedding the input text and a set of candidate phrases with a pre-trained transformer model such as BERT. It then measures the semantic similarity between the input text and each candidate phrase and ranks the candidates by relevance. The keyphrase_ngram_range parameter controls candidate length, so the same call can return single-word or multi-word keyphrases that best represent the text. Each keyphrase is returned with its score, which makes it easy to identify the most relevant ones.
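To make the effect of keyphrase_ngram_range concrete, here is a minimal sketch of n-gram candidate generation in plain Python. The helper name candidate_ngrams is hypothetical and is not part of KeyBERT; the library builds its candidates with a vectorizer and can also filter stop words, which this sketch leaves out.

```python
def candidate_ngrams(text, ngram_range):
    """Return the unique word n-grams of `text` for every n in the inclusive range."""
    # Naive tokenization (an assumption): lowercase, drop periods, split on whitespace.
    words = text.lower().replace(".", "").split()
    low, high = ngram_range
    phrases = []
    for n in range(low, high + 1):
        for start in range(len(words) - n + 1):
            phrase = " ".join(words[start:start + n])
            if phrase not in phrases:
                phrases.append(phrase)
    return phrases


text = "Machine learning automates analytical model building."
unigrams = candidate_ngrams(text, (1, 1))
up_to_trigrams = candidate_ngrams(text, (1, 3))

print(unigrams)
print(len(up_to_trigrams), "candidates for range (1, 3)")
```

A wider range produces more and longer candidates, which is why (1, 3) can surface phrases such as "analytical model building" that (1, 1) cannot.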
Step-by-Step Process
  • Input Sentence:
    The input sentence provides the context for generating keyphrases.
    This sentence contains concepts like "machine learning", "data analysis", and "analytical model".
  • Keyphrase Extraction:
    Tokenization: Breaks the sentence into smaller units (words or phrases) based on the specified n-gram range.
    Embedding: Generates embeddings for both the sentence and candidate phrases using a pre-trained transformer model like BERT.
    Similarity Scoring: Compares the embeddings of each candidate phrase with the embedding of the sentence and assigns a relevance score.
    Ranking: Sorts the phrases by their similarity scores in descending order.
  • Output Keyphrases: The extracted keyphrases include multi-word phrases relevant to the input sentence, each paired with its similarity score.
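The scoring and ranking steps above can be sketched without any model download. This is an illustration only: toy_embed is a hypothetical stand-in that uses bag-of-words counts instead of transformer embeddings, so the scores reflect word overlap rather than semantic similarity. The helper names and the candidate list are assumptions made for the example.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a transformer embedding: a bag-of-words count vector."""
    return Counter(text.lower().replace(".", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(document, candidates, top_n):
    """Score every candidate against the document and keep the top_n by score."""
    doc_vec = toy_embed(document)
    scored = [(c, cosine(toy_embed(c), doc_vec)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending similarity
    return scored[:top_n]


document = "Machine learning is a method of data analysis."
candidates = ["machine learning", "data analysis", "method", "weather forecast"]
ranked = rank_candidates(document, candidates, top_n=3)

for phrase, score in ranked:
    print(f"{phrase}: {score:.4f}")
```

Swapping toy_embed for real sentence embeddings leaves the rest of the pipeline unchanged: embed, compare with cosine similarity, sort, and truncate to top_n.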
Sample Code
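  • from keybert import KeyBERT
    # Initialize the KeyBERT model (uses the library's default embedding model)
    kw_model = KeyBERT()
    # Input sentence or document (adjacent string literals are joined into one string)
    sentence = (
        "Machine learning is a method of data analysis "
        "that automates analytical model building."
    )
    # Extract keyphrases (allowing phrases of 1 to 3 words)
    keyphrases = kw_model.extract_keywords(
        sentence,
        keyphrase_ngram_range=(1, 3),  # candidate phrases of 1 to 3 words
        stop_words='english',          # drop common English stop words
        use_maxsum=True,               # diversify results with Max Sum Distance
        nr_candidates=20,              # shortlist size considered when use_maxsum=True
        top_n=5                        # number of keyphrases to return
    )
    # Display the keyphrases with their scores
    print("Extracted Keyphrases:")
    for phrase, score in keyphrases:
        print(f"{phrase}: {score}")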
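The sample sets use_maxsum=True with nr_candidates=20, which changes how the final keyphrases are picked from the ranked list. As a rough sketch of the idea, and not KeyBERT's actual implementation: the nr_candidates phrases most similar to the document are shortlisted, and from that shortlist the group of top_n phrases that are least similar to one another is kept. The function max_sum_select and all similarity values below are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def max_sum_select(doc_sim, pair_sim, nr_candidates, top_n):
    """Shortlist by document similarity, then keep the least mutually similar group."""
    shortlist = sorted(doc_sim, key=doc_sim.get, reverse=True)[:nr_candidates]

    def redundancy(group):
        # Sum of pairwise similarities inside the group; lower means more diverse.
        return sum(pair_sim[frozenset(pair)] for pair in combinations(group, 2))

    best = min(combinations(shortlist, top_n), key=redundancy)
    # Sorted by document similarity here for readability (a choice made for this sketch).
    return sorted(((p, doc_sim[p]) for p in best), key=lambda x: x[1], reverse=True)


# Hypothetical similarity of each candidate phrase to the document.
doc_sim = {
    "machine learning": 0.80,
    "learning method": 0.75,
    "data analysis": 0.70,
    "weather": 0.10,
}
# Hypothetical similarity between pairs of shortlisted phrases.
pair_sim = {
    frozenset(("machine learning", "learning method")): 0.90,
    frozenset(("machine learning", "data analysis")): 0.30,
    frozenset(("learning method", "data analysis")): 0.40,
}

selected = max_sum_select(doc_sim, pair_sim, nr_candidates=3, top_n=2)
print(selected)
```

With these numbers, plain ranking would return "machine learning" and "learning method", which overlap heavily; the diversity step swaps the second one for "data analysis". A larger nr_candidates gives the selection more room to diversify, at the cost of checking more combinations.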
Screenshots
  • Multiple_keyphase