#5, First Floor, 4th Street , Dr. Subbarayan Nagar Kodambakkam, Chennai-600 024 pro@slogix.in

Office Address

  • #5, First Floor, 4th Street Dr. Subbarayan Nagar Kodambakkam, Chennai-600 024 Landmark : Samiyar Madam
  • pro@slogix.in
  • +91- 81240 01111

Social List

How to use character classes and POSIX character classes in regular expression using R?
Description

To use character classes and POSIX character classes in regular expression using R

Process

Character classes :

  Set of characters enclose din a square bracket [ ].

  Matches only the character enclose din the bracket Can be sued in conjunction with the quantifiers.

  A caret (^) ahead of the expression negates the expression

[aeiou] Matches lower case vowels
[AEIOU] Matches upper case vowels
[0123456789] Matches any digits
[0-9] Matches any digits
[a-z] Matches any lower case letter
[A-Z] Matches any upper case letters
[a-zA-Z0-9] Matches any of above classes
[^aeiou] Matches everything except aeiou
[^0-9] Matches everything except digits

POSIX Character classes :

  Enclosed within double brackets [[ ]]

  Works same like character classes

  A caret (^) ahead of the expression negates the expression

[[:lower:]] Matches lower case letter
[[:upper:]] Matches upper case letter
[[:alpha:]] Matches letter
[[:digit:]] Matches digits
[[:space:]] Matches space characters(\t,\n,space etc)
[[:blank:]] Matches blank characters
[[:alnum:]] Matches Alphanumeric characters
[[:cntrl:]] Matches control characters
[[:punct:]] Matches punctuation characters
[[:xdigit:]] Matches hexadecimal digits
[[:print:]] Matches printable characters [[:alpha:]] [[:punct:]] and space
[[:graph:]] Matches Graphical characters comprise [[:alpha:]] and [[:punct:]]
Matches Graphical characters comprise [[:alpha:]] and [[:punct:]]
Sapmle Code

library(readtext)
library(stringr)
data #To split the string using a pattern
data=unlist(str_split(data$text,” “))
#To matche character classes
#To match digit
grep(pattern = “[0-9]”,data,value=T )
#To match any upper case letter
grep(pattern = “[A-Z]”,x = data,value = T)
#To match POSIX Character classes
#To match digit
grep(pattern =”[[:digit:]]”,data,value=T)
#To match any upper case letter
grep(pattern = “[[:upper:]]”,x = data,value = T)
#To match punctuation character
grep(pattern = “[[:punct:]]”,x = data,value = T)

Screenshots