Using character classes and POSIX character classes in regular expressions in R
A character class is a set of characters enclosed in square brackets [ ].
It matches any single character listed inside the brackets, and it can be used in conjunction with quantifiers.
A caret (^) as the first character inside the brackets negates the class, so that it matches any character not listed.
| Pattern | Description |
| --- | --- |
| [aeiou] | Matches any lowercase vowel |
| [AEIOU] | Matches any uppercase vowel |
| [0123456789] | Matches any digit |
| [0-9] | Matches any digit (range form of the above) |
| [a-z] | Matches any lowercase letter |
| [A-Z] | Matches any uppercase letter |
| [a-zA-Z0-9] | Matches any letter or digit |
| [^aeiou] | Matches any character except a lowercase vowel |
| [^0-9] | Matches any character except a digit |
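As a minimal sketch of the patterns in the table, the snippet below applies a few of them with base R's `grep()`. The `words` vector is invented for illustration, and the `^` and `$` anchors outside the brackets (start and end of string) are an addition not covered in the table.

```r
# Sample tokens invented for illustration
words <- c("apple", "SKY", "Room101", "2024", "rhythm")

# Tokens containing at least one lowercase vowel
with_vowel <- grep("[aeiou]", words, value = TRUE)

# Class plus quantifier: tokens made up entirely of digits.
# Outside the brackets, ^ and $ anchor the match to the start and end.
all_digits <- grep("^[0-9]+$", words, value = TRUE)

# Negated class: tokens containing at least one non-digit character
non_digit <- grep("[^0-9]", words, value = TRUE)

# Quantifier {2,}: two or more consecutive uppercase letters
caps_run <- grep("[A-Z]{2,}", words, value = TRUE)

print(with_vowel)
print(all_digits)
print(non_digit)
print(caps_run)
```

Note that the caret has two roles here: inside the brackets it negates the class, while outside them it anchors the pattern to the start of the string.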
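POSIX character classes are written inside double brackets, for example [[:digit:]]: the class name [:digit:] sits inside an ordinary bracket expression.
They work the same way as the character classes above.
A caret (^) placed just inside the outer bracket negates the class, as in [^[:digit:]].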
| Pattern | Description |
| --- | --- |
| [[:lower:]] | Matches any lowercase letter |
| [[:upper:]] | Matches any uppercase letter |
| [[:alpha:]] | Matches any letter |
| [[:digit:]] | Matches any digit |
| [[:space:]] | Matches whitespace characters (space, \t, \n, etc.) |
| [[:blank:]] | Matches blank characters (space and tab) |
| [[:alnum:]] | Matches alphanumeric characters ([[:alpha:]] and [[:digit:]]) |
| [[:cntrl:]] | Matches control characters |
| [[:punct:]] | Matches punctuation characters |
| [[:xdigit:]] | Matches hexadecimal digits (0-9, a-f, A-F) |
| [[:print:]] | Matches printable characters: [[:alnum:]], [[:punct:]] and space |
| [[:graph:]] | Matches graphical characters: [[:alnum:]] and [[:punct:]] |
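The POSIX classes in the table can be tried out with a short base R sketch like the one below. The `tokens` vector is invented for illustration. It also shows the negated form and how several class names can share one pair of outer brackets.

```r
# Sample tokens invented for illustration
tokens <- c("R2D2", "hello,", "WORLD", "3.14", "tab\there", "plain")

# Tokens containing a digit, an uppercase letter, punctuation, or whitespace
with_digit <- grep("[[:digit:]]", tokens, value = TRUE)
with_upper <- grep("[[:upper:]]", tokens, value = TRUE)
with_punct <- grep("[[:punct:]]", tokens, value = TRUE)
with_space <- grep("[[:space:]]", tokens, value = TRUE)

# Negation: the caret goes inside the outer bracket, before the class name.
# This finds tokens containing at least one non-alphanumeric character.
non_alnum <- grep("[^[:alnum:]]", tokens, value = TRUE)

# Several class names can be combined in one bracket expression:
# remove everything that is neither a letter nor a digit
cleaned <- gsub("[^[:alpha:][:digit:]]", "", tokens)

print(with_digit)
print(with_upper)
print(with_punct)
print(with_space)
print(non_alnum)
print(cleaned)
```

One reason to prefer these classes over ranges such as [a-z] is that R's own regex documentation describes the interpretation of ranges as locale- and implementation-dependent.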
library(readtext)
library(stringr)
# Read the document; "path/to/file.txt" is a placeholder, replace it with your own file
data <- readtext("path/to/file.txt")
data
# Split the text into tokens on single spaces
data <- unlist(str_split(data$text, " "))
# Match character classes
# Tokens containing a digit
grep(pattern = "[0-9]", x = data, value = TRUE)
# Tokens containing an uppercase letter
grep(pattern = "[A-Z]", x = data, value = TRUE)
# Match POSIX character classes
# Tokens containing a digit
grep(pattern = "[[:digit:]]", x = data, value = TRUE)
# Tokens containing an uppercase letter
grep(pattern = "[[:upper:]]", x = data, value = TRUE)
# Tokens containing a punctuation character
grep(pattern = "[[:punct:]]", x = data, value = TRUE)
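The code above depends on an input file, so here is a self-contained sketch of the same workflow that runs without one. The sample sentence is invented for illustration, and base R's `strsplit()` and `grepl()` stand in for `str_split()` and `grep()`; this is one possible variant, not the only way to do it.

```r
# Sample sentence invented for illustration; it stands in for data$text
text <- "On 12 May, Dr. Smith bought 3 books for 45 euros!"

# Split on single spaces; fixed = TRUE treats " " as a literal string
tokens <- unlist(strsplit(text, " ", fixed = TRUE))

# grepl() returns one TRUE/FALSE per token, which can be used for subsetting
has_digit <- grepl("[[:digit:]]", tokens)
has_upper <- grepl("[[:upper:]]", tokens)
has_punct <- grepl("[[:punct:]]", tokens)

print(tokens[has_digit])
print(tokens[has_upper])
print(tokens[has_punct])
```

Using `grepl()` instead of `grep(..., value = TRUE)` keeps the logical vectors around, which is handy when several conditions need to be combined, for example `tokens[has_upper & has_punct]`.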

