Using quantifiers in regular expressions with R
Quantifiers control how many times an item is repeated, so they determine the length of the resulting match.
A quantifier applies only to the item immediately to its left.
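The short R sketch below compares `ab+` with the grouped pattern `(ab)+` to show that a quantifier binds only to the item on its left. The input vector `x` is made-up sample data.

```r
# Hypothetical sample strings
x <- c("ab", "abb", "abab")

# "ab+": the + applies only to "b", so this means "a" followed by one or more "b"
plus_on_b <- regmatches(x, regexpr("ab+", x))
plus_on_b      # "ab" "abb" "ab"

# "(ab)+": the parentheses make "ab" a single item, so + repeats the whole group
plus_on_group <- regmatches(x, regexpr("(ab)+", x))
plus_on_group  # "ab" "ab" "abab"
```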
Quantifiers can be combined with metacharacters, sequences, and character classes to build complex patterns.
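As a small illustration, the pattern below puts a quantifier on a character class (`[0-9]{3}`) and another on an escape sequence (`\\d{4}`). The phone-style strings are hypothetical sample data. R's default regex engine accepts `\d` as an extension.

```r
# Hypothetical sample strings
x <- c("call 555-1234 now", "no number here", "ext 12-34")

# [0-9] repeated exactly 3 times, a literal "-",
# then the digit sequence \\d repeated exactly 4 times
pattern <- "[0-9]{3}-\\d{4}"

has_number <- grepl(pattern, x)
has_number   # TRUE FALSE FALSE
found <- regmatches(x, regexpr(pattern, x))
found        # "555-1234"
```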
Greedy quantifiers: By default, quantifiers are greedy. In the pattern .*, the * repeats . (any character) as many times as possible and backs off only as far as needed for the rest of the pattern to match.
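Non-greedy quantifiers: Adding ? after a quantifier, as in .*?, makes it non-greedy (lazy). It repeats as few times as possible, so the match stops at the first point where the rest of the pattern succeeds. Note that .? on its own is not non-greedy; it simply matches zero or one character.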
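The sketch below contrasts the two behaviours on a hypothetical HTML-like string. It uses `perl = TRUE` (the PCRE engine) so that the lazy `.*?` follows well-known semantics. It also checks that `.?` is only an optional single character.

```r
html <- "<b>bold</b> and <i>italic</i>"   # hypothetical sample text

# Greedy: .* runs as far as it can, so the match ends at the LAST ">"
greedy <- regmatches(html, gregexpr("<.*>", html, perl = TRUE))[[1]]
greedy    # "<b>bold</b> and <i>italic</i>"

# Lazy: .*? stops at the FIRST ">" that lets the pattern succeed
lazy <- regmatches(html, gregexpr("<.*?>", html, perl = TRUE))[[1]]
lazy      # "<b>" "</b>" "<i>" "</i>"

# By contrast, .? is an ordinary optional item: zero or one character
optional <- grepl("^<.?>$", c("<>", "<b>", "<br>"))
optional  # TRUE TRUE FALSE
```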
| Symbol | Meaning |
|---|---|
| . | Matches any single character except a newline |
| ? | The item to its left is optional and is matched at most once |
| * | The item to its left is matched zero or more times |
| + | The item to its left is matched one or more times |
| {n} | The item to its left is matched exactly n times |
| {n,} | The item to its left is matched n or more times |
| {n,m} | The item to its left is matched at least n times but not more than m times |
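The quantifiers in the table can be checked side by side on a small set of hypothetical words. The `^` and `$` anchors force the whole word to match, which keeps each count exact.

```r
# Hypothetical sample words
words <- c("color", "colour", "colouur", "gd", "god", "good", "goood")

q_optional <- grepl("^colou?r$", words)   # ?     : "u" zero or one time
q_star     <- grepl("^go*d$", words)      # *     : "o" zero or more times
q_plus     <- grepl("^go+d$", words)      # +     : "o" one or more times
q_exact    <- grepl("^go{2}d$", words)    # {n}   : "o" exactly 2 times
q_atleast  <- grepl("^go{2,}d$", words)   # {n,}  : "o" 2 or more times
q_range    <- grepl("^go{1,2}d$", words)  # {n,m} : "o" 1 or 2 times

words[q_range]  # "god" "good"
```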
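library(readtext)
library(stringr)
# `data` is assumed to hold a document read with readtext(), which returns a data frame with a `text` column
data
# Split the text into individual words on single spaces
data <- unlist(str_split(data$text, " "))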
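The readtext input file is not shown, so this sketch substitutes a hypothetical sentence for `data$text`. It uses base R's `strsplit()` in place of stringr's `str_split()`, and with a single-space pattern both split the text into words.

```r
# Hypothetical stand-in for data$text
text <- "the engineer is singing in the green inn"

# strsplit() returns a list (one element per input string),
# so unlist() flattens it into a plain character vector of words
words <- unlist(strsplit(text, " "))
words
# "the" "engineer" "is" "singing" "in" "the" "green" "inn"
```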
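# Greedy: .* grabs as many characters as possible, so the match runs to the last "1"
number <- "1015001601981357"  # keep as a string; a numeric would be converted by as.character() and may be rounded or shown in scientific notation
regmatches(number, gregexpr(pattern = "1.*1", text = number))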
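# Non-greedy (lazy): ? after * makes it match as few characters as possible, stopping at the next "1"
regmatches(number, gregexpr(pattern = "1.*?1", text = number, perl = TRUE))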
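For reference, here are the greedy and lazy results on the same digit string, stored so they can be compared directly. `perl = TRUE` is a choice made here so the lazy quantifier runs on the PCRE engine.

```r
number <- "1015001601981357"   # a character string, not a numeric

greedy <- regmatches(number, gregexpr("1.*1", number, perl = TRUE))[[1]]
lazy   <- regmatches(number, gregexpr("1.*?1", number, perl = TRUE))[[1]]

greedy  # "1015001601981"  (first "1" through the last "1")
lazy    # "101" "1601"     (each match stops at the next "1")
```

Once the lazy search has taken "101" and "1601", the remaining text contains only one more "1", so no third match is possible.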
# Match items zero or more times (this can also return empty matches)
regmatches(number, gregexpr(pattern = "1*", text = number))
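Because `*` allows zero repetitions, `1*` can succeed without consuming anything. The sketch below compares it with `1+` on the same digit string to make those empty matches visible.

```r
number <- "1015001601981357"

# "1*" may match zero characters, so positions without a "1" yield empty strings
zero_or_more <- regmatches(number, gregexpr("1*", number))[[1]]
# "1+" requires at least one "1", so only real matches are returned
one_or_more  <- regmatches(number, gregexpr("1+", number))[[1]]

zero_or_more[zero_or_more != ""]  # the five non-empty matches: "1" "1" "1" "1" "1"
one_or_more                       # "1" "1" "1" "1" "1"
```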
# Match items one or more times: "i" followed by one or more "n"
grep(pattern = "in+", x = data, value = TRUE)
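The original text data is not shown, so this sketch runs the same `in+` pattern on a hypothetical word vector. It shows both the whole words that `grep(value = TRUE)` returns and the substrings that actually matched.

```r
# Hypothetical sample words
words <- c("the", "engineer", "is", "singing", "in", "the", "green", "inn")

# grep(value = TRUE) returns the whole words that contain a match ...
hits <- grep("in+", words, value = TRUE)
hits     # "engineer" "singing" "in" "inn"

# ... while regmatches() + regexpr() shows the first matched part of each
matched <- regmatches(hits, regexpr("in+", hits))
matched  # "in" "in" "in" "inn"
```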
# Match exactly n times: two consecutive "e"s
grep(pattern = "e{2}", x = data, value = TRUE)
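Because `grep()` looks for the pattern anywhere in a string, `e{2}` also matches words that contain three or more "e"s in a row. One way to require a run of exactly two is to forbid an "e" on either side with lookarounds. That is a PCRE feature, so the sketch uses `perl = TRUE`. The words are hypothetical sample data.

```r
words <- c("green", "engineer", "three", "eee", "red")  # hypothetical

# "e{2}" finds two consecutive "e"s anywhere, so "eee" matches too
anywhere <- grep("e{2}", words, value = TRUE)
anywhere   # "green" "engineer" "three" "eee"

# (?<!e) and (?!e): no "e" immediately before or after the pair
exact_run <- grep("(?<!e)e{2}(?!e)", words, value = TRUE, perl = TRUE)
exact_run  # "green" "engineer" "three"
```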
# Match n or more times: one or more "e"s (equivalent to "e+")
grep(pattern = "e{1,}", x = data, value = TRUE)
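On a hypothetical word vector, `e{1,}` gives exactly the same result as `e+`. Raising n to 2 keeps only the words that contain a run of at least two "e"s.

```r
words <- c("the", "engineer", "is", "singing", "in", "the", "green", "inn")  # hypothetical

one_plus <- grep("e{1,}", words, value = TRUE)
same_as_plus <- identical(one_plus, grep("e+", words, value = TRUE))
same_as_plus  # TRUE: {1,} is the same as +
one_plus      # "the" "engineer" "the" "green"

# With a larger n, {n,} keeps only words that have a run of at least n "e"s
two_plus <- grep("e{2,}", words, value = TRUE)
two_plus      # "engineer" "green"
```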
# Match at least n times but not more than m times, {n,m}
grep(pattern = "in{1,2}", x = data, value = TRUE)
grep(pattern = "in{2,3}", x = data, value = TRUE)
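The upper bound in `{n,m}` limits how much text one match consumes. It does not exclude strings that contain a longer run, because the shorter match is still found inside them. The sketch below shows this on hypothetical words and uses `^...$` anchors when the whole word must fit the range.

```r
words <- c("is", "in", "inn", "innn")  # hypothetical

one_two   <- grep("in{1,2}", words, value = TRUE)     # "in" "inn" "innn"
two_three <- grep("in{2,3}", words, value = TRUE)     # "inn" "innn"
anchored  <- grep("^in{1,2}$", words, value = TRUE)   # "in" "inn"

# The matched text itself never exceeds m = 2 "n"s
matched <- regmatches(words, regexpr("in{1,2}", words))
matched  # "in" "inn" "inn"
```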