The syntax consists of two forward slashes enclosing a character pattern, optionally followed by one or more flags.
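As a minimal sketch of this syntax, assuming JavaScript (whose flag set matches the one described in this article), the snippet below writes the same expression as a literal and with the RegExp constructor. The sample string and variable names are illustrative assumptions.

```javascript
// A regex literal has the form /pattern/flags; the flags are optional.
const literal = /welcome/i;

// The same expression built with the RegExp constructor.
const constructed = new RegExp("welcome", "i");

const sample = "Welcome to the tutorial"; // assumed sample text

console.log(literal.test(sample));     // true
console.log(constructed.test(sample)); // true
console.log(literal.source);           // welcome
console.log(literal.flags);            // i
```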
What Are the Flags, Patterns, and Quantifiers?
Flags, patterns, and quantifiers are the building blocks of a regular expression. Let us try to understand them one by one.
- g - Global. Finds all possible matches for the given characters
- i - Ignore case. /[a-z]/i is equivalent to /[a-zA-Z]/
- m - Multiline. ^ and $ are used to match the beginning and end of each line, respectively
- u - Unicode. If this flag is not supported, you must match specific Unicode characters with \uXXXX where XXXX is the character's value in hexadecimal
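- y - Sticky. Matches only at the position stored in the expression's lastIndex property, which makes it possible to find consecutive matches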
In the example presented above, the pattern provided was the word “Welcome”. As you can see, the word is highlighted in the text. The flag ‘g’ ensures that every occurrence of the word ‘Welcome’ in the text is highlighted, not just the first one.
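The effect of the ‘g’ flag can also be sketched in JavaScript. The sample text here is an assumption, not the text used in the original example.

```javascript
const text = "Welcome home. Welcome back. You are welcome."; // assumed sample text

// Without g, match() reports only the first occurrence.
const first = text.match(/Welcome/);
console.log(first[0], first.index); // Welcome 0

// With g, match() returns every occurrence.
// The final lowercase "welcome" is skipped because matching is case-sensitive.
const all = text.match(/Welcome/g);
console.log(all); // [ 'Welcome', 'Welcome' ]
```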
In this example, the flag “i” is used to indicate case insensitivity. Although the pattern is in lowercase, the capitalized string in the text still gets matched.
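A rough JavaScript sketch of the ‘i’ flag, together with the ‘m’ and ‘y’ flags from the list, might look like this. The sample text and variable names are assumptions chosen for illustration.

```javascript
const text = "Welcome home.\nwelcome back."; // assumed sample text

// i: the lowercase pattern also matches the capitalized word.
const anyCase = text.match(/welcome/gi);
console.log(anyCase); // [ 'Welcome', 'welcome' ]

// m: ^ anchors at the start of every line, not only the start of the string.
const startOfString = text.match(/^welcome/gi);
const startOfLine = text.match(/^welcome/gim);
console.log(startOfString.length, startOfLine.length); // 1 2

// y: sticky; a match must begin exactly at lastIndex.
const sticky = /welcome/iy;
sticky.lastIndex = 0;
const atZero = sticky.test(text);  // a match starts at index 0
sticky.lastIndex = 8;              // index of "h" in "home"
const atEight = sticky.test(text); // no match starts at index 8
console.log(atZero, atEight); // true false
```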
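- [a-z] - Finds all the characters from a to z (lowercase only)
- [^a-z] - Finds all the characters that are not lowercase letters from a to z. It selects uppercase letters, digits, and whitespaces as well
- [0-9] - Finds all digits between 0 and 9
- [a-z|0-9] - Finds any lowercase letter, any digit, or a literal “|”. Inside square brackets, “|” is an ordinary character rather than an “or” operator, so [a-z0-9] is enough to match letters and digits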
In this example, all the digits are matched.
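The digit class can be sketched in JavaScript as follows; the sample text is an assumption.

```javascript
const text = "Room 42 is on floor 3"; // assumed sample text

// [0-9] matches one digit at a time; g collects all of them.
const digits = text.match(/[0-9]/g);
console.log(digits); // [ '4', '2', '3' ]
```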
If you observe closely, the uppercase letters are not matched. This is because the flag “i” is not used. All the lowercase letters in the alphabet and digits are matched.
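The following JavaScript sketch combines letter and digit ranges in one class, with and without the “i” flag. It also shows that a “|” inside square brackets is treated as a literal character. The sample text is an assumption.

```javascript
const text = "Room 42|B"; // assumed sample text

// Lowercase letters and digits only; "R", "B", the space and "|" are skipped.
const lowerAndDigits = text.match(/[a-z0-9]/g).join("");
console.log(lowerAndDigits); // oom42

// Inside brackets, | is a literal character and gets matched too.
const withPipe = text.match(/[a-z|0-9]/g).join("");
console.log(withPipe); // oom42|

// Adding the i flag brings in the uppercase letters.
const anyCase = text.match(/[a-z0-9]/gi).join("");
console.log(anyCase); // Room42B
```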
A ‘^’ (caret) symbol placed immediately after the opening square bracket negates the set, so the expression matches any character that is not listed. In the example above, all the lowercase letters are ignored. However, all the digits, whitespaces, and uppercase letters are matched in the text.
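A small JavaScript sketch of a negated set, using an assumed sample text, is given below. It also contrasts the caret inside brackets with the caret used outside brackets as a start anchor.

```javascript
const text = "Go to Room 42"; // assumed sample text

// [^a-z] matches every character that is NOT a lowercase letter.
const negated = text.match(/[^a-z]/g);
console.log(negated); // [ 'G', ' ', ' ', 'R', ' ', '4', '2' ]

// Outside square brackets, ^ is an anchor for the start of the input.
const startsWithGo = /^Go/.test(text);
console.log(startsWithGo); // true
```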
Quantifiers define how many times the preceding character or group must occur. The most commonly used quantifiers are ‘+’, ‘*’ and ‘?’.
- + - Indicates one or more occurrences of the preceding character
- * - Indicates zero or more occurrences of the preceding character
- ? - Indicates zero or one occurrence of the preceding character
In the example shown above, we are looking for the character ‘a’ in the string, and every occurrence of it is highlighted.
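As a hedged illustration in JavaScript, the sketch below uses ‘+’ to find runs of the character ‘a’ and ‘?’ to make a character optional. The sample strings, including the color/colour pair, are assumptions and not part of the original example.

```javascript
const text = "a aa baab"; // assumed sample text

// a+ matches each run of one or more consecutive "a" characters.
const runs = text.match(/a+/g);
console.log(runs); // [ 'a', 'aa', 'aa' ]

// u? makes the "u" optional: zero or one occurrence.
const spellings = "color colour".match(/colou?r/g);
console.log(spellings); // [ 'color', 'colour' ]
```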
Although the quantifier ‘*’ indicates zero or more occurrences of the character ‘a’, it is followed by the character ‘b’. As a result, the letter ‘a’ in the first text does not get matched since it is not followed by the letter ‘b’.
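The behavior described here can be reproduced with a pattern such as a*b. This is an assumed reconstruction of the example, with an assumed sample text.

```javascript
const text = "a ab aab b"; // assumed sample text

// a*b needs a "b"; any number of "a" characters (including none) may precede it.
const matches = text.match(/a*b/g);
console.log(matches); // [ 'ab', 'aab', 'b' ]
// The lone "a" at the start is not matched because no "b" follows it.
```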
In this example, all strings with zero or more occurrences of ‘a’, followed by zero or more occurrences of ‘b’ are matched. Hence, the letter ‘a’ in the first text gets matched.
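A sketch of this case in JavaScript, assuming the pattern a*b* and an assumed sample text, follows. Because both parts may match zero characters, the expression also produces empty matches, which the sketch filters out.

```javascript
const text = "a ab b"; // assumed sample text

// a*b* can match the empty string, so the raw result contains empty entries.
const raw = text.match(/a*b*/g);

// Keep only the non-empty matches.
const nonEmpty = raw.filter((m) => m !== "");
console.log(nonEmpty); // [ 'a', 'ab', 'b' ]
```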
What Are Metacharacters?
Characters with special meaning are known as metacharacters. Given below is a list of common metacharacters and their descriptions.
- . - Finds any single character except a newline or other line terminator
- \w - Matches any word character
- \W - Matches any character that is not a word character
- \s - Matches any whitespace character
- \S - Matches any character that is not a whitespace (tab, spaces, line breaks)
In the above example, all the word characters are matched.
All the characters that are not word characters are matched.
All whitespace characters are matched.
All the digits are matched.
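The metacharacter examples above can be approximated in JavaScript as follows. The sample text is an assumption, and \d is used as the digit shorthand for the last example.

```javascript
const text = "Hi there, 42!"; // assumed sample text

// \w: word characters (letters, digits, underscore).
const wordChars = text.match(/\w/g).join("");
console.log(wordChars); // Hithere42

// \W: everything that is not a word character.
const nonWordChars = text.match(/\W/g);
console.log(nonWordChars); // [ ' ', ',', ' ', '!' ]

// \s: whitespace characters.
const spaces = text.match(/\s/g).length;
console.log(spaces); // 2

// \d: digits, equivalent to [0-9].
const digits = text.match(/\d/g);
console.log(digits); // [ '4', '2' ]

// . : any character except a line terminator.
const anyChar = text.match(/./g).length;
console.log(anyChar); // 13
```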
A Simple Demo
Consider an eight-digit telephone number with a three-digit code at the beginning. It can be written in two ways:
XXX-XXXXXXXX or XXXXXXXXXXX, where X represents a digit.
A single expression can be written that matches both these strings.
Here, the number in the curly brackets indicates the number of digits to match. The presence of the ‘?’ quantifier indicates zero or one occurrence of the preceding token, i.e., the hyphen. Both the strings get matched by the same character pattern.
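One possible expression that fits this description is sketched below in JavaScript. It is an assumed reconstruction, not necessarily the exact expression from the original demo. It assumes that the hyphenated form has no spaces around the hyphen, and it adds ^ and $ anchors so that the whole string must be a phone number. The sample numbers are made up.

```javascript
// Three digits, an optional hyphen, then eight digits.
const phone = /^\d{3}-?\d{8}$/;

const withHyphen = phone.test("123-45678901");  // hyphenated form
const withoutHyphen = phone.test("12345678901"); // plain form
const tooShort = phone.test("123-4567");         // too few digits

console.log(withHyphen, withoutHyphen, tooShort); // true true false
```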