realLifeinfo

Introduction to Regular Expressions: Briefly Explained Concepts

Nerd, Solutions 

A regular expression is a symbol or a group of symbols that represents a text pattern and has a specific meaning to the regular expression engine. Regular expressions can be used for matching, searching, and replacing text.


Regular expressions are supported by many programming languages and computer tools, which use them to match, search, and replace text.

You can run a regular expression in various modes. The mode determines how the regular expression engine behaves when matching and searching for a text pattern.

Common modes include standard mode, /pattern/; global mode, /pattern/g; and case-insensitive mode, /pattern/i.

These modes can be combined to achieve a certain effect. For example, we can combine global mode with case-insensitive mode: /pattern/gi

These are common regular expression modes, and they are widely supported by regular expression engines; however, different engines support different sets of modes.

For example, languages like Perl and PHP support 10 modes, JavaScript supports 5 modes, and Python supports 7 modes. These counts can change between language versions as new modes are added.
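In Python, modes are not written after a closing slash; they are passed as flags to the functions of the built-in re module. The sketch below assumes Python 3. Python has no g flag: re.search stops at the first match, while re.findall returns every match, so re.findall with re.IGNORECASE is roughly the equivalent of /hello/gi. The sample sentence is made up for illustration.

```python
import re

text = "Hello world. HELLO again, hello!"

# Standard mode: case-sensitive, and re.search stops at the first match.
first = re.search("hello", text)

# Python has no g flag; re.findall returns every match instead.
# re.IGNORECASE plays the role of the i mode, so this mirrors /hello/gi.
all_matches = re.findall("hello", text, re.IGNORECASE)

print(first.start())  # 26
print(all_matches)    # ['Hello', 'HELLO', 'hello']
```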


 Regular expression with string literals

We can pass a normal string literal as a regular expression pattern, and the engine will try to match it in the searched string. For example, if we pass the word "hello" as the pattern, the regular expression engine will match it inside the sentence "hello world".
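A minimal Python 3 sketch of a string literal used as a pattern, using the built-in re module: the pattern matches the same characters wherever they appear, and there is no match when the text is absent.

```python
import re

# The literal pattern "hello" matches the same characters in the searched string.
match = re.search("hello", "hello world")
print(match.group())  # hello
print(match.span())   # (0, 5)

# If the literal text does not appear, there is no match.
missing = re.search("hello", "goodbye world")
print(missing)        # None
```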

Regular expression standard mode

Case-sensitive by default:

Regular expressions are case-sensitive by default; this is the behavior when we search for a pattern in standard mode, /pattern/. The pattern only matches text written exactly as described in the pattern. For example, /HELLO/ will only match HELLO in uppercase, not hello in lowercase.
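The same default can be checked in Python 3 with the built-in re module, where calls without flags behave like standard mode and the re.IGNORECASE flag corresponds to the i mode. A minimal sketch:

```python
import re

# Standard mode is case-sensitive: /HELLO/ matches only uppercase HELLO.
upper = re.search("HELLO", "HELLO world")
lower = re.search("HELLO", "hello world")

# Case-insensitive mode (i) corresponds to the re.IGNORECASE flag in Python.
either = re.search("HELLO", "hello world", re.IGNORECASE)

print(upper.group())   # HELLO
print(lower)           # None
print(either.group())  # hello
```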

Non-global:

Regular expressions run in non-global mode by default; this means the engine stops at the earliest, leftmost match. For example, the pattern /oo/ will only match the first oo in the word footoo, not the oo after the t. In global mode, it will match both occurrences of oo in the word.
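The footoo example can be reproduced in Python 3, assuming the built-in re module. re.search behaves like non-global mode and returns only the leftmost match, while re.finditer walks over every match, which is the closest Python equivalent of global mode.

```python
import re

word = "footoo"

# Non-global: re.search returns only the leftmost match.
first = re.search("oo", word)

# Global-style: re.finditer walks over every non-overlapping match.
every = [m.span() for m in re.finditer("oo", word)]

print(first.span())  # (1, 3)
print(every)         # [(1, 3), (4, 6)]
```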

 Metacharacters in Regular Expression

Metacharacters are characters that have special meaning to the regular expression engine. They are used to trigger certain behaviors in the engine.

The same metacharacter can have more than one meaning, depending on the context in which it is used.
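To illustrate how context changes a metacharacter's meaning, here is a small Python 3 sketch using the built-in re module. It uses two metacharacters covered elsewhere in this article: the dash, which defines a range only inside a character set, and the question mark, which makes the previous character optional but starts a look ahead assertion when it follows an opening parenthesis. The sample strings are made up.

```python
import re

# Inside a character set, "-" defines a range...
in_set = re.findall(r"[a-c]", "a-c b")
# ...but outside a set it is an ordinary dash.
outside = re.findall(r"a-c", "a-c b")

# After a character, "?" makes that character optional...
optional = re.findall(r"colou?r", "color colour")
# ...but right after "(" it starts a look ahead assertion.
lookahead = re.findall(r"light(?= house)", "light house, light bulb")

print(in_set)     # ['a', 'c', 'b']
print(outside)    # ['a-c']
print(optional)   # ['color', 'colour']
print(lookahead)  # ['light']
```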

Matching Unicode characters in Regular expression

We can write patterns that match Unicode characters. The syntax depends on the regular expression engine; every engine has its own way of matching Unicode characters.

In languages like Java, JavaScript, .NET, and Ruby, we use the \u syntax followed by the character's four-digit hexadecimal code point. For example, /caf\u00e9/ uses \u00e9 for é.

Here \u introduces the Unicode code point. This pattern will match the word café, with the acute accent, and not cafe without the accent.

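Perl and PHP support the \x syntax, with the code point written in braces: /caf\x{E9}/.

Python and JavaScript support both syntaxes, although in those languages \x takes exactly two hexadecimal digits without braces, as in /caf\xe9/.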
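A minimal Python 3 sketch with the built-in re module, showing that both the \u and the \x spelling of é match the accented word and not the plain one. The accented string is written with an escape so that é is stored as the single code point U+00E9.

```python
import re

accented = "caf\u00e9"  # café, with é stored as the single code point U+00E9
plain = "cafe"

u_pattern = re.compile(r"caf\u00e9")  # \u followed by four hex digits
x_pattern = re.compile(r"caf\xe9")    # \x followed by exactly two hex digits

for pattern in (u_pattern, x_pattern):
    print(bool(pattern.search(accented)), bool(pattern.search(plain)))  # True False
```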

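To match any single user-perceived character in Unicode text, including a letter combined with accent marks, we use \X as a wildcard. For example, /caf\X/ will match both café with the acute accent and cafe without it. Among the engines discussed here, \X is supported by Perl and PHP; JavaScript and Python's built-in re module do not support it.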


Metacharacters in action

Defining a character set

The metacharacters [ and ] begin and end a character set. A character set matches any one of several characters, but only one character is matched at a time. The order of characters inside the set does not matter.
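A small Python 3 sketch of a character set, assuming the built-in re module. The gray/grey pattern is an illustrative choice, not from the original article. It shows that [ae] matches one character from the set, that reordering the set changes nothing, and that two characters from the set in a row are not matched.

```python
import re

# [ae] matches exactly one character: either a or e.
found = re.findall(r"gr[ae]y", "gray grey griy")
# The order of characters inside the set does not matter.
reordered = re.findall(r"gr[ea]y", "gray grey griy")
# Only one character from the set is matched, so "graey" is not a full match.
too_many = re.fullmatch(r"gr[ae]y", "graey")

print(found)      # ['gray', 'grey']
print(reordered)  # ['gray', 'grey']
print(too_many)   # None
```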


 Character ranges:

We can define a range of characters with the dash metacharacter. For example, /[a-z]/ will match any single character from lowercase a to lowercase z.
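Ranges in Python 3's built-in re module behave the same way. In this minimal sketch, with made-up input strings, each range matches one character at a time, and several ranges can share one set.

```python
import re

lowercase = re.findall(r"[a-z]", "aB3z")     # one lowercase letter at a time
digits = re.findall(r"[0-9]", "aB3z")        # the same idea for digits
mixed = re.findall(r"[a-zA-Z0-9]", "a-B_3")  # several ranges in one set

print(lowercase)  # ['a', 'z']
print(digits)     # ['3']
print(mixed)      # ['a', 'B', '3']
```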

Negative character sets:

A negative character set tells the regular expression engine to match any character that is not listed in the set. To define one, we use the caret character.

For example, /[^a-z]/ will match any character that is not a lowercase letter from a to z. For this to work, the caret must be the first character in the character set.
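A minimal Python 3 sketch with the built-in re module. It shows a negated set, and it shows that a caret placed anywhere other than first in the set is treated as a literal ^.

```python
import re

# [^a-z] matches any single character that is NOT a lowercase letter.
not_lower = re.findall(r"[^a-z]", "ab-C1")
# A caret that is not the first character in the set is just a literal ^.
literal_caret = re.findall(r"[a^]", "a^b")

print(not_lower)      # ['-', 'C', '1']
print(literal_caret)  # ['a', '^']
```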

 Shorthand character sets:

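Shorthand character sets help regular expression authors shorten the patterns they need to type by hand:

\d matches a digit (for ASCII text, the same as [0-9])
\w matches a word character: a letter, a digit, or the underscore (for ASCII text, the same as [a-zA-Z0-9_])
\s matches a whitespace character, such as a space, a tab, or a newline
\D, \W, and \S match any character that is not a digit, not a word character, and not whitespace, respectively

For example, /\d\d/ matches two digits in a row, such as 42.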
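A short Python 3 sketch of the digit, word, and whitespace shorthands (\d, \w, \s) and the uppercase negation \D, using the built-in re module. It also uses +, "one or more", from the next section. Note that in Python 3, \d and \w on ordinary strings also match non-ASCII digits and letters unless the re.ASCII flag is passed. The sample text is made up.

```python
import re

text = "Order 66 shipped on\tday 3"

numbers = re.findall(r"\d+", text)            # runs of digits
words = re.findall(r"\w+", "hi_there, you!")  # runs of word characters
normalized = re.sub(r"\s+", " ", text)        # collapse any whitespace to one space
non_digits = re.findall(r"\D", "a1b2")        # the uppercase form negates \d

print(numbers)     # ['66', '3']
print(words)       # ['hi_there', 'you']
print(normalized)  # Order 66 shipped on day 3
print(non_digits)  # ['a', 'b']
```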

Repetition metacharacters

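The repetition metacharacters are:

* matches the preceding item zero or more times
+ matches the preceding item one or more times
? matches the preceding item zero or one time

For example, the pattern /a+/ will match one or more a characters in the string. It will match a, aa, aaa, aaaa, and any other run of one or more a characters.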
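The difference between +, * and ? can be seen in a few lines of Python 3 with the built-in re module. This is a minimal sketch with made-up input strings.

```python
import re

runs = re.findall(r"a+", "caaandy a")             # each run of one or more a
needs_one = re.fullmatch(r"a+", "")               # + requires at least one a
allows_zero = re.fullmatch(r"a*", "")             # * also accepts zero a
optional_s = re.findall(r"https?", "http https")  # ? makes the s optional

print(runs)                     # ['aaa', 'a']
print(needs_one)                # None
print(allows_zero is not None)  # True
print(optional_s)               # ['http', 'https']
```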


 Quantified Repetition

We use quantified repetition to limit how many times a character is allowed to repeat. Quantified repetition metacharacters start with { and end with }. The syntax is {min,max}, written without spaces, where we put the minimum and the maximum number of repetitions allowed. In many engines, including JavaScript's, the minimum must be included.

Examples:

/\d{3}/ matches exactly three digits
/\d{2,4}/ matches two to four digits
/\d{2,}/ matches two or more digits

Quantified repetition metacharacters are important because they can be used to shorten a regular expression pattern. For example, a pattern for matching three digits followed by four digits can be written in two ways:

First syntax, without quantified repetition metacharacters: /\d\d\d\d\d\d\d/

Second syntax, with quantified repetition metacharacters: /\d{3}\d{4}/
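To check that the long and the short form of "three digits followed by four digits" accept the same strings, here is a minimal Python 3 sketch with the built-in re module. It writes them as \d\d\d\d\d\d\d and \d{3}\d{4}, where \d is the digit shorthand. The sample strings are made up. The last line shows that {min,max} takes as many repetitions as it can, up to max.

```python
import re

long_form = re.compile(r"\d\d\d\d\d\d\d")  # seven \d written out
short_form = re.compile(r"\d{3}\d{4}")     # the same with quantifiers
samples = ["5551234", "555123", "5551234x"]

long_results = [bool(long_form.fullmatch(s)) for s in samples]
short_results = [bool(short_form.fullmatch(s)) for s in samples]

# {min,max} is greedy: it takes up to max characters when it can.
ranges = re.findall(r"a{2,3}", "aaaaa")

print(long_results)   # [True, False, False]
print(short_results)  # [True, False, False]
print(ranges)         # ['aaa', 'aa']
```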

Grouping metacharacter

Grouping metacharacters can be used to group regular expression values and characters. The grouping metacharacters are ordinary parentheses (). Grouping lets us apply repetition operators to a whole group, makes expressions easier to read, and captures the group for use in matching and replacing.

Example: /(abc)+/ will match abc, abcabc, abcabcabc, and any other number of consecutive repetitions of abc.
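A minimal Python 3 sketch, assuming the built-in re module. It contrasts repeating a whole group with repeating a single character: without parentheses, + applies only to the c.

```python
import re

group_repeated = re.fullmatch(r"(abc)+", "abcabc")  # + applies to the whole group
char_repeated = re.fullmatch(r"abc+", "abcabc")     # without (), + applies only to c
c_run = re.fullmatch(r"abc+", "abccc")

print(group_repeated is not None)  # True
print(char_repeated)               # None
print(c_run is not None)           # True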

Grouping metacharacters also capture the matched text so it can be referenced later. For example, this can be used to match an HTML-like opening tag together with its matching closing tag.

To access captured data, we use a backslash followed by a number from 1 to 9, which represents the position of the captured group. This is called a back reference.
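A minimal Python 3 sketch of back references with the built-in re module. The tag pattern is a hypothetical, simplified one written for this example. It ignores attributes and nesting, so it is not a real HTML parser. The final line shows captured groups reused in a replacement string.

```python
import re

# Hypothetical, simplified tag pattern: no attributes, no nesting.
tag = re.compile(r"<(\w+)>(.*?)</\1>")

m = tag.search("<b>bold</b>")
print(m.group(1), m.group(2))     # b bold

# \1 must repeat whatever group 1 captured, so mismatched tags fail.
mismatched = tag.search("<b>bold</i>")
print(mismatched)                 # None

# Captured groups can also be used in a replacement string.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)                    # world hello
```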

Positive look ahead assertions

This is used to check what comes ahead in the string; it asserts what ought to come next.

The syntax for a look ahead assertion is /(?=regex)/. Example: /light(?= house)/. This pattern will match the word light when it is immediately followed by house, but will not match the word light followed by bulb. It tells the regular expression engine to look for the word light immediately followed by the value in the look ahead assertion. The text inside the assertion is checked but not included in the match.
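The light house example in Python 3, assuming the built-in re module. Each match reports its text and start position. Only the first light matches, and the matched text is just light, because the look ahead does not consume " house".

```python
import re

text = "light house and light bulb"

# Only the light followed by " house" matches; " house" itself is not consumed.
ahead = [(m.group(), m.start()) for m in re.finditer(r"light(?= house)", text)]

print(ahead)  # [('light', 0)]
```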

 Negative look ahead assertions

This is the opposite of a positive look ahead assertion. Instead of =, we use !; the syntax is /(?!regex)/. We use it the same way as a positive look ahead assertion, but it returns the opposite result.

Example: /light(?! house)/ will match the word light when it is not immediately followed by house. So it will match light in the sentence light bulb, and not light in light house.
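The same sentence run through the negative form in Python 3, assuming the built-in re module. This time only the second light, the one at position 16 before " bulb", matches.

```python
import re

text = "light house and light bulb"

# Only the light that is NOT followed by " house" matches.
not_ahead = [m.start() for m in re.finditer(r"light(?! house)", text)]

print(not_ahead)  # [16]
```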

 Look behind assertions

The metacharacters used for look behind assertions are ?<= for a positive look behind assertion and ?<! for a negative look behind assertion.

Syntax for a positive look behind assertion: (?<=regex)regex

Syntax for a negative look behind assertion: (?<!regex)regex

Example: (?<=base)ball will match ball in baseball and not in football.

And (?<!base)ball will match ball in other words, such as football, but not in baseball.
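Both look behind forms in Python 3, assuming the built-in re module. The sketch prints the start position of each ball that matches. Python's re requires the look behind part to have a fixed width, which base satisfies because it is always four characters; other engines may be less strict.

```python
import re

text = "baseball football"

after_base = [m.start() for m in re.finditer(r"(?<=base)ball", text)]
not_after_base = [m.start() for m in re.finditer(r"(?<!base)ball", text)]

print(after_base)      # [4]
print(not_after_base)  # [13]
```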

You can use one of these websites to learn more and test your regular expression patterns online:

www.regex101.com: this website supports the regular expression flavors of many programming languages, including JavaScript, PCRE (Perl and PHP engines), Java, Python, and Golang.

www.regexpal.com: this website supports JavaScript and PCRE (Perl and PHP engines).
