Core Java - Regex


Java provides the java.util.regex package for pattern matching with regular expressions. Java's regular expression syntax is very similar to Perl's and is easy to learn.
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.
The java.util.regex package primarily consists of the following three classes:

  • Pattern Class : A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.

  • Matcher Class : A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.

  • PatternSyntaxException : A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
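
As a minimal sketch of how the three classes fit together, the example below compiles a pattern, obtains a matcher from it, and catches the exception raised by a malformed expression. The class name RegexClassesDemo and the helpers containsMatch and isValidRegex are hypothetical names chosen for illustration; they are not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexClassesDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // static factory, there is no public constructor
        Matcher matcher = pattern.matcher(input);  // a Matcher is always obtained from a Pattern
        return matcher.find();
    }

    // Hypothetical helper: true if the regex compiles without a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Unchecked exception, so catching it is optional.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("\\d+", "order 66")); // true
        System.out.println(isValidRegex("(unclosed"));          // false
    }
}
```

Because PatternSyntaxException is unchecked, the try/catch is only needed when the expression comes from an untrusted source such as user input.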


    Capturing Groups :
    Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
    Capturing groups are numbered by counting their opening parentheses from left to right.

    In the expression ((A)(B(C))), for example, there are four such groups:
  • ((A)(B(C)))
  • (A)
  • (B(C))
  • (C)

    To find out how many groups are present in the expression, call the groupCount method on a Matcher object. The groupCount method returns an int giving the number of capturing groups in the matcher's pattern.
    There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
    Eg:
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class RegexMatches {
            public static void main(String[] args) {
                // String to be scanned to find the pattern.
                String line = "This order was placed for QT3000! OK?";
                // Three groups: a greedy prefix, one or more digits, and the rest.
                String pattern = "(.*)(\\d+)(.*)";
                // Create a Pattern object.
                Pattern r = Pattern.compile(pattern);
                // Now create a Matcher object.
                Matcher m = r.matcher(line);
                if (m.find()) {
                    System.out.println("Found value: " + m.group(0));
                    System.out.println("Found value: " + m.group(1));
                    System.out.println("Found value: " + m.group(2));
                } else {
                    System.out.println("NO MATCH");
                }
            }
        }
    
    
    Output :
    Found value: This order was placed for QT3000! OK?
    Found value: This order was placed for QT300
    Found value: 0

    Group 1 is greedy, so (.*) first takes as much as it can and gives back only the final "0" that (\d+) needs in order to match.
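
    The numbering rule and groupCount can be checked against the ((A)(B(C))) expression discussed above. This is a small sketch; the class GroupNumberingDemo and its groups helper are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Hypothetical helper: returns group 0 through group groupCount() for a
    // full match, or an empty array when the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() excludes group 0, hence the + 1.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + ": " + g[i]);
        }
        // group 0: ABC
        // group 1: ABC
        // group 2: A
        // group 3: BC
        // group 4: C
    }
}
```

    Groups 1 to 4 follow the order of their opening parentheses, and groupCount returns 4 even though five values (including group 0) can be read.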


    Regular Expression Syntax :

    
       S.No	 Subexpression	                                        Matches
       ----  --------------   --------------------------------------------------------------------------------------------------
        1	    ^	            Matches beginning of line.
    
        2	    $	            Matches end of line.
    
        3	    .	            Matches any single character except line terminators. With the DOTALL flag (Pattern.DOTALL or (?s)) it matches line terminators as well.
    
        4	    [...]           Matches any single character in brackets.
    
        5	    [^...]          Matches any single character not in brackets.
    
        6	    \A	            Beginning of entire string.
    
        7	    \z	            End of entire string.
    
        8	    \Z	            End of entire string except allowable final line terminator.
    
        9	    re*	            Matches 0 or more occurrences of preceding expression.
    
        10	    re+	            Matches 1 or more occurrences of preceding expression.
    
        11	    re?	            Matches 0 or 1 occurrence of preceding expression.
    
        12	    re{n}           Matches exactly n number of occurrences of preceding expression.
    
        13	    re{n,}          Matches n or more occurrences of preceding expression.
    
        14	    re{n,m}	    Matches at least n and at most m occurrences of preceding expression.
    
        15	    a|b             Matches either a or b.
    
        16	    (re)            Groups regular expressions and remembers matched text.
    
        17	    (?:re)          Groups regular expressions without remembering matched text.
    
        18	    (?>re)          Matches independent pattern without backtracking.
    
        19	    \w	            Matches word characters.
    
        20	    \W	            Matches nonword characters.
    
        21	    \s	            Matches whitespace. Equivalent to [ \t\n\x0B\f\r].
    
        22	    \S	            Matches nonwhitespace.
    
        23	    \d	            Matches digits. Equivalent to [0-9].
    
        24	    \D	            Matches nondigits.
    
        25	    \A	            Matches beginning of string.
    
        26	    \Z	            Matches end of string. If a newline exists, it matches just before newline.
    
        27	    \z	            Matches end of string.
    
        28	    \G	            Matches point where last match finished.
    
        29	    \n	            Back-reference to capture group number "n".
    
        30	    \b	            Matches word boundaries.
    
        31	    \B	            Matches nonword boundaries.
    
        32	    \n, \t, etc	    Matches newlines, carriage returns, tabs, etc.
    
        33	    \Q	            Escape (quote) all characters up to \E.
    
        34	    \E	            Ends quoting begun with \Q.
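
    A few of the constructs from the table can be tried out with a short program. This is only a sketch: the class SyntaxTableDemo and the fullMatch helper are hypothetical names, and the sample inputs were picked for illustration. Note that every backslash in the table has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class SyntaxTableDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d{3}", "123"));         // true: exactly three digits
        System.out.println(fullMatch("colou?r", "color"));      // true: the 'u' is optional
        System.out.println(fullMatch("(?:ab)+", "ababab"));     // true: non-capturing group, repeated
        System.out.println(fullMatch("(\\w)\\1", "aa"));        // true: \1 refers back to group 1
        System.out.println(fullMatch("\\Q1+1\\E=2", "1+1=2"));  // true: \Q...\E makes '+' literal
        System.out.println(fullMatch("[^0-9]+", "abc"));        // true: negated character class
        System.out.println(fullMatch("a.b", "a\nb"));           // false: '.' skips line terminators
        // With the DOTALL flag, '.' also matches the newline.
        System.out.println(
            Pattern.compile("a.b", Pattern.DOTALL).matcher("a\nb").matches()); // true
    }
}
```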
    
    



    Methods :

    Index methods (rows 1 to 4 below) provide index values that show precisely where the match was found in the input string.

    Study methods (rows 5 to 8 below) review the input string and return a boolean indicating whether or not the pattern is found:
    
       S.No	            Methods	                                        Description
       ---- -----------------------------  -----------------------------------------------------------------------------
        1	public int start()               Returns the starting index of the matched subsequence.
    
        2	public int start(int group)      Returns the start index of the subsequence captured by the given group 
                                             during the previous match operation.
    
        3	public int end()                 Returns the offset after the last character matched.
    
        4	public int end(int group)        Returns the offset after the last character of the subsequence captured 
                                             by the given group during the previous match operation.
    
        5	public boolean matches()         Attempts to match the entire region against the pattern.
    
        6	public boolean find()            Attempts to find the next subsequence of the input sequence 
                                             that matches the pattern.
    
        7	public boolean find(int start)   Resets this matcher and then attempts to find the next subsequence 
                                             that matches the pattern, starting at the specified index.
    
        8	public boolean lookingAt()       Attempts to match the input sequence, starting at the beginning of the region, 
                                             against the pattern.
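
    The group-specific index methods and find(int start) can be combined as in the sketch below. The class IndexMethodsDemo, the groupSpan helper, and the "id=42; id=7" input are hypothetical and exist only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexMethodsDemo {
    // Hypothetical helper: returns {start, end} of the given group for the
    // first match found at or after index 'from', or null when nothing matches.
    static int[] groupSpan(String regex, String input, int from, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find(from)) {  // find(int) resets the matcher, then searches from 'from'
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String input = "id=42; id=7";
        int[] first = groupSpan("id=(\\d+)", input, 0, 1);
        int[] second = groupSpan("id=(\\d+)", input, 5, 1);
        System.out.println(first[0] + ".." + first[1]);    // 3..5
        System.out.println(second[0] + ".." + second[1]);  // 10..11
    }
}
```

    The end index is exclusive, so input.substring(start, end) returns exactly the captured text.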
    
    



    Replacement Methods :
    Replacement methods are useful methods for replacing text in an input string:
    
       S.No	                Methods	                                                    Description
       ---- ------------------------------------------------    ----------------------------------------------------------------------
        1	public String replaceAll(String replacement)	    Replaces every subsequence of the input sequence that matches 
                                                                the pattern with the given replacement string.
    
        2	public StringBuffer appendTail(StringBuffer sb)	    Implements a terminal append-and-replace step.
    
        3	public Matcher appendReplacement(StringBuffer sb,   Implements a non-terminal append-and-replace step.
            String replacement)	
    
        4	public String replaceFirst(String replacement)	    Replaces the first subsequence of the input sequence that matches 
                                                                the pattern with the given replacement string.
    
        5	public static String quoteReplacement(String s)	    Returns a literal replacement String for the specified String. 
                                                                This method produces a String that will work as a literal replacement 
                                                                s in the appendReplacement method of the Matcher class.
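
    In a replacement string, "$" refers to a captured group and "\" escapes the next character, so text that may contain either should be passed through quoteReplacement first. A minimal sketch follows; the class QuoteReplacementDemo and the replaceLiterally helper are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {
    // Hypothetical helper: replaces every match of the regex with the
    // replacement text taken literally, even if it contains '$' or '\'.
    static String replaceLiterally(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Without quoteReplacement, "$5" would be read as a reference to group 5.
        System.out.println(replaceLiterally("Total: PRICE", "PRICE", "$5.00"));
        // prints: Total: $5.00
    }
}
```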
    
    
    start and end Methods :
    The following example counts the number of times the word "java" appears in the input string and reports where each match starts and ends:
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class Test {
            // \b anchors the match to word boundaries, so "javascript" would not count.
            private static final String REGEX = "\\bjava\\b";
            private static final String INPUT = "This is regex expression topic which is present java language...";

            public static void main(String[] args) {
                Pattern p = Pattern.compile(REGEX);
                Matcher m = p.matcher(INPUT); // get a matcher object
                int count = 0;
                while (m.find()) {
                    count++;
                    System.out.println("Match number: " + count);
                    System.out.println("start(): " + m.start());
                    System.out.println("end(): " + m.end());
                    System.out.println("------------------");
                }
            }
        }
    
    
    Output :
    Match number: 1
    start(): 48
    end(): 52
    ------------------

    The input contains the word "java" only once, so a single match is reported.

    matches and lookingAt Methods:
    The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference, however, is that matches requires the entire input sequence to be matched, while lookingAt does not.
    Both methods always start at the beginning of the input string.
    Eg:
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class Test {
            private static final String REGEX = "java";
            private static final String INPUT = "java is a language";

            public static void main(String[] args) {
                Pattern pattern = Pattern.compile(REGEX);
                Matcher matcher = pattern.matcher(INPUT);
                System.out.println("Current REGEX is: " + REGEX);
                System.out.println("Current INPUT is: " + INPUT);
                // lookingAt() only needs a match at the start of the input.
                System.out.println("lookingAt(): " + matcher.lookingAt());
                // matches() needs the whole input to match.
                System.out.println("matches(): " + matcher.matches());
            }
        }
    
    
    Output :
    Current REGEX is: java
    Current INPUT is: java is a language
    lookingAt(): true
    matches(): false

    replaceFirst and replaceAll Methods :
    The replaceFirst and replaceAll methods replace text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences.
    Eg:
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class RegexMatches {
            private static final String REGEX = "dog";
            private static final String INPUT = "The dog says meow. " + "All dogs say meow.";
            private static final String REPLACE = "cat";

            public static void main(String[] args) {
                Pattern p = Pattern.compile(REGEX);
                // get a matcher object
                Matcher m = p.matcher(INPUT);
                // replace every match of "dog" with "cat"
                String result = m.replaceAll(REPLACE);
                System.out.println(result);
            }
        }
    
    
    Output :
    The cat says meow. All cats say meow.
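
    The example above only exercises replaceAll. For comparison, here is a small sketch that runs both methods on the same input; the class ReplaceFirstDemo and its two helpers are hypothetical names used for illustration.

```java
import java.util.regex.Pattern;

class ReplaceFirstDemo {
    // Hypothetical helper: replaces only the first match.
    static String replaceFirstMatch(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceFirst(replacement);
    }

    // Hypothetical helper: replaces every match.
    static String replaceEveryMatch(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "The dog says meow. All dogs say meow.";
        System.out.println(replaceFirstMatch(input, "dog", "cat"));
        // prints: The cat says meow. All dogs say meow.
        System.out.println(replaceEveryMatch(input, "dog", "cat"));
        // prints: The cat says meow. All cats say meow.
    }
}
```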

    appendReplacement and appendTail Methods:
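    The Matcher class also provides appendReplacement and appendTail methods for text replacement. After each successful find, appendReplacement copies the text between the previous match and the current one into the buffer, followed by the replacement; appendTail then copies whatever input remains after the last match.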
    Eg:
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class RegexMatches {
            private static final String REGEX = "see this";
            private static final String INPUT = "Hi hello see this. Hi now you're going to see this java language";
            private static final String REPLACE = "-";

            public static void main(String[] args) {
                Pattern p = Pattern.compile(REGEX);
                // get a matcher object
                Matcher m = p.matcher(INPUT);
                StringBuffer sb = new StringBuffer();
                while (m.find()) {
                    // copy the text before this match, then the replacement
                    m.appendReplacement(sb, REPLACE);
                }
                // copy whatever input remains after the last match
                m.appendTail(sb);
                System.out.println(sb.toString());
            }
        }
    
    
    Output :
    Hi hello -. Hi now you're going to - java language
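
    The append methods are most useful when the replacement has to be computed separately for each match, which replaceAll with a fixed string cannot do. The sketch below doubles every number in a sentence; the class AppendReplacementDemo and the doubleNumbers helper are hypothetical, and the code assumes the numbers are small enough to fit in an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // Hypothetical helper: doubles every run of digits found in the input.
    // Assumes each number fits in an int after doubling.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // The replacement is plain digits, so it needs no quoteReplacement.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        m.appendTail(sb);  // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears"));
        // prints: 6 apples and 24 pears
    }
}
```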
