CMPUT 201, Winter 1996 - (C) David Southwell

Regular Expressions

Regular expressions are used for pattern matching on strings.

We can use strcmp(char *, char *) for precise string comparison, but very often we would like to be able to find strings which conform to certain constraints.

Regular expressions prescribe a pattern in an array of characters. We can tell the system what our regular expression is, and then ask it if particular strings conform to it.

They are found throughout the UNIX environment, either at the command line (through grep, whose name comes from the ed command g/re/p: globally search for a regular expression and print), or through library calls from within user programs, as we shall see...

Regular Expressions

Some single character matching codes...

    c         any ordinary character c matches itself; use \c if c is a special character
    ^         denotes the start of the line
    $         denotes the end of the line
    .         any single character
    [abc]     a, b or c are matched
    [d-k]     d, e, f, g, h, i, j or k are matched
    [^xyz]    any character not in the set

Operators...

    *         after an expression, matches n duplications of it (n can be 0)
    xy        expression x followed by expression y
    \(...\)   delimits a subexpression
    \n        n : 1-9; matches the same text as the nth subexpression matched earlier on this line

Regular Expressions

UNIX grep

grep ... scans all the characters in the files specified, and prints out each line where there is at least 1 match with the regular expression given.

eg.

    cd /usr/include
    grep '.printf' *.h

Note that you will need to use single quotes ['] if your regular expression includes codes interpreted by the shell before grep sees them.

UNIX is a line oriented OS - nearly all the utilities work on text files, and on a line by line basis.

Regular Expressions

Some examples...

    Supercomputer
    Super .* [13579]
    ^Once upon a time
    [A-Z][a-z]*
    [qQ][^u].*
    \([0-9][0-9]*\)\1
    [^aeiou][aeiuo]*

So, how do we access this from within our C programs?

First, you must set up your environment so that the dynamic linker can find the system libraries for the manipulation of regular expressions...

    LD_LIBRARY_PATH /usr/ucblib
Regular Expressions

The makefile must also be modified so that the appropriate libraries are visible to gcc. Use the flags

    -L/usr/ucblib -lucb

It appears that the function headers are not available from falun. You should use the following to keep the type checker happy:

    char *re_comp(char *);
    int re_exec(char *);

The pattern matching must be very fast, and the raw text form of a regular expression requires some parsing to take it down to a more efficient representation. We use the re_comp() call to do this. The re_exec() call is used to check a string against our regular expression.

Regular Expressions

Example program:

    #include <stdio.h>
    #include <stdlib.h>

    /* Declared by hand, since the headers are not available. */
    char *re_comp(char *);
    int re_exec(char *);

    int main()
    {
        char *return_value;
        char input_string[256];

        /* re_comp() returns NULL on success, or an error message. */
        return_value = re_comp("[A-Z][a-z]*");
        if (return_value != (char *)NULL) {
            puts(return_value);
            exit(1);
        }

        /* Read words for as long as they match the pattern. */
        do {
            if (scanf("%255s", input_string) != 1)
                exit(0);    /* end of input */
        } while (re_exec(input_string) == 1);

        printf("%s wasn't a capitalised name!\n", input_string);
        exit(0);
    }

The loop takes a string from the console, and checks it with the previously submitted regular expression; re_exec() will return a 1 for a match, or a 0 for no match. The loop continues until no match is found, and then the program terminates. The %255s width stops scanf() from overrunning input_string, and the loop also ends if the input runs out.

Note that the pattern is not anchored, so re_exec() reports a match for any string that contains an upper-case letter; ^[A-Z][a-z]*$ would accept only capitalised names.

re_exec() continues to use the initial regular expression until re_comp() is called with a new one.

Where to get these slides

Copies of these notes can be found from my web page:

    http://web.cs.ualberta.ca/~daves

They are in compressed postscript form, so to print them out...

    - download them from the web, by clicking on them.
    - uncompress, with gunzip...
          gunzip 201_dave.ps.gz
    - and print directly to a postscript printer...
          lpr -P 201_dave.ps