Friday, June 20, 2014

Regex Regular Expressions – Find and Replace Text With Previous Matched Text That Regex Remembers with ( )

Using Regex to Replace Previous Matched Text

But even more useful is that you can use regex to help with replacing text as well. In the example above, you might want to keep the page numbers but replace the word "Page". You could then use something like:
  • Find:
    Page (\d+)
  • Replace:
    This is page number \1
This will replace any occurrence of "Page" followed by a number with "This is page number" followed by the same number:
  • Before:
    Page 418
  • After:
    This is page number 418
The first point to note about this replacement is the use of the regex code (\d+) in the Find box. The \d+ code tells the search to look for one or more digits, as above. The parentheses around the code tell the search to remember whatever text was matched inside them. The other point to note is the use of the regex code \1 in the Replace box, which tells Replace to insert the remembered text wherever \1 appears.
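The same find and replace can be tried outside an editor. The following is a minimal sketch using Python's built-in re module, which also uses \1 in the replacement to refer to the first remembered group. The sample sentence and the variable names are made up for illustration.

```python
import re

# Sample text invented for this illustration.
text = "See Page 418 and Page 7."

# (\d+) remembers the digits; \1 in the replacement inserts them again.
# Raw strings (r"...") keep Python from interpreting the backslashes itself.
result = re.sub(r"Page (\d+)", r"This is page number \1", text)

print(result)  # See This is page number 418 and This is page number 7.
```

Note that every match in the text is replaced, and each replacement gets the digits from its own match.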

Using Regex to Change Formatting

As a further example of regex, this is how you might change the formatting of certain text into chapter headings.
Let’s say that you have an imported HTML file that contains lots of chapter headings, but none of them are marked using the h1 heading tag. Instead they are all marked as paragraphs like this:
<p>CHAPTER 7</p>
Assuming every paragraph like this is a chapter heading, you could use this regex:
  • Find:
    <p>\s*CHAPTER\s*(\d+)\s*</p>
  • Replace:
    <h1>Chapter \1</h1>
That’s quite a lot to digest, but if you look carefully you can see that it’s very similar to the Page number example above in that it’s remembering the digits in the chapter name and using them in the replace.
It's the Find that is the most interesting. It breaks down like this:
  • <p> – Look for a starting paragraph tag.
  • \s* – Regex code to match any amount of white space (blanks, tabs, etc.), including none.
  • CHAPTER – Match the word CHAPTER (Regex is case-sensitive by default).
  • \s* – Regex code to match any amount of white space.
  • (\d+) – Regex code to match one or more digits in a row and remember them.
  • \s* – Regex code to match any amount of white space.
  • </p> – Look for an end paragraph tag.
You could just use a space instead of \s* but \s* is more flexible since it will match any number of blank spaces and tabs, including none at all.
So the results of that search could be:
  • Before:
    <p>  CHAPTER    14</p>
  • After:
    <h1>Chapter 14</h1>
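As a rough check of the chapter heading replacement, here is a short sketch in Python's re module. It assumes the white space codes are written as \s* (zero or more white space characters), so that both the tightly written and the loosely spaced headings match. The sample HTML and the variable names are hypothetical.

```python
import re

# Hypothetical input: two chapter headings marked as paragraphs, plus body text.
html = "<p>  CHAPTER    14</p>\n<p>CHAPTER 7</p>\n<p>Some body text.</p>"

# \s* allows any amount of white space (including none) at each position;
# (\d+) remembers the chapter number for use as \1 in the replacement.
pattern = re.compile(r"<p>\s*CHAPTER\s*(\d+)\s*</p>")

converted = pattern.sub(r"<h1>Chapter \1</h1>", html)

print(converted)
# <h1>Chapter 14</h1>
# <h1>Chapter 7</h1>
# <p>Some body text.</p>
```

The ordinary paragraph is left alone because it does not contain the word CHAPTER followed by digits.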
