More automated checks. Fun with #scrivener and regex

I ran several other Scrivener checks today, once again using the amazingly handy Regular Expression (RegEx) facility to find patterns in my text.

Image (Comma.svg): Oh, comma, why you so pushy?

Punctuation is easy to mess up, especially where spaces are involved, and I find the following patterns very handy. Remember that to do this in Scrivener, you will need to go to the global project search field and set the Operator to RegEx; then you’ll be in the right mode. Here are some handy patterns, with explanations.

  • ^\s+\w
    • This pattern will find any line that has spaces before the first word. The ^ means start at the beginning of the line. The \s+ means you are looking for one or more spaces (\s means space, + means ‘one or more’). \w represents a word character: a-z, A-Z, 0-9 or the _ (underscore) character. This will catch lines such as (ignoring the quotation marks, here used for clarity) “ Bosco said…” and “  Bosco said…”, which can be very hard to pick up from visual inspection.
  • \w,\w
    • This will pick up word characters that are tightly packed around a comma. If you are prone to writing “very,very high” then this will find that for you. (Sometimes this is a legitimate pattern! 100,000 is valid but will be detected by this.)
  • \w\s+,\w
    • This is very similar to the above but now it picks up when you have the space in the wrong place. This will find “very ,very high”.
  • \s+(\w+)\s+(\1)\s
    • This one is more complicated! I’ve mentioned it before but this is an optimised version. Remember that \s means “a space character”, although formally it means any whitespace character, including tab, space, carriage return or new line. Adding + means that \s+ will match “ ” and “   ” and “   newline tabcharacter ”. \w+ means that we are now looking at words of at least one character in length.
    • But what does ( and ) mean? This is a grouping operator and, because we’ve used it, anything that matches this now has a numerical reference. We call this a capturing group because we’ve ‘captured’ that pattern and numbered it! We can refer to this capturing group (our first) with the shorthand \1.
    • Now we can explain the whole pattern. Starting with any number of spaces (but at least one), we look for characters that make up words, stopping when we find a space. Remember this captured group of characters is labelled as \1. By using \1 again, we are saying that we want to find all the situations where we have the same pattern of characters twice in a row, separated by spaces. Once we match that first group of characters, the regex system can then build a pattern where that group is repeated. The magic of regex!
    • This pattern will find “ of of ”, “ it IT ” (when the search is not case sensitive), “ and and ” and also things like “ R R ” if you’re spelling something out. Note that the final \s means the repeated word must be followed by whitespace.
    • If you simplify it to (\w+)\s+(\1), you’ll find all of those patterns plus things like “he heard” (the match is “he he”), “pithy thyme” (“thy thy”) or “puny NYC” (“ny NY”, which will be found unless you use the search option ‘case sensitive’).
  • [ ]{2}
    • There’s a single space between [ and ]. The {2} means that you are looking for exactly two of these in a row. This will find every place where you typed “  ” instead of “ ”. (Some of you like to double space. Many of us do not.)
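
If you want to try these patterns outside Scrivener, here is a minimal sketch in Python. It assumes Python’s built-in re module behaves closely enough to Scrivener’s RegEx search for these simple patterns, which I have not verified in every detail. The labels in PATTERNS and the find_all helper are made up for this example; only the patterns themselves come from the list above.

```python
import re

# Patterns copied from the list above. The labels are hypothetical names
# chosen for this sketch, not anything Scrivener uses.
PATTERNS = {
    "leading spaces": r"^\s+\w",
    "tight comma": r"\w,\w",
    "space before comma": r"\w\s+,\w",
    "repeated word": r"\s+(\w+)\s+(\1)\s",
    "loose repeated word": r"(\w+)\s+(\1)",
    "double space": r"[ ]{2}",
}


def find_all(label, text, ignore_case=True):
    """Return the matched text for every hit of the labelled pattern.

    MULTILINE makes ^ match at the start of each line, not just the
    start of the whole text. ignore_case stands in for leaving the
    'case sensitive' search option switched off (an assumption).
    """
    flags = re.MULTILINE
    if ignore_case:
        flags |= re.IGNORECASE
    return [m.group(0) for m in re.finditer(PATTERNS[label], text, flags)]


print(find_all("leading spaces", "  Bosco said\nHe left"))       # ['  B']
print(find_all("tight comma", "very,very high and 100,000"))     # ['y,v', '0,0']
print(find_all("space before comma", "very ,very high"))         # ['y ,v']
print(find_all("repeated word", "the end of of the road"))       # [' of of ']
print(find_all("loose repeated word", "he heard"))               # ['he he']
print(find_all("loose repeated word", "puny NYC"))               # ['ny NY']
print(find_all("loose repeated word", "puny NYC", ignore_case=False))  # []
print(find_all("double space", "one  two"))                      # ['  ']
```

The “100,000” hit and the “he heard” hit show the false positives mentioned above, so treat every match as something to look at rather than something to fix automatically.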

That’s it for today. Back to editing for me!
