Search and Replace Quoted Strings in YAML

Regex capture groups and $n replacement references to rewrite text in VS Code

I've been knee-deep in OpenAPI specs in my current role as an API Product Manager. While the work is fulfilling, some aspects of onboarding a development team to the OpenAPI specification are very mundane, particularly the formatting of YAML documents. YAML is wonderfully readable when certain conventions are followed. Without getting into the firestorm over single quotes ('), double quotes (") or no quotes at all (you can read more about that here), I keep reviewing documents where our development teams diverge from each other in how they document our APIs. Certainly, part of my wish for consistency is selfish: I have thousands of specs to review for our internal developer ecosystem, and I'm trying my best to automate as much of the review work as possible.

OK, let's get to it: I have documents exceeding 10,000 lines, with hundreds of enumerations throughout. I needed a quick and easy way to clean up the excessive quotes wrapping enum strings, because they look hideous!

I typically review OpenAPI docs in VS Code because, first and foremost, it's my favorite editor, and secondly, its plethora of extensions makes 10,000-line documents much easier to read.

Some of my faves:
  • Indent-Rainbow
  • Spectral API Linter
  • Prettier

Start with ctrl + f (cmd + f on macOS) and select the .* icon, or toggle regex mode with alt + r (option + cmd + r on macOS).

There is probably a caveat here about using example(s) in OpenAPI, since there are a few different ways to represent them depending on the version. I'm using example at the schema level, so there is no array-style example for my regex to overwrite.

We start with an OpenAPI document similar to this Animal example:

openapi: 3.0.3
info:
  description: This is a great representation of an Animal
  version: '1.0.0'
  title: Animal
  contact:
    name: 'jeremy'
    email: 'hola@jeremyfiel.com'
paths:
  /endpoint:
    get:
      ...
components:
  schemas:
    Animal:
      description: Some type of living creature
      type: object
      required:
        - breed
      properties:
        breed:
          type: string
          enum:
            - 'GERMAN SHEPARD'
            - 'BEAGLE'
            - 'RETRIEVER'
          example: 'BEAGLE'

I'm looking for every quoted enum value in the file.

The formatting is:

  • a line that starts with one or more spaces or tabs
  • then a dash (-)
  • followed by a single space, marking an item in the enum array
  • then a string wrapped in some form of quotes
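 ^(\s+-\s)  ('|"|`)  (\w+(?:[\s-]+\w+)*)  ('|"|`)$
 ^---1---^  ^--2--^  ^--------3--------^  ^--4--^

The spaces above only line up the labels. The pattern to paste into the Find field is:

 ^(\s+-\s)('|"|`)(\w+(?:[\s-]+\w+)*)('|"|`)$

The (?:...) part is a non-capturing group, so the inner repetition doesn't take a group number of its own and the closing quote stays in group four.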
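If you'd like to sanity-check the pattern outside the editor, here is a minimal Python sketch. It uses Python's re module rather than VS Code, so treat it as an approximation of the editor's behavior. It tests each line of a trimmed copy of the Animal schema separately. The helper name matching_lines is made up for this example.

```python
import re

# The Find pattern without the display spacing. (?:...) is non-capturing, so
# only the four groups described in the post are numbered.
ENUM_ITEM = re.compile(r"""^(\s+-\s)('|"|`)(\w+(?:[\s-]+\w+)*)('|"|`)$""")

# A trimmed copy of the Animal schema.
SAMPLE = """\
      properties:
        breed:
          type: string
          enum:
            - 'GERMAN SHEPARD'
            - 'BEAGLE'
            - 'RETRIEVER'
          example: 'BEAGLE'
"""


def matching_lines(text: str) -> list[str]:
    """Hypothetical helper: return the lines the pattern matches, testing one
    line at a time so that \\s can never reach across a line break."""
    return [line for line in text.splitlines() if ENUM_ITEM.search(line)]


for line in matching_lines(SAMPLE):
    print(line)
```

Only the three enum items match. The example: 'BEAGLE' line has no leading dash, so the pattern leaves it alone.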

Capture groups:
  one: one or more spaces or tabs, a dash and exactly one following space
  two: any type of quote: single ('), double (") or backtick (`)
  three: one or more words (letters, digits or underscores) separated by spaces or hyphens
  four: any type of quote: single ('), double (") or backtick (`)
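To see what group three will and won't accept, here is a small Python check of that piece of the pattern on its own. The candidate values and the helper name are hypothetical. A value containing any character other than \w, whitespace or a hyphen (such as a period, slash or parenthesis) won't fit, so those lines simply stay quoted.

```python
import re

# Capture group three on its own: a run of \w characters (letters, digits,
# underscore), optionally followed by more words joined by whitespace or hyphens.
ENUM_VALUE = re.compile(r"\w+(?:[\s-]+\w+)*")


def is_simple_enum_value(value: str) -> bool:
    """Hypothetical helper: True if the whole value fits group three."""
    return ENUM_VALUE.fullmatch(value) is not None


# Hypothetical enum values, chosen to show what does and doesn't fit.
for candidate in ["GERMAN SHEPARD", "NON-SPORTING", "GOLDEN_RETRIEVER", "ST. BERNARD", "BEAGLE "]:
    print(f"{candidate!r}: {is_simple_enum_value(candidate)}")
```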

VS Code's Find widget uses JavaScript regular expressions (Search across files uses ripgrep under the hood). In both places, the Replace field can refer to capture groups by number. The $1 syntax looks like bash's positional arguments, but it is the editor's own replacement syntax, not bash. This is why I've split the regex into capture groups: each group's text is available in the replacement, so we can rebuild our enum strings without the ugly quotes.

The groups are numbered from left to right by their opening parenthesis. You can reference them directly in the Replace field as $1, $2 and so on, with $0 standing for the entire match.
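Because numbering follows the opening parentheses, any nested group shifts the numbers of the groups after it. This Python sketch compares two forms of the pattern: one that writes the inner repetition with plain parentheses, ((\s|-)+\w+)*, and one that uses the non-capturing form (?:[\s-]+\w+)*. The first has six groups, with the closing quote in group six. The second has four. Python numbers groups by the same opening-parenthesis rule as JavaScript, so the counts carry over to the editor.

```python
import re

# Inner repetition written with plain (capturing) parentheses.
WITH_NESTED_GROUPS = re.compile(r"""^(\s+-\s)('|"|`)(\w+((\s|-)+\w+)*)('|"|`)$""")
# Same match, but the inner repetition is non-capturing.
NON_CAPTURING = re.compile(r"""^(\s+-\s)('|"|`)(\w+(?:[\s-]+\w+)*)('|"|`)$""")

# Twelve spaces of indentation, as in the Animal schema.
line = " " * 12 + "- 'GERMAN SHEPARD'"

for pattern in (WITH_NESTED_GROUPS, NON_CAPTURING):
    m = pattern.match(line)
    # pattern.groups is the number of capturing groups in the pattern.
    print(pattern.groups, m.groups())
```

With the nested form, group four holds the last repetition (" SHEPARD"), not the closing quote.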

Looking at our first enum value, the references break down like this:

  • $0 is the entire match. In our case, that is the whole line the pattern found: ......- 'GERMAN SHEPARD' (each . stands in for a leading space)
  • $1 is the indentation, dash and single space before the string: ......- 
  • $2 is the first set of quotes '
  • $3 is our enum string GERMAN SHEPARD
  • $4 is the final set of quotes '
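The same breakdown can be reproduced in Python, where m.group(n) returns the text a $n reference would insert and group(0) plays the role of $0. This is a sketch using Python's re module with the non-capturing version of the pattern, not VS Code itself.

```python
import re

ENUM_ITEM = re.compile(r"""^(\s+-\s)('|"|`)(\w+(?:[\s-]+\w+)*)('|"|`)$""")

# Twelve spaces of indentation, as in the Animal schema.
line = " " * 12 + "- 'GERMAN SHEPARD'"
m = ENUM_ITEM.match(line)

# m.group(n) is the text a $n reference would insert; group(0) is the whole match.
for n in range(5):
    print(f"${n}: {m.group(n)!r}")
```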

Now we can use these references to rebuild each line without the quotes. In the Replace field, enter:

(Screenshot: VS Code Find & Replace dialog)

$1$3

   $1 holds the first capture group: the indentation, dash and trailing space (for example "            - ").
   $3 holds the third capture group: our enum string *without* quotes.
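To run the same cleanup as a script (for example, across many spec files), Python's re.sub can do the job. Python spells the replacement $1$3 as \g<1>\g<3>. This is a minimal sketch that processes one line at a time. The helper name unquote_enum_items is hypothetical.

```python
import re

ENUM_ITEM = re.compile(r"""^(\s+-\s)('|"|`)(\w+(?:[\s-]+\w+)*)('|"|`)$""")


def unquote_enum_items(text: str) -> str:
    """Hypothetical helper: apply the $1$3 replacement to every line.

    Lines are processed one at a time, and split("\\n") keeps a trailing
    newline intact when the pieces are joined back together.
    """
    # Python writes the replacement $1$3 as \g<1>\g<3>.
    return "\n".join(ENUM_ITEM.sub(r"\g<1>\g<3>", line) for line in text.split("\n"))


before = "\n".join([
    "          enum:",
    "            - 'GERMAN SHEPARD'",
    '            - "BEAGLE"',
    "            - 'RETRIEVER'",
    "          example: 'BEAGLE'",
])
print(unquote_enum_items(before))
```

The enum items lose their quotes while the enum: and example: lines come through unchanged.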

Voilà!


      properties:
        breed:
          type: string
          enum:
            - GERMAN SHEPARD
            - BEAGLE
            - RETRIEVER
          example: 'BEAGLE'

It's so beautiful!! Now I can get back to reviewing my documents without stabbing my eyeballs with a bunch of ugly quotes.
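One caution before running this across thousands of specs: removing quotes can change how YAML reads a value. An unquoted 404 is an integer, and under YAML 1.1 rules (which some parsers still follow) words like YES, NO, ON and OFF become booleans. Groups two and four also accept any quote independently, so a mismatched pair like 'BEAGLE" matches too.

The Python sketch below is a more defensive variant, not part of the VS Code steps. It keeps the quotes on values that start with a digit or appear in a small keyword list. That list is my own assumption and is not exhaustive. It also uses the backreference \2 in place of group four, so only matching quote pairs are removed. The helper names are hypothetical.

```python
import re

# Like the Find pattern, but the closing quote is the backreference \2, so a
# line is only touched when its opening and closing quotes are the same.
ENUM_ITEM = re.compile(r"""^(\s+-\s)('|"|`)(\w+(?:[\s-]+\w+)*)\2$""")

# Assumption: a short, non-exhaustive list of words the YAML 1.1 type
# definitions treat as booleans or null when unquoted (compared case-insensitively).
RISKY_WORDS = {"y", "n", "yes", "no", "on", "off", "true", "false", "null"}


def _unquote(m: re.Match) -> str:
    value = m.group(3)
    # Leave the quotes on when removing them could change the type:
    # a leading digit (numbers, dates) or one of the keywords above.
    if value[0].isdigit() or value.lower() in RISKY_WORDS:
        return m.group(0)
    return m.group(1) + value


def unquote_enum_items_safely(text: str) -> str:
    """Hypothetical helper: unquote enum items line by line, skipping risky values."""
    return "\n".join(ENUM_ITEM.sub(_unquote, line) for line in text.split("\n"))


before = "\n".join([
    "            - 'BEAGLE'",
    "            - 'YES'",
    "            - '404'",
    "            - 'MIXED\"",
])
print(unquote_enum_items_safely(before))
```

Only BEAGLE loses its quotes. YES and 404 keep them to stay strings, and the mismatched 'MIXED" line no longer matches at all.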

If you like tips like these, let me know!