Where's Baldo?

Using LAMBDAs in Google Sheets to Create Filler Data

10 min read
Data ProcessingGoogleHow-toLAMBDASpreadsheets
Article author: michael
michael
Published:
Cover Image for Using LAMBDAs in Google Sheets to Create Filler Data

LAMBDA functions in Google Sheets are a pretty powerful, but poorly documented feature. They allow you to do some pretty cool things though, once you wrap your head around why they’re even useful in the first place.

In this post I’ll explore a particular use case for them: using LAMBDAs to generate filler data for queries in Google Sheets.

What are LAMBDA functions?

Ok so first off, what exactly are LAMBDA functions in Google Sheets? And I say LAMBDA functions in Google Sheets, because LAMBDA functions are a thing in programming languages too, just not the same thing.

If you search online for LAMBDA functions in Google Sheets, you’ll find a lot of sites regurgitating the same information that Google has in their docs, without really explaining what they are or how they can be used.

So, in a nutshell, a LAMBDA function is a small, anonymous, inline, custom function that you can define in a cell in Google Sheets that can use any of the built-in Google Sheets functions. That’s basically all they are… but the magic isn’t so much in what the LAMBDA function is, but in how you can use it. And while LAMBDA functions themselves are anonymous, you can use them in named functions, therefor making them no longer anonymous. But that’s something for another day!

Ok, but back to LAMBDAs in Google Sheets …

Now just to be clear, on their own, LAMBDA functions are pretty useless. They’re just a way to define a function in a cell, and they can’t do anything the built-in functions can already do on their own without the use of a LAMBDA. But when you combine LAMBDAs with the special helper functions that make use of them (MAP, REDUCE, BYCOL, BYROW, SCAN, and MAKEARRAY), they start to become somewhat magical. 😏

Basic Syntax

The basic syntax for a LAMBDA function is as follows:

=LAMBDA([parameter1, parameter2, ...], expression)

Where parameter1, parameter2, etc. are the parameters that the function will take, and expression is the expression is the code (functions) that you want to run.

It’s important to note though that while you can sometimes use multiple parameters in a LAMBDA function, some of the helper functions restrict you to 1 or 2 parameters, so you’ll need to be aware of this when you’re defining your LAMBDA functions.

So as an example, let’s say you wanted to create a LAMBDA function that would take a number and return that number squared. You could define it like this:

=LAMBDA(x, x^2)

… but actually, that would give you an error and wouldn’t work. Used on its own, the LAMBDA needs to be called with the parameters. So if for example you wanted to square the number 5, you’d need to call it like this:

=LAMBDA(x, x^2)(5)

In the above example, the number 5, which is passed here to the LAMBDA as the parameter x, is squared and returned as the result of the LAMBDA function. So basically, the above expression would return the number 25 in the cell.

Like I said, used on their own, LAMBDA’s are pretty useless because you could just write the expression directly in the cell.

What’s the point of LAMBDA functions then?

So if LAMBDA functions are pretty useless on their own, what’s the point of them?

Well, like I mentioned above, the magic of LAMBDA functions comes when you use them in conjunction with the special helper functions that Google has provided.

These helper functions allow you to apply the LAMBDA function to a range of data, and do some pretty cool things with the data in those cells. And when I say “range of data”, I mean that you can supply a range of cells, or a range of data from the output of other functions.

Using LAMBDA functions to Generate Missing Data

For a better example of the power of LAMBDA functions, I’m going to describe an issue I ran into recently, and how I was able to use LAMBDA functions to solve it.

The Problem

In my Google spreadsheet, I have various sheets that contain data that I want to query and use in a main sheet.

The data in the sheets are structured similarly, with them all containing a single column for dates, and another column for dollar amounts. Not all of the sheets have the same amount of columns though, nor do they have the same placement of the columns.

Take these two example sheets in the same spreadsheet:

Data Sheet 1

Data Sheet 2

The one on the left has 4 columns, with the date and totals in the 1st and 4th column respectively, while the one on the right has 3 columns, with the date and totals in the 2nd and 3rd columns respectively.

Note that for this example scenario, I’ve simply used the same dates and totals in both examples for simplicity.

Note also however that the data doesn’t contain dates and totals for every month. Now if we’re just doing some SUM’ming of the totals, we probably wouldn’t care. But if we want to pull a SUM of totals by month, for all months, we have a problem.

For a sum by month type of situation, I’d likely be using the QUERY function, with something along the lines of (using Data1 as the sheet name for the first example):

=QUERY(Data1!A2:D, "SELECT YEAR(A), MONTH(A), SUM(D) WHERE A IS NOT NULL GROUP BY YEAR(A), MONTH(A) LABEL YEAR(A) 'Year', MONTH(A) 'Month', SUM(D) 'Total'")

Running that in my test sheet gives the following output:

Data Sheet 1 Query

As expected, it gives me the sum of the totals by month, but only for the months that have data in the sheet. It’s absolutely crucial that I get the sum of the totals for all months though, even if there’s no data for that month in the sheet.

The Solution in a Programming Language

When I’m coding in an actual programming language, in this type of scenario, I’d generally:

  1. Create a two-dimensional filler array with the 0-index containing the numbers 1 to 12, and the other index initialized with a default sum of 0: [[1,0],[2,0],...,[12,0]]
  2. Run the code (query a DB for example) that gets the data
  3. Use a .map() function to iterate over the array of numbers, and for each number, check if there’s data for that month. If there is, assign/return the sum of the totals for that month at the 2nd index, and if there isn’t, assign/return 0.

So basically, what I’m missing here is a filler array, and possibly a map function similar to that in programming languages. Unfortunately, that I could tell anyway, there’s no direct way to make a filler array (range) in Google Sheets, and the built-in MAP function doesn’t work the same way. So I had to get creative.

The Solution in Google Sheets

So, how do I get the same result in Google Sheets?

Well, of course I’m sure there are several ways, but this is what I came up with, using some LAMBDA functions…

=BYROW(
  SEQUENCE(num_years*12,1,1),
  LAMBDA(seq_num,BYCOL(
    MAKEARRAY(1,num_cols,LAMBDA(row,col,col)),
    LAMBDA(col_idx,
      IF(
        col_idx=date_index,
        DATE(start_year,seq_num,1),
        IF(
          col_idx=fill_index,
          filler,
          ""
        )
      )
    )
  ))
)

In the above example, I’ve tabbed it out and placed it on multiple lines for easier digesting, and I’ve used some names that need substituting, depending on the situation:

  • num_years: the number of years to generate data for
  • num_cols: the number of columns to output
  • date_index: the index of the column that contains the dates
  • fill_index: the index of the column to fill with the filler data
  • start_year: the year I want to start the dates from
  • filler: the actual value I want to fill the column with, like 0 or ""

In my example data above, I had dates mostly from 2023, so I’m using 1 for num_years, 4 for num_cols since my first example data set had 4 columns total, 1 for the date_index since the dates are in column 1, 4 for the fill_index since my SUMs will be in the 4th column, 2023 as the start_year, and finally, 0 as the filler value.

So with those substitutes in place, I end up with:

=BYROW(
  SEQUENCE(1*12,1,1),
  LAMBDA(seq_num,
    BYCOL(
      MAKEARRAY(1,4,LAMBDA(row,col,col)),
      LAMBDA(col_idx,
        IF(
          col_idx=1,
          DATE(2023,seq_num,1),
          IF(
            col_idx=4,
            0,
            ""
          )
        )
      )
    )
  )
)

And after it runs, I end up with this output:

Filler output

This now gives me a range of data I can add in with my other data, and then run my QUERY function on, to get the sum of the totals by month, for all months, even if there’s no data for that month in the sheet. I’ll give you a look at what that looks like in just a bit, but first, let’s break down the above formula a bit.

Breaking Down the Formula

So, what’s going on in the above formula?

Functions in spreadsheets are a bit of a pain to follow, especially when they’re nested like this, but I’ll try to break it down as best I can.

I’ll start with the innermost part of the formula, and work my way out.

The Innermost Part

The innermost part of the formula is the MAKEARRAY function:

MAKEARRAY(1,4,LAMBDA(row,col,col))

This will create a 1x4 array, with the column index (from 1 to 4) as the value in each outputted column. The output of this function will essentially look like:

| 1 | 2 | 3 | 4 |

What’s the use of this? Well, I want a row of data with index values for the columns, so that I can use them in the next part of the formula.

The Next Part

The next part of the formula is the BYCOL function, which is a bit more involved:

BYCOL(
  MAKEARRAY(1,4,LAMBDA(row,col,col)),
  LAMBDA(col_idx,
    IF(
      col_idx=1,
      DATE(2023,seq_num,1),
      IF(
        col_idx=4,
        0,
        ""
      )
    )
  )
)

This function will take the output of the MAKEARRAY function, and then iterate over each column, applying a LAMBDA function to each column.

The LAMBDA function will check the index of the column, and if it’s the 1st column (the 1st column because that’s where our dates are), it will output a date, and if it’s the 4th column (because that’s where we want the filler data), it will output the filler value. Otherwise it will simply output an empty string, because we don’t care about the other columns.

The Outermost Part

The outermost part of the formula is the BYROW function:

BYROW(
  SEQUENCE(1*12,1,1),
  LAMBDA(seq_num,
    BYCOL(
      MAKEARRAY(1,4,LAMBDA(row,col,col)),
      LAMBDA(col_idx,
        IF(
          col_idx=1,
          DATE(2023,seq_num,1),
          IF(
            col_idx=4,
            0,
            ""
          )
        )
      )
    )
  )
)

This code will take the output of the SEQUENCE function, which in our case here will output a sequence of rows with the numbers 1 to 12 (our months of the year), and iterate over each of the rows (hence the name “BYROW”), applying a LAMBDA function to each row.

The LAMBDA here is going to run the BYCOL code-block that I explained in the previous section above, for each of the rows outputted by the SEQUENCE function (which contain the numbers 1 to 12), and will use those numbers as the month for the date in the BYCOL code-block.

It’s a bit complicated when you look at it as one code-block, but if you break it down into its parts, it’s not too bad. Or it least I hope it’s not! 😅

Applying the Filler Data to my Original Query

Ok, so now the fun part … applying the filler data to my original query.

Now that I have my filler data, I can add it to my original query, and get the sum of the totals by month, for all months, even if there’s no data for that month in the sheet.

So, if I add the filler data code-block to the original query, I end up with this bad boy:

=QUERY({
  Data1!A2:D;
  BYROW(
    SEQUENCE(1*12,1,1),
    LAMBDA(seq_num,
      BYCOL(
        MAKEARRAY(1,4,LAMBDA(row,col,col)),
        LAMBDA(col_idx,
          IF(
            col_idx=1,
            DATE(2023,seq_num,1),
            IF(
              col_idx=4,
              0,
              ""
            )
          )
        )
      )
    )
  )
}, "SELECT YEAR(Col1), MONTH(Col1), SUM(Col4) WHERE Col1 IS NOT NULL GROUP BY YEAR(Col1), MONTH(Col1) LABEL YEAR(Col1) 'Year', MONTH(Col1) 'Month', SUM(Col4) 'Total'")

And the output I get, again using the initial data I showed above, is:

Query output with filler data

… with all of the months in 2023 now showing, and the totals for each month, even if there’s no data for that month in the sheet!

Note: I’ve used the Col1, Col4, etc. notation in the query, because the data is now in an array, and not in a sheet. The Col1, Col4, etc. notation is how you reference the columns in an array in Google Sheets.

Also note that the months are one order off from the actual month numbers, because the QUERY function, like in actual SQL, uses 0-indexed numbers for the months. So January is 0, February is 1, etc.

Conclusion

So, that’s how I used LAMBDA functions in Google Sheets to generate filler data for my date-based queries. A bit of a mess I’ll admit, but when you’re working with spreadsheets, sometimes you have to get a bit creative. 🥴

I hope this post has helped you understand LAMBDA functions a bit better, and how you can use them to do some pretty cool things in Google Sheets.


Was any of this unclear? Do you have any questions? Or maybe you have a better way to do this? Let me know in the comments below! 👇

Until next time,
michael 😀

Share this post:

Comments