Generating a sequence of numbers in a query is fairly hard and time consuming in SQL, but super useful for things like generating data and avoiding loops. A common solution is to use a prebuilt numbers table.
At Narrator we frequently use a sequence to generate test data, so we have a numbers table of our own:
number123...9999999
We generated the data in Python as a dict – we have a backend system that can insert tables.
N = 10000000
number_data = dict(
columns=[dict(name="number", type="integer")],
rows=[dict(number=num) for num in range(N)],
)
Generate 10M rows in Python
But what if you wanted to do this in SQL? Here's how to do it for a few different warehouses.
The following approach uses a cross join to generate roughly 60m numbers. It's very standard and should work in nearly all warehouses. I've tested it on Azure SQL, BigQuery, Postgres, Redshift, and Snowflake.
-- Generates a numbers table
-- Works on Azure SQL, BigQuery, Postgres, Redshift, and Snowflake
--
-- empty_rows generates 36 rows with a single column
-- the FROM statements are cross joins -- each multiplies the number of rows by 36
-- row_number gives us our desired output
-- going all the way to e is 36^5 = 60M rows
WITH empty_rows AS (
SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n
)
SELECT row_number() over (order by a.n) as number
FROM empty_rows as a, empty_rows as b, empty_rows as c, empty_rows as d, empty_rows as e
view rawnumbers_table.sql hosted with ❤ by GitHub
It's easy to tune it to create a desired number of rows – either add a limit or adjust the number of cross joins. Each new select will multiply the total row count by 36.
Some warehouses have their own ways to directly create sequences of numbers. Those queries are a bit easier to understand but otherwise will have the same result.
Snowflake has a generator function to do this automatically. The following code is basically straight from the docs. Here we're selecting the seq4()
function which generates a 4-byte integer sequence.
select seq4() as number
from table(generator(rowcount => 10000000))
order by 1;
Generate 10M rows in Snowflake
Postgres supports generate_series() so this is fairly straightforward
SELECT * FROM generate_series(0,10000000);
I haven't tested the performance of this but if you're using numbers this large it's probably best to create a table or materialized view.
For Redshift the general cross-join SQL is the best to use.
Some people also use generate_series(), but that's officially unsupported and won't work on queries that insert data.
I'm not aware of any specific way to generate numbers with Microsoft SQL Server. The general-purpose cross-join approach is the best.
StackOverflow has an answer suggesting the use of the generate_array
function but I haven't had a chance to try out this approach. It looks like it's limited to about 1M rows per call – and I have no idea how it performs.
SELECT num FROM UNNEST(GENERATE_ARRAY(0, 10000000)) AS num;
Generate 10M rows in BigQuery
If you have a pointer to a snippet for any other warehouses please feel free to contact me and I'll update the post.