Benchmarking tables with millions of rows

from the Artful SQL Server & Access Tips List


When you perform benchmark tests, you probably do this: design a database whose tables contain no rows (at which point every query is lightning fast), populate the database with a few rows of test data, and then test the database.

How accurate will your test results be if you know that heavily trafficked tables in the production database will contain millions of rows?

To get an idea of what performance will be like when millions of rows are involved, you need to manufacture a million or so rows, beginning with an existing table that has about 1,000 rows. You'll also need to give all of the rows a primary key. The following script performs each of these jobs separately:

USE northwind

-- [Source Table] stands for your existing table of roughly 1,000 rows.
-- Check its row count, then copy it into a work table.
SELECT COUNT(*) FROM [Source Table]
SELECT *
    INTO TempTable
    FROM [Source Table]
SELECT COUNT(*) FROM TempTable
GO

-- Each pass doubles the work table, so ten passes turn
-- 1,000 rows into 1,024,000.
DECLARE @i int
SET @i = 1
WHILE @i <= 10
    BEGIN
        INSERT INTO TempTable
            SELECT * FROM TempTable
        SET @i = @i + 1
    END
SELECT COUNT(*) FROM TempTable
-- Copy the rows into BigTable, generating a sequential key column.
SELECT IDENTITY( int, 1, 1 ) AS PK, *
    INTO BigTable
    FROM TempTable
GO

SELECT COUNT(*) FROM BigTable
GO

DROP TABLE TempTable

-- Cluster the new table on its key column.
CREATE CLUSTERED INDEX BigTable_Index
    ON BigTable (PK)

This code copies the source table into a work table, doubles that table's rows ten times (turning 1,000 source rows into just over a million), and then adds an identity key column to the result using the IDENTITY() function in a SELECT ... INTO statement.
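Note that the script above enforces the key only through a clustered index. If you want PK declared as a true primary key, one option is to replace the CREATE CLUSTERED INDEX statement with a constraint; a minimal sketch (the constraint name PK_BigTable is arbitrary):

-- Declare PK as the primary key; SQL Server builds the clustered index for you.
ALTER TABLE BigTable
    ADD CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (PK)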

Now you can perform your benchmark tests with a realistic data set. If you need 10 million rows for a realistic data set, just raise the upper limit on the WHILE @i line; each extra pass doubles the row count.
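For example, starting from 1,000 rows, thirteen doublings yield 8,192,000 rows and fourteen yield 16,384,000, so a bound of 13 or 14 brackets the 10 million mark. A sketch of the modified loop:

-- Thirteen doublings: 1,000 rows x 2^13 = 8,192,000 rows.
DECLARE @i int
SET @i = 1
WHILE @i <= 13
    BEGIN
        INSERT INTO TempTable
            SELECT * FROM TempTable
        SET @i = @i + 1
    END

Once BigTable is built, SQL Server's SET STATISTICS TIME option is a quick way to read the cost of a trial query; for instance:

-- Report CPU and elapsed time for a representative query.
SET STATISTICS TIME ON
SELECT COUNT(*) FROM BigTable WHERE PK BETWEEN 500000 AND 600000
SET STATISTICS TIME OFF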

Note: Writing a tool that manufactures realistic test data is beyond the scope of this tip, but such tools do exist. Sybase PowerDesigner contains a pretty smart one that lets you populate text files with sample data for each table and then manufactures various combinations according to the relational tree.

