It is common in Enterprise scale projects to simulate the test environment with very large-scale databases. The preparation of real time data and loading consumes considerable amount of time during the environment setup. The SQL Server 2005 partition techniques allow parallel data loading into the tables using BCP / Bulk Insert statements. The data loading of 10 Million records into “Sales” tables using the partition and non-partition methods is explained below.
Data loading with out Partition
CREATE TABLE Sales
(
UniqueID BIGINT,
ItemName VARCHAR(100),
SaledAmount NUMERIC(10,3),
SalesDate DATETIME
)on [Primary]
The data loading for this table can be done using the BCP / Bulk Insert options. But we cannot load the data in parallel, as it will lead to table locking
Data loading with Partition
CREATE TABLE Sales(
ItemName VARCHAR(100),
SaledAmount NUMERIC(10,3),
SalesDate DATETIME,
PartitionID INT
)on fPartitionID(PartitionID)
The table “Sales” is partitioned using the column PartitionID into 10 different partitions (say PartitionID accepts value between 1.10)
As the table is split into 10 different partitions, data loading of this table can be done in parallel follows
- Generate the data into to 10 X 1 Million files
- Load data into the table using 10 BCP / Bulk Insert instances in parallel
Conclusion
The performance of the data loading can be improved up to 10 times if the table is partitioned. This option can be preferred only if there are sufficient hardware resources. As CPU usage of BCP is very high, the number of parallel instances can be reduced as per the resource availability