Hi Everybody,
Where can I find some good educational material on tuning ETL procedures for a data warehouse environment? Everything I've found on the web regarding query tuning seems to be geared only toward OLTP systems. (For example, most of our ETL queries don't use a WHERE statement, so the vast majority of searches are table scans and index scans, whereas most index tuning sites are striving for index seeks.)
I have read Microsoft's "Best Practices for Data Warehousing with SQL Server 2008R2," but I was only able to glean a few helpful hints that don't also apply to OLTP systems:
- often better to recompile stored procedure query plans in order to eliminate variances introduced by parameter sniffing (i.e., better to use the right plan than to save a few seconds and use a cached plan SOMETIMES);
- partition tables that are larger than 50 GB;
- use minimal logging to load data precisely where you want it as fast as possible;
- often better to disable non-clustered indexes before inserting a large number of rows and then rebuild them immdiately afterward (sometimes even for clustered indexes, but test first);
- rebuild statistics after every load of a table.
But I still feel like I'm missing some very crucial concepts for performant ETL development.
BTW, our office uses SSIS, but only as a glorified stored procedure execution manager, so I'm not looking for SSIS ETL best practices. Except for a few packages that pull from source systems, the majority of our SSIS packages consist of numerous "Execute SQL" tasks.
Thanks, and any best practices you could include here would be greatly appreciated.
-Eric
.
social.technet.microsoft.com/Forums
No comments:
Post a Comment