Showing posts with label star join optimization. Show all posts
Showing posts with label star join optimization. Show all posts

Tuesday, May 14, 2013

[SQL Server Data Warehousing] star join optimization, quick questions...


A lot of questions in one call:


1. yes you can best use integers as surrogate keys between fact and dimension tables


2. Also for your time dimension, you can best use integer datatypes, but here it doesn't need to be an incremental number, you can transform your datevalue in a integer. Eg: YYYYMMDD if your granularity is limitted to days, YYYYMMDDHHMM is your granularity needs to include minutes.


3. Your choice wether you use a bigint or tinyint is purely based in content, if your mex value remains well under 2 billion, you should use an INT, here a bigint will only cause a -slight- penalty on performance with no added value.


4. A well designed ETL process should not have any negative impact on declarative integrity. On the other hand, not having relations defined can - and sooner or later will - cause data corruption that sometimes is very hard to solve.


5 Indexing is a more difficult subject. You should certainly not create one big index over all columns (unless you use columnstore) nor you should create a separate index for each individual column. Index strategy is comparable to OLTP systems were things like selectivity and usability are important. Short: make sure you know how data are accessed by the clients, build your indexes on that info and test, test, test


Hopes this helps



Geert Vanhove DCOD ------ http://geertvanhove.wordpress.com/ ----------- Please click the Mark as Answer or Vote As Helpful if a post solves your problem or is helpful!



.

social.technet.microsoft.com/Forums

Friday, May 10, 2013

[SQL Server Data Warehousing] star join optimization, quick questions...


A lot of questions in one call:


1. yes you can best use integers as surrogate keys between fact and dimension tables


2. Also for your time dimension, you can best use integer datatypes, but here it doesn't need to be an incremental number, you can transform your datevalue in a integer. Eg: YYYYMMDD if your granularity is limitted to days, YYYYMMDDHHMM is your granularity needs to include minutes.


3. Your choice wether you use a bigint or tinyint is purely based in content, if your mex value remains well under 2 billion, you should use an INT, here a bigint will only cause a -slight- penalty on performance with no added value.


4. A well designed ETL process should not have any negative impact on declarative integrity. On the other hand, not having relations defined can - and sooner or later will - cause data corruption that sometimes is very hard to solve.


5 Indexing is a more difficult subject. You should certainly not create one big index over all columns (unless you use columnstore) nor you should create a separate index for each individual column. Index strategy is comparable to OLTP systems were things like selectivity and usability are important. Short: make sure you know how data are accessed by the clients, build your indexes on that info and test, test, test


Hopes this helps



Geert Vanhove DCOD ------ http://geertvanhove.wordpress.com/ ----------- Please click the Mark as Answer or Vote As Helpful if a post solves your problem or is helpful!



.

social.technet.microsoft.com/Forums

Search This Blog