Friday, June 7, 2013

[SQL Server Data Warehousing] Collecting data from multiple sites


Collecting data from multiple sites



Your question has two different parts;


1- archiving existing databases


2- Integrating ,cleansing, de-duplicating (maybe), data warehousing, ETL , Cube


for the first part there are many articles published for archiving SQL Server databases. for example:


http://www.mssqltips.com/sqlservertip/1121/archiving-data-in-sql-server/


I am going to answer the second part of your question, which is more related to Data Warehousing and BI;


First of all you need to integrate data from those several hundred locations into an integrated database (this can be called as integrating or staging area). SSIS is good data transfer tool which helps to a lot in transferring data from those sources into the integrated database. you can use different methods for transferring data in SSIS, use FTP, exported CSV files, connect to SQL Server databases (if live connection exists), web services ,....


You will also need to design the data warehouse, Kimball is one of the most common methodologies of designing data warehouse based on business requirements. you can use The Data Warehouse Toolkit book from Kimball to learn about Dimensional Modeling.


After designing the data warehouse you will require an ETL (Extract Transform Tool) to get data from the integrated database (or staging) into the data warehouse. SSIS again is very powerful ETL tool. it will provide massive amount of data transformation components for using in ETL process.


For data cleansing; I recommend using Data Quality Services (DQS) which is part of SQL Server 2012 services. with this service you can create knowledge bases for different business domains, and somebody from business will play role of Data Steward to maintain that knowledge base, In SSIS there is a DQS Cleansing component which talks with DQS and apply data quality rules of knowledge bases to incoming data stream in ETL process, and as a result you will have more cleansed data.


For data consistency and governance; The best way to keep data consistency is a master data management tool, Microsoft tool for MDM is Master Data Services (MDS). this tool is part of SQL Server installation as well, you will create data models with entities, relationship between entities, and business rules. MDS provides only ONE SINGLE VERSION OF TRUTH for your data structure, this means that you can have only single version of customer information. or only single version of product info. this will help data consistency.


Finally for working with OLAP Cubes; you can use Analysis Services (SSAS) to build olap cubes on your data warehouse.


and if you want to create reports and dashboards from your cube; there are many data visualization tools in Microsoft stack, such as SSRS, PerformancePoint, Power View, Excel and Excel Services



http://www.rad.pasfu.com



.

social.technet.microsoft.com/Forums

No comments:

Post a Comment

Search This Blog