[how to] Suitable database software for small business
- Suitable database software for small business
- Buffered operations with MongoDB
- oracle blocking sessions
- Recommendation for RDBMS for large table
- Expanding a dataset based on a weight field
- Example of loading into RDS from S3 using SQL*Loader
- Mongo repair database "ERROR: error: exception cloning object in local.system.replset "
- Query sql-server db and insert results into PostgreSQL
- Are there any third party tools that can read and/or analyze SQL Profiler trace (.trc) files
- Select rows, where 3 columns have the same values
- Is there a DB that will allow the same field name with different cases?
- What's the optimal server configuration for fastest inserting in a MySql database in this scenario?
- MongoDB RAM requirements doubt
- Monthly backup of SQL server DB to PostgreSQL?
- Profiling PostgreSQL
- Relationship between professor and course
- Fixing wildcard expansion resulting in too many terms error (DRG-51030)
- pgsql slow query on table of 500 million rows
- Oracle Enterprise Manager Database Console - default login and password?
- Should I refresh query plan cache
- Push Data from MySQL to SQL Server
- Convert units of measurement
- export (mysqldump) databases from MySQL version 4.0?
- Where and how to store my weird datamart
- why would command text not execute at all?
- Avoiding performance hit from GROUP BY during FULLTEXT search?
- AWS performance of RDS with provisioned IOPS vs EC2
- sp_startpublication_snapshot Parameter(s)
- Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
Suitable database software for small business Posted: 24 Jul 2013 08:45 PM PDT First of all, I apologise if this is in the wrong area. I'm looking for a suitable database program for a small engineering business, which needs to log various jobs. Currently, we're just using an Excel spreadsheet shared over the cloud; however, this has problems with conflicts when two people edit the file at the same time. What we want is something that can operate over the internet (so that multiple computers can access it from different locations), allowing multiple editors simultaneously. It also needs to be capable of conditional formatting. A screenshot example of what we currently have is here. From my research I have a vague inkling that SQL would be a good base, but I'm really looking for a program that requires absolutely no background knowledge of how the language works, i.e. something that isn't going to take a significant amount of time to learn to use. If there's info that I've not included, please let me know. Thanks in advance.
Buffered operations with MongoDB Posted: 24 Jul 2013 08:22 PM PDT I have a server with 8GB of disk space running MongoDB. It collects 250,000 documents a day (spread across three collections), so the 8GB is only enough to last it maybe a week or so before I have to push the documents periodically to a server with more space. Unfortunately, the machine isn't online 95% of the time, and connecting it to a dedicated MongoDB server with more disk space isn't an option either. My current approach is to connect to the machine every few days, transfer all of the documents, and wipe the database, but that seems a little dangerous and hacky. I looked into replication, but from what I can tell that will still have the disk space problem. Sharding won't work either, as I'd like to maintain a complete duplicate of the database on at least the computer that I store the full database on. Does MongoDB have any features that I overlooked that can work for this scenario? Maybe some sort of read/write buffer that can be flushed?
Oracle blocking sessions Posted: 24 Jul 2013 08:26 PM PDT I am trying to investigate some concurrency issues on Oracle 11 RAC. So far the most helpful tool I have is the "blocking sessions" view of Enterprise Manager, which tells me which query is blocking everyone else and what it is waiting for. However, to take advantage of that view, I need to catch the problem as it is happening. So I'm looking for a way to query Oracle for historic data similar to what the "blocking sessions" screen in Enterprise Manager has to offer.
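One possible direction: Oracle's Active Session History samples include the blocker of each waiting session, and the AWR-persisted copy keeps that history around. A hedged sketch (note that querying these `DBA_HIST_*` views requires a Diagnostics Pack license; the time window here is just an example):

```sql
-- Historic blocking activity from ASH samples persisted in AWR
SELECT h.sample_time,
       h.session_id,
       h.blocking_session,      -- SID of the session doing the blocking
       h.sql_id,                -- statement the blocked session was running
       h.event                  -- what it was waiting on, e.g. enq: TX
FROM   dba_hist_active_sess_history h
WHERE  h.blocking_session IS NOT NULL
AND    h.sample_time > SYSTIMESTAMP - INTERVAL '1' DAY
ORDER  BY h.sample_time;
```

For very recent history (still in memory), the same columns are available in `v$active_session_history` without going through AWR.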
Recommendation for RDBMS for large table Posted: 24 Jul 2013 05:22 PM PDT I have been asked for a recommendation on the optimal RDBMS for a large amount of data. The largest table would contain 2TB of data. My own largest table is only 400GB (MySQL, performs very well). The individual rows are going to be short, no blobs/etc, just a lookup table. Is PostgreSQL nowadays performing better than MySQL? Can I ask DBAs with tables in this order of magnitude for their experience? Oracle might be considered as well. The available hardware is probably a standard Linux box with about 64 GB of RAM.
Expanding a dataset based on a weight field Posted: 24 Jul 2013 07:44 PM PDT I have a table that contains records in an un-expanded form. Each record has an associated integer weight that essentially informs us how many times the record should be replicated in order to get the true population. Say, I have 3 records in my table. Once expanded, the dataset will be like this (note - I removed the I have tried to do this using cursors but it is taking a really long time to execute. Is there a clever way to do this really fast? Any predefined T-SQL stored procedure that achieves this? |
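The question's sample data was omitted above, but the usual set-based alternative to a cursor in T-SQL is a tally (numbers) table joined on the weight. A hedged sketch with an assumed table `dbo.Records(record_id, weight)`:

```sql
-- Replicate each row 'weight' times by joining to a generated numbers table.
-- sys.all_objects cross-joined with itself yields millions of rows, which
-- caps the largest weight this sketch can expand.
;WITH Tally AS (
    SELECT TOP (SELECT MAX(weight) FROM dbo.Records)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
)
SELECT r.record_id
FROM   dbo.Records r
JOIN   Tally t ON t.n <= r.weight;
```

Because this is a single join rather than a row-by-row loop, it typically runs orders of magnitude faster than a cursor on large tables.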
Example of loading into RDS from S3 using SQL*Loader Posted: 24 Jul 2013 04:25 PM PDT Does anyone have a working sample of loading data into an Amazon AWS RDS Oracle database using SQL*Loader where the source file is on S3? In principle it's clearly possible. The Amazon RDS Data Import Guide for Oracle documents using sqlldr. But the only sample there loads data from a local EC2 instance. I have worked with RDS and with S3 and with SQL*Loader before, but I haven't worked with all 3 together. One sample would be really useful. (Related advice would be welcome too... but a sample is all I'm really looking for.) |
Mongo repair database "ERROR: error: exception cloning object in local.system.replset " Posted: 24 Jul 2013 03:37 PM PDT I'm trying to repair a mongo database and am running into the following error early on: (replica set and host names have been sanitized for public posting) To attempt the repair I shut down the mongod and started it up with My mongod.conf consists of just two lines: The .conf normally did have a replSet declaration, but this was removed for the repair as I'm trying to run the repair on a secondary. (Per the docs repairs can only be done on primaries or standalones; so need to make it look like a standalone temporarily). Any idea what the problem could be? |
Query sql-server db and insert results into PostgreSQL Posted: 24 Jul 2013 01:10 PM PDT I have a query that returns data from sql-server and I need to be able to put the results into PostgreSQL. I'm using C# for the task. |
Are there any third party tools that can read and/or analyze SQL Profiler trace (.trc) files Posted: 24 Jul 2013 12:47 PM PDT I have a .trc file from a trace that a DBA did on one of my databases. I don't have the SQL profiler tool installed on my PC so I can't view the contents and analyze the trace log. Are there any third party tools that can read and/or analyze SQL Profiler trace (.trc) files? |
Select rows, where 3 columns have the same values Posted: 24 Jul 2013 11:52 AM PDT I have a structure like this: I now want to select all rows where the values of the first three columns are equal, so the result of this query using the sample data would be: How can I achieve this? I'm stuck right at the beginning... Thanks in advance! Alex
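The sample structure was omitted above, so this is a hedged sketch with a hypothetical table `t(a, b, c, d)`: find the (a, b, c) combinations that occur in more than one row, then return all rows belonging to those combinations.

```sql
-- Rows whose first three column values are shared by at least one other row
SELECT t.*
FROM   t
JOIN  (SELECT a, b, c
       FROM   t
       GROUP  BY a, b, c
       HAVING COUNT(*) > 1) dup
  ON dup.a = t.a
 AND dup.b = t.b
 AND dup.c = t.c;
```

If instead the goal is rows where the three columns equal *each other* within one row, the query is simply `WHERE a = b AND b = c`.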
Is there a DB that will allow the same field name with different cases? Posted: 24 Jul 2013 01:13 PM PDT Is there a DB that will allow a user to create a table with two fields, one called
What's the optimal server configuration for fastest inserting in a MySql database in this scenario? Posted: 24 Jul 2013 01:09 PM PDT I have a MyISAM table partitioned by range with 90 partitions representing the last 90 days. I expect this table to receive somewhere between 500 million and 1 billion records before I start rotating partitions (dropping old ones and creating new ones). What server configuration options would you put in your Any other tips would also be appreciated. EDIT: By 'fastest possible' I mean the best possible insert speed, sustained once the table contains hundreds of millions of rows.
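The my.cnf contents were omitted above, but the MyISAM-specific knobs that usually matter for insert throughput can be sketched as runtime equivalents (the same variable names go in my.cnf; the values here are illustrative assumptions, not recommendations for any particular box):

```sql
-- MyISAM insert-throughput settings; tune values to available RAM
SET GLOBAL key_buffer_size         = 4294967296; -- 4 GB cache for MyISAM index blocks
SET GLOBAL bulk_insert_buffer_size = 268435456;  -- 256 MB buffer for multi-row INSERTs
SET GLOBAL concurrent_insert      = 2;           -- allow inserts concurrent with reads
SET GLOBAL delay_key_write        = ALL;         -- batch index flushes (data loss risk on crash)
```

Batching inserts (multi-row `INSERT ... VALUES (...), (...)` or `LOAD DATA INFILE`) generally buys more than any single server variable.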
MongoDB RAM requirements doubt Posted: 24 Jul 2013 10:01 AM PDT I have a data file with 100 million documents. The number of documents is expected to always remain between 100 million and 200 million for at least a year. I have 15 GB of RAM. Currently the update operation takes 2 hours to load. Suppose I would like to improve that to 40 minutes: will increasing RAM be required, and how much RAM would I need to add in order to achieve that? Please do let me know if I must provide additional details. I have indexes on the updated fields, and I have set the --nojournal option. Currently I have optimized the update process by using syncdelay 0 and manually fsync-ing every 100K records. Then I implemented a system with only in-place updates rather than updates that add new fields or increase document size. Now I feel that I am at a point where increasing RAM might be the only way to go.
Monthly backup of SQL server DB to PostgreSQL? Posted: 24 Jul 2013 06:13 PM PDT The company I'm working for has a SQL Server with read-only access. They use Crystal Reports hooked up to PostgreSQL for reporting. Is there any way to move all the data from the MSSQL DB to PostgreSQL without user interaction? That seems to be the caveat to what I'm trying to do: they need to be able to run this report after I leave without having to interact with it during the process. Or am I looking at this the wrong way? Is there a way to save a "snapshot" of the SQL Server DB that can be manipulated in Crystal Reports? The ultimate goal is that since the DB is dynamic, we need a static DB at the end of the month that all the reports can be run on without having to worry about it changing.
Profiling PostgreSQL Posted: 24 Jul 2013 06:24 PM PDT We have a system built with Python 2.7, Django, PostgreSQL 9.1 and PostGIS 1.5. In this system there are some extensive validation processes that are, well, intensive. Our machine: OS: CentOS 6 HW: 8 cores, 32 GB RAM, 4 TB hard drive. For development and testing we are using the default PostgreSQL configuration, which, I think, is somewhat wrong. I want to take advantage of the full capacity that our hardware provides, and so far we haven't had a lot of traffic. When this is released to the public, there will be many concurrent users. Our application has some queues and async strategies to handle the load, but the main question is: how can I safely tune PostgreSQL, run some tests, and verify that the tuning was effective?
Relationship between professor and course Posted: 24 Jul 2013 12:42 PM PDT I'm designing a database where I have a Course table and a Professor table. The business rules are: 1) A professor can teach many courses. (This is obvious). However, I get stuck on the second rule. 2) In a given semester, a course can be offered on different schedules, and each of those offerings can be taught by a different professor. For example, a typical freshman course on calculus has about 100 students every semester. The course is divided into 4 sections, each with 25 students and a different professor. I would like to treat each combination of (course, professor) as a unique entry. For example, (Calculus, Prof Joe) and (Calculus, Prof John) are two different courses. How do I implement this? Thank you
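One common way to model this is a third table for the course offering (often called a "section"), which carries the semester and schedule and points at both Course and Professor. A hedged DDL sketch with assumed names and types:

```sql
-- Course and Professor are plain entity tables
CREATE TABLE Course (
    course_id INT PRIMARY KEY,
    title     VARCHAR(100) NOT NULL
);

CREATE TABLE Professor (
    professor_id INT PRIMARY KEY,
    name         VARCHAR(100) NOT NULL
);

-- Each row is one offering of a course in a semester, taught by one professor.
-- The calculus example becomes 4 Section rows for the same course_id.
CREATE TABLE Section (
    course_id    INT NOT NULL REFERENCES Course (course_id),
    professor_id INT NOT NULL REFERENCES Professor (professor_id),
    semester     VARCHAR(10) NOT NULL,  -- e.g. '2013-FALL'
    section_no   INT NOT NULL,          -- 1..4 in the calculus example
    PRIMARY KEY (course_id, semester, section_no)
);
```

This keeps (course, professor) pairs distinct without duplicating course rows, and the primary key prevents two sections of the same course from colliding within a semester.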
Fixing wildcard expansion resulting in too many terms error (DRG-51030) Posted: 24 Jul 2013 03:54 PM PDT How can I resolve the My index creation syntax is: And my query is: There are 15 million rows in that table, and more than 50000 rows match that query. I have set
Can anybody help me solve this error?
pgsql slow query on table of 500 million rows Posted: 24 Jul 2013 09:25 AM PDT I have a table that keeps measurements of latencies between nodes running MPI tasks in a large cluster. The table looks like this: Now after a large experiment I collected over 500 million rows of data. I find querying these data painfully slow; below is an example of a SELECT COUNT(*) I am running. Both the PgSQL server and client are on the same machine, which has 4 Xeon E7-4870s (40 cores/80 threads in total) and 1 TB of RAM. The effect of Linux file caching is obvious: the first query took well over 12 minutes while the subsequent ones took about 1.5 minutes. Is there anything I can do to make the query run faster, since 1.5 minutes isn't exactly responsive? Thanks.
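An exact `COUNT(*)` in PostgreSQL has to visit every live row, so on 500 million rows it will always cost a scan. When an approximate figure is acceptable, the planner's statistics can answer instantly. A hedged sketch (the table name is assumed; accuracy depends on how recently autovacuum/ANALYZE ran):

```sql
-- Approximate row count from planner statistics, no table scan
SELECT reltuples::bigint AS approx_rows
FROM   pg_class
WHERE  relname = 'latency_measurements';  -- hypothetical table name
```

For exact counts over a filtered subset, a partial index matching the WHERE clause, or pre-aggregated summary tables maintained by the loading job, are the usual ways to avoid rescanning the full table.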
Oracle Enterprise Manager Database Console - default login and password? Posted: 24 Jul 2013 02:12 PM PDT I've just installed an Oracle 11g Standard Edition database. The installation ran successfully, however when going to Oracle Enterprise Manager I'm not really sure what my "username" would be. I entered a password during the installer, and I tried SYS and my password connecting as "Normal" and "SYSDBA". However, I'm not really sure what my username is meant to be. Is it just that I haven't set up a username/password for this yet? Thanks
Should I refresh query plan cache Posted: 24 Jul 2013 10:10 AM PDT Please let me explain my problem and situation: I have a web application - MVC3, MSSQL Server 2005, LinqToSQL. It had been running great until one fine morning I pushed a lot of rows to a heavily used table, and since then I have been getting query timeouts. In order to fix the problem I ran the Database Tuning Advisor and added some indexes and statistics. I also created a maintenance plan to rebuild indexes daily. After those additions, the application has been unstable; it works fast for a couple of hours, then starts timing out again. Next, I had to clean up the table, and the number of rows in it is even smaller now than it was before, but the timeouts are still happening. So, I removed all the indexes that I created, and now the website is much more stable, but from time to time I still see some timeouts. I've been trying to figure out how to fix those queries: when I profile one and paste the query directly into SQL Server Management Studio, it returns the results in 1 second, but when I run the same query from my application, it takes about 25 seconds. Then, after it runs for the first time, the next time it goes as fast as on the server! I started doing some research, and it looks like when I played with all those indexes my query plans got messed up, and now they are creating issues. My questions are:
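The fast-in-SSMS-but-slow-from-the-app pattern is the classic symptom of a stale or badly sniffed parameterized plan (or of differing session SET options between SSMS and the application). Rather than flushing the entire plan cache, SQL Server 2005 lets you target the objects involved. A hedged sketch with hypothetical object names:

```sql
-- Mark one table so every plan referencing it recompiles on next use
EXEC sp_recompile N'dbo.YourHeavilyUsedTable';

-- Refresh the statistics the optimizer bases those plans on
UPDATE STATISTICS dbo.YourHeavilyUsedTable WITH FULLSCAN;

-- Inside a problem query, trade compile cost for a plan tailored
-- to the current parameter values, sidestepping parameter sniffing:
-- SELECT ... FROM dbo.YourHeavilyUsedTable WHERE col = @p OPTION (RECOMPILE);
```

`DBCC FREEPROCCACHE` with no arguments also works on 2005, but it evicts every cached plan and causes a server-wide compile storm, so the targeted options above are usually preferable.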
Push Data from MySQL to SQL Server Posted: 24 Jul 2013 01:50 PM PDT I am not sure if my title is correct, but I will try my best to explain the current problem. Current Scenario:
I have completed this:
I have been reading and trying the following:
Problems:
Any detailed explanation is greatly appreciated. Thanks. Taken from my question at: http://stackoverflow.com/questions/17583924/push-data-from-mysql-to-mssql
Convert units of measurement Posted: 24 Jul 2013 06:28 PM PDT Looking to calculate the most suitable unit of measurement for a list of substances where the substances are given in differing (but compatible) unit volumes.
Unit Conversion Table
The unit conversion table stores various units and how those units relate: Sorting by the coefficient shows that the This table can be created in PostgreSQL using: There should be a foreign key from
Substance Table
The Substance Table lists specific quantities of substances. For example: The table might resemble:
Problem
How would you create a query that finds a measurement to represent the sum of the substances using the fewest digits that has a whole number (and optionally real component)? For example, how would you return: But not: Because 112 has fewer real digits than 1.12 and 112 is smaller than 1120. Yet in certain situations using real digits is shorter -- such as 1.1 litres vs 110 centilitres. Mostly, I'm having trouble picking the correct unit based on the recursive relation.
Source Code
So far I have (obviously non-working):
Ideas
Does this require using log10 to determine the number of digits?
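The original table definitions were omitted above, so this is a hedged sketch under assumed schemas `unit(name, coefficient)` (coefficient = how many base units one of this unit represents) and `substance(name, quantity, unit_name)`. The idea: convert each quantity into every candidate unit and keep the rendering with the shortest text, which approximates "fewest digits":

```sql
-- For each substance, pick the unit whose converted amount prints shortest
SELECT DISTINCT ON (s.name)
       s.name,
       s.quantity * su.coefficient / u.coefficient AS amount,
       u.name AS unit
FROM   substance s
JOIN   unit su ON su.name = s.unit_name   -- unit the quantity was given in
CROSS  JOIN unit u                        -- every candidate target unit
ORDER  BY s.name,
          length((s.quantity * su.coefficient / u.coefficient)::numeric::text);
```

Ordering by the textual length is a blunt proxy (it counts the decimal point and trailing zeros too); a refinement along the `log10` idea in the question would compare `floor(log10(amount))` and penalize fractional parts explicitly.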
export (mysqldump) databases from MySQL version 4.0? Posted: 24 Jul 2013 09:37 AM PDT Goal: export the schema and data from a MySQL version 4.0.16 DB. Background: I was asked to migrate a database from MySQL version 4.0.16. The database was installed on a Windows XP machine, and I just could not find the mysqldump tool on that machine. I then used PuTTY to connect to a Linux machine that has mysqldump. I ran the command, but got an error. I looked it up in the MySQL reference: INFORMATION_SCHEMA is something new in version 5 and above. This means the mysqldump on the Linux box is incompatible with the production MySQL server, which is on v4.0.16. I tried to download the MySQL 4.0 server and install it on a Windows machine so that I could get hold of a compatible version, but searching on the MySQL website found nothing older than 5.0. (I also tried MySQL Workbench to connect to this DB. Failed. "Unknown character set utf8") So how can I export the schema and data from this legacy MySQL DB?
Where and how to store my weird datamart Posted: 24 Jul 2013 02:55 PM PDT I could really use some help here. Here's my situation. I'm building a datamart in SQL Server 2005, which will furnish a single report (for now). We have 26 clients (medical organizations) that will be using this report. Each client will have between 1 and 2,000 users accessing it at any given time (most likely to average in the 50's but it needs to scale). There will be 26 versions of the same rdl, one for each client. Each one will access its own respective client database. The interface for said report is SSRS 2012. This is our first 2012 report - the rest are still 2005, and the rdl will access a 2005 database. I am not using OLAP; the SSRS report runs SPs and views. We built a virtual server and installed SQL 2012 to host the report (both these virtual servers live on the same physical machine, in case that matters). Nothing else will be running on the SQL 2012 virtual server. These are the facts about the environment... Our system is not OLTP heavy - with one exception, which I'll describe below, it's all Read (except for the ETL of course). We have one client-facing database per client, 26 altogether. In these we store transactional data, rolled-up data, some report-ready flat tables and a ton of T-SQL code that crunches numbers when clients pull up reports in SSRS. I'll refer to these databases as "operational" because, for me, they will essentially function as ODS's. The operational databases are loaded by a horrific ETL process (clients have varying load schedules - usually monthly or weekly). I'll be building a mini ETL process (hopefully not so horrific) to populate the datamart from those operational databases. All of the datamart's dimensions are conforming, but due to HIPAA restrictions, some of them (like Physician and Patient) can't be stored in a central database, and neither can the fact table. 
So there will need to be 26 versions of the same fact and dimension tables, one for each client. There is a real-time component to our system. Physicians and nurses can enter transactional data via our site and all reports need to reflect the changes immediately. As for the datamart, they'll only affect the fact table. For what it's worth, this is why I decided not to use SSAS. I know diffs process really quickly but it just feels like too many moving parts. I plan to create a trickle-in fact table with a view that combines it with the main fact table. Again, I'll need 26 of these. A pared down version of my new ETL process will need to run upon each user edit. Here are my questions...
With all of these questions, I'm concerned about good design practice, but mostly about the performance of the report and the performance of the ETL that will need to run upon each user edit. I hope all of this made sense. I would very much appreciate any feedback! EDIT: @Jon Seigel - There will be 26 versions of the same rdl, one for each client. Each one will access its own respective client database. EDIT: @JNK - I merged accounts and read the FAQs. Hopefully I'm responding correctly now.
why would command text not execute at all? Posted: 24 Jul 2013 05:38 PM PDT in my report i have:: i am using sql server profiler to see exactly what statement is being set. i have set only two filters:
yet after running the report, no statement gets intercepted. i suspect that because i am a beginner at SSRS, i am missing something crucial here. for what reason would commandtext not be executed at all? i did follow this question, to make sure that i am using sql profiler correctly, and indeed, i am: http://stackoverflow.com/questions/9107383/sql-server-profiler-capture-calls-to-your-databases-stored-procs-during-ssrs another bit of important information is although the chart shows no data: i actually am indeed showing data when i run the commandtext from ssms! |
Avoiding performance hit from GROUP BY during FULLTEXT search? Posted: 24 Jul 2013 10:37 AM PDT Is there any clever way to avoid the performance hit from using GROUP BY during a fulltext search? In this example it's fetching the lowest post_id for unique topic_ids that match the text. With the GROUP BY to find the MIN, it's taking 600 ms in a million-row database, with about 50K rows examined. If I remove the MIN but leave the GROUP BY, it's just as slow, so the GROUP BY is the hit. I suspect this is because it can only use one index, the fulltext one? Without the GROUP BY it's 1 ms, so this has to be filesort speed? (I've removed ORDER BY and everything else to isolate where the hit is.) Thanks for any insight and ideas. (Using MyISAM under MariaDB, if it matters.)
AWS performance of RDS with provisioned IOPS vs EC2 Posted: 24 Jul 2013 11:37 AM PDT Has anyone done a performance comparison of AWS RDS with the new provisioned IOPS vs EC2? I've found plenty of non-high IOPS RDS vs EC2 but nothing with the new high IOPS feature in RDS. |
sp_startpublication_snapshot Parameter(s) Posted: 24 Jul 2013 06:38 PM PDT I am creating a stored procedure that:
Here is my new brick wall... On this DB server, I have a 'shell' db that they will be running the SP from, that has a history table so I can keep track of who created/deleted databases using my SP's... The only parameter for sp_startpublication_snapshot is @publication... I can give it the publication name, but since I am not running it from the publishing database, how do I specify the publishing database? i.e.: the publication shows up as: but I am running the script from the database [WC_QACatalog] Any ideas about how to accomplish this? Thank you, Wes |
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL Posted: 24 Jul 2013 02:09 PM PDT I had to write a simple query where I go looking for people's names that start with a B or a D: I was wondering if there is a way to rewrite this to be more performant. So I can avoid
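Since the original query was omitted, this is a hedged sketch with a hypothetical table `people(name)`. A single anchored POSIX regex replaces two `LIKE` clauses joined with `OR`, and an operator-class index lets anchored patterns use an index scan even in databases whose locale is not C:

```sql
-- One anchored regex instead of: name LIKE 'B%' OR name LIKE 'D%'
SELECT name
FROM   people
WHERE  name ~ '^[BD]';

-- text_pattern_ops allows the planner to turn anchored LIKE/regex
-- prefixes into index range scans under non-C locales
CREATE INDEX people_name_tpo ON people (name text_pattern_ops);
```

In PostgreSQL, `~` (POSIX regex) is generally preferred over `SIMILAR TO`, which is rewritten into a regex internally anyway and offers no performance advantage.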
You are subscribed to email updates from Recent Questions - Database Administrators Stack Exchange.