Tuesday, April 2, 2013

[how to] Set field values to newly imported rows in PostgreSQL table with existing data

Set field values to newly imported rows in PostgreSQL table with existing data

Posted: 02 Apr 2013 08:06 PM PDT

I have a PostgreSQL table with existing data, and I need to import all the rows of a CSV file into that table. I am using pgAdmin3's Import tool to do the import.


Question: For the newly imported rows, how do I set the values of a specific column that is not in the CSV? I can't do UPDATE staging.tablename SET colname = 'somename' the way I could when importing into an empty table. Or is this approach wrong?
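One approach that should work, sketched below: load the CSV into a temporary table shaped like the file, then copy it into the real table while supplying the constant for the extra column (the CSV column names here are hypothetical):

CREATE TEMP TABLE import_staging (col1 text, col2 text);  -- hypothetical CSV columns

COPY import_staging FROM '/path/to/file.csv' WITH CSV HEADER;

-- Copy into the existing table, filling in the column that is not in the CSV
INSERT INTO staging.tablename (col1, col2, colname)
SELECT col1, col2, 'somename'
FROM import_staging;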

SQL Server migrations go through client machine or not?

Posted: 02 Apr 2013 06:10 PM PDT

I want to migrate a SQL Server db on one machine to another machine. Both machines are remote. If I run the Copy Database wizard, will it pull all the data across the Internet to my client machine and then push it back out to the destination? Or will the source and destination machines communicate directly?

It's a very big database, so this matters quite a bit.

Unexpected Table Scan with Parameterized LIKE

Posted: 02 Apr 2013 05:25 PM PDT

I'm experiencing an unexpected table scan on SQL Server 2005 against a heap table when parameterizing a LIKE statement... but when the same value as the variable is hard-coded, the expected Index Seek occurs instead.

The issue only happens in this specific scenario, so I'm not confused about how to resolve it; I'm confused about why it is happening.

The following T-SQL should recreate the issue on SQL Server 2005:

IF (OBJECT_ID('tempdb.dbo.#tblTest') IS NOT NULL)
    DROP TABLE dbo.#tblTest
GO

CREATE TABLE dbo.#tblTest (
    ID INT IDENTITY(1, 1),
    SerialNumber VARCHAR(50)
)
GO

-- Populate the table with 10,000 rows
SET NOCOUNT ON
DECLARE @i INT
SET @i = 0

WHILE @i < 10000
BEGIN
    INSERT INTO dbo.#tblTest VALUES(CAST(@i AS VARCHAR(10)))
    SET @i = @i + 1
END
GO

-- To recreate the issue, the table must be a heap.
ALTER TABLE dbo.#tblTest ADD CONSTRAINT PK_tblTest PRIMARY KEY NONCLUSTERED (ID)
GO

-- Create a (non-covering) index on serial number.
CREATE NONCLUSTERED INDEX IX_tblTest_SerialNumber ON dbo.#tblTest (SerialNumber)
GO

DECLARE @Criteria VARCHAR(50)
SET @Criteria = '1234%'

-- This produces a Table Scan.
SELECT *
FROM dbo.#tblTest
WHERE SerialNumber LIKE @Criteria

-- This produces an Index Seek
SELECT *
FROM dbo.#tblTest
WHERE SerialNumber LIKE '1234%'

I was directed towards this article by Paul White which seems very closely related, but the conclusions / explanations don't match my specific issue.
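For what it's worth, forcing a statement-level recompile should let the optimizer see the runtime value of @Criteria the same way it sees the hard-coded literal. A sketch of that variation:

DECLARE @Criteria VARCHAR(50)
SET @Criteria = '1234%'

-- Recompiling at runtime lets the optimizer sniff the variable's value,
-- so it can estimate the LIKE selectivity as it does for the literal.
SELECT *
FROM dbo.#tblTest
WHERE SerialNumber LIKE @Criteria
OPTION (RECOMPILE)

That works around the symptom, but it still doesn't explain why the heap makes a difference, which is the part I'm after.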

Any insight is appreciated.

Oracle equiv of T-SQL UNION ALL query

Posted: 02 Apr 2013 02:24 PM PDT

In T-SQL I can accumulate the results of two unioned queries using the following syntax:

SELECT Q.a, SUM(Q.b)
FROM (
    SELECT f1 as a, f2 as b FROM TBL1
    UNION ALL
    SELECT f1 as a, f2 as b FROM TBL2
) Q
GROUP BY Q.a

What is the equivalent in Oracle?
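From what I've read so far, it may be as simple as keeping the inline view's alias (Oracle just doesn't use the AS keyword for table aliases). A sketch of what I mean, untested:

SELECT Q.a, SUM(Q.b)
FROM (
    SELECT f1 AS a, f2 AS b FROM TBL1
    UNION ALL
    SELECT f1 AS a, f2 AS b FROM TBL2
) Q
GROUP BY Q.a;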

Login Failed for Domain\ServerName$ when trying to create a new SQL Server Reporting Services Application in SharePoint 2013

Posted: 02 Apr 2013 01:51 PM PDT

Basic setup:

All of the following is installed on Domain\Servername:

  • SharePoint 2013 Standalone Install (Trial version)
  • Currently using an instance of SQL Server 2012 SP1 Express for basic SharePoint databases (upgraded from SQL Server 2008 R2 Evaluation).
  • Also using an instance of SQL Server 2012 SP1 Evaluation for Reporting Services for Sharepoint (Express instance doesn't support Reporting Services for Sharepoint Integration and Express instance couldn't be upgraded to Evaluation)

Problem:
Ok, so I'm going through the steps to Install Reporting Services Sharepoint Mode for Sharepoint 2013 that can be found at http://msdn.microsoft.com/en-us/library/jj219068.aspx.  When I get to the end of Step 3: Create a Reporting Services Service Application and press OK, I get the following error: Login Failed for user 'Domain\Servername$'

According to what I've found on the interweb this means that "a process running as NETWORK SERVICE or as LocalSystem has accessed a remote resource, has authenticated itself as the machine account and was denied authorization." 

This makes sense, since the account used to provision the new Reporting database that gets created behind the scenes in Step 3 is the identity of the SharePoint Central Administration application pool, which I currently have set to Network Service.

Anyhoo, the agreed upon solution is to add the login Domain\Servername$ to the SQL Server Instance that's giving me trouble and give it the right permissions ( link for similar problem: http://support.microsoft.com/kb/889646/en-us).

So, I add the login for Domain\Servername$ and give it server roles: dbcreator, public, securityadmin and sysadmin and then just to be on the safe side I give it db_owner permissions for each database in the SQL Server 2012 Evaluation instance I have.
And I still get the same error :S

Just to get some more information, I looked at the SQL Server error logs using Management Studio and I can see that the error ID is 18456, Severity 14, State 5, and the reason for the error is: Could not find a login matching the name provided. [CLIENT: ]

Question

Why am I getting this error if I've already added the login Domain\ServerName$? Any ideas or workarounds?
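One thing I still plan to double-check (sketch below): which instance is actually rejecting the connection. The error might be coming from the Express instance hosting the SharePoint databases rather than from the Evaluation instance, so I want to confirm the login exists on both.

-- Run against each instance (Express and Evaluation)
SELECT name, type_desc
FROM sys.server_principals
WHERE name = N'Domain\ServerName$';

-- Create it on whichever instance it is missing from
CREATE LOGIN [Domain\ServerName$] FROM WINDOWS;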

Thanks!

What must be in place to validate an XMLTYPE against a schema?

Posted: 02 Apr 2013 05:48 PM PDT

I have a procedure that generates an XMLTYPE and I want to validate it against a schema. The problem is that there seems to be a permissions issue running createSchemaBasedXML because when I run the procedure as AUTHID DEFINER it gives the error "ORA-31050: Access denied", but when I run it as AUTHID CURRENT_USER it actually returns a validation specific error (I'll deal with that separately). CURRENT_USER is not an acceptable solution.

My supposition is that CURRENT_USER works because the user has the XMLADMIN role. Granting the permissions the role includes does not resolve the issue, so it must be the role's ability to bypass the ACLs.

The thing is, querying RESOURCE_VIEW for the ACL that protects the resource shows that it is protected by /sys/acls/all_owner_acl.xml. DBMS_XDB.getPrivileges shows that the xsd has all the following permissions:

<read-properties/>
<read-contents/>
<write-config/>
<link/>
<unlink/>
<read-acl/>
<write-acl-ref/>
<update-acl/>
<resolve/>
<link-to/>
<unlink-from/>
<dav:lock/>
<dav:unlock/>
<dav:write-properties/>
<dav:write-content/>
<dav:execute/>
<dav:take-ownership/>
<dav:read-current-user-privilege-set/>

Using DBMS_XDB.getAclDocument shows a principal of dav:owner has all privileges, so that must not be enough to allow the owner of the schema to create schema based XML. With this thought in mind I created a block to run DBMS_XDB.createResource creating a new ACL. I can successfully create the ACL and from SQLDeveloper I can see that it exists in the location I created it in. The issue comes when I try to run DBMS_XDB.setACL. It returns the exception ORA-31020: The operation is not allowed, Reason: Not a valid ACL path even though I use the same name and path as in the createResource statement.
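For context, the basic flow I'm attempting looks roughly like this (a sketch; the schema URL and the CLOB variables are placeholders, and the register step is one-time setup):

-- One-time setup: register the schema (typically done by a privileged user)
BEGIN
  DBMS_XMLSCHEMA.registerSchema(
    schemaurl => 'http://example.com/order.xsd',  -- placeholder URL
    schemadoc => my_schema_clob,                  -- placeholder CLOB
    local     => TRUE,
    gentypes  => FALSE,
    gentables => FALSE);
END;
/

-- In the procedure: bind the document to the schema, then validate it
DECLARE
  doc XMLTYPE := XMLTYPE(my_xml_clob);            -- placeholder CLOB
BEGIN
  doc := doc.createSchemaBasedXML('http://example.com/order.xsd');
  doc.schemaValidate();                           -- raises an error if the document is invalid
END;
/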

There are any number of places I could be going wrong in this process, so the core of what I am looking for is this:

What must be in place to validate an XMLTYPE against a schema?

Database theories possible topics to research through [closed]

Posted: 02 Apr 2013 12:42 PM PDT

I need to conduct research on database theory. What questions would be suitable to research when investigating database topics for an online assessment system?

I have one question so far, "Types of Databases", where I want to look at relational databases, object-oriented databases and online databases. Is it reasonable to examine each of the three types individually, looking at their purpose, benefits and limitations? Also, is the MySQL database used in phpMyAdmin actually a relational database, or is it classed as an online database?

Are there any more questions I should research under database theory?

Thanks

Inner join using an array column in PostgreSQL

Posted: 02 Apr 2013 12:07 PM PDT

Having trouble indexing and executing a query in O(log n) time.

The query includes an inner join, an ORDER BY, and an equality operation. If I understand the laws of databases correctly, a query can be indexed and executed in O(log n) time (or thereabouts) if a non-equality operator is not used on more than one field. In this case, I believe the inner join does count as an equality operator and the non-equality operator would be the 'ORDER BY' portion of the query. This table has upwards of 10,000,000 rows and needs to handle several reads and writes per second.

Using PostgreSQL. This is what the table looks like. As you can see, the field 'Names' is a list property and it is the column that the inner join goes against:

  Age  Names                       Date
  34   ['carla', 'john', 'sam']    3/13/2011
  26   ['json', 'cindy', 'joel']   3/13/2011
  72   ['beth', 'amber', 'susie']  3/13/2011
  14   ['john', 'jim', 'debie']    3/13/2011

This is the query that we are trying to do:

  SELECT * FROM the_table WHERE Age==26 AND Names=='john' ORDER BY Date  

My background is with App Engine's Big Table, so I've used equality operators here to indicate that 'john' should be one of the names in the 'Names' column. This would be an acceptable query in GAE's Big Table, and it would execute in O(log n) time, as all Big Table queries are required to do. I am assuming there is a way to do this in PostgreSQL as well, since PostgreSQL accepts list data types as columns.

Is this possible to do in PostgreSQL?

If so, how should the index be set up (we can't figure out how to set up an index that takes into account the three properties)?
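A sketch of what I'm imagining, assuming the column is a text[] named names (untested): a GIN index on the array column for the containment test, plus a btree on (age, date), letting the planner combine them.

CREATE INDEX the_table_names_gin ON the_table USING GIN (names);
CREATE INDEX the_table_age_date ON the_table (age, date);

-- "@>" tests array containment, i.e. 'john' is one of the names
SELECT *
FROM the_table
WHERE age = 26
  AND names @> ARRAY['john']
ORDER BY date;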

Concatenation Physical Operation: Does it guarantee order of execution?

Posted: 02 Apr 2013 11:40 AM PDT

In standard SQL, the result of a union all is not guaranteed to be in any order. So, something like:

select 'A' as c union all select 'B'  

Could return two rows in any order (although, in practice on any database I know of, 'A' will come before 'B').

In SQL Server, this turns into an execution plan using a "concatenation" physical operation.

I could easily imagine that the concatenation operation would scan its inputs, returning whatever input has records available. However, I found the following statement on the web (here):

The Query Processor will execute this plan in the order that the operators appear in the plan, the first is the top one and the last is the end one.

Question: Is this true in practice? Is this guaranteed to be true?

I haven't found any reference in Microsoft documentation that the inputs are scanned in order, from the first to the last. On the other hand, whenever I try running it, the results suggest that the inputs are, indeed, processed in order.

Is there a way to have the engine process more than one input at a time? My tests (using much more complicated expressions than constants) are on a parallel-enabled 8-core machine, and most queries do take advantage of the parallelism.

Handling expiration in a course registration or ticketing system

Posted: 02 Apr 2013 04:15 PM PDT

I want to restructure the database for a web application I inherited for handling summer camp registrations, but my question is relevant to any event registration or ticketing system.

My question is about how to best handle the need for registrations to expire if unpaid for X amount of time.

Currently the database design is something like this (some columns omitted for brevity):

PK = primary key
FK = foreign key

sale_header (
    PK id
    FK account_id
    payment_status
    total
    payment_method
    payment_time
)

sale_item (
    PK (sale_header_id, camper_id)
    price
)

class_registration (
    PK (camper_id, class_id)
    time_added
    registered (boolean)
)

When payment is completed, the registered flag is set to true. There is a cleanup script that runs regularly to clear out unpaid registrations by checking for registrations where class_registration.registered = 0 AND sale_header.payment_status = 'not paid', and for which the configured time allowed until expiration has elapsed.

There are currently some bugs related to this, and anyway it seems to me to be a somewhat strange design given that the registered flag is only set to true when payment is completed, so the payment_status column seems to be unnecessary; it seems like it would be much cleaner to have the sale_header table only contain transactions that were actually completed. Also, even if I kept this approach, it seems it would be better if there were a FK linking sale_item and class_registration.

I've been researching potential database designs for this scenario and one thought I had was to add price info to the class_registration table and get rid of the sale_item table, since the system is only for selling class registrations, nothing else:

sale (
    PK id
    FK account_id
    total
    payment_method
    payment_time
)

class_registration (
    FK camper_id
    FK class_id
    FK sale_id
    price
    time_added
)

In this design, the FK sale_id would initially be null while the parent was still choosing classes for their camper(s), so finding expired, unpaid registrations would be done by querying for registrations where sale_id is NULL, and which had exceeded the expiration time.
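For example (a sketch, assuming PostgreSQL-style syntax), the expiration check under this design might look like the query below, with the window supplied by application config so it can be changed in a live system:

SELECT camper_id, class_id
FROM class_registration
WHERE sale_id IS NULL
  AND time_added < NOW() - INTERVAL '30 minutes';  -- window comes from app config, not the schema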

Another option I thought of, inspired by this example (scroll down to "Data Modeling in the Sample Application"), would be to have a separate class_availability table, which might be good for querying performance when determining available capacities:

class_registration (
    FK camper_id
    FK class_id
    FK sale_id
    price
    time_added
)

class_availability (
    FK class_id
    FK camper_id
    FK registration_id
)

Finally, I considered what it would look like if I kept the sale_item table, although I'm not sure what would be the advantage of doing so:

class_registration (
    PK (camper_id, class_id)
    FK sale_item_id
    time_added
)

sale_item (
    PK sale_item_id
    FK sale_header_id
    price
)

So my specific concerns are:

  • What are the pros and cons of these approaches?
  • Should expired registrations stay in the database and just be excluded when querying, or actually be deleted?
  • A requirement to keep in mind is that it should be possible to change the configured expiration time in a live system (in case of bugs, which has happened before with this system), so having a column like expiration_time in the database isn't an option; it needs to be calculated based on time_added.
  • (Bonus :) Feel free to skip this one and focus on the above questions.) In my new proposed designs (or some other design), how could I handle the possible need for admins to be able to extend the expiration time for a particular camper in case there was some issue with payment that needed to be worked out, and their registration should be held in the meantime?

Thanks!

SQL server ODBC connection for Active Directory user who doesn't log in to the windows

Posted: 02 Apr 2013 01:15 PM PDT

Do you think it is possible to create a SQL Server ODBC connection for an Active Directory user who doesn't log in to Windows?

Ideally this type of user would be used in a batch process. So, one person logs in, creates a batch process, and runs it as another user.

Note: I don't want to create a SQL Server authentication login. Instead, I would like to use Active Directory.
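To be clear, on the SQL Server side I'm assuming this would just be a Windows login for that account, something like the sketch below (the account and database names are hypothetical):

CREATE LOGIN [DOMAIN\svc_batch] FROM WINDOWS;

USE SomeDatabase;
CREATE USER [DOMAIN\svc_batch] FOR LOGIN [DOMAIN\svc_batch];

-- The ODBC DSN would then use integrated security (Trusted_Connection=Yes)
-- while the batch runs under that AD account.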

Thanks.

How to reinsert corrected rows from the conflict table?

Posted: 02 Apr 2013 01:43 PM PDT

I have a bidirectional merge replication. I had constraint failures because the primary key was just an integer. I changed the primary key to the old primary key plus a location identifier. The problem is how to reinsert the old rows from the conflict table (which I can correct manually in MSmerge_conflict_) into the publishers and subscribers. Can you help me please?

Sorry for any mistakes; I'm not a native English speaker.

Connector table hash values: trying to optimize data storage

Posted: 02 Apr 2013 09:01 AM PDT

I am designing a system which stores certain hash values to represent certain pieces of text for example.

Each hash value can represent more than one text file.

I have 4 tables: details_table, text_table, attribute_table and connector_table.

The attribute_table is laid out as followed:

id|attribute|type|date_added  

The details_table is laid out like:

id|detail|date_added  

The text_table is laid out like :

id|text|status|user_added  

And finally the connector_table is laid out like :

id|text_table_id|attribute.id|detail.id|date_added  

The details table contains the hash value. To save time, when a hash value is marked as ready to export I would like the associated text rows to also be marked as ready to export. How would I do this?

Would I need to add another table with hash status ?

If I use the status on the text table, then when the text fields are exported the hash would be marked as exported, and so any new files that had the same hash would be marked as exported too.

It's really bugging me, as the only way I can see is to create another table, connect the detail hash to it, and set the status that way.

If anyone else has anything better or a different solution, I would appreciate the guidance.
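To frame what I mean by marking the associated rows, the statement I'm picturing is roughly this (a sketch in MySQL-style syntax; the status column on text_table and the detail_id column name are assumptions):

UPDATE text_table AS t
JOIN connector_table AS c ON c.text_table_id = t.id
JOIN details_table AS d ON d.id = c.detail_id
SET t.status = 'exported'
WHERE d.id = 123;   -- the hash row that was just marked ready to export

The join itself is easy enough; the open question is where the export status should really live so that reusing a hash doesn't mislabel new text rows.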

How to execute a non-table-locking update operation on PostgreSQL?

Posted: 02 Apr 2013 10:38 AM PDT

Looking for a good way to update the value of a column for all the rows in a table (not huge, but big enough: about 10M records) without locking the whole table, so operations can continue while the update is running. The update operation is pretty simple, because the value of the new column is basically computed from another column, kind of like this:

UPDATE table
    SET a_col = array[col];

However, I have to leave this running for a long time, and I would like to not bog down the usual DB activity while the update happens. Is there a better method to do this than running a process that does everything in one go?
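As far as I understand, a plain UPDATE in PostgreSQL only takes row-level locks on the rows it touches (plus a ROW EXCLUSIVE table lock that doesn't block normal reads and writes), but a single 10M-row statement still generates a lot of bloat and I/O in one go. The pattern I'm considering instead (a sketch, assuming a primary key column id) is to update in small batches, each in its own transaction:

-- Repeat (e.g. from a small script or cron job) until it reports 0 rows updated
UPDATE the_table
SET a_col = ARRAY[col]
WHERE a_col IS NULL
  AND id IN (
      SELECT id
      FROM the_table
      WHERE a_col IS NULL
      LIMIT 10000
  );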

Thanks very much!

Backup script completes, but doesn't back up all dbs

Posted: 02 Apr 2013 04:24 PM PDT

I'm running SQL Server 2005 and I've made a simple backup script that backs up all the databases on one server. The script always succeeds, but when I check the log or the files, I see it has only backed up around half of my databases. I've tried it on multiple servers and it does the same thing. I've checked the application log for the backups in Event Viewer; however, it tells me to check the application log for more details, so I can never find the detailed information I need.

I saw this similar question and tried the solution that was recommended to try and catch errors. However, there were no errors thrown when I run the script.

DECLARE @db_names NVARCHAR(1000)

DECLARE db_cursor CURSOR
    FOR select name from sys.databases
        where name not in ('tempdb')
        order by name

OPEN db_cursor
FETCH NEXT FROM db_cursor into @db_names
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        EXECUTE [Full_Backup] @DB_name = @db_names
    END TRY
    BEGIN CATCH
        EXEC spErrorHandling
    END CATCH

    FETCH NEXT FROM db_cursor INTO @db_names
END
CLOSE db_cursor
DEALLOCATE db_cursor

I used this site's example for error handling.

-- Declaration statements
DECLARE @Error_Number int
DECLARE @Error_Message varchar(4000)
DECLARE @Error_Severity int
DECLARE @Error_State int
DECLARE @Error_Procedure varchar(200)
DECLARE @Error_Line int
DECLARE @UserName varchar(200)
DECLARE @HostName varchar(200)
DECLARE @Time_Stamp datetime

-- Initialize variables
SELECT @Error_Number = isnull(error_number(),0),
    @Error_Message = isnull(error_message(),'NULL Message'),
    @Error_Severity = isnull(error_severity(),0),
    @Error_State = isnull(error_state(),1),
    @Error_Line = isnull(error_line(), 0),
    @Error_Procedure = isnull(error_procedure(),''),
    @UserName = SUSER_SNAME(),
    @HostName = HOST_NAME(),
    @Time_Stamp = GETDATE();

-- Insert into the dbo.ErrorHandling table
INSERT INTO dbo.ErrorHandling (Error_Number, Error_Message, Error_Severity, Error_State, Error_Line,
    Error_Procedure, UserName, HostName, Time_Stamp)
SELECT @Error_Number, @Error_Message, @Error_Severity, @Error_State, @Error_Line,
    @Error_Procedure, @UserName, @HostName, @Time_Stamp

Is there a reason why backups fail silently? Is there better error handling I can be doing?

[Full_Backup]

declare @path varchar(200)
set @path = 'P:\Backups\' + CONVERT(char(10), GetDate(),126) + '_' + @DB_name + '.bak'

declare @backupStr nvarchar(max)
set @backupStr = 'BACKUP DATABASE [' + @DB_name + '] TO  DISK = N''' + @path + ''' WITH NOFORMAT, NOINIT,
NAME = N''' + @DB_name + '-Full Database Backup'', SKIP, NOREWIND, NOUNLOAD,  STATS = 10, CHECKSUM

declare @backupSetId as int
select @backupSetId = position from msdb..backupset where database_name=N''' + @DB_name + ''' and backup_set_id=(select max(backup_set_id) from msdb..backupset where database_name=N''' + @DB_name + ''' )
if @backupSetId is null begin raiserror(N''Verify failed. Backup information for database ''''' + @DB_name + ''''' not found.'', 16, 1) end
RESTORE VERIFYONLY FROM  DISK = N''' + @path + ''' WITH  FILE = @backupSetId,  NOUNLOAD,  NOREWIND'

Edited to add the [Full_Backup] sp
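For reference, the kind of change I'm considering (a sketch): skip databases that aren't ONLINE, and re-raise after logging so a failure is actually surfaced instead of being swallowed by the CATCH block.

DECLARE @db_names NVARCHAR(1000)

DECLARE db_cursor CURSOR FOR
    SELECT name
    FROM sys.databases
    WHERE name NOT IN ('tempdb')
      AND state_desc = 'ONLINE'          -- skip offline/restoring/recovering databases
    ORDER BY name

OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @db_names
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        EXECUTE [Full_Backup] @DB_name = @db_names
    END TRY
    BEGIN CATCH
        EXEC spErrorHandling
        -- Re-raise so the job step reports the failure instead of succeeding silently
        RAISERROR('Backup of %s failed.', 16, 1, @db_names)
    END CATCH
    FETCH NEXT FROM db_cursor INTO @db_names
END
CLOSE db_cursor
DEALLOCATE db_cursor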

Process attempted to unlock a resource it does not own

Posted: 02 Apr 2013 12:34 PM PDT

SQL Server 2005 SP4 32-Bit

I have a DBCC CHECKDB job running nightly. Last night, soon after the job started, I got the errors below. The database is NOT in suspect mode, and CHECKDB comes back clean when I run it now. The database is fully accessible. Should I be concerned? I'd hate to go back to a backup at this point.

 2013-04-02 02:10:55.53 spid56      Error: 1203, Severity: 20, State: 1.
 2013-04-02 02:10:55.53 spid56      Process ID 56 attempted to unlock a resource it
                                     does not own: PAGE: 34:1:388664. Retry the
                                     transaction, because this error may be caused
                                     by a timing condition. If the problem persists,
                                     contact the database administrator.
 2013-04-02 02:10:55.58 spid56      Error: 3314, Severity: 17, State: 3.
 2013-04-02 02:10:55.58 spid56      During undoing of a logged operation in
                                     database 'MY_DATABASE_NAME', an error occurred
                                     at log record ID (1342973:12519:37). Typically,
                                     the specific failure is logged previously as
                                     an error in the Windows Event Log service.
                                     Restore the database or file from a backup,
                                     or repair the database.

Automated SQL backup on a schedule, and clean-up of the backed-up data

Posted: 02 Apr 2013 04:39 PM PDT

I need to back up a SQL database (historian) on a schedule, and then clean up the database by removing the backed-up data.

I am using MS SQL 2008 (R2), on a Windows XP machine. The biggest issue is the very limited hard disk space. The database is limited to a maximum of 3GB! In terms of overall performance, the PC is really slow, and unfortunately I do not have the choice to change that. So, I could consider backing up overnight when the data flow is expected to be less.

The intention is to back up the data every two weeks and store the backup in a specific directory (e.g. C:\), from which an operator can move it to another machine. Given the limited space, I could then do some housekeeping by removing the backed-up data. What is more important is the ability to merge the regular backups into an external database, so perhaps a typical SQL backup and restore routine could be an option.
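As a starting point, something along these lines, run from a SQL Server Agent job (or a scheduled task calling sqlcmd) every two weeks, is what I have in mind; the table and column names in the clean-up step are placeholders:

-- Back up to the local drive so an operator can move the file off the machine
DECLARE @file nvarchar(260) =
    N'C:\Backups\historian_' + CONVERT(char(8), GETDATE(), 112) + N'.bak';

BACKUP DATABASE [historian]
TO DISK = @file
WITH INIT, CHECKSUM, STATS = 10;

-- Once the backup is verified and moved off, purge rows older than the retention
-- window (hypothetical table/column names) to keep the database under the 3 GB cap
DELETE FROM dbo.history_values
WHERE sample_time < DATEADD(DAY, -14, GETDATE());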

I would appreciate your kind advice regarding this matter. Thank you.

Foreign keys - link using surrogate or natural key?

Posted: 02 Apr 2013 11:18 AM PDT

Is there a best practice for whether a foreign key between tables should link to a natural key or a surrogate key? The only discussion I've really found (unless my google-fu is lacking) is Jack Douglas' answer in this question, and his reasoning seems sound to me. I'm aware of the discussion beyond that that rules change, but this would be something that would need to be considered in any situation.

The main reason for asking is that I have a legacy application that makes use of FKs with natural keys, but there is a strong push from developers to move to an OR/M (NHibernate in our case), and a fork has already produced some breaking changes. So I'm looking to either push them back on track using the natural key, or move the legacy app to use surrogate keys for the FKs. My gut says to restore the original FKs, but I'm honestly not sure if this is really the right path to follow.

The majority of our tables already have both a surrogate and a natural key defined (through a unique constraint and a PK), so having to add extra columns is a non-issue for us in this instance. We're using SQL Server 2008, but I'd hope this is generic enough for any DB.

Innodb Slow queries since convert from MyISAM

Posted: 02 Apr 2013 10:35 AM PDT

A few days ago we converted some write-intensive tables from MyISAM to InnoDB, hoping for better performance thanks to InnoDB's better locking, but instead of gaining performance we started to see simple queries in the slow log.

For example, the query below took 3.6 seconds to run:

# Time: 130402  7:24:07
# User@Host: iron[iron] @ localhost []
# Query_time: 3.596235  Lock_time: 0.000033 Rows_sent: 0  Rows_examined: 1
SET timestamp=1364883847;
UPDATE `cookies` SET `lastSelectedLanguage`="english" WHERE
`cookieID`="27276286";

The table structure is below:

mysql> describe cookies;
+----------------------+------------------+------+-----+---------+----------------+
| Field                | Type             | Null | Key | Default | Extra          |
+----------------------+------------------+------+-----+---------+----------------+
| cookieID             | bigint(20)       | NO   | PRI | NULL    | auto_increment |
| containerID          | int(10) unsigned | NO   | MUL | NULL    |                |
| dtCreated            | datetime         | NO   |     | NULL    |                |
| lastSelectedLanguage | varchar(31)      | YES  |     | NULL    |                |
+----------------------+------------------+------+-----+---------+----------------+

Running a SELECT (bypassing the query cache) with the same WHERE clause takes 0 sec:

mysql> SELECT SQL_NO_CACHE * FROM `cookies` WHERE `cookieID`="27276286";
+----------+-------------+---------------------+----------------------+
| cookieID | containerID | dtCreated           | lastSelectedLanguage |
+----------+-------------+---------------------+----------------------+
| 27276286 |           6 | 2013-04-02 06:23:52 | english              |
+----------+-------------+---------------------+----------------------+
1 row in set (0.00 sec)

The server has 16 CPU cores:

...
processor   : 15
vendor_id   : GenuineIntel
cpu family  : 6
model       : 44
model name  : Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
stepping    : 2
cpu MHz     : 2393.931
cache size  : 12288 KB
...

And it has SSD drives.

The total size of InnoDB tables on the server is 1.79 GB:

mysql> SELECT (SUM(DATA_LENGTH)+SUM(INDEX_LENGTH)) /1024/1024/1024 AS Total_InnoDB_in_GB FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE='InnoDB';
+--------------------+
| Total_InnoDB_in_GB |
+--------------------+
|     1.781707763672 |
+--------------------+
1 row in set (0.12 sec)

As you can see below, innodb_buffer_pool_size is 2.5 GB, which is almost 1 GB more than the sum of data + indexes.

mysql> SHOW VARIABLES LIKE '%innodb%';
+-----------------------------------------+------------------------+
| Variable_name                           | Value                  |
+-----------------------------------------+------------------------+
| have_innodb                             | YES                    |
| ignore_builtin_innodb                   | OFF                    |
| innodb_adaptive_hash_index              | ON                     |
| innodb_additional_mem_pool_size         | 1048576                |
| innodb_autoextend_increment             | 8                      |
| innodb_autoinc_lock_mode                | 1                      |
| innodb_buffer_pool_size                 | 2684354560             |
| innodb_checksums                        | ON                     |
| innodb_commit_concurrency               | 0                      |
| innodb_concurrency_tickets              | 500                    |
| innodb_data_file_path                   | ibdata1:10M:autoextend |
| innodb_data_home_dir                    |                        |
| innodb_doublewrite                      | ON                     |
| innodb_fast_shutdown                    | 1                      |
| innodb_file_io_threads                  | 4                      |
| innodb_file_per_table                   | ON                     |
| innodb_flush_log_at_trx_commit          | 2                      |
| innodb_flush_method                     | O_DIRECT               |
| innodb_force_recovery                   | 0                      |
| innodb_lock_wait_timeout                | 50                     |
| innodb_locks_unsafe_for_binlog          | OFF                    |
| innodb_log_buffer_size                  | 8388608                |
| innodb_log_file_size                    | 536870912              |
| innodb_log_files_in_group               | 2                      |
| innodb_log_group_home_dir               | ./                     |
| innodb_max_dirty_pages_pct              | 90                     |
| innodb_max_purge_lag                    | 0                      |
| innodb_mirrored_log_groups              | 1                      |
| innodb_open_files                       | 300                    |
| innodb_rollback_on_timeout              | OFF                    |
| innodb_stats_method                     | nulls_equal            |
| innodb_stats_on_metadata                | ON                     |
| innodb_support_xa                       | ON                     |
| innodb_sync_spin_loops                  | 20                     |
| innodb_table_locks                      | ON                     |
| innodb_thread_concurrency               | 8                      |
| innodb_thread_sleep_delay               | 10000                  |
| innodb_use_legacy_cardinality_algorithm | ON                     |
+-----------------------------------------+------------------------+
38 rows in set (0.00 sec)

I'm running version 5.1.66 (5.1 is the latest version in the non-backports repository of Debian squeeze; upgrading to 5.5/5.6 is unfortunately not an option).

mysql> SHOW VARIABLES LIKE 'version%';
+-------------------------+-----------------------+
| Variable_name           | Value                 |
+-------------------------+-----------------------+
| version                 | 5.1.66-0+squeeze1-log |
| version_comment         | (Debian)              |
| version_compile_machine | x86_64                |
| version_compile_os      | debian-linux-gnu      |
+-------------------------+-----------------------+
4 rows in set (0.00 sec)

Update 1:

Just had a new occurrence at 16:46:02; see below the iostat output starting at 16:46:00 (2-second interval):

Tue Apr  2 16:46:00 IST 2013
Linux 2.6.32-5-amd64 (hemlock)  02/04/13  _x86_64_  (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.54    0.00    0.44    0.22    0.00   96.79

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              67.88      1667.89      1369.53  801352236  658003944

Tue Apr  2 16:46:02 IST 2013
Linux 2.6.32-5-amd64 (hemlock)  02/04/13  _x86_64_  (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.54    0.00    0.44    0.22    0.00   96.79

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              67.88      1667.91      1369.53  801366876  658004640

Tue Apr  2 16:46:04 IST 2013
Linux 2.6.32-5-amd64 (hemlock)  02/04/13  _x86_64_  (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.54    0.00    0.44    0.22    0.00   96.79

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              67.89      1667.95      1369.52  801389476  658005912

Tue Apr  2 16:46:06 IST 2013
Linux 2.6.32-5-amd64 (hemlock)  02/04/13  _x86_64_  (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.54    0.00    0.44    0.22    0.00   96.79

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              67.89      1667.95      1369.53  801389628  658012616

Tue Apr  2 16:46:08 IST 2013
Linux 2.6.32-5-amd64 (hemlock)  02/04/13  _x86_64_  (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.54    0.00    0.44    0.22    0.00   96.79

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              67.89      1667.94      1369.53  801389652  658014192

Tue Apr  2 16:46:10 IST 2013
Linux 2.6.32-5-amd64 (hemlock)  02/04/13  _x86_64_  (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.54    0.00    0.44    0.22    0.00   96.79

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              67.89      1667.93      1369.53  801389668  658015880

And also the \s output for that period (16:46:00, 16:46:02, 16:46:04, 16:46:06):

Tue Apr  2 16:46:00 IST 2013
--------------
mysql  Ver 14.14 Distrib 5.1.66, for debian-linux-gnu (x86_64) using readline 6.1

Connection id:          370306
Current database:
Current user:           readonly@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.1.66-0+squeeze1-log (Debian)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 9 hours 13 min 25 sec

Threads: 2  Questions: 5265901  Slow queries: 109  Opens: 9852  Flush tables: 1  Open tables: 2048  Queries per second avg: 158.587
--------------

Tue Apr  2 16:46:02 IST 2013
--------------
mysql  Ver 14.14 Distrib 5.1.66, for debian-linux-gnu (x86_64) using readline 6.1

Connection id:          370336
Current database:
Current user:           readonly@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.1.66-0+squeeze1-log (Debian)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 9 hours 13 min 27 sec

Threads: 11  Questions: 5266200  Slow queries: 109  Opens: 9853  Flush tables: 1  Open tables: 2048  Queries per second avg: 158.587
--------------

Tue Apr  2 16:46:04 IST 2013
--------------
mysql  Ver 14.14 Distrib 5.1.66, for debian-linux-gnu (x86_64) using readline 6.1

Connection id:          370361
Current database:
Current user:           readonly@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.1.66-0+squeeze1-log (Debian)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 9 hours 13 min 29 sec

Threads: 27  Questions: 5266361  Slow queries: 110  Opens: 9858  Flush tables: 1  Open tables: 2048  Queries per second avg: 158.582
--------------

Tue Apr  2 16:46:06 IST 2013
--------------
mysql  Ver 14.14 Distrib 5.1.66, for debian-linux-gnu (x86_64) using readline 6.1

Connection id:          370402
Current database:
Current user:           readonly@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.1.66-0+squeeze1-log (Debian)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 9 hours 13 min 31 sec

Threads: 2  Questions: 5267006  Slow queries: 117  Opens: 9864  Flush tables: 1  Open tables: 2048  Queries per second avg: 158.592
--------------

Does anyone have any clue why these updates (it happens with inserts and deletes as well) are so slow?
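Next time it happens I plan to capture the following while one of the slow statements is in flight, to see whether it is waiting on row locks or on log/disk flushing (nothing exotic, just standard commands):

-- Run while one of the slow UPDATEs is executing
SHOW FULL PROCESSLIST;
SHOW ENGINE INNODB STATUS\G
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%';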

Relating ExecutionInstanceGUID to the SSISDB

Posted: 02 Apr 2013 07:42 PM PDT

The 2012 release of SQL Server Integration Services, SSIS, has delivered an SSISDB catalog which tracks the operations of packages (among other things). The default package execution for solutions using the Project Deployment model will have logging to the SSISDB turned on.

When a package executes, the System::ExecutionInstanceGUID is populated with a value that, if one were using explicit logging (to sys.sysdtslog90/sys.sysssislog) would record all the events for a specific package execution.

What I'd like to know is how to tie an ExecutionInstanceGUID to anything in the SSISDB catalog. Alternatively, is an SSIS package executing in the SSISDB privy to the value of its catalog.executions.execution_id?

Ultimately, I am trying to use the existing, custom audit table and link it back to the detailed history in the SSISDB catalog but can't seem to find the link.
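The closest lead I have so far (and I'd welcome corrections): packages run from the catalog expose a System::ServerExecutionID system variable that appears to match catalog.executions.execution_id, so if I write that value into the audit table the join becomes straightforward. A sketch, with dbo.PackageAudit standing in for the custom audit table:

SELECT e.execution_id,
       e.folder_name,
       e.project_name,
       e.package_name,
       e.status,
       a.execution_instance_guid     -- the System::ExecutionInstanceGUID captured by the package
FROM SSISDB.catalog.executions AS e
JOIN dbo.PackageAudit AS a
    ON a.server_execution_id = e.execution_id;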

How can I improve my table design for different types of an entity?

Posted: 02 Apr 2013 07:30 PM PDT

Consider an accounting system as an example. I have an entity called Client. Clients can be of different types, with different fields applicable to each type. I am considering creating a separate table for each type of Client, each having the fields applicable to that type, and one master table referencing all of them and holding the fields applicable to all types.

Currently, I come up with the following design:

(schema diagram omitted)

But I don't think my design is efficient enough (or even correct and free of errors). What would you suggest? Also, in case it matters, I am planning to use MariaDB.
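For reference, the general shape I keep coming back to is class-table inheritance: one master client table with the shared columns, plus one table per client type holding only that type's fields, linked 1:1 by the client id. A rough sketch (the types and column names are placeholders):

CREATE TABLE client (
    client_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    client_type ENUM('person', 'company') NOT NULL,
    name        VARCHAR(255) NOT NULL,
    created_at  DATETIME NOT NULL
) ENGINE=InnoDB;

CREATE TABLE client_person (
    client_id  INT UNSIGNED NOT NULL PRIMARY KEY,
    birth_date DATE,
    FOREIGN KEY (client_id) REFERENCES client (client_id)
) ENGINE=InnoDB;

CREATE TABLE client_company (
    client_id  INT UNSIGNED NOT NULL PRIMARY KEY,
    tax_number VARCHAR(50),
    FOREIGN KEY (client_id) REFERENCES client (client_id)
) ENGINE=InnoDB;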

In what data type should I store an email address in database?

Posted: 02 Apr 2013 08:03 PM PDT

I understand that a 254-character email address is valid, but implementations I have researched tend to use varchar(60) to varchar(80) or equivalent. For example, this SQL Server recommendation uses varchar(80); see also this Oracle example.

Is there a reason to not use the full 254 character maximum? Doesn't a varchar by definition use only as much storage as needed to hold the data?

Are there significant performance implications/trade-offs which cause so many implementations to use less than the full 254 possible characters?

compare the same table

Posted: 02 Apr 2013 10:59 AM PDT

I am facing an issue with the following query. When I execute it, it takes very long. I broke the query into two parts and compared them with a shell script, but is there any way to do it in one query?

Any suggestion welcome.

select distinct substring(mobile_num,3,12)
from mobile
where status ='INACTIVE'
  and date(unsub_date) >= DATE(CURDATE() - INTERVAL 90 DAY)
  and mobile_num not in (select distinct mobile_num from mobile where status='ACTIVE')
order by updtm;

| mobile_num  | varchar(12) |
| keyword     | varchar(45) |
| sub_date    | datetime    |
| unsub_date  | datetime    |
| circle_name | varchar(45) |
| type        | varchar(45) |
| status      | varchar(45) |
| operator    | varchar(45) |
| act_mode    | varchar(45) |
| deact_mode  | varchar(45) |
| id          | bigint(20)  |
| updtm       | timestamp   |
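One rewrite I'm considering (a sketch): replacing the NOT IN subquery with a LEFT JOIN anti-join, which older MySQL versions usually optimize far better than NOT IN (subquery). I assume an index on (mobile_num, status) would help both forms.

SELECT DISTINCT SUBSTRING(m.mobile_num, 3, 12)
FROM mobile AS m
LEFT JOIN mobile AS a
       ON a.mobile_num = m.mobile_num
      AND a.status = 'ACTIVE'
WHERE m.status = 'INACTIVE'
  AND DATE(m.unsub_date) >= DATE(CURDATE() - INTERVAL 90 DAY)
  AND a.mobile_num IS NULL          -- i.e. no ACTIVE row exists for this number
ORDER BY m.updtm;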

sql server database sharding - what to do with common data / non sharded data

Posted: 02 Apr 2013 04:59 PM PDT

We have a very large scale enterprise level database. As part of our business model all web users hit our web servers at the same time each month which in turn hammer our sql box. The traffic is very heavy and continues to grow heavier the larger the company grows. sql proc optimization has been performed and hardware has already been scaled up to a very high level.

We are looking to shard the database now to ensure that we can handle company growth and future loads.

We have decided what particular data should be sharded. It is a subset of our database which is highly utilized.

However, my question is regarding the non sharded data which is common/universal. An example of data like this may be an Inventory table for instance or possibly an Employee table, user table etc .

I see two options to handle this common/universal data:

1) design 1 - Place the common/universal data in an external database. All writes will occur here. This data will then be replicated down to each shard allowing each shard to read this data and inner join to this data in t-sql procs.

2) design 2 - Give each shard its own copy of all common/universal data. Let each shard write locally to these tables and utilize sql merge replication to update/sync this data on all other shards.

concerns about design #1

1) Transactional issues: if you have a situation in which you must write or update data in a shard and then write/update a common/universal table in one stored proc, for instance, you will no longer be able to do this easily. The data now exists on separate SQL instances and databases. You may need to involve MS DTS to see if you can wrap these writes into a transaction, since they are in a separate database. Performance is a concern here, and possible rewrites may be involved for procs that write to both sharded and common data.

2) A loss of referential integrity. It is not possible to enforce cross-database referential integrity.

3) Recoding large areas of the system so that it knows to write common data to the new universal database but read common data from the shards.

4). increased database trips. Like #1 above, when you run into a situation in which you must update sharded data and common data you are going to make multiple round trips to accomplish this since the data is now in separate databases. Some network latency here but I am not worried about this issue as much as the above 3.

concerns about design #2

In design #2 each shard gets its own instance of all common/universal data. This means that all code that joins to or updates common data continues to work/run just like it does today. There is very little recoding/rewriting needed from the development team. However, this design completely depends on merge replication to keep the data in sync across all shards. The DBAs are highly skilled and are very concerned that merge replication may not be able to handle this, and that should merge replication fail, recovery from that failure is not great and could impact us very negatively.

I am curious to know if anyone has gone with design option #2. I am also curious to know if I am overlooking a 3rd or 4th design option that I do not see.

Thank you in advance.

ParallelPeriod returning null for Feb. 29 in date dimension

Posted: 02 Apr 2013 12:17 PM PDT

I have a calendar date dimension backed by a physical table of dates (originally created on SQL Server 2000, hence the datetime instead of date):

CREATE TABLE [dbo].[PostDate_Dimension](
    [post_date] [datetime] NOT NULL PRIMARY KEY,
    [day_of_year] [int] NOT NULL,
    [day_of_month] [int] NOT NULL,
    [month_of_year] [int] NOT NULL,
    [post_year]  AS (datepart(year,[post_date])),
    [post_month]  AS (datepart(month,[post_date])),
    [post_day]  AS (datepart(day,[post_date]))
)

The Post Date dimension has four attributes (with member key columns listed, some of which are calculated in the DSV):

  1. Day (Dimension Key) - post_date
  2. Month - post_year, post_month
  3. Quarter - post_year, post_quarter = DatePart(quarter, "post_date"))
  4. Year - post_year

It's nothing too fancy, obviously. I also have a few calculated measures that use ParallelPeriod to calculate YTD figures from the previous year, for quick side-by-side comparison without requiring the user to choose a specific slice of dates. Just pick the current year, and it will find the latest date with sales in it, then compare to that same range from the previous year.

Finding the appropriate date in the previous year normally boils down to this:

ParallelPeriod(
    [Post Date].[Post Date].[Year],
    1,
    Tail(
        NonEmpty(
            Descendants(
                [Post Date].CurrentMember,
                ,
                Leaves
            ),
            Measures.[Total Price]
        ),
        1
    ).Item(0)
)

The Tail call is where it finds the latest date beneath the currently selected Post Date member (typically the current year). That works fine. But if that returns Feb. 29, meaning the last sale for a particular combination of dimension members occurred on Feb. 29, then it passes Feb. 29 into the ParallelPeriod function, which subsequently returns null. And then the previous-year YTD measure also returns null.

So, in a nutshell: Based on this particular schema, is there a simple way to have ParallelPeriod behave nicely for Feb. 29 inputs? If it just returns Feb. 28 of the previous year, that's fine.

EDIT:

A few things I've tried:

  • Using this expression to adjust the Post Date member:
    Iif(MONTH([Post Date].[Post Date].CurrentMember.Member_Caption) = 2 And DAY([Post Date].[Post Date].CurrentMember.Member_Caption) = 29, [Post Date].[Post Date].CurrentMember.PREVMEMBER, [Post Date].[Post Date].CurrentMember)
    This works, but the code would be atrocious, since I'd have to replace all [Post Date].[Post Date].CurrentMember with Tail(NonEmpty(Descendants([Post Date].CurrentMember,, Leaves), Measures.[Total Price]), 1).Item(0)).
  • Using Except to remove all Feb. 29 dates from the results of NonEmpty(Descendants([Post Date].CurrentMember,, Leaves), Measures.[Total Price]). I can't figure out the proper syntax (if any) to get a set of all Feb. 29s from the dimension.
  • Creating a .NET assembly with a user-defined function that takes a member as a parameter, and returns the previous member if it's a Feb. 29. It seems like the classes in Microsoft.AnalysisServices.AdomdServer are extremely limited and don't even allow for this basic task (nor even retrieving the member key as a date value).

Why is DROP DATABASE taking so long? (MySQL)

Posted: 02 Apr 2013 04:19 PM PDT

New CentOS installation.

I was running an import of a large DB (2GB sql file) and had a problem. The SSH client seemed to lose the connection and the import seemed to freeze. I used another window to login to mysql and the import appeared to be dead, stuck on a particular 3M row table.

So I tried

DROP DATABASE huge_db;  

15-20 minutes later, nothing. In another window, I did:

/etc/init.d/mysqld restart  

The DROP DB window messaged: SERVER SHUTDOWN. Then I actually restarted the physical server.

Logged back into mysql, checked and the db was still there, ran

DROP DATABASE huge_db;

again, and again I'm waiting already about 5 minutes.

Once again, it's fresh installation. The huge_db is the only db (other than system dbs). I swear I've dropped db's this large before and quickly, but maybe I'm wrong.

Please help.

EDIT:

I've successfully dropped the database. It took something like 30 minutes. Also note that I think I was mistaken when I thought the mysqldump import was dead. The terminal connection was lost, but I think the process was still running. I most likely killed the import mid-table (the 3M row table), and probably 3/4 of the way through the whole db. It was misleading that "top" showed mysql using only 3% of memory, when it seemed like it should be using more.

Dropping the DB ended up taking 30 min, so, again, I might not have had to restart the server and possibly could have just waited for the DROP to finish, but I don't know how mysql would react to getting a DROP query for the same db that it's importing via mysqldump.

Still, the question remains, why does it take 30min+ to DROP a 2GB database when all it should have to do is delete all the db files and remove all references to the DB from information_schema? What's the big deal?

Why are so many MPP solutions based on PostgreSQL instead of MySQL?

Posted: 02 Apr 2013 09:03 AM PDT

Aster Data, Greenplum and GridSQL all allow massively parallel processing of SQL queries. They are also all built around PostgreSQL technology. Is this just because of licensing issues, or are there other reasons? To me, it seems like MyISAM, not being ACID compliant and therefore not running into the same issues with MVCC (like those seen here) as PostgreSQL, is far better suited for building high-performance data warehouses. After all, OLAP load does not require transactions as far as I can see.
