Friday, June 7, 2013

[how to] Restoring SQL_ASCII dumps to a UTF8-encoded database

Restoring SQL_ASCII dumps to a UTF8-encoded database

Posted: 07 Jun 2013 03:53 PM PDT

I've got a Postgres 8.4 environment where the encoding on all our databases is set to SQL_ASCII. We're finally migrating to Postgres 9.2, and I would like to migrate everything over to UTF8 encoding.

Unfortunately the text data in this DB is not clean -- Trying to restore the pg_dump to a utf8-encoded database throws errors about invalid byte sequences, even if I specify --encoding=UTF8 when I run pg_dump (presumably because Postgres doesn't know what to make of them and just dumps them unchanged?).

We have a LOT of data (upwards of a million rows with text/string elements), and auditing all of it by hand would be very time consuming (and error prone) so I'd like to automate this if possible.

Is there an easy way to find the non-utf8-conforming strings/text fields in the database so we can fix them? Or am I stuck with a manual audit to straighten this mess out?
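
One way to automate the audit, offered as a sketch rather than a known-good recipe: inside the old SQL_ASCII database, wrap Postgres's encoding conversion in a small function that swallows the error, then use it to flag the offending rows. The table and column names below are placeholders.

-- Hypothetical helper, run inside the old SQL_ASCII database:
-- re-interpret the raw bytes of a text value as UTF8 and report whether that fails.
CREATE OR REPLACE FUNCTION is_valid_utf8(t text) RETURNS boolean AS $$
BEGIN
    PERFORM convert_from(convert_to(t, 'SQL_ASCII'), 'UTF8');
    RETURN true;
EXCEPTION WHEN others THEN
    RETURN false;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Example usage against a placeholder table/column:
SELECT id, some_text_column
FROM   some_table
WHERE  NOT is_valid_utf8(some_text_column);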

Need a log based replication / CDC solution to publish postgres to oracle subscribers

Posted: 07 Jun 2013 02:03 PM PDT

We have a moderately large Postgres 9.1 database that is currently 3 TB and will likely grow much larger over the next couple of years. We need a reliable and fast solution for moving this data into either Oracle 11g or SQL Server 2012 databases. We need a log-based solution such as replication or CDC with a minimal footprint on the Postgres server; we are not looking for a code-based ETL solution like SSIS or SAP Data Services.

It should also be able to handle the blob/spatial data stored in Postgres effectively.

How to troubleshoot high cpu problems related to postgres and postmaster service?

Posted: 07 Jun 2013 02:08 PM PDT

I'm using an open source (RHEL 6.2) based machine running SIEM software. When I run the top command I see

postgres    postmaster

both eating 96% of CPU usage. Is there a way to pinpoint what is causing these processes to stack up?
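
One place to start, assuming you can connect to the instance with psql: pg_stat_activity shows what each busy backend is actually running (these are the pre-9.2 column names; on 9.2+ use pid and query instead of procpid and current_query).

-- List non-idle backends and what they are executing:
SELECT procpid, usename, datname, query_start, current_query
FROM   pg_stat_activity
WHERE  current_query <> '<IDLE>'
ORDER  BY query_start;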

thanks.

How to implement something similar to "pointer arrays" in MySQL

Posted: 07 Jun 2013 04:36 PM PDT

While building a site I came across the following issue:

I want to be able to have multiple containers and within each container have ids pointing to multiple items that lie on another table. (Scheme: http://i.stack.imgur.com/pb78k.png)

I am using MySQL and would like to stick with it; however, I am open to other ideas.

My questions are:

  1. Would this be a reasonable data structure, or is there a better way to organize this data?
  2. If so, how would I go about implementing this idea of an "array of pointers" to elements? (See the sketch below.)
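
A minimal sketch of the usual relational answer, with illustrative table names that are not from your schema: instead of an array of pointers, a junction (link) table holds one row per container/item pair.

CREATE TABLE containers (
    container_id INT PRIMARY KEY AUTO_INCREMENT,
    name         VARCHAR(100) NOT NULL
);

CREATE TABLE items (
    item_id INT PRIMARY KEY AUTO_INCREMENT,
    name    VARCHAR(100) NOT NULL
);

-- The "array of pointers": one row per (container, item) pair.
CREATE TABLE container_items (
    container_id INT NOT NULL,
    item_id      INT NOT NULL,
    PRIMARY KEY (container_id, item_id),
    FOREIGN KEY (container_id) REFERENCES containers (container_id),
    FOREIGN KEY (item_id)      REFERENCES items (item_id)
);

-- All items in container 1:
SELECT i.*
FROM items AS i
JOIN container_items AS ci ON ci.item_id = i.item_id
WHERE ci.container_id = 1;

This also speaks to question 1: the structure in the diagram is reasonable, and the junction table is how it is normally expressed in MySQL.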

User roles and permissions

Posted: 07 Jun 2013 12:11 PM PDT

Look at the attached graphic.

I'm trying to keep this as simple as possible.

Basically I want to have multiple admins(or superusers) who own regular users who own sites.

  • Admins can give regular users permission or role to a website.
  • Admins can see and edit all sites of the users they own.
  • Users can only see & edit their own sites. (based on the role).
  • One site can be managed by multiple users.

I don't know if I should have a permissions table with one row per role (so one user can have many rows, one per role), or store an array of roles in a single row... I'm stuck there.

What's the best way to do this simply?

P.S: I know the graphic has User2 under site1 and site2 but I don't need it that complex. Same users for same site but different roles is sufficient. No need for 1 regular user to edit 2 sites.
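
A minimal sketch of the multiple-rows option, assuming MySQL and made-up table and column names (users and sites tables with integer id keys): one row per user/site/role assignment keeps the model simple and easy to query.

-- Hypothetical schema sketch; adjust names and role values to your design.
CREATE TABLE user_site_roles (
    user_id INT NOT NULL,
    site_id INT NOT NULL,
    role    ENUM('admin', 'editor', 'viewer') NOT NULL,
    PRIMARY KEY (user_id, site_id, role),
    FOREIGN KEY (user_id) REFERENCES users (id),
    FOREIGN KEY (site_id) REFERENCES sites (id)
);

-- All sites a given user may edit:
SELECT s.*
FROM sites AS s
JOIN user_site_roles AS usr ON usr.site_id = s.id
WHERE usr.user_id = 42
  AND usr.role IN ('admin', 'editor');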


Running two loops simultaneously to move a list to rows in two different columns of different lengths

Posted: 07 Jun 2013 12:11 PM PDT

I need help with the table update below, using SQL in an MS Access 2007 database.

I've a table called table1 which has below entries:

table1:

---------------------------------------------------------
|  col1  |  col2  |     col3      |        Col4
---------------------------------------------------------
|   A1   |  abcd  |  123;456;789  |  Banana, water
---------------------------------------------------------
|   B2   |  efgh  |  246;369      |  fish, yellow, water
---------------------------------------------------------
|   C3   |  ijk   |               |  socks
---------------------------------------------------------

Using SQL, I want to transform the table above (splitting the multiple values in col3 and col4 into individual rows) as shown below:

------------------------------------------
|  col1  |  col2  |  col3  |    col4
------------------------------------------
|   A1   |  abcd  |  123   |  banana
------------------------------------------
|   A1   |  abcd  |  456   |  water
------------------------------------------
|   A1   |  abcd  |  789   |
------------------------------------------
|   B2   |  efgh  |  246   |  fish
------------------------------------------
|   B2   |  efgh  |  369   |  yellow
------------------------------------------
|   B2   |  efgh  |        |  water
------------------------------------------
|   C3   |  ijk   |        |  socks
------------------------------------------

I found an answer in "Ms-Access 2007-2010 SQL - from single row record how to create multiple rows" and was able to alter it to fit my needs. I am having trouble creating a nested loop: when I nest the loops I end up with duplicated rows, depending on which column is running in the inner loop. I cannot find a way to run the two loops simultaneously. I would really appreciate any help you can give me!

Difference between Sql server and Oracle [closed]

Posted: 07 Jun 2013 11:44 AM PDT

What is the difference between SQL Server and Oracle?

/usr/sbin/mysqld was deleted and replaced by another mysqld with different inode

Posted: 07 Jun 2013 07:23 PM PDT

This happened on three of our MySQL servers running on RHEL. While checking for open and deleted files, I found that the mysqld in use had been deleted (as seen in lsof) and replaced by a similar mysqld (in /usr/sbin/mysqld) with a different inode. The size and blocks of both files (deleted and current) are the same, and the new mysqld (not in use) seems to be the same version as the deleted one. I am trying to figure out what could have caused this (there are no cron jobs running on the system). I checked the system logs, database logs, and yum logs and found nothing relevant. Any input is appreciated.

Thanks!

SQL Server 2012 FileTable Missing sometimes

Posted: 07 Jun 2013 03:41 PM PDT

I have created 3 folders using FileTables.

One of the folders has ~392 GB of data (with heavy read/write activity); sometimes it stops showing up, and I need to restart SQL Server 2012 to make it appear in Explorer again.

I don't see any issues with the other folders, which have less load.

Please advise.

Is it safe to remove an entry in a "config.mongos" MongoDB collection?

Posted: 07 Jun 2013 10:39 AM PDT

There are old entries in my "config.mongos"; they reference hosts which previously had "mongos" running on them but no longer do.

Can I safely remove those entries?

e.g.:

# Old entries
{ "_id" : "silicon:27117", "ping" : ISODate("2012-12-27T19:51:04.560Z"), "up" : 9301, "waiting" : true }
...
# Current entries
{ "_id" : "app1:27017", "ping" : ISODate("2013-06-07T17:12:02.856Z"), "up" : 5593404, "waiting" : true }
{ "_id" : "app2:27017", "ping" : ISODate("2013-06-07T16:17:49.812Z"), "up" : 5590271, "waiting" : false }
{ "_id" : "app4:27017", "ping" : ISODate("2013-06-07T17:12:02.110Z"), "up" : 5593413, "waiting" : true }
{ "_id" : "app3:27017", "ping" : ISODate("2013-06-07T17:12:02.854Z"), "up" : 5593578, "waiting" : true }

Percona XtraDB Cluster and MySQL Replication

Posted: 07 Jun 2013 07:56 PM PDT

I've set up 3-node multi-master replication with Percona XtraDB Cluster, and it works perfectly.

Now I've tried to add some read-only slaves, setting up replication as usual, but it seems the binlog doesn't include the new inserts.

I've set binlog_do_db for a database on the master; the slave says the log position is the same as the master's, but the new data isn't there.

Is there a special way to do replication on xtradb cluster?
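
Not a definitive answer, but a quick check worth running on the cluster node the async slave replicates from: writes applied through Galera generally reach that node's binary log only when both log_bin and log_slave_updates are enabled, so verify them before digging further (both are set in my.cnf and need a node restart to change).

SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'log_slave_updates';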

Can I optimize this MERGE statement?

Posted: 07 Jun 2013 10:20 AM PDT

I am attempting a single column merge between two tables. The first table (VisitorSession) has 40,000,000 rows. The second (ShoppingCart) has 9,000,000 rows.

In my development environment, the query takes just under 8 minutes. But in the production environment, it should take significantly less (much more powerful machine). However, I anticipate the query taking at least 2 minutes to run in production. I know that this query was causing timeouts for other developers in the development environment, which means it could easily cause timeouts for customers. Is there a safer and/or faster way to perform this query?

declare @dt datetime = cast(dateadd(month, -6, getdate()) as date);

merge ShoppingCart as TargetTable  -- 07:55 to complete in Dev
using
(
  select * from -- 04:55 to run select, resulting in 12,727,927 rows in Dev
  (
    select
      visitorid       -- int, not null, foreign key
      ,useripaddress  -- varchar(55), null
      ,row_number() over
       (partition by visitorid order by createdate desc) as [row]
    from VisitorSession (nolock)
    where UserIPAddress is not null
      and CreateDate > @dt   -- createdate is a datetime, not null
  ) as subTbl
  where subTbl.[row] = 1
) as SourceTable
on (TargetTable.VisitorID = SourceTable.VisitorID)  -- visitorid is not a primary key
when matched
  then update set
    TargetTable.UserIpAddress = SourceTable.UserIpAddress;
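
One variation worth testing, sketched with the same table and column names but otherwise an assumption about your environment: stage the deduplicated rows in an indexed temp table first, then apply a plain UPDATE, so the write phase reads from a small pre-sorted source. Whether it beats the MERGE depends on the indexing on ShoppingCart.VisitorID.

declare @dt datetime = cast(dateadd(month, -6, getdate()) as date);

-- Stage the most recent IP per visitor.
select visitorid, useripaddress
into #LatestIP
from (
    select visitorid,
           useripaddress,
           row_number() over (partition by visitorid order by createdate desc) as [row]
    from VisitorSession
    where UserIPAddress is not null
      and CreateDate > @dt
) as subTbl
where subTbl.[row] = 1;

create clustered index ix_LatestIP on #LatestIP (visitorid);

-- Apply the staged values.
update sc
set sc.UserIpAddress = src.useripaddress
from ShoppingCart as sc
join #LatestIP as src on src.visitorid = sc.VisitorID;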

How to change x-axis labels in a report

Posted: 07 Jun 2013 10:15 AM PDT

I am creating a trend report in which the x-axis labels have a first layer for the year (2014), a second for the quarter (1,2), a third for the period/month (3,1), and a fourth for the week (3,4,5,1,2).

I need to change these labels to y2014 for the year, Q1, Q2 for the quarter, month3, month1 for the month, and W3, W4, W5, etc. for the week.

All I need is to change the labels as they appear.

Can I default to T-SQL only view in VS 2012?

Posted: 07 Jun 2013 01:28 PM PDT

When creating new tables in a database project in VS 2012, opened sql files default to a split view. One is the T-SQL text editor view, and the other is the Design view. I've never used the design view, and probably never will. I just prefer the text editor. Is there a way to tell VS to default to the T-SQL view?

What signals do the DBWn processes send to LGWR?

Posted: 07 Jun 2013 08:06 PM PDT

I would really like to know what the signals are in the following quote from the Process Architecture docs, and how DBWn knows that LGWR has completed:

Before DBWn can write a modified buffer, all redo records that are associated with the changes to the buffer must be written to disk (the write-ahead protocol). If DBWn finds that some redo records have not been written, it signals LGWR to write the redo records to disk and waits for LGWR to complete writing the redo log buffer before it can write out the data buffers.

Query keeps on executing

Posted: 07 Jun 2013 10:38 AM PDT

I'm executing the query below, but it never finishes.

In place of 'some time' I have an actual time value.

select distinct(col1)
from table1
where col2 > 'some time'
  and col1 not in
      (select distinct(col1)
       from table2
       where col2 not like '%XYZ%');
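
One rewrite worth trying, using the same tables and columns: NOT EXISTS usually optimizes better than NOT IN over a large subquery, and it is not tripped up by NULLs in table2.col1.

select distinct t1.col1
from table1 as t1
where t1.col2 > 'some time'
  and not exists (
      select 1
      from table2 as t2
      where t2.col1 = t1.col1
        and t2.col2 not like '%XYZ%'
  );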

SQL Agent embedded PowerShell script in CmdExec step fails with import-module sqlps

Posted: 07 Jun 2013 10:43 AM PDT

SQL Server 2008R2 PowerShell 2.1

I am trying to create a SQL Agent job that dynamically backs up all non-corrupted SSAS databases on an instance without the use of SSIS. In my SQL Agent job, when I create a CmdExec step and point to a PowerShell script file (.ps1) like this:

powershell.exe "c:\MyPSFile.ps1"   

the job executes successfully (or at least gets far enough to only encounter logic or other syntax issues).

This approach won't work for a final solution, because there is a requirement to keep the PowerShell script internal to SQL. So I have a different CmdExec step that embeds the PowerShell script like so:

powershell.exe "import-module sqlps –DisableNameChecking    $server_name = "localhost"  $backup_location = "C:\BackupsGoHere"    [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | out-null  $server = New-Object Microsoft.AnalysisServices.Server  $server.connect($server_name)    # Generate an array of databases to be backed up  foreach ($database in ($server.get_Databases() | Where {$_.EstimatedSize -gt 0 -and $_.Cubes.Count -gt 0})) {      $directory_path = $backup_location + "\" + $database.Name      if (!(Test-Path -Path $directory_path)) {          New-Item $directory_path -type directory | out-null      }      [string] $timestamp = date      $timestamp = $timestamp.Replace(':','').Replace('/','-').Replace(' ','-')      $database.Backup("$directory_path\$database-$timestamp.abf")  }  $server.disconnect()"  

However, when executed with the embedded script, the job errors out quickly with the following response:

The specified module 'sqlps' was not loaded because no valid module file was found in any module directory.

Why can't I reference the module from an embedded script, but doing so in a ps1 file works just fine?

Need an idea for a complex SQL question

Posted: 07 Jun 2013 12:04 PM PDT

I have had my cup of coffee, but cannot come up with a good solution for the following problem - so I am here to pick your brain. ;-)

Here is what I am trying to accomplish (for PostgreSQL):

  • Got a table with logged communications related to products.
  • The table contains an ID (PK, serial), date/time when inserted, a product ID, an error code field (and some other fields not relevant for now)
  • Rows get validated upon reception, and all rows are stored.
  • Rows are either accepted or rejected (error code field).
  • When accepted, fine, the rows get processed (as a 'step 2' - not relevant for now).
  • However, when rejected the sender gets an error message and is requested to correct the data and resend.
  • So, when the corrected data is received again and is validated and accepted, we should have a product row that got rejected and a product row that got accepted.
  • We have learned that not all senders check these error messages and resend their (corrected) data.

So, what I am looking for are the rows for products that were rejected but never sent again correctly; rows may be sent multiple times and rejected multiple times by our validation process.

I have been at it for a couple of hours, but so far without much luck.
What SQL trick could I pull out of the hat to find these rows?
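
One way to phrase it in PostgreSQL, sketched with guessed table and column names since the real ones aren't shown: a product is still unresolved if it has a rejected row with no accepted row received after it.

-- log, product_id, error_code and received_at are placeholders for your actual names;
-- error_code = 0 is assumed to mean "accepted".
SELECT DISTINCT r.product_id
FROM   log r
WHERE  r.error_code <> 0                      -- a rejection...
  AND  NOT EXISTS (
         SELECT 1
         FROM   log a
         WHERE  a.product_id  = r.product_id
           AND  a.error_code  = 0             -- ...with no later acceptance
           AND  a.received_at > r.received_at
       );

If a product can be rejected again after an earlier acceptance, this lists it whenever its most recent rejection has not been followed by an acceptance.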

How to optimize multiple self-JOINs?

Posted: 07 Jun 2013 06:13 PM PDT

I'm looking for advice on either optimizing multiple self-joins, or a better table/DB design.

One of the tables looks as follows (relevant cols only):

CREATE TABLE IF NOT EXISTS CountryData (
  countryDataID INT PRIMARY KEY AUTO_INCREMENT,
  dataID INT NOT NULL REFERENCES DataSources (dataID),
  dataCode VARCHAR(30) NULL,
  countryID INT NOT NULL REFERENCES Countries (countryID),
  year INT NOT NULL,
  data DEC(20,4) NULL,
  INDEX countryDataYear (dataID, countryID, year));

The data column has values for a few hundred indicators, 90 countries, and 30 years for ~1mn rows total. A standard query requires selecting N indicators for a particular year and C countries, yielding a CxN table for 90 rows max.

With all values in a single column, self-joins seemed like the way to go. So I have experimented with various suggestions to speed those up, including indexing and creating new (temp) tables. At 9 self-joins, the query takes a little under 1 min. Beyond that, it spins forever.

The new table from which the self-joins are performed has only about 1,000 rows, indexed on what seem to be the relevant columns; creation takes about 0.5 sec:

CREATE TABLE Growth
    SELECT dataID, countryID, year, data
    FROM CountryData
    WHERE dataID > 522 AND year = 2017;

CREATE INDEX growth_ix
    ON Growth (dataID, countryID);

The SELECT query then arranges up to XX indicators in the results table, with XX unfortunately <10:

SELECT
    Countries.countryName AS Country,
    em01.em,
    em02.em,
    em03.em
    ...
    emX.em
FROM
    (SELECT
        em1.data AS em,
        em1.countryID
    FROM Growth AS em1
    WHERE em1.dataID = 523) as em01
    JOIN
    (SELECT
        em2.data AS em,
        em2.countryID
    FROM Growth AS em2
    WHERE em2.dataID = 524) as em02
    USING (countryID)
    JOIN
    (SELECT
        em3.data AS em,
        em3.countryID
    FROM Growth AS em3
    WHERE em3.dataID = 525) as em03
    USING (countryID)
    ...
    JOIN
    (SELECT
        emX.data AS em,
        emX.countryID
    FROM Growth AS em5
    WHERE emX.dataID = 527) as emXX
    USING (countryID)
    JOIN Countries
    USING (countryID)

I'd actually like to retrieve a few more variables, plus potentially join other tables. Now I'm wondering whether there's a way to run this more efficiently, or whether I should take an altogether different approach, such as using wide tables with indicators in different columns to avoid self-joins.
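
One alternative to the chain of self-joins, sketched against the Growth table above: pivot with conditional aggregation, so Growth is scanned once no matter how many indicators you pull out. The em01..em03 aliases mirror the original query; add one CASE expression per extra indicator.

SELECT
    c.countryName AS Country,
    MAX(CASE WHEN g.dataID = 523 THEN g.data END) AS em01,
    MAX(CASE WHEN g.dataID = 524 THEN g.data END) AS em02,
    MAX(CASE WHEN g.dataID = 525 THEN g.data END) AS em03
    -- ...one CASE expression per additional indicator
FROM Growth AS g
JOIN Countries AS c USING (countryID)
GROUP BY c.countryName;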

Efficient way to fetch records since a given criteria

Posted: 07 Jun 2013 08:24 PM PDT

I'm trying to implement logic where the user can say "give me n records since a given id". E.g.:

SELECT TOP (100) col1, col2, ... colN
FROM Table1
WHERE ID > @id
ORDER BY ID

Performance is the biggest issue here, especially when you get into nested loops for complex joins. I've looked at the new OFFSET-FETCH feature in SQL Server 2012, but it requires you to specify a numeric offset rather than a WHERE clause, and I wouldn't know the offset unless I counted rows beforehand.

Is there an alternate way to do this efficiently in SQL Server (2008 R2 and above)?

Update: Complete SQL as generated by EF Code first

-- Region Parameters
DECLARE @p__linq__0 BigInt = 60375518904121
DECLARE @p__linq__1 BigInt = 60375518904121
-- EndRegion
SELECT
    [Project3].[Id] AS [Id],
    [Project3].[C2] AS [C1],
    [Project3].[C1] AS [C2],
    [Project3].[C3] AS [C3]
FROM ( SELECT
        [Limit1].[Id] AS [Id],
        [Limit1].[C1] AS [C1],
        [Limit1].[C2] AS [C2],
        [Limit1].[TmId] AS [TmId],
        CASE WHEN ([Extent4].[TmId] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C3]
    FROM (SELECT TOP (100) [Project2].[Id] AS [Id], [Project2].[TmId] AS [TmId], [Project2].[C1] AS [C1], [Project2].[C2] AS [C2]
        FROM ( SELECT
                [Extent1].[Id] AS [Id],
                [Extent2].[TmId] AS [TmId],
                CASE WHEN ([Extent2].[TmId] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1],
                1 AS [C2]
            FROM  [dbo].[TextMessages] AS [Extent1]
            LEFT OUTER JOIN [dbo].[Tms] AS [Extent2] ON [Extent1].[Id] = [Extent2].[TmId]
            WHERE ( EXISTS (SELECT
                    1 AS [C1]
                FROM [dbo].[Tmr] AS [Extent3]
                WHERE ([Extent1].[Id] = [Extent3].[TmId]) AND ([Extent3].[IsDel] = 1) AND ([Extent3].[UserId] = @p__linq__0)
            )) OR ([Extent1].[CreatedBy_Id] = @p__linq__1)
        )  AS [Project2]
        ORDER BY [Project2].[Id] DESC ) AS [Limit1]
    LEFT OUTER JOIN [dbo].[Tmr] AS [Extent4] ON [Limit1].[Id] = [Extent4].[TmId]
)  AS [Project3]
ORDER BY [Project3].[Id] DESC, [Project3].[TmId] ASC, [Project3].[C3] ASC

Update: The real bottleneck turned out to be network latency caused by EC2 servers.

Is the key_buffer_size applicable to myisam tmp tables?

Posted: 07 Jun 2013 04:24 PM PDT

I have a database about 750GB in size. It's all innodb.

Larger analytical queries often need to group by several columns or use distinct so it's common that MySQL will have to create tmp tables. The tmp tables fit into memory.

My cache hit ratio (Key_reads / Key_read_requests) is 0.

When MySQL creates these tmp tables, I'm guessing it doesn't create pseudo-indexes that would use the key_buffer_size, does it?
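
A couple of standard checks that help frame this (generic MySQL status counters and variables, not specific to your setup): how often implicit temporary tables spill to disk, and the in-memory size limits that decide when that happens.

SHOW GLOBAL STATUS    LIKE 'Created_tmp%';        -- Created_tmp_tables vs Created_tmp_disk_tables
SHOW GLOBAL VARIABLES LIKE 'tmp_table_size';
SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size';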

Update one table from another table while sorting that table based on one column

Posted: 07 Jun 2013 03:24 PM PDT

This is the problem I'm trying to figure out in MySQL. We have an old table that contains forms submitted by our users. Somehow, the previous decision was that each time a user came to this survey, a new form would be submitted. So in the old table we easily have several rows with the same Firstname and Lastname but different values in the other columns, plus a timestamp column, Date_Submission.

Now we are trying to move everything to a new table, but this time we only keep one row per person. And we want to keep some of the latest old data for each user (like email, phone number, etc.).

I could do the following:

update newtable, oldtable
set newtable.email = oldtable.email
where newtable.firstname = oldtable.firstname
  and newtable.lastname = oldtable.lastname;

Apparently this won't give me the "latest" old data for each person.

So I tried this one:

update newtable, oldtable
set newtable.email = oldtable.email
where newtable.firstname = oldtable.firstname
  and newtable.lastname = oldtable.lastname
order by oldtable.Date_Submission;

But MySQL complains with:

"ERROR 1221 (HY000): Incorrect usage of UPDATE and ORDER BY".

So I'm wondering, what's the correct way to achieve this?
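
One way to do it in a single statement, using the column names from your question: instead of ordering the UPDATE, join to a derived table that picks each person's latest Date_Submission, so only the newest old row supplies the values.

UPDATE newtable AS n
JOIN oldtable AS o
  ON  o.firstname = n.firstname
  AND o.lastname  = n.lastname
JOIN (
    -- Latest submission per person.
    SELECT firstname, lastname, MAX(Date_Submission) AS latest_submission
    FROM oldtable
    GROUP BY firstname, lastname
) AS latest
  ON  latest.firstname         = o.firstname
  AND latest.lastname          = o.lastname
  AND latest.latest_submission = o.Date_Submission
SET n.email = o.email;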

increasing mysql table open cache?

Posted: 07 Jun 2013 02:24 PM PDT

I often read that it is best to increase this variable slowly. Can someone explain why? My status indicates that I should increase it... What is best practice / "slowly"?
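
For reference, these are the counters usually compared when deciding whether (and how far) to raise it; the usual advice behind "slowly" is to bump the value a little, watch these counters and memory usage for a while, then repeat.

SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
SHOW GLOBAL STATUS    LIKE 'Open_tables';
SHOW GLOBAL STATUS    LIKE 'Opened_tables';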

Thanks!

select count(*) in mysql 5.5 innodb-- rewrite advice?

Posted: 07 Jun 2013 10:24 AM PDT

I need advice on how to rewrite a select count(*) query for InnoDB tables on MySQL 5.5. In the new environment it's very slow...

select count(*)
from mails3
join questions using (question_id)
where mails3.from_user_id = '86696'
  and mails3.is_community_star = 1
  and mails3.rating = 3
  and questions.is_rated_community = 1;

The query execution plan looks simple enough, but the query is very slow:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mails3
         type: ref
possible_keys: question_id_index,emails_doublestars,favorite_emails,user_responses,owner_responses,sentmail,watchlist,read_responses,rating_date,read_and_unrated
          key: emails_doublestars
      key_len: 8
          ref: const,const,const
         rows: 2784
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: questions
         type: eq_ref
possible_keys: PRIMARY,is_rated_community_senddate_idx
          key: PRIMARY
      key_len: 5
          ref: answerology.mails3.QUESTION_ID
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)
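
Not a guaranteed fix, but one thing to test: a composite index on mails3 covering every equality predicate plus the join column, so the count can be resolved from the index without reading the underlying rows. The column order below is an assumption based on the WHERE clause, and the index name is made up.

ALTER TABLE mails3
  ADD INDEX idx_user_star_rating_question (from_user_id, is_community_star, rating, question_id);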

Search every column in every table in Sybase Database

Posted: 07 Jun 2013 08:24 PM PDT

I've been tasked with creating an application that pulls data from our Micros Point of Sale system. The POS uses a Sybase database running on one of our servers. The database schema is very convoluted. I've been able to figure out most of the schema to gain access to the data I need; however, there are a few things still left to find. I know what they are called in the actual POS, but I can't seem to find them anywhere in the database (although I haven't actually looked through all 200+ tables).

I'm wondering if there's any way to run a query to search for a specific string in all the columns in all the tables. BTW, I'm using the "Interactive SQL" application that comes with the Sybase Database Client software to connect to the database.

Parameter Sniffing vs VARIABLES vs Recompile vs OPTIMIZE FOR UNKNOWN

Posted: 07 Jun 2013 12:21 PM PDT

So we had a long-running proc causing problems this morning (30+ sec run time). We decided to check whether parameter sniffing was to blame, so we rewrote the proc and set the incoming parameters to local variables to defeat parameter sniffing. A tried and true approach. Bam, query time improved (less than 1 sec); looking at the query plan, the improvement came from an index the original plan wasn't using.

Just to verify that we didn't get a false positive, we did a dbcc freeproccache on the original proc and reran it to see if the improved results would be the same. But, to our surprise, the original proc still ran slow. We tried again with WITH RECOMPILE, still slow (we tried a recompile on the call to the proc and inside the proc itself). We even restarted the server (dev box, obviously).

So, my question is this... how can parameter sniffing be to blame when we get the same slow query on an empty plan cache? There shouldn't be any parameters to sniff.

Are we instead being affected by table stats not related to the plan cache? And if so, why would setting the incoming parameters to variables help?

In further testing we also found that inserting the OPTION (OPTIMIZE FOR UNKNOWN) on the internals of the proc DID get the expected improved plan.

So, some of you folks smarter than I, can you give some clues as to whats going on behind the scenes to produce this type of result?

On another note, the slow plan also gets aborted early with reason GoodEnoughPlanFound, while the fast plan has no early-abort reason in the actual plan.

In summary

  • Creating variables out of incoming parameters (1 sec)
  • with recompile (30+ sec)
  • dbcc freeproccache (30+ sec)
  • OPTION (OPTIMIZE FOR UNKNOWN) (1 sec)

UPDATE:

See slow execution plan here: https://www.dropbox.com/s/cmx2lrsea8q8mr6/plan_slow.xml

See fast execution plan here: https://www.dropbox.com/s/b28x6a01w7dxsed/plan_fast.xml

Note: table, schema, object names changed for security reasons.
