Wednesday, July 3, 2013

[how to] Adjacency list model vs nested list model or any other database models

Adjacency list model vs nested list model or any other database models

Posted: 03 Jul 2013 08:51 PM PDT

I have the following hierarchy

Terms -> Sub-Terms -> Keywords

1) There will be many terms, 2) each term has many sub-terms, and 3) each sub-term has many keywords.

A post can be attached to a term, a sub-term (which is under a term), or a keyword (which is under a sub-term).

I am in a dilemma over whether to choose the adjacency list model or the nested set model for this categorization. Though this question has been answered many times in online forums, I am not able to settle on a model. Some say the nested set model is overkill for a fixed number of levels.

Someone please guide me down the right path.

Thank you.
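
For what it's worth, with a fixed three-level hierarchy the adjacency list version stays simple, since any lookup needs at most two self-joins. A minimal sketch, with invented table and column names:

-- Adjacency list sketch for a fixed Terms -> Sub-Terms -> Keywords tree.
-- All names here are illustrative, not from the question.
CREATE TABLE category (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    parent_id INT UNSIGNED NULL,           -- NULL for top-level terms
    name      VARCHAR(255) NOT NULL,
    FOREIGN KEY (parent_id) REFERENCES category (id)
);

-- A post may attach to a term, sub-term, or keyword alike.
CREATE TABLE post_category (
    post_id     INT UNSIGNED NOT NULL,
    category_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (post_id, category_id),
    FOREIGN KEY (category_id) REFERENCES category (id)
);

-- With exactly three levels, resolving a keyword's full path costs
-- only two self-joins, no recursion needed:
SELECT k.name AS keyword, s.name AS sub_term, t.name AS term
FROM category k
JOIN category s ON s.id = k.parent_id
JOIN category t ON t.id = s.parent_id;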

MySQL Replicate Databases without Tables or Content

Posted: 03 Jul 2013 07:48 PM PDT

I would like to replicate all the databases from a MySQL master to a slave used for archiving data older than 30 days. It isn't common for this data to be accessed, so I would like it removed from the master. I only need to replicate the table structure, without any content, and only two of the ten tables (tablea and tableb) need to be created on the slave.

The table content will be imported using an archive script based on pt-archiver, which is why I don't need the content replicated to the slave. However, when the main database is removed from the master, it should also be removed from this archive slave. Is this possible using replication?
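
Replication filters are one avenue, though they operate per table and carry DDL and DML alike, so they will not by themselves deliver structure without content. A slave-side my.cnf sketch, assuming the database is simply named db:

# Slave-side filter sketch; database and table names as described above.
# Note these filters replicate both DDL and DML for the matched tables,
# so the content still has to be kept out some other way.
[mysqld]
replicate-do-table = db.tablea
replicate-do-table = db.tableb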

UNF to 3NF is this relation workable?

Posted: 03 Jul 2013 04:26 PM PDT

I've inherited a flat table of data consisting of around 2 million records. Looking at the data, it's already in 1NF; I'd like to get it into 3NF.

The data is Australian, so suburb can't determine state, as suburb names exist in more than one state; plus, one postcode often covers more than one suburb. That means (suburb, state, postcode) would have to act as a superkey?

There are multiple companies at different sites; some share the same phone number (freephone 1300 or 1800) and others have distinct phone numbers. Every record has a fax number and most have a phone number. ContactName is mostly blank, so phone and fax numbers relate to companies, not contacts.

Selection of records by users is typically an RLIKE on the field BusinessTypeDescription coupled with geographic criteria, which of course means a full table scan on every query and therefore very slow results.

Is the proposed table structure below in 3NF and is it practical?

EXISTING TABLE
-----------------------
ID (PK)
CompanyName
Address
Suburb
State
Region
Postcode
Country
ContactName (this field is mostly empty)
Phone
Fax
Mobile
Email (this field is mostly empty)
Website (this field is mostly empty)
PremiseType
BusinessTypeDescription
ANZSICCode
ANZSICDescription
RecordSource

This is the proposed structure...

COMPANY
---------------
ID (PK)
CompanyName
Address
LocationID (FK)
PhoneID (FK)
FaxID (FK)
MobileID (FK)
ContactName
PremiseType
BusinessTypeID (FK)
ANZSICCode (FK)
Email
Website
RecordSource

LOCATION          COUNTRY     REGION
--------------    --------    ------------
ID (PK)           ID (PK)     ID (PK)
Suburb            Country     Region
State
Postcode
RegionID (FK)
CountryID (FK)

PHONES        MOBILES       FAXES
---------     ---------     ---------
ID            ID            ID
Phone         Mobile        Fax

BUSINESSTYPE
----------------------
ID
BusinessTypeDescription

ANZSICS
----------------------
ANZSICCode
ANZSICDescription
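
For concreteness, a partial DDL sketch of the proposed structure (column types and constraint names are assumptions), with the candidate key from above made explicit:

-- Partial sketch only; types are guesses, not from the question.
CREATE TABLE location (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    suburb    VARCHAR(100) NOT NULL,
    state     VARCHAR(3)   NOT NULL,
    postcode  CHAR(4)      NOT NULL,
    region_id INT UNSIGNED NOT NULL,
    UNIQUE KEY uq_location (suburb, state, postcode)  -- the candidate key discussed above
);

CREATE TABLE company (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    company_name VARCHAR(255) NOT NULL,
    address      VARCHAR(255),
    location_id  INT UNSIGNED NOT NULL,
    FOREIGN KEY (location_id) REFERENCES location (id)
);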

What menu item in TOAD can be used to monitor SQL queries in real time in a database?

Posted: 03 Jul 2013 04:36 PM PDT

What menu item in TOAD can be used to monitor SQL queries in real time in a database? I own a TOAD license and have used the "live SQL monitoring" feature before, but I don't quite remember which menu item gets to it. Does anyone know? This is for monitoring an Oracle database.

I know about Menu-->Database-->Monitor-->SQL Tracker, which can bind to a local client process and trace its SQL queries, but I am looking to trace all queries from all clients without needing to bind to a process the way the SQL Tracker tool does.

I have seen this work on my computer but I just can't remember how to find it. I have scoured all the menu items in my TOAD app and haven't been able to relocate it.

ALLOW_SNAPSHOT_ISOLATION Duration

Posted: 03 Jul 2013 07:26 PM PDT

I would like to enable ALLOW_SNAPSHOT_ISOLATION on a production database that has ~400 million total records. I understand that 14 bytes will need to be added to each record.

If I set ALLOW_SNAPSHOT_ISOLATION will it block for a period of time proportional to the record count, or will the data be updated asynchronously?

Primarily, I want to be sure that my database will not be out of service for hours when this setting is enabled.
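
For reference, the statement in question is a one-liner; note that the 14-byte versioning tag is added to a row only when the row is next modified, not during the ALTER itself:

-- Database name is a placeholder:
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;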

Active Directory group permissions not being applied to user

Posted: 03 Jul 2013 05:27 PM PDT

I have an odd situation. An Active Directory group was recently created for the sole purpose of being a securable in SQL Server. I created a login based on the AD group, a user based on the login, and granted permissions to the user. All pretty standard stuff. When I impersonate one of the group members (using execute as login='domain\username'), I'm able to perform the granted action. However, when the user himself tries it, he gets "permission denied". I'm at a loss to explain the behavior and subsequently fix it.
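
For anyone reproducing this, the setup described boils down to something like the following T-SQL (object, database, and domain names are all placeholders):

-- All names are placeholders for the setup described above.
CREATE LOGIN [DOMAIN\AppGroup] FROM WINDOWS;
USE MyDb;
CREATE USER [DOMAIN\AppGroup] FOR LOGIN [DOMAIN\AppGroup];
GRANT EXECUTE ON OBJECT::dbo.SomeProc TO [DOMAIN\AppGroup];

-- The impersonation path that works:
EXECUTE AS LOGIN = 'DOMAIN\username';
EXEC dbo.SomeProc;   -- succeeds here, but fails for the real user
REVERT;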

ADO SQL modifies data on other tables?

Posted: 03 Jul 2013 02:33 PM PDT

I've been trying to update some records using a form that has a couple of comboboxes in it; those are populated from auxiliary tables that contain city and department (the table that needs to be updated only holds the IDs from both "side" tables).

Since I was having a hard time inserting the data directly, I decided to use ADO to issue the SQL directly.

So far my VBA code looks like this:

Private Sub btnClose_Click()
    Dim Msg, Style, Title, Ctxt, Help, Response, MyString

    Msg = "Want to save changes?"
    Style = vbYesNo + vbQuestion
    Title = "Confirm changes"
    Ctxt = 1000

    Response = MsgBox(Msg, Style, Title, Help, Ctxt)

    If Response = vbNo Then
        Cancel = True
    Else
        Call ManualUpdate
    End If
End Sub

And the ManualUpdate sub:

Private Sub ManualUpdate()
    Dim ccnDb As ADODB.Connection
    Set ccnDb = CurrentProject.Connection
    Dim cmd As ADODB.Command
    Set cmd = New ADODB.Command

    ' Build the UPDATE statement from the form controls
    Dim strSQL As String
    strSQL = "UPDATE personal SET personaApPaterno = '" & Trim(Me.personaApPaterno.Value) & "'"
    strSQL = strSQL & ", personaApMaterno = '" & Trim(Me.personaApMaterno.Value) & "'"
    strSQL = strSQL & ", personaNombre = '" & Trim(Me.personaNombre.Value) & "'"
    strSQL = strSQL & ", personaCargo = '" & Trim(Me.personaCargo.Value) & "'"
    strSQL = strSQL & ", departamentoId = " & Me.cmbDepto.Value
    strSQL = strSQL & ", ciudadId = " & Me.cmbCiudad.Value
    strSQL = strSQL & ", personaProfesion = '" & Trim(Me.personaProfesion.Value) & "'"
    strSQL = strSQL & ", personaGerente = " & Me.personaGerente.Value
    strSQL = strSQL & ", personaExterno = " & Me.personaExterno.Value
    strSQL = strSQL & ", personaSexo = " & Me.ogSexo.Value
    strSQL = strSQL & " WHERE personaRUT = " & Me.personaRUT.Value

    ' Save any pending edits on the form before running the statement
    If Me.Dirty Then
        Me.Dirty = False
    End If

    With cmd
        .CommandText = strSQL
        .ActiveConnection = ccnDb
        .Execute
    End With

    Set cmd = Nothing
    DoCmd.Close
End Sub

So far... so good...

As you can see, the SQL does a DIRECT update of the table called "personal", and it works fine. The ONLY thing driving me crazy is that, for some odd reason, Access 2007 is updating the auxiliary tables, replacing the text in the "ciudadName" and "departamentoName" fields with their own respective IDs. For some extra information: I was using DAO before ADO.

Here is the DAO code also...

Private Sub ManualUpdate()
    Dim dbDao As DAO.Database
    Dim rsDao As DAO.Recordset

    If Me.Dirty Then
        Me.Dirty = False
    End If

    Set dbDao = CurrentDb
    Set rsDao = dbDao.OpenRecordset("personal", dbOpenTable)

    rsDao.FindFirst ("personaRUT = " & Me.personaRUT.Value)
    rsDao.Edit

    rsDao![personaApPaterno] = Trim(Me.personaApPaterno.Value)
    rsDao![personaApMaterno] = Trim(Me.personaApMaterno.Value)
    rsDao![personaNombre] = Trim(Me.personaNombre.Value)
    rsDao![personaCargo] = Me.personaCargo.Value
    rsDao![departamentoId] = Me.cmbDepto.Value
    rsDao![comunaId] = Me.cmbComuna.Value
    rsDao![personaProfesion] = Me.personaProfesion.Value
    rsDao![personaGerente] = Me.personaGerente.Value
    rsDao![personaExterno] = Me.personaExterno.Value
    rsDao![personaSexo] = Me.ogSexo.Value

    rsDao.Update

    rsDao.Close
    dbDao.Close

    Set dbDao = Nothing
    Set rsDao = Nothing
    DoCmd.Close
End Sub

BTW: DAO was doing THE same "update" where it was not supposed to happen; that's why I went the ADO way, but with no luck at all.

Information about Disk Storage MySQL

Posted: 03 Jul 2013 02:31 PM PDT

I am trying to create an application for monitoring MySQL. In my application there's a part that monitors disk storage, and after much research I have not found how to calculate:

  • used and free size of tablespaces,
  • used and free space in data files,
  • used and free space in log files

Does anyone know how to calculate these sizes, or can someone point me to articles or books that would help? Thanks.
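
A starting point, for what it's worth: table and index sizes are exposed in the data dictionary, and for InnoDB the free space within the tablespace shows up in DATA_FREE. A sketch:

-- Per-schema used and free space from the data dictionary
-- (DATA_FREE reflects free space in the InnoDB tablespace):
SELECT table_schema,
       SUM(data_length + index_length) / 1024 / 1024 AS used_mb,
       SUM(data_free) / 1024 / 1024                  AS free_mb
FROM information_schema.TABLES
GROUP BY table_schema;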

How can I better optimize my tables and a LEFT JOIN query to find items that don't yet exist in the right table?

Posted: 03 Jul 2013 02:15 PM PDT

I have two tables that represent a list of urls and their related word indexes. Here are the table definitions for reference.

desc urllist;
+-------+---------------------+------+-----+---------+----------------+
| Field | Type                | Null | Key | Default | Extra          |
+-------+---------------------+------+-----+---------+----------------+
| id    | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| url   | text                | NO   |     | NULL    |                |
+-------+---------------------+------+-----+---------+----------------+

and

desc wordlocation;
+----------+---------------------+------+-----+---------+-------+
| Field    | Type                | Null | Key | Default | Extra |
+----------+---------------------+------+-----+---------+-------+
| urlid    | bigint(20) unsigned | NO   |     | NULL    |       |
| wordid   | bigint(20) unsigned | NO   |     | NULL    |       |
| location | int(10) unsigned    | NO   |     | NULL    |       |
+----------+---------------------+------+-----+---------+-------+

The software app is a web spider. It crawls a list of urls, extracts the urls it finds, and inserts them into the urllist table. Then an indexer checks which urls have not yet been indexed and proceeds to index those urls.

Here is the query I am using to find items in the left table (urllist) that have not yet been indexed in the right table (wordlocation). This query is as suggested on the mysql.com website:

select * from urllist ul
left join wordlocation wl on ul.id = wl.urlid
where wl.urlid IS NULL;

As of this writing my test database has only 600 indexed urls, and the wordlocation table has 1.3 million rows. However, my CPU sits at 100%, and the longest I've waited to see whether the query would complete is half an hour (it never did, by the way).

To be thorough, here is the EXPLAIN output for the query:

explain select * from urllist ul left join wordlocation wl on ul.id = wl.urlid where wl.urlid IS NULL;
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra                   |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------------------+
|  1 | SIMPLE      | ul    | ALL  | NULL          | NULL | NULL    | NULL |   50364 |                         |
|  1 | SIMPLE      | wl    | ALL  | NULL          | NULL | NULL    | NULL | 1351371 | Using where; Not exists |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------------------+

I need this query to finish in seconds, not minutes. I'm also worried about scalability: I have 40,000 unique urls waiting to be added to the index, so how do I take that into consideration in my table and query design? What about 400,000 urls?

Just a couple notes about my decisions on the current table structure.

I have no intention of stopping at 400,000 urls, but perhaps bigint(20) is a bit overzealous?

Using text for url is for practical reasons: I index a lot of Asian and other foreign-language domains, which do not appear as their equivalent Kanji or other characters in the database and frequently take more than 255 characters.

I'm using MySQL. I'm definitely open to suggestions as to better table and query design. Please let me know if I can provide more information.
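
One observation, offered tentatively: the EXPLAIN above shows type ALL with no candidate keys on wordlocation, so the anti-join re-scans 1.3 million rows for every url. Indexing the join column is the usual first step, and NOT EXISTS is an equivalent way to write the same anti-join:

-- Give the anti-join something to probe (index name is invented):
ALTER TABLE wordlocation ADD INDEX idx_wordlocation_urlid (urlid);

-- Equivalent NOT EXISTS formulation of the same query:
SELECT ul.*
FROM urllist ul
WHERE NOT EXISTS (SELECT 1 FROM wordlocation wl WHERE wl.urlid = ul.id);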

MySQL on Unix, should administrator accounts join the operating system "mysql" group?

Posted: 03 Jul 2013 01:22 PM PDT

When I install MySQL via Yum, it automatically creates the mysql operating system user and group. Then mysqld executes as mysql (instead of as root), and all the files in the data directory are owned by mysql:mysql.

This is obviously correct security practice; I understand why this happens and don't want to subvert it.

When I'm administering a MySQL system, I've just gotten in the habit of using sudo when I need to interact with a protected file (e.g., reading a binary log with mysqlbinlog, creating a backup tarball, etc).

Would it also work to add my OS user to the mysql group? Are there major downsides of this I'm not considering?
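
For reference, the change under consideration is a single command on most Linux distributions (the user name is a placeholder), followed by a fresh login so the new group membership takes effect:

# Placeholder user name; log in again afterwards for the group to apply.
usermod -aG mysql someadmin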

Transferring Data from OLTP to OLAP databases

Posted: 03 Jul 2013 11:25 AM PDT

Systems that have both OLTP and OLAP databases periodically move data from the OLTP database to the OLAP database, so that reports can be generated based on the OLAP database.

What's the official term for this process and is there an overview of this process online?

TOAST Table Growth Out of Control - FULLVAC Does Nothing

Posted: 03 Jul 2013 08:49 PM PDT

Recently, I had a PostgreSQL 8.2.11 server upgraded to 8.4 in order to take advantage of autovacuum features. This server exists in a very closed environment (isolated network, limited root privileges; this explains the older software in use) and runs on RHEL 5.5 (i686). Since the upgrade, the database has been growing constantly, to the tune of 5-6 GB a day. Normally, the database as a whole is ~20 GB; currently it is ~89 GB. We have a couple of other servers which run equivalent databases and actually synchronize the records with each other via a 3rd-party application (one whose inner workings I do not have access to). The other databases are ~20 GB, as they should be.

Running the following SQL, it's fairly obvious there's an issue with a particular table, and, more specifically, its TOAST table.

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_relation_size(C.oid)) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER BY pg_relation_size(C.oid) DESC
  LIMIT 20;

Which produces:

            relation            |  size
--------------------------------+---------
 pg_toast.pg_toast_16874        | 89 GB
 fews00.warmstates              | 1095 MB
 ...
(20 rows)

This TOAST table is for a table called "timeseries", which saves large records of blobbed data. A SUM(LENGTH(blob)/1024./1024.) over all the records in timeseries yields ~16 GB for that column. There should be no reason this table's TOAST table is as large as it is.

I've performed a VACUUM FULL VERBOSE ANALYZE timeseries, and the vacuum runs to completion with no errors.

INFO: vacuuming "pg_toast.pg_toast_16874"
INFO: "pg_toast_16874": found 22483 removable, 10475318 nonremovable row versions in 10448587 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 37 to 2036 bytes long.
There were 20121422 unused item pointers.
Total free space (including removable row versions) is 0 bytes. 4944885 pages are or will become empty, including 0 at the end of the table. 4944885 pages containing 0 free bytes are potential move destinations.
CPU 75.31s/29.59u sec elapsed 877.79 sec.
INFO: index "pg_toast_16874_index" now contains 10475318 row versions in 179931 pages
DETAIL: 23884 index row versions were removed.
101623 index pages have been deleted, 101623 are currently reusable.
CPU 1.35s/2.46u sec elapsed 21.07 sec.

I've REINDEXed the table, which freed some space (~1 GB). I can't CLUSTER the table, as there isn't enough space on disk for the process, and I'm holding off on rebuilding the table entirely, as I'd like to find out why it is so much bigger than the equivalent databases we have.

I ran the "Show Database Bloat" query from the PostgreSQL wiki, and this is what I get:

 current_database | schemaname |         tablename          | tbloat | wastedbytes |           iname            | ibloat | wastedibytes
------------------+------------+----------------------------+--------+-------------+----------------------------+--------+--------------
 ptrdb04          | fews00     | timeseries                 |    1.0 |           0 | idx_timeseries_synchlevel  |    0.0 |            0
 ptrdb04          | fews00     | timeseries                 |    1.0 |           0 | idx_timeseries_localavail  |    0.0 |            0
 ptrdb04          | fews00     | timeseries                 |    1.0 |           0 | idx_timeseries_expirytime  |    0.0 |            0
 ptrdb04          | fews00     | timeseries                 |    1.0 |           0 | idx_timeseries_expiry_null |    0.0 |            0
 ptrdb04          | fews00     | timeseries                 |    1.0 |           0 | uniq_localintid            |    0.0 |            0
 ptrdb04          | fews00     | timeseries                 |    1.0 |           0 | pk_timeseries              |    0.1 |            0
 ptrdb04          | fews00     | idx_timeseries_expiry_null |    0.6 |           0 | ?                          |    0.0 |            0

It looks like the database doesn't consider this space "empty" at all, but I just don't see where all the disk space is coming from!

I suspect that this database server is using 4-5x as much disk space to save the same records pulled from the other data servers. My question is this: is there a way I can verify the physical on-disk size of a row? I'd like to compare the size of one row on this database to one on another, "healthy" database.

Thanks for any help you can provide!

EDIT

I ended up rebuilding the table from a dumped schema due to its size (I couldn't leave it alone for another day). After synchronizing the data via the software's sync process, the TOAST table was ~35 GB; however, I could only account for ~9 GB of it from the blob column, which should be the longest in terms of values. Not sure where the other 26 GB is coming from. CLUSTERed, VACUUM FULLed, and REINDEXed to no avail. The postgresql.conf files on the local and remote data servers are exactly the same. Is there any reason this database might be trying to store each record with a larger footprint on disk?
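
Regarding the per-row question above: pg_column_size() reports the stored (possibly compressed) size of a value, which can be compared with its logical length on both servers. A sketch using the names already mentioned in the question:

-- Stored (TOASTed/compressed) size vs. logical size, per row;
-- "blob" and "timeseries" are the names used above:
SELECT pg_column_size(blob) AS stored_bytes,
       octet_length(blob)  AS logical_bytes
FROM timeseries
LIMIT 10;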

What are Measures and Dimensions in Cubes

Posted: 03 Jul 2013 07:30 PM PDT

I'm very new to Microsoft SQL Server Business Intelligence and Analysis Services (though I've been programming with SQL Server for years). Can anyone describe measures and dimensions in cubes in simple words (if possible, with images)?

Thanks.
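
Until a fuller answer comes along, a rough sketch of how the two terms usually land in a star schema may help: dimensions are the things you slice by, measures are the numbers you aggregate. All names below are invented:

-- Dimensions answer "by what?" (by date, by product, ...).
-- Measures are the numeric facts being summed or averaged.
CREATE TABLE dim_date    (date_key INT PRIMARY KEY, calendar_date DATE, fiscal_year INT);
CREATE TABLE dim_product (product_key INT PRIMARY KEY, product_name VARCHAR(100));

CREATE TABLE fact_sales (
    date_key     INT REFERENCES dim_date (date_key),
    product_key  INT REFERENCES dim_product (product_key),
    sales_amount DECIMAL(12,2),  -- measure
    units_sold   INT             -- measure
);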

How does one select the histogram / scatterplot / etc. Data Viewers in SSIS 2012

Posted: 03 Jul 2013 11:27 AM PDT

I have a data flow task in SSIS 2012 and have created a flat file source and an OLE DB destination. I can select the data flow path between them as in earlier versions of SSIS, and I can right-click and select the Data Flow Path Editor.

The Data Viewer option is available as in past versions, but there is no Add button, and I cannot work out how to add the histogram/scatterplot/etc. data viewers.

All I can seem to configure is a sample of the data that actually moved across.

On which machine should TCP Chimney Offload be disabled?

Posted: 03 Jul 2013 03:07 PM PDT

I am advising someone on how to disable TCP Chimney Offload, which is enabled by default in Windows Server 2003 SP2. [1] We are working in an environment with multiple server machines, namely an application server machine which communicates with a database server machine.

On which machine should TCP Chimney Offload be disabled? Application, database or both?

[1] This can cause communication problems between the application and the database. See http://support.microsoft.com/kb/942861 and http://blogs.msdn.com/b/psssql/archive/2008/10/01/windows-scalable-networking-pack-possible-performance-and-concurrency-impacts-to-sql-server-workloads.aspx
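
For reference, the KB article linked above documents the toggle on Server 2003 SP2 as a netsh command:

netsh int ip set chimney disabled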

MySQL replication: 'Duplicated entry for PRIMARY key'

Posted: 03 Jul 2013 12:30 PM PDT

Could you please help me understand why I'm receiving 'Duplicated entry for PRIMARY key' errors on a slave server after a full re-sync?

Basically, 'mysqldump' was running almost the whole night and then the restore process took a couple of hours, so when I started the slave it was ~63874 seconds behind the master.

The slave server is read only (read_only) and there were no writes during the re-sync process, so I don't understand why there are duplicated keys.

Binary log format is set to MIXED on master.

Command used to back up the DB:

mysqldump --opt --single-transaction -Q --master-data=2 db | bzip2 -cz > db.sql.bz2  

The slave replicates only one database from the master (db -> db_backup), with the following options:

replicate-wild-do-table = db_backup.%
replicate-rewrite-db = db->db_backup
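
One thing worth double-checking, offered as a hedge: --master-data=2 writes the binlog coordinates into the dump as a comment, and duplicate-key errors typically appear when the slave is started from any other position. A sketch with placeholder coordinates:

-- The dump contains a commented-out line of this shape; the slave must
-- be pointed at exactly those coordinates (values below are placeholders):
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=456;
START SLAVE;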

"SQL1042C An unexpected system error" when creating a database

Posted: 03 Jul 2013 01:22 PM PDT

I installed DB2 Express-C 10.1 on OS X 10.8.4. I installed it in user mode, and it seems to have created an instance named after my username (avernet). I checked that the instance is indeed there by running db2ilist. The database manager is running, as when I run db2start I get the message SQL1026N The database manager is already active.

Now I am trying to create a database by running db2 from the command line and typing the command create database orbeon, but it returns:

SQL1042C  An unexpected system error occurred.  SQLSTATE=58004

What am I missing here? How can I create a simple database from the command line? (I'd rather not bother with Data Studio, as it isn't available for OS X.)

In db2dump/db2diag.log, I see the following error:

2013-06-29-15.31.38.948217-420 E159047E344          LEVEL: Error (OS)
PID     : 1766                 TID : 140735175762304 PROC : db2star2
INSTANCE: avernet              NODE : 000
HOSTNAME: huashan.local
FUNCTION: DB2 UDB, SQO Memory Management, sqloMemCreateSingleSegment, probe:100
CALLED  : OS, -, shmget                           OSERR: 17

Note that for the install to go through, I increased the OS X shared memory, per this recommendation.

Connection to local SQL Server 2012 can be established from SSMS 2008 but not from SSMS 2012

Posted: 03 Jul 2013 03:47 PM PDT

I have two local SQL Server instances running on my local machine. The first is SQL Server 2008 R2 Enterprise Edition (named MSSQLSERVER) and the second is SQL Server 2012 Business Intelligence Edition.

My problem is with SSMS 2012, which can connect to remote servers but not to the local 2012 instance; I can, however, connect to this instance from SSMS 2008.

The error message I get when trying to log in is

Login Failed. The login is from an untrusted domain and cannot be used with Windows Authentication. (Microsoft SQL Server, Error: 18452)

I must point out that I don't have the necessary privileges to access SQL Server Configuration Manager (blocked by group policy).

Any help would be appreciated.

Is there a way to find the least recently used tables in a schema?

Posted: 03 Jul 2013 05:47 PM PDT

Is there a way to find the least recently used tables in a MySQL schema, short of going into the data directories? I was hoping there was a metadata or status trick, but Update_time in SHOW TABLE STATUS and INFORMATION_SCHEMA is always NULL.
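
For reference, here is the check that comes up empty; UPDATE_TIME is maintained for MyISAM tables, but for InnoDB it is always NULL on MySQL versions of this era:

-- Returns NULL update_time for InnoDB tables in this era:
SELECT table_schema, table_name, update_time
FROM information_schema.TABLES
WHERE table_schema = 'myschema'   -- placeholder schema name
ORDER BY update_time;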

Optimize command issuing "'View is not base table" and "Corrupt" error messages

Posted: 03 Jul 2013 01:46 PM PDT

I have no experience with MySQL. My boss ran OPTIMIZE against a MySQL database. Against a bunch of views we got the message

"tablexyz" is not BASE TABLE

and the next message is

Corrupt

It seems to happen only against views; we didn't get any of these error messages against the base tables.

Does this look like an incorrect error message, or do we have issues with our tables?
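
A cheap way to test whether the messages are tied to views rather than to damage (the view name is invented): create a throwaway view and optimize it. If the same pair of messages appears, they come from running OPTIMIZE against views, not from the tables themselves.

-- Throwaway view purely for reproducing the messages:
CREATE VIEW v_optimize_test AS SELECT 1 AS c;
OPTIMIZE TABLE v_optimize_test;
DROP VIEW v_optimize_test;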

How to find Oracle home information on Unix?

Posted: 03 Jul 2013 08:47 PM PDT

I need help finding the Oracle home path corresponding to a database instance in a RAC environment. I am aware of a few ways to achieve this; I list them below to avoid getting the same answers.

  1. /etc/oratab. This file is not mandatory and hence may not have all instance information.

  2. Parsing the contents of listener.ora. In a RAC environment, listener.ora can be located in a non-default location.

  3. Use TNS_ADMIN to find the listener.ora location and parse the file.

  4. The ORACLE_HOME environment variable. It may not always be set.

  5. ps -ef | grep tns to get the home path from the service name. Gives the path only for the currently running listener.

  6. select "SYSMAN"."MGMT$TARGET_COMPONENTS"."HOME_LOCATION"
    from "SYSMAN"."MGMT$TARGET_COMPONENTS"
    where "SYSMAN"."MGMT$TARGET_COMPONENTS"."TARGET_NAME" = <Database SID>

The sysman schema can be dropped after the first login to Oracle.

  7. SELECT NVL(SUBSTR(FILE_SPEC, 1, INSTR(FILE_SPEC, '\', -1, 2) -1) , SUBSTR(FILE_SPEC, 1, INSTR(FILE_SPEC, '/', -1, 2) -1)) FOLDER
    FROM DBA_LIBRARIES
    WHERE LIBRARY_NAME = 'DBMS_SUMADV_LIB';

So if a DBA changes the Oracle home (and hence the location of libqsmashr.so) after installing Oracle, the path retrieved by the above query would be invalid.

  8. . oraenv. Works only for 11g.

I am trying to find a generic way which will work for all Oracle versions, and it should not depend on anything that is not reliably available to a DBA.

Do you have any way to do this other than those listed above?

Many Thanks in advance.
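
One more candidate, offered with the caveat that sys.dbms_system is undocumented and needs EXECUTE privilege granted explicitly, so treat it as a sketch to verify per version:

-- SQL*Plus; dbms_system is undocumented, privileges required:
VARIABLE home VARCHAR2(255)
EXEC sys.dbms_system.get_env('ORACLE_HOME', :home)
PRINT home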

How to handle "many columns" in OLAP RDBMS

Posted: 03 Jul 2013 12:31 PM PDT

I have a fact table that has around 1K different numerical attributes (i.e., columns). I would like to store this in a column-oriented DB and perform cube analysis on it.

I tried to design a star schema, but I'm not sure how to handle this many columns. Normalising them sounds wrong, but I can't just have flat columns either. The combinations of attributes are also too diverse for a simple dimension table, even if I reduced the numerical values into categories (ranges), which is an option. I thought about storing them as XML or JSON for each row, but that doesn't sound great either.

If it helps, I'm planning to use Amazon Redshift for the DB.

Note: we have a strong preference for Redshift, as it fits perfectly for at least a few other operations we perform on this data. Hence I want to avoid other technologies like HBase if possible.

Mysql DB server hits 400% CPU

Posted: 03 Jul 2013 11:46 AM PDT

I have been facing problems with my database server for about a month. Below are the observations I see when it spikes:

 - load average 40 to 50
 - CPU % - 400%
 - idle % - 45%
 - wait % - 11%
 - vmstat procs r -> 14 and b -> 5

It then drains down within 5 minutes. When I check show processlist, I see DML and SQL queries halted for some minutes, and processing is very slow. Yet every query is indexed appropriately, and there is normally no delay; most of the time any query executed to serve the application returns in less than 1 second.

  • MySQL version: 5.0.77
  • OS: CentOS 5.4
  • Mem: 16 GB RAM (80% allocated to innodb_buffer_pool_size)
  • Database size: 450 GB
  • 16 processors & 4 cores
  • Not using innodb_file_per_table.
  • TPS ranges from 50 to 200.
  • Master to a slave of the same configuration; seconds behind master is 0.
The URL below shows show innodb status \G and show open tables; at the time of the spike. The spike subsides within 5 minutes. In rare scenarios, perhaps once in two months, the processes take 5 to 8 hours to drain back to normal. The whole time I watch the load and processor utilization, and how the server gradually splits its tasks, and I keep monitoring the processlist, InnoDB status, and IO status. I need not do anything to bring it down: it serves the applications promptly, and after some time it drains down to normal. Can you find anything suspicious in the URL, such as locks or OS waits? Any suggestion on what to triage first, or on what could have caused such spikes?

http://tinyurl.com/bm5v4pl -> "show innodb status \G and show open tables at DB spikes."

Also there are some concerns that I would like to share with you.

  1. Recently I have seen a table that gets only about 60 inserts per second. It predominantly locks for a while, waiting for the auto-inc lock to be released, so subsequent inserts sit in the processlist. After a while the table shows IN_USE of about 30 threads, and later, I don't know how, they are freed and the queue clears. (For this duration the load goes above 15 for 5 minutes.)

  2. Suppose the application's functionality should be shaped to best suit how the DB server reacts. There are 3 to 5 functionalities, each an independent entity schema-wise. Yet whenever I see the locks, all the other schemas get affected too.

  3. The last one puzzles me most. The slave keeps in sync with the master with a delay of 0 seconds at all times, even though the slave applies everything through a single SQL thread fed in FIFO order from the relay log of the master's binary logs. If this single-threaded slave can keep the load low and stay up to date, do the hits for these functionalities really need to be made concurrently? I assume the concurrency is what causes the IO locks at the OS level. Could this be organized in the application itself, to keep the concurrency density thinner?

Error 1044 Access denied to user

Posted: 03 Jul 2013 04:47 PM PDT

This is driving me crazy.

When I try to create a few tables from my Workbench model, I get this error 1044.

I've been trying to find a solution but nothing works for me.

Curiously, when I run SELECT USER(), CURRENT_USER(); I get two versions of the same user. One is techboy@(an ip address) and the other is techboy@%.

Access is denied to both at one point or another.

The MySQL server is a remote hosted server with the user permissions correctly set.
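
A first diagnostic worth running, sketched with placeholder names: error 1044 means the account the server matched you to has no privilege on the target database, so compare the matched account against its grants.

-- Compare who the server matched you as with what was granted:
SELECT USER(), CURRENT_USER();
SHOW GRANTS FOR 'techboy'@'%';

-- If the grant is missing (database name is a placeholder):
GRANT ALL PRIVILEGES ON mydb.* TO 'techboy'@'%';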

Multiple database servers for performance vs failover

Posted: 03 Jul 2013 06:47 PM PDT

If I have two database servers, and I am looking for maximum performance vs high-availability, what configuration would be best?

Assuming the architecture is two load-balanced web/app servers in front of two DB servers, will I be able to have both DB servers active with synced data, with web1 going to db1 and web2 to db2? Is this active/active?

I'm also aware that the two DB servers can each have their own schema, to manually 'split' the DB needs of the app. In this case daily backups would be fine; we don't have 'mission critical' data.

If it matters, we have traffic around 3,000-7,000 simultaneous users.

Advices for modeling facts, time hierarchies based on two conditions

Posted: 03 Jul 2013 03:08 PM PDT

Sorry if my question does not belong on this SE, but I tried to find the best place for it and it ended up here.

My problem is this:

I've got a table modeled like this in our data warehouse environment:

MYTABLE (service_id, date_input, date_when_active, date_when_inactive, bunch_of_values...)  

Each night this table is updated with new records from our prod environment. When a record changes in prod and already exists in the DW, a new record is created in the DW with the same service_id, and the old record gets updated with sysdate in its when_inactive field. The newly created record has its when_inactive field set to null.

We use a query to extract data from this table, with conditions to get exactly the values of all services in a precise period. The query goes like this:

select *
from mytable
where service_date_input between :p_date_start and :p_date_end
and :p_date_end between date_when_active and nvl(date_when_inactive, :p_date_end)

What I would like to do is to model a time hierarchy in Cognos Framework Manager so I can drill through fiscal years, periods, weeks, months, etc.

What I can't figure out is whether I need to remodel the table to include calculated fiscal_year_id, period_id, week_id, month_id, etc. columns directly in the DB, or whether it is possible to do it directly in Cognos FWM.

One of the requirements is to not force the use of two date sets on the user (filter by input date AND fiscal year). The input date should be filtered transparently: when you use a fiscal year parameter to filter or display the data, the input date rule should apply accordingly.

Slow backup and extremely slow restores

Posted: 03 Jul 2013 12:46 PM PDT

I don't normally work with MySQL but with MS SQL, and I am having issues restoring a dump backup of a 9 GB database. I converted it to MS SQL and it takes a grand total of 4 minutes to restore, but the MySQL DB takes over an hour on the same server. The MySQL database is using InnoDB, but is there an alternative way to speed up the restores? Both databases are on the same machine: Windows 2008 R2 in a VM with dynamic SAN storage.

Correction: it takes MS SQL 1 minute to restore, and 1 hour to restore the same database in MySQL.

EDIT: mysql.ini (with commented lines removed):

[client]
no-beep
port=3306

[mysql]
default-character-set=utf8

[mysqld]
port=3306
basedir="C:\Program Files\MySQL\MySQL Server 5.5\"
datadir="C:\ProgramData\MySQL\MySQL Server 5.5\data\"
character-set-server=utf8
default-storage-engine=INNODB
sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
log-output=NONE
general-log=0
general_log_file="CM999-SV510.log"
slow-query-log=0
slow_query_log_file="CM999-SV510-slow.log"
long_query_time=10
log-error="CM999-SV510.err"
max_connections=100
query_cache_size=0
table_cache=256
tmp_table_size=22M
thread_cache_size=8
myisam_max_sort_file_size=100G
myisam_sort_buffer_size=43M
key_buffer_size=8M
read_buffer_size=64K
read_rnd_buffer_size=256K
sort_buffer_size=256K
innodb_additional_mem_pool_size=4M
innodb_flush_log_at_trx_commit=1
innodb_log_buffer_size=2M
innodb_buffer_pool_size=124M
innodb_log_file_size=63M
innodb_thread_concurrency=9
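
Two things stand out in this config for restore speed, offered as hedged suggestions rather than a fix: innodb_buffer_pool_size is only 124M, and innodb_flush_log_at_trx_commit=1 forces a log flush on every commit. A client-side wrapper for the duration of an import only:

-- Import wrapper sketch for the mysql client; the dump file name is a
-- placeholder, and the session flags should be restored afterwards:
SET autocommit = 0;
SET unique_checks = 0;
SET foreign_key_checks = 0;
SOURCE db.sql;
COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;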

TempDB data files don't shrink well on SQL 2008

Posted: 03 Jul 2013 07:47 PM PDT

So ideally you want to pre-size your TempDB data and log files appropriately so that this isn't a concern, but sometimes a rogue developer runs a crazy huge query on a production server during work hours, causing the TempDB data files to blow up huge.

If TempDB was literally the only thing on the drive, then I could probably just leave it like that, but on some servers I have several SQL instances that all share the same TempDB drive.

So how can I shrink these TempDB data files without restarting the instance?

I normally try:

DBCC SHRINKFILE (name = 'tempdev', size = 5000)  

This worked fairly consistently in SQL 2000 and 2005 (assuming the actual tempdb activity had tapered off), but seems to not work very often in 2008 (on my current server, it only worked on 1 of 4 data files; the others remain 2-3x larger).
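
One workaround that often helps, hedged accordingly: cached internal objects can pin pages at the end of the tempdb files, and clearing the server-wide caches before the shrink sometimes frees them. Note that both commands affect the whole instance:

-- Both commands flush instance-wide caches; use with care in production:
DBCC FREESYSTEMCACHE ('ALL');
DBCC FREEPROCCACHE;
DBCC SHRINKFILE (name = 'tempdev', size = 5000);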
