Saturday, October 12, 2013

[how to] MySQL Database "Table Doesn't Exist" When Clicked in phpMyAdmin


MySQL Database "Table Doesn't Exist" When Clicked in phpMyAdmin

Posted: 12 Oct 2013 02:36 PM PDT

I recently updated MAMP (the local server stack for Mac) to the latest version, 2.2, to get the latest versions of Apache, MySQL, and PHP. After the upgrade, all my localhost websites are unusable: they won't load in the browser (with MAMP running). I see that the MySQL database files end in .frm (the table definition files). When I click on a table in phpMyAdmin, it says "Table Does Not Exist," even though the tables are listed in phpMyAdmin and in the folder for that particular database inside MAMP/db. How do I fix this so that I can edit my websites locally?

How to split/explode comma delimited string field into SQL query

Posted: 12 Oct 2013 05:57 AM PDT

I have a field id_list='1234,23,56,576,1231,567,122,87876,57553,1216'

and I want to use it in an IN search like this:

SELECT *
FROM table1
WHERE id IN (id_list)
  • id is integer

  • id_list is varchar/text

Written this way, the query doesn't work, so I need some way to split id_list inside the SELECT query.

What solution should I use here? I'm using the T-SQL Sybase ASA 9 database (SQL Anywhere).

The way I see it, I could write my own function that loops over the string with a WHILE loop, extracts each element by searching for the delimiter position, and inserts the elements into a temp table that the function returns as its result.
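The loop described above can be sketched in Python. This is only an illustration of the idea, not Sybase code: the helper name split_ids, the id_values temp table, the name column, and the in-memory SQLite database standing in for SQL Anywhere are all assumptions made for the demo.

```python
import sqlite3


def split_ids(id_list, delimiter=","):
    """Split a delimited string into integers by scanning for delimiter positions."""
    ids = []
    start = 0
    while True:
        pos = id_list.find(delimiter, start)
        token = id_list[start:] if pos == -1 else id_list[start:pos]
        if token.strip():
            ids.append(int(token))
        if pos == -1:
            break
        start = pos + len(delimiter)
    return ids


id_list = "1234,23,56,576,1231,567,122,87876,57553,1216"
ids = split_ids(id_list)

# In-memory SQLite stands in for the real database (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(23, "a"), (56, "b"), (99, "c")])

# Temp table with one row per list element, as the question proposes.
conn.execute("CREATE TEMP TABLE id_values (id INTEGER)")
conn.executemany("INSERT INTO id_values VALUES (?)", [(i,) for i in ids])

rows = conn.execute(
    "SELECT id, name FROM table1 WHERE id IN (SELECT id FROM id_values) ORDER BY id"
).fetchall()
print(rows)
```

The same shape (split into a temp table, then IN against a subquery) should carry over to a stored function, but the exact T-SQL syntax for ASA 9 is not shown here.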

MySQL needs more space

Posted: 12 Oct 2013 04:55 AM PDT

I am using a program to import a Wikipedia dump into my local MySQL server. The program is still running; I started it four days ago. Unfortunately, drive C: is about to fill up. I have two HDDs connected to my PC, each 80 GB, and the second HDD is empty. How can I prevent the program from throwing an exception? It has no pause option. Is it possible to use the second HDD in this scenario?


Design DB for users with different information fields?

Posted: 12 Oct 2013 10:46 AM PDT

Let's say I want to register all the faculty of a university. They work in different fields and have different resume information, so there are some common fields and some field-specific ones. For example, for CS, medicine, and economics we may have:

CS:

ID: 123
Name: Ali
Lname: Alipour
projects: p1, p3, p6
programming Skills: C[5 star], Java[5 star], JS[3 star], etc.
Research Interest: Network

Medicine faculty:

ID: 456
Name: Jafar
Lname: Jafarson
Hospital experience: hospital1 (2 year), Hospital2 (1 month),...
Research Interest: Human body

Economic faculty:

ID: 789
Name: Sadegh
Lname: Alipour
Company: Company1(company_name, add, tel, etc), Company2(company_name, add, tel, etc)
Research Interest: online currencies

We also may have students in the system:

ID: st_123
Name: mahdi
Lname: Mahdiyar
Major: CS
reg_date: 2013.09.09

My first guess was to make a user_table holding all the common fields and then a field-specific table for each major: CS_faculty_table, Medicine_faculty_table, Economic_faculty_table, and student_table. Then I thought about inheritance and polymorphism in OO programming languages and wondered whether something similar is available in the DB area. Then I saw the accepted answer to this question, which says you can put the differing fields inside a JSON file and store it in your DB as a BLOB field!

I'm new to the DB field and don't have much experience designing databases, so I wanted to know: what's the best approach in this kind of situation?

I want a general answer. I don't know whether the choice of DB makes a difference, and I don't mind switching to another open-source DB (SQL or NoSQL). I had PostgreSQL in mind.

Transactions' order of commitment within Serializable schedules

Posted: 12 Oct 2013 02:31 AM PDT

The following diagram was taken from a book (T1 and T2 are transactions that read and write database objects A and B). For convenience, I have quoted a few lines from the book that describe the diagram:

"Even though the actions of T1 and T2 are interleaved, the result of this schedule is equivalent to running T1(in its entirety) and then running T2. Intuitively, T1's read and write of B is not influenced by T2's actions on A, and the net effect is the same if these actions are 'swapped' to obtain the serial schedule T1;T2."

My question is regarding this "Even though the actions of T1 and T2 are interleaved, the result of this schedule is equivalent to running T1(in its entirety) and then running T2." statement. How can this be true if T2 commits before T1? Please give a detailed answer.

A serializable schedule
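As a rough illustration of what "equivalent to running T1 and then T2" means, here is a small Python sketch. The diagram itself is not reproduced here, so the schedule below (T1 and T2 each read and write A, then B), the update functions, and the starting values are assumptions chosen for the demo. Commit steps are left out because they do not change any values; whether that commit order is safe if T1 later aborts is a separate question (recoverability) that this sketch does not address.

```python
# Hypothetical updates: T1 adds 100 to what it read, T2 doubles what it read.
UPDATE = {"T1": lambda v: v + 100, "T2": lambda v: v * 2}


def run(schedule, db):
    """Apply (txn, op, obj) steps in order; each txn remembers what it read."""
    db = dict(db)
    local = {}
    for txn, op, obj in schedule:
        if op == "R":
            local[(txn, obj)] = db[obj]
        elif op == "W":
            db[obj] = UPDATE[txn](local[(txn, obj)])
    return db


def txn_steps(txn, objs):
    """Read-then-write steps of one transaction over the given objects."""
    return [(txn, op, obj) for obj in objs for op in ("R", "W")]


start = {"A": 10, "B": 20}

# Interleaved: T1 on A, T2 on A, T1 on B, T2 on B.
interleaved = (txn_steps("T1", "A") + txn_steps("T2", "A")
               + txn_steps("T1", "B") + txn_steps("T2", "B"))
serial_t1_t2 = txn_steps("T1", "AB") + txn_steps("T2", "AB")
serial_t2_t1 = txn_steps("T2", "AB") + txn_steps("T1", "AB")

print(run(interleaved, start))
print(run(serial_t1_t2, start))
print(run(serial_t2_t1, start))
```

In this sketch the interleaved schedule leaves the database in the same state as the serial order T1;T2 and in a different state from T2;T1, because on every object T1's write comes before T2's read.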

MySQL: Optimizing for large but discrete data sets

Posted: 12 Oct 2013 10:17 AM PDT

In brief: I'm developing a database that handles GTFS datasets from multiple transit agencies. Each dataset contains millions of rows in the stop_times.txt file (and thus its corresponding table). Updating the table gets slower and slower as it gets bigger. I can deal with a couple of million rows from a single agency, but what happens when I add 10 more feeds? 50?

Now, the data sets are completely independent of one another. I won't be trying to join information across DART, MTA, and Transport for London. I feel like it would be very bad database design, but I'm tempted to create a separate table for each and forget about the whole thing.

I'm sure this has been answered somewhere, but I really don't know what I'm searching for. I've read up a bit on partitioning, but I'm not sure if that will solve my problem. Would adding a hash partition on my agency_id field solve issues with exploding BTREE indexes?

Here's my current table structure:

CREATE TABLE `stop_times` (
  `trip_id` bigint(20) unsigned DEFAULT NULL,
  `arrival_time` time DEFAULT NULL,
  `departure_time` time DEFAULT NULL,
  `stop_id` bigint(20) unsigned DEFAULT NULL,
  `stop_sequence` smallint(5) unsigned DEFAULT NULL,
  `stop_headsign` tinytext,
  `route_id` mediumint(8) unsigned DEFAULT NULL,
  `feed_id` smallint(5) unsigned DEFAULT NULL,
  `update_id` int(10) unsigned DEFAULT NULL,
  UNIQUE KEY `stop_sequence` (`trip_id`,`stop_sequence`) USING HASH,
  KEY `trip_id` (`trip_id`) USING HASH,
  KEY `departure_time` (`departure_time`) USING BTREE,
  KEY `stop_id` (`stop_id`) USING HASH,
  KEY `feed_id` (`feed_id`) USING HASH,
  KEY `update_id` (`update_id`) USING HASH
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Thanks in advance for the help.

Cannot create stored procedure

Posted: 12 Oct 2013 08:13 AM PDT

I entered the following statement into the MySQL 5.6 Command Line Client and received the error shown below. I haven't even been able to add END// and DELIMITER ; after the SELECT statement.

Also, once the stored procedure has been created successfully, how do I CALL it from Java code rather than from the command line?

Kindly assist. Greatly appreciated!

[Screenshot: the statement and the error message]

synchronizing local and server database

Posted: 12 Oct 2013 08:19 AM PDT

I am developing billing system software. For this I have created a database containing many tables. It lives on the local system, and all transaction data is stored in that local database.

I would like to provide backup tables on a server. Whenever the person using the billing software wants to, he or she can upload the data to the server (if data already exists there, only newly added data should be uploaded).

If the local data is corrupted or deleted for some reason, it can be downloaded again from the database server. All of these features should be available from within the billing software. How do I do this?

Should I join datetime to a date using cast or range?

Posted: 12 Oct 2013 05:28 AM PDT

This question is a take-off from the excellent one posed here:

Cast to date is sargable but is it a good idea?

In my case, I am not concerned with the WHERE clause but with joining to an events table that has a column of type DATE.

One table has DATETIME2 and the other has DATE... so I can effectively JOIN using a CAST( AS DATE) or I can use a "traditional" range query (>= date AND < date+1).

My question is which is preferable? The DATETIME values will almost never match the predicate DATE value.

I expect to stay on the order of 2M rows having the DATETIME and under 5k having the DATE (if this consideration makes a difference)

Should I expect the same behavior on the JOIN as I might using the WHERE clause? Which should I prefer to retain performance with scaling? Does the answer change with MSSQL 2012?

My generalized use-case is to treat my events table like a calendar table

SELECT
    events.columns
    ,SOME_AGGREGATIONS(tasks.column)
FROM
    events
LEFT OUTER JOIN
    tasks
        --This appropriately states my intent clearer
        ON CAST(tasks.datetimecolumn AS DATE) = events.datecolumn
        --But is this more effective/scalable?
        --ON tasks.datetimecolumn >= events.datecolumn
        --AND tasks.datetimecolumn < DATEADD(day,1,events.datecolumn)
GROUP BY
    events.columns
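To make the two join forms concrete, here is a small Python sketch that runs both against an in-memory SQLite database. SQLite stands in for SQL Server, so date() replaces CAST(... AS DATE) and date(x, '+1 day') replaces DATEADD; the sample rows are made up. It only checks that the two predicates select the same rows; it says nothing about how SQL Server's optimizer treats either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (datecolumn TEXT PRIMARY KEY);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, datetimecolumn TEXT);
INSERT INTO events VALUES ('2013-10-11'), ('2013-10-12'), ('2013-10-13');
INSERT INTO tasks (datetimecolumn) VALUES
  ('2013-10-11 00:00:00'), ('2013-10-11 23:59:59'),
  ('2013-10-12 08:30:00'), ('2013-10-14 12:00:00');
""")

# Form 1: convert the datetime to a date, then compare for equality.
cast_join = """
SELECT e.datecolumn, COUNT(t.id)
FROM events e
LEFT JOIN tasks t ON date(t.datetimecolumn) = e.datecolumn
GROUP BY e.datecolumn
ORDER BY e.datecolumn
"""

# Form 2: half-open range [date, date + 1 day) on the raw datetime.
range_join = """
SELECT e.datecolumn, COUNT(t.id)
FROM events e
LEFT JOIN tasks t
  ON t.datetimecolumn >= e.datecolumn
 AND t.datetimecolumn < date(e.datecolumn, '+1 day')
GROUP BY e.datecolumn
ORDER BY e.datecolumn
"""

cast_rows = conn.execute(cast_join).fetchall()
range_rows = conn.execute(range_join).fetchall()
print(cast_rows)
print(range_rows)
```

Both forms count two tasks for 2013-10-11, one for 2013-10-12, and none for 2013-10-13 on this sample data, including the midnight and 23:59:59 boundary rows.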

Booted by MYSQL Error (2003) 10060 mid way through work

Posted: 12 Oct 2013 10:24 AM PDT

I was working on some queries when HeidiSQL froze. I tried to restart the connection and got the good old MySQL Error (2003) (10060). It had worked just fine before that.

I haven't made any firewall changes, and I checked the whitelist of IPs on AWS; it was still fine. I have encountered this error code before, but never in the middle of work with no changes.

Thoughts?

Edit 1:
I turned the firewall off and still get the same error.

Edit 2:
It suddenly works again, but I would like to know why this happened. Connection issues?

insufficient privileges while executing oracle stored procedure?

Posted: 12 Oct 2013 08:26 AM PDT

I'm getting an insufficient privileges error while executing the following Oracle stored procedure. I'm using Oracle Database 10g Express Edition.

CREATE OR REPLACE
PROCEDURE sp_update_acounts(
    accounts_file_dir  IN VARCHAR2,
    accounts_file_name IN VARCHAR2)
IS
BEGIN
  EXECUTE IMMEDIATE 'CREATE OR REPLACE DIRECTORY ext_accounts_dir AS '''||accounts_file_dir||'''';
  EXECUTE IMMEDIATE 'grant read, write on directory ext_accounts_dir to myuser';
  EXECUTE IMMEDIATE 'drop table crm_account_stage';
  EXECUTE IMMEDIATE 'CREATE TABLE crm_account_stage (entity_account_id NUMBER(19,0), crm_id VARCHAR2(255 CHAR)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_accounts_dir ACCESS PARAMETERS (FIELDS TERMINATED BY '','' ( entity_account_id CHAR(225), crm_id CHAR(225))) LOCATION ('''||accounts_file_name||''''||') )';
  MERGE INTO ua_crm_accounts acc
  USING (
    SELECT entity_account_id,
           crm_id
    FROM crm_account_stage) acc_stage
  ON (acc_stage.entity_account_id = acc.pkey)
  WHEN MATCHED THEN
    UPDATE SET acc.crm_id = acc_stage.crm_id;
END;

I'm using the post "Update oracle sql database from CSV" to build this SP. I could compile the stored procedure successfully. I have all the rights for the Oracle user because I'm the admin, and I have granted all possible rights.

But when I execute the SP, I get an error like this:

Error starting at line 13 in command:
execute sp_update_acounts('C:\Users\surenr\Desktop\UA\Intitial-Conversion','acc_data.csv')
Error report:
ORA-01031: insufficient privileges
ORA-06512: at "myuser.SP_UPDATE_ACOUNTS", line 7
ORA-06512: at line 1
01031. 00000 -  "insufficient privileges"
*Cause:    An attempt was made to change the current username or password
           without the appropriate privilege. This error also occurs if
           attempting to install a database without the necessary operating
           system privileges.
           When Trusted Oracle is configure in DBMS MAC, this error may occur
           if the user was granted the necessary privilege at a higher label
           than the current login.
*Action:   Ask the database administrator to perform the operation or grant
           the required privileges.
           For Trusted Oracle users getting this error although granted the
           the appropriate privilege at a higher label, ask the database
           administrator to regrant the privilege at the appropriate label.

Update: I'm not trying to change a username or password, but the error message reads as if I were modifying user details. When I run the same code step by step outside the stored procedure, it executes without any problem.

What could be the reason for this? How can i resolve the issue?

One Materialized Views in Two Refresh Groups

Posted: 12 Oct 2013 02:40 AM PDT

I have five materialized views that I want to refresh on two occasions: every Sunday and on the 1st of every month. I created a refresh group for the weekly refresh, and that works fine. But when I tried to create a second refresh group for the monthly refresh, I got "materialized view is already in a refresh group".

Can a materialized view belong to only one refresh group?

What options do I have to refresh it in different intervals?

"ORA-03113: end-of-file on communication channel" on startup

Posted: 12 Oct 2013 06:26 AM PDT

I have been reading posts here, on Oracle support, and anywhere else I can find for the last three days and I've given up on this problem...

An Oracle database hung. Shutdown of the database sat for a few hours and then it quit. It wouldn't restart. The server was restarted. Oracle was restarted. Going step by step: startup nomount works, alter database mount works, alter database open returns ORA-03113. This is all on localhost - not over the network. The machine has no firewall of any kind running.

Any idea how to get past this ORA-03113 error? I've been on the phone with support in India for the last 4.5 hours and I haven't found anyone helpful yet.

Create Login command error

Posted: 12 Oct 2013 03:26 PM PDT

What is wrong in the following screenshot? [Screenshot of the CREATE LOGIN command and its error]

I am trying solutions posted here and here to add a login

MySQL gives me:“Can't open and lock privilege tables: Table 'host' is read only”

Posted: 12 Oct 2013 12:26 PM PDT

I am having a problem restoring a MySQL database. My primary database was on MySQL 5.1, and now I am trying to copy it to MySQL 5.5. The database was backed up using Xtrabackup.

I am using Ubuntu 12.04.3 LTS on this server, MySQL version is: 5.5.32-0ubuntu0.12.04.1-log.

I have followed all the steps to restore using Xtrabackup, this created database files, which I have copied to a tmp directory.

I have modified my.cnf to point to this tmp directory. I have changed the tmp directory permissions and changed the ownership of the files to mysql user.

drwxr-xr-x 12 mysql mysql 4096 Sep 10 10:04 base  

Now when I start the MySQL server I get this error:

[ERROR] Fatal error: Can't open and lock privilege tables: Table 'host' is read only

Here is what I have tried so far:

  1. Installed MySQL 5.1 to see whether the version was the issue.
  2. Tried chcon mysql_db_t to change the context, but it gives me:

    can't apply partial context to unlabelled file

  3. Used --skip-grant to get into the database, but this way I can access only InnoDB tables; MyISAM tables throw a read-only error.

  4. After --skip-grant, ran mysql_upgrade. This throws errors stating that many tables are read-only.
  5. Removed AppArmor and restarted.
  6. I previously restored a different database (5.1 to 5.5) on Ubuntu 12.04.2 LTS without any issues.

Can someone please point me in the right direction? I am not sure what's wrong with the permissions.

Percona Xtradb Cluster : How to speed up insert?

Posted: 12 Oct 2013 02:26 PM PDT

I recently installed a cluster of 3 full master nodes based on Percona XtraDB (very easy install). Now I need to do some tuning to speed up INSERT/UPDATE requests. Currently I run around 100 inserts every 5 minutes, along with around 400 updates in the same period. All of these operations took less than 3 minutes on a single-server architecture. Now, with three nodes, they take more than 5 minutes ...

Is there any tuning I can do to speed up these operations? Here is my current cnf configuration:

[mysqld]
datadir=/var/lib/mysql
user=mysql

wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_address=gcomm://dbnode01,dbnode02,dbnode03

binlog_format=ROW
default_storage_engine=InnoDB
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
wsrep_node_address=1.2.3.4
wsrep_cluster_name=my_cluster
wsrep_sst_method=xtrabackup
wsrep_sst_auth="user:password"

Here is the hardware configuration of the three servers:

Node#1

CPU: Single Processor Quad Core Xeon 3470 - 2.93Ghz (Lynnfield) - 1 x 8MB cache w/HT
RAM: 8 GB DDR3 Registered 1333
HDD: 500GB SATA II

Node#2

CPU: Single Processor Quad Core Xeon 1270 V2 - 3.50GHz (Ivy Bridge) - 1 x 8MB cache w/HT
RAM: 4 GB DDR3 1333
HDD: 1.00TB SATA II

Node#3

CPU: Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz (4-Cores)
RAM: 32G
HDD: 2T

UPDATE

Actually, there are around 2.4M records (24 fields each) in the table affected by the INSERT/UPDATE statements (6 fields are indexed).

How to design a table that each rows have 5K boolean attributes?

Posted: 12 Oct 2013 11:26 AM PDT

I have about 2M rows and each row looks like the following.

244 true false ... true

-> One integer column(V) and about 5K boolean columns(B1, B2, ..., B5K) associated to the integer.

Due to the limitation of the maximum number of columns that I can have for a row, I have separated the boolean columns(attributes) in a separate table.

Table V:

idx_V value_V
--------------
1     244
...

Table B:

idx_V idx_B value_B
--------------------
1     1     true
1     2     false
...
1     5K    true
...

This design works alright when I try to find V's that match one boolean column. For example, finding V's where the 2nd boolean attribute is true:

select VT.value_V
from V_Table as VT
join B_Table as BT on VT.idx_V = BT.idx_V
where BT.idx_B = 2
  and BT.value_B = true

But the query becomes awful when I have to find V's that match multiple boolean columns, sometimes even all 5K columns, such as finding V's with B1=true, B2=false, B3=true, ... and B5K=false.

My primary use of the tables would be the following 2:

  1. Find V's whose x1-th, x2-th, ..., xn-th boolean columns are false/true (n can be anything between 1 and 5K)
  2. Sublists:
    • (2-A) Find the sequence of the boolean columns for a specific V: T F T T F F ...
    • (2-B) Find other V's that match the sequence found in 2-A

I'm thinking about using a varchar(5K) field to store the boolean sequence to support use case 2, but that seems wasteful: each boolean needs only 1 bit, yet I would be allocating a whole byte for it.

What would be the best way to go about this?
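The "1 bit per boolean" idea mentioned above can be sketched in Python as a bitset with a mask-and-compare filter. This is only an illustration under assumptions: the helper names pack and make_filter, the four-column sample rows, and the convention that bit i-1 holds column B_i are all made up for the demo, and it does not show how a particular database would store or index such a value.

```python
def pack(bits):
    """Pack a sequence of booleans into an int; bit i-1 holds column B_i."""
    value = 0
    for i, b in enumerate(bits):
        if b:
            value |= 1 << i
    return value


def make_filter(conditions):
    """conditions maps a 1-based column index to the required boolean.

    Returns (mask, expected): a row matches when row & mask == expected.
    """
    mask = expected = 0
    for idx, wanted in conditions.items():
        bit = 1 << (idx - 1)
        mask |= bit
        if wanted:
            expected |= bit
    return mask, expected


# Hypothetical rows: value_V -> packed boolean columns (only 4 columns here).
rows = {
    244: pack([True, False, True, True]),
    245: pack([True, True, False, True]),
    246: pack([True, False, True, True]),
}

# Use case 1: B1 = true and B2 = false, all other columns ignored.
mask, expected = make_filter({1: True, 2: False})
matches = [v for v, bits in rows.items() if bits & mask == expected]

# Use case 2: other V's with exactly the same sequence as V = 244.
same_as_244 = [v for v, bits in rows.items() if v != 244 and bits == rows[244]]

# Storage for 5K booleans at one bit each, rounded up to whole bytes.
storage_bytes = (5000 + 7) // 8

print(matches, same_as_244, storage_bytes)
```

Packed this way, 5K booleans fit in 625 bytes per row, and use case 2 becomes a plain equality comparison on the packed value.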

Is there a way to implement a cross-database task on SQL Server 2012 with the Availability Groups feature?

Posted: 12 Oct 2013 10:25 AM PDT

We use SQL Server 2012 and its new Availability Groups (AG) feature. We have a task that moves old data from some tables in one database to another database. The two databases belong to different availability groups.

Previously (before using the AG feature) the task was resolved by adding the second server instance as a linked server (sp_addlinkedserver) and executing a distributed transaction in the following way:

  1. begin transaction
  2. insert old data into server2.table2 from server1.table1
  3. delete old data from server1.table1
  4. commit transaction
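The four steps above can be sketched in Python. This is an illustration of the step order only: SQLite's ATTACH stands in for the linked server, the table layout and cutoff date are made up, and the sketch does not address the availability group limitation discussed in this question.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so BEGIN/COMMIT/ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("ATTACH DATABASE ':memory:' AS archive")  # stand-in for server2
conn.execute("CREATE TABLE main.table1 (id INTEGER PRIMARY KEY, created TEXT)")
conn.execute("CREATE TABLE archive.table2 (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO main.table1 VALUES (?, ?)",
    [(1, "2011-01-01"), (2, "2012-06-01"), (3, "2013-10-01")],
)

cutoff = "2013-01-01"  # hypothetical definition of "old data"

conn.execute("BEGIN")                                   # 1. begin transaction
try:
    conn.execute(                                       # 2. copy old rows
        "INSERT INTO archive.table2 "
        "SELECT id, created FROM main.table1 WHERE created < ?",
        (cutoff,),
    )
    conn.execute(                                       # 3. delete old rows
        "DELETE FROM main.table1 WHERE created < ?", (cutoff,)
    )
    conn.execute("COMMIT")                              # 4. commit transaction
except Exception:
    conn.execute("ROLLBACK")
    raise

remaining = conn.execute("SELECT id, created FROM main.table1 ORDER BY id").fetchall()
archived = conn.execute("SELECT id, created FROM archive.table2 ORDER BY id").fetchall()
print(remaining, archived)
```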

Unfortunately, distributed transactions are not supported for AG because databases may become inconsistent in case of failover (http://technet.microsoft.com/en-us/library/ms366279.aspx).

Is there some way to implement this task with keeping the AG feature and without implementing the rollback logic in case of exceptions?

Migrating from SQL Server to MySQL using MySQL Workbench tool

Posted: 12 Oct 2013 10:24 AM PDT

I'm trying to migrate a few tables from SQL Server to MySQL using the MySQL Workbench migration wizard. Everything works fine for the structure migration, but when I get to the data migration step it throws an error for one table:

ERROR: dbo.Documents:SQLExecDirect(SELECT [DocumentID], [CategoryID], CAST([DocumentName] as NVARCHAR(255)) as [DocumentName], [Active], [NavigatorID], CAST([DocumentText] as NTEXT) as [DocumentText], [UseSubtitle], CAST([DocumentSubtitle] as NVARCHAR(255)) as [DocumentSubtitle], CAST([DocumentPlainText] as NTEXT) as [DocumentPlainText], [DocumentType], CAST([DocumentLink] as NVARCHAR(255)) as [DocumentLink], [Sitemap], CAST([SubtitleImage] as NVARCHAR(255)) as [SubtitleImage], CAST([MetaTags] as NVARCHAR(8000)) as [MetaTags], CAST([MetaDescription] as NVARCHAR(8000)) as [MetaDescription], [AccessLevel] FROM [ctool_test].[dbo].[Documents]): 42000:1131:[Microsoft][ODBC SQL Server Driver][SQL Server]The size (8000) given to the convert specification 'nvarchar' exceeds the maximum allowed for any data type (4000).

2131:[Microsoft][ODBC SQL Server Driver][SQL Server]The size (8000) given to the convert specification 'nvarchar' exceeds the maximum allowed for any data type (4000).

From what I can understand, SQL Server limits nvarchar columns to a maximum size of 4000, while MySQL can handle 65535.

Any clue how I can get this to work?

Thanks

Restoring database to UNC path on local drive

Posted: 12 Oct 2013 10:26 AM PDT

When I try to restore a database using a restore command with a local UNC path:

RESTORE DATABASE [dbname]
FROM DISK = N'\\PC91\D\backup.BAK' WITH FILE = 1,
  MOVE N'test' TO N'\\PC91\D\dbname.MDF',
  MOVE N'test_log' TO N'\\PC91\D\dbname_log.LDF',
  NOUNLOAD, STATS = 10

I get an error:

Msg 3634, Level 16, State 1, Line 1
The operating system returned the error '5(Access is denied.)' while attempting 'CreateFileW' on '\\PC91\D\dbname.MDF'.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

If I use a local drive letter instead, then it works:

RESTORE DATABASE [dbname]
FROM DISK = N'D:\backup.BAK' WITH FILE = 1,
  MOVE N'test' TO N'D:\dbname.MDF',
  MOVE N'test_log' TO N'D:\dbname_log.LDF',
  NOUNLOAD, STATS = 10

This command also restores the database to same folder. So why is there an error when I specify the network path?

how to run Db2 export command in shell

Posted: 12 Oct 2013 09:26 AM PDT

I am trying to run the following DB2 command through the Python pyodbc module:

IBM DB2 command: "DB2 export to C:\file.ixf of ixf select * from emp_hc"

I connect to the DSN successfully using pyodbc, and SELECT statements work fine,

but when I try to execute the following command from Python IDLE 3.3.2:

cursor.execute(" export to ? of ixf select * from emp_hc",r"C:\file.ixf")
pyodbc.ProgrammingError: ('42601', '[42601] [IBM][CLI Driver][DB2/LINUXX8664] SQL0104N An unexpected token "db2 export to ? of" was found following "BEGIN-OF-STATEMENT". Expected tokens may include: "". SQLSTATE=42601\r\n (-104) (SQLExecDirectW)')

or

cursor.execute(" export to C:\file.ixf of ixf select * from emp_hc")

Traceback (most recent call last):
  File "", line 1, in
    cursor.execute("export to C:\myfile.ixf of ixf select * from emp_hc")
pyodbc.ProgrammingError: ('42601', '[42601] [IBM][CLI Driver][DB2/LINUXX8664] SQL0007N The character "\" following "export to C:" is not valid. SQLSTATE=42601\r\n (-7) (SQLExecDirectW)')

Am I doing something wrong? Any help will be greatly appreciated.

From what I have learned, db2 export is a command that is run in a shell, not through SQL via ODBC.

Can you please give me some more information on how to run the command in the shell? I am confused about what that means. Any guide or quick tutorial would be great.
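One way to read "run in a shell" is to hand the EXPORT command to the db2 command line processor instead of sending it through ODBC. The Python sketch below assumes the db2 executable is installed and on the PATH of the machine running the script (on Windows this usually means starting from a DB2 command window). The database name SAMPLE and the helper names build_export_script and run_export are hypothetical. The sketch writes a small script and passes it to db2 with the -t, -v, and -f options.

```python
import subprocess
import tempfile


def build_export_script(database, out_path, query):
    """Return a DB2 CLP script that connects, exports, and disconnects."""
    return (
        f"CONNECT TO {database};\n"
        f"EXPORT TO {out_path} OF IXF {query};\n"
        "CONNECT RESET;\n"
    )


def run_export(database, out_path, query):
    """Write the script to a temp file and pass it to the db2 command line processor."""
    script = build_export_script(database, out_path, query)
    with tempfile.NamedTemporaryFile("w", suffix=".sql", delete=False) as fh:
        fh.write(script)
    # -t: statements end with ';', -v: echo each command, -f: read commands from a file
    return subprocess.run(["db2", "-tvf", fh.name], capture_output=True, text=True)


# Only build and show the script here; run_export is not called in this sketch.
script_text = build_export_script("SAMPLE", r"C:\file.ixf", "SELECT * FROM emp_hc")
print(script_text)
```

Calling run_export("SAMPLE", r"C:\file.ixf", "SELECT * FROM emp_hc") would then execute the script; the export file is written on the machine where db2 runs, not on the database server.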

Import from incremental backups to a new host in Oracle 11g

Posted: 12 Oct 2013 05:26 PM PDT

I am using Oracle 11g. I would like to know whether it is possible to import from incremental level 0 and level 1 backups to a new host using RMAN. If so, how can I do that?

For level 1 I am using differential method.

InnoDB Tablespace critical error in great need of a fix

Posted: 12 Oct 2013 07:26 AM PDT

Link to screenshot : http://www.nouvellesduquartier.com/i/1/p/Munin_%20Critical_MySql_InnoDB_.JPG (The value reported is outside the allowed range - Byte free, free, gauge, warn, critic)

Question: Could the error shown on the screenshot be the reason why my site is very slow?

If so, I really need help fixing it, since I am far from being an engineer! Thank you in advance.

Synchronize MySQL databases between local and hosted servers automatically

Posted: 12 Oct 2013 05:26 AM PDT

We have many websites, each with development, staging, and production servers, and many developers working on many projects. We need a solution that synchronizes each developer database with the staging database; once that works, we can move on to the live database.

The synchronization needs to be fully automatic, so that developers don't have to run a tool each and every time.

effective mysql table/index design for 35 million rows+ table, with 200+ corresponding columns (double), any combination of which may be queried

Posted: 12 Oct 2013 06:26 PM PDT

I am looking for advice on table/index design for the following situation:

I have a large table (stock price history data, InnoDB, 35 million rows and growing) with a compound primary key (assetid (int), date (date)). In addition to the pricing information, I have 200 double values that need to correspond to each record.

CREATE TABLE `mytable` (
  `assetid` int(11) NOT NULL,
  `date` date NOT NULL,
  `close` double NOT NULL,
  `f1` double DEFAULT NULL,
  `f2` double DEFAULT NULL,
  `f3` double DEFAULT NULL,
  `f4` double DEFAULT NULL,
  ... skip a few …
  `f200` double DEFAULT NULL,
  PRIMARY KEY (`assetid`, `date`)
) ENGINE=`InnoDB` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci ROW_FORMAT=COMPACT CHECKSUM=0 DELAY_KEY_WRITE=0
PARTITION BY RANGE COLUMNS(`date`) PARTITIONS 51;

I initially stored the 200 double columns directly in this table for ease of update and retrieval. This had been working fine, as the only querying done on this table was by assetid and date (these are religiously included in any query against this table), and the 200 double columns were only read. My database size was around 45 GB.

However, I now have a requirement to query this table by any combination of these 200 columns (named f1, f2, ... f200), for example:

select * from mytable
where assetid in (1,2,3,4,5,6,7,....)
and date > '2010-1-1' and date < '2013-4-5'
and f1 > -0.23 and f1 < 0.9
and f117 > 0.012 and f117 < .877
etc,etc

I have not had to deal with this much data before, so my first instinct was that indexes were needed on each of these 200 columns, or I would wind up with large table scans. To me this meant that I needed a table for each of the 200 columns, holding the primary key and the value, with an index on the value. So I went with that.

CREATE TABLE `f1` (
  `assetid` int(11) NOT NULL DEFAULT '0',
  `date` date NOT NULL DEFAULT '0000-00-00',
  `value` double NOT NULL DEFAULT '0',
  PRIMARY KEY (`assetid`, `date`),
  INDEX `val` (`value`)
) ENGINE=`InnoDB` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci ROW_FORMAT=COMPACT CHECKSUM=0 DELAY_KEY_WRITE=0;

I filled up and indexed all 200 tables. I left the main table intact with all 200 columns, since it is regularly queried over an assetid and date range with all 200 columns selected. I figured that leaving those columns in the parent table (unindexed) for read purposes, and additionally having them indexed in their own tables (for join filtering), would be most performant. I ran EXPLAIN on the new form of the query:

select count(p.assetid) as total
from mytable p
inner join f1 f1 on f1.assetid = p.assetid and f1.date = p.date
inner join f2 f2 on f2.assetid = p.assetid and f2.date = p.date
where p.assetid in(1,2,3,4,5,6,7)
and p.date >= '2011-01-01' and p.date < '2013-03-14'
and(f1.value >= 0.96 and f1.value <= 0.97 and f2.value >= 0.96 and f2.value <= 0.97)

My desired result was indeed achieved: EXPLAIN shows that far fewer rows are scanned for this query. However, I wound up with some undesirable side effects.

1) My database went from 45 GB to 110 GB. I can no longer keep the db in RAM. (I do have 256 GB of RAM on the way, however.)

2) Nightly inserts of new data now need to be done 200 times instead of once.

3) Maintenance/defrag of the 200 new tables takes 200 times longer than for just the one table. It cannot be completed in a night.

4) Queries against the f1, etc. tables are not necessarily performant. For example:

select min(value) from f1
where assetid in (1,2,3,4,5,6,7)
and date >= '2013-3-18' and date < '2013-3-19'

The above query, while EXPLAIN shows it looking at < 1000 rows, can take 30+ seconds to complete. I assume this is because the indexes are too large to fit in memory.

Since that was a lot of bad news, I looked further and found partitioning. I implemented partitions on the main table, partitioned on date every 3 months. Monthly seemed to make sense to me, but I have read that once you get over 120 partitions or so, performance suffers. Partitioning quarterly will leave me under that for the next 20 years or so. Each partition is a bit under 2 GB. I ran EXPLAIN PARTITIONS and everything seems to be pruning properly, so regardless, I feel the partitioning was a good step, at the very least for analyze/optimize/repair purposes.

I spent a good deal of time with this article

http://ftp.nchu.edu.tw/MySQL/tech-resources/articles/testing-partitions-large-db.html

My table is currently partitioned with the primary key still on it. The article mentions that primary keys can make a partitioned table slower, but that if you have a machine that can handle it, primary keys on the partitioned table will be faster. Knowing I have a big machine on the way (256 GB RAM), I left the keys on.

So as I see it, here are my options:

Option 1

Remove the extra 200 tables and let the query do table scans to find the f1, f2, etc. values. Non-unique indexes can actually hurt performance on a properly partitioned table. Run an EXPLAIN before the user runs the query and deny it if the number of rows scanned is over some threshold I define. Save myself the pain of the giant database. Heck, it will all be in memory soon anyway.

sub-question:

Does it sound like I have chosen an appropriate partition scheme?

Option 2

Partition all 200 tables using the same 3-month scheme. Enjoy the smaller row scans and allow the users to run larger queries. Now that they are partitioned, at least I can manage them one partition at a time for maintenance purposes. Heck, it will all be in memory soon anyway. Develop an efficient way to update them nightly.

sub-question:

Do you see a reason why I might avoid primary key indexes on these f1, f2, f3, f4... tables, knowing that I always have assetid and date when querying? It seems counterintuitive to me, but I am not used to data sets of this size. I assume that would shrink the database a good deal.

Option 3

Drop the f1, f2, f3, ... columns from the master table to reclaim that space. Do 200 joins if I need to read 200 features; maybe it won't be as slow as it sounds.

Option 4

You all have a better way to structure this than i have thought of so far.

* NOTE: I will soon be adding another 50-100 of these double values to each item, so I need to design knowing that is coming.

Thanks for any and all help.

Update #1 - 3/24/2013

I went with the idea suggested in the comments i got below and created one new table with the following setup:

create table `features` (
  assetid int,
  date    date,
  feature varchar(4),
  value   double
)

I partitioned the table in 3 month intervals.

I blew away the earlier 200 tables so that my database was back down to 45 Gig and started filling up this new table. A day and a half later, it completed, and my database now sits at a chubby 220 Gigs!

It does allow the possibility of removing these 200 values from the master table, as I can get them from one join, but that would really only give me back 25 Gigs or so.

I asked it to create a primary key on assetid, date, feature and an index on value. After 9 hours of chugging it really hadn't made a dent and seemed to freeze up, so I killed that part off.

I rebuilt a couple of the partitions, but it did not seem to reclaim much, if any, space.

So that solution looks like it probably isn't going to be ideal. Do rows take up significantly more space than columns, I wonder? Could that be why this solution took up so much more space?
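One way to look into where the space went is to compare row data against index data per partition. The query below is a minimal sketch, assuming MySQL 5.1 or later (where INFORMATION_SCHEMA.PARTITIONS is available) and that the table is named features as in the setup above. For InnoDB the row counts and sizes it reports are estimates.

```sql
-- Approximate size of each partition of the features table,
-- split into row data and index data (values are estimates for InnoDB).
SELECT PARTITION_NAME,
       TABLE_ROWS,
       ROUND(DATA_LENGTH  / 1024 / 1024 / 1024, 2) AS data_gb,
       ROUND(INDEX_LENGTH / 1024 / 1024 / 1024, 2) AS index_gb
FROM   INFORMATION_SCHEMA.PARTITIONS
WHERE  TABLE_SCHEMA = DATABASE()      -- assumption: run from inside the schema that holds the table
  AND  TABLE_NAME   = 'features'
ORDER  BY PARTITION_ORDINAL_POSITION;
```

Note that in the one-row-per-value layout every stored double repeats assetid, date and a feature label, plus the per-row storage overhead, whereas a wide row pays those costs once for all 200 values. That alone may account for a large part of the growth.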

I came across this article

http://www.chrismoos.com/2010/01/31/mysql-partitioning-tables-with-millions-of-rows

it gave me an idea.

where he says

"At first, I thought about RANGE partitioning by date, and while I am using the date in my queries, it is very common for a query to have a very large date range, and that means it could easily span all partitions."

Now I am range partitioning by date as well, but will also be allowing searches by large date range, which will decrease the effectiveness of my partitioning. I will always have a date range when I search; however, I will also always have a list of assetids. Perhaps my solution should be to partition by assetid and date, where I identify typically searched assetid ranges (which I can come up with, as there are standard lists: S&P 500, Russell 2000, etc.). This way I would almost never look at the entire data set.

Then again, I am primary keyed on assetid and date anyways, so maybe that wouldn't help much.
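The "partition by assetid and date" idea could be sketched in MySQL as RANGE partitions on the date with HASH subpartitions on assetid. This is only an illustration under assumptions: the partition names, the boundary dates and the subpartition count of 4 are made up for the example. MySQL requires every column used in the partitioning expression to be part of every unique key, which is why the primary key includes both columns.

```sql
-- Sketch: 3-month RANGE partitions on date, each split by HASH on assetid.
-- Partition names, boundaries and SUBPARTITIONS 4 are hypothetical.
CREATE TABLE features (
    assetid INT        NOT NULL,
    `date`  DATE       NOT NULL,
    feature VARCHAR(4) NOT NULL,
    value   DOUBLE,
    PRIMARY KEY (assetid, `date`, feature)
)
PARTITION BY RANGE (TO_DAYS(`date`))
SUBPARTITION BY HASH (assetid)
SUBPARTITIONS 4 (
    PARTITION p2013q1 VALUES LESS THAN (TO_DAYS('2013-04-01')),
    PARTITION p2013q2 VALUES LESS THAN (TO_DAYS('2013-07-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```

Subpartitions in MySQL can only be HASH or KEY, so assetids are spread by modulus rather than grouped into named lists such as the S&P 500. A query with a date range and an explicit list of assetids can still skip the subpartitions that none of the listed ids hash into.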

any more thoughts/comments would be appreciated

thanks

SELECTing multiple columns through a subquery

Posted: 12 Oct 2013 01:26 PM PDT

I am trying to SELECT 2 columns from the subquery in the following query, but am unable to do so. I tried creating an alias table, but still couldn't get them.

    SELECT DISTINCT petid, userid,
      (SELECT MAX(comDate) FROM comments WHERE petid=pet.id) AS lastComDate,
      (SELECT userid FROM comments WHERE petid=pet.id ORDER BY id DESC LIMIT 1) AS lastPosterID
    FROM pet LEFT JOIN comments ON pet.id = comments.petid
    WHERE userid='ABC' AND deviceID!='ABC' AND comDate>=DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 2 MONTH);

Basically, I am trying to get the lastComDate & lastPosterID from the same row - the row which is the latest one in comments for the specific pet. Please suggest how I can get them in an efficient way.

The above query works, but seems like overkill, as the same row is fetched twice. Moreover, the ORDER BY clause is significantly slower than the aggregate function, as I found while profiling the query. So, a solution that avoids sorting would be appreciated.
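One common pattern for reading several columns from the latest row per group is to find that row's key once in a derived table and join back to it. The sketch below assumes that comments.id is an auto-increment key, so the highest id per pet is also the latest comment; the question does not state this. The filters from the original WHERE clause are left out to keep the pattern visible.

```sql
-- Find the id of the newest comment per pet once, then join back to
-- comments so both columns come from that single row. No ORDER BY needed.
SELECT p.id      AS petid,
       c.comDate AS lastComDate,
       c.userid  AS lastPosterID
FROM   pet AS p
JOIN  (SELECT petid, MAX(id) AS lastId   -- assumption: highest id = latest comment
       FROM   comments
       GROUP  BY petid) AS latest
       ON latest.petid = p.id
JOIN   comments AS c
       ON c.id = latest.lastId;
```

With an index on comments (petid, id), the MAX(id) per pet should be answerable from the index alone, which is what replaces the per-row sort.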

Designing Simple Schema for Disaggregation of Demand Forecast

Posted: 12 Oct 2013 04:26 PM PDT

I am doing a simple database design task as a training exercise where I have to come up with a basic schema design for the following case:

I have a parent-child hierarchy of products (example, Raw Material > Work in Progress > End Product).

  • Orders are placed at each level.
  • Number of orders shall be viewable in weekly buckets for the next 6 months.
  • Demand forecast can be done for each product level.
  • Demand forecast for any week within next 6 months can be done today.
  • Demand forecast is done for weekly buckets, for the next 6 months.

Demand forecast is usually done at a higher level in the hierarchy (Raw Material or Work in Progress level). It has to be disaggregated to a lower level (End Product).

There are 2 ways in which demand forecast can be disaggregated from a higher level to lower level:

  1. User specifies a percentage distribution for the end products. Say there is a forecast of 1000 for Work in Progress, and the user wants 40% for End Product 1 and 60% for End Product 2 in bucket 10. Then for the 10th week (Sunday to Saturday) from now, the forecast value for End Product 1 would be 400 and for End Product 2 would be 600.
  2. User says to disaggregate according to the orders placed against end products in a given bucket. If the orders in bucket 5 for End Product 1 and 2 are 200 and 800 respectively, then the forecast value for EP1 would be ((200/1000) * 100)% = 20% and for EP2 would be ((800/1000) * 100)% = 80% of the forecast for 'Work in Progress'.
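The second, order-based method can be sketched in Oracle with the RATIO_TO_REPORT analytic function. The bind variables :parent_id, :bucket_start and :parent_forecast are hypothetical names introduced for this example, and the choice of delivery_date to decide which bucket an order falls in is an assumption.

```sql
-- Share of each child product in the orders of one weekly bucket,
-- applied to the parent's forecast for that bucket.
SELECT o.prod_id,
       COUNT(*)                                            AS orders_in_bucket,
       RATIO_TO_REPORT(COUNT(*)) OVER ()                   AS order_share,
       :parent_forecast * RATIO_TO_REPORT(COUNT(*)) OVER () AS forecast_value
FROM   orders o
JOIN   product_hierarchy p ON p.id = o.prod_id
WHERE  p.parent_id      = :parent_id           -- e.g. the id of 'work in progress'
  AND  o.delivery_date >= :bucket_start        -- Sunday that opens the bucket
  AND  o.delivery_date <  :bucket_start + 7    -- up to, not including, the next Sunday
GROUP  BY o.prod_id;
```

With 200 and 800 orders for the two end products and a parent forecast of 1000, the shares come out as 0.2 and 0.8, matching the 20% and 80% in the example above.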

Forecast shall be viewable in weekly buckets for the next 6 months and the ideal format should be:

product name | bucket number | week start date | week end date | forecast value | created_on  

PRODUCT_HIERARCHY table could look like this:

    id  |   name                |   parent_id
    __________________________________________
    1   |   raw material        |   (null)
    2   |   work in progress    |   1
    3   |   end product 1       |   2
    4   |   end product 2       |   2

ORDERS table might look like this:

id | prod_id | order_date | delivery_date | delivered_date  

where,

prod_id is a foreign key that references the id of the PRODUCT_HIERARCHY table.

How to store forecast? What would be a good basic schema for such a requirement?


My idea to select orders for 26 weekly buckets is:

    SELECT
        COUNT(*) TOTAL_ORDERS,
        WIDTH_BUCKET(
            delivery_date,
            SYSDATE,
            ADD_MONTHS(sysdate, 6),
            TO_NUMBER( TO_CHAR(SYSDATE,'DD-MON-YYYY') - TO_CHAR(ADD_MONTHS(sysdate, 6),'DD-MON-YYYY') ) / 7
        ) BUCKET_NO
    FROM
        orders_table
    WHERE
        delivery_date BETWEEN SYSDATE AND ADD_MONTHS(sysdate, 6);

But this will give weekly buckets starting from today irrespective of the day. How can I convert them to Sunday to Saturday weeks in Oracle?
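One possible way to get Sunday-to-Saturday buckets is to map every date to the Sunday that starts its week and count weeks from the current week's Sunday. The sketch below relies on TRUNC(d, 'IW'), which returns the ISO Monday of the week regardless of NLS settings; shifting by one day before and after turns that into a Sunday start. The aliases week_start and this_week_start are names made up for the example, and bucket 1 is the current (possibly partial) week.

```sql
-- Weekly order counts in Sunday-to-Saturday buckets for the next 6 months.
SELECT (week_start - this_week_start) / 7 + 1 AS bucket_no,
       week_start                             AS week_start_date,
       week_start + 6                         AS week_end_date,
       COUNT(*)                               AS total_orders
FROM  (SELECT TRUNC(delivery_date + 1, 'IW') - 1 AS week_start,      -- Sunday on or before delivery_date
              TRUNC(SYSDATE + 1, 'IW') - 1       AS this_week_start  -- Sunday on or before today
       FROM   orders_table
       WHERE  delivery_date BETWEEN SYSDATE AND ADD_MONTHS(SYSDATE, 6))
GROUP  BY week_start, this_week_start
ORDER  BY week_start;
```

Weeks with no orders do not appear in the result; if every bucket must be listed, the query would need an outer join against a generated list of week start dates.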

Please help me design this database structure.

(will be using Oracle 11g)

Cast to date is sargable but is it a good idea?

Posted: 12 Oct 2013 05:25 AM PDT

In SQL Server 2008 the date datatype was added.

In this connect item you can see that casting a datetime column to date is sargable and may use an index on the datetime column.

    select *
    from T
    where cast(DateTimeCol as date) = '20130101';

The other option you have is to use a range instead.

    select *
    from T
    where DateTimeCol >= '20130101' and
          DateTimeCol < '20130102'

Are these queries equally good or should one be preferred over the other?

SQL Fiddle

T SQL Table Valued Function to Split a Column on commas

Posted: 12 Oct 2013 07:55 AM PDT

I wrote a Table Valued Function in Microsoft SQL Server 2008 that takes a comma-delimited column in a database and spits out a separate row for each value.

Ex: "one,two,three,four" would return a new table with only one column containing the following values:

    one
    two
    three
    four

Does this code look error prone to you guys? When I test it with

SELECT * FROM utvf_Split('one,two,three,four',',')   

it just runs forever and never returns anything. This is getting really disheartening especially since there are no built in split functions on MSSQL server (WHY WHY WHY?!) and all the similar functions I've found on the web are absolute trash or simply irrelevant to what I'm trying to do.

Here is the function:

    USE *myDBname*
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    ALTER FUNCTION [dbo].[utvf_SPlit] (@String VARCHAR(MAX), @delimiter CHAR)
    RETURNS @SplitValues TABLE
    (
        Asset_ID VARCHAR(MAX) NOT NULL
    )
    AS
    BEGIN
        DECLARE @FoundIndex INT
        DECLARE @ReturnValue VARCHAR(MAX)

        SET @FoundIndex = CHARINDEX(@delimiter, @String)

        WHILE (@FoundIndex <> 0)
        BEGIN
            DECLARE @NextFoundIndex INT
            SET @NextFoundIndex = CHARINDEX(@delimiter, @String, @FoundIndex+1)
            SET @ReturnValue = SUBSTRING(@String, @FoundIndex,@NextFoundIndex-@FoundIndex)
            SET @FoundIndex = CHARINDEX(@delimiter, @String)
            INSERT @SplitValues (Asset_ID) VALUES (@ReturnValue)
        END

        RETURN
    END
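In the function above, @FoundIndex is recomputed inside the loop with CHARINDEX(@delimiter, @String), which always searches from the start of the string, so it never changes and the WHILE condition never becomes false. The SUBSTRING also starts at the delimiter itself rather than after it. A minimal corrected sketch follows; it tracks the start of the current token and searches forward from there. The name utvf_SplitSketch is hypothetical, chosen so it does not overwrite the original function.

```sql
CREATE FUNCTION dbo.utvf_SplitSketch (@String VARCHAR(MAX), @Delimiter CHAR(1))
RETURNS @SplitValues TABLE
(
    Asset_ID VARCHAR(MAX) NOT NULL
)
AS
BEGIN
    -- A NULL input yields an empty result instead of violating NOT NULL.
    IF @String IS NULL RETURN;

    DECLARE @Start INT = 1;   -- position where the current token begins
    DECLARE @FoundIndex INT = CHARINDEX(@Delimiter, @String, @Start);

    WHILE (@FoundIndex <> 0)
    BEGIN
        -- Token runs from @Start up to, not including, the delimiter.
        INSERT @SplitValues (Asset_ID)
        VALUES (SUBSTRING(@String, @Start, @FoundIndex - @Start));

        -- Move past the delimiter and search forward from there.
        SET @Start = @FoundIndex + 1;
        SET @FoundIndex = CHARINDEX(@Delimiter, @String, @Start);
    END

    -- Whatever remains after the last delimiter is the final token.
    INSERT @SplitValues (Asset_ID)
    VALUES (SUBSTRING(@String, @Start, DATALENGTH(@String)));

    RETURN;
END
```

Calling SELECT * FROM dbo.utvf_SplitSketch('one,two,three,four', ',') with this version returns the four rows one, two, three and four. An empty input string returns a single empty row, and consecutive delimiters produce empty rows; whether that is the desired behaviour depends on the data.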
