Sunday, October 6, 2013

[how to] best approach to keys, if data from separate DBs to be presented in same UI


best approach to keys, if data from separate DBs to be presented in same UI

Posted: 06 Oct 2013 07:17 PM PDT

Suppose we are creating databases containing information about monkeys, and a user interface to present monkey information to users.

We have two separate databases. The schemas are identical (and are at least intended to remain so) but they are on different machines and have information on different monkeys. This is inconvenient but unavoidable.

The user interface we're building queries both databases and presents all the monkeys in the same filterable HTML table to the user.

What's the best approach to generating IDs for monkeys? Is it acceptable to prefix the ID according to which database the monkeys are in?
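One common approach is to keep each database's own auto-increment ID and expose a composite (source, id) key to the UI, which prefixing effectively does. A minimal MySQL-flavoured sketch of the merge the UI layer might perform, assuming hypothetical monkeys(monkey_id, name) tables and source labels 'A' and 'B' (in practice the two SELECTs would run against the two servers and be combined in application code):

-- Tag every row with its source database so ('A', 42) can never collide
-- with ('B', 42); the prefixed ui_id is what the combined table displays.
SELECT 'A' AS source_db, CONCAT('A-', monkey_id) AS ui_id, name
FROM monkeys_db_a.monkeys
UNION ALL
SELECT 'B' AS source_db, CONCAT('B-', monkey_id) AS ui_id, name
FROM monkeys_db_b.monkeys;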

The Wikipedia dump format [on hold]

Posted: 06 Oct 2013 09:30 AM PDT

Wikipedia dump files are available in XML format. (http://dumps.wikimedia.org/fawiki/latest/)

Is there any point in publishing the dumps in XML format rather than as SQL? The dump files are bulkier as XML than as SQL.

Software and hardware solutions required to setup MYSQL database(Wikipedia DB) size of 25 TB [on hold]

Posted: 06 Oct 2013 11:01 AM PDT

I have a situation involving the Wikipedia database dumps, which are multiple terabytes in size, so I need to discuss it with the community. Please see the following description.

Background: I have to download the data that contains all Wikipedia articles and their complete revision histories. The complete revisions amount to terabytes.

I have to download all the XML files in 7z format whose names contain "pages-meta-history", for the complete revisions.

Problem: I am now setting up the MySQL database from the Wikipedia XML dumps. If we have to import all revision dumps into the MySQL DB, it may reach roughly 22-25 terabytes.

This size is huge; what is the workaround for it?

If it is necessary to load all the XML into the DB, then I think Linux only supports up to 16 TB maximum. So how can we import it into a single MySQL server deployed on Linux?

I need advice on the following points:

  1. Can a MySQL DB manage such large data (terabytes in size) efficiently?

  2. Is a cluster (e.g. Hadoop) solution feasible? How do other organizations, e.g. Wikipedia (a non-profit), manage such large data? I assume not every organization can afford very expensive servers.

  3. We can't buy expensive server machines with terabytes of built-in storage, so what is the solution in terms of hardware and software? For example, are the following two solutions that we explored workable?

Solution 1: a cluster of 4 Core i3 PCs, each with 4 hard disks (4 × 2 TB); build a storage server with OpenNAS and NFS and mount it on a Linux directory.

Solution 2: configure a Core i7 server machine with a hardware RAID controller and software RAID that can support 16 TB or more.
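Regarding the 16 TB concern above: if that figure refers to the ext4 maximum file size, it is a per-file limit rather than a per-database limit, so one workaround is to keep every table (or partition) in its own file. A minimal MySQL sketch, with a hypothetical revision table and an arbitrary partition count:

-- innodb_file_per_table stores each table/partition in its own .ibd file,
-- so no single file has to hold the whole 22-25 TB.
SET GLOBAL innodb_file_per_table = 1;

-- Hypothetical revision table, hash-partitioned so each partition file
-- stays far below any filesystem file-size limit.
CREATE TABLE revision_text (
  rev_id   BIGINT UNSIGNED NOT NULL,
  page_id  BIGINT UNSIGNED NOT NULL,
  rev_text LONGTEXT,
  PRIMARY KEY (rev_id)
) ENGINE=InnoDB
  PARTITION BY HASH (rev_id) PARTITIONS 64;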


Statistics:

  Files to download: 158
  Download size (7z compressed): 66.93 GB
  Extracted size (after XML to SQL conversion): 22.7 TB
  Estimated time to import into MySQL: 25-30 days actual (based on experiments following Stack Overflow community DB performance suggestions)

DB Performance Suggestion 1: MYSQL Insert Huge SQL Files of GB in Size

DB Performance Suggestion 2: MYSQL Huge SQL Files Insertion | MyISAM speed suddenly slow down for Insertions (strange issue)

Note: I posted this question as a researcher working on Wikipedia data in a university research lab, so this is not a commercial/corporate problem. I am asking as a student so that it helps me and other researchers too. We can't spend much money on expensive terabyte-scale servers; we want a software and hardware solution (e.g. cluster programming) that is affordable for students, with some tweaks.

Why does @@dbts increase after backup/restore?

Posted: 06 Oct 2013 03:56 AM PDT

I have a SQL Server 2005 database, containing some tables which have a Timestamp (or RowVersion) column. I need to know the value of the current timestamp, so I use the following query: SELECT CAST(@@dbts AS BIGINT);

This will return, for example, 10505.

Immediately afterwards, without updating, inserting, ... anything, I do a BACKUP DATABASE and a RESTORE DATABASE and I run the SELECT query again. Only this time, the result is 14000, while none of the timestamps in the tables have increased.
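For reference, a minimal repro sketch of that sequence (database name and backup path are hypothetical):

USE MyDb;
SELECT CAST(@@DBTS AS BIGINT) AS dbts_before;   -- e.g. 10505

USE master;
BACKUP DATABASE MyDb TO DISK = N'C:\backup\MyDb.bak' WITH INIT;
RESTORE DATABASE MyDb FROM DISK = N'C:\backup\MyDb.bak' WITH REPLACE;

USE MyDb;
SELECT CAST(@@DBTS AS BIGINT) AS dbts_after;    -- larger, even though no rows changed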

Why/how does this happen?

How to DROP multiple tables from databases?

Posted: 06 Oct 2013 02:00 PM PDT

I have a situation where someone mistakenly executed a table-creation script against the wrong database.

Let's say we have database DB1, which is a test database, and we have to create another database, DB2. The script for creating tables was intended for DB2, but someone executed it against DB1.

Now my database DB1 has its own correct tables plus the incorrect ones. I know how to write a SELECT query to view them, and I know that all the wrongly created tables were created on a specific date, let's say 2009-12-12. But the tables are related by foreign keys, so I can't simply drop them in the order they appear; I must first remove all the foreign keys and then drop all of the tables.

Can someone suggest how to do that?

I can do it with a cursor, but that is not an elegant solution.
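A set-based sketch of the usual pattern: generate the ALTER/DROP statements from the catalog views, review them, then execute. This assumes SQL Server (sys.tables, sys.foreign_keys) and the 2009-12-12 creation date mentioned above; inspect the generated script before running it.

DECLARE @sql NVARCHAR(MAX) = N'';

-- 1) Drop every foreign key defined on a table created on 2009-12-12.
SELECT @sql += N'ALTER TABLE ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.'
             + QUOTENAME(t.name) + N' DROP CONSTRAINT ' + QUOTENAME(fk.name) + N';' + CHAR(10)
FROM sys.foreign_keys AS fk
JOIN sys.tables AS t ON fk.parent_object_id = t.object_id
WHERE CAST(t.create_date AS DATE) = '2009-12-12';

-- 2) Then drop the tables themselves.
SELECT @sql += N'DROP TABLE ' + QUOTENAME(SCHEMA_NAME(schema_id)) + N'.'
             + QUOTENAME(name) + N';' + CHAR(10)
FROM sys.tables
WHERE CAST(create_date AS DATE) = '2009-12-12';

PRINT @sql;                  -- review first
-- EXEC sp_executesql @sql;  -- then run it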

Mysql Getting Access denied for user 'root'@'localhost' (using password: YES) constantly in log

Posted: 06 Oct 2013 05:25 AM PDT

I have MonYog running and it is monitoring my DB server. It gives me constant alerts that the number of refused connections has exceeded the threshold, etc. Looking in the MySQL log I can see constant login failures for root@localhost.

This is on a hosted server out in the ether so I have been as strict as possible with access etc. Firstly, the 3306 port on the firewall is open only to a couple of selected IP addresses. All other users (for dev and testing etc) use SSH. Secondly, there is no 'root' user. I renamed the root account to something else. This new named account only has localhost access.

So from the above, I just don't see how any login attempts are reaching the DB from localhost. Port 3306 is closed to all but one IP address (mine), I am not attempting any connections to the server through the night as the logs show, and anyway I never connect as the (new) root DBA user unless I need to do some special admin task.

Is there any way of finding out where these login attempts are coming from (IP address, host name, application name, etc.)?

Or am I looking in the wrong place? I just can't work out where they are coming from, but I am convinced it can't be good!
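One low-impact way to see who is knocking is to turn the general query log on briefly and look at the Connect entries; a sketch, assuming you can tolerate the logging overhead for a short window:

-- Log to a table so it can be queried, then enable the general log.
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';

-- Wait for a few of the refused connections, then:
SELECT event_time, user_host, command_type, argument
FROM mysql.general_log
WHERE command_type = 'Connect'
ORDER BY event_time DESC
LIMIT 50;

SET GLOBAL general_log = 'OFF';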

Thanks for your help Mark

inner joins with where clause for no values?

Posted: 06 Oct 2013 09:25 AM PDT

I have 2 tables :

Table1

  Parent(varchar) Child (varchar)  

Table2

  C1(varchar)    PC (varchar)  

Sample data:

(sample data was provided as an image in the original post)

Requirement: I need the Table2.C1 values for which Table2.PC = Table1.Child, but the Child values must be different from Table1.Parent's values.

I'm using the query below in MySQL:

select distinct(C1), Child, PC
from Table2
inner join Table1 on Table2.PC = Table1.Child
where Table1.Child not in (select Parent from Table1);

It returns an empty set, but there are values in Child that match values in PC and do not appear in Parent.

Where am I going wrong?
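One likely culprit (an assumption, since the sample data is only described): if Parent contains even a single NULL, NOT IN (subquery) returns no rows at all. A NOT EXISTS rewrite is immune to that:

-- Same filter written with NOT EXISTS, which ignores NULLs in Parent
-- instead of discarding every row.
SELECT DISTINCT t2.C1, t1.Child, t2.PC
FROM Table2 AS t2
INNER JOIN Table1 AS t1 ON t2.PC = t1.Child
WHERE NOT EXISTS (
    SELECT 1 FROM Table1 AS p WHERE p.Parent = t1.Child
);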

Queryplan changes depending on filter values

Posted: 06 Oct 2013 01:25 PM PDT

I created a clustered index on a table expecting it to make queries with ranges perform better, but different values in the where clause can produce different query plans: one uses the clustered index and one does not.

My question is: What can I do to make the DBMS use the better query plan? Or better yet, should I change my schema to something better?

Details:

  • I'm using Amazon's RDS (Server version: 5.5.31-log)
  • I executed optimize table on each table (expecting it to "rebuild" the clustered index and reset the statistics); sometimes it changes nothing, sometimes the DBMS picks a worse query plan, and sometimes it gets faster because the clustered index is used.
  • explain extended followed by show warnings did not produce anything interesting/useful.
  • I'm aware of index hinting. I tested it and the query plan used the primary index, but I don't know whether it always works. Also, I'm using Django, and Django's ORM does not support index hinting, so a solution that does not require it would be nice.

Tables:

-- DDL
create table api_route (
   id int(11) not null auto_increment primary key,
   origin_id int(11) not null,
   destination_id int(11) not null,
   group_id int(11) not null,
   foreign key (origin_id) references api_area (id),
   foreign key (destination_id) references api_area (id),
   foreign key (group_id) references api_group (id)
) engine=innodb, collate=utf8;

create table api_area (
  id int(11) not null auto_increment primary key,
  name varchar(50) not null
) engine=innodb, collate=utf8;

create table api_location (
  id int(11) not null auto_increment primary key,
  area_id int(11),
  foreign key (area_id) references api_area (id)
) engine=innodb, collate=utf8;

create table api_locationzip (
   location_ptr_id int(11) not null,
   zip_start int(11) not null,
   zip_end int(11) not null,
   foreign key locationzip_to_location (location_ptr_id) references api_location (id),
   constraint primary key using btree (zip_start, zip_end, location_ptr_id)
) engine=innodb, collate=utf8;

create table api_locationkey (
  location_ptr_id int(11) not null,
  key varchar(10) not null,
  foreign key locationkey_to_location (location_ptr_id) references api_location (id)
) engine=innodb, collate=utf8;

Query:

An area has many locations, every location has either a zip or key.

select * from
  api_route,
  api_area origin,
  api_area destination,
  api_location location_origin,
  api_location location_destination,
  api_locationzip origin_zip,
  api_locationzip destination_zip
where
  api_route.group_id IN (1,2,3,...) and
  -- filter origin by zip code
  api_route.origin_id = origin.id and
  origin.id = location_origin.area_id and
  location_origin.id = origin_zip.location_ptr_id and
  origin_zip.zip_start <= <zipcode_origin> and
  origin_zip.zip_end >= <zipcode_origin> and
  -- filter destination by zip code
  api_route.destination_id = destination.id and
  destination.id = location_destination.area_id and
  location_destination.id = destination_zip.location_ptr_id and
  destination_zip.zip_start <= <zipcode_destination> and
  destination_zip.zip_end >= <zipcode_destination>
limit 100

Execution plans:

Here is an explain of a slow query (~1.6s):

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: destination
         type: index
possible_keys: PRIMARY
          key: api_area_group_id_599f0627e68b9613_uniq
      key_len: 156
          ref: NULL
         rows: 3794
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: api_route
         type: ref
possible_keys: api_route_0261d0a2,api_route_8de262d6
          key: api_route_8de262d6
      key_len: 4
          ref: master.T6.id
         rows: 9
        Extra:
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: origin
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: master.api_route.origin_id
         rows: 1
        Extra:
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: location_origin
         type: ref
possible_keys: PRIMARY,api_location_a4563695
          key: api_location_a4563695
      key_len: 4
          ref: master.origin.id
         rows: 39
        Extra: Using where; Using index
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: origin_zip
         type: ref
possible_keys: PRIMARY,locationzip_to_location
          key: locationzip_to_location
      key_len: 4
          ref: master.location_origin.id
         rows: 1
        Extra: Using where; Using index
*************************** 6. row ***************************
           id: 1
  select_type: SIMPLE
        table: location_destination
         type: ref
possible_keys: PRIMARY,api_location_a4563695
          key: api_location_a4563695
      key_len: 4
          ref: master.destination.id
         rows: 39
        Extra: Using index
*************************** 7. row ***************************
           id: 1
  select_type: SIMPLE
        table: destination_zip
         type: ref
possible_keys: PRIMARY,locationzip_to_location
          key: locationzip_to_location
      key_len: 4
          ref: master.location_destination.id
         rows: 1
        Extra: Using where; Using index
7 rows in set (0.00 sec)

Here is the explain of a fast query (~100ms):

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: destination_zip
         type: range
possible_keys: PRIMARY,locationzip_to_location
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 119268
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: location_destination
         type: eq_ref
possible_keys: PRIMARY,api_location_a4563695
          key: PRIMARY
      key_len: 4
          ref: master.destination_zip.location_ptr_id
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: api_route
         type: ref
possible_keys: api_route_0261d0a2,api_route_8de262d6
          key: api_route_8de262d6
      key_len: 4
          ref: master.location_destination.area_id
         rows: 9
        Extra:
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: origin
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: master.api_route.origin_id
         rows: 1
        Extra:
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: location_origin
         type: ref
possible_keys: PRIMARY,api_location_a4563695
          key: api_location_a4563695
      key_len: 4
          ref: master.origin.id
         rows: 39
        Extra: Using where; Using index
*************************** 6. row ***************************
           id: 1
  select_type: SIMPLE
        table: origin_zip
         type: ref
possible_keys: PRIMARY,locationzip_to_location
          key: locationzip_to_location
      key_len: 4
          ref: master.location_origin.id
         rows: 1
        Extra: Using where; Using index
*************************** 7. row ***************************
           id: 1
  select_type: SIMPLE
        table: destination
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: master.location_destination.area_id
         rows: 1
        Extra:
7 rows in set (0.00 sec)
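Regarding the index-hinting option mentioned above, a sketch of pinning the fast plan's entry point with FORCE INDEX, purely as an experiment (the group list and zip codes are illustrative, and the Django ORM cannot emit the hint):

select *
from
  api_route,
  api_area origin,
  api_area destination,
  api_location location_origin,
  api_location location_destination,
  api_locationzip origin_zip,
  api_locationzip destination_zip FORCE INDEX (PRIMARY)
where
  api_route.group_id IN (1, 2, 3) and
  api_route.origin_id = origin.id and
  origin.id = location_origin.area_id and
  location_origin.id = origin_zip.location_ptr_id and
  origin_zip.zip_start <= 10000 and
  origin_zip.zip_end >= 10000 and
  api_route.destination_id = destination.id and
  destination.id = location_destination.area_id and
  location_destination.id = destination_zip.location_ptr_id and
  destination_zip.zip_start <= 20000 and
  destination_zip.zip_end >= 20000
limit 100;

Running ANALYZE TABLE on the four tables to refresh index statistics is a lighter first step than OPTIMIZE TABLE and can sometimes stabilize the plan choice on its own.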

Edit: Added create table code and full query

How to build a database that contain only the delta from yesterday

Posted: 06 Oct 2013 07:25 PM PDT

I need to know what has changed in my database since last night. Is it possible to extract this data from the LDF file and build a new database that contains the delta?

For example, let's say I have a table of users, and a new user was added while another user updated his home address. I need to be able to build a new database whose users table contains two records: 1. the new user (plus a new column indicating whether it is a new or updated row), and 2. the user that updated his record (it would be nice to know which record has been updated).

BTW, I have two SQL Servers that I can use (2008 and 2012).
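Reading the LDF directly needs third-party tooling, but since a SQL Server 2008 instance is available, Change Data Capture gives a similar result through a supported interface. A sketch, assuming an edition that supports CDC and a hypothetical dbo.Users table:

-- Enable CDC on the database and on the table whose delta we want.
USE MyDb;
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Users',
     @role_name     = NULL;

-- Later: pull everything captured since last night.
DECLARE @from binary(10) = sys.fn_cdc_map_time_to_lsn('smallest greater than', '2013-10-05 20:00');
DECLARE @to   binary(10) = sys.fn_cdc_get_max_lsn();

-- __$operation: 1 = delete, 2 = insert, 3 = update (before), 4 = update (after).
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Users(@from, @to, N'all');

The result set could then be copied into the separate "delta" database, with __$operation standing in for the new/updated indicator column described above.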

Thanks In Advance

Why does Log Shipping .TRN file copy just stop

Posted: 06 Oct 2013 02:25 PM PDT

I apologize in advance for a long post, but I have had it up to here with this error, having to delete the LS configuration and start it over for any DB that gets it.

I have LS set up on 3 Win2k8R2 servers (primary, secondary, monitor), with the transaction logs of 100 databases backed up and shipped from the primary to the secondary and monitored by the monitor. Backups and copies run every 15 minutes, and files older than 24 hours are deleted. Some DBs are very active and some not so much, but all are shipped regardless, for uniformity's sake (basically to make the secondary server identical to the primary). Some DBs are for SP2010 and the majority for an in-house app.

The issue is that after all the LS configs are set up, everything works well for about 3 to 4 days. Then I go to the Transaction LS Status report on the secondary and see that, seemingly at random, some LS jobs have an Alert Status because the time since the last copy is over 45 minutes, so no restore has occurred.

The only errors I see are from an SP2010 DB (WebAnalyticsServiceApplication_ReportingDB_77a60938_##########), which I believe is a reporting DB that gets created weekly, and LS cannot figure out which is the last copy to back up or restore. I posted here about that and have yet to find a permanent solution. For my main error (time since last copy) I have not seen anything that could have caused it, and I don't get any messages (even though some alert statuses have been ignored for 3 days). Anyway, I would really appreciate any input on understanding what's causing this and how I could fix it. Thanks.
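Besides the report, the monitor tables themselves show which databases have stalled and when; a sketch to run on the secondary (or the monitor), assuming the default msdb log shipping tables:

-- Per secondary database: when a file was last copied and last restored,
-- to spot the ones whose copy job has silently stopped.
SELECT secondary_database,
       last_copied_file,
       last_copied_date,
       last_restored_file,
       last_restored_date
FROM msdb.dbo.log_shipping_monitor_secondary
ORDER BY last_copied_date;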

Complicated join with where clause

Posted: 06 Oct 2013 04:25 AM PDT

I have four tables:

  • sales
  • sales_thumbs
  • sales_images
  • sales_sizes

sales table:

+--------------+---------------------+------+-----+---------+----------------+
| Field        | Type                | Null | Key | Default | Extra          |
+--------------+---------------------+------+-----+---------+----------------+
| sale_id      | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| store_id     | bigint(20) unsigned | NO   |     | NULL    |                |
| store_name   | varchar(100)        | NO   |     | NULL    |                |
| sale_url     | varchar(255)        | NO   | UNI | NULL    |                |
| headline     | varchar(200)        | NO   |     | NULL    |                |
| description  | text                | NO   |     | NULL    |                |
| category     | varchar(100)        | NO   |     | NULL    |                |
| sub_category | varchar(100)        | NO   |     | NULL    |                |
| brand        | varchar(100)        | NO   |     | NULL    |                |
| gender       | varchar(5)          | NO   |     | NULL    |                |
| full_img_url | varchar(240)        | NO   |     | NULL    |                |
| onsale       | int(1)              | NO   |     | NULL    |                |
| soldout      | int(1)              | NO   |     | 0       |                |
| created_at   | datetime            | NO   |     | NULL    |                |
| updated_at   | datetime            | NO   |     | NULL    |                |
| parsed       | int(1)              | NO   |     | 0       |                |
+--------------+---------------------+------+-----+---------+----------------+

sales_thumbs:

+------------+---------------------+------+-----+---------+-------+
| Field      | Type                | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+-------+
| sale_id    | bigint(20) unsigned | NO   | MUL | NULL    |       |
| thumb_name | varchar(240)        | NO   | PRI | NULL    |       |
+------------+---------------------+------+-----+---------+-------+

sales_images:

+------------+---------------------+------+-----+---------+-------+
| Field      | Type                | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+-------+
| sale_id    | bigint(20) unsigned | NO   | MUL | NULL    |       |
| image_name | varchar(240)        | NO   | PRI | NULL    |       |
+------------+---------------------+------+-----+---------+-------+

sales_sizes:

+---------+---------------------+------+-----+---------+-------+
| Field   | Type                | Null | Key | Default | Extra |
+---------+---------------------+------+-----+---------+-------+
| sale_id | bigint(20) unsigned | NO   | MUL | NULL    |       |
| size    | varchar(10)         | NO   |     | NULL    |       |
| country | varchar(20)         | NO   |     | NULL    |       |
+---------+---------------------+------+-----+---------+-------+

I'm looking to build a query that let's me:

  • SELECT * FROM sales WHERE [something] GROUP BY created_at LIMIT [something] OFFSET [something]
  • JOIN sales_thumbs and sales_images ON sale_id
  • (most importantly) JOIN sales_sizes ON sale_id WHERE sales_sizes.size = [size]

But here's the catch: I want to return ALL sales_sizes.size for a unique sale_id that has a specific size among all its sizes.

For example, if sale_id = 8655 has the following sizes, I want to return all of them when 'm' (for example) is one of its sizes...

mysql> select * from sales_sizes where sale_id = 8655;
+---------+------+---------+
| sale_id | size | country |
+---------+------+---------+
|    8655 | s    | eu      |
|    8655 | m    | eu      |
|    8655 | l    | eu      |
+---------+------+---------+
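A sketch of one way to express that requirement, assuming MySQL and the columns above: filter the sales on the requested size with EXISTS, then join sales_sizes again unconditionally so every size of each matching sale comes back. The filter values mirror the example GET string further down, the page size of 20 is an assumption, and ORDER BY created_at is used where the bullet list says GROUP BY, on the assumption that date ordering is the intent:

-- Every size of each sale that has size 'm', still filtered and paginated.
SELECT s.*, t.thumb_name, i.image_name, z.size, z.country
FROM sales AS s
JOIN sales_thumbs AS t ON t.sale_id = s.sale_id
JOIN sales_images AS i ON i.sale_id = s.sale_id
JOIN sales_sizes  AS z ON z.sale_id = s.sale_id          -- all sizes, not just 'm'
WHERE s.category = 'shirts'
  AND s.gender = 'women'
  AND EXISTS (SELECT 1 FROM sales_sizes AS f
              WHERE f.sale_id = s.sale_id AND f.size = 'm')
ORDER BY s.created_at DESC
LIMIT 20 OFFSET 20;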

Extra info:

I'm building the query from a GET string, which could look something like this:

www.mysite.com?category=shirts&gender=women&size=m&page=2

Hope someone can help

Thanks

Optimize UNION query in MySQL

Posted: 06 Oct 2013 03:25 AM PDT

I have a problem with a UNION query in MySQL. We have 10 million players on our website and we would like to select players with a multi-criteria system, for example selecting US people, men, more than 35 years of age.

We are using "vertical partionning": 1 table per criter. For example:

* user_country
  - id_user
  - id_country

We would like to do this kind of query:

SELECT id_inscri FROM userdata_langue
WHERE id_langue='43'
  UNION
SELECT id_inscri FROM userdata_sexe
WHERE sexe='2'
  UNION
SELECT id_inscri FROM userdata_nb_jour
WHERE nb_jour>='31'
  UNION
SELECT id_inscri FROM userdata_last
WHERE last<='2013-04-07'
  AND last>='2013-04-03';

How do I optimize that?

----- More details

Explain output of the query:

id    select_type   table             type   possible_keys  key        key_len  ref    rows     Extra
1     PRIMARY       userdata_langue   ref    id_langue      id_langue  1        const  398846   Using index
2     UNION         userdata_sexe     ref    sexe           sexe       1        const  1667137  Using index
3     UNION         userdata_nb_jour  range  nb_jour        nb_jour    2        NULL   5830     Using where; Using index
4     UNION         userdata_last     range  last           last       3        NULL   371614   Using where; Using index
NULL  UNION RESULT  <union1,2,3,4>    ALL    NULL           NULL       NULL     NULL   NULL

SHOW CREATE TABLE

Table: userdata_langue
Create Table:
CREATE TABLE `userdata_langue` (
  `id_inscri` bigint(20) NOT NULL,
  `id_langue` tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (`id_inscri`),
  KEY `id_langue` (`id_langue`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
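One thing to double-check before tuning (an observation, not part of the original question): UNION returns players matching any criterion, whereas "US people, men, more than 35 years of age" reads like an AND of criteria. If that is the intent, joining the per-criterion tables on id_inscri may be both correct and cheaper; a sketch assuming the other tables are keyed by id_inscri like userdata_langue:

-- Players that satisfy all four criteria at once.
SELECT l.id_inscri
FROM userdata_langue  AS l
JOIN userdata_sexe    AS s ON s.id_inscri = l.id_inscri
JOIN userdata_nb_jour AS j ON j.id_inscri = l.id_inscri
JOIN userdata_last    AS d ON d.id_inscri = l.id_inscri
WHERE l.id_langue = 43
  AND s.sexe = 2
  AND j.nb_jour >= 31
  AND d.last BETWEEN '2013-04-03' AND '2013-04-07';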

Execution of a java program by a trigger

Posted: 06 Oct 2013 02:25 AM PDT

Can an update trigger execute a Java program?

I have a table T1 with a column named Flag. Whenever Flag changes, I want a trigger to run that results in the execution of a Java program.
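Whether a trigger can call out directly depends on the DBMS (Oracle can run Java stored procedures, SQL Server can host CLR code, while stock MySQL triggers cannot launch external programs). A portable fallback is to have the trigger only record the change and let an external Java process poll for it; a MySQL-flavoured sketch with hypothetical table and column names:

-- Queue table the Java program polls (e.g. every few seconds over JDBC).
CREATE TABLE t1_flag_events (
  event_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  t1_id      INT NOT NULL,
  old_flag   INT,
  new_flag   INT,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  processed  TINYINT(1) NOT NULL DEFAULT 0
);

DELIMITER $$
CREATE TRIGGER t1_flag_changed
AFTER UPDATE ON T1
FOR EACH ROW
BEGIN
  -- <=> is NULL-safe equality, so changes to or from NULL are caught too.
  IF NOT (NEW.Flag <=> OLD.Flag) THEN
    INSERT INTO t1_flag_events (t1_id, old_flag, new_flag)
    VALUES (NEW.id, OLD.Flag, NEW.Flag);
  END IF;
END$$
DELIMITER ;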

Twice job results in sql server, last one is left in progress

Posted: 06 Oct 2013 07:25 AM PDT

I have a job, MP - Create CSV, which executes an SSIS package on SQL Server 2008. When I check the job history I see the result at step 0; when I expand it, I have 2 results for step 1.
The first result has the Success icon with the message:

Message
Executed as user: companyname\sa. The step succeeded.

The second one is also for step 1, but with the In Progress icon and the message:

Message
Microsoft (R) SQL Server Execute Package Utility
Version 10.50.4000.0 for 64-bit
Copyright (C) Microsoft Corporation 2010. All rights reserved.

Started: 11:50:00 PM
DTExec: The package execution returned DTSER_SUCCESS (0).
Started: 11:50:00 PM
Finished: 11:50:26 PM
Elapsed: 26.645 seconds

My problem is that I have a report that shows all jobs whose status is not equal to 1 (success); my report no longer works because of this "In Progress" result.
Here is the SQL I use to run my report:

SELECT *  from  msdb..sysjobhistory WHERE run_status != 1  

My questions are: why do I have 2 results for this step? I think I should have only one. And what should I do when a job has a step whose status is left "In progress"?

For more information, the job runs daily and succeeds every time. Thank you.
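For the report itself, a workaround sketch based on the documented run_status codes for msdb.dbo.sysjobhistory (0 = failed, 1 = succeeded, 2 = retry, 3 = canceled, 4 = in progress) is to exclude the in-progress rows explicitly:

-- Step-level rows for real failures, retries and cancellations only,
-- ignoring the lingering "In Progress" entries.
SELECT j.name AS job_name, h.step_id, h.step_name,
       h.run_date, h.run_time, h.run_status
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
WHERE h.run_status NOT IN (1, 4);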

An admin (GUI or shell tool) for managing all kinds of databases

Posted: 06 Oct 2013 09:21 PM PDT

I need an interface/admin tool (web or desktop) from which I can manage all the databases on my system. Currently I use a separate admin for each (phpMyAdmin for MySQL, pgAdmin/phpPgAdmin for Postgres, SQLite Manager in Firefox for SQLite, etc.). Whenever I have to work with all the databases at once, the only option I am left with is shell access to each of them.

I wanted to know if there is an admin tool that can manage all kinds of databases in a single interface, possibly similar to the one used by Jennifer Widom in her Coursera lectures?

Can 2 tables lock each other if concurrent read/update do not affect same rows? pagelocks off

Posted: 06 Oct 2013 01:25 AM PDT

Can 2 queries cause a deadlock if page locks are off and the rows involved are not related? E.g.

Query 1 runs for a few seconds

Select * from Orders where CustomerID = 1 and DateOrdered between @D1 and @D2  

Query 2 starts to run, and ends before Query 1 is finished, but with a different CustomerID:

Update Orders set Quantity = Quantity + 10 where OrderID = 20 and CustomerID = 2

(These are just examples to illustrate the point.)

Now there is PK_Orders, which is a clustered index on OrderID. There is also a separate index (ix_Order_Customer), a non-clustered index that contains CustomerID as one of its indexed fields. This index has page locks enabled.

What I don't understand is how SQL Server (with trace flag 1222) showed in the log that PK_Orders caused a deadlock, when the executed query did not even affect the same OrderIDs that were updated.

Extract from Log File Viewer:

 keylock hobtid=720229632 dbid=5 objectname=myDB.dbo.Orders   indexname=PK_Orders id=lock1b0b23c0 mode=X associatedObjectId=720229632  

and

 pagelock fileid=1 pageid=195848 dbid=5 objectname=myDB.dbo.Orders id=lock24442a40   mode=S associatedObjectId=720229274  

Can someone help explain how this is possible? Should the deadlock not only occur if there were intersecting rows here? I copied the 2 queries and ran them at the same time, yet I cannot recreate the error; in SQL Server Management Studio this error does not occur and the update goes through.

The error in SQL Event Log Viewer specifically says that PK_Orders is the cause here.

MySQL incredibly slow on WAMP and XAMPP

Posted: 05 Oct 2013 11:25 PM PDT

I've installed WAMP on my new Windows 8 machine, and for some reason when I use phpMyAdmin to import an SQL file (it's only about 5 MB), it takes several minutes before timing out. When I then check, I can see it has only imported a fraction of the tables.

I've tried other SQL files and it's the same story. These same files import perfectly fine (in a couple of seconds) on my Windows 7 machine, also running WAMP.

I've since uninstalled WAMP and tried XAMPP, and the problem still exists. Even just browsing the databases with PHPMyAdmin takes a long time between page loads.

I'm guessing it's a problem with MySQL. Any suggestions would be helpful.
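A hedged starting point, since slow imports on a fresh Windows install often come down to InnoDB flushing on every statement: relax the flush settings for the duration of the import and run it from the mysql command line instead of through phpMyAdmin's upload handling. The variables below are standard MySQL settings; the values are only illustrative.

-- Run before the import, then restore the defaults (or set the equivalents in my.ini).
SET GLOBAL innodb_flush_log_at_trx_commit = 2;   -- flush the log roughly once a second, not per commit
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;

-- Then, from a shell:  mysql -u root -p mydb < dump.sql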
