Friday, September 27, 2013

[how to] Optimizing join between big tables

Optimizing join between big tables

Posted: 27 Sep 2013 05:26 PM PDT

I run a database server on my PC and need to pull information across several large tables (~20 million rows each). I installed WAMP on Windows 7 and use MySQL for this.

I have no idea how to tune the MySQL configuration to get faster queries. Lately a join between two tables of approximately 22 million records takes hours.

  • Only 1 user (me) uses this server.
  • My tables have indexes and ~20 columns.
  • Processor: i7
  • RAM: 12 GB
  • HDD: an ordinary spinning disk
  • MySQL 5.6.12

How can I optimize this?
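Since the schema isn't posted, here is a hedged first-pass checklist; the table and column names (orders, customers, customer_id) are hypothetical stand-ins for the 22MM-row tables:

  -- See whether MySQL can use an index for the join:
  EXPLAIN
  SELECT o.*
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id;

  -- If the plan shows type: ALL (a full scan) on either table,
  -- index the join column:
  ALTER TABLE orders ADD INDEX idx_customer (customer_id);

  -- Also check how much RAM InnoDB may use; WAMP defaults are tiny.
  -- On a 12 GB single-user box, several GB is reasonable
  -- (set innodb_buffer_pool_size in my.ini and restart):
  SHOW VARIABLES LIKE 'innodb_buffer_pool_size';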

What concepts does an MSSQL DBA need to learn? [on hold]

Posted: 27 Sep 2013 02:00 PM PDT

I am an MSSQL DBA. I have gone through the basic concepts, and I have now decided to build my career as an MSSQL DBA. Please let me know which concepts I should learn to become a successful SQL DBA. Please help me with this.

MySQL possibility to create global routines (stored procedures and/or functions)

Posted: 27 Sep 2013 01:32 PM PDT

Is it possible to somehow define globally available routines? It seems that every routine must be created in the scope of a database.

When I try to create a routine from the console (without first issuing USE dbname), I get an error:

ERROR 1046 (3D000): No database selected

We have tons of identical databases (the data differs) and the goal is to create some triggers on certain tables. But we want to maintain only one routine, so we don't have to create the same routines in every database (since the databases are identical, the routines would work the same in each one).
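One common workaround (a sketch; MySQL has no truly global routines): keep all routines in one dedicated database and call them fully qualified from anywhere. The routines_db name is hypothetical:

  CREATE DATABASE routines_db;

  DELIMITER //
  CREATE PROCEDURE routines_db.do_stuff(IN p_id INT)
  BEGIN
    SELECT p_id;  -- placeholder body
  END //
  DELIMITER ;

  -- Callable from any session, no USE required:
  CALL routines_db.do_stuff(42);

Note that triggers themselves cannot live outside their table's database, so each database still needs its own triggers; at best their bodies shrink to one-line calls into the shared routines.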

Multiple system.replset collections in local

Posted: 27 Sep 2013 01:03 PM PDT

I have a replica set that's showing multiple system.replset collections in the local db.

replset1:PRIMARY> show collections;
me
oplog.rs
replset.minvalid
slaves
system.indexes
system.profile
system.replset
system.replset

Running db.system.replset.find() returns the documents from both without error.

They are different versions: one shows the current membership, the other an older setup. I haven't seen this before. Is this expected?

I'm not really seeing any problems from the mongo or application side, this just seems strange. The duplicates are present on the primary and secondaries.

Should .remove({"version":}) be safe?

MySQL query optimization with aggregate ORDER BY

Posted: 27 Sep 2013 01:01 PM PDT

I have an analytics table that stores metrics aggregated at the day level. I need to ask questions such as "For a given date range and client, what are the most popular items?"

The schema is as follows:

CREATE TABLE `t1` (
  `imps` bigint(20) DEFAULT NULL,
  `n_id` int(11) NOT NULL,
  `b_id` bigint(20) NOT NULL,
  `date` date NOT NULL,
  `m1` bigint(20) NOT NULL,
  `m2` bigint(20) NOT NULL,
  `is_b` tinyint(1) NOT NULL,
  `i_id` varchar(128) NOT NULL,
  KEY `idx1` (`b_id`,`date`,`i_id`,`n_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

(This table is split into 256 partitions by HASH(b_id), in case that matters)

A representative query:

SELECT i_id, n_id, SUM(m1), SUM(m2)
FROM t1
WHERE b_id=1234 AND date>='2013-06-01' AND date<='2013-08-31'
GROUP BY i_id, n_id
ORDER BY (SUM(m1) + SUM(m2)) DESC
LIMIT 10 OFFSET 0

And the EXPLAIN output:

+----+-------------+-------+------+---------------+------+---------+-------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows    | Extra                                        |
+----+-------------+-------+------+---------------+------+---------+-------+---------+----------------------------------------------+
|  1 | SIMPLE      | t1    | ref  | idx1          | idx1 | 8       | const | 1395772 | Using where; Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+-------+---------+----------------------------------------------+

On a table with ~110M records these queries take more than 10 s to execute. I believe the SUM in the ORDER BY clause forces the entire result set satisfied by the WHERE clause to be written to disk, hence the "Using temporary; Using filesort" notes.

Do I have any recourse in improving the performance of these queries short of redesigning the schema and application?
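One thing worth trying (a sketch, not a guaranteed fix): extend the index so it also covers m1 and m2. The filesort on the computed ORDER BY expression will remain, but the aggregation can then be fed entirely from the index instead of reading row data:

  ALTER TABLE t1
    ADD KEY idx1_cover (b_id, date, i_id, n_id, m1, m2);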

Best practices for backing up a MySQL DB

Posted: 27 Sep 2013 12:27 PM PDT

I've recently discovered that our production web servers that run off MySQL are not being backed up regularly (or at all). I'm used to backing up SQL Server databases but don't have much experience with MySQL. Are there any best practices for using mysqldump or other backup tools?

I'll probably schedule it with a cron job so that it runs nightly, and then pick up the dump files with my regular backup system.
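For reference, a minimal nightly-dump sketch (paths and schedule are assumptions). --single-transaction gives a consistent snapshot of InnoDB tables without locking them; for MyISAM tables it does not help, and the blocking --lock-all-tables is the blunt alternative:

  # crontab entry: dump everything at 01:30 each night
  30 1 * * * mysqldump --all-databases --single-transaction --routines --triggers | gzip > /backup/mysql/all-$(date +\%F).sql.gz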

Thanks.

Simple table normalisation

Posted: 27 Sep 2013 05:30 PM PDT

In raw form, I have data with 2 columns: skills in Welsh and stats for each skill. For storing this information in a database, would it be best to combine the stats with the categories, or split them up as below?

WELSH_SKILLS_TABLE
----------------------------------------------------
|  SKILL_ID   |                SKILL
----------------------------------------------------
|      0      |            CAN'T SPEAK
----------------------------------------------------

WELSH_STATS_TABLE
----------------------------------------------------
|  SKILL_FK   |                COUNT
----------------------------------------------------
|      0      |                 235
----------------------------------------------------

Instinctively, the split layout above seems correct, but for such a simple task it also seems like overkill?
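For comparison, a sketch of the combined single-table form; while each skill has exactly one count, many designs would simply keep them together:

  CREATE TABLE welsh_skills (
    skill_id   INT PRIMARY KEY,
    skill      VARCHAR(100) NOT NULL,
    stat_count INT NOT NULL
  );

The split layout earns its keep once a skill can have several stats (e.g. per year or per region).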

Thanks!

How can I verify that MySQL's slave_compressed_protocol is working

Posted: 27 Sep 2013 12:39 PM PDT

I've set slave_compressed_protocol=1 on both my master and slave.

Using SHOW VARIABLES, I can see that it's set.

How can I check that it's actually working?
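There is no status variable that reports the compression directly (as far as I know), so one hedged way to check is to compare how much binlog the master generates against the bytes that actually cross the wire:

  -- On the master, before and after replicating a sizeable load:
  SHOW MASTER STATUS;   -- the Position delta approximates uncompressed binlog bytes
  -- Then compare that delta with the bytes observed on the replication TCP
  -- connection (OS-level counters such as iptables/tcpdump on port 3306).
  -- Noticeably fewer wire bytes than binlog bytes means compression is active.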

Production (20GB+) Database running slowly and timing out

Posted: 27 Sep 2013 01:19 PM PDT

I know the thread title is a bit ambiguous (and this is my first post here!) but I am having a very severe production database problem. I am not a DBA, though I know how to work with queries and SQL Server Management Studio. The database is constantly under load, as there's constant input and output.

The problems started about a week ago and I've done everything I know to get it working, from rebuilding indexes to updating statistics to shrinking the DB. As an example of the recent problem: I run a daily backup at 5 PM that used to take 20-25 minutes; now it's taking over an hour. Some days everything works just fine (for example, yesterday we had no issues) and today it's all over the place.

It's a 20+ GB database running in a VM with Windows Server 2008 R2, dual Xeon @ 2.4 GHz and 12 GB of RAM, plus one partition for the OS, one for the DB and one for backups. As one of my countermeasures I created a separate log partition so I can move the log off the DB hard disk onto its own HDD; I haven't done it yet as I need to detach and take everything offline. I checked the fragmentation of the main table (5+ million rows) and it was over 99%... and this was AFTER rebuilding the index.

At this point, I am just stuck. The application our users rely on is timing out, and transactions sometimes aren't going in or out (we're a financial institution). I am constantly monitoring performance, and heck, even the SQL Server Activity Monitor timed out on me a few moments ago. Any suggestions on what else I should do (aside from separating the log and DB) would be greatly appreciated.

EDIT: Server RAM usage, just in case: [screenshot: server RAM usage]

EDIT2: Code used for rebuilding the index:

ALTER INDEX ALL ON DailyTransactions
REBUILD WITH (FILLFACTOR = 80, SORT_IN_TEMPDB = ON,
              STATISTICS_NORECOMPUTE = ON);
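Until the log is moved, it may help to see what the server is actually waiting on; sys.dm_os_wait_stats exists in 2008 R2, and the query below is a standard wait-stats check:

  -- Top waits since the last restart; PAGEIOLATCH_*, WRITELOG or
  -- IO_COMPLETION dominating would point at disk rather than fragmentation.
  SELECT TOP 10 wait_type, wait_time_ms, signal_wait_time_ms, waiting_tasks_count
  FROM sys.dm_os_wait_stats
  ORDER BY wait_time_ms DESC;

Also note that shrinking the database and rebuilding indexes work against each other: a shrink re-fragments what the rebuild just fixed, which may explain the 99% fragmentation observed right after a rebuild.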

Sharing a single primary key sequence across a database?

Posted: 27 Sep 2013 07:09 PM PDT

Is it an acceptable practice to use a single sequence to generate primary keys across all tables (so that a key value is unique across the whole database rather than within one table)? If so, is it objectively better or worse than giving each table its own sequence?

I'm a junior software developer, not a DBA, so I am still learning many of the basics of good database design.

Edit: In case anyone is wondering, I recently read a critique of a database design by one of our company's DBAs, who said it was a problem that the design didn't use a single primary key sequence across the entire database, which sounded different from what I've learned so far.

Edit2: To answer a question in the comments, this is for Oracle 11g, but I was asking at a database-agnostic level. If the answer does depend on the database, I'd be interested to know why, but in that case I'd be looking for an answer specific to Oracle.
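For concreteness, a minimal Oracle 11g sketch of what a single shared sequence means (table names hypothetical):

  CREATE SEQUENCE global_id_seq;

  CREATE TABLE customers (id NUMBER PRIMARY KEY, name   VARCHAR2(100));
  CREATE TABLE orders    (id NUMBER PRIMARY KEY, amount NUMBER);

  -- Both tables draw from the same sequence, so an id value is unique
  -- across the schema, not just within one table:
  INSERT INTO customers (id, name)   VALUES (global_id_seq.NEXTVAL, 'Acme');
  INSERT INTO orders    (id, amount) VALUES (global_id_seq.NEXTVAL, 42);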

Trigger - Add to a date using a subquery

Posted: 27 Sep 2013 01:05 PM PDT

(PostgreSQL 9.2.) I have 2 tables:

Table 1 (extinguisher)

extinguisher_id (serial)
type_designation_extinguisher_type (text)
manufacturing_date (date)
life_date (date)

Table 2 (extinguisher_designation)

extinguisher_type (serial)
extinguisher_designation (text)
lifespan (int)

lifespan takes values like 10 or 20, and it represents years.

My objective is to write a trigger that automatically populates table1.life_date based on the extinguisher type, using table 2.

The problem is that lifespan is an int and can change. The usual way to add years to a date is date + interval '10 year', but I want a variable instead of the 10.

At this point I have this:

CREATE OR REPLACE FUNCTION ext_life_span()
  RETURNS trigger AS
$BODY$
DECLARE
  new_date  timestamptz;
  manu_date char(10);
  intervalo char(10);
BEGIN
  IF (TG_OP = 'INSERT') THEN
    intervalo := to_char((SELECT lifespan
                          FROM extinguisher_type, extinguisher
                          WHERE type_designation = type_designation_extinguisher_type), '99999');
    manu_date := to_char(NEW.manufacturing_date, 'YYYY-MM-DD');
    new_date  := (date 'manu_date' + integer 'intervalo');
    UPDATE extinguisher SET life_date = new_date;
    RETURN NULL;
  ELSIF (TG_OP = 'UPDATE') THEN
    IF NEW.manufacturing_date IS DISTINCT FROM OLD.manufacturing_date THEN
      intervalo := to_char((SELECT lifespan
                            FROM extinguisher_type, extinguisher
                            WHERE type_designation = type_designation_extinguisher_type), '99999');
      manu_date := to_char(NEW.manufacturing_date, 'YYYY-MM-DD');
      new_date  := (date 'manu_date' + integer 'intervalo');
      UPDATE extinguisher SET life_date = new_date;
    END IF;
    RETURN NULL;
  END IF;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

ALTER FUNCTION ext_life_span() OWNER TO postgres;

CREATE TRIGGER insert_lifespan
AFTER INSERT OR UPDATE OF manufacturing_date
ON "extinguisher"
FOR EACH ROW
EXECUTE PROCEDURE ext_life_span();

I'm getting an error: invalid input syntax for type date: "manu_date"

Am I taking the best approach? Is there an easier way to do it?
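One simpler shape (a hedged sketch reusing the names from the posted function, which may need adjusting to the real schema): make it a BEFORE trigger and assign NEW.life_date directly, which avoids both the extra UPDATE on the whole table and the string round trip through to_char:

  CREATE OR REPLACE FUNCTION ext_life_span()
    RETURNS trigger AS
  $BODY$
  BEGIN
    SELECT (NEW.manufacturing_date + t.lifespan * interval '1 year')::date
      INTO NEW.life_date
    FROM extinguisher_type t
    WHERE t.type_designation = NEW.type_designation_extinguisher_type;
    RETURN NEW;  -- a BEFORE trigger must return NEW to keep the change
  END;
  $BODY$
  LANGUAGE plpgsql;

  CREATE TRIGGER set_lifespan
  BEFORE INSERT OR UPDATE OF manufacturing_date ON extinguisher
  FOR EACH ROW
  EXECUTE PROCEDURE ext_life_span();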

Backing up Oracle home directory while instances are running

Posted: 27 Sep 2013 10:12 AM PDT

Can I take a backup of the Oracle home directory (with cp -r) before an upgrade while the instances using that home are running, or do the instances have to be shut down first?

Trace files/alert logs are in a different directory structure, and all data files, redo log files, control files and spfiles are in ASM.

Indexing a reference column

Posted: 27 Sep 2013 12:45 PM PDT

Say I have a reference table like languages with only 20 records. Is it worth indexing a column like language_code (e.g. en, es) that's frequently used in queries? I'm wondering whether an efficient server would effectively index columns like this automatically.

This is specifically for MariaDB with InnoDB engine.
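For what it's worth, MySQL/MariaDB creates indexes only where you declare them (or where a key constraint implies one); nothing is indexed automatically. Declaring it is a one-liner, with UNIQUE doubling as documentation that codes don't repeat (names taken from the question):

  ALTER TABLE languages ADD UNIQUE INDEX idx_language_code (language_code);

With only 20 rows the optimizer may still prefer a full scan, so the practical gain is small either way.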

How can I get a list of tables contained in a database diagram?

Posted: 27 Sep 2013 12:39 PM PDT

I need to get a list of the tables contained in a database diagram in SQL Server. How can I do it with an SQL script?

I don't need a list of the tables in the database, but the list of tables contained in a particular diagram of the database, on SQL Server 2008 R2/2012.
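Diagram contents are stored as a binary blob in dbo.sysdiagrams, so there is no clean catalog view for this. A rough heuristic (a sketch, prone to false positives on substring matches): table names usually appear as UTF-16 text inside the blob, so re-cast it and search:

  SELECT d.name AS diagram_name, t.name AS table_name
  FROM dbo.sysdiagrams AS d
  JOIN sys.tables AS t
    ON CAST(d.definition AS nvarchar(max)) LIKE N'%' + t.name + N'%';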

Options for Data Encryption in SQL Server 2008 R2 Standard Edition?

Posted: 27 Sep 2013 10:50 AM PDT

I'm helping a friend set up encryption of data on SQL Server 2008 R2 Standard edition. From my original research I thought I could use TDE, but I did not realize it is only available in the Enterprise or Datacenter editions of SQL Server. On further research into SQL Server 2008 R2 features I saw that Standard does allow for "data encryption and key management", but I'm not sure what that means or how to implement it.

What is the most efficient and lowest-cost method for data encryption? Should I do something through SQL Server, or just use third-party tools to encrypt the whole volume where the DB and backups live? Also, if someone can point me to a way to use the "data encryption" feature that comes with Standard edition, I would really appreciate it. Every time I search for encryption on SQL Server I end up at articles on TDE, and in the current scenario it is not feasible for a business of this size to purchase Enterprise edition.
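The Standard-edition line item "data encryption and key management" most likely refers to cell-level encryption: encrypting chosen columns with ENCRYPTBYKEY/DECRYPTBYKEY rather than the whole database. A minimal sketch (table and key names hypothetical; the encrypted column must be varbinary):

  -- One-time key setup in the database:
  CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Str0ng!Passphrase';
  CREATE CERTIFICATE PatientCert WITH SUBJECT = 'PHI protection';
  CREATE SYMMETRIC KEY PatientKey
      WITH ALGORITHM = AES_256
      ENCRYPTION BY CERTIFICATE PatientCert;

  -- Encrypt on write, decrypt on read:
  OPEN SYMMETRIC KEY PatientKey DECRYPTION BY CERTIFICATE PatientCert;
  UPDATE dbo.Patients SET ssn_enc = ENCRYPTBYKEY(KEY_GUID('PatientKey'), ssn_plain);
  SELECT CONVERT(varchar(11), DECRYPTBYKEY(ssn_enc)) AS ssn FROM dbo.Patients;
  CLOSE SYMMETRIC KEY PatientKey;

This protects the columns inside the database; backups and the volume itself would still need separate protection (e.g. OS-level or third-party volume encryption) for HIPAA purposes.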

Edit: the reason for encryption is HIPAA compliance.

What is causing Waiting for table level lock errors?

Posted: 27 Sep 2013 01:23 PM PDT

Our database has hung twice already and we are trying to find the cause.

show processlist
Waiting for global read lock | INSERT INTO {myisam_table} ...

At that point the disk was full, so we thought the problem was solved after adding more space, but the next day at midday it hung again:

show processlist
Waiting for table level lock | UPDATE {myisam_table} ...

What could be causing it?

MySQL default engine: InnoDB.

Database has a mixture of tables with both MyISAM and InnoDB engines.

Log posted here:

http://arturito.net/2013/08/28/mysql-waiting-for-table-level-lock-errors/
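Next time it hangs, it may help to capture who is holding what before restarting anything; both of these are stock MySQL statements:

  -- Which tables are currently open and locked:
  SHOW OPEN TABLES WHERE In_use > 0;

  -- Full statement text of every connection; the holder of a global read
  -- lock is often a backup job running FLUSH TABLES WITH READ LOCK:
  SHOW FULL PROCESSLIST;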

Do TRIGGERs improve performance?

Posted: 27 Sep 2013 04:38 PM PDT

I understand that TRIGGERs are useful, as they take care of subsequent queries, but do they also have an advantage from a performance point of view?

Replacing two or three queries as

INSERT INTO table1 ...
UPDATE table2 ...
UPDATE table3 ...

with a trigger-based query as

INSERT INTO table1 ...

with trigger

CREATE TRIGGER test
AFTER INSERT ON table1
FOR EACH ROW BEGIN
  UPDATE table2 ... WHERE id=NEW.id;
  UPDATE table3 ... WHERE id=NEW.id;
END
  1. Do we still have the same three queries with identical performance? I mean, does it matter whether we issue the subsequent queries ourselves or the TRIGGER does?

  2. What if we use an API (such as issuing queries through PHP)? The subsequent queries are then internal, with no round trip through the connector/driver. Does the TRIGGER improve performance in that case?

Additional information: the database is MySQL 5.5 with InnoDB.

SQL Server Min Server Memory

Posted: 27 Sep 2013 10:50 AM PDT

I am using SQL Server 2008 R2 with SP2 on a server with 64 GB of RAM. I have set max server memory to 58 GB, given that this SQL Server runs only database-engine services. I have a question about setting min server memory.

If I set min server memory to 48 GB, does that mean:

a) If Windows sends a low-memory notification to SQL Server, SQL Server will trim all its pools but try to keep its memory usage at no less than 48 GB?

b) In case of aggressive working-set trimming by Windows, would it keep SQL Server's memory usage at 48 GB, setting a "floor" for SQL Server's memory usage?

c) Are there any recommendations on min server memory?

Searching the web, there is plenty of information about max server memory, but in only a few places (Brent Ozar's blog on configuration best practices, for one) have I read a recommendation to set min server memory to 50% of the total RAM on the server.
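For reference, the setting itself is an advanced sp_configure option (sketch; 48 GB = 49152 MB):

  EXEC sp_configure 'show advanced options', 1;
  RECONFIGURE;
  EXEC sp_configure 'min server memory (MB)', 49152;
  RECONFIGURE;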

database structure - complicated requirements

Posted: 27 Sep 2013 01:07 PM PDT

I have a project to build a website, but it's complicated and I'm having trouble figuring out the best way to design the database for these particular requirements.

The site is for local builders and farmers (and anyone else who uses heavy equipment) to rent their machinery amongst themselves. Users should be able to sign up and list an item of equipment, which is then searchable and bookable by other users of the site.

So a builder might sign up and upload a listing for his concrete mixer. Then another user can search for concrete mixers to hire between two dates and place a booking for the mixer through the site.

So far so good.

The problem is that the builder should be able to set a default per-day rate, but also to say that throughout July, or on the last two weekends in August, the mixer's daily rate is different. So basically every day could end up having a different rate, and I'm having trouble figuring out the most efficient way to structure the database, and how to calculate the total cost of renting for several days when there's potentially a different rate every day.

At the moment I'm imagining having to loop through a 365-element array, but that can't be right. I'm a bit new to this, so I'm probably just confused.
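A sketch of one workable shape (all names hypothetical): a default rate on the equipment row plus an override table of date ranges. Given a helper calendar table with one row per day, the total for a booking becomes a single query instead of a 365-element loop:

  CREATE TABLE equipment (
    equipment_id       INT PRIMARY KEY,
    default_daily_rate DECIMAL(10,2) NOT NULL
  );

  CREATE TABLE rate_override (
    equipment_id INT NOT NULL,
    start_date   DATE NOT NULL,
    end_date     DATE NOT NULL,
    daily_rate   DECIMAL(10,2) NOT NULL
  );

  -- Total cost of booking equipment 123 for 1-14 July:
  SELECT SUM(COALESCE(o.daily_rate, e.default_daily_rate)) AS total
  FROM calendar c
  JOIN equipment e ON e.equipment_id = 123
  LEFT JOIN rate_override o
         ON o.equipment_id = e.equipment_id
        AND c.day BETWEEN o.start_date AND o.end_date
  WHERE c.day BETWEEN '2013-07-01' AND '2013-07-14';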

Issue after moving the ib_logfile1 and ib_logfile0 files

Posted: 27 Sep 2013 01:03 PM PDT

I wanted to increase innodb_log_file_size for better performance, so I raised it from the default to 256 MB. As expected, I got the error InnoDB: Error: log file ./ib_logfile0 is of different size 0 5242880 bytes. So I moved the old files to a different location and restarted the MySQL server, and it worked. Before shutting the server down, I had also set innodb_fast_shutdown = 0.

But I have a doubt: after a MySQL shutdown, can we expect MySQL to be in a consistent state, with all the data that was in ib_logfile0 and ib_logfile1 written to the data files, so that I can safely delete those files? Or should I do something else as well?

And if the answer is that MySQL will be in a consistent state, then why do these files contain some data even after a graceful shutdown and startup with innodb_fast_shutdown = 0?

[EDIT] The steps I followed:

  1. SET GLOBAL innodb_fast_shutdown = 0
  2. /etc/init.d/mysqld stop
  3. rm -f /var/lib/mysql/ib_logfile0
  4. rm -f /var/lib/mysql/ib_logfile1
  5. /etc/init.d/mysqld start

Are the above steps fine? If yes, then how come, after a clean shutdown and startup, the log files still contain some data? Ideally they should be empty.
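One hedged way to confirm the shutdown was clean before deleting anything: with innodb_fast_shutdown = 0 the redo log should be fully checkpointed, which is visible in the LOG section of:

  SHOW ENGINE INNODB STATUS\G
  -- "Log sequence number" equal to "Last checkpoint at" means everything
  -- in the redo log has been flushed to the data files.

As for why the old files still contain bytes: the redo log is a fixed-size circular file that is never zeroed out, so stale entries remain visible even after a perfectly clean shutdown.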

SQL Server Replication: "ALTER TABLE ALTER COLUMN" is not propagated to subscribers

Posted: 27 Sep 2013 10:23 AM PDT

We are running SQL Server 2008 R2 SP1 as publisher & distributor, and SQL Server 2005 SP3 as subscriber. Replication of schema changes is activated, and the replication has been running for years, including frequent schema changes (new columns, new constraints, etc.).

The following statement was run on the publisher:

use myDatabase
alter table Tbl_TypeLignePaye
      alter column natureTypeLignePaye nvarchar(12)
go

where the column natureTypeLignePaye was originally nvarchar(3) null.

The statement ran without errors on the publisher database. The results are as follows:

  1. The column natureTypeLignePaye still appears as nvarchar(3) in the object browser.
  2. The column properties screen indicates an nvarchar type with a length of 12.
  3. The change was not propagated to the subscribers.

Any idea on what is going on with this database?

[screenshots: on the publisher, the object browser window and the property window give incoherent data for the field's type and length]
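One thing worth verifying (a sketch; the publication name is hypothetical): that DDL replication was actually enabled on the publication when the ALTER ran:

  -- Run in the publication database; replicate_ddl = 1 means ALTER TABLE
  -- statements are forwarded to subscribers:
  EXEC sp_helppublication @publication = N'myPublication';

  -- If it is off, switch it back on:
  EXEC sp_changepublication
       @publication = N'myPublication',
       @property    = N'replicate_ddl',
       @value       = '1';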

Is it possible to have extra tables in a Slave with MySQL Replication

Posted: 27 Sep 2013 03:23 PM PDT

As my title mentions, I have a master and a slave database.

The master is for operational data and the slave is mainly for reporting.

The issue is that I need to create extra tables on the reporting side that can't exist on the master, but with the way my replication is currently set up (the simplest setup mentioned by the official docs), this breaks replication.

How could I add tables on the slave without the master caring about it? Is it even possible?
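One common, hedged approach: keep the reporting tables in a schema that exists only on the slave and tell the slave to ignore that schema, so nothing the master sends can ever collide with it. In my.cnf on the slave (the reporting schema name is hypothetical):

  replicate-wild-ignore-table = reporting.%

Extra tables on a slave only break replication when the master later sends a statement touching the same names, so isolating them in a dedicated schema avoids the collision.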

Replication issue - CREATE SELECT alternative?

Posted: 27 Sep 2013 05:23 PM PDT

I have a MySQL 5.1 slave for our BI team.

They need to run some CREATE ... SELECT statements with big SELECTs (several million rows).

Since CREATE ... SELECT is DDL, if replication attempts to update rows in the same tables as the SELECT statement, replication is blocked until the CREATE ... SELECT finishes and releases its locks.

Do you know a good non-blocking alternative to those CREATE ... SELECT statements?

I thought of SELECT ... INTO OUTFILE followed by LOAD DATA INFILE, but those files would fill up our disks, given what the BI guys like to do... :)

Max.

How to drop a DB2 instance when the instance owner was removed

Posted: 27 Sep 2013 12:23 PM PDT

This is a real sticky situation. I was handed a machine (running AIX 7.1), and my first task was to re-install the DB2 server on it. But someone before me had conveniently removed the instance owner account, and probably recreated it. Now the problem is this:

1) When I try to uninstall DB2, it says the instance is active and has to be dropped first.

2) When I try to drop this instance, DB2 says there is no such instance.

I am quite new to DB2 administration and not sure how to proceed here. Any help is appreciated.
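For reference, the instance tools live under the installation path rather than in the (now missing) instance owner's home; a sketch assuming a typical V9.7-style layout (adjust the path to the actual install):

  /opt/IBM/db2/V9.7/instance/db2ilist           # instances the installer knows about
  /opt/IBM/db2/V9.7/instance/db2idrop -f inst1  # force-drop the stale instance (run as root)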

Thanks

Optimizing multi-table left joins

Posted: 27 Sep 2013 01:11 PM PDT

I have four tables which I am trying to join together.

Table event

+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| sid        | int(10) unsigned | NO   | PRI | NULL    |       |
| cid        | int(10) unsigned | NO   | PRI | NULL    |       |
| signature  | int(10) unsigned | NO   | MUL | NULL    |       |
| timestamp  | datetime         | NO   | MUL | NULL    |       |
| is_deleted | tinyint(1)       | NO   | MUL | 0       |       |
+------------+------------------+------+-----+---------+-------+

Indexes - sid,cid,timestamp,is_deleted

Table iphdr

+----------+------------------+------+-----+---------+-------+
| Field    | Type             | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| sid      | int(10) unsigned | NO   | PRI | NULL    |       |
| cid      | int(10) unsigned | NO   | PRI | NULL    |       |
| ip_src   | int(10) unsigned | NO   | MUL | NULL    |       |
| ip_dst   | int(10) unsigned | NO   | MUL | NULL    |       |
+----------+------------------+------+-----+---------+-------+

Indexes - sid,cid,(ip_src,ip_dst)

Table tcphdr

+-----------+----------------------+------+-----+---------+-------+
| Field     | Type                 | Null | Key | Default | Extra |
+-----------+----------------------+------+-----+---------+-------+
| sid       | int(10) unsigned     | NO   | PRI | NULL    |       |
| cid       | int(10) unsigned     | NO   | PRI | NULL    |       |
| tcp_sport | smallint(5) unsigned | NO   | MUL | NULL    |       |
| tcp_dport | smallint(5) unsigned | NO   | MUL | NULL    |       |
+-----------+----------------------+------+-----+---------+-------+

Indexes - sid,cid

Table udphdr

+-----------+----------------------+------+-----+---------+-------+
| Field     | Type                 | Null | Key | Default | Extra |
+-----------+----------------------+------+-----+---------+-------+
| sid       | int(10) unsigned     | NO   | PRI | NULL    |       |
| cid       | int(10) unsigned     | NO   | PRI | NULL    |       |
| udp_sport | smallint(5) unsigned | NO   | MUL | NULL    |       |
| udp_dport | smallint(5) unsigned | NO   | MUL | NULL    |       |
+-----------+----------------------+------+-----+---------+-------+

Indexes - sid,cid

The columns sid and cid are common to all the tables and form the primary key in each. The first two tables have about 1M rows, while the next two have about 600K and 200K rows respectively.

I need to find distinct IP links for events that are not deleted. So I write a query like:

SELECT inet_ntoa(iphdr.ip_src) AS src_ip, iphdr.ip_src AS src_ip_inet,
       inet_ntoa(iphdr.ip_dst) AS dst_ip, iphdr.ip_dst AS dst_ip_inet,
       count('*') AS total, count(DISTINCT event.signature) AS count_unique
FROM event
INNER JOIN iphdr ON event.sid = iphdr.sid AND event.cid = iphdr.cid
LEFT OUTER JOIN tcphdr ON iphdr.sid = tcphdr.sid AND iphdr.cid = tcphdr.cid
LEFT OUTER JOIN udphdr ON iphdr.sid = udphdr.sid AND iphdr.cid = udphdr.cid
WHERE event.is_deleted = 0
GROUP BY iphdr.ip_src, iphdr.ip_dst
ORDER BY total DESC
LIMIT 5

The query runs fine and returns result in about 7 seconds.

The MySQL query plan is

+----+-------------+--------+--------+------------------------------------------------+---------+---------+-------------------------------+--------+---------------------------------+
| id | select_type | table  | type   | possible_keys                                  | key     | key_len | ref                           | rows   | Extra                           |
+----+-------------+--------+--------+------------------------------------------------+---------+---------+-------------------------------+--------+---------------------------------+
|  1 | SIMPLE      | event  | ref    | PRIMARY,index_event_sid,index_event_cid,is_del | is_del  | 1       | const                         | 548283 | Using temporary; Using filesort |
|  1 | SIMPLE      | iphdr  | eq_ref | PRIMARY,index_iphdr_sid,index_iphdr_cid        | PRIMARY | 8       | test.event.sid,test.event.cid |      1 |                                 |
|  1 | SIMPLE      | tcphdr | eq_ref | PRIMARY,index_tcphdr_sid,index_tcphdr_cid      | PRIMARY | 8       | test.iphdr.sid,test.event.cid |      1 | Using index                     |
|  1 | SIMPLE      | udphdr | eq_ref | PRIMARY,index_udphdr_sid,index_udphdr_cid      | PRIMARY | 8       | test.iphdr.sid,test.event.cid |      1 | Using index                     |
+----+-------------+--------+--------+------------------------------------------------+---------+---------+-------------------------------+--------+---------------------------------+

As can be seen the index iplink (ip_src,ip_dst) is not used.

If I drop the where clause,

SELECT inet_ntoa(iphdr.ip_src) AS src_ip, iphdr.ip_src AS src_ip_inet,
       inet_ntoa(iphdr.ip_dst) AS dst_ip, iphdr.ip_dst AS dst_ip_inet,
       count('*') AS total, count(DISTINCT event.signature) AS count_unique
FROM iphdr
LEFT OUTER JOIN tcphdr ON iphdr.sid = tcphdr.sid AND iphdr.cid = tcphdr.cid
LEFT OUTER JOIN udphdr ON iphdr.sid = udphdr.sid AND iphdr.cid = udphdr.cid
INNER JOIN event ON event.sid = iphdr.sid AND event.cid = iphdr.cid
GROUP BY iphdr.ip_src, iphdr.ip_dst
ORDER BY total DESC
LIMIT 5

The query is 2x faster (it runs in about 3.9 s) and returns the correct result. The query plan also changes:

+----+-------------+--------+--------+-------------------------------------------+---------+---------+-------------------------------+---------+----------+----------------------------------------------+
| id | select_type | table  | type   | possible_keys                             | key     | key_len | ref                           | rows    | filtered | Extra                                        |
+----+-------------+--------+--------+-------------------------------------------+---------+---------+-------------------------------+---------+----------+----------------------------------------------+
|  1 | SIMPLE      | iphdr  | index  | PRIMARY,index_iphdr_sid,index_iphdr_cid   | iplink  | 8       | NULL                          | 1065036 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | tcphdr | eq_ref | PRIMARY,index_tcphdr_sid,index_tcphdr_cid | PRIMARY | 8       | test.iphdr.sid,test.iphdr.cid |       1 |   100.00 | Using index                                  |
|  1 | SIMPLE      | udphdr | eq_ref | PRIMARY,index_udphdr_sid,index_udphdr_cid | PRIMARY | 8       | test.iphdr.sid,test.iphdr.cid |       1 |   100.00 | Using index                                  |
|  1 | SIMPLE      | event  | eq_ref | PRIMARY,index_event_sid,index_event_cid   | PRIMARY | 8       | test.iphdr.sid,test.iphdr.cid |       1 |   100.00 |                                              |
+----+-------------+--------+--------+-------------------------------------------+---------+---------+-------------------------------+---------+----------+----------------------------------------------+

But since I need to filter out deleted events, I move the filter into the INNER JOIN condition:

SELECT inet_ntoa(iphdr.ip_src) AS src_ip, iphdr.ip_src AS src_ip_inet,
       inet_ntoa(iphdr.ip_dst) AS dst_ip, iphdr.ip_dst AS dst_ip_inet,
       count('*') AS total, count(DISTINCT event.signature) AS count_unique
FROM event
INNER JOIN iphdr ON event.sid = iphdr.sid AND event.cid = iphdr.cid
                AND event.is_deleted = 0
LEFT OUTER JOIN tcphdr ON iphdr.sid = tcphdr.sid AND iphdr.cid = tcphdr.cid
LEFT OUTER JOIN udphdr ON iphdr.sid = udphdr.sid AND iphdr.cid = udphdr.cid
GROUP BY iphdr.ip_src, iphdr.ip_dst
ORDER BY total DESC
LIMIT 5

This query runs in about 7 s and gives the same query plan as query 1.

I also tried a subquery to select only the events with is_deleted = 0, but it gave even worse performance (9 s).

My questions are:

  1. How do I optimize the first query to make use of the index on (ip_src, ip_dst) (i.e. my GROUP BY columns) without sacrificing the WHERE clause? (See the sketch after this list.)
  2. Why does the WHERE clause (or the equivalent join condition event.is_deleted = 0) cause such a drastic change of query plan?
  3. Can I rewrite the query to make it more efficient?
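On question 1, one hedged option is to give event a composite index that leads with is_deleted and carries the join keys and signature, so the is_deleted = 0 probe can be satisfied from the index alone rather than via the single-column is_deleted index plus row lookups:

  ALTER TABLE event
    ADD INDEX idx_isdel_cover (is_deleted, sid, cid, signature);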

Extract query fingerprints and tables from large log files using pt-query-digest

Posted: 27 Sep 2013 01:09 PM PDT

I tried to parse a large log file using pt-query-digest, but it says "Out of memory!" when it reads the file. I have 3 GB RAM and 40 GB of empty space on the HDD; the file is 20 GB. I did not find anything in the documentation about memory.

I reduced the commands to:

pt-query-digest --group-by fingerprint --print --no-report /path/to/largeLogFile.log
pt-query-digest --table-access /path/to/largeLogFile.log

I have 2 questions:

  1. How can I extract the query fingerprints from a large log file generated by general_log?

  2. How can I extract all the tables that were used?

Edit:
I tried @DTest's --sample option (with 2 and 1), but the result is the same. Just before it reaches 50% memory in htop, it dies every time. [screenshot: htop memory usage]

Running strace just before it dies, I get:

read(3, "phone from teams as t ri"..., 4096) = 4096
mremap(0x5b27a000, 1543499776, 1543503872, MREMAP_MAYMOVE) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1543503872, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x64c2d000)                         = 0x8c2e000
mmap2(NULL, 1543634944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x5b07a000
munmap(0x5b07a000, 548864)              = 0
munmap(0x5b200000, 499712)              = 0
mprotect(0x5b100000, 135168, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 1543503872, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "Out of memory!\n", 15)        = 15
munmap(0x5b27a000, 1543499776)          = 0
... [output removed] ...

The only option I can think of so far is splitting the file into "bite size" pieces (say 20 files of 1 GB each), digesting them separately, and merging the results.
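A sketch of that split-and-digest workaround (chunk size and paths are assumptions; note that a byte-based split can cut a query in half at each chunk boundary, so a few fingerprints near the boundaries may be lost or duplicated):

  split -b 1000m /path/to/largeLogFile.log /tmp/chunk_
  for f in /tmp/chunk_*; do
    pt-query-digest --group-by fingerprint --no-report --print "$f" \
      >> /tmp/fingerprints.txt
  done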

SQL Server 2012 backward compatibility for backups with 2008

Posted: 27 Sep 2013 04:28 PM PDT

I have a number of clients with SQL Server 2008, and that's what I have here on my server too. I use backup files to move databases back and forth between clients and my office.

I have read that when you create a backup with SQL Server 2012 there is no way to restore it onto a 2008 instance. I assumed that the compatibility level would take care of this problem, but it doesn't. Therefore I am at a loss as to how to upgrade: other than upgrading all my clients at once, which is impossible, I can think of no clean way to do this.

I need both to send databases to clients and to receive databases from clients. This is my first version upgrade of SQL Server, so I'm new to this problem. Any ideas on how to proceed?
