Thursday, June 6, 2013

[how to] MySQL - Export two columns from large table without causing a lock

MySQL - Export two columns from large table without causing a lock

Posted: 06 Jun 2013 07:34 PM PDT

I am a sysadmin who doesn't do a ton of dba stuff. For a project, I have access to a production server--which I am not the sysadmin for--with an enormous 40,000,000 row >10 GB MySQL InnoDB table. I want to export two small columns, one of which is an INT(11), and the other of which is a VARCHAR(20) from that table to a CSV or .SQL file (either is fine, I'll write a crosswalk for the next step no problem).

We can call the columns ColumnA and ColumnB and the table SampleTable.

MySQLdump is not the right tool for this because I can't specify the columns, and I don't want to export a massive table just to get two tiny columns.

I know I can do a SELECT INTO statement (either to create a new table with just the columns or to do a SELECT INTO OUTFILE to skip the intermediate step), but I am concerned that this will cause a table level lock on a production server. The table is InnoDB.
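
For reference, a minimal sketch of the SELECT ... INTO OUTFILE approach, assuming the names above and an illustrative output path (the file is written on the database server, not the client):

SELECT ColumnA, ColumnB
INTO OUTFILE '/tmp/sampletable_export.csv'   -- path is illustrative
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM SampleTable;

Under InnoDB's default REPEATABLE READ isolation, a plain SELECT like this is a non-locking consistent read, so it should not block concurrent writes, though it will still consume I/O and buffer pool while scanning 40 million rows.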

What's my best bet to avoid inconveniencing any live traffic on the server or locking anything?

Thanks

Resources on database design and data integrity enforcement best practices?

Posted: 06 Jun 2013 07:13 PM PDT

I am a developer at a very small software company that has historically taken a very loose approach to database design and data integrity. I'd like to turn that around and start using more in the way of keys, constraints, transactions, etc., to prevent software defects and inconsistencies from causing data problems, and to bring those causes to light more quickly. Are there any great/classic resources or books on best practices for database design and this sort of data integrity feature, including discussion of the trade-offs?

How to "analyze" a SQL query?

Posted: 06 Jun 2013 08:45 PM PDT

I'm not sure if I used the right term in my question.

I am creating a prototype that allows users to type in a raw SQL query, which will be run against the backend DB. I also want to make sure that users only apply SQL to the tables they are allowed to.

So when a user types in something like SELECT * FROM t1 WHERE id > 10, I feel like I need to

  • Analyze the query to make sure the tables it touches are allowed - in this case "t1"
  • Make sure they cannot access any admin-type tables. For example, I don't want them to access information_schema (i.e. for Postgres)

What do you think the best strategy would be for this?

FYI, I'm on NodeJS, so the primary programming language would be JavaScript.

Would I need to "parse" the query to make sure that the tables being accessed by the query are ok?
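
One alternative worth weighing against parsing is to let the database enforce the boundary: run user-supplied queries through a dedicated role that can only read the permitted tables. A minimal sketch in PostgreSQL syntax (role, database, and table names are illustrative):

CREATE ROLE report_user LOGIN PASSWORD 'changeme';   -- illustrative credentials
GRANT CONNECT ON DATABASE mydb TO report_user;       -- mydb is a placeholder
GRANT USAGE ON SCHEMA public TO report_user;
GRANT SELECT ON public.t1 TO report_user;            -- only t1 is readable

Note that information_schema and the system catalogs stay readable by default in Postgres, but the role cannot touch any application table you have not explicitly granted (assuming those tables carry no PUBLIC grants), so a bug in your own query checking cannot leak data the role was never given.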

FTP folder disappears after a while [migrated]

Posted: 06 Jun 2013 04:38 PM PDT

I'm deploying a C# MVC application with a couple of content folders to save my images in. I made them all under ~Content/Images/ and they work fine, but after a while, sometimes just 15 minutes, a couple of folders disappear. Only my folders ~Content/Images/Advert and ~Content/Images/Flags always stay. The third folder under ../Images/ always disappears with all its subfolders and content.

I've tried adding them with FileZilla, giving them full rights, and adding images or files to them just to keep them, but nothing helps. Does anyone know how to fix this annoying problem?

Regards.

Running a DELETE Stored Proc but finding degradation in Performance in WHILE loop as iterations increase

Posted: 06 Jun 2013 05:37 PM PDT

Have a quick general question. I have a table I'm trying to purge. I'm deleting using a WHILE loop with a WAITFOR DELAY of 50ms between batches and 2,000 records per batch. The real issue is that as time progresses the number of records deleted per minute drops. Refer to the following:

Minute Number | Number of Records Deleted
            1 | 162,000
            2 | 116,000
            3 |  80,000
            4 |  72,000
            5 |  62,000
            6 |  38,000
            7 |  38,000
            8 |  34,000
            9 |  20,000

Wondering if there is something fundamental about deleting in WHILE-loop batches that causes the performance to degrade with each iteration. We have been manually watching the performance, stopping the proc as it begins to drop drastically around the 5th to 6th minute, and then restarting it again. We don't think it's a locking issue directly, because as we play with the batch size the performance always drops at around the 5 to 6 minute mark.
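
For comparison, a minimal sketch of a batched purge (T-SQL; table name, predicate, and batch size are illustrative). A common cause of this kind of slowdown is a DELETE TOP (n) whose predicate has no supporting index, so each later batch has to scan past more and more rows (and ghost records) removed by earlier batches before it finds the next 2,000 to delete:

DECLARE @BatchSize int = 2000, @RowsDeleted int = 1;

WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (@BatchSize) FROM dbo.MyTable      -- dbo.MyTable is a placeholder
    WHERE CreatedDate < '20130101';               -- purge predicate; needs a supporting index

    SET @RowsDeleted = @@ROWCOUNT;

    WAITFOR DELAY '00:00:00.050';                 -- 50 ms pause between batches
END

If the predicate is indexed so each batch can seek straight to the next range of keys, throughput usually stays flat across iterations.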

New to mySQL Relationships

Posted: 06 Jun 2013 03:47 PM PDT

I'm new to MySQL relationships and I'm wondering if you can help me out.

This is what I want to do:

Users Table

user_id
user_name
pass_word
permission_id

Permissions Table

p_id
permission_name
permission_type

I want to create a relationship between p_id (Permissions table) and permission_id (Users table) so that when I query the Users table it also brings through the corresponding permission name and type.

Is this possible or am I getting it all wrong?

Should I just use joins?
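
For what it's worth, a minimal sketch of what that relationship and query could look like (MySQL/InnoDB; table and column names taken from the question, the constraint name is illustrative, and both tables are assumed to be InnoDB with matching column types):

ALTER TABLE users
    ADD CONSTRAINT fk_users_permission
    FOREIGN KEY (permission_id) REFERENCES permissions (p_id);

SELECT u.user_id, u.user_name, p.permission_name, p.permission_type
FROM users u
JOIN permissions p ON p.p_id = u.permission_id;

The foreign key enforces that every permission_id points at a real permission row; the JOIN is what actually brings the name and type through when you query the users table.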

Thanks,

  • WebDevB

MySQL dynamically optimize innodb tables without "file per table" setting

Posted: 06 Jun 2013 03:05 PM PDT

We are getting a "too many connections" error once a week the same time a mysql procedure runs. The procedure runs "optimize table" on hundreds of tables and takes nearly ten hours to finish, taking websites offline. I am thinking because tables are locked connections get backed up, or something like that. So I am thinking it's the procedure that is the cause of this error and am looking at ways to refactor it to be smarter as to which tables it optimizes and/or chop up task to be run over many days.

This procedure only filters which tables and databases to run the optimize command on by using LIKE against the table or database name. So, for every table match, "optimize table {tablename}" is run. These tables are of both the MyISAM and InnoDB engine types.

One refactoring approach would be to only optimize a table if it really needs it. If fragmentation is what signifies that a table needs optimizing, then finding out whether it is fragmented is not too difficult, provided the table is MyISAM, or InnoDB with innodb_file_per_table, because you can do the math on the information_schema fields data_free, data_length and index_length.

Unfortunately the MySQL server is not using the "file per table" setting. This makes the data_free field seemingly useless for this task, because every InnoDB table will have the same value, i.e. the free space in the ibdata file.

Maybe if I can understand the answers to these questions I will better understand the solution I need.

  • Q 1: How much fragmentation is allowed before it affects performance?
  • Q 2: Should InnoDB tables even be optimized (some say yes others say no)?
  • Q 3: How do you test for InnoDB fragmentation if the server does not use the "file per table" option?
  • Q 4: Is fragmentation the only reason to run "optimize table"?
  • Q 5: If I do need to run "optimize table" on an InnoDB table, should I run ALTER TABLE mydb.mytable ENGINE=InnoDB; rather than ANALYZE TABLE?
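
On the Q3 point, a minimal sketch of the data_free arithmetic described above (only meaningful for MyISAM tables, or for InnoDB when innodb_file_per_table is on; the schema filter is illustrative):

SELECT table_schema, table_name, engine,
       data_length + index_length AS used_bytes,
       data_free,
       ROUND(data_free / (data_length + index_length + data_free) * 100, 1) AS frag_pct
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
  AND data_free > 0
ORDER BY frag_pct DESC;

Without file-per-table, every InnoDB row reports the same data_free (the shared ibdata free space), exactly as described above, so the percentage is only a rough estimate and only trustworthy for MyISAM tables in that setup.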

Best high availability solution to avoid maintenance downtime?

Posted: 06 Jun 2013 12:10 PM PDT

We currently only have one production SQL Server 2008 server. Anytime we need to perform maintenance or install patches, it causes downtime as the server reboots. In addition, if the server ever fails, that will obviously cause a great deal of downtime.

What is the best solution to avoid this maintenance downtime? We are looking for a solution that will automatically switch to a failover server so we can apply rolling updates.

Running pt-table-checksum on a table in different databases

Posted: 06 Jun 2013 11:28 AM PDT

I'm using filters for replication:

replicate-wild-do-table = test_slave.%
replicate-rewrite-db = test->test_slave

Is there an option to specify a different DB name on the slave for pt-table-checksum?

pt-table-checksum --host=localhost --databases=test --max-load=Threads_connected:25 --no-check-replication-filters  

EDIT 1:

mysql> SELECT COUNT(*) FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'test_slave';
+----------+
| COUNT(*) |
+----------+
|      145 |
+----------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM percona.checksums;
+----------+
| COUNT(*) |
+----------+
|       40 |
+----------+
1 row in set (0.00 sec)

EDIT 2:

# ls -l *.frm | wc -l
145
# ls -l | awk '{print $3,$4}' | uniq
mysql mysql

It's also complaining that two tables have no indexes and are oversized. It looks like it's just skipping some tables?

Regards

Query should yield result but doesn't

Posted: 06 Jun 2013 11:24 AM PDT

I need to get data from a remote SQL Server 2000 server and store it in a new MySQL server. I am using the FreeTDS module for PHP 5.3 to connect to the SQL 2000 server. I can connect without problems.

Here's my issue; one particular table isn't yielding any result to this query:

SELECT * FROM Item  

But this query is successful when I test it in SQL Query Analyzer.

I do get results (also remote) when I change it to:

SELECT * FROM Brands  

The Item table holds over 500K records and I'm thinking that might have something to do with it. Could that be the issue, and how can I work around it? I have tried this without result:

SELECT TOP 10 * FROM Item  

This is the php code to connect:

try {
    $db = new PDO('odbc:Driver=FreeTDS; Server=xxx.xxx.xxx.xxx; Port=1433; Database=xxxxx; UID=xxxxx; PWD=xxxxxx;');
} catch (PDOException $exception) {
    die("Unable to open database.<br>Error message:<br><br>$exception.");
}

And this is the select code:

$query = "SELECT * FROM Item";  $statement = $db->prepare($query);  $statement->execute();  $data = $statement->fetchAll(PDO::FETCH_ASSOC);    echo '<pre>';  var_dump($data);  echo '</pre>';  

Which returns an empty array.

looking for a postgresql load emulation client

Posted: 06 Jun 2013 08:48 PM PDT

Is there any load emulation client for PostgreSQL, like mysqlslap? And is there any good monitoring tool for a PostgreSQL database server?

Thanks

pg_dump 9.2.x command does not work with pg_dump 9.2.3

Posted: 06 Jun 2013 09:03 PM PDT

This is how I have always created backups of my databases with PostgreSQL 9.2.x:

pg_dump -Fc -Z 6 -U postgres mydatabase > 2013-xx-xx_xxxxx.db  

However, on a machine running PostgreSQL 9.2.3 I always get the error "too many command line arguments", starting with -Z. If I remove -Z it complains about -U. What is wrong?


Okay, there is definitely something wrong. I removed argument after argument until I was just running pg_dump. Then I added the arguments back one by one and it worked. So I opened another command line window and tried the same: first "too many arguments", then after calling only pg_dump and adding the arguments back it worked...

Why InnoDB ALTER TABLE is so slow?

Posted: 06 Jun 2013 04:11 PM PDT

The machine has a 4-core hyper-threaded Intel i7 670 @ 2.8GHz, 16G RAM, 8G of which is the InnoDB buffer pool. The database resides on an Intel SSD. The DBMS is MariaDB 5.5.30 x64 running under Windows Server 2008 R2 x64. The problem is that ALTER TABLE still runs too damn slow, i.e. converting a 100M InnoDB table to the COMPRESSED row format takes about two minutes. During the operation the CPU isn't even running at full speed, showing 10-20% load. More than half of the InnoDB buffer pool is free. SSD active time is roughly 5% or lower. So what could be the bottleneck this time? InnoDB setup is available here.
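
For context, a sketch of the kind of statement being described (table name and KEY_BLOCK_SIZE are illustrative; on 5.5-era servers COMPRESSED also requires innodb_file_per_table and the Barracuda file format):

ALTER TABLE mydb.mytable
    ROW_FORMAT = COMPRESSED,
    KEY_BLOCK_SIZE = 8;   -- compress pages to 8 KB

In 5.5 this kind of ALTER rebuilds the table by copying every row into a new compressed tablespace, and the copy runs single-threaded, which would be consistent with roughly one core being busy while overall CPU, buffer pool, and SSD all look idle.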

Backup not creating a new file

Posted: 06 Jun 2013 03:35 PM PDT

I run a daily backup of my database through command prompt:

c:\sqlcmd -H localhost -Q "BACKUP DATABASE test TO DISK='c:\test.bak'"  

But the new backup replaces the previous one. Please tell me what to change so that all backups are kept on disk.
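
A minimal sketch of one common fix, assuming the job can run a T-SQL batch that builds a date-stamped file name (path and names are illustrative):

DECLARE @file nvarchar(260) =
    N'c:\backups\test_' + CONVERT(nvarchar(8), GETDATE(), 112) + N'.bak';  -- e.g. test_20130606.bak

BACKUP DATABASE test TO DISK = @file WITH INIT;

With a unique file per day nothing gets overwritten; alternatively, backing up to a fixed file WITH NOINIT appends each backup set to the same file instead of replacing it.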

Does SQL Server have a reference partitioning equivalent?

Posted: 06 Jun 2013 05:40 PM PDT

Oracle 11g introduced a new feature called reference partitioning.

Using reference partitioning, a child table can inherit the partitioning characteristics from a parent table.

http://www.orafaq.com/wiki/Reference_partitioning

Is there an equivalent feature in SQL Server?

Database Mail sending functionality not working on local system

Posted: 06 Jun 2013 03:28 PM PDT

I am using the Database Mail functionality to send mail from a SQL Server 2008 database via the following sp:

EXEC sp_send_dbmail
    @profile_name = 'MyProfile',
    @recipients   = 'abc@companyname.com',
    @subject      = 'Test message',
    @body         = 'Congrats Database Mail Received By you Successfully.'

I have tried with my Gmail account profile on my local system and it works properly, but not with my company or Outlook profile.

Error Message:

The mail could not be sent to the recipients because of the mail server failure. (Sending Mail using Account 5 . Exception Message: Could not connect to mail server. (A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond abc.j.i.ooo:pp). )  

What could be the problem?

Thanks

SQL Server Many-to-One replication

Posted: 06 Jun 2013 01:58 PM PDT

I have 8 individual SQL Server 2008 R2 machines, each hosting 1 database. Each database has an identical table structure and schema, and entirely unique data.

I would like to establish a reporting server (may be 2008 or 2012), that consolidates the rows from selected tables across the 8 source servers into a single instance of those tables on the reporting server. This is one-way replication (no changes will be made to the reporting server). I will need to replicate changes from the source databases with relatively low latency (say 20-30 seconds).

Furthermore, I'd like to find methods to achieve this with as little impact to the source servers as is practical. 3rd-party agents, triggers, or schema mods to those servers are difficult in my environment.

My questions:

  • What are promising architectures and technologies for achieving this goal?
  • I've looked at SQL Server Merge Replication, but I am concerned about latency. Is this an appropriate technology for this goal?
  • Are there many-to-one architectures for transactional replication?
  • Should I be looking at 1-to-1 replication into 8 databases on my reporting server, followed by some custom merge function (a 2-step replication)?

Thanks, John

Cannot Delete FileStream Group from Sql Server 2008

Posted: 06 Jun 2013 11:46 AM PDT

We have migrated all our varbinary(max) column data to Azure Blob Storage, so we want to remove the old filestream columns that remain in our SQL Server 2008 database, along with the filestream filegroup, but when we try we get the error:

Msg 5042, Level 16, State 11, Line 2  The filegroup 'FileStreamGroup' cannot be removed because it is not empty.  

However when we run this:

exec sp_helpfilegroup 'FileStreamGroup'  

It is returning this:

groupname        groupid  filecount
FileStreamGroup  2        0

So the file count is 0, but it won't let us remove it. Has anybody else had this problem, and how on earth do you completely remove filestream from the database?

In addition this query:

select * from sys.tables t
join sys.data_spaces ds on t.filestream_data_space_id = ds.data_space_id

Returns 0 rows, so no tables are using any filestream data, if I understand this correctly.
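
As a further check, a small sketch of a query that looks for any columns still marked as FILESTREAM, since a leftover FILESTREAM column would keep the filegroup in use even if it holds no data:

SELECT OBJECT_NAME(c.object_id) AS table_name, c.name AS column_name
FROM sys.columns c
WHERE c.is_filestream = 1;

If that returns rows, dropping those columns is what eventually lets ALTER DATABASE ... REMOVE FILEGROUP succeed; in some cases the filestream garbage collector also needs a CHECKPOINT (and, under full recovery, a log backup) before the container is truly empty.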

In Memory Database for high transaction and volume spikes on a website

Posted: 06 Jun 2013 02:13 PM PDT

I am looking at the IA for a hypothetical application scenario, for example:

  • A high demand application such as an online ticketing website where there can be demand for 100,000 tickets to be purchased per minute. At the same time the number of tickets needs to be tracked accurately so that they are not oversold (so I'm thinking to maintain the ticket count in memory).

I am thinking the less time-critical tasks can be queued or deferred, e.g. email confirmation, and everything written to the physical DB after the ticket purchase transaction is complete.

The only risk with in-memory, I believe, is if the software/hardware fails, but there appear to be clustered options with some IMDB providers so that they are failsafe.

I have considered horizontal scaling/sharding with regular RDBMS but I am worried about the costs. I'd love to hear some thoughts if anybody has dealt with similar scenarios and if they used IMDB or something else?

Can I add a unique constraint that ignores existing violations?

Posted: 06 Jun 2013 02:37 PM PDT

I have a table which currently has duplicate values in a column.

I cannot remove these erroneous duplicates but I would like to prevent additional non-unique values from being added.

Can I create a UNIQUE constraint that doesn't check existing rows for compliance?

I have tried using NOCHECK but was unsuccessful.

In this case I have a table which ties licensing information to "CompanyName"

EDIT: Having multiple rows with the same "CompanyName" is bad data, but we can't remove or update those duplicates at this time. One approach is to have the INSERTs use a stored procedure which will fail for duplicates... If it was possible to have SQL check the uniqueness on its own, that would be preferable.

This data is queried by company name. For the few existing duplicates this will mean that multiple rows are returned and displayed... While this is wrong, it's acceptable in our use case. The goal is to prevent it in the future. It seems to me from the comments that I have to do this logic in the stored procedures.
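
One approach that may fit (SQL Server 2008+; table, index, and the excluded company names are illustrative) is a filtered unique index, since UNIQUE constraints themselves cannot be created WITH NOCHECK:

CREATE UNIQUE NONCLUSTERED INDEX UX_Licenses_CompanyName
ON dbo.Licenses (CompanyName)
WHERE CompanyName <> 'DupCo1'
  AND CompanyName <> 'DupCo2';   -- list the known duplicate names to exempt them

New rows with any other company name are forced to be unique, while the existing duplicates are left alone; the trade-off is that the exempted names can keep accumulating duplicates, and the index definition has to change if the list of legacy duplicates changes.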

Why don't databases create their own indexes automatically?

Posted: 06 Jun 2013 03:42 PM PDT

I would have thought that databases would know enough about what they encounter often, and be able to respond to the demands they're placed under, that they could decide to add indexes to highly requested data.

Shell: How to time a script running in SQLPlus and kill it after x amount of minutes?

Posted: 06 Jun 2013 05:05 PM PDT

I have a little tool which automatically runs a series of SQL scripts and outputs the results to .XLS when an appropriate request is made. However, some of the SQL scripts need to be run against a live database, and I don't want them to run for more than 5 minutes.

Is there a way to - within my shell script - time the script's run time and kill it after 5 minutes?

Thanks in advance.

Would Mongo's ObjectID work well in InnoDB's clustered index?

Posted: 06 Jun 2013 05:50 PM PDT

Mongo's ObjectID is defined like this:

ObjectId is a 12-byte BSON type, constructed using:

  • a 4-byte value representing the seconds since the Unix epoch,
  • a 3-byte machine identifier,
  • a 2-byte process id,
  • and a 3-byte counter, starting with a random value.

Assuming the PRIMARY key was BINARY(12), would this work as well as an auto-incremented INT? Does it count as sequential for InnoDB's clustered index? Would gaps in the timestamp be a problem?
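
For illustration, a sketch of what that would look like (MySQL/InnoDB; table and column names are illustrative):

CREATE TABLE documents (
    id      BINARY(12) NOT NULL,   -- Mongo-style ObjectId bytes
    payload BLOB,                  -- illustrative payload column
    PRIMARY KEY (id)
) ENGINE = InnoDB;

Because the leading 4 bytes are a timestamp, values are roughly time-ordered, so inserts tend to land near the right-hand edge of the clustered index rather than at random positions the way a random UUID would; gaps in the timestamp don't matter to a B-tree, only the ordering does.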

How database administrators can see my requests to SQL Server?

Posted: 06 Jun 2013 02:39 PM PDT

I'm a SQL Server 2008 user. I have access to some tables. I need to request a few columns from a table, as I usually do, but I need to do it (for example) once every 5 seconds, and the system administrators shouldn't see (feel :) my activity.

The result of the request is a table with approximately 100 rows. My query contains only a SELECT and a WHERE clause on an index (it is light and executes very fast).

As I know, SELECT operations don't write to the transaction log. I mean, if I only read the database, where would SQL Server keep a log of my SELECT actions? Can an administrator see my SELECT queries?

C2 audit, as I can see in properties, is disabled.

Are there any other ways to see my activity?

Thanks.

Deriving formulas for input/output

Posted: 06 Jun 2013 08:24 PM PDT

I'm currently enrolled in a DBS class and am having a problem with an assignment. I've searched around and have been unable to understand what I'm meant to be doing with this derivation formula.

A plant file with TREE-GENUS as the key field includes records with the following TREE-GENUS values: Tsuga, Ficus , Arbutus, Quercus, Melaleuca, Tristaniopsis, Cornus, Sequoiadendron, Lithocarpus, Liriodendron, Pittosporum.
Suppose that records with these search field values are inserted into a random (heap) file with a maximum of 3 records per block. Derive a formula for the expected number of disk I/O to scan these records and to search for a particular record

I've been using some software that was given with the assignment, and it also asks for the maximum number of blocks that are allowed, which is not given in the brief above. I'm not really sure how to derive a formula for this. I've assumed that because there are 3 records per block, 4 blocks are required, and that a random heap file uses 1 disk I/O per write/read.

If this is a larger topic than is worth explaining, a link to a few reliable pages would also be helpful.
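
For what it's worth, a sketch of the standard heap-file cost formulas, assuming n = 11 records, a blocking factor of bfr = 3 records per block, and one I/O per block read as stated above:

\[
b = \left\lceil \frac{n}{\mathrm{bfr}} \right\rceil = \left\lceil \frac{11}{3} \right\rceil = 4
\]
\[
\text{scan all records: } b = 4 \text{ I/Os}, \qquad
\text{search for one record (linear): } \approx \frac{b}{2} = 2 \text{ I/Os on average, } b = 4 \text{ worst case}
\]

The b/2 average assumes the record exists and is equally likely to be in any block; an unsuccessful search has to read all b blocks.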

Is SQL Server 2012 Management Studio "Express" any different?

Posted: 06 Jun 2013 03:33 PM PDT

Are there any differences between the version of SSMS that comes as part of the SQL Server 2012 Enterprise installer and SQL Server 2012 Express Management Studio?

Proper Use of Lookup Tables

Posted: 06 Jun 2013 12:19 PM PDT

I'm having trouble figuring out exactly how to place good boundaries for when and where to use lookup tables in a database. Most sources I've looked at say that I can never have too many, but at some point it seems like the database would be broken down into so many pieces that, while it may be efficient, it is no longer manageable. Here's a thrown-together example of what I'm working with:

Let's say I have a table called Employees:

ID  LName   FName   Gender  Position
1   Doe     John    Male    Manager
2   Doe     Jane    Female  Sales
3   Smith   John    Male    Sales

Pretend for a moment that the data is more complex and contains hundreds of rows. The most obvious thing I see that could be moved to a lookup table would be Position. I could create a table called Positions and stick the foreign keys from the Positions table into the Employees table in the Position column.

ID  Position
1   Manager
2   Sales

But how far can I continue to break the information down into smaller lookup tables before it becomes unmanageable? I could create a Gender table and have a 1 correspond to Male and a 2 correspond to Female in a separate lookup table. I could even put LNames and FNames into tables: all "John" entries are replaced with a foreign key of 1 that points to the FName table, which says an ID of 1 corresponds to John. If you go down this rabbit hole too far, though, your Employees table is reduced to a mess of foreign keys:

ID  LName   FName   Gender  Position
1   1       1       1       1
2   1       2       2       2
3   2       1       1       2

While this might or might not be more efficient for a server to process, this is certainly unreadable to a normal person who may be trying to maintain it and makes it more difficult for an application developer trying to access it. So, my real question is how far is too far? Are there "best practices" for this sort of thing or a good set of guidelines somewhere? I can't find any information online that really nails down a good, useable set of guidelines for this particular issue I'm having. Database design is old hat to me but GOOD database design is very new so overly technical answers may be over my head. Any help would be appreciated!
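
To make the trade-off concrete, a minimal sketch (generic SQL; all names are illustrative) of one common middle ground: a lookup table for Position, plain columns for free-form data like names, and a view to keep the data readable for people and applications:

CREATE TABLE Positions (
    PositionID   int PRIMARY KEY,
    PositionName varchar(50) NOT NULL UNIQUE
);

CREATE TABLE Employees (
    EmployeeID int PRIMARY KEY,
    LName      varchar(50) NOT NULL,
    FName      varchar(50) NOT NULL,
    Gender     char(1)     NOT NULL,   -- tiny fixed domain; a CHECK constraint can replace a lookup table
    PositionID int NOT NULL,
    CONSTRAINT FK_Employees_Positions
        FOREIGN KEY (PositionID) REFERENCES Positions (PositionID)
);

CREATE VIEW EmployeeDetails AS
SELECT e.EmployeeID, e.LName, e.FName, e.Gender, p.PositionName
FROM Employees e
JOIN Positions p ON p.PositionID = e.PositionID;

A hedged rule of thumb: use a lookup table when the values form a controlled list you might rename, extend, or attach attributes to (Position); skip it for free-form data (names) and for tiny fixed domains that a CHECK constraint covers (Gender).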

Is normalization process needed?

Posted: 06 Jun 2013 03:10 PM PDT

Well, after learning DBMS as a subject I have many questions in mind, and normalization is one of them. As I learnt it there was a lot of confusion, and I found that whatever we do in the normalization process we can also do by general common sense. Even when building projects, people don't tend to follow it. So is it really needed? Is it followed in companies? I am asking this because it probably takes more time to normalize a database formally. If we can normalize it using just common sense, then I don't think there is any need to follow the standard normalization procedure. Correct me if I am wrong.
