[how to] Eliminating duplicate records in data cleansing
- Eliminating duplicate records in data cleansing
- Failed to set relationship for queries
- Can't get rid of filesort on joined query sorting on second table
- Statistical Analysis of Data that has to be done in an order?
- Cannot see Green Button with white arrow in object explorer; DTC config problems?
- SQL Server Memory for Backup Buffer
- UTF8 vs ASCII or custom binary format: high-performance optimization on very big tables
- Can I limit an index to only contain the newest relevant row?
- Exadata exachk.sh
- Optimum configuration for 4 SSD drives
- Migrating oracle to sybase using insert into fails due to missing functions
- How to restore SQL Server database with mirroring
- Is there any way to Restore a Database from newer version to an Older Version of SQL Server? [duplicate]
- MySQL error 2006 MySQL server has gone away after upgrade to 5.6.10 using homebrew on Mac OS X 10.8.3
- Mysql query grouping by two columns for max date takes too long
- Check if bit is set in mysql
- Clear schema from database without dropping it
- Are there any disadvantages to partitioning on financial year?
- SUPER privilege not defined for master user in Amazon MySQL RDS
- How to import table's data in MySql from Sql Server?
- Is it possible to pipe the result of a mysqldump straight to rsync as the source argument?
- Generate XML using Oracle XML schema
- Import batches of rows from Excel in parallel
- Connect Error (2054) - mysql_old_password issue still not solved
- MySQL Workbench sync keeps requesting the same changes
- Deleting Data From Multiple Tables
- Minimizing Indexed Reads with Complex Criteria
- How to remove a database server from Utility Explorer
- What Problems Does an Eager Spool Indicate
Eliminating duplicate records in data cleansing Posted: 27 May 2013 07:28 PM PDT I have a database full of records of people with simple information like first name, last name, email, location, ... . I need to eliminate the duplicate records. From what I've found, the process is called "duplicate elimination" in data cleansing. Does anyone know a good open source tool to do that?
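Tooling aside, exact-match duplicates can also be removed in plain SQL. A minimal sketch, assuming a hypothetical people table with an id primary key (fuzzy matching on typos or name variants would still need a dedicated tool):

    -- Keep the lowest id in each (first_name, last_name, email) group
    -- and delete every later duplicate (MySQL multi-table DELETE syntax).
    DELETE dup
    FROM people AS dup
    JOIN people AS orig
      ON  orig.first_name = dup.first_name
      AND orig.last_name  = dup.last_name
      AND orig.email      = dup.email
      AND orig.id         < dup.id;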
Failed to set relationship for queries Posted: 27 May 2013 09:05 PM PDT I have two tables:
Note that 1005 is only in table 2; it's not in table 1. I have tried many ways to solve the problem, but the 1005 record is not coming back from my query.
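The table definitions and query are not shown, but a row that exists on only one side of a join disappearing is the classic symptom of an INNER JOIN where an outer join is needed. A hedged sketch with hypothetical table and column names:

    -- A LEFT JOIN keeps every row of table2 (where 1005 lives),
    -- filling the table1 columns with NULL when there is no match.
    SELECT t2.id, t2.value, t1.other_value
    FROM   table2 AS t2
    LEFT JOIN table1 AS t1 ON t1.id = t2.id;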
Can't get rid of filesort on joined query sorting on second table Posted: 27 May 2013 03:20 PM PDT For some reason, this query: says it would use a filesort: I have tried setting up several indexes on
Statistical Analysis of Data that has to be done in an order? Posted: 27 May 2013 04:23 PM PDT Bear with me - this is the first time I've tried this in SQL Server; normally I do it on the front end ;) I am implementing some analysis on time-coded data series. This is not super complicated stuff, but some of it requires numbers we do not store in the database, which have to be calculated by aggregating the stored numbers with a specific algorithm, IN ORDER. To give an example:
This cannot be pre-calculated due to dynamic filtering - there are a number of filters that can be applied to the data. So far - in the past - I pulled the data into the application; now, for the standard stuff, I plan to try to keep it in SQL Server. My problem: I can see how that works (acceptably) in SQL Server. But if I put that into a view and then filter out rows, the sum is still calculated from the beginning. And I need a view because I want (need) to map that standard analysis data into an ORM (so dynamic SQL is out). Anyone have an idea how to do that?
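The elided query is not shown, but SQL Server's windowed aggregates are the usual tool for ordered running calculations, and they also illustrate the view problem: the window is evaluated after the WHERE clause, so rows filtered out on top of the view silently change the running sum. A minimal sketch, assuming a hypothetical readings table:

    -- Running total per series, in time order (SQL Server 2012+;
    -- ROWS UNBOUNDED PRECEDING makes it a true running sum).
    SELECT series_id,
           reading_time,
           value,
           SUM(value) OVER (PARTITION BY series_id
                            ORDER BY reading_time
                            ROWS UNBOUNDED PRECEDING) AS running_total
    FROM   dbo.readings
    WHERE  reading_time >= '20130101';  -- filtering here restarts the sum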
Cannot see Green Button with white arrow in object explorer; DTC config problems? Posted: 27 May 2013 01:35 PM PDT I have SQL Servers across locations set up for replication. On one of the SQL Server instances that I installed, when I connect to it locally I can see the green button with the white arrow in Object Explorer. However, when I connect to it from any other location, I cannot see it, nor can I start or stop the SQL Server Agent, even though locally I am able to do that. Additionally, MSDTC has only 2 items sent, both of which were rejected, whereas the other servers have hundreds committed and sent. Is there something wrong with the DTC settings? Please help.
SQL Server Memory for Backup Buffer Posted: 27 May 2013 02:05 PM PDT In SQL Server 2012, does memory for the backup buffer come from the buffer pool or from outside it (VAS)? Has this behavior changed between SQL Server 2008 R2 and SQL Server 2012? EDIT: Does the Max Server Memory configuration setting now control memory for the backup buffer? Historically, this came out of MemToLeave (virtual address space outside the SQL Server buffer pool).
UTF8 vs ASCII or custom binary format: high-performance optimization on very big tables Posted: 27 May 2013 12:25 PM PDT The summary of my question is whether there are advantages, even if minimal, to using ASCII, or an even smaller format designed specifically for the situation, instead of UTF-8 for strings. It is possible for the webserver that accesses the data to translate an ASCII string to a more compact format. In this case, just 1-3 tables in the database handle a massive amount of data, so every byte can make a difference if the data fits in memory instead of being read from disk. The information will be accessed via a RESTful service. Reads vs. writes: The project needs many more reads than writes. But writing has a peculiarity: every 10 seconds, between 40 and 300 new rows are inserted into the main table. These could be written in parallel, since they do not depend on each other. Memory vs. disk usage: Recently inserted rows, which will be used immediately, are also placed in a cache for the webservice, so there is no need to read them again. But searching old records needs the database, and it should be fast. This is why I suppose that using fewer bytes to store some fields will make a difference: even a larger amount of data will fit in memory more easily. If I cannot fit the data in memory and the database cannot hide the slowdown from me, either I will need to force slow table/partition scans every 10 seconds for just one user, or I will be forced to do single selects and cache them on the webserver, which breaks the "stateless" part of the REST concept. Characters that must be supported: 0-9, A-Z, "-", "_". Maybe "a-z" will be needed. Just 38 or 64 characters, and never more than that. For now, most columns are CHAR(3), CHAR(6), VARCHAR(8) and VARCHAR(10). Examples:
Technologies used: The database will be MariaDB. Maybe part of the raw data will live in some NoSQL database. The language of the webservice does not really make a difference here, but it will be PHP 5.4 with the Phalcon PHP framework. Different types of cache could be used, from Varnish Cache to APC/Memcached to caches inside the database. My initial hypothesis: One VARCHAR(10) field needs 80 bits. If I use a custom alphabet of 64 or fewer characters instead of 255, I save around 4 times, so instead of 80 bits it will use just 20 bits.
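One concrete way to try this idea in MariaDB, at least for the pure 0-9/A-Z columns, is to treat each string as a base-36 number and store it as an integer; a hedged sketch (a VARCHAR(10) over that alphabet fits in a BIGINT, since 36^10 < 2^63, though the "-" and "_" characters would need a larger custom alphabet):

    -- Encode a 6-character alphanumeric key into a number for storage:
    SELECT CAST(CONV('AB12CD', 36, 10) AS UNSIGNED) AS packed;   -- 623187229

    -- Decode it back to the original text on the way out:
    SELECT CONV(623187229, 10, 36) AS unpacked;                  -- 'AB12CD'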
Can I limit an index to only contain the newest relevant row? Posted: 27 May 2013 06:46 PM PDT I have a table in PostgreSQL 9.2 that records login sessions, called user_login_session. This table contains user_id, start_timestamp and end_timestamp columns, amongst others. It has started growing as my system gains traction, and I can't purge old records for now. I've defined an index on this table, but I'm concerned about its growth over time. Is it possible to define an index which only keeps a handle on the newest login session row per user ID? Older rows are not accessed operationally.
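A PostgreSQL partial-index predicate cannot compare a row against other rows, so "only the newest row per user" is not directly expressible. Two common approximations, sketched with the column names given (the first assumes the current session is the one whose end_timestamp is still NULL, which may not match the actual schema):

    -- 1) Partial index covering only still-open sessions:
    CREATE INDEX user_login_session_open_idx
        ON user_login_session (user_id)
        WHERE end_timestamp IS NULL;

    -- 2) Composite index that makes "latest session per user" lookups
    --    cheap even though it covers every row:
    CREATE INDEX user_login_session_latest_idx
        ON user_login_session (user_id, start_timestamp DESC);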
Exadata exachk.sh Posted: 27 May 2013 11:48 AM PDT I would like to run exachk.sh on a regular basis. Right now I have to interact with the script and answer all kinds of password questions before it starts to extract Exadata information. I want to automate this and am looking for ways of doing so. Do I really need to modify the exachk.sh script? The problem with that is that the script is updated fairly often on Metalink. Suggestions? Thanks, Bjarte Brandt
Optimum configuration for 4 SSD drives Posted: 27 May 2013 12:40 PM PDT I would like to move a SQL Server database from a shared configuration with the web server to its own dedicated box. My current budget will allow me to put 4 disks together in an array with a single hot spare. I'd like to stretch to 8+ drives, but the costs are a little out of my budget right now (and probably somewhat overkill). So my question is: what would be the optimum configuration for SQL Server 2012 when limited to 4 disks? The database is around 29 GB and is growing around 250-500 MB per month. The workload is typically 80% reads to 20% inserts/updates/deletes. I understand from researching this subject that my options are as follows:
I'm looking for a solution that will give me reasonable performance but won't obliterate the array if a single drive fails (which I understand is quite common with SSDs). Current hardware: HP ProLiant DL360 G7, 1 x Xeon E5640 / 2.66 GHz, 12 GB RAM, 2 x 300 GB pluggable SAS SFF 10,000 rpm disks in RAID 1.
Migrating oracle to sybase using insert into fails due to missing functions Posted: 27 May 2013 12:26 PM PDT I generated the DDL from Oracle to Sybase, created the tables with no issues, and even migrated most tables into Sybase using INSERT INTO. However, when it came to the time functions, I get errors: and
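The error text is elided, but Oracle date/time functions commonly have no direct Sybase equivalent and must be rewritten; a hedged sketch of two typical substitutions, assuming the failures involve functions like SYSDATE or TO_CHAR:

    -- Oracle's SYSDATE has no counterpart in Sybase ASE; GETDATE()
    -- plays that role:
    SELECT GETDATE()

    -- Oracle's TO_CHAR(order_date, 'YYYYMMDD') becomes CONVERT style 112:
    SELECT CONVERT(CHAR(8), order_date, 112) FROM orders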
How to restore SQL Server database with mirroring Posted: 27 May 2013 04:28 PM PDT I have a SQL Server database with mirroring. I am trying to restore the database using the command: I tried the following: I set "remove tail log", I set "close connections", and I chose to override the existing database. I get the error: Additional info: Microsoft SQL Server 2012 (SP1) - 11.0.3000.0 (X64)
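The error text is elided, but a database cannot be restored while it is part of a mirroring session; the mirroring has to be removed first. A hedged sketch of the usual sequence (database name and path are hypothetical):

    -- Break the mirroring session, then restore over the database:
    ALTER DATABASE MyDb SET PARTNER OFF;

    RESTORE DATABASE MyDb
        FROM DISK = N'C:\Backups\MyDb.bak'
        WITH REPLACE, RECOVERY;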
Is there any way to Restore a Database from newer version to an Older Version of SQL Server? [duplicate] Posted: 27 May 2013 11:21 AM PDT This question already has an answer here: I have a database backup file from SQL Server 2012 and I want to restore it on SQL Server 2008 R2. If we can't restore a SQL Server 2012 database to an earlier version, what is the next best option? Is there any other way to restore it?
MySQL error 2006 MySQL server has gone away after upgrade to 5.6.10 using homebrew on Mac OS X 10.8.3 Posted: 27 May 2013 12:43 PM PDT I upgraded my installation of MySQL on Mac OS X 10.8.3 using homebrew
Everything seemed to go smoothly, but... not really, as it turns out. I cannot add a new user (neither through the command line nor through phpMyAdmin); I always get the error
I can create new databases without a problem, and I can install, for example, Joomla 3.1.1 with no problems. However, when I try to install a new extension into my Joomla installation, I get errors. For example, I get the following error when I try to install a component from NoNumber: Same thing for many other extensions. I tried to start mysql with
and it started with no problem. I looked up issues where error #2006 was being reported and how to solve them; they referred to modifying /etc/my.cnf, but there is no /etc/my.cnf configuration file on my system, and there wasn't one in the previous installation of MySQL 5.5.29 (also installed with homebrew), where I had no problems. I ran The output from Exploring a little further, I tried to revert to my previous install of MySQL using
This resulted in the following output: I then accessed mysql from the command line and ran status: As you can see above, it reports But it also says So something isn't syncing up. Furthermore, when I show databases, I get the following output: But I know I have more than just an So, somehow, it seems to me that mysql is not connecting to the right server, as phpMyAdmin reports the same server information that mysql's status command reports: At this point I am completely lost and would really appreciate some help.
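For what it's worth, error 2006 ("server has gone away") is frequently cured by raising max_allowed_packet, and Homebrew's MySQL will read a hand-created /etc/my.cnf; a minimal sketch, with illustrative values only:

    [mysqld]
    # "server has gone away" during large operations is often a
    # packet-size limit
    max_allowed_packet = 64M
    # keep idle connections from being dropped mid-session
    wait_timeout = 28800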
Mysql query grouping by two columns for max date takes too long Posted: 27 May 2013 06:13 PM PDT I have a table that looks like this: The timestamp field is binary. All other fields are varbinary. I am not interested in the l_name field, but it is relevant for the indexes. What I want to get back is, for each unique combination of l_p and l_a, the row with the max timestamp. I also have some additional conditions: the timestamp should be smaller than 2013-1-1 and the l_act field should be q. The table I want to get would look something like this: The table contains about 50 million records. My current query is extremely slow; it takes hours and I still haven't gotten a result. This is the query: I've also tried this variation with the same results: The indexes on the table are: Any ideas on why it takes so long? The table is InnoDB.
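The actual query and index definitions are elided, but this is the classic greatest-n-per-group shape; one common rewrite, sketched with the column names given and a hypothetical table name t (the suggested composite index is an assumption about what would let the GROUP BY run from the index alone):

    -- Helpful index for this shape: (l_act, l_p, l_a, timestamp)
    SELECT t.*
    FROM   t
    JOIN  (SELECT l_p, l_a, MAX(`timestamp`) AS max_ts
           FROM   t
           WHERE  `timestamp` < '2013-01-01' AND l_act = 'q'
           GROUP  BY l_p, l_a) AS m
      ON  m.l_p = t.l_p
      AND m.l_a = t.l_a
      AND m.max_ts = t.`timestamp`;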
Check if bit is set in mysql Posted: 27 May 2013 06:09 PM PDT I have a field which stores the days of the week in binary representation. For example, Mon-Wed-Fri would be represented as b'0101010', with the first digit as Sunday and the last as Saturday. What I want to achieve: if today is Monday, I want to check if the Monday bit is set to 1. I do that with (b'0101010' & b'0100000' > 0). But I am struggling with converting the current date into the corresponding weekday binary representation. How can I achieve this?
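MySQL's DAYOFWEEK() returns 1 for Sunday through 7 for Saturday, which maps directly onto the layout described (Sunday as the leftmost of the 7 bits); a minimal sketch, assuming a hypothetical schedule table with the bit field in a days column:

    -- 1 << (7 - DAYOFWEEK(...)) builds today's single-bit mask:
    --   Sunday   -> 1 << 6 = b'1000000'
    --   Saturday -> 1 << 0 = b'0000001'
    SELECT *
    FROM   schedule
    WHERE  days & (1 << (7 - DAYOFWEEK(CURDATE()))) > 0;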
Clear schema from database without dropping it Posted: 27 May 2013 03:58 PM PDT I'm working on a school project where I have a SQL Server database for my team. I already imported a local database created with Entity Framework. Now the model has changed - table properties were added and deleted - and I want to update my full database. However, the teachers didn't give us CREATE rights, so dropping the whole database isn't really an option. So my question is: is it possible to drop all the tables currently in the database and just import the newly created ones without problems? Or do I really need to drop the whole database?
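Emptying the database without dropping it is routine; a hedged sketch (foreign keys must be dropped before the tables they reference, and sp_MSforeachtable is undocumented but widely used):

    -- 1) Drop every foreign key constraint:
    DECLARE @sql nvarchar(max) = N'';
    SELECT @sql += N'ALTER TABLE '
                 + QUOTENAME(OBJECT_SCHEMA_NAME(parent_object_id)) + N'.'
                 + QUOTENAME(OBJECT_NAME(parent_object_id))
                 + N' DROP CONSTRAINT ' + QUOTENAME(name) + N';'
    FROM sys.foreign_keys;
    EXEC sp_executesql @sql;

    -- 2) Drop every table:
    EXEC sp_MSforeachtable 'DROP TABLE ?';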
Are there any disadvantages to partitioning on financial year? Posted: 27 May 2013 10:55 AM PDT Our current setup has one table per financial year (May 1 - April 30). Each table has approx 1.5 million rows. We have about 8 years of data, and will obviously be adding more each year. The majority of queries are within one financial year/one partition. Either My plan is to have a range partition on an InnoDB table, e.g. This means that the PK has to become Are there any significant disadvantages to partitioning compared to having an unpartitioned table? I know that means the PK is now length 12 and all further indexes will have it prepended. Does that make a difference? The table needs to work faster on reads than writes, and there are a fair few indexes on it.
We do sometimes need to query across all time or over "the last X months", but this is pretty rare. The main advantages of moving to a single table are eliminating the logic in the application that works out which table to insert/update/select from, and not needing UNIONs in the situations where we need more than one table.
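The elided partition definition aside, a May 1 financial year can be expressed directly as a RANGE partition; a minimal sketch with hypothetical table and column names (the partitioning column must appear in every unique key, which is exactly why the PK grows as described):

    CREATE TABLE sales (
        id        INT NOT NULL AUTO_INCREMENT,
        sale_date DATE NOT NULL,
        amount    DECIMAL(10,2),
        PRIMARY KEY (id, sale_date)       -- partition column joins the PK
    )
    PARTITION BY RANGE (TO_DAYS(sale_date)) (
        PARTITION fy2012 VALUES LESS THAN (TO_DAYS('2012-05-01')),
        PARTITION fy2013 VALUES LESS THAN (TO_DAYS('2013-05-01')),
        PARTITION fymax  VALUES LESS THAN MAXVALUE
    );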
SUPER privilege not defined for master user in Amazon MySQL RDS Posted: 27 May 2013 01:12 PM PDT I have created a medium instance on Amazon RDS in the Asia Pacific (Singapore) region. I created my master user with a master password, and it works/connects fine with Workbench installed on my local PC. When I go to create a function on that instance, it shows me the following error
On my instance, the variable log_bin_trust_function_creators shows OFF. Now, when I go to change that variable using it gives me another error
I don't know how to solve this error. Can anybody help?
How to import table's data in MySql from Sql Server? Posted: 27 May 2013 02:12 PM PDT I am trying to export a table from SQL Server 2008 R2 to MySQL 5.5. For this I am using Here the error may be occurring because the table in SQL Server has a column with data type Please provide your expert answers. If it's not possible through
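The tool in use is elided, but one route that sidesteps tool-specific type mapping is to export the table as a flat file on the SQL Server side (with bcp or the Import/Export Wizard) and bulk-load it in MySQL; a sketch of the MySQL half, with hypothetical file and table names:

    LOAD DATA LOCAL INFILE '/tmp/table_export.csv'
    INTO TABLE target_table
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;   -- skip the header row, if one was exported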
Is it possible to pipe the result of a mysqldump straight to rsync as the source argument? Posted: 27 May 2013 03:12 PM PDT Is it possible to pipe the result of a mysqldump straight to rsync as the source argument? Conceptually, I was thinking something like: I've seen people pipe the result to mysql for a one-liner backup solution, but I was curious whether it is possible with rsync. You know - because rsync is magic :) Thanks for your time!
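rsync's delta algorithm needs seekable files on both sides, so a pipe cannot be its source; the usual compromises are to materialize the dump and then rsync it, or to skip rsync and stream over ssh. A hedged shell sketch (host and paths are hypothetical):

    # Option 1: dump to a file, then let rsync transfer just the delta
    mysqldump --all-databases > /backups/dump.sql
    rsync -az /backups/dump.sql backuphost:/backups/

    # Option 2: no rsync - stream the dump straight over ssh
    mysqldump --all-databases | ssh backuphost 'cat > /backups/dump.sql'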
Generate XML using Oracle XML schema Posted: 27 May 2013 03:53 PM PDT I have some XML schema definitions that have been registered using DBMS_XMLSCHEMA. I have created tables using the generated XMLTypes. I can create XML using XMLQuery, and if I understand correctly, the query results can be inserted into the table if the definition is correct. Ultimately the XML needs to be written to a file, but the part I am unsure about is whether writing XMLQueries from scratch is the best way to generate the XML. Is there any way to have the XML schema generate an XML stub as a guide, or generate a template XMLQuery? The data is currently all in regular Oracle tables, but arranged quite differently, so any information on easing the export to XML based on a specific set of registered schemas would be useful.
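As a point of reference, Oracle's SQL/XML functions can build schema-shaped documents straight from relational rows without hand-writing XQuery; a minimal sketch with hypothetical table, column, and element names:

    -- XMLELEMENT/XMLFOREST assemble XML from ordinary columns:
    SELECT XMLELEMENT("person",
             XMLFOREST(p.first_name AS "firstName",
                       p.last_name  AS "lastName"))
    FROM   people p;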
Import batches of rows from Excel in parallel Posted: 27 May 2013 03:53 PM PDT We receive a weekly Excel file currently containing about 250k rows. The file grows by about 300 rows each week. In SSIS I've hooked up an Excel source to an OLE DB destination to import the file, but it takes a long time. I've looked at using the Is it possible in SSIS to split the Excel file (by rows) and import it in parallel?
Connect Error (2054) - mysql_old_password issue still not solved Posted: 27 May 2013 11:48 AM PDT I use PHP 5.4.3 and I want to connect to a MySQL DB remotely (the PHP script resides on a different server than the DB) in the normal OOP way: and I get the error: I found this and this, but they don't seem to help. With PHP 5.2.9 everything worked, but downgrading PHP is not a solution for me. I also have to mention that I don't have full access to the DB's settings, because the client won't give me that access. I told him to ask HostGator for newer password format support, but they answered that they cannot do anything, which was not what I was expecting. I made a new attempt from another remote server, using PHP 5.3.9, and now I get this: It looks like the DB doesn't let me in because of the remote server's PHP script, but with Oracle's Workbench everything works fine. Thanks for your time.
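Error 2054 means the account still carries a pre-4.1 16-byte password hash, which newer PHP drivers (mysqlnd) refuse to use; the fix has to happen server-side, by re-hashing the password in the new format. A minimal sketch of what the hosting side would run (user and host are hypothetical):

    -- Run by someone with privileges on the mysql database:
    SET old_passwords = 0;
    SET PASSWORD FOR 'appuser'@'%' = PASSWORD('new-secret');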
MySQL Workbench sync keeps requesting the same changes Posted: 27 May 2013 06:12 PM PDT I am using MySQL Workbench, and when I try to "synchronize" it with my remote database, it keeps detecting changes to make. Specifically, the most recurrent ones are:
I complied and executed all the queries given to me (adding the semicolons they forgot). MySQL didn't complain and executed them. However, it didn't help; I can run it 20 times in a row and it will still ask for the same useless changes.
Deleting Data From Multiple Tables Posted: 27 May 2013 11:12 AM PDT Suppose I've a table called UNIVERSITY containing university names: Now these university IDs have (obviously) been used in many tables within the database (named, e.g., Education) - suppose 10 tables. Q: Now what happens if I delete one university? A: The universityID field in the other tables becomes NULL. But I don't want that; rather, when I delete one university from the UNIVERSITY table, all rows referencing it in all 10 tables should be deleted too. What would be the shortest and easiest MySQL query for this operation? NOTE: I'm using PHP.
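Cascading foreign keys do exactly this, so no multi-table DELETE is needed once each child table declares one; a minimal sketch with a hypothetical child table (requires InnoDB):

    ALTER TABLE student
        ADD CONSTRAINT fk_student_university
        FOREIGN KEY (universityID) REFERENCES UNIVERSITY (universityID)
        ON DELETE CASCADE;

    -- After each of the 10 child tables has such a constraint, this one
    -- statement removes the university and every row that references it:
    DELETE FROM UNIVERSITY WHERE universityID = 3;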
Minimizing Indexed Reads with Complex Criteria Posted: 27 May 2013 12:12 PM PDT I'm optimizing a Firebird 2.5 database of work tickets. They're stored in a table declared as: I generally want to find the first ticket that hasn't been processed and is in My processing loop would be:
Nothing too fancy. If I watch the database while this loop runs, I see the number of indexed reads climb with each iteration. Performance doesn't seem to degrade terribly, as far as I can tell, but the machine I'm testing on is pretty quick. However, I've received reports of performance degradation over time from some of my users. I've got an index on -- Edits for comments -- In Firebird you limit row retrieval like: So when I say "first", I'm just asking for a limited record set where
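For reference, the Firebird idiom being described limits the result set in the SELECT itself; a hedged sketch with hypothetical column and table names matching the description:

    -- FIRST 1 returns a single row; the ORDER BY decides which one.
    SELECT FIRST 1 ticket_id
    FROM   tickets
    WHERE  processed = 0 AND status = 'QUEUED'
    ORDER  BY ticket_id;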
How to remove a database server from Utility Explorer Posted: 27 May 2013 12:23 PM PDT I use the Utility Explorer to monitor some of my databases, but I forgot to remove one before retiring the server. Normally you could just right-click and select "Remove Managed Instance", but I can't do that since the server has been wiped from existence. Now it is stuck in my list without any data.
What Problems Does an Eager Spool Indicate Posted: 27 May 2013 10:47 AM PDT Currently running on SQL Server 2008 R2. I am attempting to increase the performance of an UPDATE statement. I notice an Eager Spool operation popping up in the showplan. My understanding of spooling operations is pretty basic: they create temporary storage for the table during the update. I also know that, while they prevent much worse execution times, eager spools are often indicative of underlying problems with table structure and/or query statements. My question is pretty simple: when you see an Eager Spool in your query plan, what problems do you first look to address? I will be analyzing every part of our system to increase performance - I'm just looking for guidance as to where I should start.