[how to] Set field values to newly imported rows in PostgreSQL table with existing data
- Set field values to newly imported rows in PostgreSQL table with existing data
- SQL Server migrations go through client machine or not?
- Unexpected Table Scan with Parameterized LIKE
- Oracle equiv of T-SQL UNION ALL query
- Login Failed for Domain\ServerName$ when trying to create a new SQL Server Reporting Services Application in SharePoint 2013
- What must be in place to validate an XMLTYPE against a schema?
- Database theories possible topics to research through [closed]
- Inner join using an array column in PostgreSQL
- Concatenation Physical Operation: Does it guarantee order of execution?
- Handling expiration in a course registration or ticketing system
- SQL server ODBC connection for Active Directory user who doesn't log in to Windows
- How to reinsert corrected rows from the conflict table?
- Connector Table Hash Values try to optimize data storage
- How to execute a non-table-locking update operation on PostgreSQL?
- Backups script completes, but doesn't back up all dbs
- Process attempted to unlock a resource it does not own
- Automated SQL backup on a timely fashion, & clean up the database for the backed-up data
- Foreign keys - link using surrogate or natural key?
- Innodb Slow queries since convert from MyISAM
- Relating ExecutionInstanceGUID to the SSISDB
- How can I improve my table design for different types of an entity?
- In what data type should I store an email address in database?
- compare the same table
- sql server database sharding - what to do with common data / non sharded data
- ParallelPeriod returning null for Feb. 29 in date dimension
- Why is DROP DATABASE taking so long? (MySQL)
- Why are so many MPP solutions based on PostgreSQL instead of MySQL?
Set field values to newly imported rows in PostgreSQL table with existing data Posted: 02 Apr 2013 08:06 PM PDT I have a PostgreSQL table with existing data, and need to import all the rows of a CSV file into that table. I am using pgadmin3's Import tool to do the import. Question: for the newly imported rows, how do I set the values for a specific column (not found in the CSV)? I can't do …
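A common workaround, sketched below with invented table and column names: load the file into a staging table that matches the CSV exactly, then copy across with an INSERT … SELECT that supplies the extra column's value. (Another option is to put a temporary DEFAULT on the extra column before importing.)

    -- Sketch only: adjust names and types to the real CSV layout.
    -- 1. Import the CSV into a staging table (pgAdmin's Import tool or psql's \copy can target it).
    CREATE TABLE staging_import (col_a text, col_b integer);

    -- 2. Move the rows into the real table, filling in the extra column here.
    INSERT INTO target_table (col_a, col_b, source_batch)
    SELECT col_a, col_b, 'april-2013-import'
    FROM staging_import;

    -- 3. Clean up.
    DROP TABLE staging_import;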
SQL Server migrations go through client machine or not? Posted: 02 Apr 2013 06:10 PM PDT I want to migrate a SQL Server db from one machine to another machine. Both machines are remote. If I run the Copy Database wizard, will it pull all the data across the Internet to my client machine and then push it back out to the destination? Or will the source and destination machines communicate directly? It's a very big database, so this matters quite a bit.
Unexpected Table Scan with Parameterized LIKE Posted: 02 Apr 2013 05:25 PM PDT I'm experiencing an unexpected table scan on SQL Server 2005 against a heap table when parameterizing a LIKE predicate. The issue only happens given this specific scenario... so I'm not confused as to how to resolve the issue; I'm confused as to why this is happening. The following T-SQL should recreate the issue on SQL Server 2005: … I was directed towards this article by Paul White, which seems very closely related, but the conclusions/explanations don't match my specific issue. Any insight is appreciated.
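The asker's repro script did not survive this digest; the sketch below only shows the general shape being described, with invented object names, so the two variations of the predicate can be compared side by side.

    -- A heap (no clustered index) with a nonclustered index on the searched column.
    CREATE TABLE dbo.HeapDemo (SomeCode varchar(20) NOT NULL);
    CREATE NONCLUSTERED INDEX IX_HeapDemo_SomeCode ON dbo.HeapDemo (SomeCode);

    -- Literal pattern: the optimizer sees the prefix at compile time and can cost a seek.
    SELECT SomeCode FROM dbo.HeapDemo WHERE SomeCode LIKE 'AB%';

    -- Parameterized pattern: the value is unknown at compile time, which is the
    -- variation where the asker observes the unexpected table scan.
    DECLARE @pattern varchar(20);
    SET @pattern = 'AB%';
    SELECT SomeCode FROM dbo.HeapDemo WHERE SomeCode LIKE @pattern;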
Oracle equiv of T-SQL UNION ALL query Posted: 02 Apr 2013 02:24 PM PDT In T-SQL I can accumulate the results of two unioned queries using the following syntax: … What is the equivalent in Oracle?
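The original T-SQL did not survive this digest, so the sketch below only assumes the usual shape of the problem, aggregating over a UNION ALL wrapped in a derived table, with invented table names; the Oracle version turns out to be nearly identical.

    -- T-SQL
    SELECT SUM(amount) AS total
    FROM (
        SELECT amount FROM sales_2012
        UNION ALL
        SELECT amount FROM sales_2013
    ) AS u;

    -- Oracle: same statement, just drop the AS before the table alias
    -- (and add FROM dual to any branch that selects only constants).
    SELECT SUM(amount) AS total
    FROM (
        SELECT amount FROM sales_2012
        UNION ALL
        SELECT amount FROM sales_2013
    ) u;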
Login Failed for Domain\ServerName$ when trying to create a new SQL Server Reporting Services Application in SharePoint 2013 Posted: 02 Apr 2013 01:51 PM PDT Basic setup: …
Problem: …
What must be in place to validate an XMLTYPE against a schema? Posted: 02 Apr 2013 05:48 PM PDT I have a procedure that generates an XMLTYPE and I want to validate it against a schema. The problem is that there seems to be a permissions issue running createSchemaBasedXML, because when I run the procedure as AUTHID DEFINER it gives the error "ORA-31050: Access denied", but when I run it as AUTHID CURRENT_USER it actually returns a validation-specific error (I'll deal with that separately). CURRENT_USER is not an acceptable solution. My supposition is that CURRENT_USER works because the user has the XMLADMIN role. Granting the permissions the role includes does not resolve the issue, so it must be the role's ability to bypass the ACLs. The thing is, querying RESOURCE_VIEW for the ACL that protects the resource shows that it is protected by … Using … There are any number of places I could be going wrong in this process, so the core of what I am looking for is this: what must be in place to validate an XMLTYPE against a schema?
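For reference, the DEFINER vs CURRENT_USER difference described above is consistent with the fact that roles (including XMLADMIN) are not active inside definer's-rights code, so the XDB/ACL privileges would need to be granted directly to the procedure's owner. The validation call itself boils down to something like the sketch below; all names are placeholders and it assumes the XSD is already registered.

    -- Sketch only: assumes the schema was registered via DBMS_XMLSCHEMA.registerSchema
    -- and that the owner can read the registered resource directly (not via a role).
    DECLARE
      doc XMLTYPE;
    BEGIN
      doc := XMLTYPE('<order xmlns="http://example.com/order"><id>1</id></order>');
      doc := doc.createSchemaBasedXML('http://example.com/order.xsd');  -- bind to the registered schema URL
      doc.schemaValidate;  -- raises an exception if the document does not conform
    END;
    /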
Database theories possible topics to research through [closed] Posted: 02 Apr 2013 12:42 PM PDT I need to conduct research on database theories. I want to know what questions I should research that would be suitable when looking at database topics for an online assessment system. I have one question already, which is "Types of Databases", where I want to look at relational databases, object-oriented databases and online databases. Is it then reasonable to look at each of the three types of databases individually, examining their purpose, benefits and limitations? Also, is the MySQL database used in phpMyAdmin actually a relational database, or is it classed as an online database? Are there any more questions I should research under database theories? Thanks
Inner join using an array column in PostgreSQL Posted: 02 Apr 2013 12:07 PM PDT Having trouble indexing and executing a query in O(log n) time. The query includes an inner join, an ORDER BY, and an equality operation. If I understand the laws of databases correctly, a query can be indexed and executed in O(log n) time (or thereabouts) if a non-equality operator is not used on more than one field. In this case, I believe the inner join does count as an equality operator and the non-equality operator would be the ORDER BY portion of the query. This table has upwards of 10,000,000 rows and needs to handle several reads and writes per second. Using PostgreSQL. This is what the table looks like; as you can see, the field Names is a list property and it is the column that the inner join goes against:

    Age  Names                        Date
    34   ['carla', 'john', 'sam']     3/13/2011
    26   ['json', 'cindy', 'joel']    3/13/2011
    72   ['beth', 'amber', 'susie']   3/13/2011
    14   ['john', 'jim', 'debie']     3/13/2011

This is the query that we are trying to do:

    SELECT * FROM the_table WHERE Age==26 AND Names=='john' ORDER BY Date

My background is from using App Engine's Big Table, so I've used equality operators here to indicate that 'john' should be one of the names in the Names column. This would be an acceptable query in GAE's Big Table; it would execute in O(log N) time, as all Big Table queries are required to do. I am assuming there is a way to do this in PostgreSQL as well, since PostgreSQL accepts list data types as columns. Is this possible to do in PostgreSQL? If so, how should the index be set up (we can't figure out how to set up an index that takes into account the three properties)?
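For reference, a sketch of how this is usually expressed in PostgreSQL (index names below are assumptions): array membership is tested with the containment operator rather than equality, and a GIN index makes that test cheap. Note that no single index covers the membership test, the age equality, and the ORDER BY together, so the planner will typically combine indexes and then sort, which is close to, but not strictly, O(log n).

    -- Sketch, assuming the table described above with Names stored as text[].
    CREATE TABLE the_table (
        age   integer,
        names text[],
        date  date
    );

    -- GIN index: answers "does names contain 'john'?" without scanning every row.
    CREATE INDEX the_table_names_idx ON the_table USING gin (names);

    -- Btree index for the equality + sort part of the query.
    CREATE INDEX the_table_age_date_idx ON the_table (age, date);

    -- Array membership uses the containment operator @>, not '==':
    SELECT *
    FROM   the_table
    WHERE  age = 26
      AND  names @> ARRAY['john']
    ORDER  BY date;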
Concatenation Physical Operation: Does it guarantee order of execution? Posted: 02 Apr 2013 11:40 AM PDT In standard SQL, the result of a UNION ALL query could return its two rows in any order (although, in practice on any database I know of, 'A' will come before 'B'). In SQL Server, this turns into an execution plan using a "concatenation" physical operation. I could easily imagine that the concatenation operation would scan its inputs, returning whatever input has records available. However, I found the following statement on the web (here): …
Question: Is this true in practice? Is this guaranteed to be true? I haven't found any reference in Microsoft documentation that the inputs are scanned in order, from the first to the last. On the other hand, whenever I try running it, the results suggest that the inputs are, indeed, processed in order. Is there a way to have the engine process more than one input at a time? My tests (using much more complicated expressions than constants) are on a parallel-enabled 8-core machine, and most queries do take advantage of the parallelism.
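For context, the kind of statement being discussed is simply a UNION ALL over constant selects; the literals below are illustrative only.

    -- The plan for this uses the Concatenation operator. In practice SQL Server
    -- returns 'A' before 'B', but without an ORDER BY the SQL standard itself
    -- promises no particular order.
    SELECT 'A' AS col
    UNION ALL
    SELECT 'B' AS col;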
Handling expiration in a course registration or ticketing system Posted: 02 Apr 2013 04:15 PM PDT I want to restructure the database for a web application I inherited for handling summer camp registrations, but my question is relevant to any event registration or ticketing system. My question is about how to best handle the need for registrations to expire if unpaid for X amount of time. Currently the database design is something like this (some columns omitted for brevity): … When payment is completed, the … There are currently some bugs related to this, and anyway it seems to me to be a somewhat strange design given that the … I've been researching potential database designs for this scenario and one thought I had was to add price info to the … In this design, the FK … Another option I thought of, inspired by this example (scroll down to "Data Modeling in the Sample Application"), would be to have a separate … Finally, I considered what it would look like if I kept the … So my specific concerns are: …
Thanks!
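Since the original schema snippets did not survive this digest, the following is only a generic sketch of the expiration mechanics many such systems use; every name below is invented and dialect details will vary.

    -- Keep the hold's deadline on the registration row and treat expired,
    -- unpaid rows as released capacity.
    CREATE TABLE registration (
        registration_id INTEGER PRIMARY KEY,
        camper_id       INTEGER NOT NULL,
        session_id      INTEGER NOT NULL,
        status          VARCHAR(20) NOT NULL DEFAULT 'pending',  -- pending / paid / expired
        created_at      TIMESTAMP NOT NULL,
        expires_at      TIMESTAMP NOT NULL                       -- created_at + hold window
    );

    -- Run from a scheduled job, or apply the same predicate when counting available seats.
    UPDATE registration
    SET    status = 'expired'
    WHERE  status = 'pending'
      AND  expires_at < CURRENT_TIMESTAMP;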
SQL server ODBC connection for Active Directory user who doesn't log in to Windows Posted: 02 Apr 2013 01:15 PM PDT Do you think it is possible to create a SQL Server ODBC connection for an Active Directory user who doesn't log in to Windows? Ideally this type of user would be used in a batch process, so one person logs in, creates the batch process, and runs it as another user. Note: I don't want to create SQL Server authentication; I would like to use Active Directory instead. Thanks.
How to reinsert corrected rows from the conflict table? Posted: 02 Apr 2013 01:43 PM PDT I have bidirectional merge replication. I was getting constraint failures because the primary key was just an integer, so I changed the primary key to the old primary key + a location identifier. The problem is how to reinsert the old rows from the conflict table (which I can correct manually in MSmerge_conflict_) to the publishers and subscribers. Can you help me please? Sorry for any mistakes; I'm not a native English speaker.
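A rough sketch of the reinsert step, run at the publisher after the keys have been corrected; the real conflict table name includes the publication and article, and every identifier below is invented.

    INSERT INTO dbo.Orders (OrderId, LocationId, CustomerName)
    SELECT c.OrderId,
           'LOC2',                      -- the location part added to the new composite key
           c.CustomerName
    FROM   dbo.MSmerge_conflict_MyPub_Orders AS c
    WHERE  NOT EXISTS (SELECT 1
                       FROM   dbo.Orders AS o
                       WHERE  o.OrderId = c.OrderId
                       AND    o.LocationId = 'LOC2');
    -- Merge replication then treats these as ordinary new rows and delivers them
    -- to the other nodes on the next synchronization.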
Connector Table Hash Values try to optimize data storage Posted: 02 Apr 2013 09:01 AM PDT I am designing a system which stores hash values that represent pieces of text. Each hash value can represent more than one text file. I have 4 tables: details_table, text_table, attribute_table, and connector_table. The attribute_table is laid out as follows: … The details_table is laid out like: … The text_table is laid out like: … And finally the connector_table is laid out like: … The details table contains the hash value. To save time, when a hash value is marked as ready to export I would like the associated text rows to also be marked as ready to export. How would I do this? Would I need to add another table with a hash status? If I use the status related to the text table, then when the text fields are exported it would mark the hash as exported, and so any new files that had the same hash would be marked as exported. It's really bugging me, as the only way I can see is to create another table, connect the detail hash to it, and set the status that way. If anyone has anything better or a different solution I would appreciate the guidance.
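One way to read the requirement, sketched with invented column names: keep an export flag on both the details (hash) row and each text row, and propagate it through the connector table. A per-text-row flag avoids the problem described above where a new file with an already-exported hash would appear exported.

    UPDATE text_table
    SET    export_status = 'ready'
    WHERE  text_id IN (
            SELECT c.text_id
            FROM   connector_table c
            JOIN   details_table  d ON d.detail_id = c.detail_id
            WHERE  d.export_status = 'ready'
           );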
How to execute a non-table-locking update operation on PostgreSQL? Posted: 02 Apr 2013 10:38 AM PDT Looking for a good way to update the value of a column for all the rows in a database (not huge, but big enough - about 10M records), without locking the whole table, so operations can continue while the update is working. The update operation is pretty simple, because the value of the new column is basically computed from another column, kind of like this: … However, I have to leave this running for a long time, and I would like to not bog down the usual DB activity while the update happens. Is there a better method to do this than running a process that does everything in one go? Thanks very much!
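Worth noting: a plain UPDATE in PostgreSQL only takes row-level locks (plus a table lock mode that does not block reads or writes to other rows), so the practical concerns with one giant UPDATE are transaction length and table bloat rather than a full table lock. The usual remedy is to update in slices; a sketch, assuming an integer primary key id and a derivation from old_col:

    -- Run repeatedly (shell loop, cron, or app code) until it updates 0 rows;
    -- each pass is a short transaction, so normal traffic keeps flowing and
    -- autovacuum can clean up between batches.
    UPDATE big_table
    SET    new_col = old_col * 2
    WHERE  id IN (
            SELECT id
            FROM   big_table
            WHERE  new_col IS DISTINCT FROM old_col * 2
            LIMIT  10000
           );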
Backups script completes, but doesn't back up all dbs Posted: 02 Apr 2013 04:24 PM PDT I'm running SQL Server 2005 and I've made a simple backup script that backs up all the databases on one server. The script always succeeds, but when I go to check the log or the files, I see it has only backed up around half of my databases. I've tried it on multiple servers and it does the same thing. I've checked the application log for the backups in Event Viewer; however, it tells me to check the application log for more details, so I can never find the detailed information I need. I saw this similar question and tried the solution that was recommended to try and catch errors. However, there were no errors thrown when I ran the script. I used this site's example for error handling. Is there a reason why backups fail silently? Is there better error handling I can be doing? Edited to add the [Full_Backup] sp: …
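The posted [Full_Backup] procedure did not survive this digest, so the following is only a generic per-database loop with explicit error capture (the path and naming are assumptions); the point is that a failure in one database gets logged instead of the rest being silently skipped.

    DECLARE @name sysname, @sql nvarchar(max);

    DECLARE dbs CURSOR FOR
        SELECT name
        FROM   sys.databases
        WHERE  name <> 'tempdb'
          AND  state_desc = 'ONLINE';

    OPEN dbs;
    FETCH NEXT FROM dbs INTO @name;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        BEGIN TRY
            SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@name)
                     + N' TO DISK = ''D:\Backups\' + @name + N'.bak'' WITH INIT;';
            EXEC (@sql);
        END TRY
        BEGIN CATCH
            -- Log instead of letting the whole script appear to succeed.
            PRINT 'Backup failed for ' + @name + ': ' + ERROR_MESSAGE();
        END CATCH;
        FETCH NEXT FROM dbs INTO @name;
    END
    CLOSE dbs;
    DEALLOCATE dbs;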
Process attempted to unlock a resource it does not own Posted: 02 Apr 2013 12:34 PM PDT SQL Server 2005 SP4 32-Bit. I have a DBCC CHECKDB job running nightly. Last night, soon after the job started, I got the errors below: … The database is NOT in suspect mode, and CHECKDB comes back clean when I run it now. The database is fully accessible. Should I be concerned? I'd hate to go back to a backup at this point.
Automated SQL backup on a timely fashion, & clean up the database for the backed-up data Posted: 02 Apr 2013 04:39 PM PDT I need to back up a SQL Server database (historian) on a regular schedule, and then clean up the database by removing the backed-up data. I am using MS SQL 2008 (R2), on a Windows XP machine. The biggest issue is the very limited hard disk space: the database is limited to a maximum of 3GB! In terms of overall performance, the PC is really slow, and unfortunately I do not have the choice to change that, so I could consider backing up overnight when the data flow is expected to be lower. The intention is to back up the data every two weeks and have it stored in a special directory (e.g. c:\ ). Then an operator can move the backup to another machine. Given the limited space, I could consider some housecleaning by removing the backed-up data. What is more important is the ability to merge the regular backups into an external database, so perhaps a typical SQL backup-and-restore routine could be an option. I would appreciate your kind advice regarding this matter. Thank you.
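A minimal sketch of the backup half, with assumed paths and table names; on a machine this constrained the cleanup DELETE should really be batched so the transaction log stays small.

    DECLARE @file nvarchar(260);
    SET @file = N'C:\Backups\historian_' + CONVERT(nvarchar(8), GETDATE(), 112) + N'.bak';

    BACKUP DATABASE historian TO DISK = @file WITH INIT;

    -- After the operator has copied and verified the file elsewhere,
    -- trim the rows that have now been backed up.
    DELETE FROM dbo.readings
    WHERE sample_time < DATEADD(week, -2, GETDATE());

Scheduling every two weeks can be done with SQL Server Agent where available, or with Windows Task Scheduler plus sqlcmd on editions without Agent.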
Foreign keys - link using surrogate or natural key? Posted: 02 Apr 2013 11:18 AM PDT Is there a best practice for whether a foreign key between tables should link to a natural key or a surrogate key? The only discussion I've really found (unless my google-fu is lacking) is Jack Douglas' answer in this question, and his reasoning seems sound to me. I'm aware of the caveat that rules can change, but that would be something to consider in any situation. The main reason for asking is that I have a legacy application that makes use of FKs with natural keys, but there is a strong push from developers to move to an OR/M (NHibernate in our case), and a fork has already produced some breaking changes, so I'm looking to either push them back on track using the natural key, or move the legacy app to use surrogate keys for the FK. My gut says to restore the original FK, but I'm honestly not sure if this is really the right path to follow. The majority of our tables already have both a surrogate and a natural key defined (through a unique constraint and a PK), so having to add extra columns is a non-issue for us in this instance. We're using SQL Server 2008, but I'd hope this is generic enough for any DB.
Innodb Slow queries since convert from MyISAM Posted: 02 Apr 2013 10:35 AM PDT A few days ago we converted some write-intensive tables from MyISAM to InnoDB, hoping for better performance due to InnoDB's better locking system, but instead of gaining performance we started to see simple queries in the slow log. For example, the query below took 3.6 seconds to run: … See the table structure below: … Running a SELECT (not using the cache) with the same WHERE clause takes 0 seconds. The server has a 16-core CPU and SSD drives. The total size of InnoDB tables on the server is 1.79 GB. As you can see below, innodb_buffer_pool_size is 2.5 GB, which is almost 1 GB more than the sum of index + data: … I'm running version 5.1.66 (since 5.1 is the latest version included in the non-backport repository of Debian Squeeze, updating to 5.5/5.6 is unfortunately not an option). Update 1: Just had a new occurrence at 16:46:02; see below the iostat from 14:46:00 (2-second interval): … And also the … Does anyone have any clue why these updates (it happens with inserts and also deletes) are so slow?
Relating ExecutionInstanceGUID to the SSISDB Posted: 02 Apr 2013 07:42 PM PDT The 2012 release of SQL Server Integration Services, SSIS, has delivered an SSISDB catalog which tracks the operations of packages (among other things). The default package execution for solutions using the Project Deployment model will have logging to the SSISDB turned on. When a package executes, the … What I'd like to know is how do I tie an ExecutionInstanceGUID to anything in the SSISDB catalog. Alternatively, is an SSIS package executing in the SSISDB privy to the value of its … Ultimately, I am trying to use the existing, custom audit table and link it back to the detailed history in the SSISDB catalog but can't seem to find the link.
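The SSISDB catalog views don't appear to expose ExecutionInstanceGUID, so a commonly suggested workaround (an assumption here, not something stated in the question) is to also capture the System::ServerExecutionID variable in the custom audit table; for packages started from the catalog it matches catalog.executions.execution_id. A sketch of the catalog side of that join:

    -- Recent catalog executions; audit rows that stored ServerExecutionID can
    -- join to this view on execution_id.
    SELECT e.execution_id,
           e.folder_name,
           e.project_name,
           e.package_name,
           e.status,
           e.start_time,
           e.end_time
    FROM   SSISDB.catalog.executions AS e
    ORDER  BY e.start_time DESC;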
How can I improve my table design for different types of an entity? Posted: 02 Apr 2013 07:30 PM PDT Consider an accounting system as an example. I have an entity called … Currently, I have come up with the following design: … But I don't think my design is efficient enough (or even correct and free of errors). What would you suggest? Also, if this is important in any way, I am planning to utilize MariaDB.
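The original diagram did not survive this digest, so the sketch below only shows one common pattern for "different types of an entity" (class-table inheritance) in MariaDB, with invented names: a shared parent row per account plus one child table per concrete type.

    CREATE TABLE account (
        account_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        account_type ENUM('asset','liability','equity','revenue','expense') NOT NULL,
        name         VARCHAR(100) NOT NULL
    ) ENGINE=InnoDB;

    CREATE TABLE bank_account (
        account_id INT UNSIGNED NOT NULL PRIMARY KEY,   -- same value as account.account_id
        iban       VARCHAR(34)  NOT NULL,
        CONSTRAINT fk_bank_account_account
            FOREIGN KEY (account_id) REFERENCES account (account_id)
    ) ENGINE=InnoDB;

Type-specific columns live in the child tables, so the parent stays narrow and queries that only need the common fields never touch the children.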
In what data type should I store an email address in database? Posted: 02 Apr 2013 08:03 PM PDT I understand that a 254-character email address is valid, but implementations I have researched tend to use varchar(60) to varchar(80) or equivalent. For example, this SQL Server recommendation uses varchar(80); see also this Oracle example. Is there a reason not to use the full 254-character maximum? Doesn't a varchar by definition use only as much storage as needed to hold the data? Are there significant performance implications/trade-offs which cause so many implementations to use less than the full 254 possible characters?
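For illustration, the "just declare the full legal length" option looks like the sketch below (names invented). Storage-wise the declared maximum mostly acts as a constraint, since varchar stores only the actual data plus a small overhead; the practical reasons people pick shorter lengths tend to be index key size limits in some engines and, in SQL Server, memory grant estimates based on half the declared length.

    CREATE TABLE app_user (
        user_id INTEGER NOT NULL PRIMARY KEY,
        email   VARCHAR(254) NOT NULL UNIQUE   -- index key size limits vary by engine
    );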
compare the same table Posted: 02 Apr 2013 10:59 AM PDT I am facing an issue with the following query. When I execute the query, it takes a very long time. I broke the query into two parts and compared them with a shell script, but is there any chance to go with one query? Any suggestions welcome.
sql server database sharding - what to do with common data / non sharded data Posted: 02 Apr 2013 04:59 PM PDT We have a very large scale, enterprise-level database. As part of our business model all web users hit our web servers at the same time each month, which in turn hammers our SQL box. The traffic is very heavy and continues to grow heavier as the company grows. SQL proc optimization has been performed and hardware has already been scaled up to a very high level. We are looking to shard the database now to ensure that we can handle company growth and future loads. We have decided what particular data should be sharded. It is a subset of our database which is highly utilized. However, my question is regarding the non-sharded data which is common/universal. An example of data like this may be an Inventory table, or possibly an Employee table, user table, etc. I see two options to handle this common/universal data:
- Design 1: place the common/universal data in an external database. All writes occur there. This data is then replicated down to each shard, allowing each shard to read it and inner join to it in T-SQL procs.
- Design 2: give each shard its own copy of all common/universal data. Let each shard write locally to these tables and use SQL merge replication to update/sync this data on all other shards.
Concerns about design #1:
1) Transactional issues: if you have a situation in which you must write or update data in a shard and then write/update a common/universal table in one stored proc, you will no longer be able to do this easily. The data now exists on separate SQL instances and databases. You may need to involve MS DTC to see if you can wrap these writes into a transaction, since they are in a separate database. Performance is a concern here, and possible rewrites may be involved for procs that write to sharded and common data.
2) A loss of referential integrity: it is not possible to enforce cross-database referential integrity.
3) Recoding large areas of the system so that it knows to write common data to the new universal database but read common data from the shards.
4) Increased database trips: as in #1 above, when you run into a situation in which you must update sharded data and common data, you are going to make multiple round trips to accomplish this, since the data is now in separate databases. There is some network latency here, but I am not as worried about this issue as about the above three.
Concerns about design #2: each shard gets its own instance of all common/universal data. This means that all code that joins to or updates common data continues to work/run just like it does today. There is very little recoding/rewriting needed from the development team. However, this design completely depends on merge replication to keep the data in sync across all shards. The DBAs are highly skilled and are very concerned that merge replication may not be able to handle this, and that should merge replication fail, recovery from that failure is not great and could impact us very negatively.
I am curious to know if anyone has gone with design option #2. I am also curious to know if I am overlooking a 3rd or 4th design option that I do not see. Thank you in advance.
ParallelPeriod returning null for Feb. 29 in date dimension Posted: 02 Apr 2013 12:17 PM PDT I have a calendar date dimension backed by a physical table of dates (originally created on SQL Server 2000, hence the datetime instead of date): … The Post Date dimension has four attributes (with member key columns listed, some of which are calculated in the DSV): …
It's nothing too fancy, obviously. I also have a few calculated measures that use ParallelPeriod to calculate YTD figures from the previous year, for quick side-by-side comparison without requiring the user to choose a specific slice of dates. Just pick the current year, and it will find the latest date with sales in it, then compare to that same range from the previous year. Finding the appropriate date in the previous year normally boils down to this: … The Tail call is where it finds the latest date beneath the currently selected Post Date member (typically the current year). That works fine. But if that returns Feb. 29, meaning the last sale for a particular combination of dimension members occurred on Feb. 29, then it passes Feb. 29 into the ParallelPeriod function, which subsequently returns null. And then the previous-year YTD measure also returns null. So, in a nutshell: based on this particular schema, is there a simple way to have ParallelPeriod behave nicely for Feb. 29 inputs? If it just returns Feb. 28 of the previous year, that's fine. EDIT: A few things I've tried: …
Why is DROP DATABASE taking so long? (MySQL) Posted: 02 Apr 2013 04:19 PM PDT New CentOS installation. I was running an import of a large DB (2GB sql file) and had a problem. The SSH client seemed to lose the connection and the import seemed to freeze. I used another window to log in to mysql and the import appeared to be dead, stuck on a particular 3M-row table. I tried again 15-20 minutes later: nothing. In another window, I did: … The DROP DB window messaged: SERVER SHUTDOWN. Then I actually restarted the physical server. Logged back into mysql, checked and the db was still there, ran DROP DATABASE huge_db; again, and again I'm waiting, already about 5 minutes. Once again, it's a fresh installation. The … Please help. EDIT: I've successfully dropped the database. It took something like 30 minutes. Also note that I think I was mistaken when I thought the mysqldump import was dead. The terminal connection was lost, but I think the process was still running. I most likely killed the import mid-table (the 3M-row table) and probably 3/4 of the way through the whole db. It was misleading that "top" showed mysql using only 3% of memory, when it seemed like it should be using more. Dropping the DB ended up taking 30 minutes, so, again, I might not have had to restart the server and possibly could have just waited for the DROP to finish, but I don't know how mysql would react to getting a DROP query for the same db that it's importing via mysqldump. Still, the question remains: why does it take 30+ minutes to DROP a 2GB database when all it should have to do is delete all the db files and remove all references to the DB from information_schema? What's the big deal?
Why are so many MPP solutions based on PostgreSQL instead of MySQL? Posted: 02 Apr 2013 09:03 AM PDT Aster Data, Greenplum and GridSQL all allow massively parallel processing of SQL queries. They are also all built around PostgreSQL technology. Is this just because of licensing issues, or are there other reasons? To me, it seems like MyISAM, not being ACID compliant and therefore not running into the same MVCC issues as PostgreSQL (as seen here), would be far better suited for building high-performance data warehouses. After all, an OLAP load does not require transactions, as far as I can see.