[how to] InnoDB, Hash Table, Reference Table and Fragmentation |
- InnoDB, Hash Table, Reference Table and Fragmentation
- SQL Server 2008: Is there a graphical reporting tool that shows tps (or select statements) per second for a specific table?
- MySQL Best Practices for a log table to display website statistics
- function hangs with null case operation
- Learning to draw entity relational diagram
- SSIS hangs during For Each File enumeration
- Mongo DB Cant Start
- UPDATE statment with and without SELECT keyword
- Phonetic search with Solr, for Brazilian Portuguese
- Will Ola's default IndexOptimizer job update my stats?
- Modeling a database for easy counting / reporting
- Trouble configuring Toad for oracle 9.7.2 commercial with instant client [on hold]
- Does existing msrepl_tran_version column need to be removed when re-creating replication?
- Run job once a minute, every day, except during a certain time
- Putting a Select statement in a transaction
- INT or TIMESTAMP for the change time column in a logging table?
- How do I insert a row which contains a foreign key?
- representation in ms-access
- Is this good/bad database design and why?
- What is the best way to transport database fields from one database to another?
- Several PC Cannot Connect To MySQL Server on Certain Time
- Start service error code 3417 with SQL Server 2012
- Time series data for ad platform
- If an account has REQUIRE SUBJECT, does it still need a password?
- MySQL users corrupt
- I have multiple sources INSERTing into a MySQL innodb table. My periodic aggregation script never makes accurate aggregates. Why?
- Are there any tables that I can get all Oracle supported system privileges and object privileges?
- SQL Server 2008 R2 (Suspect) mode - how to repair?
InnoDB, Hash Table, Reference Table and Fragmentation Posted: 16 Jul 2013 08:30 PM PDT I have a big list of domains and URLs that I store in a database, there are around 150M domains and 300M URLs, I'm using InnoDB to store each in the format: Which is fine. New records are assigned what is effectively an auto-increment, so there's no fragmentation and inserts are in ascending order. When new data comes in however (typically between 250K and 2M rows), I use two separate 'hash' tables to see whether the domain or URL already exists in the database. Strictly speaking it's not a 'hash' table, just a bunch of MD5s which I use ensuring values are unique, with the added benefit of the table being fixed length. The table is also partitioned.
For seeing whether a bunch of domains already exist in the database, this works relatively well, however, the table gets fragmented due to the random nature of insertions. The hash table is basically only used for insertions and quickly looking up whether a domain exists in the DB or not. During insertions, a script walks from 0-255 and performs the necessary check. My question is, do you know of a better procedure in order to handle inserts/selects better? I believe when I started out with this database I simply had a key on domains_list.domain, which was slow. I find that when the partitions are re-organised the lookups are very quick, but after a number of batch insertions, the same lookups slow down somewhat. Server has 32GB of RAM and I use 16GB for the buffer pool, while the table itself takes up 5.4GB on disk. |
Posted: 16 Jul 2013 07:00 PM PDT In SQL Server 2008: Measure tps / select statements per second for a specific table? , the question was "how to do it" OK, so I can do it.... but I want automated, pretty charts, so I can easily find and watch over days tps by table (to guide us in optimization work, etc). Yes, we could "roll our own" and write all kinds of reports, etc. but that looks to be 1-n days of work. Is there a tool to automate all or some of this? E.g. chart: TPS by table for database N (with timeline over 1 hr, 24 hrs, 7 days, etc) chart: Top five tables w hottest tps etc |
MySQL Best Practices for a log table to display website statistics Posted: 16 Jul 2013 05:25 PM PDT Google Analytics is a bit too large for what we want to do on our website and we would like to display custom statistics for info logged on each page load. Inserted data would be like: [ip, geo_data, page_url, referer_url, timestamp, etc], statistics would be like: [visits today, visits of a page in particular, New-York visitors for today, etc]. I looked around on Google but only found some partial infos. What are the best practices? For instance, I'm not sure which engine to choose between MyISAM, InnoDB or Archive Storage (I know about table/row locking features, just not sure which is best for this). I'm not sure about index either as I know that indexes can increase the table size and I'm not sure if I should put a single index on each column or indexes on several columns for each possible search combinations. There are probably important facts things I didn't even think about. |
function hangs with null case operation Posted: 16 Jul 2013 05:46 PM PDT I created a function that accepts a start and end date, with the end date being optional. I then wrote a When I call the function for the most recent month of the data: ... the query hangs. If I specify the end date: ... the result is returned normally. I took the code out of the function and ran it fine inside a query window. I can't duplicate the issue the fiddle either. A query like: ... also works fine. Is there anything in the query (below) that could be causing the function to hang when a |
Learning to draw entity relational diagram Posted: 16 Jul 2013 08:14 PM PDT I am trying to learn entity relationship diagram, following question is the objective. Information consulting firm has number of offices with employees. The employees can move offices depend on the project, but location is always recorded. This consulting firm manage number of projects and have vast base of clients. The company's success is depend on assigning relative employee with relative skill and location. The time an employee spends on project is recorded each day. This include time for travel and break time etc. To reduce time for travel they try to allocate employees from the nearest office possible. Draw Entity Relationship diagram and entity listing with attributes. I created entity relationship diagram by using Visio. Please comment on my attempt. Here is my attempt in this image: |
SSIS hangs during For Each File enumeration Posted: 16 Jul 2013 02:24 PM PDT While debugging an SSIS package that loops over thousands of files, execution hangs with no errors and no failed tasks. Slightly simplified, the package does this: When I execute the package (debug mode in BIDS), it runs as expected through the first file and well into the second, but eventually hangs between iterations of the inner loop. When it hangs, all tasks are green, and the two loop containers are yellow. The output window does generate any messages, and going through the process tab, each task has a matching number of starts and stops except for the two loops and the package itself. My first thought was that I'd exceeded an iteration limit for the For Each File enumerator, but I cannot find any documentation for such a limit. What else can I check to determine why SSIS stops running? |
Posted: 16 Jul 2013 04:46 PM PDT Evening we have had a mongodb running fine for a few weeks and all of sudden it has gone down. We are unable to start it back up as it is complaining that the "dbpath (/data/db/) does not exist" I've tried routing through the documentation and have seen others suggest using the dbpath flag, however still no luck. We are starting it with "sudo service mongodb start" and the config file dbpath config points to a directory with our db in. Any advice would be greatly appreciated. Thanks in advance. Gary |
UPDATE statment with and without SELECT keyword Posted: 16 Jul 2013 03:57 PM PDT I have been doing some testing on removing the SELECT keyword from this update: From the testing I have done it looks like the same data is returned whether or not I have the SELECT keyword. I ran the code by our DBA to get his approval of removing the SELECT statement and he said I can not as removing the SELECT statement would cause the update to work differently but he can not articulate how it works differently. I have done some searching on the new and found a few MSDN pages where they run the UPDATE FROM command with out a SELECT keyword. So now I am confused, my testing and other examples I found on MSDN say I can remove the SELECT keyword but my DBA says no. Does the removal of the SELECT keyword affect how the UPDATE FROM works and if so how does it affect it? |
Phonetic search with Solr, for Brazilian Portuguese Posted: 16 Jul 2013 01:14 PM PDT we are implementing Solr as the new internal search engine for our website. Most features are running just fine, others are in the adjusting and calibration phase. But there is one feature that I'm not finding any good documentation over the web. So here it goes:
I was able already to create an index, using the official stemming tokenizer http://docs.lucidworks.com/display/solr/Language+Analysis#LanguageAnalysis-BrazilianPortuguese But the match against uses parsers adapted to understand everything as english. That is where the problem lies. Tutorial, documentation, how to or reply are welcome. |
Will Ola's default IndexOptimizer job update my stats? Posted: 16 Jul 2013 01:20 PM PDT I just installed Ola Hallengren's MaintenanceSolution.sql maintenance scripts. After looking at the output of the log generated by the The motivation behind this is that there is a query that runs once an accounting period (every four weeks) for the date range of the period. The query takes over 20 minutes if I do not manually run My plan is to schedule this job to run once a week. Will it update the stats as required? |
Modeling a database for easy counting / reporting Posted: 16 Jul 2013 02:04 PM PDT I have an app where user is known (user_id) and he can do several actions (action_id). Every time he makes an action I need to save the fact that he made it for reports/analytics. I guess it is similar to other analytic solutions and their db design. Once I have the data, provided with a time window (minutes resolution) I need to count for each user (all or some) the number of times he did actions and which actions he did. (sum all data grouped by action_id). Some assumptions:
I'm considering SQL, NoSQL and RRD to save the data. I put RRD here because it's easy to implement the insert of the data into statds+graphite. I'm concerned if I take this approach, the querying (although provided by graphite) will not be indexed and will probably have to count all the data whenever I ask for a window/user (no indexing). Another problem is that when querying all the data, all users info will be needed, resulting in reading all the files concurrently which I'm not sure is a good thing. SQL - Very easy implementation when inserting the data and querying. Easy to index, order and group by. However I'm not sure it's easy if I'm anticipating high traffic. Also, I'm not sure how effective is the count() of sql (haven't used SQL in the last few years) after group by. Can it offer parallel computation? NoSQL - Is there a solution out there that is the right fit for this type of scenario (perhaps a Map/Reduce algorithm to fast generation of counts in a time window?) Thanks for helping me model |
Trouble configuring Toad for oracle 9.7.2 commercial with instant client [on hold] Posted: 16 Jul 2013 11:54 AM PDT I have succesfully setup the instant client with sqlplus connecting to database correctly. Now on starting Toad 9.7.2 commercial My service name as specified in the tnsnames.ora does shows up under database but there is no instant client showing there . Earlier i had ran TOAD 12 TRIAL and it it worked with the same instant client nicely . what is wrong with this previous version of toad . PLATFORM -MS WIN7 64 bit INSTANT CLIENT basic and sqlplus Version 12.1.0.1.0 for windows 32 bit |
Does existing msrepl_tran_version column need to be removed when re-creating replication? Posted: 16 Jul 2013 11:40 AM PDT What is the process for removing a replication and then re-creating the replication. I know the process to set up replication. However, If i wanted to take a publisher database copy it and restore it and create another publication and replication how would I do that? Would I need to remove the previous mesrepl_tran_version column from all the tables??? The type of replication that I will be re-creating is Transactional Replication with updateable subscribers. Additionally, After I restore the database and create the publication do I need to let the snapshot complete and then back up that database and restore on subscribers and create the subscription or when I restore the database for the publisher can I also restore it on the subscriber? Help will be greatly appreciated! |
Run job once a minute, every day, except during a certain time Posted: 16 Jul 2013 11:19 AM PDT I have a Sql server 2012 agent job which currently runs every minute, every day, etc. I'd like to modify it so that it will run every minute, every day except on Saturdays between 7:00 pm and 11:30 pm. Is this possible with the job scheduler? I tried using a daily frequency and I can change the time on a per day basis that way, but not for a specific day. Current schedule: |
Putting a Select statement in a transaction Posted: 16 Jul 2013 11:10 AM PDT What is the difference between these 2 queries: And without transaction: What is the effect of having a If |
INT or TIMESTAMP for the change time column in a logging table? Posted: 16 Jul 2013 12:54 PM PDT We're having an argument over what to use for storing change date in a new log table for our site. One side says, use The other side says, use Since this is a logging table, we're not concerned about the 1970-2038 range of TIMESTAMP. Which makes more sense? |
How do I insert a row which contains a foreign key? Posted: 16 Jul 2013 03:20 PM PDT Using PostgreSQL v9.1. I have the following tables: Say the first table Is there any way to insert rows into Here is an example of pseudo-code showing what I was hoping could be done: |
Posted: 16 Jul 2013 03:01 PM PDT I have a database in microsoft access. I want to know how to look up a singular datum from a reference table giving a dynamic set of values. Here is a representation of what I mean: I have the following tables: Points for Pushups(m): Fitness Tests: People: I want the query to use People.DOB and the Test date to find the age the person was during the test. I then want the query to use this value to determine which column to look in, and the value from reps to determine which row to look in coming back with the singular value and naming it points. for example I want bob to show Query: Does anyone know how to do the dynamic reference part? I know how to make the query and I know how to get age I just don't know how to use the values as columns in the reference table, I've seen it done, but long ago and never looked into it. |
Is this good/bad database design and why? Posted: 16 Jul 2013 12:27 PM PDT I am building a database for a small project. I don't have much experience so I am unsure, but I get the feeling I am not approaching this correctly. I have a system which allows users to create courses, which contain sections of questions which they answer. I have separated out courses, sections and questions as separate entities, but the questions table requires foreign keys from:
Question: In terms of the Questions table requiring foreign keys from everywhere, is this layout acceptable? Is there any improvements that could be made? Thanks for any help. |
What is the best way to transport database fields from one database to another? Posted: 16 Jul 2013 02:15 PM PDT I have two databases. The table name and fields name are different and field numbers are unequal. I need to transport all fields from one database to another. I can import the database as CSV format. I can use a PHP script which will accomplish this. But is there any other way to do this easily without any script. |
Several PC Cannot Connect To MySQL Server on Certain Time Posted: 16 Jul 2013 12:15 PM PDT I have a network of 10 PCs and 1 Server. I installed MySQL Community Server v5.6.10 64-bit on the Server, and all the PCs are Windows XP, using ODBC Connection 5.2(a). Some of the PC (previously it was 3, now become 4) cannot connect to the MySQL at a certain time, which is 2.00 pm. Before, it was OK, the client can connect to the Server normally, but at 2.00 pm, those PC cannot connect anymore, with this error: From the above error message:
The only action I do to make everything work again is by restarting the PC (client), however, I don't want such solution. I want to know what's the cause, is it because of the ODBC connector, the PC, or any other reason. Could anyone give me some hints on what I should check? Thank you |
Start service error code 3417 with SQL Server 2012 Posted: 16 Jul 2013 11:47 AM PDT I cannot start the SQL Server 2012 service. The event log says the service-specific error code is 3417. In the Event Viewer:
I've changed the service logon account in the service properties but this error happened again. I think the problem is in the EDIT: I've try Rebuild the Resource database but this error happened again! |
Time series data for ad platform Posted: 16 Jul 2013 12:04 PM PDT I am trying to figure out how to store time series data for an ad platform I am working on. Basically I want to know some strategies/solutions for storing billions of rows of data so that I can easily search it (about 6-8 indexes on the table) and get fast counts based on queries. I tried mySQL with the tokuDB engine and this seems to be very fast but is extremely slow when I try to do a COUNT query when the rows reached about 5-8 million. I was looking at some noSQL alternatives but since I want to be able to search this data this is probably not the best solution. I was using dynamoDB. I would have had to store the data is many places in order to account for all the searching on the data. What I am storing is a row in the database for each click on an AD that occurs. This table will grow very fast, especially when this site gets large. Another solution would be to separate this data per advertiser. This means each advertiser will have their own table where all their data goes into. This means it will be much smaller and the COUNT queries will be much faster. I can even split it up by advertiser and month. My goal is to give an advertiser the ability to search and display in a paginated way all their clicks. They should be able to get data between a time period and filter by about 5-8 other indexes if they want to. |
If an account has REQUIRE SUBJECT, does it still need a password? Posted: 16 Jul 2013 12:03 PM PDT I'm in the process of setting up SSL-secured replication between two servers. Each server has its own public/private keypair, and the CA cert is just the concatenation of the two public certs, like this answer. Now I'm updating the replication account with REQUIRE SUBJECT "exact subject of the client" Is there any practical value to also having a password on the replication account (IDENTIFIED BY "secret")? |
Posted: 16 Jul 2013 03:15 PM PDT I have a strange situation here: From time to time I cannot log in with any of my mysql users. I even cannot make a mysql dump. So I started searching in mysql files and I found that I searched about the problem and I found that there was some bug in the MySQL, but it was in the older versions (4.X). I'm running 5.5. Any ideas? Thanks! |
Posted: 16 Jul 2013 01:15 PM PDT I apologize in advance if this is a repeat. I'm not really sure how to properly ask for what I'm running into. I have a large InnoDB table set up. I have 3 sources that all INSERT concurrently at a moderately high volume (300-500 INSERT/s). I have a PERL script running on a cron job every 5 minutes that aggregates data from the last time it ran. It keeps track of the last record it processed by storing the auto_increment value of the last row in a metadata table. The aggregates are always off. But not by much. This has been so frustrating because it's just plain simple math (SELECT account_id,sum(sold) GROUP BY account_id). I have a suspicion that it has something to do with the transaction isolation (repeatable-read). I recently found FlexViews which looks very cool and might address my problem. But I was wondering if anyone could:
I'm like 95% sure the auto_increment thing along with transaction commit ordering is hosing me up, but I don't understand enough about DBs to really ask the right question. Oh, one thing to note, I've already checked over the field types. This issues isn't the result of rounding. |
Are there any tables that I can get all Oracle supported system privileges and object privileges? Posted: 16 Jul 2013 11:24 AM PDT Are there any tables that I can get all Oracle supported system privileges and object privileges? I know there is a view called Thanks. |
SQL Server 2008 R2 (Suspect) mode - how to repair? Posted: 16 Jul 2013 09:30 AM PDT I have SQL Server 2008 R2 database in Suspect mode. I tried to fix it running this query: But the repair output was this message: How to repair the database fully? This repair which i made, works only for some days, then database again goes to Suspect mode... |
You are subscribed to email updates from Recent Questions - Database Administrators Stack Exchange To stop receiving these emails, you may unsubscribe now. | Email delivery powered by Google |
Google Inc., 20 West Kinzie, Chicago IL USA 60610 |
No comments:
Post a Comment