Thursday, May 16, 2013

[how to] BI and Self Service Reporting Tools that Support Web (HTML) Interface?

BI and Self Service Reporting Tools that Support Web (HTML) Interface?

Posted: 16 May 2013 08:27 PM PDT

Hopefully a simple question. What BI tools (commercial or OSS) support self-service reporting or self-service BI via an HTML web interface?

I specified HTML because I know of at least one that can do it via the web, but only in Silverlight.

Microsoft Assessment and Planning Toolkit for Windows Server 2003

Posted: 16 May 2013 08:14 PM PDT

Can anyone please tell me where I can find the Microsoft Assessment and Planning Toolkit for Windows Server 2003? I tried using version 8.0 but it fails to install on Windows Server 2003.

thanks

No way to set maximum retrieved rows in pgadmin query tool

Posted: 16 May 2013 07:17 PM PDT

I'm using version 1.16.1, and I can't find a "max(imum) rows" param to tweak. Any help would be appreciated.
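One workaround, sketched here with placeholder table and column names, is to cap the result set in the query itself rather than in the tool:

-- hypothetical table; LIMIT caps the rows the query tool has to fetch
SELECT *
FROM some_large_table
ORDER BY id
LIMIT 1000;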

Is there a fast way to change column type without dropping the clustered index?

Posted: 16 May 2013 03:24 PM PDT

I have a large table whose clustered index is on a bigint column. We would like to change it to just an int to reduce space and improve performance. However, dropping the clustered index and recreating it is extremely slow.

Is there a way to speed up this process, or is this by design the only way to change the data type?

[Update] Just to elaborate a bit on my question: the column I would like to update is a date dimension key. It is currently a bigint (8 bytes), and I would like to convert it to an int (4 bytes). This should reduce the size of the database, and it should theoretically improve database performance in general.
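There is no truly in-place shortcut here; besides drop/alter/recreate, the usual alternative is to rebuild sideways and swap names. A rough sketch under assumed names (a hypothetical FactSales table with a bigint DateKey as a unique clustering key and one payload column):

-- sketch only: table, column, and constraint names are hypothetical
CREATE TABLE dbo.FactSales_new (
    DateKey int           NOT NULL,
    Amount  decimal(19,4) NOT NULL,
    CONSTRAINT PK_FactSales_new PRIMARY KEY CLUSTERED (DateKey)
);

-- bulk load in key order, then swap the names
INSERT INTO dbo.FactSales_new WITH (TABLOCK) (DateKey, Amount)
SELECT CAST(DateKey AS int), Amount
FROM dbo.FactSales
ORDER BY DateKey;

EXEC sp_rename 'dbo.FactSales', 'FactSales_old';
EXEC sp_rename 'dbo.FactSales_new', 'FactSales';

This still rewrites every row, but the old table stays online until the swap, and the load can be minimally logged.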

Table Design for user-specific and user-agnostic criteria

Posted: 16 May 2013 02:34 PM PDT

I want to design a table that is general enough to accommodate an increasing number of achievements as I come up with new accomplishments to reward. Initially, I thought to design the table as follows

  • Name (of Milestone)
  • Description (of Milestone)
  • Metric

So for the last milestone listed above, I might say

  • Name: 1k All-Star
  • Description: Completed a total of 1000 tasks
  • Metric: 1000

The code would check the number of completed tasks against the metric to determine whether to award the milestone. But there are two types of milestone examples I listed. One in which the metric is universal for all users, and one in which the metric is subjective and depends on the user's previous actions.

Would I want to define universal metrics in one table and make a separate milestone table for the user-specific achievements? I imagine that I might create a secondary table of these user-specific metrics so as not to repeat fields:

Subjective Milestone Table

  • Name
  • Description
  • Metric - one-to-many connection to Subjective Metric Table

Subjective Metric Table

  • Name
  • User
  • Metric

So as an example

  • Name: Weekend Warrior
  • Description: Exceeded the record for the most tasks completed on a Sunday
  • Metric: 5

  • Name: Weekend Warrior

  • User: Samson
  • Metric: 5

Or am I going about this all wrong?
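For what it's worth, a minimal DDL sketch of the split described above (generic SQL, all names illustrative):

-- universal milestones: one metric applies to every user
CREATE TABLE milestone (
    id          INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    description VARCHAR(255),
    metric      INT NOT NULL              -- e.g. 1000 completed tasks
);

-- subjective milestones: the threshold depends on the user
CREATE TABLE subjective_milestone (
    id          INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    description VARCHAR(255)
);

CREATE TABLE subjective_metric (
    id                      INT PRIMARY KEY,
    subjective_milestone_id INT NOT NULL REFERENCES subjective_milestone(id),
    user_id                 INT NOT NULL,
    metric                  INT NOT NULL  -- e.g. Samson's Sunday record of 5
);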

How un-clustered is a CLUSTER USING table in Postgres

Posted: 16 May 2013 02:51 PM PDT

I have some tables which benefit greatly from CLUSTER ON/CLUSTER USING in Postgres. Data accessed at the same time is "defragmented" into a small number of disk blocks:

# CLUSTER table USING index_name;
# ANALYZE VERBOSE table;
# CLUSTER VERBOSE;

A maintenance task periodically runs CLUSTER VERBOSE to keep things fresh. But is there a test I can run to see how fragmented the table is, prior to running CLUSTER VERBOSE? Maybe something like:

# CLUSTER ANALYZE table
40000 records. 4000 observed clusters, 5000 potential clusters (20% fragmentation)
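There is no CLUSTER ANALYZE command, but the planner's correlation statistic gives a rough proxy for how well the table still follows the clustering index (values near 1 or -1 mean the rows are well ordered on disk). A sketch, with the table and column names as placeholders and assuming a recent ANALYZE:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'table'
  AND attname   = 'indexed_column';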

How to remove leaf member values?

Posted: 16 May 2013 01:51 PM PDT

I have a dimension hierarchy that is related to multiple fact tables with different granularities. In the case of one fact table, it relates to the middle level of a three level hierarchy.

When you drill down that dimension, I would prefer to have values at that granularity and nothing if you drill down further. The default behaviour seems to be just duplicating the parent total for each of the leaf members.

I thought there was a property to control this but for the life of me I can't find it now.

Is it a good practice to create tables dynamically in a site?

Posted: 16 May 2013 01:58 PM PDT

A friend asked me to build a site with a few "static" and "dynamic" tables.

In fact he wants to have a few tables which can't be deleted, and some "dynamic" tables which can be created directly from site users, according to their needs.

E.g., if the user needs some "optional" feature that doesn't exist in the current DB, he creates a new table for his specific need.

I think this is not a good way to do it; I think it is better to have a list of all possible optionals in a table and then flag them for each user, i.e.:

Table garden(id, camellia, californa_poppy, azalea) represents a generic garden and the 3 flower species it can contain.

Table users(id, username, some_user_infos, garden_id) contains a generic user and its garden id.

If the garden_id value from table users is 1, and the row with id 1 in table garden is (1, 1, 0, 1), it means that the user with garden_id=1 has every allowed flower except California Poppy.

This is my idea.

My friend says it would be better to dynamically create tables and connect them to the user to dynamically define new flowers.

Wouldn't it be better to have every possible flower as a field of the garden table and flag it as I want to do?
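A third option worth sketching (all names illustrative): rather than one column per flower, keep flowers as rows and join them to users, so adding a new flower is an INSERT instead of a schema change:

CREATE TABLE flower (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL          -- 'camellia', 'california_poppy', 'azalea', ...
);

CREATE TABLE user_flower (
    user_id   INT NOT NULL,
    flower_id INT NOT NULL,
    PRIMARY KEY (user_id, flower_id)   -- presence of a row = user has that flower
);

-- flowers of user 1
SELECT f.name
FROM user_flower uf
JOIN flower f ON f.id = uf.flower_id
WHERE uf.user_id = 1;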

Time series data for ad platform

Posted: 16 May 2013 12:47 PM PDT

I am trying to figure out how to store time series data for an ad platform I am working on.

Basically I want to know some strategies/solutions for storing billions of rows of data so that I can easily search it (about 6-8 indexes on the table) and get fast counts based on queries.

I tried MySQL with the TokuDB engine and this seems to be very fast, but it becomes extremely slow when I try to do a COUNT query once the table reaches about 5-8 million rows.

I was looking at some NoSQL alternatives, but since I want to be able to search this data, they are probably not the best solution. I tried DynamoDB; I would have had to store the data in many places in order to account for all the searching on the data.

What I am storing is a row in the database for each click on an ad that occurs. This table will grow very fast, especially when this site gets large.

Another solution would be to separate this data per advertiser. This means each advertiser will have their own table where all their data goes into. This means it will be much smaller and the COUNT queries will be much faster. I can even split it up by advertiser and month.

My goal is to give an advertiser the ability to search and display in a paginated way all their clicks. They should be able to get data between a time period and filter by about 5-8 other indexes if they want to.
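One pattern that may help, sketched with hypothetical table and column names: keep the raw clicks partitioned by date, and maintain a daily rollup per advertiser so the common counts never have to touch the raw table:

CREATE TABLE click_daily_summary (
    advertiser_id INT NOT NULL,
    click_date    DATE NOT NULL,
    clicks        BIGINT NOT NULL,
    PRIMARY KEY (advertiser_id, click_date)
);

-- refreshed periodically from the raw click table (table/column names assumed)
INSERT INTO click_daily_summary (advertiser_id, click_date, clicks)
SELECT advertiser_id, DATE(clicked_at), COUNT(*)
FROM clicks
WHERE clicked_at >= CURDATE() - INTERVAL 1 DAY
GROUP BY advertiser_id, DATE(clicked_at)
ON DUPLICATE KEY UPDATE clicks = VALUES(clicks);

The paginated detail view then reads the raw table by advertiser and date range, while counts and charts come from the small summary table.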

CTE memory space

Posted: 16 May 2013 06:56 PM PDT

How does CTE memory usage work? When you have complex SQL statements executing with a CTE and you get an unusually large result set, can it increase the logical reads of the query?

I have a CTE that sometimes executes 32 trillion reads, and other times it runs with 200K.
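A quick way to see the logical reads a CTE query actually performs, independent of the plan, is to turn on I/O and time statistics for the session (standard T-SQL, not specific to this query):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run the CTE query here; the Messages tab then reports
-- logical reads per table touched by the statement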

PostgreSQL group roles

Posted: 16 May 2013 02:30 PM PDT

This is not a "group by" question. I am struggling with permissions in postgres because I apparently don't understand how group roles work. Searching for this on SO has proven difficult because I get buried in questions about group by, which isn't the problem.

I have been using postgres to manage a series of research projects. These databases are not connected to any web front-ends. I import all of the data from CSV. For the past few months, I have been the only user. This has worked really well. I am the db owner and nobody else needed to care. Recently, I built a complex database containing data from several sources. Because of the number of tables involved, each data source has its own schema and I have created a series of convenience views to make it easier to work with these various data sets. These schemas should be read only to all users, except for myself.

This project requires several other people to have access to this database. Each user has a username / password and an individual schema, where they can work without cluttering up the public schema. I want to be able to control access to the schemas such that the schemas holding the original data are "select only" to the other analysts. However, I want the all other schemas in the database to more or less function like the public schema in terms of access. These schemas exist for logical structure, not to control access. I want everyone to be able to read/write/create/drop/etc. in these other schemas that don't hold the original data.

Unfortunately, Postgres roles are rather tricky, or I've been rather slow to understand them. Setting this up has been difficult because new tables are created by different users, and I keep having to re-run the GRANT commands manually as we add new tables.

The ALTER DEFAULT PRIVILEGES documentation seems to imply that I can do this using group roles, but when I have tried, other users were unable to access new tables. It suggests I can set up group roles that will let me manage access, but I haven't succeeded thus far:

You can change default privileges only for objects that will be created by yourself or by roles that you are a member of.

I'm just looking for some help to understand how to best utilize group roles to control access to schemas/tables/views in Postgres. I don't want to have to manually grant access every time I add a new table, or worse, manually grant access every time I add a new user. I would prefer to be able to add a new user to a larger group, that controls the access.
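A rough sketch of the pieces that usually have to be combined (schema, role, and user names are placeholders). Note that default privileges only apply to objects created by the role they were altered for, which is the caveat quoted above:

CREATE ROLE analysts NOLOGIN;
GRANT analysts TO alice, bob;

-- read-only access to the schema holding the original data
GRANT USAGE ON SCHEMA rawdata TO analysts;
GRANT SELECT ON ALL TABLES IN SCHEMA rawdata TO analysts;

-- future tables that I (the owner) create in rawdata should also be readable
ALTER DEFAULT PRIVILEGES IN SCHEMA rawdata
    GRANT SELECT ON TABLES TO analysts;

-- shared working schema: everyone in the group can create and use objects
GRANT USAGE, CREATE ON SCHEMA work TO analysts;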

Full text catalog

Posted: 16 May 2013 09:35 AM PDT

I am using SQL Server 2008 R2.

I need to find out when, and by which user, a full-text catalog was created.

Also, who added/modified/deleted the fields (that were full-text indexed) in that full-text catalog?

SA permissions issues with many nested objects

Posted: 16 May 2013 11:58 AM PDT

I have a broker application that's relatively complicated.

Today, after I made some changes, I started getting the error:

The server principal 'sa' is not able to access the database 'XYZ' under the current security context.

The whole scenario up to the point of the error is:

(In Database ABC)

  • A broker message gets submitted into a queue
  • The queue reader picks it up
  • The queue reader updates a table to indicate the work is starting
  • This table has an update trigger on it. The trigger
    • Checks a table in database XYZ to make sure the inserted value for this field is valid
    • The table is accessed via a synonym

The check in the trigger I believe is what is causing the issue.

If I run the update manually, it works fine. I have also used EXECUTE AS to run the update manually as sa which works fine.

Other relevant facts:

  • sa is the owner of both database ABC and database XYZ
  • there's no funny business with the sa account - it's in the db_owner role in both DBs as well

Is there some sort of strange scoping happening because all this is running in the context of broker?

Updates

Some more info:

  • DB ownership chaining is on at the server level but not in the DBs. I turned it on and it made no difference.
  • Using a three part name instead of a synonym didn't make a difference
  • Profiler trace shows that the statements are running as SA and not another login
  • Setting both databases TRUSTWORTHY to ON didn't make a difference
  • If I run the queue activation proc manually, it processes correctly (under my credentials).
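Since cross-database access from an activated procedure or trigger depends on ownership chaining (or TRUSTWORTHY plus a matching, authenticated owner), one thing worth checking, as a sketch, is whether both databases really have the same owner SID rather than just the same owner name:

-- compare the actual owner SIDs, not just the name shown in the UI
SELECT name, owner_sid, SUSER_SNAME(owner_sid) AS owner_login
FROM sys.databases
WHERE name IN ('ABC', 'XYZ');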

How to get the information of the optimizer of PostgreSQL?

Posted: 16 May 2013 02:27 PM PDT

I am curious about information like:

  • total optimization time
  • the time spent on each stage of optimization
  • the number of alternative plans and
  • the cost of each alternative plan.
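PostgreSQL does not expose most of this directly, but a couple of built-in knobs get part of the way (a sketch; the table name is a placeholder, debug output goes to the server log, and per-stage optimizer timings are not reported):

-- per query: the chosen plan with cost estimates and actual run times
EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM some_table WHERE id = 42;

-- session-wide: dump the parse tree / rewritten tree / final plan to the log
SET debug_print_plan = on;
SET debug_pretty_print = on;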

psql coloured prompt misbehaving

Posted: 16 May 2013 02:27 PM PDT

In theory, there is a possibility to use coloured prompts in psql. I decided (not the first time :) that I need them, so I thought I would give them a try. So I have in my .psqlrc the following:

\set prompt_colour '%[%033[1;33m%]'
\set reset_colour '%[%033[0m%]'

\set PROMPT1 :prompt_colour'%n@%/%R%#%':reset_colour

This is very nice, I have a fancy yellow prompt. Everything is good until I start walking back through the command history and hit a long enough command (in my experiments, 37 characters is OK, 38 is too long, and multiline commands of any length do the same) - when I do, the Home key does not go back to the beginning of the row; it stops (in this case) 11 characters to the right. If I then try to edit the command, it gets edited in the wrong positions.

I try to depict this:

# SELECT 123456789012345678 FROM a_table;
[here is an error, of course, but it doesn't count]
# SELECT 123456789012345678 FROM a_table;
[ cursor is  ^ here after pressing Home ]
[ hitting Enter gives the error ]
ERROR:  syntax error at or near "ELECT"
LINE 1: ELECT 123456789012345678 FROM a_table;

(From the documentation it looks like this version should also work, but here the control characters aren't parsed, so I get no coloured prompt, just the two expressions echoed:

\set PROMPT1 "%:prompt_colour:%n@%/%R%#%:reset_colour:"  

but this results in

%[%033[1;33m%]postgres@postgres=#%[%033[0m%]  

)

So does anybody have any idea how to do this properly?

How to avoid using variables in WHERE clause

Posted: 16 May 2013 09:24 AM PDT

Given a (simplified) stored procedure such as this:

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

If the Sale table is large the SELECT can take a long time to execute, apparently because the optimizer can't optimize due to the local variable. We tested running the SELECT part with variables and then with hard-coded dates, and the execution time went from ~9 minutes to ~1 second.

We have numerous stored procedures that query based on "fixed" date ranges (week, month, 8-week etc) so the input parameter is just @endDate and @startDate is calculated inside the procedure.

The question is, what is the best practice for avoiding variables in a WHERE clause so as not to compromise the optimizer?

The possibilities we came up with are shown below. Are any of these best practice, or is there another way?

Use a wrapper procedure to turn the variables into parameters.

Parameters don't affect the optimizer the same way local variables do.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  EXECUTE DateRangeProc @startDate, @endDate
END

CREATE PROCEDURE DateRangeProc(@startDate DATE, @endDate DATE)
AS
BEGIN
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

Use parameterized dynamic SQL.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  DECLARE @sql NVARCHAR(4000) = N'
    SELECT
      -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate
  '
  DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
  EXECUTE sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END

Use "hard-coded" dynamic SQL.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  DECLARE @sql NVARCHAR(4000) = N'
    SELECT
      -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate
  '
  SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
  SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
  EXECUTE sp_executesql @sql
END

Use the DATEADD() function directly.

I'm not keen on this because calling functions in the WHERE also affects performance.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN DATEADD(DAY, -6, @endDate) AND @endDate
END

Use an optional parameter.

I'm not sure if assigning to parameters would have the same problem as assigning to variables, so this might not be an option. I don't really like this solution but including it for completeness.

CREATE PROCEDURE WeeklyProc(@endDate DATE, @startDate DATE = NULL)
AS
BEGIN
  SET @startDate = DATEADD(DAY, -6, @endDate)
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END
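One more variant, not listed above but often suggested for this exact problem: OPTION (RECOMPILE) makes the optimizer compile the statement with the current values of the local variables, at the cost of a compile on every execution. A sketch:

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
  OPTION (RECOMPILE)  -- variable values are known at compile time
END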

Recommendation for mySQL table structure (no code)

Posted: 16 May 2013 08:35 PM PDT

I need some advice on how to layout the tables for a mySQL database.

I will first describe what I am trying to do:

  1. I have a table for users. (This covers basic info for now)
  2. A user will select a certain workout plan (say plan 1-6). Different workout
  • The user will then check a number of checkboxes. Say there are 15 checkboxes for the different machines; a user can select however many he/she wants. Let's say the user chooses 6 different checkboxes, not necessarily in order. Each checkbox would have a certain ID. Those checkboxes could post a value of 1 to the database if checked
  4. On the next page a list of every checkbox the user clicked would be generated. Each checkbox has a picture and name associated with it. EX: For gym machines
  5. Each machine the user chose is displayed along with a certain number of sets that is set by an admin

I already have the layout pretty much done. I just need to know how to lay out my tables in the database so I can start populating data dynamically.

Do I just make 1 table for users and add like 40 different columns? Probably not.

So let's say a user has clicked 6 machines; each of those machines gets a value of 1. Then on the next page each machine with a value of 1 is displayed along with the picture and name associated with that machine. The number of sets will be set by the admin.

Any ideas? I can clarify further if needed.

If you want a better idea of what i am doing, take a look at this picture:

(Screenshot) This picture shows the page after the user has selected which machines he/she wants. You can see that 12 different machines were chosen and they are all listed here.

The sets at the top were created by an admin and also reflect each machine shown at the bottom.

Everything is being populated from a database, and I already have the layout done; I just need a good recommendation for a table structure.
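One common shape for this, sketched with assumed names: a machines table, and a junction table recording which machines a user picked, with the admin-defined sets stored per machine:

CREATE TABLE users (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) NOT NULL
);

CREATE TABLE machines (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(100) NOT NULL,
    image_url VARCHAR(255),
    sets      INT NOT NULL DEFAULT 3      -- set by the admin
);

CREATE TABLE user_machines (
    user_id    INT NOT NULL,
    machine_id INT NOT NULL,
    PRIMARY KEY (user_id, machine_id)     -- one row per checked checkbox
);

-- machines the user checked, with picture, name, and sets
SELECT m.name, m.image_url, m.sets
FROM user_machines um
JOIN machines m ON m.id = um.machine_id
WHERE um.user_id = 42;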

How to disable oracle's MAX_ENABLED_ROLES limit

Posted: 16 May 2013 01:08 PM PDT

How can I disable Oracle's MAX_ENABLED_ROLES limit, or increase the value of the limit? [Oracle 10g (win32)]

Avoiding Multiple Queries when Searching for Records Associated with a Set of Records

Posted: 16 May 2013 08:08 PM PDT

So, I am sure I have done something really stupid while designing this, and I'm open to schema changes if they'll really help me out.

On to the problem:

I have a custom shopping cart system (backed by a MySQL database) that includes a products table and a price_rules table that's used for computing discounts and applying promo code discounts. Some price rules don't have promo codes attached to them; some are simply "10% off of product X from March 1st through April 1st" or similar. Because a single price rule can apply to many individual products, I also have a join table called price_rule_product.

When showing a set of products (for example, on the main shop page or listing all products in a category) I'm currently running a separate query for each product to look for price rules that apply to that product. Here's what one of those queries looks like:

SELECT DISTINCT price_rule.* FROM price_rule
    INNER JOIN price_rule_product ON
        price_rule.id = price_rule_product.price_rule_id
    INNER JOIN product ON
        product.id = price_rule_product.product_id
    WHERE product.id = 10
        AND price_rule.apply_to = 'line_items'
        AND ( price_rule.start_date IS NULL
            OR price_rule.start_date = 0
            OR price_rule.start_date <= 1366063902 )
        AND ( price_rule.end_date IS NULL
            OR price_rule.end_date = 0
            OR price_rule.end_date >= 1366063902 )
        AND ( price_rule.promo_code IS NULL
            OR price_rule.promo_code = '' )
    ORDER BY sort ASC

Oh SQL Gods, I pray you have some suggestions/solutions for this. It is causing some significant performance issues, and I'm just not experienced enough with SQL to figure out where to go from here.

EDIT: Here's the output of EXPLAIN SELECT both with and without DISTINCT:

WITH DISTINCT

id  select_type  table               type    possible_keys  key      key_len  ref                            rows  Extra
1   SIMPLE       product             const   PRIMARY        PRIMARY  3        const                          1     Using index; Using temporary; Using filesort
1   SIMPLE       price_rule          ALL     PRIMARY        NULL     NULL     NULL                           14    Using where
1   SIMPLE       price_rule_product  eq_ref  PRIMARY        PRIMARY  8        craft_art.price_rule.id,const  1     Using where; Using index; Distinct

WITHOUT DISTINCT

1   SIMPLE       product             const   PRIMARY        PRIMARY  3        const                          1     Using index; Using filesort
1   SIMPLE       price_rule          ALL     PRIMARY        NULL     NULL     NULL                           14    Using where
1   SIMPLE       price_rule_product  eq_ref  PRIMARY        PRIMARY  8        craft_art.price_rule.id,const  1     Using where; Using index
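One direction to explore, shown as a sketch: fetch the rules for a whole page of products in a single query by selecting the product id alongside each rule and filtering with IN, then group the rows by product in application code. The id list here is a placeholder for the products being listed:

SELECT price_rule_product.product_id, price_rule.*
FROM price_rule
INNER JOIN price_rule_product
        ON price_rule.id = price_rule_product.price_rule_id
WHERE price_rule_product.product_id IN (10, 11, 12, 13)
  AND price_rule.apply_to = 'line_items'
  AND (price_rule.start_date IS NULL OR price_rule.start_date = 0 OR price_rule.start_date <= 1366063902)
  AND (price_rule.end_date   IS NULL OR price_rule.end_date   = 0 OR price_rule.end_date   >= 1366063902)
  AND (price_rule.promo_code IS NULL OR price_rule.promo_code = '')
ORDER BY price_rule_product.product_id, sort ASC;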

sql server 2008 execution plan different on two production servers

Posted: 16 May 2013 02:28 PM PDT

We have two servers running SQL Server 2008 R2, with identical databases, where we implemented a data warehousing solution. The cube processing operation was taking a VERY long time on one of the servers, which led us to dig into the queries generated by SQL Server. We found the query that was taking a long time, and we saw that it is generated from a many-to-many relationship in the data warehouse, where 6 joins of the same table are being executed. This table contains about 4M records.

We looked at the execution plans of this query on both servers. On the first server, the execution plan uses parallelism and executes in 90 secs, whereas the second uses only serial execution, which results in a 14 HOUR execution time.

Data in the two servers is different. The server which takes more time has more data (obviously). We tried to update statistics, rebuild indexes, re-compute execution plans, copy statistics from one server to the other, but no result!

Hope you can help us guys with this problem, because we're running on a production server and clients are waiting to see their reports from the datawarehouse.

Thanks in advance
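Two server-level settings worth comparing between the two machines, since they directly control whether a parallel plan is chosen at all (a quick check, standard T-SQL):

SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('max degree of parallelism', 'cost threshold for parallelism');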

effective mysql table/index design for 35 million rows+ table, with 200+ corresponding columns (double), any combination of which may be queried

Posted: 16 May 2013 11:08 AM PDT

I am looking for advice on table/index design for the following situation:

I have a large table (stock price history data, InnoDB, 35 million rows and growing) with a compound primary key (assetid (int), date (date)). In addition to the pricing information, I have 200 double values that need to correspond to each record.

CREATE TABLE `mytable` (
  `assetid` int(11) NOT NULL,
  `date` date NOT NULL,
  `close` double NOT NULL,
  `f1` double DEFAULT NULL,
  `f2` double DEFAULT NULL,
  `f3` double DEFAULT NULL,
  `f4` double DEFAULT NULL,
  -- ... skip a few ...
  `f200` double DEFAULT NULL,
  PRIMARY KEY (`assetid`, `date`)
) ENGINE=`InnoDB` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci
  ROW_FORMAT=COMPACT CHECKSUM=0 DELAY_KEY_WRITE=0
  PARTITION BY RANGE COLUMNS(`date`) PARTITIONS 51;

I initially stored the 200 double columns directly in this table for ease of update and retrieval, and this had been working fine, as the only querying done on this table was by assetid and date (these are religiously included in any query against this table), and the 200 double columns were only read. My database size was around 45 Gig.

However, I now have the requirement of being able to query this table by any combination of these 200 columns (named f1, f2, ... f200), for example:

select * from mytable
where assetid in (1,2,3,4,5,6,7,....)
and date > '2010-1-1' and date < '2013-4-5'
and f1 > -0.23 and f1 < 0.9
and f117 > 0.012 and f117 < .877
etc, etc

I have not historically had to deal with this large an amount of data before, so my first instinct was that indexes were needed on each of these 200 columns, or I would wind up with large table scans, etc. To me this meant that I needed a table for each of the 200 columns with primary key, value, and an index on the values. So I went with that.

CREATE TABLE `f1` (
  `assetid` int(11) NOT NULL DEFAULT '0',
  `date` date NOT NULL DEFAULT '0000-00-00',
  `value` double NOT NULL DEFAULT '0',
  PRIMARY KEY (`assetid`, `date`),
  INDEX `val` (`value`)
) ENGINE=`InnoDB` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci ROW_FORMAT=COMPACT CHECKSUM=0 DELAY_KEY_WRITE=0;

I filled up and indexed all 200 tables. I left the main table intact with all 200 columns, as regularly it is queried over assetid and date range and all 200 columns are selected. I figured that leaving those columns in the parent table (unindexed) for read purposes, and then additionally having them indexed in their own tables (for join filtering), would be most performant. I ran EXPLAIN on the new form of the query:

select count(p.assetid) as total
from mytable p
inner join f1 f1 on f1.assetid = p.assetid and f1.date = p.date
inner join f2 f2 on f2.assetid = p.assetid and f2.date = p.date
where p.assetid in (1,2,3,4,5,6,7)
and p.date >= '2011-01-01' and p.date < '2013-03-14'
and (f1.value >= 0.96 and f1.value <= 0.97 and f2.value >= 0.96 and f2.value <= 0.97)

Indeed my desired result was achieved: EXPLAIN shows that far fewer rows are scanned for this query. However, I wound up with some undesirable side effects.

1) My database went from 45 Gig to 110 Gig. I can no longer keep the db in RAM. (I have 256 Gig of RAM on the way, however.)

2) Nightly inserts of new data now need to be done 200 times instead of once.

3) Maintenance/defrag of the new 200 tables takes 200 times longer than just the 1 table. It cannot be completed in a night.

4) Queries against the f1, etc. tables are not necessarily performant. For example:

select min(value) from f1
where assetid in (1,2,3,4,5,6,7)
and date >= '2013-3-18' and date < '2013-3-19'

The above query, while EXPLAIN shows that it is looking at < 1000 rows, can take 30+ seconds to complete. I assume this is because the indexes are too large to fit in memory.

Since that was a lot of bad news, I looked further and found partitioning. I implemented partitions on the main table, partitioned on date every 3 months. Monthly seemed to make sense to me, but I have read that once you get over 120 partitions or so, performance suffers. Partitioning quarterly will leave me under that for the next 20 years or so. Each partition is a bit under 2 Gig. I ran EXPLAIN PARTITIONS and everything seems to be pruning properly, so regardless I feel the partitioning was a good step, at the very least for analyze/optimize/repair purposes.

I spent a good deal of time with this article

http://ftp.nchu.edu.tw/MySQL/tech-resources/articles/testing-partitions-large-db.html

My table currently is partitioned with the primary key still on it. The article mentions that primary keys can make a partitioned table slower, but if you have a machine that can handle it, primary keys on the partitioned table will be faster. Knowing I have a big machine on the way (256 G RAM), I left the keys on.

So as I see it, here are my options:

Option 1

1) Remove the extra 200 tables and let the query do table scans to find the f1, f2, etc. values. Non-unique indexes can actually hurt performance on a properly partitioned table. Run an EXPLAIN before the user runs the query and deny them if the number of rows scanned is over some threshold I define. Save myself the pain of the giant database. Heck, it will all be in memory soon anyways.

sub-question:

Does it sound like I have chosen an appropriate partition scheme?

Option 2

Partition all 200 tables using the same 3-month scheme. Enjoy the smaller row scans and allow the users to run larger queries. Now that they are partitioned, at least I can manage them 1 partition at a time for maintenance purposes. Heck, it will all be in memory soon anyways. Develop an efficient way to update them nightly.

sub-question:

Do you see a reason to avoid primary key indexes on these f1, f2, f3, f4... tables, knowing that I always have assetid and date when querying? It seems counter-intuitive to me, but I am not used to data sets of this size. That would shrink the database a bunch, I assume.

Option 3

Drop the f1, f2, f3 columns in the master table to reclaim that space. Do 200 joins if I need to read 200 features; maybe it won't be as slow as it sounds.

Option 4

You all have a better way to structure this than I have thought of so far.

* NOTE: I will soon be adding another 50-100 of these double values to each item, so I need to design knowing that is coming.

Thanks for any and all help.

Update #1 - 3/24/2013

I went with the idea suggested in the comments I got below and created one new table with the following setup:

create table features (
  assetid int,
  date    date,
  feature varchar(4),
  value   double
);

I partitioned the table in 3 month intervals.

I blew away the earlier 200 tables so that my database was back down to 45 Gig and started filling up this new table. A day and a half later, it completed, and my database now sits at a chubby 220 Gigs!

It does allow the possibility of removing these 200 values from the master table, as I can get them from one join, but that would really only give me back 25 Gigs or so, maybe.

I asked it to create a primary key on assetid, date, feature and an index on value, and after 9 hours of chugging it really hadn't made a dent and seemed to freeze up, so I killed that part off.

I rebuilt a couple of the partitions but it did not seem to reclaim much/any space.

So that solution looks like it probably isn't going to be ideal. Do rows take up significantly more space than columns, I wonder? Could that be why this solution took up so much more space?

I came across this article

http://www.chrismoos.com/2010/01/31/mysql-partitioning-tables-with-millions-of-rows

it gave me an idea.

where he says

"At first, I thought about RANGE partitioning by date, and while I am using the date in my queries, it is very common for a query to have a very large date range, and that means it could easily span all partitions."

Now I am range partitioning by date as well, but I will also be allowing searches by large date range, which will decrease the effectiveness of my partitioning. I will always have a date range when I search, however I will also always have a list of assetids. Perhaps my solution should be to partition by assetid and date, where I identify typically searched assetid ranges (which I can come up with; there are standard lists, S&P 500, Russell 2000, etc.). This way I would almost never look at the entire data set.

Then again, I am primary keyed on assetid and date anyways, so maybe that wouldn't help much.

Any more thoughts/comments would be appreciated.

thanks

How can I verify I'm using SSL to connect to mysql?

Posted: 16 May 2013 03:08 PM PDT

I have configured my server to allow SSL, and have modified my client ~/.my.cnf so I use SSL:

[client]
ssl
ssl-cipher=DHE-RSA-AES256-SHA
ssl-ca=~/certs/ca-cert.pem

When I log in with my client and view the status, it lists a cipher on the SSL line:

mysql> \s
--------------
SSL:            Cipher in use is DHE-RSA-AES256-SHA

Without installing something like wireshark to verify that the connection is secure, can I assume that I'm connecting via SSL based on this information?
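The session status variables give the same answer from inside the connection, which may be enough reassurance without a packet capture (standard MySQL commands):

SHOW SESSION STATUS LIKE 'Ssl_cipher';    -- a non-empty value means this connection is encrypted
SHOW SESSION STATUS LIKE 'Ssl_version';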

MySQL Replication using SSL

Posted: 16 May 2013 05:08 PM PDT

I am in the process of replicating my database so I can have a master-slave configuration. One of the issues I have is with security: I am basically generating my server/client keys and certificates using OpenSSL, and I also generate my own CA key and certificate to self-sign. I understand the issues with self-signed certificates on a public website, but do you think this will be as serious a problem when used for replication?
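For reference, the replication side is wired up roughly like this (a sketch with placeholder hosts, account names, and paths); the slave only has to trust your private CA, which is a much smaller exposure than a public-facing website:

-- on the master: require the replication account to use SSL
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'secret' REQUIRE SSL;

-- on the slave: point replication at the master over SSL
CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_SSL = 1,
    MASTER_SSL_CA   = '/etc/mysql/certs/ca-cert.pem',
    MASTER_SSL_CERT = '/etc/mysql/certs/client-cert.pem',
    MASTER_SSL_KEY  = '/etc/mysql/certs/client-key.pem';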

Finding out the hosts blocked by mysql server

Posted: 16 May 2013 10:07 AM PDT

Can someone tell me how to list the hosts which are blocked by the MySQL server because they crossed the max_connect_errors limit? Is there any table in which the MySQL server keeps this data? I am using mysql-server-5.1.63.
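As far as I know, 5.1 has no table exposing the host cache (the performance_schema.host_cache table only appears in later versions, 5.6 onward), so about the only lever in 5.1 is clearing the cache, which unblocks everything:

FLUSH HOSTS;   -- empties the host cache, unblocking hosts that hit max_connect_errors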

Delete word, its meanings, its meaning's example sentences from DB

Posted: 16 May 2013 02:08 PM PDT

I have three tables as below (simplified for demonstration):

words
=====
integer id
text    word

meanings
========
integer id
integer word_id
text    meaning

examples
========
integer id
integer meaning_id
text    sentence

where word_id stores the id of the word in the words table and meaning_id stores the id of the meaning in the meanings table. I am trying to figure out a SQL query that, given a word's id, deletes the word with all its meanings and example sentences at one time. Is such a SQL query possible to compose? If so, how?

Edit1: I am using SQLite3 as the database.

Edit2: I figured the following solution which requires 3 sql queries in order:

DELETE FROM examples WHERE meaning_id IN (SELECT id FROM meanings WHERE word_id=the_given_id);
DELETE FROM meanings WHERE word_id=the_given_id;
DELETE FROM words WHERE id=the_given_id;

I'm still looking for the answer to my question: is the whole process possible to be done in one query?
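Not a single DELETE statement as asked, but if the schema can be changed, SQLite's foreign keys with ON DELETE CASCADE achieve the same effect from one statement. A sketch, assuming the existing words(id, word) table and noting that foreign key enforcement must be switched on per connection:

PRAGMA foreign_keys = ON;

CREATE TABLE meanings (
    id       INTEGER PRIMARY KEY,
    word_id  INTEGER NOT NULL REFERENCES words(id) ON DELETE CASCADE,
    meaning  TEXT
);

CREATE TABLE examples (
    id         INTEGER PRIMARY KEY,
    meaning_id INTEGER NOT NULL REFERENCES meanings(id) ON DELETE CASCADE,
    sentence   TEXT
);

-- now a single statement removes the word, its meanings, and their examples
DELETE FROM words WHERE id = 1;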

How to modify an update in Oracle so it performs faster?

Posted: 16 May 2013 12:08 PM PDT

I have this query:

UPDATE
(
    SELECT h.valid_through_dt, h.LAST_UPDATE_TMSTMP
    FROM ETL_FEE_SCH_TMP d, FEE_SCHEDULE_HISTORICAL h
    WHERE
    h.FUND_ID = d.FUND_ID
    AND h.FEETYPE_NAME = d.FEETYPE_NAME
    AND h.BREAKPOINT_TYPE = d.BREAKPOINT_TYPE
    AND h.BREAKPOINT_QTY = d.BREAKPOINT_QTY
    AND h.LOW_BREAKPOINT_AMT = d.LOW_BREAKPOINT_AMT
    AND h.VALID_THROUGH = TO_DATE ('31-DEC-9999', 'dd-mon-yyyy')
    AND h.universe = 'DC'
    AND h.universe = d.universe
    AND EXISTS
    (
        SELECT 1
        FROM FEE_SCHEDULE s
        WHERE s.FUND_ID = h.FUND_ID
        AND s.FEETYPE_NAME = h.FEETYPE_NAME
        AND s.BREAKPOINT_TYPE = h.BREAKPOINT_TYPE
        AND s.BREAKPOINT_QTY = h.BREAKPOINT_QTY
        AND s.LOW_BREAKPOINT_AMT = h.LOW_BREAKPOINT_AMT
        AND s.universe = 'DC'
    )
) updateTable
SET
updateTable.VALID_THROUGH = (SYSDATE - 1),
updateTable.LAST_UPDATE_TMSTMP = SYSTIMESTAMP;

The trouble I am having is that this query takes a long time to run. I don't know whether it is possible to run it in parallel, or whether it would be easier to update via a cursor in a pipelined function.

What would you suggest?

This is all the information that I believe is relevant.

This is the execution plan of the internal select:

Execution Plan
----------------------------------------------------------
Plan hash value: 57376096
---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                          |     1 |   306 |  8427   (1)| 00:01:42 |
|   1 |  NESTED LOOPS                |                          |       |       |            |          |
|   2 |   NESTED LOOPS               |                          |     1 |   306 |  8427   (1)| 00:01:42 |
|   3 |    MERGE JOIN CARTESIAN      |                          |     1 |   192 |  8426   (1)| 00:01:42 |
|*  4 |     INDEX RANGE SCAN         | SYS_C000666              |     1 |    96 |     2   (0)| 00:00:01 |
|   5 |     BUFFER SORT              |                          |  3045K|   278M|  8425   (1)| 00:01:42 |
|   6 |      SORT UNIQUE             |                          |  3045K|   278M|  8425   (1)| 00:01:42 |
|*  7 |       TABLE ACCESS FULL      | FEE_SCHEDULE             |  3045K|   278M|  8425   (1)| 00:01:42 |
|*  8 |    INDEX RANGE SCAN          | FEE_SCHDL_IDX1           |     1 |       |     1   (0)| 00:00:01 |
|*  9 |   TABLE ACCESS BY INDEX ROWID| FEE_SCHEDULE_HISTORICAL  |     1 |   114 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("D"."UNIVERSE"='DC')
   7 - filter("S"."UNIVERSE"='DC')
   8 - access("H"."UNIVERSE"='DC' AND "S"."FUND_ID"="H"."FUND_ID" AND
              "S"."FEETYPE_NAME"="H"."FEETYPE_NAME" AND
              "S"."BREAKPOINT_TYPE"="H"."BREAKPOINT_TYPE" AND
              "S"."BREAKPOINT_QTY"="H"."BREAKPOINT_QTY" AND
              "S"."LOW_BREAKPOINT_AMT"="H"."LOW_BREAKPOINT_AMT")
       filter("H"."FUND_ID"="D"."FUND_ID" AND
              "H"."FEETYPE_NAME"="D"."FEETYPE_NAME" AND
              "H"."BREAKPOINT_TYPE"="D"."BREAKPOINT_UNIT_TY

Table data:

UNIVERSE|FUND_ID   |FEETYPE_NAME |BREAKPOINT_TYPE|BREAKPOINT_QTY|LOW_BREAKPOINT_AMT|HIGH_BREAKPOINT_AMT|FEE_PCT|FEE_SCHDL_SEQ_ID|GROUP_ID|LAST_UPDATE_TMSTMP  |VALID_FROM|VALID_THROUGH|INSERT_TMSTMP        |JOB_ID|
--------|----------|-------------|---------------|--------------|------------------|-------------------|-------|----------------|--------|--------------------|----------|-------------|---------------------|------|
DC      |DC9ZTPLPHO|DeferLoad    |Percentage     |4             |10000             |300000             |3.14   |780250          |null    |1/4/2012  3:59:54 PM|6/23/2012 |12/31/9999   |1/5/2011   3:59:54 PM|666   |
DC      |DCE86Y8XFU|RedemptionFee|Percentage     |9             |  100             |100500             |7.67   |780251          |null    |6/4/2012  4:49:54 PM|11/12/2011|12/31/9999   |8/17/2011  2:00:54 PM|666   |
DC      |DCAYL0KONA|FrontLoad    |Percentage     |2             |50000             |601500             |5.00   |780252          |null    |4/25/2012 4:49:54 PM|8/2/2012  |12/31/9999   |12/19/2012 9:59:00 PM|666   |
DC      |DC9ZTPLPHO|DeferLoad    |Percentage     |7             |80000             |900000             |2.24   |780252          |null    |4/25/2012 4:49:54 PM|8/2/2012  |12/31/9999   |12/19/2012 9:59:00 PM|666   |

This is the script of the historical table:

CREATE TABLE FEE_SCHEDULE_HISTORICAL
(
  UNIVERSE                        VARCHAR2(2 BYTE) NOT NULL,
  FUND_ID                         VARCHAR2(10 BYTE) NOT NULL,
  FEETYPE_NAME                    VARCHAR2(75 BYTE),
  BREAKPOINT_TYPE                 VARCHAR2(50 BYTE),
  BREAKPOINT_QTY                  VARCHAR2(10 BYTE),
  LOW_BREAKPOINT_AMT              NUMBER(19,6),
  HIGH_BREAKPOINT_AMT             NUMBER(19,6),
  FEE_PCT                         NUMBER(19,6),
  FEE_SCHDL_SEQ_ID                NUMBER        NOT NULL,
  GROUP_ID                        NUMBER,
  LAST_UPDATE_TMSTMP              DATE          NOT NULL,
  VALID_FROM                      DATE          NOT NULL,
  VALID_THROUGH                   DATE          NOT NULL,
  INSERT_TMSTMP                   DATE          NOT NULL,
  JOB_ID                          NUMBER        NOT NULL
);

CREATE UNIQUE INDEX FEE_SCHDL_PK ON FEE_SCHEDULE_HISTORICAL(FEE_SCHDL_SEQ_ID);

CREATE UNIQUE INDEX FEE_SCHDL_HST_IDX ON FEE_SCHEDULE_HISTORICAL (
    UNIVERSE,
    FUND_ID,
    FEETYPE_NAME,
    BREAKPOINT_TYPE,
    BREAKPOINT_QTY,
    LOW_BREAKPOINT_AMT,
    VALID_FROM,
    JOB_ID
);

CREATE INDEX FEE_SCHEDULE_HST_IDX2 ON FEE_SCHEDULE_HISTORICAL(LAST_UPDATE_TMSTMP);

CREATE INDEX FEE_SCHEDULE_HST_IDX3 ON FEE_SCHEDULE_HISTORICAL(VALID_THROUGH);

ALTER TABLE FEE_SCHEDULE_HISTORICAL ADD (
    CONSTRAINT FEE_SCHDL_PK
    PRIMARY KEY
    (FEE_SCHDL_SEQ_ID)
);

This is the other table:

CREATE TABLE FEE_SCHEDULE
(
  UNIVERSE                        VARCHAR2(2 BYTE) NOT NULL,
  FUND_ID                         VARCHAR2(10 BYTE) NOT NULL,
  FEETYPE_NAME                    VARCHAR2(75 BYTE),
  BREAKPOINT_TYPE                 VARCHAR2(50 BYTE),
  BREAKPOINT_QTY                  VARCHAR2(10 BYTE),
  LOW_BREAKPOINT_AMT              NUMBER(19,6),
  HIGH_BREAKPOINT_AMT             NUMBER(19,6),
  FEE_PCT                         NUMBER(19,6),
  JOB_RUN_ID                      NUMBER        NOT NULL,
  FILE_DATE                       DATE          NOT NULL,
  CYCLE_DATE                      DATE          NOT NULL
);

The temporary table is the result of FEE_SCHEDULE_HISTORICAL minus FEE_SCHEDULE
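One avenue worth testing on a copy first (a sketch, not the original statement): enable parallel DML for the session and rewrite the updatable join view as a plain UPDATE with EXISTS, which should be logically equivalent here and which Oracle can parallelize if the edition and resources allow it:

ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(h 4) */ FEE_SCHEDULE_HISTORICAL h
SET    h.VALID_THROUGH      = SYSDATE - 1,
       h.LAST_UPDATE_TMSTMP = SYSTIMESTAMP
WHERE  h.VALID_THROUGH = TO_DATE('31-DEC-9999', 'dd-mon-yyyy')
AND    h.universe = 'DC'
AND    EXISTS (SELECT 1
               FROM   ETL_FEE_SCH_TMP d
               WHERE  d.FUND_ID = h.FUND_ID
               AND    d.FEETYPE_NAME = h.FEETYPE_NAME
               AND    d.BREAKPOINT_TYPE = h.BREAKPOINT_TYPE
               AND    d.BREAKPOINT_QTY = h.BREAKPOINT_QTY
               AND    d.LOW_BREAKPOINT_AMT = h.LOW_BREAKPOINT_AMT
               AND    d.universe = h.universe)
AND    EXISTS (SELECT 1
               FROM   FEE_SCHEDULE s
               WHERE  s.FUND_ID = h.FUND_ID
               AND    s.FEETYPE_NAME = h.FEETYPE_NAME
               AND    s.BREAKPOINT_TYPE = h.BREAKPOINT_TYPE
               AND    s.BREAKPOINT_QTY = h.BREAKPOINT_QTY
               AND    s.LOW_BREAKPOINT_AMT = h.LOW_BREAKPOINT_AMT
               AND    s.universe = 'DC');

COMMIT;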

Query to find and replace text in all tables and fields of a mysql db

Posted: 16 May 2013 04:08 PM PDT

I need to run a query to find and replace some text in all tables of a mysql database.

I found this query, but it only looks for the text in the tbl_name table and just in the column field.

update tbl_name set column=REPLACE(column, 'fuschia', 'fuchsia');   

I need it to look in all tables and all fields: (everywhere in the database)
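MySQL has no single statement for this, but information_schema can generate the per-column statements for you. A sketch: replace your_db with the schema name, and review the generated statements before running them:

SELECT CONCAT('UPDATE `', table_name, '` SET `', column_name, '` = REPLACE(`',
              column_name, '`, ''fuschia'', ''fuchsia'');') AS stmt
FROM information_schema.columns
WHERE table_schema = 'your_db'
  AND data_type IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext');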

Cannot connect to MS SQL 2008 R2 by DBVisualizer. "Native SSPI library not loaded" error

Posted: 16 May 2013 07:18 PM PDT

I am trying to connect to an MS SQL 2008 R2 database with DbVisualizer. I use the jTDS driver, but the following error occurs:

An error occurred while establishing the connection:

Long Message:
I/O Error: SSO Failed: Native SSPI library not loaded. Check the java.library.path system property.

Details:
   Type: java.sql.SQLException
   SQL State: 08S01

I added ntlmauth.dll to jre/bin but it didn't help. I have java 1.6.0_25.
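If placing ntlmauth.dll on java.library.path keeps failing, jTDS can also authenticate without the native SSPI library by passing the Windows domain explicitly in the connection URL and supplying the username and password in the connection dialog. A sketch, with host, database, and domain as placeholders:

jdbc:jtds:sqlserver://dbhost:1433/XYZ;domain=MYDOMAIN;useNTLMv2=true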
