[clue-tech] hairy sql (postgres) - suggestions welcome

Nate Duehr nate at natetech.com
Thu Nov 9 23:06:46 MST 2006


On Nov 9, 2006, at 10:43 PM, Dave Price wrote:

> Right now, the query runs in .5 seconds for 1 month over 100,000
> records returning 250 records for a single device on an amd64, but
> takes almost 1.5 minutes against the live database on a 3ghz Xeon
> which is receiving constant INSERTS from a farm of asterisk boxes.

Doesn't that seem to hint that the problem isn't the query itself,  
but the internal locking going on inside the DB while the INSERTs  
are happening?

I'm no SQL guru, but the data you just shared shows that on a quiet  
DB the performance is much faster, but you didn't say how many  
records are in the live DB.  Is it scaling somewhat linearly, or is  
it getting far worse when you go from test to production?

Those INSERTs have me wondering what they do under the hood to the  
engine itself.  Since this is near-real-time data, I suppose the  
INSERTs are immediately COMMITted (assuming postgres does such  
things; I think it does).  If those COMMITs are happening, do they  
cause a massive slowdown?
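One related thought: if every INSERT is its own transaction, each one forces the engine to sync the log at COMMIT time, and that per-row cost adds up fast under a constant stream of writes.  A hedged sketch of the batching idea (the cdr table and its columns are assumed Asterisk-style names, not taken from the original post):

```sql
-- Sketch: batch many inserts into one transaction so the engine only
-- syncs the log once per batch instead of once per row.
-- Table and column names (cdr, calldate, src, dst) are assumed.
BEGIN;
INSERT INTO cdr (calldate, src, dst) VALUES (now(), '100', '200');
INSERT INTO cdr (calldate, src, dst) VALUES (now(), '101', '201');
-- ... more rows from the same batch ...
COMMIT;
```

Whether that's even possible depends on how the asterisk boxes write their records, of course.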

On some commercial systems (thinking Informix here, just 'cause  
it's what I'm currently used to seeing), if you're querying an old,  
stable timeframe in the data (no chance any new records are being  
added that you truly need for your query; say you never look at  
today's data, only yesterday's back through 60 days, or whatever),  
you can set the isolation level to allow dirty reads and still know  
your query over the "old" data is fine.  Then the engine can avoid  
even having to follow the locking rules for your query.  The  
Informix equivalent would be "set isolation dirty-read" before the  
query.
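For what it's worth, on the postgres side my understanding is that it accepts the READ UNCOMMITTED keyword but actually treats it as READ COMMITTED, because its MVCC design gives every query a consistent snapshot anyway, so a plain SELECT shouldn't block on the INSERT writers' locks at all.  A sketch, with the same made-up table name as above:

```sql
-- Postgres accepts this, but READ UNCOMMITTED behaves the same as
-- READ COMMITTED: MVCC snapshots mean the SELECT never sees (or
-- waits on) uncommitted rows from the INSERT farm anyway.
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT count(*) FROM cdr WHERE calldate < current_date;  -- table name assumed
COMMIT;
```

So if the slowdown really is lock-related, it may be something other than plain read-vs-write contention.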

I didn't look over your SQL closely enough to tell if you need  
today's "live" updating data, or if you're just post-processing.

I'm just asking out of curiosity -- I'm not (much) help with your SQL  
query itself.  Especially not knowing the internals of what costs  
what on postgres.

Of course (not knowing how postgres works, I probably shouldn't "go  
here" because I don't know what I'm talking about) -- DISCLAIMER...

Are there any fields you're using in that query that can easily and  
safely be indexed better by the postgres engine itself to speed up  
your query?  Anything on those tables that's not indexed but should be?
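A generic way to check, sketched with made-up table and column names (the real query and schema weren't quoted here): run the query under EXPLAIN ANALYZE and see whether it's doing a sequential scan over the whole table.

```sql
-- Show the actual plan and timings (names below are assumed, not
-- from the original query):
EXPLAIN ANALYZE
SELECT *
  FROM cdr
 WHERE src = 'some-device'
   AND calldate >= '2006-10-01'
   AND calldate <  '2006-11-01';

-- If that reports a Seq Scan on cdr, an index matching the filter
-- columns may turn it into an index scan:
CREATE INDEX cdr_src_calldate_idx ON cdr (src, calldate);
```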

Just generic DB "why's it slow" type of questions -- nothing specific  
in mind here.

Also you didn't mention if you measured raw I/O speeds between those  
two machines.  Are the disks and bus faster by a large margin on the  
AMD64 than on the Xeons?

There are... um, LOTS of variables.  You need to define and  
eliminate some of the standard ones while you're hammering on the  
SQL itself.  I've seen some amazing performance increases on DBs  
when someone took the time to figure out and quantify where the  
bottlenecks were.  But those examples were on commercial DB  
engines.  Oracle and Informix, mostly.

> We are looking at slony replication so that we can query an offline
> copy, but that is not
> in production yet

That's one way to "throw hardware at the problem" and might work, but  
it adds an operational layer of complexity and could cause other  
problems.  Just sayin'...

If you're thinking about replication, you're going to need to plan  
on doing resyncs of the replication server at a quieter time, since  
in the real world you *will* have replication failures eventually.   
If there already is such a quiet time, can you run your query during  
that timeframe and see if it's significantly faster than during the  
heavy/busy time with all the INSERTs?  Batching up the heavy-hitter  
processes for late at night, or whenever the production DB is quiet,  
is simpler than setting up replication.
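If you're not sure when (or whether) the DB actually goes quiet, postgres can tell you directly from its stats views -- something like this (the '<IDLE>' marker is how the 8.x-era pg_stat_activity reported idle connections; check your version):

```sql
-- Rough busy-ness check: count backends that are actually running
-- something right now, as opposed to sitting idle.
SELECT count(*) AS active_queries
  FROM pg_stat_activity
 WHERE current_query <> '<IDLE>';
```

Sample that every few minutes over a day and you'll have a decent picture of the quiet windows.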

Does postgres still have that silly vacuum command?  Has it been run  
lately on the production DB to clean things up?  Do you run your  
query right after that?  It's been a looooong looooong time since I  
messed around with postgres.
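(It does, and if it hasn't been run it can matter a lot under this kind of write load: dead row versions left behind by all that traffic bloat the table and skew the planner's statistics.  A minimal sketch, table name assumed as before:)

```sql
-- ANALYZE refreshes the planner statistics at the same time:
VACUUM ANALYZE;
-- Or just the one hot table (name assumed):
VACUUM ANALYZE cdr;
```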

Just thoughts...
--
Nate Duehr
nate at natetech.com





