[clue-tech] hairy sql (postgres) - suggestions welcome
Nate Duehr
nate at natetech.com
Thu Nov 9 23:06:46 MST 2006
On Nov 9, 2006, at 10:43 PM, Dave Price wrote:
> Right now, the query runs in .5 seconds for 1 month over 100,000
> records returning 250 records for a single device on an amd64, but
> takes almost 1.5 minutes against the live database on a 3ghz Xeon
> which is receiving constant INSERTS from a farm of asterisk boxes.
Doesn't that seem to hint that the problem isn't the query itself,
but the internal locking going on inside the DB while the INSERTs
are running?
I'm no SQL guru, but the data you just shared shows that on a quiet
DB the performance is much faster -- though you didn't say how many
records are in the live DB. Is it scaling somewhat linearly, or is
it getting far worse when you go from test to production?
Those INSERTs have me wondering what they do under the hood to the
engine itself. I suppose since this is near-real-time data the
INSERTs are immediately COMMITed (assuming postgres does such things --
I think it does). If those COMMITs are happening, do they cause a
massive slowdown?
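For what it's worth, my understanding is that postgres runs each bare statement as its own transaction by default, so one INSERT per row means one commit (and one log flush) per row. Purely as a sketch -- the table and column names here are made up, not from your schema -- batching the inserts amortizes that cost:

```sql
-- One statement = one transaction = one commit per row (default behavior):
INSERT INTO cdr (device, calldate, duration) VALUES ('sip-101', now(), 42);

-- Wrapping a batch in one transaction pays the commit cost only once:
BEGIN;
INSERT INTO cdr (device, calldate, duration) VALUES ('sip-101', now(), 42);
INSERT INTO cdr (device, calldate, duration) VALUES ('sip-102', now(), 17);
COMMIT;
```

Whether you can batch at all depends on how much latency the asterisk boxes can tolerate before their records hit the DB, of course.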
On some commercial systems (thinking Informix here, just 'cause
it's what I'm currently used to seeing), if you're looking at an old
timeframe in the data that's nice and stable -- no chance any new
records are being added that you truly need for your query; say you
never look at today's data, only yesterday's back through 60 days, or
whatever -- you can set the isolation level to allow dirty reads and
still know your query over that "old" data is fine. Then the engine
can avoid even having to follow the locking rules for your query.
The Informix equivalent would be "set isolation dirty-read" before
the query.
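If memory serves, postgres accepts equivalent syntax, though I believe its MVCC design means plain readers never block writers in the first place, so the dirty-read trick may buy you less there than it does on Informix. A sketch of the syntax, as best I recall it:

```sql
-- Informix: set isolation dirty-read
-- Postgres accepts READ UNCOMMITTED but, as I understand it, treats it
-- the same as READ COMMITTED under MVCC:
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- ... your reporting query here ...
COMMIT;
```

Someone who actually knows the postgres internals should confirm whether locking is even the bottleneck before going down this road.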
I didn't look over your SQL closely enough to tell if you need
today's "live" updating data, or if you're just post-processing.
I'm just asking out of curiosity -- I'm not (much) help with your SQL
query itself. Especially not knowing the internals of what costs
what on postgres.
Of course (not knowing how postgres works, I probably shouldn't "go
here" because I don't know what I'm talking about) -- DISCLAIMER...
Are there any fields you're using in that query that could easily and
safely be better indexed by the postgres engine itself to speed up
your query? Anything on those tables that isn't indexed but should be?
Just generic DB "why's it slow" type of questions -- nothing specific
in mind here.
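The usual way to answer those generic "why's it slow" questions on postgres, as far as I know, is its EXPLAIN command. Something like the following (again, table and column names invented for illustration):

```sql
-- Show the actual plan and timings for the slow query:
EXPLAIN ANALYZE
SELECT * FROM cdr
 WHERE device = 'sip-101'
   AND calldate >= '2006-10-01' AND calldate < '2006-11-01';

-- If the plan shows a sequential scan over the whole table, an index on
-- the columns in the WHERE clause is the obvious first thing to try:
CREATE INDEX cdr_device_calldate_idx ON cdr (device, calldate);
```

Comparing the EXPLAIN ANALYZE output on the quiet AMD64 box against the busy Xeon should also tell you whether the plan itself differs between the two, or whether the same plan is just running slower under load.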
Also, you didn't mention whether you measured raw I/O speeds on those
two machines. Are the disks and bus faster by a large margin on the
AMD64 than on the Xeon?
There's... um, LOTS of variables. Need to define and eliminate some
of the standard ones while you're hammering on the SQL itself. I've
seen some amazing performance increases on DBs when someone took the
time to figure out and quantify where the bottlenecks were. But
those examples were on commercial DB engines -- Oracle and Informix,
mostly.
> We are looking at slony replication so that we can query an offline
> copy, but that is not
> in production yet
That's one way to "throw hardware at the problem" and might work, but
it adds an operational layer of complexity and could cause other
problems. Just sayin'...
If you're thinking about replication, you're going to need to plan
for a quieter time anyway, since you'll eventually need to "resync"
the replication server -- in the real world you *will* have
replication failures eventually. If there already is such a quiet
time, can you run your query during that timeframe and see if it's
significantly faster than during the heavy/busy time with all the
INSERTs? It's simpler than setting up replication just to batch up
heavy-hitter processes for late at night, or whenever the production
DB is quiet.
Does postgres still have that silly vacuum command? Has it been run
lately on the production DB to clean things up? Do you run your
query right after that? It's been a looooong looooong time since I
messed around with postgres.
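If it hasn't been run lately, something like this on the busy table (name invented again) would be cheap to try before anything more drastic, since it reclaims dead row versions and refreshes the planner's statistics in one shot:

```sql
-- Reclaim dead row versions left behind by UPDATEs/DELETEs and update
-- the statistics the query planner uses:
VACUUM ANALYZE cdr;
```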
Just thoughts...
--
Nate Duehr
nate at natetech.com