Slow queries after many deletes

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Slow queries after many deletes

Sergio Bossa
Hi,

I have a table with several million rows and an index in the form (X,Y,W,Z).
After deleting thousands of rows under the sub-key (x1,y1,w1), queries
over ranges of the key (x1,y1,w1,Z) are way way slower than queries
over different sub-keys such as (x1,y2,w2,Z), and cause lots of IO
even for very small datasets: in other words, it's like they're
spending lots of time iterating over the rows, or maybe the index, on
the file system, even if the actual dataset is very small due to
previous deletes; that is, it's like they're still reading some of the
deleted rows / deleted indexes (but not returning them).
To make a concrete example, let's say I have 10000 rows between keys
(x1,y1,w1,1) and (x1,y1,w1,10000), then I delete 9900 rows keeping
only 100 of them from the previous range: now a query in the range is
very slow, even if only 100 values should be found and retrieved.

I'm obviously running frequent checkpoint defrags to optimize space,
wich doesn't help by the way.

Any ideas will be very welcome.
Thanks,

Sergio B.

--
Sergio Bossa
http://www.linkedin.com/in/sergiob

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|

Re: Slow queries after many deletes

Fred Toussi-2
If you are using MVCC the rows are not immediately deleted.

If any transaction which started before the current transaction is
uncommitted, the rows remain. Once all such transactions committed or
rolled back, the rows are permanently deleted.

When a checkpoint (with or without defrag) is successfully completed,
all deletes become permanent. Checkpoint defrag will definitely remove
the rows and empty spaces are compacted out.

If this access pattern is the most important one for this table, you can
make the table clustered on the columns of the index. This will have an
effect after each subsequent checkpoint defrag. See the Guide on SET
TABLE CLUSTERED

Also, if your .data file easily fits in memory, you can increase the NIO
file size to speed up access.

Fred
On Tue, Apr 24, 2012, at 18:45, Sergio Bossa wrote:

> Hi,
>
> I have a table with several million rows and an index in the form
> (X,Y,W,Z).
> After deleting thousands of rows under the sub-key (x1,y1,w1), queries
> over ranges of the key (x1,y1,w1,Z) are way way slower than queries
> over different sub-keys such as (x1,y2,w2,Z), and cause lots of IO
> even for very small datasets: in other words, it's like they're
> spending lots of time iterating over the rows, or maybe the index, on
> the file system, even if the actual dataset is very small due to
> previous deletes; that is, it's like they're still reading some of the
> deleted rows / deleted indexes (but not returning them).
> To make a concrete example, let's say I have 10000 rows between keys
> (x1,y1,w1,1) and (x1,y1,w1,10000), then I delete 9900 rows keeping
> only 100 of them from the previous range: now a query in the range is
> very slow, even if only 100 values should be found and retrieved.
>
> I'm obviously running frequent checkpoint defrags to optimize space,
> wich doesn't help by the way.
>
> Any ideas will be very welcome.
> Thanks,
>
> Sergio B.
>
> --
> Sergio Bossa
> http://www.linkedin.com/in/sergiob
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Hsqldb-user mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/hsqldb-user

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|

Re: Slow queries after many deletes

Sergio Bossa
Hi Fred,

thanks for your answers.

I'm using MVCC, but I see the data file size decreasing after checkpoints/defrags, which should mean deletes definitely take place.
I'm also already using clustered tables on the index, which answers your second point.

So there must be something weird going on, even because all queries for that sub-index, even those out of deleted ranges, slow down a lot...

Sergio Bossa
Sent by iPhone

Il giorno 24/apr/2012, alle ore 20:02, Fred Toussi <[hidden email]> ha scritto:

> If you are using MVCC the rows are not immediately deleted.
>
> If any transaction which started before the current transaction is
> uncommitted, the rows remain. Once all such transactions committed or
> rolled back, the rows are permanently deleted.
>
> When a checkpoint (with or without defrag) is successfully completed,
> all deletes become permanent. Checkpoint defrag will definitely remove
> the rows and empty spaces are compacted out.
>
> If this access pattern is the most important one for this table, you can
> make the table clustered on the columns of the index. This will have an
> effect after each subsequent checkpoint defrag. See the Guide on SET
> TABLE CLUSTERED
>
> Also, if your .data file easily fits in memory, you can increase the NIO
> file size to speed up access.
>
> Fred
> On Tue, Apr 24, 2012, at 18:45, Sergio Bossa wrote:
>> Hi,
>>
>> I have a table with several million rows and an index in the form
>> (X,Y,W,Z).
>> After deleting thousands of rows under the sub-key (x1,y1,w1), queries
>> over ranges of the key (x1,y1,w1,Z) are way way slower than queries
>> over different sub-keys such as (x1,y2,w2,Z), and cause lots of IO
>> even for very small datasets: in other words, it's like they're
>> spending lots of time iterating over the rows, or maybe the index, on
>> the file system, even if the actual dataset is very small due to
>> previous deletes; that is, it's like they're still reading some of the
>> deleted rows / deleted indexes (but not returning them).
>> To make a concrete example, let's say I have 10000 rows between keys
>> (x1,y1,w1,1) and (x1,y1,w1,10000), then I delete 9900 rows keeping
>> only 100 of them from the previous range: now a query in the range is
>> very slow, even if only 100 values should be found and retrieved.
>>
>> I'm obviously running frequent checkpoint defrags to optimize space,
>> wich doesn't help by the way.
>>
>> Any ideas will be very welcome.
>> Thanks,
>>
>> Sergio B.
>>
>> --
>> Sergio Bossa
>> http://www.linkedin.com/in/sergiob
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Hsqldb-user mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Hsqldb-user mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/hsqldb-user

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|

Re: Slow queries after many deletes

Fred Toussi-2
You can check what's happening in a debugger. First check the query is
definitely using th index, then observe the IndexAVL's IndexRowIterator
inner class and see if redundant rows are read between or after the
returned rows, or if there are big gaps in row id's (which indicate the
file position).

And if you create a test database that shows this behaviour, I will
have a look.

Fred

On Tue, Apr 24, 2012, at 20:17, Sergio Bossa wrote:

> Hi Fred,
>
> thanks for your answers.
>
> I'm using MVCC, but I see the data file size decreasing after
> checkpoints/defrags, which should mean deletes definitely take place.
> I'm also already using clustered tables on the index, which answers your
> second point.
>
> So there must be something weird going on, even because all queries for
> that sub-index, even those out of deleted ranges, slow down a lot...
>
> Sergio Bossa
> Sent by iPhone
>
> Il giorno 24/apr/2012, alle ore 20:02, Fred Toussi
> <[hidden email]> ha scritto:
>
> > If you are using MVCC the rows are not immediately deleted.
> >
> > If any transaction which started before the current transaction is
> > uncommitted, the rows remain. Once all such transactions committed or
> > rolled back, the rows are permanently deleted.
> >
> > When a checkpoint (with or without defrag) is successfully completed,
> > all deletes become permanent. Checkpoint defrag will definitely remove
> > the rows and empty spaces are compacted out.
> >
> > If this access pattern is the most important one for this table, you can
> > make the table clustered on the columns of the index. This will have an
> > effect after each subsequent checkpoint defrag. See the Guide on SET
> > TABLE CLUSTERED
> >
> > Also, if your .data file easily fits in memory, you can increase the NIO
> > file size to speed up access.
> >
> > Fred
> > On Tue, Apr 24, 2012, at 18:45, Sergio Bossa wrote:
> >> Hi,
> >>
> >> I have a table with several million rows and an index in the form
> >> (X,Y,W,Z).
> >> After deleting thousands of rows under the sub-key (x1,y1,w1), queries
> >> over ranges of the key (x1,y1,w1,Z) are way way slower than queries
> >> over different sub-keys such as (x1,y2,w2,Z), and cause lots of IO
> >> even for very small datasets: in other words, it's like they're
> >> spending lots of time iterating over the rows, or maybe the index, on
> >> the file system, even if the actual dataset is very small due to
> >> previous deletes; that is, it's like they're still reading some of the
> >> deleted rows / deleted indexes (but not returning them).
> >> To make a concrete example, let's say I have 10000 rows between keys
> >> (x1,y1,w1,1) and (x1,y1,w1,10000), then I delete 9900 rows keeping
> >> only 100 of them from the previous range: now a query in the range is
> >> very slow, even if only 100 values should be found and retrieved.
> >>
> >> I'm obviously running frequent checkpoint defrags to optimize space,
> >> wich doesn't help by the way.
> >>
> >> Any ideas will be very welcome.
> >> Thanks,
> >>
> >> Sergio B.
> >>
> >> --
> >> Sergio Bossa
> >> http://www.linkedin.com/in/sergiob
> >>
> >> ------------------------------------------------------------------------------
> >> Live Security Virtual Conference
> >> Exclusive live event will cover all the ways today's security and
> >> threat landscape has changed and how IT managers can respond. Discussions
> >> will include endpoint security, mobile security and the latest in malware
> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >> _______________________________________________
> >> Hsqldb-user mailing list
> >> [hidden email]
> >> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
> >
> > ------------------------------------------------------------------------------
> > Live Security Virtual Conference
> > Exclusive live event will cover all the ways today's security and
> > threat landscape has changed and how IT managers can respond. Discussions
> > will include endpoint security, mobile security and the latest in malware
> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> > _______________________________________________
> > Hsqldb-user mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Hsqldb-user mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/hsqldb-user

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|

Re: Slow queries after many deletes

Sergio Bossa
That was exactly my plan, just wanted to know if some issue already existed.

I'll keep you posted,
Thanks,
Cheers,

Sergio B.

On Tue, Apr 24, 2012 at 8:42 PM, Fred Toussi
<[hidden email]> wrote:

> You can check what's happening in a debugger. First check the query is
> definitely using th index, then observe the IndexAVL's IndexRowIterator
> inner class and see if redundant rows are read between or after the
> returned rows, or if there are big gaps in row id's (which indicate the
> file position).
>
> And if you create a test database that shows this behaviour, I will
> have a look.
>
> Fred
>
> On Tue, Apr 24, 2012, at 20:17, Sergio Bossa wrote:
>> Hi Fred,
>>
>> thanks for your answers.
>>
>> I'm using MVCC, but I see the data file size decreasing after
>> checkpoints/defrags, which should mean deletes definitely take place.
>> I'm also already using clustered tables on the index, which answers your
>> second point.
>>
>> So there must be something weird going on, even because all queries for
>> that sub-index, even those out of deleted ranges, slow down a lot...
>>
>> Sergio Bossa
>> Sent by iPhone
>>
>> Il giorno 24/apr/2012, alle ore 20:02, Fred Toussi
>> <[hidden email]> ha scritto:
>>
>> > If you are using MVCC the rows are not immediately deleted.
>> >
>> > If any transaction which started before the current transaction is
>> > uncommitted, the rows remain. Once all such transactions committed or
>> > rolled back, the rows are permanently deleted.
>> >
>> > When a checkpoint (with or without defrag) is successfully completed,
>> > all deletes become permanent. Checkpoint defrag will definitely remove
>> > the rows and empty spaces are compacted out.
>> >
>> > If this access pattern is the most important one for this table, you can
>> > make the table clustered on the columns of the index. This will have an
>> > effect after each subsequent checkpoint defrag. See the Guide on SET
>> > TABLE CLUSTERED
>> >
>> > Also, if your .data file easily fits in memory, you can increase the NIO
>> > file size to speed up access.
>> >
>> > Fred
>> > On Tue, Apr 24, 2012, at 18:45, Sergio Bossa wrote:
>> >> Hi,
>> >>
>> >> I have a table with several million rows and an index in the form
>> >> (X,Y,W,Z).
>> >> After deleting thousands of rows under the sub-key (x1,y1,w1), queries
>> >> over ranges of the key (x1,y1,w1,Z) are way way slower than queries
>> >> over different sub-keys such as (x1,y2,w2,Z), and cause lots of IO
>> >> even for very small datasets: in other words, it's like they're
>> >> spending lots of time iterating over the rows, or maybe the index, on
>> >> the file system, even if the actual dataset is very small due to
>> >> previous deletes; that is, it's like they're still reading some of the
>> >> deleted rows / deleted indexes (but not returning them).
>> >> To make a concrete example, let's say I have 10000 rows between keys
>> >> (x1,y1,w1,1) and (x1,y1,w1,10000), then I delete 9900 rows keeping
>> >> only 100 of them from the previous range: now a query in the range is
>> >> very slow, even if only 100 values should be found and retrieved.
>> >>
>> >> I'm obviously running frequent checkpoint defrags to optimize space,
>> >> wich doesn't help by the way.
>> >>
>> >> Any ideas will be very welcome.
>> >> Thanks,
>> >>
>> >> Sergio B.
>> >>
>> >> --
>> >> Sergio Bossa
>> >> http://www.linkedin.com/in/sergiob
>> >>
>> >> ------------------------------------------------------------------------------
>> >> Live Security Virtual Conference
>> >> Exclusive live event will cover all the ways today's security and
>> >> threat landscape has changed and how IT managers can respond. Discussions
>> >> will include endpoint security, mobile security and the latest in malware
>> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> >> _______________________________________________
>> >> Hsqldb-user mailing list
>> >> [hidden email]
>> >> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>> >
>> > ------------------------------------------------------------------------------
>> > Live Security Virtual Conference
>> > Exclusive live event will cover all the ways today's security and
>> > threat landscape has changed and how IT managers can respond. Discussions
>> > will include endpoint security, mobile security and the latest in malware
>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> > _______________________________________________
>> > Hsqldb-user mailing list
>> > [hidden email]
>> > https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Hsqldb-user mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Hsqldb-user mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/hsqldb-user



--
Sergio Bossa
http://www.linkedin.com/in/sergiob

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|

Re: Slow queries after many deletes

Sergio Bossa
Hi Fred,

I've made some tests without removing rows, and results were the same,
so I verified the execution plans and noticed that HSQLDB is using a
less optimal index.
As already said, I have an index in the form (X,Y,W,Z), but also a
primary key in the form (X,Y,W,ZZ): while the query is on (X,Y,W,Z),
the execution plan shows it's using the primary key (sub)index,
leading to a fast index access to (X,Y,W) and a slow scan access to Z.
So is there a way to force using a specific index?

Thanks,

Sergio B.

On Tue, Apr 24, 2012 at 9:54 PM, Sergio Bossa <[hidden email]> wrote:

> That was exactly my plan, just wanted to know if some issue already existed.
>
> I'll keep you posted,
> Thanks,
> Cheers,
>
> Sergio B.
>
> On Tue, Apr 24, 2012 at 8:42 PM, Fred Toussi
> <[hidden email]> wrote:
>> You can check what's happening in a debugger. First check the query is
>> definitely using th index, then observe the IndexAVL's IndexRowIterator
>> inner class and see if redundant rows are read between or after the
>> returned rows, or if there are big gaps in row id's (which indicate the
>> file position).
>>
>> And if you create a test database that shows this behaviour, I will
>> have a look.
>>
>> Fred
>>
>> On Tue, Apr 24, 2012, at 20:17, Sergio Bossa wrote:
>>> Hi Fred,
>>>
>>> thanks for your answers.
>>>
>>> I'm using MVCC, but I see the data file size decreasing after
>>> checkpoints/defrags, which should mean deletes definitely take place.
>>> I'm also already using clustered tables on the index, which answers your
>>> second point.
>>>
>>> So there must be something weird going on, even because all queries for
>>> that sub-index, even those out of deleted ranges, slow down a lot...
>>>
>>> Sergio Bossa
>>> Sent by iPhone
>>>
>>> Il giorno 24/apr/2012, alle ore 20:02, Fred Toussi
>>> <[hidden email]> ha scritto:
>>>
>>> > If you are using MVCC the rows are not immediately deleted.
>>> >
>>> > If any transaction which started before the current transaction is
>>> > uncommitted, the rows remain. Once all such transactions committed or
>>> > rolled back, the rows are permanently deleted.
>>> >
>>> > When a checkpoint (with or without defrag) is successfully completed,
>>> > all deletes become permanent. Checkpoint defrag will definitely remove
>>> > the rows and empty spaces are compacted out.
>>> >
>>> > If this access pattern is the most important one for this table, you can
>>> > make the table clustered on the columns of the index. This will have an
>>> > effect after each subsequent checkpoint defrag. See the Guide on SET
>>> > TABLE CLUSTERED
>>> >
>>> > Also, if your .data file easily fits in memory, you can increase the NIO
>>> > file size to speed up access.
>>> >
>>> > Fred
>>> > On Tue, Apr 24, 2012, at 18:45, Sergio Bossa wrote:
>>> >> Hi,
>>> >>
>>> >> I have a table with several million rows and an index in the form
>>> >> (X,Y,W,Z).
>>> >> After deleting thousands of rows under the sub-key (x1,y1,w1), queries
>>> >> over ranges of the key (x1,y1,w1,Z) are way way slower than queries
>>> >> over different sub-keys such as (x1,y2,w2,Z), and cause lots of IO
>>> >> even for very small datasets: in other words, it's like they're
>>> >> spending lots of time iterating over the rows, or maybe the index, on
>>> >> the file system, even if the actual dataset is very small due to
>>> >> previous deletes; that is, it's like they're still reading some of the
>>> >> deleted rows / deleted indexes (but not returning them).
>>> >> To make a concrete example, let's say I have 10000 rows between keys
>>> >> (x1,y1,w1,1) and (x1,y1,w1,10000), then I delete 9900 rows keeping
>>> >> only 100 of them from the previous range: now a query in the range is
>>> >> very slow, even if only 100 values should be found and retrieved.
>>> >>
>>> >> I'm obviously running frequent checkpoint defrags to optimize space,
>>> >> wich doesn't help by the way.
>>> >>
>>> >> Any ideas will be very welcome.
>>> >> Thanks,
>>> >>
>>> >> Sergio B.
>>> >>
>>> >> --
>>> >> Sergio Bossa
>>> >> http://www.linkedin.com/in/sergiob
>>> >>
>>> >> ------------------------------------------------------------------------------
>>> >> Live Security Virtual Conference
>>> >> Exclusive live event will cover all the ways today's security and
>>> >> threat landscape has changed and how IT managers can respond. Discussions
>>> >> will include endpoint security, mobile security and the latest in malware
>>> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> >> _______________________________________________
>>> >> Hsqldb-user mailing list
>>> >> [hidden email]
>>> >> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>> >
>>> > ------------------------------------------------------------------------------
>>> > Live Security Virtual Conference
>>> > Exclusive live event will cover all the ways today's security and
>>> > threat landscape has changed and how IT managers can respond. Discussions
>>> > will include endpoint security, mobile security and the latest in malware
>>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> > _______________________________________________
>>> > Hsqldb-user mailing list
>>> > [hidden email]
>>> > https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>>
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> _______________________________________________
>>> Hsqldb-user mailing list
>>> [hidden email]
>>> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Hsqldb-user mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>
>
>
> --
> Sergio Bossa
> http://www.linkedin.com/in/sergiob



--
Sergio Bossa
http://www.linkedin.com/in/sergiob

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|

Re: Slow queries after many deletes

Sergio Bossa
Also, the wrong choice happens when I select a range on Z. If I use a
plain equality predicate, the correct index is chosen.

On Wed, Apr 25, 2012 at 7:39 AM, Sergio Bossa <[hidden email]> wrote:

> Hi Fred,
>
> I've made some tests without removing rows, and results were the same,
> so I verified the execution plans and noticed that HSQLDB is using a
> less optimal index.
> As already said, I have an index in the form (X,Y,W,Z), but also a
> primary key in the form (X,Y,W,ZZ): while the query is on (X,Y,W,Z),
> the execution plan shows it's using the primary key (sub)index,
> leading to a fast index access to (X,Y,W) and a slow scan access to Z.
> So is there a way to force using a specific index?
>
> Thanks,
>
> Sergio B.
>
> On Tue, Apr 24, 2012 at 9:54 PM, Sergio Bossa <[hidden email]> wrote:
>> That was exactly my plan, just wanted to know if some issue already existed.
>>
>> I'll keep you posted,
>> Thanks,
>> Cheers,
>>
>> Sergio B.
>>
>> On Tue, Apr 24, 2012 at 8:42 PM, Fred Toussi
>> <[hidden email]> wrote:
>>> You can check what's happening in a debugger. First check the query is
>>> definitely using th index, then observe the IndexAVL's IndexRowIterator
>>> inner class and see if redundant rows are read between or after the
>>> returned rows, or if there are big gaps in row id's (which indicate the
>>> file position).
>>>
>>> And if you create a test database that shows this behaviour, I will
>>> have a look.
>>>
>>> Fred
>>>
>>> On Tue, Apr 24, 2012, at 20:17, Sergio Bossa wrote:
>>>> Hi Fred,
>>>>
>>>> thanks for your answers.
>>>>
>>>> I'm using MVCC, but I see the data file size decreasing after
>>>> checkpoints/defrags, which should mean deletes definitely take place.
>>>> I'm also already using clustered tables on the index, which answers your
>>>> second point.
>>>>
>>>> So there must be something weird going on, even because all queries for
>>>> that sub-index, even those out of deleted ranges, slow down a lot...
>>>>
>>>> Sergio Bossa
>>>> Sent by iPhone
>>>>
>>>> Il giorno 24/apr/2012, alle ore 20:02, Fred Toussi
>>>> <[hidden email]> ha scritto:
>>>>
>>>> > If you are using MVCC the rows are not immediately deleted.
>>>> >
>>>> > If any transaction which started before the current transaction is
>>>> > uncommitted, the rows remain. Once all such transactions committed or
>>>> > rolled back, the rows are permanently deleted.
>>>> >
>>>> > When a checkpoint (with or without defrag) is successfully completed,
>>>> > all deletes become permanent. Checkpoint defrag will definitely remove
>>>> > the rows and empty spaces are compacted out.
>>>> >
>>>> > If this access pattern is the most important one for this table, you can
>>>> > make the table clustered on the columns of the index. This will have an
>>>> > effect after each subsequent checkpoint defrag. See the Guide on SET
>>>> > TABLE CLUSTERED
>>>> >
>>>> > Also, if your .data file easily fits in memory, you can increase the NIO
>>>> > file size to speed up access.
>>>> >
>>>> > Fred
>>>> > On Tue, Apr 24, 2012, at 18:45, Sergio Bossa wrote:
>>>> >> Hi,
>>>> >>
>>>> >> I have a table with several million rows and an index in the form
>>>> >> (X,Y,W,Z).
>>>> >> After deleting thousands of rows under the sub-key (x1,y1,w1), queries
>>>> >> over ranges of the key (x1,y1,w1,Z) are way way slower than queries
>>>> >> over different sub-keys such as (x1,y2,w2,Z), and cause lots of IO
>>>> >> even for very small datasets: in other words, it's like they're
>>>> >> spending lots of time iterating over the rows, or maybe the index, on
>>>> >> the file system, even if the actual dataset is very small due to
>>>> >> previous deletes; that is, it's like they're still reading some of the
>>>> >> deleted rows / deleted indexes (but not returning them).
>>>> >> To make a concrete example, let's say I have 10000 rows between keys
>>>> >> (x1,y1,w1,1) and (x1,y1,w1,10000), then I delete 9900 rows keeping
>>>> >> only 100 of them from the previous range: now a query in the range is
>>>> >> very slow, even if only 100 values should be found and retrieved.
>>>> >>
>>>> >> I'm obviously running frequent checkpoint defrags to optimize space,
>>>> >> wich doesn't help by the way.
>>>> >>
>>>> >> Any ideas will be very welcome.
>>>> >> Thanks,
>>>> >>
>>>> >> Sergio B.
>>>> >>
>>>> >> --
>>>> >> Sergio Bossa
>>>> >> http://www.linkedin.com/in/sergiob
>>>> >>
>>>> >> ------------------------------------------------------------------------------
>>>> >> Live Security Virtual Conference
>>>> >> Exclusive live event will cover all the ways today's security and
>>>> >> threat landscape has changed and how IT managers can respond. Discussions
>>>> >> will include endpoint security, mobile security and the latest in malware
>>>> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>> >> _______________________________________________
>>>> >> Hsqldb-user mailing list
>>>> >> [hidden email]
>>>> >> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>>> >
>>>> > ------------------------------------------------------------------------------
>>>> > Live Security Virtual Conference
>>>> > Exclusive live event will cover all the ways today's security and
>>>> > threat landscape has changed and how IT managers can respond. Discussions
>>>> > will include endpoint security, mobile security and the latest in malware
>>>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>> > _______________________________________________
>>>> > Hsqldb-user mailing list
>>>> > [hidden email]
>>>> > https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Live Security Virtual Conference
>>>> Exclusive live event will cover all the ways today's security and
>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>> will include endpoint security, mobile security and the latest in malware
>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>> _______________________________________________
>>>> Hsqldb-user mailing list
>>>> [hidden email]
>>>> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>>
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> _______________________________________________
>>> Hsqldb-user mailing list
>>> [hidden email]
>>> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>
>>
>>
>> --
>> Sergio Bossa
>> http://www.linkedin.com/in/sergiob
>
>
>
> --
> Sergio Bossa
> http://www.linkedin.com/in/sergiob



--
Sergio Bossa
http://www.linkedin.com/in/sergiob

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|

Re: Slow queries after many deletes

Sergio Bossa
While doing some further investigations, I tried changing some
indexes, and it looks like indexes with common sub-keys are chosen by
order of declaration in case of range queries, rather than full match
on all keys: so, when having an index on (X,Y,W,ZZ) declared before
another index on (X,Y,W,Z), a range query on (X,Y,W,Z) will wrongly
use the former; if declaration order is inverted, the correct index is
used.

On Wed, Apr 25, 2012 at 7:42 AM, Sergio Bossa <[hidden email]> wrote:

> Also, the wrong choice happens when I select a range on Z. If I use a
> plain equality predicate, the correct index is chosen.
>
> On Wed, Apr 25, 2012 at 7:39 AM, Sergio Bossa <[hidden email]> wrote:
>> Hi Fred,
>>
>> I've made some tests without removing rows, and results were the same,
>> so I verified the execution plans and noticed that HSQLDB is using a
>> less optimal index.
>> As already said, I have an index in the form (X,Y,W,Z), but also a
>> primary key in the form (X,Y,W,ZZ): while the query is on (X,Y,W,Z),
>> the execution plan shows it's using the primary key (sub)index,
>> leading to a fast index access to (X,Y,W) and a slow scan access to Z.
>> So is there a way to force using a specific index?
>>
>> Thanks,
>>
>> Sergio B.
>>
>> On Tue, Apr 24, 2012 at 9:54 PM, Sergio Bossa <[hidden email]> wrote:
>>> That was exactly my plan, just wanted to know if some issue already existed.
>>>
>>> I'll keep you posted,
>>> Thanks,
>>> Cheers,
>>>
>>> Sergio B.
>>>
>>> On Tue, Apr 24, 2012 at 8:42 PM, Fred Toussi
>>> <[hidden email]> wrote:
>>>> You can check what's happening in a debugger. First check the query is
>>>> definitely using th index, then observe the IndexAVL's IndexRowIterator
>>>> inner class and see if redundant rows are read between or after the
>>>> returned rows, or if there are big gaps in row id's (which indicate the
>>>> file position).
>>>>
>>>> And if you create a test database that shows this behaviour, I will
>>>> have a look.
>>>>
>>>> Fred
>>>>
>>>> On Tue, Apr 24, 2012, at 20:17, Sergio Bossa wrote:
>>>>> Hi Fred,
>>>>>
>>>>> thanks for your answers.
>>>>>
>>>>> I'm using MVCC, but I see the data file size decreasing after
>>>>> checkpoints/defrags, which should mean deletes definitely take place.
>>>>> I'm also already using clustered tables on the index, which answers your
>>>>> second point.
>>>>>
>>>>> So there must be something weird going on, even because all queries for
>>>>> that sub-index, even those out of deleted ranges, slow down a lot...
>>>>>
>>>>> Sergio Bossa
>>>>> Sent by iPhone
>>>>>
>>>>> Il giorno 24/apr/2012, alle ore 20:02, Fred Toussi
>>>>> <[hidden email]> ha scritto:
>>>>>
>>>>> > If you are using MVCC the rows are not immediately deleted.
>>>>> >
>>>>> > If any transaction which started before the current transaction is
>>>>> > uncommitted, the rows remain. Once all such transactions committed or
>>>>> > rolled back, the rows are permanently deleted.
>>>>> >
>>>>> > When a checkpoint (with or without defrag) is successfully completed,
>>>>> > all deletes become permanent. Checkpoint defrag will definitely remove
>>>>> > the rows and empty spaces are compacted out.
>>>>> >
>>>>> > If this access pattern is the most important one for this table, you can
>>>>> > make the table clustered on the columns of the index. This will have an
>>>>> > effect after each subsequent checkpoint defrag. See the Guide on SET
>>>>> > TABLE CLUSTERED
>>>>> >
>>>>> > Also, if your .data file easily fits in memory, you can increase the NIO
>>>>> > file size to speed up access.
>>>>> >
>>>>> > Fred
>>>>> > On Tue, Apr 24, 2012, at 18:45, Sergio Bossa wrote:
>>>>> >> Hi,
>>>>> >>
>>>>> >> I have a table with several million rows and an index in the form
>>>>> >> (X,Y,W,Z).
>>>>> >> After deleting thousands of rows under the sub-key (x1,y1,w1), queries
>>>>> >> over ranges of the key (x1,y1,w1,Z) are way way slower than queries
>>>>> >> over different sub-keys such as (x1,y2,w2,Z), and cause lots of IO
>>>>> >> even for very small datasets: in other words, it's like they're
>>>>> >> spending lots of time iterating over the rows, or maybe the index, on
>>>>> >> the file system, even if the actual dataset is very small due to
>>>>> >> previous deletes; that is, it's like they're still reading some of the
>>>>> >> deleted rows / deleted indexes (but not returning them).
>>>>> >> To make a concrete example, let's say I have 10000 rows between keys
>>>>> >> (x1,y1,w1,1) and (x1,y1,w1,10000), then I delete 9900 rows keeping
>>>>> >> only 100 of them from the previous range: now a query in the range is
>>>>> >> very slow, even if only 100 values should be found and retrieved.
>>>>> >>
>>>>> >> I'm obviously running frequent checkpoint defrags to optimize space,
>>>>> >> wich doesn't help by the way.
>>>>> >>
>>>>> >> Any ideas will be very welcome.
>>>>> >> Thanks,
>>>>> >>
>>>>> >> Sergio B.
>>>>> >>
>>>>> >> --
>>>>> >> Sergio Bossa
>>>>> >> http://www.linkedin.com/in/sergiob
>>>>> >>
>>>>> >> ------------------------------------------------------------------------------
>>>>> >> Live Security Virtual Conference
>>>>> >> Exclusive live event will cover all the ways today's security and
>>>>> >> threat landscape has changed and how IT managers can respond. Discussions
>>>>> >> will include endpoint security, mobile security and the latest in malware
>>>>> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>> >> _______________________________________________
>>>>> >> Hsqldb-user mailing list
>>>>> >> [hidden email]
>>>>> >> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>>>> >
>>>>> > ------------------------------------------------------------------------------
>>>>> > Live Security Virtual Conference
>>>>> > Exclusive live event will cover all the ways today's security and
>>>>> > threat landscape has changed and how IT managers can respond. Discussions
>>>>> > will include endpoint security, mobile security and the latest in malware
>>>>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>> > _______________________________________________
>>>>> > Hsqldb-user mailing list
>>>>> > [hidden email]
>>>>> > https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Live Security Virtual Conference
>>>>> Exclusive live event will cover all the ways today's security and
>>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>>> will include endpoint security, mobile security and the latest in malware
>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>> _______________________________________________
>>>>> Hsqldb-user mailing list
>>>>> [hidden email]
>>>>> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Live Security Virtual Conference
>>>> Exclusive live event will cover all the ways today's security and
>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>> will include endpoint security, mobile security and the latest in malware
>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>> _______________________________________________
>>>> Hsqldb-user mailing list
>>>> [hidden email]
>>>> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>>>
>>>
>>>
>>> --
>>> Sergio Bossa
>>> http://www.linkedin.com/in/sergiob
>>
>>
>>
>> --
>> Sergio Bossa
>> http://www.linkedin.com/in/sergiob
>
>
>
> --
> Sergio Bossa
> http://www.linkedin.com/in/sergiob



--
Sergio Bossa
http://www.linkedin.com/in/sergiob

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|

Re: Slow queries after many deletes

Fred Toussi-2
OK.

For conditions such as WHERE X = m AND Y = n AND W = o AND Z > p, the
three equality conditions select the index and the last condition is
later
added to the selected index if column Z happens to appear in the index
after the other columns. If there are multiple indexes having any
combination of X,Y,Z as their initial columns, the first one with the
largest total column count is selected.

There is indeed a way to force the correct index by adding ORDER
BY and LIMIT

ORDER BY X, Y, W, Z LIMIT 1000000000 USING INDEX

Fred


On Wed, Apr 25, 2012, at 10:13, Sergio Bossa wrote:

> While doing some further investigations, I tried changing some
> indexes, and it looks like indexes with common sub-keys are chosen by
> order of declaration in case of range queries, rather than full match
> on all keys: so, when having an index on (X,Y,W,ZZ) declared before
> another index on (X,Y,W,Z), a range query on (X,Y,W,Z) will wrongly
> use the former; if declaration order is inverted, the correct index is
> used.
>
> On Wed, Apr 25, 2012 at 7:42 AM, Sergio Bossa <[hidden email]>
> wrote:
> > Also, the wrong choice happens when I select a range on Z. If I use a
> > plain equality predicate, the correct index is chosen.
> >
> > On Wed, Apr 25, 2012 at 7:39 AM, Sergio Bossa <[hidden email]> wrote:
> >> Hi Fred,
> >>
> >> I've made some tests without removing rows, and results were the same,
> >> so I verified the execution plans and noticed that HSQLDB is using a
> >> less optimal index.
> >> As already said, I have an index in the form (X,Y,W,Z), but also a
> >> primary key in the form (X,Y,W,ZZ): while the query is on (X,Y,W,Z),
> >> the execution plan shows it's using the primary key (sub)index,
> >> leading to a fast index access to (X,Y,W) and a slow scan access to Z.
> >> So is there a way to force using a specific index?
> >>
> >> Thanks,
> >>
> >> Sergio B.
> >>
> >> On Tue, Apr 24, 2012 at 9:54 PM, Sergio Bossa <[hidden email]> wrote:
> >>> That was exactly my plan, just wanted to know if some issue already existed.
> >>>
> >>> I'll keep you posted,
> >>> Thanks,
> >>> Cheers,
> >>>
> >>> Sergio B.
> >>>
> >>> On Tue, Apr 24, 2012 at 8:42 PM, Fred Toussi
> >>> <[hidden email]> wrote:
> >>>> You can check what's happening in a debugger. First check the query is
> >>>> definitely using th index, then observe the IndexAVL's IndexRowIterator
> >>>> inner class and see if redundant rows are read between or after the
> >>>> returned rows, or if there are big gaps in row id's (which indicate the
> >>>> file position).
> >>>>
> >>>> And if you create a test database that shows this behaviour, I will
> >>>> have a look.
> >>>>
> >>>> Fred
> >>>>
> >>>> On Tue, Apr 24, 2012, at 20:17, Sergio Bossa wrote:
> >>>>> Hi Fred,
> >>>>>
> >>>>> thanks for your answers.
> >>>>>
> >>>>> I'm using MVCC, but I see the data file size decreasing after
> >>>>> checkpoints/defrags, which should mean deletes definitely take place.
> >>>>> I'm also already using clustered tables on the index, which answers your
> >>>>> second point.
> >>>>>
> >>>>> So there must be something weird going on, even because all queries for
> >>>>> that sub-index, even those out of deleted ranges, slow down a lot...
> >>>>>
> >>>>> Sergio Bossa
> >>>>> Sent by iPhone
> >>>>>
> >>>>> Il giorno 24/apr/2012, alle ore 20:02, Fred Toussi
> >>>>> <[hidden email]> ha scritto:
> >>>>>
> >>>>> > If you are using MVCC the rows are not immediately deleted.
> >>>>> >
> >>>>> > If any transaction which started before the current transaction is
> >>>>> > uncommitted, the rows remain. Once all such transactions committed or
> >>>>> > rolled back, the rows are permanently deleted.
> >>>>> >
> >>>>> > When a checkpoint (with or without defrag) is successfully completed,
> >>>>> > all deletes become permanent. Checkpoint defrag will definitely remove
> >>>>> > the rows and empty spaces are compacted out.
> >>>>> >
> >>>>> > If this access pattern is the most important one for this table, you can
> >>>>> > make the table clustered on the columns of the index. This will have an
> >>>>> > effect after each subsequent checkpoint defrag. See the Guide on SET
> >>>>> > TABLE CLUSTERED
> >>>>> >
> >>>>> > Also, if your .data file easily fits in memory, you can increase the NIO
> >>>>> > file size to speed up access.
> >>>>> >
> >>>>> > Fred
> >>>>> > On Tue, Apr 24, 2012, at 18:45, Sergio Bossa wrote:
> >>>>> >> Hi,
> >>>>> >>
> >>>>> >> I have a table with several million rows and an index in the form
> >>>>> >> (X,Y,W,Z).
> >>>>> >> After deleting thousands of rows under the sub-key (x1,y1,w1), queries
> >>>>> >> over ranges of the key (x1,y1,w1,Z) are way way slower than queries
> >>>>> >> over different sub-keys such as (x1,y2,w2,Z), and cause lots of IO
> >>>>> >> even for very small datasets: in other words, it's like they're
> >>>>> >> spending lots of time iterating over the rows, or maybe the index, on
> >>>>> >> the file system, even if the actual dataset is very small due to
> >>>>> >> previous deletes; that is, it's like they're still reading some of the
> >>>>> >> deleted rows / deleted indexes (but not returning them).
> >>>>> >> To make a concrete example, let's say I have 10000 rows between keys
> >>>>> >> (x1,y1,w1,1) and (x1,y1,w1,10000), then I delete 9900 rows keeping
> >>>>> >> only 100 of them from the previous range: now a query in the range is
> >>>>> >> very slow, even if only 100 values should be found and retrieved.
> >>>>> >>
> >>>>> >> I'm obviously running frequent checkpoint defrags to optimize space,
> >>>>> >> wich doesn't help by the way.
> >>>>> >>
> >>>>> >> Any ideas will be very welcome.
> >>>>> >> Thanks,
> >>>>> >>
> >>>>> >> Sergio B.
> >>>>> >>
> >>>>> >> --
> >>>>> >> Sergio Bossa
> >>>>> >> http://www.linkedin.com/in/sergiob

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|

Re: Slow queries after many deletes

Sergio Bossa
I managed to workaround the problem by setting up a primary key including all fields in the desired order.

Btw, is it really that correct to do index selection based on declaration order in your opinion? Shouldn't it first check if any indexes with exactly those columns exists? Or is there any reason for preferring declaration order?

Thanks,
Cheers,

Sergio Bossa
Sent by iPhone

Il giorno 25/apr/2012, alle ore 11:02, Fred Toussi <[hidden email]> ha scritto:

> OK.
>
> For conditions such as WHERE X = m AND Y = n AND W = o AND Z > p, the
> three equality conditions select the index and the last condition is
> later
> added to the selected index if column Z happens to appear in the index
> after the other columns. If there are multiple indexes having any
> combination of X,Y,Z as their initial columns, the first one with the
> largest total column count is selected.
>
> There is indeed a way to force the correct index by adding ORDER
> BY and LIMIT
>
> ORDER BY X, Y, W, Z LIMIT 1000000000 USING INDEX
>
> Fred
>
>
> On Wed, Apr 25, 2012, at 10:13, Sergio Bossa wrote:
>> While doing some further investigations, I tried changing some
>> indexes, and it looks like indexes with common sub-keys are chosen by
>> order of declaration in case of range queries, rather than full match
>> on all keys: so, when having an index on (X,Y,W,ZZ) declared before
>> another index on (X,Y,W,Z), a range query on (X,Y,W,Z) will wrongly
>> use the former; if declaration order is inverted, the correct index is
>> used.
>>
>> On Wed, Apr 25, 2012 at 7:42 AM, Sergio Bossa <[hidden email]>
>> wrote:
>>> Also, the wrong choice happens when I select a range on Z. If I use a
>>> plain equality predicate, the correct index is chosen.
>>>
>>> On Wed, Apr 25, 2012 at 7:39 AM, Sergio Bossa <[hidden email]> wrote:
>>>> Hi Fred,
>>>>
>>>> I've made some tests without removing rows, and results were the same,
>>>> so I verified the execution plans and noticed that HSQLDB is using a
>>>> less optimal index.
>>>> As already said, I have an index in the form (X,Y,W,Z), but also a
>>>> primary key in the form (X,Y,W,ZZ): while the query is on (X,Y,W,Z),
>>>> the execution plan shows it's using the primary key (sub)index,
>>>> leading to a fast index access to (X,Y,W) and a slow scan access to Z.
>>>> So is there a way to force using a specific index?
>>>>
>>>> Thanks,
>>>>
>>>> Sergio B.
>>>>
>>>> On Tue, Apr 24, 2012 at 9:54 PM, Sergio Bossa <[hidden email]> wrote:
>>>>> That was exactly my plan, just wanted to know if some issue already existed.
>>>>>
>>>>> I'll keep you posted,
>>>>> Thanks,
>>>>> Cheers,
>>>>>
>>>>> Sergio B.
>>>>>
>>>>> On Tue, Apr 24, 2012 at 8:42 PM, Fred Toussi
>>>>> <[hidden email]> wrote:
>>>>>> You can check what's happening in a debugger. First check the query is
>>>>>> definitely using th index, then observe the IndexAVL's IndexRowIterator
>>>>>> inner class and see if redundant rows are read between or after the
>>>>>> returned rows, or if there are big gaps in row id's (which indicate the
>>>>>> file position).
>>>>>>
>>>>>> And if you create a test database that shows this behaviour, I will
>>>>>> have a look.
>>>>>>
>>>>>> Fred
>>>>>>
>>>>>> On Tue, Apr 24, 2012, at 20:17, Sergio Bossa wrote:
>>>>>>> Hi Fred,
>>>>>>>
>>>>>>> thanks for your answers.
>>>>>>>
>>>>>>> I'm using MVCC, but I see the data file size decreasing after
>>>>>>> checkpoints/defrags, which should mean deletes definitely take place.
>>>>>>> I'm also already using clustered tables on the index, which answers your
>>>>>>> second point.
>>>>>>>
>>>>>>> So there must be something weird going on, even because all queries for
>>>>>>> that sub-index, even those out of deleted ranges, slow down a lot...
>>>>>>>
>>>>>>> Sergio Bossa
>>>>>>> Sent by iPhone
>>>>>>>
>>>>>>> Il giorno 24/apr/2012, alle ore 20:02, Fred Toussi
>>>>>>> <[hidden email]> ha scritto:
>>>>>>>
>>>>>>>> If you are using MVCC the rows are not immediately deleted.
>>>>>>>>
>>>>>>>> If any transaction which started before the current transaction is
>>>>>>>> uncommitted, the rows remain. Once all such transactions committed or
>>>>>>>> rolled back, the rows are permanently deleted.
>>>>>>>>
>>>>>>>> When a checkpoint (with or without defrag) is successfully completed,
>>>>>>>> all deletes become permanent. Checkpoint defrag will definitely remove
>>>>>>>> the rows and empty spaces are compacted out.
>>>>>>>>
>>>>>>>> If this access pattern is the most important one for this table, you can
>>>>>>>> make the table clustered on the columns of the index. This will have an
>>>>>>>> effect after each subsequent checkpoint defrag. See the Guide on SET
>>>>>>>> TABLE CLUSTERED
>>>>>>>>
>>>>>>>> Also, if your .data file easily fits in memory, you can increase the NIO
>>>>>>>> file size to speed up access.
>>>>>>>>
>>>>>>>> Fred
>>>>>>>> On Tue, Apr 24, 2012, at 18:45, Sergio Bossa wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a table with several million rows and an index in the form
>>>>>>>>> (X,Y,W,Z).
>>>>>>>>> After deleting thousands of rows under the sub-key (x1,y1,w1), queries
>>>>>>>>> over ranges of the key (x1,y1,w1,Z) are way way slower than queries
>>>>>>>>> over different sub-keys such as (x1,y2,w2,Z), and cause lots of IO
>>>>>>>>> even for very small datasets: in other words, it's like they're
>>>>>>>>> spending lots of time iterating over the rows, or maybe the index, on
>>>>>>>>> the file system, even if the actual dataset is very small due to
>>>>>>>>> previous deletes; that is, it's like they're still reading some of the
>>>>>>>>> deleted rows / deleted indexes (but not returning them).
>>>>>>>>> To make a concrete example, let's say I have 10000 rows between keys
>>>>>>>>> (x1,y1,w1,1) and (x1,y1,w1,10000), then I delete 9900 rows keeping
>>>>>>>>> only 100 of them from the previous range: now a query in the range is
>>>>>>>>> very slow, even if only 100 values should be found and retrieved.
>>>>>>>>>
>>>>>>>>> I'm obviously running frequent checkpoint defrags to optimize space,
>>>>>>>>> wich doesn't help by the way.
>>>>>>>>>
>>>>>>>>> Any ideas will be very welcome.
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Sergio B.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sergio Bossa
>>>>>>>>> http://www.linkedin.com/in/sergiob
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Hsqldb-user mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/hsqldb-user

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user