Quantcast

BLOB performance evaluation questions (long)

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

BLOB performance evaluation questions (long)

Noah
I'm evaluating HSQL BLOB performance and wanted to get some expert feedback on how things have been looking in my test harness.

My hardware environment is Windows 7 x64 Ultimate, 24 GB RAM, 2 TB 7200 RPM SATA HD, SATA 3.0 on a motherboard w/an Intel XEON E5630 @ 2.53GHz.  

Using HSQL 2.2.9 with Java 1.7.0U3 x64 and deploying the test harness into a Glassfish 3.1.2.2 application server.

My table schema looks like so:

CREATE TABLE FOO (ID BIGINT NOT NULL, SUBID INTEGER NOT NULL, PRODNAME VARCHAR(32) NOT NULL, B1 BLOB NOT NULL, B2 BLOG NOT NULL, B3 BLOB NOT NULL, PRIMARY KEY(ID))

CREATE UNIQUE INDEX I_INSTCOL_PRODNAM ON FOO (SUBID, PRODNAME)
CREATE INDEX I_INSTCOL ON FOO (SUBID)
CREATE SEQUENCE SEQUENCEFOO START WITH 1

I'm loading about 15 GB of binary data into the BLOB columns with a total 63636 rows.

On the Glassfish side I've got a JTA connection pool set up to HSQL running in file mode with the following parameters:

jdbc:hsqldb:file:E/hsql/test/testdb;sql.syntax_ora=true;hsqldb.log_data=false;hsqldb.default_table_type=cached;hsqldb.nio_max_size=2048;shutdown=true

I turned off logging because I wanted the fastest inserts during my test runs as I'm focused now on retrieval times.

In my particular use case I will have to fetch/iterate over the entire table.

I found that unwrapping the Connection from the injected EntityManager and using it to create a JDBC prepared statement which I iterated over gave better performance then JPA.

In this case for each Blob in the fetched resultSet row I call getBinaryStream() and then use Apache Commons IO IOUtils.toByteArray(is, blog.length()) on the input stream to fetch the data.

Reading in the blob data from the resultset avgs 7.3ms per row and about 7.3 minutes to load all 63k rows worth of BLOB data.

I can't store the entire DB in memory, hence the cache mode. In any event it looks like BLOB data is not read into memory regardless.  Is it possible to get faster performance then what I've detailed?  Are there any HSQL configuration tweaks that could squeeze out 20%, 30% or better performance gains? Is there a better approach for fetching the data then my JDBC approach?

FWIW, this approach is being compared vs. reading the data directly from flat files into byte[] which is taking approx. 4.5 minutes to load all 15GB worth of binary data. Obviously there's overhead in HSQL vs flat files but ideally I would like to get things as close as possible to that performance.

Thoughts would be appreciated.

-Noah
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BLOB performance evaluation questions (long)

Fred Toussi-2
Please switch to the latest snapshot jar for 2.3.0, as this fixes some
issues with lob storage and may also be faster.

If your average blob size is about 200K then it is not likely to get a
different result with a different retrieval method. For blobs that are
larger than 500K this would probably make a difference. Instead of
getting a binary stream, you can try the PreparedStatement getBytes
method, or the getBlob method and then getBytes on the blob and see if
there is any difference.

Regarding storing the data, make sure you perform a CHECKPOINT after
inserting the data. Otherwise your data will be lost without the log. As
blob data is not logged, you are not gaining much by turning off the
log.

Fred

On Tue, Jun 11, 2013, at 20:51, Noah wrote:

> I'm evaluating HSQL BLOB performance and wanted to get some expert
> feedback
> on how things have been looking in my test harness.
>
> My hardware environment is Windows 7 x64 Ultimate, 24 GB RAM, 2 TB 7200
> RPM
> SATA HD, SATA 3.0 on a motherboard w/an Intel XEON E5630 @ 2.53GHz.  
>
> Using HSQL 2.2.9 with Java 1.7.0U3 x64 and deploying the test harness
> into a
> Glassfish 3.1.2.2 application server.
>
> My table schema looks like so:
>
> CREATE TABLE FOO (ID BIGINT NOT NULL, SUBID INTEGER NOT NULL, PRODNAME
> VARCHAR(32) NOT NULL, B1 BLOB NOT NULL, B2 BLOG NOT NULL, B3 BLOB NOT
> NULL,
> PRIMARY KEY(ID))
>
> CREATE UNIQUE INDEX I_INSTCOL_PRODNAM ON FOO (SUBID, PRODNAME)
> CREATE INDEX I_INSTCOL ON FOO (SUBID)
> CREATE SEQUENCE SEQUENCEFOO START WITH 1
>
> I'm loading about 15 GB of binary data into the BLOB columns with a total
> 63636 rows.
>
> On the Glassfish side I've got a JTA connection pool set up to HSQL
> running
> in file mode with the following parameters:
>
> jdbc:hsqldb:file:E/hsql/test/testdb;sql.syntax_ora=true;hsqldb.log_data=false;hsqldb.default_table_type=cached;hsqldb.nio_max_size=2048;shutdown=true
>
> I turned off logging because I wanted the fastest inserts during my test
> runs as I'm focused now on retrieval times.
>
> In my particular use case I will have to fetch/iterate over the entire
> table.
>
> I found that unwrapping the Connection from the injected EntityManager
> and
> using it to create a JDBC prepared statement which I iterated over gave
> better performance then JPA.
>
> In this case for each Blob in the fetched resultSet row I call
> getBinaryStream() and then use Apache Commons IO IOUtils.toByteArray(is,
> blog.length()) on the input stream to fetch the data.
>
> Reading in the blob data from the resultset avgs 7.3ms per row and about
> 7.3
> minutes to load all 63k rows worth of BLOB data.
>
> I can't store the entire DB in memory, hence the cache mode. In any event
> it
> looks like BLOB data is not read into memory regardless.  Is it possible
> to
> get faster performance then what I've detailed?  Are there any HSQL
> configuration tweaks that could squeeze out 20%, 30% or better
> performance
> gains? Is there a better approach for fetching the data then my JDBC
> approach?
>
> FWIW, this approach is being compared vs. reading the data directly from
> flat files into byte[] which is taking approx. 4.5 minutes to load all
> 15GB
> worth of binary data. Obviously there's overhead in HSQL vs flat files
> but
> ideally I would like to get things as close as possible to that
> performance.
>
> Thoughts would be appreciated.
>
> -Noah
>
>
>
> --
> View this message in context:
> http://hsqldb.10974.n7.nabble.com/BLOB-performance-evaluation-questions-long-tp3909.html
> Sent from the HSQLDB - User mailing list archive at Nabble.com.
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> Hsqldb-user mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/hsqldb-user

------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BLOB performance evaluation questions (long)

Noah
Hi Fred,

Thanks for your feedback.

I grabbed the latest snapshot. Performance looks about the same.  

The avg. size of my blob columns are:  16.68 KB / 16.68 KB / 199.63 KB.

None of the 1st two columns are ever over 512K, they are all < 200K.

The 3rd column distribution is 68% < 200K, 18% between 200K and 512K, and 14% > 512K, .

My data file is about 16MB and my lobs file is 19GB.  Would non default NIO settings provide any befit here? I am guessing not.

-Noah


On Jun 11, 2013, at 4:26 PM, Fred Toussi <[hidden email]> wrote:

> Please switch to the latest snapshot jar for 2.3.0, as this fixes some
> issues with lob storage and may also be faster.
>
> If your average blob size is about 200K then it is not likely to get a
> different result with a different retrieval method. For blobs that are
> larger than 500K this would probably make a difference. Instead of
> getting a binary stream, you can try the PreparedStatement getBytes
> method, or the getBlob method and then getBytes on the blob and see if
> there is any difference.
>
> Regarding storing the data, make sure you perform a CHECKPOINT after
> inserting the data. Otherwise your data will be lost without the log. As
> blob data is not logged, you are not gaining much by turning off the
> log.
>
> Fred
>
> On Tue, Jun 11, 2013, at 20:51, Noah wrote:
>> I'm evaluating HSQL BLOB performance and wanted to get some expert
>> feedback
>> on how things have been looking in my test harness.
>>
>> My hardware environment is Windows 7 x64 Ultimate, 24 GB RAM, 2 TB 7200
>> RPM
>> SATA HD, SATA 3.0 on a motherboard w/an Intel XEON E5630 @ 2.53GHz.  
>>
>> Using HSQL 2.2.9 with Java 1.7.0U3 x64 and deploying the test harness
>> into a
>> Glassfish 3.1.2.2 application server.
>>
>> My table schema looks like so:
>>
>> CREATE TABLE FOO (ID BIGINT NOT NULL, SUBID INTEGER NOT NULL, PRODNAME
>> VARCHAR(32) NOT NULL, B1 BLOB NOT NULL, B2 BLOG NOT NULL, B3 BLOB NOT
>> NULL,
>> PRIMARY KEY(ID))
>>
>> CREATE UNIQUE INDEX I_INSTCOL_PRODNAM ON FOO (SUBID, PRODNAME)
>> CREATE INDEX I_INSTCOL ON FOO (SUBID)
>> CREATE SEQUENCE SEQUENCEFOO START WITH 1
>>
>> I'm loading about 15 GB of binary data into the BLOB columns with a total
>> 63636 rows.
>>
>> On the Glassfish side I've got a JTA connection pool set up to HSQL
>> running
>> in file mode with the following parameters:
>>
>> jdbc:hsqldb:file:E/hsql/test/testdb;sql.syntax_ora=true;hsqldb.log_data=false;hsqldb.default_table_type=cached;hsqldb.nio_max_size=2048;shutdown=true
>>
>> I turned off logging because I wanted the fastest inserts during my test
>> runs as I'm focused now on retrieval times.
>>
>> In my particular use case I will have to fetch/iterate over the entire
>> table.
>>
>> I found that unwrapping the Connection from the injected EntityManager
>> and
>> using it to create a JDBC prepared statement which I iterated over gave
>> better performance then JPA.
>>
>> In this case for each Blob in the fetched resultSet row I call
>> getBinaryStream() and then use Apache Commons IO IOUtils.toByteArray(is,
>> blog.length()) on the input stream to fetch the data.
>>
>> Reading in the blob data from the resultset avgs 7.3ms per row and about
>> 7.3
>> minutes to load all 63k rows worth of BLOB data.
>>
>> I can't store the entire DB in memory, hence the cache mode. In any event
>> it
>> looks like BLOB data is not read into memory regardless.  Is it possible
>> to
>> get faster performance then what I've detailed?  Are there any HSQL
>> configuration tweaks that could squeeze out 20%, 30% or better
>> performance
>> gains? Is there a better approach for fetching the data then my JDBC
>> approach?
>>
>> FWIW, this approach is being compared vs. reading the data directly from
>> flat files into byte[] which is taking approx. 4.5 minutes to load all
>> 15GB
>> worth of binary data. Obviously there's overhead in HSQL vs flat files
>> but
>> ideally I would like to get things as close as possible to that
>> performance.
>>
>> Thoughts would be appreciated.
>>
>> -Noah
>>
>>
>>
>> --
>> View this message in context:
>> http://hsqldb.10974.n7.nabble.com/BLOB-performance-evaluation-questions-long-tp3909.html
>> Sent from the HSQLDB - User mailing list archive at Nabble.com.
>>
>> ------------------------------------------------------------------------------
>> This SF.net email is sponsored by Windows:
>>
>> Build for Windows Store.
>>
>> http://p.sf.net/sfu/windows-dev2dev
>> _______________________________________________
>> Hsqldb-user mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> Hsqldb-user mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/hsqldb-user


------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BLOB performance evaluation questions (long)

Fred Toussi-2
Try the other suggested methods and see if there is a difference.

Those methods perform less work when retrieving as they don't convert
back and forth to streams. But this difference might be very small on a
fast machine.

The NIO settings do not apply to the lobs.

Fred

On Wed, Jun 12, 2013, at 0:09, Noah White wrote:

> Hi Fred,
>
> Thanks for your feedback.
>
> I grabbed the latest snapshot. Performance looks about the same.  
>
> The avg. size of my blob columns are:  16.68 KB / 16.68 KB / 199.63 KB.
>
> None of the 1st two columns are ever over 512K, they are all < 200K.
>
> The 3rd column distribution is 68% < 200K, 18% between 200K and 512K, and
> 14% > 512K, .
>
> My data file is about 16MB and my lobs file is 19GB.  Would non default
> NIO settings provide any befit here? I am guessing not.
>
> -Noah
>
>
> On Jun 11, 2013, at 4:26 PM, Fred Toussi <[hidden email]>
> wrote:
>
> > Please switch to the latest snapshot jar for 2.3.0, as this fixes some
> > issues with lob storage and may also be faster.
> >
> > If your average blob size is about 200K then it is not likely to get a
> > different result with a different retrieval method. For blobs that are
> > larger than 500K this would probably make a difference. Instead of
> > getting a binary stream, you can try the PreparedStatement getBytes
> > method, or the getBlob method and then getBytes on the blob and see if
> > there is any difference.
> >
> > Regarding storing the data, make sure you perform a CHECKPOINT after
> > inserting the data. Otherwise your data will be lost without the log. As
> > blob data is not logged, you are not gaining much by turning off the
> > log.
> >
> > Fred
> >
> > On Tue, Jun 11, 2013, at 20:51, Noah wrote:
> >> I'm evaluating HSQL BLOB performance and wanted to get some expert
> >> feedback
> >> on how things have been looking in my test harness.
> >>
> >> My hardware environment is Windows 7 x64 Ultimate, 24 GB RAM, 2 TB 7200
> >> RPM
> >> SATA HD, SATA 3.0 on a motherboard w/an Intel XEON E5630 @ 2.53GHz.  
> >>
> >> Using HSQL 2.2.9 with Java 1.7.0U3 x64 and deploying the test harness
> >> into a
> >> Glassfish 3.1.2.2 application server.
> >>
> >> My table schema looks like so:
> >>
> >> CREATE TABLE FOO (ID BIGINT NOT NULL, SUBID INTEGER NOT NULL, PRODNAME
> >> VARCHAR(32) NOT NULL, B1 BLOB NOT NULL, B2 BLOG NOT NULL, B3 BLOB NOT
> >> NULL,
> >> PRIMARY KEY(ID))
> >>
> >> CREATE UNIQUE INDEX I_INSTCOL_PRODNAM ON FOO (SUBID, PRODNAME)
> >> CREATE INDEX I_INSTCOL ON FOO (SUBID)
> >> CREATE SEQUENCE SEQUENCEFOO START WITH 1
> >>
> >> I'm loading about 15 GB of binary data into the BLOB columns with a total
> >> 63636 rows.
> >>
> >> On the Glassfish side I've got a JTA connection pool set up to HSQL
> >> running
> >> in file mode with the following parameters:
> >>
> >> jdbc:hsqldb:file:E/hsql/test/testdb;sql.syntax_ora=true;hsqldb.log_data=false;hsqldb.default_table_type=cached;hsqldb.nio_max_size=2048;shutdown=true
> >>
> >> I turned off logging because I wanted the fastest inserts during my test
> >> runs as I'm focused now on retrieval times.
> >>
> >> In my particular use case I will have to fetch/iterate over the entire
> >> table.
> >>
> >> I found that unwrapping the Connection from the injected EntityManager
> >> and
> >> using it to create a JDBC prepared statement which I iterated over gave
> >> better performance then JPA.
> >>
> >> In this case for each Blob in the fetched resultSet row I call
> >> getBinaryStream() and then use Apache Commons IO IOUtils.toByteArray(is,
> >> blog.length()) on the input stream to fetch the data.
> >>
> >> Reading in the blob data from the resultset avgs 7.3ms per row and about
> >> 7.3
> >> minutes to load all 63k rows worth of BLOB data.
> >>
> >> I can't store the entire DB in memory, hence the cache mode. In any event
> >> it
> >> looks like BLOB data is not read into memory regardless.  Is it possible
> >> to
> >> get faster performance then what I've detailed?  Are there any HSQL
> >> configuration tweaks that could squeeze out 20%, 30% or better
> >> performance
> >> gains? Is there a better approach for fetching the data then my JDBC
> >> approach?
> >>
> >> FWIW, this approach is being compared vs. reading the data directly from
> >> flat files into byte[] which is taking approx. 4.5 minutes to load all
> >> 15GB
> >> worth of binary data. Obviously there's overhead in HSQL vs flat files
> >> but
> >> ideally I would like to get things as close as possible to that
> >> performance.
> >>
> >> Thoughts would be appreciated.
> >>
> >> -Noah
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://hsqldb.10974.n7.nabble.com/BLOB-performance-evaluation-questions-long-tp3909.html
> >> Sent from the HSQLDB - User mailing list archive at Nabble.com.
> >>
> >> ------------------------------------------------------------------------------
> >> This SF.net email is sponsored by Windows:
> >>
> >> Build for Windows Store.
> >>
> >> http://p.sf.net/sfu/windows-dev2dev
> >> _______________________________________________
> >> Hsqldb-user mailing list
> >> [hidden email]
> >> https://lists.sourceforge.net/lists/listinfo/hsqldb-user
> >
> > ------------------------------------------------------------------------------
> > This SF.net email is sponsored by Windows:
> >
> > Build for Windows Store.
> >
> > http://p.sf.net/sfu/windows-dev2dev
> > _______________________________________________
> > Hsqldb-user mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/hsqldb-user
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> Hsqldb-user mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/hsqldb-user

------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
_______________________________________________
Hsqldb-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
Loading...