Using hsql within hadoop tasks results in java.sql.SQLException: bad TEXT table source file


Lavoie, John L

Hello,

 

We need to load a lookup file into some code that does MapReduce processing. I’m looking at using HSQLDB to streamline some custom code that does lookups in this file. Everything runs great on my desktop when I load a CSV file as a text table, but when I try to run it on the Hadoop cluster we run into problems. I have a feeling this is due to how the Hadoop tasks set up the file via symlinks.

 

On the Hadoop side we have a delimited .txt file that we load into the Hadoop Distributed Cache. The Hadoop framework transparently copies this file to all relevant data nodes under a common directory and sets up a symlink to make the file appear under the working directory of my tasks. Since multiple Hadoop tasks may run on a single node, the file is copied once to the file cache and a symlink is then created for each task in its own working directory. The file looks like it is in my current directory as PROVIDER.txt, but it is actually located somewhere else as PCP_PRV*.txt. Multiple parallel processes could potentially be reading from this file at once, so keeping it read-only is very important.
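For reference, a minimal driver-side sketch of that Distributed Cache setup, using the org.apache.hadoop.mapreduce Job API (the HDFS directory is hypothetical, and the layout is simplified relative to the ./PROVIDER/PROVIDER.txt path below; the "#PROVIDER.txt" fragment is what produces the symlink name in each task's working directory):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ProviderLookupDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "provider-lookup");
            // HDFS path is hypothetical. The "#PROVIDER.txt" fragment asks the
            // framework to expose the cached copy under that name (via a
            // symlink) in each task's working directory.
            job.addCacheFile(new URI(
                "/user/jlavoi3/lookups/PCP_PRV_01_PCP_PD2014-08-28.txt#PROVIDER.txt"));
            // ... set input/output paths, mapper class, etc., then submit ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }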

 

When I try to get HSQL to load the file via these commands:

CREATE TEXT TABLE PROVIDERS_INPUT (<column list>);

SET TABLE PROVIDERS_INPUT SOURCE "./PROVIDER/PROVIDER.txt;ignore_first=true;fs=|" DESC;

 

It results in this exception:

java.sql.SQLException: bad TEXT table source file - line number: 0
org.hsqldb.HsqlException: file input/output error
openning file: java.io.FileNotFoundException: /opt/mapr/tmp/hadoop-mapr/nm-local-dir/filecache/75428/PCP_PRV_01_PCP_PD2014-08-28.txt (Permission denied)
error: /opt/mapr/tmp/hadoop-mapr/nm-local-dir/filecache/75428/PCP_PRV_01_PCP_PD2014-08-28.txt
in statement [SET TABLE PROVIDERS_INPUT SOURCE "/opt/mapr/tmp/hadoop-mapr/nm-local-dir/usercache/jlavoi3/appcache/application_1458318121846_767748/container_1458318121846_767748_01_000085/PROVIDER/PROVIDER.txt;ignore_first=true;fs=|" DESC]

 

It appears that the symlink is getting resolved correctly, but HSQL still can’t open the file.  I know the file is there and readable because our existing code is able to read from it.  Does using the keyword “DESC” still require having write permission to the file?  Any suggestions?
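For completeness, a minimal, self-contained sketch of the task-side setup described above (the jdbc:hsqldb:file: path and the column list are illustrative, not the real schema; note that TEXT tables are not available in purely in-memory jdbc:hsqldb:mem: databases, and relative source paths resolve against the database directory):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ProviderLookup {
        public static void main(String[] args) throws Exception {
            // File-based database in the task's working directory; TEXT tables
            // are not supported by purely in-memory databases.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hsqldb:file:./lookupdb", "SA", "");
                 Statement st = conn.createStatement()) {
                // Column list is illustrative, not the real schema.
                st.execute("CREATE TEXT TABLE PROVIDERS_INPUT ("
                         + "PROVIDER_ID VARCHAR(20), PROVIDER_NAME VARCHAR(100))");
                // Relative source paths resolve against the database directory.
                st.execute("SET TABLE PROVIDERS_INPUT SOURCE "
                         + "\"./PROVIDER/PROVIDER.txt;ignore_first=true;fs=|\" DESC");
                try (ResultSet rs = st.executeQuery(
                         "SELECT COUNT(*) FROM PROVIDERS_INPUT")) {
                    rs.next();
                    System.out.println("rows loaded: " + rs.getLong(1));
                }
            }
        }
    }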

 

 

Thanks,

John Lavoie





Re: Using hsql within hadoop tasks results in java.sql.SQLException: bad TEXT table source file

Fred Toussi
In some versions of the software, the DESC keyword may not have the desired effect.

With the current version of HSQLDB, you can execute SET TABLE PROVIDERS_INPUT READ ONLY before SET TABLE PROVIDERS_INPUT SOURCE. I have checked that this works.

In version 1.8.x, the syntax was SET TABLE PROVIDERS_INPUT READONLY TRUE.
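A sketch of that ordering, reusing the connection and table from the setup above (the JDBC wrapper is illustrative; the statements are the ones from this thread, with READ ONLY issued first):

    try (Statement st = conn.createStatement()) {
        // Mark the table read-only before attaching the source, so HSQLDB
        // should open the text file without requesting write access.
        st.execute("SET TABLE PROVIDERS_INPUT READ ONLY");
        st.execute("SET TABLE PROVIDERS_INPUT SOURCE "
                 + "\"./PROVIDER/PROVIDER.txt;ignore_first=true;fs=|\" DESC");
    }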
 
Fred Toussi
 