Page 1 of 2

messy LOAD DATA INFILE query

Posted: Tue Sep 12, 2006 5:56 am
by yarons
Hi all,

Ok, I've been struggling with this for hours now so I thought I may find my answer here.

What I am trying to do is to load a CSV file into a table.
Sounds simple enough but here are my problems:
My file is full of rows looking like this:
20060711~ ~GAFA ~ ~1000002B ~ ~2~ ~ 2078385~ ~20051220~ ~20060320~ ~99991231~ ~20060320~ ~0~ ~5000000.0~ ~.31~ ~0.0~ ~0.0~ ~0.0~ ~3875.0~ ~LALA~ ~3875.0~ ~0.0~ ~0~ ~INT~ ~0.0~ ~0.0~ ~TEST
Each iteam with surrounded by ~ is a different value (the first and last value are missing 1 ~). However, not all values should be included. Fields number 1, 5, 7 and 12 should be omitted and not entered to the table.

Any ideas?

Thanks

Posted: Tue Sep 12, 2006 9:36 am
by GM
*cough* excel *cough* :wink:

LOL

Posted: Tue Sep 12, 2006 9:42 am
by yarons
yes i am aware of excel ;)

The thing is, this is an automated process so I get the file in real time and need to store it.
Must be possible with SQL, right?

Posted: Tue Sep 12, 2006 9:52 am
by GM
Ah, sorry, I thought it was a one time operation :oops:

A quick scan of the manual (p. 703) gives this:
MySQL Manual wrote: You can also discard an input value by assigning it to a user variable and not assigning the variable
to a table column:

LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @dummy, column2, @dummy, column3);
Could this be useful to you?

hmmm

Posted: Tue Sep 12, 2006 10:16 am
by yarons
You can also discard an input value by assigning it to a user variable and not assigning the variable
to a table column:

LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @dummy, column2, @dummy, column3);
I am not too sure how to use this. What I need is to ignore a few columns and get rid of the extra whitespaces and ~'s.
I know you can specify the delimiter in LOAD DATA INFILE but this appears to be more messy as there are 2 ~'s plus spaces.

I didn't expect to struggle with this all day but I am. still no real progress....

Posted: Tue Sep 12, 2006 10:30 am
by GM
Can you try:

Code: Select all

LOAD DATA INFILE 'C:\\input.txt' INTO TABLE table_name
FIELDS TERMINATED BY '~ ~' 
(@dummy, col1, col2, col3, @dummy, col4, @dummy, col5, col6, col7, col8, @dummy, col9, col10, col11, col12, col13, col14, col15, col16, col17, col18, col19, col20);
???

EDIT - bad syntax - sorry... changed query

Posted: Tue Sep 12, 2006 10:56 am
by GM
I've managed to get it to read in the row that you gave as an example, using the (improved) query above.

thanks

Posted: Wed Sep 13, 2006 8:56 am
by yarons
I have 2 problems though.

1. I use MySQL 4 and @dummy is not supported. my workaround is to get all the columns in and then drop the ones I don't need. Any better way to achieve that with MySQL 4?

2.

Code: Select all

FIELDS TERMINATED BY '~ ~'
doesn't work. I get all nulls. I tried this

Code: Select all

FIELDS ENCLOSED BY '~'
which works almost perfectly only I get on the last column 2 values (the first one and the last one). Can't seem to find a workaround for that.

Thanks

Re: thanks

Posted: Wed Sep 13, 2006 9:24 am
by onion2k
yarons wrote:1. I use MySQL 4 and @dummy is not supported.
What makes you think that?

Re: thanks

Posted: Wed Sep 13, 2006 9:29 am
by yarons
onion2k wrote:
yarons wrote:1. I use MySQL 4 and @dummy is not supported.
What makes you think that?
well, I tried the query with the @dummy fields and got a syntax error. When I took them out the same query worked fine.
I also didn't find any reference to @dummy in MySQL 4 manual. Am I wrong in thinking that?

Re: thanks

Posted: Wed Sep 13, 2006 9:35 am
by onion2k
yarons wrote:
onion2k wrote:
yarons wrote:1. I use MySQL 4 and @dummy is not supported.
What makes you think that?
well, I tried the query with the @dummy fields and got a syntax error. When I took them out the same query worked fine.
I also didn't find any reference to @dummy in MySQL 4 manual. Am I wrong in thinking that?
@dummy is an SQL variable. MySQL 4 definitely supports them in most operations (selects, updates, inserts, etc). It's possible that it doesn't like them in a load, but highly unlikely. I reckon they should work fine.

Re: thanks

Posted: Wed Sep 13, 2006 9:51 am
by yarons
onion2k wrote:
yarons wrote:
onion2k wrote: What makes you think that?
well, I tried the query with the @dummy fields and got a syntax error. When I took them out the same query worked fine.
I also didn't find any reference to @dummy in MySQL 4 manual. Am I wrong in thinking that?
@dummy is an SQL variable. MySQL 4 definitely supports them in most operations (selects, updates, inserts, etc). It's possible that it doesn't like them in a load, but highly unlikely. I reckon they should work fine.
Just tried it again. still no joy.
If you are right then I'm doing something terribly wrong here.
I ran the exact same query as GM suggested. Didn't work with @dummy but worked without.

Can you see what I'm doing wrong here?

Posted: Wed Sep 13, 2006 10:05 am
by GM
Hi again,

I've had a look in the MySQL 4 manual for LOAD DATA INFILE, and it seems to be quite scarse compared to the MySQL 5 manual. So I can't say with 100% certainty that MySQL 4 does support assigning the input to variables. Unfortunately, I'm without a MySQL 4 environment.

Do you have the option to clean the file up in some other way? (At it's source would be best, if possible).

Could you, for example, parse it into a PHP script and clean it there, and then send it to the database? You'd still be doing it on the fly, but it'd be a little bit of extra work.

Posted: Wed Sep 13, 2006 10:20 am
by yarons
GM wrote:Hi again,

I've had a look in the MySQL 4 manual for LOAD DATA INFILE, and it seems to be quite scarse compared to the MySQL 5 manual. So I can't say with 100% certainty that MySQL 4 does support assigning the input to variables. Unfortunately, I'm without a MySQL 4 environment.

Do you have the option to clean the file up in some other way? (At it's source would be best, if possible).

Could you, for example, parse it into a PHP script and clean it there, and then send it to the database? You'd still be doing it on the fly, but it'd be a little bit of extra work.
This is possible but it won't be a great idea as I'm dealing with very large files here (~2G).
I have the possiblity to upgrade to MySQL 5. Will this solve both my problems?

Thanks,

Posted: Wed Sep 13, 2006 10:48 am
by GM
I can't make any guarantees, but for me the query I posted seemed to work in MySQL 5.0.2 (albeit with a single line of input).

Upgrading may affect other databases you've got, so proceed with caution if you choose this route.

Maybe someone else out there will be good enough to run a couple of tests in a MySQL 4 environment? As onion2k says, I find it strange that it isn't supported.

What is the exact error that you get when you try to run the query I posted?