[CLUE-Talk] Import large fixed length records into mysql (in a single pass)?

Jed S. Baer thag at frii.com
Fri Oct 8 10:04:59 MDT 2004


On Fri, 8 Oct 2004 00:31:01 -0700 (PDT)
dave price <dp_kinaole at yahoo.com> wrote:

> The data is one record per line, but rather than any
> delimiter, we just know the character positions of
> each field in the line. i.e.:
> 
> position  length   field_name
> ========  ======   ==========
>    1         4     field_1
>    5         10    field_2
>    11        10    field_3

Here are three approaches from the book "Data Munging With Perl".

#!/usr/bin/perl
my @cols = qw(n n n n ...); # each n is the 0-based end offset of a column
while (<STDIN>) {
  my @rec;
  my $prev = 0;
  foreach my $col (@cols) {
    push @rec, substr($_, $prev, $col - $prev);
    $prev = $col;
  }
  print join('|', @rec);
  print "\n";
}
+++++++++++++++++++++
#!/usr/bin/perl
my @widths = qw(n n n n n ...); #column lengths
my $regex;
$regex .= "(.{$_})" foreach @widths;
while (<STDIN>) {
  my @rec = /$regex/;
  print join('|', @rec);
  print "\n";
}
++++++++++++++++++++++
#!/usr/bin/perl
my $template = 'a5a20a8a6aa8'; #for unpack
while (<STDIN>) {
  my @rec = unpack($template, $_);
  print join('|', @rec);
  print "\n";
}

The author says the unpack method is best. You probably want to use the
"A" element instead of "a" in the unpack template, so you'll get strings
trimmed on the right.

So once you've built your template (or modified one of the other two
scripts appropriately), then you just run the file through Perl (script
name is renoberate.pl):

  $ renoberate.pl < original.file > psv.file

Obviously, fix the /path/to/perl on the shebang line if/as appropriate.

jed
-- 
http://s88369986.onlinehome.us/freedomsight/

... it is poor civic hygiene to install technologies that could someday
facilitate a police state. -- Bruce Schneier
