[ale] perl/postgresql question
Geoffrey Myers
lists at serioustechnology.com
Fri Apr 1 08:59:01 EDT 2011
Jim Kinney wrote:
> Try a single table dump then feed through perl and output to a file.
> Now attempt the pgrestore to find which table is choking. I'm starting
> to think the problem is the restore not the perl.
I already know which table it is as the error tells me that. pg_restore
is clearly not liking the character, so I'm not sure how the restore
could be the issue. Problem is, that character should not be in the
data as the script should have removed it.
> I've never had to do a character translation like this before on a
> database. My pg stuff always goes to files as an intermediate step
> unless it's an emergency replication.
The whole issue surrounds the fact that the existing database is
SQL_ASCII and we are trying to convert it to UTF8. I can dump and
recreate SQL_ASCII to SQL_ASCII all day long. It's UTF8 that doesn't
like the funky character.
> Need to do a search for the offending character in the pgdump.
This is likely the approach I'll have to take, but searching for a hex
value in an ascii dump is questionable? Not sure what I'd be searching for.
>
> On Apr 1, 2011 7:45 AM, "Geoffrey Myers" <lists at serioustechnology.com
> <mailto:lists at serioustechnology.com>> wrote:
> > Jim Kinney wrote:
> >> I was searching for a clue and saw the post on the pg list :-) Got all
> >> excited thinking I had found a solution to point to you to! Then I
> >> realized, "Hey! This looks REALLY similar..."
> >>
> >> Maybe a single run then a grep for non-ascii characters?
> >
> > Not sure what you're suggesting. The perl code is pulling the data out
> > of the database, cleaning it, putting it back in. Are you suggesting
> > pulling the data out and piping to grep? Are you thinking there's a bug
> > in the perl code?
> >
> >>
> >> On Thu, Mar 31, 2011 at 4:11 PM, Geoffrey Myers
> >> <lists at serioustechnology.com <mailto:lists at serioustechnology.com>
> <mailto:lists at serioustechnology.com
> <mailto:lists at serioustechnology.com>>> wrote:
> >>
> >> Jim Kinney wrote:
> >> > The "g" _should_ do it. Quite puzzling.
> >> > Can you do a run and find an example of when it fails? Is it
> >> always 0xbd?
> >>
> >> The sad thing is, all it tells me is which table in the database has the
> >> error. It does not identify the field. I've posted to the Postgres
> >> list to see if there's any way to narrow down the culprit, there is not.
> >>
> >> Once it fails, it stops loading that table, so I never get pass this
> >> error.
> >>
> >> When I first caught this error, I was simply trying to load an SQL_ASCII
> >> database dump into a UTF8 database. That's when I wrote my
> >> cleandata.pl <http://cleandata.pl> <http://cleandata.pl>
> >> code. The code cleans 62750 records. I just don't understand why it
> >> doesn't fix this one.
> >>
> >> > Maybe someone in Tibet doesn't like your code :-) (It's a Tibetan
> >> > character code in utf8)
> >>
> >> But what is it in SQL_ASCII? What happened is, some enterprising users
> >> figured out how to enter special characters from the keyboard, say, like
> >> a degrees symbol. That goes in to the SQL_ASCII database fine, but when
> >> you try to load that into the UTF8 encoded database, it pukes.
> >>
> >> >
> >> > On Thu, Mar 31, 2011 at 2:59 PM, Geoffrey Myers
> >> > <lists at serioustechnology.com <mailto:lists at serioustechnology.com>
> <mailto:lists at serioustechnology.com <mailto:lists at serioustechnology.com>>
> >> <mailto:lists at serioustechnology.com <mailto:lists at serioustechnology.com>
> >> <mailto:lists at serioustechnology.com
> <mailto:lists at serioustechnology.com>>>> wrote:
> >> >
> >> > Jim Kinney wrote:
> >> > > should it be replaced with nothing or with a NULL? Does
> >> the perl
> >> > pointer
> >> > > jump to the next character automatically when a match is
> >> made thus
> >> > > skipping a character in your replace? Does a double run
> >> fix it?
> >> >
> >> > Missed the first question. It should be removed, not
> >> replaced with an
> >> > null. For example I have:
> >> >
> >> > This is my <UGLY CHARACTER>data
> >> >
> >> > Should then be:
> >> >
> >> > This is my data
> >> >
> >> > >
> >> > > On Thu, Mar 31, 2011 at 12:04 PM, Geoffrey Myers
> >> > > <lists at serioustechnology.com <mailto:lists at serioustechnology.com>
> >> <mailto:lists at serioustechnology.com
> <mailto:lists at serioustechnology.com>>
> >> <mailto:lists at serioustechnology.com <mailto:lists at serioustechnology.com>
> >> <mailto:lists at serioustechnology.com
> <mailto:lists at serioustechnology.com>>>
> >> > <mailto:lists at serioustechnology.com
> <mailto:lists at serioustechnology.com>
> >> <mailto:lists at serioustechnology.com
> <mailto:lists at serioustechnology.com>>
> >> > <mailto:lists at serioustechnology.com
> <mailto:lists at serioustechnology.com>
> >> <mailto:lists at serioustechnology.com
> <mailto:lists at serioustechnology.com>>>>> wrote:
> >> > >
> >> > > So, we are trying to convert a number of postgresql
> >> databases
> >> > that were
> >> > > created with SQL_ASCII encoding to UTF8 encoding. As
> >> such, I
> >> > need to
> >> > > strip certain characters out of the data before
> >> dumping and
> >> > reloading
> >> > > the new databases. I'm using the following:
> >> > >
> >> > > data =~ s/(.)/((ord($1) >= 0) && (ord($1) <= 8))
> >> > > || (ord($1) == 11)
> >> > > || ((ord($1) >= 13) && (ord($1) <= 31))
> >> > > || ((ord($1) >= 127)) ?"": $1/egs;
> >> > >
> >> > > Yet I'm getting the following error, which indicates a
> >> > character that
> >> > > should be handled by the above code is still in the data:
> >> > >
> >> > > pg_restore: [archiver (db)] COPY failed: ERROR:
> >> invalid byte
> >> > sequence
> >> > > for encoding "UTF8": 0xbd
> >> > >
> >> > >
> >> > > Certainly, the above code should replace the 0xbd with
> >> nothing?
> >> > >
> >> > > Any perl/postgres gurus out there?
> >> > >
> >> > > --
> >> > > Until later, Geoffrey
> >> > >
> >> > > "I predict future happiness for America if they can
> >> prevent
> >> > > the government from wasting the labors of the people under
> >> > > the pretense of taking care of them."
> >> > > - Thomas Jefferson
> >> > > _______________________________________________
> >> > > Ale mailing list
> >> > > Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
> <mailto:Ale at ale.org>> <mailto:Ale at ale.org <mailto:Ale at ale.org>
> >> <mailto:Ale at ale.org <mailto:Ale at ale.org>>> <mailto:Ale at ale.org
> <mailto:Ale at ale.org> <mailto:Ale at ale.org <mailto:Ale at ale.org>>
> >> > <mailto:Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
> <mailto:Ale at ale.org>>>>
> >> > > http://mail.ale.org/mailman/listinfo/ale
> >> > > See JOBS, ANNOUNCE and SCHOOLS lists at
> >> > > http://mail.ale.org/mailman/listinfo
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > --
> >> > > James P. Kinney III
> >> > > I would rather stumble along in freedom than walk
> >> effortlessly in
> >> > chains.
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >> ------------------------------------------------------------------------
> >> > >
> >> > > _______________________________________________
> >> > > Ale mailing list
> >> > > Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
> <mailto:Ale at ale.org>> <mailto:Ale at ale.org <mailto:Ale at ale.org>
> >> <mailto:Ale at ale.org <mailto:Ale at ale.org>>>
> >> > > http://mail.ale.org/mailman/listinfo/ale
> >> > > See JOBS, ANNOUNCE and SCHOOLS lists at
> >> > > http://mail.ale.org/mailman/listinfo
> >> >
> >> >
> >> > --
> >> > Until later, Geoffrey
> >> >
> >> > "I predict future happiness for America if they can prevent
> >> > the government from wasting the labors of the people under
> >> > the pretense of taking care of them."
> >> > - Thomas Jefferson
> >> > _______________________________________________
> >> > Ale mailing list
> >> > Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
> <mailto:Ale at ale.org>> <mailto:Ale at ale.org <mailto:Ale at ale.org>
> >> <mailto:Ale at ale.org <mailto:Ale at ale.org>>>
> >> > http://mail.ale.org/mailman/listinfo/ale
> >> > See JOBS, ANNOUNCE and SCHOOLS lists at
> >> > http://mail.ale.org/mailman/listinfo
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > --
> >> > James P. Kinney III
> >> > I would rather stumble along in freedom than walk effortlessly in
> >> chains.
> >> >
> >> >
> >> >
> >> >
> >> ------------------------------------------------------------------------
> >> >
> >> > _______________________________________________
> >> > Ale mailing list
> >> > Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
> <mailto:Ale at ale.org>>
> >> > http://mail.ale.org/mailman/listinfo/ale
> >> > See JOBS, ANNOUNCE and SCHOOLS lists at
> >> > http://mail.ale.org/mailman/listinfo
> >>
> >>
> >> --
> >> Until later, Geoffrey
> >>
> >> "I predict future happiness for America if they can prevent
> >> the government from wasting the labors of the people under
> >> the pretense of taking care of them."
> >> - Thomas Jefferson
> >> _______________________________________________
> >> Ale mailing list
> >> Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
> <mailto:Ale at ale.org>>
> >> http://mail.ale.org/mailman/listinfo/ale
> >> See JOBS, ANNOUNCE and SCHOOLS lists at
> >> http://mail.ale.org/mailman/listinfo
> >>
> >>
> >>
> >>
> >> --
> >> --
> >> James P. Kinney III
> >> I would rather stumble along in freedom than walk effortlessly in
> chains.
> >>
> >>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> _______________________________________________
> >> Ale mailing list
> >> Ale at ale.org <mailto:Ale at ale.org>
> >> http://mail.ale.org/mailman/listinfo/ale
> >> See JOBS, ANNOUNCE and SCHOOLS lists at
> >> http://mail.ale.org/mailman/listinfo
> >
> >
> > --
> > Until later, Geoffrey
> >
> > "I predict future happiness for America if they can prevent
> > the government from wasting the labors of the people under
> > the pretense of taking care of them."
> > - Thomas Jefferson
> > _______________________________________________
> > Ale mailing list
> > Ale at ale.org <mailto:Ale at ale.org>
> > http://mail.ale.org/mailman/listinfo/ale
> > See JOBS, ANNOUNCE and SCHOOLS lists at
> > http://mail.ale.org/mailman/listinfo
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo
--
Until later, Geoffrey
"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson
More information about the Ale
mailing list