[ale] perl/postgresql question

Geoffrey Myers lists at serioustechnology.com
Thu Mar 31 16:11:14 EDT 2011


Jim Kinney wrote:
> The "g" _should_ do it. Quite puzzling.
> Can you do a run and find an example of when it fails? Is it always 0xbd?

The sad thing is, all it tells me is which table in the database has the 
error.  It does not identify the field.  I've posted to the Postgres 
list to see if there's any way to narrow down the culprit, there is not.

Once it fails, it stops loading that table, so I never get pass this error.

When I first caught this error, I was simply trying to load an SQL_ASCII 
database dump into a UTF8 database.  That's when I wrote my cleandata.pl 
code.  The code cleans 62750 records.  I just don't understand why it 
doesn't fix this one.

> Maybe someone in Tibet doesn't like your code :-)  (It's a Tibetan 
> character code in utf8)

But what is it in SQL_ASCII?  What happened is, some enterprising users 
figured out how to enter special characters from the keyboard, say, like 
a degrees symbol.  That goes in to the SQL_ASCII database fine, but when 
you try to load that into the UTF8 encoded database, it pukes.

> 
> On Thu, Mar 31, 2011 at 2:59 PM, Geoffrey Myers 
> <lists at serioustechnology.com <mailto:lists at serioustechnology.com>> wrote:
> 
>     Jim Kinney wrote:
>      > should it be replaced with nothing or with a NULL? Does the perl
>     pointer
>      > jump to the next character automatically when a match is made thus
>      > skipping a character in your replace? Does a double run fix it?
> 
>     Missed the first question.  It should be removed, not replaced with an
>     null.  For example I have:
> 
>     This is my <UGLY CHARACTER>data
> 
>     Should then be:
> 
>     This is my data
> 
>      >
>      > On Thu, Mar 31, 2011 at 12:04 PM, Geoffrey Myers
>      > <lists at serioustechnology.com <mailto:lists at serioustechnology.com>
>     <mailto:lists at serioustechnology.com
>     <mailto:lists at serioustechnology.com>>> wrote:
>      >
>      >     So, we are trying to convert a number of postgresql databases
>     that were
>      >     created with SQL_ASCII encoding to UTF8 encoding.  As such, I
>     need to
>      >     strip certain characters out of the data before dumping and
>     reloading
>      >     the new databases.  I'm using the following:
>      >
>      >     data =~ s/(.)/((ord($1) >= 0) && (ord($1) <= 8))
>      >                     || (ord($1) == 11)
>      >                     || ((ord($1) >= 13) && (ord($1) <= 31))
>      >                     || ((ord($1) >= 127)) ?"": $1/egs;
>      >
>      >     Yet I'm getting the following error, which indicates a
>     character that
>      >     should be handled by the above code is still in the data:
>      >
>      >     pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte
>     sequence
>      >     for encoding "UTF8": 0xbd
>      >
>      >
>      >     Certainly, the above code should replace the 0xbd with nothing?
>      >
>      >     Any perl/postgres gurus out there?
>      >
>      >     --
>      >     Until later, Geoffrey
>      >
>      >     "I predict future happiness for America if they can prevent
>      >     the government from wasting the labors of the people under
>      >     the pretense of taking care of them."
>      >     - Thomas Jefferson
>      >     _______________________________________________
>      >     Ale mailing list
>      >     Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
>     <mailto:Ale at ale.org>>
>      >     http://mail.ale.org/mailman/listinfo/ale
>      >     See JOBS, ANNOUNCE and SCHOOLS lists at
>      >     http://mail.ale.org/mailman/listinfo
>      >
>      >
>      >
>      >
>      > --
>      > --
>      > James P. Kinney III
>      > I would rather stumble along in freedom than walk effortlessly in
>     chains.
>      >
>      >
>      >
>      >
>     ------------------------------------------------------------------------
>      >
>      > _______________________________________________
>      > Ale mailing list
>      > Ale at ale.org <mailto:Ale at ale.org>
>      > http://mail.ale.org/mailman/listinfo/ale
>      > See JOBS, ANNOUNCE and SCHOOLS lists at
>      > http://mail.ale.org/mailman/listinfo
> 
> 
>     --
>     Until later, Geoffrey
> 
>     "I predict future happiness for America if they can prevent
>     the government from wasting the labors of the people under
>     the pretense of taking care of them."
>     - Thomas Jefferson
>     _______________________________________________
>     Ale mailing list
>     Ale at ale.org <mailto:Ale at ale.org>
>     http://mail.ale.org/mailman/listinfo/ale
>     See JOBS, ANNOUNCE and SCHOOLS lists at
>     http://mail.ale.org/mailman/listinfo
> 
> 
> 
> 
> -- 
> -- 
> James P. Kinney III
> I would rather stumble along in freedom than walk effortlessly in chains.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo


-- 
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson


More information about the Ale mailing list