[CLUE-Tech] Re: PHP - str_replace, preg_replace, wtf?
David L. Willson
DLWillson at TheGeek.NU
Fri Feb 27 17:02:30 MST 2004
I fixed it. Apparently, my newlines and backslash n's weren't making it
all the way through to the replacement engine. I needed to double my
backslashes. As a bonus, when I replaced str_replace with preg_replace,
I was able to reduce the number of elements in the replacement arrays.
The program looks like this now:
-----------------------------------------------------------------
#!/usr/bin/php -q
<?php
/**
* @return string
* @param string $strHTML
* @desc Converts ~all~ HTML amp-codes to actual characters.
*/
function html_decode($strHTML) {
$strHTML = html_entity_decode($strHTML);
$strPattern = "/&#(\d{1,4});/e";
$strReplace = "chr($1)";
$strHTML = preg_replace($strPattern,$strReplace,$strHTML);
return $strHTML;
}
/**
* @return void
* @param int $urgency (0 for chatter, 3 for critical errors)
* @param any $message
* @desc Show something, if it is more urgent than the threshold
*/
function chat ($urgency,$message) {
global $chatlevel;
if ($chatlevel >= $urgency){
/*
What I ~should~ do is check if $message is a string
with a \n for the last char, and echo a \n, if not.
*/
print_r($message);
}
}
$chatlevel = 3; // 0 = no chatting, 3 = chatty
$e_info = 3; $e_warn = 2; $e_err = 1;
$files = glob("/home/dlwillson/gatstuph/example.*");
$old[0] = "/'/" ; $new[0] = "''" ;
$old[] = "/ /" ; $new[] = chr(32) ;
$old[] = "/".chr(160)."/" ; $new[] = chr(32) ;
$old[] = "/ /" ; $new[] = chr(32) ;
$old[] = "/ \ +/" ; $new[] = chr(32) ;
$old[] = "/^ +/" ; $new[] = "" ;
$old[] = "/ '/" ; $new[] = "'" ;
$old[] = "/\\\\r/" ; $new[] = "" ;
$old[] = "/\\r/" ; $new[] = "" ;
$old[] = "/\\\\n'/" ; $new[] = "'" ;
$old[] = "/'\\\\n/" ; $new[] = "'" ;
$old[] = "/\s*\\\\n\s*/" ; $new[] = "\\\\n" ;
$old[] = "/\\\\n\\\\n/" ; $new[] = "\\\\n" ;
$old[] = "/\s*\\n\s*/" ; $new[] = "\\n" ;
$old[] = "/\\n\\n+/" ; $new[] = "\\n" ;
print_r($old);
print_r($new);
foreach ( $files as $file_num => $infile) {
chat($e_warn, "Begin processing on file $file_num: $infile\n");
$outfile = fopen(str_replace("gatstuph","gatstuph/testing",$infile),'w');
$arrFile = file($infile);
//print_r($arrFile);
foreach ( $arrFile as $line_num => $line_text ) {
chat($e_info, "$line_num:$line_text");
// Single-quotes are most troublesome. Getting rid of them first.
$line_text = rtrim($line_text);
$newString = preg_replace($old,$new,$line_text);
while ($line_text <> $newString) {
$line_text = $newString;
$newString = preg_replace($old,$new,$line_text);
}
chat($e_info, "$line_num:$line_text\n");
$newString = html_decode($line_text);
while ($line_text <> $newString) {
$line_text = $newString;
$newString = html_decode($line_text);
}
$line_text = strip_tags($line_text);
chat($e_info, "$line_num:$line_text\n");
$newString = preg_replace($old,$new,$line_text);
while ($line_text <> $newString) {
$line_text = $newString;
$newString = preg_replace($old,$new,$line_text);
}
$line_text .= "\n";
chat($e_info, "$line_num:$line_text\n");
fwrite($outfile, $line_text);
}
fclose($outfile); // release the handle before opening the next output file
}
?>
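
To illustrate the backslash doubling described at the top of this message, here is a minimal sketch that is not part of the program above. The sample string and the variable names ($text, $literalPattern, $newlinePattern) are made up for illustration, and it assumes the data holds a literal backslash followed by the letter n (two characters) rather than a real newline.

```php
<?php
// Sample data: single quotes keep \n as two characters (backslash, n).
$text = 'Cooled Motor \n Tuned Exhaust System';

// In a double-quoted PHP string, "\\\\n" becomes the three characters \\n.
// The regex engine then reads that as: one literal backslash, then "n".
$literalPattern = "/\s*\\\\n\s*/";

// "\\n" reaches the regex engine as \n, which matches a real newline
// (byte 0x0A), not the two-character sequence found in $text.
$newlinePattern = "/\s*\\n\s*/";

// In the replacement, '\\\\n' is the three characters \\n, which
// preg_replace emits as a single literal backslash followed by "n".
$fixed     = preg_replace($literalPattern, '\\\\n', $text);
$unchanged = preg_replace($newlinePattern, 'X', $text);

echo $fixed, "\n";      // spaces around the literal \n are gone
echo $unchanged, "\n";  // no real newline in $text, so nothing changes
```

The single pattern with \s* on both sides covers the " \n ", "\n " and " \n" cases that needed three separate str_replace entries.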
On Thu, 2004-02-26 at 15:38, David L. Willson wrote:
> I'm trying to do some stripping and cleaning of data.
> These are my goals:
> - Remove all the encoded HTML by decoding it, and then stripping it
> out.
> - Remove all extraneous whitespace ('\r', doubled spaces,
> doubled-newlines, mixed adjacent whitespace, etc...)
>
> I'll show you a code snippet, then an output snippet. The bugger is
> exhibited from 'Line 1432' of the output onward. Why won't the spaces
> near the '\n's go away? I'd like to know what I'm doing wrong, or
> failing that, a method that works!
> -------------------------------------------------------------
> #!/usr/bin/php -q
>
> <?php
>
> /**
> * @return string
> * @param string $strHTML
> * @desc Converts ~all~ HTML amp-codes to actual characters.
> */
> function html_decode($strHTML) {
> $strHTML = html_entity_decode($strHTML);
> $strPattern = "/&#(\d{1,4});/e";
> $strReplace = "chr($1)";
> $strHTML = preg_replace($strPattern,$strReplace,$strHTML);
> return $strHTML;
> }
>
> $chatlevel = 3; // 0 = no chatting, 3 = chatty
>
> $e_info = 3; $e_warn = 2; $e_err = 1;
>
> $files = glob("/home/dlwillson/gatstuph/data-sql/airland*.sql");
>
> $old[0] = "'" ; $new[0] = "''" ;
> $old[] = " " ; $new[] = chr(32) ;
> $old[] = chr(160) ; $new[] = chr(32) ;
> $old[] = " " ; $new[] = chr(32) ;
> $old[] = " " ; $new[] = chr(32) ;
> $old[] = "\\r" ; $new[] = "" ;
> $old[] = "\r" ; $new[] = "" ;
> $old[] = "\\n'" ; $new[] = "'" ;
> $old[] = "'\\n" ; $new[] = "'" ;
> $old[] = " \\n " ; $new[] = "\\n" ;
> $old[] = "\\n " ; $new[] = "\\n" ;
> $old[] = " \\n" ; $new[] = "\\n" ;
>
> foreach ( $files as $file_num => $infile) {
> // chat($e_warn, "Begin processing on file: $infile\n");
> $outfile = fopen(str_replace("data-sql","testing",$infile),'w');
> $arrFile = file($infile);
> foreach ( $arrFile as $line_num => $line_text ) {
> chat($e_info, "Line $line_num: $line_text");
> // Single-quotes are most troublesome. Getting rid of them first.
> $line_text = rtrim($line_text);
> while ($line_text <> str_replace($old,$new,$line_text)) {
> $line_text = str_replace($old,$new,$line_text);
> }
> $line_text = html_decode($line_text);
> $line_text = strip_tags($line_text);
> while ($line_text <> str_replace("\\n\\n", "\\n", $line_text)) {
> $line_text = str_replace("\\n\\n", "\\n", $line_text);
> }
> $line_text .= "\n";
> chat($e_info, "Line $line_num: $line_text");
> fwrite($outfile, $line_text);
> }
> }
> ?>
> --------------------------------Output------------------------------------
> Line 1408: INSERT INTO MSTR_ProductRefs (item_id, "ProductRefID",
> "ProductRefName")
> Line 1408: INSERT INTO MSTR_ProductRefs (item_id, "ProductRefID",
> "ProductRefName")
> Line 1409: VALUES ('wiannircocra', 'index', 'Home');
> Line 1409: VALUES ('wiannircocra', 'index', 'Home');
> Line 1410: INSERT INTO MSTR_ProductRefs (item_id, "ProductRefID",
> "ProductRefName")
> Line 1410: INSERT INTO MSTR_ProductRefs (item_id, "ProductRefID",
> "ProductRefName")
> Line 1411: VALUES ('wiannircocra', 'racobo1', 'Radio Control
> Boats');
> Line 1411: VALUES ('wiannircocra', 'racobo1', 'Radio Control Boats');
> Line 1412: INSERT INTO MSTR_Products (storename, tablename, table_id,
> item_id,
> Line 1412: INSERT INTO MSTR_Products (storename, tablename, table_id,
> item_id,
> Line 1413: "product-url",
> Line 1413: "product-url",
> Line 1414: "name",
> Line 1414: "name",
> Line 1415: "image",
> Line 1415: "image",
> Line 1416: "code",
> Line 1416: "code",
> Line 1417: "price",
> Line 1417: "price",
> Line 1418: "sale-price",
> Line 1418: "sale-price",
> Line 1419: "orderable",
> Line 1419: "orderable",
> Line 1420: "caption",
> Line 1420: "caption",
> Line 1421: "features",
> Line 1421: "features",
> Line 1422: "specification",
> Line 1422: "specification",
> Line 1423: "taxable")
> Line 1423: "taxable")
> Line 1424: VALUES ('RCGATELYS', 'item.', 'solidtype', 'wiannircocra',
> Line 1424: VALUES ('RCGATELYS', 'item.', 'solidtype', 'wiannircocra',
> Line 1425: 'http://store.yahoo.com/rcgatelys/wiannircocra.html',
> Line 1425: 'http://store.yahoo.com/rcgatelys/wiannircocra.html',
> Line 1426: 'Wicked Angel Nitro R/C Ocean Racer',
> Line 1426: 'Wicked Angel Nitro R/C Ocean Racer',
> Line 1427: 'http://edit.store.yahoo.com/I/rcgatelys_1772_14861613',
> Line 1427: 'http://edit.store.yahoo.com/I/rcgatelys_1772_14861613',
> Line 1428: 'MTC6801, MTC7500',
> Line 1428: 'MTC6801, MTC7500',
> Line 1429: '$425.00',
> Line 1429: '$425.00',
> Line 1430: '$354.99',
> Line 1430: '$354.99',
> Line 1431: 'T',
> Line 1431: 'T',
> Line 1432: 'The ready-to-race Wicked Angel is a
> pro-competition nitro ocean racer. This 30-inch long, 35MPH boat
> featres Megatech's 2.7cc Nitro Mariner engine with unique AquaRam
> water cooling feature. From stem to stern, every design feature found on
> this model was put there for one reason-to make it go
> faste!<br>\r\n<p>Wicked Angel's modified V hull
> design featuring multiple planning strakes for breaking up wate surface
> tension, the biggest drag-causing factor on any race boat, and the
> Wicked Angel's planing strakes are placed at serveral points on the
> hull's bottom to defeat this speed-robbing phenomenon. As a
> result-this boat literallly flies over the water's
> surface.<br>\r\n <br>\r\n</p>\r\n<p>Other full race features include heavy-duty drive shaft with bronze phosphorous bushings, triple shoe centrifugal clutch, quick-fill racing tank, shaft oiler reservoir and sealed radio box. The Wicked Angel comes fully assembled with engine and 2 channel radio installed and guaranteed to ecite any onlookers!<br>\r\n</p>\r\n<p></p>',
> Line 1432: 'The ready-to-race Wicked Angel is a pro-competition nitro
> ocean racer. This 30-inch long, 35MPH boat featres Megatech''s 2.7cc
> Nitro Mariner engine with unique AquaRam water cooling feature. From
> stem to stern, every design feature found on this model was put there
> for one reason-to make it go faste!\nWicked Angel''s modified V hull
> design featuring multiple planning strakes for breaking up wate surface
> tension, the biggest drag-causing factor on any race boat, and the
> Wicked Angel''s planing strakes are placed at serveral points on the
> hull''s bottom to defeat this speed-robbing phenomenon. As a result-this
> boat literallly flies over the water''s surface.\n \nOther full race
> features include heavy-duty drive shaft with bronze phosphorous
> bushings, triple shoe centrifugal clutch, quick-fill racing tank, shaft
> oiler reservoir and sealed radio box. The Wicked Angel comes fully
> assembled with engine and 2 channel radio installed and guaranteed to
> ecite any onlookers!\n',
> Line 1433: '<li type="disc">M16 1+ HP Nitro Water
> Cooled Motor <br>\r\n <li type="disc">Tuned
> Exhaust System <br>\r\n <li type="disc">Custom
> tuned High Speed Prop <br>\r\n <li
> type="disc">Water proof Sealed Radio Box
> <br>\r\n <li type="disc">35+ mph Out Of The Box
> <br>\r\n <li type="disc">Adjustable trim tabs
> and skid fins <br>\r\n <li type="disc">15-20
> Minute run time <br>\r\n <li type="disc">2
> Channel FM Radio <br>\r\n <li type="disc">Gas
> Completer Combo <br>\r\n <li type="disc">Fuel
> <br>\r\n <li type="disc">Glow ignitor
> <br>\r\n <li type="disc">Glow Plug,
> <br>\r\n <li type="disc">Glow Plug Wrench
> <br>\r\n <li type="disc">12 AA Batteries
> \r\n <li type="disc">Gas Completer Combo
> (MTC7500) Fuel <br>',
> Line 1433: 'M16 1+ HP Nitro Water Cooled Motor \n Tuned Exhaust System
> \n Custom tuned High Speed Prop \n Water proof Sealed Radio Box \n 35+
> mph Out Of The Box \n Adjustable trim tabs and skid fins \n 15-20 Minute
> run time \n 2 Channel FM Radio \n Gas Completer Combo \n Fuel \n Glow
> ignitor \n Glow Plug, \n Glow Plug Wrench \n 12 AA Batteries\nGas
> Completer Combo (MTC7500) Fuel ',
> Line 1434: 'Hull Length : 30&quot; <br>Weight: 48 oz
> <br>',
> Line 1434: 'Hull Length : 30" Weight: 48 oz ',
> Line 1435: 'T');
> Line 1435: 'T');
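
A note for anyone running html_decode() on a current PHP: the /e pattern modifier it relies on was deprecated in PHP 5.5 and removed in PHP 7.0. The following is a rough sketch of one possible equivalent using preg_replace_callback(). The name html_decode_cb is hypothetical, and the sketch assumes single-byte output, since chr() only yields one byte per call.

```php
<?php
/**
 * Sketch of html_decode() without the /e modifier.
 * html_decode_cb is a made-up name; output is assumed to be single-byte.
 *
 * @param string $strHTML
 * @return string
 */
function html_decode_cb($strHTML) {
    // First pass: named and numeric entities, e.g. &amp; becomes &.
    $strHTML = html_entity_decode($strHTML);

    // Second pass: numeric entities exposed by the first pass,
    // e.g. &amp;#39; became &#39; and is turned into a character here.
    return preg_replace_callback(
        '/&#(\d{1,4});/',
        function ($m) {
            return chr((int) $m[1]);
        },
        $strHTML
    );
}

echo html_decode_cb('Megatech&amp;#39;s'), "\n"; // prints Megatech's
```

Text that is encoded more than twice still needs the repeat-until-unchanged loop used in the program.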