[CLUE-Tech] PHP - str_replace, preg_replace, wtf?

David L. Willson DLWillson at TheGeek.NU
Thu Feb 26 15:38:30 MST 2004


I'm trying to strip and clean some data.
These are my goals:
 - Remove all the encoded HTML by decoding it and then stripping it out.
 - Remove all extraneous whitespace ('\r', doubled spaces,
doubled newlines, mixed adjacent whitespace, etc.)

I'll show you a code snippet, then an output snippet.  The bug shows up
from 'Line 1432' of the output onward.  Why won't the spaces near the
'\n's go away?  I'd like to know what I'm doing wrong or, failing that,
a method that works!
-------------------------------------------------------------
#!/usr/bin/php -q

<?php

/**
* @return string
* @param string $strHTML
* @desc Converts ~all~ HTML amp-codes to actual characters.
*/
function html_decode($strHTML) {
   $strHTML = html_entity_decode($strHTML);
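   // Decode leftover numeric entities such as &#45; with a callback
   // (the /e modifier is no longer available as of PHP 7).
   $strPattern = "/&#(\d{1,4});/";
   $strHTML = preg_replace_callback(
      $strPattern,
      function ($m) { return chr((int) $m[1]); },
      $strHTML
   );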
   return $strHTML;
}

$chatlevel = 3; // 0 = no chatting, 3 = chatty

$e_info = 3; $e_warn = 2; $e_err = 1;

$files = glob("/home/dlwillson/gatstuph/data-sql/airland*.sql");

$old[0] = "&#39;" ; $new[0] = "''"   ;
$old[] = "&#160;" ; $new[] = chr(32) ;
$old[] = chr(160) ; $new[] = chr(32) ;
$old[] = "&nbsp;" ; $new[] = chr(32) ;
$old[] = "  "     ; $new[] = chr(32) ;
$old[] = "\\r"    ; $new[] = ""      ;
$old[] = "\r"     ; $new[] = ""      ;
$old[] = "\\n'"   ; $new[] = "'"     ;
$old[] = "'\\n"   ; $new[] = "'"     ;
$old[] = " \\n "  ; $new[] = "\\n"   ;
$old[] = "\\n "   ; $new[] = "\\n"   ;
$old[] = " \\n"   ; $new[] = "\\n"   ;

foreach ( $files as $file_num => $infile) {
//   chat($e_warn, "Begin processing on file: $infile\n");
   $outfile = fopen(str_replace("data-sql","testing",$infile),'w');
   $arrFile = file($infile);
   foreach ( $arrFile as $line_num => $line_text ) {
      chat($e_info, "Line $line_num: $line_text");
      // Single quotes are the most troublesome.  Getting rid of them first.
      $line_text = rtrim($line_text);
      while ($line_text <> str_replace($old,$new,$line_text)) {
         $line_text = str_replace($old,$new,$line_text);
      }
      $line_text = html_decode($line_text);
      $line_text = strip_tags($line_text);
      while ($line_text <> str_replace("\\n\\n", "\\n", $line_text)) {
         $line_text = str_replace("\\n\\n", "\\n", $line_text);
      }
      $line_text .= "\n";
      chat($e_info, "Line $line_num: $line_text");
      fwrite($outfile, $line_text);
   }
   fclose($outfile);
}
?>
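
The snippet calls chat() without showing its definition. The sketch below
is a hypothetical stand-in, not the original helper. It assumes a message
is printed only when its level does not exceed $chatlevel, which is one
reading of the "0 = no chatting, 3 = chatty" comment.

```php
<?php
// Hypothetical stand-in for the chat() helper the script calls.
// Assumption: lower numbers are more important ($e_err = 1), and a
// message is printed when its level is at or below the global $chatlevel.
function chat($level, $message) {
    global $chatlevel;
    if ($level <= $chatlevel) {
        echo $message;
    }
}
```

With $chatlevel = 3, every call in the script above would print, which
matches each line appearing twice in the output (before and after
cleaning).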
--------------------------------Output------------------------------------
Line 1408: INSERT INTO MSTR_ProductRefs (item_id, "ProductRefID",
"ProductRefName")
Line 1408: INSERT INTO MSTR_ProductRefs (item_id, "ProductRefID",
"ProductRefName")
Line 1409:     VALUES ('wiannircocra', 'index', 'Home');
Line 1409:  VALUES ('wiannircocra', 'index', 'Home');
Line 1410: INSERT INTO MSTR_ProductRefs (item_id, "ProductRefID",
"ProductRefName")
Line 1410: INSERT INTO MSTR_ProductRefs (item_id, "ProductRefID",
"ProductRefName")
Line 1411:     VALUES ('wiannircocra', 'racobo1', 'Radio Control
Boats');
Line 1411:  VALUES ('wiannircocra', 'racobo1', 'Radio Control Boats');
Line 1412: INSERT INTO MSTR_Products (storename, tablename, table_id,
item_id,
Line 1412: INSERT INTO MSTR_Products (storename, tablename, table_id,
item_id,
Line 1413:     "product-url",
Line 1413:  "product-url",
Line 1414:     "name",
Line 1414:  "name",
Line 1415:     "image",
Line 1415:  "image",
Line 1416:     "code",
Line 1416:  "code",
Line 1417:     "price",
Line 1417:  "price",
Line 1418:     "sale-price",
Line 1418:  "sale-price",
Line 1419:     "orderable",
Line 1419:  "orderable",
Line 1420:     "caption",
Line 1420:  "caption",
Line 1421:     "features",
Line 1421:  "features",
Line 1422:     "specification",
Line 1422:  "specification",
Line 1423:     "taxable")
Line 1423:  "taxable")
Line 1424:   VALUES ('RCGATELYS', 'item.', 'solidtype', 'wiannircocra',
Line 1424:  VALUES ('RCGATELYS', 'item.', 'solidtype', 'wiannircocra',
Line 1425:     'http://store.yahoo.com/rcgatelys/wiannircocra.html',
Line 1425:  'http://store.yahoo.com/rcgatelys/wiannircocra.html',
Line 1426:     'Wicked Angel Nitro R/C Ocean Racer',
Line 1426:  'Wicked Angel Nitro R/C Ocean Racer',
Line 1427:     'http://edit.store.yahoo.com/I/rcgatelys_1772_14861613',
Line 1427:  'http://edit.store.yahoo.com/I/rcgatelys_1772_14861613',
Line 1428:     'MTC6801, MTC7500',
Line 1428:  'MTC6801, MTC7500',
Line 1429:     '$425.00',
Line 1429:  '$425.00',
Line 1430:     '$354.99',
Line 1430:  '$354.99',
Line 1431:     'T',
Line 1431:  'T',
Line 1432:     'The ready&#45;to&#45;race Wicked Angel is a
pro&#45;competition nitro ocean racer. This 30&#45;inch long, 35MPH boat
featres Megatech&#39;s 2.7cc Nitro Mariner engine with unique AquaRam
water cooling feature. From stem to stern, every design feature found on
this model was put there for one reason&#45;to make it go
faste!&#60;br&#62;\r\n&#60;p&#62;Wicked Angel&#39;s modified V hull
design featuring multiple planning strakes for breaking up wate surface
tension, the biggest drag&#45;causing factor on any race boat, and the
Wicked Angel&#39;s planing strakes are placed at serveral points on the
hull&#39;s bottom to defeat this speed&#45;robbing phenomenon. As a
result&#45;this boat literallly flies over the water&#39;s
surface.&#60;br&#62;\r\n	&#60;br&#62;\r\n&#60;/p&#62;\r\n&#60;p&#62;Other full race features include heavy&#45;duty drive shaft with bronze phosphorous bushings, triple shoe centrifugal clutch, quick&#45;fill racing tank, shaft oiler reservoir and sealed radio box. The Wicked Angel comes fully assembled with engine and 2 channel radio installed and guaranteed to ecite any onlookers!&#60;br&#62;\r\n&#60;/p&#62;\r\n&#60;p&#62;&#60;/p&#62;',
Line 1432:  'The ready-to-race Wicked Angel is a pro-competition nitro
ocean racer. This 30-inch long, 35MPH boat featres Megatech''s 2.7cc
Nitro Mariner engine with unique AquaRam water cooling feature. From
stem to stern, every design feature found on this model was put there
for one reason-to make it go faste!\nWicked Angel''s modified V hull
design featuring multiple planning strakes for breaking up wate surface
tension, the biggest drag-causing factor on any race boat, and the
Wicked Angel''s planing strakes are placed at serveral points on the
hull''s bottom to defeat this speed-robbing phenomenon. As a result-this
boat literallly flies over the water''s surface.\n	\nOther full race
features include heavy-duty drive shaft with bronze phosphorous
bushings, triple shoe centrifugal clutch, quick-fill racing tank, shaft
oiler reservoir and sealed radio box. The Wicked Angel comes fully
assembled with engine and 2 channel radio installed and guaranteed to
ecite any onlookers!\n',
Line 1433:     '&#60;li type=&#34;disc&#34;&#62;M16 1+ HP Nitro Water
Cooled Motor &#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;Tuned
Exhaust System &#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;Custom
tuned High Speed Prop &#60;br&#62;\r\n	&#60;li
type=&#34;disc&#34;&#62;Water proof Sealed Radio Box
&#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;35+ mph Out Of The Box
&#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;Adjustable trim tabs
and skid fins &#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;15&#45;20
Minute run time &#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;2
Channel FM Radio &#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;Gas
Completer Combo &#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;Fuel
&#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;Glow ignitor
&#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;Glow Plug,
&#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;Glow Plug Wrench
&#60;br&#62;\r\n	&#60;li type=&#34;disc&#34;&#62;12 AA Batteries
\r\n        &#60;li type=&#34;disc&#34;&#62;Gas Completer Combo
(MTC7500) Fuel &#60;br&#62;',
Line 1433:  'M16 1+ HP Nitro Water Cooled Motor \n	Tuned Exhaust System
\n	Custom tuned High Speed Prop \n	Water proof Sealed Radio Box \n	35+
mph Out Of The Box \n	Adjustable trim tabs and skid fins \n	15-20 Minute
run time \n	2 Channel FM Radio \n	Gas Completer Combo \n	Fuel \n	Glow
ignitor \n	Glow Plug, \n	Glow Plug Wrench \n	12 AA Batteries\nGas
Completer Combo (MTC7500) Fuel ',
Line 1434:     'Hull Length :  30&#38;quot; &#60;br&#62;Weight:   48 oz
&#60;br&#62;',
Line 1434:  'Hull Length : 30&quot; Weight: 48 oz ',
Line 1435:     'T');
Line 1435:  'T');
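
One possible reading of the output above, offered as a sketch rather than
a confirmed diagnosis: the $old/$new pass runs before html_decode() and
strip_tags(), so at that point a space is still separated from the
literal '\n' by '&#60;br&#62;', and the character after '\r\n' often
appears to be a tab, which none of the patterns match. Doing the
whitespace cleanup last avoids both. The name clean_line() is
hypothetical, and the code assumes UTF-8 input and PHP 5.3 or later.

```php
<?php
// Hypothetical helper, not part of the original script.
// Assumes UTF-8 input and that '\r' and '\n' appear in the data as
// literal two-character sequences (backslash + letter), as in the dump.
function clean_line($text) {
    $nl = '\n';  // literal backslash + n (single-quoted string)

    // Double the encoded apostrophes for SQL, as the original does.
    $text = str_replace('&#39;', "''", $text);

    // Decode entities and strip tags BEFORE touching whitespace, so that
    // whitespace exposed by removing <br>, <p>, <li> gets cleaned too.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $text = strip_tags($text);

    // Drop carriage returns, both literal '\r' and real CR bytes.
    $text = str_replace(array('\r', "\r"), '', $text);

    // Treat non-breaking spaces as ordinary spaces, then collapse runs
    // of spaces and tabs into one space.
    $text = str_replace("\xC2\xA0", ' ', $text);
    $text = preg_replace('/[ \t]+/', ' ', $text);

    // Collapse each run of literal '\n' markers, plus any spaces or tabs
    // around them, into a single marker.
    $pattern = '/[ \t]*(?:' . preg_quote($nl, '/') . '[ \t]*)+/';
    $text = preg_replace_callback($pattern, function ($m) use ($nl) {
        return $nl;
    }, $text);

    // Remove a marker that touches a quote delimiter.
    $text = str_replace(array($nl . "'", "'" . $nl), "'", $text);

    return rtrim($text);
}
```

In the loop above this would replace the two while/str_replace passes,
html_decode() and strip_tags() with a single call, for example
$line_text = clean_line($line_text) . "\n";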



