Author Topic: Text formatting  (Read 1974 times)

ZeaLitY

  • Entity
  • End of Timer (+10000)
  • Posts: 10795
  • Spring Breeze Dancin'
Text formatting
« on: January 31, 2004, 03:12:28 pm »
I'm trying to replace all instances of <p> with a line break. Unfortunately, when I paste a line break into TextPad, it comes out as /n and does nothing. Is it possible to do this? I'm trying to format the encyclopedia for phpBB.

Ramsus

  • Guest
« Reply #1 on: January 31, 2004, 05:21:51 pm »
Now that you have Perl installed, you can do the following:

Create a folder to work in. Copy and paste all of the files you want to edit into this folder, making sure nothing else is there. Now then, use Notepad to create a new file with the following text:

Code:

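rem Double quotes are required: cmd.exe treats < and > inside single quotes as redirection, and it does not expand *.txt itself.
for %%f in (*.txt) do perl -i.bak -p -e "s/<p.*?>/\n/g; s/<\/p.*?>//g;" "%%f"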


And save it in that folder as htmltotxt.bat. The *.txt means edit all files that have a .txt extension in the current folder.

Then double click it. The folder should now be filled with .bak backups of the original files.

The .txt files should now be stripped of the <p> and </p> tags. I'll explain what the 's/<p.*?>/\n/g; s/<\/p.*?>//g;' means later. It's what's called a regular expression (or regex).
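As a rough preview of what those two substitutions do, here is a small standalone sketch. The sample string and the strip_paragraph_tags name are made up for illustration and are not part of the one-liner above. s/PATTERN/REPLACEMENT/g replaces every match of PATTERN, and .*? means "as few characters as possible", so each match stops at the first > rather than running on to the last one on the line.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same two substitutions as the one-liner.
sub strip_paragraph_tags {
    my ($text) = @_;
    $text =~ s/<p.*?>/\n/g;   # "<p", then as few characters as possible, then ">"
    $text =~ s/<\/p.*?>//g;   # same idea for the closing tag; \/ is an escaped slash
    return $text;
}

# Made-up sample input with two paragraphs on one line.
my $sample = '<p class="intro">First</p><p>Second</p>';
print strip_paragraph_tags($sample), "\n";
```

One caveat: <p.*?> also matches other tags that start with "p", such as <pre>. If that matters for your files, <p\b[^>]*> is a stricter pattern.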

You can also use:

Code:

s/<br.*?>/\n/g;


To replace <br>, <br /> and <br anything> with linebreaks.

So in the end, you'd want to use a .bat file with:
Code:

for %%f in (*.txt) do perl -i.bak -p -e "s/<p.*?>/\n/g; s/<\/p.*?>//g; s/<br.*?>/\n/g;" "%%f"


EDIT: I'll also teach you how to write your own Perl scripts (.pl) later, instead of using .bat files that call Perl from the commandline. This will allow you to do a lot more.

ZeaLitY

« Reply #2 on: January 31, 2004, 06:57:19 pm »
Access Denied?

Ramsus

« Reply #3 on: January 31, 2004, 08:49:27 pm »
Is that supposed to be an error? If so, are the files being used by another program? Did you put them and the .bat file in the same folder by themselves?

Can you please be more detailed? What did you do exactly? Where did it say Access Denied? Was it a dialog box? Was there anything else?

ZeaLitY

« Reply #4 on: February 01, 2004, 12:37:19 am »
I received that when trying the first string. When I tried the third, complete line, I received 'File not Found.' I have followed instructions as given.

Ramsus

« Reply #5 on: February 01, 2004, 01:25:01 am »
Make sure Perl is in your path by going to Start->Run, typing command, hitting enter, and then typing 'perl -v' in the prompt provided. If Perl is properly installed, then it should print some version information for your copy of perl.

EDIT: I'm thinking maybe I should write a simple Windows program for applying regular expressions to text files.

Ramsus

« Reply #6 on: February 01, 2004, 03:40:06 pm »
Well, no matter what, we can be sure of one thing: ActivePerl has set things up so you can run Perl scripts (.pl files) by double clicking on them.

So copy and paste the following into a file called "htmltotxt.pl" or download the script here (rename it to have a .pl extension):

Code:

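#!/usr/bin/perl
use strict;
use warnings;

# Convert each HTML file passed as an argument (or dropped onto this
# script) into a .txt file with phpBB-style tags.
foreach my $file (@ARGV) {
    open(my $in, '<', $file) or die "Could not open file: $file: $!";

    # foo.html becomes foo.txt; any other name just gets .txt appended.
    my $ofile;
    if ($file =~ /^(.*)\.html$/i) {
        $ofile = "$1.txt";
    } else {
        $ofile = "$file.txt";
    }
    open(my $out, '>', $ofile) or die "Could not open output file: $ofile: $!";

    print "Generating $ofile from $file...\n";

    while (my $line = <$in>) {
        chomp($line);
        # /g replaces every tag on the line; /i also catches <P>, <BR>, etc.
        # \b stops <b> from matching <br> or <body>, and <i> from matching <img>.
        $line =~ s/<p\b.*?>/\n/gi;
        $line =~ s/<\/p\b.*?>//gi;
        $line =~ s/<br\b.*?>/\n/gi;
        $line =~ s/<i\b.*?>/[i]/gi;
        $line =~ s/<\/i\b.*?>/[\/i]/gi;
        $line =~ s/<b\b.*?>/[b]/gi;
        $line =~ s/<\/b\b.*?>/[\/b]/gi;
        print $out "$line\n";
    }

    close($out);
    close($in);
}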


To use it, simply drag and drop a file (or multiple files) onto the .pl script. It should generate text files. It tells you what files it generates, so you'll know where to look.
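Because the script only loops over @ARGV, it should also be usable from a command prompt by passing the file names as arguments. One catch: the Windows command prompt does not expand wildcards such as *.html before handing them to Perl, so the script has to do that itself. A minimal sketch, assuming something like this were added near the top of the script (expand_wildcards is a made-up name, not an existing function):

```perl
use strict;
use warnings;

# Hypothetical helper: expand wildcard arguments ourselves, since the
# Windows command prompt passes "*.html" through unexpanded.
sub expand_wildcards {
    my @args = @_;
    my @files;
    for my $arg (@args) {
        if ($arg =~ /[*?]/) {
            push @files, glob($arg);   # glob returns the matching file names
        } else {
            push @files, $arg;         # plain names pass through untouched
        }
    }
    return @files;
}

# Replace the argument list with the expanded one, then show what was found.
@ARGV = expand_wildcards(@ARGV);
print "Would process: $_\n" for @ARGV;
```

With that in place, typing something like perl htmltotxt.pl *.html in the folder that holds the files should hand every matching file to the loop. Note that glob splits its pattern on spaces, so file names containing spaces would need extra care.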

ZeaLitY

« Reply #7 on: February 01, 2004, 08:20:13 pm »
Is there a command line function I can use to accomplish the drag and drop? The system is not letting me do so, though it did with bat files.