Chrono Compendium

Zenan Plains - Site Discussion => General Discussion => Topic started by: ZeaLitY on January 31, 2004, 03:12:28 pm

Title: Text formatting
Post by: ZeaLitY on January 31, 2004, 03:12:28 pm
I'm trying to replace all instances of <p> with a line break. Unfortunately, when I paste a line break into TextPad, it comes out as /n and does nothing. Is it possible to do this? I'm trying to format the encyclopedia for phpBB.
Title: Text formatting
Post by: Ramsus on January 31, 2004, 05:21:51 pm
Now that you have Perl installed, you can do the following:

Create a folder to work in. Copy and paste all of the files you want to edit into this folder, making sure nothing else is there. Now then, use Notepad to create a new file with the following text:

Code: [Select]

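perl -i.bak -p -e "BEGIN { @ARGV = map { glob($_) } @ARGV } s/<p.*?>/\n/g; s/<\/p.*?>//g;" *.txt


And save it in that folder as htmltotxt.bat. The *.txt means edit all files that have a .txt extension in the current folder. The BEGIN block expands that wildcard inside Perl, because the Windows command prompt does not expand it for you, and the double quotes keep the prompt from treating < and > as file redirection.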

Then double click it. The folder should now be filled with .bak backups of the original files.

The .txt files should now be stripped of the <p> and </p> tags. I'll explain what the 's/<p.*?>/\n/g; s/<\/p.*?>//g;' means later. It's what's called a regular expression (or regex).
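
As a rough illustration of what those two substitutions do, here is a small sketch you can run on its own. The sample string and the sub name strip_paragraph_tags are made up for this example and are not part of the .bat file.

```perl
use strict;
use warnings;

# Apply the same two substitutions as the one-liner to a single string.
sub strip_paragraph_tags {
    my ($text) = @_;
    $text =~ s/<p.*?>/\n/g;    # <p>, <p class="x">, ... each become a newline
    $text =~ s/<\/p.*?>//g;    # </p> is simply deleted
    return $text;
}

# Illustrative input only.
my $sample = '<p class="intro">Hello</p><p>World</p>';
print strip_paragraph_tags($sample), "\n";
```

The .*? part means "as few characters as possible up to the next >", and the trailing g means "replace every match, not just the first". Note that <p.*?> matches any tag that starts with <p, so a tag such as <pre> would be turned into a line break as well.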

You can also use:

Code: [Select]

s/<br.*?>/\n/g;


To replace <br>, <br /> and <br anything> with linebreaks.

So in the end, you'd want to use a .bat file with:
Code: [Select]

perl -i.bak -p -e "BEGIN { @ARGV = map { glob($_) } @ARGV } s/<p.*?>/\n/g; s/<\/p.*?>//g; s/<br.*?>/\n/g;" *.txt


EDIT: I'll also teach you how to write your own Perl scripts (.pl) later, instead of using .bat files that call Perl from the commandline. This will allow you to do a lot more.
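
To give an idea of where that is heading, here is a minimal sketch of the same job written as a standalone .pl script. The sub names html_to_text and convert_file are hypothetical, and the sketch assumes the Windows command prompt hands *.txt to Perl unexpanded, which is why it expands wildcards itself with bsd_glob from the core File::Glob module.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Apply the three substitutions from the .bat file to a string.
sub html_to_text {
    my ($text) = @_;
    $text =~ s/<p.*?>/\n/g;
    $text =~ s/<\/p.*?>//g;
    $text =~ s/<br.*?>/\n/g;
    return $text;
}

# Rewrite one file in place, keeping the original as <name>.bak.
sub convert_file {
    my ($file) = @_;

    open(my $in, '<', $file) or die "Could not open $file: $!";
    my @lines = <$in>;
    close($in);

    rename($file, "$file.bak") or die "Could not back up $file: $!";

    open(my $out, '>', $file) or die "Could not write $file: $!";
    print {$out} html_to_text($_) for @lines;
    close($out) or die "Could not close $file: $!";
}

# Expand wildcards such as *.txt here, in case the shell did not.
convert_file($_) for map { bsd_glob($_) } @ARGV;
```

Saved under a name of your choosing, it would be run from the folder holding the files with the script name followed by *.txt.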
Title: Text formatting
Post by: ZeaLitY on January 31, 2004, 06:57:19 pm
Access Denied?
Title: Text formatting
Post by: Ramsus on January 31, 2004, 08:49:27 pm
Is that supposed to be an error? If so, are the files being used by another program? Did you put them and the .bat file in the same folder by themselves?

Can you please be more detailed? What did you do exactly? Where did it say Access Denied? Was it a dialog box? Was there anything else?
Title: Text formatting
Post by: ZeaLitY on February 01, 2004, 12:37:19 am
I received that when trying the first string. When I tried the third, complete line, I received 'File not Found.' I have followed instructions as given.
Title: Text formatting
Post by: Ramsus on February 01, 2004, 01:25:01 am
Make sure Perl is in your path by going to Start->Run, typing command, hitting Enter, and then typing 'perl -v' at the prompt that appears. If Perl is properly installed, it should print some version information for your copy of Perl.

EDIT: I'm thinking maybe I should write a simple Windows program for applying regular expressions to text files.
Title: Text formatting
Post by: Ramsus on February 01, 2004, 03:40:06 pm
Well, no matter what, we can be sure of one thing: ActivePerl has set things up so you can run Perl scripts (.pl files) by double clicking on them.

So copy and paste the following into a file called "htmltotxt.pl" or download the script here (http://www.chronocompendium.com/htmltotxt.pl.txt) (rename it to have a .pl extension):

Code: [Select]

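#!/usr/bin/perl
use strict;
use warnings;

# Convert each HTML file named in @ARGV into a .txt file with phpBB tags.
foreach my $file (@ARGV) {
    open(my $in, '<', $file) or die "Could not open file: $file: $!";

    # foo.html becomes foo.txt; any other name gets .txt appended.
    my $ofile;
    if ($file =~ /^(.*)\.html$/) {
        $ofile = "$1.txt";
    } else {
        $ofile = "$file.txt";
    }
    open(my $out, '>', $ofile) or die "Could not open output file: $ofile: $!";

    print "Generating $ofile from $file...\n";

    while (my $line = <$in>) {
        chomp($line);
        # /g replaces every tag on the line, not just the first one.
        # \b stops <p, <i and <b from also matching <pre>, <img>, <body>, etc.
        $line =~ s/<p\b.*?>/\n/g;
        $line =~ s/<\/p\b.*?>//g;
        $line =~ s/<br\b.*?>/\n/g;
        $line =~ s/<i\b.*?>/[i]/g;
        $line =~ s/<\/i\b.*?>/[\/i]/g;
        $line =~ s/<b\b.*?>/[b]/g;
        $line =~ s/<\/b\b.*?>/[\/b]/g;
        print {$out} "$line\n";
    }

    close($out);
    close($in);
}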


To use it, simply drag and drop a file (or multiple files) onto the .pl script. It should generate text files. It tells you what files it generates, so you'll know where to look.
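
The script only looks at the file names it is handed in @ARGV, so typing the names after the script name at a prompt should have the same effect as dropping the files onto it. Here is a small sketch of that side of things. The helper name output_name is hypothetical, and it only mirrors the naming rule described above rather than converting anything.

```perl
use strict;
use warnings;

# Hypothetical helper: foo.html becomes foo.txt, and any other name
# gets .txt appended.
sub output_name {
    my ($file) = @_;
    return $file =~ /^(.*)\.html$/ ? "$1.txt" : "$file.txt";
}

# Print a reminder when no file names were supplied at all.
if (!@ARGV) {
    print "Usage: perl htmltotxt.pl file1.html [file2.html ...]\n";
}

# Show which output file each input name would map to.
foreach my $file (@ARGV) {
    print "$file -> ", output_name($file), "\n";
}
```

From a command prompt opened in the folder that holds the script, the real script would be started as: perl htmltotxt.pl page1.html page2.html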
Title: Text formatting
Post by: ZeaLitY on February 01, 2004, 08:20:13 pm
Is there a command line function I can use to accomplish the drag and drop? The system is not letting me do so, though it did with bat files.