One Late Night With Perl (aka search and replace on big large huge files)

One Late Night With Perl: A Troubleshooting Story

A few weeks ago we needed to do a search and replace on a few hundred IIS web log files. Each file was between 100MB and 2GB. We’re an all-Windows shop here, so I figured I needed to download a tool. With a GUI. And many buttons I can click.

I first tried numerous “Search and Replace” tools from download.com and other shareware sites. SuperSearch, WinGrep, WGrep, Advanced Search and Replace, Fast Search, etc, etc. I must have tried out 6 or 7 different apps on a small subset. They ranged from bleh to moderate to very useful for smaller files, but even in the best case it would have taken anywhere from 12 hours to 2 days to update all this data. Did I mention that it was 9pm and we needed to update reports for the following morning?

I finally found TextPipe…it looked promising, but we had to buy a copy in order to update all the files. The web site said we’d get a registered version in 15 minutes, so I popped in my CC, got my “thanks for ordering” email, and waited. And waited. After 20 minutes I got an email telling me my order would be fulfilled in 4 hours. WTF!?

So it was back to square one, and I figured I’d try out Perl. BTW I don’t know Perl from Clipper. I downloaded ActivePerl (which is free) and started hitting the newsgroups. There were numerous samples, but all for Unix, Linux, etc…nothing for Windows. Anyhow, I finally found a sample that would work:

== replace.bat ==

rem replace "foo" with "bar" in *.log

for %%n in (*.log) do perl -pi.bak -e "s/foo/bar/g" %%n

== EOF ==
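
For anyone wondering what those switches do, here is roughly the same thing written out long-hand as a Perl script. This is a sketch for illustration, not what was run that night, and the name replace_in_file is made up. The -i.bak switch edits the file in place and keeps the original with a .bak extension, -p wraps the code in a read-a-line, print-a-line loop, and -e supplies the code to run on each line. Because it only ever holds one line in memory, the size of the file doesn't matter much.

```perl
use strict;
use warnings;

# Long-hand sketch of: perl -pi.bak -e "s/foo/bar/g" FILE
# replace_in_file is a hypothetical name, not part of Perl itself.
sub replace_in_file {
    my ($path) = @_;
    local $^I   = '.bak';    # -i.bak: edit in place, keep the original as FILE.bak
    local @ARGV = ($path);   # the file that <> will read
    while (my $line = <>) {       # -p: loop over every line of the file...
        $line =~ s/foo/bar/g;     # -e: ...run the substitution on it...
        print $line;              # -p: ...and print it into the rewritten file
    }
    return;
}

# Usage (script name is an assumption): perl replace.pl one.log two.log
my @files = @ARGV;
replace_in_file($_) for @files;
```

The Windows command prompt does not expand *.log for you the way a Unix shell does, which is why the batch file needs the for loop around the perl call.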

And it was fast as hell! I was hoping for something that would recurse through directories, but I didn’t have time to find it. So instead I just made a batch file kinda like this:

for %%n in (f:\w3svc1\*.log) do perl -pi.bak -e "s/foo/bar/g" %%n

for %%n in (f:\w3svc2\*.log) do perl -pi.bak -e "s/foo/bar/g" %%n

for %%n in (f:\w3svc3\*.log) do perl -pi.bak -e "s/foo/bar/g" %%n

etc. There were only a dozen directories, so no big deal. Hmm, spend 2 hours learning Perl so you can write a function to recurse 10 subdirectories, or write 10 lines in a batch file? 10:30pm, you tell me. I added a few lines to the batch file to write out timestamps when it finished each directory, kicked it off, & cabbed home. I didn’t know how long it would take, but it was the best I could do for the time being.
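
With more time, the recursive version might look something like the sketch below. It uses File::Find, which ships with Perl, to collect every .log file under a directory and then runs the same in-place substitution over them, printing a timestamp as each top-level directory finishes. The name replace_in_tree and the script name are assumptions for illustration, and the foo/bar pattern is the same placeholder as in the batch file.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: run s/foo/bar/g on every *.log file under $root,
# including subdirectories, keeping a .bak copy of each original.
# Returns the number of files rewritten.
sub replace_in_tree {
    my ($root) = @_;
    my @logs;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path of each entry
            wanted   => sub { push @logs, $_ if -f $_ && /\.log\z/i },
        },
        $root
    );
    return 0 unless @logs;

    local $^I   = '.bak';       # same in-place mode as -i.bak
    local @ARGV = sort @logs;   # files for <> to walk through
    while (my $line = <>) {
        $line =~ s/foo/bar/g;
        print $line;            # written into the file being edited
    }
    return scalar @logs;
}

# Usage (script name is an assumption): perl replace_tree.pl DIR1 DIR2 ...
my @roots = @ARGV;
for my $root (@roots) {
    my $count = replace_in_tree($root);
    print STDOUT scalar(localtime), " finished $root ($count files)\n";
}
```

The .bak copies end in .log.bak, so a second run does not pick them up as log files. For one late night and a dozen directories, the batch file was still the quicker route.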

Epilogue: I came in the next morning around 8am to hear that the reports were running. I checked the timestamps from the search & replace: under 4 hours to update everything! Pretty amazing, and we were able to get the reports done in time. Oh, and TextPipe, the program I paid several hundred for and expected to get in 15 minutes to save my day? It finally arrived 11 hours after I ordered it. Needless to say I got a refund immediately, and cozied up to my new friend Perl.
