# Archive for category perl

Sometimes (especially in one-liners) you want to assign a value only if a corresponding regex (regular expression) that picks out the value matches. I.e if it has once matched you don’t want it overwritten with undef if the regex later fails on a subsequent row in your file.

This can be solved thusly:

 $var =$1 if (/Correct (\d)+ %/); 

The above snippet will assign $var if the regex on the right hand side matches and picks out a value (via the capturing parenthesis on the right hand side and otherwise leave it unchanged. Tags: A quick and dirty one-liner (depending on the length of your lines ;)) to calculate the average of a value in a bunch of files in a directory structure. The below one line picks out a value in each file that matches the name “Logfile*.txt” in the underlying directory structure. In the below case, the line was in the form of:  Correctly Classified Instances 37 60.6557 %  or  Correctly Classified Instances 37 60 %  The code traverses the directory structure from the current dir and picks out the “60.6557″ and sums that over the number of files that matched and then divides with however many files that matched.  find . -name "Logfile*.txt" -exec perl -ne '($var) = (/^Correctly.*\s+((\.|\d)+)\s+%/); print "$var\n" if$var;' '{}' \; | xargs perl -e 'use List::Util qw(sum); print(sum(@ARGV)/scalar(@ARGV)); print "\n";' 

OBS: Not very robust!! But it IS a one-liner!

Tags:

A common thing I do is to scrape a Web page, run it through some Perl magic and marvel over the result. A frequent reason of contention in this process is the issue of getting å’s ä’s and ö’s correctly handled by Perl and various terminals, here’s a write up of a simple example.

The webpage is UTF-8 encoded, I save it to disk using “Save as…” in my browser. The resulting file on disk is UTF-8 encoded.

In this example the file is reasonably small so I use File::Slurp to get the full file in a scalar…

 my $text = read_file( <filename> ) ; # Slurp the file utf8::decode($text); # Decode the file from UTF-8 

I can now match with å ä and ö in my Perl code like this:

 my ($address) = ($text =~ m{title="Visa alla bilder för ([^"]+)"}sm); 

Later when I have finished my text processing and want to print the result in my terminal, Cygwin in this case I do:

 my $output = ""; $output .= "Adress: " . $house->{address} . "\n" if defined($house->{address}); $output .= "Område: " .$house->{area} . "\n" if defined($house->{area}); ...   utf8::encode($output); # Encode the text as UTF-8 which is correctly displayed by Cygwin print $output;  Note: You should not “use utf8;” in this Perl script, “use utf8;” should only be used if your Perl script is written in UTF-8! Tags: So I had this huge bunch of Spreadsheet::WriteExcel code where I was generating Excel sheets from some statistics I was gathering and I noticed that everytime I opened the Excel file I got a message with something along the lines “File error: data may have been lost”, but on a casual look the spreadsheet looked fine… but of course the devil is in the details. It turns out that if you happen to Spreadsheet::WriteExcel::write in the same cell twice (or more I guess ) that error message is what my Excel produces…. So how to find where the problem is… in this huge bunch of writes… Enter Adam Kennedy’s excellent Aspect library for Perl, a truly brilliant module!! Thanks Adam! How did I use it in this case? Here goes…  .... lots of other code.... use Aspect; my$pointcut = call qr/Spreadsheet::WriteExcel::Worksheet::write$/; # Observe the$ at the end, otherwise write_string will also match, and we don't want that ...... code passes ... sub in_my_big_excel_write_block { my %spcoords = (); my $before_write = before { my @args = @{$_->params}; if ( $spcoords{$args[1]}{$args[2]} ) { croak "Will do double write at coord: [" .$args[1] . "," . $args[2] . "] = >" .$args[3] . "<\n" . "Previous write at coord: [" . $args[1] . "," .$args[2] . "] = >" . $spcoords{$args[1]}{$args[2]} . "<\n"; }$spcoords{$args[1]}{$args[2]} = $args[3]; }$pointcut; .... lots of $worksheet->write(...) code....   }  Now my poor perl script will die with a message telling me where and what I tried to write double and what I wrote there previously, now it’s easy to find! A more general debugging tip would be this simple “before” advice:  before { if( ($cnt % 1000) == 0 ) { my @args = @{$_->params}; print "Calling " .$_->sub_name . " with args : " . Dumper(@args) . "\n"; sleep 1; } $cnt++; }$pointcut; 

Neat huh!? And trust me, this is only some small simple example of the power of the Aspect library.

Tags: , , ,

Ok, this is sooo cool! For us who love Perl and often have the feeling of “If I could just filter this emacs buffer through Perl, how happy I would be!” here’s a tip:

Step 1: Mark the buffer (or the parts of it that you are interested on running through Perl).
Step 2: Type M+| (“Escape” + “|” (vertical bar) on your normal keyboard) or
Step 2 (alt): Issue the “shell-command-on-region” elisp function
Step 3: Enter your perl magic, for instance ‘perl -ne “print if /gnu/”‘ to remove all lines which does not match “gnu”
Step 4: Be amazed!

Now, the result will end up in the minibuffer if it can fit on one line, otherwise a new “shell command output buffer” will be created and hold your stuff.

Tags: , , ,

I often find myself wanting to pass a regex as and argument to some function or another with the initial idea of passing the regex either as a string or a compiled regex (see the qr operator) and I invariably end up with problems. One example is for instance when I pass it as a string I mess up quoting ? / * . etc..

Another example of problems with this approach is the following, imagine that you have written a function to take a regex and you imagine searching all cells in an Excel sheet for cells matching this regex, as follows:

sub find_in_worksheet {
my $regex = shift; my$workbook = shift;
my $worksheet_name = shift; my$worksheet  = $workbook->worksheet($worksheet_name);

my ( $row_min,$row_max ) = $worksheet->row_range(); my ($col_min, $col_max ) =$worksheet->col_range();

for my $row ($row_min .. $row_max ) { for my$col ( $col_min ..$col_max ) {

my $cell =$worksheet->get_cell( $row,$col );
next unless $cell; if($cell->unformatted() =~ $regex ) { return ($row, $col); } } } return undef; }  That works fine, until inevetably you want the cell contents NOT to match the regex… *sigh*, what now…? Well it turns out that a much more fruitful (and a more general) approach is as follows: sub find_in_worksheet { my$predicate = shift;
my $workbook = shift; my$worksheet_name = shift;
my $worksheet =$workbook->worksheet($worksheet_name); my ($row_min, $row_max ) =$worksheet->row_range();
my ( $col_min,$col_max ) = $worksheet->col_range(); for my$row ( $row_min ..$row_max ) {
for my $col ($col_min .. $col_max ) { my$cell = $worksheet->get_cell($row, $col ); next unless$cell;

if( $predicate->($cell->unformatted()) ) {
return ($row,$col);
}
}
}
return undef;
}


You would call this functions as follows:

my ($row,$col) = find_in_worksheet( sub { $_[0] =~ /regex to match cell contents/ } ,$workbook, $worksheetname);  See the “sub {$_[0] =~ /regex to match cell contents/ }” part? There’s the magic. Now if you later on figure out that you want the cell contents NOT to match, you just call the method thusly:

my ($row,$col) = find_in_worksheet( sub { $_[0] !~ /regex to match cell contents/ } ,$workbook, $worksheetname);  Volá! Now you can even change the code for the predicate to something completely different, maybe not even involving regexes… for instance: my ($row, $col) = find_in_worksheet( sub {$_[0] == 10 } , $workbook,$worksheetname);


This is just a small example of the power of using “sub’s” as arguments to functions and the very nice syntax in Perl for achieveing it. It also carries the benefit of giving you compile time checking of your regex! (Yes, Perl is a compiled language!)

Tags: , ,

As a followup on date calculations, here’s small one to calculate the number of days between two dates, the same disclamer as for the previous post applies, so, without further ado, here goes:

# Given two dates, calculate the number of days between them
sub diff_dates {
my ($sec1,$min1,$hour1,$mday1,$mon1,$year1) = @_[0..5];
my ($sec2,$min2,$hour2,$mday2,$mon2,$year2) = @_[6..11];
my $time1 = timegm($sec1,$min1,$hour1,$mday1,$mon1,$year1); my$time2 = timegm($sec2,$min2,$hour2,$mday2,$mon2,$year2);
my $diff_seconds =$time1 - $time2; my$daydiff = $diff_seconds / 24 / 3600; return int($daydiff );
}


Tags: , ,

Ok, here’s a small snippet for calculating a new date given a date and the numer of days forward or backwards from that date without having to include Date::Time or some other big date handling lib which is not part of the standard perl distro. It wouldn’t surprise me if there are some subtile problems with this way of doing it, maybe leap years or something will mess it up, but I have found that it works fine for the most for me at least YMMV!

The format of the input is of course according to timegm.

# Given a date and the number of days forward or backwards (negative) calculate that date
sub calc_date {
my ($sec,$min,$hour,$mday,$mon,$year) = @_[0..5];
my $diff_days =$_[6];
my $time = timegm($sec,$min,$hour,$mday,$mon,$year); my$diff_seconds = $diff_days * 24 * 3600; my$end_time = $time +$diff_seconds;

return localtime($end_time); }  Tags: , , There are two basic alternatives for file slurping: my @lines = <FH>;  Or… { local($/, *FH ) ;
open( FH, $file ) or die "sudden flaming death\n";$text = <FH>;
}


A small comment on Perl 6, it has finally managed to get it right!

 my $content = slurp$filename;
# or
my @lines = slurp $filename;  Does it get easier than that? Shall we look at the Java code for the corresponding operation? No better not, the post will become so long that noone will have time to read it! Tags: , To continue on the track of installing modules as a user in an administered environment, what if you want to use CPAN (you do!) then how would you go about it?? Well, first of fire away cpan:  perl -MCPAN -e shell  If you haven’t already configured CPAN it will ask you to do it. If you have, you can reconfigure CPAN by issuing the following command at the CPAN prompt:  o conf init  At this point the configuration will start. Where it is especially interresting for us trapped users are: Every Makefile.PL is run by perl in a separate process. Likewise we run 'make' and 'make install' in processes. If you have any parameters (e.g. PREFIX, LIB, UNINST or the like) you want to pass to the calls, please specify them here. If you don't understand this question, just press ENTER. Parameters for the 'perl Makefile.PL' command? Typical frequently used settings: PREFIX=~/perl non-root users (please see manual for more hints) Your choice: []  Here you would put in whatever existing directory in your home catalog. For instance:  PREFIX=~/perllib  Or you should be able to use:  INSTALL_BASE=~/perllib  The next interesting section is: If you're accessing the net via proxies, you can specify them in the CPAN configuration or via environment variables. The variable in the$CPAN::Config takes precedence.



Ok, now you should only need to finish the configuration and be ready to start installing perl modules from CPAN in your own home directory.

Remember, to use these installed modules later on you may need to add the

use lib

directive to your perl scripts or export a suitable environment variable (see previous post on the topic).

Update: Had a particularly difficult machine where CPAN persistently tried to run ‘make install’ as root, which it did not have privilage to.

This:

perl -Iperl5/lib/perl5 -MCPAN -Mlocal::lib -e 'CPAN::install(Lingua::EN::StopWords)'


Solved that problem…