words.pl: slogan word generator

About a year ago, I was really into playing this game online where you were given a single sentence and you had to use the letters in that sentence to make up as many words as possible. The longer the word, the higher the points.

Creating a script may be considered cheating if you’re in it for money. If you’re in it for fun, script away. That’s what I always say.

Here’s the gist of it:

#!/usr/bin/env perl
# words.pl: Find all possible slogan words from a single sentence. 
use strict; $|++;

# The sentence is optional, so accept either two or three arguments.
@ARGV == 2 || @ARGV == 3 or die "usage: $0 input_file output_file ['sentence']\n";
my ($infile, $outfile, $sentence) = @ARGV;
$sentence = $sentence || 'how much wood could a woodchuck chuck';

open INPUT, '<', $infile or die "Cannot read $infile: $!\n";
open OUTPUT, '>', $outfile or die "Cannot write $outfile: $!\n";

my $stdout = select STDOUT;
$| = 1;
select $stdout;

my %sentence_letters;
my $stmp = $sentence;
$sentence_letters{$&}++ while($stmp =~ s/[a-z]//);

print "Using the sentence '$sentence'\n";
print "Found the following letters:\n";
print "\t$_ - ". $sentence_letters{$_} ."\n" foreach(sort(keys %sentence_letters));
print "Processing $infile for slogan words\n";

my $count = 0;
my @indicators = qw{\ / | .};
LINE: while(<INPUT>) {
    my $word = $_;
    my $tmp = $word;
    next LINE if($word =~ /['\&\d]/);
    my %word_letters;
    $word_letters{$&}++ while($tmp =~ s/[a-z]//);
    
    foreach(keys %word_letters) {
        next LINE if ($word_letters{$_} > $sentence_letters{$_});
    }
    print OUTPUT $word;

    my $word_len = length($word);
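    # Reopening the handle implicitly closes the previous per-length file.
    open WORD_LEN_OUTPUT, '>>', "$outfile.$word_len" or die "Cannot append to $outfile.$word_len: $!\n";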
    print WORD_LEN_OUTPUT $word;

    print $indicators[++$count % 4], "\r";
}

print "\nDone.\nView $outfile.* for words\n";

When I wrote this, I had only recently started using Perl. Please go easy on me if it’s poorly written.

The script takes an input file, an output file name that doubles as a prefix for the per-length files (e.g. words.txt produces words.txt.20 for lines 20 characters long, newline included), and an optional sentence to parse.
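
To make the naming concrete, here is a small sketch of how a per-length file name comes about. `length_file` is a hypothetical helper, not part of the script, which builds the name inline. It assumes, as in the script, that the line read from the word list still carries its trailing newline.

```perl
use strict;
use warnings;

# Build the per-length output name for one line of the word list.
# (Hypothetical helper for illustration only.)
sub length_file {
    my ($outfile, $line) = @_;
    return "$outfile." . length($line);
}

my $line = "glossy\n";                          # as read from the input file
print length_file('words.txt', $line), "\n";    # newline counted in the length

chomp(my $bare = $line);                        # strip the newline first
print length_file('words.txt', $bare), "\n";    # suffix now matches the letter count
```

Calling chomp before measuring would make the suffix match the number of letters, at the cost of changing which file each word lands in.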

It counts the letters in the sentence, then runs through the list of words to see whether each word can be spelled from those letters, using no letter more often than it appears in the sentence.

For instance, if your ‘sentence’ is “baby cakes”, the script will create a hash of those letters and their counts. Conceptually, this looks like:

// hash is an associative array: letter => count
hash['a'] = 2
hash['b'] = 2
hash['c'] = 1
hash['e'] = 1
hash['k'] = 1
hash['s'] = 1
hash['y'] = 1

If, while walking line-by-line through your list of words, the script sees ‘abracadabra’, it skips that word and moves on to the next line because (conceptually):

word['a'] = 5
(word['a'] <= hash['a']) == false   // 5 <= 2
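
The same check can be pulled out into a small function. This is a minimal sketch rather than the code from the script: the names `letter_counts` and `can_spell` are made up for illustration, and like the script it only counts lowercase a-z. It uses the `//` operator, so it assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Count how often each lowercase letter appears in a string.
# (Hypothetical helper; the script does this inline.)
sub letter_counts {
    my ($text) = @_;
    my %counts;
    $counts{$_}++ for $text =~ /[a-z]/g;
    return \%counts;
}

# True if $word needs no letter more often than %$available offers it.
sub can_spell {
    my ($word, $available) = @_;
    my $needed = letter_counts($word);
    for my $letter (keys %$needed) {
        # A letter missing from the sentence counts as zero available.
        return 0 if $needed->{$letter} > ($available->{$letter} // 0);
    }
    return 1;
}

my $available = letter_counts('baby cakes');
print can_spell('abracadabra', $available) ? "yes\n" : "no\n";   # prints "no"
print can_spell('cabby', $available)       ? "yes\n" : "no\n";   # prints "yes"
```

Counting the sentence once and reusing the result for every word keeps the per-word cost proportional to the length of the word.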

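The script also employs some interesting stdout manipulation. It turns off output buffering so that each print shows up immediately, then prints a “spinner” character terminated by a carriage return ("\r") instead of a line feed. The carriage return moves the cursor back to the start of the current line, so the next character overwrites the previous one.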
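
Stripped of the word processing, the spinner technique looks roughly like the sketch below. The `spinner_frame` helper, the frame set, and the 50 ms delay are assumptions made for the demonstration and are not taken from the script.

```perl
use strict;
use warnings;

$| = 1;    # autoflush STDOUT so partial lines show up immediately

my @frames = ('\\', '/', '|', '-');    # hypothetical frame set

# Return the spinner frame for a given step number.
sub spinner_frame {
    my ($step) = @_;
    return $frames[$step % @frames];
}

for my $step (1 .. 20) {
    # "\r" returns the cursor to column 0 without advancing a line,
    # so the next frame overwrites this one.
    print spinner_frame($step), "\r";
    select(undef, undef, undef, 0.05);    # pause 50 ms so the animation is visible
}
print "\nDone.\n";
```

Without autoflush, output to a terminal is typically held back until a newline is printed, so the spinner would not appear to move.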

To run the script in a Linux-based environment:

mkdir ~/projects && cd ~/projects
git clone git://gist.github.com/1733871.git gist-1733871
cd gist-1733871
perl words.pl /usr/share/dict/words generated.txt 'Good goly, Miss Molly'

You should see output similar to:


jim at schubert in ~/projects/gist-1733871 on master*
$ tree .
.
├── generated.txt
├── generated.txt.1
├── generated.txt.2
├── generated.txt.3
├── generated.txt.4
├── generated.txt.5
├── generated.txt.6
├── generated.txt.7
├── generated.txt.8
└── words.pl

0 directories, 10 files

If you look at generated.txt.7, you will probably see something similar to the list below. The numeric suffix counts the trailing newline, so these are six-letter words:

Hollis
Osgood
glossy
goodly
idylls
igloos
solids
