Prime Factorization – Actual Code

I finally had a little time free and put together a solution to the prime factorization problem. As with the tic-tac-toe program before, there are lots of opportunities to make this program better. This only runs once, rather than looping. The number being factored is a constant so you have to edit the program to get the factors of a different number. I limited the list of primes so the program as it stands will only compute the prime factors of numbers less than 65,535 (an homage to the old 8-bit days).Continue reading “Prime Factorization – Actual Code”

I Hate Your Favorite VCS

Whenever I have to start working with a new software package to do a task I already know how to perform using a different software package, I feel a little frustrated. I’m sure everyone can relate to this. Programs that do similar things are often unnecessarily differentiated. It’s as if DeWalt and Makita made cordless drills that not only had different colored plastic and different battery packs but also spun around in entirely different dimensions and one was hand operated while the other was controlled with facial tics.

I’ve now used, professionally, several different version control systems. Visual SourceSafe, CVS, Perforce, subversion, Mercurial, Bazaar, and git. I hate them all.

Okay, maybe that’s a bit strong. I’ve figured out how to get work done with Perforce. I really love its changelists. That’s great. Other systems let you shelve changes, and that’s great, too. But here’s the deal: I’m in a new environment, I know how to write software and I’ve got bugs to fix. I don’t want my tools to get in the way. And yet, here I am, trying to figure out how in the world to undelete a file, how to commit a change, how to remove files, and how to generate a diff that doesn’t make my eyes bleed. And did I mention that I’ve got actual work to do?

The latest crop of distributed SCM tools (git, Mercurial, and Bazaar) want you to drink their Kool-Aid and spend days just becoming a dittohead for their path. I’m getting a bit profane and testy because it’s taking me too dang long just to get done what I want to get done. I’m not a 16 year old with nothing better to do. I will pay actual money for someone to write a decent manual.

I want it to be task oriented and with real examples. I have a repo with files in it that shouldn’t be there. How do I delete them? I want my cleaned up repo to be picked up by the main repo. How do I do that? Those deleted files are actually metadata for my development environment; once I delete them from the repo I want to put them back in place locally; how do I keep the VCS client from deleting my files or bitching about them the next time I pull changes from the repo?

Everyone puts up “how-to” pages for creating a new repo (which you do once per project), checking out source, adding files, checking in changes, and sometimes even branching and merging. That’s not enough. Revert files to the unlabeled version from yesterday before lunch. Restore a deleted file. Ignore some files in the source tree that were temporary files or program logs or IDE metadata. Rename a file. Move a file from one directory to another. Move a whole directory. Look at the version history of a file to figure out who, four years ago, was the person who wrote an otherwise undocumented subroutine so you can ask about it. Start doing these things and you realize why configuration management is an actual professional field distinct from software development. You’ll also discover that whoever set up the repo in your company didn’t know what he was doing, any more than you do.


Hey, I Built a Thing

My new job involves hacking on Bugzilla and part of that involves email. Bug email, system administration email, yadda yadda. I wanted a way to test that email without it ever leaving my development system. We had a mechanism in place where the emails would just get written to a plain file, but that doesn’t help with HTML email. I wanted to be able to see the email rendered all nice and tidy in A couple of jobs ago, one of my coworkers put something together out of Python and it worked great, but I couldn’t find an already-done example online anywhere. Other servers were GUI or Mac-only and therefore wouldn’t work on a headless Linux test server. Whatever solution I picked, I’d have to write some code.

Well, I found a Java SMTP test server and wrote a simple POP3 server into it. So now I’ve modified dumbster and made my modifications available to the world. Use it, improve it, do as you like. It’ll make my life easier, and I hope it makes yours a little better, too.

Always Ask One More Question

Badb just asked me if I could show her how to calculate the square root of a number. I cast my mind back to 7th grade and tried to remember going up to the blackboard and calculating arbitrary square roots. “Um, yeah, but let me go look it up. It’s been a long time and I want to be sure I’m right,” I said. I found a nice explanation that aligned well with the method I learned oh-so-many years ago (and haven’t used since — what do you suppose slide rules, calculators, and general-purpose computers are for, if not for doing homework?) and came back to her. “Okay, so show me the number you’re supposed to take the square root of.”

It was 3600.

Srsly. Always ask another question. But keep the interwebs handy just in case your kid does need to compute the square root of 5,287 and show her work.

Time Out for SQL

Basic SQL is pretty simple, really. It’s only when I start dealing with big joins and group functions that I start to lose track of what exactly is going on. I’m in that situation now, so I’m constructing a small data set with short table and column names so I don’t have to type as much while I figure out the right syntax. The general problem statement I’ve got is this:

Given a pair of tables where one table holds a bunch of records and the other table holds labels for groups of those records, construct a single SELECT statement that will select all of the records as a single output cell with the format: “label1:’record1′,’record2′,’record3′,label2:’record4′,’record5′,’record6′”.

I’m working with MySQL and intend to use some swell functions, namely CONCAT and GROUP_CONCAT. My initial demo setup is this:

create table demo_a (
  c_id    int NOT NULL,
  c_name  varchar(60) NOT NULL

create table demo_b (
  r_id    int NOT NULL,
  r_name  varchar(60) NOT NULL,
  c_id    int NOT NULL

insert into demo_a(c_id, c_name) values (1, 'first cohort');
insert into demo_a(c_id, c_name) values (2, 'second cohort');

insert into demo_b(r_id, r_name, c_id) values(1, 'Dave', 1);
insert into demo_b(r_id, r_name, c_id) values(2, 'Sunny Jim', 1);
insert into demo_b(r_id, r_name, c_id) values(3, 'Hoos-Foos', 1);
insert into demo_b(r_id, r_name, c_id) values(4, 'Paris Garters', 2);
insert into demo_b(r_id, r_name, c_id) values(5, 'Harris Tweed', 2);
insert into demo_b(r_id, r_name, c_id) values(6, 'Zanzibar Buck-Buck McFate', 2);

Okay, now this query will return six rows, showing the names and labels of each record:

SELECT b.r_name, a.c_name from demo_b b left join demo_a a on b.c_id = a.c_id;

Let’s see if we can get the output down to two rows, each consisting of a label and some concatenated names.

SELECT a.c_name, GROUP_CONCAT(b.r_name ORDER BY b.r_id SEPARATOR ', ') from demo_b b left join demo_a a on b.c_id = a.c_id GROUP BY b.c_id;

That works! Okay, now let’s see if we can get that down to a single column.

SELECT CONCAT(a.c_name, ": ", GROUP_CONCAT(CONCAT("'",b.r_name,"'") ORDER BY b.r_id SEPARATOR ', ')) from demo_b b left join demo_a a on b.c_id = a.c_id GROUP BY b.c_id;

Hey, that’s pretty good! Okay, now how do I get these two rows collapsed down into a single one? I wonder if I need to wrap this select in another select, defining the two-row result set from the above query as a table and then grouping all those rows together. This sort of matryoshka nesting of queries is what gives me SQL headaches.

SELECT GROUP_CONCAT(x.labeled_names SEPARATOR ', ') single_row FROM
(SELECT CONCAT(a.c_name, ": ", GROUP_CONCAT(CONCAT("'",b.r_name,"'") ORDER BY b.r_id SEPARATOR ', ')) labeled_names
 from demo_b b left join demo_a a on b.c_id = a.c_id GROUP BY b.c_id) x;

OMG it totally works! Yay!

Factorize, More Boxes

Boxes and arrows, I know, it doesn’t look like a program yet. Still, it all is going somewhere, I promise. Let’s take a minute and talk about how factorization works. It’s pretty straightforward: either a number is evenly divisible by some other number (a factor) or it isn’t. Let’s look at an example: 48. What numbers go into 48? Well, I always think of it as 6 times 8, but it’s also 4 times 12. I’m the one with the keyboard, so let’s do 6 times 8. Now, let’s look at 6. Is it evenly divisible by anything? Sure, 2 and 3. How about them? Nope, they are not evenly divisible into smaller integer factors. That’s what it means to be prime, and two and three are the first two prime numbers. Eight, on the other side, divides down into twos. Here’s a tree, where each number gets split down into factors. When a factor can’t be split any more, that factor is prime:

The way the program is going to work (refer back to the previous drawing) is to start out with a number to factorize — let’s repeat this example and use 48 — and a list of primes. It’s going to start at the small end of the list of primes and keep trying until it runs out of primes or until it divides the number all the way down to primes. Here’s the tree the program would draw:

Here, the program would try the first prime, two. That works, so it would divide 48 by 2 and get 12. It would start over with 12 and see if two goes into 12. It does, and it would try again on 6. That works, too, and it would try again to divide 2 into 3. That doesn’t go evenly, so it would go to the next prime. The next prime is 3, which does go into 3 evenly. 3 divided by 3 is 1, so the program would know that it was done. This kind of drawing is called a tree (it’s upside down, don’t worry about it, but 48 is the root and all the primes are leaves). I’ve colored the primes green to highlight how they’re not being divided, and because they’re the leaves.

This process is pretty darned simple to explain. The only thing missing is, gosh, a list of prime numbers! It’s getting late and once again, I’m going to have to leave that for the next time. Here’s a bit more detail on the flowchart, though. This is the detail for the “let p be the next prime number” box. The first time we run through, we want to start at the circle labeled, “A,” but on all the following times, we want to come in at the circle marked, “B.”

Why You Gotta Hate?

For the past three weeks I’ve been working on a Bugzilla system. Specifically, working on fixing small bugs in the UI code of a customized Bugzilla installation. This system has been highly customized, with lots of new Javascript as well as a lot of server-side extensions to customize the process of bug entry and workflow. Bugzilla, in case you didn’t know, is written mostly in Perl. There’s some Javascript and the pages are all rendered with Template Toolkit, so basically, it’s Perl. The last time I worked with Perl, I wrote a couple of scripts to automate some image conversion on Windows machines. At that time, I didn’t need an IDE; Notepad was plenty for the needs of a program that used a single file. Bugzilla is huge. It has hundreds of source files and none of them are self-contained. Every file refers to at least one other file and most refer to several.

Bugzilla is written in Perl but it’s object-oriented Perl. After one day of using BBEdit (a swell text editor that I’ve used on and off for decades) to try to navigate around, I found myself really wishing for an IDE. There exist projects that claim to be Perl IDEs. There’s a Perl plugin for Eclipse (that is slow and just functional enough to be tantalizing, but all it really does is syntax coloring, and not even very well). There’s Padre, which looks swell but doesn’t do even as well as BBEdit for editing and the other stuff doesn’t work. Eventually I tried IDEA, since I have a license for it. It does syntax coloring and multi-file search and all that stuff that BBEdit does, while adding the knowledge that everything under a given folder is part of the project and not to worry about stuff outside the folder. It doesn’t really have Perl support so I still can’t do the really powerful stuff an IDE really lets you do. I still wish for a tool that lets me put the caret in a symbol and then, with a keystroke, go to the declaration of that symbol, wherever that may be within the project.

Other tools that my group support are written in Java, Ruby, and Groovy. It’s enough to make me cry, just a little bit. I’ve been going through Ruby and Groovy tutorials and, okay, I get the amazing power of Rails/Grails. That right there is 100% awesome. If you’ve gotta spin up a prototype web service in a hurry, man, Ruby on Rails is hard to beat. Looking at the language features, though, and the language comparisons (try Googling for “compare ruby java” or “groovy language features”) I start feeling tired. Why do so many people make such a big deal about closures? Are there really that many programmers developing self-modifying systems? Seriously, Lisp is an amazing language that is super powerful and awesome, but most of its awesomeness would go to waste if you’re writing a blog or an online bill paying site.

I’m starting to wonder why so many people seem to be allergic to languages where you can know, just by looking at the declaration of a method, what the method expects the parameters to be. Perl doesn’t even declare parameters, Ruby and Groovy do but the type is optional at best and usually absent. When my little chunk of code gets an object and I want to do something with it, I start wondering, “What kind of thing is this object? What can it do? What kinds of state does it have? Is it even reasonable for me to try to do anything with it?” Some might argue that I (and my code) shouldn’t wonder these things. It should be the job of the caller who’s providing the object to be sure that the object is appropriate for my code. But that’s just sweeping the problem under the rug: how in the world can the caller know, since my method doesn’t advertise parameters? It’s the same question, really.

The big advantage that I keep seeing touted on tutorial websites is that these scripting languages require less typing than stricter languages like Java or C++. You know what? I’ve got a solution for that: learn to type. Holy cow, it’s not like memory is that expensive, that the difference between having braces and no braces in the source code is gonna kill you. Source code should be readable by humans. Okay, open question to all you Perl/Python/Ruby/Groovy/whatever-the-fuck-scripty-language guys out there: why do you hate knowing what type of thing a variable is? Why do you hate tools that tell the programmer what’s going on? Why do you think that the best source code editor is ed? Jeez, guys, programming is fun, why do you have to make it so tedious?

Factorization: Some Drawings

So here’s the very high level flowchart for the factorization program. The key insight here is that the things the program does are grouped together and put inside pretty inclusive boxes. The lower-level flowcharts will start to unpack these boxes. The really interesting thing about this is that for any box we don’t really know how to implement, we can just put in a stub that pretends to do the work. Like, “Oh, I don’t know how to do prime factorization so I’m just going to print the number 5 and move on.” Computer programming is all like this: break the problem down into smaller and smaller chunks until each chunk is something you know how to tell a computer to do.Continue reading “Factorization: Some Drawings”

Programming Assignment: Prime Factorization

Last night the Badb had to find the prime factorization of a number. Thrill moment! This is exactly the problem I once wrote a computer program to solve and wound up entering that program in the school science fair. That was pretty cool.

Back when I had to solve this problem, my computer had a pretty severe limitation within its BASIC interpreter: the highest integer value it could deal with was 65,535. The clock speed on the CPU was pretty darned slow, as well. These meant that any program I wrote had to be pretty efficient if I wanted to be able to interact with it — I could feel myself aging while I waited for the computer to count to 100 — and I didn’t have to worry about really huge data structures. When the biggest array was guaranteed to hold fewer than 100,000 items, FOR..NEXT loops were not very scary.

Okay, enough reminiscing. Here’s the assignment: write a BASIC program that will accept a user’s input of a positive integer greater than 1 and less than 65,536. The program will output the prime factors of the number. The program must guard against empty, non-numeric, and out-of-range values. If the user inputs a value that does not meet the acceptance criteria then the program must print out an error message and prompt once again for input. After writing out the prime factorization of the number, the program should ask the user whether to factor another number or to exit, and then behave appropriately.

Here are some hints:

  • a sieve of Eratosthenes would probably be a good place to start to get a bunch of prime numbers
  • Any factor of a number that is greater than the square root of that number is, by necessity, part of a pair of factors — and the other half of that pair is less than the square root of the number.
  • The modulus operator (mod) might be helpful.
  • Try using pencil and paper to draw a flowchart for how to solve the problem. Start with boxes that represent big steps (“get user input”, “compute prime numbers”, etc.) and then explode the boxes until each box represents a line of actual code. This can seem kind of extreme, but it totally works and it keeps you from having to remember everything at once.

I’m gonna make some time to try to write this program myself, but it’d be super cool if any of the three of you people reading this could beat me to it. I’d be really proud!

My New Commute

Highway 85 is, unless you’re a carpool, unusable during commute hours. I’ve been trying different ways to get to NASA for the past week and so far the best I’ve been able to come up with is 70 minutes. Without freeways, there are basically two routes: mountains north and a short run east across Palo Alto / Mountain View, or mountains northeast to Los Gatos and then surface streets to Mountain View. The second option seems to run about 70 minutes both ways, while the first is about 75 (there are three schools on Page Mill and traffic is a bit slower). Five minutes is pretty much a wash. The valley route, though, has a little bit less mountain twistiness and I can get slightly better mileage on the flats even in slow commute traffic. I can drive less aggressively and be all calm and stuff, but there’s just no way around gravity when you’re going uphill.

Now I’ll start keeping track of what stores I’m passing to see if I can combine any errands with my commute.