Time Out for SQL

Basic SQL is pretty simple, really. It’s only when I start dealing with big joins and group functions that I start to lose track of what exactly is going on. I’m in that situation now, so I’m constructing a small data set with short table and column names so I don’t have to type as much while I figure out the right syntax. The general problem statement I’ve got is this:

Given a pair of tables where one table holds a bunch of records and the other table holds labels for groups of those records, construct a single SELECT statement that will select all of the records as a single output cell with the format: “label1:’record1′,’record2′,’record3′,label2:’record4′,’record5′,’record6′”.

I’m working with MySQL and intend to use some swell functions, namely CONCAT and GROUP_CONCAT. My initial demo setup is this:

create table demo_a (
  c_id    int NOT NULL,
  c_name  varchar(60) NOT NULL
);

create table demo_b (
  r_id    int NOT NULL,
  r_name  varchar(60) NOT NULL,
  c_id    int NOT NULL
);

insert into demo_a(c_id, c_name) values (1, 'first cohort');
insert into demo_a(c_id, c_name) values (2, 'second cohort');

insert into demo_b(r_id, r_name, c_id) values(1, 'Dave', 1);
insert into demo_b(r_id, r_name, c_id) values(2, 'Sunny Jim', 1);
insert into demo_b(r_id, r_name, c_id) values(3, 'Hoos-Foos', 1);
insert into demo_b(r_id, r_name, c_id) values(4, 'Paris Garters', 2);
insert into demo_b(r_id, r_name, c_id) values(5, 'Harris Tweed', 2);
insert into demo_b(r_id, r_name, c_id) values(6, 'Zanzibar Buck-Buck McFate', 2);

Okay, now this query will return six rows, showing the names and labels of each record:

SELECT b.r_name, a.c_name from demo_b b left join demo_a a on b.c_id = a.c_id;

Let’s see if we can get the output down to two rows, each consisting of a label and some concatenated names.

SELECT a.c_name, GROUP_CONCAT(b.r_name ORDER BY b.r_id SEPARATOR ', ') from demo_b b left join demo_a a on b.c_id = a.c_id GROUP BY b.c_id;

That works! Okay, now let’s see if we can get that down to a single column.

SELECT CONCAT(a.c_name, ": ", GROUP_CONCAT(CONCAT("'",b.r_name,"'") ORDER BY b.r_id SEPARATOR ', ')) from demo_b b left join demo_a a on b.c_id = a.c_id GROUP BY b.c_id;

Hey, that’s pretty good! Okay, now how do I get these two rows collapsed down into a single one? I wonder if I need to wrap this select in another select, defining the two-row result set from the above query as a table and then grouping all those rows together. This sort of matryoshka nesting of queries is what gives me SQL headaches.

SELECT GROUP_CONCAT(x.labeled_names SEPARATOR ', ') single_row FROM
(SELECT CONCAT(a.c_name, ": ", GROUP_CONCAT(CONCAT("'",b.r_name,"'") ORDER BY b.r_id SEPARATOR ', ')) labeled_names
 from demo_b b left join demo_a a on b.c_id = a.c_id GROUP BY b.c_id) x;

OMG it totally works! Yay!

Factorize, More Boxes

Boxes and arrows, I know, it doesn’t look like a program yet. Still, it all is going somewhere, I promise. Let’s take a minute and talk about how factorization works. It’s pretty straightforward: either a number is evenly divisible by some other number (a factor) or it isn’t. Let’s look at an example: 48. What numbers go into 48? Well, I always think of it as 6 times 8, but it’s also 4 times 12. I’m the one with the keyboard, so let’s do 6 times 8. Now, let’s look at 6. Is it evenly divisible by anything? Sure, 2 and 3. How about them? Nope, they are not evenly divisible into smaller integer factors. That’s what it means to be prime, and two and three are the first two prime numbers. Eight, on the other side, divides down into twos. Here’s a tree, where each number gets split down into factors. When a factor can’t be split any more, that factor is prime:

The way the program is going to work (refer back to the previous drawing) is to start out with a number to factorize — let’s repeat this example and use 48 — and a list of primes. It’s going to start at the small end of the list of primes and keep trying until it runs out of primes or until it divides the number all the way down to primes. Here’s the tree the program would draw:

Here, the program would try the first prime, two. That works, so it would divide 48 by 2 and get 12. It would start over with 12 and see if two goes into 12. It does, and it would try again on 6. That works, too, and it would try again to divide 2 into 3. That doesn’t go evenly, so it would go to the next prime. The next prime is 3, which does go into 3 evenly. 3 divided by 3 is 1, so the program would know that it was done. This kind of drawing is called a tree (it’s upside down, don’t worry about it, but 48 is the root and all the primes are leaves). I’ve colored the primes green to highlight how they’re not being divided, and because they’re the leaves.

This process is pretty darned simple to explain. The only thing missing is, gosh, a list of prime numbers! It’s getting late and once again, I’m going to have to leave that for the next time. Here’s a bit more detail on the flowchart, though. This is the detail for the “let p be the next prime number” box. The first time we run through, we want to start at the circle labeled, “A,” but on all the following times, we want to come in at the circle marked, “B.”

Why You Gotta Hate?

For the past three weeks I’ve been working on a Bugzilla system. Specifically, working on fixing small bugs in the UI code of a customized Bugzilla installation. This system has been highly customized, with lots of new Javascript as well as a lot of server-side extensions to customize the process of bug entry and workflow. Bugzilla, in case you didn’t know, is written mostly in Perl. There’s some Javascript and the pages are all rendered with Template Toolkit, so basically, it’s Perl. The last time I worked with Perl, I wrote a couple of scripts to automate some image conversion on Windows machines. At that time, I didn’t need an IDE; Notepad was plenty for the needs of a program that used a single file. Bugzilla is huge. It has hundreds of source files and none of them are self-contained. Every file refers to at least one other file and most refer to several.

Bugzilla is written in Perl but it’s object-oriented Perl. After one day of using BBEdit (a swell text editor that I’ve used on and off for decades) to try to navigate around, I found myself really wishing for an IDE. There exist projects that claim to be Perl IDEs. There’s a Perl plugin for Eclipse (that is slow and just functional enough to be tantalizing, but all it really does is syntax coloring, and not even very well). There’s Padre, which looks swell but doesn’t do even as well as BBEdit for editing and the other stuff doesn’t work. Eventually I tried IDEA, since I have a license for it. It does syntax coloring and multi-file search and all that stuff that BBEdit does, while adding the knowledge that everything under a given folder is part of the project and not to worry about stuff outside the folder. It doesn’t really have Perl support so I still can’t do the really powerful stuff an IDE really lets you do. I still wish for a tool that lets me put the caret in a symbol and then, with a keystroke, go to the declaration of that symbol, wherever that may be within the project.

Other tools that my group support are written in Java, Ruby, and Groovy. It’s enough to make me cry, just a little bit. I’ve been going through Ruby and Groovy tutorials and, okay, I get the amazing power of Rails/Grails. That right there is 100% awesome. If you’ve gotta spin up a prototype web service in a hurry, man, Ruby on Rails is hard to beat. Looking at the language features, though, and the language comparisons (try Googling for “compare ruby java” or “groovy language features”) I start feeling tired. Why do so many people make such a big deal about closures? Are there really that many programmers developing self-modifying systems? Seriously, Lisp is an amazing language that is super powerful and awesome, but most of its awesomeness would go to waste if you’re writing a blog or an online bill paying site.

I’m starting to wonder why so many people seem to be allergic to languages where you can know, just by looking at the declaration of a method, what the method expects the parameters to be. Perl doesn’t even declare parameters, Ruby and Groovy do but the type is optional at best and usually absent. When my little chunk of code gets an object and I want to do something with it, I start wondering, “What kind of thing is this object? What can it do? What kinds of state does it have? Is it even reasonable for me to try to do anything with it?” Some might argue that I (and my code) shouldn’t wonder these things. It should be the job of the caller who’s providing the object to be sure that the object is appropriate for my code. But that’s just sweeping the problem under the rug: how in the world can the caller know, since my method doesn’t advertise parameters? It’s the same question, really.

The big advantage that I keep seeing touted on tutorial websites is that these scripting languages require less typing than stricter languages like Java or C++. You know what? I’ve got a solution for that: learn to type. Holy cow, it’s not like memory is that expensive, that the difference between having braces and no braces in the source code is gonna kill you. Source code should be readable by humans. Okay, open question to all you Perl/Python/Ruby/Groovy/whatever-the-fuck-scripty-language guys out there: why do you hate knowing what type of thing a variable is? Why do you hate tools that tell the programmer what’s going on? Why do you think that the best source code editor is ed? Jeez, guys, programming is fun, why do you have to make it so tedious?

Factorization: Some Drawings

So here’s the very high level flowchart for the factorization program. The key insight here is that the things the program does are grouped together and put inside pretty inclusive boxes. The lower-level flowcharts will start to unpack these boxes. The really interesting thing about this is that for any box we don’t really know how to implement, we can just put in a stub that pretends to do the work. Like, “Oh, I don’t know how to do prime factorization so I’m just going to print the number 5 and move on.” Computer programming is all like this: break the problem down into smaller and smaller chunks until each chunk is something you know how to tell a computer to do.Continue reading “Factorization: Some Drawings”

Programming Assignment: Prime Factorization

Last night the Badb had to find the prime factorization of a number. Thrill moment! This is exactly the problem I once wrote a computer program to solve and wound up entering that program in the school science fair. That was pretty cool.

Back when I had to solve this problem, my computer had a pretty severe limitation within its BASIC interpreter: the highest integer value it could deal with was 65,535. The clock speed on the CPU was pretty darned slow, as well. These meant that any program I wrote had to be pretty efficient if I wanted to be able to interact with it — I could feel myself aging while I waited for the computer to count to 100 — and I didn’t have to worry about really huge data structures. When the biggest array was guaranteed to hold fewer than 100,000 items, FOR..NEXT loops were not very scary.

Okay, enough reminiscing. Here’s the assignment: write a BASIC program that will accept a user’s input of a positive integer greater than 1 and less than 65,536. The program will output the prime factors of the number. The program must guard against empty, non-numeric, and out-of-range values. If the user inputs a value that does not meet the acceptance criteria then the program must print out an error message and prompt once again for input. After writing out the prime factorization of the number, the program should ask the user whether to factor another number or to exit, and then behave appropriately.

Here are some hints:

  • a sieve of Eratosthenes would probably be a good place to start to get a bunch of prime numbers
  • Any factor of a number that is greater than the square root of that number is, by necessity, part of a pair of factors — and the other half of that pair is less than the square root of the number.
  • The modulus operator (mod) might be helpful.
  • Try using pencil and paper to draw a flowchart for how to solve the problem. Start with boxes that represent big steps (“get user input”, “compute prime numbers”, etc.) and then explode the boxes until each box represents a line of actual code. This can seem kind of extreme, but it totally works and it keeps you from having to remember everything at once.

I’m gonna make some time to try to write this program myself, but it’d be super cool if any of the three of you people reading this could beat me to it. I’d be really proud!

My New Commute

Highway 85 is, unless you’re a carpool, unusable during commute hours. I’ve been trying different ways to get to NASA for the past week and so far the best I’ve been able to come up with is 70 minutes. Without freeways, there are basically two routes: mountains north and a short run east across Palo Alto / Mountain View, or mountains northeast to Los Gatos and then surface streets to Mountain View. The second option seems to run about 70 minutes both ways, while the first is about 75 (there are three schools on Page Mill and traffic is a bit slower). Five minutes is pretty much a wash. The valley route, though, has a little bit less mountain twistiness and I can get slightly better mileage on the flats even in slow commute traffic. I can drive less aggressively and be all calm and stuff, but there’s just no way around gravity when you’re going uphill.

Now I’ll start keeping track of what stores I’m passing to see if I can combine any errands with my commute.

Bagpipe Tuning

There are several pages out there that go into great detail about tuning the Great Highland Bagpipes. The real trick, for me, is making sure that the chanter is in tune. I am not blessed with perfect pitch and it can be difficult to do this by ear. On top of that, my instructor doesn’t want to spend all lesson going over the tape on my chanter and fiddling with the reed. The whole band has been given a directive: show up to practice with your chanter in tune at band pitch so we spend a little time getting our drones set and can spend most of our time playing. We use a 449 A and tune to B-flat. Our scale is thus:

Chanter Note Tuner Note Variance
G G# flat by 31 cents
A B flat
B C flat by 1 cent
C D flat by 15 cents
D E flat flat by 1 cent
E F flat by 31 cents
F G flat by 5 cents
G G# flat by 31 cents
A B flat

Kids, Start Typing!

I have finished writing a BASIC implementation of a tic-tac-toe game. There are lots of ways it could be improved or modified, and the Badb has already told me she’d be interested in a tic-tac-toe program. I’ve printed it out on paper so she can type it in, herself. This is how I learned to program: by transcribing programs written by other people, with a mixture of good and bad habits. Things that were important became clear, and things that were not important also became clear. I’ve posted the program to this site so that if you, too, would like your kid (or heck, even you, yourself) to transcribe a program and play with it you can have the opportunity.

Here are some suggestions for changes, in no particular order of difficulty:

  • Instead of requiring the enter key for responses, use looping and inkey$() to get user responses
  • Instead of grid coordinates, map each cell to a single digit (play from the number keys)
  • Allow for two humans to take turns at the keyboard, using the computer as a very expensive paper and pencil
  • Have the program loop back to the beginning after a game is over so you don’t have to type “run” every time

Feel free to add your own suggestions in the comments.

Hook ’em Early

Yesterday I downloaded Chipmunk Basic and installed it on my laptop. I did code up a quick sieve of Eratosthenes with lots of comments and helpful PRINT statements, all without cheating and looking at the sieve program included in the readme. I guess I get a gold nerd for that. Anyway, this morning I told the Badb that if she did the programming — not me, mind, but she, herself — then I was perfectly okay with her using programs to do her homework. In two seconds she went from glazed and bored to hyperalert. I took her through the code and showed her how the program worked and I could see her starting to lose focus again. I’m trying to remember all the fooling around with programs I did as a kid before I was competent enough to write programs to cheat at doing homework.

We used to have a subscription to Byte magazine; some months there’d be a printed listing of a BASIC program and my dad would have me sit at the computer and type the listing in. The programs that I remember were games, some more interesting than others. One that I spent most time on was a sort of Zork imitation. There was just a text interface, of course, with the prompt being a printed description of the room your character stood in. You explored the maze, and I don’t remember if there was even a victory condition. Maybe you had to find an object and then bring it back out; I don’t remember. The thing that impresses me now is that the program was so long and complex that even though I knew that I was typing in a puzzle as I typed in the program, I didn’t know how to solve it. But I did see how to change the game — of course, the source was right there in front of me. First, I started by rewriting the descriptions of rooms. Then I figured out how to add a room and hook it in to the existing rooms. That took some doing, since the map was just a bunch of lines of DAT statements with line numbers as the data. Altering the program altered line numbers and wound up invalidating the lookup table. I think I must have spent several months fooling around with these programs before I ever undertook writing anything more complicated than, “Hello, world!” But all that fooling around taught me lots about the BASIC programming language, about how programs are structured, about the kinds of tricks you can do when you’re actually writing the software. Hmm. Maybe I’ll start out with a game that plays Hangman. Or tic-tac-toe. Or nim.

So now I wonder, having found an archive of scanned magazines, is this something my kid would be interested in? Can it really compete with web sites that let her dress up Barbie? Man, I sure hope so! I want her to think, if she’s unhappy with the array of accessories, “I know! I’ll write my own version that has the kind of bracelets I want!”

Those Were the Days

I think I just had a sports dad moment. This has always seemed a little alien to me: in the movies there’s always the middle aged dad who still remembers his glory days in high school and relives them vicariously through his children. I didn’t have glory days in high school, I had misery. Oh, there was fun, too, but the fantasy of going back in time and reliving high school holds negative attraction for me. I would pay money not to have to do it.

And yet, there we were. Badb was doing her math homework and I was helping her understand it and we got to the last problem: “Write down all the prime numbers less than 100.” I explained to her how to use pencil and paper to perform the sieve of Eratosthenes and she got started. The first problem was that in generating the list, she skipped a few numbers. I think maybe she got bored with the task and started thinking ahead of where the pencil was writing.

I told this story to Junglemonkey and she said I should try teaching Badb to program. Programming, after all, is all about attention to detail and planning, and it would really help the kid exercise mental discipline. I thought about it and I’m really unhappy with modern computers and programming languages. All the flashy, blinking, beeping, Internet-connected whiz-bang features combined with the really loose requirements for data typing, control flow, variable declaration, and so forth that you get with modern scripting languages really gets you a recipe for frustration. She hasn’t, up to this point, demonstrated any curiosity around programming. I figured it was something like, “Dad’s doing it so therefore it must not be cool,” but seriously, a little kid doesn’t care about object oriented highly scalable enterprise middleware; she just wants to solve this problem on her math homework.

When I was in the 6th grade, my dad brought home a computer because my mom didn’t want him writing the next revision of his textbook on a typewriter. I remember writing BASIC programs to do the drudgery of my math homework for me in later years (for example, computing Pythagorean triples). Suddenly, I’m all fired up to get BASIC on the Mac Mini and get Badb writing programs to do her homework. BASIC has lots of limitations, but that’s the beauty of it. It’s got huge advantages over Perl, PHP, Javascript, C, Python, Ruby, Groovy, Haskell, Scala, Lisp, or whatever other awesome language you’re about to suggest in the comments.

  • Line numbers. Having to go renumber your program by hand (and make sure that you’ve updated all the GOTO and GOSUB line references) is a major pain in the neck. Once you do that, you learn to leave yourself lots of room for future work. To an observer this might look like “planning ahead” but that’s too specific. One doesn’t know necessarily what changes one will make, nor where, nor even if one will make them. But learning to leave oneself the option…that’s a really valuable lesson.
  • Barely functional programming paradigm. Sure, you can declare subroutines, but it’s really all just labels and line numbers with no parameters. One needs to cultivate some ability to hold the whole set of instructions in mind at once. With no encapsulation, she’ll have to be able to remember what that variable means and which sections of code modify it.
  • Really primitive I/O. This is killer. She won’t get distracted from her dimensional analysis by trying to make the units feed in from some website and play an entertaining video when the answer’s spat out.
  • Cryptic error messages. Not, “Exception processing token <sieve> on line 2,” but merely, “Syntax error,” or perhaps, “Invalid character.” It’s like one of those I, Spy puzzles but instead of looking for a nail, a 4, and two butterflies you’re looking for the instruction that the computer can’t understand. You get really good at proofreading.

I love this idea. From now on, her math will be less and less about mastering basic operations and memorizing multiplication facts and become more and more about understanding and exploring relationships between numbers. Programming can totally help with that.