How to Get Kids’ Hands Dirty With Data? (Part 2)

My last post was about places where students can get publicly available data, and I promised to talk about the tools we can use to access it. This post is about the tools that are out there now.

Probably the gold standard right now for getting at real data and doing stuff with it is the Wolfram Language (WL). There is a lot to recommend it, but the biggest advantage is the enormous amount of data WL puts at your fingertips. Another is the wide range of data types the language can access and work with, including colors, countries, shapes, images and all kinds of other things I probably haven’t tried yet. WL is entirely comfortable with an array that looks like this:

[Image: Wolfram list]

The Wolfram Language also supports a fantastic range of visual output: 2D and 3D graphs, colored maps, graphs of functions with sliders you can manipulate, interfaces with buttons and probably a bunch of other things I haven’t tried yet. Furthermore, it’s extremely easy to use, so almost anyone can look through the documentation and start typing things in to see what happens. And it connects to a lot of other APIs, like Facebook’s.

But some of these strengths are also weaknesses when your goal is to teach coding. Wolfram’s radical leniency with data types will teach your students terrible coding habits that will lead to real problems in other languages. The sheer range of output the language can produce makes it really hard to predict what you’re going to get from a program. And though I haven’t tried it, WL doesn’t seem to be set up for long, complex functions. I also haven’t found a convenient local development environment where a person can edit and save their programs. As a result, I think that WL is a terrific tool for teaching pretty much any subject except programming. The one exception is if you’re preparing students to learn a functional language like Haskell. WL leans heavily on a functional style and composes and applies functions in a similar way. Idiomatic WL code also rarely uses things most people expect to see in a programming language, like loops.

The two most popular tools for data manipulation in the scientific world seem to me to be Python and R. Python is a great learning language because it makes a lot of things really easy, and there are countless modules that can do just about anything. But in spite of its ease relative to other programming languages, it still takes a while before a student can do much. You have to spend about a semester writing console programs before you’re ready to do any kind of graphics, and that’s enough time to lose a student who isn’t naturally attracted to coding. But a well-designed class could have students doing interesting things with data pretty soon.
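As a rough illustration of how little Python a first data task needs, here is a minimal sketch that averages readings per station. It uses only the standard library (`csv`, `io`, `statistics`). The CSV layout, column names and numbers are made-up placeholders standing in for a real public dataset; they are not actual measurements.

```python
import csv
import io
from statistics import mean

# Made-up placeholder data in the shape of a downloaded CSV file.
# These are NOT real measurements; swap in a real public dataset.
SAMPLE_CSV = """year,station,temp_c
2010,A,14.1
2010,B,12.3
2011,A,14.5
2011,B,12.9
"""


def average_by_station(text):
    """Return a dict mapping each station name to its mean temperature."""
    readings = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Group the temperature column by station, converting text to numbers.
        readings.setdefault(row["station"], []).append(float(row["temp_c"]))
    return {station: mean(temps) for station, temps in readings.items()}


if __name__ == "__main__":
    for station, avg in sorted(average_by_station(SAMPLE_CSV).items()):
        print(f"Station {station}: {avg:.2f} °C")
```

To work with a real file, a student could read it with `open(path, newline="")` and pass the file's text to the same function. Nothing here goes beyond dictionaries, loops and one library call, which is roughly the level a beginner reaches early in a course.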

I don’t know much about R. I know that professional statisticians really like it and professional programmers mostly don’t. I don’t think it has a gentle learning curve. I imagine that if you took the time to teach it, you could have students doing some interesting things, but from what I know it’s a better tool for teaching statistics than for teaching programming.

So what’s the right tool? I think the ideal tool doesn’t exist…yet. But I have a vision in my head of what it would look like. It would be as accessible as something like Greenfoot or NetBeans, with a low barrier to entry for beginners. It would let you manipulate data as easily as you can with a Python module, though maybe not as easily as with WL. It would have tools to create graphics of different sorts. And it would have access to a wide range of open data, with menus to choose it, though with more precision about types than WL. Ideally you’d be able to import data into SQL-style tables (or maybe JSON or XML objects) and manipulate them with either SQL or a functional tool like C#’s LINQ.
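The two styles of querying described above can be approximated today in Python. Here is a minimal sketch that uses the standard library’s `sqlite3` module to load rows into an in-memory table and then runs the same filter-and-sort query twice, once in SQL and once as a chained pipeline. Python has no LINQ, so the second function only imitates a Where/OrderByDescending/Select chain with a generator, `sorted()` and a list comprehension. The table name `readings`, its columns and the rows themselves are hypothetical placeholders, not real data.

```python
import sqlite3

# Placeholder rows standing in for imported open data: (country, year, value).
# The names and numbers are invented for illustration only.
ROWS = [
    ("Country X", 2000, 10.0),
    ("Country X", 2010, 12.0),
    ("Country Y", 2000, 30.0),
    ("Country Y", 2010, 33.0),
]


def sql_query(rows, year):
    """Load rows into an in-memory SQLite table and filter/sort them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (country TEXT, year INTEGER, value REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT country, value FROM readings WHERE year = ? ORDER BY value DESC",
            (year,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


def functional_query(rows, year):
    """The same query as a LINQ-like pipeline: filter, order, project."""
    matching = (row for row in rows if row[1] == year)  # like Where
    ordered = sorted(matching, key=lambda row: row[2], reverse=True)  # like OrderByDescending
    return [(country, value) for country, _, value in ordered]  # like Select


if __name__ == "__main__":
    print(sql_query(ROWS, 2010))
    print(functional_query(ROWS, 2010))
```

Both functions return the same list of `(country, value)` pairs. The point for a classroom is that students could express one question in either a declarative query language or a functional pipeline over the same data.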

The tool I’m imagining is the holy grail of this blog right now. If it existed, I’d be using it now. If it continues to not exist…I might have to try to make it.

Coding Instruction: Beyond Video Games

I love teaching kids to make video games. I’ve done this for quite a while now, using a lot of different languages and platforms: Python, Scratch, the JavaScript/Processing port at Khan Academy, the Greenfoot Java IDE, full Java with Eclipse and the once-promising XNA platform in C#. XNA was killed in its youth by short-sighted Microsoft executives, though it was resurrected as open source code in MonoGame. There is a lot to recommend teaching coding this way. Kids love to play video games, so it makes sense they’d love making them. And starting by creating a sprite and moving it across a screen is way more interesting than another “Hello World” program. But the more I do it, the more aware I become of its limitations.

One important limitation is that making a video game, much like making a website, is not entirely or even mostly a coding task. A big part of making a video game is actually graphic design. Much as they say there are only a few basic plots in fiction, there are only a limited number of types of “actions” in a video game, especially a simple 2D one of the sort a student would hope to make. Once you get those working, it’s mostly about how your sprites and backgrounds look. Designing sprites and backgrounds is an interesting challenge in itself, but it’s a different skill from coding and a distraction in a coding class. I often have to make hard decisions about how much time to let kids spend looking for or designing game sprites. On the one hand, the few sprites built into most gaming platforms quickly get boring. On the other hand, I am teaching coding, not graphic design.

Furthermore, these platforms mostly make starting a game very easy. The exception is Java, where you have a lot of hills to climb just to make a game window. Even so, students often hit a ceiling quickly once they want to do things that the built-in “move” and “turn” methods don’t let them do.

When they hit this limit, one of several things happens: some kids get bored and want to quit, some kids doodle around and do the same thing over and over, and a smaller number of kids push against the limits of the platform. This can lead to some incredibly ugly code in a limited platform. I’ve seen people build a scrolling platformer with Scratch, but it’s like painting your house with a nail polish brush, and the code is about as attractive. Other platforms have less of a ceiling. Greenfoot contains within it the full capabilities of Java, so theoretically you can do anything. But it’s a huge stretch to get from a single-screen game with limited sprites and objectives to the sort of games kids imagine, with things like scrolling, jumping with gravity-like motion or different kinds of 3D.
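Gravity-like jumping is a good example of something built-in “move” and “turn” commands don’t cover. Here is a minimal, graphics-free Python sketch of one common approach: keep a vertical velocity, add a constant “gravity” to it every frame, move by that velocity and stop at the floor. The class name `Jumper` and the constants are hypothetical tuning values, not taken from Greenfoot, Scratch or any particular engine.

```python
from dataclasses import dataclass

# Hypothetical tuning constants, in pixels and frames.
GRAVITY = 1.0        # added to the vertical velocity every frame
JUMP_SPEED = -12.0   # initial upward velocity (screen y grows downward)
GROUND_Y = 300.0     # y coordinate of the floor


@dataclass
class Jumper:
    y: float = GROUND_Y
    vy: float = 0.0

    def on_ground(self) -> bool:
        return self.y >= GROUND_Y

    def jump(self) -> None:
        # Only allow a jump when standing on the floor (no mid-air jumps).
        if self.on_ground():
            self.vy = JUMP_SPEED

    def update(self) -> None:
        """Advance one frame: apply gravity to velocity, move, then land."""
        self.vy += GRAVITY
        self.y += self.vy
        if self.y >= GROUND_Y:
            self.y = GROUND_Y
            self.vy = 0.0


if __name__ == "__main__":
    sprite = Jumper()
    sprite.jump()
    for frame in range(24):
        sprite.update()
        print(frame, sprite.y)
```

With these numbers the sprite rises to y = 234 and lands back on the floor about 23 frames after jumping. Getting there means students must reason about velocity and acceleration as state that changes every frame. That is exactly the jump from “move 10 steps” to real game code.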

So what’s the alternative? There’s always the “traditional” coding class, where students use Python or some other interpreted language to create text-based programs that say things like “Good job Jim you guessed the correct answer in 2 tries!” But as I argued in an earlier post, that’s exactly the kind of coding class that teaches everyone except natural programmers that coding is boring and not for people like them.

Physical computing, such as working with Arduinos, is definitely an area of potential here. I will be working with students on Arduino projects this year, and I’ll report on how it goes. As with games, physical computing involves things that are interesting learning tools in and of themselves but are not actually programming. More significantly, most Arduino programs, at least of the sort students write, are very simple and may not touch on many advanced concepts. The difficult part of physical computing is usually the physical part. And anyway, you aren’t always in an electronics lab; sometimes you just have computers to program with. So what else can you do?

I’ve been thinking about this a lot, and it came to me the other day: what students really need to do is work with data. Real data, not the Northwind database or fake lists of names from the phone book. Data about the real world, like climate records, health surveys and demographic information, that allows them to address real problems in the world. But where do they get the data? And how do we get it into a format they can work with?

I have thoughts about this, but it’s too much to add to this post. I’ll be addressing my ideas about how to get real data into the hands of students in future posts.