Pete Hindle

Pictures and stuff from a guy who likes coffee.


Basic Tech I (The Hitchhiker’s Guide to Regex)

The current state of my Basic Techniques project is this:

It doesn’t work.

However, that’s a defeatist attitude. It’s not quite as defeatist as the other viewpoint I’ve been considering, which runs: it doesn’t work, I’m never going to understand regex, and I’m going to stop bothering with programming.

On the other hand, sometimes the ways that it doesn’t work make no sense to me. For instance, one piece of code I wrote compared the string being read against a specific string and, when they matched, was supposed to increment a counter once using the ‘++’ operator. Except that it didn’t: it incremented the counter 559 times, and then decided that every word I was looking for was there, 559 times.

So it was back to the drawing board with that code. I really thought that loop was going to work; it seemed to know when and where to count as it cycled through the newly created string array holding the longer string, compartmentalised (or, more properly, tokenised).
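I still don’t know exactly what produced the 559, but one classic way to get a counter that fires on every pass is a stray semicolon after the if, as in `if (word.equals("will")); count++;`, which makes the increment run unconditionally. For comparison, here is a minimal plain-Java sketch of the counting loop, assuming the text has already been split into a String array. The class and method names (WordCount, countWord, countWords) are hypothetical.

```java
import java.util.Arrays;

class WordCount {
    // Counts how often target appears in an array of tokens.
    static int countWord(String[] tokens, String target) {
        int count = 0;
        for (int i = 0; i < tokens.length; i++) {
            // equalsIgnoreCase() compares the characters; == would compare
            // object references and can miss matches between identical words.
            if (tokens[i].equalsIgnoreCase(target)) {   // no semicolon here
                count++;   // only runs when this token matches
            }
        }
        return count;
    }

    // Counts several target words in one pass; counts[j] belongs to targets[j].
    static int[] countWords(String[] tokens, String[] targets) {
        int[] counts = new int[targets.length];
        for (String token : tokens) {
            for (int j = 0; j < targets.length; j++) {
                if (token.equalsIgnoreCase(targets[j])) {
                    counts[j]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tokens = {"Inside", "will", "a", "tag", "you", "will",
                           "find", "will", "will", "content"};
        System.out.println(countWord(tokens, "will"));                     // 4
        int[] counts = countWords(tokens, new String[] {"will", "tag", "rocket"});
        System.out.println(Arrays.toString(counts));                       // [4, 1, 0]
    }
}
```

Keeping one counter per target word, in an array parallel to the targets, means a word that never appears correctly reports zero instead of inheriting another word’s total.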

When that failed, I was back at regex. And I now hate regex, deeply and purely, for being such a dense science that badly needs a proper introduction. A big ‘thanks’ to everybody who pointed me at the same damn impenetrable tutorial. I sort of wish I’d chosen to do a project with Arduino-controlled rockets instead, because whilst rocket science might have a reputation for being hard, it never involves typing a string of impenetrable characters into a search box and hoping against hope that this will be the last leap. (http://www.youtube.com/watch?v=1XBwWAu2a5U)

Even the more seasoned programmers threw some askance glances at my code when they saw the way that splitTokens() works, i.e. you lump every character you want to split the text on together into one string. For me, this was the string " ,.?!;:", which I’d inherited from Daniel Shiffman’s example code in “Learning Processing”. This actually made a lot more sense to me than the output of match().
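The delimiter string in splitTokens() is a set of single-character separators rather than one separator. Here is a minimal plain-Java sketch of the same behaviour using java.util.StringTokenizer, which also treats each character of its delimiter string as a separator and skips empty pieces. The method name splitOnAny is hypothetical. The same set can also be written as a regex character class, [ ,.?!;:]+, which is a small bridge between splitTokens() and regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SplitTokensSketch {
    // Each character here is a separator on its own: space, comma,
    // full stop, question mark, exclamation mark, semicolon, colon.
    static final String DELIMITERS = " ,.?!;:";

    // Splits text on any delimiter character; runs of delimiters
    // (such as ", ") never produce empty tokens.
    static String[] splitOnAny(String text, String delimiters) {
        StringTokenizer st = new StringTokenizer(text, delimiters);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "Inside will a tag, you will find will will content";

        System.out.println(String.join("|", splitOnAny(text, DELIMITERS)));
        // Inside|will|a|tag|you|will|find|will|will|content

        // The same split as a regex: one or more characters from the set.
        System.out.println(String.join("|", text.split("[ ,.?!;:]+")));
        // Inside|will|a|tag|you|will|find|will|will|content
    }
}
```

One difference to watch: String.split() returns an empty first element if the text starts with a delimiter, whereas StringTokenizer never returns empty tokens.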

According to its documentation, match() returns an array if the pattern you search for matches part of the input string: “if the sequence did match, an array is returned. If there are groups (specified by sets of parentheses) in the regexp, then the contents of each will be returned in the array. Element [0] of a regexp match returns the entire matching string, and the match groups start at element [1] (the first group is [1], the second [2], and so on).”

Okay: first problem. Putting parentheses in doesn’t make it work with multiple choices. I guess we can swap over to matchAll() for that, but without multiple sets of parentheses, and therefore multiple choices, what is the point of returning the items as an array? Surely it could be a yes/no answer? In fact, it returns an array, and that flummoxed me for several days as I realised that no matter what I put into the input string, I always got the same value back. Two.

Searching for the word ‘will’ in the phrase “Inside will a tag, you will find will will content” will only ever return the value of two. Or rather, the value of ‘yes’ transmuted into ‘two’ by way of the length of an array (element [0] holding the whole match and element [1] holding the one parenthesised group), which is an entirely erroneous way of counting. Almost as erroneous as my previous approach of stepping through the text as a string, looking at each individual part and counting the matches (again erroneously, to the tune of 559). Balls.
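Since Processing runs on Java, the same job can be done with java.util.regex directly. The sketch below counts every occurrence by calling Matcher.find() in a loop, which is closer to what matchAll() offers (one result per match) than to match(), which stops at the first. It also shows where the ‘two’ comes from. The method names countPattern and countWord are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCount {
    // Counts every non-overlapping match of a regular expression in the text.
    // Each call to find() resumes after the previous match, so the loop
    // runs once per occurrence rather than once per word in the text.
    static int countPattern(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Counts a single whole word. \b is a word boundary, so "will" is not
    // counted inside "willow"; Pattern.quote() escapes special characters.
    static int countWord(String text, String word) {
        return countPattern(text, "\\b" + Pattern.quote(word) + "\\b");
    }

    public static void main(String[] args) {
        String text = "Inside will a tag, you will find will will content";

        System.out.println(countWord(text, "will"));                   // 4

        // Multiple choices go inside one group, separated by |.
        System.out.println(countPattern(text, "\\b(will|find)\\b"));   // 5

        // One group means two strings per match: group 0 (the whole match)
        // and group 1 (the part inside the parentheses).
        Matcher m = Pattern.compile("(will)").matcher(text);
        if (m.find()) {
            System.out.println(m.groupCount() + 1);                   // 2
        }
    }
}
```

Counting matches, rather than measuring the array that describes a single match, is what turns a yes/no answer into a number.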

In my presentation, which I guess I’ll be covering in Basic Tech II (The Poptart at the End of the Universe), Atau told me that I was only half a step away from solving a few of the problems. Maybe. I can see a functioning end to this problem, just not from here. Should I use the match() function and the logic structure that I’ve been working on? There’s no guarantee that the logic structure will even work (559!). Five-five-nine! My least best guess is that my Macbook wants to emigrate to the People’s Republic of China and move to computing division 559.

Creating Displays with Processing

Again, this blog post is part of my Basic Techniques module, so you might not find it thrilling. Casual readers might want to skip this one and come back later.

As part of my Basic Techniques module, I wanted to work on something quite simple. I’ve broken it down into lots of smaller chunks, and the chunk I’ve been working on here is about moving the data gathered from counting words in a text into a graphical display.

[Image: fake_values]

Here I’ve used an ellipse to represent a number between zero and three hundred and sixty. The finished project won’t have that sort of word limit, and an ellipse would not be a good design choice, but for the purposes of this sketch it works pretty well. The important features are that the sketch passes the values out from inside a for loop (which, in the final project, will step through the document’s words as an array), that it uses a rollover-style data display, and that it offers a set of choices you can pick from to display different data.

Obviously, this is a mock-up in several ways, and the data doesn’t actually mean anything; it’s more an experiment to see how the display could be crafted in Processing. Things I’m not happy with are the immense amount of code it takes to do the rollover effect (should that be a class of its own?) and the placement of the text above the smaller arcs. I also think the main display could be improved by rounding the value off to a plain int rather than showing it with four decimal places.
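On the question of whether the rollover code should be a class of its own: here is one possible shape for such a class, written in plain Java with hypothetical names (RolloverCircle, isOver, toAngle, label). It is a sketch under those assumptions, not the code from the sketch above. In Processing itself, the built-in dist(), map() and round() functions could replace the arithmetic.

```java
class RolloverCircle {
    final float x, y, diameter;   // centre and size of the circle on screen

    RolloverCircle(float x, float y, float diameter) {
        this.x = x;
        this.y = y;
        this.diameter = diameter;
    }

    // True when the point (px, py), e.g. the mouse position, is inside the circle.
    // Comparing squared distances avoids taking a square root.
    boolean isOver(float px, float py) {
        float dx = px - x;
        float dy = py - y;
        float r = diameter / 2;
        return dx * dx + dy * dy <= r * r;
    }

    // Scales a value in [0, maxValue] to an arc angle in radians,
    // so a value equal to maxValue draws a full circle.
    static double toAngle(float value, float maxValue) {
        return value / maxValue * 2 * Math.PI;
    }

    // Rounds the displayed value to a whole number instead of showing decimals.
    static String label(float value) {
        return Integer.toString(Math.round(value));
    }

    public static void main(String[] args) {
        RolloverCircle circle = new RolloverCircle(100, 100, 80);
        System.out.println(circle.isOver(120, 110));   // true
        System.out.println(circle.isOver(150, 100));   // false
        System.out.println(label(212.4567f));          // 212
    }
}
```

Scaling against a maximum value, rather than treating the number as degrees, would also remove the zero-to-360 limit once the real word counts are plugged in.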

Click on the image above to see the sketch in action and to download the source code.

Data Mining Yourself as Artistic Practice

After my presentation for DMS8002 (the Basic Techniques module on my course), it was suggested to me that using my coursework as the source for these visualisations means that, to some extent, the artwork I’m creating is a reflection on the work I’m doing for my major module.

This reminded me of the Mail Trends project, which takes the contents of an IMAP-enabled email account and combs it for information. With that information, it produces a bunch of graphs about how the account is used. Above, you can see that I’m unlikely to send you an email at 6am. Below, you can see that I’m in touch with Brian Degger a lot. But, strangely, not as much as I’m in touch with myself… Why my own name comes up more than anybody else’s, I’m not sure; this might be a side effect of having two or three other email accounts plumbed into my Gmail account.

This project interests me because it gives you the chance to look at a body of work you produce, albeit one you produce by accident. Artistically, the output isn’t fantastic; it has colour and shape, but those are secondary to displaying the data graphically. The Feltron Report stands at the other end of this sort of practice; it’s the work of an artist who obsessively records everything he does and produces an annual report on his activity.