Pete Hindle

Pictures and stuff from a guy who likes coffee.

Tag: infovis

Basic Tech I (The Hitchhiker’s Guide to Regex)

The current state of my Basic Techniques project is this:

It doesn’t work.

However, this is a defeatist attitude, though not quite as defeatist as the other viewpoint I’ve been considering: it doesn’t work, I’m never going to understand regex, and I’m going to stop bothering with programming.

On the other hand, sometimes the ways that it doesn’t work make no sense to me. For instance, one piece of code I wrote compared the string being read to a specific string and incremented a counter once using the ‘++’ operator. Except that it didn’t: it incremented the counter 559 times, and then decided that every word I was looking for was there, 559 times.

Back to the drawing board with that code, then. I really thought that loop was going to work, too; it seemed to know exactly when and where to count as it cycled through the newly created string array holding the compartmentalised (granulised?) longer string.
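The original loop isn’t shown here, so this can’t say what actually went wrong with it. As a hedged guess, one classic Java slip that Processing inherits gives exactly the “every word, every time” symptom: a stray semicolon after the `if`, as in `if (...); count++;`, which ends the `if` early and runs the increment on every pass. (Comparing strings with `==` instead of `equals()` is the opposite trap and usually counts nothing.) Below is a minimal sketch of a counting loop over an already-split token array; the class and method names are hypothetical.

```java
// Hypothetical helper: counts how often one word appears in an array of tokens.
class WordCounter {

    // Returns the number of tokens equal to target, ignoring case.
    static int countWord(String[] tokens, String target) {
        int count = 0;
        for (String token : tokens) {
            // equalsIgnoreCase() compares characters; == would compare object references
            if (token.equalsIgnoreCase(target)) {
                count++; // braces keep the increment inside the if
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] tokens = {"Inside", "will", "a", "tag", "you", "will", "find", "will", "will", "content"};
        System.out.println(countWord(tokens, "will")); // prints 4
    }
}
```

Always using braces around the body of an `if` makes the stray-semicolon mistake much easier to spot.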

Then, when that failed, I was back at regex. And I now hate regex deeply and purely, for being such a dense science that really needs a proper introduction. A big ‘thanks’ to everybody who pointed me at the same damn impenetrable tutorial. I sort of wish I’d chosen to do a project with Arduino-controlled rockets instead, because whilst rocket science might have a reputation for being hard, it never involves typing a string of impenetrable characters into a search box and hoping against hope that this will be the last leap. (http://www.youtube.com/watch?v=1XBwWAu2a5U)

Even the more seasoned programmers looked askance at my code when they saw how splitTokens() works: you throw every character you want to split the text on together into one big string. For me, this was " ,.?!;:" (a space followed by six punctuation marks), which I’d inherited from Daniel Shiffman’s example code in “Learning Processing”. This actually made a lot more sense to me than the output of match().
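To make the “big line of delimiters” idea concrete: each character in the delimiter string is its own separator, not a sequence to match, and empty pieces are dropped. Plain Java’s `java.util.StringTokenizer` treats its delimiter string the same way, so this sketch uses it to mimic `splitTokens(text, " ,.?!;:")` outside Processing. The `Tokens` class and `split` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class Tokens {

    // Each character here is a separate delimiter:
    // space, comma, full stop, question mark, exclamation mark, semicolon, colon.
    static final String DELIMITERS = " ,.?!;:";

    // Splits text on any of the delimiter characters; runs of delimiters
    // (e.g. ", ") never produce empty tokens.
    static String[] split(String text) {
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = split("Inside will a tag, you will find will will content");
        System.out.println(words.length); // prints 10
        System.out.println(words[3]);     // prints tag (the comma is gone)
    }
}
```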

According to its documentation, match() returns an array if the pattern you search for matches something in the input string: “if the sequence did match, an array is returned. If there are groups (specified by sets of parentheses) in the regexp, then the contents of each will be returned in the array. Element [0] of a regexp match returns the entire matching string, and the match groups start at element [1] (the first group is [1], the second [2], and so on).”
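The quoted documentation implies something easy to miss: the length of the returned array depends on how many groups the pattern has, not on how many times the text matches. As far as I know, Processing’s match() also returns null when nothing matches. The sketch below is a hypothetical plain-Java stand-in (`matchFirst` is not a Processing function) built on `java.util.regex`, written to mirror that described behaviour for the first match only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchDemo {

    // Hypothetical stand-in for Processing's match(): null when there is no match,
    // otherwise [whole match, group 1, group 2, ...] for the FIRST match only.
    static String[] matchFirst(String str, String regexp) {
        Matcher m = Pattern.compile(regexp).matcher(str);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i); // group(0) is the entire match
        }
        return groups;
    }

    public static void main(String[] args) {
        String text = "Inside will a tag, you will find will will content";
        String[] result = matchFirst(text, "(will)");
        System.out.println(result.length);              // prints 2: [0] whole match, [1] group 1
        System.out.println(matchFirst(text, "rocket")); // prints null
    }
}
```

With one set of parentheses, any successful match gives an array of length 2, however many times the word appears.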

Okay: first problem. Putting parentheses in doesn’t make it work with multiple choices. I guess we could swap over to matchAll() for that, but without multiple parentheses, and therefore multiple choices, what is the point of returning the items as an array? Surely it could be a yes/no answer? In fact, it returns an array, which flummoxed me for several days as I realised that no matter what I put into the input string, it always returned the same value: two.

Searching for the word ‘will’ in the phrase “Inside will a tag, you will find will will content” will only ever return the value two. Or rather, the value ‘yes’, transmuted into ‘two’ by way of the length of an array, which is an entirely erroneous way of counting. Almost as erroneous as my previous approach of stepping through the text as a string, looking at each individual part and counting them (again, erroneously: to the tune of 559). Balls.
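To get the actual number of occurrences (four, in that phrase) rather than a yes/no, the usual regex approach is to keep asking for the next match until there are none left. If I read the Processing reference right, matchAll() returns one row per match (or null when there are none), so the length of its result would be the count. The plain-Java sketch below does the same job with `Matcher.find()`; `RegexCount` and `countOccurrences` are hypothetical names, and the `\b` word boundaries are an added assumption so that “will” is not counted inside “willing”.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCount {

    // Counts whole-word occurrences of word in text by calling find() repeatedly;
    // each call resumes searching just after the previous match.
    static int countOccurrences(String text, String word) {
        // \b is a word boundary, so "will" won't match inside "willing";
        // Pattern.quote() stops characters in word being read as regex syntax.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Inside will a tag, you will find will will content";
        System.out.println(countOccurrences(text, "will")); // prints 4
    }
}
```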

In my presentation, which I guess I’ll be covering in Basic Tech II (The Poptart at the End of the Universe), Atau told me that I was only half a step away from solving a few of the problems. Maybe. I can see a working end to this problem, just not from here. Should I use the match function and the logic structure that I’ve been working on? There’s no guarantee that the logic structure will even work (559!). Five-five-nine! My least best guess is that my MacBook wants to emigrate to the People’s Republic of China and join computing division 559.

Creating Displays with Processing

Again, this blog post is part of my basic techniques module, so you might not find it thrilling… casual readers might want to skip this one and come back later.

As part of my basic techniques module I wanted to work on something quite simple. I’ve broken it down into lots of smaller chunks, and the chunk I’ve been working on here covers how to move the data gathered from counting words in a text into a graphical display.

[Image: fake_values]

Here I’ve used an ellipse to represent a number between zero and three hundred and sixty. The finished project won’t have that sort of word limit, and an ellipse wouldn’t be a good design choice, but for the purposes of this sketch it works pretty well. The important parts are that the sketch passes the values out from inside a for loop (which, in the final project, will count through the words of the document as an array), it uses rollover-style data display, and it offers a selection of choices you can pick from to display different data.
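Since the finished project won’t cap counts at 360, one option is to scale every count against the largest one, so the biggest count fills a whole circle. This is the same job as Processing’s `map(count, 0, max, 0, TWO_PI)`, and the resulting angle in radians is what `arc()` expects for its stop angle. The plain-Java sketch below assumes non-negative integer counts; `ArcAngles` and `sweepAngles` are hypothetical names.

```java
class ArcAngles {

    // Scales each count so the largest becomes a full circle (2π radians),
    // instead of capping counts at 360. Assumes counts are non-negative.
    static double[] sweepAngles(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        double[] angles = new double[counts.length]; // defaults to 0.0
        if (max == 0) {
            return angles; // every count is zero: nothing to draw
        }
        for (int i = 0; i < counts.length; i++) {
            // same proportion as Processing's map(counts[i], 0, max, 0, TWO_PI)
            angles[i] = (double) counts[i] / max * 2 * Math.PI;
        }
        return angles;
    }

    public static void main(String[] args) {
        int[] counts = {12, 48, 30};
        double[] angles = sweepAngles(counts);
        for (int i = 0; i < counts.length; i++) {
            System.out.println(counts[i] + " -> " + Math.toDegrees(angles[i]) + " degrees");
        }
    }
}
```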

Obviously, this is a mock-up in several different ways, and the data doesn’t actually mean anything; it’s more an experiment to see how the data could be presented in Processing. Some things I’m not happy with are the immense amount of code it takes to do the rollover effect (should that be a class by itself?) and the placement of the text above the smaller arcs. I also think the main display would be better if it rounded the value off to a plain int rather than showing it to four decimal places.
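On the question of whether the rollover should be its own class: one way it could look is sketched below, with the hit test and the rounded label living inside the object so the main loop only asks each arc whether the mouse is over it. `RolloverArc`, `isOver` and `label` are hypothetical names, and the hit test is a simplifying assumption: it checks the whole circle the arc belongs to, not just the drawn wedge. In Processing itself, `round()` would do the rounding.

```java
// Hypothetical rollover class: one object per arc on screen.
class RolloverArc {
    final float x, y, radius; // centre and radius of the arc's circle, in pixels
    final float value;        // the data value this arc represents

    RolloverArc(float x, float y, float radius, float value) {
        this.x = x;
        this.y = y;
        this.radius = radius;
        this.value = value;
    }

    // True when the point is inside the arc's full circle; equivalent to
    // dist(mx, my, x, y) < radius, but without the square root.
    boolean isOver(float mx, float my) {
        float dx = mx - x;
        float dy = my - y;
        return dx * dx + dy * dy < radius * radius;
    }

    // Rounds the value to the nearest whole number for display.
    String label() {
        return String.valueOf(Math.round(value));
    }

    public static void main(String[] args) {
        RolloverArc arc = new RolloverArc(100, 100, 50, 123.4567f);
        if (arc.isOver(110, 110)) {       // stands in for mouseX, mouseY
            System.out.println(arc.label()); // prints 123
        }
    }
}
```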

Click on the image above to see the sketch in action and to download the source code.

Basic Techniques Blogpost

As part of my coursework, I’m creating a text analysis tool. The coursework also states that blog posts are part of the work in progress. So if you’re not on my course or working with Processing in some form, I doubt this post will interest you.

What is the best way of storing a number next to a word?

[Image: button_test]

I want to scan through a document and look for specific words, which I’ve stored in a string array. I thought the best way to store the results would be to create a two-dimensional array, with the words in the first array and the number of times each word appears in the second.

This turned out not to be so good, because I had fundamentally misunderstood two-dimensional arrays. The first array is more like an index, so where I thought I was creating…
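The rest of this post is cut off, so this isn’t necessarily the fix the author arrived at. One common way to keep a number next to each word is a pair of parallel arrays: the words in one array, the counts in an `int` array of the same length, with `counts[i]` always belonging to `targets[i]`. (A `String[][]` can only hold strings, so counts would have to be converted back and forth; Processing’s `IntDict` or Java’s `HashMap<String, Integer>` are other options.) `WordTally` and `tally` are hypothetical names in this sketch.

```java
// Hypothetical parallel-array tally: counts[i] is the count for targets[i].
class WordTally {

    // Counts every target word in a single pass over the tokens.
    static int[] tally(String[] tokens, String[] targets) {
        int[] counts = new int[targets.length]; // Java initialises ints to 0
        for (String token : tokens) {
            for (int i = 0; i < targets.length; i++) {
                if (token.equalsIgnoreCase(targets[i])) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] targets = {"will", "tag", "rocket"};
        String[] tokens = {"Inside", "will", "a", "tag", "you", "will", "find", "will", "will", "content"};
        int[] counts = tally(tokens, targets);
        for (int i = 0; i < targets.length; i++) {
            // prints will: 4, then tag: 1, then rocket: 0
            System.out.println(targets[i] + ": " + counts[i]);
        }
    }
}
```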