Basic Tech I – (The Hitchhiker’s Guide to Regex)
The current state of my Basic Techniques project is this:
It doesn’t work.
However, this is a defeatist attitude. Though not quite as defeatist as the other viewpoint I’ve been considering: it doesn’t work, I’m never going to understand regex, and I’m going to stop bothering with programming.
On the other hand, sometimes the ways that it doesn’t work make no sense to me. For instance, one piece of code I wrote compared the string being read to a specific string, and incremented a counter once using the ‘++’ operator. Except that it didn’t: it decided to increment the counter 559 times, and then it decided that the words I was looking for were all there, 559 times.
Back to the drawing board from that code then. I really thought that loop was going to work as well; it had all the indications of when and where, as it cycled through the newly created string array that contained the compartmentalised (granulised?) longer string.
Then, when that failed I was back at regex. And I now hate regex deeply and purely, for being such a dense science that needs introduction. A big ‘thanks’ to everybody who pointed me at the same damn impenetrable tutorial. I sort of wish I’d chosen to do a project with Arduino controlled rockets instead, because whilst rocket science might have a reputation for being hard it never involves typing a string of impenetrable characters into a search box and hoping against hope that this would be the last leap. (http://www.youtube.com/watch?v=1XBwWAu2a5U)
Even the more seasoned programmers threw some askance glances at my code when they saw the way that splitTokens() works, i.e. you throw all the characters you want to use to split up the text together in one big string. For me, this was the lump of code “ ,.?!;:”, which I’d inherited from Daniel Shiffman’s example code in “Learning Processing”. This actually made a lot more sense to me than the output of match().
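For anyone else squinting at that lump of punctuation, here is a minimal sketch in plain Java rather than Processing, assuming that Java's StringTokenizer is a fair stand-in for splitTokens(): both treat every character of the delimiter string as a separate delimiter. The class name TokenCount, the method countWord() and the sample sentence are made up for illustration and are not from my project. It also shows a counter that goes up once per matching token and no more.

```java
import java.util.StringTokenizer;

// Hypothetical helper: TokenCount and countWord are illustrative names only.
class TokenCount {
    // The delimiter set from the post: each character splits on its own.
    static final String DELIMITERS = " ,.?!;:";

    // Counts the tokens in text that equal target, ignoring case.
    static int countWord(String text, String target) {
        StringTokenizer tokens = new StringTokenizer(text, DELIMITERS);
        int count = 0;
        while (tokens.hasMoreTokens()) {
            // Compare whole tokens, so "goodwill" does not count as "will".
            if (tokens.nextToken().equalsIgnoreCase(target)) {
                count++; // incremented once per matching token
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Stop, look; listen: will it work? It will!";
        System.out.println(countWord(text, "will")); // prints 2
    }
}
```

The counter lives outside the loop and is only touched inside the comparison, which is the bit I would check first if a count ever came out wildly too high.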
According to its documentation, match() outputs an array if the sequence searched for matches what is in the input string: “if the sequence did match, an array is returned. If there are groups (specified by sets of parentheses) in the regexp, then the contents of each will be returned in the array. Element [0] of a regexp match returns the entire matching string, and the match groups start at element [1] (the first group is [1], the second [2], and so on).”
Okay: first problem. Putting parentheses in doesn’t make it work with multiple choices. I guess we can swap over to matchAll() for that, but without multiple parentheses and therefore multiple choices, what is the point of returning the items as an array? It could, surely, be a yes/no answer? In fact, it returns an array which flummoxed me for several days as I realised that no matter what I put into the string as input, its length always came back as the same value. Two.
Searching for the word ‘will’ in the phrase “Inside will a tag, you will find will will content” will only ever return the value of two. Or rather, the value of ‘yes’ transmuted into ‘two’ by way of the length of an array, which is an entirely erroneous way of doing it. Almost as erroneous as the previous way of counting through the text as a string and looking at each individual part and then counting them (again, erroneously – to the tune of 559). Balls.
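A likely explanation for the stubborn ‘two’, going by the documentation quoted above: with one pair of parentheses in the pattern, the returned array holds the whole match in element [0] and the single group in element [1], so its length is two however many times the word appears. Here is a minimal sketch in plain Java, on the assumption that java.util.regex behaves the way Processing's match() describes itself. The names RegexCount, firstMatchArrayLength() and countMatches() are hypothetical, and returning 0 for "no match" is my own simplification (match() is documented to return null in that case).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: all names here are illustrative only.
class RegexCount {
    // Size of the array a match()-style call would describe for the first hit:
    // one slot for the whole match plus one slot per parenthesised group.
    // Returns 0 when nothing matches (a simplification for this sketch).
    static int firstMatchArrayLength(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.groupCount() + 1 : 0;
    }

    // Counts every non-overlapping match by calling find() until it fails.
    static int countMatches(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Inside will a tag, you will find will will content";
        System.out.println(firstMatchArrayLength(text, "(will)")); // prints 2
        System.out.println(countMatches(text, "\\bwill\\b"));      // prints 4
    }
}
```

The first number is the shape of a single match, not a tally. The second is the tally, and the \b word boundaries keep words like "willing" from being counted.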
In my presentation – which I guess I’ll be covering in Basic Tech II (The Poptart at the End of the Universe) – I was told by Atau that I was only a half step away from solving a few of the problems. Maybe. I can see a functioning end to this problem, just not from here. Should I use the match function and the logic structure that I’ve been working on? There’s no guarantee that the logic structure will even work (559! Five-five-nine!). My least best guess is that my Macbook wants to emigrate to the People’s Republic of China and move to computing division 559.

