Category Archives: Dissertation

General comments on the progress of my dissertation.

Dissertation Continued…

Well, it’s been a hard few weeks. I recently lost my granddad to terminal lung cancer, and I want to take this opportunity to thank him for all his support. He’s the reason I was able to study for my master’s: he provided the funding for the tuition fees. He meant a lot to me and he is sorely missed.

The dissertation itself has come along a lot. A long, productive brainstorm with my brilliant supervisor, Dr. Julie Greensmith, resulted in a rework of the application and of the project's methodology.

I’ve been making the revisions and writing up the methodology over the past week or so and I’m making some good progress.

I’ve learned some new techniques in the process, such as how to manipulate Dictionary<TKey, TValue> collections. These are essentially dynamically sized associative arrays, but they need to be handled in particular ways to work well. I’m now using them because I’ve removed the symbolic string at the end of each feature vector that gives its type (i.e. normal or an attack), so I need an ID number to cross-reference the classified binary strings.

Anyway, I had a sharp lesson in optimisation and in how these collections are enumerated. Initially I was using for loops with a LINQ .ElementAt() call to step through the Dictionary collections. I realised this was not ideal, but I have numerous nested loops, and with arrays within arrays (arrayception?), aka jagged arrays, I found it easier to follow and manipulate them using for loops.

This led to some serious problems. Because I hadn’t enumerated the collection up front, every time the loop ran, the .ElementAt() call enumerated the collection again from the start to find the element it needed. As a result, writing the 500,000 feature vectors to file took about 30 minutes. To give some context, the program writes to file about 40 times, with data sets of different sizes, which was still enough to make this unworkable.

So I pulled my finger out, read up, and switched to a foreach loop, which walks the collection once from start to finish, and the same process took about 5 seconds. I felt like a muppet, to say the least. But every cloud has a silver lining: I’ve now learned how to use and manipulate another collection better, and I’ve learned more LINQ statements too. So overall, I’m going to chalk that one up as a win.
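The difference between the two loop styles can be sketched in Python (this is an illustration, not the project's C# code). It assumes a dictionary keyed by record ID, mapping to a binary feature string. The helper `element_at` is hypothetical: it mimics LINQ's ElementAt() on a Dictionary<TKey, TValue>, which has no positional index (it does not implement IList<T>), so each call has to enumerate from the start and skip ahead. In C# the fix is the same idea: `foreach (var pair in dict)` over the KeyValuePair entries.

```python
from itertools import islice
import time


def element_at(d, i):
    """Hypothetical stand-in for LINQ's ElementAt(i) on a Dictionary:
    starts a fresh enumeration and skips i items, so each call is O(i)."""
    return next(islice(d.items(), i, None))


def format_indexed(vectors):
    """Slow pattern: index loop plus element_at, O(n^2) overall."""
    lines = []
    for i in range(len(vectors)):
        record_id, bits = element_at(vectors, i)
        lines.append(f"{record_id},{bits}")
    return lines


def format_enumerated(vectors):
    """Fast pattern: one pass over the items (like C#'s foreach), O(n) overall."""
    lines = []
    for record_id, bits in vectors.items():
        lines.append(f"{record_id},{bits}")
    return lines


if __name__ == "__main__":
    # Made-up data: 5,000 record IDs mapped to 5-bit feature strings.
    vectors = {i: format(i % 32, "05b") for i in range(5000)}
    for fn in (format_indexed, format_enumerated):
        start = time.perf_counter()
        lines = fn(vectors)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {len(lines)} lines in {elapsed:.3f}s")
```

Both functions produce the same lines; the indexed version restarts the enumeration on every iteration, so its cost grows with the square of the collection size, which is why it falls apart at hundreds of thousands of records.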

I’m going to be super busy over the next few weeks finishing off this dissertation thesis, so I won’t be writing any more entries until it’s handed in. Once it’s done, I’m going to finish the projects page and upload the source code and my thesis there, with a link to the Dissertation blog page.

So that’s all for now. I’d like to finish by once again thanking my granddad for his love and support; he will be deeply missed.

Dissertation

Well, this is my first blog post, and it’s a bit of an odd experience, but I’m sure I’ll get used to it in time.

My dissertation project is on artificial immune system algorithms. For those who are not from a computer science background, an algorithm is a set of instructions for a computer to produce a desired outcome. Artificial immune systems model aspects of the human immune system in software; their main applications are in computer security, hardware fault detection, anomaly detection and pattern recognition.

My dissertation is focused on negative selection, a process in the immune system whereby immature immune cells are presented with self antigens. Those that bind to self are killed in a controlled manner called apoptosis, and the remaining cells, which are self-tolerant, are released into the body.
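The censor-then-monitor idea behind negative selection can be sketched in a few lines of Python. This is a generic version for illustration, not the author's C# implementation: detectors are random binary strings, any candidate that matches a self string is discarded (the apoptosis step), and a sample is flagged as non-self if any surviving detector matches it. The matching rule is passed in as a function; the one-bit Hamming rule and the small self set in the usage block are arbitrary choices.

```python
import random


def generate_detectors(self_set, n_detectors, length, matches,
                       max_attempts=100_000, seed=None):
    """Censoring phase: keep random candidates that match no self string."""
    rng = random.Random(seed)
    detectors = []
    attempts = 0
    while len(detectors) < n_detectors and attempts < max_attempts:
        attempts += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        # A candidate that binds to self is deleted, mirroring apoptosis.
        if any(matches(candidate, s) for s in self_set):
            continue
        detectors.append(candidate)
    return detectors


def is_non_self(sample, detectors, matches):
    """Monitoring phase: flag a sample if any self-tolerant detector matches it."""
    return any(matches(d, sample) for d in detectors)


if __name__ == "__main__":
    # Illustrative matching rule: strings match if they differ in at most one bit.
    def within_one_bit(a, b):
        return sum(x != y for x, y in zip(a, b)) <= 1

    self_set = ["00000", "00001", "00011"]
    detectors = generate_detectors(self_set, n_detectors=10, length=5,
                                   matches=within_one_bit, seed=42)
    print("detectors:", detectors)
    print("00011 flagged?", is_non_self("00011", detectors, within_one_bit))
    print("11100 flagged?", is_non_self("11100", detectors, within_one_bit))
```

Because every detector has already been tested against the self set, a self string can never be flagged by a symmetric matching rule; anything that is flagged is, by construction, something the detectors did not see during censoring.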

I’ve managed to build my application around a negative selection algorithm (NSA) that can use either r-contiguous or Hamming distance matching.
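The two matching rules can be sketched in Python, assuming equal-length binary strings. The r-contiguous rule follows its usual definition: two strings match if they agree in at least r consecutive positions. For Hamming matching, this sketch treats strings as matching when they differ in at most `max_distance` bits; that is one common convention (another counts agreeing bits against a minimum), and the parameter names are my own.

```python
def r_contiguous_match(a, b, r):
    """True if a and b agree in at least r contiguous positions."""
    if len(a) != len(b):
        raise ValueError("strings must be the same length")
    run = 0
    for x, y in zip(a, b):
        # Extend the current run of agreeing bits, or reset it on a mismatch.
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def hamming_distance(a, b):
    """Number of positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must be the same length")
    return sum(x != y for x, y in zip(a, b))


def hamming_match(a, b, max_distance):
    """True if a and b differ in at most max_distance bits."""
    return hamming_distance(a, b) <= max_distance


if __name__ == "__main__":
    print(r_contiguous_match("10110", "00111", r=3))  # prints True (positions 1-3 agree)
    print(hamming_distance("10110", "00111"))         # prints 2
```

The r-contiguous rule cares about where the agreement is, while Hamming distance only counts mismatches; "11100" and "11011" share three bits in total but only two in a row, so they match under a Hamming threshold of 2 but not under r = 3.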

I wrote the app in C#, which turned out not to be the best choice for this type of research program, because I represent each data feature as a binary string and C# will not cast a bool to 1 or 0 (or an int back to a bool). Every conversion needs an explicit step such as Convert.ToInt32(), which became a significant issue because I needed the bits as numerical values (int, double) to do calculations.

Using the 1999 KDD dataset also proved more difficult than I initially thought, as it mixes value types: strings for symbolic attributes, ints and doubles. The KDD data is meant to represent network traffic, but overall it is a higher-level abstraction than I needed. So I had to strip out redundant attributes and convert each remaining attribute to an int, 1 or 0, to form a binary string, using each attribute’s median value as the threshold for deciding whether it is true or false.
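The median-threshold conversion can be sketched in Python as follows. Assumptions, since the post does not spell them out: records are dicts of attribute values, symbolic and redundant attributes are dropped simply by not listing them in the kept columns, and a value strictly above its column's median becomes 1 while anything at or below it becomes 0 (tie handling is my guess). The attribute names are real KDD features, but the values are made up.

```python
from statistics import median


def binarise_by_median(records, columns):
    """Convert numeric attributes to one bit string per record.

    Each kept column's median is computed over all records; a value above
    the median becomes '1', otherwise '0' (ties map to '0' by assumption).
    Returns the bit strings and the medians used, so the same thresholds
    can be applied to test data later.
    """
    medians = {c: median(rec[c] for rec in records) for c in columns}
    bit_strings = [
        "".join("1" if rec[c] > medians[c] else "0" for c in columns)
        for rec in records
    ]
    return bit_strings, medians


if __name__ == "__main__":
    # Made-up values; 'protocol_type' is symbolic and is simply not selected.
    records = [
        {"duration": 0, "protocol_type": "tcp", "src_bytes": 100, "dst_bytes": 0},
        {"duration": 5, "protocol_type": "udp", "src_bytes": 200, "dst_bytes": 40},
        {"duration": 10, "protocol_type": "tcp", "src_bytes": 50, "dst_bytes": 90},
    ]
    bits, medians = binarise_by_median(records, ["duration", "src_bytes", "dst_bytes"])
    print(medians)
    print(bits)
```

Returning the medians alongside the bit strings matters: thresholds computed on training data should be reused unchanged when converting test data, otherwise the same raw value could map to different bits in the two sets.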

Anyway, that’s more than enough for a late-night first blog post. I’ll keep updating this blog on my progress with the dissertation, and after it’s completed I’ll continue to post updates on my personal projects and any open source projects I become involved in.