Sunday, December 15, 2013

Scientific Computing: Modeling Neuroplasticity

The pursuit of strong artificial intelligence involves numerous areas of research. While much current AI research focuses on reproducing specific intelligent tasks the brain is capable of, disciplines like computational neuroscience seek to understand how the brain works more generally. In a recent study from the field, researchers at MIT modeled how the brain learns new things, a capacity known as neuroplasticity, while still retaining what it has already learned.

The research suggests that neurons are constantly trying out new configurations of connections to other neurons, searching for arrangements that let the brain learn as many tasks as it needs to. This allows some neurons to specialize in certain tasks while others remain free to learn new ones.

One key element of the study that had not been widely explored before was how noise acts within the model. The researchers found that noise could actually benefit the model: when the model is hyperplastic, noise drives the exploration of more new connection configurations. They concluded that rather than hindering learning, the noise helped the model learn a variety of new things while retaining the ability to do old ones. The model also helps explain why skills diminish when not practiced often enough: after too much time has elapsed, new connections eventually begin to overwrite old skills.
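The idea of noisy exploration of connection configurations can be sketched in code. The toy below is purely illustrative, not the MIT model: a "configuration" is a binary vector of candidate connections, fitness counts how well it matches an arbitrary target wiring for some task, and a noise parameter occasionally keeps changes that do not immediately help, letting the network explore.

```python
import random

random.seed(42)

# Hypothetical target wiring for some task; fitness counts matches.
TARGET = [random.randint(0, 1) for _ in range(40)]

def fitness(config):
    return sum(c == t for c, t in zip(config, TARGET))

def rewire(config, noise=0.05, steps=2000):
    """Noisy exploration: each step tentatively flips one connection.
    The flip is kept if it does not hurt, and occasionally (with
    probability `noise`) kept anyway, so the network keeps exploring."""
    best = list(config)
    for _ in range(steps):
        trial = list(best)
        i = random.randrange(len(trial))
        trial[i] ^= 1  # try a new connection configuration
        if fitness(trial) >= fitness(best) or random.random() < noise:
            best = trial
    return best

start = [0] * 40
end = rewire(start)
print(fitness(start), "->", fitness(end))
```

With `noise=0` this is plain hill climbing; a small positive noise rate trades a little stability for broader exploration, loosely mirroring the hyperplastic regime described above.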

Only time will tell if this research will lead to further research and findings on the subject or just remain an interesting fact. Regardless, any breakthrough such as this in computational neuroscience helps towards both our understanding of how the human brain works in general as well as the long term goal of trying to create an intelligence on par with it.

Computer Graphics: Breakthrough in Image Pattern Detection

In a research experiment involving unsupervised machine learning, Google may have discovered the most significant image pattern on the internet, or at least on YouTube, where the experiment was performed. The system was designed to detect and rank patterns in imagery, which researchers could then analyze for what they represent. It may come as no surprise, then, that this system succeeded in detecting what is important to the users of YouTube, and most of the internet in general: cats.

What was originally designed to detect significant patterns in imagery data became the world's first cat detector: after training on YouTube videos over the course of three days, imagery of cats ranked among the most frequently detected, and thus most significant, patterns. According to an article on Slate, linear tool-like objects held at about a 30-degree angle were another commonly detected feature, and after a round of supervised learning was added, the classifier could detect human faces with around 82% accuracy.
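The core mechanism, stripped of its billion-parameter scale, is that a network trained only to reconstruct its input is forced to discover whatever patterns recur in the data. The sketch below is a heavily simplified, hypothetical stand-in for Google's system: a one-hidden-layer autoencoder trained by plain gradient descent on tiny synthetic "images" (16-pixel strips containing a bright blob), where the recurring blob plays the role of the cats.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n=200, size=16):
    """Synthetic 'images': noise plus a bright 4-pixel blob at a
    random position -- the recurring pattern to be discovered."""
    X = rng.normal(0, 0.1, (n, size))
    for row in X:
        p = rng.integers(0, size - 4)
        row[p:p + 4] += 1.0
    return X

def train_autoencoder(X, hidden=4, lr=0.05, epochs=1000):
    """Train encoder W1 / decoder W2 to reconstruct X, returning the
    final mean squared reconstruction error."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden))
    W2 = rng.normal(0, 0.1, (hidden, d))
    for _ in range(epochs):
        H = np.tanh(X @ W1)           # encode into hidden features
        R = H @ W2                    # reconstruct the input
        err = R - X
        gW2 = H.T @ err / n           # gradients of the squared error
        gH = (err @ W2.T) * (1 - H ** 2)
        gW1 = X.T @ gH / n
        W1 -= lr * gW1
        W2 -= lr * gW2
    H = np.tanh(X @ W1)
    return np.mean((H @ W2 - X) ** 2)

X = make_data()
final_loss = train_autoencoder(X)
baseline = np.mean(X ** 2)            # error of an all-zero reconstruction
print(round(final_loss, 4), "vs baseline", round(baseline, 4))
```

No labels are involved at any point; the hidden units end up encoding the blob's presence and position simply because that is the cheapest way to reconstruct the data, which is the same reason cat-like features emerged from unlabeled YouTube frames.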

While perhaps not the most useful breakthrough in unsupervised learning on image data at first glance, the model does demonstrate the capability of emerging methods to detect meaningful features in images and video. Perhaps with some tweaking the system will also be able to detect dogs, but no breakthroughs in this area have emerged yet.

Communications and Security: Knowing How Your Code Works

It may seem obvious when stated, but knowing exactly how your code works and what it is doing, especially in edge cases, is a very important aspect of security. When an attacker discovers a bug or "feature" in your code that you are unaware of, it can often be exploited to varying degrees of maliciousness. The well-known service Spotify learned this lesson first-hand when an exploit involving the way it processed usernames was found.

The key mistake was allowing users to register usernames containing any valid Unicode character while storing a more restricted version internally. The username actually stored and checked against for internal purposes was processed with Python's lower() string method, which apparently maps a large number of Unicode characters into the 26-character space of lowercase English ASCII letters. As a result, many different usernames could collapse into the exact same username once processed by lower(). The attack itself used this to hijack user accounts.

Existing Spotify accounts could be hijacked by signing up for a new account under a username that mapped to an existing username when processed by lower(), then submitting a password reset request. The account creation process associated the new email address with the old account as a change of address, so the password reset email was sent to the attacker's address instead of the original user's. Once the password was reset, the attacker could log in as the original user with the new password.

While this is a more obscure case of code quietly behaving in ways its authors did not intend, which arguably also makes it a more interesting one, it does show the importance of verifying code behavior, as well as user inputs, for security purposes. Luckily for Spotify, the user who discovered this exploit reported it quickly, but the results could have been much worse under different circumstances.

Artificial Intelligence: Why You Should Have Paid Attention in Your Statistics Classes

While not a particularly new field in Computer Science, Artificial Intelligence has gone through many changes over the years, with many advances and setbacks. As it has become increasingly widespread and marked new successes over the past decade or two, one thing has become clear: statistics and probabilistic approaches seem to be the key to continued success in just about all facets of the field.

One area within AI that has seen tremendous success from statistical and probabilistic approaches is Computational Linguistics. The area has a longtime rivalry between approaches based on hand-crafted rules from linguists and statistical/probabilistic approaches. As Peter Norvig notes, the majority of systems successfully solving problems within Computational Linguistics use statistical and/or probabilistic approaches at least partially, if not entirely. These include such popular and widespread application areas as search engines, machine translation, and speech recognition. Since new research in this area tends toward newer and better statistical and probabilistic approaches, this trend does not seem likely to change anytime soon.

Yet another area with traditionally more formal roots that has benefited greatly from probability and statistics is Graph Theory. Probabilistic graph-based models building on the work of Thomas Bayes and Andrey Markov are now widespread throughout Artificial Intelligence, and the applications of such models may be limitless. They are used widely in Computational Linguistics, pattern recognition, and Bioinformatics, to name just a few areas, and are capable of encoding the fluctuations, randomness, and uncertainty that are a part of most things we try to model.
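The simplest member of this family is a Markov chain: a graph whose nodes are states and whose edge weights are transition probabilities. The sketch below is a made-up two-state weather example, not drawn from any system mentioned above, but it shows how a handful of probabilities on a graph encode uncertain, fluctuating behavior.

```python
import random

random.seed(1)

# Transition graph: from each state, the probabilities of the next state.
TRANSITIONS = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def step(state):
    """Sample the next state according to the outgoing edge weights."""
    r = random.random()
    total = 0.0
    for nxt, p in TRANSITIONS[state]:
        total += p
        if r < total:
            return nxt
    return nxt  # guard against floating-point rounding

def simulate(state, n):
    seq = [state]
    for _ in range(n):
        state = step(state)
        seq.append(state)
    return seq

print(simulate("sunny", 10))
```

Markov models used in speech recognition or bioinformatics are built from exactly this ingredient, just with far more states and with the states themselves often hidden behind noisy observations.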

Machine Learning is another popular area within Artificial Intelligence that makes heavy use of statistics. Many machine learning techniques are, in fact, new approaches to classic problems from statistics such as linear and logistic regression. A number of other parallels exist as well, showing how interrelated Machine Learning and Statistics are.
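Linear regression makes the overlap concrete: the same model that an ML course fits with a training loop has a classic closed-form statistical solution. A minimal sketch of the textbook least-squares formulas for a line y = a*x + b:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b via the classic closed form:
    slope = covariance(x, y) / variance(x), intercept from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Points lying exactly on y = 2x + 1 are recovered exactly.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)   # -> 2.0 1.0
```

Gradient-descent training of the same model converges to the same coefficients; the statistical and machine-learning views differ in method, not in the underlying problem.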

While math in general always seems to be a driving force behind the discovery of new techniques and algorithms in software, statistics and probability in particular are becoming increasingly important in computer science. It doesn't seem likely that computer science as a discipline will ever break away from its dependence on math, so to the CS majors out there: be sure to pay attention in math class!