Meet the algorithm that can learn “everything about anything”

Posted by Brandon Klein

The most recent advances in artificial intelligence research are pretty staggering, thanks in part to the abundance of data available on the web. We’ve covered how deep learning is helping create self-teaching and highly accurate systems for tasks such as sentiment analysis and facial recognition, but there are also models that can solve geometry and algebra problems, predict whether a stack of dishes is likely to fall over and (from the team behind Google’s word2vec) understand entire paragraphs of text.

(Hat tip to frequent commenter Oneasum for pointing out all these projects.)

One of the more interesting projects is a system called LEVAN, short for Learn EVerything about ANything, created by a group of researchers from the Allen Institute for Artificial Intelligence and the University of Washington. One of them, Carlos Guestrin, is also co-founder and CEO of a data science startup called GraphLab. What sets LEVAN apart is that it’s neither human-supervised nor unsupervised (as many deep learning systems are), but what its creators call “webly supervised.”


What that means, essentially, is that LEVAN uses the web to learn everything it needs to know. It scours Google Books Ngrams to learn common phrases associated with a particular concept, then searches for those phrases in web image repositories such as Google Images, Bing and Flickr. For example, LEVAN now knows that “heavyweight boxing,” “boxing ring” and “ali boxing” are all part of the larger concept of “boxing,” and it knows what each one looks like.
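To make that two-step flow concrete, here is a rough Python sketch of the idea. It is not LEVAN's actual code: the tiny `TOY_NGRAMS` table stands in for Google Books Ngrams, `fake_image_search` stands in for a real image search service, and every function name here is made up for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

# Toy stand-in for n-gram frequency data (hypothetical counts).
TOY_NGRAMS: Dict[str, int] = {
    "heavyweight boxing": 120,
    "boxing ring": 300,
    "ali boxing": 80,
    "boxing": 1000,
    "ring road": 500,
}


def related_phrases(concept: str, ngram_counts: Dict[str, int], top_k: int = 3) -> List[str]:
    """Step 1: find the most frequent phrases that mention the concept."""
    concept = concept.lower()
    matches = Counter({
        phrase: count
        for phrase, count in ngram_counts.items()
        # Keep multi-word phrases containing the concept as a whole word.
        if concept in phrase.lower().split() and phrase.lower() != concept
    })
    return [phrase for phrase, _ in matches.most_common(top_k)]


def fake_image_search(phrase: str) -> List[str]:
    """Hypothetical placeholder for an image search; returns made-up file names."""
    return [f"{phrase.replace(' ', '_')}_{i}.jpg" for i in range(2)]


def collect_examples(
    concept: str,
    ngram_counts: Dict[str, int],
    image_search: Callable[[str], List[str]],
    top_k: int = 3,
) -> Dict[str, List[str]]:
    """Step 2: look up images for each phrase, grouped by phrase."""
    return {
        phrase: image_search(phrase)
        for phrase in related_phrases(concept, ngram_counts, top_k)
    }


examples = collect_examples("boxing", TOY_NGRAMS, fake_image_search)
for phrase, images in examples.items():
    print(phrase, images)
```

In a real system, the images gathered for each phrase would then be used to train a visual model for that phrase; the sketch stops at the data-gathering stage described above.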