Machines Can See but Cannot Yet Fathom What They See
Alexandros Louizos, MD, Deep Learning Engineer
We have traveled a long way since the Perceptron was introduced in Frank Rosenblatt's Principles of Neurodynamics in 1961. We had to wait almost 50 years, but the progress has been remarkable. Mini vision brains have found their way even into our mobile phones with TensorFlow Mobile or the Google Vision Kit, and the Intel Movidius can give a powerful vision boost to Raspberry Pi devices.
And this is only the beginning….
But how exactly do machines see? Imagine a picture like a chessboard of 8×8 boxes. Every box holds a number from 0 to 255: the intensity of that pixel.
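The chessboard analogy above can be sketched in a few lines of NumPy (the 8×8 size and the random contents are purely illustrative):

```python
import numpy as np

# A tiny 8x8 grayscale "image": every box of the chessboard holds
# a pixel intensity from 0 (black) to 255 (white).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)

# To the computer, the picture is nothing but this grid of numbers.
```

A real photo works the same way, just with millions of boxes (and three such grids for red, green, and blue).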
The big question is how to take these numbers and understand what the image shows. Is it a cat or is it a dog? Think about it for a moment: all the computer knows is numbers. It is not much different from what our brain sees. All the brain receives is activations of neurons, or the lack of them. And still we make sense of the world.
We tried for many years, and with many techniques, to make machines see. Until around 2011 we were mostly hand-crafting features: fixed filters that were slid over the image (these filters remind me of a childhood toy…).
But hand-crafted filters do not account for exceptions, they do not combine well across many dimensions, and their success is limited.
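To make the idea of a hand-crafted filter concrete, here is a minimal NumPy sketch of one classic example, the Sobel kernel for vertical edges, slid over a toy image (the image and the plain convolution loop are illustrative, not how production libraries implement it):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image ('valid' mode)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-crafted filter: the Sobel kernel responds to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# A toy image: dark on the left half, bright on the right half.
image = np.zeros((8, 8))
image[:, 4:] = 255.0

edges = convolve2d(image, sobel_x)
# The response is large only near the dark/bright boundary
# and zero in the flat regions.
```

A human expert had to design that kernel by hand, and it only catches one kind of pattern; that is exactly the limitation described above.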
Then, around 2011, several groups proposed a different approach: scan through the image with small boxes (filters) that keep certain parts and suppress others, then run many small experiments. Filters that produce good results are weighted more; those that hurt accuracy are discarded. There you go: not only have you automated the creation of the filters, you have also automated the filter selection process.
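The "many small experiments" idea can be caricatured in NumPy: sample random filters, score each one on how well its response separates two toy image classes, and keep the winner. (This random-search toy is only an illustration of the selection idea; real networks adjust their filters with gradient descent and backpropagation rather than by sampling.)

```python
import numpy as np

rng = np.random.default_rng(1)

def response(image, kernel):
    """Mean absolute response of a 3x3 kernel slid over the image."""
    h, w = image.shape
    vals = []
    for i in range(h - 2):
        for j in range(w - 2):
            vals.append(abs(np.sum(image[i:i + 3, j:j + 3] * kernel)))
    return float(np.mean(vals))

# Two toy classes: an image with a vertical edge vs. a flat image.
edge_img = np.zeros((8, 8)); edge_img[:, 4:] = 1.0
flat_img = np.full((8, 8), 0.5)

# "Many small experiments": sample random filters and score how well
# each one separates the two classes by its response gap.
filters = rng.normal(size=(50, 3, 3))
scores = [response(edge_img, f) - response(flat_img, f) for f in filters]

# Keep the filter that separates best; drop the rest.
best = filters[int(np.argmax(scores))]
```

Swap the random search for gradient descent on those same kernel weights and you have, in spirit, the learning step of a convolutional network.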
And this is what we call a Convolutional Neural Network. It is as simple as that; everything else is incremental improvement on top of this basic premise.
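Putting the pieces together, a single convolutional "layer" is just convolution, a nonlinearity, and pooling chained in sequence. Here is a minimal forward-pass sketch in plain NumPy (the random image and filter are placeholders; real networks stack many such layers with learned filters):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Nonlinearity: keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Downsample by keeping the strongest response in each patch."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# One conv layer with a single random 3x3 filter,
# followed by the classic ReLU + max-pooling pair.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.normal(size=(3, 3))

features = max_pool(relu(conv2d(image, kernel)))
# Shapes: 8x8 image -> conv -> 6x6 -> 2x2 pooling -> 3x3 feature map.
```

Stack a few of these layers, feed the final feature map into a small classifier, and train the filters by gradient descent: that is the whole CNN recipe in outline.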
Take a look below at some of the filters that are learned with this method.
But we are starting to realize that, although this automated filter methodology gives amazing results, we are still far away from intelligent machines. By breaking images into parts, we seem to have lost the ability to understand the content of the image as a whole.
The billion-dollar question of today's AI is: how do we connect the pieces we can see in pictures into a coherent story of what the image contains?
These architectures are the next level of AI.
If you want to learn about machine vision, where should you start?
1) Python is the language of choice. If you know Python, you are set. Theoretically you could do it in any language, but Python has all the modules ready.
2) Do the Andrew Ng course on machine learning: https://www.coursera.org/learn/machine-learning . Many experts on the AI teams at Google and Facebook started there.
3) Do Stanford's CS231n, Convolutional Neural Networks for Visual Recognition: http://cs231n.stanford.edu/ . A superb source of knowledge.
4) Start playing with datasets and code. Go to kaggle.com and read prior code and solutions. It will help you a lot in understanding real implementations.
5) Intern at a company with experienced machine vision people. This will give you the portfolio and experience to move on to the next stage.
Are you interested in an internship? At Maxnmachina we are looking for interns. We will pair you with experienced machine vision experts and teach you all the secrets.
Impress us and we will hire you. Sound cool?