CERN…Science will save us all!

I would like to begin this post with a quote from Monstrous Regiment by Terry Pratchett: “The presence of those seeking the truth is infinitely to be preferred to the presence of those who think they’ve found it.” Hopefully this will make sense once you read this post.

I had the pleasure of contributing to the DEEP NLP project for document analysis and classification at CERN in Switzerland/France. Yes, CERN sits astride the Franco-Swiss border, and here physicists and engineers are tasked with probing the fundamental structure of the universe.


CERN is where the Higgs boson, also known as the God particle, was discovered. The science of particle discovery relies mainly on purpose-built particle accelerators and detectors. Accelerators boost beams of particles to high energies before the beams are made to collide with each other or with stationary targets. Detectors observe and record the results of these collisions. According to CERN, the particles are so tiny that the task of making them collide is akin to firing two needles 10 kilometers apart with such precision that they meet halfway!

The Large Hadron Collider (LHC), which is located at CERN, is the world’s largest and most powerful particle accelerator. The LHC consists of a 27-kilometre ring of superconducting magnets with a number of accelerating structures to boost the energy of the particles along the way. The LHC hosts a number of particle detection experiments, the most notable being ATLAS and CMS. ATLAS was crucial in the discovery of the Higgs boson, and the interactions in the ATLAS detectors create an enormous flow of data. ATLAS generates ~1 petabyte of data per second, which is approximately four times the internet’s output. Below is a picture inside Continue reading

Resolving OpenCV issues to run Tiny YOLO on the Movidius Neural Compute Stick

I recently got my hands on the Intel Movidius Neural Compute Stick. It is an amazing piece of hardware that addresses the need for intelligence at the edge. The Movidius Neural Compute Stick (NCS) is a low-cost developer kit in a small form factor for low-power, vision-based embedded inference applications. It enables you to develop low-power intelligent edge device solutions for image processing using deep learning algorithms.

I came across a great tutorial on getting started with the Movidius NCS and Tiny YOLO in this blog post: https://blog.codecentric.de/2017/10/objekterkennung-mit-neuronalen-netzen-movidius-neural-compute-stick/

I would like to share a workaround for a very frustrating problem that is not covered in the tutorial. I ran into it while trying to run the yolo_object_detection_app example, which detects objects in a video stream captured by the webcam and marks them in the video.

I was not able to get the demo to run because I had initially installed OpenCV from the Debian repository, and that build has no webcam support. As a result, running the demo produces the error below in the terminal:

[Screenshot: error output from the terminal]

Since this version of OpenCV has no webcam support, it’s not possible to get a video stream for processing. You can verify this by checking the frame that the code passes to cv2.imshow(): it is ‘None’.
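To make that check concrete, here is a minimal sketch that is not taken from the tutorial. The helper names diagnose_capture and check_webcam are my own, and camera index 0 is an assumption about which device your webcam is. It only uses cv2.VideoCapture, isOpened(), read() and release().

```python
def diagnose_capture(is_opened, read_ok, frame):
    """Turn the results of cap.isOpened() and cap.read() into a short message."""
    if not is_opened:
        return "camera could not be opened"
    if not read_ok or frame is None:
        return "camera opened but returned no frame"
    return "frame received"


def check_webcam(index=0):
    """Try to grab one frame from the webcam (index 0 is an assumption)."""
    # Imported here so diagnose_capture stays usable without OpenCV installed.
    import cv2

    cap = cv2.VideoCapture(index)
    try:
        is_opened = cap.isOpened()
        # read() returns a (success flag, frame) pair; frame is None on failure.
        read_ok, frame = cap.read() if is_opened else (False, None)
        return diagnose_capture(is_opened, read_ok, frame)
    finally:
        cap.release()
```

Calling print(check_webcam()) should print one of the two failure messages on a build without webcam support, and "frame received" once capture works.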

To resolve this issue, first remove all the OpenCV packages installed from the Debian repo. Run pip3 list | grep opencv and remove all the entries listed.
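If you would rather do the same lookup from Python, the sketch below is a rough equivalent of the grep step. The function names are hypothetical and it relies only on the standard library's importlib.metadata, so it sees the packages visible to the interpreter you run it with, which may not match pip3's list exactly.

```python
from importlib.metadata import distributions


def find_opencv_packages(names):
    """Return the sorted, de-duplicated names that contain 'opencv'."""
    return sorted({n for n in names if n and "opencv" in n.lower()})


def installed_names():
    """List the names of all distributions visible to this interpreter."""
    return [dist.metadata["Name"] for dist in distributions()]
```

Running print(find_opencv_packages(installed_names())) lists the candidates to remove. An empty list means no OpenCV package is visible to this interpreter.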

Once that is done, install OpenCV from this GitHub source, http://milq.github.io/install-opencv-ubuntu-debian/, by running the installation script with bash install-opencv.sh. Everything should work fine once the installation is finished! Happy exploration with the Movidius NCS!

 

Key ICML 2018 Takeaways

I had the pleasure of attending this year’s edition of the International Conference on Machine Learning, which was held in Stockholm, Sweden. I was showcasing the Intel Smart Park solution, an AI system meant to assist drivers in finding available parking slots in a parking lot. It is basically a 5-layer CNN that determines whether a slot is available or not; the candidate slots are then passed through a graph algorithm to determine the one that is closest and most convenient for the driver. This is integrated into the car’s HUD, which provides the driver with instructions to navigate to the assigned spot. You can see the demo displayed in the background.
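The slot-selection step can be illustrated with a small sketch. This is not the Smart Park implementation: I am assuming Dijkstra's shortest-path search as the graph algorithm, and the lot layout, node names and distances are all made up. The CNN's output is represented simply as a list of free slots.

```python
import heapq


def nearest_free_slot(graph, start, free_slots):
    """Return (slot, distance) for the closest free slot, or None if none is reachable.

    graph maps a node to {neighbour: non-negative distance}.
    """
    free = set(free_slots)
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        if node in free:
            return node, dist  # first free slot popped is the closest one
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return None


# Hypothetical lot: distances are in metres and purely illustrative.
lot = {
    "entrance": {"aisle_a": 5.0, "aisle_b": 6.0},
    "aisle_a": {"slot_1": 2.0, "slot_2": 4.0},
    "aisle_b": {"slot_3": 1.0},
}
```

With slot_2 and slot_3 reported free, nearest_free_slot(lot, "entrance", ["slot_2", "slot_3"]) picks slot_3, since it is 7 metres from the entrance against 9 for slot_2.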

In addition to this, I was able to attend a number of very intriguing presentations at the conference, the most memorable being the presentation by Sanjeev Arora from Princeton titled Toward Theoretical Understanding Continue reading

Memory Traffic Optimization to Improve Application Performance

Memory traffic optimization yields the greatest speedup of all the techniques used to optimize an application, the others being vectorization and multithreading. If we really want to leverage the power of vectorization, we have to optimize data re-use in caches. It is also important to understand that vector arithmetic in modern processors is cheap; it is memory access that is expensive. It is therefore paramount that we optimize memory access for bandwidth-bound applications.

But first, before looking at locality of memory access in space and time, let’s have a quick refresher on vectorization and multithreading.
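As a rough illustration of what re-using data in caches looks like in code, here is a sketch of loop tiling (cache blocking) applied to a matrix transpose. The function name and the tile size are my own choices. Pure Python will not show the speedup, so treat it as a picture of the access order that a C or Fortran version would use.

```python
def transpose_tiled(a, n, tile=4):
    """Transpose an n x n matrix stored row-major in a flat list, one tile at a time."""
    out = [0] * (n * n)
    for i0 in range(0, n, tile):          # walk over tiles of rows
        for j0 in range(0, n, tile):      # walk over tiles of columns
            # Inside a tile, reads and writes stay within a small block,
            # which is what keeps the working set cache-resident in compiled code.
            for i in range(i0, min(i0 + tile, n)):
                for j in range(j0, min(j0 + tile, n)):
                    out[j * n + i] = a[i * n + j]
    return out
```

The result is the same as a plain two-loop transpose; only the order in which elements are touched changes.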

Vectorization basically means having a single instruction operate on multiple data elements. The speedup from vectorizing your application will depend on the instruction set of your hardware since the Continue reading

Twelve Ways to Fool the Masses when giving Performance results on Parallel Computers

I came across one of the most interesting and humorous research papers while doing my nightly reads. The paper, titled Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers, was written by David H. Bailey and published in 1991. You can download the full paper here.
The title describes exactly what the paper is about, and I’ll just share some interesting snippets from the document.

To quote the abstract in part:
Many of us in the field of highly parallel scientific computing recognize that it is often quite difficult to match the run time performance of the best conventional supercomputers. But since lay persons usually don’t appreciate these difficulties and therefore don’t understand when we quote mediocre performance results, it is often necessary for us to adopt some advanced techniques in order to deflect attention from possibly unfavorable facts.

Continue reading