The number of videos available on the Internet is growing rapidly: every minute, over 400 hours of new video are uploaded to YouTube.
In this context, an increasing number of experts are trying to analyse these videos for various purposes such as search, recommendation, ranking, and so on. In this post, we will talk about video labelling and Convolutional Neural Networks, a class of deep neural networks that can be applied to this problem.
Since I first approached Machine Learning during my Ph.D. in Statistics, I have always tried to compare the statistical framework with the machine learning one. Both fields are intrinsically based on data, and their ultimate goal is to extract some kind of knowledge from it, so where exactly is the difference? What is inherently different between these two fields?
Spark and HBase are a great combination for many very interesting Big Data use cases.
A typical one is using HBase as a system of record for storing time series coming, for example, from a network of sensors. With the advent of the IoT we can imagine how important it is to be able to reliably store huge amounts of measurements and to perform analytics on top of them.
In this kind of scenario, a typical approach is to store a single measurement keyed by its timestamp in an HBase row. In the rest of this post I'll talk about the main problem you can hit when storing this kind of data in HBase and how to deal with it.
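To make the row layout concrete, here is a minimal sketch of writing one such measurement with the HBase Java client; the table name (measurements), column family (d) and qualifier are invented for illustration, not taken from the post:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SensorMeasurementWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("measurements"))) {
            long timestamp = System.currentTimeMillis();
            // Row key built directly from the measurement timestamp. Note that a
            // purely monotonic key sends consecutive writes to the same region,
            // a well-known concern with this kind of layout.
            Put put = new Put(Bytes.toBytes(Long.toString(timestamp)));
            put.addColumn(Bytes.toBytes("d"),           // column family
                          Bytes.toBytes("temperature"), // measurement qualifier
                          Bytes.toBytes("21.5"));       // measured value
            table.put(put);
        }
    }
}
```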
If you are implementing a Big Data infrastructure for streams of usage data, you will face several critical requirements that your architecture must handle (e.g. high throughput, low latency, real-time processing, distribution) and several technologies you could use (e.g. Kafka, Spark, Flink, Storm, OpenTSDB, MongoDB, etc.). In this article we describe a solution for collecting events, transforming them into data points (associated with metrics you wish to track over time), and storing the data points in a way that allows them to be further analysed and visualised.
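As a rough illustration of the event-to-data-point step, the sketch below turns a parsed usage event into a data point carrying a metric name, a timestamp, a value, and tags. The field names, the metric naming scheme, and the event format are assumptions made for the example, not the actual model described in the article:

```java
import java.time.Instant;
import java.util.Map;

// A hypothetical data point: one value of a metric tracked over time,
// with tags that allow slicing during later analysis and visualisation.
public class DataPoint {
    final String metric;            // e.g. "usage.page_view"
    final Instant timestamp;        // when the event occurred
    final double value;             // the measured value
    final Map<String, String> tags; // dimensions such as user or country

    DataPoint(String metric, Instant timestamp, double value, Map<String, String> tags) {
        this.metric = metric;
        this.timestamp = timestamp;
        this.value = value;
        this.tags = tags;
    }

    // Turn a raw usage event (here already parsed into a map) into a data point.
    // Each event counts once; aggregation over time windows happens downstream.
    static DataPoint fromEvent(Map<String, String> event) {
        return new DataPoint(
                "usage." + event.get("action"),
                Instant.ofEpochMilli(Long.parseLong(event.get("ts"))),
                1.0,
                Map.of("user", event.get("user")));
    }
}
```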
Recently I started to get my hands dirty with this great library: deeplearning4j. I found this library really great, and I'm using it as a way to learn a bit more about the fantastic world of Deep Learning.
What I found interesting is their approach to scaling the learning phase. Scaling the process of training a neural network is pretty tough; fortunately, practical approaches have recently emerged to accomplish this goal: exploiting a cluster of CPUs, with or without GPUs, to accelerate the training of complex neural networks when the training set is very big.
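For a flavour of what this looks like in practice, below is a rough sketch of deeplearning4j's Spark-based training with parameter averaging. The tiny network, batch sizes, and builder options are placeholders I chose for the example, and exact package names and signatures can differ between library versions:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class DistributedTrainingSketch {

    public static void train(JavaSparkContext sc, JavaRDD<DataSet> trainingData) {
        // A small feed-forward network; in practice this would be the complex
        // network whose training we want to accelerate.
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .list()
                .layer(0, new DenseLayer.Builder().nIn(784).nOut(256)
                        .activation(Activation.RELU).build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(256).nOut(10)
                        .activation(Activation.SOFTMAX).build())
                .build();

        // Parameter averaging: each Spark worker trains on its partition of the
        // data, and the parameters are periodically averaged into a global model.
        ParameterAveragingTrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(32)
                .batchSizePerWorker(32)
                .averagingFrequency(5)
                .build();

        SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);
        sparkNet.fit(trainingData); // training is distributed across the cluster
    }
}
```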
Graphs are all around us. They can model countless real-world phenomena, ranging from the social to the scientific, including engineering, biology, medical systems, IoT systems, and e-commerce systems. They allow us to model and structure entities (i.e. the graph's nodes) and the relationships among entities (i.e. the graph's edges) in a natural way. As an example, a website can easily be represented as a graph: in a simple approach, one can model web pages as nodes and hyperlinks as relationships among web pages. Graph theory algorithms can then be applied to extract new, valuable knowledge. For example, by applying these algorithms to a website graph one can discover how information propagates among nodes, or organize web pages into tightly connected clusters sharing similar topics.
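As a toy illustration of this modelling idea, here is a minimal sketch of a website graph where pages are nodes and hyperlinks are directed edges, with a breadth-first traversal showing which pages can be reached from a starting page; the page names are invented:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// A toy website graph: pages are nodes, hyperlinks are directed edges.
public class WebsiteGraph {
    private final Map<String, List<String>> links = new HashMap<>();

    void addLink(String fromPage, String toPage) {
        links.computeIfAbsent(fromPage, k -> new ArrayList<>()).add(toPage);
    }

    // Breadth-first traversal from a start page: a simple way to see how
    // information (or a crawler) can propagate through the site.
    Set<String> reachableFrom(String start) {
        Set<String> visited = new LinkedHashSet<>();
        Queue<String> frontier = new ArrayDeque<>(List.of(start));
        while (!frontier.isEmpty()) {
            String page = frontier.poll();
            if (visited.add(page)) {
                frontier.addAll(links.getOrDefault(page, List.of()));
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        WebsiteGraph site = new WebsiteGraph();
        site.addLink("/home", "/blog");
        site.addLink("/blog", "/post-1");
        site.addLink("/post-1", "/home");
        System.out.println(site.reachableFrom("/home")); // [/home, /blog, /post-1]
    }
}
```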