Friday, March 01, 2013

big data visualisation


A large part of the effort in developing solutions around big data concentrates on retrieving data, storing data and getting meaning out of this data. For a lot of engineers and developers, getting meaning out of the vast amount of data is the end of the task. The system, for example a Hadoop implementation, has crunched vast amounts of data and provides the answer in the form of an array of data or a file of consolidated answers. Coming from a large pile of structured and unstructured data this is a huge accomplishment; however, it is not the end of things.

The next step, in a large number of cases, is getting this set of data, which is the result of your big data analysis, to the users. To be more precise: how do we get it there in a usable form? A flat text file is not always the best way of delivering your results to a user or customer. It might be the right format for an analyst who will do more analysis on it, but for a user or customer who quickly wants to see the information it most likely is not.

One of the upcoming fields within big data, and within the way we work with computer systems in relation to data, will be big data visualisation. Big data visualisation focuses on how we handle massive amounts of data and how we make sure it is represented in such a way that it is understandable by a human. The vast amount of data that is generally associated with big data solutions is no longer interpretable by a human; that is why we have machines that do this for us. Now, however, we have to focus on ensuring that the outcome of the big data analysis is visualised in a human-interpretable format.

The high-level diagram below shows a setup for a distributed sensor network. It shows the sensors picking up signals and transmitting them to the sensor server in an M2M (machine to machine) fashion. All of this is part of the "Sensor Domain" in the diagram. The sensor server places the retrieved data into the Hadoop cluster by writing it to HDFS (Hadoop Distributed File System) within the Hadoop Domain.


Data within the Hadoop domain is often referred to as the data lake. Big data stored in HDFS, or as stated the data lake, is at this point unstructured data. By running MapReduce on the data placed in HDFS by the sensor server we can get more meaning out of this unstructured data. This is commonly the point where most developers on Hadoop and MapReduce will stop: providing the consolidated and structured data and answers via MapReduce is the endpoint.
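To make the MapReduce step a bit more concrete, here is a minimal sketch in plain Python of what such a job could look like conceptually. It does not use Hadoop itself; it only mimics the map, shuffle and reduce phases in memory. The record layout (`sensor_id,timestamp,value`) and all function names are assumptions made up for this illustration, not something taken from the setup in the diagram.

```python
from collections import defaultdict
from statistics import mean


def map_reading(line):
    """Map step: parse one raw line and emit a (sensor_id, value) pair.

    The 'sensor_id,timestamp,value' layout is a hypothetical format.
    """
    sensor_id, _timestamp, value = line.strip().split(",")
    return sensor_id, float(value)


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_readings(sensor_id, values):
    """Reduce step: consolidate all values of one sensor into a summary."""
    return sensor_id, {
        "count": len(values),
        "mean": mean(values),
        "max": max(values),
    }


def run_job(lines):
    """Run map, shuffle and reduce in memory over an iterable of lines."""
    grouped = shuffle(map_reading(line) for line in lines if line.strip())
    return dict(reduce_readings(key, values) for key, values in grouped.items())


# Made-up sample input standing in for raw sensor data in HDFS.
raw_lines = [
    "s1,2013-03-01T10:00:00,20.0",
    "s2,2013-03-01T10:00:00,18.5",
    "s1,2013-03-01T10:05:00,22.0",
]
summary = run_job(raw_lines)
```

On a real cluster the map and reduce functions would run distributed over many nodes, but the shape of the result is the same: a small, structured summary per sensor instead of a pile of raw lines.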

The next step, however, is presenting this data to the users and customers, or to an analyst team who will do more analysis on this data. For this you need to load the results of your MapReduce job into a database. In our case, as shown above, this is a database within the BI Domain. The structured data resulting from the MapReduce job can now be used within other applications or BI tools to be presented in a human-readable way.
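As a rough illustration of this loading step, the sketch below parses tab-separated job output and writes it into a database table. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for the database in the BI Domain; the table name, column names and the tab-separated output layout are assumptions for this example only.

```python
import sqlite3


def parse_output(lines):
    """Parse hypothetical 'sensor_id<TAB>count<TAB>mean' job output lines."""
    for line in lines:
        sensor_id, count, mean_value = line.rstrip("\n").split("\t")
        yield sensor_id, int(count), float(mean_value)


def load_results(conn, rows):
    """Create the (hypothetical) summary table if needed and load the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sensor_summary ("
        "sensor_id TEXT PRIMARY KEY, "
        "reading_count INTEGER, "
        "mean_value REAL)"
    )
    # INSERT OR REPLACE keeps the load repeatable when a job is rerun.
    conn.executemany(
        "INSERT OR REPLACE INTO sensor_summary VALUES (?, ?, ?)", rows
    )
    conn.commit()


# Made-up job output standing in for a result file fetched from HDFS.
job_output = ["s1\t2\t21.0\n", "s2\t1\t18.5\n"]

conn = sqlite3.connect(":memory:")
load_results(conn, parse_output(job_output))
rows = conn.execute(
    "SELECT sensor_id, reading_count, mean_value "
    "FROM sensor_summary ORDER BY sensor_id"
).fetchall()
```

Once the results sit in a regular table like this, any BI tool or application that can talk to the database can pick them up, which is the whole point of this step.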

When you have a predominantly Oracle landscape it can make sense to deploy Oracle BI (OBIEE) as a tool to connect to the database holding the MapReduce results. One of the benefits of this is that OBIEE also allows you to use multiple data sources, meaning that you can use the MapReduce results coming from your sensor network in combination with, for example, your sales data in the Oracle eBS database. OBIEE provides some great ways of visualising data; it gives your users and customers a view of the results in multiple ways which are very human-readable.

However, when using Oracle BI tools you are limited to the options provided by the tool; this is the same for most of the tools coming from other vendors. What I expect we will see in the near future is the rise of new companies who specialise in data visualisation. One example of such a company is Periscopic. We also see a lot of online companies and services coming to life that provide support in visualising data.

A lot of good open source libraries are also coming to life, which you can download (and contribute to) and which offer open ways of visualising sets of data. A very cool implementation of open source data visualisation is D3.js. Some examples of data visualisation done with D3.js are shown below.
