Big-Data is one of the emerging concepts of this era. The term "Big-Data" describes large volumes of data that can be structured, semi-structured or unstructured.
In other words, it refers to collections of data sets so large and complex that they are difficult to process using traditional applications and tools.
The features of Big-Data can be explained using the 3 V's: Volume, Variety and Velocity.
Volume: Companies collect data from a variety of sources, including social media, business transactions, sensor information and machine-to-machine data.
Variety: Data comes in a variety of formats: text, numeric, audio, video, email, pictures etc.
Velocity: Data flows in very fast and must be dealt with in a timely manner.
Also, there are 2 additional dimensions:
Variability: The flow of data is highly inconsistent, with periodic peaks. These event-triggered peak loads can be very challenging to handle, especially when the data is unstructured.
Complexity: Data comes from multiple sources, which makes it difficult to link, match, cleanse and transform.
Storing and analysing Big-Data can be a challenging job due to its complexity. Given below are the top tools used to store and analyse Big-Data.
Apache Hadoop is one of the most widely used tools for Big-Data. It is a Java-based, free software framework that can effectively store large amounts of data in a cluster. HDFS, i.e. the Hadoop Distributed File System, is the storage layer of Hadoop: it splits data into blocks and distributes them across many nodes in a cluster. It also replicates data, thereby ensuring that the data stays available.
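As a minimal sketch of how this looks in practice, the Java snippet below copies a local file into HDFS using Hadoop's FileSystem API and then reports the replication factor applied to it. The NameNode address and file paths are placeholders, not values from any particular cluster.

```java
// Minimal sketch: copying a local file into HDFS with Hadoop's Java API.
// The NameNode address and file paths below are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUploadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the cluster's NameNode (placeholder host/port).
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");

        FileSystem fs = FileSystem.get(conf);

        // HDFS splits the file into blocks, distributes them across DataNodes,
        // and replicates each block according to the configured replication factor.
        fs.copyFromLocalFile(new Path("/tmp/sales.csv"),
                             new Path("/data/raw/sales.csv"));

        // Report the replication factor actually applied to the stored file.
        short replication = fs.getFileStatus(new Path("/data/raw/sales.csv"))
                              .getReplication();
        System.out.println("Stored with replication factor: " + replication);
        fs.close();
    }
}
```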
We all know that SQL proves very effective when it comes to handling structured data. But what about unstructured data? For that, we can use NoSQL, i.e. Not Only SQL. NoSQL databases can store unstructured data with no fixed schema: each row can have its own set of column values.
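To make the schema-less idea concrete, here is a minimal sketch using MongoDB (one popular NoSQL document store) through its Java driver. The connection string, database and collection names are assumptions for illustration only; the point is that each inserted document carries its own set of fields.

```java
// Minimal sketch: schema-less inserts into a NoSQL document store (MongoDB).
// The connection string, database and collection names are placeholders.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class NoSqlExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> users =
                client.getDatabase("bigdata_demo").getCollection("users");

            // No fixed schema: each document can have its own set of fields.
            users.insertOne(new Document("name", "Asha")
                                .append("email", "asha@example.com"));
            users.insertOne(new Document("name", "Ravi")
                                .append("twitterHandle", "@ravi")
                                .append("followers", 1200));
        }
    }
}
```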
Microsoft HDInsight is a Microsoft solution for Big-Data and is powered by Apache Hadoop. It is available as a service in the cloud. It uses Windows Azure Blob storage as the default file system, and it provides high availability at low cost.
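Because the default file system is backed by Blob storage, data in a container can be addressed with a wasb:// path through the same Hadoop FileSystem API. The sketch below lists files under such a path; the storage account and container names are placeholders, and it assumes it is run where the Azure storage connector and credentials are already configured (as on an HDInsight node).

```java
// Minimal sketch: listing files under a wasb:// path, i.e. HDInsight's default
// file system backed by Windows Azure Blob storage. Account/container are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdInsightBlobListing {
    public static void main(String[] args) throws Exception {
        // On an HDInsight node the cluster configuration already points the
        // default file system at the wasb:// container, so no extra setup is needed there.
        Configuration conf = new Configuration();
        Path root = new Path("wasb://mycontainer@myaccount.blob.core.windows.net/example/data");

        FileSystem fs = FileSystem.get(root.toUri(), conf);
        for (FileStatus status : fs.listStatus(root)) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }
        fs.close();
    }
}
```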
Hive is a distributed data warehouse system for Hadoop. It supports an SQL-like query language, HiveQL (HQL), to access the data, and it can be used for data mining operations. Hive runs on top of Hadoop.
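A minimal sketch of running a HiveQL query from Java over JDBC is shown below. The host, database, credentials and table are placeholders; the query itself is turned by Hive into jobs that execute on the Hadoop cluster.

```java
// Minimal sketch: running a HiveQL query through Hive's JDBC driver (HiveServer2).
// The host, port, database, credentials and table name are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Requires the hive-jdbc driver on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hive-host:10000/default", "hiveuser", "");
             Statement stmt = conn.createStatement()) {

            // HiveQL looks like SQL, but Hive compiles it into jobs that run on Hadoop.
            ResultSet rs = stmt.executeQuery(
                "SELECT category, COUNT(*) AS cnt FROM sales GROUP BY category");
            while (rs.next()) {
                System.out.println(rs.getString("category") + " -> " + rs.getLong("cnt"));
            }
        }
    }
}
```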
PolyBase works on top of SQL Server 2012 Parallel Data Warehouse (PDW) and is used to access the data stored in PDW. PDW is a data warehousing appliance built for processing large volumes of relational data, and PolyBase adds integration with Hadoop, allowing us to access non-relational data as well.
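To give a flavour of how Hadoop data is exposed through PolyBase, the sketch below issues external-table DDL from Java via the SQL Server JDBC driver. The DDL follows the PolyBase syntax of later SQL Server releases and is only indicative for PDW; the server, credentials, HDFS location and columns are all placeholders.

```java
// Minimal sketch: exposing an HDFS directory as a PolyBase external table so it
// can be queried with T-SQL. DDL syntax follows later SQL Server PolyBase releases
// and is only indicative for PDW; connection details and names are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PolyBaseSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:sqlserver://pdw-host:1433;databaseName=Demo;user=demo;password=changeMe");
             Statement stmt = conn.createStatement()) {

            // Register the Hadoop cluster and a delimited-text file format,
            // then map an HDFS directory to an external table.
            stmt.executeUpdate(
                "CREATE EXTERNAL DATA SOURCE HadoopDS " +
                "WITH (TYPE = HADOOP, LOCATION = 'hdfs://namenode-host:8020')");
            stmt.executeUpdate(
                "CREATE EXTERNAL FILE FORMAT CsvFormat " +
                "WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ','))");
            stmt.executeUpdate(
                "CREATE EXTERNAL TABLE dbo.WebLogs (LogDate DATE, Url NVARCHAR(400), Hits INT) " +
                "WITH (LOCATION = '/logs/', DATA_SOURCE = HadoopDS, FILE_FORMAT = CsvFormat)");
        }
    }
}
```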
Microsoft Excel is one of the most popular Microsoft tools, and Excel 2013 can be used to connect to data stored in Hadoop. The Power View feature of Excel 2013 makes it easy to summarise the data.