# What do you understand by Big Data?

## Introduction

The term Big Data was introduced to describe data sets that keep growing in Volume, Variety and Velocity, together known as the 3Vs. Such data sets are difficult to capture, process or analyse with conventional applications or a single computer system, so they require a dedicated set of technologies and tools to access and manipulate them. As the amount of data in use increases every day, organizations manage the 3Vs continuously to support better decision making and process optimization. Over time these three characteristics have been defined more precisely, and the list has been extended with two further characteristics, Veracity and Variability.

## Features of Big Data

• Volume- This refers to the size or quantity of data that is generated and stored. The volume helps determine whether a data set counts as Big Data at all and how it can be used effectively.
• Variety- Variety refers to the types of data within the data set, such as structured records, text, images or logs. Categorizing the data makes it easier to decide which operation to perform on which type of data.
• Velocity- Velocity is the speed at which data is produced and consumed. High-velocity data often has to be processed quickly to remain useful, for example when analysing customer activity in order to improve customer satisfaction.
• Veracity- Veracity is the quality of the captured data. If the data quality is poor, measures need to be taken to improve it before the results of any analysis can be trusted.
• Variability- Variability indicates the degree of consistency in the data. The data needs to remain consistent throughout the user interaction so that it can be relied on as authentic.

## How is Big Data Managed?

Big Data is commonly processed using a programming model called MapReduce. A MapReduce job runs on a cluster of machines and is composed mainly of two procedures, Map() and Reduce(). The Map() procedure filters and sorts the input into groups, such as splitting address records into local and permanent addresses. The Reduce() procedure then performs a summary operation on each group, such as counting the number of entries it contains. The MapReduce model allows data to be processed in a distributed, parallel manner; that is, many tasks can run across multiple machines at the same time. Because the work is divided in this way, the system can detect a failed task quickly and schedule another machine to redo it if the current one is not functioning properly. Hadoop is an open-source framework that implements the MapReduce model and is widely used on Big Data. It stores data reliably across the cluster and makes processing easier to scale and support.
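The two procedures described above can be illustrated with a minimal, single-machine sketch in Python. It is not how Hadoop or any real cluster is implemented; it only imitates the flow of a MapReduce job on a small in-memory list. The sample records and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input: on a real cluster each record would live on some node.
records = [
    "big data needs big tools",
    "data tools scale",
]

def map_phase(record):
    """Map: emit a (key, value) pair for every word in one record."""
    return [(word, 1) for word in record.split()]

def shuffle(pairs):
    """Group values by key, the step a framework performs between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single count."""
    return key, sum(values)

def word_count(records):
    # Each map_phase call is independent, which is what lets a real
    # framework run the calls in parallel on different machines.
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(records)
print(counts)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2, 'scale': 1}
```

Because every call to `map_phase` depends only on its own record, and every call to `reduce_phase` depends only on the values for its own key, the work can be split across machines and a failed piece can simply be rerun elsewhere.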

## Who Uses Big Data?

Big Data can be applied in any environment that deals with large amounts of data. As the volume of data has grown, so has the need to classify and analyse it effectively. Some of the domains in which Big Data is most commonly used are listed below.

• Government- Governments use Big Data to store and manage the large amounts of data related to their agendas, policies and databases. When applied to governance, Big Data can bring cost efficiency, improved transparency and greater speed.
• Research Organizations- Large amounts of previously collected data and observations can be stored and categorized using Big Data tools. Big Data can also be used to make predictions based on analysis of earlier data and to support development in areas such as healthcare and resource management.
• Media- Media organizations use Big Data to reach users worldwide through simple portals. It helps distribute content effectively and with minimal delay.
• Private Sector- Private-sector industries such as retail and real estate use Big Data to analyse market trends and estimate fluctuations in general market prices.
• Sports- Big Data can be used to make predictions about game winners and about the performance of players, which helps in estimating a player's value in a particular season.