What is HBase?
Imagine a database built to handle gigantic data sets with millions or even billions of rows and columns. Add to that the ability to accommodate a wide range of data sources with different structures and schemas. In practical terms, that is HBase. Modeled after Google's Bigtable, HBase is an open-source NoSQL database that provides real-time read and write access to large data sets.
HBase is developed under the Apache Software Foundation. It began as part of the Apache Hadoop project and is now a top-level Apache project in its own right. Like Bigtable, HBase can hold large volumes of data, much of which may be unimportant information surrounding the relevant data, and still provide very quick access to the information being queried.
How Does HBase Work?
HBase runs on top of the Hadoop Distributed File System (HDFS) and also supports Hadoop's MapReduce programming model.
The column-oriented structure of HBase makes it well suited to storing the large volumes of sparse data typical of Big Data use cases. It is a non-relational data store written in Java. Another advantage is its scalability, which is both linear and modular: capacity grows as servers are added.
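To make the point about sparse data concrete, here is a minimal Python sketch. It is not HBase code: the `SparseStore` class and the sample rows and columns are hypothetical, and it only illustrates the general idea that a cell which was never written takes up no space.

```python
class SparseStore:
    """Toy cell store (hypothetical): only cells that were written are kept."""

    def __init__(self):
        # Maps (row_key, column) -> value; missing cells are simply absent.
        self.cells = {}

    def put(self, row_key, column, value):
        self.cells[(row_key, column)] = value

    def get(self, row_key, column):
        # Returns None for a cell that was never written.
        return self.cells.get((row_key, column))

    def stored_cells(self):
        return len(self.cells)

    def dense_cells(self):
        # Size of the equivalent fixed grid: every row times every column.
        rows = {row for row, _ in self.cells}
        columns = {col for _, col in self.cells}
        return len(rows) * len(columns)


store = SparseStore()
store.put("user1", "email", "a@example.com")
store.put("user2", "phone", "555-0100")
store.put("user3", "city", "Oslo")
store.put("user3", "email", "c@example.com")

print(store.stored_cells())  # number of cells actually stored
print(store.dense_cells())   # number of cells a dense grid would reserve
```

In this example each row uses only one or two of the three columns, so the sparse layout stores fewer cells than a fixed row-by-column grid would reserve.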
Following the original Bigtable design, HBase supports compression, Bloom filters, and in-memory operation. Like a traditional database, it consists of a set of tables that organize data into rows and columns. Each table is accessed through a primary key, which HBase calls the row key. Columns usually represent attributes, and multiple commonly used attributes can be grouped together into column families. All elements of a column family are stored together. Through these features, HBase supports both lookups of individual records and analysis of huge volumes of data.
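The table layout described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the HBase client API: `ToyTable`, its methods, and the sample keys are hypothetical names, values are plain Python objects rather than bytes, and features such as cell versions are left out.

```python
import bisect


class ToyTable:
    """Toy model (hypothetical): row key -> column family -> qualifier -> value."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = {}                 # row_key -> {family: {qualifier: value}}
        self.sorted_keys = []          # row keys kept in sorted order

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self.rows:
            bisect.insort(self.sorted_keys, row_key)
            self.rows[row_key] = {}
        # All qualifiers of one family live together in one inner dict.
        self.rows[row_key].setdefault(family, {})[qualifier] = value

    def get(self, row_key, family=None):
        """Look up a single record by key, optionally one family only."""
        row = self.rows.get(row_key, {})
        if family is None:
            return row
        return row.get(family, {})

    def scan(self, start_key, stop_key):
        """Yield (row_key, row) for start_key <= row_key < stop_key."""
        lo = bisect.bisect_left(self.sorted_keys, start_key)
        hi = bisect.bisect_left(self.sorted_keys, stop_key)
        for key in self.sorted_keys[lo:hi]:
            yield key, self.rows[key]


table = ToyTable(families=["info", "stats"])
table.put("user#002", "info", "name", "Bo")
table.put("user#001", "info", "name", "Ada")
table.put("user#001", "stats", "logins", 42)
table.put("video#001", "info", "title", "Intro")

# Individual record lookup by key, restricted to one column family.
print(table.get("user#001", "info"))

# Range scan over a contiguous slice of sorted keys.
print([key for key, _ in table.scan("user#", "user#~")])
```

Keeping the keys sorted is what lets one structure serve both access patterns mentioned above: a direct lookup for a single record, and a range scan for analysis over many adjacent records.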
Benefits of Using HBase
HBase addresses many of the problems that come with working with Big Data. It is especially useful when random reads and writes are needed. In combination with Hadoop MapReduce, HBase is used to efficiently process large volumes of data, running into petabytes.
High Speed: HBase is fast and supports random reads and writes across all data. It also integrates with other Apache components to create complete solutions.
Scalability: HBase scales out easily. You add more servers to store very large volumes of data and make it accessible to a large number of users and applications.
Adaptability: HBase can directly store data of all kinds, whether structured, semi-structured or unstructured. It offers full-fidelity data for analysis and other uses.
Dependability: Data is replicated, so data loss and loss of access are avoided. Even if individual servers fail, the system keeps working and adapts to variable workloads. HBase also has features that support disaster recovery.
Cases of HBase Use by Enterprises
The inherent characteristics of HBase make it well suited to data-driven websites and businesses. Facebook is perhaps the most widely recognized user of HBase; the social networking giant adopted HBase for its messaging platform in November 2010. Other popular users include LinkedIn, Pinterest and Netflix.