Big Data technology tutorials, questions and answers
Big Data technologies help companies analyze the huge sets of data
generated by various sources. They provide tools for handling large
quantities of structured and unstructured data, along with software
systems for analyzing that data in real time or through scheduled jobs.
Well-known companies such as Facebook, Twitter, and Google use Big Data
technologies to handle the huge data sets their users generate. Thanks to
innovation in Big Data technologies, these companies can store very large
volumes of data and still access it quickly.
In this section we provide articles, tutorials, and questions and answers
on various Big Data technologies.
Big Data Tutorials
The following articles are good for learning Big Data technologies:
Big Data Technologies
Let's discuss the technologies used in a Big Data environment.
- Hadoop - Apache Hadoop is a software system
for storing and processing big data sets. Many technologies
are used on top of Hadoop to achieve Big Data analytics.
- Hadoop HDFS - Hadoop HDFS (Hadoop
Distributed File System) is a framework for storing files (by
splitting them and other means) on distributed servers in a
fault-tolerant way. This makes it possible to store huge data sets
in a Big Data environment. Many tools and software packages on top
of Hadoop HDFS are used for storing and analyzing the data.
- HBase - HBase is a NoSQL database on top
of Hadoop HDFS, which provides fast random access to the
stored data.
- Hive - Apache Hive provides a SQL-like query
interface for searching the data stored in HDFS.
- Pig - Apache Pig is a high-level
abstraction for creating and running MapReduce jobs on data in HDFS.
- Mahout - Apache Mahout is a machine learning
framework that works with the Hadoop ecosystem.
- Oozie - Apache Oozie is a workflow scheduling
system that works on the Hadoop ecosystem. It is used for
scheduling jobs in a Big Data environment.
- ZooKeeper - Apache ZooKeeper is a software
system that is part of the Hadoop ecosystem. It is used for
centralized management of configuration information, naming,
distributed synchronization, and group services.
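
The HDFS entry above mentions splitting files and storing them on distributed servers in a fault-tolerant way. The following is a minimal sketch of that idea in plain Python, not real HDFS code. The node names, the tiny 4-byte block size, and the round-robin placement are assumptions made for illustration. Real HDFS uses a NameNode and DataNodes, a much larger block size (128 MB by default in recent versions), and a default replication factor of 3.

```python
# Toy model of HDFS-style storage: split a file into blocks and keep
# several copies of each block on different nodes. Illustration only.


def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the content into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store(data: bytes, nodes: list, block_size: int = 4, replication: int = 2):
    """Place each block on `replication` distinct nodes, round-robin.

    Returns (storage, block_count), where storage maps
    node name -> {block index -> block bytes}.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    storage = {node: {} for node in nodes}
    blocks = split_into_blocks(data, block_size)
    for index, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(index + r) % len(nodes)]
            storage[node][index] = block
    return storage, len(blocks)


def read(storage: dict, block_count: int, live_nodes: list) -> bytes:
    """Rebuild the file from whichever live node holds each block."""
    parts = []
    for index in range(block_count):
        for node in live_nodes:
            if index in storage[node]:
                parts.append(storage[node][index])
                break
        else:
            # No live node has a copy of this block.
            raise IOError(f"block {index} has no live replica")
    return b"".join(parts)


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical server names
    content = b"big data on hadoop"
    storage, count = store(content, nodes)
    # Simulate node-b failing: the file is still readable from the others.
    print(read(storage, count, ["node-a", "node-c"]))
```

Because every block lives on two nodes in this sketch, losing any single node still leaves one copy of each block. Losing two nodes can make some blocks unreadable, which is why a higher replication factor gives more fault tolerance at the cost of more storage.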
Here are tutorials on machine learning: