How does Hadoop store files and what is the block size in Hadoop?
When a file is sent to Hadoop for storage, HDFS breaks the file into a set of individual blocks. These blocks are stored on different DataNodes in the cluster, and Hadoop keeps multiple copies of each block depending on the replication factor.
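To make this concrete, here is a minimal sketch using the HDFS Java API that lists the blocks of a file already stored in HDFS and the DataNodes holding each replica. The path /data/sample.txt is only a hypothetical example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();     // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/sample.txt");     // hypothetical HDFS path

        FileStatus status = fs.getFileStatus(file);
        // Each BlockLocation describes one block and the DataNodes holding its replicas.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}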
In Hadoop 2.x the default block size is 128 MB, and it is configurable. It can be set as the cluster-wide default (the dfs.blocksize property) or for an individual file. In the previous version, Hadoop 1.x, the default was 64 MB.
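Here is a sketch of both options; the file paths and sizes are purely illustrative. A client can override the configured default through dfs.blocksize, or pass an explicit block size when creating a file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CustomBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The cluster-wide default is normally set in hdfs-site.xml via dfs.blocksize;
        // overriding it here only affects files written by this client.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);   // 256 MB

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/large-output.dat");      // hypothetical path

        // Per-file override: the last argument of create() is the block size in bytes.
        long blockSize = 64L * 1024 * 1024;                  // 64 MB for this file only
        FSDataOutputStream out = fs.create(
                file,
                true,                                        // overwrite if it exists
                conf.getInt("io.file.buffer.size", 4096),
                fs.getDefaultReplication(file),
                blockSize);
        out.writeUTF("example payload");
        out.close();
        fs.close();
    }
}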
Hadoop is a distributed system designed for high throughput so that files can be processed in parallel quickly. The block size was increased for the following reasons:
It improves NameNode performance, because the NameNode keeps metadata for every block in memory, so fewer blocks mean less metadata to track.
It also improves the performance of MapReduce jobs, because the number of mappers depends on the block size: by default there is one mapper per input split, which usually corresponds to one block (see the worked example after this list).
Managing a Hadoop cluster holding 1 petabyte of data with a 64 MB block size was difficult, because that amounts to more than 16 million blocks, and that much block metadata is hard for the NameNode to handle. So the default block size was increased from 64 MB to 128 MB to ease the handling of large Hadoop clusters.
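A rough back-of-the-envelope sketch of both effects, assuming default input splits of one block per mapper; the 10 GB input file is a hypothetical example:

public class BlockMath {
    public static void main(String[] args) {
        long petabyte = 1L << 50;                  // 1 PB in bytes
        long mb64  = 64L  * 1024 * 1024;
        long mb128 = 128L * 1024 * 1024;

        // NameNode load: number of blocks it must track for 1 PB of data.
        System.out.println("Blocks at 64 MB:  " + petabyte / mb64);   // 16,777,216
        System.out.println("Blocks at 128 MB: " + petabyte / mb128);  //  8,388,608

        // MapReduce: with default input splits, roughly one mapper per block.
        long fileSize = 10L * 1024 * 1024 * 1024;  // hypothetical 10 GB input file
        System.out.println("Mappers at 64 MB:  " + (fileSize + mb64  - 1) / mb64);   // 160
        System.out.println("Mappers at 128 MB: " + (fileSize + mb128 - 1) / mb128);  // 80
    }
}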
Check more tutorials at Big Data tutorials.