Hadoop sequence data access

According to Hadoop authoritative guidelines:

HDFS is a filesystem designed for storing very large files with
streaming or sequential data access patterns

What is streaming or sequential data access? How will it reduce disk seek time?

This is not unique to Hadoop.

Sequential access mode Means that you read the data in order (usually from beginning to end). Consider the example of a book. When reading a novel, use sequential order: start from page 1, then go to page 2, and so on. Another A common pattern is called random access. This is when you jump from one place to another, and may even jump back when reading data. For the book example, consider a dictionary. You don’t read it like you would a novel. Instead, you search for your word in the middle of a certain place. When you finish looking up the word, you may go to find another word, which is located hundreds of pages away from where your book is opened. Search where you should go from Starting to read is called “seeking”.

When you visit sequentially, you only need to look for it once, and then read until you complete the data. For random access, you need to switch every time Search when you go to other locations in the file. This may have a considerable performance impact on the hard drive, because the cost of seeking on the disk drive is very high.

According to Hadoop Authority Guide:

HDFS is a filesystem designed for storing very large files with
streaming or sequential data access patterns

What is streaming or sequential data access? How will it reduce disk seek time?

This is not unique to Hadoop.

Sequential access mode means that you read data sequentially (usually from the beginning) To the end). Consider the example of a book. When reading a novel, use sequential order: start on page 1, then go to page 2, and so on. Another common pattern is called random access. This is when you When jumping from one place to another, you may even jump backward when reading data. For the book example, consider a dictionary. You don’t read it like you would a novel. Instead, you search for your in the middle of a place Word. When you are done searching for this word, you may find another word, which is hundreds of pages away from the place where your book is opened. Searching where you should start reading is called “seeking”.

When you access sequentially, you only need to look for it once, and then read until you complete the data. When doing random access, you need to search every time you want to switch to another location in the file. This is in There may be a considerable performance impact on the hard drive, because the cost of seeking on the disk drive is very high.

Leave a Comment

Your email address will not be published.