Performance – How to get the time cost of reading data from HDFS in Spark

Spark’s timeline contains:

>Scheduler Delay
>Task Deserialization Time
>Shuffle Read Time
>Executor Computing Time
>Shuffle Write Time
>Result Serialization Time
>Getting Result Time

It seems that the time cost of reading data from the source (such as HDFS) is included in Executor Computing Time, but I am not sure.

If it is included in Executor Computing Time, how can I get the read time on its own, without the time spent on computation?

Thank you.

It is difficult to cleanly separate out the time spent in the read operation, because the data is processed while it is being read.

A simple option is to apply a trivial operation (such as a count) whose own overhead is very small. If your file is reasonably large, the read will vastly dominate the trivial operation, especially since a count can be completed without moving data between nodes (apart from the single-value result).
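As a rough illustration of the count-based approach, here is a minimal PySpark-style sketch. The helper names `time_action` and `estimate_read_seconds` are made up for this example, and it assumes you already have a `SparkSession` called `spark` and that the data has not been cached. The number it returns is wall-clock time for the whole job, so it still includes scheduling and task overhead; treat it as an approximate upper bound on the read cost rather than an exact measurement.

```python
import time


def time_action(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    return result, elapsed


def estimate_read_seconds(spark, path):
    """Approximate the read cost of `path` by timing a count over it.

    `spark` is assumed to be an existing SparkSession. count() forces a
    full scan of the input but does almost no computation and only sends
    a single number back to the driver, so the elapsed time is dominated
    by the read (plus job scheduling overhead).
    """
    rdd = spark.sparkContext.textFile(path)
    return time_action(rdd.count)
```

With a live session this could be called as `n, secs = estimate_read_seconds(spark, "hdfs:///path/to/large/file")`, where the path is a placeholder. Subtracting this figure from the run time of the real job on the same input gives a rough idea of how much of the job is computation.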
