Category: Hadoop

Hadoop is a distributed systems infrastructure developed by the Apache Software Foundation. It lets users write distributed programs without understanding the underlying details of the distributed system, and it makes full use of a cluster's power for high-speed computing and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and designed to be deployed on low-cost hardware; it provides high-throughput access to application data, which suits applications with large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over that data.
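To make the MapReduce half of that description concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It runs in a single Python process and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the framework would run the map and reduce steps in parallel over data blocks stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, mimicking what the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum all counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence in one process (illustration only)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        "Hadoop stores data in HDFS",
        "MapReduce processes data in Hadoop",
    ]
    print(word_count(sample))
```

The point of the split is that `map_phase` and `reduce_phase` each see only one line or one key at a time, which is what allows a framework to distribute them across many machines.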

Hadoop – Hive query stays at 99%

I use a left join to insert records in Hive. The query completes when I add LIMIT 1, but when I run it over all records, the reduce job stays at 99%.

Insert overwrite table tablename select a.id, b.name from a le

October 12, 2021 | By Simo | Hadoop | 99%, Hadoop, hive, inquiry, stay

Hadoop – Is Cassandra for OLAP or OLTP or both?

Cassandra does not provide ACID guarantees the way an RDBMS does; it is designed around the CAP theorem instead. Cassandra chooses AP from CAP and leaves consistency for the user to tune.
I definitely can’t use it. Cassandra conduct

October 12, 2021 | By Simo | Hadoop | Both, Cassandra, for, Hadoop, OLAP, or OLTP

Hadoop – Kafka Spark streaming: Unable to read

I am integrating Kafka and Spark using Spark Streaming. I created a topic for the Kafka producer:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions

October 12, 2021 | By Simo | Hadoop | Hadoop, Kafka, media, Message, read, SPARK, stream, Unable

Hadoop Sqoop example

Hadoop Sqoop (example), day 1. Sqoop is an open source tool mainly used to transfer data between Hadoop and traditional relational databases (such as MySQL). It imports data from a relational database into Hadoop’s HDFS

October 12, 2021 | By Simo | Hadoop | Hadoop, instance, Sqoop