Simon Technology Blog

Category: Hadoop

Hadoop is a distributed systems infrastructure developed by the Apache Software Foundation. Users can develop distributed programs without understanding the underlying details of the distributed system, and can make full use of the power of clusters for high-speed computing and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware; it provides high-throughput access to application data, which makes it suitable for applications with large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over that data.
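As a rough illustration of the MapReduce model described above, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only mirror the map, shuffle, and reduce stages that a real cluster would run across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hdfs stores data", "mapreduce processes data"]))
```

In a real Hadoop job the input lines would come from files stored in HDFS and each stage would run in parallel on different machines; this sketch only shows the data flow.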

Hadoop – Hive query stays at 99%

I use a left join to insert records in Hive. The query works when I set limit 1, but when I query all records, the reduce job stays at 99%.

Insert overwrite table tablename select a.id, b.name from a le

October 12, 2021 By Simo Hadoop 99%, Hadoop, hive, inquiry, stay

Hadoop – Is Cassandra for OLAP or OLTP or both?

Cassandra does not follow ACID the way an RDBMS does; it is described in terms of CAP instead. Cassandra chooses AP from CAP and leaves consistency for the user to tune.
I definitely can’t use it. Cassandra conduct
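The user-adjustable consistency mentioned above is commonly reasoned about with the replica-overlap rule: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below only checks that arithmetic; the function names are hypothetical and it is not part of any Cassandra driver API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           read_replicas: int,
                           write_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    # Quorum reads plus quorum writes overlap on at least one replica.
    print(reads_see_latest_write(n, q, q))
    # Reading and writing a single replica favors availability over consistency.
    print(reads_see_latest_write(n, 1, 1))
```

Choosing smaller read and write sets trades consistency for availability and latency, which is the adjustment left to the user.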

October 12, 2021 By Simo Hadoop Both, Cassandra, for, Hadoop, OLAP, or OLTP

Hadoop – Kafka Spark streaming: Unable to read

I am integrating Kafka and Spark using spark-streaming. I created a topic for the Kafka producer:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions

October 12, 2021 By Simo Hadoop Hadoop, Kafka, media, Message, read, SPARK, stream, Unable

Hadoop (1) Ali Cloud Hadoop Cluster Configuration

Three ECS cloud servers

1.1 Create /bigdata directory

mkdir /bigdata
cd /bigdata
mkdir /app

1.2 Modify the host names: node01, node02, node03

1.3 Modify the hosts file

vim /e

October 12, 2021 By Simo Hadoop Ali, Cloud, cluster, configuration, Hadoop

Hadoop SQOOP instance

Hadoop Sqoop (instance), day 1. Sqoop is an open source tool mainly used to transfer data between Hadoop and traditional databases (such as MySQL). It can import data from a relational database into Hadoop’s HDFS

October 12, 2021 By Simo Hadoop Hadoop, instance, Sqoop

How to read an ORC file in Hadoop Streaming?

I want to read an ORC file in a MapReduce job written in Python. I tried to run:

hadoop jar /usr/lib/hadoop/lib/hadoop-streaming-2.6.0.2.2.6.0-2800.jar
-file /hdfs/price/mymapper.py
-mapper '/usr/local

October 12, 2021 By Simo Hadoop file, Flow, Hadoop, How, Medium, ORC, read

Hive Compression Type Test

Background:

1) 4 different types of tables have been created

2) Clean up the data in the hxh2, hxh3, and hxh4 tables, and keep the data in hxh1. The data size of the hxh1 table is: 74.1

October 12, 2021 By Simo Hadoop compression, hive, test, type

Hadoop Hive-2.3.5 installation

Hadoop Hive-2.3.5 installation:

Unzip the file: [[email protected] opt]# tar -zxvf apache-hive-2.3.5-bin.tar.gz -C /opt

Create a symbolic link: [[email protected] opt]# ln -s apach

October 12, 2021 By Simo Hadoop 2.3.5, Hadoop, hive, installation

HDFS advantages and disadvantages

Brief description of HDFS architecture

1. Introduction to HDFS

HDFS (Hadoop Distributed File System): the Hadoop distributed file system. It was developed based on the needs of streaming dat

October 12, 2021 By Simo Hadoop advantages and disadvantages, HDFS

Hive Advanced Query 1

Hadoop Hive advanced query: select basics

1.0 General query

1) select * from table_name
2) select * from table_name where name='….' limit 1;

1.1 CTE and nested queries

1)with t a

October 12, 2021 By Simo Hadoop advanced, hive, inquiry
