JVM performance tuning and monitoring: a detailed guide to hprof

Ⅰ Problem

In real-world enterprise Java development, sometimes we encounter the following problems:

OutOfMemoryError
Memory leak
Thread deadlock
Lock Contention
Java process consumes too much CPU
……
These problems may be overlooked by many people in daily development (for example, when some people encounter the problems above they simply restart the server or increase the memory, without digging into the root cause), but the ability to understand and solve them is an essential requirement for advanced Java programmers. This article introduces hprof, one of the common JVM performance tuning and monitoring tools.

Ⅱ the generation of hprof files

The hprof file can be generated in DDMS (short for Dalvik Debug Monitor Service, the Dalvik virtual machine debugging and monitoring service in the Android development environment; it provides device screen capture, viewing the running threads and heap information of a specific process, Logcat, broadcast state information, simulated phone calls, simulated sending and receiving of SMS, virtual geographic coordinates, and so on): select the process and click the "dump hprof file" button in the upper left corner of the window to generate the file directly. It can also be generated by adding code to the program. Below, we generate the hprof file through a JVM setting.

We want to generate the heap dump file automatically when memory overflows. To do this, we add the following JVM argument at runtime:
-XX:+HeapDumpOnOutOfMemoryError
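For example, a hypothetical launch of the Test1 demo class shown below (the small -Xmx is only there to make the overflow happen quickly):

java -Xmx64m -XX:+HeapDumpOnOutOfMemoryError com.deppon.tps.Test1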


Note: The .hprof file generated by dump is placed under the project directory by default.
First we construct an entity class, User; it is an ordinary Java class. Then we construct an ArrayList and add instances of the User class to it in an infinite loop. Both the User objects and the ArrayList live on the heap, and the ArrayList is strongly referenced (our List is still in use and never destroyed), so it cannot be reclaimed by the GC. Once the heap memory occupied by the ArrayList fills the entire heap, the heap overflows.

package com.deppon.tps;
public class User {
	private String name;
	private String sex;
	private int age;
	public String getName() {
		return name;
	}
	public void setName(String name) {
		this.name = name;
	}
	public String getSex() {
		return sex;
	}
	public void setSex(String sex) {
		this.sex = sex;
	}
	public int getAge() {
		return age;
	}
	public void setAge(int age) {
		this.age = age;
	}
    public User( String name,String sex,int age){
        this.name=name;
        this.sex=sex;
        this.age=age;
    }
}

Then we create a Test1 class whose main method creates an ArrayList and adds User objects to it in an infinite loop:

package com.deppon.tps;
import java.util.ArrayList;
import java.util.List;
public class Test1 {
	public static void main(String[] args) {
		List<User> persons = new ArrayList<User> ();
		while (true) {
			persons.add(new User("liuhai", "male", 25));
		}
	}
}

When we run the above code, the heap overflows and a heap dump file is generated (the VM parameter we set specifies that a heap dump should be produced on heap overflow).

Use MAT (Memory Analyzer Tool), available as an Eclipse plugin, to open and analyze this hprof file (java_pid32430.hprof). We found that it did detect the memory leak, as follows:

(Screenshot: MAT leak-suspects analysis of the heap dump.)

As shown, there is a collection in the main() method, each element of which is a com.deppon.tps.User object, and each object's Shallow Heap and Retained Heap size is 24 bytes. Because the ArrayList exists for the whole run, once the program has run long enough the heap fills up and overflows. Here we created 7,634,068 User objects, each occupying 24 bytes, for a total of 24 × 7,634,068 = 183,217,632 bytes, or about 174.72 MB of heap space. But why are the Shallow Size and Retained Size of a User object both 24 bytes?

What exactly do Shallow Size and Retained Size mean?

Shallow Size is the size of the memory occupied by the object itself, not including the objects it references. For a regular (non-array) object, the Shallow Size is determined by the number and types of its member variables; for an array, the Shallow Size is determined by the element type and the array length, i.e. the sum of the sizes of the array elements.

Retained Size = the size of the current object + the sum of the sizes of the objects that the current object references directly or indirectly (indirect reference meaning: A->B->C, where C is an indirect reference), excluding objects that are also directly or indirectly referenced by GC Roots.

So for our User class: because our machine runs 32-bit Windows 7, the object header occupies 8 bytes; the class contains a String reference (name) occupying 4 bytes, another String reference (sex) occupying 4 bytes, and an int (age) occupying 4 bytes, for a total of 8 + 4 + 4 + 4 = 20 bytes. Because object sizes are padded to an 8-byte boundary, the final size is 24 bytes. This is the size of the object itself (Shallow Heap).

Tip: To complete the explanation, you can experiment. If we add one more String member to User, the object size stays at 24 bytes, because the 4-byte reference to the new String simply fills the existing padding. If we then add a second extra String member, the object size grows from 24 bytes to 32 bytes, because a new 8-byte-aligned slot has to be filled.

And our User class does not retain other classes of its own (the Strings are not counted here because, as string literals, they are also referenced from GC Roots), so reclaiming the memory occupied by a User reclaims only the User itself, and the Retained Heap size is therefore equal to the Shallow Heap size.
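If you want to verify this layout yourself, OpenJDK's JOL (Java Object Layout) library can print it. A minimal sketch, assuming the jol-core dependency (org.openjdk.jol:jol-core) is on the classpath; on the 32-bit VM discussed above the instance size should match the 24-byte figure:

import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.info.GraphLayout;

public class UserLayout {
	public static void main(String[] args) {
		User user = new User("liuhai", "male", 25);
		// Shallow layout: header, name/sex references, age, and any alignment padding.
		System.out.println(ClassLayout.parseInstance(user).toPrintable());
		// Footprint of everything reachable from the instance (compare with Retained Heap).
		System.out.println(GraphLayout.parseInstance(user).toFootprint());
	}
}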

Ⅲ Setting JVM parameters in Tomcat

Linux system

  1. Open the /tomcat_home/bin/catalina.sh file
  2. Add: JAVA_OPTS="$JAVA_OPTS -server -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/heapdump"

Note: When -XX:HeapDumpPath is not set, the dumped files are placed in the /tomcat_home/bin directory.

Windows system

  1. Open the /tomcat_home/bin/catalina.bat file
  2. Add: set JAVA_OPTS=%JAVA_OPTS% -server -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=D:\heapdump


Alibaba's open-source Java diagnostic tool: Arthas

Arthas is an open-source Java diagnostic tool released by Alibaba recently. It is mainly used for diagnosing problems in Java applications.

I. Overview

This tool can help you answer questions like the following:

  1. From which jar file was this class loaded?
  2. Why does your code throw this or that exception?
  3. When you hit a problem in production that you cannot debug there, are you reduced to adding System.out statements and republishing over and over to locate it through logs?
  4. Why hasn't a certain piece of the online code been executed? Was the change never committed, or was the wrong branch deployed?
  5. A certain user's data is processed incorrectly online, but you cannot debug it online and cannot reproduce it offline.
  6. Is there a global view of how the system is running?
  7. Is there a way to monitor the real-time running state of the JVM?

II. Installation Method

2.1 Windows Installation

Download address: http://search.maven.org/classic/#search%7Cga%7C1%7Cg%3A%22com.taobao.arthas%22%20AND%20a%3A%22arthas-packaging%22

Download the latest "bin.zip" package and decompress it. In the bin directory there is a file named "as.bat". For now this script accepts only one parameter, "pid", so it can only diagnose Java processes on the local machine.

The startup command is:

as.bat <pid>

Note: When I first started it on Windows 10, I ran into the following problem:

D:\download\arthas-packaging-3.0.4-bin> telnet
'telnet' is not recognized as an internal or external command, operable program or batch file.

The solution: Control Panel -> Turn Windows features on or off -> check the Telnet Client feature.

2.2 Linux Installation

Install Arthas:

curl -L https://alibaba.github.io/arthas/install.sh | sh

Start Arthas:

./as.sh

After a successful startup, you will see the Arthas banner (the same banner appears in the session logs below).

III. Common Commands

3.1 Basic Commands

help – display command help information
cls – clear the current screen
session – view information about the current session
reset – reset enhanced classes; this restores all classes enhanced by Arthas. When the Arthas server shuts down, all enhanced classes are reset.
version – output the version of Arthas attached to the current target Java process
quit – exit the current Arthas client; other Arthas clients are unaffected
shutdown – shut down the Arthas server; all Arthas clients quit
keymap – list Arthas shortcuts and custom key bindings

JVM-related

dashboard – a real-time data panel for the current system
thread – view the thread stack information of the current JVM
jvm – view the current JVM information
sysprop – view and modify JVM system properties
New! getstatic – view the static fields of classes

Class/classloader-related

sc – search the classes loaded by the JVM
sm – search the methods of loaded classes
dump – dump the byte code of loaded classes to a specific directory
redefine – load external .class files and redefine classes in the JVM
jad – decompile the source code of a specified loaded class
classloader – view a classloader's inheritance tree, URLs, and class-loading information, and use a classloader to call getResource

monitor/watch/trace-related

Note that these commands are implemented with bytecode enhancement: they insert aspects into the methods of the specified classes to collect statistics and observations. Therefore, in production and pre-production, please narrow down the classes, methods, and conditions to observe as much as possible, and execute the shutdown command when the diagnosis is done, or the reset command for the enhanced classes.

monitor – monitor method execution statistics
watch – observe data from method invocations
trace – trace the call path inside a method and report the time consumed at each node along the path (see the example after this list)
stack – output the call path by which the current method was invoked
tt – "TimeTunnel" for method executions: records the parameters and return information of every call to the specified method, so these calls can be examined at different points in time
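For instance, a hypothetical trace invocation against the OtherTestCase demo class used later in this article:

trace com.oct.tail.OtherTestCase uuid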

Options

options – view or set Arthas global switches

Pipeline

Arthas supports using pipes to further process the results of the above commands, for example sm org.apache.log4j.Logger | grep <keyword>
grep – search for results that satisfy a condition
plaintext – strip the color codes from a command's result
wc – count output lines

Web Console

Connect to Arthas through WebSocket.

Other features

Asynchronous command support
Execution results are logged
Batch processing support
Usage Description of ognl Expressions

3.2 Usage Examples

First, enter help in the window to see all the commands available (the communication is essentially over the telnet protocol), as follows:

Attach success.
Connecting to arthas server... current timestamp is 1537266148
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.                           
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'                          
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.                          
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |                         
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'                          
                                                                                

wiki: https://alibaba.github.io/arthas
version: 3.0.4
pid: 25206
timestamp: 1537266148841

$ help
 NAME         DESCRIPTION                                                                                                                                                                      
 help         Display Arthas Help                                                                                                                                                              
 keymap       Display all the available keymap for the specified connection.                                                                                                                   
 sc           Search all the classes loaded by JVM                                                                                                                                             
 sm           Search the method of classes loaded by JVM                                                                                                                                       
 classloader  Show classloader info                                                                                                                                                            
 jad          Decompile class                                                                                                                                                                  
 getstatic    Show the static field of a class                                                                                                                                                 
 monitor      Monitor method execution statistics, e.g. total/success/failure count, average rt, fail rate, etc.                                                                               
 stack        Display the stack trace for the specified class and method                                                                                                                       
 thread       Display thread info, thread stack                                                                                                                                                
 trace        Trace the execution time of specified method invocation.                                                                                                                         
 watch        Display the input/output parameter, return object, and thrown exception of specified method invocation                                                                           
 tt           Time Tunnel                                                                                                                                                                      
 jvm          Display the target JVM information                                                                                                                                               
 dashboard    Overview of target jvm's thread, memory, gc, vm, tomcat info.                                                                                                                    
 dump         Dump class byte array from JVM                                                                                                                                                   
 options      View and change various Arthas options                                                                                                                                           
 cls          Clear the screen                                                                                                                                                                 
 reset        Reset all the enhanced classes                                                                                                                                                   
 version      Display Arthas version                                                                                                                                                           
 shutdown     Shut down Arthas server and exit the console                                                                                                                                     
 session      Display current session information                                                                                                                                              
 sysprop      Display, and change the system properties.                                                                                                                                       
 redefine     Redefine classes. @see Instrumentation#redefineClasses(ClassDefinition...)                                                                                                       
$ 

Here we mainly discuss watch, which observes method invocation data.

First, paste my test code:

package com.oct.tail;

import java.util.UUID;

/**
 * @Author Ryan
 * @Date 2018/9/18  9:58
 * @desc
 */
public class OtherTestCase {

    /**
     *
     * @return
     */
    public static String uuid(){
       return UUID.randomUUID().toString().replaceAll("-", "");
    }

    public static void main(String[] args) {

        while(true){
            System.out.println("uuid = " + uuid());

            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

Here's an example (I'm on Windows 10 with JDK 8; Linux behaves the same). For the watch command, let's pretend I don't know how to use it and immediately run watch -help to see what happens.

 ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'


wiki: https://alibaba.github.io/arthas
version: 3.0.4
pid: 11924
timestamp: 1537326702039

$ watch -help
 USAGE:
   watch [-b] [-e] [-x <value>] [-f] [-h] [-n <value>] [-E] [-M <value>] [-s] class-
 pattern method-pattern express [condition-express]

 SUMMARY:
   Display the input/output parameter, return object, and thrown exception of specif
 ied method invocation
   The express may be one of the following expression (evaluated dynamically):
           target : the object
            clazz : the object's class
           method : the constructor or method
     params[0..n] : the parameters of method
        returnObj : the returned object of method
         throwExp : the throw exception of method
         isReturn : the method ended by return
          isThrow : the method ended by throwing exception
            #cost : the execution time in ms of method invocation
 Examples:
   watch -Eb org\.apache\.commons\.lang\.StringUtils isBlank params[0]
   watch -b org.apache.commons.lang.StringUtils isBlank params[0]
   watch -f org.apache.commons.lang.StringUtils isBlank returnObj
   watch -bf *StringUtils isBlank params[0]
   watch *StringUtils isBlank params[0]
   watch *StringUtils isBlank params[0] params[0].length==1
   watch *StringUtils isBlank '#cost>100'

 WIKI:
   https://alibaba.github.io/arthas/watch

 OPTIONS:
 -b, --before                Watch before invocation
 -e, --exception             Watch after throw exception
 -x, --expand <value>        Expand level of object (1 by default)
 -f, --finish                Watch after invocation, enable by default
 -h, --help                  this help
 -n, --limits <value>        Threshold of execution times
 -E, --regex                 Enable regular expression to match (wildcard matching b
                             y default)
 -M, --sizeLimit <value>     Upper size limit in bytes for the result (10 * 1024 * 1
                             024 by default)
 -s, --success               Watch after successful invocation
 <class-pattern>             The full qualified class name you want to watch
 <method-pattern>            The method name you want to watch
 <express>                   the content you want to watch, written by ognl.
                             Examples:
                               params[0]
                               'params[0]+params[1]'
                               returnObj
                               throwExp
                               target
                               clazz
                               method

 <condition-express>         Conditional expression in ognl style, for example:
                               TRUE  : 1==1
                               TRUE  : true
                               FALSE : false
                               TRUE  : 'params.length>=0'
                               FALSE : 1==2


$

Here we watch the return value of the uuid() method. The monitoring output is as follows:

$
$
$ watch -f com.oct.tail.OtherTestCase uuid returnObj
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 18 ms.
ts=2018-09-19 11:13:48;result=@String[26c80eb505664dbcb14f8d810fb4811c]
ts=2018-09-19 11:13:49;result=@String[fc03c43864f94372b646ce6253d90646]
ts=2018-09-19 11:13:50;result=@String[55ff41e0d66347c2bc75ab8ff4ffda4e]
ts=2018-09-19 11:13:51;result=@String[c504388c0aa74458a41a1b3a77c3d536]
ts=2018-09-19 11:13:52;result=@String[18d59c09ffde4c7aab15feb88b3e433f]
ts=2018-09-19 11:13:53;result=@String[c19dd8c1e5f8442696c8f886e81e74d5]
ts=2018-09-19 11:13:54;result=@String[d37a74aa502f4897aa1ed84dc69b83d8]
ts=2018-09-19 11:13:55;result=@String[cc11753b6f424c1e9a6a1ab36f334349]
ts=2018-09-19 11:13:56;result=@String[75a9b3c0bed4426d9363168912f16d74]
ts=2018-09-19 11:13:57;result=@String[f13022118e5a4115800a6eacc480e6a8]

It works so well that I can hardly believe it.

MySQL: delete duplicate records and keep only one row

Recently I have been working on a question bank system. Since duplicate questions were added to the bank, we need to query for the duplicates and delete all but one of each, so that duplicate questions cannot be drawn in a test.

First, I wrote a small example.

Single-field operations

(The examples below use a dept table with columns deptno, dname, and db_source.)

To see whether there is duplicate data, use GROUP BY:

SELECT repeated_field FROM table_name GROUP BY repeated_field HAVING COUNT(*) > 1

The general form is:

GROUP BY <column name list>
HAVING <group condition expression>

This groups the rows (here by dname) and returns the groups that satisfy the condition in the HAVING clause (a repetition count greater than 1).

There is no difference between COUNT(*) and COUNT(1) here.

Query all the duplicate data:

SELECT * FROM table_name WHERE repeated_field IN (SELECT repeated_field FROM table_name GROUP BY repeated_field HAVING COUNT(*) > 1)

Delete all duplicate questions:

Change SELECT to DELETE in the query above (this will cause an error):

DELETE
FROM
	dept
WHERE
	dname IN (
		SELECT
			dname
		FROM
			dept
		GROUP BY
			dname
		HAVING
			count(1) > 1
	)

The following error will occur: [Err] 1093 – You can't specify target table 'dept' for update in FROM clause

The reason: the statement deletes from the table while also querying the same table in a subquery, and querying a table while updating it can be understood as a deadlock. MySQL does not support this kind of update-while-querying-the-same-table operation.

Solution: select the rows to be matched into a derived (third-party) table first, then filter the delete against that table.

Query the duplicate questions in the table (grouped by dname, excluding the row with the smallest deptno in each group):

The first method:

SELECT
	*
FROM
	dept
WHERE
	dname IN (
		SELECT
			dname
		FROM
			dept
		GROUP BY
			dname
		HAVING
			COUNT(1) > 1
	)
AND deptno NOT IN (
	SELECT
		MIN(deptno)
	FROM
		dept
	GROUP BY
		dname
	HAVING
		COUNT(1) > 1
)

The above is correct, but the query is slow; you can try the following method instead.

The second method:

Group by dname and find the minimum deptno of each group, then query the rows whose deptno is not among the minima just found. This returns all the duplicate rows (except the row with the smallest deptno in each group).

SELECT *
FROM
	dept
WHERE
	deptno NOT IN (
		SELECT
			dt.minno
		FROM
			(
				SELECT
					MIN(deptno) AS minno
				FROM
					dept
				GROUP BY
					dname
			) dt
	)

Delete the redundant duplicate questions in the table, keeping only one of each:

The first method:

DELETE
FROM
	dept
WHERE
	dname IN (
		SELECT
			t.dname
		FROM
			(
				SELECT
					dname
				FROM
					dept
				GROUP BY
					dname
				HAVING
					count(1) > 1
			) t
	)
AND deptno NOT IN (
SELECT
	dt.mindeptno
FROM
	(
		SELECT
			min(deptno) AS mindeptno
		FROM
			dept
		GROUP BY
			dname
		HAVING
			count(1) > 1
	) dt
)

The second method (corresponding to the second query method above; just change SELECT to DELETE):

DELETE
FROM
	dept
WHERE
	deptno NOT IN (
		SELECT
			dt.minno
		FROM
			(
				SELECT
					MIN(deptno) AS minno
				FROM
					dept
				GROUP BY
					dname
			) dt
	)

Multiple-field operations:

If you have already learned how to deduplicate on a single field, multiple fields are just as simple.

DELETE
FROM
	dept
WHERE
	(dname, db_source) IN (
		SELECT
			t.dname,
			t.db_source
		FROM
			(
				SELECT
					dname,
					db_source
				FROM
					dept
				GROUP BY
					dname,
					db_source
				HAVING
					count(1) > 1
			) t
	)
AND deptno NOT IN (
	SELECT
		dt.mindeptno
	FROM
		(
			SELECT
				min(deptno) AS mindeptno
			FROM
				dept
			GROUP BY
				dname,
				db_source
			HAVING
				count(1) > 1
		) dt
)

To sum up:
There is still a lot to optimize in the methods above; with a large amount of data they run very slowly. Consider the following optimizations:

Add an index to frequently queried fields.
Replace * with only the fields you need instead of selecting everything.
Use IN when a small (outer) data set drives a large table, and EXISTS when a large data set drives a small table. IN suits the case where the outer data set is small: roughly speaking, IN pairs every outer row against the subquery result, so if table a has 100 rows and table b has 10,000, that is on the order of 100 × 10,000 comparisons, whereas EXISTS runs once per outer row (a.length times) to check whether the row exists in b. Which one is faster depends on the situation, because IN compares in memory while EXISTS performs a database lookup.

Redis watch command

Use the Linux watch command to poll redis-cli info every 1 (or 5) seconds and highlight changes (-d). Note that the inner command must be wrapped in single quotes so the shell does not terminate the string at the inner double quotes:

watch -n 1 -d 'redis-cli -h 10.8.7.108 -p 6380 info | grep -e connected_clients -e blocked_clients -e used_memory_human -e used_memory_peak_human -e rejected_connections -e evicted_keys -e instantaneous'

watch -n 5 -d 'redis-cli -h 10.8.7.108 -p 6380 info | grep -e connected_clients -e blocked_clients -e used_memory_human -e used_memory_peak_human -e rejected_connections -e evicted_keys -e instantaneous'

Logstash converts the @timestamp time to the local time.

Logstash's @timestamp is in UTC. If the local time zone is the Beijing time zone (UTC+8), adding the following filter solves the problem.

filter {
         ruby {
            code => "event.set('mytime', (event.get('@timestamp').time.localtime + 8*60*60).strftime('%Y-%m-%d %H:%M:%S'))"
         }
         date {
            match => [ "mytime", "yyyy-MM-dd HH:mm:ss" ]
         }
}

Remember: the date { match => [ "mytime", "yyyy-MM-dd HH:mm:ss" ] } block cannot be omitted.

Java problem diagnosis tool: Greys-Anatomy

Let me introduce a very useful Java process monitoring tool that can print the average and maximum execution time of each method in a Java class, which is very helpful for troubleshooting latency in Java programs. The tool's name is Greys-Anatomy.
Greys is positioned as a professional JVM business-problem locating tool. Since it targets the JVM, its users are mostly Java programmers. The author hopes to share the skills and ideas accumulated while writing software, so that more Java programmers can participate in its development or benefit from it.

Main features

View class and method information already loaded by the JVM

1. Method execution monitoring

      Call counts, success/failure rates, response times

2. Method data operations

      Record and view input parameters, return values, and exception information; supports replaying invocations

3. Performance overhead rendering

      Trace the call trajectory of methods along a specified path and measure the time consumed

View the method call stack

Download and install
Reference: https://github.com/oldmanpushcart/greys-anatomy

After the installation is complete, you can write a shell script similar to the following (my-gresy.sh, assuming the Java program runs in a Tomcat container):

JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64/
JRE_HOME=$JAVA_HOME   # JRE_HOME is used below, so point it at the same JRE
CLASS_PATH=.:$JRE_HOME/lib
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export JAVA_HOME JRE_HOME CLASS_PATH PATH
# find the PID of the target Tomcat instance
pid=$(ps -ef | grep -i tomcat-8.5-1 | grep -v 'grep' | awk '{print $2}')
/data/greys/greys.sh $pid@127.0.0.1:3258 | tee -a /data/gresy/gresy.log

Execute the script:

> sh my-gresy.sh

The Greys console interface then appears.

Then execute the command monitor -c 5 com.my.api.* * to print the monitoring information every 5 seconds.

From the printed information you can find out which methods take the longest to execute.

Shell script to monitor server processes and ports

Recently, while learning shell programming, I wrote a script that lists the ports, PIDs, and program names currently in use on the server; it can be used to discover unusual listening ports and to judge whether the machine has been "taken over" by a hacker.

The code is as follows:

#!/bin/bash

# tcp part
port1=`netstat -an|grep LISTEN|egrep "0.0.0.0|:::"|awk '/^tcp/ {print $4}'|awk -F: '{print $2$4}'|sort -n`
echo "TCP state:"
echo "--------------------------------"
echo "PORT PID COMMAND"
for a in $port1
do
b=`lsof -n -i:$a|grep TCP|grep LISTEN|grep IPv4|awk '{printf("%d\t%s\n"),$2,$1}'`
echo "$a $b"
done
echo "--------------------------------"

# udp part
echo ""
port2=`netstat -an|grep udp|awk '{print $4}'|awk -F: '{print $2}'|sed '/^$/d'|sort -n`
echo "UDP state:"
echo "--------------------------------"
echo "PORT PID COMMAND"
for a in $port2
do
b=`lsof -n -i:$a|grep UDP|grep IPv4|awk '{printf("%d\t%s\n"),$2,$1}'`
if [ -n "$b" ];then
echo "$a $b"
fi
done
echo "--------------------------------"

exit 0

Using the AWS SDK for Java 2.0 to set public access on an S3 object

// assumes the usual AWS SDK v2 imports (software.amazon.awssdk.services.s3.*,
// software.amazon.awssdk.services.s3.model.*) plus java.util.ArrayList and java.util.List
Region region = Region.AP_SOUTHEAST_1;
S3Client s3 = S3Client.builder().region(region).build();

String bucket = "your bucket name"; // e.g. "bucket" + System.currentTimeMillis();
String key = "your file name";

// create the bucket if it does not exist (helper method, not shown)
createBucket(bucket, region);

// put an object (getRandomByteBuffer is a helper producing random content, not shown)
PutObjectRequest request = PutObjectRequest.builder().bucket(bucket).key(key).build();
s3.putObject(request, RequestBody.fromByteBuffer(getRandomByteBuffer(10_000)));

// set a public-read canned ACL
PutObjectAclRequest putAclReq = PutObjectAclRequest.builder()
        .bucket(bucket)
        .key(key)
        .acl(ObjectCannedACL.PUBLIC_READ)
        .build();
s3.putObjectAcl(putAclReq);

// grant READ to an email address: first get the current ACL
GetObjectAclRequest objectAclReq = GetObjectAclRequest.builder()
        .bucket(bucket)
        .key(key)
        .build();
GetObjectAclResponse getAclRes = s3.getObjectAcl(objectAclReq);

String email = "your email address";
Grantee grantee = Grantee.builder()
        .emailAddress(email)
        .type(Type.AMAZON_CUSTOMER_BY_EMAIL)
        .build();
Grant newGrant = Grant.builder()
        .grantee(grantee)
        .permission(Permission.READ)
        .build();
// the list returned by grants() is immutable, so copy it before adding
List<Grant> grants = new ArrayList<>(getAclRes.grants());
grants.add(newGrant);

// put the new ACL (carry the owner over from the current ACL)
AccessControlPolicy acl = AccessControlPolicy.builder()
        .owner(getAclRes.owner())
        .grants(grants)
        .build();
PutObjectAclRequest putEmailAclReq = PutObjectAclRequest.builder()
        .bucket(bucket)
        .key(key)
        .accessControlPolicy(acl)
        .build();
s3.putObjectAcl(putEmailAclReq);

Yii pseudo-static pages

How can you access static PHP files under the Yii framework without creating multiple actions? Here is a simple record of my approach, in the hope of leading to a better implementation:
1. Configure main.php:

'urlManager'=>array(  
            'urlFormat'=>'path',  
            'showScriptName'=>false,  
            'rules'=>array(  
                  
'post/<view:.*>.html'=>'post/page/',  
  
  
                '<controller:\w+>/<action:\w+>'=>'<controller>/<action>',  
            ),  
        ), 

The rule 'post/<view:.*>.html' => 'post/page/' is the most important line.

2. Implement a PostController:

<?php  
class PostController extends Controller{  
    public function actions() {  
        return array (  
                'page' => array (  
                        'class' => 'CViewAction'   
                )   
        );  
    }  
}  

3. Add a post/pages directory under the corresponding views directory, and then add a static PHP file (such as 12345.php) to the pages directory.
It can then be accessed via http://domainname/post/12345.html. If there is a subdirectory (such as 20120920/123456.php), it can be accessed via http://domainname/post/20120920/123456.html.

Kafka Producer Performance Optimization

When we are talking about performance of Kafka Producer, we are really talking about two different things:

  • latency: how much time passes from the time KafkaProducer.send() was called until the message shows up in a Kafka broker.
  • throughput: how many messages can the producer send to Kafka each second.

Many years ago, I was in a storage class taught by scalability expert James Morle. One of the students asked why we need to worry about both latency and throughput – after all, if processing a message takes 10ms (latency), then clearly throughput is limited to 100 messages per second. When looking at things this way, it may look like higher latency == lower throughput. However, the relation between latency and throughput is not this trivial.

Let's start our discussion by agreeing that we are only talking about the new Kafka Producer (the one in the org.apache.kafka.clients package). It makes things simpler, and there's no reason to use the old producer at this point.

The Kafka Producer allows messages to be sent in batches. Suppose that, due to network round-trip times, it takes 2ms to send a single Kafka message. By sending one message at a time, we have a latency of 2ms and a throughput of 500 messages per second. But suppose that we are in no big hurry and are willing to wait a few milliseconds to send a larger batch – let's say we decide to wait 8ms and manage to accumulate 1000 messages. Our latency is now 10ms, but our throughput is up to 100,000 messages per second! That's the main reason I love microbatches so much. By adding a tiny delay – and 10ms is usually acceptable even for financial applications – our throughput is 200 times greater. This type of trade-off is not unique to Kafka, by the way; network and storage subsystems use this kind of micro-batching all the time.

Sometimes latency and throughput interact in even funnier ways. One day Ted Malaska complained that with Flafka, he can get 20ms latency when sending 100,000 messages per second, but huge 1-3s latency when sending just 100 messages a second. This made no sense at all, until we remembered that to save CPU, if Flafka doesn’t find messages to read from Kafka it will back off and retry later. Backoff times started at 0.5s and steadily increased. Ted kindly improved Flume to avoid this issue in FLUME-2729.

Anyway, back to the Kafka Producer. There are a few settings you can modify to improve latency or throughput in the Kafka Producer:

  • batch.size – This is an upper limit on how many bytes the Kafka Producer will attempt to batch before sending (the default is 16K bytes – so 16 messages if each message is 1K in size). Kafka may send batches before this limit is reached (so latency doesn't change by modifying this parameter), but will always send when the limit is reached. Therefore setting this limit too low will hurt throughput without improving latency. The main reason to set it low is lack of memory – Kafka will always allocate enough memory for the entire batch size, even if latency requirements cause it to send half-empty batches.
  • linger.ms – How long the producer will wait before sending, in order to allow more messages to accumulate in the same batch. Normally the producer will not wait at all and will simply send all the messages that accumulated while the previous send was in progress (2ms in the example above), but as we've discussed, sometimes we are willing to wait a bit longer to improve the overall throughput at the expense of slightly higher latency. In that case, tuning linger.ms to a higher value makes sense. Note that if batch.size is low and the batch is full before linger.ms time passes, the batch will be sent early, so it makes sense to tune batch.size and linger.ms together (see the configuration sketch after this list).
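As a concrete illustration, here is a minimal sketch of a producer configured to favor throughput; the broker address and the chosen values are assumptions for this example, not recommendations:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ThroughputTunedProducer {
	public static KafkaProducer<String, String> create() {
		Properties props = new Properties();
		props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
		props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
				"org.apache.kafka.common.serialization.StringSerializer");
		props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
				"org.apache.kafka.common.serialization.StringSerializer");
		// Trade a little latency for throughput: bigger batches, a short linger.
		props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536); // bytes; the default is 16K
		props.put(ProducerConfig.LINGER_MS_CONFIG, 10);     // milliseconds; the default is 0
		return new KafkaProducer<>(props);
	}
}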

Other than tuning these parameters, you will want to avoid waiting on the future of the send method (i.e. the result from the Kafka brokers), and instead keep sending data to Kafka continuously. You can simply ignore the result (if the success of sending messages is not critical), but it's probably better to use a callback, as sketched below. You can find an example of how to do this in my GitHub repository (look at the produceAsync method).
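Continuing the sketch above, a minimal non-blocking send with a callback (the topic name is an assumption for the example):

// producer comes from ThroughputTunedProducer.create() in the sketch above
ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "some message");
producer.send(record, (metadata, exception) -> {
	if (exception != null) {
		exception.printStackTrace(); // handle or log the failure
	}
	// On success, metadata carries the topic, partition, and offset.
});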

If sending is still slow and you are trying to understand what is going on, check whether the send thread is fully utilized with jvisualvm (the thread is called kafka-producer-network-thread), or keep an eye on the average batch size metric. If you find that you can't fill the buffer fast enough and the sender is idle, you can try adding application threads that share the same producer and increase throughput this way.

Another concern can be that the Producer will send all the batches that go to the same broker together when at least one of them is full – if you have one very busy topic and others that are less busy, you may see some skew in throughput this way.

Sometimes you will notice that the producer performance doesn’t scale as you add more partitions to a topic. This can happen because, as we mentioned, there is a send buffer for each partition. When you add more partitions, you have more send buffers, so perhaps the configuration you set to keep the buffers full before (# of threads, linger.ms) is no longer sufficient and buffers are sent half-empty (check the batch sizes). In this case you will need to add threads or increase linger.ms to improve utilization and scale your throughput.

Got more tips on ingesting data into Kafka? Comments are welcome!