MySQL: delete duplicate records and keep only one row

Recently I have been working on a question bank system. Because duplicate questions get added to the bank, I needed to query for the duplicates and delete all but one of each, so that the same question cannot be drawn twice in a test.

Let's start with a small example.

Single-field operation

This is the dept table in the database:

Group By:

SELECT <repeat_field> FROM <table> GROUP BY <repeat_field> HAVING COUNT(*) > 1

See if there is duplicate data:

GROUP BY <column name sequence>
HAVING <group conditional expression>

This groups the rows by dname and returns only the groups that satisfy the HAVING condition, i.e. those whose number of occurrences is greater than 1.

There is no difference between count(*) and count(1).
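For example, a concrete version of the template above, applied to the dept table used in the rest of this post, looks like this:

-- Find dname values that appear more than once in dept
SELECT dname, COUNT(*) AS cnt
FROM dept
GROUP BY dname
HAVING COUNT(*) > 1;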

Query all duplicate data:

SELECT * FROM <table> WHERE <repeat_field> IN (SELECT <repeat_field> FROM <table> GROUP BY <repeat_field> HAVING COUNT(*) > 1)

Delete all duplicate questions:

Change SELECT in the query above to DELETE (this will cause an error):

DELETE
FROM
	dept
WHERE
	dname IN (
		SELECT
			dname
		FROM
			dept
		GROUP BY
			dname
		HAVING
			count(1) > 1
	)

The following error will occur: [Err] 1093 – You can’t specify target table ‘dept’ for update in FROM clause

The reason: the statement deletes from dept while also selecting from dept in a subquery. MySQL does not allow you to update or delete from a table while selecting from that same table in a subquery of the same statement.

Solution: select the data to be matched through a derived table (a "third-party" table), and filter against that instead.
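As a sketch, applying that workaround to the failing DELETE above means wrapping the grouping subquery in a derived table so MySQL materializes it before the delete touches dept. Note that this variant removes every duplicated row; the versions that keep one row per group follow below.

-- Same DELETE as above, but the subquery goes through a derived table (t),
-- which avoids error 1093
DELETE FROM dept
WHERE dname IN (
	SELECT dname FROM (
		SELECT dname
		FROM dept
		GROUP BY dname
		HAVING count(1) > 1
	) t
);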

Query the duplicate questions in the table (grouped by dname, excluding the row with the smallest deptno in each group):

The first method:

SELECT
	*
FROM
	dept
WHERE
	dname IN (
		SELECT
			dname
		FROM
			dept
		GROUP BY
			dname
		HAVING
			COUNT(1) > 1
	)
AND deptno NOT IN (
	SELECT
		MIN(deptno)
	FROM
		dept
	GROUP BY
		dname
	HAVING
		COUNT(1) > 1
)

The above is correct, but the query is slow; you can try the following method instead.

The second method:

Group by dname and find the minimum deptno of each group, then select the rows whose deptno is not in that set. This returns all the duplicate rows except the one with the smallest deptno in each group.

SELECT *
FROM
	dept
WHERE
	deptno NOT IN (
		SELECT
			dt.minno
		FROM
			(
				SELECT
					MIN(deptno) AS minno
				FROM
					dept
				GROUP BY
					dname
			) dt
	)

Delete the redundant duplicate questions in the table and leave only one of each:

The first method:

DELETE
FROM
	dept
WHERE
	dname IN (
		SELECT
			t.dname
		FROM
			(
				SELECT
					dname
				FROM
					dept
				GROUP BY
					dname
				HAVING
					count(1) > 1
			) t
	)
AND deptno NOT IN (
SELECT
	dt.mindeptno
FROM
	(
		SELECT
			min(deptno) AS mindeptno
		FROM
			dept
		GROUP BY
			dname
		HAVING
			count(1) > 1
	) dt
)

The second method (corresponding to the second query method above; just change SELECT * to DELETE):

DELETE
FROM
	dept
WHERE
	deptno NOT IN (
		SELECT
			dt.minno
		FROM
			(
				SELECT
					MIN(deptno) AS minno
				FROM
					dept
				GROUP BY
					dname
			) dt
	)

Operating on multiple fields:

If you have already learned how to delete duplicates on a single field, doing it on multiple fields is just as simple.

DELETE
FROM
	dept
WHERE
	(dname, db_source) IN (
		SELECT
			t.dname,
			t.db_source
		FROM
			(
				SELECT
					dname,
					db_source
				FROM
					dept
				GROUP BY
					dname,
					db_source
				HAVING
					count(1) > 1
			) t
	)
AND deptno NOT IN (
	SELECT
		dt.mindeptno
	FROM
		(
			SELECT
				min(deptno) AS mindeptno
			FROM
				dept
			GROUP BY
				dname,
				db_source
			HAVING
				count(1) > 1
		) dt
)

To sum up:
There is still plenty of room to optimize the methods above; with a large amount of data they run very slowly. Consider the following optimizations:

Add indexes to the frequently queried fields.
Replace * with only the fields you actually need; don't select everything.
Let the small table drive the large one: IN suits cases where the subquery result set is small, while EXISTS suits cases where it is large. Roughly speaking, with an outer table a of 100 rows and an inner table b of 10,000 rows, IN ends up comparing on the order of 100 × 10,000 values, while EXISTS runs one probe per row of a, i.e. a.length times, to check whether a matching row exists in b. Which is faster depends on the situation, because IN compares against a result set held in memory while EXISTS issues a query against the database for each outer row (see the sketch below).
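A sketch of the two query shapes, using the hypothetical tables a and b from the point above and a hypothetical join column a_id:

-- IN form: the subquery result is produced once and compared for every row of a
SELECT * FROM a WHERE a.id IN (SELECT b.a_id FROM b);

-- EXISTS form: for each row of a, probe b once for a matching row
SELECT * FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id);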

MySQL deadlock, waiting for resources, transaction locks: Lock wait timeout exceeded; try restarting transaction

As covered when learning about InnoDB: when a lock wait occurs, InnoDB decides whether to time the wait out based on the innodb_lock_wait_timeout parameter. This section describes how to view and analyze lock waits when they happen.
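For example, you can check the current setting (and raise it if necessary); 50 seconds is the default, and 120 below is only an illustrative value:

-- Check the current lock wait timeout
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
-- Raise it server-wide; 120 is just an example, tune to your workload
SET GLOBAL innodb_lock_wait_timeout = 120;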

Before the InnoDB Plugin, current database activity was generally inspected with SHOW FULL PROCESSLIST (which makes it hard to find the locked row) and SHOW ENGINE INNODB STATUS, and the locks held by current transactions had to be worked out from there. As MySQL has developed, a more convenient way to monitor lock waits in the database has been provided.

information_schema contains three tables for this: INNODB_TRX, INNODB_LOCKS, and INNODB_LOCK_WAITS. Through them you can monitor the current transactions and analyze possible lock problems much more easily (a sample query follows the column list below).

The more commonly used columns of INNODB_TRX:

trx_id: the unique transaction ID inside the InnoDB storage engine
trx_state: the current state of the transaction
trx_started: the start time of the transaction
trx_requested_lock_id: the ID of the lock the transaction is waiting for
trx_wait_started: the time at which the transaction started waiting
trx_weight: the weight of the transaction, reflecting the number of rows modified and locked by it; when a deadlock is detected and a rollback is required, the transaction with the smaller weight is rolled back
trx_mysql_thread_id: the thread ID in MySQL, corresponding to the Id value in SHOW PROCESSLIST
trx_query: the SQL statement the transaction is running
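A minimal sketch of the monitoring query referenced above, joining these tables to pair each waiting transaction with the one blocking it (note that in MySQL 8.0 INNODB_LOCKS and INNODB_LOCK_WAITS were replaced by performance_schema.data_locks and data_lock_waits):

SELECT
	r.trx_id              AS waiting_trx_id,
	r.trx_mysql_thread_id AS waiting_thread,
	r.trx_query           AS waiting_query,
	b.trx_id              AS blocking_trx_id,
	b.trx_mysql_thread_id AS blocking_thread,
	b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;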

The immediate fix is KILL <process ID> on the blocking connection. I ran into this at work with a vehicle record: running SELECT * FROM car FOR UPDATE, or modifying the value of a field, raised the error "Lock wait timeout exceeded; try restarting transaction". Killing the blocking thread clears the symptom, but the fundamental fix is to optimize how the business logic operates on the database. I have hit this before, for example when modifying a record and then immediately modifying it again, and I now try to avoid such patterns at the code and business level.
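A sketch of that immediate fix, assuming the blocking connection has already been identified; the thread ID 1234 below is hypothetical, so substitute the trx_mysql_thread_id of the blocking transaction found with the query above:

-- List running transactions and their MySQL thread IDs
SELECT trx_mysql_thread_id, trx_state, trx_query
FROM information_schema.INNODB_TRX;

-- Kill the blocking connection (hypothetical thread ID)
KILL 1234;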

MyBatis annotation-mapped statements

MyBatis provides annotations for mapping SELECT, UPDATE, INSERT, and DELETE statements. Let's take a closer look at how these mappings are used.

1.@Insert
We can use the @Insert annotation to declare an INSERT mapping.

package com.owen.mybatis.mappers;

public interface StudentMapper {
    @Insert("INSERT INTO STUDENTS(STUD_ID, NAME, EMAIL, ADDR_ID, PHONE) "
            + "VALUES(#{studId},#{name},#{email},#{address.addrId},#{phone})")
    int insertStudent(Student student);
}
The insertStudent() method is annotated with @Insert and returns the number of rows affected by the INSERT statement.

When we used XML mapping files earlier, we could declare that the statement auto-generates the primary key. With annotations we can do the same by adding @Options to the declaration. @Options takes the useGeneratedKeys and keyProperty attributes: useGeneratedKeys tells MyBatis that the column is an AUTO_INCREMENT column, and keyProperty names the property of the object into which the generated key value is written.

@Insert("INSERT INTO STUDENTS(NAME, EMAIL, ADDR_ID, PHONE) "
        + "VALUES(#{name},#{email},#{address.addrId},#{phone})")
@Options(useGeneratedKeys = true, keyProperty = "studId")
int insertStudent(Student student);
Here the STUD_ID column value is generated automatically by the MySQL database, and the generated value is written back into the studId property.

StudentMapper mapper = sqlSession.getMapper(StudentMapper.class);
mapper.insertStudent(student);
int studentId = student.getStudId();
Some databases, such as Oracle, do not provide AUTO_INCREMENT columns, and a SEQUENCE has to be used to generate the primary key instead. We can use the @SelectKey annotation to specify an arbitrary SQL statement whose result is used as the primary key value.

@Insert("INSERT INTO STUDENTS(STUD_ID, NAME, EMAIL, ADDR_ID, PHONE) "
        + "VALUES(#{studId},#{name},#{email},#{address.addrId},#{phone})")
@SelectKey(statement = "SELECT STUD_ID_SEQ.NEXTVAL FROM DUAL",
        keyProperty = "studId", resultType = int.class, before = true)
int insertStudent(Student student);
Here we apply @SelectKey to generate the primary key value and store it in the student's studId property. Because we set before=true, the key is generated before the INSERT statement runs.

If instead a database trigger fills the primary key from the SEQUENCE, we can execute the INSERT first and then read the key from sequence_name.currval (hence before=false):

@Insert("INSERT INTO STUDENTS(NAME, EMAIL, ADDR_ID, PHONE) "
        + "VALUES(#{name},#{email},#{address.addrId},#{phone})")
@SelectKey(statement = "SELECT STUD_ID_SEQ.CURRVAL FROM DUAL",
        keyProperty = "studId", resultType = int.class, before = false)
int insertStudent(Student student);

2.@Update
We can use the @Update annotation to declare an UPDATE mapping.

@Update("UPDATE STUDENTS SET NAME=#{name}, EMAIL=#{email}, "
        + "PHONE=#{phone} WHERE STUD_ID=#{studId}")
int updateStudent(Student student);
The updateStudent() method is annotated with @Update and returns the number of affected rows.

StudentMapper mapper = sqlSession.getMapper(StudentMapper.class);
int noOfRowsUpdated = mapper.updateStudent(student);

3.@Delete
We can use the @Delete annotation to declare a DELETE mapping.

@Delete("DELETE FROM STUDENTS WHERE STUD_ID=#{studId}")
int deleteStudent(int studId);
The deleteStudent() method returns the number of affected rows.

4.@Select
We can use the @Select annotation to implement the SELECT mapping.

package com.owen.mybatis.mappers;

public interface StudentMapper {
    @Select("SELECT STUD_ID AS STUDID, NAME, EMAIL, PHONE "
            + "FROM STUDENTS WHERE STUD_ID=#{studId}")
    Student findStudentById(Integer studId);
}
To map the result onto the Student object, we alias the STUD_ID column as STUDID so it matches the studId property. Note that this method returns a single Student; if the query returns multiple rows, MyBatis throws a TooManyResultsException. To handle multiple rows, declare the return type as List<Student> instead.