GRMS README

A product recommendation system based on Hadoop

Based on characteristics:
Based on behavior: requires historical behavior data.
Based on user:
Based on product:

Recommendation result = user’s purchase vector * item similarity matrix

Item similarity: the co-occurrence count of two items, that is, the number of users who bought both.
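Written out per user and product (the symbols are notation introduced here, not names used in the code), the score of product $j$ for user $u$ is

$$R_{u,j} = \sum_{k} P_{u,k}\, C_{k,j}$$

where $P_{u,k}$ is 1 if user $u$ bought product $k$ and 0 otherwise, and $C_{k,j}$ is the number of users who bought both $k$ and $j$ (so $C_{k,k}$ is the number of buyers of $k$). For example, with the sample data worked through below, user 10001 bought 20001, 20002, 20005, 20006 and 20007, so

$$R_{10001,20002} = C_{20001,20002} + C_{20002,20002} + C_{20005,20002} + C_{20006,20002} + C_{20007,20002} = 2 + 3 + 2 + 2 + 2 = 11,$$

which is the value step f reports for 10001,20002.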

1. Project name: GRMS
2. Add the Maven dependencies to pom.xml.
3. Create the packages:
com.briup.bigdata.project.grms
|-- step1
|-- step2
|-- …
|-- utils
4. Copy the cluster's four XML configuration files into the resources directory.
5. Create the following layout under the root directory of the HDFS cluster:
/grms
|-- rawdata
|   |-- matrix.txt
|-- step1
|-- …
6. Start programming.
Original data (each line: user ID, product ID, 1):
10001 20001 1
10001 20002 1
10001 20005 1
10001 20006 1
10001 20007 1
10002 20003 1
10002 20004 1
10002 20006 1
10003 20002 1
10003 20007 1
10004 20001 1
10004 20002 1
10004 20005 1
10004 20006 1
10005 20001 1
10006 20004 1
10006 20007 1
a. Calculate the list of products purchased by each user
File: UserBuyGoodsList.java

Class name: UserBuyGoodsList
UserBuyGoodsListMapper
UserBuyGoodsListReducer
Data source: the original data
Calculation result:
10001 20001,20005,20006,20007,20002
10002 20006,20003,20004
10003 20002,20007
10004 20001,20002,20005,20006
10005 20001
10006 20004,20007
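The step can be sketched in plain Java as an in-memory simulation of the map, shuffle and reduce phases, assuming whitespace-separated input lines as above. It does not use the Hadoop API; the class UserBuyGoodsListSim and its methods are hypothetical names, and a TreeMap stands in for the shuffle's group-and-sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of step a (UserBuyGoodsList); no Hadoop classes are used.
class UserBuyGoodsListSim {

    // Sample input: the original data listed above ("user product 1").
    static final List<String> RAW = List.of(
            "10001 20001 1", "10001 20002 1", "10001 20005 1", "10001 20006 1", "10001 20007 1",
            "10002 20003 1", "10002 20004 1", "10002 20006 1",
            "10003 20002 1", "10003 20007 1",
            "10004 20001 1", "10004 20002 1", "10004 20005 1", "10004 20006 1",
            "10005 20001 1",
            "10006 20004 1", "10006 20007 1");

    // Map: emit (user, product) for each line.
    // Shuffle: the TreeMap groups the values by user and sorts the user IDs.
    static Map<String, List<String>> mapAndShuffle(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            grouped.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields[1]);
        }
        return grouped;
    }

    // Reduce: join each user's products into "user p1,p2,...".
    static List<String> reduce(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.add(e.getKey() + " " + String.join(",", e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        reduce(mapAndShuffle(RAW)).forEach(System.out::println);
    }
}
```

Running main prints one line per user, for example 10001 20001,20002,20005,20006,20007. The simulation keeps products in input order; Hadoop does not guarantee the order of values within a reduce group, so the listing above may show them in a different order.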
b. Calculate the co-occurrence relationships between products
File: GoodsCooccurrenceList.java

Class name: GoodsCooccurrenceList
GoodsCooccurrenceListMapper
GoodsCooccurrenceListReducer
Data source: the result of step 1
Calculation result:
20001 20001
20001 20001
20001 20002
20001 20005
20001 20006
20001 20007
20001 20001
20001 20006
20001 20005
20001 20002
20002 20007
20002 20001
20002 20005
20002 20006
20002 20007
20002 20002
20002 20006
20002 20005
20002 20002
20002 20001
20002 20002
20003 20003
20003 20004
20003 20006
20004 20004
20004 20007
20004 20004
20004 20006
20004 20003
20005 20002
20005 20006
20005 20005
20005 20001
20005 20005
20005 20006
20005 20007
20005 20001
20005 20002
20006 20005
20006 20003
20006 20004
20006 20001
20006 20002
20006 20006
20006 20002
20006 20006
20006 20007
20006 20006
20006 20001
20006 20005
20007 20006
20007 20004
20007 20007
20007 20002
20007 20007
20007 20005
20007 20001
20007 20002
20007 20007
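A plain-Java sketch of the step b map logic, assuming each step 1 line has the form user p1,p2,...: every ordered pair of a user's products is emitted, including a product paired with itself. No Hadoop classes are used; GoodsCooccurrenceListSim is a hypothetical name, and the reducer is assumed to pass pairs through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory simulation of step b (GoodsCooccurrenceList); no Hadoop classes are used.
class GoodsCooccurrenceListSim {

    // Sample input: the step 1 result listed above ("user p1,p2,...").
    static final List<String> STEP1 = List.of(
            "10001 20001,20005,20006,20007,20002",
            "10002 20006,20003,20004",
            "10003 20002,20007",
            "10004 20001,20002,20005,20006",
            "10005 20001",
            "10006 20004,20007");

    // Map for one user: every ordered pair (g1, g2) of products that user bought,
    // including g1 == g2, so that step c can count the buyers of each product.
    static List<String> map(String line) {
        String[] goods = line.trim().split("\\s+")[1].split(",");
        List<String> pairs = new ArrayList<>();
        for (String g1 : goods) {
            for (String g2 : goods) {
                pairs.add(g1 + " " + g2);
            }
        }
        return pairs;
    }

    // Runs the map over all users; sorting stands in for the shuffle,
    // and the (assumed pass-through) reducer leaves the pairs unchanged.
    static List<String> run(List<String> lines) {
        List<String> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        all.sort(null); // natural String ordering
        return all;
    }

    public static void main(String[] args) {
        run(STEP1).forEach(System.out::println);
    }
}
```

For the sample input it yields 59 pairs, the same pairs as the listing above, in sorted order.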
c. Calculate the number of product co-occurrences (co-occurrence matrix)
File: GoodsCooccurrenceMatrix.java

Class name: GoodsCooccurrenceMatrix
GoodsCooccurrenceMatrixMapper
GoodsCooccurrenceMatrixReducer
Data source: the result of step 2
Calculation result:
20001 20001:3,20002:2,20005:2,20006:2,20007:1
20002 20001:2,20002:3,20005:2,20006:2,20007:2
20003 20003:1,20004:1,20006:1
20004 20003:1,20004:2,20006:1,20007:1
20005 20001:2,20002:2,20005:2,20006:2,20007:1
20006 20001:2,20002:2,20003:1,20004:1,20005:2,20006:3,20007:1
20007 20001:1,20002:2,20004:1,20005:1,20006:1,20007:3
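A plain-Java sketch of the step c reduce logic, assuming step 2 lines of the form g1 g2: the pairs are grouped by g1 and each co-occurring product is counted. It uses only java.util (no Hadoop API); GoodsCooccurrenceMatrixSim is a hypothetical name, and the sample input is the ten step 2 pairs whose key is 20001.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// In-memory simulation of step c (GoodsCooccurrenceMatrix); no Hadoop classes are used.
class GoodsCooccurrenceMatrixSim {

    // Sample input: the ten step 2 pairs whose key is 20001.
    static final List<String> PAIRS_20001 = List.of(
            "20001 20001", "20001 20001", "20001 20002", "20001 20005", "20001 20006",
            "20001 20007", "20001 20001", "20001 20006", "20001 20005", "20001 20002");

    // Map + shuffle: "g1 g2" lines grouped by g1.
    static Map<String, List<String>> mapAndShuffle(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            grouped.computeIfAbsent(f[0], k -> new ArrayList<>()).add(f[1]);
        }
        return grouped;
    }

    // Reduce for one key g1: count each co-occurring product, formatted as "g1 g2:n,...".
    static String reduce(String g1, List<String> coGoods) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String g2 : coGoods) {
            counts.merge(g2, 1, Integer::sum);
        }
        StringJoiner row = new StringJoiner(",", g1 + " ", "");
        counts.forEach((g2, n) -> row.add(g2 + ":" + n));
        return row.toString();
    }

    public static void main(String[] args) {
        mapAndShuffle(PAIRS_20001).forEach((g1, coGoods) -> System.out.println(reduce(g1, coGoods)));
    }
}
```

Sorting the counts with a TreeMap gives the row in the same column order as the listing: 20001 20001:3,20002:2,20005:2,20006:2,20007:1.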
d. Calculate the user’s purchase vector
File: UserBuyGoodsVector.java

Class name: UserBuyGoodsVector
UserBuyGoodsVectorMapper
UserBuyGoodsVectorReducer
Data source: the result of step 1, or the original data.
Calculation result:
20001 10001:1,10004:1,10005:1
20002 10001:1,10003:1,10004:1
20003 10002:1
20004 10002:1,10006:1
20005 10001:1,10004:1
20006 10001:1,10002:1,10004:1
20007 10001:1,10003:1,10006:1
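Step d inverts the step 1 lists from user-to-products into product-to-users. A plain-Java sketch, assuming a purchase weight of 1 for every bought product (as in the listing) and using hypothetical names; no Hadoop classes are involved:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of step d (UserBuyGoodsVector); no Hadoop classes are used.
class UserBuyGoodsVectorSim {

    // Sample input: the step 1 result ("user p1,p2,...").
    static final List<String> STEP1 = List.of(
            "10001 20001,20005,20006,20007,20002",
            "10002 20006,20003,20004",
            "10003 20002,20007",
            "10004 20001,20002,20005,20006",
            "10005 20001",
            "10006 20004,20007");

    // Map: for each product a user bought, emit (product, "user:1").
    // Shuffle: the TreeMap groups the values by product and sorts the product IDs.
    static Map<String, List<String>> mapAndShuffle(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            for (String product : f[1].split(",")) {
                grouped.computeIfAbsent(product, k -> new ArrayList<>()).add(f[0] + ":1");
            }
        }
        return grouped;
    }

    // Reduce: join one product's user entries into "product u1:1,u2:1,...".
    static String reduce(String product, List<String> userEntries) {
        return product + " " + String.join(",", userEntries);
    }

    public static void main(String[] args) {
        mapAndShuffle(STEP1).forEach((product, users) -> System.out.println(reduce(product, users)));
    }
}
```

Because the sample lines are read in user-ID order, each product's vector lists users in the same order as the listing, for example 20001 10001:1,10004:1,10005:1.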
e. Multiply the product co-occurrence matrix by the user purchase vectors to produce intermediate (partial) recommendation scores.
File: MultiplyGoodsMatrixAndUserVector.java

Class name: MultiplyGoodsMatrixAndUserVectorFirstMapper
MultiplyGoodsMatrixAndUserVectorSecondMapper
MultiplyGoodsMatrixAndUserVectorReducer
Approach: the input comes from two files, the result of step 3 (the product co-occurrence matrix) and the result of step 4 (the user purchase vectors). One MR program therefore uses two custom Mappers, one per input file, and one custom Reducer that combines the intermediate output of both Mappers.
1. Both Mappers must emit the same key, the product ID, so that a matrix row and the purchase vector of the same product arrive in the same reduce call.
2. Both Mappers must emit the same key and value data types.
3. In the job configuration, register each input path with its own Mapper using MultipleInputs.addInputPath(job, inputPath, InputFormatClass.class, MapperClass.class).
Data source: the results of steps 3 and 4.
Calculation result:
10001,20001 2
10001,20001 2
10001,20001 3
10001,20001 1
10001,20001 2
10001,20002 3
10001,20002 2
10001,20002 2
10001,20002 2
10001,20002 2
10001,20003 1
10001,20004 1
10001,20004 1
10001,20005 2
10001,20005 2
10001,20005 2
10001,20005 1
10001,20005 2
10001,20006 2
10001,20006 3
10001,20006 2
10001,20006 1
10001,20006 2
10001,20007 2
10001,20007 1
10001,20007 1
10001,20007 3
10001,20007 1
10002,20001 2
10002,20002 2
10002,20003 1
10002,20003 1
10002,20003 1
10002,20004 1
10002,20004 2
10002,20004 1
10002,20005 2
10002,20006 3
10002,20006 1
10002,20006 1
10002,20007 1
10002,20007 1
10003,20001 2
10003,20001 1
10003,20002 3
10003,20002 2
10003,20004 1
10003,20005 2
10003,20005 1
10003,20006 2
10003,20006 1
10003,20007 2
10003,20007 3
10004,20001 2
10004,20001 2
10004,20001 3
10004,20001 2
10004,20002 3
10004,20002 2
10004,20002 2
10004,20002 2
10004,20003 1
10004,20004 1
10004,20005 2
10004,20005 2
10004,20005 2
10004,20005 2
10004,20006 2
10004,20006 3
10004,20006 2
10004,20006 2
10004,20007 2
10004,20007 1
10004,20007 1
10004,20007 1
10005,20001 3
10005,20002 2
10005,20005 2
10005,20006 2
10005,20007 1
10006,20001 1
10006,20002 2
10006,20003 1
10006,20004 2
10006,20004 1
10006,20005 1
10006,20006 1
10006,20006 1
10006,20007 1
10006,20007 3
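The two-Mapper join described above can be sketched in plain Java for a single product key. Both simulated mappers emit the product ID as the key and a tagged string as the value (M: for a matrix row, V: for a vector row), which is one common way to satisfy the same-key, same-types rule; the tags, the class name and the method names are assumptions, and no Hadoop API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory simulation of step e (reduce-side join + multiply); no Hadoop classes are used.
class MultiplyGoodsMatrixAndUserVectorSim {

    // Sample input: the step 3 matrix row and the step 4 vector row for product 20001.
    static final String MATRIX_20001 = "20001 20001:3,20002:2,20005:2,20006:2,20007:1";
    static final String VECTOR_20001 = "20001 10001:1,10004:1,10005:1";

    // FirstMapper: "g1 g2:c,..." -> (g1, "M:g2:c,..."); the tag records which input a value came from.
    static String[] firstMap(String matrixLine) {
        String[] f = matrixLine.trim().split("\\s+");
        return new String[] {f[0], "M:" + f[1]};
    }

    // SecondMapper: "g u:p,..." -> (g, "V:u:p,...")
    static String[] secondMap(String vectorLine) {
        String[] f = vectorLine.trim().split("\\s+");
        return new String[] {f[0], "V:" + f[1]};
    }

    // Reducer for one product key: buffer both inputs, then multiply every
    // matrix entry (g2, c) by every vector entry (u, p) and emit "u,g2 c*p".
    static List<String> reduce(List<String> taggedValues) {
        List<String> matrix = new ArrayList<>();
        List<String> vector = new ArrayList<>();
        for (String v : taggedValues) {
            List<String> target = v.startsWith("M:") ? matrix : vector;
            target.addAll(Arrays.asList(v.substring(2).split(",")));
        }
        List<String> out = new ArrayList<>();
        for (String userEntry : vector) {
            String[] up = userEntry.split(":");
            for (String goodsEntry : matrix) {
                String[] gc = goodsEntry.split(":");
                int score = Integer.parseInt(gc[1]) * Integer.parseInt(up[1]);
                out.add(up[0] + "," + gc[0] + " " + score);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] m = firstMap(MATRIX_20001);
        String[] v = secondMap(VECTOR_20001);
        // Both keys are "20001", so the shuffle would deliver both values to one reduce call.
        reduce(List.of(m[1], v[1])).forEach(System.out::println);
    }
}
```

The reducer buffers both inputs before multiplying, so it does not depend on whether the matrix row or the vector row arrives first. For product 20001 it emits 15 records; the five for user 10005 (10005,20001 3 through 10005,20007 1) are exactly that user's lines in the listing above, since 10005 bought only 20001.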
f. Sum the partial recommendation scores calculated in step 5.
File: MakeSumForMultiplication.java

Class name: MakeSumForMultiplication
MakeSumForMultiplicationMapper
MakeSumForMultiplicationReducer
Data source: the result of step 5
Calculation result:
10001,20001 10
10001,20002 11
10001,20003 1
10001,20004 2
10001,20005 9
10001,20006 10
10001,20007 8
10002,20001 2
10002,20002 2
10002,20003 3
10002,20004 4
10002,20005 2
10002,20006 5
10002,20007 2
10003,20001 3
10003,20002 5
10003,20004 1
10003,20005 3
10003,20006 3
10003,20007 5
10004,20001 9
10004,20002 9
10004,20003 1
10004,20004 1
10004,20005 8
10004,20006 9
10004,20007 5
10005,20001 3
10005,20002 2
10005,20005 2
10005,20006 2
10005,20007 1
10006,20001 1
10006,20002 2
10006,20003 1
10006,20004 3
10006,20005 1
10006,20006 2
10006,20007 4
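Step f is a plain sum per user,product key. A plain-Java sketch (hypothetical names, no Hadoop API), using the step 5 records for user 10001 and products 20002 to 20004 as sample input:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of step f (MakeSumForMultiplication); no Hadoop classes are used.
class MakeSumForMultiplicationSim {

    // Sample input: step 5 records for user 10001, products 20002 to 20004.
    static final List<String> STEP5 = List.of(
            "10001,20002 3", "10001,20002 2", "10001,20002 2", "10001,20002 2", "10001,20002 2",
            "10001,20003 1",
            "10001,20004 1", "10001,20004 1");

    // Map: split "user,product score" into key and value.
    // Shuffle + reduce: the TreeMap groups by key, and merge() adds up the partial scores.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            sums.merge(f[0], Integer.parseInt(f[1]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        run(STEP5).forEach((key, sum) -> System.out.println(key + " " + sum));
    }
}
```

It prints 10001,20002 11, 10001,20003 1 and 10001,20004 2, matching the listing. Because addition is associative and commutative, the same summing logic could also be registered as a combiner to shrink the shuffle, assuming the Mapper and Reducer both emit Text keys and IntWritable values.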
g. Remove duplicates: drop the products each user has already purchased from the recommendation result.
File: DuplicateDataForResult.java

Class name: DuplicateDataForResultFirstMapper
DuplicateDataForResultSecondMapper
DuplicateDataForResultReducer
Data source:
1. FirstMapper processes the users' purchase lists (the result of step 1).
2. SecondMapper processes the summed recommendation scores (the result of step 6).
Calculation result:
10001 20004 2
10001 20003 1
10002 20002 2
10002 20007 2
10002 20001 2
10002 20005 2
10003 20006 3
10003 20005 3
10003 20001 3
10003 20004 1
10004 20007 5
10004 20004 1
10004 20003 1
10005 20006 2
10005 20002 2
10005 20005 2
10005 20007 1
10006 20006 2
10006 20002 2
10006 20005 1
10006 20003 1
10006 20001 1
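A plain-Java sketch of the step g join, keyed by user ID (an assumption; keying by user,product would also work). The first simulated mapper tags purchase lists with B:, the second tags scored products with R:, and the reducer drops every R: entry whose product appears in a B: entry. Names and tags are hypothetical, and no Hadoop API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory simulation of step g (DuplicateDataForResult); no Hadoop classes are used.
class DuplicateDataForResultSim {

    // Sample input for user 10001: the step 1 purchase list and the step 6 sums.
    static final String BUY_10001 = "10001 20001,20005,20006,20007,20002";
    static final List<String> SUMS_10001 = List.of(
            "10001,20001 10", "10001,20002 11", "10001,20003 1", "10001,20004 2",
            "10001,20005 9", "10001,20006 10", "10001,20007 8");

    // FirstMapper: "user p1,p2,..." -> (user, "B:p1,p2,..."); B marks products already bought.
    static String[] firstMap(String buyListLine) {
        String[] f = buyListLine.trim().split("\\s+");
        return new String[] {f[0], "B:" + f[1]};
    }

    // SecondMapper: "user,product score" -> (user, "R:product score"); R marks a recommendation.
    static String[] secondMap(String sumLine) {
        String[] f = sumLine.trim().split("\\s+");
        String[] userAndProduct = f[0].split(",");
        return new String[] {userAndProduct[0], "R:" + userAndProduct[1] + " " + f[1]};
    }

    // Reducer for one user: keep only recommendations for products not yet bought.
    static List<String> reduce(String user, List<String> taggedValues) {
        Set<String> bought = new HashSet<>();
        List<String> candidates = new ArrayList<>();
        for (String v : taggedValues) {
            if (v.startsWith("B:")) {
                bought.addAll(Arrays.asList(v.substring(2).split(",")));
            } else {
                candidates.add(v.substring(2));
            }
        }
        List<String> out = new ArrayList<>();
        for (String c : candidates) {
            String product = c.split(" ")[0];
            if (!bought.contains(product)) {
                out.add(user + " " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.add(firstMap(BUY_10001)[1]);
        for (String line : SUMS_10001) {
            values.add(secondMap(line)[1]);
        }
        reduce("10001", values).forEach(System.out::println);
    }
}
```

For user 10001 only 10001 20003 1 and 10001 20004 2 survive, the same two records as the listing above, though possibly in a different order.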
h. Save the recommendation results to the MySQL database
Note:
1. Ensure that the table exists in advance.
create table grms.results (
  uid varchar(20),
  gid varchar(20),
  exp int
);
2. When an MR program saves data from the HDFS cluster to the MySQL database, only the final output key is written to the database; the output value is ignored.
3. Define a custom data type for the final output key. The custom class implements WritableComparable and, because it is the key that carries data from the HDFS cluster into MySQL, also implements the DBWritable interface, which adds two methods:
readFields(ResultSet rs)
write(PreparedStatement ps)

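import java.io.*;
import java.sql.*;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Final output key: one row of grms.results (uid, gid, exp).
public class A implements WritableComparable<A>, DBWritable {
    private String uid;
    private String gid;
    private int exp;
    public A() {} // Hadoop needs a no-arg constructor to deserialize keys
    public A(String uid, String gid, int exp) { this.uid = uid; this.gid = gid; this.exp = exp; }
    // WritableComparable: Hadoop serialization and key ordering
    public void write(DataOutput out) throws IOException { out.writeUTF(uid); out.writeUTF(gid); out.writeInt(exp); }
    public void readFields(DataInput in) throws IOException { uid = in.readUTF(); gid = in.readUTF(); exp = in.readInt(); }
    public int compareTo(A o) { int c = uid.compareTo(o.uid); return c != 0 ? c : gid.compareTo(o.gid); }
    // DBWritable: JDBC indexes start at 1 and follow the column order uid, gid, exp
    public void readFields(ResultSet rs) throws SQLException { uid = rs.getString(1); gid = rs.getString(2); exp = rs.getInt(3); }
    public void write(PreparedStatement ps) throws SQLException { ps.setString(1, uid); ps.setString(2, gid); ps.setInt(3, exp); }
}

4. In the job configuration, call DBConfiguration.configureDB() to set the parameters for connecting to the database:
Parameter 1: the Configuration of the current job, obtained from the Job object (job.getConfiguration());
Parameter 2: "com.mysql.jdbc.Driver";
Parameter 3: "jdbc:mysql://ip:port/grms";
Parameters 4 and 5: the username and the password.
5. Use DBOutputFormat to control the output format, configured with DBOutputFormat.setOutput(), which takes three parameters:
Parameter 1: the Job object;
Parameter 2: the database table name;
Parameter 3: a variable-length list of the names of the columns to insert into.
DBOutputFormat then inserts each output key with a statement of the form: insert into <table name> (<columns>) values (?,?,?);
File: SaveRecommendResultToDB.java
Class name: SaveRecommendResultToDBMapper
SaveRecommendResultToDBReducer
Data source: the result of step 7.
Data destination: MySQL table grms.results.
i. Build a job flow object (JobControl) so that the program submits all the jobs by itself.
File: GoodsRecommendationManagementSystemJobController.java
Class name: GoodsRecommendationManagementSystemJobController
1. Create a Job object for each of step1 to step8, and configure each job.
Job job1=Job.getIns();
2. Create 8 ControlledJob objects and wrap each Job object from the previous step in one, so that it can be controlled.
ControlledJob cj1=new CJ(); cj1.setJob(job1); cj2.setJob(job2);
3. Add the dependencies between the controlled jobs.
cj2.addDepe…(cj1);
4. Construct a JobControl object and add the 8 controlled jobs one by one.
JobControl jc=new JobControl("");
5. Construct a thread object for the JobControl, then start the thread to execute the jobs.
Thread t=new Thread(jc); t.start();
6. The JobControl thread keeps running after the last job completes, so poll jc.allFinished() until it returns true, then call jc.stop() so that the program can exit.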
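JobControl starts each ControlledJob once every job it depends on has succeeded. The plain-Java sketch below models only that ordering rule as a simplified, single-threaded loop; it is not the Hadoop API. The dependency table is an assumption read off each step's "Data source" line (step d is assumed to read the step 1 result rather than the original data), and JobFlowSim is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, single-threaded model of JobControl's scheduling rule; not the Hadoop API.
class JobFlowSim {

    // Each step and the steps whose output it reads (assumed from the "Data source" lines).
    static final Map<String, List<String>> DEPENDS_ON = new LinkedHashMap<>();
    static {
        DEPENDS_ON.put("step1", List.of());
        DEPENDS_ON.put("step2", List.of("step1"));
        DEPENDS_ON.put("step3", List.of("step2"));
        DEPENDS_ON.put("step4", List.of("step1"));
        DEPENDS_ON.put("step5", List.of("step3", "step4"));
        DEPENDS_ON.put("step6", List.of("step5"));
        DEPENDS_ON.put("step7", List.of("step1", "step6"));
        DEPENDS_ON.put("step8", List.of("step7"));
    }

    // Each round "runs" every job whose dependencies have all finished,
    // the way JobControl starts a ControlledJob once its depending jobs succeed.
    static List<List<String>> schedule(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        List<List<String>> rounds = new ArrayList<>();
        while (done.size() < deps.size()) {
            List<String> ready = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    ready.add(e.getKey());
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException("cyclic or missing dependency: " + deps.keySet());
            }
            done.addAll(ready);
            rounds.add(ready);
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<List<String>> rounds = schedule(DEPENDS_ON);
        for (int i = 0; i < rounds.size(); i++) {
            System.out.println("round " + (i + 1) + ": " + rounds.get(i));
        }
    }
}
```

The result groups the steps into seven rounds: step2 and step4 both depend only on step1, so a JobControl configured with these dependencies is free to run them at the same time, while every other step waits for its predecessor.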
