Recommendation algorithms – content-based recommendation

Compute the relevance between items from their metadata, and then recommend items similar to those in the user's past preference records.

1. Feature extraction: extracting information that is useful for predicting results

Feature extraction for items: tagging

  • User-defined tags (UGC)
  • Latent factor model (LFM)
  • Expert tags (PGC)

Feature extraction for text information: keywords

  • Word segmentation, semantic processing and sentiment analysis (NLP)
  • Latent Semantic Analysis (LSA)

2. Feature engineering: the process of applying domain knowledge and skills to process data so that features work better in machine learning algorithms

Feature engineering steps:

1. Feature cleaning

2. Feature processing: features are grouped by data type, and each type has its own processing methods

    a. Numerical type:

      Normalization: rescale values to a common range, such as [0, 1]

      Discretization: map continuous values to a finite set of bins

      Two ways of discretization: equal width [simple] and equal frequency [more accurate, but the data distribution needs to be recalculated each time]

    b. Categorical type: the values have no inherent order, so each category should be treated equally and encoded separately

      One-hot encoding / dummy variables: each category is expanded into its own parallel column [the feature space will expand].

    c. Time type: can be treated as either a discrete value or a continuous value

    d. Statistical type: deviation from the mean, quantiles, rank order, proportions

3. Feature selection
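
The numerical and categorical processing steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the function names and the sample `ages` data are made up here, min-max scaling is assumed as the normalization method (the source does not show its formula), and equal-frequency binning is done by rank.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal width: split [min, max] into n_bins intervals of the same size."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal frequency: rank the values so each bin gets about the same count.

    Ties are split by position, so equal values may land in adjacent bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def one_hot(categories: Sequence[str]) -> List[List[int]]:
    """Expand a categorical column into one 0/1 column per distinct value."""
    vocab = sorted(set(categories))
    index = {c: j for j, c in enumerate(vocab)}
    return [[1 if index[c] == j else 0 for j in range(len(vocab))]
            for c in categories]


ages = [18, 20, 22, 25, 30, 90]  # hypothetical, deliberately skewed sample
print(min_max_normalize(ages))
print(equal_width_bins(ages, 3))      # the outlier leaves most values in bin 0
print(equal_frequency_bins(ages, 3))  # two values per bin
print(one_hot(["red", "green", "red", "blue"]))
```

With skewed data like this, equal-width binning puts almost every value in the first bin, while equal-frequency binning keeps the bins balanced at the cost of recomputing the ranks whenever the data changes.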

3. Recommendation based on UGC

1. User-generated tags (UGC):

Users use tags to describe their views on items, so user-generated tags (UGC) are the link between users and items and an important data source reflecting user interests.

2. Triple (user u, item i, tag b): user u gave item i the tag b.

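3. In its simplest count-based form, user u's interest in item i is:

$$p(u, i) = \sum_{b} n_{u,b} \, n_{b,i}$$

where $n_{u,b}$ is the number of times user u has used tag b, and $n_{b,i}$ is the number of times item i has been given tag b.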

4. Problem with UGC:

This formula tends to give popular tags and popular items larger weights, which reduces the personalization and novelty of the recommendations.
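
To make the tag-based score concrete, here is a minimal sketch that builds the two count tables from (user, item, tag) triples and sums over tags. It assumes the simple count-based score, i.e. for each tag b the number of times user u used b multiplied by the number of times item i received b; the function names and the sample triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (user, item, tag)


def build_counts(triples: Iterable[Triple]):
    """Count how often each user uses each tag and each tag lands on each item."""
    user_tag = defaultdict(lambda: defaultdict(int))  # user_tag[u][b]
    tag_item = defaultdict(lambda: defaultdict(int))  # tag_item[b][i]
    for user, item, tag in triples:
        user_tag[user][tag] += 1
        tag_item[tag][item] += 1
    return user_tag, tag_item


def interest_scores(user: str, user_tag, tag_item) -> Dict[str, int]:
    """Score every item reachable from the user's tags (assumed formula)."""
    scores: Dict[str, int] = defaultdict(int)
    for tag, n_ub in user_tag.get(user, {}).items():
        for item, n_bi in tag_item[tag].items():
            scores[item] += n_ub * n_bi
    return dict(scores)


# Hypothetical sample data.
triples = [
    ("alice", "movie_a", "sci-fi"),
    ("alice", "movie_b", "sci-fi"),
    ("alice", "movie_c", "classic"),
    ("bob", "movie_a", "sci-fi"),
    ("bob", "movie_d", "sci-fi"),
    ("carol", "movie_d", "classic"),
]
user_tag, tag_item = build_counts(triples)
alice_scores = interest_scores("alice", user_tag, tag_item)
print(sorted(alice_scores.items(), key=lambda kv: -kv[1]))
```

The scores include items the user has already tagged, so a real recommender would filter those out before ranking. The sample also shows the bias described above: `movie_d` is ranked for alice mostly because the frequently used "sci-fi" tag dominates the sum.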

4. Term Frequency-Inverse Document Frequency (TF-IDF)

TF-IDF is a weighting technique commonly used in information retrieval and text mining.

It is used to evaluate how important a word is to a document within a document set or corpus.

The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to how frequently it appears across the corpus.
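
One common way to write this down, assuming the plain logarithmic variant (other variants add smoothing terms), is given below. The symbols are introduced here for illustration: $n_{w,d}$ is the count of word $w$ in document $d$, $N$ is the number of documents in the corpus, and $\mathrm{df}(w)$ is the number of documents that contain $w$.

```latex
\mathrm{tf}(w, d) = \frac{n_{w,d}}{\sum_{w'} n_{w',d}}, \qquad
\mathrm{idf}(w) = \log \frac{N}{\mathrm{df}(w)}, \qquad
\text{tf-idf}(w, d) = \mathrm{tf}(w, d) \cdot \mathrm{idf}(w)
```

Under this variant, a word that appears in every document has $\mathrm{idf}(w) = \log 1 = 0$, so it carries no weight no matter how often it occurs.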

The main idea of TF-IDF is:

If a word appears with high frequency in one article (high TF) and rarely appears in other articles, it is considered to have good class-discriminating ability and is suitable for classification.
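
The idea can be tried out with a short sketch. It assumes documents are already tokenized into word lists and uses the unsmoothed variant, term frequency as count divided by document length and inverse document frequency as log(N / df); library implementations often use smoothed variants, so their numbers will differ. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {word: weight} dict per document (unsmoothed TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs at least once.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical, pre-tokenized sample documents.
docs = [
    ["the", "matrix", "is", "a", "sci-fi", "movie"],
    ["the", "godfather", "is", "a", "crime", "movie"],
    ["the", "matrix", "sequel"],
]
weights = tf_idf(docs)
for word, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{word:8s} {weight:.4f}")
```

In the first document, "sci-fi" gets the highest weight because it occurs in no other document, "matrix" gets less because it is shared with one other document, and "the" gets zero because it appears everywhere.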
