Overview of Industrial CTR Prediction

Hand-crafted Feature Engineering

1. Feature Interaction Learning

Three levels of interaction modeling: no interaction; pair-wise interaction (inner product, outer product, convolutional, attention, etc.); and high-order interaction (explicit or implicit).

No interaction: LR, GBDT+LR
Pair-wise interaction (inner product): FM
High-order interaction (explicit): YouTube-DNN, Wide&Deep, PNN, DeepFM

1.1 FM

FM explicitly models second-order cross features by parameterizing the weight of each cross feature as the inner product of the embedding vectors of its raw features.
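The quadratic term can be computed in O(nk) instead of O(n²k) using the standard reformulation \( \sum_{i<j}\langle \mathbf{v}_i,\mathbf{v}_j\rangle x_i x_j = \frac{1}{2}\sum_f\left[(\sum_i v_{if}x_i)^2 - \sum_i v_{if}^2 x_i^2\right] \). A minimal NumPy sketch (variable names are illustrative):

```python
import numpy as np

def fm_second_order(x, V):
    """FM second-order term sum_{i<j} <v_i, v_j> x_i x_j, computed in
    O(nk) via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2].
    x: (n,) feature vector, V: (n, k) embedding matrix."""
    xv = x @ V                   # (k,): sum_i x_i v_i
    x2v2 = (x ** 2) @ (V ** 2)   # (k,): sum_i x_i^2 v_i^2
    return 0.5 * float(np.sum(xv ** 2 - x2v2))
```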

1.2 Wide&Deep

The wide part still requires manual feature engineering (e.g., cross-product transformations of categorical features), while the deep part learns dense embeddings for sparse features automatically.
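Not the official implementation; a minimal NumPy sketch with made-up sizes, showing in comments how the categorical (wide) and embedded-plus-dense (deep) inputs are combined into one logit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 categorical fields with vocabulary 10, 3 dense features.
VOCAB, N_CAT, N_DENSE, EMB_DIM = 10, 2, 3, 4
emb = rng.normal(size=(N_CAT, VOCAB, EMB_DIM))   # deep-side embedding tables
w_wide = rng.normal(size=(N_CAT * VOCAB,))       # wide-side linear weights
W_deep = rng.normal(size=(N_CAT * EMB_DIM + N_DENSE,))
b = 0.1

def wide_deep_logit(cat_ids, dense):
    # Wide part: linear model over sparse one-hot categorical features
    # (real systems also feed hand-crafted cross-product features here).
    wide = sum(w_wide[f * VOCAB + c] for f, c in enumerate(cat_ids))
    # Deep part: look up embeddings, concatenate with dense features,
    # then an MLP (a single linear layer stands in for it in this sketch).
    deep_in = np.concatenate([emb[f, c] for f, c in enumerate(cat_ids)] + [dense])
    deep = deep_in @ W_deep
    return wide + deep + b  # joint logit -> sigmoid gives the CTR estimate

p = 1.0 / (1.0 + np.exp(-wide_deep_logit([3, 7], np.array([0.5, -1.0, 2.0]))))
```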

 

1.3 DeepFM

DeepFM replaces the wide part of Wide&Deep with an FM component and shares the feature embeddings between the FM and deep components, removing the need for manual feature engineering.
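A minimal sketch of that embedding sharing, with hypothetical sizes: the same table feeds both the FM second-order term and the deep component.

```python
import numpy as np

rng = np.random.default_rng(2)
VOCAB, FIELDS, K = 10, 3, 4
emb = rng.normal(size=(FIELDS, VOCAB, K))  # single embedding table, shared
w_lin = rng.normal(size=(FIELDS, VOCAB))   # first-order (linear) weights
W_deep = rng.normal(size=(FIELDS * K,))    # stand-in for the deep MLP

def deepfm_logit(cat_ids):
    # Look up one embedding per field; both components reuse this E.
    E = np.stack([emb[f, c] for f, c in enumerate(cat_ids)])  # (FIELDS, K)
    first = sum(w_lin[f, c] for f, c in enumerate(cat_ids))
    # FM second-order term over the shared embeddings
    s = E.sum(axis=0)
    second = 0.5 * float((s ** 2 - (E ** 2).sum(axis=0)).sum())
    # Deep component consumes the same embeddings (one linear layer as MLP stand-in)
    deep = E.reshape(-1) @ W_deep
    return first + second + deep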

1.4 Deep & Cross Network (DCN)

DCN explicitly captures feature interactions: it learns predictive cross features of bounded degree and requires no manual feature engineering or exhaustive search.
(1) Embedding and Stacking Layer: stack the embedding vectors, along with the normalized dense features to form the input.

    \[ \mathbf{x}_{0}=\left[\mathbf{x}_{\mathrm{embed}, 1}^{T}, \ldots, \mathbf{x}_{\text {embed }, k}^{T}, \mathbf{x}_{\text {dense }}^{T}\right] \]

(2) Cross Network: apply explicit feature crossing in an efficient way.

    \[ \mathbf{x}_{l+1}=\mathbf{x}_{0} \mathbf{x}_{l}^{T} \mathbf{w}_{l}+\mathbf{b}_{l}+\mathbf{x}_{l}=f\left(\mathbf{x}_{l}, \mathbf{w}_{l}, \mathbf{b}_{l}\right)+\mathbf{x}_{l} \]
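The cross layer above is cheap because \( \mathbf{x}_{0}\mathbf{x}_{l}^{T}\mathbf{w}_{l} \) collapses to \( \mathbf{x}_{0} \) scaled by the scalar \( \mathbf{x}_{l}\cdot\mathbf{w}_{l} \), so no d×d outer product is materialized. A NumPy sketch with illustrative shapes:

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x_{l+1} = x0 (x_l^T w) + b + x_l.
    Costs O(d) per layer since x0 x_l^T w = x0 * (x_l . w)."""
    return x0 * (xl @ w) + b + xl

d = 5
rng = np.random.default_rng(1)
x0 = rng.normal(size=d)
w, b = rng.normal(size=d), rng.normal(size=d)
x1 = cross_layer(x0, x0, w, b)  # the first layer takes x_l = x_0
x2 = cross_layer(x0, x1, w, b)  # weights are per-layer in the paper; shared here for brevity
```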

(3) Deep Network:

    \[ \mathbf{h}_{l+1}=f\left(W_{l} \mathbf{h}_{l}+\mathbf{b}_{l}\right) \]

(4) Combination Layer:

    \[ p=\sigma\left(\left[\mathbf{x}_{L_{1}}^{T}, \mathbf{h}_{L_{2}}^{T}\right] \mathbf{w}_{\text {logits }}\right) \]

Loss function:

    \[ \operatorname{loss}=-\frac{1}{N} \sum_{i=1}^{N}\left[y_{i} \log \left(p_{i}\right)+\left(1-y_{i}\right) \log \left(1-p_{i}\right)\right]+\lambda \sum_{l}\left\|\mathbf{w}_{l}\right\|^{2} \]

DCN-V2 generalizes the cross layer by replacing the weight vector \( \mathbf{w}_{l} \) with a full matrix: \( \mathbf{x}_{l+1}=\mathbf{x}_{0} \odot\left(W_{l} \mathbf{x}_{l}+\mathbf{b}_{l}\right)+\mathbf{x}_{l} \), which makes the crossing more expressive.

1.5 eXtreme Deep Factorization Machine (xDeepFM)

xDeepFM models both low-order and high-order feature interactions in an explicit way, via its Compressed Interaction Network (CIN).

The FM-style models above assign the same weight to every feature interaction, ignoring their relative importance; Attentional Factorization Machines (AFM) use an attention network to learn a weight for each feature interaction.

1.6 Attentional Factorization Machines (AFM)

Reference.
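As noted above, AFM pools the pair-wise interactions with learned attention weights. A minimal NumPy sketch (the parameter names `W_att` and `h` for the attention network are illustrative):

```python
import numpy as np

def afm_interaction(x, V, W_att, h):
    """AFM pair-wise term: softmax-attention-weighted sum of Hadamard
    products of embedding pairs. x: (n,), V: (n, k) embeddings,
    W_att: (k, t) and h: (t,) parameterize the attention network."""
    n, _ = V.shape
    pairs, scores = [], []
    for i in range(n):
        for j in range(i + 1, n):
            e = (V[i] * V[j]) * (x[i] * x[j])                # interacted vector (k,)
            pairs.append(e)
            scores.append(np.maximum(W_att.T @ e, 0.0) @ h)  # ReLU-MLP attention score
    scores = np.array(scores)
    a = np.exp(scores - scores.max())
    a /= a.sum()                                     # softmax over the pairs
    return sum(ai * ei for ai, ei in zip(a, pairs))  # pooled (k,) interaction vector
```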

1.7 Neural Factorization Machines (NFM)

Reference.

1.8 Feature Importance and Bilinear feature Interaction NETwork (FiBiNet)

SENET is used to boost feature discriminability.

SENET layer: squeeze, excitation, and re-weight steps.
Bilinear-Interaction layer: combines the inner product and Hadamard product to learn feature interactions.

Both the original embeddings and the re-weighted embeddings are fed to the Bilinear-Interaction layer.
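A minimal NumPy sketch of the two layers (the bilinear layer here is the "all" field type, i.e., one shared W; all shapes are illustrative):

```python
import numpy as np

def senet_reweight(E, W1, W2):
    """SENET layer over field embeddings E (f fields, k dims each):
    squeeze (mean-pool each field), excitation (two-layer bottleneck),
    re-weight (scale each field's embedding by its learned importance)."""
    z = E.mean(axis=1)                                  # squeeze: (f,)
    a = np.maximum(W2 @ np.maximum(W1 @ z, 0.0), 0.0)   # excitation: (f,)
    return E * a[:, None]                               # re-weight: (f, k)

def bilinear_interaction(E, W):
    """Bilinear-Interaction layer: p_ij = (e_i W) * e_j, combining an
    inner-product-style projection with a Hadamard product."""
    f = E.shape[0]
    return np.stack([(E[i] @ W) * E[j]
                     for i in range(f) for j in range(i + 1, f)])
```

Per the note above, the layer is applied to both the original embeddings `E` and the SENET-re-weighted embeddings, and the two interaction outputs are concatenated downstream.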

 

1.9 Co-Action Network (CAN)

Reference

1.10 AutoInt+

Reference

1.11 Operation-aware Neural Networks (ONN)

Reference

 

 

2. Behavior Sequence Modeling

 

3. Multi-task Learning

 

4. Multi-modal Learning

5. Cross-domain Learning
