class: center, middle

# Neural Networks for Recommender Systems
### Paris 2017

Olivier Grisel

.affiliations[
![Inria](images/inria-logo.png)
]

---
class: middle, center

# Recommender Systems

---
class: middle, center, singleimg

![Amazon books](images/amazon_book_recos.png)

???
Traditional product recommendation

---
class: middle, center, singleimg

![Amazon books](images/amazon_book_recos_inputs.png)

---
class: middle, center, singleimg

![Amazon books](images/amazon_book_recos_full.png)

???
Goal: directly maximize sales / upsells

---
class: middle, center, singleimg

![Spotify](images/spotify_weekly_recos.png)

???
- reduce UI friction / simplify navigation
- hide holes in a catalog
- maximize user satisfaction and improve user retention

---
class: middle, center, singleimg

![Google](images/google_python_recos.png)

???
Search engines are now recommender systems

---
class: middle, center, singleimg

![Google](images/google_python_recos_inputs.png)

???
The query is now contextualized with the user history.

---
class: middle, center, singleimg

![Google](images/google_python_recos_webpages.png)

???
A Google search is actually the result of two queries to two recsys:

- first, the traditional web pages

---
class: middle, center, singleimg

![Google](images/google_python_recos_full.png)

???
The second recsys is queried for the best personalized ad.

Personalized ads are the business model of 2 of the 10 largest global
companies by market cap.

Other types of recos:

- apps on app stores
- status messages on social media
- ...

---
class: middle, center

# RecSys 101

---
# Content-based

Inputs: user and item metadata

User gender, item publication date, director, actors...

--

# Collaborative Filtering

Inputs: user/item interactions

Stars, plays, likes, clicks

--

# Hybrid systems

???
Content-based: good for new users or new items

CF: better for popular items and very active users

Hybrid: CF + metadata to mitigate the cold-start problem

---
# Explicit Feedback

.singleimgnoborder.middlebelowheader[
]

???
Explicit: positive and negative feedback, continuous prediction

Examples: review stars and votes

Regression or ranking metrics

---
# Implicit Feedback

.singleimgnoborder.middlebelowheader[
]

???
Implicit **positive feedback only**: views, plays, comments...

Implicit feedback can be **negative**:

- Page view with very short dwell time
- Click on "next" button

Ranking metrics or CTR

Implicit feedback is much more **abundant** than explicit feedback

---
# Matrix Factorization for CF

.center[
]

$$ L(U, V) = \sum\_{(i, j) \in D} || r\_{i,j} - \mathbf{u}\_i^T \cdot \mathbf{v}\_j ||_2^2 + \lambda (||U||\_2^2 + ||V||\_2^2) $$

???
R is the observed feedback: ratings

Each row of R represents the history of a user

Many missing entries: each user has seen less than 1% of the catalog

Model R by the matrix product of U and V with d latent dimensions

- Train $U$ and $V$ on observed ratings data $r\_{i, j}$
- Use $U^T V$ to find missing entries in the sparse rating data matrix $R$

---
class: middle, center

# Embeddings

---
class: center, middle

| User  | Item  |
|-------|-------|
| 19483 | 45243 |
| 95727 | 39572 |
| 76244 | 83773 |
| 2584  | 94723 |
| 2584  | 45243 |
| 23957 | 25892 |
| 2584  | 39572 |
| 49138 | 20481 |
| 19483 | 25892 |
| ...   | ...   |

???
The core of recsys training data is a set of pairs of large integers,
often with a timestamp, user and item metadata and maybe a target rating.

The same integers appear several times.

---
# Symbolic variables

### Recommender Systems

Item ids, user ids

--

### Text tokens

Characters, words, bigrams...

--

### Categorical descriptors

Tags, movie genres, director name, visited URLs, skills on a resume,
product categories...

---
class: middle, center

## Symbol $s$ in vocabulary $V$

---
# One-hot representation

$$onehot(\text{'salad'}) = [0, 0, 1, ..., 0] \in \\{0, 1\\}^{|V|}$$
.diagram[
]
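A minimal numpy sketch of this encoding, assuming a toy vocabulary (the `vocabulary` list and `onehot` helper are illustrative, not taken from the companion notebooks):

```python
import numpy as np

# toy vocabulary mapping each symbol to an integer index (illustrative)
vocabulary = ['apple', 'banana', 'salad', 'tomato']

def onehot(symbol, vocabulary):
    """One-hot vector of `symbol` in {0, 1}^|V|."""
    vector = np.zeros(len(vocabulary), dtype=np.int64)
    vector[vocabulary.index(symbol)] = 1
    return vector

print(onehot('salad', vocabulary))  # [0 0 1 0]
```

Any two distinct symbols are at Euclidean distance $\sqrt{2}$, which is why one-hot symbols are equidistant from each other.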
--

- Sparse, discrete, large dimension $|V|$
- Each axis has a meaning
- Symbols are equidistant from each other

---
# Embedding

$$embedding(\text{'salad'}) = [3.28, -0.45, ... 7.11] \in \mathbb{R}^d$$
--

- Continuous and dense
- Axes have no meaning _a priori_
- The embedding metric can capture semantic distance

???
Can represent a huge vocabulary in low dimension, typically:
$d \in \\{16, 32, ..., 4096\\}$

--

.left-column[
### Euclidean distance

$d(x,y) = || x - y ||_2$
]

.right-column[
### Cosine similarity

$cosine(x,y) = \frac{x \cdot y}{||x|| \cdot ||y||}$
]

???
Euclidean distance:

- Simple with good properties
- Dependent on the norm (embeddings are usually unconstrained)

Cosine similarity:

- Angle between points, regardless of norm
- $cosine(x,y) \in (-1,1)$
- Expected cosine similarity of random pairs of vectors is $0$

Alternatively: just the dot product

---
# Embedding as a Linear Layer

Equivalent to one-hot encoding multiplied by a weight matrix
$\mathbf{W} \in \mathbb{R}^{n \times d}$:

$$embedding(s) = onehot(s) \cdot \mathbf{W} $$

- $W$ is initialized randomly
- Part of the model parameters

--

In Keras:

```py
# input: batch of integers
Embedding(output_dim=d, input_dim=n, input_length=1)
# output: batch of float vectors
```

---
class: center, middle

# Architectures

---
# RecSys with Explicit Feedback

.diagram[
]

---
# Deep RecSys Architecture

.diagram[
]

---
# Deep RecSys with metadata

.diagram[
]

---
# Implicit Feedback: Triplet loss

.diagram[
]

---
# Deep Triplet Networks

.diagram[
]

???
Pick a pair of positive feedback (user i, item j).

Sample a "negative" item k uniformly at random.

It is also possible to use the current state of the model to "mine"
hard negatives.

---
.center[
Deep Neural Networks for YouTube Recommendations

https://research.google.com/pubs/pub45530.html
]

???
An alternative way to deal with implicit feedback:

- no user embedding in input: the user is represented by the averaged
  embeddings of movie watches (and queries)
- the output is a classification loss: 1 for the next movie being
  watched by the user
- possible to model users by sequences of past interactions with
  ordering information

---
# Embed all the things!

### Discrete variables with linear embeddings

--

### Images and sounds with ConvNets

--

### Sequences of discrete tokens with RNNs

--

### Molecules with Graph ConvNets

--

...

???
Embed interaction data and content-based metadata naturally in a shared
model.

Mix and match loss functions: predict clicks in search results, the next
song after the current song, related artist clicks...

Use pre-trained models with frozen weights on some parts: ImageNet CNN,
word embeddings... Or fine-tune everything if a very large amount of
user interaction data is available.

Other DL tools: Dropout regularization, Adam optimizer, open source
frameworks

---
class: middle, center

# Thank you for your attention!

## Notebooks with keras code linked as pinned tweet:

## @ogrisel

---
class: middle, center

# Ethical Considerations of Recommender Systems

---
# Ethical Considerations

### Amplification of existing discriminatory and unfair behaviors / bias

--

### Amplification of the filter bubble and opinion polarization

--

### Lack of transparency

???
Unfairness:

- Example: gender bias in ad clicks
- Using the "firstname" as a predictive feature

Polarization:

- People tend to unfollow people they don't agree with
- Ranking / filtering systems can further amplify this issue
- Optimizing for short-term clicks can promote clickbait content

Open questions:

- Wise modeling choices (e.g. use of "firstname" as a feature)
- How to allow users to assess fairness by themselves?
- How to allow for independent audits while respecting the privacy
  of users?
- Learning representations that actively enforce fairness?

---
# Fairness

### Censoring Representations with an Adversary

Harrison Edwards, Amos Storkey, ICLR 2016

https://arxiv.org/abs/1511.05897

---
# Transparency

- http://www.datatransparencylab.org/
- TransAlgo initiative in France