class: center, middle

# Neural Networks for Recommender Systems
### Paris 2017

Olivier Grisel

.affiliations[
![Inria](images/inria-logo.png)
]

---
class: middle, center

# Recommender Systems

---
class: middle, center, singleimg

![Amazon books](images/amazon_book_recos.png)

???
Traditional product recommendation

---
class: middle, center, singleimg

![Amazon books](images/amazon_book_recos_inputs.png)

---
class: middle, center, singleimg

![Amazon books](images/amazon_book_recos_full.png)

???
Goal: directly maximize sales / upsells

---
class: middle, center, singleimg

![Spotify](images/spotify_weekly_recos.png)

???
- reduce UI friction / simplify navigation
- hide holes in a catalog
- maximize user satisfaction and improve user retention

---
class: middle, center, singleimg

![Google](images/google_python_recos.png)

???
Search engines are now recommender systems

---
class: middle, center, singleimg

![Google](images/google_python_recos_inputs.png)

???
The query is now contextualized with the user history.

---
class: middle, center, singleimg

![Google](images/google_python_recos_webpages.png)

???
A Google search is actually the result of two queries to two recsys:

- first, the traditional web pages

---
class: middle, center, singleimg

![Google](images/google_python_recos_full.png)

???
The second recsys is queried for the best personalized ad.

Personalized ads are the business model of 2 of the 10 largest global
companies by market cap.

Other types of recos:

- apps on app stores
- status messages on social media
- ...

---
class: middle, center

# RecSys 101

---
# Content-based

Inputs: user and item metadata

User gender, item publication date, director, actors...

--

# Collaborative Filtering

Inputs: user/item interactions

Stars, plays, likes, clicks

--

# Hybrid systems

???
Content-based: good for new users or new items

CF: better for popular items and very active users

Hybrid: CF + metadata to mitigate the cold-start problem

---
# Explicit Feedback

.singleimgnoborder.middlebelowheader[
]

???
Explicit: positive and negative feedback, continuous prediction

Examples: review stars and votes

Regression or ranking metrics

---
# Implicit Feedback

.singleimgnoborder.middlebelowheader[
]

???
Implicit **positive feedback only**: views, plays, comments...

Implicit feedback can be **negative**:

- Page view with very short dwell time
- Click on "next" button

Ranking metrics or CTR

Implicit feedback is much more **abundant** than explicit feedback

---
# Matrix Factorization for CF

.center[
]

$$ L(U, V) = \sum\_{(i, j) \in D} || r\_{i,j} - \mathbf{u}\_i^T \cdot \mathbf{v}\_j ||_2^2 + \lambda (||U||\_2^2 + ||V||\_2^2) $$

???
R is the observed feedback: ratings

Each row of R represents the history of a user

Many missing entries: each user has seen less than 1% of the catalog

Model R by the matrix product of U and V with d latent dimensions

- Train $U$ and $V$ on observed ratings data $r\_{i, j}$
- Use $U^T V$ to find missing entries in the sparse rating data matrix $R$

---
class: middle, center

# Embeddings

---
class: center, middle

| User  | Item  |
|-------|-------|
| 19483 | 45243 |
| 95727 | 39572 |
| 76244 | 83773 |
| 2584  | 94723 |
| 2584  | 45243 |
| 23957 | 25892 |
| 2584  | 39572 |
| 49138 | 20481 |
| 19483 | 25892 |
| ...   | ...   |

???
The core of recsys training data is a set of pairs of large integers,
often with a timestamp, user and item metadata and maybe a target rating.

The same integers appear several times.

---
# Symbolic variables

### Recommender Systems

Item ids, user ids

--

### Text tokens

Characters, words, bigrams...

--

### Categorical descriptors

Tags, movie genres, director name, visited URLs, skills on a resume,
product categories...

---
class: middle, center

## Symbol $s$ in vocabulary $V$

---
# One-hot representation

$$onehot(\text{'salad'}) = [0, 0, 1, ..., 0] \in \\{0, 1\\}^{|V|}$$
.diagram[
]
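A minimal numpy sketch of this encoding, assuming a toy vocabulary (the `vocabulary` list and `onehot` helper are illustrative, not taken from the companion notebooks):

```python
import numpy as np

# toy vocabulary mapping each symbol to an integer index (illustrative)
vocabulary = ['apple', 'banana', 'salad', 'tomato']

def onehot(symbol, vocabulary):
    """One-hot vector of `symbol` in {0, 1}^|V|."""
    vector = np.zeros(len(vocabulary), dtype=np.int64)
    vector[vocabulary.index(symbol)] = 1
    return vector

print(onehot('salad', vocabulary))  # [0 0 1 0]
```

Any two distinct symbols are at Euclidean distance $\sqrt{2}$, which is why one-hot symbols are equidistant from each other.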
--

- Sparse, discrete, large dimension $|V|$
- Each axis has a meaning
- Symbols are equidistant from each other

---
# Embedding

$$embedding(\text{'salad'}) = [3.28, -0.45, ... 7.11] \in \mathbb{R}^d$$
--

- Continuous and dense
- Axes have no meaning _a priori_
- The embedding metric can capture semantic distance

???
Can represent a huge vocabulary in low dimension, typically:
$d \in \\{16, 32, ..., 4096\\}$

--

.left-column[
### Euclidean distance

$d(x,y) = || x - y ||_2$
]

.right-column[
### Cosine similarity

$cosine(x,y) = \frac{x \cdot y}{||x|| \cdot ||y||}$
]

???
Euclidean distance:

- Simple with good properties
- Dependent on the norm (embeddings are usually unconstrained)

Cosine similarity:

- Angle between points, regardless of norm
- $cosine(x,y) \in (-1,1)$
- Expected cosine similarity of random pairs of vectors is $0$

Alternatively: just the dot product

---
# Embedding as a Linear Layer

Equivalent to one-hot encoding multiplied by a weight matrix
$\mathbf{W} \in \mathbb{R}^{n \times d}$:

$$embedding(s) = onehot(s) \cdot \mathbf{W} $$

- $W$ is initialized randomly
- Part of the model parameters

--

In Keras:

```py
# input: batch of integers
Embedding(output_dim=d, input_dim=n, input_length=1)
# output: batch of float vectors
```

---
class: center, middle

# Architectures

---
# RecSys with Explicit Feedback

.diagram[
]

---
# Deep RecSys Architecture

.diagram[
]

---
# Deep RecSys with metadata

.diagram[
]

---
# Implicit Feedback: Triplet loss

.diagram[
]

---
# Deep Triplet Networks

.diagram[
]

???
Pick a pair of positive feedback (user i, item j).

Sample a "negative" item k uniformly at random.

It is also possible to use the current state of the model to "mine"
hard negatives.

---
.center[
Deep Neural Networks for YouTube Recommendations

https://research.google.com/pubs/pub45530.html
]

???
An alternative way to deal with implicit feedback:

- no user embedding in input: the user is represented by the averaged
  embeddings of movie watches (and queries)
- the output is a classification loss: 1 for the next movie being
  watched by the user
- possible to model users by sequences of past interactions with
  ordering information

---
# Embed all the things!

### Discrete variables with linear embeddings

--

### Images and sounds with ConvNets

--

### Sequences of discrete tokens with RNNs

--

### Molecules with Graph ConvNets

--

...

???
Embed interaction data and content-based metadata naturally in a shared
model.

Mix and match loss functions: predict clicks in search results, the next
song after the current song, related artist clicks...

Use pre-trained models with frozen weights on some parts: ImageNet CNN,
word embeddings... Or fine-tune everything if a very large amount of
user interaction data is available.

Other DL tools: Dropout regularization, Adam optimizer, open source
frameworks

---
class: middle, center

# Thank you for your attention!

## Notebooks with keras code linked as pinned tweet:

## @ogrisel

---
class: middle, center

# Ethical Considerations of Recommender Systems

---
# Ethical Considerations

### Amplification of existing discriminatory and unfair behaviors / bias

--

### Amplification of the filter bubble and opinion polarization

--

### Lack of transparency

???
Unfairness:

- Example: gender bias in ad clicks
- Using the "firstname" as a predictive feature

Polarization:

- People tend to unfollow people they don't agree with
- Ranking / filtering systems can further amplify this issue
- Optimizing for short-term clicks can promote clickbait content

Open questions:

- Wise modeling choices (e.g. use of "firstname" as a feature)
- How to allow users to assess fairness by themselves?
- How to allow for independent audits while respecting the privacy
  of users?
- Learning representations that actively enforce fairness?

---
# Fairness

### Censoring Representations with an Adversary

Harrison Edwards, Amos Storkey, ICLR 2016

https://arxiv.org/abs/1511.05897

---
# Transparency

- http://www.datatransparencylab.org/
- TransAlgo initiative in France