8.4.2.9. sklearn.datasets.make_multilabel_classification
- sklearn.datasets.make_multilabel_classification(n_samples=100, n_features=20, n_classes=5, n_labels=2, length=50, allow_unlabeled=True, random_state=None)
- Generate a random multilabel classification problem.
- For each sample, the generative process is:
- pick the number of labels: n ~ Poisson(n_labels)
- n times, choose a class c: c ~ Multinomial(theta)
- pick the document length: k ~ Poisson(length)
- k times, choose a word: w ~ Multinomial(theta_c)
 
- In the above process, rejection sampling is used to make sure that n is never zero or more than n_classes, and that the document length is never zero. Likewise, classes which have already been chosen are rejected.
- Parameters :
  - n_samples : int, optional (default=100)
    The number of samples.
  - n_features : int, optional (default=20)
    The total number of features.
  - n_classes : int, optional (default=5)
    The number of classes of the classification problem.
  - n_labels : int, optional (default=2)
    The average number of labels per instance. The number of labels follows a Poisson distribution that never takes the value 0.
  - length : int, optional (default=50)
    The sum of the features (the number of words, if the samples are documents).
  - allow_unlabeled : bool, optional (default=True)
    If True, some instances might not belong to any class.
  - random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- Returns :
  - X : array of shape [n_samples, n_features]
    The generated samples.
  - Y : list of tuples
    The label sets.
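
The generative process described above can be sketched for a single sample as follows. This is an illustrative sketch using only the Python standard library, not the scikit-learn implementation. The helper names (`sample_poisson`, `sample_index`, `normalise`, `sketch_sample`) are hypothetical, and two points are assumptions because the description leaves them open: theta and theta_c are built from normalised uniform random weights, and each word is drawn from the distribution of one of the sample's classes, picked uniformly at random. The sketch follows the rejection rules as stated and ignores allow_unlabeled.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one value from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def sample_index(rng, probs):
    """Draw one index i with probability probs[i] (a single multinomial trial)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def normalise(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def sketch_sample(rng, theta, theta_c, n_labels=2, length=50):
    """Generate one (x, labels) pair following the four steps of the process."""
    n_classes = len(theta)
    n_features = len(theta_c[0])

    # Step 1: number of labels; reject 0 and any value above n_classes.
    n = 0
    while n == 0 or n > n_classes:
        n = sample_poisson(rng, n_labels)

    # Step 2: choose n distinct classes, rejecting classes already chosen.
    labels = []
    while len(labels) < n:
        c = sample_index(rng, theta)
        if c not in labels:
            labels.append(c)

    # Step 3: document length; reject 0.
    k = 0
    while k == 0:
        k = sample_poisson(rng, length)

    # Step 4: draw k words and count them per feature.
    # ASSUMPTION: each word uses one of the sample's classes, picked uniformly.
    x = [0] * n_features
    for _ in range(k):
        c = rng.choice(labels)
        w = sample_index(rng, theta_c[c])
        x[w] += 1
    return x, tuple(labels)


rng = random.Random(0)
n_classes, n_features = 5, 20
# ASSUMPTION: class and word distributions are random; the text does not specify them.
theta = normalise([rng.random() for _ in range(n_classes)])
theta_c = [normalise([rng.random() for _ in range(n_features)])
           for _ in range(n_classes)]

x, y = sketch_sample(rng, theta, theta_c)
print("feature counts:", x)
print("label set:", y)
```

Here `x` plays the role of one row of X (its entries sum to the sampled document length k), and `y` plays the role of one tuple in Y.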
