We present a new 'learning-to-learn' approach for small-to-medium-sized training sets. At its core lies a deep architecture (a Set2Model network) that maps sets of examples to simple generative probabilistic models, such as Gaussians or mixtures of Gaussians, in the space of high-dimensional descriptors. The parameters of the embedding into the descriptor space are trained discriminatively in an end-to-end fashion. The main technical novelty of our approach is the derivation of backpropagation through the mixture model fitting. A trained Set2Model network facilitates learning when no negative examples are available, and when the concept being learned is polysemous or represented by a noisy training set. Among other experiments, we demonstrate that these properties allow Set2Model networks to pick up visual concepts from the raw outputs of Internet image search engines better than a set of strong baselines.
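
As a rough illustration of the set-to-model idea only, the sketch below fits a single Gaussian to a small set of descriptors and scores new points by their log-likelihood under it, using positive examples alone. It is not the Set2Model network: the learned embedding and the backpropagation through model fitting are omitted, the descriptors are assumed to be already embedded, the diagonal covariance and the `eps` regularizer are simplifying assumptions, and the names `fit_diag_gaussian` and `log_likelihood` are hypothetical.

```python
import math


def fit_diag_gaussian(descriptors, eps=1e-6):
    """Fit a diagonal-covariance Gaussian to a set of descriptor vectors.

    Assumption: descriptors are already embedded; eps keeps variances positive.
    """
    n = len(descriptors)
    dim = len(descriptors[0])
    mean = [sum(d[j] for d in descriptors) / n for j in range(dim)]
    var = [
        sum((d[j] - mean[j]) ** 2 for d in descriptors) / n + eps
        for j in range(dim)
    ]
    return mean, var


def log_likelihood(x, mean, var):
    """Log-density of x under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


# A toy "concept" described only by positive examples (no negatives needed).
support = [[0.9, 1.1], [1.0, 0.9], [1.1, 1.0]]
mean, var = fit_diag_gaussian(support)

# Points close to the set score higher than points far from it.
near = log_likelihood([1.0, 1.0], mean, var)
far = log_likelihood([3.0, -2.0], mean, var)
```

In this toy setting `near` exceeds `far`, so ranking candidates by log-likelihood retrieves points that resemble the example set. A mixture of Gaussians would replace the single fit in the polysemous case.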