The success of neural networks is typically attributed to their ability to closely mimic the relationships between features and labels observed in the training dataset. This, however, is only part of the answer: in addition to being fit to data, neural networks have been shown to be useful priors on the conditional distribution of labels given features, and can be used as such even in the absence of trustworthy training labels. This property can be harnessed to train high-quality models on low-quality training data in tasks for which large, high-quality ground-truth datasets do not exist. One such task is assertion classification in biomedical texts: discriminating between positive, negative and speculative statements about pathologies a patient may have. We present an assertion classification methodology based on recurrent neural networks, an attention mechanism and two flavours of transfer learning (language modelling and heuristic annotation) that achieves state-of-the-art results on MIMIC-CXR radiology reports.