The Annotated Transformer

This Harvard blog post is really nice. It fills in the implementation details for the “Attention Is All You Need” paper.