Frank Odom
Dec 23, 2021


Sorry for the late response here. Yes, I intentionally left off the masked attention for the sake of simplicity. In fact, *every* decoder layer would need to include masked attention -- not just the first one.
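For anyone following along, here is a minimal sketch of what that looks like in PyTorch. This is not the article's exact code; the class and parameter names (`DecoderLayer`, `causal_mask`, `dim`, `num_heads`) are illustrative, and the point is simply that the causal mask is applied in the self-attention of *every* decoder layer:

```python
import torch
import torch.nn as nn

def causal_mask(seq_len: int) -> torch.Tensor:
    # Boolean upper-triangular mask: True = position may NOT be attended to,
    # so token i can only attend to tokens 0..i.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

class DecoderLayer(nn.Module):
    # Illustrative decoder layer: masked self-attention, cross-attention, feed-forward.
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))

    def forward(self, tgt: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # Masked self-attention -- every decoder layer does this, not just the first.
        mask = causal_mask(tgt.size(1)).to(tgt.device)
        x, _ = self.self_attn(tgt, tgt, tgt, attn_mask=mask)
        tgt = self.norm1(tgt + x)
        # Cross-attention over the encoder output needs no causal mask.
        x, _ = self.cross_attn(tgt, memory, memory)
        tgt = self.norm2(tgt + x)
        return self.norm3(tgt + self.ff(tgt))
```

So the full decoder is just a stack of these layers, each one applying the same causal mask in its self-attention step.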
