The nice thing about circular buffers is that, with a single producer and a single consumer, they can be made lock-free by ensuring each pointer is only ever modified by one function, i.e. get modifies the head and put modifies the tail.
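To make that concrete, here is a minimal sketch in C11 of a ring buffer where get is the only writer of head and put is the only writer of tail. It assumes exactly one producer thread and one consumer thread; `ring_t`, `RING_CAPACITY`, `ring_put` and `ring_get` are names made up for illustration, not anything from the article.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 8u /* hypothetical size; must be a power of two */

typedef struct {
    int slots[RING_CAPACITY];
    atomic_size_t head; /* written only by ring_get (the consumer) */
    atomic_size_t tail; /* written only by ring_put (the producer) */
} ring_t;

static void ring_init(ring_t *r) {
    /* Masking below relies on a power-of-two capacity. */
    assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0u);
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: reads head, writes only tail. */
static bool ring_put(ring_t *r, int value) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY)
        return false; /* full */
    r->slots[tail & (RING_CAPACITY - 1u)] = value;
    /* Release: the slot write becomes visible before the new tail does. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: reads tail, writes only head. */
static bool ring_get(ring_t *r, int *out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false; /* empty */
    *out = r->slots[head & (RING_CAPACITY - 1u)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

The indices here are free-running counters that are masked on access, so neither function ever has to adjust the other side's index.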
The solution in the article modifies both head and tail in the get function (when subtracting the page size to put the buffer back into the first page), which makes synchronization necessary to avoid races.
The author could actually make this implementation lock-free too, by making only the get function perform the subtraction on the head whilst the put function performs the subtraction on the tail.
You would then just need a little bit of extra logic when calculating the current size, but then you'd have a lock-free data structure.
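One way that extra logic might look, as a rough sketch rather than the article's actual code: let each index wrap on its own side, and let the size calculation cope with the tail having wrapped while the head has not. I'm assuming here that both indices are kept in [0, 2 * size) rather than [0, size), since otherwise head == tail can't tell a full buffer from an empty one. `BUF_SIZE`, `indices_t` and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#define BUF_SIZE 4096u /* hypothetical: size in bytes of the mapped buffer */

typedef struct {
    atomic_size_t head; /* modified only by the consumer; in [0, 2 * BUF_SIZE) */
    atomic_size_t tail; /* modified only by the producer; in [0, 2 * BUF_SIZE) */
} indices_t;

/* The extra logic: tail may have wrapped past the end while head has not. */
static size_t used_bytes(size_t head, size_t tail) {
    return tail >= head ? tail - head : tail + 2u * BUF_SIZE - head;
}

/* Byte offset into the first mapping; the mirrored second mapping keeps
   up to BUF_SIZE bytes starting at this offset contiguous. */
static size_t offset_of(size_t index) {
    return index >= BUF_SIZE ? index - BUF_SIZE : index;
}

/* Producer: publish n bytes already written at offset_of(tail).
   Only tail is modified. Returns 0 if there is not enough free space. */
static int commit_put(indices_t *ix, size_t n) {
    size_t tail = atomic_load_explicit(&ix->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&ix->head, memory_order_acquire);
    if (n > BUF_SIZE - used_bytes(head, tail))
        return 0;
    tail += n;
    if (tail >= 2u * BUF_SIZE)
        tail -= 2u * BUF_SIZE; /* the producer does the subtraction on tail */
    atomic_store_explicit(&ix->tail, tail, memory_order_release);
    return 1;
}

/* Consumer: release n bytes already read from offset_of(head).
   Only head is modified. Returns 0 if fewer than n bytes are available. */
static int commit_get(indices_t *ix, size_t n) {
    size_t head = atomic_load_explicit(&ix->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&ix->tail, memory_order_acquire);
    if (n > used_bytes(head, tail))
        return 0;
    head += n;
    if (head >= 2u * BUF_SIZE)
        head -= 2u * BUF_SIZE; /* the consumer does the subtraction on head */
    atomic_store_explicit(&ix->head, head, memory_order_release);
    return 1;
}
```

This only shows the index bookkeeping; the actual copying into and out of the mapped pages is left out, and as before it assumes a single producer and a single consumer.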