I am reading variable-width characters from an arbitrarily long stream of bytes. My decoder is faster in bulk, so I read characters into an N character buffer, then pull characters from the buffer. When I read to the end of one buffer, I re-fill it from the current position of the byte stream. So far, so good, completely standard.
Except, I need to be able to (quickly) rewind the entire stream to an arbitrary position. I could keep a small handful of recent buffers in memory, but the stream is arbitrarily long so I don't want to try to keep all of it in memory.
If I try to read character at position X outside the current buffer, I need to be able to find the stream position P that would have been the start position of the buffer that character was originally found in, reset the stream to P and refill the buffer.
The index of the first character in a buffer is always a multiple of the buffer size, so I can modulo X down to the character index of the start of a buffer, then I need to quickly find the corresponding stream position P and rewind to it.
Given an index X, what data structure would I use to find the "biggest without going over" start position? Hashmaps seem wasteful because all the values are multiples of N so I worry about collisions (unless I make the hash function more complicated, and more expensive). Interval trees (and any tree) seem like they're going to be re-balanced a lot because I keep adding to the end but never to the middle. Skip lists seem like the right "shape" of the solution, but I don't like the probabilistic nature of them. Any suggestions?