ago
0 like 0 dislike
0 like 0 dislike
I am reading variable-width characters from an arbitrarily long stream of bytes. My decoder is faster in bulk, so I read characters into an N character buffer, then pull characters from the buffer. When I read to the end of one buffer, I re-fill it from the current position of the byte stream. So far, so good, completely standard.

Except, I need to be able to (quickly) rewind the entire stream to an arbitrary position. I could keep a small handful of recent buffers in memory, but the stream is arbitrarily long so I don't want to try to keep all of it in memory.

If I try to read character at position X outside the current buffer, I need to be able to find the stream position P that would have been the start position of the buffer that character was originally found in, reset the stream to P and refill the buffer.

The index of the first character in a buffer is always a multiple of the buffer size, so I can modulo X down to the character index of the start of a buffer, then I need to quickly find the corresponding stream position P and rewind to it.

Given an index X, what data structure would I use to find the "biggest without going over" start position? Hashmaps seem wasteful because all the values are multiples of N so I worry about collisions (unless I make the hash function more complicated, and more expensive). Interval trees (and any tree) seem like they're going to be re-balanced a lot because I keep adding to the end but never to the middle. Skip lists seem like the right "shape" of the solution, but I don't like the probabilistic nature of them. Any suggestions?
ago
0 like 0 dislike
0 like 0 dislike
How are you rewinding an arbitrarily long stream? If it's not ephemeral then you're reading from some kind of storage already.

Are you asking to index the first occurrence of each character? Or only relative to the current position? What is the character size?

Edit: you said variable width, so how do you decide the arbitrary position? That's what you need to build the index around. The most naive would be restart from the beginning of the input. The only way to do it in 1 jump would be represent every character to the buffer it was first found in.
ago
0 like 0 dislike
0 like 0 dislike
memory mapped file? let the OS handle the buffering for you.
ago

No related questions found

33.4k questions

135k answers

0 comments

33.7k users

OhhAskMe is a math solving hub where high school and university students ask and answer loads of math questions, discuss the latest in math, and share their knowledge. It’s 100% free!