Would it be beneficial to choose the size of data chunks based on the type of the data, or on its size? For example, text data might benefit from small chunks, whereas video data might benefit from large chunks. Possibly file size alone would be a sufficient indicator, without needing to detect the type.

The goal would be to reduce the number of chunks, without reducing the effectiveness of de-duplication.
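
A rough sketch of what per-file chunk size selection might look like; the extension lists and size thresholds below are made-up illustrations, not measured values:

    import os

    # Hypothetical groupings; real choices would come from measurements.
    TEXT_LIKE = {".txt", ".c", ".py", ".html", ".log"}
    MEDIA_LIKE = {".mp4", ".mkv", ".jpg", ".flac"}

    def pick_chunk_size(path):
        ext = os.path.splitext(path)[1].lower()
        size = os.path.getsize(path)
        if ext in TEXT_LIKE:
            return 4 * 1024        # small chunks: edits tend to be local
        if ext in MEDIA_LIKE:
            return 1024 * 1024     # large chunks: data is rarely edited
        # Fall back on file size alone: bigger files, bigger chunks.
        if size < 1024 * 1024:
            return 16 * 1024
        return 256 * 1024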

This needs measurements of real data to see whether there are interesting parameters.
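
One way to get such measurements: chunk a real file tree at a few fixed sizes and compare the chunk count against how many unique bytes would actually be stored. A sketch (the chunk sizes tried are arbitrary, and fixed-size chunking is assumed for simplicity):

    import hashlib, os, sys

    def measure(root, chunk_size):
        unique = {}               # chunk digest -> chunk length
        chunks = total = 0
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                try:
                    f = open(path, "rb")
                except OSError:
                    continue
                with f:
                    while True:
                        data = f.read(chunk_size)
                        if not data:
                            break
                        chunks += 1
                        total += len(data)
                        unique[hashlib.sha256(data).digest()] = len(data)
        return chunks, total, sum(unique.values())

    for size in (4096, 65536, 1048576):
        chunks, total, stored = measure(sys.argv[1], size)
        print("%8d B/chunk: %8d chunks, stored/total %.3f"
              % (size, chunks, float(stored) / max(total, 1)))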

--liw

done: interesting ideas, but this is an old wishlist bug. --liw