What are the best single characters for compression?
For example, I'll be imploding a lot of property+value pairs together, separated by a character, and then compressing the string. Since I can choose which character to use as the separator, I'm wondering which non-alphanumeric character would yield the greatest compression?
With the same type and level of compression, take for example five hundred periods versus five hundred question marks. On a half-educated guess, I would presume that it takes fewer 1s and more 0s to define a dot than all the dots that make up a question mark character, and that the periods would thus yield a smaller size/greater level of compression?
Thoughts?
Best non-alpha/numeric characters for compression?
- JAB Creations
- DevNet Resident
- Posts: 2341
- Joined: Thu Jan 13, 2005 6:44 pm
- Location: Sarasota Florida
- Contact:
Re: Best non-alpha/numeric characters for compression?
Generally speaking, lower entropy leads to a higher compression ratio. Entropy measures how "chaotic" (unpredictable) the information is.
In your example, 500x. and 500x? will give the same compression ratio, because the information is handled in byte-sized blocks and each of those characters is one byte.
So I think the choice of delimiter won't affect the compression ratio.
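A quick way to check this yourself is to compress the same data with different separators and compare the sizes. This is only a minimal sketch using Python's zlib (Deflate); the property names and values are made up for illustration, and other compressors may differ by a few bytes either way.

```python
import zlib

def compressed_size(text: str, level: int = 9) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), level))

# Hypothetical property=value pairs, invented for this comparison.
pairs = [f"prop{i}=value{i}" for i in range(200)]

# Candidate separators; none of them occurs inside the pairs themselves.
sizes = {sep: compressed_size(sep.join(pairs)) for sep in ".?|;~"}
for sep, size in sizes.items():
    print(repr(sep), size)

# The original example: 500 periods versus 500 question marks.
print(compressed_size("." * 500), compressed_size("?" * 500))
```

What matters far more than which character you pick is that the separator is used consistently and does not also appear inside the values.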
There are 10 types of people in this world, those who understand binary and those who don't
- JAB Creations
- DevNet Resident
- Posts: 2341
- Joined: Thu Jan 13, 2005 6:44 pm
- Location: Sarasota Florida
- Contact:
Re: Best non-alpha/numeric characters for compression?
Would I be correct to presume the following...
1.) That not all characters can be represented by a single byte (eight bits)?
2.) The bits in each byte can be directly compressed by some/all compression algorithms? For example, I presume the byte for the character '0' would be '00000000' in bits, which may compress better than a byte such as '011000111'?
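Both presumptions can be checked directly. The sketch below assumes the string is stored as UTF-8 and prints the bits of a few characters; the `bits` helper is just a name chosen for this example. In UTF-8, ASCII characters such as `.`, `?` and `0` each take one byte, the character `0` is byte 0x30 rather than an all-zero byte, and characters outside ASCII such as `€` take more than one byte.

```python
def bits(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated 8-bit groups."""
    return " ".join(f"{b:08b}" for b in ch.encode("utf-8"))

for ch in [".", "?", "0", "€"]:
    print(repr(ch), bits(ch))
```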
Re: Best non-alpha/numeric characters for compression?
Entropy-based compression algorithms have no prior knowledge of where the data comes from (e.g. whether it is English text, a picture or a video), so they use 1-byte block coding.
And for your example:
00000000 -> entropy = 0 (no chaos!) => compression ratio would be very high
011000111 -> entropy > 0 => compression ratio would be lower.
I've tried to RAR two files, one with 710x. and one with 710x?. The second one was 2 bytes smaller, but that is also only 1-2 percent smaller.
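To put a number on "entropy", here is a rough sketch that computes the Shannon entropy of a byte string from its byte frequencies alone (bits per byte). It is a simplified model: real compressors such as RAR or zlib also exploit repeated sequences, so this is not how they work internally. The `byte_entropy` name is just for this example.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, based only on byte frequencies."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every distinct byte value.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(byte_entropy(b"." * 710))        # one repeated byte
print(byte_entropy(b"?" * 710))        # a different repeated byte
print(byte_entropy(bytes(range(256)))) # every byte value exactly once
```

Both 710-character files come out at zero entropy, whichever character is repeated, which fits the observation that their compressed sizes differ by only a couple of bytes.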