best % data compression using only printable 8 bit ASCII

June 7, 2016 at 03:59:53
Specs: Win 7, 2 GHz / 2 GB
The target is a JavaScript .js file that stores the result in a variable the page can read. This limits the characters that can be used, so I avoid base64, Huffman encoding and binary output.
The object is to reduce download time for client-side JS.
The input is a lexicon of 6000 entries, sorted alphabetically and concatenated in these groupings:

a) 2 letter words
b) sequences of all numerals
c) sequences of all caps
d) words

Each grouping is encoded slightly differently, as befits its content.
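
The post does not say how each grouping is actually encoded. As an illustration only, one common technique for alphabetically sorted word lists that stays within printable ASCII is front coding: store how many leading characters a word shares with the previous word, then only the differing suffix. A minimal sketch, assuming words contain no digits or commas and capping the shared prefix at 9 so it fits in one digit (the names `frontEncode` and `frontDecode` and the sample words are made up for this example):

```javascript
// Front coding sketch: each entry becomes <shared-prefix digit><suffix>.
// Assumptions (not from the original post): words contain no digits or
// commas, and the shared prefix is capped at 9 so one digit is enough.
function frontEncode(sortedWords) {
  let prev = '';
  const parts = [];
  for (const word of sortedWords) {
    let n = 0;
    const max = Math.min(9, prev.length, word.length);
    while (n < max && prev[n] === word[n]) n++;
    parts.push(n + word.slice(n)); // digit followed by the new suffix
    prev = word;
  }
  return parts.join(',');
}

function frontDecode(packed) {
  if (packed === '') return [];
  let prev = '';
  return packed.split(',').map((part) => {
    const n = Number(part[0]); // how much of the previous word to reuse
    prev = prev.slice(0, n) + part.slice(1);
    return prev;
  });
}

// Hypothetical sample data
const words = ['able', 'about', 'above', 'abroad', 'absent'];
const packedWords = frontEncode(words);
console.log(packedWords);              // 0able,2out,3ve,2road,2sent
console.log(frontDecode(packedWords)); // the original five words
```

On this tiny sample the packed string is 26 characters against 30 for the plain comma-separated list. The saving on a real lexicon depends on how much neighbouring words overlap.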

At present I achieve a reduction of 47% (+).

So, given the 8-bit restriction, what does the forum think the optimum compression might be (if I were clever enough to implement such an algorithm)?

Knowledge knows what it knows, intelligence knows when it doesn't.




#1
June 7, 2016 at 04:10:50
Don't go beyond 40%.
Actually, 30-35% is good; 40% is the most you should aim for.
When archiving (assuming you are going to combine it into a package), errors might arise from loss of data and the like.

message edited by jaysarma987



#2
June 7, 2016 at 07:45:10
If you're going for the best performance, you really should just pass the developer version of your scripts through any of the dozen JavaScript minifiers out there and enable gzip compression on the web server.
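
A rough way to see what gzip alone would buy before touching the server config is to measure it locally. This is only a sketch for Node.js using its built-in `zlib` module; the function name `gzipReduction` and the sample text are invented for illustration, and real figures depend entirely on the actual file:

```javascript
// Node.js sketch: compare the raw size of a text with its gzip size.
const zlib = require('zlib');

function gzipReduction(text) {
  const raw = Buffer.byteLength(text, 'utf8');
  const packed = zlib.gzipSync(Buffer.from(text, 'utf8'), { level: 9 }).length;
  return { raw, packed, reductionPercent: 100 * (1 - packed / raw) };
}

// Hypothetical stand-in for a minified script or word list
const sampleText = Array.from({ length: 500 }, (_, i) => 'word' + i).join('\n');
const gzipResult = gzipReduction(sampleText);
console.log(gzipResult);
```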

How To Ask Questions The Smart Way



#3
June 8, 2016 at 10:31:26
Thanks for the comments.
My current best algorithm gets to a 47.4% reduction, and unless I have a lightbulb moment, that is as good as it will get.
It has to be lossless, since this is a lexicon of mostly English words. It also cannot rely on any server software because of the DVD distribution: it has to be stand-alone, client-side, in JavaScript, with one version for both DVD and Web.

If I did it server-side, and then did the DVD with the help of something like node.js, I would still be debugging next year. And I would be running two versions, which for a hobby project is asking for trouble.




#4
June 9, 2016 at 01:16:49
Thanks for getting back to us! :)


#5
June 9, 2016 at 19:19:01
I'm a bit confused (as usual). 8 bits is not technically considered "printable", although all 256 ASCII codes are (printable, that is, given a healthy code page). The 6000-value set reduces to 23.5 given 8 bits, but 47 if allowed only 7 bits, so the 6K becomes 24 chars if 8-bit, or 47 if 7-bit (which is within the standard considered "printable ASCII"). Then you need to consider the trade-off of code size vs. data compression. It does no good to compress data if the decompression code plus the data exceeds the alternative of simpler extraction or just "no compression". I believe that either way you will still have to include the conversion table itself. That brings the elements to three: data, code, and table. I assume that the variable contains only the data, but your distribution software will have to incorporate the other two elements in order to "inflate" the variable. Am I right, or just way off orbit?

PS: I'm sure there are established methods to maximize compression of all requisite elements at any given threshold of "byte-size", since this would be a very common scenario both for legitimate software and viral agents. I know I haven't told you anything you didn't already know, ha!
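
The three-element trade-off (data, code, table) can be put into a quick break-even check. This is a sketch only: the function name `compressionPaysOff` is made up, and every byte count below is a hypothetical placeholder, since the thread gives only the 47.4% reduction figure and not the actual sizes:

```javascript
// Break-even sketch: compression only helps if packed data plus
// decoder code plus lookup table is smaller than the raw data.
function compressionPaysOff(rawSize, packedSize, decoderSize, tableSize) {
  const total = packedSize + decoderSize + tableSize;
  return { total, saved: rawSize - total, worthIt: total < rawSize };
}

// Hypothetical sizes in bytes: 40000 raw, 21040 packed (a 47.4% reduction),
// plus an assumed 1500-byte decoder and 500-byte table.
const estimate = compressionPaysOff(40000, 21040, 1500, 500);
console.log(estimate); // { total: 23040, saved: 16960, worthIt: true }
```

With these made-up numbers the scheme still comes out ahead. For a much smaller data set the fixed cost of the decoder and table could outweigh the saving.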

message edited by nbrane

