Discussion:
A New/Old Code Just For Fun
Fiziwig
2010-07-20 17:22:25 UTC
Just for the fun of it (I'm retired with nothing better to do with my
time :) I've created a new code (not cipher) based on a variation of
Huffman coding but using the Roman alphabet instead of binary bits to
encode English words instead of bytes of data. The code words are
variable length, but self-segregating so decoding is unambiguous in
all cases.

Encoded, the average is 2.35 Roman letters per English word, or about
half the size of the same message in English plaintext.

Complete description and downloadable 10,000 word code book at
http://fiziwig.com/crypto/huffcode.html

And remember, this is just for fun. Although, with a software coder/decoder,
it could be a semi-practical way to encrypt email to friends, especially if
an optional keyword-based enciphering layer were used.

--gary
Paulo Marques
2010-07-20 17:47:48 UTC
Hi,
Post by Fiziwig
Just for the fun of it (I'm retired with nothing better to do with my
time :) I've created a new code (not cipher) based on a variation of
Huffman coding but using the Roman alphabet instead of binary bits to
encode English words instead of bytes of data. The code words are
variable length, but self-segregating so decoding is unambiguous in
all cases.
Encoded, the average is 2.35 Roman letters per English word, or about
half the size of the same message in English plaintext.
Complete description and downloadable 10,000 word code book at
http://fiziwig.com/crypto/huffcode.html
This seems like a cute project, but I don't understand why you didn't
use a standard Huffman code to build the tables instead of using ad-hoc
rules. There are even projects available for building n-ary Huffman trees:

http://sourceforge.net/projects/huffman-ta/

This will certainly decrease the average word size, since Huffman coding
is optimal for discrete encodings with known frequencies.

(I'm sorry if I'm raining on your parade, though :( )
Post by Fiziwig
And remember, this is just for fun. Although with a software coder/
decoder it could be a semi-practical way to encrypt emails to friends.
Especially if an optional keyword-based enciphering layer were used.
Don't even think about going there! :)

This group takes cryptography seriously and your description just sounds
like an extremely "weak" cypher...

If you want a practical way to encrypt/decrypt emails to/from friends,
just take a look at gpg.
--
Paulo Marques
Software Development Department - Grupo PIE, S.A.
Phone: +351 252 290600, Fax: +351 252 290601
Web: www.grupopie.com

"This version has many new and good features. Sadly, the good ones are
not new and the new ones are not good."
Fiziwig
2010-07-20 20:52:30 UTC
Post by Paulo Marques
Hi,
Post by Fiziwig
Just for the fun of it (I'm retired with nothing better to do with my
time :)
Encoded, the average is 2.35 Roman letters per English word, or about
half the size of the same message in English plaintext.
This seems like a cute project, but I don't understand why you didn't
use a standard huffman code to build the tables instead of using ad-hoc
http://sourceforge.net/projects/huffman-ta/
This will certainly decrease the average word size, since huffman is
optimal for discrete encodings with known frequencies.
I wanted to keep it as simple as possible for a human being using a
hard copy code book and pencil and paper. More optimal encodings,
while more efficient, would necessarily be more complex for use by
mere humans. And don't forget, if the empire falls and civilization
crumbles, future feudal lords and kings will once again have to rely
on paper and pencil cryptography. ;-)
Post by Paulo Marques
(I'm sorry if I'm raining on your parade, though :( )
No need. This is just for fun, and there's no ego invested in it. I
just wanted to see how far I could go with a modern remake of the old
telegrapher's code book idea.
Post by Paulo Marques
This group takes cryptography seriously and your description just sounds
like an extremely "weak" cypher...
Yes, very weak indeed. But I'm enjoying the project all the same. And
besides, Byrne's "weak" cipher resisted cracking for many decades.
(See my functional equivalent of the Chaocipher machine at
http://fiziwig.com/crypto/tile1.html )
Post by Paulo Marques
If you want a practical way to encrypt/decrypt emails to/from friends,
just take a look at gpg.
Yes, there are better ways to encrypt email for security, but this is
a pencil-and-paper system, which appeals to my retro nature.

--gary
Paulo Marques
2010-07-21 11:55:23 UTC
Post by Fiziwig
Post by Paulo Marques
Hi,
[...]
I wanted to keep it as simple as possible for a human being using a
hard copy code book and pencil and paper. More optimal encodings,
while more efficient, would necessarily be more complex for use by
mere humans.
I don't see a reason for that. If you calculate the optimal encoding
using Huffman and then write the "codebook" in alphabetical order,
there is no reason for it to be any more difficult to use.
Post by Fiziwig
And don't forget, if the empire falls and civilization
crumbles, future feudal lords and kings will once again have to rely
on paper and pencil cryptography. ;-)
Even in such a scenario, the Huffman algorithm is easy enough to be
executed by hand...
Post by Fiziwig
[...]
Post by Paulo Marques
This group takes cryptography seriously and your description just sounds
like an extremely "weak" cypher...
Yes, very weak indeed. But I'm enjoying the project all the same. And
besides, Byrne's "weak" cipher resisted cracking for many decades.
(See my functional equivalent of the Chaocipher machine at
http://fiziwig.com/crypto/tile1.html )
Yes, because it didn't respect Kerckhoffs' principle [1], so only
cryptographers with nothing better to do would even consider looking at
it. The moment the algorithm was made public, it was just a question of
days (IIRC) before it fell apart...
Post by Fiziwig
Post by Paulo Marques
If you want a practical way to encrypt/decrypt emails to/from friends,
just take a look at gpg.
yes, there are better ways to encrypt email for security, but this is
a pencil and paper system, which appeals to my retro nature.
There are other pencil-and-paper systems out there that have at least
not been completely broken yet and that respect Kerckhoffs' principle,
like Solitaire [2].
--
Paulo Marques - www.grupopie.com

"I used to be indecisive, but now I'm not so sure."

[1] http://en.wikipedia.org/wiki/Kerckhoffs%27_principle

[2] http://www.schneier.com/solitaire.html
http://en.wikipedia.org/wiki/Solitaire_%28cipher%29
Fiziwig
2010-07-21 19:12:02 UTC
Post by Paulo Marques
Post by Fiziwig
Hi,
[...]
I wanted to keep it as simple as possible for a human being using a
hard copy code book and pencil and paper. More optimal encodings,
while more efficient, would necessarily be more complex for use by
mere humans.
I don't see a reason for that. If you calculate the optimal encoding
using huffman and then write the "codebook" in alphabetical sorting
order, there is no reason for it to be more difficult to use.
You're right. I guess I was thinking in terms of splitting the ciphertext
stream into words. The rule that a two- or three-letter code group ending
in {U,V,W,X,Y,Z} takes the next letter from the ciphertext stream made the
self-segregation _seem_ easier to perform. On the other hand, if the next
two code letters were "MQ" and there was no "MQ" in the code dictionary,
you would just include the next letter, say "R", and look up "MQR", and so
on, until you found the shortest code group in the ciphertext stream that
had a dictionary equivalent.

Yes, that would work.

I stand corrected.
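In code, that greedy lookup is just a short loop. Here is a minimal C++
sketch (the three-entry codebook and the message are made up purely for
illustration; with a real prefix-free codebook the first match is always
the right split):

#include <iostream>
#include <map>
#include <string>

int main() {
    // Toy codebook, invented for this example; a real one would hold
    // the full word list.
    const std::map<std::string, std::string> codebook = {
        {"M", "the"}, {"AR", "code"}, {"CSF", "works"}
    };
    const std::string stream = "MARCSF";   // ciphertext with no word breaks

    std::string group;
    for (char c : stream) {
        group += c;                        // take the next letter
        auto it = codebook.find(group);
        if (it != codebook.end()) {        // shortest group with an entry
            std::cout << it->second << ' ';
            group.clear();
        }
    }
    std::cout << '\n';                     // prints: the code works
}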

I shall have to rebuild the dictionary using the Huffman algorithm and
see what it looks like.
Post by Paulo Marques
Post by Fiziwig
And don't forget, if the empire falls and civilization
crumbles, future feudal lords and kings will once again have to rely
on paper and pencil cryptography. ;-)
Even on such a scenario, the huffman algorithm is easy enough to be
executed by hand...
Yes, I suppose that's true. Thank you for your constructive input.

--gary
Paulo Marques
2010-07-22 12:09:07 UTC
Post by Fiziwig
[...]
I shall have to rebuild the dictionary using the Huffman algorithm and
see what it looks like.
After dozens of posts in sci.crypt trying to help people who didn't want
any help and who took every constructive criticism as an insult, this is
a real breath of fresh air. Thank you, Gary!

If you need help building the Huffman-optimized codebook, please send me
the frequency table you're using and I'll try to help you out. I'm
curious what kind of improvement in the average letters-per-word figure
pure Huffman will give us.

Thanks again,
--
Paulo Marques - www.grupopie.com

"All generalizations are false."
Fiziwig
2010-07-22 13:17:11 UTC
Post by Paulo Marques
Post by Fiziwig
[...]
I shall have to rebuild the dictionary using the Huffman algorithm and
see what it looks like.
After tens of posts in sci.crypt trying to help people that didn't want
any help and took every constructive criticism as insults, this was a
real breath of fresh air. Thank you, Gary!
If you need help building the huffman-optimized codebook, please send me
the frequency table you're using and I'll try to help you out. I'm
curious about what kind of improvement on the average letters per word
value will the pure huffman give us.
Thanks again,
You're welcome.

My frequency data comes from the count of occurrences of each word in
the 1 million word Brown Corpus. It contains about 14,000 words, after
removing bogus words caused by typos in the original source material.
I'm curious myself to see the results. I'm coding up a C++ program to
accomplish the task. I started to do it by hand, but that turned out
to be too big a job with 14,000 words!

As an aside, it might be fun to let each ply of the tree alternate
between consonant-headed subtrees and vowel-headed subtrees so that
every code word would alternate a vowel with a consonant and be
pronounceable. (UGABUGA) Then it could be a spoken code as well. Or
maybe that's more properly called a language, in which case we've left
cryptography and entered linguistics. ;-)

--gary
Fiziwig
2010-07-22 22:38:47 UTC
Post by Paulo Marques
If you need help building the huffman-optimized codebook, please send me
the frequency table you're using and I'll try to help you out. I'm
curious about what kind of improvement on the average letters per word
value will the pure huffman give us.
Having never programmed the Huffman algorithm, I approached the
project with some trepidation. As it turned out, it was a piece of
cake. I used C++. My ONLY bugs were a couple of simple typos and one
uninitialized node pointer in the tree. Altogether it took me about an
hour to have it running, and another hour of fiddling with the format
of the output till I was happy with it. The only adaptation I had to
make to the basic Huffman algorithm was to use an alphabet of 26 letters
instead of an alphabet of 2. So instead of having two child
nodes, each parent node had a single list under it with 26 linked
entries.
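For what it's worth, the node layout amounts to something like this
(field names are illustrative, not my actual code):

struct Node {
    const char  *word;      // NULL for internal nodes
    long         weight;    // frequency count from the corpus
    struct Node *child;     // head of the list of up to 26 children
    struct Node *sibling;   // next entry in the parent's child list
};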

If you are curious, the whole completed tree is available at
http://fiziwig.com/temp_huff.txt

Here's the sentence I used as a sample on my web page:

Here is a sample of the code used to encrypt this very sentence.
IH CL CG OYM CD CC WUG LA CF GXZU DE HO UXS  <-- Original hand-made code = 31 letters
AR PO E HXD I M CSF PXE F CSF ON CP DDI  <-- Genuine Huffman algorithm = 27 letters = 13% less space

(Note: My word list did not originally include "encrypt" so I cheated
and used "code" again. I had added it by hand into the first code book
at the end of the already used codes. That's not so easy to do with a
Huffman-coded book.)

Note that with the genuine Huffman algorithm the sentence had 4 words
that used one-letter codes. My hand made codebook had no single-letter
codes.

If I encoded the entire 1-million-plus-word Brown Corpus, my results would
be:

Total Words 1,075,609
Total Letters 2,476,824
Average letters per word 2.30

This is not that much better than my hand made code that worked out at
3.35 letters per word.

As a code for a living language, there also needs to be room for
expansion within the coding scheme without breaking the already-coded
words, which would mean salting the original word file with a lot of
nulls that could later be replaced by real words as the need arose. I'm
not sure how that would affect the efficiency of the coding. My
hand-made system left expansion gaps (for example, all codes ending in
"T", regardless of length, were unassigned and set aside for future
expansion).

I also noticed that the Huffman code reached deeper into the alphabet
to do its coding. The highest code is PZZZ leaving around 176,000
codes available for expansion. In my hand-made code book the highest
code I used was GYVF, with about 335,000 available for expansion. Both
are perfectly adequate.

I have a hunch I could factor out morphemes from the word list and cut
it in half or better. That way words like "unfortunately" would be
coded as separate morphemes: "un + fortunate + ly". Of course that
would mean a minimum of 4 letters for that word (assuming at least 2
for "fortunate"), while the present Huffman code for "unfortunately"
is "AUN". So I'm not sure morpheme factoring would help compression,
although it would certainly help dictionary size.

I do like the genuine Huffman tree, though. It's clean and efficient.
Thanks for your help and suggestions.

I think next I'll play with alternating consonants and vowels so every
code word is pronounceable. :)

--gary
MrD
2010-07-23 07:50:15 UTC
Post by Fiziwig
Total Words 1,075,609
Total Letters 2,476,824
Average letters per word 2.30
This is not that much better than my hand made code that worked out
at 3.35 letters per word.
30% smaller is nothing to be sneezed at.
Post by Fiziwig
As a code for a living language there also needs to be room for
expansion within the coding scheme, without breaking the already
coded words, which would mean salting the original word file with a
lot of nulls that could later be replaced by real words as that
became necessary.
Have you said how many *distinct* words there are in your corpus? Sorry
if you've already said, I couldn't find it.

I believe Dr. Seuss deliberately chose a lexicon of just 300 words; that
one can get by in English with about 1,000 words; most native English
speakers manage all their lives with a lexicon of about 3,000 words, and
that Winston Churchill commanded a lexicon of about 12,000 ("12,000
strong", as adacrypt might say).

How do you handle stemming? E.g. is each form[1] of a verb treated as
a distinct word? I can't see how else you'd do it (without some
complicated grammar rules). So without a stemmer, I suppose the standard
lexicon might grow to about 15,000 and the Churchill lexicon might grow
to about 60,000. The Seuss lexicon probably wouldn't grow that much,
because in addition to his miniature lexicon, Seuss also restricted
himself to a pretty rudimentary grammar (everything is present-tense
indicative, etc.)

Actually I don't have the first clue how much a lexicon shrinks/grows by
as a result of stemming; those numbers are unadulterated guesswork. But
people who work on text-retrieval tools would know the facts.
--
MrD.
[1] MrD is stuck for the right word; he tried "declension", but knew
that wasn't what he meant. MrD thinks Churchill would have known the
right word.
Fiziwig
2010-07-23 16:19:44 UTC
Post by MrD
Have you said how many *distinct* words there are in your corpus? Sorry
if you've already said, I couldn't find it.
No, I haven't worked that out. It would be nice to compress out the
trivial duplicates like plurals. What's the sense of having both
"apple" and "apples" as separate entries?
Post by MrD
I believe Dr. Seuss deliberately chose a lexicon of just 300 words; that
one can get by in English with about 1,000 words; most native English
speakers manage all their lives with a lexicon of about 3,000 words, and
that Winston Churchill commanded a lexicon of about 12,000 ("12,000
strong", as adacrypt might say).
Just this morning I found a nifty list of the 2,000 "most essential"
English words, and it even includes frequency data! http://jbauman.com/gsl.html
Post by MrD
How do you handle stemming? E.g. is each form[1] of a verb treated as
a distinct word? I can't see how else you'd do it (without some
complicated grammar rules). So without a stemmer, I suppose the standard
lexicon might grow to about 15,000 and the Churchill lexicon might grow
to about 60,000. The Seuss lexicon probably wouldn't grow that much,
because in addition to his miniature lexicon, Seuss also restricted
himself to a pretty rudimentary grammar (everything is present-tense
indicative, etc.)
I haven't addressed that yet. First I need to enumerate all the
various ways that one word can be derived from another. (quick,
quickly, quicker, quickest, quickness, quickosity,
quickability, ... :)
Post by MrD
Actually I don't have the first clue how much a lexicon shrinks/grows by
as a result of stemming; those numbers are unadulterated guesswork. But
people who work on text-retrieval tools would know the facts.
I don't know either. It would be an interesting topic to explore. But
now we're really far afield from cryptography.

--gary
Paulo Marques
2010-07-23 12:25:31 UTC
Post by Fiziwig
[...]
Having never programmed the Huffman Algorithm, I approached the
project with some trepidation. As it turned out, it was a piece of
cake. I used C++. My ONLY bugs were a couple of simple typos and one
uninitialized node pointer in the tree. Altogether it took me about an
hour to have it running, and another hour of fiddling with the format
of the output till I was happy with it. The only adaptation I had to
make to the basic Huffman algorithm was use an alphabet of 26 letters
instead of an alphabet of 2 letters. So instead of having two child
nodes, each parent node had a single list under it with 26 linked
entries.
If you are curious, the whole completed tree is available at
http://fiziwig.com/temp_huff.txt
Nice work!

I'm afraid you still have a bug, though :(

When building N-ary Huffman trees you have to make sure the number of
elements fills up an N-branched tree _completely_.

This means that for a 26-ary tree, the number of starting elements needs
to be "25 * k + 26" for some integer k. If it isn't, you need to add
dummy zero-frequency elements to the end of the list before you start
building the tree.

For instance, if you have 12340 elements -> (12340 - 26) mod 25 = 14, so
you need to add (25 - 14) = 11 dummy elements before starting to collapse
the tree. I'm doing this all from memory, so I hope I'm getting the math
right...

That is why, in the end, the algorithm only uses letters from A to P as
the first letter instead of going all the way up to Z. So the end result,
after inserting the dummy elements, should be even better than what you
have now.
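In C++ terms the padding arithmetic is just this (names are mine, purely
to illustrate; it reproduces the 11-dummy answer from the 12340-element
example above):

#include <cstdio>

// Pad the symbol count up to the next value that is congruent to 1 mod 25
// (equivalently, of the form 25*k + 26 once you have more than 26 symbols),
// so every merge, including the last one, can take a full 26 nodes.
int dummies_needed(int symbols) {
    int r = (symbols - 1) % 25;
    return r == 0 ? 0 : 25 - r;
}

int main() {
    std::printf("%d\n", dummies_needed(12340));  // prints 11
}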
Post by Fiziwig
[...]
Note that with the genuine Huffman algorithm the sentence had 4 words
that used one-letter codes. My hand made codebook had no single-letter
codes.
That is one thing I was curious about: whether the optimized version
would find words frequent enough to deserve their own single symbol.
Thanks for satisfying my curiosity ;)
Post by Fiziwig
If I encoded the entire 1 million+ word Brown Corpus my results would
be
Total Words 1,075,609
Total Letters 2,476,824
Average letters per word 2.30
This is not that much better than my hand made code that worked out at
3.35 letters per word.
The total "original letters including spaces" (i.e. roughly the total
size of the corpus) would be a nice statistic too, so that we could
appreciate the compression factor of both methods.

Anyway, as MrD pointed out, 30% is not that bad. I could even bend the
statistics the other way around and say your original method increased
the size by 46% ;)
Post by Fiziwig
[...]
I also noticed that the Huffman code reached deeper into the alphabet
to do its coding. The highest code is PZZZ leaving around 176,000
codes available for expansion. In my hand-made code book the highest
code I used was GYVF, with about 335,000 available for expansion. Both
are perfectly adequate.
I think that if you want expansion capability you shouldn't leave it to
chance. You can add "expansion symbols" explicitly to the end of the
table, with some appropriate "expected frequency" for them.

The number of symbols available for expansion is another thing: if you
don't expect any expansion but want to "be prepared" for it, you might
even just reserve a single code word for expansion; then, when that word
appears, it means the next N characters describe the expansion. With N=4
you would already be able to get 26^4 = 456976 expansion words, wasting
only a single symbol in the table.
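A tiny sketch of that escape idea, with "QQ" as a made-up escape code and
a plain base-26 index (nothing here comes from Gary's actual codebook):

#include <iostream>
#include <string>

// Escape code + 4 letters gives 26^4 = 456976 extra slots while spending
// only one entry in the Huffman table.
std::string escape_code(int index) {               // 0 .. 456975
    std::string suffix(4, 'A');
    for (int i = 3; i >= 0; --i) {                 // fill digits from the right
        suffix[i] = 'A' + index % 26;
        index /= 26;
    }
    return "QQ" + suffix;                          // "QQ" is the reserved escape word
}

int main() {
    std::cout << escape_code(27) << '\n';          // prints QQAABB
}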
Post by Fiziwig
I have a hunch I could factor out morphemes from the word list and cut
it in half or better. That way words like "unfortunately" would be
coded as separate morphemes: "un + fortunate + ly" Of course that
would mean a minimum of 4 letters for that word (assuming at least 2
for "fortunate"), while the present Huffman code for "unfortunately"
is "AUN". So I'm not sure morpheme factoring would help compression,
although it would certainly help dictionary size.
I do like the genuine Huffman tree, though. It's clean and efficient.
Thanks for your help and suggestions.
You're welcome.
Post by Fiziwig
I think next I'll play with alternating consonants and vowels so every
code word is pronounceable. :)
It will certainly be bigger, though ;)
--
Paulo Marques - www.grupopie.com

"...so she told me it was either her or the ham radio, over."
Greg Rose
2010-07-23 16:07:37 UTC
Post by Paulo Marques
The total "original letters including spaces" (i.e. roughly the total
size of the corpus) would be a nice statistic too, so that we could
appreciate the compression factor of both methods.
Slightly out of context, I know, but I think it is
worth mentioning that with Huffman coding you can
also leave out the spaces. As you traverse the
tree, arriving at a leaf determines the end of the
word. That would make pronounceability a bit
tougher though.

Greg.
Fiziwig
2010-07-23 17:23:39 UTC
Post by Paulo Marques
I'm afraid you still have a bug, though :(
When building N-ary huffman trees you have to make sure the number of
elements fill up a N-branched tree _completely_.
This means that for a 26-ary tree, the number of starting elements needs
to be "25 * N + 26" for some integer N. If they're not, you need to add
dummy zero frequency elements to the end of the tree before start
building it.
I looked it up. The number of starting elements needs to be congruent
to 1 mod n-1, so it has to be of the form 25X + 1.
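For the record, the whole padded build is roughly this in C++ (a
from-scratch sketch with a made-up toy frequency list, not my actual
program):

#include <algorithm>
#include <cstdio>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Node {
    long long weight;                    // summed corpus frequency
    std::string word;                    // empty for internal nodes and dummies
    std::vector<Node*> kids;
};

struct HeavierThan {                     // makes the priority_queue a min-heap
    bool operator()(const Node* a, const Node* b) const {
        return a->weight > b->weight;
    }
};

// Walk the finished tree, appending one letter per level to build each code.
void assign(const Node* n, const std::string& prefix,
            std::vector<std::pair<std::string, std::string> >& out) {
    if (n->kids.empty()) {
        if (!n->word.empty()) out.push_back(std::make_pair(n->word, prefix));
        return;
    }
    for (size_t i = 0; i < n->kids.size(); ++i)
        assign(n->kids[i], prefix + char('A' + i), out);
}

int main() {
    // Made-up counts, just to have something to feed the tree; the real
    // numbers would come from the Brown Corpus tally.
    std::vector<std::pair<std::string, long long> > freq = {
        {"the", 70000}, {"of", 36000}, {"and", 29000}, {"to", 26000}, {"a", 23000}
    };

    std::priority_queue<Node*, std::vector<Node*>, HeavierThan> pq;
    for (size_t i = 0; i < freq.size(); ++i)
        pq.push(new Node{freq[i].second, freq[i].first, {}});

    // Pad with zero-weight dummies until the count is of the form 25X + 1,
    // so every merge (including the last one at the root) takes 26 nodes.
    while (pq.size() % 25 != 1) pq.push(new Node{0, "", {}});

    // Repeatedly merge the 26 lightest nodes under a new parent.
    while (pq.size() > 1) {
        Node* parent = new Node{0, "", {}};
        for (int i = 0; i < 26 && !pq.empty(); ++i) {
            Node* c = pq.top(); pq.pop();
            parent->weight += c->weight;
            parent->kids.push_back(c);
        }
        pq.push(parent);
    }

    std::vector<std::pair<std::string, std::string> > book;
    assign(pq.top(), "", book);
    std::sort(book.begin(), book.end());          // alphabetical codebook
    for (size_t i = 0; i < book.size(); ++i)
        std::printf("%-12s %s\n", book[i].first.c_str(), book[i].second.c_str());
}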

I had 9315 words, so I added 11 DUMMY words with 0 probability to make
9326. The results were strange.

Total Words 1075617
Total Letters 1246124
Average letters per word 1.16

The program claims an average of 1.16 letters per word, but that
seemed suspiciously low to me so I looked at the numbers.

The top 14 words got 1-letter codes. Those top 14 words make up 26% of
a typical English text. The next 225 words got 2-letter codes, and the
top 239 words together make up 62% of a typical English text. So with
26% * 1 + 36% * 2, and the remaining 38% being mostly 3-letter codes,
that works out to about 2.12 letters per word (approximately, since I
didn't account for those few 4-letter codes), so I suspect there's
something wrong with the calculation done by the program. Although I
don't see what could be wrong: it's simply the total number of code
letters used divided by the total of all word frequencies, and my
original corpus was slightly more than a million words, so that number
looks right. I'll double-check it.

Even so, 2.12 is about 8% better than the previous 2.30.

Thanks for spotting my oversight.

-gary
Paulo Marques
2010-07-23 18:06:47 UTC
Post by Fiziwig
[...]
Post by Paulo Marques
This means that for a 26-ary tree, the number of starting elements needs
to be "25 * N + 26" for some integer N. If they're not, you need to add
dummy zero frequency elements to the end of the tree before start
building it.
I looked it up. The number of starting elements needs to be congruent
to 1 mod n-1, so it has to be of the form 25X + 1.
My math didn't fail me too much, then :)
Post by Fiziwig
[...]
Although I don't see
what could be wrong. It's simply the total number of code letters used
divided by the total of all word frequencies, and my original corpus
was slightly more than a million words, so that number looks right.
I'll double check it.
It should be something like: sum_for_all_words(frequency * code_letters)
/ sum_for_all_words(frequency), i.e. the total number of letters used to
encode the corpus divided by the total number of words. This should give
the average letters per word used to encode the complete corpus.
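In code that is just a weighted sum over the codebook; a tiny sketch with
a made-up three-entry table (the important detail is that code_letters is
measured on the finished code for each word):

#include <cstdio>
#include <vector>

int main() {
    struct Entry { long long freq; int code_letters; };
    std::vector<Entry> table = { {70000, 1}, {36000, 2}, {29000, 3} };  // toy numbers

    long long letters = 0, words = 0;
    for (size_t i = 0; i < table.size(); ++i) {
        letters += table[i].freq * table[i].code_letters;
        words   += table[i].freq;
    }
    std::printf("%.2f letters per word\n", double(letters) / words);  // 1.70 here
}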

If you need help debugging the code, you can send it to me privately.
I'm usually good at spotting other people's bugs. I just wish I could
use that superpower for my own programs :(
--
Paulo Marques - www.grupopie.com

"C++ : increment the value of C and use the old one"
Fiziwig
2010-07-23 19:08:08 UTC
Post by Paulo Marques
It should be something like: sum_for_all_words(frequency * code_letters)
/ sum_for_all_words(frequency). I.e. the total number of letters used to
encode the corpus divided by to total number of words. This should give
the average letters per word used to encode the complete corpus.
If you need help debugging the code, you can send it to me privately.
I'm usually good at spotting other people's bugs. I just wish I could
use that superpower for my own programs :(
I found it. In my recursive display function I misplaced one line of
code: I was taking "strlen( tag ) * lpScan->weight" before I appended
the final letter for this branch to the tag, so I ended up counting
the length of every tag as one less than it should have been. I
fixed that and got:

Total Words 1075617
Total Letters 2321741
Average letters per word 2.16

So overall, Huffman gets 2.16 vs. my hand-made 2.35, an 8% improvement.

But more important, I learned a lot by doing this exercise. :)

BTW: as an alternative for making pronounceable codes, I discovered
the best approach is to build codes purely out of consonants, and then
add any old vowels you please when you use the codes. The human ear is
better at picking harmonious vowels than any program could be. So PTN
could be pronounced "patuma", or "aputiamu", or whatever you like,
without disturbing the self-segregating property. Adding a few rules
like "X" = "sh", "C" = "ch", and "Q"="th" makes even oddballs like XQ,
and CCN easy: "shathu", "chachani". Move over Apache Code Talkers. You
have met your match. :)
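Decoding the spoken form is then just a matter of throwing the vowels
away before the codebook lookup. A tiny sketch (the sample word is
invented for illustration):

#include <iostream>
#include <string>

// Strip whatever vowels the speaker inserted; what remains is the
// consonant-only code group.
std::string strip_vowels(const std::string& spoken) {
    const std::string vowels = "AEIOU";
    std::string code;
    for (char c : spoken)
        if (vowels.find(c) == std::string::npos) code += c;
    return code;
}

int main() {
    std::cout << strip_vowels("APUTIANU") << '\n';  // prints PTN
}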

--gary
MrD
2010-07-24 07:45:40 UTC
So PTN could be pronounced "patuma", or "aputiamu", or whatever you
like,
More like "potion" or "patina" or "epitonia", but not "Opountia". If I
understand you correctly.
--
MrD.
rossum
2010-07-24 12:01:24 UTC
Post by Fiziwig
Move over Apache Code Talkers.
I thought they were Navaho Code Talkers, or were the Apache used as
well?

rossum
Fiziwig
2010-07-24 14:20:19 UTC
Post by rossum
Post by Fiziwig
Move over Apache Code Talkers.
I thought they were Navaho Code Talkers, or were the Apache used as
well?
rossum
I looked it up. I stand corrected. They were Navajo, Cherokee, Choctaw
and Comanche. No Apache. Even at my advanced age I learn something new
every day. :)

--gary
David Eather
2010-07-24 21:14:45 UTC
Post by Fiziwig
Post by rossum
Post by Fiziwig
Move over Apache Code Talkers.
I thought they were Navaho Code Talkers, or were the Apache used as
well?
rossum
I looked it up. I stand corrected. They were Navajo, Cherokee, Choctaw
and Comanche. No Apache. Even at my advanced age I learn something new
every day. :)
--gary
The American army used a different tribe for code talkers in WWI - I
don't remember which.

rossum
2010-07-21 20:59:51 UTC
Post by Fiziwig
I wanted to keep it as simple as possible for a human being using a
hard copy code book and pencil and paper.
Playfair.

rossum
WTShaw
2010-07-21 23:25:38 UTC
Post by rossum
Post by Fiziwig
I wanted to keep it as simple as possible for a human being using a
hard copy code book and pencil and paper.
Playfair.
rossum
Playfair, not so great. Yes, hard to break by hand, but vulnerable by machine.
rossum
2010-07-22 15:36:11 UTC
Post by WTShaw
Post by rossum
Post by Fiziwig
I wanted to keep it as simple as possible for a human being using a
hard copy code book and pencil and paper.
Playfair.
rossum
Playfair, not so great. Yes, hard by hand, but vulnerable by machine.
Yes, it is breakable by machine, but it is easy to implement with just
pencil and paper.

If I wanted an easy-to-implement computer cypher I would pick RC4,
which is simple enough to be coded from memory.
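For illustration only (RC4 has well-known keystream biases and shouldn't
be used to protect anything that matters), the whole cipher fits in a
couple of dozen lines of C++:

#include <cstdio>
#include <string>
#include <utility>
#include <vector>

std::vector<unsigned char> rc4(const std::string& key, const std::string& data) {
    unsigned char S[256];
    for (int i = 0; i < 256; ++i) S[i] = i;
    for (int i = 0, j = 0; i < 256; ++i) {            // key scheduling
        j = (j + S[i] + (unsigned char)key[i % key.size()]) & 0xFF;
        std::swap(S[i], S[j]);
    }
    std::vector<unsigned char> out;
    int i = 0, j = 0;
    for (std::string::size_type k = 0; k < data.size(); ++k) {  // keystream + XOR
        i = (i + 1) & 0xFF;
        j = (j + S[i]) & 0xFF;
        std::swap(S[i], S[j]);
        out.push_back((unsigned char)data[k] ^ S[(S[i] + S[j]) & 0xFF]);
    }
    return out;
}

int main() {
    std::vector<unsigned char> ct = rc4("Key", "Plaintext");
    for (std::vector<unsigned char>::size_type k = 0; k < ct.size(); ++k)
        std::printf("%02X", ct[k]);
    std::printf("\n");      // BBF316E8D940AF0AD3, the standard "Key"/"Plaintext" test vector
}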

rossum
Mok-Kong Shen
2010-07-20 18:06:34 UTC
Post by Fiziwig
Encoded, the average is 2.35 Roman letters per English word, or about
half the size of the same message in English plaintext. [snip]
Question: Would the efficiency be better if a larger alphabet, e.g.
both upper and lower case, were used?

(BTW, a different line of thought is in my post "A scheme of dictionary
coding of English words" of 29.06.2010.)

M. K. Shen
Fiziwig
2010-07-20 20:57:19 UTC
Post by Mok-Kong Shen
Post by Fiziwig
Encoded, the average is 2.35 Roman letters per English word, or about
half the size of the same message in English plaintext. [snip]
Question: Would the efficiency be better if a larger alphabet, e.g.
both upper and lower case, be used?
(BTW, a different line of thought is in my post "A scheme of dictionary
coding of English words" of 29.06.2010.)
M. K. Shen
I'm sure the compression would be better using upper and lower case;
however, as a paper-and-pencil system there is too much room for
confusion between upper-case and lower-case hand-printed letters, in
particular Cc, Kk, Ll, Oo, Pp, Ss, Uu, Vv, Xx, Yy, Zz. That would make
it too error-prone.

I looked at your system. Very compact, but again, I prefer to stick to
all upper case Roman letters for hand-written clarity.

--gary
WTShaw
2010-07-21 23:23:06 UTC
Post by Fiziwig
Post by Mok-Kong Shen
Post by Fiziwig
Encoded, the average is 2.35 Roman letters per English word, or about
half the size of the same message in English plaintext. [snip]
Question: Would the efficiency be better if a larger alphabet, e.g.
both upper and lower case, be used?
(BTW, a different line of thought is in my post "A scheme of dictionary
coding of English words" of 29.06.2010.)
M. K. Shen
I'm sure the compression would be better using upper and lower case,
however, as a paper-and-pencil system there is too much room for
confusion over uppercase and lower case hand-printed letters. In
particular Cc, Kk, Ll, Oo, Pp, Ss, Uu, Vv, Xx, Yy, Zz. Since it is
meant to be a paper-and-pencil system, that would make it too error-
prone.
I looked at your system. Very compact, but again, I prefer to stick to
all upper case Roman letters for hand-written clarity.
--gary
For hand work, it is good that you see the error-prone methods first
hand. The same goes for the addition of digits, as only a few can be so
added.
John Nagle
2010-07-22 18:40:09 UTC
Post by Fiziwig
Just for the fun of it (I'm retired with nothing better to do with my
time :) I've created a new code (not cipher) based on a variation of
Huffman coding but using the Roman alphabet instead of binary bits to
encode English words instead of bytes of data. The code words are
variable length, but self-segregating so decoding is unambiguous in
all cases.
Some special-purpose handheld devices used to compress text with
a codebook. This allowed low-end devices to store an entire Bible
or encyclopedia.

I once encountered a handheld "Bible" device with an off-by-one
error in the codebook. Long words were being replaced by different
words that were alphabetically near the correct word. Very strange.

John Nagle