Optimal encoding scheme for semi-rewritable memory?

Let’s define a “semi-rewritable” memory device as having the following properties:

  • The initial blank media is initialised with all zeroes.
  • When writing to the media, individual zeroes can be turned into ones.
  • Ones can not be turned back into zeroes.

Making a physical interpretation of this is easy. Consider for instance a punch card where new holes can easily be made, but old holes can not be filled.

What makes this different from a “write once, read many” device is that a used device can be rewritten (multiple times), at the cost of reduced capacity for each rewrite.

Implicit assumptions I would like to make explicit:

  1. The memory reader has no information about what was previously written on the device. It can therefore not be relied upon to use a mechanism such as “which symbols have been changed?” to encode data on a device rewrite. That is, the reader is stateless.
  2. On the other hand, different “generations” of the device may use different encoding schemes as the available capacity shrinks.
  3. The data stored can be assumed to be random bits.

Sample storage scheme, to demonstrate rewrite capability:

Information in this scheme is stored on the device as pairs of binary symbols, each pair encoding one of the three states of a ternary symbol, or [DISCARDED] in the case where both symbols have been written.

The first generation thus stores data at a density of $\frac{\log_2 3}{2} \approx 0.79$ times that of simple binary encoding.

When the device is rewritten, the encoder considers each pair of binary symbols in sequence. If the existing state matches the one it wants to write, the encoder considers the symbol written. Otherwise, it writes the necessary modification to that pair or, where that isn’t possible, writes the symbol [DISCARDED] and moves on to the next pair, until it has successfully written the ternary symbol.

As such, every rewrite would discard $ \frac{4}{9}$ of existing capacity.
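The $\frac{4}{9}$ figure can be checked by enumerating every (existing, desired) combination of ternary symbols, assuming both are independent and uniformly distributed (which the random-data assumption suggests). A minimal sketch; the mapping of ternary values 0, 1, 2 to the bit pairs 00, 01, 10 is my assumption, but any assignment of those three pairs gives the same count.

```python
from fractions import Fraction
from itertools import product

# Ternary symbols 0, 1, 2 stored as bit pairs; (1, 1) is reserved for [DISCARDED].
ENCODING = {0: (0, 0), 1: (0, 1), 2: (1, 0)}


def can_overwrite(old, new):
    """A pair can be changed in place only by turning zeroes into ones."""
    return all(o <= n for o, n in zip(old, new))


def discard_fraction():
    """Fraction of (existing, desired) combinations that cannot be written in place.

    Assumes existing and desired symbols are independent and uniform, so each of
    the 3 x 3 combinations is equally likely.
    """
    outcomes = [can_overwrite(ENCODING[old], ENCODING[new])
                for old, new in product(ENCODING, repeat=2)]
    return Fraction(outcomes.count(False), len(outcomes))


print(discard_fraction())  # 4/9
```

Only a blank pair (00) can take any symbol; 01 and 10 can only keep their current value, which gives 3 + 1 + 1 = 5 writable combinations out of 9.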

For a large number of cycles, the device would in total have stored $\frac{9\log_2 3}{8} \approx 1.78$ times the data of a simple one-time binary encoding.
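The 1.78 lifetime figure is a geometric series. Assuming each rewrite keeps $\frac{5}{9}$ of the pairs that were still usable (the complement of the $\frac{4}{9}$ loss), generation $g$, counting the first as $g = 0$, stores $\left(\frac{5}{9}\right)^g$ times the first generation's data, so the total relative to one-time binary encoding is:

$$
\sum_{g=0}^{\infty} \frac{\log_2 3}{2}\left(\frac{5}{9}\right)^{g}
= \frac{\log_2 3}{2} \cdot \frac{1}{1 - \frac{5}{9}}
= \frac{\log_2 3}{2} \cdot \frac{9}{4}
= \frac{9\log_2 3}{8} \approx 1.78
$$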

(For a variation of the above, one could also encode the first generation in binary and then apply this scheme to every subsequent generation. The loss from the first generation to the second would be larger, and the total lifetime capacity reduced, but the initial capacity would be larger.)

Question:

  1. Is it possible to have a better lifetime capacity than $\frac{9\log_2 3}{8}$? I suspect the real asymptotic capacity is 2.

  2. Can a scheme do better than having $ \frac{4}{9}$ capacity loss between rewrites?

How secure is this password scheme?

I use a password scheme where I keep a small number of easy-to-remember personal passwords. Instead of using the passwords directly for each service, I first run them through a hashing algorithm, as a sort of seed, together with the name of the actual service. I then use the resulting hash as my actual password for the service. (There’s a bit more to it: I add some fixed extra letters to satisfy common password requirements, but let’s set that aside for this question.)

The pattern looks like this (using SHA512, and keeping just the first 12 characters of the resulting hash):

"my_p4SSWord!" + "Facebook"  =>  SHA512  =>  "d4679b768229"
"my_p4SSWord!" + "LinkedIn"  =>  SHA512  =>  "182c5c2d4a2c"

This pattern means I don’t have to remember each of my online passwords, only how to re-create them easily whenever I need to.
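The derivation described above can be sketched in a few lines. This is a hypothetical reconstruction: the question only shows the two inputs joined with "+", so the concatenation order, the absence of a separator, and the UTF-8 encoding are my assumptions, and the output may not reproduce the example hashes. The extra fixed letters are left out, as in the question.

```python
import hashlib


def derive_site_password(master: str, service: str, length: int = 12) -> str:
    """Hypothetical reconstruction: SHA-512 of master + service, truncated.

    Assumes plain concatenation with no separator, encoded as UTF-8.
    """
    digest = hashlib.sha512((master + service).encode("utf-8")).hexdigest()
    return digest[:length]  # keep only the first `length` hex characters


for service in ("Facebook", "LinkedIn"):
    print(service, derive_site_password("my_p4SSWord!", service))
```

The function is deterministic, which is what lets the passwords be re-created on demand; it is also what lets anyone who knows the scheme and the master password re-create them.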

There are lots of online services for calculating hashes, and I currently use this one:

https://emn178.github.io/online-tools/sha512.html

My question to the security experts is: how secure is this personal scheme of mine, really? I truncate the hashes to just 12 characters; how does that affect how easily my passwords can be cracked? Also, I use SHA512; how does that affect my scheme compared with using, for instance, bcrypt?
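Some rough arithmetic for the truncation question: 12 hex characters carry 4 bits each, so the output space is $2^{48}$. An attacker who knows the scheme (and, say, obtains one derived password from a breached site) can instead guess the master password and recompute the hash, so the guessing work is roughly the smaller of the two. A sketch of that bound; the master-password entropy is a hypothetical input you would have to estimate yourself.

```python
def truncated_output_bits(hex_chars: int) -> int:
    """Size of the truncated output space, in bits: each hex character carries 4 bits."""
    return hex_chars * 4


def effective_bits(hex_chars: int, master_entropy_bits: float) -> float:
    """Rough upper bound, in bits, on guessing work for one derived password.

    An attacker can guess the truncated output directly or guess the master
    password and re-derive it, whichever is cheaper.
    """
    return min(truncated_output_bits(hex_chars), master_entropy_bits)


print(truncated_output_bits(12))  # 48
print(effective_bits(12, 40))     # 40 (hypothetical 40-bit master password)
```

This bound counts guesses, not time. SHA-512 is fast, so each master-password guess is cheap; slow functions such as bcrypt are designed to make each guess expensive.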

Any comments?

Name of binary encoding scheme for integer numbers

I once found on Wikipedia a nice technique for encoding $k$ uniformly distributed integer values, with $2^{n-1} < k < 2^n$, using fewer than $\lceil \log_2 k \rceil = n$ bits per symbol on average, thanks to a simple-to-compute variable-length code. Basically, it used $n$ bits for some symbols and $n - 1$ bits for others.
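One well-known code matching this description is truncated binary encoding (whether it is the article in question is my guess). For $k$ values, with $n = \lceil \log_2 k \rceil$, the first $u = 2^n - k$ values get $(n-1)$-bit codewords and the rest get $n$-bit codewords offset by $u$, which keeps the code prefix-free. A minimal sketch; the function names are hypothetical.

```python
def truncated_binary_encode(x: int, k: int) -> str:
    """Encode x in range(k) with truncated binary encoding (requires k >= 2)."""
    if k < 2 or not 0 <= x < k:
        raise ValueError("need k >= 2 and 0 <= x < k")
    n = (k - 1).bit_length()  # n = ceil(log2 k)
    u = (1 << n) - k          # how many values get the shorter (n - 1)-bit code
    if x < u:
        return format(x, f"0{n - 1}b")
    return format(x + u, f"0{n}b")


def truncated_binary_decode(bits: str, k: int):
    """Decode one symbol from the front of bits; return (value, remaining bits)."""
    n = (k - 1).bit_length()
    u = (1 << n) - k
    head = int(bits[: n - 1], 2) if n > 1 else 0
    if head < u:
        return head, bits[n - 1:]
    return int(bits[:n], 2) - u, bits[n:]


print([truncated_binary_encode(x, 5) for x in range(5)])
# ['00', '01', '10', '110', '111']
```

For $k = 5$ the average length is $(3 \cdot 2 + 2 \cdot 3)/5 = 2.4$ bits, against 3 bits for plain fixed-width binary.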

Unfortunately, all my Googling has failed me. I recall something like “variable length binary”, but I keep ending up at VLQ (variable-length quantity), which is a different beast. Since I know your memory is better than mine, can you help me?

Is there any existing obfuscation scheme that makes cipher text indistinguishable from plain text?

Suppose a totalitarian government (in the name of anti-terrorism / protection of intellectual property):

  1. has outlawed encryption itself – encryption is only approved for cases where the state has reviewed the design and made sure it can decrypt/inspect the message, and made any unapproved encryption a criminal offense
  2. has total control over anything in and out of the network at ISP-level, as well as anything that passes through web services

How could two citizens, Alice and Bob, use an approved (and monitored) instant messaging service to set up a secure line of communication while concealing the fact that the communication is encrypted, i.e. make it indistinguishable from unencrypted data or, at least, make it computationally or financially infeasible to distinguish it from plain text?

For example, no one would assume the following message to be encrypted:

  • Across the Great Wall, we can reach every corner in the world.

But it would be assumed that the following is:

  • WZ2A805Wq3rzpiuzE+ZCulgDrn76pVRW5PVUJ4DDadFQD4P9PsTeegbo5CAkqI4yZrO//p
    sYT+ZQkqZ6IrSGng==

  • 599D80F34E56AB7AF3A62BB313E642BA
    5803AE7EFAA55456E4F5542780C369D1
    500F83FD3EC4DE7A06E8E42024A88E32
    66B3BFFE9B184FE65092A67A22B4869E

For the purpose of this question, we assume the following technical details:

  1. the IM service is text-only, binary data is not allowed (in an IM setting, sending primarily small binary fragments back and forth would probably raise suspicion anyway)
  2. communication between Alice and the IM service, and between Bob and the IM service, is end-to-end encrypted. A government agent, Eve, has a copy of the decryption key the IM service uses
  3. proof that the message is encrypted is not required. I.e. Eve does not need to know the plain text or the algorithm used to produce the cipher text. She only needs to tell, with a reasonably low false-positive rate, whether a message is the result of encryption
  4. the endpoint is secure, no backdoor or malware on the computer/router, etc.

I’d like to know whether there is any reliable research on this: is it feasible or not, and if it is, is there any existing protocol or algorithm developed for it?

Eve, in case you are watching, I’m asking this for academic purposes only. 😄

Secret sharing scheme for disaster recovery with asymmetric key?

I want to back up many secrets so that my family can access them in case of my death, using a secret sharing scheme, but I need a way to keep adding secrets (without distributing new shares).

I could encrypt all the data with a symmetric cipher and then distribute shares of that passphrase, but then, to add new secrets, I need to keep a copy of the key (adding an attack vector I don’t like).

Is there any way to use a secret sharing scheme with a key pair, keeping the public key to encrypt more data while sharing the private key?
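The splitting half of that idea can be sketched with Shamir's secret sharing, treating the private key as an integer: the public key stays available for encrypting new secrets, and only the private key is split. A minimal, unaudited sketch using only Python's standard library; the prime, the 3-of-5 threshold, and the random stand-in for the private key are my assumptions, and generating the key pair and doing the public-key encryption would need a vetted crypto library, which is not shown.

```python
import secrets

PRIME = 2**521 - 1  # a Mersenne prime; the secret must be smaller than this


def split_secret(secret, threshold, num_shares):
    """Split secret into num_shares points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= num_shares < PRIME:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the integers mod PRIME."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


private_key = int.from_bytes(secrets.token_bytes(32), "big")  # stand-in for a real private key
shares = split_secret(private_key, threshold=3, num_shares=5)
print(recover_secret(shares[1:4]) == private_key)  # True
```

Because only the private key is shared, new secrets can be encrypted to the public key at any time without redistributing shares.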

How would anyone ever break even this basic, amateurish cipher/encryption scheme?

This question was prompted by a long Wikipedia session in which I read tons of articles on cryptography, which raised far more questions than they answered.

Let’s say that I and another person know each other. We plan to do something important and dangerous. We need to send messages back and forth long-distance. We conclude that all purchasable hardware and software is compromised, and therefore devise our own scheme:

  1. I pull out the network cable from my computer and randomly generate a huge table, with rows and columns numbered 1, 2, 3, 4, 5…, filled with random letters and fitting on a standard A4 sheet.
  2. I print out two copies of this table.
  3. I destroy the computer.
  4. I keep one copy myself and give the other copy to the other person, who is sitting with me.
  5. I tell him that, in order to send a message to me, or decrypt messages from me, he is to find the letter he needs to type in English, for example “A”, anywhere in the table, and check which column and row it is in. For example, it may be in the 3rd column of the 16th row. That means he is supposed to type “3”, followed by a randomly picked letter, followed by “16”, followed by another random letter, on the blank paper. He is then to continue like this until he has a message such as:

    1A56B8L13X8C32E9L210D…

First of all, how would anyone ever be able to tell that the letters are all nonsensical and not used for anything other than separating the numbers? And even if they did, what do the numbers mean? They have no way of knowing this unless they have a copy of our sheet which only exists in two copies in the world and was generated by an offline computer which is now physically destroyed.

And we wouldn’t be using the same column+row value each time for a given letter, as each letter appears many times in the table.
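The table scheme from step 5 is a homophonic substitution cipher: each plaintext letter has several possible ciphertext values (its coordinates), one of which is picked at random. A minimal sketch; the 30 × 40 table size and all function names are hypothetical, and the message is assumed to contain letters only.

```python
import random
import re
import string


def make_grid(cols, rows, rng):
    """Random table of uppercase letters; grid[row - 1][col - 1] is one cell."""
    return [[rng.choice(string.ascii_uppercase) for _ in range(cols)] for _ in range(rows)]


def encrypt(message, grid, rng):
    """Replace each letter by a randomly chosen (column, row) where it occurs,
    written as: column, random filler letter, row, random filler letter."""
    positions = {}
    for r, row in enumerate(grid, start=1):
        for c, ch in enumerate(row, start=1):
            positions.setdefault(ch, []).append((c, r))
    out = []
    for ch in message.upper():
        col, row = rng.choice(positions[ch])  # KeyError if the letter is not in the grid
        filler1 = rng.choice(string.ascii_uppercase)
        filler2 = rng.choice(string.ascii_uppercase)
        out.append(f"{col}{filler1}{row}{filler2}")
    return "".join(out)


def decrypt(ciphertext, grid):
    """Ignore the filler letters and read the numbers as (column, row) pairs."""
    nums = [int(n) for n in re.findall(r"\d+", ciphertext)]
    return "".join(grid[row - 1][col - 1] for col, row in zip(nums[0::2], nums[1::2]))


rng = random.SystemRandom()  # OS randomness; a seeded random.Random only suits demos
grid = make_grid(30, 40, rng)
ciphertext = encrypt("ATTACKATDAWN", grid, rng)
print(ciphertext)
print(decrypt(ciphertext, grid))  # ATTACKATDAWN
```

The decryptor never looks at the filler letters; only the numbers and the shared table matter.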

And, what if to further complicate everything, we decide to write the messages in reverse? Or to do every other look-up in reverse, so that the columns and rows are swapped every other character? With just a few simple rules like that, it seems like they could never, ever decrypt our messages, even with the most powerful computers in the world.

I probably am making a fool out of myself here, but I seriously don’t understand how anyone, no matter how smart, given unlimited time, could ever break this cipher/encryption scheme which I just came up with quickly without having any expertise in the field. I clearly must be missing something.

Will extra rules to my diceware list generation scheme decrease security?

I’ve finished reading The Code Book by Simon Singh, and I’m interested in playing with some of its ideas to help increase my own understanding. I don’t intend to use the following in any consequential setting; I’m only interested in exploring the security implications.

I want to generate alternative diceware lists that have quirks, such as every word being typed with the left hand only, or keystrokes alternating between hands. Assuming I can generate 7776 different strings, and am able to follow all of the other diceware guidelines, are all diceware lists equally secure?

In the German Enigma machine, no letter could be encoded as itself (e.g., A could never be encoded as A). This detail helped crack the code. However, I don’t think this applies here, since the strength of the password doesn’t rely on a cipher. I don’t see why 6 or 7 strings randomly chosen from a list of 7776 wouldn’t have the same entropy, no matter the list. Theoretically, it could just consist of 7776 different binary strings, couldn’t it?

I understand that additional rules for password generation sometimes decrease security. If an attacker knows my diceware list, does it matter if every entry consists only of the 15 characters typed with the left hand? Is there less entropy?
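A quick way to see the entropy question numerically: for words picked independently and uniformly, entropy depends only on the list size and the number of words, not on what the entries look like. The left-hand restriction only affects how long the entries must be to reach 7776 distinct strings. A sketch, assuming a standard QWERTY layout (letters only) and that the words stay separated or are unambiguous when joined; otherwise two different word sequences could produce the same string.

```python
import math

# Letters typed with the left hand on a standard QWERTY layout (assumption).
LEFT_HAND_QWERTY = set("qwertasdfgzxcvb")


def passphrase_entropy_bits(list_size: int, num_words: int) -> float:
    """Entropy of num_words independent, uniform picks from a list of list_size entries."""
    return num_words * math.log2(list_size)


def min_word_length(alphabet_size: int, list_size: int) -> int:
    """Shortest fixed word length that allows list_size distinct words."""
    length = 1
    while alphabet_size ** length < list_size:
        length += 1
    return length


print(len(LEFT_HAND_QWERTY))                         # 15
print(round(passphrase_entropy_bits(7776, 6), 2))    # 77.55
print(min_word_length(len(LEFT_HAND_QWERTY), 7776))  # 4
```

With 15 letters, $15^3 = 3375 < 7776 \le 15^4$, so a left-hand-only list needs words of at least 4 letters; the entropy per word stays $\log_2 7776 \approx 12.9$ bits either way.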

What’s the algorithm behind MySQL’s sha256_password hashing scheme?

MySQL’s old mysql_native_password hashing scheme was the equivalent of this in PHP:

sha1(sha1('password', true)); 

That’s a hex-encoded SHA-1 hash of a binary SHA-1 hash of the password, without any salting.
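For comparison with the PHP one-liner, the same double SHA-1 in Python. A minimal sketch; the UTF-8 encoding of the password is my assumption (PHP simply hashes the string's raw bytes).

```python
import hashlib


def mysql_native_password_hash(password: str) -> str:
    """Python equivalent of PHP's sha1(sha1($password, true)): no salt, lowercase hex."""
    inner = hashlib.sha1(password.encode("utf-8")).digest()  # sha1(..., true): raw 20 bytes
    return hashlib.sha1(inner).hexdigest()                    # outer sha1: hex string


print(mysql_native_password_hash("password"))
```

MySQL itself stores and displays this value as `*` followed by the uppercase hex digest.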

MySQL 8.0 introduced two variants of a new hashing scheme based on SHA256, called caching_sha2_password and sha256_password, the former being the default (see the docs). Despite their names, neither appears to be vanilla SHA256.

Yes, I know SHA256 is not a great choice for password hashing, but it’s a lot better than SHA-1 and it wasn’t up to me!

Can anyone tell me the actual algorithms for these new schemes, in PHP or similar code?

What is meant by the term “concatenation of two q’s denotes a break between two edges in Turing Machine T”? [Universal Turing Machine Encoding Scheme]

I’m studying universal Turing machine encoding, and the first line says we can write a Turing machine’s encoding in the form of syllables like

$q_x c_y c_z M q_z$

where the q’s represent states, the c’s represent characters, and M denotes a move, either left or right.

I’ve understood what these lines mean, but what does $q_x q_z$ mean? Or $q_x q_x$? I’m quite confused: there’s no read/write or tape-head movement, so what does this stand for?

Is this security scheme using passwords, short-lived access JWTs, and long-lived refresh tokens a good way to secure a REST API?

I’m trying to secure a REST API that I’m using as a backend for a single-page application. The API provides access to read/create/modify/delete protected resources, based on a set of permissions managed by an administrator. What I’m thinking is the following:

  • All connections must be over HTTPS; plain HTTP requests will be redirected to HTTPS.
  • Users have a username and password, which they create.
  • A client submits a username/password to a /login route; if it’s a valid password for that user, the server will return a short-lived access token and a long-lived refresh token.
    • The access token will be a signed JWT that identifies the user and has an expiration time.
    • The refresh token will be a GUID corresponding to a row in a database table; this row will store the user ID.
  • When accessing protected routes (everything but /login), an access token will be required. The server will verify the signature, and if valid, will check the expiration time. If the token is not expired, the user ID will be made available to server-side code for authorization logic.
  • If the access token is expired, the client will automatically submit the refresh token to a /refresh endpoint for requesting a new access token. The server will check the database; if a corresponding row still exists, a new access token will be returned to the client.

Does this scheme sound secure?