Determining sample size of sectors in order to validate a data wipe procedure

I am doing some research into validating implementations of hardware-optimized data wipe procedures in solid state storage devices, such as the ATA8 Security Mode Erase and NVMe Secure Erase procedures.

As I have attempted to define what "success" means in this context I have established that a key measure would be that "it is possible to demonstrate a change in the value of a sector X of the storage medium between observations pre and post wipe."

The most rigorous approach to this would be to make a copy of all of the sectors, conduct the wipe procedure, then compare every sector’s new value with the reference copy and ensure that it is different. However, this is extremely time-consuming and only really practical in a lab environment.

At the opposite extreme, simply checking that the initial sectors of the medium, where the file-system structures are held, are no longer valid is not sufficient, as the actual data is easily recoverable in their absence.

The middle ground then appears to be to record a number of observations of sectors randomly selected from the medium, conduct the wipe, then compare. I believe the key to that is to determine in some formal fashion how many sectors to sample in order for there to be any confidence in the outcome.
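One possible way to formalize the sample size, offered as a sketch rather than an established procedure for wipe validation: assume sectors are drawn independently and uniformly at random, and suppose a fraction `p` of all sectors survived the wipe unchanged. The chance that `n` sampled sectors all show a change is then at most `(1 - p)^n`, so requiring that chance to be no more than `alpha` gives `n >= ln(alpha) / ln(1 - p)`. The function and parameter names below are illustrative assumptions.

```python
import math


def sectors_to_sample(unwiped_fraction: float, miss_probability: float) -> int:
    """Smallest n with (1 - p)^n <= alpha.

    unwiped_fraction (p): the smallest fraction of surviving sectors
        that the test should be able to detect.
    miss_probability (alpha): accepted chance that every sampled sector
        looks wiped even though a fraction p was left untouched.
    Assumes independent, uniformly random sector selection.
    """
    p, alpha = unwiped_fraction, miss_probability
    if not (0.0 < p < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1.0 - p))
```

Under these assumptions the result does not depend on the total number of sectors, only on `p` and `alpha`. It also says nothing about failures smaller than `p`, or about failures that are not spread in a way random sampling can hit.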

My understanding of sampling theory from college is all based upon sampling human populations using established models and tables, which I don’t think apply here. Accordingly, I am looking for suggestions as to techniques that can be applied to determine an appropriate sample size, or whether, due to the nature of the population, it is not possible to construct such a sample with any useful meaning. I think I understand that statistical models rely upon the ability to reason about other people you didn’t observe based upon those you did, and it’s not clear to me that in this case there is a way to reason about the state of other sectors based upon the ones you check. If that is the case, then perhaps all you are left with is making some arbitrary decision that X percent of sectors being wiped is sufficient according to some policy standard, but that feels unsatisfactory to me.

This might be a Statistics question rather than a Computer Science question, but I am more comfortable with CS terminology than stats, and I think an understanding of how storage devices work is important to understanding the question, so I decided to start here. If this would be better off asked elsewhere, please let me know.

If I can efficiently uniformly sample both $A$ and $B\subset A$, can I efficiently uniformly sample $A-B$?

As posed in the title: the statement naively seems like it should be self-evident, but no algorithm comes immediately to mind. Suppose I have some domain $A$ (in my case a subset of $\mathbb{Z}^n$ for some $n$, but I would expect any answer to be independent of that structure), and a domain $B\subset A$. If I have an efficient algorithm for uniformly sampling from $A$, and an efficient algorithm for uniformly sampling from $B$, is there any way of ‘combining’ these to get an efficient algorithm for uniformly sampling $A-B$? I can certainly rejection-sample, but if $|A-B|\ll|A|$ then there’s no guarantee that that will be efficient.
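For reference, the rejection-sampling baseline mentioned above can be sketched as follows. Note that it assumes a membership test for $B$ (the hypothetical `in_b` callable), which is a different primitive from the sampler for $B$ that the question assumes; `sample_a` and `max_tries` are likewise illustrative names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def sample_difference(
    sample_a: Callable[[], T],
    in_b: Callable[[T], bool],
    max_tries: int = 1_000_000,
) -> T:
    """Draw uniformly from A - B by rejection.

    sample_a: returns a uniform random element of A (assumed given).
    in_b: membership test for B (an assumption of this sketch).
    Each accepted draw is uniform on A - B, because every element of
    A - B is equally likely on each attempt.
    """
    for _ in range(max_tries):
        x = sample_a()
        if not in_b(x):
            return x
    raise RuntimeError("no element of A - B found within max_tries")
```

The expected number of attempts is $|A|/|A-B|$, which is exactly the cost that blows up when $|A-B|\ll|A|$.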

How can I generate a random sample of unique vertex pairings from an undirected graph, with uniform probability?

I’m working on a research project where I have to pair up entities together and analyze outcomes. Normally, without constraints on how the entities can be paired, I could easily select one random entity pair, remove it from the pool, then randomly select the next entity pair.

That would be like creating a random sample of vertex pairs from a complete graph.

However, this time around the undirected graph is now

  • incomplete
  • with a possibility that the graph might be disconnected.

I thought about using the above method but realized that my sample would not have a uniform probability of being chosen, as probabilities of pairings are no longer independent of each other due to uneven vertex degrees.

I’m banging my head against the wall over this. It’s best for the research that I generate a sample with uniform probability. Given that my graph has around n = 5000 vertices, is there an algorithm that I could use such that the resulting sample fulfils these conditions?

  1. There are no duplicates in the sample (each vertex in the graph is paired at most once).
  2. The remaining vertices that are not in the sample do not have an edge with each other. (They are unpaired and cannot be paired)
  3. The sample generated that meets the above two criteria should have a uniform probability of being chosen as compared to any other sample that fulfils the above two points.
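To make conditions 1 and 2 concrete, here is a minimal sketch of a checker together with a greedy pairing in the spirit of the "pick a random pair, remove it, repeat" method. The function names are illustrative, vertices are assumed hashable, and the graph is assumed to have no self-loops. The greedy method satisfies conditions 1 and 2 but, as noted above, not condition 3.

```python
import random


def greedy_pairing(edges, rng):
    """Visit the edges in random order and keep each edge whose two
    endpoints are both still unpaired. NOT uniform over valid samples."""
    order = list(edges)
    rng.shuffle(order)
    used = set()
    pairs = []
    for u, v in order:
        if u not in used and v not in used:
            pairs.append((u, v))
            used.update((u, v))
    return pairs


def satisfies_conditions(edges, pairs):
    """Check condition 1 (no vertex paired twice) and condition 2
    (no edge joins two unpaired vertices)."""
    paired = [v for pair in pairs for v in pair]
    if len(paired) != len(set(paired)):
        return False  # condition 1 violated
    paired_set = set(paired)
    # condition 2: every edge must touch at least one paired vertex
    return all(u in paired_set or v in paired_set for u, v in edges)
```

A small case shows the bias: on the path a-b-c-d there are exactly two valid samples, {(a,b), (c,d)} and {(b,c)}. The greedy method returns {(b,c)} only when that edge comes first in the shuffled order, which happens with probability 1/3 rather than 1/2.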

There appears to be some work on bipartite graphs, as seen in this Stack Overflow discussion. The algorithms described there obtain a near-uniform sample but don’t seem to apply to this case.

Algorithm to sample an even subgraph of a graph

In some problems related to the Ising model in physics and mathematics the following problem comes up:

Suppose I have a graph $G$. Then an even spanning subgraph of $G$ is a subgraph where you keep all the vertices and some of the edges such that each vertex has even degree. There is always at least one, since the empty spanning subgraph is always even. Now, among all the even spanning subgraphs of $G$, I want to sample one uniformly at random.

Is there a fast and preferably easy-to-implement algorithm to do that?

Is this a well studied problem? If yes: could you point me to some references?

Some background: The space of even spanning subgraphs of a graph has some nice structure, since if you have two of them then you can take their symmetric difference and it will still be an even spanning subgraph. This means that it is a vector space over the field $\mathbb{F}_2$, and you can pick a basis of that space; in particular this shows that the number of even spanning subgraphs is always a power of 2. I wonder how difficult it is to find the basis elements, since once you have them you just flip a coin for each and take the symmetric difference of all the basis graphs where you got heads. Another point is that there might be a smart low-tech randomized way to do this.
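One standard way to realize the basis idea, sketched here under stated assumptions rather than as a reference implementation: fix a spanning forest of $G$. Every subset of the non-forest edges extends in exactly one way, by adding forest edges, to an even spanning subgraph, so flipping a fair coin per non-forest edge and then repairing parities from the leaves upward gives a uniform sample. The function name, the vertex labelling `0..n-1`, and the edge-list input are illustrative choices.

```python
import random
from collections import deque


def sample_even_subgraph(n, edges, rng):
    """Return a uniformly random even spanning subgraph as a list of edges.

    n: number of vertices, labelled 0..n-1.
    edges: list of (u, v) pairs.
    """
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    # Build a BFS spanning forest, recording the visiting order.
    parent = [None] * n
    parent_edge = [None] * n
    seen = [False] * n
    order = []
    forest = set()
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v, idx in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    parent[v], parent_edge[v] = u, idx
                    forest.add(idx)
                    queue.append(v)

    # Fair coin flip for every non-forest edge.
    chosen = set()
    odd = [False] * n
    for idx, (u, v) in enumerate(edges):
        if idx not in forest and rng.random() < 0.5:
            chosen.add(idx)
            odd[u] = not odd[u]
            odd[v] = not odd[v]

    # Repair parities bottom-up: children are handled before parents.
    for v in reversed(order):
        if parent[v] is not None and odd[v]:
            chosen.add(parent_edge[v])
            odd[v] = False
            odd[parent[v]] = not odd[parent[v]]
    return [edges[i] for i in sorted(chosen)]
```

The roots end up even automatically, because the degree sum within each component is even. The running time is linear in the number of vertices plus edges.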

How to uniformly sample a sorted simplex

I am looking for an algorithm to uniformly generate a descending array of N random numbers, such that the sum of the N numbers is 1 and all numbers lie between 0 and 1. For example, for N = 3, the random point (x, y, z) should satisfy:

x + y + z = 1
0 <= x <= 1
0 <= y <= 1
0 <= z <= 1
x >= y >= z

My guess is that all I have to do is uniformly sample a simplex (Uniform sampling from a simplex), and then sort the elements. But I’m not sure whether the resulting sampling algorithm is uniform.

Also, rejection sampling is not ideal for me, because I’ll use this in high dimensions.
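The sample-then-sort idea can be sketched as below. The symmetry argument behind it: permuting coordinates maps the simplex to itself and preserves the uniform distribution, and the N! orderings split the simplex into congruent pieces that overlap only on ties (a set of measure zero), so sorting folds a uniform sample onto the sorted piece uniformly. The function name is illustrative, and normalizing i.i.d. exponential variates is one standard way to draw uniformly from the simplex.

```python
import random


def sample_sorted_simplex(n, rng):
    """Uniform point on {x : sum(x) = 1, x_1 >= ... >= x_n >= 0}.

    Step 1: normalized i.i.d. Exp(1) variates are uniform on the simplex.
    Step 2: sorting in descending order folds the sample onto the
            sorted region without rejecting anything.
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return sorted((d / total for d in draws), reverse=True)
```

The cost is O(N log N) per sample with no rejection step, so it stays practical in high dimension.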

Thanks!

John the ripper – ecryptfs – sample not cracked: 0 password hashes cracked

Good morning all,

I tried to use john the ripper on the sample : ecryptfs_sample_metadata.tar (password is ‘openwall’)

which I downloaded here: https://openwall.info/wiki/john/sample-non-hashes

The password is openwall.

If I try

sudo john ecryptfs_sample_metadata.tar --progress-every=10 --mask='openwal?l' 

The result is:

Warning: detected hash type "mysql", but the string is also recognized as "oracle"
Use the "--format=oracle" option to force loading these as that type instead
Warning: detected hash type "mysql", but the string is also recognized as "pix-md5"
Use the "--format=pix-md5" option to force loading these as that type instead
Using default input encoding: UTF-8
Loaded 1 password hash (mysql, MySQL pre-4.1 [32/64])
Warning: no OpenMP support for this hash type, consider --fork=4
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:00  0g/s 185.7p/s 185.7c/s 185.7C/s openwala..openwalq
Session completed

If I try --show, the result is:

0 password hashes cracked, 1 left 

I tried adding

--format=oracle  

or

--format=pix-md5  

with the same result.

Does anyone have an idea why the password is not cracked?

How can I determine if a malware sample is morphic? (polymorphic, metamorphic, etc)

I want to do a malware test that specifically uses recent morphic malware samples (polymorphic, metamorphic, etc). There are a couple of good sources I can pull samples from, but I need to know if their signature will change or not.

The best idea I have so far is to use a tool to disassemble it so I can look at the Assembly code. Then get it to propagate and look at the code to see if there is a change.

Does anyone know of a better way to do this? I’m not even sure of a reliable way to make it propagate.

Does there exist an algorithm to generate the production rules of a CFG, given sample productions?

Let’s say we provide the algorithm with a set of token strings.

e.g.

x + y - z
x - x - x

It will then try to generate a CFG which fits all the provided examples

S -> S O T | T
T -> x | y | z
O -> + | -
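Whatever method proposes the grammar, checking that a candidate "fits all the provided examples" is the easy half. The sketch below is a small memoized recognizer for the grammar above. The dictionary encoding and the function name are illustrative, and it assumes the grammar has no empty productions and no cycles of unit rules, which holds here.

```python
from functools import lru_cache

# The example grammar; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": [("S", "O", "T"), ("T",)],
    "T": [("x",), ("y",), ("z",)],
    "O": [("+",), ("-",)],
}


def derives(grammar, start, tokens):
    """Return True if `start` can derive exactly the token sequence."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def symbol(sym, i, j):
        # Can `sym` derive tokens[i:j]?
        if sym not in grammar:
            return j == i + 1 and tokens[i] == sym
        return any(sequence(rhs, i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        # Can the symbols in `rhs`, in order, derive tokens[i:j]?
        if len(rhs) == 1:
            return symbol(rhs[0], i, j)
        # The first symbol takes tokens[i:k]; every remaining symbol
        # must still be left with at least one token.
        return any(
            symbol(rhs[0], i, k) and sequence(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return symbol(start, 0, len(tokens))
```

For instance, `derives(GRAMMAR, "S", "x + y - z".split())` checks the first example string against the grammar.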

It feels like a data compression problem but I could be wrong.

Does anybody know any existing literature or a starting point to solve this problem?

Does this problem have a name? What should I Google?

Need code to convert some sample text to .vtt format

I need Python code to convert some sample text from Notepad to .vtt format. A sample of the text I want to convert is given further below.

After the conversion, the text should look like the .vtt example given below.

WEBVTT

00:00.210 --> 00:00.930 Hi there.

00:00.940 --> 00:06.110 So would you be willing to take a look at mathematical operation on me in my lab.

00:06.120 --> 00:10.700 So first thing we are going to do is to create fullest Ari.

00:11.010 --> 00:26.070 Let’s say one hour later he can dance on 3 4 and 6 AM we going to create another four or eight B that

00:26.070 --> 00:29.620 contains five six eight.

00:30.030 --> 00:39.490 Make sure that you can either use space or comma to separate your Each element in the key.

An example of the sample text (before the conversion) that I want to convert to .vtt format is:

0:00I’ve got a transformation, m that’s a mapping from Rn
0:06Rn, and it can be represented by the matrix A.
0:10So the transformation of x is equal to A times x.
0:14We saw in the last video it’s interesting to find the
0:17vectors that only get scaled up or down by the
0:20transformation.
0:21So we’re interested in the vectors where I take the
0:23transformation of some special vector v.
0:27It equals of course, A times v.
0:29And we say it only gets scaled up by some factor,
0:32lambda times v.
0:34And these are interesting because they make for
0:35interesting basis vectors.
0:38You know, the transformation matrix in the alternate
0:40basis– this is one of the basis vectors.
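A minimal sketch of such a conversion, under several assumptions that the sample does not confirm: every caption starts with an `m:ss` stamp, captions never contain text that looks like a stamp, each cue ends where the next one starts, and the final cue gets an arbitrary duration (`last_cue_seconds`). All function and parameter names are illustrative.

```python
import re

# A stamp is one or more minute digits, a colon, and two second digits.
STAMP = re.compile(r"(\d+:\d{2})")


def to_vtt_time(seconds):
    """Format whole seconds as a WebVTT timestamp (hours only if needed)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{secs:02d}.000"
    return f"{minutes:02d}:{secs:02d}.000"


def transcript_to_vtt(text, last_cue_seconds=4):
    """Convert 'm:ssCaption m:ssCaption ...' text into a WebVTT string."""
    # Splitting on a capturing group yields
    # [prefix, stamp, caption, stamp, caption, ...].
    parts = STAMP.split(text)
    cues = []
    for stamp, caption in zip(parts[1::2], parts[2::2]):
        minutes, secs = stamp.split(":")
        start = int(minutes) * 60 + int(secs)
        cues.append((start, " ".join(caption.split())))

    lines = ["WEBVTT", ""]
    for k, (start, caption) in enumerate(cues):
        # Each cue ends when the next begins; the last one uses a guess.
        end = cues[k + 1][0] if k + 1 < len(cues) else start + last_cue_seconds
        lines.append(f"{to_vtt_time(start)} --> {to_vtt_time(end)}")
        lines.append(caption)
        lines.append("")
    return "\n".join(lines)
```

To use it on a file, read the text saved from Notepad with `open(path, encoding="utf-8").read()`, pass it to `transcript_to_vtt`, and write the result to a `.vtt` file.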