## How to create a matrix from a generating set

I am trying to form a matrix from a generating set, say v1, v2, v3. I want the code to take the vectors in the set and build a matrix whose rows are the zero vector, each of the vectors themselves, and the combinations v1+v2, v1+v3, v2+v3 and v1+v2+v3, i.e. all possible linear combinations with 0 and 1 as coefficients. This is my first time coding and I feel like this should be doable, but I cannot seem to figure it out.

I’ve tried using for loops, but I end up with repeats. I’ve also been able to do it by writing out each of the operations by hand, but this isn’t feasible for generating sets with many vectors.
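A minimal sketch of one way to enumerate these rows without repeats, in Python: `itertools.product((0, 1), repeat=n)` yields every 0/1 coefficient tuple exactly once, so each combination is built once. The function name `binary_combinations` and the optional `modulus` argument are assumptions made for illustration; pass `modulus=2` only if the vectors are meant to live over GF(2), which the question does not state.

```python
from itertools import product


def binary_combinations(vectors, modulus=None):
    """Return all linear combinations of `vectors` with coefficients in {0, 1}.

    Rows are ordered by coefficient tuple, starting with the zero vector.
    `modulus` is a hypothetical option: if given, entries are reduced mod it.
    """
    length = len(vectors[0])
    rows = []
    for coeffs in product((0, 1), repeat=len(vectors)):
        # One row per coefficient tuple: sum of c_k * v_k, entry by entry.
        row = [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(length)]
        if modulus is not None:
            row = [entry % modulus for entry in row]
        rows.append(row)
    return rows


v1, v2, v3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
matrix = binary_combinations([v1, v2, v3])  # 2**3 = 8 rows
```

For n generating vectors this produces 2**n rows, so the output grows quickly with n. Each coefficient tuple is used once, but two rows can still coincide if the vectors are linearly dependent.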

## Newbies Guide To Generating Traffic for \$1

by: jordanng

## Generating fake number for a 25 digit PII number in a file containing millions of rows

I have to expose some data that includes one PII column holding a 25-digit number; the rest of the columns aren’t PII. The goal is to share the data safely with a larger audience without the original PII values. If required, though, I need to be able to check the original value, so a lookup file is needed that maps each PII value to its pseudo number.

How do I generate a unique pseudo number such that we can later map back to the original data if required?

Currently there are about 22 million rows. There could be at most 50 million such rows later on, as the data keeps coming in.

I was thinking of UUIDs, but they aren’t really human-friendly, and UUIDs would index poorly later on if we move to a database (overthinking much?). Also, joining two dataframes on the index could be slow, I reckon.

My current thought process using pandas (for the first file containing 22 million rows):

1. shuffle the lines with pandas (assuming it fits in memory)
2. add a column with an auto-incrementing field (say pseudo_number)
3. add another column with UUIDs (UUID4)
4. create a lookup file with our new pseudo_number, UUID, and original PII number

Now, when new PII data comes in:

1. read the highest value of the pseudo_number from the lookup file
2. use that (+1) as the starting number for the above process on the new data

TL;DR: I need to generate unique random numbers for a PII column in a file containing 22 million rows while maintaining a lookup file. Later, this would need to be imported into a database once the system grows.

Some initial code:

```python
>>> import uuid
>>> import pandas as pd

# dummy list
>>> l = [('C0000005', 'RB', 'C0036775', '')] * 27000000
# create a sample dataframe to represent our data of 22+ million rows
>>> df = pd.DataFrame(l, columns=list('abcd'))
# let the following 'sensitive_col' column represent our 25 digit number for now
>>> df['sensitive_col'] = df.index + 123456789

>>> df.head()
          a   b         c        d  sensitive_col
0  C0000005  RB  C0036775  D185368      123456789
1  C0000005  RB  C0036775  D185368      123456790
2  C0000005  RB  C0036775  D185368      123456791
3  C0000005  RB  C0036775  D185368      123456792
4  C0000005  RB  C0036775  D185368      123456793

# actual code!!
# shuffle the rows
>>> df = df.sample(frac=1).reset_index(drop=True)
>>> df['New_ID'] = df.index + 123
# create the UUIDs
>>> df['uuid'] = [uuid.uuid4() for _ in range(len(df.index))]

>>> df.head()
          a   b         c        d  sensitive_col  New_ID                                  uuid
0  C0000005  RB  C0036775  D185368      132571068     123  8c1974cf-49ff-4b87-bfac-b791156d1b1b
1  C0000005  RB  C0036775  D185368      130859684     124  2a170f08-43a9-4a1d-acf5-b537a229c7e9
2  C0000005  RB  C0036775  D185368      135318849     125  5b265c8e-35ea-4100-bac0-c77f4d3f85ea
3  C0000005  RB  C0036775  D185368      145963082     126  77e2e78c-c72a-4738-907a-9e4851a328d2
4  C0000005  RB  C0036775  D185368      141664707     127  de73b056-6c5e-4276-8b93-db44cd9990ba
```

Any suggestions?
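One alternative to the shuffle-plus-counter idea, sketched here with the standard library only: draw a random fixed-width number per new PII value and keep a lookup that is extended batch by batch. The function name `assign_pseudo_ids`, the `digits=12` width, and the in-memory dict standing in for the lookup file are all assumptions for illustration, not a tested design for 50 million rows.

```python
import secrets


def assign_pseudo_ids(pii_values, lookup, digits=12):
    """Extend `lookup` (PII value -> pseudo ID) with a random ID per unseen value.

    `digits` is a hypothetical width; IDs are drawn from [0, 10**digits).
    """
    used = set(lookup.values())
    for value in pii_values:
        if value in lookup:
            continue  # already mapped in an earlier batch
        candidate = secrets.randbelow(10 ** digits)
        while candidate in used:  # retry on the rare collision
            candidate = secrets.randbelow(10 ** digits)
        used.add(candidate)
        lookup[value] = candidate
    return lookup


lookup = {}
assign_pseudo_ids(["1234567890123456789012345", "9876543210987654321098765"], lookup)
first_batch = dict(lookup)
# A later batch: known values keep their IDs, new values get fresh ones.
assign_pseudo_ids(["1234567890123456789012345", "5555555555555555555555555"], lookup)
```

Because the IDs are random rather than sequential, a new batch does not need the current maximum from the lookup file, only the set of IDs already in use.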

## Algorithm for generating random incrementing numbers up to a limit

I’m trying to write code that generates incremental sequences of numbers such as:

```
0 + 1 + 3 + 5 + 1 = 9
0 + 5 + 1 + 1 + 1 = 8
0 + 1 + 1 + 2 + 1 = 5
```

I have 3 constraints:

1) I need to have a limited number of addends (here it is `=5`)

2) the final sum must be smaller than a certain limit (here the limit is `<9`)

For now, I generate sequences randomly and keep only the suitable ones. For 2-digit numbers and for long sequences (`>8`), my algorithm takes significant time.

Is there a better algorithm for such a problem?

At least, could you tell me which branch of CS studies such problems?

UPDATE (algorithm):

```
0) array = [0,]; // initial array
1) if sum(array) < 99, continue
2) generate random number in [1..99], let's say rand = 24
3) rand = array[-1] + rand // add random number to last value of array
4) array.push(rand) // add the random number to array
5) goto 1)
6) if length(array) < 5, goto 0) // 5 is desired sequence length
```
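Rejection sampling can be avoided under at least one reading of the problem. A minimal sketch, assuming the goal is a strictly increasing sequence of partial sums (like the `array` in the pseudocode above) with a fixed number of positive addends and a final sum below the limit: draw that many distinct integers from `[1, limit)` and sort them. The function names are hypothetical, and the sketch treats the leading `0` as the starting value rather than as one of the addends.

```python
import random


def random_partial_sums(count, limit, rng=random):
    """Return `count` strictly increasing integers in [1, limit).

    Every valid sequence is equally likely, and no candidate is ever rejected.
    """
    if count > limit - 1:
        raise ValueError("not enough room below the limit for that many addends")
    return sorted(rng.sample(range(1, limit), count))


def to_addends(partial_sums):
    """Convert partial sums back to the positive steps between them."""
    addends, previous = [], 0
    for value in partial_sums:
        addends.append(value - previous)
        previous = value
    return addends


sums = random_partial_sums(4, 10)  # four positive addends, total below 10
steps = to_addends(sums)
```

The cost is one sample and one sort per sequence, regardless of how tight the limit is.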

## Why can’t we generate the output of a cellular automata at time step t without first generating all the preceding states

Why can’t we generate the output of a cellular automaton at time step t without first generating all the preceding states? Why can we do this for some functions? What features of a function does it relate to? Is it about whether it can be graphed as a line or a curve? Why can’t we come up with a function that takes in a cellular automaton ruleset and a timestep t and gives us the output in constant time with respect to t?
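Whether such a shortcut exists depends on the rule. A minimal sketch for one rule where it does: Rule 90 (each cell becomes the XOR of its two neighbours), started from a single live cell, reproduces Pascal’s triangle modulo 2, so the cell at offset `x` after `t` steps is the parity of a binomial coefficient, which a bitwise test gives directly (Lucas’ theorem). The function names are illustrative, and nothing here extends to rules in general; Rule 110, for instance, is known to be Turing-complete, and no comparable shortcut is known for it.

```python
def rule90_simulate(t):
    """Run Rule 90 for t steps from one live cell; return {offset: state}."""
    width = 2 * t + 3  # one padding cell per side, never reached within t steps
    center = width // 2
    cells = [0] * width
    cells[center] = 1
    for _ in range(t):
        # Each interior cell becomes the XOR of its left and right neighbours.
        cells = [0] + [cells[i - 1] ^ cells[i + 1] for i in range(1, width - 1)] + [0]
    return {x: cells[center + x] for x in range(-t, t + 1)}


def rule90_closed_form(t, x):
    """State of the cell at offset x after t steps, without simulating."""
    if abs(x) > t or (t + x) % 2:
        return 0
    k = (t + x) // 2
    # C(t, k) is odd exactly when k and t - k share no set bit.
    return 1 if (k & (t - k)) == 0 else 0


simulated = rule90_simulate(16)
direct = {x: rule90_closed_form(16, x) for x in range(-16, 17)}
```

The simulation does work proportional to t squared, while the closed form needs only a handful of integer operations per cell.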

## What is the generating algorithm for the “komb” instances found on satcompetition.org?

For the 2017 and 2018 Random SAT Tracks of the SAT Competition run by the International Conference on Theory and Applications of Satisfiability Testing there are small, yet difficult, random 3-SAT problems with planted solutions labeled “barthel,” “qhid,” and “komb.” I have been able to determine that “barthel” is referring to this procedure, and “qhid” is referring to this procedure. The “komb” instances may refer to this procedure, based on the references given in the article here on pages 60-62. Note on page 61, in the first sentence of section “Our Approach,” 3 references are given, namely, the first 3 references given above. Can anyone confirm the procedure for generating the “komb” instances in the SAT Competitions? (See satcompetition.org for more details.)

## Generating random sparse matrix

My goal is to generate a large sparse 0/1 matrix in which the vast majority (~99%) of entries are zeros. Ideally, I would be working with 10,000 rows and 10,000,000 columns. Additionally, each column is generated as a sequence of Bernoulli samples with a column-specific probability. So far, I’ve implemented 3 ways to generate the data:

Function 1

Creating basic dense matrix of 0/1:

```r
spMat_dense <- function(ncols,nrows,col_probs){
  matrix(rbinom(nrows*ncols,1,col_probs),
         ncol=ncols,byrow=T)
}
```

Function 2

Using `Rcpp`:

```cpp
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace std;
using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
arma::sp_mat spMat_cpp(const int& ncols, const int& nrows, const NumericVector& col_probs){

  IntegerVector binom_draws = no_init(nrows);
  IntegerVector row_pos;
  IntegerVector col_pos;
  int nz_counter=0;

  //Generate (row,cell)-coordinates of non-zero values
  for(int j=0; j<ncols; ++j){
    binom_draws = rbinom(nrows,1,col_probs[j]);
    for(int i=0; i<nrows; ++i){
      if(binom_draws[i]==1){
        row_pos.push_back(i);
        col_pos.push_back(j);
        nz_counter += 1;
      }
    }
  }

  //Create a 2 x N matrix - indicates row/col positions for N non-zero entries
  arma::umat loc_mat(2,nz_counter);

  for(int i=0;i<nz_counter; ++i){
    loc_mat(0,i) = row_pos[i];
    loc_mat(1,i) = col_pos[i];
  }

  IntegerVector x_tmp = rep(1,nz_counter);
  arma::colvec x = Rcpp::as<arma::colvec>(x_tmp);

  //sparse matrix constructor
  arma::sp_mat out(loc_mat,x);
  return out;
}
```

Function 3

Using `dgCMatrix` construction in `Matrix` package:

```r
spMat_dgC <- function(ncols,nrows,col_probs){
  #Credit to Andrew Guster (https://stackoverflow.com/a/56348978/4321711)
  require(Matrix)
  mat <- Matrix(0, nrows, ncols, sparse = TRUE)  #blank matrix for template
  i <- vector(mode = "list", length = ncols)     #each element of i contains the '1' rows
  p <- rep(0, ncols)                             #p will be cumsum no of 1s by column
  for(r in 1:nrows){
    row <- rbinom(ncols, 1, col_probs)           #random row
    p <- p + row                                 #add to column identifier
    if(any(row == 1)){
      for (j in which(row == 1)){
        i[[j]] <- c(i[[j]], r-1)                 #append row identifier
      }
    }
  }
  p <- c(0, cumsum(p))                           #this is the format required
  i <- unlist(i)
  x <- rep(1, length(i))
  mat@i <- as.integer(i)
  mat@p <- as.integer(p)
  mat@x <- x
  return(mat)
}
```

Benchmarking

```r
ncols = 100000
nrows = 1000
col_probs = runif(ncols, 0.001, 0.002)

microbenchmark::microbenchmark(spMat_dense(ncols=ncols,nrows=nrows,col_probs=col_probs),
                               spMat_cpp(ncols=ncols,nrows=nrows,col_probs = col_probs),
                               spMat_dgC(ncols=ncols,nrows=nrows,col_probs=col_probs),
                               times=5L)

Unit: seconds
                                                             expr       min        lq      mean   median        uq       max neval
 spMat_dense(ncols = ncols, nrows = nrows, col_probs = col_probs)  6.527836  6.673515  7.260482  7.13241  7.813596  8.155053     5
   spMat_cpp(ncols = ncols, nrows = nrows, col_probs = col_probs) 56.726238 57.038976 57.841693 57.24435 58.325564 59.873333     5
   spMat_dgC(ncols = ncols, nrows = nrows, col_probs = col_probs)  6.541939  6.599228  6.938952  6.62452  7.402208  7.526867     5
```

Interestingly, my `Rcpp` code is not as fast as I thought it would be. I’m not entirely sure why it’s slower than the basic, dense construction. The advantage of the `Rcpp` and `dgCMatrix` constructions, however, is that they don’t create a dense matrix first. The memory used is much less:

```r
ncols = 100000
nrows = 1000
col_probs = runif(ncols, 0.001, 0.002)

mat1 <- spMat_dense(ncols=ncols,nrows=nrows,col_probs=col_probs)
mat2 <- spMat_cpp(ncols=ncols,nrows=nrows,col_probs = col_probs)
mat3 <- spMat_dgC(ncols=ncols,nrows=nrows,col_probs=col_probs)

object.size(mat1)
object.size(mat2)
object.size(mat3)

> object.size(mat1)
400000216 bytes
> object.size(mat2)
2199728 bytes
> object.size(mat3)
2205920 bytes
```

Question

What is it about my `Rcpp` code that makes it slower than the other two? Is it possible to optimize or is the well-written R code with `dgCMatrix` as good as it gets?
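All three functions above draw one Bernoulli sample per cell, so their cost scales with `nrows * ncols` even though about 99.9% of the draws are zeros. A different technique, sketched here in plain Python rather than R or C++, is geometric skipping: within a column, jump straight from one `1` to the next by sampling the gap length. The function names are hypothetical, and the `(indptr, indices)` pair is only meant to mirror the `p` and `i` slots of a `dgCMatrix`; this has not been benchmarked against the code above. Separately, `push_back` on Rcpp vectors is often cited as slow because it copies the vector on each call, which may be one factor in the timings, though that is not measured here.

```python
import math
import random


def bernoulli_ones(nrows, p, rng):
    """Sorted row indices of the 1s among `nrows` independent Bernoulli(p) draws."""
    if p <= 0.0:
        return []
    if p >= 1.0:
        return list(range(nrows))
    log_q = math.log1p(-p)  # log(1 - p), always negative here
    rows = []
    i = -1
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        i += 1 + int(math.log(u) / log_q)  # skip a geometric number of zeros
        if i >= nrows:
            return rows
        rows.append(i)


def sparse_bernoulli_csc(nrows, col_probs, seed=None):
    """Column-compressed layout: column j owns indices[indptr[j]:indptr[j + 1]]."""
    rng = random.Random(seed)
    indptr, indices = [0], []
    for p in col_probs:
        indices.extend(bernoulli_ones(nrows, p, rng))
        indptr.append(len(indices))
    return indptr, indices


indptr, indices = sparse_bernoulli_csc(1000, [0.001, 0.002, 0.0, 1.0], seed=1)
```

With this approach the number of random draws per column is roughly the number of non-zeros plus one, instead of `nrows`.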

## Mysql – generating trends of rank

I have a table of student scores:

```
ID_STUDENT | SCORE
------------------
     1     |  90
     1     |  80
     2     |  99
     3     |  80
     4     |  70
     5     |  78
     6     |  90
     6     |  50
     7     |  90
```

Let’s say that on the first day I compute the rank and store it in a column called RANK:

```
ID_STUDENT | SCORE  | RANK
----------------------------
     3     |  99    |  1
     1     |  90    |  2
     7     |  90    |  2
     9     |  90    |  2
     2     |  80    |  3
     4     |  80    |  3
     6     |  78    |  4
     5     |  70    |  5
     8     |  50    |  6
```

On the second day I’ll refresh the scores and recompute the rank. However, I also need to keep the immediately preceding rank, like this:

```
ID_STUDENT | SCORE  | RANK | OLD_RANK
---------------------------------------
     2     |  99    |  1   |  3
     8     |  92    |  2   |  6
     1     |  90    |  3   |  2
     9     |  90    |  3   |  2
     3     |  80    |  4   |  1
     4     |  80    |  4   |  3
     6     |  78    |  5   |  4
     5     |  70    |  5   |  5
     7     |  40    |  6   |  2
```

Using this result, I would be able to find the rank trend, much as you can see a song’s position going up or down over a week here: https://www.billboard.com/charts/hot-100

How can I achieve this using straightforward DML queries?
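A minimal sketch of the two-statement idea (copy the current rank into the old-rank column, then recompute a dense rank), run through Python’s built-in `sqlite3` as a stand-in for MySQL. The table layout, the column names `cur_rank` and `old_rank`, and one row per student are assumptions. MySQL restricts subqueries that read the table being updated, so the second `UPDATE` would likely need rewriting there, for example through a derived table or a window function.

```python
import sqlite3


def refresh_ranks(conn):
    """Move the current rank into old_rank, then recompute a dense rank by score."""
    conn.execute("UPDATE scores SET old_rank = cur_rank")
    conn.execute(
        """
        UPDATE scores
        SET cur_rank = 1 + (
            SELECT COUNT(DISTINCT s2.score)
            FROM scores AS s2
            WHERE s2.score > scores.score
        )
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (id_student INTEGER PRIMARY KEY, score INTEGER, "
    "cur_rank INTEGER, old_rank INTEGER)"
)
conn.executemany(
    "INSERT INTO scores (id_student, score) VALUES (?, ?)",
    [(1, 90), (2, 80), (3, 99)],
)
refresh_ranks(conn)  # day 1: ranks computed, old_rank still NULL

# Day 2: scores change, then the same refresh keeps yesterday's rank in old_rank.
conn.execute("UPDATE scores SET score = 99 WHERE id_student = 2")
conn.execute("UPDATE scores SET score = 80 WHERE id_student = 3")
refresh_ranks(conn)

rows = conn.execute(
    "SELECT id_student, score, cur_rank, old_rank FROM scores "
    "ORDER BY cur_rank, id_student"
).fetchall()
```

The trend for each student is then simply `old_rank - cur_rank`: positive means the student moved up.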

## Generating graphics on a canvas

I wrote some JavaScript to generate some graphics on a canvas, on a regular HTML page.

I now want the same code to run in a component that is part of a React app. I’ve done this:

```javascript
  componentDidMount() {
    const canvas = this.refs.firstCanvas
    const ctx = canvas.getContext('2d')
    const bgCanvas = this.refs.firstbgCanvas
    const bgCtx = bgCanvas.getContext('2d')

    function generateStarfield() {
      bgCtx.clearRect(0, 0, bgCanvas.width, bgCanvas.height)
      for (let i = 0; i < 2000; i++) {
        bgCtx.beginPath()
        const x = Math.random()*bgCanvas.width
        const y = Math.random()*bgCanvas.height
        bgCtx.arc(x, y, 0.35, 0, 2*Math.PI, 'anticlockwise')
        bgCtx.closePath()
        bgCtx.fillStyle = 'white'
        bgCtx.fill()
      }
    }
    generateStarfield()

    const origin_x = 1000
    const origin_y = 1000
    const scale = 1000

    class planet {
      constructor(orbital_velocity, offset_theta, orbital_radius, radius, colour) {
        this.orbital_velocity = orbital_velocity
        this.orbital_radius = orbital_radius
        this.colour = colour
        this.radius = radius
        this.offset_theta = offset_theta
        this.draw()
      }

      draw() {
        const theta = this.offset_theta + (t*this.orbital_velocity)
        const x = origin_x + (this.orbital_radius*Math.cos(theta))
        const y = origin_y + (this.orbital_radius*Math.sin(theta))
        ctx.beginPath()
        ctx.arc(x, y, this.radius, 0, 2*Math.PI, 'anticlockwise')
        ctx.closePath()
        ctx.fillStyle = this.colour
        ctx.fill()
      }
    }

    let t = 0

    const a = new planet((0.2+0.4*Math.random())*Math.PI, 2*Math.random()*Math.PI, scale*0.1, 10, 'white')
    const b = new planet((0.2+0.4*Math.random())*Math.PI, 2*Math.random()*Math.PI, scale*0.2, 16, 'white')
    const c = new planet((0.2+0.4*Math.random())*Math.PI, 2*Math.random()*Math.PI, scale*0.3, 18, 'white')
    const d = new planet((0.2+0.4*Math.random())*Math.PI, 2*Math.random()*Math.PI, scale*0.4, 14, 'white')
    const e = new planet((0.2+0.4*Math.random())*Math.PI, 2*Math.random()*Math.PI, scale*0.5, 12, 'white')
    const f = new planet((0.2+0.4*Math.random())*Math.PI, 2*Math.random()*Math.PI, scale*0.6, 28, 'white')
    const g = new planet((0.2+0.4*Math.random())*Math.PI, 2*Math.random()*Math.PI, scale*0.7, 22, 'white')
    const h = new planet((0.2+0.4*Math.random())*Math.PI, 2*Math.random()*Math.PI, scale*0.8, 20, 'white')

    setInterval(function() {
      ctx.clearRect(0, 0, canvas.width, canvas.height)
      ctx.beginPath()
      a.draw()
      b.draw()
      c.draw()
      d.draw()
      e.draw()
      f.draw()
      g.draw()
      h.draw()
      t += 0.05
    }, 40)
  }
```

Now, this works (!), but I’ve just lifted it directly from the original JavaScript and changed the `document.getElementById` calls to use `ref`s. Is it bad practice to stick all of this stuff into `componentDidMount`?

Note: I’m aware that the code itself could be tidied up. I’m mainly asking about putting all of this (especially the definitions of `planet` and `generateStarfield`) into `componentDidMount`.

## What is the underlying logic for generating an icon map for a custom registered layout (in Layout Builder)?

There seems to be a dearth of information on the internet about how to generate an `icon_map` for newly registered layouts in Layout Builder.

I’ve gotten as far as finding a patch file from the project that includes a poorly-documented comment that unfortunately doesn’t shed much light on this:

```php
/**
 * Builds a render array representation of an SVG based on an icon map.
 *
 * @param string[][] $icon_map
 *   A two-dimensional array representing the visual output of the layout.
 *   For the following shape:
 *   |------------------------------|
 *   |                              |
 *   |             100%             |
 *   |                              |
 *   |-------|--------------|-------|
 *   |       |              |       |
 *   |       |      50%     |  25%  |
 *   |       |              |       |
 *   |  25%  |--------------|-------|
 *   |       |                      |
 *   |       |         75%          |
 *   |       |                      |
 *   |------------------------------|
 *   The corresponding array would be:
 *   - ['top']
 *   - ['first', 'second', 'second', 'third']
 *   - ['first', 'bottom', 'bottom', 'bottom'].
 ...
```

It doesn’t say why we’d use keywords like `first`, `third`, and `bottom`, or even what each term actually means to the Layout Builder module. There doesn’t seem to be any rhyme or reason as to how this is supposed to work, and the documentation is severely lacking.

What is the underlying logic behind the generation of icon maps, and how does it actually work?
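One reading of the docblock example, sketched in Python rather than PHP: treat the icon map as a grid in which every row gets an equal share of the height, every cell an equal share of its row’s width, and cells carrying the same label merge into one rectangle. Under that reading the labels are just names that tie cells together. This is an interpretation of the comment, not the module’s actual implementation, and `icon_map_regions` is a hypothetical helper.

```python
from fractions import Fraction


def icon_map_regions(icon_map):
    """Return {label: (x, y, width, height)} as fractions of the whole icon."""
    boxes = {}
    row_height = Fraction(1, len(icon_map))
    for r, row in enumerate(icon_map):
        cell_width = Fraction(1, len(row))
        for c, label in enumerate(row):
            x0, y0 = c * cell_width, r * row_height
            x1, y1 = x0 + cell_width, y0 + row_height
            if label in boxes:
                # Grow the existing rectangle to cover this cell as well.
                bx0, by0, bx1, by1 = boxes[label]
                x0, y0 = min(x0, bx0), min(y0, by0)
                x1, y1 = max(x1, bx1), max(y1, by1)
            boxes[label] = (x0, y0, x1, y1)
    return {
        label: (x0, y0, x1 - x0, y1 - y0)
        for label, (x0, y0, x1, y1) in boxes.items()
    }


regions = icon_map_regions([
    ['top'],
    ['first', 'second', 'second', 'third'],
    ['first', 'bottom', 'bottom', 'bottom'],
])
```

For the map in the docblock this yields widths of 100% for `top`, 25% for `first` (spanning two rows), 50% for `second`, 25% for `third`, and 75% for `bottom`, which matches the drawing.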