Delete rows or columns of a matrix containing invalid elements, such that a maximum number of valid elements is kept

Originally posted on Stack Overflow, but I was told to post here.

Context: I am doing a PCA on an MxN (N >> M) matrix that contains some invalid values. I cannot infer these values, so I need to remove all of them, which means deleting each corresponding row or column. Of course, I want to keep the maximum amount of data. The invalid entries represent ~30% of the data, but most of them are concentrated in a few lines that are almost completely filled with them, while the rest are scattered across the matrix.

Some possible approaches:

  • Similar to this problem, where I format my matrix so that valid entries equal 1 and invalid entries a huge negative number. However, all proposed solutions there have exponential complexity, and my problem is simpler.

  • Computing the ratio of invalid to valid data for each row and column, deleting the one(s) with the highest ratio, recomputing the ratios for the sub-matrix, removing the highest again (I am not sure how many rows or columns can safely be removed in one step), and so on until no invalid data is left. This seems like a reasonable heuristic, but I am not sure it always gives the optimal solution.

My guess is that this is a standard data-analysis problem, but surprisingly I could not find a solution online.
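The second approach above is easy to prototype. Below is a minimal sketch in Python (the function name and the boolean-mask input format are my own choices); note it implements exactly the greedy ratio heuristic described, which is not guaranteed to be optimal:

```python
def greedy_clean(valid):
    """Greedily delete the row or column with the worst invalid ratio
    until no invalid entries remain. `valid` is a list of lists of
    booleans (True = valid entry). Returns surviving row and column
    indices. Greedy removal is a heuristic, not guaranteed optimal."""
    rows = set(range(len(valid)))
    cols = set(range(len(valid[0])))
    while True:
        bad_rows = {r: sum(not valid[r][c] for c in cols) for r in rows}
        bad_cols = {c: sum(not valid[r][c] for r in rows) for c in cols}
        if sum(bad_rows.values()) == 0:
            return sorted(rows), sorted(cols)
        # pick the single worst row and worst column by invalid ratio
        r = max(rows, key=lambda r: bad_rows[r] / len(cols))
        c = max(cols, key=lambda c: bad_cols[c] / len(rows))
        # delete whichever line has the higher invalid ratio
        if bad_rows[r] / len(cols) >= bad_cols[c] / len(rows):
            rows.remove(r)
        else:
            cols.remove(c)
```

On a matrix where one row is entirely invalid, this deletes just that row; when the invalid entries fill a column, it deletes the column instead.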

Benefits of storing columns in JSON instead of traditional tables?

Are there any benefits in using JSON over traditional table structures?

Imagine having a table structure like this:

create table table1 (
    t_id int,
    first_name varchar(20),
    last_name varchar(20),
    age int
)

What if you stored the same columns inside a JSON document like this:

{
    "first_name": "name",
    "last_name": "name",
    "age": 2
}

and have a table like this:

create table table2 (
    t_id int,
    attribute jsonb
)

Correct me if I'm wrong, but since both variants cause a row to be completely rewritten whenever that row is updated or deleted, the two variants are identical in that regard.

SSIS 2017 – Parse flat file with multiple columns and headers on multiple lines

I receive a daily CSV file that contains 600+ company employee positions, each of which is formatted as follows:

Position,Description
SUP1015,Shipping Supervisor Day
Work UOM:,Hours,Active:,Yes,Permanent:,Yes,,
Default Rate Level:,0,Default Rate Source:,Master,Default GL Source:,Master,,
Effective Date:,,Expiry Date:,,Created Date:,29-Apr-2014,Revised Date:,06-Jun-2019
Job Class:,,,,,Location:,,1004 - Shipping,,
Union Code:,,,,,Reports To:,,MGR1056 - Delivery & Shipping Manager,,
Position FTE:,,1.0000,,,,,,

My goal is to transform all 600+ records into one table:

Position | Description             | Work UOM | Active | Permanent | Default Rate Level | Default Rate Source | Default GL Code | Effective Date | Expiry Date | Created Date | Revised Date | Job Class | Location        | Union Code | Reports To                            | Position FTE
SUP1015  | Shipping Supervisor Day | Hours    | Yes    | Yes       | 0                  | Master              | Master          |                |             | 29-Apr-2014  | 06-Jun-2019  |           | 1004 - Shipping |            | MGR1056 - Delivery & Shipping Manager | 1.0000

I have no idea how to parse this, given the connection managers in SSIS. Any help and guidance is greatly appreciated.
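A common pattern for files like this is to read each line into a single wide column in the flat-file connection manager and pivot each record block in a Script Component. To illustrate the pivot logic only (not SSIS itself), here is a sketch in Python; the function name and the assumption that each record is an 8-line block where `Label:` tokens are followed by their values are mine:

```python
def parse_position_block(lines):
    """Pivot one multi-line position record into a flat dict.
    Assumes line 1 is the 'Position,Description' header, line 2 holds
    the position code and description, and later lines mix 'Label:'
    tokens with their (possibly empty) values."""
    rec = {}
    code, _, desc = lines[1].partition(',')
    rec['Position'] = code.strip()
    rec['Description'] = desc.strip()
    for line in lines[2:]:
        tokens = line.split(',')
        i = 0
        while i < len(tokens):
            if tokens[i].endswith(':'):
                key = tokens[i].rstrip(':').strip()
                # the value is the next non-empty token before the next label
                value, j = '', i + 1
                while j < len(tokens) and not tokens[j].endswith(':'):
                    if tokens[j].strip():
                        value = tokens[j].strip()
                        break
                    j += 1
                rec[key] = value
                i = j
            else:
                i += 1
    return rec
```

The same token-walking logic could be ported to C# inside an SSIS Script Component, emitting one output row per 8-line block.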

“Column check constraint cannot reference other columns”


Create a new row for each day between dates from two different columns in Redshift SQL

I am working with a view in Redshift. It contains rows with some information and two dates (a start date and an end date). I can't figure out a way to create a new row for each day between the start and end dates. For example, here are two rows:

customer_name | start_date | end_date
Peter F.      | 2018-03-01 | 2018-03-05
Sam R.        | 2018-04-17 | 2018-04-20

For each row, I would like to add one day at a time to the start date, until the end date is reached:

customer_name | start_date | end_date
Peter F.      | 2018-03-01 | 2018-03-05
Peter F.      | 2018-03-02 | 2018-03-05
Peter F.      | 2018-03-03 | 2018-03-05
Peter F.      | 2018-03-04 | 2018-03-05
Peter F.      | 2018-03-05 | 2018-03-05
Sam R.        | 2018-04-17 | 2018-04-20
Sam R.        | 2018-04-18 | 2018-04-20
Sam R.        | 2018-04-19 | 2018-04-20
Sam R.        | 2018-04-20 | 2018-04-20

The date is actually a timestamp, but I could work with either. Thanks in advance!
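In Redshift itself this kind of expansion is typically done by cross-joining the view against a numbers or calendar table and applying `DATEADD('day', n, start_date)`, since `generate_series` has only limited support there. The row-expansion logic is sketched below in Python for illustration (the function name is mine):

```python
from datetime import date, timedelta

def expand_rows(rows):
    """For each (customer, start, end) row, emit one row per day from
    the start date up to and including the end date, keeping end_date
    fixed -- mirroring the desired output table above."""
    for name, start, end in rows:
        day = start
        while day <= end:
            yield (name, day, end)
            day += timedelta(days=1)

rows = [("Peter F.", date(2018, 3, 1), date(2018, 3, 5)),
        ("Sam R.", date(2018, 4, 17), date(2018, 4, 20))]
expanded = list(expand_rows(rows))
```

This produces 5 rows for Peter F. and 4 for Sam R., matching the desired output.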

How to perform matrix multiplication in Mixing Columns step of AES?

I am studying AES and trying to implement it, but I am having difficulty understanding the Mixing Columns step. In that step we have to perform a matrix multiplication between the state matrix and another fixed matrix. Here is the example given in the material I am studying from: (image of the worked multiplication example)

I am not getting the 03*2F part. How did it turn into (02*2F) xor 2F? Is the material correct, or does it contain a mistake?
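The material is consistent with how AES arithmetic works: multiplication happens in GF(2^8), where 03 = 02 xor 01, so 03*x = (02*x) xor x. Multiplication by 02 is the standard "xtime" operation (shift left, and reduce by the AES polynomial 0x11B on overflow). A minimal sketch in Python:

```python
def xtime(b):
    """Multiply a GF(2^8) element by 02, reducing by the AES
    polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) on overflow."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF

# 03 = 02 xor 01 in GF(2^8), so 03*x = (02*x) xor x
mul02 = xtime(0x2F)            # 02 * 2F
mul03 = xtime(0x2F) ^ 0x2F     # 03 * 2F = (02 * 2F) xor 2F
```

Here 0x2F has no high bit set, so 02*2F is just a left shift (0x5E), and 03*2F = 0x5E xor 0x2F = 0x71.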

Selecting k rows and k columns from a matrix to maximize the sum of the k^2 elements

Suppose $A$ is an $n \times n$ matrix, and $k \ge 1$ is an integer. We want to find $k$ distinct indices from $\{1, 2, \ldots, n\}$, denoted as $i_1, \ldots, i_k$, such that

$\sum_{p, q = 1}^k A_{i_p, i_q}$

is maximized. In words, we seek $k$ rows and the corresponding $k$ columns, such that the intersected $k^2$ elements of $A$ have the largest sum.

This problem can be formulated as a quadratic assignment problem, which is NP-hard and admits no polynomial time algorithm with constant approximation bound. I’m just wondering if for this specific problem (as a special case of quadratic assignment), there exists a poly-time algorithm with constant approximation bound. Thank you.
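For small instances, exhaustive search over all $\binom{n}{k}$ index subsets gives a ground truth against which any heuristic or approximation can be checked. A minimal sketch in Python (the function name is mine; this is exponential, not a polynomial-time answer to the question):

```python
from itertools import combinations

def best_submatrix_sum(A, k):
    """Exhaustively find k indices maximizing the sum of the k^2
    intersected elements of A. O(n choose k) subsets -- feasible
    only for small n."""
    n = len(A)
    best, best_idx = float('-inf'), None
    for idx in combinations(range(n), k):
        s = sum(A[p][q] for p in idx for q in idx)
        if s > best:
            best, best_idx = s, idx
    return best, best_idx
```

For example, on a 3x3 matrix with a heavy 2x2 block in the lower-right corner, the search picks indices 1 and 2.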

Unable to insert two values into two different columns (WordPress database)

So I have made a new input with the name "Anaam". When users fill in their info in the input, it should go to the database, but it does not. I have watched YouTube videos and looked for similar questions on Stack Overflow. However, this involves a WordPress database, so the insert code is different from usual, and because of that I could not find good matches.

My HTML code:

<html>
  <head>
  </head>
  <body>
    <form action="" enctype="multipart/form-data" method="post">
      <input name="file" type="file"/>
      <br>
      <br>
      <input name="Anaam" type="text" placeholder="Albumnaam" class="albumnaam">
      <input name="submit" type="submit" value="Upload uw album" />
    </form>
  </body>
</html>

As you can see, I have made an input with the name “Anaam”.

My PHP code:

// ... Some data to connect to the remote FTP server, nothing to do with the database

$Anaam = $_POST["Anaam"];

if ((!$conn_id) || (!$login_result)) {
    echo "Het spijt ons, er is momenteel geen connectie met de server."; // "Sorry, there is currently no connection to the server."
    // echo "Attempted to connect to $ftp_server for user $ftp_user_name";
    exit;
} else {
    // echo "upload is gelukt"; // "upload succeeded"
}

// Only allow zip and rar files to be uploaded
$allowed = array('zip', 'rar');
if (in_array($fileActualExt, $allowed)) {

    // upload the file (to the remote FTP server)
    $upload = ftp_put($conn_id, $destination_file, $source_file, FTP_BINARY);

    // check upload status
    if (!$upload) {
        echo "Er is iets fout gegaan, excuses voor het ongemak"; // "Something went wrong, apologies for the inconvenience"
    } else {
        // insert data into the database
        global $wpdb;
        $number_of_rows_inserted = $wpdb->insert('wpex_programma', [
            'naam'  => $fileName,
            'Anaam' => "test"
        ]);
        var_dump($number_of_rows_inserted);
    }
}

As you can see, I want to insert values into the table wpex_programma, in the columns "naam" and "Anaam". In the "Anaam" column I want to insert the value of the variable $Anaam. Why does it still not work? It does not show any errors; the page just never stops loading.

My table structure:


Function that returns a table with multiple columns

I want to create a basic function that returns a table with multiple columns and rows as output. I have written the following query:

CREATE OR REPLACE FUNCTION public.visualizar()
RETURNS TABLE(codigo integer, fecha timestamp, cliente character varying,
              proveedor character varying, total integer, codigodetale integer,
              producto character varying, cantidad integer, precio integer,
              subtotal integer)
LANGUAGE plpgsql
COST 100 VOLATILE ROWS 1000
AS $BODY$
BEGIN
  return query
    select f.codigo, f.fecha, f.cliente, f.proveedor
    from factura f, detalle d
    where f.codigo = d.codigo;
END;
$BODY$;

but it shows me the result like this: (screenshot of the current output)

I want that table to be shown to me like this:

(screenshot of the desired output)

but when selecting a row it shows me the data like this: (screenshot of the row detail)

How can I do that with a function?

LOAD DATA LOCAL INFILE on a fixed-width file inserts NULL values into all columns

Last month I had occasion to load about 50 GB of data from fixed-width .dat files. There were 7 files, so I created 7 tables and 7 LOAD DATA LOCAL INFILE scripts to load the data, and it all worked fine, resulting in over 70 million rows. The data is updated on the 10th of the month, so I downloaded the new files and ran my scripts on them. Only 1 worked; the rest loaded the right number of rows but with all of the columns NULL. I've compared the previous files with the new ones and cannot find any difference that would cause this. I've put a small amount of the failing data in a test.dat file and am getting the same results, but have not been able to determine why this is happening, or why one works and the rest don't. I don't see any difference between the files or the SQL that loads them. I've tried changing the encoding, line endings, permissions, ownership, and various other things without success. There are no errors thrown; it just loads NULL values. Has anyone run across this before?

Here is an example table with the load SQL:

DROP TABLE IF EXISTS exp_gpoper;
CREATE TABLE `exp_gpoper` (
  `_id` int(11) NOT NULL AUTO_INCREMENT,
  `date_updated` datetime NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `pun_county_num` varchar(100) DEFAULT '',
  `pun_lease_num` varchar(100) DEFAULT '',
  `pun_sub_num` varchar(100) DEFAULT '',
  `pun_merge_num` varchar(100) DEFAULT '',
  `company_number` varchar(100) DEFAULT '',
  `company_name` varchar(300) DEFAULT '',
  PRIMARY KEY (`_id`),
  KEY `date_updated` (`date_updated`),
  KEY `pun_county_num` (`pun_county_num`),
  KEY `pun_lease_num` (`pun_lease_num`),
  KEY `pun_merge_num` (`pun_merge_num`),
  KEY `pun_sub_num` (`pun_sub_num`),
  KEY `company_number` (`company_number`),
  KEY `company_name` (`company_name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

LOAD DATA LOCAL INFILE 'test.dat'
INTO TABLE exp_gpoper
(@_row)
SET `pun_county_num` = TRIM(SUBSTR(@row,1,3)),
    `pun_lease_num` = TRIM(SUBSTR(@row,4,6)),
    `pun_sub_num` = TRIM(SUBSTR(@row,10,1)),
    `pun_merge_num` = TRIM(SUBSTR(@row,11,4)),
    `company_number` = TRIM(SUBSTR(@row,15,7)),
    `company_name` = TRIM(SUBSTR(@row,22,255));

Here is the content of the test.dat file:

001000000000000077777OTC USE
003000000000000077777OTC USE
003000567000000020011M & D PUMPING SERVICE INC
003000587000000022576SCOGGINS PRODUCTION LLC
003000588000000022576SCOGGINS PRODUCTION LLC
003000639000000017441CHESAPEAKE OPERATING LLC
003000963000000019694BVD INC
003000964000000018119BLAKE PRODUCTION CO INC
003002207124830022281SANDRIDGE EXPLORATION AND PRODUCTION LLC
003002394000000020891SUPERIOR OIL & GAS LLC

This works fine:

DROP TABLE IF EXISTS `exp_gpexempt`;
CREATE TABLE `exp_gpexempt` (
  `_id` int(11) NOT NULL AUTO_INCREMENT,
  `date_updated` datetime NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `pun_county_num` varchar(100) DEFAULT '',
  `pun_lease_num` varchar(100) DEFAULT '',
  `pun_sub_num` varchar(100) DEFAULT '',
  `pun_merge_num` varchar(100) DEFAULT '',
  `exemption_type` varchar(100) DEFAULT '',
  `code` varchar(100) DEFAULT '',
  `exemption_percentage` varchar(100) DEFAULT '',
  PRIMARY KEY (`_id`),
  KEY `date_updated` (`date_updated`),
  KEY `pun_county_num` (`pun_county_num`),
  KEY `pun_lease_num` (`pun_lease_num`),
  KEY `pun_merge_num` (`pun_merge_num`),
  KEY `exemption_type` (`exemption_type`),
  KEY `code` (`code`),
  KEY `exemption_percentage` (`exemption_percentage`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

LOAD DATA LOCAL INFILE 'test2.dat'
INTO TABLE exp_gpexempt
(@_row)
SET `pun_county_num` = TRIM(SUBSTR(@_row,1,3)),
    `pun_lease_num` = TRIM(SUBSTR(@_row,4,6)),
    `pun_sub_num` = TRIM(SUBSTR(@_row,10,1)),
    `pun_merge_num` = TRIM(SUBSTR(@_row,11,4)),
    `exemption_type` = TRIM(SUBSTR(@_row,15,50)),
    `code` = TRIM(SUBSTR(@_row,65,5)),
    `exemption_percentage` = TRIM(SUBSTR(@_row,70,24));

Here is the content of the test2.dat file:

00300063900000School District                                   05   00000000000.000293000000
00300365500000State School Land Commission                      01   00000000000.125000000000
00301843300000State School Land Commission                      01   00000000000.125000000000
00302942700633State School Land Commission                      01   00000000000.125000000000
00302942800633State School Land Commission                      01   00000000000.125000000000
00303004100000Federal                                           02   00000000000.067632900000
00303004200000Federal                                           02   00000000000.125000000000
00303004600000Federal                                           02   00000000000.125000000000
00303004700000Federal                                           02   00000000000.125000000000
00303004800000Federal                                           02   00000000000.125000000000
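For what it's worth, the slicing the SET clauses perform can be reproduced outside MySQL to sanity-check the column offsets against a sample line. A minimal Python sketch mirroring TRIM(SUBSTR(...)) with its 1-based positions (the function name is mine):

```python
def parse_gpexempt(line):
    """Mirror the LOAD DATA SET clauses for exp_gpexempt: 1-based,
    length-bounded slices (SUBSTR(@_row, pos, len)) followed by TRIM."""
    field = lambda pos, length: line[pos - 1:pos - 1 + length].strip()
    return {
        'pun_county_num': field(1, 3),
        'pun_lease_num': field(4, 6),
        'pun_sub_num': field(10, 1),
        'pun_merge_num': field(11, 4),
        'exemption_type': field(15, 50),
        'code': field(65, 5),
        'exemption_percentage': field(70, 24),
    }
```

Running this over a few lines of the failing and working files can confirm whether the records really start at the expected offsets (e.g. a BOM or stray leading character would shift every field).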