Storing timeseries data with a dynamic number of columns and rows in a suitable database

I have a timeseries pandas dataframe that adds a new row every minute and dynamically adds new columns as well:

Initial:

timestamp                100     200     300
2020-11-01 12:00:00        4       3       5

Next minute:

timestamp                100     200     300   500
2020-11-01 12:00:00        4       3       5     0
2020-11-01 12:01:00       14       3       5     4

The dataframe keeps getting updated like this every minute.
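
For reference, the per-minute update looks roughly like this in pandas (a minimal sketch that just reproduces the example values above):

import pandas as pd

# Sketch of the per-minute update: a new row arrives and may introduce a
# brand-new column; earlier timestamps get a default value of 0 for it.
df = pd.DataFrame(
    {100: [4], 200: [3], 300: [5]},
    index=pd.to_datetime(["2020-11-01 12:00:00"]),
)
df.index.name = "timestamp"

# next minute: a new row plus a new column 500
new_row = pd.DataFrame(
    {100: [14], 200: [3], 300: [5], 500: [4]},
    index=pd.to_datetime(["2020-11-01 12:01:00"]),
)
df = pd.concat([df, new_row]).fillna(0).astype(int)
print(df)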

So ideally, I want to design a database solution that supports such a dynamic column structure. The number of columns could grow to 20-30k+, and since it is one-minute timeseries data, there will be 500k+ rows per year.

I’ve read that relational databases have a limit on the number of columns, so that might not work here. Also, since I am setting the data for new columns and assigning a default value (0) to previous timestamps myself, I lose out on the DEFAULT parameter that MySQL offers.

Eventually, I will be querying ranges such as 1 day or 1 month to get the columns and their values.

Please suggest a suitable database solution for this type of dynamic row and column data.

Flexible table column widths, rows independent?

Hello,

I have an HTML table with 3 rows.
The third row contains, in the middle, three cells with extraordinarily long phrases, which makes the cells above them wide too. That looks bad, because the rows above contain images, so the spacing is different.

How can I make the HTML table column widths flexible, i.e. so that the wide cells in the third row do not influence the width of the cells in the other rows of the same table?


MySQL: transform multiple rows into a single row in the same table (reduce by merging grouped rows)

Hi, I want to reduce my table and update it in place (group and sum some columns, and delete rows).

Source table "table_test" :

+----+-----+-------+----------------+
| id | qty | user  | isNeedGrouping |
+----+-----+-------+----------------+
|  1 |   2 | userA |              1 | <- row to group + user A
|  2 |   3 | userB |              0 |
|  3 |   5 | userA |              0 |
|  4 |  30 | userA |              1 | <- row to group + user A
|  5 |   8 | userA |              1 | <- row to group + user A
|  6 |   6 | userA |              0 |
+----+-----+-------+----------------+

Wanted table (obtained by the following):

DROP TABLE table_test_grouped;
SET @increment = 2;
CREATE TABLE table_test_grouped
  SELECT id, SUM(qty) AS qty, user, isNeedGrouping
  FROM table_test
  GROUP BY user, IF(isNeedGrouping = 1, isNeedGrouping, @increment := @increment + 1);
SELECT * FROM table_test_grouped;

+----+------+-------+----------------+
| id | qty  | user  | isNeedGrouping |
+----+------+-------+----------------+
|  1 |   40 | userA |              1 | <- rows grouped + user A
|  3 |    5 | userA |              0 |
|  6 |    6 | userA |              0 |
|  2 |    3 | userB |              0 |
+----+------+-------+----------------+

Problem: I could use another (temporary) table, but I want to update the initial table in order to:

  • group by user and sum qty
  • replace/merge the rows of each group into a single row

The result must be a reduced version of the initial table, grouped by user, with qty summed.

This is a minimal example; I don't want to fully replace the initial table with table_test_grouped, because in my case I have another column (isNeedGrouping) that decides whether a row should be grouped or not.

Only the rows flagged with isNeedGrouping need grouping. For this example, one way to do it sequentially is:

CREATE TABLE table_test_grouped
  SELECT id, SUM(qty) AS qty, user, isNeedGrouping
  FROM table_test
  WHERE isNeedGrouping = 1
  GROUP BY user;

DELETE FROM table_test WHERE isNeedGrouping = 1;

INSERT INTO table_test SELECT * FROM table_test_grouped;

Any suggestions for a simpler way?

Is there a performance loss with out-of-sequence inserted rows? (MySQL InnoDB)

I am trying to migrate from a larger MySQL AWS RDS instance to a smaller one, and a data dump/reload is the only method. There are four tables in the range of 330 GB-450 GB, and a single-threaded mysqldump piped directly into the target RDS instance is estimated by pv to take about 24 hours (copying at 5 mbps).

I wrote a bash script that launches multiple mysqldump commands in the background (using ' & ' at the end), each with a calculated --where parameter, to simulate multithreading. This works and currently takes less than an hour with 28 threads.
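
Roughly, the script does the following (sketched here in Python rather than bash purely for readability; the table name, id bounds, hosts and chunk math are placeholders):

import subprocess

# Range-partitioned parallel dump: each worker dumps one slice of the
# auto_increment id range and pipes it straight into the target instance.
# --no-create-info assumes the empty table already exists on the target.
THREADS = 28
MAX_ID = 280_000_000                  # assumed upper bound of the id column
step = MAX_ID // THREADS + 1

procs = []
for i in range(THREADS):
    lo, hi = i * step, (i + 1) * step
    cmd = (
        f'mysqldump -h source-host --single-transaction --no-create-info '
        f'--where="id >= {lo} AND id < {hi}" mydb big_table '
        f'| mysql -h target-host mydb'
    )
    procs.append(subprocess.Popen(cmd, shell=True))   # backgrounded, like '&'

for p in procs:
    p.wait()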

However, I am concerned about a potential loss of query performance in the future, since I will not be inserting rows in the order of the auto_increment id column.

Can someone confirm whether this would be the case, or whether I am being paranoid for no reason?

What solution did you use for a single table that is in the 100s of GBs? For a particular reason I want to avoid AWS DMS, and I definitely don't want to use tools that haven't been maintained in a while.

Turning a result set's columns into independent rows in MySQL

I am pretty new to this and have been struggling with one issue.

I have a result set which, for each Opportunity in a table (id as primary key), provides the date of the first cashflow (DFCFW) as a column, followed by 10 columns (CFW1, CFW2, ..., CFW10) holding the 10 possible cashflows for the 10 following years, expected on the anniversary dates of the first cashflow.

I would like to create a view which displays, for all opportunities, three columns: opportunity.id, date of the cashflow, and cashflow; there should be 10 records for each opportunity.
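
Just to illustrate the shape I am after, here is a pandas sketch (not the MySQL view I actually need; one made-up opportunity, and I am assuming CFWk falls k years after DFCFW):

import pandas as pd

# Unpivot CFW1..CFW10 into one (id, cashflow_date, cashflow) row each.
opp = pd.DataFrame({
    "id": [1],
    "DFCFW": pd.to_datetime(["2021-06-15"]),
    **{f"CFW{k}": [100.0 * k] for k in range(1, 11)},
})

long = opp.melt(
    id_vars=["id", "DFCFW"],
    value_vars=[f"CFW{k}" for k in range(1, 11)],
    var_name="k",
    value_name="cashflow",
)
long["k"] = long["k"].str[3:].astype(int)               # "CFW3" -> 3
long["cashflow_date"] = long.apply(
    lambda r: r["DFCFW"] + pd.DateOffset(years=int(r["k"])), axis=1
)
result = long[["id", "cashflow_date", "cashflow"]]      # 10 rows per opportunity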

Any suggestion on how to achieve this?

Thank you so much

Fred

Is the Post Correspondence Problem with more than two rows harder than the standard two-row variant?

The standard Post Correspondence Problem concerns tiles with two rows of symbols, and asks whether an arrangement of tiles can be made so that the sequence of symbols along the top of the tiles is equal to the sequence along the bottom.

Let $n\text{-PCP}$, $n > 0$, be a generalization of the Post Correspondence Problem where the tiles contain $n$ rows, and the sequences of symbols have to be equal for all of these rows.
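
To fix notation (this is just my own phrasing of the generalization): an instance of $n\text{-PCP}$ is a finite list of tiles $t_1, \dots, t_m$ with $t_i = (w_i^{(1)}, \dots, w_i^{(n)})$ and $w_i^{(j)} \in \Sigma^*$, and it asks whether there is a nonempty sequence of indices $i_1, \dots, i_k$ such that the concatenation $w_{i_1}^{(j)} w_{i_2}^{(j)} \cdots w_{i_k}^{(j)}$ is the same string for every row $j \in \{1, \dots, n\}$.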

Obviously $1\text{-PCP}$ is decidable (in fact it is trivial, because the answer is always yes). $2\text{-PCP}$ is the standard PCP.

But what if $n > 2$? Is it harder, or can it be reduced to the standard PCP (like $k$-SAT with $k > 3$ being reduced to 3-SAT)?

Recommendations for deleting a large set of rows in MSSQL

I need to delete about 75+ million rows every day from a table that contains around 3.5 billion records.

The database recovery model is SIMPLE. I have written code that deletes 15,000 rows per iteration of a while loop until all 75M records are deleted (I use batched deletes because of log file growth). However, at the current deletion speed it looks like it will take at least 5 days, which means the data that needs to be deleted accumulates several times faster than I can delete it.
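
The loop is roughly equivalent to this (a minimal Python/pyodbc sketch; the connection string, table name and date column are placeholders, not my actual code):

import pyodbc

# Batched delete: remove 15,000 rows at a time until nothing older than
# two months is left, committing each batch so the log can stay small.
conn = pyodbc.connect("DSN=mydb")            # placeholder connection
cur = conn.cursor()

deleted = 1
while deleted > 0:
    cur.execute(
        "DELETE TOP (15000) FROM dbo.big_table "
        "WHERE created_at < DATEADD(MONTH, -2, GETDATE());"
    )
    deleted = cur.rowcount
    conn.commit()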

Basically, what I'm trying to do is summarize the data (into another table) and delete data older than 2 months. There are no update operations on that table, only inserts and deletes.

I have the Enterprise Edition of MSSQL 2017.

Any suggestions will be welcome.

Delete rows or columns of a matrix containing invalid elements, such that the maximum number of valid elements is kept

Originally posted on Stack Overflow, but I was told to post it here.

Context: I am doing a PCA on an MxN (N >> M) matrix that contains some invalid values. I cannot infer these values, so I need to remove all of them, which means deleting the whole corresponding row or column. Of course, I want to keep the maximum amount of data. The invalid entries represent ~30% of the data, but most of them are concentrated in a few rows, while the rest are scattered across the matrix.

Some possible approaches:

  • Similar to this problem, where I would format my matrix so that valid data entries equal 1 and invalid entries a huge negative number. However, all the proposed solutions have exponential complexity, and my problem is simpler.

  • Computing the ratio (invalid / valid) for each row and column, deleting the row or column with the highest ratio, recomputing the ratios on the resulting sub-matrix, and so on until there is no invalid data left (I am not sure how many rows or columns can be removed safely in one step). It seems like an okay heuristic, but I am unsure whether it always gives the optimal solution; a rough sketch of this approach follows right after this list.
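
Here is that sketch (NumPy; valid_mask is True where an entry is valid; it is a heuristic, not guaranteed to be optimal):

import numpy as np

def greedy_prune(valid_mask):
    """Repeatedly drop the row or column with the largest fraction of
    invalid entries until the remaining submatrix is entirely valid.
    Returns the indices of the rows and columns that are kept."""
    rows = list(range(valid_mask.shape[0]))
    cols = list(range(valid_mask.shape[1]))
    while rows and cols:
        sub = valid_mask[np.ix_(rows, cols)]
        if sub.all():
            break
        row_bad = 1.0 - sub.mean(axis=1)   # invalid fraction per remaining row
        col_bad = 1.0 - sub.mean(axis=0)   # invalid fraction per remaining column
        if row_bad.max() >= col_bad.max():
            del rows[int(row_bad.argmax())]
        else:
            del cols[int(col_bad.argmax())]
    return rows, cols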

My guess is that it is a standard data analysis problem, but surprisingly I could not find a solution online.

Too many rows in the WordPress wp_options table

On one of my sites I saw that the wp_options table has more than a hundred thousand rows. They all start with something like _wp_session_ or _wp_session_expires_, and the values mostly say things like "you must enable payment gateways to use digital downloads", etc.

It looks like someone inserted this bad content into my database.

Question 1) Is it OK if I just delete these rows, or will it harm something?
Question 2) How can I prevent someone from inserting things into my database again? My password was strong anyway.
Question 3) What were they doing with my database? I didn't notice anything visible on my site.

My most important question at this point is question 1.