How to save about 500GB of data in a database?

I want to store the Bitcoin data and build a Bitcoin indexer. While saving the data, the write speed gets slower and slower. The total data size is about 500GB, and once the stored data reaches about 20GB the writes become extremely slow. Note that I have 4 tables, each with several indexes, and I've tried both MongoDB and MySQL. What would be a proper way to handle this?
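For context, the tables are roughly shaped like this (an illustrative sketch of the MySQL variant only; the table and column names are my own simplification, not the exact schema):

-- Illustrative only: a typical block/transaction/output/input layout for an indexer.
CREATE TABLE blocks (
    hash      BINARY(32) PRIMARY KEY,
    height    INT        NOT NULL,
    timestamp INT        NOT NULL,
    INDEX idx_blocks_height (height)
);

CREATE TABLE transactions (
    txid       BINARY(32) PRIMARY KEY,
    block_hash BINARY(32) NOT NULL,
    INDEX idx_tx_block (block_hash)
);

CREATE TABLE outputs (
    txid    BINARY(32)  NOT NULL,
    n       INT         NOT NULL,
    address VARCHAR(64),
    value   BIGINT      NOT NULL,
    PRIMARY KEY (txid, n),
    INDEX idx_outputs_address (address)
);

CREATE TABLE inputs (
    txid      BINARY(32) NOT NULL,
    n         INT        NOT NULL,
    prev_txid BINARY(32) NOT NULL,
    prev_n    INT        NOT NULL,
    PRIMARY KEY (txid, n),
    INDEX idx_inputs_prev (prev_txid, prev_n)
);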

Can I use PartialCorrelationFunction on multivariate data?

I am doing a bit of work on multivariate time series and I need to calculate the partial autocorrelation function of a matrix. I will provide some details below and show an example of how I simulate my data:

Data simulation

First I create the autoregressive process that I need for my work. Mathematica's ARProcess does not seem to work on multivariate data directly; however, I managed to find a workaround:

aCoef = {{{0, 0, 0, 0}, {0, 0, 0, 0.8}, {0, 0, 0, 0}, {0, 0, 0.8, 0}}, {{0, 0, 0, 0}, {0.8, 0, 0, 0},{0, 0, 0, 0}, {0.8, 0, 0, 0}}};

These coefficients allow me to simulate data with 4 paths in the time series that influence each other at different time lags. Now that my coefficients are defined, I simulate the data like this:

data = RandomFunction[ARProcess[aCoef, IdentityMatrix[4]], {0, 10000}];

This process creates a TemporalData object, which I can use for my further analysis. I would like to see what the partial autocorrelation is between the different channels, which will let me check whether the simulated data are as I expect. I would expect to be able to use a function that outputs something like this:

[Image: matrix output of R's pacf function]

This is the output of the pacf function in R, which I ran via REvaluate in Mathematica. It shows values different from zero where expected, i.e. the fourth element of the matrix in row 1, column 2; the third element of the matrix in row 1, column 4; the first element of the matrix in row 2, column 2; and finally the first element of the matrix in row 2, column 4. The matrix rows represent time lags and the columns represent each path of the time series. I have tried using PartialCorrelationFunction: [Image: output of the function]

According to the Wolfram Documentation, either data or tproc can be used as the first argument of that function; I have tried both with no success. Although this is not an error in the language, it does not provide the information I am looking for. I could use the R function, but ideally I would like to stay with the Wolfram functions. Does anyone have any idea how to fix this? Thanks!
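For reference, these are the kinds of calls I attempted, using the data and aCoef defined above (the maximum lag of 10 is an arbitrary choice on my part):

(* passing the simulated TemporalData directly *)
PartialCorrelationFunction[data, 10]

(* passing the process itself *)
PartialCorrelationFunction[ARProcess[aCoef, IdentityMatrix[4]], 10]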

Can I sell data that is displayed on a betting website in my own api? [closed]

If I were able to scrape the odds, events, dates and other information from a betting site's public website, where anyone can access the information, would I be able to sell this data via an API?

The betting sites would be the 10 most popular betting sites in the UK, and I would be getting information on all the matches in the biggest football leagues.

Could I be sued if I start selling this data, or am I allowed to since it's publicly accessible? The only thing they could really claim to own is the odds, if they can even do that. The data would be sold as an API service where members pay monthly for a certain number of requests to the API.

PL/SQL query to split data based on start date and end date

I want to split data based on a start date, an end date, and a configured date range (yearly/monthly/weekly/quarterly).

For example, if the start date is 2015/10/02, the end date is 2015/12/22, and my date range is Monthly (M), then my required output is:

Newstartdate   NewEnddate
------------   ----------
2015/10/02     2015/10/31
2015/11/01     2015/11/30
2015/12/01     2015/12/30

So I was looking for a generic PL/SQL query to split the data for whatever the date range may be (Y/Q/M/W/D), based on the start and end dates and the range specification.
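To make the monthly case concrete, this is roughly the shape of query I have in mind (a sketch only, assuming bind variables :start_date and :end_date; the weekly/quarterly/yearly variants would swap out the ADD_MONTHS/LAST_DAY logic accordingly):

-- one row per month between :start_date and :end_date, clipped to the overall range
SELECT GREATEST(:start_date, ADD_MONTHS(TRUNC(:start_date, 'MM'), LEVEL - 1))            AS newstartdate,
       LEAST(:end_date, LAST_DAY(ADD_MONTHS(TRUNC(:start_date, 'MM'), LEVEL - 1)))       AS newenddate
  FROM dual
CONNECT BY ADD_MONTHS(TRUNC(:start_date, 'MM'), LEVEL - 1) <= :end_date;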

Saving data from a For loop to a file, then reading the data back and plotting graphs

I have a For loop which generates random matrices and then performs a few calculations to yield a bunch of different values. Essentially, something like this:

For[j = 1, j < 100, j++,
 a = {{1}, {2}, {3}, {4}};
 b = {{5}, {6}, {7}, {8}};
 c = Tr[a]; d = Tr[b]; e = c + d;
 {a, b, c, d, e} >>> file  (* append each record to the file *)
]

Of course, in the above the same values are repeated over and over; I am using this as an example because the real script would be too long and requires a number of different packages. I need to save these values to a file in a way that lets me extract the values later and create a bunch of different graphs. What is the best way to go about this?
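For instance, assuming everything was appended with >>> to a file called "file" as above, I imagine reading it back and plotting along these lines (a rough sketch, not necessarily the best way):

results = ReadList["file"];            (* one {a, b, c, d, e} expression per record *)
ListPlot[results[[All, 5]], PlotLabel -> "e per iteration"]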

Oracle 11gR2 => Oracle Data Pump (expdp, impdp) => can a backup be safely taken during runtime?

Until now I have created a daily logical backup of my Oracle 11gR2 database at midnight, while the database is running but the client application is idle, so that no queries are executed against the database.

Now I also want to implement a second backup during the day, while the database and the client application are both up and running and queries (select/update/insert/delete) are being executed.

Because I already have well-tested backup and restore scripts, I want to continue using expdp and impdp.

This second "during the day" backup would not be imported directly into the production system after a potential data loss. Instead, I would import it into a mirrored test system and then manually use OracleSqlExplorer to query for the lost data.

This leads to the following questions:

  1. If I use expdp to perform a backup while the database is running and SQL statements are being executed during the backup process, is it guaranteed that the resulting dump maintains integrity and consistency?
  2. Do I need to add certain parameters to the expdp command to guarantee consistency? I found this:

"expdp options for creating a consistent export dump: FLASHBACK_SCN, FLASHBACK_TIME, CONSISTENT=Y"

So far I use this Linux shell script:

$ORACLE_HOME/bin/expdp \"$USERNAME/$PASSWORD as sysdba\" SCHEMAS=<csv list of schemas> REUSE_DUMPFILES=Yes DIRECTORY=backup DUMPFILE=$BACKUP_NAME.dmp
  3. Can I use a backup created with expdp during database runtime as a valid source for impdp, without having to worry about integrity and consistency?

For question number 2 I found a thread that says NO.
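For reference, this is how I imagine the consistent variant of my script would look with FLASHBACK_TIME added (just a sketch; I have not verified the exact parameter usage on 11gR2):

# same export, but pinned to a single point in time via FLASHBACK_TIME
$ORACLE_HOME/bin/expdp \"$USERNAME/$PASSWORD as sysdba\" SCHEMAS=<csv list of schemas> \
    FLASHBACK_TIME=SYSTIMESTAMP REUSE_DUMPFILES=Yes DIRECTORY=backup DUMPFILE=$BACKUP_NAME.dmp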

PostgreSQL 13 – Improve huge table data aggregation

I have a huge database (current size ~900GB, and new data keeps coming in) partitioned by year_month and subpartitioned by currency. The problem is that when I try to fetch an aggregation over a whole partition, it is slow. This is for a report, so it will be queried very often. The current size of the partition I want to aggregate is 7,829,230 rows; each subpartition will be similar. Table schema (anonymized):

-- auto-generated definition
CREATE TABLE aggregates_dates
(
    currency CHAR(3)                                    NOT NULL,
    id       uuid            DEFAULT uuid_generate_v4() NOT NULL,
    date     TIMESTAMP(0)                               NOT NULL,
    field01  INTEGER                                    NOT NULL,
    field02  INTEGER                                    NOT NULL,
    field03  INTEGER                                    NOT NULL,
    field04  INTEGER                                    NOT NULL,
    field05  INTEGER                                    NOT NULL,
    field06  CHAR(2)                                    NOT NULL,
    field07  INTEGER         DEFAULT 0                  NOT NULL,
    field08  INTEGER         DEFAULT 0                  NOT NULL,
    field09  INTEGER         DEFAULT 0                  NOT NULL,
    field10  INTEGER         DEFAULT 0                  NOT NULL,
    field11  INTEGER         DEFAULT 0                  NOT NULL,
    value01  INTEGER         DEFAULT 0                  NOT NULL,
    value02  INTEGER         DEFAULT 0                  NOT NULL,
    value03  INTEGER         DEFAULT 0                  NOT NULL,
    value04  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value05  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value06  INTEGER         DEFAULT 0                  NOT NULL,
    value07  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value08  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value09  INTEGER         DEFAULT 0                  NOT NULL,
    value10  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value11  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value12  INTEGER         DEFAULT 0                  NOT NULL,
    value13  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value14  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value15  INTEGER         DEFAULT 0                  NOT NULL,
    value16  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value17  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value18  NUMERIC(24, 12) DEFAULT '0'::NUMERIC       NOT NULL,
    value19  INTEGER         DEFAULT 0,
    value20  INTEGER         DEFAULT 0,
    CONSTRAINT aggregates_dates_pkey
        PRIMARY KEY (id, date, currency)
)
    PARTITION BY RANGE (date);

CREATE TABLE aggregates_dates_2020_01
    PARTITION OF aggregates_dates
        (
            CONSTRAINT aggregates_dates_2020_01_pkey
                PRIMARY KEY (id, date, currency)
        )
        FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2020-01-31 23:59:59')
    PARTITION BY LIST (currency);

CREATE TABLE aggregates_dates_2020_01_eur
    PARTITION OF aggregates_dates_2020_01
        (
            CONSTRAINT aggregates_dates_2020_01_eur_pkey
                PRIMARY KEY (id, date, currency)
        )
        FOR VALUES IN ('EUR');

CREATE INDEX aggregates_dates_2020_01_eur_date_idx ON aggregates_dates_2020_01_eur (date);
CREATE INDEX aggregates_dates_2020_01_eur_field01_idx ON aggregates_dates_2020_01_eur (field01);
CREATE INDEX aggregates_dates_2020_01_eur_field02_idx ON aggregates_dates_2020_01_eur (field02);
CREATE INDEX aggregates_dates_2020_01_eur_field03_idx ON aggregates_dates_2020_01_eur (field03);
CREATE INDEX aggregates_dates_2020_01_eur_field04_idx ON aggregates_dates_2020_01_eur (field04);
CREATE INDEX aggregates_dates_2020_01_eur_field06_idx ON aggregates_dates_2020_01_eur (field06);
CREATE INDEX aggregates_dates_2020_01_eur_currency_idx ON aggregates_dates_2020_01_eur (currency);
CREATE INDEX aggregates_dates_2020_01_eur_field09_idx ON aggregates_dates_2020_01_eur (field09);
CREATE INDEX aggregates_dates_2020_01_eur_field10_idx ON aggregates_dates_2020_01_eur (field10);
CREATE INDEX aggregates_dates_2020_01_eur_field11_idx ON aggregates_dates_2020_01_eur (field11);
CREATE INDEX aggregates_dates_2020_01_eur_field05_idx ON aggregates_dates_2020_01_eur (field05);
CREATE INDEX aggregates_dates_2020_01_eur_field07_idx ON aggregates_dates_2020_01_eur (field07);
CREATE INDEX aggregates_dates_2020_01_eur_field08_idx ON aggregates_dates_2020_01_eur (field08);

Example query (not all fields used) which aggregates the whole partition (this query might have many more WHERE conditions, but this one is the worst case):

EXPLAIN (ANALYSE, BUFFERS, VERBOSE)
SELECT COALESCE(SUM(mainTable.value01), 0) AS "value01",
       COALESCE(SUM(mainTable.value02), 0) AS "value02",
       COALESCE(SUM(mainTable.value03), 0) AS "value03",
       COALESCE(SUM(mainTable.value06), 0) AS "value06",
       COALESCE(SUM(mainTable.value09), 0) AS "value09",
       COALESCE(SUM(mainTable.value12), 0) AS "value12",
       COALESCE(SUM(mainTable.value15), 0) AS "value15",
       COALESCE(SUM(mainTable.value03 + mainTable.value06 + mainTable.value09 + mainTable.value12 +
                    mainTable.value15), 0) AS "kpi01",
       COALESCE(SUM(mainTable.value05) * 1, 0) "value05",
       COALESCE(SUM(mainTable.value08) * 1, 0) "value08",
       COALESCE(SUM(mainTable.value11) * 1, 0) "value11",
       COALESCE(SUM(mainTable.value14) * 1, 0) "value14",
       COALESCE(SUM(mainTable.value17) * 1, 0) "value17",
       COALESCE(SUM(mainTable.value05 + mainTable.value08 + mainTable.value11 + mainTable.value14 +
                    mainTable.value17) * 1, 0) "kpi02",
       CASE
           WHEN SUM(mainTable.value02) > 0 THEN (1.0 * SUM(
                       mainTable.value05 + mainTable.value08 + mainTable.value11 +
                       mainTable.value14 + mainTable.value17) / SUM(mainTable.value02) * 1000 * 1)
           ELSE 0 END "kpiEpm",
       CASE
           WHEN SUM(mainTable.value01) > 0 THEN (1.0 * SUM(
                       mainTable.value05 + mainTable.value08 + mainTable.value11 +
                       mainTable.value14) / SUM(mainTable.value01) * 1)
           ELSE 0 END
FROM performance mainTable
WHERE (mainTable.date BETWEEN '2020-01-01 00:00:00' AND '2020-02-01 00:00:00')
  AND (mainTable.currency = 'EUR')
GROUP BY mainTable.field02;

EXPLAIN:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=3748444.51..3748502.07 rows=794 width=324) (actual time=10339.771..10340.497 rows=438 loops=1)
  Group Key: maintable.field02
  Batches: 1  Memory Usage: 1065kB
  Buffers: shared hit=2445343
  ->  Append  (cost=0.00..2706608.65 rows=11575954 width=47) (actual time=212.934..4549.921 rows=7829230 loops=1)
        Buffers: shared hit=2445343
        ->  Seq Scan on performance_2020_01 maintable_1  (cost=0.00..2646928.38 rows=11570479 width=47) (actual time=212.933..4055.104 rows=7823923 loops=1)
              Filter: ((date >= '2020-01-01 00:00:00'::timestamp without time zone) AND (date <= '2020-02-01 00:00:00'::timestamp without time zone) AND (currency = 'EUR'::bpchar))
              Buffers: shared hit=2444445
        ->  Index Scan using performance_2020_02_date_idx on performance_2020_02 maintable_2  (cost=0.56..1800.50 rows=5475 width=47) (actual time=0.036..6.476 rows=5307 loops=1)
              Index Cond: ((date >= '2020-01-01 00:00:00'::timestamp without time zone) AND (date <= '2020-02-01 00:00:00'::timestamp without time zone))
              Filter: (currency = 'EUR'::bpchar)
              Rows Removed by Filter: 31842
              Buffers: shared hit=898
Planning Time: 0.740 ms
JIT:
  Functions: 15
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 4.954 ms, Inlining 14.249 ms, Optimization 121.115 ms, Emission 77.181 ms, Total 217.498 ms
Execution Time: 10345.662 ms

Server spec:

  • AMD, 64 threads
  • 315GB RAM
  • 6x SSD RAID 10

Postgres config:

postgresql_autovacuum_vacuum_scale_factor: 0.4
postgresql_checkpoint_completion_target: 0.9
postgresql_checkpoint_timeout: 10min
postgresql_effective_cache_size: 240GB
postgresql_maintenance_work_mem: 2GB
postgresql_random_page_cost: 1.0
postgresql_shared_buffers: 80GB
postgresql_synchronous_commit: local
postgresql_work_mem: 1GB

Migrate MySQL spatial data to PostgreSQL (PostGIS)

We have a web system that uses MySQL spatial databases. We want to migrate from MySQL to PostgreSQL. Our database has geometry and point data types, 21 tables, and a size of 1.6GB.

I've been looking for ways to do this and have found some tools that help with migration; however, most of them, such as https://github.com/philipsoutham/py-mysql2pgsql, do not support spatial data.

I also took a look at https://gis.stackexchange.com/questions/104081/how-to-migrate-spatial-tables-from-mssql-to-postgis. I have only just seen that post and haven't tried it yet, so I don't know whether it would work for MySQL. Furthermore, I'd like to do it using DB managers rather than QGIS or ArcGIS.

Adding arbitrary data via wp_style_add_data?

I stumbled across wp_style_add_data while reading through someone else's code. I can see a list of keys that the function accepts; trying to add my own key, data-foo for example, didn't produce any output, whereas title did.
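Roughly what I tried (the handle name is just an example from my test):

// 'title' is one of the keys the function accepts and ends up in the <link> tag;
// an arbitrary key like 'data-foo' is silently ignored.
wp_enqueue_style( 'my-style', get_stylesheet_uri() );
wp_style_add_data( 'my-style', 'title', 'My stylesheet' );  // works
wp_style_add_data( 'my-style', 'data-foo', 'bar' );         // no output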

Is there a filter that allows adding an arbitrary key through this function? I tried following the source down but didn't find anything.