Possible corruption or other issue that doesn't show on basic scans?

I have a WordPress table with about 33,000 records that is suddenly taking more than a minute to query. According to the optimization tools, everything is fine, but given how suddenly the trouble started, I doubt that. Queries against this table went from taking well under a second to between 48 and 93 seconds each time. I have confirmed there is no other load on the server slowing it down, so it's all down to this table. Where else should I look, or what else should I check, given these symptoms?
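For reference, here are the kinds of checks I can run directly against the table (assuming MySQL/MariaDB, which WordPress normally runs on; wp_slow_table below is just a placeholder for the affected table):

-- Confirm the storage engine, approximate row count, and data/index sizes
SHOW TABLE STATUS LIKE 'wp_slow_table';

-- Look for corruption that a basic health check might miss
CHECK TABLE wp_slow_table EXTENDED;

-- Rebuild optimizer statistics, which can fix a suddenly bad query plan
ANALYZE TABLE wp_slow_table;

-- Rebuild the table and its indexes
OPTIMIZE TABLE wp_slow_table;

-- Check whether the slow query is actually using an index (substitute the real query)
EXPLAIN SELECT * FROM wp_slow_table WHERE some_column = 'some_value';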

Interpreting startup time and varying plans for Postgres seq scans

While asking a recent question, I noticed some mysterious startup-time components in my EXPLAIN ANALYZE output. Experimenting further, I discovered that the startup time drops to nearly zero if I remove the regex WHERE clause.

I ran the following bash script as a test:

for i in $(seq 1 10)
do
    if (( $RANDOM % 2 == 0 ))
    then
        echo "Doing plain count"
        psql -e -c "EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2"
    else
        echo "Doing regex count"
        psql -e -c "EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2 WHERE page ~ 'foo'"
    fi
done

The first query returns a count of ~30 million, and the second counts only 7 rows. They are running on a PG 12.3 read replica in RDS with minimal other activity. Both versions take roughly the same amount of time, as I’d expect. Here is some output filtered with grep:

Doing plain count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.086..38622.215 rows=10114306 loops=3)
Doing regex count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3091955.34 rows=897 width=0) (actual time=16856.679..41398.062 rows=2 loops=3)
Doing plain count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.162..39454.499 rows=10114306 loops=3)
Doing plain count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.036..39213.171 rows=10114306 loops=3)
Doing regex count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3091955.34 rows=897 width=0) (actual time=12711.308..40015.734 rows=2 loops=3)
Doing plain count
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060374.07 rows=12632507 width=0) (actual time=0.244..39277.683 rows=10114306 loops=3)
Doing regex count
^CCancel request sent

So, a few questions:

  1. What goes into this startup component of "actual time" in the regex scan, and why is it so much larger? (10-20s vs 0-1s)

  2. Although "cost" and "time" aren’t comparable units, the planner seems to think the startup cost should be 0 in all cases – is it being fooled?

  3. Why do the strategies seem different? Both plans mention Partial Aggregate, but the regex query reports actual rows=2 while the plain version reports ~10 million (I guess this is some kind of per-process average across the 2 workers and the leader, summing to ~30 million). If I had to implement this myself, I would probably add up the results of several count(*) operations rather than merging rows and counting – do the plans indicate exactly how it is doing that? (My mental model is sketched below.)
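For reference, here is the mental model I have of the combine step, written as plain SQL using the per-process row counts from the output above (purely an illustration of what I imagine Gather and Finalize Aggregate do, not what Postgres literally executes):

-- Each of the 3 processes (2 workers plus the leader) counts its own share of the
-- table (~10,114,306 rows each), and the finalize step just sums the partial counts.
SELECT sum(partial_count) AS total_rows
FROM (VALUES (10114306::bigint), (10114306), (10114306)) AS per_process(partial_count);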

So that I'm not hiding anything, here are the full query plans for both:

EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3093171.59..3093171.60 rows=1 width=8) (actual time=39156.499..39156.499 rows=1 loops=1)
   ->  Gather  (cost=3093171.37..3093171.58 rows=2 width=8) (actual time=39156.356..39157.850 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=3092171.37..3092171.38 rows=1 width=8) (actual time=39154.405..39154.406 rows=1 loops=3)
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3060587.90 rows=12633390 width=0) (actual time=0.033..38413.690 rows=10115030 loops=3)
 Planning Time: 7.968 ms
 Execution Time: 39157.942 ms
(8 rows)

EXPLAIN ANALYZE SELECT count(*) FROM ui_events_v2 WHERE page ~ 'foo'
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3093173.83..3093173.84 rows=1 width=8) (actual time=39908.495..39908.495 rows=1 loops=1)
   ->  Gather  (cost=3093173.61..3093173.82 rows=2 width=8) (actual time=39908.408..39909.848 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=3092173.61..3092173.62 rows=1 width=8) (actual time=39906.317..39906.318 rows=1 loops=3)
               ->  Parallel Seq Scan on ui_events_v2  (cost=0.00..3092171.37 rows=897 width=0) (actual time=17250.058..39906.308 rows=2 loops=3)
                     Filter: (page ~ 'foo'::text)
                     Rows Removed by Filter: 10115028
 Planning Time: 0.803 ms
 Execution Time: 39909.921 ms
(10 rows)

Aggregate Multiple Instances of Each Row Without Multiple Seq Scans

I am trying to perform some mathematical operations in PostgreSQL that involve calculating multiple values from each row and then aggregating, without requiring multiple Seq Scans over the whole table. Performance is critical for my application, so I want this to run as efficiently as possible on large data sets. Are there any optimizations I can make so that PostgreSQL uses only a single Seq Scan?

Here’s a simplified example:

Given this test data set:

postgres=> CREATE TABLE values (value int);
postgres=> INSERT INTO values (value) SELECT * from generate_series(-500000,500000);
postgres=> SELECT * FROM values;
  value
---------
 -500000
 -499999
 -499998
 -499997
 -499996
...
  499996
  499997
  499998
  499999
  500000

I want to run a query that counts two instances of each row: once by the value column and once by abs(value). I'm currently accomplishing this with a CROSS JOIN LATERAL:

SELECT
  CASE idx
  WHEN 0 THEN value
  WHEN 1 THEN abs(value)
  END,
  COUNT(value)
FROM values
CROSS JOIN LATERAL unnest(ARRAY[0,1]) idx
GROUP BY 1;

Here’s the EXPLAIN ANALYZE result for this query. Notice the loops=2 in the Seq Scan line:

postgres=> EXPLAIN ANALYZE SELECT ....
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=82194.40..82201.40 rows=400 width=12) (actual time=997.448..1214.576 rows=1000001 loops=1)
   Group Key: CASE idx.idx WHEN 0 THEN "values".value WHEN 1 THEN abs("values".value) ELSE NULL::integer END
   ->  Nested Loop  (cost=0.00..70910.65 rows=2256750 width=8) (actual time=0.024..390.070 rows=2000002 loops=1)
         ->  Function Scan on unnest idx  (cost=0.00..0.02 rows=2 width=4) (actual time=0.005..0.007 rows=2 loops=1)
         ->  Seq Scan on "values"  (cost=0.00..15708.75 rows=1128375 width=4) (actual time=0.012..82.584 rows=1000001 loops=2)
 Planning Time: 0.073 ms
 Execution Time: 1254.362 ms

I compared this to the case of using only one instance of each row rather than two. The one-instance query performs a single Seq Scan and runs ~50% faster (as expected):

postgres=> EXPLAIN ANALYZE SELECT
postgres-> value,
postgres-> COUNT(value)
postgres-> FROM values
postgres-> GROUP BY 1;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=21350.62..21352.62 rows=200 width=12) (actual time=444.381..662.952 rows=1000001 loops=1)
   Group Key: value
   ->  Seq Scan on "values"  (cost=0.00..15708.75 rows=1128375 width=4) (actual time=0.015..84.494 rows=1000001 loops=1)
 Planning Time: 0.044 ms
 Execution Time: 702.806 ms
(5 rows)

I want to scale this up to a much larger data set, so performance is critical. Are there any optimizations I can make so that my original query runs with only one Seq Scan? I've tried tweaking query planner settings (enable_nestloop, work_mem, etc.) without finding anything better.

Other Attempts

Here are some other approaches I tried:

  1. Using UNION still performs 2 Seq Scans:
SELECT
  value,
  COUNT(value)
FROM (
  SELECT value FROM values
  UNION
  SELECT abs(value) AS value FROM values
) tbl
GROUP BY 1;
postgres=> EXPLAIN ANALYZE ...
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=130150.31..130152.31 rows=200 width=12) (actual time=1402.221..1513.000 rows=1000001 loops=1)
   Group Key: "values".value
   ->  HashAggregate  (cost=73731.56..96299.06 rows=2256750 width=4) (actual time=892.904..1112.867 rows=1000001 loops=1)
         Group Key: "values".value
         ->  Append  (cost=0.00..68089.69 rows=2256750 width=4) (actual time=0.025..343.921 rows=2000002 loops=1)
               ->  Seq Scan on "values"  (cost=0.00..15708.75 rows=1128375 width=4) (actual time=0.024..86.299 rows=1000001 loops=1)
               ->  Seq Scan on "values" values_1  (cost=0.00..18529.69 rows=1128375 width=4) (actual time=0.013..110.885 rows=1000001 loops=1)
 Planning Time: 0.067 ms
 Execution Time: 1598.531 ms
  2. Using PL/pgSQL. This performs only one Seq Scan, but array operations in PL/pgSQL are very slow, so it actually executes slower than the original:
CREATE TEMP TABLE result (value int, count int);

DO LANGUAGE PLPGSQL $$
  DECLARE
    counts int8[];
    row record;
  BEGIN
    -- one slot per possible key, including the negative values (lower bound -500000)
    counts = array_fill(0, ARRAY[1000001], ARRAY[-500000]);

    -- count each row twice: once under its value, once under abs(value)
    FOR row IN (SELECT value FROM values) LOOP
      counts[row.value] = counts[row.value] + 1;
      counts[abs(row.value)] = counts[abs(row.value)] + 1;
    END LOOP;

    -- emit only the keys that were actually counted
    FOR i IN -500000..500000 LOOP
      CONTINUE WHEN counts[i] = 0;
      INSERT INTO result (value, count) VALUES (i, counts[i]);
    END LOOP;
  END
$$;

SELECT value, count FROM result;
postgres=> \timing
Timing is on.
postgres=> DO LANGUAGE PLPGSQL $$ ...
DO
Time: 2768.611 ms (00:02.769)
  3. Tweaking query planner configuration. I tried changing enable_seqscan, enable_nestloop, work_mem, and the cost constants, and could not find a configuration that performed better than the original.
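One more variant that might avoid the second scan is to move the unnest into the SELECT list, so the set-returning function fans each row out into two rows (value and abs(value)) during a single pass over the table. I haven't benchmarked this at scale, so treat it as a sketch rather than a measured result:

SELECT v, COUNT(v)
FROM (
  -- unnest() in the target list emits two output rows per input row,
  -- so the table itself should only need one Seq Scan
  SELECT unnest(ARRAY[value, abs(value)]) AS v FROM values
) t
GROUP BY 1;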

Why Are Table Scans Bad?

At my work, I am often told that I shouldn’t do a table scan because it could negatively impact the database and the app using the database. What exactly do they mean by "negatively impact"? One negative impact I can think of is that a table scan would use up a lot of the disk’s time, blocking other requests from using the disk.

Is there a way to do a table scan without any negative impact? Sometimes I need to do a table scan for one-off checks (for routine things, I would of course create an index).

Alfa card only scans networks for a few minutes, and only after re-plugging it into the USB port

I am running Kali Linux (2020.1b, 64-bit) via VirtualBox (6.1) on a macOS host (10.15.4) with a brand-new Alfa AWUS036NH network card attached via USB.

The network card is recognised by Kali Linux and is set to monitor mode:

$ iwconfig
lo        no wireless extensions.

wlan0     IEEE 802.11  Mode:Monitor  Frequency:2.427 GHz  Tx-Power=20 dBm
          Retry short  long limit:2   RTS thr:off   Fragment thr:off
          Power Management:off

However, when running airodump-ng wlan0, no wireless networks are discovered despite there being many nearby.

I have noticed that by physically removing the adapter and plugging it back into the USB port, I can get the card to discover nearby networks. This does not always work, but it is the only way I have been able to get it working. Furthermore, whenever I do get it “working”, the state is short-lived: the networks disappear within a few minutes, and then silence.

What could be causing this unusual behaviour and what diagnostic tests can I run to narrow down the cause?

Why does nmap find open ports only on my default gateway?

I tried to use nmap on my computer and saw that it can't find anything open for my local computer's IP (even though HTTPS is definitely open).

But when I scanned a range of IPs with nmap, I saw that the only open ports it found were on the default gateway's IP.

Why is that? (I can't find anything about this on the web.)

Edit: I have been asked to give an example.

So let's say my default gateway (router) IP is 1.1.1.1, my first computer's local IP is 1.1.1.2, and my second computer's local IP is 1.1.1.3.

When I run nmap (with different parameters) against 1.1.1.2 or 1.1.1.3, I don't get any open ports (“all 1000 ports are closed”).

But when I run nmap against 1.1.1.1, I get 12 open ports (which I believe are actually open on my first or second computer).

Is it safe to store user-uploaded ID scans in S3 (with server-side encryption)?

Currently my web app stores user-uploaded ID scans in S3. I am concerned about an eventual data leak.

The S3 bucket is encrypted with server-side encryption (AES-256), but I figure the next obvious risk is an attacker gaining access to the AWS account itself. I have secured the root account with 2FA, but there are still several user accounts with full access to S3 (such as a Travis CI account).

The solution I am thinking of is periodically moving the ID scans to a different store with client-side encryption (where only I know the private key). That way, if a leak happens, only a small amount of data will be exposed.

Is this a common practice or are there better solutions in this situation?