Difficulty Optimizing Postgres Write Throughput

I’m trying to index a collection of roughly 127K files using a PostgreSQL 9.6 server on RDS. I had expected the process of writing these documents to the DB to take about 8 hours, but over time I observe that the write rate decays to 0 so that the process doesn’t complete (at some point, inserts begin timing out). Unfortunately, I don’t have much DBA/PostgreSQL background, so I’m struggling to debug this.

On average, indexing one file means inserting 125 rows and 0.5 MB of data into a table (some files are significant outliers, yielding ~40K rows). I have several indices on the table (I don’t think I can avoid these, due to other requirements).
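As a rough sanity check, the averages above can be turned into totals and a required sustained rate. This is only a back-of-the-envelope sketch: it uses the 127K files, 125 rows and 0.5 MB per file, and the hoped-for 8 hours from this post, counts row payload only, and ignores index maintenance, WAL traffic, and the outlier files.

```python
# Back-of-the-envelope workload estimate from the averages quoted above.
FILES = 127_000          # approximate corpus size
ROWS_PER_FILE = 125      # average rows inserted per file
MB_PER_FILE = 0.5        # average payload per file, in MB
TARGET_HOURS = 8         # hoped-for total run time

target_seconds = TARGET_HOURS * 3600

total_rows = FILES * ROWS_PER_FILE
total_mb = FILES * MB_PER_FILE
required_mb_per_s = total_mb / target_seconds
required_files_per_s = FILES / target_seconds

print(f"total rows:        {total_rows:,}")
print(f"total payload:     {total_mb / 1000:.1f} GB")
print(f"required payload:  {required_mb_per_s:.2f} MB/s")
print(f"required files:    {required_files_per_s:.2f} files/s")
```

On these numbers the row payload alone comes to roughly 16 million rows and about 63.5 GB, which needs a little over 2 MB/s sustained to finish in 8 hours.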

Initially, the max WAL size on the database was set to 2 GB. My automation could process roughly 25K files before insert performance became unusable, with acceptable write throughput during the first hour. Increasing the max WAL size to 30 GB helped, but didn’t completely solve the problem; with this configuration the system was able to index 85K files before the insert rate degraded. Looking at the database logs, I saw primarily checkpoints that started due to the configured timeout (15 minutes). On the RDS console, I see a fairly consistent average write throughput of 5-15 MB / sec.

Eventually I may need to index a larger corpus of ~635K files, so I’d like to find settings where I get consistent write throughput.

Database Specifications:

  • PostgreSQL 9.6.15 (on RDS)
  • 6 TB disk
  • 4 CPU
  • 16 GB RAM
  • Max WAL size: 30 GB
  • Checkpoint Timeout: 15 min.
  • Checkpoint Completion Target: 0.9

Questions:

  • Do I need to increase the configured max WAL size again? Is there a rule of thumb for how large this should be?
  • Why did a 15x increase in max WAL size only increase the number of files I could index by 3-4x?
  • Are there other places I should look for diagnostic information?
  • If I stop the write process temporarily and restart it hours later, write performance improves temporarily. Why is this?

How can I write up interesting filler for my campaign?

Before I ask the main question, bear with me whilst I set the scene as it is now:

I’m running an original campaign set in a city. I’m building this campaign around a generated map, and making it a navigable whole-city zone. The main arc of this campaign involves eventually overthrowing the government, which has largely failed in their task of operating the business of the empire. At the end of the fourth session, just yesterday, I actually ran out of content for the session almost an hour before I thought I would. In my defence, one of the encounters should have been at least 30 minutes, but the players just blew right through it in 2 minutes. That’s gonna bite them in the ass in a few sessions, and if that’s how their characters want to operate, so be it. But that doesn’t help me much as the DM, running out of content early. If anyone has advice on getting my players to take the bait, I’d appreciate that as well as my main question.

So I’m going to employ the old ‘Adventurer’s Guild’ trope to present some space filling side quests. Problem is, for the main arc to keep making sense, all the side quests have to take place within a day’s walk of the city. I’ve got a couple standard ones sketched out. One session length, gold payout at the end, boring if you ask me, but it’s probably what they’re hoping for.

What I’d like is to make an obviously boring ‘joke’ quest that ends up being several sessions long and takes several bizarre, hard left turns. I can write crazy crap all day, so I suppose my question is how do I present it to them? How do I give them a quest that they’re never going to take, until it’s the last one on the list? And then when they finally do it they’ll hate themselves for not doing it sooner.

Write a Python or C program to guess the key

Key generation program in C:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define KEYSIZE 16

    void main()
    {
        int i;
        char key[KEYSIZE];

        printf("%lld\n", (long long) time(NULL));
        srand (time(NULL));

        for (i = 0; i< KEYSIZE; i++){
            key[i] = rand()%256;
            printf("%.2x", (unsigned char)key[i]);
        }
        printf("\n");
    }

Scenario:

On April 17, 2018, Alice finished her tax return, and she saved the return (a PDF file) on her disk. To protect the file, she encrypted the PDF file using a key generated from the program described above. She wrote down the key in a notebook, which is securely stored in a safe. A few months later, Bob broke into her computer and got a copy of the encrypted tax return. Since Alice is CEO of a big company, this file is very valuable.

Bob cannot get the encryption key, but by looking around Alice’s computer, he saw the key-generation program, and suspected that Alice’s encryption key may have been generated by the program. He also noticed the timestamp of the encrypted file, which is “2018-04-17 23:08:49”. He guessed that the key may have been generated within a two-hour window before the file was created.

Because the file is a PDF, it has a header, and the beginning of the header is always the version number. Around the time when the file was created, PDF-1.5 was the most common version, i.e., the header starts with %PDF-1.5, which is 8 bytes of data. The next 8 bytes of the data are quite easy to predict as well. Therefore, Bob easily got the first 16 bytes of the plaintext. Based on the metadata of the encrypted file, he knows that the file is encrypted using aes-128-cbc. Since AES has a 128-bit block size, these 16 bytes make up exactly one block of plaintext, so Bob knows a block of plaintext and its matching ciphertext. Moreover, Bob also knows the Initialization Vector (IV) from the encrypted file (the IV is never encrypted). Here is what Bob knows:

  • Plaintext: 255044462d312e350a25d0d4c5d80a34
  • Ciphertext: d06bf9d0dab8e8ef880660d2af65aa82
  • IV: 09080706050403020100A2B2C2D2E2F2

Your job is to help Bob find Alice’s encryption key so that the entire document can be decrypted. You should write a program that tries all the possible keys. If the key had been generated properly, this task would not be feasible. However, since Alice used time() to seed her random number generator, you should be able to find her key easily.
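The search space described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a full solution: the timestamp is interpreted as UTC (the time zone of Alice’s machine is not given), and key_for_seed assumes the C library on the machine running the script behaves like the one Alice used, since rand() output differs between implementations. The final matching step is not shown because the Python standard library has no AES; it would need a separate crypto library.

```python
import calendar
import ctypes
import ctypes.util

# Values taken from the scenario.
PLAINTEXT = bytes.fromhex("255044462d312e350a25d0d4c5d80a34")
CIPHERTEXT = bytes.fromhex("d06bf9d0dab8e8ef880660d2af65aa82")
IV = bytes.fromhex("09080706050403020100A2B2C2D2E2F2")
KEYSIZE = 16

# ASSUMPTION: "2018-04-17 23:08:49" is treated as UTC here. If the clock
# was in another time zone, the whole window shifts by that offset.
file_time = calendar.timegm((2018, 4, 17, 23, 8, 49, 0, 0, 0))

# One candidate seed per second in the two hours before the file time.
candidate_seeds = range(file_time - 2 * 60 * 60, file_time + 1)

# In CBC mode the first ciphertext block is AES_K(P xor IV), so this is the
# block a candidate key must encrypt to CIPHERTEXT.
first_block_input = bytes(p ^ v for p, v in zip(PLAINTEXT, IV))


def load_libc():
    """Load the local C library (ASSUMPTION: same rand() as Alice's)."""
    return ctypes.CDLL(ctypes.util.find_library("c"))


def key_for_seed(libc, seed):
    """Replay the key-generation loop for one seed."""
    libc.srand(seed)
    return bytes(libc.rand() % 256 for _ in range(KEYSIZE))


print("header bytes:", PLAINTEXT[:8])
print("seeds to try:", len(candidate_seeds))
print("AES input block:", first_block_input.hex())
```

With only a few thousand seeds to try, each candidate from key_for_seed(load_libc(), seed) can be tested by encrypting first_block_input with AES-128 as a single raw block and comparing the result with CIPHERTEXT.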

Write the smallest positive number that can be represented by the floating point system

Using a normalised floating point representation box with an 8-bit mantissa and a 4-bit exponent, both stored using two’s complement.

(a) Write the smallest positive number that can be represented by the floating point system in the boxes below. The result is: Mantissa 0.1000000 and exponent 1000

I do not see how this can be the answer; could someone please explain?
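One way to read the stated answer, sketched under the usual textbook convention (an assumption, since the question does not spell it out): the mantissa is a two’s complement fixed-point fraction with the binary point after the sign bit, the exponent is a two’s complement integer, and a normalised positive mantissa must begin 0.1. The smallest such mantissa is 0.1000000 (one half) and the most negative 4-bit exponent is 1000 (minus 8), giving 0.5 × 2^-8 = 2^-9. The helper names below are made up for illustration.

```python
def twos_complement(bits):
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


def decode(mantissa_bits, exponent_bits):
    """Value of mantissa * 2**exponent, binary point after the sign bit."""
    mantissa = twos_complement(mantissa_bits) / (1 << (len(mantissa_bits) - 1))
    exponent = twos_complement(exponent_bits)
    return mantissa * 2.0 ** exponent


# The answer from the question: mantissa 0.1000000, exponent 1000.
smallest = decode("01000000", "1000")
print(twos_complement("1000"))   # -8
print(smallest)                  # 0.001953125, i.e. 2**-9

# Cross-check: every normalised positive mantissa starts with the bits 01,
# so enumerate those against all sixteen exponents and take the minimum.
candidates = [
    decode(f"01{m:06b}", f"{e:04b}")
    for m in range(64)
    for e in range(16)
]
print(min(candidates) == smallest)   # True
```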

How do I correctly use %p and %f in postgresql.conf to make the Write Ahead Log?

I am attempting to modify postgresql.conf (PostgreSQL 10.0 on an Ubuntu 18.04 VPS) to set up the Write Ahead Log (WAL).

AIUI, the commands should be:

    wal_level = replica             # minimal, replica, or logical
    archive_mode = on               # enables archiving; off, on, or always
                                    # (change requires restart)
    archive_command = 'cp %p /var/lib/postgresql/10/main/pg_wal/%f'
    archive_timeout = 600           # force a logfile segment switch after this
                                    # number of seconds; 0 disables

AIUI, %p references the database file (to be archived) and %f references the filename that will contain the copy of said database file.

In this case, I am saying to copy the database file to the /pg_wal directory every 10 minutes (archive_timeout = 600 seconds).

However, this is clearly not correct, as the error log shows:

    cp: 'pg_wal/000000010000000000000002' and '/var/lib/postgresql/10/main/pg_wal/000000010000000000000002' are the same file
    2020-02-16 21:01:05.857 UTC [20707] LOG:  archive command failed with exit code 1
    2020-02-16 21:01:05.857 UTC [20707] DETAIL:  The failed archive command was: cp pg_wal/000000010000000000000002 /var/lib/postgresql/10/main/pg_wal/000000010000000000000002

So I am not sure how %p and/or %f are supposed to be used. If someone could present the syntax, I would be most grateful.
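To make the substitution concrete, here is a small sketch of how the two placeholders expand. In archive_command, %p is replaced by the path of the WAL segment to archive, relative to the data directory, and %f by that segment’s file name only. The helper expand_archive_command is made up for illustration and ignores the %% escape; /mnt/archive is a hypothetical directory outside the data directory, not something from this setup.

```python
import posixpath


def expand_archive_command(template, wal_path):
    """Mimic the %p / %f substitution (illustrative helper, not PostgreSQL code)."""
    file_name = posixpath.basename(wal_path)
    return template.replace("%p", wal_path).replace("%f", file_name)


DATA_DIR = "/var/lib/postgresql/10/main"
WAL_PATH = "pg_wal/000000010000000000000002"   # segment named in the log

# The configured command: source and destination resolve to the same file.
configured = expand_archive_command(
    "cp %p /var/lib/postgresql/10/main/pg_wal/%f", WAL_PATH
)
_, source, destination = configured.split()
same_file = posixpath.normpath(posixpath.join(DATA_DIR, source)) == destination
print(configured)
print("source is destination:", same_file)

# HYPOTHETICAL archive directory that is separate from pg_wal.
separate = expand_archive_command("cp %p /mnt/archive/%f", WAL_PATH)
print(separate)
```

The first expansion reproduces the failed command from the log: the destination is the live pg_wal directory itself, so cp is asked to copy the segment onto itself. Pointing the destination at a separate directory avoids that.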

How do programmers write exploits from a CVE with no known Metasploit exploit?

Please don’t just tell me to go read the book “A Bug Hunter’s Diary”. I’ve noticed that lots of CVEs at www.cvedetails.com do not have publicly available exploits, even though they have high scores (e.g., above 9). With such high scores, I’d have thought exploits would be readily available, but that’s not the case (they are not even present in exploitdb).

So how would a hacker write exploit code based essentially on the description at www.cvedetails.com? Do hackers even bother to do that?

How to write this access control matrix?

This is a simplified dump of the ls -l shell command in the current folder.

    -r--r----- alice   admin   1
    -r--r--r-- bob     bob     2
    -rw-rw---- charlie charlie 3
    -rw-r----- charlie admin   4
    ---x--x--x alice   alice   editor
    ---x--s--- bob     admin   editor-super

Unix users are alice, bob, charlie. root is the system administrator.

The id command for each user returns:

  • id alice: uid=1000(alice) gid=1000(alice) groups=1000(alice),1003(admin)

  • id bob: uid=1001(bob) gid=1001(bob) groups=1001(bob)

  • id charlie: uid=1002(charlie) gid=1002(charlie) groups=1002(charlie), 1003(admin)

There are 2 executable files:

  • editor lets you open a file with Read and Write capabilities;

  • editor-super does the same as editor.

Draw up an access control matrix with subjects {alice, bob, charlie} and objects {1,2,3,4} that shows, for each combination of subject and object, whether the subject will be able to read (R), and/or write (W) the respective object.

Note: root should not appear in the matrix.

Below is my solution. How should I complete it?

              |  1  |  2  |  3  |  4  |
    Alice     |  r  |  r  |     |  r  |
    Bob       |     |  r  |     |     |
    Charlie   |  r  |  r  | rw  | rw  |
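The listing can also be checked mechanically. The sketch below applies the standard Unix rule (owner bits if the user owns the file, otherwise group bits if the user is in the file’s group, otherwise the other bits) to objects 1 to 4. It also computes a second matrix under one labelled assumption: that editor-super, being setgid with group admin and executable by its owner bob, lets bob act with group admin while editing. Whether the exercise intends that reading is not stated, and the function names are made up for illustration.

```python
# (mode string, owner, group) for each object, from the ls -l dump.
FILES = {
    "1": ("-r--r-----", "alice", "admin"),
    "2": ("-r--r--r--", "bob", "bob"),
    "3": ("-rw-rw----", "charlie", "charlie"),
    "4": ("-rw-r-----", "charlie", "admin"),
}

# Group memberships, from the id output.
GROUPS = {
    "alice": {"alice", "admin"},
    "bob": {"bob"},
    "charlie": {"charlie", "admin"},
}


def access(user, groups, mode, owner, group):
    """Return 'R', 'W', 'RW' or '' using the owner/group/other rule."""
    if user == owner:
        bits = mode[1:4]
    elif group in groups:
        bits = mode[4:7]
    else:
        bits = mode[7:10]
    return ("R" if bits[0] == "r" else "") + ("W" if bits[1] == "w" else "")


def matrix(extra_groups=None):
    """Build the access matrix; extra_groups adds effective groups per user."""
    extra_groups = extra_groups or {}
    return {
        user: {
            name: access(user, groups | extra_groups.get(user, set()), *spec)
            for name, spec in FILES.items()
        }
        for user, groups in GROUPS.items()
    }


direct = matrix()
# ASSUMPTION: running the setgid editor-super gives bob effective group admin.
via_setgid = matrix({"bob": {"admin"}})

for label, result in (("direct", direct), ("via editor-super", via_setgid)):
    print(label)
    for user, row in result.items():
        print(f"  {user:8}", {name: perm or "-" for name, perm in row.items()})
```

Under these assumptions the direct matrix reproduces the table above, and the only change when bob goes through editor-super is that he gains R on objects 1 and 4.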