Sampling a uniform distribution of fixed size strings containing no forbidden substrings

Given a list of “forbidden” words (substrings), an alphabet, and a desired output string length, how would I efficiently sample output strings containing no forbidden word?

For short output strings with few forbidden words, I would use simple rejection sampling. Pick a string (uniformly) with the specified alphabet and length, return that string if it contains no element of the forbidden list, try again otherwise.
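The baseline described above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_by_rejection` and the `max_tries` safety cap are assumptions added here, not part of the question.

```python
import random

def sample_by_rejection(alphabet, length, forbidden, rng=random, max_tries=1_000_000):
    """Draw uniform random strings until one contains no forbidden word.

    `max_tries` is a hypothetical safety cap so the loop cannot run forever
    when almost every string is rejected.
    """
    for _ in range(max_tries):
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        # Accept only if no forbidden word occurs anywhere in the candidate.
        if not any(word in candidate for word in forbidden):
            return candidate
    raise RuntimeError("no valid string found within max_tries")
```

Because every candidate is drawn uniformly and accepted or rejected as a whole, the accepted strings are uniform over the valid ones; the cost is the number of retries.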

If I use that algorithm for output lengths several times larger than the typical forbidden word, then the probability of rejection will be higher. (Most words are 2 or 3 characters long.)

Assume the requested output length is too long to enumerate and store every possible value. My alphabet size would be 16 to 36 characters, but solutions to large alphabets would be interesting to think about. (In which case I would call these things random sentences, forbidden n-grams, and dictionary words.)

My forbidden word list will have one hundred to one thousand strings. I would like to avoid solutions requiring expensive precomputation or lots of memory.


My first idea was to try to build a random string incrementally, in contrast to the all-or-nothing approach of straightforward rejection sampling. I doubt that my algorithm produces each possible output with equal probability.

The algorithm idea follows:

  1. Initialize a char buffer long enough to fit outlen characters.
  2. Pick a random letter of the alphabet and append it to the buffer.
  3. If the buffer ends with a forbidden word of length k, then remove the last k letters from the char buffer and go to 2.
  4. Otherwise, go to 2 if the buffer has less than outlen characters.
  5. Return the contents of the buffer if it is full.

Step 3 serves to rewind the algorithm, returning the char buffer to a previous legal state.
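For concreteness, the steps above might look like the following Python sketch. The name `sample_incremental` is made up for illustration, forbidden words are assumed to be non-empty, and when several forbidden words match the end of the buffer the sketch simply rewinds by the first match in list order, which the steps leave unspecified. Nothing here shows that the output is uniform.

```python
import random

def sample_incremental(alphabet, outlen, forbidden, rng=random):
    """Build a string letter by letter, rewinding when a forbidden word appears.

    Assumes every forbidden word is non-empty. May loop forever if the
    forbidden list rules out every possible continuation.
    """
    max_len = max((len(w) for w in forbidden), default=0)
    buf = []                                   # step 1: the character buffer
    while len(buf) < outlen:                   # step 4: continue until full
        buf.append(rng.choice(alphabet))       # step 2: append a random letter
        tail = "".join(buf[-max_len:])         # only the tail can newly match
        for word in forbidden:                 # step 3: rewind on a match
            if tail.endswith(word):
                del buf[-len(word):]           # drop the last k letters
                break
    return "".join(buf)                        # step 5: buffer is full
```

Only the end of the buffer needs checking, because the buffer was legal before the latest letter was appended and removing a suffix of a legal buffer leaves it legal.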

I understand that clearing the whole buffer in step 3 definitely would produce uniform output just like the straightforward rejection sampling method. However, the average number of rejections before the first valid output is generated will be the same.

I’ve gotten stuck trying to determine if my proposed algorithm is uniform. I have had no luck finding alternative algorithms either. I haven’t yet looked at how this algorithm’s performance would compare to basic rejection sampling.

How does one calculate the distribution of the Matt Colville way of rolling stats?

Specifically, the Matt Colville way of rolling stats is:

  1. Roll 4d6, drop the lowest value die for 1 stat;
  2. If this roll is lower than 8, reroll it;
  3. Repeat steps 1 and 2 until you have a set of 6 stats, each 8 or higher;
  4. If there are not at least 2 values of 15 or higher in this set, drop it entirely and start over.
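As a cross-check for any exact calculation, the procedure above can be simulated directly. The following Python sketch is a Monte Carlo illustration only; the function names are invented here, and it keeps a roll of exactly 8 (rerolling only totals of 3 to 7).

```python
import random
from collections import Counter

def roll_stat(rng):
    """Steps 1-2: 4d6 drop lowest, rerolled while the total is below 8."""
    while True:
        dice = sorted(rng.randint(1, 6) for _ in range(4))
        total = sum(dice[1:])          # drop the lowest die
        if total >= 8:
            return total

def roll_stat_set(rng):
    """Steps 3-4: six stats, discarded unless at least two are 15 or higher."""
    while True:
        stats = [roll_stat(rng) for _ in range(6)]
        if sum(s >= 15 for s in stats) >= 2:
            return stats

def estimate_sorted_distribution(trials, rng):
    """Estimate the distribution of the k-th highest stat (k = 1..6)."""
    counts = [Counter() for _ in range(6)]
    for _ in range(trials):
        for k, value in enumerate(sorted(roll_stat_set(rng), reverse=True)):
            counts[k][value] += 1
    return [{v: c / trials for v, c in sorted(ctr.items())} for ctr in counts]
```

Calling `estimate_sorted_distribution(100_000, random.Random())` gives approximate probabilities for each of the six sorted positions, with the whole-set discard already taken into account.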

I’ve written some AnyDice code for calculating this process’s distribution but I got stuck at this:

    function: ROLL:n reroll BAD:s as REROLL:d {
      if ROLL = BAD { result: REROLL }
      result: ROLL
    }

    function: ROLL:d reroll BAD:s {
      loop I over {1..20} {
        ROLL: [ROLL reroll BAD as ROLL]
      }
      result: ROLL
    }

    X: [highest 3 of 4d6]
    Y: 6 d[dX reroll {3..7}]

    loop P over {1..6} {
      output P @ Y named "Ability [P]"
    }

This gives me the probabilities for all my abilities individually, but does not take into account the discarding of the set if there are not at least 2 15s. How should I make it take that into account? (Or how do I calculate this distribution in another way?)

Unable to apt-install JDK 8, Oracle distribution

I am trying to install oracle-java8 on Ubuntu 18.04. I followed several tutorials, but I keep getting:

E: Package 'oracle-java8-installer' has no installation candidate   

Here is how I get to that point:

    $ sudo apt-get purge openjdk-\*
    $ sudo apt autoremove
    $ sudo apt-get update
    $ sudo add-apt-repository ppa:webupd8team/java
    $ sudo apt-get update
    $ sudo apt-get install oracle-java8-installer

and this is the full output that I get running the last command:

    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    Package oracle-java8-installer is not available, but is referred to by another package.
    This may mean that the package is missing, has been obsoleted, or
    is only available from another source

    E: Package 'oracle-java8-installer' has no installation candidate

Looking for similar packages with:

    apt-cache search oracle-java

gives me no results.

Can somebody help me out here?

As of now, is there a Linux distribution supporting Apple keyboard and trackpad out of box?

Half a year ago I tried to multi-boot Ubuntu 18.0 on my MacBook Pro (2018). It booted from the attached USB stick, but the keyboard and trackpad did not work. An external keyboard is not a portable solution for a laptop.

I have heard that the Apple keyboard and trackpad need specific drivers, and that there are efforts to provide these drivers for the Linux kernel. So far, everything I have found requires building these drivers and including them in the kernel manually. I would do that eventually, but it is a lot of work.

What is the exact current status of this work? Is there any distribution, as of the end of 2019, that supports at least the keyboard out of the box?

Sharepoint list not receiving distribution group emails

We have a SharePoint list set up to receive emails and attachments, which works fine when emailing that address directly. However, we’ve recently added it to an existing universal Exchange distribution group, and emails sent to the group aren’t being received by the SharePoint server. I’ve checked all sub-folders inside C:\inetpub\mailroot and nothing gets dropped in there.

I’ve confirmed we can add external contacts (such as Gmail) to distribution lists and those seem to go out without a problem.

SharePoint Version: 2016 Foundation

Any ideas how to continue troubleshooting this?

Why is the distribution of the clustering coefficient of a random network independent of degree?

I was reading about clustering coefficient distribution, and it seems that it is independent of node degree for the case of random networks. I’m wondering why this is the case conceptually.

I do understand that the degree distribution in the case of a random network shows a Poisson behavior, but don’t understand why the clustering coefficient shows no change with degree.
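One way to see the effect empirically is a small simulation. The sketch below assumes that "random network" means the Erdős–Rényi model G(n, p), in which every pair of nodes is linked independently with probability p; the function names are invented for illustration. In that model, any two neighbours of a node are themselves linked with probability p, regardless of how many neighbours the node has.

```python
import random
from collections import defaultdict
from itertools import combinations

def random_graph(n, p, rng):
    """Erdős–Rényi G(n, p): each pair is linked independently with probability p."""
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def local_clustering(adj, v):
    """Fraction of neighbour pairs of v that are linked (None if degree < 2)."""
    k = len(adj[v])
    if k < 2:
        return None
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def clustering_by_degree(adj):
    """Average local clustering coefficient for each observed degree."""
    buckets = defaultdict(list)
    for v in range(len(adj)):
        c = local_clustering(adj, v)
        if c is not None:
            buckets[len(adj[v])].append(c)
    return {k: sum(cs) / len(cs) for k, cs in sorted(buckets.items())}
```

Printing `clustering_by_degree(random_graph(500, 0.05, random.Random()))` should show averages scattered around p for every degree, rather than a trend in the degree.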

Combinatorial optimization, how to select the optimal gamma distribution?

Setup: Let $A = \{X_1, \ldots, X_n\}$ be independent, but not necessarily identical, Bernoulli random variables. Suppose you are given a set of $m + 1$ weights, $W = \{w_0, w_1, \ldots, w_m\}$, with $m \leq n$ and $-1 \leq w_k \leq 1$ for all $0 \leq k \leq m$. The objective is to select a set $S \subset A$ of $m$ random variables, say $S = \{X_{s_1}, \ldots, X_{s_m}\}$, that maximizes

$$\sum_{i = 0}^{m} w_i \, \mathbb{P}\bigg(\sum_{j = 1}^{m} X_{s_j} = i\bigg)$$
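To make the objective concrete, the following Python sketch evaluates it for a given selection and finds the best selection by exhaustive search. It is exponential in $n$ and only useful for checking small instances; the function names are assumptions for illustration, and each variable is represented by its success probability.

```python
from itertools import combinations

def sum_distribution(probs):
    """Return dist where dist[i] = P(sum of independent Bernoulli(probs) = i)."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for i, q in enumerate(dist):
            nxt[i] += q * (1 - p)      # this variable is 0
            nxt[i + 1] += q * p        # this variable is 1
        dist = nxt
    return dist

def objective(weights, probs):
    """Weighted sum of P(sum = i) over i = 0..m, with m = len(probs)."""
    return sum(w * q for w, q in zip(weights, sum_distribution(probs)))

def brute_force_best(weights, all_probs):
    """Try every subset of size m = len(weights) - 1; return the best indices."""
    m = len(weights) - 1
    return max(
        combinations(range(len(all_probs)), m),
        key=lambda idx: objective(weights, [all_probs[i] for i in idx]),
    )
```

For example, `brute_force_best([0, 0, 1], [0.1, 0.9, 0.8])` picks the two variables most likely to equal 1, since only the outcome "both are 1" carries weight.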

Question: Can the optimal solution, or even a constant factor approximation, be computed in polynomial time?

Prior Work: If we have either $w_0 \leq w_1 \leq \ldots \leq w_m$ or $w_0 \geq w_1 \geq \ldots \geq w_m$, then we can achieve the optimal solution by selecting the $X_i$’s with the highest probability of yielding 1 or those with the lowest probability, respectively. But does this work for a general $W$ bounded by $-1$ and $1$?