Is it possible to model these probabilities in AnyDice?

I was asked by a pal to help them model a dice mechanic in AnyDice. I must admit I am an absolute neophyte with it, and I offered to solve it using software I’m better versed in. Still, I’d like to be able to help them do this in AnyDice.

The mechanic is as follows:

  • The player and their opponent are each assigned a pool of dice. This is done via other mechanics of the game, the details of which are not germane. Suffice to say, the player will have some set of dice (say 2D6, 1D8, and 1D12) that will face off against the opponent’s pool (which will generally be different from the player’s, say 3D6, 2D8, and 1D12).
  • The player and their opponent roll their pools.
  • The opponent notes their highest value die. This is the target.
  • The player counts the number of their dice that have a higher value than the target, if any.
  • The count of the dice exceeding the target, if any, is the number of success points.
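The mechanic above can be checked by brute-force enumeration outside AnyDice; here is a minimal Python sketch of how I did it (the function names are mine, and the example pools at the end are the hypothetical ones from my description):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def max_dist(pool):
    """Exact distribution of the highest die in a pool (list of die sizes)."""
    dist = Counter()
    for rolls in product(*(range(1, d + 1) for d in pool)):
        dist[max(rolls)] += 1
    total = sum(dist.values())
    return {v: Fraction(c, total) for v, c in dist.items()}

def success_dist(player_pool, opp_pool):
    """Distribution of the number of player dice strictly above the opponent's highest die."""
    result = Counter()
    for target, p_t in max_dist(opp_pool).items():
        # successes are a sum of independent Bernoullis; convolve one player die at a time
        counts = {0: Fraction(1)}
        for d in player_pool:
            p_over = Fraction(max(0, d - target), d)  # P(this die beats the target)
            nxt = Counter()
            for k, p in counts.items():
                nxt[k] += p * (1 - p_over)
                nxt[k + 1] += p * p_over
            counts = nxt
        for k, p in counts.items():
            result[k] += p_t * p
    return dict(result)

# 2D6 + 1D8 + 1D12 for the player vs 3D6 + 2D8 + 1D12 for the opponent
dist = success_dist([6, 6, 8, 12], [6, 6, 6, 8, 8, 12])
for k in sorted(dist):
    print(k, float(dist[k]))
```

Enumerating the opponent’s pool directly is fine at these sizes; only the player’s dice need the convolution trick, since each of them independently either beats the target or not.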

I searched the AnyDice tag here for questions that might be similar; the closest I found was "Modelling an opposed dice pool mechanic in AnyDice", specifically the answer by Ilmari Karonen.

That question and answer, however, deals with only a single die type.

Can a question like "What are the probabilities of N successes when rolling 4D6 and 6D20 as the player against 6D6 and 4D20 for the opponent?" be handled in AnyDice, producing a chart similar to the one below?

[chart: probability of each success count, generated outside AnyDice]

How to calculate the probabilities for eliminative dice pools (dice cancelling mechanic) in Neon City Overdrive?

The game Neon City Overdrive uses the following resolution mechanic for checks:

  1. create a pool of Action Dice and (possibly) another pool of differently-colored Danger Dice (all d6, generally up to 5 or 6 dice in each pool)
  2. roll all the dice
  3. each Danger Die cancels out an Action Die with the same value – both are discarded
  4. the highest remaining Action Die (if there is any) is the result (the precise meaning of which is irrelevant for the purposes of this question)
    • any extra Action Dice showing 6 (i.e. in addition to the single highest die read as the result) provide a critical success (called a boon)
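For what it’s worth, the one-to-one cancellation in step 3 is just a multiset difference, which is easy to express outside AnyDice; here is a Python sketch of a single resolution (function and variable names are mine):

```python
from collections import Counter
import random

def resolve(action, danger):
    """One-to-one cancellation: each Danger die discards one matching Action die."""
    remaining = Counter(action) - Counter(danger)  # multiset difference keeps the surplus
    if not remaining:
        return None, 0                             # no Action dice left
    result = max(remaining.elements())             # highest remaining Action die
    boons = remaining[6] - (1 if result == 6 else 0)  # extra 6s beyond the result die
    return result, boons

random.seed(1)
action = [random.randint(1, 6) for _ in range(4)]
danger = [random.randint(1, 6) for _ in range(3)]
print(action, danger, resolve(action, danger))
```

`Counter` subtraction drops counts at or below zero, which is exactly the one-to-one pairing (unlike Technoir’s one-cancels-all rule).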

I’m struggling to find the proper way to model the probabilities of this mechanic in AnyDice.

I realize that a good starting point would be this answer to a very similar question regarding the mechanic in Technoir (which clearly was a source of inspiration for Neon City Overdrive). Unfortunately, despite my best efforts I can’t say I fully comprehend how the code provided there works, and there’s an important difference between the two games: in Technoir a single "negative die" eliminates all matching "positive dice", whereas in NCO this happens on a one-to-one basis.

I would be very grateful for any help.

Using the feature embedding of the output from a transformer to represent probabilities of categorical data

I am considering using a transformer on input data that can be represented as an embedding, so that I can use the attention mechanism of the transformer architecture; my data has variable input and output lengths, and the input is sequential. My output data is supposed to be either numerical or a probability for each output variable. The output was originally supposed to be 13 numerical values, but I decided to use probability scores as a way of normalizing the output. My question is: can I use two output vectors with 7 features each instead of 13 numeric outputs? Each feature would map to one of the original outputs, and the last feature would always be 0, since PyTorch (as I understand it) expects the output to have the same number of features as the input, and my input variables are embedded as 7 features. Should this approach work? I am unsure how the loss function would behave here, or whether there is a loss function that would allow for this.
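To make the padding idea concrete, here is a plain-Python sketch of the scheme I have in mind (all names are mine, and the hand-written softmax/cross-entropy merely stands in for whatever the framework provides):

```python
import math

def split_targets(values):
    """Split 13 numeric targets into two 7-feature vectors; the 14th slot is a constant 0 pad."""
    assert len(values) == 13
    return values[:7], values[7:] + [0.0]

def softmax(logits):
    """Turn a head's raw outputs into a probability vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, logits):
    """Loss for one head: H(p, softmax(logits)); sum the two heads for the total loss."""
    q = softmax(logits)
    return -sum(p * math.log(max(qi, 1e-12)) for p, qi in zip(target_probs, q))

# illustrative targets: each 7-slot half sums to 1 (the pad slot is always 0)
targets = [0.1, 0.2, 0.0, 0.3, 0.1, 0.2, 0.1, 0.4, 0.1, 0.2, 0.1, 0.1, 0.1]
head1, head2 = split_targets(targets)
print(cross_entropy(head1, [0.0] * 7) + cross_entropy(head2, [0.0] * 7))
```

The point is simply that each head is trained against its own 7-slot probability vector, and the per-head losses are summed.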

Channel coding and error probability: where do these probabilities come from?

Where do the following probabilities come from?

We consider a BSCε with ε = 0.1 and the block code C = {c1, c2} with codewords c1 = 010 and c2 = 101. On the received word y we use the decoder D = {D1, D2}, which decodes y to the codeword with the lowest Hamming distance to y. Determine D1 and D2 and the global error probability ERROR(D), given that the codewords are equiprobable. Hint: for a given output y there is only one x that leads to a decoding failure. (y = 100 is decoded incorrectly only if the sent message was x = c1 = 010.) So the term (1 − p(D(y)|y)) equals Pr[X = x|Y = y] for a suitable x.


Hamming distances:

$$\begin{array}{c|cc}
y & 010 & 101 \\ \hline
000 & 1 & 2 \\
001 & 2 & 1 \\
010 & 0 & 3 \\
011 & 1 & 2 \\
100 & 2 & 1 \\
101 & 3 & 0 \\
110 & 1 & 2 \\
111 & 2 & 1
\end{array}$$

$$D_1 = \{000, 010, 011, 110\} \quad \text{(decides for 010)}$$
$$D_2 = \{001, 100, 101, 111\} \quad \text{(decides for 101)}$$

$$\begin{aligned}
\mathrm{ERROR}(D) &= \sum_{y \in \Sigma_A^3} p(y)\,\bigl(1 - p(D(y) \mid y)\bigr) \\
&= 2 \cdot p(y)\bigl(1 - p(D(y) \mid y)\bigr) + 6 \cdot p(y)\bigl(1 - p(D(y) \mid y)\bigr) \\
&= 2 \cdot \left(\frac{729}{2000} + \frac{1}{2000}\right)\left(\frac{7}{250}\right) + 6 \cdot \left(\frac{81}{2000} + \frac{9}{2000}\right)\left(\frac{757}{1000}\right)
\end{aligned}$$

How do I get to the probabilities $\frac{7}{250}$ and $\frac{757}{1000}$?

I don’t follow this calculation. It is supposed to be correct, but I don’t see how to arrive at these probabilities.

Could someone explain this to me?
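I tried to sanity-check the numbers by enumerating all eight received words with exact fractions (Python, my own sketch); this does reproduce 7/250 as the overall ML-decoding error probability, which matches $3\varepsilon^2(1-\varepsilon)+\varepsilon^3 = 0.028$ per codeword (equal to the global error by symmetry):

```python
from fractions import Fraction
from itertools import product

eps = Fraction(1, 10)
codewords = ["010", "101"]

def p_y_given_x(y, x):
    """BSC(eps): each bit flips independently with probability eps."""
    p = Fraction(1)
    for a, b in zip(y, x):
        p *= eps if a != b else 1 - eps
    return p

def decode(y):
    """Minimum Hamming distance decoder (no ties here, since d(c1, c2) = 3)."""
    return min(codewords, key=lambda c: sum(a != b for a, b in zip(y, c)))

error = Fraction(0)
for bits in product("01", repeat=3):
    y = "".join(bits)
    for c in codewords:
        if decode(y) != c:
            error += Fraction(1, 2) * p_y_given_x(y, c)  # equal priors 1/2

print(error)  # -> 7/250
```

That still leaves me puzzled about how the quoted solution decomposes the sum, in particular where 757/1000 enters.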

AnyDice: How to calculate opposed roll probabilities for an unusual mechanic

I’m currently playing with a dice mechanic that is basically: roll 2d10 and compare each die to a target number. If one die rolls equal to or below the TN, it’s a weak success. If both roll equal to or below, it’s a strong success.

I’ve managed to figure out the probabilities behind that (not on Anydice – but I’ve done it manually in Google Sheets).
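The single-roll numbers I worked out in Sheets reduce to a one-line formula per outcome; a quick Python check with exact fractions (names are mine):

```python
from fractions import Fraction

def outcome_probs(tn):
    """Roll 2d10 vs a target number: both dice <= TN is a strong success,
    exactly one is a weak success, neither is a failure."""
    p = Fraction(tn, 10)  # chance a single d10 rolls at or below the TN
    return {
        "strong": p * p,
        "weak": 2 * p * (1 - p),
        "fail": (1 - p) * (1 - p),
    }

print(outcome_probs(6))
```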

I’ve hit a stumbling block when it comes to opposed rolls. How it works is:

  1. Both parties roll their 2d10 and compare to the attribute they’re rolling.
  2. Count the number of successes (equal to or below the attribute)
  3. The party with the most successes wins

The complicated part (mathematically, anyway) is that two fails is a draw, and if both parties get 1 success, the highest roll (below the stat) wins. If both parties have 2 successes, the highest roll wins, or the second highest if the highest numbers are the same. If both parties still have the same numbers, it’s a draw. Basically it’s a blackjack system: roll high, but below the target number.

It’s quite simple in practice but I have no idea how to calculate the probabilities.

To give an example, the player swings a sword with their 6 strength. They score a 2 and a 5 (two successes). The monster dodges with its 5 agility and scores a 1 and a 5. Both scored a 5 so that’s a tie, but the player’s 2 is higher than the monster’s 1. The player deals damage.
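Since each side only rolls 2d10, all 10⁴ combinations can be enumerated directly; here is a Python sketch of the rules as I described them (my own naming; comparing the sorted success lists lexicographically captures the “highest, then second highest, else draw” tiebreak, and two empty lists give the double-fail draw):

```python
from collections import Counter
from itertools import product

def contest(att, dfn):
    """Enumerate every 2d10 vs 2d10 opposed roll for attacker stat `att` vs defender stat `dfn`."""
    tally = Counter()
    for a1, a2, d1, d2 in product(range(1, 11), repeat=4):
        a_succ = sorted((x for x in (a1, a2) if x <= att), reverse=True)
        d_succ = sorted((x for x in (d1, d2) if x <= dfn), reverse=True)
        if len(a_succ) != len(d_succ):
            tally["attacker" if len(a_succ) > len(d_succ) else "defender"] += 1
        elif a_succ > d_succ:      # lexicographic: highest die, then second highest
            tally["attacker"] += 1
        elif a_succ < d_succ:
            tally["defender"] += 1
        else:
            tally["draw"] += 1     # includes the both-fail case (two empty lists)
    total = sum(tally.values())
    return {k: v / total for k, v in tally.items()}

print(contest(6, 5))  # e.g. strength 6 attacker vs agility 5 defender
```

On the example: the player’s successes sort to [5, 2] and the monster’s to [5, 1]; [5, 2] > [5, 1] lexicographically, so the attacker wins, as described.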

I hope I’ve explained myself correctly. If anyone could help me, it would be greatly appreciated. I’ve reached the edge of my mathematical ability here.

What kind of smoothing was applied to these bigram probabilities?

A certain program computes bigram probabilities, applying a smoothing factor of K=1, given the corpus 12 1 13 12 15 234 2526. It performs the following operations; first it computes “unnormalized bigrams”:

{'12': {'1': 2.0, '15': 2.0}, '1': {'13': 2.0}, '13': {'12': 2.0}, '15': {'234': 2.0}, '234': {'2526': 2.0}}. All of those 2.0 values are from doing k+1.

Then shows the “normalized bigrams”:

{'12': {'1': 0.2, '15': 0.2}, '1': {'13': 0.25}, '13': {'12': 0.25}, '15': {'234': 0.25}, '234': {'2526': 0.25}}.

I don’t know the logic behind these operations. With Laplace smoothing, for example, given P(1|12) = 1/2, the smoothed value would be (1+1)/(2+6) = 0.25. So shouldn’t it be 0.25 instead of 0.2?
This is the stripped down code from the original one:

```python
from __future__ import print_function
from __future__ import division
import re

class LanguageModel:
    "unigram/bigram LM, add-k smoothing"
    def __init__(self, corpus):
        words = re.findall('[0123456789]+', corpus)
        uniqueWords = list(set(words))  # make unique
        self.numWords = len(words)
        self.numUniqueWords = len(uniqueWords)
        self.addK = 1.0

        # create unigrams
        self.unigrams = {}
        for w in words:
            w = w.lower()
            if w not in self.unigrams:
                self.unigrams[w] = 0
            self.unigrams[w] += 1 / self.numWords

        # create unnormalized bigrams
        bigrams = {}
        for i in range(len(words) - 1):
            w1 = words[i].lower()
            w2 = words[i + 1].lower()
            if w1 not in bigrams:
                bigrams[w1] = {}
            if w2 not in bigrams[w1]:
                bigrams[w1][w2] = self.addK  # add-K
            bigrams[w1][w2] += 1

        # normalize bigrams
        for w1 in bigrams.keys():
            # sum up
            probSum = self.numUniqueWords * self.addK  # add-K smoothing
            for w2 in bigrams[w1].keys():
                probSum += bigrams[w1][w2]
            # and divide
            for w2 in bigrams[w1].keys():
                bigrams[w1][w2] /= probSum
        self.bigrams = bigrams

        print('Unigrams : ')
        print(self.unigrams)
        print('Bigrams : ')
        print(self.bigrams)


if __name__ == '__main__':
    LanguageModel('12 1 13 12 15 234 2526')
```
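As far as I can tell, the 0.2 arises because this code both seeds each seen bigram with k and then counts it (so ‘12’→‘1’ becomes 2.0), and then also adds k·V to the normalizer, giving 6 + 2 + 2 = 10 for ‘12’ and hence 2/10 = 0.2. For comparison, here is a minimal version of textbook add-k smoothing, P(w2|w1) = (count(w1,w2) + k) / (count(w1) + k·V), with my own naming:

```python
import re
from collections import Counter

def addk_bigrams(corpus, k=1.0):
    """Textbook add-k: numerator = raw bigram count + k, denominator = count(w1) + k*V."""
    words = re.findall(r"[0-9]+", corpus)
    vocab = set(words)
    uni = Counter(words[:-1])                 # counts of each word in the w1 position
    bi = Counter(zip(words, words[1:]))       # raw bigram counts
    return {
        (w1, w2): (bi[(w1, w2)] + k) / (uni[w1] + k * len(vocab))
        for w1 in uni for w2 in vocab
    }

probs = addk_bigrams("12 1 13 12 15 234 2526")
print(probs[("12", "1")])  # -> 0.25, the value Laplace smoothing predicts
```

With this formula each conditional distribution P(·|w1) sums to 1 over the vocabulary, which the program’s normalization does not guarantee.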

Find the probabilities of ending up in each leaf node of a directed acyclic graph with uniform transition probabilities

This is from a competitive programming contest.

Given a DAG and an arbitrary starting node $S$, where at each step the next child node is selected uniformly at random, find the probability of ending up in each leaf node.

My current approach is using BFS to find the leaf nodes and then using DFS to compute the probability of choosing a path from $S$ to a leaf node $L_i$.

```cpp
vector<int> BFS(vector<list<int>>& adj, int n, int src){
    vector<int> result;
    vector<bool> visited(n, false);
    queue<int> q;

    q.push(src);
    visited[src] = true;

    while(!q.empty()){
        src = q.front();
        q.pop();

        if(adj[src].size() == 0)
            result.emplace_back(src);

        for(int child : adj[src]){
            if(!visited[child]){
                visited[child] = true;
                q.push(child);
            }
        }
    }
    return result; // was missing in the original snippet
}
```
```cpp
map<pair<int, int>, double> dp; // Memoization

double DFS(vector<list<int>>& adj, int src, int dst){
    if(src == dst) return 1;

    auto it = dp.find(make_pair(src, dst));
    if(it != dp.end()) return it->second;

    double probability = 0;
    for(int child : adj[src]){
        probability += DFS(adj, child, dst) / adj[src].size();
    }

    dp.insert(make_pair(make_pair(src, dst), probability));
    return probability;
}
```

This seems to work for every example I can come up with, and it even passes the first test; however, it gives a wrong answer for the second one, and I cannot figure out why.
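For comparison, I also tried a single forward pass that pushes probability mass through the graph in topological order instead of running a DFS per leaf; this is O(V+E) overall, and using exact fractions sidesteps floating-point rounding (only a guess at the failure mode, since I don’t have the hidden test data). A Python sketch of the same idea:

```python
from collections import deque
from fractions import Fraction

def leaf_probabilities(adj, src):
    """Kahn-style topological pass: prob[child] += prob[node] / outdegree(node)."""
    n = len(adj)
    indeg = [0] * n
    for u in range(n):
        for v in adj[u]:
            indeg[v] += 1
    prob = [Fraction(0)] * n
    prob[src] = Fraction(1)
    q = deque(u for u in range(n) if indeg[u] == 0)
    leaves = {}
    while q:
        u = q.popleft()
        if not adj[u] and prob[u] > 0:   # reachable leaf: record its mass
            leaves[u] = prob[u]
        for v in adj[u]:
            prob[v] += prob[u] / len(adj[u])
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return leaves
```

On a small example, starting at node 0 of `[[1, 2], [3], [4], [], []]`, each of the leaves 3 and 4 gets probability 1/2.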

Has anyone done research on average RPG probabilities?

I’ve been looking at designing an RPG system lately, and I’m currently considering where to set the “difficulty” of a roll to get some satisfying results.

I’m wondering – has anyone researched some “average” RPG probabilities that “feel good”? My guess would be having something like a 50% success chance as a baseline, then turning it up to something like 75%, 87.5%, etc. as proficiency goes up seems to at least sound good on paper, but it would be nice to have a comparison to some games out there.

I’d be interested to know what kind of probabilities have a good “game feel” – how difficult ought it to be to succeed at a roll of some task you’re new at, something you’re proficient at, and something you’re a master of for example.

I know that in systems like D&D your to-hit bonuses and enemy defences both scale, while many PbtA-style games have a somewhat static difficulty. I’m wondering if there are any design docs on roll probabilities, or does anyone have design experience to share on such numbers?

Decoding problem and conditional probabilities

I’m reading the book by MacKay “Information theory, inference and learning algorithms” and I’m confused by how he introduces the decoding problem for LDPC codes (page 557).

given a channel output $\mathbf{r}$, we wish to find the codeword $\mathbf{t}$ whose likelihood $P(\mathbf{r}|\mathbf{t})$ is biggest.

I can get on board with that, even if it seems a bit backwards: since $\mathbf{r}$ is the evidence we have, I would be more comfortable if we were trying to find the $\mathbf{t}$ that maximizes $P(\mathbf{t}|\mathbf{r})$, and as a matter of fact, he goes on to say

there are two approaches to the decoding problem, both of which lead to the generic problem ‘find $\mathbf{x}$ that maximizes $$P^*(\mathbf{x})=P(\mathbf{x})\,\mathbb{1}[\mathbf{Hx}=\mathbf{z}]$$’

he exposes the two points of view in the following page, the first one especially confuses me

the codeword decoding point of view

First, we note the prior distribution over codewords $$P(\mathbf{t})=\mathbb{1}[\mathbf{Ht}=\mathbf{0}]$$ […]. The posterior distribution over codewords is given by multiplying the prior by the likelihood […] $$P(\mathbf{t}|\mathbf{r})\propto P(\mathbf{t})\,P(\mathbf{r}|\mathbf{t})$$

from the generic decoding problem he gave, it looks like we’re actually trying to maximize $P(\mathbf{x})=P(\mathbf{t}|\mathbf{r})$, instead of $P(\mathbf{r}|\mathbf{t})$.

Is it obvious that the two maxima are the same, with the same maximizer? It is not obvious to me, since the maximizer $\mathbf{t_0}$ of $P(\mathbf{r}|\mathbf{t})$ might actually minimize $P(\mathbf{t})$, so the maximizer of $P(\mathbf{t})P(\mathbf{r}|\mathbf{t})$ might be different! I understand that in this case $P(\mathbf{t})$ is uniform, so this shouldn’t be a problem, but it seems weird to me that this is simply not stated, and I feel like I’m missing something.
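If I write out my own reasoning for why the two problems coincide under the uniform prior (which I believe is the unstated step):

```latex
\begin{aligned}
P(\mathbf{t}\mid\mathbf{r})
  &\propto P(\mathbf{t})\,P(\mathbf{r}\mid\mathbf{t})
   \propto \mathbb{1}[\mathbf{Ht}=\mathbf{0}]\,P(\mathbf{r}\mid\mathbf{t}), \\
\arg\max_{\mathbf{t}:\,\mathbf{Ht}=\mathbf{0}} P(\mathbf{t}\mid\mathbf{r})
  &= \arg\max_{\mathbf{t}:\,\mathbf{Ht}=\mathbf{0}} P(\mathbf{r}\mid\mathbf{t}),
\end{aligned}
```

i.e. restricted to the set of valid codewords the prior is a constant factor, so maximizing the posterior over codewords and maximizing the likelihood over codewords pick out the same $\mathbf{t}$.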

Why should we start with the problem of maximizing $P(\mathbf{r}|\mathbf{t})$ and not $P(\mathbf{t}|\mathbf{r})$? Why does he seem to switch after a few sentences? Am I correct in thinking the algorithm presented actually maximizes $P(\mathbf{t}|\mathbf{r})$?

Thank you