Strictly Proper Scoring Rules and f-divergences

Let $S$ be a scoring rule for probability functions. Define

$EXP_{S}(Q|P) = \sum_{w} P(w)S(Q, w)$.

Say that $S$ is strictly proper if and only if, for every $P$, $EXP_{S}(Q|P)$ is uniquely minimised as a function of $Q$ at $Q = P$. Define

$D_{S}(P, Q) = EXP_{S}(Q|P) - EXP_{S}(P|P)$.

If $S$ is the logarithmic scoring rule defined by $S(P, w) = -\ln(P(w))$, then $D_{S}(P, Q)$ is just the Kullback-Leibler divergence between $P$ and $Q$, or equivalently, the inverse Kullback-Leibler divergence between $Q$ and $P$. Note that the inverse Kullback-Leibler divergence is an $f$-divergence.
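Writing out the computation from the definitions above, just to fix signs and orientation:

$$D_{S}(P, Q) = \sum_{w} P(w)\bigl(-\ln Q(w)\bigr) - \sum_{w} P(w)\bigl(-\ln P(w)\bigr) = \sum_{w} P(w)\ln\frac{P(w)}{Q(w)} = \mathrm{KL}(P \,\|\, Q).$$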

My question is this: is there any other strictly proper scoring rule $S$ such that $D_{S}(P, Q)$ is equal to $F(Q, P)$ for some $f$-divergence $F$?

I think that $D_{S}(P, Q)$ is always a Bregman divergence (a sketch of why is below), and Amari proved that the only $f$-divergence that is also a Bregman divergence (on the space of probability functions) is the Kullback-Leibler divergence. Is this enough to imply that there are no other strictly proper scoring rules with this property?
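Here is my reasoning for the Bregman claim, in case it matters, assuming $S$ is regular enough that its generalized entropy $G(P) = EXP_{S}(P|P)$ is differentiable. The Savage representation of a proper (negatively oriented) score is

$$S(Q, w) = G(Q) + \nabla G(Q) \cdot (\delta_{w} - Q),$$

with $\delta_{w}$ the point mass at $w$ and $G$ concave. Averaging over $w$ with respect to $P$ gives $EXP_{S}(Q|P) = G(Q) + \nabla G(Q) \cdot (P - Q)$, and hence

$$D_{S}(P, Q) = G(Q) - G(P) + \nabla G(Q) \cdot (P - Q),$$

which is the Bregman divergence generated by the convex function $-G$.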

Python Pandas NLTK: Adding Frequency Counts or Importance Scoring to Part of Speech Chunks on Dataframe Text Column

I did NLTK part-of-speech tagging followed by chunking on one column ("train_text") inside my Pandas data frame.

Below is my code, which ran successfully, along with sample output.

import nltk

def process_content():
    try:
        # train_text is the "train_text" column of the data frame (an iterable of strings)
        for i in train_text:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            # chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""
            chunkGram = r"""Chunk: {<VB.?><NN.?>}"""
            chunkParser = nltk.RegexpParser(chunkGram)
            chunked = chunkParser.parse(tagged)

            for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):
                print(subtree)

    except Exception as e:
        print(str(e))

process_content()

Results: "xxx" stands for a word; the first word in each chunk is a verb and the second is a noun.

(Chunk xxx/VBN xxx/NN)
(Chunk xxx/VBN xxx/NN)
(Chunk xxx/VBN xxx/NN)
(Chunk xxx/VBN xxx/NN)
(Chunk xxx/VBN xxx/NN)

Now that I have the chunks of words, I want to find the 10 most frequently occurring or prominent Verb + Noun chunks. Is there any way I can attach a frequency or importance score to each chunk?
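Here is the kind of thing I have in mind, as a minimal sketch: reuse the same grammar, key each matched chunk by its (verb, noun) words, and tally the keys in a collections.Counter, then take the 10 most common. (The helper name top_chunks and the data frame name df are just placeholders; train_text is the "train_text" column as above, and the NLTK tokenizer/tagger models are assumed to be downloaded.)

from collections import Counter

import nltk

def top_chunks(texts, n=10):
    # Count every Verb + Noun chunk in an iterable of strings and return the n most common.
    chunk_counts = Counter()
    chunk_parser = nltk.RegexpParser(r"""Chunk: {<VB.?><NN.?>}""")
    for text in texts:
        tagged = nltk.pos_tag(nltk.word_tokenize(text))
        chunked = chunk_parser.parse(tagged)
        for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):
            # Key the chunk by its lower-cased words, e.g. a (verb, noun) pair.
            key = tuple(word.lower() for word, tag in subtree.leaves())
            chunk_counts[key] += 1
    return chunk_counts.most_common(n)

# Usage: top_chunks(train_text), or top_chunks(df["train_text"]) if the frame is called df.
print(top_chunks(train_text))

Is something along these lines the sensible approach, or is there a better way to score each chunk?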
