Counting occurrences of words in a text

Let’s say I have a long text of 1M words and I would like to create a table of all the words ordered by the number of occurrences in the text.

One approach would be to populate a dynamic array with each word and linearly search it to count the occurrences in $O(n^2)$, then sort the array by occurrences in $O(n \log n)$.

Another approach would be to use a priority queue and a trie. Insertion into the priority queue is $O(\log n)$ and building the trie is $O(n)$. But the cost of traversing the trie to build the priority queue is somewhat difficult to evaluate.
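A minimal sketch of the trie approach may help with estimating the traversal cost. The class and method names (`TrieWordCount`, `insert`, `collect`, `rank`) are made up for illustration. The depth-first traversal visits every trie node exactly once, and the trie has at most one node per input character, so under these assumptions the traversal is linear in the total text length; pushing the $k$ distinct words into the priority queue then costs $O(k \log k)$.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TrieWordCount {
    // Hypothetical trie node: one child per character, plus a word counter.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // number of occurrences of the word ending at this node
    }

    // Walks (and extends) the trie along the word, then bumps its counter.
    static void insert(Node root, String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.count++;
    }

    // Depth-first traversal: each node is visited once; words go into the queue.
    static void collect(Node node, StringBuilder prefix,
                        PriorityQueue<Map.Entry<String, Integer>> queue) {
        if (node.count > 0) {
            queue.add(new AbstractMap.SimpleEntry<String, Integer>(prefix.toString(), node.count));
        }
        for (Map.Entry<Character, Node> child : node.children.entrySet()) {
            prefix.append(child.getKey());
            collect(child.getValue(), prefix, queue);
            prefix.setLength(prefix.length() - 1); // backtrack
        }
    }

    // Returns (word, count) pairs ordered by decreasing count.
    static List<Map.Entry<String, Integer>> rank(String[] words) {
        Node root = new Node();
        for (String w : words) {
            insert(root, w);
        }
        PriorityQueue<Map.Entry<String, Integer>> queue =
            new PriorityQueue<>((x, y) -> Integer.compare(y.getValue(), x.getValue()));
        collect(root, new StringBuilder(), queue);
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>();
        while (!queue.isEmpty()) {
            ranked.add(queue.poll());
        }
        return ranked;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "ab", "a", "b"};
        for (Map.Entry<String, Integer> e : rank(words)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Words with equal counts come out in no particular order in this sketch.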

Finally, using a hash map seems to be the best solution, although computing the hash could cost a little bit of time even though it is just a constant. In this case you have $n$ insertions/lookups in $O(1)$ each, then a final sort of the hash map entries by occurrences in $O(n \log n)$.
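The hash map approach can be sketched as follows. The class name `WordFrequency`, the method `countAndSort`, and the tokenisation rule (lower-casing and splitting on non-word characters) are assumptions made for this example, not part of the question. Note that the final sort only touches the $k$ distinct words, so it costs $O(k \log k)$ with $k \le n$.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordFrequency {
    // Counts each word with expected O(1) map updates, then sorts by count.
    static List<Map.Entry<String, Integer>> countAndSort(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumed tokenisation: lower-case, split on runs of non-word characters.
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum); // insert 1 or add 1
            }
        }
        List<Map.Entry<String, Integer>> table = new ArrayList<>(counts.entrySet());
        // Most frequent words first.
        table.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return table;
    }

    public static void main(String[] args) {
        String text = "the cat and the dog and the bird";
        for (Map.Entry<String, Integer> e : countAndSort(text)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```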

So it is clear that the first approach is the worst and the last one the best. But how can I evaluate the complexity of the second one?

Counting keywords of a research paper

While working through a research article, I did not understand the following statement: “The number of keywords in Study 1 was between 10 and 20, while in Study 2, it was between 100 and 1000”.

How can I find the keywords of Study 2, which number between 100 and 1000, when the keywords given in Study 2 are: Non-Functional Requirements, Automatic Classification, Support Vector Machine?

Kindly help me at the earliest. Thanks.

Counting the number of binary strings of length m with no consecutive 1s (RR). How to improve it?

I am new to Mathematica and I am trying to solve the problem of counting the number of binary strings of a given length m that contain no consecutive 1s.

For instance, for m = 3 my recurrence relation should give 5, i.e. 000, 001, 101, 100, 010.

I started like this with the initial seeds n > 0; a[1] = 1, a[2] = 3, a[3] = 5, and then in RSolve I did:

[Image of the RSolve input; not available]

Is there a better way to write it so that I can use the output in a plot? Currently I cannot, because the output says it cannot be used as a function.
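As a cross-check of the counts, independent of Mathematica, here is a small sketch that compares a recurrence with a brute-force enumeration. It assumes the recurrence a(m) = a(m-1) + a(m-2), which follows from splitting valid strings by their last bit: either they end in 0, or they end in 01. The class and method names are hypothetical. Under this counting the value for m = 1 is 2 (the strings 0 and 1), which is worth comparing against the seed a[1] used above.

```java
class NoConsecutiveOnes {
    // Assumed recurrence: a(1) = 2, a(2) = 3, a(m) = a(m-1) + a(m-2), for m >= 1.
    static long countByRecurrence(int m) {
        if (m == 1) {
            return 2;
        }
        long prev = 2; // a(1)
        long curr = 3; // a(2)
        for (int k = 3; k <= m; k++) {
            long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return curr;
    }

    // Enumerates all 2^m bit patterns (small m only) and keeps those
    // in which no two adjacent bits are both 1.
    static long countByBruteForce(int m) {
        long total = 0;
        for (int s = 0; s < (1 << m); s++) {
            if ((s & (s >> 1)) == 0) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (int m = 1; m <= 10; m++) {
            System.out.println(m + " " + countByRecurrence(m) + " " + countByBruteForce(m));
        }
    }
}
```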


Why is the complexity of the key-indexed counting algorithm 11N + 4R?

I am asking about the claim that the complexity of the key-indexed counting algorithm is 11N + 4R. I want to know where the 11 comes from.

These are the steps of the algorithm:

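    int N = a.length;
    int[] count = new int[R+1];
    int[] aux = new int[N];   // not declared in the original snippet; assumed here

    // Count the frequency of each key (offset by one).
    for (int i = 0; i < N; i++)
        count[a[i]+1]++;

    // Turn the counts into starting indices.
    for (int r = 0; r < R; r++)
        count[r+1] += count[r];

    // Distribute the items into aux in sorted order.
    for (int i = 0; i < N; i++)
        aux[count[a[i]]++] = a[i];

    // Copy back.
    for (int i = 0; i < N; i++)
        a[i] = aux[i];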
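One way to see where the coefficients could come from is to tally array accesses loop by loop. The sketch below assumes a particular accounting convention, which is my assumption rather than something stated in the question: every read or write of an array element counts as one access, `x[i]++` and `x[i] += y` count as a read plus a write of `x[i]`, and allocating an array counts one access per element for zero-initialisation. The class name and the `accesses` counter are made up for illustration. Under that convention the total is (N + R + 1) + 3N + 3R + 5N + 2N = 11N + 4R + 1, i.e. the figure in the question up to the additive constant.

```java
import java.util.Arrays;

class KeyIndexedCountingCost {
    static long accesses; // running tally under the assumed convention

    // Sorts a[] (keys in 0..R-1) by key-indexed counting and tallies accesses.
    static void sort(int[] a, int R) {
        int N = a.length;
        accesses = 0;

        int[] count = new int[R + 1];
        accesses += R + 1;               // initialising count[]: R + 1
        int[] aux = new int[N];
        accesses += N;                   // initialising aux[]: N

        for (int i = 0; i < N; i++) {
            count[a[i] + 1]++;           // read a[i], read count, write count
            accesses += 3;
        }
        for (int r = 0; r < R; r++) {
            count[r + 1] += count[r];    // read count[r], read and write count[r+1]
            accesses += 3;
        }
        for (int i = 0; i < N; i++) {
            aux[count[a[i]]++] = a[i];   // 2 reads of a[i], read + write count, write aux
            accesses += 5;
        }
        for (int i = 0; i < N; i++) {
            a[i] = aux[i];               // read aux, write a
            accesses += 2;
        }
    }

    public static void main(String[] args) {
        int[] a = {2, 0, 1, 2, 1};
        int R = 3;
        sort(a, R);
        System.out.println(Arrays.toString(a));
        System.out.println(accesses + " vs " + (11 * a.length + 4 * R + 1));
    }
}
```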

Thanks in advance

Counting pairs of intervals where one is a subset of the other

Given a list of intervals with nonnegative endpoints, e.g. $[3,5], [1,7], [2,60]$, the goal is to find the number of pairs of intervals $I, J$ such that $I$ is a subset of $J$. In this particular case the total number is 2, because $[3,5]$ is a subset of $[1,7]$ and a subset of $[2,60]$. Furthermore, we were asked to find a solution to this problem with time complexity less than $O(n^2)$.

At first I thought of sorting the given intervals by their lower bound; in this example the order would be $[1,7] \to [2,60] \to [3,5]$, so the time complexity so far is $O(n\log n)$. But I can tell nothing about the total number of pairs, because the upper bounds end up in no particular order. Then I thought of sorting the intervals by their middle element and performing a binary search on this ordering, so that my time complexity would still be $O(n\log n)$. However, now I am stuck, and a direction would be appreciated.
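One possible direction, sketched under the assumption that all intervals are distinct: keep the sort by lower bound (breaking ties by larger upper bound first), and then for each interval count how many earlier intervals have an upper bound at least as large. That count can be maintained with a Fenwick (binary indexed) tree over the ranks of the upper bounds, giving $O(n \log n)$ overall. The names `IntervalContainment`, `countContainedPairs`, `prefixSum` and `add` are hypothetical.

```java
import java.util.Arrays;

class IntervalContainment {
    // Counts pairs (I, J), I != J, with I contained in J.
    // Assumes the intervals are pairwise distinct; each is {low, high}.
    static long countContainedPairs(int[][] intervals) {
        int n = intervals.length;
        int[][] sorted = intervals.clone();
        // Lower bound ascending; on ties, larger upper bound first, so that a
        // containing interval always precedes the intervals it contains.
        Arrays.sort(sorted, (p, q) -> p[0] != q[0]
                ? Integer.compare(p[0], q[0])
                : Integer.compare(q[1], p[1]));

        // Coordinate-compress the upper bounds to ranks 1..k.
        int[] highs = new int[n];
        for (int i = 0; i < n; i++) {
            highs[i] = sorted[i][1];
        }
        int[] uniq = Arrays.stream(highs).sorted().distinct().toArray();

        long[] tree = new long[uniq.length + 1]; // Fenwick tree over ranks
        long pairs = 0;
        for (int i = 0; i < n; i++) {
            int rank = Arrays.binarySearch(uniq, highs[i]) + 1;
            // Of the i earlier intervals, those with a strictly smaller upper
            // bound cannot contain this one; all the others do.
            pairs += i - prefixSum(tree, rank - 1);
            add(tree, rank);
        }
        return pairs;
    }

    // Sum of tree entries for ranks 1..idx.
    static long prefixSum(long[] tree, int idx) {
        long sum = 0;
        for (; idx > 0; idx -= idx & -idx) {
            sum += tree[idx];
        }
        return sum;
    }

    // Records one more interval whose upper bound has the given rank.
    static void add(long[] tree, int idx) {
        for (; idx < tree.length; idx += idx & -idx) {
            tree[idx]++;
        }
    }

    public static void main(String[] args) {
        int[][] intervals = {{3, 5}, {1, 7}, {2, 60}};
        System.out.println(countContainedPairs(intervals));
    }
}
```

If identical intervals may occur, the problem statement has to say whether they count as subsets of each other, and the tie-breaking above would need to be adapted accordingly.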

Computational complexity of counting symbols

Consider the counting function $ \{x\}^* \rightarrow \mathbb N$ that counts the number of occurrences of the symbol $ x$ . I am confused about the (asymptotic) complexity of computing this function as my intuition strongly suggests that this should be linear, i.e., $ O(n)$ where $ n$ is the number of occurrences of the symbol $ x$ in the input.

As far as my understanding goes there are multiple interpretations of computation – e.g.,

  • single-tape Turing machines, for which my best idea has run time $\Omega(n^2 \log n)$, I think (the $\log n$ comes from the assumption that the binary successor function has $\Omega(k)$ run time, where $k$ is the length of the binary representation of a natural number, so $k \approx \log n$ for counts up to $n$; the $n^2$ comes from the assumption that the Turing machine has to travel over all the $x$'s to reach its current count),
  • multi-tape Turing machines, for which I think I have an idea of run time $\Omega(n \log n)$,
  • random-access machines, which I don’t know at all.
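To probe the assumption about the cost of the binary successor function, here is a small experiment that counts bit writes only (it ignores head movement, so it says nothing directly about the single-tape model). The class name `BinaryCounterCost` and the method `totalBitFlips` are made up for this sketch. Incrementing a binary counter from 0 up to $n$ flips fewer than $2n$ bits in total, so the successor step costs $O(1)$ bit operations amortised even though a single increment can touch about $\log n$ bits.

```java
class BinaryCounterCost {
    // Increments a least-significant-bit-first binary counter n times,
    // starting from 0, and returns the total number of bit writes.
    static long totalBitFlips(int n) {
        int[] bits = new int[64]; // ample room for any int-sized count
        long flips = 0;
        for (int k = 0; k < n; k++) {
            int i = 0;
            // Carry: trailing 1s become 0s.
            while (bits[i] == 1) {
                bits[i] = 0;
                flips++;
                i++;
            }
            // The first 0 becomes 1.
            bits[i] = 1;
            flips++;
        }
        return flips;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 1_000_000; n *= 10) {
            System.out.println(n + " increments, " + totalBitFlips(n) + " bit flips");
        }
    }
}
```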

So my question is the following.

What is the computational complexity of the counting function in the various models of computation? Is it linear in any of them?

If at all relevant, I ask from the point of view of abstract algebra, where I am trying to assess the computational complexity of the word problem in some specific group.