A deterministic finite-state automaton for finding all (potentially overlapping) regular expression matches?

I was working on a bioinformatics practice problem named Finding a Protein Motif on rosalind.info. In essence, I was given a particular regular expression N[^P](S|T)[^P] and was asked to find all matches.

Solving that problem is not the goal here; I already have a ‘working’ solution. In essence, I manually designed a state machine that finds all matches for that regular expression.

And here is a ‘visualization’ of the state machine, to make it clear how it is manually designed.

digraph G {
    0 [label="0,''"]
    1 [label="1,'N'"]
    2 [label="2,'NN'"]
    3 [label="3,'NS|NX'"]
    4 [label="4,'NNS'"]
    5 [label="5,'NSS|NXS'"]
    0 -> 1 [label="N"]
    0 -> 0 [label="P"]
    0 -> 0 [label="S"]
    0 -> 0 [label="X"]
    1 -> 2 [label="N"]
    1 -> 0 [label="P"]
    1 -> 3 [label="S"]
    1 -> 3 [label="X"]
    2 -> 2 [label="N"]
    2 -> 0 [label="P"]
    2 -> 4 [label="S"]
    2 -> 3 [label="X"]
    3 -> 1 [label="N"]
    3 -> 0 [label="P"]
    3 -> 5 [label="S"]
    3 -> 0 [label="X"]
    4 -> 1 [label="N(accept)"]
    4 -> 0 [label="P"]
    4 -> 5 [label="S(accept)"]
    4 -> 0 [label="X(accept)"]
    5 -> 1 [label="N(accept)"]
    5 -> 0 [label="P"]
    5 -> 0 [label="S(accept)"]
    5 -> 0 [label="X(accept)"]
}
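For readers who prefer code to a graph, here is a minimal sketch of how the machine above could be driven from a transition table. It is not the original solution: the names classify and findMotif are made up for illustration, and it assumes that the graph's S label stands for "S or T" and X for "any other residue".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Map a residue to one of the four input classes used in the graph:
// 0 = 'N', 1 = 'P', 2 = 'S' or 'T', 3 = anything else ('X' in the graph).
int classify(char c) {
    switch (c) {
        case 'N': return 0;
        case 'P': return 1;
        case 'S': case 'T': return 2;
        default:  return 3;
    }
}

// Returns the 0-based start index of every match of N[^P](S|T)[^P],
// including overlapping ones, in a single left-to-right pass.
std::vector<int> findMotif(const std::string& text) {
    // transition[state][class], copied from the edges of the graph.
    static const int transition[6][4] = {
        {1, 0, 0, 0},  // state 0: ''
        {2, 0, 3, 3},  // state 1: 'N'
        {2, 0, 4, 3},  // state 2: 'NN'
        {1, 0, 5, 0},  // state 3: 'NS|NX'
        {1, 0, 5, 0},  // state 4: 'NNS'
        {1, 0, 0, 0},  // state 5: 'NSS|NXS'
    };
    std::vector<int> starts;
    int state = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const int cls = classify(text[i]);
        // States 4 and 5 have already consumed N[^P](S|T), so any
        // non-P input completes a match (the "(accept)" edges).
        if ((state == 4 || state == 5) && cls != 1) {
            starts.push_back(static_cast<int>(i) - 3);
        }
        state = transition[state][cls];
    }
    return starts;
}
```

Because acceptance is attached to edges rather than states, the machine keeps running after a match; that is what lets it report the two overlapping matches in a string such as NNTSA.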

The classical theory allows us to convert a regular expression to a non-deterministic finite-state automaton and then convert it to a deterministic finite-state automaton through subset construction. In particular, subset construction guarantees that if there exists an accepting computation, then the deterministic finite-state automaton would also accept.

Let's say I have a regular expression that matches twice. The corresponding deterministic finite-state automaton would accept after the first match, but then it does not know which state to move to in order to detect overlapping matches. I guess I could restart one character after the beginning of the first match, but in the worst case that would probably take quadratic time, as one can imagine with {.*} on {{{{{}

In the worst case, I expect quadratic time (e.g. {{{{{}}}}}}), but it would be great if the running time were output-sensitive, for I believe a good many cases aren’t quadratic in output size.

It would be great if the state machine I used to find all matches could be generalized (apparently it sometimes needs linear space, not just a single state) or designed automatically. Are there existing theories for that?
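One partial way to automate the design, offered only as a sketch: for patterns that are a fixed-length sequence of character classes (like the motif above, but not like {.*}), the set of active NFA states can be kept in a bitmask, with the start state re-activated at every input position. This is the bit-parallel Shift-And technique, and each distinct bitmask value corresponds to one state that subset construction would produce. The names CharClass and findFixedLength are hypothetical, and the 64-position limit is an artifact of using a single machine word.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// One pattern position: a predicate saying which characters it accepts.
using CharClass = std::function<bool(char)>;

// Returns the start index of every (possibly overlapping) match of a
// fixed-length pattern given as a sequence of character classes.
std::vector<std::size_t> findFixedLength(const std::vector<CharClass>& pattern,
                                         const std::string& text) {
    const std::size_t m = pattern.size();
    if (m == 0 || m > 64) return {};  // sketch only handles 1..64 positions

    // mask[c] has bit k set when pattern position k accepts character c.
    std::uint64_t mask[256] = {};
    for (int c = 0; c < 256; ++c) {
        for (std::size_t k = 0; k < m; ++k) {
            if (pattern[k](static_cast<char>(c))) {
                mask[c] |= std::uint64_t{1} << k;
            }
        }
    }

    const std::uint64_t accept = std::uint64_t{1} << (m - 1);
    std::vector<std::size_t> starts;
    std::uint64_t active = 0;  // bit k: positions 0..k match, ending here
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Shift every partial match forward and start a new one (the "| 1"),
        // then keep only those whose next position accepts text[i].
        active = ((active << 1) | 1) & mask[static_cast<unsigned char>(text[i])];
        if (active & accept) starts.push_back(i + 1 - m);
    }
    return starts;
}
```

This runs in time linear in the text for a fixed pattern. It does not address variable-length patterns, where a single end position can belong to many matches and the quadratic output size mentioned above becomes unavoidable.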

No engine matches when importing accounts

Importing accounts to this engine shows “no engine matches” for all URLs.

[setup]
enabled=1
default checked=0
engine type=Article
description=
dofollow=1
anchor text=1
creates own page=1
uses pages=0
multiple posts per account=1
;;; API MAIN VARIABLES
[api_url]
type=extract
default=http://gsapi.local:9090
static=1
[api_link_id]
type=extract
front="link_id":
back=}
static=1
[api_target_url]
type=extract
default=%targethost%
static=1
;;; API REQUIRED VARIABLES
[api_engine_name]
type=extract
default=test
static=1
;;; NORMAL VARIABLES
[URL]
type=url
[Anchor_Text]
type=text
alternate data=%spinfile-generic_anchor_text.dat%
[Article]
type=memo
allow html=1
must be filled=1
hint=The full article comes here.
auto modify=0
auto add anchor url=2
auto add anchor url content=%file-auto_anchor-article.dat%
custom mode=1
[Login]
type=login
must be filled=1
hint=The login for websites that need an account. Use numbers and letters only.
min length=10
upcase=0
static=1
[Password]
type=password
must be filled=1
hint=A password used for websites that need an account. Use numbers and letters only.
static=1
[Your E-Mail]
type=email
static=1

-----
[STEP1]
link type=Article
just download=1
submit success="success":true
submit failed="success":false
submit failed retry=XXXXXXXXXXXXXXXXXXXXXXXX
captcha failed=XXXXXXXXXXXXXXXXXXXXXXXX
verify submission=1
verify by=url
verify url=%api_url%/api/link/verify_redirect/%api_link_id%
verify interval=10
verify timeout=99999999999999999999
first verify=5
verify on unknown status=0
[STEP2]
modify url=%api_target_url%
just download=1
[STEP3]
modify url=%api_url%/api/link/create/%api_engine_name%
post data=engine_name=%api_engine_name%&target_url=%api_target_url%&url=%url%
form request with=XMLHttpRequest
encode post data=3
just download=1

Created POST route, but resulting in RoutingError (No Route matches POST)

I am setting up a new route “/api/v1/example_two” that I can POST to (create), but it results in a RoutingError: No route matches [POST].

I have tried explicitly declaring the route with post, and I have tried creating the route through resources.

config/routes.rb

Rails.application.routes.draw do
  resources :roles, only: [:index], defaults: { format: :xml }

  defaults format: :json do
    scope :v1 do
      resources :example_one, only: [:create, :show], param: :uuid
      resources :example_two, only: [:create], param: :uuid
    end
  end
end

and I have a controller: app/controllers/example_two.rb

class example_two < ApplicationController
  def create
    ...
  end
end

I expect it to return whatever is in example_two#create, but instead it results in ActionController::RoutingError (No route matches [POST] "/api/v1/example_two")

Is there a way to filter out partial matches from search results on YouTube?

When looking for “JoJo’s Bizarre Adventure,” I often just search for the keyword “jojo,” but the first results I see are usually for a musical artist of the same name.

I’d prefer that my YouTube results show nothing pertaining to the artist, and only results pertaining to the anime itself.

Is there a way to filter out specific content from the search results?

Matching Algorithm – How to construct a bipartite-like graph with heterogeneous matching rules

We have a set S. Elements in this set can be matched according to the following rules:

[Image: matching rules]

The input to the matching algorithm is an array of variable size consisting of elements in S. Each element in the array has a particular size or “quantity” that can be matched (I imagine this can simply be modeled as edge weights).

The first question is how to maximize the total quantity matched. The second question is how to optimize the time complexity.

Intuitively, I think the problem could be modeled as a weighted bipartite graph and solved with a max-flow algorithm. The challenge is that elements can be matched in different ways, so I’m not sure what the graph should look like given these extra rules, or whether they imply that a different approach should be used.
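To make the max-flow idea concrete, here is a rough sketch only. The actual matching rules are not reproduced above, so it assumes, purely for illustration, that the elements split into two sides and that the rules reduce to a list of allowed (left, right) index pairs. The names maxFlow, maxMatchedQuantity, and allowed are hypothetical. It uses Edmonds-Karp on a dense capacity matrix, chosen for brevity rather than speed.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Edmonds-Karp max-flow on a dense capacity matrix (taken by value,
// because the residual capacities are updated in place).
long long maxFlow(std::vector<std::vector<long long>> cap, int source, int sink) {
    const int n = static_cast<int>(cap.size());
    long long total = 0;
    while (true) {
        // Breadth-first search for a shortest augmenting path.
        std::vector<int> parent(n, -1);
        parent[source] = source;
        std::queue<int> q;
        q.push(source);
        while (!q.empty() && parent[sink] == -1) {
            const int u = q.front();
            q.pop();
            for (int v = 0; v < n; ++v) {
                if (parent[v] == -1 && cap[u][v] > 0) {
                    parent[v] = u;
                    q.push(v);
                }
            }
        }
        if (parent[sink] == -1) break;  // no augmenting path left

        // Find the bottleneck, then push that much flow along the path.
        long long push = LLONG_MAX;
        for (int v = sink; v != source; v = parent[v]) {
            push = std::min(push, cap[parent[v]][v]);
        }
        for (int v = sink; v != source; v = parent[v]) {
            cap[parent[v]][v] -= push;
            cap[v][parent[v]] += push;
        }
        total += push;
    }
    return total;
}

// ASSUMPTION: left[i] and right[j] are the matchable quantities, and
// allowed lists the (i, j) pairs that the rules permit to be matched.
long long maxMatchedQuantity(const std::vector<long long>& left,
                             const std::vector<long long>& right,
                             const std::vector<std::pair<int, int>>& allowed) {
    const int L = static_cast<int>(left.size());
    const int R = static_cast<int>(right.size());
    const int source = L + R;
    const int sink = L + R + 1;
    std::vector<std::vector<long long>> cap(
        L + R + 2, std::vector<long long>(L + R + 2, 0));
    for (int i = 0; i < L; ++i) cap[source][i] = left[i];
    for (int j = 0; j < R; ++j) cap[L + j][sink] = right[j];
    for (const auto& p : allowed) {
        // A pair can never carry more than the smaller of its two quantities.
        cap[p.first][L + p.second] = std::min(left[p.first], right[p.second]);
    }
    return maxFlow(cap, source, sink);
}
```

If the real rules let one element match elements on its own side, or match in groups, the graph is no longer bipartite and this construction would not apply as written.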

How to get exact matches on top of search results?

I’m using Search API with Solr for Drupal 8. I’ve added the title field to the index and given it the maximum boost. The problem is that when I search for a keyword, the exact match is not at the top of the results. For example, assume I have content with the titles “test content”, “the test content”, “small test content”, etc. “test content” should be at the top of the list when I search for “test content”, but it currently appears below the others. Any help is appreciated. Thank you.

JSONLayout has no parameter that matches element KeyValuePair

Below is the JSONLayout configuration in log4j2.xml

<JSONLayout complete="true" charset="UTF-8" compact="true">
    <KeyValuePair key="application-name" value="sample-app"></KeyValuePair>
</JSONLayout>

POM.xml

org.apache.logging.log4j:log4j-core:jar:2.7:compile
org.apache.logging.log4j:log4j-api:jar:2.7:compile
org.apache.logging.log4j:log4j-jul:jar:2.7:compile
com.fasterxml.jackson.core:jackson-databind:jar:2.9.9:compile

I see that the message is being printed in JSON format, but somehow the KeyValuePair is not being recognized.

2019-06-01 21:11:23,305 localhost-startStop-1 ERROR layout JSONLayout has no parameter that matches element KeyValuePair
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

Any idea why KeyValuePair is not recognized?

Function to check if received message matches any of the expected messages

I have a message coming in and I need to match it against the expected messages. The program will eventually do something as a result of receiving those messages. I am not very experienced at programming, but surely there is a better way to declare all the possible messages, for example in a separate entity, and then use them within this HexSearch.cpp file?

I tried to search for how to do that, but I couldn’t find the right words to describe this to a search engine. There are many more messages than those shown here that still need to be declared; this is just a sample, and I already don’t like looking at it.

#include "HexSearch.h"
#include <algorithm>  // std::equal

void searchFunction(int num, char msg[]) {
    // Each expected request is paired with the response to send back.
    static const char readReq[] = { 0x92 };
    static const char readResp[] = { 0x00, 0x02, 0x12, 0x34, 0xA1 };

    static const char writeReq[] = { 0x0A, 0xE0 };
    static const char writeResp[] = { 0x00, 0x02, 0x11, 0x01, 0x98 };

    static const char resetReq[] = { 0x00, 0xFF };
    static const char resetResp[] = { 0x00, 0x21, 0x23, 0x0E, 0xAE, 0x11, 0x3A };

    static const char verReq[] = {0x00, 0xA2};
    static const char verResp[] = {0x00, 0x03, 0x82, 0xAA, 0x07, 0x88, 0xA9};

    static const char typeReq[] = {0x00, 0x67};
    static const char typeResp[] = {0x00, 0x03, 0x00, 0x00, 0xC4, 0x77};

    static const char askReq[] = {0x00, 0x55};
    static const char askResp[] = {0x00, 0x01, 0xFE, 0xFF};

    if (num == 4) {
        replyMsg(msg, 2, 3, readReq, readResp, sizeof(readResp) / sizeof(readResp[0]));
    }
    else if (num == 5) {
        replyMsg(msg, 2, 4, writeReq, writeResp, sizeof(writeResp) / sizeof(writeResp[0]));
        replyMsg(msg, 2, 4, resetReq, resetResp, sizeof(resetResp) / sizeof(resetResp[0]));
        replyMsg(msg, 2, 4, verReq, verResp, sizeof(verResp) / sizeof(verResp[0]));
        replyMsg(msg, 2, 4, typeReq, typeResp, sizeof(typeResp) / sizeof(typeResp[0]));
        replyMsg(msg, 2, 4, askReq, askResp, sizeof(askResp) / sizeof(askResp[0]));
    }
}

// Sends resps if msg[startArr, endArr) equals the expected request bytes.
void replyMsg(char msg[], int startArr, int endArr, const char* receiv, const char* resps, int respL) {
    if (std::equal(msg + startArr, msg + endArr, receiv)) {
        for (int x = 0; x < respL; x++) {
            serialPC.putc(resps[x]);
        }
    }
}

The code works; I am only interested in improving it. num is the total number of bytes in a message. For example, readReq has one byte of data but also 2 start bytes and 1 end byte, for a total of 4. The readResp array has the 2 start bytes, 2 data bytes, and one end byte, so its total size is 5 bytes. The 2nd byte specifies the length of a message. msg[] is essentially the message coming in over a serial connection.

As an example, if msg[] = { 0x00, 0x01, 0x92, 0x56 } then num = 4, and replyMsg will compare the 3rd byte, see that it matches readReq, and so output readResp.
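One possible direction, sketched under assumptions rather than offered as the definitive answer: keep the request/response pairs in a single table and let one lookup function walk it. The names MessagePair, makePair, kMessages, and findReply are hypothetical, the byte values are copied from the code above, and the sketch assumes every message has exactly 2 start bytes and 1 end byte, as described. It returns the matching entry instead of writing to serialPC, so that it does not depend on the serial object.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// One expected request payload and the full response to send for it.
struct MessagePair {
    const std::uint8_t* request;
    std::size_t requestLen;
    const std::uint8_t* response;
    std::size_t responseLen;
};

namespace {

const std::uint8_t readReq[]   = {0x92};
const std::uint8_t readResp[]  = {0x00, 0x02, 0x12, 0x34, 0xA1};
const std::uint8_t writeReq[]  = {0x0A, 0xE0};
const std::uint8_t writeResp[] = {0x00, 0x02, 0x11, 0x01, 0x98};
const std::uint8_t resetReq[]  = {0x00, 0xFF};
const std::uint8_t resetResp[] = {0x00, 0x21, 0x23, 0x0E, 0xAE, 0x11, 0x3A};
const std::uint8_t verReq[]    = {0x00, 0xA2};
const std::uint8_t verResp[]   = {0x00, 0x03, 0x82, 0xAA, 0x07, 0x88, 0xA9};
const std::uint8_t typeReq[]   = {0x00, 0x67};
const std::uint8_t typeResp[]  = {0x00, 0x03, 0x00, 0x00, 0xC4, 0x77};
const std::uint8_t askReq[]    = {0x00, 0x55};
const std::uint8_t askResp[]   = {0x00, 0x01, 0xFE, 0xFF};

// Deduces both array lengths so they never have to be typed by hand.
template <std::size_t N, std::size_t M>
MessagePair makePair(const std::uint8_t (&req)[N], const std::uint8_t (&resp)[M]) {
    return MessagePair{req, N, resp, M};
}

// Adding a new message means adding two arrays and one line here.
const MessagePair kMessages[] = {
    makePair(readReq, readResp),   makePair(writeReq, writeResp),
    makePair(resetReq, resetResp), makePair(verReq, verResp),
    makePair(typeReq, typeResp),   makePair(askReq, askResp),
};

}  // namespace

// Returns the table entry whose request equals the payload of msg
// (the bytes between the 2 start bytes and the 1 end byte), or nullptr.
const MessagePair* findReply(const std::uint8_t* msg, std::size_t num) {
    const std::size_t kStartBytes = 2;
    const std::size_t kEndBytes = 1;
    if (num <= kStartBytes + kEndBytes) return nullptr;
    const std::size_t payloadLen = num - kStartBytes - kEndBytes;
    for (const MessagePair& entry : kMessages) {
        if (entry.requestLen == payloadLen &&
            std::equal(entry.request, entry.request + payloadLen, msg + kStartBytes)) {
            return &entry;
        }
    }
    return nullptr;
}
```

The caller would then loop over response and responseLen of the returned entry and pass each byte to serialPC.putc, as replyMsg does now. The arrays and the table could live in their own header or source file, which keeps HexSearch.cpp down to the lookup logic.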