Google Analytics – Match URLs on Unique Number in Content Drilldown Using RegEx?

I am trying to track pageviews in Google Analytics for knowledge base articles (on Zendesk). Each article has a unique number. However, the title of the page is sometimes appended to the URL, and GA tracks this as a separate page. If the title of the article changes, a new URL is generated.

For example, these are all the same article, so I want to see a single pageview count, but GA shows them as three separate entries:

  • /hc/en-us/articles/360039413394
  • /hc/en-us/articles/360039413394-How-To-Make-A-Sandwich
  • /hc/en-us/articles/360039413394-How-To-Make-A-Turkey-Sandwich

I want GA to roll up the articles by their unique number, ignoring everything after it. Is there a built-in way to do this? Is there a way to do this with regex? Where would I add the regex for the Content Drilldown page? Help!

Thank you.
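The roll-up described above amounts to keying each pageview on the article number alone. Below is a minimal sketch of that normalization, assuming every article path has the /hc/<locale>/articles/<number>[-<title>] shape shown in the examples; the names `ARTICLE_RE`, `articleKey` and `rollUp`, and the pageview counts, are made up for illustration. Where the regex would live inside GA (for example a view filter that rewrites the Request URI) is an assumption to check against your own property.

```javascript
// Hypothetical pattern: captures "/hc/<locale>/articles/<number>" and requires
// the number to be followed by "-", "/", "?", "#" or the end of the path, so
// the title slug (and anything after it) is dropped from the key.
const ARTICLE_RE = /^(\/hc\/[a-z-]+\/articles\/\d+)(?=$|[-\/?#])/i;

// Returns the roll-up key for a page path; non-article paths are left as-is.
function articleKey(path) {
  const m = ARTICLE_RE.exec(path);
  return m ? m[1] : path;
}

// Sums pageviews per roll-up key. `rows` is assumed to be an array of
// { path, pageviews } objects, e.g. exported from a GA report.
function rollUp(rows) {
  const totals = new Map();
  for (const { path, pageviews } of rows) {
    const key = articleKey(path);
    totals.set(key, (totals.get(key) || 0) + pageviews);
  }
  return totals;
}

// Example with made-up counts for the three URL variants from the question.
const totals = rollUp([
  { path: "/hc/en-us/articles/360039413394", pageviews: 5 },
  { path: "/hc/en-us/articles/360039413394-How-To-Make-A-Sandwich", pageviews: 3 },
  { path: "/hc/en-us/articles/360039413394-How-To-Make-A-Turkey-Sandwich", pageviews: 2 },
]);
console.log(totals.get("/hc/en-us/articles/360039413394")); // 10
```

The lookahead keeps a longer number such as 3600394133941 from being merged into 360039413394.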

Optimizing regex based query in sqlite

I’m using an SQLite database to store manually created labels for some data that is automatically queried from a live system. The data from the live system consists primarily of an address made up of three parts. Let’s use URLs as an example, the three parts being the protocol, the domain and the path. Initially, I would load a few hundred thousand addresses into a table, with each part of the address in its own column and the three together forming the primary key. The labels are then stored in additional columns.

```sql
CREATE TABLE OldWebsites (
    protocol VARCHAR (255) NOT NULL,
    domain   VARCHAR (255) NOT NULL,
    path     VARCHAR (255) NOT NULL,
    label1   INTEGER,
    label2   TEXT,
    CONSTRAINT address PRIMARY KEY (
        protocol,
        domain,
        path
    )
);
```

I found myself repeating labels over and over based on certain patterns that the addresses would match. Since I would constantly extend this table with new data and remove old data, this became too much of a hassle, so I tried a different approach: load only the existing addresses into one table, and keep the labels in separate tables where I write regex matchers for the address components:

```sql
CREATE TABLE Websites (
    protocol VARCHAR (255) NOT NULL,
    domain   VARCHAR (255) NOT NULL,
    path     VARCHAR (255) NOT NULL,
    CONSTRAINT address PRIMARY KEY (
        protocol,
        domain,
        path
    )
);

-- The label tables store regex patterns, so their primary keys are built
-- from the *_re columns (the tables have no protocol/domain/path columns).
CREATE TABLE Label1 (
    protocol_re VARCHAR (255) NOT NULL,
    domain_re   VARCHAR (255) NOT NULL,
    path_re     VARCHAR (255) NOT NULL,
    label1      INTEGER,
    CONSTRAINT address PRIMARY KEY (
        protocol_re,
        domain_re,
        path_re
    )
);

CREATE TABLE Label2 (
    protocol_re VARCHAR (255) NOT NULL,
    domain_re   VARCHAR (255) NOT NULL,
    path_re     VARCHAR (255) NOT NULL,
    label2      TEXT,
    CONSTRAINT address PRIMARY KEY (
        protocol_re,
        domain_re,
        path_re
    )
);
```

Assume that I have already guaranteed (using other queries) that there is exactly one match in each label table for each address in the Websites table. I would now like to write a query that reconstructs a table like the original OldWebsites by matching the labels against the automatically queried data.

Something like this:

```sql
-- || binds tighter than REGEXP, so each anchored pattern is built first.
SELECT Websites.*,
       Label1.*,
       Label2.*
  FROM Websites
       JOIN
       Label1 ON (Websites.protocol REGEXP '^' || Label1.protocol_re || '$' AND
                  Websites.domain   REGEXP '^' || Label1.domain_re   || '$' AND
                  Websites.path     REGEXP '^' || Label1.path_re     || '$')
       JOIN
       Label2 ON (Websites.protocol REGEXP '^' || Label2.protocol_re || '$' AND
                  Websites.domain   REGEXP '^' || Label2.domain_re   || '$' AND
                  Websites.path     REGEXP '^' || Label2.path_re     || '$');
```

This is really slow, especially with more label tables. I’m using the PCRE sqlite3 extension to provide the REGEXP function.

I would like to know if there’s a way to optimize this query, either through parallelization (ideally the query would run from Python) or by using the knowledge that there is exactly one match in each label table.

From my understanding, multiple inner joins should take at most the sum of the individual joins, correct?

Perhaps indexes are also helpful, but I have only a basic idea of what they are and no idea whether they would be of help here.
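One way to exploit the "exactly one match" guarantee is to do the matching in application code: compile every pattern once, then stop at the first hit for each address, handling each label table in its own pass. The sketch below assumes the Websites and label tables have already been read into arrays and that the stored patterns are valid JavaScript regex syntax (PCRE and JavaScript differ in places); the names `compileLabelTable`, `findLabel` and `rebuild`, and the sample rows, are hypothetical. For context, SQLite cannot use an ordinary index to evaluate REGEXP against a pattern, so indexes are unlikely to help the original query. Whether this approach is faster in practice has not been measured here.

```javascript
// Compiles each label row's patterns once. Every pattern is wrapped in a
// non-capturing group before anchoring, so an alternation like "http|ftp"
// is anchored as a whole ('^' || re || '$' in the SQL does not do that).
function compileLabelTable(rows, labelColumn) {
  return rows.map((row) => ({
    protocol: new RegExp(`^(?:${row.protocol_re})$`),
    domain: new RegExp(`^(?:${row.domain_re})$`),
    path: new RegExp(`^(?:${row.path_re})$`),
    value: row[labelColumn],
  }));
}

// Returns the label of the first row whose three patterns all match. Because
// each address is guaranteed exactly one match, the scan stops there.
function findLabel(site, compiledRows) {
  for (const r of compiledRows) {
    if (r.protocol.test(site.protocol) && r.domain.test(site.domain) && r.path.test(site.path)) {
      return r.value;
    }
  }
  return undefined; // only reachable if the guarantee is violated
}

// Rebuilds OldWebsites-style rows. `labelTables` maps a label column name
// (e.g. "label1") to that table's rows; each table is processed separately.
function rebuild(websites, labelTables) {
  const compiled = Object.entries(labelTables).map(
    ([column, rows]) => [column, compileLabelTable(rows, column)]
  );
  return websites.map((site) => {
    const row = { ...site };
    for (const [column, rows] of compiled) {
      row[column] = findLabel(site, rows);
    }
    return row;
  });
}

// Example with made-up data.
const rebuilt = rebuild(
  [
    { protocol: "https", domain: "shop.example.com", path: "/cart" },
    { protocol: "http", domain: "example.com", path: "/about" },
  ],
  {
    label1: [
      { protocol_re: "https", domain_re: ".*", path_re: ".*", label1: 1 },
      { protocol_re: "http", domain_re: ".*", path_re: ".*", label1: 0 },
    ],
    label2: [
      { protocol_re: "https?", domain_re: "shop\\..*", path_re: ".*", label2: "shop" },
      { protocol_re: "https?", domain_re: "example\\.com", path_re: ".*", label2: "main" },
    ],
  }
);
console.log(rebuilt);
```

Since the label tables are independent, each pass could also run in a separate worker if parallelism is needed.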

Source of RegEx examples of Secret Detection patterns in repositories?

Where can I find RegEx that can pattern match common secret strings?

I have a product that scans repos and commits to catch cases where a developer tries to commit a secret (e.g., passwords, keys). By default it scans for roughly 30 patterns, which seems insufficient given thousands of repos in over seventy languages. I can extend that scanning with RegEx. However, I don’t know every common secret there is.

Is there a framework, list, or tool that can provide RegEx or patterns for likely secrets?

Where can I get comprehensive lists of secret types?

Or am I doomed to writing a metric ton of RegEx then being held responsible for when something is missed?
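A minimal sketch of how additional patterns could be plugged into a line-by-line scanner. The three rules are illustrative examples of publicly documented token shapes (AWS access key IDs, GitHub classic personal access tokens, PEM private-key headers) as I understand them. They are not a comprehensive list and should be checked against each vendor's current documentation; the names `RULES` and `scan` are hypothetical.

```javascript
// Illustrative rules only; each needs the g flag because scan() uses matchAll.
const RULES = [
  // AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 chars.
  { id: "aws-access-key-id", re: /\b(?:AKIA|ASIA)[0-9A-Z]{16}\b/g },
  // GitHub classic personal access tokens: "ghp_" + 36 alphanumerics.
  { id: "github-pat", re: /\bghp_[A-Za-z0-9]{36}\b/g },
  // PEM-encoded private key headers (PKCS#1, SEC1, OpenSSH, PKCS#8).
  { id: "private-key", re: /-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----/g },
];

// Scans text line by line and reports the rule id, 1-based line number and
// matched string for every hit.
function scan(text, rules = RULES) {
  const findings = [];
  text.split(/\r?\n/).forEach((line, idx) => {
    for (const { id, re } of rules) {
      for (const m of line.matchAll(re)) {
        findings.push({ rule: id, line: idx + 1, match: m[0] });
      }
    }
  });
  return findings;
}

const findings = scan(
  "aws_key = AKIAIOSFODNN7EXAMPLE\nnothing here\n-----BEGIN RSA PRIVATE KEY-----"
);
console.log(findings);
```

Keeping the rules as data (id plus regex) means new patterns can be added without touching the scanning loop.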

Is there a regex way to match generally all possible subdomains in robots.txt?

Consider a website with the fictional domain example.com.
The owner of this website added a subdomain: x.example.com.

  • After one year, the owner changed x to y, giving y.example.com
  • After two years, the owner changed y to z, giving z.example.com

None of these changes was accompanied by a matching update of the example.com rules in robots.txt, so the owner ended up with a serious long-term SEO problem: crawlers were directed to scan non-existent pages (the x ones, and later the y ones).

What regex-based precaution could the owner have taken beforehand to prevent this SEO problem?
Is there a regex way to match all possible subdomains in robots.txt?
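For context: robots.txt is fetched separately for each host (x.example.com/robots.txt, y.example.com/robots.txt, and so on), and its Allow/Disallow rules match only URL paths, with `*` and `$` as the only special characters (RFC 9309). A hostname regex therefore cannot be placed in robots.txt itself. Where a subdomain-wide pattern is needed elsewhere (server configuration or application code), a minimal sketch could look like this; the fixed base domain example.com and the names `SUBDOMAIN_RE` and `isExampleHost` are assumptions.

```javascript
// Matches example.com itself and any (multi-level) subdomain of it, e.g.
// x.example.com or a.b.example.com, but not lookalikes such as
// notexample.com or example.com.evil.net. Each label must start and end
// with a letter or digit; hyphens are allowed in between.
const SUBDOMAIN_RE = /^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)*example\.com$/i;

function isExampleHost(host) {
  return SUBDOMAIN_RE.test(host);
}

for (const host of ["x.example.com", "z.example.com", "notexample.com"]) {
  console.log(host, isExampleHost(host));
}
```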

Setting up a Destination Goal in Google Analytics using Regex

I want to set up a destination goal in GA to determine when a user reaches the end of a flow, or basically has completed a process.

The problem is that the URL changes depending on the user’s activity, so we have a URL like the one below, which can vary:

apply/UserAccount?execution=e1s2

The s2 represents the last stage in the flow, which is the page I’m trying to capture. However, the 1 in e1 can be any number, depending on the user’s other activities.

Can someone help me write a regex so that GA captures every time a user reaches the last page, regardless of the execution number? Is there a way to ignore the “e1” value and simply match the rest? I am completely new to this, so I’d be embarrassed to share what I’ve tried 🙂
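A minimal sketch of such a pattern, tested in JavaScript. It assumes the Request URI ends right after s2; if other query parameters can follow, the final `$` would have to go. In GA's goal field the pattern would be entered without the JavaScript slashes and without escaping `/`, i.e. `apply/UserAccount\?execution=e[0-9]+s2$`. The name `GOAL_RE` is hypothetical.

```javascript
// Any number after "e", then the final stage "s2". "?" is escaped because it
// is a regex metacharacter; "$" stops s2 from also matching s20, s21, ...
// The start is left unanchored in case the path has a prefix before "apply".
const GOAL_RE = /apply\/UserAccount\?execution=e[0-9]+s2$/;

for (const uri of [
  "/apply/UserAccount?execution=e1s2",
  "/apply/UserAccount?execution=e42s2",
  "/apply/UserAccount?execution=e1s1",
]) {
  console.log(uri, GOAL_RE.test(uri));
}
```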

Regex of three repeating numbers

I need a regular expression (in formal-language notation, not programming syntax) over the alphabet {1, 2, 3} such that every string in the language contains at most one occurrence of the substring 222 and never contains the substring 123.

((1 or 3)*(211 or 231 or 233 or 2211 or 2231 or 2233))* 222  

That is as far as I could get. I can’t see how to prevent 123 from occurring while still allowing a string like 1223221222. Any thoughts?
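Hand-built expressions for languages like this are easy to get subtly wrong, so it can help to compare a candidate mechanically with the definition on every short string. A minimal sketch, assuming overlapping occurrences count (so 2222 contains 222 twice) and with the candidate rewritten in JavaScript syntax (`or` becomes `|`, anchored with `^` and `$`); the function names are hypothetical.

```javascript
// Counts occurrences of `sub` in `s`, overlapping ones included.
// Assumption: "2222" therefore contains 222 twice and is NOT in the language.
function countOverlapping(s, sub) {
  let count = 0;
  for (let i = 0; i + sub.length <= s.length; i++) {
    if (s.startsWith(sub, i)) count++;
  }
  return count;
}

// Membership test taken straight from the definition.
function inLanguage(s) {
  return countOverlapping(s, "222") <= 1 && !s.includes("123");
}

// Yields every string over {1, 2, 3} of length 0..maxLen, shortest first.
function* allStrings(maxLen) {
  let layer = [""];
  for (let len = 0; len <= maxLen; len++) {
    yield* layer;
    if (len < maxLen) layer = layer.flatMap((p) => [p + "1", p + "2", p + "3"]);
  }
}

// Returns the strings on which the candidate and the definition disagree.
// The candidate must be anchored (^...$) and must not use the g flag.
function findCounterexamples(candidate, maxLen) {
  const bad = [];
  for (const s of allStrings(maxLen)) {
    if (candidate.test(s) !== inLanguage(s)) bad.push(s);
  }
  return bad;
}

// The attempt from the question, with "or" written as "|".
const attempt = /^((1|3)*(211|231|233|2211|2231|2233))*222$/;
console.log(findCounterexamples(attempt, 6).slice(0, 5));
```

For the attempt above, the first string reported is the empty string: the expression always requires a 222, although the language also allows zero occurrences.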

Can I use REGEX to 301 a URL with extra characters at the end?

I’ve just set up PHPlist to manage my email subscribers.

When folks opt-in, they’re taken to this page:

https://www.example.com/lists/?p=subscribe&id=1

I’d like to redirect them to a custom page here:

https://www.example.com/welcome

I tried to 301 from /lists/?p=subscribe&id=1 to /welcome, but it doesn’t work. I presume this is because of the characters after /lists/ (the query string).

And I can’t 301 from /lists to /welcome, because /lists is also the first part of the unsubscribe page’s URL.

Is there a way I can 301 from the full address above with REGEX? Or is there another way to get folks to a custom page without editing PHPlist’s code base?

Thanks!
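A likely reason the attempt fails is that many redirect mechanisms match the URL path and the query string separately; Apache’s RewriteRule, for example, is tested against the path only, and the query string has to be checked with a RewriteCond on %{QUERY_STRING}. The question doesn’t say which server or redirect tool the site uses, so this is only a JavaScript sketch of that matching logic; `redirectTarget`, `PATH_RE` and `QUERY_RE` are hypothetical names.

```javascript
// Path and query string are tested separately, mirroring how many server
// rewrite engines expose them. Only the exact subscribe confirmation URL is
// redirected; other /lists/ pages (e.g. unsubscribe) are left alone.
// Assumption: the parameters always arrive in this order, p first, then id.
const PATH_RE = /^\/lists\/$/;
const QUERY_RE = /^p=subscribe&id=1$/;

function redirectTarget(requestUrl) {
  const url = new URL(requestUrl);
  const query = url.search.replace(/^\?/, ""); // drop the leading "?"
  if (PATH_RE.test(url.pathname) && QUERY_RE.test(query)) {
    return { status: 301, location: "/welcome" };
  }
  return null; // no redirect
}

console.log(redirectTarget("https://www.example.com/lists/?p=subscribe&id=1"));
console.log(redirectTarget("https://www.example.com/lists/?p=unsubscribe&id=1"));
```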

Finding longest word without help of library functions or regex?

Have the function LongestWord(sen) take the sen parameter being passed and return the largest word in the string. If there are two or more words that are the same length, return the first word from the string with that length. Ignore punctuation and assume sen will not be empty.

**Examples**

Taken from https://coderbyte.com/information/Longest%20Word

Every solution I came across in JavaScript or C# uses regex. Is it possible to solve this without regex?

I gave it a shot but could not make it work:

```javascript
function LongestWord(sen) {
  let word = [];
  let longestword = "";
  let longestwordlen = 0;
  let wordlen = 0;
  for (let i = 0; i < sen.length - 1; i++) {
    if (isAlphabet(sen[i]) && !isInvalidChar(sen[i + 1])) {
      wordlen++;
      word.push(sen[i]);
    }
    if (isSpace(sen[i + 1])) {
      if (wordlen > longestwordlen) {
        longestwordlen = wordlen;
        longestword = word.join('')
      }
      wordlen = 0;
      word = [];
    }
  }
  return longestword;
}

function isSpace(char) {
  if (char.charCodeAt(0) == 32) return true
  else return false
}
function isInvalidChar(char) {
  if (!isSpace(char) && !isAlphabet(char)) return true
  else return false
}
function isAlphabet(char) {
  if ((char.charCodeAt(0) >= 65 && char.charCodeAt(0) <= 90) || (char.charCodeAt(0) >= 97 && char.charCodeAt(0) <= 122)) return true
  else return false
}

LongestWord("I am going to kill youeeeeee ")
```
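For comparison, here is a sketch that uses no regex and no string-splitting helpers: it walks the string once, building the current word character by character, and treats the end of the string like a space so the last word is also compared. It assumes words are separated by spaces, that only ASCII letters and digits count toward a word’s length, and that any other character is ignored without ending the word; the name `longestWordNoRegex` is hypothetical.

```javascript
// Assumption: ASCII letters and digits are word characters.
function isWordChar(ch) {
  return (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z") || (ch >= "0" && ch <= "9");
}

function longestWordNoRegex(sen) {
  let longest = "";
  let current = "";
  // Run one step past the end and treat that step as a space, so the final
  // word is compared even without a trailing space.
  for (let i = 0; i <= sen.length; i++) {
    const ch = i < sen.length ? sen[i] : " ";
    if (ch === " ") {
      // Strict ">" keeps the first word among words of equal length.
      if (current.length > longest.length) longest = current;
      current = "";
    } else if (isWordChar(ch)) {
      current += ch;
    }
    // Any other character (punctuation) is skipped.
  }
  return longest;
}

console.log(longestWordNoRegex("I am going to kill youeeeeee ")); // youeeeeee
console.log(longestWordNoRegex("fun&!! time")); // time
```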

Large DFA to regex?

For an assignment for one of my courses, one of the questions is to provide a regular expression for the language:

“the set of strings such that the number of 0’s is divisible by six, and the number of 1’s is divisible by five” over the alphabet {0, 1}.

I made a DFA for this language, and it has 30 states. However, converting this DFA to a regular expression through state reduction is proving to be very time-consuming.

What would be a better or easier way to create a regular expression that describes this language?
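Whatever method is used to derive the expression, a mechanical comparison with the 30-state DFA catches mistakes early. The sketch below simulates the product DFA (0s counted mod 6, 1s counted mod 5) and compares a candidate regex with it on every binary string up to a chosen length; the function names are hypothetical, and the candidate must be written in JavaScript syntax, anchored with `^` and `$`.

```javascript
// Simulates the 30-state product DFA: the state is
// (number of 0s mod 6, number of 1s mod 5), accepting only in (0, 0).
function dfaAccepts(s) {
  let zeros = 0;
  let ones = 0;
  for (const ch of s) {
    if (ch === "0") zeros = (zeros + 1) % 6;
    else if (ch === "1") ones = (ones + 1) % 5;
    else return false; // not over the alphabet {0, 1}
  }
  return zeros === 0 && ones === 0;
}

// Yields every binary string of length 0..maxLen, shortest first.
function* binaryStrings(maxLen) {
  let layer = [""];
  for (let len = 0; len <= maxLen; len++) {
    yield* layer;
    if (len < maxLen) layer = layer.flatMap((p) => [p + "0", p + "1"]);
  }
}

// Returns strings where an anchored candidate regex (no g flag) and the DFA disagree.
function dfaCounterexamples(candidate, maxLen) {
  const bad = [];
  for (const s of binaryStrings(maxLen)) {
    if (candidate.test(s) !== dfaAccepts(s)) bad.push(s);
  }
  return bad;
}

// A deliberately too-narrow candidate: it never interleaves 0s and 1s.
const narrow = /^(000000|11111)*$/;
console.log(dfaCounterexamples(narrow, 10).length); // 0
console.log(dfaCounterexamples(narrow, 11).includes("00000111110")); // true
```

The example also shows why the length bound matters: the narrow candidate agrees with the DFA on every string up to length 10 and only fails from length 11, the first length at which six 0s and five 1s can be interleaved.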