Normalize a rotation around the Z-axis (issue with GLM)

I’m trying to undo some transformations coming from an external tool*. I’m getting different results depending on subtle differences in the input, and I’m wondering how to convert the result to a rotation about the Z-axis only.

The transformations are expressed as a matrix = translation * rotation * -translation. I want to decompose the resulting matrix into a single translation and rotation around Z — I know this is possible given the source material (2D plane).

My problem is coming from GLM decompose. Given a matrix that looks like this:

    [      -0.5 | 0.866025 | 0 | 0 ]
    [ -0.866025 |     -0.5 | 0 | 0 ]
    [         0 |        0 | 1 | 0 ]
    [         0 |        0 | 0 | 1 ]

If I call decompose and then take the eulerAngles of the rotation, I end up with either:

  • ( 0, -0, 2.09439 ) from quat( 0.5, 0, 0.866025, 0 )
  • ( 3.14159, 1.0472, 3.14159 ) from quat( 0.5, 0, 0, 0.866025 )

The difference depends on how the matrix was generated: whether the rotation was 120 degrees or -240 degrees. The display must be clipping the floating point, introducing a subtle change.

I’m assuming both these rotations are actually the same.

How do I force/convert the result to be a rotation about the Z-axis only?


*The external tool is Inkscape, which uses the CSS/SVG function rotate(r, cx, cy) instead of a separate rotation and translation. That function results in the matrix: translate(cx,cy,0) * rotate(r, (0,0,1)) * translate(-cx,-cy,0)
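One way around the decompose ambiguity, assuming the matrix really is a translation combined with a pure Z rotation: read the angle straight out of the upper-left 2×2 block of the rotation part with atan2, which always yields one canonical angle in (-π, π]. A minimal sketch in Python (the helper name is mine; the column-major reading of GLM's printout is an assumption):

    import math

    # For a pure rotation about Z the upper-left 2x2 block is
    #   [ cos(r)  -sin(r) ]
    #   [ sin(r)   cos(r) ]
    # so atan2(sin, cos) gives one canonical angle, regardless of whether
    # the matrix was built from 120 degrees or -240 degrees.
    def z_angle(m00, m01):
        # m00 = cos(r), m01 = sin(r), reading the matrix column-major as GLM stores it
        return math.atan2(m01, m00)

    # Values from the matrix in the question:
    print(math.degrees(z_angle(-0.5, 0.866025)))  # 120.0 for both inputs

Both printed quaternions describe the same matrix, so this gives the same angle either way; the translation can then be recovered from the last column once the rotation is known.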

TypeError: normalize() missing 1 required positional argument: 'text'

I’m working on a project that takes a file, goes through a specified directory, and returns their similarity in %… I have been able to get this working, but the problem is converting it into a GUI. The raw code works, but when implemented in the GUI (PyQt5) it raises a TypeError: normalize() missing 1 required positional argument: 'text' error…

here is the raw code

    import docx
    import nltk, string
    from sklearn.feature_extraction.text import TfidfVectorizer
    import os
    from pathlib import Path
    from TV1 import App


    def getText(filename):
        doc = docx.Document(filename)
        fullText = []
        for para in doc.paragraphs:
            fullText.append(para.text)
        print('\n'.join(fullText))
        return '\n'.join(fullText)


    # nltk.download('punkt')  # if necessary...

    stemmer = nltk.stem.porter.PorterStemmer()
    remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)


    def stem_tokens(tokens):
        return [stemmer.stem(item) for item in tokens]


    '''remove punctuation, lowercase, stem'''


    def normalize(text):
        return stem_tokens(nltk.word_tokenize(text.lower().translate(remove_punctuation_map)))


    vectorizer = TfidfVectorizer(tokenizer=normalize, stop_words='english')


    def cosine_sim(text1, text2):
        text11 = text1
        text22 = open(text2, 'r', encoding='utf-8', errors='ignore').read()
        tfidf = vectorizer.fit_transform([text11, text22])
        n = (((tfidf * tfidf.T) * 100).A)[0, 1]
        return '%.3f%% similarity' % n


    file = 'BB.docx'
    spath = r'C:\Users\Black Laptop\Desktop\Work'

    print('---------------------------------')

    text = getText(file)
    if os.path.exists(spath):
        for path in Path(spath).iterdir():
            print(path)
            print(os.path.basename(path))
            print(cosine_sim(text, path))
            print('')

GUI…

    import sys

    from PyQt5.QtCore import (QDate, QDateTime, QRegExp, QSortFilterProxyModel, Qt,
                              QTime)
    from PyQt5.QtGui import QStandardItemModel
    from PyQt5.QtWidgets import (QApplication, QCheckBox, QComboBox, QGridLayout,
                                 QGroupBox, QHBoxLayout, QLabel, QLineEdit, QTreeView, QVBoxLayout,
                                 QWidget, QTableView, QTableWidget)

    import docx
    import nltk, string
    from sklearn.feature_extraction.text import TfidfVectorizer
    import os
    from pathlib import Path


    class App(QWidget):
        FILES, SIMILAR = range(2)

        def __init__(self):
            super().__init__()
            self.title = 'Plagiarism Checker'
            self.left = 50
            self.top = 50
            self.width = 640
            self.height = 240
            # self.initUI()
            self.one()

        def initUI(self):
            [...]

        def getText(self, filename):
            doc = docx.Document(filename)
            fullText = []
            for para in doc.paragraphs:
                fullText.append(para.text)
            print('\n'.join(fullText))
            return '\n'.join(fullText)

        # nltk.download('punkt')  # if necessary...

        stemmer = nltk.stem.porter.PorterStemmer()
        remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)

        def stem_tokens(self, tokens):
            return [self.stemmer.stem(item) for item in tokens]

        '''remove punctuation, lowercase, stem'''

        def normalize(self, text):
            return self.stem_tokens(nltk.word_tokenize(text.lower().translate(self.remove_punctuation_map)))

        vectorizer = TfidfVectorizer(tokenizer=normalize, stop_words='english')

        def cosine_sim(self, text1, text2):
            text11 = text1
            text22 = open(text2, 'r', encoding='utf-8', errors='ignore').read()
            tfidf = self.vectorizer.fit_transform([text11, text22])
            n = (((tfidf * tfidf.T) * 100).A)[0, 1]
            return '%.3f%% similarity' % n

        def one(self):
            file = 'BB.docx'
            spath = r'C:\Users\Black Laptop\Desktop\Work'

            print('---------------------------------')

            text = self.getText(file)
            if os.path.exists(spath):
                for path in Path(spath).iterdir():
                    print(path)
                    print(os.path.basename(path))
                    print(self.cosine_sim(text, path))
                    print('')


    if __name__ == '__main__':
        app = QApplication(sys.argv)
        ex = App()
        sys.exit(app.exec_())

ERROR..

    Traceback (most recent call last):
      File "C:/Users/Black Laptop/PycharmProjects/StringPatternMatcher/CT.py", line 117, in <module>
        ex = App()
      File "C:/Users/Black Laptop/PycharmProjects/StringPatternMatcher/CT.py", line 29, in __init__
        self.one()
      File "C:/Users/Black Laptop/PycharmProjects/StringPatternMatcher/CT.py", line 111, in one
        print(self.cosine_sim(text, path))
      File "C:/Users/Black Laptop/PycharmProjects/StringPatternMatcher/CT.py", line 93, in cosine_sim
        tfidf = self.vectorizer.fit_transform([text11, text22])
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\sklearn\feature_extraction\text.py", line 1652, in fit_transform
        X = super().fit_transform(raw_documents)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\sklearn\feature_extraction\text.py", line 1058, in fit_transform
        self.fixed_vocabulary_)
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\sklearn\feature_extraction\text.py", line 970, in _count_vocab
        for feature in analyze(doc):
      File "C:\Program Files (x86)\Python36-32\lib\site-packages\sklearn\feature_extraction\text.py", line 352, in <lambda>
        tokenize(preprocess(self.decode(doc))), stop_words)
    TypeError: normalize() missing 1 required positional argument: 'text'

    Process finished with exit code 1

Any help would be appreciated…thanks
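For what it's worth, the error comes from the class-body line vectorizer = TfidfVectorizer(tokenizer=normalize, ...): at that point normalize is a plain function, so when sklearn later calls tokenizer(doc), the document lands in the self parameter and text is left missing. A minimal sketch of one possible fix, stripped of the PyQt5 parts for brevity (the Checker class and its simplified tokenizer are mine, purely for illustration): build the vectorizer in __init__ and pass the bound method self.normalize.

    from sklearn.feature_extraction.text import TfidfVectorizer

    class Checker:
        def normalize(self, text):
            return text.lower().split()   # stand-in for the real stem/tokenize step

        def __init__(self):
            # Passing the *bound* method self.normalize works; referencing the
            # function from the class body (as in the question) passes an unbound
            # function, so the document fills `self` and `text` goes missing.
            self.vectorizer = TfidfVectorizer(tokenizer=self.normalize,
                                              stop_words='english')

        def cosine_sim(self, text1, text2):
            tfidf = self.vectorizer.fit_transform([text1, text2])
            return '%.3f%% similarity' % (((tfidf * tfidf.T) * 100).A)[0, 1]

    print(Checker().cosine_sim('the cat sat', 'the cat sat down'))

Equivalently, keeping normalize as a module-level function, as in the raw script, also avoids the problem.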

Normalize JSON data in Python

I am trying to produce some “friendly JSON” in Python.
The data I am reading comes from an LDAP database and can be in the form:

    sample_object = [{'name': 'John', 'title': 'programmer'},
                     {'name': 'Bob', 'title': ['full stack developer', 'ldap developer']}]

Notice that in the first object the title is a single value, while in the second object the title is an array.

I want to normalize this data so that title is always a list in the json, even if it only has one value. This will allow the calling programs to process every item without doing an isinstance check on the title to see if it is an array or a single value.

For example:

    def test_process(self):
        group = self.sample_object()
        for person in group:
            print(person['title'][0])

Output:

    p
    full stack developer

    # want to avoid the if statement
    def test_process2(self):
        group = self.sample_json()
        for person in group:
            if isinstance(person['title'], list):
                print(person['title'][0])
            else:
                print(person['title'])
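One straightforward approach is to normalize once, right after reading from LDAP: walk the list and wrap any bare title in a single-element list. A minimal sketch (the helper name normalize_titles is mine, purely for illustration):

    # Wrap any non-list value under `key` in a single-element list, in place.
    def normalize_titles(group, key='title'):
        for person in group:
            value = person.get(key)
            if value is not None and not isinstance(value, list):
                person[key] = [value]
        return group

    sample_object = [{'name': 'John', 'title': 'programmer'},
                     {'name': 'Bob', 'title': ['full stack developer', 'ldap developer']}]

    for person in normalize_titles(sample_object):
        print(person['title'][0])   # 'programmer', then 'full stack developer'

After this pass, every caller can index person['title'] as a list with no isinstance check.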

Minimum number of tree operations to normalize a labeled tree

Given a binary tree with labels on the leaves, like $(bc)(ad)$ or $((af)e)(c(db))$, which we can interpret as a product of terms with respect to a commutative associative operation, how many applications of commutativity (swapping the two children of a node) and associativity (tree rotations) are needed to bring this tree to a sorted normal form like $a(b(cd))$ or $a(b(c(d(ef))))$? Examples:

$$(bc)(ad)\mapsto((bc)a)d\mapsto(a(bc))d\mapsto a((bc)d)\mapsto a(b(cd))$$

\begin{align}
&\phantom{{}\mapsto{}}((af)e)(c(db))\\
&\mapsto (a(fe))(c(db))\\
&\mapsto (a(fe))((cd)b)\\
&\mapsto (a(fe))(b(cd))\\
&\mapsto (a(ef))(b(cd))\\
&\mapsto a((ef)(b(cd)))\\
&\mapsto a((b(cd))(ef))\\
&\mapsto a(b((cd)(ef)))\\
&\mapsto a(b(c(d(ef))))
\end{align}

There are $C_{n-1}\cdot n!$ possible trees on $n$ elements, and about $2n$ possible operations to apply at each stage, so the information-theoretic bound gives $\Omega(\log_{2n}(C_{n-1}\cdot n!))=\Omega(n)$. On the other hand, if we first fully right-associate, then we can perform adjacent swaps with $O(1)$ operations, leading to an upper bound of $O(n^2)$ operations, as in the sketch below.
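For concreteness, here is a short Python sketch of that $O(n^2)$ procedure (the nested-tuple representation and the cost accounting are my assumptions, purely for illustration; it counts an upper bound on operations, not an optimal sequence):

    # Trees are nested 2-tuples with string leaves, e.g. (('b','c'),('a','d')).

    def rotate_right(t):
        # associativity: ((x y) z) -> (x (y z))
        (x, y), z = t
        return (x, (y, z))

    def right_associate(t, ops=0):
        """Rotate until the tree is a right comb a1 (a2 (... an))."""
        if isinstance(t, str):
            return t, ops
        while isinstance(t[0], tuple):
            t = rotate_right(t)
            ops += 1
        rest, ops = right_associate(t[1], ops)
        return (t[0], rest), ops

    def leaves(t):
        return [t] if isinstance(t, str) else leaves(t[0]) + leaves(t[1])

    def sort_tree(t):
        """Normal form via right-association + bubble sort; returns (tree, ops)."""
        t, ops = right_associate(t)
        labels = leaves(t)
        n = len(labels)
        for i in range(n):
            for j in range(n - 1 - i):
                if labels[j] > labels[j + 1]:
                    labels[j], labels[j + 1] = labels[j + 1], labels[j]
                    # an adjacent swap on a right comb costs at most 3 operations:
                    # a (b rest) -> (a b) rest -> (b a) rest -> b (a rest)
                    ops += 3
        tree = labels[-1]          # rebuild the sorted right comb
        for lab in reversed(labels[:-1]):
            tree = (lab, tree)
        return tree, ops

    print(sort_tree((('b', 'c'), ('a', 'd'))))  # (('a', ('b', ('c', 'd'))), 7)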

I suspect that $O(n\log n)$ operations suffice, perhaps even $O(n)$, but I have not been able to improve on the above bounds. I know that $O(n)$ operations would work if it were possible to perform exchanges in $O(1)$, but every element's depth changes by at most 1 per operation, so exchanges are rather expensive ($O(n)$) in this model. Probably we want to keep the tree balanced during the sorting, but assuming a balanced tree it is not clear how to sort effectively by block swaps.

Normalize Deflection Equation

This question is probably more math-related…

I know the beam deflection equation for a clamped-clamped beam due to a point load at its center (https://ocw.mit.edu/courses/mechanical-engineering/2-080j-structural-mechanics-fall-2013/course-notes/MIT2_080JF13_Lecture5.pdf, equation 5.46b)

$\left. w \right|_{\text{point}} = \frac{P x^2}{48 E I} (3L - 4x)$

Now, I would like to normalize it, such that:

  1. Maximum Deflection should be equal to 1:

$a(x) = \frac{w(x)}{w(L/2)}$

  2. x should be normalized with respect to the length of the beam:

$b(\hat{x}) = a(\hat{x} L)$

When I do this in Mathematica:

ClearAll["Global`*"] w[x_] := P*x^2/(24*Y*ii)*(3*L - 4*x); a[x_] = w[x]/w[L/2]  b[xh_] = a[xh*L] // Simplify Plot[a[xh], {xh, 0, 1}] 

(Plot omitted: the output of Plot[a[xh], {xh, 0, 1}].)

As you can see, this is not the expected shape of a beam deformed by a point force. Does anyone know what I am doing wrong? Thanks!
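As a cross-check of the algebra, here is a hedged SymPy sketch (my translation of the same steps, not the asker's session). Two things stand out: equation 5.46b only holds on the left half of the beam, and the Plot call above passes xh to a, which expects the physical coordinate x, rather than to the normalized function b.

    import sympy as sp

    x, xh, P, L, E, I = sp.symbols('x xh P L E I', positive=True)

    # Deflection under a center point load, valid for 0 <= x <= L/2
    # (the right half is its mirror image by symmetry).
    w = P*x**2/(48*E*I) * (3*L - 4*x)

    a = sp.simplify(w / w.subs(x, L/2))   # max deflection normalized to 1
    b = sp.simplify(a.subs(x, xh*L))      # normalized coordinate xh = x/L

    print(b)                              # 4*xh**2*(3 - 4*xh), up to rearrangement:
                                          # P, E, I and L all cancel
    print(b.subs(xh, sp.Rational(1, 2)))  # 1, as required

Note the constant prefactor cancels in a, so the 24 versus 48 in the Mathematica code does not affect the shape; plotting b only makes sense for xh in [0, 1/2], with the other half mirrored.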

How/when to normalize during ETL?

Let’s say you’re loading a denormalized flat file of purchase transactions that looks like this:

    | location_name | location_zip | product | product_price |
    |---------------|--------------|---------|---------------|
    | downtown      | 90001        | fries   | 2.99          |
    | west side     | 90048        | burger  | 5.99          |
    etc....

into a SQL database. In a normalized star-schema DB, you would have a locations table where the ZIP code is stored, and a products table where the price is stored.

So what you should be loading into the purchases table is this:

    | location_id | product_id |
    |-------------|------------|
    | 01          | 01         |
    | 02          | 02         |
    etc....

My question is: how can we normalize the data like this during the ETL process, before it enters the database? The process is complicated by the fact that some locations may already exist in the database with assigned IDs, and some do not. It would be very inefficient to query the DB before inserting each purchase row just to determine (or insert) the location and product IDs.

Any general advice on how to approach this problem would be greatly appreciated!
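One common pattern, sketched below under stated assumptions (SQLite for brevity; the table and column names, file name, and helper are mine): preload the existing dimension keys into an in-memory dict once, then stream the flat file and insert a new dimension row only on a cache miss, so each purchase row costs dictionary lookups instead of round-trip queries.

    import csv, sqlite3

    conn = sqlite3.connect('warehouse.db')
    cur = conn.cursor()
    cur.executescript('''
    CREATE TABLE IF NOT EXISTS locations (
        location_id INTEGER PRIMARY KEY, location_name TEXT UNIQUE, location_zip TEXT);
    CREATE TABLE IF NOT EXISTS products (
        product_id INTEGER PRIMARY KEY, product TEXT UNIQUE, product_price REAL);
    CREATE TABLE IF NOT EXISTS purchases (location_id INTEGER, product_id INTEGER);
    ''')

    def key_cache(table, natural_key, id_col):
        """Preload natural-key -> surrogate-id mappings already in the DB."""
        return dict(cur.execute(f'SELECT {natural_key}, {id_col} FROM {table}'))

    locations = key_cache('locations', 'location_name', 'location_id')
    products = key_cache('products', 'product', 'product_id')

    with open('purchases.csv') as f:           # the flat file from the question
        for row in csv.DictReader(f):
            loc = locations.get(row['location_name'])
            if loc is None:                    # unseen location: insert once, cache its id
                cur.execute('INSERT INTO locations (location_name, location_zip) '
                            'VALUES (?, ?)', (row['location_name'], row['location_zip']))
                loc = locations[row['location_name']] = cur.lastrowid
            prod = products.get(row['product'])
            if prod is None:                   # unseen product: insert once, cache its id
                cur.execute('INSERT INTO products (product, product_price) '
                            'VALUES (?, ?)', (row['product'], row['product_price']))
                prod = products[row['product']] = cur.lastrowid
            cur.execute('INSERT INTO purchases (location_id, product_id) VALUES (?, ?)',
                        (loc, prod))
    conn.commit()

At warehouse scale the same idea is usually done set-based (stage the file, bulk-upsert the dimensions, then join to populate the fact table), but the caching principle is the same.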

How do I normalize a database to decrease duplicate entries?

I am designing a database system + application that deals with pipe objects that are part of a larger grid network. There are about 1000-5000 pipes in each network. Let’s assume each pipe can have 2 states (clean, dirty).

The application is able to change the pipe status after a maintenance job has been done at that pipe. Each maintenance job has an ID and several other pieces of information behind it. The same is true for the pipes.

My design for this problem is as follows:

    table: pipe_status

    id     maintenance_id   pipe_id   status
    1      1                1         clean
    2      1                2         clean
    3      1                3         dirty
    4      1                4         dirty
    ...
    1000   1                1000      dirty
    1001   2                1         clean
    1002   2                2         clean
    1003   2                3         dirty
    1004   2                4         clean
    ...
    2000   2                1000      dirty

So for every maintenance job, each individual pipe should have a status attribute, depending on whether it has been affected by this particular maintenance job or not. This means that for every maintenance job there are as many status entries as there are pipes, resulting in a quickly growing amount of data.

Example: 20+ grids with 1000-5000 pipes each, and 500+ maintenance jobs per grid (and growing), results in 10-50 million entries in this table at the moment.

Is there a better way to implement this in a database? It is important to note that the status of every pipe is visualized in the application: even if a maintenance job only affects 5 pipes, the other 995 are still shown in the app. In this application the user can select a specific maintenance job and see the corresponding pipes and their status.
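One alternative, sketched below with invented names (and the assumption that maintenance_id increases over time): store only the pipes a job actually touched, and reconstruct the full per-job view with a "latest change up to this job" query, with untouched pipes falling back to a default status in the application.

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.executescript('''
    CREATE TABLE pipe_status_change (
        maintenance_id INTEGER NOT NULL,
        pipe_id        INTEGER NOT NULL,
        status         TEXT NOT NULL,            -- 'clean' or 'dirty'
        PRIMARY KEY (maintenance_id, pipe_id)
    );
    -- only pipes touched by a job get a row, not all 1000-5000
    INSERT INTO pipe_status_change VALUES
        (1, 1, 'clean'), (1, 3, 'dirty'),
        (2, 4, 'clean');
    ''')

    # Status of every changed pipe as of job 2: each pipe's most recent
    # change with maintenance_id <= 2; pipes with no row keep the default.
    rows = conn.execute('''
    SELECT pipe_id, status
    FROM pipe_status_change AS c
    WHERE maintenance_id = (
        SELECT MAX(maintenance_id) FROM pipe_status_change
        WHERE pipe_id = c.pipe_id AND maintenance_id <= ?
    )
    ''', (2,)).fetchall()
    print(rows)   # e.g. [(1, 'clean'), (3, 'dirty'), (4, 'clean')]

With 500+ jobs touching only a handful of pipes each, this change-log table stays orders of magnitude smaller than one row per pipe per job, at the cost of a slightly heavier read query (which an index on (pipe_id, maintenance_id) keeps cheap).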