I have a $ 12 \times 12$ (so not very large) system of linear equations over $ \mathbb{F}_2$ , which I brought to RREF through the usual row reduction. Suppose the system has multiple solutions, and call the unknowns $ x_i$ . What is the least expensive way to find a solution that minimizes the number of $ x_i$ with $ x_i = 1$ , or equivalently, a solution of minimal norm? Is this solution unique?
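For a system this small, one option is plain exhaustive search over the free variables of the RREF, since there are at most $ 2^{12} = 4096$ candidate solutions. The sketch below assumes the system is given as a list of `(coeffs, rhs)` rows already in RREF; the function name `min_weight_solutions` and this input format are illustrative choices, not part of the question.

```python
from itertools import product

def min_weight_solutions(rows, n):
    """rows: list of (coeffs, rhs) in RREF over F_2, coeffs being a list of n bits.
    Returns (minimal weight, list of all solutions attaining it)."""
    pivots = []
    for coeffs, rhs in rows:
        if not any(coeffs):
            if rhs:
                raise ValueError("inconsistent system: 0 = 1")
            continue  # zero row, carries no information
        pivots.append((coeffs.index(1), coeffs, rhs))

    pivot_cols = {p for p, _, _ in pivots}
    free_cols = [j for j in range(n) if j not in pivot_cols]

    best_weight, best = n + 1, []
    # Every solution is determined by a choice of the free variables.
    for values in product((0, 1), repeat=len(free_cols)):
        x = [0] * n
        for j, v in zip(free_cols, values):
            x[j] = v
        # In RREF each row reads: x_pivot + sum(free terms) = rhs (mod 2).
        for p, coeffs, rhs in pivots:
            x[p] = (rhs + sum(coeffs[j] * x[j] for j in free_cols)) % 2
        w = sum(x)
        if w < best_weight:
            best_weight, best = w, [x]
        elif w == best_weight:
            best.append(x)
    return best_weight, best

# x0 + x2 = 1, x1 + x2 = 1: the minimiser is unique.
print(min_weight_solutions([([1, 0, 1], 1), ([0, 1, 1], 1)], 3))
# x0 + x1 = 1: two solutions share the minimal weight.
print(min_weight_solutions([([1, 1], 1)], 2))
```

The second toy system shows that the minimiser need not be unique: both `[1, 0]` and `[0, 1]` have weight 1.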

# Tag: norm

## Moments of the Schatten norm of a matrix

I am wondering what the connection is between the order of the moment of the p-th Schatten norm of a matrix and the order of the Schatten norm itself.

More precisely, why would one ever seek a bound on the p-th moment of the p-th Schatten norm of a matrix, rather than considering the q-th moment of the p-th Schatten norm?

(see for example https://scholar.google.nl/scholar?hl=en&as_sdt=0%2C5&as_vis=1&q=moment+of+schatten+norm&btnG=#d=gs_qabs&u=%23p%3Dap9f83X_ylcJ)

## Reference request: norm topology on M(X) vs. weak topology

Let $ (X,d)$ be a metric space and $ \mathcal{M}(X)$ be the space of regular (e.g. Radon) measures on $ X$ . There are two standard topologies on $ \mathcal{M}(X)$ : The (probabilist’s) weak topology and the strong norm topology, where the norm is the total variation norm.

Surprisingly, I have found very little discussion in the literature comparing these two topologies rigorously, besides the oft-cited claim that the norm topology is much stronger than the weak topology. I am looking for a reference that discusses and compares these topologies, especially with regard to convergence, boundedness, open sets, projections, etc.

I am mostly concerned with probability measures $ \mathcal{P}(X)\subset\mathcal{M}(X)$ , but I am not sure how much of a difference this makes with respect to topological concerns.

## Luxemburg norm as argument of Young’s function: $\Phi\left(\lVert f \rVert_{L^{\Phi}}\right)$

Let $ \Phi$ be a *Young function*, i.e. $$ \Phi(t) = \int_0^t \varphi(s) \,\mathrm d s$$ for some $ \varphi$ satisfying

- $ \varphi:[0,\infty)\to[0,\infty]$ is increasing
- $ \varphi$ is lower semicontinuous
- $ \varphi(0) = 0$
- $ \varphi$ is neither identically zero nor identically infinite

and define the *Luxemburg norm* of $ f:\Omega\to\mathbb{R}$ as $$ \lVert f \rVert_{L^{\Phi}} := \inf \left\{\gamma>0\,\middle|\, \int_{\Omega} \Phi\left(\frac {\lvert f(x)\rvert}{\gamma} \right)\,\mathrm{d}x \leq 1\right\}.$$

Question: What can we say about $ \Phi\left(\lVert f \rVert_{L^{\Phi}}\right)$ ? In particular, I’d like to know whether $$ \Phi\left(\lVert f \rVert_{L^{\Phi}}\right) \leq C \int_{\Omega}\Phi(\lvert f(x)\rvert) \,\mathrm d x$$ holds for some $ C$ independent of $ f$ .

Any idea or hint for a reference is welcome!

**Notes**:

- The above inequality trivially holds for $ \Phi(t) = t^p$ , where $ p>1$
- Maybe it’s appropriate to consider this question in the more general framework of Musielak-Orlicz spaces. However, I was unable to find an appropriate result in, e.g., *Lebesgue and Sobolev Spaces with Variable Exponents*.
- I have asked this question on Math.Stackexchange without luck, so I’m trying here.
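To make the definition concrete, here is a small numerical sketch of the Luxemburg norm, using the usual condition that the modular integral is at most 1. It assumes $ \Omega$ is a finite set with counting measure and that $ \Phi$ is finite, continuous and increasing, so the infimum can be located by bisection; the function name `luxemburg_norm` is illustrative only.

```python
def luxemburg_norm(values, phi, iterations=100):
    """Luxemburg norm of a finite sequence w.r.t. counting measure.

    Finds the smallest gamma > 0 with sum(phi(|v| / gamma)) <= 1,
    assuming phi is finite, continuous and increasing on [0, inf)."""
    if not any(values):
        return 0.0

    def modular(gamma):
        return sum(phi(abs(v) / gamma) for v in values)

    # Bracket the answer: modular(hi) <= 1 < modular(lo).
    hi = 1.0
    while modular(hi) > 1.0:
        hi *= 2.0
    lo = hi / 2.0
    while modular(lo) <= 1.0:
        hi, lo = lo, lo / 2.0

    # The modular is decreasing in gamma, so bisection applies.
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if modular(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

f = [3.0, 4.0]
phi = lambda t: t ** 2
norm = luxemburg_norm(f, phi)
# Compare Phi(norm) with the modular sum of Phi(|f_i|).
print(norm, phi(norm), sum(phi(abs(v)) for v in f))
```

For $ \Phi(t)=t^p$ the computed value agrees with the $ \ell^p$ norm, and $ \Phi$ of the norm equals the modular, which matches the first note above. For other choices of `phi` the same routine can be used to probe the inequality numerically, though that is of course no proof.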

## Schur norm of weighted Cauchy matrix

The Schur norm of a matrix $ A$ is defined to be $ \|A\|_S=\max\{\|A\circ X\|: \|X\|\leq 1\}$ , where $ \|\cdot \|$ is the operator norm of a matrix, i.e., the largest singular value.

Let $ a_1,\ldots, a_m, b_1,\ldots, b_n$ be positive reals. Let $ A$ be the $ m\times n$ matrix defined by $ A_{i,j}=(a_i-b_j)/(a_i+b_j)$ .

My question is how to compute $ \|A\|_S$ . Is it upper bounded by an absolute constant independent of $ m, n$ ?

## The norm squared of a moment map

I am studying the paper by E. Lerman: https://arxiv.org/pdf/math/0410568.pdf

Let $ (M,\sigma)$ be a connected symplectic manifold with a Hamiltonian action of a compact Lie group $ G$ , so that there exists a moment map $$ \mu : M\to\mathcal{G}^\ast,$$ $ \mathcal{G}^\ast$ being the dual of the Lie algebra of $ G$ . We assume that $ \mu$ is $ G$ -equivariant, $$ \mu(g\cdot x)=\mathrm{Ad}_g^\ast\circ\mu(x),$$ and that $ \mu$ is proper (the preimage of any compact set is compact). Let $ f=\|\mu\|^2$ (for an Ad-invariant norm on $ \mathcal{G}^\ast$ ).

I know that the moment map is important because of:

1. a convexity theorem of Atiyah and Guillemin-Sternberg;

2. symplectic reduction, where the quotient of the zero level set of the moment map by the group makes it possible to construct new symplectic manifolds.

Hence my question:

What is the motivation for studying the norm squared of a moment map? In particular, why is it important to know that the zero level set of the moment map is a deformation retract of a piece of the manifold?

As I understand it, $ f$ behaves like a Morse-Bott function (Kirwan’s work), and the stable manifold of a critical component of $ f$ is a submanifold. Moreover, the gradient flow of $ f$ is defined for all $ t\geq0$ . Lerman asserts that this is true because $ f$ is proper, but $ x^3$ is proper and the flow of its negative gradient $ -3x^2\partial_x$ is not defined for all $ t\geq0$ .

I think we have to show that $ \nabla f$ is $ G$ -invariant and therefore complete.

One also needs that $ f$ is real analytic, in order to show that the trajectory $ \phi_t(x)$ of any point converges to a single point $ \phi_\infty(x)$ , and that the maps $ t\to\phi_t(x)$ and $ x\to\phi_\infty(x)$ are continuous.

## Linear regression: not normalising by y’s norm

I was recently reading an article on Pearson correlation, and OLS coefficients. I came across the following section.

I understand that using calculus we can arrive at an expression for the coefficient `a`. The expression’s denominator turns out not to contain y’s norm. In the last paragraph of the excerpt, I could not understand the following line:

Not normalizing for y is what you want for the linear regression

Why don’t we want to normalize for y? What is the physical/geometrical significance of this?
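A small numerical sketch may help frame the question. The excerpt is not reproduced here, so this assumes the usual formulas for centered vectors: OLS slope $ \langle x,y\rangle/\|x\|^2$ and Pearson correlation $ \langle x,y\rangle/(\|x\|\,\|y\|)$ . The helper names and the data are made up for illustration.

```python
from math import sqrt

def centered(v):
    """Subtract the mean from every entry."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ols_slope(x, y):
    """Slope of the least-squares line: <x, y> / ||x||^2 on centered data."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def pearson(x, y):
    """Pearson correlation: <x, y> / (||x|| ||y||) on centered data."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / (sqrt(dot(xc, xc)) * sqrt(dot(yc, yc)))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
y_scaled = [10.0 * v for v in y]  # same data, y measured in different units

print(ols_slope(x, y), ols_slope(x, y_scaled))
print(pearson(x, y), pearson(x, y_scaled))
```

Rescaling `y` by 10 multiplies the slope by 10 but leaves the correlation unchanged. The slope is meant to predict `y` in its own units, so dividing by y’s norm would throw that scale away.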

## Hoeffding to bound Orlicz norm

I have been reading from *Weak Convergence and Empirical Processes*, and came across the following: Let $ a_1,\ldots,a_n$ be constants and $ \epsilon_1,\ldots,\epsilon_n\sim$ Rademacher. Then

$$ \mathbb{P}\left(\left|\sum_i\epsilon_i a_i\right|>x\right)\leq 2\exp\left(-\frac{x^2}{2\|a\|^2_2}\right).$$

Consequently, $ \left\|\sum_i\epsilon_ia_i\right\|_{\Psi_2}\leq\sqrt{6}\,\|a\|_2$ .

How does this follow (relation between Orlicz norm of Rademacher average and L2 norm of constants)? Thank you in advance for your time.
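One possible route, sketched under the assumption that $ \psi_2(x)=e^{x^2}-1$ and that the Orlicz norm is $ \|X\|_{\psi_2}=\inf\{C>0:\mathbb{E}\,\psi_2(|X|/C)\leq 1\}$ (I believe this is the book’s convention, but it is not stated above): write $ X=\sum_i\epsilon_i a_i$ and $ \sigma=\|a\|_2$ , express the expectation as an integral of the tail, and insert the tail bound. For $ C^2>2\sigma^2$ ,

$$
\mathbb{E}\left[e^{X^2/C^2}-1\right]
=\int_0^\infty \mathbb{P}(|X|>x)\,\frac{2x}{C^2}\,e^{x^2/C^2}\,\mathrm dx
\leq \frac{4}{C^2}\int_0^\infty x\,\exp\left(-x^2\left(\frac{1}{2\sigma^2}-\frac{1}{C^2}\right)\right)\mathrm dx
=\frac{2}{C^2/(2\sigma^2)-1}.
$$

The right-hand side is at most 1 exactly when $ C^2\geq 6\sigma^2$ , so $ C=\sqrt{6}\,\sigma$ is admissible in the infimum, which would give $ \|X\|_{\psi_2}\leq\sqrt{6}\,\|a\|_2$ .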

## Strict Convexity and Uniqueness of Dual norm

I have trouble proving the following, and I’d be grateful if somebody could help me with it.

Let $ z$ be a given point in $ \mathbb{R}^m$ . Then, $ x\in \mathbb{R}^m$ is a dual vector of $ z$ with respect to $ \|.\|$ if it satisfies $ \|x\|=1$ and $ z^Tx=\|z\|'$ .

A norm $ \|.\|$ is said to be strictly convex if the unit sphere $ \{x:\|x\|=1\}$ contains no line segment.

Now, how does one prove that

The norm $ \|.\|$ is strictly convex if and only if each $ z\in \mathbb{R}^m$ has a unique dual vector.

## Spectral radius is the greatest lower bound for some matrix norm

I’m studying matrix analysis with Horn and Johnson’s book.

I ran into some trouble while reading it.

My question concerns Lemma 5.6.10 and its proof.

I have trouble with the two lines below the displayed matrix, which conclude that the 1-norm of $ D_t \triangle D_t^{-1}$ is less than or equal to $ \rho(A)+\epsilon$ .

The 1-norm is defined as the sum of all elements of the matrix.

I understand that the off-diagonal elements can be bounded by $ \epsilon$ for large $ t$ . However, I cannot understand how the sum of the absolute values of the eigenvalues is bounded by the spectral radius of $ A$ .
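A small numerical experiment may help locate the difficulty. It assumes $ D_t=\mathrm{diag}(t,t^2,\ldots,t^n)$ and compares two candidate readings of the "1-norm": the entrywise sum of absolute values, and the maximum column sum norm. Which of the two the book intends is an assumption to check against the text, and the matrix below is made up for illustration.

```python
def scaled(delta, t):
    """Entries of D_t * delta * D_t^{-1} for D_t = diag(t, t^2, ..., t^n).

    Entry (i, j) is delta[i][j] * t**(i - j), so entries above the
    diagonal (i < j) shrink as t grows."""
    n = len(delta)
    return [[delta[i][j] * t ** (i - j) for j in range(n)] for i in range(n)]

def max_column_sum(m):
    """Maximum over columns of the sum of absolute values in that column."""
    return max(sum(abs(row[j]) for row in m) for j in range(len(m[0])))

def entrywise_sum(m):
    """Sum of the absolute values of all entries."""
    return sum(abs(v) for row in m for v in row)

# Hypothetical upper triangular matrix with eigenvalues 2, -1, 0.5.
delta = [[2.0, 5.0, -3.0],
         [0.0, -1.0, 4.0],
         [0.0, 0.0, 0.5]]
rho = max(abs(delta[i][i]) for i in range(3))  # spectral radius

m = scaled(delta, 1000.0)
print(rho, max_column_sum(m), entrywise_sum(m))
```

In this example the maximum column sum stays within a small $ \epsilon$ of the spectral radius 2, while the entrywise sum approaches $ 2+1+0.5=3.5$ , the sum of the absolute values of the eigenvalues. So the bound $ \rho(A)+\epsilon$ is only plausible under the column-sum reading.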