What are the rules for China’s visa-free transit programs?

Assuming one wants to travel through China, what are the rules for visa-free access? To be more specific:

  • How does the 24-hour visa-free transit work?
  • How does the 72-hour version work?
  • What does a person ineligible for either of these programs have to do to transit China?
  • Do the programs apply to flights only or do they also work for transit on land?
  • Would a visitor save much hassle by using visa-free transit instead of just getting a visa?

NB: this is intended as a canonical question on transiting China

Can graph execution be better optimized than imperative programs?

I’ve been reading about Google’s TensorFlow, and the way it represents calculations with graphs that are then executed by an engine. While the concept is interesting, I would like to understand why they made that choice instead of the arguably simpler imperative programming found e.g. in PyTorch.

The TensorFlow documentation lists several advantages:

  • Parallelism. By using explicit edges to represent dependencies between operations, it is easy for the system to identify operations that can execute in parallel.
  • Distributed execution. By using explicit edges to represent the values that flow between operations, it is possible for TensorFlow to partition your program across multiple devices (CPUs, GPUs, and TPUs) attached to different machines. TensorFlow inserts the necessary communication and coordination between devices.
  • Compilation. TensorFlow’s XLA compiler can use the information in your dataflow graph to generate faster code, for example, by fusing together adjacent operations.
  • Portability. The dataflow graph is a language-independent representation of the code in your model. You can build a dataflow graph in Python, store it in a SavedModel, and restore it in a C++ program for low-latency inference.
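To make the parallelism point concrete, here is a minimal sketch in plain Python (not the TensorFlow API) of how explicit dependency edges let an engine find operations that could run at the same time. The function `parallel_stages` and the example graph are hypothetical names I made up for illustration; a real engine is far more involved.

```python
def parallel_stages(deps):
    """Group operations into stages of mutually independent ops.

    `deps` maps each op name to the names of the ops whose outputs it
    consumes. Ops in the same stage do not depend on one another, so an
    engine could, in principle, execute them in parallel.
    """
    remaining = {op: set(inputs) for op, inputs in deps.items()}
    stages = []
    while remaining:
        # An op is ready once all of its inputs have been produced.
        ready = sorted(op for op, inputs in remaining.items() if not inputs)
        if not ready:
            raise ValueError("cycle or undeclared input: not a valid dataflow graph")
        stages.append(ready)
        for op in ready:
            del remaining[op]
        for inputs in remaining.values():
            inputs.difference_update(ready)
    return stages


# Hypothetical graph for y = (a @ b) + (c @ d)
graph = {
    "a": [], "b": [], "c": [], "d": [],
    "matmul_ab": ["a", "b"],
    "matmul_cd": ["c", "d"],
    "add": ["matmul_ab", "matmul_cd"],
}
stages = parallel_stages(graph)
for i, stage in enumerate(stages):
    print(i, stage)
```

In this sketch the two matrix multiplications land in the same stage because no edge connects them, which is the kind of information an imperative program only reveals as it runs.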

Portability makes sense; I’m more interested in the performance aspect. It seems that doing parallel and distributed computations doesn’t require graph execution, since PyTorch does it too. I assumed that creating a graph enabled the engine to do optimizations that could not be done with an imperative program.

But then I read about TensorFlow’s upcoming eager mode, which basically does away with the graph API and lets us use imperative programming like in other libraries. The documentation for eager mode suggests that it approaches and could reach the performance of graph mode:

For compute-heavy models, such as ResNet50 training on a GPU, eager execution performance is comparable to graph execution. But this gap grows larger for models with less computation and there is work to be done for optimizing hot code paths for models with lots of small operations.

One thing I didn’t find is distributed training with eager mode, but as mentioned earlier, other imperative libraries seem to offer that despite not using graphs.

I’m not sure what to make of this. Does graph execution have a performance advantage over imperative programming after all?

Multithreaded parameterized programs with superexponential shared memory size in the number of threads?

In this question, a program means a parameterized multithreaded program with the interleaving semantics, a finite number of per-thread states (which may depend on the number of threads), and a finite number of shared states (which may depend on the number of threads).

A shared state is a valuation of the shared memory, and a per-thread (in other terminology, local) state is a valuation of the thread-local memory (we assume no stack). Interleaving semantics means that the actions of the threads are interleaved on a single processor, and that a thread has read-write access to the shared memory and to its own local memory, but no access to the local memories of the other threads. Parameterized means that we consider a family of programs generated from a finite-description template such that the $n$th member of the family has $n$ threads (which typically coincide up to the thread identifier).

To the best of my knowledge, for such a program, the size of the shared state-space is anywhere between constant (e.g., for a single boolean lock variable) and exponential (e.g., Peterson mutual exclusion protocol) in the number of threads $n$.
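To illustrate what I mean by the size of the shared state-space, here is a minimal Python sketch that counts shared valuations as the product of the domain sizes of the shared variables. The helpers `shared_state_count`, `single_lock` and `flag_array` are hypothetical names, and the flag array is only a stand-in for a template with one shared boolean per thread, not a faithful model of Peterson’s protocol.

```python
from math import prod


def shared_state_count(domain_sizes):
    """Number of valuations of shared memory, given each variable's domain size."""
    return prod(domain_sizes)


def single_lock(n):
    # One shared boolean lock, whatever the number of threads n.
    return shared_state_count([2])


def flag_array(n):
    # Assumed template: one shared boolean flag per thread.
    return shared_state_count([2] * n)


for n in (1, 2, 4, 8):
    print(n, single_lock(n), flag_array(n))
```

The first count stays constant in the number of threads, while the second doubles with every additional thread.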

Is there any well-known academic program in which the size of the shared state-space grows superexponentially in $n$?

If the Intel Pentium processor had not been made compatible with programs written for its predecessor, could it have been designed to be a faster processor?

I found this question while working through a question bank for a government job exam. If someone could provide the answer along with a short explanation, it would be very helpful.

Question: If the Intel Pentium processor had not been made compatible with programs written for its predecessor, it could have been designed to be a faster processor.

  1. The statement is true
  2. The statement is false
  3. The speed cannot be predicted
  4. Speed has nothing to do with the compatibility

(I did not find a tag such as microprocessor, so I had to file this under the computer architecture tag. Sorry for that, but I do not have sufficient reputation to create a tag.)

How can I optimize a program’s performance when no profiling tools are available?

I am currently working on an OpenGL program whose performance I would like to improve. The performance is okay but not ideal on powerful dedicated GPUs, and abysmal on integrated graphics (< 10 fps). In a normal program (CPU-based, no OpenGL or other GPU API), I would run a profiler (perhaps the one built into CLion) on the program, see where most of the time is spent, and then either work on a better algorithm for those areas or find a way to reduce how often that code is called.

Using this technique on my OpenGL program shows that the vast majority of the program’s time (~86%) on its main thread (the one that I want to optimize) is spent in the OpenGL driver’s .so file. Additionally, the CPU usage of the program while it is running is very low, but the GPU usage hovers between 95% and 100%. Taken together, these pieces of information tell me that the bottleneck is in the GPU, so that is where I should optimize.

This is where a problem occurs: my normal technique of using a profiler to guide my optimizations won’t work without a GPU-specific profiler. As such, I did some research to find a profiler that will tell me where GPU processing time is being spent. I could not find anything remotely usable. Everything was Windows-only (I run Linux exclusively, and my program isn’t ported to Windows yet, nor will it be until it is much further along), no longer updated, and/or far more expensive than this project’s budget allows.

As such, I ask: how can I optimize my program’s performance when the relevant profiler does not exist? I tried guessing where the issues are and optimizing accordingly, but it made no difference whatsoever, even though I was able to confirm that my optimization (frustum culling) cut the GPU’s work roughly in half. A good answer will give some profiling technique that is applicable to OpenGL on Linux, or a technique that works without a profiler.

Pre-compile programs for Raspberry Pi

I hope I’m not wrong to ask this here. The RPi often crashes when I try to build (make) a program or library on it. I think it’s a RAM problem, and I could just add a swap partition. While that is a possible solution, I also thought about pre-compiling the program on a (virtual) machine and then loading it onto the SD card for the RPi. How exactly would one do this? Or is there an even better approach? Thank you all very much for reading and for your time!

Ubuntu 19.04 installed into WSL “Windows Subsystem for Linux” on a Win10 host – How? Will it run graphics-intensive programs?

The current Windows Store app “Ubuntu 18.04LTS” is just a “Terminal” version and is less than 2 GB. It does not run graphics-intensive (OpenGL) applications like OpenCPN, and it does not have a graphical interface.

Is there a way to “upgrade” to the full version of Ubuntu 18.04LTS or install Ubuntu 19.04 on WSL? Will everything work?

See:

Windows Subsystem for Linux Documentation

There’s more to WSL than Ubuntu – Look at Rolling your own

Must Read: Microsoft Put a Real Linux Kernel Inside Windows 10

It appears that this June there will be a new Windows Store app for Ubuntu 19.04 using WSL2. However, I believe this Store app is just for the “Windows Terminal” version and not the full version. Might just going to a VirtualBox install be much easier?

Will I be able to install a full version of Ubuntu 18.04LTS or preferably Ubuntu 19.04 and run OpenCPN in WSL2 this June when Ubuntu 19.04 is available on the Windows Store?