Why don’t they use all kinds of non-linear functions in Neural Network Activation Functions? [duplicate]

Pardon my ignorance, but after just learning about the sigmoid and tanh activation functions (and a few others), I am wondering why they choose functions that always go up and to the right. Why not use all kinds of crazy functions: ones that fluctuate up and down, ones that are directed down instead of up, etc.? What if you used functions like those in your neurons? What is the problem, and why isn’t it done? Why do they stick to very primitive, very simple functions?

Why isn’t there just one “keystone” activation function in Neural Networks?

This article says the following:

Deciding between the sigmoid or tanh will depend on your requirement of gradient strength.
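To make "gradient strength" in the quote a bit more concrete, here is a minimal sketch (standard library only; the helper names are my own and not from the article) that compares the derivatives of the two functions. The sigmoid's derivative peaks at 0.25 and tanh's at 1.0, both at x = 0, and both shrink toward zero as |x| grows.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t

if __name__ == "__main__":
    # Print both gradients at a few sample points to compare them.
    for x in (0.0, 1.0, 2.0, 4.0):
        print(f"x={x:4.1f}  sigmoid'={sigmoid_grad(x):.4f}  tanh'={tanh_grad(x):.4f}")
```

Near zero, tanh passes back a gradient up to four times larger than the sigmoid does, which is presumably the kind of difference the quote has in mind.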

I have seen (so far in my learning) 7 activation functions/curves. Each one seems to be building on the last. But then like the quote above, I have read in many places essentially that "based on your requirements, select your activation function and tune it to your specific use case".

This doesn’t seem scalable. From an engineering perspective, a human has to come in and tinker around with each neural network to find the right or optimal activation function, which seems like it would take a lot of time and effort. I’ve seen papers which seem to describe people working on automatically finding the "best" activation function for a particular data set too. From an abstraction standpoint, it’s like writing code to handle each user individually on a website, independently of the others, rather than just writing one user authentication system that works for everyone (as an analogy).

What all these papers/articles are missing is an explanation of why. Why can’t you just have one activation function that works optimally in all cases? Then engineers wouldn’t have to tinker with each new dataset and neural network; they would just create one generalized neural network, and it would work well for all the common tasks that today’s and tomorrow’s neural networks are applied to. If someone finds a more optimal one, then that would be beneficial, but until the next optimal one is found, why can’t you just use one neural network activation function for all situations? I am missing this key piece of information from my current readings.

What are some examples of why it’s not possible to have a keystone activation function?

Is it possible to train a neural network to solve NP-complete problems?

I’m sorry if the question is not relevant; I have tried to search for articles about it, but I couldn’t find satisfying answers.

I’m starting to learn about machine learning, neural networks, etc., and I was wondering whether a neural network that takes a graph as input and outputs the answer to an NP-complete problem (e.g. whether the graph is Hamiltonian, whether the graph has an independent set of size greater than k, and other decision problems) would work.
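For concreteness, the input/output mapping such a network would have to learn for the independent-set example can be sketched as a brute-force decision procedure. This is an illustrative sketch only: the function name and the graph encoding (vertices 0..n-1 plus an edge list) are my own assumptions, it checks for a set of size at least k (call it with k + 1 for "greater than k"), and it takes exponential time in the worst case.

```python
from itertools import combinations

def has_independent_set(n, edges, k):
    """Return True if the graph on vertices 0..n-1 has an independent set of size >= k.

    An independent set is a set of vertices with no edge between any two of them.
    Checking subsets of size exactly k suffices, because every subset of an
    independent set is itself independent.
    """
    edge_set = {frozenset(e) for e in edges}
    for subset in combinations(range(n), k):
        # Accept the subset if none of its vertex pairs is an edge.
        if all(frozenset(pair) not in edge_set for pair in combinations(subset, 2)):
            return True
    return False

if __name__ == "__main__":
    four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(has_independent_set(4, four_cycle, 2))
    print(has_independent_set(4, four_cycle, 3))
```

A trained network would be asked to reproduce this yes/no answer directly from the graph, without enumerating subsets.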

I haven’t heard of any NP-complete problems being solved like this, so I guess it does not work. Are there theoretical results stating that a neural network cannot learn an NP-complete language, or something like this?

How can Kneser-Ney Smoothing be integrated into a neural language model?

I found a paper titled Multimodal representation: Kneser-Ney Smoothing/Skip-Gram based neural language model. I am curious about how the Kneser-Ney Smoothing technique can be integrated into a feed-forward neural language model with one linear hidden layer and a softmax activation. What is the purpose of the Kneser-Ney in such a neural network, and how can it be used for learning the conditional probability for the next word?

Importing A Neural Network From Mathematica For Use In R

I am experimenting with platform interoperability between Mathematica and R.

My aim is to create an untrained Neural Network using Mathematica, export this network in MXNet format as a .json file, and import this network into R for a classification problem.

Creating the Network in Mathematica

Here I have created a basic neural network; this network is untrained. I have exported the network alongside the network parameters.

In Mathematica, the code is as follows.

(* NetDecoder takes a single specification list: {"Class", labels} *)
dec = NetDecoder[{"Class", {"Chronic Kidney Disease", "No Kidney Disease"}}]

net = NetInitialize@
  NetChain[
    {BatchNormalizationLayer[], LinearLayer[20], Ramp,
     DropoutLayer[0.1], LinearLayer[2], SoftmaxLayer[]},
    "Input" -> 24, "Output" -> dec]

There are 24 feature variables for the input, and the output is passed through the NetDecoder. I then export this network.

Export["net.json", net, "MXNet"] 

This produces two files: one with the network and another with the parameters. Using FilePrint, we can inspect the network file:

FilePrint["net.json"]

which returns

{
    "nodes":[
        {"op":"null","name":"Input","inputs":[]},
        {"op":"null","name":"1.Scaling","inputs":[]},
        {"op":"null","name":"1.Biases","inputs":[]},
        {"op":"null","name":"1.MovingMean","inputs":[]},
        {"op":"null","name":"1.MovingVariance","inputs":[]},
        {"op":"BatchNorm","name":"1","attrs":{"eps":"0.001","momentum":"0.9","fix_gamma":"false","use_global_stats":"false","axis":"1","cudnn_off":"0"},"inputs":[[0,0,0],[1,0,0],[2,0,0],[3,0,0],[4,0,0]]},
        {"op":"null","name":"2.Weights","inputs":[]},
        {"op":"null","name":"2.Biases","inputs":[]},
        {"op":"FullyConnected","name":"2","attrs":{"num_hidden":"20","no_bias":"False"},"inputs":[[5,0,0],[6,0,0],[7,0,0]]},
        {"op":"relu","name":"3$  0","inputs":[[8,0,0]]},
        {"op":"Dropout","name":"4$  0","attrs":{"p":"0.1","mode":"always","axes":"()"},"inputs":[[9,0,0]]},
        {"op":"null","name":"5.Weights","inputs":[]},
        {"op":"null","name":"5.Biases","inputs":[]},
        {"op":"FullyConnected","name":"5","attrs":{"num_hidden":"2","no_bias":"False"},"inputs":[[10,0,0],[11,0,0],[12,0,0]]},
        {"op":"softmax","name":"6$  0","attrs":{"axis":"1"},"inputs":[[13,0,0]]},
        {"op":"identity","name":"Output","inputs":[[14,0,0]]}
    ],
    "arg_nodes":[0,1,2,3,4,6,7,11,12],
    "heads":[[15,0,0]],
    "attrs":{
        "mxnet_version":["int",10400]
    }
}

Importing the Network into R

Now we have an untrained network as a .json file in MXNet format.

We can import this using:

library(rjson)
mydata <- fromJSON(file = "net.json")

The Problem

I’m not sure how to process the exported net in R. Is it possible to take the untrained network imported from Mathematica and train it on some data in R?

Creating Loss Ports For Multiple Output Neural Net

I am making a multi-classification neural net for a set of data. I have created the net, but I think I need to specify a loss port for each classification.

Here are the labels for the classification and the encoder & decoders.

labels = {"Dark Colour", "Light Colour", "Mixture"}
sublabels = {"Blue", "Yellow", "Mauve"}
labeldec = NetDecoder[{"Class", labels}];
sublabdec = NetDecoder[{"Class", sublabels}];
bothdec = NetDecoder[{"Class", Flatten@{labels, sublabels}}]

enc = NetEncoder[{"Class", {"Dark Colour", "Light Colour", "Mixture",
    "Blue", "Yellow", "Mauve"}}]

Here is the Net

SNNnet[inputno_, outputno_, dropoutrate_, nlayers_, class_: True] :=
 Module[{nhidden, linin, linout, bias},
  nhidden = Flatten[{Table[{(nlayers*100) - i},
      {i, 0, (nlayers*100), 100}]}];
  linin = Flatten[{inputno, nhidden[[;; -2]]}];
  linout = Flatten[{nhidden[[1 ;; -2]], outputno}];
  NetChain[
   Join[
    Table[
     NetChain[
      {BatchNormalizationLayer[],
       LinearLayer[linout[[i]], "Input" -> linin[[i]]],
       ElementwiseLayer["SELU"],
       DropoutLayer[dropoutrate]}],
     {i, Length[nhidden] - 1}],
    {LinearLayer[outputno],
     If[class, SoftmaxLayer[], Nothing]}]]]

net = NetInitialize@SNNnet[4, 6, 0.01, 8, True];

Here are the nodes that are used for the NetGraph function:

nodes = Association[
   "net" -> net,
   "l1" -> LinearLayer[3], "sm1" -> SoftmaxLayer[],
   "l2" -> LinearLayer[3], "sm2" -> SoftmaxLayer[],
   "myloss1" -> CrossEntropyLossLayer["Index", "Target" -> enc],
   "myloss2" -> CrossEntropyLossLayer["Index", "Target" -> enc]];

Here is what I want the NetGraph to do:

connectivity = {
   NetPort["Data"] -> "net" -> "l1" -> "sm1" -> NetPort["Label"],
   "sm1" -> NetPort["myloss1", "Input"],
   NetPort[sublabels] -> NetPort["myloss1", "Target"],
   "myloss1" -> NetPort["Loss1"],
   "net" -> "l2" -> "sm2" -> NetPort["Sublabel"],
   "myloss2" -> NetPort["Loss2"],
   "sm2" -> NetPort["myloss2", "Input"],
   NetPort[labels] -> NetPort["myloss2", "Target"]};

The data will diverge at “net” for each classification, pass through the subsequent linear and softmax layers, and go to the relevant NetPort. The problem I’m having is with the loss ports, which branch off at each softmax layer.

When i run this code

NetGraph[nodes, connectivity,
 "Label" -> labeldec, "Sublabel" -> sublabdec]

I receive the error message: NetGraph::invedgesrc: NetPort[{Blue,Yellow,Mauve}] is not a valid source for NetPort[{myloss1,Target}].

Could anyone tell me why this is occurring?

Thanks for reading.

Fully-connected feed-forward Neural Network

I’m working on a practice exam for my machine learning quiz; however, this practice exam does not have any solutions. While I can usually tell if my answers are right or wrong by comparing to my peers, I am unsure about this particular question:

In a fully connected feed-forward neural network, which of the following doesn’t contribute to the partial-derivative sum for a specific node during back propagation, where the node in question is at least two hidden layers deep in the network?

1) Every bias in the first hidden layer.

2) Every activation in the previous layer.

3) Every activation in the node layer.

4) The weight of every edge connecting the node to the previous layer.

5) All of the above contribute.

I think the answer might be (1), since at layer n the computation of a node is based on the activations of the nodes in the previous layer and the weights and biases of the node in the current layer.

However, intuitively, I think that the answer might be (5) (I don’t have much of a justification for this).

Could someone please tell me which line of reasoning is correct?

Approximating Deep Neural Networks (DNNs) with Binarized Neural Networks (BNNs)

I am currently working as a research intern on Binarized Neural Networks, where the weights and the activations of the network are binary. The architecture of this type of network makes it memory-efficient and computationally efficient, which makes it well suited to resource-constrained environments such as embedded devices and mobile phones.
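As a rough illustration of what "binary weights and activations" means, here is a minimal sketch of a single binarized neuron. It assumes one common formulation (values in {-1, +1} obtained with the sign function, with 0 mapped to +1); the function names are my own, and real BNN implementations differ in details such as batch normalization and how training is handled.

```python
def binarize(values):
    """Map each real value to +1 or -1 with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]

def xnor_popcount_dot(xb, wb):
    """Dot product of two +/-1 vectors computed by counting agreements.

    Each agreeing position contributes +1 and each disagreeing one -1,
    so the dot product equals 2 * matches - n.
    """
    matches = sum(1 for x, w in zip(xb, wb) if x == w)
    return 2 * matches - len(xb)

def binary_neuron(inputs, weights, bias=0.0):
    """Binarized neuron: sign of (dot product of +/-1 inputs and weights, plus bias)."""
    xb = binarize(inputs)
    wb = binarize(weights)
    pre_activation = xnor_popcount_dot(xb, wb) + bias
    return 1 if pre_activation >= 0 else -1

if __name__ == "__main__":
    x = [0.3, -1.2, 0.0, 2.5]
    w = [-0.7, -0.1, 0.9, 0.4]
    print(binarize(x), binarize(w), binary_neuron(x, w))
```

Because the dot product reduces to counting matching bits, it can be implemented with XNOR and popcount operations, which is where the memory and compute savings come from.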

The interesting part about BNNs is that we can encode a binarized network as a CNF formula (a Boolean formula). Using this formula, we can verify some properties of the network, such as robustness against adversarial examples (carefully crafted samples that look similar to usual inputs but are designed to mislead a pre-trained model). We can also extract explanations that support the network’s decisions, and hence make the neural network explainable.

Currently, I am trying to make a DNN explainable by verifying its decisions using BNNs. The first direction of research is to reduce a DNN to a BNN. Of course, the two networks should be equivalent. I am researching ways to make this reduction, but I haven’t found any work on the subject. Is it possible to carry out this transformation? Are there any techniques that can “binarize” a DNN?

Thanks 🙂