Ask HN: How do we get an optimal neural network architecture?

I've been playing around with neural networks and looking at some of the recent LLM architectures. I'm far from an expert in this, but the way some of them are configured seems a little arbitrary: stack this layer on top of that layer with these dimensions, and we don't really know why that configuration works well, it just does. Do we have a way of proving, ahead of training, that one architecture will be better than another? Or is it just stack more layers, tweak some parameters, train the model, and check the benchmarks (much like how the brain was architected)? In that case, does whoever has the most compute and can run the most trials to find the optimal architecture win?
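For what it's worth, the "tweak, train, check benchmarks" loop you describe is basically what neural architecture search does in its simplest form. Here's a minimal, purely illustrative sketch of random search over a hypothetical search space; the `evaluate` function is a made-up stand-in for the expensive train-and-benchmark step, not a real training run:

```python
import random

# Hypothetical search space: each architecture is a (num_layers, hidden_dim) choice.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8, 16],
    "hidden_dim": [128, 256, 512],
}

def sample_architecture(rng):
    """Sample one random configuration from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for 'train the model and check the benchmarks'.
    In practice this is the expensive step (a full or proxy training run);
    here it's an invented score that mildly rewards depth and width."""
    return arch["num_layers"] * 0.1 + arch["hidden_dim"] / 1000

def random_search(trials, seed=0):
    """Run the tweak/train/benchmark loop and keep the best result seen."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

best, score = random_search(trials=20)
print(best, score)
```

Fancier methods (evolutionary search, Bayesian optimization, differentiable NAS) replace the random sampling with something smarter, but the core loop, and its dependence on how many trials you can afford, is the same, which is why compute matters so much.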

1 point | by fbodz 12 hours ago

0 comments