There are essentially three problems with these AI methods: Code, Data, and Compute.
Code, in my opinion, would be the least of the problems. With modern deep learning libraries, even complex models can be implemented relatively easily. Much more problematic is that these models require a huge amount of computation and GPU memory, and my not-so-recent-anymore GPU isn't really up to the task. If you look at e.g. the computation time estimates for StyleGAN on a single GPU, they are measured in weeks even for the smaller models. And finally, maybe the largest problem: these deep-learning models require insane amounts of training data. I guess for something like animating a face you don't need to do too much manual annotation on the videos you feed in, but you'd still probably need a few thousand short video clips of facial expressions.
The first step of using any sort of AI in WM, IMO, would be to use it to speed up pack creation. In principle it should be possible to automatically classify a folder of input images. I've done some quick tests with that, and the results weren't particularly good. I think part of the problem is that the task isn't well specified and the training data is really noisy, e.g. which tag an image file got might depend on which other tags already had enough images. Still, even an imperfect model would be quite a step forward for pack making: when tagging images in a first pass, the pack maker would just have to check whether the neural network's guesses are correct, and then only do manual work on the (maybe 30% or so) remaining images. A rough idea of what that first pass could look like is sketched below.
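Just to illustrate what I mean by a first-pass tagger, here's a minimal sketch in PyTorch: a pretrained ResNet with its last layer swapped out for multi-label tag prediction. The tag list, the `pack_images` folder, the threshold and the fine-tuned weights file are all placeholders I made up, not anything WM actually uses.

```python
# Minimal sketch of a first-pass tagger: a pretrained ResNet whose final layer
# is replaced for multi-label pack tags. The tag vocabulary, paths, threshold
# and fine-tuned weights below are hypothetical placeholders.
from pathlib import Path

import torch
from PIL import Image
from torchvision import models, transforms

TAGS = ["smile", "angry", "closeup", "fullbody"]  # hypothetical tag vocabulary
THRESHOLD = 0.5  # tags scored above this get suggested to the pack maker

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Linear(model.fc.in_features, len(TAGS))
# model.load_state_dict(torch.load("tagger.pt"))  # hypothetical fine-tuned weights
model.eval()

def suggest_tags(image_path: Path) -> list[str]:
    """Return the tags the network thinks apply; the pack maker only verifies."""
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
        probs = torch.sigmoid(logits).squeeze(0)  # multi-label: sigmoid, not softmax
    return [tag for tag, p in zip(TAGS, probs) if p >= THRESHOLD]

# Dump the network's guesses for the whole folder, ready for a manual check.
for path in sorted(Path("pack_images").glob("*.jpg")):
    print(path.name, suggest_tags(path))
```

The point of the sigmoid (instead of a softmax) is that an image can carry several tags at once, which matches how packs are tagged; the pack maker would then just confirm or fix each suggested line instead of starting from scratch.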