I don’t know if this was a one-time event, in which case my suggestion is too little, too late, but you could shoot multiple frames from a tripod. People will move in and out of the frames, which is fine. Then load all the images into Photoshop as layers, use Auto-Align Layers, and blend them with the Median stack mode.
That goes pixel by pixel, compares each pixel’s value with the values at the same position in the other layers, and picks the median. As long as any given spot is clear of people in most of the frames, the final image should consist of only the pixels that are supposed to be there.
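If you want to see the idea outside Photoshop, here is a rough sketch of the same per-pixel median in Python with NumPy. This is not how Photoshop implements it, just the basic technique. It assumes the frames are already aligned and all the same size, and `median_stack` is a name I made up for illustration. The demo uses tiny synthetic frames instead of real photos.

```python
import numpy as np

def median_stack(frames):
    """Per-pixel median across frames (hypothetical helper name).

    Assumes every frame is already aligned and has the same shape and dtype.
    """
    stack = np.stack(frames, axis=0)          # shape: (n_frames, height, width, channels)
    # np.median returns floats; cast back to the input dtype.
    # With an even number of frames the median averages the two middle
    # values, so the cast may truncate. An odd frame count avoids that.
    return np.median(stack, axis=0).astype(stack.dtype)

# Synthetic demo: a flat gray "background" shot five times.
background = np.full((4, 4, 3), 100, dtype=np.uint8)
frames = [background.copy() for _ in range(5)]
frames[0][1, 1] = 255   # a "person" covering one pixel in frame 0
frames[3][2, 2] = 0     # a different "person" in frame 3

result = median_stack(frames)
```

Each "person" pixel shows up in only one of the five frames, so the median at that spot is still the background value. With real photos you would load the frames into arrays with whatever image library you like and align them first.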
There are a lot of examples, but here is a quick one: https://fotographee.com/remove-people-images/