audio coming right off the TDM phone network will be 8-bit samples, 8 kHz sampling rate, companded. Doing anything else requires direct IP peering (e.g. with a cellular carrier) which is out of reach of your typical inexpensive VoIP trunking provider.
They come from my VoIP provider as 16-bit 8 kHz WAV files. I crunch them down to MP3s using ffmpeg before uploading. I don't get any say in the source quality, but I'm quite pleased with it as well.
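For anyone wanting to script a WAV-to-MP3 step like this, here is a rough sketch in Python. It is not the poster's actual command: the 32k bitrate, the file names, and the helper names `build_ffmpeg_cmd` and `wav_to_mp3` are all assumptions for illustration, and it presumes an ffmpeg build with libmp3lame on the PATH.

```python
import shutil
import subprocess


def build_ffmpeg_cmd(src, dst, bitrate="32k"):
    """Build an ffmpeg argument list that encodes src (WAV) to dst (MP3).

    The 32k default bitrate is an assumed value, not taken from the thread.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # input file
        "-codec:a", "libmp3lame",  # encode the audio stream with LAME
        "-b:a", bitrate,           # target audio bitrate
        dst,
    ]


def wav_to_mp3(src, dst, bitrate="32k"):
    """Run ffmpeg on one file; raises if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst, bitrate), check=True)
```

Calling `wav_to_mp3("call.wav", "call.mp3")` would then do one conversion. With 8 kHz source audio there is nothing above 4 kHz to preserve, so a low bitrate is usually plenty.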
On the telephone network it was probably µ-law (https://en.wikipedia.org/wiki/%CE%9C-law_algorithm) encoded 8-bit 8 kHz data. This is what the digital backbone behind analog landlines used, and what VoIP for landlines uses now (that's North America and Japan; most other countries use A-law, which is very similar). It's a logarithmic encoding that gives better quality than 8-bit linear PCM while still using only 8 bits per sample. Converting µ-law to plain linear samples without quality loss takes 14 bits (13 for A-law), so in practice you need 16-bit, which is presumably why that's what comes out of your provider. (µ-law encoded WAV files are also a thing.)
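To make the "8 bits in, 16 bits out" point concrete, here is a minimal sketch of µ-law expansion in Python. It follows the commonly used G.711 reference-style decode (sign bit, 3-bit segment, 4-bit mantissa, bias of 0x84) and scales the result to the 16-bit range; the function name `ulaw_to_linear` is just an illustrative choice.

```python
def ulaw_to_linear(code):
    """Expand one 8-bit mu-law code (0..255) to a signed linear sample.

    Output is scaled to the 16-bit range, peaking at +/-32124.
    """
    u = ~code & 0xFF                # mu-law bytes are stored bit-inverted
    sign = u & 0x80                 # top bit: set means negative
    exponent = (u >> 4) & 0x07      # 3-bit segment number
    mantissa = u & 0x0F             # 4-bit step within the segment
    # Rebuild the biased magnitude, then remove the 0x84 bias.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Small codes near zero are finely spaced, large ones coarsely spaced.
levels = sorted({ulaw_to_linear(c) for c in range(256)})
```

The 256 codes map to 255 distinct levels (there is a positive and a negative zero), spaced 8 apart near silence and 1024 apart near full scale at this scaling. That uneven spacing is where the extra perceived quality over 8-bit linear samples comes from.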
> but I'm quite pleased with it as well.
Have to say that, despite the filtering of high frequencies making it sound, well, like a telephone... µ/A-law data sounds pretty good. Much better than GSM and most low-bandwidth codecs, especially when the encoded sound isn't just pure voice but also includes the background noise that comes with it.