I found out that this is called muxing, but I cannot find out how to do it.
I have two separate streams, one video and one audio. The video stream comes from an Axis IP camera (H.264) and the audio stream comes from a Barix Instreamer (mpga). What I want is a single stream that combines the video from the IP camera with the audio from the audio encoder.
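To make the goal concrete, here is a rough sketch of the kind of muxing step I have in mind, assuming a tool such as ffmpeg can read both sources. The two URLs and the output name are placeholders, not the real addresses of my devices, and the helper name `build_mux_command` is made up for illustration. It only builds the command line: take video from the first input and audio from the second, and copy both into one MPEG-TS stream without re-encoding.

```python
import shlex


def build_mux_command(video_url, audio_url, output_path):
    """Build an ffmpeg command that muxes one video and one audio input.

    Hypothetical helper: the URLs passed in are assumptions and must be
    replaced with the real camera and encoder addresses.
    """
    return [
        "ffmpeg",
        "-i", video_url,   # input 0: H.264 video from the IP camera
        "-i", audio_url,   # input 1: MPEG audio from the audio encoder
        "-map", "0:v:0",   # take the first video stream of input 0
        "-map", "1:a:0",   # take the first audio stream of input 1
        "-c", "copy",      # copy both streams as-is, no re-encoding
        "-f", "mpegts",    # write a single MPEG transport stream
        output_path,
    ]


if __name__ == "__main__":
    # Placeholder addresses, for illustration only.
    cmd = build_mux_command(
        "rtsp://camera.example/video",
        "http://encoder.example/audio",
        "combined.ts",
    )
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(cmd))
```

Whether the two sources stay in sync when combined like this is something I have not verified; the sketch only shows the shape of the result I am after.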