ldang edited this page Jan 23, 2014 · 26 revisions

Goal

Develop software/tools to run on the A/V capture computer to handle audio/video streaming and archiving for the Southern California Linux Expo (SCaLE). We are investigating the use of GStreamer. (See GStreamer notes)

At SCaLE, we will have a PTZ (Pan-Tilt-Zoom) dome camera in various conference rooms, streaming video via RTSP (Real Time Streaming Protocol). Audio will be hooked up from handheld or beltpack microphones into the camera. We may also have scan converters and video encoders to capture what is being projected on the screen.

Goals

  • Record video from the cameras and slides (in sync) for offline editing and later upload to YouTube. This is the basic capability.
  • Ability to display SCALE slides between sessions and support streaming of live video from popular sessions to overflow rooms when necessary.
  • Allow volunteers to do "live switching", which is basically to switch between slides and speaker and picture-in-picture for the live stream.
  • Support live streaming of video to Youtube and/or Ooyala. NOTE: this requires streaming lower quality video (360p or 480p) than the desired recording quality (1080p). The camera supports concurrent streams of differing qualities, but the audio will need to be repackaged and transcoded. Last year, they used Wowza Remote.
  • Cut the video at SCALE, so it can be uploaded to Youtube sooner.

Technical Details

Dealing with ulaw audio from the cameras

Due to the odd sampling rate and format of the audio output from the camera, we would need to filter the RTSP stream:

  • Keep the video stream as-is.
  • Upsample the audio and re-encode it as 48 kHz AAC-LC
  • Multiplex (Mux) the video and audio
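
The three steps above could be expressed as one GStreamer pipeline description. The sketch below only builds the description string (for `gst-launch-1.0` or `Gst.parse_launch`); it does not run anything. It rests on several assumptions that are not confirmed on this page: that the camera video is H.264 (the spec only says "MPEG", so the depayloader may need to change), that the audio arrives as RTP PCMU, that `voaacenc` is the AAC encoder, and that MPEG-TS is an acceptable container. The function name, URL, and file name are placeholders.

```python
def build_record_pipeline(rtsp_url, out_path, audio_rate=48000):
    """Return a GStreamer pipeline description string (nothing is executed)."""
    # Named source and muxer, so both branches can refer to them.
    source = f"rtspsrc location={rtsp_url} name=src"
    sink = f"mpegtsmux name=mux ! filesink location={out_path}"
    # Video branch: depayload and parse only, no re-encoding (assumes H.264).
    video = "src. ! rtph264depay ! h264parse ! queue ! mux."
    # Audio branch: decode mu-law, resample, re-encode as AAC.
    audio = (
        "src. ! rtppcmudepay ! mulawdec ! audioconvert ! audioresample "
        f"! audio/x-raw,rate={audio_rate} ! voaacenc ! aacparse ! queue ! mux."
    )
    return " ".join([source, sink, video, audio])


if __name__ == "__main__":
    # Placeholder camera URL and output file.
    print(build_record_pipeline("rtsp://camera.example/stream", "room1.ts"))
```

Keeping the description in a function makes it easy to try a different sample rate or container without retyping the whole pipeline.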

Capture slides

Capture slides in a separate stream using a scan converter and video encoder; display both the speaker and the slides using picture-in-picture

  • @omwah, from an archival point of view, thinks that we should write the video stream, the slide stream, and then the picture-in-picture to a file.
  • @mproctor13 is concerned that there is no way to sync the slides to the video. The easiest way is to put them in the same stream.
  • A possible solution is to add timecodes, but the way it's done professionally is to have an external clock on the cameras; if we do it via GStreamer, it is too late.
  • @jbermudes suggested starting and stopping the recording for each session. But @mproctor13 is concerned about start/stop, because there is a strong possibility that the operator will forget to start the recording and the video won't be captured.

@mproctor13 suggested another possible solution to handle slides that would also solve the issue of being able to display SCALE content on a projector between sessions. See below.

Display SCALE content on projector between sessions

When there is no speaker in a room, or between sessions, it would be nice to be able to display our own content instead of a blank screen.

@mproctor13: We can replace the M7001 with a USB video capture dongle and a Linux system (like our existing SCALE A/V little computers or Raspberry Pis). We could use a ~$25 KVM to switch between the speaker laptop and the recording computer, which would be displaying slides. A bonus of this solution is that every room is an overflow room.

                         |----------------|         |-----------|       |-----------------|
                         |                | s-video |           |  USB  |                 |
VGA from speaker    ---->| scan converter |---------| USB Video |-------| Slides Computer |
Audio from speaker --|   |                |         |           |       |                 |
                     |   |----------------|         |-----------|       |-----------------|
                     |           |
                     |           |
                     ----|----------------|       |-----------|
                         |                |       |           |
VGA from slides     ---->|      KVM       |-------| Projector |
Audio from slides        |                |       |           |
                         |----------------|       |-----------|
                                 |
                                 |
                         |----------------|
                         |                |
                         |  Sound system  |
                         |                |
                         |----------------|

This opens up new items to investigate:

  • How do you control the KVM? We either need a human to flip the switch, or make it possible for the computer to control the KVM.

@mproctor13: "I was assuming that we would control the KVM from the computer, I think those computers have parallel ports which make for easy relay flipping and maybe bit-banging i2c to talk to edid on vga connector. Pushing the buttons on scan converter and easy i2c bus access was the original reason I thought of raspberry pi. But maybe if those computers have parallel ports that will work. Or we just detect when there is video coming from speaker laptop and automatically switch to it." (issue #9)

  • How do we make sure the speaker laptop outputs at the correct resolution for the scan converter? This might involve either hacking VGA cables or the scan converter, to handle the resolution negotiation. Resolution: We discovered that the scan converter does seem to manage the resolution negotiation well, and on our side, what we can do is write a note specifying 1024x768 @ 60Hz.

  • Will the Raspberry Pi work for our purposes? Does the software work? Is it stable? Can it keep up? (See Raspberry Pi notes) Resolution: No, the Raspberry Pi can't keep up. See issue #7. There is thought of trying other small computers, but given the time constraints, this has been tabled for next year.

  • Do the SCALE A/V computers from last year have parallel ports we can use to control scan converter and Raspberry Pi?

As an added bonus, provide a web interface, where we can decide whether to stream video from another room or just display slides. See subsequent requirement.

Support overflow rooms

There are times when a presentation is standing room only. Last year, we were able to support overflow to empty conference rooms by streaming the video via VLC on one of the SCALE A/V computers. (We used a bash script that invoked cvlc.)
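
As a rough illustration of the cvlc approach, the sketch below builds the command line for playing another room's stream fullscreen. The room names, URLs, and function name are hypothetical; only `cvlc` and its `--fullscreen` option are real, and last year's actual script may have looked quite different.

```python
import shlex

# Hypothetical room-to-stream mapping; real camera URLs would come from configuration.
ROOM_STREAMS = {
    "room-a": "rtsp://camera-a.example/stream",
    "room-b": "rtsp://camera-b.example/stream",
}


def overflow_command(source_room, streams=ROOM_STREAMS):
    """Build the cvlc argument list that plays source_room's stream fullscreen."""
    if source_room not in streams:
        raise KeyError(f"no stream configured for {source_room!r}")
    return ["cvlc", "--fullscreen", streams[source_room]]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(overflow_command("room-a")))
```

Returning an argument list keeps the command usable with `subprocess.run` without going through a shell.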

Provide a basic web interface for people to control the slide computer (See issue #12)

It would be nice to offer people the ability to vote on what session to view within a particular room. @LeStarch is interested in this. (See issue #11)

Adjust/boost the volume on the audio

  • An issue with the SCaLE videos on YouTube is that the audio seems to be quite a bit lower than other videos on YouTube, maybe 50% quieter or more.
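
To get a feel for how much boost "50% quieter" implies, here is a small worked calculation. It assumes "50%" refers to signal amplitude, which is only one possible reading of the observation above; perceived loudness does not map this directly. The function names are illustrative.

```python
import math


def linear_gain(amplitude_ratio):
    """Multiplier that restores audio sitting at amplitude_ratio of the target level."""
    if amplitude_ratio <= 0:
        raise ValueError("amplitude_ratio must be positive")
    return 1 / amplitude_ratio


def gain_db(amplitude_ratio):
    """The same gain expressed in decibels: 20 * log10(linear gain)."""
    return 20 * math.log10(linear_gain(amplitude_ratio))


if __name__ == "__main__":
    # Audio at half the target amplitude needs a 2x linear gain, about +6 dB.
    print(linear_gain(0.5), round(gain_db(0.5), 2))
```

GStreamer's `volume` element takes a linear factor like the first value. Any boost should be checked against clipping on the loudest passages.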

Control camera and "live switching"

It would be nice to have camera operators for each room.

  • Control the cameras through a web interface (see issue #10)
  • Use gst-switch and speaker-track to do "live switching" for the live stream. (See issue #3) This is to switch between speaker and slides and picture-in-picture.

Cut video at SCaLE

To cut down on the time and workload involved in getting the videos ready to be posted online, we should figure out a way to cut the video at SCaLE. A way to partly automate this is:

  • Develop a mobile app that can be used to just record the start/stop times for prepopulated sessions in various conference rooms. (Ideally, the timestamp comes from the video capture machine.)
  • Use the output from above to automate a solution where ffmpeg can be used to slice up the room recording into appropriately named session files.
  • Have volunteers check over the video and/or do further editing before uploading.
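
The slicing step could look roughly like the sketch below, which turns a list of recorded start/stop offsets into ffmpeg commands. The session tuple format, the file naming, and the function names are assumptions for illustration; the ffmpeg options used (`-i`, `-ss`, `-to`, `-c copy`) are standard.

```python
import os
import re


def slugify(title):
    """Turn a session title into a safe file-name stem."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def cut_commands(recording, sessions):
    """Yield one ffmpeg argument list per (title, start, end) session.

    start and end are offsets into the room recording, as HH:MM:SS strings.
    The output keeps the recording's container so streams can be copied as-is.
    """
    extension = os.path.splitext(recording)[1]
    for title, start, end in sessions:
        output = slugify(title) + extension
        yield ["ffmpeg", "-i", recording, "-ss", start, "-to", end, "-c", "copy", output]


if __name__ == "__main__":
    # Hypothetical session list, as the mobile app might export it.
    sessions = [("Intro to GStreamer!", "00:05:00", "00:55:00")]
    for command in cut_commands("room1.ts", sessions):
        print(" ".join(command))
```

Stream copy avoids re-encoding, but cuts then fall on keyframes, so the boundaries may be off by a moment. That seems acceptable if volunteers review the files anyway.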

Stress/Load Testing

  • Can the video capture machine handle the load of managing all the video streams?
  • Will the video capture machine hold up throughout the day, i.e., will processes start crashing at some point?

Hardware Specifications

PTZ cameras:

  • Samsung SNP6200H/SNP6200 outputs MPEG over an RTSP stream with 8 kHz (8000 Hz) mu-law audio. It supports concurrent streams at differing qualities.
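
For background on the audio format: mu-law at 8 kHz is presumably G.711 PCMU, where each byte is one compressed sample. The sketch below shows the standard G.711 mu-law expansion to 16-bit linear PCM, plus the arithmetic for the stream's bitrate and the factor needed to reach 48 kHz. In practice GStreamer's `mulawdec` would do the decoding; this is only to show what is inside the stream.

```python
def ulaw_to_pcm16(byte):
    """Decode one G.711 mu-law byte to a signed 16-bit linear PCM sample."""
    byte = ~byte & 0xFF               # mu-law bytes are stored bit-inverted
    sign = byte & 0x80                # top bit: set means negative
    exponent = (byte >> 4) & 0x07     # 3-bit segment number
    mantissa = byte & 0x0F            # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


SAMPLE_RATE_HZ = 8000
BITRATE_BPS = SAMPLE_RATE_HZ * 8               # one byte per sample: 64000 bit/s
UPSAMPLE_FACTOR = 48000 // SAMPLE_RATE_HZ      # 6 output samples per input sample

if __name__ == "__main__":
    print(ulaw_to_pcm16(0xFF), ulaw_to_pcm16(0x80), ulaw_to_pcm16(0x00))
```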

@mproctor13 thinks that when we read RTSP streams using a standard library (most are based on Live555), we get separate audio and video streams.

There are firmware updates for the camera, which may have an impact on this software development. (See Firmware update notes)

For the scan converter and video encoder:

  • Generic PC-to-TV converter has VGA input + loop-thru; it has S-Video/RGB output ports. There is a front panel which allows control of pan/zoom via on-screen display. The supported resolutions/refresh rates: 640x480 @ 60/72/75/85 Hz, 800x600 @ 60/75 Hz, 1024x768 @ 60/75 Hz, and 1280x1024 @ 60 Hz.

  • Axis M7001 video encoder (Amazon link) can stream both H.264 and MJPEG over RTSP. It can control PTZ for certain PTZ devices, and it can be powered over Ethernet (PoE) or by a DC adapter.

Youtube Specifications

This is what Youtube requires for live streaming:

  • Protocol: RTMP Flash Streaming
  • Video codec: H.264, Main 4.1 (surprisingly, they won't accept WebM)
  • Frame rate: 30 fps
  • Keyframe frequency: 2 seconds
  • Audio codec: AAC-LC (Audio required)
  • Audio sample rate: 44.1 kHz
  • Audio bitrate: 128 kbps stereo
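
Some of these requirements translate directly into encoder settings. The sketch below derives the keyframe interval in frames (30 fps × 2 s = 60) and builds GStreamer element fragments for the video and audio encoders. Using `x264enc` and `voaacenc` here is an assumption, the H.264 profile/level requirement is not handled, and the fragments are not a complete pipeline.

```python
# Values copied from the requirements list above.
YOUTUBE_LIVE = {
    "fps": 30,
    "keyframe_seconds": 2,
    "audio_rate_hz": 44100,
    "audio_channels": 2,
    "audio_bitrate_bps": 128000,
}


def encoder_fragments(spec=YOUTUBE_LIVE):
    """Return (video, audio) GStreamer fragments matching the spec."""
    # One keyframe every keyframe_seconds means a GOP of fps * seconds frames.
    gop = spec["fps"] * spec["keyframe_seconds"]
    video = f"x264enc key-int-max={gop}"
    audio = (
        "audioconvert ! audioresample "
        f"! audio/x-raw,rate={spec['audio_rate_hz']},channels={spec['audio_channels']} "
        f"! voaacenc bitrate={spec['audio_bitrate_bps']}"
    )
    return video, audio


if __name__ == "__main__":
    for fragment in encoder_fragments():
        print(fragment)
```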

Notes from @irabinovitch:

  • Youtube can do transcoding for us on the fly. So if we push higher bit rates they'll create versions as needed for lower bandwidth environments and for mobile.
  • I don't believe our cameras support 44.1 kHz audio; we'll need to take care of that in whatever tooling we use to transcode.
  • Each live stream can last up to 4 hours, so we'll need to create a "new" event every 4 hours. This may mean needing to update the embed code on the site throughout the day.
  • If we want to do the video editing / clipping in Youtube, we'll need to break our "live events" into 2-hour segments. Youtube doesn't let you edit anything
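
To see how many live events a day would need under these limits, here is a small scheduling sketch. The conference hours and date in the example are made up, and the function name is illustrative.

```python
from datetime import datetime, timedelta


def split_events(day_start, day_end, max_hours=4):
    """Split [day_start, day_end) into consecutive events of at most max_hours."""
    if day_end <= day_start:
        raise ValueError("day_end must be after day_start")
    step = timedelta(hours=max_hours)
    events = []
    start = day_start
    while start < day_end:
        end = min(start + step, day_end)   # the last event may be shorter
        events.append((start, end))
        start = end
    return events


if __name__ == "__main__":
    # Hypothetical 09:00-18:00 conference day.
    start = datetime(2014, 2, 22, 9, 0)
    end = datetime(2014, 2, 22, 18, 0)
    for limit in (4, 2):
        print(limit, "hour limit:", len(split_events(start, end, limit)), "events per room")
```

For that example day, the 4-hour limit gives 3 events per room and the 2-hour limit gives 5, which matters for how often the embed code would need updating.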

Our approach

We will investigate using GStreamer to capture video. We primarily need to figure out the GStreamer pipeline to do what we need, and then we need to learn how to work with GStreamer programmatically.

  1. Capture from RTSP
  2. Save the stream to file (record)
  3. Deal with audio -- (resample and re-encode)
  4. Multiplex the video stream with reencoded audio stream
  5. Put into appropriate transport stream

System Requirement (to make sure everyone is using same version)

We are working with the Ubuntu distro, probably 12.04.

For GStreamer, we are starting with the GStreamer developer PPA to get 1.0. But if we go with gst-switch, there may be some other things to install out of git. Standardization is a work in progress.

@irabinovitch asked about support for AAC encoding since faac is not available. @mproctor13 said that voaacenc seems to work.

We are not sure yet whether we need to work with GStreamer programmatically; if we do, it might be good to use Python 3, as there are GST bindings.
