DVDMax is MGI Inc. software for playing DVDs. The current version on Windows is version 5.0. A Linux version is under development and in its final stages.
The purpose of this report is to describe the design and implementation of the DVDMax software on Linux, to help in the port of the software to the Sun Solaris operating system.
The port of the DVDMax software to Solaris will be based on the Linux version.
The following is from the statement of work for DVDMax on Solaris:
The features of the sunDVD player will be the same as the features listed in the product description of SoftDVD Max, entitled "SoftDVD Max Reviewer's Guide (July 1999)" and "MGI SoftDVD Max Specifications (July 1999)", copies of which are attached to this statement of work.
Examining the above mentioned documents, I came up with the following points.
These are the main port issues that I see right now for a successful port of the Linux-based DVDMax to Solaris.
All the above tools need to be installed on Solaris before starting to build DVDMax.
To minimize changes to the build process, and to speed the port process, gcc will be used initially.
After successful completion of the port, gcc can be replaced with the Sun cc compiler, along with any changes to the cc commands and options used in the current Makefiles.
There will be changes needed in the Makefiles in order to build SPARC assembler routines using VIS instructions. These are the modules that will replace the current Intel assembler modules. My current understanding is that the Solaris build will target UltraSPARC with v9a extensions, so the option -xarch=v9a will be used to assemble the SPARC assembler modules.
The resulting executables will run only on Solaris with 64-bit kernels on UltraSPARC hardware.
Other changes needed in the current Makefiles are those to link against the Solaris libdvd and, if needed, against the MediaLib libraries. The Makefiles that come with the examples of how to use these libraries will be used to help in making the changes.
We also need to find a way to make the build system easier to use on multiple operating systems. One possible way is to use an autoconf macro to guess the OS name and, based on the OS name, generate the correct compile and link options, and any compile-time variables such as -DSOLARIS, into the compile command, and have those automatically generated when the configure command is run. This needs to be investigated more.
On Solaris, Motif will be used instead of Qt. But with the latest announcement by Sun that they will adopt GNOME as the default desktop for future Solaris releases, it seems that it will make more sense to use GTK+ for the user interface.
There will also be an impact on the build process here for building the user interface: new Makefiles will be needed to build the user interface on Solaris, since different libraries are used than on Linux.
On Solaris a new implementation will be used. This needs to be coordinated and integrated into the Solaris DVDMax build process. My understanding is that a third party will write this software.
The Linux implementation of the display has an MMX module for doing color space conversion from YUV to RGB.
The Solaris implementation will use VIS SPARC instructions to achieve maximum performance on SPARC hardware. Hence, conversion of the Intel assembler and MMX code to SPARC assembler and VIS will be the most critical work that needs to be completed successfully.
Below, I outline in detail the current Linux design and implementation of the MPEG-2 decoder, with the interfaces to the Intel assembler, in order to help with this conversion. It is expected that Sun will do the actual conversion based on these interfaces.
udf.c is used by the DVDDisc ﬁlter and the IFO implementation.
The DVDMax software on Linux is based on individual components, called filters. Each filter has what are called pins. These are software connections that connect the input of one filter to the output of another filter.
Each filter takes some input, processes it, and sends the output to one of its output pins. A filter can have more than one output pin. For example, the demux filter has a number of output pins.
A plugin shared library is loaded at run time by the main program of DVDMax; each plugin creates one type of filter, so there is a one-to-one mapping between the plugin and the filter it creates. For example, the dvddisk.so plugin creates the filter that reads from the DVD disc and writes to the demux filter. The actual methods that implement the work of the filter are in the plugin; the filter is the data structure that is used to interface to those methods. The filter also contains pointers to other variables important for the working of the plugin, such as buffers and flags.
Each plugin contains one pthread thread that continuously runs, reading input and generating output.
Figure 1.1 on page 13 shows the architecture of DVDMax on Linux and Solaris. The grayed components indicate Intel assembler/MMX modules.
The unit of data exchanged between filters is the LMFPacket struct. Figure 1.2 on page 16 shows how an LMFPacket is sent from one of the demux filter output pins that is connected to the MPEG-2 decoder input pin.
A user program's main() starts with a call to lmf_manager.c/loadLMFPlugin(), passing it a string name of the plugin (the sharable module) to load. When this call returns, it returns a pointer to a filter object associated with this plugin.
loadLMFPlugin() uses dlopen() to load the sharable library into memory; then dlsym() is used to obtain a pointer to the function called 'initPlugin' in that module. So each plugin must have such an entry point.
initPlugin() is then called, which creates the filter object. The filter is then returned to the caller. Notice that the handle to the plugin sharable library is saved in the filter object, so it is not lost.
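The loading sequence above can be sketched in C as follows. This is a hedged illustration, not the actual lmf_manager.c source: the Filter type here is a stand-in, and the error handling is simplified.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the real Filter struct in lmf_filter.c. */
typedef struct Filter { void *pluginHandle; } Filter;
typedef Filter *(*initPluginFn)(void);

/* Sketch of what loadLMFPlugin() does: dlopen() the plugin, dlsym() its
   mandatory 'initPlugin' entry point, call it, and save the handle. */
Filter *loadLMFPlugin(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }
    initPluginFn init = (initPluginFn)dlsym(handle, "initPlugin");
    if (!init) {
        dlclose(handle);
        return NULL;
    }
    Filter *f = init();
    if (f)
        f->pluginHandle = handle;  /* saved so the handle is not lost */
    return f;
}
```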
Each plugin has an initPlugin(), which calls lmf_filter.c/createLMFFilter() to create the Filter structure.
Then another call to lmf_filter.c/initLMFFilter() is made to do any initialization of the filter struct and to set up the command table in the filter, which contains pointers to default functions in the lmf_filter.c module.
The plugin simply allocates a ﬁlter struct from the heap. The ﬁlter struct contains a table of function pointers. These functions are inside the plugin and the ﬁlter is passed around during the calls. The ﬁlter struct contains pointers to any data buﬀers used by the plugin.
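A minimal sketch of such a filter struct with its function-pointer command table is shown below. All names and fields here are illustrative assumptions; the real structure in lmf_filter.c will differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the Filter structure described above. */
typedef struct LMFFilter LMFFilter;

struct LMFFilter {
    const char *name;         /* plugin name, e.g. "dvddisk" */
    void *pluginHandle;       /* handle returned by dlopen(), saved here */
    /* command table: pointers to functions implemented inside the plugin */
    int  (*start)(LMFFilter *self);
    int  (*stop)(LMFFilter *self);
    void *buffers;            /* plugin-private data buffers */
    int   flags;
};

/* Default no-op commands, since lmf_filter.c is said to install defaults. */
static int defaultStart(LMFFilter *f) { (void)f; return 0; }
static int defaultStop (LMFFilter *f) { (void)f; return 0; }

LMFFilter *createLMFFilter(const char *name)
{
    LMFFilter *f = calloc(1, sizeof *f);
    if (!f) return NULL;
    f->name  = name;
    f->start = defaultStart;  /* plugin overrides these in initPlugin() */
    f->stop  = defaultStop;
    return f;
}
```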
The MPEG decoder is a filter. It contains a pthread thread and reads its input from a ring buffer; each entry in the ring buffer is a pointer to data of type LMFPacket. The ring buffer is written to by the demux filter via the connection between the output pin of the demux filter and the input pin of the MPEG filter. The ring buffer is protected by a pthread mutex against concurrent access by more than one thread at a time.
The MPEG thread runs all the time. It calls video.c/PlayBackCycle(), which in turn calls getvideo.c/requestConsecutiveVideoBytes(), which in turn calls mpeg.c/GetVideoInputBytesFromFile(), which removes bytes from the ring buffer and writes them to the global MPEG buffer.
So, when it returns, video.c has stream data in the global MPEG buﬀer to process.
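The mutex-protected ring buffer between the demux output pin and the MPEG-2 input pin can be sketched as follows. This is a hedged illustration: the names, the size, and the non-blocking ringGet() are assumptions (the real code blocks and waits on an empty buffer rather than returning NULL).

```c
#include <pthread.h>
#include <stddef.h>

typedef struct LMFPacket LMFPacket;   /* opaque here */

#define RING_SIZE 64

typedef struct {
    LMFPacket      *slot[RING_SIZE];  /* each entry is a packet pointer */
    size_t          head, tail, count;
    pthread_mutex_t lock;             /* guards all ring state */
} PacketRing;

void ringInit(PacketRing *r)
{
    r->head = r->tail = r->count = 0;
    pthread_mutex_init(&r->lock, NULL);
}

/* Writer side (demux filter). Returns 0 on success, -1 if full. */
int ringPut(PacketRing *r, LMFPacket *p)
{
    int rc = -1;
    pthread_mutex_lock(&r->lock);
    if (r->count < RING_SIZE) {
        r->slot[r->tail] = p;
        r->tail = (r->tail + 1) % RING_SIZE;
        r->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&r->lock);
    return rc;
}

/* Reader side (MPEG-2 filter thread). Returns NULL if empty. */
LMFPacket *ringGet(PacketRing *r)
{
    LMFPacket *p = NULL;
    pthread_mutex_lock(&r->lock);
    if (r->count > 0) {
        p = r->slot[r->head];
        r->head = (r->head + 1) % RING_SIZE;
        r->count--;
    }
    pthread_mutex_unlock(&r->lock);
    return p;
}
```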
Figure 1.3 on page 20 shows the mpeg ﬁlter and related data structures involved.
The mpeg decoder contains a number of assembler modules. They are:
Life starts when the runfilter() thread is started; this is the MPEG-2 internal thread, which loops continuously, calling PlayBackCycle() in video.c.
When PlayBackCycle() is called, it calls GetVideoStartCode() in video.c, which in turn looks into the global vd variable; if it needs video stream data to process, it calls requestConsecutiveVideoBytes() in getvideo.c to get the required number of bytes.
requestConsecutiveVideoBytes() will call GetVideoInputBytesFromFile() in mpeg.c to move the required bytes from the MPEG-2 ring buffer to the global vd variable. If the ring buffer is empty, it will block waiting.
During the decoding of the bit stream, many calls are made to MmxGetBits.S (MMX instructions) for parsing the video bit stream.
When GetVideoStartCode() returns, PlayBackCycle() continues by doing a large switch statement on the picture code (in the global vd.i.startcode). Looking at one case: when a picture start code is detected, picture() is called (in video.c).
picture() parses the picture stream, parsing each macroblock; there is a loop over all macroblocks. Each time it needs to decode a macroblock, it calls SampleProcessMacroBlock() in samplemc.c. This assumes that samplemc.c is the module used to drive the actual decoding at the assembler level. There are two main C modules for doing this: one is samplemc.c, which the diagram below is based on, and another called fastmc.c, which I will look into in more detail later on. These C modules interface to the assembler modules for doing the actual decoding in assembler.
SampleProcessMacroBlock() finds the type of the block. Motion compensation is first done by making calls to the recon() C routine, which ends up calling recon_comp() in recon.c. In this file, there is a #if USE_MMX_FOR_RECON to decide whether motion compensation is done using MMX or plain C. If MMX is to be used, the MMX routine in recon.S is called.
When motion compensation is done, MMX routines in Vscale.S are called to decode the blocks. Either IntraVldIdctEightBitOutput() or NonIntraVldIdctNineBitSun() is called.
The MMX instructions in Vscale.S call the _intraVld or _NonIntraVld MMX routine in vld.S to decode MPEG variable-length-code blocks into 8x8 arrays. After the routines in vld.S return, Vscale.S calls the _IntraQuant or _NonIntraQuant MMX routines in vquant.S to dequantize, scale, and clamp the output arrays from the VLD.
The vquant.S MMX routines in turn jump to the idct MMX routine in idct.S to compute the IDCT on the 8x8 array of DCT coefficients.
When this is all done, and the end of picture is reached, SampleEndingPictureMC() in samplemc.c is called. This in turn calls QueueForDecodeAndDisplay() in vidqueue.c to queue the decoded frame. This ends up calling SampleRenderingFunctionMC() in samplemc.c, which calls DecodedYCrCbToDisplay() in swdisp.c to display the frame.
In swdisp.c, there is a queue through which the decoded frame is sent to the output filter (the X11 filter, for example), which will actually display the picture. Notice that color mapping conversion is done in the output filter and not by the MPEG decoder.
The figure below shows the main dataflow in the MPEG-2 filter.
The core of the decoder is in decoding macroblocks. This is in samplemc.c, in the function SampleProcessMacroBlock(), or in fastmc.c, in the function FastSoftwareProcessMacroBlockMC(), depending on how the build was done.
The figure below shows the algorithm used.
The decoding process goes through these steps:
The MPEG decoder is divided into two main sections: the C modules do the high-level processing, such as reading the bit stream, locating the macroblocks, and deciding on the type of the picture and the type of prediction needed. Once a macroblock is found and needs to be decoded, the assembler routines are called to do the processing. The interface between the C modules and the assembler modules can be looked at as being the fastmc.c module or the samplemc.c module, depending on the build parameter used (only one of those can be used).
The figure below illustrates the above. It shows that the C modules share C-based global variables, and that the assembler modules share assembler-based data buffers and tables. Also, the assembler routines have access to the C-based buffers.
mmxGetBits is used to obtain, examine, and skip bits in the video bit stream. It is the main interface for accessing the video bit stream during the decoding process.
The bit stream is accessed via the global pointer vd.i.puDword. Two MMX registers are used to store the top 128 bits of the bit stream. The symbolic names of these two MMX registers are FIRST and SECOND.
Another MMX register, with the symbolic name COUNT, is used to store the number of bits consumed in the FIRST register. The value in the COUNT register is saved in memory in the variable vd.i.bitsUsedInDword.
Another MMX register, with the symbolic name SOURCE, is used to contain the address of the top of the bit stream, and is advanced 8 bytes at a time. The value of this register is saved in memory in the variable vd.i.puDword.
The C interface to mmxGetBits.S is as follows:
This is a high-level version of mmxGetBits; let's call it cGetBits.c. It will have the same interface as the MMX-based functions. The purpose of this is to help show what the GetBits MMX code does; it is not meant to be working code that will compile as is.
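In that spirit, here is a hedged C sketch of cGetBits. It models FIRST, COUNT, and the stream pointer with plain variables, and refills a byte at a time instead of loading 8 bytes into SECOND, so it illustrates the semantics of GetVideoBitsSmall() rather than mirroring the MMX code register for register.

```c
#include <stdint.h>

typedef struct {
    const uint8_t *src;   /* models vd.i.puDword / the SOURCE register */
    uint64_t first;       /* models the FIRST register: bit cache, left-aligned */
    unsigned count;       /* models COUNT: bits consumed from 'first' */
} GetBitsState;

void cInitGetBits(GetBitsState *s, const uint8_t *stream)
{
    s->src   = stream;
    s->first = 0;
    s->count = 64;        /* cache starts empty; the first call refills it */
}

/* Return the next n bits of the stream (1 <= n <= 32),
   like GetVideoBitsSmall(). */
uint32_t cGetVideoBitsSmall(GetBitsState *s, unsigned n)
{
    /* refill: pull bytes from the stream while there is room in the cache
       (the real MMX code loads 8 bytes at a time into SECOND instead) */
    while (s->count >= 8) {
        s->count -= 8;
        s->first |= (uint64_t)(*s->src++) << s->count;
    }
    uint32_t bits = (uint32_t)(s->first >> (64 - n));  /* top n bits */
    s->first <<= n;                                    /* consume them */
    s->count += n;
    return bits;
}
```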
Figure 1.3.4 on page 35 shows a detailed walkthrough of the initGetBits MMX code. Figure 1.3.4 on page 38 shows a detailed walkthrough of the InputToMmx MMX code. Figure 1.3.4 on page 41 shows the rest of the walkthrough of the InputToMmx MMX code. Figure 1.3.4 on page 44 shows a summary of the InputToMmx MMX code.
Figure 1.3.4 on page 47 shows a detailed walkthrough of the MmxToInput MMX code. MmxToInput() basically takes the output of the operation in MMX registers and updates vd.i.puDword and vd.i.bitsUsedInDword.
Figure 1.3.4 on page 50 is a walkthrough of GetVideoBitsSmall(). It takes as an argument the number of bits to return from the video stream, and the return value will contain those bits. Since an unsigned int is used for the return value, only 32 bits can be returned per call. The MMX registers COUNT, FIRST, and SECOND are updated as needed for the next call.
The vrecon.S MMX module contains the code for the video reconstruction and averaging routines. The C interfaces to the entry points in this module are declared in vrecon.h. This MMX module is called from the fastmc.c module. The total number of lines in the vrecon.S module is about 1130, including comments.
These are the C interfaces from vrecon.h:
I will look at one function from the above, MMX_Recon_no_motion_f(), to show the interface to it.
There is a C version of vrecon.S. It exists as commented-out code sections in the same file, vrecon.S, so it is possible to initially use that for Solaris. See the C code (all written as macros) for more description of what the assembler does.
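To give a flavor of what the reconstruction/averaging routines compute, here is a hedged C sketch of a rounded-average reconstruction, the typical averaging operation used for half-pel and bidirectional prediction. The function name and block geometry are illustrative assumptions, not taken from vrecon.S; see the commented-out C macros in that file for the real behavior.

```c
#include <stdint.h>

/* Average two prediction blocks with rounding, writing the result to dst.
   'stride' is the distance in bytes between successive rows. */
void reconAverage(uint8_t *dst, const uint8_t *predA, const uint8_t *predB,
                  int width, int height, int stride)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* rounded average of the two predictions */
            dst[x] = (uint8_t)((predA[x] + predB[x] + 1) >> 1);
        }
        dst += stride; predA += stride; predB += stride;
    }
}
```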
The vscale.S module is called after motion compensation. There are a number of interfaces to this module; however, only two are used: one for non-intra blocks and one for intra blocks.
These are the entry points to the vscale.S module:
For intra blocks, IntraVldIdctSevenBitShiftedOutput is called.
For non-intra blocks, NonIntraVldIdctEightBitShiftedSum is called.
vscale.S does not access C-based structures, but it will access the 8x8 IDCTbuffer defined in the data.S assembler module.
For intra blocks, the entry point is _intraVld, and for non-intra blocks, the entry point is _NonIntraVld. These MMX entry points read from C-based global structures, and read and write assembler-based buffers such as the IDCTbuffer, VLCTable0, VLCtable1, VLCTable2, VLCTable3, the DCluma buffer, and DCShift.
The C-based fields that this assembler code reads are fields in structures of type struct MPEG_VIDEO_VLD_VARIABLES_STRUCT, declared in vld.h, and of type struct MPEG_VIDEO_INPUT_DATA_VARIABLES, declared in video.h.
The C fields read from struct MPEG_VIDEO_VLD_VARIABLES_STRUCT are: the flag mc_intraBlockIsLumFlag, the flag dPictureFlag, intra_vlc_format, macroblockIsIntraFlag, and mpeg2IfNotZero.
The C field written into struct MPEG_VIDEO_VLD_VARIABLES_STRUCT is: vldLimitOverflowFlag.
The C fields read from struct MPEG_VIDEO_INPUT_DATA_VARIABLES are: bitsUsedInDword and puDword.
The C field written into struct MPEG_VIDEO_INPUT_DATA_VARIABLES is: bitsUsedInDword.
This module dequantizes, scales, and clamps the output arrays from the VLD.
There are two main entries into this assembler module. For Intra blocks it is _IntraQuant and for non-intra blocks, it is _NonIntraQuant.
This assembler module accesses C-based global variables, and assembler-based buffers and tables.
The C-based variables accessed are fields in the global variable vld, which is of type MPEG_VIDEO_VLD_VARIABLES_STRUCT. The variable vld itself is a field in a larger variable, vd, of type MPEG_VIDEO_DECODER_VARIABLES_TYPE, that is allocated in the module video.c.
The C-based fields in struct MPEG_VIDEO_VLD_VARIABLES_STRUCT accessed in this module are: intra_dc_precision (which can be 1, 2, 4, or 8), mc_pDcPredictor (pointer to block), psIntraQuantMatrix (pointer to the intra quantizing matrix), intraQuantMatrixScale, psNonIntraQuantMatrix (pointer to the non-intra quantizing matrix), nonIntraQuantMatrixScale, and quantizer_scale (the global quantizer scale).
The assembler-based buffers accessed are located in the data.S module; they are: IdctColumnMask and the 8x8 IDCTBuffer (read/write access).
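To illustrate what _NonIntraQuant does with these fields, here is a hedged C sketch of MPEG-2-style non-intra inverse quantization using a quantizer matrix and quantizer_scale. Mismatch control and the exact fixed-point details of the MMX code are omitted, and the names are illustrative, not MGI's.

```c
#include <stdint.h>

/* Dequantize, scale, and clamp one 8x8 block of non-intra coefficients.
   'matrix' plays the role of psNonIntraQuantMatrix and 'quantizer_scale'
   the role of the global quantizer scale. */
void dequantNonIntra(const int16_t qf[64], const uint8_t matrix[64],
                     int quantizer_scale, int16_t out[64])
{
    for (int i = 0; i < 64; i++) {
        int v = qf[i];
        if (v == 0) { out[i] = 0; continue; }
        int sign = (v > 0) ? 1 : -1;
        /* MPEG-2-style non-intra reconstruction formula */
        int f = ((2 * v + sign) * matrix[i] * quantizer_scale) / 32;
        /* clamp to the legal coefficient range */
        if (f >  2047) f =  2047;
        if (f < -2048) f = -2048;
        out[i] = (int16_t)f;
    }
}
```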
This module computes the IDCT on an 8x8 array of DCT coefficients.
The entry point in this module is idct, which is called to perform the IDCT on the 8x8 IDCTBuffer.
The C-based variables read in this module are fields in the structure of type MPEG_VIDEO_VLD_VARIABLES_STRUCT; these are: mc_IdctblockDestinationStride and mc_pucBlockDestination.
The assembler-based buffers accessed are: the 8x8 IDCTBuffer, which contains the DCT coefficients to perform the IDCT on, and DCSTEP and various other assembler-based constants, all of which are defined in data.S.
idct stores the 16-bit final results in MMX registers, then calls the output routine, which clamps the results, scales them to a specified precision, and stores or sums the results into 8-bit, pre-configured C-based buffers.
The output routine for intra blocks is called Out7BitIntra in the assembler module vscale.S, and the output routine for non-intra blocks is called Sum8BitNonIntra in the assembler module vscale.S.
The X11 output filter takes as input decoded picture frames from the MPEG-2 decoder (or the subpic decoder) and displays them. Currently, the output filter does the YUV to RGB conversion using an MMX module.
The X11 output ﬁlter is located in dvd2/src/filters/sink/video/x11/ directory in the source tree.
The main filter is implemented in the file x11video.c. Other supporting files are display.c, which has functions that are called from x11video.c to actually display the frames, and yuvconv.S, which is an MMX module that does the YUV to RGB conversion. There are two versions of the YUV to RGB conversion: one with alpha blending and one without.
The MMX code is called from the display.c module to do the conversion before displaying.
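To show what the yuvconv.S module computes (the version without alpha blending), here is a hedged per-pixel C sketch using a common ITU-R BT.601 integer approximation. The exact coefficients and swing conventions in the MGI code may differ; the names here are illustrative.

```c
#include <stdint.h>

static uint8_t clamp255(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one YCbCr pixel to RGB using 8.8 fixed-point BT.601 factors:
   R = Y + 1.402 (V-128)
   G = Y - 0.344 (U-128) - 0.714 (V-128)
   B = Y + 1.772 (U-128) */
void yuvToRgbPixel(uint8_t y, uint8_t u, uint8_t v,
                   uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = y, d = u - 128, e = v - 128;
    *r = clamp255(c + ((359 * e) >> 8));            /* 1.402 * 256 ~ 359 */
    *g = clamp255(c - ((88 * d + 183 * e) >> 8));   /* 0.344, 0.714      */
    *b = clamp255(c + ((454 * d) >> 8));            /* 1.772 * 256 ~ 454 */
}
```

The MMX module performs the same arithmetic on 8 pixels at a time with packed registers; that parallelism, not the formula, is what the VIS port has to reproduce.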
The video output filter is built in one of five modes. One selects the mode to build the filter in by manually editing the file build.h in the same directory and setting the variable OUTPUT_MODE to the mode needed. Looking at build.h we see:
If OUTPUT_MODE_VIDMEM is selected, the device /dev/agpgart is opened and used to write to. The device is opened, then queried using an ioctl call to compute the video memory size. Then the device is memory-mapped using the mmap() call for the calculated size.
The mmap() call returns a memory pointer to the mapped, device-accessible memory to write the frame to. This memory address is used in the function DisplayFrame() by the x86 instructions to move the frame buffer to the /dev/agpgart mapped buffer, as shown below.
If OUTPUT_MODE_DGA is used, the file /usr/X11R6/include/X11/extensions/xf86dga.h is included and the build is linked against the shared library xf86dga.so. The process of using direct graphics calls is initialized using the following sequence of calls to functions in the xf86dga.so library.
To output a frame using direct graphics mode, after calling the YUV to RGB conversion, calls to XF86DGASetViewPort() are made as shown:
The screen is closed in DGA mode by making a call to XF86DGADirectVideo(display,screen,0).
The Intel i810 graphics card has the following features (obtained from http://www.xfree86.org/4.0/i810.html):
Hardware acceleration is not possible when using the framebuﬀer in 32 bit per pixel format, and this mode is not supported by this driver.
Interlace modes cannot be supported.
This driver currently only works for Linux/ix86, and normal use requires the agpgart.o kernel module, included in Linux kernels 2.3.42 and higher.
This mode requires mgilib.h, which I was not able to find in the source tree (ask Ben about that).
Some of the functions called in the mgilib when running in i810 mode are:
mgiGetDriverInfo(), mgiMapDriverInfo(), mgiStartOverlay(), mgiCloseOverlay().
(Need to ﬁnd more information on this mgilib).
In this mode, we use functions as defined in the X11 extension /usr/X11R6/include/X11/extensions/XShm.h.
To load the display window in XSHM mode, we create a shared memory segment and map it to the window created, as shown:
To close the display in XSHM mode, we detach from the shared memory segment and then use an X call to destroy the display.
To display a frame in XSHM mode, we make a call to XShmPutImage() followed by a call to XSync().
The SDL functions used when running in this mode are:
To load the display window in SDL we do
To display a frame in SDL mode, do
The SDL functions are implemented in the directory dvd2/src/filters/sink/video/sdl/ in the source tree.
These are the steps I took to build DVDMax on Solaris.
where ﬁle.gz is any one of the following
Simply gunzip and tar xf the above, and run 'make install' from the top-level directory. If for some reason you get an error that /usr/local/bin/install is not found, then from the top-level directory of fileutils do this:
Assume this is installed in /home/nabbasi/data/QT_downloads/qt-2.2.0-beta2 then add this to your .bashrc (for bash):
To build Qt yourself, do this:
Assume you extracted it to /export/home/kde-1.1.2-1-Solaris-7-Sparc/ then do this
cd /usr/local/bin
ln -s gcc CC
ln -s gcc cc
ln -s flex flex++
ln -s flex lex
export CC="gcc -DSOLARIS"
/* the mode selected */
#define OUTPUT_MODE OUTPUT_MODE_XSHM
cd lmf; rm config.cache; ./configure; make uninstall; make; make install
cd sconv; rm config.cache; ./configure; make uninstall; make; make install
cd dvd2; rm config.cache; ./configure; make uninstall; make -i; make -i install; make -i; make -i install
startx -- -bpp 16
cp ./dvd2/src/app/frontend/skin.tar.gz $HOME
cd
gunzip skin.tar.gz
tar xf skin.tar
su; export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib; cd dvd2/src/tests; ./ftest12
As of 090500, the result of the Solaris build shows these plugins being built (9 plugins):
On Linux, complete build shows 14 plugins.
The plugins that failed to link on Solaris are: ac3, decss, mpeg, oss, x11.