Extract DVD

From Maze's wiki

Latest revision as of 13:51, 19 December 2012

Decryption

To decrypt encrypted DVDs, build and install libdvdcss from http://download.videolan.org/pub/libdvdcss/last

./configure --prefix=/usr
make
make install

Space not an Issue?

If hard drive space is not an issue, grab the whole DVD

Installation

Install the following software

apt-get install lsdvd gddrescue vobcopy genisoimage

Extract

Run lsdvd once to get the CSS key

lsdvd

Read the DVD to a rescue image, ignoring bad blocks, and mount the image

ddrescue -n -b 2048 /dev/<dvddevice> <title>_ddrescue.iso
mkdir <title>.mnt
mount <title>_ddrescue.iso <title>.mnt

Extract and decrypt the video files.

vobcopy -i <title>.mnt -m -t <title>

Then create an ISO image from the extracted files

genisoimage -dvd-video -o <title>.iso <title>

Make smaller files

This describes how to convert a DVD to OGG video using:

  • libtheora for video
  • libvorbis for audio
  • srt subtitles

Installation

Install the packages

apt-get install libav-tools lsdvd mplayer oggz-tools dvdauthor tesseract-ocr

Prepare

Create an image of the disc to avoid disc read problems

ddrescue -n -b 2048 /dev/<dvddevice> <title>_ddrescue.iso

Use lsdvd to see what's on the DVD. Determine the stream you would like to extract, as well as the aid for audio and the sid for subtitles.

lsdvd -x <title>_ddrescue.iso

Write the stream to the hard drive so the next steps will go faster.

mplayer dvdnav://<stream>/<title>_ddrescue.iso -dumpstream -dumpfile <title>.vob
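
The angle-bracket placeholders above cannot be pasted into a shell as they are, because the shell treats `<` and `>` as redirections. As a minimal sketch, the following prints the three Prepare commands with the placeholders filled in. The helper name print_prepare_commands and the example values (sr0, mymovie, stream 1) are assumptions for illustration only; nothing is executed.

```shell
#!/bin/sh
# Hypothetical helper: print the Prepare commands with the placeholders
# replaced by concrete values. It only prints; it does not run anything.
print_prepare_commands() {
    device=$1   # drive name under /dev, e.g. sr0 (assumed)
    title=$2    # base name for the output files (assumed)
    stream=$3   # stream number picked from the lsdvd listing (assumed)
    printf '%s\n' \
        "ddrescue -n -b 2048 /dev/${device} ${title}_ddrescue.iso" \
        "lsdvd -x ${title}_ddrescue.iso" \
        "mplayer dvdnav://${stream}/${title}_ddrescue.iso -dumpstream -dumpfile ${title}.vob"
}

print_prepare_commands sr0 mymovie 1
```

Reviewing the printed lines before running them is a cheap way to catch a wrong device or stream number.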

Investigate the VOB file and note the numbers of the video stream and the audio stream

avprobe <title>.vob

Some VOB files report the wrong duration, so, just in case, rebuild the VOB as two separate files that contain only the required video stream and audio stream.

avconv -y -i <title>.vob -map 0:<videostream> -c:v copy -an -sn -f vob <title>_video.vob
avconv -y -i <title>.vob -map 0:<audiostream> -c:a copy -vn -sn -f ac3 <title>_audio.ac3

Audio

Calculate the target bitrate in kbit/s: the source bitrate scaled to a 44100 Hz sample rate, then halved

bitrate=$(avprobe -show_format <title>_audio.ac3 2>/dev/null | grep bit_rate | cut -d= -f 2)
samplerate=$(avprobe -show_streams <title>_audio.ac3 2>/dev/null | grep sample_rate | cut -d= -f 2)
echo "scale=10;${bitrate}/(${samplerate}/44100)/2/1000" | bc | cut -d. -f 1
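
To make the arithmetic concrete, here is a minimal sketch of the same calculation using plain shell integer arithmetic instead of bc. The helper name audio_target_kbps and the input values are assumptions for illustration; because every division truncates, the result may differ by one from the bc pipeline in edge cases.

```shell
#!/bin/sh
# Hypothetical helper: scale the source bitrate to a 44100 Hz sample rate,
# halve it, and convert from bit/s to kbit/s (integer arithmetic only).
audio_target_kbps() {
    bitrate=$1      # source bitrate in bit/s (bit_rate from avprobe)
    samplerate=$2   # source sample rate in Hz (sample_rate from avprobe)
    echo $(( bitrate * 44100 / samplerate / 2 / 1000 ))
}

# Example: a 448 kbit/s AC3 track sampled at 48000 Hz
audio_target_kbps 448000 48000   # prints 205
```

The printed number is what goes in place of <bitrate> in the conversion command.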

Now extract audio from the stream and convert to OGG.

avconv -y -i <title>_audio.ac3 -c:a libvorbis -b:a <bitrate>k -ar 44100 -vn -sn -f ogg <title>.audio

Video

Detect the amount to crop.

avconv -y -i <title>_video.vob -t 600 -vf cropdetect -an -sn -f rawvideo /dev/null 2>&1 | tail | head -n 1 | sed 's/^.*crop=//'
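
The command above takes a single line from near the end of the cropdetect log. As an alternative sketch (a different selection rule, not the one used above), the following picks the crop value that is suggested most often, which can be more robust when the last frames happen to be dark. The helper name most_common_crop is hypothetical and the sample log lines are illustrative, not real output.

```shell
#!/bin/sh
# Hypothetical helper: read a cropdetect log on stdin and print the crop
# value (w:h:x:y) that occurs most frequently.
most_common_crop() {
    sed -n 's/^.*crop=//p' | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}'
}

# Illustrative log lines; real ones carry more fields before "crop=".
printf '%s\n' \
    '[cropdetect] w:720 h:416 x:0 y:80 crop=720:416:0:80' \
    '[cropdetect] w:720 h:432 x:0 y:72 crop=720:432:0:72' \
    '[cropdetect] w:720 h:416 x:0 y:80 crop=720:416:0:80' | most_common_crop
```

To use it on a real file, pipe the avconv cropdetect output (with 2>&1) into most_common_crop instead of the tail, head and sed chain.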

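Calculate the target bitrate in kbit/s: one third of the source bitrate, corrected for the sample aspect ratio and scaled to 25 frames per second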

filesize=$(find ./ -name <title>_video.vob -printf '%s\n')
duration=$(avprobe -show_streams <title>_video.vob 2>/dev/null | grep duration | cut -d= -f 2 | sed 's/:/\//')
sar=$(avprobe -show_streams <title>_video.vob 2>/dev/null | grep sample_aspect_ratio | cut -d= -f 2 | sed 's/:/\//')
framerate=$(avprobe -show_streams <title>_video.vob 2>/dev/null | grep r_frame_rate | cut -d= -f 2 | sed 's/:/\//')
echo "scale=10;((${filesize}/3)*8/${duration}/(${sar}))*(25/(${framerate}))/1000" | bc | cut -d. -f 1
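
As a worked example of this formula, the sketch below does the same calculation with awk, so the inputs can be tried without a VOB file at hand. The helper name video_target_kbps and the sample numbers are assumptions; the aspect ratio and frame rate are passed as fractions such as 16/15 and 25/1.

```shell
#!/bin/sh
# Hypothetical helper: a third of the source bitrate, divided by the sample
# aspect ratio, scaled by 25/framerate, expressed in kbit/s (truncated).
video_target_kbps() {
    # $1 file size in bytes, $2 duration in seconds,
    # $3 sample aspect ratio as num/den, $4 frame rate as num/den
    awk -v size="$1" -v dur="$2" -v sar="$3" -v fps="$4" 'BEGIN {
        split(sar, s, "/"); split(fps, f, "/")
        kbps = ((size / 3) * 8 / dur / (s[1] / s[2])) * (25 / (f[1] / f[2])) / 1000
        printf "%d\n", int(kbps)
    }'
}

# Example: 4 GB of video, 90 minutes, SAR 16/15, 25 frames per second
video_target_kbps 4000000000 5400 16/15 25/1   # prints 1851
```

With a 30000/1001 frame rate and otherwise identical inputs the result drops to 1544, because the output is resampled to 25 frames per second.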

Convert the video stream to ogv in 2 passes

avconv -y -pass 1 -i <title>_video.vob -r 25 -g 100 -bf 16 -filter:v yadif,crop=<cropvalues>,scale=in_w:in_h/sar -b:v <bitrate>k -c:v libtheora -an -sn -f ogg <title>.video
avconv -y -pass 2 -i <title>_video.vob -r 25 -g 100 -bf 16 -filter:v yadif,crop=<cropvalues>,scale=in_w:in_h/sar -b:v <bitrate>k -c:v libtheora -an -sn -f ogg <title>.video

Subtitles

Force a known palette by creating an RGB palette file named <title>.rgb with the following contents

ff0000
ffff00
ff00ff
00ff00
ff0000
ffff00
ff00ff
00ff00
ff0000
ffff00
ff00ff
00ff00
ff0000
ffff00
ff00ff
00ff00
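
The listing above is the same four colors repeated four times, 16 entries in total. A minimal sketch that writes such a file follows; the helper name write_palette is hypothetical and the output file name is whatever you pass in.

```shell
#!/bin/sh
# Hypothetical helper: write the 16-entry palette (4 colors x 4 repeats)
# to the file named by the first argument.
write_palette() {
    i=0
    : > "$1"    # create or truncate the output file
    while [ "$i" -lt 4 ]; do
        printf '%s\n' ff0000 ffff00 ff00ff 00ff00 >> "$1"
        i=$((i + 1))
    done
}
```

For example, `write_palette mymovie.rgb` creates the palette for a title called mymovie (an assumed name).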

The 1st color is red, the 2nd color is yellow, the 3rd color is purple, the 4th color is green.

Extract the subtitles from the stream.

spuunmux -s <sid> -p <title>.rgb <title>.vob

Using an image viewer, determine the center color (not the outline color) of the subtitle text in the images. Then convert each image to black text on a white background and run OCR on it using:

for file in *.png; do convert "$file" -fill '#000000' -opaque '#<palette>' -threshold 1% "$file.pnm"; tesseract -l <3 letter code language> "$file.pnm" "$file"; done

Now combine the txt files with the sub.xml file and convert them to the SRT format

linenr=1; cat sub.xml | sed '/subpictures/d' | sed '/stream/d' | cut -d\" -f 2-6 | tr ' ' ',' | tr -d '"' | sed 's/,[^=]*=/,/g' | sed 's/,\([^,]*\)$/ --> \1/' | while read line; do file=`echo $line | cut -d, -f 1`; times=`echo $line | cut -d, -f 2`; echo $linenr; echo $times | tr '.' ',' ; cat $file.txt; echo; linenr=$(($linenr+1)); done > <title>.srt
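
To see what the text-processing part of this pipeline does to one entry, the sketch below runs the same cut, tr and sed steps on a single sample line. The helper name spu_to_srt_times is hypothetical, and the sample `<spu>` line (image attribute first, then start and end) is an assumption inferred from the field positions the pipeline relies on, not copied from real spuunmux output.

```shell
#!/bin/sh
# Hypothetical helper: turn one <spu ...> line from sub.xml (on stdin)
# into an SRT time range, using the same steps as the pipeline above.
spu_to_srt_times() {
    cut -d\" -f 2-6 |                  # keep image name, start and end
        tr ' ' ',' | tr -d '"' |       # comma-separate, drop the quotes
        sed 's/,[^=]*=/,/g' |          # strip the attribute names
        sed 's/,\([^,]*\)$/ --> \1/' | # join start and end with an arrow
        cut -d, -f 2 |                 # drop the image name
        tr '.' ','                     # SRT uses a comma before fractions
}

printf '%s\n' '<spu image="sub00000.png" start="00:00:01.13" end="00:00:03.50" x="0" y="0">' | spu_to_srt_times
# prints: 00:00:01,13 --> 00:00:03,50
```

The fractional part is passed through unchanged, so it keeps however many digits sub.xml contains.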

Merging

Combine everything in an Ogg container

oggz-merge -o <title>.ogv <title>.video <title>.audio <title>.srt