author: drowe67 <[email protected]> 2023-11-20 08:33:17 +1030
committer: David Rowe <[email protected]> 2023-11-20 08:33:17 +1030
commit: 3d9443facfb06de1ee7efaecf876ff225e2fd5c1
tree: f901b83fdd952142f4f2b371c2a1cac25433dc57 /doc
parent: 24d7b22e4f4086ef64b27048cbdb5bffc6ed5bd4
building up encoder block diagram
Diffstat (limited to 'doc')
-rw-r--r-- doc/codec2.pdf bin 139875 -> 140472 bytes
-rw-r--r-- doc/codec2.tex 39
2 files changed, 36 insertions, 3 deletions
diff --git a/doc/codec2.pdf b/doc/codec2.pdf
index b2ad5b3..c8332bf 100644
--- a/doc/codec2.pdf
+++ b/doc/codec2.pdf
Binary files differ
diff --git a/doc/codec2.tex b/doc/codec2.tex
index 4d2c2ac..d60d557 100644
--- a/doc/codec2.tex
+++ b/doc/codec2.tex
@@ -2,7 +2,7 @@
\usepackage{amsmath}
\usepackage{hyperref}
\usepackage{tikz}
-\usetikzlibrary{calc,arrows}
+\usetikzlibrary{calc,arrows,shapes,positioning}
\usepackage{float}
\usepackage{xstring}
@@ -22,6 +22,19 @@
\author{David Rowe\\ \\ Revision: {\gitrevision} on branch: {\branch}}
\begin{document}
+
+% Tikz code used to support block diagrams
+% credit: https://tex.stackexchange.com/questions/175969/block-diagrams-using-tikz
+
+\tikzset{
+block/.style = {draw, fill=white, rectangle, minimum height=3em, minimum width=3em},
+tmp/.style = {coordinate},
+sum/.style= {draw, fill=white, circle, node distance=1cm},
+input/.style = {coordinate},
+output/.style= {coordinate},
+pinstyle/.style = {pin edge={to-,thin,black}}
+}
+
\maketitle
\section{Introduction}
@@ -117,11 +130,31 @@ The parameters of the sinusoidal model are:
\begin{enumerate}
\item Frequencies of each sine wave. As they are all harmonics of $F_0$ we can just send $F_0$ to the decoder, and it can reconstruct the frequency of each harmonic as $F_0,2F_0,3F_0,...,LF_0$. We used 5-7 bits/frame to represent the $F_0$ in Codec 2.
\item The spectral magnitudes, $A_1,A_2,...,A_L$. These are really important as they convey the information the ear needs to make the speech intelligible. Most of the bits are used for spectral magnitude information. Codec 2 uses between 20 and 36 bits/frame for spectral amplitude information.
-\item A voicing model. Speech can be approximated into voiced speech (vowels) and unvoiced speech (like consonants), or some mixture of the two. The example in Figure \ref{fig:hts2a_time} above is for voiced speech. So we need some way to tell the decoder if the speech is voiced or unvoiced, this requires just a few bits/frame.
+\item A voicing model. Speech can be approximated into voiced speech (vowels) and unvoiced speech (like consonants), or some mixture of the two. The example in Figure \ref{fig:hts2a_time} above is for voiced speech. So we need some way to describe voicing to the decoder. This requires just a few bits/frame.
\end{enumerate}
\subsection{Codec 2 Block Diagram}
+\begin{figure}[h]
+\caption{Codec 2 Encoder.}
+\label{fig:codec2_encoder}
+\begin{center}
+\begin{tikzpicture}[auto, node distance=2cm,>=triangle 45,x=1.0cm,y=1.0cm]
+
+\node [input] (rinput) {};
+\node [input, right of=rinput,node distance=1cm] (z) {};
+\node [block, right of=z,node distance=2cm] (pitch_est) {Pitch Estimator};
+\node [block, below of=pitch_est] (fft) {FFT};
+\node [block, right of=fft,node distance=3cm] (est_Am) {Estimate $A_m$};
+
+\draw [->] (rinput) -- node[left,text width=2cm] {Input Speech} (pitch_est);
+\draw [->] (z) |- (fft);
+\draw [->] (pitch_est) -| (est_Am);
+\draw [->] (fft) -- (est_Am);
+
+\end{tikzpicture}
+\end{center}
+\end{figure}
\subsection{Bit Allocation}
@@ -133,7 +166,7 @@ The parameters of the sinusoidal model are:
\section{Further Work}
\begin{enumerate}
-\item Using c2sim to ectract and plot model parameters
+\item Using c2sim to extract and plot model parameters
\item How to use tools to single step through codec operation
\end{enumerate}
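
Note on the sinusoidal model text touched by this diff: the first list item says the decoder can rebuild every harmonic frequency from $F_0$ alone. A minimal Python sketch of that step follows. The function name `harmonic_frequencies`, the 8000 Hz sample rate, and the choice to cap $L$ at the Nyquist frequency are illustrative assumptions; none of them appear in the diff.

```python
def harmonic_frequencies(f0_hz, fs_hz=8000.0):
    """Return [F0, 2*F0, ..., L*F0] in Hz.

    Assumption (not stated in the diff): L is the largest integer
    such that L*F0 does not exceed the Nyquist frequency fs/2.
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    nyquist_hz = fs_hz / 2.0
    # Number of harmonics that fit at or below Nyquist.
    num_harmonics = int(nyquist_hz // f0_hz)
    return [m * f0_hz for m in range(1, num_harmonics + 1)]


if __name__ == "__main__":
    # Print the first few harmonics for a hypothetical 100 Hz pitch.
    print(harmonic_frequencies(100.0)[:3])
```

Under these assumptions only $F_0$ has to be transmitted per frame; the decoder derives $L$ and the full set of harmonic frequencies itself.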