Sunday, March 31, 2019
Image To Voice Converter Is Software Computer Science Essay
Image to Voice converter is software or a device that recognizes an image and converts it into a human voice. The concept of the conversion is to provide a communication aid that makes blind people aware of the object in their hand or in front of them. The converter is also useful for the early education of children aged three to six years.

The converter consists of two parts: image processing and sound generation. Image processing is a series of computational techniques for analyzing, restoring, compressing and enhancing images. When an object is presented, its image is captured through a scanner or a webcam. The image is then analyzed and manipulated using specialized software applications such as MATLAB, and the result is sent to an output device such as a printer or a monitor.

Image processing has several techniques, including template matching, KNN (K-Nearest Neighbour) and thresholding. Template matching is a technique for finding small parts of an image that match a template image; it is also used to recognize printed characters, numbers and other small, simple objects. KNN (K-Nearest Neighbour) is an algorithm that works very well in practice and is easy to understand. It is also a lazy algorithm: it does not use the training data points to do any generalization. In addition, thresholding is one of the most important approaches to image segmentation.
Thresholding is a non-linear operation that converts a gray-scale image into a binary image.

The purpose of image processing in this project is to analyze a picture using techniques that can identify shades, colours and relationships that cannot be perceived by the human eye. Image processing is also used to solve identification problems, e.g. in forensic medicine or in creating weather maps from satellite photos. It deals with images in bitmapped graphics format that have been scanned in or captured with digital cameras. Sound generation produces a sound through the Windows sound library or by playing a WAV file from the computer.

Problem Statement
Nowadays, many visually impaired people still use a blind man's stick to sense the direction of the road and the objects in front of them. With only a plain stick and no sight, it is difficult for a person to get a sense of direction, and they may not know what objects are around them. As the economy worsens, most people, including family members, are busy with their working lives. They have no extra time to give disabled people proper care. In this situation, disabled people, especially blind people, have to adapt their lifestyle. This product can also help small children improve their ability to recognize or differentiate everyday objects. This is why the product described above was proposed.

Project Aim and Objective
The aim of this project is to develop an Image to Voice converter that is able to recognize an image from a webcam and then convert it into sound with good performance. The sound is produced through the Windows sound library or a WAV file.
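The aim above relies on producing sound from a WAV file. The sketch below illustrates that step in Python rather than the project's MATLAB. It builds a short WAV file in memory using only the standard library. The sine tone, its frequency, duration and sample rate, and the name `tone_wav_bytes` are hypothetical choices; a real converter would more likely play a prerecorded spoken word for each recognized object.

```python
import io
import math
import struct
import wave


def tone_wav_bytes(freq_hz=440.0, duration_s=0.5, sample_rate=8000, amplitude=0.5):
    """Return a complete mono, 16-bit PCM WAV file (as bytes) holding a sine tone."""
    n_samples = int(duration_s * sample_rate)
    peak = int(amplitude * 32767)  # stay inside the signed 16-bit range
    frames = b"".join(
        struct.pack("<h", int(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate)))
        for n in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)        # header sizes are patched when the writer closes
    return buf.getvalue()


data = tone_wav_bytes()
with wave.open(io.BytesIO(data), "rb") as wav:
    print(wav.getnframes(), wav.getframerate())  # 4000 8000
```

On Windows, the standard-library call `winsound.PlaySound(data, winsound.SND_MEMORY)` can play such bytes directly. In the project itself, MATLAB loads a WAV file stored on the computer instead.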
To achieve the main objective of this project, the following sub-objectives need to be accomplished:
- To develop unique image recognition algorithms for shapes and colours for real-time application using MATLAB.
- To analyze the performance of the image recognition algorithm in terms of accuracy and processing time.
- To develop an algorithm to convert a recognized image to voice using MATLAB.
- To analyze the performance of the image to voice conversion algorithm.
- To test the performance of the closed-loop interface of the image and sound processing converter system.
- To develop a Graphical User Interface (GUI) for the image to voice converter for ease of use.

Project Scope/Limitation
The scope of this project is to construct a unique image to voice converter within a limited period of time at a cost not exceeding RM200. The project consists of hardware, namely a webcam, and software, namely MATLAB. The system captures an image using the webcam, recognizes the image and generates a sound using MATLAB with several techniques. The product is specially designed for visually impaired people and to improve small children's learning ability. The limitations of this project are as follows:
- Shape limitation
- Colour limitation
- Resolution limitation
- Distance limitation

Literature Review
Image processing is a technique that converts an image into digital form and performs operations on it, in order to obtain an enhanced image or to extract useful information from it. It is a type of signal processing in which the input is an image, such as a video frame or a photograph, and the output may be an image or features related to that image. Usually, an image processing system treats images as two-dimensional signals and applies established signal processing methods to them [1].
The image recognition process can be divided into several stages: image acquisition, image pre-processing, image segmentation, image representation and image classification.

In image acquisition, a digital image is produced by one or several image sensors. Such sensors include various types of light-sensitive cameras, range sensors, tomography devices, radar and ultrasonic cameras. Depending on the type of sensor, the resulting image data is an ordinary two-dimensional image, a three-dimensional volume or an image sequence. The pixel values usually correspond to light intensity in one or several spectral bands. They can also relate to other physical measures, such as depth, absorption or reflectance of sonic or electromagnetic waves, or nuclear magnetic resonance.

Image pre-processing is a stage that can increase the reliability of an optical inspection. It can be divided into two categories: image enhancement and image restoration. Image enhancement intensifies different features of an image, either for display or for analysis. Enhancement techniques include edge enhancement, noise filtering, magnifying and sharpening. Filter operations that intensify or reduce certain image features allow an easier or faster evaluation; examples are the mean filter, the median filter and the Wiener filter. An image may also be degraded and contain errors. Image restoration is the process used to restore a degraded image. It is also used to correct images from different sensors that appear blurry or out of focus [2].

Next, image segmentation is performed to group pixels into salient image regions, for example regions corresponding to individual surfaces, objects, or natural parts of objects.
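The mean and median filters named in the pre-processing discussion can be sketched in a few lines. This is a minimal Python illustration, not the project's MATLAB code. It assumes a gray-scale image stored as a list of rows of integers and a 3x3 window, with border pixels replicated at the image edges. The function names are hypothetical.

```python
from statistics import median


def neighbourhood(img, r, c):
    """Return the 9 values of the 3x3 window centred on (r, c), replicating border pixels."""
    h, w = len(img), len(img[0])
    return [
        img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    ]


def mean_filter(img):
    """Replace each pixel by the rounded average of its 3x3 window."""
    return [[round(sum(neighbourhood(img, r, c)) / 9) for c in range(len(img[0]))]
            for r in range(len(img))]


def median_filter(img):
    """Replace each pixel by the median of its 3x3 window (always one of the 9 input values)."""
    return [[median(neighbourhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


# A flat 10-valued patch with one impulse ("salt") pixel in the middle.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
print(mean_filter(noisy))    # the spike is smeared: every pixel becomes 37
print(median_filter(noisy))  # the spike is removed: every pixel becomes 10
```

The mean filter spreads the single noisy value over the whole patch. The median filter discards it because one outlier can never be the middle value of nine.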
Segmentation can be used for object recognition, occlusion boundary estimation within motion or stereo systems, image compression, image editing, or image database look-up. Traditional image segmentation methods can be divided into several techniques. These include the gray-level threshold method, the edge extraction method, the region growing method and the split-and-merge method. The threshold technique was applied in this project.

Thresholding deals with gray-scale images. Ignoring for the moment the effects of noise or shading, it can be assumed that most pixels belonging to objects have a relatively low gray level, whereas background pixels have a relatively high gray level. For example, black is represented by a gray level of 0 and white by a gray level of 255. Based on this observation, the pixels in the image can be divided into two dominant groups according to their gray level. These gray levels may serve as detectors to distinguish between background and objects in the image. However, if the image contains smooth-edged objects, it will not be a pure black-and-white image. It may then not be possible to find two distinct gray levels characterizing the background and the objects. This problem intensifies in the presence of noise [3]. Two methods can be used to overcome the ill effects of noise and shading: Otsu's method, known as a global threshold, and the neighbourhood method, known as an adaptive threshold.

For image representation, all information is ultimately represented in binary. This is true of images as well as numbers and text. However, an important distinction needs to be made between how image data is displayed and how it is stored. Displaying involves a bitmap representation, while storing as a file involves one of many image formats, such as JPEG and PNG [4].
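The gray-level thresholding idea described above (dark objects, light background, 0 for black and 255 for white) can be sketched as follows. The sketch also includes a colour-to-gray step, since thresholding works on gray-scale images. Several choices here are assumptions, not taken from this project: the ITU-R BT.601 luma weights, the fixed threshold of 128 and the function names. The code is Python rather than MATLAB.

```python
def rgb_to_gray(rgb_img):
    """Convert an image of (R, G, B) tuples to 0-255 gray levels using BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_img]


def threshold(gray_img, t):
    """Binarize: 1 marks an object pixel (dark, gray level <= t), 0 marks background."""
    return [[1 if p <= t else 0 for p in row] for row in gray_img]


# Light background with two dark object pixels in the right-hand column.
rgb = [[(255, 255, 255), (0, 0, 0)],
       [(250, 250, 250), (20, 20, 20)]]
gray = rgb_to_gray(rgb)        # [[255, 0], [250, 20]]
binary = threshold(gray, 128)  # [[0, 1], [0, 1]]
print(gray, binary)
```

A fixed threshold only works when the two gray-level groups are well separated. Otsu's method and the neighbourhood method exist to choose the threshold automatically when they are not.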
Techniques for image representation include the roundness ratio (also known as circularity) and Fourier descriptors.

The intent of the image classification process is to categorize all pixels in a digital image into one of several land cover classes, or themes. This categorized data may then be used to produce thematic maps of the land cover present in an image. Normally, multispectral data are used to perform the classification. The spectral pattern present within the data for each pixel is used as the numerical basis for categorization. The objective of image classification is to identify and portray, as a unique gray level or colour, the features occurring in an image. These features are described in terms of the object or type of land cover they actually represent on the ground [5]. The techniques used for this stage are template matching and KNN (K-Nearest Neighbour).

Table 1: Comparison of image sensors for image acquisition [6, 7]
No. | Type of image sensor | Strengths | Weaknesses
1 | Webcam | allows face-to-face interaction; low cost; easy to use | low resolution; not portable; no optical zoom lens; no auto-focus
2 | Digital camera | high resolution; portable (battery powered); optical zoom lens; auto-focus; high operating speed | less durable; drains batteries quickly; high cost; many complex functions

From Table 1, it can be seen that both image sensors have their own strengths and weaknesses. This research focuses on the webcam because it is the image sensor used in this project. A webcam can be connected to a computer to capture images for image recognition. It is also easy to use and cheaper than a digital camera, which is more complex and expensive.
However, a digital camera has a higher megapixel count than a webcam.

Table 2: Comparison of filters for image pre-processing [2, 8]
No. | Type of filter | Strengths | Weaknesses
1 | Median filter | more robust; stronger smoothing; provides good results | time consuming; complex calculation
2 | Mean filter | intuitive; simple to use; smoothing | poor at preserving sharp edges; not robust to outliers
3 | Wiener filter | short computation time; controls output error; straightforward to design | results often too blurred; spatially invariant

From Table 2, it can be seen that all filters have their own strengths and weaknesses. This research focuses on two types of filter: the median filter and the mean filter. The median filter has been chosen for this project because the median is a more robust average than the mean. A single unrepresentative pixel in a neighbourhood will therefore not affect the median value significantly. The median value must be the value of one of the pixels in the neighbourhood. As a result, the median filter does not create new, unrealistic pixel values when the filter straddles an edge. For this reason, the median filter is much better at preserving sharp edges than the mean filter. The median filter also removes impulse noise more effectively than the mean filter.

Table 3: Comparison of threshold techniques for image segmentation [9, 10]
No. | Threshold technique | Strengths | Weaknesses
1 | Otsu | fast; easy to code; easy to use; less sensitive to noise | assumes uniform illumination; does not use any object structure or spatial coherence; complex computation
2 | Neighbourhood | produces a good result; less computation | memory consumption; time consumption; sensitive to noise

From Table 3, it can be seen that both techniques have their own strengths and weaknesses. Otsu's method, named after its inventor Nobuyuki Otsu, is a global thresholding method and one of many binarization algorithms [11].
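Otsu's method searches every candidate threshold and keeps the one that best separates the two gray-level classes. The Python sketch below assumes integer gray levels from 0 to 255. Instead of minimising the weighted within-class spread directly, it maximises the between-class variance. Both give the same threshold, because the two quantities always add up to the fixed total variance. The function name is hypothetical, and MATLAB users would normally rely on the toolbox rather than code like this.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t (pixels <= t vs. pixels > t) chosen by Otsu's method.

    Maximising the between-class variance w0*w1*(mu0 - mu1)**2 is equivalent to
    minimising the weighted within-class variance, because their sum (the total
    variance) does not depend on t. Counts are used instead of probabilities;
    that only rescales the score and does not change the best t.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of their gray levels
    best_t, best_score = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split separates nothing
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:  # strict comparison: ties keep the lowest threshold
            best_t, best_score = t, score
    return best_t


pixels = [20, 22, 25, 200, 205, 210]
print(otsu_threshold(pixels))  # 25: the dark pixels {20, 22, 25} form one class
```

A single pass over the histogram with running sums keeps each candidate threshold cheap to evaluate. This is why the global method is fast compared with recomputing statistics around every pixel.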
This method iterates through all possible threshold values. For each one, it computes a measure of spread for the pixel levels on each side of the threshold, i.e. for the pixels that fall in either the background or the foreground. The aim is to find the threshold value at which the weighted sum of the foreground and background spreads is at its minimum.

The neighbourhood method, known as adaptive thresholding, separates desirable foreground image objects from the background based on the difference in pixel intensities of each region. The two methods differ in their use of the histogram. Otsu's method thresholds the image using a histogram of the whole image. The neighbourhood method thresholds each pixel using a histogram of a small region (neighbourhood) around that pixel. In addition, Otsu's method suffers fewer errors caused by image noise than the neighbourhood method, whose local computations are more sensitive to noise.

Table 4: Comparison of two techniques for image representation [12]
No. | Technique | Strengths | Weaknesses
1 | Roundness ratio | very fast algorithm; scale, position and rotation invariant; high accuracy if the object shape is preserved properly after segmentation | sensitive to errors if the object shape is changed by improper segmentation
2 | Fourier descriptor | medium speed; produces a good result; low computation cost; overcomes weak discrimination ability; scale, position and rotation invariant | difficult to obtain high-order invariant moments; cannot deal with disjoint shapes

From Table 4, it can be seen that both techniques have their own strengths and weaknesses. Roundness is defined for a surface of revolution, such as a cylinder, cone or sphere. For a cylinder or cone, all points of the surface intersected by any plane perpendicular to a common axis are equidistant from that axis. Since the axis and centre do not exist physically, measurements have to be made with reference to the surfaces of the figures of revolution only. Roundness is measured through the circularity of the contour [12].
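The two representation techniques compared in Table 4 can be illustrated on a polygonal contour, such as the boundary of the largest segmented object. The Python sketch below assumes the contour is an ordered, counter-clockwise list of (x, y) vertices.

The roundness ratio uses the common circularity formula 4πA/P², which equals 1 for a circle and π/4 (about 0.785) for a square. The Fourier descriptors are a simple variant: the DC term (position) is dropped, the other coefficients are divided by the magnitude of the first harmonic (scale), and only magnitudes are kept (rotation and starting point). These are standard textbook choices assumed here, not necessarily the exact formulas of reference [12]. All function names are hypothetical.

```python
import cmath
import math


def area_and_perimeter(points):
    """Shoelace area and perimeter of a closed polygon given as ordered (x, y) vertices."""
    twice_area, perimeter = 0.0, 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def roundness_ratio(points):
    """Circularity 4*pi*A / P**2: 1.0 for a circle, smaller for less round shapes."""
    area, perimeter = area_and_perimeter(points)
    return 4.0 * math.pi * area / perimeter ** 2


def fourier_descriptors(points, n_keep=8):
    """Magnitude-only Fourier descriptors of a closed contour (assumed counter-clockwise).

    Coefficient 0 (the centroid) is dropped for translation invariance, the rest are
    divided by |coefficient 1| for scale invariance, and taking magnitudes removes
    the effect of rotation and of the starting vertex.
    """
    z = [complex(x, y) for x, y in points]
    n = len(z)
    coeffs = [sum(z[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n)) / n
              for u in range(n)]
    scale = abs(coeffs[1])
    return [abs(c) / scale for c in coeffs[2:2 + n_keep]]


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
circle = [(math.cos(2 * math.pi * i / 360), math.sin(2 * math.pi * i / 360)) for i in range(360)]
print(round(roundness_ratio(square), 4))  # 0.7854 (pi / 4)
print(round(roundness_ratio(circle), 4))  # 1.0 (a 360-gon is almost a circle)

# Rotating an L-shaped contour by 90 degrees, doubling it and shifting it
# leaves its Fourier descriptors unchanged.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
moved = [(5 - 2 * y, 5 + 2 * x) for x, y in l_shape]
print(all(abs(a - b) < 1e-9
          for a, b in zip(fourier_descriptors(l_shape), fourier_descriptors(moved))))  # True
```

The roundness ratio compresses a shape into one number, which makes it fast but easy to disturb when segmentation distorts the outline. The descriptor list keeps more shape detail at a higher computational cost.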
Fourier descriptors are used to describe the contour features of a shape. They were introduced in the early 1960s by Cosgriff and Fritzsche. According to Fourier analysis theory, Fourier coefficients can be generated by the Fourier transform. Lower-frequency coefficients describe the general shape of the signature, and higher-frequency coefficients carry finer details of the shape. The Fourier descriptor can be represented by the harmonic amplitudes and phase angles. The Fourier coefficients are usually normalized by dividing them by the first Fourier coefficient. Because fast algorithms exist for computing Fourier coefficients, many machine vision recognition systems use these coefficients as shape features.

Table 5: Comparison of techniques for image classification [13-15]
No. | Technique | Strengths | Weaknesses
1 | Template matching | easy to implement; high degree of flexibility; high detection accuracy | shape limitation; computation speed; susceptible to scaling and rotation
2 | K-Nearest Neighbour | easy to implement; very effective; improves accuracy; improves run-time performance | poor run-time performance if the training set is large; very sensitive; outperformed by more exotic techniques
3 | Neural network | energy saving; high accuracy; easy to use | curse of dimensionality; high resource consumption

From Table 5, it can be seen that all techniques have their own strengths and weaknesses. This research focuses on two techniques: template matching and K-Nearest Neighbour. The standard template matching technique is characterized by a simple mechanism and high detection accuracy. It is used for general model evaluation and error estimation. Hence, it plays a very important role in image processing and is commonly used in object detection and recognition. However, the conflict between speed and accuracy is prominent.
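The basic template matching operation can be sketched as an exhaustive search. The template is placed at every position in the image, and the position with the smallest sum of squared differences (SSD) is kept. The Python sketch below assumes gray-scale images stored as lists of rows. SSD is one common similarity measure chosen here for illustration; the literature cited does not fix it. The function name is hypothetical.

```python
def match_template(image, template):
    """Exhaustive template matching scored by the sum of squared differences (SSD).

    Slides `template` over every position of `image` (both lists of rows of gray
    levels) and returns (row, col, ssd) of the best position; lower SSD is better.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


image = [[0, 0, 0, 0],
         [0, 0, 9, 8],
         [0, 0, 7, 6],
         [0, 0, 0, 0]]
template = [[9, 8],
            [7, 6]]
print(match_template(image, template))  # (1, 2, 0): exact match at row 1, column 2
```

The four nested loops make the cost grow with both the number of search positions and the template size. This is the speed-versus-accuracy problem that template matching research tries to reduce.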
The main factors affecting speed are the search computation and the number of template matching operations. Appropriately reducing the number of search positions and the precision of the similarity computation can noticeably increase the speed of template matching, and this has become a focus in the field. Many studies focus on improving the search algorithm. They reduce the number of matching operations by decreasing the number of matching points on the template, so that higher speed is achieved. Typical algorithms include the genetic algorithm, among others. Since each matching operation is based on template matching, the computation speed of template matching itself must also be improved [14].

The intuition underlying Nearest Neighbour Classification is quite straightforward: examples are classified based on the class of their nearest neighbours. It is often useful to take more than one neighbour into account. The technique is therefore more commonly referred to as K-Nearest Neighbour (KNN) Classification, where the k nearest neighbours are used to determine the class. Since the training examples are needed at run-time, i.e. they need to be in memory at run-time, it is sometimes also called Memory-Based Classification. Because induction is delayed to run time, it is considered a Lazy Learning technique [13].

Analysis of Similar Products and Literature
Oral Image to Voice Converter by Takaaki HASEGAWA and Keiichi OHTANI [16]
In this paper, the authors propose a new speech communication system, the Image Input Microphone, that converts an oral image into voice. The system synthesizes the voice from the oral image only. It provides high security and is not affected by acoustic noise, because actual utterance is not always necessary as input. Moreover, since the voice is synthesized without recognition, the system is independent of language. Simulations converting oral images to voice for the five Japanese vowels are carried out as a basic investigation.
A vocal tract area function is estimated from the oral image, and a PARCOR synthesis filter is obtained from the vocal tract area function. The PARCOR synthesis filter is excited by an impulse train. The performance of the system is evaluated by hearing tests of the synthesized voice. As a result, audible voice has been synthesized, and the mean recognition rate for the five Japanese vowels was 91%.

The paper describes a system that converts an oral image into voice, taking into account the human ability to lip-read. In the proposed system, the voice is synthesized directly from the oral image without recognition, and actual utterance is not always necessary as input. The authors use both tongue features and lip features obtained from the oral image. Therefore, the system is not affected by acoustic noise. At the same time, it provides high security because no spoken input is required.

The system uses the vocal tract area function, which is equivalent to the transfer function of the vocal tract, as a parameter, so synthesis is indirect. The vocal tract area function is obtained from PARCOR analysis of speech signals. Speech signals are synthesized by inverse processing of the PARCOR analysis. Therefore, if the vocal tract area function is estimated from oral image signals, the oral image can be converted to the corresponding voice.
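The synthesis chain described in this paper can be sketched as a vocal tract area function, then PARCOR (reflection) coefficients, then an all-pole filter excited by an impulse train. The Python sketch below uses a direct-form equivalent of the PARCOR lattice filter, obtained with the standard step-up recursion.

Several choices are assumptions and are not taken from the paper: the sign convention for the reflection coefficients (texts differ), the two-section area values and the impulse period. The paper's method for estimating the area function from the oral image is not described here and is not modelled. All function names are hypothetical.

```python
def area_to_reflection(areas):
    """Reflection (PARCOR) coefficients between adjacent tube sections.

    Assumed convention: k_i = (A[i+1] - A[i]) / (A[i+1] + A[i]); texts differ in sign.
    Positive areas always give |k_i| < 1, so the resulting filter is stable.
    """
    return [(a1 - a0) / (a1 + a0) for a0, a1 in zip(areas, areas[1:])]


def reflection_to_lpc(ks):
    """Step-up recursion: reflection coefficients -> [1, a1, ..., ap] of A(z) = 1 + sum a_j z^-j."""
    a = [1.0]
    for k in ks:
        prev = a + [0.0]
        i = len(prev) - 1
        a = [prev[j] + k * prev[i - j] for j in range(i + 1)]
    return a


def synthesize(a, excitation):
    """All-pole synthesis filter 1/A(z): y[n] = x[n] - sum_{j>=1} a[j] * y[n-j]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y


def impulse_train(period, length):
    """Unit impulses every `period` samples, a crude model of voiced (glottal) excitation."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


areas = [1.0, 3.0]                # hypothetical two-section vocal tract area function
ks = area_to_reflection(areas)    # [0.5]
a = reflection_to_lpc(ks)         # [1.0, 0.5]
print(synthesize(a, impulse_train(4, 6)))  # [1.0, -0.5, 0.25, -0.125, 1.0625, -0.53125]
```

A realistic vowel would need many more tube sections and a sampling rate in the kHz range. The tiny example only shows how each stage feeds the next.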
Humans utter various sounds by changing the shape of the vocal tract, and the articulators move cooperatively rather than independently during utterance. It is generally known that information about articulation can be obtained from lip-reading.

Software Comparison
The table below compares two software options, MATLAB and C++.

Table 6: Comparison of software between MATLAB and C++ [17]
No. | Software | Strengths | Weaknesses
1 | MATLAB | easy to learn; fast numerical algorithms; inexpensive software; fast development | slow processing; complex computation
2 | C++ | mature standard; large community; fast | complex computation; difficult to debug; low-level programming

From Table 6, it can be seen that both types of software have their own strengths and weaknesses. MATLAB is widely used in the image processing and computer vision community. Many image analysis functions are built into the software, which makes it a very useful image analysis tool for end users. C++ offers the Standard Template Library (STL) and libraries for computer graphics and image processing. Libraries based on the C++ template mechanism can accept all C++ built-in types as image data, although certain functions are only valid for a subset of the built-in types. MATLAB has been selected because it suits the analysis requirements of this project. MATLAB version R2010b will be used to analyze image quality and performance in this project.

Project Methodology
This project is divided into hardware and software. The hardware section consists of the webcam as the input and a speaker as the output.
The software section uses MATLAB to recognize the image and convert it to sound with several image processing techniques.

Block Diagram
Figure 1: Block diagram of the Image to Voice converter. Webcam → MATLAB [Image Acquisition (acquire image) → Image Pre-processing (median filtering) → Image Segmentation (thresholding) → Image Representation (roundness ratio) → Image Classification (template matching using KNN) → Sound Generation (WAV file)] → Speaker.

The block diagram in Figure 1 shows the basic concept of the system interface to be implemented. Based on the block diagram, the steps are as follows:
- Activate the webcam and capture the image in front of it.
- Perform median filtering in the image pre-processing stage using MATLAB, to filter out unwanted signals or noise in the image.
- Perform image segmentation. Based on the literature review, the most suitable method is Otsu's thresholding method, which converts the grayscale image into a binary image for segmentation.
- Find the largest object and perform image representation using the roundness ratio. The ratio of the largest object is calculated to determine which template ratio is nearest.
- Perform image classification, using template matching with the KNN technique to find the small part of the image that matches the template image.
- After matching is done, generate a sound automatically from a WAV file on the computer.

Flow Chart
Figure 2: Flow chart of the Image to Voice converter. Start → Acquire image from webcam → Perform median filtering → Colour space conversion → Thresholding using Otsu → Image labelling → Find the largest object → Image representation (roundness ratio) → Template matching using KNN → Is the image matched? (No: return to image acquisition; Yes: generate sound, then return to image acquisition).

Based on Figure 2, an image is first acquired from the webcam before image recognition begins. The acquired image then goes through an image enhancement process, in which median filtering removes unwanted noise while preserving edges.
After that, the image undergoes colour space conversion from one colour space to another, e.g. RGB, HSV or YCbCr. The purpose of the conversion is to keep the converted image as faithful as possible to the original image. Next, thresholding is performed using Otsu's method, which calculates a measure of spread for the pixel levels on each side of the threshold. The reason for doing this is to separate the objects from the background.

Once thresholding is done, image labelling is performed by tracing the outer boundaries in the image and labelling them as occluding the background. After that, the largest object is found and image representation is performed using the roundness ratio, to determine which object is most similar to the template ratio. Then, template matching is performed to find a match between the template and a portion of the image. The template that most closely matches the object is found using the KNN method, which matches it against the database images.

If the data is matched, a sound is generated automatically, using MATLAB to load the WAV file from the computer or laptop, and the procedure is then repeated from the first step. If the data is not matched, no sound is generated. The procedure returns to the first step and repeats until the data is matched.

Project Methods
Median Filter
Median filters are nonlinear rank-order filters that replace each element of the source vector with the median value taken over a fixed neighbourhood of the processed element. These filters are widely used in image and signal processing applications. The purpose of median filtering is to remove impulsive noise while keeping signal blurring to a minimum [18].

Otsu Method
Otsu's method is a widely used segmentation method. It is also known as the maximum between-class variance method or, equivalently, the minimum within-class variance method.
This method iterates through all possible threshold values. For each one, it calculates a measure of spread for the pixel levels on each side of the threshold, i.e. for the pixels that fall in either the foreground or the background. The aim is to find the threshold value at which the sum of the foreground and background spreads is at its minimum [11].

Roundness Ratio/Circularity
Roundness is defined as a condition of a surface of revolution, such as a cylinder, cone or sphere. For a cylinder or cone, all points of the surface intersected by any plane perpendicular to a common axis are equidistant from that axis. Since the axis and centre do not exist physically, measurements have to be made with reference to the surfaces of the figures of revolution only. For measuring roundness, only the circularity of the contour is determined [12].

Template Matching
The classical template matching method is characterized by a simple mechanism and high detection accuracy. It is used for general model evaluation and error estimation. Therefore, it plays a very important role in image processing and is widely used in object detection and recognition. It is a technique for finding small parts of an image that match a database image [14].

K-Nearest Neighbour (KNN)
K-Nearest Neighbour (KNN) is one of the simplest algorithms for classification and regression. It can be described as a lazy method: it does not use the training data points to do any generalization. Although classification is the primary application of KNN, it can also be used for density estimation.
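In the methodology, the roundness ratio of the largest object is compared with template ratios, and the closest template is chosen with KNN. A minimal Python sketch of that classification step is shown below, using Euclidean distance and a majority vote. The single-feature vectors, labels and values are hypothetical toy data placed near the theoretical circularity of ideal shapes. They are not measurements from this project, and the function name is also hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest (Euclidean) training examples.

    `train` is a list of (feature_tuple, label) pairs. Nothing is learned in advance:
    all work happens at query time, which is why KNN is called a lazy method.
    Ties in the vote go to the tied label whose member is closest to the query.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    for _, label in nearest:  # `nearest` is ordered by distance
        if votes[label] == top:
            return label


# Toy templates: one feature, the roundness ratio, near the ideal values
# of a circle (1.0), a square (about 0.785) and an equilateral triangle (about 0.605).
templates = [((0.99,), "circle"), ((0.97,), "circle"),
             ((0.80,), "square"), ((0.78,), "square"),
             ((0.62,), "triangle"), ((0.60,), "triangle")]
print(knn_classify(templates, (0.95,)))  # circle
print(knn_classify(templates, (0.75,)))  # square
```

Adding more features, such as a colour measure, only means using longer tuples. The distance computation and the vote stay the same.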
Since KNN is non-parametric, it can be applied to data with arbitrary distributions [19].

Project Specification
This project is divided into three main sections: hardware, software and project cost estimate.

Hardware
The hardware used for this project is the Logitech HD Webcam C310. Its basic system requirements are listed below.
Figure 3: Logitech HD Webcam C310 [20]
- Windows Vista, Windows 7 (32-bit or 64-bit) or Windows 8
- 1 GHz processor
- 512 MB RAM or more
- 200 MB hard drive space
- Internet connection
- USB 1.1 port (2.0 recommended)

Software
The software for this project is MATLAB, used for image recognition and sound generation.

Project Estimate Cost
The estimated cost of this project is RM89, the price of the Logitech HD Webcam C310. The cost is low because this is essentially a software-based project and MATLAB is provided by the college engineering lab.

Gantt Chart