Sunday, March 31, 2019
Image To Voice Converter Is Software Computer Science Essay
An Image to Voice converter is software or a device that recognizes an object in an image and converts it into human speech. The idea of the converter is to provide a communication aid that lets blind people know what object is in their hand or in front of them. The converter is also useful for the early education of children aged three to six.

The converter consists of image processing and sound generation. Image processing is a series of computational techniques for analysing, restoring, compressing and enhancing images. When an object is presented, an image is captured through a scanner or webcam, processed with specialised software such as MATLAB, and sent to an output device such as a printer or a monitor.

Image processing has several techniques, including template matching, KNN (K-Nearest Neighbour) and thresholding. Template matching is a technique for finding small parts of an image that match a template image; it is also used to recognise printed characters, numbers and other small, simple objects. KNN (K-Nearest Neighbour) is an algorithm that works very well in practice and is easy to understand. It is also a lazy algorithm: it does not use the training data points to do any generalisation. Thresholding is one of the most important approaches to image segmentation. It is a non-linear operation that converts a gray-scale image into a binary image.

The purpose of image processing in this project is to analyse a picture using techniques that can identify shades, colours and relationships that cannot be observed by the human eye.
Besides that, image processing is used to solve identification problems, e.g. in forensic medicine or in creating weather maps from satellite photos. It deals with images in bitmapped graphic form that have been scanned in or captured with digital cameras. Sound generation produces a sound through the Windows sound library or by playing a WAV file from the computer.

Problem Statement
Nowadays, many visually impaired people in our society still use a white cane to sense the direction of the road and the objects in front of them. With only a plain stick and no sight, it is difficult for a person to get a sense of direction, and blind people often do not know what objects are around them. As the economy gets worse, most people, including family members, are busy with their working lives and have no extra time to give disabled people good care. As a result, disabled people, especially blind people, have to get used to this in their daily lives. Other than that, this product can also help small children improve their ability to recognise and differentiate everyday objects. This is the motivation for the product described above.

Project Aim and Objectives
The aim of this project is to develop an Image to Voice converter that is able to recognise an image from the webcam and then convert it into sound, using the Windows sound library or a WAV file, with good performance.
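The sound-generation half of the converter, described above as playing a WAV file, can be illustrated with a short sketch. The project itself uses MATLAB; this is a Python sketch using only the standard `wave` module. It synthesises a plain sine tone as a stand-in (an assumption) for the recorded speech clip that would name each recognised object, and builds the file in memory so nothing is written to disk.

```python
import io
import math
import struct
import wave


def make_tone_wav(freq_hz=440.0, duration_s=0.5, sample_rate=8000):
    """Return the bytes of a mono, 16-bit PCM WAV file containing a sine tone."""
    n_frames = int(duration_s * sample_rate)
    amplitude = 0.5 * 32767  # half of full scale, so no sample clips
    frames = b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)))
        for n in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)          # mono
        wf.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
    return buf.getvalue()


def describe_wav(data):
    """Return (channels, sample_rate, frame_count) parsed from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getnchannels(), wf.getframerate(), wf.getnframes()
```

Playing the resulting bytes is platform-specific; on Windows, for example, Python's `winsound.PlaySound` accepts in-memory WAV data with the `SND_MEMORY` flag.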
To achieve the main aim of this project, the following sub-objectives need to be accomplished:
To develop unique image recognition algorithms for shapes and colours for real-time application using MATLAB.
To analyse the performance of the image recognition algorithms in terms of accuracy and processing time.
To develop an algorithm to convert the recognised image to voice using MATLAB.
To analyse the performance of the image-to-voice conversion algorithm.
To test the performance of the closed-loop interface of the image and sound processing converter system.
To develop a Graphical User Interface (GUI) for the image-to-voice converter for ease of use.

Project Scope/Limitation
The scope of this project is to construct a unique image-to-voice converter within the given time at a cost not exceeding RM200. The project consists of hardware, a webcam, and software, MATLAB. The system captures an image using the webcam, recognises the object in it, and generates a sound using MATLAB with several techniques. The product is intended for visually impaired people and to improve small children's learning. The project has the following limitations:
Shape limitation
Colour limitation
Resolution limitation
Distance limitation

Literature Review
Image processing is a technique for converting an image into digital form and performing operations on it, in order to obtain an enhanced image or to extract useful information from it. It is a form of signal processing in which the input is an image, such as a video frame or a photograph, and the output may be an image or features associated with that image. Usually, image processing systems treat images as two-dimensional signals and apply established signal processing techniques to them1.
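To make the "two-dimensional signal" view concrete: a gray-scale image is a grid of intensity values, and a colour webcam frame is typically reduced to one such grid before thresholding. The sketch below is Python rather than the MATLAB used in the project; it represents an image as a list of rows and converts RGB pixels to 8-bit gray levels with the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). These weights are one common choice and an assumption here, since the essay does not specify its conversion.

```python
def rgb_to_gray(image):
    """Convert an RGB image (rows of (R, G, B) tuples, 0-255) to gray levels 0-255.

    Each gray value is a weighted sum of the three channels using the
    ITU-R BT.601 luma weights, rounded to the nearest integer.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


# A 2 x 2 image: white, black / pure red, pure blue.
frame = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = rgb_to_gray(frame)  # gray[row][col] is the intensity signal value
```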
The image recognition process can be divided into several stages: image acquisition, image pre-processing, image segmentation, image representation and image classification. In image acquisition, a digital image is produced by one or more image sensors, such as various types of light-sensitive cameras, range sensors, tomography devices, radar and ultrasonic cameras. Depending on the type of sensor, the resulting image data is an ordinary two-dimensional image, a three-dimensional volume, or an image sequence. The pixel values usually correspond to light intensity in one or more spectral bands, but can also be related to various physical measures, such as depth, absorption or reflectance of sonic or electromagnetic waves, or nuclear magnetic resonance.

Image pre-processing is one of the stages that can increase the reliability of an optical inspection. It can be divided into two categories, image enhancement and image restoration. Image enhancement emphasises different features of an image for display or analysis purposes. Enhancement techniques include edge enhancement, noise filtering, magnifying and sharpening. Many filter operations that intensify or reduce certain image features enable an easier or faster evaluation, for example the mean filter, the median filter and the Wiener filter. With continued use, an image becomes degraded and contains many errors. Image restoration is the process used to restore the degraded image. It is also used to correct images from different sensors that appear blurred or out of focus2.

Next, image segmentation is performed to group pixels into salient image regions, for example regions corresponding to individual surfaces, objects, or natural parts of objects.
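Grouping pixels into regions can be illustrated on a binary image with connected-component labelling, which also corresponds to the "image labelling" and "find the largest object" steps of this project's flow chart. The Python sketch below is an assumption about how those steps could work, since the essay does not give its labelling rule: it uses 4-connectivity and a breadth-first search, treats pixels equal to 1 as foreground, and picks the region with the most pixels.

```python
from collections import deque


def label_components(binary):
    """Label 4-connected foreground regions (pixels equal to 1) of a binary image.

    Returns (labels, count): labels is 0 for background and 1..count for regions.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and labels[r][c] == 0:
                count += 1                      # start a new region here
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                    # flood-fill the region
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def largest_region(labels, count):
    """Return the label with the most pixels, or 0 if there are no regions."""
    if count == 0:
        return 0
    sizes = [0] * (count + 1)
    for row in labels:
        for lab in row:
            sizes[lab] += 1
    return max(range(1, count + 1), key=lambda lab: sizes[lab])
```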
Segmentation can be used for object recognition, occlusion boundary estimation within motion or stereo systems, image compression, image editing, or image database look-up. Traditional image segmentation methods include the gray-level threshold method, the edge detection method, the region growing method and the split-and-merge method. The threshold technique is applied in this project. It is a technique that deals with gray-scale images. Ignoring for the moment the effect of noise and shading, it can be assumed that most pixels belonging to the objects have a relatively low gray level, whereas the background pixels have a relatively high gray level. For example, black is represented by a gray level of 0 and white by a gray level of 255. Based on this observation, the pixels in the image can be divided into two dominant groups according to their gray level, and these gray levels can serve as detectors to distinguish between the background and the objects in the image. On the other hand, if the image contains smooth-edged objects, it will not be a pure black-and-white image, so it may not be possible to find two distinct gray levels characterising the background and the objects. This problem intensifies in the presence of noise3. To overcome the influence of noise and shading, two methods can be used: Otsu, known as global thresholding, and Neighbourhood, known as adaptive thresholding.

For image representation, all information is ultimately represented in binary. This is true of images as well as numbers and text. However, an important distinction needs to be made between how image data is displayed and how it is stored. Displaying involves a bitmap representation, while storing as a file involves one of many image formats, such as JPEG and PNG4.
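The gray-level thresholding described above, and Otsu's global threshold in particular, can be sketched in a few lines of Python (the project uses MATLAB). This version follows the description given in this essay: it tries every threshold on an 8-bit histogram and keeps the one that minimises the sum of the two classes' weighted spreads (variances). Following the assumption above that objects are darker than the background, the helper `binarize` marks pixels at or below the threshold as 1 (object).

```python
def otsu_threshold(gray):
    """Return the Otsu threshold t for an image of 8-bit gray levels (0-255).

    Class 0 is every pixel <= t and class 1 every pixel > t. The chosen t
    minimises n0*var0 + n1*var1 (proportional to the weighted within-class
    variance), which is equivalent to maximising the between-class variance.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)

    best_t, best_spread = 0, float("inf")
    for t in range(255):
        n0 = sum(hist[: t + 1])
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue  # one side is empty, so this is not a real split
        mean0 = sum(i * hist[i] for i in range(t + 1)) / n0
        mean1 = sum(i * hist[i] for i in range(t + 1, 256)) / n1
        var0 = sum(hist[i] * (i - mean0) ** 2 for i in range(t + 1)) / n0
        var1 = sum(hist[i] * (i - mean1) ** 2 for i in range(t + 1, 256)) / n1
        spread = n0 * var0 + n1 * var1  # sum of the two weighted spreads
        if spread < best_spread:
            best_t, best_spread = t, spread
    return best_t


def binarize(gray, t):
    """Mark dark (object) pixels, gray level <= t, as 1 and background as 0."""
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

For a toy image with dark pixels {10, 11, 12} and bright pixels {200, 205, 210}, every threshold from 12 to 199 separates the two groups equally well, and this sketch returns the first of them, 12.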
Techniques for image representation include the roundness ratio, also known as circularity, and Fourier descriptors.

The intent of the image classification process is to categorise all pixels in a digital image into one of several land cover classes, or themes. This categorised data may then be used to produce thematic maps of the land cover present in an image. Normally, multispectral data are used to perform the classification and, indeed, the spectral pattern present within the data for each pixel is used as the numerical basis for categorisation. The objective of image classification is to identify and portray, as a unique gray level or colour, the features occurring in an image in terms of the object or type of land cover these features actually represent on the ground5. The techniques used for this stage in this project are template matching and KNN (K-Nearest Neighbour).

Table 1: Comparison of image sensors for image acquisition6, 7
1. Webcam. Strengths: allows face-to-face interaction; low cost; easy to use. Weaknesses: low resolution; not portable; no optical zoom lens; no auto-focus.
2. Digital camera. Strengths: high resolution; portable with batteries; has an optical zoom lens; has auto-focus; high operating speed. Weaknesses: less durable; faster battery consumption; high cost; complex to use.

From Table 1, it can be seen that both image sensors have their own strengths and weaknesses. This research focuses on the webcam because it is the image sensor used in this project. A webcam can be connected to a computer to capture an image for image recognition. It is also easy to use and cheaper than a digital camera, which is more complex and costly.
However, the resolution (megapixel count) of a digital camera is higher than that of a webcam.

Table 2: Comparison of several types of filter for image pre-processing2, 8
1. Median filter. Strengths: more robust; more smoothing; provides good results. Weaknesses: time consuming; complex computation.
2. Mean filter. Strengths: intuitive; simple to use; smoothing. Weaknesses: not good at sharpening images; not robust to outliers.
3. Wiener filter. Strengths: short computation time; controls output error; straightforward to design. Weaknesses: results often too blurred; spatially invariant.

From Table 2, it can be seen that each filter has its own strengths and weaknesses. This research focuses on two types of filter, the median filter and the mean filter. The median filter has been chosen for this project because it is more robust than the mean filter, so a single unrepresentative pixel in a neighbourhood will not affect the median value significantly. Since the median value must be the value of one of the pixels in the neighbourhood, the median filter does not create new, unrealistic pixel values when the filter straddles an edge. For this reason, the median filter is better than the mean filter at preserving sharp edges. The median filter also removes more noise than the mean filter.

Table 3: Comparison of threshold techniques for image segmentation9, 10
1. Otsu. Strengths: fast; easy to code; easy to use; less sensitive. Weaknesses: assumes uniform illumination; does not use any object structure or spatial coherence; complex computation.
2. Neighbourhood. Strengths: produces a good result; less computation. Weaknesses: memory consumption; time consumption; sensitive.

From Table 3, it can be seen that both techniques have their own strengths and weaknesses. Otsu's method, named after its inventor Nobuyuki Otsu, is a global thresholding method and one of many binarization algorithms11.
The method iterates through all possible threshold values and calculates a measure of spread for the pixel levels on each side of the threshold, i.e. the pixels that fall in either the background or the foreground. The aim is to find the threshold value at which the sum of the foreground and background spreads is at its minimum. Neighbourhood thresholding, also known as adaptive thresholding, separates desirable foreground image objects from the background based on differences in pixel intensities within each region. The difference between the two methods is that Otsu's method uses a histogram of the whole image to threshold it, whereas the Neighbourhood method uses a histogram of a small region (neighbourhood) around each pixel to threshold that pixel. In addition, Otsu's method suffers less from errors caused by the sensitivity of local algorithms to image noise than the Neighbourhood method does.

Table 4: Comparison of two techniques for image representation12
1. Roundness ratio. Strengths: very fast algorithm; scale, translation and rotation invariant; high accuracy if the image shape is preserved properly after segmentation. Weaknesses: sensitive to errors if the object shape is changed by improper segmentation.
2. Fourier descriptor. Strengths: medium speed; produces a good result; low computation cost; overcomes weak discrimination ability; scale, position and rotation invariant. Weaknesses: difficult to obtain high-order invariant moments; cannot deal with disjoint shapes.

From Table 4, it can be seen that both techniques have their own strengths and weaknesses. Roundness is defined for a surface of revolution, such as a cylinder, cone or sphere, where all points of the surface intersected by any plane perpendicular to a common axis (in the case of a cylinder or cone) are equidistant from that axis. As the axis and centre do not exist physically, measurements have to be made with reference to the surfaces of the figures of revolution only. Measuring roundness amounts to determining the circularity of the contour12.
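The essay does not state its roundness formula; a widely used circularity measure is 4πA/P², where A is the area and P the perimeter of the contour. It equals 1 for a perfect circle and is smaller for elongated or jagged shapes, and it is unchanged by scaling, translation and rotation. The Python sketch below assumes the contour is already available as a closed polygon of (x, y) vertices, e.g. from tracing the boundary of the largest labelled object. On a pixel grid the perimeter estimate depends on how the boundary is traced, so measured values for digitised circles can deviate from 1.

```python
import math


def circularity(contour):
    """Circularity 4*pi*A / P**2 of a closed polygon given as (x, y) vertices."""
    n = len(contour)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]       # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0     # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(twice_area) / 2.0
    return 4.0 * math.pi * area / perimeter ** 2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # area 1, perimeter 4 -> pi/4
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]  # area 2, perimeter 6 -> 2*pi/9
```

A template ratio for each known shape can then be compared with the ratio measured for the object, as the methodology describes.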
Fourier descriptors are used to describe the contour features of a shape. They were introduced in the early 1960s by Cosgriff and Fritzsche. According to Fourier analysis theory, Fourier coefficients can be generated by the Fourier transform. Lower-frequency coefficients describe the general shape of the signature, while higher-frequency coefficients capture finer details of the shape. The Fourier descriptor can be represented by the harmonic amplitudes and phase angles, and the Fourier coefficients are usually normalised by dividing them by the first Fourier coefficient. Because there are fast algorithms for computing Fourier series coefficients, many recognition systems in machine vision use these coefficients as shape features.

Table 5: Comparison of several techniques for image classification13-15
1. Template matching. Strengths: easy to implement; high degree of flexibility; high detection accuracy. Weaknesses: shape limitation; computation speed; susceptible to scaling and rotation.
2. K-Nearest Neighbour. Strengths: easy to implement; very effective; improved accuracy; improved run-time performance. Weaknesses: poor run-time performance if the training set is large; very sensitive; outperformed by more exotic techniques.
3. Neural network. Strengths: high accuracy; easy to use. Weaknesses: curse of dimensionality.

From Table 5, it can be seen that each technique has its own strengths and weaknesses. This research focuses on two techniques, template matching and K-Nearest Neighbour. The standard template matching technique is characterised by a simple mechanism and high detection accuracy, and it is used as a general framework for model evaluation and error estimation. Hence, it plays a very important role in image processing and is commonly used in object detection and recognition. However, the conflict between speed and accuracy is pronounced.
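As a baseline for the speed and accuracy trade-off just described, exhaustive template matching can be sketched in Python (the project uses MATLAB). This version slides the template over every position and scores each placement by the sum of squared differences (SSD). SSD is one common similarity measure among several, e.g. normalised cross-correlation, and choosing it here is an assumption.

```python
def match_template(image, template):
    """Slide `template` over `image` (2-D lists of gray levels).

    Returns (row, col, ssd) of the placement with the smallest sum of
    squared differences; the first such placement wins ties.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


image = [[0, 0, 0, 0],
         [0, 0, 9, 8],
         [0, 0, 7, 6],
         [0, 0, 0, 0]]
template = [[9, 8],
            [7, 6]]
```

The four nested loops make the cost proportional to the image area times the template area, which is why the literature focuses on reducing the number of search positions. A plain SSD score is also neither scale- nor rotation-invariant, in line with the weaknesses listed in the comparison table above.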
The main factors affecting the speed of template matching are the search computation and the matching operations themselves. Appropriately reducing the number of search positions and the precision of the similarity computation can noticeably increase the speed of template matching, and this has become a focus in the field. Many studies improve the search algorithm by reducing the number of matching operations, for example by reducing the number of matching points on the template of the image to be detected, so that higher speed is achieved. Typical algorithms include the gain algorithm, genetic algorithms and so on. Each matching operation is based on template matching, so it is also necessary to improve the fundamental computation speed of template matching itself14.

The intuition underlying Nearest Neighbour classification is quite straightforward: examples are classified based on the class of their nearest neighbours. It is often useful to take more than one neighbour into account, so the technique is more commonly referred to as K-Nearest Neighbour (KNN) classification, where the k nearest neighbours are used to determine the class. Since the training examples are needed at run time, i.e. they need to be in memory at run time, it is sometimes also called memory-based classification. Because induction is delayed until run time, it is considered a lazy learning technique13.

Analysis of Similar Products and Literature
Oral Image to Voice Converter by Takaaki HASEGAWA and Keiichi OHTANI16
In this paper, the authors propose a new speech communication system, the "Image Input Microphone", that converts an oral image into voice. The system synthesises the voice from the oral image alone. It provides high security and is not affected by acoustic noise, because actual utterance is not always necessary as input. Moreover, since the voice is synthesised without recognition, the system is independent of language. Simulations converting oral images of the five Japanese vowels into voice were carried out as a basic investigation.
A vocal tract area function is estimated from the oral image, and a PARCOR synthesis filter is obtained from the vocal tract area function. The PARCOR synthesis filter is driven by an impulse train. The performance of the system was evaluated by listening tests of the synthesised voice. As a result, audible voice was synthesised and the mean recognition rate of the five Japanese vowels was 91%.

The paper describes a system that converts an oral image into voice, taking into account the human ability to lip-read. In the proposed system, the voice is synthesised directly from the oral image without recognition, and actual utterance is not always necessary as input. The authors use both tongue features and lip features obtained from the oral image. Therefore, the system is not affected by acoustic noise and, at the same time, provides high security because no utterance needs to be input.

The system uses the vocal tract area function, which is equivalent to the transfer function of the vocal tract, as its parameter; synthesis is indirect, via the vocal tract area function. The vocal tract area function is obtained from PARCOR analysis of speech signals, and speech signals are synthesised by the inverse process of PARCOR analysis. Therefore, if the vocal tract area function can be estimated from oral image signals, the oral image can be converted into the corresponding voice.
Humans produce various sounds by changing the shape of the vocal tract, and the articulators move not individually but cooperatively during utterance. It is generally known that information about articulation can be obtained from lip-reading.

Software Comparison
The table below compares two software options, MATLAB and C++.

Table 6: Comparison of software between MATLAB and C++17
1. MATLAB. Strengths: easy to learn; fast numerical algorithms; inexpensive software; fast development. Weaknesses: slow processing; complex computation.
2. C++. Strengths: mature standard; large community; fast. Weaknesses: complex computation; difficult to debug; low-level programming.

From Table 6, it can be seen that both types of software have their own strengths and weaknesses. MATLAB is widely used in the image processing and computer vision community. Many image analysis functions are built into it, which makes it a very useful image analysis tool for end users. C++ offers the Standard Template Library (STL) as well as libraries for computer graphics and image processing. One such template-based image library accepts all C++ built-in types as image data, although certain functions are only valid for a subset of the built-in types. MATLAB was selected because it suits the analysis requirements of this project. MATLAB version R2010b will be used to analyse image quality and performance in this project.

Project Methodology
This project is divided into hardware and software. The hardware consists of the webcam as the input and a speaker as the output.
The software uses MATLAB to recognise the image and convert it to sound with several image processing techniques.

Block Diagram
Figure 1: Block diagram of the Image to Voice converter (Webcam → MATLAB: image acquisition (acquire image), image pre-processing (median filtering), image segmentation (thresholding), image representation (roundness ratio), image classification (template matching using KNN), sound generation (WAV file) → Speaker).

The block diagram in Figure 1 shows the basic concept of the system that needs to be implemented. Based on the block diagram, the webcam is first activated. Then, an image of the object in front of the webcam is captured. After that, median filtering is performed in MATLAB as the image pre-processing step, removing unwanted signal or noise from the image. Next is image segmentation: based on the literature review, the most suitable method is Otsu's thresholding technique, which converts the grayscale image into a binary image. Then, the largest object is found and represented using the roundness ratio, which is compared with the template ratios to determine the nearest one. The next step is image classification, using template matching with KNN to find the part of the image that matches the template image. After matching is done, the system automatically generates a sound from a WAV file on the computer.

Flow Chart
Figure 2: Flow chart of the Image to Voice converter (Start → acquire image from webcam → perform median filtering → colour space conversion → thresholding using Otsu → image labelling → find the largest object → image representation (roundness ratio) → template matching using KNN → Is the image matched? No: return to image acquisition; Yes: generate sound).

Based on Figure 2, before image recognition begins, an image is first acquired from the webcam. The acquired image then goes through an image enhancement step, in which median filtering removes unwanted noise and sharpens the image.
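The median filtering step can be compared with the mean filter discussed in the literature review using a small Python sketch (the project uses MATLAB). Borders are handled by replicating the edge pixels, which is an assumption; zero padding is another common choice. The example demonstrates the two properties claimed earlier: the median removes an isolated impulse ("salt") pixel and leaves a sharp step edge intact, whereas the mean spreads the impulse and blurs the edge.

```python
def _window(image, r, c, size):
    """Values in the size x size window centred on (r, c); edges are replicated."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    return [
        image[min(max(r + i, 0), rows - 1)][min(max(c + j, 0), cols - 1)]
        for i in range(-half, half + 1)
        for j in range(-half, half + 1)
    ]


def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighbourhood (size odd)."""
    rows, cols = len(image), len(image[0])
    return [
        [sorted(_window(image, r, c, size))[size * size // 2] for c in range(cols)]
        for r in range(rows)
    ]


def mean_filter(image, size=3):
    """Replace each pixel with the rounded mean of its size x size neighbourhood."""
    rows, cols = len(image), len(image[0])
    return [
        [round(sum(_window(image, r, c, size)) / size ** 2) for c in range(cols)]
        for r in range(rows)
    ]


salt = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]    # one impulse-noise pixel
step = [[0, 0, 0, 255, 255, 255] for _ in range(3)]   # a sharp vertical edge
```

Here `median_filter(salt)` is a uniform block of 10s, while `mean_filter(salt)` leaves a value of 37 at the centre; `median_filter(step)` returns the step unchanged, while the mean turns the last dark column into 85.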
After that, colour space conversion is performed, converting the image from one colour space to another, e.g. RGB, HSV or YCbCr. The purpose of the colour space conversion is to keep the converted image as close as possible to the original image. Next, Otsu's thresholding method is applied, calculating a measure of spread for the pixel levels on each side of the threshold; this separates the objects from the background. Once thresholding is done, image labelling is performed by tracing the outlines in the image and labelling them as occluding the background. After that, the largest object is found and represented using the roundness ratio to determine which object is most similar to the template ratio. Then, template matching is performed to find a match between the template and a portion of the image, and the KNN method is used to find the database template that most closely matches the object. If the data matches, a sound is generated automatically by loading the corresponding WAV file from the computer or laptop with MATLAB, and the procedure then repeats from the first step. If the data does not match, no sound is generated and the procedure returns to the first step, repeating until a match is found.

Project Methods
Median Filter
Median filters are nonlinear rank-order filters that replace each element of the source vector with the median value taken over a fixed neighbourhood of the processed element. These filters are widely used in image and signal processing applications. The purpose of median filtering is to remove impulsive noise while keeping signal blurring to a minimum18.

Otsu Method
Otsu's method is a widely used segmentation method, also known as the maximum inter-class (between-class) variance method or the minimum intra-class (within-class) variance method.
It iterates through all possible threshold values and calculates a measure of spread for the pixel levels on each side of the threshold, i.e. the pixels that fall in either the foreground or the background. The aim is to find the threshold value at which the sum of the foreground and background spreads is at its minimum11.

Roundness Ratio/Circularity
Roundness is defined as a condition of a surface of revolution, such as a cylinder, cone or sphere, in which all points of the surface intersected by any plane perpendicular to a common axis (in the case of a cylinder or cone) are equidistant from that axis. Since the axis and centre do not exist physically, measurements have to be made with reference to the surfaces of the figures of revolution only. Measuring roundness amounts to determining the circularity of the contour12.

Template Matching
The classical template matching method is characterised by a simple mechanism and high detection accuracy, and it is used as a general framework for model evaluation and error estimation. Therefore, it plays a very important role in image processing and is widely used in object detection and recognition. It is a technique for finding small parts of an image that match a database image14.

K-Nearest Neighbour (KNN)
K-Nearest Neighbour (KNN) belongs to a family of simple classification and regression algorithms. It is a lazy method: it does not use the training data points to do any generalisation. Although classification is the primary application of KNN, it can also be used for density estimation.
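The KNN vote described above can be sketched in Python as follows (the project uses MATLAB). The feature vectors are hypothetical, for example a roundness value and an aspect ratio per object; the essay does not specify which features are fed to KNN. Distances are Euclidean (`math.dist`, Python 3.8+), and ties in the vote go to the label of the nearer neighbour.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    train: list of (feature_vector, label) pairs; query: a feature vector.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    # most_common keeps first-seen order among equal counts, so a tie goes
    # to the label of the nearer neighbour.
    return votes.most_common(1)[0][0]


# Hypothetical (roundness, aspect ratio) features for stored templates.
train = [
    ((0.95, 1.0), "circle"),
    ((0.90, 1.1), "circle"),
    ((0.78, 1.0), "square"),
    ((0.76, 1.05), "square"),
    ((0.40, 3.0), "rectangle"),
]
```

All training examples are kept and searched at query time, which is exactly the "lazy", memory-based behaviour the literature review describes.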
Since KNN is non-parametric, it can be applied to arbitrary data distributions19.

Project Specification
This project is divided into three main sections: hardware, software and estimated project cost.

Hardware
The hardware used in this project is the Logitech HD Webcam C310. Its basic system requirements are listed below.
Figure 3: Logitech HD Webcam C31020
Windows Vista, Windows 7 (32-bit or 64-bit) or Windows 8
1 GHz processor
512 MB RAM or more
200 MB hard drive space
Internet connection
USB 1.1 port (2.0 recommended)

Software
The software for this project is MATLAB, used for image recognition and sound generation.

Project Estimated Cost
The estimated cost of this project is RM89, the price of the Logitech HD Webcam C310, because the project is basically software-based and the MATLAB software comes from the college engineering lab.

Gantt Chart