Data & Algorithm (The Stages of Data Processing)

Data Processing :-


    

Data processing is the process of converting raw facts or data into meaningful information. In other words, everyone is familiar with the term "word processing," but computers were really developed for "data processing" - the organization and manipulation of large amounts of numeric data, or in computer jargon, "number crunching." Some examples of data processing are the calculation of satellite orbits, weather forecasting, and statistical analyses, and, in a more practical sense, business applications such as accounting, payroll, and billing.


Since the beginning of time, people have sought ways to help with the computing, handling, merging, and sorting of numeric data. Think of all the labour that Bob Cratchit performed when keeping track of Ebenezer Scrooge's figures and accounts. Certainly Cratchit wished for an easier approach, and undoubtedly Mr. Scrooge longed for a more accurate method of keeping track of his accounts.

Stages of Data Processing :-

There are six main stages in the data processing cycle :

1. Data Collection

2. Data Preparation

3. Data Input

4. Data Processing

5. Data Output / Interpretation

6. Data Storage

Figure ii : Cycle of stages in data processing
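The six stages listed above can be sketched as a chain of small functions; the record values and the summary produced below are purely illustrative, not part of any real dataset.

```python
# A toy sketch of the data processing cycle: each stage is a small
# function whose output feeds the next stage.

def collect():                      # 1. Data Collection: gather raw records
    return ["12", "7", "", "19", "oops", "4"]

def prepare(raw):                   # 2. Data Preparation: drop non-numeric entries
    return [r for r in raw if r.isdigit()]

def to_input(clean):                # 3. Data Input: convert to machine-readable form
    return [int(r) for r in clean]

def process(values):                # 4. Data Processing: compute a summary
    return {"count": len(values), "total": sum(values)}

def output(result):                 # 5. Data Output: present in readable form
    return f"{result['count']} values, total {result['total']}"

def store(result, db):              # 6. Data Storage: keep for the next cycle
    db.append(result)
    return db

db = []
result = process(to_input(prepare(collect())))
print(output(result))   # 4 values, total 42
store(result, db)
```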


1. Data Collection :-  

i. The collection of raw data is the first step of the data processing cycle.

ii. The type of raw data collected has a huge impact on the output produced.

iii. It is important that the raw data sources are trustworthy and well-built, so that the data collected is of the highest possible quality.

iv. Raw data can include monetary figures, website cookies, profit/loss statements of a company, user behaviour, etc.

2. Data Preparation :-

i. Data preparation or data cleaning is the process of sorting and filtering the raw data to remove unnecessary and inaccurate data.

ii. Raw data is checked for errors and then transformed into a suitable form for future analysis and processing.

iii. This is done to ensure that only the highest quality data is fed into the processing unit.
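The sorting and filtering described above can be sketched as a simple validity check over hypothetical sensor records; the field names and the valid temperature range are assumptions made for illustration.

```python
# Sketch of data preparation: reject records that are missing a value
# or hold an implausible one, keeping only high-quality data.

raw_readings = [
    {"sensor": "A", "temp": 21.5},
    {"sensor": "B", "temp": None},    # missing value  -> rejected
    {"sensor": "C", "temp": 999.0},   # implausible    -> rejected
    {"sensor": "D", "temp": 19.8},
]

def is_valid(record, low=-50.0, high=60.0):
    temp = record.get("temp")
    return temp is not None and low <= temp <= high

prepared = [r for r in raw_readings if is_valid(r)]
print([r["sensor"] for r in prepared])   # ['A', 'D']
```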

3. Data Input :-

i. In this step, the raw data is converted into machine-readable form and fed into the processing unit.

ii. Data input is the first stage in which raw data begins to take the form of usable information.

iii. This can be in the form of data entry through a keyboard, scanner or any other input source.

4. Data Processing :-

i. During this stage, the data inputted in the previous stage is actually processed for interpretation.

ii. The raw data is subjected to various data processing methods using machine learning and artificial intelligence algorithms to generate a desirable output.

iii. The process itself may vary slightly depending on the source of data being processed and its intended use.

5. Data Output / Interpretation :-

i. In this stage, the processed data is transmitted and displayed to the user in a readable form such as graphs, tables, or documents, so that it can be interpreted and acted upon.

6. Data Storage :-

i. The final stage of data processing is storage. After all of the data is processed, it is stored for future use.
ii. This allows for quick access and retrieval of information whenever needed, and also allows it to be used as input in the next data processing cycle directly.
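Storing a processed result so a later cycle can retrieve it can be sketched with the stdlib `json` module; the result dictionary and filename here are purely illustrative.

```python
# Sketch of the storage stage: persist a processed result to disk, then
# reload it as if a later data processing cycle were retrieving it.
import json
import os
import tempfile

result = {"month": "May", "total_sales": 1250}

path = os.path.join(tempfile.gettempdir(), "processed_result.json")
with open(path, "w") as f:
    json.dump(result, f)          # storage

with open(path) as f:
    reloaded = json.load(f)       # quick retrieval in a later cycle

print(reloaded == result)   # True
```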

Forms of Data Processing :-

Different forms of data processing are :-

1. Data Cleaning
2. Data Integration
3. Data Transformation
       a) Smoothing
       b) Aggregation
       c) Generalization
       d) Normalization
             i. Min-max normalization
             ii. z-score normalization
       e) Attribute/feature construction
4. Data Reduction

1. Data Cleaning :-

Data cleaning is the process of removing noisy data, filling in missing values, and correcting inconsistencies in the data.
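Both cleaning tasks mentioned above can be sketched in a few lines; the ages, city labels, and the choice of mean imputation are assumptions for illustration.

```python
# Sketch of data cleaning: fill missing values with the column mean,
# and correct inconsistent spellings of the same category.
from statistics import mean

ages = [25, None, 40, None, 35]
known = [a for a in ages if a is not None]
fill = mean(known)                                   # mean of known ages
cleaned_ages = [a if a is not None else fill for a in ages]

cities = ["NYC", "new york", "NYC"]
canonical = {"new york": "NYC"}                      # map inconsistent labels
cleaned_cities = [canonical.get(c, c) for c in cities]

print(cleaned_cities)   # ['NYC', 'NYC', 'NYC']
```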

2. Data Integration :-

Data integration is a technique that combines the data from multiple heterogeneous data sources into a coherent data store. Data integration may involve inconsistent data and therefore needs data cleaning.
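Combining heterogeneous sources into one coherent store can be sketched as a merge keyed by a shared identifier; the two sources, their field names, and the customer ids are hypothetical.

```python
# Sketch of data integration: two sources keyed by customer id are
# merged into a single coherent record per customer.

crm = {101: {"name": "Asha"}, 102: {"name": "Ravi"}}
billing = {101: {"balance": 250.0}, 103: {"balance": 80.0}}

integrated = {}
for key in crm.keys() | billing.keys():   # union of all customer ids
    record = {}
    record.update(crm.get(key, {}))
    record.update(billing.get(key, {}))
    integrated[key] = record

print(integrated[101])   # {'name': 'Asha', 'balance': 250.0}
```

Note that customer 103 ends up with no name field: exactly the kind of inconsistency that, as the text says, calls for a data cleaning pass after integration.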

3. Data Transformation :-

 In this step, data is transformed or consolidated into forms appropriate for mining, by performing summary or aggregation operations. It involves the following:

a). Smoothing :-

Smoothing is a process of removing noise from data.
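One common way to smooth a noisy series is a moving average; the series and the window size of 3 below are arbitrary choices for illustration.

```python
# Sketch of smoothing: a simple moving average dampens noise
# by averaging each value with its neighbours.

def moving_average(series, window=3):
    out = []
    for i in range(len(series) - window + 1):
        out.append(sum(series[i:i + window]) / window)
    return out

noisy = [10, 14, 9, 13, 11, 15]
print(moving_average(noisy))   # [11.0, 12.0, 11.0, 13.0]
```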

b). Aggregation :-

 Aggregation is a process where summary or aggregation operations are applied to the data.
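Aggregation can be sketched as summing individual rows into one total per group; the region names and sales figures are invented for the example.

```python
# Sketch of aggregation: individual sales rows are summarised
# into one total per region.
from collections import defaultdict

sales = [("north", 100), ("south", 80), ("north", 50), ("south", 20)]

totals = defaultdict(int)
for region, amount in sales:
    totals[region] += amount

print(dict(totals))   # {'north': 150, 'south': 100}
```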

c). Generalization :-

In generalization, low-level data are replaced with high-level data by climbing a concept hierarchy.
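Climbing one level of a concept hierarchy can be sketched as a simple mapping; the city-to-country hierarchy here is a tiny illustrative example.

```python
# Sketch of generalization: low-level values (cities) are replaced by
# the next level up in the concept hierarchy (countries).

hierarchy = {"Mumbai": "India", "Delhi": "India", "Tokyo": "Japan"}

records = ["Mumbai", "Tokyo", "Delhi"]
generalized = [hierarchy[city] for city in records]
print(generalized)   # ['India', 'Japan', 'India']
```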

d). Normalization:-

Normalization scales attribute data so that it falls within a small specified range, such as 0.0 to 1.0. It is of two types :

i. Min-max normalization : Rescales the data linearly so that it lies between 0 and 1.

ii. z-score normalization : Transforms the data by converting the values to a common scale with a mean of zero and a standard deviation of one.
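Both normalization methods can be sketched on a toy column of values; the numbers are arbitrary, and the z-score version uses the population standard deviation.

```python
# Sketch of the two normalization methods described above.
from statistics import mean, pstdev

values = [2.0, 4.0, 6.0, 8.0]

# i. Min-max normalization: scale into the range 0.0 to 1.0
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# ii. z-score normalization: shift to mean 0, scale to std deviation 1
mu, sigma = mean(values), pstdev(values)
z_scores = [(v - mu) / sigma for v in values]

print(min_max)     # first value is 0.0, last is 1.0
print(z_scores)    # mean ~0, standard deviation ~1
```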

e). Attribute/feature construction :-

New attributes are constructed from the given set of attributes to help the mining process.
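Constructing a new attribute from given ones can be sketched as deriving an area from width and height; the field names and figures are hypothetical.

```python
# Sketch of attribute construction: a new "area" attribute is built
# from the given "width" and "height" attributes.

rooms = [{"width": 3.0, "height": 4.0}, {"width": 2.5, "height": 2.0}]

for room in rooms:
    room["area"] = room["width"] * room["height"]

print([room["area"] for room in rooms])   # [12.0, 5.0]
```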

4. Data reduction :-

Data reduction is used to obtain a reduced representation of the data that is much smaller in volume, while maintaining the integrity of the original data.
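One simple form of data reduction is systematic sampling; the dataset of 100 records and the choice of keeping every 10th record are arbitrary for the sketch.

```python
# Sketch of data reduction by sampling: keep every k-th record so the
# reduced set is far smaller but still spans the original data.

data = list(range(100))        # 100 records
k = 10
reduced = data[::k]            # keep every 10th record

print(len(reduced), reduced[:3])   # 10 [0, 10, 20]
```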

       
Thank You ☺☺☺
