KEYWORDS: Data mining, Data modeling, Mining, Data conversion, Databases, Structural design, Data centers, Data analysis, Data storage, Data integration
Data warehousing and data mining are among the most active research topics in information technology. Both
are now widely applied in commerce, finance, manufacturing, and marketing, but comparatively little in
education. Over the years, colleges and universities have accumulated large amounts of teaching and
management data that cannot be used effectively. Given the social demands on university development and the
current state of data management, it is particularly important to establish a data warehouse of university
teaching state, make better use of existing data, and on that basis apply a higher level of processing: data
mining. Starting from decision-making needs, this paper designs the structure of a data warehouse for
university teaching state, builds the data warehouse model through structure design and data extraction,
conversion, and loading, and finally applies an association rule mining algorithm to obtain results that can be
used in practice. The data analysis and mining yield valuable information that can guide teaching
management, thereby improving teaching quality, promoting commitment to teaching in universities, and
strengthening teaching infrastructure. The same information can also provide detailed, multi-dimensional
input for university assessment and higher education research.
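The abstract names association rule mining but does not say which algorithm or data it uses. As a minimal sketch of the two measures such algorithms rely on, support and confidence, the following assumes a toy set of teaching records; the attribute names, thresholds, and the restriction to single-item rules are all hypothetical.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def association_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    items = sorted({item for t in transactions for item in t})
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue  # the pair is not frequent, so no rule is built from it
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical records: one set of attributes per course offering.
records = [
    frozenset({"small_class", "high_rating"}),
    frozenset({"small_class", "high_rating", "senior_teacher"}),
    frozenset({"large_class", "senior_teacher"}),
    frozenset({"small_class", "high_rating"}),
]
rules = association_rules(records, min_support=0.5, min_confidence=0.9)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

A production miner would enumerate larger itemsets level by level; the pairwise loop here is only meant to make the two thresholds concrete.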
With the wide application of databases and the rapid development of the Internet, the capacity to generate
and collect data using information technology has improved greatly. Mining useful information or knowledge
from large databases or data warehouses has become an urgent problem, and data mining (DM) technology has
developed rapidly to meet this need. However, DM often faces large volumes of noisy, disordered, and
nonlinear data. Artificial neural networks (ANNs) are well suited to these problems because of their
robustness, adaptability, parallel processing, distributed memory, and high fault tolerance.
Based on an analysis of various data mining technologies, this paper discusses in detail the application of
ANN methods to DM, with emphasis on classification based on RBF (radial basis function) neural networks.
Pattern classification is an important application of RBF neural networks. In an on-line environment the
training dataset changes over time, so batch learning algorithms (e.g. OLS), which trigger a great deal of
unnecessary retraining, are inefficient. This paper derives an incremental learning algorithm (ILA) from the
gradient descent algorithm to remove this bottleneck. ILA adaptively adjusts the parameters of the RBF
network by minimizing the error cost, without any redundant retraining. Using the proposed method, an
on-line classification system was constructed to solve the IRIS classification problem. Experimental results
show that the algorithm converges quickly and delivers excellent on-line classification performance.
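The abstract does not give the ILA update equations. As a hedged illustration of the general idea, one gradient descent step per incoming sample instead of batch retraining, the sketch below updates the output weights and centers of a single-output Gaussian RBF network on the squared error. The class name, fixed width, learning rate, and the one-dimensional toy stream are assumptions and are not taken from the paper.

```python
import math

class OnlineRBF:
    """Single-output Gaussian RBF network trained one sample at a time."""

    def __init__(self, centers, width=1.0, lr=0.2):
        self.centers = [list(c) for c in centers]
        self.weights = [0.0] * len(self.centers)
        self.width = width  # shared Gaussian width, kept fixed in this sketch
        self.lr = lr

    def _activations(self, x):
        acts = []
        for c in self.centers:
            dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            acts.append(math.exp(-dist2 / (2 * self.width ** 2)))
        return acts

    def predict(self, x):
        return sum(w * a for w, a in zip(self.weights, self._activations(x)))

    def update(self, x, y):
        """One gradient step on E = 0.5 * (y - f(x))**2 for a single sample."""
        acts = self._activations(x)
        error = y - sum(w * a for w, a in zip(self.weights, acts))
        for j, a in enumerate(acts):
            w_old = self.weights[j]
            # dE/dw_j = -error * a
            self.weights[j] += self.lr * error * a
            # dE/dc_j = -error * w_j * a * (x - c_j) / width**2
            scale = self.lr * error * w_old * a / self.width ** 2
            self.centers[j] = [ci + scale * (xi - ci)
                               for xi, ci in zip(x, self.centers[j])]
        return error

# Hypothetical two-class stream: points near 0 are labelled -1, near 3 are +1.
model = OnlineRBF(centers=[[0.0], [3.0]])
for _ in range(200):
    model.update([0.0], -1.0)
    model.update([3.0], +1.0)
print(model.predict([0.0]), model.predict([3.0]))
```

Each call to `update` touches only the current sample, so new data can be absorbed without revisiting earlier samples, which is the property the abstract contrasts with batch algorithms.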
KEYWORDS: Data mining, Databases, Mining, Data processing, Data storage, Algorithms, Surface plasmons, Structural design, Computer simulations, Detection and tracking algorithms
Sequential pattern mining is the mining of frequent sequences, ordered by time or by another ordering, from a
sequence database. Its initial motivation was to discover regularities in customer purchasing over a period
of time by finding frequent sequences. In recent years sequential pattern mining has become an important
direction in data mining; its applications are no longer confined to business databases and now extend to
new data sources such as the Web and to advanced scientific fields such as DNA analysis.
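To make the notion of a frequent sequence concrete, the following minimal sketch computes the support of a pattern in a small sequence database. It assumes a simplified setting in which each element of a sequence is a single item rather than an item set; the function names and the purchase data are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)

def sequence_support(database, pattern):
    """Fraction of sequences in the database that contain pattern."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)

# Hypothetical purchase histories, one list per customer, oldest purchase first.
purchases = [
    ["bread", "milk", "eggs"],
    ["milk", "bread", "eggs"],
    ["bread", "eggs"],
    ["milk"],
]
print(sequence_support(purchases, ["bread", "eggs"]))  # 0.75
print(sequence_support(purchases, ["milk", "bread"]))  # 0.25
```

A pattern is called frequent when its support reaches a user-chosen minimum threshold.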
The data used in sequential pattern mining has two characteristics: massive volume and distributed storage.
Most existing sequential pattern mining algorithms do not consider these characteristics together. Based on
these characteristics and on parallel computing theory, this paper puts forward a new distributed parallel
algorithm, SPP (Sequential Pattern Parallel). The algorithm follows the principle of pattern reduction and
uses a divide-and-conquer strategy for parallelization. The first parallel task constructs frequent item sets
by applying the frequent concept and search space partition theory; the second task builds frequent
sequences using depth-first search at each processor. The algorithm accesses the database only twice and
does not generate candidate sequences, which reduces access time and improves mining efficiency.
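The abstract does not specify how SPP partitions its work, so the following is not the SPP algorithm. It is only a sketch of the divide-and-merge idea behind a first pass over distributed data: each partition counts items locally and the local counts are then combined. The function names and data are hypothetical, and the partitions are processed in a plain loop rather than on separate processors.

```python
from collections import Counter

def local_item_counts(partition):
    """Count, within one partition, how many sequences contain each item."""
    counts = Counter()
    for sequence in partition:
        counts.update(set(sequence))  # each item counts once per sequence
    return counts

def frequent_items(partitions, min_count):
    """Merge per-partition counts and keep items meeting the global threshold."""
    total = Counter()
    for partition in partitions:  # in a parallel setting, one worker per partition
        total += local_item_counts(partition)
    return {item for item, n in total.items() if n >= min_count}

# Hypothetical database split across two sites.
site_a = [["a", "b", "c"], ["a", "c"]]
site_b = [["b", "c"], ["a", "d", "c"]]
print(sorted(frequent_items([site_a, site_b], min_count=3)))  # ['a', 'c']
```

Only the small count tables cross partition boundaries, which is why this style of first pass suits data that is both large and stored in several places.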
Using a random data generation procedure and several purpose-designed information structures, this paper
simulated the SPP algorithm in a concrete parallel environment and also implemented the AprioriAll
algorithm. The experiments demonstrate that, compared with AprioriAll, the SPP algorithm achieves excellent
speedup and efficiency.
A data warehouse is a repository of information collected from multiple, possibly heterogeneous, autonomous,
distributed databases. The information stored in the data warehouse takes the form of views, referred to as
materialized views. Selecting the views to materialize is one of the most important decisions in designing a
data warehouse. Materialized views are stored in the data warehouse to implement on-line analytical
processing queries efficiently. The first issue for the user to consider is query response time. In this
paper we therefore develop algorithms that select a set of views to materialize in a data warehouse so as to
minimize the total view maintenance cost under the constraint of a given query response time. We call this
the query_cost view_selection problem.
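The problem statement above can be made concrete with a small example. The sketch below assumes a hypothetical additive cost model, which is not the cost graph used in the paper: each view has a maintenance cost, each query is answered from the cheapest materialized view that supports it (or from the base tables otherwise), and a view set is feasible when total query time stays within the limit. It searches all subsets exhaustively, which is only practical for a handful of views.

```python
from itertools import combinations

def query_time(selected, query_costs, base_cost):
    """Total response time when each query uses its cheapest available source."""
    total = 0
    for query, per_view in query_costs.items():
        usable = [cost for view, cost in per_view.items() if view in selected]
        total += min(usable + [base_cost[query]])
    return total

def best_view_set(views, maintenance, query_costs, base_cost, time_limit):
    """Feasible view set with minimum maintenance cost, or None if none exists."""
    best = None
    for size in range(len(views) + 1):
        for subset in combinations(views, size):
            chosen = set(subset)
            if query_time(chosen, query_costs, base_cost) > time_limit:
                continue  # violates the response-time constraint
            cost = sum(maintenance[v] for v in chosen)
            if best is None or cost < best[1]:
                best = (chosen, cost)
    return best

# Hypothetical instance: three candidate views and two queries.
views = ["v1", "v2", "v3"]
maintenance = {"v1": 4, "v2": 3, "v3": 6}
base_cost = {"q1": 10, "q2": 10}
query_costs = {"q1": {"v1": 2, "v3": 3}, "q2": {"v2": 2, "v3": 3}}
print(best_view_set(views, maintenance, query_costs, base_cost, time_limit=8))
```

In this instance the single shared view `v3` satisfies the time limit at a lower maintenance cost than the pair `v1`, `v2`, even though the pair answers the queries faster.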
First, the cost graph and cost model of the query_cost view_selection problem are presented. Second, methods
for selecting materialized views with randomized algorithms are presented. A genetic algorithm is applied to
the materialized view selection problem. As the genetic process evolves, however, legal solutions become
increasingly difficult to produce, so many solutions are eliminated and the time needed to produce solutions
grows. This paper therefore presents an improved algorithm that combines simulated annealing with the
genetic algorithm to solve the query_cost view_selection problem. Finally, simulation experiments are used
to test the function and efficiency of our algorithms. The experiments show that the given methods provide
near-optimal solutions in limited time and work better in practical cases. Randomized algorithms will become
invaluable tools for data warehouse evolution.
KEYWORDS: Data modeling, Databases, Data mining, Systems modeling, Data storage, Data conversion, Data integration, Data analysis, Data processing, Decision support systems
Interest in analyzing data has grown tremendously in recent years. Analyzing data requires a multitude of technologies, namely those from the fields of data warehousing, data mining, and On-line Analytical Processing (OLAP). This paper proposes a system structure model of the data warehouse for the modern enterprise environment, based on the information demands of enterprises and the actual demands of users. It also analyzes the benefits of this model in practical application, describes the process of setting up the data warehouse model, and proposes an overall design plan for the data warehouses of modern enterprises. The data warehouse that we built in a practical application offers high query performance, efficient data handling, and independence of logical and physical data. In addition, a data warehouse contains many materialized views over the data provided by distributed heterogeneous databases, for the purpose of efficiently implementing decision-support queries, OLAP queries, or data mining. One of the most important decisions in designing a data warehouse is selecting the right views to materialize. In this paper we also design algorithms for selecting a set of views to materialize in a data warehouse. First, we give the algorithms for selecting materialized views. Then we use experiments to demonstrate the power of our approach. The results show that the proposed algorithm delivers an optimal solution. Finally, we discuss the advantages and shortcomings of our approach and future work.
Selecting views to materialize affects both the efficiency and the total cost of establishing and running a data warehouse, and it is one of the most important decisions in data warehouse design. The problem is usually stated as selecting a set of views that minimizes total query response time and view maintenance cost under a storage space constraint. In our practical application, however, the factor that prevents us from materializing all views in the data warehouse is not the space constraint but query response time, because queries may require fast answers. We therefore develop algorithms that select a set of views to materialize in a data warehouse so as to minimize the total view maintenance time under the constraint of a given query response time. We call this the query-cost view selection problem. First, we design algorithms for the query-cost view selection problem and introduce a view node matrix to solve it. Second, we use experiments to demonstrate the power of our approach. The results show that our algorithm works better in practical cases. We implemented our algorithms, and a performance study shows that the proposed algorithm delivers an optimal solution. Finally, we discuss the observed behavior of the algorithms and identify some important issues for future investigation.
KEYWORDS: Databases, Data mining, Data modeling, Data storage, Data processing, Algorithm development, Computer networks, Data integration, Knowledge discovery, Artificial intelligence
Interest in analyzing data has grown tremendously in recent years. Analyzing data requires a multitude of technologies, namely those from the fields of data warehousing, data mining, and On-line Analytical Processing (OLAP). This paper presents a new data warehouse architecture for CIMS, based on the CRGC-CIMS application engineering project. The data source of this architecture is the database of the CRGC-CIMS system. The data is placed in a global data set by extraction, filtering, and integration, and is then transferred to the data warehouse according to information requests. We describe two advantages of the new model in the CRGC-CIMS application. In addition, a data warehouse contains many materialized views over the data provided by distributed heterogeneous databases, for the purpose of efficiently implementing decision-support queries, OLAP queries, or data mining. It is important to select the right views to materialize in order to answer a given set of queries. In this paper we also design algorithms for selecting a set of views to materialize in a data warehouse so as to answer the most queries under a given space constraint. First, we give a cost model for selecting materialized views. Then we give algorithms that use a stepwise, bottom-up recursive method, and we describe the algorithms and their implementation. Finally, we discuss the advantages and shortcomings of our approach and future work.
A data warehouse contains many materialized views over the data provided by distributed heterogeneous databases, for the purpose of efficiently implementing decision-support or OLAP queries. It is important to select the right views to materialize in order to answer a given set of queries. In this paper we design an algorithm that selects a set of views to materialize so as to answer the most queries under a given space constraint. The algorithm aims to identify a minimum set of views with which as many user query requests as possible can be answered directly. We use experiments to demonstrate our approach. The results show that our algorithm works better. We implemented the algorithm, and a performance study shows that it has lower complexity, higher speed, and feasible scalability.
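The abstract does not describe the selection algorithm itself. As an illustration of the stated objective only, the sketch below uses a plain greedy heuristic, swapped in for the paper's method: it repeatedly picks the view that answers the most not-yet-covered queries per unit of space until nothing else fits. The view sizes, query sets, and function name are hypothetical.

```python
def greedy_select(view_size, view_answers, space_limit):
    """Greedily pick views maximizing newly answered queries per unit of space.

    view_size maps each view to a positive size; view_answers maps each view
    to the set of queries it can answer directly.
    """
    selected, covered, used = [], set(), 0
    remaining = set(view_size)
    while True:
        best, best_ratio = None, 0.0
        for view in sorted(remaining):  # sorted for a deterministic tie-break
            if used + view_size[view] > space_limit:
                continue  # does not fit in the remaining space
            gain = len(view_answers[view] - covered)
            ratio = gain / view_size[view]
            if ratio > best_ratio:
                best, best_ratio = view, ratio
        if best is None:
            break  # nothing fits, or nothing answers a new query
        selected.append(best)
        covered |= view_answers[best]
        used += view_size[best]
        remaining.discard(best)
    return selected, covered

# Hypothetical candidate views with their sizes and the queries they answer.
view_size = {"v1": 4, "v2": 2, "v3": 5}
view_answers = {
    "v1": {"q1", "q2"},
    "v2": {"q3"},
    "v3": {"q1", "q2", "q3", "q4"},
}
chosen, answered = greedy_select(view_size, view_answers, space_limit=6)
print(chosen, sorted(answered))
```

A greedy pass like this is fast but carries no optimality guarantee, so it should be read as a baseline rather than as the algorithm evaluated in the paper.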
A data warehouse contains many materialized views over the data provided by distributed heterogeneous databases, for the purpose of efficiently implementing decision-support or OLAP queries. It is important to select the right views to materialize in order to answer a given set of queries. The goal is to minimize the combined cost of query evaluation and view maintenance. In this paper we design algorithms for selecting a set of views to materialize so that the sum of the costs of processing a set of queries and maintaining the materialized views is minimized. We develop an approach that uses simulated annealing to solve this problem. First, we explore simulated annealing algorithms to optimize the selection of materialized views. Then we use experiments to demonstrate our approach. The results show that our algorithm works better. We implemented our algorithms, and a performance study shows that the proposed algorithm gives an optimal solution.
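As a minimal sketch of how simulated annealing can be applied to this objective, the code below flips one view in or out of the current set per step and accepts worse sets with a probability that shrinks as the temperature falls. The cost model, neighbor move, cooling schedule, and all parameter values are assumptions chosen for illustration and are not taken from the paper.

```python
import math
import random

def total_cost(selected, maintenance, query_costs, base_cost):
    """Query processing cost plus maintenance cost of the selected views."""
    query = 0
    for q, per_view in query_costs.items():
        usable = [cost for view, cost in per_view.items() if view in selected]
        query += min(usable + [base_cost[q]])
    return query + sum(maintenance[v] for v in selected)

def anneal(views, cost_fn, steps=2000, start_temp=10.0, cooling=0.995, seed=0):
    """Return the best view set seen and its cost."""
    rng = random.Random(seed)
    current = frozenset()
    current_cost = cost_fn(current)
    best, best_cost = current, current_cost
    temp = start_temp
    for _ in range(steps):
        view = rng.choice(views)
        neighbor = current ^ {view}  # toggle one view in or out
        neighbor_cost = cost_fn(neighbor)
        delta = neighbor_cost - current_cost
        # Always accept improvements; accept worse moves with prob. exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = neighbor, neighbor_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost

# Hypothetical instance: three candidate views and two queries.
views = ["v1", "v2", "v3"]
maintenance = {"v1": 4, "v2": 3, "v3": 6}
base_cost = {"q1": 10, "q2": 10}
query_costs = {"q1": {"v1": 2, "v3": 3}, "q2": {"v2": 2, "v3": 3}}

def cost_fn(selected):
    return total_cost(selected, maintenance, query_costs, base_cost)

best_set, best_value = anneal(views, cost_fn)
print(sorted(best_set), best_value)
```

Simulated annealing is a heuristic: it tracks the best set it has visited but does not by itself prove that set optimal, so results on larger instances should be checked against a known bound or an exact method where feasible.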