All posts
-
Difference between Hadoop and Spark | DistributedSystem/HadoopEcyosystem | 2019-09-25 04:26
1. Overview
Clarify the difference between Hadoop and Spark.
2. Description
Difference between Hadoop and Spark

| Feature | Hadoop | Spark |
|---|---|---|
| Data processing | Batch processing only | Batch processing as well as real-time processing |
| Processing speed | Slower than Spark because of disk I/O latency | Up to 100x faster in memory and 10x faster when running on disk |
| Category | Data processing engine | Data analytics engine |
..
-
Term frequency–inverse document frequency (TF-IDF) | MLAI/Preprocessing | 2019-09-25 01:34
1. Overview
A document-term or term-document matrix records the frequencies of terms that occur in a collection of documents. In a document-term matrix, rows represent the documents in the collection and columns represent terms, whereas a term-document matrix is its transpose.
1.1 Motivation
We have a large number of documents, such as books, academic articles, legal documents, and websites. A user t..
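To make the matrix and the weighting concrete, here is a minimal TF-IDF sketch. The excerpt does not say which TF-IDF variant the post uses, so the choices here are assumptions: lowercase tokenization on non-letters, tf as the raw count, and idf(t) = ln(N / df(t)) without smoothing. The helper names are hypothetical.

```javascript
// Minimal TF-IDF sketch. Assumptions (not taken from the original post):
// - tokenization: lowercase, then split on anything that is not a letter a-z
// - tf(t, d) = raw count of term t in document d
// - idf(t) = ln(N / df(t)), where df(t) = number of documents containing t (no smoothing)
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
}

// Transpose a matrix (document-term <-> term-document)
const transpose = m => m[0].map((_, j) => m.map(row => row[j]));

function tfidf(docs) {
  const tokenized = docs.map(tokenize);
  // Columns of the document-term matrix: the sorted distinct terms
  const vocab = [...new Set(tokenized.flat())].sort();
  // Document-term matrix: one row per document, one column per term, raw counts
  const counts = tokenized.map(tokens =>
    vocab.map(term => tokens.filter(t => t === term).length)
  );
  const n = docs.length;
  // Document frequency: how many documents contain each term
  const df = vocab.map((_, j) => counts.filter(row => row[j] > 0).length);
  const idf = df.map(d => Math.log(n / d));
  // Weight each count by how rare its term is across the collection
  const weights = counts.map(row => row.map((c, j) => c * idf[j]));
  return { vocab, counts, idf, weights };
}

const docs = ["the cat sat", "the dog sat", "the cat ate the fish"];
const { vocab, counts, weights } = tfidf(docs);
const termDoc = transpose(counts); // rows = terms, columns = documents

console.log(vocab);                       // [ 'ate', 'cat', 'dog', 'fish', 'sat', 'the' ]
console.log(counts[2]);                   // [ 1, 1, 0, 1, 0, 2 ]
console.log(weights.map(row => row[5]));  // [ 0, 0, 0 ]  ("the" appears in every document)
```

The last line shows the point of the idf factor: a term that occurs in every document gets weight 0, however often it appears.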
-
Array Operations | DynamicPL/Javascript | 2019-09-22 09:05
1. Overview
Summarize array operations in JavaScript, such as pop, push, shift, unshift, splice, and slice, along with split (a String method that returns an array).
2. Description
    var a = [1, 2, 3];
    var b = a.unshift(0);  // unshift returns the new length
    console.log(a); //[0, 1, 2, 3]
    console.log(b); //4
    var a = [1, 2, 3];
    var b = a.shift();     // shift removes and returns the first element
    console.log(a); //[2, 3]
    console.log(b); //1
    var b = a.shift(2);
    console.log(a); //[3]
    console.log(b); //2, only one element is shifted; shift() ignores its argument
    var..
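The excerpt shows unshift and shift; the other operations listed in the overview can be summarized in the same style. This is a sketch of standard built-in behavior (Array.prototype.push, pop, slice, splice and String.prototype.split), not the rest of the original post.

```javascript
// push / pop: add or remove at the END of the array (both mutate it)
const a = [1, 2, 3];
const len = a.push(4);        // returns the new length
console.log(a, len);          // [ 1, 2, 3, 4 ] 4
const last = a.pop();         // removes and returns the last element
console.log(a, last);         // [ 1, 2, 3 ] 4

// slice(start, end): copies elements from index start up to (not including) end; no mutation
const c = [10, 20, 30, 40, 50];
const part = c.slice(1, 3);
console.log(part, c);         // [ 20, 30 ] [ 10, 20, 30, 40, 50 ]

// splice(start, deleteCount, ...items): removes/inserts IN PLACE, returns the removed elements
const removed = c.splice(1, 2, "x");
console.log(removed, c);      // [ 20, 30 ] [ 10, 'x', 40, 50 ]

// split is a String method; it returns a new array
const parts = "a,b,c".split(",");
console.log(parts);           // [ 'a', 'b', 'c' ]
```

The usual pitfall is slice versus splice: slice leaves the original array alone, while splice changes it and hands back what it removed.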
-
MongoDB | DB/Nosql | 2019-09-20 10:11
1. Overview
MongoDB is a cross-platform, document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public License (SSPL).
2. Description
2.1 Ad hoc queries
MongoDB supports field queries, range queries, and regular-expression searches. Queries can return specific fields of ..
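As a rough illustration of those three query kinds, the toy matcher below evaluates MongoDB-style query documents against an in-memory array. `matches` and `find` are hypothetical helpers, not the MongoDB driver; they support only plain field equality, the range operators `$gt`/`$gte`/`$lt`/`$lte` (real MongoDB operator names), and RegExp values. In the mongo shell the range query would be written as `db.users.find({ age: { $gte: 18, $lt: 30 } })`.

```javascript
// Toy in-memory matcher that mimics the *shape* of MongoDB query documents.
// Hypothetical helper, NOT the MongoDB driver: it only supports exact field
// equality, the range operators below, and RegExp values.
const RANGE_OPS = {
  $gt: (v, x) => v > x,
  $gte: (v, x) => v >= x,
  $lt: (v, x) => v < x,
  $lte: (v, x) => v <= x,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    // Regular-expression search on a string field
    if (cond instanceof RegExp) return typeof value === "string" && cond.test(value);
    // Range query: every operator in the condition must hold (e.g. $gte AND $lt)
    if (cond !== null && typeof cond === "object") {
      return Object.entries(cond).every(([op, arg]) => {
        const fn = RANGE_OPS[op];
        if (!fn) throw new Error(`unsupported operator ${op}`);
        return value !== undefined && fn(value, arg);
      });
    }
    // Field query: plain equality
    return value === cond;
  });
}

const find = (collection, query) => collection.filter(doc => matches(doc, query));

const users = [
  { name: "Alice", age: 25, city: "Seoul" },
  { name: "Bob", age: 17, city: "Busan" },
  { name: "Anna", age: 34, city: "Seoul" },
];

const byCity = find(users, { city: "Seoul" }).map(u => u.name);
const byAge = find(users, { age: { $gte: 18, $lt: 30 } }).map(u => u.name);
const byName = find(users, { name: /^A/ }).map(u => u.name);
console.log(byCity); // [ 'Alice', 'Anna' ]
console.log(byAge);  // [ 'Alice' ]
console.log(byName); // [ 'Alice', 'Anna' ]
```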
-
Spring Security | Framework/SPRING | 2019-09-20 08:05
1. Overview
Spring Security is a separate module of the Spring Framework that provides authentication and authorization for Java applications. It also protects against many common security vulnerabilities, such as CSRF attacks. To enable Spring Security in a web application, you can start with a single annotation: @EnableWebSecurity. Spring Security is a powerful and high..
-
Apache Spark | DistributedSystem/Spark | 2019-09-20 00:55
1. Overview
An open-source, distributed, general-purpose cluster-computing framework with a mostly in-memory data processing engine. It can perform ETL, analytics, machine learning, and graph processing on large volumes of data at rest (batch processing) or in motion (stream processing), with rich, concise high-level APIs for Scala, Python, Java, R, and SQL.
2. Description
2.1 A..
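To make the "at rest" versus "in motion" distinction concrete, here is a plain JavaScript word count written both ways. This is not the Spark API: `batchWordCount` and `StreamingWordCount` are hypothetical names, and the streaming version only loosely mirrors a micro-batch model that keeps running state across incoming chunks.

```javascript
// Batch: the whole dataset is available up front ("at rest"), so one pass is enough.
function batchWordCount(lines) {
  const counts = new Map();
  for (const line of lines) {
    for (const word of line.split(/\s+/).filter(Boolean)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return counts;
}

// Streaming: data keeps arriving ("in motion"), so we keep running state
// and fold each new micro-batch of lines into it.
class StreamingWordCount {
  constructor() {
    this.counts = new Map();
  }
  onMicroBatch(lines) {
    // Count the new chunk on its own, then merge it into the running totals
    for (const [word, n] of batchWordCount(lines)) {
      this.counts.set(word, (this.counts.get(word) || 0) + n);
    }
    return this.counts;
  }
}

const allLines = ["the quick fox", "the lazy dog", "the fox"];
const batch = batchWordCount(allLines);

const stream = new StreamingWordCount();
stream.onMicroBatch(allLines.slice(0, 2)); // first chunk arrives
stream.onMicroBatch(allLines.slice(2));    // later chunk arrives

console.log(batch.get("the"), stream.counts.get("the")); // 3 3
```

Once every chunk has arrived, both approaches agree; the difference is that the streaming version has a usable (partial) answer after each chunk.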
-
Kinesis | Data Engineering | 2019-09-20 00:42
Kinesis Data Streams
- Real-time data streaming
- Retention between 1 day and 365 days
- Ability to reprocess (replay) data
- Once data is inserted into Kinesis, it can't be deleted (immutability)
- Records that share the same partition key go to the same shard (ordering)
- Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
- Consumers:
  - Write your own: Kinesis Client Library (KCL), AWS SDK
  - Managed: AWS Lamb..
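The ordering guarantee follows from how records are routed: Kinesis Data Streams hashes each partition key with MD5 into a 128-bit integer and sends the record to the shard whose hash key range contains that value. Below is a sketch using Node's built-in `crypto` module. It assumes the shards split the hash space into roughly equal ranges, as in a freshly created, never-resharded stream. Exact boundaries can differ by rounding, so real code should read them from the stream (for example via ListShards). `hashKey` and `shardIndexFor` are hypothetical names.

```javascript
const crypto = require("crypto");

const HASH_SPACE = 2n ** 128n; // MD5 yields a 128-bit value

// Partition key -> 128-bit integer via MD5 (the mapping Kinesis Data Streams uses)
function hashKey(partitionKey) {
  const hex = crypto.createHash("md5").update(partitionKey, "utf8").digest("hex");
  return BigInt("0x" + hex);
}

// ASSUMPTION: shardCount shards split [0, 2^128) into (approximately) equal ranges.
// Real boundaries come from the stream's shard list, not from this formula.
function shardIndexFor(partitionKey, shardCount) {
  return Number((hashKey(partitionKey) * BigInt(shardCount)) / HASH_SPACE);
}

// The same key always lands on the same shard, which is what preserves per-key ordering.
for (const key of ["user-1", "user-2", "user-1"]) {
  console.log(key, shardIndexFor(key, 4));
}
```

Because the routing depends only on the key, a hot partition key sends all of its traffic to a single shard; spreading load means choosing keys with many distinct values.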
-
Transaction Management | Framework/SPRING | 2019-09-17 20:36
1. Overview
A database transaction is a sequence of actions that is treated as a single unit of work. These actions should either complete entirely or have no effect at all. Transaction management is an important part of RDBMS-oriented enterprise applications, ensuring data integrity and consistency.
2. Description
2.1 Core Concepts
The following four key properties are the core concept of a tra..
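The all-or-nothing rule (atomicity) can be illustrated with a toy in-memory store: the work runs on a copy, and the copy replaces the real state only if every step succeeds. `runInTransaction` and `transfer` are hypothetical helpers. A real RDBMS achieves this with logging and locking, and Spring exposes it declaratively (for example through `@Transactional`) rather than through code like this. The sketch covers atomicity only, not isolation or durability.

```javascript
// Hypothetical in-memory "database": account balances in a plain object.
// runInTransaction applies all changes or none by working on a draft copy.
function runInTransaction(db, work) {
  // Shallow copy is enough here because balances are plain numbers
  const draft = { ...db.state };
  try {
    work(draft);
    db.state = draft; // commit: every change becomes visible at once
    return true;
  } catch (err) {
    return false;     // rollback: the draft is discarded, db.state is untouched
  }
}

// Two steps that must succeed or fail together
function transfer(accounts, from, to, amount) {
  accounts[from] -= amount;
  if (accounts[from] < 0) throw new Error("insufficient funds");
  accounts[to] += amount;
}

const db = { state: { alice: 100, bob: 50 } };

const ok1 = runInTransaction(db, acc => transfer(acc, "alice", "bob", 30));
console.log(ok1, db.state); // true { alice: 70, bob: 80 }

// Fails half-way (after the debit, before the credit), so nothing is applied
const ok2 = runInTransaction(db, acc => transfer(acc, "alice", "bob", 500));
console.log(ok2, db.state); // false { alice: 70, bob: 80 }
```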