What is a Graph?
A graph is just a collection of vertices and edges. A property graph has the following characteristics:
- It contains nodes and relationships.
- Nodes contain properties (key-value pairs).
- Relationships are named and directed, and always have a start and end node.
- Relationships can also contain properties.
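As a rough illustration of these four characteristics, here is a minimal Python sketch. The class names `Node` and `Relationship` and the sample data are assumptions made for this example; they do not come from any particular graph database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """A node carrying arbitrary key-value properties."""
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A named, directed relationship that always has a start and an end node."""
    name: str
    start: Node
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data: Alice follows Bob since 2012.
alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
follows = Relationship("FOLLOWS", start=alice, end=bob, properties={"since": 2012})
```

Because `start` and `end` are required fields, a relationship cannot be created without both endpoints, which mirrors the third point in the list.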
What is a Graph Database?
A graph database [1] is a type of NoSQL database that uses graph theory (the study of points and lines) to store, map, and query relationships (Figure 1). It is also called a graph-oriented database.
Figure 1: Social graph, adapted from [7].
A graph database management system is an online database management system with Create, Read, Update, and Delete (CRUD) methods that expose a graph data model.
Graph databases are generally built for use with transactional (OLTP) systems.
A graph database is essentially a collection of nodes and edges. Each node represents an entity (such as a person or business) and each edge represents a connection or relationship between two nodes.
Every node in a graph database is defined by a unique identifier, a set of outgoing edges and/or incoming edges and a set of properties expressed as key/value pairs.
Each edge is defined by a unique identifier, a start node and an end node, and a set of properties.
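The description above (unique identifiers, outgoing and incoming edge sets, properties, and CRUD methods) could be sketched in memory roughly as follows. `GraphStore` and all of its method names are hypothetical and chosen for this example; real graph databases expose their own APIs.

```python
import itertools
from typing import Any, Dict, Set


class GraphStore:
    """Hypothetical in-memory store; not the API of any real graph database."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)             # source of unique identifiers
        self.nodes: Dict[int, Dict[str, Any]] = {}  # node id -> properties
        self.edges: Dict[int, Dict[str, Any]] = {}  # edge id -> start, end, properties
        self.outgoing: Dict[int, Set[int]] = {}     # node id -> ids of outgoing edges
        self.incoming: Dict[int, Set[int]] = {}     # node id -> ids of incoming edges

    def create_node(self, **properties: Any) -> int:
        node_id = next(self._ids)
        self.nodes[node_id] = dict(properties)
        self.outgoing[node_id] = set()
        self.incoming[node_id] = set()
        return node_id

    def create_edge(self, start: int, end: int, **properties: Any) -> int:
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist")
        edge_id = next(self._ids)
        self.edges[edge_id] = {"start": start, "end": end, "properties": dict(properties)}
        self.outgoing[start].add(edge_id)
        self.incoming[end].add(edge_id)
        return edge_id

    def read_node(self, node_id: int) -> Dict[str, Any]:
        return self.nodes[node_id]

    def update_node(self, node_id: int, **properties: Any) -> None:
        self.nodes[node_id].update(properties)

    def delete_edge(self, edge_id: int) -> None:
        edge = self.edges.pop(edge_id)
        self.outgoing[edge["start"]].discard(edge_id)
        self.incoming[edge["end"]].discard(edge_id)

    def delete_node(self, node_id: int) -> None:
        # Remove every edge touching the node first so no edge is left dangling.
        for edge_id in list(self.outgoing[node_id] | self.incoming[node_id]):
            self.delete_edge(edge_id)
        del self.nodes[node_id], self.outgoing[node_id], self.incoming[node_id]


store = GraphStore()
ann = store.create_node(name="Ann")
acme = store.create_node(name="Acme")
works_at = store.create_edge(ann, acme, type="WORKS_AT")
store.update_node(ann, city="Oslo")
```

Keeping separate outgoing and incoming indexes per node is one possible design choice; it lets a traversal follow edges in either direction without scanning the whole edge table.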
Graph databases are well-suited for analyzing interconnections, which is why there has been a lot of interest in using graph databases to mine data from social media.
Graph databases are also useful for working with data in business disciplines that involve complex relationships and dynamic schema, such as supply chain management, identifying the source of an IP telephony issue and creating "customers who bought this also looked at..." recommendations.
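To make the recommendation use case concrete, the sketch below walks two kinds of relationships: from a product back to the customers who bought it, then forward to the products those customers viewed. The function names, the `bought` and `viewed` indexes, and the sample data are all assumptions made for illustration; a real system would express this as a query in the database's own language.

```python
from collections import Counter, defaultdict

bought_by = defaultdict(set)  # product -> customers with a BOUGHT relationship to it
viewed = defaultdict(set)     # customer -> products the customer VIEWED


def add_bought(customer, product):
    bought_by[product].add(customer)


def add_viewed(customer, product):
    viewed[customer].add(product)


def also_looked_at(product, limit=3):
    """Rank products viewed by customers who bought `product`."""
    counts = Counter()
    for customer in bought_by.get(product, ()):
        for other in viewed.get(customer, ()):
            if other != product:
                counts[other] += 1
    return [name for name, _ in counts.most_common(limit)]


# Hypothetical sample data.
add_bought("ann", "laptop")
add_bought("bob", "laptop")
add_bought("cy", "mouse")
add_viewed("ann", "laptop")
add_viewed("ann", "dock")
add_viewed("ann", "bag")
add_viewed("bob", "dock")
add_viewed("cy", "cable")

recommendations = also_looked_at("laptop")
```

With this data, `recommendations` lists "dock" first (viewed by both laptop buyers) and "bag" second; "cable" never appears because its only viewer did not buy the laptop.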
What are the types of Graph Database?
Google has its own graph computing system called Pregel (described in a published paper), but there are several commercial and open source graph databases available. Let's look at a few.
Neo4j [4,3]
- This is one of the most popular databases in the category, and one of the few open source options.
- It's Java based but has bindings for other languages, including Ruby and Python.
- It's the product of the company Neo Technology, which recently moved the community edition of Neo4j from the AGPL license to the GPL license.
FlockDB
- FlockDB was created and open-sourced by Twitter.
- It is a real-time, distributed database.
- Twitter's Kevin Weil talked about the creation of the database, along with Twitter's use of other NoSQL databases, at Strange Loop last year.
AllegroGraph
- It is a graph database built around the W3C spec for the Resource Description Framework (RDF).
- It's designed for handling Linked Data and the Semantic Web.
- It supports SPARQL, RDFS++, and Prolog.
- It is a proprietary product of Franz Inc., which markets a number of Semantic Web products - including its flagship set of LISP-based development tools.
- It uses efficient memory utilization in combination with disk-based storage, enabling it to scale to billions of quads while maintaining superior performance.
GraphDB [6,3]
- It is a graph database built in .NET by the German company sones in Erfurt and Leipzig.
- It's available as a cloud service through Amazon S3 or Microsoft Azure.
- The graphdb npm package [6] is a simple Node.js package designed to ease the process of working with graph databases.
- It provides high level graph operations (create node, link, etc) that are generally common amongst all graph databases.
- It uses connector packages to implement store specific serialization.
InfiniteGraph
- It is a distributed graph database implemented in Java.
- Its goal is to create a graph database with "virtually unlimited scalability."
- It is produced by Objectivity, a company that develops data technologies supporting large-scale, distributed data management, object persistence, and relationship analytics.
What are the advantages of the Graph Database?
- It offers an extremely flexible data model.
- The performance tends to remain relatively constant, even as the dataset grows. This is because queries are localized to a portion of the graph.
- The execution time for each query is proportional only to the size of the part of the graph traversed to satisfy that query, rather than the size of the overall graph.
- It expresses and accommodates business needs in a way that enables IT to move at the speed of business.
- You can add new kinds of relationships, new nodes, and new sub-graphs to an existing structure without disturbing existing queries and application functionality.
- The schema-free nature of the graph data model, coupled with the testable nature of a graph database's application programming interface (API) and query language, empowers us to evolve an application in a controlled manner.
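The claim in the list above that query cost depends on the portion of the graph traversed, not on the overall graph size, can be illustrated with a depth-limited breadth-first traversal. This is a simplified sketch over a plain adjacency dictionary; the `neighbourhood` function and both sample graphs are assumptions made for this example, not how any specific database is implemented.

```python
from collections import deque


def neighbourhood(adjacency, start, max_depth):
    """Return the set of nodes reachable from `start` within `max_depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


small = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"]}

# Same local structure, plus 10,000 unrelated nodes in a separate chain.
large = dict(small)
large.update({f"x{i}": [f"x{i + 1}"] for i in range(10_000)})

touched_small = neighbourhood(small, "a", 2)
touched_large = neighbourhood(large, "a", 2)
```

Both calls visit the same four nodes ("a", "b", "c", "d"), even though the second graph is thousands of times larger: the traversal only ever looks at nodes connected to the starting point.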
Most other NoSQL databases, whether key-value, document, or column-oriented, store sets of disconnected documents/values/columns. This makes it difficult to use them for connected data and graphs. One well-known strategy for adding relationships to such stores is to embed an aggregate's identifier inside the field belonging to another aggregate, effectively introducing foreign keys. But this requires joining aggregates at the application level, which quickly becomes prohibitively expensive [7].
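The application-level join described above can be sketched with a toy document store. `DocumentStore`, its `fetch` method, and the sample users are hypothetical; the lookup counter is included only to show how the number of round trips grows with each hop.

```python
class DocumentStore:
    """Hypothetical key-value/document store that only supports lookup by id."""

    def __init__(self, documents):
        self._documents = documents
        self.lookups = 0  # counts individual fetches, i.e. round trips to the store

    def fetch(self, doc_id):
        self.lookups += 1
        return self._documents[doc_id]


# Each user document embeds the identifiers of other aggregates (foreign keys).
store = DocumentStore({
    "u1": {"name": "Ann", "friend_ids": ["u2", "u3"]},
    "u2": {"name": "Bob", "friend_ids": ["u4"]},
    "u3": {"name": "Cy", "friend_ids": ["u4", "u1"]},
    "u4": {"name": "Di", "friend_ids": []},
})


def friends_of_friends(store, user_id):
    """Join aggregates in application code: one fetch per identifier followed."""
    names = set()
    for friend_id in store.fetch(user_id)["friend_ids"]:
        for fof_id in store.fetch(friend_id)["friend_ids"]:
            if fof_id != user_id:
                names.add(store.fetch(fof_id)["name"])
    return names


result = friends_of_friends(store, "u1")
```

Even in this four-user example, a two-hop question costs five separate fetches to produce a single name; every extra hop multiplies the number of fetches by the typical number of embedded identifiers.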
References
[1] http://whatis.techtarget.com/definition/
[2] http://en.wikipedia.org/wiki/InfiniteGraph
[3] http://readwrite.com/
[4] http://www.neotechnology.com/
[5] http://www.franz.com/agraph/allegrograph/
[6] https://npmjs.org/package/graphdb
[7] http://graphdatabases.com/