Friday, 19 July 2019

Operations on Keyspace in Cassandra

Cassandra's Keyspace operations

Create Keyspace
A keyspace is the object that holds column families, indexes and user defined types. In Cassandra, a keyspace is similar to a database in an RDBMS. It also defines data center awareness, the replication strategy used in the keyspace, the replication factor, and so on.
The "Create Keyspace" command creates a keyspace in Cassandra.
Syntax
Create keyspace KeyspaceName with replication={'class':'StrategyName', 'replication_factor': number of replicas on different nodes};
Various Components of Cassandra Keyspace
Strategy: The strategy name is declared when the keyspace is created. There are two kinds of strategies in Cassandra.
1. Simple Strategy: Simple strategy is used when you have just one data center. In this strategy, the first replica is placed on the node selected by the partitioner. The remaining replicas are placed on the next nodes in the clockwise direction around the ring, without considering rack or node location.
2. Network Topology Strategy: Network topology strategy is used when you have more than one data center. In this strategy, you have to provide the replication factor for each data center separately. Network topology strategy places replicas on nodes in the clockwise direction within the same data center, and attempts to place replicas on different racks.
Replication Factor: The replication factor is the number of replicas of the data placed on different nodes. A replication factor of 3 is a good choice for fault tolerance, and a replication factor greater than two ensures there is no single point of failure. If a server goes down or a network problem occurs, the other replicas continue to provide service.
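The clockwise placement described for Simple Strategy can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Cassandra's actual implementation: the node names, the tiny token values and the function name simple_strategy_replicas are all hypothetical, and the key's token is assumed to have been computed already by the partitioner.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, key_token, replication_factor):
    """Pick replica nodes by walking a toy ring clockwise.

    ring: dict of node name -> token (hypothetical values).
    key_token: token of the row key, assumed already computed by the partitioner.
    """
    # Sorting by token gives the clockwise order of the ring.
    ordered = sorted(ring.items(), key=lambda item: item[1])
    tokens = [token for _, token in ordered]
    # First replica: the first node whose token is >= the key's token,
    # wrapping around to the start of the ring when no such node exists.
    start = bisect_left(tokens, key_token) % len(ordered)
    # Remaining replicas: the next nodes clockwise, ignoring racks.
    count = min(replication_factor, len(ordered))
    return [ordered[(start + i) % len(ordered)][0] for i in range(count)]


ring = {"node-a": 0, "node-b": 25, "node-c": 50, "node-d": 75}
print(simple_strategy_replicas(ring, key_token=30, replication_factor=3))
```

With a replication factor of 3 on this four-node ring, any single node can be down and two replicas of every row remain reachable.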
Example: The following "Create Keyspace" command creates a keyspace in Cassandra.
cqlsh> create keyspace University with replication = {'class':'SimpleStrategy', 'replication_factor':3};
Details of the above query:
Command to create keyspace (create keyspace)
Keyspace Name (University)
Strategy Name (SimpleStrategy)
Replication Factor (3)
After successful execution of command "Create Keyspace", Keyspace University will be created in Cassandra with strategy "SimpleStrategy" and replication factor 3.

Alter Keyspace
Command "Alter Keyspace" alters the replication factor, strategy name and durable writes properties in created keyspace in Cassandra.
Syntax
Alter Keyspace KeyspaceName with replication={'class':'StrategyName', 'replication_factor': number of replicas on different nodes} and DURABLE_WRITES=true/false
Key aspects while altering Keyspace in Cassandra
Keyspace Name: Keyspace name cannot be altered in Cassandra.
Strategy Name: Strategy name can be altered by specifying new strategy name.
Replication Factor: Replication factor can be altered by specifying new replication factor.
DURABLE_WRITES: The DURABLE_WRITES value can be altered by setting it to true or false. By default, it is true. If set to false, updates to the keyspace are not written to the commit log; if set to true, they are.

Example:
cqlsh>Alter Keyspace University with replication={'class':'NetworkTopologyStrategy', 'DataCenter1':1};
Details of the above query:
Command to alter keyspace: Alter keyspace
Replication factor changed from 3 to 1 for DataCenter1: 'DataCenter1':1
Strategy name changed from SimpleStrategy to NetworkTopologyStrategy
After successful execution of the command "Alter Keyspace", the strategy name will be changed from 'SimpleStrategy' to 'NetworkTopologyStrategy' and the replication factor will be changed from 3 to 1 for 'DataCenter1'.
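The Create and Alter commands in this post share the same replication map, so a script that manages keyspaces could assemble both from one helper. The Python sketch below only builds the CQL text and does not talk to a cluster. The function names keyspace_statement and _literal are hypothetical, and the keyspace and data center names are the ones used in the examples above.

```python
def _literal(value):
    """Render a Python value as a CQL literal (strings quoted, booleans lowercase)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return f"'{value}'" if isinstance(value, str) else str(value)


def keyspace_statement(action, name, strategy, durable_writes=None, **options):
    """Build a CREATE or ALTER KEYSPACE statement as a string.

    options: 'replication_factor' for SimpleStrategy, or one entry per
    data center name for NetworkTopologyStrategy.
    """
    action = action.upper()
    if action not in ("CREATE", "ALTER"):
        raise ValueError("action must be CREATE or ALTER")
    replication = {"class": strategy, **options}
    body = ", ".join(f"'{key}': {_literal(val)}" for key, val in replication.items())
    cql = f"{action} KEYSPACE {name} WITH replication = {{{body}}}"
    # durable_writes is appended with AND, only when explicitly requested.
    if durable_writes is not None:
        cql += f" AND durable_writes = {_literal(durable_writes)}"
    return cql + ";"


print(keyspace_statement("create", "University", "SimpleStrategy", replication_factor=3))
print(keyspace_statement("alter", "University", "NetworkTopologyStrategy", DataCenter1=1))
```

The generated strings can be pasted into cqlsh. Note that the keyspace name is not part of the replication map, which matches the rule that it cannot be altered.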

Drop/Delete Keyspace

The command 'Drop Keyspace' drops a keyspace, including all of its data, column families, user defined types and indexes, from Cassandra. Before dropping the keyspace, Cassandra takes a snapshot of it (when auto_snapshot is enabled in cassandra.yaml, which is the default). If the keyspace does not exist, Cassandra returns an error unless IF EXISTS is used.
Syntax
Drop keyspace KeyspaceName
OR
DROP keyspace [IF EXISTS] KeyspaceName
Example

cqlsh>Drop keyspace University;
After successful execution of the command 'Drop keyspace University', keyspace University will be dropped from Cassandra with all the data and schema.
The error below is returned on an attempt to use a keyspace that does not exist.
cqlsh>use University;
InvalidRequest: code=2200 [Invalid query] message="Keyspace 'university' does not exist"

Note: There is no difference between dropping a keyspace and deleting a keyspace. Both refer to the Drop Keyspace command.

Wednesday, 17 July 2019

Cassandra Architecture sub-point

Calculating Tokens for a Multi-Data Center Cluster
In multi-data center deployments, replica placement is calculated per data center using the NetworkTopologyStrategy replica placement strategy. In each data center (or replication group) the first replica for a particular row is determined by the token value assigned to a node. Additional replicas in the same data center are placed by walking the ring clockwise until it reaches the first node in another rack.
If you do not calculate partitioner tokens so that the data ranges are evenly distributed for each data center, you could end up with uneven data distribution within a data center. The goal is to ensure that the nodes for each data center are evenly dispersed around the ring, or to calculate tokens for each replication group individually (without conflicting token assignments).

One way to avoid uneven distribution is to calculate tokens for all nodes in the cluster, and then alternate the token assignments so that the nodes for each data center are evenly dispersed around the ring.
Another way to assign tokens in a multi data center cluster is to generate tokens for the nodes in one data center, and then offset those token numbers by 1 for all nodes in the next data center, by 2 for the nodes in the next data center, and so on. This approach is good if you are adding a data center to an established cluster, or if your data centers do not have the same number of nodes.
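Both approaches above can be sketched in a few lines of Python. The sketch assumes the RandomPartitioner token range of 0 to 2**127 - 1; the function names and data center names are hypothetical, and the alternating variant assumes every data center has the same number of nodes.

```python
RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token range 0 .. 2**127 - 1


def evenly_spaced_tokens(node_count, offset=0):
    """Evenly spaced tokens for node_count nodes, shifted by offset."""
    return [(i * RING_SIZE // node_count + offset) % RING_SIZE
            for i in range(node_count)]


def alternating_tokens(dc_names, nodes_per_dc):
    """First approach: compute tokens for all nodes in the cluster, then
    hand them out to the data centers in turn."""
    tokens = evenly_spaced_tokens(len(dc_names) * nodes_per_dc)
    assignment = {dc: [] for dc in dc_names}
    for position, token in enumerate(tokens):
        assignment[dc_names[position % len(dc_names)]].append(token)
    return assignment


def offset_tokens(node_counts):
    """Second approach: compute tokens per data center and offset the
    second data center by 1, the third by 2, and so on.

    node_counts: dict of data center name -> number of nodes.
    """
    return {dc: evenly_spaced_tokens(count, offset=index)
            for index, (dc, count) in enumerate(node_counts.items())}


print(alternating_tokens(["DC1", "DC2"], nodes_per_dc=2))
print(offset_tokens({"DC1": 3, "DC2": 2}))
```

The offset variant works with unequal data center sizes because each data center is spaced independently, and the small offsets only serve to avoid conflicting token assignments.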
strategy_options
Specifies configuration options for the chosen replication strategy.
For SimpleStrategy, it specifies replication_factor in the format of replication_factor:number_of_replicas.
For NetworkTopologyStrategy, it specifies the number of replicas per data center in a comma separated list of datacenter_name:number_of_replicas. Note that what you specify for datacenter_name depends on the cluster-configured snitch you are using. There is a correlation between the data center name defined in the keyspace strategy_options and the data center name as recognized by the snitch you are using. The nodetool ring command prints out data center names and rack locations of your nodes if you are not sure what they are.
See Choosing Keyspace Replication Options for guidance on how to best configure replication strategy and strategy options for your cluster.
Setting and updating strategy options with the Cassandra CLI requires a slightly different command syntax than other attributes; note the brackets and curly braces in this example:

[default@unknown] CREATE KEYSPACE test WITH placement_strategy = 'NetworkTopologyStrategy' AND strategy_options=[{us-east:6,us-west:3}];

Choosing Keyspace Replication Options
When you create a keyspace, you must define the replica placement strategy and the number of replicas you want.
DataStax recommends always choosing NetworkTopologyStrategy for both single and multi-data center clusters. It is as easy to use as SimpleStrategy and allows for expansion to multiple data centers in the future, should that become useful. It is much easier to configure the most flexible replication strategy up front, than to reconfigure replication after you have already loaded data into your cluster.
NetworkTopologyStrategy takes as options the number of replicas you want per data center. Even for single data center (or single node) clusters, you can use this replica placement strategy and just define the number of replicas for one data center. For example (using cassandra-cli):
[default@unknown] CREATE KEYSPACE test WITH placement_strategy = 'NetworkTopologyStrategy' AND strategy_options=[{us-east:6}];
Or for a multi-data center cluster:
[default@unknown] CREATE KEYSPACE test WITH placement_strategy = 'NetworkTopologyStrategy' AND strategy_options=[{DC1:6,DC2:6,DC3:3}];
When declaring the keyspace strategy_options, what you name your data centers depends on the snitch you have chosen for your cluster. The data center names must correlate to the snitch you are using in order for replicas to be placed in the correct location.
As a general rule, the number of replicas should not exceed the number of nodes in a replication group. However, it is possible to increase the number of replicas, and then add the desired number of nodes afterwards. When the replication factor exceeds the number of nodes, writes will be rejected, but reads will still be served as long as the desired consistency level can be met.
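The general rule above is easy to check before creating a keyspace. The Python sketch below is a hypothetical pre-flight check, not part of any Cassandra tool: it compares the requested replicas per data center with the number of nodes known in each data center, with both given as plain dictionaries.

```python
def oversized_replication(replicas_per_dc, nodes_per_dc):
    """Return the data centers whose replica count exceeds their node count.

    replicas_per_dc: dict of data center name -> requested replicas.
    nodes_per_dc: dict of data center name -> nodes currently in that data center.
    The result maps each offending data center to (replicas, nodes).
    """
    problems = {}
    for dc, replicas in replicas_per_dc.items():
        nodes = nodes_per_dc.get(dc, 0)  # an unknown data center counts as zero nodes
        if replicas > nodes:
            problems[dc] = (replicas, nodes)
    return problems


# Hypothetical cluster: DC3 has fewer nodes than requested replicas.
requested = {"DC1": 6, "DC2": 6, "DC3": 3}
cluster = {"DC1": 6, "DC2": 8, "DC3": 2}
print(oversized_replication(requested, cluster))
```

A non-empty result is not necessarily an error, since nodes can be added afterwards, but it flags the data centers to watch.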

listen_address

The IP address or hostname that other Cassandra nodes will use to connect to this node. If left blank, you must have hostname resolution correctly configured on all nodes in your cluster so that the hostname resolves to the correct IP address for this node (using /etc/hostname, /etc/hosts or DNS).

Configuring the PropertyFileSnitch
The PropertyFileSnitch requires you to define network details for each node in the cluster in a cassandra-topology.properties configuration file. A sample of this file is located in /etc/cassandra/conf/cassandra-topology.properties in packaged installations or $CASSANDRA_HOME/conf/cassandra-topology.properties in binary installations.
Every node in the cluster should be described in this file, and this file should be exactly the same on every node in the cluster if you are using the PropertyFileSnitch.
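Because the file must be identical on every node, it can help to generate it from a single description of the cluster. The Python sketch below renders entries in the node_ip=data_center:rack form used by cassandra-topology.properties, plus a default entry for unlisted nodes. The IP addresses, data center names, rack names and the function name render_topology are hypothetical.

```python
def render_topology(nodes, default=("DC1", "RAC1")):
    """Render the contents of a cassandra-topology.properties file.

    nodes: dict of node IP -> (data center, rack).
    default: (data center, rack) used for nodes not listed in the file.
    """
    # Sorting keeps the output stable, so the file is byte-identical everywhere.
    lines = [f"{ip}={dc}:{rack}" for ip, (dc, rack) in sorted(nodes.items())]
    lines.append(f"default={default[0]}:{default[1]}")
    return "\n".join(lines) + "\n"


# Hypothetical two data center cluster.
cluster = {
    "10.0.0.1": ("DC1", "RAC1"),
    "10.0.0.2": ("DC1", "RAC2"),
    "10.1.0.1": ("DC2", "RAC1"),
}
print(render_topology(cluster), end="")
```

The data center names written here are the ones that must be used in the keyspace strategy_options.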

Architecture of Cassandra

A Cassandra instance is a collection of independent nodes that are configured together into a cluster. In a C...