Apache Cassandra CQL3 – Installation and Tests using Python

Installation using PIP

# install python-pip
sudo apt-get install -y python-pip

# install cql driver
sudo pip install cql

Manual Installation from the Sources

# download path
mkdir -p ~/build/python-cql
cd ~/build/python-cql

# download
git clone https://github.com/pcmanus/python-cql
cd python-cql
sudo python setup.py install

Query Example

# create and move to diretory
mkdir -p ~/build/python-cql
cd ~/build/python-cql

# create test file
cat > test.py <<-"_EOF_"

import cql
host, port = "127.0.0.1", 9160
keyspace = "mytest"
con = cql.connect(host, port, cql_version='3.0.0')

print (con)
cursor = con.cursor()

query="DROP KEYSPACE %s;"%keyspace
print(query)
r= cursor.execute(query)
print(r)

query="""
CREATE KEYSPACE %s
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};
"""%keyspace
print(query)
r= cursor.execute(query)
print(r)

query="USE %s;"%keyspace
print(query)
r= cursor.execute(query)
print(r)

query="""
CREATE TABLE users (
                nickname text,
                password text,
                gender text,
                email text,
                active int,
                country text,
                date_of_birth timestamp,
                PRIMARY KEY (nickname)
              );
"""
print(query)
r= cursor.execute(query)
print(r)

query="CREATE INDEX country_index ON users (country);"
print(query)
r= cursor.execute(query)
print(r)


query="CREATE INDEX active_index ON users (active);"
print(query)
r= cursor.execute(query)
print(r)


query="""
INSERT INTO users(nickname, email, country, active)
             VALUES ('first', 'first@first123.com', 'UK', 1) ;
"""
print(query)
r= cursor.execute(query)
print(r)


query="""
BEGIN BATCH
INSERT INTO users(nickname, email, country, active) VALUES ('asmith', 'a@aaa.com', 'US', 1) ;
INSERT INTO users(nickname, email, country, active) VALUES ('bsmith', 'b@bbb.com', 'US', 1) ;
APPLY BATCH;
"""
print(query)
r= cursor.execute(query)
print(r)


query="SELECT COUNT(*) FROM users;"
print(query)
r= cursor.execute(query)
print(r)
for row in cursor:  # Iteration is equivalent to lots of fetchone() calls
    print row


query="SELECT * FROM users;"
print(query)
r= cursor.execute(query)
print(r)
for row in cursor:  # Iteration is equivalent to lots of fetchone() calls
    print row

query="SELECT * FROM users where country > 'A' and active=1 ALLOW FILTERING;"
print(query)
r= cursor.execute(query)
print(r)
for row in cursor:  # Iteration is equivalent to lots of fetchone() calls
    print row


query="""

CREATE TABLE timeline (
  nickname text,
  message text,
  created timeuuid,
  ctime timestamp,
  PRIMARY KEY (nickname, created)
) WITH CLUSTERING ORDER BY (created DESC);

"""
print(query)
r= cursor.execute(query)
print(r)

import uuid
key = str(uuid.uuid4())

import datetime
from datetime import datetime
dt=datetime.now().strftime("%Y-%m-%d %H:%M:%S")
query="""
BEGIN BATCH
INSERT INTO timeline(nickname, message, created, ctime) VALUES ('asmith','this message one', now(), '"""+str(dt)+"""' ) ;
INSERT INTO timeline(nickname, message, created, ctime) VALUES ('asmith','this message two', """+str(uuid.uuid1())+""" , '"""+str(dt)+"""' ) ;
APPLY BATCH;
"""
print(query)
r= cursor.execute(query)
print(r)


query="SELECT * FROM timeline;"
print(query)
r= cursor.execute(query)
print(r)
for row in cursor:  # Iteration is equivalent to lots of fetchone() calls
    print row
    print datetime.fromtimestamp(long(row[2]))

# FINISH
cursor.close()
con.close()

"""
import uuid
random_uuid = str(uuid.uuid4()) # anonymous
global_uuid = str(uuid.uuid1()) # machine id coded inside

cursor.execute("CQL QUERY", dict(kw='Foo', kw2='Bar, etc...))
    - cursor.description  # None initially, list of N tuples that represent
                              the N columns in a row after an execute. Only
                              contains type and name info, not values.
    - cursor.rowcount     # -1 initially, N after an execute
    - cursor.arraysize    # variable size of a fetchmany call
    - cursor.fetchone()   # returns  a single row
    - cursor.fetchmany()  # returns  self.arraysize # of rows
    - cursor.fetchall()   # returns  all rows, don't do this.

Query substitution:
 - Use named parameters and a dictionary of names and values.
    e.g. execute("SELECT * FROM CF WHERE name=:name", {"name": "Foo"})
"""

_EOF_


python test.py

Tips

uuid1 vs uuid4

uuid1 vs uuid4
uuid1()
Generates a UUID from a host ID, sequence number, and the current time – guaranteed collision free

uuid4()
Generates a random UUID, can collide, chance extermely small

See Stackoverflow Reference: http://stackoverflow.com/questions/1785503/when-should-i-use-uuid-uuid1-vs-uuid-uuid4-in-python

Troubleshooting

1. InvalidRequestException – Invalid version value: 3.0

Fix: Change CQL Version from “3.0” to “3.0.0” in order to fix the following exception


a@a:~/build/python-cql$ python test.py
Traceback (most recent call last):
  File "test.py", line 7, in 
    con = cql.connect(host, port, cql_version='3.0')
  File "/usr/local/lib/python2.7/dist-packages/cql/connection.py", line 143, in connect
    consistency_level=consistency_level, transport=transport)
  File "/usr/local/lib/python2.7/dist-packages/cql/connection.py", line 59, in __init__
    self.establish_connection()
  File "/usr/local/lib/python2.7/dist-packages/cql/thrifteries.py", line 162, in establish_connection
    self.set_cql_version(self.cql_version)
  File "/usr/local/lib/python2.7/dist-packages/cql/thrifteries.py", line 165, in set_cql_version
    self.client.set_cql_version(cql_version)
  File "/usr/local/lib/python2.7/dist-packages/cql/cassandra/Cassandra.py", line 1983, in set_cql_version
    self.recv_set_cql_version()
  File "/usr/local/lib/python2.7/dist-packages/cql/cassandra/Cassandra.py", line 2004, in recv_set_cql_version
    raise result.ire
cql.cassandra.ttypes.InvalidRequestException: InvalidRequestException(why='Invalid version value: 3.0 (see http://semver.org/ for details)')

Source: could-not-connect-cql-30

2. Bad Request: line 4:29 mismatched input ‘:’

Fix: Change

WITH strategy_class = 'SimpleStrategy'  AND strategy_options:replication_factor='1';

TO

WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3}

in order to fix the following exception


a@a:~/build/python-cql$ python test.py
ThriftConnection(host='127.0.0.1', port=9160, keyspace=None, conn open)

CREATE KEYSPACE mytest
         WITH strategy_class = 'SimpleStrategy'  AND strategy_options:replication_factor='1';

Traceback (most recent call last):
  File "test.py", line 18, in 
    r= cursor.execute(query)
  File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 80, in execute
    response = self.get_response(prepared_q, cl)
  File "/usr/local/lib/python2.7/dist-packages/cql/thrifteries.py", line 77, in get_response
    return self.handle_cql_execution_errors(doquery, compressed_q, compress, cl)
  File "/usr/local/lib/python2.7/dist-packages/cql/thrifteries.py", line 98, in handle_cql_execution_errors
    raise cql.ProgrammingError("Bad Request: %s" % ire.why)
cql.apivalues.ProgrammingError: Bad Request: line 4:29 mismatched input ':' expecting '='

3. error: cannot assign result of function now

Fix: Upgrade Cassandra from Version 1.2.3 to 1.2.4 to fix error


INSERT INTO timeline(nickname, message, created, ctime) VALUES ('asmith','this message two', now(), 7ed26c4e-a452-11e2-92b6-0022fa7a3a68 ) ;

Traceback (most recent call last):
  File "test.py", line 161, in 
    r= cursor.execute(query)
  File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 80, in execute
    response = self.get_response(prepared_q, cl)
  File "/usr/local/lib/python2.7/dist-packages/cql/thrifteries.py", line 77, in get_response
    return self.handle_cql_execution_errors(doquery, compressed_q, compress, cl)
  File "/usr/local/lib/python2.7/dist-packages/cql/thrifteries.py", line 98, in handle_cql_execution_errors
    raise cql.ProgrammingError("Bad Request: %s" % ire.why)
cql.apivalues.ProgrammingError: Bad Request: Type error: cannot assign result of function now (type timeuuid) to created (type 'org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimeUUIDType)')

Source: https://issues.apache.org/jira/browse/CASSANDRA-5386

Check Cassandra Version using any of the following commands:


nodetool -h localhost version
#ReleaseVersion: 1.2.4

Manually Updgrade Cassandra to 1.2.3
Cassandra version 1.2.4 is currently not yet available in the ubuntu/datax repositories.
A partial upgrade (not safe for production)


#create directories
mkdir -p ~/build/cassandra
cd ~/build/cassandra

# download
version="1.2.4"
wget http://apache.openmirror.de/cassandra/$version/apache-cassandra-$version-bin.tar.gz
tar -xvf apache-cassandra-$version-bin.tar.gz

cd apache-cassandra-$version

# move old jar file to old directory
sudo mkdir -p /usr/share/cassandra/old/
sudo mv /usr/share/cassandra/lib /usr/share/cassandra/old/
sudo mv /usr/share/cassandra/*1.2.3.jar /usr/share/cassandra/old/

# copy new libraries
sudo cp -r lib /usr/share/cassandra/
sudo mv /usr/share/cassandra/lib/apache-cassandra-* /usr/share/cassandra/

# change apache-cassandra link
sudo rm /usr/share/cassandra/apache-cassandra.jar
sudo ln -s /usr/share/cassandra/apache-cassandra-1.2.4.jar /usr/share/cassandra/apache-cassandra.jar

# cassandra
sudo /etc/init.d/cassandra restart

Cannot execute this query as it might involve data filtering

Fix: Change query from

SELECT * FROM users where country > 'A' and active=1;

TO


SELECT * FROM users where country > 'A' and active=1 ALLOW FILTERING;

Source: http://www.datastax.com/docs/1.2/cql_cli/cql/SELECT

Error:


SELECT * FROM users where country > 'A' and active=1;
Traceback (most recent call last):
  File "test.py", line 111, in 
    r= cursor.execute(query)
  File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 80, in execute
    response = self.get_response(prepared_q, cl)
  File "/usr/local/lib/python2.7/dist-packages/cql/thrifteries.py", line 77, in get_response
    return self.handle_cql_execution_errors(doquery, compressed_q, compress, cl)
  File "/usr/local/lib/python2.7/dist-packages/cql/thrifteries.py", line 98, in handle_cql_execution_errors
    raise cql.ProgrammingError("Bad Request: %s" % ire.why)
cql.apivalues.ProgrammingError: Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

References

1. Apache Cassandra Home: http://cassandra.apache.org/
2. python-cql Github Home: https://github.com/pcmanus/python-cql
3. python-cql PIP Home: https://pypi.python.org/pypi/cql
4. CQL Tutorial: http://techdocs.acunu.com/v1.1/html/tutorial.html#using-cql
5. Apache Cassandra 1.2 Documentation(PDF): http://www.datastax.com/doc-source/pdf/cassandra12.pdf
6. Cassandra Query Language Manual: http://cassandra.apache.org/doc/cql3/CQL.html
7. Cassandra, CQL3, and Time Series Data With Timeuuid: http://www.rustyrazorblade.com/2012/10/cassandra-cql3-and-time-series-data-with-timeuuid/
8. CQL3 for Cassandra experts:
9. A thrift to CQL3 upgrade guide: http://www.datastax.com/dev/blog/thrift-to-cql3
10. Understanding the Cassandra data model: http://www.datastax.com/docs/1.2/ddl/index
11. Cassandra Data Modeling Best Practices, Part 1: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
12. PRIMARY KEY’s in CQL: http://planetcassandra.org/blog/post/primary-keys-in-cql
13. CQL3 In Depth: http://www.slideshare.net/yukim/cql3-in-depth
14. CQL and Timeseries Schema: http://stackoverflow.com/questions/13500216/cql-and-timeseries-schema
15. Advanced Time Series with Cassandra: http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
16. Secondary Index ought to have low cardinality: http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)