- What is Netezza?
Netezza is a datawarehouse appliance in which both h/w and s/w are tightly coupled to deliver best performance to the business.
Appliance is like a black box. For ex: Washing machine
- List various datawarehouse appliances available in the market? (or) who are all Netezza’s competitors?
- Oracle Exadata
- EMC Greenplum
- HP Vertica
- Microsoft Parallel machine.
Netezza (vs) Hadoop:
Hadoop: For Unstructured data. For ex: Someone wishing you on your birthday
Netezza: Structured data. Registering for a facebook account.
- List various appliance components
- SMP Hosts
- Snippet Blades
- Sub components: Disk Enclosures, SPA, SPU, FPGA
S-Blade = SPU
Storage Array 1, Storage Array 2.
Each Storage Array has 4 disk Enclosures. Each Disk Enclosure contains 12 disks of 1TB size.
Each S-Blade owns several disks, which reside in a storage array within the same rack.
Storage Array = Storage Disk = N disk enclosures
Storage Array 1 + Chassis 1 => Snippet Processing Array (SPA) 1
Sorage Array2 + Chassis 2 => SPA 2
4. List few ETL tools?
Informatica, Datastage, Ab intio
5. List few BI tools?
Cognos, OBIEE, SSIS
6. Where system catalog information is stored? In Netezza host or Netezza disks?
For example: when we create a table, table definition is stored in host disks (df -k) and rows/records of this table are stored in Netezza disks
7. What are default userid’s and passwords in Netezza?
root, nz, admin
8. What is clustering in Netezza?
Clustering is nothing but grouping of similar things. In Netezza, we have 2 SMP hosts. These 2 hosts are part of a cluster. In this cluster, one host is active and another host is passive.
9. List few back slash commands in Netezza
Back slash commands: To avoid basic day to day SQL
For example: \du is to display users. \l is to list the database users.
10. What is the difference between “nzsql” and “nzsql -E”
-E ==> To see the underlying query generated by back slash command
11. List 2 default databases in Netezza
MASTER_DB: Is template database; helps us to create new databases faster.
12. What is the extent and page size in Netezza?
Extent size: 3MB
Page size: 128 KB
Note: The above are fixed. we can’t change them.
13. What are various maximums in Netezza?
Columns: 1600 per table or view
Maximum columns per distribution: 4
Max Length of database and column names: 128 bytes
Max number of characters in a char/varchar field: 64,000
Max connections to the server: 2000. Default: 500
Row size: 65,535 bytes
14. List few environment variables in Netezza
15. List constraints in Netezza
Primary Key (Not enforced)
Unique key (Not enforced)
Referential Integrity (Not enforced)
Not Null (Enforced)
Data type verification (Enforced)
16. Can you insert duplicate rows in Netezza?
17. Techniques to improve Netezza appliance performance?
As a developer:
– At design time, have a best distribution key. To avoid data skew
– Use appropriate data types. For example: If INT2 is good, then don’t use INT4. For age, Byteint (Int1) is good enough
18. What is a Snippet?
Snippet is a part of given query. When we submit a query to Netezza, it is divided into multiple snippets. These snippets are executed by Snippet(s)-Blades.
19. What are zonemaps?
Automatically created persistent internal tables. Minimum & Maximum for Integer & Date columns.
Helps Netezza to avoid unnecessary scanning of Extents (3MB)
Till Netezza 6.0: Zonemaps are till only at extent level
In Netezza 7.0, Zonemaps are even added at Page(128KB) level
Zonemaps (vs) Index: Zonemaps are like Anti-Indexes.
Index points to wherever the data is. But, Zonemaps point to where data doesn’t exist.
20. Generate Statistics (vs) Generate Express Statistics (vs) Automatic Statistics
When a user submits a query, Netezza optimizer decides on best execution path for that query.
For example: If I want to travel from Newyork to Chicago, what is my best mode of transportation in the resources(money & Time) available to me.
Generate statistics: Helps Netezza optimizer to know the details of tables such as number of rows, null/not nulls, etc in advance
Generate Statistics = Generate Express Statistics
Automatic Statistics: Automatically generated statistics.
Generate Statistics (Vs) Automatic Statistics:
With respect to sampling. Automatic statistics only gathers using little sampling of data. Where as Generate statistics, takes complete data as sample.
21. What are materialized views and advantages of creating them?
In Netezza, Materialized views are also called as SPM views.
The objective of SPM views is to avoid unnecessary I/O
If we are using only few columns of a major table in most of the queries, we can create a materialized view on these commonly used columns.
For example: create materialized view as select c_custname, c_age, c_address from custinfo order by c_custname.
If anyone fires a select statement to select customer name, the query gets the result from materialized view instead of main table.
The major advantage here is: Amount of data transferred from storage layer to database layer.
Can’t we use View?
View: Still has to go and get data from main table. The advantage with SPM views is: data is ordered
22. List 2 distribution methods in Netezza
Advantage of Hashing: Co-located joins will happen here
23. What is the default distribution method in Netezza?
Hashing on first column
24. How many maximum columns you can use while distributing a table?
25. Can we update column(s) used in distribution key?
Nope. If you have to update, create another table using “CTAS: Create table as … distribution key”
26. How do you change distribution key of already create table?
CTAS: Create table as …
27. What is CTAS? What is the default distribution that you get when you create table using CTAS?
CTAS: Create table as
The default is: whatever we have on original table
28. What is a collocated join?
The joins that might happen on Hashing distributed tables in which we don’t have to distribute data of tables at run time.
29. What is table skew? How do you check table is skewed or not?
Table skew: Un even distribution of data of a table across data slices.
SELECT datasliceid, count(datasliceid) from <tablename> group by datasliceid;
30. What is broadcasting in Netezza? When Netezza uses broadcasting?
FROM customer, sales
Where customer.cid = sales.sid
If both tables in join doesn’t have same distribution key, then one or both tables have to be re-distributed at run time.
The decision of broadcasting is taken by Netezza: if one of the tables is way smaller than other table.