We’re seeing more and more people trying to use Zeppelin on EMR with YARN but running into challenges with the setup. Here’s a tried and true set of instructions:
1. Create EMR cluster with Hadoop / Spark / Zeppelin enabled.
Use the AWS CLI or the web console to create the cluster, and make sure the following are in place:
– Set up your VPC with a subnet that has “Auto-assign public IP” set to yes
– Set up a key pair for SSH access
– Set up your security group to allow inbound traffic on port 8890
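If you go the CLI route, the prerequisites above map onto two AWS CLI calls. Here’s a rough sketch that wraps them in shell functions. The function names are made up for this example, and the cluster name, release label, instance type, and instance count are placeholder assumptions, so swap in whatever fits your account.

```shell
# Sketch: launch an EMR cluster with Hadoop, Spark, and Zeppelin installed.
# Assumptions (not from this article): cluster name, release label,
# instance type, and instance count are placeholders.
create_zeppelin_cluster() {
  # $1 = EC2 key pair name
  # $2 = ID of a subnet with "Auto-assign public IP" enabled
  aws emr create-cluster \
    --name "zeppelin-on-emr" \
    --release-label "${EMR_RELEASE:-emr-5.36.0}" \
    --applications Name=Hadoop Name=Spark Name=Zeppelin \
    --ec2-attributes "KeyName=$1,SubnetId=$2" \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles
}

# Sketch: allow inbound TCP 8890 (the Zeppelin port) from one address range.
open_zeppelin_port() {
  # $1 = security group ID used by the master node
  # $2 = CIDR range that is allowed to connect
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 8890 \
    --cidr "$2"
}
```

You would call these as, for example, create_zeppelin_cluster my-key subnet-0123456789abcdef0 and open_zeppelin_port sg-0123456789abcdef0 203.0.113.0/24 (all IDs here are hypothetical). Restricting the CIDR to your own address range is safer than opening the port to everyone.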
After the cluster status becomes “Waiting” you’ll be able to browse to
http://[Master public DNS]:8890
and see Zeppelin running.
2. Enable authentication
SSH to the master node and enable authentication by copying the Shiro template:
sudo cp /etc/zeppelin/conf/shiro.ini.template /etc/zeppelin/conf/shiro.ini
Open /etc/zeppelin/conf/shiro.ini with a text editor and configure users and passwords in the [users] section.
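Entries in the [users] section follow the Shiro INI form “username = password, role1, role2”. If you’d rather script the change than edit by hand, here’s a minimal sketch. The helper name add_shiro_user is made up for this example, and it assumes the role you pass is already defined in the [roles] section of your file.

```shell
# Sketch: add one "user = password, role" entry right below the [users] header.
# Hypothetical helper; the role is assumed to exist in the [roles] section.
add_shiro_user() {
  # $1 = path to shiro.ini, $2 = user name, $3 = password, $4 = role
  tmp="$(mktemp)" || return 1
  # Pass the entry through the environment so awk does not interpret
  # backslashes in the password as escape sequences.
  ENTRY="$2 = $3, $4" awk '
    { print }
    /^\[users\]/ { print ENVIRON["ENTRY"] }
  ' "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example usage on a working copy (user name and password are placeholders):
#   cp /etc/zeppelin/conf/shiro.ini /tmp/shiro.ini
#   add_shiro_user /tmp/shiro.ini alice 'choose-a-password' role1
#   sudo cp /tmp/shiro.ini /etc/zeppelin/conf/shiro.ini
```

Keep in mind that passwords in this section are stored as plain text unless you configure a password matcher, so restrict who can read the file.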
Alternatively, you can set up a different authentication method such as LDAP, AD, or ZeppelinHub. You can find more details on integrating with ZeppelinHub here.
Once that’s done, restart the Zeppelin daemon:
sudo -u zeppelin /usr/lib/zeppelin/bin/zeppelin-daemon.sh restart
If you browse to the URL again, you’ll now see a login button.
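You can also check the login from a terminal instead of the browser. The sketch below posts credentials to Zeppelin’s REST login endpoint (/api/login with userName and password form fields). The function name is made up for this example, and the exact response body may differ between Zeppelin versions, so treat it as a quick smoke test only.

```shell
# Sketch: try a login against Zeppelin's REST API and print the response,
# including headers. A 200 status suggests the credentials were accepted.
zeppelin_login_check() {
  # $1 = master public DNS, $2 = user name, $3 = password
  curl -s -i -X POST \
    --data-urlencode "userName=$2" \
    --data-urlencode "password=$3" \
    "http://$1:8890/api/login"
}
```

For example: zeppelin_login_check [Master public DNS] alice 'choose-a-password', using a user you defined in shiro.ini.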
3. Configure Interpreter
By default, one interpreter session is shared by all users and all notebooks. If you’d prefer to give each user a separate interpreter session,
open the interpreter menu (http://[Master public DNS]:8890/interpreter) and press the ‘edit’ button on the ‘Spark’ interpreter.
Set the option to “The interpreter will be instantiated ‘Per User’ in ‘scoped’ process” and click “Save”.
Alternatively, you can choose other combinations. This article may help you understand the shared/scoped/isolated modes.
4. Configure ACL of your notebook
By default, each notebook you create is shared with all users. (To stop sharing by default, set the zeppelin.notebook.public property to ‘false’ in /etc/zeppelin/conf/zeppelin-site.xml.)
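To confirm what the property is currently set to, you can read it back from the file. This is a small sketch with a made-up helper name; it assumes the usual layout where a <name> element is followed by its <value> element inside a <property> block, and it prints nothing if the property isn’t present.

```shell
# Sketch: print the value of zeppelin.notebook.public from a zeppelin-site.xml.
# Hypothetical helper; assumes <value> follows <name> within the same <property>.
notebook_public_value() {
  # $1 = path to zeppelin-site.xml
  awk '
    /<name>zeppelin.notebook.public<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")     # drop everything up to the opening tag
      sub(/<\/value>.*/, "")   # drop the closing tag and anything after it
      print
      exit
    }
  ' "$1"
}

# Example usage:
#   notebook_public_value /etc/zeppelin/conf/zeppelin-site.xml
```

After changing the property, restart the Zeppelin daemon as in step 2 so the new setting takes effect.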
In the top right corner of each notebook there’s a small ‘lock’ icon; that’s where you configure the notebook’s ACL.
For each notebook you can configure owners, writers, and readers.
For further collaboration and sharing of notebooks among different users, you can try ZeppelinHub as well. Hope this was helpful.