Jan 29, 2007
Remote Reboot
Found via the Nagios site:
* Servprise WebReboot - looks really useful if you can't get direct access to your gear and need to physically kick your systems.
In a similar vein:
* Dataprobe iBoot - web enabled power strips and sockets.
Along with a console server and/or an IP KVM to provide remote control (particularly out of band, depending on the situation), having some control over when systems go up or down is pretty vital if you don't have convenient access to your kit.
[/tech/ultimate] | [permalink] | [2007.01.29-01:47.00]
Oct 10, 2006
Run Book
A new job brings new challenges. One of the things that helps a newcomer get a handle on what does what is a run book (and an up-to-date LAN / WAN diagram). A Run Book should contain -
- Hostname + Aliases
- Function
- Hardware details (make, model, serial number/tag)
- Hardware config (disks, ram)
- Installed OS + patch level
- Installed applications (if it's an application server)
- Special startup/shutdown procedures (if any)
- Location (server room, rack and geography if you have multiple sites)
- Basic change log - eg when important changes were made to the system - you may want to add a simple service history too
- System Owner / Business Owner (eg the responsible systems admin and the person in the business who looks after the application on the box)
A run book lends itself to a simple database (we used to use a Lotus Domino database, which worked well) - absolute worst case, use a book in the server room or a text file at the root of the system drive on each server to track basic config and change information. Another advantage of a database is that you can age the information and chase updates (eg every 6 months mail the Helpdesk to ensure someone checks the system configuration and updates the run-book details).
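As a rough sketch of the database approach (the schema and field names are mine and for illustration only - the change log from the list above would be a second table):

    import sqlite3
    from datetime import datetime, timedelta

    # Minimal run-book schema covering the fields listed above.
    db = sqlite3.connect("runbook.db")
    db.execute("""
        CREATE TABLE IF NOT EXISTS runbook (
            hostname     TEXT PRIMARY KEY,
            aliases      TEXT,
            function     TEXT,
            hardware     TEXT,   -- make, model, serial number/tag
            config       TEXT,   -- disks, ram
            os_patch     TEXT,   -- installed OS + patch level
            applications TEXT,
            procedures   TEXT,   -- special startup/shutdown notes
            location     TEXT,   -- server room, rack, site
            owner        TEXT,   -- system owner / business owner
            last_checked TEXT    -- ISO date of last review
        )""")
    db.execute(
        "INSERT OR REPLACE INTO runbook VALUES (?,?,?,?,?,?,?,?,?,?,?)",
        ("web01", "www intranet", "Intranet web server",
         "Dell PE2850, s/n ABC1234", "2x73GB RAID1, 4GB RAM",
         "Debian 3.1, current patches", "Apache 2.0", "none",
         "Main server room, rack 2", "jsmith / finance team",
         datetime.now().date().isoformat()))
    db.commit()

    # Ageing: flag anything not reviewed in the last six months.
    cutoff = (datetime.now() - timedelta(days=182)).date().isoformat()
    for (host,) in db.execute(
            "SELECT hostname FROM runbook WHERE last_checked < ?", (cutoff,)):
        print(f"{host}: run-book entry overdue for review")

The ageing query at the end is the bit that lets you chase those six-monthly updates automatically.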
The key is to keep it as simple as possible while ensuring the vital information is available to admins when they need it. No one likes entering data into an overly complicated tracking system - it ends up actively discouraging use rather than encouraging it. In fact, if the run book can draw upon information already in an asset management system, that saves on duplication - or better still, if the asset tracking system can flag systems as 'special' you can extract the equivalent of a run book from within the asset database.
[/tech/ultimate] | [permalink] | [2006.10.10-22:54.00]
Oct 02, 2006
Server Room Air Conditioning
Dealing with environmental alerts from your server room (it is monitored 24x7, right ?) is a major PITA. A properly designed server room takes cooling and venting into account from the start. Unfortunately most people don't have the luxury of designing their server room from scratch and have to deal with ad hoc cooling solutions.
We have a secondary server room that runs very, very hot (30+ deg C) - luckily there's nothing super critical in there. Some digging has revealed that the 40+ devices in there pump out around 70,000 BTU/hr. The BTU (British Thermal Unit - the wikipedia article is pretty fascinating - eg 12,000 BTU/hr, one 'ton' of refrigeration, is the rate of cooling needed to melt a ton of ice in 24hrs) seems to be the de facto standard for measuring server room cooling capability, even though as an energy unit it's long been superseded in the metric world by the joule (strictly, cooling capacity is BTU per hour - a power figure, hence the kW ratings you'll also see).
At the moment the single ceiling-mounted unit seems to be capable of handling 30,000 BTU/hr and it's running at 16 deg C. Running it this cold is pretty pointless as it will never achieve that temperature, and running at the unit's maximum capacity 24x7 is pretty unhealthy. We're looking at getting a portable unit in to handle an additional 20,000 BTU/hr - it won't cover the total load but it will take some of the strain off the primary unit.
To find out how many BTUs of cooling capacity are required:
* Calculate the size of the room (assuming a 2m ceiling) - length x width x 330 BTU = heat from space
* Calculate the amount of heat generated by the equipment - total wattage (I guesstimated 400w per device, which is a little high) x 3.5 = heat from equipment
Then just add up the figures.
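A back-of-the-envelope version of that calculation, using this room's figures (the room dimensions below are my guesses, and the 400w per-device draw is the same guesstimate):

    # Rough server room cooling estimate using the rules of thumb above.
    # Room dimensions and per-device wattage are guesses - substitute your own.
    length_m, width_m = 8, 6            # room size, assuming a ~2m ceiling
    devices = 40                        # boxes in the room
    watts_per_device = 400              # guesstimated average draw

    heat_from_space = length_m * width_m * 330               # BTU/hr
    heat_from_equipment = devices * watts_per_device * 3.5   # BTU/hr
    total = heat_from_space + heat_from_equipment

    print(f"Space:     {heat_from_space:8,.0f} BTU/hr")
    print(f"Equipment: {heat_from_equipment:8,.0f} BTU/hr")
    print(f"Total:     {total:8,.0f} BTU/hr (~{total / 3412:.1f} kW of cooling)")

With those numbers it comes out around 72,000 BTU/hr - close to the 70,000 figure above, ie roughly double what the primary unit can handle.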
You can also figure in heat from windows, lights and people, but unless it's a big datacenter or the room faces the sun and has large windows it's probably not going to be a huge amount - if you do want to work out the extra capacity to allow for these factors take a look at the calculations here.
Note that 1 watt is roughly 3.4 BTU/hr when you check out cooling system specs - kW ratings seem to be more common in NZ and the UK.
That's the amount of cooling capacity your server room needs. Don't forget to allow for growth when you add gear, and for redundancy in case you have a unit failure. In an old server room we had three wall-mounted units - one big and two small; we could take the loss of one of the smaller ones, but if the big unit went the temperature skyrocketed pretty quickly.
Also be sure to have good rack placement to provide airflow and ensure your racks have built-in fans to properly vent the heat away from the equipment.
[/tech/ultimate] | [permalink] | [2006.10.02-00:37.00]
Jul 14, 2006
Metis
I had a quick look at a Metis report which led me to think that this is something a lot of organisations could use. This tool (which I suspect is pretty expensive) lets you map enterprise relationships.
At a very simple level (it's surprising there aren't some open-source equivalents, because the idea is so simple) you can create pools of services, servers and applications and then tie them together. That way you can use it as an asset list of servers, operating systems, applications and services, plus a run book and change management system. When you decide to upgrade servers you can immediately see the list of affected services, or if you want to update an application (eg Oracle) you can see the hardware and services the database upgrade will impact.
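A toy sketch of the idea (the service and host names are invented): store the relationships as a graph and the "what does this upgrade break ?" question becomes a reverse lookup:

    # Map applications/services to the servers they depend on.
    # Names are invented - the point is the reverse lookup.
    depends_on = {
        "payroll":   ["oracle01", "web02"],
        "intranet":  ["web01"],
        "timesheet": ["oracle01", "web01"],
    }

    def affected_by(server):
        """Which services are impacted if this server is upgraded or retired?"""
        return [svc for svc, hosts in depends_on.items() if server in hosts]

    print(affected_by("oracle01"))  # -> ['payroll', 'timesheet']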
One of those things that seems so simple but is also really, really useful for planning and tracking IT resources and services.
[/tech/ultimate] | [permalink] | [2006.07.14-21:26.00]
Apr 06, 2006
Questions to ask a prospective employer . . .
Well I'm sort of job hunting again.
I've decided that when the interviewer asks the "is there anything you'd like to ask us ?" question I'll follow up with "it's funny you should ask . . ."
A selection from these questions should give you an idea as to how "switched-on" the organisation is with respect to IT service provision (and potentially the headaches you might have to deal with if you work for them - then again, fixing some of those issues might be what the role is all about).
Of course they might just think you're a smart-ass and not hire you, as you may come across as a potential troublemaker ;-)
So in no particular order -
- What make / model servers do you use ?
- How old are they ?
- What hardware vendor maintenance do you have for them ?
- How often do you roll over old hardware ?
- Do you have a standard server build document / run book for each system ?
- How well are the IT systems documented ?
- What processes are in place to migrate services with minimal client impact (eg can you migrate your database server without having all the apps that rely upon it falling over) ?
- What critical services need to remain 'up' all the time and what provision is made should one of these fail ?
- What Monitoring do you use ?
- What Backup system do you use & how reliable is it ?
- How is storage provisioned and managed (do they have a SAN/NAS or consolidated storage plan) ?
- What Failover provision do you have for critical services ?
- Do you use data replication for critical information ?
- What services and applications rely on outside vendors or consultants ?
- Are there key services and applications that rely on any single persons expertise to work ?
- What call tracking system is in use ?
- Is there a change control process in place ?
- What is the induction process ?
- Is there a mentoring system to help you settle in and upskill ?
- Is there an on call component ?
- How often are after-hours alerts raised - what proportion of these can be handled remotely vs going onsite ?
- What SLAs are there ?
- What desktop OS do you use ?
- Is it a managed desktop (eg how is remote support, patching, anti-virus, software deployment, auditing handled) ?
- How do you rollout new hardware, software and services to the client community ?
- What access rights do people have and how is this managed ?
- What firewall, web proxy, virus / spam checking system is in place ?
- What remote access, vpn is in place ?
- What technologies do you use for your extranet / intranet ?
- Do you use a CMS (content management system) for your website ?
- Do you use an intranet, what information is available to staff and who updates its content ?
- Do you use a DMS (document management system) for knowledge management ?
- What groupware system do you use ?
[/tech/ultimate] | [permalink] | [2006.04.06-00:47.00]
Nov 18, 2005
Ultimate Services (Updated 21-Nov-05)
Much like my Ultimate Server Room I'd like to be able to have the following Services (much of this is obviously cribbed from previous workplaces) -
Single Consolidated Authentication - doesn't need to be single-sign-on (although that would be nice) but there should be a single user database feeding into various services for authentication. Remembering multiple IDs & passwords may aid security but it drives people nuts. Even if just the username and password are common, that's sufficient to keep most people sane.
In an ideal world HR would create new user IDs as part of the new hire process and retire IDs when people leave. This can be kept reasonably simple via a web form, so that non-IT staff can keep it up to date and others can use it as a corporate directory. This is one of those things where IT puts in a lot of up-front development effort, but once up and running it should operate in a very 'hands-off' mode such that IT needn't be involved in anything as mundane as creating new users :-)
No new services should be implemented unless they can be integrated into the single login/password system.
Unfortunately most enterprise meta-directory services are hideously expensive. For mere mortals this means either rolling your own (eg MySQL, LDAP, lots of scripts) or using a prepackaged solution (eg Active Directory, Services for Unix).
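For a flavour of the roll-your-own route, here's a minimal sketch using the third-party ldap3 Python module - the hostname, admin DN, tree layout and attributes are all placeholders, not a recommended design:

    from ldap3 import ALL, Server, Connection

    # Push a new HR record into the central directory - every downstream
    # service then authenticates against this one entry.
    server = Server("ldap.example.com", get_info=ALL)
    conn = Connection(server, "cn=admin,dc=example,dc=com", "secret",
                      auto_bind=True)

    hr = {"uid": "jsmith", "first": "Jane", "last": "Smith"}
    conn.add(
        f"uid={hr['uid']},ou=People,dc=example,dc=com",
        ["inetOrgPerson"],
        {"givenName": hr["first"],
         "sn": hr["last"],
         "cn": f"{hr['first']} {hr['last']}",
         "mail": f"{hr['uid']}@example.com"})
    print(conn.result)
    conn.unbind()

The same entry then feeds mail, intranet, VPN and whatever else can authenticate against LDAP.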
If you have secure external services (eg VPN) then require a separate username and password from internal services.
For Groupware/Email I'd recommend Lotus Notes. Outlook and Exchange may be what most of the world knows and loves, but they're insecure and a nightmare to admin.
For Web-browsing go with Firefox.
If you can't stretch to a commercial mail system then a good secure IMAP/POP/SMTP system should do the trick with something like Thunderbird as a mail application.
For productivity it's hard to go past OpenOffice.
For storage I'd recommend a Document Management System with a web interface. I personally like SilentOne, a commercial NZ DMS. I've also had some experience with FileNet's DMS - it seems to do the job but feels overly complicated. Alternatively there seem to be a bunch of Open Source DMS/Content Management Systems in development too.
For desktops, home directories and profiles would redirect to server shares (depending on what people are doing, of course - if they generate gigabytes of data on their machines there's no point shovelling it backwards and forwards across the LAN), either as folder redirection (Windows) or mount points (Linux/Unix). This means anyone can log into any machine and get their 'stuff'. Laptops are more of a problem - either leave them working locally with some kind of scripted file sync, or use the atrocious Offline Files tool for Windows (the only Unix/Linux alternatives seem to be rsync or similar tools like unison).
An intranet with a personal web space for each staff member is also fairly vital for any organisation, to aid communication and collaboration. Integrating it with the DMS would be handy but not absolutely vital. Something like Zope / Plone / Ubuntu would be cool. Whatever is deployed to people's desks should be standardised and stripped back - minimal extra applications, locked-down permissions and centralised management/configuration.
The desktop has to run on hardware - for Windows XP or Linux, Dell's Optiplex or Latitude lines are pretty good and you get the three-year onsite warranty, which is hard to beat. For MacOS X you can really only go with Apple. As with the servers, it's handy to have spare RAM and hard drives to swap for faulty components. For the majority of staff a small-form-factor machine is more than sufficient (who needs all the extra drive bays and PCI slots these days unless you're doing a specialist task ?).
There is a Bastard Operator From Hell part of my brain that wonders whether a number of these services couldn't be run via the web and/or a terminal only, to obviate the need to install anything at all on a desktop.
Many years ago I read of a travel agency that bought up a large number of Macintosh LC class machines (the old pizza-box LC I, II and III) and installed the PDS slot ethernet card. They ran OS 7.5 or thereabouts with everything accessed through a terminal back to the main office Unix server (eg email, travel bookings, the finance system) and ClarisWorks for basic productivity. Cheap, easy-to-replace hardware, minimal security problems and no virus issues. I bet they moved to more capable machines and acquired a raft of additional support issues.
[/tech/ultimate] | [permalink] | [2005.11.18-03:53.00]
Nov 16, 2005
Ultimate Server Room (Updated 06-July-06)
Just a place to dump information on what I would consider the 'Ultimate Server Room' setup - something most mid-range IT shops should be aiming for. Most of this information comes direct from previous workplaces.
Decent UPS (Uninterruptible Power Supply) covering the room's supply (not individual machines on a per-server or per-rack basis) - with a mains cutover switch should the UPS fail. This should provide at least 90 minutes of power to core servers (firewall, dns, dhcp, email, external internet presence, switches and routers). Having a couple of spare small UPS units on charge is always handy, so if need be you can run a PC/laptop off them during an extended outage.
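Sizing that runtime is simple arithmetic - roughly, usable battery energy divided by load, derated for inverter losses. A sketch with example figures (not a recommendation):

    # Rough UPS runtime check - example figures only.
    core_load_w = 2000      # firewall, dns/dhcp, mail, switches, routers
    battery_wh = 4000       # usable battery capacity
    efficiency = 0.85       # inverter losses / battery-age fudge factor

    runtime_min = battery_wh * efficiency / core_load_w * 60
    print(f"~{runtime_min:.0f} minutes at {core_load_w}W")   # ~102 minutes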
Ability to remotely power-cycle hardware - something like these products on offer from 42U. Tied into this is the use of KVM over IP and ILO (Integrated Lights Out), which gives you console access via the network (usually you point a web browser at the ILO NIC and you'll see the console). All of these tools let you get at your server remotely.
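Where the lights-out controller speaks IPMI (most do), the power-cycle can be scripted too - a minimal sketch wrapping the stock ipmitool CLI, with the host and credentials as placeholders:

    import subprocess

    def power_cycle(host, user, password):
        """Power-cycle a server via its lights-out controller with ipmitool."""
        subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", host,
             "-U", user, "-P", password, "chassis", "power", "cycle"],
            check=True)

    # power_cycle("ilo-web01.example.com", "admin", "secret")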
Servers with dual power supplies, dual NICs (onboard, not on a riser card) and RAID (onboard, not on a riser) for redundancy. I'm not a huge brand nut, but Dell seem to be pretty good value for the price, and their incident management based on service tag plus extended warranty options are pretty good too. Ideally with KVM ports in the right locations (eg on the front and back like these Petabox systems). For x86 based machines I can't see the point in opting for more expensive Compaq/IBM gear (and their product serial numbers are way too long ;-)
Spare parts for servers - a spare hotswap power supply (a nice Dell feature), RAID disks and NICs. These may be a luxury, but when a disk or power supply goes down, being able to plug in a replacement immediately buys you valuable time while you arrange warranty support. Also, if you figure the spares into new purchases you'll have the right parts for the right machines (and if you have a good relationship with your account manager you can often wangle this stuff for free).
Redundant server switch backbone for the dual teamed NICs (eg each NIC goes into a port on a separate switch and the pair is teamed to one IP address).
Redundancy for core internal services - DNS, DHCP, NIS, Domain Controller / Active Directory sitting ready to go on a secondary machine (Windows or Unix based, depending on the function). Low-volume services such as these should be consolidated onto a single server with a secondary slave available. A spare system with everything installed and plenty of disk space should be available to pick up the slack should a core service that isn't replicated fail (eg file, print, mail, intranet) - a nightly rsync/robocopy keeps the data on this system up to date and doubles as a handy backup if there is a problem with the tape backup.
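The nightly sync can be as simple as a cron job on the spare box - a sketch (the hostname and paths are examples):

    import subprocess

    # Nightly pull of critical data onto the warm-standby box - run from
    # cron on the spare. Source host and paths are examples.
    SHARES = ["/srv/files", "/srv/mail", "/srv/intranet"]

    for path in SHARES:
        subprocess.run(
            ["rsync", "-a", "--delete", f"fileserver01:{path}/", f"{path}/"],
            check=True)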
I really enjoyed this article over at Adminfoo - Why I'm Not a SAN Fan. I kind of agree - the vendor lock-in aspect is a little scary. Granted, at some point you'll hit the limitations of direct-attached storage (difficult to expand and a pain to back up) or NAS (throughput limited by the transport medium - TCP/IP packet size), but until then it's best to weigh all the pros and cons of a SAN very carefully.
A business-critical service such as email should, where possible, be clustered (Lotus Notes / Domino handles this quite well). File and print services can usually be cut across manually fairly easily if need be. External/DMZ services (firewall, mail relay, ftp, web) may be clustered or have a failover (many organisations will have a secondary internet link / firewall at a different site should the primary one fail).
Tape-based backup with offsite storage - SuperDLT drives seem to be pretty reliable and fast nowadays, with plenty of capacity.
Nice Comms racks - Chatsworth make excellent ones.
Server racks that slot together and have sensible rail mount systems that don't require cage-nuts. Dell racks work this way and are much simpler to manage.
False floor with lift-up panels to hide away cable runs between racks.
Suitable airconditioning and ventilation with fire-suppression and remote environment monitoring.
Suitable alarmed security (access card, key, combination lock etc).
And if the resources were available a full DR setup offsite (at least for core services) :-)
All server room details would be maintained in a run book with information pertaining to each server: hardware details, serial numbers, tags, age, support status, purpose, services, maintenance history (hardware/software) etc.
Monitoring of the network and server / service status. Lots of excellent Open Source tools for this - Cacti / MRTG, Nagios / Big Brother, ntop etc. Some form of paging or notification when services fall over is a must.
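Rolling your own Nagios check is straightforward too - a plugin is just a script that prints one line of status and exits 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal disk space check as an example:

    #!/usr/bin/env python
    import shutil
    import sys

    # Minimal Nagios-style plugin: the exit code conveys the status.
    WARN, CRIT = 20, 10   # percent-free thresholds

    du = shutil.disk_usage("/")
    pct_free = du.free * 100 / du.total

    if pct_free < CRIT:
        print(f"DISK CRITICAL - {pct_free:.0f}% free on /")
        sys.exit(2)
    elif pct_free < WARN:
        print(f"DISK WARNING - {pct_free:.0f}% free on /")
        sys.exit(1)
    print(f"DISK OK - {pct_free:.0f}% free on /")
    sys.exit(0)

Wire it up as a command/service definition and Nagios takes care of scheduling it and doing the paging.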
I'm sure I'll have more to add as I think of things. Next I'll create an 'Ultimate Services' entry for the software side of things I think ...
[/tech/ultimate] | [permalink] | [2005.11.16-04:29.00]